LEARN FUZZING FROM SCRATCH

Learn Guided Fuzzing with AFL: From Zero to AFL Internals Master

Goal: Deeply understand coverage-guided fuzzing, from a basic random-input fuzzer to the fork server, coverage bitmap, and feedback loop that make AFL effective.


Why Fuzzing Matters

Software is inherently complex, and with complexity comes bugs. These bugs can range from minor annoyances to critical security vulnerabilities that expose sensitive data or allow remote code execution. Traditional testing methods, while essential, often struggle to uncover these elusive flaws, especially those triggered by unexpected or malformed inputs. This is where fuzzing comes in.

Fuzzing is an automated software testing technique that involves feeding a program with large amounts of malformed, unexpected, or random data to expose bugs, crashes, or other anomalous behavior. It’s a powerful technique for discovering vulnerabilities that might otherwise go unnoticed.

Historically, fuzzing started as “dumb” or “black-box” fuzzing, where inputs were generated randomly without any knowledge of the program’s internal structure. While simple, this approach often struggled to reach deep code paths, as many programs expect inputs in specific formats.

The advent of guided fuzzing, pioneered by tools like American Fuzzy Lop (AFL), revolutionized the field. Guided fuzzers don’t just throw random data; they observe how the program behaves with each input and use that feedback to intelligently generate subsequent inputs. This “smart” approach allows them to explore code paths more efficiently and uncover bugs in areas that traditional fuzzers would never reach.

AFL, first released in 2013, quickly became a de-facto standard in fuzzing due to its effectiveness and ease of use. It has found hundreds of significant bugs in major software projects, demonstrating that vulnerabilities can be detected automatically at scale. Understanding AFL’s internal workings is not just about mastering a tool; it’s about grasping the fundamental concepts of modern software security testing and building a foundation for advanced vulnerability research.


Core Concept Analysis

1. The Fuzzing Spectrum: Black-box, Grey-box, White-box

Fuzzing techniques can be broadly categorized based on their knowledge of the target program’s internal structure:

  • Black-box Fuzzing: The fuzzer has no knowledge of the program’s internal structure. It treats the program as a black box, feeding random inputs and observing crashes.
  • White-box Fuzzing: The fuzzer has full knowledge of the program’s source code or binary. It uses techniques like symbolic execution to analyze program paths.
  • Grey-box Fuzzing: A middle ground. Grey-box fuzzers, like AFL, use lightweight instrumentation to gain some feedback (e.g., code coverage) from the program’s execution.

┌───────────────────┐     ┌───────────────────────────────┐     ┌───────────────────────────────┐
│  Black-box Fuzzer │     │  Grey-box Fuzzer              │     │  White-box Fuzzer             │
│                   │     │                               │     │                               │
│  Input -> Program │     │  Input -> Program             │     │  Input -> Program             │
│        │          │     │        │                      │     │        │                      │
│        └─ Output  │     │        └─ Output              │     │        └─ Output              │
│                   │     │          │                    │     │          │                    │
│  No internal      │     │          └─ Feedback          │     │          └─ Deep analysis     │
│  knowledge        │     │             (e.g., coverage)  │     │             (e.g., symbolic   │
│                   │     │                               │     │             execution)        │
└───────────────────┘     └───────────────────────────────┘     └───────────────────────────────┘

2. Code Coverage: The Guiding Light

AFL uses edge coverage to understand which parts of the program’s control flow graph are executed.

       ┌───┐
       │ A │
       └───┘
         │
         ▼ (Edge A->B)
       ┌───┐
       │ B │
       └───┘
      ╱   ╲
     ▼     ▼ (Edges B->C, B->D)
   ┌───┐ ┌───┐
   │ C │ │ D │
   └───┘ └───┘
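
A minimal sketch of how edges like the ones above can be recorded, loosely following the scheme described in AFL's technical notes: every basic block gets an ID, and an edge is identified by XOR-ing the current block ID with the right-shifted previous one. The block IDs, `EdgeTracer`, and `trace` are made up for illustration; in AFL the equivalent logic is injected into the target at compile time.

```python
MAP_SIZE = 1 << 16  # 64 KB map, assumed here to mirror AFL's default size

# Hypothetical per-block IDs; a real instrumenter picks them randomly at compile time.
BLOCK_ID = {"A": 0x1A2B, "B": 0x3C4D, "C": 0x5E6F, "D": 0x7081}


class EdgeTracer:
    """Toy stand-in for compiled-in instrumentation (not AFL's actual code)."""

    def __init__(self):
        self.bitmap = bytearray(MAP_SIZE)
        self.prev_loc = 0

    def visit(self, cur_loc):
        # The pair (previous block, current block) collapses into one map index.
        idx = (cur_loc ^ self.prev_loc) % MAP_SIZE
        self.bitmap[idx] = (self.bitmap[idx] + 1) & 0xFF  # 8-bit counter, wraps
        # Shifting keeps A->B distinguishable from B->A.
        self.prev_loc = cur_loc >> 1


def trace(path):
    """Return the bitmap produced by visiting the named blocks in order."""
    tracer = EdgeTracer()
    for block in path:
        tracer.visit(BLOCK_ID[block])
    return tracer.bitmap


if __name__ == "__main__":
    print("A->B->C and A->B->D differ:", trace("ABC") != trace("ABD"))
```

Because the map has a fixed size, two different edges can land on the same index; the design trades occasional collisions for a very cheap update on every branch.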

3. The Feedback Loop

AFL’s core is an evolutionary algorithm. It starts with seeds, mutates them, executes the target, and if new coverage is found, adds the mutated input back to the queue.

┌─────────────┐     ┌──────────────┐     ┌───────────────┐
│ Seed Corpus │────▶│ Mutate Input │────▶│ Execute Target│
└─────────────┘     └──────────────┘     └───────┬───────┘
       ▲                                         │
       │           ┌───────────────┐     Found   │
       └───────────┤ New Coverage? │◀─────New────┘
                   └───────────────┘     Path
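
The loop can be sketched in a few lines if the target is replaced by an ordinary function that reports which branches it reached. Everything here (`toy_target`, the single-byte `mutate`, the uniform choice of parent) is a simplifying assumption for illustration, not how AFL schedules or mutates inputs.

```python
import random


def toy_target(data):
    """Hypothetical target: returns the set of branch IDs the input reached."""
    cov = {0}
    if len(data) > 0 and data[0] == ord("F"):
        cov.add(1)
        if len(data) > 1 and data[1] == ord("U"):
            cov.add(2)
            if len(data) > 2 and data[2] == ord("Z"):
                cov.add(3)
    return cov


def mutate(data, rng):
    """Overwrite one random byte (a deliberately tiny mutation engine)."""
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.randrange(256)
    return bytes(buf)


def fuzz(seeds, iterations, rng):
    """Run the seed -> mutate -> execute -> keep-if-new loop; seeds must be non-empty."""
    queue = list(seeds)
    seen = set()
    for seed in queue:
        seen |= toy_target(seed)
    for _ in range(iterations):
        parent = rng.choice(queue)
        child = mutate(parent, rng)
        cov = toy_target(child)
        if not cov <= seen:       # the child reached something new
            seen |= cov
            queue.append(child)   # keep it so later mutations can build on it
    return queue, seen


if __name__ == "__main__":
    queue, seen = fuzz([b"AAAA"], 20000, random.Random(1))
    print("queue size:", len(queue), "branches seen:", sorted(seen))
```

The nested checks in `toy_target` show why the feedback matters: a blind fuzzer has to guess all three bytes at once, while the guided loop keeps each partial match and only needs one more correct byte per step.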

Concept Summary Table

| Concept Cluster | What You Need to Internalize |
|---|---|
| Instrumentation | How binary/source code is modified to report execution paths to the fuzzer. |
| Mutation Engine | The strategies (bit-flipping, arithmetic, splicing) used to generate diverse test cases. |
| Coverage Bitmap | How edge transitions are hashed into a fixed-size shared memory region to track global progress. |
| Fork Server | The mechanism used to avoid the overhead of execve() for every single test case. |
| Evolutionary Queue | How the fuzzer manages and prioritizes the most promising inputs for further exploration. |

Deep Dive Reading by Concept

Foundational Theory

| Concept | Book & Chapter |
|---|---|
| Basics of Fuzzing | The Fuzzing Book by Zeller et al., Ch. 1: “Introduction to Fuzzing” |
| Code Coverage | The Fuzzing Book by Zeller et al., Ch. 2: “Coverage-Based Fuzzing” |
| Grey-box Fuzzing | The Fuzzing Book by Zeller et al., Ch. “Greybox Fuzzing” |

System Internals

| Concept | Book & Chapter |
|---|---|
| Process Control | Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron, Ch. 8: “Exceptional Control Flow” |
| Shared Memory | The Linux Programming Interface by Michael Kerrisk, Ch. 48: “System V Shared Memory” |
| Binary Analysis | Practical Binary Analysis by Dennis Andriesse, Ch. 9: “Binary Instrumentation” |

Project 1: Simple Black-box Fuzzer

  • File: LEARN_FUZZING_FROM_SCRATCH.md
  • Main Programming Language: Python
  • Alternative Programming Languages: C, Go, Rust
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Fuzzing Fundamentals, Input Generation
  • Software or Tool: Any command-line program
  • Main Book: “The Fuzzing Book” by Zeller et al.

What you’ll build: A tool that feeds random byte strings to a target program and monitors for crashes (exit signals).

Why it teaches fuzzing: You learn the core loop: Generate -> Execute -> Monitor. It highlights why “dumb” fuzzing fails on complex inputs (parsing roadblocks).

Core challenges you’ll face:

  • Catching signals (SIGSEGV, SIGABRT)
  • Handling process timeouts
  • Generating diverse random blobs

Real World Outcome: A script that finds null-pointer dereferences in a buggy C parser you write.

$ python fuzzer.py ./target
[+] Iteration 1240: CRASH! (Saved to crashes/id_001.bin)
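
One possible shape for such a script, as a rough sketch rather than a reference solution. It assumes the target reads its input from stdin and runs on a POSIX system, where `subprocess` reports death-by-signal as a negative return code. The function names, the 1-second timeout, and the `crashes/` layout are illustrative choices.

```python
import os
import random
import subprocess
import sys


def random_blob(rng, max_len=64):
    """Generate between 1 and max_len random bytes."""
    return bytes(rng.randrange(256) for _ in range(rng.randrange(1, max_len + 1)))


def run_target(cmd, data, timeout=1.0):
    """Feed data on stdin and classify the run as 'crash', 'hang', or 'ok'."""
    try:
        proc = subprocess.run(cmd, input=data, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "hang"
    # On POSIX a negative return code means the child was killed by a signal
    # (for example -11 for SIGSEGV, -6 for SIGABRT).
    return "crash" if proc.returncode < 0 else "ok"


def fuzz(cmd, iterations, seed=0, crash_dir="crashes"):
    """Generate -> Execute -> Monitor; save every crashing input to disk."""
    rng = random.Random(seed)
    crashes = 0
    for i in range(1, iterations + 1):
        data = random_blob(rng)
        if run_target(cmd, data) == "crash":
            os.makedirs(crash_dir, exist_ok=True)
            crashes += 1
            path = os.path.join(crash_dir, f"id_{crashes:03d}.bin")
            with open(path, "wb") as f:
                f.write(data)
            print(f"[+] Iteration {i}: CRASH! (Saved to {path})")
    return crashes


if __name__ == "__main__" and len(sys.argv) > 1:
    fuzz(sys.argv[1:], iterations=10_000)
```

Spawning a fresh process per input is simple but slow, which is the cost the fork server project later removes.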

Project 2: Mutation Engine (AFL Strategies)

  • File: LEARN_FUZZING_FROM_SCRATCH.md
  • Main Programming Language: Python
  • Alternative Programming Languages: C, Rust
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Data Manipulation
  • Software or Tool: AFL++ Documentation (Technical Details)
  • Main Book: “The Fuzzing Book” by Zeller et al.

What you’ll build: A library implementing bit-flipping, byte-flipping, arithmetic increments, and “interesting values” (0, -1, INT_MAX).

Why it teaches fuzzing: It shows how AFL systematically probes for boundary conditions (off-by-one, overflows) instead of just relying on luck.
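
A small sketch of what the deterministic mutators in such a library might look like. The generator names, the `max_delta=4` default, and the choice of little-endian 32-bit windows are assumptions for illustration; AFL's real stages cover more widths, both endiannesses, and larger deltas.

```python
import struct

# The boundary values named above, as signed 32-bit integers: 0, -1, INT_MAX.
INTERESTING_32 = [0, -1, 2**31 - 1]


def bit_flips(data):
    """Yield every input that differs from data by exactly one bit."""
    for i in range(len(data) * 8):
        buf = bytearray(data)
        buf[i // 8] ^= 0x80 >> (i % 8)
        yield bytes(buf)


def byte_flips(data):
    """Yield every input with exactly one byte inverted."""
    for i in range(len(data)):
        buf = bytearray(data)
        buf[i] ^= 0xFF
        yield bytes(buf)


def arithmetic(data, max_delta=4):
    """Add and subtract small deltas from each byte, wrapping modulo 256."""
    for i in range(len(data)):
        for delta in range(1, max_delta + 1):
            for signed in (delta, -delta):
                buf = bytearray(data)
                buf[i] = (buf[i] + signed) % 256
                yield bytes(buf)


def interesting_values(data):
    """Overwrite each 4-byte window with a little-endian boundary value."""
    for i in range(len(data) - 3):
        for value in INTERESTING_32:
            buf = bytearray(data)
            buf[i:i + 4] = struct.pack("<i", value)
            yield bytes(buf)


if __name__ == "__main__":
    seed = b"\x00\x10AAAA"
    total = sum(1 for gen in (bit_flips, byte_flips, arithmetic, interesting_values)
                for _ in gen(seed))
    print("mutants generated from a 6-byte seed:", total)
```

Writing the mutators as generators keeps them composable: the orchestrator can walk each stage exhaustively for a short seed or stop early once a stage stops producing new coverage.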


Project 3: The Fork Server

  • File: LEARN_FUZZING_FROM_SCRATCH.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Operating Systems, IPC
  • Software or Tool: fork(), pipes, waitpid()
  • Main Book: “The Linux Programming Interface” by Michael Kerrisk

What you’ll build: A parent process that “forks” a child which is already pre-initialized, communicating via pipes to trigger new runs.

Why it teaches fuzzing: This is the secret to AFL’s speed (1000+ execs/sec). You learn the cost of execve() vs fork().
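
The protocol can be prototyped in Python before writing the C version, since `os.fork`, `os.pipe`, and `os.waitpid` map directly onto the system calls listed above. This is a sketch under several assumptions: the target is a Python function rather than an instrumented binary, the input travels through a file at `input_path`, and the 4-byte messages are an arbitrary choice for this example.

```python
import os
import signal
import struct
import tempfile


def start_fork_server(target, input_path):
    """Fork a long-lived server process; return (server_pid, ctl_w, st_r)."""
    ctl_r, ctl_w = os.pipe()   # fuzzer -> server: "run once more"
    st_r, st_w = os.pipe()     # server -> fuzzer: wait status of that run
    server = os.fork()
    if server == 0:
        os.close(ctl_w)
        os.close(st_r)
        # Expensive one-time initialization would happen here, before the loop.
        while True:
            if len(os.read(ctl_r, 4)) < 4:   # fuzzer closed its end: shut down
                os._exit(0)
            child = os.fork()                # cheap copy of the ready process
            if child == 0:
                try:
                    with open(input_path, "rb") as f:
                        target(f.read())
                    os._exit(0)
                except BaseException:
                    os._exit(1)
            _, status = os.waitpid(child, 0)
            os.write(st_w, struct.pack("<I", status))
    os.close(ctl_r)
    os.close(st_w)
    return server, ctl_w, st_r


def run_once(ctl_w, st_r, input_path, data):
    """Write the test case, request one run, and return the raw wait status."""
    with open(input_path, "wb") as f:
        f.write(data)
    os.write(ctl_w, b"RUN!")
    (status,) = struct.unpack("<I", os.read(st_r, 4))
    return status


def stop_fork_server(server, ctl_w, st_r):
    os.close(ctl_w)            # the server sees EOF on the control pipe and exits
    os.close(st_r)
    os.waitpid(server, 0)


def buggy_target(data):
    """Hypothetical target that 'crashes' on inputs starting with b'BAD'."""
    if data.startswith(b"BAD"):
        os.kill(os.getpid(), signal.SIGSEGV)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "cur_input")
        server, ctl_w, st_r = start_fork_server(buggy_target, path)
        for data in (b"hello", b"BAD input"):
            status = run_once(ctl_w, st_r, path, data)
            print(data, "-> killed by signal" if os.WIFSIGNALED(status) else "-> exited")
        stop_fork_server(server, ctl_w, st_r)
```

The fuzzer never calls `execve()` per test case here: the server process is created once, and every run is only a `fork()` of an already initialized process plus two small pipe messages.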


Project 4: Shared Memory Coverage Bitmap

  • File: LEARN_FUZZING_FROM_SCRATCH.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Memory Internals
  • Software or Tool: shmget, shmat
  • Main Book: “The Linux Programming Interface” by Michael Kerrisk

What you’ll build: A 64KB shared memory region where a “target” writes its path IDs and the “fuzzer” reads them to detect new coverage.

Why it teaches fuzzing: You understand how AFL “sees” code without having a debugger attached.
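
The idea can be tried out in Python first. Note the substitution: Python's standard library has no wrapper for `shmget`/`shmat`, so this sketch uses an anonymous shared `mmap` inherited across `fork()` instead of a System V segment. The helper names and the two path IDs in `toy_target` are made up for the example.

```python
import mmap
import os

MAP_SIZE = 64 * 1024  # 64 KB, as in the project description


def mark(shm, path_id):
    """Called by the target: bump the saturating counter for one path ID."""
    idx = path_id % MAP_SIZE
    shm[idx] = min(shm[idx] + 1, 255)


def run_traced(shm, target, data):
    """Run target(data, shm) in a forked child and return a copy of its trace."""
    shm[:] = bytes(MAP_SIZE)          # clear the trace before every run
    pid = os.fork()
    if pid == 0:
        try:
            target(data, shm)         # the child's writes land in the shared pages
        finally:
            os._exit(0)
    os.waitpid(pid, 0)
    return shm[:]


def has_new_coverage(trace, seen):
    """Fuzzer side: fold a trace into the global map, report anything new."""
    new = False
    for i, hits in enumerate(trace):
        if hits and not seen[i]:
            seen[i] = 1
            new = True
    return new


def toy_target(data, shm):
    """Hypothetical instrumented target with two reportable locations."""
    mark(shm, 100)
    if data[:1] == b"A":
        mark(shm, 200)


if __name__ == "__main__":
    shm = mmap.mmap(-1, MAP_SIZE)     # anonymous mapping, shared across fork on Unix
    seen = bytearray(MAP_SIZE)
    for data in (b"x", b"y", b"A"):
        print(data, "new coverage:", has_new_coverage(run_traced(shm, toy_target, data), seen))
```

The fuzzer only ever reads memory after the child has exited, so no locking is needed; the child and the fuzzer never touch the map at the same time.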


Project 5: Coverage-Guided Orchestrator

  • File: LEARN_FUZZING_FROM_SCRATCH.md
  • Main Programming Language: Python/C
  • Alternative Programming Languages: Go, Rust
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Systems Integration
  • Software or Tool: Custom built pieces from Projects 1-4
  • Main Book: “Practical Binary Analysis” by Dennis Andriesse

What you’ll build: The full AFL loop. A fuzzer that takes a seed, mutates it, runs it via the fork server, reads the SHM bitmap, and adds it to the queue if new bits are set.
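
One detail behind “new bits are set” that is worth prototyping early: AFL's technical notes describe grouping raw hit counts into coarse buckets (1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+), so that a loop running 40 times instead of 41 does not count as a discovery. The sketch below is an interpretation of that idea, not AFL's source; the `virgin` map convention (all bits set means "never seen") and the return codes are assumptions of this example.

```python
def bucket(hits):
    """Map a raw 8-bit hit count to a single-bit bucket value."""
    if hits == 0:
        return 0
    if hits <= 3:
        return 1 << (hits - 1)   # 1 -> 1, 2 -> 2, 3 -> 4
    if hits <= 7:
        return 8
    if hits <= 15:
        return 16
    if hits <= 31:
        return 32
    if hits <= 127:
        return 64
    return 128


def has_new_bits(trace, virgin):
    """Return 2 for a never-seen edge, 1 for a new hit-count bucket, else 0.

    `virgin` is a bytearray that starts as all 0xFF; bits are cleared as the
    corresponding (edge, bucket) pairs are observed.
    """
    result = 0
    for i, hits in enumerate(trace):
        b = bucket(hits)
        if b & virgin[i]:
            result = max(result, 2 if virgin[i] == 0xFF else 1)
            virgin[i] &= ~b & 0xFF
    return result


if __name__ == "__main__":
    virgin = bytearray([0xFF] * 4)
    for trace in (bytes([1, 0, 0, 0]), bytes([5, 0, 0, 0]), bytes([6, 0, 0, 0])):
        print(list(trace), "->", has_new_bits(trace, virgin))
```

In the orchestrator, any non-zero result would be the signal to append the mutated input to the queue.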


Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| Black-box Fuzzer | 1/5 | 1 Day | Surface | 2/5 |
| Mutation Engine | 2/5 | 2 Days | Medium | 3/5 |
| Fork Server | 4/5 | 4 Days | Deep (OS) | 4/5 |
| Coverage Bitmap | 3/5 | 2 Days | Deep (Memory) | 3/5 |
| Full Orchestrator | 5/5 | 1-2 Weeks | Master | 5/5 |

Recommendation

Start with Project 1 to get the satisfaction of finding your first crash. Then, immediately dive into Project 3 (Fork Server). Most people find the OS-level “trick” of the fork server to be the most enlightening part of how modern fuzzers achieve extreme performance.


Final Overall Project: The “Mini-AFL”

What you’ll build: A self-contained, coverage-guided fuzzer for Linux ELF binaries. It will use a GCC plugin or LLVM pass to instrument code, a fork server for speed, and a genetic algorithm queue to manage thousands of inputs.

Goal: Find a bug worthy of a CVE (Common Vulnerabilities and Exposures) entry in an open-source library (like a small JSON parser or image library) using your own tool.


Summary

This learning path takes you from basic random testing to building a high-performance, coverage-guided security tool.

| # | Project Name | Main Language | Difficulty | Time Estimate |
|---|---|---|---|---|
| 1 | Black-box Fuzzer | Python | Beginner | 1 day |
| 2 | Mutation Engine | Python | Intermediate | 2 days |
| 3 | Fork Server | C | Advanced | 4 days |
| 4 | Coverage Bitmap | C | Advanced | 2 days |
| 5 | Full Orchestrator | C/Python | Expert | 1-2 weeks |

Expected Outcomes

  • Deep understanding of Linux process management and IPC.
  • Ability to read and implement research papers on software security.
  • Mastery of the “feedback loop” that drives modern automated bug finding.
  • A significant portfolio piece demonstrating systems-level engineering.