← Back to all projects

SPRINT 2 DATA AND INVARIANTS PROJECTS

Sprint 2: Data & Invariants — Project Recommendations

Goal

After completing these projects, you will understand the hidden contracts that govern C programming—the unwritten rules that determine whether your program works correctly or descends into undefined behavior.

You will internalize:

  • What invariants are and why they’re the difference between “works on my machine” and “provably correct”
  • Who owns what memory and when it’s safe to use, modify, or free it
  • How data actually exists in memory (not as abstract types, but as bytes with alignment and padding)
  • When pointers are valid and when they’re time bombs waiting to explode
  • How to design defensively so your code fails loudly and obviously rather than silently corrupting data

This isn’t about memorizing C syntax. It’s about developing the instinct to ask the right questions:

  • “What must ALWAYS be true for this data structure to make sense?”
  • “Who is responsible for freeing this memory?”
  • “Is this pointer still valid here?”
  • “What happens if this assumption is violated?”

By the end of this sprint, you’ll write code that is not just “working” but trustworthy—code you can reason about, debug systematically, and deploy with confidence.


Foundational Concepts: The Five Pillars

1. Invariants — The Hidden Rules That Must Never Break

What is an invariant?

An invariant is a condition that must always be true at specific points in your program’s execution. It’s not a suggestion. It’s not a performance optimization. It’s a contract that your code makes with itself.

┌─────────────────────────────────────────────────┐
│  INVARIANT: A condition that MUST be true      │
│  at all "stable points" in your program        │
│                                                 │
│  Stable points:                                 │
│  • Before a function is called                 │
│  • After a function returns                    │
│  • At the start of a loop iteration            │
│  • After allocating/freeing memory             │
└─────────────────────────────────────────────────┘

Example: Dynamic Array Invariants

Consider a simple dynamic array:

typedef struct {
    int *data;      // Pointer to heap-allocated array
    size_t length;  // Number of elements currently in use
    size_t capacity; // Total number of elements allocated
} Vec;

This structure has at least five invariants that must hold simultaneously:

INVARIANT SET for Vec:
┌──────────────────────────────────────────────────────────┐
│ 1. capacity >= length                                    │
│    (You can't use more space than you allocated)         │
│                                                           │
│ 2. data != NULL  OR  capacity == 0                       │
│    (Either we have allocated memory, or capacity is 0)   │
│                                                           │
│ 3. First 'length' elements at data[0..length-1] are      │
│    initialized and valid                                 │
│                                                           │
│ 4. Only ONE owner may call free(data)                    │
│    (Double-free is undefined behavior)                   │
│                                                           │
│ 5. After realloc(), ALL previous pointers to             │
│    elements are INVALID                                  │
│    (The array may have moved)                            │
└──────────────────────────────────────────────────────────┘

2. Ownership — Every Byte Has Exactly One Master

In C, ownership is the answer to the question: “Who is responsible for calling free() on this memory?”

Every allocated byte must have exactly one owner:

  • Too few owners (zero) → memory leak
  • Too many owners (two+) → double free or use-after-free
OWNERSHIP PRINCIPLE:
┌─────────────────────────────────────────────────┐
│  Every malloc() must have EXACTLY ONE free()   │
│  Every piece of memory has ONE responsible      │
│  entity that will free it                       │
│                                                 │
│  Owner = "The entity that calls free()"        │
└─────────────────────────────────────────────────┘

Pattern 1: STRUCT OWNS ITS HEAP DATA
┌──────────────────────────┐
│  Vec                     │
│  ┌────────────────────┐  │
│  │ int *data   ───┐   │  │
│  │ size_t length  │   │  │
│  │ size_t capacity│   │  │
│  └────────────────│───┘  │
└───────────────────│──────┘
                    ▼
           ┌────────────────┐
           │ Heap memory    │  ← Vec OWNS this
           │ [1,2,3,4,5...] │  ← vec_destroy() frees it
           └────────────────┘

Ownership Pattern: Struct owns heap data


3. Memory Layout — Data Is Not Abstract; It’s Bytes with Rules

In C, data is bytes—and those bytes have very specific rules about where they can live and how they can be arranged.

ALIGNMENT RULES (typical 64-bit system):
┌────────────┬───────────┬────────────────────────────┐
│ Type       │ Size      │ Must start at address...   │
├────────────┼───────────┼────────────────────────────┤
│ char       │ 1 byte    │ Any address (1-aligned)    │
│ short      │ 2 bytes   │ Multiple of 2 (2-aligned)  │
│ int        │ 4 bytes   │ Multiple of 4 (4-aligned)  │
│ long       │ 8 bytes   │ Multiple of 8 (8-aligned)  │
│ double     │ 8 bytes   │ Multiple of 8 (8-aligned)  │
│ pointer    │ 8 bytes   │ Multiple of 8 (8-aligned)  │
└────────────┴───────────┴────────────────────────────┘

Struct padding example:

struct Example1 {
    char a;     // 1 byte
    int b;      // 4 bytes
    char c;     // 1 byte
};
// sizeof(Example1) == 12 bytes (not 6!)

Memory layout:
┌──────┬─────┬─────┬─────┬──────────────────┬──────┬─────┬─────┬─────┐
  a    PAD  PAD  PAD         b           c    PAD  PAD  PAD 
├──────┼─────┼─────┼─────┼──────────────────┼──────┼─────┼─────┼─────┤
  0     1    2    3    4   5   6   7     8     9   10   11  
└──────┴─────┴─────┴─────┴──────────────────┴──────┴─────┴─────┴─────┘

Struct Padding: Memory layout with alignment


4. Validity Windows — When Pointers Live and Die

A pointer has a lifetime—a window of time during program execution when dereferencing it is safe.

SCENARIO: Reallocation invalidates ALL previous pointers
──────────────────────────────────────────────────────────
Vec v;
vec_init(&v);

vec_push(&v, 10);
int *first = &v.data[0];  // ← first points into v.data
printf("%d\n", *first);   // ✓ SAFE (currently valid)

vec_push(&v, 20);  // ← This might realloc() v.data
                   // ← v.data may have MOVED
                   // ← first is now INVALID (dangling pointer)

printf("%d\n", *first);   // ✗ UNDEFINED BEHAVIOR
                          //   (first points to freed memory)

BEFORE realloc():
  v.data ─────────┐
                  ▼
         ┌─────────────────┐
  Heap:  │ [10, ...empty...] │
         └─────────────────┘
                  ▲
  first ─────────┘   (VALID pointer)

AFTER vec_push() triggers realloc():
  v.data ─────────────────────┐
                              ▼
                     ┌─────────────────────────────┐
  Heap:              │ [10, 20, ...empty........]  │
                     └─────────────────────────────┘

         ┌─────────────────┐
         │ (freed memory)  │
         └─────────────────┘
                  ▲
  first ─────────┘   (INVALID - DANGLING!)

Pointer Invalidation: Before and after realloc


5. Defensive Design — Trust Nothing, Verify Everything

Defensive design is the philosophy: Assume everything will go wrong, and code accordingly.

DEFENSIVE DESIGN PRINCIPLES:
┌─────────────────────────────────────────────────┐
│  1. Every input is malicious until proven safe │
│  2. Every pointer is NULL until proven valid   │
│  3. Every buffer can overflow                  │
│  4. Every invariant can be violated            │
│  5. Fail LOUDLY and EARLY, not silently later │
└─────────────────────────────────────────────────┘

Example: Defensive vs naive code:

// NAIVE CODE:
void vec_get(Vec *v, size_t index) {
    return v->data[index];  // ✗ Hope nothing goes wrong
}

// DEFENSIVE CODE:
int vec_get(Vec *v, size_t index, int *out) {
    // 1. Validate inputs
    if (v == NULL || out == NULL) return -1;

    // 2. Check invariants
    assert(v->capacity >= v->length);

    // 3. Validate index
    if (index >= v->length) return -1;

    // 4. Perform operation
    *out = v->data[index];
    return 0;  // ✓ Success
}

Core Concept Analysis

The mental shift you’re targeting is profound: from “does this compile?” to “under what conditions is this valid?”

This breaks down into these fundamental building blocks:

Concept Cluster What You Must Internalize
Invariants Rules that must ALWAYS be true for data to be meaningful
Ownership Every byte has exactly one owner responsible for its lifetime
Memory Layout Data isn’t abstract—it’s bytes with alignment and padding
Validity Windows Pointers are only valid during specific program phases
Defensive Design Trust nothing, verify everything, fail loudly

Deep Dive Reading By Concept

This section maps each core concept to specific book chapters for deeper understanding. Use this as your reference guide throughout the sprint.

Concept 1: Invariants

Book Chapter/Section What You’ll Learn Priority
“Writing Solid Code” by Steve Maguire Chapter 2: Assert Yourself Using assertions to enforce invariants at runtime, when to check vs when to trust ⭐⭐⭐ Essential
“C Interfaces and Implementations” by David Hanson Chapter 1: Introduction (pp. 1-20) Design-by-contract philosophy, preconditions/postconditions, invariants as interface guarantees ⭐⭐⭐ Essential
“Effective C” by Robert Seacord Chapter 2: Objects, Functions, and Types (pp. 23-45) Type invariants, object validity, representation invariants ⭐⭐ Recommended
“The Practice of Programming” by Kernighan & Pike Chapter 5: Debugging (pp. 109-144) Documenting invariants, defensive checks, invariant violations as debugging clues ⭐⭐ Recommended
“Code Complete” by Steve McConnell Chapter 8.3: Design by Contract Formal treatment of contracts, invariants in practice ⭐ Supplemental

Key Takeaways:

  • Invariants are not “nice to have”—they define what makes data valid
  • Runtime assertions catch violated contracts before corruption spreads
  • Well-documented invariants are your best debugging tool

Concept 2: Ownership (Memory Ownership, Lifetime, Transfer)

Book Chapter/Section What You’ll Learn Priority
“Effective C” by Robert Seacord Chapter 6: Dynamic Memory Management (pp. 167-220) Ownership rules, who frees what, transfer of ownership, ownership models ⭐⭐⭐ Essential
“C Interfaces and Implementations” by David Hanson Chapter 5: Memory Management - Arena (pp. 81-104) Arena ownership model, bulk deallocation, lifetime hierarchies ⭐⭐⭐ Essential
“21st Century C” by Ben Klemens Chapter 11: Debugging (pp. 283-310) Ownership tracking, Valgrind usage, leak detection strategies ⭐⭐⭐ Essential
“Expert C Programming” by Peter van der Linden Chapter 4: The Shocking Truth: C Arrays and Pointers Are NOT the Same! (pp. 97-128) Ownership implications of array decay, pointer aliasing ⭐⭐ Recommended
“The Practice of Programming” by Kernighan & Pike Chapter 4: Interfaces (pp. 77-108) Clear ownership contracts in API design ⭐⭐ Recommended

Key Takeaways:

  • Every byte has exactly one owner responsible for freeing it
  • Transfer of ownership must be explicit and documented
  • Arena/pool allocators simplify ownership by grouping lifetimes

Concept 3: Memory Layout (Struct Layout, Padding, Alignment, Fat Pointers)

Book Chapter/Section What You’ll Learn Priority
“Computer Systems: A Programmer’s Perspective” (CS:APP) by Bryant & O’Hallaron Chapter 3.9: Heterogeneous Data Structures (pp. 279-297) Struct layout rules, padding calculation, alignment requirements by architecture ⭐⭐⭐ Essential
“Expert C Programming” by Peter van der Linden Chapter 6: Poetry in Motion: Runtime Data Structures (pp. 149-180) How structs are laid out in memory, alignment holes, sizeof surprises ⭐⭐⭐ Essential
“Effective C” by Robert Seacord Chapter 5: Pointers (pp. 125-165) Pointer sizes, alignment, fat pointers (pointer + metadata), representation ⭐⭐⭐ Essential
“21st Century C” by Ben Klemens Chapter 4: Functions (pp. 87-116) - see “Flexible Function Inputs” Using structs to pass metadata with pointers (fat pointer pattern) ⭐⭐ Recommended
“The Lost Art of Structure Packing” by Eric S. Raymond Online article (full text) Minimizing struct size through field reordering, cache-friendly layout ⭐⭐ Recommended

Key Takeaways:

  • Alignment is not optional—misaligned access crashes on some architectures
  • Padding bytes exist between fields; struct order matters for size
  • Fat pointers (pointer + length/capacity) are safer than bare pointers

Concept 4: Pointer Validity (When Pointers Are Valid, Aliasing Dangers)

Book Chapter/Section What You’ll Learn Priority
“Effective C” by Robert Seacord Chapter 5.3: Pointer Arithmetic (pp. 145-152) Valid pointer range, past-the-end pointers, when arithmetic is defined ⭐⭐⭐ Essential
“Expert C Programming” by Peter van der Linden Chapter 9: More about Arrays (pp. 237-266) Pointer validity lifetime, dangling pointers, use-after-free ⭐⭐⭐ Essential
“Computer Systems: A Programmer’s Perspective” (CS:APP) Chapter 3.10.4: Pointers and Arrays (pp. 297-302) Array decay, pointer bounds, undefined behavior zones ⭐⭐ Recommended
“C Traps and Pitfalls” by Andrew Koenig Chapter 4: Linkage (pp. 61-80) Scope vs lifetime, when automatic variables become invalid ⭐⭐ Recommended
“Writing Solid Code” by Steve Maguire Chapter 3: Fortify Your Subsystems (pp. 51-78) Detecting invalid pointers, debugging use-after-free ⭐⭐ Recommended
“The Practice of Programming” by Kernighan & Pike Chapter 6: Testing (pp. 145-178) Testing pointer validity edge cases ⭐ Supplemental

Aliasing-Specific Resources:

  • “What Every Programmer Should Know About Memory” by Ulrich Drepper (online, Section 6.1) - Cache effects of aliasing
  • “Understanding and Using C Pointers” by Richard Reese - Chapter 8: Pointers and Memory Aliasing

Key Takeaways:

  • Pointers are only valid within their object’s lifetime
  • Realloc invalidates ALL previous pointers to the block
  • Aliasing (two pointers to same memory) makes reasoning difficult

Concept 5: Defensive Design (Bounds Checking, Sentinel Values, Representing Absence)

Book Chapter/Section What You’ll Learn Priority
“Writing Solid Code” by Steve Maguire Chapter 3: Fortify Your Subsystems (pp. 51-78) Pervasive input validation, bounds checking strategies, fail-fast design ⭐⭐⭐ Essential
“The Practice of Programming” by Kernighan & Pike Chapter 5: Debugging (pp. 109-144) Defensive checks, sentinel values, making bugs reproducible ⭐⭐⭐ Essential
“Effective C” by Robert Seacord Chapter 4.3: Representing Strings (pp. 95-110) Null termination, bounded strings, sentinel techniques ⭐⭐ Recommended
“C Traps and Pitfalls” by Andrew Koenig Chapter 2: Syntactic Pitfalls (pp. 17-44) Common mistakes that bounds checking prevents ⭐⭐ Recommended
“21st Century C” by Ben Klemens Chapter 2: Debug, Test, Document (pp. 23-58) Modern defensive programming with sanitizers, static analysis ⭐⭐ Recommended
“Secure Coding in C and C++” by Robert Seacord Chapter 2: Strings (pp. 23-84) Security-focused bounds checking, preventing overflows ⭐ Supplemental

Key Takeaways:

  • Never trust input—validate array indices, pointer bounds, sizes
  • Sentinel values (NULL, -1, SIZE_MAX) clearly represent “absent” data
  • Fail loudly (assertions) rather than silently corrupting memory

Essential Reading Order (Week-by-Week)

This reading plan aligns with the project timeline and builds concepts progressively.

Week 1: Foundation (Before Project 1 - Dynamic Array)

Goal: Understand invariants and basic ownership

Priority Book Reading Time Notes
⭐⭐⭐ “C Interfaces and Implementations” Chapter 1 (pp. 1-20) 1 hour The design-by-contract foundation
⭐⭐⭐ “Writing Solid Code” Chapter 2: Assert Yourself 2 hours How to use assertions effectively
⭐⭐⭐ “Effective C” Chapter 6 (pp. 167-200) - first half 2 hours malloc/free ownership basics
⭐⭐ “The Practice of Programming” Chapter 4: Interfaces (pp. 77-108) 1.5 hours Clean API ownership contracts

Total: ~6.5 hours reading

Action: Start Project 1 with invariants and assertions from day one


Week 2: Memory Layout & Validity (During Project 1, Before Project 2/3)

Goal: Understand how data lives in memory

Priority Book Reading Time Notes
⭐⭐⭐ “Computer Systems” (CS:APP) Chapter 3.9 (pp. 279-297) 2 hours Struct layout, padding, alignment
⭐⭐⭐ “Expert C Programming” Chapter 6 (pp. 149-180) 2 hours Memory layout surprises
⭐⭐⭐ “Effective C” Chapter 5.3 (pp. 145-152) 1 hour Pointer arithmetic validity
⭐⭐ Eric S. Raymond “The Lost Art of Structure Packing” 30 min Online - practical packing guide

Total: ~5.5 hours reading

Action: Apply to Project 2 (gap buffer) or Project 3 (arena alignment)


Week 3: Ownership in Depth (During Projects 2/3)

Goal: Master ownership transfer and arena patterns

Priority Book Reading Time Notes
⭐⭐⭐ “C Interfaces and Implementations” Chapter 5: Arena (pp. 81-104) 2 hours Arena allocator deep dive
⭐⭐⭐ “Effective C” Chapter 6 (pp. 200-220) - second half 1.5 hours Advanced ownership patterns
⭐⭐⭐ “21st Century C” Chapter 11: Debugging (pp. 283-310) 2 hours Valgrind, ownership tracking
⭐⭐ “Expert C Programming” Chapter 9 (pp. 237-266) 2 hours Pointer lifetime subtleties

Total: ~7.5 hours reading

Action: Apply to Project 3 (arena) or Project 4 (JSON ownership decisions)


Week 4: Defensive Design & Integration (During Projects 4/5)

Goal: Build bulletproof, production-grade code

Priority Book Reading Time Notes
⭐⭐⭐ “Writing Solid Code” Chapter 3: Fortify Subsystems (pp. 51-78) 2 hours Pervasive validation
⭐⭐⭐ “The Practice of Programming” Chapter 5: Debugging (pp. 109-144) 2 hours Debugging with invariants
⭐⭐ “The Practice of Programming” Chapter 6: Testing (pp. 145-178) 2 hours Test strategies for invariants
⭐⭐ “C Traps and Pitfalls” Chapters 2, 4 (pp. 17-44, 61-80) 2 hours Common mistakes to avoid

Total: ~8 hours reading

Action: Apply to Projects 4/5 (JSON parser, linked list DB)


Week 5+: Advanced Topics (During Capstone HTTP Server)

Goal: Production-quality, security-aware code

Priority Book Reading Time Notes
⭐⭐⭐ “The Linux Programming Interface” Chapter 63: Alternative I/O (pp. 1367-1402) 3 hours select/poll for HTTP server
⭐⭐ “Secure Coding in C and C++” Chapter 2: Strings (pp. 23-84) 2 hours Security-focused validation
⭐⭐ “The Tangled Web” Chapter 9: HTTP (pp. 135-160) 1.5 hours HTTP parsing dangers
“Understanding and Using C Pointers” Chapter 8: Aliasing (online chapter) 1 hour Deep aliasing coverage

Total: ~7.5 hours reading

Action: Build production-grade HTTP server with all concepts combined


Reading Strategy Tips

How to Read Technical Books Effectively

  1. Read with a project in mind: Don’t read passively. For each chapter, ask “How does this apply to my current project?”

  2. Type the examples: Don’t just read code—type it, compile it, break it. Change variables and see what happens.

  3. Keep a concepts journal: For each reading session, write:
    • 3 key concepts learned
    • 1 thing that surprised you
    • 1 thing to try in your project
  4. Skim first, deep-dive second: First pass—skim headings and code samples (30 min). Second pass—read carefully and take notes (2x time).

  5. Focus on priority stars:
    • ⭐⭐⭐ Essential: Read carefully, take notes, type examples
    • ⭐⭐ Recommended: Read thoroughly, focus on concepts
    • ⭐ Supplemental: Skim or reference as needed

Alternative Resources (If Books Are Unavailable)

If you don’t have access to these books:

  • Invariants: “Design by Contract in C” (online articles), SQLite source code comments
  • Ownership: Rust book Chapter 4 (concepts transfer to C), Zig documentation on allocators
  • Memory Layout: “What Every Computer Scientist Should Know About Memory” by Ulrich Drepper (free online)
  • Pointer Validity: Clang/GCC sanitizer documentation (shows what they catch)
  • Defensive Design: CERT C Coding Standard (free online), MISRA C guidelines

Concept-to-Project Mapping

Project Primary Concepts Read These First
Dynamic Array Invariants, Ownership, Bounds Checking “Writing Solid Code” Ch.2, “C Interfaces” Ch.1, “Effective C” Ch.6 (first half)
Gap Buffer Memory Layout, Aliasing, Sentinel Values CS:APP Ch.3.9, “Expert C” Ch.6, “Practice of Programming” Ch.5
Arena Allocator Ownership, Alignment, Lifetime “C Interfaces” Ch.5, CS:APP Ch.3.9.3, “Effective C” Ch.6 (second half)
JSON Parser Ownership Transfer, Representing Absence, Defensive Design “Effective C” Ch.4+6, “Writing Solid Code” Ch.3, “Practice of Programming” Ch.4
Linked List DB Invariants, Iterator Validity, Defensive Checks “Writing Solid Code” Ch.2+3, “Algorithms in C” Ch.3
HTTP Server All concepts combined All Week 5+ readings

If you’re buying books one at a time:

  1. “Effective C” by Robert Seacord (2020) - Modern, comprehensive, covers 80% of what you need
  2. “Writing Solid Code” by Steve Maguire (1993) - Timeless defensive programming wisdom
  3. “Computer Systems: A Programmer’s Perspective” (CS:APP) - Essential for memory layout understanding
  4. “C Interfaces and Implementations” by David Hanson - Best for learning by working code examples
  5. “The Practice of Programming” by Kernighan & Pike - Broader wisdom, excellent for debugging/testing
  6. “Expert C Programming” by Peter van der Linden - Fun read, memorable examples, great for “aha moments”

Note: Many university libraries have CS:APP, and several of these books have older editions available cheaply or free online.


Project 1: Safe Dynamic Array Library (vec.h)

  • File: SPRINT_2_DATA_AND_INVARIANTS_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Data Structures / Safety
  • Software or Tool: Generic Data Structures
  • Main Book: “C Interfaces and Implementations” by David Hanson

What you’ll build: A reusable, memory-safe dynamic array library in C with documented invariants, bounds checking, and clear ownership semantics—usable in other projects.

Why it teaches Data & Invariants: This is the canonical “invariant-heavy” data structure. A dynamic array has at least 5 invariants that must hold simultaneously:

  • capacity >= length
  • data != NULL || capacity == 0
  • length elements are initialized
  • Only the owner may free data
  • After realloc, all previous pointers are invalid

You cannot implement this correctly without thinking about invariants constantly.

Core challenges you’ll face:

  • Defining the invariant formally: What EXACTLY must be true after every operation? (maps to: invariant design)
  • Handling reallocation safely: When you grow, old pointers die. How do you document/enforce this? (maps to: aliasing dangers, pointer validity)
  • Ownership on insert/remove: Does the array own the elements or just point to them? What happens on vec_clear()? (maps to: ownership, deep vs shallow copy)
  • Bounds checking without performance death: Where do you check? What do you do on violation? (maps to: manual bounds checking, asserting invariants)
  • Memory layout decisions: Store {ptr, len, cap} or {cap, len, data[]}? Each has tradeoffs. (maps to: struct layout, fat pointers)

Key Concepts:

  • Invariants as contracts: “C Interfaces and Implementations” by David Hanson - Chapter 1 (Design Principles)
  • Dynamic array implementation: “C Programming: A Modern Approach” by K.N. King - Chapter 17 (Advanced Uses of Pointers)
  • Assertions in C: “Writing Solid Code” by Steve Maguire - Chapter 2 (Assert Yourself)
  • Memory ownership patterns: “Effective C” by Robert Seacord - Chapter 6 (Dynamic Memory Management)

Difficulty: Intermediate Time estimate: 1 week Prerequisites: Basic C, pointers, malloc/free basics

Real world outcome:

  • A working vec.h library you’ll reuse in every subsequent project
  • A test program that demonstrates: creating arrays, pushing 10,000 elements, random access, removal, and memory cleanup with zero leaks (verified by Valgrind)
  • Printed output showing capacity growth: "Pushed 1000 items, capacity grew: 16 → 32 → 64 → 128 → 256 → 512 → 1024 → 2048"

Learning milestones:

  1. After basic implementation - You understand why vec->data can become invalid after vec_push()
  2. After adding bounds checks - You’ve internalized the cost/benefit of runtime verification
  3. After documenting invariants - You can explain exactly when your data structure is “valid” vs “corrupted”
  4. After running Valgrind clean - You understand ownership isn’t optional—it’s the difference between working code and time bombs

Real World Outcome (Expanded)

What you’ll see when you’re done:

You’ll have a production-ready vec.h header file and a demonstration program that proves your implementation works correctly:

$ gcc -Wall -Wextra -g vec_demo.c -o vec_demo
$ ./vec_demo

=== Dynamic Array (vec.h) Demonstration ===

[Test 1: Basic Creation and Push]
Creating vector with initial capacity 4...
Vector created: {data=0x7f9a42c05a00, length=0, capacity=4}
Invariant check: ✓ capacity >= length (4 >= 0)
Invariant check: ✓ data != NULL
Invariant check: ✓ length within bounds

Pushing integers 0-9...
Push 0: capacity grew 4 → 8 (realloc triggered)
  Old pointer: 0x7f9a42c05a00 (now INVALID)
  New pointer: 0x7f9a42c05c00
Push 1: no realloc (8 > 1)
...
Push 8: capacity grew 8 → 16 (realloc triggered)
  Old pointer: 0x7f9a42c05c00 (now INVALID)
  New pointer: 0x7f9a42c05e00

Final state: {length=10, capacity=16}
Contents: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

[Test 2: Random Access with Bounds Checking]
vec_get(vec, 5) = 5 ✓
vec_get(vec, 9) = 9 ✓
Attempting out-of-bounds access: vec_get(vec, 10)...
ASSERTION FAILED: vec_demo.c:42: index < vec->length (10 < 10)
Aborted (core dumped)

[Test 3: Memory Growth Pattern]
Starting with capacity=1, pushing 10,000 integers...
Capacity growth sequence:
  1 → 2 → 4 → 8 → 16 → 32 → 64 → 128 → 256 → 512 → 1024 → 2048 → 4096 → 8192 → 16384

Total reallocations: 14
Final capacity: 16384 (63.6% utilization)
Amortized cost analysis:
  Total push operations: 10,000
  Total realloc operations: 14
  Average cost per push: O(1)[Test 4: Cleanup and Valgrind]
Destroying vector...
Freed 16384 bytes

$ valgrind --leak-check=full ./vec_demo

==12345== HEAP SUMMARY:
==12345==     in use at exit: 0 bytes in 0 blocks
==12345==   total heap usage: 15 allocs, 15 frees, 131,072 bytes allocated
==12345==
==12345== All heap blocks were freed -- no leaks are possible

SUCCESS: Zero memory leaks ✓

The files you’ll have:

$ ls -lh
-rw-r--r--  vec.h           # Your library (150-200 lines)
-rw-r--r--  vec_demo.c      # Demonstration program
-rw-r--r--  vec_test.c      # Unit tests
-rw-r--r--  Makefile        # Build configuration

The Core Question You’re Answering

The fundamental question this project answers:

“How do you design a data structure so that it’s IMPOSSIBLE to use incorrectly—or at least OBVIOUS when you do?”

This breaks down into three sub-questions:

  1. “What must always be true?” (Invariants)
    • How do you formally define “valid state”?
    • What are the relationships between fields that must never break?
  2. “Who is responsible for what?” (Ownership)
    • Who allocates the memory?
    • Who frees it?
    • What happens if I copy a pointer—do I now share ownership?
  3. “When is a pointer safe to use?” (Validity windows)
    • After vec_push(), is my old pointer to vec->data still valid?
    • How do you document these lifetime rules?

The “Aha!” moment you’re chasing:

Halfway through this project, you’ll have a bug. You’ll push an element, your capacity will grow, and suddenly your pointer to an old element will be reading garbage. That’s when invariants stop being “comments in the header” and become “laws of physics your code must obey.”

Concepts You Must Understand First

Before writing a single line of code, research these topics:

Topic What You Must Know Book Reference
Pointer Arithmetic Why ptr + 1 moves by sizeof(*ptr), not 1 byte “C Programming: A Modern Approach” by K.N. King - Chapter 12
malloc/realloc/free How realloc can move memory, what happens to old pointers “Effective C” by Robert Seacord - Chapter 6.2
Dangling Pointers What makes a pointer “dangling” vs “valid” “Expert C Programming” by Peter van der Linden - Chapter 4
What is an Invariant? A property that must be true at well-defined program points “C Interfaces and Implementations” by David Hanson - Chapter 1.3
Using assert() How to write assertions, when they’re checked, -DNDEBUG flag “Writing Solid Code” by Steve Maguire - Chapter 2
Amortized O(1) Why doubling capacity makes push O(1) on average “Introduction to Algorithms” by Cormen et al. - Chapter 17.4

Pre-coding checklist:

  • What does realloc(ptr, new_size) return if it fails? (Answer: NULL, and old pointer is still valid)
  • If capacity is 8 and I push the 9th element, what should the new capacity be? (Answer: 16, using doubling)
  • After vec_push(&v, x), is int *old_ptr = &v.data[0] still safe? (Answer: NO, realloc may have moved data)

Questions to Guide Your Design

Answer these BEFORE implementing each function:

  1. For vec_init(): Should initial capacity be zero or non-zero? What does capacity=0 mean for the data pointer?
  2. For vec_push(): When capacity is full, by how much should you grow? What if realloc fails?
  3. For vec_get() and vec_set(): What happens on out-of-bounds access? Assert? Return error?
  4. For vec_remove(): When removing from the middle, do you shift elements left or swap with the last?
  5. For vec_destroy(): After destroy, should vec->data be NULL? Is it safe to call twice?

Thinking Exercise

Trace Memory By Hand (DO THIS BEFORE CODING):

vec_t v;
vec_init(&v, 2);  // capacity=2

int *p0 = &v.data[0];
vec_push(&v, 10);  // v.data[0] = 10
vec_push(&v, 20);  // v.data[1] = 20

int *p1 = &v.data[1];
vec_push(&v, 30);  // Triggers realloc!

printf("%d\n", *p0);  // What happens?
printf("%d\n", *p1);  // What happens?

Draw the memory layout on paper for each step. Circle where p0 and p1 become dangling after realloc.

The Interview Questions They’ll Ask

  1. “What is the difference between &x and x?”
  2. “You have a dynamic array that starts at capacity 1 and doubles each time. You push n elements. What’s the total cost of all reallocations?”
    • Expected: O(n) total, O(1) amortized per push
  3. “After calling vec_push(), why might pointers to elements become invalid?”
  4. “Your vector implementation passes all tests but Valgrind shows a leak. Where would you look first?”
  5. “Should vec_get() return a pointer to the element or a copy? Justify your choice.”

Hints in Layers

Hint 1: Starting Structure

typedef struct {
    int *data;        // Pointer to heap-allocated array
    size_t length;    // Number of elements currently stored
    size_t capacity;  // Total slots allocated
} vec_t;

Hint 2: The Push Operation

void vec_push(vec_t *vec, int value) {
    if (vec->length >= vec->capacity) {
        size_t new_capacity = (vec->capacity == 0) ? 1 : vec->capacity * 2;
        int *new_data = realloc(vec->data, new_capacity * sizeof(int));
        if (new_data == NULL) abort();  // Handle failure
        vec->data = new_data;
        vec->capacity = new_capacity;
    }
    vec->data[vec->length] = value;
    vec->length++;
}

Hint 3: Common Bugs

  • Forgetting capacity == 0 case (leads to infinite loop with 0 * 2 = 0)
  • Not checking if realloc failed
  • Updating vec->capacity before checking realloc success

Books That Will Help

Topic Book Chapter
Dynamic Arrays “C Programming: A Modern Approach” by K.N. King Chapter 17
Invariants & Contracts “C Interfaces and Implementations” by David Hanson Chapter 1
Assertions “Writing Solid Code” by Steve Maguire Chapter 2
Memory Management “Effective C” by Robert Seacord Chapter 6
Amortized Analysis “Introduction to Algorithms” by Cormen et al. Chapter 17.4
Defensive Programming “The Practice of Programming” by Kernighan & Pike Chapter 5

Project 2: Text Editor Buffer (Gap Buffer Implementation)

  • File: SPRINT_2_DATA_AND_INVARIANTS_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Text Editing / Data Structures
  • Software or Tool: Gap Buffer
  • Main Book: “The Craft of Text Editing” by Craig Finseth

What you’ll build: The core buffer data structure used by real text editors—a gap buffer that efficiently handles insertions and deletions at the cursor position.

Why it teaches Data & Invariants: A gap buffer has one of the most elegant invariant sets in computer science:

[text before cursor][.....gap.....][text after cursor]

The “gap” is uninitialized memory that moves with the cursor. You must track:

  • Where the gap starts
  • Where the gap ends
  • What’s “real text” vs “garbage in the gap”
  • When the buffer needs to grow

One wrong index and you’re reading garbage or corrupting memory.

Core challenges you’ll face:

  • Representing “absence” in the gap: The gap contains invalid data—how do you ensure you never read it? (maps to: sentinel values, representing absence safely)
  • Moving the gap: Cursor moves require memcpy/memmove—aliasing errors lurk everywhere (maps to: aliasing dangers, deep vs shallow copy)
  • Struct layout for performance: Gap buffer fields must be ordered to minimize cache misses (maps to: struct layout, padding, alignment)
  • Documenting the invariant: Your header comment must precisely specify what’s valid (maps to: documenting invariants)
  • Debugging corrupted buffers: When invariants break, you need to dump state intelligibly (maps to: debugging broken invariants)

Key Concepts:

  • Gap buffer algorithm: “The Craft of Text Editing” by Craig Finseth - Chapter 6 (available free online)
  • Memory movement and aliasing: “C Programming: A Modern Approach” by K.N. King - Chapter 17 (memcpy vs memmove)
  • Struct layout and padding: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Chapter 3.9
  • Defensive programming: “The Practice of Programming” by Kernighan & Pike - Chapter 5 (Debugging)

Difficulty: Intermediate Time estimate: 1-2 weeks Prerequisites: Project 1 (or equivalent dynamic array understanding)

Real world outcome:

  • A terminal-based mini text editor where you can:
    • Type characters (inserted at cursor)
    • Move cursor with arrow keys
    • Delete with backspace/delete
    • Load and save files
  • Visible proof it works: open a 10MB log file, jump to middle, insert text—it should be instant (O(1) insertion)
  • A debug command that prints: "Buffer: [Hello | | World] gap_start=6 gap_end=11 total=16"

Learning milestones:

  1. After basic insert/delete - You understand why gap buffers beat naive arrays for editing
  2. After implementing gap movement - You’ve debugged your first aliasing bug (guaranteed)
  3. After adding file I/O - You understand ownership transfer (file → buffer → file)
  4. After handling edge cases - You appreciate why “empty” and “full” are the hardest states

Real World Outcome (Expanded)

What you’ll see when it works:

You’ll build a terminal-based text editor called gapedit that behaves like a miniature Emacs. Here’s exactly what happens when you run it:

$ ./gapedit myfile.txt
GapEdit v1.0 - Gap Buffer Text Editor
Loaded: myfile.txt (1,247 bytes)
Buffer allocated: 2048 bytes (gap: 801 bytes)
----------------------------------------
Hello, World!
This is a test file.
█
----------------------------------------
[Ln 2, Col 1] | GAP: 801 bytes | BUF: 2048 bytes | Ctrl+D=debug, Ctrl+Q=quit

Press Ctrl+D (debug mode) to see the gap structure:

========== GAP BUFFER DEBUG VIEW ==========
Total buffer size: 2048
Gap start: 41
Gap end: 842
Gap size: 801 bytes
Text before gap: 41 bytes
Text after gap: 205 bytes

Memory layout (visual):
[Hello, World!\nThis is a test file.\n|........gap.........|remaining text]
                                     ^gap_start          ^gap_end

Invariants check:
✓ gap_start <= gap_end
✓ gap_end <= total_size
✓ buffer != NULL
✓ gap is at cursor position
==========================================

Performance comparison mode (Ctrl+P):

========== PERFORMANCE TEST ==========
Test: Insert 1000 characters at same position

Gap Buffer:     0.0023 seconds
Naive Array:    2.4521 seconds (1065x slower!)

Test: Random inserts at cursor position
Gap Buffer:     O(1) per insert
Naive Array:    O(n) per insert (requires shifting)
======================================

The Core Question You’re Answering

Main Question:

“How do you make text insertion O(1) when arrays require O(n) shifting, without using a linked list of characters?”

The Deep Insight: Text editing has a locality property—99% of edits happen at or near the cursor position. A gap buffer says: “Keep a ‘hole’ where the cursor is. Insertions just fill the hole.”

The Invariant That Makes It Work:

buffer = [prefix text][........gap........][suffix text]
                      ^                    ^
                  gap_start            gap_end

The guarantee:

  • buffer[0..gap_start-1] contains text before cursor
  • buffer[gap_start..gap_end-1] contains GARBAGE (uninitialized memory)
  • buffer[gap_end..total_size-1] contains text after cursor
  • The cursor is ALWAYS at gap_start

Concepts You Must Understand First

Concept What to Study Book Reference
Gap Buffer Algorithm Conceptual understanding of the data structure “The Craft of Text Editing” by Craig Finseth - Chapter 6 (free online)
memcpy vs memmove Why memcpy is dangerous when source and destination overlap “C Programming: A Modern Approach” by K.N. King - Chapter 17.7
Struct Layout Understanding struct padding and cache implications “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Chapter 3.9
Representing Absence The gap contains undefined data—never read it “Effective C” by Robert Seacord - Chapter 2.5
Pointer Arithmetic Safe bounds checking with gap boundaries “C Programming: A Modern Approach” by K.N. King - Chapter 12

Critical safety rules:

  • gap_start <= gap_end ALWAYS
  • gap_end <= total_size ALWAYS
  • NEVER read buffer[gap_start] when gap_start < gap_end

Questions to Guide Your Design

  1. Struct Design: What fields do you need? (buffer, gap_start, gap_end, total_size)
  2. Gap Movement: When cursor moves left, which byte moves from where to where?
  3. Insertion: What happens when the gap is full (gap_start == gap_end)?
  4. Deletion: Backspace vs Delete—what’s the difference in gap manipulation?
  5. Edge Cases: What does an empty buffer look like? A full buffer?
  6. File I/O: When saving, how do you skip the gap?

Thinking Exercise

Exercise: Manual Gap Buffer Trace

Start with this buffer (12 bytes total, gap is 5 bytes):

[H][e][l][ ][ ][ ][ ][ ][W][o][r][d]
 0  1  2  3  4  5  6  7  8  9  10 11
       ^gap_start=3      ^gap_end=8

Perform these operations by hand:

  1. Insert ‘l’ at cursor → gap_start=?, gap_end=?
  2. Insert ‘o’ at cursor
  3. Move cursor right 1 position (which byte moves?)
  4. Delete (not backspace)

The Interview Questions They’ll Ask

  1. “Explain how a gap buffer works in 30 seconds.”
  2. “What’s the time complexity of inserting a character at the cursor in a gap buffer?”
  3. “Why can’t you use memcpy to move the gap?”
  4. “What are the invariants that must hold for a gap buffer to be valid?”
  5. “When would a gap buffer be worse than a linked list of lines?”
  6. “You’re at the end of a 1MB file. You press Home. What’s the time complexity?”

Hints in Layers

Hint 1: Core Structure

struct gap_buffer {
    char *buffer;      // The actual text storage
    size_t gap_start;  // Start of gap (cursor position)
    size_t gap_end;    // End of gap (one past gap)
    size_t total_size; // Total allocated size
};

Hint 2: Moving Gap Right

void move_gap_right(struct gap_buffer *buf) {
    if (buf->gap_end >= buf->total_size) return;
    buf->buffer[buf->gap_start] = buf->buffer[buf->gap_end];
    buf->gap_start++;
    buf->gap_end++;
}

Hint 3: Growing the Buffer When gap is full, use realloc and move text after gap to end of new buffer.

Books That Will Help

Topic Book Chapter
Gap Buffer Algorithm “The Craft of Text Editing” by Craig Finseth Chapter 6 (free online)
Memory Movement “C Programming: A Modern Approach” by K.N. King Chapter 17.7
Struct Layout “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron Chapter 3.9
Defensive Programming “The Practice of Programming” by Kernighan & Pike Chapter 5
Terminal I/O “The Linux Programming Interface” by Michael Kerrisk Chapter 62

Project 3: Memory Arena Allocator

  • File: SPRINT_2_DATA_AND_INVARIANTS_PROJECTS.md
  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig, C++
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: Level 1: The “Resume Gold”
  • Difficulty: Level 3: Advanced (The Engineer)
  • Knowledge Area: Memory Management, Data Invariants
  • Software or Tool: Custom Memory Allocator
  • Main Book: C Interfaces and Implementations by David Hanson

What you’ll build: A custom memory allocator that allocates from a fixed pool and frees everything at once—the pattern used in game engines, compilers, and parsers.

Why it teaches Data & Invariants: This is ownership crystallized into code. An arena says:

“I own all memory allocated from me. When I die, it all dies.”

You must track:

  • What’s allocated vs available
  • Alignment requirements for different types
  • When pointers into the arena become invalid
  • The “high water mark” of usage

Core challenges you’ll face:

  • Alignment enforcement: Allocating a double at an odd address crashes on some CPUs (maps to: struct layout, padding, alignment)
  • Defining arena lifetime: All pointers die when arena_reset() is called—how do you prevent use-after-reset? (maps to: ownership, when pointers are valid)
  • No individual free: Arenas don’t free individual allocations—this constraint IS the invariant (maps to: invariants as design constraints)
  • Nested/child arenas: Can an arena allocate from another arena? Ownership gets complex fast (maps to: ownership transfer)
  • Detecting overflow: What happens when the arena is full? (maps to: asserting invariants at runtime)

Key Concepts:

  • Arena allocation pattern: “Handmade Hero” by Casey Muratori - Day 5-8 (videos, search “handmade hero memory arena”)
  • Alignment requirements: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Chapter 3.9.3
  • Memory allocator design: “C Interfaces and Implementations” by David Hanson - Chapter 5 (Arena)
  • Ownership semantics: “Effective C” by Robert Seacord - Chapter 6

Difficulty: Intermediate-Advanced Time estimate: 1 week Prerequisites: Solid understanding of malloc/free, pointer arithmetic

Real world outcome:

  • An arena.h library with: arena_create(), arena_alloc(), arena_reset(), arena_destroy()
  • A demonstration program that:
    • Parses a 1MB JSON-like file using ONLY arena allocation
    • Prints memory stats: "Parsed 50,000 nodes using 2.3MB arena (0 malloc calls during parse)"
    • Resets arena and parses again (proving reset works)
  • Valgrind showing exactly 2 allocations total (arena create + backing memory)

Learning milestones:

  1. After basic arena works - You understand why games/parsers use arenas (bulk free is FAST)
  2. After handling alignment - You’ve read your CPU’s alignment requirements and understood why
  3. After integrating with another project - You see how ownership models simplify entire codebases
  4. After benchmarking vs malloc - You have numbers showing arena allocation is 10-100x faster

Real World Outcome (Expanded)

What you will see when this project is complete:

$ ./arena_demo
=== Memory Arena Allocator Demo ===

[Test 1: Basic Allocation]
Creating arena with 1MB capacity...
Arena created: base=0x7f8a4c000000 size=1048576 offset=0

Allocating 100 integers (400 bytes)...
Arena stats: allocated=400 available=1048176 fragmentation=0.00%
Allocated at: 0x7f8a4c000000 (aligned to 8 bytes)

Allocating 50 doubles (400 bytes)...
Arena stats: allocated=800 available=1047776 fragmentation=0.00%
Allocated at: 0x7f8a4c000190 (aligned to 8 bytes)

[Test 2: JSON Parser Simulation]
Parsing JSON file (1.2 MB)...
- Allocated 50,000 nodes (40 bytes each) = 2,000,000 bytes
- Allocated 42,000 strings (avg 28 bytes) = 1,176,000 bytes
- Total arena usage: 3,176,000 bytes
- Parse time: 12.3ms
- Number of malloc calls during parse: 0

Resetting arena...
Arena stats: allocated=0 available=4194304 (all memory reclaimed)

[Test 3: Alignment Verification]
Testing alignment for different types:
  char:     allocated at 0x7f8a4c000000 (alignment=1) ✓
  short:    allocated at 0x7f8a4c000002 (alignment=2) ✓
  int:      allocated at 0x7f8a4c000004 (alignment=4) ✓
  double:   allocated at 0x7f8a4c000010 (alignment=8)[Test 4: Arena vs Malloc Benchmark]
Allocating 100,000 small objects (16 bytes each):

Using malloc/free:
  Allocation time: 45.2ms
  Deallocation time: 38.7ms
  Total time: 83.9ms

Using arena:
  Allocation time: 1.3ms
  Reset time: 0.02ms
  Total time: 1.32ms
  Speedup: 63.6x faster

[Memory Leak Check]
$ valgrind --leak-check=full ./arena_demo
==12345== All heap blocks were freed -- no leaks are possible

The Core Question You’re Answering

Primary Question:

“If I can’t call free() on individual allocations, how do I prevent memory leaks while still being efficient?”

The Answer You’ll Discover: Memory ownership doesn’t have to be about tracking individual allocations. By grouping allocations by lifetime, you can:

  1. Eliminate the entire class of “forgot to free” bugs
  2. Make deallocation O(1) instead of O(n)
  3. Prevent fragmentation entirely

Secondary Questions This Project Resolves:

  1. “Why does alignment matter?”
    • You’ll see programs crash on some architectures when you ignore alignment
    • You’ll understand why malloc() always returns maximally-aligned pointers
  2. “What does ‘lifetime’ mean in practical terms?”
    • Request handling: allocate during parsing, free when response sent
    • Frame allocation in games: allocate per frame, reset every 16ms

Concepts You Must Understand First

Concept What to Research Book Reference
Memory Alignment Why CPUs require certain types at certain addresses “Computer Systems: A Programmer’s Perspective” - Section 3.9.3
Pointer Arithmetic Integer-pointer conversions (uintptr_t), alignment checking via bitwise AND “C Programming: A Modern Approach” by K.N. King - Chapter 17.4
Ownership & Lifetime RAII pattern, scope-based resource management “C Interfaces and Implementations” by David Hanson - Chapter 5
Bump Allocator Simplest form of arena—just increment a pointer Handmade Hero videos - Day 5-8

Write down your invariants BEFORE coding:

INV1: 0 <= offset <= capacity
INV2: offset % align == 0 (where align is max alignment)
INV3: All returned pointers are in range [base, base+offset)
INV4: After reset, offset == 0
INV5: base is aligned to _Alignof(max_align_t)

Questions to Guide Your Design

  1. What operations should the arena support? (Create, alloc, reset, destroy—anything else?)
  2. What should happen when the arena is full? (Return NULL? Abort? Grow?)
  3. How should alignment be specified? (Infer from type? Explicit parameter?)
  4. How do you handle alignment on allocation?
    size_t aligned_offset = (arena->offset + alignment - 1) & ~(alignment - 1);
    
  5. What should arena_reset() do? (Just set offset = 0? Poison memory in debug mode?)

Thinking Exercise

Exercise: Manual Arena Simulation

Given this sequence of operations, draw the memory layout after each step:

Arena arena;
arena_init(&arena, 64);  // 64 bytes total

char *p1 = arena_alloc(&arena, 3, 1);  // 3 bytes, align=1
int  *p2 = arena_alloc(&arena, 4, 4);  // 4 bytes, align=4
char *p3 = arena_alloc(&arena, 5, 1);  // 5 bytes, align=1
double *p4 = arena_alloc(&arena, 8, 8); // 8 bytes, align=8

Questions:

  1. What is the offset after each allocation?
  2. Where is padding inserted, and why?
  3. How much memory is “wasted” to alignment?

The Interview Questions They’ll Ask

  1. “Explain how an arena allocator works in 30 seconds.”
  2. “Why are arenas faster than malloc?”
  3. “When would you NOT use an arena allocator?”
  4. “How do you handle alignment in an arena allocator?”
    • Answer: (offset + align - 1) & ~(align - 1)
  5. “What happens when an arena runs out of space?”
  6. “How do you debug use-after-reset bugs?”

Hints in Layers

Hint 1: Core Structure

typedef struct Arena {
    char *base;      // Start of memory
    size_t capacity; // Total size
    size_t offset;   // Current position
} Arena;

Hint 2: Alignment in Code

void* arena_alloc(Arena *arena, size_t size, size_t alignment) {
    size_t aligned_offset = (arena->offset + alignment - 1) & ~(alignment - 1);
    if (aligned_offset + size > arena->capacity) return NULL;
    void *ptr = arena->base + aligned_offset;
    arena->offset = aligned_offset + size;
    return ptr;
}

Hint 3: Type-Safe Macro

#define arena_alloc_type(arena, T) \
    (T*)arena_alloc(arena, sizeof(T), _Alignof(T))

Books That Will Help

Topic Book Chapter
Alignment Fundamentals “Computer Systems: A Programmer’s Perspective” Section 3.9.3
Arena Allocator Design “C Interfaces and Implementations” by David Hanson Chapter 5
Pointer Arithmetic “C Programming: A Modern Approach” by K.N. King Chapter 17.4
Memory Ownership “Effective C” by Robert Seacord Chapter 6
Practical Implementation Handmade Hero by Casey Muratori Day 5-8 (videos)

Project 4: JSON Parser with Explicit Ownership

  • File: SPRINT_2_DATA_AND_INVARIANTS_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Parsing / Data Structures
  • Software or Tool: JSON
  • Main Book: “Crafting Interpreters” by Bob Nystrom

What you’ll build: A JSON parser that produces a tree structure with clear, documented ownership semantics—who owns each string, each array, each object.

Why it teaches Data & Invariants: JSON parsing is a gauntlet of ownership decisions:

  • Does the parsed tree own the strings, or point into the source?
  • Who frees nested objects?
  • What happens to the tree if parsing fails halfway?
  • How do you represent null vs “key not present”?

Every node in your tree has an invariant. Every edge represents an ownership relationship.

Core challenges you’ll face:

  • String ownership strategies: Copy strings (safe, slow) or point into source (fast, dangerous)? (maps to: deep vs shallow copy, ownership)
  • Representing JSON null vs absent: null is a value; missing key is different—how do you distinguish? (maps to: representing absence safely, sentinel values)
  • Array and object children: Your dynamic array and string buffer from Project 1 become real (maps to: data structures)
  • Partial parse cleanup: Parser hits error at byte 50,000—you must free 1,000 already-allocated nodes (maps to: ownership, when to free)
  • Recursive structure invariants: json_value contains json_array contains json_value… (maps to: pointers in structs, validity)

Key Concepts:

  • Recursive descent parsing: “Crafting Interpreters” by Robert Nystrom - Chapter 6 (free online)
  • Ownership in tree structures: “C Interfaces and Implementations” by David Hanson - Chapter 2 (Lists) and Chapter 3 (Tables)
  • String interning patterns: “The Practice of Programming” by Kernighan & Pike - Chapter 2 (Algorithms)
  • Error handling with cleanup: “Effective C” by Robert Seacord - Chapter 6.6 (Flexible Array Members and Cleanup)

Difficulty: Intermediate-Advanced Time estimate: 2 weeks Prerequisites: Projects 1 & 3 (dynamic array and arena), basic parsing concepts

Real world outcome:

  • A json.h library that parses JSON from string or file
  • A CLI tool: ./jsonquery file.json '.users[0].name' → prints "Alice"
  • Demonstrates ownership by:
    • Parsing a 5MB JSON file
    • Printing: "Parsed 125,000 values (42,000 strings, 3,000 arrays, 80,000 primitives)"
    • Freeing everything with zero leaks
  • Bonus: pretty-print the parsed JSON back out (proving you understood the structure)

Learning milestones:

  1. After parsing primitives - You’ve decided on your null/absent representation
  2. After parsing nested structures - You’ve hit your first “who frees this?” bug
  3. After handling parse errors - You understand why cleanup code is half the codebase
  4. After optimizing with arena - You see how ownership models compose

Real World Outcome (Expanded)

What you’ll see when this project is complete:

$ ./jsonquery data.json
Successfully parsed JSON in 0.34ms
Tree structure:
  - Root: Object (4 keys)
  - Total nodes: 247
  - Strings: 89 (12.4 KB)
  - Arrays: 12
  - Objects: 18
  - Primitives: 128
  - Memory used: 45.8 KB (arena-based, 0 individual allocations)

$ ./jsonquery data.json '.users[0].name'
"Alice Johnson"

$ ./jsonquery data.json '.users[].age' --format=csv
28
34
22

$ ./jsonquery input.json --pretty-print
{
  "users": [
    {
      "name": "Alice",
      "age": 28,
      "active": true
    }
  ]
}

Memory and Ownership Visualization:

$ ./jsonquery large.json --debug-ownership
Parse tree:
├─ Root object owns: 8 child values
│   ├─ Key "users" → Array (3 elements)
│   │   ├─ users[0] → Object (owns 4 strings)
│   │   │   └─ All strings are REFERENCES to intern table
│   │   ├─ users[1] → Object (owns 4 strings)
│   │   └─ users[2] → Object (owns 4 strings)
│   └─ ...

Cleanup strategy: Single arena_destroy() frees entire tree

The Core Question You’re Answering

“How do you build a recursive tree structure in C where every node clearly owns its children, strings have explicit lifetime semantics, and cleanup is guaranteed even on partial parse failures?”

This expands to:

  1. Ownership in Trees: When object A contains array B which contains object C, who frees C?
  2. String Ownership Trade-offs: Should the tree own copies (safe but slow), or point into source (fast but dangerous)?
  3. Representing Absence vs Null: How do you distinguish JSON null from “key doesn’t exist”?
  4. Partial Parse Cleanup: If parsing fails at token 5,000, how do you free the 1,200 nodes already allocated?

Concepts You Must Understand First

Concept What You Must Learn Book Reference
Recursive Descent Parsing How to parse nested structures with recursive functions “Crafting Interpreters” by Bob Nystrom - Chapter 6
Ownership Semantics Deep vs shallow copy, ownership transfer “Effective C” by Robert Seacord - Chapter 6
Union Types in C Tagged unions to represent multiple value types “C Programming: A Modern Approach” by K.N. King - Chapter 16.4
String Interning Deduplication where identical strings share one allocation “The Practice of Programming” - Chapter 2.9
Error Propagation How to propagate errors up the call stack without exceptions “Effective C” by Robert Seacord - Chapter 2.6

Questions to Guide Your Design

  1. Data Structure: How will you represent a JSON value? (Tagged union? Multiple structs?)
  2. String Ownership: Copy strings (needs char* str), or point into source buffer?
  3. Arrays: Dynamic array of json_value*? Who frees the array vs the elements?
  4. Objects: Array of {key, value} pairs? Hash table?
  5. Parse Errors: Return NULL? Return error code? What about partial cleanup?
  6. Arena Integration: All memory from arena—cleanup is arena_destroy()?

Thinking Exercise

Trace Ownership Through a Parse:

Given this JSON:

{
  "name": "Alice",
  "tags": ["admin", "user"]
}

Draw a memory diagram showing:

  1. The source string in memory
  2. The parsed tree with boxes for each node
  3. Ownership arrows: who owns what

Count allocations:

  • Deep copy mode: How many malloc calls?
  • Shallow copy mode: How many malloc calls?
  • Arena + intern mode: How many malloc calls?

The Interview Questions They’ll Ask

  1. “Explain the difference between deep copy and shallow copy in the context of a JSON parser.”
  2. “How would you detect and prevent stack overflow in a recursive descent parser?”
  3. “How do you represent ‘null’ vs ‘key not found’ in C?”
  4. “Write a function json_get_string(json_value* v) that safely extracts a string, returning NULL if not a string.”
  5. “Implement json_free(json_value* root) that recursively frees a JSON tree without leaking or double-freeing.”

Hints in Layers

Hint 1: Tagged Union Structure

typedef enum {
    JSON_NULL, JSON_BOOL, JSON_NUMBER, JSON_STRING, JSON_ARRAY, JSON_OBJECT
} json_type;

typedef struct json_value {
    json_type type;
    union {
        bool bool_val;
        double number_val;
        struct { char* str; size_t len; } string_val;
        struct { struct json_value** elements; size_t count; } array_val;
        struct { struct json_pair* pairs; size_t count; } object_val;
    } value;
} json_value;

Hint 2: Parsing Strategy

json_value* parse_value(const char** cursor) {
    skip_whitespace(cursor);
    char c = **cursor;
    if (c == '{') return parse_object(cursor);
    if (c == '[') return parse_array(cursor);
    if (c == '"') return parse_string(cursor);
    if (c == 't' || c == 'f') return parse_bool(cursor);
    if (c == 'n') return parse_null(cursor);
    if (c == '-' || isdigit(c)) return parse_number(cursor);
    return NULL;  // Parse error
}

Hint 3: Arena + Error Handling With arena allocation, partial parse cleanup is automatic—just destroy the arena.

Books That Will Help

Topic Book Chapter
Recursive Descent Parsing “Crafting Interpreters” by Bob Nystrom Chapter 6 (free online)
JSON Specification ECMA-404 Standard Full document (9 pages)
Ownership in Trees “C Interfaces and Implementations” by David Hanson Chapter 2-3
Tagged Unions “C Programming: A Modern Approach” by K.N. King Chapter 16.4
Error Handling “Effective C” by Robert Seacord Chapter 2.6

Project 5: Simple Linked List Database

  • File: SPRINT_2_DATA_AND_INVARIANTS_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Data Structures / Databases
  • Software or Tool: Linked List / Indexing
  • Main Book: “Algorithms in C” by Robert Sedgewick

What you’ll build: An in-memory database storing records as linked list nodes with indexes (also linked lists), supporting insert, delete, find, and iteration—with bulletproof invariant checking.

Why it teaches Data & Invariants: Linked lists get a bad reputation because people implement them without invariants. A proper linked list has:

  • head->prev == NULL
  • tail->next == NULL
  • For all nodes: node->next->prev == node (if next exists)
  • Count matches actual traversal length

Your database adds more: index consistency, no dangling references, transaction-like semantics.

Core challenges you’ll face:

  • Doubly-linked list invariants: The circular relationships must stay consistent through every operation (maps to: invariants, asserting at runtime)
  • Index synchronization: Delete a record → must update all indexes pointing to it (maps to: aliasing, pointers inside structs)
  • Iterator invalidation: What happens if you delete while iterating? (maps to: when pointers are valid)
  • Sentinel nodes: Should head/tail be real nodes or sentinels? Each has invariant implications (maps to: sentinel values)
  • Serialization: Save to disk → who owns the loaded data? (maps to: ownership transfer)

Key Concepts:

  • Linked list with invariants: “Algorithms in C” by Robert Sedgewick - Chapter 3 (Elementary Data Structures)
  • Doubly-linked invariants: “Introduction to Algorithms” by Cormen et al. - Chapter 10.2 (Linked Lists)
  • Database record structures: “Database Management Systems” by Ramakrishnan - Chapter 9 (conceptual, not full read)
  • Defensive data structure design: “Writing Solid Code” by Steve Maguire - Chapter 3 (Fortify Your Subsystems)

Difficulty: Intermediate Time estimate: 1-2 weeks Prerequisites: Project 1, basic linked list concept

Real world outcome:

  • A contacts database CLI tool:
    > add "Alice" "alice@email.com" "555-1234"
    Added contact #1
    > add "Bob" "bob@email.com" "555-5678"
    Added contact #2
    > find-by-email "alice@email.com"
    #1: Alice | alice@email.com | 555-1234
    > delete 1
    Deleted contact #1
    > list
    #2: Bob | bob@email.com | 555-5678
    > save contacts.db
    Saved 1 records
    > load contacts.db
    Loaded 1 records
    
  • An --invariant-check flag that verifies all invariants after every operation (slow but proves correctness)
  • Prints: "Invariant check: ✓ list length=5, ✓ forward/backward consistency, ✓ index sync"

Learning milestones:

  1. After basic list operations - You’ve debugged a prev/next inconsistency
  2. After adding indexes - You understand why databases have “referential integrity”
  3. After adding persistence - You’ve dealt with ownership across process boundaries
  4. After running invariant checks - You trust your code because you VERIFIED it, not hoped

Real World Outcome (Expanded)

What you’ll see when this project is complete:

$ ./contactdb
ContactDB v1.0 - Type 'help' for commands
Invariant checking: ENABLED (use --fast to disable)

> add "Alice Johnson" "alice@techcorp.com" "555-0123" "Engineer"
[INVARIANT CHECK] Before insert: list_length=0 ✓ head=NULL ✓ tail=NULL ✓
[INVARIANT CHECK] After insert: list_length=1 ✓ head->prev=NULL ✓ tail->next=NULL ✓
Added record #1: Alice Johnson

> add "Bob Smith" "bob@startup.io" "555-0456" "Designer"
[INVARIANT CHECK] After insert: list_length=2 ✓ count matches traversal ✓
Added record #2: Bob Smith

> list
ID  | Name            | Email                | Phone      | Title
----|-----------------|----------------------|------------|----------
1   | Alice Johnson   | alice@techcorp.com   | 555-0123   | Engineer
2   | Bob Smith       | bob@startup.io       | 555-0456   | Designer

> check-invariants
Running comprehensive invariant check...
  ✓ Main list: length=2, forward traversal count=2, backward traversal count=2
  ✓ Head invariant: head->prev == NULL
  ✓ Tail invariant: tail->next == NULL
  ✓ Bidirectional consistency: all nodes satisfy node->next->prev == node
  ✓ Email index: 2 entries, all point to valid records
All invariants satisfied ✓

> save contacts.db
Saved 2 records to contacts.db

> exit
$ valgrind --leak-check=full ./contactdb < commands.txt
==12345== All heap blocks were freed -- no leaks are possible

The Core Question You’re Answering

The fundamental question this project answers:

“How do you make a data structure that’s impossible to corrupt?”

More specifically:

Why isn’t “it compiles and runs” enough?

  • Delete a node → forget to update prev → next delete causes crash 3 days later
  • Insert at tail → forget to update head when empty → iteration skips first element
  • Index points to deleted node → use-after-free → security vulnerability

The conceptual shift:

Before this project: “I’ll be careful and write correct code.” After this project: “I’ll write code that proves it’s correct by checking invariants.”

Concepts You Must Understand First

Concept What to Study Book Reference
Linked List Fundamentals Singly vs doubly-linked lists, node structure, basic operations “Algorithms in C” by Robert Sedgewick - Chapter 3.2-3.3
Doubly-Linked Invariants The bidirectional constraint: node->next->prev == node “Introduction to Algorithms” (CLRS) - Chapter 10.2
Invariants as Contracts Preconditions, postconditions, using assert() “Code Complete” by Steve McConnell - Chapter 8.2
Pointer Aliasing What happens when multiple pointers reference the same memory “Expert C Programming” by Peter van der Linden - Chapter 9
Iterator Invalidation What happens if you delete while iterating “Effective C” by Robert Seacord - Chapter 6.3

Pre-Project Checklist:

  • Can you draw a doubly-linked list with 3 nodes, showing all pointers?
  • Can you explain what happens if you free(node) but an index still points to it?
  • Do you know what assert() does and when to use it?

Questions to Guide Your Design

  1. Node Structure: Separate data struct, or embed data in node?
  2. Sentinel Nodes: Use NULL-terminated, or circular with sentinel?
  3. Invariants: Write them down NOW:
    • head == NULL ⟺ tail == NULL
    • If head != NULL, then head->prev == NULL
    • For all nodes n: n->next->prev == n
  4. When to Check: Before every operation? After? Only in debug?
  5. Iterator Invalidation: What if you delete the current node while iterating?
  6. Index Maintenance: When you delete a record, how do you update all indexes pointing to it?

Thinking Exercise

Exercise: Trace Delete Operations

Given a doubly-linked list with nodes Alice, Bob, Carol:

NULL ← [Alice] ↔ [Bob] ↔ [Carol] → NULL
        ↑head              ↑tail

Show the exact steps to delete Bob:

  1. Which pointers must change? (Hint: 4 pointers)
  2. In what order?
  3. When is it safe to free(bob)?
  4. What if Carol’s prev still points to freed memory?

The Interview Questions They’ll Ask

  1. “What invariants must hold for a doubly-linked list to be valid?”
  2. “What’s wrong with this code?”
    for (Node *n = list->head; n != NULL; n = n->next) {
        if (should_delete(n)) delete_node(list, n);  // Bug!
    }
    
  3. “You have a linked list database with an email index. When you delete a record, how do you update the index safely?”
  4. “How would you serialize a linked list to disk?”
  5. “Valgrind reports ‘definitely lost: 480 bytes in 10 blocks’. How do you debug this?”

Hints in Layers

Hint 1: Node Structure

typedef struct Record {
    int id;
    char *name;
    char *email;
    char *phone;
    char *title;
    struct Record *next;
    struct Record *prev;
} Record;

typedef struct ContactList {
    Record *head;
    Record *tail;
    int count;
} ContactList;

Hint 2: Invariant Checker

bool check_invariants(ContactList *list) {
    if ((list->head == NULL) != (list->tail == NULL)) return false;
    if (list->head != NULL && list->head->prev != NULL) return false;
    if (list->tail != NULL && list->tail->next != NULL) return false;

    int forward_count = 0;
    for (Record *r = list->head; r != NULL; r = r->next) {
        forward_count++;
        if (r->next != NULL && r->next->prev != r) return false;
    }
    return forward_count == list->count;
}

Hint 3: Safe Deletion

void delete_record(ContactList *list, int id) {
    Record *target = find_by_id(list, id);
    if (!target) return;

    if (list->head == list->tail) {  // Only node
        list->head = list->tail = NULL;
    } else if (target == list->head) {
        list->head = target->next;
        list->head->prev = NULL;
    } else if (target == list->tail) {
        list->tail = target->prev;
        list->tail->next = NULL;
    } else {  // Middle node
        target->prev->next = target->next;
        target->next->prev = target->prev;
    }

    free(target->name); free(target->email); free(target->phone); free(target->title);
    free(target);
    list->count--;
}

Books That Will Help

Topic Book Chapter
Linked List Fundamentals “Algorithms in C” by Robert Sedgewick Chapter 3.2-3.3
Doubly-Linked Lists “Introduction to Algorithms” (CLRS) Chapter 10.2
Invariants and Contracts “Code Complete” by Steve McConnell Chapter 8.2
Using assert() “Writing Solid Code” by Steve Maguire Chapter 2
Memory Ownership “Effective C” by Robert Seacord Chapter 6
Defensive Programming “The Practice of Programming” by Kernighan & Pike Chapter 5

Project Comparison Table

Project Difficulty Time Depth of Understanding Fun Factor Reusability
Dynamic Array Library ⭐⭐ 1 week Deep on invariants, ownership ⭐⭐⭐ Used in ALL other projects
Text Editor Buffer ⭐⭐⭐ 1-2 weeks Deep on aliasing, layout ⭐⭐⭐⭐⭐ Standalone cool tool
Memory Arena ⭐⭐⭐ 1 week Deep on ownership, lifetime ⭐⭐⭐ Used in parsers, games
JSON Parser ⭐⭐⭐⭐ 2 weeks Deep on all concepts ⭐⭐⭐⭐ Practical library
Linked List Database ⭐⭐⭐ 1-2 weeks Deep on invariants, debugging ⭐⭐⭐ Good teaching tool

Recommendation Path

Based on the curriculum’s progression, here’s the optimal order:

Week 1-2: Foundation

Start with Project 1 (Dynamic Array) — This is non-negotiable. Every other project needs this, and it teaches the invariant mindset in the simplest possible context.

Week 2-3: Choose Your Path

If you want immediate satisfaction → Project 2 (Text Editor Buffer)

  • You get a working tool you’ll actually use
  • Gap buffers are beautiful and underappreciated
  • Forces you to confront aliasing in a visceral way

If you want maximum ownership understanding → Project 3 (Arena Allocator)

  • Ownership becomes crystal clear when YOU are the allocator
  • Sets you up perfectly for the JSON parser
  • Game industry essential skill

Week 3-4: Integration

Finish with Project 4 (JSON Parser) — This combines everything. Use your arena, use your dynamic array, face every ownership decision at once.

Optional Depth

Project 5 (Linked List DB) — Do this if you want to really cement invariant checking as a discipline. It’s the most “defensive programming” focused project.


Final Capstone Project: Memory-Safe HTTP Server with Request Pooling

What you’ll build: A single-threaded HTTP/1.1 server that handles multiple concurrent connections using select()/poll(), with a custom memory pool for request parsing, demonstrating every concept from this sprint in a production-relevant context.

Why this is the ultimate test: A network server is where memory bugs become security bugs. You must:

  • Parse untrusted input into owned data structures
  • Track connection state with strict invariants (partial reads, connection lifecycle)
  • Prevent buffer overflows in parsing (attackers WILL send malformed data)
  • Free resources correctly when connections close unexpectedly
  • Handle ownership of request data across parse/handle/respond phases

Core challenges you’ll face:

  • Connection state machine invariants: Each connection is in exactly one state (READING_HEADERS, READING_BODY, SENDING_RESPONSE, etc.) with strict rules about transitions
  • Request lifetime ownership: Who owns the parsed headers? When can they be freed? What if send() blocks?
  • Buffer management: Partial reads mean you have “incomplete” data—representing this safely requires fat pointers
  • Pool allocation for requests: Each request allocates from a pool; pool resets after response—arena pattern in action
  • Bounds checking on parsing: Content-Length: 999999999999 must not crash your server

Key Concepts:

  • Network programming in C: “The Sockets Networking API” (UNIX Network Programming Vol.1) by Stevens - Chapters 5-6
  • Event-driven servers: “The Linux Programming Interface” by Michael Kerrisk - Chapter 63 (Alternative I/O Models)
  • Defensive parsing: “The Tangled Web” by Michal Zalewski - Chapter 9 (HTTP protocol security)
  • State machine design: “Programming in the Large with Design Patterns” by Clements - Chapter 5

Difficulty: Advanced Time estimate: 3-4 weeks Prerequisites: All previous projects, basic understanding of sockets

Real world outcome:

  • An HTTP server that:
    $ ./httpserver 8080 ./www
    Listening on port 8080...
    
    • Serves static files from a directory
    • Handles 100+ concurrent connections
    • Survives curl, ab (Apache Bench), and manual telnet abuse
    • Prints per-request memory stats: "GET /index.html - parsed in 450 bytes pool, 0 heap allocs"
  • A fuzzer script that sends malformed requests (truncated, oversized, malformed headers) and verifies the server doesn’t crash or leak
  • Valgrind clean under load

Learning milestones:

  1. After basic request/response - You’ve parsed HTTP headers with bounds checking
  2. After handling concurrent connections - You understand state machine invariants
  3. After integrating arena allocation - You see the performance difference (no malloc per request)
  4. After surviving the fuzzer - You trust your invariant checking saved you from security bugs
  5. After Valgrind clean - You’ve built something genuinely production-quality

Real World Outcome

What You’ll Actually See When Running Your Server

When you complete this project, you’ll have built something that feels real—a production-quality HTTP server that you can interact with through your browser, command-line tools, and stress-testing utilities.

1. Server Startup

$ ./httpserver --pool-size 4096 --max-connections 128 --port 8080 ./www
[INFO] Initializing memory pools...
[INFO] Created 128 connection pools (4096 bytes each)
[INFO] Total pre-allocated memory: 512 KB
[INFO] Listening on 0.0.0.0:8080
[INFO] Document root: /Users/you/project/www
[INFO] Server ready. Press Ctrl+C to stop.

The server starts and pre-allocates all memory pools. No malloc() calls will happen during request handling.

2. Handling Normal Requests

Open another terminal and make requests:

$ curl -v http://localhost:8080/index.html

Server output shows detailed state tracking:

[CONN 0] New connection from 127.0.0.1:52341
[CONN 0] State: READING_REQUEST_LINE
[CONN 0] Read 78 bytes, buffer: 78/4096
[CONN 0] Parsed: GET /index.html HTTP/1.1
[CONN 0] State: READING_HEADERS → PROCESSING
[CONN 0] Headers parsed: 6 headers, used 312 bytes pool
[CONN 0] State: PROCESSING → SENDING_RESPONSE
[CONN 0] Serving file: ./www/index.html (1247 bytes)
[CONN 0] Sent 1247 bytes
[CONN 0] State: SENDING_RESPONSE → CLOSING
[CONN 0] Pool reset, 312 bytes freed
[CONN 0] Connection closed
[CONN 0] Total lifetime: 3ms, heap allocs: 0

Notice the state transitions, pool memory usage, and zero heap allocations.

3. Memory Statistics Output

After handling multiple requests, you can send SIGUSR1 to dump statistics:

$ kill -SIGUSR1 $(pidof httpserver)

Server output:

========== Memory Statistics ==========
Total requests served: 1,247
Total bytes received: 156,783
Total bytes sent: 4,291,034

Pool Statistics:
  Active connections: 3
  Peak connections: 47
  Pool resets: 1,247
  Average pool usage: 412 bytes/request
  Peak pool usage: 3,891 bytes (URI: /api/large_payload)
  Pool overflows: 0

Heap Statistics:
  malloc() calls during runtime: 0
  free() calls during runtime: 0
  Current heap usage: 0 bytes

Connection State Breakdown:
  READING_REQUEST_LINE: 2
  READING_HEADERS: 1
  PROCESSING: 0
  SENDING_RESPONSE: 0

Invariant Checks Passed: 47,291
Invariant Violations: 0
=======================================

The zero heap allocations during runtime proves your pool allocator works. The invariant checks show defensive programming in action.

4. Concurrent Connection Test

Use Apache Bench to stress test:

$ ab -n 10000 -c 100 http://localhost:8080/test.html

Server output (with verbose mode):

[INFO] Connection spike detected: 100 concurrent connections
[CONN 0-99] All connections in READING_REQUEST_LINE state
[CONN 12] State: READING_REQUEST_LINE → READING_HEADERS
[CONN 45] State: READING_REQUEST_LINE → READING_HEADERS
[CONN 7] State: READING_HEADERS → PROCESSING
... (fast stream of state transitions) ...
[INFO] Handled 10,000 requests in 2.341 seconds (4,272 req/sec)
[INFO] Average response time: 23.4ms
[INFO] Pool utilization: avg 387 bytes, peak 1,203 bytes
[INFO] No connection pool exhausted (max usage: 100/128)

Apache Bench results:

Concurrency Level:      100
Time taken for tests:   2.341 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      15470000 bytes
HTML transferred:       12470000 bytes
Requests per second:    4272.18 [#/sec] (mean)
Time per request:       23.410 [ms] (mean)
Time per request:       0.234 [ms] (mean, across all concurrent requests)

Zero failed requests shows your state machine handles concurrency correctly.

5. Fuzzer Output

Run your custom fuzzer (or use a tool like AFL):

$ ./fuzzer --target localhost:8080 --malformed-requests 1000

Fuzzer tries malicious payloads:

[TEST 001/1000] Oversized Content-Length
  Sending: Content-Length: 9999999999
  Server response: 413 Request Entity Too Large
  Status: ✓ PASS (server rejected)

[TEST 002/1000] Missing HTTP version
  Sending: GET /index.html\r\n\r\n
  Server response: 400 Bad Request
  Status: ✓ PASS (server rejected)

[TEST 003/1000] Null byte in URI
  Sending: GET /test%00.html HTTP/1.1
  Server response: 400 Bad Request
  Status: ✓ PASS (server rejected)

[TEST 004/1000] Line too long (10KB URI)
  Sending: GET /AAAA...AAAA (10240 bytes)
  Server response: 414 URI Too Long
  Status: ✓ PASS (buffer overflow prevented)

[TEST 005/1000] Partial request (truncated)
  Sending: GET /index
  (connection held open)
  Server response: (timeout after 30s, connection closed)
  Status: ✓ PASS (no crash, no leak)

...

[TEST 1000/1000] Complete
  Passed: 998/1000
  Failed: 2/1000
  Server crashes: 0
  Memory leaks detected: 0

The fuzzer proves your bounds checking and invariant enforcement prevent exploits.

6. Valgrind Output

Run under Valgrind while handling requests:

$ valgrind --leak-check=full --show-leak-kinds=all ./httpserver 8080 ./www

After handling 1000 requests and shutting down gracefully:

==12345== HEAP SUMMARY:
==12345==     in use at exit: 0 bytes in 0 blocks
==12345==   total heap usage: 129 allocs, 129 frees, 532,480 bytes allocated
==12345==
==12345== All heap blocks were freed -- no leaks are possible
==12345==
==12345== For lists of detected and suppressed errors, rerun with: -s
==12345== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

The 129 allocations are:

  • 1 allocation for server socket setup
  • 128 allocations for connection pool buffers (one per max connection)
  • 0 allocations during request handling

Perfect cleanup. Every byte accounted for.

7. Manual Telnet Testing

The real test—raw protocol interaction:

$ telnet localhost 8080
Trying 127.0.0.1...
Connected to localhost.

Now type a valid request manually:

GET /index.html HTTP/1.1
Host: localhost
Connection: close

Server responds:

HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 1247
Connection: close

<!DOCTYPE html>
<html>
...
</html>
Connection closed by foreign host.

Now try malformed input:

$ telnet localhost 8080
GET /AAAA

(wait 5 seconds, type more garbage)

XXXX random garbage \x00\x01\x02

Server response:

HTTP/1.1 400 Bad Request
Content-Length: 0
Connection: close

Connection closed by foreign host.

Server output shows invariants in action:

[CONN 3] Malformed request line detected
[CONN 3] Invariant check: buffer_pos <= buffer_size ✓
[CONN 3] Invariant check: no null bytes in header ✗ FAILED
[CONN 3] Responding with 400 Bad Request
[CONN 3] Pool reset (partial parse cleanup)
[CONN 3] Connection closed safely

Even with garbage input, no crash, no leak, clean error handling.

8. Benchmark Comparison

Compare your server against a naive implementation (one that uses malloc per request):

$ ./benchmark --compare naive pooled

Output:

=== Benchmark: 10,000 requests, 50 concurrent ===

Naive Implementation (malloc per request):
  Total time: 4.782 seconds
  Requests/sec: 2,091
  Avg latency: 47.8ms
  malloc() calls: 10,000
  free() calls: 10,000

Pooled Implementation (arena per connection):
  Total time: 2.103 seconds
  Requests/sec: 4,755
  Avg latency: 21.0ms
  malloc() calls: 0
  free() calls: 0

Performance improvement: 2.27x faster
Memory efficiency: 100% fewer allocations

Your pool allocator is 2-3x faster because you eliminated malloc overhead.


The Core Question You’re Answering

“How do you write network servers in C that don’t crash, leak, or get exploited—and how do you PROVE they’re safe?”

This project answers the fundamental tension in systems programming:

The Paradox

  • C gives you speed by letting you manage memory manually
  • But manual memory management is where 70% of security bugs come from
  • And network servers are the #1 target for attackers

What You’re Proving

By completing this project, you demonstrate you can reconcile this paradox:

  1. Performance without malloc() — Using arena/pool allocation, you get malloc’s flexibility without its overhead
  2. Safety without garbage collection — Using invariants and state machines, you get Rust’s safety without Rust
  3. Correctness without prayer — Using assertions and fuzzing, you VERIFY instead of HOPE

The Deeper Question

“What is the relationship between data structure invariants and program security?”

In this server:

  • Every buffer overflow vulnerability is prevented by an invariant: bytes_read <= buffer_capacity
  • Every use-after-free is prevented by an invariant: connection_state != CLOSED || buffer_ptr == NULL
  • Every memory leak is prevented by an invariant: pool_reset_count == request_count

You’re learning that security is not a feature you add—it’s an invariant you maintain.

The Interview-Level Insight

When asked “How would you prevent buffer overflows in C?”, most developers say:

“Use strncpy instead of strcpy”

After this project, you’ll say:

“Design your data structures with invariants that make overflows impossible by construction. Then assert those invariants at every state transition. Then fuzz the boundaries.”

That’s the difference between someone who’s read about safety and someone who’s built safe systems.


Concepts You Must Understand First

Before writing a single line of code, you must research and internalize these concepts. This is not optional—skipping this step will result in you debugging for weeks.

1. HTTP/1.1 Protocol Fundamentals

What you must know:

  • The structure of an HTTP request: request line, headers, optional body
  • Valid vs invalid HTTP requests (what makes a request malformed?)
  • The difference between Content-Length and Transfer-Encoding: chunked
  • HTTP response structure: status line, headers, body

Book references:

  • “HTTP: The Definitive Guide” by Gourley & Totty - Chapters 1-3
  • “Computer Networking: A Top-Down Approach” by Kurose & Ross - Chapter 2.2 (Web and HTTP)

Research task: Open Wireshark, make a curl request, and examine the raw bytes. You should be able to identify:

47 45 54 20 2f 69 6e 64 65 78 2e 68 74 6d 6c ...
G  E  T     /  i  n  d  e  x  .  h  t  m  l ...

Critical questions to answer:

  • What bytes mark the end of headers? (Hint: \r\n\r\n)
  • What happens if Content-Length doesn’t match actual body size?
  • What’s the maximum reasonable size for a header?

2. Non-Blocking I/O and Multiplexing

What you must know:

  • The difference between blocking and non-blocking sockets
  • Why recv() might return fewer bytes than you asked for
  • How select() or poll() lets you wait on multiple sockets
  • What EAGAIN/EWOULDBLOCK means

Book references:

  • “UNIX Network Programming, Vol 1” by Stevens - Chapter 6 (I/O Multiplexing)
  • “The Linux Programming Interface” by Kerrisk - Chapter 63 (Alternative I/O Models)

Research task: Write a tiny test program:

int sock = socket(...);
fcntl(sock, F_SETFL, O_NONBLOCK);
int n = recv(sock, buf, 100, 0);
// What are the possible values of n? (positive, 0, -1)
// What errno values can happen?

Critical questions to answer:

  • If select() says a socket is readable, can recv() still block? (No, but it can return 0)
  • What does recv() returning 0 mean? (Connection closed by peer)
  • Can you call send() in a loop until all bytes are sent? (No! It might return partial writes)

3. State Machines for Connection Management

What you must know:

  • What a finite state machine (FSM) is
  • How to represent states as an enum
  • How to document legal state transitions
  • How to detect illegal transitions (invariant violations)

Book references:

  • “Design Patterns” by Gamma et al. - Chapter 5.8 (State Pattern)
  • “Programming in the Large with Design Patterns” by Clements - Chapter 5
  • “Crafting Interpreters” by Nystrom - Chapter 4 (has excellent FSM diagrams for parsing)

Research task: Draw this state machine on paper:

IDLE → READING_REQUEST → READING_HEADERS → PROCESSING → SENDING_RESPONSE → CLOSING → IDLE

For each transition, write:

  • What event triggers it? (e.g., “received \r\n\r\n”)
  • What invariants must hold? (e.g., “header_count > 0”)
  • What cleanup happens? (e.g., “reset pool”)

Critical questions to answer:

  • Can you go from READING_HEADERS directly to CLOSING? (Yes, on error)
  • What happens if you’re in SENDING_RESPONSE but send() returns EAGAIN? (Stay in same state, try again later)
  • How do you prevent state corruption when errors happen mid-transition?

4. Memory Pool/Arena Allocation

What you must know:

  • How pool allocation differs from malloc
  • Why pools are faster (no metadata, bulk deallocation)
  • When pointers into a pool become invalid
  • How to handle pool exhaustion

Book references:

  • “C Interfaces and Implementations” by Hanson - Chapter 5 (Arena)
  • “Game Engine Architecture” by Gregory - Chapter 5.2 (Memory Allocators)

Research task: Implement a tiny pool allocator:

typedef struct {
    char *base;
    size_t size;
    size_t used;
} Pool;

void* pool_alloc(Pool *p, size_t n) {
    if (p->used + n > p->size) return NULL; // Out of space
    void *ptr = p->base + p->used;
    p->used += n;
    return ptr;
}

void pool_reset(Pool *p) {
    p->used = 0; // All allocations now invalid!
}

Critical questions to answer:

  • After pool_reset(), what happens to pointers you allocated earlier? (They’re dangling!)
  • How do you prevent use-after-reset bugs? (Clear all pointers that referenced pool memory)
  • What’s the tradeoff between pool size and memory waste?

5. Buffer Management and Partial I/O

What you must know:

  • That recv() doesn’t respect message boundaries
  • How to handle a header that arrives in multiple recv() calls
  • The concept of a “read buffer” with tracked position
  • How to avoid buffer overflows when accumulating data

Book references:

  • “UNIX Network Programming, Vol 1” by Stevens - Chapter 3 (Socket Programming)
  • “Effective TCP/IP Programming” by Stevens - Tip 17 (Understanding Buffering)

Research task: Trace this scenario on paper:

  1. Client sends 150-byte request
  2. First recv() returns 80 bytes (request line + 2 headers)
  3. Second recv() returns 70 bytes (remaining headers + \r\n\r\n)

Draw the buffer state after each recv():

After recv 1: [GET /index.html...\r\nHost: local...] pos=80
After recv 2: [GET /index.html...\r\nHost: localhost\r\n\r\n] pos=150

Critical questions to answer:

  • How do you detect “end of headers” when they arrive in chunks?
  • What if the buffer is full but you haven’t seen \r\n\r\n yet?
  • How do you distinguish “incomplete data” from “malformed request”?

6. Defensive Parsing and Bounds Checking

What you must know:

  • Why parsing untrusted input is dangerous
  • How to check bounds BEFORE reading, not after
  • The difference between “length-prefixed” and “delimiter-based” parsing
  • How to prevent integer overflow in size calculations

Book references:

  • “Writing Solid Code” by Maguire - Chapter 2 (Assert Yourself)
  • “The Tangled Web” by Zalewski - Chapter 9 (HTTP Security)
  • “Secure Programming Cookbook” by Viega & Messier - Chapter 2 (Input Validation)

Research task: Find the bug:

char buf[1024];
char *end = strstr(buf, "\r\n");
int len = end - buf; // What if end is NULL?
memcpy(header, buf, len); // Overflow possible?

Fix it:

char buf[1024];
char *end = strstr(buf, "\r\n");
if (!end || end < buf) { /* error */ }
size_t len = end - buf;
if (len >= sizeof(header)) { /* error */ }
memcpy(header, buf, len);
header[len] = '\0';

Critical questions to answer:

  • What’s the difference between strncpy and memcpy for safety?
  • If a header says Content-Length: 999999999, how do you reject it before allocating?
  • How do you prevent \0 injection attacks in headers?

Questions to Guide Your Design

Answer these questions before you start coding. Write down your answers. They will guide every design decision.

About Connection State

  1. How many states can a connection be in?
    • List them all: IDLE, READING_REQUEST_LINE, READING_HEADERS, READING_BODY, PROCESSING, SENDING_RESPONSE, CLOSING
    • Are there any intermediate states you need?
  2. What data does a connection need to track?
    • Socket file descriptor?
    • Client IP address?
    • Read buffer and position?
    • Parsed request data?
    • Send buffer for response?
    • State enum value?
  3. What invariants must hold for each state? Example:
    • READING_HEADERS → buffer_pos > 0 && request_line_parsed == true
    • SENDING_RESPONSE → response_buffer != NULL && bytes_sent <= response_size
  4. How do you transition between states safely?
    • Do you need a set_state() function that enforces valid transitions?
    • Should state transitions assert preconditions?

About Memory Management

  1. Where does each piece of data live?
    • Connection struct: stack, heap, or global array?
    • Request line: pool or buffer?
    • Headers: pool or buffer?
    • Response body: pool, buffer, or read from disk?
  2. When does memory get freed?
    • After each request? (pool reset)
    • When connection closes? (free connection struct)
    • On server shutdown? (free all pools)
  3. How do you prevent use-after-free?
    • After pool reset, what pointers need to be cleared?
    • What if you’re iterating connections while one closes?
  4. What’s your maximum memory usage?
    • max_connections * pool_size_per_connection
    • Plus overhead for connection structs
    • Can you calculate this before starting?

About Parsing

  1. How do you handle partial reads?
    • Do you accumulate into a buffer?
    • How do you track “how much is valid data”?
    • What if \r\n\r\n arrives split across two recv() calls?
  2. What makes a request malformed?
    • Missing HTTP version?
    • Header without :?
    • Content-Length that’s not a number?
    • Request line > 8KB?
  3. How do you reject bad requests safely?
    • Do you send a 400 response or just close?
    • Do you log the violation?
    • Do you need to clean up partially-parsed data?
  4. How do you prevent attacks via large inputs?
    • Maximum request line length?
    • Maximum number of headers?
    • Maximum total header size?
    • Maximum Content-Length?

About Concurrency

  1. How do you handle 100 simultaneous connections?
    • Array of connection structs?
    • How do you find free slots?
    • What if all slots are full?
  2. How does select() or poll() fit in?
    • Do you rebuild the fd_set every loop?
    • How do you map from ready fd back to connection struct?
  3. What happens when you’re sending a response and new data arrives?
    • Can a connection be readable and writable at the same time?
    • Do you prioritize sending over reading?

About Error Handling

  1. What errors can happen at each step?
    • accept() can fail (out of FDs, EMFILE)
    • recv() can fail (EAGAIN, connection reset)
    • send() can fail (EAGAIN, broken pipe)
    • open() (for file serving) can fail (ENOENT)
  2. How do you handle errors without leaking?
    • If parsing fails halfway, how do you clean up?
    • If send() fails, do you close immediately or retry?
  3. How do you log errors for debugging?
    • Do you log every error?
    • Do you include connection ID, state, and context?

Thinking Exercise

Do this BEFORE writing code. This exercise will save you days of debugging.

Exercise 1: Trace a Request Lifecycle

On paper, trace every step of handling this request:

Client sends:
GET /test.html HTTP/1.1\r\n
Host: localhost\r\n
Connection: close\r\n
\r\n

File /www/test.html exists and contains 50 bytes.

Trace the server’s actions:

  1. accept() returns new socket fd 4
  2. Create connection struct for fd 4, state = READING_REQUEST_LINE
  3. Add fd 4 to select() fd_set
  4. select() returns, fd 4 is readable
  5. recv(fd 4, buf, 4096) returns 80 bytes (entire request)
  6. Parse request line: GET /test.html HTTP/1.1
    • Store in pool: method=”GET”, uri=”/test.html”, version=”HTTP/1.1”
    • State transition: READING_REQUEST_LINE → READING_HEADERS
  7. Parse headers:
    • Find \r\n\r\n at position 76
    • Parse 2 headers, store in pool
    • State transition: READING_HEADERS → PROCESSING
  8. Process request:
    • Build file path: ./www/test.html
    • Check if file exists and is readable
    • Open file, get size (50 bytes)
    • Build response header: HTTP/1.1 200 OK\r\nContent-Length: 50\r\n\r\n
    • State transition: PROCESSING → SENDING_RESPONSE
  9. Send response:
    • send(fd 4, response_header, 47) returns 47
    • send(fd 4, file_contents, 50) returns 50
    • State transition: SENDING_RESPONSE → CLOSING
  10. Close connection:
    • close(fd 4)
    • Pool reset (frees method, uri, version, headers)
    • Mark connection slot as free
    • State transition: CLOSING → IDLE

Now answer:

  • At what point does pool memory get allocated?
  • What happens if step 5 only returns 40 bytes?
  • What if the file doesn’t exist in step 8?
  • What if send() in step 9 returns EAGAIN?

Exercise 2: Draw the State Machine

Draw a state diagram with these states:

  • IDLE
  • READING_REQUEST_LINE
  • READING_HEADERS
  • READING_BODY (for POST requests)
  • PROCESSING
  • SENDING_RESPONSE
  • CLOSING

For each arrow (transition), label:

  • The event that triggers it
  • The invariant that must hold before transition
  • The action taken during transition

Example:

READING_REQUEST_LINE --[found \r\n]--> READING_HEADERS
  Pre-condition: buffer contains valid HTTP method
  Action: Parse and store request line in pool
  Post-condition: request_line != NULL

Exercise 3: Design Your Connection Struct

Write the struct definition on paper:

typedef struct {
    // TODO: What fields do you need?
    // - Socket fd?
    // - State?
    // - Read buffer?
    // - Pool?
    // - Parsed request data?
    // - Response buffer?
} Connection;

For each field, write a comment answering:

  • Who owns this? (The connection struct? External?)
  • When is it valid? (Always? Only in certain states?)
  • When is it freed? (On connection close? On pool reset?)

Exercise 4: Diagram Memory Pools

Draw three connections at different stages:

Connection 0 (SENDING_RESPONSE):
  Pool: [USED: method="GET", uri="/index.html", headers... | FREE: ...]
  Used: 412 bytes / 4096 bytes

Connection 1 (READING_HEADERS):
  Pool: [USED: method="POST", uri="/api/data" | FREE: ...]
  Used: 87 bytes / 4096 bytes

Connection 2 (IDLE):
  Pool: [ALL FREE]
  Used: 0 bytes / 4096 bytes

Now, Connection 0 finishes sending and closes. Show the pool reset:

Connection 0 (IDLE):
  Pool: [ALL FREE]
  Used: 0 bytes / 4096 bytes  <-- Reset!

Question: What happens to the pointers method, uri, headers that were in Connection 0’s pool? Answer: They’re now dangling! You must NULL them out or not reference them after reset.


The Interview Questions They’ll Ask

If you’ve built this project, you can answer these real interview questions:

Network Programming

Q1: “Explain how select() or poll() works. Why would you use it instead of threads?”

Your answer: select() lets you monitor multiple file descriptors and blocks until at least one is ready for I/O. It returns which FDs are readable/writable. You’d use it instead of threads because:

  1. Threads have overhead (stack per thread, context switching)
  2. For I/O-bound workloads, threads spend most time blocked anyway
  3. select() scales better—100 connections = 1 thread, not 100

I used it in my HTTP server to handle 128 concurrent connections in a single thread with zero context switches.

Q2: “What does recv() returning 0 mean? What about -1?”

Your answer:

  • recv() → 0 means the peer closed the connection gracefully (FIN received)
  • recv() → -1 means an error occurred; check errno:
    • EAGAIN/EWOULDBLOCK: no data available (try again later)
    • ECONNRESET: peer sent RST (connection aborted)
    • EINTR: system call interrupted by signal

In my server, I handle each case differently:

  • 0: transition to CLOSING state, clean up
  • EAGAIN: stay in current state, wait for select() to tell me when readable
  • Other errors: log error, close connection

Q3: “How would you prevent a slowloris attack?”

Your answer: Slowloris sends partial requests very slowly to tie up server connections. Defenses:

  1. Timeout per connection state: If in READING_HEADERS for >30 seconds, close
  2. Limit maximum header size: Reject if headers exceed 8KB
  3. Limit total connections per IP: Prevent single attacker from filling all slots

I implemented #1 and #2 in my server—each connection tracks last_activity timestamp and I enforce maximum buffer sizes.

State Machines

Q4: “How do you design a state machine for connection handling? What states would you have?”

Your answer: I designed a state machine with these states:

  • IDLE: Connection slot available
  • READING_REQUEST_LINE: Accumulating bytes until \r\n
  • READING_HEADERS: Accumulating headers until \r\n\r\n
  • READING_BODY: Reading POST body (if Content-Length > 0)
  • PROCESSING: Serving file / generating response
  • SENDING_RESPONSE: send() may return partial writes
  • CLOSING: Cleanup, pool reset

Each state has invariants (e.g., READING_HEADERS → request_line != NULL) and legal transitions. I enforce transitions with a set_state() function that asserts preconditions.

Q5: “What happens if you receive data in the wrong state?”

Your answer: That’s an invariant violation. For example, if we’re in SENDING_RESPONSE and select() says the socket is readable (client sent more data), that violates HTTP request/response semantics.

In my server:

  1. I check if the state allows reading (READING_* states only)
  2. If data arrives in wrong state, I log a warning and either ignore it or close the connection (depending on severity)
  3. For pipelined requests (HTTP/1.1), I’d need to handle SENDING_RESPONSE → READING_REQUEST_LINE transition

Memory Safety

Q6: “How do you prevent buffer overflows when parsing HTTP headers?”

Your answer: Multiple layers of defense:

  1. Maximum buffer size: Reject requests if headers exceed 8KB
  2. Bounds checking before memcpy: Always verify src_len + dest_pos <= dest_capacity
  3. Invariant assertions: After every parse, assert buffer_pos <= buffer_size
  4. Fuzzing: I wrote a fuzzer that sends oversized headers, null bytes, etc.

Example code:

if (bytes_read + conn->buf_pos > BUFFER_SIZE) {
    // Reject request, respond with 413
    return HTTP_REQUEST_TOO_LARGE;
}
memcpy(conn->buffer + conn->buf_pos, data, bytes_read);
conn->buf_pos += bytes_read;
assert(conn->buf_pos <= BUFFER_SIZE);

Q7: “Explain how arena/pool allocation prevents memory leaks.”

Your answer: Traditional allocation: each malloc() requires matching free(). If you forget one, leak.

Pool allocation: you allocate from a pool, then reset the entire pool at once. For HTTP servers:

  • Each connection has a pool (4KB)
  • Parse request → allocates from pool (headers, URI, etc.)
  • Send response → pool reset (frees everything at once)
  • Zero individual free() calls → zero leak opportunities

Tradeoff: after reset, all pointers into pool are invalid. You must clear them or avoid using them.

Q8: “How do you debug a use-after-free bug in this architecture?”

Your answer:

  1. Reproduction: Enable verbose logging, find which connection triggers it
  2. State tracking: Log every state transition and pool reset
  3. Invariant violations: Check if we’re accessing pool memory after reset
  4. Tools:
    • Valgrind (detects use-after-free)
    • AddressSanitizer (faster than Valgrind)
    • Manual: fill pool with 0xDEADBEEF on reset, detect if we read it

Example bug I found:

char *uri = pool_alloc(&conn->pool, uri_len);
// ... later ...
pool_reset(&conn->pool);
// ... later ...
printf("URI: %s\n", uri); // BUG! uri is now dangling

Fix: NULL out pointers after reset, or copy data out of pool before reset.

Parsing

Q9: “How do you parse HTTP headers when they arrive in multiple recv() calls?”

Your answer:

  1. Maintain a read buffer per connection
  2. Track current position in buffer
  3. On each recv(), append to buffer and update position
  4. Search for delimiter (\r\n for request line, \r\n\r\n for end of headers)
  5. If not found, keep accumulating (partial header)
  6. If found, parse and transition state

Example:

int n = recv(fd, conn->buf + conn->buf_pos, BUFFER_SIZE - conn->buf_pos, 0);
conn->buf_pos += n;

char *end = memmem(conn->buf, conn->buf_pos, "\r\n\r\n", 4);
if (end) {
    // Parse headers from conn->buf to end
    parse_headers(conn, conn->buf, end);
    conn->state = PROCESSING;
}
// Else: keep waiting for more data

Q10: “What attacks can happen during parsing? How do you prevent them?”

Your answer:

  1. Buffer overflow: Send 10MB header → overflow buffer
    • Defense: Enforce max header size, reject early
  2. Null byte injection: GET /file%00.html → path traversal
    • Defense: Reject requests with null bytes in URI
  3. Integer overflow: Content-Length: 4294967296 on 32-bit system
    • Defense: Check against reasonable max (e.g., 10MB)
  4. Slowloris: Send headers 1 byte/second → tie up connections
    • Defense: Timeout if in READING_HEADERS for >30 seconds
  5. Request smuggling: Conflicting Content-Length and Transfer-Encoding
    • Defense: Reject if both present, follow RFC 7230 strictly

Performance

Q11: “Why is pool allocation faster than malloc()?”

Your answer: malloc() has overhead:

  1. Metadata: Stores block size, free list pointers (8-16 bytes per allocation)
  2. Fragmentation: Searching for free block of right size
  3. Thread safety: Locks on every call (in multi-threaded allocators)

Pool allocation:

  1. No metadata: Just bump a pointer
  2. No search: Always allocate from current position
  3. Bulk free: Reset pointer to 0 (frees everything)

In my server, I saw 2-3x speedup vs malloc because:

  • Per request: ~10 allocations (method, uri, each header)
  • With malloc: 10 lock/unlock cycles + metadata writes
  • With pool: 10 pointer bumps, 1 reset

Q12: “How would you make this server multi-threaded?”

Your answer: Two approaches:

1. Thread-per-connection (simple, doesn’t scale):

  • accept() → spawn thread → handle request → exit thread
  • Problem: 10,000 connections = 10,000 threads (too much memory)

2. Thread pool with work queue (production-quality):

  • Main thread: select() on all connections
  • When a connection is ready, add to queue
  • Worker threads: pull from queue, process, return
  • Tradeoff: Added complexity (locks, work distribution)

For most use cases, single-threaded event loop (what I built) is fastest because HTTP is I/O-bound, not CPU-bound.


Hints in Layers

Use these hints progressively—try each layer before moving to the next.

Hint 1: Start with Single Connection

Don’t build the full concurrent server first. Start tiny:

  1. Accept ONE connection
  2. Read until you see \r\n\r\n
  3. Print what you received
  4. Send a hardcoded response: HTTP/1.1 200 OK\r\n\r\nHello
  5. Close connection

Test with:

echo -e "GET / HTTP/1.1\r\nHost: localhost\r\n\r\n" | nc localhost 8080

Why this helps: You’ll learn the recv/send mechanics without state machine complexity.

Hint 2: State Machine on Paper First

Before coding states, draw the diagram:

On paper, draw circles (states) and arrows (transitions). For each arrow, write:

  • What event triggers it
  • What gets allocated/freed
  • What invariants must hold

Example:

[READING_REQUEST_LINE] --[got \r\n]--> [READING_HEADERS]
  Event: Found \r\n in buffer
  Action: Parse "GET /index.html HTTP/1.1", allocate from pool
  Invariant after: conn->method != NULL

Why this helps: You’ll discover edge cases (what if \r\n is the first thing received? Malformed!) before writing code.

Hint 3: Pool Allocator is Just a Bump Pointer

Don’t overthink pools:

typedef struct {
    char *memory;    // Base pointer (from malloc)
    size_t capacity; // Total size
    size_t used;     // How much we've allocated
} Pool;

void* pool_alloc(Pool *p, size_t size) {
    if (p->used + size > p->capacity) {
        return NULL; // Out of memory
    }
    void *ptr = p->memory + p->used;
    p->used += size;
    return ptr;
}

void pool_reset(Pool *p) {
    p->used = 0; // That's it. All allocations now invalid.
}

Why this helps: You’ll have a working pool in 15 minutes, then you can focus on the hard part (the state machine).

Hint 4: Use a Connection Array, Not Linked List

Don’t use malloc() for each connection:

#define MAX_CONNECTIONS 128

typedef struct {
    int fd;           // -1 means slot is free
    State state;
    Pool pool;
    char buffer[4096];
    size_t buffer_pos;
    // ... other fields ...
} Connection;

Connection connections[MAX_CONNECTIONS];

Connection* find_free_connection() {
    for (int i = 0; i < MAX_CONNECTIONS; i++) {
        if (connections[i].fd == -1) {
            return &connections[i];
        }
    }
    return NULL; // All slots full
}

Why this helps: Pre-allocated array = no allocation during runtime = faster and simpler.

Hint 5: Partial Reads are Normal

If you’re getting “partial header” bugs:

Remember: recv() returns when data is AVAILABLE, not when your message is COMPLETE.

// WRONG:
char buf[4096];
recv(fd, buf, 4096, 0); // Might only return 50 bytes!
parse_headers(buf); // Fails if header incomplete

// RIGHT:
typedef struct {
    char buffer[4096];
    size_t pos; // How much is valid
} Connection;

int n = recv(fd, conn->buffer + conn->pos, 4096 - conn->pos, 0);
conn->pos += n;

// Now search for end marker in conn->buffer[0..conn->pos]
char *end = memmem(conn->buffer, conn->pos, "\r\n\r\n", 4);
if (end) {
    // Complete! Parse it.
} else {
    // Incomplete. Wait for more data.
}

Why this helps: You’ll stop expecting TCP to preserve message boundaries (it doesn’t).

Hint 6: Check Bounds Before Copying, Not After

If you’re getting segfaults in parsing:

// WRONG:
strcpy(dest, src); // Might overflow dest
if (strlen(dest) > MAX) { /* too late! */ }

// RIGHT:
size_t len = strlen(src);
if (len >= sizeof(dest)) {
    // Reject as too long
    return ERROR;
}
memcpy(dest, src, len);
dest[len] = '\0';

Why this helps: Checking BEFORE prevents the overflow. Checking AFTER is useless.

If you’re having issues finding \r\n\r\n:

// WRONG: strstr() stops at null bytes
char *end = strstr(buffer, "\r\n\r\n");

// RIGHT: memmem() searches binary data
char *end = memmem(buffer, buffer_len, "\r\n\r\n", 4);

Why this helps: HTTP headers can theoretically contain null bytes (malicious input), and your buffer might not be null-terminated yet.

Hint 8: Debugging State Transitions

If your state machine is breaking:

Add logging to EVERY state transition:

void set_state(Connection *conn, State new_state) {
    printf("[CONN %d] State: %s → %s\n",
           conn->fd,
           state_name(conn->state),
           state_name(new_state));

    // Assert valid transitions
    assert(is_valid_transition(conn->state, new_state));

    conn->state = new_state;
}

Now run your server and watch the logs. You’ll see illegal transitions immediately.

Hint 9: Fuzzer is Your Friend

If you think your parser is correct but aren’t sure:

Write a fuzzer that sends garbage:

// Oversized request line
send(sock, "GET /", 5);
for (int i = 0; i < 10000; i++) {
    send(sock, "A", 1);
}
send(sock, " HTTP/1.1\r\n\r\n", 13);

// Missing HTTP version
send(sock, "GET /index.html\r\n\r\n", 19);

// Header without colon
send(sock, "GET / HTTP/1.1\r\nBadHeader\r\n\r\n", 29);

// Null byte in URI
send(sock, "GET /test\x00.html HTTP/1.1\r\n\r\n", 27);

If any of these crash your server, you found a bug.

Hint 10: Valgrind Everything

If you’re not sure about memory safety:

valgrind --leak-check=full --track-origins=yes ./httpserver 8080 ./www

Then make requests. Valgrind will tell you:

  • Every uninitialized read
  • Every buffer overflow
  • Every memory leak

Fix until Valgrind is silent.


Books That Will Help

Topic Book Chapter What You’ll Learn
HTTP Protocol “HTTP: The Definitive Guide” by Gourley & Totty Chapters 1-3 Request/response structure, headers, status codes
HTTP Security “The Tangled Web” by Zalewski Chapter 9 Common HTTP attacks, parsing vulnerabilities
Socket Programming “UNIX Network Programming, Vol 1” by Stevens Chapters 5-6 socket(), bind(), listen(), accept(), recv(), send()
Non-Blocking I/O “UNIX Network Programming, Vol 1” by Stevens Chapter 6 select(), poll(), non-blocking sockets, EAGAIN handling
I/O Models Comparison “The Linux Programming Interface” by Kerrisk Chapter 63 When to use select vs poll vs epoll vs threads
State Machines “Crafting Interpreters” by Nystrom Chapter 4 Designing state machines for parsers
State Pattern “Design Patterns” by Gamma et al. Chapter 5.8 Object-oriented state machine design (adapt to C)
Pool Allocation “C Interfaces and Implementations” by Hanson Chapter 5 Arena allocator design and implementation
Memory Allocators “Game Engine Architecture” by Gregory Chapter 5.2 Pool, stack, and custom allocator patterns
Buffer Overflows “Secure Programming Cookbook” by Viega & Messier Chapter 2 Input validation, bounds checking, safe string ops
Defensive Programming “Writing Solid Code” by Maguire Chapters 2-3 Assertions, invariant checking, debugging techniques
Struct Layout “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron Chapter 3.9 Alignment, padding, memory layout
Invariants “Code Complete” by McConnell Chapter 8 Designing with preconditions/postconditions
Safe Coding in C “Effective C” by Seacord Chapters 5-6 Pointers, dynamic memory, ownership patterns
Debugging Techniques “The Practice of Programming” by Kernighan & Pike Chapter 5 Systematic debugging, defensive checks
HTTP/1.1 Spec RFC 7230-7235 (Free online) All Official HTTP/1.1 specification (dense but authoritative)

Week 1: Before coding

  1. Stevens - UNIX Network Programming Ch. 5-6 (sockets basics)
  2. Kerrisk - Linux Programming Interface Ch. 63 (I/O models)
  3. HTTP: Definitive Guide Ch. 1-3 (protocol fundamentals)

Week 2: During basic implementation

  1. Hanson - C Interfaces Ch. 5 (pool allocator)
  2. Nystrom - Crafting Interpreters Ch. 4 (state machines)
  3. Maguire - Writing Solid Code Ch. 2-3 (assertions, invariants)

Week 3: During parsing and security

  1. Zalewski - Tangled Web Ch. 9 (HTTP attacks)
  2. Viega - Secure Programming Cookbook Ch. 2 (input validation)
  3. Seacord - Effective C Ch. 5-6 (safe memory patterns)

Week 4: During debugging and optimization

  1. Kernighan - Practice of Programming Ch. 5 (debugging)
  2. Gregory - Game Engine Architecture Ch. 5.2 (allocator optimizations)

Free Online Resources

  • Beej’s Guide to Network Programming: Excellent practical introduction to sockets (free PDF)
  • RFC 7230 (HTTP/1.1): The actual spec, surprisingly readable
  • Handmade Hero Day 5-8: Casey Muratori implementing arena allocators (video series, free on YouTube)
  • Crafting Interpreters: Full book free online at craftinginterpreters.com

The Mental Shift You’re After

By the end of these projects, you won’t just know the syntax of C. You’ll have internalized:

Every data structure is a contract.

  • The struct definition is the vocabulary
  • The invariants are the grammar rules
  • The operations are the only legal sentences
  • Violation is not “a bug”—it’s undefined behavior

You’ll find yourself:

  • Writing invariant comments BEFORE writing code
  • Adding assert() statements instinctively
  • Asking “who owns this?” before passing a pointer
  • Feeling uncomfortable when ownership is unclear

That discomfort is the goal. It means you’ve internalized why C gives you power AND responsibility.