Sprint 2: Data and Invariants (C Systems Programming)

Goal: Build a rock-solid mental model for how data really lives in C: which bytes are valid, who owns them, which invariants must always hold, and how to prove those invariants with code and tests. By the end of this sprint you will be able to design data structures with explicit contracts, track lifetimes without a garbage collector, and debug memory failures methodically using sanitizers and instrumentation. You will ship working, documented, and testable C libraries (dynamic array, gap buffer, arena allocator, JSON parser, linked-list database) and a capstone HTTP server whose correctness depends on the invariants you defined.


Introduction

Data and invariants are the rules that keep C programs correct. C gives you direct access to raw memory, but it does not protect you. The only thing standing between your program and undefined behavior is your discipline: the explicit contracts you design, enforce, and test. This guide turns those contracts into buildable systems.

What you will build (by the end of this guide):

  • A reusable dynamic array library (vec.h) with documented invariants and validation
  • A text editor buffer using a gap buffer (Emacs-style) with correct insertion/deletion
  • A fast arena allocator with explicit ownership and alignment guarantees
  • A JSON parser with ownership rules for every value and error
  • A tiny linked-list database with safe iteration and mutation
  • A minimal HTTP/1.1 server with request pooling and robust parsing

Scope (what is included):

  • Invariants, preconditions, postconditions, and loop invariants
  • Ownership, lifetimes, and pointer validity in C
  • Memory layout, alignment, padding, and object representation
  • Allocation strategies: malloc/free/realloc, arenas, pools
  • Parsing and buffer management in memory-unsafe languages

Out of scope (for this guide):

  • Garbage collectors and managed runtimes
  • Kernel-level memory allocators
  • Full HTTP compliance beyond basic request parsing
  • Advanced concurrency (covered in later sprints)

The Big Picture (Mental Model)

           Data Structure Contracts in C

  Inputs -> Parse -> Allocate -> Mutate -> Validate -> Free
     |        |         |          |          |         |
     v        v         v          v          v         v
  (bytes)  (buffers) (ownership) (invariants) (asserts) (cleanup)

If any invariant is false at a stable point, the program is already broken.

Key Terms You Will See Everywhere

  • Invariant: A condition that must always be true at defined stable points
  • Ownership: The rule that says who is responsible for freeing a resource
  • Lifetime: The time interval during which a pointer refers to a valid object
  • Object representation: The actual bytes that store a C object in memory
  • Undefined behavior (UB): Behavior not defined by the C standard
  • Stable point: A place where invariants must hold (function boundaries, loop starts)

How to Use This Guide

  1. Read the Theory Primer first. It is the mini-book that gives you the mental models.
  2. Build projects in order for your first pass. Each project uses the invariants from earlier ones.
  3. Before coding, write down invariants. You should know what must be true before you write code.
  4. Instrument early. Use asserts, Valgrind, and AddressSanitizer from day one.
  5. Repeat with a twist. After each project, change a constraint (e.g., shrink buffer, shrink arena, malformed JSON) and verify the invariants still hold.

Prerequisites & Background Knowledge

Before starting these projects, you should have foundational understanding in these areas:

Essential Prerequisites (Must Have)

Programming Skills:

  • C syntax, control flow, functions, and structs
  • Basic pointer use and pointer arithmetic
  • Familiarity with malloc, free, and realloc
  • Comfort compiling with gcc or clang

C Fundamentals:

  • The difference between stack and heap allocations
  • What sizeof means and why it matters
  • How to read compiler warnings and treat them as errors

Recommended Reading:

  • “C Programming: A Modern Approach” by K. N. King (Ch. 11-12, 17)
  • “Effective C” by Robert C. Seacord (Ch. 4-6)

Helpful But Not Required

  • Basic debugging with gdb or lldb
  • Using Valgrind or AddressSanitizer
  • Familiarity with linked lists and arrays
  • Some exposure to parsing or state machines

Self-Assessment Questions

Before starting, ask yourself:

  1. Can you explain the difference between a pointer and the object it points to?
  2. Can you write a loop that iterates through a char * buffer without overflow?
  3. Do you understand why memcpy is unsafe for overlapping regions?
  4. Do you know what happens to pointers after realloc?
  5. Can you identify when a pointer is dangling?

If you answered “no” to 1-3: Review chapters in “C Programming: A Modern Approach” first.

Development Environment Setup

Required Tools:

  • Linux, macOS, or WSL
  • GCC or Clang (C11 or later)
  • make
  • gdb or lldb

Recommended Tools:

  • Valgrind Memcheck (memory error detection)
  • AddressSanitizer (compiler-based memory checks)
  • clang-tidy or Clang static analyzer

Testing Your Setup:

# Compiler check
gcc --version
clang --version

# Valgrind check (Linux)
valgrind --version

# AddressSanitizer quick test
cat > asan_test.c <<'C'
#include <stdlib.h>
int main(void) {
    int *p = malloc(sizeof(int));
    free(p);
    *p = 5; // use-after-free
    return 0;
}
C

clang -fsanitize=address -g asan_test.c -o asan_test
./asan_test

Valgrind Memcheck detects many memory errors at runtime, and AddressSanitizer detects out-of-bounds and use-after-free with relatively low overhead (often around ~2x runtime slowdown). These tools are part of the standard professional workflow for C and C++ codebases. See https://valgrind.org/docs/manual/quick-start.html and https://clang.llvm.org/docs/AddressSanitizer.html for details.

Time Investment

  • Project 1-2: 1-3 days each
  • Project 3-4: 2-4 days each
  • Project 5: 2-3 days
  • Capstone: 5-7 days

Total sprint: 3-6 weeks depending on experience.

Important Reality Check

These projects are designed to surface memory and invariant bugs. You will crash programs. That is the point. The goal is to understand why the crash happened and prevent it next time.


Big Picture / Mental Model

C data structures are safe only if their invariants are true at stable points.

                Stable Points (Invariants Must Hold)

   [function entry] -> [loop start] -> [after mutation] -> [function exit]
          |                 |               |                   |
          v                 v               v                   v
   Preconditions       Loop invariants   Representation     Postconditions
                                           invariants

If ANY invariant is false at a stable point, the program is already incorrect.

Think in layers:

  1. Representation layer: bytes, alignment, object representation
  2. Ownership layer: who frees what, and when
  3. Behavior layer: API contracts and invariants
  4. Validation layer: asserts, tests, sanitizers

Theory Primer

This is the mini-book. Every concept in the table below has its own section.

Chapter 1: Invariants and Contracts

Fundamentals

An invariant is a rule that must always be true at specific stable points in your program. In C, invariants replace the safety net you get from managed runtimes. They are the only guarantee that data structures stay consistent across operations. An invariant can be as simple as length <= capacity or as subtle as “every node in the list has a valid next pointer”. Every API boundary introduces a contract: the caller promises certain conditions before the call (preconditions), and the callee promises certain conditions after the call (postconditions). If you do not write these down, you will still depend on them, but you will discover them only when your program crashes.

Deep Dive into the Concept

Invariants live at stable points: function entry, function exit, and loop boundaries. Between stable points, invariants may be temporarily violated while you mutate a structure. That is acceptable only if you restore them before the next stable point. This is how you reason about complex operations in small steps. A dynamic array insert, for example, temporarily violates the invariant length <= capacity if you increment length before growing the buffer. A correct implementation ensures that the buffer is grown first, or that the final stable point restores the invariant. In code, the invariant should be enforced by structure: the order of operations, helper functions, and asserts.

Invariants can be grouped into:

  • Representation invariants: properties that must always be true for a data structure to be valid (e.g., linked list nodes form an acyclic chain; gap buffer has a contiguous gap).
  • Behavioral invariants: properties across operations (e.g., push followed by pop returns the original value).
  • Resource invariants: constraints about ownership and cleanup (e.g., every allocated pointer has exactly one owner).

Loop invariants are a special case. A loop invariant must be true at the start of each iteration. This is how you prove that a loop does not run off the end of a buffer or corrupt state. In C, failing to maintain loop invariants often results in out-of-bounds access or corrupted metadata. You should be able to state your loop invariants in a sentence. Example: “At the start of each iteration, i is in range [0, len) and all elements in out[0..i-1] are valid.” If you cannot state a loop invariant, the loop is probably not correct.

A practical way to enforce invariants is to centralize mutation. For example, if vec_push is the only function allowed to modify length, you can assert the invariant at the start and end of that function. Another approach is to create a vec_validate() function that checks invariants and can be called in debug builds. In addition, sanitizers and tools like Valgrind are your safety net, but they are not a substitute for invariants: they detect violations, not prevent them.

Invariants are also about interfaces. If your API allows callers to break invariants, your system is already unsafe. This is why you must be explicit about which fields are public and which are private. For example, a struct Vec that exposes length and capacity allows external code to mutate them and invalidate the structure. You should prefer opaque structs or a clear “do not touch” contract. In C, this is a social contract enforced by documentation and discipline.

Finally, invariants are what make testing meaningful. A unit test should not only check results, it should check that the structure is still valid. A dynamic array test that only checks output but never validates internal state will miss many bugs. Invariants are the bridge between correctness and implementation detail.

Another useful practice is to classify invariants as recoverable or fatal. Recoverable invariants can be checked at runtime and returned as errors (for example, input length too large). Fatal invariants should be enforced with asserts because continuing execution would corrupt state (for example, length > capacity inside a vector). This split helps you decide whether to return errors or crash early in debug builds.

How This Fits on Projects

Every project in this sprint starts by writing invariants. If you skip this step, you will be guessing. The dynamic array, gap buffer, arena allocator, JSON parser, linked list database, and HTTP server all depend on invariants to avoid undefined behavior.

Definitions & Key Terms

  • Invariant: A rule that must always be true at a stable point
  • Precondition: What must be true before a function runs
  • Postcondition: What must be true after a function returns
  • Loop invariant: Condition true at every loop start
  • Representation invariant: Structural constraints of a data structure

Mental Model Diagram

     Stable Points and Invariants

  [Function Entry] -> [Mutations] -> [Function Exit]
        |                 |              |
   Preconditions     Temporary break   Postconditions
                      (allowed)        (must hold)

How It Works (Step-by-Step)

  1. Define invariants for the data structure.
  2. At function entry, assert preconditions.
  3. Perform mutations, allowing invariants to be temporarily false.
  4. Restore invariants before returning.
  5. Assert postconditions and validate structure.

Minimal Concrete Example

#include <assert.h>

typedef struct {
    int *data;
    size_t length;
    size_t capacity;
} Vec;

static void vec_validate(const Vec *v) {
    assert(v != NULL);
    assert(v->capacity == 0 || v->data != NULL);
    assert(v->length <= v->capacity);
}

Common Misconceptions

  • “If it works in my test, it is correct.” (No: invariants may be broken silently.)
  • “Asserts are only for debugging.” (They encode contracts.)
  • “Invariants slow code.” (Only if you leave validation in production builds.)

Check-Your-Understanding Questions

  1. What is the difference between a representation invariant and a behavioral invariant?
  2. Why are invariants allowed to be temporarily false inside a function?
  3. What makes a loop invariant useful in C?

Check-Your-Understanding Answers

  1. Representation invariants describe the structure; behavioral invariants describe operation semantics.
  2. Because the function body is not a stable point; only entry/exit and loop boundaries are.
  3. It provides a correctness argument and prevents out-of-bounds access.

Real-World Applications

  • Database engines assert invariants on page layouts.
  • Compilers validate AST invariants after transformations.
  • Networking code uses invariants to ensure buffer sizes and offsets are valid.

Where You Will Apply It

  • Project 1: Dynamic Array
  • Project 2: Gap Buffer
  • Project 3: Arena Allocator
  • Project 4: JSON Parser
  • Project 5: Linked List Database
  • Project 6: HTTP Server

References

  • C11 Committee Draft N1570 (assertions, object lifetimes, alignment) - https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
  • “Code Complete” by Steve McConnell (Ch. 8)

Key Insight

Invariants are the real type system of C.

Summary

Invariants are non-negotiable truths about your data structures. You must define them, enforce them, and test them. They are how you write trustworthy C code.

Homework/Exercises to Practice the Concept

  1. Write invariants for a stack and queue.
  2. Add validate() functions for each structure.
  3. Write tests that intentionally violate the invariants.

Solutions to the Homework/Exercises

  1. Stack: top <= capacity, data != NULL when capacity > 0.
  2. Queue: head, tail within [0, capacity), size <= capacity.
  3. Use asserts and confirm the program halts when broken.

Chapter 2: Ownership and Lifetimes

Fundamentals

Ownership answers one question: who is responsible for freeing a resource? In C, every dynamically allocated object must have exactly one owner. If no one owns it, you leak memory. If two owners exist, you will eventually double-free or use-after-free. Lifetimes define the valid time interval for a pointer: from allocation until it is freed or otherwise invalidated. Most C bugs are ownership and lifetime bugs, not algorithmic bugs. If you can clearly describe ownership for every pointer, you can avoid the majority of undefined behavior. The C standard defines when objects exist and when pointers are valid. After free, the pointer becomes invalid, and after realloc, the old pointer may be invalid even if the memory appears unchanged.

Deep Dive into the Concept

Ownership is about responsibility, not just access. You can have multiple pointers to the same object, but only one “owner” is responsible for freeing it. This is the key distinction between ownership and aliasing. Aliasing means two pointers refer to the same memory; ownership means only one of them is allowed to release it. In practice, ownership is a contract across functions and modules. If a function returns a pointer, you must decide whether the caller owns it (must free it), whether ownership remains with the callee (do not free it), or whether ownership is shared through a reference-counting scheme. In C, reference counting is manual and error-prone, so most libraries avoid it unless necessary.

A robust ownership model uses three verbs: create, borrow, and transfer. A function that creates memory (e.g., vec_create) returns ownership to the caller. A function that borrows memory uses it temporarily without freeing it (e.g., vec_get). A function that transfers ownership passes responsibility to another component (e.g., json_take_string). You should document this in every API. Without documentation, the caller will guess, and the guess will be wrong.

Lifetimes are defined by the C standard. An object’s lifetime begins when storage is obtained and ends when the storage is released. This includes heap objects (malloc/free), stack objects (lifetime ends when the function returns), and static objects (lifetime is the entire program). Accessing an object outside of its lifetime is undefined behavior. This matters for struct fields and pointers. A struct can outlive the memory it points to; if you do not manage that lifetime, you will read dangling pointers and corrupt memory.

The realloc function is a perfect example of lifetime hazards. It may return the same pointer, or it may allocate new storage, copy data, and free the old storage. Any pointer derived from the old storage is invalid after realloc if the block moved. This is why realloc is dangerous when you have external pointers into an array. A safe approach is to never store raw pointers to elements in resizable arrays. Store indices instead, or rebuild pointers after reallocation.

Ownership extends beyond memory. File descriptors, sockets, and mutexes are resources with lifetimes and ownership. The same mental model applies: every resource must have exactly one owner or one explicit owner group. If you do not enforce this, you will leak descriptors or close them twice.

Finally, ownership is the basis of designing safe APIs. If you can describe ownership precisely, you can also describe how to test it. For example, a JSON parser that returns a tree structure must state who owns the strings: does the parser allocate them, or does it point into the input buffer? If it points into the input buffer, the caller must keep the buffer alive. This is an ownership contract, and if it is violated, you will see use-after-free bugs.

Ownership also interacts with error handling. If a function partially allocates resources and then fails, it must either free everything before returning or transfer ownership of the partially built structure to the caller with a clear cleanup function. Ambiguous ownership on error paths is where many real-world leaks happen, so you should design error handling as part of your ownership model.

How This Fits on Projects

Ownership rules are central to the dynamic array, arena allocator, JSON parser, linked list DB, and HTTP server. Each project requires explicit ownership documentation.

Definitions & Key Terms

  • Owner: The code responsible for freeing a resource
  • Borrower: Temporary user without ownership
  • Transfer: Passing ownership from one module to another
  • Lifetime: Valid time interval of an object

Mental Model Diagram

Ownership Flow

   malloc -> owner A -> transfer -> owner B -> free
        \-> borrow (no free)

How It Works (Step-by-Step)

  1. Decide who owns the resource at creation.
  2. Document whether functions borrow or transfer ownership.
  3. Ensure every code path frees exactly once.
  4. Invalidate pointers after free.

Minimal Concrete Example

char *make_copy(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (!p) return NULL;
    memcpy(p, s, n);
    return p; // caller owns p
}

Common Misconceptions

  • “If two pointers exist, they both own the memory.” (No, only one does.)
  • “free(NULL) always means safe.” (It is safe, but it does not fix double-free bugs.)
  • “realloc preserves all pointers.” (Only the returned pointer is valid.)

Check-Your-Understanding Questions

  1. What is the difference between aliasing and ownership?
  2. Why does realloc invalidate old pointers?
  3. What lifetime does a stack-allocated array have?

Check-Your-Understanding Answers

  1. Aliasing is multiple references; ownership is responsibility to free.
  2. Because realloc may move the block, and the old address may be freed.
  3. It ends when the function returns.

Real-World Applications

  • Allocator APIs for databases
  • JSON parsers that return owned trees
  • Network servers managing buffers per connection

Where You Will Apply It

  • Project 1: Dynamic Array
  • Project 3: Arena Allocator
  • Project 4: JSON Parser
  • Project 6: HTTP Server

References

  • C11 Committee Draft N1570 (object lifetime and memory management functions) - https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
  • SEI CERT C Coding Standard (memory management rules) - https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard

Key Insight

Ownership is the only way to make C safe.

Summary

Ownership and lifetime define who frees memory and when pointers are valid. Without a clear ownership model, your program will eventually crash.

Homework/Exercises to Practice the Concept

  1. Annotate ownership for every pointer in a small program.
  2. Rewrite a function to clarify ownership transfer.
  3. Design a function that returns a buffer without leaking memory.

Solutions to the Homework/Exercises

  1. Use comments like “caller owns” or “borrowed”.
  2. Change function signature and document ownership explicitly.
  3. Ensure each allocation path has exactly one free.

Chapter 3: Memory Layout, Alignment, and Object Representation

Fundamentals

C is a language of bytes, not abstract types. Every object has a representation: a sequence of bytes stored in memory. These bytes must follow alignment requirements: certain types must begin at addresses that are multiples of their alignment. The C standard specifies alignment and object representation rules, and violating them results in undefined behavior. This is why sizeof(struct) can be larger than the sum of its fields: padding is inserted to satisfy alignment. Understanding memory layout is essential for writing correct allocators, parsing buffers, and working with low-level data structures. Endianness also matters when you interpret bytes directly, especially for serialization and debugging raw memory.

Deep Dive into the Concept

Alignment exists to make hardware access efficient and safe. Many architectures require that certain types (like 4-byte ints or 8-byte doubles) be aligned to their size. The C standard allows implementations to impose alignment constraints, and the programmer must respect them. Misaligned access can cause crashes or silent performance penalties. This matters in allocators and data structures that manually compute offsets. When you design an arena allocator, you must round up each allocation to the required alignment. When you lay out a struct in a binary file, you must decide whether to store padding bytes or to use explicit packing rules.

Object representation includes not only values but also padding bits. The C standard allows padding bits in integers and padding bytes in structs. This means that copying raw memory with memcmp can be misleading if the padding bytes are uninitialized. The correct way to compare structs is to compare fields, not raw bytes. For example, two structs with identical field values but different padding bytes are equal at the language level, but memcmp may say they are different. This is a subtle but common bug.

Alignment is expressed in C11 with _Alignof and _Alignas, and the standard library provides alignof and alignas via <stdalign.h>. These allow you to compute and enforce alignment constraints. For example, if you are writing an allocator that returns memory for double, you must ensure that the returned pointer is aligned to _Alignof(double). A correct allocator will round up its internal pointer to the next alignment boundary.

Memory layout is also about the difference between arrays and pointers. An array is a contiguous block of elements; a pointer is just an address. sizeof(array) gives you the total size of the array, while sizeof(pointer) gives you the size of the pointer itself. This confusion is a frequent source of buffer overflows.

Struct layout rules also matter when you serialize data. The compiler is allowed to reorder padding but not fields. This means the in-memory layout is predictable but not compact. If you need a compact, portable representation, you must define an explicit serialization format instead of dumping the struct bytes. This is why JSON, HTTP, and other protocols define byte-level representations rather than relying on C struct layouts.

Endianness is another subtle detail. Multi-byte values are stored in memory with either little-endian or big-endian byte order depending on the platform. If you inspect raw bytes or write them to disk, you must decide on a portable encoding (for example, network byte order). This is not just a networking concern; it affects debugging and binary formats too.

Finally, object representation matters for aliasing. If you treat memory as a different type than it was originally written, you may violate the “effective type” rules. This is undefined behavior and can cause subtle compiler optimizations to break your code. This is why you should use memcpy for type punning instead of casting pointers. The standard allows unsigned char to alias any object representation, so it is safe for raw byte access.

How This Fits on Projects

Memory layout is critical for the gap buffer, arena allocator, JSON parser, and HTTP server. Alignment and padding mistakes cause crashes and data corruption.

Definitions & Key Terms

  • Alignment: Required address boundary for an object
  • Padding: Unused bytes inserted for alignment
  • Object representation: Byte-level storage of a C object
  • Effective type: The type used to access an object

Mental Model Diagram

Struct Layout (example on 64-bit)

struct Foo { char a; int b; char c; };

Address: 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
         [ a ] [pad][pad][pad][  b  ][  b  ][  b  ][  b  ]
         [ c ] [pad][pad][pad] ...

How It Works (Step-by-Step)

  1. Determine alignment requirements with _Alignof.
  2. Round allocation addresses up to the next alignment boundary.
  3. Avoid comparing structs with memcmp.
  4. Serialize fields explicitly when writing to disk or network.

Minimal Concrete Example

#include <stdalign.h>
#include <stddef.h>

size_t align_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

size_t double_align = _Alignof(double); // C11

Common Misconceptions

  • “Struct size equals sum of fields.” (Padding breaks this.)
  • “memcmp is a safe struct comparison.” (Padding bytes can differ.)
  • “Casting fixes alignment.” (It does not.)

Check-Your-Understanding Questions

  1. Why does C insert padding into structs?
  2. How do you ensure a custom allocator returns aligned memory?
  3. Why is comparing structs with memcmp unsafe?

Check-Your-Understanding Answers

  1. To satisfy alignment requirements.
  2. Round up pointers to _Alignof(type) boundaries.
  3. Padding bytes may be uninitialized or differ.

Real-World Applications

  • Binary file formats and network protocols
  • High-performance allocators
  • Memory-mapped data structures

Where You Will Apply It

  • Project 2: Gap Buffer
  • Project 3: Arena Allocator
  • Project 4: JSON Parser
  • Project 6: HTTP Server

References

  • C11 Committee Draft N1570 (alignment and object representation) - https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

Key Insight

If you do not understand layout, you cannot reason about memory safety.

Summary

Memory layout is about alignment, padding, and object representation. These rules are non-negotiable in C and directly affect correctness.

Homework/Exercises to Practice the Concept

  1. Write a program that prints sizeof and offsetof for a struct.
  2. Implement align_up for a custom allocator.
  3. Serialize a struct to disk without using fwrite(&struct).

Solutions to the Homework/Exercises

  1. Use offsetof from <stddef.h> to confirm offsets.
  2. Use bitwise rounding as shown above.
  3. Write each field explicitly in a defined order.

Chapter 4: Pointer Validity, Aliasing, and Undefined Behavior

Fundamentals

Pointers are only valid when they point to a live object within its lifetime. The C standard specifies what it means for a pointer to be valid, what operations are allowed on it, and when it becomes invalid. Accessing memory outside of an object’s lifetime or bounds is undefined behavior. Aliasing rules (“effective type” and strict aliasing) further restrict how you can access memory through different pointer types. These rules are essential for compiler optimizations, and violating them can produce bugs that only appear in optimized builds. Modern compilers assume you follow these rules even if the code seems to work in debug mode.

Deep Dive into the Concept

Pointer validity begins with object lifetime. An object exists only after storage is allocated and before it is released. A pointer to an object before initialization is not valid for reading; a pointer after free is invalid even if it still has a numerical address. This is why use-after-free bugs are so dangerous: the pointer still “looks” valid but no longer refers to a live object. The C standard explicitly allows compilers to assume that you never access an object outside its lifetime. This assumption enables aggressive optimizations that can rearrange or eliminate code. If you violate it, your program can behave unpredictably.

Aliasing is the ability for two pointers to refer to the same memory. The C standard defines rules about which types can alias each other. The effective type of an object is the type used to store it, and only certain types can be used to access it later. In practice, char * (or unsigned char *) is allowed to inspect any object representation, but unrelated types are not. This is the origin of “strict aliasing” rules: the compiler assumes that pointers of different (non-compatible) types do not refer to the same memory. If you violate this, your code may work at -O0 and fail at -O2.

Pointer arithmetic is also restricted. You can only compute pointers within a single array object (including one past the end). Even comparing pointers is only defined if they point into the same array (or one past the end). This matters in C because many custom data structures treat buffers as arrays and use pointer arithmetic to iterate. If you compute a pointer beyond one-past-the-end, or if you compare pointers from different allocations, you are in undefined behavior territory.

One of the most common issues is realloc invalidation. When you call realloc, the old memory may be freed and replaced by a new block. Every pointer derived from the old block becomes invalid. This is why storing raw element pointers into a resizable array is unsafe. Use indices instead, or rebuild pointers after reallocation.

Another subtlety is uninitialized memory. Reading uninitialized memory is undefined behavior because the object has an indeterminate value. Tools like Valgrind track “definedness” to detect this; AddressSanitizer catches out-of-bounds and use-after-free but not all uninitialized reads. In C, you must initialize every byte you intend to read, including padding bytes if you compare raw memory.

The core mental model is: the C standard defines a narrow set of valid pointer operations. If you stay within it, your program is predictable. If you step outside it, you are no longer writing C; you are relying on undefined behavior. The projects in this sprint will force you to stay within the valid subset by writing explicit bounds checks and invariants.

Finally, be careful with pointer comparisons and type punning. Comparing pointers that do not point into the same array object is undefined, even if their numeric values look ordered. Type punning through unions or casts can also trigger strict-aliasing violations. The safe, portable way to reinterpret bytes is to copy with memcpy into a properly typed object. This may feel indirect, but it keeps you inside the rules the compiler assumes.

How This Fits on Projects

Pointer validity and aliasing rules are central to the dynamic array, gap buffer, JSON parser, and HTTP server. These projects will fail under optimization if you violate strict aliasing or lifetime rules.

Definitions & Key Terms

  • Valid pointer: Points to a live object within its lifetime
  • One-past-the-end: The only pointer allowed beyond an array
  • Effective type: The type used to access an object in memory
  • Strict aliasing: Compiler rule that unrelated pointer types do not alias

Mental Model Diagram

Pointer Lifetime

malloc -> valid pointer -> use -> free -> INVALID pointer

Any access after free = undefined behavior

How It Works (Step-by-Step)

  1. Allocate memory and obtain a pointer.
  2. Use pointer only within bounds and lifetime.
  3. After free or realloc, treat old pointers as invalid.
  4. Access bytes through unsigned char * when inspecting raw data.

Minimal Concrete Example

int *p = malloc(sizeof(int) * 4);
int *q = p + 4; // one-past-end is OK
// *(q) is NOT OK
free(p);
// *p is UB now

Common Misconceptions

  • “If it works in debug, it will work in release.” (UB often shows up only in optimized builds.)
  • “Pointers are just integers.” (Not in C semantics.)
  • “memcmp on structs is always safe.” (Padding bytes and effective type rules matter.)

Check-Your-Understanding Questions

  1. Why is reading uninitialized memory undefined behavior?
  2. What does strict aliasing allow and forbid?
  3. Why is one-past-the-end pointer allowed but not dereferenceable?

Check-Your-Understanding Answers

  1. Because the value is indeterminate and the standard allows anything to happen.
  2. It allows aliasing by compatible types and unsigned char, forbids unrelated types.
  3. It allows pointer arithmetic and comparisons without dereference.

Real-World Applications

  • Optimized C libraries rely on strict aliasing assumptions
  • Performance-critical code uses pointer arithmetic safely
  • Sanitizers detect invalid access patterns

Where You Will Apply It

  • Project 1: Dynamic Array
  • Project 2: Gap Buffer
  • Project 4: JSON Parser
  • Project 6: HTTP Server

References

  • C11 Committee Draft N1570 (object lifetime, pointer arithmetic, effective type) - https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

Key Insight

Undefined behavior is a contract violation, not a runtime error.

Summary

Pointer validity and aliasing rules define the safe subset of C. Violating them means your program is not defined by the standard.

Homework/Exercises to Practice the Concept

  1. Write a small program that violates strict aliasing and observe behavior at -O0 vs -O2.
  2. Create a test that uses realloc and invalidates old pointers.
  3. Use Valgrind to detect uninitialized reads.

Solutions to the Homework/Exercises

  1. Use a float * and int * to alias the same memory (observe unexpected results).
  2. Store a pointer to vec->data[0], then realloc, then check pointer validity.
  3. Run valgrind --track-origins=yes to see uninitialized reads.

Chapter 5: Allocation Strategies and Custom Allocators

Fundamentals

The default allocator (malloc, free, realloc) is general-purpose and flexible, but not always optimal. For high-performance or predictable lifetimes, custom allocators like arenas and pools are better. An arena allocator gives you fast, linear allocation and bulk free. A pool allocator gives you fixed-size blocks with constant-time allocation and deallocation. Understanding the standard allocation functions is required: malloc returns a block of uninitialized storage or NULL, free releases it, and realloc may move it. These functions are defined by the C standard and are the foundation for all custom allocators. Fragmentation, allocator locks, and metadata overhead are why special-purpose allocators can outperform malloc for specific workloads.

Deep Dive into the Concept

Allocators exist to manage fragmentation, performance, and ownership. The general-purpose allocator must support arbitrary allocation sizes and lifetimes, which means it uses metadata, free lists, and often global locks. This overhead can be significant when you are allocating thousands of small objects in a tight loop (e.g., parsing HTTP headers or JSON tokens). An arena allocator sidesteps this by allocating a large block up front and then handing out chunks with a simple pointer bump. This is extremely fast because it avoids per-allocation metadata and search costs. The trade-off is that you can only free everything at once. That is fine for workloads with a clear phase boundary, such as “parse request, then discard all allocations.”

Pool allocators are another strategy. They manage a fixed number of blocks of identical size. Each allocation returns a block from the free list, and each free returns the block to the list. This is ideal for fixed-size objects like list nodes or connection structs. Pools are fast and predictable but do not support variable-sized objects without additional logic.

Alignment is critical for custom allocators. The C standard specifies alignment requirements for each type, and a correct allocator must ensure that returned pointers satisfy those requirements. A simple strategy is to round up every allocation to the next power-of-two boundary or to _Alignof(max_type). The arena allocator in this sprint will implement align_up to satisfy alignment and avoid undefined behavior.

Another allocator concern is reallocation. If you implement a custom allocator, you must decide whether realloc semantics are supported or prohibited. In arenas, realloc can sometimes be implemented only for the most recent allocation; otherwise you must allocate a new block and copy. This affects API design. You should clearly document the limitations.

Ownership is also part of allocator design. Arena allocations are typically owned by the arena itself. The caller borrows pointers and must not call free on them. Instead, the arena is reset in bulk. This is a different ownership model than malloc, and mixing the two leads to undefined behavior. In your projects, you will design APIs that make this explicit.

Another important detail is allocator metadata. General-purpose allocators often use size classes, headers, and free lists. This metadata allows fast allocation but adds overhead and can cause fragmentation. If you build a pool allocator, you can store the free list inside the freed blocks themselves, which keeps metadata minimal. If you need zeroed memory, calloc is defined to return zero-initialized storage; if you reimplement this behavior, you must decide whether to zero eagerly or lazily. These choices affect performance and correctness.

Finally, you will use tools to validate your allocator. Valgrind can detect leaks and invalid accesses, but it may not understand your custom allocator unless you provide annotations. AddressSanitizer detects invalid access patterns but cannot see the internal invariants of your allocator. This means your own validation code is critical. You should implement allocator self-checks (e.g., “used <= capacity”) and run them in debug mode.

How This Fits on Projects

The arena allocator project is the core demonstration. The HTTP server also uses arenas for request-scoped allocations.

Definitions & Key Terms

  • Arena: Linear allocator with bulk reset
  • Pool: Fixed-size block allocator
  • Fragmentation: Inefficient memory usage due to holes
  • Alignment: Address boundary requirements for objects

Mental Model Diagram

Arena Allocator

[base ------------------------------ end]
       ^ used

allocate: return base+used, then used += size
reset: used = 0

How It Works (Step-by-Step)

  1. Allocate a big backing block (malloc).
  2. Track a used offset.
  3. On each allocation, align used and return the pointer.
  4. On reset, set used = 0.

Minimal Concrete Example

typedef struct {
    unsigned char *base;
    size_t used;
    size_t capacity;
} Arena;

void *arena_alloc(Arena *a, size_t size, size_t align) {
    size_t p = (a->used + align - 1) & ~(align - 1);
    if (p + size > a->capacity) return NULL;
    void *ptr = a->base + p;
    a->used = p + size;
    return ptr;
}

Common Misconceptions

  • “Arena allocations can be freed individually.” (No, only reset.)
  • “Alignment does not matter on x86.” (It still matters for performance and correctness.)
  • “malloc always returns zeroed memory.” (It does not.)

Check-Your-Understanding Questions

  1. Why is an arena allocator faster than malloc for small allocations?
  2. What is the main trade-off of arenas?
  3. How do you enforce alignment in a custom allocator?

Check-Your-Understanding Answers

  1. It avoids metadata and free-list searches.
  2. You cannot free individual allocations.
  3. Round up addresses to the required alignment.

Real-World Applications

  • Game engines use arenas for frame allocations
  • HTTP servers use arenas for request lifetimes
  • Compilers use arenas for AST nodes

Where You Will Apply It

  • Project 3: Arena Allocator
  • Project 6: HTTP Server

References

  • C11 Committee Draft N1570 (malloc/free/realloc semantics) - https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
  • Valgrind Memcheck documentation - https://valgrind.org/docs/manual/quick-start.html
  • LLVM AddressSanitizer documentation - https://clang.llvm.org/docs/AddressSanitizer.html

Key Insight

Allocators encode ownership policy.

Summary

Custom allocators trade flexibility for speed and predictability. They require explicit alignment and ownership models.

Homework/Exercises to Practice the Concept

  1. Write an arena allocator and add a reset function.
  2. Implement a fixed-size pool for list nodes.
  3. Measure performance vs malloc using a benchmark.

Solutions to the Homework/Exercises

  1. Use used offset and align_up function.
  2. Maintain a free list inside each block.
  3. Time N allocations and compare.

Chapter 6: Buffering and Parsing as Ownership

Fundamentals

Parsing is a memory problem as much as a syntax problem. When you parse input, you must decide how to store it, how long it lives, and who owns it. A gap buffer is a text editing structure that represents the buffer as a contiguous array with a “gap” for efficient insertions and deletions. JSON parsing involves building a tree of values with explicit ownership for strings, numbers, and nested objects. HTTP parsing requires reading a byte stream, detecting boundaries, and enforcing strict limits to avoid buffer overflows. These are all data-structure problems, not just parsing problems. In practice, the parser’s buffer limits and error paths are part of its invariants.

Deep Dive into the Concept

Buffering and parsing are where invariants meet real input. A gap buffer keeps a contiguous gap in the middle of the array; insertions occur by filling the gap, and deletions enlarge it. The invariants include: the buffer is contiguous, the gap has non-negative size, and the text length equals total size minus gap size. Emacs uses this structure to support efficient editing, and the concept is well-documented in the GNU Emacs manual.

JSON parsing in C raises ownership questions immediately. When you parse a JSON string, do you copy it into a new allocation, or do you point into the original input buffer? If you point into the input, the caller must keep the input buffer alive for the lifetime of the parsed tree. If you copy, you allocate many small strings and must free them later. Both are valid, but you must document the ownership contract. The RFC 8259 JSON specification defines the syntax and data model. For example, JSON numbers are base-10 with an optional leading minus, optional fraction, and optional exponent; NaN/Infinity are not valid JSON, and trailing commas are not allowed.

HTTP parsing is similarly constrained by specification. HTTP/1.1 is a text-based protocol with strict rules for request lines, headers, and line endings. RFC 9112 specifies the message syntax and parsing rules. The request line is method SP request-target SP HTTP-version CRLF, header fields are field-name: field-value lines terminated by CRLF, and field names are case-insensitive. In a C implementation, you must handle partial reads, buffer limits, and malformed input without overflow. This is exactly where invariants help: “buffer contains no more than MAX_HEADER bytes,” “request line must contain method, target, version,” and “header lines end with CRLF.”

Buffering is also about streaming. TCP does not preserve message boundaries, so you will often receive partial requests. You must design a buffer that can accumulate data until a full message is available. This is where state machines and invariants intersect: your parser has states like READING_REQUEST_LINE or READING_HEADERS, and each state has its own invariants. The capstone HTTP server will demonstrate this.

Finally, parsing is a security boundary. Buffer overflows and use-after-free bugs in parsers are some of the most exploited vulnerabilities in software. Defensive design requires explicit limits (maximum header size, maximum nesting depth, maximum string length) and clear error handling. These limits are not optional in production code; they are essential for safety.

JSON parsing also benefits from a streaming mindset. Even if you parse the whole input at once, it helps to structure the parser as a state machine that can resume. This makes it easier to enforce depth limits, avoid stack overflows, and produce precise error messages with offsets. For HTTP, partial reads and pipelined requests mean your buffer may contain multiple requests or half of one. Your invariants should capture these realities, for example: “buffer contains zero or more complete requests followed by an optional prefix of the next request.”

How This Fits on Projects

The gap buffer project uses a classic text editor structure. The JSON parser implements an explicit ownership model. The HTTP server uses a streaming parser with request pooling.

Definitions & Key Terms

  • Gap buffer: Text buffer with a movable gap for fast edits
  • Streaming parser: Parser that handles partial input
  • Ownership model: Rules for who owns parsed data
  • CRLF: Carriage return + line feed (\r\n) line endings in HTTP

Mental Model Diagram

Gap Buffer Layout

[TEXT....][GAP.....][TEXT....]
 ^beg            ^gap_end     ^end

Insert: fill gap
Delete: enlarge gap

How It Works (Step-by-Step)

  1. Represent input as a buffer with explicit limits.
  2. Track positions (gap, read offset, parse state).
  3. Move gap or advance parse pointer as needed.
  4. Allocate parsed values with explicit ownership.
  5. Enforce limits and return errors on violations.

Minimal Concrete Example

// HTTP request line parse (simplified)
char *end = memmem(buf, len, "\r\n", 2);
if (!end) return NEED_MORE_DATA;
// Split method, target, version by spaces

Common Misconceptions

  • “TCP read returns a full request.” (No, it is a byte stream.)
  • “Parsing is just string splitting.” (It is memory management too.)
  • “You can ignore malformed input.” (Malformed input is the norm in attacks.)

Check-Your-Understanding Questions

  1. Why does a gap buffer allow faster insertions?
  2. What are the ownership choices for parsed JSON strings?
  3. Why must HTTP parsers handle partial reads?

Check-Your-Understanding Answers

  1. The gap means insertions can happen without shifting all text.
  2. Either copy strings (parser owns) or reference input (caller owns).
  3. TCP is stream-based and may deliver partial lines.

Real-World Applications

  • Text editors (Emacs, classic editors)
  • JSON configuration loaders
  • HTTP servers and proxies

Where You Will Apply It

  • Project 2: Gap Buffer
  • Project 4: JSON Parser
  • Project 6: HTTP Server

References

  • GNU Emacs manual on gap buffers - https://www.gnu.org/software/emacs/manual/html_node/elisp/Buffer-Gap.html
  • RFC 8259 (JSON) - https://www.rfc-editor.org/rfc/rfc8259
  • RFC 9112 (HTTP/1.1) - https://www.rfc-editor.org/rfc/rfc9112

Key Insight

Parsing is an ownership problem wearing a syntax mask.

Summary

Buffers and parsers must be designed around invariants, ownership, and limits. This is how you prevent memory bugs in C.

Homework/Exercises to Practice the Concept

  1. Implement a gap buffer with insert and delete.
  2. Parse a JSON number and store it with explicit ownership.
  3. Write a streaming HTTP parser that handles partial reads.

Solutions to the Homework/Exercises

  1. Track gap_start, gap_end, and ensure gap_end >= gap_start.
  2. Store numbers in a union; strings are allocated or borrowed.
  3. Keep a buffer and state; only parse when \r\n is present.

Glossary

  • Alignment: Memory address boundary requirement for objects
  • Arena allocator: Allocator that frees all allocations at once
  • Borrowed pointer: Pointer that must not be freed by the caller
  • Dangling pointer: Pointer to an object whose lifetime ended
  • Effective type: Type used to access object representation
  • Gap buffer: Array with a movable gap for efficient edits
  • Invariant: Condition that must always hold at stable points
  • Ownership: Responsibility for freeing a resource
  • Padding: Unused bytes added to satisfy alignment
  • Strict aliasing: Rule restricting access through incompatible types
  • Undefined behavior: Behavior not defined by the C standard

Why Data and Invariants Matter

The Modern Problem It Solves

Memory safety bugs remain one of the most costly classes of vulnerabilities in software. C and C++ are powerful but memory-unsafe, which means errors like buffer overflows and use-after-free can become exploitable security issues. Governments and major software vendors have publicly called for transitions toward memory-safe strategies because of the scale of the problem.

Recent guidance from CISA and international partners explicitly emphasizes memory safety as a top security priority, noting that memory safety vulnerabilities remain a major portion of reported issues in memory-unsafe codebases.

Real-world impact (recent sources):

  • Google reported that memory safety issues accounted for 76% of Android vulnerabilities in 2019 and 24% in 2024, showing both the scale of the problem and the impact of memory-safe strategies (Google Online Security Blog, Sep 2024). See: https://security.googleblog.com/2024/09/eliminating-memory-safety-vulnerabilities-Android.html
  • CISA and international partners stated that two-thirds of reported vulnerabilities in memory-unsafe languages still relate to memory safety issues (CISA, 2023). See: https://www.cisa.gov/case-memory-safe-roadmaps
Old Reality (C/C++ Unsafe)         New Reality (Memory-Safe Strategy)

UB bugs -> exploits -> patching    Invariants -> safe-by-design -> fewer bugs

Context and Evolution (Optional)

The C standard intentionally leaves some behavior undefined to allow compiler optimizations. Those optimizations are beneficial, but they make memory bugs extremely dangerous. This is why modern engineering increasingly emphasizes explicit invariants, ownership models, and defensive validation.


Concept Summary Table

Concept What You Need to Internalize
Invariants and Contracts Stable-point rules that make a data structure valid
Ownership and Lifetimes Exactly one owner per resource; pointers valid only within lifetime
Memory Layout and Alignment Object representation and alignment rules control safe memory access
Pointer Validity and Aliasing Strict rules for safe pointer use and avoiding undefined behavior
Allocation Strategies When to use malloc vs arena/pool and how it affects ownership
Buffering and Parsing Parsers are ownership problems; input must be buffered safely

Project-to-Concept Map

Project What It Builds Primer Chapters It Uses
Project 1: Dynamic Array Safe resizable array 1, 2, 4
Project 2: Gap Buffer Editor buffer with gap 1, 3, 6
Project 3: Arena Allocator Fast bulk allocation 2, 3, 5
Project 4: JSON Parser Owned parse tree 1, 2, 4, 6
Project 5: Linked List DB Safe mutation & iteration 1, 2, 4
Project 6: HTTP Server Streaming parser w/ pools 1, 2, 3, 5, 6

Deep Dive Reading by Concept

Fundamentals and Contracts

Concept Book & Chapter Why This Matters
Invariants “Code Complete” by Steve McConnell - Ch. 8 Preconditions, postconditions, and defensive checks
Ownership “Effective C” by Robert Seacord - Ch. 4-6 Safe memory management patterns

Memory and Layout

Concept Book & Chapter Why This Matters
Memory layout “Computer Systems: A Programmer’s Perspective” - Ch. 2, 3 Understanding object representation
Alignment “Expert C Programming” by Peter van der Linden - Ch. 3 Practical alignment pitfalls

Parsing and Buffers

Concept Book & Chapter Why This Matters
Parsing in C “The Practice of Programming” by Kernighan & Pike - Ch. 3-5 Robust parsing techniques
State machines “Crafting Interpreters” by Bob Nystrom - Ch. 4 Implementing parsers as state machines

Specifications

Concept Specification Why This Matters
JSON syntax RFC 8259 Defines the correct JSON format
HTTP/1.1 RFC 9112 Defines HTTP request syntax and parsing rules

Quick Start

Feeling overwhelmed? Start here.

Day 1 (4 hours):

  1. Read Chapter 1 and Chapter 2 of the Theory Primer.
  2. Build Project 1 (dynamic array) with only push/pop.
  3. Add vec_validate() and run it after each operation.

Day 2 (4 hours):

  1. Add realloc growth and bounds checks to Project 1.
  2. Run Valgrind or AddressSanitizer on your tests.
  3. Read Chapter 5 (allocation strategies).

By the end of 48 hours, you will understand ownership and invariants, which is 80% of this sprint.


Path 1: The “C Fundamentals” Path

Best for: Newer C programmers

  1. Project 1 -> Project 3 -> Project 2
  2. Then Project 4
  3. Capstone after all

Path 2: The “Systems Engineer” Path

Best for: Comfortable with C, want deeper systems intuition

  1. Project 3 -> Project 1 -> Project 4
  2. Then Project 2 and 5
  3. Capstone last

Path 3: The “Interview Prep” Path

Best for: Whiteboard-level mastery

  1. Project 1 and 3
  2. Project 2 (gap buffer) for invariants
  3. Project 4 for parsing

Path 4: The Completionist

Best for: Full mastery

  1. Projects 1-6 in order
  2. Add an additional feature to each project

Success Metrics

You can claim mastery when you can:

  • State invariants for each data structure without looking at code
  • Run all projects with Valgrind-clean or ASan-clean results
  • Explain ownership for every pointer in your codebase
  • Handle malformed input without crashes or leaks
  • Implement any project from scratch within 2-4 hours

Project Overview Table

# Project Difficulty Time Primary Focus
1 Safe Dynamic Array (vec.h) Intermediate 1-2 days Invariants, ownership, realloc
2 Gap Buffer Editor Advanced 2-3 days Invariants, memory layout
3 Arena Allocator Advanced 2-3 days Ownership, alignment
4 JSON Parser Advanced 3-4 days Ownership, parsing, validation
5 Linked List Database Intermediate 2-3 days Safe mutation, iteration
6 HTTP Server (Capstone) Expert 5-7 days Parsing, pools, invariants

Project List

Project 1: Safe Dynamic Array Library (vec.h)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig
  • Coolness Level: Level 2 (Practical and Useful)
  • Business Potential: Level 2 (Reusable library)
  • Difficulty: Intermediate
  • Knowledge Area: Data structures, invariants
  • Software or Tool: GCC/Clang, Valgrind
  • Main Book: “C Interfaces and Implementations” by David Hanson

What you’ll build: A resizable array with explicit invariants, push/pop, bounds-checked access, and safe growth using realloc.

Why it teaches data and invariants: A vector forces you to define and enforce invariants about capacity, length, and ownership of the underlying buffer.

Core challenges you’ll face:

  • Defining and validating invariants
  • Handling realloc safely
  • Preventing out-of-bounds access

Real World Outcome

You will have a small library you can include in any C project.

What you will see:

  1. A vec.h and vec.c with clear API contracts
  2. Debug builds that assert invariants on every operation
  3. Clean Valgrind and ASan runs

Command Line Outcome Example:

$ make test
cc -Wall -Wextra -g vec.c vec_test.c -o vec_test

$ ./vec_test
[vec] push 1
[vec] push 2
[vec] pop -> 2
[vec] len=1 cap=8 OK

$ valgrind --leak-check=full ./vec_test
==12345== HEAP SUMMARY:
==12345==     in use at exit: 0 bytes in 0 blocks
==12345==   total heap usage: 3 allocs, 3 frees
==12345== All heap blocks were freed -- no leaks are possible

The Core Question You’re Answering

“How do I keep a resizable array correct across reallocations and mutations?”

Concepts You Must Understand First

  1. Invariants
    • What must always be true about length and capacity?
    • Book Reference: “Code Complete” Ch. 8
  2. Ownership and Lifetimes
    • Who owns the buffer? When is it freed?
    • Book Reference: “Effective C” Ch. 4-6
  3. Pointer Validity
    • What happens to pointers after realloc?
    • Book Reference: “Expert C Programming” Ch. 3

Questions to Guide Your Design

  1. How will you prevent out-of-bounds access?
  2. When do you grow the buffer, and by how much?
  3. How do you handle realloc failure safely?

Thinking Exercise

“The Realloc Trap”

int *p = vec->data;
vec_push(vec, 42); // may realloc
// Is p still valid?

Questions:

  • Under what conditions is p valid?
  • How should your API prevent this misuse?

The Interview Questions They’ll Ask

  1. “What invariants define a dynamic array?”
  2. “Why is realloc dangerous?”
  3. “What growth factor should you use and why?”
  4. “How do you ensure amortized O(1) push?”

Hints in Layers

Hint 1: Start with invariants Define length <= capacity, data != NULL when capacity > 0.

Hint 2: Use realloc carefully Always assign to a temporary pointer first.

int *tmp = realloc(vec->data, new_cap * sizeof(int));
if (!tmp) return false; // keep old buffer intact
vec->data = tmp;

Hint 3: Add validation Call vec_validate() at entry/exit in debug builds.

Hint 4: Test with Valgrind Run valgrind --leak-check=full ./vec_test after every change.

Books That Will Help

Topic Book Chapter
Invariants “Code Complete” Ch. 8
Memory safety “Effective C” Ch. 4-6
Dynamic arrays “C Interfaces and Implementations” Ch. 4-5

Common Pitfalls & Debugging

Problem 1: “Segmentation fault after push”

  • Why: realloc failed and old pointer lost.
  • Fix: Use a temp pointer and only assign on success.
  • Quick test: Force small allocations and check error paths.

Problem 2: “Use-after-free”

  • Why: External pointer to array element after growth.
  • Fix: Document that element pointers are invalid after mutation.

Problem 3: “Length exceeds capacity”

  • Why: Incrementing length before growth.
  • Fix: Ensure growth happens before length update.

Definition of Done

  • Invariants are documented and enforced
  • push, pop, get, set work correctly
  • realloc failure path is safe
  • Valgrind and ASan are clean

Project 2: Text Editor Gap Buffer

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig
  • Coolness Level: Level 3 (Genuinely Clever)
  • Business Potential: Level 2 (Editor component)
  • Difficulty: Advanced
  • Knowledge Area: Data structure invariants
  • Software or Tool: GCC/Clang, Valgrind
  • Main Book: “C Interfaces and Implementations” by David Hanson

What you’ll build: A gap buffer that supports efficient insert and delete operations, with explicit invariants and validation.

Why it teaches data and invariants: The gap buffer is a canonical data structure whose correctness is defined entirely by invariants (gap location, gap size, text length).

Core challenges you’ll face:

  • Maintaining a contiguous gap
  • Moving the gap efficiently
  • Avoiding off-by-one errors

Real World Outcome

What you will see:

  1. A buffer that supports fast insertions at the cursor
  2. Correct behavior when inserting large text blocks
  3. Explicit validation of gap invariants

Command Line Outcome Example:

$ ./gapedit
[insert] "Hello"
[buffer] "Hello|" (gap size 11)
[move cursor] to start
[insert] "Say "
[buffer] "Say Hello|" (gap size 7)
[delete] 3 chars
[buffer] "Say |lo" (gap size 10)

The Core Question You’re Answering

“How can I edit a large text buffer efficiently without reallocating every time?”

Concepts You Must Understand First

  1. Invariants
    • Gap must remain contiguous
    • Text length = total size - gap size
    • Book Reference: “Code Complete” Ch. 8
  2. Memory Layout
    • Buffer is a flat array with a hole
    • Book Reference: “Expert C Programming” Ch. 3

Questions to Guide Your Design

  1. How will you represent the gap (start/end indices)?
  2. When do you move the gap vs expand it?
  3. How do you validate invariants after edits?

Thinking Exercise

“Gap Movement”

If the cursor moves from position 100 to 500, how many bytes should you move? What happens if you move the wrong direction?

The Interview Questions They’ll Ask

  1. “Why is a gap buffer efficient for insertions?”
  2. “What are the key invariants?”
  3. “How is a gap buffer different from a rope?”

Hints in Layers

Hint 1: Represent the gap with two indices gap_start and gap_end are enough.

Hint 2: Move gap with memmove Use memmove because regions overlap.

memmove(buf + new_gap_end, buf + gap_end, move_size);

Hint 3: Validate after every edit Check gap_end >= gap_start and text_len + gap_size == capacity.

Books That Will Help

Topic Book Chapter
Data structure invariants “Code Complete” Ch. 8
Memory layout “Expert C Programming” Ch. 3

Common Pitfalls & Debugging

Problem 1: “Corrupted text after insertion”

  • Why: Using memcpy for overlapping regions.
  • Fix: Use memmove.

Problem 2: “Gap disappears”

  • Why: Wrong updates to gap_start or gap_end.
  • Fix: Add assertions after each mutation.

Definition of Done

  • All invariants documented
  • Insert/delete works at any cursor location
  • Gap moves correctly with memmove
  • Valgrind and ASan clean

Project 3: Arena Allocator

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig
  • Coolness Level: Level 3 (Genuinely Clever)
  • Business Potential: Level 3 (Reusable infrastructure)
  • Difficulty: Advanced
  • Knowledge Area: Allocation strategies
  • Software or Tool: GCC/Clang, Valgrind
  • Main Book: “C Interfaces and Implementations” by David Hanson

What you’ll build: A bump-pointer arena allocator with alignment, reset, and optional debug validation.

Why it teaches data and invariants: You will define invariants about used space and alignment, and enforce ownership rules (arena owns all allocations).

Core challenges you’ll face:

  • Alignment rounding
  • Out-of-memory handling
  • Ownership clarity (no individual frees)

Real World Outcome

$ ./arena_test
[arena] init size=4096
[alloc] 64 bytes -> 0x1000
[alloc] 32 bytes -> 0x1040
[reset]
[alloc] 128 bytes -> 0x1000
[ok] invariants hold

The Core Question You’re Answering

“How can I allocate many small objects quickly when they share a lifetime?”

Concepts You Must Understand First

  1. Alignment
    • Use _Alignof and round up
    • Book Reference: “Expert C Programming” Ch. 3
  2. Ownership
    • Arena owns all allocations
    • Book Reference: “Effective C” Ch. 4-6

Questions to Guide Your Design

  1. How will you represent used and capacity?
  2. How do you handle out-of-memory without corrupting state?
  3. What alignment should you guarantee?

Thinking Exercise

“The Alignment Bug”

If you allocate a double at an odd address, what might happen on a strict-alignment CPU?

The Interview Questions They’ll Ask

  1. “What is an arena allocator?”
  2. “Why is alignment necessary?”
  3. “What workloads benefit most from arenas?”

Hints in Layers

Hint 1: Use align_up Always align the pointer before allocating.

Hint 2: Validate after allocation Ensure used <= capacity.

Hint 3: No frees The only valid free is arena_reset().

Books That Will Help

Topic Book Chapter
Custom allocators “C Interfaces and Implementations” Ch. 5
Memory management “Effective C” Ch. 4-6

Common Pitfalls & Debugging

Problem 1: “Allocator returns NULL too early”

  • Why: Alignment math is wrong.
  • Fix: Verify align_up logic.

Problem 2: “Data corruption”

  • Why: Overlapping allocations.
  • Fix: Add asserts for used offset.

Definition of Done

  • Allocator returns aligned pointers
  • Reset reuses memory safely
  • All invariants validated
  • Valgrind clean

Project 4: JSON Parser with Explicit Ownership

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig
  • Coolness Level: Level 4 (Hardcore Tech Flex)
  • Business Potential: Level 3 (Reusable parsing library)
  • Difficulty: Advanced
  • Knowledge Area: Parsing, ownership
  • Software or Tool: GCC/Clang, Valgrind
  • Main Book: “The Practice of Programming” by Kernighan & Pike

What you’ll build: A JSON parser that produces a tree of values with explicit ownership rules and robust error reporting.

Why it teaches data and invariants: JSON parsing forces you to define ownership of strings and nested structures. You also must enforce invariants about token boundaries and nesting depth.

Core challenges you’ll face:

  • String parsing with escapes
  • Ownership model for parsed strings
  • Defensive limits (depth, length)

Real World Outcome

$ ./json_parse '{"name":"Ada","age":36,"tags":["math","systems"]}'
JSON Object
  name: "Ada"
  age: 36
  tags: ["math", "systems"]

$ ./json_parse '{"name": "unterminated}'
Error: Unterminated string at byte 9

The Core Question You’re Answering

“How do I parse a complex format and still keep ownership and invariants correct?”

Concepts You Must Understand First

  1. Ownership
    • Do strings belong to parser or caller?
    • Book Reference: “Effective C” Ch. 4-6
  2. Pointer Validity
    • Avoid dangling pointers into temporary buffers
    • Book Reference: “Expert C Programming” Ch. 3
  3. Parsing and Buffers
    • JSON syntax rules
    • Spec Reference: RFC 8259

Questions to Guide Your Design

  1. Will you copy strings or point into the input buffer?
  2. How will you represent JSON values in a union?
  3. How will you track parsing errors with offsets?

Thinking Exercise

“Borrowed Input”

If your parser stores pointers into the input buffer, what happens when the input buffer is freed? How can you prevent this?

The Interview Questions They’ll Ask

  1. “What are the valid JSON types?”
  2. “How do you handle escape sequences?”
  3. “What is the complexity of your parser?”
  4. “How do you prevent stack overflow from deep nesting?”

Hints in Layers

Hint 1: Start with a tokenizer Turn input into tokens before building the tree.

Hint 2: Use a union for value types Define JSON_STRING, JSON_NUMBER, etc.

Hint 3: Choose ownership explicitly Document whether strings are copied or borrowed.

Books That Will Help

Topic Book Chapter
Parsing “The Practice of Programming” Ch. 3-5
Memory safety “Effective C” Ch. 4-6
JSON spec RFC 8259 All

Common Pitfalls & Debugging

Problem 1: “Segfault on invalid JSON”

  • Why: Missing bounds checks.
  • Fix: Always check input length before reading.

Problem 2: “Leaks after parse error”

  • Why: Partial tree not freed.
  • Fix: Free allocated nodes on error path.

Definition of Done

  • Correctly parses valid JSON per RFC 8259
  • Rejects invalid JSON with useful errors
  • Ownership model documented and tested
  • Valgrind and ASan clean

Project 5: Linked List Database

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig
  • Coolness Level: Level 2 (Practical)
  • Business Potential: Level 2 (Simple storage engine)
  • Difficulty: Intermediate
  • Knowledge Area: Ownership, mutation safety
  • Software or Tool: GCC/Clang, Valgrind
  • Main Book: “Effective C” by Robert Seacord

What you’ll build: A small database where each record is stored in a linked list node, with safe iteration and deletion.

Why it teaches data and invariants: Linked lists require invariants about node linking and pointer validity, especially when deleting nodes during iteration.

Real World Outcome

$ ./listdb
insert key=1 value="alpha"
insert key=2 value="beta"
find key=2 -> "beta"
delete key=1
list: (2, "beta")

The Core Question You’re Answering

“How do I mutate a linked list safely without invalidating my iterator?”

Concepts You Must Understand First

  1. Ownership
    • Each node owns its data
    • Book Reference: “Effective C” Ch. 4-6
  2. Pointer Validity
    • Deleting a node invalidates its pointer
    • Book Reference: “Expert C Programming” Ch. 3

Questions to Guide Your Design

  1. How do you delete a node while iterating?
  2. How will you prevent use-after-free?
  3. How do you represent ownership of node payloads?

Thinking Exercise

“Delete While Iterating”

If you delete the current node, how do you continue the loop safely?

The Interview Questions They’ll Ask

  1. “How do you delete a node in a singly linked list?”
  2. “What invariants does a linked list have?”
  3. “Why is it dangerous to free the current node before saving next?”

Hints in Layers

Hint 1: Save next pointer first Before deleting, store next in a temporary variable.

Hint 2: Use a dummy head Simplify edge cases with a sentinel node.

Books That Will Help

Topic Book Chapter
Memory safety “Effective C” Ch. 4-6
Data structures “Algorithms in C” by Sedgewick Ch. 3

Common Pitfalls & Debugging

Problem 1: “Crash during iteration”

  • Why: Use-after-free on deleted node.
  • Fix: Store next before delete.

Problem 2: “Lost nodes”

  • Why: Not updating links correctly.
  • Fix: Use a sentinel head to simplify.

Definition of Done

  • Insert, delete, and find operations work
  • Iterator deletion is safe
  • No leaks or use-after-free

Project 6: HTTP Server with Request Pooling (Capstone)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig
  • Coolness Level: Level 4 (Hardcore Tech Flex)
  • Business Potential: Level 4 (Service foundation)
  • Difficulty: Expert
  • Knowledge Area: Parsing, ownership, allocators
  • Software or Tool: GCC/Clang, Valgrind
  • Main Book: “UNIX Network Programming, Vol 1” by Stevens

What you’ll build: A minimal HTTP/1.1 server that parses requests, uses a pool allocator per request, and responds with static content.

Why it teaches data and invariants: The server combines all concepts: parsing, buffering, ownership, invariants, and allocation strategies.

Real World Outcome

$ ./httpserver 8080 ./www
[server] listening on :8080
[conn 3] GET /index.html
[conn 3] 200 OK (1284 bytes)

$ curl -v http://localhost:8080/index.html
> GET /index.html HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.5.0
>
< HTTP/1.1 200 OK
< Content-Length: 1284
< Content-Type: text/html
<
<html>...

The Core Question You’re Answering

“How do I parse and serve HTTP requests safely in C without memory bugs?”

Concepts You Must Understand First

  1. Parsing and Buffering
    • HTTP/1.1 request format
    • Spec Reference: RFC 9112
  2. Arena Allocation
    • Request-scoped allocations
    • Book Reference: “C Interfaces and Implementations” Ch. 5
  3. Invariants
    • Parser state invariants
    • Book Reference: “Code Complete” Ch. 8

Questions to Guide Your Design

  1. How will you handle partial reads from the socket?
  2. What is the maximum header size you will accept?
  3. How will you reset per-request allocations safely?

Thinking Exercise

“Partial Request”

If recv() returns half of the headers, what invariant must still hold? How do you keep the buffer consistent?

The Interview Questions They’ll Ask

  1. “Why must HTTP parsers handle partial reads?”
  2. “How do you prevent header buffer overflow?”
  3. “What is a safe maximum request size?”
  4. “How do you validate request line syntax?”

Hints in Layers

Hint 1: Start with single-connection blocking I/O Get a minimal server working before concurrency.

Hint 2: Implement a state machine Use states like REQ_LINE, HEADERS, BODY.

Hint 3: Use an arena for each request All allocations freed when request ends.

Books That Will Help

Topic Book Chapter
HTTP parsing RFC 9112 All
Sockets “UNIX Network Programming” Vol 1 Ch. 4-6
Allocators “C Interfaces and Implementations” Ch. 5

Common Pitfalls & Debugging

Problem 1: “Server hangs on input”

  • Why: Waiting for full request when partial data is available.
  • Fix: Use a buffer and state machine.

Problem 2: “Buffer overflow on long headers”

  • Why: No explicit header size limit.
  • Fix: Enforce max header size and reject large requests.

Problem 3: “Memory leak per request”

  • Why: Forgetting to reset arena.
  • Fix: Reset arena after each request.

Definition of Done

  • Parses HTTP/1.1 request line and headers correctly
  • Handles partial reads safely
  • Uses arena allocator for request memory
  • No memory leaks or invalid accesses

Summary

This sprint teaches the true foundation of safe C: invariants, ownership, memory layout, and parsing discipline. If you complete the projects and can explain the invariants from memory, you will have skills that transfer to any systems codebase.


Additional Resources

  • C11 Committee Draft N1570 (alignment, object lifetime, aliasing rules) - https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
  • RFC 8259 (JSON) - https://www.rfc-editor.org/rfc/rfc8259
  • RFC 9112 (HTTP/1.1) - https://www.rfc-editor.org/rfc/rfc9112
  • GNU Emacs manual on gap buffers - https://www.gnu.org/software/emacs/manual/html_node/elisp/Buffer-Gap.html
  • Valgrind Memcheck documentation - https://valgrind.org/docs/manual/quick-start.html
  • LLVM AddressSanitizer documentation - https://clang.llvm.org/docs/AddressSanitizer.html