Sprint 2: Data and Invariants (C Systems Programming)
Goal: Build a rock-solid mental model for how data really lives in C: which bytes are valid, who owns them, which invariants must always hold, and how to prove those invariants with code and tests. By the end of this sprint you will be able to design data structures with explicit contracts, track lifetimes without a garbage collector, and debug memory failures methodically using sanitizers and instrumentation. You will ship working, documented, and testable C libraries (dynamic array, gap buffer, arena allocator, JSON parser, linked-list database) and a capstone HTTP server whose correctness depends on the invariants you defined.
Introduction
Data and invariants are the rules that keep C programs correct. C gives you direct access to raw memory, but it does not protect you. The only thing standing between your program and undefined behavior is your discipline: the explicit contracts you design, enforce, and test. This guide turns those contracts into buildable systems.
What you will build (by the end of this guide):
- A reusable dynamic array library (
vec.h) with documented invariants and validation - A text editor buffer using a gap buffer (Emacs-style) with correct insertion/deletion
- A fast arena allocator with explicit ownership and alignment guarantees
- A JSON parser with ownership rules for every value and error
- A tiny linked-list database with safe iteration and mutation
- A minimal HTTP/1.1 server with request pooling and robust parsing
Scope (what is included):
- Invariants, preconditions, postconditions, and loop invariants
- Ownership, lifetimes, and pointer validity in C
- Memory layout, alignment, padding, and object representation
- Allocation strategies: malloc/free/realloc, arenas, pools
- Parsing and buffer management in memory-unsafe languages
Out of scope (for this guide):
- Garbage collectors and managed runtimes
- Kernel-level memory allocators
- Full HTTP compliance beyond basic request parsing
- Advanced concurrency (covered in later sprints)
The Big Picture (Mental Model)
Data Structure Contracts in C
Inputs -> Parse -> Allocate -> Mutate -> Validate -> Free
| | | | | |
v v v v v v
(bytes) (buffers) (ownership) (invariants) (asserts) (cleanup)
If any invariant is false at a stable point, the program is already broken.
Key Terms You Will See Everywhere
- Invariant: A condition that must always be true at defined stable points
- Ownership: The rule that says who is responsible for freeing a resource
- Lifetime: The time interval during which a pointer refers to a valid object
- Object representation: The actual bytes that store a C object in memory
- Undefined behavior (UB): Behavior not defined by the C standard
- Stable point: A place where invariants must hold (function boundaries, loop starts)
How to Use This Guide
- Read the Theory Primer first. It is the mini-book that gives you the mental models.
- Build projects in order for your first pass. Each project uses the invariants from earlier ones.
- Before coding, write down invariants. You should know what must be true before you write code.
- Instrument early. Use asserts, Valgrind, and AddressSanitizer from day one.
- Repeat with a twist. After each project, change a constraint (e.g., shrink buffer, shrink arena, malformed JSON) and verify the invariants still hold.
Prerequisites & Background Knowledge
Before starting these projects, you should have foundational understanding in these areas:
Essential Prerequisites (Must Have)
Programming Skills:
- C syntax, control flow, functions, and structs
- Basic pointer use and pointer arithmetic
- Familiarity with
malloc,free, andrealloc - Comfort compiling with
gccorclang
C Fundamentals:
- The difference between stack and heap allocations
- What
sizeofmeans and why it matters - How to read compiler warnings and treat them as errors
Recommended Reading:
- “C Programming: A Modern Approach” by K. N. King (Ch. 11-12, 17)
- “Effective C” by Robert C. Seacord (Ch. 4-6)
Helpful But Not Required
- Basic debugging with
gdborlldb - Using Valgrind or AddressSanitizer
- Familiarity with linked lists and arrays
- Some exposure to parsing or state machines
Self-Assessment Questions
Before starting, ask yourself:
- Can you explain the difference between a pointer and the object it points to?
- Can you write a loop that iterates through a
char *buffer without overflow? - Do you understand why
memcpyis unsafe for overlapping regions? - Do you know what happens to pointers after
realloc? - Can you identify when a pointer is dangling?
If you answered “no” to 1-3: Review chapters in “C Programming: A Modern Approach” first.
Development Environment Setup
Required Tools:
- Linux, macOS, or WSL
- GCC or Clang (C11 or later)
makegdborlldb
Recommended Tools:
- Valgrind Memcheck (memory error detection)
- AddressSanitizer (compiler-based memory checks)
clang-tidyor Clang static analyzer
Testing Your Setup:
# Compiler check
gcc --version
clang --version
# Valgrind check (Linux)
valgrind --version
# AddressSanitizer quick test
cat > asan_test.c <<'C'
#include <stdlib.h>
int main(void) {
int *p = malloc(sizeof(int));
free(p);
*p = 5; // use-after-free
return 0;
}
C
clang -fsanitize=address -g asan_test.c -o asan_test
./asan_test
Valgrind Memcheck detects many memory errors at runtime, and AddressSanitizer detects out-of-bounds and use-after-free with relatively low overhead (often around ~2x runtime slowdown). These tools are part of the standard professional workflow for C and C++ codebases. See https://valgrind.org/docs/manual/quick-start.html and https://clang.llvm.org/docs/AddressSanitizer.html for details.
Time Investment
- Project 1-2: 1-3 days each
- Project 3-4: 2-4 days each
- Project 5: 2-3 days
- Capstone: 5-7 days
Total sprint: 3-6 weeks depending on experience.
Important Reality Check
These projects are designed to surface memory and invariant bugs. You will crash programs. That is the point. The goal is to understand why the crash happened and prevent it next time.
Big Picture / Mental Model
C data structures are safe only if their invariants are true at stable points.
Stable Points (Invariants Must Hold)
[function entry] -> [loop start] -> [after mutation] -> [function exit]
| | | |
v v v v
Preconditions Loop invariants Representation Postconditions
invariants
If ANY invariant is false at a stable point, the program is already incorrect.
Think in layers:
- Representation layer: bytes, alignment, object representation
- Ownership layer: who frees what, and when
- Behavior layer: API contracts and invariants
- Validation layer: asserts, tests, sanitizers
Theory Primer
This is the mini-book. Every concept in the table below has its own section.
Chapter 1: Invariants and Contracts
Fundamentals
An invariant is a rule that must always be true at specific stable points in your program. In C, invariants replace the safety net you get from managed runtimes. They are the only guarantee that data structures stay consistent across operations. An invariant can be as simple as length <= capacity or as subtle as “every node in the list has a valid next pointer”. Every API boundary introduces a contract: the caller promises certain conditions before the call (preconditions), and the callee promises certain conditions after the call (postconditions). If you do not write these down, you will still depend on them, but you will discover them only when your program crashes.
Deep Dive into the Concept
Invariants live at stable points: function entry, function exit, and loop boundaries. Between stable points, invariants may be temporarily violated while you mutate a structure. That is acceptable only if you restore them before the next stable point. This is how you reason about complex operations in small steps. A dynamic array insert, for example, temporarily violates the invariant length <= capacity if you increment length before growing the buffer. A correct implementation ensures that the buffer is grown first, or that the final stable point restores the invariant. In code, the invariant should be enforced by structure: the order of operations, helper functions, and asserts.
Invariants can be grouped into:
- Representation invariants: properties that must always be true for a data structure to be valid (e.g., linked list nodes form an acyclic chain; gap buffer has a contiguous gap).
- Behavioral invariants: properties across operations (e.g.,
pushfollowed bypopreturns the original value). - Resource invariants: constraints about ownership and cleanup (e.g., every allocated pointer has exactly one owner).
Loop invariants are a special case. A loop invariant must be true at the start of each iteration. This is how you prove that a loop does not run off the end of a buffer or corrupt state. In C, failing to maintain loop invariants often results in out-of-bounds access or corrupted metadata. You should be able to state your loop invariants in a sentence. Example: “At the start of each iteration, i is in range [0, len) and all elements in out[0..i-1] are valid.” If you cannot state a loop invariant, the loop is probably not correct.
A practical way to enforce invariants is to centralize mutation. For example, if vec_push is the only function allowed to modify length, you can assert the invariant at the start and end of that function. Another approach is to create a vec_validate() function that checks invariants and can be called in debug builds. In addition, sanitizers and tools like Valgrind are your safety net, but they are not a substitute for invariants: they detect violations, not prevent them.
Invariants are also about interfaces. If your API allows callers to break invariants, your system is already unsafe. This is why you must be explicit about which fields are public and which are private. For example, a struct Vec that exposes length and capacity allows external code to mutate them and invalidate the structure. You should prefer opaque structs or a clear “do not touch” contract. In C, this is a social contract enforced by documentation and discipline.
Finally, invariants are what make testing meaningful. A unit test should not only check results, it should check that the structure is still valid. A dynamic array test that only checks output but never validates internal state will miss many bugs. Invariants are the bridge between correctness and implementation detail.
Another useful practice is to classify invariants as recoverable or fatal. Recoverable invariants can be checked at runtime and returned as errors (for example, input length too large). Fatal invariants should be enforced with asserts because continuing execution would corrupt state (for example, length > capacity inside a vector). This split helps you decide whether to return errors or crash early in debug builds.
How This Fits on Projects
Every project in this sprint starts by writing invariants. If you skip this step, you will be guessing. The dynamic array, gap buffer, arena allocator, JSON parser, linked list database, and HTTP server all depend on invariants to avoid undefined behavior.
Definitions & Key Terms
- Invariant: A rule that must always be true at a stable point
- Precondition: What must be true before a function runs
- Postcondition: What must be true after a function returns
- Loop invariant: Condition true at every loop start
- Representation invariant: Structural constraints of a data structure
Mental Model Diagram
Stable Points and Invariants
[Function Entry] -> [Mutations] -> [Function Exit]
| | |
Preconditions Temporary break Postconditions
(allowed) (must hold)
How It Works (Step-by-Step)
- Define invariants for the data structure.
- At function entry, assert preconditions.
- Perform mutations, allowing invariants to be temporarily false.
- Restore invariants before returning.
- Assert postconditions and validate structure.
Minimal Concrete Example
#include <assert.h>
typedef struct {
int *data;
size_t length;
size_t capacity;
} Vec;
static void vec_validate(const Vec *v) {
assert(v != NULL);
assert(v->capacity == 0 || v->data != NULL);
assert(v->length <= v->capacity);
}
Common Misconceptions
- “If it works in my test, it is correct.” (No: invariants may be broken silently.)
- “Asserts are only for debugging.” (They encode contracts.)
- “Invariants slow code.” (Only if you leave validation in production builds.)
Check-Your-Understanding Questions
- What is the difference between a representation invariant and a behavioral invariant?
- Why are invariants allowed to be temporarily false inside a function?
- What makes a loop invariant useful in C?
Check-Your-Understanding Answers
- Representation invariants describe the structure; behavioral invariants describe operation semantics.
- Because the function body is not a stable point; only entry/exit and loop boundaries are.
- It provides a correctness argument and prevents out-of-bounds access.
Real-World Applications
- Database engines assert invariants on page layouts.
- Compilers validate AST invariants after transformations.
- Networking code uses invariants to ensure buffer sizes and offsets are valid.
Where You Will Apply It
- Project 1: Dynamic Array
- Project 2: Gap Buffer
- Project 3: Arena Allocator
- Project 4: JSON Parser
- Project 5: Linked List Database
- Project 6: HTTP Server
References
- C11 Committee Draft N1570 (assertions, object lifetimes, alignment) - https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
- “Code Complete” by Steve McConnell (Ch. 8)
Key Insight
Invariants are the real type system of C.
Summary
Invariants are non-negotiable truths about your data structures. You must define them, enforce them, and test them. They are how you write trustworthy C code.
Homework/Exercises to Practice the Concept
- Write invariants for a stack and queue.
- Add
validate()functions for each structure. - Write tests that intentionally violate the invariants.
Solutions to the Homework/Exercises
- Stack:
top <= capacity,data != NULLwhen capacity > 0. - Queue:
head,tailwithin[0, capacity), size <= capacity. - Use asserts and confirm the program halts when broken.
Chapter 2: Ownership and Lifetimes
Fundamentals
Ownership answers one question: who is responsible for freeing a resource? In C, every dynamically allocated object must have exactly one owner. If no one owns it, you leak memory. If two owners exist, you will eventually double-free or use-after-free. Lifetimes define the valid time interval for a pointer: from allocation until it is freed or otherwise invalidated. Most C bugs are ownership and lifetime bugs, not algorithmic bugs. If you can clearly describe ownership for every pointer, you can avoid the majority of undefined behavior. The C standard defines when objects exist and when pointers are valid. After free, the pointer becomes invalid, and after realloc, the old pointer may be invalid even if the memory appears unchanged.
Deep Dive into the Concept
Ownership is about responsibility, not just access. You can have multiple pointers to the same object, but only one “owner” is responsible for freeing it. This is the key distinction between ownership and aliasing. Aliasing means two pointers refer to the same memory; ownership means only one of them is allowed to release it. In practice, ownership is a contract across functions and modules. If a function returns a pointer, you must decide whether the caller owns it (must free it), whether ownership remains with the callee (do not free it), or whether ownership is shared through a reference-counting scheme. In C, reference counting is manual and error-prone, so most libraries avoid it unless necessary.
A robust ownership model uses three verbs: create, borrow, and transfer. A function that creates memory (e.g., vec_create) returns ownership to the caller. A function that borrows memory uses it temporarily without freeing it (e.g., vec_get). A function that transfers ownership passes responsibility to another component (e.g., json_take_string). You should document this in every API. Without documentation, the caller will guess, and the guess will be wrong.
Lifetimes are defined by the C standard. An object’s lifetime begins when storage is obtained and ends when the storage is released. This includes heap objects (malloc/free), stack objects (lifetime ends when the function returns), and static objects (lifetime is the entire program). Accessing an object outside of its lifetime is undefined behavior. This matters for struct fields and pointers. A struct can outlive the memory it points to; if you do not manage that lifetime, you will read dangling pointers and corrupt memory.
The realloc function is a perfect example of lifetime hazards. It may return the same pointer, or it may allocate new storage, copy data, and free the old storage. Any pointer derived from the old storage is invalid after realloc if the block moved. This is why realloc is dangerous when you have external pointers into an array. A safe approach is to never store raw pointers to elements in resizable arrays. Store indices instead, or rebuild pointers after reallocation.
Ownership extends beyond memory. File descriptors, sockets, and mutexes are resources with lifetimes and ownership. The same mental model applies: every resource must have exactly one owner or one explicit owner group. If you do not enforce this, you will leak descriptors or close them twice.
Finally, ownership is the basis of designing safe APIs. If you can describe ownership precisely, you can also describe how to test it. For example, a JSON parser that returns a tree structure must state who owns the strings: does the parser allocate them, or does it point into the input buffer? If it points into the input buffer, the caller must keep the buffer alive. This is an ownership contract, and if it is violated, you will see use-after-free bugs.
Ownership also interacts with error handling. If a function partially allocates resources and then fails, it must either free everything before returning or transfer ownership of the partially built structure to the caller with a clear cleanup function. Ambiguous ownership on error paths is where many real-world leaks happen, so you should design error handling as part of your ownership model.
How This Fits on Projects
Ownership rules are central to the dynamic array, arena allocator, JSON parser, linked list DB, and HTTP server. Each project requires explicit ownership documentation.
Definitions & Key Terms
- Owner: The code responsible for freeing a resource
- Borrower: Temporary user without ownership
- Transfer: Passing ownership from one module to another
- Lifetime: Valid time interval of an object
Mental Model Diagram
Ownership Flow
malloc -> owner A -> transfer -> owner B -> free
\-> borrow (no free)
How It Works (Step-by-Step)
- Decide who owns the resource at creation.
- Document whether functions borrow or transfer ownership.
- Ensure every code path frees exactly once.
- Invalidate pointers after free.
Minimal Concrete Example
char *make_copy(const char *s) {
size_t n = strlen(s) + 1;
char *p = malloc(n);
if (!p) return NULL;
memcpy(p, s, n);
return p; // caller owns p
}
Common Misconceptions
- “If two pointers exist, they both own the memory.” (No, only one does.)
- “free(NULL) always means safe.” (It is safe, but it does not fix double-free bugs.)
- “realloc preserves all pointers.” (Only the returned pointer is valid.)
Check-Your-Understanding Questions
- What is the difference between aliasing and ownership?
- Why does
reallocinvalidate old pointers? - What lifetime does a stack-allocated array have?
Check-Your-Understanding Answers
- Aliasing is multiple references; ownership is responsibility to free.
- Because
reallocmay move the block, and the old address may be freed. - It ends when the function returns.
Real-World Applications
- Allocator APIs for databases
- JSON parsers that return owned trees
- Network servers managing buffers per connection
Where You Will Apply It
- Project 1: Dynamic Array
- Project 3: Arena Allocator
- Project 4: JSON Parser
- Project 6: HTTP Server
References
- C11 Committee Draft N1570 (object lifetime and memory management functions) - https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
- SEI CERT C Coding Standard (memory management rules) - https://wiki.sei.cmu.edu/confluence/display/c/SEI+CERT+C+Coding+Standard
Key Insight
Ownership is the only way to make C safe.
Summary
Ownership and lifetime define who frees memory and when pointers are valid. Without a clear ownership model, your program will eventually crash.
Homework/Exercises to Practice the Concept
- Annotate ownership for every pointer in a small program.
- Rewrite a function to clarify ownership transfer.
- Design a function that returns a buffer without leaking memory.
Solutions to the Homework/Exercises
- Use comments like “caller owns” or “borrowed”.
- Change function signature and document ownership explicitly.
- Ensure each allocation path has exactly one free.
Chapter 3: Memory Layout, Alignment, and Object Representation
Fundamentals
C is a language of bytes, not abstract types. Every object has a representation: a sequence of bytes stored in memory. These bytes must follow alignment requirements: certain types must begin at addresses that are multiples of their alignment. The C standard specifies alignment and object representation rules, and violating them results in undefined behavior. This is why sizeof(struct) can be larger than the sum of its fields: padding is inserted to satisfy alignment. Understanding memory layout is essential for writing correct allocators, parsing buffers, and working with low-level data structures. Endianness also matters when you interpret bytes directly, especially for serialization and debugging raw memory.
Deep Dive into the Concept
Alignment exists to make hardware access efficient and safe. Many architectures require that certain types (like 4-byte ints or 8-byte doubles) be aligned to their size. The C standard allows implementations to impose alignment constraints, and the programmer must respect them. Misaligned access can cause crashes or silent performance penalties. This matters in allocators and data structures that manually compute offsets. When you design an arena allocator, you must round up each allocation to the required alignment. When you lay out a struct in a binary file, you must decide whether to store padding bytes or to use explicit packing rules.
Object representation includes not only values but also padding bits. The C standard allows padding bits in integers and padding bytes in structs. This means that copying raw memory with memcmp can be misleading if the padding bytes are uninitialized. The correct way to compare structs is to compare fields, not raw bytes. For example, two structs with identical field values but different padding bytes are equal at the language level, but memcmp may say they are different. This is a subtle but common bug.
Alignment is expressed in C11 with _Alignof and _Alignas, and the standard library provides alignof and alignas via <stdalign.h>. These allow you to compute and enforce alignment constraints. For example, if you are writing an allocator that returns memory for double, you must ensure that the returned pointer is aligned to _Alignof(double). A correct allocator will round up its internal pointer to the next alignment boundary.
Memory layout is also about the difference between arrays and pointers. An array is a contiguous block of elements; a pointer is just an address. sizeof(array) gives you the total size of the array, while sizeof(pointer) gives you the size of the pointer itself. This confusion is a frequent source of buffer overflows.
Struct layout rules also matter when you serialize data. The compiler is allowed to reorder padding but not fields. This means the in-memory layout is predictable but not compact. If you need a compact, portable representation, you must define an explicit serialization format instead of dumping the struct bytes. This is why JSON, HTTP, and other protocols define byte-level representations rather than relying on C struct layouts.
Endianness is another subtle detail. Multi-byte values are stored in memory with either little-endian or big-endian byte order depending on the platform. If you inspect raw bytes or write them to disk, you must decide on a portable encoding (for example, network byte order). This is not just a networking concern; it affects debugging and binary formats too.
Finally, object representation matters for aliasing. If you treat memory as a different type than it was originally written, you may violate the “effective type” rules. This is undefined behavior and can cause subtle compiler optimizations to break your code. This is why you should use memcpy for type punning instead of casting pointers. The standard allows unsigned char to alias any object representation, so it is safe for raw byte access.
How This Fits on Projects
Memory layout is critical for the gap buffer, arena allocator, JSON parser, and HTTP server. Alignment and padding mistakes cause crashes and data corruption.
Definitions & Key Terms
- Alignment: Required address boundary for an object
- Padding: Unused bytes inserted for alignment
- Object representation: Byte-level storage of a C object
- Effective type: The type used to access an object
Mental Model Diagram
Struct Layout (example on 64-bit)
struct Foo { char a; int b; char c; };
Address: 0x00 0x01 0x02 0x03 0x04 0x05 0x06 0x07
[ a ] [pad][pad][pad][ b ][ b ][ b ][ b ]
[ c ] [pad][pad][pad] ...
How It Works (Step-by-Step)
- Determine alignment requirements with
_Alignof. - Round allocation addresses up to the next alignment boundary.
- Avoid comparing structs with
memcmp. - Serialize fields explicitly when writing to disk or network.
Minimal Concrete Example
#include <stdalign.h>
#include <stddef.h>
size_t align_up(size_t n, size_t align) {
return (n + align - 1) & ~(align - 1);
}
size_t double_align = _Alignof(double); // C11
Common Misconceptions
- “Struct size equals sum of fields.” (Padding breaks this.)
- “memcmp is a safe struct comparison.” (Padding bytes can differ.)
- “Casting fixes alignment.” (It does not.)
Check-Your-Understanding Questions
- Why does C insert padding into structs?
- How do you ensure a custom allocator returns aligned memory?
- Why is comparing structs with
memcmpunsafe?
Check-Your-Understanding Answers
- To satisfy alignment requirements.
- Round up pointers to
_Alignof(type)boundaries. - Padding bytes may be uninitialized or differ.
Real-World Applications
- Binary file formats and network protocols
- High-performance allocators
- Memory-mapped data structures
Where You Will Apply It
- Project 2: Gap Buffer
- Project 3: Arena Allocator
- Project 4: JSON Parser
- Project 6: HTTP Server
References
- C11 Committee Draft N1570 (alignment and object representation) - https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
Key Insight
If you do not understand layout, you cannot reason about memory safety.
Summary
Memory layout is about alignment, padding, and object representation. These rules are non-negotiable in C and directly affect correctness.
Homework/Exercises to Practice the Concept
- Write a program that prints
sizeofandoffsetoffor a struct. - Implement
align_upfor a custom allocator. - Serialize a struct to disk without using
fwrite(&struct).
Solutions to the Homework/Exercises
- Use
offsetoffrom<stddef.h>to confirm offsets. - Use bitwise rounding as shown above.
- Write each field explicitly in a defined order.
Chapter 4: Pointer Validity, Aliasing, and Undefined Behavior
Fundamentals
Pointers are only valid when they point to a live object within its lifetime. The C standard specifies what it means for a pointer to be valid, what operations are allowed on it, and when it becomes invalid. Accessing memory outside of an object’s lifetime or bounds is undefined behavior. Aliasing rules (“effective type” and strict aliasing) further restrict how you can access memory through different pointer types. These rules are essential for compiler optimizations, and violating them can produce bugs that only appear in optimized builds. Modern compilers assume you follow these rules even if the code seems to work in debug mode.
Deep Dive into the Concept
Pointer validity begins with object lifetime. An object exists only after storage is allocated and before it is released. A pointer to an object before initialization is not valid for reading; a pointer after free is invalid even if it still has a numerical address. This is why use-after-free bugs are so dangerous: the pointer still “looks” valid but no longer refers to a live object. The C standard explicitly allows compilers to assume that you never access an object outside its lifetime. This assumption enables aggressive optimizations that can rearrange or eliminate code. If you violate it, your program can behave unpredictably.
Aliasing is the ability for two pointers to refer to the same memory. The C standard defines rules about which types can alias each other. The effective type of an object is the type used to store it, and only certain types can be used to access it later. In practice, char * (or unsigned char *) is allowed to inspect any object representation, but unrelated types are not. This is the origin of “strict aliasing” rules: the compiler assumes that pointers of different (non-compatible) types do not refer to the same memory. If you violate this, your code may work at -O0 and fail at -O2.
Pointer arithmetic is also restricted. You can only compute pointers within a single array object (including one past the end). Even comparing pointers is only defined if they point into the same array (or one past the end). This matters in C because many custom data structures treat buffers as arrays and use pointer arithmetic to iterate. If you compute a pointer beyond one-past-the-end, or if you compare pointers from different allocations, you are in undefined behavior territory.
One of the most common issues is realloc invalidation. When you call realloc, the old memory may be freed and replaced by a new block. Every pointer derived from the old block becomes invalid. This is why storing raw element pointers into a resizable array is unsafe. Use indices instead, or rebuild pointers after reallocation.
Another subtlety is uninitialized memory. Reading uninitialized memory is undefined behavior because the object has an indeterminate value. Tools like Valgrind track “definedness” to detect this; AddressSanitizer catches out-of-bounds and use-after-free but not all uninitialized reads. In C, you must initialize every byte you intend to read, including padding bytes if you compare raw memory.
The core mental model is: the C standard defines a narrow set of valid pointer operations. If you stay within it, your program is predictable. If you step outside it, you are no longer writing C; you are relying on undefined behavior. The projects in this sprint will force you to stay within the valid subset by writing explicit bounds checks and invariants.
Finally, be careful with pointer comparisons and type punning. Comparing pointers that do not point into the same array object is undefined, even if their numeric values look ordered. Type punning through unions or casts can also trigger strict-aliasing violations. The safe, portable way to reinterpret bytes is to copy with memcpy into a properly typed object. This may feel indirect, but it keeps you inside the rules the compiler assumes.
How This Fits on Projects
Pointer validity and aliasing rules are central to the dynamic array, gap buffer, JSON parser, and HTTP server. These projects will fail under optimization if you violate strict aliasing or lifetime rules.
Definitions & Key Terms
- Valid pointer: Points to a live object within its lifetime
- One-past-the-end: The only pointer allowed beyond an array
- Effective type: The type used to access an object in memory
- Strict aliasing: Compiler rule that unrelated pointer types do not alias
Mental Model Diagram
Pointer Lifetime
malloc -> valid pointer -> use -> free -> INVALID pointer
Any access after free = undefined behavior
How It Works (Step-by-Step)
- Allocate memory and obtain a pointer.
- Use pointer only within bounds and lifetime.
- After
freeorrealloc, treat old pointers as invalid. - Access bytes through
unsigned char *when inspecting raw data.
Minimal Concrete Example
int *p = malloc(sizeof(int) * 4);
int *q = p + 4; // one-past-end is OK
// *(q) is NOT OK
free(p);
// *p is UB now
Common Misconceptions
- “If it works in debug, it will work in release.” (UB often shows up only in optimized builds.)
- “Pointers are just integers.” (Not in C semantics.)
- “memcmp on structs is always safe.” (Padding bytes and effective type rules matter.)
Check-Your-Understanding Questions
- Why is reading uninitialized memory undefined behavior?
- What does strict aliasing allow and forbid?
- Why is one-past-the-end pointer allowed but not dereferenceable?
Check-Your-Understanding Answers
- Because the value is indeterminate and the standard allows anything to happen.
- It allows aliasing by compatible types and
unsigned char, forbids unrelated types. - It allows pointer arithmetic and comparisons without dereference.
Real-World Applications
- Optimized C libraries rely on strict aliasing assumptions
- Performance-critical code uses pointer arithmetic safely
- Sanitizers detect invalid access patterns
Where You Will Apply It
- Project 1: Dynamic Array
- Project 2: Gap Buffer
- Project 4: JSON Parser
- Project 6: HTTP Server
References
- C11 Committee Draft N1570 (object lifetime, pointer arithmetic, effective type) - https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
Key Insight
Undefined behavior is a contract violation, not a runtime error.
Summary
Pointer validity and aliasing rules define the safe subset of C. Violating them means your program is not defined by the standard.
Homework/Exercises to Practice the Concept
- Write a small program that violates strict aliasing and observe behavior at
-O0vs-O2. - Create a test that uses
reallocand invalidates old pointers. - Use Valgrind to detect uninitialized reads.
Solutions to the Homework/Exercises
- Use a
float *andint *to alias the same memory (observe unexpected results). - Store a pointer to
vec->data[0], thenrealloc, then check pointer validity. - Run
valgrind --track-origins=yesto see uninitialized reads.
Chapter 5: Allocation Strategies and Custom Allocators
Fundamentals
The default allocator (malloc, free, realloc) is general-purpose and flexible, but not always optimal. For high-performance or predictable lifetimes, custom allocators like arenas and pools are better. An arena allocator gives you fast, linear allocation and bulk free. A pool allocator gives you fixed-size blocks with constant-time allocation and deallocation. Understanding the standard allocation functions is required: malloc returns a block of uninitialized storage or NULL, free releases it, and realloc may move it. These functions are defined by the C standard and are the foundation for all custom allocators. Fragmentation, allocator locks, and metadata overhead are why special-purpose allocators can outperform malloc for specific workloads.
Deep Dive into the Concept
Allocators exist to manage fragmentation, performance, and ownership. The general-purpose allocator must support arbitrary allocation sizes and lifetimes, which means it uses metadata, free lists, and often global locks. This overhead can be significant when you are allocating thousands of small objects in a tight loop (e.g., parsing HTTP headers or JSON tokens). An arena allocator sidesteps this by allocating a large block up front and then handing out chunks with a simple pointer bump. This is extremely fast because it avoids per-allocation metadata and search costs. The trade-off is that you can only free everything at once. That is fine for workloads with a clear phase boundary, such as “parse request, then discard all allocations.”
Pool allocators are another strategy. They manage a fixed number of blocks of identical size. Each allocation returns a block from the free list, and each free returns the block to the list. This is ideal for fixed-size objects like list nodes or connection structs. Pools are fast and predictable but do not support variable-sized objects without additional logic.
Alignment is critical for custom allocators. The C standard specifies alignment requirements for each type, and a correct allocator must ensure that returned pointers satisfy those requirements. A simple strategy is to round up every allocation to the next power-of-two boundary or to _Alignof(max_type). The arena allocator in this sprint will implement align_up to satisfy alignment and avoid undefined behavior.
Another allocator concern is reallocation. If you implement a custom allocator, you must decide whether realloc semantics are supported or prohibited. In arenas, realloc can sometimes be implemented only for the most recent allocation; otherwise you must allocate a new block and copy. This affects API design. You should clearly document the limitations.
Ownership is also part of allocator design. Arena allocations are typically owned by the arena itself. The caller borrows pointers and must not call free on them. Instead, the arena is reset in bulk. This is a different ownership model than malloc, and mixing the two leads to undefined behavior. In your projects, you will design APIs that make this explicit.
Another important detail is allocator metadata. General-purpose allocators often use size classes, headers, and free lists. This metadata allows fast allocation but adds overhead and can cause fragmentation. If you build a pool allocator, you can store the free list inside the freed blocks themselves, which keeps metadata minimal. If you need zeroed memory, calloc is defined to return zero-initialized storage; if you reimplement this behavior, you must decide whether to zero eagerly or lazily. These choices affect performance and correctness.
Finally, you will use tools to validate your allocator. Valgrind can detect leaks and invalid accesses, but it may not understand your custom allocator unless you provide annotations. AddressSanitizer detects invalid access patterns but cannot see the internal invariants of your allocator. This means your own validation code is critical. You should implement allocator self-checks (e.g., “used <= capacity”) and run them in debug mode.
How This Fits on Projects
The arena allocator project is the core demonstration. The HTTP server also uses arenas for request-scoped allocations.
Definitions & Key Terms
- Arena: Linear allocator with bulk reset
- Pool: Fixed-size block allocator
- Fragmentation: Inefficient memory usage due to holes
- Alignment: Address boundary requirements for objects
Mental Model Diagram
Arena Allocator
[base ------------------------------ end]
^ used
allocate: return base+used, then used += size
reset: used = 0
How It Works (Step-by-Step)
- Allocate a big backing block (
malloc). - Track a
usedoffset. - On each allocation, align
usedand return the pointer. - On reset, set
used = 0.
Minimal Concrete Example
typedef struct {
unsigned char *base;
size_t used;
size_t capacity;
} Arena;
void *arena_alloc(Arena *a, size_t size, size_t align) {
size_t p = (a->used + align - 1) & ~(align - 1);
if (p + size > a->capacity) return NULL;
void *ptr = a->base + p;
a->used = p + size;
return ptr;
}
Common Misconceptions
- “Arena allocations can be freed individually.” (No, only reset.)
- “Alignment does not matter on x86.” (It still matters for performance and correctness.)
- “malloc always returns zeroed memory.” (It does not.)
Check-Your-Understanding Questions
- Why is an arena allocator faster than malloc for small allocations?
- What is the main trade-off of arenas?
- How do you enforce alignment in a custom allocator?
Check-Your-Understanding Answers
- It avoids metadata and free-list searches.
- You cannot free individual allocations.
- Round up addresses to the required alignment.
Real-World Applications
- Game engines use arenas for frame allocations
- HTTP servers use arenas for request lifetimes
- Compilers use arenas for AST nodes
Where You Will Apply It
- Project 3: Arena Allocator
- Project 6: HTTP Server
References
- C11 Committee Draft N1570 (malloc/free/realloc semantics) - https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
- Valgrind Memcheck documentation - https://valgrind.org/docs/manual/quick-start.html
- LLVM AddressSanitizer documentation - https://clang.llvm.org/docs/AddressSanitizer.html
Key Insight
Allocators encode ownership policy.
Summary
Custom allocators trade flexibility for speed and predictability. They require explicit alignment and ownership models.
Homework/Exercises to Practice the Concept
- Write an arena allocator and add a
resetfunction. - Implement a fixed-size pool for list nodes.
- Measure performance vs
mallocusing a benchmark.
Solutions to the Homework/Exercises
- Use
usedoffset andalign_upfunction. - Maintain a free list inside each block.
- Time
Nallocations and compare.
Chapter 6: Buffering and Parsing as Ownership
Fundamentals
Parsing is a memory problem as much as a syntax problem. When you parse input, you must decide how to store it, how long it lives, and who owns it. A gap buffer is a text editing structure that represents the buffer as a contiguous array with a “gap” for efficient insertions and deletions. JSON parsing involves building a tree of values with explicit ownership for strings, numbers, and nested objects. HTTP parsing requires reading a byte stream, detecting boundaries, and enforcing strict limits to avoid buffer overflows. These are all data-structure problems, not just parsing problems. In practice, the parser’s buffer limits and error paths are part of its invariants.
Deep Dive into the Concept
Buffering and parsing are where invariants meet real input. A gap buffer keeps a contiguous gap in the middle of the array; insertions occur by filling the gap, and deletions enlarge it. The invariants include: the buffer is contiguous, the gap has non-negative size, and the text length equals total size minus gap size. Emacs uses this structure to support efficient editing, and the concept is well-documented in the GNU Emacs manual.
JSON parsing in C raises ownership questions immediately. When you parse a JSON string, do you copy it into a new allocation, or do you point into the original input buffer? If you point into the input, the caller must keep the input buffer alive for the lifetime of the parsed tree. If you copy, you allocate many small strings and must free them later. Both are valid, but you must document the ownership contract. The RFC 8259 JSON specification defines the syntax and data model. For example, JSON numbers are base-10 with an optional leading minus, optional fraction, and optional exponent; NaN/Infinity are not valid JSON, and trailing commas are not allowed.
HTTP parsing is similarly constrained by specification. HTTP/1.1 is a text-based protocol with strict rules for request lines, headers, and line endings. RFC 9112 specifies the message syntax and parsing rules. The request line is method SP request-target SP HTTP-version CRLF, header fields are field-name: field-value lines terminated by CRLF, and field names are case-insensitive. In a C implementation, you must handle partial reads, buffer limits, and malformed input without overflow. This is exactly where invariants help: “buffer contains no more than MAX_HEADER bytes,” “request line must contain method, target, version,” and “header lines end with CRLF.”
Buffering is also about streaming. TCP does not preserve message boundaries, so you will often receive partial requests. You must design a buffer that can accumulate data until a full message is available. This is where state machines and invariants intersect: your parser has states like READING_REQUEST_LINE or READING_HEADERS, and each state has its own invariants. The capstone HTTP server will demonstrate this.
Finally, parsing is a security boundary. Buffer overflows and use-after-free bugs in parsers are some of the most exploited vulnerabilities in software. Defensive design requires explicit limits (maximum header size, maximum nesting depth, maximum string length) and clear error handling. These limits are not optional in production code; they are essential for safety.
JSON parsing also benefits from a streaming mindset. Even if you parse the whole input at once, it helps to structure the parser as a state machine that can resume. This makes it easier to enforce depth limits, avoid stack overflows, and produce precise error messages with offsets. For HTTP, partial reads and pipelined requests mean your buffer may contain multiple requests or half of one. Your invariants should capture these realities, for example: “buffer contains zero or more complete requests followed by an optional prefix of the next request.”
How This Fits on Projects
The gap buffer project uses a classic text editor structure. The JSON parser implements an explicit ownership model. The HTTP server uses a streaming parser with request pooling.
Definitions & Key Terms
- Gap buffer: Text buffer with a movable gap for fast edits
- Streaming parser: Parser that handles partial input
- Ownership model: Rules for who owns parsed data
- CRLF: Carriage return + line feed (
\r\n) line endings in HTTP
Mental Model Diagram
Gap Buffer Layout
[TEXT....][GAP.....][TEXT....]
^beg ^gap_end ^end
Insert: fill gap
Delete: enlarge gap
How It Works (Step-by-Step)
- Represent input as a buffer with explicit limits.
- Track positions (gap, read offset, parse state).
- Move gap or advance parse pointer as needed.
- Allocate parsed values with explicit ownership.
- Enforce limits and return errors on violations.
Minimal Concrete Example
// HTTP request line parse (simplified)
char *end = memmem(buf, len, "\r\n", 2);
if (!end) return NEED_MORE_DATA;
// Split method, target, version by spaces
Common Misconceptions
- “TCP read returns a full request.” (No, it is a byte stream.)
- “Parsing is just string splitting.” (It is memory management too.)
- “You can ignore malformed input.” (Malformed input is the norm in attacks.)
Check-Your-Understanding Questions
- Why does a gap buffer allow faster insertions?
- What are the ownership choices for parsed JSON strings?
- Why must HTTP parsers handle partial reads?
Check-Your-Understanding Answers
- The gap means insertions can happen without shifting all text.
- Either copy strings (parser owns) or reference input (caller owns).
- TCP is stream-based and may deliver partial lines.
Real-World Applications
- Text editors (Emacs, classic editors)
- JSON configuration loaders
- HTTP servers and proxies
Where You Will Apply It
- Project 2: Gap Buffer
- Project 4: JSON Parser
- Project 6: HTTP Server
References
- GNU Emacs manual on gap buffers - https://www.gnu.org/software/emacs/manual/html_node/elisp/Buffer-Gap.html
- RFC 8259 (JSON) - https://www.rfc-editor.org/rfc/rfc8259
- RFC 9112 (HTTP/1.1) - https://www.rfc-editor.org/rfc/rfc9112
Key Insight
Parsing is an ownership problem wearing a syntax mask.
Summary
Buffers and parsers must be designed around invariants, ownership, and limits. This is how you prevent memory bugs in C.
Homework/Exercises to Practice the Concept
- Implement a gap buffer with
insertanddelete. - Parse a JSON number and store it with explicit ownership.
- Write a streaming HTTP parser that handles partial reads.
Solutions to the Homework/Exercises
- Track
gap_start,gap_end, and ensuregap_end >= gap_start. - Store numbers in a union; strings are allocated or borrowed.
- Keep a buffer and state; only parse when
\r\nis present.
Glossary
- Alignment: Memory address boundary requirement for objects
- Arena allocator: Allocator that frees all allocations at once
- Borrowed pointer: Pointer that must not be freed by the caller
- Dangling pointer: Pointer to an object whose lifetime ended
- Effective type: Type used to access object representation
- Gap buffer: Array with a movable gap for efficient edits
- Invariant: Condition that must always hold at stable points
- Ownership: Responsibility for freeing a resource
- Padding: Unused bytes added to satisfy alignment
- Strict aliasing: Rule restricting access through incompatible types
- Undefined behavior: Behavior not defined by the C standard
Why Data and Invariants Matter
The Modern Problem It Solves
Memory safety bugs remain one of the most costly classes of vulnerabilities in software. C and C++ are powerful but memory-unsafe, which means errors like buffer overflows and use-after-free can become exploitable security issues. Governments and major software vendors have publicly called for transitions toward memory-safe strategies because of the scale of the problem.
Recent guidance from CISA and international partners explicitly emphasizes memory safety as a top security priority, noting that memory safety vulnerabilities remain a major portion of reported issues in memory-unsafe codebases.
Real-world impact (recent sources):
- Google reported that memory safety issues accounted for 76% of Android vulnerabilities in 2019 and 24% in 2024, showing both the scale of the problem and the impact of memory-safe strategies (Google Online Security Blog, Sep 2024). See: https://security.googleblog.com/2024/09/eliminating-memory-safety-vulnerabilities-Android.html
- CISA and international partners stated that two-thirds of reported vulnerabilities in memory-unsafe languages still relate to memory safety issues (CISA, 2023). See: https://www.cisa.gov/case-memory-safe-roadmaps
Old Reality (C/C++ Unsafe) New Reality (Memory-Safe Strategy)
UB bugs -> exploits -> patching Invariants -> safe-by-design -> fewer bugs
Context and Evolution (Optional)
The C standard intentionally leaves some behavior undefined to allow compiler optimizations. Those optimizations are beneficial, but they make memory bugs extremely dangerous. This is why modern engineering increasingly emphasizes explicit invariants, ownership models, and defensive validation.
Concept Summary Table
| Concept | What You Need to Internalize |
|---|---|
| Invariants and Contracts | Stable-point rules that make a data structure valid |
| Ownership and Lifetimes | Exactly one owner per resource; pointers valid only within lifetime |
| Memory Layout and Alignment | Object representation and alignment rules control safe memory access |
| Pointer Validity and Aliasing | Strict rules for safe pointer use and avoiding undefined behavior |
| Allocation Strategies | When to use malloc vs arena/pool and how it affects ownership |
| Buffering and Parsing | Parsers are ownership problems; input must be buffered safely |
Project-to-Concept Map
| Project | What It Builds | Primer Chapters It Uses |
|---|---|---|
| Project 1: Dynamic Array | Safe resizable array | 1, 2, 4 |
| Project 2: Gap Buffer | Editor buffer with gap | 1, 3, 6 |
| Project 3: Arena Allocator | Fast bulk allocation | 2, 3, 5 |
| Project 4: JSON Parser | Owned parse tree | 1, 2, 4, 6 |
| Project 5: Linked List DB | Safe mutation & iteration | 1, 2, 4 |
| Project 6: HTTP Server | Streaming parser w/ pools | 1, 2, 3, 5, 6 |
Deep Dive Reading by Concept
Fundamentals and Contracts
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Invariants | “Code Complete” by Steve McConnell - Ch. 8 | Preconditions, postconditions, and defensive checks |
| Ownership | “Effective C” by Robert Seacord - Ch. 4-6 | Safe memory management patterns |
Memory and Layout
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Memory layout | “Computer Systems: A Programmer’s Perspective” - Ch. 2, 3 | Understanding object representation |
| Alignment | “Expert C Programming” by Peter van der Linden - Ch. 3 | Practical alignment pitfalls |
Parsing and Buffers
| Concept | Book & Chapter | Why This Matters |
|---|---|---|
| Parsing in C | “The Practice of Programming” by Kernighan & Pike - Ch. 3-5 | Robust parsing techniques |
| State machines | “Crafting Interpreters” by Bob Nystrom - Ch. 4 | Implementing parsers as state machines |
Specifications
| Concept | Specification | Why This Matters |
|---|---|---|
| JSON syntax | RFC 8259 | Defines the correct JSON format |
| HTTP/1.1 | RFC 9112 | Defines HTTP request syntax and parsing rules |
Quick Start
Feeling overwhelmed? Start here.
Day 1 (4 hours):
- Read Chapter 1 and Chapter 2 of the Theory Primer.
- Build Project 1 (dynamic array) with only push/pop.
- Add
vec_validate()and run it after each operation.
Day 2 (4 hours):
- Add
reallocgrowth and bounds checks to Project 1. - Run Valgrind or AddressSanitizer on your tests.
- Read Chapter 5 (allocation strategies).
By the end of 48 hours, you will understand ownership and invariants, which is 80% of this sprint.
Recommended Learning Paths
Path 1: The “C Fundamentals” Path
Best for: Newer C programmers
- Project 1 -> Project 3 -> Project 2
- Then Project 4
- Capstone after all
Path 2: The “Systems Engineer” Path
Best for: Comfortable with C, want deeper systems intuition
- Project 3 -> Project 1 -> Project 4
- Then Project 2 and 5
- Capstone last
Path 3: The “Interview Prep” Path
Best for: Whiteboard-level mastery
- Project 1 and 3
- Project 2 (gap buffer) for invariants
- Project 4 for parsing
Path 4: The Completionist
Best for: Full mastery
- Projects 1-6 in order
- Add an additional feature to each project
Success Metrics
You can claim mastery when you can:
- State invariants for each data structure without looking at code
- Run all projects with Valgrind-clean or ASan-clean results
- Explain ownership for every pointer in your codebase
- Handle malformed input without crashes or leaks
- Implement any project from scratch within 2-4 hours
Project Overview Table
| # | Project | Difficulty | Time | Primary Focus |
|---|---|---|---|---|
| 1 | Safe Dynamic Array (vec.h) |
Intermediate | 1-2 days | Invariants, ownership, realloc |
| 2 | Gap Buffer Editor | Advanced | 2-3 days | Invariants, memory layout |
| 3 | Arena Allocator | Advanced | 2-3 days | Ownership, alignment |
| 4 | JSON Parser | Advanced | 3-4 days | Ownership, parsing, validation |
| 5 | Linked List Database | Intermediate | 2-3 days | Safe mutation, iteration |
| 6 | HTTP Server (Capstone) | Expert | 5-7 days | Parsing, pools, invariants |
Project List
Project 1: Safe Dynamic Array Library (vec.h)
- Main Programming Language: C
- Alternative Programming Languages: Rust, Zig
- Coolness Level: Level 2 (Practical and Useful)
- Business Potential: Level 2 (Reusable library)
- Difficulty: Intermediate
- Knowledge Area: Data structures, invariants
- Software or Tool: GCC/Clang, Valgrind
- Main Book: “C Interfaces and Implementations” by David Hanson
What you’ll build: A resizable array with explicit invariants, push/pop, bounds-checked access, and safe growth using realloc.
Why it teaches data and invariants: A vector forces you to define and enforce invariants about capacity, length, and ownership of the underlying buffer.
Core challenges you’ll face:
- Defining and validating invariants
- Handling
reallocsafely - Preventing out-of-bounds access
Real World Outcome
You will have a small library you can include in any C project.
What you will see:
- A
vec.handvec.cwith clear API contracts - Debug builds that assert invariants on every operation
- Clean Valgrind and ASan runs
Command Line Outcome Example:
$ make test
cc -Wall -Wextra -g vec.c vec_test.c -o vec_test
$ ./vec_test
[vec] push 1
[vec] push 2
[vec] pop -> 2
[vec] len=1 cap=8 OK
$ valgrind --leak-check=full ./vec_test
==12345== HEAP SUMMARY:
==12345== in use at exit: 0 bytes in 0 blocks
==12345== total heap usage: 3 allocs, 3 frees
==12345== All heap blocks were freed -- no leaks are possible
The Core Question You’re Answering
“How do I keep a resizable array correct across reallocations and mutations?”
Concepts You Must Understand First
- Invariants
- What must always be true about length and capacity?
- Book Reference: “Code Complete” Ch. 8
- Ownership and Lifetimes
- Who owns the buffer? When is it freed?
- Book Reference: “Effective C” Ch. 4-6
- Pointer Validity
- What happens to pointers after
realloc? - Book Reference: “Expert C Programming” Ch. 3
- What happens to pointers after
Questions to Guide Your Design
- How will you prevent out-of-bounds access?
- When do you grow the buffer, and by how much?
- How do you handle
reallocfailure safely?
Thinking Exercise
“The Realloc Trap”
int *p = vec->data;
vec_push(vec, 42); // may realloc
// Is p still valid?
Questions:
- Under what conditions is
pvalid? - How should your API prevent this misuse?
The Interview Questions They’ll Ask
- “What invariants define a dynamic array?”
- “Why is
reallocdangerous?” - “What growth factor should you use and why?”
- “How do you ensure amortized O(1) push?”
Hints in Layers
Hint 1: Start with invariants
Define length <= capacity, data != NULL when capacity > 0.
Hint 2: Use realloc carefully
Always assign to a temporary pointer first.
int *tmp = realloc(vec->data, new_cap * sizeof(int));
if (!tmp) return false; // keep old buffer intact
vec->data = tmp;
Hint 3: Add validation
Call vec_validate() at entry/exit in debug builds.
Hint 4: Test with Valgrind
Run valgrind --leak-check=full ./vec_test after every change.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Invariants | “Code Complete” | Ch. 8 |
| Memory safety | “Effective C” | Ch. 4-6 |
| Dynamic arrays | “C Interfaces and Implementations” | Ch. 4-5 |
Common Pitfalls & Debugging
Problem 1: “Segmentation fault after push”
- Why:
reallocfailed and old pointer lost. - Fix: Use a temp pointer and only assign on success.
- Quick test: Force small allocations and check error paths.
Problem 2: “Use-after-free”
- Why: External pointer to array element after growth.
- Fix: Document that element pointers are invalid after mutation.
Problem 3: “Length exceeds capacity”
- Why: Incrementing length before growth.
- Fix: Ensure growth happens before length update.
Definition of Done
- Invariants are documented and enforced
push,pop,get,setwork correctlyreallocfailure path is safe- Valgrind and ASan are clean
Project 2: Text Editor Gap Buffer
- Main Programming Language: C
- Alternative Programming Languages: Rust, Zig
- Coolness Level: Level 3 (Genuinely Clever)
- Business Potential: Level 2 (Editor component)
- Difficulty: Advanced
- Knowledge Area: Data structure invariants
- Software or Tool: GCC/Clang, Valgrind
- Main Book: “C Interfaces and Implementations” by David Hanson
What you’ll build: A gap buffer that supports efficient insert and delete operations, with explicit invariants and validation.
Why it teaches data and invariants: The gap buffer is a canonical data structure whose correctness is defined entirely by invariants (gap location, gap size, text length).
Core challenges you’ll face:
- Maintaining a contiguous gap
- Moving the gap efficiently
- Avoiding off-by-one errors
Real World Outcome
What you will see:
- A buffer that supports fast insertions at the cursor
- Correct behavior when inserting large text blocks
- Explicit validation of gap invariants
Command Line Outcome Example:
$ ./gapedit
[insert] "Hello"
[buffer] "Hello|" (gap size 11)
[move cursor] to start
[insert] "Say "
[buffer] "Say Hello|" (gap size 7)
[delete] 3 chars
[buffer] "Say |lo" (gap size 10)
The Core Question You’re Answering
“How can I edit a large text buffer efficiently without reallocating every time?”
Concepts You Must Understand First
- Invariants
- Gap must remain contiguous
- Text length = total size - gap size
- Book Reference: “Code Complete” Ch. 8
- Memory Layout
- Buffer is a flat array with a hole
- Book Reference: “Expert C Programming” Ch. 3
Questions to Guide Your Design
- How will you represent the gap (start/end indices)?
- When do you move the gap vs expand it?
- How do you validate invariants after edits?
Thinking Exercise
“Gap Movement”
If the cursor moves from position 100 to 500, how many bytes should you move? What happens if you move the wrong direction?
The Interview Questions They’ll Ask
- “Why is a gap buffer efficient for insertions?”
- “What are the key invariants?”
- “How is a gap buffer different from a rope?”
Hints in Layers
Hint 1: Represent the gap with two indices
gap_start and gap_end are enough.
Hint 2: Move gap with memmove
Use memmove because regions overlap.
memmove(buf + new_gap_end, buf + gap_end, move_size);
Hint 3: Validate after every edit
Check gap_end >= gap_start and text_len + gap_size == capacity.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Data structure invariants | “Code Complete” | Ch. 8 |
| Memory layout | “Expert C Programming” | Ch. 3 |
Common Pitfalls & Debugging
Problem 1: “Corrupted text after insertion”
- Why: Using
memcpyfor overlapping regions. - Fix: Use
memmove.
Problem 2: “Gap disappears”
- Why: Wrong updates to
gap_startorgap_end. - Fix: Add assertions after each mutation.
Definition of Done
- All invariants documented
- Insert/delete works at any cursor location
- Gap moves correctly with memmove
- Valgrind and ASan clean
Project 3: Arena Allocator
- Main Programming Language: C
- Alternative Programming Languages: Rust, Zig
- Coolness Level: Level 3 (Genuinely Clever)
- Business Potential: Level 3 (Reusable infrastructure)
- Difficulty: Advanced
- Knowledge Area: Allocation strategies
- Software or Tool: GCC/Clang, Valgrind
- Main Book: “C Interfaces and Implementations” by David Hanson
What you’ll build: A bump-pointer arena allocator with alignment, reset, and optional debug validation.
Why it teaches data and invariants: You will define invariants about used space and alignment, and enforce ownership rules (arena owns all allocations).
Core challenges you’ll face:
- Alignment rounding
- Out-of-memory handling
- Ownership clarity (no individual frees)
Real World Outcome
$ ./arena_test
[arena] init size=4096
[alloc] 64 bytes -> 0x1000
[alloc] 32 bytes -> 0x1040
[reset]
[alloc] 128 bytes -> 0x1000
[ok] invariants hold
The Core Question You’re Answering
“How can I allocate many small objects quickly when they share a lifetime?”
Concepts You Must Understand First
- Alignment
- Use
_Alignofand round up - Book Reference: “Expert C Programming” Ch. 3
- Use
- Ownership
- Arena owns all allocations
- Book Reference: “Effective C” Ch. 4-6
Questions to Guide Your Design
- How will you represent
usedandcapacity? - How do you handle out-of-memory without corrupting state?
- What alignment should you guarantee?
Thinking Exercise
“The Alignment Bug”
If you allocate a double at an odd address, what might happen on a strict-alignment CPU?
The Interview Questions They’ll Ask
- “What is an arena allocator?”
- “Why is alignment necessary?”
- “What workloads benefit most from arenas?”
Hints in Layers
Hint 1: Use align_up
Always align the pointer before allocating.
Hint 2: Validate after allocation
Ensure used <= capacity.
Hint 3: No frees
The only valid free is arena_reset().
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Custom allocators | “C Interfaces and Implementations” | Ch. 5 |
| Memory management | “Effective C” | Ch. 4-6 |
Common Pitfalls & Debugging
Problem 1: “Allocator returns NULL too early”
- Why: Alignment math is wrong.
- Fix: Verify
align_uplogic.
Problem 2: “Data corruption”
- Why: Overlapping allocations.
- Fix: Add asserts for
usedoffset.
Definition of Done
- Allocator returns aligned pointers
- Reset reuses memory safely
- All invariants validated
- Valgrind clean
Project 4: JSON Parser with Explicit Ownership
- Main Programming Language: C
- Alternative Programming Languages: Rust, Zig
- Coolness Level: Level 4 (Hardcore Tech Flex)
- Business Potential: Level 3 (Reusable parsing library)
- Difficulty: Advanced
- Knowledge Area: Parsing, ownership
- Software or Tool: GCC/Clang, Valgrind
- Main Book: “The Practice of Programming” by Kernighan & Pike
What you’ll build: A JSON parser that produces a tree of values with explicit ownership rules and robust error reporting.
Why it teaches data and invariants: JSON parsing forces you to define ownership of strings and nested structures. You also must enforce invariants about token boundaries and nesting depth.
Core challenges you’ll face:
- String parsing with escapes
- Ownership model for parsed strings
- Defensive limits (depth, length)
Real World Outcome
$ ./json_parse '{"name":"Ada","age":36,"tags":["math","systems"]}'
JSON Object
name: "Ada"
age: 36
tags: ["math", "systems"]
$ ./json_parse '{"name": "unterminated}'
Error: Unterminated string at byte 9
The Core Question You’re Answering
“How do I parse a complex format and still keep ownership and invariants correct?”
Concepts You Must Understand First
- Ownership
- Do strings belong to parser or caller?
- Book Reference: “Effective C” Ch. 4-6
- Pointer Validity
- Avoid dangling pointers into temporary buffers
- Book Reference: “Expert C Programming” Ch. 3
- Parsing and Buffers
- JSON syntax rules
- Spec Reference: RFC 8259
Questions to Guide Your Design
- Will you copy strings or point into the input buffer?
- How will you represent JSON values in a union?
- How will you track parsing errors with offsets?
Thinking Exercise
“Borrowed Input”
If your parser stores pointers into the input buffer, what happens when the input buffer is freed? How can you prevent this?
The Interview Questions They’ll Ask
- “What are the valid JSON types?”
- “How do you handle escape sequences?”
- “What is the complexity of your parser?”
- “How do you prevent stack overflow from deep nesting?”
Hints in Layers
Hint 1: Start with a tokenizer Turn input into tokens before building the tree.
Hint 2: Use a union for value types
Define JSON_STRING, JSON_NUMBER, etc.
Hint 3: Choose ownership explicitly Document whether strings are copied or borrowed.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Parsing | “The Practice of Programming” | Ch. 3-5 |
| Memory safety | “Effective C” | Ch. 4-6 |
| JSON spec | RFC 8259 | All |
Common Pitfalls & Debugging
Problem 1: “Segfault on invalid JSON”
- Why: Missing bounds checks.
- Fix: Always check input length before reading.
Problem 2: “Leaks after parse error”
- Why: Partial tree not freed.
- Fix: Free allocated nodes on error path.
Definition of Done
- Correctly parses valid JSON per RFC 8259
- Rejects invalid JSON with useful errors
- Ownership model documented and tested
- Valgrind and ASan clean
Project 5: Linked List Database
- Main Programming Language: C
- Alternative Programming Languages: Rust, Zig
- Coolness Level: Level 2 (Practical)
- Business Potential: Level 2 (Simple storage engine)
- Difficulty: Intermediate
- Knowledge Area: Ownership, mutation safety
- Software or Tool: GCC/Clang, Valgrind
- Main Book: “Effective C” by Robert Seacord
What you’ll build: A small database where each record is stored in a linked list node, with safe iteration and deletion.
Why it teaches data and invariants: Linked lists require invariants about node linking and pointer validity, especially when deleting nodes during iteration.
Real World Outcome
$ ./listdb
insert key=1 value="alpha"
insert key=2 value="beta"
find key=2 -> "beta"
delete key=1
list: (2, "beta")
The Core Question You’re Answering
“How do I mutate a linked list safely without invalidating my iterator?”
Concepts You Must Understand First
- Ownership
- Each node owns its data
- Book Reference: “Effective C” Ch. 4-6
- Pointer Validity
- Deleting a node invalidates its pointer
- Book Reference: “Expert C Programming” Ch. 3
Questions to Guide Your Design
- How do you delete a node while iterating?
- How will you prevent use-after-free?
- How do you represent ownership of node payloads?
Thinking Exercise
“Delete While Iterating”
If you delete the current node, how do you continue the loop safely?
The Interview Questions They’ll Ask
- “How do you delete a node in a singly linked list?”
- “What invariants does a linked list have?”
- “Why is it dangerous to free the current node before saving next?”
Hints in Layers
Hint 1: Save next pointer first
Before deleting, store next in a temporary variable.
Hint 2: Use a dummy head Simplify edge cases with a sentinel node.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Memory safety | “Effective C” | Ch. 4-6 |
| Data structures | “Algorithms in C” by Sedgewick | Ch. 3 |
Common Pitfalls & Debugging
Problem 1: “Crash during iteration”
- Why: Use-after-free on deleted node.
- Fix: Store
nextbefore delete.
Problem 2: “Lost nodes”
- Why: Not updating links correctly.
- Fix: Use a sentinel head to simplify.
Definition of Done
- Insert, delete, and find operations work
- Iterator deletion is safe
- No leaks or use-after-free
Project 6: HTTP Server with Request Pooling (Capstone)
- Main Programming Language: C
- Alternative Programming Languages: Rust, Zig
- Coolness Level: Level 4 (Hardcore Tech Flex)
- Business Potential: Level 4 (Service foundation)
- Difficulty: Expert
- Knowledge Area: Parsing, ownership, allocators
- Software or Tool: GCC/Clang, Valgrind
- Main Book: “UNIX Network Programming, Vol 1” by Stevens
What you’ll build: A minimal HTTP/1.1 server that parses requests, uses a pool allocator per request, and responds with static content.
Why it teaches data and invariants: The server combines all concepts: parsing, buffering, ownership, invariants, and allocation strategies.
Real World Outcome
$ ./httpserver 8080 ./www
[server] listening on :8080
[conn 3] GET /index.html
[conn 3] 200 OK (1284 bytes)
$ curl -v http://localhost:8080/index.html
> GET /index.html HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/8.5.0
>
< HTTP/1.1 200 OK
< Content-Length: 1284
< Content-Type: text/html
<
<html>...
The Core Question You’re Answering
“How do I parse and serve HTTP requests safely in C without memory bugs?”
Concepts You Must Understand First
- Parsing and Buffering
- HTTP/1.1 request format
- Spec Reference: RFC 9112
- Arena Allocation
- Request-scoped allocations
- Book Reference: “C Interfaces and Implementations” Ch. 5
- Invariants
- Parser state invariants
- Book Reference: “Code Complete” Ch. 8
Questions to Guide Your Design
- How will you handle partial reads from the socket?
- What is the maximum header size you will accept?
- How will you reset per-request allocations safely?
Thinking Exercise
“Partial Request”
If recv() returns half of the headers, what invariant must still hold? How do you keep the buffer consistent?
The Interview Questions They’ll Ask
- “Why must HTTP parsers handle partial reads?”
- “How do you prevent header buffer overflow?”
- “What is a safe maximum request size?”
- “How do you validate request line syntax?”
Hints in Layers
Hint 1: Start with single-connection blocking I/O Get a minimal server working before concurrency.
Hint 2: Implement a state machine Use states like REQ_LINE, HEADERS, BODY.
Hint 3: Use an arena for each request All allocations freed when request ends.
Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| HTTP parsing | RFC 9112 | All |
| Sockets | “UNIX Network Programming” Vol 1 | Ch. 4-6 |
| Allocators | “C Interfaces and Implementations” | Ch. 5 |
Common Pitfalls & Debugging
Problem 1: “Server hangs on input”
- Why: Waiting for full request when partial data is available.
- Fix: Use a buffer and state machine.
Problem 2: “Buffer overflow on long headers”
- Why: No explicit header size limit.
- Fix: Enforce max header size and reject large requests.
Problem 3: “Memory leak per request”
- Why: Forgetting to reset arena.
- Fix: Reset arena after each request.
Definition of Done
- Parses HTTP/1.1 request line and headers correctly
- Handles partial reads safely
- Uses arena allocator for request memory
- No memory leaks or invalid accesses
Summary
This sprint teaches the true foundation of safe C: invariants, ownership, memory layout, and parsing discipline. If you complete the projects and can explain the invariants from memory, you will have skills that transfer to any systems codebase.
Additional Resources
- C11 Committee Draft N1570 (alignment, object lifetime, aliasing rules) - https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
- RFC 8259 (JSON) - https://www.rfc-editor.org/rfc/rfc8259
- RFC 9112 (HTTP/1.1) - https://www.rfc-editor.org/rfc/rfc9112
- GNU Emacs manual on gap buffers - https://www.gnu.org/software/emacs/manual/html_node/elisp/Buffer-Gap.html
- Valgrind Memcheck documentation - https://valgrind.org/docs/manual/quick-start.html
- LLVM AddressSanitizer documentation - https://clang.llvm.org/docs/AddressSanitizer.html