Sprint 4.5 - Data Structures and Algorithmic Tools (Systems View)

Goal: Build the mental model that data structures are performance contracts in real systems. You will learn why arrays are usually the fastest choice, where linked lists actually win, how hash tables hide costly resizing and collision behavior, and why disk-friendly trees and LSM structures dominate storage engines. By the end, you will be able to choose structures based on access patterns, memory layout, and I/O behavior, and you will validate those choices with profiling and benchmarks, not intuition.

Introduction

Data structures in systems programming are not just containers; they are executable performance agreements between your code, the CPU cache, and the storage device. In production systems, the wrong structure silently destroys throughput, increases tail latency, or makes durability impossible to reason about.

What you will build (by the end of this guide):

  • An in-memory cache server with O(1) lookup and O(1) LRU eviction
  • A log-structured key-value store with background compaction
  • A network packet buffer that implements backpressure and prioritization
  • A memory allocator with free lists, size classes, and leak detection
  • A disk-based B-tree index with range queries
  • A capstone database engine that combines all of the above

Scope (what is included):

  • Data structures as they appear in production systems
  • Memory layout, cache locality, and I/O behavior
  • Real benchmarking and performance measurement
  • Persistence, crash resilience, and compaction strategies

Out of scope (for this guide):

  • Formal proofs of correctness
  • Purely academic data structure exercises
  • High-level database query optimizers beyond basics

The Big Picture (Mental Model)

Workload -> Access Pattern -> Memory Layout -> Structure -> Algorithm -> Measurement
   |             |                |              |           |             |
   v             v                v              v           v             v
Reads vs Writes  Random vs Seq    Cache lines    Array,      O(1) vs O(log) Benchmarks
Latency targets  Point vs Range   Page size      Hash, Tree  Tail latency   Profiling
Durability       Burstiness       Allocator      LSM, Ring   Amortized cost Flamegraphs

Key Terms You Will See Everywhere

  • Cache line: The smallest chunk of memory the CPU cache loads at once (typically 64 bytes).
  • Load factor: The ratio of elements to buckets in a hash table.
  • Fanout: The number of children per internal node in a tree (critical for disk I/O).
  • Compaction: The process of merging and rewriting storage segments to reduce fragmentation and improve read performance.
  • Backpressure: A signal that slows producers when buffers are full.

How to Use This Guide

  1. Read the Theory Primer like a mini-book. Each chapter is designed to give you a mental model and an implementation blueprint.
  2. Build projects in order if you are new to systems performance. Each project depends on previous structures.
  3. For each project, copy the Real World Outcome first, then implement until your output matches.
  4. Use the Questions to Guide Your Design before you write code. This is where most production bugs originate.
  5. Measure everything. A data structure is only correct when it meets the performance contract you intended.

Prerequisites & Background

Essential Prerequisites (Must Have)

Programming Skills:

  • Confident C programming (pointers, structs, function pointers, manual memory management)
  • Comfort reading and writing code that uses malloc, free, and memcpy
  • Ability to build and debug with make, gcc or clang

Systems Fundamentals:

  • File I/O (open, read, write, fsync)
  • Basic socket programming (socket, bind, accept)
  • Understanding of stack vs heap, and process memory layout

Data Structure Fundamentals:

  • Arrays, linked lists, stacks, queues
  • Basic hash tables and collision handling
  • Trees and traversal basics

Debugging Skills:

  • gdb or lldb for stepping and stack traces
  • valgrind for memory leaks and invalid accesses
  • Basic profiling with perf or gprof

Helpful But Not Required

  • Lock-free programming and atomics
  • Filesystem internals and page caching
  • Benchmark design and statistical analysis

Self-Assessment Questions

  • Can you explain why a cache miss is expensive compared to a register read?
  • Can you detect a leak and a double free using valgrind?
  • Do you know what a load factor is and why 0.75 is common?
  • Can you explain why B-trees have high fanout?
  • Can you read a flamegraph and identify hot paths?

If you answered “no” to 2 or more, review CS:APP chapters on memory and performance and build a small C project first.

Development Environment Setup

Required Tools:

  • Linux or macOS environment
  • gcc or clang (C11 or newer)
  • make
  • gdb or lldb
  • valgrind (Linux) or leaks (macOS)
  • perf (Linux) or Instruments (macOS)

Recommended Tools:

  • hyperfine for benchmarking
  • strace or dtrace for system call tracing
  • flamegraph for visualization

Testing Your Setup:

$ cc --version
$ make --version
$ gdb --version
$ valgrind --version

Time Investment

  • Project 1: 1-2 weeks (15-25 hours)
  • Project 2: 2-3 weeks (25-40 hours)
  • Project 3: 1-2 weeks (15-25 hours)
  • Project 4: 2 weeks (20-30 hours)
  • Project 5: 2-3 weeks (25-40 hours)
  • Capstone: 4+ weeks (40-80 hours)

Important Reality Check

These projects are deliberately hard. You will debug segmentation faults, measure unexpected slowdowns, and rewrite code. The goal is to build intuition for the systems tradeoffs that textbooks rarely teach. Expect to iterate.

Big Picture / Mental Model

Data structures in systems are chosen primarily by access pattern and storage medium:

Access Pattern   -> Structure   -> Storage Medium -> Dominant Cost
Random lookups   -> Hash table  -> RAM            -> Cache misses
Ordered ranges   -> B-tree      -> Disk/SSD       -> Page reads
Append-only      -> LSM tree    -> Disk/SSD       -> Sequential writes
Streaming FIFO   -> Ring buffer -> RAM            -> Backpressure
Object lifetimes -> Free list   -> RAM            -> Fragmentation

Key insight: In systems, latency is dominated by memory and I/O behavior, not just Big-O. Arrays often beat “faster” theoretical structures simply because they align with the CPU cache and storage page size.

Theory Primer

This section is your mini-book. Each chapter includes fundamentals, a deep dive, and the practical connection to the projects.

Chapter 1: Memory Hierarchy and Arrays (Cache Locality as a Superpower)

Fundamentals

Arrays are the baseline structure for systems performance because they place elements in contiguous memory. CPUs fetch data in cache lines, not individual bytes. When you access one element of an array, you implicitly load nearby elements into L1 or L2 cache, dramatically reducing latency for the next access. This behavior is called spatial locality. Arrays also allow the CPU’s hardware prefetcher to predict the next memory access and fetch it ahead of time. In practice, this means an O(n) scan over an array can be faster than an O(log n) tree lookup if the tree nodes are scattered in memory. Arrays also have low metadata overhead, which matters for millions of small objects. The core idea: arrays are fast because they turn unpredictable memory access into predictable sequential access.

Deep Dive

The memory hierarchy is a stack of increasingly slower, larger storage layers: registers, L1/L2/L3 caches, DRAM, SSD, and disk. Each level is orders of magnitude slower than the one above. This means the performance of a data structure is often determined by how well it aligns with cache lines and memory pages. Arrays excel because their contiguous layout allows cache lines to be fully utilized. When you iterate through an array, each cache line fetch gives you multiple elements for free. By contrast, pointer-heavy structures like linked lists cause the CPU to fetch a cache line that contains mostly pointer metadata and a single element, wasting bandwidth and adding unpredictable branch behavior.

Array growth strategies are another key systems concept. Most dynamic arrays grow geometrically (e.g., doubling capacity) to amortize reallocation costs. The cost of resizing can be high because it involves allocating a new buffer and copying the old contents. However, the amortized cost per append is still O(1), and the benefit is tight, contiguous storage. This is why LSM memtables and log buffers prefer array-backed structures: writes are sequential, and reads can be optimized by scanning or binary searching. Arrays are also the foundation of many higher-level structures: hash tables store buckets in arrays, heaps are arrays with index arithmetic, and ring buffers are arrays with modular arithmetic.

In systems programming, array layout decisions should align with cache line and page size boundaries. The typical cache line size is 64 bytes, so packing your hot fields into a compact struct often yields massive performance gains. For disk, the unit is the page or block (commonly 4KB). A B-tree node that fits in a page minimizes the number of I/O operations per query. Arrays also enable vectorization and SIMD because sequential memory allows predictable memory access patterns. When you measure performance, you will often find that arrays with simple loops outperform more “clever” structures due to better branch prediction and fewer cache misses. The deeper lesson is that asymptotic complexity is only part of the performance contract; layout and access patterns can dominate.
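To make the layout point concrete, here is a small sketch comparing an array-of-structs layout with a struct-of-arrays layout; the field names and sizes are illustrative, not tied to any project:

#include <stddef.h>
#include <stdint.h>

// AoS: each 64-byte record mixes the hot "count" field with cold metadata,
// so a scan over counts drags one full cache line per record.
typedef struct {
    uint64_t count;        // hot: read on every pass
    char     name[48];     // cold: rarely touched
    uint64_t last_seen;    // cold
} record_aos_t;

// SoA: hot counts are contiguous, so one 64-byte cache line holds 8 counts
// and the hardware prefetcher can stream them.
typedef struct {
    uint64_t *count;
    char    (*name)[48];
    uint64_t *last_seen;
    size_t    n;
} records_soa_t;

uint64_t total_aos(const record_aos_t *r, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += r[i].count;   // ~1 cache line per record
    return sum;
}

uint64_t total_soa(const records_soa_t *r) {
    uint64_t sum = 0;
    for (size_t i = 0; i < r->n; i++) sum += r->count[i];   // ~8 counts per cache line
    return sum;
}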

How This Fits in Projects

  • Project 2: Log segments and SSTables use array-like sequential layout.
  • Project 3: Ring buffers are fixed-size arrays with circular indexing.
  • Project 4: Size classes use arrays of free lists.
  • Project 5: B-tree nodes are effectively fixed-size arrays stored in pages.

Definitions and Key Terms

  • Spatial locality: Accessing nearby memory addresses close in time.
  • Temporal locality: Re-accessing the same memory soon after use.
  • Cache line: Fixed-size chunk of memory fetched by the CPU cache.
  • Prefetcher: CPU unit that predicts and fetches future memory accesses.
  • Amortized cost: Average cost per operation over many operations.

Mental Model Diagram

Array in Memory (contiguous)
[0][1][2][3][4][5][6][7][8][9]
 ^  ^  ^  ^  ^  ^  ^  ^  ^  ^
 |  |  |  |  |  |  |  |  |  |
Cache line fetch grabs multiple elements at once

How It Works (Step-by-Step)

  1. CPU loads a cache line containing element i.
  2. Nearby elements i+1, i+2 are already loaded.
  3. The loop continues with cache hits instead of misses.
  4. If the array grows, a larger buffer is allocated and copied.
  5. The amortized cost remains low because growth is infrequent.

Minimal Concrete Example

// Simple dynamic array append with geometric growth
// (requires <stdlib.h> for realloc)
void push_int(int **arr, size_t *len, size_t *cap, int value) {
    if (*len == *cap) {
        size_t new_cap = (*cap == 0) ? 8 : (*cap * 2);
        int *tmp = realloc(*arr, new_cap * sizeof(int));
        if (!tmp) return;            // allocation failed; keep the old buffer intact
        *arr = tmp;
        *cap = new_cap;
    }
    (*arr)[(*len)++] = value;
}

Common Misconceptions

  • “O(1) lookups always beat O(log n)”: Cache misses can make the tree slower.
  • “Resizing arrays is always expensive”: It is expensive occasionally, but amortized cheap.
  • “Arrays are only for small data”: Arrays scale extremely well when access is sequential.

Check-Your-Understanding Questions

  1. Why can a sequential array scan beat a balanced tree lookup even though it is O(n)?
  2. What does amortized O(1) append mean in practice?
  3. How does cache line size influence struct layout?

Check-Your-Understanding Answers

  1. The array scan has predictable memory access with cache prefetching, while the tree lookup suffers cache misses on pointer jumps.
  2. Most appends are O(1), and expensive resizes happen rarely, so average cost stays constant.
  3. Packing hot fields into one cache line reduces misses and improves throughput.

Real-World Applications

  • Network ring buffers and audio pipelines
  • Log buffers and write-ahead logs
  • In-memory index arrays and vectorized scans

Where You Will Apply It

  • Project 2: Sequential log segments and SSTables
  • Project 3: Ring buffer memory layout
  • Project 4: Size class arrays

References

  • Computer Systems: A Programmer’s Perspective - Ch. 6 (Memory Hierarchy)
  • Write Great Code, Volume 2 - Ch. 5-6 (Memory and caches)

Key Insight

Arrays are fast because they align with how hardware actually moves data.

Summary

Arrays exploit spatial locality, reduce metadata overhead, and enable the CPU to prefetch effectively. In systems programming, this matters more than theoretical complexity. They are the backbone of high-performance queues, tables, and storage engines.

Homework/Exercises to Practice the Concept

  1. Measure the time to iterate over a 100M-element array vs a linked list.
  2. Compare performance of AoS (array of structs) vs SoA (struct of arrays) for a hot loop.

Solutions to the Homework/Exercises

  1. The array scan should be orders of magnitude faster due to cache locality.
  2. SoA often performs better for SIMD-friendly operations and cache line packing.

Chapter 2: Linked Lists and Intrusive Structures (Where Pointers Actually Win)

Fundamentals

Linked lists are often presented as a flexible alternative to arrays. In systems programming, their main advantage is not insertion speed but ownership of memory. By embedding list pointers directly inside a struct (an intrusive list), you avoid extra allocations and can remove or insert nodes in O(1) when you already have the pointer. This is why linked lists appear in memory allocators and LRU caches. They are not cache-friendly, and they are not good for random access. They are good for constant-time splicing when you already have the element and when the list itself is not too long. In short: linked lists are a specialized tool, not a default choice.

Deep Dive

The biggest performance cost of linked lists is pointer chasing. Each node is potentially in a different cache line or even a different memory page, causing frequent cache misses. The CPU cannot prefetch effectively because the next pointer is unknown until the current node is loaded. This makes even a simple traversal slow. In systems code, that cost can dwarf the theoretical advantage of O(1) insertion. However, there are important places where linked lists are still ideal.

Intrusive lists are the canonical systems use case. The list pointers live inside the objects themselves, so you do not allocate list node wrappers. This reduces allocation overhead and improves memory locality compared to non-intrusive lists. Because you already have the pointer to the object, removal is O(1) as long as you maintain prev and next. This is exactly what an LRU cache needs: after a GET, you can move the accessed item to the head of the list in constant time. In a memory allocator, free blocks are tracked in free lists using pointers embedded in the free blocks themselves. That way, the memory used for bookkeeping comes from the free block itself. When you coalesce blocks, you can unlink and relink nodes without extra allocation.

Linked lists are also the backbone of many kernel and embedded systems data structures where memory is preallocated and deterministic. The Linux kernel uses intrusive list macros (list_head) extensively. The key is that lists are used as indexing glue rather than as the primary storage. The actual data may live in arrays or pools, and lists provide fast membership and ordering operations. That means lists often pair with another structure: a hash table for lookup plus a list for ordering (LRU), or an array pool plus a free list for allocation.

One of the subtle benefits of linked lists is stable pointers. In a dynamic array, a resize moves elements, invalidating pointers. In a linked list, node addresses remain stable. That makes lists useful for caches or allocators where you must store stable references to objects. But this benefit is only valuable when you already have pointers from another structure. For random access, arrays and trees remain superior.
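A minimal sketch of the intrusive pattern described above; container_of mirrors the kernel idiom, and cache_entry_t is an illustrative object rather than a required layout:

#include <stddef.h>

// Intrusive doubly linked list: the links live inside the object itself.
typedef struct list_node {
    struct list_node *prev, *next;
} list_node_t;

// Recover the enclosing object from a pointer to its embedded node.
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

// Illustrative object that participates in an LRU list.
typedef struct {
    int key;
    int value;
    list_node_t lru;   // embedded links; no separate wrapper allocation
} cache_entry_t;

// O(1) unlink when you already hold the node pointer.
// Assumes a circular list with a sentinel head (kernel-style), so prev/next
// are never NULL.
static void list_unlink(list_node_t *n) {
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

// Usage: given a node taken from the list, get back to the entry.
// cache_entry_t *e = container_of(node, cache_entry_t, lru);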

How This Fits in Projects

  • Project 1: LRU list for cache eviction.
  • Project 4: Free lists for memory allocator.
  • Project 6: Buffer pool LRU and free page lists.

Definitions and Key Terms

  • Intrusive list: List pointers are embedded in the data object itself.
  • Pointer chasing: Following pointers that lead to unpredictable memory addresses.
  • Stable pointer: An address that does not change even when the container grows.
  • Splicing: Removing and inserting nodes without full traversal.

Mental Model Diagram

Hash Table Lookup + LRU List Ordering

Key -> Hash -> Bucket -> Node
                        |
                        v
   [MRU] <-> [ ... ] <-> [LRU]

Hash table gives O(1) lookup; list gives O(1) eviction ordering

How It Works (Step-by-Step)

  1. Store objects in a hash table for O(1) lookup.
  2. Maintain a doubly linked list for recency ordering.
  3. On access, unlink the node and move it to the head.
  4. On eviction, remove the tail node.

Minimal Concrete Example

typedef struct node {
    struct node *prev;
    struct node *next;
    int key;
    int value;
} node_t;

// Unlink n from a NULL-terminated doubly linked list in O(1).
// (A full LRU implementation also updates the tail pointer; omitted for brevity.)
void list_remove(node_t **head, node_t *n) {
    if (n->prev) n->prev->next = n->next;
    if (n->next) n->next->prev = n->prev;
    if (*head == n) *head = n->next;
    n->prev = n->next = NULL;
}

Common Misconceptions

  • “Linked lists are always faster for inserts”: Only true if you already have the pointer.
  • “Linked lists save memory”: They often use more memory due to extra pointers.
  • “Linked lists avoid resizing”: True, but at the cost of cache locality.

Check-Your-Understanding Questions

  1. Why does a linked list traversal suffer from cache misses?
  2. Why are intrusive lists better than external node wrappers?
  3. In what scenarios are stable pointers more important than cache locality?

Check-Your-Understanding Answers

  1. Each node is in a potentially different cache line, so each hop can be a miss.
  2. Intrusive lists avoid extra allocations and reduce indirection.
  3. When objects are referenced elsewhere and cannot move (e.g., LRU nodes or allocator blocks).

Real-World Applications

  • LRU caches in Redis and OS page caches
  • Free lists in memory allocators
  • Kernel task queues and timer lists

Where You Will Apply It

  • Project 1: LRU eviction list
  • Project 4: Free list management
  • Project 6: Buffer pool and free page lists

References

  • Operating Systems: Three Easy Pieces - Ch. 17 (Free-space management)
  • Advanced Programming in the UNIX Environment - Ch. 7 (Memory allocation concepts)

Key Insight

Linked lists are not a general-purpose structure; they are a precision tool for constant-time splicing with stable pointers.

Summary

Use linked lists when you already have the node pointer and need constant-time reordering or removal. Otherwise, prefer arrays and contiguous structures.

Homework/Exercises to Practice the Concept

  1. Implement an intrusive doubly linked list and measure traversal time vs array.
  2. Add LRU ordering to a hash table and confirm eviction order.

Solutions to the Homework/Exercises

  1. The list traversal will be much slower due to cache misses.
  2. LRU ordering should evict the least recently accessed key.

Chapter 3: Hash Tables and Hashing in Systems

Fundamentals

Hash tables provide average O(1) lookup by mapping keys to buckets using a hash function. In systems programming, the hidden cost is not the lookup itself but the load factor, the collision strategy, and the resizing policy. As the table fills, collisions increase and each lookup becomes a small linear scan. Resizing is expensive because every key must be rehashed. In performance-critical systems, you often use incremental rehashing or fixed-size tables to avoid latency spikes. Good hash functions are also critical; a weak hash leads to clustering and destroys performance. Hash tables are the backbone of caches, indexes, and memory tracking systems because they deliver fast point lookups with predictable average behavior when tuned correctly.

Deep Dive

A hash table is only “O(1)” on average when the load factor stays within a safe range, typically below 0.75 for open addressing or around 1.0 for chaining. In chaining, each bucket holds a list of entries. This allows high load factors, but each bucket traversal can become costly and cache-unfriendly. In open addressing, entries live directly in the bucket array, improving cache locality, but the table cannot be overfull and deletions are complex due to tombstones. The choice between these strategies is a fundamental systems design tradeoff: chaining is simpler and supports deletions easily, while open addressing can be faster but is more sensitive to load factor.

Resizing is the major hidden cost. When the table grows, every entry must be rehashed because the modulo divisor changes. For a table with millions of entries, this can cause a multi-millisecond or multi-second pause, which is unacceptable for latency-sensitive systems. Production caches like Redis avoid this with incremental rehashing: two tables exist simultaneously, and entries are migrated gradually during normal operations. This is a core example of an algorithmic choice that is invisible in academic exercises but critical in real systems.
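A sketch of the two-table approach; the entry_t chaining layout and hash() function are assumed to match the lookup snippet later in this chapter, and real implementations (for example Redis's dict) add bookkeeping this sketch omits:

#include <stdlib.h>

typedef struct entry {
    char         *key;
    struct entry *next;
} entry_t;                              // minimal chaining entry

extern size_t hash(const char *key);    // assumed hash function

typedef struct {
    entry_t **old_buckets;   // table being drained (NULL when not rehashing)
    entry_t **new_buckets;   // table being filled
    size_t    old_cap, new_cap;
    size_t    migrate_pos;   // next old bucket to migrate
} dict_t;

// Move up to `steps` buckets from the old table into the new one.
// Called from get/put so the resize cost is spread across many operations.
static void rehash_step(dict_t *d, size_t steps) {
    while (steps-- > 0 && d->old_buckets && d->migrate_pos < d->old_cap) {
        entry_t *e = d->old_buckets[d->migrate_pos];
        while (e) {
            entry_t *next = e->next;
            size_t b = hash(e->key) % d->new_cap;   // re-bucket under the new size
            e->next = d->new_buckets[b];
            d->new_buckets[b] = e;
            e = next;
        }
        d->old_buckets[d->migrate_pos++] = NULL;
    }
    if (d->old_buckets && d->migrate_pos == d->old_cap) {
        free(d->old_buckets);        // drain complete; lookups now use one table
        d->old_buckets = NULL;
    }
}
// During rehashing, lookups check new_buckets first and fall back to old_buckets.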

Hash function quality determines collision behavior. Non-uniform hashing creates clustering, which means that some buckets are far more expensive than others. In security-sensitive contexts, a poor hash function can lead to denial-of-service attacks (hash flooding). Many systems use randomized hash seeds or stronger hashing algorithms to mitigate this. There is also the issue of key size: variable-length string keys increase hash computation cost, so production systems often store precomputed hash values or use short integer keys when possible. Additionally, the metadata overhead of hash table entries can be significant, particularly when using separate chaining. For memory-heavy workloads, you must account for the cost of pointers, alignment padding, and allocator overhead.
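As a concrete reference point, FNV-1a is a simple, widely used non-cryptographic hash with decent distribution for short string keys; systems exposed to untrusted input typically prefer a seeded or keyed hash (for example SipHash) to resist hash flooding:

#include <stdint.h>

// FNV-1a (32-bit): XOR each byte into the hash, then multiply by the FNV prime.
uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;            // FNV offset basis
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;                  // FNV prime
    }
    return h;
}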

In systems programming, hash tables are often paired with other structures. An LRU cache combines a hash table for lookup and a linked list for eviction order. A memory allocator uses a hash table to track allocations for debugging. A database may use a hash table for in-memory indexes but rely on B-trees for ordered disk indexes. The core is understanding that a hash table is not just a data structure; it is a set of tradeoffs around memory usage, cache locality, and worst-case behavior.

How This Fits in Projects

  • Project 1: Key lookup in the cache server
  • Project 2: In-memory index for log-structured store
  • Project 4: Allocation tracking
  • Project 6: Secondary indexes

Definitions and Key Terms

  • Load factor: items / buckets
  • Chaining: collisions handled by per-bucket lists
  • Open addressing: collisions handled by probing in the array
  • Tombstone: marker for deleted entries in open addressing
  • Incremental rehashing: migrating entries over multiple operations

Mental Model Diagram

Key -> Hash(key) -> Bucket index -> Entry

If collision:
Bucket 5: [k1]->[k2]->[k3] (chaining)

or
Bucket 5 occupied, probe 6, 7, 8... (open addressing)

How It Works (Step-by-Step)

  1. Compute hash for key.
  2. Map hash to bucket using modulo.
  3. Search bucket chain or probe sequence.
  4. Insert or update entry.
  5. Resize when load factor threshold reached.

Minimal Concrete Example

// Chaining lookup: hash to a bucket, then walk its list until the key matches
size_t bucket = hash(key) % table->capacity;
entry_t *e = table->buckets[bucket];
while (e && strcmp(e->key, key) != 0) e = e->next;

Common Misconceptions

  • “Hash tables are always O(1)”: Only average-case; bad hashing can degrade to O(n).
  • “Resizing is rare”: It is rare but can cause huge latency spikes if not handled.
  • “Chaining is always slower”: It can outperform open addressing under heavy deletions.

Check-Your-Understanding Questions

  1. Why does open addressing degrade sharply at high load factors?
  2. How does incremental rehashing reduce tail latency?
  3. Why can hashing be a security risk?

Check-Your-Understanding Answers

  1. Probing sequences grow long, causing many cache misses and comparisons.
  2. It spreads the rehash cost across many operations.
  3. Attackers can craft keys that collide and force O(n) behavior.

Real-World Applications

  • Redis and Memcached key lookup
  • In-memory metadata tracking
  • Deduplication tables in storage systems

Where You Will Apply It

  • Project 1: Cache table
  • Project 2: Log index
  • Project 4: Allocation tracking
  • Project 6: Secondary indexes and metadata

References

  • Algorithms, Fourth Edition - Ch. 3.4 (Hash Tables)
  • Mastering Algorithms with C - Ch. 8 (Hash Tables)
  • Redis key eviction and LRU policy overview: https://redis.io/docs/latest/develop/reference/eviction/

Key Insight

A hash table is only fast if you control collisions, load factor, and resizing.

Summary

Hash tables power fast point lookups, but their real performance is governed by collision strategy, load factor, and resizing. Systems code must treat these as first-class design decisions.

Homework/Exercises to Practice the Concept

  1. Implement a hash table with chaining and measure lookup latency at load factors 0.5, 1.0, 2.0.
  2. Add incremental rehashing and compare tail latency.

Solutions to the Homework/Exercises

  1. Latency increases non-linearly as collisions increase.
  2. Incremental rehashing reduces long pauses during resize.

Chapter 4: Queues in Systems (Ring Buffers and Priority Queues)

Fundamentals

Queues are the backbone of data movement in systems: network packets, audio frames, log records, and task scheduling all rely on queue behavior. The most important queue structure in systems programming is the ring buffer, which is a fixed-size array with head and tail indices that wrap around. This structure avoids allocation and provides predictable backpressure when full. For traffic shaping and scheduling, a priority queue (usually a binary heap) is used to select the most important item first. Together, ring buffers and heaps provide the essential tools for building predictable, high-throughput pipelines.

Deep Dive

A ring buffer uses modular arithmetic to map a continuous stream of writes into a fixed-size array. The head points to the next element to read, and the tail points to the next element to write. The buffer is empty when head equals tail, and full when advancing the tail would collide with the head. This makes capacity predictable and enables lock-free producer-consumer designs when you use atomic head and tail counters. Because the buffer is fixed-size, you avoid the unpredictable latency of malloc/free, and you gain cache-friendly memory layout.
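A minimal single-producer, single-consumer sketch of the same idea; it keeps one slot empty to distinguish full from empty, and a concurrent version would replace the plain indices with atomic counters:

#include <stddef.h>
#include <stdbool.h>

// Fixed-capacity ring buffer (no locking shown here).
typedef struct {
    int    *slots;
    size_t  capacity;   // one slot stays empty so full != empty
    size_t  head;       // next index to read
    size_t  tail;       // next index to write
} ring_t;

bool ring_push(ring_t *r, int item) {
    size_t next = (r->tail + 1) % r->capacity;
    if (next == r->head) return false;    // full: caller must apply backpressure
    r->slots[r->tail] = item;
    r->tail = next;
    return true;
}

bool ring_pop(ring_t *r, int *out) {
    if (r->head == r->tail) return false; // empty
    *out = r->slots[r->head];
    r->head = (r->head + 1) % r->capacity;
    return true;
}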

Backpressure is a core systems concept enabled by ring buffers. When the buffer is full, the producer must slow down or drop input. This is the difference between stable systems and those that overload under spikes. In networking, ring buffers absorb bursts and keep the consumer from being overwhelmed. In audio processing, ring buffers prevent underruns and overruns. Systems that ignore backpressure typically exhibit tail latency spikes and memory blowups.

Priority queues solve a different problem: ordering items by importance rather than arrival time. A binary heap stored in an array offers O(log n) insert and removal with excellent cache locality. For traffic shaping, you can maintain separate priority queues and choose from the highest priority queue when the ring buffer is ready to send. In real-time systems, priority queues are essential for deadline scheduling.
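A compact array-backed max-heap sketch; it stores bare integer priorities to keep the example short, whereas a packet scheduler would store a small struct per entry:

#include <stddef.h>

// Array-backed max-heap: parent of i is (i-1)/2, children are 2i+1 and 2i+2.
typedef struct {
    int    *items;
    size_t  len;
    size_t  cap;
} heap_t;

static void heap_swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

// O(log n) insert: append, then sift up while larger than the parent.
int heap_push(heap_t *h, int prio) {
    if (h->len == h->cap) return -1;          // full; caller decides the policy
    size_t i = h->len++;
    h->items[i] = prio;
    while (i > 0 && h->items[i] > h->items[(i - 1) / 2]) {
        heap_swap(&h->items[i], &h->items[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
    return 0;
}

// O(log n) removal of the maximum: move the last item to the root, sift down.
int heap_pop(heap_t *h, int *out) {
    if (h->len == 0) return -1;
    *out = h->items[0];
    h->items[0] = h->items[--h->len];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, largest = i;
        if (l < h->len && h->items[l] > h->items[largest]) largest = l;
        if (r < h->len && h->items[r] > h->items[largest]) largest = r;
        if (largest == i) break;
        heap_swap(&h->items[i], &h->items[largest]);
        i = largest;
    }
    return 0;
}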

Ring buffers and heaps are both array-based structures, which makes them predictable and cache-friendly. The most common errors are off-by-one bugs in wrap-around arithmetic, and incorrect full/empty detection. These bugs manifest as silent data corruption. In systems code, you should write unit tests that explicitly cover wrap-around and boundary conditions.

How This Fits in Projects

  • Project 3: Core of the packet buffer and rate limiter
  • Project 6: Write-ahead log buffer

Definitions and Key Terms

  • Head/Tail pointers: Indices that track read/write positions.
  • Backpressure: Signaling to slow producers when buffers are full.
  • Priority queue: A structure that returns the highest-priority item first.
  • Heap property: Parent nodes are ordered relative to children.

Mental Model Diagram

Ring Buffer

[0][1][2][3][4][5][6][7]
 ^           ^
head        tail

Write: buffer[tail] = item; tail = (tail+1) % N
Read:  item = buffer[head]; head = (head+1) % N

How It Works (Step-by-Step)

  1. Producer checks if buffer is full.
  2. If not full, write at tail and advance.
  3. Consumer checks if buffer is empty.
  4. If not empty, read at head and advance.
  5. Priority queue selects next item by importance.

Minimal Concrete Example

size_t next = (tail + 1) % capacity;
if (next == head) return BUFFER_FULL;
ring[tail] = item;
tail = next;

Common Misconceptions

  • “Ring buffers are just queues”: They are queues with fixed capacity and wrap-around logic.
  • “Priority queues are too expensive”: For moderate sizes, O(log n) with array layout is fast.
  • “Backpressure is optional”: Without it, systems fail under load spikes.

Check-Your-Understanding Questions

  1. How do you distinguish between full and empty in a ring buffer?
  2. Why is a heap often stored in an array?
  3. What happens if a producer ignores backpressure?

Check-Your-Understanding Answers

  1. Full occurs when advancing tail would meet head; empty when head == tail.
  2. Arrays provide cache-friendly layout and simple index arithmetic.
  3. Buffers grow unbounded or data gets dropped unpredictably, causing latency spikes.

Real-World Applications

  • NIC receive queues and kernel ring buffers
  • Audio and video streaming pipelines
  • Task schedulers and priority-based job systems

Where You Will Apply It

  • Project 3: Packet buffer and traffic shaping
  • Project 6: WAL and query scheduling

References

  • Linux kernel circular buffer documentation: https://docs.kernel.org/core-api/circular-buffers.html
  • Linux kernel ring buffer design: https://www.kernel.org/doc/html/latest/trace/ring-buffer-design.html
  • Computer Networks - Ch. 3-4 (Queueing and scheduling)

Key Insight

Ring buffers make performance predictable by fixing capacity and eliminating allocation.

Summary

Systems pipelines depend on predictable queues. Ring buffers provide fast FIFO behavior with backpressure, and priority queues provide policy control.

Homework/Exercises to Practice the Concept

  1. Implement a ring buffer and test wrap-around with capacity 8.
  2. Implement a binary heap and measure insert/remove throughput.

Solutions to the Homework/Exercises

  1. Wrap-around should preserve ordering across the boundary.
  2. Heap operations should remain logarithmic with stable throughput.

Chapter 5: Memory Allocator Design (Free Lists, Size Classes, Fragmentation)

Fundamentals

Memory allocation is where data structures meet physical memory. A good allocator must balance fast allocation with minimal fragmentation. Most allocators use free lists to track available blocks and size classes to group allocations of similar sizes. Small allocations are served from fixed-size bins, while large allocations are handled separately. The allocator must also support coalescing of adjacent free blocks to reduce fragmentation. Debugging features such as double-free detection and leak tracking require additional metadata structures, often hash tables. Understanding allocator design is crucial because it directly affects the performance and memory usage of every system program.

Deep Dive

A simple allocator might use a single free list of all free blocks. This is easy to implement but slow because every allocation requires scanning the list to find a sufficiently large block. It also produces fragmentation because blocks are split and not recombined efficiently. Production allocators use segregated free lists: separate lists for different size classes. When you request 32 bytes, the allocator picks a block from the 32-byte bin without scanning the entire heap. This makes allocation O(1) in practice. The tradeoff is internal fragmentation: you might waste a few bytes inside each block because the block size is fixed by the class. Modern allocators accept this tradeoff because the performance gain is massive.

Coalescing is the process of merging adjacent free blocks to form a larger block. Without coalescing, free space becomes fragmented into small unusable pieces. Allocators track block sizes and whether neighboring blocks are free. A common technique is boundary tags: storing the size in both the header and footer so that you can quickly find adjacent blocks. Another technique is to maintain separate free lists per size and also maintain a doubly linked list of all free blocks for coalescing.
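A sketch of the boundary-tag idea; the layout is deliberately simplified (no allocation bitmap, alignment handling, or heap-boundary checks), and the macro names are illustrative:

#include <stddef.h>

// Boundary tags: each block stores its size and a free flag in both a header
// and a footer, so a freed block can reach both neighbors in O(1).
typedef struct {
    size_t size;   // total block size in bytes, header and footer included
    int    free;
} tag_t;

#define FOOTER(hdr)      ((tag_t *)((char *)(hdr) + (hdr)->size - sizeof(tag_t)))
#define NEXT_HDR(hdr)    ((tag_t *)((char *)(hdr) + (hdr)->size))
#define PREV_FOOTER(hdr) ((tag_t *)((char *)(hdr) - sizeof(tag_t)))

// Merge hdr (already marked free) with the following block if it is free.
// Merging with the previous block is symmetric: PREV_FOOTER(hdr) gives its
// size, which locates the previous header.
static void coalesce_with_next(tag_t *hdr) {
    tag_t *next = NEXT_HDR(hdr);
    if (next->free) {
        hdr->size += next->size;        // absorb the neighbor into one block
        FOOTER(hdr)->size = hdr->size;  // rewrite the (new) footer to match
        FOOTER(hdr)->free = 1;
    }
}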

Threading adds another dimension. Allocators often maintain per-thread caches to reduce contention. Jemalloc and tcmalloc maintain multiple arenas, each with its own free lists and metadata. This reduces locking overhead but increases memory usage because each arena may hold unused blocks. This is a classic systems tradeoff: performance vs memory. Debugging features also require metadata. Leak detection means you must track every allocation, typically in a hash table keyed by pointer. Double-free detection requires marking free blocks and verifying they are not already freed.

Allocator behavior is deeply intertwined with cache locality. Small allocations are often packed into slabs or pages, improving cache performance. But if you allocate many tiny objects with different lifetimes, fragmentation can become severe. Understanding your application’s allocation patterns is key. The allocator is not just a library function; it is a policy engine controlling how memory is organized and reused.

How This Fits in Projects

  • Project 4: Implement a custom allocator with free lists and size classes
  • Project 6: Manage page and buffer pools with allocator-like logic

Definitions and Key Terms

  • Internal fragmentation: Wasted space inside allocated blocks.
  • External fragmentation: Free space split into unusable pieces.
  • Segregated free list: Separate free lists per size class.
  • Boundary tag: Metadata in header/footer used for coalescing.
  • Arena: Independent allocator instance to reduce contention.

Mental Model Diagram

Heap Layout
[Hdr|....allocated....|Ftr][Hdr|..free..|Ftr][Hdr|....alloc....|Ftr]
               ^               ^
           block size       block size

Free lists:
[16B] -> [block] -> [block]
[32B] -> [block]
[64B] -> [block] -> [block]

How It Works (Step-by-Step)

  1. Choose size class based on request size.
  2. Pop a block from the free list.
  3. If no block, request more memory and split.
  4. On free, insert into free list and attempt coalescing.
  5. Update metadata for debugging and leak detection.

Minimal Concrete Example

// Very simplified segregated free list
free_block_t *bin = bins[class_index(size)];
if (bin) {
    bins[class_index(size)] = bin->next;
    return (void *)(bin + 1);
}
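The class_index helper in the snippet above is left undefined; one plausible sketch (an assumption, not the required scheme) rounds requests up to power-of-two bins starting at 16 bytes:

#include <stddef.h>

// Map a request size to a power-of-two size class:
// 1..16 -> bin 0 (16B), 17..32 -> bin 1 (32B), 33..64 -> bin 2 (64B), ...
static size_t class_index(size_t size) {
    size_t bin = 0, class_size = 16;
    while (class_size < size) {
        class_size <<= 1;
        bin++;
    }
    return bin;
}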

Common Misconceptions

  • “malloc is just a wrapper over mmap”: It is a complex policy engine.
  • “Fragmentation is rare”: It is common in long-running systems.
  • “Faster allocation always means better”: It can cost significant memory overhead.

Check-Your-Understanding Questions

  1. Why do size classes improve allocation speed?
  2. What is the difference between internal and external fragmentation?
  3. Why do allocators use arenas?

Check-Your-Understanding Answers

  1. They avoid scanning large free lists and provide O(1) lookup.
  2. Internal is wasted space inside blocks, external is unusable free chunks.
  3. To reduce contention in multi-threaded programs.

Real-World Applications

  • glibc malloc, jemalloc, tcmalloc
  • Memory pools in game engines
  • Buffer pool management in databases

Where You Will Apply It

  • Project 4: Custom allocator
  • Project 6: Buffer and page pool logic

References

  • Computer Systems: A Programmer’s Perspective - Ch. 9.9 (Dynamic memory allocation)
  • jemalloc manpage and design notes: https://jemalloc.net/jemalloc.3.html
  • tcmalloc design overview: https://google.github.io/tcmalloc/design.html

Key Insight

Allocator design is about trading memory efficiency for predictable speed.

Summary

Allocators organize memory using free lists and size classes to deliver fast allocation. Fragmentation and concurrency are the real challenges, not just allocating bytes.

Homework/Exercises to Practice the Concept

  1. Simulate allocations and frees to measure fragmentation under different size classes.
  2. Implement boundary tags and test coalescing correctness.

Solutions to the Homework/Exercises

  1. Size classes reduce time but increase internal fragmentation.
  2. Coalescing should merge adjacent blocks and reduce external fragmentation.

Chapter 6: B-Trees and Disk-Oriented Indexes

Fundamentals

B-trees are the dominant structure for ordered data on disk. They are designed to minimize I/O by maximizing fanout, which reduces tree height. Each node is sized to fit a disk page (commonly 4KB). This means a single disk read brings in dozens or hundreds of keys at once. In contrast, binary search trees are shallow in memory but terrible on disk because each pointer jump can be a separate I/O. B-trees give predictable O(log n) performance with far fewer disk reads, which is why they are used in filesystems and databases.

Deep Dive

A B-tree node contains multiple keys and pointers to children, arranged in sorted order. When a node becomes full, it splits into two nodes and promotes a key to the parent. This keeps the tree balanced and ensures a consistent height across operations. A key concept is that the node size should align with the disk page size, so each node can be loaded with a single I/O. The fanout (number of children per node) is therefore determined by page size and key size. A fanout of 100 means that a tree with a million keys has height around 3, which means only 3 page reads per lookup.
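To make the fanout arithmetic concrete, here is a back-of-the-envelope sketch; the 4KB page, 8-byte keys, 8-byte child pointers, and 64-byte header are illustrative assumptions:

// Rough fanout estimate for a 4KB page with fixed-size keys and child
// pointers, minus a small node header. Real formats also store lengths,
// checksums, and variable-length keys.
#define PAGE_SIZE   4096
#define HEADER_SIZE 64
#define KEY_SIZE    8
#define CHILD_PTR   8

// fanout ~= (4096 - 64) / (8 + 8) = 252 children per node.
// With fanout ~252, 252^3 is roughly 16 million, so a million keys fit in a
// tree of height about 3: three page reads per lookup.
enum { BTREE_FANOUT = (PAGE_SIZE - HEADER_SIZE) / (KEY_SIZE + CHILD_PTR) };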

Many systems use B+ trees, where all values are stored in leaf nodes and internal nodes store only keys and child pointers. Leaves are linked together to support efficient range scans. This design is common in database indexes because range queries are frequent, and sequential leaf scans are efficient on disk. Another important concept is the separation between logical and physical layout. A B-tree node is a logical unit, but its on-disk representation requires careful layout, including headers, key arrays, child pointers, and checksums. Crash consistency often involves journaling or write-ahead logs to ensure that node splits do not corrupt the tree.
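A sketch of what a fixed-size-key page layout could look like; the field names, the 200-key cap, and the checksum placement are illustrative assumptions rather than a prescribed on-disk format:

#include <stdint.h>

// One B+ tree page; with these sizes it fits comfortably inside a 4KB disk
// page. Fixed-size keys keep the example simple; real formats use offsets
// into a variable-length key area.
#define NODE_MAX_KEYS 200

typedef struct {
    uint32_t checksum;                        // detects torn or corrupt pages
    uint16_t nkeys;
    uint8_t  is_leaf;
    uint8_t  pad;
    uint64_t keys[NODE_MAX_KEYS];
    union {
        uint64_t children[NODE_MAX_KEYS + 1]; // internal node: child page numbers
        struct {
            uint64_t values[NODE_MAX_KEYS];   // leaf: record locations
            uint64_t next_leaf;               // sibling link for range scans
        } leaf;
    } body;
} btree_page_t;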

The cost model for B-trees is dominated by disk or SSD latency. Even with SSDs, random reads are far slower than memory reads. This is why B-trees maximize fanout and minimize height. It is also why many systems choose B-trees for primary indexes but use hash tables for in-memory caches. A B-tree gives ordered traversal and predictable performance for both point and range queries, while hash tables provide fast point lookups but no ordering.

In practice, B-trees must handle deletion, which can cause underfull nodes. This requires merging or rebalancing nodes, which is more complex than insertion. Many systems choose to delay merges until nodes are under a threshold, trading space efficiency for fewer writes. Understanding these tradeoffs is critical when building disk-based indexes.

How This Fits in Projects

  • Project 5: Disk-based file index
  • Project 6: Primary index in the database engine

Definitions and Key Terms

  • Fanout: Number of children per internal node.
  • Page: Fixed-size block of disk I/O, often 4KB.
  • B+ tree: Variant where values are stored in leaves and leaves are linked.
  • Split: Operation that divides a full node into two.
  • Merge: Operation that combines underfull nodes after deletion.

Mental Model Diagram

B-Tree Node (fits in one page)
[ k1 | k2 | k3 | k4 ]
 /   /   |   \   \
C0  C1   C2   C3  C4

High fanout reduces height and I/O

How It Works (Step-by-Step)

  1. Search down the tree by comparing keys in a node.
  2. Load child page and repeat until leaf.
  3. Insert into leaf; if full, split and promote.
  4. Adjust parents until tree remains balanced.
  5. For deletes, merge or redistribute if underfull.

Minimal Concrete Example

// Search in a B-tree (simplified; assumes an is_leaf flag and
// illustrative FOUND/NOT_FOUND constants)
int i = 0;
while (i < node->nkeys && key > node->keys[i]) i++;
if (i < node->nkeys && key == node->keys[i]) return FOUND;
if (node->is_leaf) return NOT_FOUND;   // base case: key is absent
return search(node->children[i], key);

Common Misconceptions

  • “B-trees are only for databases”: Filesystems also use them extensively.
  • “B-trees are about Big-O”: They are about minimizing disk I/O.
  • “B-trees are too complex”: Complexity is necessary for predictable performance.

Check-Your-Understanding Questions

  1. Why does higher fanout reduce disk I/O?
  2. What is the benefit of a B+ tree for range queries?
  3. Why does deletion require merges or redistribution?

Check-Your-Understanding Answers

  1. Higher fanout reduces tree height, so fewer pages are read.
  2. Leaves are linked, enabling sequential scans.
  3. Deletions can leave nodes underfull, breaking balance rules.

Real-World Applications

  • SQLite and many embedded databases
  • Filesystem indexes
  • Block device indexing

Where You Will Apply It

  • Project 5: File index
  • Project 6: Primary index

References

  • Algorithms, Fourth Edition - Ch. 6.1 (Balanced search trees)
  • SQLite file format documentation (B-tree pages and page sizes): https://www.sqlite.org/fileformat.html
  • Database Internals - Ch. 2-4 (Storage engine architecture)

Key Insight

B-trees trade complex node management for predictable disk I/O.

Summary

B-trees are the core disk index structure because they minimize page reads. Their design is driven by storage page size and fanout, not just algorithmic complexity.

Homework/Exercises to Practice the Concept

  1. Compute the height of a B-tree with fanout 100 and 1 million keys.
  2. Implement leaf-linked range scan and measure throughput.

Solutions to the Homework/Exercises

  1. Height is about 3 (log base 100 of 1,000,000).
  2. Range scans should be sequential and significantly faster than random lookups.

Chapter 7: LSM Trees and Compaction (Write-Optimized Storage)

Fundamentals

Log-Structured Merge (LSM) trees are designed to optimize write throughput by converting random writes into sequential appends. Writes go to an in-memory structure (memtable) and are flushed to disk as sorted files (SSTables). Reads may need to check multiple files, so LSM trees trade read amplification for write performance. Compaction is the process of merging these files to keep read performance reasonable. LSM trees dominate modern write-heavy storage systems such as RocksDB, LevelDB, Cassandra, and HBase.

Deep Dive

An LSM tree begins with a write-ahead log (WAL) to guarantee durability. The actual data is written into an in-memory memtable, often implemented as a skip list or sorted array. When the memtable reaches a size threshold, it is flushed to disk as an immutable sorted file called an SSTable. Over time, many SSTables accumulate, and reads must search multiple files. This is where compaction comes in: smaller SSTables are merged into larger ones, reducing the number of files and reclaiming space from overwritten keys.

Compaction introduces write amplification because data is rewritten multiple times as it moves through levels. The size ratio between levels (often 10x) controls the tradeoff between write amplification and read amplification. Larger ratios mean fewer levels but more work per compaction. Smaller ratios mean more levels and more read overhead. This is a central tuning knob for LSM systems. Bloom filters are commonly used to reduce read amplification by quickly determining which SSTables cannot contain a key. This reduces unnecessary disk reads. LSM trees therefore use a combination of structures: arrays for log segments, hash tables for in-memory indexes, Bloom filters for negative lookups, and merge algorithms for compaction.
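The heart of compaction is an ordered merge. Here is a two-way merge sketch over in-memory arrays of entries; a real compaction streams entries from SSTable files, handles tombstones, and may merge many inputs at once:

#include <stddef.h>
#include <string.h>

typedef struct {
    const char *key;
    const char *value;   // NULL could represent a tombstone
} kv_t;

// Merge two key-sorted runs; when keys collide, the newer run (b) wins,
// which is how overwritten versions get reclaimed during compaction.
size_t merge_runs(const kv_t *a, size_t na,
                  const kv_t *b, size_t nb,
                  kv_t *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        int cmp = strcmp(a[i].key, b[j].key);
        if (cmp < 0)      out[k++] = a[i++];
        else if (cmp > 0) out[k++] = b[j++];
        else { out[k++] = b[j++]; i++; }   // same key: keep the newer version
    }
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    return k;                               // number of merged entries
}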

LSM trees excel on SSDs and even HDDs because sequential writes are far cheaper than random writes. In B-trees, each insert may cause multiple random writes along the tree path. In LSM trees, writes are appended, and random I/O is replaced by sequential compaction runs. The downside is that reads are more complex and require merging multiple versions of a key. This is why LSM systems are best suited to write-heavy workloads or workloads where range scans are less frequent.

Real-world implementations introduce complexity: multi-level compaction policies, tiered vs leveled compaction, size-tiered compactions, and concurrent compaction threads. They also require careful handling of tombstones (deletions) and snapshots. The key lesson is that LSM trees trade read simplicity for write speed, and the correct choice depends on workload.

How This Fits in Projects

  • Project 2: Core of the log-structured key-value store
  • Project 6: Secondary index and write path in the database engine

Definitions and Key Terms

  • Memtable: In-memory sorted structure for recent writes.
  • SSTable: Immutable sorted file on disk.
  • Compaction: Merge process to reduce file count.
  • Write amplification: Extra writes caused by compaction.
  • Bloom filter: Probabilistic structure for fast negative lookups.

Mental Model Diagram

Write Path
Client -> WAL -> Memtable -> Flush -> SSTable
                                  -> Compaction -> Larger SSTable

How It Works (Step-by-Step)

  1. Append write to WAL for durability.
  2. Insert key into memtable.
  3. When memtable full, flush to new SSTable.
  4. Compaction merges SSTables by key order.
  5. Bloom filters reduce disk reads during lookups.

Minimal Concrete Example

// Write path sketch
wal_append(key, value);
memtable_put(key, value);
if (memtable_full()) flush_to_sstable();

Common Misconceptions

  • “LSM trees are always faster”: They are faster for writes, but reads can be slower.
  • “Compaction is optional”: Without compaction, read performance collapses.
  • “Bloom filters guarantee accuracy”: They can have false positives.

Check-Your-Understanding Questions

  1. Why does LSM design turn random writes into sequential writes?
  2. What is write amplification and why does it matter?
  3. How do Bloom filters reduce read amplification?

Check-Your-Understanding Answers

  1. Writes are appended to logs and flushed sequentially to SSTables.
  2. Data is rewritten multiple times during compaction, increasing total writes.
  3. They quickly exclude SSTables that cannot contain a key.

Real-World Applications

  • RocksDB, LevelDB, Cassandra, HBase
  • Time-series databases and logging systems

Where You Will Apply It

  • Project 2: Log-structured KV store
  • Project 6: Secondary index and WAL integration

References

  • LSM-Tree original paper (O’Neil et al., 1996): https://dblp.org/rec/journals/acta/OniellCM96.html
  • RocksDB compaction documentation: https://github.com/facebook/rocksdb/wiki/Compaction
  • Designing Data-Intensive Applications - Ch. 3 (Storage and retrieval)

Key Insight

LSM trees trade read simplicity for massive write throughput by embracing sequential I/O.

Summary

LSM trees are the go-to write-optimized structure. They rely on compaction, bloom filters, and sequential I/O to achieve high throughput under heavy write loads.

Homework/Exercises to Practice the Concept

  1. Implement a memtable and flush it to a sorted file.
  2. Write a merge function for two sorted SSTables.

Solutions to the Homework/Exercises

  1. The flushed file should be sorted by key and immutable.
  2. The merge should be linear in the total number of entries.

Chapter 8: Measurement and Big-O in Real Systems

Fundamentals

Big-O notation is a useful mental model, but real systems performance depends on constant factors, memory layout, and I/O behavior. The only reliable way to know if a structure is “fast” is to measure it under realistic workload conditions. Systems engineers use profiling tools like perf, gprof, and flamegraphs, as well as benchmarking tools like hyperfine. The key skill is to turn performance into a measurable contract: latency targets, throughput targets, and memory limits. This chapter teaches you how to reason about data structures with both theory and measurement.

Deep Dive

In practice, a data structure’s theoretical complexity is only one part of the story. Consider a hash table vs a B-tree: the hash table offers O(1) average lookup, but if the table is large and the keys are scattered, each lookup may incur cache misses. The B-tree has O(log n) complexity, but if its nodes fit in cache lines or disk pages, each step may be efficient. The result is that the B-tree can outperform the hash table for large datasets. This is why systems engineers often talk about “cache complexity” or “I/O complexity” rather than just Big-O.

Benchmarking must match the intended workload. A cache optimized for random reads may perform poorly under sequential scans. A ring buffer optimized for single-producer/single-consumer may fail under multi-producer contention. This is why you must build benchmarks that reflect real use: bursty traffic, skewed key distributions (e.g., Zipfian), or mixed read/write ratios. Measurement also requires statistical discipline: multiple runs, warm-up iterations, and percentile-based metrics. Mean latency is often misleading; tail latency (p99, p999) reveals when the system fails to meet its contract.
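A small sketch for turning a run of collected latency samples into percentile figures; it uses a simple proportional-index approximation, assumes samples are in nanoseconds, and a real harness would also discard warm-up iterations and aggregate multiple runs:

#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

// Sorts the samples in place, then picks the sample at the proportional index.
// p is in [0,100]; e.g., p = 99.0 gives p99, p = 99.9 gives p999.
uint64_t percentile_ns(uint64_t *samples, size_t n, double p) {
    qsort(samples, n, sizeof(uint64_t), cmp_u64);
    size_t idx = (size_t)((p / 100.0) * (double)(n - 1));
    return samples[idx];
}

// Usage: report p50, p99, p999 rather than the mean.
// uint64_t p99 = percentile_ns(latencies, count, 99.0);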

Profiling is the method for finding bottlenecks. A flamegraph shows which functions dominate CPU time. Cachegrind reveals where cache misses occur. perf stat can show branch mispredictions, LLC misses, and cycles per instruction. For memory-heavy projects, tools like valgrind --tool=massif measure heap usage over time. In storage projects, strace reveals system call overhead, and iostat shows disk saturation. The point is not to optimize prematurely, but to understand the bottleneck and confirm that your design is behaving as expected.

Finally, measurement should drive design. If your LRU cache spends 30 percent of its time in rehashing, you need a different resizing policy. If your allocator has high fragmentation, you need different size classes or coalescing policies. If your B-tree range scans are slow, check whether leaf nodes are aligned with page boundaries. In systems programming, theory gives you the starting point, but measurement decides the winner.

How This Fits in Projects

  • All projects require benchmarking and profiling to validate structure choices.

Definitions and Key Terms

  • Throughput: Operations per second.
  • Latency: Time per operation.
  • Tail latency: High-percentile latency (p99, p999).
  • Warm cache: Repeated access where data is already in memory.
  • Zipfian distribution: Access pattern where a few keys dominate.

Mental Model Diagram

Theory -> Hypothesis -> Implementation -> Measurement -> Decision
   |           |              |                |           |
 Big-O      Expectation     Code            Benchmark     Change

How It Works (Step-by-Step)

  1. Define the performance contract (latency/throughput/memory).
  2. Create a workload that matches production patterns.
  3. Measure baseline performance.
  4. Profile to identify bottlenecks.
  5. Change the design and re-measure.

Minimal Concrete Example

# Quick benchmark loop
$ hyperfine './cache_bench --keys 100000 --ratio 80/20'

Common Misconceptions

  • “Big-O is enough”: It ignores hardware realities.
  • “Average latency is fine”: Tail latency is what users feel.
  • “Profiling is only for optimization”: It validates correctness of design.

Check-Your-Understanding Questions

  1. Why might O(log n) beat O(1) for large data sets?
  2. Why is p99 latency more important than average?
  3. What is a warm cache and why does it matter?

Check-Your-Understanding Answers

  1. Cache locality and I/O behavior can dominate theoretical complexity.
  2. Users experience worst-case delays, not the average.
  3. Warm caches avoid disk I/O, which dramatically improves performance.

Real-World Applications

  • Performance tuning in databases, caches, and storage systems
  • SLO-driven system design

Where You Will Apply It

  • Every project in this sprint

References

  • Computer Systems: A Programmer’s Perspective - Ch. 5-6 (Performance and memory hierarchy)
  • The Art of Debugging with GDB - performance analysis sections

Key Insight

Measurement, not theory, is the final judge in systems performance.

Summary

Big-O is a tool, not the truth. Real systems require measurement, profiling, and workload-aware benchmarking to validate data structure choices.

Homework/Exercises to Practice the Concept

  1. Build a microbenchmark comparing array scan, linked list scan, and tree lookup.
  2. Measure p50, p90, p99 latency for your cache server.

Solutions to the Homework/Exercises

  1. Array scans will dominate due to cache locality.
  2. p99 latency should spike during resizing or eviction storms.

Glossary

  • Cache line: The smallest unit the CPU cache fetches (commonly 64 bytes).
  • Spatial locality: Accessing memory addresses near each other.
  • Temporal locality: Re-accessing the same data soon.
  • Load factor: The ratio of items to buckets in a hash table.
  • Fanout: Number of children per node in a tree.
  • Compaction: Merging storage files to reduce fragmentation and improve reads.
  • Backpressure: Slowing input when buffers fill.
  • Fragmentation: Memory waste due to allocation patterns.
  • SSTable: Immutable sorted storage file in LSM trees.
  • WAL: Write-ahead log for durability.

Why Data Structures Matter

The Modern Problem It Solves

Modern systems live or die on latency and throughput. Data structures are the mechanism that turns hardware into performance. The same workload can vary by orders of magnitude depending on layout and access pattern.

Real-world impact (data points):

  • SQLite documents that database pages range from 512 to 65536 bytes and that the maximum database size is about 281 TB, which makes B-tree page layout fundamental to scalability (SQLite file format documentation, accessed 2026): https://www.sqlite.org/fileformat.html
  • RocksDB’s 2022 compaction write-up reports more than 10% write-amplification reduction and about 12% compaction load savings from aligned compaction output files, showing how compaction policy directly affects throughput (RocksDB blog, 2022-10-31): https://rocksdb.org/blog/2022/10/31/align-compaction-output-file.html
  • jemalloc documentation notes that size classes trade internal fragmentation for speed, often around 20 percent wasted space for small allocations in exchange for predictable performance (jemalloc manpage, accessed 2026): https://jemalloc.net/jemalloc.3.html
  • tcmalloc documentation describes dozens of size classes (around 60-80) to balance speed and memory usage (tcmalloc design docs, accessed 2026): https://google.github.io/tcmalloc/design.html

Old Approach (Academic)         Systems Approach (This Sprint)
[Big-O proofs]                 [Measure cache misses]
[Balanced trees on paper]      [B-trees sized to pages]
[Linked lists everywhere]      [Arrays + intrusive lists]
[No workload model]            [Benchmarks + tail latency]

Context and Evolution (Optional)

Data structures evolved from theoretical foundations to hardware-driven designs. The LSM-tree was introduced in the mid-1990s to address write-heavy workloads and has become the core of many modern storage engines. Today, hardware characteristics like SSD write amplification and CPU cache behavior dominate design choices.

Concept Summary Table

  • Memory Hierarchy and Arrays: Sequential access dominates performance because cache lines and pages are the real unit of work.
  • Linked Lists and Intrusive Structures: Lists are only good when you already have the pointer and need constant-time splicing.
  • Hash Tables: O(1) depends on load factor, collision strategy, and resizing behavior.
  • Ring Buffers and Priority Queues: Fixed-size queues provide predictable backpressure; heaps enforce priority.
  • Memory Allocators: Free lists and size classes trade fragmentation for speed.
  • B-Trees and Disk Indexes: High fanout reduces I/O and enables efficient range queries.
  • LSM Trees and Compaction: Sequential writes win, but compaction and bloom filters are essential.
  • Measurement and Big-O: Theory guides you, measurement decides.

Project-to-Concept Map

Project What It Builds Primer Chapters It Uses
Project 1: In-Memory Cache Hash table + LRU list Ch. 2, Ch. 3, Ch. 8
Project 2: Log-Structured KV Store LSM tree Ch. 1, Ch. 3, Ch. 7, Ch. 8
Project 3: Packet Buffer Ring buffer + priority queue Ch. 1, Ch. 4, Ch. 8
Project 4: Memory Allocator Free lists + size classes Ch. 2, Ch. 5, Ch. 8
Project 5: B-Tree File Index Disk-oriented tree Ch. 1, Ch. 6, Ch. 8
Project 6: Capstone Database All structures combined All chapters

Deep Dive Reading by Concept

Memory and Performance

Concept Book & Chapter Why This Matters
Memory hierarchy Computer Systems: A Programmer’s Perspective - Ch. 6 Explains cache behavior and locality.
Dynamic memory Computer Systems: A Programmer’s Perspective - Ch. 9.9 Allocator fundamentals and tradeoffs.
Performance measurement The Art of Debugging with GDB - relevant sections Shows profiling and debugging methods.

Data Structures and Algorithms

Concept Book & Chapter Why This Matters
Hash tables Algorithms, Fourth Edition - Ch. 3.4 Collision strategies and resizing.
Priority queues Algorithms, Fourth Edition - Ch. 2.4 Heap-based scheduling models.
Balanced trees Algorithms, Fourth Edition - Ch. 6.1 B-tree fundamentals.

Systems and Storage

Concept Book & Chapter Why This Matters
File I/O The Linux Programming Interface - Ch. 13-14 File descriptors and persistence.
Memory allocators Operating Systems: Three Easy Pieces - Ch. 17 Free-space management.
LSM trees Designing Data-Intensive Applications - Ch. 3 Write-optimized storage design.

Quick Start

Your first 48 hours:

Day 1 (4 hours):

  1. Read Chapter 1 and Chapter 3 only.
  2. Skim Project 1 Real World Outcome and Definition of Done.
  3. Build a basic hash table with chaining and measure lookup latency (a starting sketch follows this list).
  4. Do not optimize yet; just get the cache to store and retrieve keys.
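
For step 3, a minimal chained hash table you could start from; the fixed bucket count and the FNV-1a hash are assumptions, and resizing is deliberately postponed to Project 1:

#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024   /* fixed for Day 1; incremental resizing comes later */

struct entry {
    char *key;
    char *value;
    struct entry *next;   /* chaining: colliding keys share a bucket list */
};

/* Caller must zero-initialize the table (e.g., with calloc). */
struct table { struct entry *buckets[NBUCKETS]; };

static unsigned long hash_key(const char *s) {
    unsigned long h = 1469598103934665603UL;            /* FNV-1a offset basis */
    while (*s) { h ^= (unsigned char)*s++; h *= 1099511628211UL; }
    return h;
}

char *table_get(struct table *t, const char *key) {
    struct entry *e = t->buckets[hash_key(key) % NBUCKETS];
    for (; e; e = e->next)
        if (strcmp(e->key, key) == 0) return e->value;
    return NULL;
}

void table_set(struct table *t, const char *key, const char *value) {
    unsigned long b = hash_key(key) % NBUCKETS;
    for (struct entry *e = t->buckets[b]; e; e = e->next)
        if (strcmp(e->key, key) == 0) {                 /* overwrite in place */
            free(e->value);
            e->value = strdup(value);
            return;
        }
    struct entry *e = malloc(sizeof(*e));
    e->key = strdup(key);
    e->value = strdup(value);
    e->next = t->buckets[b];                            /* insert at bucket head */
    t->buckets[b] = e;
}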

Day 2 (4 hours):

  1. Add an intrusive LRU list to your cache (Project 1).
  2. Run a benchmark with 80/20 read/write and record p99 latency.
  3. Read the “Common Pitfalls” section for Project 1.
  4. Stop when you can consistently evict the least recently used key.

By the end of Day 2, you should understand how hash tables and linked lists combine into a real system.

Learning Paths

Path 1: Caching and Performance Engineer

Best for: People who want to understand caching and latency first.

  1. Project 1 (Cache)
  2. Project 3 (Packet Buffer)
  3. Project 4 (Allocator)
  4. Project 2 (LSM Store)
  5. Project 5 (B-Tree Index)
  6. Capstone

Path 2: Storage and Database Engineer

Best for: People focused on durable data systems.

  1. Project 2 (LSM Store)
  2. Project 5 (B-Tree Index)
  3. Project 4 (Allocator)
  4. Project 1 (Cache)
  5. Capstone

Path 3: Networking and Streaming

Best for: People who want low-latency data pipelines.

  1. Project 3 (Packet Buffer)
  2. Project 1 (Cache)
  3. Project 4 (Allocator)
  4. Project 2 (LSM Store)
  5. Project 5 (B-Tree Index)

Path 4: The Completionist

Phase 1: Project 1 and Project 3
Phase 2: Project 2 and Project 4
Phase 3: Project 5
Phase 4: Capstone

Success Metrics

  • You can explain why arrays beat linked lists in most real workloads.
  • You can implement a hash table with predictable load factor behavior.
  • Your ring buffer correctly applies backpressure without data loss.
  • Your allocator detects leaks and double frees in real programs.
  • Your B-tree index supports range scans with predictable depth.
  • You can measure p50/p99 latency and explain changes after optimizations.

Tooling Appendix: Benchmarking and Profiling

  • Latency: hyperfine, wrk, custom microbenchmarks
  • CPU profiling: perf record, flamegraphs
  • Memory profiling: valgrind --tool=massif, heaptrack
  • Cache behavior: valgrind --tool=cachegrind, perf stat
  • I/O behavior: strace, iostat, dtrace
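
A few example invocations for the tools above, using a placeholder binary ./bench; the flags shown are common starting points, not the only options:

$ hyperfine './bench --keys 100000'                     # wall-clock latency with warmup runs
$ perf stat -e cache-references,cache-misses ./bench    # hardware cache counters
$ perf record -g ./bench && perf report                 # CPU profile (input for flamegraphs)
$ valgrind --tool=cachegrind ./bench                    # simulated cache miss counts
$ valgrind --tool=massif ./bench                        # heap usage over time
$ strace -c ./bench                                     # syscall counts and time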

Project Overview Table

Project Difficulty Time Structures Covered Why It Matters
In-Memory Cache Intermediate 1-2 weeks Hash tables, LRU lists Fast read path in every system
Log-Structured KV Store Advanced 2-3 weeks Arrays, LSM trees Write-heavy storage systems
Packet Buffer Intermediate 1-2 weeks Ring buffers, heaps Backpressure and traffic shaping
Memory Allocator Advanced 2 weeks Free lists, size classes Memory efficiency and stability
B-Tree File Index Advanced 2-3 weeks B-trees, disk pages Ordered persistence
Capstone Database Expert 4+ weeks All structures Systems integration

Project List

Project 1: In-Memory Cache (Mini-Redis LRU)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig
  • Coolness Level: Level 4
  • Business Potential: Infrastructure component
  • Difficulty: Intermediate
  • Knowledge Area: Caches and in-memory data systems
  • Main Book: “The Linux Programming Interface” - Socket basics

What you will build: A TCP cache server with O(1) GET/SET, LRU eviction, and stats reporting.

Why it teaches systems data structures: It forces you to combine a hash table with an intrusive doubly linked list, implement resizing without latency spikes, and measure cache hit rates under realistic load.

Real World Outcome

What you will see:

  1. A server process that accepts GET and SET commands over TCP.
  2. A stats command that shows hit rate, evictions, and memory usage.
  3. Deterministic LRU eviction under memory pressure.

Command Line Outcome Example:

$ ./mini_cache --port 6380 --max-mb 8
[cache] listening on 0.0.0.0:6380
[cache] max memory: 8 MB

$ nc localhost 6380
SET user:1 alice
OK
SET user:2 bob
OK
GET user:1
alice
STATS
keys=2 hits=1 misses=0 evictions=0 mem=1320B

# Fill memory with ~1MB values to force eviction
SET blob:1 <1MB_blob>
OK
...
SET blob:8 <1MB_blob>
OK
STATS
keys=9 hits=1 misses=0 evictions=1 mem=8MB
GET user:2
(nil)   # user:2 was least recently used and evicted

The Core Question You’re Answering

“How do you combine O(1) lookup with O(1) eviction ordering under memory pressure?”

Concepts You Must Understand First

  1. Hash tables
    • What is the load factor and why does it matter?
    • What are the tradeoffs between chaining and open addressing?
    • Book Reference: Algorithms, Fourth Edition - Ch. 3.4
  2. Intrusive doubly linked lists
    • How do you remove a node in O(1)?
    • Why does stable pointer ownership matter?
    • Book Reference: Operating Systems: Three Easy Pieces - Ch. 17
  3. Incremental resizing
    • Why can resizing cause latency spikes?
    • How does incremental rehashing avoid pauses?
    • Book Reference: The Linux Programming Interface - Ch. 7

Questions to Guide Your Design

  1. How will you enforce the memory limit and measure current usage?
  2. Will you use chaining or open addressing for collisions?
  3. How will you update the LRU list on both read and write?
  4. How will you avoid a full-table rehash pause?

Thinking Exercise

The Eviction Trace Problem

You have this access sequence:

SET a 1
SET b 2
SET c 3
GET a
SET d 4

Assume max memory only fits 3 keys. Which key should be evicted? Why?

The Interview Questions They’ll Ask

  1. “How does an LRU cache achieve O(1) lookup and eviction?”
  2. “Why is a linked list appropriate here but not for general storage?”
  3. “How would you avoid rehash latency spikes in production?”
  4. “What is the tradeoff between LFU and LRU?”
  5. “How would you instrument cache hit rate?”

Hints in Layers

Hint 1: Structure choice Use a hash table for lookup and an intrusive doubly linked list for LRU order.

Hint 2: LRU updates Every successful GET must move the node to the head of the list.

Hint 3: Eviction policy Evict from the tail of the LRU list when memory is full.

Hint 4: Incremental rehash Keep two hash tables and migrate a few buckets on each operation.
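
A minimal sketch of the structures behind Hints 1-3: each entry participates in both a hash-bucket chain and the intrusive LRU list, so a hit can move it to the head in O(1) and eviction can pop the tail in O(1). Field names are illustrative; the incremental rehash from Hint 4 would simply keep two such hash tables side by side.

#include <stddef.h>

struct cache_entry {
    char *key;
    char *value;
    size_t size;
    struct cache_entry *hnext;           /* hash bucket chain */
    struct cache_entry *prev, *next;     /* intrusive LRU list links */
};

struct lru_list { struct cache_entry *head, *tail; };

static void lru_unlink(struct lru_list *l, struct cache_entry *e) {
    if (e->prev) e->prev->next = e->next; else l->head = e->next;
    if (e->next) e->next->prev = e->prev; else l->tail = e->prev;
    e->prev = e->next = NULL;
}

/* Hint 2: every successful GET moves the entry to the head. */
static void lru_touch(struct lru_list *l, struct cache_entry *e) {
    if (l->head == e) return;
    lru_unlink(l, e);
    e->next = l->head;
    if (l->head) l->head->prev = e;
    l->head = e;
    if (!l->tail) l->tail = e;
}

/* Hint 3: eviction pops the tail (least recently used). */
static struct cache_entry *lru_evict(struct lru_list *l) {
    struct cache_entry *victim = l->tail;
    if (victim) lru_unlink(l, victim);
    return victim;   /* caller removes it from the hash table, then frees it */
}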

Books That Will Help

Topic Book Chapter
Hash tables Algorithms, Fourth Edition Ch. 3.4
Memory allocation Computer Systems: A Programmer’s Perspective Ch. 9.9
Networking basics The Linux Programming Interface Ch. 56-61

Common Pitfalls & Debugging

Problem 1: Eviction order is wrong

  • Why: LRU list not updated on GET
  • Fix: Always move node to head after access
  • Quick test: Access sequence and verify tail eviction

Problem 2: Crash during eviction

  • Why: Node removed from list but still referenced in hash table
  • Fix: Remove from hash table before free
  • Quick test: Run under valgrind --tool=memcheck

Problem 3: Latency spikes under load

  • Why: Full-table resize in one operation
  • Fix: Implement incremental rehash
  • Quick test: Benchmark and check p99 latency

Definition of Done

  • Server accepts GET/SET/DEL over TCP
  • LRU eviction works and is deterministic
  • Memory usage is enforced and measurable
  • Incremental resizing prevents latency spikes
  • Benchmarks show stable p99 latency

Project 2: Log-Structured Key-Value Store (Mini LSM-Tree)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 4
  • Business Potential: Open-core infrastructure
  • Difficulty: Advanced
  • Knowledge Area: Storage engines
  • Main Book: “Designing Data-Intensive Applications” - Ch. 3

What you will build: A persistent key-value store with append-only logs, in-memory index, and background compaction into SSTables.

Real World Outcome

What you will see:

  1. A CLI tool that can put, get, and delete keys.
  2. A directory of log segments and SSTable files.
  3. Compaction statistics that show write amplification.

Command Line Outcome Example:

$ ./lsm_kv --dir ./data
lsm> put user:1 alice
OK (offset=0)
lsm> put user:2 bob
OK (offset=32)
lsm> get user:1
alice
lsm> stats
segments=3 memtable=1MB sstables=2 compactions=1
lsm> compact
compacted 2 files -> 1 file (write_amp=2.4x)

The Core Question You’re Answering

“How do you trade read simplicity for massive write throughput?”

Concepts You Must Understand First

  1. Append-only logging
    • Why are sequential writes faster than random writes?
    • Book Reference: Computer Systems: A Programmer’s Perspective - Ch. 6
  2. Merge algorithms
    • How do you merge sorted runs efficiently?
    • Book Reference: Algorithms, Fourth Edition - Ch. 2.2
  3. Bloom filters
    • What is a false positive?
    • Book Reference: Algorithms, Fourth Edition - Ch. 3 (hashing basics)
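
For concept 3, a minimal Bloom filter sketch: k hash probes set or test bits, and a lookup answers either "definitely absent" or "probably present". The bit-array size, probe count, and double-hashing scheme are arbitrary assumptions.

#include <stdint.h>

#define BLOOM_BITS (1u << 20)   /* 1 Mbit (~128 KB) per SSTable */
#define BLOOM_K    4            /* number of probes per key */

struct bloom { uint8_t bits[BLOOM_BITS / 8]; };

static uint64_t hash64(const char *s, uint64_t seed) {
    uint64_t h = seed ^ 1469598103934665603ULL;        /* FNV-1a variant */
    while (*s) { h ^= (unsigned char)*s++; h *= 1099511628211ULL; }
    return h;
}

void bloom_add(struct bloom *b, const char *key) {
    uint64_t h1 = hash64(key, 0), h2 = hash64(key, 0x9e3779b97f4a7c15ULL);
    for (int i = 0; i < BLOOM_K; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % BLOOM_BITS;   /* double hashing */
        b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Returns 0 only if the key is definitely absent; 1 means "maybe present",
   which is exactly where false positives come from. */
int bloom_maybe_contains(const struct bloom *b, const char *key) {
    uint64_t h1 = hash64(key, 0), h2 = hash64(key, 0x9e3779b97f4a7c15ULL);
    for (int i = 0; i < BLOOM_K; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % BLOOM_BITS;
        if (!(b->bits[bit / 8] & (1u << (bit % 8)))) return 0;
    }
    return 1;
}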

Questions to Guide Your Design

  1. What size should your memtable be before flushing?
  2. Will you use leveled or tiered compaction?
  3. How will you track tombstones and deletes?
  4. How will you measure write amplification?

Thinking Exercise

The Compaction Tradeoff

If you have 10 SSTables of 100MB each, and you compact them into one 1GB file, how many bytes are rewritten? How does this impact SSD wear?

The Interview Questions They’ll Ask

  1. “Why do LSM trees favor writes over reads?”
  2. “What is compaction and why is it necessary?”
  3. “How do Bloom filters help?”
  4. “What is write amplification?”
  5. “When would you choose a B-tree over an LSM tree?”

Hints in Layers

Hint 1: WAL first Write to a log file before updating the memtable.

Hint 2: Immutable SSTables Flush memtable to a new sorted file and never modify it.

Hint 3: Merge compaction Use a multi-way merge to combine SSTables by key order.

Hint 4: Bloom filter Store a Bloom filter per SSTable to skip irrelevant files.
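
A sketch of the merge step from Hint 3, reduced to two already-sorted in-memory runs. The rule that matters is keeping only the newest version of a duplicate key; the record layout and sequence numbers are assumptions.

#include <string.h>

struct record { const char *key; unsigned long seq; const char *value; };

/* Merge two runs sorted by key; both may contain the same key.
   Higher seq = newer write. Output holds at most one record per key. */
size_t merge_runs(const struct record *a, size_t na,
                  const struct record *b, size_t nb,
                  struct record *out) {
    size_t i = 0, j = 0, n = 0;
    while (i < na || j < nb) {
        int cmp;
        if (i == na) cmp = 1;
        else if (j == nb) cmp = -1;
        else cmp = strcmp(a[i].key, b[j].key);

        if (cmp < 0)      out[n++] = a[i++];
        else if (cmp > 0) out[n++] = b[j++];
        else {            /* duplicate key: keep the newest, drop the older */
            out[n++] = (a[i].seq > b[j].seq) ? a[i] : b[j];
            i++; j++;
        }
    }
    return n;
}

A real compactor generalizes this to a k-way merge over SSTable iterators and also drops tombstoned keys once no older segment can still contain them.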

Books That Will Help

Topic Book Chapter
Log-structured storage Designing Data-Intensive Applications Ch. 3
File I/O The Linux Programming Interface Ch. 13-14
Merge algorithms Algorithms, Fourth Edition Ch. 2.2

Common Pitfalls & Debugging

Problem 1: Data disappears after restart

  • Why: WAL not replayed on startup
  • Fix: Rebuild memtable from WAL before serving reads
  • Quick test: Insert, restart, and verify data

Problem 2: Compaction corrupts keys

  • Why: Merge logic does not resolve duplicates correctly
  • Fix: Always keep the newest key version
  • Quick test: Insert same key twice, compact, then read

Problem 3: Reads slow down over time

  • Why: Too many SSTables
  • Fix: Trigger compaction based on file count threshold
  • Quick test: Monitor lookup time as files grow

Definition of Done

  • WAL ensures durability across crashes
  • Memtable flush produces sorted SSTables
  • Compaction reduces file count
  • Bloom filters reduce unnecessary reads
  • Benchmarks show higher write throughput than B-tree baseline

Project 3: Network Packet Buffer and Traffic Shaper

  • Main Programming Language: C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 3
  • Business Potential: Service and support
  • Difficulty: Intermediate
  • Knowledge Area: Networking
  • Main Book: “The Linux Programming Interface” - sockets and IPC

What you will build: A userspace traffic shaper that buffers packets, applies backpressure, and schedules transmission by priority.

Real World Outcome

What you will see:

  1. A ring buffer filling and draining under load.
  2. Backpressure when the buffer reaches capacity.
  3. Priority packets transmitted before low-priority traffic.

Command Line Outcome Example:

$ ./packet_buffer --rate 5mbps --buffer 1024
[buf] capacity=1024 packets rate=5mbps
[buf] enqueue: 100 packets
[buf] enqueue: 950 packets
[buf] buffer full, backpressure ON
[buf] dequeue: priority=HIGH packet_id=42
[buf] dequeue: priority=LOW packet_id=9

The Core Question You’re Answering

“How do you keep a system stable under bursty input without dropping the wrong data?”

Concepts You Must Understand First

  1. Ring buffers
    • How does wrap-around indexing work?
    • Book Reference: Computer Systems: A Programmer’s Perspective - Ch. 6
  2. Backpressure
    • What is the difference between blocking and dropping, and when is each appropriate?
    • Book Reference: Computer Networks - Ch. 3
  3. Priority queues
    • How do you implement a heap in an array?
    • Book Reference: Algorithms, Fourth Edition - Ch. 2.4

Questions to Guide Your Design

  1. How will you detect full vs empty buffers?
  2. What is your policy when full: drop, block, or overwrite?
  3. How will you represent priorities and scheduling?
  4. How will you measure latency per packet class?

Thinking Exercise

The Burst Scenario

If packets arrive at 1000/s and you can transmit only 600/s, how long does it take to fill a buffer of 10,000 packets? What does this imply about backpressure?

The Interview Questions They’ll Ask

  1. “Why are ring buffers so common in network stacks?”
  2. “How do you detect full vs empty without extra state?”
  3. “How would you implement a priority queue efficiently?”
  4. “What are the consequences of dropping packets?”

Hints in Layers

Hint 1: Head/tail indices Keep head for reads and tail for writes.

Hint 2: Full detection Treat the buffer as full when (tail + 1) % N == head.

Hint 3: Priority queue Store per-priority queues and dequeue from highest priority first.

Hint 4: Backpressure signal Return a code or block when buffer is full.
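
A minimal single-producer ring buffer matching Hints 1, 2, and 4: one slot is kept empty so full and empty are distinguishable, and enqueue returns a backpressure code instead of overwriting. The capacity and packet type are placeholders.

#include <stddef.h>

#define RING_CAP 1024                 /* usable capacity is RING_CAP - 1 */

struct packet { int priority; int id; };

struct ring {
    struct packet slots[RING_CAP];
    size_t head;                      /* next slot to read  */
    size_t tail;                      /* next slot to write */
};

int ring_empty(const struct ring *r) { return r->head == r->tail; }
int ring_full(const struct ring *r)  { return (r->tail + 1) % RING_CAP == r->head; }

/* Returns 0 on success, -1 to signal backpressure to the producer (Hint 4). */
int ring_enqueue(struct ring *r, struct packet p) {
    if (ring_full(r)) return -1;
    r->slots[r->tail] = p;
    r->tail = (r->tail + 1) % RING_CAP;
    return 0;
}

int ring_dequeue(struct ring *r, struct packet *out) {
    if (ring_empty(r)) return -1;
    *out = r->slots[r->head];
    r->head = (r->head + 1) % RING_CAP;
    return 0;
}

For Hint 3, one option is to keep one such ring per priority level and always drain the highest non-empty priority first.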

Books That Will Help

Topic Book Chapter
Queues and scheduling Computer Networks Ch. 3
Priority queues Algorithms, Fourth Edition Ch. 2.4
Networking APIs The Linux Programming Interface Ch. 56-61

Common Pitfalls & Debugging

Problem 1: Buffer corruption

  • Why: Wrap-around math incorrect
  • Fix: Add tests for head/tail boundaries
  • Quick test: Enqueue/dequeue around capacity repeatedly

Problem 2: Deadlocks under load

  • Why: Backpressure not released
  • Fix: Ensure dequeue always advances head
  • Quick test: Simulate full buffer and confirm recovery

Definition of Done

  • Ring buffer passes wrap-around tests
  • Backpressure engages and releases predictably
  • Priority traffic is transmitted first
  • Throughput and latency are measured

Project 4: Memory Allocator with Debugging

  • Main Programming Language: C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 4
  • Business Potential: Resume gold
  • Difficulty: Advanced
  • Knowledge Area: Memory management
  • Main Book: “Computer Systems: A Programmer’s Perspective” - Ch. 9.9

What you will build: A custom allocator with segregated free lists, coalescing, and debugging hooks for leak detection and double frees.

Real World Outcome

What you will see:

  1. A drop-in malloc/free replacement via LD_PRELOAD.
  2. A leak report showing unfreed allocations and stack traces.
  3. A fragmentation report showing free list utilization.

Command Line Outcome Example:

$ LD_PRELOAD=./libmymalloc.so ./test_program
[malloc] alloc 128 bytes -> 0x7f123400
[malloc] alloc 256 bytes -> 0x7f123480
[malloc] free  128 bytes -> 0x7f123400
[malloc] WARNING double free at 0x7f123400
[malloc] leak report: 1 block (256 bytes)

The Core Question You’re Answering

“How does a memory allocator balance speed with fragmentation control?”

Concepts You Must Understand First

  1. Free lists and size classes
    • Why segregate by size?
    • Book Reference: Operating Systems: Three Easy Pieces - Ch. 17
  2. Coalescing
    • How do you detect adjacent free blocks?
    • Book Reference: Computer Systems: A Programmer’s Perspective - Ch. 9.9
  3. Debugging metadata
    • How do you track allocations without huge overhead?
    • Book Reference: The Art of Debugging with GDB - relevant sections

Questions to Guide Your Design

  1. How many size classes will you use and why?
  2. When do you coalesce (immediately or lazily)?
  3. How will you track allocation metadata safely?
  4. How will you detect double frees?

Thinking Exercise

The Fragmentation Puzzle

If you allocate 100 blocks of 64 bytes and free every other block, how much contiguous memory is available for a 1KB allocation? What does this mean for external fragmentation?

The Interview Questions They’ll Ask

  1. “What is the difference between internal and external fragmentation?”
  2. “Why do allocators use size classes?”
  3. “How would you implement coalescing?”
  4. “How does LD_PRELOAD work for allocator replacement?”

Hints in Layers

Hint 1: Segregated bins Use an array of free lists indexed by size class.

Hint 2: Boundary tags Store size in header and footer for coalescing.

Hint 3: Debug map Use a hash table keyed by pointer to track allocations.

Hint 4: Canary bytes Add guard bytes to detect buffer overflows.
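
A sketch of the metadata behind Hints 1 and 2: a boundary tag stores the block size plus an allocated bit (mirrored in a footer so free() can inspect physical neighbors), and a size-class function maps request sizes to segregated bins. The power-of-two bin spacing here is a simplifying assumption, not jemalloc's or tcmalloc's scheme.

#include <stddef.h>
#include <stdint.h>

#define NUM_CLASSES 16

/* Boundary tag written at both ends of every block: the header at the start,
   an identical footer at the end, so coalescing can walk to the previous block. */
struct boundary_tag {
    size_t  size;         /* total block size in bytes, including both tags */
    uint8_t allocated;    /* 1 = in use, 0 = free */
};

struct free_block {
    struct boundary_tag tag;
    struct free_block *next;   /* intrusive free-list links (valid only when free) */
    struct free_block *prev;
};

/* One segregated free list per size class (Hint 1). */
static struct free_block *bins[NUM_CLASSES];

/* Map a request size to a bin: 16, 32, 64, ... bytes (power-of-two spacing). */
static int size_class(size_t size) {
    int cls = 0;
    size_t limit = 16;
    while (limit < size && cls < NUM_CLASSES - 1) { limit <<= 1; cls++; }
    return cls;
}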

Books That Will Help

Topic Book Chapter
Allocator design Computer Systems: A Programmer’s Perspective Ch. 9.9
Free lists Operating Systems: Three Easy Pieces Ch. 17
Debugging memory The Art of Debugging with GDB relevant sections

Common Pitfalls & Debugging

Problem 1: Crashes during free

  • Why: Block headers corrupted
  • Fix: Add guard bytes and validate headers
  • Quick test: Use AddressSanitizer in debug builds

Problem 2: High fragmentation

  • Why: Size classes too coarse
  • Fix: Add more bins or use best-fit for large blocks
  • Quick test: Track fragmentation statistics

Definition of Done

  • Allocator serves allocations of multiple sizes
  • Coalescing reduces external fragmentation
  • Leak detection reports unfreed blocks
  • Double frees are detected and reported

Project 5: B-Tree File Index (Mini Filesystem Index)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 4
  • Business Potential: Open-core infrastructure
  • Difficulty: Advanced
  • Knowledge Area: Disk data structures
  • Main Book: “Database Internals” - storage engine design

What you will build: A disk-based B-tree that indexes file names and supports prefix queries and range scans.

Real World Outcome

What you will see:

  1. An index file on disk with fixed-size pages.
  2. Fast prefix lookups like log_* and img_*.
  3. Stable tree height as the index grows.

Command Line Outcome Example:

$ ./btree_index --build /var/log
[index] pagesize=4096
[index] inserted 120000 keys
[index] height=3 fanout=96

$ ./btree_index --query "log_"
log_2024_12_31
log_2025_01_01
log_2025_01_02

The Core Question You’re Answering

“How do you design a tree that minimizes disk I/O rather than CPU time?”

Concepts You Must Understand First

  1. B-tree node layout
    • How do keys and pointers fit in a page?
    • Book Reference: Algorithms, Fourth Edition - Ch. 6.1
  2. Page-oriented I/O
    • Why does page size dictate tree fanout?
    • Book Reference: The Linux Programming Interface - file I/O chapters
  3. Range scans
    • Why do linked leaves matter?
    • Book Reference: Database Internals - Ch. 2-4

Questions to Guide Your Design

  1. What page size will you use and why?
  2. How will you serialize nodes to disk?
  3. How will you handle splits and merges safely?
  4. How will you implement prefix and range scans?

Thinking Exercise

The Height Calculation

If your page can hold 100 keys, how many levels do you need for 1 billion keys? What does this imply about lookup I/O?

The Interview Questions They’ll Ask

  1. “Why do databases use B-trees instead of binary search trees?”
  2. “What is the difference between B-tree and B+ tree?”
  3. “How do you keep a B-tree balanced during inserts?”
  4. “How would you implement range scans efficiently?”

Hints in Layers

Hint 1: Page alignment Choose a page size (e.g., 4096 bytes) and design node layout accordingly.

Hint 2: Split policy When a node is full, split and promote the middle key to the parent.

Hint 3: Linked leaves Store a next-leaf pointer to support range scans.

Hint 4: Crash safety Write nodes to new pages before updating parent pointers.
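
A sketch of a fixed-size on-disk node for Hints 1-3; the 4096-byte page, 64-byte keys, and 8-byte child pointers are assumptions, and the fanout falls out of that arithmetic.

#include <stdint.h>

#define PAGE_SIZE   4096
#define KEY_SIZE    64
#define HEADER_SIZE 16

/* Per slot: one fixed-size key plus one 8-byte child/record pointer.
   FANOUT = (4096 - 16) / (64 + 8) = 56 with integer division. */
#define FANOUT ((PAGE_SIZE - HEADER_SIZE) / (KEY_SIZE + sizeof(uint64_t)))

struct btree_page {
    uint16_t is_leaf;                 /* leaf pages point at records        */
    uint16_t nkeys;                   /* number of keys currently stored    */
    uint32_t reserved;
    uint64_t next_leaf;               /* Hint 3: leaf-level sibling link    */
    char     keys[FANOUT][KEY_SIZE];  /* fixed-size keys, kept sorted       */
    uint64_t children[FANOUT];        /* child page numbers or record offsets */
};
/* This struct is 4048 bytes; the remaining bytes pad the page to 4096 on disk. */

Because capacity grows as fanout raised to the height, a fanout in the tens keeps the tree only a handful of levels deep even for very large key counts, which is exactly the point of the height-calculation exercise above.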

Books That Will Help

Topic Book Chapter
B-tree fundamentals Algorithms, Fourth Edition Ch. 6.1
Disk I/O and pages The Linux Programming Interface Ch. 13-14
Storage engines Database Internals Ch. 2-4

Common Pitfalls & Debugging

Problem 1: Tree corruption after crash

  • Why: Parent pointers updated before child page persisted
  • Fix: Write child page first, then parent update (or WAL)
  • Quick test: Crash mid-insert and verify recovery

Problem 2: Range scans slow

  • Why: Leaf nodes not linked
  • Fix: Add leaf-level linked list
  • Quick test: Scan with prefix and measure throughput

Definition of Done

  • Index persists to disk across restarts
  • Inserts keep tree balanced and height stable
  • Range scans return ordered results
  • Benchmark shows predictable lookup time

Project 6: Capstone - Simple Database Engine

  • Main Programming Language: C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 5
  • Business Potential: Infrastructure product
  • Difficulty: Expert
  • Knowledge Area: Database systems

What you will build: A minimal SQL engine with a B-tree primary index, an LSM-based secondary index, a buffer pool cache, and a WAL for durability.

Real World Outcome

What you will see:

  1. A SQL-like shell that supports CREATE TABLE, INSERT, SELECT.
  2. A persistent database file with a WAL and B-tree pages.
  3. Query plans that show index usage.

Command Line Outcome Example:

$ ./mini_db
> CREATE TABLE users (id INT, name TEXT);
OK
> INSERT INTO users VALUES (1, 'alice');
OK
> INSERT INTO users VALUES (2, 'bob');
OK
> SELECT name FROM users WHERE id = 2;
bob
> .plan SELECT name FROM users WHERE id = 2;
plan: index lookup (btree)

The Core Question You’re Answering

“How do the major systems data structures integrate into a working database?”

Concepts You Must Understand First

  1. B-tree index
    • How does it map keys to pages?
    • Book Reference: Algorithms, Fourth Edition - Ch. 6.1
  2. LSM and WAL
    • Why is append-only logging essential for durability?
    • Book Reference: Designing Data-Intensive Applications - Ch. 3
  3. Buffer pool and LRU
    • How does a cache reduce disk I/O?
    • Book Reference: Operating Systems: Three Easy Pieces - Ch. 17

Questions to Guide Your Design

  1. What is your on-disk page format?
  2. How will you coordinate WAL, memtable, and B-tree updates?
  3. How will you manage buffer pool eviction?
  4. What is your crash recovery procedure?

Thinking Exercise

The Crash Scenario

If the system crashes between writing the WAL and updating the B-tree, what happens on restart? Which state should be considered correct?

The Interview Questions They’ll Ask

  1. “What is the role of the WAL in a database engine?”
  2. “How does a buffer pool reduce disk I/O?”
  3. “Why might a database use both B-trees and LSM trees?”
  4. “How would you design crash recovery?”

Hints in Layers

Hint 1: Start with WAL Log every change before applying it to indexes.

Hint 2: Buffer pool Cache pages in memory and evict with LRU.

Hint 3: Simplify SQL Start with a minimal grammar: CREATE, INSERT, SELECT, WHERE.

Hint 4: Recovery Replay WAL entries on startup before serving queries.
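
A sketch of Hints 1 and 4: each change is appended to the WAL as a length-prefixed record and fsync'd before it is applied, and on startup the log is replayed from the beginning. The record framing and the apply callback are assumptions, not a fixed format.

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Append one record: [4-byte length][payload], then fsync for durability. */
int wal_append(int fd, const void *payload, uint32_t len) {
    if (write(fd, &len, sizeof(len)) != (ssize_t)sizeof(len)) return -1;
    if (write(fd, payload, len) != (ssize_t)len) return -1;
    return fsync(fd);                 /* the record is durable only after this */
}

/* Replay every complete record before serving queries (Hint 4).
   apply() re-executes the change against the in-memory and indexed state. */
int wal_replay(int fd, void (*apply)(const void *payload, uint32_t len)) {
    uint32_t len;
    char buf[4096];
    lseek(fd, 0, SEEK_SET);
    while (read(fd, &len, sizeof(len)) == (ssize_t)sizeof(len)) {
        if (len > sizeof(buf)) return -1;                 /* corrupt or oversized */
        if (read(fd, buf, len) != (ssize_t)len) break;    /* torn tail: stop here */
        apply(buf, len);
    }
    return 0;
}

A real engine also checksums each record so replay can distinguish a torn final write from corruption earlier in the log.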

Books That Will Help

Topic Book Chapter
Storage engines Database Internals Ch. 2-4
WAL and recovery Operating Systems: Three Easy Pieces Ch. 42
Indexes Algorithms, Fourth Edition Ch. 6.1

Common Pitfalls & Debugging

Problem 1: Data corruption after crash

  • Why: WAL not replayed correctly
  • Fix: Ensure WAL is applied before opening the B-tree
  • Quick test: Crash during insert and verify recovery

Problem 2: Buffer pool thrashing

  • Why: Cache size too small or eviction policy broken
  • Fix: Tune size and validate LRU ordering
  • Quick test: Run workload with skewed access

Definition of Done

  • SQL shell supports basic CREATE/INSERT/SELECT
  • WAL ensures durability across crashes
  • B-tree index supports point and range queries
  • Buffer pool reduces disk I/O under load
  • Basic query plan output is correct