Sprint 4.5 - Data Structures and Algorithmic Tools (Systems View)

Goal: Build the mental model that data structures are performance contracts in real systems. You will learn why arrays are usually the fastest choice, where linked lists actually win, how hash tables hide costly resizing and collision behavior, and why disk-friendly trees and LSM structures dominate storage engines. By the end, you will be able to choose structures based on access patterns, memory layout, and I/O behavior, and you will validate those choices with profiling and benchmarks, not intuition.

Introduction

Data structures in systems programming are not just containers; they are executable performance agreements between your code, the CPU cache, and the storage device. In production systems, the wrong structure silently destroys throughput, increases tail latency, or makes durability impossible to reason about.

What you will build (by the end of this guide):

  • An in-memory cache server with O(1) lookup and O(1) LRU eviction
  • A log-structured key-value store with background compaction
  • A network packet buffer that implements backpressure and prioritization
  • A memory allocator with free lists, size classes, and leak detection
  • A disk-based B-tree index with range queries
  • A capstone database engine that combines all of the above

Scope (what is included):

  • Data structures as they appear in production systems
  • Memory layout, cache locality, and I/O behavior
  • Real benchmarking and performance measurement
  • Persistence, crash resilience, and compaction strategies

Out of scope (for this guide):

  • Formal proofs of correctness
  • Purely academic data structure exercises
  • High-level database query optimizers beyond basics

The Big Picture (Mental Model)

Workload -> Access Pattern -> Memory Layout -> Structure -> Algorithm -> Measurement
   |             |                |              |           |             |
   v             v                v              v           v             v
Reads vs Writes  Random vs Seq    Cache lines    Array,      O(1) vs O(log) Benchmarks
Latency targets  Point vs Range   Page size      Hash, Tree  Tail latency   Profiling
Durability       Burstiness       Allocator      LSM, Ring   Amortized cost Flamegraphs

Key Terms You Will See Everywhere

  • Cache line: The smallest chunk of memory the CPU cache loads at once (typically 64 bytes).
  • Load factor: The ratio of elements to buckets in a hash table.
  • Fanout: The number of children per internal node in a tree (critical for disk I/O).
  • Compaction: The process of merging and rewriting storage segments to reduce fragmentation and improve read performance.
  • Backpressure: A signal that slows producers when buffers are full.

How to Use This Guide

  1. Read the Theory Primer like a mini-book. Each chapter is designed to give you a mental model and an implementation blueprint.
  2. Build projects in order if you are new to systems performance. Each project depends on previous structures.
  3. For each project, copy the Real World Outcome first, then implement until your output matches.
  4. Use the Questions to Guide Your Design before you write code. This is where most production bugs originate.
  5. Measure everything. A data structure is only correct when it meets the performance contract you intended.

Prerequisites & Background

Essential Prerequisites (Must Have)

Programming Skills:

  • Confident C programming (pointers, structs, function pointers, manual memory management)
  • Comfort reading and writing code that uses malloc, free, and memcpy
  • Ability to build and debug with make, gcc or clang

Systems Fundamentals:

  • File I/O (open, read, write, fsync)
  • Basic socket programming (socket, bind, accept)
  • Understanding of stack vs heap, and process memory layout

Data Structure Fundamentals:

  • Arrays, linked lists, stacks, queues
  • Basic hash tables and collision handling
  • Trees and traversal basics

Debugging Skills:

  • gdb or lldb for stepping and stack traces
  • valgrind for memory leaks and invalid accesses
  • Basic profiling with perf or gprof

Helpful But Not Required

  • Lock-free programming and atomics
  • Filesystem internals and page caching
  • Benchmark design and statistical analysis

Self-Assessment Questions

  • Can you explain why a cache miss is expensive compared to a register read?
  • Can you detect a leak and a double free using valgrind?
  • Do you know what a load factor is and why 0.75 is common?
  • Can you explain why B-trees have high fanout?
  • Can you read a flamegraph and identify hot paths?

If you answered “no” to 2 or more, review CS:APP chapters on memory and performance and build a small C project first.

Development Environment Setup

Required Tools:

  • Linux or macOS environment
  • gcc or clang (C11 or newer)
  • make
  • gdb or lldb
  • valgrind (Linux) or leaks (macOS)
  • perf (Linux) or Instruments (macOS)

Recommended Tools:

  • hyperfine for benchmarking
  • strace or dtrace for system call tracing
  • flamegraph for visualization

Testing Your Setup:

$ cc --version
$ make --version
$ gdb --version
$ valgrind --version

Time Investment

  • Project 1: 1-2 weeks (15-25 hours)
  • Project 2: 2-3 weeks (25-40 hours)
  • Project 3: 1-2 weeks (15-25 hours)
  • Project 4: 2 weeks (20-30 hours)
  • Project 5: 2-3 weeks (25-40 hours)
  • Capstone: 4+ weeks (40-80 hours)

Important Reality Check

These projects are deliberately hard. You will debug segmentation faults, measure unexpected slowdowns, and rewrite code. The goal is to build intuition for the systems tradeoffs that textbooks rarely teach. Expect to iterate.

Big Picture / Mental Model

Data structures in systems are chosen primarily by access pattern and storage medium:

Access Pattern   -> Structure   -> Storage Medium -> Dominant Cost
Random lookups   -> Hash table  -> RAM            -> Cache misses
Ordered ranges   -> B-tree      -> Disk/SSD       -> Page reads
Append-only      -> LSM tree    -> Disk/SSD       -> Sequential writes
Streaming FIFO   -> Ring buffer -> RAM            -> Backpressure
Object lifetimes -> Free list   -> RAM            -> Fragmentation

Key insight: In systems, latency is dominated by memory and I/O behavior, not just Big-O. Arrays often beat “faster” theoretical structures simply because they align with the CPU cache and storage page size.

Theory Primer

This section is your mini-book. Each chapter includes fundamentals, a deep dive, and the practical connection to the projects.

Chapter 1: Memory Hierarchy and Arrays (Cache Locality as a Superpower)

Fundamentals

Arrays are the baseline structure for systems performance because they place elements in contiguous memory. CPUs fetch data in cache lines, not individual bytes. When you access one element of an array, you implicitly load nearby elements into L1 or L2 cache, dramatically reducing latency for the next access. This behavior is called spatial locality. Arrays also allow the CPU’s hardware prefetcher to predict the next memory access and fetch it ahead of time. In practice, this means an O(n) scan over an array can be faster than an O(log n) tree lookup if the tree nodes are scattered in memory. Arrays also have low metadata overhead, which matters for millions of small objects. The core idea: arrays are fast because they turn unpredictable memory access into predictable sequential access.

Deep Dive

The memory hierarchy is a stack of increasingly slower, larger storage layers: registers, L1/L2/L3 caches, DRAM, SSD, and disk. Each level is orders of magnitude slower than the one above. This means the performance of a data structure is often determined by how well it aligns with cache lines and memory pages. Arrays excel because their contiguous layout allows cache lines to be fully utilized. When you iterate through an array, each cache line fetch gives you multiple elements for free. By contrast, pointer-heavy structures like linked lists cause the CPU to fetch a cache line that contains mostly pointer metadata and a single element, wasting bandwidth and adding unpredictable branch behavior.

Array growth strategies are another key systems concept. Most dynamic arrays grow geometrically (e.g., doubling capacity) to amortize reallocation costs. The cost of resizing can be high because it involves allocating a new buffer and copying the old contents. However, the amortized cost per append is still O(1), and the benefit is tight, contiguous storage. This is why LSM memtables and log buffers prefer array-backed structures: writes are sequential, and reads can be optimized by scanning or binary searching. Arrays are also the foundation of many higher-level structures: hash tables store buckets in arrays, heaps are arrays with index arithmetic, and ring buffers are arrays with modular arithmetic.

In systems programming, array layout decisions should align with cache line and page size boundaries. The typical cache line size is 64 bytes, so packing your hot fields into a compact struct often yields massive performance gains. For disk, the unit is the page or block (commonly 4KB). A B-tree node that fits in a page minimizes the number of I/O operations per query. Arrays also enable vectorization and SIMD because sequential memory allows predictable memory access patterns. When you measure performance, you will often find that arrays with simple loops outperform more “clever” structures due to better branch prediction and fewer cache misses. The deeper lesson is that asymptotic complexity is only part of the performance contract; layout and access patterns can dominate.
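To make the layout point concrete, here is a small sketch comparing an array-of-structs layout with a struct-of-arrays layout; the field names and sizes are illustrative, not tied to any project:

#include <stddef.h>
#include <stdint.h>

// AoS: each 64-byte record mixes the hot "count" field with cold metadata,
// so a scan over counts drags one full cache line per record.
typedef struct {
    uint64_t count;        // hot: read on every pass
    char     name[48];     // cold: rarely touched
    uint64_t last_seen;    // cold
} record_aos_t;

// SoA: hot counts are contiguous, so one 64-byte cache line holds 8 counts
// and the hardware prefetcher can stream them.
typedef struct {
    uint64_t *count;
    char    (*name)[48];
    uint64_t *last_seen;
    size_t    n;
} records_soa_t;

uint64_t total_aos(const record_aos_t *r, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) sum += r[i].count;   // ~1 cache line per record
    return sum;
}

uint64_t total_soa(const records_soa_t *r) {
    uint64_t sum = 0;
    for (size_t i = 0; i < r->n; i++) sum += r->count[i];   // ~8 counts per cache line
    return sum;
}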

How This Fits in Projects

  • Project 2: Log segments and SSTables use array-like sequential layout.
  • Project 3: Ring buffers are fixed-size arrays with circular indexing.
  • Project 4: Size classes use arrays of free lists.
  • Project 5: B-tree nodes are effectively fixed-size arrays stored in pages.

Definitions and Key Terms

  • Spatial locality: Accessing nearby memory addresses close in time.
  • Temporal locality: Re-accessing the same memory soon after use.
  • Cache line: Fixed-size chunk of memory fetched by the CPU cache.
  • Prefetcher: CPU unit that predicts and fetches future memory accesses.
  • Amortized cost: Average cost per operation over many operations.

Mental Model Diagram

Array in Memory (contiguous)
[0][1][2][3][4][5][6][7][8][9]
 ^  ^  ^  ^  ^  ^  ^  ^  ^  ^
 |  |  |  |  |  |  |  |  |  |
Cache line fetch grabs multiple elements at once

How It Works (Step-by-Step)

  1. CPU loads a cache line containing element i.
  2. Nearby elements i+1, i+2 are already loaded.
  3. The loop continues with cache hits instead of misses.
  4. If the array grows, a larger buffer is allocated and copied.
  5. The amortized cost remains low because growth is infrequent.

Minimal Concrete Example

// Simple dynamic array append with geometric growth
// (requires <stdlib.h> for realloc)
void push_int(int **arr, size_t *len, size_t *cap, int value) {
    if (*len == *cap) {
        size_t new_cap = (*cap == 0) ? 8 : (*cap * 2);
        int *tmp = realloc(*arr, new_cap * sizeof(int));
        if (!tmp) return;            // allocation failed; keep the old buffer intact
        *arr = tmp;
        *cap = new_cap;
    }
    (*arr)[(*len)++] = value;
}

Common Misconceptions

  • “O(1) lookups always beat O(log n)”: Cache misses can make the tree slower.
  • “Resizing arrays is always expensive”: It is expensive occasionally, but amortized cheap.
  • “Arrays are only for small data”: Arrays scale extremely well when access is sequential.

Check-Your-Understanding Questions

  1. Why can a sequential array scan beat a balanced tree lookup even though it is O(n)?
  2. What does amortized O(1) append mean in practice?
  3. How does cache line size influence struct layout?

Check-Your-Understanding Answers

  1. The array scan has predictable memory access with cache prefetching, while the tree lookup suffers cache misses on pointer jumps.
  2. Most appends are O(1), and expensive resizes happen rarely, so average cost stays constant.
  3. Packing hot fields into one cache line reduces misses and improves throughput.

Real-World Applications

  • Network ring buffers and audio pipelines
  • Log buffers and write-ahead logs
  • In-memory index arrays and vectorized scans

Where You Will Apply It

  • Project 2: Sequential log segments and SSTables
  • Project 3: Ring buffer memory layout
  • Project 4: Size class arrays

References

  • Computer Systems: A Programmer’s Perspective - Ch. 6 (Memory Hierarchy)
  • Write Great Code, Volume 2 - Ch. 5-6 (Memory and caches)

Key Insight

Arrays are fast because they align with how hardware actually moves data.

Summary

Arrays exploit spatial locality, reduce metadata overhead, and enable the CPU to prefetch effectively. In systems programming, this matters more than theoretical complexity. They are the backbone of high-performance queues, tables, and storage engines.

Homework/Exercises to Practice the Concept

  1. Measure the time to iterate over a 100M-element array vs a linked list.
  2. Compare performance of AoS (array of structs) vs SoA (struct of arrays) for a hot loop.

Solutions to the Homework/Exercises

  1. The array scan should be orders of magnitude faster due to cache locality.
  2. SoA often performs better for SIMD-friendly operations and cache line packing.

Chapter 2: Linked Lists and Intrusive Structures (Where Pointers Actually Win)

Fundamentals

Linked lists are often presented as a flexible alternative to arrays. In systems programming, their main advantage is not insertion speed but ownership of memory. By embedding list pointers directly inside a struct (an intrusive list), you avoid extra allocations and can remove or insert nodes in O(1) when you already have the pointer. This is why linked lists appear in memory allocators and LRU caches. They are not cache-friendly, and they are not good for random access. They are good for constant-time splicing when you already have the element and when the list itself is not too long. In short: linked lists are a specialized tool, not a default choice.

Deep Dive

The biggest performance cost of linked lists is pointer chasing. Each node is potentially in a different cache line or even a different memory page, causing frequent cache misses. The CPU cannot prefetch effectively because the next pointer is unknown until the current node is loaded. This makes even a simple traversal slow. In systems code, that cost can dwarf the theoretical advantage of O(1) insertion. However, there are important places where linked lists are still ideal.

Intrusive lists are the canonical systems use case. The list pointers live inside the objects themselves, so you do not allocate list node wrappers. This reduces allocation overhead and improves memory locality compared to non-intrusive lists. Because you already have the pointer to the object, removal is O(1) as long as you maintain prev and next. This is exactly what an LRU cache needs: after a GET, you can move the accessed item to the head of the list in constant time. In a memory allocator, free blocks are tracked in free lists using pointers embedded in the free blocks themselves. That way, the memory used for bookkeeping comes from the free block itself. When you coalesce blocks, you can unlink and relink nodes without extra allocation.

Linked lists are also the backbone of many kernel and embedded systems data structures where memory is preallocated and deterministic. The Linux kernel uses intrusive list macros (list_head) extensively. The key is that lists are used as indexing glue rather than as the primary storage. The actual data may live in arrays or pools, and lists provide fast membership and ordering operations. That means lists often pair with another structure: a hash table for lookup plus a list for ordering (LRU), or an array pool plus a free list for allocation.

One of the subtle benefits of linked lists is stable pointers. In a dynamic array, a resize moves elements, invalidating pointers. In a linked list, node addresses remain stable. That makes lists useful for caches or allocators where you must store stable references to objects. But this benefit is only valuable when you already have pointers from another structure. For random access, arrays and trees remain superior.
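A minimal sketch of the intrusive pattern described above; container_of mirrors the kernel idiom, and cache_entry_t is an illustrative object rather than a required layout:

#include <stddef.h>

// Intrusive doubly linked list: the links live inside the object itself.
typedef struct list_node {
    struct list_node *prev, *next;
} list_node_t;

// Recover the enclosing object from a pointer to its embedded node.
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

// Illustrative object that participates in an LRU list.
typedef struct {
    int key;
    int value;
    list_node_t lru;   // embedded links; no separate wrapper allocation
} cache_entry_t;

// O(1) unlink when you already hold the node pointer.
// Assumes a circular list with a sentinel head (kernel-style), so prev/next
// are never NULL.
static void list_unlink(list_node_t *n) {
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

// Usage: given a node taken from the list, get back to the entry.
// cache_entry_t *e = container_of(node, cache_entry_t, lru);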

How This Fits in Projects

  • Project 1: LRU list for cache eviction.
  • Project 4: Free lists for memory allocator.
  • Project 6: Buffer pool LRU and free page lists.

Definitions and Key Terms

  • Intrusive list: List pointers are embedded in the data object itself.
  • Pointer chasing: Following pointers that lead to unpredictable memory addresses.
  • Stable pointer: An address that does not change even when the container grows.
  • Splicing: Removing and inserting nodes without full traversal.

Mental Model Diagram

Hash Table Lookup + LRU List Ordering

Key -> Hash -> Bucket -> Node
                        |
                        v
   [MRU] <-> [ ... ] <-> [LRU]

Hash table gives O(1) lookup; list gives O(1) eviction ordering

How It Works (Step-by-Step)

  1. Store objects in a hash table for O(1) lookup.
  2. Maintain a doubly linked list for recency ordering.
  3. On access, unlink the node and move it to the head.
  4. On eviction, remove the tail node.

Minimal Concrete Example

typedef struct node {
    struct node *prev;
    struct node *next;
    int key;
    int value;
} node_t;

// Unlink n from a NULL-terminated doubly linked list in O(1).
// (A full LRU implementation also updates the tail pointer; omitted for brevity.)
void list_remove(node_t **head, node_t *n) {
    if (n->prev) n->prev->next = n->next;
    if (n->next) n->next->prev = n->prev;
    if (*head == n) *head = n->next;
    n->prev = n->next = NULL;
}

Common Misconceptions

  • “Linked lists are always faster for inserts”: Only true if you already have the pointer.
  • “Linked lists save memory”: They often use more memory due to extra pointers.
  • “Linked lists avoid resizing”: True, but at the cost of cache locality.

Check-Your-Understanding Questions

  1. Why does a linked list traversal suffer from cache misses?
  2. Why are intrusive lists better than external node wrappers?
  3. In what scenarios are stable pointers more important than cache locality?

Check-Your-Understanding Answers

  1. Each node is in a potentially different cache line, so each hop can be a miss.
  2. Intrusive lists avoid extra allocations and reduce indirection.
  3. When objects are referenced elsewhere and cannot move (e.g., LRU nodes or allocator blocks).

Real-World Applications

  • LRU caches in Redis and OS page caches
  • Free lists in memory allocators
  • Kernel task queues and timer lists

Where You Will Apply It

  • Project 1: LRU eviction list
  • Project 4: Free list management
  • Project 6: Buffer pool and free page lists

References

  • Operating Systems: Three Easy Pieces - Ch. 17 (Free-space management)
  • Advanced Programming in the UNIX Environment - Ch. 7 (Memory allocation concepts)

Key Insight

Linked lists are not a general-purpose structure; they are a precision tool for constant-time splicing with stable pointers.

Summary

Use linked lists when you already have the node pointer and need constant-time reordering or removal. Otherwise, prefer arrays and contiguous structures.

Homework/Exercises to Practice the Concept

  1. Implement an intrusive doubly linked list and measure traversal time vs array.
  2. Add LRU ordering to a hash table and confirm eviction order.

Solutions to the Homework/Exercises

  1. The list traversal will be much slower due to cache misses.
  2. LRU ordering should evict the least recently accessed key.

Chapter 3: Hash Tables and Hashing in Systems

Fundamentals

Hash tables provide average O(1) lookup by mapping keys to buckets using a hash function. In systems programming, the hidden cost is not the lookup itself but the load factor, the collision strategy, and the resizing policy. As the table fills, collisions increase and each lookup becomes a small linear scan. Resizing is expensive because every key must be rehashed. In performance-critical systems, you often use incremental rehashing or fixed-size tables to avoid latency spikes. Good hash functions are also critical; a weak hash leads to clustering and destroys performance. Hash tables are the backbone of caches, indexes, and memory tracking systems because they deliver fast point lookups with predictable average behavior when tuned correctly.

Deep Dive

A hash table is only “O(1)” on average when the load factor stays within a safe range, typically below 0.75 for open addressing or around 1.0 for chaining. In chaining, each bucket holds a list of entries. This allows high load factors, but each bucket traversal can become costly and cache-unfriendly. In open addressing, entries live directly in the bucket array, improving cache locality, but the table cannot be overfull and deletions are complex due to tombstones. The choice between these strategies is a fundamental systems design tradeoff: chaining is simpler and supports deletions easily, while open addressing can be faster but is more sensitive to load factor.

Resizing is the major hidden cost. When the table grows, every entry must be rehashed because the modulo divisor changes. For a table with millions of entries, this can cause a multi-millisecond or multi-second pause, which is unacceptable for latency-sensitive systems. Production caches like Redis avoid this with incremental rehashing: two tables exist simultaneously, and entries are migrated gradually during normal operations. This is a core example of an algorithmic choice that is invisible in academic exercises but critical in real systems.
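A sketch of the two-table approach; the entry_t chaining layout and hash() function are assumed to match the lookup snippet later in this chapter, and real implementations (for example Redis's dict) add bookkeeping this sketch omits:

#include <stdlib.h>

typedef struct entry {
    char         *key;
    struct entry *next;
} entry_t;                              // minimal chaining entry

extern size_t hash(const char *key);    // assumed hash function

typedef struct {
    entry_t **old_buckets;   // table being drained (NULL when not rehashing)
    entry_t **new_buckets;   // table being filled
    size_t    old_cap, new_cap;
    size_t    migrate_pos;   // next old bucket to migrate
} dict_t;

// Move up to `steps` buckets from the old table into the new one.
// Called from get/put so the resize cost is spread across many operations.
static void rehash_step(dict_t *d, size_t steps) {
    while (steps-- > 0 && d->old_buckets && d->migrate_pos < d->old_cap) {
        entry_t *e = d->old_buckets[d->migrate_pos];
        while (e) {
            entry_t *next = e->next;
            size_t b = hash(e->key) % d->new_cap;   // re-bucket under the new size
            e->next = d->new_buckets[b];
            d->new_buckets[b] = e;
            e = next;
        }
        d->old_buckets[d->migrate_pos++] = NULL;
    }
    if (d->old_buckets && d->migrate_pos == d->old_cap) {
        free(d->old_buckets);        // drain complete; lookups now use one table
        d->old_buckets = NULL;
    }
}
// During rehashing, lookups check new_buckets first and fall back to old_buckets.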

Hash function quality determines collision behavior. Non-uniform hashing creates clustering, which means that some buckets are far more expensive than others. In security-sensitive contexts, a poor hash function can lead to denial-of-service attacks (hash flooding). Many systems use randomized hash seeds or stronger hashing algorithms to mitigate this. There is also the issue of key size: variable-length string keys increase hash computation cost, so production systems often store precomputed hash values or use short integer keys when possible. Additionally, the metadata overhead of hash table entries can be significant, particularly when using separate chaining. For memory-heavy workloads, you must account for the cost of pointers, alignment padding, and allocator overhead.
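As a concrete reference point, FNV-1a is a simple, widely used non-cryptographic hash with decent distribution for short string keys; systems exposed to untrusted input typically prefer a seeded or keyed hash (for example SipHash) to resist hash flooding:

#include <stdint.h>

// FNV-1a (32-bit): XOR each byte into the hash, then multiply by the FNV prime.
uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;            // FNV offset basis
    while (*s) {
        h ^= (uint8_t)*s++;
        h *= 16777619u;                  // FNV prime
    }
    return h;
}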

In systems programming, hash tables are often paired with other structures. An LRU cache combines a hash table for lookup and a linked list for eviction order. A memory allocator uses a hash table to track allocations for debugging. A database may use a hash table for in-memory indexes but rely on B-trees for ordered disk indexes. The core is understanding that a hash table is not just a data structure; it is a set of tradeoffs around memory usage, cache locality, and worst-case behavior.

How This Fits in Projects

  • Project 1: Key lookup in the cache server
  • Project 2: In-memory index for log-structured store
  • Project 4: Allocation tracking
  • Project 6: Secondary indexes

Definitions and Key Terms

  • Load factor: items / buckets
  • Chaining: collisions handled by per-bucket lists
  • Open addressing: collisions handled by probing in the array
  • Tombstone: marker for deleted entries in open addressing
  • Incremental rehashing: migrating entries over multiple operations

Mental Model Diagram

Key -> Hash(key) -> Bucket index -> Entry

If collision:
Bucket 5: [k1]->[k2]->[k3] (chaining)

or
Bucket 5 occupied, probe 6, 7, 8... (open addressing)

How It Works (Step-by-Step)

  1. Compute hash for key.
  2. Map hash to bucket using modulo.
  3. Search bucket chain or probe sequence.
  4. Insert or update entry.
  5. Resize when load factor threshold reached.

Minimal Concrete Example

// Chaining lookup: hash to a bucket, then walk its list until the key matches
size_t bucket = hash(key) % table->capacity;
entry_t *e = table->buckets[bucket];
while (e && strcmp(e->key, key) != 0) e = e->next;

Common Misconceptions

  • “Hash tables are always O(1)”: Only average-case; bad hashing can degrade to O(n).
  • “Resizing is rare”: It is rare but can cause huge latency spikes if not handled.
  • “Chaining is always slower”: It can outperform open addressing under heavy deletions.

Check-Your-Understanding Questions

  1. Why does open addressing degrade sharply at high load factors?
  2. How does incremental rehashing reduce tail latency?
  3. Why can hashing be a security risk?

Check-Your-Understanding Answers

  1. Probing sequences grow long, causing many cache misses and comparisons.
  2. It spreads the rehash cost across many operations.
  3. Attackers can craft keys that collide and force O(n) behavior.

Real-World Applications

  • Redis and Memcached key lookup
  • In-memory metadata tracking
  • Deduplication tables in storage systems

Where You Will Apply It

  • Project 1: Cache table
  • Project 2: Log index
  • Project 4: Allocation tracking
  • Project 6: Secondary indexes and metadata

References

  • Algorithms, Fourth Edition - Ch. 3.4 (Hash Tables)
  • Mastering Algorithms with C - Ch. 8 (Hash Tables)
  • Redis key eviction and LRU policy overview: https://redis.io/docs/latest/develop/reference/eviction/

Key Insight

A hash table is only fast if you control collisions, load factor, and resizing.

Summary

Hash tables power fast point lookups, but their real performance is governed by collision strategy, load factor, and resizing. Systems code must treat these as first-class design decisions.

Homework/Exercises to Practice the Concept

  1. Implement a hash table with chaining and measure lookup latency at load factors 0.5, 1.0, 2.0.
  2. Add incremental rehashing and compare tail latency.

Solutions to the Homework/Exercises

  1. Latency increases non-linearly as collisions increase.
  2. Incremental rehashing reduces long pauses during resize.

Chapter 4: Queues in Systems (Ring Buffers and Priority Queues)

Fundamentals

Queues are the backbone of data movement in systems: network packets, audio frames, log records, and task scheduling all rely on queue behavior. The most important queue structure in systems programming is the ring buffer, which is a fixed-size array with head and tail indices that wrap around. This structure avoids allocation and provides predictable backpressure when full. For traffic shaping and scheduling, a priority queue (usually a binary heap) is used to select the most important item first. Together, ring buffers and heaps provide the essential tools for building predictable, high-throughput pipelines.

Deep Dive

A ring buffer uses modular arithmetic to map a continuous stream of writes into a fixed-size array. The head points to the next element to read, and the tail points to the next element to write. The buffer is empty when head equals tail, and full when advancing the tail would collide with the head. This makes capacity predictable and enables lock-free producer-consumer designs when you use atomic head and tail counters. Because the buffer is fixed-size, you avoid the unpredictable latency of malloc/free, and you gain cache-friendly memory layout.
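A minimal single-producer, single-consumer sketch of the same idea; it keeps one slot empty to distinguish full from empty, and a concurrent version would replace the plain indices with atomic counters:

#include <stddef.h>
#include <stdbool.h>

// Fixed-capacity ring buffer (no locking shown here).
typedef struct {
    int    *slots;
    size_t  capacity;   // one slot stays empty so full != empty
    size_t  head;       // next index to read
    size_t  tail;       // next index to write
} ring_t;

bool ring_push(ring_t *r, int item) {
    size_t next = (r->tail + 1) % r->capacity;
    if (next == r->head) return false;    // full: caller must apply backpressure
    r->slots[r->tail] = item;
    r->tail = next;
    return true;
}

bool ring_pop(ring_t *r, int *out) {
    if (r->head == r->tail) return false; // empty
    *out = r->slots[r->head];
    r->head = (r->head + 1) % r->capacity;
    return true;
}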

Backpressure is a core systems concept enabled by ring buffers. When the buffer is full, the producer must slow down or drop input. This is the difference between stable systems and those that overload under spikes. In networking, ring buffers absorb bursts and keep the consumer from being overwhelmed. In audio processing, ring buffers prevent underruns and overruns. Systems that ignore backpressure typically exhibit tail latency spikes and memory blowups.

Priority queues solve a different problem: ordering items by importance rather than arrival time. A binary heap stored in an array offers O(log n) insert and removal with excellent cache locality. For traffic shaping, you can maintain separate priority queues and choose from the highest priority queue when the ring buffer is ready to send. In real-time systems, priority queues are essential for deadline scheduling.
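A compact array-backed max-heap sketch; it stores bare integer priorities to keep the example short, whereas a packet scheduler would store a small struct per entry:

#include <stddef.h>

// Array-backed max-heap: parent of i is (i-1)/2, children are 2i+1 and 2i+2.
typedef struct {
    int    *items;
    size_t  len;
    size_t  cap;
} heap_t;

static void heap_swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

// O(log n) insert: append, then sift up while larger than the parent.
int heap_push(heap_t *h, int prio) {
    if (h->len == h->cap) return -1;          // full; caller decides the policy
    size_t i = h->len++;
    h->items[i] = prio;
    while (i > 0 && h->items[i] > h->items[(i - 1) / 2]) {
        heap_swap(&h->items[i], &h->items[(i - 1) / 2]);
        i = (i - 1) / 2;
    }
    return 0;
}

// O(log n) removal of the maximum: move the last item to the root, sift down.
int heap_pop(heap_t *h, int *out) {
    if (h->len == 0) return -1;
    *out = h->items[0];
    h->items[0] = h->items[--h->len];
    size_t i = 0;
    for (;;) {
        size_t l = 2 * i + 1, r = l + 1, largest = i;
        if (l < h->len && h->items[l] > h->items[largest]) largest = l;
        if (r < h->len && h->items[r] > h->items[largest]) largest = r;
        if (largest == i) break;
        heap_swap(&h->items[i], &h->items[largest]);
        i = largest;
    }
    return 0;
}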

Ring buffers and heaps are both array-based structures, which makes them predictable and cache-friendly. The most common errors are off-by-one bugs in wrap-around arithmetic, and incorrect full/empty detection. These bugs manifest as silent data corruption. In systems code, you should write unit tests that explicitly cover wrap-around and boundary conditions.

How This Fits in Projects

  • Project 3: Core of the packet buffer and rate limiter
  • Project 6: Write-ahead log buffer

Definitions and Key Terms

  • Head/Tail pointers: Indices that track read/write positions.
  • Backpressure: Signaling to slow producers when buffers are full.
  • Priority queue: A structure that returns the highest-priority item first.
  • Heap property: Parent nodes are ordered relative to children.

Mental Model Diagram

Ring Buffer

[0][1][2][3][4][5][6][7]
 ^           ^
head        tail

Write: buffer[tail] = item; tail = (tail+1) % N
Read:  item = buffer[head]; head = (head+1) % N

How It Works (Step-by-Step)

  1. Producer checks if buffer is full.
  2. If not full, write at tail and advance.
  3. Consumer checks if buffer is empty.
  4. If not empty, read at head and advance.
  5. Priority queue selects next item by importance.

Minimal Concrete Example

size_t next = (tail + 1) % capacity;
if (next == head) return BUFFER_FULL;
ring[tail] = item;
tail = next;

Common Misconceptions

  • “Ring buffers are just queues”: They are queues with fixed capacity and wrap-around logic.
  • “Priority queues are too expensive”: For moderate sizes, O(log n) with array layout is fast.
  • “Backpressure is optional”: Without it, systems fail under load spikes.

Check-Your-Understanding Questions

  1. How do you distinguish between full and empty in a ring buffer?
  2. Why is a heap often stored in an array?
  3. What happens if a producer ignores backpressure?

Check-Your-Understanding Answers

  1. Full occurs when advancing tail would meet head; empty when head == tail.
  2. Arrays provide cache-friendly layout and simple index arithmetic.
  3. Buffers grow unbounded or data gets dropped unpredictably, causing latency spikes.

Real-World Applications

  • NIC receive queues and kernel ring buffers
  • Audio and video streaming pipelines
  • Task schedulers and priority-based job systems

Where You Will Apply It

  • Project 3: Packet buffer and traffic shaping
  • Project 6: WAL and query scheduling

References

  • Linux kernel circular buffer documentation: https://docs.kernel.org/core-api/circular-buffers.html
  • Linux kernel ring buffer design: https://www.kernel.org/doc/html/latest/trace/ring-buffer-design.html
  • Computer Networks - Ch. 3-4 (Queueing and scheduling)

Key Insight

Ring buffers make performance predictable by fixing capacity and eliminating allocation.

Summary

Systems pipelines depend on predictable queues. Ring buffers provide fast FIFO behavior with backpressure, and priority queues provide policy control.

Homework/Exercises to Practice the Concept

  1. Implement a ring buffer and test wrap-around with capacity 8.
  2. Implement a binary heap and measure insert/remove throughput.

Solutions to the Homework/Exercises

  1. Wrap-around should preserve ordering across the boundary.
  2. Heap operations should remain logarithmic with stable throughput.

Chapter 5: Memory Allocator Design (Free Lists, Size Classes, Fragmentation)

Fundamentals

Memory allocation is where data structures meet physical memory. A good allocator must balance fast allocation with minimal fragmentation. Most allocators use free lists to track available blocks and size classes to group allocations of similar sizes. Small allocations are served from fixed-size bins, while large allocations are handled separately. The allocator must also support coalescing of adjacent free blocks to reduce fragmentation. Debugging features such as double-free detection and leak tracking require additional metadata structures, often hash tables. Understanding allocator design is crucial because it directly affects the performance and memory usage of every system program.

Deep Dive

A simple allocator might use a single free list of all free blocks. This is easy to implement but slow because every allocation requires scanning the list to find a sufficiently large block. It also produces fragmentation because blocks are split and not recombined efficiently. Production allocators use segregated free lists: separate lists for different size classes. When you request 32 bytes, the allocator picks a block from the 32-byte bin without scanning the entire heap. This makes allocation O(1) in practice. The tradeoff is internal fragmentation: you might waste a few bytes inside each block because the block size is fixed by the class. Modern allocators accept this tradeoff because the performance gain is massive.

Coalescing is the process of merging adjacent free blocks to form a larger block. Without coalescing, free space becomes fragmented into small unusable pieces. Allocators track block sizes and whether neighboring blocks are free. A common technique is boundary tags: storing the size in both the header and footer so that you can quickly find adjacent blocks. Another technique is to maintain separate free lists per size and also maintain a doubly linked list of all free blocks for coalescing.
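A sketch of the boundary-tag idea; the layout is deliberately simplified (no allocation bitmap, alignment handling, or heap-boundary checks), and the macro names are illustrative:

#include <stddef.h>

// Boundary tags: each block stores its size and a free flag in both a header
// and a footer, so a freed block can reach both neighbors in O(1).
typedef struct {
    size_t size;   // total block size in bytes, header and footer included
    int    free;
} tag_t;

#define FOOTER(hdr)      ((tag_t *)((char *)(hdr) + (hdr)->size - sizeof(tag_t)))
#define NEXT_HDR(hdr)    ((tag_t *)((char *)(hdr) + (hdr)->size))
#define PREV_FOOTER(hdr) ((tag_t *)((char *)(hdr) - sizeof(tag_t)))

// Merge hdr (already marked free) with the following block if it is free.
// Merging with the previous block is symmetric: PREV_FOOTER(hdr) gives its
// size, which locates the previous header.
static void coalesce_with_next(tag_t *hdr) {
    tag_t *next = NEXT_HDR(hdr);
    if (next->free) {
        hdr->size += next->size;        // absorb the neighbor into one block
        FOOTER(hdr)->size = hdr->size;  // rewrite the (new) footer to match
        FOOTER(hdr)->free = 1;
    }
}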

Threading adds another dimension. Allocators often maintain per-thread caches to reduce contention. Jemalloc and tcmalloc maintain multiple arenas, each with its own free lists and metadata. This reduces locking overhead but increases memory usage because each arena may hold unused blocks. This is a classic systems tradeoff: performance vs memory. Debugging features also require metadata. Leak detection means you must track every allocation, typically in a hash table keyed by pointer. Double-free detection requires marking free blocks and verifying they are not already freed.

Allocator behavior is deeply intertwined with cache locality. Small allocations are often packed into slabs or pages, improving cache performance. But if you allocate many tiny objects with different lifetimes, fragmentation can become severe. Understanding your application’s allocation patterns is key. The allocator is not just a library function; it is a policy engine controlling how memory is organized and reused.

How This Fits in Projects

  • Project 4: Implement a custom allocator with free lists and size classes
  • Project 6: Manage page and buffer pools with allocator-like logic

Definitions and Key Terms

  • Internal fragmentation: Wasted space inside allocated blocks.
  • External fragmentation: Free space split into unusable pieces.
  • Segregated free list: Separate free lists per size class.
  • Boundary tag: Metadata in header/footer used for coalescing.
  • Arena: Independent allocator instance to reduce contention.

Mental Model Diagram

Heap Layout
[Hdr|....allocated....|Ftr][Hdr|..free..|Ftr][Hdr|....alloc....|Ftr]
               ^               ^
           block size       block size

Free lists:
[16B] -> [block] -> [block]
[32B] -> [block]
[64B] -> [block] -> [block]

How It Works (Step-by-Step)

  1. Choose size class based on request size.
  2. Pop a block from the free list.
  3. If no block, request more memory and split.
  4. On free, insert into free list and attempt coalescing.
  5. Update metadata for debugging and leak detection.

Minimal Concrete Example

// Very simplified segregated free list
free_block_t *bin = bins[class_index(size)];
if (bin) {
    bins[class_index(size)] = bin->next;
    return (void *)(bin + 1);
}
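The class_index helper in the snippet above is left undefined; one plausible sketch (an assumption, not the required scheme) rounds requests up to power-of-two bins starting at 16 bytes:

#include <stddef.h>

// Map a request size to a power-of-two size class:
// 1..16 -> bin 0 (16B), 17..32 -> bin 1 (32B), 33..64 -> bin 2 (64B), ...
static size_t class_index(size_t size) {
    size_t bin = 0, class_size = 16;
    while (class_size < size) {
        class_size <<= 1;
        bin++;
    }
    return bin;
}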

Common Misconceptions

  • “malloc is just a wrapper over mmap”: It is a complex policy engine.
  • “Fragmentation is rare”: It is common in long-running systems.
  • “Faster allocation always means better”: It can cost significant memory overhead.

Check-Your-Understanding Questions

  1. Why do size classes improve allocation speed?
  2. What is the difference between internal and external fragmentation?
  3. Why do allocators use arenas?

Check-Your-Understanding Answers

  1. They avoid scanning large free lists and provide O(1) lookup.
  2. Internal is wasted space inside blocks, external is unusable free chunks.
  3. To reduce contention in multi-threaded programs.

Real-World Applications

  • glibc malloc, jemalloc, tcmalloc
  • Memory pools in game engines
  • Buffer pool management in databases

Where You Will Apply It

  • Project 4: Custom allocator
  • Project 6: Buffer and page pool logic

References

  • Computer Systems: A Programmer’s Perspective - Ch. 9.9 (Dynamic memory allocation)
  • jemalloc manpage and design notes: https://jemalloc.net/jemalloc.3.html
  • tcmalloc design overview: https://google.github.io/tcmalloc/design.html

Key Insight

Allocator design is about trading memory efficiency for predictable speed.

Summary

Allocators organize memory using free lists and size classes to deliver fast allocation. Fragmentation and concurrency are the real challenges, not just allocating bytes.

Homework/Exercises to Practice the Concept

  1. Simulate allocations and frees to measure fragmentation under different size classes.
  2. Implement boundary tags and test coalescing correctness.

Solutions to the Homework/Exercises

  1. Size classes reduce time but increase internal fragmentation.
  2. Coalescing should merge adjacent blocks and reduce external fragmentation.

Chapter 6: B-Trees and Disk-Oriented Indexes

Fundamentals

B-trees are the dominant structure for ordered data on disk. They are designed to minimize I/O by maximizing fanout, which reduces tree height. Each node is sized to fit a disk page (commonly 4KB). This means a single disk read brings in dozens or hundreds of keys at once. In contrast, binary search trees are shallow in memory but terrible on disk because each pointer jump can be a separate I/O. B-trees give predictable O(log n) performance with far fewer disk reads, which is why they are used in filesystems and databases.

Deep Dive

A B-tree node contains multiple keys and pointers to children, arranged in sorted order. When a node becomes full, it splits into two nodes and promotes a key to the parent. This keeps the tree balanced and ensures a consistent height across operations. A key concept is that the node size should align with the disk page size, so each node can be loaded with a single I/O. The fanout (number of children per node) is therefore determined by page size and key size. A fanout of 100 means that a tree with a million keys has height around 3, which means only 3 page reads per lookup.
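To make the fanout arithmetic concrete, here is a back-of-the-envelope sketch; the 4KB page, 8-byte keys, 8-byte child pointers, and 64-byte header are illustrative assumptions:

// Rough fanout estimate for a 4KB page with fixed-size keys and child
// pointers, minus a small node header. Real formats also store lengths,
// checksums, and variable-length keys.
#define PAGE_SIZE   4096
#define HEADER_SIZE 64
#define KEY_SIZE    8
#define CHILD_PTR   8

// fanout ~= (4096 - 64) / (8 + 8) = 252 children per node.
// With fanout ~252, 252^3 is roughly 16 million, so a million keys fit in a
// tree of height about 3: three page reads per lookup.
enum { BTREE_FANOUT = (PAGE_SIZE - HEADER_SIZE) / (KEY_SIZE + CHILD_PTR) };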

Many systems use B+ trees, where all values are stored in leaf nodes and internal nodes store only keys and child pointers. Leaves are linked together to support efficient range scans. This design is common in database indexes because range queries are frequent, and sequential leaf scans are efficient on disk. Another important concept is the separation between logical and physical layout. A B-tree node is a logical unit, but its on-disk representation requires careful layout, including headers, key arrays, child pointers, and checksums. Crash consistency often involves journaling or write-ahead logs to ensure that node splits do not corrupt the tree.
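A sketch of what a fixed-size-key page layout could look like; the field names, the 200-key cap, and the checksum placement are illustrative assumptions rather than a prescribed on-disk format:

#include <stdint.h>

// One B+ tree page; with these sizes it fits comfortably inside a 4KB disk
// page. Fixed-size keys keep the example simple; real formats use offsets
// into a variable-length key area.
#define NODE_MAX_KEYS 200

typedef struct {
    uint32_t checksum;                        // detects torn or corrupt pages
    uint16_t nkeys;
    uint8_t  is_leaf;
    uint8_t  pad;
    uint64_t keys[NODE_MAX_KEYS];
    union {
        uint64_t children[NODE_MAX_KEYS + 1]; // internal node: child page numbers
        struct {
            uint64_t values[NODE_MAX_KEYS];   // leaf: record locations
            uint64_t next_leaf;               // sibling link for range scans
        } leaf;
    } body;
} btree_page_t;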

The cost model for B-trees is dominated by disk or SSD latency. Even with SSDs, random reads are far slower than memory reads. This is why B-trees maximize fanout and minimize height. It is also why many systems choose B-trees for primary indexes but use hash tables for in-memory caches. A B-tree gives ordered traversal and predictable performance for both point and range queries, while hash tables provide fast point lookups but no ordering.

In practice, B-trees must handle deletion, which can cause underfull nodes. This requires merging or rebalancing nodes, which is more complex than insertion. Many systems choose to delay merges until nodes are under a threshold, trading space efficiency for fewer writes. Understanding these tradeoffs is critical when building disk-based indexes.

How This Fits in Projects

  • Project 5: Disk-based file index
  • Project 6: Primary index in the database engine

Definitions and Key Terms

  • Fanout: Number of children per internal node.
  • Page: Fixed-size block of disk I/O, often 4KB.
  • B+ tree: Variant where values are stored in leaves and leaves are linked.
  • Split: Operation that divides a full node into two.
  • Merge: Operation that combines underfull nodes after deletion.

Mental Model Diagram

B-Tree Node (fits in one page)
[ k1 | k2 | k3 | k4 ]
 /   /   |   \   \
C0  C1   C2   C3  C4

High fanout reduces height and I/O

How It Works (Step-by-Step)

  1. Search down the tree by comparing keys in a node.
  2. Load child page and repeat until leaf.
  3. Insert into leaf; if full, split and promote.
  4. Adjust parents until tree remains balanced.
  5. For deletes, merge or redistribute if underfull.

Minimal Concrete Example

// Search in a B-tree (simplified; assumes an is_leaf flag and
// illustrative FOUND/NOT_FOUND constants)
int i = 0;
while (i < node->nkeys && key > node->keys[i]) i++;
if (i < node->nkeys && key == node->keys[i]) return FOUND;
if (node->is_leaf) return NOT_FOUND;   // base case: key is absent
return search(node->children[i], key);

Common Misconceptions

  • “B-trees are only for databases”: Filesystems also use them extensively.
  • “B-trees are about Big-O”: They are about minimizing disk I/O.
  • “B-trees are too complex”: Complexity is necessary for predictable performance.

Check-Your-Understanding Questions

  1. Why does higher fanout reduce disk I/O?
  2. What is the benefit of a B+ tree for range queries?
  3. Why does deletion require merges or redistribution?

Check-Your-Understanding Answers

  1. Higher fanout reduces tree height, so fewer pages are read.
  2. Leaves are linked, enabling sequential scans.
  3. Deletions can leave nodes underfull, breaking balance rules.

Real-World Applications

  • SQLite and many embedded databases
  • Filesystem indexes
  • Block device indexing

Where You Will Apply It

  • Project 5: File index
  • Project 6: Primary index

References

  • Algorithms, Fourth Edition - Ch. 6.1 (Balanced search trees)
  • SQLite file format documentation (B-tree pages and page sizes): https://www.sqlite.org/fileformat.html
  • Database Internals - Ch. 2-4 (Storage engine architecture)

Key Insight

B-trees trade complex node management for predictable disk I/O.

Summary

B-trees are the core disk index structure because they minimize page reads. Their design is driven by storage page size and fanout, not just algorithmic complexity.

Homework/Exercises to Practice the Concept

  1. Compute the height of a B-tree with fanout 100 and 1 million keys.
  2. Implement leaf-linked range scan and measure throughput.

Solutions to the Homework/Exercises

  1. Height is about 3 (log base 100 of 1,000,000).
  2. Range scans should be sequential and significantly faster than random lookups.

Chapter 7: LSM Trees and Compaction (Write-Optimized Storage)

Fundamentals

Log-Structured Merge (LSM) trees are designed to optimize write throughput by converting random writes into sequential appends. Writes go to an in-memory structure (memtable) and are flushed to disk as sorted files (SSTables). Reads may need to check multiple files, so LSM trees trade read amplification for write performance. Compaction is the process of merging these files to keep read performance reasonable. LSM trees dominate modern write-heavy storage systems such as RocksDB, LevelDB, Cassandra, and HBase.

Deep Dive

An LSM tree begins with a write-ahead log (WAL) to guarantee durability. The actual data is written into an in-memory memtable, often implemented as a skip list or sorted array. When the memtable reaches a size threshold, it is flushed to disk as an immutable sorted file called an SSTable. Over time, many SSTables accumulate, and reads must search multiple files. This is where compaction comes in: smaller SSTables are merged into larger ones, reducing the number of files and reclaiming space from overwritten keys.

Compaction introduces write amplification because data is rewritten multiple times as it moves through levels. The size ratio between levels (often 10x) controls the tradeoff between write amplification and read amplification. Larger ratios mean fewer levels but more work per compaction. Smaller ratios mean more levels and more read overhead. This is a central tuning knob for LSM systems. Bloom filters are commonly used to reduce read amplification by quickly determining which SSTables cannot contain a key. This reduces unnecessary disk reads. LSM trees therefore use a combination of structures: arrays for log segments, hash tables for in-memory indexes, Bloom filters for negative lookups, and merge algorithms for compaction.
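The heart of compaction is an ordered merge. Here is a two-way merge sketch over in-memory arrays of entries; a real compaction streams entries from SSTable files, handles tombstones, and may merge many inputs at once:

#include <stddef.h>
#include <string.h>

typedef struct {
    const char *key;
    const char *value;   // NULL could represent a tombstone
} kv_t;

// Merge two key-sorted runs; when keys collide, the newer run (b) wins,
// which is how overwritten versions get reclaimed during compaction.
size_t merge_runs(const kv_t *a, size_t na,
                  const kv_t *b, size_t nb,
                  kv_t *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        int cmp = strcmp(a[i].key, b[j].key);
        if (cmp < 0)      out[k++] = a[i++];
        else if (cmp > 0) out[k++] = b[j++];
        else { out[k++] = b[j++]; i++; }   // same key: keep the newer version
    }
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    return k;                               // number of merged entries
}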

LSM trees excel on SSDs and even HDDs because sequential writes are far cheaper than random writes. In B-trees, each insert may cause multiple random writes along the tree path. In LSM trees, writes are appended, and random I/O is replaced by sequential compaction runs. The downside is that reads are more complex and require merging multiple versions of a key. This is why LSM systems are best suited to write-heavy workloads or workloads where range scans are less frequent.

Real-world implementations introduce complexity: multi-level compaction policies, tiered vs leveled compaction, size-tiered compactions, and concurrent compaction threads. They also require careful handling of tombstones (deletions) and snapshots. The key lesson is that LSM trees trade read simplicity for write speed, and the correct choice depends on workload.

How This Fits in Projects

  • Project 2: Core of the log-structured key-value store
  • Project 6: Secondary index and write path in the database engine

Definitions and Key Terms

  • Memtable: In-memory sorted structure for recent writes.
  • SSTable: Immutable sorted file on disk.
  • Compaction: Merge process to reduce file count.
  • Write amplification: Extra writes caused by compaction.
  • Bloom filter: Probabilistic structure for fast negative lookups.

Mental Model Diagram

Write Path
Client -> WAL -> Memtable -> Flush -> SSTable
                                  -> Compaction -> Larger SSTable

How It Works (Step-by-Step)

  1. Append write to WAL for durability.
  2. Insert key into memtable.
  3. When memtable full, flush to new SSTable.
  4. Compaction merges SSTables by key order.
  5. Bloom filters reduce disk reads during lookups.

Minimal Concrete Example

// Write path sketch
wal_append(key, value);
memtable_put(key, value);
if (memtable_full()) flush_to_sstable();

Common Misconceptions

  • “LSM trees are always faster”: They are faster for writes, but reads can be slower.
  • “Compaction is optional”: Without compaction, read performance collapses.
  • “Bloom filters guarantee accuracy”: They can have false positives.

Check-Your-Understanding Questions

  1. Why does LSM design turn random writes into sequential writes?
  2. What is write amplification and why does it matter?
  3. How do Bloom filters reduce read amplification?

Check-Your-Understanding Answers

  1. Writes are appended to logs and flushed sequentially to SSTables.
  2. Data is rewritten multiple times during compaction, increasing total writes.
  3. They quickly exclude SSTables that cannot contain a key.

Real-World Applications

  • RocksDB, LevelDB, Cassandra, HBase
  • Time-series databases and logging systems

Where You Will Apply It

  • Project 2: Log-structured KV store
  • Project 6: Secondary index and WAL integration

References

  • LSM-Tree original paper (O’Neil et al., 1996): https://dblp.org/rec/journals/acta/OniellCM96.html
  • RocksDB compaction documentation: https://github.com/facebook/rocksdb/wiki/Compaction
  • Designing Data-Intensive Applications - Ch. 3 (Storage and retrieval)

Key Insight

LSM trees trade read simplicity for massive write throughput by embracing sequential I/O.

Summary

LSM trees are the go-to write-optimized structure. They rely on compaction, bloom filters, and sequential I/O to achieve high throughput under heavy write loads.

Homework/Exercises to Practice the Concept

  1. Implement a memtable and flush it to a sorted file.
  2. Write a merge function for two sorted SSTables.

Solutions to the Homework/Exercises

  1. The flushed file should be sorted by key and immutable.
  2. The merge should be linear in the total number of entries.

Chapter 8: Measurement and Big-O in Real Systems

Fundamentals

Big-O notation is a useful mental model, but real systems performance depends on constant factors, memory layout, and I/O behavior. The only reliable way to know if a structure is “fast” is to measure it under realistic workload conditions. Systems engineers use profiling tools like perf, gprof, and flamegraphs, as well as benchmarking tools like hyperfine. The key skill is to turn performance into a measurable contract: latency targets, throughput targets, and memory limits. This chapter teaches you how to reason about data structures with both theory and measurement.

Deep Dive

In practice, a data structure’s theoretical complexity is only one part of the story. Consider a hash table vs a B-tree: the hash table offers O(1) average lookup, but if the table is large and the keys are scattered, each lookup may incur cache misses. The B-tree has O(log n) complexity, but if its nodes fit in cache lines or disk pages, each step may be efficient. The result is that the B-tree can outperform the hash table for large datasets. This is why systems engineers often talk about “cache complexity” or “I/O complexity” rather than just Big-O.

Benchmarking must match the intended workload. A cache optimized for random reads may perform poorly under sequential scans. A ring buffer optimized for single-producer/single-consumer may fail under multi-producer contention. This is why you must build benchmarks that reflect real use: bursty traffic, skewed key distributions (e.g., Zipfian), or mixed read/write ratios. Measurement also requires statistical discipline: multiple runs, warm-up iterations, and percentile-based metrics. Mean latency is often misleading; tail latency (p99, p999) reveals when the system fails to meet its contract.
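A small sketch for turning a run of collected latency samples into percentile figures; it uses a simple proportional-index approximation, assumes samples are in nanoseconds, and a real harness would also discard warm-up iterations and aggregate multiple runs:

#include <stdint.h>
#include <stdlib.h>

static int cmp_u64(const void *a, const void *b) {
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

// Sorts the samples in place, then picks the sample at the proportional index.
// p is in [0,100]; e.g., p = 99.0 gives p99, p = 99.9 gives p999.
uint64_t percentile_ns(uint64_t *samples, size_t n, double p) {
    qsort(samples, n, sizeof(uint64_t), cmp_u64);
    size_t idx = (size_t)((p / 100.0) * (double)(n - 1));
    return samples[idx];
}

// Usage: report p50, p99, p999 rather than the mean.
// uint64_t p99 = percentile_ns(latencies, count, 99.0);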

Profiling is the method for finding bottlenecks. A flamegraph shows which functions dominate CPU time. Cachegrind reveals where cache misses occur. perf stat can show branch mispredictions, LLC misses, and cycles per instruction. For memory-heavy projects, tools like valgrind --tool=massif measure heap usage over time. In storage projects, strace reveals system call overhead, and iostat shows disk saturation. The point is not to optimize prematurely, but to understand the bottleneck and confirm that your design is behaving as expected.

Finally, measurement should drive design. If your LRU cache spends 30 percent of its time in rehashing, you need a different resizing policy. If your allocator has high fragmentation, you need different size classes or coalescing policies. If your B-tree range scans are slow, check whether leaf nodes are aligned with page boundaries. In systems programming, theory gives you the starting point, but measurement decides the winner.

How This Fits in Projects

  • All projects require benchmarking and profiling to validate structure choices.

Definitions and Key Terms

  • Throughput: Operations per second.
  • Latency: Time per operation.
  • Tail latency: High-percentile latency (p99, p999).
  • Warm cache: Repeated access where data is already in memory.
  • Zipfian distribution: Access pattern where a few keys dominate.

Mental Model Diagram

Theory -> Hypothesis -> Implementation -> Measurement -> Decision
   |           |              |                |           |
 Big-O      Expectation     Code            Benchmark     Change

How It Works (Step-by-Step)

  1. Define the performance contract (latency/throughput/memory).
  2. Create a workload that matches production patterns.
  3. Measure baseline performance.
  4. Profile to identify bottlenecks.
  5. Change the design and re-measure.

Minimal Concrete Example

# Quick benchmark loop
$ hyperfine './cache_bench --keys 100000 --ratio 80/20'

Common Misconceptions

  • “Big-O is enough”: It ignores hardware realities.
  • “Average latency is fine”: Tail latency is what users feel.
  • “Profiling is only for optimization”: It validates correctness of design.

Check-Your-Understanding Questions

  1. Why might O(log n) beat O(1) for large data sets?
  2. Why is p99 latency more important than average?
  3. What is a warm cache and why does it matter?

Check-Your-Understanding Answers

  1. Cache locality and I/O behavior can dominate theoretical complexity.
  2. Users experience worst-case delays, not the average.
  3. Warm caches avoid disk I/O, which dramatically improves performance.

Real-World Applications

  • Performance tuning in databases, caches, and storage systems
  • SLO-driven system design

Where You Will Apply It

  • Every project in this sprint

References

  • Computer Systems: A Programmer’s Perspective - Ch. 5-6 (Performance and memory hierarchy)
  • The Art of Debugging with GDB - performance analysis sections

Key Insight

Measurement, not theory, is the final judge in systems performance.

Summary

Big-O is a tool, not the truth. Real systems require measurement, profiling, and workload-aware benchmarking to validate data structure choices.

Homework/Exercises to Practice the Concept

  1. Build a microbenchmark comparing array scan, linked list scan, and tree lookup.
  2. Measure p50, p90, p99 latency for your cache server.

Solutions to the Homework/Exercises

  1. Array scans will dominate due to cache locality.
  2. p99 latency should spike during resizing or eviction storms.

Glossary

  • Cache line: The smallest unit the CPU cache fetches (commonly 64 bytes).
  • Spatial locality: Accessing memory addresses near each other.
  • Temporal locality: Re-accessing the same data soon.
  • Load factor: The ratio of items to buckets in a hash table.
  • Fanout: Number of children per node in a tree.
  • Compaction: Merging storage files to reduce fragmentation and improve reads.
  • Backpressure: Slowing input when buffers fill.
  • Fragmentation: Memory waste due to allocation patterns.
  • SSTable: Immutable sorted storage file in LSM trees.
  • WAL: Write-ahead log for durability.

Why Data Structures Matter

The Modern Problem It Solves

Modern systems live or die on latency and throughput. Data structures are the mechanism that turns hardware into performance. The same workload can vary by orders of magnitude depending on layout and access pattern.

Real-world impact (data points):

  • SQLite documents that database pages range from 512 to 65536 bytes and that the maximum database size is about 281 TB, which makes B-tree page layout fundamental to scalability (SQLite file format documentation, accessed 2026): https://www.sqlite.org/fileformat.html
  • RocksDB’s 2022 compaction write-up reports more than 10% write-amplification reduction and about 12% compaction load savings from aligned compaction output files, showing how compaction policy directly affects throughput (RocksDB blog, 2022-10-31): https://rocksdb.org/blog/2022/10/31/align-compaction-output-file.html
  • jemalloc documentation notes that size classes trade internal fragmentation for speed, often around 20 percent wasted space for small allocations in exchange for predictable performance (jemalloc manpage, accessed 2026): https://jemalloc.net/jemalloc.3.html
  • tcmalloc documentation describes dozens of size classes (around 60-80) to balance speed and memory usage (tcmalloc design docs, accessed 2026): https://google.github.io/tcmalloc/design.html

Old Approach (Academic)         Systems Approach (This Sprint)
[Big-O proofs]                 [Measure cache misses]
[Balanced trees on paper]      [B-trees sized to pages]
[Linked lists everywhere]      [Arrays + intrusive lists]
[No workload model]            [Benchmarks + tail latency]

Context and Evolution (Optional)

Data structures evolved from theoretical foundations to hardware-driven designs. The LSM-tree was introduced in the mid-1990s to address write-heavy workloads and has become the core of many modern storage engines. Today, hardware characteristics like SSD write amplification and CPU cache behavior dominate design choices.

Concept Summary Table

  • Memory Hierarchy and Arrays: Sequential access dominates performance because cache lines and pages are the real unit of work.
  • Linked Lists and Intrusive Structures: Lists are only good when you already have the pointer and need constant-time splicing.
  • Hash Tables: O(1) depends on load factor, collision strategy, and resizing behavior.
  • Ring Buffers and Priority Queues: Fixed-size queues provide predictable backpressure; heaps enforce priority.
  • Memory Allocators: Free lists and size classes trade fragmentation for speed.
  • B-Trees and Disk Indexes: High fanout reduces I/O and enables efficient range queries.
  • LSM Trees and Compaction: Sequential writes win, but compaction and bloom filters are essential.
  • Measurement and Big-O: Theory guides you, measurement decides.

Project-to-Concept Map

Project What It Builds Primer Chapters It Uses
Project 1: In-Memory Cache Hash table + LRU list Ch. 2, Ch. 3, Ch. 8
Project 2: Log-Structured KV Store LSM tree Ch. 1, Ch. 3, Ch. 7, Ch. 8
Project 3: Packet Buffer Ring buffer + priority queue Ch. 1, Ch. 4, Ch. 8
Project 4: Memory Allocator Free lists + size classes Ch. 2, Ch. 5, Ch. 8
Project 5: B-Tree File Index Disk-oriented tree Ch. 1, Ch. 6, Ch. 8
Project 6: Capstone Database All structures combined All chapters

Deep Dive Reading by Concept

Memory and Performance

Concept Book & Chapter Why This Matters
Memory hierarchy Computer Systems: A Programmer’s Perspective - Ch. 6 Explains cache behavior and locality.
Dynamic memory Computer Systems: A Programmer’s Perspective - Ch. 9.9 Allocator fundamentals and tradeoffs.
Performance measurement The Art of Debugging with GDB - relevant sections Shows profiling and debugging methods.

Data Structures and Algorithms

Concept Book & Chapter Why This Matters
Hash tables Algorithms, Fourth Edition - Ch. 3.4 Collision strategies and resizing.
Priority queues Algorithms, Fourth Edition - Ch. 2.4 Heap-based scheduling models.
Balanced trees Algorithms, Fourth Edition - Ch. 6.1 B-tree fundamentals.

Systems and Storage

Concept Book & Chapter Why This Matters
File I/O The Linux Programming Interface - Ch. 13-14 File descriptors and persistence.
Memory allocators Operating Systems: Three Easy Pieces - Ch. 17 Free-space management.
LSM trees Designing Data-Intensive Applications - Ch. 3 Write-optimized storage design.

Quick Start

Your first 48 hours:

Day 1 (4 hours):

  1. Read Chapter 1 and Chapter 3 only.
  2. Skim Project 1 Real World Outcome and Definition of Done.
  3. Build a basic hash table with chaining and measure lookup latency (a starting sketch follows this list).
  4. Do not optimize yet; just get the cache to store and retrieve keys.
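
For step 3, a minimal chained hash table you could start from; the fixed bucket count and the FNV-1a hash are assumptions, and resizing is deliberately postponed to Project 1:

#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024   /* fixed for Day 1; incremental resizing comes later */

struct entry {
    char *key;
    char *value;
    struct entry *next;   /* chaining: colliding keys share a bucket list */
};

/* Caller must zero-initialize the table (e.g., with calloc). */
struct table { struct entry *buckets[NBUCKETS]; };

static unsigned long hash_key(const char *s) {
    unsigned long h = 1469598103934665603UL;            /* FNV-1a offset basis */
    while (*s) { h ^= (unsigned char)*s++; h *= 1099511628211UL; }
    return h;
}

char *table_get(struct table *t, const char *key) {
    struct entry *e = t->buckets[hash_key(key) % NBUCKETS];
    for (; e; e = e->next)
        if (strcmp(e->key, key) == 0) return e->value;
    return NULL;
}

void table_set(struct table *t, const char *key, const char *value) {
    unsigned long b = hash_key(key) % NBUCKETS;
    for (struct entry *e = t->buckets[b]; e; e = e->next)
        if (strcmp(e->key, key) == 0) {                 /* overwrite in place */
            free(e->value);
            e->value = strdup(value);
            return;
        }
    struct entry *e = malloc(sizeof(*e));
    e->key = strdup(key);
    e->value = strdup(value);
    e->next = t->buckets[b];                            /* insert at bucket head */
    t->buckets[b] = e;
}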

Day 2 (4 hours):

  1. Add an intrusive LRU list to your cache (Project 1).
  2. Run a benchmark with 80/20 read/write and record p99 latency.
  3. Read the “Common Pitfalls” section for Project 1.
  4. Stop when you can consistently evict the least recently used key.

By the end of Day 2, you should understand how hash tables and linked lists combine into a real system.

Learning Paths

Path 1: Caching and Performance Engineer

Best for: People who want to understand caching and latency first.

  1. Project 1 (Cache)
  2. Project 3 (Packet Buffer)
  3. Project 4 (Allocator)
  4. Project 2 (LSM Store)
  5. Project 5 (B-Tree Index)
  6. Capstone

Path 2: Storage and Database Engineer

Best for: People focused on durable data systems.

  1. Project 2 (LSM Store)
  2. Project 5 (B-Tree Index)
  3. Project 4 (Allocator)
  4. Project 1 (Cache)
  5. Capstone

Path 3: Networking and Streaming

Best for: People who want low-latency data pipelines.

  1. Project 3 (Packet Buffer)
  2. Project 1 (Cache)
  3. Project 4 (Allocator)
  4. Project 2 (LSM Store)
  5. Project 5 (B-Tree Index)

Path 4: The Completionist

Phase 1: Project 1 and Project 3
Phase 2: Project 2 and Project 4
Phase 3: Project 5
Phase 4: Capstone

Success Metrics

  • You can explain why arrays beat linked lists in most real workloads.
  • You can implement a hash table with predictable load factor behavior.
  • Your ring buffer correctly applies backpressure without data loss.
  • Your allocator detects leaks and double frees in real programs.
  • Your B-tree index supports range scans with predictable depth.
  • You can measure p50/p99 latency and explain changes after optimizations.

Tooling Appendix: Benchmarking and Profiling

  • Latency: hyperfine, wrk, custom microbenchmarks
  • CPU profiling: perf record, flamegraphs
  • Memory profiling: valgrind --tool=massif, heaptrack
  • Cache behavior: valgrind --tool=cachegrind, perf stat
  • I/O behavior: strace, iostat, dtrace
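
A few example invocations for the tools above, using a placeholder binary ./bench; the flags shown are common starting points, not the only options:

$ hyperfine './bench --keys 100000'                     # wall-clock latency with warmup runs
$ perf stat -e cache-references,cache-misses ./bench    # hardware cache counters
$ perf record -g ./bench && perf report                 # CPU profile (input for flamegraphs)
$ valgrind --tool=cachegrind ./bench                    # simulated cache miss counts
$ valgrind --tool=massif ./bench                        # heap usage over time
$ strace -c ./bench                                     # syscall counts and time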

Project Overview Table

Project Difficulty Time Structures Covered Why It Matters
In-Memory Cache Intermediate 1-2 weeks Hash tables, LRU lists Fast read path in every system
Log-Structured KV Store Advanced 2-3 weeks Arrays, LSM trees Write-heavy storage systems
Packet Buffer Intermediate 1-2 weeks Ring buffers, heaps Backpressure and traffic shaping
Memory Allocator Advanced 2 weeks Free lists, size classes Memory efficiency and stability
B-Tree File Index Advanced 2-3 weeks B-trees, disk pages Ordered persistence
Capstone Database Expert 4+ weeks All structures Systems integration

Project List

Project 1: In-Memory Cache (Mini-Redis LRU)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Zig
  • Coolness Level: Level 4
  • Business Potential: Infrastructure component
  • Difficulty: Intermediate
  • Knowledge Area: Caches and in-memory data systems
  • Main Book: “The Linux Programming Interface” - Socket basics

What you will build: A TCP cache server with O(1) GET/SET, LRU eviction, and stats reporting.

Why it teaches systems data structures: It forces you to combine a hash table with an intrusive doubly linked list, implement resizing without latency spikes, and measure cache hit rates under realistic load.

Real World Outcome

What you will see:

  1. A server process that accepts GET and SET commands over TCP.
  2. A stats command that shows hit rate, evictions, and memory usage.
  3. Deterministic LRU eviction under memory pressure.

Command Line Outcome Example:

$ ./mini_cache --port 6380 --max-mb 8
[cache] listening on 0.0.0.0:6380
[cache] max memory: 8 MB

$ nc localhost 6380
SET user:1 alice
OK
SET user:2 bob
OK
GET user:1
alice
STATS
keys=2 hits=1 misses=0 evictions=0 mem=1320B

# Fill memory with ~1MB values to force eviction
SET blob:1 <1MB_blob>
OK
...
SET blob:8 <1MB_blob>
OK
STATS
keys=9 hits=1 misses=0 evictions=1 mem=8MB
GET user:2
(nil)   # user:2 was least recently used and evicted

The Core Question You’re Answering

“How do you combine O(1) lookup with O(1) eviction ordering under memory pressure?”

Concepts You Must Understand First

  1. Hash tables
    • What is the load factor and why does it matter?
    • What are the tradeoffs between chaining and open addressing?
    • Book Reference: Algorithms, Fourth Edition - Ch. 3.4
  2. Intrusive doubly linked lists
    • How do you remove a node in O(1)?
    • Why does stable pointer ownership matter?
    • Book Reference: Operating Systems: Three Easy Pieces - Ch. 17
  3. Incremental resizing
    • Why can resizing cause latency spikes?
    • How does incremental rehashing avoid pauses?
    • Book Reference: The Linux Programming Interface - Ch. 7

Questions to Guide Your Design

  1. How will you enforce the memory limit and measure current usage?
  2. Will you use chaining or open addressing for collisions?
  3. How will you update the LRU list on both read and write?
  4. How will you avoid a full-table rehash pause?

Thinking Exercise

The Eviction Trace Problem

You have this access sequence:

SET a 1
SET b 2
SET c 3
GET a
SET d 4

Assume max memory only fits 3 keys. Which key should be evicted? Why?

The Interview Questions They’ll Ask

  1. “How does an LRU cache achieve O(1) lookup and eviction?”
  2. “Why is a linked list appropriate here but not for general storage?”
  3. “How would you avoid rehash latency spikes in production?”
  4. “What is the tradeoff between LFU and LRU?”
  5. “How would you instrument cache hit rate?”

Hints in Layers

Hint 1: Structure choice Use a hash table for lookup and an intrusive doubly linked list for LRU order.

Hint 2: LRU updates Every successful GET must move the node to the head of the list.

Hint 3: Eviction policy Evict from the tail of the LRU list when memory is full.

Hint 4: Incremental rehash Keep two hash tables and migrate a few buckets on each operation.
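
A minimal sketch of the structures behind Hints 1-3: each entry participates in both a hash-bucket chain and the intrusive LRU list, so a hit can move it to the head in O(1) and eviction can pop the tail in O(1). Field names are illustrative; the incremental rehash from Hint 4 would simply keep two such hash tables side by side.

#include <stddef.h>

struct cache_entry {
    char *key;
    char *value;
    size_t size;
    struct cache_entry *hnext;           /* hash bucket chain */
    struct cache_entry *prev, *next;     /* intrusive LRU list links */
};

struct lru_list { struct cache_entry *head, *tail; };

static void lru_unlink(struct lru_list *l, struct cache_entry *e) {
    if (e->prev) e->prev->next = e->next; else l->head = e->next;
    if (e->next) e->next->prev = e->prev; else l->tail = e->prev;
    e->prev = e->next = NULL;
}

/* Hint 2: every successful GET moves the entry to the head. */
static void lru_touch(struct lru_list *l, struct cache_entry *e) {
    if (l->head == e) return;
    lru_unlink(l, e);
    e->next = l->head;
    if (l->head) l->head->prev = e;
    l->head = e;
    if (!l->tail) l->tail = e;
}

/* Hint 3: eviction pops the tail (least recently used). */
static struct cache_entry *lru_evict(struct lru_list *l) {
    struct cache_entry *victim = l->tail;
    if (victim) lru_unlink(l, victim);
    return victim;   /* caller removes it from the hash table, then frees it */
}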

Books That Will Help

Topic Book Chapter
Hash tables Algorithms, Fourth Edition Ch. 3.4
Memory allocation Computer Systems: A Programmer’s Perspective Ch. 9.9
Networking basics The Linux Programming Interface Ch. 56-61

Common Pitfalls & Debugging

Problem 1: Eviction order is wrong

  • Why: LRU list not updated on GET
  • Fix: Always move node to head after access
  • Quick test: Access sequence and verify tail eviction

Problem 2: Crash during eviction

  • Why: Node removed from list but still referenced in hash table
  • Fix: Remove from hash table before free
  • Quick test: Run under valgrind --tool=memcheck

Problem 3: Latency spikes under load

  • Why: Full-table resize in one operation
  • Fix: Implement incremental rehash
  • Quick test: Benchmark and check p99 latency

Definition of Done

  • Server accepts GET/SET/DEL over TCP
  • LRU eviction works and is deterministic
  • Memory usage is enforced and measurable
  • Incremental resizing prevents latency spikes
  • Benchmarks show stable p99 latency

Project 2: Log-Structured Key-Value Store (Mini LSM-Tree)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust, Go
  • Coolness Level: Level 4
  • Business Potential: Open-core infrastructure
  • Difficulty: Advanced
  • Knowledge Area: Storage engines
  • Main Book: “Designing Data-Intensive Applications” - Ch. 3

What you will build: A persistent key-value store with append-only logs, in-memory index, and background compaction into SSTables.

Real World Outcome

What you will see:

  1. A CLI tool that can put, get, and delete keys.
  2. A directory of log segments and SSTable files.
  3. Compaction statistics that show write amplification.

Command Line Outcome Example:

$ ./lsm_kv --dir ./data
lsm> put user:1 alice
OK (offset=0)
lsm> put user:2 bob
OK (offset=32)
lsm> get user:1
alice
lsm> stats
segments=3 memtable=1MB sstables=2 compactions=1
lsm> compact
compacted 2 files -> 1 file (write_amp=2.4x)

The Core Question You’re Answering

“How do you trade read simplicity for massive write throughput?”

Concepts You Must Understand First

  1. Append-only logging
    • Why are sequential writes faster than random writes?
    • Book Reference: Computer Systems: A Programmer’s Perspective - Ch. 6
  2. Merge algorithms
    • How do you merge sorted runs efficiently?
    • Book Reference: Algorithms, Fourth Edition - Ch. 2.2
  3. Bloom filters
    • What is a false positive?
    • Book Reference: Algorithms, Fourth Edition - Ch. 3 (hashing basics)
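
For concept 3, a minimal Bloom filter sketch: k hash probes set or test bits, and a lookup answers either "definitely absent" or "probably present". The bit-array size, probe count, and double-hashing scheme are arbitrary assumptions.

#include <stdint.h>

#define BLOOM_BITS (1u << 20)   /* 1 Mbit (~128 KB) per SSTable */
#define BLOOM_K    4            /* number of probes per key */

struct bloom { uint8_t bits[BLOOM_BITS / 8]; };

static uint64_t hash64(const char *s, uint64_t seed) {
    uint64_t h = seed ^ 1469598103934665603ULL;        /* FNV-1a variant */
    while (*s) { h ^= (unsigned char)*s++; h *= 1099511628211ULL; }
    return h;
}

void bloom_add(struct bloom *b, const char *key) {
    uint64_t h1 = hash64(key, 0), h2 = hash64(key, 0x9e3779b97f4a7c15ULL);
    for (int i = 0; i < BLOOM_K; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % BLOOM_BITS;   /* double hashing */
        b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
}

/* Returns 0 only if the key is definitely absent; 1 means "maybe present",
   which is exactly where false positives come from. */
int bloom_maybe_contains(const struct bloom *b, const char *key) {
    uint64_t h1 = hash64(key, 0), h2 = hash64(key, 0x9e3779b97f4a7c15ULL);
    for (int i = 0; i < BLOOM_K; i++) {
        uint64_t bit = (h1 + (uint64_t)i * h2) % BLOOM_BITS;
        if (!(b->bits[bit / 8] & (1u << (bit % 8)))) return 0;
    }
    return 1;
}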

Questions to Guide Your Design

  1. What size should your memtable be before flushing?
  2. Will you use leveled or tiered compaction?
  3. How will you track tombstones and deletes?
  4. How will you measure write amplification?

Thinking Exercise

The Compaction Tradeoff

If you have 10 SSTables of 100MB each, and you compact them into one 1GB file, how many bytes are rewritten? How does this impact SSD wear?

The Interview Questions They’ll Ask

  1. “Why do LSM trees favor writes over reads?”
  2. “What is compaction and why is it necessary?”
  3. “How do Bloom filters help?”
  4. “What is write amplification?”
  5. “When would you choose a B-tree over an LSM tree?”

Hints in Layers

Hint 1: WAL first Write to a log file before updating the memtable.

Hint 2: Immutable SSTables Flush memtable to a new sorted file and never modify it.

Hint 3: Merge compaction Use a multi-way merge to combine SSTables by key order.

Hint 4: Bloom filter Store a Bloom filter per SSTable to skip irrelevant files.
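
A sketch of the merge step from Hint 3, reduced to two already-sorted in-memory runs. The rule that matters is keeping only the newest version of a duplicate key; the record layout and sequence numbers are assumptions.

#include <string.h>

struct record { const char *key; unsigned long seq; const char *value; };

/* Merge two runs sorted by key; both may contain the same key.
   Higher seq = newer write. Output holds at most one record per key. */
size_t merge_runs(const struct record *a, size_t na,
                  const struct record *b, size_t nb,
                  struct record *out) {
    size_t i = 0, j = 0, n = 0;
    while (i < na || j < nb) {
        int cmp;
        if (i == na) cmp = 1;
        else if (j == nb) cmp = -1;
        else cmp = strcmp(a[i].key, b[j].key);

        if (cmp < 0)      out[n++] = a[i++];
        else if (cmp > 0) out[n++] = b[j++];
        else {            /* duplicate key: keep the newest, drop the older */
            out[n++] = (a[i].seq > b[j].seq) ? a[i] : b[j];
            i++; j++;
        }
    }
    return n;
}

A real compactor generalizes this to a k-way merge over SSTable iterators and also drops tombstoned keys once no older segment can still contain them.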

Books That Will Help

Topic Book Chapter
Log-structured storage Designing Data-Intensive Applications Ch. 3
File I/O The Linux Programming Interface Ch. 13-14
Merge algorithms Algorithms, Fourth Edition Ch. 2.2

Common Pitfalls & Debugging

Problem 1: Data disappears after restart

  • Why: WAL not replayed on startup
  • Fix: Rebuild memtable from WAL before serving reads
  • Quick test: Insert, restart, and verify data

Problem 2: Compaction corrupts keys

  • Why: Merge logic does not resolve duplicates correctly
  • Fix: Always keep the newest key version
  • Quick test: Insert same key twice, compact, then read

Problem 3: Reads slow down over time

  • Why: Too many SSTables
  • Fix: Trigger compaction based on file count threshold
  • Quick test: Monitor lookup time as files grow

Definition of Done

  • WAL ensures durability across crashes
  • Memtable flush produces sorted SSTables
  • Compaction reduces file count
  • Bloom filters reduce unnecessary reads
  • Benchmarks show higher write throughput than B-tree baseline

Project 3: Network Packet Buffer and Traffic Shaper

  • Main Programming Language: C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 3
  • Business Potential: Service and support
  • Difficulty: Intermediate
  • Knowledge Area: Networking
  • Main Book: “The Linux Programming Interface” - sockets and IPC

What you will build: A userspace traffic shaper that buffers packets, applies backpressure, and schedules transmission by priority.

Real World Outcome

What you will see:

  1. A ring buffer filling and draining under load.
  2. Backpressure when the buffer reaches capacity.
  3. Priority packets transmitted before low-priority traffic.

Command Line Outcome Example:

$ ./packet_buffer --rate 5mbps --buffer 1024
[buf] capacity=1024 packets rate=5mbps
[buf] enqueue: 100 packets
[buf] enqueue: 950 packets
[buf] buffer full, backpressure ON
[buf] dequeue: priority=HIGH packet_id=42
[buf] dequeue: priority=LOW packet_id=9

The Core Question You’re Answering

“How do you keep a system stable under bursty input without dropping the wrong data?”

Concepts You Must Understand First

  1. Ring buffers
    • How does wrap-around indexing work?
    • Book Reference: Computer Systems: A Programmer’s Perspective - Ch. 6
  2. Backpressure
    • What is the difference between blocking and dropping, and when is each appropriate?
    • Book Reference: Computer Networks - Ch. 3
  3. Priority queues
    • How do you implement a heap in an array?
    • Book Reference: Algorithms, Fourth Edition - Ch. 2.4

Questions to Guide Your Design

  1. How will you detect full vs empty buffers?
  2. What is your policy when full: drop, block, or overwrite?
  3. How will you represent priorities and scheduling?
  4. How will you measure latency per packet class?

Thinking Exercise

The Burst Scenario

If packets arrive at 1000/s and you can transmit only 600/s, how long does it take to fill a buffer of 10,000 packets? What does this imply about backpressure?

The Interview Questions They’ll Ask

  1. “Why are ring buffers so common in network stacks?”
  2. “How do you detect full vs empty without extra state?”
  3. “How would you implement a priority queue efficiently?”
  4. “What are the consequences of dropping packets?”

Hints in Layers

Hint 1: Head/tail indices Keep head for reads and tail for writes.

Hint 2: Full detection Treat the buffer as full when (tail + 1) % N == head.

Hint 3: Priority queue Store per-priority queues and dequeue from highest priority first.

Hint 4: Backpressure signal Return a code or block when buffer is full.
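
A minimal single-producer ring buffer matching Hints 1, 2, and 4: one slot is kept empty so full and empty are distinguishable, and enqueue returns a backpressure code instead of overwriting. The capacity and packet type are placeholders.

#include <stddef.h>

#define RING_CAP 1024                 /* usable capacity is RING_CAP - 1 */

struct packet { int priority; int id; };

struct ring {
    struct packet slots[RING_CAP];
    size_t head;                      /* next slot to read  */
    size_t tail;                      /* next slot to write */
};

int ring_empty(const struct ring *r) { return r->head == r->tail; }
int ring_full(const struct ring *r)  { return (r->tail + 1) % RING_CAP == r->head; }

/* Returns 0 on success, -1 to signal backpressure to the producer (Hint 4). */
int ring_enqueue(struct ring *r, struct packet p) {
    if (ring_full(r)) return -1;
    r->slots[r->tail] = p;
    r->tail = (r->tail + 1) % RING_CAP;
    return 0;
}

int ring_dequeue(struct ring *r, struct packet *out) {
    if (ring_empty(r)) return -1;
    *out = r->slots[r->head];
    r->head = (r->head + 1) % RING_CAP;
    return 0;
}

For Hint 3, one option is to keep one such ring per priority level and always drain the highest non-empty priority first.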

Books That Will Help

Topic Book Chapter
Queues and scheduling Computer Networks Ch. 3
Priority queues Algorithms, Fourth Edition Ch. 2.4
Networking APIs The Linux Programming Interface Ch. 56-61

Common Pitfalls & Debugging

Problem 1: Buffer corruption

  • Why: Wrap-around math incorrect
  • Fix: Add tests for head/tail boundaries
  • Quick test: Enqueue/dequeue around capacity repeatedly

Problem 2: Deadlocks under load

  • Why: Backpressure not released
  • Fix: Ensure dequeue always advances head
  • Quick test: Simulate full buffer and confirm recovery

Definition of Done

  • Ring buffer passes wrap-around tests
  • Backpressure engages and releases predictably
  • Priority traffic is transmitted first
  • Throughput and latency are measured

Project 4: Memory Allocator with Debugging

  • Main Programming Language: C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 4
  • Business Potential: Resume gold
  • Difficulty: Advanced
  • Knowledge Area: Memory management
  • Main Book: “Computer Systems: A Programmer’s Perspective” - Ch. 9.9

What you will build: A custom allocator with segregated free lists, coalescing, and debugging hooks for leak detection and double frees.

Real World Outcome

What you will see:

  1. A drop-in malloc/free replacement via LD_PRELOAD.
  2. A leak report showing unfreed allocations and stack traces.
  3. A fragmentation report showing free list utilization.

Command Line Outcome Example:

$ LD_PRELOAD=./libmymalloc.so ./test_program
[malloc] alloc 128 bytes -> 0x7f123400
[malloc] alloc 256 bytes -> 0x7f123480
[malloc] free  128 bytes -> 0x7f123400
[malloc] WARNING double free at 0x7f123400
[malloc] leak report: 1 block (256 bytes)

The Core Question You’re Answering

“How does a memory allocator balance speed with fragmentation control?”

Concepts You Must Understand First

  1. Free lists and size classes
    • Why segregate by size?
    • Book Reference: Operating Systems: Three Easy Pieces - Ch. 17
  2. Coalescing
    • How do you detect adjacent free blocks?
    • Book Reference: Computer Systems: A Programmer’s Perspective - Ch. 9.9
  3. Debugging metadata
    • How do you track allocations without huge overhead?
    • Book Reference: The Art of Debugging with GDB - relevant sections

Questions to Guide Your Design

  1. How many size classes will you use and why?
  2. When do you coalesce (immediately or lazily)?
  3. How will you track allocation metadata safely?
  4. How will you detect double frees?

Thinking Exercise

The Fragmentation Puzzle

If you allocate 100 blocks of 64 bytes and free every other block, how much contiguous memory is available for a 1KB allocation? What does this mean for external fragmentation?

The Interview Questions They’ll Ask

  1. “What is the difference between internal and external fragmentation?”
  2. “Why do allocators use size classes?”
  3. “How would you implement coalescing?”
  4. “How does LD_PRELOAD work for allocator replacement?”

Hints in Layers

Hint 1: Segregated bins Use an array of free lists indexed by size class.

Hint 2: Boundary tags Store size in header and footer for coalescing.

Hint 3: Debug map Use a hash table keyed by pointer to track allocations.

Hint 4: Canary bytes Add guard bytes to detect buffer overflows.
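
A sketch of the metadata behind Hints 1 and 2: a boundary tag stores the block size plus an allocated bit (mirrored in a footer so free() can inspect physical neighbors), and a size-class function maps request sizes to segregated bins. The power-of-two bin spacing here is a simplifying assumption, not jemalloc's or tcmalloc's scheme.

#include <stddef.h>
#include <stdint.h>

#define NUM_CLASSES 16

/* Boundary tag written at both ends of every block: the header at the start,
   an identical footer at the end, so coalescing can walk to the previous block. */
struct boundary_tag {
    size_t  size;         /* total block size in bytes, including both tags */
    uint8_t allocated;    /* 1 = in use, 0 = free */
};

struct free_block {
    struct boundary_tag tag;
    struct free_block *next;   /* intrusive free-list links (valid only when free) */
    struct free_block *prev;
};

/* One segregated free list per size class (Hint 1). */
static struct free_block *bins[NUM_CLASSES];

/* Map a request size to a bin: 16, 32, 64, ... bytes (power-of-two spacing). */
static int size_class(size_t size) {
    int cls = 0;
    size_t limit = 16;
    while (limit < size && cls < NUM_CLASSES - 1) { limit <<= 1; cls++; }
    return cls;
}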

Books That Will Help

Topic Book Chapter
Allocator design Computer Systems: A Programmer’s Perspective Ch. 9.9
Free lists Operating Systems: Three Easy Pieces Ch. 17
Debugging memory The Art of Debugging with GDB relevant sections

Common Pitfalls & Debugging

Problem 1: Crashes during free

  • Why: Block headers corrupted
  • Fix: Add guard bytes and validate headers
  • Quick test: Use AddressSanitizer in debug builds

Problem 2: High fragmentation

  • Why: Size classes too coarse
  • Fix: Add more bins or use best-fit for large blocks
  • Quick test: Track fragmentation statistics

Definition of Done

  • Allocator serves allocations of multiple sizes
  • Coalescing reduces external fragmentation
  • Leak detection reports unfreed blocks
  • Double frees are detected and reported

Project 5: B-Tree File Index (Mini Filesystem Index)

  • Main Programming Language: C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 4
  • Business Potential: Open-core infrastructure
  • Difficulty: Advanced
  • Knowledge Area: Disk data structures
  • Main Book: “Database Internals” - storage engine design

What you will build: A disk-based B-tree that indexes file names and supports prefix queries and range scans.

Real World Outcome

What you will see:

  1. An index file on disk with fixed-size pages.
  2. Fast prefix lookups like log_* and img_*.
  3. Stable tree height as the index grows.

Command Line Outcome Example:

$ ./btree_index --build /var/log
[index] pagesize=4096
[index] inserted 120000 keys
[index] height=3 fanout=96

$ ./btree_index --query "log_"
log_2024_12_31
log_2025_01_01
log_2025_01_02

The Core Question You’re Answering

“How do you design a tree that minimizes disk I/O rather than CPU time?”

Concepts You Must Understand First

  1. B-tree node layout
    • How do keys and pointers fit in a page?
    • Book Reference: Algorithms, Fourth Edition - Ch. 6.1
  2. Page-oriented I/O
    • Why does page size dictate tree fanout?
    • Book Reference: The Linux Programming Interface - file I/O chapters
  3. Range scans
    • Why do linked leaves matter?
    • Book Reference: Database Internals - Ch. 2-4

Questions to Guide Your Design

  1. What page size will you use and why?
  2. How will you serialize nodes to disk?
  3. How will you handle splits and merges safely?
  4. How will you implement prefix and range scans?

Thinking Exercise

The Height Calculation

If your page can hold 100 keys, how many levels do you need for 1 billion keys? What does this imply about lookup I/O?

The Interview Questions They’ll Ask

  1. “Why do databases use B-trees instead of binary search trees?”
  2. “What is the difference between B-tree and B+ tree?”
  3. “How do you keep a B-tree balanced during inserts?”
  4. “How would you implement range scans efficiently?”

Hints in Layers

Hint 1: Page alignment Choose a page size (e.g., 4096 bytes) and design node layout accordingly.

Hint 2: Split policy When a node is full, split and promote the middle key to the parent.

Hint 3: Linked leaves Store a next-leaf pointer to support range scans.

Hint 4: Crash safety Write nodes to new pages before updating parent pointers.
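
A sketch of a fixed-size on-disk node for Hints 1-3; the 4096-byte page, 64-byte keys, and 8-byte child pointers are assumptions, and the fanout falls out of that arithmetic.

#include <stdint.h>

#define PAGE_SIZE   4096
#define KEY_SIZE    64
#define HEADER_SIZE 16

/* Per slot: one fixed-size key plus one 8-byte child/record pointer.
   FANOUT = (4096 - 16) / (64 + 8) = 56 with integer division. */
#define FANOUT ((PAGE_SIZE - HEADER_SIZE) / (KEY_SIZE + sizeof(uint64_t)))

struct btree_page {
    uint16_t is_leaf;                 /* leaf pages point at records        */
    uint16_t nkeys;                   /* number of keys currently stored    */
    uint32_t reserved;
    uint64_t next_leaf;               /* Hint 3: leaf-level sibling link    */
    char     keys[FANOUT][KEY_SIZE];  /* fixed-size keys, kept sorted       */
    uint64_t children[FANOUT];        /* child page numbers or record offsets */
};
/* This struct is 4048 bytes; the remaining bytes pad the page to 4096 on disk. */

Because capacity grows as fanout raised to the height, a fanout in the tens keeps the tree only a handful of levels deep even for very large key counts, which is exactly the point of the height-calculation exercise above.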

Books That Will Help

Topic Book Chapter
B-tree fundamentals Algorithms, Fourth Edition Ch. 6.1
Disk I/O and pages The Linux Programming Interface Ch. 13-14
Storage engines Database Internals Ch. 2-4

Common Pitfalls & Debugging

Problem 1: Tree corruption after crash

  • Why: Parent pointers updated before child page persisted
  • Fix: Write child page first, then parent update (or WAL)
  • Quick test: Crash mid-insert and verify recovery

Problem 2: Range scans slow

  • Why: Leaf nodes not linked
  • Fix: Add leaf-level linked list
  • Quick test: Scan with prefix and measure throughput

Definition of Done

  • Index persists to disk across restarts
  • Inserts keep tree balanced and height stable
  • Range scans return ordered results
  • Benchmark shows predictable lookup time

Project 6: Capstone - Simple Database Engine

  • Main Programming Language: C
  • Alternative Programming Languages: Rust
  • Coolness Level: Level 5
  • Business Potential: Infrastructure product
  • Difficulty: Expert
  • Knowledge Area: Database systems

What you will build: A minimal SQL engine with a B-tree primary index, an LSM-based secondary index, a buffer pool cache, and a WAL for durability.

Real World Outcome

What you will see:

  1. A SQL-like shell that supports CREATE TABLE, INSERT, SELECT.
  2. A persistent database file with a WAL and B-tree pages.
  3. Query plans that show index usage.

Command Line Outcome Example:

$ ./mini_db
> CREATE TABLE users (id INT, name TEXT);
OK
> INSERT INTO users VALUES (1, 'alice');
OK
> INSERT INTO users VALUES (2, 'bob');
OK
> SELECT name FROM users WHERE id = 2;
bob
> .plan SELECT name FROM users WHERE id = 2;
plan: index lookup (btree)

The Core Question You’re Answering

“How do the major systems data structures integrate into a working database?”

Concepts You Must Understand First

  1. B-tree index
    • How does it map keys to pages?
    • Book Reference: Algorithms, Fourth Edition - Ch. 6.1
  2. LSM and WAL
    • Why is append-only logging essential for durability?
    • Book Reference: Designing Data-Intensive Applications - Ch. 3
  3. Buffer pool and LRU
    • How does a cache reduce disk I/O?
    • Book Reference: Operating Systems: Three Easy Pieces - Ch. 17

Questions to Guide Your Design

  1. What is your on-disk page format?
  2. How will you coordinate WAL, memtable, and B-tree updates?
  3. How will you manage buffer pool eviction?
  4. What is your crash recovery procedure?

Thinking Exercise

The Crash Scenario

If the system crashes between writing the WAL and updating the B-tree, what happens on restart? Which state should be considered correct?

The Interview Questions They’ll Ask

  1. “What is the role of the WAL in a database engine?”
  2. “How does a buffer pool reduce disk I/O?”
  3. “Why might a database use both B-trees and LSM trees?”
  4. “How would you design crash recovery?”

Hints in Layers

Hint 1: Start with WAL Log every change before applying it to indexes.

Hint 2: Buffer pool Cache pages in memory and evict with LRU.

Hint 3: Simplify SQL Start with a minimal grammar: CREATE, INSERT, SELECT, WHERE.

Hint 4: Recovery Replay WAL entries on startup before serving queries.
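
A sketch of Hints 1 and 4: each change is appended to the WAL as a length-prefixed record and fsync'd before it is applied, and on startup the log is replayed from the beginning. The record framing and the apply callback are assumptions, not a fixed format.

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Append one record: [4-byte length][payload], then fsync for durability. */
int wal_append(int fd, const void *payload, uint32_t len) {
    if (write(fd, &len, sizeof(len)) != (ssize_t)sizeof(len)) return -1;
    if (write(fd, payload, len) != (ssize_t)len) return -1;
    return fsync(fd);                 /* the record is durable only after this */
}

/* Replay every complete record before serving queries (Hint 4).
   apply() re-executes the change against the in-memory and indexed state. */
int wal_replay(int fd, void (*apply)(const void *payload, uint32_t len)) {
    uint32_t len;
    char buf[4096];
    lseek(fd, 0, SEEK_SET);
    while (read(fd, &len, sizeof(len)) == (ssize_t)sizeof(len)) {
        if (len > sizeof(buf)) return -1;                 /* corrupt or oversized */
        if (read(fd, buf, len) != (ssize_t)len) break;    /* torn tail: stop here */
        apply(buf, len);
    }
    return 0;
}

A real engine also checksums each record so replay can distinguish a torn final write from corruption earlier in the log.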

Books That Will Help

Topic Book Chapter
Storage engines Database Internals Ch. 2-4
WAL and recovery Operating Systems: Three Easy Pieces Ch. 42
Indexes Algorithms, Fourth Edition Ch. 6.1

Common Pitfalls & Debugging

Problem 1: Data corruption after crash

  • Why: WAL not replayed correctly
  • Fix: Ensure WAL is applied before opening the B-tree
  • Quick test: Crash during insert and verify recovery

Problem 2: Buffer pool thrashing

  • Why: Cache size too small or eviction policy broken
  • Fix: Tune size and validate LRU ordering
  • Quick test: Run workload with skewed access

Definition of Done

  • SQL shell supports basic CREATE/INSERT/SELECT
  • WAL ensures durability across crashes
  • B-tree index supports point and range queries
  • Buffer pool reduces disk I/O under load
  • Basic query plan output is correct