

Sprint 4.5 — Data Structures & Algorithmic Tools (Systems View)

Real-World Projects for Learning by Doing

This sprint focuses on why these structures exist in real systems rather than on academic exercises. These projects force you to confront data-structure concepts by building real things.


Core Concept Analysis

This sprint is fundamentally about understanding performance contracts:

| Concept | What You Really Need to Learn |
| --- | --- |
| Arrays | Why they win (cache locality, predictable access) |
| Linked Lists | Why they usually lose, and the few cases they don’t |
| Stacks/Queues | Buffering, flow control, backpressure |
| Hash Tables | Fast lookup, but with hidden traps (resizing, collisions) |
| Trees | Ordered access, range queries, disk-friendly structures |
| Big-O | Intuition for “will this scale?”, not math proofs |

Project 1: In-Memory Cache with LRU Eviction (Mini-Redis)

  • File: SPRINT_4_5_REAL_WORLD_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Data Structures / Performance
  • Software or Tool: Redis / Hash Tables
  • Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A simple key-value cache server that evicts least-recently-used items when memory is full, accessible via TCP.

Why it teaches data structures from a systems view: This project forces you to combine a hash table (O(1) lookups) with a doubly-linked list (O(1) eviction ordering)—one of the few cases where linked lists are actually the right choice. You’ll feel the performance difference between good and bad structure choices.

Core challenges you’ll face:

  • Implementing a hash table with collision handling (maps to hash table mechanics)
  • Understanding why LRU needs a doubly-linked list, not an array (maps to linked structure tradeoffs)
  • Handling hash table resizing without blocking clients (maps to “why resizing is hard”)
  • Managing memory limits and eviction (maps to space tradeoffs)
  • Ring buffer for network I/O (maps to queues and buffers)
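The hash-table-plus-doubly-linked-list pairing can be sketched in a few dozen lines. This is a toy (fixed bucket count, no resizing, fixed-size keys) and all names are illustrative, but it shows why every operation stays O(1): the hash table finds the node, and the doubly-linked list lets you unlink and re-link it without a scan.

```c
#include <stdlib.h>
#include <string.h>

/* Toy LRU: hash-table entries double as nodes in a recency-ordered
 * doubly-linked list. Illustrative sketch, not production code. */
#define NBUCKETS 64

typedef struct Node {
    char key[32];
    int value;
    struct Node *hnext;          /* hash-chain link */
    struct Node *prev, *next;    /* recency-list links */
} Node;

typedef struct {
    Node *bucket[NBUCKETS];
    Node *head, *tail;           /* head = most recent, tail = eviction victim */
    size_t count, capacity;
} LRU;

static unsigned hash(const char *s) {
    unsigned h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

static void list_unlink(LRU *c, Node *n) {
    if (n->prev) n->prev->next = n->next; else c->head = n->next;
    if (n->next) n->next->prev = n->prev; else c->tail = n->prev;
}

static void list_push_front(LRU *c, Node *n) {
    n->prev = NULL; n->next = c->head;
    if (c->head) c->head->prev = n; else c->tail = n;
    c->head = n;
}

Node *lru_get(LRU *c, const char *key) {
    for (Node *n = c->bucket[hash(key)]; n; n = n->hnext)
        if (strcmp(n->key, key) == 0) {  /* O(1) average lookup */
            list_unlink(c, n);           /* O(1) recency bump: this is  */
            list_push_front(c, n);       /* why the list must be doubly */
            return n;                    /* linked                      */
        }
    return NULL;
}

void lru_put(LRU *c, const char *key, int value) {
    Node *n = lru_get(c, key);
    if (n) { n->value = value; return; }
    if (c->count == c->capacity) {       /* evict the tail in O(1) */
        Node *victim = c->tail;
        list_unlink(c, victim);
        Node **pp = &c->bucket[hash(victim->key)];
        while (*pp != victim) pp = &(*pp)->hnext;
        *pp = victim->hnext;
        free(victim);
        c->count--;
    }
    n = calloc(1, sizeof *n);
    strncpy(n->key, key, sizeof n->key - 1);
    n->value = value;
    n->hnext = c->bucket[hash(key)];
    c->bucket[hash(key)] = n;
    list_push_front(c, n);
    c->count++;
}
```

An array-backed recency order would need an O(n) shift on every access; the doubly-linked list makes the bump a pointer swap.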

Key Concepts:

  • Hash Tables (Open Addressing vs Chaining): “Designing Data-Intensive Applications” Ch. 3 - Martin Kleppmann
  • Doubly Linked Lists for LRU: “Algorithms, Fourth Edition” Ch. 1.3 - Sedgewick & Wayne
  • Cache Locality: “Computer Systems: A Programmer’s Perspective” Ch. 6 - Bryant & O’Hallaron
  • Load Factor and Collision Handling: “Mastering Algorithms with C” Ch. 8 - Kyle Loudon

Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: C pointers, basic networking (sockets), memory management

Real world outcome: A working cache server you can telnet into, run SET key value, GET key, and watch the LRU eviction happen when you exceed capacity. You can benchmark it and see actual performance numbers.

Learning milestones:

  1. Hash table working with basic collision handling → you understand why O(1) has asterisks
  2. LRU eviction working → you understand when linked lists are actually good
  3. Resize without blocking → you understand why hash tables are tricky in production
  4. Benchmark shows expected performance → you have Big-O intuition grounded in reality

Project 2: Log-Structured Key-Value Store (Mini LSM-Tree)

  • File: SPRINT_4_5_REAL_WORLD_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Database Storage Engines
  • Software or Tool: LSM Trees
  • Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A persistent key-value store that writes to append-only log segments, maintains an in-memory index (hash table), and compacts segments in the background.

Why it teaches data structures from a systems view: This is how real databases work (LevelDB, RocksDB, Cassandra). You’ll see why arrays (sequential writes) beat random access, how hash tables provide fast lookups, and why trees matter for sorted output during compaction.

Core challenges you’ll face:

  • Append-only log segments using dynamic arrays (maps to arrays & cache locality)
  • In-memory hash table index pointing to disk offsets (maps to hash table mechanics)
  • Sorted String Tables (SSTables) requiring merge operations (maps to algorithmic thinking)
  • Background compaction using sorted merging (maps to trees & algorithmic thinking)
  • Understanding when to use which structure (maps to time vs space tradeoffs)
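The compaction step above boils down to a sorted merge in which the newer segment wins on duplicate keys. A minimal sketch, with illustrative types (integer keys for brevity; real SSTables use byte-string keys and merge many runs at once):

```c
#include <stddef.h>

/* Toy compaction step: merge two sorted runs of (key, value) pairs
 * into one, letting the newer run win ties. Names are illustrative. */
typedef struct { int key; int value; } Entry;

/* Both inputs are sorted by key ascending; 'out' must have room for
 * na + nb entries. Returns the number of entries written. */
size_t merge_runs(const Entry *newer, size_t na,
                  const Entry *older, size_t nb,
                  Entry *out) {
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (newer[i].key < older[j].key)      out[k++] = newer[i++];
        else if (older[j].key < newer[i].key) out[k++] = older[j++];
        else { out[k++] = newer[i++]; j++; }  /* duplicate: newer wins */
    }
    while (i < na) out[k++] = newer[i++];     /* drain the leftovers */
    while (j < nb) out[k++] = older[j++];
    return k;
}
```

Because both runs are sorted, one linear pass produces a sorted, deduplicated output; this is why SSTables keep keys sorted even though the in-memory index is a hash table.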

Key Concepts:

  • Log-Structured Storage: “Designing Data-Intensive Applications” Ch. 3 - Martin Kleppmann
  • Dynamic Arrays & Growth Strategies: “C Interfaces and Implementations” Ch. 9 - David Hanson
  • Hash Tables for Indexing: “The Art of Computer Programming Vol. 3” Ch. 6.4 - Donald Knuth
  • Merge Algorithms: “Algorithms, Fourth Edition” Ch. 2.2 - Sedgewick & Wayne
  • Why Sequential I/O Wins: “Computer Systems: A Programmer’s Perspective” Ch. 6 - Bryant & O’Hallaron

Difficulty: Intermediate-Advanced
Time estimate: 2-3 weeks
Prerequisites: File I/O, hash tables basics, C memory management

Real world outcome: A persistent database that survives restarts. You can PUT/GET millions of keys, watch disk usage, trigger compaction, and measure read/write throughput. You’ll understand why LevelDB exists.

Learning milestones:

  1. Basic log + hash index working → you understand why append-only is powerful
  2. Multiple segments with in-memory switching → you understand growth strategies
  3. SSTable compaction working → you understand why sorted structures matter for merging
  4. Benchmark shows write-optimized behavior → you have intuition for LSM vs B-tree tradeoffs

Project 3: Network Packet Buffer / Traffic Shaper

  • File: SPRINT_4_5_REAL_WORLD_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Networking / Systems Programming
  • Software or Tool: Ring Buffers
  • Main Book: “The Linux Programming Interface” by Michael Kerrisk

What you’ll build: A userspace network buffer that sits between input and output, implementing a ring buffer with backpressure, rate limiting, and priority queues.

Why it teaches data structures from a systems view: Real network systems live and die by buffer management. You’ll implement ring buffers (the most important queue structure in systems), understand backpressure viscerally, and see why arrays with clever indexing beat linked lists.

Core challenges you’ll face:

  • Lock-free ring buffer with wrap-around arithmetic (maps to ring buffers)
  • Backpressure signaling when buffer fills (maps to backpressure intuition)
  • Priority queue for different traffic classes (maps to trees/heaps)
  • Memory pool using arrays instead of malloc per packet (maps to arrays & cache locality)
  • Understanding FIFO vs priority ordering (maps to stacks & queues)
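The wrap-around arithmetic can be sketched as follows: with a power-of-two capacity, the head and tail indices grow monotonically and a bitmask replaces the modulo. This single-producer/single-consumer toy omits the atomics a genuinely lock-free version needs; names are illustrative.

```c
#include <stddef.h>
#include <stdbool.h>

/* SPSC ring buffer sketch. Capacity is a power of two, so wrap-around
 * is a cheap mask. Monotonic indices make "used" a simple subtraction
 * that is correct even across unsigned overflow. */
#define RING_CAP 8              /* must be a power of two */

typedef struct {
    unsigned char buf[RING_CAP];
    size_t head;                /* total bytes written (monotonic) */
    size_t tail;                /* total bytes read (monotonic)    */
} Ring;

static size_t ring_used(const Ring *r) { return r->head - r->tail; }

bool ring_push(Ring *r, unsigned char byte) {
    if (ring_used(r) == RING_CAP)
        return false;           /* full: caller must apply backpressure */
    r->buf[r->head & (RING_CAP - 1)] = byte;
    r->head++;
    return true;
}

bool ring_pop(Ring *r, unsigned char *out) {
    if (ring_used(r) == 0)
        return false;           /* empty */
    *out = r->buf[r->tail & (RING_CAP - 1)];
    r->tail++;
    return true;
}
```

The `false` return on a full buffer is the backpressure signal: the producer must slow down or drop, which is exactly the decision a traffic shaper exists to make.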

Key Concepts:

  • Ring Buffers: “The Linux Programming Interface” Ch. 44 - Michael Kerrisk
  • FIFO vs LIFO Behavior: “Data Structures the Fun Way” Ch. 4 - Jeremy Kubica
  • Priority Queues (Heaps): “Algorithms, Fourth Edition” Ch. 2.4 - Sedgewick & Wayne
  • Cache-Friendly Memory Pools: “Game Programming Patterns” - Robert Nystrom (online, Data Locality chapter)
  • Backpressure in Systems: “Designing Data-Intensive Applications” Ch. 11 - Martin Kleppmann

Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: C, basic understanding of networking concepts, pointers

Real world outcome: A traffic shaper you can pipe data through. Send bursts of traffic, watch it smooth out. Fill the buffer, see backpressure kick in. Prioritize certain traffic, watch it jump the queue. Real metrics, real behavior.

Learning milestones:

  1. Basic ring buffer working → you understand why circular arrays are everywhere in systems
  2. Backpressure working → you understand flow control intuitively
  3. Priority queue integrated → you understand when trees/heaps are necessary
  4. Memory pool eliminates malloc overhead → you understand cache locality wins

Project 4: Simple Memory Allocator with Debugging

  • File: SPRINT_4_5_REAL_WORLD_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Memory Management
  • Software or Tool: Malloc Implementation
  • Main Book: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron

What you’ll build: A custom malloc/free implementation that tracks allocations, detects leaks/double-frees, and visualizes heap fragmentation.

Why it teaches data structures from a systems view: Memory allocators are where data structure choices have the most visible performance impact. You’ll use free lists (where linked lists shine), understand why arrays of size classes exist, see hash tables for tracking, and viscerally feel fragmentation.

Core challenges you’ll face:

  • Free list implementation using intrusive linked lists (maps to linked structures - where appropriate)
  • Size class segregation using arrays of free lists (maps to arrays & growth strategies)
  • Allocation tracking using hash table (maps to hash tables)
  • Coalescing adjacent free blocks (maps to algorithmic thinking)
  • Visualizing heap layout (maps to understanding memory-first thinking)
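The free-list idea can be sketched over a static arena of fixed-size blocks. The list nodes live inside the free blocks themselves, which is what “intrusive” means: the bookkeeping costs zero extra memory. Splitting and coalescing are omitted, and all names are illustrative.

```c
#include <stddef.h>

/* Intrusive free list over a static arena of fixed-size blocks.
 * Each free block stores the "next" pointer in its own first bytes. */
#define ARENA_SIZE 4096
#define BLOCK_SIZE 64
#define NBLOCKS (ARENA_SIZE / BLOCK_SIZE)

typedef struct FreeBlock { struct FreeBlock *next; } FreeBlock;

static unsigned char arena[ARENA_SIZE];
static FreeBlock *free_head;

void pool_init(void) {
    free_head = NULL;
    for (int i = NBLOCKS - 1; i >= 0; i--) {   /* thread every block */
        FreeBlock *b = (FreeBlock *)&arena[i * BLOCK_SIZE];
        b->next = free_head;
        free_head = b;
    }
}

void *pool_alloc(void) {
    if (!free_head) return NULL;   /* arena exhausted */
    FreeBlock *b = free_head;      /* O(1) pop from the list head */
    free_head = b->next;
    return b;
}

void pool_free(void *p) {
    FreeBlock *b = p;              /* the block becomes a list node again */
    b->next = free_head;
    free_head = b;
}
```

A real allocator generalizes this with one free list per size class and headers that enable coalescing, but the core trick, reusing the free memory itself as the data structure, is the same.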

Key Concepts:

  • Free Lists (Intrusive Linked Lists): “Operating Systems: Three Easy Pieces” Ch. 17 - Arpaci-Dusseau
  • Memory Allocator Design: “Computer Systems: A Programmer’s Perspective” Ch. 9.9 - Bryant & O’Hallaron
  • Hash Tables for Tracking: “Mastering Algorithms with C” Ch. 8 - Kyle Loudon
  • Fragmentation & Coalescing: “The Linux Programming Interface” Ch. 7 - Michael Kerrisk

Difficulty: Intermediate-Advanced
Time estimate: 2 weeks
Prerequisites: Pointers, memory layout understanding, basic C

Real world outcome: A drop-in malloc replacement. Run real programs with it, see allocation patterns visualized, catch memory bugs automatically. When you find a double-free in your own code using your own allocator, you’ve learned something.

Learning milestones:

  1. Basic malloc/free working → you understand free lists
  2. Size classes reduce fragmentation → you understand why jemalloc/tcmalloc exist
  3. Leak detection working → you understand allocation tracking with hash tables
  4. Visualization shows fragmentation patterns → you understand memory-first thinking

Project 5: B-Tree File Index (Mini Filesystem Index)

  • File: SPRINT_4_5_REAL_WORLD_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Data Structures / File Systems
  • Software or Tool: B-Trees
  • Main Book: “Database Internals” by Alex Petrov

What you’ll build: A disk-based B-tree that indexes files by name, supporting range queries (“all files starting with ‘log_’”), and survives restarts.

Why it teaches data structures from a systems view: B-trees are why databases and filesystems work. You’ll understand why binary trees are terrible for disk, why B-trees exist, and why “balanced” matters for predictable performance. This is the tree structure that actually matters in systems.

Core challenges you’ll face:

  • Node structure fitting disk pages/blocks (maps to cache locality / disk I/O)
  • Splitting and merging nodes during insert/delete (maps to balanced trees conceptually)
  • Range queries via in-order traversal (maps to why trees matter for ordered data)
  • Understanding why high fanout beats binary (maps to logarithmic intuition)
  • Persistence and crash recovery (maps to real-world tree usage)
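The first challenge can be sketched as arithmetic: size the node so it fills exactly one disk page, and the page size dictates the fanout. The 4 KiB page and fixed-width keys below are illustrative assumptions.

```c
#include <stdint.h>

/* A B-tree node sized to one disk page. The key count per node (the
 * fanout) is what keeps the tree shallow: with ~128 children per
 * node, three levels already index about two million keys. */
#define PAGE_SIZE   4096
#define KEY_SIZE    24                 /* fixed-width keys for simplicity */
#define HEADER_SIZE 8                  /* nkeys + is_leaf */

/* Each key slot pairs with a 64-bit child page number; the one extra
 * child pointer fits in the slack left by the integer division. */
#define MAX_KEYS ((PAGE_SIZE - HEADER_SIZE) / (KEY_SIZE + sizeof(uint64_t)))

typedef struct {
    uint32_t nkeys;                    /* keys currently stored */
    uint32_t is_leaf;
    char     keys[MAX_KEYS][KEY_SIZE];
    uint64_t children[MAX_KEYS + 1];   /* n keys separate n+1 subtrees */
} BTreeNode;
```

Contrast with a binary tree: finding one key among a million costs ~20 node visits (20 disk reads) at fanout 2, but only 3 at fanout 127. That is the entire reason B-trees exist.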

Key Concepts:

  • Why B-Trees Exist: “Designing Data-Intensive Applications” Ch. 3 - Martin Kleppmann
  • B-Tree Implementation: “Algorithms, Fourth Edition” Ch. 6.1 - Sedgewick & Wayne
  • Disk I/O and Page-Oriented Design: “Database Internals” Ch. 2-4 - Alex Petrov
  • Balanced Trees Conceptually: “Introduction to Algorithms” Ch. 18 - CLRS (skim for intuition)

Difficulty: Advanced
Time estimate: 2-3 weeks
Prerequisites: Trees basics, file I/O, understanding of disk vs memory

Real world outcome: A persistent index you can query. Insert a million file names, query by prefix, watch the tree depth stay small. Restart the program, data is still there. This is how locate, database indexes, and filesystems actually work.

Learning milestones:

  1. In-memory B-tree working → you understand why high fanout matters
  2. Disk persistence working → you understand page-oriented design
  3. Range queries working → you understand why trees beat hash tables for ordered data
  4. Performance stays logarithmic at scale → you have real Big-O intuition

Project Comparison Table

| Project | Difficulty | Time | Structures Covered | Fun Factor |
| --- | --- | --- | --- | --- |
| Mini-Redis Cache | Intermediate | 1-2 weeks | Hash tables, linked lists, ring buffers | ⭐⭐⭐⭐⭐ |
| LSM Key-Value Store | Int-Advanced | 2-3 weeks | Arrays, hash tables, merge algorithms | ⭐⭐⭐⭐ |
| Network Packet Buffer | Intermediate | 1-2 weeks | Ring buffers, priority queues, arrays | ⭐⭐⭐⭐ |
| Memory Allocator | Int-Advanced | 2 weeks | Free lists, hash tables, arrays | ⭐⭐⭐⭐⭐ |
| B-Tree File Index | Advanced | 2-3 weeks | Trees, disk-oriented design | ⭐⭐⭐ |

Recommendation

Start with Project 1: Mini-Redis Cache (LRU)

Here’s why:

  1. Immediate feedback: You can interact with it via telnet/netcat instantly
  2. Forces the right question: “When ARE linked lists good?” (LRU is a canonical answer)
  3. Hash table depth: You’ll implement collision handling, resizing, load factor management
  4. Systems-relevant: Every web system uses caches; you’ll understand Redis/Memcached internals
  5. Natural progression: After this, the LSM store or B-tree makes more sense

If you want more emphasis on arrays and cache locality, start with Project 3 (Network Packet Buffer) instead—ring buffers are pure array mastery.

If you want to go deep on memory, Project 4 (Memory Allocator) will teach you more about memory layout than anything else, and intrusive linked lists are the one place where linked structures truly shine in systems code.


Final Capstone Project: Build a Simple Database Engine

What you’ll build: A complete (simple) database engine combining everything: a B-tree for the primary index, an LSM-tree for secondary indexes, a buffer pool using LRU eviction, a write-ahead log using ring buffers, and a query executor with basic SQL parsing.

Why it teaches everything together: A database is where all data structures meet reality. You’ll make tradeoffs between structures, understand why real databases use specific structures for specific tasks, and see how everything interacts under memory pressure and concurrent access.

Core challenges you’ll face:

  • Buffer pool with LRU eviction (hash table + linked list from Project 1)
  • Write-ahead log using ring buffer (from Project 3)
  • Primary index using B-tree (from Project 5)
  • Secondary indexes using hash tables or LSM (from Projects 1, 2)
  • Query planning requiring algorithmic thinking (Big-O intuition throughout)
  • Memory management for pages (from Project 4)
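The write-ahead log at the heart of the capstone can be sketched as length-prefixed records flushed to disk before the change is applied in memory. The record format and names are illustrative assumptions; a real WAL adds checksums and calls fsync(), not just fflush().

```c
#include <stdio.h>
#include <string.h>

/* Minimal WAL sketch: append a length-prefixed record and flush it
 * before the in-memory structures are modified. On restart, replaying
 * the records rebuilds the state -- that is the durability contract. */
int wal_append(FILE *log, const void *rec, unsigned len) {
    if (fwrite(&len, sizeof len, 1, log) != 1) return -1;
    if (fwrite(rec, 1, len, log) != len) return -1;
    if (fflush(log) != 0) return -1;   /* real code would also fsync() */
    return 0;
}

/* Read back one record during recovery; returns its length,
 * or 0 at end of log (or on a truncated/oversized record). */
unsigned wal_read(FILE *log, void *buf, unsigned cap) {
    unsigned len;
    if (fread(&len, sizeof len, 1, log) != 1) return 0;
    if (len > cap) return 0;
    if (fread(buf, 1, len, log) != len) return 0;
    return len;
}
```

Note the data structure choice: the log is pure sequential append (an array on disk), which is why WAL writes are fast even though the B-tree pages they protect live at random offsets.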

Key Concepts:

  • Database Internals Overview: “Database Internals” - Alex Petrov
  • Query Processing: “Designing Data-Intensive Applications” Ch. 3, 6 - Martin Kleppmann
  • Buffer Pool Management: “Database Management Systems” Ch. 9 - Ramakrishnan & Gehrke
  • Write-Ahead Logging: “Operating Systems: Three Easy Pieces” Ch. 42 - Arpaci-Dusseau

Difficulty: Advanced
Time estimate: 1 month+
Prerequisites: Completing 2-3 projects above

Real world outcome: A working database you can query with SQL. CREATE TABLE, INSERT, SELECT, WHERE, ORDER BY. Watch it persist to disk, survive crashes, handle thousands of rows efficiently. You’ll understand PostgreSQL, SQLite, and MySQL at a structural level.

Learning milestones:

  1. Storage engine working (B-tree + WAL) → you understand durability
  2. Buffer pool with eviction working → you understand memory management in databases
  3. Basic query execution working → you understand how SQL becomes data structure operations
  4. Performance scales predictably → you’ve internalized data structure performance contracts

Summary

The key insight of this sprint: you need to understand these structures mechanically and know when to use them, not implement red-black tree rotations from memory. These projects build that practical intuition by forcing you to make real tradeoff decisions with real, measurable consequences.

Quick Reference: Which Structure When?

| Need | Best Structure | Why |
| --- | --- | --- |
| Fast lookup by key | Hash table | O(1) average, but watch load factor |
| Ordered iteration / range queries | B-tree / BST | O(log n) and maintains order |
| Fast insert/delete at ends | Ring buffer | O(1), cache-friendly |
| LRU ordering with fast access | Hash table + doubly-linked list | O(1) for both operations |
| Memory pools | Array of fixed-size blocks | Cache locality, no fragmentation |
| Free block tracking | Intrusive linked list | No extra allocation, coalescing |
| Sequential writes | Append-only log (array) | Cache-friendly, disk-friendly |