

Systems Libraries & Runtimes — Project-Based Learning Path

Phase 2 — Advanced Systems Track B

This track is about building the infrastructure that other software depends on. You’ll learn to write code that’s fast, correct across platforms, and doesn’t invoke undefined behavior.


Core Concept Analysis

The topics in this track break down into these fundamental building blocks:

| Concept | What You Must Understand |
| --- | --- |
| Memory Allocators | Free lists, fragmentation, arena vs general-purpose, metadata overhead |
| Threading Primitives | Mutexes, atomics, memory ordering, lock-free algorithms |
| Async Runtimes | Event loops, futures/promises, IO multiplexing (epoll/kqueue), schedulers |
| ABI Details | Calling conventions, struct layout, symbol visibility, name mangling |
| Platform Differences | POSIX vs Windows syscalls, endianness, feature detection |
| Performance Tuning | Cache lines, branch prediction, SIMD, profiling |
| Undefined Behavior | Strict aliasing, alignment, integer overflow, pointer provenance |
| API Design | Ergonomics vs zero-cost, error handling, versioning |

Project 1: Custom Memory Allocator

  • File: PHASE_2_TRACK_B_SYSTEMS_LIBRARIES_PROJECTS.md
  • Main Programming Language: C
  • Alternative Programming Languages: C++, Rust, Zig
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold” (Educational/Personal Brand)
  • Difficulty: Level 3: Advanced (The Engineer)
  • Knowledge Area: Memory Management, Systems Programming
  • Software or Tool: glibc, jemalloc, valgrind
  • Main Book: “C Interfaces and Implementations” - David Hanson

Warm-up (the HFT angle): before tackling the general-purpose allocator below, build a specialized arena/pool allocator that eliminates malloc from your hot path - because malloc is a latency killer. HFT systems never call malloc in the critical path; this warm-up shows why general-purpose allocators are slow and how to design allocation strategies for specific use cases. Its core challenges:

  • Pool allocator design: Fixed-size blocks with O(1) alloc/free (teaches memory management fundamentals)
  • Arena allocator: Bump allocation with bulk free (teaches allocation patterns)
  • Thread-local pools: Avoiding false sharing in multi-threaded allocators (teaches TLS, cache lines)

What you’ll build: A general-purpose memory allocator (malloc/free/realloc) that can replace the system allocator in real programs.

Why it teaches memory allocators: You cannot fake your way through this. You’ll face fragmentation, understand why jemalloc uses size classes, learn why metadata placement matters, and discover that “fast” and “memory-efficient” are often at odds.

Core challenges you’ll face:

  • Designing free list structures (maps to fragmentation strategies)
  • Handling alignment requirements for different types (maps to ABI/alignment)
  • Implementing coalescing without destroying performance (maps to performance tuning)
  • Making it thread-safe without killing scalability (maps to threading primitives)
  • Beating glibc malloc in at least one benchmark (maps to real-world validation)

Key Concepts:

  • Free list management: “C Interfaces and Implementations” by David Hanson - Chapter 5 (Arena) and Chapter 6 (Mem)
  • Size classes and binning: jemalloc design doc - Jason Evans’ “A Scalable Concurrent malloc Implementation”
  • Thread-local caching: “Hoard: A Scalable Memory Allocator” - Emery Berger paper
  • Fragmentation analysis: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Chapter 9

Difficulty: Advanced. Time estimate: 3-4 weeks. Prerequisites: Solid C, understanding of virtual memory basics.

Real world outcome:

  • Your allocator will be LD_PRELOAD-able into real programs
  • You’ll have benchmarks showing throughput (allocations/sec) and memory efficiency vs glibc/jemalloc
  • You can literally run LD_PRELOAD=./myalloc.so ls and see it work

Learning milestones:

  1. Basic allocator working - You understand why malloc needs metadata and how free lists work
  2. Thread-safe version - You grasp why naive locking destroys performance and why per-thread caches exist
  3. Competitive benchmarks - You’ve internalized the tradeoffs between speed, fragmentation, and memory overhead

Project 2: Work-Stealing Thread Pool

  • File: LEARN_CPP_CONCURRENCY_AND_PARALLELISM.md
  • Main Programming Language: C++
  • Alternative Programming Languages: Rust, Go, Java
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Thread Pool Design / Work Scheduling
  • Software or Tool: High-Performance Thread Pool Library

What you’ll build: A thread pool with work-stealing scheduling, similar to Rayon’s or Go’s runtime scheduler.

Why it teaches threading primitives: You’ll implement mutexes, condition variables, and atomic operations from scratch (or use them correctly). Work-stealing forces you to understand cache coherency, false sharing, and memory ordering—you can’t just slap a lock on everything.

Core challenges you’ll face:

  • Implementing lock-free deques for each worker (maps to atomics and memory ordering)
  • Avoiding false sharing between worker threads (maps to cache-line awareness)
  • Balancing work without excessive stealing overhead (maps to performance tuning)
  • Handling thread parking/unparking efficiently (maps to threading primitives)
  • Making the API ergonomic for parallel iterators (maps to API design)

Key Concepts:

  • Memory ordering and atomics: “Rust Atomics and Locks” by Mara Bos - Chapters 1-3 (even if writing in C, this is the clearest explanation)
  • Work-stealing algorithm: “Scheduling Multithreaded Computations by Work Stealing” - Blumofe & Leiserson paper
  • Lock-free deque: Chase-Lev deque paper - “Dynamic Circular Work-Stealing Deque”
  • False sharing: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Chapter 6 (Cache)

Difficulty: Advanced. Time estimate: 2-3 weeks. Prerequisites: Basic threading, atomics concepts.

Real world outcome:

  • A library that parallelizes embarrassingly parallel workloads
  • Run ./threadpool_demo --threads 8 and see a Mandelbrot set render 7.5x faster than single-threaded
  • Benchmark output showing near-linear scaling with core count

Learning milestones:

  1. Basic thread pool working - You understand condition variables and worker loops
  2. Work-stealing implemented - You grasp why memory ordering matters and have debugged a race condition
  3. Near-linear scaling achieved - You’ve eliminated false sharing and understand cache-aware programming

Project 3: Mini Async Runtime

  • File: PHASE_2_TRACK_B_SYSTEMS_LIBRARIES_PROJECTS.md
  • Programming Language: Rust or C
  • Coolness Level: Level 5: Pure Magic (Super Cool)
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 5: Master
  • Knowledge Area: Asynchronous I/O / Runtime Design
  • Software or Tool: Epoll / Kqueue / Futures
  • Main Book: “Asynchronous Programming in Rust” (or libuv documentation)

What you’ll build: A single-threaded async runtime that can serve HTTP requests, similar to a simplified Tokio or libuv.

Why it teaches async runtimes: Async is magic until you build the event loop yourself. You’ll implement futures, understand why poll vs push matters, and see exactly how epoll/kqueue enables thousands of concurrent connections without threads.

Core challenges you’ll face:

  • Implementing an event loop with epoll/kqueue (maps to platform differences)
  • Building a future/task abstraction (maps to API design)
  • Managing wakers and the reactor pattern (maps to async internals)
  • Non-blocking IO without burning CPU (maps to performance tuning)
  • Supporting timers and cancellation (maps to real-world completeness)

Key Concepts:

  • Event loops and IO multiplexing: “The Linux Programming Interface” by Michael Kerrisk - Chapter 63 (Alternative I/O Models)
  • Future/Promise patterns: Tokio tutorial’s “Async in Depth” section
  • Reactor pattern: “libuv Design Overview” documentation
  • Platform IO differences: “Advanced Programming in the UNIX Environment” by Stevens & Rago - Chapter 14

Resources for epoll/kqueue abstraction challenge:

  • “Epoll is fundamentally broken” by Marek Majkowski (Cloudflare) - Understanding edge vs level triggering pitfalls

Difficulty: Master. Time estimate: 2-3 weeks. Prerequisites: Sockets basics, understanding of file descriptors.

Real world outcome:

  • A working HTTP server handling 10,000 concurrent connections on a single thread
  • wrk -c 1000 http://localhost:8080/ showing impressive throughput
  • Watch memory usage stay flat as connection count grows (unlike thread-per-connection)

Learning milestones:

  1. Echo server with epoll - You understand the event loop and non-blocking IO
  2. Task/future abstraction working - You grasp how wakers notify the runtime
  3. HTTP server benchmarked - You’ve seen why async enables C10K and understand the tradeoffs

Project 4: Cross-Platform Syscall Abstraction Library

  • File: PHASE_2_TRACK_B_SYSTEMS_LIBRARIES_PROJECTS.md
  • Programming Language: C
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Systems Programming / Portability
  • Software or Tool: POSIX / Win32 API
  • Main Book: “Advanced Programming in the UNIX Environment” by Stevens & Rago

What you’ll build: A library that wraps platform-specific syscalls (file operations, networking, process control) into a unified API, similar to libuv’s uv_fs_* or Rust’s std.

Why it teaches ABI and platform differences: You’ll discover that “POSIX” doesn’t mean “identical.” struct layouts differ, error codes differ, syscall numbers differ. You’ll fight with calling conventions and learn why #ifdef hell exists.

Core challenges you’ll face:

  • Abstracting file operations across Linux/macOS/Windows (maps to platform differences)
  • Handling struct layout differences (maps to ABI details)
  • Defining stable API versioning (maps to API design)
  • Avoiding undefined behavior in type punning (maps to UB avoidance)
  • Writing comprehensive test suites (maps to real-world robustness)

Key Concepts:

  • POSIX variations: “Advanced Programming in the UNIX Environment” by Stevens & Rago - Throughout (notes platform differences)
  • ABI and calling conventions: “Computer Systems: A Programmer’s Perspective” by Bryant & O’Hallaron - Chapter 3 (Machine-Level Representation)
  • Undefined behavior in C: “Effective C” by Robert Seacord - Chapters on UB
  • API design principles: “C Interfaces and Implementations” by David Hanson - Introduction and design philosophy

Difficulty: Advanced. Time estimate: 2 weeks. Prerequisites: C, basic familiarity with at least 2 OSes.

Real world outcome:

  • A header-only library that compiles the same code on Linux, macOS, and (optionally) Windows
  • A demo program that lists directory contents, reads files, and spawns processes—same code, all platforms
  • CI pipeline showing green builds on all target platforms

Learning milestones:

  1. File ops abstracted - You understand why stat differs between platforms
  2. Process spawning unified - You grasp fork/exec vs CreateProcess differences
  3. Library is usable - Someone else can #include your header and write portable code

Project 5: High-Performance String Search Library

  • File: PHASE_2_TRACK_B_SYSTEMS_LIBRARIES_PROJECTS.md
  • Programming Language: C (with SIMD Intrinsics)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Algorithms / Low-Level Optimization
  • Software or Tool: SIMD / Vectorization
  • Main Book: “Modern X86 Assembly Language Programming” by Daniel Kusswurm

What you’ll build: A fast substring search library with SIMD acceleration, similar to what powers ripgrep’s core.

Why it teaches performance tuning: You’ll learn that the algorithms textbooks teach (KMP, Boyer-Moore) aren’t what fast tools use. You’ll discover SIMD, understand why memory access patterns matter more than Big-O on real data, and learn to profile before optimizing.

Core challenges you’ll face:

  • Implementing SIMD-accelerated search (maps to performance tuning)
  • Handling different string encodings (UTF-8 awareness) (maps to real-world correctness)
  • Designing an API that’s both fast and safe (maps to API design)
  • Avoiding undefined behavior with pointer arithmetic (maps to UB avoidance)
  • Beating strstr and memmem in benchmarks (maps to measurable outcome)

Key Concepts:

  • SIMD fundamentals: “Modern X86 Assembly Language Programming” by Daniel Kusswurm - SIMD chapters
  • Fast string search algorithms: Andrew Gallant’s (BurntSushi) blog posts on ripgrep’s internals
  • Cache-aware programming: “What Every Programmer Should Know About Memory” by Ulrich Drepper
  • UTF-8 handling: “UTF-8 Everywhere” manifesto and ripgrep’s encoding handling

Resources for SIMD string search challenge:

  • “Hyperscan” Intel paper - How regex engines use SIMD for literal matching

Difficulty: Expert. Time estimate: 2-3 weeks. Prerequisites: C, basic understanding of CPU architecture.

Real world outcome:

  • A command-line tool: ./fastsearch "pattern" largefile.txt
  • Benchmarks showing 3-10x speedup over naive search on large files
  • Flame graphs showing where time is actually spent

Learning milestones:

  1. Basic SIMD search working - You understand vector instructions and intrinsics
  2. UTF-8 handled correctly - You grasp why byte-level search needs encoding awareness
  3. Consistently faster than stdlib - You’ve profiled, found bottlenecks, and optimized the right things

Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor | Most Teaches |
| --- | --- | --- | --- | --- | --- |
| Memory Allocator | Advanced | 3-4 weeks | ★★★★★ | ★★★ | Memory, UB, Performance |
| Work-Stealing Thread Pool | Advanced | 2-3 weeks | ★★★★★ | ★★★★ | Threading, Atomics, Cache |
| Mini Async Runtime | Master | 2-3 weeks | ★★★★ | ★★★★★ | Async, Platform IO, API |
| Syscall Abstraction | Advanced | 2 weeks | ★★★ | ★★★ | ABI, Platform, API |
| String Search Library | Expert | 2-3 weeks | ★★★★ | ★★★★ | SIMD, Performance, Profiling |

Recommended Order

Given this track’s strong job market and the hiring signal that visible open-source work carries, here’s the recommended order:

Start with: Mini Async Runtime

  • High demand skill (Tokio, libuv, Node.js internals)
  • You’ll build something visibly impressive (C10K server)
  • Teaches platform differences naturally (epoll vs kqueue)
  • Foundation for understanding modern networking stacks

Then: Work-Stealing Thread Pool

  • Complements async knowledge (CPU-bound vs IO-bound)
  • Threading + atomics is interview gold for systems roles
  • Shows you understand parallelism at a deep level

Finally: Memory Allocator

  • The “boss fight” of systems programming
  • Forces you to synthesize everything: performance, correctness, threading
  • Having “wrote a malloc” on your resume/portfolio is a strong signal

Final Capstone Project: Embedded Key-Value Database

What you’ll build: A persistent, thread-safe, embedded key-value store like a simplified RocksDB or LMDB—combining everything from above.

Why it teaches everything: This project is the “final boss” because it requires:

  • Memory allocators: Custom allocators for the buffer pool
  • Threading: Concurrent readers/writers with proper synchronization
  • Async: Optional async API for non-blocking operations
  • ABI: Stable on-disk format across versions
  • Platform: Memory-mapped files work differently on each OS
  • Performance: You’ll profile compaction, cache hit rates, write amplification
  • UB avoidance: Memory-mapped IO is a UB minefield
  • API design: Making it usable for application developers

Core challenges you’ll face:

  • Implementing a log-structured merge tree or B-tree (maps to data structure design)
  • Memory-mapping files safely across platforms (maps to platform differences)
  • Write-ahead logging for durability (maps to crash consistency)
  • Concurrent access without global locks (maps to threading primitives)
  • Compaction without blocking readers (maps to async/background work)
  • Benchmarking against SQLite/RocksDB (maps to performance tuning)

Key Concepts:

  • LSM trees: “Designing Data-Intensive Applications” by Martin Kleppmann - Chapter 3
  • Memory-mapped IO: “The Linux Programming Interface” by Kerrisk - Chapter 49
  • Crash consistency: “Operating Systems: Three Easy Pieces” - Crash Consistency chapter
  • B-tree implementation: “Algorithms” by Sedgewick & Wayne - Chapter 6
  • Concurrent data structures: “The Art of Multiprocessor Programming” by Herlihy & Shavit

Difficulty: Expert. Time estimate: 1-2 months. Prerequisites: All previous projects, or equivalent experience.

Real world outcome:

  • A library: db_open(), db_get(), db_put(), db_close()
  • A benchmark suite comparing ops/sec against SQLite and RocksDB
  • A crash test that kills the process mid-write and verifies data integrity on restart
  • Potentially: others using it in their projects (OSS signal)

Learning milestones:

  1. Basic persistence working - You understand write-ahead logging and fsync semantics
  2. Concurrent access safe - You’ve implemented MVCC or reader-writer locks correctly
  3. Performance competitive - You’ve profiled, optimized, and understand why RocksDB makes certain tradeoffs
  4. Crash-safe verified - You can kill -9 your DB at any point and recover correctly

Summary

This track will transform you from someone who uses systems libraries to someone who builds them. The projects above give you real artifacts to show employers and the deep understanding to discuss implementation tradeoffs in interviews.

Projects in this track align with real-world tools:

  • Memory Allocator → jemalloc, tcmalloc, mimalloc
  • Thread Pool → Rayon, Go runtime, Java ForkJoinPool
  • Async Runtime → Tokio, libuv, io_uring
  • Syscall Abstraction → libuv, Rust std, Go runtime
  • String Search → ripgrep, hyperscan, stringzilla
  • Key-Value Store → RocksDB, LMDB, LevelDB