Project 13: Database Storage Engine (B-Trees and Transactions)

A persistent key-value storage engine with B-tree indexing, write-ahead logging, and ACID transactions—the core of databases like SQLite or LevelDB.

Quick Reference

Attribute Value
Primary Language Rust
Alternative Languages C, C++, Go
Difficulty Level 4: Expert
Time Estimate 1 month+
Knowledge Area Database Internals / B-Trees / ACID
Tooling File I/O, memory mapping
Prerequisites Projects 1-4 completed, understanding of data structures

What You Will Build

A persistent key-value storage engine with B-tree indexing, write-ahead logging, and ACID transactions—the core of databases like SQLite or LevelDB.

Why It Matters

This project builds core skills that appear repeatedly in real-world systems and tooling.

Core Challenges

  • Implementing B-tree node splitting and merging → maps to complex data structure manipulation
  • Write-ahead logging for crash recovery → maps to file I/O and fsync
  • Page-level locking for concurrency → maps to RwLock and ownership
  • Memory-mapped I/O for performance → maps to unsafe and raw pointers

Key Concepts

  • B-tree algorithms: “Algorithms, Fourth Edition” Chapter 6 - Sedgewick
  • Write-ahead logging: “Designing Data-Intensive Applications” Chapter 7 - Kleppmann
  • ACID properties: “Designing Data-Intensive Applications” Chapter 7 - Kleppmann
  • Memory-mapped files: “The Linux Programming Interface” Chapter 49 - Kerrisk

Real-World Outcome

$ cargo run --release
🗄️ RustKV - Embedded Storage Engine
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

> put user:1 {"name": "Alice", "age": 30}
OK (0.1ms)

> put user:2 {"name": "Bob", "age": 25}
OK (0.08ms)

> get user:1
{"name": "Alice", "age": 30} (0.02ms)

> BEGIN
Transaction started

> put user:1 {"name": "Alice", "age": 31}
OK (staged)

> ROLLBACK
Transaction rolled back

> get user:1
{"name": "Alice", "age": 30}  # Unchanged!

Benchmark:
  Random writes: 150,000 ops/sec
  Random reads:  450,000 ops/sec
  Range scans:   2.1M keys/sec

  Crash recovery test: PASSED
  (Killed process during write, all committed data recovered)

Implementation Guide

  1. Reproduce the simplest happy-path scenario.
  2. Build the smallest working version of the core feature.
  3. Add input validation and error handling.
  4. Add instrumentation/logging to confirm behavior.
  5. Refactor into clean modules with tests.

Milestones

  • Milestone 1: Minimal working program that runs end-to-end.
  • Milestone 2: Correct outputs for typical inputs.
  • Milestone 3: Robust handling of edge cases.
  • Milestone 4: Clean structure and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs

References

  • Main guide: LEARN_RUST_DEEP_DIVE.md
  • “Designing Data-Intensive Applications” by Martin Kleppmann