Project 13: Database Storage Engine (B-Trees and Transactions)

A persistent key-value storage engine with B-tree indexing, write-ahead logging, and ACID transactions—the core of databases like SQLite or LevelDB.

Quick Reference

Attribute	Value
Primary Language	Rust
Alternative Languages	C, C++, Go
Difficulty	Level 4: Expert
Time Estimate	1 month+
Knowledge Area	Database Internals / B-Trees / ACID
Tooling	File I/O, memory mapping
Prerequisites	Projects 1-4 completed, understanding of data structures

What You Will Build

A persistent key-value storage engine with B-tree indexing, write-ahead logging, and ACID transactions—the core of databases like SQLite or LevelDB.

Why It Matters

This project builds core skills that appear repeatedly in real-world systems and tooling.

Core Challenges

Implementing B-tree node splitting and merging → maps to complex data structure manipulation
Write-ahead logging for crash recovery → maps to file I/O and fsync
Page-level locking for concurrency → maps to RwLock and ownership
Memory-mapped I/O for performance → maps to unsafe and raw pointers

Key Concepts

B-tree algorithms: “Algorithms, Fourth Edition” Chapter 6 - Sedgewick
Write-ahead logging: “Designing Data-Intensive Applications” Chapter 7 - Kleppmann
ACID properties: “Designing Data-Intensive Applications” Chapter 7 - Kleppmann
Memory-mapped files: “The Linux Programming Interface” Chapter 49 - Kerrisk

Real-World Outcome

$ cargo run --release
🗄️ RustKV - Embedded Storage Engine
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

> put user:1 {"name": "Alice", "age": 30}
OK (0.1ms)

> put user:2 {"name": "Bob", "age": 25}
OK (0.08ms)

> get user:1
{"name": "Alice", "age": 30} (0.02ms)

> BEGIN
Transaction started

> put user:1 {"name": "Alice", "age": 31}
OK (staged)

> ROLLBACK
Transaction rolled back

> get user:1
{"name": "Alice", "age": 30}  # Unchanged!

Benchmark:
  Random writes: 150,000 ops/sec
  Random reads:  450,000 ops/sec
  Range scans:   2.1M keys/sec

  Crash recovery test: PASSED
  (Killed process during write, all committed data recovered)

Implementation Guide

Reproduce the simplest happy-path scenario.
Build the smallest working version of the core feature.
Add input validation and error handling.
Add instrumentation/logging to confirm behavior.
Refactor into clean modules with tests.

Milestones

Milestone 1: Minimal working program that runs end-to-end.
Milestone 2: Correct outputs for typical inputs.
Milestone 3: Robust handling of edge cases.
Milestone 4: Clean structure and documented usage.

Validation Checklist

Output matches the real-world outcome example
Handles invalid inputs safely
Provides clear errors and exit codes
Repeatable results across runs

References

Main guide: LEARN_RUST_DEEP_DIVE.md
“Designing Data-Intensive Applications” by Martin Kleppmann