Project 13: Database Storage Engine (B-Trees and Transactions)
A persistent key-value storage engine with B-tree indexing, write-ahead logging, and ACID transactions—the core of databases like SQLite or LevelDB.
Quick Reference
| Attribute | Value |
|---|---|
| Primary Language | Rust |
| Alternative Languages | C, C++, Go |
| Difficulty | Level 4: Expert |
| Time Estimate | 1 month+ |
| Knowledge Area | Database Internals / B-Trees / ACID |
| Tooling | File I/O, memory mapping |
| Prerequisites | Projects 1-4 completed, understanding of data structures |
What You Will Build
A persistent key-value storage engine with B-tree indexing, write-ahead logging, and ACID transactions—the core of databases like SQLite or LevelDB.
Why It Matters
This project builds core skills that appear repeatedly in real-world systems and tooling.
Core Challenges
- Implementing B-tree node splitting and merging → maps to complex data structure manipulation
- Write-ahead logging for crash recovery → maps to file I/O and fsync
- Page-level locking for concurrency → maps to RwLock and ownership
- Memory-mapped I/O for performance → maps to unsafe and raw pointers
Key Concepts
- B-tree algorithms: “Algorithms, Fourth Edition” Chapter 6 - Sedgewick
- Write-ahead logging: “Designing Data-Intensive Applications” Chapter 7 - Kleppmann
- ACID properties: “Designing Data-Intensive Applications” Chapter 7 - Kleppmann
- Memory-mapped files: “The Linux Programming Interface” Chapter 49 - Kerrisk
Real-World Outcome
$ cargo run --release
🗄️ RustKV - Embedded Storage Engine
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> put user:1 {"name": "Alice", "age": 30}
OK (0.1ms)
> put user:2 {"name": "Bob", "age": 25}
OK (0.08ms)
> get user:1
{"name": "Alice", "age": 30} (0.02ms)
> BEGIN
Transaction started
> put user:1 {"name": "Alice", "age": 31}
OK (staged)
> ROLLBACK
Transaction rolled back
> get user:1
{"name": "Alice", "age": 30} # Unchanged!
Benchmark:
Random writes: 150,000 ops/sec
Random reads: 450,000 ops/sec
Range scans: 2.1M keys/sec
Crash recovery test: PASSED
(Killed process during write, all committed data recovered)
Implementation Guide
- Reproduce the simplest happy-path scenario.
- Build the smallest working version of the core feature.
- Add input validation and error handling.
- Add instrumentation/logging to confirm behavior.
- Refactor into clean modules with tests.
Milestones
- Milestone 1: Minimal working program that runs end-to-end.
- Milestone 2: Correct outputs for typical inputs.
- Milestone 3: Robust handling of edge cases.
- Milestone 4: Clean structure and documented usage.
Validation Checklist
- Output matches the real-world outcome example
- Handles invalid inputs safely
- Provides clear errors and exit codes
- Repeatable results across runs
References
- Main guide:
LEARN_RUST_DEEP_DIVE.md - “Designing Data-Intensive Applications” by Martin Kleppmann