Project 5: Write-Ahead Logging (WAL) and Crash Recovery
Implement crash-safe durability by enforcing log-before-data ordering and deterministic recovery replay.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Expert |
| Time Estimate | 1-2 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Go |
| Coolness Level | Level 5: Pure Magic |
| Business Potential | Level 4: Open Core Infrastructure |
| Prerequisites | Projects 1-4, page-LSN awareness |
| Key Topics | WAL protocol, checkpoints, analysis/redo/undo workflow |
1. Learning Objectives
- Design WAL record schema with LSN discipline.
- Enforce durable commit protocol (log flush before ACK).
- Implement deterministic recovery path after crash injection.
- Benchmark checkpoint interval tradeoffs.
2. All Theory Needed (Per-Concept Breakdown)
WAL Protocol and Recovery Invariants
Fundamentals
WAL’s core invariant is write ordering: before a dirty data page reaches durable storage, the log records describing its updates must already be durable. This enables post-crash replay and consistency restoration.
Deep Dive into the concept
Durability without WAL requires forcing all modified pages to disk at commit, which is slow and unpredictable. WAL lets a commit become durable after a sequential log flush, deferring the data-page flush. Recovery then reconstructs state by replaying logged actions. LSN fields coordinate this process: each log record carries an LSN, and each dirty page stores the LSN of the last record applied to it. During redo, a record is applied only if the page-LSN is behind the record's LSN.
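The redo rule above can be sketched in a few lines of C. The record and page shapes here are illustrative assumptions, not a fixed format:

```c
#include <stdint.h>

/* Hypothetical record/page layouts for illustration only. */
typedef struct { uint64_t lsn; uint16_t slot; uint32_t value; } WalRecord;
typedef struct { uint64_t page_lsn; uint32_t slots[64]; } Page;

/* Apply a record only if the page has not yet seen it. This page-LSN
 * guard is what makes redo idempotent across repeated recovery runs. */
static void redo_record(Page *p, const WalRecord *r) {
    if (p->page_lsn >= r->lsn)
        return;                    /* update already durable; skip */
    p->slots[r->slot] = r->value;  /* reapply the logged change    */
    p->page_lsn = r->lsn;          /* advance past this record     */
}
```

Running redo twice over the same records is therefore harmless: the second pass skips every record the first pass applied.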
Checkpointing narrows replay scope. A checkpoint captures metadata about active transactions and dirty pages at a point in time. Restart can begin near that marker instead of scanning entire log history. Tradeoff: frequent checkpoints reduce restart work but increase runtime overhead.
A reliable WAL implementation must handle torn writes, partial tail records, and interrupted flush calls. Record checksums and length fields detect an invalid tail; recovery must stop safely at the first invalid tail boundary.
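Torn-tail detection can be sketched as a forward scan that stops at the first record whose length runs past the log or whose checksum fails. The header layout and the toy checksum are assumptions of this sketch; a real log would use CRC32C or similar:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Assumed on-disk record header; the layout is illustrative only. */
typedef struct { uint32_t payload_len; uint32_t checksum; } RecHdr;

/* Toy checksum stand-in for CRC32C. */
static uint32_t sum32(const unsigned char *p, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) s = s * 31 + p[i];
    return s;
}

/* Return the byte offset of the first invalid tail: a header that
 * runs past the buffer, a payload that does, or a checksum mismatch
 * ends the valid portion of the log. */
static size_t valid_wal_end(const unsigned char *buf, size_t len) {
    size_t off = 0;
    while (off + sizeof(RecHdr) <= len) {
        RecHdr h;
        memcpy(&h, buf + off, sizeof h);
        size_t body = off + sizeof h;
        if (body + h.payload_len > len) break;            /* torn record */
        if (sum32(buf + body, h.payload_len) != h.checksum) break;
        off = body + h.payload_len;                       /* record OK   */
    }
    return off;
}
```

Everything past the returned offset is discarded at restart; replay never trusts bytes beyond the last fully verified record.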
Testing should include randomized crash points across transaction lifecycle: before commit flush, after commit flush but before data flush, during data flush, and mid-checkpoint. A key validation is idempotence: running recovery twice should produce identical final state.
How this fits into other projects
- Primary in this project.
- Mandatory for integrated transactional behavior in Projects 6 and 7.
Definitions & key terms
- LSN: Monotonic log sequence number.
- Redo: Reapplication of logged update.
- Undo: Reversal of uncommitted update.
- Checkpoint: Restart marker reducing replay window.
Mental model diagram
Update page P -> append WAL(L=900)
Commit txn -> fsync WAL up to L=930 -> ACK commit
Data page flush later
Crash -> restart -> redo records where page_lsn < record_lsn
How it works (step-by-step, with invariants and failure modes)
- Emit WAL records for updates.
- Assign LSNs and tag dirty pages with page-LSN.
- On commit, flush WAL through commit LSN.
- Recover with analysis + redo (+ undo if supported).
Failure modes:
- Commit ACK before durable WAL flush.
- Redo replay without page-LSN guard.
- Partial WAL tail treated as valid record.
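The commit step above, and the first failure mode, come down to one ordering rule: the WAL must be fsync'd through the commit record before the client is acknowledged. A minimal sketch, assuming a single log file descriptor (the ack hook is a placeholder, not a fixed API):

```c
#include <unistd.h>
#include <fcntl.h>
#include <stddef.h>

/* Durable commit path: append the COMMIT record, force the log to
 * stable storage, and only then acknowledge. Acknowledging before
 * the fsync succeeds is the classic durability bug. */
static int commit_txn(int wal_fd, const void *commit_rec, size_t len) {
    if (write(wal_fd, commit_rec, len) != (ssize_t)len)
        return -1;               /* append COMMIT record              */
    if (fsync(wal_fd) != 0)
        return -1;               /* force log durable through commit  */
    /* only now is it safe to ACK: ack_client(txn_id); */
    return 0;
}
```

Group commit (Section 3.3) batches several transactions behind one fsync call, amortizing the flush without weakening this ordering.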
Minimal concrete example
[LSN 410] BEGIN T7
[LSN 418] UPDATE T7 page=12 slot=4
[LSN 430] COMMIT T7
fsync WAL <=430
Common misconceptions
- “Commit means data pages already flushed.” -> False in WAL systems.
- “Checkpoint eliminates recovery work.” -> It only shortens it.
Check-your-understanding questions
- Why does page-LSN make redo idempotent?
- Why does commit depend on log flush, not data flush?
- What crash class requires backup/replication beyond WAL?
Check-your-understanding answers
- It prevents applying already persisted updates again.
- Sequential log durability is cheaper and sufficient for replay.
- Media loss/corruption beyond local crash scenarios.
Real-world applications
- PostgreSQL WAL and restart recovery.
- SQLite journaling/WAL modes.
Where you’ll apply it
- This project and integrated TinyDB in Project 7.
References
- https://www.postgresql.org/docs/current/wal-intro.html
- https://research.ibm.com/publications/aries-a-transaction-recovery-method-supporting-fine-granularity-locking-and-partial-rollbacks-using-write-ahead-logging
Key insights
WAL is a correctness protocol with performance benefits, not merely an audit log.
Summary
Recovery quality depends on disciplined write ordering, robust log parsing, and deterministic crash tests.
Homework/Exercises to practice the concept
- Define WAL record schema for begin/update/commit/checkpoint.
- Design crash matrix with at least six kill points.
- Compare recovery time at three checkpoint intervals.
Solutions to the homework/exercises
- Include txn_id, record_type, LSN, payload_len, checksum.
- Cover pre/post commit flush and partial-tail scenarios.
- Short intervals lower RTO but add write overhead.
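One way to lay out the homework record header is sketched below; the field widths are a suggestion, not a required format, and the checksum would cover the header (with the checksum field zeroed) plus the payload:

```c
#include <stdint.h>

/* Illustrative WAL record header for the schema exercise. */
typedef struct {
    uint64_t lsn;          /* monotonic log sequence number        */
    uint32_t txn_id;       /* owning transaction                   */
    uint8_t  record_type;  /* BEGIN / UPDATE / COMMIT / CHECKPOINT */
    uint32_t payload_len;  /* bytes of payload after this header   */
    uint32_t checksum;     /* detects torn or corrupt tails        */
} WalRecordHdr;
```

Fixed-width integer types keep the on-disk layout portable across compilers; a real format would also pin byte order and padding.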
3. Project Specification
3.1 What You Will Build
WAL subsystem with record parser, recovery engine, checkpoint writer, and crash harness.
3.2 Functional Requirements
- Append typed WAL records.
- Force WAL flush on commit.
- Persist and load checkpoints.
- Perform recovery after crash.
- Validate idempotent replay.
3.3 Non-Functional Requirements
- Performance: Group commit support for reduced flush overhead.
- Reliability: Deterministic recovery on crash tests.
- Usability: Human-readable log dump tooling.
3.4 Example Usage / Output
run workload with crash point -> recover -> verify committed state hash
3.5 Data Formats / Schemas / Protocols
WAL segment files with record header, payload, checksum; checkpoint metadata file.
3.6 Edge Cases
Partial final record, duplicate recovery runs, interrupted checkpoint write.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make wal_demo
./wal_demo run --ops 20000 --checkpoint-every 500 --crash-at 13742
./wal_demo recover
./wal_demo verify
3.7.2 Golden Path Demo (Deterministic)
Fixed operation seed and deterministic crash point.
3.7.3 If CLI: exact terminal transcript
$ ./wal_demo run --ops 20000 --checkpoint-every 500 --crash-at 13742
SIMULATED CRASH at op=13742
$ ./wal_demo recover
redo applied=1298 skipped=9821 undo_txns=4
RECOVERY COMPLETE state_hash=4d7a9c...
$ ./wal_demo verify
OK committed_state_consistent=true
4. Solution Architecture
4.1 High-Level Design
Txn Manager -> WAL Writer -> Durable Log Segments
-> Checkpointer
Restart -> Recovery Engine (analysis/redo/undo)
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| WAL writer | append + flush | checksummed records |
| Checkpointer | persist progress | interval config |
| Recovery engine | replay logic | page-LSN guard |
| Crash harness | deterministic kill | reproducible seeds |
4.3 Data Structures (No Full Code)
WalRecord { lsn, txn_id, type, payload_len, checksum }
Checkpoint { ckpt_lsn, active_txns, dirty_page_table }
4.4 Algorithm Overview
- Commit path: append -> flush -> ACK
- Recovery path: analysis -> redo -> undo -> validate
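The recovery path above can be modeled in miniature over an in-memory log. This sketch folds analysis into a committed-transaction scan and simplifies undo to reverting before-images passed in as an array (ARIES logs before-images per record); all names here are illustrative:

```c
#include <stdbool.h>

/* Minimal in-memory model of analysis/redo/undo; a real engine
 * scans segment files and uses page-LSN guards during redo. */
enum RecType { R_BEGIN, R_UPDATE, R_COMMIT };

typedef struct { int lsn; int txn; enum RecType type;
                 int slot; int value; } Rec;

/* Analysis (simplified): a txn is a loser if it never committed. */
static bool committed(const Rec *log, int n, int txn) {
    for (int i = 0; i < n; i++)
        if (log[i].txn == txn && log[i].type == R_COMMIT) return true;
    return false;
}

/* Redo every update forward, then undo loser updates in reverse. */
static void recover(const Rec *log, int n, int state[], const int before[]) {
    for (int i = 0; i < n; i++)                 /* redo phase */
        if (log[i].type == R_UPDATE)
            state[log[i].slot] = log[i].value;
    for (int i = n - 1; i >= 0; i--)            /* undo phase */
        if (log[i].type == R_UPDATE && !committed(log, n, log[i].txn))
            state[log[i].slot] = before[log[i].slot];
}
```

Note the asymmetry: redo runs forward over the whole replay window, while undo walks backward so later loser updates are reverted before earlier ones.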
5. Implementation Guide
5.1 Development Environment Setup
make wal_demo
5.2 Project Structure
project/
├── src/wal/
├── src/recovery/
├── tests/crash/
└── tools/logdump
5.3 The Core Question You’re Answering
What exact ordering rules guarantee committed data survives crashes?
5.4 Concepts You Must Understand First
- WAL-before-data invariant
- LSN idempotence
- checkpoint economics
5.5 Questions to Guide Your Design
- How do you identify end of valid WAL after crash?
- Which records are mandatory for analysis phase?
5.6 Thinking Exercise
Draw timeline for one transaction with three crash points and expected recovery outcomes.
5.7 The Interview Questions They’ll Ask
- What is WAL-before-data and why?
- How do you detect torn WAL tails?
- Why is recovery replay idempotent?
- What tradeoff does checkpoint frequency create?
5.8 Hints in Layers
- Build log parser before full recovery logic.
- Add checksums and length fields early.
- Verify replay idempotence by rerunning recovery.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Crash consistency | Operating Systems: Three Easy Pieces | crash chapters |
| Reliability in C | Effective C, 2nd Edition | error handling chapters |
5.10 Implementation Phases
- Phase 1: WAL append and parser
- Phase 2: commit protocol and flush
- Phase 3: recovery and checkpoints
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Record granularity | physical/logical | logical+page reference | simpler and compact |
| Commit flushing | per-txn/group | group commit option | better throughput |
6. Testing Strategy
6.1 Test Categories
- Unit: record encode/decode and checksum
- Integration: commit/recover workflows
- Crash: random kill-point matrix
6.2 Critical Test Cases
- Crash before commit flush loses uncommitted txns.
- Crash after commit flush preserves committed txn.
- Recovery rerun yields unchanged state hash.
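The "unchanged state hash" check needs a deterministic hash over the recovered state. Any stable hash works; FNV-1a is used here purely as an example choice:

```c
#include <stdint.h>
#include <stddef.h>

/* Deterministic state hash (FNV-1a, 64-bit) for the verify step:
 * hash the recovered pages, run recovery again, and compare. */
static uint64_t state_hash(const unsigned char *state, size_t n) {
    uint64_t h = 1469598103934665603ULL;        /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= state[i];
        h *= 1099511628211ULL;                  /* FNV prime        */
    }
    return h;
}
```

Because the hash is a pure function of the bytes, any divergence between two recovery runs shows up as a hash mismatch, making the idempotence test a one-line comparison.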
6.3 Test Data
Seeded transaction scripts with deterministic key/value updates.
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Premature commit ACK | committed data missing | flush WAL before ACK |
| Missing page-LSN guard | duplicate replay effects | compare record LSN to page-LSN |
7.2 Debugging Strategies
- Log per-record replay decisions.
- Keep a logdump utility for forensic inspection.
7.3 Performance Traps
Over-frequent checkpoints can throttle write throughput.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add verbose recovery trace mode.
8.2 Intermediate Extensions
- Add WAL segment archiving.
8.3 Advanced Extensions
- Add point-in-time restore prototype.
9. Real-World Connections
9.1 Industry Applications
WAL and recovery are core reliability mechanisms in production DBMS engines.
9.2 Related Open Source Projects
- PostgreSQL WAL internals docs.
9.3 Interview Relevance
Durability and recovery questions are central for DB/storage roles.
10. Resources
10.1 Essential Reading
- PostgreSQL WAL docs and ARIES paper notes.
10.2 Video Resources
- Recovery internals conference talks.
10.3 Tools & Documentation
- Crash harness scripts, log parser, state-hash verifier.
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain commit durability conditions.
- I can explain redo idempotence.
11.2 Implementation
- Recovery passes crash matrix deterministically.
- Checkpointing behavior is benchmarked.
11.3 Growth
- I can reason about group commit tradeoffs.
12. Submission / Completion Criteria
Minimum Viable Completion:
- WAL append + commit flush + simple redo recovery
Full Completion:
- Plus checkpoints and undo handling with deterministic tests
Excellence:
- Plus recovery tooling and performance tuning report