Project 5: Write-Ahead Logging (WAL) and Crash Recovery

Implement crash-safe durability by enforcing log-before-data ordering and deterministic recovery replay.

Quick Reference

Attribute | Value
Difficulty | Expert
Time Estimate | 1-2 weeks
Main Programming Language | C
Alternative Programming Languages | Rust, Go
Coolness Level | Level 5: Pure Magic
Business Potential | Level 4: Open Core Infrastructure
Prerequisites | Projects 1-4, page-LSN awareness
Key Topics | WAL protocol, checkpoints, analysis/redo/undo workflow

1. Learning Objectives

  1. Design WAL record schema with LSN discipline.
  2. Enforce durable commit protocol (log flush before ACK).
  3. Implement deterministic recovery path after crash injection.
  4. Benchmark checkpoint interval tradeoffs.

2. All Theory Needed (Per-Concept Breakdown)

WAL Protocol and Recovery Invariants

Fundamentals

WAL’s core invariant is write ordering: before an update to a data page becomes durable, the log records describing that update must already be durable. This enables post-crash replay and consistency restoration.

Deep Dive into the concept

Durability without WAL requires forcing all modified pages to disk at commit, which is slow and unpredictable. WAL lets a commit become durable with a single sequential log flush, deferring the data-page flush. Recovery then reconstructs state by replaying logged actions. LSN fields coordinate this process: each log record carries an LSN, and each dirty page stores the LSN of the last record applied to it. During redo, a log record is applied only if the page-LSN is behind the record's LSN.
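The page-LSN guard can be sketched as follows. This is a minimal illustration, not the project's actual headers: the `Page` and `WalRecord` field names are assumptions, and the actual byte-level change application is elided.

```c
#include <stdint.h>

/* Hypothetical in-memory page; field names are illustrative. */
typedef struct {
    uint64_t page_lsn;   /* LSN of the last update applied to this page */
    uint8_t  data[4096];
} Page;

typedef struct {
    uint64_t lsn;
    uint32_t page_id;
    /* payload omitted */
} WalRecord;

/* Core redo invariant: apply a record only if the page has not already
 * seen it. Returns 1 if the record was applied, 0 if skipped. */
int redo_apply(Page *page, const WalRecord *rec)
{
    if (page->page_lsn >= rec->lsn)
        return 0;              /* page already reflects this update: skip */
    /* ... re-apply the logged change to page->data here ... */
    page->page_lsn = rec->lsn; /* advance page-LSN so replay is idempotent */
    return 1;
}
```

Because the guard advances `page_lsn` on every apply, rerunning redo over the same records is a no-op, which is exactly the idempotence property recovery depends on.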

Checkpointing narrows the replay scope. A checkpoint captures metadata about active transactions and dirty pages at a point in time, so restart can begin near that marker instead of scanning the entire log. The tradeoff: frequent checkpoints reduce restart work but increase runtime overhead.
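One way the checkpoint narrows replay, sketched under assumed names (the real checkpoint format is specified in section 3.5): redo can start at the oldest "recLSN" in the checkpoint's dirty page table, i.e. the first LSN that dirtied any page still unflushed at checkpoint time.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical checkpoint summary; fields are illustrative. */
typedef struct {
    uint64_t ckpt_lsn;          /* LSN of the checkpoint record itself */
    size_t   n_dirty;
    uint64_t dirty_rec_lsn[64]; /* recLSN per dirty page at checkpoint time */
} Checkpoint;

/* Redo begins at the oldest recLSN rather than at the start of the log:
 * this is how a checkpoint shortens (but never eliminates) recovery. */
uint64_t redo_start_lsn(const Checkpoint *c)
{
    uint64_t start = c->ckpt_lsn;
    for (size_t i = 0; i < c->n_dirty; i++)
        if (c->dirty_rec_lsn[i] < start)
            start = c->dirty_rec_lsn[i];
    return start;
}
```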

A reliable WAL implementation must handle torn writes, partial tail records, and interrupted flush calls. Per-record checksums and length fields make an invalid tail detectable; recovery must stop safely at the first invalid record boundary.
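Tail detection can be sketched as a forward scan that stops at the first record whose length runs past the buffer or whose checksum fails. The header layout and the toy checksum below are assumptions; a real implementation would use a proper CRC32.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

/* Illustrative record header; not the project's canonical layout. */
typedef struct {
    uint64_t lsn;
    uint32_t payload_len;
    uint32_t checksum;   /* over the payload bytes */
} RecHeader;

/* Toy checksum standing in for a real CRC32. */
static uint32_t toy_sum(const uint8_t *p, uint32_t n)
{
    uint32_t s = 0;
    while (n--) s = s * 31 + *p++;
    return s;
}

/* Scan a log buffer and return the byte offset of the first invalid
 * record boundary: the durable tail. Recovery must stop here. */
size_t find_valid_tail(const uint8_t *buf, size_t len)
{
    size_t off = 0;
    while (off + sizeof(RecHeader) <= len) {
        RecHeader h;
        memcpy(&h, buf + off, sizeof h);
        size_t end = off + sizeof h + h.payload_len;
        if (end > len)                                   /* torn record */
            break;
        if (toy_sum(buf + off + sizeof h, h.payload_len) != h.checksum)
            break;                                       /* corrupt tail */
        off = end;                                       /* record is whole */
    }
    return off;
}
```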

Testing should include randomized crash points across the transaction lifecycle: before the commit flush, after the commit flush but before the data flush, during the data flush, and mid-checkpoint. A key validation is idempotence: running recovery twice must produce an identical final state.

How this fits into the projects

  • Primary in this project.
  • Mandatory for integrated transactional behavior in Projects 6 and 7.

Definitions & key terms

  • LSN: Monotonic log sequence number.
  • Redo: Reapplication of logged update.
  • Undo: Reversal of uncommitted update.
  • Checkpoint: Restart marker reducing replay window.

Mental model diagram

Update page P -> append WAL(L=900)
Commit txn -> fsync WAL up to L=930 -> ACK commit
Data page flush later
Crash -> restart -> redo records where page_lsn < record_lsn

How it works (step-by-step, with invariants and failure modes)

  1. Emit WAL records for updates.
  2. Assign LSNs and tag dirty pages with page-LSN.
  3. On commit, flush WAL through commit LSN.
  4. Recover with analysis + redo (+ undo if supported).

Failure modes:

  • Commit ACK before durable WAL flush.
  • Redo replay without page-LSN guard.
  • Partial WAL tail treated as valid record.
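The commit path in steps 1-3 can be modeled in a few lines. This is an in-memory sketch with assumed names; `wal_flush()` stands in for `fsync()` on the log file.

```c
#include <stdint.h>

/* Minimal in-memory model of the commit path; names are illustrative. */
typedef struct {
    uint64_t next_lsn;
    uint64_t flushed_lsn;   /* every LSN <= this is durable */
} Wal;

static uint64_t wal_append(Wal *w) { return w->next_lsn++; }
static void     wal_flush(Wal *w)  { w->flushed_lsn = w->next_lsn - 1; }

/* Commit protocol: append the commit record, force the log through its
 * LSN, and only then acknowledge. Returns 1 if the ACK is legal. */
int txn_commit(Wal *w)
{
    uint64_t commit_lsn = wal_append(w);   /* step 1: append commit record */
    wal_flush(w);                          /* step 2: flush through LSN */
    return w->flushed_lsn >= commit_lsn;   /* step 3: ACK only if durable */
}
```

Reordering steps 2 and 3 is exactly the first failure mode above: an ACK before the flush means a crash can lose a transaction the client believes committed.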

Minimal concrete example

[LSN 410] BEGIN T7
[LSN 418] UPDATE T7 page=12 slot=4
[LSN 430] COMMIT T7
fsync WAL <=430

Common misconceptions

  • “Commit means data pages already flushed.” -> False in WAL systems.
  • “Checkpoint eliminates recovery work.” -> It only shortens it.

Check-your-understanding questions

  1. Why does page-LSN make redo idempotent?
  2. Why does commit depend on log flush, not data flush?
  3. What crash class requires backup/replication beyond WAL?

Check-your-understanding answers

  1. It prevents applying already persisted updates again.
  2. Sequential log durability is cheaper and sufficient for replay.
  3. Media loss/corruption beyond local crash scenarios.

Real-world applications

  • PostgreSQL WAL and restart recovery.
  • SQLite journaling/WAL modes.

Where you’ll apply it

  • This project and integrated TinyDB in Project 7.

References

Key insights

WAL is a correctness protocol with performance benefits, not merely an audit log.

Summary

Recovery quality depends on disciplined write ordering, robust log parsing, and deterministic crash tests.

Homework/Exercises to practice the concept

  1. Define WAL record schema for begin/update/commit/checkpoint.
  2. Design crash matrix with at least six kill points.
  3. Compare recovery time at three checkpoint intervals.

Solutions to the homework/exercises

  1. Include txn_id, record_type, LSN, payload_len, checksum.
  2. Cover pre/post commit flush and partial-tail scenarios.
  3. Short intervals lower RTO but add write overhead.

3. Project Specification

3.1 What You Will Build

WAL subsystem with record parser, recovery engine, checkpoint writer, and crash harness.

3.2 Functional Requirements

  1. Append typed WAL records.
  2. Force WAL flush on commit.
  3. Persist and load checkpoints.
  4. Perform recovery after crash.
  5. Validate idempotent replay.

3.3 Non-Functional Requirements

  • Performance: Group commit support for reduced flush overhead.
  • Reliability: Deterministic recovery on crash tests.
  • Usability: Human-readable log dump tooling.
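Group commit amortizes the flush cost across concurrent committers. The counter-based sketch below is illustrative only: a real implementation would add a timeout and a mutex/condvar so waiters block until the shared flush completes.

```c
#include <stdint.h>

/* Group commit sketch; all names are assumptions. */
typedef struct {
    uint64_t next_lsn;
    uint64_t flushed_lsn;
    int      pending;       /* commits appended but not yet flushed */
    int      batch_size;    /* flush once this many commits queue up */
    int      flush_calls;   /* counts fsyncs, to show amortization */
} Wal;

static void wal_flush(Wal *w)
{
    w->flushed_lsn = w->next_lsn - 1;
    w->pending = 0;
    w->flush_calls++;
}

/* Append a commit record; flush only when the batch fills (a timeout,
 * omitted here, bounds latency). One fsync covers batch_size commits. */
uint64_t group_commit(Wal *w)
{
    uint64_t lsn = w->next_lsn++;
    if (++w->pending >= w->batch_size)
        wal_flush(w);
    return lsn;
}
```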

3.4 Example Usage / Output

run workload with crash point -> recover -> verify committed state hash

3.5 Data Formats / Schemas / Protocols

WAL segment files with record header, payload, checksum; checkpoint metadata file.

3.6 Edge Cases

Partial final record, duplicate recovery runs, interrupted checkpoint write.

3.7 Real World Outcome

3.7.1 How to Run (Copy/Paste)

make wal_demo
./wal_demo run --ops 20000 --checkpoint-every 500 --crash-at 13742
./wal_demo recover
./wal_demo verify

3.7.2 Golden Path Demo (Deterministic)

Fixed operation seed and deterministic crash point.

3.7.3 If CLI: exact terminal transcript

$ ./wal_demo run --ops 20000 --checkpoint-every 500 --crash-at 13742
SIMULATED CRASH at op=13742
$ ./wal_demo recover
redo applied=1298 skipped=9821 undo_txns=4
RECOVERY COMPLETE state_hash=4d7a9c...
$ ./wal_demo verify
OK committed_state_consistent=true

4. Solution Architecture

4.1 High-Level Design

Txn Manager -> WAL Writer -> Durable Log Segments
             -> Checkpointer
Restart -> Recovery Engine (analysis/redo/undo)

4.2 Key Components

Component | Responsibility | Key Decisions
WAL writer | append + flush | checksummed records
Checkpointer | persist progress | interval config
Recovery engine | replay logic | page-LSN guard
Crash harness | deterministic kill | reproducible seeds

4.3 Data Structures (No Full Code)

WalRecord { lsn, txn_id, type, payload_len, checksum }
Checkpoint { ckpt_lsn, active_txns, dirty_page_table }

4.4 Algorithm Overview

  • Commit path: append -> flush -> ACK
  • Recovery path: analysis -> redo -> undo -> validate
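The recovery path can be sketched as three passes over the log: analysis finds committed transactions, redo repeats history forward, and undo reverses losers newest-first. Everything here is a toy model (record types, fields, and the omitted page-LSN guard are assumptions).

```c
#include <stdint.h>
#include <stddef.h>

/* Toy record stream for the recovery path; types are illustrative. */
typedef enum { REC_BEGIN, REC_UPDATE, REC_COMMIT } RecType;
typedef struct { uint64_t lsn; uint32_t txn; RecType type; } Rec;

enum { MAX_TXN = 16 };

/* Returns the number of redone updates; *undone counts loser updates
 * that would receive compensating actions. */
int recover(const Rec *log, size_t n, int committed[MAX_TXN], size_t *undone)
{
    size_t redone = 0;
    *undone = 0;
    for (size_t i = 0; i < n; i++)            /* analysis: find committers */
        if (log[i].type == REC_COMMIT)
            committed[log[i].txn] = 1;
    for (size_t i = 0; i < n; i++)            /* redo: repeat history */
        if (log[i].type == REC_UPDATE)
            redone++;                          /* page-LSN guard omitted */
    for (size_t i = n; i-- > 0; )             /* undo: newest first */
        if (log[i].type == REC_UPDATE && !committed[log[i].txn])
            (*undone)++;                       /* would emit compensation */
    return (int)redone;
}
```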

5. Implementation Guide

5.1 Development Environment Setup

make wal_demo

5.2 Project Structure

project/
├── src/wal/
├── src/recovery/
├── tests/crash/
└── tools/logdump

5.3 The Core Question You’re Answering

What exact ordering rules guarantee committed data survives crashes?

5.4 Concepts You Must Understand First

  • WAL-before-data invariant
  • LSN idempotence
  • checkpoint economics

5.5 Questions to Guide Your Design

  • How do you identify end of valid WAL after crash?
  • Which records are mandatory for analysis phase?

5.6 Thinking Exercise

Draw timeline for one transaction with three crash points and expected recovery outcomes.

5.7 The Interview Questions They’ll Ask

  1. What is WAL-before-data and why?
  2. How do you detect torn WAL tails?
  3. Why is recovery replay idempotent?
  4. What tradeoff does checkpoint frequency create?

5.8 Hints in Layers

  • Build log parser before full recovery logic.
  • Add checksums and length fields early.
  • Verify replay idempotence by rerunning recovery.

5.9 Books That Will Help

Topic | Book | Chapter
Crash consistency | Operating Systems: Three Easy Pieces | crash consistency chapters
Reliability in C | Effective C, 2nd Edition | error handling chapters

5.10 Implementation Phases

  • Phase 1: WAL append and parser
  • Phase 2: commit protocol and flush
  • Phase 3: recovery and checkpoints

5.11 Key Implementation Decisions

Decision | Options | Recommendation | Rationale
Record granularity | physical / logical | logical + page reference | simpler and compact
Commit flushing | per-txn / group | group commit option | better throughput

6. Testing Strategy

6.1 Test Categories

  • Unit: record encode/decode and checksum
  • Integration: commit/recover workflows
  • Crash: random kill-point matrix

6.2 Critical Test Cases

  1. Crash before commit flush loses uncommitted txns.
  2. Crash after commit flush preserves committed txn.
  3. Recovery rerun yields unchanged state hash.

6.3 Test Data

Seeded transaction scripts with deterministic key/value updates.


7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

Pitfall | Symptom | Solution
Premature commit ACK | committed data missing after crash | flush WAL before ACK
Missing page-LSN guard | duplicate replay effects | compare record LSN to page-LSN

7.2 Debugging Strategies

  • Log per-record replay decisions.
  • Keep logdump utility for forensic inspection.

7.3 Performance Traps

Over-frequent checkpoints can throttle write throughput.


8. Extensions & Challenges

8.1 Beginner Extensions

  • Add verbose recovery trace mode.

8.2 Intermediate Extensions

  • Add WAL segment archiving.

8.3 Advanced Extensions

  • Add point-in-time restore prototype.

9. Real-World Connections

9.1 Industry Applications

WAL and recovery are core reliability mechanisms in production DBMS engines.

  • PostgreSQL WAL internals docs.

9.2 Interview Relevance

Durability and recovery questions are central for DB/storage roles.


10. Resources

10.1 Essential Reading

  • PostgreSQL WAL docs and ARIES paper notes.

10.2 Video Resources

  • Recovery internals conference talks.

10.3 Tools & Documentation

  • Crash harness scripts, log parser, state-hash verifier.

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain commit durability conditions.
  • I can explain redo idempotence.

11.2 Implementation

  • Recovery passes crash matrix deterministically.
  • Checkpointing behavior is benchmarked.

11.3 Growth

  • I can reason about group commit tradeoffs.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • WAL append + commit flush + simple redo recovery

Full Completion:

  • Plus checkpoints and undo handling with deterministic tests

Excellence:

  • Plus recovery tooling and performance tuning report