Project 5: Write-Ahead Logging (WAL) and Crash Recovery
Implement crash-safe durability by enforcing log-before-data ordering and deterministic recovery replay.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Expert |
| Time Estimate | 1-2 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Rust, Go |
| Coolness Level | Level 5: Pure Magic |
| Business Potential | Level 4: Open Core Infrastructure |
| Prerequisites | Projects 1-4, page-LSN awareness |
| Key Topics | WAL protocol, checkpoints, analysis/redo/undo workflow |
1. Learning Objectives
- Design WAL record schema with LSN discipline.
- Enforce durable commit protocol (log flush before ACK).
- Implement deterministic recovery path after crash injection.
- Benchmark checkpoint interval tradeoffs.
2. All Theory Needed (Per-Concept Breakdown)
WAL Protocol and Recovery Invariants
Fundamentals
WAL’s core invariant is write ordering: before a dirty data page reaches durable storage, the log records describing its updates must already be durable. This enables post-crash replay and consistency restoration.
Deep Dive into the concept
Durability without WAL requires forcing all modified pages to disk at commit, which is slow and unpredictable. WAL lets a commit become durable after a sequential log flush, deferring the data-page flush. Recovery then reconstructs state by replaying logged actions. LSN fields coordinate this process: each log record carries an LSN, and each dirty page stores the LSN of the last record applied to it. During redo, a record is applied only if the page-LSN is behind the record's LSN.
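The redo rule above can be sketched in a few lines of C. The record and page shapes here are illustrative assumptions, not a fixed format:

```c
#include <stdint.h>

/* Hypothetical record/page layouts for illustration only. */
typedef struct { uint64_t lsn; uint16_t slot; uint32_t value; } WalRecord;
typedef struct { uint64_t page_lsn; uint32_t slots[64]; } Page;

/* Apply a record only if the page has not yet seen it. This page-LSN
 * guard is what makes redo idempotent across repeated recovery runs. */
static void redo_record(Page *p, const WalRecord *r) {
    if (p->page_lsn >= r->lsn)
        return;                    /* update already durable; skip */
    p->slots[r->slot] = r->value;  /* reapply the logged change    */
    p->page_lsn = r->lsn;          /* advance past this record     */
}
```

Running redo twice over the same records is therefore harmless: the second pass skips every record the first pass applied.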
Checkpointing narrows replay scope. A checkpoint captures metadata about active transactions and dirty pages at a point in time. Restart can begin near that marker instead of scanning entire log history. Tradeoff: frequent checkpoints reduce restart work but increase runtime overhead.
A reliable WAL implementation must handle torn writes, partial tail records, and interrupted flush calls. Record checksums and length fields detect an invalid tail; recovery must stop safely at the first invalid tail boundary.
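Torn-tail detection can be sketched as a forward scan that stops at the first record whose length runs past the log or whose checksum fails. The header layout and the toy checksum are assumptions of this sketch; a real log would use CRC32C or similar:

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Assumed on-disk record header; the layout is illustrative only. */
typedef struct { uint32_t payload_len; uint32_t checksum; } RecHdr;

/* Toy checksum stand-in for CRC32C. */
static uint32_t sum32(const unsigned char *p, size_t n) {
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++) s = s * 31 + p[i];
    return s;
}

/* Return the byte offset of the first invalid tail: a header that
 * runs past the buffer, a payload that does, or a checksum mismatch
 * ends the valid portion of the log. */
static size_t valid_wal_end(const unsigned char *buf, size_t len) {
    size_t off = 0;
    while (off + sizeof(RecHdr) <= len) {
        RecHdr h;
        memcpy(&h, buf + off, sizeof h);
        size_t body = off + sizeof h;
        if (body + h.payload_len > len) break;            /* torn record */
        if (sum32(buf + body, h.payload_len) != h.checksum) break;
        off = body + h.payload_len;                       /* record OK   */
    }
    return off;
}
```

Everything past the returned offset is discarded at restart; replay never trusts bytes beyond the last fully verified record.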
Testing should include randomized crash points across transaction lifecycle: before commit flush, after commit flush but before data flush, during data flush, and mid-checkpoint. A key validation is idempotence: running recovery twice should produce identical final state.
How this fits into other projects
- Primary in this project.
- Mandatory for integrated transactional behavior in Projects 6 and 7.
Definitions & key terms
- LSN: Monotonic log sequence number.
- Redo: Reapplication of logged update.
- Undo: Reversal of uncommitted update.
- Checkpoint: Restart marker reducing replay window.
Mental model diagram
Update page P -> append WAL(L=900)
Commit txn -> fsync WAL up to L=930 -> ACK commit
Data page flush later
Crash -> restart -> redo records where page_lsn < record_lsn
How it works (step-by-step, with invariants and failure modes)
- Emit WAL records for updates.
- Assign LSNs and tag dirty pages with page-LSN.
- On commit, flush WAL through commit LSN.
- Recover with analysis + redo (+ undo if supported).
Failure modes:
- Commit ACK before durable WAL flush.
- Redo replay without page-LSN guard.
- Partial WAL tail treated as valid record.
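The commit step above, and the first failure mode, come down to one ordering rule: the WAL must be fsync'd through the commit record before the client is acknowledged. A minimal sketch, assuming a single log file descriptor (the ack hook is a placeholder, not a fixed API):

```c
#include <unistd.h>
#include <fcntl.h>
#include <stddef.h>

/* Durable commit path: append the COMMIT record, force the log to
 * stable storage, and only then acknowledge. Acknowledging before
 * the fsync succeeds is the classic durability bug. */
static int commit_txn(int wal_fd, const void *commit_rec, size_t len) {
    if (write(wal_fd, commit_rec, len) != (ssize_t)len)
        return -1;               /* append COMMIT record              */
    if (fsync(wal_fd) != 0)
        return -1;               /* force log durable through commit  */
    /* only now is it safe to ACK: ack_client(txn_id); */
    return 0;
}
```

Group commit (Section 3.3) batches several transactions behind one fsync call, amortizing the flush without weakening this ordering.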
Minimal concrete example
[LSN 410] BEGIN T7
[LSN 418] UPDATE T7 page=12 slot=4
[LSN 430] COMMIT T7
fsync WAL <=430
Common misconceptions
- “Commit means data pages already flushed.” -> False in WAL systems.
- “Checkpoint eliminates recovery work.” -> It only shortens it.
Check-your-understanding questions
- Why does page-LSN make redo idempotent?
- Why does commit depend on log flush, not data flush?
- What crash class requires backup/replication beyond WAL?
Check-your-understanding answers
- It prevents applying already persisted updates again.
- Sequential log durability is cheaper and sufficient for replay.
- Media loss/corruption beyond local crash scenarios.
Real-world applications
- PostgreSQL WAL and restart recovery.
- SQLite journaling/WAL modes.
Where you’ll apply it
- This project and integrated TinyDB in Project 7.
References
- https://www.postgresql.org/docs/current/wal-intro.html
- https://research.ibm.com/publications/aries-a-transaction-recovery-method-supporting-fine-granularity-locking-and-partial-rollbacks-using-write-ahead-logging
Key insights
WAL is a correctness protocol with performance benefits, not merely an audit log.
Summary
Recovery quality depends on disciplined write ordering, robust log parsing, and deterministic crash tests.
Homework/Exercises to practice the concept
- Define WAL record schema for begin/update/commit/checkpoint.
- Design crash matrix with at least six kill points.
- Compare recovery time at three checkpoint intervals.
Solutions to the homework/exercises
- Include txn_id, record_type, LSN, payload_len, checksum.
- Cover pre/post commit flush and partial-tail scenarios.
- Short intervals lower RTO but add write overhead.
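One way to lay out the homework record header is sketched below; the field widths are a suggestion, not a required format, and the checksum would cover the header (with the checksum field zeroed) plus the payload:

```c
#include <stdint.h>

/* Illustrative WAL record header for the schema exercise. */
typedef struct {
    uint64_t lsn;          /* monotonic log sequence number        */
    uint32_t txn_id;       /* owning transaction                   */
    uint8_t  record_type;  /* BEGIN / UPDATE / COMMIT / CHECKPOINT */
    uint32_t payload_len;  /* bytes of payload after this header   */
    uint32_t checksum;     /* detects torn or corrupt tails        */
} WalRecordHdr;
```

Fixed-width integer types keep the on-disk layout portable across compilers; a real format would also pin byte order and padding.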
3. Project Specification
3.1 What You Will Build
WAL subsystem with record parser, recovery engine, checkpoint writer, and crash harness.
3.2 Functional Requirements
- Append typed WAL records.
- Force WAL flush on commit.
- Persist and load checkpoints.
- Perform recovery after crash.
- Validate idempotent replay.
3.3 Non-Functional Requirements
- Performance: Group commit support for reduced flush overhead.
- Reliability: Deterministic recovery on crash tests.
- Usability: Human-readable log dump tooling.
3.4 Example Usage / Output
run workload with crash point -> recover -> verify committed state hash
3.5 Data Formats / Schemas / Protocols
WAL segment files with record header, payload, checksum; checkpoint metadata file.
3.6 Edge Cases
Partial final record, duplicate recovery runs, interrupted checkpoint write.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make wal_demo
./wal_demo run --ops 20000 --checkpoint-every 500 --crash-at 13742
./wal_demo recover
./wal_demo verify
3.7.2 Golden Path Demo (Deterministic)
Fixed operation seed and deterministic crash point.
3.7.3 If CLI: exact terminal transcript
$ ./wal_demo run --ops 20000 --checkpoint-every 500 --crash-at 13742
SIMULATED CRASH at op=13742
$ ./wal_demo recover
redo applied=1298 skipped=9821 undo_txns=4
RECOVERY COMPLETE state_hash=4d7a9c...
$ ./wal_demo verify
OK committed_state_consistent=true
4. Solution Architecture
4.1 High-Level Design
Txn Manager -> WAL Writer -> Durable Log Segments
-> Checkpointer
Restart -> Recovery Engine (analysis/redo/undo)
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| WAL writer | append + flush | checksummed records |
| Checkpointer | persist progress | interval config |
| Recovery engine | replay logic | page-LSN guard |
| Crash harness | deterministic kill | reproducible seeds |
4.3 Data Structures (No Full Code)
WalRecord { lsn, txn_id, type, payload_len, checksum }
Checkpoint { ckpt_lsn, active_txns, dirty_page_table }
4.4 Algorithm Overview
- Commit path: append -> flush -> ACK
- Recovery path: analysis -> redo -> undo -> validate
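The recovery path above can be modeled in miniature over an in-memory log. This sketch folds analysis into a committed-transaction scan and simplifies undo to reverting before-images passed in as an array (ARIES logs before-images per record); all names here are illustrative:

```c
#include <stdbool.h>

/* Minimal in-memory model of analysis/redo/undo; a real engine
 * scans segment files and uses page-LSN guards during redo. */
enum RecType { R_BEGIN, R_UPDATE, R_COMMIT };

typedef struct { int lsn; int txn; enum RecType type;
                 int slot; int value; } Rec;

/* Analysis (simplified): a txn is a loser if it never committed. */
static bool committed(const Rec *log, int n, int txn) {
    for (int i = 0; i < n; i++)
        if (log[i].txn == txn && log[i].type == R_COMMIT) return true;
    return false;
}

/* Redo every update forward, then undo loser updates in reverse. */
static void recover(const Rec *log, int n, int state[], const int before[]) {
    for (int i = 0; i < n; i++)                 /* redo phase */
        if (log[i].type == R_UPDATE)
            state[log[i].slot] = log[i].value;
    for (int i = n - 1; i >= 0; i--)            /* undo phase */
        if (log[i].type == R_UPDATE && !committed(log, n, log[i].txn))
            state[log[i].slot] = before[log[i].slot];
}
```

Note the asymmetry: redo runs forward over the whole replay window, while undo walks backward so later loser updates are reverted before earlier ones.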
5. Implementation Guide
5.1 Development Environment Setup
make wal_demo
5.2 Project Structure
project/
├── src/wal/
├── src/recovery/
├── tests/crash/
└── tools/logdump
5.3 The Core Question You’re Answering
What exact ordering rules guarantee committed data survives crashes?
5.4 Concepts You Must Understand First
- WAL-before-data invariant
- LSN idempotence
- checkpoint economics
5.5 Questions to Guide Your Design
- How do you identify end of valid WAL after crash?
- Which records are mandatory for analysis phase?
5.6 Thinking Exercise
Draw timeline for one transaction with three crash points and expected recovery outcomes.
5.7 The Interview Questions They’ll Ask
- What is WAL-before-data and why?
- How do you detect torn WAL tails?
- Why is recovery replay idempotent?
- What tradeoff does checkpoint frequency create?
5.8 Hints in Layers
- Build log parser before full recovery logic.
- Add checksums and length fields early.
- Verify replay idempotence by rerunning recovery.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Crash consistency | Operating Systems: Three Easy Pieces | crash chapters |
| Reliability in C | Effective C, 2nd Edition | error handling chapters |
5.10 Implementation Phases
- Phase 1: WAL append and parser
- Phase 2: commit protocol and flush
- Phase 3: recovery and checkpoints
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Record granularity | physical/logical | logical+page reference | simpler and compact |
| Commit flushing | per-txn/group | group commit option | better throughput |
6. Testing Strategy
6.1 Test Categories
- Unit: record encode/decode and checksum
- Integration: commit/recover workflows
- Crash: random kill-point matrix
6.2 Critical Test Cases
- Crash before commit flush loses uncommitted txns.
- Crash after commit flush preserves committed txn.
- Recovery rerun yields unchanged state hash.
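The "unchanged state hash" check needs a deterministic hash over the recovered state. Any stable hash works; FNV-1a is used here purely as an example choice:

```c
#include <stdint.h>
#include <stddef.h>

/* Deterministic state hash (FNV-1a, 64-bit) for the verify step:
 * hash the recovered pages, run recovery again, and compare. */
static uint64_t state_hash(const unsigned char *state, size_t n) {
    uint64_t h = 1469598103934665603ULL;        /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= state[i];
        h *= 1099511628211ULL;                  /* FNV prime        */
    }
    return h;
}
```

Because the hash is a pure function of the bytes, any divergence between two recovery runs shows up as a hash mismatch, making the idempotence test a one-line comparison.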
6.3 Test Data
Seeded transaction scripts with deterministic key/value updates.
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Premature commit ACK | committed data missing | flush WAL before ACK |
| Missing page-LSN guard | duplicate replay effects | compare record LSN to page-LSN |
7.2 Debugging Strategies
- Log per-record replay decisions.
- Keep a logdump utility for forensic inspection.
7.3 Performance Traps
Over-frequent checkpoints can throttle write throughput.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add verbose recovery trace mode.
8.2 Intermediate Extensions
- Add WAL segment archiving.
8.3 Advanced Extensions
- Add point-in-time restore prototype.
9. Real-World Connections
9.1 Industry Applications
WAL and recovery are core reliability mechanisms in production DBMS engines.
9.2 Related Open Source Projects
- PostgreSQL WAL internals docs.
9.3 Interview Relevance
Durability and recovery questions are central for DB/storage roles.
10. Resources
10.1 Essential Reading
- PostgreSQL WAL docs and ARIES paper notes.
10.2 Video Resources
- Recovery internals conference talks.
10.3 Tools & Documentation
- Crash harness scripts, log parser, state-hash verifier.
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain commit durability conditions.
- I can explain redo idempotence.
11.2 Implementation
- Recovery passes crash matrix deterministically.
- Checkpointing behavior is benchmarked.
11.3 Growth
- I can reason about group commit tradeoffs.
12. Submission / Completion Criteria
Minimum Viable Completion:
- WAL append + commit flush + simple redo recovery
Full Completion:
- Plus checkpoints and undo handling with deterministic tests
Excellence:
- Plus recovery tooling and performance tuning report