Project 12: Resilient Storage and Logging Engine (FAT/LittleFS + Atomic Writes)

Build a crash-resilient storage subsystem with atomic writes, circular logs, corruption detection, and deterministic recovery behavior.

Quick Reference

Attribute                          Value
Difficulty                         Expert
Time Estimate                      2-3 weeks
Main Programming Language          C
Alternative Programming Languages  C++, Rust
Coolness Level                     Level 3
Business Potential                 Level 4
Prerequisites                      Filesystem concepts, CRC/checksum basics
Key Topics                         FAT vs LittleFS, write amplification, atomic commits, log rotation

1. Learning Objectives

  1. Design power-loss-safe write paths.
  2. Build circular logging with bounded storage growth.
  3. Recover cleanly from torn writes and corruption.
  4. Compare FAT and LittleFS tradeoffs by workload.

2. Theory Primer

2.1 Durability in Embedded Storage

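Durability requires an explicit write protocol. Overwriting data in place is fragile: if power fails mid-write, the file holds a torn mix of old and new bytes, and neither version can be trusted. Staged writes are safer. Write the new data to a separate location together with integrity metadata (for example a length and a CRC), make it durable, and only then flip a single commit point, so that a reader always sees either the complete old version or the complete new one.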
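A minimal host-side sketch of the staged tmp+commit write, assuming a POSIX environment (`fsync`, `rename`) as a stand-in for the target filesystem. On a device, each call would map onto the FS adapter (for FAT or LittleFS the equivalents differ, and FAT's rename is not atomic). `atomic_write_file` and the `.tmp` suffix are hypothetical names chosen for illustration.

```c
#define _XOPEN_SOURCE 700   /* expose fsync/fileno under strict C modes */
#include <stdio.h>
#include <string.h>
#include <unistd.h>   /* fsync, close */
#include <fcntl.h>    /* open, O_RDONLY */

/* Replace `path` (which lives in directory `dir`) so that a reader sees
 * either the complete old content or the complete new content.
 * Returns 0 on success, -1 on any failure. */
int atomic_write_file(const char *path, const char *dir,
                      const void *data, size_t len)
{
    char tmp[256];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* Phase 1: stage the new content in a temporary file. */
    FILE *f = fopen(tmp, "wb");
    if (!f)
        return -1;

    /* Phase 2: push it through stdio buffers and force it to the medium. */
    if (fwrite(data, 1, len, f) != len || fflush(f) != 0 ||
        fsync(fileno(f)) != 0) {
        fclose(f);
        remove(tmp);
        return -1;
    }
    if (fclose(f) != 0) {
        remove(tmp);
        return -1;
    }

    /* Phase 3: commit. POSIX rename() atomically replaces `path`. */
    if (rename(tmp, path) != 0) {
        remove(tmp);
        return -1;
    }

    /* Phase 4: persist the directory entry so the rename itself survives. */
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);
    close(dfd);
    return rc;
}
```

If power fails before phase 3, the old file is untouched and only a stale `.tmp` remains for the startup scanner to delete.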

2.2 Logging and Recovery Mechanics

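Binary logs are compact and fast to write; text logs are human-readable. Production systems often use both. For recovery, every record needs framing that can be checked on its own: a length, a sequence number, and a CRC. At startup the scanner walks the log, accepts records whose CRC matches, and skips or truncates a torn or corrupt record instead of trusting it.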
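One possible binary record framing, sketched under assumptions: the layout `[magic u16][len u16][seq u32][payload][crc u32]`, the `LOG_MAGIC` value, and the helper names are hypothetical, and CRC-32 (IEEE, reflected polynomial 0xEDB88320) is just one reasonable checksum choice. Fields are serialized little-endian by hand to avoid struct padding, and the CRC sits at the end so a torn tail fails validation.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define LOG_MAGIC 0xA55Au   /* hypothetical sync marker */

/* Bitwise CRC-32 (IEEE 802.3); table-free to keep flash usage small. */
uint32_t crc32_ieee(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

static void put_le16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static void put_le32(uint8_t *p, uint32_t v) { for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i)); }
static uint16_t get_le16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Encode one record into `out`. Returns bytes written, or 0 if it does not fit. */
size_t log_encode(uint8_t *out, size_t cap, uint32_t seq,
                  const uint8_t *payload, uint16_t len)
{
    size_t total = 8u + len + 4u;
    if (total > cap)
        return 0;
    put_le16(out, LOG_MAGIC);
    put_le16(out + 2, len);
    put_le32(out + 4, seq);
    memcpy(out + 8, payload, len);
    put_le32(out + 8 + len, crc32_ieee(out, 8u + len));  /* CRC covers header + payload */
    return total;
}

/* Validate the record at `buf`. Returns its total size, or 0 when the
 * magic, length, or CRC is wrong (torn or corrupt record). */
size_t log_check(const uint8_t *buf, size_t avail, uint32_t *seq_out)
{
    if (avail < 12u || get_le16(buf) != LOG_MAGIC)
        return 0;
    uint16_t len = get_le16(buf + 2);
    size_t total = 8u + len + 4u;
    if (total > avail)
        return 0;                                  /* truncated tail */
    if (get_le32(buf + 8 + len) != crc32_ieee(buf, 8u + len))
        return 0;                                  /* CRC mismatch */
    if (seq_out)
        *seq_out = get_le32(buf + 4);
    return total;
}
```

A scanner that gets 0 back can search forward for the next `LOG_MAGIC` and log a line such as `crc mismatch record=1842 skipped`.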

3. Specification

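  • Implement atomic settings writes: stage the new data in a temporary file or slot, make it durable, then commit it with a single switch (tmp+commit).
  • Implement a circular log with a hard maximum size; the oldest records are overwritten or rotated out.
  • Implement a startup scanner that decides, per record or file, whether to repair it or quarantine it.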
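A minimal model of the size-capped circular log, assuming fixed-size slots held in RAM; `RING_SLOTS`, `SLOT_BYTES`, and the `ring_*` functions are hypothetical. It captures only the policy (bounded footprint, oldest-first eviction, usage reporting); on real flash the ring would erase whole blocks, and a file-based variant would rotate files instead.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 8u    /* hypothetical hard cap on stored records */
#define SLOT_BYTES 32u   /* hypothetical fixed record size */

typedef struct {
    uint8_t  slot[RING_SLOTS][SLOT_BYTES];
    uint32_t head;      /* index of the next slot to write */
    uint32_t count;     /* valid slots, never exceeds RING_SLOTS */
    uint32_t dropped;   /* oldest records evicted to stay within the cap */
} log_ring;

void ring_init(log_ring *r) { memset(r, 0, sizeof *r); }

/* Append one record. When the ring is full, the oldest record is overwritten,
 * so storage use never exceeds RING_SLOTS * SLOT_BYTES. */
int ring_append(log_ring *r, const void *rec, size_t len)
{
    if (len > SLOT_BYTES)
        return -1;
    memset(r->slot[r->head], 0, SLOT_BYTES);
    memcpy(r->slot[r->head], rec, len);
    r->head = (r->head + 1u) % RING_SLOTS;
    if (r->count < RING_SLOTS)
        r->count++;
    else
        r->dropped++;
    return 0;
}

/* Fill level in percent, as in "ring usage=73%". */
unsigned ring_usage_pct(const log_ring *r) { return (unsigned)(r->count * 100u / RING_SLOTS); }

/* Slot index of the oldest valid record, where a reader should start. */
uint32_t ring_oldest(const log_ring *r) { return (r->head + RING_SLOTS - r->count) % RING_SLOTS; }
```

Counting `dropped` makes eviction observable, so a policy check can warn when records are being lost faster than they are uploaded.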

Example output:

I storage: commit settings v12 -> PASS
W storage: crc mismatch record=1842 skipped
I storage: recovery used snapshot v11
I storage: ring usage=73% within policy
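A line such as `recovery used snapshot v11` can come from a dual-slot (A/B) settings store. The sketch below assumes two slots, each holding a version counter, a payload, and a CRC; a commit always overwrites the slot holding the older snapshot, and recovery picks the newest slot whose CRC checks out. The `snapshot` layout and function names are hypothetical, and version-counter wraparound is ignored.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical settings snapshot as stored in slot A or slot B. */
typedef struct {
    uint32_t version;      /* incremented on every commit */
    uint32_t crc;          /* CRC-32 over version + payload */
    uint8_t  payload[16];
} snapshot;

/* Chainable CRC-32 (IEEE): crc32_update(crc32_update(0, a), b) == CRC(a||b). */
static uint32_t crc32_update(uint32_t crc, const void *data, size_t n)
{
    const uint8_t *p = data;
    crc = ~crc;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

uint32_t snapshot_crc(const snapshot *s)
{
    uint32_t c = crc32_update(0, &s->version, sizeof s->version);
    return crc32_update(c, s->payload, sizeof s->payload);
}

/* Called just before the snapshot is written to its slot. */
void snapshot_seal(snapshot *s) { s->crc = snapshot_crc(s); }

static int snapshot_valid(const snapshot *s) { return s->crc == snapshot_crc(s); }

/* Return the newest valid slot, or NULL if both are corrupt
 * (the caller then falls back to factory defaults). */
const snapshot *snapshot_select(const snapshot *a, const snapshot *b)
{
    int va = snapshot_valid(a), vb = snapshot_valid(b);
    if (va && vb)
        return (a->version >= b->version) ? a : b;
    if (va)
        return a;
    if (vb)
        return b;
    return NULL;
}
```

Because the newest valid snapshot is never the one being overwritten, a torn commit of v12 simply leaves v11 in charge.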

4. Architecture

[API] -> [Durability Layer] -> [FS Adapter]
                    -> [Integrity Scanner]
                    -> [Recovery Manager]
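One way to keep the durability layer filesystem-agnostic is to route every storage call through a small adapter of function pointers. In the sketch below, `fs_adapter`, the RAM backend, and `durable_commit` are all hypothetical: a FAT or LittleFS backend would fill in the same three pointers, while the RAM backend lets host tests count sync barriers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The only storage operations the durability layer may use. */
typedef struct {
    void *ctx;
    int (*write)(void *ctx, uint32_t off, const void *buf, size_t len);
    int (*read)(void *ctx, uint32_t off, void *buf, size_t len);
    int (*sync)(void *ctx);   /* returns only once earlier writes are durable */
} fs_adapter;

/* RAM backend for host-side tests; counts syncs so tests can check ordering. */
typedef struct { uint8_t mem[256]; unsigned syncs; } ram_backend;

static int ram_write(void *ctx, uint32_t off, const void *buf, size_t len)
{
    ram_backend *r = ctx;
    if (off > sizeof r->mem || len > sizeof r->mem - off)
        return -1;
    memcpy(r->mem + off, buf, len);
    return 0;
}

static int ram_read(void *ctx, uint32_t off, void *buf, size_t len)
{
    ram_backend *r = ctx;
    if (off > sizeof r->mem || len > sizeof r->mem - off)
        return -1;
    memcpy(buf, r->mem + off, len);
    return 0;
}

static int ram_sync(void *ctx) { ((ram_backend *)ctx)->syncs++; return 0; }

fs_adapter ram_adapter(ram_backend *r)
{
    fs_adapter a = { r, ram_write, ram_read, ram_sync };
    return a;
}

/* Durability-layer primitive: payload, barrier, commit flag, barrier.
 * It never calls a filesystem API directly, only the adapter. */
int durable_commit(const fs_adapter *fs, uint32_t off, const void *buf,
                   size_t len, uint32_t flag_off, uint8_t flag)
{
    if (fs->write(fs->ctx, off, buf, len) != 0 || fs->sync(fs->ctx) != 0)
        return -1;
    if (fs->write(fs->ctx, flag_off, &flag, 1) != 0)
        return -1;
    return fs->sync(fs->ctx);
}
```

The Integrity Scanner and Recovery Manager can take the same `fs_adapter`, which keeps the whole stack testable off-target.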

5. Implementation Guide

Core question:

“How do I guarantee trustworthy data when power can disappear at any moment?”

Design questions:

  1. Which records need strict durability?
  2. How do you version your on-disk formats?
  3. How is timestamp quality handled without NTP?
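Design questions 2 and 3 can both be answered in the record header. The sketch below assumes a persisted boot counter plus a monotonic uptime that is always available, an optional wall-clock time tagged with its quality, and a format version that readers check before parsing. The struct, enum, and `REC_FORMAT_VERSION` are hypothetical; a real implementation would serialize the fields explicitly rather than write the struct directly. The 32-bit uptime wraps after about 49.7 days and the 16-bit boot counter wraps after 65535 boots; both limits are ignored here.

```c
#include <stdint.h>

#define REC_FORMAT_VERSION 2   /* hypothetical; bump on any layout change */

enum ts_quality {
    TS_UPTIME_ONLY  = 0,  /* only boot_count + uptime_ms are meaningful */
    TS_RTC_UNSYNCED = 1,  /* free-running RTC, may drift or be unset */
    TS_SYNCED       = 2   /* wall clock set from an external source */
};

typedef struct {
    uint8_t  format_version;  /* readers reject versions they do not know */
    uint8_t  ts_quality;      /* one of enum ts_quality */
    uint16_t boot_count;      /* persisted, incremented once per boot */
    uint32_t uptime_ms;       /* monotonic since boot, always valid */
    int64_t  unix_time_ms;    /* meaningful only if ts_quality != TS_UPTIME_ONLY */
} rec_header_v2;

/* Order records without trusting wall-clock time: boot first, then uptime.
 * Returns <0, 0, or >0 like strcmp. */
int rec_order(const rec_header_v2 *a, const rec_header_v2 *b)
{
    if (a->boot_count != b->boot_count)
        return a->boot_count < b->boot_count ? -1 : 1;
    if (a->uptime_ms != b->uptime_ms)
        return a->uptime_ms < b->uptime_ms ? -1 : 1;
    return 0;
}

int header_supported(const rec_header_v2 *h) { return h->format_version == REC_FORMAT_VERSION; }
```

Ordering by `(boot_count, uptime_ms)` keeps the log sortable even when the RTC jumps after a later time sync.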

6. Testing

  • Fault injection: reset between write phases.
  • Corruption corpus for parser/scanner tests.
  • Long-run logging growth and rotation test.
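The fault-injection test can be sketched against a simulated medium that loses power after a chosen number of bytes. The sketch assumes a hypothetical two-slot layout `[commit flag][version u32][value u32]` and a three-phase commit (clear the flag, write the body, set the flag); `sim_flash`, `commit`, and `fault_sweep` are illustrative names. It sweeps the cut point across every byte of the commit and checks that recovery always returns the complete old or the complete new record. Real flash can also leave partially programmed bits, which this byte-granular model does not cover; production records would additionally carry a CRC.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SLOT_LEN  9      /* flag + version + value */
#define COMMITTED 0xA5u  /* hypothetical "slot valid" marker */

typedef struct {
    uint8_t media[2][SLOT_LEN];
    int budget;          /* bytes that can still be written before power loss */
} sim_flash;

/* Byte-by-byte write; returns false once power is "lost". */
static bool sim_write(sim_flash *f, int slot, int off, const uint8_t *src, int n)
{
    for (int i = 0; i < n; i++) {
        if (f->budget == 0)
            return false;
        f->media[slot][off + i] = src[i];
        f->budget--;
    }
    return true;
}

static void put32(uint8_t *p, uint32_t v) { for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i)); }
static uint32_t get32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Recovery: index of the committed slot with the highest version, or -1. */
static int newest_slot(const sim_flash *f)
{
    int best = -1;
    for (int s = 0; s < 2; s++) {
        if (f->media[s][0] != COMMITTED)
            continue;
        if (best < 0 || get32(&f->media[s][1]) > get32(&f->media[best][1]))
            best = s;
    }
    return best;
}

/* Three write phases, always targeting the slot that is NOT the newest. */
static bool commit(sim_flash *f, uint32_t version, uint32_t value)
{
    int target = (newest_slot(f) == 0) ? 1 : 0;
    uint8_t zero = 0, flag = COMMITTED, body[8];
    put32(body, version);
    put32(body + 4, value);
    return sim_write(f, target, 0, &zero, 1)   /* phase 1: invalidate */
        && sim_write(f, target, 1, body, 8)    /* phase 2: payload */
        && sim_write(f, target, 0, &flag, 1);  /* phase 3: commit */
}

/* Cut power after 0..10 bytes of a commit; returns the number of bad recoveries. */
int fault_sweep(void)
{
    int failures = 0;
    for (int cut = 0; cut <= 10; cut++) {
        sim_flash f;
        memset(&f, 0, sizeof f);
        f.budget = 1000;                 /* setup runs without faults */
        commit(&f, 10, 100);
        commit(&f, 11, 111);
        f.budget = cut;                  /* power dies after `cut` bytes */
        bool done = commit(&f, 12, 122);

        int s = newest_slot(&f);         /* "reboot" and recover */
        uint32_t ver = s < 0 ? 0 : get32(&f.media[s][1]);
        uint32_t val = s < 0 ? 0 : get32(&f.media[s][5]);
        bool old_ok = (ver == 11 && val == 111);
        bool new_ok = (ver == 12 && val == 122);
        if (!(old_ok || new_ok) || (done && !new_ok))
            failures++;
    }
    return failures;
}
```

Sweeping every cut point, rather than sampling a few random ones, is what makes the recovery result deterministic to assert on.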

7. Pitfalls

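  • Assuming rename is atomic everywhere. POSIX rename() atomically replaces the target within one filesystem, but FAT has no journal, so an interrupted rename can leave the directory in an inconsistent state.
  • Treating wall-clock time as always available; after a reset the RTC may be unset until it is synchronized.
  • Ignoring write amplification: many small, individually synced writes can each cost a full sector or block program, wasting bandwidth and flash endurance.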
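A rough worked estimate of write amplification (WA), with illustrative numbers only: assume 32-byte log records, that each synced append forces one 4096-byte program, and that filesystem metadata writes are ignored.

$$
\mathrm{WA} = \frac{\text{bytes physically programmed}}{\text{bytes logically written}} = \frac{4096}{32} = 128
$$

Buffering 128 records in RAM and writing them as one 4096-byte block brings this to

$$
\mathrm{WA} = \frac{4096}{128 \times 32} = 1
$$

The cost is that up to 127 buffered records are lost on a power cut, which is why design question 1 (which records need strict durability) decides where batching is acceptable.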

8. Extensions

  • Add WAL-style journal mode.
  • Add remote log upload with replay checkpoints.

9. Completion

  • Power-loss tests pass with deterministic recovery.
  • Log growth remains bounded.
  • Corruption handling is observable and safe.