Project 12: Resilient Storage and Logging Engine (FAT/LittleFS + Atomic Writes)
Build a crash-resilient storage subsystem with atomic writes, circular logs, corruption detection, and deterministic recovery behavior.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Expert |
| Time Estimate | 2-3 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | C++, Rust |
| Coolness Level | Level 3 |
| Business Potential | Level 4 |
| Prerequisites | Filesystem concepts, CRC/checksum basics |
| Key Topics | FAT vs LittleFS, write amplification, atomic commits, log rotation |
1. Learning Objectives
- Design power-loss-safe write paths.
- Build circular logging with bounded storage growth.
- Recover cleanly from torn writes and corruption.
- Compare FAT and LittleFS tradeoffs by workload.
2. Theory Primer
2.1 Durability in Embedded Storage
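Durability requires explicit protocol design; the filesystem does not provide it automatically. Overwriting a record in place is fragile because a reset mid-write can leave a mix of old and new bytes (a torn write). Staged writes are safer: the new copy is written in full and validated through integrity metadata, such as a version and a checksum, before it supersedes the old one.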
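As a minimal sketch of what "integrity metadata" can look like, the block below seals a record with a magic tag, a version, a length, and a CRC-32, and rejects any record whose bytes no longer match. The layout, the names (`record_t`, `record_seal`, `record_valid`), and the 64-byte payload limit are assumptions made for illustration, not a required format. The CRC is computed over the in-memory representation, so a real on-disk format would also need to fix the byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REC_MAGIC       0x52454331u /* hypothetical tag, ASCII "REC1" */
#define REC_MAX_PAYLOAD 64u         /* assumed limit for this sketch */

/* Hypothetical record layout: the header is the integrity metadata. */
typedef struct {
    uint32_t magic;
    uint32_t crc;      /* CRC-32 over version, length and the used payload */
    uint32_t version;  /* increases with every commit */
    uint32_t length;   /* payload bytes in use */
    uint8_t  payload[REC_MAX_PAYLOAD];
} record_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). Table-free, so it is
 * slow but small. */
uint32_t crc32_calc(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;

    assert(p != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int bit = 0; bit < 8; bit++) {
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* CRC over every field that follows `crc` in the struct. */
static uint32_t record_crc(const record_t *r)
{
    const uint8_t *base = (const uint8_t *)r;
    size_t first = offsetof(record_t, version);
    size_t span = offsetof(record_t, payload) - first + r->length;
    return crc32_calc(base + first, span);
}

/* Fills in the record and computes its CRC last. */
bool record_seal(record_t *r, uint32_t version, const void *payload, uint32_t len)
{
    if (len > REC_MAX_PAYLOAD) return false;
    memset(r, 0, sizeof *r);
    r->magic = REC_MAGIC;
    r->version = version;
    r->length = len;
    if (len > 0) memcpy(r->payload, payload, len);
    r->crc = record_crc(r);
    return true;
}

/* The length is range-checked before it is used to size the CRC span. */
bool record_valid(const record_t *r)
{
    return r->magic == REC_MAGIC
        && r->length <= REC_MAX_PAYLOAD
        && r->crc == record_crc(r);
}
```

Flipping any single bit in the covered fields makes `record_valid` return false, which is the signal a scanner would use to skip or quarantine the record.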
2.2 Logging and Recovery Mechanics
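Binary logs are compact and fast to write; text logs are human-readable. Production systems often use both. On startup, a scanner validates each stored record and skips, repairs, or quarantines whatever fails the check, so recovery always starts from data that passed validation.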
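One way to get both properties is to store a compact binary record and render text only when a human asks for it. The sketch below assumes a hypothetical record (`log_rec_t`) and two hypothetical event codes; the rendered strings mirror the log style used elsewhere in this project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { LOG_INFO = 0, LOG_WARN = 1, LOG_ERROR = 2 } log_level_t;
typedef enum { EV_COMMIT_OK = 0, EV_CRC_MISMATCH = 1 } log_event_t;

/* Hypothetical binary record: 12 bytes on typical ABIs, versus roughly
 * 40 bytes for the equivalent text line. */
typedef struct {
    uint32_t uptime_ms;
    uint8_t  level;    /* log_level_t */
    uint8_t  event;    /* log_event_t */
    uint16_t reserved;
    uint32_t arg;      /* meaning depends on the event */
} log_rec_t;

/* Renders one record as text. Returns what snprintf returns: the length
 * the full line needs, which may exceed out_len if the buffer is small. */
int log_render(const log_rec_t *r, char *out, size_t out_len)
{
    static const char level_ch[] = { 'I', 'W', 'E' };

    assert(r != NULL && out != NULL);
    char lvl = (r->level <= LOG_ERROR) ? level_ch[r->level] : '?';
    unsigned long arg = r->arg;

    switch (r->event) {
    case EV_COMMIT_OK:
        return snprintf(out, out_len,
                        "%c storage: commit settings v%lu -> PASS", lvl, arg);
    case EV_CRC_MISMATCH:
        return snprintf(out, out_len,
                        "%c storage: crc mismatch record=%lu skipped", lvl, arg);
    default: /* unknown events still render, so old tools can read new logs */
        return snprintf(out, out_len, "%c storage: event=%u arg=%lu",
                        lvl, (unsigned)r->event, arg);
    }
}
```

Rendering a record with `level = LOG_WARN`, `event = EV_CRC_MISMATCH`, and `arg = 1842` produces `W storage: crc mismatch record=1842 skipped`.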
3. Specification
- Implement atomic settings writes (write a temporary copy, then commit it).
- Implement circular log with max size cap.
- Implement startup scanner with repair/quarantine decisions.
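The first requirement can be sketched with standard C file I/O: write the new settings to a temporary file, flush and close it, and only then rename it over the final name. This is a hosted-environment sketch under stated assumptions. `settings_commit` and `settings_load` are hypothetical names, and replacing an existing file with `rename` behaves atomically on POSIX systems but is implementation-defined in standard C, so an embedded port would substitute the filesystem's own rename and sync calls and verify their guarantees.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Writes `len` bytes to tmp_path, then commits by renaming over final_path.
 * A reset before the rename leaves the previous final file untouched. */
bool settings_commit(const char *final_path, const char *tmp_path,
                     const void *data, size_t len)
{
    assert(final_path != NULL && tmp_path != NULL);
    assert(data != NULL || len == 0);

    FILE *f = fopen(tmp_path, "wb");
    if (f == NULL) return false;

    bool ok = fwrite(data, 1, len, f) == len;
    ok = (fflush(f) == 0) && ok;
    /* fflush only empties the C library buffer. A real port would also call
     * the platform sync here (for example fsync on POSIX) before closing. */
    ok = (fclose(f) == 0) && ok;

    if (!ok) {
        remove(tmp_path); /* never commit a partial temporary file */
        return false;
    }
    return rename(tmp_path, final_path) == 0; /* the commit point */
}

/* Reads up to `cap` bytes; returns the byte count or -1 if the file is missing. */
long settings_load(const char *path, void *buf, size_t cap)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL) return -1;
    size_t n = fread(buf, 1, cap, f);
    fclose(f);
    return (long)n;
}
```

At startup, a leftover temporary file means a commit was interrupted before the rename; the scanner can delete it and keep using the final file.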
Example output:
I storage: commit settings v12 -> PASS
W storage: crc mismatch record=1842 skipped
I storage: recovery used snapshot v11
I storage: ring usage=73% within policy
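The `ring usage` line above implies a log whose footprint never grows past a fixed cap. A minimal in-memory sketch of that idea follows, assuming fixed-size records and a hypothetical capacity of 8 slots; a persistent version would map slots to flash pages or file offsets. Sequence-number wraparound after 2^32 appends is ignored here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8u /* assumed capacity for this sketch */

typedef struct {
    uint32_t seq;   /* global sequence number of this record */
    uint32_t value; /* placeholder payload */
} ring_rec_t;

typedef struct {
    ring_rec_t slot[RING_SLOTS];
    uint32_t next_seq; /* sequence number the next append will use */
    uint32_t count;    /* retained records, never exceeds RING_SLOTS */
} ring_log_t;

void ring_init(ring_log_t *r)
{
    assert(r != NULL);
    memset(r, 0, sizeof *r);
}

/* Appends a record, overwriting the oldest one once the ring is full. */
void ring_append(ring_log_t *r, uint32_t value)
{
    ring_rec_t *dst = &r->slot[r->next_seq % RING_SLOTS];
    dst->seq = r->next_seq;
    dst->value = value;
    r->next_seq++;
    if (r->count < RING_SLOTS) r->count++;
}

/* Sequence number of the oldest record that is still retained. */
uint32_t ring_oldest_seq(const ring_log_t *r)
{
    return r->next_seq - r->count;
}

/* Fetches a record by sequence number; false if it was already overwritten
 * or has not been written yet. */
bool ring_get(const ring_log_t *r, uint32_t seq, ring_rec_t *out)
{
    if (seq < ring_oldest_seq(r) || seq >= r->next_seq) return false;
    *out = r->slot[seq % RING_SLOTS];
    return true;
}

unsigned ring_usage_percent(const ring_log_t *r)
{
    return (unsigned)(r->count * 100u / RING_SLOTS);
}
```

After 20 appends into the 8-slot ring, records 12 through 19 are retained and everything older has been overwritten, so storage use stays constant no matter how long the device runs.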
4. Architecture
[API] -> [Durability Layer] -> [FS Adapter]
                            -> [Integrity Scanner]
                            -> [Recovery Manager]
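One plausible way to realize the FS Adapter box is a small table of function pointers, so the durability layer can run against FAT, LittleFS, or a RAM fake without changes. The interface below (`fs_adapter_t`, `durable_put`) and the RAM backend are assumptions for illustration; a real adapter would wrap the chosen filesystem's write, read, and sync calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical adapter interface: every call returns 0 on success. */
typedef struct {
    int (*write)(void *ctx, uint32_t off, const void *buf, size_t len);
    int (*read)(void *ctx, uint32_t off, void *buf, size_t len);
    int (*sync)(void *ctx);
    void *ctx; /* backend state, opaque to the durability layer */
} fs_adapter_t;

/* RAM-backed fake, useful for host-side tests. */
typedef struct {
    uint8_t  bytes[256];
    unsigned syncs; /* counts sync calls so tests can assert on them */
} ram_store_t;

static int ram_in_range(uint32_t off, size_t len)
{
    return off <= sizeof(((ram_store_t *)0)->bytes)
        && len <= sizeof(((ram_store_t *)0)->bytes) - off;
}

static int ram_write(void *ctx, uint32_t off, const void *buf, size_t len)
{
    ram_store_t *s = ctx;
    if (!ram_in_range(off, len)) return -1;
    memcpy(&s->bytes[off], buf, len);
    return 0;
}

static int ram_read(void *ctx, uint32_t off, void *buf, size_t len)
{
    ram_store_t *s = ctx;
    if (!ram_in_range(off, len)) return -1;
    memcpy(buf, &s->bytes[off], len);
    return 0;
}

static int ram_sync(void *ctx)
{
    ram_store_t *s = ctx;
    s->syncs++;
    return 0;
}

fs_adapter_t ram_adapter(ram_store_t *s)
{
    fs_adapter_t fs = { ram_write, ram_read, ram_sync, s };
    return fs;
}

/* Durability-layer helper: a write only counts once the sync succeeded. */
int durable_put(const fs_adapter_t *fs, uint32_t off, const void *buf, size_t len)
{
    assert(fs != NULL && fs->write != NULL && fs->sync != NULL);
    int rc = fs->write(fs->ctx, off, buf, len);
    return (rc != 0) ? rc : fs->sync(fs->ctx);
}
```

Keeping the sync inside `durable_put` means callers cannot forget it, and the fake's `syncs` counter lets a test confirm that a failed write never triggers a sync.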
5. Implementation Guide
Core question:
“How do I guarantee trustworthy data when power can disappear at any moment?”
Design questions:
- Which records need strict durability?
- How do you version your on-disk formats?
- How is timestamp quality handled without NTP?
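One possible answer to the versioning and timestamp questions is to stamp every record with a format version and an explicit time-quality flag, and to order records by boot count and uptime instead of wall-clock time. All names and fields below are hypothetical; they sketch a shape for the answer, not a prescribed format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FORMAT_MIN_SUPPORTED 1u /* oldest layout this firmware can still parse */
#define FORMAT_CURRENT       2u /* layout this firmware writes */

typedef enum {
    TIME_UPTIME_ONLY  = 0, /* no wall clock; order by boot count and uptime */
    TIME_RTC_UNSYNCED = 1, /* RTC running but never set from a trusted source */
    TIME_SYNCED       = 2  /* wall clock set from a trusted source */
} time_quality_t;

typedef struct {
    uint16_t format_version;
    uint8_t  time_quality;  /* time_quality_t */
    uint8_t  reserved;
    uint32_t boot_count;    /* incremented once per boot */
    uint32_t uptime_ms;     /* monotonic within one boot */
    uint32_t wall_clock_s;  /* meaningful only when time_quality is TIME_SYNCED */
} rec_stamp_t;

/* A scanner would quarantine records whose version falls outside this range
 * instead of guessing at their layout. */
bool format_readable(uint16_t version)
{
    return version >= FORMAT_MIN_SUPPORTED && version <= FORMAT_CURRENT;
}

/* Orders two stamps without a wall clock: negative, zero, or positive,
 * in the style of strcmp. */
int stamp_compare(const rec_stamp_t *a, const rec_stamp_t *b)
{
    assert(a != NULL && b != NULL);
    if (a->boot_count != b->boot_count)
        return a->boot_count < b->boot_count ? -1 : 1;
    if (a->uptime_ms != b->uptime_ms)
        return a->uptime_ms < b->uptime_ms ? -1 : 1;
    return 0;
}
```

With this scheme a record written late in boot 3 still sorts before one written early in boot 4, even if the device never learned the real time.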
6. Testing
- Fault injection: reset between write phases.
- Corruption corpus for parser/scanner tests.
- Long-run logging growth and rotation test.
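The fault-injection idea can be prototyped on a host before touching hardware. The sketch below simulates a two-slot store in RAM and models power loss as a byte budget: once the budget hits zero, the write stops mid-record. Everything here is an assumption for illustration, including the 16-byte slot layout, the function names, and the use of Adler-32 as a compact stand-in where a CRC would be the more common choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_SIZE   16u /* [0..3] version, [4..11] payload, [12..15] checksum */
#define DATA_SIZE   12u /* bytes covered by the checksum */
#define PAYLOAD_LEN 8u

/* Simulated medium: two slots plus a byte budget that models power loss. */
typedef struct {
    uint8_t slot[2][SLOT_SIZE];
    size_t  budget; /* bytes that can still be programmed before the "reset" */
} sim_flash_t;

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) p[i] = (uint8_t)(v >> (8 * i));
}

/* Adler-32, used here only as a compact stand-in for a CRC. */
static uint32_t adler32_calc(const uint8_t *p, size_t n)
{
    uint32_t a = 1u, b = 0u;
    for (size_t i = 0; i < n; i++) {
        a = (a + p[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

void sim_init(sim_flash_t *f)
{
    memset(f->slot, 0xFF, sizeof f->slot); /* erased state */
    f->budget = SIZE_MAX;
}

/* Returns the slot holding the newest valid record, or -1 if none is valid. */
int store_recover(const sim_flash_t *f, uint32_t *version)
{
    int best = -1;
    for (int s = 0; s < 2; s++) {
        const uint8_t *p = f->slot[s];
        if (get_le32(p + DATA_SIZE) != adler32_calc(p, DATA_SIZE)) continue;
        if (best < 0 || get_le32(p) > *version) {
            best = s;
            *version = get_le32(p);
        }
    }
    return best;
}

/* Writes the new record into the slot that does NOT hold the newest valid
 * copy. Returns false if the injected reset interrupted the write. */
bool store_commit(sim_flash_t *f, uint32_t version, const void *payload)
{
    uint8_t img[SLOT_SIZE];
    uint32_t cur = 0;
    int target = (store_recover(f, &cur) == 0) ? 1 : 0;

    assert(payload != NULL);
    put_le32(img, version);
    memcpy(img + 4, payload, PAYLOAD_LEN);
    put_le32(img + DATA_SIZE, adler32_calc(img, DATA_SIZE));

    memset(f->slot[target], 0xFF, SLOT_SIZE); /* erase phase */
    for (size_t i = 0; i < SLOT_SIZE; i++) {
        if (f->budget == 0) return false;     /* power lost here */
        f->slot[target][i] = img[i];
        f->budget--;
    }
    return true;
}
```

A test harness can commit version 11, then replay the commit of version 12 once for every possible cut point from 0 to 15 bytes and check that recovery always returns version 11. Sweeping every cut point is what makes the recovery behavior deterministic instead of merely likely.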
7. Pitfalls
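- Assuming rename is always atomic across all environments. LittleFS documents rename as atomic with respect to power loss, but FatFs `f_rename` refuses to overwrite an existing destination, so replacing a file needs a delete-then-rename sequence that a reset can interrupt.
- Treating wall-clock time as always available.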
- Ignoring write amplification under small writes.
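The write-amplification pitfall is easy to quantify with a simplified model: if the medium can only program whole pages, every small write costs at least one full page. The helper names and the 256-byte page size used in the example are assumptions, and the model deliberately ignores filesystem metadata updates, erases, and wear-leveling copies, all of which push the real figure higher.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes physically programmed when `len` logical bytes are written and the
 * medium programs whole pages only. */
uint32_t programmed_bytes(uint32_t len, uint32_t page_size)
{
    assert(len > 0 && page_size > 0);
    return ((len + page_size - 1u) / page_size) * page_size;
}

/* Write amplification as a percentage: 100 means no amplification. */
uint32_t write_amp_percent(uint32_t len, uint32_t page_size)
{
    return programmed_bytes(len, page_size) * 100u / len;
}
```

Under this model, flushing each 16-byte log record on its own to a 256-byte page gives `write_amp_percent(16, 256) == 1600`, a 16x amplification, while batching 16 records into one 256-byte write gives 100. That gap is the tradeoff between strict per-record durability and flash wear.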
8. Extensions
- Add WAL-style journal mode.
- Add remote log upload with replay checkpoints.
9. Completion
- Power-loss tests pass with deterministic recovery.
- Log growth remains bounded.
- Corruption handling is observable and safe.