Project 12: Robust Data Logger with Rotation and Integrity Checks
Build a log system that survives power loss using atomic writes, rotation, and checksums.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Intermediate |
| Time Estimate | 1–2 weekends |
| Main Programming Language | Python |
| Alternative Programming Languages | Go, Rust, C |
| Coolness Level | Medium |
| Business Potential | High |
| Prerequisites | Linux file I/O, basic scripting |
| Key Topics | Atomic writes, fsync, log rotation, checksums |
1. Learning Objectives
By completing this project, you will:
- Implement atomic log writes with fsync.
- Rotate logs based on time or size.
- Compute and verify checksums.
- Survive simulated power loss without losing all data.
2. All Theory Needed (Per-Concept Breakdown)
Concept 1: Filesystem Durability, Atomic Rename, and Integrity Checks
Fundamentals
SD cards are vulnerable to corruption during power loss. Filesystems rely on caches, so data you have written is not necessarily on the card yet. The atomic-rename technique writes data to a temporary file and then renames it into place; the rename itself is atomic on POSIX filesystems. fsync forces a file's cached data to be flushed to the storage device. Checksums detect corruption by verifying that file contents match a stored hash. A robust logger combines these techniques to survive crashes and power loss.
Deep Dive into the concept
When you write to a file on Linux, the data often goes into page cache first and is flushed later. If power is lost before the flush, the data is lost. fsync forces the OS to flush data to the physical device. However, excessive fsync calls can be slow and wear out SD cards. A balanced approach is to fsync at the end of a log write or after a batch of writes. This is why batching and rotation are useful: you can write data in chunks and then fsync once per chunk.
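Atomic rename is a classic technique for safe file replacement. You write to a temporary file (e.g., log.tmp) in the same directory, fsync it, then rename it to log.csv. On POSIX filesystems, a rename within one filesystem is atomic: at any moment, the name refers to either the old version or the new version, never a partial mix. Atomicity alone is not durability, though: the rename is recorded in the directory's metadata, so fsync the parent directory as well if the new name must survive a power cut. This is crucial for metadata files such as “current log pointer” or checksum files. For append-only logs, you can also write entries with checksums and use a validation pass on startup to detect truncation.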
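As a rough sketch of the write-fsync-rename sequence for a small metadata file, the helper below also fsyncs the parent directory. The function name `atomic_write_text` and the `.tmp` suffix are illustrative assumptions, not part of the project specification, and opening a directory for fsync this way assumes a POSIX system such as Linux.

```python
import os


def atomic_write_text(path: str, text: str) -> None:
    """Replace `path` so readers see the old or the new content, never a mix.

    Hypothetical helper: the name and the ".tmp" suffix are assumptions.
    """
    directory = os.path.dirname(os.path.abspath(path))
    tmp_path = path + ".tmp"  # must live on the same filesystem as `path`

    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(text)
        f.flush()             # Python buffer -> OS page cache
        os.fsync(f.fileno())  # page cache -> storage device

    os.replace(tmp_path, path)  # atomic rename on POSIX filesystems

    # fsync the directory so the rename itself is persisted (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A pointer file could then be updated with `atomic_write_text("logs/current", "logs/2026-01-01.csv\n")`, where the `logs/current` path is again only an example.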
Log rotation is the mechanism to split logs into manageable files. Rotation can be based on size (e.g., 10 MB) or time (e.g., daily). For SD cards, time-based rotation is often better because it aligns with human analysis and allows easy deletion of old logs by directory. A retention policy determines how many days to keep. Implementing rotation safely requires closing the current log, fsyncing it, computing and writing a checksum, and then opening a new log file.
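One possible way to express a day-based retention policy is to name each file after its date and compare names against a cutoff. This is a sketch under assumptions: the function names are hypothetical, the `YYYY-MM-DD.csv` naming follows the example output in §3.4, and "retain N days" is interpreted here as keeping the N most recent calendar days including today.

```python
import os
from datetime import date, timedelta


def log_name_for(day: date) -> str:
    """File name for a given day, e.g. 2026-01-01.csv."""
    return day.isoformat() + ".csv"


def expired_logs(filenames, today: date, retain_days: int):
    """Return the daily log names that fall outside the retention window."""
    # Oldest day that is still kept (assumed interpretation of "retain N days").
    cutoff = today - timedelta(days=retain_days - 1)
    expired = []
    for name in filenames:
        stem, ext = os.path.splitext(name)
        if ext != ".csv":
            continue  # skip checksum files and anything else
        try:
            day = date.fromisoformat(stem)
        except ValueError:
            continue  # not a daily log; leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)
```

The caller would delete each returned file together with its checksum file, for example by passing `os.listdir("logs")` as `filenames`.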
Checksums provide integrity. A simple approach is to compute a hash (SHA-256) for each log file and store it alongside the file. When you read logs later, you can verify that the hash matches. Another approach is to include a checksum per record, so if a file is truncated you can detect the last valid entry. This helps recovery after power loss: you can scan entries and stop at the first invalid checksum.
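The per-file approach might look like the sketch below, which hashes a log in chunks and stores the digest in a sidecar file. The `.sha256` suffix and the function names are assumptions; the sidecar line uses the `digest  filename` layout that `sha256sum -c` accepts when run from the log directory.

```python
import hashlib
import os


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so memory use stays constant."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_checksum(path: str) -> str:
    """Store the digest next to the log (hypothetical ".sha256" sidecar)."""
    digest = sha256_of_file(path)
    with open(path + ".sha256", "w", encoding="utf-8") as f:
        f.write(f"{digest}  {os.path.basename(path)}\n")
        f.flush()
        os.fsync(f.fileno())  # the sidecar needs to survive power loss too
    return digest


def verify_checksum(path: str) -> bool:
    """Return True if the log still matches its stored digest."""
    with open(path + ".sha256", encoding="utf-8") as f:
        expected = f.read().split()[0]
    return sha256_of_file(path) == expected
```

A whole-file hash only says whether the file changed, not where; locating the last good entry in a truncated file needs the per-record approach.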
Because logging is continuous, you must also consider performance. Writing small lines and fsyncing every line is too slow and wears out the SD card. Instead, buffer entries in memory and flush every N lines or every N seconds. The buffer size is a tradeoff between durability and performance: anything still in the buffer is lost on power failure. For example, flushing every 1–5 seconds caps the worst-case loss at the last few seconds of entries without excessive writes.
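The batching idea can be sketched as a small writer that flushes when either a line count or an age limit is reached. The class name, parameter names, and default thresholds are assumptions chosen for illustration; note that the age check only runs when `write` is called, so a quiet logger would need a separate timer to flush idle data.

```python
import os
import time


class BufferedLogWriter:
    """Buffer lines in memory; flush + fsync every N lines or N seconds."""

    def __init__(self, path, max_lines=10, max_seconds=5.0, clock=time.monotonic):
        self._file = open(path, "ab")
        self._buffer = []
        self._max_lines = max_lines
        self._max_seconds = max_seconds
        self._clock = clock  # injectable so tests can control time
        self._last_flush = clock()

    def write(self, line: str) -> None:
        self._buffer.append(line.rstrip("\n") + "\n")
        too_many = len(self._buffer) >= self._max_lines
        too_old = self._clock() - self._last_flush >= self._max_seconds
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._file.write("".join(self._buffer).encode("utf-8"))
            self._file.flush()             # Python buffer -> page cache
            os.fsync(self._file.fileno())  # page cache -> storage device
            self._buffer.clear()
        self._last_flush = self._clock()

    def close(self) -> None:
        self.flush()
        self._file.close()
```

With these defaults, at one entry per second a power cut loses at most about five seconds of data, and the card sees one fsync per batch instead of one per line.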
How this fits into the projects
This concept is central to §3 and §5.10. It also supports Projects 7 and 11 by logging sensor and power data safely.
Definitions & key terms
- fsync: Force OS to flush file buffers.
- Atomic rename: File replacement without partial state.
- Log rotation: Splitting logs by size or time.
- Checksum: Hash used to verify integrity.
Mental model diagram (ASCII)
Write -> Buffer -> fsync -> Rotate -> Checksum
How it works (step-by-step, with invariants and failure modes)
- Append entries to buffer.
- Flush buffer to temp file and fsync.
- Rename temp to final log (atomic).
- Compute checksum and store.
Failure modes:
- Power loss before fsync -> data lost.
- No rotation -> disk fills.
- Missing checksum -> integrity unknown.
Minimal concrete example
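import os

with open(tmp, "wb") as f:
    f.write(data)
    f.flush()             # Python buffer -> OS page cache
    os.fsync(f.fileno())  # page cache -> storage device
os.rename(tmp, final)     # atomic replacement on POSIX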
Common misconceptions
- “Writing means data is safe.” Not until fsync.
- “Rotation is optional.” Without it, disks fill.
Check-your-understanding questions
- Why is rename considered atomic?
- What does fsync guarantee?
- How can checksums detect corruption?
Check-your-understanding answers
- Filesystem ensures rename updates metadata atomically.
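- It blocks until the file's data and metadata have been flushed to the storage device. It does not cover the directory entry; that needs an fsync on the parent directory.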
- A mismatch indicates data changed or truncated.
Real-world applications
- Data loggers in industrial systems and IoT devices.
Where you’ll apply it
- This project: §3.2, §5.10.
- Other projects: Project 7, Project 11.
References
- “The Linux Programming Interface” — file I/O and fsync
- “Making Embedded Systems” — reliability
Key insights
Durability is a deliberate design choice; you must trade off speed and safety explicitly.
Summary
Atomic writes, fsync, and checksums are the building blocks of robust logging.
Homework/Exercises to practice the concept
- Implement a simple log with fsync every N lines.
- Simulate power loss and recover the last valid entry.
- Measure how fsync frequency affects write speed.
Solutions to the homework/exercises
- Use a buffer and fsync every 10 lines.
- Scan entries and stop at first invalid checksum.
- Frequent fsync slows writes; find a balance.
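For the recovery exercise, one possible record layout appends a CRC32 of the payload to each line, so a startup scan can stop at the first truncated or corrupted record. The layout (payload, comma, eight hex digits) and the function names are assumptions made for this sketch, and CRC32 stands in for whatever per-record checksum you choose.

```python
import zlib


def encode_record(payload: str) -> str:
    """Append a CRC32 of the payload as 8 hex digits (assumed layout)."""
    crc = zlib.crc32(payload.encode("utf-8"))
    return f"{payload},{crc:08x}\n"


def scan_valid_records(data: bytes):
    """Return (payloads, valid_length) up to the first invalid record."""
    payloads = []
    offset = 0
    while True:
        end = data.find(b"\n", offset)
        if end == -1:
            break  # no newline: the final line was cut off mid-write
        try:
            line = data[offset:end].decode("utf-8")
        except UnicodeDecodeError:
            break  # garbage bytes
        payload, sep, crc_hex = line.rpartition(",")
        if not sep or len(crc_hex) != 8:
            break  # checksum field missing or malformed
        try:
            expected = int(crc_hex, 16)
        except ValueError:
            break
        if zlib.crc32(payload.encode("utf-8")) != expected:
            break  # payload does not match its checksum
        payloads.append(payload)
        offset = end + 1
    return payloads, offset
```

After a crash, the logger could truncate the file to `valid_length` bytes (for example with `os.truncate`) before appending new entries, so a half-written line never sits in the middle of the log.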
3. Project Specification
3.1 What You Will Build
A data logger that rotates logs daily, computes checksums, and survives power loss.
3.2 Functional Requirements
- Write sensor data to a log file.
- Rotate logs daily or by size.
- Compute checksums for each log.
- Recover safely after power loss.
3.3 Non-Functional Requirements
- Performance: Write 1 log entry/sec with no backlog.
- Reliability: Survive power loss without total data loss.
- Usability: Logs are easy to parse.
3.4 Example Usage / Output
$ ./logger_status
Active log: logs/2026-01-01.csv
Files retained: 7 days
Last checksum: OK
3.5 Data Formats / Schemas / Protocols
CSV log:
2026-01-01T12:00:00Z,22.6,45.2
3.6 Edge Cases
- Power loss mid-write.
- Disk full during rotation.
- Checksum file missing.
3.7 Real World Outcome
Logs rotate cleanly and can be validated even after crashes.
3.7.1 How to Run (Copy/Paste)
python3 logger.py --rotate daily --retain 7
3.7.2 Golden Path Demo (Deterministic)
export FIXED_TIME="2026-01-01T12:00:00Z"
python3 logger.py --simulate --entries 3
Expected output:
[2026-01-01T12:00:00Z] Wrote log entry
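A deterministic demo needs a clock that can be pinned. The sketch below assumes the logger reads `FIXED_TIME` from the environment in the same `YYYY-MM-DDTHH:MM:SSZ` form shown above and falls back to the real UTC clock otherwise; the function names are hypothetical.

```python
import os
from datetime import datetime, timezone

TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # matches 2026-01-01T12:00:00Z


def current_time(environ=os.environ) -> datetime:
    """Return FIXED_TIME if set (for reproducible runs), else the UTC clock."""
    fixed = environ.get("FIXED_TIME")
    if fixed:
        return datetime.strptime(fixed, TIME_FORMAT).replace(tzinfo=timezone.utc)
    return datetime.now(timezone.utc)


def format_timestamp(ts: datetime) -> str:
    """Render a timestamp in the log's format."""
    return ts.strftime(TIME_FORMAT)
```

Routing every timestamp through one function like this keeps the simulated run and the real run on the same code path.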
3.7.3 Failure Demo (Deterministic)
python3 logger.py --simulate --disk-full
Expected output:
[ERROR] Disk full during log rotation
Exit code: 121
3.7.4 CLI Exit Codes
- 0: Success
- 120: Log write failure
- 121: Disk full
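One way to map failures onto these exit codes is to inspect the `errno` of the `OSError` raised by the write. This is a sketch: the constant and function names are assumptions, and treating every non-ENOSPC `OSError` as a generic write failure is a simplification.

```python
import errno
import sys

EXIT_OK = 0
EXIT_WRITE_FAILURE = 120
EXIT_DISK_FULL = 121


def exit_code_for(exc: OSError) -> int:
    """ENOSPC means the disk is full; any other OSError is a write failure."""
    if exc.errno == errno.ENOSPC:
        return EXIT_DISK_FULL
    return EXIT_WRITE_FAILURE


def run(write_fn) -> int:
    """Call write_fn and translate I/O errors into CLI exit codes."""
    try:
        write_fn()
    except OSError as exc:
        code = exit_code_for(exc)
        print(f"[ERROR] {exc.strerror or exc} (exit code {code})", file=sys.stderr)
        return code
    return EXIT_OK
```

The entry point would then end with something like `sys.exit(run(main_loop))`, where `main_loop` is a placeholder for the logger's own loop.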
4. Solution Architecture
4.1 High-Level Design
Data Source -> Buffer -> Log Writer -> Rotation -> Checksum
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Log Writer | Append entries | Buffer size |
| Rotator | Switch files | Time vs size |
| Checksum | Verify integrity | SHA-256 vs CRC |
| Recovery | Scan on startup | Last valid entry |
4.3 Data Structures (No Full Code)
buffer = []
4.4 Algorithm Overview
Key Algorithm: Safe Rotation
- Close and fsync current log.
- Compute checksum.
- Open new log.
Complexity Analysis:
- Time: O(n) for checksum
- Space: O(1) aside from buffer
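The three rotation steps can be sketched as a small class that switches files when the date changes. The class name, the `YYYY-MM-DD.csv` naming, and the `.sha256` sidecar are assumptions for illustration; the checksum is computed in chunks, which is what keeps the extra space constant.

```python
import hashlib
import os
from datetime import date


class DailyLog:
    """Append-only daily log that checksums each file when it is rotated out."""

    def __init__(self, directory: str, day: date):
        self.directory = directory
        self.day = day
        self._file = open(self._path(day), "ab")

    def _path(self, day: date) -> str:
        return os.path.join(self.directory, day.isoformat() + ".csv")

    def append(self, line: bytes, today: date) -> None:
        if today != self.day:
            self.rotate(today)
        self._file.write(line)

    def _sync_and_close(self) -> None:
        self._file.flush()
        os.fsync(self._file.fileno())
        self._file.close()

    def rotate(self, new_day: date) -> None:
        old_path = self._path(self.day)
        # 1. Close and fsync the current log.
        self._sync_and_close()
        # 2. Compute the checksum in chunks: O(n) time, O(1) extra space.
        h = hashlib.sha256()
        with open(old_path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        with open(old_path + ".sha256", "w", encoding="utf-8") as f:
            f.write(f"{h.hexdigest()}  {os.path.basename(old_path)}\n")
            f.flush()
            os.fsync(f.fileno())
        # 3. Open the new log.
        self.day = new_day
        self._file = open(self._path(new_day), "ab")

    def close(self) -> None:
        self._sync_and_close()
```

If power fails between steps 1 and 2, the old log is complete but has no checksum file, which is the "Checksum file missing" edge case from §3.6; a startup pass can recompute it.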
5. Implementation Guide
5.1 Development Environment Setup
sudo apt-get install -y coreutils
5.2 Project Structure
project-root/
├── logger.py
├── checksum.py
└── README.md
5.3 The Core Question You’re Answering
“How do you keep logs safe on fragile storage that can lose power at any time?”
5.4 Concepts You Must Understand First
- fsync and write caching.
- Atomic rename and rotation.
- Checksum verification.
5.5 Questions to Guide Your Design
- How many days of logs must be retained?
- What is your strategy for partial writes?
5.6 Thinking Exercise
Design a write sequence that survives power loss at any step.
5.7 The Interview Questions They’ll Ask
- Why are SD cards unreliable under heavy writes?
- What is an atomic rename and why is it useful?
- How do you detect corrupted log files?
5.8 Hints in Layers
Hint 1: Start with daily log files.
Hint 2: Add fsync after each batch.
Hint 3: Add checksums and recovery logic.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| File I/O | The Linux Programming Interface | Ch. 13 |
| Reliability | Making Embedded Systems | Ch. 7 |
5.10 Implementation Phases
Phase 1: Basic logging (3 hours)
- Write entries to file.
Phase 2: Rotation (4 hours)
- Rotate daily and retain N days.
Phase 3: Integrity (4 hours)
- Add checksums and recovery scan.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Rotation | Time / Size | Time | Predictable retention |
| Checksum | SHA-256 / CRC32 | SHA-256 | Strong integrity |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Checksum validation | Known file hash |
| Integration Tests | Rotation | Daily switch |
| Edge Case Tests | Power loss | Simulated crash |
6.2 Critical Test Cases
- Log rotates at midnight boundary.
- Checksum mismatch is detected.
- Disk full -> exit code 121.
6.3 Test Data
entries=10 per file
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| No fsync | Data lost | Add flush + fsync |
| No rotation | Disk fills | Implement retention |
| Weak checksum | Undetected corruption | Use SHA-256 |
7.2 Debugging Strategies
- Pull power mid-write and inspect log integrity.
- Verify checksums with sha256sum.
7.3 Performance Traps
- Fsync per line is too slow; batch writes.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add gzip compression for old logs.
8.2 Intermediate Extensions
- Add JSON log format with schema version.
8.3 Advanced Extensions
- Implement append-only with per-record checksums.
9. Real-World Connections
9.1 Industry Applications
- Flight data recorders, industrial logging, IoT telemetry.
9.2 Related Open Source Projects
- logrotate for system logs.
9.3 Interview Relevance
- Durability and atomicity are common systems interview topics.
10. Resources
10.1 Essential Reading
- fsync and rename man pages.
10.2 Video Resources
- Filesystem durability tutorials.
10.3 Tools & Documentation
- sha256sum and logrotate docs.
10.4 Related Projects in This Series
- Previous: Project 11
- Next: Project 13
11. Self-Assessment Checklist
11.1 Understanding
- I can explain fsync and atomic rename.
- I can explain why checksums matter.
11.2 Implementation
- Logs rotate and retain correctly.
- Integrity checks pass after power loss simulation.
11.3 Growth
- I can discuss durability tradeoffs in interviews.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Log writes and daily rotation.
Full Completion:
- Checksums and power-loss recovery.
Excellence (Going Above & Beyond):
- Append-only logs with per-record checksums.