Project 12: Robust Data Logger with Rotation and Integrity Checks

Build a log system that survives power loss using atomic writes, rotation, and checksums.

Quick Reference

Attribute	Value
Difficulty	Intermediate
Time Estimate	1–2 weekends
Main Programming Language	Python
Alternative Programming Languages	Go, Rust, C
Coolness Level	Medium
Business Potential	High
Prerequisites	Linux file I/O, basic scripting
Key Topics	Atomic writes, fsync, log rotation, checksums

1. Learning Objectives

By completing this project, you will:

Implement atomic log writes with fsync.
Rotate logs based on time or size.
Compute and verify checksums.
Survive simulated power loss without losing all data.

2. All Theory Needed (Per-Concept Breakdown)

Concept 1: Filesystem Durability, Atomic Rename, and Integrity Checks

Fundamentals

SD cards are vulnerable to corruption during power loss. Filesystems cache writes in memory, so data is not always on the card when a write call returns. An atomic rename is a technique where you write data to a temporary file and then rename it into place; on POSIX filesystems the rename is atomic, so readers never see a half-written file. fsync asks the kernel to flush a file's cached data to the storage device. Checksums detect corruption by verifying that file contents match a stored hash. A robust logger combines these techniques to survive crashes and power loss.

Deep Dive into the concept

When you write to a file on Linux, the data often goes into page cache first and is flushed later. If power is lost before the flush, the data is lost. fsync forces the OS to flush data to the physical device. However, excessive fsync calls can be slow and wear out SD cards. A balanced approach is to fsync at the end of a log write or after a batch of writes. This is why batching and rotation are useful: you can write data in chunks and then fsync once per chunk.

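Atomic rename is a classic technique for replacing a file safely. You write to a temporary file (e.g., log.tmp), fsync it, then rename it to log.csv. On POSIX filesystems, rename within the same filesystem is atomic: at any moment, the file is either the old version or the new version, never a partial mix. Atomicity alone is not durability, though: the rename changes a directory entry, so on Linux you generally also fsync the containing directory to be sure the new name survives power loss. This pattern is crucial for metadata files such as a “current log pointer” or checksum files. For append-only logs, you can instead write entries with checksums and use a validation pass on startup to detect truncation.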
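The write-then-rename pattern for a small metadata file can be sketched as follows. This is a minimal sketch, not the project's required implementation: the helper name `atomic_write` and the `.tmp` suffix are illustrative assumptions, and the directory fsync assumes a POSIX system such as Linux.

```python
import os

def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    tmp = path + ".tmp"  # hypothetical naming; must live on the same filesystem as `path`
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()             # Python buffer -> OS page cache
        os.fsync(f.fileno())  # page cache -> storage device
    os.replace(tmp, path)     # atomic rename on POSIX filesystems
    # fsync the directory so the renamed entry itself survives power loss (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A call such as `atomic_write("logs/current", b"2026-01-01.csv\n")` would update a pointer file; the file name here is only an example.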

Log rotation is the mechanism to split logs into manageable files. Rotation can be based on size (e.g., 10 MB) or time (e.g., daily). For SD cards, time-based rotation is often better because it aligns with human analysis and allows easy deletion of old logs by date. A retention policy determines how many days to keep. Implementing rotation safely requires flushing and fsyncing the current log, closing it, computing and writing a checksum, and then opening a new log file.
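Daily file naming and a retention policy might look like the sketch below. The function names, the `.sha256` sidecar suffix, and the choice to count today as one of the retained days are assumptions made for illustration; the date-named `.csv` files follow the example in §3.4.

```python
import os
from datetime import date, timedelta

def log_path(log_dir: str, day: date) -> str:
    """Return the log file path for `day`, e.g. logs/2026-01-01.csv."""
    return os.path.join(log_dir, f"{day.isoformat()}.csv")

def prune_old_logs(log_dir: str, today: date, retain_days: int) -> list:
    """Delete daily logs older than the retention window; return the removed names."""
    # Keep today plus the previous (retain_days - 1) days.
    cutoff = today - timedelta(days=retain_days - 1)
    removed = []
    for name in sorted(os.listdir(log_dir)):
        stem, ext = os.path.splitext(name)
        if ext != ".csv":
            continue
        try:
            day = date.fromisoformat(stem)
        except ValueError:
            continue  # not a daily log file; leave it alone
        if day < cutoff:
            os.remove(os.path.join(log_dir, name))
            sidecar = os.path.join(log_dir, name + ".sha256")  # hypothetical sidecar name
            if os.path.exists(sidecar):
                os.remove(sidecar)
            removed.append(name)
    return removed
```

Because the file name encodes the date, pruning needs no index: the directory listing is the source of truth.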

Checksums provide integrity. A simple approach is to compute a hash (SHA-256) for each log file and store it alongside the file. When you read logs later, you can verify that the hash matches. Another approach is to include a checksum per record, so if a file is truncated you can detect the last valid entry. This helps recovery after power loss: you can scan entries and stop at the first invalid checksum.
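The per-record approach can be sketched as below. The text does not prescribe a record checksum, so CRC32 from `zlib` is used here as an assumption (it is cheap and detects truncation, but it is not a cryptographic hash). The function names and the choice to append the checksum as a final comma-separated field are also illustrative.

```python
import zlib

def encode_record(line: str) -> str:
    """Append a CRC32 of the payload as 8 hex digits, then a newline."""
    crc = zlib.crc32(line.encode("utf-8"))
    return f"{line},{crc:08x}\n"

def valid_prefix(raw: bytes) -> list:
    """Return the payloads of all records before the first invalid one."""
    good = []
    for chunk in raw.split(b"\n"):
        if not chunk:
            break  # empty piece after the final newline, or a blank line
        try:
            payload, crc_hex = chunk.decode("utf-8").rsplit(",", 1)
            ok = (len(crc_hex) == 8
                  and int(crc_hex, 16) == zlib.crc32(payload.encode("utf-8")))
        except ValueError:
            ok = False  # undecodable bytes or no checksum field: treat as damaged
        if not ok:
            break  # stop at the first invalid record
        good.append(payload)
    return good
```

On startup, a logger could read the newest file, call `valid_prefix`, and resume after the last valid record.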

Because logging is continuous, you must also consider performance. Fsyncing after every small line is slow and adds write wear on the SD card. Instead, buffer entries in memory and flush every N lines or every N seconds. The buffer size is a tradeoff between durability and performance: anything still in the buffer is lost if power fails. For example, flushing every 1–5 seconds limits the loss to the last few seconds of data without excessive writes.
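One way to implement "flush every N lines or every N seconds" is sketched here. The class name `BatchedWriter` and its parameters are hypothetical. Note the simplification: the age check runs only when a new line arrives, which is adequate at a steady one entry per second but would need a timer for bursty input.

```python
import os
import time

class BatchedWriter:
    """Buffer lines in memory; write and fsync when a count or age limit is reached."""

    def __init__(self, path, max_lines=10, max_age_s=5.0, clock=time.monotonic):
        self._f = open(path, "ab")
        self._buf = []
        self._max_lines = max_lines
        self._max_age_s = max_age_s
        self._clock = clock
        self._oldest = None  # arrival time of the oldest buffered line

    def append(self, line: str) -> None:
        if not self._buf:
            self._oldest = self._clock()
        self._buf.append(line.rstrip("\n") + "\n")
        too_many = len(self._buf) >= self._max_lines
        too_old = self._clock() - self._oldest >= self._max_age_s
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        if self._buf:
            self._f.write("".join(self._buf).encode("utf-8"))
            self._buf.clear()
        self._f.flush()             # Python buffer -> OS page cache
        os.fsync(self._f.fileno())  # page cache -> storage device

    def close(self) -> None:
        self.flush()
        self._f.close()
```

With `max_lines=10` and one entry per second, at most about ten seconds of data sit in memory at any time; lowering either limit trades write volume for a smaller loss window.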

How this fits into the projects

This concept is central to §3 and §5.10. It also supports Projects 7 and 11 by logging sensor and power data safely.

Definitions & key terms

fsync: Force OS to flush file buffers.
Atomic rename: File replacement without partial state.
Log rotation: Splitting logs by size or time.
Checksum: Hash used to verify integrity.

Mental model diagram (ASCII)

Write -> Buffer -> fsync -> Rotate -> Checksum

How it works (step-by-step, with invariants and failure modes)

Append entries to buffer.
Flush buffer to temp file and fsync.
Rename temp to final log (atomic).
Compute checksum and store.

Failure modes:

Power loss before fsync -> data lost.
No rotation -> disk fills.
Missing checksum -> integrity unknown.

Minimal concrete example

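import os

with open(tmp, "wb") as f:
    f.write(data)
    f.flush()              # Python buffer -> OS page cache
    os.fsync(f.fileno())   # page cache -> storage device
os.replace(tmp, final)     # atomic swap; tmp must be on the same filesystem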

Common misconceptions

“Writing means data is safe.” Not until fsync.
“Rotation is optional.” Without it, disks fill.

Check-your-understanding questions

Why is rename considered atomic?
What does fsync guarantee?
How can checksums detect corruption?

Check-your-understanding answers

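The filesystem updates the directory entry in a single metadata operation, so readers see either the old file or the new one.
When it returns successfully, the file's data and metadata have been handed to the storage device; the directory entry for a new or renamed file needs a separate fsync on the directory.
A mismatch between the stored and recomputed hash indicates the data changed or was truncated.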

Real-world applications

Data loggers in industrial systems and IoT devices.

Where you’ll apply it

This project: §3.2, §5.10.
Other projects: Project 7, Project 11.

References

“The Linux Programming Interface” — file I/O and fsync
“Making Embedded Systems” — reliability

Key insights

Durability is a deliberate design choice; you must trade off speed and safety explicitly.

Summary

Atomic writes, fsync, and checksums are the building blocks of robust logging.

Homework/Exercises to practice the concept

Implement a simple log with fsync every N lines.
Simulate power loss and recover the last valid entry.
Measure how fsync frequency affects write speed.

Solutions to the homework/exercises

Use a buffer and fsync every 10 lines.
Scan entries and stop at first invalid checksum.
Frequent fsync slows writes; find a balance.

3. Project Specification

3.1 What You Will Build

A data logger that rotates logs daily, computes checksums, and survives power loss.

3.2 Functional Requirements

Write sensor data to a log file.
Rotate logs daily or by size.
Compute checksums for each log.
Recover safely after power loss.

3.3 Non-Functional Requirements

Performance: Write 1 log entry/sec with no backlog.
Reliability: Survive power loss without total data loss.
Usability: Logs are easy to parse.

3.4 Example Usage / Output

$ ./logger_status
Active log: logs/2026-01-01.csv
Files retained: 7 days
Last checksum: OK

3.5 Data Formats / Schemas / Protocols

CSV log:

2026-01-01T12:00:00Z,22.6,45.2
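A round-trip for this line format could be sketched as follows. The spec does not name the two numeric columns, so `value_a` and `value_b` are placeholders, and one decimal place is assumed from the sample line.

```python
from datetime import datetime, timezone

STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # UTC timestamp as shown in the sample line

def format_entry(ts: datetime, value_a: float, value_b: float) -> str:
    """Render one CSV line; `ts` should be timezone-aware."""
    stamp = ts.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    return f"{stamp},{value_a:.1f},{value_b:.1f}"

def parse_entry(line: str):
    """Parse one CSV line back into (timestamp, value_a, value_b)."""
    stamp, a, b = line.strip().split(",")
    ts = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return ts, float(a), float(b)
```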

3.6 Edge Cases

Power loss mid-write.
Disk full during rotation.
Checksum file missing.

3.7 Real World Outcome

Logs rotate cleanly and can be validated even after crashes.

3.7.1 How to Run (Copy/Paste)

python3 logger.py --rotate daily --retain 7

3.7.2 Golden Path Demo (Deterministic)

export FIXED_TIME="2026-01-01T12:00:00Z"
python3 logger.py --simulate --entries 3

Expected output:

[2026-01-01T12:00:00Z] Wrote log entry
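The deterministic output depends on the logger taking its clock from `FIXED_TIME` when that variable is set. How `logger.py` actually does this is not specified; the sketch below shows one plausible approach, with `now_utc` and `status_line` as hypothetical helper names.

```python
import os
from datetime import datetime, timezone

STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def now_utc() -> datetime:
    """Return the current UTC time, or the value of FIXED_TIME if it is set."""
    fixed = os.environ.get("FIXED_TIME")
    if fixed:
        return datetime.strptime(fixed, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return datetime.now(timezone.utc)

def status_line(message: str) -> str:
    """Prefix a message with the (possibly fixed) timestamp."""
    stamp = now_utc().strftime(STAMP_FORMAT)
    return f"[{stamp}] {message}"
```

Routing every timestamp through one function keeps tests reproducible, including tests of the midnight rotation boundary.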

3.7.3 Failure Demo (Deterministic)

python3 logger.py --simulate --disk-full

Expected output:

[ERROR] Disk full during log rotation

Exit code: 121

3.7.4 CLI Exit Codes

0: Success
120: Log write failure
121: Disk full
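Mapping failures to these exit codes might be done as in the sketch below. It assumes a full disk surfaces as an `OSError` with `errno.ENOSPC`, which is the usual behaviour on Linux; the helper names are illustrative.

```python
import errno
import sys

EXIT_OK = 0
EXIT_WRITE_FAILURE = 120
EXIT_DISK_FULL = 121

def exit_code_for(exc: OSError) -> int:
    """Translate an I/O error into one of the documented exit codes."""
    if exc.errno == errno.ENOSPC:  # "No space left on device"
        return EXIT_DISK_FULL
    return EXIT_WRITE_FAILURE

def run(step) -> int:
    """Run one logger step (any callable) and return the exit code to use."""
    try:
        step()
    except OSError as exc:
        print(f"[ERROR] {exc}", file=sys.stderr)
        return exit_code_for(exc)
    return EXIT_OK
```

A main script would then end with something like `sys.exit(run(main_step))`, where `main_step` is whatever function performs the write or rotation.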

4. Solution Architecture

4.1 High-Level Design

Data Source -> Buffer -> Log Writer -> Rotation -> Checksum

4.2 Key Components

4.3 Data Structures (No Full Code)

buffer = []

4.4 Algorithm Overview

Key Algorithm: Safe Rotation

Fsync and close current log.
Compute checksum.
Open new log.

Complexity Analysis:

Time: O(n) for checksum
Space: O(1) aside from buffer
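The rotation sequence can be sketched as follows. This is one possible arrangement under stated assumptions: the checksum is SHA-256 stored in a sidecar file named `<log>.sha256` (a hypothetical convention), written in the two-space `sha256sum` line format and put in place with an atomic rename. Hashing in fixed-size blocks keeps memory use constant while the time stays linear in the file size.

```python
import hashlib
import os

def sha256_of(path: str) -> str:
    """Hash a file in 64 KiB blocks so memory use stays constant."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()

def rotate(current, current_path: str, next_path: str):
    """Seal `current` (an open binary file) and return a handle to the next log."""
    current.flush()
    os.fsync(current.fileno())  # fsync needs an open descriptor, so it precedes close
    current.close()
    digest = sha256_of(current_path)
    sidecar = current_path + ".sha256"  # hypothetical naming convention
    tmp = sidecar + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(f"{digest}  {os.path.basename(current_path)}\n")
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, sidecar)  # the sidecar appears complete or not at all
    return open(next_path, "ab")
```

If power fails between closing the log and renaming the sidecar, the log is intact but has no checksum, which corresponds to the "checksum file missing" edge case in §3.6.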

5. Implementation Guide

5.1 Development Environment Setup

sudo apt-get install -y coreutils

5.2 Project Structure

project-root/
├── logger.py
├── checksum.py
└── README.md

5.3 The Core Question You’re Answering

“How do you keep logs safe on fragile storage that can lose power at any time?”

5.4 Concepts You Must Understand First

fsync and write caching.
Atomic rename and rotation.
Checksum verification.

5.5 Questions to Guide Your Design

How many days of logs must be retained?
What is your strategy for partial writes?

5.6 Thinking Exercise

Design a write sequence that survives power loss at any step.

5.7 The Interview Questions They’ll Ask

Why are SD cards unreliable under heavy writes?
What is an atomic rename and why is it useful?
How do you detect corrupted log files?

5.8 Hints in Layers

Hint 1: Start with daily log files.

Hint 2: Add fsync after each batch.

Hint 3: Add checksums and recovery logic.

5.9 Books That Will Help

| Topic | Book | Chapter |
|---|---|---|
| File I/O | The Linux Programming Interface | Ch. 13 |
| Reliability | Making Embedded Systems | Ch. 7 |

5.10 Implementation Phases

Phase 1: Basic logging (3 hours)

Write entries to file.

Phase 2: Rotation (4 hours)

Rotate daily and retain N days.

Phase 3: Integrity (4 hours)

Add checksums and recovery scan.

5.11 Key Implementation Decisions

6. Testing Strategy

6.1 Test Categories

6.2 Critical Test Cases

Log rotates at midnight boundary.
Checksum mismatch is detected.
Disk full -> exit 121.

6.3 Test Data

entries=10 per file

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

7.2 Debugging Strategies

Pull power mid-write and inspect log integrity.
Verify checksums with sha256sum.
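The same check can be scripted for use inside the logger or a test suite. The sketch below assumes each log has a sidecar file named `<log>.sha256` whose first field is the hex digest (the layout `sha256sum` writes); the sidecar name and the three status strings are illustrative choices.

```python
import hashlib
import os

def verify_log(log_path: str) -> str:
    """Return "OK", "MISMATCH", or "MISSING" for a log and its .sha256 sidecar."""
    sidecar = log_path + ".sha256"  # hypothetical naming convention
    if not os.path.exists(sidecar):
        return "MISSING"
    with open(sidecar, "r", encoding="utf-8") as f:
        fields = f.read().split()
    if not fields:
        return "MISMATCH"  # empty sidecar: nothing to compare against
    expected = fields[0]
    h = hashlib.sha256()
    with open(log_path, "rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return "OK" if h.hexdigest() == expected else "MISMATCH"
```

Reporting a missing sidecar separately from a mismatch helps distinguish an interrupted rotation from a damaged log.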

7.3 Performance Traps

Fsync per line is too slow; batch writes.

8. Extensions & Challenges

8.1 Beginner Extensions

Add gzip compression for old logs.

8.2 Intermediate Extensions

Add JSON log format with schema version.

8.3 Advanced Extensions

Implement append-only with per-record checksums.

9. Real-World Connections

9.1 Industry Applications

Flight data recorders, industrial logging, IoT telemetry.

9.2 Related Tools

logrotate for system logs.

9.3 Interview Relevance

Durability and atomicity are common systems interview topics.

10. Resources

10.1 Essential Reading

fsync and rename man pages.

10.2 Video Resources

Filesystem durability tutorials.

10.3 Tools & Documentation

sha256sum and logrotate docs.

11. Self-Assessment Checklist

11.1 Understanding

I can explain fsync and atomic rename.
I can explain why checksums matter.

11.2 Implementation

Logs rotate and retain correctly.
Integrity checks pass after power loss simulation.

11.3 Growth

I can discuss durability tradeoffs in interviews.

12. Submission / Completion Criteria

Minimum Viable Completion:

Log writes and daily rotation.

Full Completion:

Checksums and power-loss recovery.

Excellence (Going Above & Beyond):

Append-only logs with per-record checksums.