Project 4: Automated Backup System with Timers
Build a timer-driven backup system that replaces cron and adds persistence, jitter, integrity checks, and failure notifications.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 1: Beginner |
| Time Estimate | 6-12 hours |
| Main Programming Language | Bash |
| Alternative Programming Languages | Python, Go |
| Coolness Level | Level 1: Practical and Useful |
| Business Potential | Level 3: Service and Support |
| Prerequisites | Shell scripting, tar/rsync basics, systemd unit files |
| Key Topics | systemd timers, backup integrity, logging and OnFailure |
1. Learning Objectives
By completing this project, you will:
- Replace cron with systemd timers using OnCalendar and Persistent.
- Add jitter (RandomizedDelaySec) to prevent thundering herd.
- Implement daily incrementals and weekly full backups.
- Verify backups using checksums and restore tests.
- Trigger alerts on failure with OnFailure units.
2. All Theory Needed (Per-Concept Breakdown)
Concept 1: systemd Timer Semantics
Fundamentals
systemd timers are first-class units that schedule services. Unlike cron, timers are part of the systemd dependency graph and provide better observability. A timer can trigger a service on calendar schedules, at boot, or at fixed intervals. The Persistent=true option runs missed jobs on the next boot, which is crucial for laptops and intermittently powered systems. RandomizedDelaySec adds jitter so that large fleets do not run at the same moment. AccuracySec controls how precise the scheduler is. Understanding these options lets you build reliable schedules that are robust to reboots and load spikes. Timers also provide clear audit trails via systemctl list-timers.
Deep Dive into the Concept
OnCalendar uses a rich calendar syntax that supports ranges, lists, and names. It can express schedules like Mon..Fri 02:00, *-*-01 01:00 (first day of the month), or daily. A single timer can include multiple OnCalendar lines, which act as OR conditions. Timers also support OnBootSec and OnUnitActiveSec, which are relative schedules (e.g., run 10 minutes after boot or 24 hours after last run). These options enable advanced scheduling patterns that cron cannot express easily.
Persistent=true is the defining feature for reliability. systemd records the last trigger time in /var/lib/systemd/timers/. If the machine was off or asleep when the timer would have fired, systemd runs the service as soon as possible after boot. This ensures that critical tasks are not skipped. In a fleet, this behavior reduces the risk of silent backup gaps, but it can also produce a startup spike. Pairing Persistent=true with RandomizedDelaySec spreads the load after boot.
RandomizedDelaySec adds a random delay to each scheduled run. This is essential for distributed systems. If 500 machines all run backups at 02:00, they can overload storage or networks. Jitter spreads the load across a window. AccuracySec is the scheduler’s precision. Larger values allow the kernel and systemd to coalesce timers, reducing wakeups and power usage. For backups, a precision of minutes is often fine, and it saves power on laptops.
Timers are associated with service units. When the timer triggers, systemd starts the associated service. Failures are logged in journald and can trigger OnFailure units. You can inspect timers with systemctl list-timers to see the last and next run. This observability is a huge advantage over cron, where the schedule is a file and the runtime history is opaque.
Finally, timers are still units. They can be enabled, disabled, and ordered. You can specify dependencies such as After=network.target to ensure that backups only run when the network is available. This makes timers part of the same dependency system you use for services.
Time zones and clock changes are another subtlety. systemd evaluates calendar expressions in the system’s local time zone unless you specify otherwise. During DST transitions, a scheduled time may occur twice or not at all. systemd handles these cases by choosing the next valid time, but you should be aware of it when auditing schedules. Use systemd-analyze calendar to preview upcoming trigger times and verify that your schedule behaves as expected. For deterministic testing, consider fixed UTC schedules or mock time in a test environment. These details matter when you need reliable, predictable backups across environments.
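For example, you can preview calendar expressions with systemd-analyze calendar before installing the timer; the --iterations flag (available in recent systemd versions) prints several upcoming elapse times. Exact output depends on your clock and time zone:

```
$ systemd-analyze calendar "Mon..Fri 02:00"
$ systemd-analyze calendar --iterations=3 "*-*-01 01:00"
```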
How this fits into the project
Timers are the control plane of this backup system. You will use them in Section 3.2, Section 3.7, and Section 5.10 Phase 2.
Definitions & key terms
- OnCalendar -> Calendar-based schedule expression.
- Persistent -> Run missed jobs on next boot.
- RandomizedDelaySec -> Jitter added after scheduled time.
- AccuracySec -> Scheduling precision.
- Timer unit -> systemd unit that triggers a service.
Mental model diagram (ASCII)
schedule -> backup.timer -> backup.service -> backup.sh
                 |
                 +--> persistence store (last run time)
How it works (step-by-step)
- systemd evaluates the timer schedule and computes the next trigger.
- If the system was off, Persistent schedules a catch-up run.
- RandomizedDelaySec adds jitter.
- The timer starts the backup service.
- The service logs to journald.
Invariants: A timer always triggers its linked service; last run time is stored.
Failure modes: Timer not enabled, clock misconfiguration, or missing dependency targets.
Minimal concrete example
[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
RandomizedDelaySec=30m
AccuracySec=5m
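For context, a complete unit pair might look like the following sketch. The two files are shown together; the script path is an illustrative choice, not fixed by systemd:

```ini
# /etc/systemd/system/backup.timer
[Unit]
Description=Nightly backup

[Timer]
OnCalendar=*-*-* 02:00:00
Persistent=true
RandomizedDelaySec=30m
AccuracySec=5m

[Install]
WantedBy=timers.target

# /etc/systemd/system/backup.service
[Unit]
Description=Run backup script
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup.sh
```

Because the timer and service share the name backup, no explicit Unit= line is needed in the [Timer] section.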
Common misconceptions
- “Timers are just cron with new syntax” -> They integrate with systemd and persistence.
- “Persistent runs missed jobs forever” -> It runs at most once on boot.
Check-your-understanding questions
- Why is RandomizedDelaySec valuable in fleets?
- What happens if a machine is off during a scheduled run?
- How does AccuracySec affect power usage?
- Why is list-timers useful?
Check-your-understanding answers
- It spreads load to avoid thundering herd.
- With Persistent=true, the job runs after boot.
- Lower precision allows wakeup coalescing and saves power.
- It shows last and next run times for auditing.
Real-world applications
- Backup scheduling on laptops and servers.
- Log cleanup and update tasks with jitter.
Where you’ll apply it
- This project: Section 3.2, Section 3.7, Section 5.10 Phase 2.
- Also used in: P03-socket-activated-server.md as an activation parallel.
References
- systemd.timer documentation.
Key insights
Timers provide persistence and observability that cron lacks.
Summary
Use OnCalendar, Persistent, and RandomizedDelaySec to build reliable schedules.
Homework/exercises to practice the concept
- Create a timer that runs every 5 minutes.
- Add jitter and observe next run time.
- Disable and re-enable a timer and compare list-timers output.
Solutions to the homework/exercises
- Set OnCalendar=*:0/5.
- Add RandomizedDelaySec=2m and inspect list-timers.
- Use systemctl disable --now, then systemctl enable --now.
Concept 2: Backup Integrity, Incrementals, and Retention
Fundamentals
Backups are only useful if they are restorable. A reliable system must include integrity checks, retention policies, and restore validation. Incremental backups capture changes since the last full backup and reduce storage usage. A common pattern is daily incrementals and weekly full backups. Checksums verify that data was not corrupted. Retention policies define how many backups to keep and prevent disks from filling up. This concept makes your backup system trustworthy rather than merely scheduled. It also forces you to think about naming, metadata, and reproducibility.
Deep Dive into the Concept
An incremental strategy reduces storage by reusing unchanged data. rsync --link-dest is a common approach: it creates a new snapshot directory where unchanged files are hard-linked to the previous snapshot. Each snapshot appears complete but consumes space only for changed files. This simplifies restores because you can copy a snapshot directory without applying patch chains. Another approach is tar incremental archives with a snapshot file that tracks changed files. This works but is harder to restore and manage.
Integrity checks should happen at two levels. First, verify that the backup command succeeded (exit code, size thresholds). Second, compute a checksum manifest (for example, SHA256 hashes of archives or snapshot metadata) and store it alongside the backup. On restore, verify checksums before extracting data. For critical systems, also perform periodic restore tests on sample files. This is often neglected but is essential for real reliability.
Retention policies must be explicit. A simple rule: keep 7 daily incrementals and 4 weekly full backups. Implement cleanup by deleting older snapshots beyond those limits. You should also ensure cleanup occurs after a successful backup, not before, to avoid losing all backups if a run fails.
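The retention rule above can be sketched as follows. Date-named snapshot directories sort lexically in chronological order; the keep-count and layout are illustrative, and a real script would also distinguish daily incrementals from weekly fulls:

```shell
# Retention sketch: keep the newest KEEP snapshot directories,
# deleting older ones only when more than KEEP exist.
set -euo pipefail

prune_snapshots() {   # usage: prune_snapshots <backup-dir> <keep>
  local dir=$1 keep=$2
  # Newest first: date-named directories sort lexically by age.
  mapfile -t snaps < <(find "$dir" -mindepth 1 -maxdepth 1 -type d | sort -r)
  # Guardrail: if we have KEEP or fewer snapshots, delete nothing.
  (( ${#snaps[@]} > keep )) || return 0
  local old
  for old in "${snaps[@]:keep}"; do
    rm -rf -- "$old"
  done
}

# Demo on a throwaway directory with ten date-named snapshots.
demo=$(mktemp -d)
for d in 2026-01-01 2026-01-02 2026-01-03 2026-01-04 2026-01-05 \
         2026-01-06 2026-01-07 2026-01-08 2026-01-09 2026-01-10; do
  mkdir "$demo/$d"
done
prune_snapshots "$demo" 7
ls "$demo"    # the seven newest remain: 2026-01-04 .. 2026-01-10
```

Note that pruning runs only after a successful backup in the real flow, matching the rule above.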
Data consistency is another consideration. If you back up a live database by copying files, you may get a corrupted snapshot. In this project, you can focus on file-level backups and note that databases require special tools (e.g., pg_dump). But your script should be structured so that a database dump step could be inserted later.
Metadata is also part of integrity. Store a small manifest file with snapshot name, timestamp, source path, and backup tool version. This makes restores reproducible and helps with audits. If you compress archives, record the compression method and level so restores can be automated. If you encrypt backups, ensure key management is documented and test restores regularly. Finally, consider the trade-off between snapshot-style backups and archive-style backups: snapshots are easy to browse, while archives are easy to transfer. Your script can support both or choose one and document why.
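As a sketch of such a manifest, the following writes a small key=value file into a snapshot directory. The field names (snapshot, created, source, tool, compression) are illustrative choices, not a standard format:

```shell
# Sketch: record snapshot metadata alongside the backed-up files.
set -euo pipefail

snapshot_dir=$(mktemp -d)    # stands in for /backups/inc/2026-01-06

cat > "$snapshot_dir/manifest.txt" <<EOF
snapshot=$(basename "$snapshot_dir")
created=$(date -u +%Y-%m-%dT%H:%M:%SZ)
source=/data
tool=rsync
compression=none
EOF

cat "$snapshot_dir/manifest.txt"
```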
Finally, space estimation matters. For reliability, estimate disk usage before running. If the destination filesystem is near full, the backup should fail gracefully and emit a clear error, rather than corrupting existing snapshots. This ties into your failure handling and alerting logic.
How this fits into the project
The backup script and retention policy are central to this project. You will implement them in Section 3.2 and Section 5.10 Phase 2.
Definitions & key terms
- Incremental backup -> Captures changes since last full backup.
- Snapshot -> Point-in-time view of files.
- Retention policy -> Rules for how many backups to keep.
- Checksum -> Hash used to verify integrity.
- Restore test -> Validation that a backup can be used.
Mental model diagram (ASCII)
Weekly Full: F1 ----- F2 ----- F3
Daily Inc: i i i i i i i i i
Restore: pick F2 + its incrementals
How it works (step-by-step)
- Determine last full snapshot.
- Create new snapshot via rsync with link-dest.
- Generate checksum manifest.
- Enforce retention by deleting old snapshots.
Invariants: Each snapshot is self-contained; manifests match snapshot contents.
Failure modes: disk full, corrupted snapshots, or retention deleting too aggressively.
Minimal concrete example
rsync -a --delete --link-dest=/backups/latest /data/ /backups/2026-01-01/
find /backups/2026-01-01 -type f -exec sha256sum {} + > /backups/manifests/2026-01-01.sha256
Common misconceptions
- “Incrementals are always small” -> They grow if many files change.
- “Backup success means data is safe” -> Without checksum, you do not know.
Check-your-understanding questions
- Why use hard links for incrementals?
- How do you verify a backup without restoring everything?
- What happens if retention is not enforced?
- Why should cleanup happen after a successful backup?
Check-your-understanding answers
- It saves space by reusing unchanged files.
- Verify checksums and perform a sample restore.
- Disk fills and backups fail.
- It avoids deleting the last good backup if the new one fails.
Real-world applications
- File server snapshots.
- Developer laptop backup systems.
Where you’ll apply it
- This project: Section 3.2, Section 5.10 Phase 2, Section 7.1.
- Also used in: P05-systemd-controlled-development-environment-manager.md for persistence patterns.
References
- rsync manual.
- GNU tar incremental backup documentation.
Key insights
A backup is trustworthy only when integrity and retention are enforced.
Summary
Incrementals plus checksums and retention equal a real backup system.
Homework/exercises to practice the concept
- Create a snapshot with rsync --link-dest.
- Modify one file and run again; compare disk usage.
- Verify a checksum manifest with sha256sum -c.
Solutions to the homework/exercises
- Use a new snapshot directory and --link-dest.
- Only changed files should increase disk usage.
- Run sha256sum -c backup.sha256.
Concept 3: Logging, OnFailure, and Alerting
Fundamentals
A scheduled backup is useless if failures are silent. systemd integrates logging through journald and supports OnFailure units that run when a service fails. Your backup service should log structured messages about start time, duration, bytes copied, and errors. When the backup fails, systemd should trigger an alert service that sends a notification or writes an error summary. Clear exit codes allow operators to see why the backup failed. This concept is about observability and reliability, not just scheduling. It also reduces mean time to detection in real incidents.
Deep Dive into the Concept
When a systemd service exits with a non-zero status, systemd marks the unit as failed. If the unit has OnFailure=backup-alert.service, systemd starts the alert service immediately. This allows a clean separation: the backup script focuses on data handling, while the alert unit handles notification. The alert unit could send email, push to Slack, or write to a monitoring log. For a learning project, a simple alert that writes to /var/log/backup-alerts.log is enough, but the system should make it easy to extend.
Logging should be structured and consistent. Use systemd-cat or prefix messages with clear tags. Include key metrics: start time, end time, duration, bytes copied, and checksum result. This provides data for debugging slow backups or failures. For example, log a message like BACKUP_OK bytes=219430912 duration=32s rather than just “Backup complete”.
Exit codes should be meaningful. Define distinct exit codes for disk full, source missing, and checksum mismatch. This allows the alert service to display a human-friendly message. It also makes test cases deterministic: you can simulate a disk full error and expect exit code 5.
Permissions also matter. If your backup runs as root, logs will be accessible to root. If it runs as a user service, logs may be restricted. For a system-level backup, root is typical. Ensure that permissions are explicit and documented.
Finally, consider log rate limiting. If your backup script logs too frequently, journald may rate limit. Keep log volume low and meaningful. In your dashboard project (Project 1), you can verify that log messages appear correctly and that failures are visible.
Alerting should be layered. A local log file is good for debugging, but you also want an external signal for operational awareness. You can implement a simple backup-alert.service that sends mail or writes to a webhook. Provide the last failure reason in a small state file so the alert service can include it in the message. Distinguish between transient errors (network outage) and permanent errors (missing source). This allows different escalation paths. Over time, these details turn a simple backup script into an operationally reliable system.
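One possible shape for that layered alerting is a small helper script that a backup-alert.service unit runs via ExecStart (e.g. ExecStart=/usr/local/bin/backup-alert.sh). The state-file and log paths named in the comments are illustrative; the demo below uses temp files:

```shell
# backup-alert.sh sketch. In the real system the state file would be
# something like /var/lib/backup/last-error (written by backup.sh) and
# the log /var/log/backup-alerts.log.
set -euo pipefail

send_alert() {   # usage: send_alert <state-file> <alert-log>
  local reason
  reason=$(cat "$1" 2>/dev/null || echo "unknown")
  printf '%s BACKUP_ALERT reason=%s\n' "$(date -Is)" "$reason" >> "$2"
}

# Demo: the backup script recorded a failure reason in its state file.
state=$(mktemp); log=$(mktemp)
echo "source_missing" > "$state"
send_alert "$state" "$log"
cat "$log"
```

Keeping the reason in a state file lets the alert unit include it in the message without re-running any backup logic.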
How this fits into the project
Alerting and logging are required for real reliability. You will use them in Section 3.2, Section 3.7, and Section 5.10 Phase 3.
Definitions & key terms
- OnFailure -> Unit that runs when another unit fails.
- Structured log -> Log with key/value fields.
- Exit code -> Numeric code indicating failure reason.
- Alert unit -> A service that sends notifications.
Mental model diagram (ASCII)
backup.service -> fails -> OnFailure=backup-alert.service
      |
      +--> journald logs (success/failure)
How it works (step-by-step)
- Backup script logs start metadata.
- Script exits with success or error code.
- systemd marks unit success/failure.
- OnFailure unit triggers on failure.
- Alert service sends notification.
Invariants: Non-zero exit means failed; OnFailure always triggers on failure.
Failure modes: silent errors if exit codes are wrong, or alert service misconfigured.
Minimal concrete example
[Unit]
OnFailure=backup-alert.service
Common misconceptions
- “Logs are enough” -> Without alerts, failures are silent.
- “OnFailure is only for system units” -> It works for user units too.
Check-your-understanding questions
- How do you trigger an OnFailure unit?
- Why use journald instead of a plain file?
- What should exit codes communicate?
- Why keep logs structured?
Check-your-understanding answers
- Exit non-zero from the main service.
- journald provides structured querying and rotation.
- The failure category, not just “failed”.
- Structured logs are easier to search and parse.
Real-world applications
- Automated alerts for failed backups, deployments, or scheduled jobs.
Where you’ll apply it
- This project: Section 3.7, Section 5.10 Phase 3, Section 7.
- Also used in: P01-service-health-dashboard.md for log viewing.
References
- systemd.service documentation (OnFailure).
- systemd-cat manual.
Key insights
Backups are reliable only if failures are visible and actionable.
Summary
Use journald for structured logs and OnFailure for alerts.
Homework/exercises to practice the concept
- Write a service that always fails and confirm OnFailure runs.
- Use systemd-cat to tag log lines.
- Filter logs by priority.
Solutions to the homework/exercises
- Use ExecStart=/bin/false and check the unit status.
- Run echo test | systemd-cat -t backup.
- Filter with journalctl -u backup.service -p err.
Concept 4: Backup Integrity, Atomicity, and Verification
Fundamentals
A backup that cannot be trusted is worse than no backup. Integrity means that a backup is complete, uncorrupted, and restorable. Atomicity means that each backup snapshot is internally consistent; you should not capture a half-written file or a partially updated database. Verification ensures that your backups can actually be restored. These concepts are not unique to systemd, but they are essential to building a reliable timer-driven backup system. systemd timers provide scheduling, but they do not guarantee that your data is consistent or recoverable. That responsibility is yours. If you understand integrity, atomicity, and verification, you will design a backup system that operators can trust.
Deep Dive into the Concept
Integrity begins with defining what data must be backed up and what constitutes a successful backup. For file-based backups, integrity means capturing all files in scope with correct contents and metadata (permissions, ownership, timestamps). For databases, integrity means capturing a consistent snapshot, often via a database-specific dump or snapshot mechanism. A naive tar of a live database directory can produce a corrupt backup. The rule is: use application-aware tools where consistency matters, and consider filesystem snapshots when possible.
Atomicity is about ensuring that a single backup run produces a coherent state. One technique is to stage the backup into a temporary directory and then atomically move it into place when complete. Another technique is to use filesystem snapshots (e.g., LVM, btrfs, ZFS) to capture a point-in-time view, then back up from the snapshot. Even without snapshots, you can reduce inconsistency by ordering operations (e.g., stop a service briefly, flush buffers, or use application-specific freeze commands). In a timer-based system, this should be encoded in the service unit, which can perform pre-backup hooks to quiesce services and post-backup hooks to resume them.
Verification is the most neglected part of backup systems. At a minimum, you should validate that the archive or sync completes without errors and that checksums match. For file archives, you can compute a manifest of SHA-256 hashes and store it alongside the backup. For rsync-based backups, you can run a follow-up verification pass with --checksum or compare manifest files. For database dumps, verification can include loading the dump into a test database or at least checking that the dump is non-empty and has the expected schema markers. In a timer-driven system, you can schedule a weekly verification job (another timer) that checks recent backups and alerts on failures.
Retention policies are part of integrity: you must ensure that backups are rotated correctly so you do not accidentally delete your last good backup. A robust system keeps at least one known-good full backup and a chain of incrementals. When deleting old backups, use deterministic rules (e.g., keep last 7 daily, last 4 weekly, last 12 monthly). This should be implemented carefully, with guardrails to prevent deleting everything due to a bug or clock issue. A good practice is to require that at least one backup remains before deletion proceeds.
Finally, backups must be observable. Logs should record backup size, duration, checksum verification results, and any skipped files. If a backup fails due to disk space, you need that to be explicit. For integrity, it is not enough to say “backup complete”; you need to say “backup complete, 2.1G, checksum verified”. This information becomes the basis for audit and incident response.
How this fits into the project
This concept informs your backup script design, log output, and validation strategy. You will apply it in Section 3.2 (Functional Requirements: integrity checks), Section 3.5 (Data formats: manifest schema), Section 5.4 (Concepts you must understand first), and Section 6.2 (Critical test cases). It also affects your failure demo in Section 3.7.3.
Definitions & key terms
- Integrity -> Assurance that backup data is complete and uncorrupted.
- Atomicity -> A backup snapshot represents a single consistent point in time.
- Verification -> Steps that confirm a backup is restorable or valid.
- Manifest -> A file listing checksums and metadata for backup contents.
- Retention policy -> Rules for how long backups are kept and rotated.
Mental model diagram (ASCII)
Live data -> snapshot/stage -> archive -> verify -> rotate
                 |                |           |          |
              quiesce        atomic move    hashes    retention
How it works (step-by-step)
- Quiesce or snapshot the data source.
- Copy data to a staging directory.
- Create archive or sync to destination.
- Generate checksum manifest and verify.
- Atomically move completed backup into place.
- Apply retention policy and log results.
Invariants: A backup is only marked successful after verification.
Failure modes: Inconsistent snapshots, partial archives, or silent corruption.
Minimal concrete example
# Create a manifest of SHA-256 checksums
find /backups/latest -type f -print0 | xargs -0 sha256sum > /backups/latest/manifest.sha256
# Verify later
sha256sum -c /backups/latest/manifest.sha256
Common misconceptions
- “If the backup command exits 0 it is valid” -> It might still be incomplete or corrupt.
- “File copy is enough for databases” -> Many databases require consistent snapshot tools.
- “Verification is too expensive” -> Periodic verification is cheaper than data loss.
Check-your-understanding questions
- Why is a database directory copy often an invalid backup?
- How does atomic move improve backup reliability?
- What is the minimal verification you should perform after a backup?
- How can retention policies accidentally delete all backups?
Check-your-understanding answers
- The database files can change during copy, producing an inconsistent snapshot.
- It ensures only fully completed backups are visible as “latest”.
- Validate the archive and checksum manifest or equivalent.
- If rules are based on a wrong clock or bug, they may remove everything.
Real-world applications
- Production backup systems for databases and file servers.
- Compliance-driven retention and audit trails.
- Disaster recovery planning and restore drills.
Where you’ll apply it
- This project: Section 3.2, Section 3.5, Section 5.4, Section 6.2.
- Also used in: P01-service-health-dashboard.md for log and audit reporting.
References
- “The Linux Command Line” archiving chapters.
- rsync and tar documentation.
- Database-specific backup guides (PostgreSQL pg_dump, MySQL mysqldump).
Key insights
A backup is only as good as its verification and restore path.
Summary
Integrity requires consistent snapshots, verification, and careful retention; timers only schedule the work.
Homework/exercises to practice the concept
- Create a backup manifest and verify it after modifying a file to see failure.
- Implement a retention script that keeps 7 daily and 4 weekly backups.
- Simulate a partial backup and ensure your script marks it as failed.
Solutions to the homework/exercises
- Change a file and rerun sha256sum -c; it should fail.
- Use date-based folder names and delete older ones after counting.
- Exit with non-zero status and emit a failure log line.
3. Project Specification
3.1 What You Will Build
A backup system that:
- Runs daily incremental backups and weekly full backups.
- Uses systemd timers with persistence and jitter.
- Verifies integrity with checksums and restore tests.
- Sends alerts on failures.
Included: backup script, timer/service units, retention logic, logging.
Excluded: full database snapshot support, multi-region replication.
3.2 Functional Requirements
- Timer Units: daily incremental + weekly full.
- Backup Script: run rsync or tar with manifest.
- Integrity Check: generate SHA256 manifest and verify.
- Retention: keep last 7 incrementals and 4 full backups.
- OnFailure: trigger alert service.
- Restore Test: validate a sample restore weekly.
3.3 Non-Functional Requirements
- Reliability: missed runs execute after reboot.
- Performance: run incrementals in under 10 minutes for 10GB.
- Usability: clear logs and exit codes.
3.4 Example Usage / Output
$ systemctl list-timers backup.timer
NEXT LAST UNIT
Thu 2026-01-02 02:00:00 UTC Wed 2026-01-01 02:00:00 UTC backup.timer
$ journalctl -u backup.service -n 2
Jan 01 02:00:01 host backup[1234]: BACKUP_OK bytes=219430912 duration=32s
3.5 Data Formats / Schemas / Protocols
Backup layout:
/backups/
full/2026-01-05/
inc/2026-01-06/
manifests/2026-01-06.sha256
3.6 Edge Cases
- Disk full during backup.
- Source directory missing.
- Missed run due to laptop sleep.
- Checksum mismatch on restore test.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
sudo cp backup.service backup.timer /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now backup.timer
3.7.2 Golden Path Demo (Deterministic)
- Use fixed directories: /data -> /backups.
- Use BACKUP_FAKE_TIME=2026-01-01T02:00:00Z for deterministic log timestamps.
3.7.3 If CLI: exact terminal transcript
$ BACKUP_FAKE_TIME=2026-01-01T02:00:00Z sudo systemctl start backup.service
Jan 01 02:00:00 host backup[1001]: BACKUP_OK bytes=219430912 duration=32s
Failure demo:
$ sudo systemctl start backup.service
Jan 01 02:00:01 host backup[1001]: BACKUP_FAILED reason=source_missing
exit code: 6
Exit codes:
- 0: success
- 5: disk full
- 6: source missing
- 7: checksum mismatch
4. Solution Architecture
4.1 High-Level Design
backup.timer -> backup.service -> backup.sh
                |                  |
                |                  +--> checksum + retention
                +--> OnFailure=backup-alert.service
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Timer Units | Scheduling | OnCalendar + Persistent |
| Backup Script | Data copy + checksums | rsync vs tar |
| Alert Unit | Notify on failure | email vs log file |
4.3 Data Structures (No Full Code)
Manifest line format (sha256sum output): <sha256-hex>  <filename>
4.4 Algorithm Overview
Key Algorithm: Incremental Backup
- Determine last full snapshot.
- Run rsync with link-dest into new snapshot dir.
- Generate checksum manifest.
- Rotate old snapshots.
Complexity Analysis:
- Time: O(files changed)
- Space: O(changed data)
5. Implementation Guide
5.1 Development Environment Setup
sudo apt-get install -y rsync coreutils
5.2 Project Structure
backup/
├── backup.sh
├── backup.service
├── backup.timer
├── backup-alert.service
└── README.md
5.3 The Core Question You’re Answering
“How can I schedule reliable jobs that survive reboots and avoid stampedes?”
5.4 Concepts You Must Understand First
- Timer semantics and persistence.
- Incremental backup strategy.
- Failure notification with OnFailure.
5.5 Questions to Guide Your Design
- How will you verify that backups are restorable?
- How will you avoid all machines backing up at the same time?
- How do you handle disk-full errors?
5.6 Thinking Exercise
Design a weekly schedule for 100 machines such that backups spread across 2 hours.
5.7 The Interview Questions They’ll Ask
- “Why are systemd timers better than cron for laptops?”
- “What does Persistent=true do?”
- “How do you verify backup integrity?”
5.8 Hints in Layers
Hint 1: Write the backup script first.
Hint 2: Wrap it in a .service unit.
Hint 3: Add a .timer with persistence.
Hint 4: Add OnFailure alerts.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Shell scripting | “The Linux Command Line” | archive chapters |
| System admin | “How Linux Works” | scheduling sections |
| Reliability | “Site Reliability Engineering” | monitoring and alerting |
5.10 Implementation Phases
Phase 1: Foundation (2-3 hours)
Goals: backup script runs manually.
Checkpoint: backup.sh creates a snapshot.
Phase 2: Timers (2-3 hours)
Goals: schedule via systemd timers.
Checkpoint: systemctl list-timers shows next run.
Phase 3: Integrity and Alerts (2-4 hours)
Goals: checksums + OnFailure.
Checkpoint: forced failure triggers alert.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Backup tool | rsync vs tar | rsync + link-dest | incremental snapshots |
| Alerting | email vs log file | log + email | minimal and visible |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | Script logic | checksum mismatch detection |
| Integration Tests | timer/service | Persistent run after reboot |
| Edge Case Tests | disk full | exit code 5 |
6.2 Critical Test Cases
- Missing source returns exit 6.
- Disk full triggers failure and OnFailure.
- RandomizedDelaySec changes next run time.
6.3 Test Data
source: /data
backup: /backups
7. Common Pitfalls and Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Timer not enabled | never runs | systemctl enable --now |
| No jitter | thundering herd | RandomizedDelaySec |
| No retention | disk fills | delete old snapshots |
7.2 Debugging Strategies
- systemctl list-timers for schedule inspection.
- journalctl -u backup.service for logs.
7.3 Performance Traps
Running full backups daily wastes time and storage.
8. Extensions and Challenges
8.1 Beginner Extensions
- Add config file for paths and retention.
- Add dry-run mode.
8.2 Intermediate Extensions
- Add remote sync (rsync over SSH).
- Encrypt backups with GPG.
8.3 Advanced Extensions
- Integrate with LVM snapshots.
- Add restore verification on a schedule.
9. Real-World Connections
9.1 Industry Applications
- Server backup automation for SMBs.
- Laptop backup for remote teams.
9.2 Related Open Source Projects
- borgbackup, restic (advanced backup systems).
9.3 Interview Relevance
- Explain why timers are more reliable than cron.
10. Resources
10.1 Essential Reading
- systemd timer docs.
- rsync manual.
10.2 Video Resources
- systemd scheduling talks.
10.3 Tools and Documentation
- systemctl, journalctl, and rsync.
10.4 Related Projects in This Series
- P01-service-health-dashboard.md for log monitoring.
- P03-socket-activated-server.md for activation parallels.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain Persistent=true.
- I can describe incremental backups.
- I can explain OnFailure.
11.2 Implementation
- Timers run on schedule.
- Checksums are generated and verified.
- Failures trigger alerts.
11.3 Growth
- I can restore a file from backup.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Timer triggers backups.
- Logs show success/failure.
Full Completion:
- Integrity checks and retention policies work.
Excellence (Going Above and Beyond):
- Remote, encrypted backups with periodic restore tests.