Project 6: Deployment Pipeline Tool (Final Integration)
Build a deployment tool that watches a directory, syncs changes to a remote host, restarts a supervised service, aggregates logs, and reports health.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 3-4 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Go, Rust |
| Coolness Level | Level 5 - Real integration capstone |
| Business Potential | Level 4 - DevOps tooling |
| Prerequisites | All prior projects, SSH/rsync basics |
| Key Topics | File watching, orchestration, supervision, log streaming |
1. Learning Objectives
By completing this project, you will:
- Orchestrate file watching, sync, restart, and health checks as a single tool.
- Integrate the log tailer, connection pool, and supervisor concepts.
- Handle partial failures with rollback and retries.
- Provide deterministic deployment reports.
- Build a deploy pipeline that fails safely under real OS constraints.
2. All Theory Needed (Per-Concept Breakdown)
2.1 File Watching and Change Detection
Fundamentals
File watching detects changes in a directory so you can trigger a deploy. On Linux, inotify provides event notifications for file creation, modification, and deletion. A polling approach periodically scans file timestamps to detect changes. Inotify is efficient but can drop events under heavy load; polling is simpler but less responsive. A robust deploy tool should support both or use inotify with periodic sanity scans.
Deep Dive into the concept
Inotify watches directories and reports events via a file descriptor. Each event includes a mask (e.g., IN_MODIFY, IN_CREATE) and a file name. However, inotify has limits: the event queue can overflow, and you can exceed watch limits (fs.inotify.max_user_watches). When this happens, events are lost. A deploy tool must detect this and recover by doing a full rescan. Polling, by contrast, is simple and robust but can be CPU-expensive. A hybrid approach is ideal: use inotify for real-time detection and fall back to periodic scans for consistency.
Change detection also needs filtering. Temporary files, editor swap files, and build artifacts can cause noisy deploys. Your tool should support ignore patterns and configurable includes. It should also debounce bursts of changes (e.g., saving many files at once) by grouping changes within a small time window before triggering a deploy.
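The sketch below combines these ideas, assuming a watch has already been registered as in the minimal example further down: events are drained from the inotify descriptor, IN_Q_OVERFLOW is treated as a cue to schedule a full rescan, and a poll() timeout implements the debounce window. The names watch_loop and DEBOUNCE_MS are illustrative, not part of any required API.

```c
#include <sys/inotify.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

#define DEBOUNCE_MS 200

void watch_loop(int fd)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int pending = 0;                       /* events seen in the current window */

    for (;;) {
        /* Block until something happens; once changes are pending, wait only
           DEBOUNCE_MS so a quiet gap flushes the batch as one deploy trigger. */
        int n = poll(&pfd, 1, pending ? DEBOUNCE_MS : -1);
        if (n == 0 && pending) {
            printf("change set ready (%d events) -> trigger deploy\n", pending);
            pending = 0;
            continue;
        }
        if (n <= 0)
            continue;

        ssize_t len = read(fd, buf, sizeof buf);
        if (len <= 0)
            continue;
        for (char *p = buf; p < buf + len; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            if (ev->mask & IN_Q_OVERFLOW)
                printf("queue overflow -> schedule full rescan\n");
            else
                pending++;
            p += sizeof *ev + ev->len;
        }
    }
}
```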
How this fits in projects
File watching is the trigger for the deploy pipeline in Project 6. It can reuse the polling techniques from Project 1 and the event loop patterns from Project 2.
Definitions & key terms
- inotify: Linux filesystem event notification system.
- Debounce: Delay to group rapid events into one action.
- Rescan: Full directory scan to recover from event loss.
Mental model diagram (ASCII)
fs events -> debounce -> change set -> deploy pipeline
How it works (step-by-step, with invariants and failure modes)
- Register inotify watches or start polling.
- Capture change events into a queue.
- Debounce for a short window (e.g., 200ms).
- Produce a change set and trigger deploy.
- Failure mode: inotify queue overflow -> fallback to rescan.
Minimal concrete example
int fd = inotify_init1(IN_NONBLOCK);   /* from <sys/inotify.h> */
int wd = inotify_add_watch(fd, path, IN_CREATE | IN_MODIFY | IN_DELETE);
/* check fd and wd for -1; fall back to polling if watches cannot be added */
Common misconceptions
- “Inotify never drops events”: it can under heavy load.
- “Polling is always bad”: it is reliable and sometimes sufficient.
Check-your-understanding questions
- What happens when inotify queue overflows?
- Why debounce events?
- What is a reasonable fallback strategy?
Check-your-understanding answers
- Events are lost; you must rescan.
- To avoid deploying on every tiny change.
- Perform a full scan and rebuild the change set.
Real-world applications
- Hot reloaders in development tools.
- CI systems that detect source changes.
Where you will apply it
- Project 6: See §3.1 and §5.10 Phase 1.
- Also used in: P01 Multi-Source Log Tailer for polling logic.
References
- man 7 inotify.
Key insights
Change detection must be robust to event loss and noisy edits.
Summary
File watching is the trigger mechanism; reliability comes from debounce and rescan logic.
Homework/exercises to practice the concept
- Implement a small inotify watcher and log events.
- Simulate queue overflow by generating many events.
- Add a debounce timer and group events into a batch.
Solutions to the homework/exercises
- Events should show CREATE/MODIFY/DELETE masks.
- Overflow produces IN_Q_OVERFLOW; you must rescan.
- Debounce groups changes into a single deploy trigger.
2.2 Reliable Sync and Partial Deployment Handling
Fundamentals
Deployment sync copies files to a remote host. Tools like rsync are efficient because they transfer only changed blocks. But sync can fail mid-way due to network loss or permissions. A deploy tool must detect partial syncs and decide whether to retry or rollback.
Deep Dive into the concept
Rsync uses file checksums and rolling hashes to send only changed blocks. It can also preserve permissions and timestamps. However, if the connection drops mid-transfer, the remote directory may be in a partially updated state. This can break running services. To avoid this, a common pattern is to sync into a staging directory, then atomically switch a symlink to point to the new version. If sync fails, you leave the old symlink intact and the service continues running.
Another approach is to use a temporary file naming scheme and rename on completion. This provides atomicity at the file level but not at the directory level. For a deployment tool, directory-level atomicity is more important. Therefore, the recommended approach is: sync to releases/<timestamp>/, validate, then switch a current symlink. This pattern is used by many deployment systems.
Retries must be bounded. If a sync fails repeatedly, you should back off and report an error. The tool should also provide a diff report of which files were updated in the last attempt to aid debugging.
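As a sketch of the switch step, assuming the tool (or a small helper run on the remote host) can call into libc directly: create the new symlink under a temporary name, then rename(2) it over current. rename is atomic on the same filesystem, so readers always see either the old release or the new one, never a missing link. The function name switch_release and the .tmp suffix are illustrative.

```c
#include <stdio.h>
#include <unistd.h>

/* Point `current_link` at `release_dir` in a single atomic step. */
int switch_release(const char *release_dir, const char *current_link)
{
    char tmp[4096];
    snprintf(tmp, sizeof tmp, "%s.tmp", current_link);

    unlink(tmp);                          /* clear a stale temp link, if any */
    if (symlink(release_dir, tmp) != 0)   /* e.g. current.tmp -> releases/<ts> */
        return -1;
    if (rename(tmp, current_link) != 0) { /* atomic replacement of the symlink */
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

The shell form in the minimal example below (ln -sfn) expresses the same intent; the rename-based variant just makes the single-step replacement explicit.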
How this fits in projects
Sync reliability is core to Project 6 and depends on the connection pool reliability from Project 2.
Definitions & key terms
- Staging directory: Temporary deploy target before switch.
- Atomic switch: Changing a symlink to activate new version.
- Partial deploy: Incomplete file sync state.
Mental model diagram (ASCII)
rsync -> releases/2026-01-01-120000/
validate -> switch symlink: current -> new release
How it works (step-by-step, with invariants and failure modes)
- Create new release directory on remote.
- Sync files to release directory.
- Validate (checksum or test command).
- Atomically update the current symlink.
- Failure mode: sync fails -> do not switch symlink.
Minimal concrete example
rsync -az ./src user@host:/opt/app/releases/2026-01-01-120000/
ssh user@host ln -sfn /opt/app/releases/2026-01-01-120000 /opt/app/current
Common misconceptions
- “Rsync always leaves a clean state”: partial syncs can leave broken files.
- “Atomic switch is optional”: it prevents broken deployments.
Check-your-understanding questions
- Why use a staging directory?
- What does ln -sfn accomplish?
- How do you handle repeated sync failures?
Check-your-understanding answers
- It isolates incomplete updates from running services.
- It atomically updates the symlink to the new release.
- Backoff and report errors; avoid infinite retries.
Real-world applications
- Blue/green and symlink-based deployment systems.
Where you will apply it
- Project 6: See §3.2 Functional Requirements and §5.10 Phase 2.
- Also used in: P02 HTTP Connection Pool.
References
- rsync documentation.
- Capistrano-style deploy patterns.
Key insights
Atomic release switching prevents partial deploys from breaking services.
Summary
Use staging directories and symlink swaps to make deployments atomic and safe.
Homework/exercises to practice the concept
- Sync to a staging directory and switch a symlink.
- Simulate a failed sync and verify the old release remains active.
- Record a diff of changed files for each deploy.
Solutions to the homework/exercises
- The new release should activate only after the symlink update.
- Old release continues running if sync fails.
- The diff shows which files changed during deploy.
2.3 Service Supervision, Restart, and Health Checks
Fundamentals
After a deploy, you need to restart the service and confirm it is healthy. A supervisor handles start/stop/restart and forwards signals safely. Health checks validate whether the service is ready. This can be a local HTTP check, a PID file, or a custom command. The deployment pipeline should not proceed to log streaming until the service is healthy.
Deep Dive into the concept
Restarting a service during deployment must be done carefully. If you kill the process abruptly, it may leave resources in a bad state. The supervisor pattern from Project 3 provides a controlled shutdown: send SIGTERM, wait for exit, then SIGKILL if necessary. After restart, you should run a health check with a timeout. Health checks should be idempotent and fast. A common strategy is to poll a /health endpoint with a timeout and retry interval.
Health checks should be integrated into the deployment state machine. If health checks fail, the deployment should either roll back to the previous release or mark the deployment as failed and leave the system in a safe state. The decision should be configurable.
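A minimal sketch of the graceful-restart half, assuming the supervisor tracks the service PID (stop_service and the 100 ms poll interval are illustrative): send SIGTERM, poll waitpid() for up to a grace period, then escalate to SIGKILL and reap.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Ask the service to exit; escalate to SIGKILL after the grace period. */
int stop_service(pid_t pid, int grace_seconds)
{
    kill(pid, SIGTERM);
    for (int tick = 0; tick < grace_seconds * 10; tick++) {
        if (waitpid(pid, NULL, WNOHANG) == pid)
            return 0;                                   /* clean exit */
        struct timespec ts = { 0, 100 * 1000 * 1000 };  /* 100 ms */
        nanosleep(&ts, NULL);
    }
    kill(pid, SIGKILL);                                 /* SIGTERM was ignored */
    waitpid(pid, NULL, 0);                              /* reap the child */
    return 1;
}
```

The health-check half is then the curl-style probe shown in the minimal example below, repeated with a retry interval until success or an overall timeout.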
How this fits in projects
This concept integrates Project 3’s supervisor logic into Project 6 and uses Project 2’s connection pool for health checks.
Definitions & key terms
- Health check: Test to verify service readiness.
- Rollback: Return to previous release after failure.
- Graceful shutdown: Stop process with SIGTERM before SIGKILL.
Mental model diagram (ASCII)
deploy -> restart -> health check -> success -> log stream
                           \-> fail -> rollback
How it works (step-by-step, with invariants and failure modes)
- Send SIGTERM to service group.
- Wait for exit, then restart.
- Poll health endpoint with timeout.
- If healthy, proceed; if not, rollback.
- Failure mode: no timeout -> deploy hangs indefinitely.
Minimal concrete example
curl -sf http://127.0.0.1:8080/health || exit 1
Common misconceptions
- “Restart means ready”: not necessarily; you need a health check.
- “SIGKILL is safe”: it skips cleanup.
Check-your-understanding questions
- Why separate restart and health check?
- When should you rollback?
- What is a reasonable health check timeout?
Check-your-understanding answers
- A process can start but not be ready.
- When health checks fail after retries.
- Enough to cover startup time, typically a few seconds.
Real-world applications
- Continuous deployment systems.
- Service supervisors in production.
Where you will apply it
- Project 6: See §3.2 and §5.10 Phase 3.
- Also used in: P03 Process Supervisor.
References
- systemd service management docs.
Key insights
A deploy is not complete until health checks pass.
Summary
Integrate supervised restarts with explicit health checks and rollback logic.
Homework/exercises to practice the concept
- Add a health check endpoint to a toy service.
- Simulate failed health checks and trigger rollback.
- Measure time-to-healthy and log it.
Solutions to the homework/exercises
- Health check should return 200 when ready.
- Rollback should restore previous release and keep service running.
- Logs show time from restart to healthy.
2.4 Log Aggregation Across Rotations
Fundamentals
Deployment tools often tail logs to show live output. Log aggregation must survive rotations, restarts, and multiple sources. This is essentially Project 1 integrated into a pipeline. You need ordered, timestamped output with clear source labels and rotation events.
Deep Dive into the concept
When a service restarts, logs may rotate or truncate. A deploy tool should attach to logs via a rotation-aware tailer that tracks inode identity. If the log file is replaced, the tailer must reopen and continue without missing lines. When multiple services or instances are involved, the tool should merge output by timestamp to create a coherent view of system behavior.
A key design decision is whether log streaming is blocking or optional. In a deploy pipeline, log streaming should not block the overall pipeline if logs are unavailable; it should instead emit warnings and continue. This ensures the deploy completes even if logs are delayed or missing.
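A sketch of the rotation check, assuming the tailer keeps both the open descriptor and the original path (reopen_if_rotated is an illustrative name): compare the inode and device of what is open with what the path now points to, and reopen when they differ.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return a descriptor for the live log file, reopening if it was replaced. */
int reopen_if_rotated(int fd, const char *path)
{
    struct stat open_st, path_st;

    if (fstat(fd, &open_st) != 0 || stat(path, &path_st) != 0)
        return fd;                        /* path missing: warn upstream, keep old fd */

    if (open_st.st_ino != path_st.st_ino || open_st.st_dev != path_st.st_dev) {
        int nfd = open(path, O_RDONLY);   /* file was renamed away: follow the new one */
        if (nfd >= 0) {
            close(fd);
            return nfd;
        }
    }
    return fd;
}
```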
How this fits in projects
This concept reuses Project 1’s tailer and is integrated into Project 6 as the final observability layer.
Definitions & key terms
- Aggregation: Combining logs from multiple sources.
- Rotation-aware: Tailer that handles rename and copytruncate.
Mental model diagram (ASCII)
logs A + logs B -> rotation-aware tailers -> ordered output
How it works (step-by-step, with invariants and failure modes)
- Attach tailers to target log files.
- Detect rotations and reopen.
- Merge lines by timestamp.
- Failure mode: missing logs -> warn and continue.
Minimal concrete example
./tailer --config deploy-logs.yml
Common misconceptions
- “Tail -f is enough”: it fails on rename rotations.
- “Log streaming should block deploy”: it should be optional.
Check-your-understanding questions
- Why is rotation handling necessary in deploy logs?
- How do you order logs from multiple sources?
- What should happen if a log file is missing?
Check-your-understanding answers
- Services often rotate logs on restart.
- By timestamp with tie-breakers.
- Emit warning and continue.
Real-world applications
- CI/CD pipelines streaming service logs.
- Incident response during deploys.
Where you will apply it
- Project 6: See §3.2 Functional Requirements and §5.10 Phase 4.
- Also used in: P01 Multi-Source Log Tailer.
References
- Project 1 tailer design.
Key insights
Log streaming is an observability feature, not a gate.
Summary
Integrate a rotation-aware tailer to provide live deploy visibility without blocking.
Homework/exercises to practice the concept
- Tail two log files and merge output by timestamp.
- Rotate a log file mid-stream and verify continuity.
- Simulate missing log and ensure the tool continues.
Solutions to the homework/exercises
- Output should interleave based on timestamps.
- Rotation should be detected and handled without missing lines.
- A warning should be emitted, but deploy continues.
2.5 Deployment State Machine and Determinism
Fundamentals
A deploy pipeline is a state machine: detect changes, sync, restart, health check, stream logs, report. Each step has success and failure transitions. Making this explicit prevents hidden states and ensures predictable recovery. Determinism matters for testing; you should be able to run the pipeline with fixed inputs and get the same report.
Deep Dive into the concept
State machines make failure handling explicit. Each state should define its inputs, outputs, timeouts, and failure transitions. For example, the “sync” state might transition to “restart” on success or “rollback” on failure. Timeouts are critical: if a step hangs, the pipeline should abort or retry. Deterministic behavior is achieved by using fixed seeds for retries, consistent timestamp formatting, and stable ordering of log output. This makes it possible to test the pipeline in CI.
A useful design pattern is to emit a deployment report that includes timestamps, durations, and outcomes for each state. This report should be deterministic in test mode and include a golden path sample. You should also include a failure path sample (e.g., health check fails) with clear exit codes.
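One way to make transitions explicit is a small table keyed by state, sketched below for the post-trigger states (the enum values, table layout, and step callback are illustrative). Because every state names its success and failure successor, there is no hidden control flow, and a log-streaming failure can be marked non-fatal in one place.

```c
#include <stdio.h>

enum state { ST_SYNC, ST_RESTART, ST_HEALTH, ST_LOGS, ST_REPORT, ST_ROLLBACK, ST_DONE };

struct transition { enum state on_ok; enum state on_fail; };

static const struct transition table[] = {
    [ST_SYNC]     = { ST_RESTART, ST_ROLLBACK },
    [ST_RESTART]  = { ST_HEALTH,  ST_ROLLBACK },
    [ST_HEALTH]   = { ST_LOGS,    ST_ROLLBACK },
    [ST_LOGS]     = { ST_REPORT,  ST_REPORT   },   /* log streaming is non-fatal */
    [ST_REPORT]   = { ST_DONE,    ST_DONE     },
    [ST_ROLLBACK] = { ST_REPORT,  ST_REPORT   },   /* always finish with a report */
};

/* step() executes one state and returns 0 on success. */
void run_pipeline(int (*step)(enum state))
{
    enum state s = ST_SYNC;
    while (s != ST_DONE) {
        int ok = (step(s) == 0);
        printf("state %d -> %s\n", (int)s, ok ? "ok" : "fail");
        s = ok ? table[s].on_ok : table[s].on_fail;
    }
}
```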
How this fits in projects
This concept ties all prior components into a single orchestrated flow and defines how the pipeline behaves under failure.
Definitions & key terms
- State machine: A set of states with explicit transitions.
- Rollback: Transition to a previous safe state.
- Deterministic mode: Fixed seeds and timestamps for testing.
Mental model diagram (ASCII)
WATCH -> SYNC -> RESTART -> HEALTH -> LOGS -> REPORT
           |        |          |
           +--------+---fail---+----> ROLLBACK
How it works (step-by-step, with invariants and failure modes)
- Wait for changes.
- Sync to staging and switch release.
- Restart service via supervisor.
- Run health checks with timeout.
- Attach log tailer.
- Emit report and exit.
- Failure mode: step timeout -> rollback.
Minimal concrete example
{"state":"HEALTH","status":"failed","duration_ms":1200}
Common misconceptions
- “Deployment is linear”: it is stateful with failure paths.
- “Testing deploys is too complex”: deterministic mode makes it testable.
Check-your-understanding questions
- Why make the state machine explicit?
- What does deterministic mode enable?
- When should rollback happen?
Check-your-understanding answers
- To define failure behavior and avoid hidden states.
- Reproducible tests and predictable reports.
- When sync or health check fails.
Real-world applications
- CI/CD pipelines.
- Deployment orchestration tools.
Where you will apply it
- Project 6: See §4.1 and §5.10 Phase 4.
- Also used in: P02 HTTP Connection Pool for retries.
References
- State machine design patterns.
Key insights
Explicit states create predictable recovery paths.
Summary
A deploy pipeline should be a deterministic state machine with explicit failure transitions and reports.
Homework/exercises to practice the concept
- Draw a state machine for a simple deploy.
- Add timeout transitions and rollback paths.
- Emit a JSON report for each state.
Solutions to the homework/exercises
- The diagram should include success and failure transitions.
- Timeouts should go to rollback or failure states.
- Reports should include state name and duration.
3. Project Specification
3.1 What You Will Build
A CLI deploy tool that monitors a directory, syncs changes to a remote host, restarts a supervised service, tails logs, and prints a deterministic deploy report.
3.2 Functional Requirements
- Watch: Detect file changes (inotify + debounce).
- Sync: Sync to remote staging directory and switch symlink.
- Restart: Use supervisor to restart service.
- Health: Poll health endpoint with timeout.
- Logs: Stream logs with rotation handling.
- Report: Output a deterministic report and exit code.
3.3 Non-Functional Requirements
- Reliability: Partial syncs do not break running service.
- Observability: Clear step-by-step logs and report.
- Determinism: Fixed seeds and timestamps in test mode.
3.4 Example Usage / Output
$ ./deploy watch ./src user@server:/opt/app
[deploy] detected change: main.c
[deploy] rsync complete in 420ms
[deploy] restarting service via supervisor
[deploy] service healthy in 1.2s
[deploy] log stream attached
3.5 Data Formats / Schemas / Protocols
Deploy report (JSON):
{
"version": 1,
"steps": [
{"name":"sync","status":"ok","duration_ms":420},
{"name":"restart","status":"ok","duration_ms":800},
{"name":"health","status":"ok","duration_ms":1200}
],
"result": "success"
}
3.6 Edge Cases
- Sync fails mid-transfer.
- Health check times out.
- Logs unavailable or rotated during deploy.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make
./deploy watch ./src user@server:/opt/app --deterministic --seed 42
3.7.2 Golden Path Demo (Deterministic)
- Use a test server with fixed health response.
- Report shows deterministic durations in test mode.
3.7.3 Exact Terminal Transcript (CLI)
$ ./deploy watch ./src user@server:/opt/app --seed 1
[deploy] detected change: main.c
[deploy] rsync complete in 400ms
[deploy] restart ok
[deploy] health ok
[deploy] report written to deploy.json
Failure demo (health check timeout):
$ ./deploy watch ./src user@server:/opt/app
[deploy] restart ok
[deploy] health timeout after 5s
[deploy] rollback to previous release
Exit codes:
- 0 on success.
- 8 on sync failure.
- 9 on health check failure.
4. Solution Architecture
4.1 High-Level Design
Watcher -> Sync -> Restart -> Health -> Logs -> Report
\--------------------------------------/
Rollback on failure
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Watcher | Detect changes and debounce | inotify + rescan |
| Syncer | Rsync to staging + symlink switch | atomic release switch |
| Supervisor | Restart service | reuse Project 3 logic |
| Health check | Validate readiness | HTTP endpoint via pool |
| Logger | Stream logs | reuse Project 1 tailer |
| Reporter | Emit deterministic report | JSON output |
4.3 Data Structures (No Full Code)
struct step_result { char name[32]; int status; int64_t duration_ms; }; /* one entry per pipeline step: state name, 0/error status, wall-clock duration */
4.4 Algorithm Overview
Key Algorithm: Deploy state machine
- Detect change, build change set.
- Sync to staging, verify.
- Restart service, wait for exit.
- Health check with retries.
- Attach log tailer.
- Emit report and exit.
Complexity Analysis:
- Time: dominated by sync and restart durations.
- Space: O(n) for change set.
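A sketch of the reporter, assuming step results are collected into the struct from §4.3 (write_report and the "failure" result string are assumptions; only "success" appears in the §3.5 sample): it serializes the array into the report format by hand, which keeps the output byte-stable for deterministic mode.

```c
#include <stdint.h>
#include <stdio.h>

struct step_result { char name[32]; int status; int64_t duration_ms; };  /* mirrors §4.3 */

/* Write the §3.5 JSON report for n completed steps. */
void write_report(FILE *out, const struct step_result *steps, int n, int success)
{
    fprintf(out, "{\"version\":1,\"steps\":[");
    for (int i = 0; i < n; i++)
        fprintf(out, "%s{\"name\":\"%s\",\"status\":\"%s\",\"duration_ms\":%lld}",
                i ? "," : "", steps[i].name,
                steps[i].status == 0 ? "ok" : "failed",
                (long long)steps[i].duration_ms);
    fprintf(out, "],\"result\":\"%s\"}\n", success ? "success" : "failure");
}
```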
5. Implementation Guide
5.1 Development Environment Setup
sudo apt-get install -y gcc make rsync openssh-client
5.2 Project Structure
deploy-tool/
├── src/
│ ├── main.c
│ ├── watch.c
│ ├── sync.c
│ ├── supervisor.c
│ ├── health.c
│ ├── logs.c
│ └── report.c
├── include/
│ └── deploy.h
└── Makefile
5.3 The Core Question You’re Answering
“How do I make multiple systems act as one coherent tool under failure?”
5.4 Concepts You Must Understand First
- File watching and debounce.
- Atomic sync and rollback.
- Supervisor restart logic.
- Health checks and timeouts.
- Log aggregation and rotation handling.
5.5 Questions to Guide Your Design
- What is the minimal deploy state machine?
- How do you handle partial sync failures?
- When do you rollback vs retry?
5.6 Thinking Exercise
Partial Deploy Scenario
Sync 60% of files, then network drops.
Should you roll back, retry, or fail fast?
5.7 The Interview Questions They’ll Ask
- How do you design a deploy tool that survives partial failure?
- How do you ensure logs keep flowing across restarts?
- Why do you need atomic symlink switching?
5.8 Hints in Layers
Hint 1: File watching
// inotify + debounce window
Hint 2: Sync to staging
rsync -az ./src user@host:/opt/app/releases/ID/
Hint 3: Supervisor integration
// reuse Project 3 restart logic
Hint 4: Log tailer
// reuse Project 1 rotation-aware tailer
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Processes | APUE | Ch. 8-9 |
| File I/O | TLPI | Ch. 4 |
| Networking | TCP/IP Sockets in C | Ch. 2-4 |
5.10 Implementation Phases
Phase 1: Watch + Change Set (4-6 days)
Goals:
- Detect file changes and build deploy triggers.
Tasks:
- Implement inotify watcher.
- Add debounce and rescan fallback.
Checkpoint: Change detection triggers deploy.
Phase 2: Sync + Rollback (5-7 days)
Goals:
- Reliable sync and atomic switch.
Tasks:
- Sync to staging directory.
- Switch symlink on success.
- Rollback on failure.
Checkpoint: Partial sync never breaks active release.
Phase 3: Restart + Health (5-7 days)
Goals:
- Restart service and confirm readiness.
Tasks:
- Integrate supervisor restart logic.
- Implement health check retries.
Checkpoint: Failed health checks trigger rollback.
Phase 4: Logs + Report (4-6 days)
Goals:
- Stream logs and emit report.
Tasks:
- Attach rotation-aware tailer.
- Emit deterministic JSON report.
Checkpoint: Deploy report matches golden path.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Watch strategy | inotify, polling | inotify + rescan | Reliable and responsive |
| Sync strategy | direct, staging + symlink | staging + symlink | Atomicity |
| Failure policy | rollback, fail fast | rollback | Safer for production |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | State transitions | state machine tests |
| Integration Tests | Sync + restart + health | deploy script |
| Edge Case Tests | Sync failure and rollback | network drop test |
6.2 Critical Test Cases
- Sync fails mid-transfer; symlink remains on old release.
- Health check times out; rollback occurs.
- Log tailer survives rotation during deploy.
6.3 Test Data
Mock health endpoint: returns 200 OK
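For the mock endpoint, a throwaway local server like the sketch below is enough (port 8080, the loopback bind, and the omitted error handling are assumptions): it answers every connection with 200 OK so the health and timeout paths can be driven without a real service.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    const char reply[] =
        "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok";
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(8080),
                                .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
    bind(srv, (struct sockaddr *)&addr, sizeof addr);
    listen(srv, 16);

    for (;;) {
        int c = accept(srv, NULL, NULL);       /* one request per connection */
        if (c < 0)
            continue;
        write(c, reply, sizeof reply - 1);     /* always report healthy */
        close(c);
    }
}
```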
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| No atomic switch | Broken deploy on failure | Use staging + symlink |
| Missing timeouts | Deploy hangs | Add step timeouts |
| Log tailer follows old inode | Logs stop after restart | Use rotation-aware tailer |
7.2 Debugging Strategies
- Log each state transition with durations.
- Use a dry-run mode to print actions without executing.
- Add verbose mode for rsync and ssh output.
7.3 Performance Traps
- Deploying on every single file change without debounce.
- Running health checks too frequently without backoff.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a --dry-run mode.
- Add a simple HTML report output.
8.2 Intermediate Extensions
- Add parallel sync to multiple hosts.
- Add config file support.
8.3 Advanced Extensions
- Add canary deploy mode.
- Add rollback based on log error patterns.
9. Real-World Connections
9.1 Industry Applications
- Capistrano-style deploys: staging + symlink pattern.
- CI/CD pipelines: automated deploy with health checks.
9.2 Related Open Source Projects
- Capistrano: Ruby-based deployment tool.
- Fabric: Python deployment automation.
9.3 Interview Relevance
- Deployment pipelines demonstrate systems integration skills.
10. Resources
10.1 Essential Reading
- The Linux Programming Interface - file I/O and process control.
- Release It! - resilience patterns.
10.2 Video Resources
- Talks on deploy pipelines and zero-downtime deploys.
10.3 Tools & Documentation
man 7 inotify, rsync manual, ssh manual.
10.4 Related Projects in This Series
- P01 Multi-Source Log Tailer
- P02 HTTP Connection Pool
- P03 Process Supervisor
- P05 Environment Debugger
11. Self-Assessment Checklist
11.1 Understanding
- I can explain why deployments need atomic switches.
- I can explain why health checks are required.
- I can describe the deploy state machine.
11.2 Implementation
- All functional requirements are met.
- Rollbacks work on failure.
- Logs stream across restarts.
11.3 Growth
- I can explain the integration of Projects 1-5.
- I can justify my failure handling policy.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Detect changes, sync to remote, restart service, health check.
- Emit deploy report with exit code.
Full Completion:
- Rollback on failure and log streaming integrated.
- Deterministic test mode with golden path.
Excellence (Going Above & Beyond):
- Canary deploys and multi-host fanout.
- Rich reports with timelines and log excerpts.