Project 6: Deployment Pipeline Tool (Final Integration)
Build a deployment tool that watches a directory, syncs changes to a remote host, restarts a supervised service, aggregates logs, and reports health.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 3-4 weeks |
| Main Programming Language | C |
| Alternative Programming Languages | Go, Rust |
| Coolness Level | Level 5 - Real integration capstone |
| Business Potential | Level 4 - DevOps tooling |
| Prerequisites | All prior projects, SSH/rsync basics |
| Key Topics | File watching, orchestration, supervision, log streaming |
1. Learning Objectives
By completing this project, you will:
- Orchestrate file watching, sync, restart, and health checks as a single tool.
- Integrate the log tailer, connection pool, and supervisor concepts.
- Handle partial failures with rollback and retries.
- Provide deterministic deployment reports.
- Build a deploy pipeline that fails safely under real OS constraints.
2. All Theory Needed (Per-Concept Breakdown)
2.1 File Watching and Change Detection
Fundamentals
File watching detects changes in a directory so you can trigger a deploy. On Linux, inotify provides event notifications for file creation, modification, and deletion. A polling approach periodically scans file timestamps to detect changes. Inotify is efficient but can drop events under heavy load; polling is simpler but less responsive. A robust deploy tool should support both or use inotify with periodic sanity scans.
Deep Dive into the concept
Inotify watches directories and reports events via a file descriptor. Each event includes a mask (e.g., IN_MODIFY, IN_CREATE) and a file name. However, inotify has limits: the event queue can overflow, and you can exceed watch limits (fs.inotify.max_user_watches). When this happens, events are lost. A deploy tool must detect this and recover by doing a full rescan. Polling, by contrast, is simple and robust but can be CPU-expensive. A hybrid approach is ideal: use inotify for real-time detection and fall back to periodic scans for consistency.
Change detection also needs filtering. Temporary files, editor swap files, and build artifacts can cause noisy deploys. Your tool should support ignore patterns and configurable includes. It should also debounce bursts of changes (e.g., saving many files at once) by grouping changes within a small time window before triggering a deploy.
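The sketch below combines these ideas, assuming a watch has already been registered as in the minimal example further down: events are drained from the inotify descriptor, IN_Q_OVERFLOW is treated as a cue to schedule a full rescan, and a poll() timeout implements the debounce window. The names watch_loop and DEBOUNCE_MS are illustrative, not part of any required API.

```c
#include <sys/inotify.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

#define DEBOUNCE_MS 200

void watch_loop(int fd)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int pending = 0;                       /* events seen in the current window */

    for (;;) {
        /* Block until something happens; once changes are pending, wait only
           DEBOUNCE_MS so a quiet gap flushes the batch as one deploy trigger. */
        int n = poll(&pfd, 1, pending ? DEBOUNCE_MS : -1);
        if (n == 0 && pending) {
            printf("change set ready (%d events) -> trigger deploy\n", pending);
            pending = 0;
            continue;
        }
        if (n <= 0)
            continue;

        ssize_t len = read(fd, buf, sizeof buf);
        if (len <= 0)
            continue;
        for (char *p = buf; p < buf + len; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            if (ev->mask & IN_Q_OVERFLOW)
                printf("queue overflow -> schedule full rescan\n");
            else
                pending++;
            p += sizeof *ev + ev->len;
        }
    }
}
```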
How this fits in projects
File watching is the trigger for the deploy pipeline in Project 6. It can reuse the polling techniques from Project 1 and the event loop patterns from Project 2.
Definitions & key terms
- inotify: Linux filesystem event notification system.
- Debounce: Delay to group rapid events into one action.
- Rescan: Full directory scan to recover from event loss.
Mental model diagram (ASCII)
fs events -> debounce -> change set -> deploy pipeline
How it works (step-by-step, with invariants and failure modes)
- Register inotify watches or start polling.
- Capture change events into a queue.
- Debounce for a short window (e.g., 200ms).
- Produce a change set and trigger deploy.
- Failure mode: inotify queue overflow -> fallback to rescan.
Minimal concrete example
int fd = inotify_init1(IN_NONBLOCK);   /* from <sys/inotify.h> */
int wd = inotify_add_watch(fd, path, IN_CREATE | IN_MODIFY | IN_DELETE);
/* check fd and wd for -1; fall back to polling if watches cannot be added */
Common misconceptions
- “Inotify never drops events”: it can under heavy load.
- “Polling is always bad”: it is reliable and sometimes sufficient.
Check-your-understanding questions
- What happens when inotify queue overflows?
- Why debounce events?
- What is a reasonable fallback strategy?
Check-your-understanding answers
- Events are lost; you must rescan.
- To avoid deploying on every tiny change.
- Perform a full scan and rebuild the change set.
Real-world applications
- Hot reloaders in development tools.
- CI systems that detect source changes.
Where you will apply it
- Project 6: See §3.1 and §5.10 Phase 1.
- Also used in: P01 Multi-Source Log Tailer for polling logic.
References
- man 7 inotify.
Key insights
Change detection must be robust to event loss and noisy edits.
Summary
File watching is the trigger mechanism; reliability comes from debounce and rescan logic.
Homework/exercises to practice the concept
- Implement a small inotify watcher and log events.
- Simulate queue overflow by generating many events.
- Add a debounce timer and group events into a batch.
Solutions to the homework/exercises
- Events should show CREATE/MODIFY/DELETE masks.
- Overflow produces IN_Q_OVERFLOW; you must rescan.
- Debounce groups changes into a single deploy trigger.
2.2 Reliable Sync and Partial Deployment Handling
Fundamentals
Deployment sync copies files to a remote host. Tools like rsync are efficient because they transfer only changed blocks. But sync can fail mid-way due to network loss or permissions. A deploy tool must detect partial syncs and decide whether to retry or rollback.
Deep Dive into the concept
Rsync uses file checksums and rolling hashes to send only changed blocks. It can also preserve permissions and timestamps. However, if the connection drops mid-transfer, the remote directory may be in a partially updated state. This can break running services. To avoid this, a common pattern is to sync into a staging directory, then atomically switch a symlink to point to the new version. If sync fails, you leave the old symlink intact and the service continues running.
Another approach is to use a temporary file naming scheme and rename on completion. This provides atomicity at the file level but not at the directory level. For a deployment tool, directory-level atomicity is more important. Therefore, the recommended approach is: sync to releases/<timestamp>/, validate, then switch a current symlink. This pattern is used by many deployment systems.
Retries must be bounded. If a sync fails repeatedly, you should back off and report an error. The tool should also provide a diff report of which files were updated in the last attempt to aid debugging.
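As a sketch of the switch step, assuming the tool (or a small helper run on the remote host) can call into libc directly: create the new symlink under a temporary name, then rename(2) it over current. rename is atomic on the same filesystem, so readers always see either the old release or the new one, never a missing link. The function name switch_release and the .tmp suffix are illustrative.

```c
#include <stdio.h>
#include <unistd.h>

/* Point `current_link` at `release_dir` in a single atomic step. */
int switch_release(const char *release_dir, const char *current_link)
{
    char tmp[4096];
    snprintf(tmp, sizeof tmp, "%s.tmp", current_link);

    unlink(tmp);                          /* clear a stale temp link, if any */
    if (symlink(release_dir, tmp) != 0)   /* e.g. current.tmp -> releases/<ts> */
        return -1;
    if (rename(tmp, current_link) != 0) { /* atomic replacement of the symlink */
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

The shell form in the minimal example below (ln -sfn) expresses the same intent; the rename-based variant just makes the single-step replacement explicit.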
How this fits in projects
Sync reliability is core to Project 6 and depends on the connection pool reliability from Project 2.
Definitions & key terms
- Staging directory: Temporary deploy target before switch.
- Atomic switch: Changing a symlink to activate new version.
- Partial deploy: Incomplete file sync state.
Mental model diagram (ASCII)
rsync -> releases/2026-01-01-120000/
validate -> switch symlink: current -> new release
How it works (step-by-step, with invariants and failure modes)
- Create new release directory on remote.
- Sync files to release directory.
- Validate (checksum or test command).
- Atomically update the current symlink.
- Failure mode: sync fails -> do not switch symlink.
Minimal concrete example
rsync -az ./src user@host:/opt/app/releases/2026-01-01-120000/
ssh user@host ln -sfn /opt/app/releases/2026-01-01-120000 /opt/app/current
Common misconceptions
- “Rsync always leaves a clean state”: partial syncs can leave broken files.
- “Atomic switch is optional”: it prevents broken deployments.
Check-your-understanding questions
- Why use a staging directory?
- What does ln -sfn accomplish?
- How do you handle repeated sync failures?
Check-your-understanding answers
- It isolates incomplete updates from running services.
- It atomically updates the symlink to the new release.
- Backoff and report errors; avoid infinite retries.
Real-world applications
- Blue/green and symlink-based deployment systems.
Where you will apply it
- Project 6: See §3.2 Functional Requirements and §5.10 Phase 2.
- Also used in: P02 HTTP Connection Pool.
References
- rsync documentation.
- Capistrano-style deploy patterns.
Key insights
Atomic release switching prevents partial deploys from breaking services.
Summary
Use staging directories and symlink swaps to make deployments atomic and safe.
Homework/exercises to practice the concept
- Sync to a staging directory and switch a symlink.
- Simulate a failed sync and verify the old release remains active.
- Record a diff of changed files for each deploy.
Solutions to the homework/exercises
- The new release should activate only after the symlink update.
- Old release continues running if sync fails.
- The diff shows which files changed during deploy.
2.3 Service Supervision, Restart, and Health Checks
Fundamentals
After a deploy, you need to restart the service and confirm it is healthy. A supervisor handles start/stop/restart and forwards signals safely. Health checks validate whether the service is ready. This can be a local HTTP check, a PID file, or a custom command. The deployment pipeline should not proceed to log streaming until the service is healthy.
Deep Dive into the concept
Restarting a service during deployment must be done carefully. If you kill the process abruptly, it may leave resources in a bad state. The supervisor pattern from Project 3 provides a controlled shutdown: send SIGTERM, wait for exit, then SIGKILL if necessary. After restart, you should run a health check with a timeout. Health checks should be idempotent and fast. A common strategy is to poll a /health endpoint with a timeout and retry interval.
Health checks should be integrated into the deployment state machine. If health checks fail, the deployment should either roll back to the previous release or mark the deployment as failed and leave the system in a safe state. The decision should be configurable.
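A minimal sketch of the graceful-restart half, assuming the supervisor tracks the service PID (stop_service and the 100 ms poll interval are illustrative): send SIGTERM, poll waitpid() for up to a grace period, then escalate to SIGKILL and reap.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

/* Ask the service to exit; escalate to SIGKILL after the grace period. */
int stop_service(pid_t pid, int grace_seconds)
{
    kill(pid, SIGTERM);
    for (int tick = 0; tick < grace_seconds * 10; tick++) {
        if (waitpid(pid, NULL, WNOHANG) == pid)
            return 0;                                   /* clean exit */
        struct timespec ts = { 0, 100 * 1000 * 1000 };  /* 100 ms */
        nanosleep(&ts, NULL);
    }
    kill(pid, SIGKILL);                                 /* SIGTERM was ignored */
    waitpid(pid, NULL, 0);                              /* reap the child */
    return 1;
}
```

The health-check half is then the curl-style probe shown in the minimal example below, repeated with a retry interval until success or an overall timeout.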
How this fits in projects
This concept integrates Project 3’s supervisor logic into Project 6 and uses Project 2’s connection pool for health checks.
Definitions & key terms
- Health check: Test to verify service readiness.
- Rollback: Return to previous release after failure.
- Graceful shutdown: Stop process with SIGTERM before SIGKILL.
Mental model diagram (ASCII)
deploy -> restart -> health check -> success -> log stream
                           \-> fail -> rollback
How it works (step-by-step, with invariants and failure modes)
- Send SIGTERM to service group.
- Wait for exit, then restart.
- Poll health endpoint with timeout.
- If healthy, proceed; if not, rollback.
- Failure mode: no timeout -> deploy hangs indefinitely.
Minimal concrete example
curl -sf http://127.0.0.1:8080/health || exit 1
Common misconceptions
- “Restart means ready”: not necessarily; you need a health check.
- “SIGKILL is safe”: it skips cleanup.
Check-your-understanding questions
- Why separate restart and health check?
- When should you rollback?
- What is a reasonable health check timeout?
Check-your-understanding answers
- A process can start but not be ready.
- When health checks fail after retries.
- Enough to cover startup time, typically a few seconds.
Real-world applications
- Continuous deployment systems.
- Service supervisors in production.
Where you will apply it
- Project 6: See §3.2 and §5.10 Phase 3.
- Also used in: P03 Process Supervisor.
References
- systemd service management docs.
Key insights
A deploy is not complete until health checks pass.
Summary
Integrate supervised restarts with explicit health checks and rollback logic.
Homework/exercises to practice the concept
- Add a health check endpoint to a toy service.
- Simulate failed health checks and trigger rollback.
- Measure time-to-healthy and log it.
Solutions to the homework/exercises
- Health check should return 200 when ready.
- Rollback should restore previous release and keep service running.
- Logs show time from restart to healthy.
2.4 Log Aggregation Across Rotations
Fundamentals
Deployment tools often tail logs to show live output. Log aggregation must survive rotations, restarts, and multiple sources. This is essentially Project 1 integrated into a pipeline. You need ordered, timestamped output with clear source labels and rotation events.
Deep Dive into the concept
When a service restarts, logs may rotate or truncate. A deploy tool should attach to logs via a rotation-aware tailer that tracks inode identity. If the log file is replaced, the tailer must reopen and continue without missing lines. When multiple services or instances are involved, the tool should merge output by timestamp to create a coherent view of system behavior.
A key design decision is whether log streaming is blocking or optional. In a deploy pipeline, log streaming should not block the overall pipeline if logs are unavailable; it should instead emit warnings and continue. This ensures the deploy completes even if logs are delayed or missing.
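A sketch of the rotation check, assuming the tailer keeps both the open descriptor and the original path (reopen_if_rotated is an illustrative name): compare the inode and device of what is open with what the path now points to, and reopen when they differ.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return a descriptor for the live log file, reopening if it was replaced. */
int reopen_if_rotated(int fd, const char *path)
{
    struct stat open_st, path_st;

    if (fstat(fd, &open_st) != 0 || stat(path, &path_st) != 0)
        return fd;                        /* path missing: warn upstream, keep old fd */

    if (open_st.st_ino != path_st.st_ino || open_st.st_dev != path_st.st_dev) {
        int nfd = open(path, O_RDONLY);   /* file was renamed away: follow the new one */
        if (nfd >= 0) {
            close(fd);
            return nfd;
        }
    }
    return fd;
}
```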
How this fits in projects
This concept reuses Project 1’s tailer and is integrated into Project 6 as the final observability layer.
Definitions & key terms
- Aggregation: Combining logs from multiple sources.
- Rotation-aware: Tailer that handles rename and copytruncate.
Mental model diagram (ASCII)
logs A + logs B -> rotation-aware tailers -> ordered output
How it works (step-by-step, with invariants and failure modes)
- Attach tailers to target log files.
- Detect rotations and reopen.
- Merge lines by timestamp.
- Failure mode: missing logs -> warn and continue.
Minimal concrete example
./tailer --config deploy-logs.yml
Common misconceptions
- “Tail -f is enough”: it fails on rename rotations.
- “Log streaming should block deploy”: it should be optional.
Check-your-understanding questions
- Why is rotation handling necessary in deploy logs?
- How do you order logs from multiple sources?
- What should happen if a log file is missing?
Check-your-understanding answers
- Services often rotate logs on restart.
- By timestamp with tie-breakers.
- Emit warning and continue.
Real-world applications
- CI/CD pipelines streaming service logs.
- Incident response during deploys.
Where you will apply it
- Project 6: See §3.2 Functional Requirements and §5.10 Phase 4.
- Also used in: P01 Multi-Source Log Tailer.
References
- Project 1 tailer design.
Key insights
Log streaming is an observability feature, not a gate.
Summary
Integrate a rotation-aware tailer to provide live deploy visibility without blocking.
Homework/exercises to practice the concept
- Tail two log files and merge output by timestamp.
- Rotate a log file mid-stream and verify continuity.
- Simulate missing log and ensure the tool continues.
Solutions to the homework/exercises
- Output should interleave based on timestamps.
- Rotation should be detected and handled without missing lines.
- A warning should be emitted, but deploy continues.
2.5 Deployment State Machine and Determinism
Fundamentals
A deploy pipeline is a state machine: detect changes, sync, restart, health check, stream logs, report. Each step has success and failure transitions. Making this explicit prevents hidden states and ensures predictable recovery. Determinism matters for testing; you should be able to run the pipeline with fixed inputs and get the same report.
Deep Dive into the concept
State machines make failure handling explicit. Each state should define its inputs, outputs, timeouts, and failure transitions. For example, the “sync” state might transition to “restart” on success or “rollback” on failure. Timeouts are critical: if a step hangs, the pipeline should abort or retry. Deterministic behavior is achieved by using fixed seeds for retries, consistent timestamp formatting, and stable ordering of log output. This makes it possible to test the pipeline in CI.
A useful design pattern is to emit a deployment report that includes timestamps, durations, and outcomes for each state. This report should be deterministic in test mode and include a golden path sample. You should also include a failure path sample (e.g., health check fails) with clear exit codes.
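One way to make transitions explicit is a small table keyed by state, sketched below for the post-trigger states (the enum values, table layout, and step callback are illustrative). Because every state names its success and failure successor, there is no hidden control flow, and a log-streaming failure can be marked non-fatal in one place.

```c
#include <stdio.h>

enum state { ST_SYNC, ST_RESTART, ST_HEALTH, ST_LOGS, ST_REPORT, ST_ROLLBACK, ST_DONE };

struct transition { enum state on_ok; enum state on_fail; };

static const struct transition table[] = {
    [ST_SYNC]     = { ST_RESTART, ST_ROLLBACK },
    [ST_RESTART]  = { ST_HEALTH,  ST_ROLLBACK },
    [ST_HEALTH]   = { ST_LOGS,    ST_ROLLBACK },
    [ST_LOGS]     = { ST_REPORT,  ST_REPORT   },   /* log streaming is non-fatal */
    [ST_REPORT]   = { ST_DONE,    ST_DONE     },
    [ST_ROLLBACK] = { ST_REPORT,  ST_REPORT   },   /* always finish with a report */
};

/* step() executes one state and returns 0 on success. */
void run_pipeline(int (*step)(enum state))
{
    enum state s = ST_SYNC;
    while (s != ST_DONE) {
        int ok = (step(s) == 0);
        printf("state %d -> %s\n", (int)s, ok ? "ok" : "fail");
        s = ok ? table[s].on_ok : table[s].on_fail;
    }
}
```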
How this fits in projects
This concept ties all prior components into a single orchestrated flow and defines how the pipeline behaves under failure.
Definitions & key terms
- State machine: A set of states with explicit transitions.
- Rollback: Transition to a previous safe state.
- Deterministic mode: Fixed seeds and timestamps for testing.
Mental model diagram (ASCII)
WATCH -> SYNC -> RESTART -> HEALTH -> LOGS -> REPORT
           |        |          |
           +--------+---fail---+----> ROLLBACK
How it works (step-by-step, with invariants and failure modes)
- Wait for changes.
- Sync to staging and switch release.
- Restart service via supervisor.
- Run health checks with timeout.
- Attach log tailer.
- Emit report and exit.
- Failure mode: step timeout -> rollback.
Minimal concrete example
{"state":"HEALTH","status":"failed","duration_ms":1200}
Common misconceptions
- “Deployment is linear”: it is stateful with failure paths.
- “Testing deploys is too complex”: deterministic mode makes it testable.
Check-your-understanding questions
- Why make the state machine explicit?
- What does deterministic mode enable?
- When should rollback happen?
Check-your-understanding answers
- To define failure behavior and avoid hidden states.
- Reproducible tests and predictable reports.
- When sync or health check fails.
Real-world applications
- CI/CD pipelines.
- Deployment orchestration tools.
Where you will apply it
- Project 6: See §4.1 and §5.10 Phase 4.
- Also used in: P02 HTTP Connection Pool for retries.
References
- State machine design patterns.
Key insights
Explicit states create predictable recovery paths.
Summary
A deploy pipeline should be a deterministic state machine with explicit failure transitions and reports.
Homework/exercises to practice the concept
- Draw a state machine for a simple deploy.
- Add timeout transitions and rollback paths.
- Emit a JSON report for each state.
Solutions to the homework/exercises
- The diagram should include success and failure transitions.
- Timeouts should go to rollback or failure states.
- Reports should include state name and duration.
3. Project Specification
3.1 What You Will Build
A CLI deploy tool that monitors a directory, syncs changes to a remote host, restarts a supervised service, tails logs, and prints a deterministic deploy report.
3.2 Functional Requirements
- Watch: Detect file changes (inotify + debounce).
- Sync: Sync to remote staging directory and switch symlink.
- Restart: Use supervisor to restart service.
- Health: Poll health endpoint with timeout.
- Logs: Stream logs with rotation handling.
- Report: Output a deterministic report and exit code.
3.3 Non-Functional Requirements
- Reliability: Partial syncs do not break running service.
- Observability: Clear step-by-step logs and report.
- Determinism: Fixed seeds and timestamps in test mode.
3.4 Example Usage / Output
$ ./deploy watch ./src user@server:/opt/app
[deploy] detected change: main.c
[deploy] rsync complete in 420ms
[deploy] restarting service via supervisor
[deploy] service healthy in 1.2s
[deploy] log stream attached
3.5 Data Formats / Schemas / Protocols
Deploy report (JSON):
{
"version": 1,
"steps": [
{"name":"sync","status":"ok","duration_ms":420},
{"name":"restart","status":"ok","duration_ms":800},
{"name":"health","status":"ok","duration_ms":1200}
],
"result": "success"
}
3.6 Edge Cases
- Sync fails mid-transfer.
- Health check times out.
- Logs unavailable or rotated during deploy.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make
./deploy watch ./src user@server:/opt/app --deterministic --seed 42
3.7.2 Golden Path Demo (Deterministic)
- Use a test server with fixed health response.
- Report shows deterministic durations in test mode.
3.7.3 Exact Terminal Transcript (CLI)
$ ./deploy watch ./src user@server:/opt/app --seed 1
[deploy] detected change: main.c
[deploy] rsync complete in 400ms
[deploy] restart ok
[deploy] health ok
[deploy] report written to deploy.json
Failure demo (health check timeout):
$ ./deploy watch ./src user@server:/opt/app
[deploy] restart ok
[deploy] health timeout after 5s
[deploy] rollback to previous release
Exit codes:
- 0 on success.
- 8 on sync failure.
- 9 on health check failure.
4. Solution Architecture
4.1 High-Level Design
Watcher -> Sync -> Restart -> Health -> Logs -> Report
\--------------------------------------/
Rollback on failure
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Watcher | Detect changes and debounce | inotify + rescan |
| Syncer | Rsync to staging + symlink switch | atomic release switch |
| Supervisor | Restart service | reuse Project 3 logic |
| Health check | Validate readiness | HTTP endpoint via pool |
| Logger | Stream logs | reuse Project 1 tailer |
| Reporter | Emit deterministic report | JSON output |
4.3 Data Structures (No Full Code)
struct step_result { char name[32]; int status; int64_t duration_ms; }; /* one entry per pipeline step: state name, 0/error status, wall-clock duration */
4.4 Algorithm Overview
Key Algorithm: Deploy state machine
- Detect change, build change set.
- Sync to staging, verify.
- Restart service, wait for exit.
- Health check with retries.
- Attach log tailer.
- Emit report and exit.
Complexity Analysis:
- Time: dominated by sync and restart durations.
- Space: O(n) for change set.
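A sketch of the reporter, assuming step results are collected into the struct from §4.3 (write_report and the "failure" result string are assumptions; only "success" appears in the §3.5 sample): it serializes the array into the report format by hand, which keeps the output byte-stable for deterministic mode.

```c
#include <stdint.h>
#include <stdio.h>

struct step_result { char name[32]; int status; int64_t duration_ms; };  /* mirrors §4.3 */

/* Write the §3.5 JSON report for n completed steps. */
void write_report(FILE *out, const struct step_result *steps, int n, int success)
{
    fprintf(out, "{\"version\":1,\"steps\":[");
    for (int i = 0; i < n; i++)
        fprintf(out, "%s{\"name\":\"%s\",\"status\":\"%s\",\"duration_ms\":%lld}",
                i ? "," : "", steps[i].name,
                steps[i].status == 0 ? "ok" : "failed",
                (long long)steps[i].duration_ms);
    fprintf(out, "],\"result\":\"%s\"}\n", success ? "success" : "failure");
}
```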
5. Implementation Guide
5.1 Development Environment Setup
sudo apt-get install -y gcc make rsync openssh-client
5.2 Project Structure
deploy-tool/
├── src/
│ ├── main.c
│ ├── watch.c
│ ├── sync.c
│ ├── supervisor.c
│ ├── health.c
│ ├── logs.c
│ └── report.c
├── include/
│ └── deploy.h
└── Makefile
5.3 The Core Question You’re Answering
“How do I make multiple systems act as one coherent tool under failure?”
5.4 Concepts You Must Understand First
- File watching and debounce.
- Atomic sync and rollback.
- Supervisor restart logic.
- Health checks and timeouts.
- Log aggregation and rotation handling.
5.5 Questions to Guide Your Design
- What is the minimal deploy state machine?
- How do you handle partial sync failures?
- When do you rollback vs retry?
5.6 Thinking Exercise
Partial Deploy Scenario
Sync 60% of files, then network drops.
Should you roll back, retry, or fail fast?
5.7 The Interview Questions They’ll Ask
- How do you design a deploy tool that survives partial failure?
- How do you ensure logs keep flowing across restarts?
- Why do you need atomic symlink switching?
5.8 Hints in Layers
Hint 1: File watching
// inotify + debounce window
Hint 2: Sync to staging
rsync -az ./src user@host:/opt/app/releases/ID/
Hint 3: Supervisor integration
// reuse Project 3 restart logic
Hint 4: Log tailer
// reuse Project 1 rotation-aware tailer
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Processes | APUE | Ch. 8-9 |
| File I/O | TLPI | Ch. 4 |
| Networking | TCP/IP Sockets in C | Ch. 2-4 |
5.10 Implementation Phases
Phase 1: Watch + Change Set (4-6 days)
Goals:
- Detect file changes and build deploy triggers.
Tasks:
- Implement inotify watcher.
- Add debounce and rescan fallback.
Checkpoint: Change detection triggers deploy.
Phase 2: Sync + Rollback (5-7 days)
Goals:
- Reliable sync and atomic switch.
Tasks:
- Sync to staging directory.
- Switch symlink on success.
- Rollback on failure.
Checkpoint: Partial sync never breaks active release.
Phase 3: Restart + Health (5-7 days)
Goals:
- Restart service and confirm readiness.
Tasks:
- Integrate supervisor restart logic.
- Implement health check retries.
Checkpoint: Failed health checks trigger rollback.
Phase 4: Logs + Report (4-6 days)
Goals:
- Stream logs and emit report.
Tasks:
- Attach rotation-aware tailer.
- Emit deterministic JSON report.
Checkpoint: Deploy report matches golden path.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Watch strategy | inotify, polling | inotify + rescan | Reliable and responsive |
| Sync strategy | direct, staging + symlink | staging + symlink | Atomicity |
| Failure policy | rollback, fail fast | rollback | Safer for production |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | State transitions | state machine tests |
| Integration Tests | Sync + restart + health | deploy script |
| Edge Case Tests | Sync failure and rollback | network drop test |
6.2 Critical Test Cases
- Sync fails mid-transfer; symlink remains on old release.
- Health check times out; rollback occurs.
- Log tailer survives rotation during deploy.
6.3 Test Data
Mock health endpoint: returns 200 OK
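For the mock endpoint, a throwaway local server like the sketch below is enough (port 8080, the loopback bind, and the omitted error handling are assumptions): it answers every connection with 200 OK so the health and timeout paths can be driven without a real service.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    const char reply[] =
        "HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok";
    int srv = socket(AF_INET, SOCK_STREAM, 0);
    int one = 1;
    setsockopt(srv, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);

    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(8080),
                                .sin_addr.s_addr = htonl(INADDR_LOOPBACK) };
    bind(srv, (struct sockaddr *)&addr, sizeof addr);
    listen(srv, 16);

    for (;;) {
        int c = accept(srv, NULL, NULL);       /* one request per connection */
        if (c < 0)
            continue;
        write(c, reply, sizeof reply - 1);     /* always report healthy */
        close(c);
    }
}
```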
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| No atomic switch | Broken deploy on failure | Use staging + symlink |
| Missing timeouts | Deploy hangs | Add step timeouts |
| Log tailer follows old inode | Logs stop after restart | Use rotation-aware tailer |
7.2 Debugging Strategies
- Log each state transition with durations.
- Use a dry-run mode to print actions without executing.
- Add verbose mode for rsync and ssh output.
7.3 Performance Traps
- Deploying on every single file change without debounce.
- Running health checks too frequently without backoff.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a --dry-run mode.
- Add a simple HTML report output.
8.2 Intermediate Extensions
- Add parallel sync to multiple hosts.
- Add config file support.
8.3 Advanced Extensions
- Add canary deploy mode.
- Add rollback based on log error patterns.
9. Real-World Connections
9.1 Industry Applications
- Capistrano-style deploys: staging + symlink pattern.
- CI/CD pipelines: automated deploy with health checks.
9.2 Related Open Source Projects
- Capistrano: Ruby-based deployment tool.
- Fabric: Python deployment automation.
9.3 Interview Relevance
- Deployment pipelines demonstrate systems integration skills.
10. Resources
10.1 Essential Reading
- The Linux Programming Interface - file I/O and process control.
- Release It! - resilience patterns.
10.2 Video Resources
- Talks on deploy pipelines and zero-downtime deploys.
10.3 Tools & Documentation
man 7 inotify, rsync manual, ssh manual.
10.4 Related Projects in This Series
- P01 Multi-Source Log Tailer
- P02 HTTP Connection Pool
- P03 Process Supervisor
- P05 Environment Debugger
11. Self-Assessment Checklist
11.1 Understanding
- I can explain why deployments need atomic switches.
- I can explain why health checks are required.
- I can describe the deploy state machine.
11.2 Implementation
- All functional requirements are met.
- Rollbacks work on failure.
- Logs stream across restarts.
11.3 Growth
- I can explain the integration of Projects 1-5.
- I can justify my failure handling policy.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Detect changes, sync to remote, restart service, health check.
- Emit deploy report with exit code.
Full Completion:
- Rollback on failure and log streaming integrated.
- Deterministic test mode with golden path.
Excellence (Going Above & Beyond):
- Canary deploys and multi-host fanout.
- Rich reports with timelines and log excerpts.