Project 10: Update and Recovery Runbook
Write and execute a step-by-step runbook for patching and recovery with rollback.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 3 |
| Time Estimate | 2 weekends |
| Main Programming Language | Shell (sh) |
| Alternative Programming Languages | csh, Python |
| Coolness Level | Level 3 |
| Business Potential | Level 3 |
| Prerequisites | Projects 1-9 complete |
| Key Topics | Runbooks, freebsd-update, pkg, rollback |
1. Learning Objectives
By completing this project, you will:
- Build a deterministic update runbook with validation and rollback steps.
- Apply the runbook to a VM and record outputs.
- Handle a simulated failure and recover cleanly.
- Produce a reusable operations checklist for future upgrades.
2. All Theory Needed (Per-Concept Breakdown)
Concept 1: Update Workflow Discipline and Change Control
Fundamentals
A runbook is a documented, repeatable procedure that reduces risk during maintenance. FreeBSD updates require a specific sequence: base system updates, reboot, validation, and then package updates. Each step should have a clear entry condition, expected outcome, and rollback trigger. Change control adds discipline: you plan a maintenance window, document pre-checks, and verify post-checks. For this project, you must be able to define a strict workflow and follow it without improvisation.
In practice, write a short checklist for update runbook discipline and confirm it after each reboot. This keeps the concept concrete and prevents accidental drift between sessions.
Deep Dive into the concept
Operational reliability comes from repeatability. A runbook captures the exact commands, outputs, and decision points required to perform a change safely. In FreeBSD, the separation between base and packages makes this especially important. The base update affects the kernel and core libraries; packages depend on those libraries. This dependency chain means you must sequence updates correctly and validate at each step. A runbook formalizes the sequence so you are not relying on memory during a high-risk task.
Change control is not just bureaucracy. It is a set of safety checks designed to prevent surprises. Pre-checks ensure the system is healthy (disk space, ZFS status, service status). Execution steps are explicit and include safeguards such as creating a boot environment. Post-checks verify system health (version output, service status, logs). Rollback triggers are pre-defined: if a critical service fails or the system fails to boot, you switch to the previous BE. By writing these down, you reduce ambiguity and speed up recovery.
Determinism matters because you want to compare outcomes across runs. This is why the runbook should include fixed commands and expected outputs. If time-based data is needed, you set TZ=UTC to normalize timestamps. If randomness exists, you avoid it or explicitly note it. This makes your runbook a diagnostic tool as well as a procedure. When a step fails, you can compare the output to the expected output and immediately isolate the discrepancy.
In production, runbooks are often reviewed and approved before execution. In this lab, you simulate that by reading your runbook top to bottom before running it. You should be able to explain why each step exists. This creates the mental model that operations are systems, not ad-hoc tasks. The result is a workflow you can reuse in any FreeBSD environment.
Operationally, update runbook discipline is easiest to keep stable when you treat it as a small contract between configuration, tooling, and observable outputs. Write down the exact files that own the state and the commands that reveal the current truth. Then verify the contract at three points: immediately after you make the change, after a reboot, and after a deliberate disturbance such as restarting services or reloading modules. FreeBSD rewards this discipline because it rarely hides state; if something changes, it is usually in a file you control. Make a habit of collecting a before-and-after snapshot of commands and outputs so you can explain which change caused which effect.
At scale, update runbook discipline is also about failure containment. Identify what must remain available when something breaks and design a safe escape hatch. For example, keep console access for firewall changes, keep a previous boot environment for upgrades, or keep a dataset snapshot before risky edits. The same pattern applies across domains: define invariants, define the rollback path, and then only proceed when you can trigger that rollback quickly. Finally, test the failure path while the system is healthy; you learn more from a controlled rollback than from an emergency. This perspective turns the lab exercise into an operational capability you can trust on production systems.
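The before-and-after snapshot described above can be sketched as a small sh function. This is a minimal sketch, not a prescribed tool. The function name snapshot and the log layout (a ### header per check, followed by its output and exit status) are assumptions made for illustration. If you run the same list of checks before and after a change and diff the two logs, the comparison becomes mechanical instead of relying on memory.

```sh
# Hypothetical helper: snapshot LOGFILE 'command' ['command' ...]
# Runs each check and records the command, its output, and its exit status.
snapshot() {
    log=$1
    shift
    : > "$log"                          # start fresh so reruns are comparable
    for check in "$@"; do
        printf '### %s\n' "$check" >> "$log"
        # TZ=UTC keeps any printed timestamps consistent across runs and machines
        TZ=UTC sh -c "$check" >> "$log" 2>&1
        printf '### exit=%s\n' "$?" >> "$log"
    done
}

# Example on FreeBSD (not executed here):
#   snapshot precheck.log 'freebsd-version -kru' 'zpool status -x' 'df -h' 'service -e'
#   ... apply the change, reboot ...
#   snapshot postcheck.log 'freebsd-version -kru' 'zpool status -x' 'df -h' 'service -e'
#   diff -u precheck.log postcheck.log
```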
How this fits on projects
- This concept is used in Section 3.1 and Section 5.10.
- It depends on P07 Boot Environments and P03 Package Workflow.
Definitions & key terms
- Runbook -> Step-by-step operational procedure.
- Pre-check -> Validation before changes.
- Post-check -> Validation after changes.
- Rollback trigger -> Condition that forces recovery.
Mental model diagram
Pre-check -> Change -> Validate -> Rollback if needed
How it works (step-by-step, with invariants and failure modes)
- Capture system state and health checks.
- Create rollback path (BE or snapshot).
- Apply updates.
- Validate system health.
Invariants:
- Updates are not applied without rollback.
- Validation is mandatory.
Failure modes:
- Skipped validation -> hidden failures.
- No rollback -> extended downtime.
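You can enforce the first invariant in the runbook itself instead of leaving it to memory. The sketch below assumes that the BE has already been created by name, and it uses bectl list -H (scripted output without headers) to confirm that the BE exists. The guard name require_rollback_path is hypothetical.

```sh
# Hypothetical guard: require_rollback_path BE_NAME
# Enforces the invariant "updates are not applied without rollback".
require_rollback_path() {
    be=$1
    # bectl list -H prints one BE per line, tab-separated, with no header;
    # the first field is the BE name.
    if bectl list -H 2>/dev/null | awk '{ print $1 }' | grep -qxF "$be"; then
        return 0
    fi
    echo "ABORT: boot environment '$be' not found; refusing to update" >&2
    return 1
}

# Intended use on FreeBSD, as root (not executed here):
#   bectl create preupdate
#   require_rollback_path preupdate && freebsd-update fetch install
```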
Minimal concrete example
bectl create preupdate
freebsd-update fetch install
Common misconceptions
- “A runbook is just notes.” -> It must be executable and deterministic.
- “Rollback is optional.” -> It is essential for safe changes.
Check-your-understanding questions
- Why must validation be explicit?
- What is the difference between pre-check and post-check?
- Why do you document rollback triggers in advance?
Check-your-understanding answers
- To detect failures before they become incidents.
- Pre-checks ensure readiness; post-checks confirm success.
- To avoid debate during an outage.
Real-world applications
- Production patching with minimal downtime.
- Compliance evidence for maintenance operations.
Where you’ll apply it
- Section 3.7 Real World Outcome
- Section 5.10 Phase 2
- Also used in: P07 Boot Environments
References
- “Absolute FreeBSD, 3rd Edition” (Ch. 18)
- FreeBSD Handbook: Updating FreeBSD
Key insights
A runbook is a safety system, not a checklist.
Summary
Operational success comes from a disciplined, repeatable update process.
Homework/Exercises to practice the concept
- Write a pre-check list for disk space and services.
- Define three rollback triggers.
- Compare two update runs and note differences.
Solutions to the homework/exercises
- Use df -h, zpool status, and service -e.
- SSH failure, kernel panic, critical service down.
- Differences indicate changes in system state.
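You can turn the disk and ZFS pre-checks from the first solution into pass/fail checks instead of reading the output by eye. In this sketch, the helper names check_capacity and check_pools and the 80% threshold in the usage comment are illustrative assumptions. The sketch relies on df -P printing one line per filesystem with the capacity in the fifth column. It also relies on zpool status -x printing exactly "all pools are healthy" when nothing is wrong.

```sh
# Hypothetical helper: check_capacity MOUNTPOINT MAX_USED_PERCENT
check_capacity() {
    # df -P: POSIX format, one line per filesystem; column 5 is "NN%"
    used=$(df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    if [ -z "$used" ]; then
        echo "pre-check FAILED: cannot read usage for $1" >&2
        return 1
    fi
    if [ "$used" -gt "$2" ]; then
        echo "pre-check FAILED: $1 is ${used}% full (limit $2%)" >&2
        return 1
    fi
    echo "pre-check ok: $1 is ${used}% full"
}

# Hypothetical helper: check_pools
check_pools() {
    status=$(zpool status -x 2>&1)
    if [ "$status" = "all pools are healthy" ]; then
        echo "pre-check ok: $status"
        return 0
    fi
    echo "pre-check FAILED: $status" >&2
    return 1
}

# Usage on FreeBSD (not executed here):
#   check_capacity / 80 && check_pools && service -e > enabled-services.txt
```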
Concept 2: Security Advisories and Patch Planning
Fundamentals
FreeBSD publishes security advisories and errata notices that tell you when patches are needed. Understanding how to read advisories and map them to your system is part of responsible maintenance. A patch plan includes identifying affected components, scheduling a maintenance window, and verifying that updates apply cleanly. For this project, you must identify where advisories come from and how they influence your update schedule.
In practice, write a short checklist for advisory-driven patch planning and confirm it after each reboot. This keeps the concept concrete and prevents accidental drift between sessions. Also rehearse the steps on a disposable VM so you can recognize normal outputs and failure signals quickly.
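Deep Dive into the concept
Security advisories describe vulnerabilities in the FreeBSD base system. The FreeBSD Security Team publishes each one with an identifier of the form FreeBSD-SA-YY:NN.module, the affected versions, and the patch level at which each supported release was corrected. Non-security fixes are published the same way as errata notices (FreeBSD-EN-YY:NN.module). Vulnerabilities in third-party packages are tracked separately, in the VuXML database that pkg audit checks. In practice, you check advisories to determine whether your current release is affected. If an advisory affects your system, you schedule a patch window and follow your runbook.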
Patch planning is more than “run updates.” It includes impact analysis: which services rely on affected components, what downtime is expected, and what tests must be run afterward. In a lab, you simulate this by identifying a change window and documenting expected outcomes. You also track the version before and after the update so you can prove the change took effect.
There are two layers to patching: base system and packages. Both receive security fixes, but the mechanisms differ. Base advisories are handled with freebsd-update, while package vulnerabilities are reported by pkg audit and fixed with pkg upgrade. This is why a runbook must include both steps. In a real environment, you may also need to coordinate with application owners or schedule longer maintenance windows for critical services.
The discipline of reading advisories trains you to treat updates as risk management, not just routine chores. This is why this project ends the sequence: it pulls together installation, service management, storage, and rollback into a professional operational workflow.
Operationally, advisory-driven patch planning is easiest to keep stable when you treat it as a small contract between configuration, tooling, and observable outputs. Write down the exact files that own the state and the commands that reveal the current truth. Then verify the contract at three points: immediately after you make the change, after a reboot, and after a deliberate disturbance such as restarting services or reloading modules. FreeBSD rewards this discipline because it rarely hides state; if something changes, it is usually in a file you control. Make a habit of collecting a before-and-after snapshot of commands and outputs so you can explain which change caused which effect.
At scale, advisory-driven patch planning is also about failure containment. Identify what must remain available when something breaks and design a safe escape hatch. For example, keep console access for firewall changes, keep a previous boot environment for upgrades, or keep a dataset snapshot before risky edits. The same pattern applies across domains: define invariants, define the rollback path, and then only proceed when you can trigger that rollback quickly. Finally, test the failure path while the system is healthy; you learn more from a controlled rollback than from an emergency. This perspective turns the lab exercise into an operational capability you can trust on production systems.
Also, document the specific signals you will treat as success or failure for advisory-driven patch planning. For access or policy topics, that might be a deliberate allow case and a deliberate deny case that is correctly logged. For workflow topics, it might be a version change plus a service health check. Writing down these signals forces you to define what working actually means and prevents you from moving forward on assumptions.
How this fits on projects
- This concept is used in Section 3.2 and Section 5.10.
- It builds on P03 Package Workflow and P07 Boot Environments.
Definitions & key terms
- Security advisory -> Official notice of a vulnerability.
- Errata -> Update notice for non-security issues.
- Patch window -> Scheduled maintenance period.
Mental model diagram
Advisory -> Impact analysis -> Patch window -> Runbook execution
How it works (step-by-step, with invariants and failure modes)
- Read advisory and confirm affected version.
- Plan patch window and rollback.
- Apply base and package updates.
- Validate and document results.
Invariants:
- Advisories must be mapped to actual system version.
- Patch plan includes rollback.
Failure modes:
- Applying updates without understanding scope.
- Skipping package audits after base updates.
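Mapping an advisory to the actual system version usually comes down to comparing patch levels. The sketch below assumes two things: versions use the NN.N-RELEASE-pN form that freebsd-version prints, and you have read the corrected patch level for your release from the advisory. patch_covers is a hypothetical helper, and it deliberately refuses to compare different releases.

```sh
# Hypothetical helper: patch_covers INSTALLED FIXED
# Returns 0 if INSTALLED is at or beyond FIXED on the same release,
# 1 if it is older, and 2 if the releases differ (compare those by hand).
patch_covers() {
    inst=$1
    fixed=$2
    inst_base=${inst%-p*}               # "14.1-RELEASE-p6" -> "14.1-RELEASE"
    fixed_base=${fixed%-p*}
    if [ "$inst_base" != "$fixed_base" ]; then
        echo "release mismatch: $inst_base vs $fixed_base" >&2
        return 2
    fi
    inst_p=0                            # an unpatched release counts as -p0
    fixed_p=0
    case $inst in *-p*) inst_p=${inst##*-p} ;; esac     # "-p6" -> 6
    case $fixed in *-p*) fixed_p=${fixed##*-p} ;; esac
    [ "$inst_p" -ge "$fixed_p" ]
}

# Usage on FreeBSD (not executed here). Check userland and kernel separately,
# because an update may change only one of them:
#   patch_covers "$(freebsd-version -u)" 14.1-RELEASE-p5
#   patch_covers "$(freebsd-version -k)" 14.1-RELEASE-p5
```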
Minimal concrete example
pkg audit -F
freebsd-version
Common misconceptions
- “Advisories are optional.” -> They guide safe patching.
- “Only base system matters.” -> Packages can have vulnerabilities too.
Check-your-understanding questions
- What tool checks package vulnerabilities?
- Why do you record version numbers before and after?
- What is a patch window?
Check-your-understanding answers
- pkg audit.
- To verify the update actually applied.
- Scheduled time for safe maintenance.
Real-world applications
- Compliance-driven patching.
- Incident response to newly disclosed vulnerabilities.
Where you’ll apply it
- Section 3.2 Functional Requirements
- Section 5.10 Phase 1
- Also used in: P03 Package Workflow
References
- FreeBSD Security Advisories and Errata (official site)
- “Mastering FreeBSD and OpenBSD Security” (Ch. 3-4)
Key insights
Patch planning starts with understanding advisories, not just running updates.
Summary
Security advisories drive maintenance schedules and define what "safe" means.
Homework/Exercises to practice the concept
- Find a recent advisory and summarize its impact.
- Write a patch plan with a rollback trigger.
- Run pkg audit and document output.
Solutions to the homework/exercises
- Advisory summary should list affected versions and fixes.
- Patch plan includes window, steps, and rollback.
- Audit output lists vulnerable packages.
3. Project Specification
3.1 What You Will Build
A complete update and recovery runbook that includes pre-checks, update steps, validation, and rollback procedures, along with a recorded execution on your VM.
3.2 Functional Requirements
- Runbook written with explicit steps.
- Pre-checks documented (disk, services, ZFS).
- Base update performed with freebsd-update.
- Package update performed with pkg.
- Rollback tested or simulated.
3.3 Non-Functional Requirements
- Performance: Runbook fits within a defined maintenance window.
- Reliability: Steps are deterministic and repeatable.
- Usability: Another person can follow the runbook without guesswork.
3.4 Example Usage / Output
$ freebsd-version
14.1-RELEASE
$ pkg audit
0 problem(s) in the installed packages found.
3.5 Data Formats / Schemas / Protocols
- Runbook format
1) Pre-checks
2) Create BE
3) Update base
4) Reboot and validate
5) Update packages
6) Rollback if needed
3.6 Edge Cases
- Update requires multiple reboots.
- Network failure during fetch.
- Package update fails due to ABI mismatch.
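One reasonable way to handle the network-failure edge case is to retry the fetch a limited number of times and run install only if the fetch succeeded. This is a sketch. retry is a hypothetical helper, and the attempt count and delay are arbitrary. It also assumes that freebsd-update fetch exits non-zero when it cannot download updates.

```sh
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS command [args...]
# Runs the command until it succeeds or ATTEMPTS runs have failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while ! "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            echo "giving up after $n attempt(s): $*" >&2
            return 1
        fi
        n=$((n + 1))
        sleep "$delay"
    done
    return 0
}

# Usage on FreeBSD, as root (not executed here); install runs only if fetch succeeded:
#   retry 3 60 freebsd-update fetch && freebsd-update install
```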
3.7 Real World Outcome
You can follow a written procedure to update and recover a FreeBSD system safely.
3.7.1 How to Run (Copy/Paste)
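bectl create preupdate          # rollback point before touching the base system
freebsd-update fetch install    # fetch and apply base system patches
reboot
freebsd-version -kru            # after reboot: installed kernel, running kernel, userland
pkg upgrade
pkg audit -F                    # refresh the vulnerability database, then audit packages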
3.7.2 Golden Path Demo (Deterministic)
- Use the fixed BE name preupdate.
- Use TZ=UTC for consistent logs.
3.7.3 If CLI: provide an exact terminal transcript
$ TZ=UTC freebsd-version
14.1-RELEASE
$ TZ=UTC pkg audit
0 problem(s) in the installed packages found.
$ echo $?
0
Failure demo (deterministic)
$ TZ=UTC freebsd-update fetch
Looking up update.FreeBSD.org mirrors... none found.
Fetching public key from update.FreeBSD.org... failed.
No mirrors remaining, giving up.
$ echo $?
1
4. Solution Architecture
4.1 High-Level Design
Pre-checks -> BE -> Base update -> Reboot -> Validate -> Packages -> Validate -> Rollback (only if a trigger fires)
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Runbook | Step-by-step procedure | Deterministic steps |
| BE | Rollback safety | Naming scheme |
| Validation checks | Proof of success | Services + versions |
4.3 Data Structures (No Full Code)
RunbookStep
- id: 3
- description: "Update base system"
- command: "freebsd-update fetch install"
- expected: "no errors"
4.4 Algorithm Overview
Key Algorithm: Update Runbook Execution
- Capture pre-check state.
- Create rollback path.
- Apply updates and reboot.
- Validate system.
- Roll back if any critical test fails.
Complexity Analysis:
- Time: O(update duration)
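- Space: a new BE starts as a ZFS clone of the current root, so it uses almost no space until the updated system diverges from it.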
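The algorithm above maps onto a small step runner driven by records shaped like RunbookStep. In this sketch, the id|description|command line format and the name run_steps are assumptions. The reboot ends the process, so a real run would use one steps file per phase. The runner reads steps from file descriptor 3 so that each command still has the terminal on stdin. This matters because freebsd-update fetch refuses to run non-interactively.

```sh
# Hypothetical runner: run_steps reads "id|description|command" records from
# file descriptor 3 and stops at the first failing step.
run_steps() {
    while IFS='|' read -r id desc cmd <&3; do
        case $id in ''|'#'*) continue ;; esac   # skip blank and comment lines
        printf 'step %s: %s\n' "$id" "$desc"
        if ! sh -c "$cmd"; then
            printf 'step %s FAILED: stop and check the rollback triggers\n' "$id" >&2
            return 1
        fi
    done
    echo "all steps completed"
}

# Example phase-1 steps file (not executed here):
#   1|Record pre-check state|freebsd-version -kru; zpool status -x
#   2|Create rollback path|bectl create preupdate
#   3|Update base system|freebsd-update fetch install
# Run with: run_steps 3< phase1.steps
```

read assigns the rest of the line to the last variable, so the command field can itself contain | pipes.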
5. Implementation Guide
5.1 Development Environment Setup
# Ensure you have console access in case of lockout
5.2 Project Structure
update-runbook/
+-- runbook.md
+-- precheck.log
+-- postcheck.log
5.3 The Core Question You’re Answering
“Can I update a FreeBSD system without fear?”
5.4 Concepts You Must Understand First
Stop and research these before coding:
- Update workflow discipline
- Security advisories and patch planning
5.5 Questions to Guide Your Design
- What is your maintenance window length?
- Which services are critical to validate?
- What is your rollback trigger list?
5.6 Thinking Exercise
Failure Scenario
Assume an update breaks SSH. How do you recover?
5.7 The Interview Questions They’ll Ask
- What is freebsd-update used for?
- Why upgrade packages after the base system?
- How do you validate an upgrade?
- What is your rollback plan?
- Where do you find security advisories?
5.8 Hints in Layers
Hint 1: Write the checklist first. Don't update without a plan.
Hint 2: Use BEs. Rollback should be easy.
Hint 3: Record outputs. Keep before/after version logs.
Hint 4: Simulate failure. Practice recovery before production.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Upgrades | “Absolute FreeBSD, 3rd Edition” | Ch. 18 |
| Security | “Mastering FreeBSD and OpenBSD Security” | Ch. 3-4 |
5.10 Implementation Phases
Phase 1: Runbook Draft (2-3 hours)
Goals: Write a complete runbook.
Tasks:
- Document pre-checks and commands.
- Define rollback triggers.
Checkpoint: The runbook has no missing steps.
Phase 2: Execution (3-4 hours)
Goals: Run the update in a VM.
Tasks:
- Execute the runbook steps.
- Record outputs.
Checkpoint: Post-checks match the expected state.
Phase 3: Recovery Drill (2-3 hours)
Goals: Test rollback.
Tasks:
- Simulate a failure.
- Execute the rollback steps.
Checkpoint: The system returns to a stable state.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| BE naming | preupdate, timestamp | preupdate | Clear intent |
| Validation scope | minimal, full | minimal + critical services | Balanced effort |
| Patch cadence | monthly, quarterly | quarterly | Stable for lab |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Pre-check Tests | System readiness | zpool status, df -h |
| Validation Tests | Post-update health | freebsd-version |
| Rollback Tests | Recovery | bectl activate |
6.2 Critical Test Cases
- Pre-check success: system healthy before update.
- Post-check success: version and services ok.
- Rollback success: prior BE boots and services start.
6.3 Test Data
BE name: preupdate
Critical services: sshd
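Using the test data above, the post-check and its rollback trigger might look like the sketch below. It rests on several assumptions. The variable and function names are hypothetical. A non-zero exit from service NAME status is treated as "not running". rollback acts immediately, so call it only after you have confirmed the trigger.

```sh
# Values mirror the test data; the names are hypothetical.
ROLLBACK_BE=preupdate
CRITICAL_SERVICES="sshd"

postcheck() {
    failed=""
    for svc in $CRITICAL_SERVICES; do
        # "service NAME status" exits non-zero when the service is not running
        service "$svc" status >/dev/null 2>&1 || failed="$failed $svc"
    done
    if [ -n "$failed" ]; then
        echo "ROLLBACK TRIGGER: critical service(s) down:$failed" >&2
        return 1
    fi
    echo "post-check ok: $CRITICAL_SERVICES"
}

rollback() {
    # Make the pre-update BE the default for the next boot, then reboot into it.
    bectl activate "$ROLLBACK_BE" && shutdown -r now
}

# Usage on FreeBSD, as root (not executed here):
#   postcheck || rollback
```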
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| No rollback path | Downtime | Always create BE |
| Skipping package upgrades | Service failures | Run pkg upgrade |
| No post-checks | Hidden issues | Validate explicitly |
7.2 Debugging Strategies
- Compare before/after versions: identify what changed.
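- Read logs: freebsd-update prints its progress to the terminal, so keep its output in your run log. pkg normally records installs and upgrades via syslog, so they appear in /var/log/messages, which also shows boot and service errors after the reboot.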
7.3 Performance Traps
- Long updates without planning can exceed maintenance window.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add a quick smoke-test script to the runbook.
- Include a backup reminder step.
8.2 Intermediate Extensions
- Automate runbook execution with a script.
- Add service-level health checks.
8.3 Advanced Extensions
- Integrate monitoring alerts into rollback triggers.
- Create a runbook template for other systems.
9. Real-World Connections
9.1 Industry Applications
- Operations teams: standard patching runbooks.
- Compliance: audit trails for system maintenance.
9.2 Related Open Source Projects
- freebsd-update: base update tool.
- pkg: package management tool.
9.3 Interview Relevance
- Ability to explain safe maintenance workflows.
- Understanding of rollback strategies.
10. Resources
10.1 Essential Reading
- “Absolute FreeBSD, 3rd Edition” by Michael W. Lucas - Ch. 18
- “Mastering FreeBSD and OpenBSD Security” - Ch. 3-4
10.2 Video Resources
- “FreeBSD Update Best Practices” - community talks
10.3 Tools & Documentation
- freebsd-update(8): base updates
- pkg(8): package updates
10.4 Related Projects in This Series
- P07 Boot Environments - rollback safety.
- P03 Package Workflow - update hygiene.
11. Self-Assessment Checklist
11.1 Understanding
- I can explain the update sequence.
- I can define rollback triggers.
- I understand how advisories drive patches.
11.2 Implementation
- Runbook written and executed.
- Outputs recorded for pre/post checks.
- Rollback tested or simulated.
11.3 Growth
- I can reuse this runbook for future upgrades.
- I can explain the workflow in interviews.
- I can teach safe update practices.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Runbook written with pre/post checks.
- Base and package updates executed in VM.
Full Completion:
- Rollback tested successfully.
- Update outputs recorded and reviewed.
Excellence (Going Above & Beyond):
- Automated runbook script.
- Template created for additional systems.