Project 6: Controller Reconciliation Lab
Implement a custom reconciliation workflow that converges child resources from a custom desired-state object.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Expert |
| Time Estimate | 12-20 hours |
| Main Programming Language | Go |
| Alternative Programming Languages | Python, Java |
| Coolness Level | Level 4 - Platform Craft |
| Business Potential | 3. Automation Engine |
| Prerequisites | Kubernetes API model, watches/informers, status conditions |
| Key Topics | idempotent reconciliation, drift correction, backoff and status design |
1. Learning Objectives
- Build idempotent reconcile logic.
- Represent progress and failures through status conditions.
- Handle retries and drift safely.
- Measure reconcile health and control-plane load.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Reconciliation Pattern
Fundamentals
Reconciliation compares desired and observed state, then applies minimal actions until convergence.
Deep Dive into the concept
Controllers should be level-triggered: every run computes desired child state from source-of-truth spec, then performs idempotent mutations. They must tolerate duplicate events and transient API errors. Strong designs separate read/compute/mutate/status phases and avoid side effects in non-mutating steps. Status fields must reflect real observed state, not assumptions.
How this fits on projects
- Core for P06 and essential for P10 automation architecture.
Definitions & key terms
- desired state, observed state, reconcile loop, observedGeneration.
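
To make the `observedGeneration` term concrete, here is a minimal sketch of how a reader of status can tell whether it describes the current spec. The `Object` and `Status` types are simplified, hypothetical stand-ins rather than real Kubernetes API types; the only convention assumed is that the generation is bumped whenever the spec changes and that the controller copies it into status after a successful pass.

```go
package main

import "fmt"

// Status is a hypothetical, simplified status block.
type Status struct {
	ObservedGeneration int64 // generation the controller last reconciled
	Ready              bool
}

// Object is a hypothetical custom resource reduced to the fields needed here.
type Object struct {
	Generation int64 // assumed to increase on every spec change
	Status     Status
}

// StatusIsCurrent reports whether status was written for the current spec.
func StatusIsCurrent(o Object) bool {
	return o.Status.ObservedGeneration == o.Generation
}

// IsReady trusts the Ready flag only when status is not stale.
func IsReady(o Object) bool {
	return StatusIsCurrent(o) && o.Status.Ready
}

func main() {
	o := Object{Generation: 2, Status: Status{ObservedGeneration: 1, Ready: true}}
	fmt.Println(IsReady(o)) // stale: Ready was reported for an older spec

	o.Status.ObservedGeneration = 2 // controller finished reconciling generation 2
	fmt.Println(IsReady(o))
}
```

Gating on the generation keeps a "ready" flag written for an older spec from being read as a statement about the new one.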
Mental model diagram
watch event -> reconcile -> compare desired/actual -> patch resources -> update status
How it works
- Read source object and children.
- Compute desired children.
- Diff and apply changes.
- Update status conditions.
- Requeue as needed.
Invariants: repeated runs are safe. Failure modes: hot loops, stale status, non-idempotent side effects.
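
The steps and the "repeated runs are safe" invariant can be sketched with an in-memory store standing in for the API server. Everything here (`Child`, `Store`, `desiredChildren`, `Reconcile`) is a hypothetical illustration, not the lab's required design; the store is assumed to hold only children owned by the object being reconciled.

```go
package main

import (
	"fmt"
	"sort"
)

// Child is a hypothetical stand-in for a child resource.
type Child struct {
	Name     string
	Replicas int
}

// Store is an in-memory stand-in for the API server, keyed by child name.
// Assumption: it contains only children owned by the reconciled object.
type Store map[string]Child

// desiredChildren is a pure function: the same spec always yields the same children.
func desiredChildren(owner string, replicas int) map[string]Child {
	return map[string]Child{
		owner + "-workload": {Name: owner + "-workload", Replicas: replicas},
		owner + "-service":  {Name: owner + "-service"},
	}
}

// Reconcile diffs desired against observed state and applies only the differences.
// It returns the actions taken, sorted; an empty result means already converged.
func Reconcile(store Store, desired map[string]Child) []string {
	var actions []string
	for name, want := range desired {
		got, ok := store[name]
		switch {
		case !ok:
			store[name] = want
			actions = append(actions, "create "+name)
		case got != want:
			store[name] = want
			actions = append(actions, "update "+name)
		}
	}
	// Remove children that are no longer desired.
	for name := range store {
		if _, ok := desired[name]; !ok {
			delete(store, name)
			actions = append(actions, "delete "+name)
		}
	}
	sort.Strings(actions)
	return actions
}

func main() {
	store := Store{"cache-orphan": {Name: "cache-orphan"}}
	desired := desiredChildren("cache", 3)
	fmt.Println(Reconcile(store, desired)) // first pass performs the mutations
	fmt.Println(Reconcile(store, desired)) // second pass finds nothing to do
}
```

Because the function always starts from the full desired set rather than from the triggering event, a duplicate or delayed event simply produces an empty action list.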
2.2 Retry Safety and Error Taxonomy
Fundamentals
Retries are normal in distributed control loops, so every reconcile step must be safe to repeat.
Deep Dive into the concept
Errors should be categorized as transient, permanent, or spec-invalid. Transient errors use backoff. Permanent errors should surface actionable status without uncontrolled requeue. Spec-invalid errors require user remediation and should avoid API thrash.
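
One way to encode this taxonomy is a classifier that turns an error into a requeue decision. The sketch below is an assumption-laden illustration: the sentinel errors, the `Decision` type, the reason strings, and the 500 ms base and 1 minute cap are all hypothetical choices, and unknown errors are treated as transient.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Hypothetical sentinel errors, one per category in the taxonomy.
var (
	ErrTransient   = errors.New("transient")
	ErrPermanent   = errors.New("permanent")
	ErrSpecInvalid = errors.New("spec invalid")
)

// Decision tells the work queue what to do after a reconcile attempt.
type Decision struct {
	Requeue bool
	After   time.Duration
	Reason  string // intended to be surfaced in a status condition
}

// Illustrative backoff bounds, not values prescribed by the project.
const (
	baseDelay = 500 * time.Millisecond
	maxDelay  = 1 * time.Minute
)

// backoff returns baseDelay doubled once per prior attempt, capped at maxDelay.
func backoff(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// Classify maps an error to a requeue decision.
func Classify(err error, attempt int) Decision {
	switch {
	case err == nil:
		return Decision{Reason: "Reconciled"}
	case errors.Is(err, ErrSpecInvalid):
		// Only a spec change can help; the next watch event will trigger a new pass.
		return Decision{Reason: "InvalidSpec"}
	case errors.Is(err, ErrPermanent):
		return Decision{Reason: "PermanentFailure"}
	default:
		// Assumption: anything unrecognized is retried with backoff.
		return Decision{Requeue: true, After: backoff(attempt), Reason: "Retrying"}
	}
}

func main() {
	wrapped := fmt.Errorf("list children: %w", ErrTransient)
	fmt.Println(Classify(wrapped, 3))
	fmt.Println(Classify(ErrSpecInvalid, 3))
}
```

Only the transient branch requeues, which is what keeps invalid specs and permanent failures from turning into hot loops.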
3. Project Specification
3.1 What You Will Build
A custom controller lab:
- custom resource definition
- reconcile loop for child resources
- status conditions and event reporting
3.2 Functional Requirements
- Watch custom resources and enqueue reconcile requests.
- Create/update/delete child resources to match spec.
- Update status fields with condition transitions.
- Correct manual drift automatically.
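
The "condition transitions" requirement can be sketched as an upsert that moves the transition timestamp only when the status value actually changes. The `Condition` type below is a simplified local type loosely modeled on the usual Kubernetes condition shape, and `SetCondition` is a hypothetical helper, not a library function.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a simplified, local condition type (an assumption for this sketch).
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	ObservedGeneration int64
	LastTransitionTime time.Time
}

// SetCondition inserts or updates the condition with the same Type.
// LastTransitionTime changes only when Status changes; Reason and Message
// are always refreshed. It reports whether a transition occurred.
func SetCondition(conds []Condition, next Condition, now time.Time) ([]Condition, bool) {
	for i := range conds {
		if conds[i].Type != next.Type {
			continue
		}
		transitioned := conds[i].Status != next.Status
		if transitioned {
			next.LastTransitionTime = now
		} else {
			next.LastTransitionTime = conds[i].LastTransitionTime
		}
		conds[i] = next
		return conds, transitioned
	}
	// First time this condition type is reported.
	next.LastTransitionTime = now
	return append(conds, next), true
}

func main() {
	now := time.Now()
	conds, changed := SetCondition(nil,
		Condition{Type: "Ready", Status: "False", Reason: "Progressing"}, now)
	fmt.Println(changed, conds[0].Reason)

	conds, changed = SetCondition(conds,
		Condition{Type: "Ready", Status: "True", Reason: "ChildrenReady"}, now.Add(8*time.Second))
	fmt.Println(changed, conds[0].Reason)
}
```

The returned boolean is a natural place to hook event reporting, so an event is emitted per transition rather than per reconcile pass.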
3.3 Non-Functional Requirements
- Performance: median reconcile duration under 2 seconds.
- Reliability: no uncontrolled requeue storms.
- Usability: status includes clear remediation hints.
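
To check the performance target, reconcile durations need to be sampled and summarized. The following is a minimal sketch under the assumption that samples are kept in memory; `Timed`, `Median`, and the `budget` constant are hypothetical names, and a real controller would more likely export a histogram metric.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// budget mirrors the 2 second median target from the requirements.
const budget = 2 * time.Second

// Timed runs one reconcile pass and appends its wall-clock duration to samples.
func Timed(samples []time.Duration, reconcile func() error) ([]time.Duration, error) {
	start := time.Now()
	err := reconcile()
	return append(samples, time.Since(start)), err
}

// Median returns the median of samples, or 0 when there are none.
// The input slice is left unmodified.
func Median(samples []time.Duration) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	mid := len(sorted) / 2
	if len(sorted)%2 == 1 {
		return sorted[mid]
	}
	return (sorted[mid-1] + sorted[mid]) / 2
}

func main() {
	var samples []time.Duration
	for i := 0; i < 5; i++ {
		// A no-op stands in for a real reconcile pass.
		samples, _ = Timed(samples, func() error { return nil })
	}
	m := Median(samples)
	fmt.Printf("median=%v withinBudget=%v\n", m, m < budget)
}
```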
3.7 Real World Outcome
$ ./reconcile-lab apply examples/cache-cluster.yaml
reconcile: created stateful workload and service
status: progressing -> ready
$ kubectl edit statefulset cache-cluster
# manual drift introduced
$ ./reconcile-lab observe
drift detected and corrected in 8s
4. Solution Architecture
4.1 High-Level Design
watch cache -> reconcile engine -> resource applier -> status updater -> metrics
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Watcher | enqueue reconcile work | dedupe and backpressure handling |
| Engine | desired state computation | pure function design |
| Applier | API mutations | idempotent patch strategy |
| Status updater | progress/failure conditions | operator-friendly signal model |
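
The Watcher's "dedupe" decision can be illustrated with a tiny queue that collapses repeated keys while they are still waiting. This `Queue` type is a hypothetical, single-goroutine sketch; a production controller would need locking and rate limiting, which client-go's `workqueue` package provides.

```go
package main

import "fmt"

// Queue is a minimal FIFO of reconcile keys that collapses duplicates:
// a key that is already waiting is not enqueued a second time.
// Assumption: used from a single goroutine (no locking).
type Queue struct {
	order   []string
	pending map[string]bool
}

// NewQueue returns an empty queue.
func NewQueue() *Queue {
	return &Queue{pending: map[string]bool{}}
}

// Add enqueues key unless it is already pending. It reports whether it was added.
func (q *Queue) Add(key string) bool {
	if q.pending[key] {
		return false
	}
	q.pending[key] = true
	q.order = append(q.order, key)
	return true
}

// Get removes and returns the oldest key; ok is false when the queue is empty.
func (q *Queue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

// Len returns the number of keys waiting.
func (q *Queue) Len() int { return len(q.order) }

func main() {
	q := NewQueue()
	// Three watch events for the same object collapse into one work item.
	for i := 0; i < 3; i++ {
		q.Add("default/cache-cluster")
	}
	fmt.Println(q.Len())
	key, _ := q.Get()
	fmt.Println(key)
}
```

Queuing keys rather than event payloads is what makes the collapse safe: the reconcile pass re-reads current state, so nothing is lost when duplicate events are dropped.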
5. Implementation Guide
5.3 The Core Question You’re Answering
“How do we converge declarative intent safely when events are duplicated, delayed, or partially failing?”
5.6 Milestones
- CRD + watch loop.
- Child resource reconciliation.
- Status condition model.
- Drift correction and failure injection tests.
5.9 Definition of Done
- Idempotent reconcile loop implemented.
- Status conditions reflect true observed state.
- Drift correction demonstrated.
- Backoff/error policy prevents hot loops.