Project 10: Build a Mini Cloud Control Plane
Build a small control plane that schedules VMs across hosts, exposes an API, and supports migration.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Level 5: Expert |
| Time Estimate | 6-10 weeks |
| Main Programming Language | Go or Python |
| Alternative Programming Languages | Rust |
| Coolness Level | Level 5: Cloud Builder |
| Business Potential | Level 4: Platform Builder |
| Prerequisites | libvirt, networking, distributed state |
| Key Topics | Scheduling, reconciliation, migration |
1. Learning Objectives
By completing this project, you will:
- Build an API that manages VM lifecycle across hosts.
- Implement a scheduler that uses resource metrics.
- Store durable VM state and reconcile drift.
- Trigger migration or evacuation workflows.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Scheduling and Resource Accounting
Fundamentals
Scheduling decides where VMs should run based on resource availability and policy. A scheduler must consider CPU, memory, storage, and network capacity, and it must make consistent placement decisions under concurrent requests. Overcommit allows more vCPUs than physical cores, but it increases CPU steal time and latency. A robust scheduler balances utilization with predictable performance.
Schedulers must encode policy explicitly. Bin-packing maximizes density but increases contention, while spreading reduces hotspots but can waste capacity. Admission control is a safety valve: it rejects requests that would violate hard constraints. Quotas and priorities ensure fairness among tenants, which is essential for multi-tenant reliability.
Deep Dive into the concept
Scheduling in virtualization is a multi-dimensional optimization problem. CPU, memory, storage, and network capacities are all constraints, and the scheduler must decide which constraint dominates for a given VM. Some schedulers use bin-packing (maximize density), while others use spreading (avoid hotspots). The choice depends on workload characteristics and SLOs.
Resource accounting is not trivial. CPU usage fluctuates; memory usage can spike; storage latency can degrade due to background recovery. A scheduler that uses stale metrics will make poor decisions. Therefore, most systems collect metrics frequently and apply smoothing or decay to reduce noise. Some schedulers treat metrics as “soft” constraints, while others treat them as “hard” constraints (e.g., never exceed memory capacity).
Overcommit is a policy choice. Overcommitting CPU is common because CPU time is multiplexed, but overcommitting memory is riskier because it can trigger swapping or OOM kills. A scheduler must encode this policy explicitly and reflect it in placement decisions. It must also consider NUMA locality: placing a VM’s vCPUs and memory on the same NUMA node reduces latency and improves throughput.
Scheduling also needs to respect affinity rules. Some VMs must be co-located (e.g., for low latency) while others must be separated (anti-affinity for availability). These rules are often expressed as labels or constraints. The scheduler must detect conflicts and either reject the request or find a valid placement.
Finally, scheduling is not a one-time decision. Host conditions change, VMs migrate, and hardware fails. The scheduler must support rebalancing and evacuation while maintaining safety. This is why scheduling is deeply tied to migration and control-plane orchestration.
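To make the filter-and-score flow concrete, here is a minimal placement sketch in Python. The `Host` and `VMRequest` shapes, the anti-affinity label check, and the scoring weights are illustrative assumptions, not a prescribed design.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    name: str
    free_ram_mb: int
    free_vcpus: int
    vm_labels: set = field(default_factory=set)  # labels of VMs already placed here

@dataclass
class VMRequest:
    ram_mb: int
    vcpus: int
    anti_affinity_label: str | None = None  # must not share a host with this label

def place(req: VMRequest, hosts: list[Host]) -> Host | None:
    # Filter: drop hosts that violate hard constraints.
    candidates = [
        h for h in hosts
        if h.free_ram_mb >= req.ram_mb
        and h.free_vcpus >= req.vcpus
        and (req.anti_affinity_label is None
             or req.anti_affinity_label not in h.vm_labels)
    ]
    if not candidates:
        return None  # admission control: reject rather than violate a hard constraint
    # Score: weighted headroom; the weights are policy and would be tuned per workload.
    def score(h: Host) -> float:
        return 0.6 * h.free_ram_mb + 0.4 * h.free_vcpus * 1024
    return max(candidates, key=score)
```

Spreading versus bin-packing falls out of the scoring direction: maximizing headroom spreads VMs across hosts, while minimizing it packs them densely.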
How this fits into the project
This concept defines how your control plane chooses hosts for VM placement and migration.
Definitions & key terms
- Scheduler: component that selects a host for a VM.
- Overcommit: allocating more virtual resources than physical.
- Affinity/anti-affinity: placement constraints.
Mental model diagram
Request -> capacity check -> policy rules -> host selection
How it works (step-by-step, with invariants and failure modes)
- Gather host metrics.
- Filter hosts that violate hard constraints.
- Score remaining hosts based on policy.
- Select host and record placement.
Invariants: do not exceed hard constraints; avoid duplicate placements. Failure modes include stale metrics and conflicting policies.
Minimal concrete example
VM needs 4GB RAM -> hosts A/B -> B has 6GB free -> select B
Common misconceptions
- Least-loaded host is always best.
- Overcommit is harmless.
Check-your-understanding questions
- Why are stale metrics dangerous for schedulers?
- What is the difference between hard and soft constraints?
Check-your-understanding answers
- They cause placements onto hosts that no longer have the advertised capacity, leading to overcommit and instability.
- Hard constraints must never be violated; soft constraints can be traded off.
Real-world applications
- OpenStack Nova scheduler
- Cloud provider VM placement engines
Where you’ll apply it
- Apply in §3.2 (functional requirements) and §5.10 (implementation phases)
- Also used in: P07-vagrant-style-orchestrator
References
- OpenStack scheduling docs
- “Fundamentals of Software Architecture” (scheduling chapters)
Key insights: Scheduling is a policy engine informed by real-time telemetry.
Summary: You now understand how resource accounting drives placement decisions.
Homework/Exercises to practice the concept
- Design a simple scoring function for host selection.
- Explain how you would enforce anti-affinity.
Solutions to the homework/exercises
- Score = weighted sum of free CPU, RAM, and I/O headroom.
- Filter out hosts that already run a VM with the same label.
2.2 Control Plane State, Reconciliation, and Observability
Fundamentals
A control plane manages VM lifecycle, scheduling, policy, and observability. Libvirt provides a consistent API for VM definition and lifecycle across hypervisors. QEMU provides device models and the runtime. A control plane tracks desired state, reconciles actual state, and integrates with metrics and logs. Without observability, VM performance problems are guesswork. Control planes are not optional in production; even a small cluster needs a single source of truth for VM identity, ownership, and placement.
State drift is the enemy of control planes. Without event-driven updates and reconciliation, the system’s view of VM state diverges from reality. Durable state storage enables recovery after crashes, and audit logging provides accountability. These concerns might feel administrative, but they directly impact uptime and operator trust.
Deep Dive into the concept
Control planes separate desired state from actual state. Users declare what should exist, and controllers reconcile reality by creating or updating VMs. This is a distributed systems problem: state must be durable, API calls must be idempotent, and failures must not create duplicate VMs. Many control planes store state in a database and use reconciliation loops similar to Kubernetes.
Observability ties it together. Hypervisors expose metrics like vCPU run time, VM exit counts, dirty page rates, and disk latency. Logs record VM lifecycle events and migration progress. Tracing tools can attribute latency to host or guest. Without these signals, diagnosing performance regressions is nearly impossible.
Security is a control-plane concern too. Access control must be enforced at the API boundary, and actions should be audited. VM images must be verified and stored in trusted registries. Quotas prevent a single tenant from exhausting cluster resources. These policies are not just administrative; they directly influence scheduling and availability.
Finally, control planes must handle partial failure. A host may be unreachable but still running VMs. The control plane must decide whether to fence and restart those VMs elsewhere, balancing safety with availability. This is why leases, heartbeats, and fencing are standard patterns. Event-driven design is essential: libvirt and QEMU expose event streams that notify when VMs change state. A control plane that ignores events quickly diverges from reality.
Metrics design influences stability. If the control plane tracks only coarse CPU usage, it may oversubscribe memory or saturate storage I/O without noticing. Good control planes track queue depth, latency percentiles, and error rates, then feed those signals into placement and admission control. Poor signals lead to oscillation and instability.
Control planes also expose interfaces for automation. Webhooks, event streams, and rate limits protect the system from overload while still enabling integration with CI/CD and monitoring. Without backpressure, a burst of API requests can cascade into host overload and VM instability.
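A minimal sketch of a reconciliation loop, assuming a desired-state store and a hypothetical `hypervisor` driver with `list_running()`, `start(vm)`, and `stop(vm)` operations; these names are placeholders, not the libvirt API.

```python
import time

def reconcile_once(store, hypervisor):
    """Compare desired state to actual state and converge, one VM at a time."""
    desired = store.load_desired()          # e.g. {"vm-101": "running", "vm-102": "stopped"}
    actual = hypervisor.list_running()      # e.g. {"vm-102"}
    for vm_id, want in desired.items():
        running = vm_id in actual
        if want == "running" and not running:
            hypervisor.start(vm_id)         # must be idempotent: a retry cannot create a second VM
        elif want == "stopped" and running:
            hypervisor.stop(vm_id)
    # VMs running but absent from desired state are drift; flag them rather than delete blindly.
    for vm_id in actual - set(desired):
        store.record_drift(vm_id)

def reconcile_loop(store, hypervisor, interval_s=10):
    # Periodic reconciliation backstops event handling: even if an event is lost,
    # the next pass converges the system.
    while True:
        reconcile_once(store, hypervisor)
        time.sleep(interval_s)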
How this fits into the project
This concept defines the API, state store, and reconciliation loop of your mini-cloud.
Definitions & key terms
- Desired state: target configuration for a VM.
- Actual state: real VM status on hosts.
- Reconciliation: convergence process.
- Observability: metrics, logs, tracing.
Mental model diagram
API -> desired state -> controller -> libvirt -> VM -> metrics
How it works (step-by-step, with invariants and failure modes)
- Persist desired state.
- Run reconciliation loop.
- Act on discrepancies.
- Update state with events and metrics.
Invariants: single source of truth; idempotent operations. Failure modes include state drift and duplicate actions.
Minimal concrete example
DESIRED: running, ACTUAL: stopped -> start VM
Common misconceptions
- Event handling is optional.
- Observability is a “nice to have.”
Check-your-understanding questions
- Why must operations be idempotent?
- How do events reduce state drift?
Check-your-understanding answers
- Retries must not create duplicates.
- Events notify the control plane of actual state changes.
Real-world applications
- OpenStack Nova
- Cloud provider control planes
Where you’ll apply it
- Apply in §3.2 (functional requirements) and §4.1 (architecture)
- Also used in: P07-vagrant-style-orchestrator
References
- libvirt API docs
- QEMU QMP docs
Key insights: Control planes are distributed systems; state drift is the enemy.
Summary: You now understand how reconciliation and observability enable reliable orchestration.
Homework/Exercises to practice the concept
- Design a minimal state schema for VMs and hosts.
- Explain how you would detect and handle host failure.
Solutions to the homework/exercises
- Tables: hosts, vms, allocations, events (schema sketch below).
- Use heartbeats and fences to avoid split-brain.
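One possible shape for the hosts/vms/allocations/events schema, sketched with Python's built-in sqlite3; the column names and types are illustrative assumptions, not a required layout.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS hosts (
    id TEXT PRIMARY KEY,
    cpu_total INTEGER NOT NULL,
    ram_mb_total INTEGER NOT NULL,
    last_heartbeat TEXT            -- ISO timestamp; used for failure detection
);
CREATE TABLE IF NOT EXISTS vms (
    id TEXT PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    cpu INTEGER NOT NULL,
    ram_mb INTEGER NOT NULL,
    image TEXT NOT NULL,
    desired_state TEXT NOT NULL,   -- e.g. 'running' or 'stopped'
    actual_state TEXT,
    host_id TEXT REFERENCES hosts(id)
);
CREATE TABLE IF NOT EXISTS allocations (
    vm_id TEXT REFERENCES vms(id),
    host_id TEXT REFERENCES hosts(id),
    created_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS events (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    vm_id TEXT,
    kind TEXT NOT NULL,            -- e.g. 'created', 'started', 'migrated'
    detail TEXT,
    created_at TEXT NOT NULL
);
"""

conn = sqlite3.connect("control_plane.db")
conn.executescript(SCHEMA)
conn.commit()
```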
3. Project Specification
3.1 What You Will Build
A mini control plane that schedules VMs across nodes and exposes an API for lifecycle actions.
3.2 Functional Requirements
- API endpoints for create/start/stop/destroy.
- Scheduler chooses host based on metrics.
- State persisted in a DB.
- Migration or evacuation workflow.
3.3 Non-Functional Requirements
- Performance: the API responds in under 200 ms for simple calls.
- Reliability: state survives restarts.
- Usability: clear logs and errors.
3.4 Example Usage / Output
$ curl -X POST /v1/vms -d '{"name":"web01"}'
{"id":"vm-101","host":"node2"}
3.5 Data Formats / Schemas / Protocols
- VM schema: id, name, cpu, ram, image, host
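For illustration, a single VM record matching the schema above could be modeled as a Python dataclass; the units and defaults are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class VM:
    id: str                      # e.g. "vm-101", assigned by the control plane
    name: str                    # user-supplied, unique per tenant
    cpu: int                     # vCPU count
    ram_mb: int                  # memory in MiB
    image: str                   # base image reference
    host: Optional[str] = None   # host id once scheduled; None while pending
```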
3.6 Edge Cases
- Host down during create
- Duplicate create requests
3.7 Real World Outcome
A VM can be created via API and placed on the least-loaded host.
3.7.1 How to Run (Copy/Paste)
- Start API server, connect to libvirt hosts
3.7.2 Golden Path Demo (Deterministic)
- Create VM -> list -> migrate
3.7.3 Exact Terminal Transcript (CLI)
$ curl -X POST /v1/vms -d '{"name":"web01"}'
{"id":"vm-101","status":"running","host":"node2"}
4. Solution Architecture
4.1 High-Level Design
API -> Scheduler -> libvirt hosts -> VMs
^ metrics/logs -----------|
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| API server | Lifecycle actions | REST vs RPC |
| Scheduler | Placement | Scoring policy |
| State store | Persistence | SQL or KV |
4.3 Data Structures (No Full Code)
- Host record: id, cpu, ram, utilization
- VM record: id, host, state
4.4 Algorithm Overview
- Validate request
- Choose host
- Call libvirt
- Persist state
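A sketch of how the four steps above might compose in the API handler. `validate`, `place`, `store`, and `driver.start_vm_on_host` are hypothetical helpers standing in for your own validation, the scheduler from §2.1, the state store, and the libvirt driver.

```python
import uuid

def create_vm(request, hosts, store, driver):
    """Handle POST /v1/vms: validate, schedule, act, persist."""
    spec = validate(request)                  # hypothetical: reject malformed or over-quota requests
    host = place(spec, hosts)                 # scheduler from the placement sketch in §2.1
    if host is None:
        raise RuntimeError("no host satisfies the request")   # admission control
    vm_id = f"vm-{uuid.uuid4().hex[:8]}"
    store.save_desired(vm_id, spec, host.name, state="running")   # record intent first
    driver.start_vm_on_host(host.name, vm_id, spec)               # then act on the hypervisor
    return {"id": vm_id, "host": host.name}
```

Persisting desired state before calling the hypervisor means a crash between the two steps leaves a record the reconciliation loop can finish, rather than an orphaned VM.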
5. Implementation Guide
5.1 Development Environment Setup
# Multi-node lab with libvirt
5.2 Project Structure
project-root/
├── api/
├── scheduler/
├── state/
└── README.md
5.3 The Core Question You’re Answering
“How do real cloud platforms orchestrate VMs across multiple hosts?”
5.4 Concepts You Must Understand First
- Scheduling and placement
- Idempotent API design
- State reconciliation
5.5 Questions to Guide Your Design
- Which state is centralized, and which lives per node?
- How will you handle stale metrics?
5.6 Thinking Exercise
Design a placement algorithm for three uneven hosts.
5.7 The Interview Questions They’ll Ask
- “How does a scheduler decide where to place a VM?”
- “How do you ensure API idempotency?”
5.8 Hints in Layers
Hint 1: Start with a single host backend.
Hint 2: Add a basic scheduler.
Hint 3: Pseudocode
REQUEST -> validate -> choose host -> start VM -> record
Hint 4: Add request IDs for idempotency.
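A minimal sketch of request-ID idempotency, assuming the client sends a unique `request_id` with each create call; the in-memory dict here stands in for a table in your durable state store.

```python
# Maps request_id -> previously returned result. In production this lives in the
# durable state store so retries after a crash are still deduplicated.
_seen_requests: dict[str, dict] = {}

def create_vm_idempotent(request_id: str, request, create_fn):
    """Return the original result for a retried request instead of creating a duplicate VM."""
    if request_id in _seen_requests:
        return _seen_requests[request_id]
    result = create_fn(request)      # e.g. the create_vm handler sketched in §4.4
    _seen_requests[request_id] = result
    return result
```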
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Distributed state | “Designing Data-Intensive Applications” | Ch. 9 |
5.10 Implementation Phases
- Phase 1: API + single host
- Phase 2: Scheduler + state store
- Phase 3: Migration workflows
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| State store | SQLite vs etcd | SQLite | Simpler for lab |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Integration Tests | API workflows | create/start/stop |
6.2 Critical Test Cases
- Duplicate create request returns same VM.
- Host down triggers reschedule.
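A sketch of the duplicate-create test, written pytest-style against the hypothetical idempotent handler from §5.8; the fake `create_fn` counts invocations so the test can assert only one VM was created.

```python
def test_duplicate_create_returns_same_vm():
    calls = []

    def fake_create(request):
        calls.append(request)
        return {"id": f"vm-{len(calls)}", "host": "node1"}

    first = create_vm_idempotent("req-abc", {"name": "web01"}, fake_create)
    second = create_vm_idempotent("req-abc", {"name": "web01"}, fake_create)

    assert first == second    # the retry returns the original result
    assert len(calls) == 1    # the VM was only created once
```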
6.3 Test Data
VM request: cpu=2, ram=4G
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Stale metrics | Overloaded host | Refresh metrics |
| Missing idempotency | Duplicate VMs | Request IDs |
7.2 Debugging Strategies
- Log every state transition with request ID.
7.3 Performance Traps
- Synchronous calls to remote hosts block API.
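One common mitigation, sketched with Python's standard-library thread pool: accept the request, persist intent, enqueue the slow hypervisor call, and return immediately. `store` and `driver.start_vm_on_host` are the same hypothetical helpers as in §4.4.

```python
from concurrent.futures import ThreadPoolExecutor

# A small worker pool keeps slow libvirt/remote-host calls off the API request path.
_workers = ThreadPoolExecutor(max_workers=4)

def create_vm_async(vm_id, host, spec, store, driver):
    store.save_desired(vm_id, spec, host, state="running")        # record intent synchronously
    _workers.submit(driver.start_vm_on_host, host, vm_id, spec)   # do the slow work in the background
    return {"id": vm_id, "host": host, "status": "scheduling"}    # API can answer well under 200 ms
```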
8. Extensions & Challenges
8.1 Beginner Extensions
- Add list and status endpoints.
8.2 Intermediate Extensions
- Add quotas per user.
8.3 Advanced Extensions
- Add autoscaling and evacuation.
9. Real-World Connections
9.1 Industry Applications
- OpenStack, cloud providers
9.2 Related Open Source Projects
- OpenStack Nova, libvirt
9.3 Interview Relevance
- Scheduling, reconciliation, idempotency
10. Resources
10.1 Essential Reading
- libvirt API docs
- OpenStack scheduler docs
10.2 Video Resources
- Cloud control plane talks