Project 6: Build a Mini Cloud Platform

Build a minimal cloud control plane that schedules VMs across nodes, manages images, and exposes an API for lifecycle operations.

Quick Reference

| Attribute | Value |
|-----------|-------|
| Difficulty | Expert (Level 5) |
| Time Estimate | 8-10 weeks |
| Main Programming Language | Go/Python + C |
| Alternative Programming Languages | Rust |
| Coolness Level | Level 5: Cloud Builder |
| Business Potential | Level 4: Open Core Builder |
| Prerequisites | libvirt usage, distributed systems basics, API design |
| Key Topics | scheduling, state store, orchestration, API design |

1. Learning Objectives

By completing this project, you will:

  1. Build a control plane API for VM lifecycle management.
  2. Implement a scheduler with resource accounting and bin-packing.
  3. Store cluster state in a durable key-value store.
  4. Build node agents that reconcile desired vs actual VM state.
  5. Design a minimal multi-node virtualization platform.

2. All Theory Needed (Per-Concept Breakdown)

2.1 Control Plane Architecture and API Design

Fundamentals

A cloud control plane is the system that translates user requests into infrastructure actions. It exposes APIs, tracks desired state, and orchestrates hypervisors to create VMs. A good control plane separates stateless API servers from stateful components (databases or key-value stores). Understanding this architecture is essential for building a minimal cloud platform that is reliable and predictable. At a basic level, you should be able to describe the flow of a single request end-to-end and identify which component owns which responsibility. This mental model is the foundation for every other subsystem in this project. Without it, the system becomes a tangle of side effects.

Deep Dive into the concept

Control planes consist of several logical layers: an API layer, a scheduling layer, and an execution layer (agents). The API layer handles client requests, validates input, and writes desired state to a store. It should be stateless so it can scale horizontally. The scheduling layer reads resource information and chooses a target node for a VM. It writes placement decisions back to the state store. The execution layer consists of agents running on each node that reconcile desired state with actual state by calling libvirt.

API design must reflect these stages. For example, a POST /v1/vms request should create a VM record with status “building.” The scheduler assigns a node and updates the record to “scheduled.” A node agent then performs the actual VM creation and transitions status to “running.” This separation is crucial: if the API server crashes, the desired state still exists and agents can continue. This is the core idea of declarative systems.

Idempotency is another key property. Clients may retry requests, and your API must handle this gracefully. You can use idempotency keys or require unique VM names. For POST /v1/vms, if the VM already exists, return the existing record instead of creating a duplicate. This protects you from race conditions and client retries.
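To make the retry behavior concrete, here is a minimal sketch of an idempotent create handler. The in-memory dict stands in for the state store, and all names (`create_vm`, `STORE`) are invented for illustration:

```python
import uuid

STORE: dict[str, dict] = {}  # name -> VM record (stands in for the state store)

def create_vm(spec: dict) -> tuple[int, dict]:
    """POST /v1/vms: on retry, return the existing record instead of duplicating."""
    name = spec["name"]
    if name in STORE:
        return 200, STORE[name]           # retry-safe: same effect as the first call
    record = {"id": f"vm-{uuid.uuid4().hex[:6]}", "status": "building", **spec}
    STORE[name] = record                  # desired state is written before any action
    return 201, record

code1, rec1 = create_vm({"name": "web01", "cpu": 2, "ram": 4096, "image": "ubuntu"})
code2, rec2 = create_vm({"name": "web01", "cpu": 2, "ram": 4096, "image": "ubuntu"})
assert code1 == 201 and code2 == 200 and rec1["id"] == rec2["id"]
```

Using unique names as the idempotency key keeps the handler simple; a production system would typically use a client-supplied idempotency key instead.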

API versioning and error handling are important even in a toy system. Defining a consistent error JSON schema makes debugging easier. For example, { "error": { "code": "INVALID_REQUEST", "message": "...", "details": {...} } }. Your control plane should return clear errors for invalid requests, missing resources, or internal failures.

Finally, control planes must be observable. Even in a mini platform, you should log scheduling decisions, API requests, and state transitions. These logs will help you debug and also illustrate how a cloud control plane behaves. This is where your project starts to feel like a real system: you will see requests flow through the API, scheduler, and agents, each updating shared state.

An additional concern is API evolution. Even in a small project, defining clear versioned endpoints (e.g., /v1/) teaches you how to evolve APIs without breaking clients. It also forces you to think about backward compatibility and migration strategies for the state store. In large systems, these are non-trivial engineering efforts. Including a version prefix and a documented error schema now will make your mini cloud feel like a production-quality system in miniature.

Authentication and authorization are another dimension, even if you keep them lightweight. A minimal token-based auth scheme (single shared token) is enough to illustrate how control planes protect infrastructure. It also forces you to think about secure defaults: unauthenticated endpoints can allow destructive operations. Even if you do not fully implement multi-tenant RBAC, acknowledging the security boundary helps you design API responses and logging with a real-world mindset.

Finally, consider rate limiting and request validation. In real systems, the control plane must protect itself from overload and from malformed input. Adding basic limits (e.g., max VM size, max requests per minute) demonstrates that the control plane is not just a router but a policy enforcement point. Even if your limits are simple, their presence aligns your mini cloud with real operational practices.

How this fits into the project

Control plane design is implemented in Section 3.2 and Section 5.10 Phase 1, and validated in Section 3.7 with API calls.

Definitions & key terms

  • Control plane -> The orchestration layer that manages infrastructure.
  • Desired state -> What the system should do (e.g., a VM should exist).
  • Reconciliation -> Making actual state match desired state.
  • Idempotency -> Repeated requests have the same effect.

Mental model diagram (ASCII)

Client -> API -> State Store -> Scheduler -> Node Agent -> Hypervisor

How it works (step-by-step, with invariants and failure modes)

  1. Client sends request to API.
  2. API validates and writes desired state.
  3. Scheduler picks a node and records placement.
  4. Agent sees desired state and creates VM.

Failure modes: API crash, scheduler crash, agent crash. Desired state must persist.
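The steps above, together with the status values used earlier (building, scheduled, running, stopped, deleted), can be captured as a small transition table. This sketch is one possible encoding, not a prescribed design:

```python
# Allowed VM status transitions; the exact table is an assumption for illustration.
TRANSITIONS = {
    "building": {"scheduled", "deleted"},
    "scheduled": {"running", "deleted"},
    "running": {"stopped", "deleted"},
    "stopped": {"running", "deleted"},
}

def advance(status: str, new_status: str) -> str:
    """Reject transitions the lifecycle does not allow (e.g., building -> running)."""
    if new_status not in TRANSITIONS.get(status, set()):
        raise ValueError(f"illegal transition {status} -> {new_status}")
    return new_status

s = advance("building", "scheduled")   # scheduler assigned a node
s = advance(s, "running")              # agent created the VM
assert s == "running"
```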

Minimal concrete example

POST /v1/vms
{
  "name": "web01",
  "cpu": 2,
  "ram": 4096,
  "image": "ubuntu"
}

Common misconceptions

  • “The API must directly create the VM.” -> It should delegate to agents for reliability.
  • “Stateless means no state.” -> Stateless components rely on external state stores.
  • “Idempotency is optional.” -> It is critical for safe retries.

Check-your-understanding questions

  1. Why should the API server be stateless?
  2. What is the role of a reconciliation loop?
  3. Predict what happens if the scheduler is down.

Check-your-understanding answers

  1. So it can scale and restart without losing state.
  2. It ensures actual state matches desired state over time.
  3. VM requests remain pending until scheduling resumes.

Real-world applications

  • OpenStack, OpenNebula, and cloud control planes.

Where you’ll apply it

References

  • “Designing Data-Intensive Applications” (coordination and state)

Key insights

Control planes separate intent from execution, which makes them resilient.

Summary

You now understand the architectural role of the control plane and why APIs must be idempotent.

Homework/Exercises to practice the concept

  1. Design an API schema for VM creation and deletion.
  2. Write a state transition diagram for VM lifecycle.

Solutions to the homework/exercises

  1. Include POST /v1/vms, GET /v1/vms/{id}, DELETE /v1/vms/{id}.
  2. States: building -> running -> stopped -> deleted.

2.2 Scheduling and Resource Accounting

Fundamentals

Scheduling decides which node should run a VM. A scheduler must understand resource capacity (CPU, RAM, disk) and current usage, then place workloads according to a policy (e.g., bin-packing or least-loaded). Resource accounting ensures that scheduling decisions do not oversubscribe nodes. This concept is crucial for building a cloud platform that behaves predictably. Even a simple scheduler must answer two questions consistently: can this node fit the VM, and why did I choose it over the others? Without this clarity, scheduling becomes guesswork instead of policy, and the result is random overloads under load. The scheduler is also the earliest place to enforce placement policy.

Deep Dive into the concept

Scheduling is a classic systems problem: you have a set of nodes with resources and a set of workloads with requirements. A simple scheduler might pick the node with the most free RAM. A more realistic scheduler uses bin-packing: it tries to pack VMs densely to leave other nodes free. In contrast, a spread strategy distributes VMs to reduce risk. You can implement a simple scoring system that combines CPU and RAM utilization.

Resource accounting means tracking current usage and reservations. The scheduler must know how much CPU and RAM is already allocated to VMs on each node. It can obtain this from libvirt or from a state store. But it must also account for pending VMs that are not yet running. This is why scheduler and state store must be consistent. If you schedule based on stale data, you can overload nodes.

A key design decision is whether you allow overcommit. Many cloud platforms allow CPU overcommit because not all VMs use full CPU simultaneously. Memory overcommit is riskier because it can lead to swapping or OOM. For a small platform, you might disable overcommit for simplicity. But you should document the trade-off.

Scheduling also has failure modes. If a node goes down, scheduled VMs may need to be re-placed. This is part of the reconciliation loop. Your scheduler should consider node health: only schedule to healthy nodes. You can maintain a heartbeat system where agents periodically update their status.

Finally, scheduling is a policy. You can experiment with different strategies: random, least-loaded, or affinity-based. The key is to make it deterministic and observable. Log scheduling decisions with the metrics that influenced them; this is essential for debugging.

There is also a fairness dimension. If you always pack VMs tightly, some nodes may remain idle while others are hot, which can reduce resilience. Conversely, spreading can waste resources but improve fault tolerance. In your project, you can expose the policy as a configuration flag and demonstrate how the same workload is placed differently. This teaches the real-world lesson that scheduling is not just an algorithm; it encodes operational priorities like cost, risk, and performance.

Scheduling constraints go beyond CPU and RAM. Real systems consider labels, affinity, and anti-affinity (e.g., keep replicas on different nodes). You can implement a lightweight version by allowing a VM to request a “zone” label and ensuring the scheduler respects it. Even if you do not implement full constraints, this concept shows why scheduling is a policy engine rather than a single formula. It also ties into failure domains from Project 4, which you can reuse conceptually here.

Admission control complements scheduling. Before a VM request reaches the scheduler, you can enforce global quotas or sanity checks (e.g., reject VMs larger than any node). This keeps the scheduler simpler and provides immediate feedback to users. Even a basic admission layer reinforces the idea that large systems separate validation, policy, and placement into distinct steps.

You can also record rejected requests for auditing. A simple log of “why” a request was denied builds intuition about capacity planning and helps users learn the platform’s limits. This is a tiny feature with big operational value.
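A minimal admission check with a rejection audit log might look like the following sketch; the node capacities and the log format are invented for illustration:

```python
# Example node inventory; in the real system this comes from the state store.
NODES = {"node1": {"cpu": 4, "ram": 8192}, "node2": {"cpu": 8, "ram": 16384}}

def admit(vm: dict, audit: list) -> bool:
    """Reject requests no node could ever satisfy, and record why."""
    fits_somewhere = any(
        vm["cpu"] <= n["cpu"] and vm["ram"] <= n["ram"] for n in NODES.values()
    )
    if not fits_somewhere:
        audit.append({"vm": vm["name"], "reason": "larger than any node"})
    return fits_somewhere

audit: list = []
assert admit({"name": "web01", "cpu": 2, "ram": 4096}, audit)
assert not admit({"name": "huge01", "cpu": 64, "ram": 262144}, audit)
```

Keeping this check in front of the scheduler gives users immediate feedback and keeps placement logic focused on nodes that could actually host the VM.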

How this fits into the project

Scheduling logic appears in Section 3.2 and Section 5.10 Phase 2, and is demonstrated in Section 3.7 with logs.

Definitions & key terms

  • Bin-packing -> Packing workloads to minimize wasted space.
  • Overcommit -> Reserving more resources than physically available.
  • Placement decision -> Scheduler’s chosen node for a VM.

Mental model diagram (ASCII)

Nodes: A(4 CPU, 8GB), B(8 CPU, 16GB)
VM: 2 CPU, 4GB -> schedule to node with best fit

How it works (step-by-step, with invariants and failure modes)

  1. Gather node resource metrics.
  2. Filter out unhealthy nodes.
  3. Score nodes based on policy.
  4. Choose best node and record placement.

Failure modes: stale metrics, race conditions, overcommit errors.

Minimal concrete example

score = (free_cpu_weight * free_cpu) + (free_ram_weight * free_ram)
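The scoring formula can be expanded into a small placement function. Node fields and weights here are illustrative, not a fixed schema:

```python
def score(node: dict, w_cpu: float = 1.0, w_ram: float = 0.001) -> float:
    """Least-loaded policy: more free resources -> higher score."""
    return w_cpu * node["free_cpu"] + w_ram * node["free_ram"]

def pick_node(nodes: list, vm: dict):
    """Filter nodes that fit, then choose deterministically by (score, name)."""
    fits = [n for n in nodes if n["free_cpu"] >= vm["cpu"] and n["free_ram"] >= vm["ram"]]
    return max(fits, key=lambda n: (score(n), n["name"]), default=None)

nodes = [
    {"name": "A", "free_cpu": 4, "free_ram": 8192},
    {"name": "B", "free_cpu": 8, "free_ram": 16384},
]
best = pick_node(nodes, {"cpu": 2, "ram": 4096})
assert best["name"] == "B"   # least-loaded prefers the emptier node
```

Flipping `max` to `min` turns the same scoring into a bin-packing policy; the tie-break on node name keeps decisions deterministic, which matters for debugging.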

Common misconceptions

  • “Scheduling is just picking a random node.” -> It must respect resource limits.
  • “Overcommit is always bad.” -> It is a deliberate policy with trade-offs.
  • “Schedulers never fail.” -> They are distributed components and can fail.

Check-your-understanding questions

  1. Why is resource accounting critical for scheduling?
  2. What happens if you schedule two large VMs at once without locking?
  3. Predict how overcommit affects node stability.

Check-your-understanding answers

  1. Without accounting, you can overload nodes.
  2. Both VMs may be placed on the same node, causing overload.
  3. Overcommit increases utilization but risks contention or OOM.

Real-world applications

  • Kubernetes scheduler and OpenStack Nova.

Where you’ll apply it

References

  • Scheduling chapters in distributed systems literature

Key insights

Scheduling is resource policy encoded in code; it must be deterministic and observable.

Summary

You now understand how to implement a simple scheduler with resource accounting.

Homework/Exercises to practice the concept

  1. Simulate scheduling decisions with three nodes and five VMs.
  2. Implement a simple bin-packing algorithm in a script.

Solutions to the homework/exercises

  1. Place VMs in a way that maximizes remaining capacity.
  2. Sort VMs by size and place in the smallest fitting node.
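Solution 2 (first-fit decreasing) can be sketched as follows; for simplicity the only resource considered is RAM:

```python
def first_fit_decreasing(vms: list, nodes: dict) -> dict:
    """Sort VMs largest-first, place each in the smallest node that still fits."""
    placements = {}
    free = dict(nodes)                            # name -> remaining RAM
    for vm in sorted(vms, key=lambda v: v["ram"], reverse=True):
        fitting = [n for n, f in free.items() if f >= vm["ram"]]
        if not fitting:
            placements[vm["name"]] = None         # no capacity: surface the failure
            continue
        target = min(fitting, key=lambda n: free[n])  # smallest fitting node
        free[target] -= vm["ram"]
        placements[vm["name"]] = target
    return placements

p = first_fit_decreasing(
    [{"name": "a", "ram": 6000}, {"name": "b", "ram": 3000}, {"name": "c", "ram": 3000}],
    {"node1": 8192, "node2": 8192},
)
assert p == {"a": "node1", "b": "node2", "c": "node2"}
```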

2.3 Distributed State Store and Reconciliation Loops

Fundamentals

A distributed state store (e.g., etcd or SQLite + replication) keeps the source of truth for VM state. Agents on each node reconcile this desired state with actual state by creating or destroying VMs as needed. This pattern is the backbone of reliable cloud platforms because it allows components to crash and recover without losing intent. You should think of the store as the contract between independent components, and treat every write as a durable decision the rest of the system must honor. When in doubt, the store wins over local observations. That principle prevents conflicting actions across nodes. It also makes audits and rollback possible.

Deep Dive into the concept

The state store is the control plane’s memory. It contains VM records, node status, and placement decisions. For a minimal system, you can use SQLite with a single API server, but for a multi-node control plane, a distributed key-value store like etcd is more realistic. It provides strong consistency and supports compare-and-swap operations, which are useful for idempotent updates and locking.

Reconciliation loops are the mechanism by which agents enforce desired state. Each agent periodically reads VM records assigned to its node, compares them to actual VM state (via libvirt), and takes action: create missing VMs, start stopped VMs, or delete unwanted ones. This loop is eventually consistent: if a VM creation fails temporarily, the loop will retry.

Locks and leases are important. Suppose two agents think they own the same VM; you risk duplicate creation. A common solution is to assign each VM a node in the state store and enforce that only that node’s agent can act. You can also use leases in etcd so that if a node fails, its leases expire and other nodes can take over.

Consistency vs availability is the central trade-off. A strongly consistent store like etcd ensures that the control plane has a single source of truth, but it requires quorum. If quorum is lost, the control plane cannot schedule new VMs. This is similar to the quorum rules in your hyperconverged lab. Understanding this trade-off helps you reason about failure modes: if the state store is down, the system can still keep running VMs but cannot accept new changes.

The reconciliation loop is also where you implement health checks and repair. If a VM crashes, the agent can detect it and restart. If a node becomes unhealthy, the scheduler can reschedule its VMs. This is the entry point for high availability. Even in a minimal system, you should implement a simple health check and state transition logic.

Another subtlety is eventual consistency. Agents may act on slightly stale desired state if the store update lags or if network partitions occur. Your design should tolerate these delays by making operations idempotent and by recording the “last seen” version of a VM record. This is a key pattern in distributed systems: even if the control plane is temporarily inconsistent, reconciliation will converge once communication is restored.

Leader election is another pattern often used in control planes. If you run multiple schedulers, only one should be active at a time to avoid duplicate placements. You can avoid this by running a single scheduler process, but understanding leader election prepares you for scaling. Even a simple “lease” key in the state store that the scheduler renews periodically illustrates the concept and ties back to quorum and consensus.

Also consider how agents report health. A periodic heartbeat with a TTL in the state store is a straightforward approach. If the TTL expires, the scheduler marks the node unhealthy and avoids placing new VMs there. This mirrors real-world systems and makes your mini cloud resilient to partial failures without requiring complex monitoring infrastructure.
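The TTL heartbeat described above can be sketched with plain timestamps; the names and the 10-second TTL are arbitrary choices for this example:

```python
TTL = 10.0                          # seconds a heartbeat stays valid
HEARTBEATS: dict[str, float] = {}   # node -> last heartbeat timestamp

def beat(node: str, now: float) -> None:
    """Agent side: record (or renew) this node's heartbeat."""
    HEARTBEATS[node] = now

def healthy_nodes(now: float) -> set[str]:
    """Scheduler side: a node is healthy iff its heartbeat has not expired."""
    return {n for n, t in HEARTBEATS.items() if now - t <= TTL}

beat("node1", 0.0)
beat("node2", 0.0)
beat("node1", 8.0)                  # node1 keeps renewing; node2 goes silent
assert healthy_nodes(12.0) == {"node1"}
```

With etcd, the same effect comes for free from lease-attached keys: the key disappears when the lease expires, so the scheduler never even sees the dead node.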

How this fits into the project

State store and reconciliation are implemented in Section 3.2 and Section 5.10 Phase 3, and are demonstrated in Section 3.7 with failure cases.

Definitions & key terms

  • State store -> Database or KV store containing desired state.
  • Reconciliation loop -> Periodic process that enforces desired state.
  • Lease -> Time-limited ownership token to avoid split-brain.

Mental model diagram (ASCII)

State Store (desired state)
    |
    v
Agent reads -> compares -> acts -> updates state

How it works (step-by-step, with invariants and failure modes)

  1. API writes desired state.
  2. Scheduler assigns node.
  3. Agent reads assigned VMs.
  4. Agent reconciles with libvirt state.
  5. Agent updates status in store.

Failure modes: state store down, stale leases, conflicting updates.

Minimal concrete example

if vm.desired == "running" and vm.actual != "running":
  start_vm(vm)

Common misconceptions

  • “Reconciliation is one-time.” -> It is continuous and periodic.
  • “State store is only for metadata.” -> It is the source of truth.
  • “Agents can always act.” -> They must respect leases and locks.

Check-your-understanding questions

  1. Why is a strong state store important for correctness?
  2. What happens if two agents reconcile the same VM?
  3. Predict how loss of quorum affects scheduling.

Check-your-understanding answers

  1. It prevents split-brain and inconsistent decisions.
  2. You may get duplicate VMs or conflicting actions.
  3. New scheduling halts; existing VMs keep running.

Real-world applications

  • Kubernetes controllers and etcd.
  • OpenStack Nova + Placement.

Where you’ll apply it

References

  • etcd documentation
  • “Designing Data-Intensive Applications” (consensus and coordination)

Key insights

Reconciliation loops make systems self-healing; the state store is the truth.

Summary

You now understand how distributed state and reconciliation enable resilient control planes.

Homework/Exercises to practice the concept

  1. Implement a simple reconciliation loop in a script.
  2. Simulate a node failure by stopping an agent.

Solutions to the homework/exercises

  1. Poll a JSON file and ensure a local process matches it.
  2. Stop the agent and observe stale state in the store.

2.4 Network Virtualization, IPAM, and Tenant Isolation

Fundamentals

A mini cloud platform is not complete without networking. When you create a VM, you must allocate an IP, connect its virtual NIC to a bridge or overlay network, and ensure traffic is isolated between tenants. This requires three building blocks: virtual networking (bridges, TAP devices, or VXLAN), IP address management (IPAM) to track and allocate addresses deterministically, and policy enforcement (NAT or security groups) to control access. Even in a small cluster, these systems must be reliable and repeatable, because a bad network configuration can make a VM appear “down” even when it is running.

Deep Dive into the concept

Virtual networking in KVM-based systems typically starts with a Linux bridge. Each VM has a TAP interface; the hypervisor attaches that TAP to a bridge that acts like a virtual switch. This provides layer-2 connectivity among VMs on the same host. If you want VMs to reach the outside world, you can add NAT on the host or bridge directly to a physical interface. For a multi-host setup, you need an overlay network. VXLAN is the most common choice: it encapsulates layer-2 frames inside UDP packets and uses VTEPs (VXLAN Tunnel Endpoints) on each host to create a shared L2 segment across L3 networks. This lets a VM on node A talk to a VM on node B as if they were on the same switch.

IPAM is the control-plane mechanism that hands out IP addresses and prevents collisions. At minimum, you need a database table that tracks subnet ranges, allocated addresses, and which VM holds which IP. You can implement static IP assignment (control plane writes the IP into cloud-init or guest metadata) or DHCP. Static assignment is simpler and deterministic: you allocate an IP, store it in your state store, and inject it into the VM’s cloud-init user-data. The downside is manual DNS unless you build your own. DHCP is more flexible but requires a DHCP server and predictable MAC address assignment. For a mini-cloud, static IPs plus a simple DNS mapping in your API layer is often enough.

Tenant isolation is the next challenge. You must ensure that VM A cannot spoof traffic to VM B’s network. The simplest model is per-tenant bridges or per-tenant VXLAN segments. Each tenant gets a unique bridge or VXLAN VNI. Your control plane creates the correct network objects at VM creation time and ensures that each VM’s tap interface connects to the right segment. For additional control, you can implement security groups using iptables or nftables to restrict inbound and outbound traffic. Even if you only support a basic allowlist (e.g., SSH and HTTP), it teaches you how cloud platforms enforce network policy.

Another subtlety is MAC address stability. If a VM’s MAC changes between boots, DHCP leases break and security rules might not match. Your control plane should generate deterministic MAC addresses, store them in the database, and reapply them at every start. This is the same reason that large clouds use deterministic network metadata services: without stable IDs, networks become unreliable.
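Deterministic MACs can be derived by hashing the VM ID. This sketch uses the 52:54:00 prefix conventionally used for QEMU/KVM guest NICs, which is also a locally administered unicast address:

```python
import hashlib

def stable_mac(vm_id: str) -> str:
    """Derive a deterministic MAC from the VM ID: same VM, same MAC, every boot."""
    digest = hashlib.sha256(vm_id.encode()).digest()
    return "52:54:00:" + ":".join(f"{b:02x}" for b in digest[:3])

assert stable_mac("vm-101") == stable_mac("vm-101")
assert stable_mac("vm-101") != stable_mac("vm-102")
```

Storing the generated MAC in the state store (rather than re-deriving it) also works and makes the mapping auditable; either way, the point is that the address never changes across restarts.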

Finally, observe how networking interacts with scheduling. If a VM is migrated to another node, the control plane must reattach the network interface and ensure IP/MAC continuity. With VXLAN, this is automatic because the overlay is host-agnostic. With local bridges, you must perform additional wiring on the target host. Designing for migration early prevents painful rewrites later.

How this fits into the project

This concept informs Section 3.2 functional requirements (network creation and IP assignment), Section 3.5 data formats (network config schema), and Section 5.10 Phase 2 where you wire VMs to networks.

Definitions & key terms

  • TAP device -> Virtual NIC endpoint used by VMs.
  • Linux bridge -> Kernel L2 switch that connects interfaces.
  • VXLAN -> L2 overlay network over UDP.
  • IPAM -> IP Address Management, allocation and tracking of IPs.
  • Security group -> Policy rules restricting network traffic.

Mental model diagram (ASCII)

VM -> TAP0 --+
             |--> Linux bridge --> NAT/uplink
VM -> TAP1 --+

Multi-host:
VM -> TAP -> bridge -> VXLAN VTEP ~~ UDP ~~ VTEP -> bridge -> TAP -> VM

How it works (step-by-step, with invariants and failure modes)

  1. Allocate an IP and MAC from IPAM (invariant: no duplicates in DB).
  2. Create or select a network segment (bridge or VXLAN VNI).
  3. Create a TAP device and attach it to the segment.
  4. Configure VM NIC with the allocated MAC and network.
  5. Apply firewall or NAT rules. Failure mode: wrong bridge or IP leads to unreachable VM.

Minimal concrete example

# Create bridge
ip link add br0 type bridge
ip link set br0 up

# Create TAP and attach
ip tuntap add dev tap0 mode tap
ip link set tap0 master br0
ip link set tap0 up

Common misconceptions

  • “Networking is just a libvirt XML detail.” -> The control plane must ensure real Linux networking exists.
  • “IP allocation can be random.” -> Random without tracking causes collisions and hard-to-debug outages.
  • “Security groups are optional.” -> Without them, every VM is exposed to every other VM.

Check-your-understanding questions

  1. Why does a VM need a TAP device instead of directly using a bridge?
  2. Predict what happens if two VMs are assigned the same static IP.
  3. What advantage does VXLAN provide in multi-host clusters?
  4. How can a control plane keep MAC addresses stable across VM restarts?

Check-your-understanding answers

  1. The TAP device is the VM-facing endpoint that the hypervisor can attach to a bridge.
  2. ARP conflicts and intermittent connectivity; one VM will “steal” traffic.
  3. VXLAN creates a virtual L2 network across hosts without requiring shared physical switches.
  4. Generate a deterministic MAC and store it in the state database; reapply on each boot.

Real-world applications

  • OpenStack Neutron and Kubernetes CNI plugins implement IPAM and overlays at scale.
  • Cloud providers use VXLAN/GENEVE overlays to isolate tenant networks.

Where you’ll apply it

References

  • Linux bridge and ip command documentation
  • RFC 7348 (VXLAN)

Key insights

Networking is part of the control plane: without deterministic IP and network wiring, your VMs are effectively offline.

Summary

You now understand how virtual networks, IPAM, and policy enforcement fit into a mini cloud platform.

Homework/Exercises to practice the concept

  1. Write a small IPAM allocator that hands out IPs from 10.0.3.0/24 and tracks them in a SQLite table.
  2. Create two bridges and show how to isolate traffic between them with separate subnets.

Solutions to the homework/exercises

  1. Store allocated IPs with a UNIQUE constraint and return the first free address on allocation.
  2. Attach each VM’s TAP to its own bridge and ensure no routes exist between the subnets.

3. Project Specification

3.1 What You Will Build

A mini cloud platform that:

  • Exposes REST API for VM lifecycle.
  • Schedules VMs across multiple nodes.
  • Uses a state store for VM and node state.
  • Runs node agents that manage libvirt.

Included: API server, scheduler, state store, node agents. Excluded: full multi-tenant auth, billing, GUI.

3.2 Functional Requirements

  1. API: Create, list, get, delete VMs.
  2. Scheduler: place VMs on nodes based on resources.
  3. State Store: persist VM and node state.
  4. Node Agent: reconcile VM state with libvirt.
  5. Health Checks: detect node failures.

3.3 Non-Functional Requirements

  • Reliability: API and scheduler can restart without losing state.
  • Usability: clear API responses and logs.
  • Observability: logs for scheduling decisions.

3.4 Example Usage / Output

$ curl -X POST http://localhost:8080/v1/vms \
  -d '{"name":"web01","cpu":2,"ram":4096,"image":"ubuntu"}'
{"id":"vm-101","status":"building","host":"node2"}

3.5 Data Formats / Schemas / Protocols

VM record (JSON):

{
  "id": "vm-101",
  "name": "web01",
  "cpu": 2,
  "ram": 4096,
  "image": "ubuntu",
  "status": "running",
  "host": "node2"
}

3.6 Edge Cases

  • Scheduler cannot find capacity -> 409 error.
  • Node agent down -> VM stays in building state.
  • Duplicate VM name -> 409 error.

3.7 Real World Outcome

You will be able to create and manage VMs across multiple nodes using a simple REST API.

3.7.1 How to Run (Copy/Paste)

./apiserver
./scheduler
./agent --node node1

3.7.2 Golden Path Demo (Deterministic)

  • Create a VM and observe scheduler log showing placement.

3.7.3 API Transcript (Success + Failure)

$ curl -s -X POST http://localhost:8080/v1/vms \
  -d '{"name":"web01","cpu":2,"ram":4096,"image":"ubuntu"}'
{"id":"vm-101","status":"building","host":"node2"}

$ curl -s http://localhost:8080/v1/vms/vm-101
{"id":"vm-101","status":"running","ip":"10.0.3.15"}

$ curl -s -X POST http://localhost:8080/v1/vms \
  -d '{"name":"web01","cpu":64,"ram":262144,"image":"ubuntu"}'
{"error":{"code":"NO_CAPACITY","message":"no node can fit this VM"}}

Error JSON shape:

{
  "error": {
    "code": "STRING_CODE",
    "message": "human readable message",
    "details": {"optional": "context"}
  }
}

4. Solution Architecture

4.1 High-Level Design

API Server -> State Store -> Scheduler -> Node Agents -> libvirt

4.2 Key Components

| Component | Responsibility | Key Decisions |
|-----------|----------------|---------------|
| API server | request validation | stateless design |
| Scheduler | placement decisions | bin-packing |
| State store | source of truth | etcd or SQLite |
| Node agent | reconcile VMs | periodic loop |

4.3 Data Structures (No Full Code)

type VM struct {
  ID string
  Name string
  CPU int
  RAM int
  Host string
  Status string
}

4.4 Algorithm Overview

Key Algorithm: Reconciliation

  1. Agent lists desired VMs.
  2. Compares to libvirt state.
  3. Creates/starts/stops as needed.

Complexity Analysis:

  • Time: O(VMs per node)
  • Space: O(number of VMs)
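The algorithm above reduces to set differences between desired and actual state. A minimal sketch, with plain dicts and sets standing in for the state store and libvirt:

```python
def reconcile(desired: dict[str, str], actual: set[str]) -> dict[str, list[str]]:
    """One pass: desired maps vm -> wanted status for this node; actual holds running VMs."""
    actions: dict[str, list[str]] = {"create": [], "stop": [], "delete": []}
    for vm, status in desired.items():
        if status == "running" and vm not in actual:
            actions["create"].append(vm)
        elif status == "stopped" and vm in actual:
            actions["stop"].append(vm)
    for vm in actual:
        if vm not in desired:
            actions["delete"].append(vm)   # present locally but no longer wanted
    return actions

acts = reconcile({"web01": "running", "db01": "stopped"}, {"db01", "old99"})
assert acts == {"create": ["web01"], "stop": ["db01"], "delete": ["old99"]}
```

The real agent replaces the `actions` lists with libvirt calls, but the shape of the loop is the same: diff, act, repeat.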

5. Implementation Guide

5.1 Development Environment Setup

# install libvirt and a KV store (etcd or sqlite)

5.2 Project Structure

mini-cloud/
+-- apiserver/
+-- scheduler/
+-- agent/
+-- shared/

5.3 The Core Question You’re Answering

“How do real cloud platforms orchestrate VMs across multiple hosts?”

5.4 Concepts You Must Understand First

  1. Control plane design
  2. Scheduling algorithms
  3. Distributed state stores

5.5 Questions to Guide Your Design

  1. What is your API contract?
  2. How do you avoid duplicate VM creation?
  3. How do agents recover after crashes?

5.6 Thinking Exercise

Design a scheduling policy that avoids hotspots.

5.7 The Interview Questions They’ll Ask

  1. How does a scheduler decide where to place a VM?
  2. What happens if the state store goes down?
  3. Why is reconciliation important?

5.8 Hints in Layers

Hint 1: Start with a single-node deployment.
Hint 2: Add scheduler logic once API works.
Hint 3: Add node agents with reconciliation.

5.9 Books That Will Help

| Topic | Book | Chapter |
|-------|------|---------|
| Distributed systems | DDIA | Ch. 8-9 |
| Virtualization | Modern OS | Ch. 7 |
| Systems design | Site Reliability Engineering | Ch. 2 |

5.10 Implementation Phases

Phase 1: Foundation (2-3 weeks)

Goals: API server + state store.
Tasks: CRUD endpoints, database schema.
Checkpoint: POST /v1/vms stores records.

Phase 2: Core Functionality (3-4 weeks)

Goals: Scheduler and node agents.
Tasks: placement logic, reconciliation loop.
Checkpoint: VMs start on chosen nodes.

Phase 3: Polish & Edge Cases (2-3 weeks)

Goals: HA and failure handling.
Tasks: node health checks, rescheduling.
Checkpoint: scheduler avoids unhealthy nodes.

5.11 Key Implementation Decisions

| Decision | Options | Recommendation | Rationale |
|----------|---------|----------------|-----------|
| State store | etcd vs SQLite | SQLite for simplicity | single control plane |
| Scheduling | bin-pack vs spread | bin-pack | efficient usage |
| API auth | none vs token | token optional | security awareness |


6. Testing Strategy

6.1 Test Categories

| Category | Purpose | Examples |
|----------|---------|----------|
| Unit Tests | scheduler scoring | resource fit |
| Integration Tests | API + agent | create VM |
| Failure Tests | node down | reschedule |

6.2 Critical Test Cases

  1. VM created via API appears running on target node.
  2. Duplicate VM name returns 409.
  3. Scheduler logs decision.

6.3 Test Data

Two nodes with different capacities

7. Common Pitfalls & Debugging

7.1 Frequent Mistakes

| Pitfall | Symptom | Solution |
|---------|---------|----------|
| Stale metrics | VMs overload node | refresh before schedule |
| No idempotency | duplicate VMs | enforce unique names |
| Agent crash | VMs stuck | restart agent |

7.2 Debugging Strategies

  • Log every state transition.
  • Use correlation IDs for API requests.

7.3 Performance Traps

  • Too frequent reconciliation loops can overload libvirt.

8. Extensions & Challenges

8.1 Beginner Extensions

  • Add a CLI client for API.

8.2 Intermediate Extensions

  • Add live migration endpoint.

8.3 Advanced Extensions

  • Add multi-tenant quotas and RBAC.

9. Real-World Connections

9.1 Industry Applications

  • Private cloud stacks and virtualization platforms.
  • OpenStack Nova
  • OpenNebula

9.2 Interview Relevance

  • Control plane architecture and scheduling are common design interview topics.

10. Resources

10.1 Essential Reading

  • DDIA chapters on coordination
  • libvirt API docs

10.2 Video Resources

  • Cloud control plane talks

10.3 Tools & Documentation

  • virsh, libvirt APIs, etcd

11. Self-Assessment Checklist

11.1 Understanding

  • I can explain control plane vs data plane.
  • I understand scheduling trade-offs.
  • I can describe reconciliation loops.

11.2 Implementation

  • API works as documented.
  • Scheduler places VMs correctly.
  • Agents reconcile state.

11.3 Growth

  • I can explain this system in an interview.

12. Submission / Completion Criteria

Minimum Viable Completion:

  • API server + scheduler + agent run.
  • VM creation works end-to-end.

Full Completion:

  • Failure handling and rescheduling.
  • Deterministic scheduling logs.

Excellence (Going Above & Beyond):

  • Live migration endpoint and multi-node HA.