
Learn Home Clusters: From Single Server to Distributed Infrastructure

Goal: Deeply understand home clusters (homelabs)—what they are, why they’re useful, how they differ from single servers, and how to build increasingly sophisticated distributed systems at home. You’ll progress from running your first container to building a production-grade Kubernetes cluster with GitOps, monitoring, and self-healing.


Why Home Clusters Matter

A home cluster (or homelab) is a collection of computers in your home that work together as a unified system. Instead of one big server doing everything, you have multiple machines sharing the load, providing redundancy, and teaching you the skills that power modern cloud infrastructure.

After completing these projects, you will:

  • Understand the difference between a single server and a cluster
  • Know when to use VMs vs. containers vs. orchestration
  • Build and manage your own Kubernetes cluster
  • Implement distributed storage that survives disk failures
  • Set up monitoring, alerting, and self-healing infrastructure
  • Deploy applications using GitOps (infrastructure as code)
  • Understand networking concepts like VLANs, load balancing, and VPNs
  • Have skills directly transferable to cloud platforms (AWS, GCP, Azure)

Core Concept Analysis

What is a Cluster?

Single Server                          Cluster
┌─────────────────┐                    ┌─────────────────┐
│                 │                    │    Node 1       │
│   All services  │                    │  (control plane)│
│   All storage   │                    └────────┬────────┘
│   Single point  │                             │
│   of failure    │                    ┌────────┴────────┐
│                 │                    │                 │
└─────────────────┘           ┌────────┴──────┐  ┌───────┴───────┐
                              │    Node 2     │  │    Node 3     │
                              │   (worker)    │  │   (worker)    │
                              └───────────────┘  └───────────────┘

If server dies,               If node dies, workloads move
everything is gone            to remaining nodes automatically

The Stack: From Hardware to Applications

┌─────────────────────────────────────────────────────────────────────────┐
│                          YOUR APPLICATIONS                              │
│            (Websites, Databases, Media Servers, Home Automation)        │
├─────────────────────────────────────────────────────────────────────────┤
│                      CONTAINER ORCHESTRATION                            │
│              (Kubernetes / K3s / Docker Swarm / Nomad)                  │
│        Schedules containers, handles scaling, self-healing             │
├─────────────────────────────────────────────────────────────────────────┤
│                      CONTAINER RUNTIME                                  │
│                    (Docker / containerd / Podman)                       │
│              Runs isolated containers from images                       │
├─────────────────────────────────────────────────────────────────────────┤
│                      VIRTUALIZATION (optional)                          │
│                    (Proxmox / VMware / Hyper-V)                         │
│           VMs provide isolation and easy snapshotting                   │
├─────────────────────────────────────────────────────────────────────────┤
│                      OPERATING SYSTEM                                   │
│              (Ubuntu Server / Debian / Fedora / Talos)                  │
├─────────────────────────────────────────────────────────────────────────┤
│                         HARDWARE                                        │
│    (Raspberry Pi / Intel N100 Mini PC / Refurbished Dell Optiplex)     │
└─────────────────────────────────────────────────────────────────────────┘

Why Build a Home Cluster?

Reason              Description
Learning            Hands-on experience with cloud-native technologies without cloud costs
Career              Kubernetes, Docker, and DevOps skills are highly sought after
Self-hosting        Run your own services (media, email, storage) with privacy
High availability   Services survive individual machine failures
Cost savings        Learn on $200-500 hardware vs. $100s/month in cloud bills
Experimentation     Break things freely and rebuild in minutes with automation

Cluster Types Compared

Type              Complexity   Use Case                             Learning Value
Docker Compose    ★☆☆☆☆        Single server, multiple containers   Container basics
Docker Swarm      ★★☆☆☆        Simple multi-node clustering         HA basics, easy setup
K3s / MicroK8s    ★★★☆☆        Lightweight Kubernetes               Production K8s skills
Full Kubernetes   ★★★★☆        Complete K8s experience              Enterprise-grade skills
Proxmox + K8s     ★★★★★        VMs + Containers                     Full infrastructure

Proxmox vs Kubernetes: When to Use What

┌─────────────────────────────────────────────────────────────────┐
│                         PROXMOX                                 │
│                   (Virtualization Layer)                        │
│                                                                 │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐           │
│  │  VM 1   │  │  VM 2   │  │  VM 3   │  │  VM 4   │           │
│  │ K8s     │  │ K8s     │  │ K8s     │  │ Windows │           │
│  │ Control │  │ Worker  │  │ Worker  │  │ for     │           │
│  │ Plane   │  │ Node 1  │  │ Node 2  │  │ Testing │           │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘           │
└─────────────────────────────────────────────────────────────────┘

Proxmox handles:                Kubernetes handles:
- VM lifecycle                  - Container scheduling
- Snapshots & backups          - Auto-scaling
- Resource allocation          - Service discovery
- HA for VMs                   - Rolling deployments
- Storage management           - Self-healing containers

Best practice: Run Kubernetes inside Proxmox VMs. You get the best of both worlds—VM-level isolation and easy recovery (snapshots) with container-level efficiency and orchestration.


Hardware Options for Home Clusters

Budget Recommendations (2025)

Option                 Cost         Nodes   Best For
Raspberry Pi 5 (4GB)   ~$60 each    3-5     Learning, low power, ARM experience
Intel N100 Mini PCs    ~$150 each   3-4     Best value, 6W TDP, silent
Refurbished Optiplex   ~$100 each   3-4     More RAM/CPU, slightly loud
Minisforum MS-01       ~$600 each   2-3     10GbE networking, serious clusters

Starter Cluster: 3-Node Raspberry Pi

Total Cost: ~$200-250

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Pi 5 (8GB)     │     │  Pi 5 (4GB)     │     │  Pi 5 (4GB)     │
│  Control Plane  │     │  Worker 1       │     │  Worker 2       │
│                 │     │                 │     │                 │
│  + 64GB SD      │     │  + 64GB SD      │     │  + 64GB SD      │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                         ┌───────┴───────┐
                         │   Gigabit     │
                         │   Switch      │
                         └───────────────┘

Recommended Cluster: 3-Node Intel N100 Mini PC

Total Cost: ~$450-600

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ N100 Mini PC    │     │ N100 Mini PC    │     │ N100 Mini PC    │
│ 16GB RAM        │     │ 16GB RAM        │     │ 16GB RAM        │
│ 512GB NVMe      │     │ 512GB NVMe      │     │ 512GB NVMe      │
│ 2x 2.5GbE       │     │ 2x 2.5GbE       │     │ 2x 2.5GbE       │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                         ┌───────┴───────┐
                         │  2.5GbE       │
                         │  Switch       │
                         └───────────────┘

Advantages:
- x86 architecture (more compatible software)
- 16-32GB RAM per node
- NVMe storage (fast)
- Low power (~6W idle)
- Silent operation

Project List

Projects are ordered from fundamental understanding to advanced implementations.


Project 1: Docker on a Single Node (The Foundation)

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: Bash / YAML
  • Alternative Programming Languages: Python (for automation)
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Containers / Docker
  • Software or Tool: Docker, Docker Compose
  • Main Book: “Docker Deep Dive” by Nigel Poulton

What you’ll build: A single server running multiple containerized applications (web server, database, reverse proxy) using Docker and Docker Compose. This is the foundation for all cluster work.

Why it teaches home clusters: Before you can orchestrate containers across multiple machines, you need to understand what containers are and how they work on a single machine. Docker Compose is the “single-node orchestrator” that introduces you to declarative configuration.

Core challenges you’ll face:

  • Container networking → maps to how containers talk to each other
  • Volume management → maps to persistent data in ephemeral containers
  • Image building → maps to creating custom container images
  • Compose file syntax → maps to declarative infrastructure

Key Concepts:

  • Container vs VM: “Docker Deep Dive” Chapter 4 - Nigel Poulton
  • Docker Networking: “Docker Deep Dive” Chapter 11
  • Docker Volumes: “Docker Deep Dive” Chapter 10
  • Compose File Reference: Docker official documentation

Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic Linux command line, understanding of web servers. No prior container experience needed.

Real world outcome:

$ docker compose up -d

[+] Running 4/4
 ✔ Network homelab_default       Created
 ✔ Container homelab-db-1        Started
 ✔ Container homelab-app-1       Started
 ✔ Container homelab-nginx-1     Started

$ curl http://localhost
Welcome to my homelab!

$ docker compose ps
NAME                STATUS      PORTS
homelab-db-1        running     5432/tcp
homelab-app-1       running     8080/tcp
homelab-nginx-1     running     0.0.0.0:80->80/tcp

Implementation Hints:

Docker Compose file structure:

# docker-compose.yml
version: '3.8'

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - app

  app:
    build: ./app
    environment:
      - DATABASE_URL=postgres://db:5432/myapp

  db:
    image: postgres:15
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=secret

volumes:
  postgres_data:

Questions to answer as you build:

  • What happens to data when a container restarts?
  • How do containers find each other by name?
  • Why use Alpine-based images?
  • What’s the difference between build: and image:?
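
A few quick experiments answer these empirically; a rough sketch, assuming the Compose file above is running (the probe file and exact image tags are just examples):

# Data persistence: write a marker into the named volume, recreate the container, check it survived
docker compose exec db sh -c 'echo hello > /var/lib/postgresql/data/probe.txt'
docker compose up -d --force-recreate db
docker compose exec db cat /var/lib/postgresql/data/probe.txt   # still prints "hello"

# Service discovery: containers on the default Compose network resolve each other by service name
docker compose exec db getent hosts app

# Image size: compare an Alpine-based image with the Debian-based default
docker images --format '{{.Repository}}:{{.Tag}}  {{.Size}}' nginx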

Learning milestones:

  1. “Hello World” container runs → You understand container basics
  2. Multi-container app works → You understand networking
  3. Data persists across restarts → You understand volumes
  4. You can rebuild from scratch in one command → You understand declarative config

Project 2: Docker Swarm Cluster (Your First Real Cluster)

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: Bash / YAML
  • Alternative Programming Languages: Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Container Orchestration
  • Software or Tool: Docker Swarm, Portainer
  • Main Book: “Docker Deep Dive” by Nigel Poulton

What you’ll build: A 3-node Docker Swarm cluster where containers can run on any node, services are load-balanced automatically, and if a node dies, containers are rescheduled to surviving nodes.

Why it teaches home clusters: Docker Swarm is the gentlest introduction to container orchestration. One command (docker swarm init) and you have a cluster. It teaches the core concepts—managers, workers, services, replicas—without Kubernetes complexity.

Core challenges you’ll face:

  • Manager vs worker nodes → maps to control plane vs data plane
  • Service replication → maps to high availability
  • Overlay networking → maps to cross-node container communication
  • Stack deployments → maps to multi-service applications

Key Concepts:

  • Swarm Mode: “Docker Deep Dive” Chapter 14 - Nigel Poulton
  • Overlay Networks: “Docker Deep Dive” Chapter 11
  • Service Discovery: Built into Swarm via DNS
  • Rolling Updates: “Docker Deep Dive” Chapter 14

Difficulty: Intermediate
Time estimate: Weekend to 1 week
Prerequisites: Project 1 (Docker basics). 2-3 machines (physical or VMs) with Docker installed.

Real world outcome:

# On node 1 (manager)
$ docker swarm init --advertise-addr 192.168.1.10
Swarm initialized: current node is now a manager.
To add a worker: docker swarm join --token SWMTKN-xxx 192.168.1.10:2377

# On nodes 2 and 3
$ docker swarm join --token SWMTKN-xxx 192.168.1.10:2377
This node joined a swarm as a worker.

# Back on manager - deploy a replicated service
$ docker service create --name web --replicas 3 -p 80:80 nginx
3/3: Running

$ docker service ps web
ID         NAME    NODE     STATE
abc123     web.1   node1    Running
def456     web.2   node2    Running
ghi789     web.3   node3    Running

# Kill node3, watch containers reschedule
$ docker service ps web
ID         NAME        NODE     STATE
abc123     web.1       node1    Running
def456     web.2       node2    Running
jkl012     web.3       node1    Running  # Rescheduled!

Implementation Hints:

Swarm stack file (similar to Compose, but for Swarm):

# stack.yml
version: '3.8'

services:
  web:
    image: nginx:alpine
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
    ports:
      - "80:80"
    networks:
      - webnet

  visualizer:
    image: dockersamples/visualizer
    ports:
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints:
          - node.role == manager

networks:
  webnet:
    driver: overlay

Deploy and manage:

$ docker stack deploy -c stack.yml myapp
$ docker stack services myapp
$ docker stack ps myapp
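
To watch rescheduling without pulling a power cable, drain a node; a short sketch using standard Swarm commands (the node and stack names follow the examples above):

# Take node3 out of service; Swarm moves its tasks onto the remaining nodes
docker node update --availability drain node3
docker service ps myapp_web          # tasks formerly on node3 now run on node1/node2

# Bring it back and spread load again
docker node update --availability active node3
docker service scale myapp_web=5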

Learning milestones:

  1. 3-node cluster initialized → You understand cluster formation
  2. Service runs across nodes → You understand scheduling
  3. Service survives node failure → You understand high availability
  4. Rolling update works → You understand zero-downtime deployments

Project 3: K3s Kubernetes Cluster (Lightweight Production-Grade)

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Bash
  • Alternative Programming Languages: Go, Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Kubernetes / Container Orchestration
  • Software or Tool: K3s, kubectl, Helm
  • Main Book: “Kubernetes Up and Running” by Brendan Burns

What you’ll build: A production-ready Kubernetes cluster using K3s—a lightweight K8s distribution perfect for edge and home labs. You’ll deploy applications, understand pods, services, deployments, and ingress.

Why it teaches home clusters: Kubernetes is the industry standard for container orchestration. K3s gives you real Kubernetes without the resource overhead. Skills learned here transfer directly to EKS, GKE, AKS, and enterprise Kubernetes.

Core challenges you’ll face:

  • Pod scheduling and lifecycle → maps to fundamental K8s unit
  • Services and networking → maps to how pods communicate
  • Deployments and ReplicaSets → maps to declarative application management
  • Ingress and load balancing → maps to external access

Key Concepts:

  • Kubernetes Architecture: “Kubernetes Up and Running” Chapter 3 - Burns et al.
  • Pods and Containers: “Kubernetes Up and Running” Chapter 5
  • Services: “Kubernetes Up and Running” Chapter 7
  • Deployments: “Kubernetes Up and Running” Chapter 9

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Project 2 (orchestration basics), Linux administration, networking fundamentals. 3 machines recommended.

Real world outcome:

# On control plane node
$ curl -sfL https://get.k3s.io | sh -

# Get join token
$ cat /var/lib/rancher/k3s/server/node-token

# On worker nodes
$ curl -sfL https://get.k3s.io | K3S_URL=https://192.168.1.10:6443 \
    K3S_TOKEN=<token> sh -

# Check cluster
$ kubectl get nodes
NAME     STATUS   ROLES                  AGE
node1    Ready    control-plane,master   5m
node2    Ready    <none>                 2m
node3    Ready    <none>                 2m

# Deploy an application
$ kubectl create deployment nginx --image=nginx --replicas=3
$ kubectl expose deployment nginx --port=80 --type=LoadBalancer
$ kubectl get pods -o wide
NAME                    READY   STATUS    NODE
nginx-abc123-xxx        1/1     Running   node1
nginx-abc123-yyy        1/1     Running   node2
nginx-abc123-zzz        1/1     Running   node3

Implementation Hints:

Kubernetes manifest structure:

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest
        ports:
        - containerPort: 8080
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 8080
  type: ClusterIP

Essential kubectl commands:

kubectl get pods,svc,deploy -A          # See everything
kubectl describe pod <name>             # Debug a pod
kubectl logs <pod> -f                   # Stream logs
kubectl exec -it <pod> -- /bin/sh       # Shell into pod
kubectl apply -f manifest.yaml          # Apply config
kubectl delete -f manifest.yaml         # Delete resources
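
Two quick checks make the scheduling and service-discovery milestones concrete; a sketch assuming the nginx deployment and service created above:

# Scale the deployment and watch pods spread across nodes
kubectl scale deployment nginx --replicas=5
kubectl get pods -l app=nginx -o wide

# Resolve and call the Service by name from a throwaway pod
kubectl run tmp --rm -it --image=busybox:1.36 --restart=Never -- sh
# inside the pod:
#   nslookup nginx              # resolves to the Service's ClusterIP
#   wget -qO- http://nginx      # load-balances across the pods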

Learning milestones:

  1. Cluster formed, kubectl works → You understand K8s architecture
  2. Deployment scales up/down → You understand workload management
  3. Service discovery works → You understand K8s networking
  4. Ingress routes external traffic → You understand edge access

Project 4: Distributed Storage with Longhorn

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Bash
  • Alternative Programming Languages: Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Distributed Storage / Kubernetes
  • Software or Tool: Longhorn, K3s
  • Main Book: “Kubernetes Up and Running” by Brendan Burns

What you’ll build: A distributed block storage system that replicates data across cluster nodes. When a node dies, your data survives. Databases, file storage, and stateful apps all get persistent volumes that are automatically replicated.

Why it teaches home clusters: Single points of failure are the enemy of clusters. Local disk on one node isn’t “clustered storage.” Longhorn teaches you how real distributed storage works—replication, consistency, and failure recovery.

Core challenges you’ll face:

  • Persistent Volume Claims (PVCs) → maps to requesting storage in Kubernetes
  • Storage Classes → maps to different storage tiers/policies
  • Replication factor → maps to data redundancy
  • Backup and restore → maps to disaster recovery

Key Concepts:

  • Kubernetes Storage: “Kubernetes Up and Running” Chapter 13 - Burns et al.
  • PV/PVC/StorageClass: Kubernetes official documentation
  • Distributed Block Storage: Longhorn architecture documentation
  • iSCSI: How Longhorn exposes volumes

Difficulty: Advanced
Time estimate: 1 week
Prerequisites: Project 3 (K3s cluster running). Each node needs some local disk space.

Real world outcome:

# Install Longhorn
$ kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml

# Wait for Longhorn to be ready
$ kubectl -n longhorn-system get pods
NAME                                       READY   STATUS
longhorn-manager-xxxxx                     1/1     Running
longhorn-driver-deployer-xxxxx             1/1     Running

# Access Longhorn UI
$ kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80

# Create a replicated volume for PostgreSQL
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
EOF

# Longhorn dashboard shows:
# - Volume: postgres-data
# - Replicas: 3 (one on each node)
# - Status: Healthy

# Kill a node, watch Longhorn rebuild replica on surviving nodes

Implementation Hints:

Longhorn StorageClass configuration:

# storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"  # 48 hours
  fromBackup: ""
  dataLocality: "best-effort"

Using storage in a deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_PASSWORD
          value: secret           # the image refuses to start without it; use a Secret in real setups
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata   # keep initdb out of the volume root (lost+found)
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: postgres-data
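
Before trusting real data to it, verify that the volume actually outlives its pod; a rough drill, assuming the postgres deployment above:

# Write a row, delete the pod, and confirm the replacement pod still sees the row
kubectl exec deploy/postgres -- psql -U postgres -c "CREATE TABLE probe(id int); INSERT INTO probe VALUES (1);"
kubectl delete pod -l app=postgres
kubectl exec deploy/postgres -- psql -U postgres -c "SELECT * FROM probe;"

# Note which node holds the pod, then repeat the check after rebooting that node
kubectl get pods -l app=postgres -o wide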

Learning milestones:

  1. Longhorn installed and healthy → You understand CSI drivers
  2. PVC provisions automatically → You understand dynamic provisioning
  3. Data survives pod restart → You understand persistent storage
  4. Data survives node failure → You understand distributed replication

Project 5: Load Balancer with MetalLB

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML
  • Alternative Programming Languages: N/A
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Networking / Kubernetes
  • Software or Tool: MetalLB, K3s
  • Main Book: “Kubernetes Networking” by James Strong

What you’ll build: A bare-metal load balancer that gives your Kubernetes services external IP addresses—just like cloud LoadBalancers do. No more NodePort hacks; services get real IPs from your home network.

Why it teaches home clusters: In the cloud, type: LoadBalancer magically gets an IP. On bare metal, you need MetalLB. Understanding how this works demystifies a core Kubernetes networking concept and teaches you about Layer 2/BGP networking.

Core challenges you’ll face:

  • IP address pool management → maps to allocating IPs from your network
  • Layer 2 vs BGP mode → maps to how addresses are advertised
  • Service type LoadBalancer → maps to external access patterns
  • ARP/NDP → maps to how Layer 2 mode works

Key Concepts:

  • Kubernetes Service Types: “Kubernetes Up and Running” Chapter 7
  • Layer 2 Networking: ARP (Address Resolution Protocol)
  • BGP: Border Gateway Protocol basics
  • Load Balancing Algorithms: Round-robin, least connections

Difficulty: Intermediate
Time estimate: Half a day
Prerequisites: Project 3 (K3s cluster), basic networking knowledge (IP addresses, subnets).

Real world outcome:

# Install MetalLB
$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/main/config/manifests/metallb-native.yaml

# Configure IP pool (use unused IPs from your LAN)
$ cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.200-192.168.1.220
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2
  namespace: metallb-system
EOF

# Create a LoadBalancer service
$ kubectl expose deployment nginx --type=LoadBalancer --port=80

$ kubectl get svc nginx
NAME    TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)
nginx   LoadBalancer   10.43.12.34    192.168.1.200    80:31234/TCP

# Access from any machine on your network!
$ curl http://192.168.1.200
Welcome to nginx!

Implementation Hints:

MetalLB configuration options:

# Layer 2 mode (simple, no router config needed)
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: production
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.200-192.168.1.220  # Reserve these in your DHCP

---
# BGP mode (for advanced setups with BGP-capable router)
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgp-advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
  - production
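
In Layer 2 mode exactly one node answers ARP for each external IP; you can see which one. A hedged sketch (the speaker label selector matches the metallb-native manifests, but check your install):

# Which node announced 192.168.1.200? Look for the announcement in the speaker logs
kubectl -n metallb-system logs -l component=speaker --tail=100 | grep 192.168.1.200

# From another machine on the LAN, the ARP cache maps the service IP to that node's MAC
curl -s http://192.168.1.200 >/dev/null
ip neigh | grep 192.168.1.200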

Learning milestones:

  1. MetalLB pods running → You understand the installation
  2. Service gets external IP → You understand IP allocation
  3. External access works → You understand L2 advertisement
  4. Multiple services get unique IPs → You understand pool management

Project 6: Ingress Controller and TLS Certificates

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML
  • Alternative Programming Languages: N/A
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Networking / Security
  • Software or Tool: Traefik (K3s default) or Nginx Ingress, cert-manager
  • Main Book: “Kubernetes Networking” by James Strong

What you’ll build: An ingress controller that routes traffic based on hostnames (app1.home.lab, app2.home.lab), plus automatic TLS certificate management so everything is HTTPS.

Why it teaches home clusters: Real applications need domain-based routing and HTTPS. The Ingress resource is how Kubernetes handles this. cert-manager automates the tedious certificate renewal process—essential for production.

Core challenges you’ll face:

  • Ingress resource configuration → maps to routing rules
  • TLS termination → maps to HTTPS at the edge
  • cert-manager and Let’s Encrypt → maps to automated certificates
  • Local DNS setup → maps to resolving *.home.lab

Key Concepts:

  • Kubernetes Ingress: “Kubernetes Up and Running” Chapter 8
  • TLS/SSL Certificates: How HTTPS works
  • ACME Protocol: How Let’s Encrypt issues certificates
  • DNS Challenge: Proving domain ownership

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Project 3 (K3s cluster), Project 5 (MetalLB). Domain name helpful but not required.

Real world outcome:

# K3s comes with Traefik ingress controller
$ kubectl get pods -n kube-system | grep traefik
traefik-xxxxx    1/1     Running

# Install cert-manager
$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

# Create Ingress with TLS
$ cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80
EOF

# Browser shows:
# 🔒 https://myapp.example.com - Certificate valid!

Implementation Hints:

cert-manager ClusterIssuer:

# For production (real certs from Let's Encrypt)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: traefik

# For testing (self-signed, no rate limits)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned
spec:
  selfSigned: {}

Local DNS with Pi-hole or CoreDNS:

# Add to /etc/hosts or Pi-hole Local DNS:
192.168.1.200  app1.home.lab
192.168.1.200  app2.home.lab
192.168.1.200  grafana.home.lab
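
You can also exercise host-based routing and TLS before any DNS exists by pinning the name to the external IP with curl; a sketch assuming the ingress above and an external IP of 192.168.1.200:

# Send the right Host header and SNI without touching /etc/hosts
curl --resolve myapp.example.com:443:192.168.1.200 https://myapp.example.com -kv

# Watch cert-manager work through issuance
kubectl get certificate,certificaterequest -A
kubectl describe certificate myapp-tls      # shows progress and any ACME errors

Note that the http01 solver only succeeds if Let's Encrypt can reach the hostname from the internet; for purely internal names, use the selfsigned issuer or a DNS-01 solver.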

Learning milestones:

  1. Ingress routes by hostname → You understand L7 routing
  2. Self-signed TLS works → You understand TLS termination
  3. Let’s Encrypt cert auto-issued → You understand ACME
  4. Multiple apps on one IP → You understand virtual hosting

Project 7: Monitoring Stack (Prometheus + Grafana)

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / PromQL
  • Alternative Programming Languages: Python (exporters)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Observability / Monitoring
  • Software or Tool: Prometheus, Grafana, kube-prometheus-stack
  • Main Book: “Prometheus Up and Running” by Brian Brazil

What you’ll build: A complete monitoring stack that collects metrics from all cluster nodes and applications, visualizes them in beautiful dashboards, and can alert you when something goes wrong.

Why it teaches home clusters: You can’t manage what you can’t measure. Prometheus is the standard for Kubernetes monitoring, and PromQL (the Prometheus Query Language) and Grafana dashboarding are essential DevOps skills.

Core challenges you’ll face:

  • Prometheus scraping → maps to metric collection
  • PromQL queries → maps to analyzing time series data
  • Grafana dashboards → maps to visualization
  • Alertmanager → maps to notification on issues

Key Concepts:

  • Prometheus Architecture: “Prometheus Up and Running” Chapter 2 - Brian Brazil
  • PromQL Basics: “Prometheus Up and Running” Chapter 4
  • Grafana Dashboards: Grafana documentation
  • Kubernetes Metrics: kube-state-metrics, node-exporter

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Project 3 (K3s cluster). Understanding of metrics concepts helpful.

Real world outcome:

# Install kube-prometheus-stack via Helm
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm install monitoring prometheus-community/kube-prometheus-stack \
    --namespace monitoring --create-namespace

# Access Grafana
$ kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
# Open http://localhost:3000 (admin/prom-operator)

# Grafana shows:
# - Cluster CPU/Memory usage: 45% / 62%
# - Pod count: 47 running, 0 pending
# - Node status: 3/3 healthy
# - Network I/O: 150 Mbps in, 50 Mbps out
# - Disk usage: 234 GB / 512 GB

# Example PromQL queries:
# CPU usage by pod: sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
# Memory by namespace: sum(container_memory_usage_bytes) by (namespace)

Implementation Hints:

kube-prometheus-stack values customization:

# values.yaml
grafana:
  adminPassword: your-secure-password
  ingress:
    enabled: true
    hosts:
      - grafana.home.lab

prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          resources:
            requests:
              storage: 50Gi

alertmanager:
  config:
    receivers:
    - name: 'slack'
      slack_configs:
      - api_url: 'https://hooks.slack.com/xxx'
        channel: '#alerts'

Essential PromQL queries:

# Node CPU usage
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Pod memory usage
sum(container_memory_usage_bytes{pod!=""}) by (pod, namespace)

# HTTP request rate
sum(rate(http_requests_total[5m])) by (service)

# Error rate
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
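
The alerting milestone needs a rule that can actually fire. With kube-prometheus-stack, custom rules are PrometheusRule objects; a minimal sketch, assuming the Helm release is named monitoring (the chart's default ruleSelector matches that release label):

cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: homelab-node-alerts
  namespace: monitoring
  labels:
    release: monitoring        # must match the chart's rule selector
spec:
  groups:
  - name: homelab.node
    rules:
    - alert: NodeHighCPU
      expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Node {{ $labels.instance }} CPU above 90% for 10 minutes"
EOF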

Learning milestones:

  1. Prometheus scrapes metrics → You understand metric collection
  2. Grafana dashboard shows cluster health → You understand visualization
  3. Custom PromQL query works → You understand querying
  4. Alert fires and notifies you → You understand alerting

Project 8: GitOps with ArgoCD

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Git
  • Alternative Programming Languages: Kustomize, Helm
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: GitOps / CI/CD
  • Software or Tool: ArgoCD, GitHub/GitLab
  • Main Book: “GitOps and Kubernetes” by Billy Yuen

What you’ll build: A GitOps pipeline where your cluster state is defined in Git. Push a change to your repository, and ArgoCD automatically syncs it to your cluster. No more kubectl apply—everything is version-controlled and auditable.

Why it teaches home clusters: GitOps is how modern teams manage infrastructure. “Git as the single source of truth” means you can rebuild your entire cluster from a repository. It’s declarative, auditable, and enables easy rollbacks.

Core challenges you’ll face:

  • Application custom resources → maps to defining apps in ArgoCD
  • Sync policies → maps to automatic vs manual deployment
  • Helm/Kustomize integration → maps to templating and overlays
  • RBAC and multi-tenancy → maps to who can deploy what

Key Concepts:

  • GitOps Principles: “GitOps and Kubernetes” Chapter 1 - Billy Yuen
  • ArgoCD Applications: ArgoCD documentation
  • Kustomize: Kubernetes-native templating
  • Helm Charts: Package manager for Kubernetes

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Project 3 (K3s cluster), Git proficiency, understanding of Kubernetes manifests.

Real world outcome:

# Install ArgoCD
$ kubectl create namespace argocd
$ kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Access ArgoCD UI
$ kubectl port-forward svc/argocd-server -n argocd 8080:443
# Get initial password: kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

# Create an Application pointing to your Git repo
$ cat <<EOF | kubectl apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-homelab
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourusername/homelab-gitops.git
    targetRevision: HEAD
    path: apps
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF

# Now: git push to your repo → ArgoCD syncs automatically!
# ArgoCD UI shows:
# - App Status: Synced ✓
# - Health: Healthy ✓
# - Last Sync: 30 seconds ago

Implementation Hints:

Repository structure for GitOps:

homelab-gitops/
├── apps/                    # Application definitions
│   ├── nginx/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── ingress.yaml
│   ├── postgres/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── pvc.yaml
│   └── kustomization.yaml
├── infrastructure/          # Cluster infrastructure
│   ├── metallb/
│   ├── cert-manager/
│   └── monitoring/
└── README.md
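
The apps/kustomization.yaml in that tree simply lists what should be rendered; a minimal sketch of the file (paths follow the hypothetical layout above):

cat <<'EOF' > apps/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - nginx/deployment.yaml
  - nginx/service.yaml
  - nginx/ingress.yaml
  - postgres/deployment.yaml
  - postgres/service.yaml
  - postgres/pvc.yaml
EOF
git add apps/kustomization.yaml && git commit -m "add apps" && git push
# With automated sync enabled, ArgoCD applies the commit within its polling interval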

ArgoCD Application with Helm:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus
  namespace: argocd
spec:
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: 45.0.0
    helm:
      values: |
        grafana:
          adminPassword: secret
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring

Learning milestones:

  1. ArgoCD syncs from Git → You understand GitOps basics
  2. Auto-sync on git push → You understand continuous deployment
  3. Helm charts deployed via ArgoCD → You understand templating
  4. Full cluster rebuildable from Git → You’ve achieved true GitOps

Project 9: Proxmox Virtualization Platform

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: Bash / Web UI
  • Alternative Programming Languages: Terraform (for automation)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Virtualization / Infrastructure
  • Software or Tool: Proxmox VE
  • Main Book: “Mastering Proxmox” by Wasim Ahmed

What you’ll build: A Proxmox cluster that manages VMs and containers across multiple physical nodes. This becomes the foundation layer for everything else—you can run Kubernetes inside Proxmox VMs for the best of both worlds.

Why it teaches home clusters: Proxmox teaches you virtualization concepts: hypervisors, VM lifecycle, live migration, HA clustering, and storage pools. Running K8s on Proxmox gives you snapshot/backup capabilities and easy recovery.

Core challenges you’ll face:

  • Proxmox cluster formation → maps to quorum and consensus
  • VM templates → maps to golden images
  • Storage configuration → maps to local vs shared storage
  • HA groups → maps to automatic VM failover

Key Concepts:

  • Type 1 vs Type 2 Hypervisors: Proxmox is Type 1 (bare metal)
  • QEMU/KVM: Linux virtualization technology
  • LXC Containers: Lightweight OS-level virtualization
  • Ceph Integration: Distributed storage in Proxmox

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: 2-3 physical machines (or nested virtualization), basic Linux administration.

Real world outcome:

Proxmox Web UI (https://192.168.1.10:8006)

┌─────────────────────────────────────────────────────────────────┐
│ Datacenter: HomeCluster                                         │
├─────────────────────────────────────────────────────────────────┤
│ Nodes:                                                          │
│   ✓ pve1 (192.168.1.10) - 16 CPU, 64GB RAM, 1TB SSD            │
│   ✓ pve2 (192.168.1.11) - 16 CPU, 64GB RAM, 1TB SSD            │
│   ✓ pve3 (192.168.1.12) - 16 CPU, 64GB RAM, 1TB SSD            │
│                                                                 │
│ VMs:                                                            │
│   k8s-control (pve1) - 4 CPU, 8GB RAM - Running                │
│   k8s-worker1 (pve2) - 4 CPU, 8GB RAM - Running                │
│   k8s-worker2 (pve3) - 4 CPU, 8GB RAM - Running                │
│   windows-test (pve1) - 4 CPU, 16GB RAM - Stopped              │
│                                                                 │
│ Storage:                                                        │
│   local-lvm: 234GB / 500GB                                      │
│   ceph-pool: 1.2TB / 3TB (replicated across nodes)              │
└─────────────────────────────────────────────────────────────────┘

Implementation Hints:

Proxmox cluster setup:

# On first node (pve1)
$ pvecm create HomeCluster

# On other nodes (pve2, pve3)
$ pvecm add 192.168.1.10

# Check cluster status
$ pvecm status
Cluster information
-------------------
Name:             HomeCluster
Config Version:   3
Transport:        knet
Secure auth:      on

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pve1 (local)
         2          1 pve2
         3          1 pve3

Creating a VM template:

# Download cloud image
$ wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img

# Create VM
$ qm create 9000 --name ubuntu-template --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0

# Import disk
$ qm importdisk 9000 jammy-server-cloudimg-amd64.img local-lvm

# Attach disk
$ qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0

# Convert to template
$ qm template 9000

# Clone from template
$ qm clone 9000 100 --name k8s-control --full
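
Before cloning, it is common to wire cloud-init into the template so clones come up with SSH keys and an address already set; a hedged sketch (the key path and IPs are assumptions):

# Attach a cloud-init drive and set login/network defaults on the template
qm set 9000 --ide2 local-lvm:cloudinit
qm set 9000 --ciuser ubuntu --sshkeys ~/.ssh/id_ed25519.pub
qm set 9000 --ipconfig0 ip=dhcp

# After cloning: give the clone its own address, grow the small cloud-image disk, start it
qm set 100 --ipconfig0 ip=192.168.1.101/24,gw=192.168.1.1
qm resize 100 scsi0 +30G
qm start 100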

Learning milestones:

  1. Proxmox cluster formed → You understand cluster quorum
  2. VM created and runs → You understand virtualization basics
  3. Template cloning works → You understand golden images
  4. Live migration works → You understand VM mobility

Project 10: Network Segmentation with VLANs

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: N/A (Network configuration)
  • Alternative Programming Languages: Ansible (for automation)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Networking / Security
  • Software or Tool: Managed switch, pfSense/OPNsense
  • Main Book: “Computer Networks” by Tanenbaum

What you’ll build: A properly segmented home network with VLANs separating management traffic, user devices, IoT devices, and cluster traffic. Each VLAN has appropriate firewall rules.

Why it teaches home clusters: Production clusters aren’t on flat networks. VLANs teach you layer 2/3 networking, inter-VLAN routing, and network security. These skills are essential for any infrastructure role.

Core challenges you’ll face:

  • VLAN tagging (802.1Q) → maps to how VLANs work on the wire
  • Inter-VLAN routing → maps to how VLANs communicate
  • Firewall rules → maps to controlling traffic flow
  • Trunk vs access ports → maps to switch configuration

Key Concepts:

  • VLANs: “Computer Networks” Chapter 4 - Tanenbaum
  • 802.1Q Tagging: IEEE standard for VLAN tagging
  • Layer 3 Switching: Routing between VLANs
  • Firewall Zones: Grouping interfaces by trust level

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Managed switch (VLAN-capable), router/firewall (pfSense/OPNsense). Basic networking knowledge.

Real world outcome:

Network Diagram:

                    Internet
                        │
                   ┌────┴────┐
                   │ OPNsense│  (Router/Firewall)
                   │ Firewall│
                   └────┬────┘
                        │ Trunk (all VLANs)
                   ┌────┴────┐
                   │ Managed │
                   │ Switch  │
                   └─┬──┬──┬─┘
                     │  │  │
        ┌────────────┘  │  └────────────┐
        │               │               │
   VLAN 10         VLAN 20         VLAN 30
   Management      Cluster         IoT
   ────────────    ─────────       ──────────
   • OPNsense      • k8s-ctrl      • Cameras
   • Proxmox UI    • k8s-worker1   • Smart bulbs
   • Switch mgmt   • k8s-worker2   • Thermostats
                   • NAS

Firewall Rules:
• VLAN 10 → All VLANs (management access)
• VLAN 20 → Internet, VLAN 30 limited
• VLAN 30 → Internet only (IoT isolated)

Implementation Hints:

OPNsense VLAN configuration:

Interfaces → Other Types → VLAN

VLAN 10: Parent: igb0, Tag: 10, Description: Management
VLAN 20: Parent: igb0, Tag: 20, Description: Cluster
VLAN 30: Parent: igb0, Tag: 30, Description: IoT

Interfaces → Assignments:
  OPT1: vlan10 → MGMT (192.168.10.1/24)
  OPT2: vlan20 → CLUSTER (192.168.20.1/24)
  OPT3: vlan30 → IOT (192.168.30.1/24)

Services → DHCPv4:
  MGMT: 192.168.10.100 - 192.168.10.200
  CLUSTER: 192.168.20.100 - 192.168.20.200
  IOT: 192.168.30.100 - 192.168.30.200

Switch VLAN configuration (example for TP-Link):

Port 1-4:   PVID=10, Untagged=10, Tagged=none     (Management)
Port 5-12:  PVID=20, Untagged=20, Tagged=none     (Cluster)
Port 13-20: PVID=30, Untagged=30, Tagged=none     (IoT)
Port 24:    PVID=1,  Untagged=1,  Tagged=10,20,30 (Trunk to router)
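
To confirm tagging end to end from a Linux host plugged into a trunk port, create a tagged sub-interface and request a lease on it; a sketch (the interface name and DHCP client are assumptions):

# Create a VLAN 20 sub-interface on eth0 and ask the CLUSTER scope for a lease
sudo ip link add link eth0 name eth0.20 type vlan id 20
sudo ip link set eth0.20 up
sudo dhclient -v eth0.20          # should receive a 192.168.20.x address

# Check routing and firewall behaviour between zones
ping -c 3 192.168.20.1            # VLAN gateway reachable
ping -c 3 192.168.30.50           # should be blocked or limited per the IoT rules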

Learning milestones:

  1. VLANs created on switch → You understand 802.1Q
  2. Devices in VLAN get correct IPs → You understand DHCP per VLAN
  3. VLANs can communicate via router → You understand inter-VLAN routing
  4. Firewall blocks unauthorized traffic → You understand security zones

Project 11: Secure Remote Access with Tailscale

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: N/A (Configuration)
  • Alternative Programming Languages: N/A
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Networking / VPN
  • Software or Tool: Tailscale
  • Main Book: N/A (Online documentation)

What you’ll build: Secure access to your entire homelab from anywhere in the world using Tailscale’s WireGuard-based mesh VPN. No port forwarding, no dynamic DNS hassles, no exposed services.

Why it teaches home clusters: Remote access is essential for managing your cluster. Tailscale teaches you about WireGuard, mesh networking, and zero-trust security—all without the complexity of setting up your own VPN server.

Core challenges you’ll face:

  • Mesh VPN topology → maps to peer-to-peer connections
  • Subnet routing → maps to accessing entire home network remotely
  • ACLs → maps to who can access what
  • Exit nodes → maps to routing all traffic through home

Key Concepts:

  • WireGuard Protocol: Modern, fast VPN protocol
  • Mesh Networking: Every device connects to every other
  • NAT Traversal: Connecting through firewalls
  • Zero-Trust: Authenticate before connecting

Difficulty: Beginner
Time estimate: Half a day
Prerequisites: Any device to install Tailscale on. Free account at tailscale.com.

Real world outcome:

# On your cluster nodes
$ curl -fsSL https://tailscale.com/install.sh | sh
$ sudo tailscale up

# Tailscale dashboard shows:
# Machine         IP            Last Seen
# k8s-control     100.64.0.1    Connected
# k8s-worker1     100.64.0.2    Connected
# k8s-worker2     100.64.0.3    Connected
# macbook         100.64.0.4    Connected
# iphone          100.64.0.5    Connected

# From your phone on 4G:
$ ssh 100.64.0.1
Welcome to k8s-control!

# Enable subnet routing (access all 192.168.x.x from anywhere)
$ sudo tailscale up --advertise-routes=192.168.0.0/16

# From anywhere, access your home network:
$ curl http://192.168.1.10:8006  # Proxmox UI works!

Implementation Hints:

Tailscale setup on Kubernetes nodes:

# Install on each node
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# On one node, enable subnet routing
sudo tailscale up --advertise-routes=10.43.0.0/16,10.42.0.0/16

# In Tailscale admin console, approve the routes

Tailscale ACL configuration (in admin console):

{
  "acls": [
    {
      "action": "accept",
      "src": ["group:admin"],
      "dst": ["*:*"]
    },
    {
      "action": "accept",
      "src": ["group:developers"],
      "dst": ["tag:k8s:22", "tag:k8s:6443"]
    }
  ],
  "tagOwners": {
    "tag:k8s": ["group:admin"]
  }
}
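
The exit-node challenge takes two commands; a sketch using standard tailscale flags:

# On a homelab node: offer itself as an exit node (needs IP forwarding enabled, and
# the routes/exit-node offer must be approved in the admin console)
sudo tailscale up --advertise-exit-node --advertise-routes=192.168.0.0/16

# On a laptop on untrusted Wi-Fi: send all traffic home
sudo tailscale up --exit-node=100.64.0.1 --exit-node-allow-lan-access
tailscale status                  # shows which exit node is active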

Learning milestones:

  1. Two devices connected via Tailscale → You understand mesh VPN
  2. SSH to homelab from phone → You understand remote access
  3. Subnet routing exposes home network → You understand routing
  4. ACLs restrict access by user → You understand zero-trust

Project 12: High Availability Kubernetes (Multi-Master)

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Bash
  • Alternative Programming Languages: Ansible
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 4: Expert
  • Knowledge Area: Kubernetes / HA
  • Software or Tool: K3s with embedded etcd, kube-vip
  • Main Book: “Kubernetes Up and Running” by Brendan Burns

What you’ll build: A Kubernetes cluster with 3 control plane nodes, so the cluster survives the loss of any single control plane node. etcd runs in HA mode, and a virtual IP floats between masters.

Why it teaches home clusters: Single control plane = single point of failure. Understanding HA patterns (quorum, leader election, virtual IPs) is essential for production systems. This is how real Kubernetes clusters are built.

Core challenges you’ll face:

  • etcd quorum → maps to why 3 nodes minimum
  • Virtual IP failover → maps to stable API endpoint
  • Leader election → maps to only one active leader
  • Split-brain prevention → maps to network partition handling

Key Concepts:

  • etcd Consensus: “Kubernetes Up and Running” Chapter 3
  • Raft Protocol: How etcd achieves consensus
  • Virtual IP: kube-vip or keepalived
  • Quorum: N/2 + 1 nodes must agree

Difficulty: Expert
Time estimate: 1-2 weeks
Prerequisites: Project 3 (K3s basics), networking knowledge. 3+ machines for the control plane.

Real world outcome:

# K3s HA cluster with 3 control planes + 2 workers
$ kubectl get nodes
NAME          STATUS   ROLES                       AGE
k8s-ctrl-1    Ready    control-plane,etcd,master   5d
k8s-ctrl-2    Ready    control-plane,etcd,master   5d
k8s-ctrl-3    Ready    control-plane,etcd,master   5d
k8s-worker-1  Ready    <none>                      5d
k8s-worker-2  Ready    <none>                      5d

# Virtual IP for API
$ kubectl cluster-info
Kubernetes control plane is running at https://192.168.1.100:6443

# Kill ctrl-1, cluster keeps running
$ ssh ctrl-1 "sudo poweroff"

# After 30 seconds:
$ kubectl get nodes
NAME          STATUS     ROLES                       AGE
k8s-ctrl-1    NotReady   control-plane,etcd,master   5d
k8s-ctrl-2    Ready      control-plane,etcd,master   5d  # Now leader
k8s-ctrl-3    Ready      control-plane,etcd,master   5d
k8s-worker-1  Ready      <none>                      5d
k8s-worker-2  Ready      <none>                      5d

# VIP moved to ctrl-2, kubectl still works!

Implementation Hints:

K3s HA with embedded etcd:

# First control plane node
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --tls-san 192.168.1.100 \
  --disable servicelb

# Get token
cat /var/lib/rancher/k3s/server/node-token

# Additional control plane nodes
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://192.168.1.101:6443 \
  --token <TOKEN> \
  --tls-san 192.168.1.100

# Workers
curl -sfL https://get.k3s.io | sh -s - agent \
  --server https://192.168.1.100:6443 \
  --token <TOKEN>

kube-vip for virtual IP:

apiVersion: v1
kind: Pod
metadata:
  name: kube-vip
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-vip
    image: ghcr.io/kube-vip/kube-vip:latest
    args:
    - manager
    env:
    - name: vip_arp
      value: "true"
    - name: vip_interface
      value: eth0
    - name: address
      value: "192.168.1.100"
    - name: port
      value: "6443"
    - name: vip_leaderelection
      value: "true"
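
A few checks that make the failover milestones concrete; a hedged sketch (the VIP and node names follow the example above, and the etcd node label is what recent K3s releases apply):

# Which control-plane node currently holds the virtual IP?
ssh k8s-ctrl-1 "ip addr show eth0 | grep 192.168.1.100"

# etcd members and API server health
kubectl get nodes -l node-role.kubernetes.io/etcd=true
kubectl get --raw '/readyz?verbose' | tail -n 5

# Take an etcd snapshot before you start killing nodes
sudo k3s etcd-snapshot save --name pre-failover-test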

Learning milestones:

  1. 3-node control plane running → You understand HA architecture
  2. etcd cluster healthy → You understand consensus
  3. VIP fails over on node loss → You understand virtual IPs
  4. Cluster survives any single failure → You’ve achieved true HA

Project 13: CI/CD Pipeline with Self-Hosted Runners

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Docker
  • Alternative Programming Languages: Go, Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: CI/CD / DevOps
  • Software or Tool: GitHub Actions + self-hosted runner, or Drone CI
  • Main Book: “Continuous Delivery” by Jez Humble

What you’ll build: A self-hosted CI/CD system that builds Docker images, runs tests, and deploys to your Kubernetes cluster—all running on your homelab instead of paying for cloud CI minutes.

Why it teaches home clusters: CI/CD is the heartbeat of modern software delivery. Running it yourself teaches you about build pipelines, artifact management, and deployment automation. Plus, self-hosted runners have access to your local cluster.

Core challenges you’ll face:

  • Runner registration → maps to connecting to GitHub/GitLab
  • Docker-in-Docker builds → maps to building images in CI
  • Secrets management → maps to secure credential handling
  • Deployment triggers → maps to automatic deployments

Key Concepts:

  • CI/CD Principles: “Continuous Delivery” Chapter 1 - Jez Humble
  • GitHub Actions Syntax: GitHub documentation
  • Docker BuildKit: Modern Docker image building
  • Kubernetes Deployments: Rolling updates

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Project 3 (K3s cluster), Project 8 (GitOps helpful). Git and Docker proficiency.

Real world outcome:

# .github/workflows/deploy.yml
name: Build and Deploy
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: self-hosted  # Your homelab runner!
    steps:
      - uses: actions/checkout@v4

      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .

      - name: Push to local registry
        run: |
          docker tag myapp:${{ github.sha }} registry.home.lab/myapp:${{ github.sha }}
          docker push registry.home.lab/myapp:${{ github.sha }}

      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/myapp \
            myapp=registry.home.lab/myapp:${{ github.sha }}

GitHub Actions log:
✓ Build Docker image (45s)
✓ Push to local registry (12s)
✓ Deploy to Kubernetes (8s)

Total: 1m 5s on YOUR hardware, $0 cloud costs

Implementation Hints:

GitHub self-hosted runner setup:

# On a node in your cluster
mkdir actions-runner && cd actions-runner
curl -o actions-runner-linux-x64.tar.gz -L https://github.com/actions/runner/releases/download/v2.xxx/actions-runner-linux-x64-2.xxx.tar.gz
tar xzf actions-runner-linux-x64.tar.gz

# Configure (get token from GitHub repo settings)
./config.sh --url https://github.com/USERNAME/REPO --token <TOKEN>

# Run as service
sudo ./svc.sh install
sudo ./svc.sh start

Local container registry:

# Deploy a local registry in K8s
apiVersion: apps/v1
kind: Deployment
metadata:
  name: registry
spec:
  replicas: 1
  selector:
    matchLabels:
      app: registry
  template:
    metadata:
      labels:
        app: registry
    spec:
      containers:
      - name: registry
        image: registry:2
        ports:
        - containerPort: 5000
        volumeMounts:
        - name: data
          mountPath: /var/lib/registry
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: registry-data
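
For nodes to pull from registry.home.lab over plain HTTP, K3s has to be told the registry is insecure; a hedged sketch of /etc/rancher/k3s/registries.yaml, assuming registry.home.lab resolves to the registry Service's LoadBalancer IP on port 5000:

# On every K3s node, then restart the k3s (server) or k3s-agent (worker) service
sudo tee /etc/rancher/k3s/registries.yaml <<'EOF'
mirrors:
  "registry.home.lab":
    endpoint:
      - "http://registry.home.lab:5000"
EOF
sudo systemctl restart k3s 2>/dev/null || sudo systemctl restart k3s-agent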

Learning milestones:

  1. Self-hosted runner connected → You understand runner architecture
  2. Build runs on your hardware → You understand self-hosted benefits
  3. Image pushed to local registry → You understand artifact storage
  4. Auto-deploy on git push → You understand full CI/CD

Project 14: Backup and Disaster Recovery

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Bash
  • Alternative Programming Languages: Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Backup / DR
  • Software or Tool: Velero, Restic, Proxmox Backup Server
  • Main Book: “Site Reliability Engineering” by Google

What you’ll build: A comprehensive backup strategy covering Kubernetes resources, persistent volumes, and VM snapshots. You’ll practice restoring from backup to prove it works.

Why it teaches home clusters: “Backup that isn’t tested isn’t backup.” Learning disaster recovery means understanding what state needs to be saved, how to save it, and how to restore it. This is critical for any production system.

Core challenges you’ll face:

  • What to backup → maps to identifying critical state
  • Velero for K8s → maps to cluster-aware backups
  • PV snapshots → maps to data backup
  • Recovery testing → maps to validating backups

Key Concepts:

  • RTO/RPO: Recovery Time/Point Objectives
  • 3-2-1 Backup Rule: 3 copies, 2 media types, 1 offsite
  • Velero Architecture: Velero documentation
  • Snapshot vs Backup: Point-in-time vs full copy

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Projects 3-4 (K3s with storage), storage for backups (NAS, S3, etc.).

Real world outcome:

# Install Velero with S3-compatible backend
$ velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.5.0 \
    --bucket homelab-backups \
    --secret-file ./credentials-velero \
    --backup-location-config region=us-east-1,s3Url=http://minio.home.lab:9000

# Create a backup
$ velero backup create production-backup \
    --include-namespaces production \
    --snapshot-volumes

Backup request "production-backup" submitted successfully.

# Simulate disaster (delete namespace)
$ kubectl delete namespace production
namespace "production" deleted

# Restore from backup
$ velero restore create --from-backup production-backup

Restore request "production-backup-20231215" submitted successfully.

# Everything is back!
$ kubectl get all -n production
NAME                          READY   STATUS
pod/myapp-abc123-xxx          1/1     Running
pod/postgres-xyz789-yyy       1/1     Running

Implementation Hints:

Velero scheduled backup:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 3 * * *"  # 3 AM daily
  template:
    includedNamespaces:
    - production
    - monitoring
    snapshotVolumes: true
    ttl: 720h  # Keep for 30 days
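
Restores can be declared as objects as well, which is handy for scripted recovery drills. A sketch (the backup name is illustrative; Velero names Schedule-created backups <schedule>-<timestamp>):

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: drill-production                     # illustrative name
  namespace: velero
spec:
  backupName: daily-backup-20231215030000    # a backup produced by the Schedule above
  includedNamespaces:
  - production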

Proxmox backup with PBS:

# Add Proxmox Backup Server as storage
pvesm add pbs pbs-storage \
    --server 192.168.1.50 \
    --datastore backups \
    --username backup@pbs \
    --password secret

# Schedule VM backups
# In Proxmox UI: Datacenter → Backup → Add
# Schedule: daily, 02:00
# Selection: All VMs
# Storage: pbs-storage

Learning milestones:

  1. Velero installed and connected to storage → You understand backup infra
  2. Manual backup succeeds → You understand backup process
  3. Restore works → You understand recovery
  4. Scheduled backups running → You understand automation

Project 15: Self-Hosted Application Platform

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML
  • Alternative Programming Languages: Various per app
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Self-Hosting / Applications
  • Software or Tool: Various (see list below)
  • Main Book: N/A (App-specific docs)

What you’ll build: A suite of self-hosted applications running on your cluster: media server (Jellyfin), password manager (Vaultwarden), file sync (Nextcloud), ad-blocking DNS (Pi-hole), home automation (Home Assistant).

Why it teaches home clusters: This is why many people build homelabs—to run their own services. You’ll learn to deploy stateful applications, manage persistent data, configure networking, and understand real-world application requirements.

Core challenges you’ll face:

  • Stateful application management → maps to databases, file storage
  • Resource allocation → maps to limits and requests
  • External access → maps to ingress, DNS
  • Data persistence → maps to PVCs, backups

Popular Self-Hosted Apps:

App              Purpose               Complexity
Jellyfin         Media streaming       Medium
Vaultwarden      Password manager      Low
Nextcloud        File sync/office      High
Pi-hole          Ad-blocking DNS       Low
Home Assistant   Home automation       Medium
Paperless-ngx    Document management   Medium
Gitea            Git hosting           Low
Miniflux         RSS reader            Low

Difficulty: Intermediate. Time estimate: Ongoing (add apps as needed). Prerequisites: Projects 3-6 (K3s with ingress and storage).

Real world outcome:

Your home dashboard (https://home.yourdomain.com):

┌─────────────────────────────────────────────────────────────────┐
│  🏠 Homelab Dashboard                                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  📺 Jellyfin        → jellyfin.home.lab      [Healthy]         │
│  🔐 Vaultwarden     → vault.home.lab         [Healthy]         │
│  📁 Nextcloud       → cloud.home.lab         [Healthy]         │
│  🛡️ Pi-hole         → pihole.home.lab        [Healthy]         │
│  🏠 Home Assistant  → hass.home.lab          [Healthy]         │
│  📄 Paperless       → docs.home.lab          [Healthy]         │
│  📊 Grafana         → grafana.home.lab       [Healthy]         │
│                                                                 │
│  Cluster Status: 3/3 nodes healthy                              │
│  Storage: 234 GB used / 1 TB available                          │
│  Network: 50 Mbps avg throughput                                │
└─────────────────────────────────────────────────────────────────┘

Implementation Hints:

Jellyfin deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jellyfin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jellyfin
  template:
    metadata:
      labels:
        app: jellyfin
    spec:
      containers:
      - name: jellyfin
        image: jellyfin/jellyfin:latest
        ports:
        - containerPort: 8096
        volumeMounts:
        - name: config
          mountPath: /config
        - name: media
          mountPath: /media
        resources:
          limits:
            memory: "4Gi"
            cpu: "2"
      volumes:
      - name: config
        persistentVolumeClaim:
          claimName: jellyfin-config
      - name: media
        nfs:
          server: 192.168.1.50
          path: /media
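
The Deployment covers scheduling and storage, but the jellyfin.home.lab entry from the dashboard also needs a Service and an Ingress. A minimal sketch, assuming the ingress controller and internal DNS from Project 6 (hostname and service port are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: jellyfin
spec:
  selector:
    app: jellyfin
  ports:
  - port: 80
    targetPort: 8096       # Jellyfin's web port from the Deployment
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jellyfin
spec:
  rules:
  - host: jellyfin.home.lab
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: jellyfin
            port:
              number: 80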

Learning milestones:

  1. First app deployed and accessible → You understand the pattern
  2. 5+ apps running smoothly → You understand resource management
  3. All apps backed up → You understand data protection
  4. Family uses your services → You’ve built real infrastructure

Project Comparison Table

Project                    Difficulty     Time          Depth of Understanding   Fun Factor
1. Docker Single Node      Beginner       Weekend       ⭐⭐⭐                      ⭐⭐⭐
2. Docker Swarm            Intermediate   Weekend-1wk   ⭐⭐⭐⭐                     ⭐⭐⭐⭐
3. K3s Kubernetes          Advanced       1-2 weeks     ⭐⭐⭐⭐⭐                    ⭐⭐⭐⭐⭐
4. Longhorn Storage        Advanced       1 week        ⭐⭐⭐⭐                     ⭐⭐⭐
5. MetalLB                 Intermediate   Half day      ⭐⭐⭐                      ⭐⭐⭐
6. Ingress + TLS           Intermediate   1 week        ⭐⭐⭐⭐                     ⭐⭐⭐⭐
7. Prometheus + Grafana    Intermediate   1 week        ⭐⭐⭐⭐                     ⭐⭐⭐⭐⭐
8. GitOps with ArgoCD      Advanced       1-2 weeks     ⭐⭐⭐⭐⭐                    ⭐⭐⭐⭐⭐
9. Proxmox                 Advanced       1-2 weeks     ⭐⭐⭐⭐⭐                    ⭐⭐⭐⭐
10. VLANs                  Advanced       1-2 weeks     ⭐⭐⭐⭐                     ⭐⭐⭐
11. Tailscale              Beginner       Half day      ⭐⭐⭐                      ⭐⭐⭐⭐⭐
12. HA Kubernetes          Expert         1-2 weeks     ⭐⭐⭐⭐⭐                    ⭐⭐⭐⭐
13. CI/CD                  Intermediate   1 week        ⭐⭐⭐⭐                     ⭐⭐⭐⭐
14. Backup & DR            Advanced       1-2 weeks     ⭐⭐⭐⭐⭐                    ⭐⭐⭐
15. Self-Hosted Apps       Intermediate   Ongoing       ⭐⭐⭐⭐                     ⭐⭐⭐⭐⭐

If you’re starting from scratch:

  1. Project 1: Docker basics → Get comfortable with containers
  2. Project 2: Docker Swarm → Your first real cluster
  3. Project 11: Tailscale → Easy remote access
  4. Project 3: K3s Kubernetes → The main event
  5. Project 5: MetalLB → Real IPs for services
  6. Project 6: Ingress + TLS → HTTPS for everything
  7. Project 7: Monitoring → See what’s happening
  8. Continue with storage, GitOps, etc.

If you want the “serious infrastructure” path:

  1. Project 9: Proxmox → Virtualization foundation
  2. Project 10: VLANs → Proper network segmentation
  3. Project 3: K3s (on Proxmox VMs)
  4. Project 12: HA Kubernetes → Production-grade
  5. Project 4: Longhorn → Distributed storage
  6. Project 8: GitOps → Infrastructure as code
  7. Project 14: Backup & DR → Protect everything

If you just want to self-host apps quickly:

  1. Project 1: Docker basics
  2. Project 3: K3s (single node is fine)
  3. Project 6: Ingress
  4. Project 15: Self-hosted apps → Start deploying!
  5. Project 11: Tailscale → Access from anywhere

Final Capstone Project: Production-Grade Homelab

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Terraform / Ansible
  • Alternative Programming Languages: Go, Python
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 5: Master
  • Knowledge Area: Full-Stack Infrastructure
  • Software or Tool: All of the above
  • Main Book: “Site Reliability Engineering” by Google

What you’ll build: A complete, production-grade homelab that combines everything:

  • Infrastructure: Proxmox cluster with HA
  • Orchestration: HA Kubernetes (K3s) on Proxmox VMs
  • Storage: Longhorn or Ceph for distributed storage
  • Networking: VLANs, MetalLB, Ingress, TLS
  • Security: Tailscale, proper firewall rules, secrets management
  • Observability: Prometheus, Grafana, Loki for logs
  • GitOps: ArgoCD managing all deployments
  • CI/CD: Self-hosted runners building and deploying
  • Backup: Velero + Proxmox Backup Server
  • Applications: Full suite of self-hosted services

Why this is the ultimate homelab project: This is a mini cloud platform. Companies pay millions for infrastructure like this. You’ll have something that rivals a small startup’s infrastructure, running in your home, managed as code, and fully automated.

Real world outcome:

Your Infrastructure as Code Repository:

homelab-infrastructure/
├── terraform/
│   └── proxmox/              # VM provisioning
├── ansible/
│   └── playbooks/            # OS configuration
├── kubernetes/
│   ├── infrastructure/       # MetalLB, cert-manager, etc.
│   ├── monitoring/           # Prometheus stack
│   ├── applications/         # All your apps
│   └── argocd/              # ArgoCD itself
├── docs/
│   ├── architecture.md
│   └── runbooks/             # Operational procedures
└── README.md

$ git push origin main
# → Terraform provisions VMs
# → Ansible configures nodes
# → K3s cluster forms
# → ArgoCD syncs all applications
# → Full stack running in 30 minutes from bare metal
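
The glue that turns a git push into a running stack is typically an ArgoCD "app of apps" pointed at the kubernetes/ directory. A minimal sketch (the repository URL is a placeholder for your own repo):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: homelab
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/USERNAME/homelab-infrastructure   # placeholder
    targetRevision: main
    path: kubernetes                # the directory shown in the tree above
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true                   # remove resources deleted from Git
      selfHeal: true                # revert manual drift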

Summary

#       Project                               Main Language
1       Docker on Single Node                 Bash / YAML
2       Docker Swarm Cluster                  Bash / YAML
3       K3s Kubernetes Cluster                YAML / Bash
4       Distributed Storage (Longhorn)        YAML / Bash
5       Load Balancer (MetalLB)               YAML
6       Ingress + TLS Certificates            YAML
7       Monitoring (Prometheus + Grafana)     YAML / PromQL
8       GitOps (ArgoCD)                       YAML / Git
9       Proxmox Virtualization                Bash / Web UI
10      VLANs and Network Segmentation        Network config
11      Tailscale VPN                         Configuration
12      HA Kubernetes (Multi-Master)          YAML / Bash
13      CI/CD Pipeline                        YAML / Docker
14      Backup and Disaster Recovery          YAML / Bash
15      Self-Hosted Applications              YAML
Final   Production-Grade Homelab (Capstone)   All of the above

Key Resources Referenced

Books

  • “Docker Deep Dive” by Nigel Poulton
  • “Kubernetes: Up and Running” by Brendan Burns et al.
  • “Prometheus: Up & Running” by Brian Brazil
  • “GitOps and Kubernetes” by Billy Yuen
  • “Computer Networks” by Andrew Tanenbaum
  • “Continuous Delivery” by Jez Humble
  • “Site Reliability Engineering” by Google

Hardware