
Learn Home Clusters: From Single Server to Distributed Infrastructure

Goal: Deeply understand home clusters (homelabs)—what they are, why they’re useful, how they differ from single servers, and how to build increasingly sophisticated distributed systems at home. You’ll progress from running your first container to building a production-grade Kubernetes cluster with GitOps, monitoring, and self-healing.


Why Home Clusters Matter

A home cluster (or homelab) is a collection of computers in your home that work together as a unified system. Instead of one big server doing everything, you have multiple machines sharing the load, providing redundancy, and teaching you the skills that power modern cloud infrastructure.

After completing these projects, you will:

  • Understand the difference between a single server and a cluster
  • Know when to use VMs vs. containers vs. orchestration
  • Build and manage your own Kubernetes cluster
  • Implement distributed storage that survives disk failures
  • Set up monitoring, alerting, and self-healing infrastructure
  • Deploy applications using GitOps (infrastructure as code)
  • Understand networking concepts like VLANs, load balancing, and VPNs
  • Have skills directly transferable to cloud platforms (AWS, GCP, Azure)

Core Concept Analysis

What is a Cluster?

Single Server                          Cluster
┌─────────────────┐                    ┌─────────────────┐
│                 │                    │    Node 1       │
│   All services  │                    │  (control plane)│
│   All storage   │                    └────────┬────────┘
│   Single point  │                             │
│   of failure    │                    ┌────────┴────────┐
│                 │                    │                 │
└─────────────────┘           ┌────────┴──────┐  ┌───────┴───────┐
                              │    Node 2     │  │    Node 3     │
                              │   (worker)    │  │   (worker)    │
                              └───────────────┘  └───────────────┘

If server dies,               If node dies, workloads move
everything is gone            to remaining nodes automatically

The Stack: From Hardware to Applications

┌─────────────────────────────────────────────────────────────────────────┐
│                          YOUR APPLICATIONS                              │
│            (Websites, Databases, Media Servers, Home Automation)        │
├─────────────────────────────────────────────────────────────────────────┤
│                      CONTAINER ORCHESTRATION                            │
│              (Kubernetes / K3s / Docker Swarm / Nomad)                  │
│        Schedules containers, handles scaling, self-healing             │
├─────────────────────────────────────────────────────────────────────────┤
│                      CONTAINER RUNTIME                                  │
│                    (Docker / containerd / Podman)                       │
│              Runs isolated containers from images                       │
├─────────────────────────────────────────────────────────────────────────┤
│                      VIRTUALIZATION (optional)                          │
│                    (Proxmox / VMware / Hyper-V)                         │
│           VMs provide isolation and easy snapshotting                   │
├─────────────────────────────────────────────────────────────────────────┤
│                      OPERATING SYSTEM                                   │
│              (Ubuntu Server / Debian / Fedora / Talos)                  │
├─────────────────────────────────────────────────────────────────────────┤
│                         HARDWARE                                        │
│    (Raspberry Pi / Intel N100 Mini PC / Refurbished Dell Optiplex)     │
└─────────────────────────────────────────────────────────────────────────┘

Why Build a Home Cluster?

Reason              Description
Learning            Hands-on experience with cloud-native technologies without cloud costs
Career              Kubernetes, Docker, and DevOps skills are highly sought after
Self-hosting        Run your own services (media, email, storage) with privacy
High availability   Services survive individual machine failures
Cost savings        Learn on $200-500 hardware vs. $100s/month in cloud bills
Experimentation     Break things freely and rebuild in minutes with automation

Cluster Types Compared

Type              Complexity   Use Case                             Learning Value
Docker Compose    ★☆☆☆☆        Single server, multiple containers   Container basics
Docker Swarm      ★★☆☆☆        Simple multi-node clustering         HA basics, easy setup
K3s / MicroK8s    ★★★☆☆        Lightweight Kubernetes               Production K8s skills
Full Kubernetes   ★★★★☆        Complete K8s experience              Enterprise-grade skills
Proxmox + K8s     ★★★★★        VMs + Containers                     Full infrastructure

Proxmox vs Kubernetes: When to Use What

┌─────────────────────────────────────────────────────────────────┐
│                         PROXMOX                                 │
│                   (Virtualization Layer)                        │
│                                                                 │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐           │
│  │  VM 1   │  │  VM 2   │  │  VM 3   │  │  VM 4   │           │
│  │ K8s     │  │ K8s     │  │ K8s     │  │ Windows │           │
│  │ Control │  │ Worker  │  │ Worker  │  │ for     │           │
│  │ Plane   │  │ Node 1  │  │ Node 2  │  │ Testing │           │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘           │
└─────────────────────────────────────────────────────────────────┘

Proxmox handles:                Kubernetes handles:
- VM lifecycle                  - Container scheduling
- Snapshots & backups          - Auto-scaling
- Resource allocation          - Service discovery
- HA for VMs                   - Rolling deployments
- Storage management           - Self-healing containers

Best practice: Run Kubernetes inside Proxmox VMs. You get the best of both worlds—VM-level isolation and easy recovery (snapshots) with container-level efficiency and orchestration.


Hardware Options for Home Clusters

Budget Recommendations (2025)

Option                 Cost         Nodes   Best For
Raspberry Pi 5 (4GB)   ~$60 each    3-5     Learning, low power, ARM experience
Intel N100 Mini PCs    ~$150 each   3-4     Best value, 6W TDP, silent
Refurbished Optiplex   ~$100 each   3-4     More RAM/CPU, slightly loud
Minisforum MS-01       ~$600 each   2-3     10GbE networking, serious clusters

Starter Cluster: 3-Node Raspberry Pi

Total Cost: ~$200-250

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│  Pi 5 (8GB)     │     │  Pi 5 (4GB)     │     │  Pi 5 (4GB)     │
│  Control Plane  │     │  Worker 1       │     │  Worker 2       │
│                 │     │                 │     │                 │
│  + 64GB SD      │     │  + 64GB SD      │     │  + 64GB SD      │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                         ┌───────┴───────┐
                         │   Gigabit     │
                         │   Switch      │
                         └───────────────┘

Recommended Cluster: 3-Node Intel N100 Mini PC

Total Cost: ~$450-600

┌─────────────────┐     ┌─────────────────┐     ┌─────────────────┐
│ N100 Mini PC    │     │ N100 Mini PC    │     │ N100 Mini PC    │
│ 16GB RAM        │     │ 16GB RAM        │     │ 16GB RAM        │
│ 512GB NVMe      │     │ 512GB NVMe      │     │ 512GB NVMe      │
│ 2x 2.5GbE       │     │ 2x 2.5GbE       │     │ 2x 2.5GbE       │
└────────┬────────┘     └────────┬────────┘     └────────┬────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                         ┌───────┴───────┐
                         │  2.5GbE       │
                         │  Switch       │
                         └───────────────┘

Advantages:
- x86 architecture (more compatible software)
- 16-32GB RAM per node
- NVMe storage (fast)
- Low power (~6W idle)
- Silent operation

Project List

Projects are ordered from fundamental understanding to advanced implementations.


Project 1: Docker on a Single Node (The Foundation)

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: Bash / YAML
  • Alternative Programming Languages: Python (for automation)
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Containers / Docker
  • Software or Tool: Docker, Docker Compose
  • Main Book: “Docker Deep Dive” by Nigel Poulton

What you’ll build: A single server running multiple containerized applications (web server, database, reverse proxy) using Docker and Docker Compose. This is the foundation for all cluster work.

Why it teaches home clusters: Before you can orchestrate containers across multiple machines, you need to understand what containers are and how they work on a single machine. Docker Compose is the “single-node orchestrator” that introduces you to declarative configuration.

Core challenges you’ll face:

  • Container networking → maps to how containers talk to each other
  • Volume management → maps to persistent data in ephemeral containers
  • Image building → maps to creating custom container images
  • Compose file syntax → maps to declarative infrastructure

Key Concepts:

  • Container vs VM: “Docker Deep Dive” Chapter 4 - Nigel Poulton
  • Docker Networking: “Docker Deep Dive” Chapter 11
  • Docker Volumes: “Docker Deep Dive” Chapter 10
  • Compose File Reference: Docker official documentation

Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic Linux command line, understanding of web servers. No prior container experience needed.

Real world outcome:

$ docker compose up -d

[+] Running 4/4
 ✔ Network homelab_default       Created
 ✔ Container homelab-db-1        Started
 ✔ Container homelab-app-1       Started
 ✔ Container homelab-nginx-1     Started

$ curl http://localhost
Welcome to my homelab!

$ docker compose ps
NAME                STATUS      PORTS
homelab-db-1        running     5432/tcp
homelab-app-1       running     8080/tcp
homelab-nginx-1     running     0.0.0.0:80->80/tcp

Implementation Hints:

Docker Compose file structure:

# docker-compose.yml
version: '3.8'

services:
  nginx:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - app

  app:
    build: ./app
    environment:
      - DATABASE_URL=postgres://db:5432/myapp

  db:
    image: postgres:15
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=secret

volumes:
  postgres_data:

Questions to answer as you build:

  • What happens to data when a container restarts?
  • How do containers find each other by name?
  • Why use Alpine-based images?
  • What’s the difference between build: and image:?
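
A few quick experiments answer these empirically; a rough sketch, assuming the Compose file above is running (the probe file and exact image tags are just examples):

# Data persistence: write a marker into the named volume, recreate the container, check it survived
docker compose exec db sh -c 'echo hello > /var/lib/postgresql/data/probe.txt'
docker compose up -d --force-recreate db
docker compose exec db cat /var/lib/postgresql/data/probe.txt   # still prints "hello"

# Service discovery: containers on the default Compose network resolve each other by service name
docker compose exec db getent hosts app

# Image size: compare an Alpine-based image with the Debian-based default
docker images --format '{{.Repository}}:{{.Tag}}  {{.Size}}' nginx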

Learning milestones:

  1. “Hello World” container runs → You understand container basics
  2. Multi-container app works → You understand networking
  3. Data persists across restarts → You understand volumes
  4. You can rebuild from scratch in one command → You understand declarative config

Project 2: Docker Swarm Cluster (Your First Real Cluster)

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: Bash / YAML
  • Alternative Programming Languages: Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Container Orchestration
  • Software or Tool: Docker Swarm, Portainer
  • Main Book: “Docker Deep Dive” by Nigel Poulton

What you’ll build: A 3-node Docker Swarm cluster where containers can run on any node, services are load-balanced automatically, and if a node dies, containers are rescheduled to surviving nodes.

Why it teaches home clusters: Docker Swarm is the gentlest introduction to container orchestration. One command (docker swarm init) and you have a cluster. It teaches the core concepts—managers, workers, services, replicas—without Kubernetes complexity.

Core challenges you’ll face:

  • Manager vs worker nodes → maps to control plane vs data plane
  • Service replication → maps to high availability
  • Overlay networking → maps to cross-node container communication
  • Stack deployments → maps to multi-service applications

Key Concepts:

  • Swarm Mode: “Docker Deep Dive” Chapter 14 - Nigel Poulton
  • Overlay Networks: “Docker Deep Dive” Chapter 11
  • Service Discovery: Built into Swarm via DNS
  • Rolling Updates: “Docker Deep Dive” Chapter 14

Difficulty: Intermediate
Time estimate: Weekend to 1 week
Prerequisites: Project 1 (Docker basics). 2-3 machines (physical or VMs) with Docker installed.

Real world outcome:

# On node 1 (manager)
$ docker swarm init --advertise-addr 192.168.1.10
Swarm initialized: current node is now a manager.
To add a worker: docker swarm join --token SWMTKN-xxx 192.168.1.10:2377

# On nodes 2 and 3
$ docker swarm join --token SWMTKN-xxx 192.168.1.10:2377
This node joined a swarm as a worker.

# Back on manager - deploy a replicated service
$ docker service create --name web --replicas 3 -p 80:80 nginx
3/3: Running

$ docker service ps web
ID         NAME    NODE     STATE
abc123     web.1   node1    Running
def456     web.2   node2    Running
ghi789     web.3   node3    Running

# Kill node3, watch containers reschedule
$ docker service ps web
ID         NAME        NODE     STATE
abc123     web.1       node1    Running
def456     web.2       node2    Running
jkl012     web.3       node1    Running  # Rescheduled!

Implementation Hints:

Swarm stack file (similar to Compose, but for Swarm):

# stack.yml
version: '3.8'

services:
  web:
    image: nginx:alpine
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
      restart_policy:
        condition: on-failure
    ports:
      - "80:80"
    networks:
      - webnet

  visualizer:
    image: dockersamples/visualizer
    ports:
      - "8080:8080"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      placement:
        constraints:
          - node.role == manager

networks:
  webnet:
    driver: overlay

Deploy and manage:

$ docker stack deploy -c stack.yml myapp
$ docker stack services myapp
$ docker stack ps myapp
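
To watch rescheduling without pulling a power cable, drain a node; a short sketch using standard Swarm commands (the node and stack names follow the examples above):

# Take node3 out of service; Swarm moves its tasks onto the remaining nodes
docker node update --availability drain node3
docker service ps myapp_web          # tasks formerly on node3 now run on node1/node2

# Bring it back and spread load again
docker node update --availability active node3
docker service scale myapp_web=5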

Learning milestones:

  1. 3-node cluster initialized → You understand cluster formation
  2. Service runs across nodes → You understand scheduling
  3. Service survives node failure → You understand high availability
  4. Rolling update works → You understand zero-downtime deployments

Project 3: K3s Kubernetes Cluster (Lightweight Production-Grade)

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Bash
  • Alternative Programming Languages: Go, Python
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Kubernetes / Container Orchestration
  • Software or Tool: K3s, kubectl, Helm
  • Main Book: “Kubernetes Up and Running” by Brendan Burns

What you’ll build: A production-ready Kubernetes cluster using K3s—a lightweight K8s distribution perfect for edge and home labs. You’ll deploy applications, understand pods, services, deployments, and ingress.

Why it teaches home clusters: Kubernetes is the industry standard for container orchestration. K3s gives you real Kubernetes without the resource overhead. Skills learned here transfer directly to EKS, GKE, AKS, and enterprise Kubernetes.

Core challenges you’ll face:

  • Pod scheduling and lifecycle → maps to fundamental K8s unit
  • Services and networking → maps to how pods communicate
  • Deployments and ReplicaSets → maps to declarative application management
  • Ingress and load balancing → maps to external access

Key Concepts:

  • Kubernetes Architecture: “Kubernetes Up and Running” Chapter 3 - Burns et al.
  • Pods and Containers: “Kubernetes Up and Running” Chapter 5
  • Services: “Kubernetes Up and Running” Chapter 7
  • Deployments: “Kubernetes Up and Running” Chapter 9

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Project 2 (orchestration basics), Linux administration, networking fundamentals. 3 machines recommended.

Real world outcome:

# On control plane node
$ curl -sfL https://get.k3s.io | sh -

# Get join token
$ cat /var/lib/rancher/k3s/server/node-token

# On worker nodes
$ curl -sfL https://get.k3s.io | K3S_URL=https://192.168.1.10:6443 \
    K3S_TOKEN=<token> sh -

# Check cluster
$ kubectl get nodes
NAME     STATUS   ROLES                  AGE
node1    Ready    control-plane,master   5m
node2    Ready    <none>                 2m
node3    Ready    <none>                 2m

# Deploy an application
$ kubectl create deployment nginx --image=nginx --replicas=3
$ kubectl expose deployment nginx --port=80 --type=LoadBalancer
$ kubectl get pods -o wide
NAME                    READY   STATUS    NODE
nginx-abc123-xxx        1/1     Running   node1
nginx-abc123-yyy        1/1     Running   node2
nginx-abc123-zzz        1/1     Running   node3

Implementation Hints:

Kubernetes manifest structure:

# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest
        ports:
        - containerPort: 8080
        resources:
          limits:
            memory: "128Mi"
            cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
  ports:
  - port: 80
    targetPort: 8080
  type: ClusterIP

Essential kubectl commands:

kubectl get pods,svc,deploy -A          # See everything
kubectl describe pod <name>             # Debug a pod
kubectl logs <pod> -f                   # Stream logs
kubectl exec -it <pod> -- /bin/sh       # Shell into pod
kubectl apply -f manifest.yaml          # Apply config
kubectl delete -f manifest.yaml         # Delete resources
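
Two quick checks make the scheduling and service-discovery milestones concrete; a sketch assuming the nginx deployment and service created above:

# Scale the deployment and watch pods spread across nodes
kubectl scale deployment nginx --replicas=5
kubectl get pods -l app=nginx -o wide

# Resolve and call the Service by name from a throwaway pod
kubectl run tmp --rm -it --image=busybox:1.36 --restart=Never -- sh
# inside the pod:
#   nslookup nginx              # resolves to the Service's ClusterIP
#   wget -qO- http://nginx      # load-balances across the pods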

Learning milestones:

  1. Cluster formed, kubectl works → You understand K8s architecture
  2. Deployment scales up/down → You understand workload management
  3. Service discovery works → You understand K8s networking
  4. Ingress routes external traffic → You understand edge access

Project 4: Distributed Storage with Longhorn

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Bash
  • Alternative Programming Languages: Go
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Distributed Storage / Kubernetes
  • Software or Tool: Longhorn, K3s
  • Main Book: “Kubernetes Up and Running” by Brendan Burns

What you’ll build: A distributed block storage system that replicates data across cluster nodes. When a node dies, your data survives. Databases, file storage, and stateful apps all get persistent volumes that are automatically replicated.

Why it teaches home clusters: Single points of failure are the enemy of clusters. Local disk on one node isn’t “clustered storage.” Longhorn teaches you how real distributed storage works—replication, consistency, and failure recovery.

Core challenges you’ll face:

  • Persistent Volume Claims (PVCs) → maps to requesting storage in Kubernetes
  • Storage Classes → maps to different storage tiers/policies
  • Replication factor → maps to data redundancy
  • Backup and restore → maps to disaster recovery

Key Concepts:

  • Kubernetes Storage: “Kubernetes Up and Running” Chapter 13 - Burns et al.
  • PV/PVC/StorageClass: Kubernetes official documentation
  • Distributed Block Storage: Longhorn architecture documentation
  • iSCSI: How Longhorn exposes volumes

Difficulty: Advanced
Time estimate: 1 week
Prerequisites: Project 3 (K3s cluster running). Each node needs some local disk space.

Real world outcome:

# Install Longhorn
$ kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml

# Wait for Longhorn to be ready
$ kubectl -n longhorn-system get pods
NAME                                       READY   STATUS
longhorn-manager-xxxxx                     1/1     Running
longhorn-driver-deployer-xxxxx             1/1     Running

# Access Longhorn UI
$ kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80

# Create a replicated volume for PostgreSQL
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
EOF

# Longhorn dashboard shows:
# - Volume: postgres-data
# - Replicas: 3 (one on each node)
# - Status: Healthy

# Kill a node, watch Longhorn rebuild replica on surviving nodes

Implementation Hints:

Longhorn StorageClass configuration:

# storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"  # 48 hours
  fromBackup: ""
  dataLocality: "best-effort"

Using storage in a deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: postgres
spec:
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_PASSWORD
          value: secret           # the image refuses to start without it; use a Secret in real setups
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata   # keep initdb out of the volume root (lost+found)
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: postgres-data
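
Before trusting real data to it, verify that the volume actually outlives its pod; a rough drill, assuming the postgres deployment above:

# Write a row, delete the pod, and confirm the replacement pod still sees the row
kubectl exec deploy/postgres -- psql -U postgres -c "CREATE TABLE probe(id int); INSERT INTO probe VALUES (1);"
kubectl delete pod -l app=postgres
kubectl exec deploy/postgres -- psql -U postgres -c "SELECT * FROM probe;"

# Note which node holds the pod, then repeat the check after rebooting that node
kubectl get pods -l app=postgres -o wide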

Learning milestones:

  1. Longhorn installed and healthy → You understand CSI drivers
  2. PVC provisions automatically → You understand dynamic provisioning
  3. Data survives pod restart → You understand persistent storage
  4. Data survives node failure → You understand distributed replication

Project 5: Load Balancer with MetalLB

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML
  • Alternative Programming Languages: N/A
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Networking / Kubernetes
  • Software or Tool: MetalLB, K3s
  • Main Book: “Kubernetes Networking” by James Strong

What you’ll build: A bare-metal load balancer that gives your Kubernetes services external IP addresses—just like cloud LoadBalancers do. No more NodePort hacks; services get real IPs from your home network.

Why it teaches home clusters: In the cloud, type: LoadBalancer magically gets an IP. On bare metal, you need MetalLB. Understanding how this works demystifies a core Kubernetes networking concept and teaches you about Layer 2/BGP networking.

Core challenges you’ll face:

  • IP address pool management → maps to allocating IPs from your network
  • Layer 2 vs BGP mode → maps to how addresses are advertised
  • Service type LoadBalancer → maps to external access patterns
  • ARP/NDP → maps to how Layer 2 mode works

Key Concepts:

  • Kubernetes Service Types: “Kubernetes Up and Running” Chapter 7
  • Layer 2 Networking: ARP (Address Resolution Protocol)
  • BGP: Border Gateway Protocol basics
  • Load Balancing Algorithms: Round-robin, least connections

Difficulty: Intermediate
Time estimate: Half a day
Prerequisites: Project 3 (K3s cluster), basic networking knowledge (IP addresses, subnets).

Real world outcome:

# Install MetalLB
$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/main/config/manifests/metallb-native.yaml

# Configure IP pool (use unused IPs from your LAN)
$ cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: homelab-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.200-192.168.1.220
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: homelab-l2
  namespace: metallb-system
EOF

# Create a LoadBalancer service
$ kubectl expose deployment nginx --type=LoadBalancer --port=80

$ kubectl get svc nginx
NAME    TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)
nginx   LoadBalancer   10.43.12.34    192.168.1.200    80:31234/TCP

# Access from any machine on your network!
$ curl http://192.168.1.200
Welcome to nginx!

Implementation Hints:

MetalLB configuration options:

# Layer 2 mode (simple, no router config needed)
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: production
  namespace: metallb-system
spec:
  addresses:
  - 192.168.1.200-192.168.1.220  # Reserve these in your DHCP

---
# BGP mode (for advanced setups with BGP-capable router)
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: bgp-advertisement
  namespace: metallb-system
spec:
  ipAddressPools:
  - production
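
In Layer 2 mode exactly one node answers ARP for each external IP; you can see which one. A hedged sketch (the speaker label selector matches the metallb-native manifests, but check your install):

# Which node announced 192.168.1.200? Look for the announcement in the speaker logs
kubectl -n metallb-system logs -l component=speaker --tail=100 | grep 192.168.1.200

# From another machine on the LAN, the ARP cache maps the service IP to that node's MAC
curl -s http://192.168.1.200 >/dev/null
ip neigh | grep 192.168.1.200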

Learning milestones:

  1. MetalLB pods running → You understand the installation
  2. Service gets external IP → You understand IP allocation
  3. External access works → You understand L2 advertisement
  4. Multiple services get unique IPs → You understand pool management

Project 6: Ingress Controller and TLS Certificates

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML
  • Alternative Programming Languages: N/A
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Networking / Security
  • Software or Tool: Traefik (K3s default) or Nginx Ingress, cert-manager
  • Main Book: “Kubernetes Networking” by James Strong

What you’ll build: An ingress controller that routes traffic based on hostnames (app1.home.lab, app2.home.lab), plus automatic TLS certificate management so everything is HTTPS.

Why it teaches home clusters: Real applications need domain-based routing and HTTPS. The Ingress resource is how Kubernetes handles this. cert-manager automates the tedious certificate renewal process—essential for production.

Core challenges you’ll face:

  • Ingress resource configuration → maps to routing rules
  • TLS termination → maps to HTTPS at the edge
  • cert-manager and Let’s Encrypt → maps to automated certificates
  • Local DNS setup → maps to resolving *.home.lab

Key Concepts:

  • Kubernetes Ingress: “Kubernetes Up and Running” Chapter 8
  • TLS/SSL Certificates: How HTTPS works
  • ACME Protocol: How Let’s Encrypt issues certificates
  • DNS Challenge: Proving domain ownership

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Project 3 (K3s cluster), Project 5 (MetalLB). Domain name helpful but not required.

Real world outcome:

# K3s comes with Traefik ingress controller
$ kubectl get pods -n kube-system | grep traefik
traefik-xxxxx    1/1     Running

# Install cert-manager
$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml

# Create Ingress with TLS
$ cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp-ingress
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
  - hosts:
    - myapp.example.com
    secretName: myapp-tls
  rules:
  - host: myapp.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: myapp
            port:
              number: 80
EOF

# Browser shows:
# 🔒 https://myapp.example.com - Certificate valid!

Implementation Hints:

cert-manager ClusterIssuer:

# For production (real certs from Let's Encrypt)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
    - http01:
        ingress:
          class: traefik

# For testing (self-signed, no rate limits)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: selfsigned
spec:
  selfSigned: {}

Local DNS with Pi-hole or CoreDNS:

# Add to /etc/hosts or Pi-hole Local DNS:
192.168.1.200  app1.home.lab
192.168.1.200  app2.home.lab
192.168.1.200  grafana.home.lab
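
You can also exercise host-based routing and TLS before any DNS exists by pinning the name to the external IP with curl; a sketch assuming the ingress above and an external IP of 192.168.1.200:

# Send the right Host header and SNI without touching /etc/hosts
curl --resolve myapp.example.com:443:192.168.1.200 https://myapp.example.com -kv

# Watch cert-manager work through issuance
kubectl get certificate,certificaterequest -A
kubectl describe certificate myapp-tls      # shows progress and any ACME errors

Note that the http01 solver only succeeds if Let's Encrypt can reach the hostname from the internet; for purely internal names, use the selfsigned issuer or a DNS-01 solver.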

Learning milestones:

  1. Ingress routes by hostname → You understand L7 routing
  2. Self-signed TLS works → You understand TLS termination
  3. Let’s Encrypt cert auto-issued → You understand ACME
  4. Multiple apps on one IP → You understand virtual hosting

Project 7: Monitoring Stack (Prometheus + Grafana)

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / PromQL
  • Alternative Programming Languages: Python (exporters)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Observability / Monitoring
  • Software or Tool: Prometheus, Grafana, kube-prometheus-stack
  • Main Book: “Prometheus Up and Running” by Brian Brazil

What you’ll build: A complete monitoring stack that collects metrics from all cluster nodes and applications, visualizes them in beautiful dashboards, and can alert you when something goes wrong.

Why it teaches home clusters: You can’t manage what you can’t measure. Prometheus is the standard for Kubernetes monitoring, and PromQL (the Prometheus Query Language) and Grafana dashboarding are essential DevOps skills.

Core challenges you’ll face:

  • Prometheus scraping → maps to metric collection
  • PromQL queries → maps to analyzing time series data
  • Grafana dashboards → maps to visualization
  • Alertmanager → maps to notification on issues

Key Concepts:

  • Prometheus Architecture: “Prometheus Up and Running” Chapter 2 - Brian Brazil
  • PromQL Basics: “Prometheus Up and Running” Chapter 4
  • Grafana Dashboards: Grafana documentation
  • Kubernetes Metrics: kube-state-metrics, node-exporter

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Project 3 (K3s cluster). Understanding of metrics concepts helpful.

Real world outcome:

# Install kube-prometheus-stack via Helm
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm install monitoring prometheus-community/kube-prometheus-stack \
    --namespace monitoring --create-namespace

# Access Grafana
$ kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
# Open http://localhost:3000 (admin/prom-operator)

# Grafana shows:
# - Cluster CPU/Memory usage: 45% / 62%
# - Pod count: 47 running, 0 pending
# - Node status: 3/3 healthy
# - Network I/O: 150 Mbps in, 50 Mbps out
# - Disk usage: 234 GB / 512 GB

# Example PromQL queries:
# CPU usage by pod: sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
# Memory by namespace: sum(container_memory_usage_bytes) by (namespace)

Implementation Hints:

kube-prometheus-stack values customization:

# values.yaml
grafana:
  adminPassword: your-secure-password
  ingress:
    enabled: true
    hosts:
      - grafana.home.lab

prometheus:
  prometheusSpec:
    retention: 30d
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn
          resources:
            requests:
              storage: 50Gi

alertmanager:
  config:
    receivers:
    - name: 'slack'
      slack_configs:
      - api_url: 'https://hooks.slack.com/xxx'
        channel: '#alerts'

Essential PromQL queries:

# Node CPU usage
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Pod memory usage
sum(container_memory_usage_bytes{pod!=""}) by (pod, namespace)

# HTTP request rate
sum(rate(http_requests_total[5m])) by (service)

# Error rate
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
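
The alerting milestone needs a rule that can actually fire. With kube-prometheus-stack, custom rules are PrometheusRule objects; a minimal sketch, assuming the Helm release is named monitoring (the chart's default ruleSelector matches that release label):

cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: homelab-node-alerts
  namespace: monitoring
  labels:
    release: monitoring        # must match the chart's rule selector
spec:
  groups:
  - name: homelab.node
    rules:
    - alert: NodeHighCPU
      expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 90
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Node {{ $labels.instance }} CPU above 90% for 10 minutes"
EOF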

Learning milestones:

  1. Prometheus scrapes metrics → You understand metric collection
  2. Grafana dashboard shows cluster health → You understand visualization
  3. Custom PromQL query works → You understand querying
  4. Alert fires and notifies you → You understand alerting

Project 8: GitOps with ArgoCD

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Git
  • Alternative Programming Languages: Kustomize, Helm
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: GitOps / CI/CD
  • Software or Tool: ArgoCD, GitHub/GitLab
  • Main Book: “GitOps and Kubernetes” by Billy Yuen

What you’ll build: A GitOps pipeline where your cluster state is defined in Git. Push a change to your repository, and ArgoCD automatically syncs it to your cluster. No more kubectl apply—everything is version-controlled and auditable.

Why it teaches home clusters: GitOps is how modern teams manage infrastructure. “Git as the single source of truth” means you can rebuild your entire cluster from a repository. It’s declarative, auditable, and enables easy rollbacks.

Core challenges you’ll face:

  • Application custom resources → maps to defining apps in ArgoCD
  • Sync policies → maps to automatic vs manual deployment
  • Helm/Kustomize integration → maps to templating and overlays
  • RBAC and multi-tenancy → maps to who can deploy what

Key Concepts:

  • GitOps Principles: “GitOps and Kubernetes” Chapter 1 - Billy Yuen
  • ArgoCD Applications: ArgoCD documentation
  • Kustomize: Kubernetes-native templating
  • Helm Charts: Package manager for Kubernetes

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Project 3 (K3s cluster), Git proficiency, understanding of Kubernetes manifests.

Real world outcome:

# Install ArgoCD
$ kubectl create namespace argocd
$ kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml

# Access ArgoCD UI
$ kubectl port-forward svc/argocd-server -n argocd 8080:443
# Get initial password: kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d

# Create an Application pointing to your Git repo
$ cat <<EOF | kubectl apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-homelab
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourusername/homelab-gitops.git
    targetRevision: HEAD
    path: apps
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
EOF

# Now: git push to your repo → ArgoCD syncs automatically!
# ArgoCD UI shows:
# - App Status: Synced ✓
# - Health: Healthy ✓
# - Last Sync: 30 seconds ago

Implementation Hints:

Repository structure for GitOps:

homelab-gitops/
├── apps/                    # Application definitions
│   ├── nginx/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── ingress.yaml
│   ├── postgres/
│   │   ├── deployment.yaml
│   │   ├── service.yaml
│   │   └── pvc.yaml
│   └── kustomization.yaml
├── infrastructure/          # Cluster infrastructure
│   ├── metallb/
│   ├── cert-manager/
│   └── monitoring/
└── README.md
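
The apps/kustomization.yaml in that tree simply lists what should be rendered; a minimal sketch of the file (paths follow the hypothetical layout above):

cat <<'EOF' > apps/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - nginx/deployment.yaml
  - nginx/service.yaml
  - nginx/ingress.yaml
  - postgres/deployment.yaml
  - postgres/service.yaml
  - postgres/pvc.yaml
EOF
git add apps/kustomization.yaml && git commit -m "add apps" && git push
# With automated sync enabled, ArgoCD applies the commit within its polling interval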

ArgoCD Application with Helm:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus
  namespace: argocd
spec:
  source:
    repoURL: https://prometheus-community.github.io/helm-charts
    chart: kube-prometheus-stack
    targetRevision: 45.0.0
    helm:
      values: |
        grafana:
          adminPassword: secret
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring

Learning milestones:

  1. ArgoCD syncs from Git → You understand GitOps basics
  2. Auto-sync on git push → You understand continuous deployment
  3. Helm charts deployed via ArgoCD → You understand templating
  4. Full cluster rebuildable from Git → You’ve achieved true GitOps

Project 9: Proxmox Virtualization Platform

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: Bash / Web UI
  • Alternative Programming Languages: Terraform (for automation)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Virtualization / Infrastructure
  • Software or Tool: Proxmox VE
  • Main Book: “Mastering Proxmox” by Wasim Ahmed

What you’ll build: A Proxmox cluster that manages VMs and containers across multiple physical nodes. This becomes the foundation layer for everything else—you can run Kubernetes inside Proxmox VMs for the best of both worlds.

Why it teaches home clusters: Proxmox teaches you virtualization concepts: hypervisors, VM lifecycle, live migration, HA clustering, and storage pools. Running K8s on Proxmox gives you snapshot/backup capabilities and easy recovery.

Core challenges you’ll face:

  • Proxmox cluster formation → maps to quorum and consensus
  • VM templates → maps to golden images
  • Storage configuration → maps to local vs shared storage
  • HA groups → maps to automatic VM failover

Key Concepts:

  • Type 1 vs Type 2 Hypervisors: Proxmox is Type 1 (bare metal)
  • QEMU/KVM: Linux virtualization technology
  • LXC Containers: Lightweight OS-level virtualization
  • Ceph Integration: Distributed storage in Proxmox

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: 2-3 physical machines (or nested virtualization), basic Linux administration.

Real world outcome:

Proxmox Web UI (https://192.168.1.10:8006)

┌─────────────────────────────────────────────────────────────────┐
│ Datacenter: HomeCluster                                         │
├─────────────────────────────────────────────────────────────────┤
│ Nodes:                                                          │
│   ✓ pve1 (192.168.1.10) - 16 CPU, 64GB RAM, 1TB SSD            │
│   ✓ pve2 (192.168.1.11) - 16 CPU, 64GB RAM, 1TB SSD            │
│   ✓ pve3 (192.168.1.12) - 16 CPU, 64GB RAM, 1TB SSD            │
│                                                                 │
│ VMs:                                                            │
│   k8s-control (pve1) - 4 CPU, 8GB RAM - Running                │
│   k8s-worker1 (pve2) - 4 CPU, 8GB RAM - Running                │
│   k8s-worker2 (pve3) - 4 CPU, 8GB RAM - Running                │
│   windows-test (pve1) - 4 CPU, 16GB RAM - Stopped              │
│                                                                 │
│ Storage:                                                        │
│   local-lvm: 234GB / 500GB                                      │
│   ceph-pool: 1.2TB / 3TB (replicated across nodes)              │
└─────────────────────────────────────────────────────────────────┘

Implementation Hints:

Proxmox cluster setup:

# On first node (pve1)
$ pvecm create HomeCluster

# On other nodes (pve2, pve3)
$ pvecm add 192.168.1.10

# Check cluster status
$ pvecm status
Cluster information
-------------------
Name:             HomeCluster
Config Version:   3
Transport:        knet
Secure auth:      on

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pve1 (local)
         2          1 pve2
         3          1 pve3

Creating a VM template:

# Download cloud image
$ wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img

# Create VM
$ qm create 9000 --name ubuntu-template --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0

# Import disk
$ qm importdisk 9000 jammy-server-cloudimg-amd64.img local-lvm

# Attach disk
$ qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0

# Convert to template
$ qm template 9000

# Clone from template
$ qm clone 9000 100 --name k8s-control --full
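
Before cloning, it is common to wire cloud-init into the template so clones come up with SSH keys and an address already set; a hedged sketch (the key path and IPs are assumptions):

# Attach a cloud-init drive and set login/network defaults on the template
qm set 9000 --ide2 local-lvm:cloudinit
qm set 9000 --ciuser ubuntu --sshkeys ~/.ssh/id_ed25519.pub
qm set 9000 --ipconfig0 ip=dhcp

# After cloning: give the clone its own address, grow the small cloud-image disk, start it
qm set 100 --ipconfig0 ip=192.168.1.101/24,gw=192.168.1.1
qm resize 100 scsi0 +30G
qm start 100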

Learning milestones:

  1. Proxmox cluster formed → You understand cluster quorum
  2. VM created and runs → You understand virtualization basics
  3. Template cloning works → You understand golden images
  4. Live migration works → You understand VM mobility

Project 10: Network Segmentation with VLANs

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: N/A (Network configuration)
  • Alternative Programming Languages: Ansible (for automation)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Networking / Security
  • Software or Tool: Managed switch, pfSense/OPNsense
  • Main Book: “Computer Networks” by Tanenbaum

What you’ll build: A properly segmented home network with VLANs separating management traffic, user devices, IoT devices, and cluster traffic. Each VLAN has appropriate firewall rules.

Why it teaches home clusters: Production clusters aren’t on flat networks. VLANs teach you layer 2/3 networking, inter-VLAN routing, and network security. These skills are essential for any infrastructure role.

Core challenges you’ll face:

  • VLAN tagging (802.1Q) → maps to how VLANs work on the wire
  • Inter-VLAN routing → maps to how VLANs communicate
  • Firewall rules → maps to controlling traffic flow
  • Trunk vs access ports → maps to switch configuration

Key Concepts:

  • VLANs: “Computer Networks” Chapter 4 - Tanenbaum
  • 802.1Q Tagging: IEEE standard for VLAN tagging
  • Layer 3 Switching: Routing between VLANs
  • Firewall Zones: Grouping interfaces by trust level

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Managed switch (VLAN-capable), router/firewall (pfSense/OPNsense). Basic networking knowledge.

Real world outcome:

Network Diagram:

                    Internet
                        │
                   ┌────┴────┐
                   │ OPNsense│  (Router/Firewall)
                   │ Firewall│
                   └────┬────┘
                        │ Trunk (all VLANs)
                   ┌────┴────┐
                   │ Managed │
                   │ Switch  │
                   └─┬──┬──┬─┘
                     │  │  │
        ┌────────────┘  │  └────────────┐
        │               │               │
   VLAN 10         VLAN 20         VLAN 30
   Management      Cluster         IoT
   ────────────    ─────────       ──────────
   • OPNsense      • k8s-ctrl      • Cameras
   • Proxmox UI    • k8s-worker1   • Smart bulbs
   • Switch mgmt   • k8s-worker2   • Thermostats
                   • NAS

Firewall Rules:
• VLAN 10 → All VLANs (management access)
• VLAN 20 → Internet, VLAN 30 limited
• VLAN 30 → Internet only (IoT isolated)

Implementation Hints:

OPNsense VLAN configuration:

Interfaces → Other Types → VLAN

VLAN 10: Parent: igb0, Tag: 10, Description: Management
VLAN 20: Parent: igb0, Tag: 20, Description: Cluster
VLAN 30: Parent: igb0, Tag: 30, Description: IoT

Interfaces → Assignments:
  OPT1: vlan10 → MGMT (192.168.10.1/24)
  OPT2: vlan20 → CLUSTER (192.168.20.1/24)
  OPT3: vlan30 → IOT (192.168.30.1/24)

Services → DHCPv4:
  MGMT: 192.168.10.100 - 192.168.10.200
  CLUSTER: 192.168.20.100 - 192.168.20.200
  IOT: 192.168.30.100 - 192.168.30.200

Switch VLAN configuration (example for TP-Link):

Port 1-4:   PVID=10, Untagged=10, Tagged=none     (Management)
Port 5-12:  PVID=20, Untagged=20, Tagged=none     (Cluster)
Port 13-20: PVID=30, Untagged=30, Tagged=none     (IoT)
Port 24:    PVID=1,  Untagged=1,  Tagged=10,20,30 (Trunk to router)
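
To confirm tagging end to end from a Linux host plugged into a trunk port, create a tagged sub-interface and request a lease on it; a sketch (the interface name and DHCP client are assumptions):

# Create a VLAN 20 sub-interface on eth0 and ask the CLUSTER scope for a lease
sudo ip link add link eth0 name eth0.20 type vlan id 20
sudo ip link set eth0.20 up
sudo dhclient -v eth0.20          # should receive a 192.168.20.x address

# Check routing and firewall behaviour between zones
ping -c 3 192.168.20.1            # VLAN gateway reachable
ping -c 3 192.168.30.50           # should be blocked or limited per the IoT rules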

Learning milestones:

  1. VLANs created on switch → You understand 802.1Q
  2. Devices in VLAN get correct IPs → You understand DHCP per VLAN
  3. VLANs can communicate via router → You understand inter-VLAN routing
  4. Firewall blocks unauthorized traffic → You understand security zones

Project 11: Secure Remote Access with Tailscale

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: N/A (Configuration)
  • Alternative Programming Languages: N/A
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Networking / VPN
  • Software or Tool: Tailscale
  • Main Book: N/A (Online documentation)

What you’ll build: Secure access to your entire homelab from anywhere in the world using Tailscale’s WireGuard-based mesh VPN. No port forwarding, no dynamic DNS hassles, no exposed services.

Why it teaches home clusters: Remote access is essential for managing your cluster. Tailscale teaches you about WireGuard, mesh networking, and zero-trust security—all without the complexity of setting up your own VPN server.

Core challenges you’ll face:

  • Mesh VPN topology → maps to peer-to-peer connections
  • Subnet routing → maps to accessing entire home network remotely
  • ACLs → maps to who can access what
  • Exit nodes → maps to routing all traffic through home

Key Concepts:

  • WireGuard Protocol: Modern, fast VPN protocol
  • Mesh Networking: Every device connects to every other
  • NAT Traversal: Connecting through firewalls
  • Zero-Trust: Authenticate before connecting

Difficulty: Beginner
Time estimate: Half a day
Prerequisites: Any device to install Tailscale on. Free account at tailscale.com.

Real world outcome:

# On your cluster nodes
$ curl -fsSL https://tailscale.com/install.sh | sh
$ sudo tailscale up

# Tailscale dashboard shows:
# Machine         IP            Last Seen
# k8s-control     100.64.0.1    Connected
# k8s-worker1     100.64.0.2    Connected
# k8s-worker2     100.64.0.3    Connected
# macbook         100.64.0.4    Connected
# iphone          100.64.0.5    Connected

# From your phone on 4G:
$ ssh 100.64.0.1
Welcome to k8s-control!

# Enable subnet routing (access all 192.168.x.x from anywhere)
$ sudo tailscale up --advertise-routes=192.168.0.0/16

# From anywhere, access your home network:
$ curl http://192.168.1.10:8006  # Proxmox UI works!

Implementation Hints:

Tailscale setup on Kubernetes nodes:

# Install on each node
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up

# On one node, enable subnet routing
sudo tailscale up --advertise-routes=10.43.0.0/16,10.42.0.0/16

# In Tailscale admin console, approve the routes

Tailscale ACL configuration (in admin console):

{
  "acls": [
    {
      "action": "accept",
      "src": ["group:admin"],
      "dst": ["*:*"]
    },
    {
      "action": "accept",
      "src": ["group:developers"],
      "dst": ["tag:k8s:22", "tag:k8s:6443"]
    }
  ],
  "tagOwners": {
    "tag:k8s": ["group:admin"]
  }
}
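
The exit-node challenge takes two commands; a sketch using standard tailscale flags:

# On a homelab node: offer itself as an exit node (needs IP forwarding enabled, and
# the routes/exit-node offer must be approved in the admin console)
sudo tailscale up --advertise-exit-node --advertise-routes=192.168.0.0/16

# On a laptop on untrusted Wi-Fi: send all traffic home
sudo tailscale up --exit-node=100.64.0.1 --exit-node-allow-lan-access
tailscale status                  # shows which exit node is active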

Learning milestones:

  1. Two devices connected via Tailscale → You understand mesh VPN
  2. SSH to homelab from phone → You understand remote access
  3. Subnet routing exposes home network → You understand routing
  4. ACLs restrict access by user → You understand zero-trust

Project 12: High Availability Kubernetes (Multi-Master)

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Bash
  • Alternative Programming Languages: Ansible
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 4: Expert
  • Knowledge Area: Kubernetes / HA
  • Software or Tool: K3s with embedded etcd, kube-vip
  • Main Book: “Kubernetes Up and Running” by Brendan Burns

What you’ll build: A Kubernetes cluster with 3 control plane nodes, so the cluster survives the loss of any single control plane node. etcd runs in HA mode, and a virtual IP floats between masters.

Why it teaches home clusters: Single control plane = single point of failure. Understanding HA patterns (quorum, leader election, virtual IPs) is essential for production systems. This is how real Kubernetes clusters are built.

Core challenges you’ll face:

  • etcd quorum → maps to why 3 nodes minimum
  • Virtual IP failover → maps to stable API endpoint
  • Leader election → maps to only one active leader
  • Split-brain prevention → maps to network partition handling

Key Concepts:

  • etcd Consensus: “Kubernetes Up and Running” Chapter 3
  • Raft Protocol: How etcd achieves consensus
  • Virtual IP: kube-vip or keepalived
  • Quorum: N/2 + 1 nodes must agree

Difficulty: Expert
Time estimate: 1-2 weeks
Prerequisites: Project 3 (K3s basics), networking knowledge. 3+ machines for the control plane.

Real world outcome:

# K3s HA cluster with 3 control planes + 2 workers
$ kubectl get nodes
NAME          STATUS   ROLES                       AGE
k8s-ctrl-1    Ready    control-plane,etcd,master   5d
k8s-ctrl-2    Ready    control-plane,etcd,master   5d
k8s-ctrl-3    Ready    control-plane,etcd,master   5d
k8s-worker-1  Ready    <none>                      5d
k8s-worker-2  Ready    <none>                      5d

# Virtual IP for API
$ kubectl cluster-info
Kubernetes control plane is running at https://192.168.1.100:6443

# Kill ctrl-1, cluster keeps running
$ ssh ctrl-1 "sudo poweroff"

# After 30 seconds:
$ kubectl get nodes
NAME          STATUS     ROLES                       AGE
k8s-ctrl-1    NotReady   control-plane,etcd,master   5d
k8s-ctrl-2    Ready      control-plane,etcd,master   5d  # Now leader
k8s-ctrl-3    Ready      control-plane,etcd,master   5d
k8s-worker-1  Ready      <none>                      5d
k8s-worker-2  Ready      <none>                      5d

# VIP moved to ctrl-2, kubectl still works!

Implementation Hints:

K3s HA with embedded etcd:

# First control plane node
curl -sfL https://get.k3s.io | sh -s - server \
  --cluster-init \
  --tls-san 192.168.1.100 \
  --disable servicelb

# Get token
cat /var/lib/rancher/k3s/server/node-token

# Additional control plane nodes
curl -sfL https://get.k3s.io | sh -s - server \
  --server https://192.168.1.101:6443 \
  --token <TOKEN> \
  --tls-san 192.168.1.100

# Workers
curl -sfL https://get.k3s.io | sh -s - agent \
  --server https://192.168.1.100:6443 \
  --token <TOKEN>

kube-vip for virtual IP:

apiVersion: v1
kind: Pod
metadata:
  name: kube-vip
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-vip
    image: ghcr.io/kube-vip/kube-vip:latest
    args:
    - manager
    env:
    - name: vip_arp
      value: "true"
    - name: vip_interface
      value: eth0
    - name: address
      value: "192.168.1.100"
    - name: port
      value: "6443"
    - name: vip_leaderelection
      value: "true"
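
A few checks that make the failover milestones concrete; a hedged sketch (the VIP and node names follow the example above, and the etcd node label is what recent K3s releases apply):

# Which control-plane node currently holds the virtual IP?
ssh k8s-ctrl-1 "ip addr show eth0 | grep 192.168.1.100"

# etcd members and API server health
kubectl get nodes -l node-role.kubernetes.io/etcd=true
kubectl get --raw '/readyz?verbose' | tail -n 5

# Take an etcd snapshot before you start killing nodes
sudo k3s etcd-snapshot save --name pre-failover-test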

Learning milestones:

  1. 3-node control plane running → You understand HA architecture
  2. etcd cluster healthy → You understand consensus
  3. VIP fails over on node loss → You understand virtual IPs
  4. Cluster survives any single failure → You’ve achieved true HA

Project 13: CI/CD Pipeline with Self-Hosted Runners

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Docker
  • Alternative Programming Languages: Go, Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: CI/CD / DevOps
  • Software or Tool: GitHub Actions + self-hosted runner, or Drone CI
  • Main Book: “Continuous Delivery” by Jez Humble

What you’ll build: A self-hosted CI/CD system that builds Docker images, runs tests, and deploys to your Kubernetes cluster—all running on your homelab instead of paying for cloud CI minutes.

Why it teaches home clusters: CI/CD is the heartbeat of modern software delivery. Running it yourself teaches you about build pipelines, artifact management, and deployment automation. Plus, self-hosted runners have access to your local cluster.

Core challenges you’ll face:

  • Runner registration → maps to connecting to GitHub/GitLab
  • Docker-in-Docker builds → maps to building images in CI
  • Secrets management → maps to secure credential handling
  • Deployment triggers → maps to automatic deployments

Key Concepts:

  • CI/CD Principles: “Continuous Delivery” Chapter 1 - Jez Humble
  • GitHub Actions Syntax: GitHub documentation
  • Docker BuildKit: Modern Docker image building
  • Kubernetes Deployments: Rolling updates

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Project 3 (K3s cluster), Project 8 (GitOps helpful). Git and Docker proficiency.

Real world outcome:

# .github/workflows/deploy.yml
name: Build and Deploy
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: self-hosted  # Your homelab runner!
    steps:
      - uses: actions/checkout@v4

      - name: Build Docker image
        run: docker build -t myapp:${{ github.sha }} .

      - name: Push to local registry
        run: |
          docker tag myapp:${{ github.sha }} registry.home.lab/myapp:${{ github.sha }}
          docker push registry.home.lab/myapp:${{ github.sha }}

      - name: Deploy to Kubernetes
        run: |
          kubectl set image deployment/myapp \
            myapp=registry.home.lab/myapp:${{ github.sha }}

GitHub Actions log:
✓ Build Docker image (45s)
✓ Push to local registry (12s)
✓ Deploy to Kubernetes (8s)

Total: 1m 5s on YOUR hardware, $0 cloud costs

Implementation Hints:

GitHub self-hosted runner setup:

# On a node in your cluster
mkdir actions-runner && cd actions-runner
curl -o actions-runner-linux-x64.tar.gz -L https://github.com/actions/runner/releases/download/v2.xxx/actions-runner-linux-x64-2.xxx.tar.gz
tar xzf actions-runner-linux-x64.tar.gz

# Configure (get token from GitHub repo settings)
./config.sh --url https://github.com/USERNAME/REPO --token <TOKEN>

# Run as service
sudo ./svc.sh install
sudo ./svc.sh start

Local container registry:

# Deploy a local registry in K8s
apiVersion: apps/v1
kind: Deployment
metadata:
  name: registry
spec:
  replicas: 1
  selector:
    matchLabels:
      app: registry
  template:
    metadata:
      labels:
        app: registry
    spec:
      containers:
      - name: registry
        image: registry:2
        ports:
        - containerPort: 5000
        volumeMounts:
        - name: data
          mountPath: /var/lib/registry
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: registry-data
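
For nodes to pull from registry.home.lab over plain HTTP, K3s has to be told the registry is insecure; a hedged sketch of /etc/rancher/k3s/registries.yaml, assuming registry.home.lab resolves to the registry Service's LoadBalancer IP on port 5000:

# On every K3s node, then restart the k3s (server) or k3s-agent (worker) service
sudo tee /etc/rancher/k3s/registries.yaml <<'EOF'
mirrors:
  "registry.home.lab":
    endpoint:
      - "http://registry.home.lab:5000"
EOF
sudo systemctl restart k3s 2>/dev/null || sudo systemctl restart k3s-agent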

Learning milestones:

  1. Self-hosted runner connected → You understand runner architecture
  2. Build runs on your hardware → You understand self-hosted benefits
  3. Image pushed to local registry → You understand artifact storage
  4. Auto-deploy on git push → You understand full CI/CD

Project 14: Backup and Disaster Recovery

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Bash
  • Alternative Programming Languages: Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Backup / DR
  • Software or Tool: Velero, Restic, Proxmox Backup Server
  • Main Book: “Site Reliability Engineering” by Google

What you’ll build: A comprehensive backup strategy covering Kubernetes resources, persistent volumes, and VM snapshots. You’ll practice restoring from backup to prove it works.

Why it teaches home clusters: “Backup that isn’t tested isn’t backup.” Learning disaster recovery means understanding what state needs to be saved, how to save it, and how to restore it. This is critical for any production system.

Core challenges you’ll face:

  • What to backup → maps to identifying critical state
  • Velero for K8s → maps to cluster-aware backups
  • PV snapshots → maps to data backup
  • Recovery testing → maps to validating backups

Key Concepts:

  • RTO/RPO: Recovery Time/Point Objectives
  • 3-2-1 Backup Rule: 3 copies, 2 media types, 1 offsite
  • Velero Architecture: Velero documentation
  • Snapshot vs Backup: Point-in-time vs full copy

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Projects 3-4 (K3s with storage), storage for backups (NAS, S3, etc.).

Real world outcome:

# Install Velero with S3-compatible backend
$ velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.5.0 \
    --bucket homelab-backups \
    --secret-file ./credentials-velero \
    --backup-location-config region=us-east-1,s3Url=http://minio.home.lab:9000

# Create a backup
$ velero backup create production-backup \
    --include-namespaces production \
    --snapshot-volumes

Backup request "production-backup" submitted successfully.

# Simulate disaster (delete namespace)
$ kubectl delete namespace production
namespace "production" deleted

# Restore from backup
$ velero restore create --from-backup production-backup

Restore request "production-backup-20231215" submitted successfully.

# Everything is back!
$ kubectl get all -n production
NAME                          READY   STATUS
pod/myapp-abc123-xxx          1/1     Running
pod/postgres-xyz789-yyy       1/1     Running

Implementation Hints:

Velero scheduled backup:

apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: daily-backup
  namespace: velero
spec:
  schedule: "0 3 * * *"  # 3 AM daily
  template:
    includedNamespaces:
    - production
    - monitoring
    snapshotVolumes: true
    ttl: 720h  # Keep for 30 days
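
Restores can be declared as objects as well, which is handy for scripted recovery drills. A sketch (the backup name is illustrative; Velero names Schedule-created backups <schedule>-<timestamp>):

apiVersion: velero.io/v1
kind: Restore
metadata:
  name: drill-production                     # illustrative name
  namespace: velero
spec:
  backupName: daily-backup-20231215030000    # a backup produced by the Schedule above
  includedNamespaces:
  - production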

Proxmox backup with PBS:

# Add Proxmox Backup Server as storage
pvesm add pbs pbs-storage \
    --server 192.168.1.50 \
    --datastore backups \
    --username backup@pbs \
    --password secret

# Schedule VM backups
# In Proxmox UI: Datacenter → Backup → Add
# Schedule: daily, 02:00
# Selection: All VMs
# Storage: pbs-storage

Learning milestones:

  1. Velero installed and connected to storage → You understand backup infra
  2. Manual backup succeeds → You understand backup process
  3. Restore works → You understand recovery
  4. Scheduled backups running → You understand automation

Project 15: Self-Hosted Application Platform

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML
  • Alternative Programming Languages: Various per app
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Self-Hosting / Applications
  • Software or Tool: Various (see list below)
  • Main Book: N/A (App-specific docs)

What you’ll build: A suite of self-hosted applications running on your cluster: media server (Jellyfin), password manager (Vaultwarden), file sync (Nextcloud), ad-blocking DNS (Pi-hole), home automation (Home Assistant).

Why it teaches home clusters: This is why many people build homelabs—to run their own services. You’ll learn to deploy stateful applications, manage persistent data, configure networking, and understand real-world application requirements.

Core challenges you’ll face:

  • Stateful application management → maps to databases, file storage
  • Resource allocation → maps to limits and requests
  • External access → maps to ingress, DNS
  • Data persistence → maps to PVCs, backups

Popular Self-Hosted Apps:

App              Purpose               Complexity
Jellyfin         Media streaming       Medium
Vaultwarden      Password manager      Low
Nextcloud        File sync/office      High
Pi-hole          Ad-blocking DNS       Low
Home Assistant   Home automation       Medium
Paperless-ngx    Document management   Medium
Gitea            Git hosting           Low
Miniflux         RSS reader            Low

Difficulty: Intermediate. Time estimate: Ongoing (add apps as needed). Prerequisites: Projects 3-6 (K3s with ingress and storage).

Real world outcome:

Your home dashboard (https://home.yourdomain.com):

┌─────────────────────────────────────────────────────────────────┐
│  🏠 Homelab Dashboard                                           │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  📺 Jellyfin        → jellyfin.home.lab      [Healthy]         │
│  🔐 Vaultwarden     → vault.home.lab         [Healthy]         │
│  📁 Nextcloud       → cloud.home.lab         [Healthy]         │
│  🛡️ Pi-hole         → pihole.home.lab        [Healthy]         │
│  🏠 Home Assistant  → hass.home.lab          [Healthy]         │
│  📄 Paperless       → docs.home.lab          [Healthy]         │
│  📊 Grafana         → grafana.home.lab       [Healthy]         │
│                                                                 │
│  Cluster Status: 3/3 nodes healthy                              │
│  Storage: 234 GB used / 1 TB available                          │
│  Network: 50 Mbps avg throughput                                │
└─────────────────────────────────────────────────────────────────┘

Implementation Hints:

Jellyfin deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jellyfin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jellyfin
  template:
    metadata:
      labels:
        app: jellyfin
    spec:
      containers:
      - name: jellyfin
        image: jellyfin/jellyfin:latest
        ports:
        - containerPort: 8096
        volumeMounts:
        - name: config
          mountPath: /config
        - name: media
          mountPath: /media
        resources:
          limits:
            memory: "4Gi"
            cpu: "2"
      volumes:
      - name: config
        persistentVolumeClaim:
          claimName: jellyfin-config
      - name: media
        nfs:
          server: 192.168.1.50
          path: /media
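
The Deployment covers scheduling and storage, but the jellyfin.home.lab entry from the dashboard also needs a Service and an Ingress. A minimal sketch, assuming the ingress controller and internal DNS from Project 6 (hostname and service port are illustrative):

apiVersion: v1
kind: Service
metadata:
  name: jellyfin
spec:
  selector:
    app: jellyfin
  ports:
  - port: 80
    targetPort: 8096       # Jellyfin's web port from the Deployment
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jellyfin
spec:
  rules:
  - host: jellyfin.home.lab
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: jellyfin
            port:
              number: 80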

Learning milestones:

  1. First app deployed and accessible → You understand the pattern
  2. 5+ apps running smoothly → You understand resource management
  3. All apps backed up → You understand data protection
  4. Family uses your services → You’ve built real infrastructure

Project Comparison Table

Project                    Difficulty     Time          Depth of Understanding   Fun Factor
1. Docker Single Node      Beginner       Weekend       ⭐⭐⭐                      ⭐⭐⭐
2. Docker Swarm            Intermediate   Weekend-1wk   ⭐⭐⭐⭐                     ⭐⭐⭐⭐
3. K3s Kubernetes          Advanced       1-2 weeks     ⭐⭐⭐⭐⭐                    ⭐⭐⭐⭐⭐
4. Longhorn Storage        Advanced       1 week        ⭐⭐⭐⭐                     ⭐⭐⭐
5. MetalLB                 Intermediate   Half day      ⭐⭐⭐                      ⭐⭐⭐
6. Ingress + TLS           Intermediate   1 week        ⭐⭐⭐⭐                     ⭐⭐⭐⭐
7. Prometheus + Grafana    Intermediate   1 week        ⭐⭐⭐⭐                     ⭐⭐⭐⭐⭐
8. GitOps with ArgoCD      Advanced       1-2 weeks     ⭐⭐⭐⭐⭐                    ⭐⭐⭐⭐⭐
9. Proxmox                 Advanced       1-2 weeks     ⭐⭐⭐⭐⭐                    ⭐⭐⭐⭐
10. VLANs                  Advanced       1-2 weeks     ⭐⭐⭐⭐                     ⭐⭐⭐
11. Tailscale              Beginner       Half day      ⭐⭐⭐                      ⭐⭐⭐⭐⭐
12. HA Kubernetes          Expert         1-2 weeks     ⭐⭐⭐⭐⭐                    ⭐⭐⭐⭐
13. CI/CD                  Intermediate   1 week        ⭐⭐⭐⭐                     ⭐⭐⭐⭐
14. Backup & DR            Advanced       1-2 weeks     ⭐⭐⭐⭐⭐                    ⭐⭐⭐
15. Self-Hosted Apps       Intermediate   Ongoing       ⭐⭐⭐⭐                     ⭐⭐⭐⭐⭐

If you’re starting from scratch:

  1. Project 1: Docker basics → Get comfortable with containers
  2. Project 2: Docker Swarm → Your first real cluster
  3. Project 11: Tailscale → Easy remote access
  4. Project 3: K3s Kubernetes → The main event
  5. Project 5: MetalLB → Real IPs for services
  6. Project 6: Ingress + TLS → HTTPS for everything
  7. Project 7: Monitoring → See what’s happening
  8. Continue with storage, GitOps, etc.

If you want the “serious infrastructure” path:

  1. Project 9: Proxmox → Virtualization foundation
  2. Project 10: VLANs → Proper network segmentation
  3. Project 3: K3s (on Proxmox VMs)
  4. Project 12: HA Kubernetes → Production-grade
  5. Project 4: Longhorn → Distributed storage
  6. Project 8: GitOps → Infrastructure as code
  7. Project 14: Backup & DR → Protect everything

If you just want to self-host apps quickly:

  1. Project 1: Docker basics
  2. Project 3: K3s (single node is fine)
  3. Project 6: Ingress
  4. Project 15: Self-hosted apps → Start deploying!
  5. Project 11: Tailscale → Access from anywhere

Final Capstone Project: Production-Grade Homelab

  • File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
  • Main Programming Language: YAML / Terraform / Ansible
  • Alternative Programming Languages: Go, Python
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 5: Master
  • Knowledge Area: Full-Stack Infrastructure
  • Software or Tool: All of the above
  • Main Book: “Site Reliability Engineering” by Google

What you’ll build: A complete, production-grade homelab that combines everything:

  • Infrastructure: Proxmox cluster with HA
  • Orchestration: HA Kubernetes (K3s) on Proxmox VMs
  • Storage: Longhorn or Ceph for distributed storage
  • Networking: VLANs, MetalLB, Ingress, TLS
  • Security: Tailscale, proper firewall rules, secrets management
  • Observability: Prometheus, Grafana, Loki for logs
  • GitOps: ArgoCD managing all deployments
  • CI/CD: Self-hosted runners building and deploying
  • Backup: Velero + Proxmox Backup Server
  • Applications: Full suite of self-hosted services

Why this is the ultimate homelab project: This is a mini cloud platform. Companies pay millions for infrastructure like this. You’ll have something that rivals a small startup’s infrastructure, running in your home, managed as code, and fully automated.

Real world outcome:

Your Infrastructure as Code Repository:

homelab-infrastructure/
├── terraform/
│   └── proxmox/              # VM provisioning
├── ansible/
│   └── playbooks/            # OS configuration
├── kubernetes/
│   ├── infrastructure/       # MetalLB, cert-manager, etc.
│   ├── monitoring/           # Prometheus stack
│   ├── applications/         # All your apps
│   └── argocd/              # ArgoCD itself
├── docs/
│   ├── architecture.md
│   └── runbooks/             # Operational procedures
└── README.md

$ git push origin main
# → Terraform provisions VMs
# → Ansible configures nodes
# → K3s cluster forms
# → ArgoCD syncs all applications
# → Full stack running in 30 minutes from bare metal
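
The glue that turns a git push into a running stack is typically an ArgoCD "app of apps" pointed at the kubernetes/ directory. A minimal sketch (the repository URL is a placeholder for your own repo):

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: homelab
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/USERNAME/homelab-infrastructure   # placeholder
    targetRevision: main
    path: kubernetes                # the directory shown in the tree above
  destination:
    server: https://kubernetes.default.svc
    namespace: default
  syncPolicy:
    automated:
      prune: true                   # remove resources deleted from Git
      selfHeal: true                # revert manual drift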

Summary

#       Project                               Main Language
1       Docker on Single Node                 Bash / YAML
2       Docker Swarm Cluster                  Bash / YAML
3       K3s Kubernetes Cluster                YAML / Bash
4       Distributed Storage (Longhorn)        YAML / Bash
5       Load Balancer (MetalLB)               YAML
6       Ingress + TLS Certificates            YAML
7       Monitoring (Prometheus + Grafana)     YAML / PromQL
8       GitOps (ArgoCD)                       YAML / Git
9       Proxmox Virtualization                Bash / Web UI
10      VLANs and Network Segmentation        Network config
11      Tailscale VPN                         Configuration
12      HA Kubernetes (Multi-Master)          YAML / Bash
13      CI/CD Pipeline                        YAML / Docker
14      Backup and Disaster Recovery          YAML / Bash
15      Self-Hosted Applications              YAML
Final   Production-Grade Homelab (Capstone)   All of the above

Key Resources Referenced

Books

  • “Docker Deep Dive” by Nigel Poulton
  • “Kubernetes: Up and Running” by Brendan Burns et al.
  • “Prometheus: Up & Running” by Brian Brazil
  • “GitOps and Kubernetes” by Billy Yuen
  • “Computer Networks” by Andrew Tanenbaum
  • “Continuous Delivery” by Jez Humble
  • “Site Reliability Engineering” by Google

Hardware