Learn Home Clusters: From Single Server to Distributed Infrastructure
Goal: Deeply understand home clusters (homelabs)—what they are, why they’re useful, how they differ from single servers, and how to build increasingly sophisticated distributed systems at home. You’ll progress from running your first container to building a production-grade Kubernetes cluster with GitOps, monitoring, and self-healing.
Why Home Clusters Matter
A home cluster (or homelab) is a collection of computers in your home that work together as a unified system. Instead of one big server doing everything, you have multiple machines sharing the load, providing redundancy, and teaching you the skills that power modern cloud infrastructure.
After completing these projects, you will:
- Understand the difference between a single server and a cluster
- Know when to use VMs vs. containers vs. orchestration
- Build and manage your own Kubernetes cluster
- Implement distributed storage that survives disk failures
- Set up monitoring, alerting, and self-healing infrastructure
- Deploy applications using GitOps (infrastructure as code)
- Understand networking concepts like VLANs, load balancing, and VPNs
- Have skills directly transferable to cloud platforms (AWS, GCP, Azure)
Core Concept Analysis
What is a Cluster?
Single Server Cluster
┌─────────────────┐ ┌─────────────────┐
│ │ │ Node 1 │
│ All services │ │ (control plane)│
│ All storage │ └────────┬────────┘
│ Single point │ │
│ of failure │ ┌────────┴────────┐
│ │ │ │
└─────────────────┘ ┌────────┴──────┐ ┌───────┴───────┐
│ Node 2 │ │ Node 3 │
│ (worker) │ │ (worker) │
└───────────────┘ └───────────────┘
If server dies, If node dies, workloads move
everything is gone to remaining nodes automatically
The Stack: From Hardware to Applications
┌─────────────────────────────────────────────────────────────────────────┐
│ YOUR APPLICATIONS │
│ (Websites, Databases, Media Servers, Home Automation) │
├─────────────────────────────────────────────────────────────────────────┤
│ CONTAINER ORCHESTRATION │
│ (Kubernetes / K3s / Docker Swarm / Nomad) │
│ Schedules containers, handles scaling, self-healing │
├─────────────────────────────────────────────────────────────────────────┤
│ CONTAINER RUNTIME │
│ (Docker / containerd / Podman) │
│ Runs isolated containers from images │
├─────────────────────────────────────────────────────────────────────────┤
│ VIRTUALIZATION (optional) │
│ (Proxmox / VMware / Hyper-V) │
│ VMs provide isolation and easy snapshotting │
├─────────────────────────────────────────────────────────────────────────┤
│ OPERATING SYSTEM │
│ (Ubuntu Server / Debian / Fedora / Talos) │
├─────────────────────────────────────────────────────────────────────────┤
│ HARDWARE │
│ (Raspberry Pi / Intel N100 Mini PC / Refurbished Dell Optiplex) │
└─────────────────────────────────────────────────────────────────────────┘
Why Build a Home Cluster?
| Reason | Description |
|---|---|
| Learning | Hands-on experience with cloud-native technologies without cloud costs |
| Career | Kubernetes, Docker, and DevOps skills are highly sought after |
| Self-hosting | Run your own services (media, email, storage) with privacy |
| High availability | Services survive individual machine failures |
| Cost savings | Learn on $200-500 hardware vs. $100s/month in cloud bills |
| Experimentation | Break things freely—rebuild in minutes with automation |
Cluster Types Compared
| Type | Complexity | Use Case | Learning Value |
|---|---|---|---|
| Docker Compose | ★☆☆☆☆ | Single server, multiple containers | Container basics |
| Docker Swarm | ★★☆☆☆ | Simple multi-node clustering | HA basics, easy setup |
| K3s / MicroK8s | ★★★☆☆ | Lightweight Kubernetes | Production K8s skills |
| Full Kubernetes | ★★★★☆ | Complete K8s experience | Enterprise-grade skills |
| Proxmox + K8s | ★★★★★ | VMs + Containers | Full infrastructure |
Proxmox vs Kubernetes: When to Use What
┌─────────────────────────────────────────────────────────────────┐
│ PROXMOX │
│ (Virtualization Layer) │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ VM 1 │ │ VM 2 │ │ VM 3 │ │ VM 4 │ │
│ │ K8s │ │ K8s │ │ K8s │ │ Windows │ │
│ │ Control │ │ Worker │ │ Worker │ │ for │ │
│ │ Plane │ │ Node 1 │ │ Node 2 │ │ Testing │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
└─────────────────────────────────────────────────────────────────┘
Proxmox handles: Kubernetes handles:
- VM lifecycle - Container scheduling
- Snapshots & backups - Auto-scaling
- Resource allocation - Service discovery
- HA for VMs - Rolling deployments
- Storage management - Self-healing containers
Best practice: Run Kubernetes inside Proxmox VMs. You get the best of both worlds—VM-level isolation and easy recovery (snapshots) with container-level efficiency and orchestration.
Hardware Options for Home Clusters
Budget Recommendations (2025)
| Option | Cost | Nodes | Best For |
|---|---|---|---|
| Raspberry Pi 5 (4GB) | ~$60 each | 3-5 | Learning, low power, ARM experience |
| Intel N100 Mini PCs | ~$150 each | 3-4 | Best value, 6W TDP, silent |
| Refurbished Optiplex | ~$100 each | 3-4 | More RAM/CPU, slightly loud |
| Minisforum MS-01 | ~$600 each | 2-3 | 10GbE networking, serious clusters |
Starter Cluster: 3-Node Raspberry Pi
Total Cost: ~$200-250
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Pi 5 (8GB) │ │ Pi 5 (4GB) │ │ Pi 5 (4GB) │
│ Control Plane │ │ Worker 1 │ │ Worker 2 │
│ │ │ │ │ │
│ + 64GB SD │ │ + 64GB SD │ │ + 64GB SD │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
└───────────────────────┼───────────────────────┘
│
┌───────┴───────┐
│ Gigabit │
│ Switch │
└───────────────┘
Recommended Cluster: 3-Node Intel N100
Total Cost: ~$450-600
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ N100 Mini PC │ │ N100 Mini PC │ │ N100 Mini PC │
│ 16GB RAM │ │ 16GB RAM │ │ 16GB RAM │
│ 512GB NVMe │ │ 512GB NVMe │ │ 512GB NVMe │
│ 2x 2.5GbE │ │ 2x 2.5GbE │ │ 2x 2.5GbE │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
└───────────────────────┼───────────────────────┘
│
┌───────┴───────┐
│ 2.5GbE │
│ Switch │
└───────────────┘
Advantages:
- x86 architecture (more compatible software)
- 16-32GB RAM per node
- NVMe storage (fast)
- Low power (~6W idle)
- Silent operation
Project List
Projects are ordered from fundamental understanding to advanced implementations.
Project 1: Docker on a Single Node (The Foundation)
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: Bash / YAML
- Alternative Programming Languages: Python (for automation)
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 1: Beginner
- Knowledge Area: Containers / Docker
- Software or Tool: Docker, Docker Compose
- Main Book: “Docker Deep Dive” by Nigel Poulton
What you’ll build: A single server running multiple containerized applications (web server, database, reverse proxy) using Docker and Docker Compose. This is the foundation for all cluster work.
Why it teaches home clusters: Before you can orchestrate containers across multiple machines, you need to understand what containers are and how they work on a single machine. Docker Compose is the “single-node orchestrator” that introduces you to declarative configuration.
Core challenges you’ll face:
- Container networking → maps to how containers talk to each other
- Volume management → maps to persistent data in ephemeral containers
- Image building → maps to creating custom container images
- Compose file syntax → maps to declarative infrastructure
Key Concepts:
- Container vs VM: “Docker Deep Dive” Chapter 4 - Nigel Poulton
- Docker Networking: “Docker Deep Dive” Chapter 11
- Docker Volumes: “Docker Deep Dive” Chapter 10
- Compose File Reference: Docker official documentation
Difficulty: Beginner Time estimate: Weekend Prerequisites: Basic Linux command line, understanding of web servers. No prior container experience needed.
Real world outcome:
$ docker compose up -d
[+] Running 4/4
✔ Network homelab_default Created
✔ Container homelab-db-1 Started
✔ Container homelab-app-1 Started
✔ Container homelab-nginx-1 Started
$ curl http://localhost
Welcome to my homelab!
$ docker compose ps
NAME STATUS PORTS
homelab-db-1 running 5432/tcp
homelab-app-1 running 8080/tcp
homelab-nginx-1 running 0.0.0.0:80->80/tcp
Implementation Hints:
Docker Compose file structure:
# docker-compose.yml
version: '3.8'
services:
nginx:
image: nginx:alpine
ports:
- "80:80"
volumes:
- ./nginx.conf:/etc/nginx/nginx.conf:ro
depends_on:
- app
app:
build: ./app
environment:
- DATABASE_URL=postgres://db:5432/myapp
db:
image: postgres:15
volumes:
- postgres_data:/var/lib/postgresql/data
environment:
- POSTGRES_PASSWORD=secret
volumes:
postgres_data:
Questions to answer as you build:
- What happens to data when a container restarts?
- How do containers find each other by name?
- Why use Alpine-based images?
- What’s the difference between build: and image:?
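A quick way to answer the first two questions empirically, as a minimal sketch (container and volume names here are illustrative; the Compose network name matches the output shown above):
# Named volumes outlive the containers that use them
docker run -d --name pg1 -v pgdata:/var/lib/postgresql/data -e POSTGRES_PASSWORD=secret postgres:15
docker rm -f pg1                                        # container gone...
docker run -d --name pg2 -v pgdata:/var/lib/postgresql/data -e POSTGRES_PASSWORD=secret postgres:15
docker exec pg2 ls /var/lib/postgresql/data             # ...but the data survived in the volume
# Containers on the same Compose network resolve each other by service name via Docker's DNS
docker run --rm --network homelab_default alpine:3 nslookup db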
Learning milestones:
- “Hello World” container runs → You understand container basics
- Multi-container app works → You understand networking
- Data persists across restarts → You understand volumes
- You can rebuild from scratch in one command → You understand declarative config
Project 2: Docker Swarm Cluster (Your First Real Cluster)
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: Bash / YAML
- Alternative Programming Languages: Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Container Orchestration
- Software or Tool: Docker Swarm, Portainer
- Main Book: “Docker Deep Dive” by Nigel Poulton
What you’ll build: A 3-node Docker Swarm cluster where containers can run on any node, services are load-balanced automatically, and if a node dies, containers are rescheduled to surviving nodes.
Why it teaches home clusters: Docker Swarm is the gentlest introduction to container orchestration. One command (docker swarm init) and you have a cluster. It teaches the core concepts—managers, workers, services, replicas—without Kubernetes complexity.
Core challenges you’ll face:
- Manager vs worker nodes → maps to control plane vs data plane
- Service replication → maps to high availability
- Overlay networking → maps to cross-node container communication
- Stack deployments → maps to multi-service applications
Key Concepts:
- Swarm Mode: “Docker Deep Dive” Chapter 14 - Nigel Poulton
- Overlay Networks: “Docker Deep Dive” Chapter 11
- Service Discovery: Built into Swarm via DNS
- Rolling Updates: “Docker Deep Dive” Chapter 14
Difficulty: Intermediate Time estimate: Weekend to 1 week Prerequisites: Project 1 (Docker basics). 2-3 machines (physical or VMs) with Docker installed.
Real world outcome:
# On node 1 (manager)
$ docker swarm init --advertise-addr 192.168.1.10
Swarm initialized: current node is now a manager.
To add a worker: docker swarm join --token SWMTKN-xxx 192.168.1.10:2377
# On nodes 2 and 3
$ docker swarm join --token SWMTKN-xxx 192.168.1.10:2377
This node joined a swarm as a worker.
# Back on manager - deploy a replicated service
$ docker service create --name web --replicas 3 -p 80:80 nginx
3/3: Running
$ docker service ps web
ID NAME NODE STATE
abc123 web.1 node1 Running
def456 web.2 node2 Running
ghi789 web.3 node3 Running
# Kill node3, watch containers reschedule
$ docker service ps web
ID NAME NODE STATE
abc123 web.1 node1 Running
def456 web.2 node2 Running
jkl012 web.3 node1 Running # Rescheduled!
Implementation Hints:
Swarm stack file (similar to Compose, but for Swarm):
# stack.yml
version: '3.8'
services:
web:
image: nginx:alpine
deploy:
replicas: 3
update_config:
parallelism: 1
delay: 10s
restart_policy:
condition: on-failure
ports:
- "80:80"
networks:
- webnet
visualizer:
image: dockersamples/visualizer
ports:
- "8080:8080"
volumes:
- /var/run/docker.sock:/var/run/docker.sock
deploy:
placement:
constraints:
- node.role == manager
networks:
webnet:
driver: overlay
Deploy and manage:
$ docker stack deploy -c stack.yml myapp
$ docker stack services myapp
$ docker stack ps myapp
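One way to rehearse the node-failure scenario without pulling a power cord is to drain a node; a short sketch using standard Swarm commands:
# Drain node3: Swarm stops scheduling there and reschedules its replicas elsewhere
docker node update --availability drain node3
docker service ps web                              # web.3 now runs on a surviving node
docker node update --availability active node3     # bring the node back into rotation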
Learning milestones:
- 3-node cluster initialized → You understand cluster formation
- Service runs across nodes → You understand scheduling
- Service survives node failure → You understand high availability
- Rolling update works → You understand zero-downtime deployments
Project 3: K3s Kubernetes Cluster (Lightweight Production-Grade)
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: YAML / Bash
- Alternative Programming Languages: Go, Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 1. The “Resume Gold”
- Difficulty: Level 3: Advanced
- Knowledge Area: Kubernetes / Container Orchestration
- Software or Tool: K3s, kubectl, Helm
- Main Book: “Kubernetes Up and Running” by Brendan Burns
What you’ll build: A production-ready Kubernetes cluster using K3s—a lightweight K8s distribution perfect for edge and home labs. You’ll deploy applications, understand pods, services, deployments, and ingress.
Why it teaches home clusters: Kubernetes is the industry standard for container orchestration. K3s gives you real Kubernetes without the resource overhead. Skills learned here transfer directly to EKS, GKE, AKS, and enterprise Kubernetes.
Core challenges you’ll face:
- Pod scheduling and lifecycle → maps to fundamental K8s unit
- Services and networking → maps to how pods communicate
- Deployments and ReplicaSets → maps to declarative application management
- Ingress and load balancing → maps to external access
Key Concepts:
- Kubernetes Architecture: “Kubernetes Up and Running” Chapter 3 - Burns et al.
- Pods and Containers: “Kubernetes Up and Running” Chapter 5
- Services: “Kubernetes Up and Running” Chapter 7
- Deployments: “Kubernetes Up and Running” Chapter 9
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 2 (orchestration basics), Linux administration, networking fundamentals. 3 machines recommended.
Real world outcome:
# On control plane node
$ curl -sfL https://get.k3s.io | sh -
# Get join token
$ cat /var/lib/rancher/k3s/server/node-token
# On worker nodes
$ curl -sfL https://get.k3s.io | K3S_URL=https://192.168.1.10:6443 \
K3S_TOKEN=<token> sh -
# Check cluster
$ kubectl get nodes
NAME STATUS ROLES AGE
node1 Ready control-plane,master 5m
node2 Ready <none> 2m
node3 Ready <none> 2m
# Deploy an application
$ kubectl create deployment nginx --image=nginx --replicas=3
$ kubectl expose deployment nginx --port=80 --type=LoadBalancer
$ kubectl get pods -o wide
NAME READY STATUS NODE
nginx-abc123-xxx 1/1 Running node1
nginx-abc123-yyy 1/1 Running node2
nginx-abc123-zzz 1/1 Running node3
Implementation Hints:
Kubernetes manifest structure:
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapp
spec:
replicas: 3
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
containers:
- name: myapp
image: myapp:latest
ports:
- containerPort: 8080
resources:
limits:
memory: "128Mi"
cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
name: myapp
spec:
selector:
app: myapp
ports:
- port: 80
targetPort: 8080
type: ClusterIP
Essential kubectl commands:
kubectl get pods,svc,deploy -A # See everything
kubectl describe pod <name> # Debug a pod
kubectl logs <pod> -f # Stream logs
kubectl exec -it <pod> -- /bin/sh # Shell into pod
kubectl apply -f manifest.yaml # Apply config
kubectl delete -f manifest.yaml # Delete resources
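It’s worth running kubectl from your workstation instead of SSH-ing into a node every time. A minimal sketch, assuming the control plane IP from the example above and passwordless sudo on that node (K3s writes its kubeconfig to a fixed path):
mkdir -p ~/.kube
ssh user@192.168.1.10 "sudo cat /etc/rancher/k3s/k3s.yaml" > ~/.kube/config
sed -i 's/127.0.0.1/192.168.1.10/' ~/.kube/config   # point the kubeconfig at the node's LAN IP
chmod 600 ~/.kube/config
kubectl get nodes                                   # now works from your laptop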
Learning milestones:
- Cluster formed, kubectl works → You understand K8s architecture
- Deployment scales up/down → You understand workload management
- Service discovery works → You understand K8s networking
- Ingress routes external traffic → You understand edge access
Project 4: Distributed Storage with Longhorn
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: YAML / Bash
- Alternative Programming Languages: Go
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Distributed Storage / Kubernetes
- Software or Tool: Longhorn, K3s
- Main Book: “Kubernetes Up and Running” by Brendan Burns
What you’ll build: A distributed block storage system that replicates data across cluster nodes. When a node dies, your data survives. Databases, file storage, and stateful apps all get persistent volumes that are automatically replicated.
Why it teaches home clusters: Single points of failure are the enemy of clusters. Local disk on one node isn’t “clustered storage.” Longhorn teaches you how real distributed storage works—replication, consistency, and failure recovery.
Core challenges you’ll face:
- Persistent Volume Claims (PVCs) → maps to requesting storage in Kubernetes
- Storage Classes → maps to different storage tiers/policies
- Replication factor → maps to data redundancy
- Backup and restore → maps to disaster recovery
Key Concepts:
- Kubernetes Storage: “Kubernetes Up and Running” Chapter 13 - Burns et al.
- PV/PVC/StorageClass: Kubernetes official documentation
- Distributed Block Storage: Longhorn architecture documentation
- iSCSI: How Longhorn exposes volumes
Difficulty: Advanced Time estimate: 1 week Prerequisites: Project 3 (K3s cluster running). Each node needs some local disk space.
Real world outcome:
# Install Longhorn
$ kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml
# Wait for Longhorn to be ready
$ kubectl -n longhorn-system get pods
NAME READY STATUS
longhorn-manager-xxxxx 1/1 Running
longhorn-driver-deployer-xxxxx 1/1 Running
# Access Longhorn UI
$ kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80
# Create a replicated volume for PostgreSQL
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: postgres-data
spec:
accessModes:
- ReadWriteOnce
storageClassName: longhorn
resources:
requests:
storage: 10Gi
EOF
# Longhorn dashboard shows:
# - Volume: postgres-data
# - Replicas: 3 (one on each node)
# - Status: Healthy
# Kill a node, watch Longhorn rebuild replica on surviving nodes
Implementation Hints:
Longhorn StorageClass configuration:
# storageclass.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: longhorn-fast
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
numberOfReplicas: "3"
staleReplicaTimeout: "2880" # 48 hours
fromBackup: ""
dataLocality: "best-effort"
Using storage in a deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: postgres
spec:
replicas: 1
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_PASSWORD
          value: secret   # the postgres image refuses to start without this; use a Secret in practice
volumeMounts:
- name: data
mountPath: /var/lib/postgresql/data
volumes:
- name: data
persistentVolumeClaim:
claimName: postgres-data
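A simple way to prove the “data survives pod restart” milestone, sketched against the postgres Deployment and PVC above (pod selection is by label; the marker file is illustrative):
# Write a marker file into the Longhorn-backed volume via the running pod
kubectl exec deploy/postgres -- sh -c 'echo survived > /var/lib/postgresql/data/marker.txt'
# Delete the pod; the Deployment recreates it and Longhorn re-attaches the same volume
kubectl delete pod -l app=postgres
kubectl wait --for=condition=ready pod -l app=postgres --timeout=120s
kubectl exec deploy/postgres -- cat /var/lib/postgresql/data/marker.txt   # still there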
Learning milestones:
- Longhorn installed and healthy → You understand CSI drivers
- PVC provisions automatically → You understand dynamic provisioning
- Data survives pod restart → You understand persistent storage
- Data survives node failure → You understand distributed replication
Project 5: Load Balancer with MetalLB
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: YAML
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Networking / Kubernetes
- Software or Tool: MetalLB, K3s
- Main Book: “Networking and Kubernetes” by James Strong
What you’ll build: A bare-metal load balancer that gives your Kubernetes services external IP addresses—just like cloud LoadBalancers do. No more NodePort hacks; services get real IPs from your home network.
Why it teaches home clusters: In the cloud, type: LoadBalancer magically gets an IP. On bare metal, you need MetalLB. Understanding how this works demystifies a core Kubernetes networking concept and teaches you about Layer 2/BGP networking.
Core challenges you’ll face:
- IP address pool management → maps to allocating IPs from your network
- Layer 2 vs BGP mode → maps to how addresses are advertised
- Service type LoadBalancer → maps to external access patterns
- ARP/NDP → maps to how Layer 2 mode works
Key Concepts:
- Kubernetes Service Types: “Kubernetes Up and Running” Chapter 7
- Layer 2 Networking: ARP (Address Resolution Protocol)
- BGP: Border Gateway Protocol basics
- Load Balancing Algorithms: Round-robin, least connections
Difficulty: Intermediate Time estimate: Half a day Prerequisites: Project 3 (K3s cluster), basic networking knowledge (IP addresses, subnets).
Real world outcome:
# Install MetalLB
$ kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/main/config/manifests/metallb-native.yaml
# Configure IP pool (use unused IPs from your LAN)
$ cat <<EOF | kubectl apply -f -
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: homelab-pool
namespace: metallb-system
spec:
addresses:
- 192.168.1.200-192.168.1.220
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
name: homelab-l2
namespace: metallb-system
EOF
# Create a LoadBalancer service
$ kubectl expose deployment nginx --type=LoadBalancer --port=80
$ kubectl get svc nginx
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S)
nginx LoadBalancer 10.43.12.34 192.168.1.200 80:31234/TCP
# Access from any machine on your network!
$ curl http://192.168.1.200
Welcome to nginx!
Implementation Hints:
MetalLB configuration options:
# Layer 2 mode (simple, no router config needed)
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
name: production
namespace: metallb-system
spec:
addresses:
- 192.168.1.200-192.168.1.220 # Reserve these in your DHCP
---
# BGP mode (for advanced setups with BGP-capable router)
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
name: bgp-advertisement
namespace: metallb-system
spec:
ipAddressPools:
- production
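In Layer 2 mode exactly one node “owns” each external IP and answers ARP for it. A small sketch to see that in action (the service name matches the example above):
# From another machine on the LAN: whose MAC answers for the service IP?
ping -c1 192.168.1.200 >/dev/null && ip neigh show 192.168.1.200
# MetalLB's speaker also records the announcing node as an event on the Service
kubectl describe svc nginx | grep -A3 -i events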
Learning milestones:
- MetalLB pods running → You understand the installation
- Service gets external IP → You understand IP allocation
- External access works → You understand L2 advertisement
- Multiple services get unique IPs → You understand pool management
Project 6: Ingress Controller and TLS Certificates
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: YAML
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Networking / Security
- Software or Tool: Traefik (K3s default) or Nginx Ingress, cert-manager
- Main Book: “Networking and Kubernetes” by James Strong
What you’ll build: An ingress controller that routes traffic based on hostnames (app1.home.lab, app2.home.lab), plus automatic TLS certificate management so everything is HTTPS.
Why it teaches home clusters: Real applications need domain-based routing and HTTPS. The Ingress resource is how Kubernetes handles this. cert-manager automates the tedious certificate renewal process—essential for production.
Core challenges you’ll face:
- Ingress resource configuration → maps to routing rules
- TLS termination → maps to HTTPS at the edge
- cert-manager and Let’s Encrypt → maps to automated certificates
- Local DNS setup → maps to resolving *.home.lab
Key Concepts:
- Kubernetes Ingress: “Kubernetes Up and Running” Chapter 8
- TLS/SSL Certificates: How HTTPS works
- ACME Protocol: How Let’s Encrypt issues certificates
- DNS Challenge: Proving domain ownership
Difficulty: Intermediate Time estimate: 1 week Prerequisites: Project 3 (K3s cluster), Project 5 (MetalLB). Domain name helpful but not required.
Real world outcome:
# K3s comes with Traefik ingress controller
$ kubectl get pods -n kube-system | grep traefik
traefik-xxxxx 1/1 Running
# Install cert-manager
$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.yaml
# Create Ingress with TLS
$ cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: myapp-ingress
annotations:
cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
tls:
- hosts:
- myapp.example.com
secretName: myapp-tls
rules:
- host: myapp.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: myapp
port:
number: 80
EOF
# Browser shows:
# 🔒 https://myapp.example.com - Certificate valid!
Implementation Hints:
cert-manager ClusterIssuer:
# For production (real certs from Let's Encrypt)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: you@example.com
privateKeySecretRef:
name: letsencrypt-prod
solvers:
- http01:
ingress:
class: traefik
# For testing (self-signed, no rate limits)
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: selfsigned
spec:
selfSigned: {}
Local DNS with Pi-hole or CoreDNS:
# Add to /etc/hosts or Pi-hole Local DNS:
192.168.1.200 app1.home.lab
192.168.1.200 app2.home.lab
192.168.1.200 grafana.home.lab
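Once the Ingress is applied, you can watch cert-manager walk through the ACME flow and then inspect the certificate that is actually served; a sketch using cert-manager’s own resource kinds:
# Certificate -> CertificateRequest -> Order -> Challenge: watch them progress
kubectl get certificate,certificaterequest,order,challenge -A
# When the Certificate reports Ready=True, check what clients will see
curl -vkI https://myapp.example.com 2>&1 | grep -iE 'subject:|issuer:|expire'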
Learning milestones:
- Ingress routes by hostname → You understand L7 routing
- Self-signed TLS works → You understand TLS termination
- Let’s Encrypt cert auto-issued → You understand ACME
- Multiple apps on one IP → You understand virtual hosting
Project 7: Monitoring Stack (Prometheus + Grafana)
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: YAML / PromQL
- Alternative Programming Languages: Python (exporters)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: Observability / Monitoring
- Software or Tool: Prometheus, Grafana, kube-prometheus-stack
- Main Book: “Prometheus Up and Running” by Brian Brazil
What you’ll build: A complete monitoring stack that collects metrics from all cluster nodes and applications, visualizes them in beautiful dashboards, and can alert you when something goes wrong.
Why it teaches home clusters: You can’t manage what you can’t measure. Prometheus is the standard for Kubernetes monitoring, and fluency with PromQL (the Prometheus Query Language) and Grafana dashboards is an essential DevOps skill.
Core challenges you’ll face:
- Prometheus scraping → maps to metric collection
- PromQL queries → maps to analyzing time series data
- Grafana dashboards → maps to visualization
- Alertmanager → maps to notification on issues
Key Concepts:
- Prometheus Architecture: “Prometheus Up and Running” Chapter 2 - Brian Brazil
- PromQL Basics: “Prometheus Up and Running” Chapter 4
- Grafana Dashboards: Grafana documentation
- Kubernetes Metrics: kube-state-metrics, node-exporter
Difficulty: Intermediate Time estimate: 1 week Prerequisites: Project 3 (K3s cluster). Understanding of metrics concepts helpful.
Real world outcome:
# Install kube-prometheus-stack via Helm
$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm install monitoring prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace
# Access Grafana
$ kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80
# Open http://localhost:3000 (admin/prom-operator)
# Grafana shows:
# - Cluster CPU/Memory usage: 45% / 62%
# - Pod count: 47 running, 0 pending
# - Node status: 3/3 healthy
# - Network I/O: 150 Mbps in, 50 Mbps out
# - Disk usage: 234 GB / 512 GB
# Example PromQL queries:
# CPU usage by pod: sum(rate(container_cpu_usage_seconds_total[5m])) by (pod)
# Memory by namespace: sum(container_memory_usage_bytes) by (namespace)
Implementation Hints:
kube-prometheus-stack values customization:
# values.yaml
grafana:
adminPassword: your-secure-password
ingress:
enabled: true
hosts:
- grafana.home.lab
prometheus:
prometheusSpec:
retention: 30d
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: longhorn
resources:
requests:
storage: 50Gi
alertmanager:
config:
receivers:
- name: 'slack'
slack_configs:
- api_url: 'https://hooks.slack.com/xxx'
channel: '#alerts'
Essential PromQL queries:
# Node CPU usage
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
# Pod memory usage
sum(container_memory_usage_bytes{pod!=""}) by (pod, namespace)
# HTTP request rate
sum(rate(http_requests_total[5m])) by (service)
# Error rate
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))
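To make the last milestone concrete, here is a minimal alert rule, as a sketch assuming the Helm release name monitoring used above (by default, kube-prometheus-stack only loads PrometheusRules labelled with the release name):
cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: homelab-node-down
  namespace: monitoring
  labels:
    release: monitoring        # must match the Helm release for the default ruleSelector
spec:
  groups:
  - name: homelab.rules
    rules:
    - alert: NodeExporterDown
      expr: up{job="node-exporter"} == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "node-exporter on {{ $labels.instance }} has been unreachable for 5 minutes"
EOF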
Learning milestones:
- Prometheus scrapes metrics → You understand metric collection
- Grafana dashboard shows cluster health → You understand visualization
- Custom PromQL query works → You understand querying
- Alert fires and notifies you → You understand alerting
Project 8: GitOps with ArgoCD
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: YAML / Git
- Alternative Programming Languages: Kustomize, Helm
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: GitOps / CI/CD
- Software or Tool: ArgoCD, GitHub/GitLab
- Main Book: “GitOps and Kubernetes” by Billy Yuen
What you’ll build: A GitOps pipeline where your cluster state is defined in Git. Push a change to your repository, and ArgoCD automatically syncs it to your cluster. No more kubectl apply—everything is version-controlled and auditable.
Why it teaches home clusters: GitOps is how modern teams manage infrastructure. “Git as the single source of truth” means you can rebuild your entire cluster from a repository. It’s declarative, auditable, and enables easy rollbacks.
Core challenges you’ll face:
- Application custom resources → maps to defining apps in ArgoCD
- Sync policies → maps to automatic vs manual deployment
- Helm/Kustomize integration → maps to templating and overlays
- RBAC and multi-tenancy → maps to who can deploy what
Key Concepts:
- GitOps Principles: “GitOps and Kubernetes” Chapter 1 - Billy Yuen
- ArgoCD Applications: ArgoCD documentation
- Kustomize: Kubernetes-native templating
- Helm Charts: Package manager for Kubernetes
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 3 (K3s cluster), Git proficiency, understanding of Kubernetes manifests.
Real world outcome:
# Install ArgoCD
$ kubectl create namespace argocd
$ kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Access ArgoCD UI
$ kubectl port-forward svc/argocd-server -n argocd 8080:443
# Get initial password: kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
# Create an Application pointing to your Git repo
$ cat <<EOF | kubectl apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: my-homelab
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/yourusername/homelab-gitops.git
targetRevision: HEAD
path: apps
destination:
server: https://kubernetes.default.svc
namespace: default
syncPolicy:
automated:
prune: true
selfHeal: true
EOF
# Now: git push to your repo → ArgoCD syncs automatically!
# ArgoCD UI shows:
# - App Status: Synced ✓
# - Health: Healthy ✓
# - Last Sync: 30 seconds ago
Implementation Hints:
Repository structure for GitOps:
homelab-gitops/
├── apps/ # Application definitions
│ ├── nginx/
│ │ ├── deployment.yaml
│ │ ├── service.yaml
│ │ └── ingress.yaml
│ ├── postgres/
│ │ ├── deployment.yaml
│ │ ├── service.yaml
│ │ └── pvc.yaml
│ └── kustomization.yaml
├── infrastructure/ # Cluster infrastructure
│ ├── metallb/
│ ├── cert-manager/
│ └── monitoring/
└── README.md
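The apps/kustomization.yaml in the tree above simply lists the manifests ArgoCD should apply; a minimal sketch (paths are the hypothetical ones from the tree):
cat > apps/kustomization.yaml <<'EOF'
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - nginx/deployment.yaml
  - nginx/service.yaml
  - nginx/ingress.yaml
  - postgres/deployment.yaml
  - postgres/service.yaml
  - postgres/pvc.yaml
EOF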
ArgoCD Application with Helm:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: prometheus
namespace: argocd
spec:
source:
repoURL: https://prometheus-community.github.io/helm-charts
chart: kube-prometheus-stack
targetRevision: 45.0.0
helm:
values: |
grafana:
adminPassword: secret
destination:
server: https://kubernetes.default.svc
namespace: monitoring
Resources for key challenges:
- ArgoCD Documentation
- Pi Kubernetes Cluster with FluxCD - Alternative GitOps tool
Learning milestones:
- ArgoCD syncs from Git → You understand GitOps basics
- Auto-sync on git push → You understand continuous deployment
- Helm charts deployed via ArgoCD → You understand templating
- Full cluster rebuildable from Git → You’ve achieved true GitOps
Project 9: Proxmox Virtualization Platform
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: Bash / Web UI
- Alternative Programming Languages: Terraform (for automation)
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Virtualization / Infrastructure
- Software or Tool: Proxmox VE
- Main Book: “Mastering Proxmox” by Wasim Ahmed
What you’ll build: A Proxmox cluster that manages VMs and containers across multiple physical nodes. This becomes the foundation layer for everything else—you can run Kubernetes inside Proxmox VMs for the best of both worlds.
Why it teaches home clusters: Proxmox teaches you virtualization concepts: hypervisors, VM lifecycle, live migration, HA clustering, and storage pools. Running K8s on Proxmox gives you snapshot/backup capabilities and easy recovery.
Core challenges you’ll face:
- Proxmox cluster formation → maps to quorum and consensus
- VM templates → maps to golden images
- Storage configuration → maps to local vs shared storage
- HA groups → maps to automatic VM failover
Key Concepts:
- Type 1 vs Type 2 Hypervisors: Proxmox is Type 1 (bare metal)
- QEMU/KVM: Linux virtualization technology
- LXC Containers: Lightweight OS-level virtualization
- Ceph Integration: Distributed storage in Proxmox
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: 2-3 physical machines (or nested virtualization), basic Linux administration.
Real world outcome:
Proxmox Web UI (https://192.168.1.10:8006)
┌─────────────────────────────────────────────────────────────────┐
│ Datacenter: HomeCluster │
├─────────────────────────────────────────────────────────────────┤
│ Nodes: │
│ ✓ pve1 (192.168.1.10) - 16 CPU, 64GB RAM, 1TB SSD │
│ ✓ pve2 (192.168.1.11) - 16 CPU, 64GB RAM, 1TB SSD │
│ ✓ pve3 (192.168.1.12) - 16 CPU, 64GB RAM, 1TB SSD │
│ │
│ VMs: │
│ k8s-control (pve1) - 4 CPU, 8GB RAM - Running │
│ k8s-worker1 (pve2) - 4 CPU, 8GB RAM - Running │
│ k8s-worker2 (pve3) - 4 CPU, 8GB RAM - Running │
│ windows-test (pve1) - 4 CPU, 16GB RAM - Stopped │
│ │
│ Storage: │
│ local-lvm: 234GB / 500GB │
│ ceph-pool: 1.2TB / 3TB (replicated across nodes) │
└─────────────────────────────────────────────────────────────────┘
Implementation Hints:
Proxmox cluster setup:
# On first node (pve1)
$ pvecm create HomeCluster
# On other nodes (pve2, pve3)
$ pvecm add 192.168.1.10
# Check cluster status
$ pvecm status
Cluster information
-------------------
Name: HomeCluster
Config Version: 3
Transport: knet
Secure auth: on
Membership information
----------------------
Nodeid Votes Name
1 1 pve1 (local)
2 1 pve2
3 1 pve3
Creating a VM template:
# Download cloud image
$ wget https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img
# Create VM
$ qm create 9000 --name ubuntu-template --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0
# Import disk
$ qm importdisk 9000 jammy-server-cloudimg-amd64.img local-lvm
# Attach disk
$ qm set 9000 --scsihw virtio-scsi-pci --scsi0 local-lvm:vm-9000-disk-0
# Convert to template
$ qm template 9000
# Clone from template
$ qm clone 9000 100 --name k8s-control --full
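Clones become far more useful with cloud-init wired up, so each VM boots with your SSH key and a predictable IP. A sketch using standard qm cloud-init options (IP addresses and key path are examples):
# Add a cloud-init drive to the template
qm set 9000 --ide2 local-lvm:cloudinit
qm set 9000 --boot c --bootdisk scsi0 --serial0 socket --vga serial0
# Per-clone settings: user, SSH key, static IP
qm set 100 --ciuser ubuntu --sshkeys ~/.ssh/id_rsa.pub \
  --ipconfig0 ip=192.168.1.20/24,gw=192.168.1.1
qm start 100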
Learning milestones:
- Proxmox cluster formed → You understand cluster quorum
- VM created and runs → You understand virtualization basics
- Template cloning works → You understand golden images
- Live migration works → You understand VM mobility
Project 10: Network Segmentation with VLANs
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: N/A (Network configuration)
- Alternative Programming Languages: Ansible (for automation)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 3: Advanced
- Knowledge Area: Networking / Security
- Software or Tool: Managed switch, pfSense/OPNsense
- Main Book: “Computer Networks” by Tanenbaum
What you’ll build: A properly segmented home network with VLANs separating management traffic, user devices, IoT devices, and cluster traffic. Each VLAN has appropriate firewall rules.
Why it teaches home clusters: Production clusters aren’t on flat networks. VLANs teach you layer 2/3 networking, inter-VLAN routing, and network security. These skills are essential for any infrastructure role.
Core challenges you’ll face:
- VLAN tagging (802.1Q) → maps to how VLANs work on the wire
- Inter-VLAN routing → maps to how VLANs communicate
- Firewall rules → maps to controlling traffic flow
- Trunk vs access ports → maps to switch configuration
Key Concepts:
- VLANs: “Computer Networks” Chapter 4 - Tanenbaum
- 802.1Q Tagging: IEEE standard for VLAN tagging
- Layer 3 Switching: Routing between VLANs
- Firewall Zones: Grouping interfaces by trust level
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Managed switch (VLAN-capable), router/firewall (pfSense/OPNsense). Basic networking knowledge.
Real world outcome:
Network Diagram:
Internet
│
┌────┴────┐
│ OPNsense│ (Router/Firewall)
│ Firewall│
└────┬────┘
│ Trunk (all VLANs)
┌────┴────┐
│ Managed │
│ Switch │
└─┬──┬──┬─┘
│ │ │
┌────────────┘ │ └────────────┐
│ │ │
VLAN 10 VLAN 20 VLAN 30
Management Cluster IoT
──────────── ───────── ──────────
• OPNsense • k8s-ctrl • Cameras
• Proxmox UI • k8s-worker1 • Smart bulbs
• Switch mgmt • k8s-worker2 • Thermostats
• NAS
Firewall Rules:
• VLAN 10 → All VLANs (management access)
• VLAN 20 → Internet, VLAN 30 limited
• VLAN 30 → Internet only (IoT isolated)
Implementation Hints:
OPNsense VLAN configuration:
Interfaces → Other Types → VLAN
VLAN 10: Parent: igb0, Tag: 10, Description: Management
VLAN 20: Parent: igb0, Tag: 20, Description: Cluster
VLAN 30: Parent: igb0, Tag: 30, Description: IoT
Interfaces → Assignments:
OPT1: vlan10 → MGMT (192.168.10.1/24)
OPT2: vlan20 → CLUSTER (192.168.20.1/24)
OPT3: vlan30 → IOT (192.168.30.1/24)
Services → DHCPv4:
MGMT: 192.168.10.100 - 192.168.10.200
CLUSTER: 192.168.20.100 - 192.168.20.200
IOT: 192.168.30.100 - 192.168.30.200
Switch VLAN configuration (example for TP-Link):
Port 1-4: PVID=10, Untagged=10, Tagged=none (Management)
Port 5-12: PVID=20, Untagged=20, Tagged=none (Cluster)
Port 13-20: PVID=30, Untagged=30, Tagged=none (IoT)
Port 24: PVID=1, Untagged=1, Tagged=10,20,30 (Trunk to router)
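To verify a trunk port from the Linux side, you can bring up a tagged sub-interface by hand; a sketch with iproute2 (assumes the node is plugged into a trunk port carrying VLAN 20):
sudo ip link add link eth0 name eth0.20 type vlan id 20   # 802.1Q tag 20 on eth0
sudo ip addr add 192.168.20.50/24 dev eth0.20
sudo ip link set eth0.20 up
ping -c3 192.168.20.1                                      # should reach the VLAN 20 gateway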
Learning milestones:
- VLANs created on switch → You understand 802.1Q
- Devices in VLAN get correct IPs → You understand DHCP per VLAN
- VLANs can communicate via router → You understand inter-VLAN routing
- Firewall blocks unauthorized traffic → You understand security zones
Project 11: Secure Remote Access with Tailscale
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: N/A (Configuration)
- Alternative Programming Languages: N/A
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 1: Beginner
- Knowledge Area: Networking / VPN
- Software or Tool: Tailscale
- Main Book: N/A (Online documentation)
What you’ll build: Secure access to your entire homelab from anywhere in the world using Tailscale’s WireGuard-based mesh VPN. No port forwarding, no dynamic DNS hassles, no exposed services.
Why it teaches home clusters: Remote access is essential for managing your cluster. Tailscale teaches you about WireGuard, mesh networking, and zero-trust security—all without the complexity of setting up your own VPN server.
Core challenges you’ll face:
- Mesh VPN topology → maps to peer-to-peer connections
- Subnet routing → maps to accessing entire home network remotely
- ACLs → maps to who can access what
- Exit nodes → maps to routing all traffic through home
Key Concepts:
- WireGuard Protocol: Modern, fast VPN protocol
- Mesh Networking: Every device connects to every other
- NAT Traversal: Connecting through firewalls
- Zero-Trust: Authenticate before connecting
Difficulty: Beginner Time estimate: Half a day Prerequisites: Any device to install Tailscale on. Free account at tailscale.com.
Real world outcome:
# On your cluster nodes
$ curl -fsSL https://tailscale.com/install.sh | sh
$ sudo tailscale up
# Tailscale dashboard shows:
# Machine IP Last Seen
# k8s-control 100.64.0.1 Connected
# k8s-worker1 100.64.0.2 Connected
# k8s-worker2 100.64.0.3 Connected
# macbook 100.64.0.4 Connected
# iphone 100.64.0.5 Connected
# From your phone on 4G:
$ ssh 100.64.0.1
Welcome to k8s-control!
# Enable subnet routing (access all 192.168.x.x from anywhere)
$ sudo tailscale up --advertise-routes=192.168.0.0/16
# From anywhere, access your home network:
$ curl -k https://192.168.1.10:8006 # Proxmox UI (self-signed cert) works!
Implementation Hints:
Tailscale setup on Kubernetes nodes:
# Install on each node
curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
# On one node, enable subnet routing
sudo tailscale up --advertise-routes=10.43.0.0/16,10.42.0.0/16
# In Tailscale admin console, approve the routes
Tailscale ACL configuration (in admin console):
{
"acls": [
{
"action": "accept",
"src": ["group:admin"],
"dst": ["*:*"]
},
{
"action": "accept",
"src": ["group:developers"],
"dst": ["tag:k8s:22", "tag:k8s:6443"]
}
],
"tagOwners": {
"tag:k8s": ["group:admin"]
}
}
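The exit-node challenge from the list above takes two commands, sketched here with standard Tailscale flags (the exit node must also be approved in the admin console, and re-running tailscale up replaces previously set flags):
sudo tailscale up --advertise-exit-node            # on a homelab node: offer to route traffic
sudo tailscale up --exit-node=100.64.0.1           # on your laptop/phone: send ALL traffic home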
Learning milestones:
- Two devices connected via Tailscale → You understand mesh VPN
- SSH to homelab from phone → You understand remote access
- Subnet routing exposes home network → You understand routing
- ACLs restrict access by user → You understand zero-trust
Project 12: High Availability Kubernetes (Multi-Master)
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: YAML / Bash
- Alternative Programming Languages: Ansible
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 4: Expert
- Knowledge Area: Kubernetes / HA
- Software or Tool: K3s with embedded etcd, kube-vip
- Main Book: “Kubernetes Up and Running” by Brendan Burns
What you’ll build: A Kubernetes cluster with 3 control plane nodes, so the cluster survives the loss of any single control plane node. etcd runs in HA mode, and a virtual IP floats between masters.
Why it teaches home clusters: Single control plane = single point of failure. Understanding HA patterns (quorum, leader election, virtual IPs) is essential for production systems. This is how real Kubernetes clusters are built.
Core challenges you’ll face:
- etcd quorum → maps to why 3 nodes minimum
- Virtual IP failover → maps to stable API endpoint
- Leader election → maps to only one active leader
- Split-brain prevention → maps to network partition handling
Key Concepts:
- etcd Consensus: “Kubernetes Up and Running” Chapter 3
- Raft Protocol: How etcd achieves consensus
- Virtual IP: kube-vip or keepalived
- Quorum: N/2 + 1 nodes must agree
Difficulty: Expert Time estimate: 1-2 weeks Prerequisites: Project 3 (K3s basics), networking knowledge. 3+ machines for control plane.
Real world outcome:
# K3s HA cluster with 3 control planes + 2 workers
$ kubectl get nodes
NAME STATUS ROLES AGE
k8s-ctrl-1 Ready control-plane,etcd,master 5d
k8s-ctrl-2 Ready control-plane,etcd,master 5d
k8s-ctrl-3 Ready control-plane,etcd,master 5d
k8s-worker-1 Ready <none> 5d
k8s-worker-2 Ready <none> 5d
# Virtual IP for API
$ kubectl cluster-info
Kubernetes control plane is running at https://192.168.1.100:6443
# Kill ctrl-1, cluster keeps running
$ ssh ctrl-1 "sudo poweroff"
# After 30 seconds:
$ kubectl get nodes
NAME STATUS ROLES AGE
k8s-ctrl-1 NotReady control-plane,etcd,master 5d
k8s-ctrl-2 Ready control-plane,etcd,master 5d # Now leader
k8s-ctrl-3 Ready control-plane,etcd,master 5d
k8s-worker-1 Ready <none> 5d
k8s-worker-2 Ready <none> 5d
# VIP moved to ctrl-2, kubectl still works!
Implementation Hints:
K3s HA with embedded etcd:
# First control plane node
curl -sfL https://get.k3s.io | sh -s - server \
--cluster-init \
--tls-san 192.168.1.100 \
--disable servicelb
# Get token
cat /var/lib/rancher/k3s/server/node-token
# Additional control plane nodes
curl -sfL https://get.k3s.io | sh -s - server \
--server https://192.168.1.101:6443 \
--token <TOKEN> \
--tls-san 192.168.1.100
# Workers
curl -sfL https://get.k3s.io | sh -s - agent \
--server https://192.168.1.100:6443 \
--token <TOKEN>
kube-vip for virtual IP:
apiVersion: v1
kind: Pod
metadata:
name: kube-vip
namespace: kube-system
spec:
hostNetwork: true
containers:
- name: kube-vip
image: ghcr.io/kube-vip/kube-vip:latest
args:
- manager
env:
- name: vip_arp
value: "true"
- name: vip_interface
value: eth0
- name: address
value: "192.168.1.100"
- name: port
value: "6443"
- name: vip_leaderelection
value: "true"
Learning milestones:
- 3-node control plane running → You understand HA architecture
- etcd cluster healthy → You understand consensus
- VIP fails over on node loss → You understand virtual IPs
- Cluster survives any single failure → You’ve achieved true HA
Project 13: CI/CD Pipeline with Self-Hosted Runners
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: YAML / Docker
- Alternative Programming Languages: Go, Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: CI/CD / DevOps
- Software or Tool: GitHub Actions + self-hosted runner, or Drone CI
- Main Book: “Continuous Delivery” by Jez Humble
What you’ll build: A self-hosted CI/CD system that builds Docker images, runs tests, and deploys to your Kubernetes cluster—all running on your homelab instead of paying for cloud CI minutes.
Why it teaches home clusters: CI/CD is the heartbeat of modern software delivery. Running it yourself teaches you about build pipelines, artifact management, and deployment automation. Plus, self-hosted runners have access to your local cluster.
Core challenges you’ll face:
- Runner registration → maps to connecting to GitHub/GitLab
- Docker-in-Docker builds → maps to building images in CI
- Secrets management → maps to secure credential handling
- Deployment triggers → maps to automatic deployments
Key Concepts:
- CI/CD Principles: “Continuous Delivery” Chapter 1 - Jez Humble
- GitHub Actions Syntax: GitHub documentation
- Docker BuildKit: Modern Docker image building
- Kubernetes Deployments: Rolling updates
Difficulty: Intermediate Time estimate: 1 week Prerequisites: Project 3 (K3s cluster), Project 8 (GitOps helpful). Git and Docker proficiency.
Real world outcome:
# .github/workflows/deploy.yml
name: Build and Deploy
on:
push:
branches: [main]
jobs:
build:
runs-on: self-hosted # Your homelab runner!
steps:
- uses: actions/checkout@v4
- name: Build Docker image
run: docker build -t myapp:${{ github.sha }} .
- name: Push to local registry
run: |
docker tag myapp:${{ github.sha }} registry.home.lab/myapp:${{ github.sha }}
docker push registry.home.lab/myapp:${{ github.sha }}
- name: Deploy to Kubernetes
run: |
kubectl set image deployment/myapp \
myapp=registry.home.lab/myapp:${{ github.sha }}
GitHub Actions log:
✓ Build Docker image (45s)
✓ Push to local registry (12s)
✓ Deploy to Kubernetes (8s)
Total: 1m 5s on YOUR hardware, $0 cloud costs
Implementation Hints:
GitHub self-hosted runner setup:
# On a node in your cluster
mkdir actions-runner && cd actions-runner
curl -o actions-runner-linux-x64.tar.gz -L https://github.com/actions/runner/releases/download/v2.xxx/actions-runner-linux-x64-2.xxx.tar.gz
tar xzf actions-runner-linux-x64.tar.gz
# Configure (get token from GitHub repo settings)
./config.sh --url https://github.com/USERNAME/REPO --token <TOKEN>
# Run as service
sudo ./svc.sh install
sudo ./svc.sh start
Local container registry:
# Deploy a local registry in K8s
apiVersion: apps/v1
kind: Deployment
metadata:
  name: registry
spec:
  replicas: 1
  selector:
    matchLabels:
      app: registry
  template:
    metadata:
      labels:
        app: registry
    spec:
      containers:
      - name: registry
        image: registry:2
        ports:
        - containerPort: 5000
        volumeMounts:
        - name: data
          mountPath: /var/lib/registry
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: registry-data
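K3s nodes won’t pull from a plain-HTTP registry by default; K3s reads registries.yaml for mirror/endpoint overrides. A sketch assuming the hypothetical hostname registry.home.lab:
sudo tee /etc/rancher/k3s/registries.yaml <<'EOF'
mirrors:
  "registry.home.lab":
    endpoint:
      - "http://registry.home.lab:5000"
EOF
sudo systemctl restart k3s          # restart k3s-agent on worker nodes instead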
Learning milestones:
- Self-hosted runner connected → You understand runner architecture
- Build runs on your hardware → You understand self-hosted benefits
- Image pushed to local registry → You understand artifact storage
- Auto-deploy on git push → You understand full CI/CD
Project 14: Backup and Disaster Recovery
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: YAML / Bash
- Alternative Programming Languages: Go
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. The “Service & Support” Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Backup / DR
- Software or Tool: Velero, Restic, Proxmox Backup Server
- Main Book: “Site Reliability Engineering” by Google
What you’ll build: A comprehensive backup strategy covering Kubernetes resources, persistent volumes, and VM snapshots. You’ll practice restoring from backup to prove it works.
Why it teaches home clusters: “Backup that isn’t tested isn’t backup.” Learning disaster recovery means understanding what state needs to be saved, how to save it, and how to restore it. This is critical for any production system.
Core challenges you’ll face:
- What to backup → maps to identifying critical state
- Velero for K8s → maps to cluster-aware backups
- PV snapshots → maps to data backup
- Recovery testing → maps to validating backups
Key Concepts:
- RTO/RPO: Recovery Time/Point Objectives
- 3-2-1 Backup Rule: 3 copies, 2 media types, 1 offsite
- Velero Architecture: Velero documentation
- Snapshot vs Backup: Point-in-time vs full copy
Difficulty: Advanced Time estimate: 1-2 weeks Prerequisites: Project 3-4 (K3s with storage), storage for backups (NAS, S3, etc.).
Real world outcome:
# Install Velero with S3-compatible backend
$ velero install \
--provider aws \
--plugins velero/velero-plugin-for-aws:v1.5.0 \
--bucket homelab-backups \
--secret-file ./credentials-velero \
--backup-location-config region=us-east-1,s3Url=http://minio.home.lab:9000
# Create a backup
$ velero backup create production-backup \
--include-namespaces production \
--snapshot-volumes
Backup request "production-backup" submitted successfully.
# Simulate disaster (delete namespace)
$ kubectl delete namespace production
namespace "production" deleted
# Restore from backup
$ velero restore create --from-backup production-backup
Restore request "production-backup-20231215" submitted successfully.
# Everything is back!
$ kubectl get all -n production
NAME READY STATUS
pod/myapp-abc123-xxx 1/1 Running
pod/postgres-xyz789-yyy 1/1 Running
Implementation Hints:
Velero scheduled backup:
apiVersion: velero.io/v1
kind: Schedule
metadata:
name: daily-backup
namespace: velero
spec:
schedule: "0 3 * * *" # 3 AM daily
template:
includedNamespaces:
- production
- monitoring
snapshotVolumes: true
ttl: 720h # Keep for 30 days
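Untested backups don’t count, so make checking them a habit; the standard Velero CLI gives you both a listing and a per-resource breakdown:
velero backup get                                     # all backups and their phase
velero backup describe production-backup --details   # per-resource and volume status
velero schedule get                                   # confirm daily-backup is active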
Proxmox backup with PBS:
# Add Proxmox Backup Server as storage
pvesm add pbs pbs-storage \
--server 192.168.1.50 \
--datastore backups \
--username backup@pbs \
--password secret
# Schedule VM backups
# In Proxmox UI: Datacenter → Backup → Add
# Schedule: daily, 02:00
# Selection: All VMs
# Storage: pbs-storage
Learning milestones:
- Velero installed and connected to storage → You understand backup infra
- Manual backup succeeds → You understand backup process
- Restore works → You understand recovery
- Scheduled backups running → You understand automation
Project 15: Self-Hosted Application Platform
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: YAML
- Alternative Programming Languages: Various per app
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 2. The “Micro-SaaS / Pro Tool”
- Difficulty: Level 2: Intermediate
- Knowledge Area: Self-Hosting / Applications
- Software or Tool: Various (see list below)
- Main Book: N/A (App-specific docs)
What you’ll build: A suite of self-hosted applications running on your cluster: media server (Jellyfin), password manager (Vaultwarden), file sync (Nextcloud), ad-blocking DNS (Pi-hole), home automation (Home Assistant).
Why it teaches home clusters: This is why many people build homelabs—to run their own services. You’ll learn to deploy stateful applications, manage persistent data, configure networking, and understand real-world application requirements.
Core challenges you’ll face:
- Stateful application management → maps to databases, file storage
- Resource allocation → maps to limits and requests
- External access → maps to ingress, DNS
- Data persistence → maps to PVCs, backups
Popular Self-Hosted Apps:
| App | Purpose | Complexity |
|---|---|---|
| Jellyfin | Media streaming | Medium |
| Vaultwarden | Password manager | Low |
| Nextcloud | File sync/office | High |
| Pi-hole | Ad-blocking DNS | Low |
| Home Assistant | Home automation | Medium |
| Paperless-ngx | Document management | Medium |
| Gitea | Git hosting | Low |
| Miniflux | RSS reader | Low |
Difficulty: Intermediate Time estimate: Ongoing (add apps as needed) Prerequisites: Project 3-6 (K3s with ingress and storage).
Real world outcome:
Your home dashboard (https://home.yourdomain.com):
┌─────────────────────────────────────────────────────────────────┐
│ 🏠 Homelab Dashboard │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 📺 Jellyfin → jellyfin.home.lab [Healthy] │
│ 🔐 Vaultwarden → vault.home.lab [Healthy] │
│ 📁 Nextcloud → cloud.home.lab [Healthy] │
│ 🛡️ Pi-hole → pihole.home.lab [Healthy] │
│ 🏠 Home Assistant → hass.home.lab [Healthy] │
│ 📄 Paperless → docs.home.lab [Healthy] │
│ 📊 Grafana → grafana.home.lab [Healthy] │
│ │
│ Cluster Status: 3/3 nodes healthy │
│ Storage: 234 GB used / 1 TB available │
│ Network: 50 Mbps avg throughput │
└─────────────────────────────────────────────────────────────────┘
Implementation Hints:
Jellyfin deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
name: jellyfin
spec:
replicas: 1
selector:
matchLabels:
app: jellyfin
template:
metadata:
labels:
app: jellyfin
spec:
containers:
- name: jellyfin
image: jellyfin/jellyfin:latest
ports:
- containerPort: 8096
volumeMounts:
- name: config
mountPath: /config
- name: media
mountPath: /media
resources:
limits:
memory: "4Gi"
cpu: "2"
volumes:
- name: config
persistentVolumeClaim:
claimName: jellyfin-config
- name: media
nfs:
server: 192.168.1.50
path: /media
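Exposing Jellyfin follows the same Service + Ingress pattern as earlier projects; a sketch assuming the jellyfin.home.lab entry from the local-DNS setup in Project 6:
kubectl expose deployment jellyfin --port=8096
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jellyfin
spec:
  rules:
  - host: jellyfin.home.lab
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: jellyfin
            port:
              number: 8096
EOF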
Learning milestones:
- First app deployed and accessible → You understand the pattern
- 5+ apps running smoothly → You understand resource management
- All apps backed up → You understand data protection
- Family uses your services → You’ve built real infrastructure
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. Docker Single Node | Beginner | Weekend | ⭐⭐⭐ | ⭐⭐⭐ |
| 2. Docker Swarm | Intermediate | Weekend-1wk | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 3. K3s Kubernetes | Advanced | 1-2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 4. Longhorn Storage | Advanced | 1 week | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 5. MetalLB | Intermediate | Half day | ⭐⭐⭐ | ⭐⭐⭐ |
| 6. Ingress + TLS | Intermediate | 1 week | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 7. Prometheus + Grafana | Intermediate | 1 week | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 8. GitOps with ArgoCD | Advanced | 1-2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 9. Proxmox | Advanced | 1-2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 10. VLANs | Advanced | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 11. Tailscale | Beginner | Half day | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 12. HA Kubernetes | Expert | 1-2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 13. CI/CD | Intermediate | 1 week | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 14. Backup & DR | Advanced | 1-2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| 15. Self-Hosted Apps | Intermediate | Ongoing | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
Recommended Learning Path
If you’re starting from scratch:
- Project 1: Docker basics → Get comfortable with containers
- Project 2: Docker Swarm → Your first real cluster
- Project 11: Tailscale → Easy remote access
- Project 3: K3s Kubernetes → The main event
- Project 5: MetalLB → Real IPs for services
- Project 6: Ingress + TLS → HTTPS for everything
- Project 7: Monitoring → See what’s happening
- Continue with storage, GitOps, etc.
If you want the “serious infrastructure” path:
- Project 9: Proxmox → Virtualization foundation
- Project 10: VLANs → Proper network segmentation
- Project 3: K3s (on Proxmox VMs)
- Project 12: HA Kubernetes → Production-grade
- Project 4: Longhorn → Distributed storage
- Project 8: GitOps → Infrastructure as code
- Project 14: Backup & DR → Protect everything
If you just want to self-host apps quickly:
- Project 1: Docker basics
- Project 3: K3s (single node is fine)
- Project 6: Ingress
- Project 15: Self-hosted apps → Start deploying!
- Project 11: Tailscale → Access from anywhere
Final Capstone Project: Production-Grade Homelab
- File: LEARN_HOME_CLUSTERS_DEEP_DIVE.md
- Main Programming Language: YAML / Terraform / Ansible
- Alternative Programming Languages: Go, Python
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. The “Open Core” Infrastructure
- Difficulty: Level 5: Master
- Knowledge Area: Full-Stack Infrastructure
- Software or Tool: All of the above
- Main Book: “Site Reliability Engineering” by Google
What you’ll build: A complete, production-grade homelab that combines everything:
- Infrastructure: Proxmox cluster with HA
- Orchestration: HA Kubernetes (K3s) on Proxmox VMs
- Storage: Longhorn or Ceph for distributed storage
- Networking: VLANs, MetalLB, Ingress, TLS
- Security: Tailscale, proper firewall rules, secrets management
- Observability: Prometheus, Grafana, Loki for logs
- GitOps: ArgoCD managing all deployments
- CI/CD: Self-hosted runners building and deploying
- Backup: Velero + Proxmox Backup Server
- Applications: Full suite of self-hosted services
Why this is the ultimate homelab project: This is a mini cloud platform. Companies pay millions for infrastructure like this. You’ll have something that rivals a small startup’s infrastructure, running in your home, managed as code, and fully automated.
Real world outcome:
Your Infrastructure as Code Repository:
homelab-infrastructure/
├── terraform/
│ └── proxmox/ # VM provisioning
├── ansible/
│ └── playbooks/ # OS configuration
├── kubernetes/
│ ├── infrastructure/ # MetalLB, cert-manager, etc.
│ ├── monitoring/ # Prometheus stack
│ ├── applications/ # All your apps
│ └── argocd/ # ArgoCD itself
├── docs/
│ ├── architecture.md
│ └── runbooks/ # Operational procedures
└── README.md
$ git push origin main
# → Terraform provisions VMs
# → Ansible configures nodes
# → K3s cluster forms
# → ArgoCD syncs all applications
# → Full stack running in 30 minutes from bare metal
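The push-to-provision flow above usually bottoms out in a small bootstrap script. A heavily simplified sketch (inventory, playbook, and ArgoCD app file names are placeholders; only the tool invocations are standard):
#!/usr/bin/env bash
set -euo pipefail
terraform -chdir=terraform/proxmox init
terraform -chdir=terraform/proxmox apply -auto-approve            # provision the VMs
ansible-playbook -i ansible/inventory ansible/playbooks/site.yml  # configure OS, install K3s
export KUBECONFIG="$PWD/kubeconfig"
kubectl apply -k kubernetes/argocd                                # install ArgoCD
kubectl apply -f kubernetes/argocd/root-app.yaml                  # app-of-apps syncs the rest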
Summary
| # | Project | Main Language |
|---|---|---|
| 1 | Docker on Single Node | Bash / YAML |
| 2 | Docker Swarm Cluster | Bash / YAML |
| 3 | K3s Kubernetes Cluster | YAML / Bash |
| 4 | Distributed Storage (Longhorn) | YAML / Bash |
| 5 | Load Balancer (MetalLB) | YAML |
| 6 | Ingress + TLS Certificates | YAML |
| 7 | Monitoring (Prometheus + Grafana) | YAML / PromQL |
| 8 | GitOps (ArgoCD) | YAML / Git |
| 9 | Proxmox Virtualization | Bash / Web UI |
| 10 | VLANs and Network Segmentation | Network config |
| 11 | Tailscale VPN | Configuration |
| 12 | HA Kubernetes (Multi-Master) | YAML / Bash |
| 13 | CI/CD Pipeline | YAML / Docker |
| 14 | Backup and Disaster Recovery | YAML / Bash |
| 15 | Self-Hosted Applications | YAML |
| Final | Production-Grade Homelab (Capstone) | All of the above |
Key Resources Referenced
Online Resources
- DEV Community - 2025 Pi K3s Guide
- Pi Kubernetes Cluster Project
- Virtualization HowTo - Kubernetes Home Lab 2025
- Docker Swarm for Home Labs
- Proxmox vs Kubernetes Comparison
- Kubernetes Storage Comparison
- Tailscale for Homelab
- Top Home Lab Networking Mistakes 2025
Books
- “Docker Deep Dive” by Nigel Poulton
- “Kubernetes Up and Running” by Brendan Burns et al.
- “Prometheus Up and Running” by Brian Brazil
- “GitOps and Kubernetes” by Billy Yuen
- “Computer Networks” by Andrew Tanenbaum
- “Continuous Delivery” by Jez Humble
- “Site Reliability Engineering” by Google