LEARN DEVOPS BY DOING
Learn DevOps: From “It Works on My Machine” to Platform Engineer
Goal: Master the art of shipping software reliably. You will move from manually configuring servers (painful) to building automated, self-healing platforms (powerful).
The DevOps Philosophy
DevOps isn’t just tools; it’s a response to a specific problem: The Wall of Confusion.
- Developers want change (new features).
- Operations want stability (no crashes).
- The Conflict: Developers toss code over the wall, Ops struggles to run it, and blame ensues.
The projects below simulate this evolution. You will start by doing things the “hard way” to feel the pain, then implement the DevOps solution to make the pain go away.
Core Concept Analysis
- Foundations: Linux, Scripting, and Networking.
- Immutable Infrastructure: Docker & Containers (Packaging).
- Infrastructure as Code (IaC): Terraform (Provisioning).
- Configuration Management: Ansible (Configuring).
- CI/CD: Pipelines (Automation).
- Orchestration: Kubernetes (Scaling & Healing).
- Observability: Prometheus/Grafana (Monitoring).
Project 1: The “Old School” Manual Web Server (Feel the Pain)
- File: `01_MANUAL_OPS.md`
- Main Programming Language: Bash
- Alternative Languages: Python (Fabric)
- Coolness Level: Level 1: Pure Corporate Snoozefest
- Business Potential: 3. Service & Support Model
- Difficulty: Level 1: Beginner
- Knowledge Area: Linux Systems & Networking
- Software: VirtualBox (or AWS Free Tier), Nginx, UFW
- Main Book: “UNIX and Linux System Administration Handbook” by Evi Nemeth
What you’ll build: You will manually provision a Virtual Machine (Ubuntu), secure it with a firewall, install a web server (Nginx), configure a custom HTML page, and set up a systemd service to keep it running.
Why it teaches DevOps: To understand automation, you must first understand what you are automating. You will face “drift” (when you forget what commands you ran) and security risks (leaving ports open).
Core challenges you’ll face:
- Permissions: Understanding `sudo`, `chown`, and `chmod` (Why can’t Nginx read my file?).
- Service Management: Writing a `systemd` unit file so your app restarts on crash.
- Networking: Configuring `ufw` (Uncomplicated Firewall) to allow ports 80/443 (plus SSH on port 22, so you don’t lock yourself out) but block everything else.
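The "Why can’t Nginx read my file?" question usually comes down to permission bits somewhere along the path. The sketch below is one way to reason about it. It assumes the web server user is neither the owner nor in the group of the files, so only the "other" bits matter, and it ignores ACLs and SELinux/AppArmor. The function name `unreadable_reasons` is made up for this illustration.

```python
import os
import stat
from pathlib import Path


def unreadable_reasons(path: str) -> list[str]:
    """List reasons a user covered only by the 'other' bits cannot read `path`.

    Simplifying assumption: the reader (e.g. the web server user) is neither
    the owner nor a group member, so only o+r / o+x are checked.
    """
    target = Path(path).resolve()
    reasons = []
    # Walk from the filesystem root down: every directory must be traversable.
    for parent in reversed(target.parents):
        if not os.stat(parent).st_mode & stat.S_IXOTH:
            reasons.append(f"{parent}: missing o+x (cannot traverse directory)")
    # The file itself must be readable.
    if not os.stat(target).st_mode & stat.S_IROTH:
        reasons.append(f"{target}: missing o+r (cannot read file)")
    return reasons
```

Calling `unreadable_reasons("/var/www/html/index.html")` (a typical but assumed path) returns an empty list when the bits look fine, or one line per directory or file that blocks access. A common surprise is a parent directory without the execute bit.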
Real world outcome:
- You have a live IP address you can curl or visit in a browser that serves a custom webpage.
- If you reboot the server, the site comes back up automatically.
Implementation Hints:
- Don’t use a GUI. Do everything via SSH.
- Try to secure SSH by disabling password login and using keys only.
- Self-Reflection: If you had to do this for 50 servers, how long would it take? (This motivates Ansible).
Learning milestones:
- Comfortable with the Linux Command Line Interface (CLI).
- Understanding of Linux permissions and users.
- Understanding of system daemons (background processes).
Project 2: The “Works on My Machine” Fixer (Docker)
- File: `02_DOCKER_CONTAINERIZATION.md`
- Main Programming Language: Dockerfile (DSL)
- Alternative Languages: Python (for the app)
- Coolness Level: Level 2: Practical but Forgettable
- Business Potential: 2. Micro-SaaS Potential
- Difficulty: Level 2: Intermediate
- Knowledge Area: Containerization & Isolation
- Software: Docker, Python/Node.js
- Main Book: “Docker Deep Dive” by Nigel Poulton
What you’ll build: Take a simple Python web app that requires a specific, weird version of a library. Containerize it so it runs identically on your machine, a friend’s machine, and a server, without installing Python on the host.
Why it teaches DevOps: This solves “Dependency Hell.” You learn that the environment is part of the application.
Core challenges you’ll face:
- Layer Caching: Optimizing the Dockerfile so builds are fast.
- Volume Mounting: Getting code into the container without rebuilding it (for development).
- Port Mapping: Understanding the difference between the container’s port and the host’s port.
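To build intuition for the layer caching challenge, here is a toy model in Python. It is not how Docker computes cache keys internally; it only captures the idea that each layer’s key depends on the layer before it, so one changed step invalidates everything after it. The step lists and fingerprint strings are invented for the example.

```python
import hashlib


def layer_keys(steps: list[tuple[str, str]]) -> list[str]:
    """Return one cache key per build step.

    Each step is (instruction, content_fingerprint). A key is derived from the
    previous key, so a change invalidates that layer and all later layers.
    """
    keys = []
    parent = ""
    for instruction, fingerprint in steps:
        raw = f"{parent}|{instruction}|{fingerprint}".encode()
        parent = hashlib.sha256(raw).hexdigest()
        keys.append(parent)
    return keys


def rebuilt_layers(old: list[tuple[str, str]], new: list[tuple[str, str]]) -> int:
    """Count the layers that must be rebuilt when moving from `old` to `new`."""
    reused = 0
    for a, b in zip(layer_keys(old), layer_keys(new)):
        if a != b:
            break
        reused += 1
    return len(new) - reused


def slow_build(src: str) -> list[tuple[str, str]]:
    # Copies all source code before installing dependencies.
    return [
        ("FROM python:3.12-slim", ""),
        ("COPY . .", src),
        ("RUN pip install -r requirements.txt", ""),
    ]


def fast_build(src: str) -> list[tuple[str, str]]:
    # Installs dependencies first; source code is copied last.
    return [
        ("FROM python:3.12-slim", ""),
        ("COPY requirements.txt .", "reqs-v1"),
        ("RUN pip install -r requirements.txt", ""),
        ("COPY . .", src),
    ]


print(rebuilt_layers(slow_build("src-v1"), slow_build("src-v2")))  # 2
print(rebuilt_layers(fast_build("src-v1"), fast_build("src-v2")))  # 1
```

In the slow ordering, a one-line code change forces the dependency install to run again. In the fast ordering, only the final copy step is rebuilt.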
Real world outcome:
- You can delete Python from your computer, run one command (`docker run`), and your application still works perfectly.
Implementation Hints:
- Create a “bad” application that depends on an environment variable and crashes if it is missing.
- Use a `.dockerignore` file to keep junk out of your image.
- Compare the size of a standard image (Ubuntu base) vs. an Alpine image.
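The “bad” application from the first hint could look something like this minimal sketch. The variable name `APP_GREETING` and the helper names are placeholders chosen for illustration, not part of any framework.

```python
import os


class MissingConfigError(RuntimeError):
    """Raised when a required environment variable is absent or empty."""


REQUIRED_VARS = ["APP_GREETING"]  # hypothetical variable name


def load_config(env=os.environ) -> dict:
    """Read required settings, failing loudly if any are missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise MissingConfigError(
            "missing environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}


def main(env=os.environ) -> int:
    """Return a process exit code: 0 on success, 1 when config is missing."""
    try:
        config = load_config(env)
    except MissingConfigError as exc:
        print(f"fatal: {exc}")
        return 1
    print(config["APP_GREETING"])
    return 0
```

A container entrypoint could end with `sys.exit(main())`. Running the image without the variable should then fail fast, while `docker run -e APP_GREETING=hello <image>` should succeed. That makes the point that the environment is part of the application.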
Learning milestones:
- Understanding Process Isolation.
- Mastering Image vs. Container concepts.
- Ability to package any app into a portable artifact.
Project 3: The “GitOps” Pipeline (CI/CD)
- File: `03_CICD_PIPELINE.md`
- Main Programming Language: YAML
- Alternative Languages: Groovy (Jenkins)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. Service & Support Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: Automation & Release Engineering
- Software: GitHub Actions (or GitLab CI)
- Main Book: “Continuous Delivery” by Jez Humble and David Farley
What you’ll build: An automated pipeline that triggers every time you push code to GitHub. It should: 1. Lint the code (check for errors), 2. Run unit tests, 3. Build the Docker image, 4. Push it to DockerHub.
Why it teaches DevOps: This removes humans from the release process. If the tests fail, the bad code is rejected automatically. This is the “Integration” in CI/CD.
Core challenges you’ll face:
- Secret Management: How to log in to DockerHub from GitHub without committing your password to the repo.
- YAML Syntax: Indentation errors will break your pipeline.
- Job Dependencies: Ensuring the “Build” step doesn’t run if the “Test” step fails.
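The job dependency rule can be sketched in a few lines of Python. This is only a model of the behavior you want, not how any CI system is implemented. In GitHub Actions you would typically express the same ordering between jobs with the `needs` key.

```python
from typing import Callable

Stage = tuple[str, Callable[[], bool]]


def run_pipeline(stages: list[Stage]) -> list[tuple[str, str]]:
    """Run stages in order; once one fails, skip everything after it."""
    results = []
    failed = False
    for name, step in stages:
        if failed:
            results.append((name, "skipped"))
            continue
        ok = step()
        results.append((name, "passed" if ok else "failed"))
        failed = not ok
    return results
```

With stages `lint`, `test`, `build`, and `push`, a failing `test` stage yields `passed`, `failed`, `skipped`, `skipped`. The broken code never becomes an image.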
Real world outcome:
- You change a line of code, `git push`, and 3 minutes later, a new container image appears on DockerHub without you touching anything else.
Implementation Hints:
- Start with a “Hello World” step that just prints “Build Started”.
- Look for “GitHub Actions Secrets” in the settings.
- Deliberately break your code and watch the pipeline stop you.
Learning milestones:
- Automated quality control.
- Understanding of ephemeral build runners.
- Secure handling of credentials in automation.
Project 4: The Infrastructure Creator (Terraform)
- File: `04_TERRAFORM_IAC.md`
- Main Programming Language: HCL (HashiCorp Configuration Language)
- Alternative Languages: Pulumi (TypeScript/Python)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 4. Open Core Infrastructure
- Difficulty: Level 3: Advanced
- Knowledge Area: Infrastructure as Code (IaC)
- Software: Terraform, AWS (Free Tier)
- Main Book: “Terraform: Up & Running” by Yevgeniy Brikman
What you’ll build: You will write code that creates a VPC (Virtual Private Cloud), a Subnet, an Internet Gateway, and an EC2 instance on AWS. You will not use the AWS Console (GUI).
Why it teaches DevOps: Clicking buttons in a GUI is not reproducible. IaC allows you to treat servers like software—versioned, tested, and reviewable.
Core challenges you’ll face:
- State Management: Understanding the `terraform.tfstate` file (if you delete it, Terraform loses track of your servers).
- Dependency Graph: Terraform knows it must create the VPC before the Subnet.
- Drift: What happens if you manually change the server in AWS? (On the next `terraform apply`, Terraform will try to revert it.)
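The dependency graph idea can be illustrated with Python’s standard `graphlib` module. This is a conceptual sketch, not Terraform’s actual planner. The resource types are real AWS provider types, but the local names (`main`, `public`, `gw`, `web`) and the exact edges are assumptions made for the example.

```python
from graphlib import TopologicalSorter

# Map each resource to the set of resources it depends on.
DEPENDENCIES = {
    "aws_vpc.main": set(),
    "aws_subnet.public": {"aws_vpc.main"},
    "aws_internet_gateway.gw": {"aws_vpc.main"},
    "aws_instance.web": {"aws_subnet.public"},
}


def creation_order(graph: dict[str, set[str]]) -> list[str]:
    """Return an order in which every dependency comes before its dependents."""
    return list(TopologicalSorter(graph).static_order())


def destroy_order(graph: dict[str, set[str]]) -> list[str]:
    """Tear down in reverse: dependents go before what they depend on."""
    return list(reversed(creation_order(graph)))
```

`creation_order(DEPENDENCIES)` always places the VPC before the Subnet and the Subnet before the instance. The destroy order is the reverse, which is why the instance must go before the network it lives in.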
Real world outcome:
- Run `terraform apply` -> Real AWS infrastructure appears.
- Run `terraform destroy` -> It all disappears. You pay nothing for idle resources.
Implementation Hints:
- Use the `aws_instance` resource.
- WARNING: Do not commit your AWS Access Keys to GitHub. Use environment variables.
- Learn to use `terraform plan` to see what will happen before it happens.
Learning milestones:
- Infrastructure as Code principles.
- Cloud networking fundamentals (VPC, CIDR blocks).
- Declarative vs. Imperative programming.
Project 5: The Configuration Manager (Ansible)
- File: `05_ANSIBLE_CONFIG.md`
- Main Programming Language: YAML / Jinja2
- Alternative Languages: Chef, Puppet
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. Service & Support Model
- Difficulty: Level 2: Intermediate
- Knowledge Area: Configuration Management
- Software: Ansible
- Main Book: “Ansible for DevOps” by Jeff Geerling
What you’ll build: A “Playbook” that connects to your empty EC2 instance (from Project 4), installs Docker, creates a user, copies your config files, and starts your container.
Why it teaches DevOps: Terraform builds the house (Server), Ansible arranges the furniture (Software/Config). It ensures that if you have 100 servers, they are all configured exactly the same.
Core challenges you’ll face:
- Agentless Architecture: Configuring SSH keys so Ansible can talk to the server.
- Idempotency: Writing scripts that can run 100 times but only make changes once.
- Inventory Management: Defining which servers belong to “web” vs “database” groups.
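Idempotency is easiest to see in a tiny example. The function below is a sketch of the idea, loosely in the spirit of what Ansible’s `lineinfile` module does. It is not Ansible code, and the name `ensure_line` is invented here. It describes a desired state and reports whether it had to change anything.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` is present in the file.

    Returns True only if the file was changed, so repeated runs are no-ops.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # desired state already holds; do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True
```

The first call returns `True` and writes the line; every later call returns `False` and leaves the file untouched. Compare that with a shell script that runs `echo ... >> file`, which appends a duplicate on every run.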
Real world outcome:
- You run `ansible-playbook -i inventory site.yml` and watch as your empty server transforms into a production-ready machine in minutes.
Implementation Hints:
- Use the `apt` module to install packages, not `shell: apt-get install`.
- Explore “Roles” to organize your code (e.g., a role for “security”, a role for “docker”).
Learning milestones:
- Idempotent script execution.
- SSH key management at scale.
- Separation of configuration data from execution logic.
Project 6: The Self-Healing Cluster (Kubernetes)
- File: `06_KUBERNETES_CLUSTER.md`
- Main Programming Language: YAML (Manifests)
- Alternative Languages: Helm (Templates)
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. Open Core Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Container Orchestration
- Software: Minikube or Kind (Kubernetes in Docker), kubectl
- Main Book: “Kubernetes: Up and Running” by Brendan Burns et al.
What you’ll build: A local Kubernetes cluster running a “Deployment” of your app (3 replicas). You will verify that if you delete one “Pod” (container instance), the cluster immediately creates a new one to replace it.
Why it teaches DevOps: This is the modern standard. It treats a fleet of servers as a single computer. It handles scaling, rolling updates, and self-healing automatically.
Core challenges you’ll face:
- Networking: Understanding `Services` (how to talk to the pods) and `Ingress` (how the world talks to the Service).
- State: Understanding that Pods die. If you save data inside a Pod, it is gone forever (you need `PersistentVolumes`).
- Declarative Model: You don’t tell K8s “start a container”; you tell it “I want 3 containers” and it makes it happen.
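The declarative model boils down to a reconcile loop: compare desired state with actual state and close the gap. The toy function below models that idea in plain Python. It does not talk to a real cluster, and the Pod names it generates are made up.

```python
import itertools

_pod_ids = itertools.count(1)  # source of unique, hypothetical Pod names


def reconcile(desired_replicas: int, pods: list[str]) -> list[str]:
    """Return a Pod list that matches the desired replica count.

    Missing Pods are created, surplus Pods are removed, and existing Pods
    are left alone, so calling this on a healthy state changes nothing.
    """
    pods = list(pods)  # work on a copy; the caller's list is untouched
    while len(pods) < desired_replicas:
        pods.append(f"web-{next(_pod_ids)}")
    while len(pods) > desired_replicas:
        pods.pop()
    return pods
```

Start with `reconcile(3, [])`, drop one name from the result to simulate a crashed Pod, and reconcile again. You get three Pods back, two of them the survivors and one brand new. You never said “start a container”, only “I want 3”.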
Real world outcome:
- You access your app, then manually kill one of the Pods behind it. The app does not go down, and Kubernetes replaces the missing Pod to restore the desired replica count.
Implementation Hints:
- Start with `minikube`.
- Write a `deployment.yaml` and a `service.yaml`.
- Use `kubectl get pods -w` to watch the self-healing in real-time.
Learning milestones:
- Container Orchestration concepts.
- Service Discovery.
- Zero-downtime deployment strategies (Rolling Updates).
Project 7: The “Eyes Everywhere” Monitor (Prometheus & Grafana)
- File: `07_OBSERVABILITY_STACK.md`
- Main Programming Language: PromQL (Query Language)
- Alternative Languages: Python (to expose metrics)
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. Service & Support Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Observability & Monitoring
- Software: Prometheus, Grafana, Node Exporter
- Main Book: “Site Reliability Engineering” by Google (The Monitoring Chapter)
What you’ll build: A monitoring stack that scrapes data from your Kubernetes cluster (CPU usage, Memory, Request rate). You will build a dashboard in Grafana to visualize this data and set up an alert that triggers if the CPU goes over 80%.
Why it teaches DevOps: You cannot fix what you cannot see. DevOps requires a feedback loop.
Core challenges you’ll face:
- Metric Types: Understanding Counters (only ever increase, and reset to zero when the process restarts) vs. Gauges (can go up or down).
- Querying: Learning PromQL to ask questions like “What is the 95th percentile response time?”.
- Alert Fatigue: Setting up alerts that matter, not just noise.
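To see why Counters and percentiles need care, here is a simplified Python sketch of the arithmetic involved. It works on raw sample lists and is not PromQL. Real Prometheus functions such as `rate` and `histogram_quantile` also handle time windows and bucketed histograms, which this example leaves out.

```python
import math


def counter_increase(samples: list[float]) -> float:
    """Total increase of a counter across samples.

    A drop between two samples is treated as a restart from zero, so the
    value after the drop counts as new increase.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        total += cur - prev if cur >= prev else cur
    return total


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with pct% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# A request counter that restarted between the third and fourth scrape.
print(counter_increase([100, 130, 160, 5, 25]))  # 85.0
# 95th percentile of response times 1..100 ms.
print(percentile(list(range(1, 101)), 95))  # 95
```

Subtracting the first sample from the last would give -75 for that counter, which is why you query counters through functions that understand resets. A Gauge needs no such handling; its latest value is the answer.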
Real world outcome:
- You stress-test your application (load it with traffic) and watch the graphs spike in real-time on your dashboard.
- You receive an email or Slack notification when the system is stressed.
Implementation Hints:
- Install `node_exporter` on your servers to get hardware metrics.
- Import a pre-made Grafana dashboard ID (like 1860) to see what’s possible, then build your own.
Learning milestones:
- The difference between Logging and Monitoring.
- Time-series databases.
- Creating actionable alerts.
Project 8: The “Blue/Green” Zero-Downtime Deployer
- File: `08_BLUE_GREEN_DEPLOY.md`
- Main Programming Language: Bash / CI Scripts
- Alternative Languages: ArgoCD (GitOps tool)
- Coolness Level: Level 5: Pure Magic
- Business Potential: 4. Open Core Infrastructure
- Difficulty: Level 4: Expert
- Knowledge Area: Advanced Deployment Strategies
- Software: Kubernetes, Nginx/Ingress
- Main Book: “Accelerate” by Nicole Forsgren
What you’ll build: A deployment system where you have two environments: Blue (Live) and Green (New Version). You deploy to Green, run tests, and then switch the “Router” (Ingress) to point to Green instantly.
Why it teaches DevOps: This eliminates maintenance windows. You deploy during the day without users noticing.
Core challenges you’ll face:
- Database Schema Changes: How do you switch code if the new code needs a new database column? (Backwards compatibility).
- Session Handling: What happens to users logged into the “Blue” version when you switch to “Green”?
- Traffic Splitting: Sending only 1% of users to the new version (Canary Release).
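One common way to think about traffic splitting is to hash a stable user identifier into 100 buckets and send the lowest buckets to the new version. The sketch below shows that idea in Python. It is an illustration, not how any particular Ingress controller implements canary routing, and the function name `route` is invented.

```python
import hashlib


def route(user_id: str, canary_percent: int) -> str:
    """Pick 'green' (new) or 'blue' (live) for a user.

    The same user always lands in the same bucket, so their experience is
    stable, and raising the percentage only ever moves users blue -> green.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    return "green" if bucket < canary_percent else "blue"
```

With `canary_percent=1`, roughly one user in a hundred sees Green. Setting it to `100` is the full Blue/Green switch, and setting it back to `0` is the rollback.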
Real world outcome:
- You deploy a new version of your app with a bright red background to the Green environment.
- You run a command.
- The switch is instant: the very next request is served by the red-background version, with no errors.
Implementation Hints:
- Use Kubernetes `Services`. One service points to Blue pods, one to Green pods.
- Change the `Ingress` rule to point to the Green service.
Learning milestones:
- Advanced traffic management.
- Risk mitigation in deployments.
- Database migration strategies.
Project 9: The “Vault” Keeper (DevSecOps)
- File: `09_SECRET_MANAGEMENT.md`
- Main Programming Language: HCL
- Alternative Languages: Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. Service & Support Model
- Difficulty: Level 3: Advanced
- Knowledge Area: Security & Identity
- Software: HashiCorp Vault
- Main Book: “Securing DevOps” by Julien Vehent
What you’ll build: A centralized secret server. Your application will boot up, authenticate with Vault, and request its database password. The password is never stored in the code or environment variables.
Why it teaches DevOps: Hardcoding passwords is a cardinal sin. This teaches “Secret Rotation” and “Least Privilege.”
Core challenges you’ll face:
- The “Zero” Secret: How does the app authenticate to get the secret? (The bootstrap problem).
- Dynamic Secrets: Configuring Vault to generate a new AWS user for the app that exists only for 1 hour.
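The idea behind dynamic secrets is that credentials are generated on demand and expire on their own. The toy broker below models that lease lifecycle in memory. It is not the Vault API; the class and method names are hypothetical, and the clock is injectable only so that expiry is easy to test.

```python
import secrets
import time
from dataclasses import dataclass


@dataclass
class Lease:
    username: str
    password: str
    expires_at: float


class ToySecretBroker:
    """In-memory stand-in for a secret server that issues short-lived credentials."""

    def __init__(self, ttl_seconds: float = 3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._leases: dict[str, Lease] = {}

    def issue(self, role: str) -> Lease:
        """Create a fresh, unique credential pair for `role`."""
        lease = Lease(
            username=f"{role}-{secrets.token_hex(4)}",
            password=secrets.token_urlsafe(16),
            expires_at=self._clock() + self._ttl,
        )
        self._leases[lease.username] = lease
        return lease

    def is_valid(self, username: str, password: str) -> bool:
        """Check a credential; expired leases are revoked on sight."""
        lease = self._leases.get(username)
        if lease is None:
            return False
        if self._clock() >= lease.expires_at:
            del self._leases[username]
            return False
        return secrets.compare_digest(lease.password, password)
```

Because every lease is unique and short-lived, a leaked credential is only useful for a limited time, and you can trace it back to the one application instance that requested it.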
Real world outcome:
- You rotate the database password in Vault. The application continues to work because it dynamically fetches the new credentials.
Implementation Hints:
- Run Vault in “Dev” mode locally to start.
- Use the Vault API to read secrets.
Learning milestones:
- Dynamic secrets.
- Identity-based security.
- Audit trails for access.
Project 10: The Ultimate Platform (Capstone)
- File: `10_CAPSTONE_PLATFORM.md`
- Main Programming Language: Polyglot (YAML, HCL, Bash, Python)
- Coolness Level: Level 5: Pure Magic
- Business Potential: 1. Resume Gold
- Difficulty: Level 5: Master
- Knowledge Area: Platform Engineering
What you’ll build: A complete “IDP” (Internal Developer Platform).
- Repo: A developer pushes code to Git.
- CI: GitHub Actions runs tests and builds the image.
- CD: ArgoCD detects the change and deploys it to a Kubernetes cluster (provisioned by Terraform).
- Security: The app pulls secrets from Vault.
- Obs: Prometheus monitors it, and Fluentd ships logs to ElasticSearch.
Why it teaches DevOps: This is the “End Game.” You are no longer building the app; you are building the factory that builds the app.
Real world outcome:
- You commit code. You go get coffee. When you come back, the code is live, secured, and monitored. You did nothing manually.
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. Manual Web Server | Beginner | Weekend | ⭐⭐⭐ | ⭐⭐ |
| 2. Docker Fixer | Intermediate | Weekend | ⭐⭐⭐ | ⭐⭐⭐ |
| 3. CI/CD Pipeline | Intermediate | 1 Week | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 4. Terraform IaC | Advanced | 1-2 Weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| 5. Ansible Config | Intermediate | 1 Week | ⭐⭐⭐ | ⭐⭐⭐ |
| 6. K8s Cluster | Expert | 2 Weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 7. Monitoring Stack | Advanced | 1 Week | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 8. Blue/Green | Expert | 2 Weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
Recommendation
Where to Start?
- If you are a complete beginner: Start with Project 1 (Manual Web Server). You cannot automate what you do not understand. You need to feel the pain of editing config files by hand.
- If you know Linux already: Jump to Project 2 (Docker) and Project 3 (CI/CD). These are the most immediately employable skills.
The Golden Path
Project 1 -> Project 2 -> Project 4 -> Project 3. Reason: Learn Linux, then Containers, then Cloud (Infrastructure), then automate the flow between them.
Summary
| Project | Main Tool/Language |
|---|---|
| 1. Manual Web Server | Bash / Linux |
| 2. Docker Containerization | Docker |
| 3. CI/CD Pipeline | GitHub Actions (YAML) |
| 4. Terraform Infrastructure | Terraform (HCL) |
| 5. Ansible Configuration | Ansible (YAML) |
| 6. Kubernetes Cluster | Kubernetes (YAML) |
| 7. Observability Stack | Prometheus / Grafana |
| 8. Blue/Green Deployment | Kubernetes / Ingress |
| 9. Secret Management | HashiCorp Vault |
| 10. Capstone Platform | All of the above |