
LEARN TERRAFORM DEEP DIVE

Learn Terraform: From Zero to Infrastructure Master

Goal: Deeply understand Terraform—from basic resource provisioning to building production-grade, multi-cloud infrastructure with proper state management, modules, and security practices.


Why Terraform Matters

Before Terraform existed, infrastructure was managed through:

  • Manual clicking in cloud consoles (error-prone, unreproducible)
  • Custom scripts (fragile, cloud-specific, hard to maintain)
  • Configuration management tools (Chef, Puppet—focused on server config, not infrastructure)

Terraform solves the fundamental problem: How do you treat infrastructure like software?

After completing these projects, you will:

  • Understand the declarative Infrastructure as Code (IaC) paradigm
  • Know how Terraform’s state file tracks real-world resources
  • Build reusable, composable infrastructure modules
  • Manage multi-environment deployments safely
  • Understand provider architecture and how Terraform talks to cloud APIs
  • Implement proper security, state management, and collaboration workflows

Core Concept Analysis

The Terraform Lifecycle

                    ┌─────────────────────────────────────────┐
                    │              terraform init             │
                    │   (Download providers, setup backend)   │
                    └─────────────────┬───────────────────────┘
                                      │
                                      ▼
                    ┌─────────────────────────────────────────┐
                    │              terraform plan             │
                    │   (Compare desired vs actual state)     │
                    └─────────────────┬───────────────────────┘
                                      │
                                      ▼
                    ┌─────────────────────────────────────────┐
                    │             terraform apply             │
                    │   (Execute changes, update state)       │
                    └─────────────────┬───────────────────────┘
                                      │
                                      ▼
                    ┌─────────────────────────────────────────┐
                    │            terraform destroy            │
                    │   (Remove all managed resources)        │
                    └─────────────────────────────────────────┘

Fundamental Concepts

  1. Declarative vs Imperative
    • Terraform is declarative: you describe the desired end state
    • Terraform figures out how to get there (create, update, destroy)
    • This is fundamentally different from scripts that describe steps
  2. State Management
    • Terraform keeps a terraform.tfstate file mapping config to real resources
    • State is the source of truth for what Terraform manages
    • Without state, Terraform cannot know what exists in your infrastructure
    • Remote state enables team collaboration and prevents conflicts
  3. Providers
    • Plugins that translate Terraform config into API calls
    • Each cloud/service (AWS, GCP, Azure, GitHub, Kubernetes) has a provider
    • Providers define resources (things to create) and data sources (things to read)
  4. Resources and Data Sources
    # Resource: Terraform MANAGES this (creates, updates, destroys)
    resource "aws_instance" "web" {
      ami           = "ami-12345678"
      instance_type = "t2.micro"
    }
    
    # Data Source: Terraform READS this (does not manage)
    data "aws_ami" "ubuntu" {
      most_recent = true
      owners      = ["099720109477"]
    }
    
  5. Modules
    • Reusable containers for related resources
    • Enable abstraction: hide complexity, expose clean interfaces
    • Can be local directories or remote (Terraform Registry, Git)
  6. Variables and Outputs (see the sketch after this list)
    • Variables: Inputs to your configuration (parameterization)
    • Outputs: Exports from your configuration (for humans or other modules)
    • Locals: Computed intermediate values
  7. Backend and State Locking
    • Backend: Where state is stored (local, S3, GCS, Terraform Cloud)
    • State locking: Prevents concurrent modifications (critical for teams)
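
A minimal sketch of how variables, locals, and outputs fit together (the names here are illustrative):

variable "environment" {
  type        = string
  description = "Deployment environment name"
  default     = "dev"
}

locals {
  name_prefix = "app-${var.environment}"
}

output "name_prefix" {
  description = "Prefix other configurations or modules can reuse"
  value       = local.name_prefix
}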

Project List

Projects are ordered from fundamental understanding to production-grade implementations.


Project 1: Local Infrastructure Sandbox (Understand the Core Loop)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL (HashiCorp Configuration Language)
  • Alternative Programming Languages: JSON (Terraform also accepts JSON syntax)
  • Coolness Level: Level 1: Pure Corporate Snoozefest
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Infrastructure as Code / State Management
  • Software or Tool: Terraform, Local Provider
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A Terraform configuration that creates local files and directories on your machine, demonstrating the full init→plan→apply→destroy cycle without any cloud costs or complexity.

Why it teaches Terraform: Before touching clouds, you need to understand Terraform’s core mechanics. The local provider lets you see state management, resource lifecycle, and the plan/apply workflow without distractions. You’ll understand WHY Terraform works the way it does.

Core challenges you’ll face:

  • Understanding state files → maps to how Terraform tracks resources
  • Interpreting plan output → maps to reading +/- change indicators
  • Handling resource dependencies → maps to implicit and explicit depends_on
  • Dealing with state drift → maps to what happens when you modify files manually

Key Concepts:

  • Terraform Lifecycle: “Terraform: Up & Running” Chapter 1 - Yevgeniy Brikman
  • State Files: HashiCorp Learn - State Management Tutorial
  • HCL Syntax: “Terraform: Up & Running” Chapter 2 - Yevgeniy Brikman
  • Resource Dependencies: HashiCorp Docs - Dependency Lock File

Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic command-line familiarity, text editor usage

Real world outcome:

$ terraform init
Initializing the backend...
Initializing provider plugins...
- Finding latest version of hashicorp/local...
- Installing hashicorp/local v2.4.0...

$ terraform plan
Terraform will perform the following actions:

  # local_file.config will be created
  + resource "local_file" "config" {
      + content              = "environment=development"
      + filename             = "./config/app.conf"
    }

Plan: 1 to add, 0 to change, 0 to destroy.

$ terraform apply
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

$ cat config/app.conf
environment=development

$ terraform destroy
Destroy complete! Resources: 1 destroyed.

Implementation Hints:

Start with the simplest possible configuration:

my-first-terraform/
├── main.tf          # Resource definitions
├── variables.tf     # Input variables
├── outputs.tf       # Output values
└── terraform.tfstate # (Generated after apply)

The local provider offers these resources (a minimal example follows this list):

  • local_file - Creates a file with specified content
  • local_sensitive_file - Same, but the content is hidden in logs and plan output
  • null_resource - From the separate hashicorp/null provider; creates nothing itself but is a useful hook for provisioners
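
A minimal main.tf for this project might look like the following (a sketch; the content and path match the plan output shown above):

terraform {
  required_providers {
    local = {
      source  = "hashicorp/local"
      version = "~> 2.4"
    }
  }
}

# A single managed file: apply creates it, destroy removes it
resource "local_file" "config" {
  content  = "environment=development"
  filename = "${path.module}/config/app.conf"
}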

Questions to explore:

  1. What happens if you manually edit a file Terraform created?
  2. What happens if you delete the state file and run plan again?
  3. How does Terraform handle dependencies between resources?
  4. What’s the difference between terraform.tfstate and terraform.tfstate.backup?

Use terraform state list and terraform state show <resource> to inspect state.

Learning milestones:

  1. init/plan/apply works → You understand the basic workflow
  2. You can read plan output → You understand what Terraform will do before it does it
  3. You understand state files → You know why state is critical
  4. You handle drift correctly → You understand real-world state management

Project 2: Static Website on S3 (Your First Cloud Resource)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: JSON, Pulumi (TypeScript), AWS CDK
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Cloud Storage / Static Hosting
  • Software or Tool: Terraform, AWS S3, AWS CLI
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A fully functional static website hosted on AWS S3, including bucket configuration, public access policies, and website hosting settings—all managed through Terraform.

Why it teaches Terraform: This is the classic “Hello World” of cloud Terraform. You’ll learn how providers work, how to configure authentication, and how to translate AWS console clicking into code. Most importantly, you’ll see your infrastructure in a real cloud.

Core challenges you’ll face:

  • AWS authentication setup → maps to provider configuration and credentials
  • Bucket naming uniqueness → maps to global namespace constraints
  • Public access configuration → maps to AWS security model evolution
  • Website endpoint configuration → maps to AWS-specific resource attributes

Key Concepts:

Difficulty: Beginner
Time estimate: Weekend
Prerequisites: AWS account, AWS CLI configured, Project 1 completed

Real world outcome:

$ terraform apply
aws_s3_bucket.website: Creating...
aws_s3_bucket.website: Creation complete after 2s [id=my-terraform-website-12345]
aws_s3_bucket_website_configuration.website: Creating...
aws_s3_bucket_website_configuration.website: Creation complete after 1s

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

Outputs:

website_url = "http://my-terraform-website-12345.s3-website-us-east-1.amazonaws.com"

$ curl http://my-terraform-website-12345.s3-website-us-east-1.amazonaws.com
<!DOCTYPE html>
<html>
<head><title>My Terraform Website</title></head>
<body><h1>Hello from Terraform!</h1></body>
</html>

Implementation Hints:

AWS S3 website hosting requires multiple resources:

  1. aws_s3_bucket - The bucket itself
  2. aws_s3_bucket_public_access_block - Control public access settings
  3. aws_s3_bucket_policy - Define who can read the bucket
  4. aws_s3_bucket_website_configuration - Enable website hosting
  5. aws_s3_object - Upload your HTML files

Modern AWS requires explicit public access configuration (sketched after this list). You must:

  • Disable “Block Public Access” settings
  • Attach a bucket policy allowing s3:GetObject
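
A sketch of those two pieces, assuming the bucket is declared as aws_s3_bucket.website (as in the apply output above):

resource "aws_s3_bucket_public_access_block" "website" {
  bucket = aws_s3_bucket.website.id

  block_public_acls       = false
  block_public_policy     = false
  ignore_public_acls      = false
  restrict_public_buckets = false
}

resource "aws_s3_bucket_policy" "website" {
  bucket = aws_s3_bucket.website.id

  # Ordering matters: the policy can only attach once public policies are allowed
  depends_on = [aws_s3_bucket_public_access_block.website]

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "PublicRead"
      Effect    = "Allow"
      Principal = "*"
      Action    = "s3:GetObject"
      Resource  = "${aws_s3_bucket.website.arn}/*"
    }]
  })
}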

Use terraform output to get the website URL after deployment.

Questions to explore:

  1. What happens if you change the bucket name after it’s created?
  2. How do you upload multiple files (hint: for_each)?
  3. What’s the difference between aws_s3_bucket_acl and aws_s3_bucket_policy?

Learning milestones:

  1. AWS provider authenticates → You understand cloud credentials
  2. Website is accessible → You created real cloud infrastructure
  3. You can update content → You understand resource updates vs recreation
  4. destroy removes everything → You understand infrastructure lifecycle

Project 3: VPC from Scratch (Network Architecture)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Pulumi (Python), AWS CDK (TypeScript)
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Cloud Networking / VPC Design
  • Software or Tool: Terraform, AWS VPC, AWS Subnets
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A complete AWS VPC with public and private subnets across multiple availability zones, internet gateway, NAT gateway, and proper route tables.

Why it teaches Terraform: Networking is where most cloud architectures fail or succeed. This project forces you to understand resource dependencies (subnets depend on VPC, route tables depend on gateways), CIDR block math, and how Terraform handles complex interdependencies.

Core challenges you’ll face:

  • CIDR block planning → maps to network address space design
  • Resource dependency chains → maps to Terraform’s implicit dependency detection
  • Multi-AZ distribution → maps to using count/for_each with availability zones
  • Route table associations → maps to many-to-many resource relationships

Key Concepts:

  • VPC Fundamentals: “AWS Certified Solutions Architect Study Guide” Chapter 5 - Ben Piper
  • Terraform Count and For_Each: “Terraform: Up & Running” Chapter 5 - Yevgeniy Brikman
  • CIDR Notation: “Computer Networks” by Tanenbaum - Chapter on IP Addressing
  • AWS Networking: AWS Documentation - VPC User Guide

Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Basic networking (IP addresses, CIDR), Projects 1-2 completed

Real world outcome:

$ terraform apply
...
Apply complete! Resources: 15 added, 0 changed, 0 destroyed.

Outputs:

vpc_id = "vpc-0abc123def456"
public_subnets = [
  "subnet-0pub1abc",
  "subnet-0pub2def",
]
private_subnets = [
  "subnet-0priv1abc",
  "subnet-0priv2def",
]
nat_gateway_ip = "54.123.45.67"

# Verify with AWS CLI
$ aws ec2 describe-vpcs --vpc-ids vpc-0abc123def456
{
    "Vpcs": [{
        "CidrBlock": "10.0.0.0/16",
        "State": "available"
    }]
}

Implementation Hints:

VPC architecture pattern:

VPC (10.0.0.0/16)
├── Public Subnet AZ-a (10.0.1.0/24) → Internet Gateway
├── Public Subnet AZ-b (10.0.2.0/24) → Internet Gateway
├── Private Subnet AZ-a (10.0.10.0/24) → NAT Gateway
└── Private Subnet AZ-b (10.0.20.0/24) → NAT Gateway

Resources you’ll create:

  • aws_vpc - The virtual network
  • aws_subnet - Network segments (use for_each for multiple)
  • aws_internet_gateway - Public internet access
  • aws_nat_gateway - Private subnet internet access (outbound only)
  • aws_eip - Elastic IP for NAT gateway
  • aws_route_table - Routing rules
  • aws_route_table_association - Link subnets to route tables

Use the cidrsubnet() function to calculate subnet CIDR blocks:

cidrsubnet("10.0.0.0/16", 8, 1)  # Returns "10.0.1.0/24"
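
One way to stamp out a subnet per availability zone with for_each (a sketch; the AZ list is illustrative and aws_vpc.main is assumed to exist):

variable "azs" {
  type    = list(string)
  default = ["us-east-1a", "us-east-1b"]
}

resource "aws_subnet" "public" {
  # Map each AZ name to its index so cidrsubnet() gets a unique netnum
  for_each = { for i, az in var.azs : az => i }

  vpc_id            = aws_vpc.main.id
  availability_zone = each.key
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, each.value + 1)

  tags = {
    Name = "public-${each.key}"
  }
}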

Questions to explore:

  1. Why do private subnets need a NAT gateway?
  2. What’s the cost difference between Internet Gateway and NAT Gateway?
  3. How do you handle multiple NAT gateways for high availability?
  4. What happens if you delete the Internet Gateway while instances are running?

Learning milestones:

  1. VPC with subnets created → You understand AWS networking basics
  2. Instances in public subnet have internet → You understand routing
  3. Instances in private subnet can reach internet outbound → You understand NAT
  4. You use for_each effectively → You understand Terraform iteration

Project 4: EC2 Web Server with Security Groups

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Pulumi (Go), Ansible + Terraform
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Compute / Security / Configuration
  • Software or Tool: Terraform, AWS EC2, Security Groups
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: An EC2 instance running a web server (nginx), with proper security groups, SSH key management, and user data scripts—deployed into your VPC from Project 3.

Why it teaches Terraform: This is where Terraform meets actual compute. You’ll learn about data sources (finding AMIs), provisioners (bootstrapping instances), security groups (cloud firewalls), and how to wire compute into your network.

Core challenges you’ll face:

  • Finding the right AMI → maps to data sources and dynamic lookups
  • SSH key management → maps to sensitive data in Terraform
  • User data scripts → maps to cloud-init and instance bootstrapping
  • Security group rules → maps to ingress/egress and CIDR blocks

Key Concepts:

  • EC2 Fundamentals: “Terraform: Up & Running” Chapter 2 - Yevgeniy Brikman
  • Data Sources: HashiCorp Docs - Data Sources
  • User Data Scripts: AWS Documentation - Run Commands on Launch
  • Security Groups: “AWS Certified Solutions Architect Study Guide” - Ben Piper

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Basic Linux, SSH, Projects 1-3 completed

Real world outcome:

$ terraform apply
...
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

Outputs:

instance_public_ip = "54.234.56.78"
ssh_command = "ssh -i ~/.ssh/terraform-key ubuntu@54.234.56.78"

$ curl http://54.234.56.78
<!DOCTYPE html>
<html>
<head><title>Welcome to nginx!</title></head>
<body>
<h1>Deployed with Terraform!</h1>
<p>Instance ID: i-0abc123def456</p>
</body>
</html>

$ ssh -i ~/.ssh/terraform-key ubuntu@54.234.56.78
ubuntu@ip-10-0-1-15:~$ systemctl status nginx
● nginx.service - A high performance web server
   Active: active (running)

Implementation Hints:

Use data source to find latest Ubuntu AMI:

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]  # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

User data script for nginx:

#!/bin/bash
apt-get update
apt-get install -y nginx
systemctl start nginx
echo "<h1>Deployed with Terraform!</h1>" > /var/www/html/index.html

Security group pattern (a sketch follows this list):

  • Ingress: Allow SSH (22) from your IP, HTTP (80) from anywhere
  • Egress: Allow all outbound
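
A minimal sketch of that pattern (the VPC reference is assumed from Project 3, and the SSH CIDR is a placeholder for your own IP):

resource "aws_security_group" "web" {
  name   = "web-server"
  vpc_id = aws_vpc.main.id

  ingress {
    description = "SSH from my IP only"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.10/32"] # placeholder: replace with your IP
  }

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    description = "Allow all outbound"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}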

Questions to explore:

  1. What happens if user data script fails?
  2. How do you update the instance without destroying it?
  3. What’s the difference between aws_security_group and aws_security_group_rule?
  4. How do you handle AMI updates (new image published)?

Learning milestones:

  1. Instance launches and is reachable → You understand compute basics
  2. User data script runs successfully → You understand bootstrapping
  3. Security groups work correctly → You understand cloud firewalls
  4. You can SSH into the instance → You understand key management

Project 5: RDS Database with Secrets Management

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Pulumi (Python), CDKTF
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Databases / Secrets / Security
  • Software or Tool: Terraform, AWS RDS, AWS Secrets Manager
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A managed PostgreSQL database on AWS RDS, with credentials stored in AWS Secrets Manager, subnet groups for multi-AZ, and proper security group configuration allowing access only from your application tier.

Why it teaches Terraform: Databases are stateful—destroying them loses data. This project teaches you about prevent_destroy lifecycle rules, sensitive outputs, secrets management, and how to handle resources that require special care.

Core challenges you’ll face:

  • Handling sensitive values → maps to Terraform sensitive variables and outputs
  • Database subnet groups → maps to RDS networking requirements
  • Lifecycle management → maps to prevent_destroy, create_before_destroy
  • Connecting secrets to resources → maps to data source for secrets retrieval

Key Concepts:

  • RDS Configuration: “Terraform: Up & Running” Chapter 3 - Yevgeniy Brikman
  • Sensitive Variables: HashiCorp Docs - Sensitive Variables
  • Lifecycle Meta-Arguments: HashiCorp Docs - Lifecycle
  • Secrets Manager Integration: AWS Documentation

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Basic SQL/database concepts, Projects 1-4 completed

Real world outcome:

$ terraform apply
...
aws_db_instance.postgres: Creating...
aws_db_instance.postgres: Still creating... [4m30s elapsed]
aws_db_instance.postgres: Creation complete after 5m12s

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

Outputs:

db_endpoint = "mydb.abc123.us-east-1.rds.amazonaws.com:5432"
db_name = "appdb"
secret_arn = "arn:aws:secretsmanager:us-east-1:123456789:secret:db-creds-AbC123"

# Connect from EC2 instance in same VPC
$ psql -h mydb.abc123.us-east-1.rds.amazonaws.com -U admin -d appdb
Password: ********
appdb=> \dt
         List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
(0 rows)

Implementation Hints:

Generate a random password and store in Secrets Manager:

resource "random_password" "db" {
  length  = 24
  special = true
}

resource "aws_secretsmanager_secret" "db" {
  name = "app/database/credentials"
}

resource "aws_secretsmanager_secret_version" "db" {
  secret_id     = aws_secretsmanager_secret.db.id
  secret_string = jsonencode({
    username = "admin"
    password = random_password.db.result
  })
}

Protect the database from accidental destruction:

resource "aws_db_instance" "postgres" {
  # ... other config ...

  lifecycle {
    prevent_destroy = true
  }

  # For real protection
  deletion_protection = true
}

Database subnet group requires subnets in multiple AZs:

resource "aws_db_subnet_group" "main" {
  name       = "main"
  subnet_ids = aws_subnet.private[*].id
}
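
Tying the pieces together in the database resource itself (a sketch; the sizes are illustrative and aws_security_group.db is an assumed security group for the database tier):

resource "aws_db_instance" "postgres" {
  identifier     = "appdb"
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.t3.micro"

  allocated_storage = 20
  db_name           = "appdb"
  username          = "admin"
  password          = random_password.db.result

  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.db.id]

  # Acceptable while learning; production should take a final snapshot
  skip_final_snapshot = true
}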

Questions to explore:

  1. What happens if you try to terraform destroy with prevent_destroy?
  2. How do you rotate database credentials with Terraform?
  3. What’s the difference between skip_final_snapshot and production settings?
  4. How does your EC2 instance retrieve the password from Secrets Manager?

Learning milestones:

  1. RDS instance is accessible from EC2 → You understand VPC security
  2. Credentials are in Secrets Manager → You understand secrets handling
  3. destroy is blocked by lifecycle → You understand protection mechanisms
  4. You can rotate credentials → You understand secret rotation patterns

Project 6: Terraform Modules (Reusability)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: CDKTF (TypeScript/Python), Pulumi
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Software Architecture / Code Reuse
  • Software or Tool: Terraform, Terraform Registry
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A library of reusable Terraform modules—VPC module, EC2 module, RDS module—that encapsulate your previous projects and can be composed together. You’ll publish to a private module registry.

Why it teaches Terraform: Modules are the key to scalable Terraform. This project teaches you abstraction, interface design, versioning, and how professional teams structure their infrastructure code. You’ll learn why “copy-paste” doesn’t scale.

Core challenges you’ll face:

  • Designing module interfaces → maps to what to expose as variables vs hide
  • Module composition → maps to passing outputs as inputs between modules
  • Versioning strategy → maps to semantic versioning for infrastructure
  • Documentation → maps to README, input/output descriptions

Key Concepts:

Difficulty: Advanced
Time estimate: 2 weeks
Prerequisites: Projects 1-5 completed, understanding of software design principles

Real world outcome:

# Using your modules in a new project
$ cat main.tf
module "network" {
  source  = "git::https://github.com/yourname/terraform-aws-vpc.git?ref=v1.2.0"

  cidr_block  = "10.0.0.0/16"
  environment = "production"
  azs         = ["us-east-1a", "us-east-1b"]
}

module "webserver" {
  source = "git::https://github.com/yourname/terraform-aws-ec2.git?ref=v2.0.0"

  vpc_id    = module.network.vpc_id
  subnet_id = module.network.public_subnets[0]

  instance_type = "t3.small"
  user_data     = file("${path.module}/scripts/init.sh")
}

module "database" {
  source = "git::https://github.com/yourname/terraform-aws-rds.git?ref=v1.0.0"

  vpc_id             = module.network.vpc_id
  subnet_ids         = module.network.private_subnets
  allowed_cidr_blocks = [module.network.vpc_cidr]

  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.t3.micro"
}

$ terraform init
Initializing modules...
Downloading git::https://github.com/yourname/terraform-aws-vpc.git?ref=v1.2.0...
Downloading git::https://github.com/yourname/terraform-aws-ec2.git?ref=v2.0.0...
Downloading git::https://github.com/yourname/terraform-aws-rds.git?ref=v1.0.0...

$ terraform apply
Apply complete! Resources: 24 added, 0 changed, 0 destroyed.

Implementation Hints:

Module directory structure:

modules/
├── vpc/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   ├── versions.tf
│   └── README.md
├── ec2/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   └── README.md
└── rds/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    └── README.md
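
Before publishing anywhere, you can consume one of these modules straight from a local path (a sketch; the inputs are illustrative):

module "vpc" {
  source = "./modules/vpc"

  cidr_block  = "10.0.0.0/16"
  environment = "dev"
}

output "vpc_id" {
  value = module.vpc.vpc_id
}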

Good module interface design principles:

  1. Minimal required inputs - Most things should have sensible defaults
  2. Predictable outputs - Output everything consumers might need
  3. Clear naming - Variable names should be self-documenting
  4. Validation - Use validation blocks to catch errors early

Variable validation example:

variable "environment" {
  type        = string
  description = "Environment name (dev, staging, prod)"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

Questions to explore:

  1. When should you create a new module vs extend an existing one?
  2. How do you handle breaking changes in module versions?
  3. What’s the trade-off between flexible modules and opinionated modules?
  4. How do you test modules before releasing new versions?

Learning milestones:

  1. Modules work from local paths → You understand basic module structure
  2. Modules work from Git refs → You understand versioning
  3. You compose multiple modules → You understand module interfaces
  4. You document modules properly → You understand professional practices

Project 7: Remote State and Workspaces (Team Collaboration)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: N/A (Terraform-specific feature)
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: State Management / Collaboration
  • Software or Tool: Terraform, AWS S3, DynamoDB
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A remote backend configuration using S3 for state storage and DynamoDB for state locking, with workspace-based environment separation (dev/staging/prod).

Why it teaches Terraform: State is Terraform’s Achilles heel—lose it and you lose your ability to manage infrastructure. This project teaches the production-grade patterns that every team needs: remote state, locking, and environment isolation.

Core challenges you’ll face:

  • Backend configuration chicken-and-egg → maps to bootstrapping state backend
  • State locking → maps to concurrent access prevention
  • Workspace isolation → maps to environment separation patterns
  • State migration → maps to moving from local to remote

Key Concepts:

  • Remote State: “Terraform: Up & Running” Chapter 3 - Yevgeniy Brikman
  • State Locking: HashiCorp Docs - State Locking
  • Workspaces: HashiCorp Docs - Workspaces
  • Backend Configuration: HashiCorp Docs - S3 Backend

Difficulty: Intermediate
Time estimate: Weekend
Prerequisites: Projects 1-6 completed, understanding of DynamoDB basics

Real world outcome:

# Initial setup (run once to create backend infrastructure)
$ cd bootstrap/
$ terraform apply
Apply complete! Resources: 3 added

# Configure backend in your main project
$ cat backend.tf
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-12345"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

# Migrate existing state
$ terraform init -migrate-state
Initializing the backend...
Copying state from local to S3...
Successfully configured the backend "s3"!

# Work with multiple environments
$ terraform workspace new staging
Created and switched to workspace "staging"!

$ terraform workspace new production
Created and switched to workspace "production"!

$ terraform workspace list
  default
  staging
* production

# If someone else tries to apply at the same time...
$ terraform apply
Acquiring state lock...
Error: Error acquiring the state lock

Lock Info:
  ID:        abc123-def456
  Path:      s3://my-terraform-state-12345/infrastructure/terraform.tfstate
  Operation: apply
  Who:       colleague@laptop
  Created:   2025-01-15 10:30:00

Implementation Hints:

Bootstrap configuration (run first, stores state locally):

# bootstrap/main.tf
resource "aws_s3_bucket" "state" {
  bucket = "my-terraform-state-${random_id.suffix.hex}"
}

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

Workspace-aware configuration:

locals {
  environment = terraform.workspace

  instance_type = {
    dev        = "t3.micro"
    staging    = "t3.small"
    production = "t3.large"
  }
}

resource "aws_instance" "app" {
  instance_type = local.instance_type[local.environment]
  # ...
}

Questions to explore:

  1. What happens if you delete the DynamoDB table while someone holds a lock?
  2. How do you recover from a stuck lock?
  3. What’s the difference between workspaces and separate state files?
  4. How do you share outputs between workspaces (hint: terraform_remote_state)?

Learning milestones:

  1. State is in S3 → You understand remote state basics
  2. Locking prevents conflicts → You understand collaboration safety
  3. Workspaces separate environments → You understand isolation
  4. You migrate state successfully → You understand state management

Project 8: Docker Infrastructure with Terraform

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Docker Compose (for comparison)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Containers / Local Development
  • Software or Tool: Terraform, Docker Provider
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A local development environment using Terraform’s Docker provider—managing containers, networks, and volumes the same way you manage cloud infrastructure.

Why it teaches Terraform: Terraform isn’t just for clouds. This project proves that Terraform’s model applies to any API. You’ll understand that a “provider” is just an API translator, and Terraform’s value is the workflow, not the cloud integration.

Core challenges you’ll face:

  • Docker provider configuration → maps to non-cloud provider patterns
  • Container lifecycle → maps to resource update behaviors
  • Network configuration → maps to cross-resource dependencies
  • Volume persistence → maps to stateful resource handling

Key Concepts:

  • Docker Provider: Terraform Registry - kreuzwerker/docker
  • Container Orchestration: Docker Documentation
  • Terraform Providers: “Terraform: Up & Running” Chapter 7 - Yevgeniy Brikman

Difficulty: Intermediate
Time estimate: Weekend
Prerequisites: Basic Docker knowledge, Projects 1-3 completed

Real world outcome:

$ terraform apply
docker_network.app_network: Creating...
docker_volume.postgres_data: Creating...
docker_network.app_network: Creation complete after 1s
docker_volume.postgres_data: Creation complete after 0s
docker_container.postgres: Creating...
docker_container.postgres: Creation complete after 2s
docker_container.redis: Creating...
docker_container.redis: Creation complete after 1s
docker_container.app: Creating...
docker_container.app: Creation complete after 1s

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

Outputs:

app_url = "http://localhost:8080"
postgres_port = 5432

$ docker ps
CONTAINER ID   IMAGE          STATUS         PORTS                    NAMES
abc123         myapp:latest   Up 2 minutes   0.0.0.0:8080->8080/tcp   app
def456         postgres:15    Up 2 minutes   0.0.0.0:5432->5432/tcp   postgres
ghi789         redis:7        Up 2 minutes   6379/tcp                 redis

$ curl http://localhost:8080/health
{"status": "healthy", "database": "connected", "cache": "connected"}

Implementation Hints:

Docker provider configuration:

terraform {
  required_providers {
    docker = {
      source  = "kreuzwerker/docker"
      version = "~> 3.0"
    }
  }
}

provider "docker" {
  host = "unix:///var/run/docker.sock"
}

Multi-container application:

resource "docker_network" "app" {
  name = "app-network"
}

resource "docker_volume" "postgres_data" {
  name = "postgres-data"
}

resource "docker_container" "postgres" {
  name  = "postgres"
  image = docker_image.postgres.image_id

  networks_advanced {
    name = docker_network.app.name
  }

  volumes {
    volume_name    = docker_volume.postgres_data.name
    container_path = "/var/lib/postgresql/data"
  }

  env = [
    "POSTGRES_PASSWORD=secret"
  ]
}

Questions to explore:

  1. How does Terraform handle container image updates?
  2. What’s the difference between Docker Compose and Terraform for containers?
  3. How do you handle container logs and debugging?
  4. Can you use Terraform to build Docker images?

Learning milestones:

  1. Containers start via Terraform → You understand provider universality
  2. Containers communicate → You understand Terraform networking
  3. Data persists after destroy/apply → You understand volumes
  4. You see when Terraform shines vs Docker Compose → You understand tool selection

Project 9: Multi-Environment Deployment (Dev/Staging/Prod)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Terragrunt (HCL extension)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Environment Management / GitOps
  • Software or Tool: Terraform, Git, Terragrunt (optional)
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A complete infrastructure deployment pipeline supporting multiple environments (dev/staging/prod) with different configurations, proper variable management, and environment promotion workflows.

Why it teaches Terraform: Real infrastructure has environments. This project teaches the patterns for managing configuration differences, preventing “works on dev, breaks on prod” scenarios, and creating safe promotion paths.

Core challenges you’ll face:

  • Configuration variance → maps to tfvars files and environment-specific values
  • DRY principle → maps to modules, locals, and variable defaults
  • Environment isolation → maps to separate state files and accounts
  • Promotion workflow → maps to GitOps and PR-based deployments

Key Concepts:

  • Multi-Environment Patterns: “Terraform: Up & Running” Chapter 5 - Yevgeniy Brikman
  • Terragrunt: Gruntwork.io - Terragrunt Documentation
  • GitOps for Infrastructure: HashiCorp Blog - GitOps Patterns

Difficulty: Advanced
Time estimate: 2 weeks
Prerequisites: Projects 1-7 completed, Git workflow understanding

Real world outcome:

# Directory structure
$ tree environments/
environments/
├── dev/
│   ├── main.tf
│   ├── backend.tf
│   └── terraform.tfvars
├── staging/
│   ├── main.tf
│   ├── backend.tf
│   └── terraform.tfvars
└── prod/
    ├── main.tf
    ├── backend.tf
    └── terraform.tfvars

# Deploy to dev
$ cd environments/dev
$ terraform apply
Apply complete! Resources: 12 added

# Compare environments
$ terraform-docs markdown table environments/dev > docs/dev.md
$ diff environments/dev/terraform.tfvars environments/prod/terraform.tfvars
1c1
< environment     = "dev"
---
> environment     = "prod"
3c3
< instance_type   = "t3.micro"
---
> instance_type   = "t3.large"
5c5
< db_instance_class = "db.t3.micro"
---
> db_instance_class = "db.r5.large"

# Promote to staging (via Git PR)
$ git checkout -b promote-to-staging
$ cp environments/dev/terraform.tfvars environments/staging/
$ git add . && git commit -m "Promote dev config to staging"
$ git push && gh pr create

Implementation Hints:

Environment-specific tfvars:

# environments/dev/terraform.tfvars
environment       = "dev"
instance_type     = "t3.micro"
instance_count    = 1
db_instance_class = "db.t3.micro"
enable_monitoring = false

# environments/prod/terraform.tfvars
environment       = "prod"
instance_type     = "t3.large"
instance_count    = 3
db_instance_class = "db.r5.large"
enable_monitoring = true
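
Each environment also gets its own state file. A sketch of environments/prod/backend.tf, reusing the S3 backend from Project 7 with a different key:

terraform {
  backend "s3" {
    bucket         = "my-terraform-state-12345"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}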

Shared module with environment-aware defaults:

# modules/app/variables.tf
variable "environment" {
  type = string
}

variable "enable_monitoring" {
  type    = bool
  default = false
}

variable "alarm_actions" {
  type    = list(string)
  default = []
}

# modules/app/main.tf
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  count = var.enable_monitoring ? 1 : 0

  alarm_name    = "${var.environment}-high-cpu"
  alarm_actions = var.alarm_actions
  # ...
}

Terragrunt alternative (DRYer approach):

# terragrunt.hcl in each environment
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../modules//app"
}

inputs = {
  environment   = "dev"
  instance_type = "t3.micro"
}

Questions to explore:

  1. Should environments share state or have separate state files?
  2. How do you handle environment-specific resources (e.g., only prod has CDN)?
  3. What’s the role of Git branches in environment management?
  4. How do you prevent accidental production changes?

Learning milestones:

  1. Same code deploys to multiple environments → You understand variable management
  2. Environments have different resources → You understand conditional logic
  3. Promotion requires approval → You understand GitOps patterns
  4. Prod is protected → You understand access control

Project 10: Kubernetes Cluster with EKS

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: eksctl, Pulumi (TypeScript)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Container Orchestration / Kubernetes
  • Software or Tool: Terraform, AWS EKS, kubectl
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A production-ready EKS cluster with managed node groups, proper IAM roles, VPC CNI networking, and cluster add-ons (CoreDNS, kube-proxy, VPC CNI).

Why it teaches Terraform: EKS is among the most complex AWS services to provision. This project forces you to understand IAM roles for service accounts, cross-resource dependencies, and how Terraform handles resources that take 15+ minutes to create.

Core challenges you’ll face:

  • IAM role trust policies → maps to assume_role and service-linked roles
  • EKS networking complexity → maps to VPC CNI and pod networking
  • Long creation times → maps to Terraform timeouts and patience
  • Kubernetes provider bootstrap → maps to provider dependencies on resources

Key Concepts:

Difficulty: Expert
Time estimate: 2-3 weeks
Prerequisites: Kubernetes basics, Docker, Projects 1-9 completed

Real world outcome:

$ terraform apply
module.eks.aws_eks_cluster.this: Creating...
module.eks.aws_eks_cluster.this: Still creating... [10m elapsed]
module.eks.aws_eks_cluster.this: Creation complete after 12m3s
module.eks.aws_eks_node_group.this: Creating...
module.eks.aws_eks_node_group.this: Still creating... [5m elapsed]
module.eks.aws_eks_node_group.this: Creation complete after 6m45s

Apply complete! Resources: 32 added, 0 changed, 0 destroyed.

Outputs:

cluster_endpoint = "https://ABC123.sk1.us-east-1.eks.amazonaws.com"
cluster_name = "my-terraform-cluster"

# Configure kubectl
$ aws eks update-kubeconfig --name my-terraform-cluster --region us-east-1
Added new context arn:aws:eks:us-east-1:123456789:cluster/my-terraform-cluster

$ kubectl get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ip-10-0-1-123.ec2.internal                Ready    <none>   5m    v1.28
ip-10-0-2-234.ec2.internal                Ready    <none>   5m    v1.28

$ kubectl run nginx --image=nginx
pod/nginx created

$ kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          30s

Implementation Hints:

Use the community-maintained terraform-aws-modules EKS module (don’t reinvent the wheel):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "my-cluster"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    default = {
      min_size     = 2
      max_size     = 4
      desired_size = 2

      instance_types = ["t3.medium"]
    }
  }
}

If building from scratch, understand the dependency chain (step 1 is sketched after this list):

  1. IAM role for EKS cluster (trust policy: eks.amazonaws.com)
  2. EKS cluster (takes ~10 minutes)
  3. IAM role for node groups (trust policy: ec2.amazonaws.com)
  4. Node groups (take ~5 minutes each)
  5. aws-auth ConfigMap (for kubectl access)
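
A sketch of step 1, the cluster role that EKS assumes (the role name is illustrative):

resource "aws_iam_role" "eks_cluster" {
  name = "eks-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = "sts:AssumeRole"
      Principal = {
        Service = "eks.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster" {
  role       = aws_iam_role.eks_cluster.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}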

Kubernetes provider configuration after EKS:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

Questions to explore:

  1. Why does EKS take so long to create?
  2. What’s the difference between managed and self-managed node groups?
  3. How do you add cluster add-ons (metrics-server, ingress controller)?
  4. How do you handle EKS upgrades?

Learning milestones:

  1. Cluster is running → You understand EKS basics
  2. kubectl works → You understand authentication flow
  3. Pods schedule correctly → You understand node groups
  4. You can deploy applications → You understand the full workflow

Project 11: CI/CD Pipeline for Terraform (GitOps)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: YAML (GitHub Actions), HCL
  • Alternative Programming Languages: GitLab CI, Jenkins, Atlantis
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: CI/CD / DevOps / Automation
  • Software or Tool: GitHub Actions, Terraform, tflint, tfsec
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A complete CI/CD pipeline that runs terraform fmt, terraform validate, security scanning, terraform plan on PRs (with plan output as comments), and terraform apply on merge to main.

Why it teaches Terraform: Manual Terraform runs don’t scale. This project teaches the professional workflow: code review for infrastructure changes, automated validation, and safe automated deployments.

Core challenges you’ll face:

  • Credential management in CI → maps to OIDC, secrets, and least privilege
  • Plan output as PR comment → maps to Terraform output parsing
  • Concurrency control → maps to preventing parallel applies
  • Approval workflows → maps to gating production deployments

Key Concepts:

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Git, GitHub Actions basics, Projects 1-10 completed

Real world outcome:

# .github/workflows/terraform.yml
name: Terraform

on:
  pull_request:
    paths: ['**.tf']
  push:
    branches: [main]
    paths: ['**.tf']

# OIDC role assumption and PR comments need these token permissions
permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Format Check
        run: terraform fmt -check -recursive

      - name: Terraform Init
        run: terraform init -backend=false

      - name: Terraform Validate
        run: terraform validate

      - name: tfsec Security Scan
        uses: aquasecurity/tfsec-action@v1.0.0

  plan:
    if: github.event_name == 'pull_request'
    needs: validate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/github-actions
          aws-region: us-east-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init

      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=tfplan

      - name: Comment Plan on PR
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '## Terraform Plan\n```\n${{ steps.plan.outputs.stdout }}\n```'
            })

  apply:
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    needs: validate
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/github-actions
          aws-region: us-east-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init

      - name: Terraform Apply
        run: terraform apply -auto-approve
# PR Comment example:
## Terraform Plan

Terraform will perform the following actions:

  # aws_instance.web will be updated in-place
  ~ resource "aws_instance" "web" {
      ~ instance_type = "t3.micro" -> "t3.small"
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Implementation Hints:

OIDC for AWS (no long-lived secrets):

# Create in AWS first
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d..."]
}

resource "aws_iam_role" "github_actions" {
  name = "github-actions"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.github.arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          "token.actions.githubusercontent.com:sub" = "repo:yourorg/yourrepo:*"
        }
      }
    }]
  })
}

Concurrency control:

concurrency:
  group: terraform-${{ github.ref }}
  cancel-in-progress: false

Questions to explore:

  1. How do you handle multiple environments in the same repo?
  2. What’s the difference between GitHub Actions and Atlantis?
  3. How do you implement manual approval for production?
  4. How do you handle Terraform state locking conflicts in CI?

Learning milestones:

  1. Format/validate runs on every PR → You understand basic CI
  2. Plan appears as PR comment → You understand review workflows
  3. Apply only runs on main → You understand gated deployments
  4. No long-lived credentials → You understand security best practices

Project 12: Import Existing Infrastructure

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Terraformer (auto-generator)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Migration / Legacy Systems
  • Software or Tool: Terraform, Terraformer, AWS CLI
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: Take existing manually-created AWS infrastructure (EC2, VPC, RDS created via console) and bring it under Terraform management without destroying and recreating resources.

Why it teaches Terraform: Real-world Terraform adoption almost never starts greenfield. This project teaches the critical skill of importing existing resources, handling state without disruption, and the detective work required to reverse-engineer infrastructure.

Core challenges you’ll face:

  • Discovering existing resources → maps to AWS CLI and console archaeology
  • Writing matching configuration → maps to reverse engineering infrastructure
  • Import without state drift → maps to plan showing no changes
  • Handling dependencies → maps to import order matters

Key Concepts:

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: AWS CLI proficiency, Projects 1-7 completed

Real world outcome:

# Step 1: Discover what exists
$ aws ec2 describe-instances --query 'Reservations[].Instances[].{ID:InstanceId,Type:InstanceType,VPC:VpcId}'
[
    {
        "ID": "i-0abc123def456",
        "Type": "t3.small",
        "VPC": "vpc-0xyz789"
    }
]

# Step 2: Write matching config (modern Terraform 1.5+ way)
$ cat imports.tf
import {
  to = aws_instance.legacy_web
  id = "i-0abc123def456"
}

import {
  to = aws_vpc.legacy
  id = "vpc-0xyz789"
}

# Step 3: Generate config from imports
$ terraform plan -generate-config-out=generated.tf
Planning...
aws_instance.legacy_web: Preparing import... [id=i-0abc123def456]
aws_vpc.legacy: Preparing import... [id=vpc-0xyz789]

# Review generated config
$ cat generated.tf
resource "aws_instance" "legacy_web" {
  ami                    = "ami-12345678"
  instance_type          = "t3.small"
  subnet_id              = "subnet-abc123"
  vpc_security_group_ids = ["sg-xyz789"]
  # ... full configuration
}

# Step 4: Apply import
$ terraform apply
aws_instance.legacy_web: Importing... [id=i-0abc123def456]
aws_instance.legacy_web: Import complete

# Step 5: Verify no drift
$ terraform plan
No changes. Your infrastructure matches the configuration.

Implementation Hints:

Old import method (still useful):

# Write the resource block first (empty or guessed)
resource "aws_instance" "legacy" {
  # TODO: fill in after import
}

# Import into state
terraform import aws_instance.legacy i-0abc123def456

# Now terraform plan will show what's different
terraform plan
# Add missing attributes until plan shows "no changes"

Terraformer for bulk import:

# Install terraformer
brew install terraformer

# Import all EC2 instances
terraformer import aws --resources=ec2_instance --regions=us-east-1

# Output is in generated/aws/ec2_instance/

Common gotchas:

  • Some attributes are “import-only” and not in normal output
  • Security groups often have circular dependencies
  • Some resources can’t be imported (must recreate)
  • Order matters: VPC before subnets before instances

Questions to explore:

  1. What resources can’t be imported?
  2. How do you handle resources created by other tools (CloudFormation)?
  3. What’s the strategy for importing a 500-resource account?
  4. How do you handle imports across multiple Terraform states?

Learning milestones:

  1. Single resource imports cleanly → You understand import basics
  2. Plan shows no changes → You matched the config perfectly
  3. Multiple dependent resources import → You understand import order
  4. You document the import process → You can repeat it for others

Project 13: Serverless Infrastructure (Lambda + API Gateway)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL, Python (Lambda code)
  • Alternative Programming Languages: Serverless Framework, AWS SAM
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Serverless / FaaS / API Management
  • Software or Tool: Terraform, AWS Lambda, API Gateway
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A serverless API with Lambda functions behind API Gateway, including IAM roles, CloudWatch logs, and a custom domain with SSL.

Why it teaches Terraform: Serverless is the opposite of traditional infrastructure—no servers to manage. This project shows Terraform’s versatility and teaches the IAM permission model that governs all AWS services.

Core challenges you’ll face:

  • Lambda packaging → maps to deployment packages and layers
  • API Gateway complexity → maps to routes, methods, integrations
  • IAM permissions → maps to least privilege for Lambda execution
  • Custom domains → maps to ACM certificates and Route53

Key Concepts:

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Python basics, Projects 1-7 completed

Real world outcome:

$ terraform apply
aws_lambda_function.api: Creating...
aws_apigatewayv2_api.main: Creating...
...
Apply complete! Resources: 12 added, 0 changed, 0 destroyed.

Outputs:

api_endpoint = "https://abc123.execute-api.us-east-1.amazonaws.com"
custom_domain = "https://api.myapp.com"

$ curl https://api.myapp.com/hello
{"message": "Hello from Terraform-deployed Lambda!"}

$ curl https://api.myapp.com/users/123
{"user_id": "123", "name": "John Doe"}

$ aws logs tail /aws/lambda/my-api --follow
START RequestId: abc-123
{"level": "INFO", "message": "Processing request"}
END RequestId: abc-123
REPORT Duration: 3.21 ms   Billed Duration: 4 ms   Memory Size: 128 MB

Implementation Hints:

Lambda function with Terraform:

data "archive_file" "lambda" {
  type        = "zip"
  source_dir  = "${path.module}/lambda"
  output_path = "${path.module}/lambda.zip"
}

resource "aws_lambda_function" "api" {
  filename         = data.archive_file.lambda.output_path
  function_name    = "my-api"
  role             = aws_iam_role.lambda.arn
  handler          = "main.handler"
  source_code_hash = data.archive_file.lambda.output_base64sha256
  runtime          = "python3.11"

  environment {
    variables = {
      ENVIRONMENT = var.environment
    }
  }
}

resource "aws_iam_role" "lambda" {
  name = "lambda-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"
      }
    }]
  })
}

API Gateway v2 (HTTP API):

resource "aws_apigatewayv2_api" "main" {
  name          = "my-api"
  protocol_type = "HTTP"
}

resource "aws_apigatewayv2_integration" "lambda" {
  api_id           = aws_apigatewayv2_api.main.id
  integration_type = "AWS_PROXY"
  integration_uri  = aws_lambda_function.api.invoke_arn
}

resource "aws_apigatewayv2_route" "hello" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "GET /hello"
  target    = "integrations/${aws_apigatewayv2_integration.lambda.id}"
}
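
Two more pieces are usually needed before requests actually flow: a stage on the HTTP API and permission for API Gateway to invoke the function (a sketch):

resource "aws_apigatewayv2_stage" "default" {
  api_id      = aws_apigatewayv2_api.main.id
  name        = "$default"
  auto_deploy = true
}

resource "aws_lambda_permission" "apigw" {
  statement_id  = "AllowAPIGatewayInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.api.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_apigatewayv2_api.main.execution_arn}/*/*"
}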

Questions to explore:

  1. What’s the difference between API Gateway REST API and HTTP API?
  2. How do you handle Lambda cold starts?
  3. How do you deploy Lambda updates without downtime?
  4. What’s the trade-off between Terraform and Serverless Framework?

Learning milestones:

  1. Lambda responds to HTTP requests → You understand serverless basics
  2. Logs appear in CloudWatch → You understand observability
  3. Custom domain works with SSL → You understand ACM and Route53
  4. You can deploy updates safely → You understand deployment patterns

Project 14: Infrastructure Testing with Terratest

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: Go, HCL
  • Alternative Programming Languages: Python (pytest-terraform), Terraform Test (native)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 4: Expert
  • Knowledge Area: Testing / Quality Assurance / TDD
  • Software or Tool: Terratest, Go, Terraform Test
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A comprehensive test suite for your Terraform modules using both Terratest (Go-based) and native Terraform testing, including unit tests, integration tests, and end-to-end validation.

Why it teaches Terraform: Infrastructure needs testing just like application code. This project teaches professional-grade IaC practices: testing modules before release, validating assumptions, and catching regressions before they hit production.

Core challenges you’ll face:

  • Test isolation → maps to creating unique resources per test run
  • Test cleanup → maps to defer and destroy patterns
  • Async infrastructure → maps to retry logic and eventual consistency
  • Cost management → maps to running tests economically

Key Concepts:

  • Terratest: Gruntwork - Terratest Documentation
  • Native Terraform Test: HashiCorp Docs - Tests
  • Testing Best Practices: “Terraform: Up & Running” Chapter 9 - Yevgeniy Brikman

Difficulty: Expert
Time estimate: 2 weeks
Prerequisites: Basic Go knowledge, Projects 1-6 completed

Real world outcome:

# Native Terraform Test (Terraform 1.6+)
$ cat tests/vpc.tftest.hcl
run "vpc_creates_successfully" {
  command = apply

  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR block is incorrect"
  }

  assert {
    condition     = length(aws_subnet.private) == 2
    error_message = "Expected 2 private subnets"
  }
}

run "vpc_has_internet_access" {
  command = apply

  assert {
    condition     = aws_internet_gateway.main.id != ""
    error_message = "Internet gateway not created"
  }
}

$ terraform test
tests/vpc.tftest.hcl... in progress
  run "vpc_creates_successfully"... pass
  run "vpc_has_internet_access"... pass
tests/vpc.tftest.hcl... tearing down
tests/vpc.tftest.hcl... pass

Success! 2 passed, 0 failed.

# Terratest (Go)
$ go test -v -timeout 30m ./tests/
=== RUN   TestVpcModule
    vpc_test.go:25: Creating VPC with unique name: test-vpc-abc123
    vpc_test.go:40: VPC created: vpc-0abc123
    vpc_test.go:55: Verifying public subnet has internet access...
    vpc_test.go:60: HTTP request to internet succeeded
    vpc_test.go:70: Destroying test infrastructure...
--- PASS: TestVpcModule (180.25s)
PASS

Implementation Hints:

Terratest example (Go):

package test

import (
    "testing"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/gruntwork-io/terratest/modules/aws"
    "github.com/stretchr/testify/assert"
)

func TestVpcModule(t *testing.T) {
    t.Parallel()

    terraformOptions := &terraform.Options{
        TerraformDir: "../modules/vpc",
        Vars: map[string]interface{}{
            "cidr_block":  "10.0.0.0/16",
            "environment": "test",
        },
    }

    // Clean up after test
    defer terraform.Destroy(t, terraformOptions)

    // Deploy infrastructure
    terraform.InitAndApply(t, terraformOptions)

    // Get outputs
    vpcId := terraform.Output(t, terraformOptions, "vpc_id")
    assert.NotEmpty(t, vpcId)

    // Validate the VPC actually exists in AWS
    vpc := aws.GetVpcById(t, vpcId, "us-east-1")
    assert.Equal(t, vpcId, vpc.Id)
}

Native Terraform test structure:

modules/vpc/
├── main.tf
├── variables.tf
├── outputs.tf
└── tests/
    ├── unit.tftest.hcl      # Fast, mock-based tests
    └── integration.tftest.hcl # Real infrastructure tests

Test patterns:

  1. Unit tests: Validate configuration without applying (plan-only; see the sketch after this list)
  2. Integration tests: Apply to real cloud, verify, destroy
  3. E2E tests: Deploy full stack, run application tests
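
A plan-only run block never touches the cloud, so pattern 1 costs nothing: assertions are evaluated against planned values. A sketch of what unit.tftest.hcl might contain (variable names assumed from the module above):

run "cidr_matches_input" {
  command = plan

  variables {
    cidr_block  = "10.0.0.0/16"
    environment = "test"
  }

  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR should match the input variable"
  }
}

From Terraform 1.7 onward, mock_provider blocks let tests like this run without any cloud credentials at all.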

Questions to explore:

  1. How do you test modules without spending money?
  2. What’s the difference between Terratest and native tests?
  3. How do you handle flaky tests (eventual consistency)?
  4. How do you parallelize tests safely?

Learning milestones:

  1. Tests run and pass → You understand test setup
  2. Tests create/destroy cleanly → You understand test isolation
  3. Tests catch real bugs → You understand test value
  4. Tests run in CI → You understand automated testing

Project 15: Multi-Cloud Deployment (AWS + GCP)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Pulumi (multi-cloud native)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Multi-Cloud / Hybrid Architecture
  • Software or Tool: Terraform, AWS, GCP, Consul/Tailscale
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A multi-cloud architecture with a web tier on AWS and database on GCP (Cloud SQL), connected via secure VPN tunnel, demonstrating true multi-cloud orchestration.

Why it teaches Terraform: This is Terraform’s killer feature—one tool, multiple clouds. This project forces you to understand provider abstraction, network connectivity between clouds, and the reality of multi-cloud complexity.

Core challenges you’ll face:

  • Multi-provider configuration → maps to provider aliases and credentials
  • Cross-cloud networking → maps to VPN, peering, or overlay networks
  • Consistent naming and tagging → maps to locals and conventions
  • Deployment ordering → maps to depends_on across providers

Key Concepts:

  • Multi-Cloud Terraform: “Terraform: Up & Running” Chapter 7 - Yevgeniy Brikman
  • GCP Networking: Google Cloud Documentation
  • VPN Connectivity: AWS/GCP - Site-to-Site VPN

Difficulty: Expert Time estimate: 3-4 weeks Prerequisites: GCP basics, VPN/networking, Projects 1-10 completed

Real world outcome:

$ terraform apply
# AWS resources
aws_vpc.main: Creating...
aws_vpn_gateway.main: Creating...
aws_instance.web: Creating...

# GCP resources
google_compute_network.main: Creating...
google_sql_database_instance.main: Creating...
google_compute_vpn_gateway.main: Creating...

# Cross-cloud VPN
aws_vpn_connection.to_gcp: Creating...
google_compute_vpn_tunnel.to_aws: Creating...

Apply complete! Resources: 28 added, 0 changed, 0 destroyed.

Outputs:

aws_web_public_ip = "54.123.45.67"
gcp_database_private_ip = "10.100.0.5"
vpn_status = "established"

# Test connectivity from AWS to GCP
$ ssh ubuntu@54.123.45.67
ubuntu@web:~$ psql -h 10.100.0.5 -U admin -d appdb
Password:
appdb=> SELECT 'Connected from AWS to GCP!';
         ?column?
---------------------------
 Connected from AWS to GCP!

Implementation Hints:

Multi-provider configuration:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "google" {
  project = "my-project"
  region  = "us-central1"
}

# Provider aliases for multi-region
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}
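
Resources opt into an aliased provider explicitly via the provider argument; anything without it uses the default. For example (resource is illustrative):

resource "aws_vpc" "west" {
  provider   = aws.west
  cidr_block = "10.1.0.0/16"
}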

Cross-cloud VPN concept:

AWS VPC (10.0.0.0/16)
    │
    └── VPN Gateway ──────VPN Tunnel────── VPN Gateway
                                               │
                                    GCP VPC (10.100.0.0/16)
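
A rough HCL sketch of the two ends of that tunnel, assuming a classic static-route VPN and the VPC/network resources implied above; forwarding rules, routes, and route propagation are omitted, and all names are illustrative:

# GCP side: a static IP and a classic VPN gateway
resource "google_compute_address" "vpn" {
  name   = "vpn-ip"
  region = "us-central1"
}

resource "google_compute_vpn_gateway" "main" {
  name    = "gcp-vpn-gw"
  network = google_compute_network.main.id
  region  = "us-central1"
}

# AWS side: VPN gateway plus a customer gateway pointing at GCP's IP
resource "aws_vpn_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

resource "aws_customer_gateway" "gcp" {
  bgp_asn    = 65000
  ip_address = google_compute_address.vpn.address
  type       = "ipsec.1"
}

resource "aws_vpn_connection" "to_gcp" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.gcp.id
  type                = "ipsec.1"
  static_routes_only  = true
}

# The GCP tunnel reuses the address and pre-shared key AWS generated
resource "google_compute_vpn_tunnel" "to_aws" {
  name                    = "to-aws"
  region                  = "us-central1"
  target_vpn_gateway      = google_compute_vpn_gateway.main.id
  peer_ip                 = aws_vpn_connection.to_gcp.tunnel1_address
  shared_secret           = aws_vpn_connection.to_gcp.tunnel1_preshared_key
  local_traffic_selector  = ["10.100.0.0/16"]
  remote_traffic_selector = ["10.0.0.0/16"]
}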

Shared naming convention:

locals {
  project = "myapp"
  env     = "prod"

  common_tags = {
    Project     = local.project
    Environment = local.env
    ManagedBy   = "terraform"
  }
}

resource "aws_vpc" "main" {
  tags = local.common_tags
}

resource "google_compute_network" "main" {
  labels = { for k, v in local.common_tags : lower(k) => lower(v) }
}
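
On the AWS side you can avoid repeating tags on every resource: the AWS provider's default_tags block merges a common tag set into everything it creates. A sketch extending the default aws provider block shown earlier:

provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = local.common_tags
  }
}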

Questions to explore:

  1. What are the latency implications of cross-cloud communication?
  2. How do you handle features that exist in one cloud but have no equivalent in the other?
  3. What’s the cost model for VPN vs dedicated interconnect?
  4. How do you manage credentials for multiple clouds?

Learning milestones:

  1. Both clouds provision → You understand multi-provider
  2. VPN tunnel establishes → You understand cross-cloud networking
  3. Application connects across clouds → You understand end-to-end
  4. You understand trade-offs → You can advise on multi-cloud

Project 16: Custom Terraform Provider (Go)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: Go
  • Alternative Programming Languages: N/A (in practice providers are written in Go; HashiCorp's plugin SDKs are Go-only)
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 5: Master
  • Knowledge Area: SDK Development / API Integration
  • Software or Tool: Terraform Plugin SDK, Go
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A custom Terraform provider for an internal service or third-party API (e.g., managing DNS records on a custom service, or a provider for an internal config management system).

Why it teaches Terraform: This is the deepest level of Terraform understanding. By building a provider, you’ll understand the entire Terraform plugin architecture, CRUD operations, state management, and how Terraform communicates with the outside world.

Core challenges you’ll face:

  • Plugin SDK architecture → maps to provider, resources, data sources
  • CRUD implementation → maps to Create, Read, Update, Delete functions
  • State handling → maps to schema and state management
  • Error handling → maps to diagnostics and retries

Key Concepts:

  • Provider Development: HashiCorp Docs - Plugin Development
  • Terraform Plugin Framework: HashiCorp - Plugin Framework
  • Go Programming: “The Go Programming Language” by Donovan & Kernighan

Difficulty: Master Time estimate: 1 month+ Prerequisites: Go proficiency, deep Terraform knowledge, all previous projects

Real world outcome:

# Your custom provider
$ cat main.tf
terraform {
  required_providers {
    internal = {
      source  = "yourcompany/internal"
      version = "1.0.0"
    }
  }
}

provider "internal" {
  api_endpoint = "https://internal-api.company.com"
  api_key      = var.internal_api_key
}

resource "internal_config" "app_settings" {
  name = "myapp"

  settings = {
    feature_flag_x = "true"
    max_connections = "100"
  }
}

data "internal_config" "existing" {
  name = "legacy-app"
}

$ terraform apply
internal_config.app_settings: Creating...
internal_config.app_settings: Creation complete [id=cfg-abc123]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

$ terraform state show internal_config.app_settings
# internal_config.app_settings:
resource "internal_config" "app_settings" {
    id       = "cfg-abc123"
    name     = "myapp"
    settings = {
        "feature_flag_x"  = "true"
        "max_connections" = "100"
    }
}

Implementation Hints:

Provider structure (Plugin Framework - modern approach):

// provider.go
package provider

import (
    "context"

    "github.com/hashicorp/terraform-plugin-framework/provider"
    "github.com/hashicorp/terraform-plugin-framework/provider/schema"
    "github.com/hashicorp/terraform-plugin-framework/resource"
)

type InternalProvider struct {
    version string
}

func (p *InternalProvider) Schema(ctx context.Context, req provider.SchemaRequest, resp *provider.SchemaResponse) {
    resp.Schema = schema.Schema{
        Attributes: map[string]schema.Attribute{
            "api_endpoint": schema.StringAttribute{
                Required: true,
            },
            "api_key": schema.StringAttribute{
                Required:  true,
                Sensitive: true,
            },
        },
    }
}

func (p *InternalProvider) Resources(ctx context.Context) []func() resource.Resource {
    return []func() resource.Resource{
        NewConfigResource,
    }
}

Resource implementation:

// resource_config.go
func (r *ConfigResource) Create(ctx context.Context, req resource.CreateRequest, resp *resource.CreateResponse) {
    // Read the planned values into the resource model
    var data ConfigResourceModel
    resp.Diagnostics.Append(req.Plan.Get(ctx, &data)...)
    if resp.Diagnostics.HasError() {
        return
    }

    // Call your API
    result, err := r.client.CreateConfig(data.Name.ValueString(), data.Settings)
    if err != nil {
        resp.Diagnostics.AddError("Create failed", err.Error())
        return
    }

    // Record the API-assigned ID and persist everything to state
    data.ID = types.StringValue(result.ID)
    resp.Diagnostics.Append(resp.State.Set(ctx, &data)...)
}
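
For local iteration you don't need to publish the provider at all: Terraform's CLI configuration file (~/.terraformrc on Unix, terraform.rc on Windows) supports dev_overrides, which points a provider source address at a locally built binary. A sketch, with a hypothetical path:

provider_installation {
  dev_overrides {
    "yourcompany/internal" = "/home/you/go/bin"
  }

  # Every other provider still comes from the registry
  direct {}
}

With the override in place, terraform plan and apply use your local build for that source and print a warning that overrides are active.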

Questions to explore:

  1. When should you build a custom provider vs use http provider?
  2. How do you handle API pagination and rate limiting?
  3. How do you test providers?
  4. How do you publish to the Terraform Registry?

Learning milestones:

  1. Provider compiles → You understand plugin architecture
  2. Resources CRUD works → You understand the Terraform lifecycle
  3. Provider handles errors gracefully → You understand production quality
  4. Others can use your provider → You understand distribution

Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. Local Infrastructure Sandbox | Beginner | Weekend | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| 2. Static Website on S3 | Beginner | Weekend | ⭐⭐⭐ | ⭐⭐⭐ |
| 3. VPC from Scratch | Intermediate | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐ |
| 4. EC2 Web Server | Intermediate | 1 week | ⭐⭐⭐ | ⭐⭐⭐ |
| 5. RDS Database | Intermediate | 1 week | ⭐⭐⭐⭐ | ⭐⭐ |
| 6. Terraform Modules | Advanced | 2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| 7. Remote State | Intermediate | Weekend | ⭐⭐⭐⭐ | ⭐⭐ |
| 8. Docker Infrastructure | Intermediate | Weekend | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| 9. Multi-Environment | Advanced | 2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| 10. EKS Kubernetes | Expert | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 11. CI/CD Pipeline | Advanced | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 12. Import Existing | Advanced | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 13. Serverless | Advanced | 1-2 weeks | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| 14. Infrastructure Testing | Expert | 2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| 15. Multi-Cloud | Expert | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 16. Custom Provider | Master | 1 month+ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |

Recommendation

If you’re brand new to Terraform:

Start with Project 1 (Local Infrastructure Sandbox) → Project 2 (S3 Website) → Project 3 (VPC)

This progression teaches you the core workflow without cloud costs (Project 1), then introduces real cloud resources (Project 2), then shows you infrastructure complexity (Project 3).

If you have some Terraform experience:

Jump to Project 6 (Modules) → Project 7 (Remote State) → Project 9 (Multi-Environment)

These projects teach professional-grade patterns that separate hobbyists from production-ready engineers.

If you want to become a Terraform expert:

Work through all projects, but especially focus on:

  • Project 14 (Testing) — Most teams skip this, but it’s crucial
  • Project 15 (Multi-Cloud) — Proves you understand Terraform’s core value
  • Project 16 (Custom Provider) — The ultimate deep dive

Final Overall Project: Production-Grade Multi-Tenant SaaS Platform

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL, Python, Go
  • Alternative Programming Languages: TypeScript (CDK), Pulumi
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 5: Master
  • Knowledge Area: Full Stack Infrastructure / Platform Engineering
  • Software or Tool: Terraform, AWS, Kubernetes, GitOps, Monitoring
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A complete production-grade SaaS platform infrastructure including:

  • Multi-region VPC with transit gateway
  • EKS clusters (primary + DR)
  • RDS with read replicas and failover
  • ElastiCache for session/cache layer
  • CloudFront CDN with WAF
  • Multi-tenant isolation patterns
  • Complete observability stack (CloudWatch, Prometheus, Grafana)
  • Automated backup and disaster recovery
  • CI/CD with Terraform Cloud or self-hosted
  • Cost management and tagging strategy

Why this is the ultimate test: This project synthesizes everything: modules, remote state, multi-environment, testing, CI/CD, security, and operational excellence. It’s what enterprise Terraform looks like.

Core challenges you’ll face:

  • Module composition at scale → 50+ modules working together
  • State organization → Multiple state files, team boundaries
  • Blast radius management → Isolating critical resources
  • Cost control → Tagging, budgets, resource optimization (see the budget sketch after this list)
  • Compliance → Security groups, encryption, audit logging
  • DR planning → Failover automation, recovery testing
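
Budgets themselves can be managed as code; a hedged sketch of a monthly AWS budget with an email alert, as one piece of the cost-control challenge above (values and names are illustrative):

resource "aws_budgets_budget" "platform" {
  name         = "platform-monthly"
  budget_type  = "COST"
  limit_amount = "5000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = ["platform-team@example.com"]
  }
}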

Real world outcome:

Platform Architecture:
├── networking/
│   ├── vpc-primary/
│   ├── vpc-dr/
│   └── transit-gateway/
├── compute/
│   ├── eks-primary/
│   ├── eks-dr/
│   └── node-groups/
├── data/
│   ├── rds-primary/
│   ├── rds-replica/
│   ├── elasticache/
│   └── s3-buckets/
├── security/
│   ├── iam-roles/
│   ├── kms-keys/
│   └── waf/
├── observability/
│   ├── cloudwatch/
│   ├── prometheus/
│   └── grafana/
└── cicd/
    ├── terraform-cloud/
    └── github-actions/

$ terraform workspace list
  dev
  staging
* production
  dr-production

$ terraform plan
Plan: 0 to add, 0 to change, 0 to destroy.

Your infrastructure is up to date.

$ ./scripts/dr-failover.sh
Initiating DR failover...
RDS failover: complete
EKS context switch: complete
DNS cutover: complete
DR failover complete in 4m 32s

Implementation Hints:

This project requires all previous skills combined. Key patterns:

  1. Layered architecture — Network layer deploys before compute, compute before data
  2. Dependency injection — Pass outputs between layers via remote state (see the sketch after this list)
  3. Environment parity — Same modules, different variables
  4. Security by default — Encryption everywhere, least privilege
  5. Observable by default — Metrics, logs, traces from day one
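
The usual mechanism for pattern 2 is the terraform_remote_state data source: a downstream layer reads the outputs an upstream layer published. A sketch assuming an S3 backend and a networking layer that outputs vpc_id and private_subnet_ids (bucket, key, and module names are illustrative):

data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "myapp-terraform-state"
    key    = "networking/vpc-primary/terraform.tfstate"
    region = "us-east-1"
  }
}

module "eks_primary" {
  source = "../../modules/eks"

  vpc_id     = data.terraform_remote_state.network.outputs.vpc_id
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}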

Questions to ask yourself:

  • Can I destroy and recreate any layer without data loss?
  • Can I deploy to a new region in under a day?
  • Can I onboard a new engineer without tribal knowledge?
  • Can I pass a security audit?

Learning milestones:

  1. Platform deploys end-to-end → You understand orchestration
  2. Team can work independently → You understand boundaries
  3. DR failover works → You understand resilience
  4. Costs are predictable → You understand operations

Resources Summary

Essential Books

  • “Terraform: Up & Running” by Yevgeniy Brikman — The definitive Terraform book, covers basics through advanced
  • “Terraform in Depth” by Robert Hafner (2025) — Modern best practices and patterns

Official Resources

Community Resources


Summary

| # | Project | Main Language |
|---|---|---|
| 1 | Local Infrastructure Sandbox | HCL |
| 2 | Static Website on S3 | HCL |
| 3 | VPC from Scratch | HCL |
| 4 | EC2 Web Server with Security Groups | HCL |
| 5 | RDS Database with Secrets Management | HCL |
| 6 | Terraform Modules | HCL |
| 7 | Remote State and Workspaces | HCL |
| 8 | Docker Infrastructure | HCL |
| 9 | Multi-Environment Deployment | HCL |
| 10 | Kubernetes Cluster with EKS | HCL |
| 11 | CI/CD Pipeline for Terraform | YAML (GitHub Actions), HCL |
| 12 | Import Existing Infrastructure | HCL |
| 13 | Serverless Infrastructure | HCL, Python |
| 14 | Infrastructure Testing | Go, HCL |
| 15 | Multi-Cloud Deployment | HCL |
| 16 | Custom Terraform Provider | Go |
| Final | Production-Grade Multi-Tenant SaaS Platform | HCL, Python, Go |

Sources