
LEARN TERRAFORM DEEP DIVE

Learn Terraform: From Zero to Infrastructure Master

Goal: Deeply understand Terraform—from basic resource provisioning to building production-grade, multi-cloud infrastructure with proper state management, modules, and security practices.


Why Terraform Matters

Before Terraform existed, infrastructure was managed through:

  • Manual clicking in cloud consoles (error-prone, unreproducible)
  • Custom scripts (fragile, cloud-specific, hard to maintain)
  • Configuration management tools (Chef, Puppet—focused on server config, not infrastructure)

Terraform solves the fundamental problem: How do you treat infrastructure like software?

After completing these projects, you will:

  • Understand the declarative Infrastructure as Code (IaC) paradigm
  • Know how Terraform’s state file tracks real-world resources
  • Build reusable, composable infrastructure modules
  • Manage multi-environment deployments safely
  • Understand provider architecture and how Terraform talks to cloud APIs
  • Implement proper security, state management, and collaboration workflows

Core Concept Analysis

The Terraform Lifecycle

                    ┌─────────────────────────────────────────┐
                    │              terraform init             │
                    │   (Download providers, setup backend)   │
                    └─────────────────┬───────────────────────┘
                                      │
                                      ▼
                    ┌─────────────────────────────────────────┐
                    │              terraform plan             │
                    │   (Compare desired vs actual state)     │
                    └─────────────────┬───────────────────────┘
                                      │
                                      ▼
                    ┌─────────────────────────────────────────┐
                    │             terraform apply             │
                    │   (Execute changes, update state)       │
                    └─────────────────┬───────────────────────┘
                                      │
                                      ▼
                    ┌─────────────────────────────────────────┐
                    │            terraform destroy            │
                    │   (Remove all managed resources)        │
                    └─────────────────────────────────────────┘

Fundamental Concepts

  1. Declarative vs Imperative
    • Terraform is declarative: you describe the desired end state
    • Terraform figures out how to get there (create, update, destroy)
    • This is fundamentally different from scripts that describe steps
  2. State Management
    • Terraform keeps a terraform.tfstate file mapping config to real resources
    • State is the source of truth for what Terraform manages
    • Without state, Terraform cannot know what exists in your infrastructure
    • Remote state enables team collaboration and prevents conflicts
  3. Providers
    • Plugins that translate Terraform config into API calls
    • Each cloud/service (AWS, GCP, Azure, GitHub, Kubernetes) has a provider
    • Providers define resources (things to create) and data sources (things to read)
  4. Resources and Data Sources
    # Resource: Terraform MANAGES this (creates, updates, destroys)
    resource "aws_instance" "web" {
      ami           = "ami-12345678"
      instance_type = "t2.micro"
    }
    
    # Data Source: Terraform READS this (does not manage)
    data "aws_ami" "ubuntu" {
      most_recent = true
      owners      = ["099720109477"]
    }
    
  5. Modules
    • Reusable containers for related resources
    • Enable abstraction: hide complexity, expose clean interfaces
    • Can be local directories or remote (Terraform Registry, Git)
  6. Variables and Outputs (see the sketch after this list)
    • Variables: Inputs to your configuration (parameterization)
    • Outputs: Exports from your configuration (for humans or other modules)
    • Locals: Computed intermediate values
  7. Backend and State Locking
    • Backend: Where state is stored (local, S3, GCS, Terraform Cloud)
    • State locking: Prevents concurrent modifications (critical for teams)
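
A minimal sketch of how variables, locals, and outputs fit together (the names here are illustrative):

variable "environment" {
  type        = string
  description = "Deployment environment name"
  default     = "dev"
}

locals {
  name_prefix = "app-${var.environment}"
}

output "name_prefix" {
  description = "Prefix other configurations or modules can reuse"
  value       = local.name_prefix
}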

Project List

Projects are ordered from fundamental understanding to production-grade implementations.


Project 1: Local Infrastructure Sandbox (Understand the Core Loop)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL (HashiCorp Configuration Language)
  • Alternative Programming Languages: JSON (Terraform also accepts JSON syntax)
  • Coolness Level: Level 1: Pure Corporate Snoozefest
  • Business Potential: 1. The “Resume Gold”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Infrastructure as Code / State Management
  • Software or Tool: Terraform, Local Provider
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A Terraform configuration that creates local files and directories on your machine, demonstrating the full init→plan→apply→destroy cycle without any cloud costs or complexity.

Why it teaches Terraform: Before touching clouds, you need to understand Terraform’s core mechanics. The local provider lets you see state management, resource lifecycle, and the plan/apply workflow without distractions. You’ll understand WHY Terraform works the way it does.

Core challenges you’ll face:

  • Understanding state files → maps to how Terraform tracks resources
  • Interpreting plan output → maps to reading +/- change indicators
  • Handling resource dependencies → maps to implicit and explicit depends_on
  • Dealing with state drift → maps to what happens when you modify files manually

Key Concepts:

  • Terraform Lifecycle: “Terraform: Up & Running” Chapter 1 - Yevgeniy Brikman
  • State Files: HashiCorp Learn - State Management Tutorial
  • HCL Syntax: “Terraform: Up & Running” Chapter 2 - Yevgeniy Brikman
  • Resource Dependencies: HashiCorp Docs - Dependency Lock File

Difficulty: Beginner
Time estimate: Weekend
Prerequisites: Basic command-line familiarity, text editor usage

Real world outcome:

$ terraform init
Initializing the backend...
Initializing provider plugins...
- Finding latest version of hashicorp/local...
- Installing hashicorp/local v2.4.0...

$ terraform plan
Terraform will perform the following actions:

  # local_file.config will be created
  + resource "local_file" "config" {
      + content              = "environment=development"
      + filename             = "./config/app.conf"
    }

Plan: 1 to add, 0 to change, 0 to destroy.

$ terraform apply
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

$ cat config/app.conf
environment=development

$ terraform destroy
Destroy complete! Resources: 1 destroyed.

Implementation Hints:

Start with the simplest possible configuration:

my-first-terraform/
├── main.tf          # Resource definitions
├── variables.tf     # Input variables
├── outputs.tf       # Output values
└── terraform.tfstate # (Generated after apply)

The local provider offers these resources (a minimal example follows this list):

  • local_file - Creates a file with specified content
  • local_sensitive_file - Same, but the content is hidden in logs and plan output
  • null_resource - From the separate hashicorp/null provider; creates nothing itself but is a useful hook for provisioners
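
A minimal main.tf for this project might look like the following (a sketch; the content and path match the plan output shown above):

terraform {
  required_providers {
    local = {
      source  = "hashicorp/local"
      version = "~> 2.4"
    }
  }
}

# A single managed file: apply creates it, destroy removes it
resource "local_file" "config" {
  content  = "environment=development"
  filename = "${path.module}/config/app.conf"
}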

Questions to explore:

  1. What happens if you manually edit a file Terraform created?
  2. What happens if you delete the state file and run plan again?
  3. How does Terraform handle dependencies between resources?
  4. What’s the difference between terraform.tfstate and terraform.tfstate.backup?

Use terraform state list and terraform state show <resource> to inspect state.

Learning milestones:

  1. init/plan/apply works → You understand the basic workflow
  2. You can read plan output → You understand what Terraform will do before it does it
  3. You understand state files → You know why state is critical
  4. You handle drift correctly → You understand real-world state management

Project 2: Static Website on S3 (Your First Cloud Resource)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: JSON, Pulumi (TypeScript), AWS CDK
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 1: Beginner
  • Knowledge Area: Cloud Storage / Static Hosting
  • Software or Tool: Terraform, AWS S3, AWS CLI
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A fully functional static website hosted on AWS S3, including bucket configuration, public access policies, and website hosting settings—all managed through Terraform.

Why it teaches Terraform: This is the classic “Hello World” of cloud Terraform. You’ll learn how providers work, how to configure authentication, and how to translate AWS console clicking into code. Most importantly, you’ll see your infrastructure in a real cloud.

Core challenges you’ll face:

  • AWS authentication setup → maps to provider configuration and credentials
  • Bucket naming uniqueness → maps to global namespace constraints
  • Public access configuration → maps to AWS security model evolution
  • Website endpoint configuration → maps to AWS-specific resource attributes

Key Concepts:

Difficulty: Beginner
Time estimate: Weekend
Prerequisites: AWS account, AWS CLI configured, Project 1 completed

Real world outcome:

$ terraform apply
aws_s3_bucket.website: Creating...
aws_s3_bucket.website: Creation complete after 2s [id=my-terraform-website-12345]
aws_s3_bucket_website_configuration.website: Creating...
aws_s3_bucket_website_configuration.website: Creation complete after 1s

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

Outputs:

website_url = "http://my-terraform-website-12345.s3-website-us-east-1.amazonaws.com"

$ curl http://my-terraform-website-12345.s3-website-us-east-1.amazonaws.com
<!DOCTYPE html>
<html>
<head><title>My Terraform Website</title></head>
<body><h1>Hello from Terraform!</h1></body>
</html>

Implementation Hints:

AWS S3 website hosting requires multiple resources:

  1. aws_s3_bucket - The bucket itself
  2. aws_s3_bucket_public_access_block - Control public access settings
  3. aws_s3_bucket_policy - Define who can read the bucket
  4. aws_s3_bucket_website_configuration - Enable website hosting
  5. aws_s3_object - Upload your HTML files

Modern AWS requires explicit public access configuration (sketched after this list). You must:

  • Disable “Block Public Access” settings
  • Attach a bucket policy allowing s3:GetObject
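
A sketch of those two pieces, assuming the bucket is declared as aws_s3_bucket.website (as in the apply output above):

resource "aws_s3_bucket_public_access_block" "website" {
  bucket = aws_s3_bucket.website.id

  block_public_acls       = false
  block_public_policy     = false
  ignore_public_acls      = false
  restrict_public_buckets = false
}

resource "aws_s3_bucket_policy" "website" {
  bucket = aws_s3_bucket.website.id

  # Ordering matters: the policy can only attach once public policies are allowed
  depends_on = [aws_s3_bucket_public_access_block.website]

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid       = "PublicRead"
      Effect    = "Allow"
      Principal = "*"
      Action    = "s3:GetObject"
      Resource  = "${aws_s3_bucket.website.arn}/*"
    }]
  })
}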

Use terraform output to get the website URL after deployment.

Questions to explore:

  1. What happens if you change the bucket name after it’s created?
  2. How do you upload multiple files (hint: for_each)?
  3. What’s the difference between aws_s3_bucket_acl and aws_s3_bucket_policy?

Learning milestones:

  1. AWS provider authenticates → You understand cloud credentials
  2. Website is accessible → You created real cloud infrastructure
  3. You can update content → You understand resource updates vs recreation
  4. destroy removes everything → You understand infrastructure lifecycle

Project 3: VPC from Scratch (Network Architecture)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Pulumi (Python), AWS CDK (TypeScript)
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Cloud Networking / VPC Design
  • Software or Tool: Terraform, AWS VPC, AWS Subnets
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A complete AWS VPC with public and private subnets across multiple availability zones, internet gateway, NAT gateway, and proper route tables.

Why it teaches Terraform: Networking is where most cloud architectures fail or succeed. This project forces you to understand resource dependencies (subnets depend on VPC, route tables depend on gateways), CIDR block math, and how Terraform handles complex interdependencies.

Core challenges you’ll face:

  • CIDR block planning → maps to network address space design
  • Resource dependency chains → maps to Terraform’s implicit dependency detection
  • Multi-AZ distribution → maps to using count/for_each with availability zones
  • Route table associations → maps to many-to-many resource relationships

Key Concepts:

  • VPC Fundamentals: “AWS Certified Solutions Architect Study Guide” Chapter 5 - Ben Piper
  • Terraform Count and For_Each: “Terraform: Up & Running” Chapter 5 - Yevgeniy Brikman
  • CIDR Notation: “Computer Networks” by Tanenbaum - Chapter on IP Addressing
  • AWS Networking: AWS Documentation - VPC User Guide

Difficulty: Intermediate
Time estimate: 1-2 weeks
Prerequisites: Basic networking (IP addresses, CIDR), Projects 1-2 completed

Real world outcome:

$ terraform apply
...
Apply complete! Resources: 15 added, 0 changed, 0 destroyed.

Outputs:

vpc_id = "vpc-0abc123def456"
public_subnets = [
  "subnet-0pub1abc",
  "subnet-0pub2def",
]
private_subnets = [
  "subnet-0priv1abc",
  "subnet-0priv2def",
]
nat_gateway_ip = "54.123.45.67"

# Verify with AWS CLI
$ aws ec2 describe-vpcs --vpc-ids vpc-0abc123def456
{
    "Vpcs": [{
        "CidrBlock": "10.0.0.0/16",
        "State": "available"
    }]
}

Implementation Hints:

VPC architecture pattern:

VPC (10.0.0.0/16)
├── Public Subnet AZ-a (10.0.1.0/24) → Internet Gateway
├── Public Subnet AZ-b (10.0.2.0/24) → Internet Gateway
├── Private Subnet AZ-a (10.0.10.0/24) → NAT Gateway
└── Private Subnet AZ-b (10.0.20.0/24) → NAT Gateway

Resources you’ll create:

  • aws_vpc - The virtual network
  • aws_subnet - Network segments (use for_each for multiple)
  • aws_internet_gateway - Public internet access
  • aws_nat_gateway - Private subnet internet access (outbound only)
  • aws_eip - Elastic IP for NAT gateway
  • aws_route_table - Routing rules
  • aws_route_table_association - Link subnets to route tables

Use the cidrsubnet() function to calculate subnet CIDR blocks:

cidrsubnet("10.0.0.0/16", 8, 1)  # Returns "10.0.1.0/24"
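
One way to stamp out a subnet per availability zone with for_each (a sketch; the AZ list is illustrative and aws_vpc.main is assumed to exist):

variable "azs" {
  type    = list(string)
  default = ["us-east-1a", "us-east-1b"]
}

resource "aws_subnet" "public" {
  # Map each AZ name to its index so cidrsubnet() gets a unique netnum
  for_each = { for i, az in var.azs : az => i }

  vpc_id            = aws_vpc.main.id
  availability_zone = each.key
  cidr_block        = cidrsubnet(aws_vpc.main.cidr_block, 8, each.value + 1)

  tags = {
    Name = "public-${each.key}"
  }
}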

Questions to explore:

  1. Why do private subnets need a NAT gateway?
  2. What’s the cost difference between Internet Gateway and NAT Gateway?
  3. How do you handle multiple NAT gateways for high availability?
  4. What happens if you delete the Internet Gateway while instances are running?

Learning milestones:

  1. VPC with subnets created → You understand AWS networking basics
  2. Instances in public subnet have internet → You understand routing
  3. Instances in private subnet can reach internet outbound → You understand NAT
  4. You use for_each effectively → You understand Terraform iteration

Project 4: EC2 Web Server with Security Groups

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Pulumi (Go), Ansible + Terraform
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Compute / Security / Configuration
  • Software or Tool: Terraform, AWS EC2, Security Groups
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: An EC2 instance running a web server (nginx), with proper security groups, SSH key management, and user data scripts—deployed into your VPC from Project 3.

Why it teaches Terraform: This is where Terraform meets actual compute. You’ll learn about data sources (finding AMIs), provisioners (bootstrapping instances), security groups (cloud firewalls), and how to wire compute into your network.

Core challenges you’ll face:

  • Finding the right AMI → maps to data sources and dynamic lookups
  • SSH key management → maps to sensitive data in Terraform
  • User data scripts → maps to cloud-init and instance bootstrapping
  • Security group rules → maps to ingress/egress and CIDR blocks

Key Concepts:

  • EC2 Fundamentals: “Terraform: Up & Running” Chapter 2 - Yevgeniy Brikman
  • Data Sources: HashiCorp Docs - Data Sources
  • User Data Scripts: AWS Documentation - Run Commands on Launch
  • Security Groups: “AWS Certified Solutions Architect Study Guide” - Ben Piper

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Basic Linux, SSH, Projects 1-3 completed

Real world outcome:

$ terraform apply
...
Apply complete! Resources: 4 added, 0 changed, 0 destroyed.

Outputs:

instance_public_ip = "54.234.56.78"
ssh_command = "ssh -i ~/.ssh/terraform-key ubuntu@54.234.56.78"

$ curl http://54.234.56.78
<!DOCTYPE html>
<html>
<head><title>Welcome to nginx!</title></head>
<body>
<h1>Deployed with Terraform!</h1>
<p>Instance ID: i-0abc123def456</p>
</body>
</html>

$ ssh -i ~/.ssh/terraform-key ubuntu@54.234.56.78
ubuntu@ip-10-0-1-15:~$ systemctl status nginx
● nginx.service - A high performance web server
   Active: active (running)

Implementation Hints:

Use data source to find latest Ubuntu AMI:

data "aws_ami" "ubuntu" {
  most_recent = true
  owners      = ["099720109477"]  # Canonical

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }
}

User data script for nginx:

#!/bin/bash
apt-get update
apt-get install -y nginx
systemctl start nginx
echo "<h1>Deployed with Terraform!</h1>" > /var/www/html/index.html

Security group pattern (a sketch follows this list):

  • Ingress: Allow SSH (22) from your IP, HTTP (80) from anywhere
  • Egress: Allow all outbound
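
A minimal sketch of that pattern (the VPC reference is assumed from Project 3, and the SSH CIDR is a placeholder for your own IP):

resource "aws_security_group" "web" {
  name   = "web-server"
  vpc_id = aws_vpc.main.id

  ingress {
    description = "SSH from my IP only"
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["203.0.113.10/32"] # placeholder: replace with your IP
  }

  ingress {
    description = "HTTP from anywhere"
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    description = "Allow all outbound"
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}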

Questions to explore:

  1. What happens if user data script fails?
  2. How do you update the instance without destroying it?
  3. What’s the difference between aws_security_group and aws_security_group_rule?
  4. How do you handle AMI updates (new image published)?

Learning milestones:

  1. Instance launches and is reachable → You understand compute basics
  2. User data script runs successfully → You understand bootstrapping
  3. Security groups work correctly → You understand cloud firewalls
  4. You can SSH into the instance → You understand key management

Project 5: RDS Database with Secrets Management

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Pulumi (Python), CDKTF
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Databases / Secrets / Security
  • Software or Tool: Terraform, AWS RDS, AWS Secrets Manager
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A managed PostgreSQL database on AWS RDS, with credentials stored in AWS Secrets Manager, subnet groups for multi-AZ, and proper security group configuration allowing access only from your application tier.

Why it teaches Terraform: Databases are stateful—destroying them loses data. This project teaches you about prevent_destroy lifecycle rules, sensitive outputs, secrets management, and how to handle resources that require special care.

Core challenges you’ll face:

  • Handling sensitive values → maps to Terraform sensitive variables and outputs
  • Database subnet groups → maps to RDS networking requirements
  • Lifecycle management → maps to prevent_destroy, create_before_destroy
  • Connecting secrets to resources → maps to data source for secrets retrieval

Key Concepts:

  • RDS Configuration: “Terraform: Up & Running” Chapter 3 - Yevgeniy Brikman
  • Sensitive Variables: HashiCorp Docs - Sensitive Variables
  • Lifecycle Meta-Arguments: HashiCorp Docs - Lifecycle
  • Secrets Manager Integration: AWS Documentation

Difficulty: Intermediate
Time estimate: 1 week
Prerequisites: Basic SQL/database concepts, Projects 1-4 completed

Real world outcome:

$ terraform apply
...
aws_db_instance.postgres: Creating...
aws_db_instance.postgres: Still creating... [4m30s elapsed]
aws_db_instance.postgres: Creation complete after 5m12s

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

Outputs:

db_endpoint = "mydb.abc123.us-east-1.rds.amazonaws.com:5432"
db_name = "appdb"
secret_arn = "arn:aws:secretsmanager:us-east-1:123456789:secret:db-creds-AbC123"

# Connect from EC2 instance in same VPC
$ psql -h mydb.abc123.us-east-1.rds.amazonaws.com -U admin -d appdb
Password: ********
appdb=> \dt
         List of relations
 Schema | Name | Type  | Owner
--------+------+-------+-------
(0 rows)

Implementation Hints:

Generate a random password and store in Secrets Manager:

resource "random_password" "db" {
  length  = 24
  special = true
}

resource "aws_secretsmanager_secret" "db" {
  name = "app/database/credentials"
}

resource "aws_secretsmanager_secret_version" "db" {
  secret_id     = aws_secretsmanager_secret.db.id
  secret_string = jsonencode({
    username = "admin"
    password = random_password.db.result
  })
}

Protect the database from accidental destruction:

resource "aws_db_instance" "postgres" {
  # ... other config ...

  lifecycle {
    prevent_destroy = true
  }

  # For real protection
  deletion_protection = true
}

Database subnet group requires subnets in multiple AZs:

resource "aws_db_subnet_group" "main" {
  name       = "main"
  subnet_ids = aws_subnet.private[*].id
}
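
Tying the pieces together in the database resource itself (a sketch; the sizes are illustrative and aws_security_group.db is an assumed security group for the database tier):

resource "aws_db_instance" "postgres" {
  identifier     = "appdb"
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.t3.micro"

  allocated_storage = 20
  db_name           = "appdb"
  username          = "admin"
  password          = random_password.db.result

  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.db.id]

  # Acceptable while learning; production should take a final snapshot
  skip_final_snapshot = true
}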

Questions to explore:

  1. What happens if you try to terraform destroy with prevent_destroy?
  2. How do you rotate database credentials with Terraform?
  3. What’s the difference between skip_final_snapshot and production settings?
  4. How does your EC2 instance retrieve the password from Secrets Manager?

Learning milestones:

  1. RDS instance is accessible from EC2 → You understand VPC security
  2. Credentials are in Secrets Manager → You understand secrets handling
  3. destroy is blocked by lifecycle → You understand protection mechanisms
  4. You can rotate credentials → You understand secret rotation patterns

Project 6: Terraform Modules (Reusability)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: CDKTF (TypeScript/Python), Pulumi
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Software Architecture / Code Reuse
  • Software or Tool: Terraform, Terraform Registry
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A library of reusable Terraform modules—VPC module, EC2 module, RDS module—that encapsulate your previous projects and can be composed together. You’ll publish to a private module registry.

Why it teaches Terraform: Modules are the key to scalable Terraform. This project teaches you abstraction, interface design, versioning, and how professional teams structure their infrastructure code. You’ll learn why “copy-paste” doesn’t scale.

Core challenges you’ll face:

  • Designing module interfaces → maps to what to expose as variables vs hide
  • Module composition → maps to passing outputs as inputs between modules
  • Versioning strategy → maps to semantic versioning for infrastructure
  • Documentation → maps to README, input/output descriptions

Key Concepts:

Difficulty: Advanced
Time estimate: 2 weeks
Prerequisites: Projects 1-5 completed, understanding of software design principles

Real world outcome:

# Using your modules in a new project
$ cat main.tf
module "network" {
  source  = "git::https://github.com/yourname/terraform-aws-vpc.git?ref=v1.2.0"

  cidr_block  = "10.0.0.0/16"
  environment = "production"
  azs         = ["us-east-1a", "us-east-1b"]
}

module "webserver" {
  source = "git::https://github.com/yourname/terraform-aws-ec2.git?ref=v2.0.0"

  vpc_id    = module.network.vpc_id
  subnet_id = module.network.public_subnets[0]

  instance_type = "t3.small"
  user_data     = file("${path.module}/scripts/init.sh")
}

module "database" {
  source = "git::https://github.com/yourname/terraform-aws-rds.git?ref=v1.0.0"

  vpc_id             = module.network.vpc_id
  subnet_ids         = module.network.private_subnets
  allowed_cidr_blocks = [module.network.vpc_cidr]

  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.t3.micro"
}

$ terraform init
Initializing modules...
Downloading git::https://github.com/yourname/terraform-aws-vpc.git?ref=v1.2.0...
Downloading git::https://github.com/yourname/terraform-aws-ec2.git?ref=v2.0.0...
Downloading git::https://github.com/yourname/terraform-aws-rds.git?ref=v1.0.0...

$ terraform apply
Apply complete! Resources: 24 added, 0 changed, 0 destroyed.

Implementation Hints:

Module directory structure:

modules/
├── vpc/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   ├── versions.tf
│   └── README.md
├── ec2/
│   ├── main.tf
│   ├── variables.tf
│   ├── outputs.tf
│   └── README.md
└── rds/
    ├── main.tf
    ├── variables.tf
    ├── outputs.tf
    └── README.md
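
Before publishing anywhere, you can consume one of these modules straight from a local path (a sketch; the inputs are illustrative):

module "vpc" {
  source = "./modules/vpc"

  cidr_block  = "10.0.0.0/16"
  environment = "dev"
}

output "vpc_id" {
  value = module.vpc.vpc_id
}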

Good module interface design principles:

  1. Minimal required inputs - Most things should have sensible defaults
  2. Predictable outputs - Output everything consumers might need
  3. Clear naming - Variable names should be self-documenting
  4. Validation - Use validation blocks to catch errors early

Variable validation example:

variable "environment" {
  type        = string
  description = "Environment name (dev, staging, prod)"

  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging, or prod."
  }
}

Questions to explore:

  1. When should you create a new module vs extend an existing one?
  2. How do you handle breaking changes in module versions?
  3. What’s the trade-off between flexible modules and opinionated modules?
  4. How do you test modules before releasing new versions?

Learning milestones:

  1. Modules work from local paths → You understand basic module structure
  2. Modules work from Git refs → You understand versioning
  3. You compose multiple modules → You understand module interfaces
  4. You document modules properly → You understand professional practices

Project 7: Remote State and Workspaces (Team Collaboration)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: N/A (Terraform-specific feature)
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: State Management / Collaboration
  • Software or Tool: Terraform, AWS S3, DynamoDB
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A remote backend configuration using S3 for state storage and DynamoDB for state locking, with workspace-based environment separation (dev/staging/prod).

Why it teaches Terraform: State is Terraform’s Achilles heel—lose it and you lose your ability to manage infrastructure. This project teaches the production-grade patterns that every team needs: remote state, locking, and environment isolation.

Core challenges you’ll face:

  • Backend configuration chicken-and-egg → maps to bootstrapping state backend
  • State locking → maps to concurrent access prevention
  • Workspace isolation → maps to environment separation patterns
  • State migration → maps to moving from local to remote

Key Concepts:

  • Remote State: “Terraform: Up & Running” Chapter 3 - Yevgeniy Brikman
  • State Locking: HashiCorp Docs - State Locking
  • Workspaces: HashiCorp Docs - Workspaces
  • Backend Configuration: HashiCorp Docs - S3 Backend

Difficulty: Intermediate
Time estimate: Weekend
Prerequisites: Projects 1-6 completed, understanding of DynamoDB basics

Real world outcome:

# Initial setup (run once to create backend infrastructure)
$ cd bootstrap/
$ terraform apply
Apply complete! Resources: 3 added

# Configure backend in your main project
$ cat backend.tf
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-12345"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

# Migrate existing state
$ terraform init -migrate-state
Initializing the backend...
Copying state from local to S3...
Successfully configured the backend "s3"!

# Work with multiple environments
$ terraform workspace new staging
Created and switched to workspace "staging"!

$ terraform workspace new production
Created and switched to workspace "production"!

$ terraform workspace list
  default
  staging
* production

# If someone else tries to apply at the same time...
$ terraform apply
Acquiring state lock...
Error: Error acquiring the state lock

Lock Info:
  ID:        abc123-def456
  Path:      s3://my-terraform-state-12345/infrastructure/terraform.tfstate
  Operation: apply
  Who:       colleague@laptop
  Created:   2025-01-15 10:30:00

Implementation Hints:

Bootstrap configuration (run first, stores state locally):

# bootstrap/main.tf
resource "aws_s3_bucket" "state" {
  bucket = "my-terraform-state-${random_id.suffix.hex}"
}

resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_dynamodb_table" "locks" {
  name         = "terraform-locks"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

Workspace-aware configuration:

locals {
  environment = terraform.workspace

  instance_type = {
    dev        = "t3.micro"
    staging    = "t3.small"
    production = "t3.large"
  }
}

resource "aws_instance" "app" {
  instance_type = local.instance_type[local.environment]
  # ...
}

Questions to explore:

  1. What happens if you delete the DynamoDB table while someone holds a lock?
  2. How do you recover from a stuck lock?
  3. What’s the difference between workspaces and separate state files?
  4. How do you share outputs between workspaces (hint: terraform_remote_state)?

Learning milestones:

  1. State is in S3 → You understand remote state basics
  2. Locking prevents conflicts → You understand collaboration safety
  3. Workspaces separate environments → You understand isolation
  4. You migrate state successfully → You understand state management

Project 8: Docker Infrastructure with Terraform

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Docker Compose (for comparison)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Containers / Local Development
  • Software or Tool: Terraform, Docker Provider
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A local development environment using Terraform’s Docker provider—managing containers, networks, and volumes the same way you manage cloud infrastructure.

Why it teaches Terraform: Terraform isn’t just for clouds. This project proves that Terraform’s model applies to any API. You’ll understand that a “provider” is just an API translator, and Terraform’s value is the workflow, not the cloud integration.

Core challenges you’ll face:

  • Docker provider configuration → maps to non-cloud provider patterns
  • Container lifecycle → maps to resource update behaviors
  • Network configuration → maps to cross-resource dependencies
  • Volume persistence → maps to stateful resource handling

Key Concepts:

  • Docker Provider: Terraform Registry - kreuzwerker/docker
  • Container Orchestration: Docker Documentation
  • Terraform Providers: “Terraform: Up & Running” Chapter 7 - Yevgeniy Brikman

Difficulty: Intermediate
Time estimate: Weekend
Prerequisites: Basic Docker knowledge, Projects 1-3 completed

Real world outcome:

$ terraform apply
docker_network.app_network: Creating...
docker_volume.postgres_data: Creating...
docker_network.app_network: Creation complete after 1s
docker_volume.postgres_data: Creation complete after 0s
docker_container.postgres: Creating...
docker_container.postgres: Creation complete after 2s
docker_container.redis: Creating...
docker_container.redis: Creation complete after 1s
docker_container.app: Creating...
docker_container.app: Creation complete after 1s

Apply complete! Resources: 5 added, 0 changed, 0 destroyed.

Outputs:

app_url = "http://localhost:8080"
postgres_port = 5432

$ docker ps
CONTAINER ID   IMAGE          STATUS         PORTS                    NAMES
abc123         myapp:latest   Up 2 minutes   0.0.0.0:8080->8080/tcp   app
def456         postgres:15    Up 2 minutes   0.0.0.0:5432->5432/tcp   postgres
ghi789         redis:7        Up 2 minutes   6379/tcp                 redis

$ curl http://localhost:8080/health
{"status": "healthy", "database": "connected", "cache": "connected"}

Implementation Hints:

Docker provider configuration:

terraform {
  required_providers {
    docker = {
      source  = "kreuzwerker/docker"
      version = "~> 3.0"
    }
  }
}

provider "docker" {
  host = "unix:///var/run/docker.sock"
}

Multi-container application:

resource "docker_network" "app" {
  name = "app-network"
}

resource "docker_volume" "postgres_data" {
  name = "postgres-data"
}

resource "docker_container" "postgres" {
  name  = "postgres"
  image = docker_image.postgres.image_id

  networks_advanced {
    name = docker_network.app.name
  }

  volumes {
    volume_name    = docker_volume.postgres_data.name
    container_path = "/var/lib/postgresql/data"
  }

  env = [
    "POSTGRES_PASSWORD=secret"
  ]
}

Questions to explore:

  1. How does Terraform handle container image updates?
  2. What’s the difference between Docker Compose and Terraform for containers?
  3. How do you handle container logs and debugging?
  4. Can you use Terraform to build Docker images?

Learning milestones:

  1. Containers start via Terraform → You understand provider universality
  2. Containers communicate → You understand Terraform networking
  3. Data persists after destroy/apply → You understand volumes
  4. You see when Terraform shines vs Docker Compose → You understand tool selection

Project 9: Multi-Environment Deployment (Dev/Staging/Prod)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Terragrunt (HCL extension)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Environment Management / GitOps
  • Software or Tool: Terraform, Git, Terragrunt (optional)
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A complete infrastructure deployment pipeline supporting multiple environments (dev/staging/prod) with different configurations, proper variable management, and environment promotion workflows.

Why it teaches Terraform: Real infrastructure has environments. This project teaches the patterns for managing configuration differences, preventing “works on dev, breaks on prod” scenarios, and creating safe promotion paths.

Core challenges you’ll face:

  • Configuration variance → maps to tfvars files and environment-specific values
  • DRY principle → maps to modules, locals, and variable defaults
  • Environment isolation → maps to separate state files and accounts
  • Promotion workflow → maps to GitOps and PR-based deployments

Key Concepts:

  • Multi-Environment Patterns: “Terraform: Up & Running” Chapter 5 - Yevgeniy Brikman
  • Terragrunt: Gruntwork.io - Terragrunt Documentation
  • GitOps for Infrastructure: HashiCorp Blog - GitOps Patterns

Difficulty: Advanced
Time estimate: 2 weeks
Prerequisites: Projects 1-7 completed, Git workflow understanding

Real world outcome:

# Directory structure
$ tree environments/
environments/
├── dev/
│   ├── main.tf
│   ├── backend.tf
│   └── terraform.tfvars
├── staging/
│   ├── main.tf
│   ├── backend.tf
│   └── terraform.tfvars
└── prod/
    ├── main.tf
    ├── backend.tf
    └── terraform.tfvars

# Deploy to dev
$ cd environments/dev
$ terraform apply
Apply complete! Resources: 12 added

# Compare environments
$ terraform-docs markdown table environments/dev > docs/dev.md
$ diff environments/dev/terraform.tfvars environments/prod/terraform.tfvars
1c1
< environment     = "dev"
---
> environment     = "prod"
3c3
< instance_type   = "t3.micro"
---
> instance_type   = "t3.large"
5c5
< db_instance_class = "db.t3.micro"
---
> db_instance_class = "db.r5.large"

# Promote to staging (via Git PR)
$ git checkout -b promote-to-staging
$ cp environments/dev/terraform.tfvars environments/staging/
$ git add . && git commit -m "Promote dev config to staging"
$ git push && gh pr create

Implementation Hints:

Environment-specific tfvars:

# environments/dev/terraform.tfvars
environment       = "dev"
instance_type     = "t3.micro"
instance_count    = 1
db_instance_class = "db.t3.micro"
enable_monitoring = false

# environments/prod/terraform.tfvars
environment       = "prod"
instance_type     = "t3.large"
instance_count    = 3
db_instance_class = "db.r5.large"
enable_monitoring = true
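
Each environment also gets its own state file. A sketch of environments/prod/backend.tf, reusing the S3 backend from Project 7 with a different key:

terraform {
  backend "s3" {
    bucket         = "my-terraform-state-12345"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}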

Shared module with environment-aware defaults:

# modules/app/variables.tf
variable "environment" {
  type = string
}

variable "enable_monitoring" {
  type    = bool
  default = false
}

variable "alarm_actions" {
  type    = list(string)
  default = []
}

# modules/app/main.tf
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  count = var.enable_monitoring ? 1 : 0

  alarm_name    = "${var.environment}-high-cpu"
  alarm_actions = var.alarm_actions
  # ...
}

Terragrunt alternative (DRYer approach):

# terragrunt.hcl in each environment
include "root" {
  path = find_in_parent_folders()
}

terraform {
  source = "../../modules//app"
}

inputs = {
  environment   = "dev"
  instance_type = "t3.micro"
}

Questions to explore:

  1. Should environments share state or have separate state files?
  2. How do you handle environment-specific resources (e.g., only prod has CDN)?
  3. What’s the role of Git branches in environment management?
  4. How do you prevent accidental production changes?

Learning milestones:

  1. Same code deploys to multiple environments → You understand variable management
  2. Environments have different resources → You understand conditional logic
  3. Promotion requires approval → You understand GitOps patterns
  4. Prod is protected → You understand access control

Project 10: Kubernetes Cluster with EKS

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: eksctl, Pulumi (TypeScript)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Container Orchestration / Kubernetes
  • Software or Tool: Terraform, AWS EKS, kubectl
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A production-ready EKS cluster with managed node groups, proper IAM roles, VPC CNI networking, and cluster add-ons (CoreDNS, kube-proxy, VPC CNI).

Why it teaches Terraform: EKS is among the most complex AWS services to provision. This project forces you to understand IAM roles for service accounts, cross-resource dependencies, and how Terraform handles resources that take 15+ minutes to create.

Core challenges you’ll face:

  • IAM role trust policies → maps to assume_role and service-linked roles
  • EKS networking complexity → maps to VPC CNI and pod networking
  • Long creation times → maps to Terraform timeouts and patience
  • Kubernetes provider bootstrap → maps to provider dependencies on resources

Key Concepts:

Difficulty: Expert
Time estimate: 2-3 weeks
Prerequisites: Kubernetes basics, Docker, Projects 1-9 completed

Real world outcome:

$ terraform apply
module.eks.aws_eks_cluster.this: Creating...
module.eks.aws_eks_cluster.this: Still creating... [10m elapsed]
module.eks.aws_eks_cluster.this: Creation complete after 12m3s
module.eks.aws_eks_node_group.this: Creating...
module.eks.aws_eks_node_group.this: Still creating... [5m elapsed]
module.eks.aws_eks_node_group.this: Creation complete after 6m45s

Apply complete! Resources: 32 added, 0 changed, 0 destroyed.

Outputs:

cluster_endpoint = "https://ABC123.sk1.us-east-1.eks.amazonaws.com"
cluster_name = "my-terraform-cluster"

# Configure kubectl
$ aws eks update-kubeconfig --name my-terraform-cluster --region us-east-1
Added new context arn:aws:eks:us-east-1:123456789:cluster/my-terraform-cluster

$ kubectl get nodes
NAME                                       STATUS   ROLES    AGE   VERSION
ip-10-0-1-123.ec2.internal                Ready    <none>   5m    v1.28
ip-10-0-2-234.ec2.internal                Ready    <none>   5m    v1.28

$ kubectl run nginx --image=nginx
pod/nginx created

$ kubectl get pods
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          30s

Implementation Hints:

Use the community-maintained terraform-aws-modules EKS module (don’t reinvent the wheel):

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "my-cluster"
  cluster_version = "1.28"

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  eks_managed_node_groups = {
    default = {
      min_size     = 2
      max_size     = 4
      desired_size = 2

      instance_types = ["t3.medium"]
    }
  }
}

If building from scratch, understand the dependency chain (step 1 is sketched after this list):

  1. IAM role for EKS cluster (trust policy: eks.amazonaws.com)
  2. EKS cluster (takes ~10 minutes)
  3. IAM role for node groups (trust policy: ec2.amazonaws.com)
  4. Node groups (take ~5 minutes each)
  5. aws-auth ConfigMap (for kubectl access)
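
A sketch of step 1, the cluster role that EKS assumes (the role name is illustrative):

resource "aws_iam_role" "eks_cluster" {
  name = "eks-cluster-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = "sts:AssumeRole"
      Principal = {
        Service = "eks.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster" {
  role       = aws_iam_role.eks_cluster.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
}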

Kubernetes provider configuration after EKS:

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    args        = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

Questions to explore:

  1. Why does EKS take so long to create?
  2. What’s the difference between managed and self-managed node groups?
  3. How do you add cluster add-ons (metrics-server, ingress controller)?
  4. How do you handle EKS upgrades?

Learning milestones:

  1. Cluster is running → You understand EKS basics
  2. kubectl works → You understand authentication flow
  3. Pods schedule correctly → You understand node groups
  4. You can deploy applications → You understand the full workflow

Project 11: CI/CD Pipeline for Terraform (GitOps)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: YAML (GitHub Actions), HCL
  • Alternative Programming Languages: GitLab CI, Jenkins, Atlantis
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 3: Advanced
  • Knowledge Area: CI/CD / DevOps / Automation
  • Software or Tool: GitHub Actions, Terraform, tflint, tfsec
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A complete CI/CD pipeline that runs terraform fmt, terraform validate, security scanning, terraform plan on PRs (with plan output as comments), and terraform apply on merge to main.

Why it teaches Terraform: Manual Terraform runs don’t scale. This project teaches the professional workflow: code review for infrastructure changes, automated validation, and safe automated deployments.

Core challenges you’ll face:

  • Credential management in CI → maps to OIDC, secrets, and least privilege
  • Plan output as PR comment → maps to Terraform output parsing
  • Concurrency control → maps to preventing parallel applies
  • Approval workflows → maps to gating production deployments

Key Concepts:

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Git, GitHub Actions basics, Projects 1-10 completed

Real world outcome:

# .github/workflows/terraform.yml
name: Terraform

on:
  pull_request:
    paths: ['**.tf']
  push:
    branches: [main]
    paths: ['**.tf']

# OIDC role assumption and PR comments need these token permissions
permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Format Check
        run: terraform fmt -check -recursive

      - name: Terraform Init
        run: terraform init -backend=false

      - name: Terraform Validate
        run: terraform validate

      - name: tfsec Security Scan
        uses: aquasecurity/tfsec-action@v1.0.0

  plan:
    if: github.event_name == 'pull_request'
    needs: validate
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/github-actions
          aws-region: us-east-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init

      - name: Terraform Plan
        id: plan
        run: terraform plan -no-color -out=tfplan

      - name: Comment Plan on PR
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '## Terraform Plan\n```\n${{ steps.plan.outputs.stdout }}\n```'
            })

  apply:
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    needs: validate
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/github-actions
          aws-region: us-east-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init

      - name: Terraform Apply
        run: terraform apply -auto-approve
# PR Comment example:
## Terraform Plan

Terraform will perform the following actions:

  # aws_instance.web will be updated in-place
  ~ resource "aws_instance" "web" {
      ~ instance_type = "t3.micro" -> "t3.small"
    }

Plan: 0 to add, 1 to change, 0 to destroy.

Implementation Hints:

OIDC for AWS (no long-lived secrets):

# Create in AWS first
resource "aws_iam_openid_connect_provider" "github" {
  url             = "https://token.actions.githubusercontent.com"
  client_id_list  = ["sts.amazonaws.com"]
  thumbprint_list = ["6938fd4d..."]
}

resource "aws_iam_role" "github_actions" {
  name = "github-actions"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Principal = {
        Federated = aws_iam_openid_connect_provider.github.arn
      }
      Action = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          "token.actions.githubusercontent.com:sub" = "repo:yourorg/yourrepo:*"
        }
      }
    }]
  })
}

Concurrency control:

concurrency:
  group: terraform-${{ github.ref }}
  cancel-in-progress: false

Questions to explore:

  1. How do you handle multiple environments in the same repo?
  2. What’s the difference between GitHub Actions and Atlantis?
  3. How do you implement manual approval for production?
  4. How do you handle Terraform state locking conflicts in CI?

Learning milestones:

  1. Format/validate runs on every PR → You understand basic CI
  2. Plan appears as PR comment → You understand review workflows
  3. Apply only runs on main → You understand gated deployments
  4. No long-lived credentials → You understand security best practices

Project 12: Import Existing Infrastructure

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Terraformer (auto-generator)
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Migration / Legacy Systems
  • Software or Tool: Terraform, Terraformer, AWS CLI
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: Take existing manually-created AWS infrastructure (EC2, VPC, RDS created via console) and bring it under Terraform management without destroying and recreating resources.

Why it teaches Terraform: Real-world Terraform adoption almost never starts greenfield. This project teaches the critical skill of importing existing resources, handling state without disruption, and the detective work required to reverse-engineer infrastructure.

Core challenges you’ll face:

  • Discovering existing resources → maps to AWS CLI and console archaeology
  • Writing matching configuration → maps to reverse engineering infrastructure
  • Import without state drift → maps to plan showing no changes
  • Handling dependencies → maps to import order matters

Key Concepts:

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: AWS CLI proficiency, Projects 1-7 completed

Real world outcome:

# Step 1: Discover what exists
$ aws ec2 describe-instances --query 'Reservations[].Instances[].{ID:InstanceId,Type:InstanceType,VPC:VpcId}'
[
    {
        "ID": "i-0abc123def456",
        "Type": "t3.small",
        "VPC": "vpc-0xyz789"
    }
]

# Step 2: Write matching config (modern Terraform 1.5+ way)
$ cat imports.tf
import {
  to = aws_instance.legacy_web
  id = "i-0abc123def456"
}

import {
  to = aws_vpc.legacy
  id = "vpc-0xyz789"
}

# Step 3: Generate config from imports
$ terraform plan -generate-config-out=generated.tf
Planning...
aws_instance.legacy_web: Preparing import... [id=i-0abc123def456]
aws_vpc.legacy: Preparing import... [id=vpc-0xyz789]

# Review generated config
$ cat generated.tf
resource "aws_instance" "legacy_web" {
  ami                    = "ami-12345678"
  instance_type          = "t3.small"
  subnet_id              = "subnet-abc123"
  vpc_security_group_ids = ["sg-xyz789"]
  # ... full configuration
}

# Step 4: Apply import
$ terraform apply
aws_instance.legacy_web: Importing... [id=i-0abc123def456]
aws_instance.legacy_web: Import complete

# Step 5: Verify no drift
$ terraform plan
No changes. Your infrastructure matches the configuration.

Implementation Hints:

Old import method (still useful):

# Write the resource block first (empty or guessed)
resource "aws_instance" "legacy" {
  # TODO: fill in after import
}

# Import into state
terraform import aws_instance.legacy i-0abc123def456

# Now terraform plan will show what's different
terraform plan
# Add missing attributes until plan shows "no changes"

Terraformer for bulk import:

# Install terraformer
brew install terraformer

# Import all EC2 instances
terraformer import aws --resources=ec2_instance --regions=us-east-1

# Output is in generated/aws/ec2_instance/

Common gotchas:

  • Some attributes are “import-only” and not in normal output
  • Security groups often have circular dependencies
  • Some resources can’t be imported (must recreate)
  • Order matters: VPC before subnets before instances

Questions to explore:

  1. What resources can’t be imported?
  2. How do you handle resources created by other tools (CloudFormation)?
  3. What’s the strategy for importing a 500-resource account?
  4. How do you handle imports across multiple Terraform states?

Learning milestones:

  1. Single resource imports cleanly → You understand import basics
  2. Plan shows no changes → You matched the config perfectly
  3. Multiple dependent resources import → You understand import order
  4. You document the import process → You can repeat it for others

Project 13: Serverless Infrastructure (Lambda + API Gateway)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL, Python (Lambda code)
  • Alternative Programming Languages: Serverless Framework, AWS SAM
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Serverless / FaaS / API Management
  • Software or Tool: Terraform, AWS Lambda, API Gateway
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A serverless API with Lambda functions behind API Gateway, including IAM roles, CloudWatch logs, and a custom domain with SSL.

Why it teaches Terraform: Serverless is the opposite of traditional infrastructure—no servers to manage. This project shows Terraform’s versatility and teaches the IAM permission model that governs all AWS services.

Core challenges you’ll face:

  • Lambda packaging → maps to deployment packages and layers
  • API Gateway complexity → maps to routes, methods, integrations
  • IAM permissions → maps to least privilege for Lambda execution
  • Custom domains → maps to ACM certificates and Route53

Key Concepts:

Difficulty: Advanced
Time estimate: 1-2 weeks
Prerequisites: Python basics, Projects 1-7 completed

Real world outcome:

$ terraform apply
aws_lambda_function.api: Creating...
aws_apigatewayv2_api.main: Creating...
...
Apply complete! Resources: 12 added, 0 changed, 0 destroyed.

Outputs:

api_endpoint = "https://abc123.execute-api.us-east-1.amazonaws.com"
custom_domain = "https://api.myapp.com"

$ curl https://api.myapp.com/hello
{"message": "Hello from Terraform-deployed Lambda!"}

$ curl https://api.myapp.com/users/123
{"user_id": "123", "name": "John Doe"}

$ aws logs tail /aws/lambda/my-api --follow
START RequestId: abc-123
{"level": "INFO", "message": "Processing request"}
END RequestId: abc-123
REPORT Duration: 3.21 ms   Billed Duration: 4 ms   Memory Size: 128 MB

Implementation Hints:

Lambda function with Terraform:

data "archive_file" "lambda" {
  type        = "zip"
  source_dir  = "${path.module}/lambda"
  output_path = "${path.module}/lambda.zip"
}

resource "aws_lambda_function" "api" {
  filename         = data.archive_file.lambda.output_path
  function_name    = "my-api"
  role             = aws_iam_role.lambda.arn
  handler          = "main.handler"
  source_code_hash = data.archive_file.lambda.output_base64sha256
  runtime          = "python3.11"

  environment {
    variables = {
      ENVIRONMENT = var.environment
    }
  }
}

resource "aws_iam_role" "lambda" {
  name = "lambda-execution-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "lambda.amazonaws.com"
      }
    }]
  })
}

API Gateway v2 (HTTP API):

resource "aws_apigatewayv2_api" "main" {
  name          = "my-api"
  protocol_type = "HTTP"
}

resource "aws_apigatewayv2_integration" "lambda" {
  api_id           = aws_apigatewayv2_api.main.id
  integration_type = "AWS_PROXY"
  integration_uri  = aws_lambda_function.api.invoke_arn
}

resource "aws_apigatewayv2_route" "hello" {
  api_id    = aws_apigatewayv2_api.main.id
  route_key = "GET /hello"
  target    = "integrations/${aws_apigatewayv2_integration.lambda.id}"
}
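
Two more pieces are usually needed before requests actually flow: a stage on the HTTP API and permission for API Gateway to invoke the function (a sketch):

resource "aws_apigatewayv2_stage" "default" {
  api_id      = aws_apigatewayv2_api.main.id
  name        = "$default"
  auto_deploy = true
}

resource "aws_lambda_permission" "apigw" {
  statement_id  = "AllowAPIGatewayInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.api.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_apigatewayv2_api.main.execution_arn}/*/*"
}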

Questions to explore:

  1. What’s the difference between API Gateway REST API and HTTP API?
  2. How do you handle Lambda cold starts?
  3. How do you deploy Lambda updates without downtime?
  4. What’s the trade-off between Terraform and Serverless Framework?

Learning milestones:

  1. Lambda responds to HTTP requests → You understand serverless basics
  2. Logs appear in CloudWatch → You understand observability
  3. Custom domain works with SSL → You understand ACM and Route53
  4. You can deploy updates safely → You understand deployment patterns

Project 14: Infrastructure Testing with Terratest

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: Go, HCL
  • Alternative Programming Languages: Python (pytest-terraform), Terraform Test (native)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 4: Expert
  • Knowledge Area: Testing / Quality Assurance / TDD
  • Software or Tool: Terratest, Go, Terraform Test
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A comprehensive test suite for your Terraform modules using both Terratest (Go-based) and native Terraform testing, including unit tests, integration tests, and end-to-end validation.

Why it teaches Terraform: Infrastructure needs testing just like application code. This project teaches professional-grade IaC practices: testing modules before release, validating assumptions, and catching regressions before they hit production.

Core challenges you’ll face:

  • Test isolation → maps to creating unique resources per test run
  • Test cleanup → maps to defer and destroy patterns
  • Async infrastructure → maps to retry logic and eventual consistency
  • Cost management → maps to running tests economically

Key Concepts:

  • Terratest: Gruntwork - Terratest Documentation
  • Native Terraform Test: HashiCorp Docs - Tests
  • Testing Best Practices: “Terraform: Up & Running” Chapter 9 - Yevgeniy Brikman

Difficulty: Expert
Time estimate: 2 weeks
Prerequisites: Basic Go knowledge, Projects 1-6 completed

Real world outcome:

# Native Terraform Test (Terraform 1.6+)
$ cat tests/vpc.tftest.hcl
run "vpc_creates_successfully" {
  command = apply

  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR block is incorrect"
  }

  assert {
    condition     = length(aws_subnet.private) == 2
    error_message = "Expected 2 private subnets"
  }
}

run "vpc_has_internet_access" {
  command = apply

  assert {
    condition     = aws_internet_gateway.main.id != ""
    error_message = "Internet gateway not created"
  }
}

$ terraform test
tests/vpc.tftest.hcl... in progress
  run "vpc_creates_successfully"... pass
  run "vpc_has_internet_access"... pass
tests/vpc.tftest.hcl... tearing down
tests/vpc.tftest.hcl... pass

Success! 2 passed, 0 failed.

# Terratest (Go)
$ go test -v -timeout 30m ./tests/
=== RUN   TestVpcModule
    vpc_test.go:25: Creating VPC with unique name: test-vpc-abc123
    vpc_test.go:40: VPC created: vpc-0abc123
    vpc_test.go:55: Verifying public subnet has internet access...
    vpc_test.go:60: HTTP request to internet succeeded
    vpc_test.go:70: Destroying test infrastructure...
--- PASS: TestVpcModule (180.25s)
PASS

Implementation Hints:

Terratest example (Go):

package test

import (
    "testing"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/gruntwork-io/terratest/modules/aws"
    "github.com/stretchr/testify/assert"
)

func TestVpcModule(t *testing.T) {
    t.Parallel()

    terraformOptions := &terraform.Options{
        TerraformDir: "../modules/vpc",
        Vars: map[string]interface{}{
            "cidr_block":  "10.0.0.0/16",
            "environment": "test",
        },
    }

    // Clean up after test
    defer terraform.Destroy(t, terraformOptions)

    // Deploy infrastructure
    terraform.InitAndApply(t, terraformOptions)

    // Get outputs
    vpcId := terraform.Output(t, terraformOptions, "vpc_id")
    assert.NotEmpty(t, vpcId)

    // Validate the VPC actually exists in AWS
    vpc := aws.GetVpcById(t, vpcId, "us-east-1")
    assert.Equal(t, vpcId, vpc.Id)
}

Native Terraform test structure:

modules/vpc/
├── main.tf
├── variables.tf
├── outputs.tf
└── tests/
    ├── unit.tftest.hcl      # Fast, mock-based tests
    └── integration.tftest.hcl # Real infrastructure tests

Test patterns:

  1. Unit tests: Validate configuration without applying (plan-only; see the sketch after this list)
  2. Integration tests: Apply to real cloud, verify, destroy
  3. E2E tests: Deploy full stack, run application tests
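
A plan-only run block never touches the cloud, so pattern 1 costs nothing: assertions are evaluated against planned values. A sketch of what unit.tftest.hcl might contain (variable names assumed from the module above):

run "cidr_matches_input" {
  command = plan

  variables {
    cidr_block  = "10.0.0.0/16"
    environment = "test"
  }

  assert {
    condition     = aws_vpc.main.cidr_block == "10.0.0.0/16"
    error_message = "VPC CIDR should match the input variable"
  }
}

From Terraform 1.7 onward, mock_provider blocks let tests like this run without any cloud credentials at all.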

Questions to explore:

  1. How do you test modules without spending money?
  2. What’s the difference between Terratest and native tests?
  3. How do you handle flaky tests (eventual consistency)?
  4. How do you parallelize tests safely?

Learning milestones:

  1. Tests run and pass → You understand test setup
  2. Tests create/destroy cleanly → You understand test isolation
  3. Tests catch real bugs → You understand test value
  4. Tests run in CI → You understand automated testing

Project 15: Multi-Cloud Deployment (AWS + GCP)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL
  • Alternative Programming Languages: Pulumi (multi-cloud native)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 4: Expert
  • Knowledge Area: Multi-Cloud / Hybrid Architecture
  • Software or Tool: Terraform, AWS, GCP, Consul/Tailscale
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A multi-cloud architecture with a web tier on AWS and database on GCP (Cloud SQL), connected via secure VPN tunnel, demonstrating true multi-cloud orchestration.

Why it teaches Terraform: This is Terraform’s killer feature—one tool, multiple clouds. This project forces you to understand provider abstraction, network connectivity between clouds, and the reality of multi-cloud complexity.

Core challenges you’ll face:

  • Multi-provider configuration → maps to provider aliases and credentials
  • Cross-cloud networking → maps to VPN, peering, or overlay networks
  • Consistent naming and tagging → maps to locals and conventions
  • Deployment ordering → maps to depends_on across providers

Key Concepts:

  • Multi-Cloud Terraform: “Terraform: Up & Running” Chapter 7 - Yevgeniy Brikman
  • GCP Networking: Google Cloud Documentation
  • VPN Connectivity: AWS/GCP - Site-to-Site VPN

Difficulty: Expert Time estimate: 3-4 weeks Prerequisites: GCP basics, VPN/networking, Projects 1-10 completed

Real world outcome:

$ terraform apply
# AWS resources
aws_vpc.main: Creating...
aws_vpn_gateway.main: Creating...
aws_instance.web: Creating...

# GCP resources
google_compute_network.main: Creating...
google_sql_database_instance.main: Creating...
google_compute_vpn_gateway.main: Creating...

# Cross-cloud VPN
aws_vpn_connection.to_gcp: Creating...
google_compute_vpn_tunnel.to_aws: Creating...

Apply complete! Resources: 28 added, 0 changed, 0 destroyed.

Outputs:

aws_web_public_ip = "54.123.45.67"
gcp_database_private_ip = "10.100.0.5"
vpn_status = "established"

# Test connectivity from AWS to GCP
$ ssh ubuntu@54.123.45.67
ubuntu@web:~$ psql -h 10.100.0.5 -U admin -d appdb
Password:
appdb=> SELECT 'Connected from AWS to GCP!';
         ?column?
---------------------------
 Connected from AWS to GCP!

Implementation Hints:

Multi-provider configuration:

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

provider "google" {
  project = "my-project"
  region  = "us-central1"
}

# Provider aliases for multi-region
provider "aws" {
  alias  = "west"
  region = "us-west-2"
}
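
Resources opt into an aliased provider explicitly via the provider argument; anything without it uses the default. For example (resource is illustrative):

resource "aws_vpc" "west" {
  provider   = aws.west
  cidr_block = "10.1.0.0/16"
}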

Cross-cloud VPN concept:

AWS VPC (10.0.0.0/16)
    │
    └── VPN Gateway ──────VPN Tunnel────── VPN Gateway
                                               │
                                    GCP VPC (10.100.0.0/16)
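
A rough HCL sketch of the two ends of that tunnel, assuming a classic static-route VPN and the VPC/network resources implied above; forwarding rules, routes, and route propagation are omitted, and all names are illustrative:

# GCP side: a static IP and a classic VPN gateway
resource "google_compute_address" "vpn" {
  name   = "vpn-ip"
  region = "us-central1"
}

resource "google_compute_vpn_gateway" "main" {
  name    = "gcp-vpn-gw"
  network = google_compute_network.main.id
  region  = "us-central1"
}

# AWS side: VPN gateway plus a customer gateway pointing at GCP's IP
resource "aws_vpn_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

resource "aws_customer_gateway" "gcp" {
  bgp_asn    = 65000
  ip_address = google_compute_address.vpn.address
  type       = "ipsec.1"
}

resource "aws_vpn_connection" "to_gcp" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.gcp.id
  type                = "ipsec.1"
  static_routes_only  = true
}

# The GCP tunnel reuses the address and pre-shared key AWS generated
resource "google_compute_vpn_tunnel" "to_aws" {
  name                    = "to-aws"
  region                  = "us-central1"
  target_vpn_gateway      = google_compute_vpn_gateway.main.id
  peer_ip                 = aws_vpn_connection.to_gcp.tunnel1_address
  shared_secret           = aws_vpn_connection.to_gcp.tunnel1_preshared_key
  local_traffic_selector  = ["10.100.0.0/16"]
  remote_traffic_selector = ["10.0.0.0/16"]
}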

Shared naming convention:

locals {
  project = "myapp"
  env     = "prod"

  common_tags = {
    Project     = local.project
    Environment = local.env
    ManagedBy   = "terraform"
  }
}

resource "aws_vpc" "main" {
  tags = local.common_tags
}

resource "google_compute_network" "main" {
  labels = { for k, v in local.common_tags : lower(k) => lower(v) }
}
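
On the AWS side you can avoid repeating tags on every resource: the AWS provider's default_tags block merges a common tag set into everything it creates. A sketch extending the default aws provider block shown earlier:

provider "aws" {
  region = "us-east-1"

  default_tags {
    tags = local.common_tags
  }
}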

Questions to explore:

  1. What are the latency implications of cross-cloud communication?
  2. How do you handle features that exist in one cloud but have no equivalent in the other?
  3. What’s the cost model for VPN vs dedicated interconnect?
  4. How do you manage credentials for multiple clouds?

Learning milestones:

  1. Both clouds provision → You understand multi-provider
  2. VPN tunnel establishes → You understand cross-cloud networking
  3. Application connects across clouds → You understand end-to-end
  4. You understand trade-offs → You can advise on multi-cloud

Project 16: Custom Terraform Provider (Go)

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: Go
  • Alternative Programming Languages: N/A (in practice providers are written in Go; HashiCorp's plugin SDKs are Go-only)
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 5: Master
  • Knowledge Area: SDK Development / API Integration
  • Software or Tool: Terraform Plugin SDK, Go
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A custom Terraform provider for an internal service or third-party API (e.g., managing DNS records on a custom service, or a provider for an internal config management system).

Why it teaches Terraform: This is the deepest level of Terraform understanding. By building a provider, you’ll understand the entire Terraform plugin architecture, CRUD operations, state management, and how Terraform communicates with the outside world.

Core challenges you’ll face:

  • Plugin SDK architecture → maps to provider, resources, data sources
  • CRUD implementation → maps to Create, Read, Update, Delete functions
  • State handling → maps to schema and state management
  • Error handling → maps to diagnostics and retries

Key Concepts:

  • Provider Development: HashiCorp Docs - Plugin Development
  • Terraform Plugin Framework: HashiCorp - Plugin Framework
  • Go Programming: “The Go Programming Language” by Donovan & Kernighan

Difficulty: Master Time estimate: 1 month+ Prerequisites: Go proficiency, deep Terraform knowledge, all previous projects

Real world outcome:

# Your custom provider
$ cat main.tf
terraform {
  required_providers {
    internal = {
      source  = "yourcompany/internal"
      version = "1.0.0"
    }
  }
}

provider "internal" {
  api_endpoint = "https://internal-api.company.com"
  api_key      = var.internal_api_key
}

resource "internal_config" "app_settings" {
  name = "myapp"

  settings = {
    feature_flag_x = "true"
    max_connections = "100"
  }
}

data "internal_config" "existing" {
  name = "legacy-app"
}

$ terraform apply
internal_config.app_settings: Creating...
internal_config.app_settings: Creation complete [id=cfg-abc123]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

$ terraform state show internal_config.app_settings
# internal_config.app_settings:
resource "internal_config" "app_settings" {
    id       = "cfg-abc123"
    name     = "myapp"
    settings = {
        "feature_flag_x"  = "true"
        "max_connections" = "100"
    }
}

Implementation Hints:

Provider structure (Plugin Framework - modern approach):

// provider.go
package provider

import (
    "context"

    "github.com/hashicorp/terraform-plugin-framework/provider"
    "github.com/hashicorp/terraform-plugin-framework/provider/schema"
    "github.com/hashicorp/terraform-plugin-framework/resource"
)

type InternalProvider struct {
    version string
}

func (p *InternalProvider) Schema(ctx context.Context, req provider.SchemaRequest, resp *provider.SchemaResponse) {
    resp.Schema = schema.Schema{
        Attributes: map[string]schema.Attribute{
            "api_endpoint": schema.StringAttribute{
                Required: true,
            },
            "api_key": schema.StringAttribute{
                Required:  true,
                Sensitive: true,
            },
        },
    }
}

func (p *InternalProvider) Resources(ctx context.Context) []func() resource.Resource {
    return []func() resource.Resource{
        NewConfigResource,
    }
}

Resource implementation:

// resource_config.go
func (r *ConfigResource) Create(ctx context.Context, req resource.CreateRequest, resp *resource.CreateResponse) {
    // Read the planned values into the resource model
    var data ConfigResourceModel
    resp.Diagnostics.Append(req.Plan.Get(ctx, &data)...)
    if resp.Diagnostics.HasError() {
        return
    }

    // Call your API
    result, err := r.client.CreateConfig(data.Name.ValueString(), data.Settings)
    if err != nil {
        resp.Diagnostics.AddError("Create failed", err.Error())
        return
    }

    // Record the API-assigned ID and persist everything to state
    data.ID = types.StringValue(result.ID)
    resp.Diagnostics.Append(resp.State.Set(ctx, &data)...)
}
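
For local iteration you don't need to publish the provider at all: Terraform's CLI configuration file (~/.terraformrc on Unix, terraform.rc on Windows) supports dev_overrides, which points a provider source address at a locally built binary. A sketch, with a hypothetical path:

provider_installation {
  dev_overrides {
    "yourcompany/internal" = "/home/you/go/bin"
  }

  # Every other provider still comes from the registry
  direct {}
}

With the override in place, terraform plan and apply use your local build for that source and print a warning that overrides are active.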

Questions to explore:

  1. When should you build a custom provider vs use http provider?
  2. How do you handle API pagination and rate limiting?
  3. How do you test providers?
  4. How do you publish to the Terraform Registry?

Learning milestones:

  1. Provider compiles → You understand plugin architecture
  2. Resources CRUD works → You understand the Terraform lifecycle
  3. Provider handles errors gracefully → You understand production quality
  4. Others can use your provider → You understand distribution

Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. Local Infrastructure Sandbox | Beginner | Weekend | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| 2. Static Website on S3 | Beginner | Weekend | ⭐⭐⭐ | ⭐⭐⭐ |
| 3. VPC from Scratch | Intermediate | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐ |
| 4. EC2 Web Server | Intermediate | 1 week | ⭐⭐⭐ | ⭐⭐⭐ |
| 5. RDS Database | Intermediate | 1 week | ⭐⭐⭐⭐ | ⭐⭐ |
| 6. Terraform Modules | Advanced | 2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| 7. Remote State | Intermediate | Weekend | ⭐⭐⭐⭐ | ⭐⭐ |
| 8. Docker Infrastructure | Intermediate | Weekend | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| 9. Multi-Environment | Advanced | 2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| 10. EKS Kubernetes | Expert | 2-3 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 11. CI/CD Pipeline | Advanced | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 12. Import Existing | Advanced | 1-2 weeks | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| 13. Serverless | Advanced | 1-2 weeks | ⭐⭐⭐ | ⭐⭐⭐⭐ |
| 14. Infrastructure Testing | Expert | 2 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ |
| 15. Multi-Cloud | Expert | 3-4 weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| 16. Custom Provider | Master | 1 month+ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |

Recommendation

If you’re brand new to Terraform:

Start with Project 1 (Local Infrastructure Sandbox) → Project 2 (S3 Website) → Project 3 (VPC)

This progression teaches you the core workflow without cloud costs (Project 1), then introduces real cloud resources (Project 2), then shows you infrastructure complexity (Project 3).

If you have some Terraform experience:

Jump to Project 6 (Modules) → Project 7 (Remote State) → Project 9 (Multi-Environment)

These projects teach professional-grade patterns that separate hobbyists from production-ready engineers.

If you want to become a Terraform expert:

Work through all projects, but especially focus on:

  • Project 14 (Testing) — Most teams skip this, but it’s crucial
  • Project 15 (Multi-Cloud) — Proves you understand Terraform’s core value
  • Project 16 (Custom Provider) — The ultimate deep dive

Final Overall Project: Production-Grade Multi-Tenant SaaS Platform

  • File: LEARN_TERRAFORM_DEEP_DIVE.md
  • Main Programming Language: HCL, Python, Go
  • Alternative Programming Languages: TypeScript (CDK), Pulumi
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 5. The “Industry Disruptor”
  • Difficulty: Level 5: Master
  • Knowledge Area: Full Stack Infrastructure / Platform Engineering
  • Software or Tool: Terraform, AWS, Kubernetes, GitOps, Monitoring
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A complete production-grade SaaS platform infrastructure including:

  • Multi-region VPC with transit gateway
  • EKS clusters (primary + DR)
  • RDS with read replicas and failover
  • ElastiCache for session/cache layer
  • CloudFront CDN with WAF
  • Multi-tenant isolation patterns
  • Complete observability stack (CloudWatch, Prometheus, Grafana)
  • Automated backup and disaster recovery
  • CI/CD with Terraform Cloud or self-hosted
  • Cost management and tagging strategy

Why this is the ultimate test: This project synthesizes everything: modules, remote state, multi-environment, testing, CI/CD, security, and operational excellence. It’s what enterprise Terraform looks like.

Core challenges you’ll face:

  • Module composition at scale → 50+ modules working together
  • State organization → Multiple state files, team boundaries
  • Blast radius management → Isolating critical resources
  • Cost control → Tagging, budgets, resource optimization (see the budget sketch after this list)
  • Compliance → Security groups, encryption, audit logging
  • DR planning → Failover automation, recovery testing
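
Budgets themselves can be managed as code; a hedged sketch of a monthly AWS budget with an email alert, as one piece of the cost-control challenge above (values and names are illustrative):

resource "aws_budgets_budget" "platform" {
  name         = "platform-monthly"
  budget_type  = "COST"
  limit_amount = "5000"
  limit_unit   = "USD"
  time_unit    = "MONTHLY"

  notification {
    comparison_operator        = "GREATER_THAN"
    threshold                  = 80
    threshold_type             = "PERCENTAGE"
    notification_type          = "FORECASTED"
    subscriber_email_addresses = ["platform-team@example.com"]
  }
}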

Real world outcome:

Platform Architecture:
├── networking/
│   ├── vpc-primary/
│   ├── vpc-dr/
│   └── transit-gateway/
├── compute/
│   ├── eks-primary/
│   ├── eks-dr/
│   └── node-groups/
├── data/
│   ├── rds-primary/
│   ├── rds-replica/
│   ├── elasticache/
│   └── s3-buckets/
├── security/
│   ├── iam-roles/
│   ├── kms-keys/
│   └── waf/
├── observability/
│   ├── cloudwatch/
│   ├── prometheus/
│   └── grafana/
└── cicd/
    ├── terraform-cloud/
    └── github-actions/

$ terraform workspace list
  dev
  staging
* production
  dr-production

$ terraform plan
Plan: 0 to add, 0 to change, 0 to destroy.

Your infrastructure is up to date.

$ ./scripts/dr-failover.sh
Initiating DR failover...
RDS failover: complete
EKS context switch: complete
DNS cutover: complete
DR failover complete in 4m 32s

Implementation Hints:

This project requires all previous skills combined. Key patterns:

  1. Layered architecture — Network layer deploys before compute, compute before data
  2. Dependency injection — Pass outputs between layers via remote state (see the sketch after this list)
  3. Environment parity — Same modules, different variables
  4. Security by default — Encryption everywhere, least privilege
  5. Observable by default — Metrics, logs, traces from day one
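
The usual mechanism for pattern 2 is the terraform_remote_state data source: a downstream layer reads the outputs an upstream layer published. A sketch assuming an S3 backend and a networking layer that outputs vpc_id and private_subnet_ids (bucket, key, and module names are illustrative):

data "terraform_remote_state" "network" {
  backend = "s3"

  config = {
    bucket = "myapp-terraform-state"
    key    = "networking/vpc-primary/terraform.tfstate"
    region = "us-east-1"
  }
}

module "eks_primary" {
  source = "../../modules/eks"

  vpc_id     = data.terraform_remote_state.network.outputs.vpc_id
  subnet_ids = data.terraform_remote_state.network.outputs.private_subnet_ids
}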

Questions to ask yourself:

  • Can I destroy and recreate any layer without data loss?
  • Can I deploy to a new region in under a day?
  • Can I onboard a new engineer without tribal knowledge?
  • Can I pass a security audit?

Learning milestones:

  1. Platform deploys end-to-end → You understand orchestration
  2. Team can work independently → You understand boundaries
  3. DR failover works → You understand resilience
  4. Costs are predictable → You understand operations

Resources Summary

Essential Books

  • “Terraform: Up & Running” by Yevgeniy Brikman — The definitive Terraform book, covers basics through advanced
  • “Terraform in Depth” by Robert Hafner (2025) — Modern best practices and patterns

Official Resources

Community Resources


Summary

| # | Project | Main Language |
|---|---|---|
| 1 | Local Infrastructure Sandbox | HCL |
| 2 | Static Website on S3 | HCL |
| 3 | VPC from Scratch | HCL |
| 4 | EC2 Web Server with Security Groups | HCL |
| 5 | RDS Database with Secrets Management | HCL |
| 6 | Terraform Modules | HCL |
| 7 | Remote State and Workspaces | HCL |
| 8 | Docker Infrastructure | HCL |
| 9 | Multi-Environment Deployment | HCL |
| 10 | Kubernetes Cluster with EKS | HCL |
| 11 | CI/CD Pipeline for Terraform | YAML (GitHub Actions), HCL |
| 12 | Import Existing Infrastructure | HCL |
| 13 | Serverless Infrastructure | HCL, Python |
| 14 | Infrastructure Testing | Go, HCL |
| 15 | Multi-Cloud Deployment | HCL |
| 16 | Custom Terraform Provider | Go |
| Final | Production-Grade Multi-Tenant SaaS Platform | HCL, Python, Go |

Sources