
LEARN MULTI-CLOUD BY DOING

Learn Multi-Cloud: From Vendor Lock-in to Global Architect

Goal: Master the art of designing systems that run across multiple cloud providers (AWS, Azure, GCP), understanding the trade-offs between portability, complexity, cost, and resilience.


The Reality of Multi-Cloud

Multi-cloud isn’t just about “using two clouds.” It’s about abstraction and interoperability. It is the discipline of treating cloud providers as commodity utilities rather than magical ecosystems.

When you build for a single cloud, you use their proprietary glue (Lambda, SQS, DynamoDB). When you build for multi-cloud, you must build your own glue or use open standards (Kubernetes, Terraform, OIDC, WireGuard).

After completing these projects, you will:

  • abstract away the differences between S3 and Azure Blob Storage.
  • build networks that span Google and Amazon data centers.
  • deploy infrastructure to any cloud with a single command.
  • understand “Data Gravity” and why moving data is the hardest part.
  • architect for “Disaster Recovery” where the disaster is a total provider outage.

Core Concept Analysis

The Three Pillars of Multi-Cloud

  1. Compute Abstraction: How to run code anywhere (Containers, K8s, Wasm).
  2. Data Consistency: Keeping state synced across providers (Replication, Federation).
  3. Connectivity: Making networks talk securely (VPNs, Interconnects, Mesh).

Project List

Projects are ordered from “Stateless/Simple” to “Stateful/Complex” to “Platform Engineering.”


Project 1: The High-Availability Static Site (Global Failover)

  • Main Programming Language: HCL (Terraform)
  • Alternative Programming Languages: Pulumi (TypeScript/Python)
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 1: Beginner
  • Knowledge Area: DNS / Object Storage / CDN
  • Software or Tool: Terraform, AWS S3, Azure Blob Storage, Cloudflare (DNS)
  • Main Book: “Terraform: Up & Running” by Yevgeniy Brikman

What you’ll build: A static website deployed simultaneously to AWS S3 and Azure Blob Storage. You will configure a DNS traffic manager (like Cloudflare Load Balancer or AWS Route53) to serve traffic from the primary cloud but automatically switch to the secondary cloud if the primary goes down.

Why it teaches Multi-cloud: This introduces the most fundamental concept: Redundancy. You will learn how two different providers solve the same problem (Object Storage) and how to mask those differences behind a DNS layer.

Core challenges you’ll face:

  • Provider Differences: S3 permissions vs. Azure Blob permissions (Public Access logic).
  • DNS Failover: Configuring health checks that actually detect a “down” state.
  • Content Sync: Ensuring index.html is identical on both clouds.

Key Concepts:

  • Object Storage: S3 Buckets vs Azure Containers.
  • DNS Routing Policies: failover and weighted records (DNS itself is specified in RFC 1035).
  • Infrastructure as Code (IaC): Defining both environments in one file.

Difficulty: Beginner
Time estimate: Weekend
Pre-requirements: Free tier accounts on AWS and Azure. Basic CLI knowledge.

Real world outcome: You kill the AWS bucket (delete the file or block access). Within about 30 seconds (one health-check interval plus DNS propagation), your website is still reachable, but now it’s being served by Microsoft Azure.

Implementation Hints: Use Terraform to provision the buckets. Use a third-party DNS (like Cloudflare) for the “switch” logic because it sits outside both clouds. If you use Route53 (AWS) to manage failover to Azure, you are still dependent on AWS DNS being up.
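A minimal sketch of the content-sync step in Python, assuming a bucket named my-failover-site, Azure’s $web static-website container, and a connection string exported as AZURE_STORAGE_CONNECTION_STRING (all placeholders):

```python
# sync_site.py - push the same index.html to both origins (minimal sketch).
import os

import boto3
from azure.storage.blob import BlobServiceClient

AWS_BUCKET = "my-failover-site"   # placeholder bucket name
AZURE_CONTAINER = "$web"          # Azure's static-website container

def sync(path: str = "index.html") -> None:
    # AWS side: credentials come from ~/.aws/credentials or env vars.
    boto3.client("s3").upload_file(
        path, AWS_BUCKET, path, ExtraArgs={"ContentType": "text/html"})

    # Azure side: upload the same bytes so both origins serve identical content.
    service = BlobServiceClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"])
    with open(path, "rb") as f:
        service.get_blob_client(AZURE_CONTAINER, path).upload_blob(
            f, overwrite=True)

if __name__ == "__main__":
    sync()
```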


Project 2: The “Cloud-Agnostic” Storage API

  • Main Programming Language: Python
  • Alternative Programming Languages: Go, Node.js
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Software Design / Abstraction Layers
  • Software or Tool: Boto3 (AWS SDK), Google Cloud Storage SDK
  • Main Book: “Clean Architecture” by Robert C. Martin (for the abstraction concept)

What you’ll build: A REST API service that allows users to upload files. The service accepts the file and saves it to either AWS S3 or Google Cloud Storage (GCS) based on a configuration flag or a “least used” algorithm. The user never knows which cloud holds their data.

Why it teaches Multi-cloud: This teaches Software Abstraction. You cannot let vendor-specific code (boto3.upload_file) leak into your business logic. You must build an IStorageProvider interface that unifies them.

Core challenges you’ll face:

  • Unified Interface: AWS returns an ETag; GCS returns a generation ID. You need to standardize the response.
  • Authentication: Handling AWS Credentials (~/.aws/credentials) and GCP Service Accounts (key.json) in the same app.
  • Error Handling: Mapping S3Exception and GoogleAPIError to a generic StorageError.

Key Concepts:

  • The Adapter Pattern: Design Patterns.
  • SDK Management: Managing multiple heavy client libraries.

Difficulty: Intermediate
Time estimate: 1 week
Pre-requirements: Basic Python/API knowledge.

Real world outcome:

POST /upload
{
  "provider": "auto",
  "file": "image.png"
}
Response:
{
  "id": "12345",
  "storage_backend": "gcp-us-east1"
}

Implementation Hints: Define an abstract base class StorageProvider with methods upload, download, and delete. Create S3Provider and GCSProvider classes that implement this. Use environment variables to toggle which one is active.
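A minimal sketch of that design, assuming boto3 and google-cloud-storage are installed; bucket names and the env-var toggle are left to you, and error mapping is abbreviated to the StorageError idea from above:

```python
# storage.py - minimal sketch of the adapter pattern for Project 2.
from abc import ABC, abstractmethod

class StorageError(Exception):
    """Generic error; real code wraps S3/Google exceptions into this."""

class StorageProvider(ABC):
    @abstractmethod
    def upload(self, key: str, data: bytes) -> str: ...  # returns a storage id
    @abstractmethod
    def download(self, key: str) -> bytes: ...
    @abstractmethod
    def delete(self, key: str) -> None: ...

class S3Provider(StorageProvider):
    def __init__(self, bucket: str):
        import boto3
        self.bucket, self.s3 = bucket, boto3.client("s3")

    def upload(self, key, data):
        resp = self.s3.put_object(Bucket=self.bucket, Key=key, Body=data)
        return resp["ETag"].strip('"')       # normalize AWS's quoted ETag

    def download(self, key):
        return self.s3.get_object(Bucket=self.bucket, Key=key)["Body"].read()

    def delete(self, key):
        self.s3.delete_object(Bucket=self.bucket, Key=key)

class GCSProvider(StorageProvider):
    def __init__(self, bucket: str):
        from google.cloud import storage
        self.bucket = storage.Client().bucket(bucket)

    def upload(self, key, data):
        blob = self.bucket.blob(key)
        blob.upload_from_string(data)
        return str(blob.generation)          # normalize GCS's generation id

    def download(self, key):
        return self.bucket.blob(key).download_as_bytes()

    def delete(self, key):
        self.bucket.blob(key).delete()
```

Your API layer then instantiates one of these based on the environment variable and never imports boto3 or google.cloud directly.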


Project 3: The Spot Instance Arbitrage Bot

  • Main Programming Language: Go
  • Alternative Programming Languages: Python, Rust
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Cost Optimization / Dynamic Provisioning
  • Software or Tool: Terraform (triggered by code) or Pulumi
  • Main Book: “Cloud FinOps” by J.R. Storment

What you’ll build: A background worker that periodically checks the “Spot Instance” prices for a 2-CPU/4GB RAM server on AWS, Azure, and GCP. It automatically provisions a VM on the cheapest provider to run a dummy workload (like processing a queue), then shuts it down.

Why it teaches Multi-cloud: The primary business driver for multi-cloud is often Cost Leverage. This forces you to map “Instance Types” across providers (e.g., t3.medium vs Standard_B2s vs e2-medium) and automate the creation/destruction of resources programmatically.

Core challenges you’ll face:

  • Normalization: “2 vCPU” on AWS is not exactly “2 vCPU” on GCP. You need a mapping table.
  • Programmatic Infrastructure: Running Terraform/Pulumi from inside your code, not from your CLI.
  • State Management: Knowing what is currently running where so you don’t spawn infinite servers.

Key Concepts:

  • Spot/Preemptible Instances: Ephemeral compute markets.
  • Infrastructure Automation: API-driven provisioning.

Difficulty: Advanced
Time estimate: 2 weeks
Pre-requirements: Project 2.

Real world outcome: Logs show:

[08:00] AWS: $0.03/hr | Azure: $0.025/hr. Winner: Azure. Spawning VM...
[09:00] AWS: $0.015/hr | Azure: $0.04/hr. Winner: AWS. Destroying Azure VM, spawning AWS VM...

Implementation Hints: Use the pricing APIs for each cloud. Prefer Pulumi (Infrastructure as Code in Go/Python) over Terraform here, because it lets you embed the cloud-selection logic (if/else) directly in your program.
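A sketch of the normalization table and the selection loop; the instance names are the equivalents mentioned above, but get_spot_price is a stub with invented numbers, since each cloud’s pricing API has its own shape:

```python
# arbitrage.py - sketch: normalize instance shapes, then pick the cheapest.
# The instance names are real equivalents; prices here are invented stubs.

# One "shape" (2 vCPU / 4 GB RAM) mapped to each provider's closest type.
SHAPE_2CPU_4GB = {
    "aws":   "t3.medium",
    "azure": "Standard_B2s",
    "gcp":   "e2-medium",
}

def get_spot_price(provider: str, instance_type: str) -> float:
    # Stub: replace with the EC2 spot price history API, the Azure Retail
    # Prices API, and GCP's pricing data respectively.
    return {"aws": 0.030, "azure": 0.025, "gcp": 0.028}[provider]

def cheapest() -> tuple[str, str, float]:
    quotes = [(p, t, get_spot_price(p, t)) for p, t in SHAPE_2CPU_4GB.items()]
    return min(quotes, key=lambda q: q[2])

if __name__ == "__main__":
    provider, itype, price = cheapest()
    print(f"Winner: {provider} ({itype}) at ${price}/hr")
    # Hand `provider` to your Pulumi program to provision (and later destroy).
```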


Project 4: Multi-Cloud VPN Mesh (The Network Bridge)

  • Main Programming Language: Bash / Linux Networking
  • Alternative Programming Languages: Ansible
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Networking / VPNs
  • Software or Tool: WireGuard, AWS VPC, Azure VNet
  • Main Book: “Computer Networking: A Top-Down Approach” (for subnetting)

What you’ll build: A secure, private network tunnel connecting an AWS VPC and an Azure VNet. You will deploy a VM in each cloud that has no public IP address, yet they will be able to ping each other over the tunnel using private IPs.

Why it teaches Multi-cloud: Networking is the hardest part of multi-cloud. Providers have different ways of handling routing, firewalls, and NAT. Building a site-to-site VPN manually (using WireGuard) demystifies what managed offerings like AWS Site-to-Site VPN and Azure VPN Gateway automate; dedicated links like Direct Connect and ExpressRoute solve the same connectivity problem over private circuits instead of the public internet.

Core challenges you’ll face:

  • CIDR Overlap: If both clouds use 10.0.0.0/16, they can’t talk. You must plan subnets (e.g., AWS 10.1.0.0/16, Azure 10.2.0.0/16).
  • Firewalls: AWS Security Groups vs Azure Network Security Groups (NSG). Allowing UDP port 51820.
  • Route Tables: Telling the AWS VPC that traffic for 10.2.0.0/16 must go to the WireGuard instance.

Key Concepts:

  • Overlay Networks: Building a network on top of a network.
  • CIDR Planning: IP address management.
  • Routing: Static routes and Next-Hop gateways.

Difficulty: Advanced
Time estimate: 1 week
Pre-requirements: Understanding of IP addresses and Subnets.

Real world outcome: From the AWS EC2 instance (10.1.0.5): ping 10.2.0.4 (The Azure VM private IP) -> Success.

Implementation Hints: Launch a “Gateway VM” in each cloud with a Public IP. Install WireGuard on both. Configure them to peer with each other. Then, in the cloud console, update the Route Tables of the subnets to send cross-cloud traffic to these Gateway VMs.
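Before touching WireGuard, encode the CIDR plan in a quick script; Python’s standard ipaddress module catches the overlap problem in seconds:

```python
# cidr_check.py - sanity-check the address plan before building the tunnel.
import ipaddress

aws_vpc = ipaddress.ip_network("10.1.0.0/16")     # AWS plan from above
azure_vnet = ipaddress.ip_network("10.2.0.0/16")  # Azure plan from above

if aws_vpc.overlaps(azure_vnet):
    raise SystemExit("CIDR overlap: re-plan your address space")
print("No overlap: safe to route between the two networks")
```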


Project 5: The “Frankenstein” Serverless Pipeline

  • Main Programming Language: Python or Node.js
  • Alternative Programming Languages: C#
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Event-Driven Architecture
  • Software or Tool: AWS Lambda, Google Cloud Functions, Azure Service Bus
  • Main Book: “Serverless Architectures on AWS” (Concepts apply generally)

What you’ll build: An image processing pipeline that splits work across clouds.

  1. User uploads image to AWS S3.
  2. S3 triggers an AWS Lambda that sends a message to Azure Service Bus.
  3. An Azure Function is triggered by the bus message, resizes the image, and uploads the result to Google Cloud Storage.

Why it teaches Multi-cloud: This demonstrates “Best of Breed” architecture (or worst nightmare, depending on your view). It forces you to handle Identity Federation (how does AWS authenticate to Azure?) and Latency (moving data across the internet).

Core challenges you’ll face:

  • Authentication: Avoid embedding long-lived keys; use OIDC federation where possible, or at least a secrets store for API keys.
  • Egress Costs: You pay to move data out of AWS. You’ll see this on your bill.
  • Distributed Tracing: If it breaks, where did it break?

Key Concepts:

  • Loose Coupling: Systems communicating via queues.
  • Egress/Ingress Traffic: The hidden cost of cloud.

Difficulty: Intermediate
Time estimate: Weekend
Pre-requirements: Basic Serverless knowledge.

Real world outcome: You upload cat.jpg to AWS. 5 seconds later, cat-small.jpg appears in your Google Cloud bucket.
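A minimal sketch of step 2, the AWS-side Lambda that forwards the S3 event to Azure Service Bus; the queue name is hypothetical, and in a real setup the connection string would come from a secrets store rather than an environment variable:

```python
# lambda_handler.py - step 2: forward the S3 event to Azure Service Bus.
import json
import os

from azure.servicebus import ServiceBusClient, ServiceBusMessage

CONN_STR = os.environ["SERVICEBUS_CONNECTION_STRING"]  # use a secrets store IRL
QUEUE = "images-to-resize"                             # hypothetical queue name

def handler(event, context):
    record = event["Records"][0]            # the S3 put event
    payload = json.dumps({
        "bucket": record["s3"]["bucket"]["name"],
        "key": record["s3"]["object"]["key"],
    })
    with ServiceBusClient.from_connection_string(CONN_STR) as client:
        with client.get_queue_sender(queue_name=QUEUE) as sender:
            sender.send_messages(ServiceBusMessage(payload))
```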


Project 6: Unified Infrastructure Dashboard (Terraform State Reader)

  • Main Programming Language: JavaScript (React/Vue) or Python (Streamlit)
  • Alternative Programming Languages: Go
  • Coolness Level: Level 2: Practical but Forgettable
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Asset Management / IaC
  • Software or Tool: Terraform State Files
  • Main Book: “Infrastructure as Code” by Kief Morris

What you’ll build: A web dashboard that reads your Terraform .tfstate files (stored in S3/Azure Blob) and visualizes your entire multi-cloud estate in one list. “You have 5 VMs on AWS and 3 on Azure.”

Why it teaches Multi-cloud: Managing inventory is a huge problem. This teaches you that the “Source of Truth” in modern cloud is not the Console, but the State File. You learn to parse the JSON structure of infrastructure.

Core challenges you’ll face:

  • State Locking: Reading the state file while Terraform might be writing to it.
  • Parsing: The Terraform state JSON schema is complex.
  • Security: The state file often contains secrets. You need to handle it carefully.

Key Concepts:

  • Drift Detection: When the real world differs from the code.
  • Centralized Inventory: CMDB (Configuration Management Database).

Difficulty: Intermediate
Time estimate: 1 week
Pre-requirements: Terraform experience.

Real world outcome: A webpage showing a pie chart: “Total Cloud Spend Estimate” and a list of resources grouped by region, regardless of provider.
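A minimal sketch of the parsing step against a locally downloaded v4 state file (treat the file like a credential; it can contain secrets):

```python
# inventory.py - count managed resources per provider in a v4 tfstate file.
import json
from collections import Counter

def inventory(path: str) -> Counter:
    with open(path) as f:
        state = json.load(f)
    counts = Counter()
    for res in state.get("resources", []):
        if res.get("mode") != "managed":
            continue                      # skip data sources
        # provider looks like: provider["registry.terraform.io/hashicorp/aws"]
        provider = res["provider"].split("/")[-1].rstrip('"]')
        counts[f'{provider}: {res["type"]}'] += len(res["instances"])
    return counts

if __name__ == "__main__":
    for name, n in inventory("terraform.tfstate").most_common():
        print(f"{n:3d}  {name}")
```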


Project 7: The “Vendor Lock-in” Breaker (Database Replicator)

  • Main Programming Language: Python
  • Alternative Programming Languages: Java, Go
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: Data Persistence / Database Migration
  • Software or Tool: AWS RDS (Postgres), Google Cloud SQL (Postgres), Debezium (CDC)
  • Main Book: “Designing Data-Intensive Applications” by Martin Kleppmann

What you’ll build: A Change Data Capture (CDC) system. You have a primary database on AWS. Every time a row is inserted/updated, a script (or Kafka connector) captures that change and replays it into a standby database on Google Cloud.

Why it teaches Multi-cloud: Data Gravity is the biggest blocker to multi-cloud. Moving compute is easy; moving state is hard. This simulates a “Live Migration” or “Active-Passive” DR strategy.

Core challenges you’ll face:

  • Replication Lag: The speed of light is a limit.
  • Consistency: Handling collisions or out-of-order updates.
  • Schema Drift: If you change the table on AWS, does the script crash on GCP?

Key Concepts:

  • CDC (Change Data Capture): Streaming database changes.
  • WAL (Write Ahead Log): How Postgres stores changes.
  • Disaster Recovery (RPO/RTO): Recovery Point Objective and Recovery Time Objective.

Difficulty: Advanced
Time estimate: 2 weeks
Pre-requirements: Strong SQL knowledge.

Real world outcome: You run INSERT INTO users (name) VALUES ('Alice'); on AWS. You query the GCP database, and ‘Alice’ appears there within seconds.

Implementation Hints: Start simple: a Python script that polls the updated_at column on AWS and inserts new rows into GCP. Advanced version: use Debezium with Kafka to read the Postgres WAL.
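The polling version might look like this; the DSNs are placeholders, the loop assumes an id primary key and a maintained updated_at column, and it deliberately ignores deletes:

```python
# replicate.py - the "start simple" version: poll AWS Postgres, push to GCP.
import time

import psycopg2

SRC_DSN = "host=aws-rds.example dbname=app user=repl"  # placeholder DSNs
DST_DSN = "host=gcp-sql.example dbname=app user=repl"

def replicate_once(src, dst, since):
    with src.cursor() as cur:
        cur.execute(
            "SELECT id, name, updated_at FROM users WHERE updated_at > %s",
            (since,))
        rows = cur.fetchall()
    with dst.cursor() as cur:
        for id_, name, updated_at in rows:
            # Upsert so re-delivered rows are harmless (idempotent replay).
            cur.execute(
                """INSERT INTO users (id, name, updated_at)
                   VALUES (%s, %s, %s)
                   ON CONFLICT (id) DO UPDATE
                   SET name = EXCLUDED.name, updated_at = EXCLUDED.updated_at""",
                (id_, name, updated_at))
    dst.commit()
    return max((r[2] for r in rows), default=since)

if __name__ == "__main__":
    src, dst = psycopg2.connect(SRC_DSN), psycopg2.connect(DST_DSN)
    watermark = "1970-01-01"
    while True:
        watermark = replicate_once(src, dst, watermark)
        time.sleep(5)   # replication lag floor: poll interval + network RTT
```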


Project 8: Multi-Cloud Kubernetes Cluster (GitOps Federation)

  • Main Programming Language: YAML (Kubernetes Manifests)
  • Alternative Programming Languages: Go (for custom operators)
  • Coolness Level: Level 5: Pure Magic
  • Business Potential: 4. The “Open Core” Infrastructure
  • Difficulty: Level 4: Expert
  • Knowledge Area: Container Orchestration
  • Software or Tool: AWS EKS, Azure AKS, ArgoCD
  • Main Book: “Kubernetes: Up and Running”

What you’ll build: The holy grail. A Kubernetes setup where you commit code to Git, and ArgoCD automatically deploys that application to both an EKS cluster (AWS) and an AKS cluster (Azure).

Why it teaches Multi-cloud: Kubernetes provides the ultimate abstraction layer. Once you are inside K8s, AWS vs Azure doesn’t matter (mostly). This project teaches GitOps and Cluster Federation.

Core challenges you’ll face:

  • Ingress Controllers: AWS uses ALB; Azure uses Application Gateway. You need an abstraction (like Nginx Ingress) to make them behave identically.
  • Persistent Volumes: AWS EBS vs Azure Disk. You’ll need StorageClasses that map to the specific provider.
  • Cost: Running two managed K8s clusters adds up (roughly $75-150/month just for the control planes, depending on tier). Use spot instances for nodes, or tear the clusters down when you are not using them.

Key Concepts:

  • GitOps: Infrastructure as Code for apps.
  • Ingress Normalization: Making traffic entry points consistent.
  • Storage Classes: Abstracting disk providers.

Difficulty: Expert
Time estimate: 2-3 weeks
Pre-requirements: Solid Kubernetes and Docker knowledge.

Real world outcome: You change the color of your website in Git. ArgoCD syncs. Both your AWS URL and Azure URL update to the new color automatically.


Project 9: The “Secret” Syncer

  • Main Programming Language: Go
  • Alternative Programming Languages: Python
  • Coolness Level: Level 3: Genuinely Clever
  • Business Potential: 2. The “Micro-SaaS / Pro Tool”
  • Difficulty: Level 2: Intermediate
  • Knowledge Area: Security / Secrets Management
  • Software or Tool: AWS Secrets Manager, HashiCorp Vault (or Azure Key Vault)
  • Main Book: “Security Engineering” by Ross Anderson

What you’ll build: A daemon that watches AWS Secrets Manager. When a secret changes (e.g., DB_PASSWORD), it automatically pushes that new value to Azure Key Vault (or GitHub Secrets), ensuring all environments stay in sync.

Why it teaches Multi-cloud: Managing rotation of credentials across clouds is a nightmare. This teaches you about Event-Driven Security and API interaction with sensitive data.

Core challenges you’ll face:

  • Security: Your tool needs permission to read/write secrets. If this tool is compromised, everything is compromised.
  • Version conflicts: Handling race conditions if secrets change in two places.

Key Concepts:

  • Secret Rotation: Changing passwords automatically.
  • Least Privilege: Scoping permissions tightly.

Difficulty: Intermediate
Time estimate: 1 week
Pre-requirements: Basic IAM knowledge.

Real world outcome: You update a secret in AWS Console. You refresh the Azure Portal, and the secret is updated there.
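One sync pass might look like the sketch below; the vault URL and names are placeholders. Note that Key Vault secret names cannot contain underscores, which is a small cross-cloud normalization problem in its own right:

```python
# secret_sync.py - one sync pass: AWS Secrets Manager -> Azure Key Vault.
import boto3
from azure.core.exceptions import ResourceNotFoundError
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

VAULT_URL = "https://my-vault.vault.azure.net"  # hypothetical vault
AZURE_NAME = "DB-PASSWORD"   # Key Vault forbids underscores in names

def sync_secret(aws_secret_id: str) -> None:
    value = boto3.client("secretsmanager").get_secret_value(
        SecretId=aws_secret_id)["SecretString"]
    vault = SecretClient(vault_url=VAULT_URL,
                         credential=DefaultAzureCredential())
    try:
        current = vault.get_secret(AZURE_NAME).value
    except ResourceNotFoundError:
        current = None               # first sync: secret does not exist yet
    if current != value:
        vault.set_secret(AZURE_NAME, value)   # writes a new version

if __name__ == "__main__":
    sync_secret("DB_PASSWORD")
```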


Project 10: Global Latency Router (The Performance Optimiser)

  • Main Programming Language: JavaScript (Edge Workers)
  • Alternative Programming Languages: Rust (Wasm)
  • Coolness Level: Level 4: Hardcore Tech Flex
  • Business Potential: 3. The “Service & Support” Model
  • Difficulty: Level 3: Advanced
  • Knowledge Area: CDN / Edge Computing
  • Software or Tool: Cloudflare Workers or AWS Lambda@Edge
  • Main Book: “High Performance Browser Networking” by Ilya Grigorik

What you’ll build: An “Edge” function that intercepts a user’s request and checks the user’s country (e.g., Cloudflare’s CF-IPCountry header).

  • If user is in US -> Route to AWS (us-east-1).
  • If user is in Europe -> Route to GCP (europe-west1).
  • If user is in Asia -> Route to Azure (japaneast).

Why it teaches Multi-cloud: This moves logic to the “Edge,” the layer before the cloud. It teaches Geo-routing and performance optimization strategies that utilize different providers’ regional strengths.

Core challenges you’ll face:

  • Testing: How do you pretend to be in Japan when you are in New York? (VPNs).
  • Failover: What if the Asia cluster is down? The edge logic needs to handle that.

Key Concepts:

  • Edge Computing: Running code at the CDN level.
  • Anycast: announcing one IP address from many locations, so traffic lands at the nearest one.

Difficulty: Advanced
Time estimate: 1 week
Pre-requirements: HTTP headers and DNS knowledge.

Real world outcome: You access your site through a VPN exit node in Tokyo and it hits Azure; you disconnect the VPN (back in the US) and it hits AWS.
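The production version is a few lines of Worker JavaScript; the decision logic itself, sketched here in Python to match the other projects, is just a lookup table plus a failover chain (origin URLs and the health probe are stubs):

```python
# edge_router.py - the routing decision from the bullets above, as plain logic.
# In production this runs in a Cloudflare Worker (JavaScript) and reads the
# country from the request; here both origins and health checks are stubbed.
ORIGINS = {
    "US":   ["https://aws-us-east-1.example.com",
             "https://gcp-europe-west1.example.com"],   # failover chain
    "EU":   ["https://gcp-europe-west1.example.com",
             "https://aws-us-east-1.example.com"],
    "ASIA": ["https://azure-japaneast.example.com",
             "https://aws-us-east-1.example.com"],
}
REGION_OF = {"US": "US", "DE": "EU", "FR": "EU", "JP": "ASIA", "SG": "ASIA"}

def is_healthy(origin: str) -> bool:
    return True   # stub: replace with a cached health probe

def route(country: str) -> str:
    region = REGION_OF.get(country, "US")        # default region
    for origin in ORIGINS[region]:
        if is_healthy(origin):
            return origin
    raise RuntimeError("all origins down")

print(route("JP"))   # -> https://azure-japaneast.example.com
```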


Project Comparison Table

| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| Static Failover | Beginner | Weekend | ⭐⭐ | ⭐⭐ |
| Storage API | Intermediate | 1 Week | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Arbitrage Bot | Advanced | 2 Weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| VPN Mesh | Advanced | 1 Week | ⭐⭐⭐⭐⭐ | ⭐⭐ |
| Serverless Pipe | Intermediate | Weekend | ⭐⭐⭐ | ⭐⭐⭐ |
| Unified Dash | Intermediate | 1 Week | ⭐⭐ | ⭐⭐⭐ |
| DB Replicator | Advanced | 2 Weeks | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| K8s GitOps | Expert | 3 Weeks | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Secret Syncer | Intermediate | 1 Week | ⭐⭐⭐ | ⭐⭐ |
| Latency Router | Advanced | 1 Week | ⭐⭐⭐ | ⭐⭐⭐⭐ |

Recommendation

Where to start?

Start with Project 1 (Static Site Failover). It requires very little coding but forces you to touch the consoles of two major providers and understand DNS, which is the glue of the internet.

For the Developer

Move to Project 2 (Cloud-Agnostic Storage API). This will teach you how to write code that isn’t tightly coupled to a vendor, a crucial skill for modern software engineering.

For the Ops/SRE

Project 4 (VPN Mesh) and Project 3 (Arbitrage Bot) are your bread and butter. Networking and Cost/Automation are the two biggest reasons companies hire multi-cloud architects.


Final Capstone: The “Universal” PaaS

What you’ll build: A CLI tool (a mini Heroku clone) that lets a developer run my-paas deploy.

The System:

  1. Uploads the code to a Cloud-Agnostic Storage bucket (Project 2).
  2. Triggers a build process in Jenkins/GitHub Actions.
  3. Builds a Docker container.
  4. Uses Terraform to spin up a spot instance on the Cheapest Cloud (Project 3) for that hour.
  5. Deploys the container to that instance.
  6. Updates the Global DNS (Project 1) to point to that new instance IP.

Why this is the ultimate goal: This combines storage, compute, networking, cost optimization, and automation into a single cohesive platform. If you build this, you are a Senior Cloud Architect.
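A skeleton of the deploy flow, with every step stubbed out to the project that implements it (all return values are hard-coded placeholders):

```python
# my_paas.py - capstone skeleton: each stub below is one of the earlier projects.
import sys

def upload_code(path: str) -> str:
    return "artifact-123"                 # Project 2: cloud-agnostic storage

def build_image(artifact: str) -> str:
    return "registry.example/app:latest"  # CI builds the Docker container

def cheapest_cloud() -> str:
    return "azure"                        # Project 3: spot-price arbitrage

def provision(cloud: str, image: str) -> str:
    return "203.0.113.7"                  # Terraform/Pulumi spins up the VM

def update_dns(ip: str) -> None:
    pass                                  # Project 1: repoint the global DNS

def deploy(path: str) -> None:
    ip = provision(cheapest_cloud(), build_image(upload_code(path)))
    update_dns(ip)
    print(f"Deployed {path} -> {ip}")

if __name__ == "__main__":
    deploy(sys.argv[1] if len(sys.argv) > 1 else ".")
```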


Summary: All Projects

| # | Project Name | Main Language |
|---|---|---|
| 1 | The High-Availability Static Site | HCL (Terraform) |
| 2 | The “Cloud-Agnostic” Storage API | Python |
| 3 | The Spot Instance Arbitrage Bot | Go |
| 4 | Multi-Cloud VPN Mesh | Bash / Linux |
| 5 | The “Frankenstein” Serverless Pipeline | Python |
| 6 | Unified Infrastructure Dashboard | JavaScript |
| 7 | The “Vendor Lock-in” Breaker (DB Replicator) | Python |
| 8 | Multi-Cloud Kubernetes Cluster (GitOps) | YAML |
| 9 | The “Secret” Syncer | Go |
| 10 | Global Latency Router | JavaScript |
| 11 | The “Universal” PaaS (Capstone) | Multiple |

Essential Resources (Cost Management)

Warning: Multi-cloud can get expensive if you leave resources running.

  • AWS Free Tier: 12 months free for EC2 micro, S3, RDS.
  • Azure Free Account: $200 credit for 30 days, plus 12 months of selected free services.
  • GCP Free Tier: $300 credit for 90 days + Always Free limits.
  • Tip: Run terraform destroy (or reuse Project 3’s logic) to aggressively kill resources when not in use.

Multi-cloud is complex. It adds overhead. But by building these projects, you will learn exactly when that overhead is worth the resilience and leverage it provides.