Project 2: Build a Load Balancer
Build a TCP/HTTP load balancer that routes traffic, detects failures, and exposes operational metrics.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 2-3 weeks |
| Main Programming Language | Go |
| Alternative Programming Languages | Rust, C, Python |
| Coolness Level | Level 3: Genuinely Clever |
| Business Potential | Service and Support Model |
| Prerequisites | TCP sockets, HTTP basics, concurrency |
| Key Topics | Load balancing algorithms, health checks, failure detection, backpressure |
1. Learning Objectives
By completing this project, you will:
- Implement a TCP and HTTP reverse proxy with routing logic.
- Compare round-robin, least-connections, and weighted algorithms.
- Design active and passive health checks with timeouts.
- Implement safe shared state for concurrent routing decisions.
- Expose operational metrics and observe system behavior under load.
- Explain L4 vs L7 load balancing trade-offs in interviews.
2. All Theory Needed (Per-Concept Breakdown)
2.1 L4 vs L7 Load Balancing
Description / Expanded Explanation
Layer 4 load balancing routes by TCP/UDP metadata (IP, port), while Layer 7 routes by application data (HTTP headers, paths). L4 is faster and simpler; L7 is more flexible and enables routing by URL or headers.
Definitions & Key Terms
- L4 -> transport layer routing using IP/port
- L7 -> application layer routing using request content
- reverse proxy -> a server that forwards client requests to backends
- connection vs request -> L4 routes connections, L7 routes requests
Mental Model Diagram (ASCII)
Client -> LB (L4) -> Backend
LB (L7) -> Backend based on HTTP path
How It Works (Step-by-Step)
- Client connects to load balancer.
- L4 picks backend immediately and forwards bytes.
- L7 reads HTTP request to select backend.
- Responses are proxied back to the client.
Minimal Concrete Example
GET /api/users HTTP/1.1
Host: example.com
Route by path prefix for L7, or by connection for L4.
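The L7 decision above can be sketched as a small Go routing function. This is illustrative only: the function name and backend pool URLs are assumptions, not part of the spec.

```go
package main

import (
	"fmt"
	"strings"
)

// pickByPath illustrates L7 routing: the balancer inspects the
// request path (application data) and chooses a backend by prefix.
// An L4 balancer could not do this, since it never reads the request.
func pickByPath(path string) string {
	switch {
	case strings.HasPrefix(path, "/api/"):
		return "http://127.0.0.1:9001" // API pool
	case strings.HasPrefix(path, "/static/"):
		return "http://127.0.0.1:9002" // static asset pool
	default:
		return "http://127.0.0.1:9003" // catch-all pool
	}
}

func main() {
	fmt.Println(pickByPath("/api/users"))
	fmt.Println(pickByPath("/static/app.js"))
}
```

Note this only selects the target; a real L7 proxy must also parse the full request and forward it, which is the CPU cost mentioned above.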
Common Misconceptions
- “L7 is always better” -> L7 costs CPU and adds latency.
- “L4 cannot be smart” -> L4 can still use metrics like connection count.
Check-Your-Understanding Questions
- Why is L7 more expensive than L4?
- What data does L4 use to make a routing decision?
- When would L4 be preferred over L7?
Where You’ll Apply It
- See 3.2 and 3.4 for routing features.
- See 4.1 for architecture split between L4 and L7.
- Also used in: P08 TCP Socket Server
2.2 Load Balancing Algorithms
Description / Expanded Explanation
The algorithm decides which backend handles the next request. Simple algorithms are easy to implement but can mis-handle uneven load. Advanced algorithms need more state and metrics.
Definitions & Key Terms
- round-robin -> rotate through backends in order
- least-connections -> choose backend with fewest active connections
- weighted round-robin -> distribute traffic proportional to weights
- sticky sessions -> route same client to same backend
Mental Model Diagram (ASCII)
Backends: A B C
Round-robin: A B C A B C
Weighted: A A B C A A B C
How It Works (Step-by-Step)
- Maintain backend list with health and metrics.
- Select based on algorithm.
- Update connection counters and response metrics.
- If request fails, retry or mark backend unhealthy.
Minimal Concrete Example
idx := atomic.AddUint32(&rr, 1)
backend := backends[idx%uint32(len(backends))]
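The weighted pattern in the diagram above (A A B C repeating for weights 2,1,1) can be produced by expanding weights into a repeating schedule. A sketch with illustrative names; simple and correct, at the cost of a schedule whose length is the sum of the weights (smooth weighted round-robin, as in nginx, interleaves better but needs more state):

```go
package main

import "fmt"

// buildSchedule expands each backend's weight into repeated slots,
// so indexing the schedule round-robin yields the weighted order.
func buildSchedule(backends []string, weights []int) []string {
	var sched []string
	for i, b := range backends {
		for j := 0; j < weights[i]; j++ {
			sched = append(sched, b)
		}
	}
	return sched
}

func main() {
	sched := buildSchedule([]string{"A", "B", "C"}, []int{2, 1, 1})
	for i := 0; i < 8; i++ {
		fmt.Print(sched[i%len(sched)], " ") // A A B C A A B C
	}
	fmt.Println()
}
```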
Common Misconceptions
- “Round-robin is fair” -> it ignores backend speed and connection length.
- “Sticky sessions are free” -> they reduce balancing effectiveness.
Check-Your-Understanding Questions
- Why can least-connections be better for long requests?
- How do weights affect distribution?
- What happens to sticky sessions when a backend dies?
Where You’ll Apply It
- See 3.2 for algorithm requirements.
- See 4.4 for routing logic.
- Also used in: P04 Raft Consensus for leader selection logic
2.3 Health Checks and Failure Detection
Description / Expanded Explanation
A load balancer must detect failed backends quickly without overreacting to temporary slowness. Health checks can be active (polling endpoints) or passive (observing failures).
Definitions & Key Terms
- active health check -> periodic probe request
- passive health check -> mark unhealthy after failed requests
- failure threshold -> number of consecutive failures before down
- recovery window -> time before retrying a failed backend
Mental Model Diagram (ASCII)
healthy -> fail -> fail -> fail -> DOWN
DOWN -> probe -> success -> HEALTHY
How It Works (Step-by-Step)
- Run checks every N seconds.
- On failure, increment counter.
- If failures exceed threshold, mark DOWN.
- Periodically probe DOWN backends for recovery.
Minimal Concrete Example
if failures >= 3 {
backend.healthy = false
}
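The threshold-and-recovery steps above can be sketched as a small state tracker. Names here are illustrative, not a required API:

```go
package main

import "fmt"

// healthTracker applies the failure-threshold logic: consecutive
// failures past the threshold mark the backend DOWN, and a single
// successful probe resets the counter and restores it.
type healthTracker struct {
	failures  int
	threshold int
	healthy   bool
}

func (h *healthTracker) observe(ok bool) {
	if ok {
		h.failures = 0 // any success resets the streak
		h.healthy = true
		return
	}
	h.failures++
	if h.failures >= h.threshold {
		h.healthy = false // threshold reached: mark DOWN
	}
}

func main() {
	h := &healthTracker{threshold: 3, healthy: true}
	for _, ok := range []bool{false, false, false} {
		h.observe(ok)
	}
	fmt.Println(h.healthy) // false: three consecutive failures
	h.observe(true)
	fmt.Println(h.healthy) // true: recovery probe succeeded
}
```

Counting only consecutive failures is what prevents a single transient error from taking a backend out of rotation.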
Common Misconceptions
- “One failure means down” -> transient errors are normal.
- “Health checks should be heavy” -> use lightweight endpoints.
Check-Your-Understanding Questions
- Why use a failure threshold instead of one failed probe?
- What happens if health checks are too frequent?
- How do you avoid flapping backends?
Where You’ll Apply It
- See 3.2 and 3.6 for requirements and edge cases.
- See 5.10 Phase 2 for health check implementation.
- Also used in: P04 Raft Consensus
2.4 Connection Lifecycle and Backpressure
Description / Expanded Explanation
Load balancers must handle many concurrent connections without exhausting resources. Backpressure prevents the system from accepting more work than it can handle, and timeouts ensure slow backends do not block the entire proxy.
Definitions & Key Terms
- keep-alive -> reuse TCP connections for multiple requests
- backpressure -> signaling upstream to slow down
- timeout -> bound on waiting for backend response
- connection pool -> reuse outbound connections to backends
Mental Model Diagram (ASCII)
client -> [LB] -> backend
           ^ queue limits and timeouts
How It Works (Step-by-Step)
- Accept client connection.
- If queue is full, reject with 503.
- Proxy to backend with timeout.
- If backend is slow, cancel and retry or fail.
Minimal Concrete Example
ctx, cancel := context.WithTimeout(req.Context(), 2*time.Second)
Common Misconceptions
- “More concurrency always helps” -> too many open connections exhaust file descriptors.
- “Timeouts are optional” -> slow backends can stall the whole balancer.
Check-Your-Understanding Questions
- What happens when file descriptor limits are exceeded?
- Why is backpressure important in distributed systems?
- What is the trade-off between retries and latency?
Where You’ll Apply It
- See 3.3 and 3.6 for performance and edge cases.
- See 7.3 for performance traps.
- Also used in: P08 TCP Socket Server
2.5 Concurrency and Shared State
Description / Expanded Explanation
Routing decisions require shared state such as backend health and connection counts. In concurrent systems this state must be consistent and safe without becoming a bottleneck.
Definitions & Key Terms
- atomic -> CPU-level operation that is indivisible
- mutex -> lock to guard shared state
- race condition -> incorrect behavior caused by unsynchronized access
- read-write lock -> allows multiple readers, single writer
Mental Model Diagram (ASCII)
requests -> [router]
| state: backends, counters |
How It Works (Step-by-Step)
- Each request chooses a backend using shared state.
- Update connection counts atomically or via lock.
- Health checker updates backend status periodically.
- Protect state with locks or immutable snapshots.
Minimal Concrete Example
mu.Lock()
backend.active++
mu.Unlock()
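For state that changes rarely but is read on every request, such as the backend list during a config reload, an immutable snapshot swapped atomically avoids locks on the hot path. A sketch using sync/atomic; the function names are illustrative:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// backends holds an immutable []string snapshot. Readers Load it
// without locks; reload Stores a whole new slice, so no reader ever
// sees a half-updated list.
var backends atomic.Value

func reload(newList []string) {
	cp := make([]string, len(newList))
	copy(cp, newList) // private copy: callers cannot mutate the snapshot later
	backends.Store(cp)
}

func snapshot() []string {
	return backends.Load().([]string)
}

func main() {
	reload([]string{"http://127.0.0.1:9001", "http://127.0.0.1:9002"})
	fmt.Println(len(snapshot())) // 2
	reload([]string{"http://127.0.0.1:9003"})
	fmt.Println(snapshot()[0])
}
```

Requests that loaded the old snapshot finish against the old list; new requests see the new one, which is exactly the hot-reload behavior the spec asks for.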
Common Misconceptions
- “Go avoids concurrency bugs” -> data races still exist.
- “Lock everything” -> heavy locks can destroy throughput.
Check-Your-Understanding Questions
- Why is an atomic counter sufficient for round-robin index?
- When do you need a full mutex instead of atomics?
- How do you prevent races when reloading config?
Where You’ll Apply It
- See 4.2 and 4.4 for component responsibilities.
- See 5.10 Phase 2 for concurrency design.
- Also used in: P07 Lock-Free Queue
3. Project Specification
3.1 What You Will Build
A load balancer that accepts incoming TCP/HTTP connections and distributes them across multiple backend servers. It supports multiple algorithms, health checks, and exposes metrics. It is a standalone CLI service.
3.2 Functional Requirements
- Routing algorithms: round-robin, least-connections, weighted.
- Health checks: active and passive with configurable intervals.
- Sticky sessions: IP hash based routing.
- Metrics endpoint: JSON stats at /metrics.
- Config reload: hot reload without dropping existing connections.
- Graceful shutdown: stop accepting new connections, drain old ones.
3.3 Non-Functional Requirements
- Performance: add less than 5 ms overhead per request.
- Reliability: failover within 3 consecutive failed checks.
- Usability: clear logs and deterministic tests.
3.4 Example Usage / Output
$ ./loadbalancer --config lb.yaml --port 8080
[LB] listening on :8080
[LB] algorithm=round-robin backends=3
3.5 Data Formats / Schemas / Protocols
Config file lb.yaml:
listen: 8080
algorithm: round-robin
health_check:
  interval_ms: 5000
  timeout_ms: 800
  failure_threshold: 3
backends:
  - url: http://127.0.0.1:9001
    weight: 2
  - url: http://127.0.0.1:9002
    weight: 1
Metrics JSON:
{
  "total_requests": 1204,
  "backends": [
    {"url":"http://127.0.0.1:9001","healthy":true,"active":3},
    {"url":"http://127.0.0.1:9002","healthy":true,"active":1}
  ]
}
Error JSON (unified shape):
{"error":"backend_unavailable","message":"no healthy backends"}
3.6 Edge Cases
- All backends unhealthy -> return 503 with error JSON.
- Backend slow -> timeout and retry once.
- Config reload while requests in flight.
- Client closes connection early.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make
./backend --port 9001 --name A &
./backend --port 9002 --name B &
./loadbalancer --config lb.yaml --port 8080
3.7.2 Golden Path Demo (Deterministic)
The demo uses a fixed request order and deterministic backend responses.
3.7.3 CLI Transcript (Success)
$ ./loadbalancer --config lb.yaml --port 8080
[LB] algorithm=round-robin backends=2
$ curl http://localhost:8080/hello
Hello from A
$ curl http://localhost:8080/hello
Hello from B
$ curl http://localhost:8080/metrics
{"total_requests":2,"backends":[{"url":"http://127.0.0.1:9001","healthy":true,"active":0},{"url":"http://127.0.0.1:9002","healthy":true,"active":0}]}
$ echo $?
0
3.7.3 CLI Transcript (Failure)
$ pkill backend
$ curl http://localhost:8080/hello
{"error":"backend_unavailable","message":"no healthy backends"}
$ curl -sf http://localhost:8080/hello
$ echo $?
22
3.7.4 API Endpoints
Endpoints:
GET /metrics -> 200 JSON stats
POST /admin/reload -> reload config
Example success:
POST /admin/reload HTTP/1.1
Host: localhost:8080
HTTP/1.1 200 OK
{"status":"reloaded","backends":2}
Example error:
POST /admin/reload HTTP/1.1
Host: localhost:8080
HTTP/1.1 400 Bad Request
{"error":"config_invalid","message":"missing backends"}
3.7.5 Exit Codes
0 -> success
1 -> invalid config
2 -> port bind failure
4. Solution Architecture
4.1 High-Level Design
Client -> Listener -> Router -> Backend Pool -> Backend
                        |             |
                 Health Checker    Metrics
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| Listener | Accept client connections | TCP or HTTP listener |
| Router | Select backend | algorithm strategy interface |
| Backend Pool | Track health and metrics | shared state with locks |
| Health Checker | Active probes | configurable interval |
| Metrics | JSON endpoint | expose internal counters |
4.3 Data Structures (No Full Code)
type Backend struct {
    URL     string
    Weight  int
    Healthy bool
    Active  int64
}
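Using this struct, least-connections selection is a linear scan over healthy backends, which is the O(n) case noted in the complexity analysis below. A sketch; the helper name is illustrative:

```go
package main

import "fmt"

// Backend mirrors the data structure sketched in 4.3.
type Backend struct {
	URL     string
	Weight  int
	Healthy bool
	Active  int64
}

// leastConnections scans healthy backends and returns the one with
// the fewest active connections, or nil if none are healthy (the
// caller should then answer 503).
func leastConnections(pool []*Backend) *Backend {
	var best *Backend
	for _, b := range pool {
		if !b.Healthy {
			continue
		}
		if best == nil || b.Active < best.Active {
			best = b
		}
	}
	return best
}

func main() {
	pool := []*Backend{
		{URL: "A", Healthy: true, Active: 3},
		{URL: "B", Healthy: true, Active: 1},
		{URL: "C", Healthy: false, Active: 0},
	}
	fmt.Println(leastConnections(pool).URL) // B: fewest active among healthy
}
```

Note that the unhealthy backend C is skipped even though it has the lowest count, which is why health filtering must happen before the algorithm runs.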
4.4 Algorithm Overview
Routing
- Snapshot healthy backends.
- Apply algorithm to choose backend.
- Proxy request and update metrics.
Complexity Analysis
- Time: O(n) for least-connections, O(1) for round-robin
- Space: O(n) backend state
5. Implementation Guide
5.1 Development Environment Setup
make
5.2 Project Structure
project-root/
├── cmd/loadbalancer/
├── internal/router/
├── internal/health/
├── internal/metrics/
└── tests/
5.3 The Core Question You’re Answering
“Given multiple servers, how do I route traffic fairly and safely when failures happen?”
5.4 Concepts You Must Understand First
- TCP connection lifecycle
- HTTP parsing and proxying
- Concurrency and shared state
- Health check thresholds
5.5 Questions to Guide Your Design
- How will you protect the backend list during reloads?
- What is the retry policy on failure?
- How do you prevent a slow backend from poisoning overall latency?
5.6 Thinking Exercise
Simulate 3 backends with weights 3,2,1. Write the first 12 backend selections.
5.7 The Interview Questions They’ll Ask
- Explain the difference between L4 and L7.
- How do you implement sticky sessions and what are trade-offs?
- How do you prevent cascading failures?
5.8 Hints in Layers
Hint 1: Start with a single-backend TCP proxy.
Hint 2: Add round-robin with an atomic counter.
Hint 3: Add health checks before implementing weighted algorithms.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| System design | Designing Data-Intensive Applications | Ch. 1 |
| Networking | The Linux Programming Interface | Ch. 59-61 |
| Reliability | Building Microservices | Ch. 11 |
5.10 Implementation Phases
Phase 1: Foundation (3-4 days)
Goals: TCP proxy and round-robin routing. Checkpoint: distribute requests across two backends.
Phase 2: Core Functionality (7-10 days)
Goals: health checks and metrics. Checkpoint: backend failure is detected and removed.
Phase 3: Polish and Edge Cases (4-6 days)
Goals: hot reload, sticky sessions, timeouts. Checkpoint: reload config without dropping connections.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| Proxy style | full L7 vs L4 only | L7 | more interview relevant |
| Health checks | active only, active+passive | active+passive | faster failure detection |
| State protection | locks vs channels | locks | simplicity and clarity |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | algorithm correctness | round-robin distribution |
| Integration Tests | backend failover | kill backend, verify reroute |
| Load Tests | throughput | 1000 rps with hey/ab |
6.2 Critical Test Cases
- Round-robin distributes evenly across healthy backends.
- Least-connections routes to backend with smallest active count.
- Unhealthy backend is excluded within 3 checks.
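The first test case above can be made deterministic by driving the selector with an explicit index instead of shared mutable state. A sketch; the function name is an illustrative assumption:

```go
package main

import "fmt"

// roundRobin selects a backend purely from the call index, so a test
// can assert exact distribution without timing or goroutines.
func roundRobin(backends []string, i int) string {
	return backends[i%len(backends)]
}

func main() {
	backends := []string{"A", "B", "C"}
	counts := map[string]int{}
	for i := 0; i < 300; i++ {
		counts[roundRobin(backends, i)]++
	}
	// 300 selections over 3 backends must land exactly 100 each.
	fmt.Println(counts["A"], counts["B"], counts["C"]) // 100 100 100
}
```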
6.3 Test Data
requests: 1000
concurrency: 50
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| Race on index | uneven distribution | atomic counter |
| Too aggressive health checks | false downs | increase timeout |
| No timeouts | hanging requests | set per-request timeout |
7.2 Debugging Strategies
- Enable verbose logs for routing decisions.
- Use tcpdump to verify L7 parsing.
- Run with race detector for concurrency bugs.
7.3 Performance Traps
- Excessive locking in hot path.
- Re-parsing HTTP headers multiple times.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add least-connections algorithm.
- Add basic metrics endpoint.
8.2 Intermediate Extensions
- Implement circuit breaker per backend.
- Support TLS termination.
8.3 Advanced Extensions
- Adaptive routing based on latency.
- Consistent hashing for cache affinity.
9. Real-World Connections
9.1 Industry Applications
- Edge proxies in CDNs.
- Service meshes and API gateways.
9.2 Related Open Source Projects
- HAProxy
- Nginx
- Envoy
9.3 Interview Relevance
- Core system design question across companies.
- Shows understanding of failure modes and trade-offs.
10. Resources
10.1 Essential Reading
- Designing Data-Intensive Applications (Chapter 1)
- Building Microservices (Chapter 11)
10.2 Video Resources
- Talks on load balancing and reliability
10.3 Tools & Documentation
hey, ab, tcpdump, pprof
10.4 Related Projects in This Series
11. Self-Assessment Checklist
11.1 Understanding
- I can explain L4 vs L7 load balancing.
- I can describe health check failure thresholds.
- I can justify a routing algorithm choice.
11.2 Implementation
- All routing algorithms work as specified.
- Health checks remove unhealthy backends.
- Metrics endpoint reports correct counts.
11.3 Growth
- I can diagram the architecture in an interview.
- I can explain how I would scale this further.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Accept connections and route to backends.
- Round-robin routing works.
- Health checks mark dead backends as down.
Full Completion:
- Supports weighted and least-connections.
- Exposes metrics endpoint.
- Supports hot reload.
Excellence (Going Above & Beyond):
- Adds circuit breaker and adaptive routing.
- Supports TLS termination and observability hooks.