DISTRIBUTED SYSTEMS FUNDAMENTALS
Learn Distributed Systems: From Sockets to Consensus
Goal: Deeply understand the physics of distributed computing—how independent nodes communicate, agree on truth, survive failures, and scale. You will move from basic network primitives to implementing industry-standard algorithms like Raft, Consistent Hashing, and Vector Clocks from scratch.
Why Distributed Systems Matter
Leslie Lamport famously defined a distributed system as one “in which the failure of a computer you didn’t even know existed can render your own computer unusable.”
In the single-node world, function calls always succeed (or crash the whole app). Time is absolute. Memory is shared. In the distributed world:
- Network calls fail randomly.
- Time is an illusion (clocks drift).
- Consensus is hard (The Two Generals Problem).
- Partial failure is the norm.
Mastering this makes you the engineer who designs systems like Kafka, etcd, Cassandra, and Kubernetes.
Core Concept Analysis
1. The Fallacies of Distributed Computing
Before writing code, you must unlearn single-node assumptions.
- The network is reliable. (It’s not)
- Latency is zero. (It’s not)
- Bandwidth is infinite. (It’s not)
- The network is secure. (It’s not)
- Topology doesn’t change. (Nodes crash/restart constantly)
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
2. The CAP Theorem (Brewer’s Theorem)
You can only have 2 of the 3:
- Consistency: Every read receives the most recent write or an error.
- Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
- Partition Tolerance: The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network.
Reality Check: In a distributed system over a wide area network, P is non-negotiable. You effectively choose between CP (database locks up if network breaks) or AP (database serves stale data if network breaks).
3. Consensus: Making Nodes Agree
How do 5 servers agree on the value of x if 2 of them crash and the network delays messages?
- Paxos: The original, mathematically proven, notoriously hard to understand.
- Raft: Designed to be understandable. Uses Leader Election + Log Replication.
       Client
         │ "Set x=5"
         ▼
   ┌───────────┐
   │  Leader   │  <-- Handles all writes
   └─────┬─────┘
         │ (AppendEntries RPC)
   ┌─────┴─────┐
   ▼           ▼
┌──────┐    ┌──────┐
│Follow│    │Follow│
└──────┘    └──────┘
4. Ordering & Time
- Physical Time: NTP, Atomic clocks. Unreliable for ordering events due to drift.
- Logical Time: Lamport Clocks (partial ordering), Vector Clocks (causal ordering). Used to determine “happened-before” relationships (see the sketch below).
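To make “happened-before” concrete, here is a minimal vector-clock comparison sketch in Go. The map-based `VectorClock` type and node IDs like `"n1"` are illustrative assumptions, not a standard API: event A happened before event B only if every entry of A is ≤ the matching entry of B and at least one is strictly smaller; otherwise the events are concurrent.

```go
package main

import "fmt"

// VectorClock maps node ID -> number of events observed from that node.
type VectorClock map[string]uint64

// HappenedBefore reports whether a -> b (a is causally before b):
// every entry of a is <= the matching entry of b, and at least one is strictly less.
func HappenedBefore(a, b VectorClock) bool {
	strictlyLess := false
	for node, av := range a {
		bv := b[node] // missing entries default to 0
		if av > bv {
			return false
		}
		if av < bv {
			strictlyLess = true
		}
	}
	// b may know about nodes that a has never heard of.
	for node, bv := range b {
		if _, ok := a[node]; !ok && bv > 0 {
			strictlyLess = true
		}
	}
	return strictlyLess
}

func main() {
	a := VectorClock{"n1": 1}          // an event on n1
	b := VectorClock{"n1": 1, "n2": 3} // a later event on n2 that has seen a
	c := VectorClock{"n3": 2}          // an unrelated event on n3

	fmt.Println(HappenedBefore(a, b))                       // true: a -> b
	fmt.Println(HappenedBefore(a, c), HappenedBefore(c, a)) // false false: concurrent
}
```

Lamport clocks (used in Project 2) give only the weaker guarantee: a smaller timestamp is necessary, but not sufficient, evidence of happened-before.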
Concept Summary Table
| Concept Cluster | What You Need to Internalize |
|---|---|
| Consistency Models | Strong (Linearizability) vs. Eventual. The trade-off between correctness and latency. |
| Replication | Single-Leader (MySQL), Multi-Leader (Google Docs), Leaderless (DynamoDB). |
| Sharding | How to split data across nodes. Range-based vs. Hash-based. Consistent Hashing. |
| Transactions | ACID across nodes is hard. 2PC (blocking) vs. Sagas (compensating actions). |
| Failure Detection | Heartbeats, timeouts, and why you can never truly know if a node is dead or just slow. |
Deep Dive Reading by Concept
Fundamentals & Time
| Concept | Book & Chapter |
|---|---|
| Philosophy & Fallacies | “Distributed Systems” by van Steen & Tanenbaum — Ch. 1: Introduction |
| Clocks & Ordering | “Designing Data-Intensive Applications” (DDIA) by Kleppmann — Ch. 8: The Trouble with Distributed Systems |
Consensus & Replication
| Concept | Book & Chapter |
|---|---|
| Replication Models | “Designing Data-Intensive Applications” — Ch. 5: Replication |
| Consensus (Raft) | “In Search of an Understandable Consensus Algorithm” (Raft Paper) — Extended Version |
| CAP Theorem | “DDIA” — Ch. 9: Consistency and Consensus |
Scalability & Partitioning
| Concept | Book & Chapter |
|---|---|
| Partitioning | “Designing Data-Intensive Applications” — Ch. 6: Partitioning |
| Consistent Hashing | “Dynamo: Amazon’s Highly Available Key-value Store” (The Dynamo Paper) |
Project 1: The “Unreliable” RPC Library
- File: DISTRIBUTED_SYSTEMS_FUNDAMENTALS.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, Python (with socket)
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 3. Service & Support (Chaos Engineering Tool)
- Difficulty: Level 2: Intermediate
- Knowledge Area: Networking / Failure Modes
- Software or Tool: UDP Sockets, Go Context
- Main Book: “Distributed Systems” by van Steen & Tanenbaum
What you’ll build: A custom Remote Procedure Call (RPC) wrapper over UDP that intentionally fails. You will allow the user to configure “Packet Loss %”, “Latency (jitter)”, and “Duplicate Packets”. You will then build a simple Client/Server using this library and implement the logic to make it reliable (Ack, Retries, Idempotency).
Why it teaches Distributed Systems: Most developers work with TCP/HTTP, which hides the messy reality of networks. By building on UDP and injecting chaos, you are forced to handle the “Partial Failure” problem manually. You will reinvent TCP reliability features (Sequence numbers, ACKs) in application space.
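As a rough sketch of the chaos layer (the `Conn` type and the `LossRate`, `DupRate`, and `MaxJitter` fields are illustrative names, not a prescribed API), every outgoing UDP datagram passes through a send path that may drop, delay, or duplicate it:

```go
package chaosrpc

import (
	"math/rand"
	"net"
	"time"
)

// Conn wraps a UDP socket and injects failures on every send.
type Conn struct {
	Sock      *net.UDPConn
	LossRate  float64       // probability a packet is silently dropped
	DupRate   float64       // probability a packet is sent twice
	MaxJitter time.Duration // random delay added before sending
}

// Send transmits a datagram, possibly dropping, delaying, or duplicating it.
func (c *Conn) Send(payload []byte, to *net.UDPAddr) error {
	if rand.Float64() < c.LossRate {
		return nil // dropped: the caller never finds out
	}
	if c.MaxJitter > 0 {
		time.Sleep(time.Duration(rand.Int63n(int64(c.MaxJitter))))
	}
	if _, err := c.Sock.WriteToUDP(payload, to); err != nil {
		return err
	}
	if rand.Float64() < c.DupRate {
		_, _ = c.Sock.WriteToUDP(payload, to) // duplicate delivery
	}
	return nil
}
```

The reliability layer (ACKs, retries, sequence numbers) is then built on top of this deliberately hostile transport.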
Core challenges you’ll face:
- The Timeout Problem: How long should you wait? Too short = waste bandwidth. Too long = bad UX.
- Idempotency: If I send “Deduct $10” and the ACK is lost, I might retry. How does the server know not to deduct $20?
- Ordering: UDP packets arrive out of order. How do you reconstruct the message?
Key Concepts:
- At-Least-Once vs At-Most-Once Delivery: RPC Semantics.
- Jitter: Why fixed timeouts are bad (thundering herd).
- Idempotency Keys: Request IDs.
Difficulty: Intermediate. Time estimate: Weekend. Prerequisites: Basic socket programming.
Real World Outcome
You’ll have a chaos_rpc library. When you run your demo, you’ll see logs like this:
Example Output:
$ ./server --port 8080 &
$ ./client --target 8080 --loss-rate 50% --cmd "ping"
[Client] Sending "ping" (Attempt 1)...
[Chaos] Dropped packet!
[Client] Timeout (200ms). Retrying (Attempt 2)...
[Server] Received "ping". Sending "pong".
[Client] Received "pong". Success!
[Client] Sending "pay_bill" (Attempt 1)...
[Server] Processed "pay_bill". Sending ACK.
[Chaos] Dropped ACK!
[Client] Timeout. Retrying "pay_bill" (Attempt 2)...
[Server] Received "pay_bill". DETECTED DUPLICATE (Seq #5). Resending cached ACK.
[Client] Received ACK.
The Core Question You’re Answering
“How do I ensure a requested action happens exactly once when the network might drop the request OR the response?”
Before coding, sit with this. The answer: you cannot guarantee exactly-once delivery over a network that drops messages, but you can build idempotent handling on top of at-least-once retries so the effect happens exactly once.
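One way to get that effect, sketched with illustrative names (`DedupServer`, `Handle` are assumptions for this write-up): the client stamps each request with a sequence number, and the server caches the reply per sequence number so a retry replays the cached ACK instead of re-executing the side effect.

```go
package chaosrpc

// DedupServer executes each sequence number at most once and replays the
// cached reply on retries. (Sketch only; a real implementation also needs
// per-client IDs and cache eviction.)
type DedupServer struct {
	seen map[uint64][]byte // seq -> cached reply
}

func NewDedupServer() *DedupServer {
	return &DedupServer{seen: make(map[uint64][]byte)}
}

// Handle runs execute() exactly once per sequence number.
func (s *DedupServer) Handle(seq uint64, req []byte, execute func([]byte) []byte) []byte {
	if reply, ok := s.seen[seq]; ok {
		return reply // duplicate: do NOT deduct $10 again, just resend the ACK
	}
	reply := execute(req) // the side effect happens exactly once
	s.seen[seq] = reply
	return reply
}
```

This is why the server in the example output above resends the cached ACK for Seq #5 instead of processing the payment twice.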
Project 2: The Logical Clock Chat (Causality)
- File: DISTRIBUTED_SYSTEMS_FUNDAMENTALS.md
- Main Programming Language: Node.js or Go
- Alternative Programming Languages: Python
- Coolness Level: Level 3: Genuinely Clever
- Business Potential: 2. Micro-SaaS (Collab Tools)
- Difficulty: Level 3: Advanced
- Knowledge Area: Logical Time / Ordering
- Software or Tool: WebSockets
- Main Book: “Designing Data-Intensive Applications” (Ch 8)
What you’ll build: A decentralized chat system (P2P-ish) running on 3 different ports (representing 3 nodes). Messages are not ordered by “wall clock time” (Date.now) but by Lamport Timestamps. You will intentionally delay messages to simulate network lag and verify that they are sorted correctly by “happened-before” relationship, not arrival time.
Why it teaches Distributed Systems: Physical clocks drift. In distributed systems, we care about causality (did A cause B?), not seconds. Lamport clocks are the simplest way to track this.
Core challenges you’ll face:
- Clock Drift: Simulating one node having a “fast” clock and one a “slow” clock.
- The Timestamp Algorithm: max(local_time, incoming_message_time) + 1.
- Concurrent Events: What if two events happen at the same logical time? (Tie-breaking by Node ID.)
Key Concepts:
- Lamport Timestamps: Leslie Lamport’s 1978 paper.
- Total Ordering vs Partial Ordering.
- Happened-Before Relationship.
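A minimal Lamport clock core in Go, as a sketch (the `LamportClock` type and method names are assumptions): tick before every local event or send, and on receive take max(local, incoming) + 1.

```go
package main

import (
	"fmt"
	"sync"
)

// LamportClock implements Lamport's logical clock rules.
type LamportClock struct {
	mu   sync.Mutex
	time uint64
}

// Tick is called before every local event or send; it returns the timestamp to attach.
func (c *LamportClock) Tick() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.time++
	return c.time
}

// Observe is called on receive: local = max(local, incoming) + 1.
func (c *LamportClock) Observe(incoming uint64) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if incoming > c.time {
		c.time = incoming
	}
	c.time++
	return c.time
}

func main() {
	var c LamportClock
	fmt.Println(c.Tick())      // 1: local event
	fmt.Println(c.Observe(10)) // 11: received a message stamped 10
	fmt.Println(c.Tick())      // 12
}
```

Ties (equal timestamps on different nodes) are broken by node ID to produce a total order, as noted in the challenges above.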
Real World Outcome
You start 3 terminal windows. You send messages. You introduce artificial delay on Node 2.
Example Output:
# Node 1 (Time: 5)
> "Hello" (Sent at L=6)
# Node 3 (Time: 10)
> "World" (Sent at L=11)
# Node 2 (Received "World" first due to network weirdness)
[Node 2] Received "World" (L=11). Local time updated to 12.
[Node 2] Received "Hello" (L=6).
[Node 2] SORTING LOG...
1. <Node 1> "Hello" (L=6)
2. <Node 3> "World" (L=11)
# Even though "World" arrived first physically, "Hello" is causally before it.
Project 3: The Rumor Mill (Gossip Protocol)
- File: DISTRIBUTED_SYSTEMS_FUNDAMENTALS.md
- Main Programming Language: Go
- Alternative Programming Languages: Rust, Elixir (perfect for this)
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. Open Core Infrastructure
- Difficulty: Level 3: Advanced
- Knowledge Area: Cluster Membership / Failure Detection
- Software or Tool: SWIM Protocol
- Main Book: “Distributed Systems” by van Steen & Tanenbaum
What you’ll build: A Cluster Membership library. You start 10 nodes. They know nothing about each other initially except for one “seed” node. Using a Gossip (Epidemic) protocol, they will all eventually discover each other. If you kill -9 one node, the cluster must detect the failure and mark it as “Dead” via gossip.
Why it teaches Distributed Systems: This is how Cassandra, Consul, and DynamoDB work. They don’t have a central registry. They gossip. You will learn about Eventual Consistency (the membership list is eventually correct) and Phi Accrual Failure Detection (or simple timeouts).
Core challenges you’ll face:
- Convergence Time: How long until Node A knows about Node Z?
- Bandwidth: Don’t broadcast to everyone; pick k random peers (fan-out).
- False Positives: Marking a node dead just because a packet dropped. (Indirect probes.)
Key Concepts:
- SWIM Protocol: Scalable Weakly-consistent Infection-style Process Group Membership.
- Dissemination vs Failure Detection.
- Epidemic Algorithms.
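A sketch of one dissemination round (the `Member` fields and `fanout` parameter are illustrative, and real SWIM runs failure detection as a separate ping/ping-req loop): each round, the node sends its view to a few random peers, and incoming views are merged by incarnation number.

```go
package gossip

import "math/rand"

type Status int

const (
	Alive Status = iota
	Suspect
	Dead
)

// Member is one entry in the eventually-consistent membership list.
type Member struct {
	Addr        string
	Status      Status
	Incarnation uint64 // higher incarnation wins when views disagree
}

// merge folds a remote view into the local one; newer incarnations overwrite
// older ones, and Dead overrides Suspect overrides Alive at equal incarnation.
func merge(local map[string]Member, remote []Member) {
	for _, m := range remote {
		cur, ok := local[m.Addr]
		if !ok || m.Incarnation > cur.Incarnation ||
			(m.Incarnation == cur.Incarnation && m.Status > cur.Status) {
			local[m.Addr] = m
		}
	}
}

// gossipRound sends the local view to `fanout` random peers instead of broadcasting.
func gossipRound(local map[string]Member, self string, fanout int, send func(peer string, view []Member)) {
	var peers []string
	var view []Member
	for addr, m := range local {
		view = append(view, m)
		if addr != self && m.Status == Alive {
			peers = append(peers, addr)
		}
	}
	rand.Shuffle(len(peers), func(i, j int) { peers[i], peers[j] = peers[j], peers[i] })
	if fanout > len(peers) {
		fanout = len(peers)
	}
	for _, peer := range peers[:fanout] {
		send(peer, view)
	}
}
```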
Real World Outcome
You spawn six node processes (ports 3000-3005).
Example Output:
$ ./node --port 3001 --join 3000
$ ./node --port 3002 --join 3000
...
[Node 3005] Members: [3000, 3001, 3002, 3003, 3004, 3005]
# KILL NODE 3002
[Node 3001] Suspect 3002... (No Ack)
[Node 3001] Ping-Req to 3004 asking "Is 3002 alive?"
[Node 3004] 3002 is unreachable.
[Node 3001] Marking 3002 as DEAD. Spreading rumor.
[Node 3005] Received rumor: 3002 is DEAD.
Project 4: The Raft Consensus (Log Replication)
- File: DISTRIBUTED_SYSTEMS_FUNDAMENTALS.md
- Main Programming Language: Go (Standard for Raft)
- Alternative Programming Languages: Rust, Java
- Coolness Level: Level 5: Pure Magic
- Business Potential: 5. Industry Disruptor (Core of Kubernetes/Etcd)
- Difficulty: Level 5: Master
- Knowledge Area: Consensus / Strong Consistency
- Software or Tool: RPC, State Machines
- Main Book: “In Search of an Understandable Consensus Algorithm” (The Raft Paper)
What you’ll build: You will implement the Raft Consensus Algorithm from scratch.
Step 1: Leader Election (Nodes vote, one becomes leader).
Step 2: Log Replication (Leader accepts command set x=5, replicates to followers).
Step 3: Commit (Once majority ack, execute).
Why it teaches Distributed Systems: This is the Holy Grail. If you can build Raft, you understand distributed systems. It forces you to handle split votes, network partitions, log inconsistencies, and safety properties.
Core challenges you’ll face:
- The Split Vote: What if 3 nodes vote for A and 3 vote for B? (Randomized election timeouts).
- The Partition: Leader A gets cut off. The majority side elects Leader B. When the partition heals, stale Leader A (older term) must step down.
- Log Indexing: Ensuring the logs match exactly.
Key Concepts:
- Replicated State Machine.
- Term Numbers: Using logical time to detect stale leaders.
- Quorums: Majority (N/2 + 1) rules.
- Safety Property: If a log entry is committed, it is present in all future leaders.
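One small but central piece of the leader's logic, sketched with assumed names (`maybeAdvanceCommit`, `matchIndex` here are illustrative, not the paper's pseudocode): the commit index advances to the highest log index replicated on a majority, and only if the entry at that index is from the leader's current term.

```go
package raft

import "sort"

// LogEntry is one replicated command plus the term in which it was appended.
type LogEntry struct {
	Term    uint64
	Command []byte
}

// maybeAdvanceCommit returns the leader's new commit index: the highest index
// replicated on a majority (N/2 + 1, counting the leader), provided the entry
// at that index was appended in the current term (Raft's commit safety rule).
func maybeAdvanceCommit(log []LogEntry, matchIndex []uint64, leaderLastIndex uint64,
	currentTerm uint64, commitIndex uint64) uint64 {

	// How far each node's log matches the leader's, including the leader itself.
	acked := append([]uint64{leaderLastIndex}, matchIndex...)
	sort.Slice(acked, func(i, j int) bool { return acked[i] > acked[j] })

	majority := len(acked)/2 + 1
	candidate := acked[majority-1] // highest index replicated on at least a majority

	// Log indices are 1-based in this sketch; log[candidate-1] is the entry at that index.
	if candidate > commitIndex && int(candidate) <= len(log) && log[candidate-1].Term == currentTerm {
		return candidate
	}
	return commitIndex
}
```

With 5 nodes this is exactly the "two follower acks plus the leader = majority" moment in the example output below.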
Real World Outcome
A resilient Key-Value store. You run 5 nodes. You write data to the leader. You kill the leader. A new leader is elected. You read the data—it is still there.
Example Output:
[Node 1] Role: LEADER. Term: 5.
[Client] PUT key="foo" val="bar"
[Node 1] Appending to log index 10. Sending AppendEntries to [2,3,4,5].
[Node 2] Ack index 10.
[Node 3] Ack index 10.
[Node 1] Majority reached. Committing. Replying "Success" to client.
# KILL NODE 1
[Node 2] Election Timeout. Starting election Term 6.
[Node 3] Vote for Node 2.
[Node 4] Vote for Node 2.
[Node 2] Become LEADER.
[Client] GET key="foo" -> "bar" (Data survived!)
Project 5: The Sharded Cache (Consistent Hashing)
- File: DISTRIBUTED_SYSTEMS_FUNDAMENTALS.md
- Main Programming Language: Go or Java
- Alternative Programming Languages: Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 4. Open Core Infrastructure
- Difficulty: Level 3: Advanced
- Knowledge Area: Scalability / Partitioning
- Software or Tool: Hashing Algorithms (MD5/SHA1 for the ring)
- Main Book: “Dynamo: Amazon’s Highly Available Key-value Store”
What you’ll build: A distributed in-memory cache (like a mini-Memcached or Dynamo). You will NOT use simple hash(key) % N (adding a node would remap nearly every key). You will implement Consistent Hashing using a Ring topology with virtual nodes. You will demonstrate adding a node and seeing only ~1/N of the keys move.
Why it teaches Distributed Systems: Sharding is how we scale. But “naive” sharding causes “rebalancing storms.” Consistent hashing minimizes data movement. This project teaches you how large-scale storage systems (Cassandra, Riak) distribute data.
Core challenges you’ll face:
- The Ring: Mapping the hash space (0 to 2^32) to a circle.
- Virtual Nodes: Ensuring even distribution (avoiding “hot spots” on the ring).
- Key Routing: Given a key, finding the successor node on the ring.
Key Concepts:
- Horizontal Scaling.
- Rebalancing: Moving data when topology changes.
- Hot Partitions: Why an even key distribution (virtual nodes) is needed.
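A sketch of the ring itself (the choice of crc32 and 100 virtual nodes per physical node are illustrative, not required): each node is hashed to many points on the circle, and a key belongs to the first node point clockwise from hash(key).

```go
package ring

import (
	"fmt"
	"hash/crc32"
	"sort"
)

const virtualNodes = 100 // more points per node -> smoother distribution

// Ring maps hashed points on the circle back to physical node names.
type Ring struct {
	points []uint32          // sorted positions on the ring
	owner  map[uint32]string // point -> physical node
}

func New() *Ring { return &Ring{owner: make(map[uint32]string)} }

// AddNode places `virtualNodes` points for one physical node on the ring.
func (r *Ring) AddNode(name string) {
	for i := 0; i < virtualNodes; i++ {
		p := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s#%d", name, i)))
		r.owner[p] = name
		r.points = append(r.points, p)
	}
	sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
}

// Get returns the node owning a key: the first ring point clockwise from hash(key).
func (r *Ring) Get(key string) string {
	if len(r.points) == 0 {
		return ""
	}
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
	if i == len(r.points) {
		i = 0 // wrap around the circle
	}
	return r.owner[r.points[i]]
}
```

A new node only claims the arcs immediately counter-clockwise of its own points, which is why roughly 1/N of the keys move when it joins.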
Real World Outcome
A CLI that lets you add nodes and keys, showing exactly where they land.
Example Output:
> add_node "Node A" (Hash: 1000)
> add_node "Node B" (Hash: 3000)
> put "user_1" (Hash: 2500) -> Mapped to Node B
> put "user_2" (Hash: 500) -> Mapped to Node A
> add_node "Node C" (Hash: 2000)
[System] Node C added between A and B.
[Rebalance] Keys (1000-2000] moving from Node B to Node C.
> get "user_1" -> Fetched from Node B (No change)
> put "user_3" (Hash: 1500) -> Mapped to Node C
Project 6: The Transaction Coordinator (2PC)
- File: DISTRIBUTED_SYSTEMS_FUNDAMENTALS.md
- Main Programming Language: Go or Java
- Alternative Programming Languages: Python
- Coolness Level: Level 4: Hardcore Tech Flex
- Business Potential: 3. Service & Support
- Difficulty: Level 4: Expert
- Knowledge Area: Distributed Transactions / ACID
- Software or Tool: SQL Databases (SQLite x 2)
- Main Book: “Designing Data-Intensive Applications” (Ch 9)
What you’ll build: A “Bank Transfer” system across two distinct databases (Bank A DB and Bank B DB). You will build a Two-Phase Commit (2PC) Coordinator. Phase 1: Ask A and B “Can you commit?” (Prepare). Lock funds. Phase 2: If both say Yes -> “Commit”. If one says No -> “Abort”.
Why it teaches Distributed Systems: Distributed Transactions are notoriously hard. You will see why 2PC is called “blocking.” If the coordinator crashes after Phase 1, the banks are stuck holding locks forever. This teaches the pain of “Strong Consistency” across boundaries.
Core challenges you’ll face:
- The Blocking Problem: What happens if a participant crashes?
- Coordinator Crash: Saving the state of the transaction to disk (WAL) before sending Commit.
- Simulating Failure: Failing a node during the critical window between Prepare and Commit.
Key Concepts:
- Atomic Commit.
- Prepare Phase / Commit Phase.
- Blocking vs Non-Blocking Protocols.
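A sketch of the coordinator's decision logic (the `Participant` and `WAL` interfaces are illustrative assumptions): the crucial detail is that the COMMIT decision is made durable in the write-ahead log before Phase 2 begins, so a restarted coordinator can re-send the decision instead of leaving participants blocked.

```go
package twopc

// Participant is one resource manager (e.g. one bank's database).
type Participant interface {
	Prepare(txID string) (vote bool, err error) // phase 1: lock resources, vote yes/no
	Commit(txID string) error                   // phase 2
	Abort(txID string) error
}

// WAL is the coordinator's durable log; the decision must hit disk
// before phase 2 starts, or a crash can leave participants blocked.
type WAL interface {
	Record(txID, decision string) error
}

// RunTransaction drives both phases and reports whether the transaction committed.
func RunTransaction(txID string, wal WAL, parts []Participant) (bool, error) {
	// Phase 1: collect votes. Any NO (or any error) forces an abort.
	for _, p := range parts {
		ok, err := p.Prepare(txID)
		if err != nil || !ok {
			_ = wal.Record(txID, "ABORT")
			for _, q := range parts {
				_ = q.Abort(txID)
			}
			return false, err
		}
	}

	// Decision point: make COMMIT durable before telling anyone.
	if err := wal.Record(txID, "COMMIT"); err != nil {
		return false, err
	}

	// Phase 2: deliver the decision. On recovery, replay the WAL and resend.
	for _, p := range parts {
		_ = p.Commit(txID) // in a real system, retry until acknowledged
	}
	return true, nil
}
```

If the coordinator crashes after the WAL record but before Phase 2 finishes, recovery replays the log and re-sends Commit, which is exactly the scenario in the example output below.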
Real World Outcome
A system that guarantees money is never created or destroyed, even if the program crashes halfway.
Example Output:
[User] Transfer $100 from A to B.
[Coord] Phase 1: Prepare (A: -$100, B: +$100)
[Bank A] Locked $100. Vote: YES.
[Bank B] Locked account. Vote: YES.
[Coord] All YES. Phase 2: Commit.
[Bank A] Committed.
[Chaos] Coordinator Crashes!
[Restart] Coordinator recovers from WAL. Sees pending Commit for Tx #55.
[Coord] Resending Commit to Bank B.
[Bank B] Committed.
[System] Transfer Complete. Consistency maintained.
Project Comparison Table
| Project | Difficulty | Time | Depth of Understanding | Fun Factor |
|---|---|---|---|---|
| 1. Unreliable RPC | ⭐⭐ | Weekend | Failure Modes | ⭐⭐⭐ |
| 2. Logical Clock Chat | ⭐⭐⭐ | Weekend | Time & Causality | ⭐⭐⭐⭐ |
| 3. Rumor Mill | ⭐⭐⭐ | 1 week | Eventual Consistency | ⭐⭐⭐⭐ |
| 4. Raft Consensus | ⭐⭐⭐⭐⭐ | 2 weeks | Strong Consistency | ⭐⭐⭐⭐⭐ |
| 5. Sharded Cache | ⭐⭐⭐ | 1 week | Scalability | ⭐⭐⭐ |
| 6. Tx Coordinator | ⭐⭐⭐⭐ | 1 week | Transactions | ⭐⭐⭐ |
Recommendation
If you want to understand “The Cloud”: Start with Project 3 (Gossip) and Project 5 (Sharding). This explains how DynamoDB and Cassandra work.
If you want to understand Kubernetes/Etcd: You must do Project 4 (Raft). It is the hardest but most rewarding.
If you just want better Microservices: Focus on Project 1 (RPC) and Project 6 (2PC/Sagas).
Final Overall Project: The “Distributed Database”
What you’ll build: Combine everything. A replicated, sharded Key-Value store.
- Sharding: Use Consistent Hashing (Project 5) to route keys to shards.
- Replication: Each shard is a Raft Cluster (Project 4).
- Discovery: Nodes find each other via Gossip (Project 3).
- Client: A smart client that knows which node to talk to.
Why this is the ultimate test: This is basically building CockroachDB or TiKV lite. It requires every single concept to work in harmony.
Summary
This learning path covers Distributed Systems through 6 hands-on projects. Here’s the complete list:
| # | Project Name | Main Language | Difficulty | Time Estimate |
|---|---|---|---|---|
| 1 | The “Unreliable” RPC Library | Go | Intermediate | Weekend |
| 2 | The Logical Clock Chat | Node.js/Go | Advanced | Weekend |
| 3 | The Rumor Mill (Gossip) | Go | Advanced | 1 week |
| 4 | The Raft Consensus | Go | Master | 2 weeks |
| 5 | The Sharded Cache | Go | Advanced | 1 week |
| 6 | The Transaction Coordinator | Go | Expert | 1 week |
Expected Outcomes
After completing these projects, you will:
- Stop trusting the network.
- Understand why “Strong Consistency” kills performance.
- Know how to scale a system from 1 node to 100 nodes.
- Be able to debug “Split Brain” scenarios.
- Have implemented the algorithms that power Google, Amazon, and Facebook infrastructure.