Project 12: Distributed Key-Value Store with Raft

A distributed key-value store that uses the Raft consensus algorithm to replicate data across multiple nodes, tolerating failures and ensuring consistency.

Quick Reference

Attribute              Value
Primary Language       Go
Alternative Languages  Rust, Java
Difficulty             Level 5: Master
Time Estimate          2-3 months
Knowledge Area         Distributed Systems, Consensus, Replication
Tooling                None (from scratch) or the etcd/raft library
Prerequisites          All previous projects. Deep networking knowledge. Read the Raft paper multiple times.

What You Will Build

A distributed key-value store that replicates data across multiple nodes using the Raft consensus algorithm. You will implement leader election, log replication over RPC, recovery from node crashes and network partitions, and a client that finds (or is redirected to) the current leader, exposed through a raft-kv server and a raft-cli client like those shown under Real-World Outcome.

Why It Matters

Consensus and replication sit underneath most production infrastructure: coordination services, configuration stores, and replicated databases (etcd, listed under Tooling, is built around a Raft implementation). Implementing leader election, log replication, and failure handling yourself builds core skills that appear repeatedly in real-world systems and tooling.

Core Challenges

  • Leader election → maps to state machines, timeouts, voting (see the sketch after this list)
  • Log replication → maps to RPC, consistency guarantees
  • Handling failures → maps to network partitions, node crashes
  • Client interaction → maps to linearizability, redirects
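
A minimal sketch in Go of the first challenge: a node that stays a follower while heartbeats keep arriving and becomes a candidate when a randomized election timeout fires, as the Raft paper describes. The struct layout, the channel-based heartbeat signal, and the broadcastRequestVote stub are illustrative choices for this sketch, not a prescribed design.

package raft

import (
	"math/rand"
	"sync"
	"time"
)

// State is the role a node currently plays in the cluster.
type State int

const (
	Follower State = iota
	Candidate
	Leader
)

// Node holds the minimum state needed to illustrate elections. The real
// project also needs the log, peer addresses, and RPC plumbing.
type Node struct {
	mu          sync.Mutex
	id          int
	state       State
	currentTerm int
	votedFor    int           // -1 means "no vote cast this term"
	heartbeat   chan struct{} // signaled whenever the leader is heard from
}

// runElectionTimer waits a randomized interval (150-300 ms, the range the
// Raft paper suggests) and starts an election if no heartbeat arrives.
func (n *Node) runElectionTimer() {
	for {
		timeout := time.Duration(150+rand.Intn(150)) * time.Millisecond
		select {
		case <-n.heartbeat:
			// Leader is alive; loop around and wait again.
		case <-time.After(timeout):
			n.startElection()
		}
	}
}

// startElection switches to candidate, bumps the term, votes for itself,
// and asks the other nodes for their votes.
func (n *Node) startElection() {
	n.mu.Lock()
	n.state = Candidate
	n.currentTerm++
	n.votedFor = n.id
	term := n.currentTerm
	n.mu.Unlock()
	n.broadcastRequestVote(term)
}

// broadcastRequestVote is a stub for this sketch: a real implementation
// sends RequestVote RPCs to every peer and steps up to leader once a
// majority of votes for this term has been collected.
func (n *Node) broadcastRequestVote(term int) {}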

Key Concepts

  • Raft consensus: the Raft paper (raft.github.io)
  • Distributed systems: “Designing Data-Intensive Applications”, Ch. 8-9 (Kleppmann)
  • Replication: “Designing Data-Intensive Applications”, Ch. 5 (Kleppmann)
  • Consistency models: “Designing Data-Intensive Applications”, Ch. 9 (Kleppmann)
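
The replication machinery these references cover centers on a small set of messages. Below is a sketch of the log entry and the AppendEntries argument/reply shapes; the field names follow Figure 2 of the Raft paper, while expressing them as plain Go structs is just one possible encoding.

package raft

// LogEntry is one replicated command; Index and Term identify its position
// in the Raft log, and Command is the key-value operation to apply.
type LogEntry struct {
	Index   int
	Term    int
	Command []byte // e.g. an encoded "set name Alice"
}

// AppendEntriesArgs mirrors the AppendEntries RPC arguments from the Raft
// paper: the leader's term and identity, the log position just before the
// new entries (used for the consistency check), the entries themselves,
// and how far the leader has committed.
type AppendEntriesArgs struct {
	Term         int
	LeaderID     int
	PrevLogIndex int
	PrevLogTerm  int
	Entries      []LogEntry
	LeaderCommit int
}

// AppendEntriesReply tells the leader whether the follower's log matched at
// PrevLogIndex/PrevLogTerm, along with the follower's current term.
type AppendEntriesReply struct {
	Term    int
	Success bool
}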

Real-World Outcome

# Start a 3-node cluster:
$ ./raft-kv --id 1 --cluster localhost:8001,localhost:8002,localhost:8003
Node 1 starting...
Cluster: [localhost:8001, localhost:8002, localhost:8003]
[FOLLOWER] Waiting for leader...
[CANDIDATE] Starting election for term 1
[LEADER] Elected! Term: 1

$ ./raft-kv --id 2 --cluster localhost:8001,localhost:8002,localhost:8003
Node 2 starting...
[FOLLOWER] Leader is node 1

$ ./raft-kv --id 3 --cluster localhost:8001,localhost:8002,localhost:8003
Node 3 starting...
[FOLLOWER] Leader is node 1

# Client operations:
$ ./raft-cli set name "Alice"
OK (committed to 3/3 nodes)

$ ./raft-cli get name
Alice

# Kill the leader (node 1):
$ kill -9 <pid-node-1>

# Other nodes elect new leader:
Node 2: [CANDIDATE] Starting election for term 2
Node 2: [LEADER] Elected! Term: 2

# Client still works:
$ ./raft-cli get name
Alice (from node 2)

# Restart node 1:
Node 1: [FOLLOWER] Catching up... (100 log entries)
Node 1: [FOLLOWER] Caught up! Leader is node 2

# View cluster status:
$ ./raft-cli status
Cluster Status:
  Node 1: FOLLOWER  (log: 156 entries)
  Node 2: LEADER    (log: 156 entries) *
  Node 3: FOLLOWER  (log: 156 entries)
  Commit index: 156
  Last applied: 156
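
The final get in the transcript succeeds because the client either reaches the new leader directly or is redirected to it. Below is a minimal sketch of that client-side loop in Go, assuming a hypothetical HTTP API in which a follower answers with a 307 redirect whose Location header holds the leader's base URL; the endpoint path, query parameter, and redirect convention are assumptions for illustration, not something Raft itself defines.

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// get tries each known node until one answers. A follower is assumed to
// reply 307 with the leader's base URL in the Location header.
func get(nodes []string, key string) (string, error) {
	client := &http.Client{
		// Do not follow redirects automatically, so the redirect to the
		// leader is visible and can be handled explicitly.
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	path := "/get?key=" + url.QueryEscape(key)
	for _, addr := range nodes {
		resp, err := client.Get("http://" + addr + path)
		if err != nil {
			continue // node is down; try the next one
		}
		if resp.StatusCode == http.StatusTemporaryRedirect {
			leader := resp.Header.Get("Location") // e.g. "http://localhost:8002"
			resp.Body.Close()
			resp, err = client.Get(leader + path)
			if err != nil {
				continue
			}
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			continue
		}
		return string(body), nil
	}
	return "", fmt.Errorf("no reachable node answered for key %q", key)
}

func main() {
	nodes := []string{"localhost:8001", "localhost:8002", "localhost:8003"}
	value, err := get(nodes, "name")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(value)
}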

Implementation Guide

  1. Reproduce the simplest happy-path scenario: a single node that parses --id and --cluster, accepts set/get commands, and applies them to an in-memory map.
  2. Build the smallest working version of the core feature: leader election (randomized timeouts, RequestVote) and log replication (AppendEntries, commit index) across three nodes; the state-machine apply side is sketched after this list.
  3. Add input validation and error handling: reject malformed commands, redirect clients that hit a follower, and survive leader crashes and restarts.
  4. Add instrumentation/logging to confirm behavior: the [FOLLOWER]/[CANDIDATE]/[LEADER] lines and the status command shown above.
  5. Refactor into clean modules with tests.
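
A minimal sketch of the apply side mentioned in step 2, assuming the Raft layer delivers committed commands over a channel; the ApplyMsg shape and the text encoding of set commands are assumptions for this sketch, not a fixed interface.

package raft

import (
	"strings"
	"sync"
)

// ApplyMsg carries one committed log entry from the Raft layer to the
// key-value state machine. Its shape is an assumption for this sketch.
type ApplyMsg struct {
	Index   int
	Command string // e.g. "set name Alice"
}

// KVStore applies committed commands in log order, so every node that
// applies the same log ends up with the same map contents.
type KVStore struct {
	mu          sync.Mutex
	data        map[string]string
	lastApplied int
}

func NewKVStore() *KVStore {
	return &KVStore{data: make(map[string]string)}
}

// Run consumes committed entries and mutates the map. Only entries that
// Raft has committed should ever be sent on applyCh.
func (kv *KVStore) Run(applyCh <-chan ApplyMsg) {
	for msg := range applyCh {
		kv.mu.Lock()
		parts := strings.SplitN(msg.Command, " ", 3)
		if len(parts) == 3 && parts[0] == "set" {
			kv.data[parts[1]] = parts[2]
		}
		kv.lastApplied = msg.Index
		kv.mu.Unlock()
	}
}

// Get serves reads from the applied state; in the full project the leader
// answers these (or followers redirect to it), as in the client-interaction
// challenge above.
func (kv *KVStore) Get(key string) (string, bool) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	v, ok := kv.data[key]
	return v, ok
}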

Milestones

  • Milestone 1: Minimal working program — a single node stores and serves keys end-to-end.
  • Milestone 2: Correct outputs for typical inputs — a 3-node cluster elects a leader and replicates writes to every follower.
  • Milestone 3: Robust handling of edge cases — committed data survives leader crashes, node restarts, and network partitions.
  • Milestone 4: Clean structure, a working status command, and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs
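
One way to make the leader-failure scenario from the transcript a repeatable check is a Go test against a hypothetical Cluster harness; every name below (Cluster, startCluster, KillLeader) exists only for this sketch and must be wired to your own implementation, so the test skips until you do.

package raft_test

import (
	"testing"
	"time"
)

// Cluster is a hypothetical test-harness interface for this sketch; a real
// harness would start three raft-kv nodes (in-process or as subprocesses)
// and expose client operations against them.
type Cluster interface {
	Set(key, value string) error
	Get(key string) (string, error)
	KillLeader() // stop whichever node is currently the leader
	Shutdown()
}

// startCluster must be wired to your own harness; it is left nil here so
// the test skips rather than fails when run as-is.
var startCluster func(t *testing.T, nodes int) Cluster

// TestCommittedWriteSurvivesLeaderCrash automates the manual session from
// Real-World Outcome: write a key, kill the leader, read the key back.
func TestCommittedWriteSurvivesLeaderCrash(t *testing.T) {
	if startCluster == nil {
		t.Skip("connect startCluster to a real test harness first")
	}
	cluster := startCluster(t, 3)
	defer cluster.Shutdown()

	if err := cluster.Set("name", "Alice"); err != nil {
		t.Fatalf("set failed: %v", err)
	}

	cluster.KillLeader()
	time.Sleep(500 * time.Millisecond) // allow a new election to complete

	got, err := cluster.Get("name")
	if err != nil {
		t.Fatalf("get after leader crash failed: %v", err)
	}
	if got != "Alice" {
		t.Fatalf("got %q, want %q", got, "Alice")
	}
}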

References

  • Main guide: LEARN_GO_DEEP_DIVE.md
  • “Designing Data-Intensive Applications” by Martin Kleppmann