Project 12: Distributed Key-Value Store with Raft

A distributed key-value store that uses the Raft consensus algorithm to replicate data across multiple nodes, tolerating failures and ensuring consistency.

Quick Reference

Attribute              Value
Primary Language       Go
Alternative Languages  Rust, Java
Difficulty             Level 5: Master
Time Estimate          2-3 months
Knowledge Area         Distributed Systems, Consensus, Replication
Tooling                None (from scratch) or the etcd/raft library
Prerequisites          All previous projects. Deep networking knowledge. Read the Raft paper multiple times.

What You Will Build

A distributed key-value store that replicates data across multiple nodes using the Raft consensus algorithm. You will implement leader election, log replication over RPC, recovery from node crashes and network partitions, and a client that finds (or is redirected to) the current leader, exposed through a raft-kv server and a raft-cli client like those shown under Real-World Outcome.

Why It Matters

Consensus and replication sit underneath most production infrastructure: coordination services, configuration stores, and replicated databases (etcd, listed under Tooling, is built around a Raft implementation). Implementing leader election, log replication, and failure handling yourself builds core skills that appear repeatedly in real-world systems and tooling.

Core Challenges

  • Leader election → maps to state machines, timeouts, voting (see the sketch after this list)
  • Log replication → maps to RPC, consistency guarantees
  • Handling failures → maps to network partitions, node crashes
  • Client interaction → maps to linearizability, redirects
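
A minimal sketch in Go of the first challenge: a node that stays a follower while heartbeats keep arriving and becomes a candidate when a randomized election timeout fires, as the Raft paper describes. The struct layout, the channel-based heartbeat signal, and the broadcastRequestVote stub are illustrative choices for this sketch, not a prescribed design.

package raft

import (
	"math/rand"
	"sync"
	"time"
)

// State is the role a node currently plays in the cluster.
type State int

const (
	Follower State = iota
	Candidate
	Leader
)

// Node holds the minimum state needed to illustrate elections. The real
// project also needs the log, peer addresses, and RPC plumbing.
type Node struct {
	mu          sync.Mutex
	id          int
	state       State
	currentTerm int
	votedFor    int           // -1 means "no vote cast this term"
	heartbeat   chan struct{} // signaled whenever the leader is heard from
}

// runElectionTimer waits a randomized interval (150-300 ms, the range the
// Raft paper suggests) and starts an election if no heartbeat arrives.
func (n *Node) runElectionTimer() {
	for {
		timeout := time.Duration(150+rand.Intn(150)) * time.Millisecond
		select {
		case <-n.heartbeat:
			// Leader is alive; loop around and wait again.
		case <-time.After(timeout):
			n.startElection()
		}
	}
}

// startElection switches to candidate, bumps the term, votes for itself,
// and asks the other nodes for their votes.
func (n *Node) startElection() {
	n.mu.Lock()
	n.state = Candidate
	n.currentTerm++
	n.votedFor = n.id
	term := n.currentTerm
	n.mu.Unlock()
	n.broadcastRequestVote(term)
}

// broadcastRequestVote is a stub for this sketch: a real implementation
// sends RequestVote RPCs to every peer and steps up to leader once a
// majority of votes for this term has been collected.
func (n *Node) broadcastRequestVote(term int) {}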

Key Concepts

  • Raft consensus: the Raft paper (raft.github.io)
  • Distributed systems: “Designing Data-Intensive Applications”, Ch. 8-9 (Kleppmann)
  • Replication: “Designing Data-Intensive Applications”, Ch. 5 (Kleppmann)
  • Consistency models: “Designing Data-Intensive Applications”, Ch. 9 (Kleppmann)
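
The replication machinery these references cover centers on a small set of messages. Below is a sketch of the log entry and the AppendEntries argument/reply shapes; the field names follow Figure 2 of the Raft paper, while expressing them as plain Go structs is just one possible encoding.

package raft

// LogEntry is one replicated command; Index and Term identify its position
// in the Raft log, and Command is the key-value operation to apply.
type LogEntry struct {
	Index   int
	Term    int
	Command []byte // e.g. an encoded "set name Alice"
}

// AppendEntriesArgs mirrors the AppendEntries RPC arguments from the Raft
// paper: the leader's term and identity, the log position just before the
// new entries (used for the consistency check), the entries themselves,
// and how far the leader has committed.
type AppendEntriesArgs struct {
	Term         int
	LeaderID     int
	PrevLogIndex int
	PrevLogTerm  int
	Entries      []LogEntry
	LeaderCommit int
}

// AppendEntriesReply tells the leader whether the follower's log matched at
// PrevLogIndex/PrevLogTerm, along with the follower's current term.
type AppendEntriesReply struct {
	Term    int
	Success bool
}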

Real-World Outcome

# Start a 3-node cluster:
$ ./raft-kv --id 1 --cluster localhost:8001,localhost:8002,localhost:8003
Node 1 starting...
Cluster: [localhost:8001, localhost:8002, localhost:8003]
[FOLLOWER] Waiting for leader...
[CANDIDATE] Starting election for term 1
[LEADER] Elected! Term: 1

$ ./raft-kv --id 2 --cluster localhost:8001,localhost:8002,localhost:8003
Node 2 starting...
[FOLLOWER] Leader is node 1

$ ./raft-kv --id 3 --cluster localhost:8001,localhost:8002,localhost:8003
Node 3 starting...
[FOLLOWER] Leader is node 1

# Client operations:
$ ./raft-cli set name "Alice"
OK (committed to 3/3 nodes)

$ ./raft-cli get name
Alice

# Kill the leader (node 1):
$ kill -9 <pid-node-1>

# Other nodes elect new leader:
Node 2: [CANDIDATE] Starting election for term 2
Node 2: [LEADER] Elected! Term: 2

# Client still works:
$ ./raft-cli get name
Alice (from node 2)

# Restart node 1:
Node 1: [FOLLOWER] Catching up... (100 log entries)
Node 1: [FOLLOWER] Caught up! Leader is node 2

# View cluster status:
$ ./raft-cli status
Cluster Status:
  Node 1: FOLLOWER  (log: 156 entries)
  Node 2: LEADER    (log: 156 entries) *
  Node 3: FOLLOWER  (log: 156 entries)
  Commit index: 156
  Last applied: 156
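
The final get in the transcript succeeds because the client either reaches the new leader directly or is redirected to it. Below is a minimal sketch of that client-side loop in Go, assuming a hypothetical HTTP API in which a follower answers with a 307 redirect whose Location header holds the leader's base URL; the endpoint path, query parameter, and redirect convention are assumptions for illustration, not something Raft itself defines.

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// get tries each known node until one answers. A follower is assumed to
// reply 307 with the leader's base URL in the Location header.
func get(nodes []string, key string) (string, error) {
	client := &http.Client{
		// Do not follow redirects automatically, so the redirect to the
		// leader is visible and can be handled explicitly.
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	path := "/get?key=" + url.QueryEscape(key)
	for _, addr := range nodes {
		resp, err := client.Get("http://" + addr + path)
		if err != nil {
			continue // node is down; try the next one
		}
		if resp.StatusCode == http.StatusTemporaryRedirect {
			leader := resp.Header.Get("Location") // e.g. "http://localhost:8002"
			resp.Body.Close()
			resp, err = client.Get(leader + path)
			if err != nil {
				continue
			}
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			continue
		}
		return string(body), nil
	}
	return "", fmt.Errorf("no reachable node answered for key %q", key)
}

func main() {
	nodes := []string{"localhost:8001", "localhost:8002", "localhost:8003"}
	value, err := get(nodes, "name")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(value)
}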

Implementation Guide

  1. Reproduce the simplest happy-path scenario: a single node that parses --id and --cluster, accepts set/get commands, and applies them to an in-memory map.
  2. Build the smallest working version of the core feature: leader election (randomized timeouts, RequestVote) and log replication (AppendEntries, commit index) across three nodes; the state-machine apply side is sketched after this list.
  3. Add input validation and error handling: reject malformed commands, redirect clients that hit a follower, and survive leader crashes and restarts.
  4. Add instrumentation/logging to confirm behavior: the [FOLLOWER]/[CANDIDATE]/[LEADER] lines and the status command shown above.
  5. Refactor into clean modules with tests.
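
A minimal sketch of the apply side mentioned in step 2, assuming the Raft layer delivers committed commands over a channel; the ApplyMsg shape and the text encoding of set commands are assumptions for this sketch, not a fixed interface.

package raft

import (
	"strings"
	"sync"
)

// ApplyMsg carries one committed log entry from the Raft layer to the
// key-value state machine. Its shape is an assumption for this sketch.
type ApplyMsg struct {
	Index   int
	Command string // e.g. "set name Alice"
}

// KVStore applies committed commands in log order, so every node that
// applies the same log ends up with the same map contents.
type KVStore struct {
	mu          sync.Mutex
	data        map[string]string
	lastApplied int
}

func NewKVStore() *KVStore {
	return &KVStore{data: make(map[string]string)}
}

// Run consumes committed entries and mutates the map. Only entries that
// Raft has committed should ever be sent on applyCh.
func (kv *KVStore) Run(applyCh <-chan ApplyMsg) {
	for msg := range applyCh {
		kv.mu.Lock()
		parts := strings.SplitN(msg.Command, " ", 3)
		if len(parts) == 3 && parts[0] == "set" {
			kv.data[parts[1]] = parts[2]
		}
		kv.lastApplied = msg.Index
		kv.mu.Unlock()
	}
}

// Get serves reads from the applied state; in the full project the leader
// answers these (or followers redirect to it), as in the client-interaction
// challenge above.
func (kv *KVStore) Get(key string) (string, bool) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	v, ok := kv.data[key]
	return v, ok
}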

Milestones

  • Milestone 1: Minimal working program — a single node stores and serves keys end-to-end.
  • Milestone 2: Correct outputs for typical inputs — a 3-node cluster elects a leader and replicates writes to every follower.
  • Milestone 3: Robust handling of edge cases — committed data survives leader crashes, node restarts, and network partitions.
  • Milestone 4: Clean structure, a working status command, and documented usage.

Validation Checklist

  • Output matches the real-world outcome example
  • Handles invalid inputs safely
  • Provides clear errors and exit codes
  • Repeatable results across runs
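
One way to make the leader-failure scenario from the transcript a repeatable check is a Go test against a hypothetical Cluster harness; every name below (Cluster, startCluster, KillLeader) exists only for this sketch and must be wired to your own implementation, so the test skips until you do.

package raft_test

import (
	"testing"
	"time"
)

// Cluster is a hypothetical test-harness interface for this sketch; a real
// harness would start three raft-kv nodes (in-process or as subprocesses)
// and expose client operations against them.
type Cluster interface {
	Set(key, value string) error
	Get(key string) (string, error)
	KillLeader() // stop whichever node is currently the leader
	Shutdown()
}

// startCluster must be wired to your own harness; it is left nil here so
// the test skips rather than fails when run as-is.
var startCluster func(t *testing.T, nodes int) Cluster

// TestCommittedWriteSurvivesLeaderCrash automates the manual session from
// Real-World Outcome: write a key, kill the leader, read the key back.
func TestCommittedWriteSurvivesLeaderCrash(t *testing.T) {
	if startCluster == nil {
		t.Skip("connect startCluster to a real test harness first")
	}
	cluster := startCluster(t, 3)
	defer cluster.Shutdown()

	if err := cluster.Set("name", "Alice"); err != nil {
		t.Fatalf("set failed: %v", err)
	}

	cluster.KillLeader()
	time.Sleep(500 * time.Millisecond) // allow a new election to complete

	got, err := cluster.Get("name")
	if err != nil {
		t.Fatalf("get after leader crash failed: %v", err)
	}
	if got != "Alice" {
		t.Fatalf("got %q, want %q", got, "Alice")
	}
}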

References

  • Main guide: LEARN_GO_DEEP_DIVE.md
  • “Designing Data-Intensive Applications” by Martin Kleppmann