Project 4: Raft Consensus Implementation
Build a replicated log and leader election system using the Raft consensus algorithm.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 3-4 weeks |
| Main Programming Language | Go |
| Alternative Programming Languages | Rust, C++ |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | Core distributed systems infra |
| Prerequisites | Networking, concurrency, basic distributed systems |
| Key Topics | Leader election, log replication, quorums, safety |
1. Learning Objectives
By completing this project, you will:
- Implement leader election and term transitions.
- Replicate a log across multiple nodes with consistency guarantees.
- Apply committed entries to a deterministic state machine.
- Handle network partitions and node failures.
- Explain safety and liveness trade-offs in consensus.
- Build a test harness that simulates faults.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Terms, Roles, and Elections
Description / Expanded Explanation
Raft divides time into terms. In each term, at most one leader is elected. Nodes transition between follower, candidate, and leader. Elections are triggered by timeouts and require majority votes.
Definitions & Key Terms
- term -> logical epoch in Raft
- follower -> passive node that responds to requests
- candidate -> node attempting to become leader
- leader -> node that handles client requests
- election timeout -> randomized delay before starting an election
Mental Model Diagram (ASCII)
Follower --timeout--> Candidate --majority--> Leader
    ^                     |                      |
    |<---append entries---+                      |
    |<-----------append entries (heartbeats)-----+
How It Works (Step-by-Step)
- Followers wait for heartbeats from a leader.
- If timeout expires, follower becomes candidate and starts election.
- Candidate requests votes from peers.
- If majority votes, candidate becomes leader.
- Leader sends heartbeats to maintain authority.
Minimal Concrete Example
term 5: node B times out -> candidate -> gets 3/5 votes -> leader
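Below is a minimal Go sketch of this timeout-driven transition; the `node` struct, `heartbeatCh` channel, and the 150-300 ms range are illustrative assumptions, not a prescribed design, and a real node would guard this state with a mutex.
```go
package raft

import (
	"math/rand"
	"time"
)

// Role models the three Raft roles.
type Role int

const (
	Follower Role = iota
	Candidate
	Leader
)

// node is a deliberately tiny skeleton; a real node also holds the log,
// votedFor, peer RPC clients, and a mutex guarding this state.
type node struct {
	role        Role
	currentTerm int
	heartbeatCh chan struct{} // signaled whenever a valid AppendEntries arrives
}

// runElectionTimer sketches the follower loop: if no heartbeat arrives
// before the randomized timeout fires, transition to candidate.
func (n *node) runElectionTimer() {
	for n.role != Leader {
		// Randomized timeout (here 150-300 ms) so two nodes rarely time out together.
		timeout := 150*time.Millisecond + time.Duration(rand.Intn(150))*time.Millisecond
		select {
		case <-n.heartbeatCh:
			// Leader is alive; loop and restart the timer.
		case <-time.After(timeout):
			n.role = Candidate
			n.currentTerm++ // start a new term; the next step is RequestVote (see 4.4)
		}
	}
}
```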
Common Misconceptions
- “Multiple leaders can exist” -> Raft ensures at most one leader per term.
- “Timeouts must be identical” -> they should be randomized.
Check-Your-Understanding Questions
- Why is randomization required for election timeouts?
- What happens if two candidates split votes?
- Can a leader step down without a higher term?
Where You’ll Apply It
- See 3.2 for RPC requirements.
- See 4.4 for election algorithm.
- Also used in: P02 Load Balancer for failover logic
2.2 Log Replication
Description / Expanded Explanation
The leader appends client commands to its log and replicates them to followers. Followers only accept entries that match their existing log prefix, enforcing consistency.
Definitions & Key Terms
- log entry -> command with term and index
- append entries -> RPC to replicate log
- match index -> highest log index known to be replicated
- next index -> next entry a follower should receive
Mental Model Diagram (ASCII)
Leader log: [1][2][3][4]
Follower: [1][2]
AppendEntries -> send [3][4]
How It Works (Step-by-Step)
- Leader receives client command and appends to log.
- Leader sends AppendEntries RPCs to followers.
- Followers verify previous index and term.
- If match, follower appends entries and acknowledges.
Minimal Concrete Example
AppendEntries(prevIndex=2, prevTerm=1, entries=[3,4])
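A follower-side sketch of that check in Go, using illustrative names, the LogEntry type from 4.3, and 1-based indexing with a dummy entry at index 0; the currentTerm comparison is omitted, and the unconditional suffix overwrite is a simplification that assumes stale or reordered RPCs are filtered out first.
```go
// appendEntriesArgs mirrors the AppendEntries message from section 3.5.
type appendEntriesArgs struct {
	Term         int
	PrevLogIndex int        // index of the entry immediately preceding Entries
	PrevLogTerm  int        // term of that entry
	Entries      []LogEntry // LogEntry as defined in section 4.3
	LeaderCommit int
}

// consistencyCheck sketches the follower's acceptance rule: reject unless the
// local log has an entry at PrevLogIndex whose term equals PrevLogTerm.
func consistencyCheck(log []LogEntry, args appendEntriesArgs) ([]LogEntry, bool) {
	if args.PrevLogIndex >= len(log) || log[args.PrevLogIndex].Term != args.PrevLogTerm {
		return log, false // leader will back up nextIndex and retry (see 4.4)
	}
	// Drop any conflicting suffix, then append the leader's entries.
	log = append(log[:args.PrevLogIndex+1], args.Entries...)
	return log, true
}
```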
Common Misconceptions
- “Followers always accept new entries” -> they reject if log mismatch.
- “Replication equals commitment” -> entries must be committed by majority.
Check-Your-Understanding Questions
- What is the purpose of prevIndex and prevTerm?
- When does the leader retry with earlier entries?
- What happens if a follower has conflicting entries?
Where You’ll Apply It
- See 4.4 for replication steps.
- See 5.10 Phase 2 for implementation details.
- Also used in: P03 Persistent Key-Value Store
2.3 Commitment and Safety
Description / Expanded Explanation
An entry is committed when it is stored on a majority of nodes. Only committed entries are applied to the state machine. This ensures safety: no two nodes apply different commands at the same index.
Definitions & Key Terms
- commit index -> highest log index known to be committed
- majority -> quorum of nodes (floor(N/2) + 1)
- safety -> no conflicting commits
- state machine -> deterministic function applied to log entries
Mental Model Diagram (ASCII)
5 nodes: need 3 acknowledgments to commit
How It Works (Step-by-Step)
- Leader tracks replication acknowledgments.
- When majority replicated, advance commit index.
- Apply committed entries to state machine in order.
Minimal Concrete Example
Acked: 3/5 -> commit index advances
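A leader-side sketch of this rule in Go, with illustrative names: matchIndex records, per follower, the highest index known to be replicated, the leader counts itself toward the majority, and (a Raft safety detail) only entries from the current term are committed by counting replicas.
```go
// advanceCommitIndex returns the new commit index: the highest N above the
// current commitIndex that is stored on a majority and belongs to currentTerm.
func advanceCommitIndex(commitIndex, currentTerm int, log []LogEntry, matchIndex map[int]int) int {
	clusterSize := len(matchIndex) + 1 // followers plus the leader itself
	for n := len(log) - 1; n > commitIndex; n-- {
		if log[n].Term != currentTerm {
			continue // entries from older terms commit only indirectly
		}
		count := 1 // the leader always holds the entry in its own log
		for _, m := range matchIndex {
			if m >= n {
				count++
			}
		}
		if count > clusterSize/2 { // strict majority, e.g. 3 of 5
			return n
		}
	}
	return commitIndex
}
```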
Common Misconceptions
- “Leader commit equals durability” -> must be replicated.
- “Followers can commit independently” -> they follow leader commit index.
Check-Your-Understanding Questions
- Why is majority required for commit?
- Can a committed entry be lost?
- How does Raft prevent divergent logs?
Where You’ll Apply It
- See 3.3 and 3.6 for guarantees and edge cases.
- See 6.2 for tests.
- Also used in: P03 Persistent Key-Value Store
2.4 Failure Modes and Timeouts
Description / Expanded Explanation
Raft operates under partial failure and message delay. Correct timeout tuning prevents split brain and ensures progress despite failures.
Definitions & Key Terms
- heartbeat -> periodic AppendEntries with no data
- partition -> network split
- liveness -> system continues to make progress
- timeout -> duration before election starts
Mental Model Diagram (ASCII)
Leader down -> timeout -> new election -> new leader
How It Works (Step-by-Step)
- Leader fails, heartbeats stop.
- Followers time out and start elections.
- Majority partition elects a new leader.
- Minority partition cannot commit.
Minimal Concrete Example
5 nodes -> partition 3/2 -> only 3-node side makes progress
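The quorum arithmetic behind this example, as a tiny Go helper (names are illustrative):
```go
// quorum is the number of nodes needed to elect a leader or commit an entry
// in a cluster of n nodes: floor(n/2) + 1.
func quorum(n int) int { return n/2 + 1 }

// canMakeProgress reports whether a partition containing 'reachable' nodes
// (including this one) out of a cluster of n can still elect and commit.
func canMakeProgress(reachable, n int) bool { return reachable >= quorum(n) }

// For the 5-node, 3/2 split above:
//   canMakeProgress(3, 5) == true   // majority side keeps working
//   canMakeProgress(2, 5) == false  // minority side stalls until the partition heals
```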
Common Misconceptions
- “Raft prevents all downtime” -> partitions can block progress.
- “Short timeouts are best” -> too short causes instability.
Check-Your-Understanding Questions
- Why can the minority partition not elect a leader?
- How do timeouts affect leader stability?
- What happens when partition heals?
Where You’ll Apply It
- See 5.10 Phase 1 for election timeouts.
- See 7.3 for performance traps.
- Also used in: P02 Load Balancer
2.5 Deterministic State Machines
Description / Expanded Explanation
Raft guarantees that the same sequence of committed commands is applied in the same order on all nodes. If the state machine is deterministic, all nodes reach the same state.
Definitions & Key Terms
- state machine -> deterministic function applied to log entries
- determinism -> same input yields same output
- snapshot -> compacted state to reduce log size
Mental Model Diagram (ASCII)
log: [set x=1][set y=2]
state: x=1, y=2
How It Works (Step-by-Step)
- Maintain apply index separate from commit index.
- Apply entries in order to the state machine.
- Create snapshots to truncate old log entries.
Minimal Concrete Example
apply(set k=v) -> map[k]=v
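A Go sketch of the apply loop for the key-value state machine; the command format and the parseCmd helper are illustrative assumptions matching the set/del examples used in this guide.
```go
import "strings"

// kvStateMachine is the deterministic state machine: a plain map plus the
// index of the last log entry applied to it.
type kvStateMachine struct {
	data        map[string]string
	lastApplied int
}

// applyCommitted applies newly committed entries strictly in index order.
func (sm *kvStateMachine) applyCommitted(log []LogEntry, commitIndex int) {
	for sm.lastApplied < commitIndex {
		sm.lastApplied++
		op, key, value := parseCmd(log[sm.lastApplied].Command)
		switch op {
		case "set":
			sm.data[key] = value
		case "del":
			delete(sm.data, key)
		}
	}
}

// parseCmd splits commands like "set x=1" or "del x" (illustrative format).
func parseCmd(cmd string) (op, key, value string) {
	fields := strings.Fields(cmd)
	if kv := strings.SplitN(fields[1], "=", 2); len(kv) == 2 {
		return fields[0], kv[0], kv[1]
	}
	return fields[0], fields[1], ""
}
```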
Common Misconceptions
- “State machine can be non-deterministic” -> then replicas diverge.
- “Snapshots are optional” -> long logs become too large.
Check-Your-Understanding Questions
- Why must application be in order?
- What state is stored in snapshots?
- How do snapshots interact with log replication?
Where You’ll Apply It
- See 4.2 for component responsibilities.
- See 5.10 Phase 3 for snapshots.
- Also used in: P03 Persistent Key-Value Store
3. Project Specification
3.1 What You Will Build
A Raft cluster that supports leader election, log replication, and client commands applied to a simple key-value state machine. It exposes a CLI to start nodes and submit commands.
3.2 Functional Requirements
- Leader election with randomized timeouts.
- AppendEntries RPC for replication and heartbeats.
- RequestVote RPC for elections.
- Commitment when majority acknowledges.
- State machine applies committed entries.
- Snapshotting to truncate log.
3.3 Non-Functional Requirements
- Reliability: tolerate f failures in a 2f+1 cluster.
- Performance: handle 100 ops/sec in local cluster.
- Usability: clear logs and deterministic tests.
3.4 Example Usage / Output
$ raft-node --id 1 --peers 2,3
$ raft-client put x 1
OK (leader=1)
3.5 Data Formats / Schemas / Protocols
AppendEntries RPC:
```json
{
  "term": 5,
  "leader_id": 1,
  "prev_log_index": 10,
  "prev_log_term": 4,
  "entries": ["set x=1"],
  "leader_commit": 9
}
```
RequestVote RPC:
```json
{"term":5,"candidate_id":2,"last_log_index":10,"last_log_term":4}
```
Error JSON:
```json
{"error":"not_leader","leader":"node1"}
```
3.6 Edge Cases
- Leader crashes after replicating but before commit.
- Partition splits cluster; minority cannot progress.
- Log divergence after leader change.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make
./raft-node --id 1 --peers 2,3 &
./raft-node --id 2 --peers 1,3 &
./raft-node --id 3 --peers 1,2 &
./raft-client put x 1
3.7.2 Golden Path Demo (Deterministic)
Use fixed random seed for timeout jitter and a scripted client workload.
3.7.3 CLI Transcript (Success)
$ ./raft-client put x 1
OK leader=1 index=5
$ ./raft-client get x
1
$ echo $?
0
3.7.4 CLI Transcript (Failure)
$ ./raft-client put x 2 --target 2
{"error":"not_leader","leader":"node1"}
$ echo $?
3
3.7.5 Exit Codes
| Exit Code | Meaning |
|---|---|
| 0 | success |
| 2 | RPC timeout |
| 3 | not leader |
| 4 | log inconsistency |
4. Solution Architecture
4.1 High-Level Design
client --> leader --AppendEntries--> followers
              |                          |
         commit index               commit index
              |                          |
        state machine              state machine
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| RPC layer | send/receive messages | JSON or binary protocol |
| Raft state | term, role, log | in-memory + persisted |
| Replicator | append entries | per-follower nextIndex |
| State machine | apply committed entries | deterministic map |
4.3 Data Structures (No Full Code)
```go
type LogEntry struct {
	Term    int
	Command string
}
```
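A companion sketch of the rest of the per-node state, mirroring the state summarized in the Raft paper; the names are illustrative.
```go
// raftState sketches the per-node bookkeeping beyond the log entries.
type raftState struct {
	// Persistent on every node: written to stable storage before answering RPCs.
	currentTerm int
	votedFor    int // candidate ID voted for in currentTerm, or -1 if none
	log         []LogEntry

	// Volatile on every node.
	commitIndex int // highest index known to be committed
	lastApplied int // highest index applied to the state machine

	// Volatile on the leader only, reset after every election win.
	nextIndex  map[int]int // per follower: next log index to send
	matchIndex map[int]int // per follower: highest index known to be replicated
}
```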
4.4 Algorithm Overview
Election: on timeout, a follower becomes a candidate, increments its term, requests votes, and becomes leader if it wins a majority. Replication: the leader appends client commands, replicates them to followers via AppendEntries, and advances the commit index once a majority acknowledges.
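One step worth sketching here, because it answers the 2.2 question about retrying with earlier entries: when a follower rejects AppendEntries due to a log mismatch, the leader decrements that follower's nextIndex and retries (illustrative names, reusing the raftState sketch from 4.3; real implementations often use conflict hints to back up faster).
```go
// onAppendEntriesReply sketches how the leader reacts to a follower's reply
// after sending entries up to index sentUpTo.
func (s *raftState) onAppendEntriesReply(follower, sentUpTo int, success bool) {
	if success {
		s.matchIndex[follower] = sentUpTo // follower now matches us up to here
		s.nextIndex[follower] = sentUpTo + 1
		return
	}
	if s.nextIndex[follower] > 1 {
		s.nextIndex[follower]-- // back up one entry and retry
	}
}
```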
Complexity Analysis
- Time: O(n) per append to replicate to n followers
- Space: O(log size) per node
5. Implementation Guide
5.1 Development Environment Setup
make
5.2 Project Structure
project-root/
├── cmd/raft-node/
├── internal/raft/
├── internal/rpc/
└── tests/
5.3 The Core Question You’re Answering
“How can multiple machines agree on the same sequence of operations even when some fail?”
5.4 Concepts You Must Understand First
- Leader election and quorum
- Log replication rules
- Deterministic state machines
- Failure detection and timeouts
5.5 Questions to Guide Your Design
- How will you persist term and log on disk? (a sketch follows this list)
- How will you avoid split brain?
- How do you choose timeout ranges?
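For the persistence question above, a minimal sketch that snapshots the whole persistent state to one JSON file; names are illustrative, and a production node would use an append-only WAL with fsync rather than rewriting the file on every change.
```go
import (
	"encoding/json"
	"os"
)

// persistentState is what must survive a restart (see the state sketch in 4.3).
type persistentState struct {
	CurrentTerm int        `json:"current_term"`
	VotedFor    int        `json:"voted_for"`
	Log         []LogEntry `json:"log"`
}

// persist writes to a temp file and renames it into place so a crash mid-write
// never leaves a torn file. Call it before replying to any RPC that changed
// the term, the vote, or the log.
func persist(path string, st persistentState) error {
	data, err := json.Marshal(st)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}
```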
5.6 Thinking Exercise
Simulate a 5-node cluster with a partition 3/2. Which side can commit entries?
5.7 The Interview Questions They’ll Ask
- Explain why Raft uses a leader.
- How does Raft ensure safety after a leader crash?
- What is the difference between Raft and Paxos?
5.8 Hints in Layers
- Hint 1: Implement a single-node log first.
- Hint 2: Add elections without replication.
- Hint 3: Add replication and commit index.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Consensus | Designing Data-Intensive Applications | Ch. 9 |
| Raft paper | In Search of an Understandable Consensus Algorithm | Sections 3-5 |
5.10 Implementation Phases
Phase 1: Foundation (7-10 days)
Leader election and RPCs.
Phase 2: Core Functionality (10-14 days)
Log replication and commitment.
Phase 3: Polish and Edge Cases (7-10 days)
Snapshots and log compaction.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| RPC format | JSON, protobuf | JSON | simple and readable |
| Persistence | append-only log | WAL style | durability |
| Timeouts | fixed vs randomized | randomized | avoid split votes |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | role transitions | follower -> candidate -> leader |
| Integration Tests | replication | commit after majority |
| Fault Injection | partition | drop messages |
6.2 Critical Test Cases
- Leader crash during replication -> new leader commits safely.
- Partition heals -> logs converge.
- Multiple elections -> only one leader per term.
6.3 Test Data
commands: set x=1, set y=2, del x
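As a shape for the critical cases in 6.2, here is what one deterministic partition test might look like; every method on the cluster value belongs to a hypothetical in-process harness you would build under tests/, and the assertions are the point.
```go
import "testing"

// TestPartitionHealConvergence sketches the "partition heals -> logs converge"
// case from 6.2, using a hypothetical test-cluster harness.
func TestPartitionHealConvergence(t *testing.T) {
	c := newTestCluster(t, 5 /* nodes */, 42 /* fixed seed for timeout jitter */)
	defer c.Shutdown()

	c.MustPut("x", "1")                      // committed by the full cluster
	c.Partition([]int{1, 2, 3}, []int{4, 5}) // 3/2 split
	c.MustPut("y", "2")                      // only the majority side can commit
	c.Heal()
	c.WaitForConvergence()

	for _, n := range c.Nodes() {
		if got := n.Get("y"); got != "2" {
			t.Fatalf("node %d has y=%q, want %q", n.ID(), got, "2")
		}
	}
}
```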
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| No persistence | split brain after restart | persist term and log |
| Short timeouts | constant elections | increase timeout range |
| Apply uncommitted | diverging state | apply only committed entries |
7.2 Debugging Strategies
- Add verbose logs for role changes.
- Run deterministic simulation with fixed seed.
7.3 Performance Traps
- Sending full log on every append.
- Heartbeat frequency set too high, flooding followers with redundant RPCs.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add basic RPC tracing.
- Add CLI for cluster status.
8.2 Intermediate Extensions
- Snapshotting and log truncation.
- Leader transfer.
8.3 Advanced Extensions
- Dynamic membership changes.
- Persistent storage with batching.
9. Real-World Connections
9.1 Industry Applications
- etcd, Consul, and many distributed KV stores.
9.2 Related Open Source Projects
- etcd
- HashiCorp Raft
9.3 Interview Relevance
- Central to distributed systems interviews.
10. Resources
10.1 Essential Reading
- Raft paper: In Search of an Understandable Consensus Algorithm
- Designing Data-Intensive Applications (Chapter 9)
10.2 Video Resources
- Raft visual explanations and lectures
10.3 Tools & Documentation
- `tc` (Linux traffic control) for network simulation
10.4 Related Projects in This Series
- P02 Load Balancer
- P03 Persistent Key-Value Store
11. Self-Assessment Checklist
11.1 Understanding
- I can explain leader election and term transitions.
- I can explain commit rules and safety.
- I can describe failure handling.
11.2 Implementation
- Cluster elects a leader reliably.
- Logs are consistent across nodes.
- State machine outputs match on all nodes.
11.3 Growth
- I can compare Raft with Paxos at a high level.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Leader election works with 3 nodes.
- AppendEntries replicates log entries.
Full Completion:
- Commitment and state machine application implemented.
- Snapshots supported.
Excellence (Going Above & Beyond):
- Dynamic membership changes.
- Fault injection test harness.