Project 4: Raft Consensus Implementation
Build a replicated log and leader election system using the Raft consensus algorithm.
Quick Reference
| Attribute | Value |
|---|---|
| Difficulty | Advanced |
| Time Estimate | 3-4 weeks |
| Main Programming Language | Go |
| Alternative Programming Languages | Rust, C++ |
| Coolness Level | Level 4: Hardcore Tech Flex |
| Business Potential | Core distributed systems infra |
| Prerequisites | Networking, concurrency, basic distributed systems |
| Key Topics | Leader election, log replication, quorums, safety |
1. Learning Objectives
By completing this project, you will:
- Implement leader election and term transitions.
- Replicate a log across multiple nodes with consistency guarantees.
- Apply committed entries to a deterministic state machine.
- Handle network partitions and node failures.
- Explain safety and liveness trade-offs in consensus.
- Build a test harness that simulates faults.
2. All Theory Needed (Per-Concept Breakdown)
2.1 Terms, Roles, and Elections
Description / Expanded Explanation
Raft divides time into terms. In each term, at most one leader is elected. Nodes transition between follower, candidate, and leader. Elections are triggered by timeouts and require majority votes.
Definitions & Key Terms
- term -> logical epoch in Raft
- follower -> passive node that responds to requests
- candidate -> node attempting to become leader
- leader -> node that handles client requests
- election timeout -> randomized delay before starting an election
Mental Model Diagram (ASCII)
Follower --timeout--> Candidate --majority--> Leader
    ^                     |                      |
    |<---append entries---+                      |
    |<-----------append entries (heartbeats)-----+
How It Works (Step-by-Step)
- Followers wait for heartbeats from a leader.
- If timeout expires, follower becomes candidate and starts election.
- Candidate requests votes from peers.
- If majority votes, candidate becomes leader.
- Leader sends heartbeats to maintain authority.
Minimal Concrete Example
term 5: node B times out -> candidate -> gets 3/5 votes -> leader
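Below is a minimal Go sketch of this timeout-driven transition; the `node` struct, `heartbeatCh` channel, and the 150-300 ms range are illustrative assumptions, not a prescribed design, and a real node would guard this state with a mutex.
```go
package raft

import (
	"math/rand"
	"time"
)

// Role models the three Raft roles.
type Role int

const (
	Follower Role = iota
	Candidate
	Leader
)

// node is a deliberately tiny skeleton; a real node also holds the log,
// votedFor, peer RPC clients, and a mutex guarding this state.
type node struct {
	role        Role
	currentTerm int
	heartbeatCh chan struct{} // signaled whenever a valid AppendEntries arrives
}

// runElectionTimer sketches the follower loop: if no heartbeat arrives
// before the randomized timeout fires, transition to candidate.
func (n *node) runElectionTimer() {
	for n.role != Leader {
		// Randomized timeout (here 150-300 ms) so two nodes rarely time out together.
		timeout := 150*time.Millisecond + time.Duration(rand.Intn(150))*time.Millisecond
		select {
		case <-n.heartbeatCh:
			// Leader is alive; loop and restart the timer.
		case <-time.After(timeout):
			n.role = Candidate
			n.currentTerm++ // start a new term; the next step is RequestVote (see 4.4)
		}
	}
}
```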
Common Misconceptions
- “Multiple leaders can exist” -> Raft ensures at most one leader per term.
- “Timeouts must be identical” -> they should be randomized.
Check-Your-Understanding Questions
- Why is randomization required for election timeouts?
- What happens if two candidates split votes?
- Can a leader step down without a higher term?
Where You’ll Apply It
- See 3.2 for RPC requirements.
- See 4.4 for election algorithm.
- Also used in: P02 Load Balancer for failover logic
2.2 Log Replication
Description / Expanded Explanation
The leader appends client commands to its log and replicates them to followers. Followers only accept entries that match their existing log prefix, enforcing consistency.
Definitions & Key Terms
- log entry -> command with term and index
- append entries -> RPC to replicate log
- match index -> highest log index known to be replicated
- next index -> next entry a follower should receive
Mental Model Diagram (ASCII)
Leader log: [1][2][3][4]
Follower: [1][2]
AppendEntries -> send [3][4]
How It Works (Step-by-Step)
- Leader receives client command and appends to log.
- Leader sends AppendEntries RPCs to followers.
- Followers verify previous index and term.
- If match, follower appends entries and acknowledges.
Minimal Concrete Example
AppendEntries(prevIndex=2, prevTerm=1, entries=[3,4])
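A follower-side sketch of that check in Go, using illustrative names, the LogEntry type from 4.3, and 1-based indexing with a dummy entry at index 0; the currentTerm comparison is omitted, and the unconditional suffix overwrite is a simplification that assumes stale or reordered RPCs are filtered out first.
```go
// appendEntriesArgs mirrors the AppendEntries message from section 3.5.
type appendEntriesArgs struct {
	Term         int
	PrevLogIndex int        // index of the entry immediately preceding Entries
	PrevLogTerm  int        // term of that entry
	Entries      []LogEntry // LogEntry as defined in section 4.3
	LeaderCommit int
}

// consistencyCheck sketches the follower's acceptance rule: reject unless the
// local log has an entry at PrevLogIndex whose term equals PrevLogTerm.
func consistencyCheck(log []LogEntry, args appendEntriesArgs) ([]LogEntry, bool) {
	if args.PrevLogIndex >= len(log) || log[args.PrevLogIndex].Term != args.PrevLogTerm {
		return log, false // leader will back up nextIndex and retry (see 4.4)
	}
	// Drop any conflicting suffix, then append the leader's entries.
	log = append(log[:args.PrevLogIndex+1], args.Entries...)
	return log, true
}
```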
Common Misconceptions
- “Followers always accept new entries” -> they reject if log mismatch.
- “Replication equals commitment” -> entries must be committed by majority.
Check-Your-Understanding Questions
- What is the purpose of prevIndex and prevTerm?
- When does the leader retry with earlier entries?
- What happens if a follower has conflicting entries?
Where You’ll Apply It
- See 4.4 for replication steps.
- See 5.10 Phase 2 for implementation details.
- Also used in: P03 Persistent Key-Value Store
2.3 Commitment and Safety
Description / Expanded Explanation
An entry is committed when it is stored on a majority of nodes. Only committed entries are applied to the state machine. This ensures safety: no two nodes apply different commands at the same index.
Definitions & Key Terms
- commit index -> highest log index known to be committed
- majority -> quorum of nodes (floor(N/2) + 1)
- safety -> no conflicting commits
- state machine -> deterministic function applied to log entries
Mental Model Diagram (ASCII)
5 nodes: need 3 acknowledgments to commit
How It Works (Step-by-Step)
- Leader tracks replication acknowledgments.
- When majority replicated, advance commit index.
- Apply committed entries to state machine in order.
Minimal Concrete Example
Acked: 3/5 -> commit index advances
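A leader-side sketch of this rule in Go, with illustrative names: matchIndex records, per follower, the highest index known to be replicated, the leader counts itself toward the majority, and (a Raft safety detail) only entries from the current term are committed by counting replicas.
```go
// advanceCommitIndex returns the new commit index: the highest N above the
// current commitIndex that is stored on a majority and belongs to currentTerm.
func advanceCommitIndex(commitIndex, currentTerm int, log []LogEntry, matchIndex map[int]int) int {
	clusterSize := len(matchIndex) + 1 // followers plus the leader itself
	for n := len(log) - 1; n > commitIndex; n-- {
		if log[n].Term != currentTerm {
			continue // entries from older terms commit only indirectly
		}
		count := 1 // the leader always holds the entry in its own log
		for _, m := range matchIndex {
			if m >= n {
				count++
			}
		}
		if count > clusterSize/2 { // strict majority, e.g. 3 of 5
			return n
		}
	}
	return commitIndex
}
```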
Common Misconceptions
- “Leader commit equals durability” -> must be replicated.
- “Followers can commit independently” -> they follow leader commit index.
Check-Your-Understanding Questions
- Why is majority required for commit?
- Can a committed entry be lost?
- How does Raft prevent divergent logs?
Where You’ll Apply It
- See 3.3 and 3.6 for guarantees and edge cases.
- See 6.2 for tests.
- Also used in: P03 Persistent Key-Value Store
2.4 Failure Modes and Timeouts
Description / Expanded Explanation
Raft operates under partial failure and message delay. Correct timeout tuning prevents split brain and ensures progress despite failures.
Definitions & Key Terms
- heartbeat -> periodic AppendEntries with no data
- partition -> network split
- liveness -> system continues to make progress
- timeout -> duration before election starts
Mental Model Diagram (ASCII)
Leader down -> timeout -> new election -> new leader
How It Works (Step-by-Step)
- Leader fails, heartbeats stop.
- Followers time out and start elections.
- Majority partition elects a new leader.
- Minority partition cannot commit.
Minimal Concrete Example
5 nodes -> partition 3/2 -> only 3-node side makes progress
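The quorum arithmetic behind this example, as a tiny Go helper (names are illustrative):
```go
// quorum is the number of nodes needed to elect a leader or commit an entry
// in a cluster of n nodes: floor(n/2) + 1.
func quorum(n int) int { return n/2 + 1 }

// canMakeProgress reports whether a partition containing 'reachable' nodes
// (including this one) out of a cluster of n can still elect and commit.
func canMakeProgress(reachable, n int) bool { return reachable >= quorum(n) }

// For the 5-node, 3/2 split above:
//   canMakeProgress(3, 5) == true   // majority side keeps working
//   canMakeProgress(2, 5) == false  // minority side stalls until the partition heals
```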
Common Misconceptions
- “Raft prevents all downtime” -> partitions can block progress.
- “Short timeouts are best” -> too short causes instability.
Check-Your-Understanding Questions
- Why can the minority partition not elect a leader?
- How do timeouts affect leader stability?
- What happens when partition heals?
Where You’ll Apply It
- See 5.10 Phase 1 for election timeouts.
- See 7.3 for performance traps.
- Also used in: P02 Load Balancer
2.5 Deterministic State Machines
Description / Expanded Explanation
Raft guarantees that the same sequence of committed commands is applied in the same order on all nodes. If the state machine is deterministic, all nodes reach the same state.
Definitions & Key Terms
- state machine -> deterministic function applied to log entries
- determinism -> same input yields same output
- snapshot -> compacted state to reduce log size
Mental Model Diagram (ASCII)
log: [set x=1][set y=2]
state: x=1, y=2
How It Works (Step-by-Step)
- Maintain apply index separate from commit index.
- Apply entries in order to the state machine.
- Create snapshots to truncate old log entries.
Minimal Concrete Example
apply(set k=v) -> map[k]=v
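A Go sketch of the apply loop for the key-value state machine; the command format and the parseCmd helper are illustrative assumptions matching the set/del examples used in this guide.
```go
import "strings"

// kvStateMachine is the deterministic state machine: a plain map plus the
// index of the last log entry applied to it.
type kvStateMachine struct {
	data        map[string]string
	lastApplied int
}

// applyCommitted applies newly committed entries strictly in index order.
func (sm *kvStateMachine) applyCommitted(log []LogEntry, commitIndex int) {
	for sm.lastApplied < commitIndex {
		sm.lastApplied++
		op, key, value := parseCmd(log[sm.lastApplied].Command)
		switch op {
		case "set":
			sm.data[key] = value
		case "del":
			delete(sm.data, key)
		}
	}
}

// parseCmd splits commands like "set x=1" or "del x" (illustrative format).
func parseCmd(cmd string) (op, key, value string) {
	fields := strings.Fields(cmd)
	if kv := strings.SplitN(fields[1], "=", 2); len(kv) == 2 {
		return fields[0], kv[0], kv[1]
	}
	return fields[0], fields[1], ""
}
```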
Common Misconceptions
- “State machine can be non-deterministic” -> then replicas diverge.
- “Snapshots are optional” -> long logs become too large.
Check-Your-Understanding Questions
- Why must application be in order?
- What state is stored in snapshots?
- How do snapshots interact with log replication?
Where You’ll Apply It
- See 4.2 for component responsibilities.
- See 5.10 Phase 3 for snapshots.
- Also used in: P03 Persistent Key-Value Store
3. Project Specification
3.1 What You Will Build
A Raft cluster that supports leader election, log replication, and client commands applied to a simple key-value state machine. It exposes a CLI to start nodes and submit commands.
3.2 Functional Requirements
- Leader election with randomized timeouts.
- AppendEntries RPC for replication and heartbeats.
- RequestVote RPC for elections.
- Commitment when majority acknowledges.
- State machine applies committed entries.
- Snapshotting to truncate log.
3.3 Non-Functional Requirements
- Reliability: tolerate f failures in a 2f+1 cluster.
- Performance: handle 100 ops/sec in local cluster.
- Usability: clear logs and deterministic tests.
3.4 Example Usage / Output
$ raft-node --id 1 --peers 2,3
$ raft-client put x 1
OK (leader=1)
3.5 Data Formats / Schemas / Protocols
AppendEntries RPC:
```json
{
  "term": 5,
  "leader_id": 1,
  "prev_log_index": 10,
  "prev_log_term": 4,
  "entries": ["set x=1"],
  "leader_commit": 9
}
```
RequestVote RPC:
```json
{"term":5,"candidate_id":2,"last_log_index":10,"last_log_term":4}
```
Error JSON:
```json
{"error":"not_leader","leader":"node1"}
```
3.6 Edge Cases
- Leader crashes after replicating but before commit.
- Partition splits cluster; minority cannot progress.
- Log divergence after leader change.
3.7 Real World Outcome
3.7.1 How to Run (Copy/Paste)
make
./raft-node --id 1 --peers 2,3 &
./raft-node --id 2 --peers 1,3 &
./raft-node --id 3 --peers 1,2 &
./raft-client put x 1
3.7.2 Golden Path Demo (Deterministic)
Use fixed random seed for timeout jitter and a scripted client workload.
3.7.3 CLI Transcript (Success)
$ ./raft-client put x 1
OK leader=1 index=5
$ ./raft-client get x
1
$ echo $?
0
3.7.4 CLI Transcript (Failure)
$ ./raft-client put x 2 --target 2
{"error":"not_leader","leader":"node1"}
$ echo $?
3
3.7.5 Exit Codes
| Exit Code | Meaning |
|---|---|
| 0 | success |
| 2 | RPC timeout |
| 3 | not leader |
| 4 | log inconsistency |
4. Solution Architecture
4.1 High-Level Design
client --> leader --AppendEntries--> followers
              |                          |
         commit index               commit index
              |                          |
        state machine              state machine
4.2 Key Components
| Component | Responsibility | Key Decisions |
|---|---|---|
| RPC layer | send/receive messages | JSON or binary protocol |
| Raft state | term, role, log | in-memory + persisted |
| Replicator | append entries | per-follower nextIndex |
| State machine | apply committed entries | deterministic map |
4.3 Data Structures (No Full Code)
```go
type LogEntry struct {
	Term    int
	Command string
}
```
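A companion sketch of the rest of the per-node state, mirroring the state summarized in the Raft paper; the names are illustrative.
```go
// raftState sketches the per-node bookkeeping beyond the log entries.
type raftState struct {
	// Persistent on every node: written to stable storage before answering RPCs.
	currentTerm int
	votedFor    int // candidate ID voted for in currentTerm, or -1 if none
	log         []LogEntry

	// Volatile on every node.
	commitIndex int // highest index known to be committed
	lastApplied int // highest index applied to the state machine

	// Volatile on the leader only, reset after every election win.
	nextIndex  map[int]int // per follower: next log index to send
	matchIndex map[int]int // per follower: highest index known to be replicated
}
```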
4.4 Algorithm Overview
Election: on timeout, a follower becomes a candidate, increments its term, requests votes, and becomes leader if it wins a majority. Replication: the leader appends client commands, replicates them to followers via AppendEntries, and advances the commit index once a majority acknowledges.
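One step worth sketching here, because it answers the 2.2 question about retrying with earlier entries: when a follower rejects AppendEntries due to a log mismatch, the leader decrements that follower's nextIndex and retries (illustrative names, reusing the raftState sketch from 4.3; real implementations often use conflict hints to back up faster).
```go
// onAppendEntriesReply sketches how the leader reacts to a follower's reply
// after sending entries up to index sentUpTo.
func (s *raftState) onAppendEntriesReply(follower, sentUpTo int, success bool) {
	if success {
		s.matchIndex[follower] = sentUpTo // follower now matches us up to here
		s.nextIndex[follower] = sentUpTo + 1
		return
	}
	if s.nextIndex[follower] > 1 {
		s.nextIndex[follower]-- // back up one entry and retry
	}
}
```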
Complexity Analysis
- Time: O(n) per append to replicate to n followers
- Space: O(log size) per node
5. Implementation Guide
5.1 Development Environment Setup
make
5.2 Project Structure
project-root/
├── cmd/raft-node/
├── internal/raft/
├── internal/rpc/
└── tests/
5.3 The Core Question You’re Answering
“How can multiple machines agree on the same sequence of operations even when some fail?”
5.4 Concepts You Must Understand First
- Leader election and quorum
- Log replication rules
- Deterministic state machines
- Failure detection and timeouts
5.5 Questions to Guide Your Design
- How will you persist term and log on disk? (a sketch follows this list)
- How will you avoid split brain?
- How do you choose timeout ranges?
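For the persistence question above, a minimal sketch that snapshots the whole persistent state to one JSON file; names are illustrative, and a production node would use an append-only WAL with fsync rather than rewriting the file on every change.
```go
import (
	"encoding/json"
	"os"
)

// persistentState is what must survive a restart (see the state sketch in 4.3).
type persistentState struct {
	CurrentTerm int        `json:"current_term"`
	VotedFor    int        `json:"voted_for"`
	Log         []LogEntry `json:"log"`
}

// persist writes to a temp file and renames it into place so a crash mid-write
// never leaves a torn file. Call it before replying to any RPC that changed
// the term, the vote, or the log.
func persist(path string, st persistentState) error {
	data, err := json.Marshal(st)
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}
```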
5.6 Thinking Exercise
Simulate a 5-node cluster with a partition 3/2. Which side can commit entries?
5.7 The Interview Questions They’ll Ask
- Explain why Raft uses a leader.
- How does Raft ensure safety after a leader crash?
- What is the difference between Raft and Paxos?
5.8 Hints in Layers
- Hint 1: Implement a single-node log first.
- Hint 2: Add elections without replication.
- Hint 3: Add replication and commit index.
5.9 Books That Will Help
| Topic | Book | Chapter |
|---|---|---|
| Consensus | Designing Data-Intensive Applications | Ch. 9 |
| Raft paper | In Search of an Understandable Consensus Algorithm | Sections 3-5 |
5.10 Implementation Phases
Phase 1: Foundation (7-10 days)
Leader election and RPCs.
Phase 2: Core Functionality (10-14 days)
Log replication and commitment.
Phase 3: Polish and Edge Cases (7-10 days)
Snapshots and log compaction.
5.11 Key Implementation Decisions
| Decision | Options | Recommendation | Rationale |
|---|---|---|---|
| RPC format | JSON, protobuf | JSON | simple and readable |
| Persistence | append-only log | WAL style | durability |
| Timeouts | fixed vs randomized | randomized | avoid split votes |
6. Testing Strategy
6.1 Test Categories
| Category | Purpose | Examples |
|---|---|---|
| Unit Tests | role transitions | follower -> candidate -> leader |
| Integration Tests | replication | commit after majority |
| Fault Injection | partition | drop messages |
6.2 Critical Test Cases
- Leader crash during replication -> new leader commits safely.
- Partition heals -> logs converge.
- Multiple elections -> only one leader per term.
6.3 Test Data
commands: set x=1, set y=2, del x
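As a shape for the critical cases in 6.2, here is what one deterministic partition test might look like; every method on the cluster value belongs to a hypothetical in-process harness you would build under tests/, and the assertions are the point.
```go
import "testing"

// TestPartitionHealConvergence sketches the "partition heals -> logs converge"
// case from 6.2, using a hypothetical test-cluster harness.
func TestPartitionHealConvergence(t *testing.T) {
	c := newTestCluster(t, 5 /* nodes */, 42 /* fixed seed for timeout jitter */)
	defer c.Shutdown()

	c.MustPut("x", "1")                      // committed by the full cluster
	c.Partition([]int{1, 2, 3}, []int{4, 5}) // 3/2 split
	c.MustPut("y", "2")                      // only the majority side can commit
	c.Heal()
	c.WaitForConvergence()

	for _, n := range c.Nodes() {
		if got := n.Get("y"); got != "2" {
			t.Fatalf("node %d has y=%q, want %q", n.ID(), got, "2")
		}
	}
}
```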
7. Common Pitfalls & Debugging
7.1 Frequent Mistakes
| Pitfall | Symptom | Solution |
|---|---|---|
| No persistence | split brain after restart | persist term and log |
| Short timeouts | constant elections | increase timeout range |
| Apply uncommitted | diverging state | apply only committed entries |
7.2 Debugging Strategies
- Add verbose logs for role changes.
- Run deterministic simulation with fixed seed.
7.3 Performance Traps
- Sending full log on every append.
- Heartbeat frequency set too high, flooding followers with redundant RPCs.
8. Extensions & Challenges
8.1 Beginner Extensions
- Add basic RPC tracing.
- Add CLI for cluster status.
8.2 Intermediate Extensions
- Snapshotting and log truncation.
- Leader transfer.
8.3 Advanced Extensions
- Dynamic membership changes.
- Persistent storage with batching.
9. Real-World Connections
9.1 Industry Applications
- etcd, Consul, and many distributed KV stores.
9.2 Related Open Source Projects
- etcd
- HashiCorp Raft
9.3 Interview Relevance
- Central to distributed systems interviews.
10. Resources
10.1 Essential Reading
- Raft paper: In Search of an Understandable Consensus Algorithm
- Designing Data-Intensive Applications (Chapter 9)
10.2 Video Resources
- Raft visual explanations and lectures
10.3 Tools & Documentation
- `tc` (Linux traffic control) for network simulation
10.4 Related Projects in This Series
- P02 Load Balancer
- P03 Persistent Key-Value Store
11. Self-Assessment Checklist
11.1 Understanding
- I can explain leader election and term transitions.
- I can explain commit rules and safety.
- I can describe failure handling.
11.2 Implementation
- Cluster elects a leader reliably.
- Logs are consistent across nodes.
- State machine outputs match on all nodes.
11.3 Growth
- I can compare Raft with Paxos at a high level.
12. Submission / Completion Criteria
Minimum Viable Completion:
- Leader election works with 3 nodes.
- AppendEntries replicates log entries.
Full Completion:
- Commitment and state machine application implemented.
- Snapshots supported.
Excellence (Going Above & Beyond):
- Dynamic membership changes.
- Fault injection test harness.