Design a consensus protocol (like Raft or Paxos) that enables a cluster of servers to agree on a sequence of commands despite server crashes and network partitions. The protocol implements a replicated state machine: a single leader handles all client requests, replicates log entries to followers via AppendEntries RPCs, commits entries once a majority acknowledges them, and guarantees that committed entries are never lost. Leader election uses randomised timeouts and majority voting, ensuring at most one leader per term. The system tolerates F failures in a 2F+1-node cluster while maintaining linearizable consistency.
| Metric | Value |
|---|---|
| Cluster size | 3, 5, or 7 nodes (odd) |
| Fault tolerance | F failures with 2F+1 nodes |
| Heartbeat interval | 100–150ms |
| Election timeout | 150–300ms (randomised) |
| Leader election time | < 5 seconds (typically < 1 second) |
| Write latency (within DC) | 1–5ms (majority round-trip) |
| Write latency (cross-DC) | 50–200ms |
| Throughput | 10,000–100,000 ops/sec |
| Log entry size | ~100 bytes (typical KV command) |
| Snapshot interval | Every 10,000–100,000 entries |
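As a rough sketch of how the timing rows above translate into code, the constants and helper below mirror the heartbeat interval and the randomised election timeout; the names and exact values are illustrative, not taken from any particular implementation:

```go
// Sketch of timer configuration derived from the table above; constant and
// helper names are illustrative.
package raft

import (
	"math/rand"
	"time"
)

const (
	HeartbeatInterval  = 100 * time.Millisecond // leader-to-follower heartbeat cadence (table: 100-150ms)
	ElectionTimeoutMin = 150 * time.Millisecond // lower bound before a follower starts an election
	ElectionTimeoutMax = 300 * time.Millisecond // upper bound; randomisation avoids split votes
)

// randomElectionTimeout picks a fresh timeout in [Min, Max) so that followers
// rarely time out simultaneously, which keeps elections short.
func randomElectionTimeout() time.Duration {
	return ElectionTimeoutMin +
		time.Duration(rand.Int63n(int64(ElectionTimeoutMax-ElectionTimeoutMin)))
}
```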
Replicated state machine: maintain an identical log of commands across a cluster of N servers (typically 3, 5, or 7); each server applies log entries to its state machine in the same order; all servers converge to the same state; clients interact with any server (or the leader) and see a consistent view
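A minimal sketch of the apply side of the replicated state machine, assuming a key-value store as the state machine; the LogEntry and StateMachine types and the applyCommitted helper are illustrative names, not part of any specific library:

```go
// Apply-loop sketch: every server applies committed entries in log order, so
// all replicas converge to the same key-value state.
package raft

type LogEntry struct {
	Term    int    // term in which the leader created the entry
	Index   int    // position in the replicated log (1-based)
	Command []byte // opaque command, e.g. an encoded "SET key value"
}

type StateMachine struct {
	log         []LogEntry
	lastApplied int // highest log index applied to the state machine
	commitIndex int // highest log index known to be committed
	kv          map[string]string
}

// applyCommitted applies every committed-but-unapplied entry, in order.
func (sm *StateMachine) applyCommitted(apply func(cmd []byte, kv map[string]string)) {
	for sm.lastApplied < sm.commitIndex {
		sm.lastApplied++
		entry := sm.log[sm.lastApplied-1] // log indexes are 1-based in this sketch
		apply(entry.Command, sm.kv)
	}
}
```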
Leader election: the cluster elects a single leader, and at most one leader can exist per term; the leader handles all client requests and replicates log entries to followers; if the leader crashes, followers detect the failure via missed heartbeats and elect a new leader within a bounded time (typically < 5 seconds)
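A sketch of the candidate side of an election, assuming Raft-style terms and majority voting; the RPC transport is abstracted behind a sendRequestVote function, and all names here are illustrative:

```go
// Election sketch: a follower whose election timeout fires becomes a candidate,
// increments its term, votes for itself, and wins if a strict majority of the
// cluster grants its vote.
package raft

type RequestVoteArgs struct {
	Term         int // candidate's term
	CandidateID  int
	LastLogIndex int // used by voters for the up-to-date check (see safety below)
	LastLogTerm  int
}

type RequestVoteReply struct {
	Term        int
	VoteGranted bool
}

type Node struct {
	id              int
	currentTerm     int
	votedFor        int // -1 means no vote granted this term
	peers           []int
	sendRequestVote func(peer int, args RequestVoteArgs) RequestVoteReply
}

// startElection is a simplified, sequential version; real implementations send
// the RPCs in parallel and also attach the candidate's last log index and term.
func (n *Node) startElection() (wonElection bool) {
	n.currentTerm++
	n.votedFor = n.id
	votes := 1 // the candidate votes for itself
	args := RequestVoteArgs{Term: n.currentTerm, CandidateID: n.id}
	for _, peer := range n.peers {
		reply := n.sendRequestVote(peer, args)
		if reply.Term > n.currentTerm {
			n.currentTerm = reply.Term // a higher term exists: step down to follower
			return false
		}
		if reply.VoteGranted {
			votes++
		}
	}
	return votes > (len(n.peers)+1)/2 // strict majority of the full cluster
}
```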
Log replication: the leader appends client commands to its log → replicates each log entry to all followers → once a majority (quorum) of servers have persisted the entry, it is committed → the leader notifies the client of success; committed entries are never lost (even if the leader crashes)
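A sketch of how a leader might decide an entry is committed, assuming Raft-style per-follower matchIndex bookkeeping; note the usual restriction that the leader only commits entries from its own current term by counting replicas:

```go
// Commit-advancement sketch: find the highest index replicated on a majority
// (followers plus the leader itself) whose entry is from the current term.
package raft

import "sort"

type Leader struct {
	currentTerm int
	logTerms    []int       // logTerms[i] = term of the log entry at index i+1
	matchIndex  map[int]int // highest log index known to be replicated on each follower
	commitIndex int
}

func (l *Leader) maybeAdvanceCommit() {
	// The leader always has its entire log; add one entry per follower.
	indexes := []int{len(l.logTerms)}
	for _, m := range l.matchIndex {
		indexes = append(indexes, m)
	}
	sort.Sort(sort.Reverse(sort.IntSlice(indexes)))
	// The value at the majority position is present on a quorum of servers.
	n := indexes[len(indexes)/2]
	if n > l.commitIndex && l.logTerms[n-1] == l.currentTerm {
		l.commitIndex = n // entries up to n may now be applied and acknowledged
	}
}
```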
Safety guarantee: a committed log entry is never overwritten or lost; every server that applies an entry at a given log index applies the same command, so no two servers ever apply different commands at the same index; the system is linearizable (behaves as if it were a single server)
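One concrete mechanism behind this guarantee, assuming a Raft-style election restriction, is the voter-side up-to-date check: a server grants its vote only to candidates whose log is at least as current as its own, so any elected leader already holds every committed entry. A sketch:

```go
// Election-restriction sketch: compare the term and index of each log's last
// entry; a later term wins, and within the same term a longer log wins.
package raft

// logIsUpToDate reports whether a candidate's log is at least as up-to-date
// as the voter's log.
func logIsUpToDate(candidateLastTerm, candidateLastIndex, myLastTerm, myLastIndex int) bool {
	if candidateLastTerm != myLastTerm {
		return candidateLastTerm > myLastTerm // later last term wins
	}
	return candidateLastIndex >= myLastIndex // same term: longer (or equal) log wins
}
```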
Fault tolerance: the system remains available as long as a majority of servers are alive and can communicate (safety is preserved even without a quorum); a cluster of 2F+1 nodes tolerates F simultaneous failures; a 3-node cluster tolerates 1 failure; a 5-node cluster tolerates 2 failures; progress requires a quorum (majority)
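The quorum arithmetic is simple enough to state directly; the helper names below are illustrative:

```go
// Quorum arithmetic: any two majorities of a 2F+1 cluster intersect, which is
// why committed entries survive leader changes and why F failures are tolerable.
package raft

// quorum returns how many servers must acknowledge an entry (or grant a vote):
// a strict majority of the cluster.
func quorum(clusterSize int) int {
	return clusterSize/2 + 1 // 3 -> 2, 5 -> 3, 7 -> 4
}

// tolerableFailures returns how many simultaneous crashes still leave a quorum.
func tolerableFailures(clusterSize int) int {
	return (clusterSize - 1) / 2 // 3 -> 1, 5 -> 2, 7 -> 3
}
```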
Cluster membership changes: add or remove servers from the cluster without downtime; the change itself is replicated as a log entry; at no point can two disjoint majorities coexist (split-brain prevention); joint consensus or single-server changes ensure safety during transitions
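A sketch of a single-server membership change, assuming the configuration itself travels through the log as an ordinary entry; the ConfigChange and Config types and their encoding are illustrative:

```go
// Membership-change sketch: changing one server at a time guarantees that the
// old and new majorities overlap, so two disjoint majorities can never form.
package raft

type ConfigChange struct {
	Type   string // "add" or "remove"; illustrative encoding
	Server int
}

type Config struct {
	Members map[int]bool
}

// applyConfigChange updates the active configuration when the change entry is
// appended to the log (in Raft-style schemes, servers use the new configuration
// as soon as it is in their log, not only once it commits).
func (c *Config) applyConfigChange(cc ConfigChange) {
	switch cc.Type {
	case "add":
		c.Members[cc.Server] = true
	case "remove":
		delete(c.Members, cc.Server)
	}
}
```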
Log compaction: the log grows without bound; periodically take a snapshot of the state machine's current state → truncate the log up to the snapshot point; slow or recovering followers catch up via snapshot transfer instead of replaying the entire log
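A sketch of the truncation step, assuming snapshot metadata records the last log index and term it covers; the Snapshot and Log types are illustrative:

```go
// Compaction sketch: drop every log entry the snapshot already covers; the
// retained metadata lets followers resume from lastIncludedIndex instead of
// replaying from index 1.
package raft

type Snapshot struct {
	LastIncludedIndex int    // index of the last entry the snapshot replaces
	LastIncludedTerm  int    // term of that entry (needed for consistency checks)
	Data              []byte // serialized state machine, e.g. the whole KV map
}

type LogEntry struct {
	Term    int
	Index   int
	Command []byte
}

type Log struct {
	firstIndex int        // log index of entries[0]
	entries    []LogEntry
}

// compact discards entries up to and including the snapshot point.
func (l *Log) compact(s Snapshot) {
	keepFrom := s.LastIncludedIndex + 1 - l.firstIndex
	if keepFrom <= 0 {
		return // snapshot is older than our first retained entry; nothing to drop
	}
	if keepFrom >= len(l.entries) {
		l.entries = nil // snapshot covers everything we currently hold
	} else {
		l.entries = append([]LogEntry(nil), l.entries[keepFrom:]...)
	}
	l.firstIndex = s.LastIncludedIndex + 1
}
```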
Client interaction: clients send commands to the leader; if a client contacts a follower, the follower redirects to the leader; the leader responds only after the command is committed; exactly-once semantics (prevent duplicate application) via client-assigned sequence numbers
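A sketch of the deduplication table behind exactly-once semantics, assuming each client tags commands with a monotonically increasing sequence number and the state machine caches the last result per client; all names are illustrative:

```go
// Exactly-once sketch: a retried command (same client, same or older sequence
// number) returns the cached result instead of being applied a second time.
package raft

type ClientCommand struct {
	ClientID string
	Seq      int64  // monotonically increasing per client
	Op       []byte // the actual command payload
}

type Session struct {
	lastSeq    int64
	lastResult []byte
}

type DedupStateMachine struct {
	sessions map[string]*Session
	apply    func(op []byte) []byte // the underlying state machine
}

// Apply runs a committed command at most once per (ClientID, Seq).
func (sm *DedupStateMachine) Apply(cmd ClientCommand) []byte {
	s, ok := sm.sessions[cmd.ClientID]
	if ok && cmd.Seq <= s.lastSeq {
		return s.lastResult // duplicate delivery: return the cached result
	}
	result := sm.apply(cmd.Op)
	sm.sessions[cmd.ClientID] = &Session{lastSeq: cmd.Seq, lastResult: result}
	return result
}
```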
Read consistency: reads must return the most recently committed value (linearizable reads); naive approach: all reads go through the leader (simple, but the leader becomes a bottleneck and must still verify its leadership to avoid serving stale data); optimised: the leader confirms it is still the leader at read time, either via a heartbeat-maintained leader lease or via the read-index protocol; lease-based follower reads for read scaling
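A sketch of the read-index flow, assuming the leader can confirm its leadership with one heartbeat round and block until the state machine has applied up to the recorded index; the function-valued fields stand in for real RPC and apply machinery and are purely illustrative:

```go
// Read-index sketch: record the commit index, prove leadership, wait for the
// state machine to catch up to that index, then serve the read locally.
package raft

import "errors"

type ReadIndexLeader struct {
	commitIndex       int
	confirmLeadership func() bool             // one heartbeat round acked by a majority
	waitForApply      func(index int)         // blocks until lastApplied >= index
	readFromState     func(key string) (string, bool)
}

// LinearizableGet observes every write committed before the read was received.
func (l *ReadIndexLeader) LinearizableGet(key string) (string, error) {
	readIndex := l.commitIndex
	if !l.confirmLeadership() {
		return "", errors.New("not leader: redirect the client")
	}
	l.waitForApply(readIndex)
	value, ok := l.readFromState(key)
	if !ok {
		return "", errors.New("key not found")
	}
	return value, nil
}
```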
Persistence and recovery: each server persists its current term, votedFor, and log entries to durable storage (disk); on crash and restart, the server recovers from persistent state; the server rejoins the cluster and catches up on missed log entries from the leader
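A sketch of the persistence boundary, assuming a simple JSON-on-disk encoding for illustration (a real implementation would use a write-ahead log with atomic rename); the key point is that the fsync must complete before the server replies to a vote or AppendEntries request:

```go
// Persistence sketch: currentTerm, votedFor, and the log are flushed to durable
// storage before any RPC reply, so a crash-and-restart cannot vote twice in a
// term or forget entries it already acknowledged.
package raft

import (
	"encoding/json"
	"os"
)

type LogEntry struct {
	Term    int
	Index   int
	Command []byte
}

type PersistentState struct {
	CurrentTerm int
	VotedFor    int
	Log         []LogEntry
}

// save writes the state and fsyncs it; only after Sync returns may the server
// acknowledge a vote or an AppendEntries request.
func save(path string, st PersistentState) error {
	data, err := json.Marshal(st)
	if err != nil {
		return err
	}
	f, err := os.Create(path) // a production system would write to a temp file and rename
	if err != nil {
		return err
	}
	defer f.Close()
	if _, err := f.Write(data); err != nil {
		return err
	}
	return f.Sync() // the durability point
}

// load recovers the state after a restart; the server then rejoins as a
// follower and catches up on missed entries from the current leader.
func load(path string) (PersistentState, error) {
	var st PersistentState
	data, err := os.ReadFile(path)
	if err != nil {
		return st, err
	}
	err = json.Unmarshal(data, &st)
	return st, err
}
```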
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system. 'Low latency search under 100ms' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 response time < 500ms', '99.9% availability', '10M DAU'. This drives architectural decisions.
Choose the 3-5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?