A consensus algorithm isn't useful if it can lose data. The entire point of replicating state across multiple servers is to ensure that committed operations survive failures. If a client receives confirmation that their write succeeded, that write must never be lost, even if servers crash, networks partition, or leaders fail.
Raft's safety guarantees are the formal statements that prove this property holds. They form an interlocking set of invariants: each property depends on others, and together they create an airtight guarantee that committed entries are never lost or overwritten.
This page examines each safety property in depth. We'll look at what each property means, why it's necessary, and how Raft's mechanisms ensure it holds, with an intuitive proof sketch for each. By the end, you'll have a complete mental model of why Raft is correct.
By the end of this page, you will understand:
• The five key safety properties that Raft guarantees
• Election Safety — at most one leader per term
• Leader Append-Only — leaders never overwrite or delete entries
• Log Matching — if entries match, all predecessors match
• Leader Completeness — elected leaders have all committed entries
• State Machine Safety — all servers apply the same commands in the same order
Raft guarantees five key safety properties. These properties are invariants—they hold at all times during the execution of the algorithm, regardless of failures, crashes, or network issues.
| Property | Formal Statement | Intuition |
|---|---|---|
| Election Safety | At most one leader can be elected in a given term | No split-brain scenarios |
| Leader Append-Only | A leader never overwrites or deletes entries in its log; it only appends new entries | Leader's log is authoritative |
| Log Matching | If two logs contain an entry with the same index and term, then the logs are identical in all entries up through that index | Same prefix = same history |
| Leader Completeness | If a log entry is committed in a given term, that entry will be present in the logs of the leaders for all higher-numbered terms | New leaders have all committed data |
| State Machine Safety | If a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for that index | Identical commands applied |
These properties are not independent—they form a dependency chain:
```
Election Safety      → ensures at most one leader per term
        ↓
Leader Append-Only   → ensures leader's log grows monotonically
        ↓
Log Matching         → ensures logs that match at a point match everywhere before
        ↓
Leader Completeness  → ensures new leaders have committed entries
        ↓
State Machine Safety → ensures all state machines converge to same state
```
Let's examine each property in detail.
Property: At most one leader can be elected in a given term.
Why it matters: If two leaders existed in the same term, they could accept different commands at the same log index, leading to divergent state. The term uniquely identifies "whose log is authoritative."
How Raft ensures it:
"""Election Safety Proof Sketch CLAIM: In any term T, at most one server can become leader. PROOF: 1. To become leader, a candidate must receive votes from a MAJORITY of servers (including itself). 2. Each server grants at most ONE vote per term. (voted_for is set and persisted before responding) 3. Any two majorities in a cluster of N servers must share at least one server (quorum intersection). 4. Suppose, for contradiction, that two servers S1 and S2 both became leader in term T. - S1 received votes from majority M1 - S2 received votes from majority M2 - M1 ∩ M2 is non-empty (at least one server in both) 5. Let V be a server in M1 ∩ M2. - V voted for S1 in term T - V voted for S2 in term T - But V can only vote once per term! 6. Contradiction. Therefore, at most one leader per term. ∎ EXAMPLE WITH 5 SERVERS: Cluster: {A, B, C, D, E}Majority requires: 3 votes If A becomes leader, A got votes from 3 servers, say {A, B, C}If X becomes leader, X got votes from 3 servers Any 3 servers from 5 must overlap with {A, B, C}:- {A, B, C} overlaps with everything containing A, B, or C- {D, E, ?} must include one of A, B, C to reach 3 Therefore X ∈ {A, B, C} and X must have gotten a vote from someone who already voted for A. Contradiction. IMPLEMENTATION VERIFICATION: Key code to verify in any Raft implementation:""" class VoteTracker: """ Demonstrates vote tracking that ensures Election Safety. """ def __init__(self): self.current_term = 0 self.voted_for = None # Server ID we voted for this term def can_vote_for(self, candidate_id: int, candidate_term: int) -> bool: """ Determine if we can vote for this candidate. """ if candidate_term < self.current_term: return False # Candidate is from past term if candidate_term > self.current_term: # New term - we haven't voted yet return True # Same term - check if we already voted if self.voted_for is None: return True # Haven't voted yet if self.voted_for == candidate_id: return True # Already voted for this candidate (idempotent) return False # Already voted for someone else def vote_for(self, candidate_id: int, candidate_term: int): """ Cast vote. MUST be persisted before responding! """ assert self.can_vote_for(candidate_id, candidate_term) if candidate_term > self.current_term: self.current_term = candidate_term self.voted_for = candidate_id # CRITICAL: Must persist before returning self._persist_to_disk() def _persist_to_disk(self): """ Persist current_term and voted_for. Why this matters: - Server crashes after voting, before response - Server restarts, receives another RequestVote for same term - Without persistence, might vote again → violates Election Safety """ # Write to stable storage (WAL, disk, etc.) passThe voted_for variable MUST be persisted to stable storage before responding to RequestVote. If a server votes, crashes, restarts, and votes again for a different candidate in the same term, Election Safety is violated. This is why Raft requires durable storage for currentTerm, votedFor, and log[].
Property: A leader never overwrites or deletes entries in its log; it only appends new entries.
Why it matters: If a leader could overwrite or delete its own entries, committed data could be lost. The leader's log is the source of truth—it must grow monotonically.
How Raft ensures it:
This property is enforced by the algorithm's structure, not by a specific mechanism:
Leaders only append — When a leader receives a client command, it appends to the end of its log. There's no code path for a leader to modify existing entries.
Log indices are permanent — Once an entry is at index i, it stays at index i. New entries are assigned index = last_index + 1.
Leaders don't process incoming AppendEntries as leaders — a leader only sends AppendEntries; if it receives one from a higher term, it steps down to follower first. Overwriting can therefore only happen on followers, and only for uncommitted entries.
"""Leader Append-Only Property The leader's log grows monotonically: Time 0: Leader log = [1:a, 2:b, 3:c]Time 1: Client sends command 'd'Time 1: Leader log = [1:a, 2:b, 3:c, 4:d] (appended)Time 2: Client sends command 'e'Time 2: Leader log = [1:a, 2:b, 3:c, 4:d, 5:e] (appended) At no point does the leader:- Delete entry 2:b- Change entry 3:c to 3:x- Insert something before 5:e The only operation is: log.append(new_entry)""" class LeaderLog: """ A leader's log: append-only by construction. """ def __init__(self): self.entries = [] def append(self, term: int, command: any) -> int: """ The ONLY mutation operation available to leaders. Returns the index of the new entry. """ new_index = len(self.entries) + 1 entry = LogEntry(index=new_index, term=term, command=command) self.entries.append(entry) return new_index # Note: No delete(), no update(), no insert_at() methods # Leaders cannot modify existing entries by design """Contrast with FollowerLog: Followers CAN have entries overwritten, but only:1. Uncommitted entries2. That conflict with leader's log This is safe because:- Committed entries are on majority- Leader has all committed entries (Leader Completeness)- Therefore, leader never sends conflicting entries for committed indexes""" class FollowerLog: """ A follower's log: can be truncated for uncommitted conflicts. """ def __init__(self): self.entries = [] self.commit_index = 0 def append_entries(self, prev_index: int, prev_term: int, entries: list) -> bool: """ Append entries from leader, possibly truncating conflicts. """ # Consistency check... for entry in entries: existing = self.get_entry(entry.index) if existing and existing.term != entry.term: # Conflict detected assert entry.index > self.commit_index, "BUG: Conflict at committed index would violate safety!" # Truncate from here self.entries = self.entries[:entry.index - 1] # Append if not self.get_entry(entry.index): self.entries.append(entry) return TrueLeader Append-Only is a "by construction" property. The leader code path simply doesn't include any operations that would modify existing entries. This is safer than relying on runtime checks—there's nothing to check because the operation doesn't exist.
Property: If two logs contain an entry with the same index and term, then:
• They store the same command at that index.
• The logs are identical in all preceding entries.
Why it matters: This allows Raft to verify log consistency with a single check. If the entry at index N matches, we know entries 1 through N-1 also match—no need to verify each one.
How Raft ensures it:
"""Log Matching Property Proof CLAIM: If logs L1 and L2 both have entry (index=i, term=t), then L1[1..i] = L2[1..i] PROOF BY INDUCTION: Base case (i = 1): - Entry at index 1 is the first entry - If both logs have entry (1, t), they contain the same command (because each leader creates exactly one entry per index per term, and there's at most one leader per term by Election Safety) - No preceding entries, so trivially true Inductive step: Assume the property holds for all indices < i. Show it holds for index i. - Both logs have entry (i, t) - By part 1 of Log Matching, they have the same command at i - When the leader created entry (i, t), it sent AppendEntries with: prev_log_index = i - 1 prev_log_term = term of entry at i - 1 - Any server that accepted (i, t) must have had matching (i-1, prev_term) - Therefore, L1 and L2 both have (i-1, prev_term) - By inductive hypothesis, L1[1..i-1] = L2[1..i-1] - Combined with L1[i] = L2[i], we get L1[1..i] = L2[1..i] ∎ EXAMPLE: Leader log: [1|T1]:a [2|T1]:b [3|T2]:c [4|T2]:d Follower receives AppendEntries for entry [4|T2]:d prev_log_index = 3 prev_log_term = T2 Follower checks: Do I have [3|T2]? - If yes: Accept [4|T2]:d, and by Log Matching, entries 1-3 match - If no: Reject, ask leader to backtrack The single check at index 3 verifies the entire prefix! PRACTICAL IMPLICATION: This is why the AppendEntries consistency check is so simple: def check_consistency(prev_index, prev_term, my_log): if prev_index == 0: return True # Empty prefix always matches if prev_index > len(my_log): return False # I don't have that entry if my_log[prev_index - 1].term != prev_term: return False # I have different entry return True # Match at prev_index → entire prefix matches!""" def verify_log_matching(log1: list, log2: list) -> bool: """ Verify that Log Matching holds between two logs. For any index where both logs have entries with the same term, all preceding entries should also match. """ for i in range(min(len(log1), len(log2))): if log1[i].term == log2[i].term: # Same term at this index - commands should match if log1[i].command != log2[i].command: return False # Violation! # All preceding entries should match for j in range(i): if log1[j] != log2[j]: return False # Violation! return TrueProperty: If a log entry is committed in a given term, that entry will be present in the logs of the leaders for all higher-numbered terms.
Property: If a log entry is committed in a given term, that entry will be present in the logs of the leaders for all higher-numbered terms.

Why it matters: This is THE critical safety property. It guarantees that once data is committed, it survives leader changes. No matter how many leaders fail and new leaders are elected, committed data persists.
How Raft ensures it:
This property relies on three mechanisms working together: commitment requires replication to a majority, election requires votes from a majority (so the two majorities must intersect), and the election restriction ensures a voter only supports a candidate whose log is at least as up-to-date as its own:
"""Leader Completeness Proof CLAIM: If entry E is committed at index i in term T, then every leader of term T' > T has E at index i. PROOF: Given: - Entry E was committed in term T - This means E was replicated to a majority M_commit To show: - Any leader of term T' > T has E at index i Proof by induction on T': Base case (T' = T + 1): - Let L be the leader of term T' = T + 1 - L won election, getting votes from majority M_vote - M_commit ∩ M_vote is non-empty (quorum intersection) - Let V be a server in M_commit ∩ M_vote - V has entry E (since V ∈ M_commit) - V voted for L (since V ∈ M_vote) - By election restriction, L's log is "at least as up-to-date" as V's - Therefore, L has entry E ∎ Inductive step: - Assume all leaders of terms T+1, T+2, ..., T'-1 have E - Show leader of term T' has E - Let L' be the leader of term T' - L' needs votes from majority M' - Consider any voter V' in M' Case 1: V' was a leader of some term in [T+1, T'-1] - By inductive hypothesis, V' has E Case 2: V' was never a leader in [T+1, T'-1] - V' still has E from term T (hasn't been overwritten) - (Entries are only overwritten by leaders' AppendEntries) - Either way, V' has E - M' intersects with set of servers that have E - By election restriction, L' has "at least as up-to-date" log - Therefore, L' has E ∎ KEY INSIGHT: Quorum Intersection + Election Restriction Any two majorities overlap.Committed entries are on a majority.Winners contact a majority to win.Therefore, winners contact someone with the committed entry.Election restriction ensures winners have all committed entries.""" def illustrate_quorum_intersection(): """ Show how quorum intersection preserves committed entries. """ cluster = ["S1", "S2", "S3", "S4", "S5"] # Entry E committed on majority commit_majority = {"S1", "S2", "S3"} # New leader needs majority votes # Every possible majority includes at least one from commit_majority possible_election_majorities = [ {"S1", "S2", "S3"}, {"S1", "S2", "S4"}, {"S1", "S2", "S5"}, {"S1", "S3", "S4"}, {"S1", "S3", "S5"}, {"S1", "S4", "S5"}, {"S2", "S3", "S4"}, {"S2", "S3", "S5"}, {"S2", "S4", "S5"}, {"S3", "S4", "S5"}, ] for election in possible_election_majorities: intersection = commit_majority & election assert len(intersection) > 0, "Impossible: no intersection!" print(f"Election {election} intersects commit at {intersection}") # All 10 possible election majorities intersect! # It's mathematically impossible to win without contacting # someone who has the committed entry. # The election restriction finishes the jobdef election_restriction_check(candidate_log: list, voter_log: list) -> bool: """ Voter grants vote only if candidate's log is at least as up-to-date. This ensures that if voter has committed entry E, candidate must also have E (or be "ahead" of E). """ candidate_last_term = candidate_log[-1].term if candidate_log else 0 candidate_last_index = len(candidate_log) voter_last_term = voter_log[-1].term if voter_log else 0 voter_last_index = len(voter_log) # Up-to-date comparison if candidate_last_term != voter_last_term: return candidate_last_term > voter_last_term return candidate_last_index >= voter_last_indexWithout the election restriction, Leader Completeness would fail. A server with a stale log could win an election (if it just needed votes, not up-to-date-ness) and then accept new commands that overwrite committed entries. The restriction ensures that only servers with complete committed logs can become leader.
Property: If a server has applied a log entry at a given index to its state machine, no other server will ever apply a different log entry for that index.
Why it matters: This is the ultimate goal. All servers must execute the same sequence of commands. If server A applies "SET x=5" at index 3 and server B applies "SET x=7" at index 3, they'll have different states—a fundamental failure.
How Raft ensures it:
State Machine Safety follows from Leader Completeness and the commitment rule:
"""State Machine Safety Proof CLAIM: If server S1 applies entry E1 at index i, no other server applies a different entry at index i. PROOF: 1. S1 only applies E1 after E1 is committed (servers only apply entries when commit_index >= entry.index) 2. E1 is committed means: - E1 was replicated to a majority - The leader of E1's term committed it 3. By Leader Completeness: - All future leaders have E1 at index i 4. By Leader Append-Only: - No leader will overwrite index i with a different entry 5. By Log Matching: - Any server that has E1 at index i also has the same preceding entries - Any server that will receive index i (from any leader) will receive E1 6. Therefore: - Any server S2 that applies something at index i will apply E1 - S1 and S2 apply the same entry ∎ THE CHAIN OF PROPERTIES: State Machine Safety depends on: ← Leader Completeness (committed entries survive leader changes) ← Election Safety (one leader per term) ← Election Restriction (leader has all committed entries) ← Log Matching (same index+term = same command) ← Leader Append-Only (leaders don't overwrite)""" class StateMachine: """ Demonstrates safe state machine application. """ def __init__(self): self.state = {} self.last_applied = 0 def apply_committed_entries(self, log: list, commit_index: int): """ Apply entries up to commit_index. Safety guarantees: - commit_index only increases (never decreases) - Entries at index <= commit_index are guaranteed committed - Committed entries are identical across all servers """ while self.last_applied < commit_index: self.last_applied += 1 entry = log[self.last_applied - 1] # Apply the command self._apply_command(entry.command) # At this point, every other server that applies # index self.last_applied will apply the SAME command def _apply_command(self, command: dict): """Execute a command against the state.""" if command["op"] == "SET": self.state[command["key"]] = command["value"] elif command["op"] == "DELETE": self.state.pop(command["key"], None) # Demonstration of convergencedef demonstrate_convergence(): """ Show how all state machines converge to the same state. """ # Committed log (same on all servers by Leader Completeness) committed_log = [ LogEntry(1, 1, {"op": "SET", "key": "x", "value": 1}), LogEntry(2, 1, {"op": "SET", "key": "y", "value": 2}), LogEntry(3, 2, {"op": "SET", "key": "x", "value": 3}), LogEntry(4, 2, {"op": "DELETE", "key": "y"}), ] # Three servers apply in different orders (timing) # but result is the same server1 = StateMachine() server2 = StateMachine() server3 = StateMachine() # Server 1: Already at commit_index 4 server1.apply_committed_entries(committed_log, 4) # Server 2: Was behind, catching up server2.apply_committed_entries(committed_log, 2) # First to index 2 server2.apply_committed_entries(committed_log, 4) # Then to index 4 # Server 3: Applies one at a time for i in range(1, 5): server3.apply_committed_entries(committed_log, i) # All three have identical final state assert server1.state == server2.state == server3.state assert server1.state == {"x": 3} # y was deletedState Machine Safety also requires the state machine to be deterministic. Given the same command, every server must produce the same result. This is why commands should not use wall-clock time, random numbers, or other non-deterministic inputs. If randomness is needed, the leader should generate it and include it in the command.
The five safety properties form an interlocking system. Let's trace through a complete scenario to see how they work together to guarantee correctness.
"""Complete Scenario: Safety Properties in Action Cluster: {S1, S2, S3, S4, S5} === PHASE 1: Normal Operation === S1 is leader of term 1.Client sends: SET x = 42 Step 1: S1 appends to its log S1 log: [1|T1]: SET x = 42 (Leader Append-Only: S1 will never overwrite this) Step 2: S1 replicates to followers S2 log: [1|T1]: SET x = 42 ✓ S3 log: [1|T1]: SET x = 42 ✓ S4: offline S5: slow, hasn't received yet Step 3: S1 counts replications S1, S2, S3 have the entry = 3 = majority Entry is COMMITTED Step 4: S1 applies to state machine, responds to client S1 state: {x: 42} Client receives: "OK" === PHASE 2: Leader Failure === S1 crashes. S2 starts election for term 2. Step 5: S2 requests votes with its log info S2 log: [1|T1]: SET x = 42 last_log_index = 1, last_log_term = T1 Step 6: Voting S3 checks: Is S2's log at least as up-to-date? S3 last_log_term = T1, S2 last_log_term = T1 S3 last_log_index = 1, S2 last_log_index = 1 Equal = at least as up-to-date ✓ S3 votes YES (Election restriction satisfied) S4: comes back online, empty log S4 last_log_term = 0, S2 last_log_term = T1 S2 is more up-to-date ✓ S4 votes YES S5: still slow, doesn't vote in time Step 7: S2 wins with votes from {S2, S3, S4} (Election Safety: Only S2 can be leader of term 2) Step 8: S2 has committed entry at index 1 (Leader Completeness: S2 has all committed entries) === PHASE 3: New Leader Continues === Client sends: SET y = 100 Step 9: S2 appends to its log S2 log: [1|T1]: SET x = 42, [2|T2]: SET y = 100 (Leader Append-Only) Step 10: S2 replicates to followers S3: Updates commit_index from S2, applies index 1 S3 state: {x: 42} S4: Receives AppendEntries prev_log_index = 1, prev_log_term = T1 S4 doesn't have index 1 → rejects S2 backs up to prev_log_index = 0 S2 sends entries 1 and 2 S4 log: [1|T1]: SET x = 42, [2|T2]: SET y = 100 (Log Matching: S4's log now matches S2's) Step 11: Both entries committed when on majority All servers eventually: {x: 42, y: 100} (State Machine Safety: All applied same commands) === PHASE 4: S1 Comes Back === S1 restarts. It has: S1 log: [1|T1]: SET x = 42 S1 current_term = 1 Step 12: S1 receives AppendEntries from S2 (term 2) S1 discovers higher term, updates to term 2 S1 becomes follower S1 replicates entry 2 from S2 S1 log: [1|T1]: SET x = 42, [2|T2]: SET y = 100 S1 applies and reaches same state All five servers converge to {x: 42, y: 100}All safety properties maintained throughout!"""| Step | Property Used | What It Prevented |
|---|---|---|
| 1-3 | Leader Append-Only | Entry can't be accidentally deleted |
| 5-6 | Election Restriction | Stale S4 can't become leader |
| 7 | Election Safety | No split-brain with multiple term-2 leaders |
| 8 | Leader Completeness | S2 has committed entry before accepting new commands |
| 10 | Log Matching | S4 synchronized correctly |
| 11 | State Machine Safety | All servers have identical state |
The properties we've discussed are safety properties—they guarantee that bad things never happen. Raft also provides liveness (the system eventually makes progress), but only under additional assumptions: the network is eventually synchronous and a majority of servers are alive. The FLP impossibility result shows that no consensus algorithm can guarantee both safety and termination in a fully asynchronous system, which is why Raft guarantees safety unconditionally and liveness only under those timing assumptions.
What's Next:
With the theory of Raft understood, we'll examine Raft in Practice—how real-world systems like etcd and Consul implement Raft, the engineering challenges they face, and the production considerations that go beyond the academic algorithm.
You now understand Raft's safety guarantees in depth. You can explain why Raft never loses committed data, how the five properties interlock to create a correct algorithm, and trace through scenarios showing how the properties work together. This foundation will help you reason about correctness when implementing or debugging Raft-based systems.