Loading learning content...
In 1989, Leslie Lamport invented Paxos, the first consensus algorithm proven to be both safe and live (under partial synchrony). The algorithm was so profound that it influenced every subsequent consensus protocol—Raft, ZAB, Viewstamped Replication, and others are all, at their core, variations on Paxos themes.
Paxos is also infamous for being difficult to understand. Lamport's original paper presented the algorithm through an allegory about a Greek parliament on the fictional island of Paxos, which many found more confusing than clarifying. He later wrote "Paxos Made Simple" (2001), which opens with: "The Paxos algorithm, when presented in plain English, is very simple."
Our approach:
We'll build Paxos from first principles, understanding why each rule exists rather than memorizing the protocol mechanically. By the end, you'll understand how Paxos achieves the seemingly impossible: getting distributed nodes to agree on a value despite arbitrary message delays and node failures.
By the end of this page, you will understand the three roles in Paxos (proposer, acceptor, learner), the two-phase protocol structure, why each phase is necessary for safety, how Paxos handles competing proposals, and the connection between single-decree Paxos and Multi-Paxos for replicated log implementations.
Paxos solves single-decree consensus: getting a group of nodes to agree on exactly one value. The value, once chosen, cannot be changed.
The system model:
The scenario:
Multiple nodes may propose values concurrently. The algorithm must:
Why is this hard?
Consider a simple approach: "If you receive a proposal and haven't seen one before, accept it."
Worse: If nodes 1 and 2 fail, only Y might survive. But clients who received acknowledgment for X think it was chosen.
Result: At most one value can achieve majority acceptance.
The key insight:
Paxos prevents conflicting acceptances by having proposers first check what acceptors have already accepted. If a value has been (or might have been) chosen, the proposer adopts that value rather than proposing something new. This ensures that once a value might be chosen, all future proposals carry that same value.
This check-then-propose pattern is the two-phase structure of Paxos.
Paxos defines three logical roles. In practice, a single node often acts in all three roles simultaneously.
1. Proposers:
Proposers are nodes that propose values. A proposer:
2. Acceptors:
Acceptors are the "voters" that constitute the durable memory of the system. An acceptor:
3. Learners:
Learners discover what value was chosen. A learner:
| Role | Responsibility | State to Maintain | Messages Sent/Received |
|---|---|---|---|
| Proposer | Initiate consensus rounds | Current proposal number | Sends: Prepare, Accept Receives: Promise, Accepted |
| Acceptor | Vote on proposals, remember votes | Highest promised number, accepted value | Receives: Prepare, Accept Sends: Promise, Accepted |
| Learner | Discover chosen value | Set of accepted values per acceptor | Receives: Accepted notifications |
Proposal Numbers:
Proposal numbers must be:
Common implementation:
Combine a local sequence number with the proposer's node ID:
proposal_number = (sequence_number, node_id)
Compare by sequence number first, then by node ID. This ensures uniqueness and ordering without coordination.
In typical deployments, every node acts as proposer, acceptor, and learner. The separation is conceptual—it helps understand the algorithm, but nodes don't need to be partitioned by role. A 3-node Paxos cluster has 3 proposers, 3 acceptors, and 3 learners—all colocated.
The Prepare phase achieves two goals:
Proposer actions:
n higher than any previously usedPrepare(n) to all acceptors (or at least a majority)Acceptor actions upon receiving Prepare(n):
If n > highest_promised_number:
highest_promised_number = nPromise(n, accepted_value, accepted_number)If n <= highest_promised_number:
Reject(highest_promised_number)The promise is crucial:
By collecting promises from a majority, the proposer learns:
This combination lets the proposer safely propose a value.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123
# ============================================# PHASE 1: PREPARE# ============================================ class PaxosProposer: """ Paxos proposer handling Phase 1 (Prepare). """ def __init__(self, node_id: int, acceptors: List[str]): self.node_id = node_id self.acceptors = acceptors self.sequence_number = 0 # Incremented for each new proposal def generate_proposal_number(self) -> ProposalNumber: """ Generate unique proposal number higher than any we've seen. Format: (sequence, node_id) - lexicographic ordering. Sequence ensures our proposals increase. Node ID breaks ties for uniqueness. """ self.sequence_number += 1 return ProposalNumber(self.sequence_number, self.node_id) def prepare(self, proposal_number: ProposalNumber) -> Optional[PrepareResult]: """ Phase 1a: Send Prepare request to all acceptors. Returns PrepareResult if majority respond, None otherwise. """ promises: List[Promise] = [] for acceptor in self.acceptors: try: response = self._send_prepare(acceptor, proposal_number) if response.type == "PROMISE": promises.append(response) else: # Rejected - acceptor has promised to higher proposal # We might want to bump our number and retry pass except NetworkTimeout: # Acceptor unreachable - continue with others pass # Did we get a majority? majority_threshold = len(self.acceptors) // 2 + 1 if len(promises) >= majority_threshold: # Success! Extract highest accepted value from promises highest_accepted = self._find_highest_accepted(promises) return PrepareResult( success=True, promises=promises, highest_accepted=highest_accepted ) else: # Failed to get majority - might retry with higher number return None def _find_highest_accepted(self, promises: List[Promise]) -> Optional[AcceptedValue]: """ Among all promises, find the value accepted with highest proposal number. This is the value we must adopt if it exists. """ highest = None for promise in promises: if promise.accepted_number is not None: if highest is None or promise.accepted_number > highest.number: highest = AcceptedValue( number=promise.accepted_number, value=promise.accepted_value ) return highest class PaxosAcceptor: """ Paxos acceptor handling Phase 1 (Prepare). """ def __init__(self, node_id: str, storage: DurableStorage): self.node_id = node_id self.storage = storage # Recover state from storage (survives crashes) self.highest_promised = storage.get("highest_promised", None) self.accepted_number = storage.get("accepted_number", None) self.accepted_value = storage.get("accepted_value", None) def handle_prepare(self, proposal_number: ProposalNumber) -> Response: """ Phase 1b: Handle Prepare request from proposer. If proposal number is higher than any we've seen, promise to reject lower-numbered proposals. """ if self.highest_promised is None or proposal_number > self.highest_promised: # Accept this prepare - make a promise # CRITICAL: Persist before responding self.highest_promised = proposal_number self.storage.put("highest_promised", proposal_number) self.storage.flush() # Respond with our promise and any accepted value return Promise( proposal_number=proposal_number, accepted_number=self.accepted_number, accepted_value=self.accepted_value ) else: # Reject - we've promised to a higher proposal return Reject( proposal_number=proposal_number, highest_promised=self.highest_promised )An acceptor MUST persist highest_promised before responding with a Promise. If it crashes and restarts, forgetting its promise, it might later accept a lower-numbered proposal—violating the promise and potentially breaking consensus safety. Persistence is not optional; it's fundamental to correctness.
The Accept phase proposes a value for consensus. After receiving promises from a majority, the proposer knows:
n can now be accepted by a majorityChoosing what value to propose:
This is the critical rule that makes Paxos safe:
If any promise included an accepted value, the proposer must propose the value with the highest accepted number. Only if no promise included an accepted value may the proposer propose its own value.
Why this rule?
If a value was accepted by a majority with proposal number k, then any majority of acceptors will include at least one that accepted it. The proposer must continue with that value to avoid conflicting with a potentially-chosen value.
Proposer actions (Phase 2a):
Send Accept(n, v) to all acceptors, where:
n is the proposal number from Phase 1v is the highest accepted value from promises, or our own value if noneAcceptor actions (Phase 2b):
Upon receiving Accept(n, v):
n >= highest_promised_number:
accepted_number = n, accepted_value = vAccepted(n, v)n < highest_promised_number:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146
# ============================================# PHASE 2: ACCEPT# ============================================ class PaxosProposer: """ Paxos proposer handling Phase 2 (Accept). """ def propose_value(self, my_value: Any) -> Optional[Any]: """ Complete Paxos protocol to propose a value. Returns the chosen value (might be ours or someone else's). Returns None if consensus could not be reached. """ while True: # Retry until success or explicit failure # Phase 1: Prepare n = self.generate_proposal_number() prepare_result = self.prepare(n) if prepare_result is None: # Failed to get majority - retry with higher number continue # Determine value to propose if prepare_result.highest_accepted is not None: # MUST use the highest accepted value value_to_propose = prepare_result.highest_accepted.value else: # Free to propose our own value value_to_propose = my_value # Phase 2: Accept accept_result = self.accept(n, value_to_propose) if accept_result.success: # Value was chosen! return value_to_propose else: # Someone sent a higher Prepare - retry continue def accept(self, proposal_number: ProposalNumber, value: Any) -> AcceptResult: """ Phase 2a: Send Accept request to all acceptors. Returns success if a majority accept. """ accepted_count = 0 for acceptor in self.acceptors: try: response = self._send_accept(acceptor, proposal_number, value) if response.type == "ACCEPTED": accepted_count += 1 # Notify learners self._notify_learners(acceptor, proposal_number, value) else: # Rejected - acceptor promised to higher proposal # This proposal won't succeed if response.highest_promised > proposal_number: return AcceptResult(success=False) except NetworkTimeout: pass # Continue with other acceptors majority_threshold = len(self.acceptors) // 2 + 1 return AcceptResult( success=(accepted_count >= majority_threshold), accepted_count=accepted_count ) class PaxosAcceptor: """ Paxos acceptor handling Phase 2 (Accept). """ def handle_accept(self, proposal_number: ProposalNumber, value: Any) -> Response: """ Phase 2b: Handle Accept request from proposer. Accept if we haven't promised to a higher proposal. """ if self.highest_promised is None or proposal_number >= self.highest_promised: # Accept this proposal # Update our state self.highest_promised = proposal_number self.accepted_number = proposal_number self.accepted_value = value # CRITICAL: Persist before responding self.storage.put("highest_promised", proposal_number) self.storage.put("accepted_number", proposal_number) self.storage.put("accepted_value", value) self.storage.flush() return Accepted( proposal_number=proposal_number, value=value ) else: # Reject - we've promised to a higher proposal return Reject( proposal_number=proposal_number, highest_promised=self.highest_promised ) class PaxosLearner: """ Paxos learner discovering the chosen value. """ def __init__(self, acceptor_count: int): self.acceptor_count = acceptor_count self.accepted_values: Dict[str, Tuple[ProposalNumber, Any]] = {} self.chosen_value: Optional[Any] = None def on_accepted(self, acceptor_id: str, proposal_number: ProposalNumber, value: Any): """ Receive notification that an acceptor accepted a value. Check if this value has now been chosen (majority accepted). """ self.accepted_values[acceptor_id] = (proposal_number, value) # Count how many acceptors have accepted this exact (number, value) pair count = sum( 1 for (n, v) in self.accepted_values.values() if n == proposal_number and v == value ) majority_threshold = self.acceptor_count // 2 + 1 if count >= majority_threshold and self.chosen_value is None: self.chosen_value = value print(f"Value chosen: {value}") def get_chosen_value(self) -> Optional[Any]: """Return the chosen value, or None if not yet known.""" return self.chosen_valuePaxos maintains this invariant: If a value v is chosen (accepted by a majority with number n), then every proposal with number > n will also propose v. Why? Any majority of acceptors includes one who accepted v, and Phase 1 forces the proposer to adopt v. This is the key insight that makes Paxos safe.
Let's trace through a complete Paxos execution with competing proposals to see how the protocol ensures safety.
Setup:
Scenario: P1 starts first, but messages to A4 and A5 are delayed
| Step | Action | A1 | A2 | A3 | A4 | A5 |
|---|---|---|---|---|---|---|
| 1 | Initial state | |||||
| 2 | P1: Prepare(1) | Promise(1,-) | Promise(1,-) | Promise(1,-) | (delayed) | (delayed) |
| 3 | P1 gets 3 promises, proceeds to Accept | |||||
| 4 | P1: Accept(1,X) | Accept(1,X)✓ | Accept(1,X)✓ | Accept(1,X)✓ | (delayed) | (delayed) |
| 5 | X is chosen (majority A1,A2,A3) | X | X | X | ||
| 6 | P2: Prepare(2) | Promise(2,1,X) | Promise(2,1,X) | Promise(2,1,X) | Promise(2,-) | Promise(2,-) |
| 7 | P2 sees X was accepted at n=1, must adopt X | |||||
| 8 | P2: Accept(2,X) [not Y!] | Accept(2,X)✓ | Accept(2,X)✓ | Accept(2,X)✓ | Accept(2,X)✓ | Accept(2,X)✓ |
| 9 | X confirmed again with higher number | X | X | X | X | X |
Key observations:
Step 5: X is chosen when 3 acceptors (A1, A2, A3) accept it with proposal number 1.
Step 6-7: When P2 runs Prepare(2), it receives promises from all 5 acceptors. Three of them (A1, A2, A3) report having accepted X at proposal number 1. P2 must propose X, not Y, because X was accepted by a majority.
Step 8-9: P2's Accept(2, X) succeeds and reinforces X as the chosen value.
Even if P2 had reached A4, A5 before P1:
| Step | Action | A1 | A2 | A3 | A4 | A5 |
|---|---|---|---|---|---|---|
| 1 | P2: Prepare(2) reaches A4, A5 first | Promise(2,-) | Promise(2,-) | |||
| 2 | P1: Prepare(1) | Promise(1,-) | Promise(1,-) | Promise(1,-) | (already promised 2) | (already promised 2) |
| 3 | P1 has 3 promises, proceeds | |||||
| 4 | P1: Accept(1,X) to A1,A2,A3 | Accept(1,X)✓ | Accept(1,X)✓ | Accept(1,X)✓ | ||
| 5 | P1: Accept(1,X) to A4,A5 | Reject (promised 2) | Reject (promised 2) | |||
| 6 | P2: Prepare(2) to A1,A2,A3 | Promise(2,1,X) | Promise(2,1,X) | Promise(2,1,X) | ||
| 7 | P2 sees X accepted, adopts X | |||||
| 8 | P2: Accept(2,X) | Accept(2,X)✓ | Accept(2,X)✓ | Accept(2,X)✓ | Accept(2,X)✓ | Accept(2,X)✓ |
The crucial point:
Even in this alternate timeline, P2 learns about X when it collects promises from A1, A2, A3. Because X was accepted by a majority, any majority P2 consults will include at least one acceptor that knows about X.
When does Paxos not terminate?
Paxos can fail to terminate if proposers keep "stepping on each other":
This is the liveness problem mentioned by FLP. In practice, randomized timeouts prevent this: proposers wait a random interval before retrying, making continuous collisions unlikely.
Each element of Paxos exists to prevent specific failure modes. Let's examine why we can't simplify the protocol.
Why Phase 1 (Prepare)?
Without Phase 1, we can't prevent conflicting acceptances:
Imagine acceptors just accept the first proposal they see. With network delays, different acceptors see different proposals first. A1 might accept X while A2 accepts Y. Neither achieves majority. Now what? We're stuck, or worse, both might eventually get accepted by different majorities in succession.
Phase 1's promises ensure that once a proposer starts phase 2, old proposals can't interfere.
Why must we adopt the highest accepted value?
Without this rule, we can overwrite a chosen value:
| Step | Without Adoption Rule | Consequence |
|---|---|---|
| 1 | P1 gets promises from A1, A2, A3 | No one has accepted anything yet |
| 2 | P1 sends Accept(1, X) to A1, A2 only | X accepted by A1, A2 |
| 3 | P1 crashes before reaching A3 | X not yet chosen (only 2, need 3) |
| 4 | P2 gets promises from A3, A4, A5 | None of these accepted anything |
| 5 | P2 proposes Accept(2, Y) | If A3 accepts Y: A3, A4, A5 accept Y |
| 6 | Y is chosen (3 acceptors) | But A1, A2 have accepted X! |
In step 4-5, if P2 didn't have to check what was already accepted, it could propose Y and get it accepted by A3, A4, A5. But X was already accepted by A1, A2. If P1 later resumes and gets A3 to accept X, we have both X and Y accepted by majorities!
The adoption rule prevents this: P2's Prepare would see A1 or A2 (if it contacts them) report X. If P2's majority doesn't include A1 or A2, it's possible that X wasn't chosen. But if P2's Accept succeeds with a majority including A3, that overlaps with any majority that might choose X, so one of them will inform P2.
Why unique, ordered proposal numbers?
Without ordering, we can't determine which promise takes precedence.
If proposals are unordered, an acceptor that promised to "proposal A" has no way to determine if "proposal B" is higher or lower. The promise mechanism only works because proposals are totally ordered.
Paxos is surprisingly minimal. Every element exists to prevent a specific failure. Remove the Prepare phase, and conflicting values can be accepted. Remove the adoption rule, and chosen values can be overwritten. Remove proposal ordering, and promises become meaningless. Paxos is consensus stripped to its essence.
Single-decree Paxos chooses one value. For practical systems like replicated databases, we need Multi-Paxos: a sequence of Paxos instances, each choosing one log entry.
Conceptually:
The leader optimization:
Running full two-phase Paxos for every log entry is expensive. Multi-Paxos optimizes:
This reduces steady-state operation from 4 message delays (Prepare → Promise → Accept → Accepted) to 2 (Accept → Accepted).
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167
class MultiPaxosLeader: """ Multi-Paxos leader using leader optimization. Once leader is established, only Phase 2 is needed for each new log entry. """ def __init__(self, node_id: int, acceptors: List[str]): self.node_id = node_id self.acceptors = acceptors self.current_proposal_number: Optional[ProposalNumber] = None self.prepared_up_to: int = -1 # Highest log index we've prepared self.is_leader = False def become_leader(self) -> bool: """ Establish leadership by running Phase 1 for all pending log positions. Returns True if we successfully became leader. """ n = self.generate_proposal_number() # Prepare for "infinity" - all future log positions # In practice, send highest known log index to acceptors promises = [] for acceptor in self.acceptors: try: response = self._send_prepare(acceptor, n) if response.type == "PROMISE": promises.append(response) except NetworkTimeout: pass if len(promises) < self.majority_size(): return False # We're now the leader for this proposal number self.current_proposal_number = n self.is_leader = True # Recover any in-progress log entries from promises self._recover_log_entries(promises) return True def replicate_entry(self, log_index: int, value: Any) -> bool: """ Replicate a log entry (Phase 2 only, since we're leader). Only works if we're the established leader. """ if not self.is_leader: raise NotLeaderError("Must be leader to replicate") # With leader optimization, skip Phase 1 # Go directly to Phase 2 (Accept) accepted_count = 0 for acceptor in self.acceptors: try: response = self._send_accept( acceptor, self.current_proposal_number, log_index, value ) if response.type == "ACCEPTED": accepted_count += 1 elif response.type == "REJECT": # Someone has a higher proposal number # We're no longer the leader self.is_leader = False return False except NetworkTimeout: pass return accepted_count >= self.majority_size() def _recover_log_entries(self, promises: List[Promise]): """ Recover uncommitted log entries from Phase 1 promises. For each log position, we must adopt the highest-numbered accepted value, if any. """ # Collect all accepted entries from promises accepted_entries: Dict[int, Tuple[ProposalNumber, Any]] = {} for promise in promises: for log_index, (n, v) in promise.accepted_entries.items(): if log_index not in accepted_entries: accepted_entries[log_index] = (n, v) elif n > accepted_entries[log_index][0]: accepted_entries[log_index] = (n, v) # For each recovered entry, complete Phase 2 for log_index, (n, v) in sorted(accepted_entries.items()): # Must re-propose this value to ensure it's committed success = self.replicate_entry(log_index, v) if not success: # Lost leadership during recovery return class MultiPaxosFollower: """ Multi-Paxos follower (acceptor) with leader lease support. """ def __init__(self, node_id: str, storage: DurableStorage): self.node_id = node_id self.storage = storage # Per-log-position state self.log: Dict[int, LogEntry] = {} self.highest_promised: Optional[ProposalNumber] = None def handle_prepare(self, proposal_number: ProposalNumber) -> PrepareResponse: """ Handle Prepare from aspiring leader. For Multi-Paxos, this covers all log positions. """ if self.highest_promised is None or proposal_number > self.highest_promised: # Update promise self.highest_promised = proposal_number self.storage.put("highest_promised", proposal_number) self.storage.flush() # Return all accepted but uncommitted log entries return PrepareResponse( type="PROMISE", proposal_number=proposal_number, accepted_entries=self._get_accepted_entries() ) else: return PrepareResponse( type="REJECT", highest_promised=self.highest_promised ) def handle_accept(self, proposal_number: ProposalNumber, log_index: int, value: Any) -> AcceptResponse: """ Handle Accept from leader. For multi-Paxos, each Accept is for a specific log position. """ if proposal_number >= self.highest_promised: # Accept this entry self.log[log_index] = LogEntry( index=log_index, proposal_number=proposal_number, value=value ) self.storage.put(f"log_{log_index}", self.log[log_index]) self.storage.flush() return AcceptResponse(type="ACCEPTED") else: return AcceptResponse( type="REJECT", highest_promised=self.highest_promised )In steady state, Multi-Paxos with a stable leader achieves 2-message-delay commits. The leader sends Accept, waits for majority Accepted responses, then knows the entry is committed. This is exactly what Raft formalizes as 'normal operation'. The Prepare phase only runs during leader election or when the current leader is challenged.
We've built a deep understanding of Paxos, the foundational consensus algorithm. Let's consolidate the essential insights:
What's next:
While Paxos is mathematically elegant, its understandability has been a persistent challenge. Raft was designed from the ground up to be more comprehensible while providing the same guarantees. In the next section, we'll explore Raft's approach and see how it makes consensus more accessible to implementers.
You now understand Paxos: why each phase exists, how the safety invariants work, and how Multi-Paxos extends single-decree consensus to replicated logs. This knowledge provides the foundation for understanding all modern consensus protocols, including Raft, which we'll explore next.