Loading content...
Paxos achieves distributed consensus through a carefully orchestrated dance between three distinct roles: Proposers, Acceptors, and Learners. While the algorithm itself is often described in terms of message flows and phases, a deeper understanding requires examining each role as an independent actor with its own responsibilities, state, and invariants.
Think of it like a parliamentary system: proposers introduce legislation, acceptors vote on it, and learners record the outcome. Each role is essential, and the correctness of the whole system depends on each role fulfilling its specific contract.
In this page, we'll examine each role with the rigor required for implementing production-quality consensus systems.
By the end of this page, you will understand the precise responsibilities of each Paxos role, the state each role must maintain (and persist), the invariants each role must preserve, and how roles interact during the protocol. You'll also learn about role collocation strategies used in production deployments.
Proposers are the active driving force in Paxos. They initiate the consensus process by proposing values and shepherding proposals through the two-phase protocol. Any node in the system that wants to get a value chosen must act as a proposer.
Core Responsibility:
A proposer's job is to get a value chosen by the acceptors. This sounds simple, but the proposer must navigate a complex landscape of concurrent proposals, failures, and previously-accepted values.
Proposer State:
A proposer maintains relatively simple state, but managing it correctly is crucial:
12345678910111213141516171819202122232425262728293031323334353637383940
class ProposerState: def __init__(self, proposer_id: int, num_acceptors: int): # Unique identifier for this proposer self.proposer_id = proposer_id # The highest proposal number we've used # Used to generate new, higher proposal numbers self.highest_proposed_round: int = 0 # The value we initially want to propose # May be replaced if we discover a previously accepted value self.initial_value: Optional[Any] = None # The value we're actually proposing (may differ from initial_value) # Set after Phase 1 completes self.proposing_value: Optional[Any] = None # Responses collected during the current round self.phase1_promises: List[PromiseResponse] = [] self.phase2_acceptances: List[AcceptedResponse] = [] # Majority threshold self.majority = (num_acceptors // 2) + 1 def generate_proposal_number(self) -> int: """ Generate a unique, monotonically increasing proposal number. The number is unique across all proposers by encoding proposer_id. Common scheme: (round_number, proposer_id) With 3 proposers (IDs 0, 1, 2), proposals would be: - Proposer 0: 0, 3, 6, 9, ... - Proposer 1: 1, 4, 7, 10, ... - Proposer 2: 2, 5, 8, 11, ... This ensures uniqueness and ordering. """ self.highest_proposed_round += 1 # Encode proposer_id in the lower bits return self.highest_proposed_round * 1000 + self.proposer_idThe proposal number generation scheme must guarantee global uniqueness. A common approach is to use (round_number × num_proposers + proposer_id), ensuring no two proposers ever generate the same proposal number. Getting this wrong leads to subtle correctness bugs.
The Proposer's Decision Tree:
A proposer must handle multiple scenarios during Phase 1 and Phase 2. Here's the decision logic:
| Situation | Action | Rationale |
|---|---|---|
| Phase 1: Received majority promises, no prior acceptances | Proceed to Phase 2 with initial value | Free to propose any value |
| Phase 1: Received majority promises, some with prior acceptances | Proceed to Phase 2 with the value from highest-numbered acceptance | Must respect already-accepted values to ensure safety |
| Phase 1: Received rejection (acceptor promised higher proposal) | Restart with new, higher proposal number | Cannot proceed with superseded proposal |
| Phase 1: Timeout waiting for responses | Retry with same or higher proposal number | Some acceptors may have failed or be slow |
| Phase 2: Received majority acceptances | Value is chosen; notify learners | Consensus achieved |
| Phase 2: Received rejection | Restart Phase 1 with higher proposal number | Another proposer may have preempted us |
The most critical rule: if any acceptor reports having accepted a value in Phase 1, the proposer MUST propose the value with the highest proposal number among all reported acceptances. This is the key mechanism that ensures once a value is chosen, all future proposals propagate that same value.
Acceptors are the voting body of Paxos. They respond to proposer requests, making promises and accepting values. The collective state of acceptors—specifically, which values have been accepted by majorities—determines what value has been chosen.
Core Responsibility:
An acceptor's job is to faithfully respond to Prepare and Accept requests according to the Paxos rules. Acceptors are primarily reactive—they don't initiate communication, but their responses determine the outcome of consensus.
Acceptor State:
Acceptor state is minimal but critical. Every piece of state must be durably persisted before any response is sent:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889
class AcceptorState: """ Acceptor state that must be persisted to stable storage. CRITICAL: All state changes MUST be written to disk BEFORE sending any response message. This ensures safety properties hold even after crash and recovery. """ def __init__(self, acceptor_id: int, storage: DurableStorage): self.acceptor_id = acceptor_id self.storage = storage # The highest proposal number we have promised to not accept # proposals below. Initially None (no promises made). self.promised_proposal_number: Optional[int] = None # The highest-numbered proposal we have accepted. # Stored as (proposal_number, value) or None if nothing accepted. self.accepted_proposal: Optional[Tuple[int, Any]] = None # Recover state from storage on startup self._recover_from_storage() def _recover_from_storage(self): """Load persisted state after crash recovery.""" saved = self.storage.load(f"acceptor_{self.acceptor_id}") if saved: self.promised_proposal_number = saved.get("promised") self.accepted_proposal = saved.get("accepted") def _persist(self): """Write current state to durable storage.""" self.storage.save(f"acceptor_{self.acceptor_id}", { "promised": self.promised_proposal_number, "accepted": self.accepted_proposal, }) # MUST call fsync to ensure data reaches disk self.storage.sync() def handle_prepare(self, proposal_number: int) -> PromiseResponse: """ Handle a Prepare(n) request from a proposer. Returns either: - Promise(n, accepted_proposal) if n > promised_proposal_number - Rejection if n <= promised_proposal_number """ if (self.promised_proposal_number is None or proposal_number > self.promised_proposal_number): # Update promise and persist BEFORE responding self.promised_proposal_number = proposal_number self._persist() return PromiseResponse( promised=proposal_number, accepted_proposal=self.accepted_proposal ) else: # Reject: we've already promised a higher number return RejectionResponse( highest_promised=self.promised_proposal_number ) def handle_accept(self, proposal_number: int, value: Any) -> AcceptedResponse: """ Handle an Accept(n, v) request from a proposer. Accepts if n >= promised_proposal_number (not strictly greater, because the proposer who did Prepare(n) should be able to Accept(n, v)). """ if (self.promised_proposal_number is None or proposal_number >= self.promised_proposal_number): # Accept the proposal self.accepted_proposal = (proposal_number, value) self.promised_proposal_number = proposal_number self._persist() return AcceptedResponse( proposal_number=proposal_number, value=value ) else: # Reject: we've promised a higher number return RejectionResponse( highest_promised=self.promised_proposal_number )An acceptor that responds before persisting its state can violate safety. Consider: an acceptor promises proposal 5, responds, then crashes before persisting. On restart, it has no record of the promise and might accept proposal 3. This can cause two different values to be 'chosen.' ALWAYS persist before responding.
Acceptor Invariants:
Acceptors must maintain these invariants at all times:
The Accept Condition Subtlety:
Note that the accept condition is proposal_number >= promised_proposal_number, not strictly greater. This allows the proposer who sent Prepare(n) to subsequently send Accept(n, v). If we required strictly greater, the proposer would need to use different numbers for prepare and accept, complicating the protocol.
Some presentations use strictly greater and require the proposer to use a smaller number for prepare, then use that number for accept. Both approaches are correct; the key is consistency between proposer and acceptor.
Learners are observers that discover which value has been chosen. They don't participate in the voting process but need to know the outcome. In many systems, learners are the clients or application logic that acts on the consensus decision.
Core Responsibility:
A learner's job is to determine when a value has been chosen and what that value is. A value is chosen when a majority of acceptors have accepted it for the same proposal number.
Learner State:
Learners track received acceptances to detect when a value is chosen:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859
class LearnerState: """ Learner state for detecting chosen values. Unlike acceptor state, learner state does not need to be persisted for correctness (though it may be cached for efficiency). """ def __init__(self, learner_id: int, num_acceptors: int): self.learner_id = learner_id self.num_acceptors = num_acceptors self.majority = (num_acceptors // 2) + 1 # Track acceptances by proposal number # Map: proposal_number -> {acceptor_id -> value} self.acceptances: Dict[int, Dict[int, Any]] = defaultdict(dict) # The chosen value, once known self.chosen_value: Optional[Any] = None self.chosen_proposal_number: Optional[int] = None def receive_accepted(self, acceptor_id: int, proposal_number: int, value: Any) -> Optional[Any]: """ Process an Accepted notification from an acceptor. Returns the chosen value if this notification completes a majority, otherwise returns None. """ # If we already know a value is chosen, we're done if self.chosen_value is not None: return self.chosen_value # Record this acceptance self.acceptances[proposal_number][acceptor_id] = value # Check if we have a majority for this proposal acceptors_for_proposal = self.acceptances[proposal_number] if len(acceptors_for_proposal) >= self.majority: # Verify all acceptors agree on the value # (They should, but this is a sanity check) values = set(acceptors_for_proposal.values()) if len(values) == 1: self.chosen_value = value self.chosen_proposal_number = proposal_number return self.chosen_value else: raise ProtocolViolation( f"Acceptors accepted different values for proposal {proposal_number}" ) return None def query_chosen_value(self) -> Optional[Any]: """Return the chosen value if known, otherwise None.""" return self.chosen_valueLearning Strategies:
There are multiple ways for learners to discover chosen values, each with different trade-offs:
| Strategy | Description | Pros | Cons |
|---|---|---|---|
| Direct from Acceptors | Each acceptor sends Accepted to all learners | Immediate notification; fault-tolerant | O(n × m) messages for n acceptors and m learners |
| Via Proposer | Proposer counts acceptances and notifies learners | Fewer messages (proposer aggregates) | Proposer becomes bottleneck; must handle proposer failure |
| Distinguished Learner | One learner receives from acceptors, then broadcasts | Reduces acceptor load | Single point of failure unless replicated |
| Learner Query | Learners query acceptors on-demand | No continuous notification overhead | Higher latency for learning; may miss transient choices |
Most production systems use the 'via proposer' approach, where the proposer (often acting as leader in Multi-Paxos) is responsible for notifying learners. If the proposer fails, a new proposer eventually takes over and can be queried for the chosen value.
Understanding how the three roles interact is essential for implementing and debugging Paxos. Let's trace through a complete consensus round with explicit message types.
The Message Types:
Paxos uses four primary message types:
| Message | Sender → Receiver | Contents | Purpose |
|---|---|---|---|
| Prepare(n) | Proposer → Acceptor | Proposal number n | Request promise to not accept proposals < n |
| Promise(n, accepted) | Acceptor → Proposer | Proposal n, previously accepted (n', v') or null | Commit to not accept < n; report any prior acceptance |
| Accept(n, v) | Proposer → Acceptor | Proposal number n, value v | Request acceptance of value v for proposal n |
| Accepted(n, v) | Acceptor → Proposer/Learner | Proposal number n, value v | Confirm acceptance; enables learning |
Rejection Messages:
While not always explicitly modeled, practical implementations include rejection messages:
When a proposer receives a rejection with the highest promised number P, it should generate its next proposal number to be greater than P. This avoids wasteful retry attempts with numbers that will also be rejected.
In practice, the three Paxos roles are rarely deployed as separate processes. Instead, production systems typically collocate multiple roles on the same physical or virtual node.
Common Collocation Patterns:
| Pattern | Description | When Used |
|---|---|---|
| Full Collocation | Each node runs proposer + acceptor + learner | Most common; simplest deployment; all nodes are peers |
| Separate Learners | Some nodes are proposer+acceptor; others are learner-only | When read-heavy replicas don't need to participate in voting |
| Leader-Based | One node is active proposer; all are acceptors+learners | Multi-Paxos with stable leader; reduces contention |
| Witness Nodes | Some acceptors don't store data, only vote | Reducing storage costs while maintaining quorum sizes |
Full Collocation Example:
Consider a 5-node cluster where each node runs all three roles:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657
class PaxosNode: """ A single node running all three Paxos roles. This is the typical deployment model. """ def __init__(self, node_id: int, cluster_nodes: List[str]): self.node_id = node_id self.cluster_nodes = cluster_nodes self.num_nodes = len(cluster_nodes) # Initialize all three roles on this node self.proposer = ProposerRole( proposer_id=node_id, num_acceptors=self.num_nodes ) self.acceptor = AcceptorRole( acceptor_id=node_id, storage=DurableStorage(f"acceptor_{node_id}.db") ) self.learner = LearnerRole( learner_id=node_id, num_acceptors=self.num_nodes ) # Network layer for sending/receiving messages self.network = PaxosNetwork(node_id, cluster_nodes) async def propose_value(self, value: Any) -> Any: """ Client-facing API: propose a value and return the chosen value. The returned value may differ from the proposed value if an earlier proposal already achieved consensus. """ # Act as proposer chosen = await self.proposer.run_paxos( initial_value=value, acceptors=self.cluster_nodes, network=self.network ) return chosen async def handle_prepare(self, req: PrepareRequest) -> PrepareResponse: """Handle incoming Prepare message - delegate to acceptor role.""" return self.acceptor.handle_prepare(req.proposal_number) async def handle_accept(self, req: AcceptRequest) -> AcceptResponse: """Handle incoming Accept message - delegate to acceptor role.""" response = self.acceptor.handle_accept(req.proposal_number, req.value) # Also notify our local learner if isinstance(response, AcceptedResponse): self.learner.receive_accepted( self.node_id, req.proposal_number, req.value ) return responseEven when collocated, it's good practice to maintain clear separation between role implementations. This makes the code easier to reason about and test. The proposer shouldn't directly manipulate acceptor state, even if they're in the same process.
The durability requirements for each role differ significantly, which has implications for performance and implementation:
Acceptor Durability: Critical
Acceptors MUST persist their state to stable storage before sending any response. This includes:
Without durable storage, an acceptor that crashes and restarts would have no memory of its previous promises, potentially violating safety.
Consider: Acceptor A promises proposal 5, then crashes before persisting. On restart, A has no record of the promise. Proposal 3 arrives, A accepts it (having no memory of promising 5). Meanwhile, other acceptors who received proposal 5 accept its value. Now we have acceptors accepting different values for proposals 3 and 5—potentially leading to different chosen values.
Proposer Durability: Optional but Helpful
Proposer state doesn't strictly need to be persisted for correctness. If a proposer crashes mid-protocol, the protocol simply stalls until another proposer takes over. However, persisting the highest proposal number used helps avoid generating duplicate proposal numbers after restart.
Learner Durability: Not Required
Learners can always re-learn the chosen value by querying acceptors. However, caching learned values improves performance for repeated queries.
| Role | Must Persist | Optional but Helpful | Notes |
|---|---|---|---|
| Acceptor | promised_proposal_number, accepted_proposal | — | Safety depends on persisting before responding |
| Proposer | — | highest_proposal_number_used | Avoids duplicate numbers on restart |
| Learner | — | chosen_value (cache) | Can re-learn if needed |
Recovery Procedures:
When a node restarts, each role follows a specific recovery procedure:
Implementing Paxos correctly is notoriously difficult. Many subtle bugs have been discovered in production implementations. Here are the most common pitfalls, organized by role:
Paxos bugs often only manifest under specific failure scenarios that are rare in normal operation but guaranteed to occur eventually in production. Comprehensive testing requires injecting failures (network partitions, crashes, message delays) and verifying invariants hold.
We've taken a deep dive into the three roles that make Paxos work. Let's consolidate the key takeaways:
What's next:
Now that we understand the roles, we'll examine how they work together in the Two-Phase Protocol. The next page walks through Phase 1 (Prepare/Promise) and Phase 2 (Accept/Accepted) in exhaustive detail, including failure scenarios and recovery paths.
You now understand the three Paxos roles—Proposers, Acceptors, and Learners—their responsibilities, state management, and interactions. This knowledge prepares you for understanding the detailed protocol mechanics in the next page.