Imagine a scenario that seems deceptively simple: three computers need to agree on a single value. Perhaps they're deciding which server should be the leader, or committing a transaction, or choosing a configuration setting. With reliable communication, this would be trivial—everyone broadcasts their vote, and the majority wins.
But what happens when messages can be delayed, lost, or delivered out of order? What if computers can crash and restart at any moment? What if the network partitions, leaving different groups of computers unable to communicate with each other?
These conditions transform a simple voting problem into one of the most profound challenges in computer science: the distributed consensus problem. And the algorithm that first solved this problem—with mathematical rigor and correctness proofs—has shaped every distributed system built in the past three decades.
That algorithm is Paxos.
This page introduces Paxos at a conceptual level. You will understand the historical context that led to Paxos, why achieving consensus is fundamentally hard, what problem Paxos actually solves, and the key insights that make it work. By the end, you'll have a solid foundation for understanding the detailed protocol mechanics in subsequent pages.
Understanding Paxos requires appreciating the intellectual context from which it emerged. By the late 1980s, distributed systems researchers faced a daunting reality: building reliable systems from unreliable components was extraordinarily difficult, and most approaches either sacrificed correctness or availability.
Leslie Lamport, a computer scientist then at Digital Equipment Corporation's Systems Research Center (and later at Microsoft Research), had already made groundbreaking contributions to distributed systems theory, including the foundational work on logical clocks and the happens-before relationship. In 1989, he turned his attention to the consensus problem.
Lamport originally described Paxos through an allegory about the Greek island of Paxos, whose fictional ancient parliament had to pass laws even though its part-time legislators kept wandering in and out of the chamber. This whimsical presentation style was considered unconventional by academic standards, and the paper—'The Part-Time Parliament'—languished for years before finally being published in ACM Transactions on Computer Systems in 1998, nearly a decade after its conception.
Why the long delay mattered:
The delayed publication created an interesting phenomenon. During the 1990s, distributed systems practitioners struggled with consensus problems without access to Paxos. Systems were built with ad-hoc solutions that often had subtle bugs—bugs that only manifested under rare failure conditions that were nearly impossible to test.
When 'The Part-Time Parliament' finally appeared in 1998, many readers found it impenetrable, complaining about the Greek parliament allegory obscuring the technical content. Lamport eventually published a simplified version, 'Paxos Made Simple' (2001), which presents the algorithm more directly.
The impact of Paxos:
Despite—or perhaps because of—its complexity, Paxos has become the theoretical foundation for virtually every production consensus system. Google's Chubby and Spanner, Apache ZooKeeper (via ZAB), and numerous proprietary systems all implement variants of Paxos. Even systems that use different algorithms (like Raft or Viewstamped Replication) are, at their core, solving the same problem with mechanisms that closely mirror Paxos's key ideas.
Lamport himself received the Turing Award in 2013, with Paxos cited as one of his seminal contributions.
Before diving into how Paxos works, we must precisely define the problem it solves. Consensus in distributed systems means getting a collection of independent processes to agree on a single value, despite failures and communication uncertainties.
The formal specification:
A correct consensus algorithm must satisfy three properties:

- Agreement: no two processes decide on different values.
- Validity: the decided value must have been proposed by some process.
- Termination: every non-faulty process eventually decides on a value.
In 1985, Fischer, Lynch, and Paterson proved that in a purely asynchronous system (where message delays are unbounded), no deterministic algorithm can guarantee all three properties if even a single process can fail. This is the famous FLP impossibility result. Paxos works around this by relying on eventual synchrony—assuming that the network will eventually behave well enough to make progress.
Why consensus is hard:
The difficulty of consensus stems from the interplay of failures and asynchrony. Consider these challenges:
| Challenge | Description | Why It Complicates Consensus |
|---|---|---|
| Message Loss | Network packets can be dropped without notification | A process cannot distinguish between a crashed peer and a slow/unreachable one |
| Message Delay | Messages can be delivered after arbitrary delays | A 'late' vote might arrive after a decision has already been made |
| Message Reordering | Messages may arrive in different order than sent | Two processes might see the same events in different sequences |
| Process Crashes | Processes can fail at any point during execution | A process might crash after sending some messages but before others |
| Process Recovery | Crashed processes can restart with persistent state | A recovering process might have outdated information or duplicate messages |
| Network Partitions | Groups of processes become mutually unreachable | Each partition might try to make independent decisions |
The fundamental tension:
Consensus algorithms must navigate a fundamental tension between two requirements:

- Safety: never allow two different values to be chosen, no matter what failures occur.
- Liveness: eventually choose some value, so that the system makes progress.
Under adversarial conditions (arbitrary message delays, arbitrary process failures), these requirements conflict. If we're too cautious (always waiting for confirmation), we risk deadlock. If we're too aggressive (making decisions quickly), conflicting decisions can be made.
Paxos resolves this tension by ensuring safety unconditionally (under all failure scenarios) while providing liveness under reasonable assumptions about eventual network behavior.
It's easy to have a vague sense that Paxos 'solves consensus.' But understanding precisely what Paxos guarantees—and what it does not guarantee—is essential for using it correctly.
Paxos solves single-value consensus:
The basic Paxos algorithm agrees on a single value. Given a set of processes, possibly proposing different values, Paxos ensures that all processes that complete the protocol agree on exactly one of the proposed values.
Once a value is chosen, it is chosen forever. No subsequent execution of the protocol can change it. This immutability is fundamental—it's what allows replicated state machines to maintain consistent histories.
Basic Paxos decides on exactly one value. For practical systems that need to decide on a sequence of values (like a replicated log), Multi-Paxos runs multiple instances of the protocol, one for each log entry. We'll cover Multi-Paxos in a later page.
The system model assumptions:
Paxos operates under specific assumptions about the environment:

- Processes fail by crashing and may later recover; they do not behave maliciously or send corrupted messages (no Byzantine faults).
- Messages can be lost, delayed, duplicated, or reordered, but are not undetectably corrupted.
- Each process has stable storage that survives crashes, so acceptors can remember their promises and accepted values.
- The network is asynchronous, but is assumed to eventually behave well enough for progress (the eventual synchrony mentioned above).
What Paxos does NOT solve:
Understanding Paxos's limitations is as important as understanding its guarantees:

- It assumes crash (non-Byzantine) failures; it does not tolerate malicious or arbitrarily misbehaving nodes.
- It cannot guarantee termination in a fully asynchronous system; per the FLP result, liveness depends on the network eventually behaving well.
- Basic Paxos chooses only a single value; agreeing on a sequence of values (a replicated log) requires Multi-Paxos, covered later.
Before examining the precise mechanics of Paxos, let's build intuition about its core ideas. At its heart, Paxos is remarkably elegant—it's the composition of a few key insights.
Key Insight #1: Majorities Always Overlap
If you have 5 nodes and any decision requires a majority (3 nodes), then any two majorities must share at least one node in common. This overlapping node guarantees information transfer between decisions.
This insight is fundamental: even if the network partitions, at most one partition can contain a majority. And any two majorities will have at least one node that participated in both, carrying information from one decision to the next.
For any two quorums Q1 and Q2 of a group of N nodes, if each quorum contains more than N/2 nodes, then Q1 ∩ Q2 ≠ ∅. This simple mathematical property is the foundation upon which Paxos's safety is built.
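To see this concretely, here is a minimal Python sketch (purely illustrative, not part of any Paxos implementation) that enumerates every majority quorum of a 5-node cluster and checks that each pair overlaps:

```python
from itertools import combinations

def majority_quorums(n):
    """Return every subset of {0..n-1} whose size is a strict majority."""
    size = n // 2 + 1
    return [set(q) for q in combinations(range(n), size)]

# Every pair of majority quorums in a 5-node cluster shares at least one node.
quorums = majority_quorums(5)
assert all(q1 & q2 for q1 in quorums for q2 in quorums)
print(f"{len(quorums)} majority quorums of 5 nodes; every pair overlaps")
```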
Key Insight #2: Proposal Numbers Create Total Ordering
In a distributed system, it's difficult to establish a global notion of time. Paxos sidesteps this by using proposal numbers (also called ballot numbers)—unique, ordered identifiers that establish a total ordering among proposals.
Each proposer generates proposal numbers that are guaranteed to be unique (typically by incorporating the proposer's ID) and monotonically increasing. This ordering allows Paxos to reason about 'newer' and 'older' proposals without relying on synchronized clocks.
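One common scheme, shown here as an illustrative sketch rather than the only possible one, is to build proposal numbers as (round, proposer_id) pairs that compare lexicographically; the round counter supplies monotonic growth and the proposer ID guarantees uniqueness:

```python
class ProposalNumberGenerator:
    """Proposal numbers as (round, proposer_id) pairs.

    Tuples compare lexicographically, so all proposers' numbers fall into a
    single total order, and embedding proposer_id makes every number unique.
    """

    def __init__(self, proposer_id):
        self.proposer_id = proposer_id
        self.round = 0

    def next(self, highest_seen=None):
        # Jump past any higher-numbered proposal observed from other proposers.
        if highest_seen is not None:
            self.round = max(self.round, highest_seen[0])
        self.round += 1
        return (self.round, self.proposer_id)


gen = ProposalNumberGenerator(proposer_id=2)
print(gen.next())            # (1, 2)
print(gen.next((7, 0)))      # (8, 2) -- strictly greater than the observed (7, 0)
```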
Key Insight #3: Promise Before Accept
Paxos uses a two-phase approach:
Phase 1 (Prepare): A proposer asks acceptors to promise not to accept any proposal numbered lower than its current proposal number. This 'locks out' lower-numbered proposals.
Phase 2 (Accept): If the proposer receives promises from a majority, it asks acceptors to accept a value. Importantly, the proposer may be required to propose a value that was previously accepted (discovered in Phase 1) rather than its own preferred value.
This two-phase structure ensures that once a value is accepted by a majority, all future proposals will discover and propagate that value.
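The sketch below shows a simplified acceptor for the two phases (message handling and crash-recovery persistence omitted): it promises in Phase 1 only for proposal numbers higher than any it has already promised, and accepts in Phase 2 only if that promise still stands.

```python
class Acceptor:
    def __init__(self):
        self.promised = None       # highest proposal number promised so far
        self.accepted_num = None   # proposal number of the accepted proposal
        self.accepted_val = None   # the accepted value itself

    def on_prepare(self, n):
        """Phase 1: promise not to accept proposals numbered below n."""
        if self.promised is None or n > self.promised:
            self.promised = n
            # Report any previously accepted proposal so the proposer can adopt it.
            return ("promise", n, self.accepted_num, self.accepted_val)
        return ("reject", n)

    def on_accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise has been made."""
        if self.promised is None or n >= self.promised:
            self.promised = n
            self.accepted_num = n
            self.accepted_val = value
            return ("accepted", n, value)
        return ("reject", n)
```

A real acceptor must write its promised and accepted state to stable storage before replying; otherwise a crash and restart could let it break an earlier promise.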
Key Insight #4: Values are 'Stickier' Than They Appear
Here's the crucial subtlety: once a value has been accepted by a majority of acceptors in round N, any proposer attempting a higher-numbered round N+k will discover that value during Phase 1 (because it must contact a majority, which overlaps with the previous accepting majority) and will be forced to propose that same value.
This creates a kind of 'value stickiness'—values, once accepted by a majority, propagate forward to all future proposals, even if the original proposer has crashed.
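Here is a minimal sketch of the proposer-side rule that produces this stickiness, assuming each Phase 1 promise reports the acceptor's highest accepted proposal (as in the acceptor sketch above): adopt the value of the highest-numbered accepted proposal among the promises, and fall back to your own value only if no acceptor has accepted anything.

```python
def choose_value_to_propose(promises, my_value):
    """promises: (accepted_num, accepted_val) pairs from a majority of acceptors,
    where accepted_num is None if that acceptor has not accepted anything yet."""
    accepted = [(num, val) for num, val in promises if num is not None]
    if accepted:
        # Forced to adopt the value of the highest-numbered accepted proposal.
        _, value = max(accepted, key=lambda pair: pair[0])
        return value
    return my_value  # free to propose our own value

# No acceptor has accepted anything yet -> we may propose our own value.
assert choose_value_to_propose([(None, None), (None, None)], "X") == "X"
# Some acceptor already accepted "Y" at proposal 4 -> we must propose "Y".
assert choose_value_to_propose([(None, None), (4, "Y"), (2, "Z")], "X") == "Y"
```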
Paxos doesn't rely on locking, a designated coordinator, or a blocking two-phase commit. Instead, it uses the combination of majority quorums, proposal ordering, and the prepare-accept pattern to achieve consensus. The algorithm is so simple in its core logic that many engineers, upon first understanding it, wonder how it took so long to discover.
Paxos defines three logical roles that participants can play. In practice, the same physical node often plays multiple roles, but understanding the roles separately clarifies the algorithm's structure.
| Role | Responsibility | Key Behaviors |
|---|---|---|
| Proposer | Proposes values to be chosen | Generates unique proposal numbers; runs the two-phase protocol; may learn of previously accepted values and adopt them |
| Acceptor | Votes on proposals; remembers accepted values | Promises not to accept lower-numbered proposals; accepts values from valid proposals; persists state to stable storage |
| Learner | Learns the chosen value once consensus is reached | Receives notification when a value is chosen; does not participate in the voting process itself |
Role interactions:
Proposers ↔ Acceptors: Proposers send Prepare and Accept messages to acceptors. Acceptors respond with Promise and Accepted messages.
Acceptors ↔ Learners: Once an acceptor has accepted a value, it can inform learners. When learners receive accepted messages from a majority of acceptors for the same proposal number and value, they know the value is chosen.
Proposer = Learner (often): In many implementations, the proposer also acts as a learner, discovering whether its proposal was successful.
In typical deployments, each server runs all three roles. For example, with 5 servers, each server is a proposer (can propose values), an acceptor (votes on proposals), and a learner (learns the outcome). This collocation simplifies deployment while maintaining logical separation.
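As a rough illustration of the learner rule described above (the message format here is hypothetical), a learner can simply count Accepted messages per (proposal number, value) pair and declare the value chosen once a majority of acceptors report the same pair:

```python
from collections import Counter

class Learner:
    def __init__(self, num_acceptors):
        self.majority = num_acceptors // 2 + 1
        self.votes = Counter()   # (proposal_num, value) -> number of acceptors
        self.seen = set()        # (acceptor_id, proposal_num, value) already counted

    def on_accepted(self, acceptor_id, proposal_num, value):
        """Count Accepted messages; return the value once a majority agrees."""
        key = (acceptor_id, proposal_num, value)
        if key in self.seen:
            return None          # ignore duplicate deliveries of the same message
        self.seen.add(key)
        self.votes[(proposal_num, value)] += 1
        if self.votes[(proposal_num, value)] >= self.majority:
            return value         # a majority accepted this proposal: chosen
        return None


learner = Learner(num_acceptors=3)
assert learner.on_accepted("a1", 5, "X") is None   # only 1 of 3 acceptors so far
assert learner.on_accepted("a2", 5, "X") == "X"    # 2 of 3 acceptors: chosen
```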
The cluster size question:
Paxos requires a majority of acceptors to make progress. For a cluster of N acceptors, a majority is ⌊N/2⌋ + 1 nodes, so the cluster can continue operating as long as no more than N − (⌊N/2⌋ + 1) acceptors have failed.
Common cluster sizes:
| Cluster Size | Majority Required | Tolerated Failures | Notes |
|---|---|---|---|
| 3 nodes | 2 | 1 | Minimum useful size; survives single failure |
| 5 nodes | 3 | 2 | Common production choice; good balance of safety and overhead |
| 7 nodes | 4 | 3 | Higher fault tolerance; increased message overhead |
| 2F+1 nodes | F+1 | F | General formula for tolerating F failures |
A 4-node cluster requires 3 nodes for a majority and tolerates only 1 failure—the same fault tolerance as a 3-node cluster. Even-numbered clusters provide no additional fault tolerance while increasing communication overhead. This is why production Paxos deployments almost always use odd numbers of nodes.
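The table's arithmetic can be captured in a tiny helper (an illustrative sketch): a cluster of N acceptors needs ⌊N/2⌋ + 1 nodes for a majority and tolerates the remaining N − (⌊N/2⌋ + 1) failures, which is why 3-node and 4-node clusters tolerate the same single failure.

```python
def quorum_math(n):
    """Return (majority size, tolerated failures) for an n-acceptor cluster."""
    majority = n // 2 + 1
    return majority, n - majority

for n in (3, 4, 5, 7):
    majority, tolerated = quorum_math(n)
    print(f"{n} nodes: majority={majority}, tolerates {tolerated} failure(s)")
# 3 nodes: majority=2, tolerates 1 failure(s)
# 4 nodes: majority=3, tolerates 1 failure(s)   <- no gain over 3 nodes
# 5 nodes: majority=3, tolerates 2 failure(s)
# 7 nodes: majority=4, tolerates 3 failure(s)
```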
Understanding Paxos is not merely academic—it's fundamentally practical for system design. Here's why it matters:
Real-world systems built on Paxos:
| System | Algorithm | Use Case |
|---|---|---|
| Google Chubby | Paxos | Distributed lock service, name service |
| Google Spanner | Multi-Paxos + TrueTime | Globally-distributed relational database |
| Google Megastore | Paxos | Structured storage for Google applications |
| Apache ZooKeeper | ZAB (Paxos-inspired atomic broadcast) | Coordination service for distributed applications |
| etcd | Raft (Paxos equivalent) | Distributed key-value store, Kubernetes state store |
| CockroachDB | Raft | Distributed SQL database |
| TiDB/TiKV | Raft | Distributed HTAP database |
| Consul | Raft | Service mesh, configuration management |
You'll notice many systems use Raft rather than Paxos directly. Raft, introduced in 2014, was designed to be more understandable than Paxos while solving the same problem. Raft makes specific design choices (e.g., leader-based log replication) that Paxos leaves open. We'll compare these algorithms in later modules.
Lamport's original Paxos has spawned a family of variants, each optimizing for different use cases. Understanding this landscape helps you choose the right approach for your system.
Basic Paxos (Single-Decree Paxos):
This is the foundational algorithm we'll study in detail. It achieves consensus on a single value. Every other Paxos variant is built upon or derived from this core algorithm.
Multi-Paxos:
Extends Basic Paxos to agree on a sequence of values (a log). This is what production systems actually implement. Multi-Paxos introduces optimizations that make it much more efficient than running separate instances of Basic Paxos.
While these variants exist, mastering Basic Paxos is essential before exploring them. The principles of Basic Paxos—quorum intersection, proposal ordering, the prepare-accept pattern—appear in all variants. Once you deeply understand Basic Paxos, the variants become natural extensions rather than new algorithms.
We've covered substantial ground, establishing the foundation for understanding Paxos. Let's consolidate the key takeaways:

- Paxos, introduced by Leslie Lamport in 'The Part-Time Parliament', is the theoretical foundation for virtually every production consensus system.
- Consensus means getting independent processes to agree on a single value despite message loss, delays, reordering, crashes, and partitions; the FLP result shows no deterministic algorithm can guarantee this in a purely asynchronous system.
- Paxos guarantees safety unconditionally and provides liveness under reasonable assumptions about eventual network behavior.
- Its core ideas are overlapping majority quorums, totally ordered proposal numbers, and the two-phase prepare-accept pattern that makes chosen values 'sticky'.
- The protocol defines three logical roles (proposer, acceptor, learner) that are usually collocated on each server.
- A cluster of 2F+1 acceptors tolerates F failures, which is why production deployments use odd sizes such as 3, 5, or 7.
What's next:
Now that we understand what Paxos is and why it matters, we'll examine the roles in detail. The next page explores Proposers, Acceptors, and Learners—the three actors in the Paxos protocol—and their precise responsibilities, state management, and interactions.
You now understand the historical context, problem definition, and core intuitions behind Paxos. This foundation prepares you for the detailed protocol mechanics ahead.