The Two-Phase Commit protocol has a fundamental weakness: it can block indefinitely when the coordinator fails while participants are in the PREPARED state. This blocking problem motivated the development of the Three-Phase Commit (3PC) protocol, introduced by Dale Skeen in 1981.
3PC adds an additional intermediate phase between PREPARE and COMMIT, creating a buffer that allows participants to make progress even when the coordinator becomes unreachable. The key insight is that by introducing a pre-commit phase, participants gain enough information to safely resolve the transaction without the coordinator.
However, 3PC is not a silver bullet. It eliminates blocking only under certain failure assumptions, it adds complexity and latency, and, critically, it can violate atomicity when the network partitions. Understanding 3PC's design, benefits, and limitations provides essential insight into the fundamental trade-offs in distributed transaction protocols.
By the end of this page, you will understand the Three-Phase Commit protocol's motivation and design. You'll comprehend its three phases in detail, understand why it's non-blocking under certain conditions, recognize its limitations (especially with network partitions), and be able to compare 2PC and 3PC for different scenarios.
To understand 3PC, we must first deeply analyze why 2PC blocks and what properties would be needed to avoid blocking.
The Root Cause of 2PC Blocking:
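In 2PC, when a participant enters the PREPARED state:
- It has voted YES and can no longer abort unilaterally.
- It does not know how the other participants voted.
- It must wait for the coordinator's decision.

If the coordinator fails, the participant is stuck. It cannot:
- Commit, because another participant may have voted NO, making the outcome ABORT.
- Abort, because the coordinator may already have decided COMMIT and told another participant.
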
The problem is asymmetric information: from the PREPARED state, the participant cannot distinguish between a future COMMIT and a future ABORT.
In 2PC, there's a direct transition from PREPARED to COMMITTED. A participant can be in PREPARED while another is in COMMITTED—an inconsistent situation where resolution requires the coordinator. If we insert an intermediate state that ALL participants must pass through before COMMITTED, no participant can be COMMITTED while others are still uncertain.
The 3PC Solution: Add a Pre-Commit Phase
3PC inserts a PRE-COMMIT phase between PREPARED and COMMITTED:
The crucial property: Before any participant receives DO_COMMIT (and enters COMMITTED), all participants have received PRE_COMMIT (and entered PRE-COMMITTED state).
This means if the coordinator fails and any participant is in PRE-COMMITTED state, that participant knows:
- Every participant voted YES.
- The coordinator had decided to commit.

This additional knowledge enables non-blocking termination.
| Scenario | 2PC Behavior | 3PC Behavior |
|---|---|---|
| Coordinator fails during vote collection | Participants that have not voted abort; those that voted YES block until recovery | Participants time out and abort |
| Coordinator fails after decision, before any notification | All PREPARED participants blocked | Survivors are all UNCERTAIN and abort safely |
| Coordinator fails after some but not all notified | Some committed, others blocked | Survivors find a PRE-COMMITTED or COMMITTED participant and commit |
| Network partition during commit phase | Blocked | May commit or abort incorrectly (see limitations) |
Let's examine each phase of the 3PC protocol in rigorous detail.
Phase 1: Can Commit (Voting Phase)
This phase is essentially identical to 2PC's prepare phase:
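1. The coordinator sends a CAN_COMMIT message to all participants.
2. Each participant that can commit responds with YES_VOTE and enters UNCERTAIN state.
3. Each participant that cannot commit responds with NO_VOTE and aborts locally.
4. The coordinator collects the responses:
   - All YES_VOTE: proceed to Phase 2
   - Any NO_VOTE or timeout: send ABORT to all and terminate

Important: At the end of Phase 1, if the coordinator hasn't received all YES_VOTEs, it aborts immediately. There is no ambiguity: the coordinator knows the transaction will abort.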
Phase 2: Pre-Commit (Prepare to Commit Phase)
This is the new phase that distinguishes 3PC from 2PC:
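1. The coordinator sends a PRE_COMMIT message to all participants.
2. Each participant enters PRE-COMMITTED state and responds with ACK_PRE_COMMIT.

Critical Property: Receiving PRE_COMMIT tells a participant that:
- Every participant voted YES.
- The coordinator has decided to commit.
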
This is the key insight: the PRE-COMMITTED state indicates consensus to commit without irrevocable commitment.
Phase 3: Do Commit (Commit Phase)
The final phase mirrors 2PC's commit phase:
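1. Once every participant has acknowledged PRE_COMMIT, the coordinator sends a DO_COMMIT message to all participants.
2. Each participant commits the transaction and responds with DONE.

The Complete Message Flow: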
| Phase | Coordinator Message | Participant Response | State Transition |
|---|---|---|---|
| Phase 1 | CAN_COMMIT | YES_VOTE | INITIAL → UNCERTAIN |
| Phase 2 | PRE_COMMIT | ACK_PRE_COMMIT | UNCERTAIN → PRE-COMMITTED |
| Phase 3 | DO_COMMIT | DONE | PRE-COMMITTED → COMMITTED |
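
The message flow in the table can be traced with a small in-memory simulation. This is a minimal sketch of the failure-free path only; the `Participant` class, its `will_vote_yes` flag, and `run_three_phase_commit` are hypothetical names chosen for illustration, and real messages are replaced by direct method calls.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    will_vote_yes: bool = True  # hypothetical knob standing in for local validation
    state: str = "INITIAL"

    def on_can_commit(self) -> str:
        # Phase 1: vote, then either wait (UNCERTAIN) or abort locally.
        if self.will_vote_yes:
            self.state = "UNCERTAIN"
            return "YES_VOTE"
        self.state = "ABORTED"
        return "NO_VOTE"

    def on_pre_commit(self) -> str:
        # Phase 2: the decision to commit is now known, but nothing is committed yet.
        self.state = "PRE-COMMITTED"
        return "ACK_PRE_COMMIT"

    def on_do_commit(self) -> str:
        # Phase 3: commit for real.
        self.state = "COMMITTED"
        return "DONE"

    def on_abort(self) -> None:
        self.state = "ABORTED"


def run_three_phase_commit(participants: list[Participant]) -> str:
    """Failure-free coordinator logic; returns the global outcome."""
    votes = [p.on_can_commit() for p in participants]
    if any(vote != "YES_VOTE" for vote in votes):
        for p in participants:
            p.on_abort()
        return "ABORTED"
    # Every participant acknowledges PRE_COMMIT before any DO_COMMIT is sent.
    acks = [p.on_pre_commit() for p in participants]
    if any(ack != "ACK_PRE_COMMIT" for ack in acks):
        raise RuntimeError("missing ACK_PRE_COMMIT")
    for p in participants:
        p.on_do_commit()
    return "COMMITTED"
```

Calling `run_three_phase_commit([Participant("a"), Participant("b")])` walks both participants through INITIAL, UNCERTAIN, PRE-COMMITTED, and COMMITTED in lockstep; a single participant created with `will_vote_yes=False` sends the whole group to ABORTED.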
Understanding the state machines for both coordinator and participant is essential for reasoning about 3PC's non-blocking properties.
Coordinator State Machine:
Participant State Machine:
The Key Non-Blocking Property:
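Notice that in 3PC, before any participant reaches COMMITTED, all participants must pass through PRE-COMMITTED. This means:
- If any participant is still UNCERTAIN, no participant has committed, so aborting is safe.
- If any participant is PRE-COMMITTED, every participant voted YES, so committing is safe.
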
In 3PC, timeout behavior differs by state: In UNCERTAIN state, timeout leads to abort (safe because no one committed). In PRE-COMMITTED state, timeout can lead to commit (safe because all voted yes and coordinator intended to commit). This asymmetry is what enables non-blocking termination.
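
One way to picture the participant side is as a transition table keyed by state and event. The sketch below is an illustration under assumptions: the event names `VOTE_YES` and `VOTE_NO` (meaning "received CAN_COMMIT and voted that way") and `TIMEOUT` are hypothetical labels, and the timeout rows encode only the simplified rule stated above.

```python
# (current state, event) -> next state for a 3PC participant.
TRANSITIONS = {
    ("INITIAL", "VOTE_YES"): "UNCERTAIN",
    ("INITIAL", "VOTE_NO"): "ABORTED",
    ("UNCERTAIN", "PRE_COMMIT"): "PRE-COMMITTED",
    ("UNCERTAIN", "ABORT"): "ABORTED",
    ("PRE-COMMITTED", "DO_COMMIT"): "COMMITTED",
    # Timeout asymmetry: abort while UNCERTAIN, commit once PRE-COMMITTED.
    ("UNCERTAIN", "TIMEOUT"): "ABORTED",
    ("PRE-COMMITTED", "TIMEOUT"): "COMMITTED",
}

TERMINAL_STATES = {"COMMITTED", "ABORTED"}


def step(state: str, event: str) -> str:
    """Apply one event; terminal states absorb every later event."""
    if state in TERMINAL_STATES:
        return state
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"unexpected {event!r} in state {state!r}") from None


def run(events: list[str], state: str = "INITIAL") -> str:
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = step(state, event)
    return state
```

For example, `run(["VOTE_YES", "TIMEOUT"])` ends in ABORTED, while `run(["VOTE_YES", "PRE_COMMIT", "TIMEOUT"])` ends in COMMITTED, which is exactly the asymmetry described above.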
The primary advantage of 3PC over 2PC is its non-blocking property under certain failure assumptions. Let's understand how this works.
The Termination Protocol:
When the coordinator fails and a participant times out waiting for a message, the participant initiates a termination protocol: the surviving participants exchange their current states and apply the following rules.
| States Found Among Survivors | Decision | Rationale |
|---|---|---|
| Any COMMITTED | COMMIT | Someone already committed; all must commit |
| Any ABORTED | ABORT | Someone already aborted; all must abort |
| Any PRE-COMMITTED, none COMMITTED/ABORTED | COMMIT | All agreed, coordinator intended to commit |
| All UNCERTAIN | ABORT | No one reached PRE-COMMIT; safe to abort |
| Mix of UNCERTAIN and PRE-COMMITTED | COMMIT | PRE-COMMITTED proves unanimous YES |
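
The decision table above translates almost directly into code. The following is a minimal sketch, assuming survivors report their states as plain strings; the function name `termination_decision` is hypothetical, and the sketch covers only the decision rule, not the election of a new coordinator or the message exchange.

```python
KNOWN_STATES = {"UNCERTAIN", "PRE-COMMITTED", "COMMITTED", "ABORTED"}


def termination_decision(survivor_states: list[str]) -> str:
    """Return "COMMIT" or "ABORT" from the states found among survivors."""
    states = set(survivor_states)
    if not states:
        raise ValueError("no survivors to consult")
    unknown = states - KNOWN_STATES
    if unknown:
        raise ValueError(f"unknown states: {sorted(unknown)}")
    if "COMMITTED" in states and "ABORTED" in states:
        # Cannot happen if the protocol's invariants held; refuse to guess.
        raise ValueError("survivors already diverged")
    if "COMMITTED" in states:
        return "COMMIT"  # someone already committed
    if "ABORTED" in states:
        return "ABORT"   # someone already aborted
    if "PRE-COMMITTED" in states:
        return "COMMIT"  # proves every participant voted YES
    return "ABORT"       # all UNCERTAIN: nobody can have committed
```

The mixed row of the table needs no special case: `termination_decision(["UNCERTAIN", "PRE-COMMITTED"])` falls through to the PRE-COMMITTED check and returns "COMMIT".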
Why This Works:
The key invariant that 3PC maintains is:
No participant can be in COMMITTED state while any other participant is in UNCERTAIN state.
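This is because:
- The coordinator sends DO_COMMIT only after every participant has acknowledged PRE_COMMIT.
- A participant that has acknowledged PRE_COMMIT has already left the UNCERTAIN state.
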
Contrast with 2PC:
In 2PC, a participant in PREPARED state (equivalent to UNCERTAIN) cannot know if another participant has committed. In 3PC, if any participant is merely UNCERTAIN, we know for certain that no participant is COMMITTED—enabling safe abort.
Think of PRE-COMMITTED as a 'buffer zone' between uncertainty and commitment. All participants must pass through this buffer before any can commit. This synchronization point gives survivors enough information to resolve the transaction without the original coordinator.
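
The contrast with 2PC can be made concrete by writing both survivor-side rules next to each other. This is a sketch under assumptions: the 2PC function models a cooperative termination rule in which survivors compare states, which the text above does not spell out, and both function names are hypothetical.

```python
def two_pc_termination(survivor_states: list[str]) -> str:
    """Cooperative termination for 2PC survivors (assumed rule)."""
    states = set(survivor_states)
    if "COMMITTED" in states:
        return "COMMIT"
    if "ABORTED" in states or "INITIAL" in states:
        # Someone aborted, or someone never voted, so COMMIT was impossible.
        return "ABORT"
    # Everyone is PREPARED: the failed coordinator may have decided either way.
    return "BLOCK"


def three_pc_termination(survivor_states: list[str]) -> str:
    """Termination for 3PC survivors, assuming crash failures only."""
    states = set(survivor_states)
    if "COMMITTED" in states or "PRE-COMMITTED" in states:
        return "COMMIT"
    # ABORTED or all UNCERTAIN: the invariant guarantees nobody committed.
    return "ABORT"
```

With every survivor waiting on the coordinator, `two_pc_termination(["PREPARED", "PREPARED"])` returns "BLOCK", whereas `three_pc_termination(["UNCERTAIN", "UNCERTAIN"])` returns "ABORT". The extra state is what turns the undecidable case into a decidable one.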
Let's trace through how 3PC handles various failure scenarios, comparing with 2PC behavior.
Scenario 5.1: Coordinator Fails After Receiving All YES_VOTEs, Before Sending PRE_COMMIT
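In 2PC: All participants are in PREPARED state and block until the coordinator recovers.
In 3PC:
- All surviving participants are in UNCERTAIN state.
- They time out, run the termination protocol, and find no PRE-COMMITTED participant.
- Decision: ABORT.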
Scenario 5.2: Coordinator Fails After Sending Some PRE_COMMIT Messages
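In 3PC:
- Some participants are PRE-COMMITTED; the rest are still UNCERTAIN.
- The termination protocol finds at least one PRE-COMMITTED participant, which proves that every participant voted YES.
- Decision: COMMIT.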
Scenario 5.3: Coordinator Fails After Sending All PRE_COMMIT, Before Sending DO_COMMIT
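In 3PC:
- All participants are PRE-COMMITTED; none has committed yet.
- The termination protocol finds only PRE-COMMITTED participants.
- Decision: COMMIT.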
Scenario 5.4: Coordinator Fails After Sending Some DO_COMMIT Messages
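In 3PC:
- Some participants are COMMITTED; the rest are PRE-COMMITTED.
- The termination protocol finds a COMMITTED participant.
- Decision: COMMIT.
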
This scenario is similar to 2PC in the sense that the decision is already determined, but unlike 2PC, participants can resolve it themselves without waiting for coordinator recovery.
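
The four scenarios differ only in how many messages the coordinator managed to send before crashing, so they can be checked mechanically. The sketch below is a toy model under stated assumptions: all participants voted YES and survive, the coordinator sends all PRE_COMMIT messages and then all DO_COMMIT messages in participant order, and every sent message is delivered. The helper names are hypothetical.

```python
def states_after_crash(n: int, messages_sent: int) -> list[str]:
    """Participant states when the coordinator crashes after `messages_sent`
    of its 2n messages (n PRE_COMMITs followed by n DO_COMMITs)."""
    states = ["UNCERTAIN"] * n
    for i in range(min(messages_sent, n)):
        states[i] = "PRE-COMMITTED"
    for i in range(max(0, messages_sent - n)):
        states[i] = "COMMITTED"
    return states


def decide(states: list[str]) -> str:
    """Termination rule: commit if anyone got past UNCERTAIN, else abort."""
    if "COMMITTED" in states or "PRE-COMMITTED" in states:
        return "COMMIT"
    return "ABORT"


def check_all_crash_points(n: int) -> dict[int, str]:
    """Map each crash point to the survivors' decision, checking the invariant."""
    outcomes = {}
    for sent in range(2 * n + 1):
        states = states_after_crash(n, sent)
        # Invariant: nobody is COMMITTED while anybody is still UNCERTAIN.
        if "COMMITTED" in states and "UNCERTAIN" in states:
            raise AssertionError(f"invariant broken at crash point {sent}")
        outcomes[sent] = decide(states)
    return outcomes
```

For three participants, crash point 0 corresponds to Scenario 5.1 and yields ABORT; every later crash point (Scenarios 5.2 through 5.4) yields COMMIT, and the invariant check never fires.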
While 3PC eliminates blocking under crash failures, it has a critical vulnerability: network partitions can cause inconsistency. This is 3PC's most significant limitation.
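The Dangerous Scenario:
1. All participants vote YES, and the coordinator begins sending PRE_COMMIT.
2. A network partition occurs after only some participants have received PRE_COMMIT, and the coordinator becomes unreachable.
3. On one side of the partition, at least one participant is PRE-COMMITTED, so the termination protocol there decides COMMIT.
4. On the other side, every participant is UNCERTAIN, so the termination protocol there decides ABORT.
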
This scenario violates the fundamental atomicity property: different participants have reached different terminal states. Unlike 2PC's blocking (which sacrifices liveness for safety), 3PC's partition vulnerability sacrifices safety for liveness. Neither is acceptable in all scenarios.
Why This Happens:
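The termination protocol assumes that if no participant is PRE-COMMITTED, it's safe to abort. But during a partition:
- Participants can query only the members on their own side.
- A PRE-COMMITTED participant on the other side is indistinguishable from a crashed one.
- Each side therefore applies the termination rules to an incomplete view.
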
The Fundamental Trade-off:
3PC's termination protocol assumes that all surviving participants can communicate with each other. With network partitions, this assumption breaks down. Each partition sees only a subset of participants and may make inconsistent decisions.
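
The divergence is easy to reproduce by running the same termination rule independently on each side of a partition. This is an illustrative sketch; the participant names `P1` to `P3` and the helper `partitioned_outcomes` are hypothetical.

```python
def decide(states: list[str]) -> str:
    """Termination rule applied to whatever states a group can see."""
    if "COMMITTED" in states or "PRE-COMMITTED" in states:
        return "COMMIT"
    return "ABORT"


def partitioned_outcomes(
    states: dict[str, str], groups: list[list[str]]
) -> list[str]:
    """Each group decides using only the states of its own members."""
    return [decide([states[name] for name in group]) for group in groups]


# The coordinator reached only P1 with PRE_COMMIT before the partition.
states = {"P1": "PRE-COMMITTED", "P2": "UNCERTAIN", "P3": "UNCERTAIN"}

connected = partitioned_outcomes(states, [["P1", "P2", "P3"]])
split = partitioned_outcomes(states, [["P1"], ["P2", "P3"]])
```

Here `connected` is `["COMMIT"]`, because the full group sees P1's state, while `split` is `["COMMIT", "ABORT"]`: the two sides terminate the same transaction differently.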
Quorum-Based Solutions:
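Some 3PC variants use quorum-based voting to mitigate this:
- A group of survivors may terminate the transaction only if it contains a quorum (for example, a majority) of the participants.
- Groups without a quorum wait until the partition heals.
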
However, this reintroduces some blocking (though less than 2PC) and requires careful configuration of quorum sizes.
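
A rough sketch of the quorum idea follows. It is deliberately simplified and assumes a plain majority quorum; published quorum-based commit protocols are more involved (for example, they may use separate commit and abort quorums), so treat `quorum_decision` as a hypothetical illustration rather than a complete protocol.

```python
def quorum_decision(reachable_states: list[str], total_participants: int) -> str:
    """Decide only when a majority of all participants is reachable."""
    quorum = total_participants // 2 + 1  # assumed: simple majority
    if len(reachable_states) < quorum:
        return "BLOCK"  # minority side waits for the partition to heal
    if "COMMITTED" in reachable_states or "PRE-COMMITTED" in reachable_states:
        return "COMMIT"
    return "ABORT"
```

In the partition example with three participants, the side holding only the PRE-COMMITTED participant gets `quorum_decision(["PRE-COMMITTED"], 3) == "BLOCK"`, while the majority side gets `quorum_decision(["UNCERTAIN", "UNCERTAIN"], 3) == "ABORT"`. Only one side decides, at the cost of blocking the other.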
Let's systematically compare the two protocols across multiple dimensions to understand when each is appropriate.
| Dimension | 2PC | 3PC |
|---|---|---|
| Message Complexity | 4n messages (2 rounds) | 6n messages (3 rounds) |
| Latency | 2 round trips to complete | 3 round trips to complete |
| Blocking under Crash Failures | Yes (coordinator crash blocks) | No (survivors can terminate) |
| Safety under Partitions | Safe (blocks rather than diverge) | UNSAFE (may diverge) |
| Implementation Complexity | Moderate | Higher |
| Practical Adoption | Widespread (XA, databases) | Rare |
| Log Writes per Transaction | 2-3 forced writes | 3-4 forced writes |
| Recovery Complexity | Moderate | Higher |
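
The message and latency rows follow from simple arithmetic: each round is one message from the coordinator to every participant plus one reply. A small sketch, with `commit_cost` as a hypothetical helper name:

```python
def commit_cost(protocol: str, participants: int) -> dict[str, int]:
    """Failure-free message count and round trips for n participants."""
    round_trips = {"2PC": 2, "3PC": 3}[protocol]
    # Each round trip is n requests out and n responses back.
    return {
        "messages": 2 * round_trips * participants,
        "round_trips": round_trips,
    }
```

For five participants this gives 20 messages for 2PC and 30 for 3PC, matching the 4n and 6n entries in the table.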
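When to Use 2PC:
- Network partitions are possible and divergent outcomes are unacceptable.
- The coordinator can be made highly available, for example by replicating its state.

When to Use 3PC:
- Failures are limited to crashes, and the network is reliable enough that partitions can be ruled out.
- Blocking on coordinator failure is unacceptable, and the extra round trip is affordable.
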
In practice, 2PC with highly available coordinators is more common than 3PC. Modern systems such as CockroachDB, Spanner, and TiDB layer 2PC-style commit on top of Paxos- or Raft-replicated state. A coordinator failure then triggers a leader election instead of an indefinite block, and safety is preserved during partitions.
Both 2PC and 3PC have limitations. Modern distributed databases have evolved various approaches that combine ideas from both protocols with consensus algorithms.
1. Paxos Commit
Instead of a single coordinator, use a Paxos group as the coordinator:
2. Raft-Based Coordination
Similar to Paxos Commit but using Raft consensus:
3. RAMP Transactions
Read Atomic Multi-Partition transactions:
4. Saga Pattern
Compensating transactions instead of distributed commit:
| Approach | Blocking Risk | Partition Safety | Complexity | Consistency |
|---|---|---|---|---|
| 2PC | High | Safe (blocks) | Moderate | Strong |
| 3PC | Low (crash only) | UNSAFE | Higher | Strong (absent partitions) |
| Paxos/Raft + 2PC | Very Low | Safe | High | Strong |
| Sagas | None | Safe | Moderate | Eventual |
| RAMP | None | Safe | Moderate | Read Atomic |
The industry trend is toward '2PC + Consensus' hybrids. By replicating coordinator state via Paxos or Raft, systems get the safety of 2PC (no divergence during partitions) with the liveness of 3PC (minimal blocking because coordinator failure just triggers leader election). This is the approach used by most modern NewSQL databases.
Why Pure 3PC Isn't Used:
Partition Unsafety: The fundamental partition problem is too dangerous for production systems where network issues are common.
Complexity: The extra phase adds implementation and testing complexity without solving the hard cases (partitions).
Better Alternatives Exist: Paxos/Raft + 2PC achieves 3PC's non-blocking goals while maintaining partition safety.
Latency Matters: The extra round trip hurts performance without enough compensating benefit.
However, understanding 3PC remains valuable as it illuminates the fundamental trade-offs in distributed consensus and provides insight into why more sophisticated solutions evolved.
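We've explored the Three-Phase Commit protocol: its motivation, mechanics, advantages, and critical limitations. Let's consolidate the key insights:
- 2PC blocks because a PREPARED participant cannot tell whether the outcome will be COMMIT or ABORT.
- 3PC inserts a PRE-COMMITTED state that every participant must pass through before any participant commits.
- Under crash failures, survivors can terminate the transaction on their own: abort if all are UNCERTAIN, commit if any is PRE-COMMITTED.
- Under network partitions, the two sides can reach different decisions, so 3PC trades safety for liveness.
- Production systems usually prefer 2PC with a Paxos- or Raft-replicated coordinator.
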
This knowledge is essential for designing, implementing, and troubleshooting distributed database systems.
Congratulations! You've completed the Distributed Transactions module. You now possess a comprehensive understanding of atomic commit protocols—both 2PC and 3PC—their mechanics, trade-offs, and the insights that drive modern distributed database designs. Continue to the next module to explore the CAP Theorem and its implications for distributed systems.