Loading learning content...
In a centralized database system, ensuring that a transaction either fully commits or fully aborts is relatively straightforward—the single database engine controls all resources and can make a unilateral decision. However, in a distributed database system, where data resides across multiple independent nodes, this seemingly simple guarantee becomes extraordinarily complex.
Consider a global banking transaction that transfers money between accounts stored on servers in New York, London, and Tokyo. Each server independently manages its local data and has no direct knowledge of the others' states. If the New York server commits the debit but the London server fails before crediting, the system enters an inconsistent state where money has vanished. The fundamental question becomes: How can we ensure that all participating nodes either all commit or all abort, despite operating independently and potentially failing at any moment?
By the end of this page, you will deeply understand the Two-Phase Commit (2PC) protocol—the canonical solution for achieving atomicity in distributed transactions. You will comprehend its theoretical foundations, walk through its precise mechanics, analyze its properties, and recognize both its power and its limitations.
Before we can appreciate the elegance of Two-Phase Commit, we must first understand the fundamental problem it solves. In distributed systems theory, this is known as the atomic commitment problem.
The Core Problem:
A distributed transaction involves multiple participants—independent processes or nodes that each hold a portion of the data being accessed. For atomicity, we require:
These requirements seem intuitive, but achieving them in the presence of unreliable networks and node failures is remarkably subtle.
The Fischer-Lynch-Paterson (FLP) theorem proves that no deterministic protocol can guarantee consensus in an asynchronous system with even one faulty process. The Two-Phase Commit protocol navigates this impossibility by accepting certain trade-offs—specifically, the possibility of blocking during failures.
Why Simple Approaches Fail:
A naive approach might have each participant independently decide to commit once its local work succeeds. But this fails catastrophically:
Another naive approach might have a central coordinator simply tell everyone to commit. But this also fails:
The fundamental insight is that we need a protocol where every participant's ability to commit is verified before any participant actually commits. This is the core idea behind Two-Phase Commit.
The Two-Phase Commit (2PC) protocol, introduced by Jim Gray in 1978, is the canonical solution to the atomic commitment problem. The name derives from its two distinct phases:
This two-phase structure ensures that no participant permanently commits until the coordinator has confirmed that all participants are prepared to commit. Let's examine each phase in rigorous detail.
| Phase | Initiator | Message Flow | Purpose |
|---|---|---|---|
| Phase 1: Prepare | Coordinator | Coordinator → Participants (PREPARE) | Gather votes on commit readiness |
| Phase 1 Response | Participants | Participants → Coordinator (VOTE_COMMIT / VOTE_ABORT) | Report local commit capability |
| Phase 2: Commit/Abort | Coordinator | Coordinator → Participants (COMMIT / ABORT) | Enforce global decision |
| Phase 2 Response | Participants | Participants → Coordinator (ACK) | Confirm completion |
Protocol Actors:
The 2PC protocol involves two types of actors:
Coordinator (Transaction Manager):
Participants (Resource Managers):
Typically, the client application interacts only with the coordinator, which orchestrates the entire distributed transaction transparently.
The Prepare Phase is the heart of the 2PC protocol—it ensures that every participant can commit before any participant is instructed to do so. Here is the precise sequence of events:
Step 1: Coordinator Initiates Prepare
When the application requests to commit the distributed transaction, the coordinator:
<PREPARE T> record to its local stable logPREPARE message to all participantsStep 2: Participant Processes PREPARE
Upon receiving the PREPARE message, each participant decides whether it can commit.
If the participant CAN commit:
<PREPARED T> record to its stable logVOTE_COMMITIf the participant CANNOT commit (constraint violation, deadlock, resource unavailable):
<ABORT T> record to its stable logVOTE_ABORTWhen a participant votes VOTE_COMMIT, it enters the prepared state. This is a commitment to follow the coordinator's decision, whatever it may be. The participant is promising: 'I CAN commit this transaction, and I WILL commit if you tell me to—and I WILL abort if you tell me to abort.' This promise is irrevocable until the coordinator responds.
The Durability of PREPARED:
The <PREPARED T> log record is absolutely critical. By writing this record to stable storage before voting VOTE_COMMIT, the participant ensures that:
Why Voting COMMIT is Irrevocable:
Once a participant votes VOTE_COMMIT, it cannot unilaterally abort. Consider this scenario:
By making VOTE_COMMIT irrevocable and logging it durably, we prevent this inconsistency.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748
COORDINATOR: Initiate Prepare Phase────────────────────────────────────────────function prepare_phase(transaction T, participants P[]): // Step 1: Log prepare initiation log_to_stable_storage(<PREPARE T>) // Step 2: Send PREPARE to all participants for each participant p in P[]: send(p, PREPARE, T) // Step 3: Collect votes with timeout votes = {} start_timer(PREPARE_TIMEOUT) while not all_votes_received(P[], votes): if timeout_expired(): return GLOBAL_ABORT // Missing vote = veto msg = receive_message() if msg.type == VOTE_COMMIT or msg.type == VOTE_ABORT: votes[msg.sender] = msg.type // Step 4: Determine global decision if all(vote == VOTE_COMMIT for vote in votes.values()): return GLOBAL_COMMIT else: return GLOBAL_ABORT PARTICIPANT: Process PREPARE Request────────────────────────────────────────────function handle_prepare(transaction T): if can_commit(T): // Durably record our promise before voting log_to_stable_storage(<PREPARED T>) // Enter prepared/uncertain state set_state(T, PREPARED) send(coordinator, VOTE_COMMIT, T) else: // Log abort and release resources log_to_stable_storage(<ABORT T>) release_locks(T) rollback_changes(T) send(coordinator, VOTE_ABORT, T)Once the coordinator has collected all votes (or the timeout has expired), it enters the Commit Phase and makes a global decision.
Decision Rule:
The decision follows a simple but critical rule:
This rule ensures that a transaction commits only when every participant has explicitly confirmed its ability to commit. Any failure, whether an explicit VOTE_ABORT or a failure to respond, results in abort—preserving the all-or-nothing property.
| Vote Pattern | Global Decision | Rationale |
|---|---|---|
| All VOTE_COMMIT | COMMIT | Universal agreement—safe to commit |
| At least one VOTE_ABORT | ABORT | Explicit rejection—cannot commit |
| At least one timeout/no response | ABORT | Cannot confirm consensus—abort to be safe |
| Mixed VOTE_COMMIT and VOTE_ABORT | ABORT | Any ABORT vetoes the transaction |
Step-by-Step Commit Phase:
If Global Decision is COMMIT:
<COMMIT T> to its stable log (this is the commit point)GLOBAL_COMMIT message to all participants<COMMIT T> to its stable logACKNOWLEDGMENT to coordinator<END T> to log and forgets the transactionIf Global Decision is ABORT:
<ABORT T> to its stable logGLOBAL_ABORT message to all participants<ABORT T> to its stable logACKNOWLEDGMENT to coordinator<END T> to log and forgets the transactionThe commit point is the moment when the coordinator writes <COMMIT T> to stable storage. Before this moment, the transaction might still abort. After this moment, the transaction is irrevocably committed. Even if the coordinator crashes immediately after writing this record, upon recovery it will see the COMMIT record and re-send GLOBAL_COMMIT to any participants that haven't acknowledged.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253
COORDINATOR: Commit Phase────────────────────────────────────────────function commit_phase(transaction T, decision, participants P[]): if decision == GLOBAL_COMMIT: // THE COMMIT POINT - once logged, transaction MUST commit log_to_stable_storage(<COMMIT T>) // Inform all participants of commit decision for each participant p in P[]: send(p, GLOBAL_COMMIT, T) else: // decision == GLOBAL_ABORT log_to_stable_storage(<ABORT T>) // Inform participants that voted COMMIT (others already aborted) for each participant p that voted VOTE_COMMIT: send(p, GLOBAL_ABORT, T) // Collect acknowledgments (with retry for reliability) acks = {} while not all_acks_received(P[], acks): for each p not in acks: resend_decision(p, decision, T) // Retry if needed msg = receive_message() if msg.type == ACK: acks[msg.sender] = true // Transaction complete log_to_stable_storage(<END T>) forget_transaction(T) PARTICIPANT: Process Global Decision────────────────────────────────────────────function handle_global_decision(transaction T, decision): if decision == GLOBAL_COMMIT: // Make changes permanent log_to_stable_storage(<COMMIT T>) commit_local_changes(T) release_locks(T) else: // decision == GLOBAL_ABORT // Undo any prepared work log_to_stable_storage(<ABORT T>) rollback_changes(T) release_locks(T) // Acknowledge completion to coordinator send(coordinator, ACK, T) // Clean up local transaction state forget_transaction(T)Understanding the state transitions in 2PC is essential for implementing the protocol correctly and reasoning about failure scenarios. Both the coordinator and participants move through well-defined states.
Coordinator States:
Participant States:
The Critical PREPARED State:
The PREPARED state for participants is uniquely important—and uniquely dangerous. A participant in the PREPARED state:
If the coordinator fails while participants are in PREPARED state, those participants are blocked. They cannot proceed—they cannot commit (they don't know if others prepared) and they cannot abort (perhaps the coordinator decided to commit). This is the famous blocking problem of 2PC.
Let's trace through a complete successful 2PC execution with three participants to see how the messages flow and how state transitions occur.
Scenario: A distributed transaction T spans three database servers (P1, P2, P3). The transaction coordinator (C) manages the commit process.
Timeline of Events:
| Time | Actor | Action | State After |
|---|---|---|---|
| t0 | Client | Request COMMIT for transaction T | |
| t1 | Coordinator | Log <PREPARE T>, send PREPARE to P1, P2, P3 | WAIT |
| t2 | P1 | Receive PREPARE, can commit, log <PREPARED>, send VOTE_COMMIT | PREPARED |
| t3 | P2 | Receive PREPARE, can commit, log <PREPARED>, send VOTE_COMMIT | PREPARED |
| t4 | P3 | Receive PREPARE, can commit, log <PREPARED>, send VOTE_COMMIT | PREPARED |
| t5 | Coordinator | Receive all VOTE_COMMIT, log <COMMIT T>, send GLOBAL_COMMIT | COMMIT |
| t6 | P1 | Receive GLOBAL_COMMIT, log <COMMIT>, commit locally, send ACK | COMMITTED |
| t7 | P2 | Receive GLOBAL_COMMIT, log <COMMIT>, commit locally, send ACK | COMMITTED |
| t8 | P3 | Receive GLOBAL_COMMIT, log <COMMIT>, commit locally, send ACK | COMMITTED |
| t9 | Coordinator | Receive all ACKs, log <END T> | END |
Abort Scenario:
Now consider a case where P2 cannot commit (perhaps due to a constraint violation):
The key insight: P2's abort vote vetoes the entire transaction. This preserves atomicity—either all participants commit or none do.
The correctness of 2PC depends critically on write-ahead logging (WAL) and the force-write discipline. Every state transition must be logged to stable storage before the corresponding message is sent. This ensures that crashes can be recovered correctly.
Write-Ahead Logging Rules for 2PC:
These log records must be force-written to stable storage (typically using fsync or equivalent). They cannot remain in volatile buffer cache. If a crash occurs after a log record is written but before the corresponding action, recovery can correctly handle the situation. If the log record were still in volatile memory, it would be lost, and recovery would produce incorrect results.
Log Contents:
The log records contain essential information for recovery:
<PREPARE T>:
<PREPARED T> (at participant):
<COMMIT T>:
<ABORT T>:
<END T>:
The Two-Phase Commit protocol provides several important guarantees, but also has notable limitations. Understanding these properties is essential for correct application.
Safety Properties (Always Hold):
Liveness Properties (Hold Under Conditions):
Unfortunately, 2PC has a significant limitation regarding liveness—specifically, it can block in certain failure scenarios.
If the coordinator fails after sending PREPARE but before sending the global decision, participants in the PREPARED state are blocked. They cannot commit (they don't know if all others prepared) and cannot abort (perhaps the coordinator decided to commit). These participants hold their locks indefinitely, blocking other transactions that conflict with them.
We've established the foundational understanding of the Two-Phase Commit protocol, the canonical solution for distributed transaction atomicity. Let's consolidate the key insights:
What's Next:
The next page examines the Coordinator Role in depth—how the coordinator manages the protocol, tracks participant states, handles timeouts, and ensures reliable recovery. Understanding the coordinator's responsibilities is essential for implementing robust distributed transactions.
You now understand the Two-Phase Commit protocol—its motivation, mechanics, and properties. Next, we'll dive deep into the coordinator's responsibilities and see how it orchestrates the entire distributed transaction process.