Loading learning content...
In a centralized database system, ensuring that a transaction either fully commits or fully aborts is relatively straightforward—the single database engine controls all resources and can make a unilateral decision. However, in a distributed database system, where data resides across multiple independent nodes, this seemingly simple guarantee becomes extraordinarily complex.\n\nConsider a global banking transaction that transfers money between accounts stored on servers in New York, London, and Tokyo. Each server independently manages its local data and has no direct knowledge of the others' states. If the New York server commits the debit but the London server fails before crediting, the system enters an inconsistent state where money has vanished. The fundamental question becomes: How can we ensure that all participating nodes either all commit or all abort, despite operating independently and potentially failing at any moment?
By the end of this page, you will deeply understand the Two-Phase Commit (2PC) protocol—the canonical solution for achieving atomicity in distributed transactions. You will comprehend its theoretical foundations, walk through its precise mechanics, analyze its properties, and recognize both its power and its limitations.
Before we can appreciate the elegance of Two-Phase Commit, we must first understand the fundamental problem it solves. In distributed systems theory, this is known as the atomic commitment problem.\n\nThe Core Problem:\n\nA distributed transaction involves multiple participants—independent processes or nodes that each hold a portion of the data being accessed. For atomicity, we require:\n\n1. All-or-Nothing Execution: Either every participant commits the transaction, or none of them do\n2. Unanimous Agreement: If any participant cannot commit (due to constraint violations, deadlocks, or failures), then no participant commits\n3. Irrevocability: Once a participant commits, it cannot later abort (and vice versa)\n4. Eventual Termination: The protocol must eventually reach a decision, not remain indefinitely uncertain\n\nThese requirements seem intuitive, but achieving them in the presence of unreliable networks and node failures is remarkably subtle.
The Fischer-Lynch-Paterson (FLP) theorem proves that no deterministic protocol can guarantee consensus in an asynchronous system with even one faulty process. The Two-Phase Commit protocol navigates this impossibility by accepting certain trade-offs—specifically, the possibility of blocking during failures.
Why Simple Approaches Fail:\n\nA naive approach might have each participant independently decide to commit once its local work succeeds. But this fails catastrophically:\n\n- Participant A completes its local work and commits\n- Participant B encounters a constraint violation and must abort\n- Result: Inconsistent state—A has committed changes that B has not\n\nAnother naive approach might have a central coordinator simply tell everyone to commit. But this also fails:\n\n- Coordinator sends 'COMMIT' to all participants\n- Participant A commits successfully\n- Participant B crashes before receiving the message\n- Result: A has committed, B has not—inconsistency persists\n\nThe fundamental insight is that we need a protocol where every participant's ability to commit is verified before any participant actually commits. This is the core idea behind Two-Phase Commit.
The Two-Phase Commit (2PC) protocol, introduced by Jim Gray in 1978, is the canonical solution to the atomic commitment problem. The name derives from its two distinct phases:\n\n1. Prepare Phase (Voting Phase): The coordinator queries all participants to determine if they can commit\n2. Commit Phase (Decision Phase): Based on the votes, the coordinator instructs all participants to either commit or abort\n\nThis two-phase structure ensures that no participant permanently commits until the coordinator has confirmed that all participants are prepared to commit. Let's examine each phase in rigorous detail.
| Phase | Initiator | Message Flow | Purpose |
|---|---|---|---|
| Phase 1: Prepare | Coordinator | Coordinator → Participants (PREPARE) | Gather votes on commit readiness |
| Phase 1 Response | Participants | Participants → Coordinator (VOTE_COMMIT / VOTE_ABORT) | Report local commit capability |
| Phase 2: Commit/Abort | Coordinator | Coordinator → Participants (COMMIT / ABORT) | Enforce global decision |
| Phase 2 Response | Participants | Participants → Coordinator (ACK) | Confirm completion |
Protocol Actors:\n\nThe 2PC protocol involves two types of actors:\n\nCoordinator (Transaction Manager):\n- Initiates and manages the distributed transaction\n- Sends PREPARE messages to all participants\n- Collects votes and makes the global commit/abort decision\n- Logs decisions to stable storage for recovery\n- Sends the final decision to all participants\n\nParticipants (Resource Managers):\n- Execute local portions of the transaction\n- Respond to PREPARE with their vote (VOTE_COMMIT or VOTE_ABORT)\n- Follow the coordinator's final decision\n- Log their state transitions for recovery\n\nTypically, the client application interacts only with the coordinator, which orchestrates the entire distributed transaction transparently.
The Prepare Phase is the heart of the 2PC protocol—it ensures that every participant can commit before any participant is instructed to do so. Here is the precise sequence of events:\n\nStep 1: Coordinator Initiates Prepare\n\nWhen the application requests to commit the distributed transaction, the coordinator:\n\n1. Writes a <PREPARE T> record to its local stable log\n2. Sends a PREPARE message to all participants\n3. Sets a timeout timer for collecting votes\n\nStep 2: Participant Processes PREPARE\n\nUpon receiving the PREPARE message, each participant decides whether it can commit.\n\nIf the participant CAN commit:\n1. It writes a <PREPARED T> record to its stable log\n2. It replies with VOTE_COMMIT\n3. It enters the prepared state (also called the uncertain state)\n4. It releases no locks—resources remain locked!\n\nIf the participant CANNOT commit (constraint violation, deadlock, resource unavailable):\n1. It writes an <ABORT T> record to its stable log\n2. It replies with VOTE_ABORT\n3. It aborts its local portion and releases locks\n4. It need not wait for the coordinator's decision—the transaction will abort
When a participant votes VOTE_COMMIT, it enters the prepared state. This is a commitment to follow the coordinator's decision, whatever it may be. The participant is promising: 'I CAN commit this transaction, and I WILL commit if you tell me to—and I WILL abort if you tell me to abort.' This promise is irrevocable until the coordinator responds.
The Durability of PREPARED:\n\nThe <PREPARED T> log record is absolutely critical. By writing this record to stable storage before voting VOTE_COMMIT, the participant ensures that:\n\n1. If it crashes: Upon recovery, it can determine from its log that it voted to commit and must wait for the coordinator's decision\n2. If it is in doubt: It knows to query the coordinator rather than unilaterally aborting\n\nWhy Voting COMMIT is Irrevocable:\n\nOnce a participant votes VOTE_COMMIT, it cannot unilaterally abort. Consider this scenario:\n\n- Participant P1 votes VOTE_COMMIT\n- Participant P2 also votes VOTE_COMMIT\n- Coordinator decides COMMIT\n- P1 crashes before receiving COMMIT, recovers, and unilaterally aborts\n- Result: P2 has committed, P1 has aborted—inconsistency!\n\nBy making VOTE_COMMIT irrevocable and logging it durably, we prevent this inconsistency.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748
COORDINATOR: Initiate Prepare Phase────────────────────────────────────────────function prepare_phase(transaction T, participants P[]): // Step 1: Log prepare initiation log_to_stable_storage(<PREPARE T>) // Step 2: Send PREPARE to all participants for each participant p in P[]: send(p, PREPARE, T) // Step 3: Collect votes with timeout votes = {} start_timer(PREPARE_TIMEOUT) while not all_votes_received(P[], votes): if timeout_expired(): return GLOBAL_ABORT // Missing vote = veto msg = receive_message() if msg.type == VOTE_COMMIT or msg.type == VOTE_ABORT: votes[msg.sender] = msg.type // Step 4: Determine global decision if all(vote == VOTE_COMMIT for vote in votes.values()): return GLOBAL_COMMIT else: return GLOBAL_ABORT PARTICIPANT: Process PREPARE Request────────────────────────────────────────────function handle_prepare(transaction T): if can_commit(T): // Durably record our promise before voting log_to_stable_storage(<PREPARED T>) // Enter prepared/uncertain state set_state(T, PREPARED) send(coordinator, VOTE_COMMIT, T) else: // Log abort and release resources log_to_stable_storage(<ABORT T>) release_locks(T) rollback_changes(T) send(coordinator, VOTE_ABORT, T)Once the coordinator has collected all votes (or the timeout has expired), it enters the Commit Phase and makes a global decision.\n\nDecision Rule:\n\nThe decision follows a simple but critical rule:\n\n- If ALL participants voted VOTE_COMMIT → Global decision is COMMIT\n- If ANY participant voted VOTE_ABORT OR any participant timed out → Global decision is ABORT\n\nThis rule ensures that a transaction commits only when every participant has explicitly confirmed its ability to commit. Any failure, whether an explicit VOTE_ABORT or a failure to respond, results in abort—preserving the all-or-nothing property.
| Vote Pattern | Global Decision | Rationale |
|---|---|---|
| All VOTE_COMMIT | COMMIT | Universal agreement—safe to commit |
| At least one VOTE_ABORT | ABORT | Explicit rejection—cannot commit |
| At least one timeout/no response | ABORT | Cannot confirm consensus—abort to be safe |
| Mixed VOTE_COMMIT and VOTE_ABORT | ABORT | Any ABORT vetoes the transaction |
Step-by-Step Commit Phase:\n\nIf Global Decision is COMMIT:\n\n1. Coordinator writes <COMMIT T> to its stable log (this is the commit point)\n2. Coordinator sends GLOBAL_COMMIT message to all participants\n3. Each participant:\n - Writes <COMMIT T> to its stable log\n - Makes changes permanent (commit locally)\n - Releases all locks\n - Sends ACKNOWLEDGMENT to coordinator\n4. Coordinator waits for all acknowledgments\n5. Coordinator writes <END T> to log and forgets the transaction\n\nIf Global Decision is ABORT:\n\n1. Coordinator writes <ABORT T> to its stable log\n2. Coordinator sends GLOBAL_ABORT message to all participants\n3. Each participant that was in PREPARED state:\n - Writes <ABORT T> to its stable log\n - Rolls back changes (undo local work)\n - Releases all locks\n - Sends ACKNOWLEDGMENT to coordinator\n4. Coordinator waits for all acknowledgments\n5. Coordinator writes <END T> to log and forgets the transaction
The commit point is the moment when the coordinator writes <COMMIT T> to stable storage. Before this moment, the transaction might still abort. After this moment, the transaction is irrevocably committed. Even if the coordinator crashes immediately after writing this record, upon recovery it will see the COMMIT record and re-send GLOBAL_COMMIT to any participants that haven't acknowledged.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253
COORDINATOR: Commit Phase────────────────────────────────────────────function commit_phase(transaction T, decision, participants P[]): if decision == GLOBAL_COMMIT: // THE COMMIT POINT - once logged, transaction MUST commit log_to_stable_storage(<COMMIT T>) // Inform all participants of commit decision for each participant p in P[]: send(p, GLOBAL_COMMIT, T) else: // decision == GLOBAL_ABORT log_to_stable_storage(<ABORT T>) // Inform participants that voted COMMIT (others already aborted) for each participant p that voted VOTE_COMMIT: send(p, GLOBAL_ABORT, T) // Collect acknowledgments (with retry for reliability) acks = {} while not all_acks_received(P[], acks): for each p not in acks: resend_decision(p, decision, T) // Retry if needed msg = receive_message() if msg.type == ACK: acks[msg.sender] = true // Transaction complete log_to_stable_storage(<END T>) forget_transaction(T) PARTICIPANT: Process Global Decision────────────────────────────────────────────function handle_global_decision(transaction T, decision): if decision == GLOBAL_COMMIT: // Make changes permanent log_to_stable_storage(<COMMIT T>) commit_local_changes(T) release_locks(T) else: // decision == GLOBAL_ABORT // Undo any prepared work log_to_stable_storage(<ABORT T>) rollback_changes(T) release_locks(T) // Acknowledge completion to coordinator send(coordinator, ACK, T) // Clean up local transaction state forget_transaction(T)Understanding the state transitions in 2PC is essential for implementing the protocol correctly and reasoning about failure scenarios. Both the coordinator and participants move through well-defined states.\n\nCoordinator States:\n\n1. INITIAL: Transaction is active, running normally\n2. WAIT: Sent PREPARE, waiting for votes from participants\n3. COMMIT: Decided to commit, sending GLOBAL_COMMIT\n4. ABORT: Decided to abort, sending GLOBAL_ABORT\n5. END: Transaction completed, resources freed\n\nParticipant States:\n\n1. INITIAL: Executing local transaction operations\n2. PREPARED (UNCERTAIN): Voted VOTE_COMMIT, waiting for global decision\n3. COMMITTED: Received GLOBAL_COMMIT, applied changes permanently\n4. ABORTED: Either voted VOTE_ABORT or received GLOBAL_ABORT
The Critical PREPARED State:\n\nThe PREPARED state for participants is uniquely important—and uniquely dangerous. A participant in the PREPARED state:\n\n1. Has promised to follow the coordinator's decision\n2. Cannot unilaterally abort or commit\n3. Is holding locks that block other transactions\n4. Must wait for the coordinator to resolve the uncertainty\n\nIf the coordinator fails while participants are in PREPARED state, those participants are blocked. They cannot proceed—they cannot commit (they don't know if others prepared) and they cannot abort (perhaps the coordinator decided to commit). This is the famous blocking problem of 2PC.
Let's trace through a complete successful 2PC execution with three participants to see how the messages flow and how state transitions occur.\n\nScenario: A distributed transaction T spans three database servers (P1, P2, P3). The transaction coordinator (C) manages the commit process.\n\nTimeline of Events:
| Time | Actor | Action | State After |
|---|---|---|---|
| t0 | Client | Request COMMIT for transaction T | |
| t1 | Coordinator | Log <PREPARE T>, send PREPARE to P1, P2, P3 | WAIT |
| t2 | P1 | Receive PREPARE, can commit, log <PREPARED>, send VOTE_COMMIT | PREPARED |
| t3 | P2 | Receive PREPARE, can commit, log <PREPARED>, send VOTE_COMMIT | PREPARED |
| t4 | P3 | Receive PREPARE, can commit, log <PREPARED>, send VOTE_COMMIT | PREPARED |
| t5 | Coordinator | Receive all VOTE_COMMIT, log <COMMIT T>, send GLOBAL_COMMIT | COMMIT |
| t6 | P1 | Receive GLOBAL_COMMIT, log <COMMIT>, commit locally, send ACK | COMMITTED |
| t7 | P2 | Receive GLOBAL_COMMIT, log <COMMIT>, commit locally, send ACK | COMMITTED |
| t8 | P3 | Receive GLOBAL_COMMIT, log <COMMIT>, commit locally, send ACK | COMMITTED |
| t9 | Coordinator | Receive all ACKs, log <END T> | END |
Abort Scenario:\n\nNow consider a case where P2 cannot commit (perhaps due to a constraint violation):\n\n- P1 and P3 vote VOTE_COMMIT\n- P2 votes VOTE_ABORT\n- Coordinator decides GLOBAL_ABORT\n- P1 and P3 rollback their prepared work\n- P2 has already aborted\n\nThe key insight: P2's abort vote vetoes the entire transaction. This preserves atomicity—either all participants commit or none do.
The correctness of 2PC depends critically on write-ahead logging (WAL) and the force-write discipline. Every state transition must be logged to stable storage before the corresponding message is sent. This ensures that crashes can be recovered correctly.\n\nWrite-Ahead Logging Rules for 2PC:
These log records must be force-written to stable storage (typically using fsync or equivalent). They cannot remain in volatile buffer cache. If a crash occurs after a log record is written but before the corresponding action, recovery can correctly handle the situation. If the log record were still in volatile memory, it would be lost, and recovery would produce incorrect results.
Log Contents:\n\nThe log records contain essential information for recovery:\n\n**<PREPARE T>:\n- Transaction ID\n- List of participants\n- Timeout value\n\n<PREPARED T>** (at participant):\n- Transaction ID\n- Coordinator identity\n- All undo/redo information for local changes\n\n**<COMMIT T>:\n- Transaction ID\n- List of participants (at coordinator)\n\n<ABORT T>:\n- Transaction ID\n\n<END T>**:\n- Transaction ID\n- Signal that all participants have acknowledged
The Two-Phase Commit protocol provides several important guarantees, but also has notable limitations. Understanding these properties is essential for correct application.\n\nSafety Properties (Always Hold):
Liveness Properties (Hold Under Conditions):\n\nUnfortunately, 2PC has a significant limitation regarding liveness—specifically, it can block in certain failure scenarios.
If the coordinator fails after sending PREPARE but before sending the global decision, participants in the PREPARED state are blocked. They cannot commit (they don't know if all others prepared) and cannot abort (perhaps the coordinator decided to commit). These participants hold their locks indefinitely, blocking other transactions that conflict with them.
We've established the foundational understanding of the Two-Phase Commit protocol, the canonical solution for distributed transaction atomicity. Let's consolidate the key insights:
What's Next:\n\nThe next page examines the Coordinator Role in depth—how the coordinator manages the protocol, tracks participant states, handles timeouts, and ensures reliable recovery. Understanding the coordinator's responsibilities is essential for implementing robust distributed transactions.
You now understand the Two-Phase Commit protocol—its motivation, mechanics, and properties. Next, we'll dive deep into the coordinator's responsibilities and see how it orchestrates the entire distributed transaction process.