In a distributed database system, failures are not exceptional events—they are inevitable realities. Network cables get unplugged. Servers crash unexpectedly. Power outages affect entire data centers. Hard drives fail silently. Messages are delayed indefinitely or arrive out of order. The Two-Phase Commit protocol must navigate these treacherous waters while still guaranteeing transaction atomicity.
The original 2PC design prioritizes safety over liveness. It ensures that transactions never violate atomicity (no partial commits) even at the cost of blocking during certain failure scenarios. Understanding how 2PC handles—and struggles with—failures is essential for architects and engineers who must build systems that remain both correct and available despite failures.
By the end of this page, you will understand every significant failure scenario in the Two-Phase Commit protocol: coordinator failures at each protocol phase, participant failures, network partitions, and message losses. You'll learn the recovery strategies for each scenario and understand why some failures inevitably cause blocking.
Before analyzing specific failure scenarios, we must establish the failure model—the assumptions about what kinds of failures can occur and how the system behaves when they happen.
Crash-Recovery Model:
The Two-Phase Commit protocol assumes a crash-recovery (or fail-stop with recovery) model: nodes fail only by halting, losing the contents of volatile memory but preserving stable storage (the log); a crashed node eventually restarts and runs a recovery procedure that reads its log; and running nodes never execute incorrectly or send bogus messages (no Byzantine behavior).
Network Model:
The network is assumed to be asynchronous: messages may be delayed arbitrarily, delivered out of order, duplicated, or lost, but are never corrupted, and there is no bound on delivery time that could be used to reliably distinguish a slow node from a crashed one.
| Failure Type | Description | 2PC Assumption | Handling Strategy |
|---|---|---|---|
| Crash Failure | Node stops executing entirely | Assumed—primary model | Logging, recovery protocol |
| Transient Crash | Node crashes and recovers quickly | Assumed | Log-based recovery |
| Permanent Crash | Node never recovers | Not explicitly handled | Administrative intervention |
| Message Loss | Individual messages fail to arrive | Network retries | Timeout + retry |
| Network Partition | Network split prevents communication | Eventually heals | Timeout, blocking possible |
| Byzantine Failure | Node sends incorrect/malicious messages | NOT assumed | Not handled by basic 2PC |
The Fischer-Lynch-Paterson theorem proves that no deterministic protocol can guarantee consensus in an asynchronous system with even one crash failure. 2PC navigates this impossibility by potentially blocking—it guarantees safety (atomicity) but not liveness (termination) in all failure scenarios.
Let's analyze what happens when the coordinator fails at various points during Phase 1 (the Prepare phase).
Scenario 2.1: Coordinator Crashes Before Sending Any PREPARE
Scenario 2.2: Coordinator Crashes After Sending Some PREPAREs
Scenario 2.3: Coordinator Crashes While Collecting Votes
Key Insight:
Coordinator failures during Phase 1 are relatively benign—the coordinator can safely decide ABORT upon recovery because a commit decision requires ALL votes, and an incomplete Phase 1 means not all votes were received.
With the presumed abort optimization, the coordinator doesn't need to log the abort decision. Any transaction with no COMMIT record is presumed aborted. This simplifies Phase 1 failure recovery—the coordinator just needs to ensure any PREPARED participants learn to abort.
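The presumed-abort rule can be captured in a few lines: when a recovering coordinator is asked about an in-doubt transaction, it only needs to check for a COMMIT record. A minimal sketch (the `LogRecord` shape and function name are illustrative, not a real API):

```typescript
type LogRecord = {
  txId: string;
  kind: 'PREPARE' | 'COMMIT' | 'ABORT' | 'END';
};

// Answer a participant's "what happened to T?" query after recovery.
// Presumed abort: any transaction with no COMMIT record is treated as
// aborted, so abort decisions never need to be force-logged.
function resolveInDoubt(log: LogRecord[], txId: string): 'COMMIT' | 'ABORT' {
  const committed = log.some(r => r.txId === txId && r.kind === 'COMMIT');
  return committed ? 'COMMIT' : 'ABORT';
}
```

Note that a transaction the coordinator has never heard of also resolves to ABORT, which is exactly what makes the optimization safe.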
Coordinator failures during Phase 2 are more complex because a commit decision may have been made. The exact recovery depends on what was logged before the crash.
Scenario 3.1: Coordinator Crashes Before Logging Decision
Critical Point: This scenario is safe because nothing was committed. The invariant is: if COMMIT wasn't logged, the transaction can be aborted.
There's a tiny window between when the coordinator decides to commit (in memory) and when it logs that decision. If the coordinator crashes in this window, the in-memory decision is lost. The force-write requirement ensures this window is as small as possible—the decision is logged synchronously before proceeding.
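The required ordering is easy to state in code: the decision record must be fsync'd before the first GLOBAL_COMMIT leaves the coordinator. A sketch under assumed interfaces (`Stable`, `forceLog`-style append/fsync, and `send` are illustrative):

```typescript
// Assumed minimal stable-storage interface for illustration.
interface Stable {
  append(rec: string): void;
  fsync(): void;
}

// The commit decision must reach stable storage BEFORE any
// GLOBAL_COMMIT message is sent; this is the force-write rule.
function decideCommit(
  txId: string,
  log: Stable,
  send: (participant: string, msg: string) => void,
  participants: string[],
): void {
  log.append(`COMMIT ${txId}`);
  log.fsync(); // force-write: closes the lost-decision window
  for (const p of participants) {
    send(p, `GLOBAL_COMMIT ${txId}`); // only after the decision is durable
  }
}
```

If the coordinator crashes before the `fsync` returns, recovery finds no COMMIT record and safely aborts; if it crashes after, recovery resends GLOBAL_COMMIT.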
Scenario 3.2: Coordinator Crashes After Logging COMMIT But Before Sending Any GLOBAL_COMMIT
Scenario 3.3: Coordinator Crashes After Sending Some GLOBAL_COMMIT Messages
Scenario 3.4: Coordinator Crashes After Logging ABORT But Before Sending GLOBAL_ABORT
Summary of Coordinator Recovery Actions:
The coordinator's recovery action depends entirely on what's in the log:
| Log Contains | Recovery Action | Rationale |
|---|---|---|
| No PREPARE record | Nothing to do | Transaction never started committing |
| PREPARE but no decision | Decide ABORT, send GLOBAL_ABORT | Phase 1 incomplete or decision never made |
| COMMIT record, no END | Resend GLOBAL_COMMIT to all | Decision made, must complete |
| ABORT record, no END | Resend GLOBAL_ABORT to PREPARED participants | Decision made, must complete |
| END record | Nothing to do | Transaction fully complete |
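Because the recovery action depends only on the most advanced log record, the table above reduces to a direct lookup. A sketch with illustrative names:

```typescript
type CoordLogState = 'NONE' | 'PREPARE' | 'COMMIT' | 'ABORT' | 'END';
type RecoveryAction =
  | 'NOTHING'
  | 'DECIDE_ABORT'
  | 'RESEND_COMMIT'
  | 'RESEND_ABORT';

// The coordinator recovery table as a lookup, keyed by the furthest
// record found in the log for a transaction.
const coordinatorRecoveryAction: Record<CoordLogState, RecoveryAction> = {
  NONE: 'NOTHING',         // transaction never started committing
  PREPARE: 'DECIDE_ABORT', // Phase 1 incomplete: abort is always safe
  COMMIT: 'RESEND_COMMIT', // decision durable: must be pushed to completion
  ABORT: 'RESEND_ABORT',   // likewise, resent to PREPARED participants
  END: 'NOTHING',          // transaction fully complete
};
```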
Participant failures affect the coordinator's ability to collect votes and disseminate decisions. Let's examine the key scenarios.
Scenario 4.1: Participant Crashes During Execution (ACTIVE State)
Scenario 4.2: Participant Crashes Before Voting
Scenario 4.3: Participant Crashes After Voting COMMIT (PREPARED State)
Sub-scenario 4.3a: All Others Also Voted COMMIT
Sub-scenario 4.3b: Some Other Participant Voted ABORT
Scenario 4.4: Participant Crashes After Receiving Decision But Before Applying It
Notice that the coordinator may send the same decision multiple times (due to retries). The participant must handle this idempotently—processing GLOBAL_COMMIT when already committed should simply return ACK without re-committing.
The most significant limitation of Two-Phase Commit is the blocking problem. Under certain failure scenarios, participants can be stuck indefinitely, unable to make progress.
The Classic Blocking Scenario:
Why Participants Cannot Proceed:
Can't unilaterally COMMIT: They don't know if all others prepared. Perhaps another participant voted ABORT and the coordinator decided ABORT.
Can't unilaterally ABORT: The coordinator might have decided COMMIT (in memory before crashing). Other participants may have received COMMIT and already committed.
Cooperative termination fails: All participants are in PREPARED state. None of them knows the decision.
In this scenario, all participants hold locks and wait. Other transactions that need these resources are blocked. If the coordinator never recovers (hardware destroyed, disk unrecoverable), the participants may be blocked forever—or until an administrator manually resolves the situation.
Practical Mitigation Strategies:
While the blocking problem cannot be eliminated in 2PC, it can be mitigated:
1. Coordinator High Availability: Replicate the coordinator's state to standby nodes. If the primary fails, a standby takes over and completes the transaction.
2. Transaction Timeouts: Set maximum transaction durations. After a very long timeout (minutes or hours), administrators can manually resolve blocked transactions.
3. Presumed Abort with Persistent Decision: Ensure the COMMIT decision is replicated before the coordinator fails. If the decision is replicated, standbys can deliver it.
4. Use Consensus-Based Coordination: Modern systems (CockroachDB, Spanner) use Paxos or Raft to replicate coordinator state. Coordinator failure causes a brief delay for leader election, not indefinite blocking.
5. Accept Some Blocking: In many systems, coordinator failures are rare enough that brief blocking periods are acceptable. The key is ensuring coordinators are highly available.
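Mitigations 2 and 5 usually combine retry-with-backoff and an operator alert threshold. A sketch where all constants and names are illustrative:

```typescript
// A blocked PREPARED participant retries its status query with capped
// exponential backoff, and eventually flags the transaction for manual
// resolution. Thresholds here are arbitrary examples.
function nextRetryDelayMs(
  attempt: number,
  baseMs: number = 1000,
  capMs: number = 60_000,
): number {
  return Math.min(capMs, baseMs * 2 ** attempt); // 1s, 2s, 4s, ... capped
}

function shouldAlertOperator(
  blockedMs: number,
  alertAfterMs: number = 10 * 60 * 1000, // e.g. 10 minutes in-doubt
): boolean {
  return blockedMs >= alertAfterMs;
}
```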
Network partitions—when nodes can communicate with some peers but not others—create some of the most challenging failure scenarios for 2PC.
Scenario 6.1: Partition During Phase 1
The network partitions after the coordinator sends PREPARE to some participants:
Scenario 6.2: Partition During Phase 2
This is more problematic. After all participants vote COMMIT:
Effects:
Scenario 6.3: Asymmetric Partition
Consider a more complex scenario:
If P1 receives GLOBAL_COMMIT:
Key Insight: Partial Information Propagation
Cooperative termination helps when at least one participant has definitive information (COMMITTED or ABORTED state). That participant can share the outcome with others. But if the partition isolates all PREPARED participants from any that know the outcome, blocking occurs.
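The cooperative termination rule can be sketched as a function of the peer states a blocked participant observes. One common extension is also shown: a peer still ACTIVE has not voted, so it can still veto and ABORT is safe. This is an illustrative sketch, not a complete termination protocol:

```typescript
type PeerState = 'ACTIVE' | 'PREPARED' | 'COMMITTED' | 'ABORTED';

// A blocked PREPARED participant polls its reachable peers:
// - any peer with a definitive outcome resolves the transaction;
// - an ACTIVE peer never voted, so the outcome can only be ABORT;
// - if every reachable peer is PREPARED, nothing is resolved.
function cooperativeTermination(
  peerStates: PeerState[],
): 'COMMIT' | 'ABORT' | 'BLOCKED' {
  if (peerStates.includes('COMMITTED')) return 'COMMIT';
  if (peerStates.includes('ABORTED')) return 'ABORT';
  if (peerStates.includes('ACTIVE')) return 'ABORT'; // unvoted peer: safe to abort
  return 'BLOCKED'; // all PREPARED: outcome unknown, still blocked
}
```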
Partition Healing:
When the partition heals:
The behavior of 2PC during network partitions illustrates the CAP theorem trade-offs. 2PC chooses Consistency over Availability: during a partition, it blocks (sacrifices availability) rather than allowing some nodes to commit while others might abort (which would violate consistency).
While TCP provides reliable delivery, messages can still be 'lost' from the application's perspective due to crashes, reordering, or extreme delays. 2PC must handle these cases gracefully.
Scenario 7.1: PREPARE Message Lost
Scenario 7.2: Vote Message Lost
| Lost Message | Sender State | Receiver Expected Action | Resolution |
|---|---|---|---|
| PREPARE | Coordinator in WAIT | Participant never votes | Timeout → ABORT |
| VOTE_COMMIT | Participant in PREPARED | Coordinator missing vote | Timeout → ABORT |
| VOTE_ABORT | Participant aborted | Coordinator missing vote | Timeout → ABORT (same outcome) |
| GLOBAL_COMMIT | Coordinator sent decision | Participant still PREPARED | Retry until ACK received |
| GLOBAL_ABORT | Coordinator sent decision | Participant still PREPARED | Retry until ACK received |
| ACK | Participant completed | Coordinator waiting for ACK | Retry decision, idempotent handling |
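The "retry until ACK" rows in the table above amount to a simple loop: the coordinator re-sends its decision to every participant that has not acknowledged, and duplicate deliveries are harmless because the participant handles them idempotently. A sketch with illustrative names (a real system would retry indefinitely with backoff rather than a fixed round count):

```typescript
// Re-deliver the decision until every participant ACKs (or rounds run out).
// `send` abstracts one delivery attempt and its outcome.
function deliverDecision(
  participants: string[],
  send: (participant: string) => 'ACK' | 'TIMEOUT',
  maxRounds: number = 5,
): Set<string> {
  const acked = new Set<string>();
  for (let r = 0; r < maxRounds && acked.size < participants.length; r++) {
    for (const p of participants) {
      if (!acked.has(p) && send(p) === 'ACK') acked.add(p);
    }
  }
  return acked; // coordinator logs <END T> once this covers everyone
}
```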
Scenario 7.3: GLOBAL_COMMIT Lost
Scenario 7.4: ACK Message Lost
Delayed Messages:
Messages that arrive very late can also cause issues:
Sequence Numbers and Transaction IDs:
To handle duplicates and out-of-order messages:
```typescript
type TxState = 'ACTIVE' | 'PREPARED' | 'COMMITTED' | 'ABORTED' | 'UNKNOWN';

abstract class ParticipantMessageHandler {
  // Storage and transaction hooks supplied by the concrete participant.
  protected abstract getTransactionState(txId: string): Promise<TxState>;
  protected abstract checkPersistentLog(txId: string): Promise<TxState>;
  protected abstract executeCommit(txId: string): Promise<void>;
  protected abstract processPrepare(txId: string): Promise<'VOTE_COMMIT' | 'VOTE_ABORT'>;
  protected abstract getPreviousVote(txId: string): Promise<'VOTE_COMMIT' | 'VOTE_ABORT'>;

  /**
   * Handle incoming GLOBAL_COMMIT with idempotency
   */
  async handleCommit(txId: string): Promise<'ACK' | 'UNKNOWN'> {
    const state = await this.getTransactionState(txId);

    switch (state) {
      case 'PREPARED':
        // Normal case - execute the commit
        await this.executeCommit(txId);
        return 'ACK';

      case 'COMMITTED':
        // Already committed - duplicate message, just ACK
        console.log(`Duplicate COMMIT for ${txId} - already committed`);
        return 'ACK';

      case 'ABORTED':
        // This is unexpected - we should not receive COMMIT if we aborted
        // This could be a bug or message reordering issue
        throw new Error(`Received COMMIT for aborted transaction ${txId}`);

      case 'UNKNOWN': {
        // Transaction not found - might have been cleaned up long ago
        // Check persistent log to see if we ever committed it
        const logState = await this.checkPersistentLog(txId);
        if (logState === 'COMMITTED') {
          return 'ACK';
        } else if (logState === 'ABORTED') {
          throw new Error(`Stale COMMIT for aborted transaction ${txId}`);
        }
        // No record at all - very old transaction or bug
        return 'UNKNOWN';
      }

      default:
        throw new Error(`Unexpected state ${state} for transaction ${txId}`);
    }
  }

  /**
   * Handle incoming PREPARE with idempotency
   */
  async handlePrepare(txId: string): Promise<'VOTE_COMMIT' | 'VOTE_ABORT' | 'ALREADY_PREPARED'> {
    const state = await this.getTransactionState(txId);

    switch (state) {
      case 'ACTIVE':
        // Normal case - process the prepare
        return this.processPrepare(txId);

      case 'PREPARED':
        // Already prepared - duplicate message
        // Return our previous vote
        return this.getPreviousVote(txId);

      case 'COMMITTED':
      case 'ABORTED':
        // Transaction already finished - late PREPARE
        console.log(`Late PREPARE for finished transaction ${txId}`);
        // Inform coordinator of current state
        return state === 'COMMITTED' ? 'ALREADY_PREPARED' : 'VOTE_ABORT';

      default:
        // Unknown transaction - treat as new, but this is unusual
        return 'VOTE_ABORT';
    }
  }
}
```

Let's consolidate the recovery procedures for all failure scenarios into a comprehensive reference.
Coordinator Recovery Procedure:
```
COORDINATOR RECOVERY ALGORITHM
═══════════════════════════════════════════════════════════════

1. SCAN LOG from last checkpoint
   For each transaction T in log:

2. IF <END T> found:
   → Transaction complete, no action needed

3. ELSE IF <COMMIT T> found (but no <END T>):
   → Decision was COMMIT
   → Resend GLOBAL_COMMIT to all participants
   → Collect ACKs, log <END T> when all acknowledge

4. ELSE IF <ABORT T> found (but no <END T>):
   → Decision was ABORT
   → Resend GLOBAL_ABORT to participants that voted COMMIT
   → Collect ACKs, log <END T> when all acknowledge

5. ELSE IF <PREPARE T> found (but no decision):
   → Phase 1 was in progress, decision never made
   → Since we cannot confirm all votes, decide ABORT
   → Log <ABORT T>
   → Send GLOBAL_ABORT to all participants
   → Collect ACKs, log <END T>

6. HANDLE PARTICIPANT QUERIES:
   IF participant asks about transaction T:
   → Look up T in active transactions or log
   → IF T has COMMIT record: return COMMIT
   → IF T has ABORT record or not found: return ABORT (presumed abort)
```

Participant Recovery Procedure:
```
PARTICIPANT RECOVERY ALGORITHM
═══════════════════════════════════════════════════════════════

1. SCAN LOG from last checkpoint
   For each transaction T in log:

2. IF <COMMIT T> found:
   → Complete commit if not already done
   → Redo any changes if necessary
   → Release locks

3. ELSE IF <ABORT T> found:
   → Complete abort if not already done
   → Undo any changes if necessary
   → Release locks

4. ELSE IF <PREPARED T> found (no COMMIT or ABORT):
   → Transaction is in-doubt
   → Re-acquire locks based on write set in PREPARED record
   → Query coordinator for decision (may block if unreachable)
   → Once decision learned:
     → IF COMMIT: complete commit, release locks
     → IF ABORT: complete abort, release locks

5. ELSE (only execution records found):
   → Transaction was active, never voted
   → Abort locally using undo log
   → Release any locks
   → Log <ABORT T>

6. SPECIAL HANDLING FOR BLOCKED TRANSACTIONS:
   → Periodically retry querying coordinator
   → Try cooperative termination with other participants
   → Log warnings if blocked for extended period
   → Eventually may require administrator intervention
```

Production systems should regularly test recovery procedures using fault injection. Simulate crashes at every protocol phase, verify correct recovery, and measure recovery time. This is often formalized as 'chaos engineering' in modern distributed systems practice.
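In the spirit of that fault-injection advice, a toy recovery test can "crash" the coordinator just after a COMMIT record becomes durable and then verify that recovery re-delivers the decision. Everything below is an illustrative sketch, not a production harness:

```typescript
type Rec = { txId: string; kind: 'PREPARE' | 'COMMIT' | 'END' };

// Replay a coordinator log after a simulated crash:
// - COMMIT without END  -> GLOBAL_COMMIT must be re-delivered
// - PREPARE without a decision -> abort (presumed abort), deliver GLOBAL_ABORT
// - END -> nothing to do
function recover(
  log: Rec[],
  resend: (txId: string, msg: 'GLOBAL_COMMIT' | 'GLOBAL_ABORT') => void,
): void {
  const byTx = new Map<string, Set<Rec['kind']>>();
  for (const r of log) {
    if (!byTx.has(r.txId)) byTx.set(r.txId, new Set());
    byTx.get(r.txId)!.add(r.kind);
  }
  for (const [txId, kinds] of byTx) {
    if (kinds.has('END')) continue;                 // fully complete
    if (kinds.has('COMMIT')) resend(txId, 'GLOBAL_COMMIT');
    else if (kinds.has('PREPARE')) resend(txId, 'GLOBAL_ABORT');
  }
}
```

A fuller harness would inject crashes at every logging point and assert the same atomicity invariant after each recovery.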
We've comprehensively examined how Two-Phase Commit handles—and sometimes struggles with—failures in distributed systems. Let's consolidate the key insights:
What's Next:
The next page examines the Three-Phase Commit (3PC) Protocol, which attempts to eliminate the blocking problem by adding an extra phase. We'll see how 3PC improves on 2PC's liveness properties while understanding why even 3PC cannot fully solve all distributed consensus challenges.
You now understand the comprehensive failure handling in Two-Phase Commit—from coordinator and participant crashes to network partitions and message losses. Next, we'll explore how the Three-Phase Commit protocol addresses the blocking problem.