In the Two-Phase Commit protocol, the coordinator (also called the transaction manager or transaction coordinator) serves as the central authority orchestrating the commit process. While participants execute local transaction operations, it is the coordinator that transforms a collection of independent local decisions into a unified global outcome.
The coordinator's role is both powerful and perilous. It is the single entity that can definitively resolve whether a distributed transaction commits or aborts. Yet this centrality also makes it a critical point of failure—if the coordinator fails at the wrong moment, participants may be left stranded in an uncertain state, unable to proceed. Understanding the coordinator's responsibilities, mechanisms, and recovery procedures is essential for anyone implementing or troubleshooting distributed transaction systems.
By the end of this page, you will have a comprehensive understanding of the coordinator's responsibilities throughout the distributed transaction lifecycle. You'll understand transaction initiation, participant tracking, vote collection, decision dissemination, timeout management, and the coordinator's recovery procedures after failures.
Before examining the coordinator's operational responsibilities, we must understand how a coordinator is selected and what architectural role it plays in the distributed system.
Coordinator Selection Strategies:
In production distributed database systems, the coordinator is typically selected through one of several mechanisms:
1. Client-Designated Coordinator The application or client driver selects which node will coordinate the transaction, often based on which node was first contacted or which holds the primary data being accessed. This is common in sharded database systems.
2. First-Participant Coordinator The first database node accessed by the transaction automatically becomes the coordinator. Subsequent participants register with this initial node.
3. Dedicated Transaction Manager A separate, specialized transaction manager service coordinates all distributed transactions. This is common in enterprise transaction processing systems using X/Open XA or similar standards.
4. Leader Election In highly available systems, a leader election protocol (Raft, Paxos) selects a node to serve as coordinator. If this coordinator fails, a new leader is elected.
| Approach | Advantages | Disadvantages | Use Case |
|---|---|---|---|
| Client-Designated | Predictable, client control, low latency | Requires client awareness, potential hotspots | Sharded databases (CockroachDB) |
| First-Participant | Automatic, no central service | Coordinator location varies, harder to monitor | Federated databases |
| Dedicated TM | Centralized management, auditing | Single point of failure without HA | Enterprise OLTP (Tuxedo, CICS) |
| Leader Election | High availability, fault-tolerant | Election latency, complexity | Cloud-native databases (Spanner) |
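As a concrete illustration of the first-participant approach (row 2 above), here is a minimal sketch of how a routing layer might pin the coordinator to the first node a transaction touches. The `TransactionRouter` class and its API are hypothetical names for illustration:

```typescript
// Hypothetical sketch: first-participant coordinator selection.
// The first node a transaction touches becomes its coordinator;
// later participants are routed to that same node for registration.
type NodeId = string;

class TransactionRouter {
  private coordinators: Map<string, NodeId> = new Map();

  // Called on each data access; returns the coordinator for this transaction.
  access(txId: string, nodeId: NodeId): NodeId {
    const existing = this.coordinators.get(txId);
    if (existing === undefined) {
      // First node touched becomes the coordinator.
      this.coordinators.set(txId, nodeId);
      return nodeId;
    }
    // Subsequent accesses route registration to the existing coordinator.
    return existing;
  }
}
```

Note the per-transaction mapping: different transactions may have different coordinators, which spreads coordination load but makes monitoring harder, as the table notes.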
Coordinator Identity Persistence:
Regardless of how the coordinator is selected, it is critical that all participants know the coordinator's identity and can contact it to resolve uncertainty. Each participant must record this identity durably (typically alongside its prepare record) and it must remain resolvable across failures, so that a recovering participant can reach the coordinator or its successor.
Without this information, a participant that crashed while in the PREPARED state would have no way to discover the transaction's outcome upon recovery.
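The requirement above can be sketched in a few lines: a participant that records the coordinator's identity inside its own prepare record can rediscover whom to query after a crash. The class and field names below are illustrative assumptions:

```typescript
// Illustrative sketch: a participant stores the coordinator's identity
// inside its prepare record, so a recovering participant in the
// PREPARED state knows whom to ask about the transaction's outcome.
interface PrepareRecord {
  transactionId: string;
  coordinatorId: string;       // who to query after a crash
  coordinatorEndpoint: string; // how to reach them
}

class ParticipantLog {
  private records: PrepareRecord[] = [];

  logPrepare(rec: PrepareRecord): void {
    // In a real system this would be a forced write to stable storage.
    this.records.push(rec);
  }

  // After recovery: find the coordinator for an in-doubt transaction.
  coordinatorFor(txId: string): PrepareRecord | undefined {
    return this.records.find(r => r.transactionId === txId);
  }
}
```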
To manage distributed transactions effectively, the coordinator maintains several essential data structures. These structures track the state of ongoing transactions, participant information, and timing constraints.
Essential Coordinator Data Structures:
```typescript
// Transaction Record maintained by the Coordinator
interface TransactionRecord {
  // Unique identifier for the distributed transaction
  transactionId: string;
  // Current state in the coordinator's state machine
  state: 'INITIAL' | 'WAIT' | 'COMMIT' | 'ABORT' | 'END';
  // List of all participants in this transaction
  participants: ParticipantInfo[];
  // Timestamp when commit process began
  prepareStartTime: number;
  // Timeout value for vote collection
  voteTimeout: number;
  // Received votes from participants
  votes: Map<string, 'VOTE_COMMIT' | 'VOTE_ABORT'>;
  // Acknowledgments received from participants
  acknowledgments: Set<string>;
  // The global decision once determined
  decision?: 'COMMIT' | 'ABORT';
}

// Information about each participant
interface ParticipantInfo {
  // Unique identifier for the participant
  participantId: string;
  // Network address for communication
  endpoint: string;
  // Whether we've received their vote
  hasVoted: boolean;
  // The vote received (if any)
  vote?: 'VOTE_COMMIT' | 'VOTE_ABORT';
  // Whether they've acknowledged the decision
  hasAcknowledged: boolean;
  // Number of retry attempts for this participant
  retryCount: number;
}

// The Coordinator's persistent transaction table
class CoordinatorTransactionManager {
  // In-memory transaction table (also persisted to log)
  private activeTransactions: Map<string, TransactionRecord>;
  // Pending timer handles for timeouts
  private timeoutHandles: Map<string, Timer>;
  // Durable log for recovery
  private durableLog: DurableLog;
}
```

Transaction Table:
The transaction table is the coordinator's primary data structure—a mapping from transaction IDs to transaction records. This table must support fast lookup by transaction ID, concurrent updates as votes and acknowledgments arrive, and durable persistence (via the log) so it can be rebuilt after a crash.
Participant Registry:
For each transaction, the coordinator maintains a participant registry recording each participant's identifier, network endpoint, vote status and received vote, acknowledgment status, and retry count.
This registry is populated during the transaction's execution phase as participants join the distributed transaction.
Before the commit process begins, there's an implicit or explicit registration phase. When a transaction first accesses data at a participant node, that participant registers with the coordinator. This registration must complete before the prepare phase begins—the coordinator must know all participants to send them PREPARE messages.
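This register-then-seal ordering can be sketched as follows. The `register`/`sealForPrepare` API is a hypothetical illustration of the constraint described above: once the prepare phase begins, the membership is fixed and late joins must be rejected.

```typescript
// Sketch of the implicit registration phase. Participants join while the
// transaction executes; when commit is requested, the membership is
// sealed so the coordinator knows exactly who must receive PREPARE.
class ParticipantRegistry {
  private participants: Map<string, Set<string>> = new Map();
  private sealed: Set<string> = new Set();

  register(txId: string, participantId: string): void {
    if (this.sealed.has(txId)) {
      throw new Error(`Cannot join ${txId}: prepare phase already started`);
    }
    let members = this.participants.get(txId);
    if (!members) {
      members = new Set();
      this.participants.set(txId, members);
    }
    members.add(participantId);
  }

  // Called when the application requests commit: no more joins allowed.
  sealForPrepare(txId: string): string[] {
    this.sealed.add(txId);
    return Array.from(this.participants.get(txId) ?? []);
  }
}
```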
Let's trace through the complete lifecycle of a distributed transaction from the coordinator's standpoint, examining each phase in detail.
Phase 0: Transaction Initiation and Participant Registration
The coordinator's involvement begins when the distributed transaction is initiated: it assigns a globally unique transaction ID, creates a transaction record in the INITIAL state, and registers each participant in the transaction record as the transaction first touches data on that node.
Phase 1: Vote Collection (Coordinator's Prepare Phase Actions)
When the application requests commit, the coordinator transitions from transaction execution to the commit protocol:
Step 1: Log Prepare Initiation
log_force(<PREPARE T, participants=[P1, P2, P3], timeout=30s>)
Step 2: Transition to WAIT State Update the transaction record's state to WAIT.
Step 3: Start Timeout Timer Initiate a timer for vote collection. If this timer expires without receiving all votes, the coordinator will decide ABORT.
Step 4: Send PREPARE to All Participants Send PREPARE messages to all registered participants. These messages should include the transaction ID and may include other context.
Step 5: Collect Votes Wait for VOTE_COMMIT or VOTE_ABORT from each participant. Record each vote in the transaction record.
Step 6: Make Decision Once all votes are received (or the timeout expires), the coordinator decides: COMMIT if and only if every participant voted VOTE_COMMIT; ABORT if any participant voted VOTE_ABORT or failed to respond before the timeout.
```typescript
class Coordinator {
  /**
   * Initiate the prepare phase for a distributed transaction
   */
  async initiateCommit(txId: string): Promise<'COMMIT' | 'ABORT'> {
    const txRecord = this.activeTransactions.get(txId);
    if (!txRecord || txRecord.state !== 'INITIAL') {
      throw new Error(`Invalid transaction state for commit: ${txId}`);
    }

    // Step 1: Force-write prepare record to durable log
    await this.durableLog.forceWrite({
      type: 'PREPARE',
      transactionId: txId,
      participants: txRecord.participants.map(p => p.participantId),
      timestamp: Date.now()
    });

    // Step 2: Transition to WAIT state
    txRecord.state = 'WAIT';
    txRecord.prepareStartTime = Date.now();

    // Step 3: Start timeout timer
    this.startVoteTimeout(txId);

    // Step 4: Send PREPARE to all participants (parallel, fire-and-forget);
    // we don't await these — votes are collected as they arrive
    for (const p of txRecord.participants) {
      this.sendPrepare(p, txId);
    }

    // Step 5: Wait for votes (with timeout)
    const decision = await this.collectVotes(txId);

    // Step 6: Execute commit phase
    return this.executeDecision(txId, decision);
  }

  /**
   * Collect votes from all participants
   */
  private async collectVotes(txId: string): Promise<'COMMIT' | 'ABORT'> {
    const txRecord = this.activeTransactions.get(txId)!;

    return new Promise((resolve) => {
      const checkComplete = () => {
        // Check if all votes received
        if (txRecord.votes.size === txRecord.participants.length) {
          // All votes in - check if unanimous COMMIT
          const allCommit = Array.from(txRecord.votes.values())
            .every(v => v === 'VOTE_COMMIT');
          this.clearVoteTimeout(txId);
          resolve(allCommit ? 'COMMIT' : 'ABORT');
          return;
        }
        // Check for any ABORT vote (early termination)
        if (Array.from(txRecord.votes.values()).includes('VOTE_ABORT')) {
          this.clearVoteTimeout(txId);
          resolve('ABORT');
        }
      };

      // Register vote handler
      this.voteHandlers.set(txId, (participantId, vote) => {
        txRecord.votes.set(participantId, vote);
        txRecord.participants
          .find(p => p.participantId === participantId)!.hasVoted = true;
        checkComplete();
      });

      // Register timeout handler
      this.timeoutHandlers.set(txId, () => {
        // Timeout - not all votes received, must abort
        resolve('ABORT');
      });
    });
  }
}
```

Once the coordinator has made a decision (either COMMIT or ABORT), it must reliably disseminate this decision to all participants. This is Phase 2 of the protocol, and the coordinator has specific responsibilities to ensure all participants receive and act on the decision.
Decision Persistence:
Before sending the decision to any participant, the coordinator must force-write the decision to its durable log. This is critical because recovery is driven entirely by the log: if the coordinator crashed after notifying some participants but before logging, recovery would have no record that a decision was made and could reach the opposite outcome, leaving participants inconsistent.
The Commit Point:
For COMMIT decisions, the moment when <COMMIT T> is written to stable storage is the commit point. This is the point of no return—once this record is durable, the transaction is committed regardless of any subsequent failure: the coordinator must drive the COMMIT to every participant, retrying as long as necessary, and no participant may unilaterally abort.
The decision record must be on stable storage before ANY participant receives the decision. If the coordinator crashes after sending COMMIT to one participant but before logging, recovery would not know to commit—potentially leading to one committed participant and others that abort. The log must always lead the execution.
Reliable Decision Delivery:
The coordinator must ensure every participant receives the decision. This requires a reliable delivery mechanism: send the decision, collect acknowledgments, and periodically resend to any participant that has not acknowledged, until all have responded or are presumed failed (in which case they learn the outcome by querying the coordinator after recovery).
```typescript
class Coordinator {
  /**
   * Execute the commit or abort decision
   */
  async executeDecision(txId: string, decision: 'COMMIT' | 'ABORT'): Promise<'COMMIT' | 'ABORT'> {
    const txRecord = this.activeTransactions.get(txId)!;

    // CRITICAL: Force-write decision to log BEFORE sending to participants
    await this.durableLog.forceWrite({
      type: decision,
      transactionId: txId,
      timestamp: Date.now()
    });

    // Update in-memory state
    txRecord.state = decision;
    txRecord.decision = decision;

    // Determine which participants need notification
    // For COMMIT: all participants
    // For ABORT: only those that voted COMMIT (others already aborted)
    const participantsToNotify = decision === 'COMMIT'
      ? txRecord.participants
      : txRecord.participants.filter(p => p.vote === 'VOTE_COMMIT');

    // Send decision to all relevant participants
    for (const participant of participantsToNotify) {
      this.sendDecision(participant, txId, decision);
    }

    // Start acknowledgment collection phase
    this.startAckPhase(txId, participantsToNotify);

    // Wait for all acknowledgments (with retry loop)
    await this.collectAcknowledgments(txId, participantsToNotify);

    // All acknowledged - log END and clean up
    await this.completeTransaction(txId);

    return decision;
  }

  /**
   * Collect acknowledgments with retry for reliability
   */
  private async collectAcknowledgments(
    txId: string,
    participants: ParticipantInfo[]
  ): Promise<void> {
    const txRecord = this.activeTransactions.get(txId)!;

    while (true) {
      // Check if all have acknowledged
      const allAcked = participants.every(p =>
        txRecord.acknowledgments.has(p.participantId)
      );
      if (allAcked) {
        return; // All done
      }

      // Find who hasn't acknowledged
      const unackedParticipants = participants.filter(p =>
        !txRecord.acknowledgments.has(p.participantId)
      );

      // Retry sending decision to unacknowledged participants
      for (const participant of unackedParticipants) {
        participant.retryCount++;
        if (participant.retryCount > MAX_RETRIES) {
          // Participant seems permanently failed.
          // Log this and consider them acknowledged
          // (when they recover, they'll query us)
          this.log.warn(`Participant ${participant.participantId} not responding`);
          txRecord.acknowledgments.add(participant.participantId);
          continue;
        }
        // Retry sending decision
        this.sendDecision(participant, txId, txRecord.decision!);
      }

      // Back off before the next retry round, scaling with the highest
      // retry count seen this round (capped)
      const maxRetry = Math.max(...unackedParticipants.map(p => p.retryCount));
      await this.sleep(RETRY_INTERVAL * Math.min(maxRetry, 5));
    }
  }

  /**
   * Complete the transaction - log END and clean up
   */
  private async completeTransaction(txId: string): Promise<void> {
    // Log transaction end
    await this.durableLog.forceWrite({
      type: 'END',
      transactionId: txId,
      timestamp: Date.now()
    });

    // Clean up in-memory structures
    this.activeTransactions.delete(txId);
    this.timeoutHandles.delete(txId);
    this.voteHandlers.delete(txId);
  }
}
```

Timeouts are essential for ensuring the coordinator doesn't wait indefinitely for participants. The coordinator uses several types of timeouts throughout the protocol.
Types of Coordinator Timeouts:
1. Prepare Timeout (Vote Collection Timeout)
After sending PREPARE to all participants, the coordinator starts a timer. If this timer expires before all votes are received, the coordinator treats the missing votes as VOTE_ABORT and decides ABORT; aborting is always safe as long as no decision has been logged.
2. Acknowledgment Timeout
After sending the global decision, the coordinator waits for acknowledgments. This timeout handling is more nuanced: the decision is already durable and cannot be changed, so the coordinator cannot simply give up. It must keep retrying the notification until every participant acknowledges or is presumed failed, in which case that participant learns the outcome by querying after recovery.
| Timeout Type | Typical Value | On Expiry | Blocked Resources |
|---|---|---|---|
| Prepare Timeout | 5-30 seconds | Decide ABORT | No locks held by coordinator |
| Ack Timeout (per retry) | 1-5 seconds | Retry send | Transaction record in memory |
| Ack Maximum Retries | 10-50 attempts | Log warning, mark complete | Log space for transaction |
| Transaction Total Timeout | Minutes to hours | Force abort (if possible) | All participant locks |
Timeout values are critical for system behavior: Too short means aborting transactions that could succeed (false positives). Too long means holding locks while waiting for failed participants (reduced availability). Production systems often use adaptive timeouts that adjust based on historical network latency and participant response times.
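One simple form of the adaptive timeout mentioned above is an exponentially weighted moving average (EWMA) of observed vote round-trip times. The constants, bounds, and safety multiplier below are illustrative assumptions, not tuned recommendations:

```typescript
// Sketch of an adaptive prepare timeout: track an EWMA of vote
// round-trip times and set the timeout to a clamped multiple of it.
class AdaptiveTimeout {
  private avgMs: number;

  constructor(
    initialMs: number,
    private alpha = 0.2,        // EWMA smoothing factor
    private multiplier = 4,     // safety margin over average RTT
    private floorMs = 1000,     // never shorter than this
    private ceilingMs = 30000   // never longer than this
  ) {
    this.avgMs = initialMs;
  }

  // Record a vote's observed round-trip time.
  observe(rttMs: number): void {
    this.avgMs = this.alpha * rttMs + (1 - this.alpha) * this.avgMs;
  }

  // Timeout = multiple of the average RTT, clamped to sane bounds.
  currentTimeoutMs(): number {
    const t = this.avgMs * this.multiplier;
    return Math.min(this.ceilingMs, Math.max(this.floorMs, t));
  }
}
```

The floor and ceiling guard against the two failure modes in the table: a floor prevents false-positive aborts when the network is briefly fast, and a ceiling bounds how long locks can be held waiting for a slow participant.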
Implementing Robust Timeouts:
A robust timeout implementation must handle concurrent operations, timer cancellation, and clock drift:
Use Monotonic Clocks: Wall-clock time can jump (NTP adjustments), causing spurious timeouts or missed timeouts. Use monotonic clocks for measuring elapsed time.
Handle Timer Cancellation: When a vote arrives, check if all votes are now complete. If so, cancel the prepare timeout before it fires.
Avoid Timer Storms: If many transactions timeout simultaneously (e.g., after a network partition heals), the system should handle the load gracefully.
Persist Timeout State: On recovery, recalculate how much time remains rather than restarting timers from zero.
```typescript
class TimeoutManager {
  private timers: Map<string, NodeJS.Timeout> = new Map();
  private deadlines: Map<string, number> = new Map(); // Monotonic time

  /**
   * Start a timeout for vote collection
   */
  startVoteTimeout(txId: string, timeoutMs: number, onTimeout: () => void): void {
    // Record deadline using monotonic time
    const deadline = this.getMonotonicTime() + timeoutMs;
    this.deadlines.set(txId, deadline);

    // Schedule callback
    const handle = setTimeout(() => {
      this.timers.delete(txId);
      this.deadlines.delete(txId);
      onTimeout();
    }, timeoutMs);

    this.timers.set(txId, handle);
  }

  /**
   * Cancel a pending timeout
   */
  cancelTimeout(txId: string): void {
    const handle = this.timers.get(txId);
    if (handle) {
      clearTimeout(handle);
      this.timers.delete(txId);
      this.deadlines.delete(txId);
    }
  }

  /**
   * On recovery, recalculate remaining timeout
   */
  resumeTimeout(txId: string, originalDeadline: number, onTimeout: () => void): void {
    const now = this.getMonotonicTime();
    const remaining = Math.max(0, originalDeadline - now);

    if (remaining === 0) {
      // Already expired
      onTimeout();
    } else {
      // Resume with remaining time
      this.startVoteTimeout(txId, remaining, onTimeout);
    }
  }

  /**
   * Get monotonic time (not affected by wall-clock adjustments)
   */
  private getMonotonicTime(): number {
    // In Node.js: process.hrtime() or perf_hooks
    // In browser: performance.now()
    return performance.now();
  }
}
```

When a coordinator crashes and restarts, it must recover the state of all active distributed transactions and complete them. The recovery procedure is driven entirely by the durable log.
Log-Based Recovery Algorithm:
The coordinator scans its log from the most recent checkpoint forward and categorizes each transaction:
Case 1: Only <PREPARE T> found, no <COMMIT T> or <ABORT T>. The crash occurred during vote collection and no decision was logged. The safe action is to log <ABORT T> and notify all participants to abort.
Case 2: <COMMIT T> found, no <END T>. The commit decision is durable, but some participants may not have received it. Recovery must resend COMMIT, collect acknowledgments, and then log <END T>.
Case 3: <ABORT T> found, no <END T>. Recovery resends ABORT to the participants that voted COMMIT, collects acknowledgments, and logs <END T>.
Case 4: <END T> found. The transaction completed fully before the crash; no recovery action is needed.
Case 5: No log records for a transaction (participant asks about it). The transaction never reached the prepare logging step, so no decision could have been made; the coordinator answers ABORT.
Many systems implement 'Presumed Abort' optimization: if there's no log record for a transaction, assume it aborted. This reduces logging for abort cases (no need to write <ABORT T>) and allows faster response to participant queries. However, COMMIT records must still be force-written before any participant is told to commit.
```typescript
class CoordinatorRecovery {
  /**
   * Recover coordinator state after crash
   */
  async recover(): Promise<void> {
    console.log('Starting coordinator recovery...');

    // Step 1: Scan log and categorize transactions
    const transactions = await this.classifyTransactions();

    // Step 2: Handle each category
    for (const [txId, state] of transactions) {
      await this.recoverTransaction(txId, state);
    }

    console.log('Coordinator recovery complete');
  }

  /**
   * Scan log and classify transaction states
   */
  private async classifyTransactions(): Promise<Map<string, TransactionState>> {
    const transactions = new Map<string, TransactionState>();

    // Read log from checkpoint
    const logRecords = await this.durableLog.readFromCheckpoint();

    for (const record of logRecords) {
      switch (record.type) {
        case 'PREPARE':
          transactions.set(record.transactionId, {
            phase: 'WAIT',
            participants: record.participants,
            decision: undefined
          });
          break;
        case 'COMMIT': {
          const commitTx = transactions.get(record.transactionId);
          if (commitTx) {
            commitTx.decision = 'COMMIT';
          }
          break;
        }
        case 'ABORT': {
          const abortTx = transactions.get(record.transactionId);
          if (abortTx) {
            abortTx.decision = 'ABORT';
          }
          break;
        }
        case 'END':
          // Transaction complete - remove from recovery list
          transactions.delete(record.transactionId);
          break;
      }
    }

    return transactions;
  }

  /**
   * Recover a single transaction
   */
  private async recoverTransaction(txId: string, state: TransactionState): Promise<void> {
    if (state.decision === 'COMMIT') {
      // Decision was COMMIT - need to complete it
      console.log(`Recovering COMMIT for ${txId}`);
      await this.completeCommit(txId, state.participants);
    } else if (state.decision === 'ABORT') {
      // Decision was ABORT - need to notify voting COMMIT participants
      console.log(`Recovering ABORT for ${txId}`);
      await this.completeAbort(txId, state.participants);
    } else {
      // No decision - crashed during vote collection
      // Safe choice: ABORT
      console.log(`No decision found for ${txId} - aborting`);
      await this.durableLog.forceWrite({
        type: 'ABORT',
        transactionId: txId
      });
      await this.completeAbort(txId, state.participants);
    }
  }

  /**
   * Handle participant query about transaction outcome
   */
  handleDecisionQuery(txId: string): 'COMMIT' | 'ABORT' {
    // Look up in recovered transactions
    const state = this.activeTransactions.get(txId);
    if (state?.decision === 'COMMIT') {
      return 'COMMIT';
    }
    // No record or explicit ABORT - return ABORT
    // (Presumed Abort)
    return 'ABORT';
  }
}
```

The coordinator is a single point of failure in classic 2PC. If the coordinator fails while participants are in the PREPARED state, those participants are blocked—unable to commit or abort—until the coordinator recovers. In production systems, this risk is mitigated through coordinator high availability (HA) mechanisms.
High Availability Strategies:
1. Synchronous Log Replication
The simplest HA approach replicates the coordinator's log to a standby node: every forced log write is also shipped to the standby and acknowledged before the coordinator proceeds. If the primary fails, the standby holds a complete copy of all logged decisions and can take over without losing any committed outcome.
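A minimal sketch of this idea follows, assuming a hypothetical `Standby` interface whose `replicate` resolves only once the record is durable on the standby. The key property is that `forceWrite` does not complete until both copies exist, which is the "+1 RTT per commit" cost in the comparison table below:

```typescript
// Conceptual sketch of synchronous log replication: a forced write
// returns only after the standby has acknowledged the record.
interface Standby {
  replicate(record: object): Promise<void>; // resolves once durable on standby
}

class ReplicatedLog {
  public localRecords: object[] = [];

  constructor(private standby: Standby) {}

  async forceWrite(record: object): Promise<void> {
    // Write locally (stand-in for a disk sync)...
    this.localRecords.push(record);
    // ...then wait for the standby's acknowledgment before proceeding,
    // so no decision exists on the primary alone.
    await this.standby.replicate(record);
  }
}
```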
2. Active-Passive Failover A standby coordinator monitors the primary (typically via heartbeats or a shared lease). On failure, the standby promotes itself, replays the replicated log, and resumes in-flight transactions exactly as the primary's own recovery procedure would.
3. Active-Active Coordination (Partitioned) Multiple coordinators run simultaneously, each responsible for a disjoint partition of the transaction ID space. A coordinator failure affects only its partition, and its transactions fail over to a designated peer.
| Approach | Failover Time | Data Loss Risk | Complexity | Latency Impact |
|---|---|---|---|---|
| Synchronous Replication | Seconds | None | Low-Medium | High (+1 RTT per commit) |
| Asynchronous Replication | Seconds | Some possible | Medium | Low |
| Consensus (Paxos/Raft) | Sub-second | None with quorum | High | Medium |
| Shared Storage | Seconds-Minutes | None if storage survives | Medium | Low |
Modern distributed databases like CockroachDB, TiDB, and YugabyteDB avoid the single-coordinator problem by using consensus protocols (Raft) for transaction coordination. Each transaction's state is replicated across multiple nodes, ensuring no single failure blocks the transaction. These systems effectively integrate 2PC with consensus-based replication.
Fencing to Prevent Split-Brain:
When using failover mechanisms, it's critical to ensure that a 'zombie' coordinator (an old primary that was incorrectly presumed dead) cannot interfere with the new coordinator. This is called the split-brain problem.
Fencing Mechanisms:
Epoch/Generation Numbers: Each coordinator incarnation has a unique epoch number. Participants reject messages from old epochs.
Lease-Based Leadership: The coordinator holds a time-limited lease. It cannot send decisions after its lease expires. The new coordinator waits for the old lease to expire.
STONITH (Shoot The Other Node In The Head): Physically power off the old coordinator before the new one takes over.
Consensus-Based Leadership: Use a consensus protocol to elect a single leader. Only the elected leader can coordinate transactions.
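The epoch mechanism described first can be sketched in a few lines: each participant remembers the highest coordinator epoch it has seen and rejects anything stamped with an older one, so a zombie coordinator's messages are ignored after failover. The class name is illustrative:

```typescript
// Illustrative epoch-based fencing. A failed-over coordinator restarts
// with a higher epoch; messages from the old incarnation carry a lower
// epoch and are rejected, preventing split-brain decisions.
class FencedParticipant {
  private highestEpoch = 0;

  // Returns true if the message is accepted.
  acceptMessage(epoch: number): boolean {
    if (epoch < this.highestEpoch) {
      return false; // stale coordinator (zombie) - reject
    }
    this.highestEpoch = epoch; // remember the newest incarnation
    return true;
  }
}
```

In practice the epoch is allocated durably (e.g., by the election service), so it only ever increases across coordinator incarnations.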
Logging is a significant performance bottleneck in 2PC—every transaction requires multiple synchronous writes to stable storage. Several optimizations reduce this overhead while maintaining correctness.
Key optimizations include presumed abort (discussed earlier, which eliminates logging for aborted transactions and enables fast answers to participant queries), group commit (batching many log forces into one disk sync), and deferring non-critical records such as <END T> rather than forcing them immediately.
Group Commit in Detail:
Group commit is the most impactful optimization. Instead of forcing each log record individually, the coordinator appends records to an in-memory batch and issues a single sync when the batch fills or a short timer expires; every transaction waiting on that batch then proceeds together.
This amortizes the cost of a single sync (typically 2-10ms for spinning disk, <1ms for NVMe SSD) across many transactions.
```typescript
class GroupCommitLog {
  private pendingRecords: LogRecord[] = [];
  private waiters: Map<string, () => void> = new Map();
  private flushTimer: NodeJS.Timeout | null = null;
  private readonly MAX_BATCH_SIZE = 100;
  private readonly MAX_WAIT_MS = 10;

  /**
   * Write a log record with group commit optimization
   */
  async writeWithGroupCommit(record: LogRecord): Promise<void> {
    return new Promise((resolve) => {
      // Add record to pending batch
      this.pendingRecords.push(record);
      this.waiters.set(record.transactionId, resolve);

      // Check if batch is full
      if (this.pendingRecords.length >= this.MAX_BATCH_SIZE) {
        this.flushBatch();
        return;
      }

      // Start timer if not already running
      if (!this.flushTimer) {
        this.flushTimer = setTimeout(
          () => this.flushBatch(),
          this.MAX_WAIT_MS
        );
      }
    });
  }

  /**
   * Flush pending records to disk
   */
  private async flushBatch(): Promise<void> {
    if (this.flushTimer) {
      clearTimeout(this.flushTimer);
      this.flushTimer = null;
    }
    if (this.pendingRecords.length === 0) return;

    // Capture current batch
    const batch = this.pendingRecords;
    const batchWaiters = new Map(this.waiters);
    this.pendingRecords = [];
    this.waiters = new Map();

    // Write and sync
    await this.writeToDisk(batch);
    await this.syncToDisk();

    // Notify all waiters
    for (const resolve of batchWaiters.values()) {
      resolve();
    }
  }
}
```

We've thoroughly examined the coordinator's responsibilities in the Two-Phase Commit protocol. The coordinator is the central authority that transforms distributed uncertainty into global consensus. Its responsibilities span the whole lifecycle: registering participants, force-logging the prepare record and the decision (the log must always lead execution), collecting votes and acknowledgments under timeouts with retries, and recovering in-flight transactions from the durable log after a crash.
What's Next:
The next page examines the Participant Role—how participants process coordinator messages, manage their local state, handle the uncertain PREPARED state, and recover after failures. Understanding both roles is essential for a complete picture of distributed transaction processing.
You now understand the coordinator's comprehensive responsibilities in the Two-Phase Commit protocol—from transaction initiation through vote collection, decision making, reliable notification, recovery, and high availability. Next, we'll explore how participants respond to and implement these coordinator directives.