Your transaction has executed all its operations successfully. You've read the data you needed, performed calculations, updated rows, inserted records—everything went according to plan. Now you issue COMMIT. Is your data safe?
Not yet.
Between the moment you request a commit and the moment your transaction's effects become permanent, there exists a critical transitional state: Partially Committed. This state represents a transaction that has completed all its operations but whose changes have not yet been guaranteed to survive a system crash.
The Partially Committed state exists because achieving durability (the 'D' in ACID) requires physical I/O operations that cannot be completed instantaneously. During this brief window, the database system is working to make your transaction's effects permanent—but if a failure occurs at precisely the wrong moment, even a 'committed' transaction might be lost.
By the end of this page, you will understand why the Partially Committed state exists, what operations the DBMS performs during this critical phase, how different databases implement commit processing, the distinction between logical and physical commit, and what happens when failures occur during partial commitment.
Let's establish a precise understanding of what it means for a transaction to be Partially Committed.
Formal Definition:
A transaction T is in the Partially Committed state if and only if it has finished executing its final operation and has requested a commit, but the DBMS has not yet confirmed that its effects are durable, and no failure has occurred during commit processing.
Using state machine notation:
T ∈ PartiallyCommitted ⟺
(previous_state(T) = Active) ∧
(commit_requested(T) = true) ∧
(commit_complete(T) = false) ∧
(failure_during_commit(T) = false)
The Key Distinction:
The Partially Committed state represents the logical completion of a transaction that is waiting for physical confirmation. The transaction has done everything it logically intended to do, but the system has not yet made those changes crash-proof.
| Aspect | Active State | Partially Committed State |
|---|---|---|
| Operations permitted | Read, Write, Savepoint, Control | None—awaiting commit completion |
| Reversibility | Can be rolled back at any time | Attempting to achieve irreversibility |
| Durability status | Changes only in buffer/log | Forcing log to stable storage |
| Next possible states | Partially Committed or Failed | Committed or Failed |
| Resource locks | Held for isolation | Still held, released after commit |
| Log records | In log buffer | Being forced to disk |
You might wonder why we need a Partially Committed state at all. The reason is hardware reality: writing to stable storage takes time. While waiting for disk I/O to confirm, failures can still occur. The Partially Committed state acknowledges that the logical decision to commit has been made, but physical durability is still being established. This separation allows the DBMS to properly handle failures during commit processing.
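To make the transitions concrete, here is a minimal Python sketch of the transaction state machine described above. The enum values and transition rules mirror the states discussed in this chapter; the class and method names are illustrative, not taken from any particular DBMS.

```python
from enum import Enum, auto

class TxnState(Enum):
    ACTIVE = auto()
    PARTIALLY_COMMITTED = auto()
    COMMITTED = auto()
    FAILED = auto()
    ABORTED = auto()

# Legal transitions in the classic five-state transaction model
TRANSITIONS = {
    TxnState.ACTIVE: {TxnState.PARTIALLY_COMMITTED, TxnState.FAILED},
    TxnState.PARTIALLY_COMMITTED: {TxnState.COMMITTED, TxnState.FAILED},
    TxnState.FAILED: {TxnState.ABORTED},
    TxnState.COMMITTED: set(),   # terminal
    TxnState.ABORTED: set(),     # terminal
}

class Transaction:
    def __init__(self, txn_id: int):
        self.txn_id = txn_id
        self.state = TxnState.ACTIVE

    def transition(self, new_state: TxnState) -> None:
        # Reject anything the state diagram does not allow
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

# A successful commit passes through Partially Committed on its way to Committed
t = Transaction(42)
t.transition(TxnState.PARTIALLY_COMMITTED)  # COMMIT was issued
t.transition(TxnState.COMMITTED)            # log force succeeded
```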
A transaction transitions from Active to Partially Committed when it issues a commit request. Let's examine this transition in detail.
The COMMIT Command:
When you execute COMMIT (or the equivalent in your database), you're signaling that the transaction has performed every operation it intended to perform and that you want its effects made permanent.
At this moment, the transaction's state changes to Partially Committed, and a series of critical actions begin.
What Happens Immediately After COMMIT:
The instant a transaction enters Partially Committed:
Commit Record Generation — A commit log record is created, marking this transaction as intending to commit
Operation Prohibition — No further read or write operations are permitted; the transaction's work is complete
Log Force Initiation — The DBMS begins forcing all log records (including the commit record) to stable storage
Lock Retention — All locks are still held; releasing them prematurely could violate isolation
Wait State — The transaction waits for the log force to complete
```
// Pseudocode: Commit processing from Active to Committed
function process_commit(transaction T) {
    // PRE-CONDITION: T is in Active state
    assert(T.state == ACTIVE);

    // Transition to Partially Committed
    T.state = PARTIALLY_COMMITTED;

    // Step 1: Write the commit log record to the log buffer
    commit_record = create_log_record(
        type: COMMIT,
        transaction_id: T.id,
        timestamp: current_time()
    );
    append_to_log_buffer(commit_record);

    // Step 2: Force log to stable storage (THE CRITICAL STEP)
    // This is where durability is achieved
    try {
        force_log_to_disk(up_to: commit_record.LSN);

        // If we reach here, the log force succeeded
        // Step 3: Transition to Committed
        T.state = COMMITTED;

        // Step 4: Release all locks (now safe)
        release_all_locks(T);

        // Step 5: Clean up transaction resources
        deallocate_transaction_descriptor(T);

        return SUCCESS;
    } catch (IOError e) {
        // Log force failed - the commit cannot complete
        T.state = FAILED;  // Transaction will need to be aborted
        return FAILURE;
    }
}
```

The transition from Partially Committed to Committed happens at exactly one point: when the log force completes successfully. If the system crashes AFTER the log force but BEFORE updating the transaction state to Committed, recovery will still see the commit record in the log and complete the commit. This is why log-based recovery works.
The most critical operation in the Partially Committed state is the log force. Understanding this operation is essential for grasping how databases achieve durability.
What is a Log Force?
A log force (also called log flush or log sync) is the operation of writing all log records from the in-memory log buffer to stable storage (disk) and waiting for confirmation that the write has completed. Only when this confirmation arrives is durability guaranteed.
Why Force the Log?
Consider the alternative: if we only kept log records in memory and the system crashed, all those records would be lost. Without the log records, the DBMS would have no way to know which transactions had committed, which changes need to be redone, or which incomplete changes need to be undone.
The log force ensures that the commit decision survives any crash.
| Log Record Type | Contents | Purpose |
|---|---|---|
| BEGIN | Transaction ID, Start timestamp | Mark transaction start for recovery |
| UPDATE (each) | TID, Table, OldValue, NewValue, LSN | Provide redo and undo information |
| INSERT (each) | TID, Table, NewRow, LSN | Enable redo; undo deletes the row |
| DELETE (each) | TID, Table, OldRow, LSN | Enable redo; undo reinserts the row |
| COMMIT | Transaction ID, Commit timestamp | Mark transaction as durably committed |
The Write-Ahead Logging (WAL) Rule:
Databases follow the Write-Ahead Logging rule: before any change is written to the actual data files, the corresponding log record must be written to stable storage first. This ensures that undo information is on disk before a data page is overwritten, and that redo information is on disk before a commit is acknowledged.
In the Partially Committed state, we go one step further: we force ALL log records for the transaction, including the commit record, to disk.
```
// Detailed log force operation during Partially Committed
function force_log_to_disk(up_to_lsn: LSN) {
    // Get the current state of the log buffer
    buffer_end_lsn = log_buffer.last_lsn();

    // We may force more than requested if the buffer has additional records
    // (batching for efficiency)
    lsn_to_force = max(up_to_lsn, buffer_end_lsn);

    // Create the I/O operation
    io_request = {
        data: log_buffer.contents_up_to(lsn_to_force),
        destination: log_file.next_write_position(),
        flags: O_SYNC | O_DIRECT  // Ensure actual disk write, bypass OS cache
    };

    // Issue the write and WAIT for completion
    // This is a SYNCHRONOUS operation - critical for durability
    result = disk_write_sync(io_request);
    if (result.bytes_written != io_request.data.size) {
        throw IOError("Incomplete log write");
    }

    // Optionally: force the disk controller to flush its cache
    // (Important for true durability on some hardware)
    disk_sync(log_file);

    // Update bookkeeping
    log_buffer.mark_flushed_up_to(lsn_to_force);

    // Now durability is guaranteed up to lsn_to_force
}

// Group commit optimization: batch multiple commits together
function group_commit() {
    // Instead of forcing for each transaction individually,
    // collect multiple commits and force once
    wait_for_commit_batch(timeout: 1ms);  // Brief delay to collect commits

    // Force all accumulated commits in one I/O
    force_log_to_disk(max_lsn_of_waiting_commits);

    // Wake up all transactions whose commits are now durable
    for each transaction in waiting_commits:
        transaction.state = COMMITTED;
        release_locks(transaction);
}
```

Modern databases use 'group commit' to amortize the cost of log forces across multiple transactions. Instead of each transaction forcing the log independently, the system briefly waits to batch multiple commit records together. One I/O operation then makes all transactions durable. This dramatically improves throughput in high-concurrency systems.
The Partially Committed state's duration and reliability depend heavily on hardware characteristics. Understanding these factors is crucial for database administrators and engineers designing storage systems.
The Lying Disk Problem:
Some disk drives (and operating systems) report that a write has completed when the data is actually only in a volatile cache on the disk controller. If power is lost at this moment, the 'durable' data is lost. This is why true durability requires disabling (or explicitly flushing) volatile write caches, using battery-backed or capacitor-backed controller caches, or issuing cache-flush commands that the hardware genuinely honors.
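To illustrate what 'forcing to stable storage' means at the operating-system level, here is a minimal Python sketch of a durable log append on Linux. The file name and helper are hypothetical; the pattern of write-then-fsync (including an fsync of the directory after creating the file) is the standard technique.

```python
import os

def durable_append(path: str, record: bytes) -> None:
    """Append a log record and do not return until it is on stable storage."""
    # Open for appending; O_APPEND keeps concurrent appends ordered
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, record)
        # fsync() asks the kernel (and, ideally, the drive) to flush the data;
        # without it, the bytes may sit in the OS page cache or disk cache
        os.fsync(fd)
    finally:
        os.close(fd)

    # If the file was just created, its directory entry must also be durable
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

# Example: append a (fake) commit record for transaction 42
durable_append("wal.log", b"COMMIT txn=42\n")
```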
Latency in Partially Committed State:
The time a transaction spends in Partially Committed is dominated by I/O latency. Typical values:
| Storage Type | Log Force Latency | Impact on Commit Rate |
|---|---|---|
| HDD (7.2K RPM) | 5-10 ms | ~100-200 commits/sec (individual) |
| Enterprise SSD | 0.1-0.5 ms | ~2,000-10,000 commits/sec |
| NVMe SSD | 0.02-0.1 ms | ~10,000-50,000 commits/sec |
| Optane/PMEM | 0.005-0.02 ms | ~50,000-200,000 commits/sec |
Note: With group commit, actual throughput can be much higher than individual commit latency would suggest.
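A rough back-of-the-envelope calculation shows why. In the sketch below, the 5 ms force latency comes from the HDD row of the table above, while the batch size of 40 is an illustrative assumption, not a measured value.

```python
# Individual commits on a 7.2K RPM HDD: one log force per commit
force_latency_s = 0.005                 # ~5 ms per log force
print(1 / force_latency_s)              # ~200 commits/sec when each commit forces alone

# Group commit: many commit records share a single log force
batch_size = 40                         # assumed average commits per batch
print(batch_size / force_latency_s)     # ~8,000 commits/sec on the same hardware
```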
Database Configuration for Durability:
```sql
-- PostgreSQL durability settings

-- Full durability (default, recommended for most workloads)
SET synchronous_commit = on;

-- Group commit delay (wait up to 10μs to batch commits)
SET commit_delay = 10;    -- microseconds
SET commit_siblings = 5;  -- minimum concurrent transactions before delaying

-- For write-heavy workloads: slightly relaxed durability
-- CAUTION: Data may be lost on crash (up to 3 * wal_writer_delay)
SET synchronous_commit = off;

-- Force WAL sync method (varies by OS)
-- SHOW wal_sync_method;  -- see current setting
-- Options: fsync, fdatasync, open_sync, open_datasync

-- Check if the file system/disk combination is trustworthy
-- Linux: Check /sys/block/sdX/queue/write_cache for write caching status
```

Some applications trade durability for performance by configuring asynchronous commit or reduced flush frequency. This is acceptable ONLY when the application can tolerate losing the last few seconds of transactions after a crash. Never disable durability guarantees without understanding the business impact and having compensating controls.
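If only some transactions can tolerate relaxed durability, PostgreSQL lets you scope the setting to a single transaction with SET LOCAL. The sketch below uses psycopg2; the connection string and the page_views table are assumptions for illustration.

```python
import psycopg2

conn = psycopg2.connect("dbname=app")  # assumed connection string

with conn:  # psycopg2 commits the transaction on clean exit from this block
    with conn.cursor() as cur:
        # Applies only to this transaction; reverts automatically at COMMIT/ROLLBACK
        cur.execute("SET LOCAL synchronous_commit = off")
        # A low-value write we can afford to lose in a crash (hypothetical table)
        cur.execute("INSERT INTO page_views (url) VALUES (%s)", ("/home",))
# The COMMIT for this transaction returns without waiting for the WAL flush
```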
What happens if something goes wrong while a transaction is Partially Committed? This is a critical scenario that database recovery systems must handle correctly.
Types of Failures During Partial Commitment:
Three things can go wrong while a transaction is Partially Committed: an I/O error during the log force, a system crash before the log force completes, and a crash after the log force completes but before the client learns of it.
Scenario Analysis:
Scenario 1: I/O Error During Log Force
If the database cannot complete the log force due to a disk error, the commit cannot be guaranteed: the transaction transitions to Failed, its changes are rolled back, and an error is returned to the client. Persistent log-device errors are serious enough that many systems stop accepting new work until the problem is resolved.
Scenario 2: Crash Before Complete Log Force
If the system crashes before the log force completes, the commit record never reaches stable storage. During recovery, the DBMS finds no commit record for the transaction, treats it as incomplete, and undoes its changes. Nothing was promised to the client, because the commit was never acknowledged.
Scenario 3: Crash After Complete Log Force
If the system crashes just after the log force succeeds but before the transaction state is updated to Committed, the commit record is already on stable storage. Recovery finds it, redoes the transaction's changes, and treats the transaction as committed, even though the in-memory state change and the acknowledgment to the client never happened.
The presence or absence of the COMMIT record in the recovery log is the definitive answer to 'did this transaction commit?' This is why the log force is the critical operation. Once the commit record reaches stable storage, the transaction WILL commit, even if the system crashes immediately afterward. Before the commit record reaches stable storage, the transaction WILL abort if the system crashes.
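The following Python sketch shows how a greatly simplified recovery pass applies this rule: it scans the durable log once, collects the set of transactions with a COMMIT record, then redoes committed work and undoes everything else. The log format here is invented for illustration; real recovery algorithms such as ARIES are considerably more involved.

```python
def recover(log_records):
    """log_records: list of dicts already read from stable storage, in log order."""
    committed = {r["txn"] for r in log_records if r["type"] == "COMMIT"}

    # Redo pass: reapply changes of transactions whose COMMIT record survived
    for r in log_records:
        if r["type"] == "UPDATE" and r["txn"] in committed:
            apply_value(r["page"], r["new_value"])       # redo

    # Undo pass: roll back transactions with no COMMIT record, in reverse order
    for r in reversed(log_records):
        if r["type"] == "UPDATE" and r["txn"] not in committed:
            apply_value(r["page"], r["old_value"])       # undo

def apply_value(page, value):
    # Placeholder for writing the value back to a data page
    print(f"page {page} <- {value}")

# A crash left T1's COMMIT on disk but not T2's: T1 is redone, T2 is undone
recover([
    {"type": "UPDATE", "txn": "T1", "page": 7, "old_value": 100, "new_value": 150},
    {"type": "UPDATE", "txn": "T2", "page": 9, "old_value": 20,  "new_value": 80},
    {"type": "COMMIT", "txn": "T1"},
])
```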
Client-Side Uncertainty:
A subtle but important issue arises from Scenario 3. Consider this sequence: the client issues COMMIT, the database completes the log force, and then the server crashes (or the network connection drops) before the acknowledgment reaches the client.
The transaction did commit, but the client is uncertain. This is an inherent limitation of distributed systems—the 'two generals problem.' Applications must be designed to handle this uncertainty, typically through idempotency keys, checking the transaction's outcome after reconnecting, or application-level reconciliation, as the example below demonstrates.
```python
import uuid
import psycopg2
from psycopg2 import OperationalError

def transfer_with_idempotency(conn, from_account, to_account, amount):
    """
    Idempotent funds transfer that handles commit uncertainty.
    Uses a unique transfer ID to prevent duplicate execution.
    """
    # Generate unique transfer ID OUTSIDE the transaction
    transfer_id = str(uuid.uuid4())
    max_retries = 3

    for attempt in range(max_retries):
        try:
            with conn.cursor() as cur:
                cur.execute("BEGIN")

                # First, check if this transfer already happened
                # (handles reconnection after uncertain commit)
                cur.execute("""
                    SELECT status FROM transfers
                    WHERE transfer_id = %s
                """, (transfer_id,))
                existing = cur.fetchone()

                if existing:
                    # Transfer already recorded - check its status
                    if existing[0] == 'COMPLETED':
                        conn.commit()
                        return {'status': 'already_completed', 'id': transfer_id}
                    # If not completed, continue with the transfer

                # Record the transfer attempt
                cur.execute("""
                    INSERT INTO transfers (transfer_id, from_acct, to_acct, amount, status)
                    VALUES (%s, %s, %s, %s, 'PENDING')
                    ON CONFLICT (transfer_id) DO NOTHING
                """, (transfer_id, from_account, to_account, amount))

                # Perform the actual transfer
                cur.execute("""
                    UPDATE accounts SET balance = balance - %s
                    WHERE account_id = %s AND balance >= %s
                """, (amount, from_account, amount))

                if cur.rowcount == 0:
                    conn.rollback()
                    return {'status': 'insufficient_funds'}

                cur.execute("""
                    UPDATE accounts SET balance = balance + %s
                    WHERE account_id = %s
                """, (amount, to_account))

                # Mark transfer complete
                cur.execute("""
                    UPDATE transfers SET status = 'COMPLETED'
                    WHERE transfer_id = %s
                """, (transfer_id,))

                # COMMIT - entering Partially Committed state
                conn.commit()  # This might raise an exception if connection lost

                return {'status': 'completed', 'id': transfer_id}

        except OperationalError as e:
            # Connection lost - we don't know if commit succeeded
            # On retry, the idempotency check will tell us
            conn.close()
            conn = get_new_connection()  # Reconnect (application-provided helper)
            continue

    return {'status': 'failed_after_retries'}
```

Different database systems implement the Partially Committed state with varying nuances. Understanding these differences is important for database selection and performance tuning.
PostgreSQL Commit Processing:
PostgreSQL uses Write-Ahead Logging (WAL) for durability. During commit:
1. A commit record is appended to the WAL buffer
2. The WAL is flushed to disk up to and including the commit record (controlled by the synchronous_commit setting)
3. The transaction is marked committed in pg_xact (commit status) in shared memory
4. Locks are released and the client receives the commit acknowledgment
The commit is durable after step 2, regardless of whether data pages are updated.
```sql
-- Examine WAL activity during commit

-- Monitor WAL write position
SELECT pg_current_wal_lsn() AS current_wal_position;

-- Before commit
BEGIN;
UPDATE test_table SET value = 'new_value' WHERE id = 1;

-- Check WAL position growth
SELECT pg_current_wal_lsn() AS after_update;

-- Commit and observe
COMMIT;
SELECT pg_current_wal_lsn() AS after_commit;

-- View commit timestamp (for investigating transaction timing)
SELECT pg_xact_commit_timestamp('123456'::xid);  -- If track_commit_timestamp is on

-- Check if WAL has been acknowledged by replicas (synchronous replication)
SELECT application_name,
       state,
       sent_lsn,
       write_lsn,
       flush_lsn,   -- Confirmed durable on replica
       replay_lsn
FROM pg_stat_replication;
```

MySQL/InnoDB Commit Processing:
MySQL separates the redo log (InnoDB) from the binary log (replication). A commit involves:
1. A prepare phase writes the transaction's changes to the InnoDB redo log
2. The transaction is written to (and optionally synced in) the binary log, controlled by sync_binlog
3. The commit marker is written to the redo log, with flushing controlled by innodb_flush_log_at_trx_commit
With innodb_flush_log_at_trx_commit=1 and sync_binlog=1, both logs are forced to disk at each commit.
SQL Server Commit Processing:
SQL Server uses a sophisticated logging mechanism: log records accumulate in the log buffer, and at commit the relevant log blocks are hardened to the transaction log file before the commit is acknowledged, with the log writer batching flushes from multiple sessions into fewer I/Os.
SQL Server supports delayed durability as an option, where commit returns before the log is forced to disk, trading durability for latency.
Oracle Commit Processing:
Oracle uses redo logs and has an efficient commit mechanism: the commit assigns a system change number (SCN), places a commit record in the redo log buffer, and the session waits (by default) for the Log Writer to flush the buffer to the online redo log files before the commit returns.
Oracle's Log Writer is a dedicated background process that batches commits for efficiency.
| Aspect | PostgreSQL | MySQL/InnoDB | SQL Server | Oracle |
|---|---|---|---|---|
| Log Mechanism | WAL | Redo Log + Binlog | Transaction Log | Redo Log |
| Commit Batching | Group Commit | Group Commit | Group Commit | LGWR Batching |
| Relaxed Durability | synchronous_commit=off | innodb_flush_log_at_trx_commit=2 | Delayed Durability | COMMIT_WRITE NOWAIT |
| Typical Commit Latency | ~0.1-1ms | ~0.1-0.5ms | ~0.1-0.5ms | ~0.05-0.2ms |
The Partially Committed state is typically fleeting—lasting only as long as the log force operation, often just milliseconds. However, in certain circumstances, transactions can be observed in this state.
When Partially Committed Becomes Visible:
You can catch transactions in this state when commit processing is slow: sessions waiting on WAL write or sync events, commits waiting for a synchronous replica to acknowledge, deliberate group-commit delays, or an overloaded log device all stretch the window between the commit request and its completion.
Monitoring Commit Latency:
```sql
-- Enable timing for session
\timing on

-- Measure commit latency directly
BEGIN;
UPDATE small_table SET value = 'test' WHERE id = 1;
COMMIT;  -- Timing shows commit duration (includes Partially Committed time)

-- Monitor WAL write/sync statistics
SELECT *,
       CASE WHEN wal_buffers_full > 0
            THEN 'May indicate commit contention'
            ELSE 'Normal'
       END AS assessment
FROM pg_stat_wal;

-- Check for transactions waiting on WAL-related events
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event ILIKE '%wal%'
GROUP BY wait_event_type, wait_event;

-- View bgwriter/checkpointer stats (related to durability)
SELECT * FROM pg_stat_bgwriter;
```

Troubleshooting Long Partially Committed Times:
If you observe transactions spending excessive time in the commit phase:
Check disk latency: use iostat or other disk monitoring tools and look for high await times on the log device
Check replication: a slow or unreachable synchronous standby stalls every commit
Check configuration: commit_delay, wal_sync_method, and WAL placed on a shared, saturated volume can all add latency
Application-perceived commit latency directly affects user experience. In a typical OLTP application, commit latency contributes to overall response time. Moving from HDD to SSD can reduce commit latency by 10-100x, dramatically improving application responsiveness. For write-heavy applications, commit optimization is often the highest-ROI performance improvement.
We've thoroughly explored the Partially Committed state—the critical bridge between logical commit and physical durability. Let's consolidate the key concepts:
Partially Committed is the window between the commit request and the durable completion of the commit
The log force is the critical operation: once the commit record reaches stable storage, the transaction will commit even if the system crashes immediately afterward
Failures before the log force cause the transaction to abort; failures after it are resolved by recovery in favor of the commit
Locks are held throughout the Partially Committed state and released only after the transition to Committed
Group commit and fast log devices are the main levers for reducing time spent in this state
Clients can be left uncertain about a commit's outcome, so applications should use idempotent, retryable designs
What's Next:
With an understanding of both Active and Partially Committed states, we'll now explore the Committed state—the final, successful outcome where a transaction's effects become permanent and irreversible. We'll examine what 'committed' truly means, the guarantees it provides, and how the system handles post-commit cleanup.
You now understand the Partially Committed state in depth—why it exists, what happens during this phase, how durability is achieved through log forcing, and how to handle failures during commitment. This understanding is essential for building robust applications and troubleshooting database performance issues.