When you write code to update a database record, there's a crucial period between when the change is made and when it becomes permanent. During this interval, the modification exists in a peculiar state—visible to some operations but not yet durable, present in memory but not guaranteed to survive, enacted but not committed.
This is the realm of uncommitted data, and understanding it is essential to grasping why dirty reads are so problematic. Uncommitted data is not simply 'pending' data—it represents a fundamental uncertainty about whether the change will ever become part of the permanent database state.
This page explores the nature of uncommitted data: where it exists in the database architecture, how it differs from committed data, why it's inherently provisional, and how its exposure to other transactions creates the dirty read problem. You'll understand the mechanics that make uncommitted data fundamentally different from—and more dangerous than—committed data.
Uncommitted data refers to modifications made by a transaction that has not yet reached its commit point. These modifications exist within the transaction's execution context but have not been made permanent (durable) in the database.
The Commit Boundary:
Every database transaction has a lifecycle that ends in one of two ways:
COMMIT: The transaction successfully completes, and all its modifications become permanent. The changes are guaranteed to survive system failures (durability).
ROLLBACK (ABORT): The transaction is cancelled, and all its modifications are discarded. It's as if the transaction never happened.
Uncommitted data exists in the uncertain period before either of these outcomes. At this point, the changes are present in the buffer pool and recorded in the transaction log, may be visible to other transactions under weak isolation, but are not yet durable and may still be undone.
The key insight is that uncommitted data has an uncertain future. Unlike committed data, which is guaranteed to persist, uncommitted data might vanish completely if the transaction aborts. Any decision based on uncommitted data is therefore based on information that might be retroactively annulled.
To understand dirty reads, we must understand the architecture of database systems and where uncommitted data physically resides. The handling of uncommitted data is fundamental to both transaction processing and recovery mechanisms.
Multi-Layer Storage Architecture:
Modern database systems use a layered storage architecture:
| Storage Layer | Uncommitted Data | Committed Data | Characteristics |
|---|---|---|---|
| CPU Cache/Registers | ✓ Present | ✓ Present | Fastest, volatile, single transaction |
| Buffer Pool (RAM) | ✓ Modified pages ('dirty' pages) | ✓ Present | Fast, volatile, shared across transactions |
| Transaction Log | ✓ UNDO information recorded | ✓ REDO information recorded | Sequential, persisted before data pages |
| Data Files (Disk) | May or may not be written (steal policy, see below) | ✓ Eventually written | Permanent storage, random access |
The Buffer Pool: Where Dirty Reads Occur
The buffer pool (also called the buffer cache) is the critical location for understanding dirty reads. This is a region of main memory that caches frequently-accessed database pages:
When a transaction modifies a page, that page becomes dirty—its in-memory version differs from the on-disk version. This is where "dirty" data (and thus "dirty" reads) gets its name.
```
BUFFER POOL ARCHITECTURE
════════════════════════════════════════════════════════════════════

┌─────────────────────────────────────────────────────────────────┐
│                       BUFFER POOL (RAM)                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │ Page P1  │  │ Page P2  │  │ Page P3  │  │ Page P4  │  ...    │
│  │  Clean   │  │  DIRTY   │  │  Clean   │  │  DIRTY   │         │
│  │          │  │ (by T1)  │  │          │  │ (by T2)  │         │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘         │
│                      ▲                          ▲               │
│                      │                          │               │
│                Uncommitted                Uncommitted           │
│                   by T1                      by T2              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   PERSISTENT STORAGE (Disk)                     │
├─────────────────────────────────────────────────────────────────┤
│  [Log File]   [Data File 1]   [Data File 2]   [Index Files]     │
│      │              │               │               │           │
│  Contains       May have        May have        May have        │
│  UNDO/REDO      old or new      old or new      old or new      │
│  records        version         version         version         │
└─────────────────────────────────────────────────────────────────┘

KEY INSIGHT: Dirty pages in the buffer pool ARE VISIBLE to other
transactions under weak isolation. This is where dirty reads occur.
```

Database systems using a 'steal' policy may write dirty pages to disk BEFORE a transaction commits (to free buffer space). This doesn't change the committed/uncommitted distinction—the transaction log is the source of truth. Whether the dirty data is in RAM or on disk, it remains uncommitted until the transaction commits.
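The buffer-pool mechanics above can be sketched in a few lines of Python. This is a toy model, not any real engine's implementation: pages carry a dirty flag and the ID of the modifying transaction, and flushing a dirty page under a steal policy makes it persistent without making it committed.

```python
# Toy buffer pool: tracks which pages are dirty and which transaction
# made them dirty. All names and structures here are illustrative.

class BufferPool:
    def __init__(self, disk):
        self.disk = disk              # page_id -> value (persistent copy)
        self.pages = {}               # page_id -> {"value", "dirty", "txn"}

    def read(self, page_id):
        if page_id not in self.pages:  # cache miss: load from disk, clean
            self.pages[page_id] = {"value": self.disk[page_id],
                                   "dirty": False, "txn": None}
        return self.pages[page_id]

    def write(self, txn, page_id, value):
        page = self.read(page_id)
        page.update(value=value, dirty=True, txn=txn)  # page is now DIRTY

    def flush(self, page_id):
        # 'Steal' policy: a dirty page may be written to disk before commit.
        # The change becomes persistent but NOT committed; the log, not the
        # data file, decides commit status.
        page = self.pages[page_id]
        self.disk[page_id] = page["value"]
        page["dirty"] = False

pool = BufferPool(disk={"P2": 100})
pool.write("T1", "P2", 800)            # T1 modifies P2: dirty, uncommitted
dirty_view = pool.read("P2")["value"]  # what another txn sees in the buffer
pool.flush("P2")                       # stolen to disk, still uncommitted
print(dirty_view, pool.disk["P2"])
```

Note that after `flush`, the in-memory and on-disk copies agree, yet the data is still uncommitted until T1's COMMIT record is written to the log.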
The transaction log (also called the write-ahead log or WAL) is the definitive record of the committed/uncommitted distinction. Understanding how the log tracks uncommitted data illuminates why arbitrary access to such data is dangerous.
Write-Ahead Logging (WAL) Principle:
Before any modification is applied to the database, it must first be recorded in the transaction log:
Log record written: Before modifying a data page, write an UNDO record (containing the old value) and a REDO record (containing the new value) to the log
Modification applied: Now modify the data page in the buffer pool
Commit decision: Eventually, write a COMMIT or ABORT record to the log
The log serves as the source of truth for what is committed. If the system crashes, recovery uses the log to undo uncommitted changes and redo committed changes.
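The write-ahead ordering described above can be sketched directly: every step appends to the log before touching the page, and COMMIT is itself just another log record. This is a minimal sketch with hypothetical structures, not a real engine's log format.

```python
# Write-ahead logging sketch: the log record is appended BEFORE the
# buffer-pool page is modified, and COMMIT is itself a log record.
# All names here are illustrative.

log = []                # the transaction log: a list of record dicts
pages = {"A": 1000}     # buffer pool contents

def wal_update(txn, item, new_value):
    # 1. Log first: UNDO (old value) + REDO (new value)
    log.append({"txn": txn, "op": "UPDATE", "item": item,
                "undo": pages[item], "redo": new_value})
    # 2. Only then modify the page
    pages[item] = new_value

def wal_commit(txn):
    log.append({"txn": txn, "op": "COMMIT"})

wal_update("T1", "A", 800)
# The change now exists in both the log and the page, but no COMMIT
# record exists: the data is uncommitted.
uncommitted = not any(r["txn"] == "T1" and r["op"] == "COMMIT" for r in log)
wal_commit("T1")
committed = any(r["txn"] == "T1" and r["op"] == "COMMIT" for r in log)
print(uncommitted, committed)
```

The commit status of the value 800 never depends on the page contents, only on whether the COMMIT record exists in the log.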
| Log Record Type | Contents | Purpose | Uncommitted Status |
|---|---|---|---|
| BEGIN | Transaction ID, timestamp | Marks transaction start | Transaction now 'in flight' |
| UPDATE (UNDO/REDO) | Transaction ID, page ID, old value, new value | Records modification for recovery | Change is uncommitted |
| INSERT | Transaction ID, record location, record data | Records new row for recovery | Insert is uncommitted |
| DELETE | Transaction ID, record location, deleted data | Records deletion for recovery | Delete is uncommitted |
| COMMIT | Transaction ID, timestamp | Marks durability boundary | All changes now COMMITTED |
| ABORT | Transaction ID | Marks rollback requirement | All changes must be UNDONE |
The Commit Record as the Dividing Line:
The presence or absence of a COMMIT record in the log is the definitive answer to whether data is committed:
This is why reading uncommitted data is fundamentally problematic: you are reading data whose ultimate fate—committed or aborted—is still undetermined. The COMMIT record that would legitimize this data doesn't exist yet and might never exist.
```
TRANSACTION LOG SEQUENCE
═══════════════════════════════════════════════════════════════════════

LSN  Transaction  Operation  Data Item   Before  After    Status
───────────────────────────────────────────────────────────────────────
101  T1           BEGIN      -           -       -        Active
102  T1           UPDATE     Account A   1000    800      Uncommitted
103  T2           BEGIN      -           -       -        Active
104  T1           UPDATE     Account B   500     700      Uncommitted
105  T2           READ       Account A   -       (sees?)  -

     ▲▲▲ CRITICAL MOMENT: T2 reads Account A ▲▲▲
     If T2 sees 800 (T1's uncommitted write): DIRTY READ
     If T2 sees 1000 (last committed value):  Safe

106  T2           COMMIT     -           -       -        T2 Committed
107  T1           ABORT      -           -       -        T1 Aborted

AFTER RECOVERY:
- Account A = 1000 (T1's change undone)
- Account B = 500  (T1's change undone)
- T2's committed state may be based on value 800, which never existed!
```

After a crash, recovery examines the log. Transactions with COMMIT records have their changes guaranteed. Transactions without COMMIT records have their changes undone. If T2 committed while using T1's uncommitted data, and T1 later aborted, the database is now in an inconsistent state that cannot be automatically corrected.
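The recovery outcome for this log sequence can be replayed mechanically. Below is a deliberately simplified sketch: it scans the log for COMMIT records, then undoes every update whose transaction never committed. Real recovery (e.g., ARIES-style) also has redo and analysis phases, which are omitted here.

```python
# Simplified crash recovery over the log sequence above (LSN 101-107).
# Only the committed/uncommitted distinction is modeled.

log = [
    (101, "T1", "BEGIN",  None,        None, None),
    (102, "T1", "UPDATE", "Account A", 1000, 800),
    (103, "T2", "BEGIN",  None,        None, None),
    (104, "T1", "UPDATE", "Account B", 500,  700),
    (106, "T2", "COMMIT", None,        None, None),
    (107, "T1", "ABORT",  None,        None, None),
]

# Database state at crash time: T1's dirty writes reached the pages
db = {"Account A": 800, "Account B": 700}

# Which transactions have a COMMIT record? Only those are guaranteed.
committed = {txn for (_, txn, op, *_) in log if op == "COMMIT"}

# Undo pass: walk the log backwards, restoring the 'before' value for
# every update belonging to a transaction with no COMMIT record.
for lsn, txn, op, item, before, after in reversed(log):
    if op == "UPDATE" and txn not in committed:
        db[item] = before

print(db)   # T1's changes are gone, as if the transaction never happened
```

Notice what recovery cannot do: if T2's committed state was derived from the value 800, no undo pass can repair that. The log only tracks writes, not the reads that decisions were based on.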
The distinction between committed and uncommitted data isn't just about timing—it represents fundamentally different semantic statuses within the database system.
Committed Data Properties:
When data is committed, the database makes a promise: this data will persist. Applications can depend on it. Other systems can be notified. Business processes can proceed. It's a contract.
Uncommitted Data Properties:
Uncommitted data comes with no guarantees. The transaction might abort (intentionally or due to errors). The changes might vanish. Basing decisions on uncommitted data is basing decisions on promises that haven't been made.
The Trust Hierarchy:
Database systems inherently trust committed data more than uncommitted data. This trust hierarchy manifests in several ways:
Query default behavior: Most databases default to READ COMMITTED isolation, refusing to show uncommitted data to other transactions
Replication systems: Data replication typically propagates only committed changes, not uncommitted writes
External systems: Event-driven architectures (CDC, message queues) should only trigger on committed changes
Recovery systems: Recovery always preserves committed data and discards uncommitted data
Dirty reads violate this trust hierarchy by exposing uncommitted data to transactions that then treat it as if it had the guarantees of committed data.
Different isolation levels define different visibility rules for uncommitted data. Understanding these rules is essential for both database developers and application architects.
SQL Standard Isolation Levels and Uncommitted Data:
| Isolation Level | Can See Uncommitted Data? | Implementation Approach | Use Case |
|---|---|---|---|
| READ UNCOMMITTED | ✓ YES - Can read dirty data | No read locks, no version checking | Approximate analytics, monitoring dashboards |
| READ COMMITTED | ✗ NO - Only committed values | Short-duration read locks or MVCC | Most OLTP applications |
| REPEATABLE READ | ✗ NO - Plus consistent reads | Read locks held or snapshot | Reporting with consistency needs |
| SERIALIZABLE | ✗ NO - Full isolation | Full locking or SSI | Critical financial operations |
The READ UNCOMMITTED Exception:
The READ UNCOMMITTED isolation level explicitly permits dirty reads. This is not an accident or oversight—it's a deliberate trade-off:
Why would anyone want to read uncommitted data? Because skipping read locks and version checks eliminates blocking and overhead, which can be worthwhile for approximate analytics or monitoring dashboards, where an occasionally wrong value is an acceptable price for speed.
However, this relaxation comes with significant risks, which is why READ UNCOMMITTED is rarely used and never the default.
```sql
-- Demonstrating uncommitted data visibility

-- Session 1: Start a transaction but don't commit
BEGIN TRANSACTION;
UPDATE accounts SET balance = 999999 WHERE account_id = 1;
-- Transaction T1 is still active, change is uncommitted

-- Session 2 with READ UNCOMMITTED: CAN see the dirty data
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT balance FROM accounts WHERE account_id = 1;
-- Returns: 999999 (the uncommitted value!)
-- This is a DIRTY READ

-- Session 3 with READ COMMITTED: CANNOT see the dirty data
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT balance FROM accounts WHERE account_id = 1;
-- Returns: original committed value (e.g., 1000)
-- This query either blocks or sees the old value (MVCC)

-- Session 1: Now rollback
ROLLBACK;

-- Session 2 made decisions based on 999999, but that value
-- was never committed - it effectively never existed
```

Modern databases like PostgreSQL and MySQL (InnoDB) use Multi-Version Concurrency Control (MVCC). In MVCC, readers see a consistent snapshot of committed data as of a specific point in time. This inherently prevents dirty reads without requiring read locks—readers never see uncommitted versions because they access historical committed versions.
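The same session interplay can be reproduced end-to-end with Python's sqlite3 module, because SQLite's shared-cache mode offers `PRAGMA read_uncommitted` as an opt-in dirty-read switch. This is a minimal demo relying on that SQLite-specific feature; other engines express the same idea through `SET TRANSACTION ISOLATION LEVEL` as shown above.

```python
import sqlite3

# Two connections sharing one in-memory database (SQLite shared cache).
writer = sqlite3.connect("file::memory:?cache=shared", uri=True,
                         isolation_level=None)   # manual transaction control
reader = sqlite3.connect("file::memory:?cache=shared", uri=True)

writer.execute(
    "CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (1, 1000)")

# Opt in to dirty reads: SQLite's equivalent of READ UNCOMMITTED
reader.execute("PRAGMA read_uncommitted = true")

writer.execute("BEGIN")
writer.execute("UPDATE accounts SET balance = 999999 WHERE account_id = 1")

# The reader sees the writer's uncommitted value: a dirty read
dirty = reader.execute(
    "SELECT balance FROM accounts WHERE account_id = 1").fetchone()[0]

writer.execute("ROLLBACK")

# After rollback, the dirty value has vanished: it never existed
after = reader.execute(
    "SELECT balance FROM accounts WHERE account_id = 1").fetchone()[0]
print(dirty, after)
```

Without the `PRAGMA`, the reader would instead block (or fail with "database table is locked") rather than observe the uncommitted row, which is SQLite's way of enforcing the trust hierarchy by default.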
Uncommitted data is epistemologically different from committed data. While this might seem philosophical, understanding this distinction is crucial for robust database application design.
What Does 'Provisional' Mean?
When we call uncommitted data 'provisional', we mean:
Existence is contingent: The data's existence in the final database state depends on a future event (the commit) that hasn't occurred yet
Value is negotiable: Until commit, the transaction can further modify the data, meaning even the 'current' uncommitted value might not be the eventual committed value
Consistency is incomplete: Constraint checking may be deferred until commit time (deferred constraints)
Failure modes are open: System crashes, deadlocks, integrity violations, or application logic could all cause an abort
| Cause | Description | Frequency |
|---|---|---|
| Explicit ROLLBACK | Application deliberately cancels the transaction | Common |
| Constraint Violation | Deferred constraint check fails at commit time | Occasional |
| Deadlock Resolution | Transaction chosen as deadlock victim and aborted | Occasional |
| Lock Timeout | Could not acquire necessary locks within time limit | Occasional |
| System Crash | Server failure before commit completes | Rare but critical |
| Connection Failure | Network or client disconnect before commit | Occasional |
| Resource Exhaustion | Out of log space, memory, or other limits | Rare |
| Application Error | Uncaught exception or error handling triggers rollback | Common |
The Uncertainty Principle:
Reading uncommitted data is like observing a quantum superposition before measurement—you're seeing a value that exists in a state of uncertainty. The act of the original transaction committing or aborting is what 'collapses' this uncertainty into a definite outcome.
Unlike physical systems, however, your reading of the uncommitted data doesn't influence whether it commits or aborts. But if you base your own committed operations on that uncertain value, you've now coupled your transaction's correctness to that uncertainty.
From a business logic standpoint, uncommitted data represents intent, not fact. A user has indicated they want to transfer money, but the transfer isn't complete. An order has been placed, but not confirmed. Reading this data and acting on it is like acting on a promise that can still be withdrawn.
Different database systems implement uncommitted data tracking in different ways. Understanding these implementations helps predict behavior and diagnose issues.
Lock-Based Systems:
In traditional lock-based systems (like SQL Server, DB2):
```
LOCK-BASED UNCOMMITTED DATA TRACKING
═══════════════════════════════════════════════════════════════════

Data Structure for Each Data Item X:
┌────────────────────────────────────────────────────────────────┐
│ Lock Mode:      [None | Shared | Exclusive]                    │
│ Lock Holder(s): Transaction ID(s) holding the lock             │
│ Current Value:  The possibly-uncommitted value in buffer       │
│ Lock Queue:     Transactions waiting for this lock             │
└────────────────────────────────────────────────────────────────┘

WRITE Operation (Transaction T, Item X):
  1. Request Exclusive Lock on X
     - If granted: Proceed to step 2
     - If blocked: Wait or abort depending on timeout
  2. Read current value into UNDO log
  3. Modify value in buffer pool
  4. Record REDO info in transaction log
  5. Mark page as 'dirty' (modified, uncommitted)
  ▶ X now contains UNCOMMITTED data from T

COMMIT (Transaction T):
  1. Force log records to disk (guarantee durability)
  2. Write COMMIT record to log
  3. Release all exclusive locks
  ▶ All of T's dirty data is now COMMITTED and visible

DIRTY READ scenario:
  - T1 holds exclusive lock on X with uncommitted value
  - T2 requests to READ X with READ UNCOMMITTED isolation
  - System allows T2 to read X's current buffer value
  - T2 sees T1's uncommitted modification
```

MVCC-Based Systems:
In MVCC systems (like PostgreSQL, MySQL InnoDB, Oracle):
```
MVCC UNCOMMITTED DATA TRACKING
═══════════════════════════════════════════════════════════════════

Version Chain for Data Item X:
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Version 3  │───▶│  Version 2  │───▶│  Version 1  │
│  XID: T5    │    │  XID: T3    │    │  XID: T1    │
│  Status: IN │    │  Status: C  │    │  Status: C  │
│  Progress   │    │ (Committed) │    │ (Committed) │
│  Value: 300 │    │  Value: 200 │    │  Value: 100 │
└─────────────┘    └─────────────┘    └─────────────┘
      │
      ▼
 UNCOMMITTED        COMMITTED           COMMITTED
 (invisible to      (visible to         (visible to
 most readers)      snapshot ≥ T3)      snapshot ≥ T1)

VISIBILITY CHECK (Reader Transaction T_reader):

  FOR each version V of X (newest to oldest):
    IF V.creating_transaction is COMMITTED:
      IF V.created_before(T_reader.snapshot):
        RETURN V   -- This is the visible version
    ELIF V.creating_transaction == T_reader:
      RETURN V     -- Transaction sees its own uncommitted changes
    ELSE:
      CONTINUE     -- Skip uncommitted versions from other transactions

DIRTY READ in MVCC:
  - Reader ignores visibility check
  - Returns newest version regardless of commit status
  - This is explicitly enabled by READ UNCOMMITTED only
```

MVCC inherently prevents dirty reads in its default operation because the visibility algorithm explicitly skips uncommitted versions from other transactions. Dirty reads require deliberately bypassing this visibility check, which is why they're only possible at READ UNCOMMITTED isolation level.
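The visibility check above translates almost line-for-line into code. Here is a toy version, assuming simple integer transaction IDs serve as the snapshot boundary (real engines like PostgreSQL track snapshots with more state than this):

```python
# Toy MVCC visibility check. A version is visible if its creator
# committed within the reader's snapshot, or if the reader itself
# created it. Uncommitted versions from other transactions are
# skipped, which is exactly what prevents dirty reads.

from dataclasses import dataclass

@dataclass
class Version:
    xid: int          # creating transaction ID
    committed: bool
    value: int

def visible_version(chain, reader_xid, snapshot_xid):
    """chain is ordered newest-first, like the diagram above."""
    for v in chain:
        if v.committed and v.xid <= snapshot_xid:
            return v              # newest committed version in snapshot
        if v.xid == reader_xid:
            return v              # reader sees its own uncommitted write
        # otherwise: skip uncommitted versions from other transactions
    return None

def dirty_read(chain):
    # READ UNCOMMITTED behavior: newest version, commit status ignored
    return chain[0]

# The chain from the diagram: T5 in progress, T3 and T1 committed
chain = [Version(5, False, 300), Version(3, True, 200), Version(1, True, 100)]

safe = visible_version(chain, reader_xid=7, snapshot_xid=6)
print(safe.value)                 # the committed value
print(dirty_read(chain).value)    # the value a dirty read would return
```

The safe reader skips T5's in-progress version and lands on T3's committed one, while the dirty read returns T5's value, which will vanish if T5 aborts.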
We've explored the core concept underlying dirty reads: uncommitted data and its fundamental properties. Let's consolidate the key insights:
What's Next:
Now that we understand what uncommitted data is and why it's dangerous, we'll examine the consequences of reading such data. The next page explores the incorrect results that arise from dirty reads—how reading uncommitted data leads to computations, decisions, and committed states that violate database integrity.
You now have a deep understanding of uncommitted data: its nature, location, lifecycle, and why exposing it to other transactions is inherently risky. This foundation is essential for understanding the full impact of dirty reads, which we'll explore in the subsequent pages.