When you write code to update a database record, there's a crucial period between when the change is made and when it becomes permanent. During this interval, the modification exists in a peculiar state—visible to some operations but not yet durable, present in memory but not guaranteed to survive, enacted but not committed.
This is the realm of uncommitted data, and understanding it is essential to grasping why dirty reads are so problematic. Uncommitted data is not simply 'pending' data—it represents a fundamental uncertainty about whether the change will ever become part of the permanent database state.
This page explores the nature of uncommitted data: where it exists in the database architecture, how it differs from committed data, why it's inherently provisional, and how its exposure to other transactions creates the dirty read problem. You'll understand the mechanics that make uncommitted data fundamentally different from—and more dangerous than—committed data.
Uncommitted data refers to modifications made by a transaction that has not yet reached its commit point. These modifications exist within the transaction's execution context but have not been made permanent (durable) in the database.
The Commit Boundary:
Every database transaction has a lifecycle that ends in one of two ways:
COMMIT: The transaction successfully completes, and all its modifications become permanent. The changes are guaranteed to survive system failures (durability).
ROLLBACK (ABORT): The transaction is cancelled, and all its modifications are discarded. It's as if the transaction never happened.
Uncommitted data exists in the uncertain period before either of these outcomes. At this point, the changes are present in the buffer pool and recorded in the transaction log, may be visible to other transactions under weak isolation, but are not yet durable and may still be undone.
The key insight is that uncommitted data has an uncertain future. Unlike committed data, which is guaranteed to persist, uncommitted data might vanish completely if the transaction aborts. Any decision based on uncommitted data is therefore based on information that might be retroactively annulled.
To understand dirty reads, we must understand the architecture of database systems and where uncommitted data physically resides. The handling of uncommitted data is fundamental to both transaction processing and recovery mechanisms.
Multi-Layer Storage Architecture:
Modern database systems use a layered storage architecture:
| Storage Layer | Uncommitted Data | Committed Data | Characteristics |
|---|---|---|---|
| CPU Cache/Registers | ✓ Present | ✓ Present | Fastest, volatile, single transaction |
| Buffer Pool (RAM) | ✓ Modified pages ('dirty' pages) | ✓ Present | Fast, volatile, shared across transactions |
| Transaction Log | ✓ UNDO information recorded | ✓ REDO information recorded | Sequential, persisted before data pages |
| Data Files (Disk) | May or may not be written (steal policy, see below) | ✓ Eventually written | Permanent storage, random access |
The Buffer Pool: Where Dirty Reads Occur
The buffer pool (also called the buffer cache) is the critical location for understanding dirty reads. This is a region of main memory that caches frequently-accessed database pages:
When a transaction modifies a page, that page becomes dirty—its in-memory version differs from the on-disk version. This is where "dirty" data (and thus "dirty" reads) gets its name.
```
BUFFER POOL ARCHITECTURE
════════════════════════════════════════════════════════════════════

┌─────────────────────────────────────────────────────────────────┐
│                       BUFFER POOL (RAM)                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐         │
│  │ Page P1  │  │ Page P2  │  │ Page P3  │  │ Page P4  │  ...    │
│  │  Clean   │  │  DIRTY   │  │  Clean   │  │  DIRTY   │         │
│  │          │  │ (by T1)  │  │          │  │ (by T2)  │         │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘         │
│                      ▲                          ▲               │
│                      │                          │               │
│                Uncommitted                Uncommitted           │
│                   by T1                      by T2              │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                   PERSISTENT STORAGE (Disk)                     │
├─────────────────────────────────────────────────────────────────┤
│  [Log File]   [Data File 1]   [Data File 2]   [Index Files]     │
│      │              │               │               │           │
│  Contains       May have        May have        May have        │
│  UNDO/REDO      old or new      old or new      old or new      │
│  records        version         version         version         │
└─────────────────────────────────────────────────────────────────┘

KEY INSIGHT: Dirty pages in the buffer pool ARE VISIBLE to other
transactions under weak isolation. This is where dirty reads occur.
```

Database systems using a 'steal' policy may write dirty pages to disk BEFORE a transaction commits (to free buffer space). This doesn't change the committed/uncommitted distinction—the transaction log is the source of truth. Whether the dirty data is in RAM or on disk, it remains uncommitted until the transaction commits.
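The buffer-pool mechanics above can be sketched in a few lines of Python. This is a toy model, not any real engine's implementation: pages carry a dirty flag and the ID of the modifying transaction, and flushing a dirty page under a steal policy makes it persistent without making it committed.

```python
# Toy buffer pool: tracks which pages are dirty and which transaction
# made them dirty. All names and structures here are illustrative.

class BufferPool:
    def __init__(self, disk):
        self.disk = disk              # page_id -> value (persistent copy)
        self.pages = {}               # page_id -> {"value", "dirty", "txn"}

    def read(self, page_id):
        if page_id not in self.pages:  # cache miss: load from disk, clean
            self.pages[page_id] = {"value": self.disk[page_id],
                                   "dirty": False, "txn": None}
        return self.pages[page_id]

    def write(self, txn, page_id, value):
        page = self.read(page_id)
        page.update(value=value, dirty=True, txn=txn)  # page is now DIRTY

    def flush(self, page_id):
        # 'Steal' policy: a dirty page may be written to disk before commit.
        # The change becomes persistent but NOT committed; the log, not the
        # data file, decides commit status.
        page = self.pages[page_id]
        self.disk[page_id] = page["value"]
        page["dirty"] = False

pool = BufferPool(disk={"P2": 100})
pool.write("T1", "P2", 800)            # T1 modifies P2: dirty, uncommitted
dirty_view = pool.read("P2")["value"]  # what another txn sees in the buffer
pool.flush("P2")                       # stolen to disk, still uncommitted
print(dirty_view, pool.disk["P2"])
```

Note that after `flush`, the in-memory and on-disk copies agree, yet the data is still uncommitted until T1's COMMIT record is written to the log.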
The transaction log (also called the write-ahead log or WAL) is the definitive record of the committed/uncommitted distinction. Understanding how the log tracks uncommitted data illuminates why arbitrary access to such data is dangerous.
Write-Ahead Logging (WAL) Principle:
Before any modification is applied to the database, it must first be recorded in the transaction log:
Log record written: Before modifying a data page, write an UNDO record (containing the old value) and a REDO record (containing the new value) to the log
Modification applied: Now modify the data page in the buffer pool
Commit decision: Eventually, write a COMMIT or ABORT record to the log
The log serves as the source of truth for what is committed. If the system crashes, recovery uses the log to undo uncommitted changes and redo committed changes.
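The write-ahead ordering described above can be sketched directly: every step appends to the log before touching the page, and COMMIT is itself just another log record. This is a minimal sketch with hypothetical structures, not a real engine's log format.

```python
# Write-ahead logging sketch: the log record is appended BEFORE the
# buffer-pool page is modified, and COMMIT is itself a log record.
# All names here are illustrative.

log = []                # the transaction log: a list of record dicts
pages = {"A": 1000}     # buffer pool contents

def wal_update(txn, item, new_value):
    # 1. Log first: UNDO (old value) + REDO (new value)
    log.append({"txn": txn, "op": "UPDATE", "item": item,
                "undo": pages[item], "redo": new_value})
    # 2. Only then modify the page
    pages[item] = new_value

def wal_commit(txn):
    log.append({"txn": txn, "op": "COMMIT"})

wal_update("T1", "A", 800)
# The change now exists in both the log and the page, but no COMMIT
# record exists: the data is uncommitted.
uncommitted = not any(r["txn"] == "T1" and r["op"] == "COMMIT" for r in log)
wal_commit("T1")
committed = any(r["txn"] == "T1" and r["op"] == "COMMIT" for r in log)
print(uncommitted, committed)
```

The commit status of the value 800 never depends on the page contents, only on whether the COMMIT record exists in the log.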
| Log Record Type | Contents | Purpose | Uncommitted Status |
|---|---|---|---|
| BEGIN | Transaction ID, timestamp | Marks transaction start | Transaction now 'in flight' |
| UPDATE (UNDO/REDO) | Transaction ID, page ID, old value, new value | Records modification for recovery | Change is uncommitted |
| INSERT | Transaction ID, record location, record data | Records new row for recovery | Insert is uncommitted |
| DELETE | Transaction ID, record location, deleted data | Records deletion for recovery | Delete is uncommitted |
| COMMIT | Transaction ID, timestamp | Marks durability boundary | All changes now COMMITTED |
| ABORT | Transaction ID | Marks rollback requirement | All changes must be UNDONE |
The Commit Record as the Dividing Line:
The presence or absence of a COMMIT record in the log is the definitive answer to whether data is committed:
This is why reading uncommitted data is fundamentally problematic: you are reading data whose ultimate fate—committed or aborted—is still undetermined. The COMMIT record that would legitimize this data doesn't exist yet and might never exist.
```
TRANSACTION LOG SEQUENCE
═══════════════════════════════════════════════════════════════════════

LSN  Transaction  Operation  Data Item   Before  After    Status
───────────────────────────────────────────────────────────────────────
101  T1           BEGIN      -           -       -        Active
102  T1           UPDATE     Account A   1000    800      Uncommitted
103  T2           BEGIN      -           -       -        Active
104  T1           UPDATE     Account B   500     700      Uncommitted
105  T2           READ       Account A   -       (sees?)  -

     ▲▲▲ CRITICAL MOMENT: T2 reads Account A ▲▲▲
     If T2 sees 800 (T1's uncommitted write): DIRTY READ
     If T2 sees 1000 (last committed value):  Safe

106  T2           COMMIT     -           -       -        T2 Committed
107  T1           ABORT      -           -       -        T1 Aborted

AFTER RECOVERY:
- Account A = 1000 (T1's change undone)
- Account B = 500  (T1's change undone)
- T2's committed state may be based on value 800, which never existed!
```

After a crash, recovery examines the log. Transactions with COMMIT records have their changes guaranteed. Transactions without COMMIT records have their changes undone. If T2 committed while using T1's uncommitted data, and T1 later aborted, the database is now in an inconsistent state that cannot be automatically corrected.
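The recovery outcome for this log sequence can be replayed mechanically. Below is a deliberately simplified sketch: it scans the log for COMMIT records, then undoes every update whose transaction never committed. Real recovery (e.g., ARIES-style) also has redo and analysis phases, which are omitted here.

```python
# Simplified crash recovery over the log sequence above (LSN 101-107).
# Only the committed/uncommitted distinction is modeled.

log = [
    (101, "T1", "BEGIN",  None,        None, None),
    (102, "T1", "UPDATE", "Account A", 1000, 800),
    (103, "T2", "BEGIN",  None,        None, None),
    (104, "T1", "UPDATE", "Account B", 500,  700),
    (106, "T2", "COMMIT", None,        None, None),
    (107, "T1", "ABORT",  None,        None, None),
]

# Database state at crash time: T1's dirty writes reached the pages
db = {"Account A": 800, "Account B": 700}

# Which transactions have a COMMIT record? Only those are guaranteed.
committed = {txn for (_, txn, op, *_) in log if op == "COMMIT"}

# Undo pass: walk the log backwards, restoring the 'before' value for
# every update belonging to a transaction with no COMMIT record.
for lsn, txn, op, item, before, after in reversed(log):
    if op == "UPDATE" and txn not in committed:
        db[item] = before

print(db)   # T1's changes are gone, as if the transaction never happened
```

Notice what recovery cannot do: if T2's committed state was derived from the value 800, no undo pass can repair that. The log only tracks writes, not the reads that decisions were based on.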
The distinction between committed and uncommitted data isn't just about timing—it represents fundamentally different semantic statuses within the database system.
Committed Data Properties:
When data is committed, the database makes a promise: this data will persist. Applications can depend on it. Other systems can be notified. Business processes can proceed. It's a contract.
Uncommitted Data Properties:
Uncommitted data comes with no guarantees. The transaction might abort (intentionally or due to errors). The changes might vanish. Basing decisions on uncommitted data is basing decisions on promises that haven't been made.
The Trust Hierarchy:
Database systems inherently trust committed data more than uncommitted data. This trust hierarchy manifests in several ways:
Query default behavior: Most databases default to READ COMMITTED isolation, refusing to show uncommitted data to other transactions
Replication systems: Data replication typically propagates only committed changes, not uncommitted writes
External systems: Event-driven architectures (CDC, message queues) should only trigger on committed changes
Recovery systems: Recovery always preserves committed data and discards uncommitted data
Dirty reads violate this trust hierarchy by exposing uncommitted data to transactions that then treat it as if it had the guarantees of committed data.
Different isolation levels define different visibility rules for uncommitted data. Understanding these rules is essential for both database developers and application architects.
SQL Standard Isolation Levels and Uncommitted Data:
| Isolation Level | Can See Uncommitted Data? | Implementation Approach | Use Case |
|---|---|---|---|
| READ UNCOMMITTED | ✓ YES - Can read dirty data | No read locks, no version checking | Approximate analytics, monitoring dashboards |
| READ COMMITTED | ✗ NO - Only committed values | Short-duration read locks or MVCC | Most OLTP applications |
| REPEATABLE READ | ✗ NO - Plus consistent reads | Read locks held or snapshot | Reporting with consistency needs |
| SERIALIZABLE | ✗ NO - Full isolation | Full locking or SSI | Critical financial operations |
The READ UNCOMMITTED Exception:
The READ UNCOMMITTED isolation level explicitly permits dirty reads. This is not an accident or oversight—it's a deliberate trade-off:
Why would anyone want to read uncommitted data? Because skipping read locks and version checks eliminates blocking and overhead, which can be worthwhile for approximate analytics or monitoring dashboards, where an occasionally wrong value is an acceptable price for speed.
However, this relaxation comes with significant risks, which is why READ UNCOMMITTED is rarely used and never the default.
```sql
-- Demonstrating uncommitted data visibility

-- Session 1: Start a transaction but don't commit
BEGIN TRANSACTION;
UPDATE accounts SET balance = 999999 WHERE account_id = 1;
-- Transaction T1 is still active, change is uncommitted

-- Session 2 with READ UNCOMMITTED: CAN see the dirty data
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT balance FROM accounts WHERE account_id = 1;
-- Returns: 999999 (the uncommitted value!)
-- This is a DIRTY READ

-- Session 3 with READ COMMITTED: CANNOT see the dirty data
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
SELECT balance FROM accounts WHERE account_id = 1;
-- Returns: original committed value (e.g., 1000)
-- This query either blocks or sees the old value (MVCC)

-- Session 1: Now rollback
ROLLBACK;

-- Session 2 made decisions based on 999999, but that value
-- was never committed - it effectively never existed
```

Modern databases like PostgreSQL and MySQL (InnoDB) use Multi-Version Concurrency Control (MVCC). In MVCC, readers see a consistent snapshot of committed data as of a specific point in time. This inherently prevents dirty reads without requiring read locks—readers never see uncommitted versions because they access historical committed versions.
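The same session interplay can be reproduced end-to-end with Python's sqlite3 module, because SQLite's shared-cache mode offers `PRAGMA read_uncommitted` as an opt-in dirty-read switch. This is a minimal demo relying on that SQLite-specific feature; other engines express the same idea through `SET TRANSACTION ISOLATION LEVEL` as shown above.

```python
import sqlite3

# Two connections sharing one in-memory database (SQLite shared cache).
writer = sqlite3.connect("file::memory:?cache=shared", uri=True,
                         isolation_level=None)   # manual transaction control
reader = sqlite3.connect("file::memory:?cache=shared", uri=True)

writer.execute(
    "CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (1, 1000)")

# Opt in to dirty reads: SQLite's equivalent of READ UNCOMMITTED
reader.execute("PRAGMA read_uncommitted = true")

writer.execute("BEGIN")
writer.execute("UPDATE accounts SET balance = 999999 WHERE account_id = 1")

# The reader sees the writer's uncommitted value: a dirty read
dirty = reader.execute(
    "SELECT balance FROM accounts WHERE account_id = 1").fetchone()[0]

writer.execute("ROLLBACK")

# After rollback, the dirty value has vanished: it never existed
after = reader.execute(
    "SELECT balance FROM accounts WHERE account_id = 1").fetchone()[0]
print(dirty, after)
```

Without the `PRAGMA`, the reader would instead block (or fail with "database table is locked") rather than observe the uncommitted row, which is SQLite's way of enforcing the trust hierarchy by default.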
Uncommitted data is epistemologically different from committed data. While this might seem philosophical, understanding this distinction is crucial for robust database application design.
What Does 'Provisional' Mean?
When we call uncommitted data 'provisional', we mean:
Existence is contingent: The data's existence in the final database state depends on a future event (the commit) that hasn't occurred yet
Value is negotiable: Until commit, the transaction can further modify the data, meaning even the 'current' uncommitted value might not be the eventual committed value
Consistency is incomplete: Constraint checking may be deferred until commit time (deferred constraints)
Failure modes are open: System crashes, deadlocks, integrity violations, or application logic could all cause an abort
| Cause | Description | Frequency |
|---|---|---|
| Explicit ROLLBACK | Application deliberately cancels the transaction | Common |
| Constraint Violation | Deferred constraint check fails at commit time | Occasional |
| Deadlock Resolution | Transaction chosen as deadlock victim and aborted | Occasional |
| Lock Timeout | Could not acquire necessary locks within time limit | Occasional |
| System Crash | Server failure before commit completes | Rare but critical |
| Connection Failure | Network or client disconnect before commit | Occasional |
| Resource Exhaustion | Out of log space, memory, or other limits | Rare |
| Application Error | Uncaught exception or error handling triggers rollback | Common |
The Uncertainty Principle:
Reading uncommitted data is like observing a quantum superposition before measurement—you're seeing a value that exists in a state of uncertainty. The act of the original transaction committing or aborting is what 'collapses' this uncertainty into a definite outcome.
Unlike physical systems, however, your reading of the uncommitted data doesn't influence whether it commits or aborts. But if you base your own committed operations on that uncertain value, you've now coupled your transaction's correctness to that uncertainty.
From a business logic standpoint, uncommitted data represents intent, not fact. A user has indicated they want to transfer money, but the transfer isn't complete. An order has been placed, but not confirmed. Reading this data and acting on it is like acting on a promise that can still be withdrawn.
Different database systems implement uncommitted data tracking in different ways. Understanding these implementations helps predict behavior and diagnose issues.
Lock-Based Systems:
In traditional lock-based systems (like SQL Server, DB2):
```
LOCK-BASED UNCOMMITTED DATA TRACKING
═══════════════════════════════════════════════════════════════════

Data Structure for Each Data Item X:
┌────────────────────────────────────────────────────────────────┐
│ Lock Mode:      [None | Shared | Exclusive]                    │
│ Lock Holder(s): Transaction ID(s) holding the lock             │
│ Current Value:  The possibly-uncommitted value in buffer       │
│ Lock Queue:     Transactions waiting for this lock             │
└────────────────────────────────────────────────────────────────┘

WRITE Operation (Transaction T, Item X):
  1. Request Exclusive Lock on X
     - If granted: Proceed to step 2
     - If blocked: Wait or abort depending on timeout
  2. Read current value into UNDO log
  3. Modify value in buffer pool
  4. Record REDO info in transaction log
  5. Mark page as 'dirty' (modified, uncommitted)
  ▶ X now contains UNCOMMITTED data from T

COMMIT (Transaction T):
  1. Force log records to disk (guarantee durability)
  2. Write COMMIT record to log
  3. Release all exclusive locks
  ▶ All of T's dirty data is now COMMITTED and visible

DIRTY READ scenario:
  - T1 holds exclusive lock on X with uncommitted value
  - T2 requests to READ X with READ UNCOMMITTED isolation
  - System allows T2 to read X's current buffer value
  - T2 sees T1's uncommitted modification
```

MVCC-Based Systems:
In MVCC systems (like PostgreSQL, MySQL InnoDB, Oracle):
```
MVCC UNCOMMITTED DATA TRACKING
═══════════════════════════════════════════════════════════════════

Version Chain for Data Item X:
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Version 3  │───▶│  Version 2  │───▶│  Version 1  │
│  XID: T5    │    │  XID: T3    │    │  XID: T1    │
│  Status: IN │    │  Status: C  │    │  Status: C  │
│  Progress   │    │ (Committed) │    │ (Committed) │
│  Value: 300 │    │  Value: 200 │    │  Value: 100 │
└─────────────┘    └─────────────┘    └─────────────┘
      │
      ▼
 UNCOMMITTED        COMMITTED           COMMITTED
 (invisible to      (visible to         (visible to
 most readers)      snapshot ≥ T3)      snapshot ≥ T1)

VISIBILITY CHECK (Reader Transaction T_reader):

  FOR each version V of X (newest to oldest):
    IF V.creating_transaction is COMMITTED:
      IF V.created_before(T_reader.snapshot):
        RETURN V   -- This is the visible version
    ELIF V.creating_transaction == T_reader:
      RETURN V     -- Transaction sees its own uncommitted changes
    ELSE:
      CONTINUE     -- Skip uncommitted versions from other transactions

DIRTY READ in MVCC:
  - Reader ignores visibility check
  - Returns newest version regardless of commit status
  - This is explicitly enabled by READ UNCOMMITTED only
```

MVCC inherently prevents dirty reads in its default operation because the visibility algorithm explicitly skips uncommitted versions from other transactions. Dirty reads require deliberately bypassing this visibility check, which is why they're only possible at READ UNCOMMITTED isolation level.
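The visibility check above translates almost line-for-line into code. Here is a toy version, assuming simple integer transaction IDs serve as the snapshot boundary (real engines like PostgreSQL track snapshots with more state than this):

```python
# Toy MVCC visibility check. A version is visible if its creator
# committed within the reader's snapshot, or if the reader itself
# created it. Uncommitted versions from other transactions are
# skipped, which is exactly what prevents dirty reads.

from dataclasses import dataclass

@dataclass
class Version:
    xid: int          # creating transaction ID
    committed: bool
    value: int

def visible_version(chain, reader_xid, snapshot_xid):
    """chain is ordered newest-first, like the diagram above."""
    for v in chain:
        if v.committed and v.xid <= snapshot_xid:
            return v              # newest committed version in snapshot
        if v.xid == reader_xid:
            return v              # reader sees its own uncommitted write
        # otherwise: skip uncommitted versions from other transactions
    return None

def dirty_read(chain):
    # READ UNCOMMITTED behavior: newest version, commit status ignored
    return chain[0]

# The chain from the diagram: T5 in progress, T3 and T1 committed
chain = [Version(5, False, 300), Version(3, True, 200), Version(1, True, 100)]

safe = visible_version(chain, reader_xid=7, snapshot_xid=6)
print(safe.value)                 # the committed value
print(dirty_read(chain).value)    # the value a dirty read would return
```

The safe reader skips T5's in-progress version and lands on T3's committed one, while the dirty read returns T5's value, which will vanish if T5 aborts.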
We've explored the core concept underlying dirty reads: uncommitted data and its fundamental properties. Let's consolidate the key insights:
What's Next:
Now that we understand what uncommitted data is and why it's dangerous, we'll examine the consequences of reading such data. The next page explores the incorrect results that arise from dirty reads—how reading uncommitted data leads to computations, decisions, and committed states that violate database integrity.
You now have a deep understanding of uncommitted data: its nature, location, lifecycle, and why exposing it to other transactions is inherently risky. This foundation is essential for understanding the full impact of dirty reads, which we'll explore in the subsequent pages.