Dirty Read Problem - Learning Module

Loading content...

0/241

Rollback Implications: When Dirty Sources Disappear

The Vanishing Act That Breaks Everything

Consider a supply chain system where Order #1001 reads a supplier's uncommitted inventory update showing 500 units available. Based on this, Order #1001 confirms delivery to a customer and commits. Seconds later, the supplier transaction rolls back—a shipping manifest had an error, and the inventory was never actually updated.

The committed Order #1001 now references inventory that doesn't exist. The customer expects delivery. The warehouse has no stock. And the system has no automatic way to detect or resolve this inconsistency.

This is the rollback implication of dirty reads: when the transaction that wrote the dirty data aborts, it doesn't just restore data—it creates a fundamental reality gap between what the database says and what actually exists.

What You Will Learn

This page explores the specific implications of rollback scenarios: cascading aborts and their prevention, the formal concepts of recoverability, the distinction between recoverable, cascadeless, and strict schedules, and why rollback scenarios make dirty reads uniquely dangerous compared to other concurrency anomalies.

Anatomy of a Dirty Read Rollback

Let's examine the precise mechanics of what happens when a dirty read is followed by a rollback of the source transaction.

The Setup:

Data item A has committed value A = 100
Transaction T₁ modifies A to 200 (uncommitted)
Transaction T₂ reads A = 200 (dirty read)
T₂ uses this value and commits
T₁ subsequently aborts

The Timeline:

Converting Mermaid diagram...

What the Rollback Does and Doesn't Do:

T₁'s Rollback Accomplishes:

Restores A to its committed value (100)
Releases any locks T₁ was holding
Records the ABORT in the transaction log
Ensures durability of the restored state

T₁'s Rollback Does NOT:

Undo T₂'s committed changes (B = 250)
Notify T₂ that the data it read was invalid
Trigger any compensating action
Restore database consistency (B no longer equals A + 50)

This asymmetry is the core of the problem: rollback perfectly handles the data written by the aborting transaction but has no mechanism to address data written by other transactions that depended on the aborted writes.

The Unilateral Recovery Problem

Database recovery is inherently unilateral—each transaction is rolled back independently without considering dependencies. This is correct behavior for ACID guarantees when isolation is maintained. But when dirty reads have created cross-transaction dependencies, unilateral rollback leaves those dependencies dangling.

Cascading Aborts: The Domino Effect

One approach to maintaining consistency when dirty reads occur is cascading aborts—when T₁ aborts, any transaction that read T₁'s uncommitted data must also abort.

Formal Definition:

A cascading abort (or cascading rollback) occurs when the abort of one transaction forces one or more other transactions to abort as well. If T₂ reads data written by T₁ before T₁ commits, and T₁ aborts, then T₂ must also abort to prevent data inconsistency.

The Cascade Chain:

cascading-abort-example.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
CASCADING ABORT SCENARIO
═══════════════════════════════════════════════════════════════════════
 
Time    T₁                  T₂                  T₃                  T₄
────────────────────────────────────────────────────────────────────────
t1      write(A) = 100
t2                          read(A) → 100*                            
t3                          write(B) = 200
t4                                              read(B) → 200*
t5                                              write(C) = 300
t6                                                                  read(C) → 300*
t7                                                                  write(D) = 400
t8      ABORT! ←───────────────────────────────────────────────────────────
                            ↓ CASCADE
t9                          ABORT! (read dirty A) ←────────────────────────
                                                ↓ CASCADE
t10                                             ABORT! (read dirty B) ←────
                                                                    ↓ CASCADE
t11                                                                 ABORT! (read dirty C)
 
* = dirty read from uncommitted transaction
 
Result: T₁'s abort causes T₂, T₃, and T₄ to all abort.
        All work by all transactions is lost.
        Significant wasted computation.

Problems with Cascading Aborts:

While cascading aborts maintain consistency, they introduce severe problems:

Cascading Abort Problems

•Wasted Work: All computation by cascaded transactions is discarded—potentially hours of work
•Chain Reaction: A single abort can trigger many others, exponentially multiplying the impact
•Uncommitted Dependency: Requires tracking which transactions read from uncommitted writes—complex bookkeeping
•Delayed Decisions: Transactions cannot commit until all transactions they depend on have committed
•User Experience: Users whose transactions cascade-abort experience unexplained failures
•Unbounded Depth: In extreme cases, the cascade can touch a majority of active transactions

The Committed Transaction Problem

If T₂ has already committed before T₁ aborts, cascading abort cannot help. You cannot abort a committed transaction—durability guarantees prevent this. This is why cascading aborts are primarily a theoretical mechanism; in practice, once T₂ commits, the damage from a dirty read is permanent.

Recoverable Schedules: The Formal Solution

Database theory defines recoverability as a property of schedules that ensures the ability to properly recover from transaction failures. A schedule's recoverability determines whether cascading aborts are needed and whether they're even possible.

Definition of Recoverable Schedule:

A schedule S is recoverable if and only if:

For any two transactions Tᵢ and Tⱼ, if Tⱼ reads a data item written by Tᵢ, then Tᵢ must commit before Tⱼ commits.

In formal notation: If Tⱼ reads from Tᵢ, then Commit(Tᵢ) < Commit(Tⱼ) in the schedule.

Recoverable Schedule Example:

T₁: w(A) ... ... ... c₁
T₂:      r(A) ... ... ... c₂

Timeline: w₁(A), r₂(A), c₁, c₂

This is recoverable because T₁ commits (c₁) before T₂ commits (c₂). Even though T₂ read T₁'s uncommitted data, T₂ waited for T₁ to commit before committing itself.

Non-Recoverable Schedule Example:

T₁: w(A) ... ... ... ... a₁
T₂:      r(A) ... c₂

Timeline: w₁(A), r₂(A), c₂, a₁

This is NON-recoverable because T₂ commits (c₂) before T₁ aborts (a₁). T₂ has committed with dirty data, and we cannot rollback T₂ after it has committed.

Why Recoverability Matters

Non-recoverable schedules are problematic because if T₁ aborts after T₂ has committed, there's no way to restore consistency. T₂'s committed data was based on T₁'s writes, which are now officially nullified. The database is stuck in an inconsistent state with no automated recovery path.

Schedule Classification by Recoverability
Schedule Type	Definition	Cascading Aborts?	Key Property
Recoverable	Tᵢ commits before Tⱼ commits if Tⱼ reads from Tᵢ	Possible but may be needed	Can always undo Tⱼ if Tᵢ aborts
Cascadeless (ACA)	Tⱼ only reads committed data	Never needed	No dirty reads occur
Strict	Tⱼ only reads/writes after Tᵢ commits or aborts	Never needed	Before-images always recoverable

Cascadeless and Strict Schedules

To avoid cascading aborts entirely, we need stronger schedule properties than mere recoverability.

Cascadeless Schedules (Avoiding Cascading Abort - ACA):

A schedule S is cascadeless if and only if:

For any two transactions Tᵢ and Tⱼ, if Tⱼ reads a data item written by Tᵢ, then Tᵢ must commit before Tⱼ reads.

In other words: transactions may only read committed data. This directly prevents dirty reads.

schedule-comparison.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
SCHEDULE CLASSIFICATION EXAMPLES
═══════════════════════════════════════════════════════════════════════
 
1. NON-RECOVERABLE SCHEDULE:
   w₁(A), r₂(A), c₂, a₁
   
   Problem: T₂ committed before T₁ aborted
   Result: Cannot recover - T₂'s committed data is based on T₁'s nullified writes
   Status: ILLEGAL - most DBs prevent this
 
2. RECOVERABLE BUT NOT CASCADELESS:
   w₁(A), r₂(A), c₁, c₂
   
   Key: T₁ commits before T₂ commits
   But: T₂ read before T₁ committed (dirty read possible)
   If T₁ had aborted: T₂ would need to cascade abort (but T₂ hadn't committed yet)
   Status: Legal but risky
 
3. CASCADELESS (ACA):
   w₁(A), c₁, r₂(A), c₂
   
   Key: T₂ only reads after T₁ commits
   Result: T₂ only ever sees committed data
   If T₁ aborted: It would happen before c₁, so before r₂, so T₂ reads original A
   Status: Safe - no cascading aborts possible
 
4. STRICT:
   w₁(A), c₁, r₂(A), w₂(A), c₂
   
   Key: T₂ doesn't read OR write A until T₁ finishes
   Result: Before-images for recovery always clear
   Status: Safest - supports simple recovery algorithms
 
HIERARCHY:
  Strict ⊂ Cascadeless ⊂ Recoverable ⊂ All Schedules

Strict Schedules:

A schedule S is strict if and only if:

For any data item X written by transaction Tᵢ, no other transaction may read or write X until Tᵢ either commits or aborts.

Strict schedules are the most restrictive but offer the cleanest recovery semantics. If T₁ writes A and then aborts, recovering A's before-image is straightforward because no other transaction has touched A since T₁ wrote it.

Converting Mermaid diagram...

What DBMSs Actually Enforce

Most modern database systems enforce at least cascadeless schedules by default (via READ COMMITTED isolation or higher). This prevents dirty reads entirely. Only by explicitly requesting READ UNCOMMITTED can you opt into recoverable-but-not-cascadeless schedules where cascading aborts become a concern.

Why Rollback Distinguishes Dirty Reads from Other Anomalies

The rollback scenario fundamentally distinguishes dirty reads from other concurrency anomalies. Let's examine why.

Dirty Reads vs. Non-Repeatable Reads:

Comparing Dirty Reads and Non-Repeatable Reads
Aspect	Dirty Read	Non-Repeatable Read
Data Source	Uncommitted (may abort)	Committed (will persist)
If Source Rolls Back	Read value never existed legally	N/A - source was committed
Consistency Status	May be reading intermediate, inconsistent state	Both reads are of consistent states
Recovery Option	May require cascade abort or manual correction	No special recovery needed
Result Validity	Results may be based on phantom data	Both values were valid at some point

The Ontological Difference:

With dirty reads, rollback creates an ontological problem—the data that was read never actually existed in any committed database state. It was a provisional value that the database ultimately rejected.

With non-repeatable reads, both values read were legitimate committed values of the data item; they just differed because the data changed between reads. This is a temporal difference, not an ontological one.

Example Contrast:

anomaly-contrast.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
DIRTY READ WITH ROLLBACK:
─────────────────────────────────────────────────────────────────────
T₁: WRITE(A=200)              ...              ABORT
T₂:              READ(A)→200       COMMIT
 
Database history of committed values for A:
  A = 100 (initial) → A = 100 (still, because T₁ aborted)
  
T₂ read A = 200, but 200 was NEVER in the committed history.
The value 200 is a phantom - it has no legitimate existence.
 
───────────────────────────────────────────────────────────────────────
NON-REPEATABLE READ (no dirty read):
─────────────────────────────────────────────────────────────────────
T₁: WRITE(A=200)        COMMIT
T₂: READ(A)→100                      READ(A)→200    COMMIT
 
Database history of committed values for A:
  A = 100 (initial) → A = 200 (after T₁ committed)
  
T₂ read two different values, but BOTH were in the committed history.
100 was legitimately committed at t=0; 200 was legitimately committed after T₁.
The inconsistency is temporal, not ontological.

The Phantom Data Problem

When dirty data triggers a rollback, you've made decisions based on a value that the database officially repudiates. It's not that you saw stale data or saw data change—you saw data that the database declares never happened. This is fundamentally more severe than any anomaly involving only committed data.

Practical Rollback Scenarios

Let's examine real-world scenarios where dirty reads followed by rollbacks cause significant problems.

Scenario: Wire Transfer That Fails Validation

A wire transfer from Account A to Account B involves:

Deduct from A
Credit to B
Validate (AML check, limits, etc.)
Commit or abort based on validation

wire-transfer-rollback.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- T1: Wire transfer (will fail AML check)
BEGIN;
UPDATE accounts SET balance = balance - 50000 WHERE id = 'A';  -- A: 100k → 50k
UPDATE accounts SET balance = balance + 50000 WHERE id = 'B';  -- B: 10k → 60k
 
-- Meanwhile, T2 (READ UNCOMMITTED) checks B's balance for another purpose
SELECT balance FROM accounts WHERE id = 'B';  -- Returns 60k (DIRTY!)
 
-- T2 uses this to approve a withdrawal from B
UPDATE accounts SET balance = balance - 55000 WHERE id = 'B';  -- B: 60k → 5k
COMMIT;  -- T2 commits!
 
-- Back in T1: AML check fails
ROLLBACK;  -- T1 aborts, B's balance should be 10k
 
-- Final state:
-- B was credited 50k (rolled back) then debited 55k (committed)
-- Expected B balance: 10k
-- Actual B balance: 10k - 55k + 50k (rollback) = ???
-- Depends on implementation - but logic is clearly broken

The Negative Balance Problem

T2 approved a $55,000 withdrawal from an account that actually only has $10,000. The dirty read showed $60,000 (which would never commit), but T2 committed a withdrawal based on that phantom balance. The account may now be severely overdrawn.

How Databases Handle Rollback with Dirty Reads

Different database systems handle rollback scenarios differently. Understanding these mechanisms helps predict behavior and design robust applications.

Lock-Based Systems:

In lock-based systems without READ UNCOMMITTED:

Writers hold exclusive locks until commit/abort
Readers wait for locks or are blocked
Rollback releases locks, allowing waiting readers to proceed with correct data
No dirty reads occur; no rollback implications

With READ UNCOMMITTED (explicitly requested):

Readers ignore locks and read current buffer values
Rollback restores data, but reader may have already used dirty value
No automatic recovery mechanism
Application must handle inconsistency

rollback-handling.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
ROLLBACK MECHANISM (Simplified)
═══════════════════════════════════════════════════════════════════════
 
FUNCTION rollback(transaction T):
    # Phase 1: Undo all writes by T using log records
    FOR each write operation W by T (in reverse order):
        data_item = W.data_item
        before_value = W.undo_info
        restore(data_item, before_value)
        write_compensation_log_record(T, data_item, before_value)
    
    # Phase 2: Release all locks held by T
    FOR each lock L held by T:
        release(L)
        wake_waiting_transactions(L)
    
    # Phase 3: Mark transaction as aborted
    write_abort_record(T)
    remove_from_active_transactions(T)
    
    # NOTICE: No check for dependent transactions!
    # NOTICE: No notification to readers of T's dirty data!
    # This is intentional - rollback is unilateral
 
CONSEQUENCE:
    If READ UNCOMMITTED allowed dirty reads, those readers
    are NOT notified and NOT automatically rolled back.
    
    The system ASSUMES schedules are at least recoverable,
    meaning readers of dirty data haven't committed yet
    OR that READ COMMITTED+ prevents dirty reads entirely.

MVCC Systems:

In MVCC systems like PostgreSQL and MySQL InnoDB:

Readers see consistent snapshots of committed data by default
Uncommitted data exists as uncommitted versions invisible to most readers
Rollback marks uncommitted versions as invalid
Future readers never see rolled-back data
READ UNCOMMITTED (if supported) bypasses version visibility rules

MVCC inherently provides stronger protection because the normal visibility algorithm excludes uncommitted data. You have to explicitly bypass this protection.

MVCC Provides Natural Protection

In MVCC systems, rollback is even cleaner from the reader's perspective—uncommitted versions are simply never visible to begin with (for transactions using READ COMMITTED or higher). The rolled-back data might as well have never been written as far as other transactions are concerned.

Summary: The Rollback Dimension of Dirty Reads

The rollback scenario adds a critical dimension to understanding dirty reads. It's not just that we read 'wrong' data—we read data that the database officially repudiates as having never legally existed.

Key Concepts Covered

•Rollback Mechanics: Rollback restores the aborting transaction's data but doesn't fix data derived from dirty reads
•Cascading Aborts: Theoretically solve the problem by aborting dependent uncommitted transactions, but fail if dependents already committed
•Recoverability: A schedule property ensuring that we can always cascade abort if needed
•Cascadeless Schedules: Prevent dirty reads entirely, eliminating the need for cascading aborts
•Ontological Difference: Dirty reads involve data that never existed; other anomalies involve data that was valid at some point
•Real-World Impact: Banking, inventory, and reservation scenarios show severe business consequences
•Implementation: Lock-based and MVCC systems handle rollback differently, but both can be bypassed by READ UNCOMMITTED

What's Next:

We've thoroughly examined the problem side of dirty reads. The final page explores prevention strategies—the isolation levels, locking protocols, and architectural patterns that prevent dirty reads from occurring in the first place.

Page Complete

You now understand the specific implications of rollback scenarios for dirty reads, including cascading aborts, recoverability concepts, and the unique severity of reading data that gets rolled back. This understanding motivates the prevention strategies we'll explore in the final page.

Rollback Implications: When Dirty Sources Disappear

The Vanishing Act That Breaks Everything

What You Will Learn

Anatomy of a Dirty Read Rollback

Let's examine the precise mechanics of what happens when a dirty read is followed by a rollback of the source transaction.

The Setup:

Data item A has committed value A = 100
Transaction T₁ modifies A to 200 (uncommitted)
Transaction T₂ reads A = 200 (dirty read)
T₂ uses this value and commits
T₁ subsequently aborts

The Timeline:

Converting Mermaid diagram...

What the Rollback Does and Doesn't Do:

T₁'s Rollback Accomplishes:

Restores A to its committed value (100)
Releases any locks T₁ was holding
Records the ABORT in the transaction log
Ensures durability of the restored state

T₁'s Rollback Does NOT:

Undo T₂'s committed changes (B = 250)
Notify T₂ that the data it read was invalid
Trigger any compensating action
Restore database consistency (B no longer equals A + 50)

The Unilateral Recovery Problem

Cascading Aborts: The Domino Effect

One approach to maintaining consistency when dirty reads occur is cascading aborts—when T₁ aborts, any transaction that read T₁'s uncommitted data must also abort.

Formal Definition:

The Cascade Chain:

cascading-abort-example.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
CASCADING ABORT SCENARIO
═══════════════════════════════════════════════════════════════════════
 
Time    T₁                  T₂                  T₃                  T₄
────────────────────────────────────────────────────────────────────────
t1      write(A) = 100
t2                          read(A) → 100*                            
t3                          write(B) = 200
t4                                              read(B) → 200*
t5                                              write(C) = 300
t6                                                                  read(C) → 300*
t7                                                                  write(D) = 400
t8      ABORT! ←───────────────────────────────────────────────────────────
                            ↓ CASCADE
t9                          ABORT! (read dirty A) ←────────────────────────
                                                ↓ CASCADE
t10                                             ABORT! (read dirty B) ←────
                                                                    ↓ CASCADE
t11                                                                 ABORT! (read dirty C)
 
* = dirty read from uncommitted transaction
 
Result: T₁'s abort causes T₂, T₃, and T₄ to all abort.
        All work by all transactions is lost.
        Significant wasted computation.

Problems with Cascading Aborts:

While cascading aborts maintain consistency, they introduce severe problems:

Cascading Abort Problems

•Wasted Work: All computation by cascaded transactions is discarded—potentially hours of work
•Chain Reaction: A single abort can trigger many others, exponentially multiplying the impact
•Uncommitted Dependency: Requires tracking which transactions read from uncommitted writes—complex bookkeeping
•Delayed Decisions: Transactions cannot commit until all transactions they depend on have committed
•User Experience: Users whose transactions cascade-abort experience unexplained failures
•Unbounded Depth: In extreme cases, the cascade can touch a majority of active transactions

The Committed Transaction Problem

Recoverable Schedules: The Formal Solution

Definition of Recoverable Schedule:

A schedule S is recoverable if and only if:

For any two transactions Tᵢ and Tⱼ, if Tⱼ reads a data item written by Tᵢ, then Tᵢ must commit before Tⱼ commits.

In formal notation: If Tⱼ reads from Tᵢ, then Commit(Tᵢ) < Commit(Tⱼ) in the schedule.

Recoverable Schedule Example:

T₁: w(A) ... ... ... c₁
T₂:      r(A) ... ... ... c₂

Timeline: w₁(A), r₂(A), c₁, c₂

This is recoverable because T₁ commits (c₁) before T₂ commits (c₂). Even though T₂ read T₁'s uncommitted data, T₂ waited for T₁ to commit before committing itself.

Non-Recoverable Schedule Example:

T₁: w(A) ... ... ... ... a₁
T₂:      r(A) ... c₂

Timeline: w₁(A), r₂(A), c₂, a₁

This is NON-recoverable because T₂ commits (c₂) before T₁ aborts (a₁). T₂ has committed with dirty data, and we cannot rollback T₂ after it has committed.

Why Recoverability Matters

Schedule Classification by Recoverability
Schedule Type	Definition	Cascading Aborts?	Key Property
Recoverable	Tᵢ commits before Tⱼ commits if Tⱼ reads from Tᵢ	Possible but may be needed	Can always undo Tⱼ if Tᵢ aborts
Cascadeless (ACA)	Tⱼ only reads committed data	Never needed	No dirty reads occur
Strict	Tⱼ only reads/writes after Tᵢ commits or aborts	Never needed	Before-images always recoverable

Cascadeless and Strict Schedules

To avoid cascading aborts entirely, we need stronger schedule properties than mere recoverability.

Cascadeless Schedules (Avoiding Cascading Abort - ACA):

A schedule S is cascadeless if and only if:

For any two transactions Tᵢ and Tⱼ, if Tⱼ reads a data item written by Tᵢ, then Tᵢ must commit before Tⱼ reads.

In other words: transactions may only read committed data. This directly prevents dirty reads.

schedule-comparison.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
SCHEDULE CLASSIFICATION EXAMPLES
═══════════════════════════════════════════════════════════════════════
 
1. NON-RECOVERABLE SCHEDULE:
   w₁(A), r₂(A), c₂, a₁
   
   Problem: T₂ committed before T₁ aborted
   Result: Cannot recover - T₂'s committed data is based on T₁'s nullified writes
   Status: ILLEGAL - most DBs prevent this
 
2. RECOVERABLE BUT NOT CASCADELESS:
   w₁(A), r₂(A), c₁, c₂
   
   Key: T₁ commits before T₂ commits
   But: T₂ read before T₁ committed (dirty read possible)
   If T₁ had aborted: T₂ would need to cascade abort (but T₂ hadn't committed yet)
   Status: Legal but risky
 
3. CASCADELESS (ACA):
   w₁(A), c₁, r₂(A), c₂
   
   Key: T₂ only reads after T₁ commits
   Result: T₂ only ever sees committed data
   If T₁ aborted: It would happen before c₁, so before r₂, so T₂ reads original A
   Status: Safe - no cascading aborts possible
 
4. STRICT:
   w₁(A), c₁, r₂(A), w₂(A), c₂
   
   Key: T₂ doesn't read OR write A until T₁ finishes
   Result: Before-images for recovery always clear
   Status: Safest - supports simple recovery algorithms
 
HIERARCHY:
  Strict ⊂ Cascadeless ⊂ Recoverable ⊂ All Schedules

Strict Schedules:

A schedule S is strict if and only if:

For any data item X written by transaction Tᵢ, no other transaction may read or write X until Tᵢ either commits or aborts.

Converting Mermaid diagram...

What DBMSs Actually Enforce

Why Rollback Distinguishes Dirty Reads from Other Anomalies

The rollback scenario fundamentally distinguishes dirty reads from other concurrency anomalies. Let's examine why.

Dirty Reads vs. Non-Repeatable Reads:

Comparing Dirty Reads and Non-Repeatable Reads
Aspect	Dirty Read	Non-Repeatable Read
Data Source	Uncommitted (may abort)	Committed (will persist)
If Source Rolls Back	Read value never existed legally	N/A - source was committed
Consistency Status	May be reading intermediate, inconsistent state	Both reads are of consistent states
Recovery Option	May require cascade abort or manual correction	No special recovery needed
Result Validity	Results may be based on phantom data	Both values were valid at some point

The Ontological Difference:

Example Contrast:

anomaly-contrast.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
DIRTY READ WITH ROLLBACK:
─────────────────────────────────────────────────────────────────────
T₁: WRITE(A=200)              ...              ABORT
T₂:              READ(A)→200       COMMIT
 
Database history of committed values for A:
  A = 100 (initial) → A = 100 (still, because T₁ aborted)
  
T₂ read A = 200, but 200 was NEVER in the committed history.
The value 200 is a phantom - it has no legitimate existence.
 
───────────────────────────────────────────────────────────────────────
NON-REPEATABLE READ (no dirty read):
─────────────────────────────────────────────────────────────────────
T₁: WRITE(A=200)        COMMIT
T₂: READ(A)→100                      READ(A)→200    COMMIT
 
Database history of committed values for A:
  A = 100 (initial) → A = 200 (after T₁ committed)
  
T₂ read two different values, but BOTH were in the committed history.
100 was legitimately committed at t=0; 200 was legitimately committed after T₁.
The inconsistency is temporal, not ontological.

The Phantom Data Problem

Practical Rollback Scenarios

Let's examine real-world scenarios where dirty reads followed by rollbacks cause significant problems.

Scenario: Wire Transfer That Fails Validation

A wire transfer from Account A to Account B involves:

Deduct from A
Credit to B
Validate (AML check, limits, etc.)
Commit or abort based on validation

wire-transfer-rollback.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- T1: Wire transfer (will fail AML check)
BEGIN;
UPDATE accounts SET balance = balance - 50000 WHERE id = 'A';  -- A: 100k → 50k
UPDATE accounts SET balance = balance + 50000 WHERE id = 'B';  -- B: 10k → 60k
 
-- Meanwhile, T2 (READ UNCOMMITTED) checks B's balance for another purpose
SELECT balance FROM accounts WHERE id = 'B';  -- Returns 60k (DIRTY!)
 
-- T2 uses this to approve a withdrawal from B
UPDATE accounts SET balance = balance - 55000 WHERE id = 'B';  -- B: 60k → 5k
COMMIT;  -- T2 commits!
 
-- Back in T1: AML check fails
ROLLBACK;  -- T1 aborts, B's balance should be 10k
 
-- Final state:
-- B was credited 50k (rolled back) then debited 55k (committed)
-- Expected B balance: 10k
-- Actual B balance: 10k - 55k + 50k (rollback) = ???
-- Depends on implementation - but logic is clearly broken

The Negative Balance Problem

How Databases Handle Rollback with Dirty Reads

Different database systems handle rollback scenarios differently. Understanding these mechanisms helps predict behavior and design robust applications.

Lock-Based Systems:

In lock-based systems without READ UNCOMMITTED:

Writers hold exclusive locks until commit/abort
Readers wait for locks or are blocked
Rollback releases locks, allowing waiting readers to proceed with correct data
No dirty reads occur; no rollback implications

With READ UNCOMMITTED (explicitly requested):

Readers ignore locks and read current buffer values
Rollback restores data, but reader may have already used dirty value
No automatic recovery mechanism
Application must handle inconsistency

rollback-handling.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
ROLLBACK MECHANISM (Simplified)
═══════════════════════════════════════════════════════════════════════
 
FUNCTION rollback(transaction T):
    # Phase 1: Undo all writes by T using log records
    FOR each write operation W by T (in reverse order):
        data_item = W.data_item
        before_value = W.undo_info
        restore(data_item, before_value)
        write_compensation_log_record(T, data_item, before_value)
    
    # Phase 2: Release all locks held by T
    FOR each lock L held by T:
        release(L)
        wake_waiting_transactions(L)
    
    # Phase 3: Mark transaction as aborted
    write_abort_record(T)
    remove_from_active_transactions(T)
    
    # NOTICE: No check for dependent transactions!
    # NOTICE: No notification to readers of T's dirty data!
    # This is intentional - rollback is unilateral
 
CONSEQUENCE:
    If READ UNCOMMITTED allowed dirty reads, those readers
    are NOT notified and NOT automatically rolled back.
    
    The system ASSUMES schedules are at least recoverable,
    meaning readers of dirty data haven't committed yet
    OR that READ COMMITTED+ prevents dirty reads entirely.

MVCC Systems:

In MVCC systems like PostgreSQL and MySQL InnoDB:

Readers see consistent snapshots of committed data by default
Uncommitted data exists as uncommitted versions invisible to most readers
Rollback marks uncommitted versions as invalid
Future readers never see rolled-back data
READ UNCOMMITTED (if supported) bypasses version visibility rules

MVCC inherently provides stronger protection because the normal visibility algorithm excludes uncommitted data. You have to explicitly bypass this protection.

MVCC Provides Natural Protection

Summary: The Rollback Dimension of Dirty Reads

Key Concepts Covered

•Rollback Mechanics: Rollback restores the aborting transaction's data but doesn't fix data derived from dirty reads
•Cascading Aborts: Theoretically solve the problem by aborting dependent uncommitted transactions, but fail if dependents already committed
•Recoverability: A schedule property ensuring that we can always cascade abort if needed
•Cascadeless Schedules: Prevent dirty reads entirely, eliminating the need for cascading aborts
•Ontological Difference: Dirty reads involve data that never existed; other anomalies involve data that was valid at some point
•Real-World Impact: Banking, inventory, and reservation scenarios show severe business consequences
•Implementation: Lock-based and MVCC systems handle rollback differently, but both can be bypassed by READ UNCOMMITTED

What's Next:

Page Complete