Loading content...
Consider a supply chain system where Order #1001 reads a supplier's uncommitted inventory update showing 500 units available. Based on this, Order #1001 confirms delivery to a customer and commits. Seconds later, the supplier transaction rolls back—a shipping manifest had an error, and the inventory was never actually updated.
The committed Order #1001 now references inventory that doesn't exist. The customer expects delivery. The warehouse has no stock. And the system has no automatic way to detect or resolve this inconsistency.
This is the rollback implication of dirty reads: when the transaction that wrote the dirty data aborts, it doesn't just restore data—it creates a fundamental reality gap between what the database says and what actually exists.
This page explores the specific implications of rollback scenarios: cascading aborts and their prevention, the formal concepts of recoverability, the distinction between recoverable, cascadeless, and strict schedules, and why rollback scenarios make dirty reads uniquely dangerous compared to other concurrency anomalies.
Let's examine the precise mechanics of what happens when a dirty read is followed by a rollback of the source transaction.
The Setup:
The Timeline:
What the Rollback Does and Doesn't Do:
T₁'s Rollback Accomplishes:
T₁'s Rollback Does NOT:
This asymmetry is the core of the problem: rollback perfectly handles the data written by the aborting transaction but has no mechanism to address data written by other transactions that depended on the aborted writes.
Database recovery is inherently unilateral—each transaction is rolled back independently without considering dependencies. This is correct behavior for ACID guarantees when isolation is maintained. But when dirty reads have created cross-transaction dependencies, unilateral rollback leaves those dependencies dangling.
One approach to maintaining consistency when dirty reads occur is cascading aborts—when T₁ aborts, any transaction that read T₁'s uncommitted data must also abort.
Formal Definition:
A cascading abort (or cascading rollback) occurs when the abort of one transaction forces one or more other transactions to abort as well. If T₂ reads data written by T₁ before T₁ commits, and T₁ aborts, then T₂ must also abort to prevent data inconsistency.
The Cascade Chain:
12345678910111213141516171819202122232425
CASCADING ABORT SCENARIO═══════════════════════════════════════════════════════════════════════ Time T₁ T₂ T₃ T₄────────────────────────────────────────────────────────────────────────t1 write(A) = 100t2 read(A) → 100* t3 write(B) = 200t4 read(B) → 200*t5 write(C) = 300t6 read(C) → 300*t7 write(D) = 400t8 ABORT! ←─────────────────────────────────────────────────────────── ↓ CASCADEt9 ABORT! (read dirty A) ←──────────────────────── ↓ CASCADEt10 ABORT! (read dirty B) ←──── ↓ CASCADEt11 ABORT! (read dirty C) * = dirty read from uncommitted transaction Result: T₁'s abort causes T₂, T₃, and T₄ to all abort. All work by all transactions is lost. Significant wasted computation.Problems with Cascading Aborts:
While cascading aborts maintain consistency, they introduce severe problems:
If T₂ has already committed before T₁ aborts, cascading abort cannot help. You cannot abort a committed transaction—durability guarantees prevent this. This is why cascading aborts are primarily a theoretical mechanism; in practice, once T₂ commits, the damage from a dirty read is permanent.
Database theory defines recoverability as a property of schedules that ensures the ability to properly recover from transaction failures. A schedule's recoverability determines whether cascading aborts are needed and whether they're even possible.
Definition of Recoverable Schedule:
A schedule S is recoverable if and only if:
For any two transactions Tᵢ and Tⱼ, if Tⱼ reads a data item written by Tᵢ, then Tᵢ must commit before Tⱼ commits.
In formal notation: If Tⱼ reads from Tᵢ, then Commit(Tᵢ) < Commit(Tⱼ) in the schedule.
Recoverable Schedule Example:
T₁: w(A) ... ... ... c₁
T₂: r(A) ... ... ... c₂
Timeline: w₁(A), r₂(A), c₁, c₂
This is recoverable because T₁ commits (c₁) before T₂ commits (c₂). Even though T₂ read T₁'s uncommitted data, T₂ waited for T₁ to commit before committing itself.
Non-Recoverable Schedule Example:
T₁: w(A) ... ... ... ... a₁
T₂: r(A) ... c₂
Timeline: w₁(A), r₂(A), c₂, a₁
This is NON-recoverable because T₂ commits (c₂) before T₁ aborts (a₁). T₂ has committed with dirty data, and we cannot rollback T₂ after it has committed.
Non-recoverable schedules are problematic because if T₁ aborts after T₂ has committed, there's no way to restore consistency. T₂'s committed data was based on T₁'s writes, which are now officially nullified. The database is stuck in an inconsistent state with no automated recovery path.
| Schedule Type | Definition | Cascading Aborts? | Key Property |
|---|---|---|---|
| Recoverable | Tᵢ commits before Tⱼ commits if Tⱼ reads from Tᵢ | Possible but may be needed | Can always undo Tⱼ if Tᵢ aborts |
| Cascadeless (ACA) | Tⱼ only reads committed data | Never needed | No dirty reads occur |
| Strict | Tⱼ only reads/writes after Tᵢ commits or aborts | Never needed | Before-images always recoverable |
To avoid cascading aborts entirely, we need stronger schedule properties than mere recoverability.
Cascadeless Schedules (Avoiding Cascading Abort - ACA):
A schedule S is cascadeless if and only if:
For any two transactions Tᵢ and Tⱼ, if Tⱼ reads a data item written by Tᵢ, then Tᵢ must commit before Tⱼ reads.
In other words: transactions may only read committed data. This directly prevents dirty reads.
1234567891011121314151617181920212223242526272829303132333435
SCHEDULE CLASSIFICATION EXAMPLES═══════════════════════════════════════════════════════════════════════ 1. NON-RECOVERABLE SCHEDULE: w₁(A), r₂(A), c₂, a₁ Problem: T₂ committed before T₁ aborted Result: Cannot recover - T₂'s committed data is based on T₁'s nullified writes Status: ILLEGAL - most DBs prevent this 2. RECOVERABLE BUT NOT CASCADELESS: w₁(A), r₂(A), c₁, c₂ Key: T₁ commits before T₂ commits But: T₂ read before T₁ committed (dirty read possible) If T₁ had aborted: T₂ would need to cascade abort (but T₂ hadn't committed yet) Status: Legal but risky 3. CASCADELESS (ACA): w₁(A), c₁, r₂(A), c₂ Key: T₂ only reads after T₁ commits Result: T₂ only ever sees committed data If T₁ aborted: It would happen before c₁, so before r₂, so T₂ reads original A Status: Safe - no cascading aborts possible 4. STRICT: w₁(A), c₁, r₂(A), w₂(A), c₂ Key: T₂ doesn't read OR write A until T₁ finishes Result: Before-images for recovery always clear Status: Safest - supports simple recovery algorithms HIERARCHY: Strict ⊂ Cascadeless ⊂ Recoverable ⊂ All SchedulesStrict Schedules:
A schedule S is strict if and only if:
For any data item X written by transaction Tᵢ, no other transaction may read or write X until Tᵢ either commits or aborts.
Strict schedules are the most restrictive but offer the cleanest recovery semantics. If T₁ writes A and then aborts, recovering A's before-image is straightforward because no other transaction has touched A since T₁ wrote it.
Most modern database systems enforce at least cascadeless schedules by default (via READ COMMITTED isolation or higher). This prevents dirty reads entirely. Only by explicitly requesting READ UNCOMMITTED can you opt into recoverable-but-not-cascadeless schedules where cascading aborts become a concern.
The rollback scenario fundamentally distinguishes dirty reads from other concurrency anomalies. Let's examine why.
Dirty Reads vs. Non-Repeatable Reads:
| Aspect | Dirty Read | Non-Repeatable Read |
|---|---|---|
| Data Source | Uncommitted (may abort) | Committed (will persist) |
| If Source Rolls Back | Read value never existed legally | N/A - source was committed |
| Consistency Status | May be reading intermediate, inconsistent state | Both reads are of consistent states |
| Recovery Option | May require cascade abort or manual correction | No special recovery needed |
| Result Validity | Results may be based on phantom data | Both values were valid at some point |
The Ontological Difference:
With dirty reads, rollback creates an ontological problem—the data that was read never actually existed in any committed database state. It was a provisional value that the database ultimately rejected.
With non-repeatable reads, both values read were legitimate committed values of the data item; they just differed because the data changed between reads. This is a temporal difference, not an ontological one.
Example Contrast:
1234567891011121314151617181920212223
DIRTY READ WITH ROLLBACK:─────────────────────────────────────────────────────────────────────T₁: WRITE(A=200) ... ABORTT₂: READ(A)→200 COMMIT Database history of committed values for A: A = 100 (initial) → A = 100 (still, because T₁ aborted) T₂ read A = 200, but 200 was NEVER in the committed history.The value 200 is a phantom - it has no legitimate existence. ───────────────────────────────────────────────────────────────────────NON-REPEATABLE READ (no dirty read):─────────────────────────────────────────────────────────────────────T₁: WRITE(A=200) COMMITT₂: READ(A)→100 READ(A)→200 COMMIT Database history of committed values for A: A = 100 (initial) → A = 200 (after T₁ committed) T₂ read two different values, but BOTH were in the committed history.100 was legitimately committed at t=0; 200 was legitimately committed after T₁.The inconsistency is temporal, not ontological.When dirty data triggers a rollback, you've made decisions based on a value that the database officially repudiates. It's not that you saw stale data or saw data change—you saw data that the database declares never happened. This is fundamentally more severe than any anomaly involving only committed data.
Let's examine real-world scenarios where dirty reads followed by rollbacks cause significant problems.
Scenario: Wire Transfer That Fails Validation
A wire transfer from Account A to Account B involves:
1234567891011121314151617181920
-- T1: Wire transfer (will fail AML check)BEGIN;UPDATE accounts SET balance = balance - 50000 WHERE id = 'A'; -- A: 100k → 50kUPDATE accounts SET balance = balance + 50000 WHERE id = 'B'; -- B: 10k → 60k -- Meanwhile, T2 (READ UNCOMMITTED) checks B's balance for another purposeSELECT balance FROM accounts WHERE id = 'B'; -- Returns 60k (DIRTY!) -- T2 uses this to approve a withdrawal from BUPDATE accounts SET balance = balance - 55000 WHERE id = 'B'; -- B: 60k → 5kCOMMIT; -- T2 commits! -- Back in T1: AML check failsROLLBACK; -- T1 aborts, B's balance should be 10k -- Final state:-- B was credited 50k (rolled back) then debited 55k (committed)-- Expected B balance: 10k-- Actual B balance: 10k - 55k + 50k (rollback) = ???-- Depends on implementation - but logic is clearly brokenT2 approved a $55,000 withdrawal from an account that actually only has $10,000. The dirty read showed $60,000 (which would never commit), but T2 committed a withdrawal based on that phantom balance. The account may now be severely overdrawn.
Different database systems handle rollback scenarios differently. Understanding these mechanisms helps predict behavior and design robust applications.
Lock-Based Systems:
In lock-based systems without READ UNCOMMITTED:
With READ UNCOMMITTED (explicitly requested):
12345678910111213141516171819202122232425262728293031
ROLLBACK MECHANISM (Simplified)═══════════════════════════════════════════════════════════════════════ FUNCTION rollback(transaction T): # Phase 1: Undo all writes by T using log records FOR each write operation W by T (in reverse order): data_item = W.data_item before_value = W.undo_info restore(data_item, before_value) write_compensation_log_record(T, data_item, before_value) # Phase 2: Release all locks held by T FOR each lock L held by T: release(L) wake_waiting_transactions(L) # Phase 3: Mark transaction as aborted write_abort_record(T) remove_from_active_transactions(T) # NOTICE: No check for dependent transactions! # NOTICE: No notification to readers of T's dirty data! # This is intentional - rollback is unilateral CONSEQUENCE: If READ UNCOMMITTED allowed dirty reads, those readers are NOT notified and NOT automatically rolled back. The system ASSUMES schedules are at least recoverable, meaning readers of dirty data haven't committed yet OR that READ COMMITTED+ prevents dirty reads entirely.MVCC Systems:
In MVCC systems like PostgreSQL and MySQL InnoDB:
MVCC inherently provides stronger protection because the normal visibility algorithm excludes uncommitted data. You have to explicitly bypass this protection.
In MVCC systems, rollback is even cleaner from the reader's perspective—uncommitted versions are simply never visible to begin with (for transactions using READ COMMITTED or higher). The rolled-back data might as well have never been written as far as other transactions are concerned.
The rollback scenario adds a critical dimension to understanding dirty reads. It's not just that we read 'wrong' data—we read data that the database officially repudiates as having never legally existed.
What's Next:
We've thoroughly examined the problem side of dirty reads. The final page explores prevention strategies—the isolation levels, locking protocols, and architectural patterns that prevent dirty reads from occurring in the first place.
You now understand the specific implications of rollback scenarios for dirty reads, including cascading aborts, recoverability concepts, and the unique severity of reading data that gets rolled back. This understanding motivates the prevention strategies we'll explore in the final page.