We've seen how cascading rollbacks can transform a single transaction failure into a catastrophic chain reaction—dozens of transactions may be undone because they read uncommitted data that was later invalidated. This is the price of allowing dirty reads in recoverable schedules.
But what if we could have our cake and eat it too? What if schedules could be recoverable without the risk of cascading rollbacks?
This is exactly what cascadeless schedules provide. By ensuring that every transaction only reads committed data, we eliminate the dependency on uncommitted transactions that causes cascades. When a transaction aborts, only that single transaction needs to be undone—no other transactions are affected because none of them read the aborted transaction's uncommitted writes.
This seemingly simple restriction—read only committed data—has profound implications for database system design, isolation levels, and concurrency control implementation.
By the end of this page, you will understand the formal definition of cascadeless schedules, why preventing dirty reads eliminates cascades, the relationship between cascadeless schedules and isolation levels, implementation strategies in real databases, and the performance tradeoffs involved.
Definition: A schedule S is cascadeless (also called avoids cascading aborts or ACA) if and only if every transaction Tⱼ in S reads only values that were written by transactions that have already committed.
Formally: For every transaction Tⱼ and every data item X that Tⱼ reads, if Tᵢ is the last transaction that wrote X before Tⱼ's read:
commit(Tᵢ) < read(Tⱼ, X)
In other words: A transaction may only read a value after the transaction that wrote it has committed.
Key insight: This is a stronger condition than recoverability. Recoverability only requires that the commit of the reader comes after the commit of the writer. Cascadelessness requires that even the read comes after the commit of the writer.
| Property | Recoverability Requirement | Cascadelessness Requirement |
|---|---|---|
| If Tⱼ reads from Tᵢ... | commit(Tᵢ) < commit(Tⱼ) | commit(Tᵢ) < read(Tⱼ, X) |
| Dirty reads allowed? | Yes | No |
| Cascading aborts possible? | Yes | No |
| Relative strength | Weaker | Stronger |
| Set relationship | Superset | Subset |
All cascadeless schedules are recoverable, but not all recoverable schedules are cascadeless. Cascadelessness implies recoverability because if a transaction only reads committed data, the commit ordering requirement of recoverability is automatically satisfied.
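To see the gap between the two properties, consider a small sketch with two hypothetical helper functions (illustrative code, not library APIs). The classic schedule W₁(X), R₂(X), C₁, C₂ is recoverable, since T₁ commits before T₂, yet not cascadeless, since T₂'s read happens before T₁'s commit:

```python
# Each operation: (txn, action, item); actions are "W", "R", "C".
# Simplifying assumption: every read reads the latest prior write.
schedule = [("T1", "W", "X"), ("T2", "R", "X"), ("T1", "C", ""), ("T2", "C", "")]

def is_cascadeless(ops):
    last_writer, committed = {}, set()
    for txn, action, item in ops:
        if action == "W":
            last_writer[item] = txn
        elif action == "C":
            committed.add(txn)
        elif action == "R":
            w = last_writer.get(item)
            if w is not None and w != txn and w not in committed:
                return False  # dirty read: writer not yet committed
    return True

def is_recoverable(ops):
    # If Tj reads Ti's write, Ti must commit before Tj commits.
    last_writer, reads_from, committed = {}, {}, []
    for txn, action, item in ops:
        if action == "W":
            last_writer[item] = txn
        elif action == "R":
            w = last_writer.get(item)
            if w is not None and w != txn:
                reads_from.setdefault(txn, set()).add(w)
        elif action == "C":
            committed.append(txn)
    order = {t: i for i, t in enumerate(committed)}
    return all(order.get(w, len(order)) < order[r]
               for r, ws in reads_from.items() for w in ws
               if r in order)

print(is_recoverable(schedule))   # True: C(T1) precedes C(T2)
print(is_cascadeless(schedule))   # False: R(T2, X) precedes C(T1)
```

Moving T2's read after T1's commit would make the same schedule satisfy both checks, which is exactly the subset relationship in the table above.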
Why does this eliminate cascades?
Recall that cascading rollback occurs when:

1. Transaction Tⱼ reads a value written by an uncommitted transaction Tᵢ (a dirty read).
2. Tᵢ later aborts, invalidating the value Tⱼ read.
3. Tⱼ must therefore abort as well, potentially forcing further aborts down the chain.

In a cascadeless schedule, step 1 cannot happen: Tⱼ can only read from Tᵢ after Tᵢ has committed. A committed transaction will never abort, so there is no cascade trigger. If Tᵢ is still active (not committed), Tⱼ must wait to read X (or read an older committed version), eliminating the dirty read.
Let's examine concrete schedule examples to distinguish cascadeless schedules from merely recoverable ones:
Notation: W(X, v) denotes a write of value v to data item X, R(X) a read of X, C a commit, and A an abort.
A more complex example with multiple transactions:
Consider a schedule with three transactions accessing data items X, Y, and Z:
```
Schedule S:
─────────────────────────────────────────────────────────────────
Time   T₁           T₂           T₃             State
─────────────────────────────────────────────────────────────────
t1     W(X, 100)                                X=100 (uncommitted by T₁)
t2                  W(Y, 200)                   Y=200 (uncommitted by T₂)
t3     C                                        T₁ committed, X=100 (committed)
t4                               R(X) → 100     T₃ reads X (committed by T₁) ✓
t5                  C                           T₂ committed, Y=200 (committed)
t6                               R(Y) → 200     T₃ reads Y (committed by T₂) ✓
t7                               W(Z, 300)      Z=300 (uncommitted by T₃)
t8                               C              T₃ committed

IS THIS SCHEDULE CASCADELESS?

Analysis:
• T₃ reads X at t4 — X was last written by T₁ at t1; T₁ committed at t3
  → commit(T₁) = t3 < t4 = read(T₃, X) ✓
• T₃ reads Y at t6 — Y was last written by T₂ at t2; T₂ committed at t5
  → commit(T₂) = t5 < t6 = read(T₃, Y) ✓

Conclusion: Schedule S is CASCADELESS ✓  Neither read is a dirty read.
```

The general verification procedure: for each read operation R(X) by transaction Tⱼ, find the most recent write W(X) by transaction Tᵢ. If Tᵢ ≠ Tⱼ, check whether Tᵢ committed before the read. If any read fails this check, the schedule is NOT cascadeless.
SQL isolation levels directly impact whether schedules are cascadeless:
READ UNCOMMITTED: permits dirty reads, so a transaction can observe another transaction's uncommitted writes. Schedules under this level may fail to be cascadeless.

READ COMMITTED: every read sees only committed data, which is exactly the cascadelessness condition. Dirty reads are impossible.

REPEATABLE READ and SERIALIZABLE: also prevent dirty reads (and therefore guarantee cascadelessness), while adding progressively stronger guarantees against non-repeatable reads and phantoms.
| Isolation Level | Dirty Reads | Cascadeless | Additional Properties |
|---|---|---|---|
| READ UNCOMMITTED | Allowed | No | Minimal guarantees |
| READ COMMITTED | Prevented | Yes | Non-repeatable reads possible |
| REPEATABLE READ | Prevented | Yes | Phantom reads possible |
| SERIALIZABLE | Prevented | Yes | Strictest guarantees |
READ COMMITTED is the default isolation level in most databases (PostgreSQL, Oracle, SQL Server) precisely because it provides cascadelessness—eliminating cascading rollbacks—while avoiding the concurrency costs of higher isolation levels. It's the minimum isolation that most applications need.
Why don't most applications use READ UNCOMMITTED?
Despite the potential performance benefits, READ UNCOMMITTED is rarely used because:

- Dirty reads can return values that never logically existed, since the writing transaction may later abort.
- Application logic built on possibly-phantom data is difficult to reason about and debug.
- In MVCC-based systems, READ COMMITTED reads are already non-blocking, so the performance benefit is negligible.
The only common use case for READ UNCOMMITTED is approximate counting or monitoring queries where absolute accuracy is not required.
Database systems enforce cascadeless schedules through various mechanisms. The choice of mechanism affects performance, concurrency, and system complexity.
Strategy 1: Lock-Based (Blocking Reads)
When a transaction wants to read data item X:

1. Check whether another transaction holds an exclusive (write) lock on X.
2. If so, either wait until that transaction commits or aborts, or read the last committed value of X.
3. Otherwise, read X directly.
This ensures readers only see committed data but may cause readers to wait.
```python
from dataclasses import dataclass, field
from typing import Dict, Set, Optional, Tuple
from enum import Enum
import threading
import time

class LockType(Enum):
    NONE = "none"
    SHARED = "shared"        # Read lock
    EXCLUSIVE = "exclusive"  # Write lock

@dataclass
class DataItem:
    value: any
    committed_value: any  # Last committed value
    lock_holder: Optional[str] = None
    lock_type: LockType = LockType.NONE
    waiting_queue: list = field(default_factory=list)

class CascadelessLockManager:
    """
    Lock-based implementation that ensures cascadeless schedules.

    Key guarantee: A transaction can only read a value after the
    transaction that wrote it has committed.
    """

    def __init__(self):
        self.data: Dict[str, DataItem] = {}
        self.transactions: Dict[str, Set[str]] = {}  # txn -> locked items
        self.lock = threading.Lock()

    def write(self, txn_id: str, item: str, value: any) -> bool:
        """
        Write a value (requires exclusive lock).
        The value becomes visible to others only after commit.
        """
        with self.lock:
            if item not in self.data:
                self.data[item] = DataItem(value=value, committed_value=None)
            data = self.data[item]
            # Check if we can acquire exclusive lock
            if data.lock_type == LockType.NONE:
                data.lock_holder = txn_id
                data.lock_type = LockType.EXCLUSIVE
                data.value = value
                self._track_lock(txn_id, item)
                print(f"  {txn_id}: W({item}) = {value} [exclusive lock acquired]")
                return True
            elif data.lock_holder == txn_id:
                # We already have the lock
                data.value = value
                print(f"  {txn_id}: W({item}) = {value} [already own lock]")
                return True
            else:
                # Cannot acquire lock - would need to wait
                print(f"  {txn_id}: W({item}) BLOCKED - {data.lock_holder} holds lock")
                return False

    def read(self, txn_id: str, item: str) -> Tuple[bool, any]:
        """
        Read a value - ensures cascadelessness by reading only committed data.
        If uncommitted data exists, the reader WAITS (in a real implementation)
        or reads the last committed value (in this simplified version).
        """
        with self.lock:
            if item not in self.data:
                print(f"  {txn_id}: R({item}) = NULL [item doesn't exist]")
                return True, None
            data = self.data[item]
            # CRITICAL: If another transaction holds an exclusive lock,
            # we can only read the COMMITTED value, not the uncommitted one
            if data.lock_type == LockType.EXCLUSIVE and data.lock_holder != txn_id:
                # Read the last committed value (guarantees cascadelessness)
                committed = data.committed_value
                print(f"  {txn_id}: R({item}) = {committed} [reading committed value, "
                      f"ignoring uncommitted by {data.lock_holder}]")
                return True, committed
            else:
                # Either no lock, or we hold the lock (reading our own write)
                value = data.value
                print(f"  {txn_id}: R({item}) = {value}")
                return True, value

    def commit(self, txn_id: str) -> None:
        """Commit transaction - make all writes visible and release locks."""
        with self.lock:
            if txn_id not in self.transactions:
                self.transactions[txn_id] = set()
            for item in self.transactions.get(txn_id, set()):
                data = self.data[item]
                if data.lock_holder == txn_id:
                    # Make value visible to all
                    data.committed_value = data.value
                    data.lock_holder = None
                    data.lock_type = LockType.NONE
                    print(f"  {txn_id}: COMMIT - {item} = {data.value} now visible")
            self.transactions[txn_id] = set()

    def abort(self, txn_id: str) -> None:
        """
        Abort transaction - restore original values and release locks.
        NO OTHER TRANSACTIONS NEED TO ABORT (cascadeless guarantee).
        """
        with self.lock:
            for item in self.transactions.get(txn_id, set()):
                data = self.data[item]
                if data.lock_holder == txn_id:
                    # Restore to committed value
                    data.value = data.committed_value
                    data.lock_holder = None
                    data.lock_type = LockType.NONE
                    print(f"  {txn_id}: ABORT - {item} restored to {data.committed_value}")
            self.transactions[txn_id] = set()
            print(f"  {txn_id}: ABORTED (no cascade - other transactions unaffected)")

    def _track_lock(self, txn_id: str, item: str):
        if txn_id not in self.transactions:
            self.transactions[txn_id] = set()
        self.transactions[txn_id].add(item)

# Demonstrate cascadeless behavior
print("=" * 60)
print("CASCADELESS SCHEDULE DEMONSTRATION")
print("=" * 60)

lm = CascadelessLockManager()

# Initialize some data
lm.data["X"] = DataItem(value=100, committed_value=100)
print("Initial: X = 100 (committed)")

print("\n--- T1 writes, T2 tries to read ---")
lm.write("T1", "X", 500)  # T1 writes but doesn't commit
lm.read("T2", "X")        # T2 reads - gets committed value 100, NOT 500!

print("\n--- T1 aborts ---")
lm.abort("T1")  # T1 aborts

print("\n--- T2 continues unaffected ---")
lm.read("T2", "X")  # T2 can still read X = 100
lm.commit("T2")     # T2 commits successfully

print("\n✓ No cascade! T2 was never affected by T1's abort.")
```

Strategy 2: Multi-Version Concurrency Control (MVCC)
MVCC maintains multiple versions of each data item:

- Each write creates a new version tagged with its writer; older committed versions are retained.
- Readers see the newest version that was committed as of their snapshot; uncommitted versions are invisible to other transactions.
- When a transaction aborts, its versions are simply discarded; no other transaction ever observed them.
This is the approach used by PostgreSQL, Oracle, MySQL InnoDB, and most modern databases.
MVCC provides cascadelessness with minimal concurrency impact. Writers don't block readers, and readers don't block writers. This is why nearly all modern OLTP databases use MVCC as their primary concurrency control mechanism.
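To make the version mechanics concrete, here is a minimal, single-threaded MVCC sketch. The class and method names are illustrative, not any real database's API, and real systems use transaction IDs and snapshot timestamps rather than a simple committed flag. Each write appends an uncommitted version; a read returns the newest committed version (or the reader's own write), so readers never block on writers and never see dirty data:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class Version:
    value: Any
    writer: str              # transaction that created this version
    committed: bool = False  # becomes True when the writer commits

@dataclass
class MVCCStore:
    """Minimal multi-version store: readers see only committed versions."""
    versions: Dict[str, List[Version]] = field(default_factory=dict)

    def write(self, txn: str, item: str, value: Any) -> None:
        # Append a new uncommitted version; never blocks readers.
        self.versions.setdefault(item, []).append(Version(value, txn))

    def read(self, txn: str, item: str) -> Optional[Any]:
        # Newest version that is committed, or our own uncommitted write.
        for v in reversed(self.versions.get(item, [])):
            if v.committed or v.writer == txn:
                return v.value
        return None

    def commit(self, txn: str) -> None:
        # Mark all of this transaction's versions as committed (visible).
        for vs in self.versions.values():
            for v in vs:
                if v.writer == txn:
                    v.committed = True

    def abort(self, txn: str) -> None:
        # Discard this transaction's versions; no other transaction is affected.
        for item in self.versions:
            self.versions[item] = [v for v in self.versions[item] if v.writer != txn]

store = MVCCStore()
store.write("T0", "X", 100)
store.commit("T0")            # X = 100 committed
store.write("T1", "X", 500)   # T1's uncommitted version
print(store.read("T2", "X"))  # → 100 (T2 never sees the dirty value)
store.abort("T1")             # only T1's versions are discarded
print(store.read("T2", "X"))  # → 100 (T2 unaffected: no cascade)
```

Because T2 never read T1's uncommitted version, T1's abort touches only T1's own versions, which is precisely the cascadeless guarantee.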
Enforcing cascadelessness has performance implications, but they're generally favorable compared to allowing cascading rollbacks:
Positive Impacts:
Predictable recovery: When a transaction aborts, only that transaction's work is lost—no chain reaction of wasted computation.
Simpler rollback: The system doesn't need to track which other transactions might be affected by an abort.
Better throughput under failures: High abort rates don't amplify into cascade catastrophes.
Potential Costs:
| Aspect | With Cascadelessness | Without (Dirty Reads) |
|---|---|---|
| Read latency (no conflict) | Same | Same |
| Read latency (conflict) | May wait for commit | Immediate (dirty) |
| Abort impact | Single transaction | May cascade to many |
| Worst-case throughput | Predictable | Can collapse (avalanche) |
| Average throughput | High | Slightly higher (rare waits) |
| Memory overhead | MVCC: version storage | Minimal |
| Implementation complexity | Moderate | Low (but cascade handling complex) |
When does cascadelessness cost performance?
The main cost occurs when:

- A transaction needs to read an item whose most recent write is still uncommitted, and
- The reader must either wait for the writer to commit or settle for an older committed version.

This is most significant for hot records that are frequently read and written. However, waits are typically short because the writer usually commits quickly, and MVCC eliminates most read waits entirely. In practice, the performance benefits of avoiding cascades far outweigh the costs of preventing dirty reads.
A rough cost model suggests that once the abort rate exceeds roughly 1-2%, the cost of potential cascades (wasted work, retries, cascade processing) exceeds the cost of preventing dirty reads. Since typical abort rates fall in the 1-5% range, cascadelessness almost always wins on net performance.
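As a hedged illustration of that break-even argument, here is a tiny cost model. Every parameter below is an assumed, made-up number chosen for illustration, not a measurement from any real system:

```python
# Illustrative back-of-envelope model: expected wasted work per transaction.
# All parameters are assumptions, not measurements.
abort_rate = 0.03        # assumed: 3% of transactions abort
cascade_fanout = 4       # assumed: avg. transactions dragged down per cascade
work_per_txn = 1.0       # normalized units of work per transaction
wait_cost = 0.05         # assumed: avg. cost of waiting for a commit
conflict_rate = 0.10     # assumed: fraction of reads hitting uncommitted data

# With dirty reads: each abort may cascade, wasting the aborted
# transaction's work plus that of its dependents.
waste_with_dirty_reads = abort_rate * (1 + cascade_fanout) * work_per_txn

# Cascadeless: aborts waste only their own work; reads sometimes pay a wait.
waste_cascadeless = abort_rate * work_per_txn + conflict_rate * wait_cost

print(f"dirty reads:  {waste_with_dirty_reads:.3f} units wasted per txn")
print(f"cascadeless:  {waste_cascadeless:.3f} units wasted per txn")
# 0.03*5 = 0.150 vs 0.03 + 0.005 = 0.035 → cascadeless wins here
```

Under these assumed numbers the cascade scenario wastes roughly four times as much work; the qualitative conclusion holds whenever cascades fan out to more than a handful of dependents.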
Let's examine how major database systems implement cascadelessness:
PostgreSQL: pure MVCC. Readers always see a snapshot of committed data; READ UNCOMMITTED is accepted syntactically but silently treated as READ COMMITTED, so dirty reads are impossible.

Oracle: MVCC using undo segments for read consistency. Oracle does not offer READ UNCOMMITTED at all; every read sees committed data.

MySQL InnoDB: MVCC for consistent reads at READ COMMITTED and above, but it does honor READ UNCOMMITTED, so dirty reads (and non-cascadeless schedules) are possible if explicitly requested.

SQL Server: lock-based READ COMMITTED by default, so readers may block behind writers; the READ_COMMITTED_SNAPSHOT database option switches to MVCC-style row versioning. READ UNCOMMITTED (and the NOLOCK hint) permit dirty reads.
```sql
-- PostgreSQL: Demonstrating cascadeless behavior

-- Session 1: Start transaction and update
BEGIN;
UPDATE accounts SET balance = balance - 1000 WHERE id = 1;
-- Don't commit yet

-- Session 2: Try to read the same row
BEGIN;
SELECT balance FROM accounts WHERE id = 1;
-- Returns the OLD (committed) balance, not the uncommitted update!

-- This is cascadeless in action:
-- Session 2 reads committed data, not Session 1's uncommitted write

-- Session 1: Abort the transaction
ROLLBACK;

-- Session 2 continues unaffected
-- No cascade! Session 2 never saw the aborted data
UPDATE accounts SET balance = balance + 500 WHERE id = 2;
COMMIT;  -- Succeeds normally

-- Verify with explicit READ COMMITTED
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- All reads return only committed data (cascadeless)

-- The only way to get dirty reads in PostgreSQL:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- But PostgreSQL actually treats this as READ COMMITTED!
-- PostgreSQL is ALWAYS cascadeless.
```

PostgreSQL is notably strict: even if you request READ UNCOMMITTED, it silently upgrades to READ COMMITTED. PostgreSQL developers deliberately chose to NEVER allow dirty reads, making all PostgreSQL schedules cascadeless by design.
Given a schedule, how do we verify it is cascadeless? Here's a systematic algorithm:
Algorithm: Cascadelessness Test
Input: Schedule S with operations from transactions T₁, T₂, ..., Tₙ
Output: TRUE if S is cascadeless, FALSE otherwise
1. For each READ(X) operation by transaction Tⱼ:
a. Find the most recent WRITE(X) operation before this read
b. Let Tᵢ be the transaction that performed this write
c. If Tᵢ ≠ Tⱼ (reading another transaction's write):
- Check if COMMIT(Tᵢ) appears before READ(X) in S
- If not, return FALSE (dirty read found)
2. If all reads pass the check, return TRUE
```python
from dataclasses import dataclass
from typing import List, Dict, Set, Tuple
from enum import Enum

class OpType(Enum):
    READ = "R"
    WRITE = "W"
    COMMIT = "C"
    ABORT = "A"

@dataclass
class Operation:
    txn: str
    op_type: OpType
    item: str = ""
    timestamp: int = 0

def test_cascadelessness(schedule: List[Operation]) -> Tuple[bool, str]:
    """
    Test if a schedule is cascadeless.

    A schedule is cascadeless iff every read from another transaction's
    write occurs AFTER that transaction has committed.

    Args:
        schedule: List of operations in temporal order

    Returns:
        Tuple of (is_cascadeless, explanation)
    """
    # Track last writer for each item
    last_writer: Dict[str, Tuple[str, int]] = {}  # item -> (txn, timestamp)
    # Track committed transactions
    committed: Set[str] = set()
    commit_times: Dict[str, int] = {}

    # Process schedule
    for i, op in enumerate(schedule):
        op.timestamp = i
        if op.op_type == OpType.WRITE:
            last_writer[op.item] = (op.txn, i)
        elif op.op_type == OpType.COMMIT:
            committed.add(op.txn)
            commit_times[op.txn] = i
        elif op.op_type == OpType.READ:
            if op.item in last_writer:
                writer_txn, write_time = last_writer[op.item]
                # Check if reading from another transaction
                if writer_txn != op.txn:
                    # Writer must have committed before this read
                    if writer_txn not in committed:
                        return False, (
                            f"DIRTY READ: {op.txn} reads {op.item} at t={i}, "
                            f"but writer {writer_txn} (wrote at t={write_time}) "
                            f"has not committed yet → NOT CASCADELESS"
                        )
    return True, "All reads are from committed transactions → CASCADELESS ✓"

# Example schedules to test
print("=" * 70)
print("CASCADELESSNESS TESTING")
print("=" * 70)

# Schedule 1: Cascadeless
schedule1 = [
    Operation("T1", OpType.WRITE, "X"),
    Operation("T1", OpType.COMMIT),
    Operation("T2", OpType.READ, "X"),
    Operation("T2", OpType.COMMIT),
]
print("\nSchedule 1: T1:W(X), T1:C, T2:R(X), T2:C")
result, explanation = test_cascadelessness(schedule1)
print(f"Result: {explanation}")

# Schedule 2: Not cascadeless (dirty read)
schedule2 = [
    Operation("T1", OpType.WRITE, "X"),
    Operation("T2", OpType.READ, "X"),  # Dirty read!
    Operation("T1", OpType.COMMIT),
    Operation("T2", OpType.COMMIT),
]
print("\nSchedule 2: T1:W(X), T2:R(X), T1:C, T2:C")
result, explanation = test_cascadelessness(schedule2)
print(f"Result: {explanation}")

# Schedule 3: Complex cascadeless schedule
schedule3 = [
    Operation("T1", OpType.WRITE, "X"),
    Operation("T2", OpType.WRITE, "Y"),
    Operation("T1", OpType.COMMIT),
    Operation("T3", OpType.READ, "X"),  # Reads T1's committed write ✓
    Operation("T2", OpType.COMMIT),
    Operation("T3", OpType.READ, "Y"),  # Reads T2's committed write ✓
    Operation("T3", OpType.COMMIT),
]
print("\nSchedule 3: T1:W(X), T2:W(Y), T1:C, T3:R(X), T2:C, T3:R(Y), T3:C")
result, explanation = test_cascadelessness(schedule3)
print(f"Result: {explanation}")

# Schedule 4: Transaction reads its own uncommitted write (OK)
schedule4 = [
    Operation("T1", OpType.WRITE, "X"),
    Operation("T1", OpType.READ, "X"),  # Reading own write - always OK
    Operation("T1", OpType.COMMIT),
]
print("\nSchedule 4: T1:W(X), T1:R(X), T1:C")
result, explanation = test_cascadelessness(schedule4)
print(f"Result: {explanation}")
```

A transaction reading its own uncommitted write is NOT a dirty read and does NOT violate cascadelessness. The read-write dependency is within the same transaction, so there's no cascade possibility.
Cascadeless schedules eliminate the risk of cascading rollbacks by preventing dirty reads entirely. Let's consolidate the key concepts:

- A schedule is cascadeless iff every read of another transaction's write occurs after that writer commits: commit(Tᵢ) < read(Tⱼ, X).
- Cascadelessness is strictly stronger than recoverability; every cascadeless schedule is recoverable.
- When a transaction aborts in a cascadeless schedule, only that transaction rolls back.
- READ COMMITTED and stronger isolation levels guarantee cascadelessness; READ UNCOMMITTED does not.
- Real systems enforce it with blocking reads (locks) or, more commonly, MVCC.
What's next:
Cascadeless schedules prevent cascading rollbacks, but they still allow a subtle problem: a transaction might overwrite or read data that's being modified by an uncommitted transaction in a way that complicates recovery. Strict schedules add an even stronger guarantee that simplifies recovery further. The next page explores strict schedules and their benefits for database recovery processes.
You now understand cascadeless schedules—the property that eliminates cascading rollbacks by preventing dirty reads. You've learned formal definitions, implementation strategies, and how major databases enforce cascadelessness through isolation levels and MVCC. Next, we'll explore strict schedules that provide the strongest recoverability guarantees.