Imagine you're a bank customer checking your account balance online. The system shows $5,000. You decide to withdraw $3,000 at an ATM across the street. When you arrive and insert your card, the ATM checks your balance—but now it shows only $2,000. Confused, you return to the website, and your still-open session shows $5,000.
What happened?
While you were walking to the ATM, an automatic payment of $3,000 processed against your account. The ATM saw the updated balance, but your web session—still running the same transaction—continued to show the old value. Within the same logical transaction, you observed two different values for the same data item.
This is the unrepeatable read problem: a concurrency anomaly where a transaction reads the same data item twice and gets different values because another transaction modified and committed the data between the reads. It's not a bug in your application—it's a fundamental challenge in concurrent database access.
By the end of this page, you will understand the formal definition of unrepeatable reads, the precise timing conditions that cause them, how they differ from dirty reads and phantom reads, and why they violate the isolation property of ACID transactions.
An unrepeatable read (also called a non-repeatable read or fuzzy read) occurs when a transaction reads the same data item multiple times and observes different values due to modifications by other committed transactions between the reads.
Formal Definition:
An unrepeatable read occurs when transaction T₁ reads a data item X, then transaction T₂ modifies and commits X, and subsequently T₁ reads X again—observing T₂'s committed value instead of the value it originally read.
Symbolically, the problematic sequence is:
```
T₁: R(X) → value v₁
T₂: W(X) → value v₂
T₂: COMMIT
T₁: R(X) → value v₂   (different from v₁!)
T₁: COMMIT
```
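The same sequence can be replayed in a few lines of Python. This is a toy sketch only: a plain dict stands in for committed state, and names like `db` and `pending` are illustrative, not any real API.

```python
# Toy replay of the schedule: `db` is the committed state, `pending`
# is T2's uncommitted write buffer. All names are illustrative.
db = {"X": "v1"}
pending = {}

v_first = db["X"]        # T1: R(X) -> v1

pending["X"] = "v2"      # T2: W(X) -> v2 (buffered, uncommitted)
db.update(pending)       # T2: COMMIT (the write becomes the committed value)
pending.clear()

v_second = db["X"]       # T1: R(X) -> v2, different from v_first
assert v_first != v_second
print(f"T1 read {v_first!r} first, then {v_second!r}: unrepeatable")
```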
The key characteristic that defines an unrepeatable read is that the modifying transaction has committed. If T₂ modifies X but hasn't committed, and T₁ reads the modified value, that's a dirty read—a different (and generally more severe) problem. Unrepeatable reads involve only committed data, which makes them paradoxically harder to detect: the data is technically 'valid' at each read.
Understanding when unrepeatable reads can occur requires examining the precise timing relationships between transactions. The vulnerability window exists between a transaction's first and subsequent reads of the same data item.
The Vulnerable Interval:
When transaction T₁ reads data item X at time t₁, and plans to read X again at time t₂, the interval [t₁, t₂] represents the window during which another transaction could modify X and cause an unrepeatable read.
| Time | T₁ (Reader) | T₂ (Writer) | X Value | State |
|---|---|---|---|---|
| t₀ | BEGIN | | 100 | X = 100 (initial) |
| t₁ | R(X) → 100 | | 100 | T₁ sees 100 |
| t₂ | ...processing... | BEGIN | 100 | T₂ starts |
| t₃ | ...processing... | W(X) ← 200 | 100→200 | T₂ writes 200 (uncommitted) |
| t₄ | ...processing... | COMMIT | 200 | T₂ commits, X = 200 permanent |
| t₅ | R(X) → 200 ⚠️ | | 200 | T₁ sees 200 (different!) |
| t₆ | COMMIT | | 200 | T₁ commits with inconsistent view |
Critical Observations:
At t₁: T₁ reads X = 100. This is its "snapshot" of reality.
Between t₁ and t₅: T₂ successfully modifies and commits X = 200. The database is now in a new consistent state.
At t₅: T₁ reads X again and sees 200. From T₁'s perspective, the same data item magically changed value.
The paradox: Both values (100 and 200) were valid committed values at the time they were read. T₁ isn't reading "wrong" data—it's reading data from two different points in time.
The fundamental issue is that T₁ doesn't operate on a consistent snapshot of the database. It sees the database at time t₁ for its first read and at time t₅ for its second read.
The probability of an unrepeatable read increases with the time gap between reads. Long-running transactions that read the same data multiple times are especially vulnerable. A transaction that reads a value, performs complex calculations for several seconds, then reads again has ample opportunity for intervening commits by other transactions.
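A small threading sketch makes this concrete. The sleeps and values are illustrative, and note that the lock serializes individual operations only, not whole transactions—which is exactly why the anomaly still appears.

```python
import threading
import time

# A shared value plays the committed data item; a writer thread
# "commits" a change while the reader is busy between its two reads.
balance = {"X": 100}
lock = threading.Lock()  # protects single reads/writes, not transactions

def writer():
    time.sleep(0.1)                # arrives inside the reader's processing gap
    with lock:
        balance["X"] = 200         # modify and "commit"

def reader():
    with lock:
        first = balance["X"]       # first read: 100
    time.sleep(0.5)                # long computation widens the vulnerable window
    with lock:
        second = balance["X"]      # second read: 200
    print(f"first={first}, second={second}, repeatable={first == second}")

t_r = threading.Thread(target=reader)
t_w = threading.Thread(target=writer)
t_r.start(); t_w.start()
t_r.join(); t_w.join()
```

Even though every individual access is atomic, the reader observes two different values: atomicity of single operations does not provide transaction-level isolation.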
Unrepeatable reads are one of three read-related anomalies in concurrent database systems. Understanding how they differ is crucial for selecting appropriate isolation levels and designing robust transactions.
The Three Read Anomalies:
| Anomaly | What Changes | Data Source | Scope |
|---|---|---|---|
| Dirty Read | Value of existing row | Uncommitted transaction | Single row |
| Unrepeatable Read | Value of existing row | Committed transaction | Single row |
| Phantom Read | Set of rows matching query | Committed transaction | Query result set |
Phantom Read vs. Unrepeatable Read:
The distinction between unrepeatable reads and phantom reads is subtle but important:
| Aspect | Unrepeatable Read | Phantom Read |
|---|---|---|
| What's affected | A specific row that was previously read | The set of rows matching a query condition |
| Operation type | UPDATE or DELETE to an existing row | INSERT or DELETE affecting query results |
| Example | Row with ID=5 changes from balance=100 to balance=200 | Query for balance>1000 returns 3 rows, then 5 rows |
| Lock required | Row-level lock on specific rows | Predicate lock or range lock on query condition |
Why the distinction matters:
Preventing unrepeatable reads requires locking specific rows that have been read. Preventing phantom reads requires locking potential rows that might be read—rows that don't yet exist or that match a condition. This makes phantom prevention significantly more complex and expensive.
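A conceptual Python sketch of the difference, assuming toy row and lock representations rather than a real lock manager:

```python
# Row-level locking protects rows already read; a predicate lock must
# also cover rows that do not exist yet. Purely illustrative.
rows = {5: {"id": 5, "balance": 1500}}

# Row-level locking: lock exactly the rows the query returned.
row_locks = {5}                       # shields row id=5 from UPDATE/DELETE

# Predicate locking: lock the *condition* the query used.
def predicate(row):
    """The query condition itself: balance > 1000."""
    return row["balance"] > 1000

new_row = {"id": 9, "balance": 2000}  # a later INSERT by another transaction
print(9 in row_locks)                 # False -> row locks miss the phantom
print(predicate(new_row))             # True  -> a predicate lock would conflict
```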
Think of it this way: Dirty reads see tentative futures (uncommitted changes). Unrepeatable reads see the changing present (committed changes between reads). Phantom reads see shifting populations (new or removed rows matching conditions). Unrepeatable reads are about the value of identified rows changing; phantoms are about which rows exist.
The ACID properties define the guarantees that database transactions should provide. Unrepeatable reads directly violate the Isolation property, with indirect implications for Consistency.
The Isolation Violation:
The 'I' in ACID guarantees that concurrent transactions execute as if they were serialized—running one after another with no overlap. In a truly isolated system, transaction T₁ should not be affected by T₂'s concurrent modifications.
Unrepeatable reads violate this because T₁ observes the database both before and after T₂'s commit within a single transaction, so its view corresponds to no single serialized order—as the following test makes precise.
The Serial Equivalence Test:
Consider what happens if T₁ and T₂ ran serially:
```
Serial Order T₁ → T₂:
  T₁: BEGIN, R(X)=100, R(X)=100, COMMIT
  T₂: BEGIN, W(X)=200, COMMIT
  Result: T₁ sees X=100 both times (consistent)

Serial Order T₂ → T₁:
  T₂: BEGIN, W(X)=200, COMMIT
  T₁: BEGIN, R(X)=200, R(X)=200, COMMIT
  Result: T₁ sees X=200 both times (consistent)

Interleaved (unrepeatable read):
  T₁ sees X=100 first, then X=200
  This matches neither serial order!
```
The concurrent execution produces a result impossible under serial execution—the definition of an isolation violation.
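The test can be applied mechanically. A minimal sketch for this two-transaction example, with the values taken from the timeline above:

```python
# Serial equivalence test: compute what T1 would observe under each
# serial order, then check whether the interleaved observation matches.
initial_x = 100
t2_written_x = 200

serial_t1_first = (initial_x, initial_x)        # T1 -> T2: both reads see 100
serial_t2_first = (t2_written_x, t2_written_x)  # T2 -> T1: both reads see 200
interleaved     = (initial_x, t2_written_x)     # the unrepeatable-read schedule

serializable = interleaved in {serial_t1_first, serial_t2_first}
print(f"{interleaved} matches a serial order: {serializable}")
# -> False: the schedule is not equivalent to any serial execution
```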
Database systems provide multiple isolation levels precisely because full isolation (SERIALIZABLE) reduces concurrency. Levels like READ COMMITTED intentionally allow unrepeatable reads to improve performance. The key is understanding what you're trading away and whether your application can tolerate the anomaly.
The unrepeatable read problem violates what database theorists call read stability—the expectation that data read during a transaction remains stable for the transaction's duration.
What Read Stability Means:
A schedule provides read stability for transaction T if every data item X that T reads retains its value throughout T's execution.
This is a natural expectation for most programmers. If you read a variable's value in a program, you expect it to have that value until you explicitly change it. Databases that allow unrepeatable reads break this intuition.
Why Read Stability Matters:
Transactions routinely act on values they read earlier in their execution. If those values can change mid-transaction, a check performed against the first read may authorize an action that is invalid against the current state, as the following example shows.
Example: The Broken Transfer:
```sql
-- Transaction T₁: verify and transfer
BEGIN;
SELECT balance INTO @balance FROM accounts WHERE id = 1;
-- @balance = 1000

-- Meanwhile T₂ runs: UPDATE accounts SET balance = 500 WHERE id = 1; COMMIT;
-- T₁ continues, unaware of the change

IF @balance >= 500 THEN
    -- The check passes, but the actual balance is now 500, not 1000
    SELECT balance INTO @current_balance FROM accounts WHERE id = 1;
    -- @current_balance = 500 (different!)
    UPDATE accounts SET balance = balance - 500 WHERE id = 1;
    -- Balance falls to 0 here, and would go negative had T₂ withdrawn more
END IF;
COMMIT;
```
The transaction made a decision based on one value (1000) but executed based on another (500). Read stability would have prevented this inconsistency.
A common mistake is caching query results in application variables and assuming they remain valid. Without read stability at the database level, these cached values become dangerous lies. Either re-read with proper isolation, use explicit locking, or design logic that doesn't depend on value stability.
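As one example of the third option, the precondition can be folded into the write itself so it is evaluated atomically against the current committed value. A minimal sqlite3 sketch, with an illustrative `accounts` table:

```python
import sqlite3

# Instead of read-then-decide-then-write, push the guard into the UPDATE
# so it is checked against the balance as it exists at write time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 1000)")

def withdraw(conn, account_id, amount):
    cur = conn.execute(
        "UPDATE accounts SET balance = balance - ? "
        "WHERE id = ? AND balance >= ?",     # precondition travels with the write
        (amount, account_id, amount),
    )
    conn.commit()
    return cur.rowcount == 1                 # 0 rows touched -> guard failed

print(withdraw(conn, 1, 500))   # True: balance 1000 -> 500
print(withdraw(conn, 1, 800))   # False: guard blocks the overdraft
```

Because the guard is re-evaluated inside the UPDATE, a stale earlier read can no longer authorize an overdraft.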
To rigorously analyze and detect unrepeatable reads, we need formal notation for representing transaction schedules. This notation enables precise reasoning about when anomalies occur.
Schedule Notation:
We write Rᵢ(X) for transaction Tᵢ reading data item X, Wᵢ(X) for Tᵢ writing X, and Cᵢ for Tᵢ's commit.
Formal Pattern for Unrepeatable Read:
A schedule contains an unrepeatable read if there exist transactions Tᵢ and Tⱼ such that:
Rᵢ(X) < Wⱼ(X) < Cⱼ < Rᵢ(X)
Where < denotes "precedes in the schedule." In words: Tᵢ reads X; another transaction Tⱼ then writes X and commits; and Tᵢ subsequently reads X again, observing Tⱼ's committed value.
```python
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass, field
from enum import Enum


class OpType(Enum):
    READ = "R"
    WRITE = "W"
    COMMIT = "C"
    ABORT = "A"


@dataclass
class Operation:
    """Represents a single operation in a schedule."""
    transaction: str
    op_type: OpType
    data_item: Optional[str] = None
    timestamp: int = 0

    def __repr__(self):
        if self.data_item:
            return f"{self.op_type.value}_{self.transaction}({self.data_item})"
        return f"{self.op_type.value}_{self.transaction}"


@dataclass
class UnrepeatableReadDetector:
    """
    Detects unrepeatable read anomalies in transaction schedules.

    An unrepeatable read occurs when:
    1. Transaction Ti reads data item X
    2. Transaction Tj writes to X
    3. Tj commits
    4. Ti reads X again (seeing Tj's value)

    Key distinction from dirty read: Tj must commit before Ti's second read.
    """
    operations: List[Operation] = field(default_factory=list)
    # Track reads per transaction per item: {txn: {item: [timestamps]}}
    reads: Dict[str, Dict[str, List[int]]] = field(default_factory=dict)
    # Track writes: {item: [(txn, timestamp)]}
    writes: Dict[str, List[Tuple[str, int]]] = field(default_factory=dict)
    # Track commits: {txn: timestamp}
    commits: Dict[str, int] = field(default_factory=dict)
    timestamp: int = 0

    def execute(self, operation: Operation) -> None:
        """Execute an operation and track it."""
        operation.timestamp = self.timestamp
        self.operations.append(operation)
        self.timestamp += 1

        if operation.op_type == OpType.READ:
            txn = operation.transaction
            item = operation.data_item
            if txn not in self.reads:
                self.reads[txn] = {}
            if item not in self.reads[txn]:
                self.reads[txn][item] = []
            self.reads[txn][item].append(operation.timestamp)
        elif operation.op_type == OpType.WRITE:
            item = operation.data_item
            if item not in self.writes:
                self.writes[item] = []
            self.writes[item].append((operation.transaction, operation.timestamp))
        elif operation.op_type == OpType.COMMIT:
            self.commits[operation.transaction] = operation.timestamp
        print(f"  t={operation.timestamp}: {operation}")

    def detect_unrepeatable_reads(self) -> List[Dict]:
        """
        Detect all unrepeatable read anomalies in the schedule.

        Returns a list of anomalies with details about the transactions
        and data items involved.
        """
        anomalies = []
        for txn_i, items_read in self.reads.items():
            for item, read_times in items_read.items():
                if len(read_times) < 2:
                    continue  # Need at least 2 reads for unrepeatable read
                first_read = read_times[0]
                for subsequent_read in read_times[1:]:
                    # Check for an intervening committed write by another transaction
                    if item not in self.writes:
                        continue
                    for writer_txn, write_time in self.writes[item]:
                        if writer_txn == txn_i:
                            continue  # Same transaction, not an anomaly
                        # Check pattern: R1(X) < W2(X) < C2 < R1(X)
                        if writer_txn not in self.commits:
                            continue  # Writer didn't commit
                        commit_time = self.commits[writer_txn]
                        if first_read < write_time < commit_time < subsequent_read:
                            anomalies.append({
                                "reader": txn_i,
                                "writer": writer_txn,
                                "item": item,
                                "first_read_time": first_read,
                                "write_time": write_time,
                                "commit_time": commit_time,
                                "second_read_time": subsequent_read,
                            })
        return anomalies


def demonstrate_detection():
    """Demonstrate unrepeatable read detection with examples."""
    print("=" * 60)
    print("Example 1: UNREPEATABLE READ PRESENT")
    print("=" * 60)
    detector1 = UnrepeatableReadDetector()
    # Schedule: R1(X), W2(X), C2, R1(X), C1
    # T1 reads X, T2 writes & commits X, T1 reads X again
    detector1.execute(Operation("T1", OpType.READ, "X"))
    detector1.execute(Operation("T2", OpType.WRITE, "X"))
    detector1.execute(Operation("T2", OpType.COMMIT))
    detector1.execute(Operation("T1", OpType.READ, "X"))
    detector1.execute(Operation("T1", OpType.COMMIT))
    anomalies1 = detector1.detect_unrepeatable_reads()
    print(f"\nAnomalies detected: {len(anomalies1)}")
    for a in anomalies1:
        print(f"  - {a['reader']} experienced unrepeatable read of {a['item']}")
        print(f"    First read at t={a['first_read_time']}, {a['writer']} wrote at "
              f"t={a['write_time']}, committed at t={a['commit_time']}, "
              f"second read at t={a['second_read_time']}")

    print("\n" + "=" * 60)
    print("Example 2: NO UNREPEATABLE READ (Writer doesn't commit in time)")
    print("=" * 60)
    detector2 = UnrepeatableReadDetector()
    # Schedule: R1(X), W2(X), R1(X), C1, C2
    # T2 commits AFTER T1's second read
    detector2.execute(Operation("T1", OpType.READ, "X"))
    detector2.execute(Operation("T2", OpType.WRITE, "X"))
    detector2.execute(Operation("T1", OpType.READ, "X"))
    detector2.execute(Operation("T1", OpType.COMMIT))
    detector2.execute(Operation("T2", OpType.COMMIT))
    anomalies2 = detector2.detect_unrepeatable_reads()
    print(f"\nAnomalies detected: {len(anomalies2)}")
    if not anomalies2:
        print("  No unrepeatable reads (T2 committed after T1's second read)")

    print("\n" + "=" * 60)
    print("Example 3: NO UNREPEATABLE READ (Only one read)")
    print("=" * 60)
    detector3 = UnrepeatableReadDetector()
    # Schedule: R1(X), W2(X), C2, C1
    # T1 only reads X once
    detector3.execute(Operation("T1", OpType.READ, "X"))
    detector3.execute(Operation("T2", OpType.WRITE, "X"))
    detector3.execute(Operation("T2", OpType.COMMIT))
    detector3.execute(Operation("T1", OpType.COMMIT))
    anomalies3 = detector3.detect_unrepeatable_reads()
    print(f"\nAnomalies detected: {len(anomalies3)}")
    if not anomalies3:
        print("  No unrepeatable reads (T1 only read X once)")


if __name__ == "__main__":
    demonstrate_detection()
```

Detection algorithms like the one shown are primarily for analysis and testing. In production systems, prevention (through appropriate isolation levels or locking) is preferred over detection, because detecting an unrepeatable read after it occurs may be too late—the transaction may have already made incorrect decisions based on inconsistent data.
The three read anomalies form a hierarchy based on their severity and the isolation level required to prevent them. Understanding this hierarchy is essential for choosing appropriate database configurations.
The Hierarchy (from most to least severe):
Dirty Read → Unrepeatable Read → Phantom Read
This means the standard isolation levels eliminate anomalies cumulatively: each successively stronger level prevents everything the weaker levels prevent, plus one additional anomaly, as the following table shows.
| Isolation Level | Dirty Read | Unrepeatable Read | Phantom Read |
|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible |
| READ COMMITTED | Prevented | Possible | Possible |
| REPEATABLE READ | Prevented | Prevented | Possible* |
| SERIALIZABLE | Prevented | Prevented | Prevented |
Note on REPEATABLE READ and Phantoms:
The SQL standard states that REPEATABLE READ may allow phantom reads. However, implementations vary:
- PostgreSQL implements REPEATABLE READ as snapshot isolation, so ordinary reads do not observe phantoms.
- MySQL's InnoDB likewise avoids phantoms for consistent (non-locking) reads by serving them from an MVCC snapshot.
- Engines that implement only the standard's minimum guarantee do allow phantoms at this level.
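The guarantees in the table above can be encoded directly. This sketch records only the SQL standard's minimum guarantees; as just noted, real engines are often stricter.

```python
# Anomalies the SQL standard permits at each isolation level.
ANOMALIES_POSSIBLE = {
    "READ UNCOMMITTED": {"dirty_read", "unrepeatable_read", "phantom_read"},
    "READ COMMITTED":   {"unrepeatable_read", "phantom_read"},
    "REPEATABLE READ":  {"phantom_read"},
    "SERIALIZABLE":     set(),
}

def weakest_level_preventing(anomaly: str) -> str:
    """Return the weakest standard level at which `anomaly` cannot occur."""
    for level in ["READ UNCOMMITTED", "READ COMMITTED",
                  "REPEATABLE READ", "SERIALIZABLE"]:
        if anomaly not in ANOMALIES_POSSIBLE[level]:
            return level
    raise ValueError(f"unknown anomaly: {anomaly}")

print(weakest_level_preventing("unrepeatable_read"))  # REPEATABLE READ
```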
Why REPEATABLE READ Prevents Unrepeatable Reads:
The REPEATABLE READ isolation level ensures that any row read during a transaction will return the same value if read again within that same transaction. This is typically implemented through:
- Shared read locks held until the transaction ends (two-phase locking), so no other transaction can modify a row this transaction has read; or
- Multiversion concurrency control (MVCC), where the transaction reads from a fixed snapshot taken when it began, as sketched below.
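A highly simplified sketch of the MVCC route, with illustrative version lists and timestamps: each item keeps its committed versions, and a transaction reads the newest version committed no later than the snapshot it took at BEGIN.

```python
# Simplified MVCC: each data item keeps committed versions (commit_ts, value).
# A REPEATABLE READ transaction reads as of its snapshot timestamp, so
# commits that happen after BEGIN are invisible to it.
versions = {"X": [(0, 100)]}        # X = 100 committed at ts 0

def read(item, snapshot_ts):
    """Newest version committed at or before the transaction's snapshot."""
    visible = [(ts, v) for ts, v in versions[item] if ts <= snapshot_ts]
    return max(visible)[1]

t1_snapshot = 1                     # T1 begins, takes snapshot at ts=1
first = read("X", t1_snapshot)      # 100

versions["X"].append((2, 200))      # T2 writes X=200 and commits at ts=2

second = read("X", t1_snapshot)     # still 100: the ts=2 commit is invisible
print(first, second)                # 100 100 -> the read is repeatable
```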
The SQL standard defines minimum guarantees, but implementations often provide stronger guarantees. Always consult your specific database's documentation to understand exactly which anomalies are prevented at each isolation level.
We've established a rigorous understanding of what unrepeatable reads are and how they fit into the landscape of concurrency anomalies. Let's consolidate the key concepts:
- An unrepeatable read follows the pattern Rᵢ(X) < Wⱼ(X) < Cⱼ < Rᵢ(X): a transaction reads the same item twice, and another transaction's committed write lands in between.
- Unlike dirty reads, only committed data is involved; unlike phantom reads, the anomaly concerns the value of an existing row rather than which rows match a query.
- The anomaly violates isolation because the interleaved execution matches no serial order, and it breaks read stability, on which transaction logic often implicitly depends.
- REPEATABLE READ and SERIALIZABLE prevent unrepeatable reads; READ UNCOMMITTED and READ COMMITTED allow them as a deliberate performance trade-off.
What's next:
Now that we understand the formal definition of unrepeatable reads, we'll explore the broader concept of inconsistent reads—how unrepeatable reads can combine with other data access patterns to create complex inconsistency scenarios that affect transaction correctness and application logic.
You now understand the precise definition of unrepeatable reads, can distinguish them from dirty reads and phantom reads, and know which isolation levels prevent them. Next, we'll examine how inconsistent reads manifest in more complex scenarios and why they create challenges for application correctness.