Imagine you're a bank customer checking your account balance online. The system shows $5,000. You decide to withdraw $3,000 at an ATM across the street. When you arrive and insert your card, the ATM checks your balance—but now it shows only $2,000. Confused, you return to the website, and your still-open session shows $5,000.
What happened?
While you were walking to the ATM, an automatic payment of $3,000 processed against your account. The ATM saw the updated balance, but your web session—still running the same transaction—continued to show the old value. Within the same logical transaction, you observed two different values for the same data item.
This is the unrepeatable read problem: a concurrency anomaly where a transaction reads the same data item twice and gets different values because another transaction modified and committed the data between the reads. It's not a bug in your application—it's a fundamental challenge in concurrent database access.
By the end of this page, you will understand the formal definition of unrepeatable reads, the precise timing conditions that cause them, how they differ from dirty reads and phantom reads, and why they violate the isolation property of ACID transactions.
An unrepeatable read (also called a non-repeatable read or fuzzy read) occurs when a transaction reads the same data item multiple times and observes different values due to modifications by other committed transactions between the reads.
Formal Definition:
An unrepeatable read occurs when transaction T₁ reads a data item X, then transaction T₂ modifies and commits X, and subsequently T₁ reads X again—observing T₂'s committed value instead of the value it originally read.
Symbolically, the problematic sequence is:
```
T₁: R(X) → value v₁
T₂: W(X) → value v₂
T₂: COMMIT
T₁: R(X) → value v₂   (different from v₁!)
T₁: COMMIT
```
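The same sequence can be replayed in a few lines of Python. This is a toy sketch only: a plain dict stands in for committed state, and names like `db` and `pending` are illustrative, not any real API.

```python
# Toy replay of the schedule: `db` is the committed state, `pending`
# is T2's uncommitted write buffer. All names are illustrative.
db = {"X": "v1"}
pending = {}

v_first = db["X"]        # T1: R(X) -> v1

pending["X"] = "v2"      # T2: W(X) -> v2 (buffered, uncommitted)
db.update(pending)       # T2: COMMIT (the write becomes the committed value)
pending.clear()

v_second = db["X"]       # T1: R(X) -> v2, different from v_first
assert v_first != v_second
print(f"T1 read {v_first!r} first, then {v_second!r}: unrepeatable")
```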
The key characteristic that defines an unrepeatable read is that the modifying transaction has committed. If T₂ modifies X but hasn't committed, and T₁ reads the modified value, that's a dirty read—a different (and generally more severe) problem. Unrepeatable reads involve only committed data, which makes them paradoxically harder to detect: the data is technically 'valid' at each read.
Understanding when unrepeatable reads can occur requires examining the precise timing relationships between transactions. The vulnerability window exists between a transaction's first and subsequent reads of the same data item.
The Vulnerable Interval:
When transaction T₁ reads data item X at time t₁, and plans to read X again at time t₂, the interval [t₁, t₂] represents the window during which another transaction could modify X and cause an unrepeatable read.
| Time | T₁ (Reader) | T₂ (Writer) | X Value | State |
|---|---|---|---|---|
| t₀ | BEGIN | | 100 | X = 100 (initial) |
| t₁ | R(X) → 100 | | 100 | T₁ sees 100 |
| t₂ | ...processing... | BEGIN | 100 | T₂ starts |
| t₃ | ...processing... | W(X) ← 200 | 100→200 | T₂ writes 200 (uncommitted) |
| t₄ | ...processing... | COMMIT | 200 | T₂ commits, X = 200 permanent |
| t₅ | R(X) → 200 ⚠️ | | 200 | T₁ sees 200 (different!) |
| t₆ | COMMIT | | 200 | T₁ commits with inconsistent view |
Critical Observations:
At t₁: T₁ reads X = 100. This is its "snapshot" of reality.
Between t₁ and t₅: T₂ successfully modifies and commits X = 200. The database is now in a new consistent state.
At t₅: T₁ reads X again and sees 200. From T₁'s perspective, the same data item magically changed value.
The paradox: Both values (100 and 200) were valid committed values at the time they were read. T₁ isn't reading "wrong" data—it's reading data from two different points in time.
The fundamental issue is that T₁ doesn't operate on a consistent snapshot of the database. It sees the database at time t₁ for its first read and at time t₅ for its second read.
The probability of an unrepeatable read increases with the time gap between reads. Long-running transactions that read the same data multiple times are especially vulnerable. A transaction that reads a value, performs complex calculations for several seconds, then reads again has ample opportunity for intervening commits by other transactions.
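A small threading sketch makes this concrete. The sleeps and values are illustrative, and note that the lock serializes individual operations only, not whole transactions—which is exactly why the anomaly still appears.

```python
import threading
import time

# A shared value plays the committed data item; a writer thread
# "commits" a change while the reader is busy between its two reads.
balance = {"X": 100}
lock = threading.Lock()  # protects single reads/writes, not transactions

def writer():
    time.sleep(0.1)                # arrives inside the reader's processing gap
    with lock:
        balance["X"] = 200         # modify and "commit"

def reader():
    with lock:
        first = balance["X"]       # first read: 100
    time.sleep(0.5)                # long computation widens the vulnerable window
    with lock:
        second = balance["X"]      # second read: 200
    print(f"first={first}, second={second}, repeatable={first == second}")

t_r = threading.Thread(target=reader)
t_w = threading.Thread(target=writer)
t_r.start(); t_w.start()
t_r.join(); t_w.join()
```

Even though every individual access is atomic, the reader observes two different values: atomicity of single operations does not provide transaction-level isolation.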
Unrepeatable reads are one of three read-related anomalies in concurrent database systems. Understanding how they differ is crucial for selecting appropriate isolation levels and designing robust transactions.
The Three Read Anomalies:
| Anomaly | What Changes | Data Source | Scope |
|---|---|---|---|
| Dirty Read | Value of existing row | Uncommitted transaction | Single row |
| Unrepeatable Read | Value of existing row | Committed transaction | Single row |
| Phantom Read | Set of rows matching query | Committed transaction | Query result set |
Phantom Read vs. Unrepeatable Read:
The distinction between unrepeatable reads and phantom reads is subtle but important:
| Aspect | Unrepeatable Read | Phantom Read |
|---|---|---|
| What's affected | A specific row that was previously read | The set of rows matching a query condition |
| Operation type | UPDATE or DELETE to an existing row | INSERT or DELETE affecting query results |
| Example | Row with ID=5 changes from balance=100 to balance=200 | Query for balance>1000 returns 3 rows, then 5 rows |
| Lock required | Row-level lock on specific rows | Predicate lock or range lock on query condition |
Why the distinction matters:
Preventing unrepeatable reads requires locking specific rows that have been read. Preventing phantom reads requires locking potential rows that might be read—rows that don't yet exist or that match a condition. This makes phantom prevention significantly more complex and expensive.
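A conceptual Python sketch of the difference, assuming toy row and lock representations rather than a real lock manager:

```python
# Row-level locking protects rows already read; a predicate lock must
# also cover rows that do not exist yet. Purely illustrative.
rows = {5: {"id": 5, "balance": 1500}}

# Row-level locking: lock exactly the rows the query returned.
row_locks = {5}                       # shields row id=5 from UPDATE/DELETE

# Predicate locking: lock the *condition* the query used.
def predicate(row):
    """The query condition itself: balance > 1000."""
    return row["balance"] > 1000

new_row = {"id": 9, "balance": 2000}  # a later INSERT by another transaction
print(9 in row_locks)                 # False -> row locks miss the phantom
print(predicate(new_row))             # True  -> a predicate lock would conflict
```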
Think of it this way: Dirty reads see tentative futures (uncommitted changes). Unrepeatable reads see the changing present (committed changes between reads). Phantom reads see shifting populations (new or removed rows matching conditions). Unrepeatable reads are about the value of identified rows changing; phantoms are about which rows exist.
The ACID properties define the guarantees that database transactions should provide. Unrepeatable reads directly violate the Isolation property, with indirect implications for Consistency.
The Isolation Violation:
The 'I' in ACID guarantees that concurrent transactions execute as if they were serialized—running one after another with no overlap. In a truly isolated system, transaction T₁ should not be affected by T₂'s concurrent modifications.
Unrepeatable reads violate this because T₁ observes the database both before and after T₂'s commit within a single transaction, so its view corresponds to no single serialized order—as the following test makes precise.
The Serial Equivalence Test:
Consider what happens if T₁ and T₂ ran serially:
```
Serial Order T₁ → T₂:
  T₁: BEGIN, R(X)=100, R(X)=100, COMMIT
  T₂: BEGIN, W(X)=200, COMMIT
  Result: T₁ sees X=100 both times (consistent)

Serial Order T₂ → T₁:
  T₂: BEGIN, W(X)=200, COMMIT
  T₁: BEGIN, R(X)=200, R(X)=200, COMMIT
  Result: T₁ sees X=200 both times (consistent)

Interleaved (unrepeatable read):
  T₁ sees X=100 first, then X=200
  This matches neither serial order!
```
The concurrent execution produces a result impossible under serial execution—the definition of an isolation violation.
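The test can be applied mechanically. A minimal sketch for this two-transaction example, with the values taken from the timeline above:

```python
# Serial equivalence test: compute what T1 would observe under each
# serial order, then check whether the interleaved observation matches.
initial_x = 100
t2_written_x = 200

serial_t1_first = (initial_x, initial_x)        # T1 -> T2: both reads see 100
serial_t2_first = (t2_written_x, t2_written_x)  # T2 -> T1: both reads see 200
interleaved     = (initial_x, t2_written_x)     # the unrepeatable-read schedule

serializable = interleaved in {serial_t1_first, serial_t2_first}
print(f"{interleaved} matches a serial order: {serializable}")
# -> False: the schedule is not equivalent to any serial execution
```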
Database systems provide multiple isolation levels precisely because full isolation (SERIALIZABLE) reduces concurrency. Levels like READ COMMITTED intentionally allow unrepeatable reads to improve performance. The key is understanding what you're trading away and whether your application can tolerate the anomaly.
The unrepeatable read problem violates what database theorists call read stability—the expectation that data read during a transaction remains stable for the transaction's duration.
What Read Stability Means:
A schedule provides read stability for transaction T if every data item X that T reads retains its value throughout T's execution.
This is a natural expectation for most programmers. If you read a variable's value in a program, you expect it to have that value until you explicitly change it. Databases that allow unrepeatable reads break this intuition.
Why Read Stability Matters:
Transactions routinely act on values they read earlier in their execution. If those values can change mid-transaction, a check performed against the first read may authorize an action that is invalid against the current state, as the following example shows.
Example: The Broken Transfer:
```sql
-- Transaction T₁: verify and transfer
BEGIN;
SELECT balance INTO @balance FROM accounts WHERE id = 1;
-- @balance = 1000

-- Meanwhile T₂ runs: UPDATE accounts SET balance = 500 WHERE id = 1; COMMIT;
-- T₁ continues, unaware of the change

IF @balance >= 500 THEN
    -- The check passes, but the actual balance is now 500, not 1000
    SELECT balance INTO @current_balance FROM accounts WHERE id = 1;
    -- @current_balance = 500 (different!)
    UPDATE accounts SET balance = balance - 500 WHERE id = 1;
    -- Balance falls to 0 here, and would go negative had T₂ withdrawn more
END IF;
COMMIT;
```
The transaction made a decision based on one value (1000) but executed based on another (500). Read stability would have prevented this inconsistency.
A common mistake is caching query results in application variables and assuming they remain valid. Without read stability at the database level, these cached values become dangerous lies. Either re-read with proper isolation, use explicit locking, or design logic that doesn't depend on value stability.
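As one example of the third option, the precondition can be folded into the write itself so it is evaluated atomically against the current committed value. A minimal sqlite3 sketch, with an illustrative `accounts` table:

```python
import sqlite3

# Instead of read-then-decide-then-write, push the guard into the UPDATE
# so it is checked against the balance as it exists at write time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 1000)")

def withdraw(conn, account_id, amount):
    cur = conn.execute(
        "UPDATE accounts SET balance = balance - ? "
        "WHERE id = ? AND balance >= ?",     # precondition travels with the write
        (amount, account_id, amount),
    )
    conn.commit()
    return cur.rowcount == 1                 # 0 rows touched -> guard failed

print(withdraw(conn, 1, 500))   # True: balance 1000 -> 500
print(withdraw(conn, 1, 800))   # False: guard blocks the overdraft
```

Because the guard is re-evaluated inside the UPDATE, a stale earlier read can no longer authorize an overdraft.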
To rigorously analyze and detect unrepeatable reads, we need formal notation for representing transaction schedules. This notation enables precise reasoning about when anomalies occur.
Schedule Notation:
We write Rᵢ(X) for transaction Tᵢ reading data item X, Wᵢ(X) for Tᵢ writing X, and Cᵢ for Tᵢ's commit.
Formal Pattern for Unrepeatable Read:
A schedule contains an unrepeatable read if there exist transactions Tᵢ and Tⱼ such that:
Rᵢ(X) < Wⱼ(X) < Cⱼ < Rᵢ(X)
Where < denotes "precedes in the schedule." In words: Tᵢ reads X; another transaction Tⱼ then writes X and commits; and Tᵢ subsequently reads X again, observing Tⱼ's committed value.
```python
from typing import Dict, List, Tuple, Optional
from dataclasses import dataclass, field
from enum import Enum


class OpType(Enum):
    READ = "R"
    WRITE = "W"
    COMMIT = "C"
    ABORT = "A"


@dataclass
class Operation:
    """Represents a single operation in a schedule."""
    transaction: str
    op_type: OpType
    data_item: Optional[str] = None
    timestamp: int = 0

    def __repr__(self):
        if self.data_item:
            return f"{self.op_type.value}_{self.transaction}({self.data_item})"
        return f"{self.op_type.value}_{self.transaction}"


@dataclass
class UnrepeatableReadDetector:
    """
    Detects unrepeatable read anomalies in transaction schedules.

    An unrepeatable read occurs when:
    1. Transaction Ti reads data item X
    2. Transaction Tj writes to X
    3. Tj commits
    4. Ti reads X again (seeing Tj's value)

    Key distinction from dirty read: Tj must commit before Ti's second read.
    """
    operations: List[Operation] = field(default_factory=list)
    # Track reads per transaction per item: {txn: {item: [timestamps]}}
    reads: Dict[str, Dict[str, List[int]]] = field(default_factory=dict)
    # Track writes: {item: [(txn, timestamp)]}
    writes: Dict[str, List[Tuple[str, int]]] = field(default_factory=dict)
    # Track commits: {txn: timestamp}
    commits: Dict[str, int] = field(default_factory=dict)
    timestamp: int = 0

    def execute(self, operation: Operation) -> None:
        """Execute an operation and track it."""
        operation.timestamp = self.timestamp
        self.operations.append(operation)
        self.timestamp += 1

        if operation.op_type == OpType.READ:
            txn = operation.transaction
            item = operation.data_item
            if txn not in self.reads:
                self.reads[txn] = {}
            if item not in self.reads[txn]:
                self.reads[txn][item] = []
            self.reads[txn][item].append(operation.timestamp)
        elif operation.op_type == OpType.WRITE:
            item = operation.data_item
            if item not in self.writes:
                self.writes[item] = []
            self.writes[item].append((operation.transaction, operation.timestamp))
        elif operation.op_type == OpType.COMMIT:
            self.commits[operation.transaction] = operation.timestamp
        print(f"  t={operation.timestamp}: {operation}")

    def detect_unrepeatable_reads(self) -> List[Dict]:
        """
        Detect all unrepeatable read anomalies in the schedule.

        Returns a list of anomalies with details about the transactions
        and data items involved.
        """
        anomalies = []
        for txn_i, items_read in self.reads.items():
            for item, read_times in items_read.items():
                if len(read_times) < 2:
                    continue  # Need at least 2 reads for unrepeatable read
                first_read = read_times[0]
                for subsequent_read in read_times[1:]:
                    # Check for an intervening committed write by another transaction
                    if item not in self.writes:
                        continue
                    for writer_txn, write_time in self.writes[item]:
                        if writer_txn == txn_i:
                            continue  # Same transaction, not an anomaly
                        # Check pattern: R1(X) < W2(X) < C2 < R1(X)
                        if writer_txn not in self.commits:
                            continue  # Writer didn't commit
                        commit_time = self.commits[writer_txn]
                        if first_read < write_time < commit_time < subsequent_read:
                            anomalies.append({
                                "reader": txn_i,
                                "writer": writer_txn,
                                "item": item,
                                "first_read_time": first_read,
                                "write_time": write_time,
                                "commit_time": commit_time,
                                "second_read_time": subsequent_read,
                            })
        return anomalies


def demonstrate_detection():
    """Demonstrate unrepeatable read detection with examples."""
    print("=" * 60)
    print("Example 1: UNREPEATABLE READ PRESENT")
    print("=" * 60)
    detector1 = UnrepeatableReadDetector()
    # Schedule: R1(X), W2(X), C2, R1(X), C1
    # T1 reads X, T2 writes & commits X, T1 reads X again
    detector1.execute(Operation("T1", OpType.READ, "X"))
    detector1.execute(Operation("T2", OpType.WRITE, "X"))
    detector1.execute(Operation("T2", OpType.COMMIT))
    detector1.execute(Operation("T1", OpType.READ, "X"))
    detector1.execute(Operation("T1", OpType.COMMIT))
    anomalies1 = detector1.detect_unrepeatable_reads()
    print(f"\nAnomalies detected: {len(anomalies1)}")
    for a in anomalies1:
        print(f"  - {a['reader']} experienced unrepeatable read of {a['item']}")
        print(f"    First read at t={a['first_read_time']}, {a['writer']} wrote at "
              f"t={a['write_time']}, committed at t={a['commit_time']}, "
              f"second read at t={a['second_read_time']}")

    print("\n" + "=" * 60)
    print("Example 2: NO UNREPEATABLE READ (Writer doesn't commit in time)")
    print("=" * 60)
    detector2 = UnrepeatableReadDetector()
    # Schedule: R1(X), W2(X), R1(X), C1, C2
    # T2 commits AFTER T1's second read
    detector2.execute(Operation("T1", OpType.READ, "X"))
    detector2.execute(Operation("T2", OpType.WRITE, "X"))
    detector2.execute(Operation("T1", OpType.READ, "X"))
    detector2.execute(Operation("T1", OpType.COMMIT))
    detector2.execute(Operation("T2", OpType.COMMIT))
    anomalies2 = detector2.detect_unrepeatable_reads()
    print(f"\nAnomalies detected: {len(anomalies2)}")
    if not anomalies2:
        print("  No unrepeatable reads (T2 committed after T1's second read)")

    print("\n" + "=" * 60)
    print("Example 3: NO UNREPEATABLE READ (Only one read)")
    print("=" * 60)
    detector3 = UnrepeatableReadDetector()
    # Schedule: R1(X), W2(X), C2, C1
    # T1 only reads X once
    detector3.execute(Operation("T1", OpType.READ, "X"))
    detector3.execute(Operation("T2", OpType.WRITE, "X"))
    detector3.execute(Operation("T2", OpType.COMMIT))
    detector3.execute(Operation("T1", OpType.COMMIT))
    anomalies3 = detector3.detect_unrepeatable_reads()
    print(f"\nAnomalies detected: {len(anomalies3)}")
    if not anomalies3:
        print("  No unrepeatable reads (T1 only read X once)")


if __name__ == "__main__":
    demonstrate_detection()
```

Detection algorithms like the one shown are primarily for analysis and testing. In production systems, prevention (through appropriate isolation levels or locking) is preferred over detection, because detecting an unrepeatable read after it occurs may be too late—the transaction may have already made incorrect decisions based on inconsistent data.
The three read anomalies form a hierarchy based on their severity and the isolation level required to prevent them. Understanding this hierarchy is essential for choosing appropriate database configurations.
The Hierarchy (from most to least severe):
Dirty Read → Unrepeatable Read → Phantom Read
This means the standard isolation levels eliminate anomalies cumulatively: each successively stronger level prevents everything the weaker levels prevent, plus one additional anomaly, as the following table shows.
| Isolation Level | Dirty Read | Unrepeatable Read | Phantom Read |
|---|---|---|---|
| READ UNCOMMITTED | Possible | Possible | Possible |
| READ COMMITTED | Prevented | Possible | Possible |
| REPEATABLE READ | Prevented | Prevented | Possible* |
| SERIALIZABLE | Prevented | Prevented | Prevented |
Note on REPEATABLE READ and Phantoms:
The SQL standard states that REPEATABLE READ may allow phantom reads. However, implementations vary:
- PostgreSQL implements REPEATABLE READ as snapshot isolation, so ordinary reads do not observe phantoms.
- MySQL's InnoDB likewise avoids phantoms for consistent (non-locking) reads by serving them from an MVCC snapshot.
- Engines that implement only the standard's minimum guarantee do allow phantoms at this level.
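The guarantees in the table above can be encoded directly. This sketch records only the SQL standard's minimum guarantees; as just noted, real engines are often stricter.

```python
# Anomalies the SQL standard permits at each isolation level.
ANOMALIES_POSSIBLE = {
    "READ UNCOMMITTED": {"dirty_read", "unrepeatable_read", "phantom_read"},
    "READ COMMITTED":   {"unrepeatable_read", "phantom_read"},
    "REPEATABLE READ":  {"phantom_read"},
    "SERIALIZABLE":     set(),
}

def weakest_level_preventing(anomaly: str) -> str:
    """Return the weakest standard level at which `anomaly` cannot occur."""
    for level in ["READ UNCOMMITTED", "READ COMMITTED",
                  "REPEATABLE READ", "SERIALIZABLE"]:
        if anomaly not in ANOMALIES_POSSIBLE[level]:
            return level
    raise ValueError(f"unknown anomaly: {anomaly}")

print(weakest_level_preventing("unrepeatable_read"))  # REPEATABLE READ
```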
Why REPEATABLE READ Prevents Unrepeatable Reads:
The REPEATABLE READ isolation level ensures that any row read during a transaction will return the same value if read again within that same transaction. This is typically implemented through:
- Shared read locks held until the transaction ends (two-phase locking), so no other transaction can modify a row this transaction has read; or
- Multiversion concurrency control (MVCC), where the transaction reads from a fixed snapshot taken when it began, as sketched below.
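A highly simplified sketch of the MVCC route, with illustrative version lists and timestamps: each item keeps its committed versions, and a transaction reads the newest version committed no later than the snapshot it took at BEGIN.

```python
# Simplified MVCC: each data item keeps committed versions (commit_ts, value).
# A REPEATABLE READ transaction reads as of its snapshot timestamp, so
# commits that happen after BEGIN are invisible to it.
versions = {"X": [(0, 100)]}        # X = 100 committed at ts 0

def read(item, snapshot_ts):
    """Newest version committed at or before the transaction's snapshot."""
    visible = [(ts, v) for ts, v in versions[item] if ts <= snapshot_ts]
    return max(visible)[1]

t1_snapshot = 1                     # T1 begins, takes snapshot at ts=1
first = read("X", t1_snapshot)      # 100

versions["X"].append((2, 200))      # T2 writes X=200 and commits at ts=2

second = read("X", t1_snapshot)     # still 100: the ts=2 commit is invisible
print(first, second)                # 100 100 -> the read is repeatable
```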
The SQL standard defines minimum guarantees, but implementations often provide stronger guarantees. Always consult your specific database's documentation to understand exactly which anomalies are prevented at each isolation level.
We've established a rigorous understanding of what unrepeatable reads are and how they fit into the landscape of concurrency anomalies. Let's consolidate the key concepts:
- An unrepeatable read follows the pattern Rᵢ(X) < Wⱼ(X) < Cⱼ < Rᵢ(X): a transaction reads the same item twice, and another transaction's committed write lands in between.
- Unlike dirty reads, only committed data is involved; unlike phantom reads, the anomaly concerns the value of an existing row rather than which rows match a query.
- The anomaly violates isolation because the interleaved execution matches no serial order, and it breaks read stability, on which transaction logic often implicitly depends.
- REPEATABLE READ and SERIALIZABLE prevent unrepeatable reads; READ UNCOMMITTED and READ COMMITTED allow them as a deliberate performance trade-off.
What's next:
Now that we understand the formal definition of unrepeatable reads, we'll explore the broader concept of inconsistent reads—how unrepeatable reads can combine with other data access patterns to create complex inconsistency scenarios that affect transaction correctness and application logic.
You now understand the precise definition of unrepeatable reads, can distinguish them from dirty reads and phantom reads, and know which isolation levels prevent them. Next, we'll examine how inconsistent reads manifest in more complex scenarios and why they create challenges for application correctness.