Consider an airline booking system where a transaction needs to verify seat availability and passenger counts. The transaction reads 150 passengers booked and 180 total seats, confirming 30 seats remain available. Later in the same transaction, it reads the seat count again to finalize a booking—but now sees 175 total seats because a concurrent maintenance operation updated the aircraft configuration.
The transaction now holds two contradictory facts: the aircraft has 180 total seats (first read) and 175 total seats (second read).
If it proceeds with the original calculation (30 available), it would allow overbooking—the actual availability is now 175 - 150 = 25 seats. Yet the transaction believes 30 seats are free.
This is read inconsistency: a transaction operating on data that presents a logically impossible view of the database—facts that couldn't coexist at any single point in time. Unrepeatable reads are the mechanism; inconsistent data views are the consequence.
By the end of this page, you will understand how unrepeatable reads create inconsistent data views, the concept of snapshot consistency, how multi-item inconsistencies emerge from single-item anomalies, and the relationship between read consistency and transaction correctness.
Read inconsistency occurs when a transaction's reads don't form a coherent snapshot of the database. The transaction observes a state that never actually existed—a Frankenstein's monster assembled from fragments of different database states.
Types of Read Inconsistency:
1. Single-Item Inconsistency (Basic Unrepeatable Read) The same data item is read twice with different values. The transaction has contradictory information about one specific piece of data.
2. Multi-Item Inconsistency (Read Skew) Related data items are read at different times, producing values that couldn't coexist. The inconsistency spans multiple data items that should maintain certain relationships.
3. Aggregate Inconsistency Reading the same aggregate query (SUM, COUNT, AVG) multiple times yields different results due to intervening modifications.
| Type | Data Scope | Example | Detection Difficulty |
|---|---|---|---|
| Single-Item | One row/field | Balance reads as $100, then $80 | Easy (compare values) |
| Multi-Item | Related rows | Account A + Account B should sum to $1000, but transaction sees $1050 | Medium (requires invariant knowledge) |
| Aggregate | Set of rows | COUNT(*) returns 100, then 103 | Easy (compare aggregates) |
| Derived | Computed values | Cached calculation differs from re-calculation | Hard (requires re-computation) |
The Fundamental Problem:
Databases evolve through a sequence of consistent states. Each committed transaction moves the database from one consistent state to another. For a transaction to operate correctly, it should perceive a single, coherent state.
When reads are inconsistent, the transaction perceives a synthetic state—one that never existed in the database's history:
Database States:    S₀ → S₁ → S₂ → S₃ → S₄
                          ↑         ↑
Transaction reads:   R₁ from S₁    R₂ from S₃

R₁ ∪ R₂ ≠ any Sᵢ
The transaction's working set {R₁, R₂} corresponds to no actual database state. Decisions based on this synthetic state may violate real-world constraints.
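The diagram above can be made concrete with a short sketch. The states and item names here are hypothetical, chosen only to mirror the R₁/R₂ notation:

```python
# The database moves through a sequence of consistent states S0 -> S3.
S = [
    {"X": 1, "Y": 1},  # S0
    {"X": 2, "Y": 2},  # S1  <- transaction reads R1 = X here
    {"X": 3, "Y": 3},  # S2
    {"X": 4, "Y": 4},  # S3  <- transaction reads R2 = Y here
]

# The working set combines a value from S1 with a value from S3.
working_set = {"X": S[1]["X"], "Y": S[3]["Y"]}
print(working_set)  # {'X': 2, 'Y': 4}

# R1 ∪ R2 corresponds to no state the database ever occupied:
assert all(working_set != s for s in S)
```

Every individual read returned committed data, yet the combined view fails to match any historical state, which is exactly the synthetic-state problem described above.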
Read inconsistency is particularly dangerous because each individual read returns valid, committed data. There's no error, no exception, no indication that anything is wrong. The database is functioning correctly—it's the transaction's perception that's fragmented across time.
The ideal solution to read inconsistency is snapshot consistency: every read in a transaction returns values from the same logical point in time—a consistent snapshot of the database.
Snapshot Consistency Definition:
A transaction exhibits snapshot consistency if all its reads observe values from a single consistent database state, regardless of when the reads occur or what concurrent transactions do.
With snapshot consistency, even if a transaction takes minutes to complete and reads the same data multiple times, every read returns the value as it was at the snapshot point—typically the transaction's start time.
Snapshot Properties:
How Databases Implement Snapshots:
Multi-Version Concurrency Control (MVCC):
MVCC maintains multiple versions of each data item, each tagged with a transaction timestamp or version number. When a transaction reads:
-- PostgreSQL example: READ COMMITTED takes a fresh snapshot at each statement;
-- REPEATABLE READ/SERIALIZABLE take one snapshot at the transaction's first query
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT balance FROM accounts WHERE id = 1; -- Returns 1000
-- Another session: UPDATE accounts SET balance = 500 WHERE id = 1; COMMIT;
SELECT balance FROM accounts WHERE id = 1; -- Still returns 1000!
COMMIT;
The second SELECT returns 1000 because the transaction's snapshot was taken before the concurrent update. PostgreSQL's REPEATABLE READ provides true snapshot consistency.
The snapshot timestamp (when the snapshot was taken) is different from when reads occur. A transaction might take its snapshot at T=100, then perform a read at T=150, but see data as of T=100. This temporal decoupling is what provides consistency.
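This temporal decoupling can be sketched as a toy multi-version store. This is a deliberate simplification: real MVCC implementations track transaction visibility (e.g. active-transaction lists), not just commit timestamps:

```python
from bisect import bisect_right

class MVCCStore:
    """Toy MVCC: each item keeps (commit_ts, value) versions in timestamp order."""

    def __init__(self):
        self.versions = {}  # item -> list of (commit_ts, value), sorted by commit_ts

    def write(self, item, value, commit_ts):
        self.versions.setdefault(item, []).append((commit_ts, value))

    def read(self, item, snapshot_ts):
        # Return the newest version committed at or before the snapshot time.
        vs = self.versions[item]
        i = bisect_right(vs, (snapshot_ts, float("inf")))
        return vs[i - 1][1]

store = MVCCStore()
store.write("balance", 1000, commit_ts=50)

snapshot_ts = 100                            # transaction takes its snapshot at T=100
store.write("balance", 500, commit_ts=120)   # concurrent update commits at T=120

# A read performed at T=150 still sees the value as of the snapshot:
print(store.read("balance", snapshot_ts))    # 1000
```

The read at T=150 ignores the version committed at T=120 because version selection is driven by the snapshot timestamp, not by when the read physically executes.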
Read skew is a specific form of read inconsistency where a transaction reads multiple related data items and observes an inconsistent combination due to intervening modifications.
The Classic Example: The Constraint Violation
Consider a database invariant: Account_A.balance + Account_B.balance = 1000 (total funds are conserved).
Initial State:
Account A: 500
Account B: 500
Sum: 1000 ✓
T₁ (Read):                T₂ (Transfer):
────────────────────      ────────────────────
R(A) → 500
                          W(A) ← 600 (add 100)
                          W(B) ← 400 (subtract 100)
                          COMMIT
R(B) → 400
T₁ sees: A=500, B=400
Sum: 900 ≠ 1000 ✗
T₁ reads A before T₂'s update and B after. It observes a state where 100 dollars have vanished—a state that never existed in reality. T₂'s transfer was atomic (A+100 and B-100 happened together), but T₁ saw a fragmented view.
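The interleaving above can be replayed step by step in a short sketch, with an in-memory dictionary standing in for committed rows:

```python
db = {"A": 500, "B": 500}  # committed state; invariant: A + B == 1000

t1_view = {}

# T1: R(A) -> 500 (before the transfer)
t1_view["A"] = db["A"]

# T2: atomic transfer of 100 from B to A, then COMMIT
db["A"] += 100
db["B"] -= 100
assert db["A"] + db["B"] == 1000   # the invariant holds in every committed state

# T1: R(B) -> 400 (after the transfer)
t1_view["B"] = db["B"]

print(t1_view)                       # {'A': 500, 'B': 400}
print(t1_view["A"] + t1_view["B"])   # 900, a state that never existed
```

Note that the invariant is never violated in the database itself; only T₁'s fragmented view breaks it.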
Read Skew in Real Systems:
Read skew creates serious problems in applications that need to enforce cross-item constraints:
| Domain | Invariant | Read Skew Consequence |
|---|---|---|
| Banking | Account balances must reconcile | Audit reports show discrepancies |
| Inventory | Stock count = received - sold | Inventory appears to have losses |
| Bookings | Reserved + Available = Total | System allows overbooking |
| Accounting | Debits = Credits | Books fail to balance |
| HR | Headcount by department sums to total | Reporting inconsistencies |
The Relationship to Unrepeatable Reads:
Read skew is conceptually related to unrepeatable reads: both result from reading at different points in the database's evolution. Read skew can even occur without reading any single item twice; the inconsistency emerges from the combination of reads across different items.
Preventing read skew requires the same mechanisms as preventing unrepeatable reads: snapshot isolation or serializable transactions. If related data items must be consistent, they need to be read from the same snapshot.
Aggregate queries (SUM, COUNT, AVG, etc.) are particularly susceptible to read inconsistency because they implicitly read many rows. If concurrent transactions modify the underlying data between when the query starts and completes, the aggregate may reflect a mix of old and new values.
The Aggregate Inconsistency Problem:
-- Transaction T₁: Generating a balance report
BEGIN;
-- Query 1: Total deposits
SELECT SUM(amount) FROM transactions WHERE type = 'DEPOSIT';
-- Returns: 100,000
-- Meanwhile, T₂: INSERT INTO transactions VALUES (..., 'DEPOSIT', 5000); COMMIT;
-- Query 2: Total withdrawals
SELECT SUM(amount) FROM transactions WHERE type = 'WITHDRAWAL';
-- Returns: 45,000
-- Query 3: Net calculation (done in app, or could be SQL)
SELECT SUM(amount) FROM transactions; -- assumes withdrawals are stored as negative amounts
-- If T₂'s deposit is included: 60,000 (not 55,000!)
COMMIT;
The report shows deposits of 100,000 and withdrawals of 45,000, yet a net of 60,000 rather than the expected 55,000. The totals don't reconcile because they were computed at different points in time.
-- Demonstration of aggregate inconsistency
-- Run in two separate sessions to see the effect

-- Session 1: Create test table
CREATE TABLE IF NOT EXISTS inventory (
    product_id INT PRIMARY KEY,
    category   VARCHAR(50),
    quantity   INT
);

-- Insert test data
INSERT INTO inventory VALUES (1, 'Electronics', 100);
INSERT INTO inventory VALUES (2, 'Electronics', 150);
INSERT INTO inventory VALUES (3, 'Clothing', 200);
INSERT INTO inventory VALUES (4, 'Clothing', 75);

-- ============================================
-- Session 1: Read Committed (allows inconsistency)
-- ============================================
BEGIN;
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- First aggregate: Electronics total
SELECT SUM(quantity) AS electronics_total FROM inventory WHERE category = 'Electronics';
-- Returns: 250

-- [Session 2: UPDATE inventory SET quantity = 200 WHERE product_id = 1; COMMIT;]
-- Electronics item 1: 100 -> 200

-- Second aggregate: Clothing total
SELECT SUM(quantity) AS clothing_total FROM inventory WHERE category = 'Clothing';
-- Returns: 275

-- Third aggregate: Overall total
SELECT SUM(quantity) AS grand_total FROM inventory;
-- Returns: 625 (includes the update!)

-- But 250 + 275 = 525 ≠ 625!
-- Inconsistency: electronics read old data, grand total read new data

COMMIT;

-- ============================================
-- Alternative: REPEATABLE READ (consistent)
-- ============================================
BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

SELECT SUM(quantity) AS electronics_total FROM inventory WHERE category = 'Electronics';
-- Returns: 250

-- [Session 2: UPDATE inventory SET quantity = 200 WHERE product_id = 1; COMMIT;]
-- Change is invisible to this transaction

SELECT SUM(quantity) AS clothing_total FROM inventory WHERE category = 'Clothing';
-- Returns: 275

SELECT SUM(quantity) AS grand_total FROM inventory;
-- Returns: 525 (consistent with sub-totals!)

COMMIT;

Why Aggregates Are Especially Vulnerable:
Duration: Aggregate queries over large tables take time, increasing the window for concurrent modifications
Implicit scope: The query scans many rows—each is a potential inconsistency point
No explicit revisit: Unlike explicit repeated reads, aggregate inconsistency doesn't involve reading the same row twice—just different rows at different times
Invisible composition: Multiple partial aggregates combined in the application are particularly risky if computed in separate queries
Best Practice: Atomic Reporting:
-- Good: Single query, atomic read
SELECT
SUM(CASE WHEN type = 'DEPOSIT' THEN amount ELSE 0 END) AS deposits,
SUM(CASE WHEN type = 'WITHDRAWAL' THEN amount ELSE 0 END) AS withdrawals,
SUM(amount) AS net
FROM transactions;
-- This single query reads all rows in one atomic operation,
-- eliminating the window for inconsistency
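The single-query pattern is easy to try with SQLite from Python. The table and figures below are illustrative, with withdrawals stored as negative amounts so the net is a plain SUM:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (type TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("DEPOSIT", 60000), ("DEPOSIT", 40000), ("WITHDRAWAL", -45000)],
)

# One statement reads every row exactly once, so all three aggregates are
# computed from the same set of rows: no window for skew between them.
deposits, withdrawals, net = conn.execute("""
    SELECT
        SUM(CASE WHEN type = 'DEPOSIT'    THEN amount ELSE 0 END),
        SUM(CASE WHEN type = 'WITHDRAWAL' THEN amount ELSE 0 END),
        SUM(amount)
    FROM transactions
""").fetchone()

print(deposits, withdrawals, net)  # 100000 -45000 55000
assert deposits + withdrawals == net
```

Because the sub-totals and the grand total come from a single scan, they always reconcile, regardless of what concurrent sessions commit before or after the statement.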
Financial and compliance reports are high-stakes areas where aggregate inconsistency can have serious consequences—audit failures, regulatory issues, or incorrect business decisions. Always use snapshot isolation or single-query approaches for critical reporting.
Detecting read inconsistency in running systems is challenging because each individual read returns correct data. Detection requires comparing reads or checking invariants—actions that applications may not naturally perform.
Detection Strategies:
1. Invariant Checking: If you know that certain values should maintain relationships, verify those relationships explicitly:
-- Check that sub-totals equal grand total
DO $$
DECLARE
electronics_sum INT;
clothing_sum INT;
grand_total INT;
BEGIN
SELECT SUM(quantity) INTO electronics_sum FROM inventory WHERE category = 'Electronics';
SELECT SUM(quantity) INTO clothing_sum FROM inventory WHERE category = 'Clothing';
SELECT SUM(quantity) INTO grand_total FROM inventory;
IF electronics_sum + clothing_sum != grand_total THEN
RAISE EXCEPTION 'Inconsistency detected: % + % != %',
electronics_sum, clothing_sum, grand_total;
END IF;
END $$;
2. Version Comparison: Some systems track row versions or timestamps that can be compared:
-- PostgreSQL: Check if version changed
SELECT xmin, balance FROM accounts WHERE id = 1; -- xmin = 1234
-- ... later ...
SELECT xmin, balance FROM accounts WHERE id = 1; -- xmin = 1236?
-- Different xmin = row was modified
from typing import Any, Callable, Dict, List, Optional
from dataclasses import dataclass, field
from collections import defaultdict
import hashlib


@dataclass
class ReadRecord:
    """Records information about a read operation."""
    item: str
    value: Any
    read_number: int  # 1st, 2nd, etc. read of this item
    timestamp: int


@dataclass
class InconsistencyDetector:
    """
    Detects read inconsistency patterns in transaction execution.

    Strategies:
    1. Track repeated reads of same item (unrepeatable reads)
    2. Track reads of related items (potential read skew)
    3. Verify invariants when specified
    """
    reads: Dict[str, List[ReadRecord]] = field(default_factory=lambda: defaultdict(list))
    invariants: List[dict] = field(default_factory=list)
    current_values: Dict[str, Any] = field(default_factory=dict)
    timestamp: int = 0

    def record_read(self, item: str, value: Any) -> Optional[str]:
        """
        Record a read and check for unrepeatable read.
        Returns warning message if inconsistency detected.
        """
        self.timestamp += 1
        item_reads = self.reads[item]
        read_number = len(item_reads) + 1
        record = ReadRecord(
            item=item,
            value=value,
            read_number=read_number,
            timestamp=self.timestamp
        )
        item_reads.append(record)
        self.current_values[item] = value

        # Check for unrepeatable read
        if read_number > 1:
            previous = item_reads[-2]
            if previous.value != value:
                return (f"UNREPEATABLE READ DETECTED: "
                        f"Item '{item}' changed from {previous.value} "
                        f"(read #{previous.read_number}) to {value} "
                        f"(read #{read_number})")
        return None

    def add_invariant(self, name: str,
                      check_fn: Callable[[Dict[str, Any]], bool],
                      involved_items: List[str]) -> None:
        """
        Add an invariant that should hold across related items.

        Args:
            name: Human-readable invariant name
            check_fn: Function(values_dict) -> bool
            involved_items: List of item names involved in invariant
        """
        self.invariants.append({
            'name': name,
            'check': check_fn,
            'items': set(involved_items)
        })

    def check_invariants(self) -> List[str]:
        """
        Check all invariants against current read values.
        Returns list of violated invariant messages.
        """
        violations = []
        for inv in self.invariants:
            # Check if we have values for all involved items
            if not all(item in self.current_values for item in inv['items']):
                continue  # Not enough data yet
            try:
                if not inv['check'](self.current_values):
                    violations.append(
                        f"INVARIANT VIOLATION: {inv['name']} failed with "
                        f"values {dict((k, self.current_values[k]) for k in inv['items'])}"
                    )
            except Exception as e:
                violations.append(f"INVARIANT ERROR: {inv['name']} raised {e}")
        return violations

    def generate_consistency_hash(self) -> str:
        """
        Generate a hash of all current values.
        Can be used to compare transaction snapshots.
        """
        sorted_items = sorted(self.current_values.items())
        content = str(sorted_items).encode()
        return hashlib.md5(content).hexdigest()[:12]

    def get_read_summary(self) -> Dict:
        """Get summary of all reads performed."""
        return {
            'items_read': len(self.reads),
            'total_reads': sum(len(v) for v in self.reads.values()),
            'items_read_multiple': sum(1 for v in self.reads.values() if len(v) > 1),
        }


def demonstrate_detection():
    """Demonstrate inconsistency detection."""
    print("=" * 60)
    print("Scenario: Bank Account Consistency Check")
    print("=" * 60)

    detector = InconsistencyDetector()

    # Define invariant: accounts sum to 1000
    detector.add_invariant(
        name="Total Balance Conservation",
        check_fn=lambda v: v.get('account_a', 0) + v.get('account_b', 0) == 1000,
        involved_items=['account_a', 'account_b']
    )

    # Simulate reads that observe inconsistent state
    print("\nTransaction reads:")

    # Read account A (old value)
    warning = detector.record_read('account_a', 500)
    print("  Read account_a = 500")
    if warning:
        print(f"  ⚠️ {warning}")

    # Simulate concurrent transfer: A goes to 600, B goes to 400
    # Transaction reads B (new value)
    warning = detector.record_read('account_b', 400)
    print("  Read account_b = 400")
    if warning:
        print(f"  ⚠️ {warning}")

    # Check invariants
    violations = detector.check_invariants()
    print("\nInvariant check:")
    if violations:
        for v in violations:
            print(f"  ❌ {v}")
        print("  Transaction observed an inconsistent state!")
    else:
        print("  ✓ All invariants satisfied")

    print("\n" + "=" * 60)
    print("Scenario: Unrepeatable Read Detection")
    print("=" * 60)

    detector2 = InconsistencyDetector()

    print("\nTransaction reads:")
    warning = detector2.record_read('balance', 1000)
    print("  Read balance = 1000")
    if warning:
        print(f"  ⚠️ {warning}")

    # Simulate concurrent modification + commit
    # Transaction reads again
    warning = detector2.record_read('balance', 800)
    print("  Read balance = 800")
    if warning:
        print(f"  ⚠️ {warning}")

    summary = detector2.get_read_summary()
    print(f"\nRead Summary: {summary}")


if __name__ == "__main__":
    demonstrate_detection()

Logging and Monitoring:
Production systems should log enough information to reconstruct whether inconsistency occurred: transaction start and commit times, the isolation level in effect, and the keys (and, where available, row versions) of the data each transaction read.
With this information, post-hoc analysis can identify transactions that may have experienced inconsistency.
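As a sketch of what such post-hoc analysis might look like, the following flags transactions whose read window overlaps a commit to an item they read. The event shapes and field names are hypothetical, standing in for whatever your logging pipeline records:

```python
from dataclasses import dataclass

@dataclass
class ReadEvent:
    txn: str
    item: str
    at: int  # logical or wall-clock time of the read

@dataclass
class CommitEvent:
    item: str
    at: int  # commit time of a write to this item

def possibly_inconsistent(reads, commits):
    """Return txn ids whose reads straddle a commit to an item they read."""
    by_txn = {}
    for r in reads:
        by_txn.setdefault(r.txn, []).append(r)

    flagged = set()
    for txn, rs in by_txn.items():
        start, end = min(r.at for r in rs), max(r.at for r in rs)
        items = {r.item for r in rs}
        # A commit strictly inside the read window may have split the view.
        if any(c.item in items and start < c.at < end for c in commits):
            flagged.add(txn)
    return flagged

reads = [ReadEvent("T1", "account_a", 10), ReadEvent("T1", "account_b", 30)]
commits = [CommitEvent("account_a", 20), CommitEvent("account_b", 20)]
print(possibly_inconsistent(reads, commits))  # {'T1'}
```

This is necessarily conservative: a flagged transaction only *may* have seen an inconsistent view, which is why such analysis suits auditing rather than real-time correction.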
In practice, preventing inconsistency (through proper isolation levels) is far easier than detecting it after the fact. Detection is primarily useful for auditing, testing, and understanding system behavior—not for fixing problems in real-time.
Read inconsistency manifests in applications in various ways, often without clear error messages. Understanding these manifestations helps developers recognize when inconsistency might be the root cause of bugs.
Common Manifestations:
Debugging Pattern:
When investigating potential read inconsistency: confirm the transaction's isolation level, map which values depend on each other, trace the order in which the transaction reads them, and identify which concurrent writers could interleave between those reads.
Example Investigation:
Bug report: "Inventory occasionally shows negative stock after sale"
1. Isolation level: READ COMMITTED
2. Dependencies: available_stock = total_stock - reserved
3. Read pattern:
- Read total_stock (100)
- Process sale logic...
- Read reserved (50)
- Calculate available = 100 - 50 = 50
- But total_stock was actually updated to 40 mid-transaction!
4. Concurrency: Multiple sales processing simultaneously
5. Fix: Use REPEATABLE READ or read all values in single query
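The race in this bug report can be reproduced deterministically, without threads, by interleaving the steps by hand (values taken from the investigation above):

```python
db = {"total_stock": 100, "reserved": 50}  # committed state

# T1 (READ COMMITTED): reads total_stock first
total_stock = db["total_stock"]   # 100

# ...sale-processing logic runs; meanwhile a concurrent transaction
# commits a stock adjustment:
db["total_stock"] = 40

# T1: reads reserved later, from the *new* committed state
reserved = db["reserved"]         # 50

available = total_stock - reserved
print(available)                  # 50, but the true availability is 40 - 50 = -10
```

T₁'s calculation mixes a pre-adjustment total with a post-adjustment reservation count, which is why the system later observes negative stock. Under a transaction-level snapshot, both reads would have come from the same state and the calculation would be internally consistent.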
Read inconsistency often causes 'heisenbugs'—bugs that disappear when you try to observe them. Adding logging can change timing enough to prevent the race. Running in a debugger serializes operations. The bug only manifests under specific, hard-to-reproduce concurrent conditions.
Different database systems and configurations provide different consistency guarantees. Understanding what your system actually provides is essential for writing correct applications.
PostgreSQL Consistency Levels:
| Level | Snapshot Timing | Consistency Guarantee |
|---|---|---|
| READ UNCOMMITTED | Same as READ COMMITTED | Per-statement snapshot (no dirty reads) |
| READ COMMITTED | Start of each statement | Statement-level consistency only |
| REPEATABLE READ | First query of the transaction | Transaction-level snapshot consistency |
| SERIALIZABLE | First query + SSI conflict detection | True serializability |
MySQL (InnoDB) Consistency Levels:
| Level | Locking Behavior | Read Consistency |
|---|---|---|
| READ UNCOMMITTED | No locks for reads | May see uncommitted changes |
| READ COMMITTED | Locks released after each read | Per-statement view |
| REPEATABLE READ | Locks held until transaction end | Transaction-level snapshot |
| SERIALIZABLE | Gap locks + next-key locks | Full serialization |
Oracle and SQL Server:
Both provide snapshot isolation explicitly:
-- Oracle
SET TRANSACTION READ ONLY; -- Snapshot at transaction start
-- SQL Server
SET TRANSACTION ISOLATION LEVEL SNAPSHOT; -- Requires DB configuration
Key Implementation Differences:
Default isolation levels vary by database and even by connection library. Always explicitly set the isolation level for transactions that require consistency guarantees. 'It worked in testing' doesn't mean it will work under production concurrency.
Inconsistent reads are the consequence of unrepeatable reads and related phenomena, creating transactions that observe synthetic database states: reads that are individually valid but collectively impossible.
What's next:
You now understand how unrepeatable reads lead to broader read inconsistency, the concept of snapshot consistency, and how to recognize consistency issues in applications. Next, we'll explore concrete real-world scenarios where these anomalies cause problems, helping you recognize and anticipate them in your own systems.