The previous page extolled the dramatic benefits of concurrent execution—throughput improvements measured in orders of magnitude, response times that enable real-time interactive systems, and economics that make large-scale computing viable. But this power comes with a dangerous caveat.
When multiple transactions execute concurrently and access shared data, a class of subtle but devastating problems can emerge. These problems don't crash the system; they corrupt it silently. Money disappears from bank accounts. Inventory counts become inconsistent. Decisions are made on data that was never valid.
The fundamental tension:
We want the performance benefits of concurrent execution, but we need the correctness guarantees of serial execution. This tension—between performance and correctness—is the central challenge of database concurrency control.
This page introduces the categories of problems that uncontrolled concurrent execution creates. Understanding these problems is essential before we can appreciate the mechanisms designed to prevent them.
The problems covered in this page are not theoretical curiosities. They cause real data corruption in production systems. Financial losses, incorrect business decisions, customer trust violations, and regulatory failures have all resulted from inadequately controlled concurrent access to databases.
By the end of this page, you will understand the taxonomy of concurrency problems, how they arise from specific operation orderings, their real-world consequences, and why controlled concurrent execution rather than simply 'more concurrency' is the goal of database systems.
To understand concurrency problems, we must first understand why they exist at all. The answer lies in the interaction between transaction isolation expectations and actual execution behavior.
The Isolation Expectation:
The ACID property of Isolation promises that transactions appear to execute in isolation from one another. From any transaction's perspective, it should seem as though it were the only transaction running. Other transactions' modifications should not be visible until they commit, and the transaction's own modifications should not interfere with others.
The Reality of Concurrent Execution:
In reality, concurrent transactions share physical resources and data structures. Operations from different transactions interleave in time. This interleaving can create situations where a transaction reads another transaction's half-finished changes, bases a write on a value that is no longer current, or observes a database state that no serial execution could ever have produced.
The Core Issue: Non-atomic Operations
Even a simple SQL statement like UPDATE accounts SET balance = balance - 100 WHERE id = 1 actually involves multiple internal steps: acquiring a lock, locating the row, reading the current value, computing the new value, and writing it back.
Between these steps, other transactions can execute their own operations on the same data. This is where problems arise—in the gaps between read and write, between logical operation and physical completion.
```sql
-- Demonstrating the vulnerability window

-- This single SQL statement:
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Actually executes as multiple internal steps:
-- Step 1: Lock acquisition (may wait for other transactions)
-- Step 2: Page fetch (find the row in buffer pool or disk)
-- Step 3: Current value read (balance = 1000)
-- Step 4: Computation (1000 - 100 = 900)
-- Step 5: Value write (balance = 900)
-- Step 6: Lock release (or held until commit, depending on protocol)

-- Between ANY of these steps, the scheduler might:
-- - Execute operations from other transactions
-- - Allow other transactions to read the same row
-- - Allow other transactions to modify the same row

-- Timeline showing the danger:
-- Time 0: T1 starts UPDATE on account 1 (balance = 1000)
-- Time 1: T1 reads current value: 1000
-- Time 2: T2 starts UPDATE on account 1
-- Time 3: T2 reads current value: 1000 (STILL 1000! T1 hasn't written yet)
-- Time 4: T1 computes: 1000 - 100 = 900
-- Time 5: T2 computes: 1000 - 200 = 800
-- Time 6: T1 writes: balance = 900
-- Time 7: T2 writes: balance = 800 (OVERWRITES T1's update!)

-- Final balance: 800
-- Expected balance: 1000 - 100 - 200 = 700
-- LOST: $100 from T1's deduction!
```

In serial execution, each transaction sees the complete effects of all previous transactions and none of the effects of future transactions. In concurrent execution, transactions may see partial effects, inconsistent states, or miss other transactions' changes entirely. This asymmetry is the source of all concurrency problems.
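The timeline above reduces to a few lines of plain Python (an illustrative simulation, not database code): two sequential "transactions" each read the shared balance before either writes, reproducing the lost $100.

```python
# Illustrative simulation of the interleaved timeline: two "transactions"
# each perform a non-atomic read-compute-write against a shared balance.

balance = 1000  # shared account balance

# Time 1: T1 reads; Time 3: T2 reads -- both see 1000
t1_read = balance
t2_read = balance

# Times 4 and 6: T1 computes and writes
balance = t1_read - 100   # balance = 900

# Times 5 and 7: T2 computes and writes, overwriting T1's update
balance = t2_read - 200   # balance = 800

print(balance)            # 800, not the expected 700 -- T1's $100 is lost
```

Any scheduler ordering in which both reads precede both writes produces the same loss; the simulation just fixes one such ordering.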
Database concurrency problems fall into distinct categories based on their nature and the operations that cause them. Understanding this taxonomy is essential for selecting appropriate control mechanisms.
The Four Classic Concurrency Problems:
Lost Update (Write-Write Conflict) — Two transactions read the same data and both try to update it. The update from one transaction is lost because the other overwrites it without seeing the first update.
Dirty Read (Read Uncommitted Write) — A transaction reads data that another transaction has modified but not yet committed. If the modifying transaction rolls back, the reading transaction has used data that was never valid.
Non-Repeatable Read (Read Inconsistency) — A transaction reads the same data item twice and gets different values because another transaction modified and committed between the reads.
Phantom Read (New Data Appearance) — A transaction re-executes a query and gets additional rows that weren't present before because another transaction inserted (and committed) matching data.
| Problem | Operations Involved | Cause | Effect |
|---|---|---|---|
| Lost Update | Read-Write-Read-Write | Both transactions base write on same old value | One transaction's update disappears |
| Dirty Read | Write-Read | Reading uncommitted modifications | Decisions based on possibly invalid data |
| Non-Repeatable Read | Read-Write-Commit-Read | Data changed between reads in same transaction | Inconsistent view within transaction |
| Phantom Read | Query-Insert/Delete-Commit-Query | Set membership changed between queries | Aggregations and checks become invalid |
Relationship to Isolation Levels:
These four problems correspond to violations of different isolation guarantees. The SQL standard defines isolation levels based on which problems they permit:
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible |
| Read Committed | Prevented | Possible | Possible |
| Repeatable Read | Prevented | Prevented | Possible |
| Serializable | Prevented | Prevented | Prevented |
Note: Lost Update is prevented at all standard isolation levels through different mechanisms.
Research has identified additional anomalies like Write Skew, Read Skew, and various snapshot isolation anomalies. The classic four cover most practical scenarios, but sophisticated applications may encounter these rarer problems, especially with optimistic concurrency control.
The Lost Update problem is perhaps the most intuitive and immediately dangerous of the concurrency problems. It occurs when two transactions read the same data item, then both write to it based on what they read. The second write overwrites the first, and the first transaction's modification is permanently lost.
The Fundamental Pattern:
T₁ reads X; T₂ reads X; T₁ writes X; then T₂ writes X, overwriting T₁'s update. The key insight is that both transactions base their write on the same original value, unaware of the other's modification.
```sql
-- SCENARIO: Two concurrent withdrawals from the same account

-- Initial state: Account balance = $1,000

-- Transaction T1: Withdraw $100
BEGIN TRANSACTION; -- T1
  SELECT balance FROM accounts WHERE id = 1;
  -- T1 reads: $1,000
  -- T1 will calculate: $1,000 - $100 = $900

-- Transaction T2: Withdraw $200 (starts before T1 commits)
BEGIN TRANSACTION; -- T2
  SELECT balance FROM accounts WHERE id = 1;
  -- T2 reads: $1,000 (SAME!)
  -- T2 will calculate: $1,000 - $200 = $800

-- T1 completes its update
  UPDATE accounts SET balance = 900 WHERE id = 1; -- T1 writes $900
COMMIT; -- T1

-- T2 completes its update (T2 doesn't know about T1's change!)
  UPDATE accounts SET balance = 800 WHERE id = 1; -- T2 writes $800
COMMIT; -- T2

-- FINAL STATE: balance = $800
-- EXPECTED STATE: $1,000 - $100 - $200 = $700
--
-- T1's $100 withdrawal is LOST!
-- The bank is now short $100, or the customer got a "free" withdrawal.

-- This is REAL money disappearing due to a concurrency bug.
```

The Read-Modify-Write pattern (read current value, compute change, write new value) is inherently vulnerable to lost updates. Every increment counter, balance update, and accumulator in your application is a potential lost update waiting to happen without proper concurrency control.
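As a sketch (using Python's built-in sqlite3 on a single connection, so the "transactions" are just an ordering of statements rather than real concurrency), the vulnerable read-modify-write pattern can be contrasted with pushing the arithmetic into the UPDATE itself, letting the database apply it atomically:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO accounts VALUES (1, 1000)")

# Vulnerable read-modify-write: both "transactions" read before either writes.
t1_read = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]  # 1000
t2_read = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]  # 1000
con.execute("UPDATE accounts SET balance = ? WHERE id = 1", (t1_read - 100,))     # T1 writes 900
con.execute("UPDATE accounts SET balance = ? WHERE id = 1", (t2_read - 200,))     # T2 writes 800
lost = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(lost)  # 800 -- T1's withdrawal is lost

# Safer shape: let the database perform the read-modify-write in one statement.
con.execute("UPDATE accounts SET balance = 1000 WHERE id = 1")  # reset
con.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
con.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
safe = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(safe)  # 700 -- both withdrawals applied
```

The second shape removes the application-level gap between read and write; under real concurrency the database still needs locking or validation to serialize the two statements, but the arithmetic can no longer be based on a stale snapshot held by the application.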
A Dirty Read occurs when a transaction reads data that has been modified by another transaction that has not yet committed. This data is considered 'dirty' because it may not survive—if the writing transaction rolls back, the dirty data never existed from a committed database perspective.
The Fundamental Pattern:
T₁ writes X; T₂ reads X before T₁ commits; T₁ rolls back. The danger is that T₂ may make decisions, perform calculations, or write other data based on a value that the database never actually contained in any committed state.
```sql
-- SCENARIO: Bank transfer with rollback

-- Initial state:
-- Account A = $1,000
-- Account B = $0

-- Transaction T1: Transfer $500 from Account A to Account B
BEGIN TRANSACTION; -- T1
  UPDATE accounts SET balance = 500 WHERE id = 'A'; -- A = $500
  UPDATE accounts SET balance = 500 WHERE id = 'B'; -- B = $500
  -- Suppose T1 detects an error (maybe daily transfer limit exceeded)

-- Transaction T2: Check if Account B has sufficient funds
BEGIN TRANSACTION; -- T2
  SELECT balance FROM accounts WHERE id = 'B';
  -- T2 reads: $500 (DIRTY! T1 hasn't committed)
  -- T2 approves a $400 withdrawal from Account B based on this!
  UPDATE accounts SET balance = balance - 400 WHERE id = 'B'; -- B = $100
COMMIT; -- T2

-- T1 must rollback due to the error!
ROLLBACK; -- T1
-- This reverts A back to $1,000, B back to $0

-- But wait... T2 already withdrew $400 from B based on the dirty read!
--
-- Now: B = $0 - $400 = -$400 (or the withdrawal was allowed incorrectly)
--
-- The customer got $400 that Account B never actually had!
-- Or the system is in an inconsistent state.
```

Why Dirty Reads Are Dangerous:
Dirty reads cause transactions to operate on data that may never have been valid. Decisions, calculations, and subsequent writes based on dirty data inherit that invalidity; if the writer rolls back, everything derived from the read has no committed basis.
Most business applications cannot tolerate dirty reads, which is why Read Committed (or higher) isolation is the default in most database systems.
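A toy model can make the rollback hazard concrete. The sketch below (illustrative only; the `committed` store, `dirty_read`, and `committed_read` names are invented for this example) models the database as a committed dictionary plus a per-transaction write buffer, so a Read Uncommitted-style read sees in-flight writes that a rollback simply discards:

```python
# Toy model: committed state plus an uncommitted write buffer for T1.

committed = {"A": 1000, "B": 0}
t1_writes = {}  # T1's uncommitted modifications

# T1 transfers $500 from A to B but has not committed yet.
t1_writes["A"] = 500
t1_writes["B"] = 500

def dirty_read(key):
    """Read Uncommitted behavior: sees T1's in-flight writes."""
    return t1_writes.get(key, committed[key])

def committed_read(key):
    """Read Committed behavior: sees only committed state."""
    return committed[key]

seen_dirty = dirty_read("B")      # 500 -- a value that may never commit
seen_clean = committed_read("B")  # 0

# T1 hits an error and rolls back: its buffer is simply discarded.
t1_writes.clear()

print(seen_dirty, seen_clean, committed["B"])  # 500 0 0
```

After the rollback, the $500 that the dirty reader saw has vanished without ever existing in committed state, which is exactly the basis on which T2 approved the $400 withdrawal in the scenario above.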
| Domain | Dirty Read Scenario | Consequence |
|---|---|---|
| Banking | Read inflated balance from pending deposit | Overdraft allowed, bank loses money |
| Inventory | Read increased stock from pending receipt | Oversell inventory, customer orders unfulfillable |
| Healthcare | Read medication order later cancelled | Wrong medication prepared or administered |
| Trading | Read position size from uncommitted trade | Risk calculations incorrect, overexposure |
Some scenarios tolerate dirty reads for performance: approximate analytics, real-time dashboards where 'close enough' suffices, or systems where uncommitted data is almost always committed. But these are the exception—most transactional applications require at least Read Committed isolation.
A Non-Repeatable Read (also called Fuzzy Read or Read Skew) occurs when a transaction reads the same data item twice and gets different values because another transaction modified and committed that data between the reads.
The Fundamental Pattern:
T₁ reads X; T₂ writes X and commits; T₁ reads X again and sees a different value. This violates the expectation that a transaction sees a consistent snapshot of the database throughout its execution.
```sql
-- SCENARIO: Price change during order processing

-- Initial state: Product price = $100

-- Transaction T1: Process a customer order
BEGIN TRANSACTION; -- T1
  -- Step 1: Show product price to customer
  SELECT price FROM products WHERE id = 101;
  -- T1 reads: $100
  -- Customer confirms order at $100...

-- Transaction T2: Merchant updates product price
BEGIN TRANSACTION; -- T2
  UPDATE products SET price = 150 WHERE id = 101; -- Price = $150
COMMIT; -- T2

-- Transaction T1 continues (perhaps to generate invoice)
  -- Step 2: Calculate order total
  SELECT price FROM products WHERE id = 101;
  -- T1 reads: $150 (CHANGED!)

  -- Now T1 has a problem:
  -- Customer saw and agreed to $100
  -- Invoice calculation uses $150
  -- Order record might be inconsistent about the actual price
COMMIT; -- T1

-- The customer was shown $100 but might be charged $150
-- Or there's a $50 discrepancy in records
-- Either way, this is a data consistency issue
```

Why Non-Repeatable Reads Are Problematic:

A transaction that reads the same item twice cannot trust its own earlier reads: any validation, display, or calculation based on the first value may be stale by the time the transaction acts on it.
Distinguishing from Dirty Reads:
Unlike dirty reads, non-repeatable reads involve only committed data. The modifying transaction has fully committed before the second read. This makes the problem more subtle—both values are 'valid' in some sense, just from different points in time.
Non-repeatable reads are a database manifestation of the Time-of-Check to Time-of-Use (TOCTOU) race condition common in computing. Any time you check a condition and then act on it, you're vulnerable if the condition can change between check and use.
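The TOCTOU shape is easy to demonstrate outside a database. In this hypothetical seat-booking sketch, the check and the act are separate steps, so a second actor can invalidate the condition in between:

```python
# TOCTOU sketch (hypothetical seat-booking example): the check and the act
# are separate steps, so the condition can change between them.

seats_available = 1

# T1 checks: a seat is free.
t1_check = seats_available > 0   # True

# T2 checks and acts before T1 acts.
if seats_available > 0:
    seats_available -= 1         # T2 books the last seat

# T1 now acts on its stale check.
if t1_check:
    seats_available -= 1         # T1 books a seat that no longer exists

print(seats_available)           # -1 -- overbooked
```

The database version of the fix is the same as the systems version: make check-and-act atomic, either by holding a lock across both steps or by re-validating the condition inside the write.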
A Phantom Read occurs when a transaction executes a query twice and gets different result sets because another transaction inserted or deleted rows that match the query's predicates between the two executions.
The Fundamental Pattern:
T₁ runs a query with a predicate; T₂ inserts (or deletes) a row matching that predicate and commits; T₁ re-runs the query and gets a different result set. Phantom reads differ from non-repeatable reads in that they involve the set of matching rows rather than the value of a specific row. Non-repeatable reads happen when existing data changes; phantom reads happen when the set membership changes.
```sql
-- SCENARIO: Employee salary budget calculation

-- Initial state: 3 employees in Engineering department

-- Transaction T1: Calculate total salary budget for Engineering
BEGIN TRANSACTION; -- T1
  -- Step 1: Count employees and sum salaries
  SELECT COUNT(*), SUM(salary)
  FROM employees
  WHERE department = 'Engineering';
  -- Result: 3 employees, total salary = $300,000

-- Transaction T2: HR adds a new engineer
BEGIN TRANSACTION; -- T2
  INSERT INTO employees (name, department, salary)
  VALUES ('New Hire', 'Engineering', 80000);
COMMIT; -- T2

-- Transaction T1 continues: Verify budget against company limit
  -- Step 2: Check individual salaries for policy compliance
  SELECT name, salary
  FROM employees
  WHERE department = 'Engineering';
  -- Result: 4 rows! (New Hire appears as PHANTOM)

  -- Now T1 is confused:
  -- Step 1 said 3 employees totaling $300,000
  -- Step 2 shows 4 employees
  -- If T1 re-sums: 4 employees totaling $380,000
  -- Budget calculations are inconsistent within the same transaction!
COMMIT; -- T1

-- This could cause:
-- - Incorrect budget approvals/rejections
-- - Wrong per-employee calculations (300K/3 vs 380K/4)
-- - Reporting discrepancies
```

Why Phantom Reads Are Particularly Tricky:
Phantom reads are harder to prevent than other anomalies because they involve predicates, not just data items: locking the rows a query returned does nothing to stop a new row from appearing that matches the same predicate.
Preventing phantoms typically requires either predicate locking (locking the condition, not just rows) or serializable snapshot isolation (ensuring the transaction sees a fixed point-in-time view).
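The budget scenario can be replayed with Python's sqlite3 (a single-connection sketch, so the interleaving is simulated by statement order rather than true concurrency): the same predicate query returns an extra, phantom row after the intervening insert.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
con.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [("Ann", "Engineering", 100000),
                 ("Ben", "Engineering", 100000),
                 ("Cam", "Engineering", 100000)])

# T1, step 1: count and sum rows matching the predicate.
count1, total1 = con.execute(
    "SELECT COUNT(*), SUM(salary) FROM employees WHERE department = 'Engineering'"
).fetchone()   # (3, 300000)

# T2: a new engineer is inserted and committed in between.
con.execute("INSERT INTO employees VALUES ('New Hire', 'Engineering', 80000)")

# T1, step 2: the same predicate now matches a phantom fourth row.
count2, total2 = con.execute(
    "SELECT COUNT(*), SUM(salary) FROM employees WHERE department = 'Engineering'"
).fetchone()   # (4, 380000)

print(count1, total1, count2, total2)
```

Note that no row T1 originally read was modified; only the membership of the predicate's result set changed, which is exactly what row-level locks fail to protect.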
| Query Type | Phantom Trigger | Consequence |
|---|---|---|
| COUNT(*) WHERE ... | INSERT matching WHERE | Count becomes wrong mid-transaction |
| SUM/AVG WHERE ... | INSERT/DELETE matching WHERE | Aggregates become inconsistent |
| SELECT WHERE ... FOR UPDATE | INSERT matching WHERE | Missed rows in update operation |
| Unique constraint check | Concurrent INSERT | Duplicate key despite check |
Non-repeatable reads change the value of existing rows; phantom reads change which rows exist. If you lock the rows you've read, you prevent non-repeatable reads but not phantoms—a new row can still appear. Preventing phantoms requires locking the 'gaps' where new rows could be inserted.
The four classic concurrency problems are not independent phenomena—they represent different manifestations of a single underlying issue: the violation of serializability. Understanding their relationships helps in selecting appropriate concurrency control mechanisms.
The Severity Hierarchy:
The problems can be ranked by severity, and the ranking mirrors the isolation levels that prevent them: dirty reads are the most severe (permitted only at Read Uncommitted), followed by non-repeatable reads (permitted below Repeatable Read), with phantom reads the least severe (permitted at everything below Serializable).
Preventing higher-severity problems generally also prevents lower-severity ones (preventing dirty reads automatically prevents non-repeatable reads of uncommitted data).
The Lost Update Special Case:
Lost updates sit somewhat outside the standard hierarchy. They're caused by the read-modify-write pattern without proper synchronization, and different isolation levels handle them differently: locking implementations block the second writer until the first commits, while snapshot-based implementations typically abort the later writer with a serialization failure.
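One widely used application-level defense is optimistic versioning, sketched below with Python's sqlite3 (illustrative; the version column and retry convention are assumptions, not from the text). Each UPDATE's WHERE clause requires the version that was read, so a stale write matches zero rows instead of silently clobbering:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
con.execute("INSERT INTO accounts VALUES (1, 1000, 0)")

# Both "transactions" read balance 1000 at version 0.
t1_bal, t1_ver = con.execute("SELECT balance, version FROM accounts WHERE id = 1").fetchone()
t2_bal, t2_ver = con.execute("SELECT balance, version FROM accounts WHERE id = 1").fetchone()

# T1 writes first; its version check passes and the version advances.
cur = con.execute(
    "UPDATE accounts SET balance = ?, version = version + 1 WHERE id = 1 AND version = ?",
    (t1_bal - 100, t1_ver))
t1_applied = cur.rowcount   # 1 row updated

# T2's write carries the stale version 0, so it matches nothing.
cur = con.execute(
    "UPDATE accounts SET balance = ?, version = version + 1 WHERE id = 1 AND version = ?",
    (t2_bal - 200, t2_ver))
t2_applied = cur.rowcount   # 0 rows -- T2 must re-read and retry

balance = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(t1_applied, t2_applied, balance)   # 1 0 900
```

The lost update is converted into a detectable retry: T2's deduction is not silently dropped, it simply has not happened yet, and a re-read would start from the fresh balance of 900.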
Practical Trade-offs:
Higher isolation prevents more problems but costs more in terms of performance:
| Isolation Level | Problems Prevented | Performance Impact |
|---|---|---|
| Read Uncommitted | None | Minimal overhead |
| Read Committed | Dirty Read | Low overhead |
| Repeatable Read | Dirty Read, Non-Repeatable Read | Moderate overhead |
| Serializable | All | Highest overhead |
Most applications use Read Committed as a reasonable balance. Critical operations may escalate to Serializable for specific transactions.
Beyond the classic four, Write Skew occurs when two transactions read overlapping data, make decisions based on it, and then write to non-overlapping data in ways that would be invalid if they saw each other's writes. Example: Two doctors each check that a hospital has at least one doctor on call, then both go off-call—leaving no one on call. This requires serializable isolation to prevent.
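The on-call scenario reduces to a small simulation (illustrative Python, not database code): both "transactions" validate the invariant against the same snapshot, then write disjoint rows that jointly break it.

```python
# Write skew sketch: both transactions check an invariant ("at least one
# doctor on call") that holds when they read, then each updates a
# different row, jointly breaking it.

on_call = {"alice": True, "bob": True}

# T1 (Alice) reads: two doctors are on call, so it is safe to leave.
t1_count = sum(on_call.values())   # 2

# T2 (Bob) reads the same snapshot before either writes.
t2_count = sum(on_call.values())   # 2

# Both act on their individually valid reads -- disjoint writes.
if t1_count >= 2:
    on_call["alice"] = False
if t2_count >= 2:
    on_call["bob"] = False

print(sum(on_call.values()))       # 0 -- no one is on call; invariant broken
```

Because the two writes touch different rows, no write-write conflict exists for a lock or version check to catch; only serializable isolation, which accounts for the read dependency between the transactions, rejects this schedule.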
We have surveyed the landscape of problems that arise from concurrent transaction execution. These problems represent the price of concurrency—challenges that must be addressed to gain the benefits described in the previous page.
What's next:
Having introduced the problems at a high level, the next page examines interleaving—the mechanism by which these problems actually manifest. We'll see exactly how the scheduler's decisions to interleave operations from different transactions can create problematic orderings, and why not all interleavings are created equal.
You now have a comprehensive overview of the concurrency problems that plague uncontrolled concurrent execution. Each problem—lost updates, dirty reads, non-repeatable reads, and phantom reads—will be explored in detail in dedicated modules. First, we'll understand the interleaving mechanism that makes these problems possible.