The previous page extolled the dramatic benefits of concurrent execution—throughput improvements measured in orders of magnitude, response times that enable real-time interactive systems, and economics that make large-scale computing viable. But this power comes with a dangerous caveat.
When multiple transactions execute concurrently and access shared data, a class of subtle but devastating problems can emerge. These problems don't crash the system; they corrupt it silently. Money disappears from bank accounts. Inventory counts become inconsistent. Decisions are made on data that was never valid.
The fundamental tension:
We want the performance benefits of concurrent execution, but we need the correctness guarantees of serial execution. This tension—between performance and correctness—is the central challenge of database concurrency control.
This page introduces the categories of problems that uncontrolled concurrent execution creates. Understanding these problems is essential before we can appreciate the mechanisms designed to prevent them.
The problems covered in this page are not theoretical curiosities. They cause real data corruption in production systems. Financial losses, incorrect business decisions, customer trust violations, and regulatory failures have all resulted from inadequately controlled concurrent access to databases.
By the end of this page, you will understand the taxonomy of concurrency problems, how they arise from specific operation orderings, their real-world consequences, and why controlled concurrent execution rather than simply 'more concurrency' is the goal of database systems.
To understand concurrency problems, we must first understand why they exist at all. The answer lies in the interaction between transaction isolation expectations and actual execution behavior.
The Isolation Expectation:
The ACID property of Isolation promises that transactions appear to execute in isolation from one another. From any transaction's perspective, it should seem as though it were the only transaction running. Other transactions' modifications should not be visible until they commit, and the transaction's own modifications should not interfere with others.
The Reality of Concurrent Execution:
In reality, concurrent transactions share physical resources and data structures. Operations from different transactions interleave in time. This interleaving can create situations where a transaction reads another transaction's half-finished changes, bases a write on a value that is no longer current, or observes a database state that no serial execution could ever have produced.
The Core Issue: Non-atomic Operations
Even a simple SQL statement like UPDATE accounts SET balance = balance - 100 WHERE id = 1 actually involves multiple internal steps: acquiring a lock, locating the row, reading the current value, computing the new value, and writing it back.
Between these steps, other transactions can execute their own operations on the same data. This is where problems arise—in the gaps between read and write, between logical operation and physical completion.
```sql
-- Demonstrating the vulnerability window

-- This single SQL statement:
UPDATE accounts SET balance = balance - 100 WHERE id = 1;

-- Actually executes as multiple internal steps:
-- Step 1: Lock acquisition (may wait for other transactions)
-- Step 2: Page fetch (find the row in buffer pool or disk)
-- Step 3: Current value read (balance = 1000)
-- Step 4: Computation (1000 - 100 = 900)
-- Step 5: Value write (balance = 900)
-- Step 6: Lock release (or held until commit, depending on protocol)

-- Between ANY of these steps, the scheduler might:
-- - Execute operations from other transactions
-- - Allow other transactions to read the same row
-- - Allow other transactions to modify the same row

-- Timeline showing the danger:
-- Time 0: T1 starts UPDATE on account 1 (balance = 1000)
-- Time 1: T1 reads current value: 1000
-- Time 2: T2 starts UPDATE on account 1
-- Time 3: T2 reads current value: 1000 (STILL 1000! T1 hasn't written yet)
-- Time 4: T1 computes: 1000 - 100 = 900
-- Time 5: T2 computes: 1000 - 200 = 800
-- Time 6: T1 writes: balance = 900
-- Time 7: T2 writes: balance = 800 (OVERWRITES T1's update!)

-- Final balance: 800
-- Expected balance: 1000 - 100 - 200 = 700
-- LOST: $100 from T1's deduction!
```

In serial execution, each transaction sees the complete effects of all previous transactions and none of the effects of future transactions. In concurrent execution, transactions may see partial effects, inconsistent states, or miss other transactions' changes entirely. This asymmetry is the source of all concurrency problems.
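The timeline above reduces to a few lines of plain Python (an illustrative simulation, not database code): two sequential "transactions" each read the shared balance before either writes, reproducing the lost $100.

```python
# Illustrative simulation of the interleaved timeline: two "transactions"
# each perform a non-atomic read-compute-write against a shared balance.

balance = 1000  # shared account balance

# Time 1: T1 reads; Time 3: T2 reads -- both see 1000
t1_read = balance
t2_read = balance

# Times 4 and 6: T1 computes and writes
balance = t1_read - 100   # balance = 900

# Times 5 and 7: T2 computes and writes, overwriting T1's update
balance = t2_read - 200   # balance = 800

print(balance)            # 800, not the expected 700 -- T1's $100 is lost
```

Any scheduler ordering in which both reads precede both writes produces the same loss; the simulation just fixes one such ordering.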
Database concurrency problems fall into distinct categories based on their nature and the operations that cause them. Understanding this taxonomy is essential for selecting appropriate control mechanisms.
The Four Classic Concurrency Problems:
Lost Update (Write-Write Conflict) — Two transactions read the same data and both try to update it. The update from one transaction is lost because the other overwrites it without seeing the first update.
Dirty Read (Read Uncommitted Write) — A transaction reads data that another transaction has modified but not yet committed. If the modifying transaction rolls back, the reading transaction has used data that was never valid.
Non-Repeatable Read (Read Inconsistency) — A transaction reads the same data item twice and gets different values because another transaction modified and committed between the reads.
Phantom Read (New Data Appearance) — A transaction re-executes a query and gets additional rows that weren't present before because another transaction inserted (and committed) matching data.
| Problem | Operations Involved | Cause | Effect |
|---|---|---|---|
| Lost Update | Read-Write-Read-Write | Both transactions base write on same old value | One transaction's update disappears |
| Dirty Read | Write-Read | Reading uncommitted modifications | Decisions based on possibly invalid data |
| Non-Repeatable Read | Read-Write-Commit-Read | Data changed between reads in same transaction | Inconsistent view within transaction |
| Phantom Read | Query-Insert/Delete-Commit-Query | Set membership changed between queries | Aggregations and checks become invalid |
Relationship to Isolation Levels:
These four problems correspond to violations of different isolation guarantees. The SQL standard defines isolation levels based on which problems they permit:
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible |
| Read Committed | Prevented | Possible | Possible |
| Repeatable Read | Prevented | Prevented | Possible |
| Serializable | Prevented | Prevented | Prevented |
Note: Lost Update is prevented at all standard isolation levels through different mechanisms.
Research has identified additional anomalies like Write Skew, Read Skew, and various snapshot isolation anomalies. The classic four cover most practical scenarios, but sophisticated applications may encounter these rarer problems, especially with optimistic concurrency control.
The Lost Update problem is perhaps the most intuitive and immediately dangerous of the concurrency problems. It occurs when two transactions read the same data item, then both write to it based on what they read. The second write overwrites the first, and the first transaction's modification is permanently lost.
The Fundamental Pattern:
T₁ reads X; T₂ reads X; T₁ writes X; then T₂ writes X, overwriting T₁'s update. The key insight is that both transactions base their write on the same original value, unaware of the other's modification.
```sql
-- SCENARIO: Two concurrent withdrawals from the same account

-- Initial state: Account balance = $1,000

-- Transaction T1: Withdraw $100
BEGIN TRANSACTION; -- T1
  SELECT balance FROM accounts WHERE id = 1;
  -- T1 reads: $1,000
  -- T1 will calculate: $1,000 - $100 = $900

-- Transaction T2: Withdraw $200 (starts before T1 commits)
BEGIN TRANSACTION; -- T2
  SELECT balance FROM accounts WHERE id = 1;
  -- T2 reads: $1,000 (SAME!)
  -- T2 will calculate: $1,000 - $200 = $800

-- T1 completes its update
  UPDATE accounts SET balance = 900 WHERE id = 1; -- T1 writes $900
COMMIT; -- T1

-- T2 completes its update (T2 doesn't know about T1's change!)
  UPDATE accounts SET balance = 800 WHERE id = 1; -- T2 writes $800
COMMIT; -- T2

-- FINAL STATE: balance = $800
-- EXPECTED STATE: $1,000 - $100 - $200 = $700
--
-- T1's $100 withdrawal is LOST!
-- The bank is now short $100, or the customer got a "free" withdrawal.

-- This is REAL money disappearing due to a concurrency bug.
```

The Read-Modify-Write pattern (read current value, compute change, write new value) is inherently vulnerable to lost updates. Every increment counter, balance update, and accumulator in your application is a potential lost update waiting to happen without proper concurrency control.
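As a sketch (using Python's built-in sqlite3 on a single connection, so the "transactions" are just an ordering of statements rather than real concurrency), the vulnerable read-modify-write pattern can be contrasted with pushing the arithmetic into the UPDATE itself, letting the database apply it atomically:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO accounts VALUES (1, 1000)")

# Vulnerable read-modify-write: both "transactions" read before either writes.
t1_read = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]  # 1000
t2_read = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]  # 1000
con.execute("UPDATE accounts SET balance = ? WHERE id = 1", (t1_read - 100,))     # T1 writes 900
con.execute("UPDATE accounts SET balance = ? WHERE id = 1", (t2_read - 200,))     # T2 writes 800
lost = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(lost)  # 800 -- T1's withdrawal is lost

# Safer shape: let the database perform the read-modify-write in one statement.
con.execute("UPDATE accounts SET balance = 1000 WHERE id = 1")  # reset
con.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
con.execute("UPDATE accounts SET balance = balance - 200 WHERE id = 1")
safe = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(safe)  # 700 -- both withdrawals applied
```

The second shape removes the application-level gap between read and write; under real concurrency the database still needs locking or validation to serialize the two statements, but the arithmetic can no longer be based on a stale snapshot held by the application.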
A Dirty Read occurs when a transaction reads data that has been modified by another transaction that has not yet committed. This data is considered 'dirty' because it may not survive—if the writing transaction rolls back, the dirty data never existed from a committed database perspective.
The Fundamental Pattern:
T₁ writes X; T₂ reads X before T₁ commits; T₁ rolls back. The danger is that T₂ may make decisions, perform calculations, or write other data based on a value that the database never actually contained in any committed state.
```sql
-- SCENARIO: Bank transfer with rollback

-- Initial state:
-- Account A = $1,000
-- Account B = $0

-- Transaction T1: Transfer $500 from Account A to Account B
BEGIN TRANSACTION; -- T1
  UPDATE accounts SET balance = 500 WHERE id = 'A'; -- A = $500
  UPDATE accounts SET balance = 500 WHERE id = 'B'; -- B = $500
  -- Suppose T1 detects an error (maybe daily transfer limit exceeded)

-- Transaction T2: Check if Account B has sufficient funds
BEGIN TRANSACTION; -- T2
  SELECT balance FROM accounts WHERE id = 'B';
  -- T2 reads: $500 (DIRTY! T1 hasn't committed)
  -- T2 approves a $400 withdrawal from Account B based on this!
  UPDATE accounts SET balance = balance - 400 WHERE id = 'B'; -- B = $100
COMMIT; -- T2

-- T1 must rollback due to the error!
ROLLBACK; -- T1
-- This reverts A back to $1,000, B back to $0

-- But wait... T2 already withdrew $400 from B based on the dirty read!
--
-- Now: B = $0 - $400 = -$400 (or the withdrawal was allowed incorrectly)
--
-- The customer got $400 that Account B never actually had!
-- Or the system is in an inconsistent state.
```

Why Dirty Reads Are Dangerous:
Dirty reads cause transactions to operate on data that may never have been valid. Decisions, calculations, and subsequent writes based on dirty data inherit that invalidity; if the writer rolls back, everything derived from the read has no committed basis.
Most business applications cannot tolerate dirty reads, which is why Read Committed (or higher) isolation is the default in most database systems.
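A toy model can make the rollback hazard concrete. The sketch below (illustrative only; the `committed` store, `dirty_read`, and `committed_read` names are invented for this example) models the database as a committed dictionary plus a per-transaction write buffer, so a Read Uncommitted-style read sees in-flight writes that a rollback simply discards:

```python
# Toy model: committed state plus an uncommitted write buffer for T1.

committed = {"A": 1000, "B": 0}
t1_writes = {}  # T1's uncommitted modifications

# T1 transfers $500 from A to B but has not committed yet.
t1_writes["A"] = 500
t1_writes["B"] = 500

def dirty_read(key):
    """Read Uncommitted behavior: sees T1's in-flight writes."""
    return t1_writes.get(key, committed[key])

def committed_read(key):
    """Read Committed behavior: sees only committed state."""
    return committed[key]

seen_dirty = dirty_read("B")      # 500 -- a value that may never commit
seen_clean = committed_read("B")  # 0

# T1 hits an error and rolls back: its buffer is simply discarded.
t1_writes.clear()

print(seen_dirty, seen_clean, committed["B"])  # 500 0 0
```

After the rollback, the $500 that the dirty reader saw has vanished without ever existing in committed state, which is exactly the basis on which T2 approved the $400 withdrawal in the scenario above.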
| Domain | Dirty Read Scenario | Consequence |
|---|---|---|
| Banking | Read inflated balance from pending deposit | Overdraft allowed, bank loses money |
| Inventory | Read increased stock from pending receipt | Oversell inventory, customer orders unfulfillable |
| Healthcare | Read medication order later cancelled | Wrong medication prepared or administered |
| Trading | Read position size from uncommitted trade | Risk calculations incorrect, overexposure |
Some scenarios tolerate dirty reads for performance: approximate analytics, real-time dashboards where 'close enough' suffices, or systems where uncommitted data is almost always committed. But these are the exception—most transactional applications require at least Read Committed isolation.
A Non-Repeatable Read (also called Fuzzy Read or Read Skew) occurs when a transaction reads the same data item twice and gets different values because another transaction modified and committed that data between the reads.
The Fundamental Pattern:
T₁ reads X; T₂ writes X and commits; T₁ reads X again and sees a different value. This violates the expectation that a transaction sees a consistent snapshot of the database throughout its execution.
```sql
-- SCENARIO: Price change during order processing

-- Initial state: Product price = $100

-- Transaction T1: Process a customer order
BEGIN TRANSACTION; -- T1
  -- Step 1: Show product price to customer
  SELECT price FROM products WHERE id = 101;
  -- T1 reads: $100
  -- Customer confirms order at $100...

-- Transaction T2: Merchant updates product price
BEGIN TRANSACTION; -- T2
  UPDATE products SET price = 150 WHERE id = 101; -- Price = $150
COMMIT; -- T2

-- Transaction T1 continues (perhaps to generate invoice)
  -- Step 2: Calculate order total
  SELECT price FROM products WHERE id = 101;
  -- T1 reads: $150 (CHANGED!)

  -- Now T1 has a problem:
  -- Customer saw and agreed to $100
  -- Invoice calculation uses $150
  -- Order record might be inconsistent about the actual price
COMMIT; -- T1

-- The customer was shown $100 but might be charged $150
-- Or there's a $50 discrepancy in records
-- Either way, this is a data consistency issue
```

Why Non-Repeatable Reads Are Problematic:

A transaction that reads the same item twice cannot trust its own earlier reads: any validation, display, or calculation based on the first value may be stale by the time the transaction acts on it.
Distinguishing from Dirty Reads:
Unlike dirty reads, non-repeatable reads involve only committed data. The modifying transaction has fully committed before the second read. This makes the problem more subtle—both values are 'valid' in some sense, just from different points in time.
Non-repeatable reads are a database manifestation of the Time-of-Check to Time-of-Use (TOCTOU) race condition common in computing. Any time you check a condition and then act on it, you're vulnerable if the condition can change between check and use.
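The TOCTOU shape is easy to demonstrate outside a database. In this hypothetical seat-booking sketch, the check and the act are separate steps, so a second actor can invalidate the condition in between:

```python
# TOCTOU sketch (hypothetical seat-booking example): the check and the act
# are separate steps, so the condition can change between them.

seats_available = 1

# T1 checks: a seat is free.
t1_check = seats_available > 0   # True

# T2 checks and acts before T1 acts.
if seats_available > 0:
    seats_available -= 1         # T2 books the last seat

# T1 now acts on its stale check.
if t1_check:
    seats_available -= 1         # T1 books a seat that no longer exists

print(seats_available)           # -1 -- overbooked
```

The database version of the fix is the same as the systems version: make check-and-act atomic, either by holding a lock across both steps or by re-validating the condition inside the write.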
A Phantom Read occurs when a transaction executes a query twice and gets different result sets because another transaction inserted or deleted rows that match the query's predicates between the two executions.
The Fundamental Pattern:
T₁ runs a query with a predicate; T₂ inserts (or deletes) a row matching that predicate and commits; T₁ re-runs the query and gets a different result set. Phantom reads differ from non-repeatable reads in that they involve the set of matching rows rather than the value of a specific row. Non-repeatable reads happen when existing data changes; phantom reads happen when the set membership changes.
```sql
-- SCENARIO: Employee salary budget calculation

-- Initial state: 3 employees in Engineering department

-- Transaction T1: Calculate total salary budget for Engineering
BEGIN TRANSACTION; -- T1
  -- Step 1: Count employees and sum salaries
  SELECT COUNT(*), SUM(salary)
  FROM employees
  WHERE department = 'Engineering';
  -- Result: 3 employees, total salary = $300,000

-- Transaction T2: HR adds a new engineer
BEGIN TRANSACTION; -- T2
  INSERT INTO employees (name, department, salary)
  VALUES ('New Hire', 'Engineering', 80000);
COMMIT; -- T2

-- Transaction T1 continues: Verify budget against company limit
  -- Step 2: Check individual salaries for policy compliance
  SELECT name, salary
  FROM employees
  WHERE department = 'Engineering';
  -- Result: 4 rows! (New Hire appears as PHANTOM)

  -- Now T1 is confused:
  -- Step 1 said 3 employees totaling $300,000
  -- Step 2 shows 4 employees
  -- If T1 re-sums: 4 employees totaling $380,000
  -- Budget calculations are inconsistent within the same transaction!
COMMIT; -- T1

-- This could cause:
-- - Incorrect budget approvals/rejections
-- - Wrong per-employee calculations (300K/3 vs 380K/4)
-- - Reporting discrepancies
```

Why Phantom Reads Are Particularly Tricky:
Phantom reads are harder to prevent than other anomalies because they involve predicates, not just data items: locking the rows a query returned does nothing to stop a new row from appearing that matches the same predicate.
Preventing phantoms typically requires either predicate locking (locking the condition, not just rows) or serializable snapshot isolation (ensuring the transaction sees a fixed point-in-time view).
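The budget scenario can be replayed with Python's sqlite3 (a single-connection sketch, so the interleaving is simulated by statement order rather than true concurrency): the same predicate query returns an extra, phantom row after the intervening insert.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
con.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                [("Ann", "Engineering", 100000),
                 ("Ben", "Engineering", 100000),
                 ("Cam", "Engineering", 100000)])

# T1, step 1: count and sum rows matching the predicate.
count1, total1 = con.execute(
    "SELECT COUNT(*), SUM(salary) FROM employees WHERE department = 'Engineering'"
).fetchone()   # (3, 300000)

# T2: a new engineer is inserted and committed in between.
con.execute("INSERT INTO employees VALUES ('New Hire', 'Engineering', 80000)")

# T1, step 2: the same predicate now matches a phantom fourth row.
count2, total2 = con.execute(
    "SELECT COUNT(*), SUM(salary) FROM employees WHERE department = 'Engineering'"
).fetchone()   # (4, 380000)

print(count1, total1, count2, total2)
```

Note that no row T1 originally read was modified; only the membership of the predicate's result set changed, which is exactly what row-level locks fail to protect.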
| Query Type | Phantom Trigger | Consequence |
|---|---|---|
| COUNT(*) WHERE ... | INSERT matching WHERE | Count becomes wrong mid-transaction |
| SUM/AVG WHERE ... | INSERT/DELETE matching WHERE | Aggregates become inconsistent |
| SELECT WHERE ... FOR UPDATE | INSERT matching WHERE | Missed rows in update operation |
| Unique constraint check | Concurrent INSERT | Duplicate key despite check |
Non-repeatable reads change the value of existing rows; phantom reads change which rows exist. If you lock the rows you've read, you prevent non-repeatable reads but not phantoms—a new row can still appear. Preventing phantoms requires locking the 'gaps' where new rows could be inserted.
The four classic concurrency problems are not independent phenomena—they represent different manifestations of a single underlying issue: the violation of serializability. Understanding their relationships helps in selecting appropriate concurrency control mechanisms.
The Severity Hierarchy:
The problems can be ranked by severity, and the ranking mirrors the isolation levels that prevent them: dirty reads are the most severe (permitted only at Read Uncommitted), followed by non-repeatable reads (permitted below Repeatable Read), with phantom reads the least severe (permitted at everything below Serializable).
Preventing higher-severity problems generally also prevents lower-severity ones (preventing dirty reads automatically prevents non-repeatable reads of uncommitted data).
The Lost Update Special Case:
Lost updates sit somewhat outside the standard hierarchy. They're caused by the read-modify-write pattern without proper synchronization, and different isolation levels handle them differently: locking implementations block the second writer until the first commits, while snapshot-based implementations typically abort the later writer with a serialization failure.
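One widely used application-level defense is optimistic versioning, sketched below with Python's sqlite3 (illustrative; the version column and retry convention are assumptions, not from the text). Each UPDATE's WHERE clause requires the version that was read, so a stale write matches zero rows instead of silently clobbering:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
con.execute("INSERT INTO accounts VALUES (1, 1000, 0)")

# Both "transactions" read balance 1000 at version 0.
t1_bal, t1_ver = con.execute("SELECT balance, version FROM accounts WHERE id = 1").fetchone()
t2_bal, t2_ver = con.execute("SELECT balance, version FROM accounts WHERE id = 1").fetchone()

# T1 writes first; its version check passes and the version advances.
cur = con.execute(
    "UPDATE accounts SET balance = ?, version = version + 1 WHERE id = 1 AND version = ?",
    (t1_bal - 100, t1_ver))
t1_applied = cur.rowcount   # 1 row updated

# T2's write carries the stale version 0, so it matches nothing.
cur = con.execute(
    "UPDATE accounts SET balance = ?, version = version + 1 WHERE id = 1 AND version = ?",
    (t2_bal - 200, t2_ver))
t2_applied = cur.rowcount   # 0 rows -- T2 must re-read and retry

balance = con.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(t1_applied, t2_applied, balance)   # 1 0 900
```

The lost update is converted into a detectable retry: T2's deduction is not silently dropped, it simply has not happened yet, and a re-read would start from the fresh balance of 900.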
Practical Trade-offs:
Higher isolation prevents more problems but costs more in terms of performance:
| Isolation Level | Problems Prevented | Performance Impact |
|---|---|---|
| Read Uncommitted | None | Minimal overhead |
| Read Committed | Dirty Read | Low overhead |
| Repeatable Read | Dirty Read, Non-Repeatable Read | Moderate overhead |
| Serializable | All | Highest overhead |
Most applications use Read Committed as a reasonable balance. Critical operations may escalate to Serializable for specific transactions.
Beyond the classic four, Write Skew occurs when two transactions read overlapping data, make decisions based on it, and then write to non-overlapping data in ways that would be invalid if they saw each other's writes. Example: Two doctors each check that a hospital has at least one doctor on call, then both go off-call—leaving no one on call. This requires serializable isolation to prevent.
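The on-call scenario reduces to a small simulation (illustrative Python, not database code): both "transactions" validate the invariant against the same snapshot, then write disjoint rows that jointly break it.

```python
# Write skew sketch: both transactions check an invariant ("at least one
# doctor on call") that holds when they read, then each updates a
# different row, jointly breaking it.

on_call = {"alice": True, "bob": True}

# T1 (Alice) reads: two doctors are on call, so it is safe to leave.
t1_count = sum(on_call.values())   # 2

# T2 (Bob) reads the same snapshot before either writes.
t2_count = sum(on_call.values())   # 2

# Both act on their individually valid reads -- disjoint writes.
if t1_count >= 2:
    on_call["alice"] = False
if t2_count >= 2:
    on_call["bob"] = False

print(sum(on_call.values()))       # 0 -- no one is on call; invariant broken
```

Because the two writes touch different rows, no write-write conflict exists for a lock or version check to catch; only serializable isolation, which accounts for the read dependency between the transactions, rejects this schedule.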
We have surveyed the landscape of problems that arise from concurrent transaction execution. These problems represent the price of concurrency—challenges that must be addressed to gain the benefits described in the previous page.
What's next:
Having introduced the problems at a high level, the next page examines interleaving—the mechanism by which these problems actually manifest. We'll see exactly how the scheduler's decisions to interleave operations from different transactions can create problematic orderings, and why not all interleavings are created equal.
You now have a comprehensive overview of the concurrency problems that plague uncontrolled concurrent execution. Each problem—lost updates, dirty reads, non-repeatable reads, and phantom reads—will be explored in detail in dedicated modules. First, we'll understand the interleaving mechanism that makes these problems possible.