Throughout this module, we've established a fundamental tension in database systems: concurrent execution is essential for performance and resource utilization, yet uncontrolled interleaving of transactions can silently corrupt data.
The resolution of this tension is concurrency control—active mechanisms that manage how transactions interleave to preserve correctness while maximizing concurrency benefits.
This is not optional. A database without concurrency control is not a usable database. It's a data corruption engine that happens to be fast. Every production database system—from SQLite to Oracle to distributed systems like Spanner—implements sophisticated concurrency control. The question is never whether to control concurrency, but how.
By the end of this page, you will understand why concurrency control is non-negotiable, see real-world consequences of inadequate control, understand the core principles that underpin all concurrency control mechanisms, and preview the techniques (locking, timestamps, MVCC) that implement these principles.
Let's be absolutely clear about why passive approaches—hoping for the best, testing thoroughly, or trusting application code—are fundamentally inadequate for managing database concurrency.
Why 'Hope' Doesn't Work:
The number of possible interleavings is combinatorially explosive. For just 10 transactions with 10 operations each, there are more possible interleavings than atoms in the observable universe. Among these, the vast majority produce correct results, but some corrupt data, and nothing about unmanaged execution steers the system away from the bad ones.
Law of large numbers: At high transaction volumes, even improbable interleavings become inevitable. A 1-in-a-billion bad interleaving will occur multiple times per day on a busy system.
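To get a sense of the scale behind the interleaving count claimed above, consider the count itself. The sketch below is an illustrative Python calculation (not part of any database): operations within each transaction keep their relative order, so the number of interleavings is the multinomial coefficient 100! / (10!)^10.

```python
from math import factorial

# 10 transactions x 10 operations each; operations within a transaction keep
# their relative order, so the interleaving count is 100! / (10!)^10.
interleavings = factorial(100) // factorial(10) ** 10

print(f"{float(interleavings):.2e}")  # ~2.4e+92, versus roughly 1e80 atoms
                                      # in the observable universe
```

No test suite, and no amount of production uptime, samples more than a vanishing fraction of that space.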
Why Testing Is Insufficient:
Testing cannot guarantee the absence of concurrency bugs: each test run exercises only one scheduler-dependent interleaving out of an astronomical number, and the harmful ones may simply never occur under test conditions.
A system that passes millions of test runs may still fail in production when a rare timing condition occurs.
Why Application Logic Cannot Substitute:
Developers sometimes attempt to handle concurrency at the application level—implementing their own locking, version checking, or retry logic. This approach has fundamental problems:
```
-- EXAMPLE: Application-level concurrency control attempt
-- Developer's plan: Read current balance, check, then update
-- Implemented in application code (pseudocode):

def withdraw(account_id, amount):
    # Step 1: Read current balance
    balance = db.query("SELECT balance FROM accounts WHERE id = ?", account_id)

    # Step 2: Check if sufficient funds
    if balance >= amount:
        # Step 3: Perform withdrawal
        new_balance = balance - amount
        db.execute("UPDATE accounts SET balance = ? WHERE id = ?",
                   new_balance, account_id)
        return True
    return False

-- PROBLEM: Race condition between Step 1 and Step 3!

-- Timeline:
-- T1: withdraw(1, 100)           T2: withdraw(1, 100)
-- ---------------------          ---------------------
-- Read balance: $150
--                                Read balance: $150
-- Check: 150 >= 100? Yes
--                                Check: 150 >= 100? Yes
-- Update: balance = 50
--                                Update: balance = 50 (OVERWRITES!)

-- Result: Two $100 withdrawals succeeded against a $150 account
-- Final balance: $50 (the second withdrawal should have failed;
-- honoring both would leave -$50)
-- Lost update! Application logic did NOT prevent this.

-- The developer might try:

def withdraw_v2(account_id, amount):
    db.execute("UPDATE accounts SET balance = balance - ? "
               "WHERE id = ? AND balance >= ?",
               amount, account_id, amount)
    return db.rows_affected > 0

-- This is better but still has issues:
-- - Other operations (reads, calculations) remain unprotected
-- - Complex operations spanning multiple tables are still vulnerable
-- - The developer must remember the pattern everywhere
-- - Still no guarantee of isolation for multi-step transactions
```

Application-level checks can complement database concurrency control but cannot replace it. The database remains the ultimate arbiter of data integrity because it sees all access—from all applications, all users, all paths. Relying solely on application logic creates gaps that will eventually be exploited by bugs, direct database access, or malicious actors.
When concurrency control fails—either through bugs in the database system or misconfigured isolation levels—the consequences can be severe. Let's examine real-world impact categories.
Financial Data Corruption:
Banking and payment systems are particularly vulnerable: lost updates can let two withdrawals both succeed against the same funds, dirty reads can expose transfers that are later rolled back, and non-serializable balance checks can allow overdrafts.
Even small errors compound: a $0.01 error occurring 10 million times per day is $100,000 daily loss.
| Domain | Scenario | Consequence | Root Cause |
|---|---|---|---|
| E-commerce | Inventory oversell | Orders placed for unavailable items | Lost update on stock count |
| Banking | Double withdrawal | Account overdrafts, bank loss | Non-serializable balance check |
| Healthcare | Duplicate medication order | Patient safety risk | Phantom read in order list |
| Ticketing | Double booking | Sold same seat twice | Lost update on availability |
| Trading | Position overcounting | Regulatory violations | Dirty read of pending trades |
Data Integrity Violations:
Beyond financial impact, concurrency failures can violate fundamental data integrity: invariants that span multiple rows or tables (totals matching their detail rows, "exactly one active record per customer") can be silently broken, and audit histories can end up recording a sequence of events that never actually occurred.
The Compounding Effect:
Concurrency bugs rarely occur in isolation. One corrupted record affects every query that reads it: derived values inherit the error, reports and caches propagate it, and downstream decisions are made on bad data.
The most dangerous concurrency failures are silent ones—data corruption that isn't immediately detected. Unlike crashes (which are obvious), corrupted data may propagate for days or weeks before being discovered, by which point recovery may be impossible. This is why prevention through proper concurrency control is infinitely better than detection after the fact.
Given the impossibility of testing all interleavings and the inadequacy of application-level control, how can we guarantee correctness? The answer lies in the serializability principle—the foundational concept of database concurrency control.
The Serializability Guarantee:
A concurrent execution is serializable if its outcome (final database state and all values read by transactions) is identical to some serial execution of the same transactions.
In other words: the transactions may have executed concurrently with interleaved operations, but the result is indistinguishable from running them one at a time in some order.
Why Serializability Works:
```
-- Understanding serializability

-- Three transactions execute concurrently:
-- T1: Credit Account A with $100
-- T2: Debit Account A with $50
-- T3: Read and display Account A balance

-- Serial executions possible (6 orderings for 3 transactions):
-- S1: T1, T2, T3 → A +100 -50 = A +50, T3 reads A+50
-- S2: T1, T3, T2 → A +100, T3 reads A+100, then -50
-- S3: T2, T1, T3 → A -50 +100 = A +50, T3 reads A+50
-- S4: T2, T3, T1 → A -50, T3 reads A-50, then +100
-- S5: T3, T1, T2 → T3 reads original A, then +100 -50
-- S6: T3, T2, T1 → T3 reads original A, then -50 +100

-- A CONCURRENT EXECUTION is serializable if it produces a result
-- identical to one of S1-S6 (any one will do).

-- Example concurrent execution (interleaved):
-- r3(A) r1(A) w1(A) r2(A) w2(A)
--
-- Analysis:
-- T3 reads the original A (before any changes)
-- T1 then T2 modify A
--
-- Is this equivalent to some serial schedule?
-- Matches S5: T3, T1, T2 (it would match S6 only if T2 had read before T1's write)
--
-- YES - this concurrent execution is SERIALIZABLE!

-- Example NON-serializable execution:
-- r1(A) r2(A) w1(A) w2(A) r3(A)
--
-- T1 and T2 both read the original A, then both write based on that value.
-- T1's write is overwritten by T2's write (LOST UPDATE!)
-- This outcome doesn't match any of S1-S6.
--
-- NO - this concurrent execution is NOT serializable.
```

The Practical Implication:
Concurrency control mechanisms are designed to ensure that whatever interleaving naturally occurs will be serializable. They do this by constraining which interleavings are allowed: blocking conflicting operations until it is safe to proceed (locking), ordering operations by transaction timestamps, or validating at commit time that no conflict occurred.
As long as the mechanism guarantees serializability, programmers can write transactions as if they execute in isolation. The database handles the complexity of making concurrent execution safe.
Serializability is a powerful abstraction: it allows programmers to reason about transactions individually, assuming isolation, while the database maintains this illusion across concurrent executions. This separation of concerns is what makes complex database applications feasible to write correctly.
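One practical consequence: a database enforcing serializability may abort one of two conflicting transactions rather than let a bad interleaving complete, so applications pair "write each transaction as if it runs alone" with a simple retry loop. Below is a minimal sketch in the same pseudocode style as the earlier withdraw examples; the `db` helper and the exception name are illustrative assumptions, not a specific driver's API.

```python
class SerializationFailure(Exception):
    """Assumed to be raised by the db helper when the database aborts a
    transaction to preserve serializability (SQLSTATE 40001 in PostgreSQL)."""

def transfer(db, src, dst, amount, max_retries=3):
    for _ in range(max_retries):
        try:
            db.execute("BEGIN ISOLATION LEVEL SERIALIZABLE")
            db.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                       amount, src)
            db.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                       amount, dst)
            db.execute("COMMIT")
            return True
        except SerializationFailure:
            db.execute("ROLLBACK")  # discard the aborted attempt and try again
    return False
```

Note that the transaction body contains no concurrency logic at all; the database decides whether the interleaving was safe.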
While serializability is the gold standard for correctness, it comes with performance costs. Database systems offer isolation levels—different degrees of concurrency control with different performance and correctness trade-offs.
The SQL Standard Isolation Levels:
In order of increasing strictness (and typically decreasing performance), the SQL standard defines Read Uncommitted, Read Committed, Repeatable Read, and Serializable, summarized in the table below.
Applications choose isolation levels based on their tolerance for anomalies versus their need for performance.
| Level | Dirty Read | Non-Repeatable Read | Phantom | Typical Use Case |
|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Approximate analytics, non-critical dashboards |
| Read Committed | Prevented | Possible | Possible | Most OLTP applications (default) |
| Repeatable Read | Prevented | Prevented | Possible | Reports requiring consistent view |
| Serializable | Prevented | Prevented | Prevented | Financial transactions, safety-critical systems |
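To make one row of the table concrete, here is how a non-repeatable read appears at Read Committed and disappears at Repeatable Read, again in the pseudocode style of the earlier examples (the `db` helper is an illustrative assumption):

```python
def balance_report(db):
    # At READ COMMITTED, each statement sees the latest committed data,
    # so the two reads below can disagree if another session commits a
    # change to account 1 between them.
    db.execute("BEGIN ISOLATION LEVEL READ COMMITTED")
    first = db.query("SELECT balance FROM accounts WHERE id = 1")
    # ... another session commits: UPDATE accounts SET balance = balance - 100
    second = db.query("SELECT balance FROM accounts WHERE id = 1")
    db.execute("COMMIT")
    return first, second  # may differ: a non-repeatable read

# At REPEATABLE READ or SERIALIZABLE, the whole transaction reads from one
# consistent snapshot, so both reads return the same value.
```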
Choosing the Right Level:
The choice of isolation level comes down to understanding your application's requirements: which anomalies it can tolerate, which operations are correctness-critical, and how much concurrency it must sustain.
Mixed Levels:
Advanced applications may use different isolation levels for different transactions: Serializable for money movement or inventory reservation, Read Committed for dashboards and routine reads.
This provides appropriate protection where needed while minimizing performance impact.
```sql
-- Setting isolation level in SQL

-- PostgreSQL: Set for the current transaction
BEGIN;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Critical operations here
COMMIT;

-- MySQL: Set for the session
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

-- SQL Server: Set for the connection (remains in effect until changed)
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN TRANSACTION;
-- Operations here
COMMIT;

-- Oracle: supports "Read Committed" and "Serializable" only
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Checking the current isolation level:

-- PostgreSQL
SHOW TRANSACTION ISOLATION LEVEL;

-- MySQL
SELECT @@transaction_isolation;

-- SQL Server
DBCC USEROPTIONS;

-- Best practice: explicitly set the isolation level when it matters
-- rather than relying on defaults, which vary by database.
```

Default isolation levels differ by database: PostgreSQL and SQL Server default to Read Committed; MySQL/InnoDB defaults to Repeatable Read; Oracle defaults to Read Committed. Know your database's default and explicitly set a higher level when needed for critical operations.
Database systems implement concurrency control through several fundamental mechanisms. Each has trade-offs in terms of performance, complexity, and the types of anomalies they prevent.
The Major Concurrency Control Mechanisms:
Locking (Pessimistic Control): transactions acquire shared locks to read and exclusive locks to write; conflicting transactions wait until the lock is released.
Timestamp Ordering (Optimistic/Deterministic): each transaction is assigned a timestamp, and operations are accepted only if they are consistent with timestamp order; violating transactions are aborted and restarted.
Multi-Version Concurrency Control (MVCC): writers create new versions of data instead of overwriting in place, so readers see a consistent snapshot without blocking writers, and vice versa.
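The difference between the pessimistic and optimistic ends of this spectrum is easiest to see side by side. Here is a sketch in the pseudocode style used earlier; the `db` helper and the `version` column are illustrative assumptions, not a particular database's API.

```python
# Pessimistic (locking): lock the row up front, so no conflicting write can
# sneak in between the read and the update.
def withdraw_pessimistic(db, account_id, amount):
    db.execute("BEGIN")
    balance = db.query("SELECT balance FROM accounts WHERE id = ? FOR UPDATE",
                       account_id)               # blocks other writers
    if balance < amount:
        db.execute("ROLLBACK")
        return False
    db.execute("UPDATE accounts SET balance = ? WHERE id = ?",
               balance - amount, account_id)
    db.execute("COMMIT")
    return True

# Optimistic (validation): read without locking, then make the update
# conditional on nothing having changed in the meantime.
def withdraw_optimistic(db, account_id, amount):
    row = db.query("SELECT balance, version FROM accounts WHERE id = ?",
                   account_id)
    if row.balance < amount:
        return False
    db.execute("UPDATE accounts SET balance = ?, version = version + 1 "
               "WHERE id = ? AND version = ?",
               row.balance - amount, account_id, row.version)
    return db.rows_affected > 0   # 0 rows means another transaction got there first
```

The pessimistic version trades waiting for certainty; the optimistic version never waits but must be prepared to find that zero rows were updated and retry.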
Trade-offs Between Mechanisms:
| Mechanism | Strengths | Weaknesses | Best For |
|---|---|---|---|
| Locking | Proven, predictable | Blocking, deadlocks | Mixed workloads |
| Timestamp | No deadlocks | High abort rate under contention | Distributed systems |
| MVCC | Readers don't block | Storage overhead, garbage collection | Read-heavy workloads |
| Optimistic | Maximum concurrency | Wasted work on abort | Low contention |
Modern databases often combine mechanisms: PostgreSQL and MySQL/InnoDB, for example, use MVCC so that readers never block while still taking row-level locks for writes, and PostgreSQL layers optimistic validation on top (Serializable Snapshot Isolation) to achieve full serializability.
There is no universally 'best' concurrency control mechanism. The right choice depends on workload characteristics (read vs. write ratio), contention patterns (hot spots vs. distributed access), consistency requirements (full serializability vs. acceptable anomalies), and performance goals (throughput vs. latency).
Concurrency control has costs—reduced throughput, increased latency, higher resource usage. Understanding these costs helps in making informed decisions about isolation levels and mechanism choices.
Costs of Concurrency Control: lock management and validation overhead, time spent blocked waiting for other transactions, deadlock detection and resolution, extra storage and cleanup for old versions (MVCC), and work thrown away when transactions abort.
Benefits of Concurrency Control: results that are correct under any interleaving, integrity that holds even under heavy load, application code that can be written as if each transaction runs alone, and failures that surface as clean aborts rather than silent corruption.
```sql
-- Observing concurrency control impact (PostgreSQL example)

-- Check sessions currently waiting on locks
SELECT * FROM pg_stat_activity WHERE wait_event_type = 'Lock';

-- View lock waits: who is blocked, and by whom
SELECT blocked_locks.pid          AS blocked_pid,
       blocked_activity.usename   AS blocked_user,
       blocking_locks.pid         AS blocking_pid,
       blocking_activity.usename  AS blocking_user,
       blocked_activity.query     AS blocked_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
  ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
  ON blocking_locks.locktype = blocked_locks.locktype
 AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
 AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
 AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
  ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

-- Check for deadlock occurrences
SELECT * FROM pg_stat_database WHERE deadlocks > 0;

-- MVCC bloat monitoring (dead row versions awaiting vacuum)
SELECT relname,
       n_live_tup,
       n_dead_tup,
       round(n_dead_tup * 100.0 / nullif(n_live_tup + n_dead_tup, 0), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 0
ORDER BY n_dead_tup DESC;

-- Serialization failure rate (for Serializable isolation)
SELECT setting FROM pg_settings WHERE name = 'default_transaction_isolation';
-- Then check application logs for "could not serialize access" errors
```

The 'right' amount of concurrency control depends on your requirements. Over-controlling (unnecessarily strict isolation) wastes performance. Under-controlling (too-weak isolation) risks data corruption. The goal is the minimum control level that meets your correctness requirements.
This module has established the foundational understanding of why concurrency control exists and matters. The subsequent modules in this chapter dive deep into each specific concurrency problem, followed by chapters covering the mechanisms that solve them.
The Learning Path Ahead:
This Chapter (Concurrency Problems): the remaining modules examine the lost update, dirty read, non-repeatable read, and phantom read anomalies, then the isolation levels that control them.
Next Chapters: the mechanisms that solve these problems, first lock-based protocols, then timestamp ordering and MVCC, as mapped in the table below.
| Topic | Where Covered | Dependencies |
|---|---|---|
| Lost Update details | Module 2 | This module |
| Dirty Read details | Module 3 | This module |
| Non-Repeatable Read details | Module 4 | This module |
| Phantom Read details | Module 5 | This module |
| Isolation Levels | Module 6 | Modules 2-5 |
| Lock-based solutions | Chapter 24 | This chapter |
| Timestamp/MVCC solutions | Chapter 25 | This chapter |
How to Study This Material:
Master each problem individually: Understand the specific scenario, mechanism, and consequences of each concurrency anomaly.
Relate problems to isolation levels: Learn which isolation levels prevent which problems and why.
Connect problems to solutions: When studying locking and MVCC later, connect back to which problems each mechanism solves.
Practice with examples: Work through interleaving scenarios to build intuition.
Think in producers and consumers: Concurrency problems occur when one transaction produces data that another consumes—understanding this flow helps identify risks.
The goal isn't to memorize definitions but to build mental models that let you recognize potential concurrency issues in real application code. As you learn each problem and solution, think about how they'd manifest in systems you work with—banking apps, e-commerce, booking systems, etc.
We have completed our foundational examination of concurrency in database systems. This module has established the vocabulary, concepts, and motivation that underpin all subsequent study of concurrency control.
The Big Picture:
Concurrency control is one of the most sophisticated aspects of database systems, decades in development and the subject of ongoing research. Yet its purpose is simple: allow the benefits of concurrent execution while preventing the problems concurrent execution can cause.
As you proceed through the detailed study of each concurrency problem and the mechanisms that solve them, keep this purpose in mind. Every lock acquired, every version maintained, every validation check performed exists to ensure that your data remains correct even when hundreds or thousands of transactions access it simultaneously.
You now have the foundation to understand why these mechanisms exist and what they're trying to achieve.
Congratulations! You have completed Module 1: Concurrency Overview of Chapter 23. You now understand why concurrent execution is both essential and dangerous, how interleaving creates the potential for problems, and why active concurrency control is non-negotiable. The following modules will examine each concurrency problem in detail, building the complete picture of what database systems must protect against.