Imagine a global e-commerce platform during a flash sale. In a single second, thousands of customers are simultaneously browsing products, adding items to carts, placing orders, and making payments. Behind the scenes, each of these operations translates into database transactions—operations that read, write, and modify data. If these transactions were forced to execute one at a time, in strict sequential order, the system would crawl to a halt, and customers would abandon their carts in frustration.
This is where concurrent execution becomes not just beneficial, but absolutely essential.
Concurrent execution allows a database management system (DBMS) to process multiple transactions at the same time, dramatically improving throughput and responsiveness. But this power comes with significant complexity: when transactions execute concurrently and access shared data, the potential for data corruption, inconsistency, and lost updates becomes very real.
This page explores the fundamental concept of concurrent execution—what it means, how it works at a technical level, and why understanding it is crucial for anyone working with database systems.
By the end of this page, you will understand what concurrent execution means in database systems, how transactions can execute in parallel, the distinction between true parallelism and interleaved execution, the components that enable concurrent processing, and the fundamental challenges that arise when transactions share resources.
At its core, concurrent execution refers to the ability of a database management system to handle multiple transactions simultaneously, rather than processing them one after another in strict sequence. This concept is fundamental to how modern databases achieve the performance and responsiveness that applications demand.
The formal definition:
Concurrent execution occurs when two or more transactions have overlapping execution periods—meaning that while one transaction is still in progress (has executed BEGIN but not yet COMMIT or ROLLBACK), another transaction begins execution. The transactions' operations may be interleaved in various ways, depending on the scheduling decisions made by the DBMS.
While often used interchangeably in casual conversation, concurrency and parallelism are distinct concepts. Concurrency means multiple transactions are in progress at the same time—their executions overlap temporally. Parallelism means multiple operations are executing at literally the same instant using multiple processors or cores. A single-core system can achieve concurrency through interleaving but cannot achieve true parallelism. Modern multi-core systems can achieve both.
Understanding the contrast with serial execution:
To appreciate concurrent execution, we must first understand its opposite: serial execution. In serial execution, transactions run one at a time, one after another, with no overlap. A transaction T₂ cannot begin until transaction T₁ has completely finished (committed or rolled back).
Consider two simple transactions, T₁ and T₂, each of which reads and then writes the same data item x.
In serial execution, every operation of T₁ completes before any operation of T₂ begins (or vice versa).
In concurrent execution, operations from T₁ and T₂ can be interleaved, as the sketch below illustrates.
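A minimal sketch of the contrast, using the r/w notation that appears later on this page (the specific operations are illustrative, not tied to any particular application):

```sql
-- Two simple transactions on a shared data item x:
-- T1: r1(x) w1(x)    [read x, then write x]
-- T2: r2(x) w2(x)    [read x, then write x]

-- Serial execution (T1 then T2): no overlap at all
--   r1(x) w1(x) r2(x) w2(x)
--   T2 begins only after T1 has committed

-- Concurrent execution: the scheduler interleaves operations
--   r1(x) r2(x) w1(x) w2(x)
--   T2 starts while T1 is still in progress; both are active at once
```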
| Aspect | Serial Execution | Concurrent Execution |
|---|---|---|
| Transaction overlap | None—one transaction at a time | Multiple transactions active simultaneously |
| Resource utilization | Often poor—waiting for I/O wastes CPU | High—other transactions use CPU during I/O waits |
| Throughput | Limited by slowest transaction | Significantly higher due to overlap |
| Response time | Long waits for users | Generally shorter, more responsive |
| Complexity | Simple—no conflicts possible | Complex—requires concurrency control |
| Correctness | Guaranteed by isolation | Requires explicit mechanisms to ensure |
The need for concurrent execution in database systems stems from fundamental realities of computer architecture, application requirements, and user expectations. Understanding these drivers helps explain why every modern DBMS is designed with concurrency as a core capability.
The CPU-I/O speed disparity:
The primary technical motivation for concurrency is the enormous speed gap between processors and storage devices. A modern CPU can execute billions of operations per second, while even the fastest NVMe SSDs have latencies measured in microseconds for individual operations. Mechanical hard drives are orders of magnitude slower still.
When a transaction performs a disk read, the CPU would sit idle for hundreds of thousands of cycles if it simply waited. Concurrent execution allows the DBMS to schedule another transaction's operations on the CPU during this wait, dramatically improving overall system utilization.
```sql
-- Transaction T1: Update customer record
-- This requires disk I/O to fetch and write the record

BEGIN TRANSACTION;

-- Step 1: Read customer record (disk I/O: ~100 microseconds for SSD)
SELECT * FROM customers WHERE customer_id = 1001;

-- Step 2: Process in memory (CPU: ~1 microsecond)
-- Application logic here...

-- Step 3: Write updated record (disk I/O: ~100 microseconds)
UPDATE customers SET balance = balance - 50 WHERE customer_id = 1001;

COMMIT;

-- During the ~200 microseconds of I/O wait, the CPU could execute
-- approximately 200,000 to 2,000,000 instructions!
-- This is wasted potential in pure serial execution.

-- With concurrent execution:
-- While T1 waits for disk I/O, T2, T3, T4... can use the CPU
-- System throughput increases dramatically
```

Quantifying the throughput improvement:
Consider a workload where each transaction spends 80% of its time waiting for I/O and 20% actually using the CPU (this ratio is common for database transactions). In serial execution, the CPU is utilized only 20% of the time.
With concurrent execution allowing 5 transactions to overlap, the CPU can work on one transaction while the other four wait for I/O. Utilization rises toward 5 × 20% = 100%, and throughput improves by roughly a factor of five.
This is the fundamental performance argument for concurrency: exploiting I/O latency to keep the CPU productive.
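As a quick sanity check of that arithmetic (a sketch only, using the 80/20 split and five-way overlap assumed above):

```sql
-- Back-of-the-envelope check of the utilization argument above
SELECT
    0.20               AS serial_cpu_utilization,      -- serial: CPU busy 20% of the time
    5 * 0.20           AS concurrent_cpu_utilization,  -- 5 overlapping transactions: ~100%
    (5 * 0.20) / 0.20  AS approx_throughput_gain;      -- roughly 5x the serial throughput
```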
Consider a banking system processing 10,000 transactions per second during peak hours. If each transaction averages 50 milliseconds of CPU + I/O time, serial execution could handle at most 20 transactions per second (1/0.05). Concurrent execution is the only way to achieve the required 10,000 TPS—a 500x improvement.
Understanding the mechanics of concurrent execution requires examining how a DBMS manages multiple transactions, the components involved, and the lifecycle of concurrent operations. This knowledge forms the foundation for understanding concurrency control mechanisms covered in later modules.
The Transaction Scheduler:
At the heart of concurrent execution is the transaction scheduler (also called the scheduler or concurrency control manager). This component determines the order in which operations from different transactions are executed. The scheduler's decisions create what is called a schedule—a specific ordering of operations from multiple transactions.
The scheduler must balance two often-conflicting goals:

- Performance: interleave operations as freely as possible so that transactions overlap and system resources stay busy.
- Correctness: permit only those interleavings that leave the database consistent, that is, schedules equivalent to some serial execution.
The lifecycle of a concurrent transaction:
When multiple transactions execute concurrently, each follows this lifecycle while potentially overlapping with others:
1. Start: the transaction begins (explicit BEGIN or implicit first statement).
2. Active: the transaction issues read and write operations; the scheduler may interleave these with operations from other active transactions.
3. End: the transaction finishes with COMMIT (making its changes permanent) or ROLLBACK (undoing them).

```sql
-- Timeline showing two concurrent transactions
-- Time flows downward; T1 and T2 operations are interleaved

--        T1: Bank Transfer          T2: Balance Inquiry
--        =====================      =====================
-- t0:    BEGIN TRANSACTION;
-- t1:                               BEGIN TRANSACTION;
-- t2:    READ(Account_A);
-- t3:                               READ(Account_A);      -- sees A before the deduction
-- t4:    A = A - 100;
-- t5:    WRITE(Account_A);
-- t6:    READ(Account_B);
-- t7:    B = B + 100;
-- t8:    WRITE(Account_B);
-- t9:    COMMIT;
-- t10:                              READ(Account_B);      -- sees B after the credit
-- t11:                              READ(Account_C);
-- t12:                              total = A + B + C;
-- t13:                              COMMIT;

-- Both transactions are "in progress" from their BEGIN until COMMIT
-- Operations are interleaved based on scheduler decisions
-- Notice: T2 read A before T1's write, but read B after T1's write
-- This interleaving could produce inconsistent results!
```

The example above illustrates a critical problem: T2's balance calculation might be incorrect because it read Account A before T1's deduction but Account B after T1's credit. The 'total' would appear $100 higher than it actually is! This is why concurrent execution, while necessary, requires sophisticated control mechanisms.
The concept of a schedule is central to understanding and reasoning about concurrent execution. A schedule is a formal representation of how operations from multiple transactions are ordered in time.
Formal Definition:
A schedule S for a set of transactions T₁, T₂, ..., Tₙ is a total ordering of all the operations in these transactions that preserves the ordering of operations within each individual transaction.
In other words:

- Operations belonging to the same transaction appear in the schedule in exactly the order that transaction issued them.
- Operations from different transactions may be interleaved in any order, subject to that constraint.
- Every operation of every transaction appears exactly once in the schedule.

The short example below shows a legal and an illegal ordering.
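For instance, if T₁ issues r₁(x) before w₁(x), no legal schedule may reverse them (a small illustrative sketch):

```sql
-- T1 issues: r1(x) then w1(x)
-- T2 issues: r2(x) then w2(x)

-- A legal schedule: per-transaction order is preserved
--   r1(x) r2(x) w1(x) w2(x)     -- r1 before w1, r2 before w2

-- NOT a legal schedule: w1(x) appears before r1(x)
--   w1(x) r1(x) r2(x) w2(x)     -- violates T1's own ordering
```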
Types of Schedules:
Serial Schedules: All operations of one transaction complete before any operation of another transaction begins. For n transactions, there are n! possible serial schedules.
Non-serial (Concurrent) Schedules: Operations from different transactions are interleaved. The number of possible non-serial schedules is astronomically larger than serial schedules.
The goal of concurrency control is to allow only those non-serial schedules that produce results equivalent to some serial schedule—these are called serializable schedules.
```sql
-- Consider two transactions:
-- T1: r1(x) w1(x)   [read x, then write x]
-- T2: r2(x) w2(x)   [read x, then write x]

-- Serial Schedules (only 2 possible):
-- S1: r1(x) w1(x) r2(x) w2(x)   -- T1 then T2
-- S2: r2(x) w2(x) r1(x) w1(x)   -- T2 then T1

-- Non-serial Schedules (many possible):
-- S3: r1(x) r2(x) w1(x) w2(x)   -- Interleaved
-- S4: r1(x) r2(x) w2(x) w1(x)   -- Interleaved differently
-- S5: r2(x) r1(x) w1(x) w2(x)   -- Another interleaving
-- ... many more possible

-- Key question: Which non-serial schedules are "safe"?
-- A schedule is safe (serializable) if it produces the same
-- result as some serial schedule.

-- Example analysis of S3: r1(x) r2(x) w1(x) w2(x)
-- T1 reads x (gets initial value)
-- T2 reads x (gets same initial value, T1 hasn't written yet)
-- T1 writes x (based on initial value)
-- T2 writes x (based on initial value, overwrites T1's write!)
-- This is the LOST UPDATE problem - T1's update is lost!
-- S3 is NOT equivalent to either S1 or S2 - it's not serializable
```

| Schedule Type | Definition | Properties | Allowed? |
|---|---|---|---|
| Serial | No interleaving of operations between transactions | Always correct; poor performance | Always |
| Serializable | Equivalent to some serial schedule | Correct; allows concurrency benefits | Yes |
| Non-serializable | Not equivalent to any serial schedule | May produce incorrect results | Must be prevented |
The scheduler must decide, in real-time without knowing future operations, whether to allow an operation to proceed. It cannot wait to see the complete schedule before deciding—it must make incremental decisions that guarantee no non-serializable schedule can result. This is the core challenge of concurrency control.
When transactions execute concurrently, they access shared data items. Understanding the patterns of these accesses—particularly which combinations can cause problems—is essential for designing and using concurrency control mechanisms effectively.
The fundamental operations:
From a concurrency perspective, all database operations reduce to two fundamental types:

- Read(x): retrieve the current value of data item x without changing it.
- Write(x): modify the value of data item x.
Every SQL statement ultimately translates to some combination of reads and writes on specific data items (rows, pages, tables, etc.).
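For example (the accounts table and its columns are hypothetical), a typical UPDATE is a read followed by a write of the same row, while a SELECT is reads only:

```sql
-- UPDATE = read(row) followed by write(row)
UPDATE accounts
SET    balance = balance - 50      -- read the current balance, then write the new one
WHERE  account_id = 1001;

-- SELECT = read(row) only; nothing is written
SELECT balance
FROM   accounts
WHERE  account_id = 1001;
```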
| Operation 1 (T₁) | Operation 2 (T₂) | Conflict? | Why? |
|---|---|---|---|
| Read(x) | Read(x) | No Conflict | Both are reading; neither changes the data |
| Read(x) | Write(x) | Conflict | T₁'s read may see the value either before or after T₂'s write |
| Write(x) | Read(x) | Conflict | T₂'s read may see the value either before or after T₁'s write |
| Write(x) | Write(x) | Conflict | Final value depends on which write happens last |
Understanding conflicts:
Two operations conflict if:

- They belong to different transactions,
- They access the same data item, and
- At least one of them is a write.
Conflicting operations are significant because their relative order affects the final result. If we swap the order of two conflicting operations, we might get a different database state or different values read by transactions.
```sql
-- Example: Analyzing conflicts in a schedule
-- T1: r1(x) w1(x) r1(y) w1(y)
-- T2: r2(x) r2(y) w2(y)

-- Consider schedule: r1(x) r2(x) w1(x) r1(y) r2(y) w1(y) w2(y)

-- Identifying all pairs of conflicting operations:

-- On data item x:
-- r1(x) and r2(x): No conflict (both reads)
-- r1(x) and w1(x): No conflict (same transaction)
-- r2(x) and w1(x): CONFLICT! (different transactions, one is a write)

-- On data item y:
-- r1(y) and r2(y): No conflict (both reads)
-- r1(y) and w1(y): No conflict (same transaction)
-- r1(y) and w2(y): CONFLICT! (different transactions, one is a write)
-- r2(y) and w1(y): CONFLICT! (different transactions, one is a write)
-- r2(y) and w2(y): No conflict (same transaction)
-- w1(y) and w2(y): CONFLICT! (different transactions, both writes)

-- Total conflicts: 4 conflicting pairs
-- These conflicts constrain which schedules are serializable
```

Since reads don't conflict with other reads, workloads dominated by read operations can achieve much higher concurrency than write-heavy workloads. This is why many systems implement read replicas for scaling reads, while writes must go through a single primary to manage write-write conflicts.
Modern database systems implement concurrency at multiple levels, from coarse-grained parallelism across queries to fine-grained parallelism within single operations. Understanding these levels helps in appreciating where concurrent execution occurs and how it impacts system behavior.
Inter-query Concurrency:
This is the classic form of database concurrency—different queries or transactions from different users executing simultaneously. When User A runs a SELECT and User B runs an UPDATE at the same time, this is inter-query concurrency. This is the primary focus of concurrency control mechanisms like locking and MVCC.
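For example (the inventory table and its columns are hypothetical), two statements issued at the same moment from different client connections are handled as concurrent transactions:

```sql
-- Session A (User A): a read-only query
SELECT product_id, stock_qty
FROM   inventory
WHERE  warehouse_id = 7;

-- Session B (User B): an update issued at the same time from another connection
UPDATE inventory
SET    stock_qty = stock_qty - 1
WHERE  product_id = 42 AND warehouse_id = 7;

-- Both statements can be in flight simultaneously; the DBMS's concurrency
-- control mechanism decides how their reads and writes interleave.
```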
Intra-query Concurrency:
A single complex query can be broken into parallel operations. For example, a query scanning a large table might use multiple threads, each scanning a different portion of the table. The results are then merged. This is sometimes called query parallelism or intra-query parallelism.
```sql
-- Modern databases can parallelize complex queries
-- Example in PostgreSQL with parallel query execution

-- Check parallel query settings
SHOW max_parallel_workers_per_gather;   -- e.g., 2

-- A query on a large table might use parallel workers:
EXPLAIN ANALYZE
SELECT department_id, AVG(salary), COUNT(*)
FROM employees                           -- Assume 10 million rows
WHERE hire_date > '2010-01-01'
GROUP BY department_id;

-- Possible parallel execution plan:
-- Finalize GroupAggregate
--   -> Gather Merge (2 workers)
--     -> Partial GroupAggregate
--       -> Parallel Seq Scan on employees
--            Filter: (hire_date > '2010-01-01')
--            Workers Planned: 2
--            Workers Launched: 2

-- The scan and initial aggregation happen in parallel
-- across 2 worker processes, then results are merged
-- This is intra-query concurrency within a single statement
```

When studying 'concurrency control' in database courses, the primary focus is inter-query concurrency—how multiple transactions from different users are coordinated. Intra-query parallelism is typically covered under 'query processing' or 'parallel databases' as a separate topic.
Concurrent execution would be impossible without proper coordination at the buffer pool level. The buffer manager is the DBMS component that mediates all access to disk data, and its interaction with concurrency control is crucial to understand.
What is the Buffer Pool?
The buffer pool (or buffer cache) is a region of main memory where the DBMS caches pages of data read from disk. Instead of reading directly from disk for every operation, transactions read and write data in the buffer pool. Changes are later written back to disk.
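The buffer pool's size is a tunable server parameter. For instance (parameter names vary by system), PostgreSQL and MySQL expose it as follows:

```sql
-- PostgreSQL: size of the shared buffer cache
SHOW shared_buffers;

-- MySQL / InnoDB: size of the InnoDB buffer pool
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```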
Shared buffer access:
When multiple transactions execute concurrently, they may need to access the same data pages in the buffer pool. The buffer manager must coordinate this access to prevent corruption—but this is separate from the logical concurrency control that ensures serializability.
Buffer latches vs. locks:
The buffer manager uses latches (lightweight locks) to protect physical access to buffer pages. These are different from the locks used by the concurrency control system:
| Aspect | Latches | Locks |
|---|---|---|
| Purpose | Protect physical data structures in memory | Protect logical data items for transactions |
| Duration | Held for very short periods (microseconds) | Held for transaction duration (milliseconds to seconds) |
| Scope | Single page or data structure | Logical item (row, table, etc.) |
| Deadlock | Avoided through coding discipline | Detected and resolved by DBMS |
| Release | Immediate after operation | At commit/rollback or when no longer needed |
Understanding this distinction is important: even with proper logical concurrency control, the buffer manager must use latches to prevent race conditions during page access.
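In PostgreSQL, for example, transaction-level locks can be observed through the pg_locks system view, while latches are internal and held too briefly to track per transaction (the column selection below is illustrative):

```sql
-- Transaction-level locks currently held or requested (PostgreSQL)
SELECT locktype, relation::regclass AS locked_relation, mode, granted, pid
FROM   pg_locks
WHERE  locktype = 'relation';

-- Latches protecting buffer pages do not appear here; they are acquired and
-- released within microseconds during individual page accesses.
```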
When a transaction modifies data in the buffer pool, that page becomes 'dirty'—it differs from the on-disk version. If the system crashes before the dirty page is written to disk, the modification is lost. But writing every change immediately would cripple performance. This tension between durability and performance is resolved through logging (Write-Ahead Logging), not just concurrency control.
We have established what concurrent execution means and why it is essential for modern database systems. Let's consolidate the key concepts:

- Concurrent execution means multiple transactions have overlapping execution periods; concurrency (overlap in time) is distinct from parallelism (simultaneous execution on multiple cores).
- The main motivation is the CPU-I/O speed gap: while one transaction waits for I/O, others can use the CPU, dramatically increasing throughput.
- The transaction scheduler orders operations from different transactions into a schedule; the goal is to allow only serializable schedules, those equivalent to some serial execution.
- Two operations conflict when they come from different transactions, access the same data item, and at least one is a write; conflicts are what make uncontrolled interleaving dangerous.
- Concurrency exists at several levels (inter-query and intra-query), and physical coordination in the buffer pool (latches) is separate from logical concurrency control (locks).
What's next:
Now that we understand what concurrent execution is and how it works, the next page explores why we want concurrency—the concrete performance benefits that make the complexity of concurrent execution worthwhile. We'll quantify throughput improvements, examine response time benefits, and understand why concurrent execution is non-negotiable for real-world systems.
You now understand the fundamental concept of concurrent execution in database systems—how transactions can overlap in time, the role of the scheduler in ordering operations, and why this capability exists. The challenges of concurrent execution (the problems it can cause) will be explored in depth in upcoming pages.