Imagine a busy banking system processing thousands of transactions simultaneously. At any given moment, multiple customers are transferring funds, checking balances, and making payments—all operating on shared account data. The database management system must orchestrate these concurrent operations in a way that preserves data integrity while maximizing throughput.
But how do we reason about whether a particular interleaving of operations is correct? How can we determine if concurrent execution produces the same results as running transactions one after another? The answer lies in the formal concept of a schedule—the fundamental abstraction for analyzing concurrent transaction execution.
A schedule is essentially a transcript of what happened: a precise, ordered record of every database operation executed by every transaction. Understanding schedules is the first step toward understanding serializability, the gold standard for concurrent transaction correctness.
By the end of this page, you will understand the formal definition of a schedule, the notation used to represent schedules, the distinction between serial and concurrent schedules, and why schedules serve as the mathematical foundation for reasoning about transaction correctness in database systems.
A schedule (also called a history) is a sequence of operations from one or more transactions that represents a particular execution order. Formally, a schedule S over a set of transactions T = {T₁, T₂, ..., Tₙ} is an ordered list of operations where:
- S contains all the operations of every transaction in T (and no others)
- operations belonging to the same transaction appear in S in the same order as in that transaction
The schedule captures not just what operations occurred, but the precise order in which they executed across all participating transactions.
We use the term 'schedule' rather than 'execution' because a schedule is an abstraction. Real execution involves timing, CPU scheduling, disk I/O, and many other details. A schedule abstracts away these implementation details, focusing purely on the logical order of operations relevant to correctness analysis.
Schedules are composed of operations—the atomic units of database access. For serializability analysis, we focus on the following fundamental operations:
| Operation | Notation | Description |
|---|---|---|
| Read | R(X) or rᵢ(X) | Transaction reads data item X |
| Write | W(X) or wᵢ(X) | Transaction writes data item X |
| Commit | C or cᵢ | Transaction commits successfully |
| Abort | A or aᵢ | Transaction aborts/rolls back |
The subscript i identifies which transaction performs the operation. For example, r₁(A) means "Transaction T₁ reads data item A," while w₂(B) means "Transaction T₂ writes data item B."
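This notation maps naturally onto a simple data representation. The sketch below is an illustrative convention (not any particular library's API): each operation becomes a (kind, transaction, item) tuple, with item set to None for commits and aborts, and a schedule is simply an ordered list of such tuples.

```python
def op(kind, txn, item=None):
    """Build one operation tuple, e.g. op('r', 1, 'A') for r1(A).
    kind is 'r', 'w', 'c', or 'a'; item is None for commit/abort."""
    return (kind, txn, item)

# T1's operations r1(A) w1(A) r1(B) w1(B) c1 expressed as data:
t1 = [op('r', 1, 'A'), op('w', 1, 'A'),
      op('r', 1, 'B'), op('w', 1, 'B'), op('c', 1)]
print(t1)
```

Any analysis routine can then treat a schedule as an ordered list of these tuples.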
Important: For conflict serializability analysis, we typically focus on read and write operations since conflicts arise from how transactions access shared data items. Commit and abort operations become critical when analyzing recoverability.
T₁: Transfer $100 from A to B
1. Read A
2. A = A - 100
3. Write A
4. Read B
5. B = B + 100
6. Write B
7. Commit
T₂: Apply 10% interest to A
1. Read A
2. A = A * 1.10
3. Write A
4. Commit

Compact notation for T₁: r₁(A) w₁(A) r₁(B) w₁(B) c₁
Compact notation for T₂: r₂(A) w₂(A) c₂
Possible Schedule S₁:
r₁(A) w₁(A) r₁(B) w₁(B) c₁ r₂(A) w₂(A) c₂
Possible Schedule S₂:
r₁(A) r₂(A) w₁(A) w₂(A) r₁(B) w₁(B) c₁ c₂

S₁ is a serial schedule (T₁ runs completely before T₂). S₂ is a concurrent schedule with interleaved operations. Both are valid schedules, but they may produce different final results!
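Whether a schedule is serial can be checked mechanically: each transaction's operations must form one contiguous block. A minimal sketch, assuming operations are encoded as (kind, txn, item) tuples:

```python
def is_serial(schedule):
    """True if each transaction's operations form one contiguous block,
    i.e. no transaction resumes after another one has started."""
    finished = set()   # transactions whose block has already ended
    current = None
    for kind, txn, item in schedule:
        if txn != current:
            if txn in finished:        # txn reappears later: interleaved
                return False
            if current is not None:
                finished.add(current)
            current = txn
    return True

s1 = [('r',1,'A'), ('w',1,'A'), ('r',1,'B'), ('w',1,'B'), ('c',1,None),
      ('r',2,'A'), ('w',2,'A'), ('c',2,None)]
s2 = [('r',1,'A'), ('r',2,'A'), ('w',1,'A'), ('w',2,'A'),
      ('r',1,'B'), ('w',1,'B'), ('c',1,None), ('c',2,None)]
print(is_serial(s1), is_serial(s2))  # True False
```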
A data item is the unit of data that transactions read and write. The granularity of data items varies depending on the system and analysis context:
Data items can range from a single attribute (e.g., Account.balance) to a row, a page, a table, or the entire database.

For serializability theory, we abstract data items as simple named entities (A, B, X, Y, etc.) without specifying their physical representation. The key insight is that conflicts arise when two operations access the same data item.
| Granularity | Concurrency | Overhead | Use Case |
|---|---|---|---|
| Attribute | Highest | Very High | Column-specific updates |
| Tuple/Row | High | Moderate | OLTP workloads (default) |
| Page | Medium | Low | Bulk operations, range scans |
| Table | Low | Very Low | Schema changes, full scans |
| Database | Minimal | Minimal | Backup, global operations |
The behavior of operations determines whether conflicts can occur:
Read Operations (R(X)):
- Return the current value of X without changing it
- Any number of reads of the same item can run in any order with the same result

Write Operations (W(X)):
- Replace the value of X, changing the database state
- The final value of X depends on which write ran last, so their order matters
The Critical Insight: Reads are inherently compatible with each other, but writes introduce ordering dependencies. This asymmetry is the foundation of conflict analysis.
Think of reads as 'observing' and writes as 'modifying.' Multiple observers can look at the same object simultaneously without interfering. But if someone modifies the object, the order matters: what observers see depends on whether they look before or after the modification.
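This asymmetry can be stated as a predicate: two operations conflict exactly when they belong to different transactions, access the same data item, and at least one of them is a write. A small sketch, again using the assumed (kind, txn, item) tuple encoding:

```python
def conflicts(op1, op2):
    """True if op1 and op2 conflict: different transactions, same data
    item, and at least one write. Commits/aborts (item=None) never
    conflict under this definition."""
    k1, t1, x1 = op1
    k2, t2, x2 = op2
    return (t1 != t2
            and x1 is not None and x1 == x2
            and 'w' in (k1, k2))

print(conflicts(('r',1,'A'), ('r',2,'A')))  # False: two reads never conflict
print(conflicts(('r',1,'A'), ('w',2,'A')))  # True: read-write on the same item
print(conflicts(('w',1,'A'), ('w',2,'B')))  # False: different data items
```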
Schedules can be represented in several ways, each useful for different purposes. Mastering these representations is essential for analyzing serializability.
The simplest representation lists operations in their execution order as a linear sequence:
S = r₁(A) r₂(A) w₁(A) w₂(A) r₁(B) w₁(B) c₁ c₂
This notation is compact and easy to write, making it ideal for exam questions and quick analysis. However, it can become unwieldy for schedules with many transactions.
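For experimentation, the linear notation is also easy to parse. A minimal sketch, assuming plain-ASCII subscripts (r1(A) rather than r₁(A)) and the (kind, txn, item) tuple convention:

```python
import re

def parse_schedule(text):
    """Parse compact notation like 'r1(A) w1(A) c1' into a list of
    (kind, txn, item) tuples; item is None for commit/abort."""
    ops = []
    for m in re.finditer(r'([rw])(\d+)\((\w+)\)|([ca])(\d+)', text):
        if m.group(1):                                   # read or write
            ops.append((m.group(1), int(m.group(2)), m.group(3)))
        else:                                            # commit or abort
            ops.append((m.group(4), int(m.group(5)), None))
    return ops

print(parse_schedule('r1(A) r2(A) w1(A) w2(A) r1(B) w1(B) c1 c2'))
```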
A table format shows the temporal progression with each transaction in its own column:
| Time | T₁ | T₂ |
|---|---|---|
| t₁ | r(A) | |
| t₂ | | r(A) |
| t₃ | w(A) | |
| t₄ | | w(A) |
| t₅ | r(B) | |
| t₆ | w(B) | |
| t₇ | c | |
| t₈ | | c |
This format makes it easy to visualize the interleaving and identify which transaction performs each operation.
A visual timeline shows transactions as horizontal bars with operations marked along them:
T₁: ──r(A)────w(A)────r(B)────w(B)────c──────────────
│ │
▼ ▼
T₂: ─────────r(A)────────w(A)────────────────────c───
Timeline diagrams excel at showing the temporal relationship between operations across transactions.
For rigorous mathematical analysis, schedules can be defined as ordered sets:
S = (O, <S)
where:
- O is the set of all operations from all transactions
- <S is a partial order on O defining execution order
This formal notation is used in academic papers and advanced theoretical discussions.
Transaction operations:
T₁: r(X) w(X)
T₂: r(X) r(Y) w(Y)
T₃: r(Y) w(Y)

Linear: r₁(X) r₂(X) r₃(Y) w₁(X) r₂(Y) w₂(Y) w₃(Y) c₁ c₂ c₃
Tabular:
Time | T₁ | T₂ | T₃
-----|------|------|-----
1 | r(X) | |
2 | | r(X) |
3 | | | r(Y)
4 | w(X) | |
5 | | r(Y) |
6 | | w(Y) |
7 | | | w(Y)
8 | c | |
9 | | c |
10   |      |      | c

All representations describe the exact same execution order. Choose the format that best suits your analysis needs.
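Converting between representations is mechanical. The sketch below renders a schedule, given as (kind, txn, item) tuples (an assumed encoding), into the tabular format:

```python
def to_table(schedule):
    """Render a schedule as a time/transaction table, one row per
    operation, with each operation in its transaction's column."""
    txns = sorted({t for _, t, _ in schedule})
    rows = ['Time | ' + ' | '.join(f'T{t}  ' for t in txns)]
    for time, (kind, txn, item) in enumerate(schedule, start=1):
        label = f'{kind}({item})' if item else kind   # c/a have no item
        cells = [label if t == txn else '' for t in txns]
        rows.append(f'{time:4} | ' + ' | '.join(f'{c:4}' for c in cells))
    return '\n'.join(rows)

print(to_table([('r',1,'X'), ('r',2,'X'), ('w',1,'X'),
                ('c',1,None), ('c',2,None)]))
```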
A complete schedule satisfies two important properties:
- It contains every operation of every participating transaction
- Every transaction's last operation is either a commit or an abort
Complete schedules represent finished executions where all transactions have reached a final state. When analyzing serializability, we typically work with complete schedules because they represent the full picture of what happened.
Complete Schedule Example:
S = r₁(A) w₁(A) r₂(A) w₂(A) c₁ c₂
Both T₁ and T₂ have all their operations included and both have committed.
An incomplete schedule (also called a partial schedule or prefix) represents an execution in progress:
Incomplete Schedule Example:
S' = r₁(A) r₂(A) w₁(A)
This is incomplete because:
- T₁ hasn't committed yet
- T₂ still has operations to perform
- Neither transaction has terminated
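The completeness check itself is simple: every participating transaction's last operation must be a commit or an abort. A sketch, using the assumed (kind, txn, item) tuple encoding:

```python
def is_complete(schedule, txns):
    """True if every transaction in txns terminates within the schedule,
    i.e. its last operation is a commit ('c') or abort ('a')."""
    last = {}
    for kind, txn, item in schedule:
        last[txn] = kind                       # remember each txn's last op
    return all(last.get(t) in ('c', 'a') for t in txns)

complete   = [('r',1,'A'), ('w',1,'A'), ('r',2,'A'), ('w',2,'A'),
              ('c',1,None), ('c',2,None)]
incomplete = [('r',1,'A'), ('r',2,'A'), ('w',1,'A')]
print(is_complete(complete, {1, 2}), is_complete(incomplete, {1, 2}))  # True False
```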
Why do incomplete schedules matter?
In real systems, the scheduler must make decisions before transactions complete. Concurrency control protocols must ensure that any incomplete schedule can be extended to a serializable complete schedule—or detect when this becomes impossible and abort transactions.
A schedule that looks fine while incomplete may become non-serializable once completed. Good concurrency control protocols prevent this by never allowing schedules to reach states where no serializable completion exists. This is why most protocols are conservative—they prevent potential problems rather than fixing them later.
Interleaving is the mixing of operations from multiple concurrent transactions during execution. The DBMS interleaves transaction operations to maximize resource utilization and system throughput.
Consider a transaction that reads data, performs computation, and writes results:
T₁: READ (disk I/O) → COMPUTE (CPU) → WRITE (disk I/O)
During disk I/O, the CPU sits idle. During computation, the disk sits idle. If we run transactions one at a time (serially), resources are underutilized:
Serial execution (wasted resources):
CPU: [idle][T₁ compute][idle][idle][T₂ compute][idle]
Disk: [T₁ read][idle][T₁ write][T₂ read][idle][T₂ write]
With interleaving, we can overlap I/O from one transaction with computation from another:
Interleaved execution (better utilization):
CPU: [idle][T₁ compute][T₂ compute][idle]
Disk: [T₁ read][T₂ read][T₁ write][T₂ write]
Interleaving improves throughput and resource utilization, but introduces complexity. The database must ensure that interleaved execution produces correct results—results equivalent to some non-interleaved (serial) execution. This is precisely what serializability guarantees.
For n transactions, each with m operations, the number of possible interleavings is astronomically large. This is calculated as:
Number of interleavings = (n × m)! / (m!)ⁿ
Example: Just 3 transactions with 3 operations each yields 9! / (3!)³ = 1,680 possible interleavings.
Example: 5 transactions with 5 operations each yields 25! / (5!)⁵ ≈ 6.2 × 10¹⁴ possible interleavings.
This exponential growth makes exhaustive checking impractical. We need efficient algorithms—like precedence graphs—to determine serializability without examining every possible interleaving.
The Challenge: Among all possible interleavings, some produce correct results (equivalent to some serial execution), while others produce anomalies. The scheduler must either:
- prevent bad interleavings from arising in the first place (pessimistic protocols such as two-phase locking), or
- allow interleavings optimistically, then detect problems and abort transactions when necessary (optimistic protocols).
| Transactions | Ops/Transaction | Total Ops | Possible Interleavings |
|---|---|---|---|
| 2 | 2 | 4 | 6 |
| 2 | 3 | 6 | 20 |
| 3 | 3 | 9 | 1,680 |
| 4 | 4 | 16 | ~63 million |
| 5 | 5 | 25 | ~6.2 × 10¹⁴ |
| 10 | 10 | 100 | ~2.4 × 10⁹² |
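The formula can be evaluated directly; a short sketch:

```python
from math import factorial

def interleavings(n, m):
    """Ways to interleave n transactions of m operations each while
    preserving each transaction's internal order: (n*m)! / (m!)^n."""
    return factorial(n * m) // factorial(m) ** n

for n, m in [(2, 2), (2, 3), (3, 3), (4, 4), (5, 5)]:
    print(f'{n} txns x {m} ops: {interleavings(n, m):,} interleavings')
```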
Schedules are classified into two fundamental categories based on the degree of interleaving:
A serial schedule executes transactions one at a time, with no interleaving. Each transaction runs to completion before the next one begins:
Serial Schedule (T₁ before T₂):
S₁ = r₁(A) w₁(A) r₁(B) w₁(B) c₁ r₂(A) w₂(A) c₂
|____________T₁______________| |____T₂____|
Properties of Serial Schedules:
- Always correct: with no interleaving, transactions cannot interfere with one another
- Simple to reason about: the result is just each transaction applied in turn
- Poor performance: no concurrency, so CPU and I/O resources sit idle
For n transactions, there are exactly n! possible serial schedules (one for each ordering).
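Since a serial schedule is just an ordering of whole transactions, all n! of them can be enumerated directly. A sketch, with transactions given as lists of (kind, txn, item) tuples:

```python
from itertools import permutations

def serial_schedules(transactions):
    """Yield every serial schedule: one per ordering of the transactions,
    each formed by concatenating whole transactions."""
    for order in permutations(transactions):
        yield [op for txn in order for op in txn]

t1 = [('r',1,'A'), ('w',1,'A'), ('c',1,None)]
t2 = [('r',2,'A'), ('w',2,'A'), ('c',2,None)]
print(len(list(serial_schedules([t1, t2]))))  # 2 (= 2! serial schedules)
```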
A concurrent schedule (also called an interleaved schedule) has operations from different transactions mixed together:
Concurrent Schedule:
S₂ = r₁(A) r₂(A) w₁(A) r₁(B) w₂(A) w₁(B) c₁ c₂
Here, T₂'s operations are interspersed with T₁'s operations.
Properties of Concurrent Schedules:
- Better resource utilization and higher throughput than serial execution
- Far more numerous than serial schedules, since any interleaving is a candidate
- Not automatically correct: some interleavings produce anomalies, so they must be controlled
The goal of serializability is to identify which concurrent schedules are 'safe'—meaning they produce results identical to some serial schedule. A concurrent schedule that is equivalent to a serial schedule gives us the best of both worlds: the correctness of serial execution with the performance benefits of concurrency.
Initial: A = 1000, B = 500
T₁: Transfer $100 from A to B
r₁(A), A=A-100, w₁(A), r₁(B), B=B+100, w₁(B)
T₂: Transfer $50 from A to B
r₂(A), A=A-50, w₂(A), r₂(B), B=B+50, w₂(B)

Serial S1 (T₁→T₂): Final A=850, B=650 ✓
Serial S2 (T₂→T₁): Final A=850, B=650 ✓
Concurrent S3:
r₁(A) A=1000
r₂(A) A=1000 // T₂ reads before T₁ writes
w₁(A) A=900 // T₁ writes A-100
w₂(A) A=950 // T₂ overwrites with its calculation!
...
Final A=950, B=650 ✗ (T₁'s update lost!)

The serial schedules both preserve the total ($1,500). The bad concurrent schedule S3 is incorrect because T₂ read A before T₁ wrote it, so w₂(A) overwrote w₁(A) and T₁'s $100 debit vanished: the classic 'lost update' problem. The accounts now total $1,600, and this schedule is NOT equivalent to any serial schedule.
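The lost update can be reproduced in a few lines. The sketch below replays schedule S3 as straight-line code, holding each transaction's reads in local variables; the elided B operations are assumed to run in the order r₁(B) w₁(B) r₂(B) w₂(B), which matches the stated final balances.

```python
db = {'A': 1000, 'B': 500}

a1 = db['A']          # r1(A): T1 reads A = 1000
a2 = db['A']          # r2(A): T2 reads A = 1000, before T1 writes!
db['A'] = a1 - 100    # w1(A): T1 writes 900
db['A'] = a2 - 50     # w2(A): T2 overwrites with 950; T1's debit is lost
b1 = db['B']          # r1(B): T1 reads B = 500
db['B'] = b1 + 100    # w1(B): T1 writes 600
b2 = db['B']          # r2(B): T2 reads B = 600
db['B'] = b2 + 50     # w2(B): T2 writes 650

print(db)             # {'A': 950, 'B': 650}: the lost-update anomaly
```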
Before we move to analyzing schedules for serializability, let's consolidate the essential properties that characterize schedules:
| Property | Description | Importance |
|---|---|---|
| Completeness | All operations from all transactions are present | Required for final correctness analysis |
| Order Preservation | Operations within each transaction maintain their original order | Ensures transaction semantics |
| Termination | Every transaction ends with commit or abort | Defines final database state |
| Property | Description | Importance |
|---|---|---|
| Serializability | Equivalent to some serial schedule | Correctness criterion |
| Recoverability | Safe to commit without risking cascading aborts | Durability requirement |
| Strictness | Even stronger than recoverability | Simplifies recovery |
| Property | Description | Importance |
|---|---|---|
| Concurrency Level | Degree of interleaving | Throughput optimization |
| Response Time | Time from start to commit | User experience |
| Abort Rate | Frequency of rollbacks | System efficiency |
Concurrency control involves balancing three competing goals: (1) High concurrency for performance, (2) Guaranteed correctness (serializability), and (3) Simple recovery. No protocol optimizes all three simultaneously. Two-phase locking emphasizes correctness but may reduce concurrency. Optimistic protocols increase concurrency but may require more aborts.
Understanding schedules is not merely academic—it's the foundation for all database concurrency control. Here's why this matters:
Schedules give us a formal language to ask and answer: "Is this execution correct?" Without the schedule abstraction, we would have no systematic way to compare different executions or prove correctness properties.
Every concurrency control protocol—two-phase locking, timestamp ordering, optimistic concurrency control, multi-version approaches—is designed to produce schedules with specific properties. Understanding schedules is essential to understanding why these protocols work.
When concurrent transactions produce unexpected results, analyzing the execution as a schedule helps pinpoint exactly which interleaving of operations caused the anomaly.
Schedule analysis also informs transaction design: shorter transactions that touch fewer shared data items leave less room for bad interleavings.
Architectural choices, such as which concurrency control protocol to use, likewise depend on which schedule properties the system must guarantee.
What's Next:
Now that we understand what schedules are and how to represent them, we need to explore serial schedules in depth. Serial schedules serve as our reference point—the "gold standard" of correctness that concurrent schedules must match. The next page examines serial schedules, their properties, and why they define what we mean by a "correct" execution.
You now understand the foundational concept of schedules in database transaction processing. You can formally define a schedule, represent it in multiple formats, distinguish between serial and concurrent schedules, and appreciate why this abstraction is essential for reasoning about concurrent transaction correctness. Next, we'll dive deeper into serial schedules and their role as the correctness benchmark.