Two Phase Locking - Learning Module

Serializability Guarantee

The Mathematical Foundation of Database Correctness

We've learned the rules of Two-Phase Locking: acquire locks during the growing phase, release during the shrinking phase, never acquire after releasing. These rules seem reasonable, but the critical question remains: Why do these rules guarantee correct concurrent execution?

This isn't a question of intuition—it's a question that demands mathematical proof. The correctness of 2PL was formally proven in 1976, establishing that any schedule produced by 2PL-compliant transactions is conflict serializable. This proof is one of the foundational results in database theory.

In this page, we'll walk through this proof step by step, building both formal understanding and practical intuition. By the end, you won't just believe that 2PL guarantees serializability—you'll understand why it must.

What You Will Learn

By the end of this page, you will understand the formal definition of conflict serializability, how the lock point determines serialization order, the proof that 2PL schedules are always conflict-serializable, and intuitive visualizations of why the two-phase structure works.

Conflict Serializability: A Quick Review

Before proving that 2PL guarantees serializability, let's ensure we have a crystal-clear understanding of what conflict serializability means.

Key Definitions:

Foundational Concepts

•Schedule: A sequence of interleaved operations (read, write, lock, unlock) from multiple concurrent transactions
•Serial Schedule: A schedule where transactions execute one after another, with no interleaving. T₁ completes entirely before T₂ begins.
•Conflicting Operations: Two operations conflict if: (1) they belong to different transactions, (2) they access the same data item, and (3) at least one is a write operation
•Conflict Equivalent: Two schedules are conflict equivalent if one can be transformed into the other by swapping non-conflicting adjacent operations
•Conflict Serializable: A schedule is conflict serializable if it is conflict equivalent to some serial schedule

Why Conflict Serializability Matters:

Serial schedules are trivially correct—there's no concurrency, so no concurrency problems can occur. But serial execution wastes resources (one transaction at a time).

Conflict serializability gives us the best of both worlds: concurrent execution with serial-equivalent results. If a concurrent schedule is conflict serializable, its effects are identical to some serial execution order. Users cannot distinguish the concurrent execution from a serial one.

Types of Conflicting Operations
Operation 1	Operation 2	Conflict?	Reason
T₁: Read(A)	T₂: Read(A)	No	Both reads — order doesn't matter
T₁: Read(A)	T₂: Write(A)	Yes	Read-Write — order determines what T₁ sees
T₁: Write(A)	T₂: Read(A)	Yes	Write-Read — order determines what T₂ sees
T₁: Write(A)	T₂: Write(A)	Yes	Write-Write — order determines final value
T₁: Read(A)	T₂: Write(B)	No	Different items — no interference
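
The three conflict conditions can be written as a small predicate. This is a minimal sketch: the `Op` tuple and the `conflicts` function are illustrative names chosen here, not part of any database API.

```python
from typing import NamedTuple


class Op(NamedTuple):
    """One read or write step (hypothetical representation for illustration)."""
    txn: int   # transaction number, e.g. 1 for T1
    kind: str  # "R" for read, "W" for write
    item: str  # data item, e.g. "A"


def conflicts(a: Op, b: Op) -> bool:
    """Return True when a and b satisfy all three conflict conditions."""
    different_txns = a.txn != b.txn
    same_item = a.item == b.item
    at_least_one_write = "W" in (a.kind, b.kind)
    return different_txns and same_item and at_least_one_write


# The five rows of the table above, in order.
table_rows = [
    (Op(1, "R", "A"), Op(2, "R", "A")),
    (Op(1, "R", "A"), Op(2, "W", "A")),
    (Op(1, "W", "A"), Op(2, "R", "A")),
    (Op(1, "W", "A"), Op(2, "W", "A")),
    (Op(1, "R", "A"), Op(2, "W", "B")),
]
results = [conflicts(a, b) for a, b in table_rows]
print(results)
```

The printed list should line up with the "Conflict?" column, row by row.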

The Precedence Graph: Testing Serializability

The precedence graph (also called the conflict graph or serialization graph) is the tool we use to test whether a schedule is conflict serializable.

Construction Rules:

Create a node for each transaction in the schedule
Add a directed edge Tᵢ → Tⱼ if there exists a conflicting pair of operations where Tᵢ's operation appears before Tⱼ's operation in the schedule
The edge indicates that in any equivalent serial schedule, Tᵢ must come before Tⱼ

SCHEDULE S:
R₁(A), W₂(A), R₂(B), W₁(B), R₃(A), W₃(B)
 
Where:
- R₁(A) means Transaction 1 reads A
- W₂(A) means Transaction 2 writes A
 
STEP 1: Identify conflicting pairs
┌───────────────────┬───────────┬────────────────────────────┐
│ Pair              │ Conflict? │ Edge Added                 │
├───────────────────┼───────────┼────────────────────────────┤
│ R₁(A) - W₂(A)     │ Yes       │ T₁ → T₂ (R-W on A)         │
│ W₂(A) - R₃(A)     │ Yes       │ T₂ → T₃ (W-R on A)         │
│ R₂(B) - W₁(B)     │ Yes       │ T₂ → T₁ (R-W on B)         │
│ W₁(B) - W₃(B)     │ Yes       │ T₁ → T₃ (W-W on B)         │
│ R₂(B) - W₃(B)     │ Yes       │ T₂ → T₃ (R-W on B)         │
└───────────────────┴───────────┴────────────────────────────┘
 
STEP 2: Draw the precedence graph
    T₁ ─────→ T₂
       ←─────
     │        │
     ↓        ↓
         T₃
 
STEP 3: Check for cycles
Path: T₁ → T₂ → T₁ ← CYCLE DETECTED!
(T₁ → T₂ from A, T₂ → T₁ from B)
 
CONCLUSION: Schedule S is NOT conflict serializable
The cycle means there's no valid serial ordering.

The Key Theorem

A schedule is conflict serializable if and only if its precedence graph is acyclic (has no cycles).

If the graph is acyclic, a topological sort of the nodes gives a valid serial ordering. If there's a cycle, no serial ordering can satisfy all the constraints—the schedule cannot be serialized.
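
The test can be automated. The sketch below builds the precedence edges for a schedule and asks the standard-library `graphlib.TopologicalSorter` (Python 3.9+) for a serial order. The helper names `parse`, `precedence_edges`, and `serial_order` are assumptions made for this example, and transaction subscripts are written as plain digits.

```python
from graphlib import CycleError, TopologicalSorter


def parse(schedule):
    """Turn 'R1(A), W2(A)' into [('R', 1, 'A'), ('W', 2, 'A')]."""
    ops = []
    for token in schedule.split(","):
        token = token.strip()
        kind, rest = token[0], token[1:]
        txn, item = rest.rstrip(")").split("(")
        ops.append((kind, int(txn), item))
    return ops


def precedence_edges(ops):
    """Collect an edge (Ti, Tj) for every conflicting pair, earlier op first."""
    edges = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (k1, k2):
                edges.add((t1, t2))
    return edges


def serial_order(ops):
    """Return one equivalent serial order, or None if the graph has a cycle."""
    predecessors = {txn: set() for _, txn, _ in ops}
    for before, after in precedence_edges(ops):
        predecessors[after].add(before)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return None


S = parse("R1(A), W2(A), R2(B), W1(B), R3(A), W3(B)")
print(precedence_edges(S))  # includes both (1, 2) and (2, 1)
print(serial_order(S))      # None: schedule S is not conflict serializable

ok = parse("R1(A), W1(A), R2(A), W2(A)")
print(serial_order(ok))     # a serial order exists for this one
```

Comparing every pair of operations is quadratic in the schedule length, which is acceptable for a teaching example but not how a production engine would work.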

The Lock Point and Serialization Order

The lock point of a transaction is the moment when it has acquired all the locks it will ever need—the transition point between the growing and shrinking phases. This concept is absolutely crucial to understanding why 2PL guarantees serializability.

Key Insight:

The Lock Point Theorem

In a 2PL schedule, the order of lock points defines a valid serialization order. If transaction Tᵢ's lock point occurs before Tⱼ's lock point, then in the equivalent serial schedule, Tᵢ precedes Tⱼ.

Why Lock Point Order Works:

Think about what it means for Tᵢ's lock point to occur before Tⱼ's:

At Tᵢ's lock point, Tᵢ holds every lock it will ever hold
If Tⱼ needs a conflicting lock on one of those items and has not yet obtained it, Tⱼ must wait until Tᵢ releases it
Tᵢ releases nothing until after its lock point
So Tⱼ acquires that lock, and therefore reaches its own lock point, only after Tᵢ's lock point

This creates a clear temporal ordering among transactions based on their lock points.

EXAMPLE: Three transactions with 2PL
 
Time →
                                                              
T₁: |--Lock(A)--Lock(B)--|*LP₁*|--Unlock(B)--Unlock(A)--|COMMIT|
                                                              
T₂:      |--Lock(C)------Wait(A)...|--Lock(A)--|*LP₂*|--Unlock--|COMMIT|
                                                              
T₃:           |--Lock(D)--Wait(B).........|--Lock(B)--|*LP₃*|--Unlock--|COMMIT|
 
Lock Point Order: LP₁ < LP₂ < LP₃
 
Therefore: Serial order T₁, T₂, T₃ is valid
 
WHY THIS ORDER WORKS:
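- T₂ waited for A (held by T₁) → T₂ obtains A only after T₁ has passed its lock point and released A
- T₃ waited for B (held by T₁) → T₃ obtains B only after T₁ has passed its lock point and released B
- T₂ and T₃ share no data items, so they do not conflict; lock-point order places T₂ first, and T₁; T₃; T₂ would be equally valid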
 
The concurrent schedule is equivalent to: T₁; T₂; T₃ executing serially
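
Lock points can be read directly off a trace of granted locks and unlocks. The sketch below assumes one possible interleaving consistent with the timeline above; the exact event order and the function names are assumptions made for illustration.

```python
def lock_points(events):
    """Map each transaction to the position of its last lock acquisition.

    events is a time-ordered list of (txn, action, item) tuples, where
    action is "lock" (the moment the lock is granted) or "unlock".
    """
    points = {}
    for position, (txn, action, _item) in enumerate(events):
        if action == "lock":
            points[txn] = position  # later acquisitions overwrite earlier ones
    return points


def serial_order_by_lock_point(events):
    """Sort transactions by the position of their lock point."""
    points = lock_points(events)
    return sorted(points, key=points.get)


# Assumed interleaving for the three-transaction example.
trace = [
    (1, "lock", "A"), (2, "lock", "C"), (1, "lock", "B"), (3, "lock", "D"),
    (1, "unlock", "B"), (1, "unlock", "A"),
    (2, "lock", "A"),   # T2's wait for A ends here
    (3, "lock", "B"),   # T3's wait for B ends here
    (2, "unlock", "C"), (2, "unlock", "A"),
    (3, "unlock", "D"), (3, "unlock", "B"),
]
print(lock_points(trace))
print(serial_order_by_lock_point(trace))
```

Positions in the list stand in for timestamps; only their relative order matters for the serialization order.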

The Formal Proof: 2PL Guarantees Serializability

We now present the formal proof that 2PL schedules are always conflict serializable. The proof proceeds by contradiction, showing that if a 2PL schedule were not serializable (i.e., had a cycle in its precedence graph), a contradiction would arise.

Theorem: Every legal schedule under Two-Phase Locking is conflict serializable.

Proof Structure:

Assume a 2PL schedule S exists with a cycle in its precedence graph
Show this leads to a lock acquisition/release contradiction
Conclude no such cycle can exist
Therefore, 2PL schedules are conflict serializable

PROOF BY CONTRADICTION
 
ASSUMPTION: There exists a legal 2PL schedule S that is not conflict 
serializable. This means its precedence graph has a cycle.
 
Let the cycle be: T₁ → T₂ → T₃ → ... → Tₖ → T₁
 
WHAT EACH EDGE MEANS:
An edge Tᵢ → Tⱼ exists because there are conflicting operations where
Tᵢ's operation on some item X precedes Tⱼ's conflicting operation on X.
 
For Tᵢ to access X before Tⱼ, and for both to complete without conflict:
- Tᵢ must HOLD a lock on X while accessing it
- Tⱼ must WAIT for that lock until Tᵢ releases it
- Therefore: unlock_i(X) < lock_j(X) in time
 
EXAMINING THE CYCLE:
T₁ → T₂: unlock₁(X₁) < lock₂(X₁) for some item X₁
T₂ → T₃: unlock₂(X₂) < lock₃(X₂) for some item X₂
...
Tₖ → T₁: unlockₖ(Xₖ) < lock₁(Xₖ) for some item Xₖ
 
NOW APPLY 2PL CONSTRAINTS:
For each transaction Tᵢ following 2PL:
- All lock operations precede all unlock operations
- Therefore: lock_i(any) < unlock_i(any)
 
COMBINING THE CONSTRAINTS:
From T₁ → T₂:   unlock₁(X₁) < lock₂(X₁)
By 2PL for T₂:  lock₂(X₁) < unlock₂(X₂)  (every lock precedes every unlock)
Therefore:      unlock₁(X₁) < unlock₂(X₂)
 
Extending this through the cycle:
unlock₁(X₁) < unlock₂(X₂) < unlock₃(X₃) < ... < unlockₖ(Xₖ) < lock₁(Xₖ)
 
But by 2PL for T₁: lock₁(Xₖ) < unlock₁(X₁)
 
COMBINING:
unlock₁(X₁) < ... < lock₁(Xₖ) < unlock₁(X₁)
 
This says: unlock₁(X₁) < unlock₁(X₁)
 
CONTRADICTION! A timestamp cannot be less than itself.
 
CONCLUSION: Our assumption was false. No cycle can exist.
Therefore, all 2PL schedules have acyclic precedence graphs.
Therefore, all 2PL schedules are conflict serializable. ∎

The Essence of the Proof

The proof shows that any cycle in the precedence graph would force some transaction to acquire a lock after it had already released one, which the two-phase rule forbids. Because every lock precedes every unlock, the structure acts as a "ratchet" that makes backward dependencies impossible.
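
The converse direction is easy to see on a small case: once a transaction breaks the two-phase rule, a cycle becomes possible. The sketch below uses a hypothetical trace in which T1 releases A and only later locks B. For brevity it considers write-write conflicts only, and all names are assumptions made for this example.

```python
def is_two_phase(events, txn):
    """True if txn never acquires a lock after its first unlock."""
    released = False
    for t, action, _item in events:
        if t != txn:
            continue
        if action == "unlock":
            released = True
        elif action == "lock" and released:
            return False
    return True


def precedence_edges(events):
    """Edges (Ti, Tj) from pairs of writes to the same item, earlier first."""
    writes = [(t, item) for t, action, item in events if action == "write"]
    return {
        (t1, t2)
        for i, (t1, x1) in enumerate(writes)
        for t2, x2 in writes[i + 1:]
        if t1 != t2 and x1 == x2
    }


# Hypothetical trace: T1 unlocks A, then later locks B (not two-phase).
bad_trace = [
    (1, "lock", "A"), (1, "write", "A"), (1, "unlock", "A"),
    (2, "lock", "A"), (2, "lock", "B"),
    (2, "write", "A"), (2, "write", "B"),
    (2, "unlock", "A"), (2, "unlock", "B"),
    (1, "lock", "B"), (1, "write", "B"), (1, "unlock", "B"),
]
edges = precedence_edges(bad_trace)
print(is_two_phase(bad_trace, 1), is_two_phase(bad_trace, 2))
print(edges)  # contains both (1, 2) and (2, 1): a cycle
```

T2 obeys the rule, yet the schedule is still not serializable, because T1 slipped a lock acquisition in after a release.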

Building Intuition: Why Two Phases Work

The formal proof establishes correctness, but intuition helps us truly understand the mechanism. Let's build that intuition through several mental models:

Mental Model 1: The Claiming Game

Imagine transactions as players in a resource-claiming game:

During the growing phase, you claim territories (locks on data items)
Once you start giving up territories (shrinking phase), you can never claim more
The order in which players reach their "maximum empire" (lock point) determines their ranking
The game produces a clear winner order with no ties or disputes

Mental Model 2: The Door Policy

•Imagine each data item as a room with a door
•Growing phase: You can enter rooms (lock), but can't leave any
•Lock point: You've entered all rooms you need
•Shrinking phase: You can leave rooms, but can't enter new ones
•Result: No circular dependencies possible

Mental Model 3: The Peak Moment

•Each transaction's lock count forms a mountain shape
•The peak (lock point) is a unique moment in time
•Transactions can be ordered by their peak times
•No transaction's peak can "wrap around" another's
•This ordering is the serialization order

Mental Model 4: The Handoff Chain

Think of conflicts as handoffs:

If T₂ conflicts with T₁ on item X, T₂ "receives" X from T₁
The handoff only occurs after T₁ is done with X (released lock)
Under 2PL, once T₁ releases anything, T₁ is past its lock point
So T₂'s lock point must come after T₁'s lock point
Handoffs always flow "forward" in lock-point order

Why Cycles Are Impossible:

A cycle would require: T₁ hands to T₂ hands to T₃ hands to T₁

But each "hands to" forces the giver's lock point to come before the receiver's. So we'd need:

LP₁ < LP₂ < LP₃ < LP₁

This is impossible! You can't have LP₁ both before and after itself.

The One-Way Flow

The key insight is that dependencies in 2PL only flow one way—from earlier lock points to later lock points. This unidirectional flow makes cycles mathematically impossible, guaranteeing a valid serial ordering exists.
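
The one-way flow can be stated as a short derivation. In this sketch, LP denotes the time of a transaction's last lock acquisition, and lock and unlock carry the same meaning as in the formal proof; the notation is chosen here for illustration.

```latex
\begin{align*}
T_i \to T_j \text{ via item } X
  &\;\Longrightarrow\; \mathrm{unlock}_i(X) < \mathrm{lock}_j(X) \\
\mathrm{LP}_i &< \mathrm{unlock}_i(X)
  && \text{(every unlock of } T_i \text{ follows its last lock)} \\
\mathrm{lock}_j(X) &\le \mathrm{LP}_j
  && \text{(} \mathrm{LP}_j \text{ is the last lock acquisition of } T_j \text{)} \\
\therefore\quad \mathrm{LP}_i &< \mathrm{LP}_j
\end{align*}
```

Every edge therefore points from a strictly earlier lock point to a strictly later one, so following edges around a cycle would give LP < LP for the same transaction.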

Worked Examples: Finding the Serial Order

Let's work through complete examples showing how to determine the equivalent serial order for 2PL schedules.

SCHEDULE S1 (2PL Compliant):
 
Time: 1  2  3  4  5  6  7  8  9  10 11 12 13 14
T₁:   L(A) R(A)    L(B)    R(B)    U(A)    U(B)    COMMIT
T₂:        L(C)    R(C) L(A)--wait--→→ L(A) R(A) U(C) U(A) COMMIT
 
Legend: L=Lock, R=Read, U=Unlock
 
LOCK POINT ANALYSIS:
- T₁'s lock point: After L(B) at time 4 (all locks acquired)
- T₂'s lock point: After L(A) at time 9 (had to wait for T₁)
 
Lock Point Order: LP₁ (time 4) < LP₂ (time 9)
 
SERIAL ORDER: T₁ ; T₂
 
VERIFICATION:
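- T₂ read A only after T₁ released it. Both transactions only read A, so no conflict arises here; had T₁ written A, T₂ would have seen T₁'s value
- Either way, the result matches a serial execution where T₁ completes before T₂ starts ✓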
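
The waiting behaviour in S1 can be replayed with a toy lock table. This is a sketch under simplifying assumptions: locks are exclusive only, a refused request simply returns False instead of queueing, and the class and method names are hypothetical.

```python
class ExclusiveLockTable:
    """Toy lock table: each item has at most one holder (exclusive locks only)."""

    def __init__(self):
        self.holder = {}  # item -> transaction currently holding it

    def try_lock(self, txn, item):
        """Grant the lock and return True, or return False if txn must wait."""
        current = self.holder.get(item)
        if current is None or current == txn:
            self.holder[item] = txn
            return True
        return False

    def unlock(self, txn, item):
        """Release the item if txn is its current holder."""
        if self.holder.get(item) == txn:
            del self.holder[item]


table = ExclusiveLockTable()
log = []
log.append(table.try_lock("T1", "A"))  # T1 gets A
log.append(table.try_lock("T2", "C"))  # T2 gets C
log.append(table.try_lock("T1", "B"))  # T1 gets B: T1's lock point
log.append(table.try_lock("T2", "A"))  # refused: T1 still holds A
table.unlock("T1", "A")                # T1 enters its shrinking phase
log.append(table.try_lock("T2", "A"))  # granted: T2's lock point
print(log)
```

The single refused request is the "wait" in the timeline: T2 cannot reach its lock point until T1 has passed its own and begun releasing.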

Implications for Database System Design

The serializability guarantee of 2PL has profound implications for how database systems are designed and how applications interact with them.

For Database Engine Developers:

Engine Design Implications

•No Need for Runtime Serializability Checks — By enforcing 2PL, the engine guarantees serializability without checking each schedule
•Lock Manager is Correctness-Critical — The lock manager must never allow lock acquisition after release for the same transaction
•Lock Point is Implicit — The system doesn't need to explicitly track lock points; the guarantee holds automatically
•Focus Shifts to Performance — Since correctness is guaranteed, optimization efforts can focus on throughput and latency
•Deadlock Handling is Separate — 2PL guarantees correctness but not liveness; deadlock detection/prevention is a separate subsystem
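
The correctness-critical check in the lock manager is small. The sketch below tracks the phase of a single transaction and rejects any acquisition after the first release. It is a hypothetical helper, not the design of any particular engine, and it ignores contention between transactions.

```python
class TwoPhaseViolation(Exception):
    """Raised when a transaction asks for a lock after releasing one."""


class TwoPhaseGuard:
    """Tracks one transaction's phase (hypothetical helper, not a real API)."""

    def __init__(self, txn_id):
        self.txn_id = txn_id
        self.held = set()
        self.shrinking = False  # flips to True on the first release

    def acquire(self, item):
        if self.shrinking:
            raise TwoPhaseViolation(
                f"T{self.txn_id} requested {item} after releasing a lock"
            )
        self.held.add(item)

    def release(self, item):
        if item in self.held:
            self.held.remove(item)
            self.shrinking = True


guard = TwoPhaseGuard(1)
guard.acquire("A")
guard.acquire("B")
guard.release("A")      # growing phase ends here
try:
    guard.acquire("C")  # not allowed in the shrinking phase
    rejected = False
except TwoPhaseViolation:
    rejected = True
print(rejected, guard.held)
```

Note that the guard never records the lock point itself; a single boolean is enough, which matches the observation that the lock point is implicit.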

For Application Developers:

Application Design Implications

•Transaction Boundaries Matter — Everything within a transaction is serializable; cross-transaction consistency must be designed separately
•Long Transactions Are Costly — Extended growing phases block other transactions; keep transactions short
•Lock Scope Determines Isolation — Only explicitly accessed (locked) data is protected; phantom reads require additional mechanisms
•Deadlock is a Normal Condition — Applications should be prepared to retry when transactions are aborted due to deadlock
•Isolation Levels are Optimizations — Lower isolation levels relax 2PL for performance, trading correctness for throughput
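
Treating deadlock aborts as routine usually means wrapping the transaction in a retry loop. The sketch below is one way to do that; `DeadlockAborted` is a stand-in for whatever error a real driver raises for a deadlock victim, and `flaky_transfer` simulates a transaction that is aborted twice before succeeding.

```python
import random
import time


class DeadlockAborted(Exception):
    """Stand-in for the error a driver raises for a deadlock victim."""


def run_with_retry(transaction, max_attempts=3, base_delay=0.01):
    """Call transaction() until it succeeds or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockAborted:
            if attempt == max_attempts:
                raise
            # Back off with jitter so the competing transactions are less
            # likely to collide again in lockstep.
            time.sleep(base_delay * attempt * random.random())


attempts = []


def flaky_transfer():
    """Simulated transaction: aborted on the first two attempts."""
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockAborted("chosen as deadlock victim")
    return "committed"


outcome = run_with_retry(flaky_transfer)
print(outcome, len(attempts))
```

The transaction body must be safe to run again from the start, since an aborted attempt has had all of its effects rolled back.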

The Serializability Contract

2PL guarantees serializability within its scope. It does NOT guarantee:

Consistency across multiple transactions (that's application logic)
Freedom from deadlocks (that's handled separately)
Protection against phantoms (that requires predicate locking)
Real-time ordering (the equivalent serial order may differ from wall-clock order)

Summary: The Power of Two-Phase Locking

We've rigorously established why Two-Phase Locking guarantees conflict serializability. This guarantee is the foundation on which lock-based concurrency control in transactional database systems is built. Let's consolidate our understanding:

Key Takeaways

•Conflict serializability = equivalence to some serial schedule — The concurrent execution produces results identical to a serial run.
•The precedence graph detects serializability — Acyclic graph = serializable.
•Lock point order determines serial order — Transactions serialize in the order of their lock points.
•The proof shows cycles are impossible — The two-phase structure prevents temporal contradictions.
•Dependencies flow one way — From earlier to later lock points, never backwards.
•The guarantee is automatic — Just follow 2PL rules; no runtime checking needed.
•Design implications are profound — Both engine and application design depend on this guarantee.

What's next:

Now that we understand the power of 2PL, we must also understand its limitations. The next page examines the weaknesses and trade-offs of Two-Phase Locking: deadlocks, reduced concurrency, potential for starvation, and why variants like Strict 2PL and Rigorous 2PL were developed to address some of these issues.

Page Complete

You now understand why 2PL guarantees serializability, both formally and intuitively. This understanding is essential for database professionals—it explains why the protocol works, when it applies, and what assumptions underlie its guarantee. You're equipped to reason about transaction correctness at a deep level.