Range queries represent the most phantom-susceptible category of database operations. When you write WHERE price BETWEEN 100 AND 500 or WHERE date > '2024-01-01', you're defining an infinite search space—a conceptual set of all possible rows that could satisfy your condition, past, present, and future.
This page dissects why range queries are uniquely vulnerable to phantoms, how database systems attempt to protect them, and what strategies developers must employ to ensure consistency in range-based operations.
By the end of this page, you will master the mechanics of range query phantoms, understand how gap locking and next-key locking work, analyze the unique challenges of unbounded and multi-dimensional ranges, and recognize range query patterns that require special protection.
Range queries use comparison operators to define a subset of possible column values. Understanding their structure reveals why they create phantom vulnerabilities.
| Predicate Type | SQL Example | Range Characteristics | Phantom Risk |
|---|---|---|---|
| Bounded Range | BETWEEN 10 AND 100 | Closed interval [10, 100] | High - many possible insertions |
| Lower Bound Only | > 50 or >= 50 | Half-open interval [50, ∞) | Very High - unbounded upper end |
| Upper Bound Only | < 50 or <= 50 | Half-open interval (−∞, 50] | Very High - unbounded lower end |
| Inequality | != 50 | Everything except 50 | Extreme - nearly all values |
| LIKE Pattern | LIKE 'Smith%' | String range starting with 'Smith' | High - many possible strings |
| IN List | IN (1, 5, 10) | Discrete set {1, 5, 10} | Moderate - defined set only |
| Composite Range | x > 10 AND y < 20 | Multi-dimensional box | Very High - 2D+ search space |
The Infinite Set Problem:
Consider the predicate salary > 100000. This conceptually matches every current row above the threshold, every future INSERT above it, and every UPDATE that raises an existing salary past it.
Traditional locking protects existing rows. But how do you lock a salary value of 150000 that no employee currently has? You can't—unless you lock the range itself.
Range predicates define potentially infinite sets of matching values. Row-level locks can only protect existing materialized values. True range protection requires locking the conceptual space between and beyond existing values—a fundamentally different locking paradigm.
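To make the distinction concrete, here is a toy sketch in Python (not a real DBMS; all names are invented) contrasting what a row lock can cover with what a range lock must cover:

```python
# A toy predicate-lock checker illustrating why locking existing rows
# cannot protect a range predicate.

class RangeLock:
    """Models a lock on the value interval [low, high]."""
    def __init__(self, low, high):
        self.low, self.high = low, high

    def blocks_insert(self, value):
        # An insert is blocked iff the new value falls inside the locked range
        return self.low <= value <= self.high

# Existing rows: no employee currently earns over 100K,
# so a row lock has literally nothing to attach to.
existing_salaries = [60_000, 95_000]

# Protecting "salary > 100000" means locking the *range itself*.
lock = RangeLock(100_000, float("inf"))

assert not lock.blocks_insert(95_000)   # outside the predicate: allowed
assert lock.blocks_insert(150_000)      # phantom candidate: blocked
```

The key point: the lock is defined over values that do not yet exist, which is exactly what row-level locking cannot express.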
Gap locks are a specialized locking mechanism designed to prevent phantom insertions. They protect the "gaps" between existing index values—the spaces where new rows could be inserted.
How Gap Locks Work:
In a B+Tree index, values are stored in sorted order. Between any two adjacent values lies a gap—a range of values that don't currently exist but could be inserted. Gap locks prevent insertions into these gaps.
```sql
-- B+Tree Index on 'price' column
-- Current values: [50, 100, 150, 200, 300]
--
-- Visual representation with gaps:
--
-- (-∞, 50) [50] (50, 100) [100] (100, 150) [150] (150, 200) [200] (200, 300) [300] (300, +∞)
--    ↑              ↑               ↑                ↑                ↑                ↑
--  Gap 1          Gap 2           Gap 3            Gap 4            Gap 5            Gap 6

-- Query:
SELECT * FROM products WHERE price BETWEEN 100 AND 200;

-- With row-level locking ONLY:
-- Locks acquired: [100] [150] [200]
-- Gaps remain UNPROTECTED!
-- A concurrent INSERT of price = 125 would succeed → PHANTOM!

-- With gap locking ENABLED:
-- Locks acquired:
--   - Row lock on [100]
--   - Gap lock on (50, 100)   -- Prevents inserts like 75
--   - Gap lock on (100, 150)  -- Prevents inserts like 125
--   - Row lock on [150]
--   - Gap lock on (150, 200)  -- Prevents inserts like 175
--   - Row lock on [200]
--   - Gap lock on (200, 300)  -- Prevents inserts like 250

-- Now a concurrent INSERT of price = 125 would BLOCK on the gap lock!
```

Gap Lock Properties:
Gap locks don't conflict with each other: Multiple transactions can hold gap locks on the same gap. Gap locks only block insertions into the gap.
Gap locks are held until transaction end: Unlike some row locks that might be released early, gap locks must persist to prevent phantoms throughout the transaction.
Gap locks extend to "supremum": For unbounded ranges like price > 200, the gap lock extends from the last matching value to a special "supremum" pseudo-record representing positive infinity.
Gap locks work with index order: Gap locks only make sense on indexed columns. For non-indexed predicates, gap locking is impossible without a table scan and table-level locking.
InnoDB uses 'next-key locking'—a combination of a row lock and a gap lock on the gap before the row. This efficiently prevents phantoms by locking both the existing row and the insertion point before it. Next-key locks are the default locking mode in InnoDB for index scans.
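The next-key lock set for a range scan can be sketched as a small function. This is a simplified model (real InnoDB locks index records under a specific isolation level, and details vary by version), but it reproduces the lock set from the example above:

```python
import bisect

def next_key_locks(index_values, low, high):
    """Return (row_locks, gap_locks) in the style of next-key locking for a
    closed range [low, high] over unique index values. Simplified sketch:
    a real engine locks index records, handles duplicates, and may lock
    up to the 'supremum' pseudo-record for unbounded scans."""
    vals = sorted(index_values)
    row_locks, gap_locks = [], []
    i = bisect.bisect_left(vals, low)          # first value >= low
    prev = vals[i - 1] if i > 0 else "-inf"    # gap before first match
    for v in vals[i:]:
        if v > high:
            break
        gap_locks.append((prev, v))            # gap lock before the record
        row_locks.append(v)                    # row lock on the record
        prev = v
    # Gap after the last matching record, up to the next value or supremum
    j = bisect.bisect_right(vals, high)
    nxt = vals[j] if j < len(vals) else "supremum"
    gap_locks.append((prev, nxt))
    return row_locks, gap_locks

rows, gaps = next_key_locks([50, 100, 150, 200, 300], 100, 200)
# rows == [100, 150, 200]
# gaps == [(50, 100), (100, 150), (150, 200), (200, 300)]
```

Note how this matches the BETWEEN 100 AND 200 example: three row locks plus the surrounding gaps, leaving values like 25 and 350 free for concurrent inserts.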
The difference between bounded and unbounded ranges significantly affects both phantom vulnerability and the cost of prevention.
```sql
-- ============================================
-- BOUNDED RANGE: price BETWEEN 100 AND 200
-- ============================================
-- Index values: [50, 100, 150, 200, 300]
--
-- Locks needed:
--   Gap (50, 100),  Row [100]
--   Gap (100, 150), Row [150]
--   Gap (150, 200), Row [200]
--
-- Concurrent activities ALLOWED:
--   INSERT price = 25   ✓ (outside range)
--   INSERT price = 250  ✓ (outside range)
--
-- Concurrent activities BLOCKED:
--   UPDATE existing row to price = 175  ✗ (blocked by gap lock)
--   INSERT price = 125                  ✗ (blocked by gap lock)

-- ============================================
-- UNBOUNDED RANGE: price > 150
-- ============================================
-- Index values: [50, 100, 150, 200, 300]
--
-- Locks needed:
--   Gap (100, 150), Row [150]  -- Gap before first matching row
--   Gap (150, 200), Row [200]
--   Gap (200, 300), Row [300]
--   Gap (300, supremum)        -- The infinite gap after the last value
--
-- Concurrent activities ALLOWED:
--   INSERT price = 25   ✓ (below range)
--   INSERT price = 125  ✓ (below range)
--
-- Concurrent activities BLOCKED:
--   INSERT price = 175     ✗ (in range)
--   INSERT price = 500     ✗ (in supremum gap!)
--   INSERT price = 999999  ✗ (still in supremum gap)

-- ============================================
-- FULLY UNBOUNDED: inequality != 150
-- ============================================
-- This is essentially (−∞, 150) ∪ (150, +∞).
-- It requires locking EVERYTHING except the value 150,
-- which effectively requires table-level locking for protection!
```

The 'supremum' is a special pseudo-record that represents the upper bound of all possible values in an index. Locking to the supremum effectively locks an infinite range. Similarly, 'infimum' represents the lower bound. These concepts enable gap locking on unbounded ranges.
When predicates involve multiple columns, the phantom problem becomes significantly more complex. Multi-dimensional ranges create search spaces that are difficult to protect efficiently.
The Compound Predicate Challenge:
Consider: WHERE department = 'Sales' AND salary > 50000 AND hire_date < '2024-01-01'
This defines a three-dimensional search space. Phantom protection requires ensuring no new row can enter this 3D region—but B+Tree indexes are fundamentally one-dimensional. They can efficiently handle one column's range, but compound conditions require creative approaches.
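A toy illustration of what "linearizing" means: a composite index orders entries lexicographically by tuple, so an equality on the first column plus a range on the second selects one contiguous slice of the index, which ordinary gap locking can cover. (The data below is invented.)

```python
# A composite index on (department, salary) stores entries sorted by tuple.
entries = sorted([
    ("Engineering", 60_000), ("Engineering", 95_000),
    ("Engineering", 100_000), ("Engineering", 130_000),
    ("Sales", 70_000), ("Sales", 110_000),
])

# Predicate: department = 'Engineering' AND salary BETWEEN 80000 AND 120000
lo, hi = ("Engineering", 80_000), ("Engineering", 120_000)
matching = [e for e in entries if lo <= e <= hi]

# The matches form ONE contiguous run of the sorted entries --
# exactly the property that makes gap locking feasible here.
first = entries.index(matching[0])
assert entries[first:first + len(matching)] == matching
```

Without the composite ordering (say, with separate indexes on department and salary), the matching rows are scattered across each index, and no single contiguous key range describes the 2D region.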
```sql
-- Multi-dimensional phantom scenario
-- Table: employees (id, name, department, salary, hire_date)
-- Indexes: (department), (salary), (hire_date), (department, salary)

-- Query with 2D predicate:
SELECT * FROM employees
WHERE department = 'Engineering'
  AND salary BETWEEN 80000 AND 120000;

-- Visual representation of the 2D search space:
--
--  salary
--    ↑
-- 120K │ ███████████████
--      │ ███████████████
--      │ ███████████████
--  80K │ ███████████████
--      │
--      └──────────────────→ department
--           Engineering
--
-- The shaded region represents all possible (dept, salary) pairs
-- that would satisfy the predicate.

-- With an index on (department, salary):
--   Gap locks can protect this 2D slice efficiently;
--   the composite index linearizes the 2D space.

-- Without a composite index:
--   Option 1: Lock all Engineering rows (over-locking)
--   Option 2: Lock all rows with salary 80K-120K (over-locking)
--   Option 3: Lock both independently (partial protection)
--   Option 4: Table lock (serialization)

-- ============================================
-- 3D Predicate: Even more complex
-- ============================================
SELECT * FROM employees
WHERE department = 'Engineering'
  AND salary BETWEEN 80000 AND 120000
  AND hire_date > '2023-01-01';

-- Now we need to protect a 3D region.
-- No single index can represent this efficiently.
-- Practical options:
--   1. Composite index (dept, salary, hire_date) - if selectivity is right
--   2. Partial protection with multiple indexes
--   3. Accept phantom risk at lower isolation levels
--   4. Use SERIALIZABLE with SSI (Serializable Snapshot Isolation)
```

| Strategy | Protection Level | Overhead | When to Use |
|---|---|---|---|
| Composite Index (perfect order) | Complete | Low query overhead, storage for index | Frequent queries on this exact combination |
| Multiple Single-Column Indexes | Partial (depends on optimizer) | Multiple index maintenance | Varied query patterns |
| Covering Index | Complete for covered queries | High storage overhead | Read-heavy workloads with known patterns |
| Table Lock | Complete | Severe concurrency impact | Rare queries, batch operations |
| SSI (Postgres) | Complete (detects conflicts) | Retry overhead on conflicts | Modern PostgreSQL, mixed workloads |
For a composite index (A, B, C), gap locking works efficiently for predicates on (A), (A, B), or (A, B, C). But a predicate on just (B, C) cannot use this index for gap locking—the index order determines which predicates can be protected.
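This leftmost-prefix rule is mechanical enough to express as a tiny check (a sketch; real optimizers also consider range vs. equality predicates on each column):

```python
# Which predicates can a composite index gap-lock?
# An index on (A, B, C) orders keys by A first, so only predicates that
# constrain a leftmost prefix of its columns map to one contiguous key range.

def is_leftmost_prefix(index_cols, predicate_cols):
    """True if predicate_cols is exactly some leftmost prefix of index_cols
    (column order within the predicate itself doesn't matter)."""
    k = len(predicate_cols)
    return set(index_cols[:k]) == set(predicate_cols)

idx = ("department", "salary", "hire_date")

assert is_leftmost_prefix(idx, ("department",))                        # (A)
assert is_leftmost_prefix(idx, ("department", "salary"))               # (A, B)
assert is_leftmost_prefix(idx, ("department", "salary", "hire_date"))  # (A, B, C)
assert not is_leftmost_prefix(idx, ("salary", "hire_date"))            # (B, C): no
```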
Pattern matching predicates like LIKE create interesting range scenarios. The nature of the pattern determines whether efficient gap locking is possible.
```sql
-- ============================================
-- PREFIX PATTERNS: Can use range locking
-- ============================================
-- Pattern: LIKE 'Smith%'
-- Equivalent to: name >= 'Smith' AND name < 'Smiti'
-- (The trailing 'h' is incremented to 'i' to form the exclusive upper bound)

SELECT * FROM customers WHERE last_name LIKE 'Smith%';

-- This translates to an index range scan:
--   Start: 'Smith' (inclusive)
--   End:   'Smiti' (exclusive)
--
-- Gap locks protect:
--   - The gap before 'Smith'
--   - All 'Smith...' variations: 'Smithson', 'Smithers', etc.
--   - The gap after the 'Smith...' range
--
-- This IS efficiently protectable against phantoms!

-- ============================================
-- SUFFIX PATTERNS: Cannot use range locking
-- ============================================
-- Pattern: LIKE '%son'
-- Matches: 'Johnson', 'Wilson', 'Smithson', etc.
-- There is no contiguous range in a standard B+Tree index.

SELECT * FROM customers WHERE last_name LIKE '%son';

-- The database must scan all rows (or use a specialized index).
-- Gap locking is IMPOSSIBLE without full table protection.
-- Only options: table lock or accept phantom risk.

-- ============================================
-- INFIX PATTERNS: Also cannot use range locking
-- ============================================
-- Pattern: LIKE '%mit%'
-- Matches: 'Smith', 'Schmidt', 'Blacksmith', etc.
-- Same problem as suffix patterns.

SELECT * FROM customers WHERE last_name LIKE '%mit%';

-- ============================================
-- ESCAPE PATTERNS: Range determination varies
-- ============================================
-- Pattern: LIKE 'O''Brien%'  -- Literal apostrophe
-- Can still be treated as a prefix range if the optimizer understands it.

SELECT * FROM customers WHERE last_name LIKE 'O''Brien%';
```

| Pattern Type | Example | Index Usable? | Gap Lock Possible? | Phantom Risk |
|---|---|---|---|---|
| Prefix | LIKE 'abc%' | Yes (range scan) | Yes | Low (if index exists) |
| Suffix | LIKE '%xyz' | No (full scan) | No | High (table lock or accept) |
| Infix | LIKE '%abc%' | No (full scan) | No | High (table lock or accept) |
| Prefix+Suffix | LIKE 'a%z' | Partial (prefix only) | Partial | Medium |
| Single Wildcard | LIKE 'a_c' | Possible (range) | Possible | Low to Medium |
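The prefix-to-range rewrite shown above can be sketched in a few lines. This is a simplification: a real optimizer must also handle collations, multi-byte encodings, and prefixes ending in the top code point.

```python
def prefix_to_range(prefix):
    """Convert LIKE 'prefix%' into a half-open key range [prefix, upper)
    by incrementing the prefix's last character. Sketch only."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper

low, high = prefix_to_range("Smith")
assert (low, high) == ("Smith", "Smiti")

# Every string matching 'Smith%' falls inside [low, high) ...
assert low <= "Smithson" < high
# ... and non-matching strings fall outside it.
assert not (low <= "Smyth" < high)
```

Because the pattern becomes an ordinary key range, the engine can apply the same gap-locking machinery used for BETWEEN.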
For suffix and infix patterns, specialized indexes like PostgreSQL's pg_trgm (trigram) or full-text search indexes can provide range-like protection. However, their phantom locking behavior varies by implementation—consult your database's documentation.
Temporal predicates are among the most common range queries in business applications—and consequently, among the most phantom-prone. Reports by date range, time-windowed aggregations, and scheduling queries all face significant phantom risks.
```sql
-- ============================================
-- SCENARIO 1: Daily Sales Report
-- ============================================
BEGIN TRANSACTION;

-- Query 1: Get today's sales total
SELECT SUM(amount) AS daily_total
FROM sales
WHERE sale_date = CURRENT_DATE;
-- Result: $45,000

-- Process this total for the dashboard...

-- Meanwhile, T2 records a new sale:
--   INSERT INTO sales (sale_date, amount)
--   VALUES (CURRENT_DATE, 5000);
--   COMMIT;

-- Query 2: Verify total matches details
SELECT COUNT(*), SUM(amount)
FROM sales
WHERE sale_date = CURRENT_DATE;
-- Result: $50,000 ← PHANTOM!

COMMIT;

-- ============================================
-- SCENARIO 2: Month-to-Date with Unbounded End
-- ============================================
BEGIN TRANSACTION;

SELECT SUM(amount)
FROM sales
WHERE sale_date >= '2024-01-01'
  AND sale_date <= CURRENT_DATE;  -- Bounded, but the date changes!

-- Problems:
--   1. New sales today create INSERT phantoms
--   2. If the query runs across midnight, CURRENT_DATE changes
--   3. Timezone differences can create edge cases

-- ============================================
-- SCENARIO 3: Time-Windowed Aggregation
-- ============================================
BEGIN TRANSACTION;

-- Get orders from the last hour
SELECT COUNT(*)
FROM orders
WHERE created_at > NOW() - INTERVAL '1 hour';
-- Result: 150 orders

-- Problem: NOW() keeps advancing!
-- Even without phantoms, this query isn't truly repeatable
-- because the time boundary itself is moving.

-- Better approach: Pin the time reference
SET @query_time = NOW();

SELECT COUNT(*)
FROM orders
WHERE created_at > @query_time - INTERVAL '1 hour'
  AND created_at <= @query_time;

-- This creates a stable, bounded range.
-- It still needs phantom protection, but at least the
-- boundaries don't shift.

COMMIT;
```

WHERE date >= '2024-01-01' has no upper bound, so every future insertion becomes a potential phantom. Use explicit end dates.

In PostgreSQL, transaction_timestamp() (equivalent to now()) returns the timestamp when the current transaction began, while statement_timestamp() returns the start of the current statement. Use transaction-stable timestamps for consistent temporal predicates within a transaction.
The intersection of aggregate functions and range predicates creates the highest-risk phantom scenarios. Each aggregate function has unique characteristics that determine how phantoms affect its results.
| Function | Phantom Sensitivity | Impact of Single Phantom | Recoverability |
|---|---|---|---|
| COUNT(*) | Very High | Changes by exactly 1 | Easy to detect (integer change) |
| SUM(x) | High | Increases/decreases by phantom value | Detectable if values known |
| AVG(x) | High | Shifts average non-linearly | Hard to detect (depends on distribution) |
| MIN(x) | Medium | Only changes if phantom is new minimum | Detectable if < previous MIN |
| MAX(x) | Medium | Only changes if phantom is new maximum | Detectable if > previous MAX |
| COUNT(DISTINCT x) | Medium-High | Changes by 0 or 1 | Depends on value uniqueness |
| STRING_AGG/GROUP_CONCAT | Very High | Completely changes output | Visible but may be subtle |
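The arithmetic behind these sensitivities is worth checking by hand. The Python sketch below reproduces the worked example used in this section: one phantom row below the current average pulls AVG down even though COUNT and SUM both increase.

```python
from statistics import mean

# Initial result set: Alice, Bob, Carol, Dave
salaries = [60_000, 75_000, 80_000, 95_000]
assert sum(salaries) == 310_000
assert mean(salaries) == 77_500

# Phantom INSERT: Eve at 70K, below the current average
salaries.append(70_000)

assert len(salaries) == 5           # COUNT changed by exactly 1
assert sum(salaries) == 380_000     # SUM changed by the phantom value
assert mean(salaries) == 76_000     # AVG DROPPED despite adding a row
assert min(salaries) == 60_000      # MIN unchanged (70K is not a new extreme)
assert max(salaries) == 95_000      # MAX unchanged
```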
```sql
-- Demonstration: How phantoms affect different aggregates

-- Initial state: employees in Engineering with salary 50K-100K
-- Rows: (Alice, 60K), (Bob, 75K), (Carol, 80K), (Dave, 95K)

-- Query at t1:
SELECT COUNT(*)    AS emp_count,     -- 4
       SUM(salary) AS total_salary,  -- 310,000
       AVG(salary) AS avg_salary,    -- 77,500
       MIN(salary) AS min_salary,    -- 60,000
       MAX(salary) AS max_salary     -- 95,000
FROM employees
WHERE department = 'Engineering'
  AND salary BETWEEN 50000 AND 100000;

-- Phantom INSERT: (Eve, Engineering, 70,000)

-- Query at t2 (same transaction):
SELECT COUNT(*)    AS emp_count,     -- 5 (+1)
       SUM(salary) AS total_salary,  -- 380,000 (+70,000)
       AVG(salary) AS avg_salary,    -- 76,000 (−1,500 !)
       MIN(salary) AS min_salary,    -- 60,000 (unchanged)
       MAX(salary) AS max_salary     -- 95,000 (unchanged)
FROM employees
WHERE department = 'Engineering'
  AND salary BETWEEN 50000 AND 100000;

-- Observations:
--   1. COUNT always changes by 1 for a single phantom
--   2. SUM changes by the phantom value
--   3. AVG can go UP or DOWN depending on the phantom value vs the current avg
--   4. MIN/MAX only change if the phantom is an extreme value
--   5. The AVERAGE DECREASED even though we ADDED a row!
--      (Eve's 70K is below the current average of 77.5K)

-- ============================================
-- Dangerous pattern: Using aggregate results in logic
-- ============================================
BEGIN TRANSACTION;

-- Step 1: Calculate bonus pool based on headcount
SELECT COUNT(*) * 1000 AS bonus_pool
FROM employees
WHERE department = 'Engineering'
  AND hire_date < CURRENT_DATE - INTERVAL '1 year';
-- Result: 4 employees × $1000 = $4000 pool

-- Phantom INSERT: New employee Eve, hired 2 years ago

-- Step 2: Distribute bonus pool equally
UPDATE employees
SET bonus = 4000.00 / (
    SELECT COUNT(*) FROM employees
    WHERE department = 'Engineering'
      AND hire_date < CURRENT_DATE - INTERVAL '1 year')
WHERE department = 'Engineering'
  AND hire_date < CURRENT_DATE - INTERVAL '1 year';

-- Now 5 employees each get $4000/5 = $800.
-- The pool was sized for 4 employees but split among 5:
-- everyone receives $800 instead of the intended $1,000.

COMMIT;
```

One of the most dangerous phantom patterns is computing an aggregate, then using that value in subsequent operations. If a phantom changes the underlying result set between the computation and its use, the logic becomes invalid. This pattern requires SERIALIZABLE isolation or explicit snapshot management.
Protecting range queries from phantoms requires deliberate strategy choices. Each approach has distinct trade-offs between correctness, performance, and complexity.
In SQL Server, table hints such as WITH (HOLDLOCK) or WITH (TABLOCKX) can force stricter locking. Trade-off: very coarse-grained, severe concurrency impact.
```sql
-- ============================================
-- Strategy 1: SERIALIZABLE Isolation
-- ============================================
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

SELECT SUM(amount)
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';

-- The database now protects this range from concurrent inserts.
-- InnoDB: uses gap locks on the order_date index
-- PostgreSQL SSI: tracks the predicate read, detects conflicts

COMMIT;

-- ============================================
-- Strategy 2: Composite Index
-- ============================================
CREATE INDEX idx_orders_date_status ON orders(order_date, status);

-- The query can now use efficient gap locking on both columns:
SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31'
  AND status = 'pending';

-- ============================================
-- Strategy 3: SELECT FOR UPDATE (with caution)
-- ============================================
BEGIN TRANSACTION;

SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31'
FOR UPDATE;  -- Locks all matching rows exclusively

-- WARNING: This prevents concurrent updates but may NOT
-- prevent INSERT phantoms in all databases!
-- Check your database documentation for gap locking behavior.

COMMIT;

-- ============================================
-- Strategy 4: Application-Level Verification
-- (pseudocode: variable and control-flow syntax varies by database)
-- ============================================
BEGIN TRANSACTION;

-- Capture initial state
SELECT COUNT(*) INTO @initial_count
FROM orders
WHERE order_date = CURRENT_DATE;

-- ... perform operations based on the count ...

-- Verify before commit
SELECT COUNT(*) INTO @final_count
FROM orders
WHERE order_date = CURRENT_DATE;

IF @initial_count != @final_count THEN
    ROLLBACK;  -- Phantom detected; retry
ELSE
    COMMIT;
END IF;
```

In most applications, 80% of phantom-sensitive range queries occur in a small number of critical paths (financial reports, inventory counts, security audits). Focus your protection efforts on these paths.
Accept phantom risk in non-critical analytics where eventual consistency is acceptable.
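As a minimal sketch of the application-level verification strategy, here is a Python version using the standard sqlite3 module. Table and column names are illustrative, and a single SQLite connection serializes writers, so this only demonstrates the shape of the check-then-retry loop, not real contention.

```python
import sqlite3

def sum_with_phantom_check(conn, day, max_retries=3):
    """Compute an aggregate, then re-check the row count before trusting it.
    If the count changed between the two reads, retry (optimistic check)."""
    for _ in range(max_retries):
        n1, total = conn.execute(
            "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders "
            "WHERE order_date = ?", (day,)).fetchone()
        # ... application work based on `total` would happen here ...
        (n2,) = conn.execute(
            "SELECT COUNT(*) FROM orders WHERE order_date = ?",
            (day,)).fetchone()
        if n1 == n2:          # no phantom observed between the reads: accept
            return total
    raise RuntimeError("phantom detected on every attempt")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.execute(
    "INSERT INTO orders VALUES ('2024-01-15', 100.0), ('2024-01-15', 250.0)")
result = sum_with_phantom_check(conn, "2024-01-15")  # → 350.0
```

The verification only detects phantoms that change the count; a simultaneous insert-plus-delete would slip through, which is why SERIALIZABLE remains the robust option for critical paths.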
We have thoroughly examined how range queries uniquely amplify phantom read vulnerabilities: row-level locks can only protect rows that already exist, so range predicates demand gap locking, predicate tracking, or serializable isolation.
What's next:
Now that we understand how range queries expose phantom vulnerabilities, the next page presents concrete example scenarios that demonstrate phantom reads in realistic application contexts. These examples will solidify your understanding of phantom behavior and its real-world consequences.
You now possess a deep understanding of range query phantom vulnerabilities—from the mechanics of gap locking to the challenges of multi-dimensional predicates. This knowledge enables you to design database schemas and queries that are resilient to phantom interference.