Range queries represent the most phantom-susceptible category of database operations. When you write WHERE price BETWEEN 100 AND 500 or WHERE date > '2024-01-01', you're defining an infinite search space—a conceptual set of all possible rows that could satisfy your condition, past, present, and future.
This page dissects why range queries are uniquely vulnerable to phantoms, how database systems attempt to protect them, and what strategies developers must employ to ensure consistency in range-based operations.
By the end of this page, you will master the mechanics of range query phantoms, understand how gap locking and next-key locking work, analyze the unique challenges of unbounded and multi-dimensional ranges, and recognize range query patterns that require special protection.
Range queries use comparison operators to define a subset of possible column values. Understanding their structure reveals why they create phantom vulnerabilities.
| Predicate Type | SQL Example | Range Characteristics | Phantom Risk |
|---|---|---|---|
| Bounded Range | BETWEEN 10 AND 100 | Closed interval [10, 100] | High - many possible insertions |
| Lower Bound Only | > 50 or >= 50 | Half-open interval [50, ∞) | Very High - unbounded upper end |
| Upper Bound Only | < 50 or <= 50 | Half-open interval (−∞, 50] | Very High - unbounded lower end |
| Inequality | != 50 | Everything except 50 | Extreme - nearly all values |
| LIKE Pattern | LIKE 'Smith%' | String range starting with 'Smith' | High - many possible strings |
| IN List | IN (1, 5, 10) | Discrete set {1, 5, 10} | Moderate - defined set only |
| Composite Range | x > 10 AND y < 20 | Multi-dimensional box | Very High - 2D+ search space |
The Infinite Set Problem:
Consider the predicate salary > 100000. This conceptually matches every current row above the threshold, every future INSERT above it, and every UPDATE that raises an existing salary past it.
Traditional locking protects existing rows. But how do you lock a salary value of 150000 that no employee currently has? You can't—unless you lock the range itself.
Range predicates define potentially infinite sets of matching values. Row-level locks can only protect existing materialized values. True range protection requires locking the conceptual space between and beyond existing values—a fundamentally different locking paradigm.
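To make the distinction concrete, here is a toy sketch in Python (not a real DBMS; all names are invented) contrasting what a row lock can cover with what a range lock must cover:

```python
# A toy predicate-lock checker illustrating why locking existing rows
# cannot protect a range predicate.

class RangeLock:
    """Models a lock on the value interval [low, high]."""
    def __init__(self, low, high):
        self.low, self.high = low, high

    def blocks_insert(self, value):
        # An insert is blocked iff the new value falls inside the locked range
        return self.low <= value <= self.high

# Existing rows: no employee currently earns over 100K,
# so a row lock has literally nothing to attach to.
existing_salaries = [60_000, 95_000]

# Protecting "salary > 100000" means locking the *range itself*.
lock = RangeLock(100_000, float("inf"))

assert not lock.blocks_insert(95_000)   # outside the predicate: allowed
assert lock.blocks_insert(150_000)      # phantom candidate: blocked
```

The key point: the lock is defined over values that do not yet exist, which is exactly what row-level locking cannot express.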
Gap locks are a specialized locking mechanism designed to prevent phantom insertions. They protect the "gaps" between existing index values—the spaces where new rows could be inserted.
How Gap Locks Work:
In a B+Tree index, values are stored in sorted order. Between any two adjacent values lies a gap—a range of values that don't currently exist but could be inserted. Gap locks prevent insertions into these gaps.
```sql
-- B+Tree Index on 'price' column
-- Current values: [50, 100, 150, 200, 300]
--
-- Visual representation with gaps:
--
-- (-∞, 50) [50] (50, 100) [100] (100, 150) [150] (150, 200) [200] (200, 300) [300] (300, +∞)
--    ↑              ↑               ↑                ↑                ↑                ↑
--  Gap 1          Gap 2           Gap 3            Gap 4            Gap 5            Gap 6

-- Query:
SELECT * FROM products WHERE price BETWEEN 100 AND 200;

-- With row-level locking ONLY:
-- Locks acquired: [100] [150] [200]
-- Gaps remain UNPROTECTED!
-- A concurrent INSERT of price = 125 would succeed → PHANTOM!

-- With gap locking ENABLED:
-- Locks acquired:
--   - Row lock on [100]
--   - Gap lock on (50, 100)   -- Prevents inserts like 75
--   - Gap lock on (100, 150)  -- Prevents inserts like 125
--   - Row lock on [150]
--   - Gap lock on (150, 200)  -- Prevents inserts like 175
--   - Row lock on [200]
--   - Gap lock on (200, 300)  -- Prevents inserts like 250

-- Now a concurrent INSERT of price = 125 would BLOCK on the gap lock!
```

Gap Lock Properties:
Gap locks don't conflict with each other: Multiple transactions can hold gap locks on the same gap. Gap locks only block insertions into the gap.
Gap locks are held until transaction end: Unlike some row locks that might be released early, gap locks must persist to prevent phantoms throughout the transaction.
Gap locks extend to "supremum": For unbounded ranges like price > 200, the gap lock extends from the last matching value to a special "supremum" pseudo-record representing positive infinity.
Gap locks work with index order: Gap locks only make sense on indexed columns. For non-indexed predicates, gap locking is impossible without a table scan and table-level locking.
InnoDB uses 'next-key locking'—a combination of a row lock and a gap lock on the gap before the row. This efficiently prevents phantoms by locking both the existing row and the insertion point before it. Next-key locks are the default locking mode in InnoDB for index scans.
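The next-key lock set for a range scan can be sketched as a small function. This is a simplified model (real InnoDB locks index records under a specific isolation level, and details vary by version), but it reproduces the lock set from the example above:

```python
import bisect

def next_key_locks(index_values, low, high):
    """Return (row_locks, gap_locks) in the style of next-key locking for a
    closed range [low, high] over unique index values. Simplified sketch:
    a real engine locks index records, handles duplicates, and may lock
    up to the 'supremum' pseudo-record for unbounded scans."""
    vals = sorted(index_values)
    row_locks, gap_locks = [], []
    i = bisect.bisect_left(vals, low)          # first value >= low
    prev = vals[i - 1] if i > 0 else "-inf"    # gap before first match
    for v in vals[i:]:
        if v > high:
            break
        gap_locks.append((prev, v))            # gap lock before the record
        row_locks.append(v)                    # row lock on the record
        prev = v
    # Gap after the last matching record, up to the next value or supremum
    j = bisect.bisect_right(vals, high)
    nxt = vals[j] if j < len(vals) else "supremum"
    gap_locks.append((prev, nxt))
    return row_locks, gap_locks

rows, gaps = next_key_locks([50, 100, 150, 200, 300], 100, 200)
# rows == [100, 150, 200]
# gaps == [(50, 100), (100, 150), (150, 200), (200, 300)]
```

Note how this matches the BETWEEN 100 AND 200 example: three row locks plus the surrounding gaps, leaving values like 25 and 350 free for concurrent inserts.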
The difference between bounded and unbounded ranges significantly affects both phantom vulnerability and the cost of prevention.
```sql
-- ============================================
-- BOUNDED RANGE: price BETWEEN 100 AND 200
-- ============================================
-- Index values: [50, 100, 150, 200, 300]
--
-- Locks needed:
--   Gap (50, 100),  Row [100]
--   Gap (100, 150), Row [150]
--   Gap (150, 200), Row [200]
--
-- Concurrent activities ALLOWED:
--   INSERT price = 25   ✓ (outside range)
--   INSERT price = 250  ✓ (outside range)
--
-- Concurrent activities BLOCKED:
--   UPDATE existing row to price = 175  ✗ (blocked by gap lock)
--   INSERT price = 125                  ✗ (blocked by gap lock)

-- ============================================
-- UNBOUNDED RANGE: price > 150
-- ============================================
-- Index values: [50, 100, 150, 200, 300]
--
-- Locks needed:
--   Gap (100, 150), Row [150]  -- Gap before first matching row
--   Gap (150, 200), Row [200]
--   Gap (200, 300), Row [300]
--   Gap (300, supremum)        -- The infinite gap after the last value
--
-- Concurrent activities ALLOWED:
--   INSERT price = 25   ✓ (below range)
--   INSERT price = 125  ✓ (below range)
--
-- Concurrent activities BLOCKED:
--   INSERT price = 175     ✗ (in range)
--   INSERT price = 500     ✗ (in supremum gap!)
--   INSERT price = 999999  ✗ (still in supremum gap)

-- ============================================
-- FULLY UNBOUNDED: inequality != 150
-- ============================================
-- This is essentially (−∞, 150) ∪ (150, +∞).
-- It requires locking EVERYTHING except the value 150,
-- which effectively requires table-level locking for protection!
```

The 'supremum' is a special pseudo-record that represents the upper bound of all possible values in an index. Locking to the supremum effectively locks an infinite range. Similarly, 'infimum' represents the lower bound. These concepts enable gap locking on unbounded ranges.
When predicates involve multiple columns, the phantom problem becomes significantly more complex. Multi-dimensional ranges create search spaces that are difficult to protect efficiently.
The Compound Predicate Challenge:
Consider: WHERE department = 'Sales' AND salary > 50000 AND hire_date < '2024-01-01'
This defines a three-dimensional search space. Phantom protection requires ensuring no new row can enter this 3D region—but B+Tree indexes are fundamentally one-dimensional. They can efficiently handle one column's range, but compound conditions require creative approaches.
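A toy illustration of what "linearizing" means: a composite index orders entries lexicographically by tuple, so an equality on the first column plus a range on the second selects one contiguous slice of the index, which ordinary gap locking can cover. (The data below is invented.)

```python
# A composite index on (department, salary) stores entries sorted by tuple.
entries = sorted([
    ("Engineering", 60_000), ("Engineering", 95_000),
    ("Engineering", 100_000), ("Engineering", 130_000),
    ("Sales", 70_000), ("Sales", 110_000),
])

# Predicate: department = 'Engineering' AND salary BETWEEN 80000 AND 120000
lo, hi = ("Engineering", 80_000), ("Engineering", 120_000)
matching = [e for e in entries if lo <= e <= hi]

# The matches form ONE contiguous run of the sorted entries --
# exactly the property that makes gap locking feasible here.
first = entries.index(matching[0])
assert entries[first:first + len(matching)] == matching
```

Without the composite ordering (say, with separate indexes on department and salary), the matching rows are scattered across each index, and no single contiguous key range describes the 2D region.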
```sql
-- Multi-dimensional phantom scenario
-- Table: employees (id, name, department, salary, hire_date)
-- Indexes: (department), (salary), (hire_date), (department, salary)

-- Query with 2D predicate:
SELECT * FROM employees
WHERE department = 'Engineering'
  AND salary BETWEEN 80000 AND 120000;

-- Visual representation of the 2D search space:
--
--  salary
--    ↑
-- 120K │ ███████████████
--      │ ███████████████
--      │ ███████████████
--  80K │ ███████████████
--      │
--      └──────────────────→ department
--           Engineering
--
-- The shaded region represents all possible (dept, salary) pairs
-- that would satisfy the predicate.

-- With an index on (department, salary):
--   Gap locks can protect this 2D slice efficiently;
--   the composite index linearizes the 2D space.

-- Without a composite index:
--   Option 1: Lock all Engineering rows (over-locking)
--   Option 2: Lock all rows with salary 80K-120K (over-locking)
--   Option 3: Lock both independently (partial protection)
--   Option 4: Table lock (serialization)

-- ============================================
-- 3D Predicate: Even more complex
-- ============================================
SELECT * FROM employees
WHERE department = 'Engineering'
  AND salary BETWEEN 80000 AND 120000
  AND hire_date > '2023-01-01';

-- Now we need to protect a 3D region.
-- No single index can represent this efficiently.
-- Practical options:
--   1. Composite index (dept, salary, hire_date) - if selectivity is right
--   2. Partial protection with multiple indexes
--   3. Accept phantom risk at lower isolation levels
--   4. Use SERIALIZABLE with SSI (Serializable Snapshot Isolation)
```

| Strategy | Protection Level | Overhead | When to Use |
|---|---|---|---|
| Composite Index (perfect order) | Complete | Low query overhead, storage for index | Frequent queries on this exact combination |
| Multiple Single-Column Indexes | Partial (depends on optimizer) | Multiple index maintenance | Varied query patterns |
| Covering Index | Complete for covered queries | High storage overhead | Read-heavy workloads with known patterns |
| Table Lock | Complete | Severe concurrency impact | Rare queries, batch operations |
| SSI (Postgres) | Complete (detects conflicts) | Retry overhead on conflicts | Modern PostgreSQL, mixed workloads |
For a composite index (A, B, C), gap locking works efficiently for predicates on (A), (A, B), or (A, B, C). But a predicate on just (B, C) cannot use this index for gap locking—the index order determines which predicates can be protected.
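This leftmost-prefix rule is mechanical enough to express as a tiny check (a sketch; real optimizers also consider range vs. equality predicates on each column):

```python
# Which predicates can a composite index gap-lock?
# An index on (A, B, C) orders keys by A first, so only predicates that
# constrain a leftmost prefix of its columns map to one contiguous key range.

def is_leftmost_prefix(index_cols, predicate_cols):
    """True if predicate_cols is exactly some leftmost prefix of index_cols
    (column order within the predicate itself doesn't matter)."""
    k = len(predicate_cols)
    return set(index_cols[:k]) == set(predicate_cols)

idx = ("department", "salary", "hire_date")

assert is_leftmost_prefix(idx, ("department",))                        # (A)
assert is_leftmost_prefix(idx, ("department", "salary"))               # (A, B)
assert is_leftmost_prefix(idx, ("department", "salary", "hire_date"))  # (A, B, C)
assert not is_leftmost_prefix(idx, ("salary", "hire_date"))            # (B, C): no
```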
Pattern matching predicates like LIKE create interesting range scenarios. The nature of the pattern determines whether efficient gap locking is possible.
```sql
-- ============================================
-- PREFIX PATTERNS: Can use range locking
-- ============================================
-- Pattern: LIKE 'Smith%'
-- Equivalent to: name >= 'Smith' AND name < 'Smiti'
-- (The trailing 'h' is incremented to 'i' to form the exclusive upper bound)

SELECT * FROM customers WHERE last_name LIKE 'Smith%';

-- This translates to an index range scan:
--   Start: 'Smith' (inclusive)
--   End:   'Smiti' (exclusive)
--
-- Gap locks protect:
--   - The gap before 'Smith'
--   - All 'Smith...' variations: 'Smithson', 'Smithers', etc.
--   - The gap after the 'Smith...' range
--
-- This IS efficiently protectable against phantoms!

-- ============================================
-- SUFFIX PATTERNS: Cannot use range locking
-- ============================================
-- Pattern: LIKE '%son'
-- Matches: 'Johnson', 'Wilson', 'Smithson', etc.
-- There is no contiguous range in a standard B+Tree index.

SELECT * FROM customers WHERE last_name LIKE '%son';

-- The database must scan all rows (or use a specialized index).
-- Gap locking is IMPOSSIBLE without full table protection.
-- Only options: table lock or accept phantom risk.

-- ============================================
-- INFIX PATTERNS: Also cannot use range locking
-- ============================================
-- Pattern: LIKE '%mit%'
-- Matches: 'Smith', 'Schmidt', 'Blacksmith', etc.
-- Same problem as suffix patterns.

SELECT * FROM customers WHERE last_name LIKE '%mit%';

-- ============================================
-- ESCAPE PATTERNS: Range determination varies
-- ============================================
-- Pattern: LIKE 'O''Brien%'  -- Literal apostrophe
-- Can still be treated as a prefix range if the optimizer understands it.

SELECT * FROM customers WHERE last_name LIKE 'O''Brien%';
```

| Pattern Type | Example | Index Usable? | Gap Lock Possible? | Phantom Risk |
|---|---|---|---|---|
| Prefix | LIKE 'abc%' | Yes (range scan) | Yes | Low (if index exists) |
| Suffix | LIKE '%xyz' | No (full scan) | No | High (table lock or accept) |
| Infix | LIKE '%abc%' | No (full scan) | No | High (table lock or accept) |
| Prefix+Suffix | LIKE 'a%z' | Partial (prefix only) | Partial | Medium |
| Single Wildcard | LIKE 'a_c' | Possible (range) | Possible | Low to Medium |
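The prefix-to-range rewrite shown above can be sketched in a few lines. This is a simplification: a real optimizer must also handle collations, multi-byte encodings, and prefixes ending in the top code point.

```python
def prefix_to_range(prefix):
    """Convert LIKE 'prefix%' into a half-open key range [prefix, upper)
    by incrementing the prefix's last character. Sketch only."""
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper

low, high = prefix_to_range("Smith")
assert (low, high) == ("Smith", "Smiti")

# Every string matching 'Smith%' falls inside [low, high) ...
assert low <= "Smithson" < high
# ... and non-matching strings fall outside it.
assert not (low <= "Smyth" < high)
```

Because the pattern becomes an ordinary key range, the engine can apply the same gap-locking machinery used for BETWEEN.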
For suffix and infix patterns, specialized indexes like PostgreSQL's pg_trgm (trigram) or full-text search indexes can provide range-like protection. However, their phantom locking behavior varies by implementation—consult your database's documentation.
Temporal predicates are among the most common range queries in business applications—and consequently, among the most phantom-prone. Reports by date range, time-windowed aggregations, and scheduling queries all face significant phantom risks.
```sql
-- ============================================
-- SCENARIO 1: Daily Sales Report
-- ============================================
BEGIN TRANSACTION;

-- Query 1: Get today's sales total
SELECT SUM(amount) AS daily_total
FROM sales
WHERE sale_date = CURRENT_DATE;
-- Result: $45,000

-- Process this total for the dashboard...

-- Meanwhile, T2 records a new sale:
--   INSERT INTO sales (sale_date, amount)
--   VALUES (CURRENT_DATE, 5000);
--   COMMIT;

-- Query 2: Verify total matches details
SELECT COUNT(*), SUM(amount)
FROM sales
WHERE sale_date = CURRENT_DATE;
-- Result: $50,000 ← PHANTOM!

COMMIT;

-- ============================================
-- SCENARIO 2: Month-to-Date with Unbounded End
-- ============================================
BEGIN TRANSACTION;

SELECT SUM(amount)
FROM sales
WHERE sale_date >= '2024-01-01'
  AND sale_date <= CURRENT_DATE;  -- Bounded, but the date changes!

-- Problems:
--   1. New sales today create INSERT phantoms
--   2. If the query runs across midnight, CURRENT_DATE changes
--   3. Timezone differences can create edge cases

-- ============================================
-- SCENARIO 3: Time-Windowed Aggregation
-- ============================================
BEGIN TRANSACTION;

-- Get orders from the last hour
SELECT COUNT(*)
FROM orders
WHERE created_at > NOW() - INTERVAL '1 hour';
-- Result: 150 orders

-- Problem: NOW() keeps advancing!
-- Even without phantoms, this query isn't truly repeatable
-- because the time boundary itself is moving.

-- Better approach: Pin the time reference
SET @query_time = NOW();

SELECT COUNT(*)
FROM orders
WHERE created_at > @query_time - INTERVAL '1 hour'
  AND created_at <= @query_time;

-- This creates a stable, bounded range.
-- It still needs phantom protection, but at least the
-- boundaries don't shift.

COMMIT;
```

WHERE date >= '2024-01-01' has no upper bound, so every future insertion becomes a potential phantom. Use explicit end dates.

In PostgreSQL, transaction_timestamp() (equivalent to now()) returns the timestamp when the current transaction began, while statement_timestamp() returns the start of the current statement. Use transaction-stable timestamps for consistent temporal predicates within a transaction.
The intersection of aggregate functions and range predicates creates the highest-risk phantom scenarios. Each aggregate function has unique characteristics that determine how phantoms affect its results.
| Function | Phantom Sensitivity | Impact of Single Phantom | Recoverability |
|---|---|---|---|
| COUNT(*) | Very High | Changes by exactly 1 | Easy to detect (integer change) |
| SUM(x) | High | Increases/decreases by phantom value | Detectable if values known |
| AVG(x) | High | Shifts average non-linearly | Hard to detect (depends on distribution) |
| MIN(x) | Medium | Only changes if phantom is new minimum | Detectable if < previous MIN |
| MAX(x) | Medium | Only changes if phantom is new maximum | Detectable if > previous MAX |
| COUNT(DISTINCT x) | Medium-High | Changes by 0 or 1 | Depends on value uniqueness |
| STRING_AGG/GROUP_CONCAT | Very High | Completely changes output | Visible but may be subtle |
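The arithmetic behind these sensitivities is worth checking by hand. The Python sketch below reproduces the worked example used in this section: one phantom row below the current average pulls AVG down even though COUNT and SUM both increase.

```python
from statistics import mean

# Initial result set: Alice, Bob, Carol, Dave
salaries = [60_000, 75_000, 80_000, 95_000]
assert sum(salaries) == 310_000
assert mean(salaries) == 77_500

# Phantom INSERT: Eve at 70K, below the current average
salaries.append(70_000)

assert len(salaries) == 5           # COUNT changed by exactly 1
assert sum(salaries) == 380_000     # SUM changed by the phantom value
assert mean(salaries) == 76_000     # AVG DROPPED despite adding a row
assert min(salaries) == 60_000      # MIN unchanged (70K is not a new extreme)
assert max(salaries) == 95_000      # MAX unchanged
```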
```sql
-- Demonstration: How phantoms affect different aggregates

-- Initial state: employees in Engineering with salary 50K-100K
-- Rows: (Alice, 60K), (Bob, 75K), (Carol, 80K), (Dave, 95K)

-- Query at t1:
SELECT COUNT(*)    AS emp_count,     -- 4
       SUM(salary) AS total_salary,  -- 310,000
       AVG(salary) AS avg_salary,    -- 77,500
       MIN(salary) AS min_salary,    -- 60,000
       MAX(salary) AS max_salary     -- 95,000
FROM employees
WHERE department = 'Engineering'
  AND salary BETWEEN 50000 AND 100000;

-- Phantom INSERT: (Eve, Engineering, 70,000)

-- Query at t2 (same transaction):
SELECT COUNT(*)    AS emp_count,     -- 5 (+1)
       SUM(salary) AS total_salary,  -- 380,000 (+70,000)
       AVG(salary) AS avg_salary,    -- 76,000 (−1,500 !)
       MIN(salary) AS min_salary,    -- 60,000 (unchanged)
       MAX(salary) AS max_salary     -- 95,000 (unchanged)
FROM employees
WHERE department = 'Engineering'
  AND salary BETWEEN 50000 AND 100000;

-- Observations:
--   1. COUNT always changes by 1 for a single phantom
--   2. SUM changes by the phantom value
--   3. AVG can go UP or DOWN depending on the phantom value vs the current avg
--   4. MIN/MAX only change if the phantom is an extreme value
--   5. The AVERAGE DECREASED even though we ADDED a row!
--      (Eve's 70K is below the current average of 77.5K)

-- ============================================
-- Dangerous pattern: Using aggregate results in logic
-- ============================================
BEGIN TRANSACTION;

-- Step 1: Calculate bonus pool based on headcount
SELECT COUNT(*) * 1000 AS bonus_pool
FROM employees
WHERE department = 'Engineering'
  AND hire_date < CURRENT_DATE - INTERVAL '1 year';
-- Result: 4 employees × $1000 = $4000 pool

-- Phantom INSERT: New employee Eve, hired 2 years ago

-- Step 2: Distribute bonus pool equally
UPDATE employees
SET bonus = 4000.00 / (
    SELECT COUNT(*) FROM employees
    WHERE department = 'Engineering'
      AND hire_date < CURRENT_DATE - INTERVAL '1 year')
WHERE department = 'Engineering'
  AND hire_date < CURRENT_DATE - INTERVAL '1 year';

-- Now 5 employees each get $4000/5 = $800.
-- The pool was sized for 4 employees but split among 5:
-- everyone receives $800 instead of the intended $1,000.

COMMIT;
```

One of the most dangerous phantom patterns is computing an aggregate, then using that value in subsequent operations. If a phantom changes the underlying result set between the computation and its use, the logic becomes invalid. This pattern requires SERIALIZABLE isolation or explicit snapshot management.
Protecting range queries from phantoms requires deliberate strategy choices. Each approach has distinct trade-offs between correctness, performance, and complexity.
In SQL Server, table hints such as WITH (HOLDLOCK) or WITH (TABLOCKX) can force stricter locking. Trade-off: very coarse-grained, severe concurrency impact.
```sql
-- ============================================
-- Strategy 1: SERIALIZABLE Isolation
-- ============================================
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRANSACTION;

SELECT SUM(amount)
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';

-- The database now protects this range from concurrent inserts.
-- InnoDB: uses gap locks on the order_date index
-- PostgreSQL SSI: tracks the predicate read, detects conflicts

COMMIT;

-- ============================================
-- Strategy 2: Composite Index
-- ============================================
CREATE INDEX idx_orders_date_status ON orders(order_date, status);

-- The query can now use efficient gap locking on both columns:
SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31'
  AND status = 'pending';

-- ============================================
-- Strategy 3: SELECT FOR UPDATE (with caution)
-- ============================================
BEGIN TRANSACTION;

SELECT * FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31'
FOR UPDATE;  -- Locks all matching rows exclusively

-- WARNING: This prevents concurrent updates but may NOT
-- prevent INSERT phantoms in all databases!
-- Check your database documentation for gap locking behavior.

COMMIT;

-- ============================================
-- Strategy 4: Application-Level Verification
-- (pseudocode: variable and control-flow syntax varies by database)
-- ============================================
BEGIN TRANSACTION;

-- Capture initial state
SELECT COUNT(*) INTO @initial_count
FROM orders
WHERE order_date = CURRENT_DATE;

-- ... perform operations based on the count ...

-- Verify before commit
SELECT COUNT(*) INTO @final_count
FROM orders
WHERE order_date = CURRENT_DATE;

IF @initial_count != @final_count THEN
    ROLLBACK;  -- Phantom detected; retry
ELSE
    COMMIT;
END IF;
```

In most applications, 80% of phantom-sensitive range queries occur in a small number of critical paths (financial reports, inventory counts, security audits). Focus your protection efforts on these paths.
Accept phantom risk in non-critical analytics where eventual consistency is acceptable.
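As a minimal sketch of the application-level verification strategy, here is a Python version using the standard sqlite3 module. Table and column names are illustrative, and a single SQLite connection serializes writers, so this only demonstrates the shape of the check-then-retry loop, not real contention.

```python
import sqlite3

def sum_with_phantom_check(conn, day, max_retries=3):
    """Compute an aggregate, then re-check the row count before trusting it.
    If the count changed between the two reads, retry (optimistic check)."""
    for _ in range(max_retries):
        n1, total = conn.execute(
            "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders "
            "WHERE order_date = ?", (day,)).fetchone()
        # ... application work based on `total` would happen here ...
        (n2,) = conn.execute(
            "SELECT COUNT(*) FROM orders WHERE order_date = ?",
            (day,)).fetchone()
        if n1 == n2:          # no phantom observed between the reads: accept
            return total
    raise RuntimeError("phantom detected on every attempt")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_date TEXT, amount REAL)")
conn.execute(
    "INSERT INTO orders VALUES ('2024-01-15', 100.0), ('2024-01-15', 250.0)")
result = sum_with_phantom_check(conn, "2024-01-15")  # → 350.0
```

The verification only detects phantoms that change the count; a simultaneous insert-plus-delete would slip through, which is why SERIALIZABLE remains the robust option for critical paths.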
We have thoroughly examined how range queries uniquely amplify phantom read vulnerabilities: row-level locks can only protect rows that already exist, so range predicates demand gap locking, predicate tracking, or serializable isolation.
What's next:
Now that we understand how range queries expose phantom vulnerabilities, the next page presents concrete example scenarios that demonstrate phantom reads in realistic application contexts. These examples will solidify your understanding of phantom behavior and its real-world consequences.
You now possess a deep understanding of range query phantom vulnerabilities—from the mechanics of gap locking to the challenges of multi-dimensional predicates. This knowledge enables you to design database schemas and queries that are resilient to phantom interference.