Loading learning content...
If point queries were the only requirement, hash indexes would dominate—their O(1) lookup is unbeatable. But real-world database workloads are filled with range queries: finding all orders from last quarter, retrieving employees with salaries between $50K and $100K, or scanning log entries from the past hour.
This is where B+-trees shine. Their linked leaf structure transforms range queries from repeated tree traversals into a single descent followed by a sequential scan. This architectural decision—linking all leaf nodes in key order—is the defining feature that makes B+-trees the default index structure in virtually every relational database.
By the end of this page, you will master B+-tree range query execution. You'll understand how to find range boundaries, traverse linked leaves efficiently, handle open-ended ranges, calculate I/O costs, and recognize optimization opportunities for range-heavy workloads.
Range queries come in several forms, each with slightly different execution characteristics:
Closed Range (BETWEEN):
SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31'
Both lower and upper bounds are specified and inclusive.
Half-Open Ranges (Inequalities):
SELECT * FROM products WHERE price >= 100 AND price < 500
One or both bounds may be exclusive.
Open-Ended Ranges (One Bound):
SELECT * FROM employees WHERE salary >= 80000
SELECT * FROM logs WHERE timestamp < '2024-01-15'
Only one bound is specified; the range extends to the minimum or maximum key.
Prefix Ranges (String Matching):
SELECT * FROM customers WHERE name LIKE 'John%'
String prefixes translate to range queries on collation-ordered indexes.
| Query Type | SQL Example | Start Bound | End Bound | Scan Direction |
|---|---|---|---|---|
| Closed range | BETWEEN a AND b | Search for a | Stop after b | Forward |
| Greater than | a or >= a | Search for a | Rightmost leaf | Forward |
| Less than | < b or <= b | Leftmost leaf | Stop after b | Forward |
| Prefix match | LIKE 'abc%' | Search for 'abc' | Stop at first non-match | Forward |
| Full scan | ORDER BY indexed_col | Leftmost leaf | Rightmost leaf | Forward or backward |
The query LIKE 'John%' is equivalent to the range query name >= 'John' AND name < 'Joho' (where 'Joho' is the first string lexicographically greater than all strings starting with 'John'). The optimizer transforms prefix LIKE into range bounds automatically.
The B+-tree's range query efficiency stems from its doubly-linked leaf chain. Let's examine this structure in detail.
Leaf Node Linking:
Every leaf node contains:
This creates a linked list at the leaf level that maintains key ordering across the entire tree.
Why Linking Matters:
Without leaf linking, a range query would require:
With leaf linking:
For a query returning 10,000 matching records from a billion-row table:
That's a 1,400× improvement from a simple design decision.
Some B+-tree implementations use only forward (next) pointers to save space. This is sufficient for forward range scans but requires different strategies for backward scans (e.g., ORDER BY DESC). Production databases typically implement doubly-linked leaves to support both scan directions efficiently.
The range query algorithm consists of three phases: boundary finding, sequential scanning, and termination. Let's examine each in detail.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990
FUNCTION BPlusTreeRangeQuery(root, lowKey, highKey, lowInclusive, highInclusive) // ==================================================== // PHASE 1: FIND STARTING POSITION (Boundary Search) // ==================================================== // Use standard search to find the leaf containing lowKey startLeaf ← BPlusTreeSearch(root, lowKey).leafNode // Find the position within the leaf to start scanning IF lowInclusive THEN startPos ← FindFirstGreaterOrEqual(startLeaf, lowKey) ELSE startPos ← FindFirstGreaterThan(startLeaf, lowKey) END IF // ==================================================== // PHASE 2: SEQUENTIAL LEAF SCAN // ==================================================== results ← [] currentLeaf ← startLeaf currentPos ← startPos WHILE currentLeaf ≠ NULL DO WHILE currentPos < currentLeaf.keyCount DO currentKey ← currentLeaf.keys[currentPos] // Check if we've passed the upper bound IF highInclusive THEN IF currentKey > highKey THEN RETURN results // Done - past upper bound END IF ELSE IF currentKey >= highKey THEN RETURN results // Done - reached or passed upper bound END IF END IF // Key is in range - add to results results.append(currentLeaf.recordPointers[currentPos]) currentPos ← currentPos + 1 END WHILE // ==================================================== // PHASE 3: MOVE TO NEXT LEAF (Sibling Traversal) // ==================================================== currentLeaf ← currentLeaf.nextSibling currentPos ← 0 END WHILE RETURN results // Reached end of indexEND FUNCTION FUNCTION FindFirstGreaterOrEqual(leaf, key) // Binary search for first key >= target left ← 0 right ← leaf.keyCount - 1 result ← leaf.keyCount // Default: past end WHILE left ≤ right DO mid ← (left + right) / 2 IF leaf.keys[mid] >= key THEN result ← mid right ← mid - 1 // Look for earlier match ELSE left ← mid + 1 END IF END WHILE RETURN resultEND FUNCTION FUNCTION FindFirstGreaterThan(leaf, key) // Binary search for first key > target left ← 0 right ← leaf.keyCount - 1 result ← leaf.keyCount // Default: past end WHILE left ≤ right DO mid ← (left + right) / 2 IF leaf.keys[mid] > key THEN result ← mid right ← mid - 1 // Look for earlier match ELSE left ← mid + 1 END IF END WHILE RETURN resultEND FUNCTIONThe algorithm checks the upper bound on every key. For a small range in a large table, most of the work is finding the start; once we exceed the upper bound, we stop immediately without scanning further leaves. This makes B+-tree range queries efficient regardless of total table size—cost depends primarily on result set size.
Let's trace a complete range query through our example B+-tree:
Query: Find all keys in the range [25, 48] (inclusive both bounds)
Tree Structure (Leaf Level Only):
[3,5,7] ↔ [10,12,15] ↔ [20,22,28] ↔ [30,35,38] ↔ [40,42,45] ↔ [50,55,58]
Phase 1: Find Starting Position
Search for lowKey = 25:
Phase 2: Sequential Scan
| Step | Current Leaf | Current Key | Compare with 48 | Action |
|---|---|---|---|---|
| 1 | Leaf 3 | 28 | 28 ≤ 48 ✓ | Add 28 to results |
| 2 | ↓ end of Leaf 3, move to sibling | |||
| 3 | Leaf 4 | 30 | 30 ≤ 48 ✓ | Add 30 to results |
| 4 | Leaf 4 | 35 | 35 ≤ 48 ✓ | Add 35 to results |
| 5 | Leaf 4 | 38 | 38 ≤ 48 ✓ | Add 38 to results |
| 6 | ↓ end of Leaf 4, move to sibling | |||
| 7 | Leaf 5 | 40 | 40 ≤ 48 ✓ | Add 40 to results |
| 8 | Leaf 5 | 42 | 42 ≤ 48 ✓ | Add 42 to results |
| 9 | Leaf 5 | 45 | 45 ≤ 48 ✓ | Add 45 to results |
| 10 | ↓ end of Leaf 5, move to sibling | |||
| 11 | Leaf 6 | 50 | 50 > 48 ✗ | Stop - upper bound exceeded |
| Metric | Value | Explanation |
|---|---|---|
| Tree height traversals | 1 | Single descent to find starting leaf |
| Leaf nodes read | 4 | Leaves 3, 4, 5, 6 (stopped mid-scan on 6) |
| Total I/Os | ~5 | Height traversal + leaf reads |
| Keys examined | 10 | 28, 30, 35, 38, 40, 42, 45, 50 + binary search |
| Keys returned | 7 | [28, 30, 35, 38, 40, 42, 45] |
Despite the tree containing many more leaves, we only read the 4 leaves that contain keys in our range. The linked list structure ensures we never read irrelevant leaves—we start at the right position and stop exactly when the range ends.
Open-ended range queries require special handling because one bound is effectively infinite. Let's examine each case:
Query Pattern: WHERE column >= value
Algorithm:
Example: WHERE salary >= 80000
I/O Cost: O(log N) + O(k/b) where k is result count and b is keys per leaf
Optimization: For queries returning a large fraction of the table, the optimizer may prefer a full table scan to avoid following many sibling pointers.
An open-ended range query like WHERE age > 5 on an employee table will likely match nearly all rows. At this point, using the index becomes counterproductive—the query would read the entire index AND then fetch each row via record pointers (random I/Os). The optimizer may choose a sequential table scan instead.
Let's rigorously analyze the cost of range queries in terms of I/O operations and CPU time.
Notation:
I/O Cost Breakdown:
For typical parameters:
Example: Range query returning 10,000 keys from a billion-row table with 4KB pages and 100 keys/leaf:
| Result Set Size (k) | Leaf Pages Read | Total I/Os | Approx. Time @ 10ms/IO |
|---|---|---|---|
| 10 | 1 | 5 | 50 ms |
| 100 | 1 | 5 | 50 ms |
| 1,000 | 10 | 14 | 140 ms |
| 10,000 | 100 | 104 | 1 second |
| 100,000 | 1,000 | 1,004 | 10 seconds |
| 1,000,000 | 10,000 | 10,004 | 100 seconds (without read-ahead) |
When the storage system detects sequential leaf access, it can prefetch upcoming pages. With SSDs or large read-ahead buffers, sequential leaf scanning achieves near-streaming throughput—far faster than the random I/O estimates above. A 100,000-row range scan might take 1 second rather than 10 seconds with effective read-ahead.
CPU Cost:
CPU cost is typically negligible compared to I/O cost. For in-memory databases (all data cached), CPU becomes the bottleneck, but it still scales linearly with result size—unavoidable for any approach.
How do B+-tree range queries compare to other approaches?
| Structure | Range Query Cost | Key Advantage | Key Limitation |
|---|---|---|---|
| B+-tree | O(log N + k) | Linked leaves enable sequential scan | One tree descent required |
| Hash Index | O(N) — full scan required | N/A for ranges | Cannot support range queries at all |
| Sorted Array | O(log N + k) | Cache-friendly scan | O(N) insert cost prohibitive |
| Skip List | O(log N + k) | Simpler implementation | Less cache-efficient than B+-tree |
| Full Table Scan | O(N) | No index maintenance cost | Must examine every row |
When the index is not clustered (data rows not stored in index order), each index entry points to a potentially random location in the heap. A range scan returning 1,000 rows might require 1,000 random I/Os to fetch the actual data. This is often slower than a full table scan, leading optimizers to skip the index for large range queries on non-clustered indexes.
Database systems employ several optimizations to make range queries even faster:
A covering index includes all columns that a query needs. For range queries, this means scanning only leaf pages—no heap access at all. A query like SELECT employee_id, salary FROM employees WHERE salary BETWEEN 50000 AND 100000 on an index covering (salary, employee_id) never touches the table data.
Let's consolidate the key concepts of B+-tree range queries:
What's Next:
Now that you understand B+-tree query operations, the next page examines insertion with splitting—how new keys are added to the tree while maintaining balance and the critical invariants that make search efficient.
You now understand how B+-trees handle range queries with remarkable efficiency. This knowledge is essential for index design, query optimization, and understanding why B+-trees remain the dominant index structure in relational databases despite decades of alternative proposals.