Loading learning content...
If point queries are the battleground where hash and tree indexes compete closely, range queries are the terrain where the battle is decided before it begins. Hash indexes simply cannot perform range queries efficiently—not because of poor implementation, but because of fundamental mathematical properties of hash functions.
This limitation isn't a minor inconvenience. Range queries are ubiquitous in real-world database applications: finding all orders placed between two dates, all employees with salaries in a specific band, all products priced below a threshold. Understanding why hash indexes fail here—and how B+trees excel—is essential for practical index design.
Point queries ask 'Where is the record with key = X?' Range queries ask 'Where are all records with A ≤ key ≤ B?' Hash indexes can only answer the first question. Tree indexes answer both. This asymmetry often makes hash indexes impractical despite their point-query performance.
To understand why hash indexes cannot support range queries, we must examine what hash functions do to key ordering.
The Purpose of Hashing
Hash functions are designed to uniformly distribute keys across buckets. A good hash function achieves this by deliberately scrambling the relationship between similar keys. If consecutive keys mapped to consecutive buckets, the distribution would be poor—all hot data would cluster in adjacent buckets.
Consider what happens to a sequence of integer keys:
Key Hash Value (mod 8 buckets)
100 → 3
101 → 7
102 → 1
103 → 4
104 → 0
105 → 6
106 → 2
107 → 5
Notice that consecutive keys (100, 101, 102, ...) map to seemingly random buckets (3, 7, 1, ...). This is by design—it's exactly what makes hash indexes efficient for point lookups.
The same property that makes hash indexes fast for equality lookups—destroying key ordering—makes them useless for range queries. To find all keys between 100 and 107, you'd need to examine every bucket because those keys are scattered randomly across the entire structure.
Mathematical Formalization
A hash function h: K → B maps keys from key space K to bucket space B. For range queries on [a, b], we need all keys k where a ≤ k ≤ b.
Given keys k₁ < k₂:
To support range queries, we would need a hash function where:
But this is precisely what makes a hash function poor—keys would cluster based on their natural ordering, defeating the purpose of hashing for uniform distribution.
| Property | Hash Function | B+Tree Organization |
|---|---|---|
| Order preservation | Deliberately destroyed | Carefully maintained |
| Key k < key k' | h(k) has no relation to h(k') | k appears before k' in traversal |
| Adjacent keys | May be in distant buckets | Are in same or adjacent leaves |
| Range [a,b] location | Scattered across all buckets | Contiguous in leaf sequence |
Let's quantify the performance difference for range queries between hash and tree indexes.
Hash Index Range Query 'Algorithm'
Since hash indexes provide no ordering, a range query on a hash-indexed column requires:
Cost: O(n) where n is total number of entries—independent of how selective the range is.
1234567891011121314151617181920212223242526
function hashRangeQuery(lowerBound, upperBound, hashTable): // TERRIBLE: Must scan all buckets - O(n) always results = [] for bucketNum in range(0, hashTable.numBuckets): bucket = readBucket(bucketNum) // Every single bucket! for entry in bucket.entries: if lowerBound <= entry.key <= upperBound: results.append(entry) // Don't forget overflow chains overflow = bucket.overflowPointer while overflow != null: for entry in overflow.entries: if lowerBound <= entry.key <= upperBound: results.append(entry) overflow = overflow.nextOverflow return results // Analysis:// - Table has 1 million rows// - Range query matches 1,000 rows (0.1% selectivity)// - Hash: STILL reads all 1 million entries// - Cost: O(n) regardless of selectivityB+Tree Range Query Algorithm
B+trees exploit their ordered structure to efficiently locate and scan range results:
Cost: O(log n + k) where k is the number of matching entries
12345678910111213141516171819202122232425262728293031
function btreeRangeQuery(lowerBound, upperBound, btree): // EFFICIENT: O(log n + k) where k = result count results = [] // Step 1: Find starting leaf - O(log n) currentNode = btree.root while not currentNode.isLeaf: childIndex = findChildIndex(currentNode, lowerBound) currentNode = readNode(currentNode.children[childIndex]) // Step 2: Locate first matching entry within leaf startIndex = binarySearch(currentNode.keys, lowerBound) // Step 3: Sequential scan through leaves - O(k) while currentNode != null: for i in range(startIndex, len(currentNode.keys)): if currentNode.keys[i] > upperBound: return results // Done! Upper bound exceeded results.append(currentNode.entries[i]) // Follow sibling pointer to next leaf currentNode = currentNode.nextSibling startIndex = 0 // Start from beginning of next leaf return results // Analysis:// - Table has 1 million rows// - Range query matches 1,000 rows (0.1% selectivity)// - B+tree: log₂₀₀(1M) ≈ 3 levels + 1,000 entries ≈ 1,003 accesses// - Cost: O(log n + k) - proportional to result size| Selectivity | Result Size | Hash Index Cost | B+Tree Cost | Speedup Factor |
|---|---|---|---|---|
| 0.01% | 100 rows | ~1,000,000 accesses | ~103 accesses | ~10,000x |
| 0.1% | 1,000 rows | ~1,000,000 accesses | ~1,003 accesses | ~1,000x |
| 1% | 10,000 rows | ~1,000,000 accesses | ~10,003 accesses | ~100x |
| 10% | 100,000 rows | ~1,000,000 accesses | ~100,003 accesses | ~10x |
| 100% | 1,000,000 rows | ~1,000,000 accesses | ~1,000,003 accesses | ~1x |
For highly selective range queries (small result sets), B+trees outperform hash by orders of magnitude. Only when the range covers nearly the entire table does hash performance approach B+tree. In practice, useful range queries are selective—if you're fetching everything, you'd use a table scan anyway.
Range queries come in several forms, all of which B+trees handle efficiently and hash indexes cannot support:
Bounded Range Queries
SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31'
SELECT * FROM products WHERE price >= 10.00 AND price <= 50.00
SELECT * FROM employees WHERE hire_date >= '2020-01-01'
Unbounded Range Queries (One-Sided)
SELECT * FROM transactions WHERE amount > 10000 -- Find high-value transactions
SELECT * FROM inventory WHERE quantity < 10 -- Find low-stock items
SELECT * FROM users WHERE last_login < NOW() - INTERVAL '90 days' -- Inactive users
Prefix Queries (String Ranges)
Prefix matching is a special case of range query:
SELECT * FROM customers WHERE name LIKE 'Smith%'
SELECT * FROM products WHERE sku LIKE 'ELEC-%'
These translate to range queries:
B+trees support this naturally through sorted string ordering. Hash indexes cannot—hashing 'Smith' and 'Smithson' produces unrelated bucket addresses.
Min/Max Queries
SELECT MIN(price), MAX(price) FROM products WHERE category = 'Electronics'
SELECT * FROM orders ORDER BY order_date DESC LIMIT 10
These leverage the ordering of B+trees:
Hash indexes require full scans for all of these.
| Query Type | Example | B+Tree | Hash Index |
|---|---|---|---|
| Bounded range | BETWEEN a AND b | ✓ O(log n + k) | ✗ O(n) |
| Unbounded range | > a or < b | ✓ O(log n + k) | ✗ O(n) |
| Prefix match | LIKE 'abc%' | ✓ O(log n + k) | ✗ O(n) |
| MIN value | MIN(column) | ✓ O(log n) | ✗ O(n) |
| MAX value | MAX(column) | ✓ O(log n) | ✗ O(n) |
| ORDER BY | ORDER BY column | ✓ O(n) no sort | ✗ O(n log n) |
| Top-K / LIMIT | LIMIT k | ✓ O(log n + k) | ✗ O(n + k log k) |
Beyond algorithmic complexity, B+trees provide a crucial I/O efficiency advantage for range queries: sequential access at the leaf level.
Leaf Node Linking
B+tree leaf nodes are linked via sibling pointers, forming a doubly-linked list:
[Leaf₀] ↔ [Leaf₁] ↔ [Leaf₂] ↔ [Leaf₃] ↔ [Leaf₄] ↔ ...
This structure enables sequential scanning after locating the range start. On traditional spinning disks, sequential reads are dramatically faster than random reads:
A range query reading 100 consecutive pages:
The difference can be 30x or more for disk-based systems.
Physical Clustering
B+trees often exhibit good physical clustering on disk:
Modern storage systems exploit this:
SSDs have similar random and sequential latency for small reads, but sequential reads still win due to: (1) reduced CPU/driver overhead per I/O, (2) better internal parallelism, and (3) larger effective read sizes through prefetching. The advantage is smaller than with HDDs (perhaps 3-5x) but still significant.
| Aspect | Hash Index Scan | B+Tree Range Scan |
|---|---|---|
| Access pattern | Random (all buckets) | Sequential (leaf chain) |
| Prefetch effectiveness | None (unpredictable) | High (predictable path) |
| Disk seek count | One per bucket | One initial + sequential |
| Cache utilization | Poor (no locality) | Excellent (adjacent pages) |
| CPU efficiency | Low (scattered data) | High (cache-friendly) |
Range queries are pervasive in production databases. Let's examine common scenarios where the hash vs tree index decision has direct performance implications.
E-Commerce: Order Date Ranges
123456789101112131415161718
-- Daily sales report: All orders from yesterdaySELECT * FROM orders WHERE order_date >= CURRENT_DATE - INTERVAL '1 day' AND order_date < CURRENT_DATE; -- Monthly revenue calculationSELECT SUM(total_amount) FROM ordersWHERE order_date >= '2024-01-01' AND order_date < '2024-02-01'; -- Customer order history (customer_id equality + date range)SELECT * FROM ordersWHERE customer_id = 12345 AND order_date >= '2023-01-01'ORDER BY order_date DESC; -- Index recommendation:-- B+tree on (customer_id, order_date) - composite supports both!-- Hash on customer_id - point lookup only, loses date orderingFinancial Systems: Transaction Ranges
123456789101112131415161718
-- Fraud detection: High-value transactionsSELECT * FROM transactions WHERE amount > 10000; -- Account statement generationSELECT * FROM transactionsWHERE account_id = 'ABC123' AND transaction_time BETWEEN '2024-01-01' AND '2024-01-31'ORDER BY transaction_time; -- Interest calculation: Balance over timeSELECT date, balance FROM daily_balancesWHERE account_id = 'ABC123' AND date >= '2023-01-01'ORDER BY date; -- Compliance reporting: Transactions in amount rangesSELECT COUNT(*), SUM(amount) FROM transactionsWHERE amount >= 1000 AND amount < 10000;Analytics and Logging: Time-Series Data
12345678910111213141516171819
-- Dashboards: Recent metricsSELECT metric_name, AVG(value) FROM metricsWHERE timestamp >= NOW() - INTERVAL '1 hour'GROUP BY metric_name; -- Log analysis: Errors in time windowSELECT * FROM logsWHERE log_level = 'ERROR' AND created_at >= NOW() - INTERVAL '24 hours'ORDER BY created_at DESC; -- User activity analysisSELECT date_trunc('day', event_time), COUNT(*) FROM user_eventsWHERE user_id = 12345 AND event_time >= '2024-01-01'GROUP BY 1 ORDER BY 1; -- Time-series data is fundamentally about ranges!Virtually every business domain involves time-based queries (date ranges), value-based filtering (price/amount ranges), or sorted output (ORDER BY). Hash indexes, despite their point-query speed, cannot participate in any of these operations efficiently. This is why B+trees dominate in practice.
The range query limitation of hash indexes has profound implications for index selection strategy.
The Mix Problem
Most real workloads combine point and range queries on the same columns:
-- Point query: Specific customer lookup
SELECT * FROM customers WHERE customer_id = 12345;
-- Range query: Customers in region
SELECT * FROM customers WHERE region = 'WEST' AND created_at > '2024-01-01';
With a hash index on customer_id:
With a B+tree index on customer_id:
The hash index provides marginal point-query benefit but blocks any range capability on that column.
= value comparisonsAsk yourself: 'Will I EVER need to do a range query, ordering, or prefix match on this column?' If the answer is anything other than 'definitely not,' choose a B+tree. The modest point-query performance difference rarely justifies losing all range query capability.
Range queries often benefit from covering indexes—indexes that contain all columns needed to satisfy a query without accessing the base table. This optimization interacts differently with hash and tree structures.
B+Tree Covering Indexes
B+tree indexes can include additional columns beyond the search key:
CREATE INDEX idx_orders_covering ON orders (order_date, customer_id, total_amount);
-- This query can be answered entirely from the index:
SELECT customer_id, total_amount FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';
Benefits:
Hash Index Covering Limitations
Hash indexes can theoretically store additional columns in bucket entries, but the utility is limited:
Practical Example
Consider an analytics query:
SELECT product_category, SUM(quantity), SUM(revenue)
FROM order_items
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY product_category;
With B+tree covering index on (order_date, product_category, quantity, revenue):
With hash index: Cannot use index for range; falls back to table scan.
B+tree covering indexes transform expensive range queries into efficient index-only scans. Hash indexes cannot provide this benefit for range operations—they can only cover equality lookups, which are already fast anyway.
Range queries represent the clearest differentiation between hash and tree indexes. This isn't a matter of performance tuning or clever implementation—it's a fundamental mathematical limitation.
If your workload includes even occasional range queries, hash indexes are not viable for those columns. The O(1) point-query advantage is irrelevant when range queries become full table scans. This single limitation explains why B+trees dominate production databases.
What's Next
We've examined point queries (where hash indexes compete) and range queries (where they fail). The next page explores space usage—how hash and tree indexes differ in memory and storage footprint, including overhead, fill factors, and growth characteristics.
You now understand why hash indexes fundamentally cannot support range queries and the dramatic performance implications. This limitation is the primary reason B+trees are the default choice in nearly every production database system. Next, we examine space utilization differences.