We've established that multi-level index trees provide O(log_f n) search complexity, where f is the fanout and n is the number of records. But what does this mean in practice? How many milliseconds does a search actually take? How does caching affect performance? How do range queries differ from point queries?
This page moves from theoretical complexity to practical performance analysis. Understanding these details is crucial for database tuning, capacity planning, and troubleshooting performance issues. The difference between a well-tuned and poorly-tuned database often comes down to understanding exactly how search operations interact with the index structure.
By the end of this page, you will understand: (1) the exact cost of point queries in terms of I/O operations, (2) how range queries traverse the tree and scan leaves, (3) the mathematics of I/O cost with and without caching, (4) how buffer management affects search performance, and (5) how B+-tree search compares with alternative access methods (full scan, hash index).
A point query retrieves a single record based on an exact key match. This is the most common index operation and the one for which B+-trees are optimized. Let's analyze the cost precisely.
The search path:
A point query traverses from the root to a leaf, reading one block at each level, then accesses the data block containing the actual record (unless it's a covering index).
Total I/O = (Tree Height) + (Data Access) = h + 1
For typical parameters:
```
Point Query Cost Analysis
=========================

Scenario: SELECT * FROM customers WHERE customer_id = 5000000

Index structure:
- 10 million records
- Fanout: 250 entries per block
- Tree height: 3 levels
- Index type: B+-tree (data pointers in leaves only)

Search path:

Step 1: Read root block (Level 3)
  - 1 I/O operation
  - Find entry where K < 5,000,000 <= K'
  - Binary search within block: ~8 comparisons (log2 250)
  - CPU time: ~1 microsecond
  - Disk time: 0.1 ms (SSD) or 10 ms (HDD)

Step 2: Read internal block (Level 2)
  - Follow pointer from Step 1
  - 1 I/O operation
  - Binary search: ~8 comparisons
  - Disk time: 0.1 ms (SSD) or 10 ms (HDD)

Step 3: Read leaf block (Level 1)
  - Follow pointer from Step 2
  - 1 I/O operation
  - Binary search: ~8 comparisons
  - Find exact entry for customer_id = 5,000,000
  - Disk time: 0.1 ms (SSD) or 10 ms (HDD)

Step 4: Read data block (Level 0)
  - Follow data pointer from leaf entry
  - 1 I/O operation
  - Retrieve actual customer record
  - Disk time: 0.1 ms (SSD) or 10 ms (HDD)

TOTAL COST:        Cold Cache   Warm Cache (upper levels cached)
I/O operations:    4            2 (Levels 3 and 2 cached)
SSD latency:       0.4 ms       0.2 ms
HDD latency:       40 ms        20 ms
CPU time:          ~4 us        ~4 us (negligible)

The dominant cost is ALWAYS disk I/O. CPU time for in-block
binary search is <0.01% of total time.
```

| Records | Tree Height | I/O (Cold) | I/O (Warm) | SSD Latency (Cold) | SSD Latency (Warm) |
|---|---|---|---|---|---|
| 10,000 | 2 | 3 | 2 | 0.3 ms | 0.2 ms |
| 1,000,000 | 3 | 4 | 2 | 0.4 ms | 0.2 ms |
| 100,000,000 | 4 | 5 | 2 | 0.5 ms | 0.2 ms |
| 10,000,000,000 | 5 | 6 | 2 | 0.6 ms | 0.2 ms |
| 1,000,000,000,000 | 6 | 7 | 2-3 | 0.7 ms | 0.2-0.3 ms |
Notice that warm cache latency is almost constant (0.2-0.3 ms) regardless of database size. Upper index levels occupy little memory and stay cached. Only leaf and data blocks vary. This is why properly sized buffer pools are critical—they convert logarithmic I/O into near-constant I/O.
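The relationship between table size, tree height, and I/O count in the table above can be reproduced with a short sketch. This is a simplified model, assuming a fixed fanout and that all internal (non-leaf) levels stay cached in the warm case; the function name and return shape are illustrative, not a real database API.

```python
import math

def point_query_ios(records: int, fanout: int = 250) -> dict:
    """Estimate point-query I/O for a B+-tree of the given size.

    Cold cache: one read per index level plus one data-block read.
    Warm cache: all internal levels assumed resident, so only the
    leaf block and the data block hit disk.
    """
    height = math.ceil(math.log(records, fanout))  # tree height in levels
    return {
        "height": height,
        "cold_ios": height + 1,  # h index reads + 1 data read
        "warm_ios": 2,           # leaf + data only
    }

for n in (10_000, 1_000_000, 100_000_000, 10_000_000_000):
    c = point_query_ios(n)
    print(f"{n:>14,}  height={c['height']}  cold={c['cold_ios']}  warm={c['warm_ios']}")
```

Running this reproduces the first four table rows: height grows logarithmically, cold I/O grows with it, and warm I/O stays flat at 2.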
A range query retrieves all records where the key falls within a specified range. B+-trees excel at range queries due to their leaf-level structure: all leaf nodes are linked, allowing sequential traversal after finding the starting point.
The range query algorithm:
The cost has two components: the initial tree traversal (h I/Os) and the leaf/data scan (proportional to result size).
```python
import math

def range_query_cost(
    start_key: int,
    end_key: int,
    total_records: int,
    fanout: int,
    records_per_data_block: int,
    selectivity: float | None = None,  # Fraction of records in range
) -> dict:
    """
    Calculate I/O cost for a range query.

    Parameters:
    - start_key, end_key: Range boundaries
    - total_records: Total records in table
    - fanout: Index entries per block
    - records_per_data_block: Data records per block
    - selectivity: Fraction of records matching (if known)

    Returns:
    - Dictionary with cost breakdown
    """
    # Tree height
    height = math.ceil(math.log(total_records) / math.log(fanout))

    # Calculate selectivity if not provided
    if selectivity is None:
        key_range = end_key - start_key
        total_key_range = total_records  # Assume keys are 1 to n
        selectivity = key_range / total_key_range

    # Records in range
    records_in_range = int(total_records * selectivity)

    # Leaf blocks to scan: each leaf block holds 'fanout' entries
    leaf_blocks_scanned = math.ceil(records_in_range / fanout)

    # Data blocks to access
    # Best case: data is clustered, read sequentially
    # Worst case: each record lives in a different block
    clustered_data_blocks = math.ceil(records_in_range / records_per_data_block)
    unclustered_data_blocks = records_in_range  # One I/O per record

    return {
        'tree_traversal_ios': height,
        'leaf_scan_ios': leaf_blocks_scanned,
        'data_access_ios_clustered': clustered_data_blocks,
        'data_access_ios_unclustered': unclustered_data_blocks,
        'total_ios_clustered': height + leaf_blocks_scanned + clustered_data_blocks,
        'total_ios_unclustered': height + leaf_blocks_scanned + unclustered_data_blocks,
        'records_returned': records_in_range,
    }

# Example: Find orders from Jan 2024 (assume 5% of 1 million orders)
cost = range_query_cost(
    start_key=20240101,
    end_key=20240131,
    total_records=1_000_000,
    fanout=250,
    records_per_data_block=20,
    selectivity=0.05,
)

print(f"""Range Query: Orders from January 2024
=====================================
Records in range: {cost['records_returned']:,}
Tree traversal: {cost['tree_traversal_ios']} I/Os

Clustered index (orders sorted by date):
  Leaf scan:   {cost['leaf_scan_ios']} I/Os
  Data access: {cost['data_access_ios_clustered']} I/Os
  TOTAL:       {cost['total_ios_clustered']} I/Os

Non-clustered index (orders NOT sorted by date):
  Leaf scan:   {cost['leaf_scan_ios']} I/Os
  Data access: {cost['data_access_ios_unclustered']:,} I/Os
  TOTAL:       {cost['total_ios_unclustered']:,} I/Os""")

# Output:
# Records in range: 50,000
# Tree traversal: 3 I/Os
#
# Clustered index:     TOTAL:  2,703 I/Os
# Non-clustered index: TOTAL: 50,203 I/Os (~19x worse!)
```

The clustering advantage:
The example above highlights a critical factor: whether the data is physically clustered by the index key.
For large range queries on non-clustered indexes, the optimizer may choose a full table scan instead, as it can be faster than thousands of random I/Os.
Non-clustered range queries have a break-even point. If the range selects more than ~5-15% of the table, a full table scan (sequential I/O) is often faster than index access (random I/O). Query optimizers detect this and switch strategies. Understanding this helps explain why the same query might use an index scan OR a table scan depending on the data.
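A back-of-the-envelope model makes the break-even point concrete. This is a sketch that counts block I/Os only and ignores tree traversal, leaf scanning, and caching; the cost-ratio parameter is an assumption used to model random I/O being slower than sequential I/O.

```python
def break_even_selectivity(total_records: int, records_per_block: int,
                           random_to_seq_cost: float = 1.0) -> float:
    """Selectivity at which a non-clustered index scan costs as much
    as a full table scan.

    Full scan: total_records / records_per_block sequential block reads.
    Non-clustered index scan: ~one random read per matching record.
    random_to_seq_cost > 1 models random I/O being slower than
    sequential I/O, which pushes the break-even point lower.
    """
    scan_blocks = total_records / records_per_block
    # Solve: selectivity * total_records * random_to_seq_cost == scan_blocks
    return scan_blocks / (total_records * random_to_seq_cost)

# 20 records per 4 KB block -> break-even near 5% selectivity
print(break_even_selectivity(1_000_000, 20))        # 0.05
# If random I/O costs 4x a sequential read, break-even drops to ~1%
print(break_even_selectivity(1_000_000, 20, 4.0))   # 0.0125
```

With 20 records per block, the break-even falls at 5% selectivity under equal I/O costs, and lower still once random I/O is penalized: consistent with the ~5-15% rule of thumb above.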
The buffer pool (or buffer cache) is memory allocated to cache frequently-accessed disk blocks. Its size and management policy dramatically affect index search performance. Understanding this interaction is crucial for database tuning.
The caching hierarchy for index searches:
```
Buffer Pool Impact Analysis
===========================

Database: 10 million records, B+-tree index
Block size: 4 KB
Fanout: 250

Index structure:
  Level 3 (Root):    1 block        = 4 KB
  Level 2:           160 blocks     = 640 KB
  Level 1 (Leaves):  40,000 blocks  = 160 MB
  Data:              500,000 blocks = 2 GB

SCENARIO 1: Tiny buffer pool (1 MB)
-----------------------------------
Can cache: Root + Level 2 + ~90 hot leaf blocks
Hit rate estimate:
  - Root: 100% (always cached)
  - Level 2: 100% (fits entirely)
  - Level 1: ~0.2% (90/40,000 if uniform access)
  - Data: ~0% (essentially no caching)
Average I/Os per point query: ~3 (leaf + data + occasional upper-level misses)

SCENARIO 2: Moderate buffer pool (200 MB)
-----------------------------------------
Can cache: Root + Level 2 + all leaves + ~10,000 hot data blocks
Hit rate estimate:
  - Root: 100%
  - Level 2: 100%
  - Level 1: 100% (all 160 MB cached!)
  - Data: ~2% uniform, or higher with locality
Average I/Os per point query: ~1-1.5 (mostly just data access)

SCENARIO 3: Large buffer pool (500 MB)
--------------------------------------
Can cache: Entire index + ~17% of data
Hit rate estimate:
  - All index levels: 100%
  - Data: ~17% uniform, higher with locality
Average I/Os per point query: ~0.8 (many queries fully cached)

SCENARIO 4: Huge buffer pool (3 GB)
-----------------------------------
Can cache: Everything (index + all data)
Hit rate estimate: 100%
Average I/Os per point query: 0 (all in memory)

KEY INSIGHT: The marginal value of buffer pool memory is non-linear.
  - First 1 MB: Critical (caches upper index levels)
  - 1-200 MB: High value (caches all index leaves)
  - 200 MB - 2 GB: Moderate value (caches data)
  - Beyond data size: No additional value

Prioritize buffer pool sizing for:
  1. Upper index levels (mandatory)
  2. Leaf levels of hot indexes
  3. Hot data pages
```

| Buffer Pool Size | % of Index Cached | % of Data Cached | Avg I/Os per Query | Relative Performance |
|---|---|---|---|---|
| 1 MB | 0.5% | 0% | 3.5 | 1x (baseline) |
| 10 MB | 5% | 0.5% | 2.8 | 1.25x |
| 100 MB | 50% | 3% | 1.8 | 1.9x |
| 200 MB | 100% | 5% | 1.0 | 3.5x |
| 500 MB | 100% | 20% | 0.8 | 4.4x |
| 2 GB | 100% | 80% | 0.2 | 17.5x |
| 3 GB (all) | 100% | 100% | 0 | ∞ (in-memory) |
If you have limited memory, prioritize caching index pages over data pages. A fully cached index with uncached data gives 1 I/O per query (data access). A partially cached index with cached data still requires 2-3 I/Os per query (index traversal). The index is smaller and more heavily accessed—cache it first.
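This prioritization argument can be quantified with a simple expected-I/O model. It is a sketch, assuming uniform access and independent per-block hit rates rather than simulating an actual buffer pool; the function name and parameters are illustrative.

```python
def expected_ios_per_query(index_hit_rate: float, data_hit_rate: float,
                           index_levels: int = 3) -> float:
    """Expected physical I/Os for one point query.

    index_levels: index blocks touched per query (the tree height).
    Each block read costs a physical I/O only on a buffer-pool miss.
    """
    index_misses = index_levels * (1 - index_hit_rate)
    data_misses = 1 - data_hit_rate
    return index_misses + data_misses

# Fully cached index, uncached data: exactly 1 physical I/O per query
print(expected_ios_per_query(1.0, 0.0))
# Mostly uncached index, even with fully cached data: still worse
print(expected_ios_per_query(0.2, 1.0))
```

The second case shows why caching data pages before index pages is backwards: every query pays for the index misses on the way down, regardless of how well the data is cached.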
Multi-level tree indexes aren't the only way to access data. Understanding how they compare to alternatives helps justify their near-universal use and identifies scenarios where other approaches might be better.
Alternative access methods:
| Access Method | Point Query I/O | Range Query I/O (5%) | Space Overhead | Update Cost |
|---|---|---|---|---|
| Full Table Scan | 500,000 blocks | 500,000 blocks | 0 bytes | 1 block |
| Hash Index | 1-2 blocks | N/A (no range support) | ~150 MB | 1-2 blocks |
| Single-Level Index | 16 blocks | 16 + 2,000 leaf + data | ~140 MB | Multiple blocks |
| B+-Tree Index | 3-4 blocks | 3 + 2,000 leaf + data | ~160 MB | 2-4 blocks |
| Bitmap Index | ~10 blocks | Depends on conjunction | Varies greatly | Expensive |
```sql
-- Access Method Comparison: Concrete Examples
-- Table: orders (10 million rows, 200 bytes each)
-- Total data: ~2 GB (500,000 blocks at 4 KB)

-- QUERY 1: Point lookup
-- SELECT * FROM orders WHERE order_id = 5000000

-- Access Method        I/O Operations     Time (SSD)   Time (HDD)
-- -------------------  -----------------  -----------  -----------
-- Full Scan            500,000            50 sec       83 min
-- Hash Index           1-2                0.2 ms       20 ms
-- B+-Tree Index        4                  0.4 ms       40 ms

-- QUERY 2: Range query (5% selectivity)
-- SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31'

-- Access Method        I/O Operations     Time (SSD)   Time (HDD)
-- -------------------  -----------------  -----------  -----------
-- Full Scan            500,000            50 sec       83 min
-- Hash Index           N/A (no range!)    Cannot use   Cannot use
-- B+-Tree (clustered)  27,000             2.7 sec      4.5 min
-- B+-Tree (non-clust.) 502,000            50 sec       84 min

-- QUERY 3: High-selectivity range (90% of table)
-- SELECT * FROM orders WHERE order_date >= '2023-01-01'

-- Access Method        I/O Operations     Time (SSD)   Time (HDD)
-- -------------------  -----------------  -----------  -----------
-- Full Scan            500,000            50 sec       83 min
-- B+-Tree (clustered)  450,000            45 sec       75 min
-- B+-Tree (non-clust.) 9,000,000+         15 min+      25 hours+

-- For high selectivity, full scan wins over non-clustered index!
-- The optimizer should (and usually does) choose full scan here.

-- QUERY 4: AND of multiple conditions
-- SELECT * FROM orders WHERE status = 'shipped' AND region = 'west'

-- Access Method        I/O Operations (if each predicate selects 20%)
-- -------------------  ----------------------------------------------
-- Full Scan            500,000 (but can evaluate both predicates)
-- Single B+-Tree       100,000 (one index) + filter in memory
-- Bitmap Index         ~500 (if low cardinality, bitmap AND is fast)
-- Two B+-Tree indexes  Intersection: ~20,000 (if supported)
```

B+-trees provide the best general-purpose performance: fast point queries, efficient range queries, reasonable update costs, and predictable behavior.
While hash indexes are faster for pure point queries, losing range query capability is rarely acceptable. This versatility makes B+-trees the universal choice for primary database indexes.
The theoretical analysis assumes ideal conditions. Real-world performance is affected by multiple additional factors. Understanding these helps bridge the gap between textbook knowledge and production troubleshooting.
Factors affecting actual search performance:
```sql
-- Demonstrating index fragmentation impact

-- Check fragmentation in SQL Server
SELECT
    object_name(ips.object_id) AS table_name,
    i.name AS index_name,
    ips.avg_fragmentation_in_percent,
    ips.page_count,
    ips.avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
    DB_ID(), NULL, NULL, NULL, 'DETAILED') ips
JOIN sys.indexes i
    ON ips.object_id = i.object_id
    AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 10;

-- Example output:
-- table_name   index_name       fragmentation%   pages     space_used%
-- -----------  ---------------  ---------------  --------  ------------
-- orders       ix_order_date    45.2             125000    62.3
-- customers    pk_customer_id   8.7              5000      91.4

-- The orders index is heavily fragmented:
-- - 45% fragmentation means pages are not contiguous
-- - 62% space used means 38% wasted space (from splits/deletes)
--
-- Impact:
-- - Range scans read 1.6x more blocks (1/0.62)
-- - Sequential scans become random I/O
--
-- Solution: Rebuild or reorganize the index
ALTER INDEX ix_order_date ON orders REBUILD;

-- PostgreSQL equivalent: list index sizes
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes;

-- Check bloat (fragmentation) with the pgstattuple extension
SELECT * FROM pgstattuple('orders_pkey');
-- Returns: dead_tuple_percent, free_space, etc.
```

Indexes degrade over time. Without maintenance (rebuild, reorganize), a 4-I/O query can become a 6-8 I/O query due to fragmentation and bloat. Regular index maintenance is not optional for production databases: it's essential for maintaining baseline performance.
Armed with understanding of how search works, we can derive practical optimization strategies. These are the techniques used by database administrators and performance engineers to maximize index efficiency.
Optimization strategies:
```sql
-- Covering Index Example: Eliminating Data Block Access

-- Original query
SELECT customer_id, order_date, total_amount
FROM orders
WHERE customer_id = 12345
ORDER BY order_date DESC
LIMIT 10;

-- Without covering index:
-- 1. Search B+-tree on customer_id (3-4 I/Os)
-- 2. For each matching row, access data block to get order_date, total_amount
-- 3. If customer has 100 orders: potentially 100 additional I/Os!

-- With covering index:
CREATE INDEX ix_orders_covering
ON orders (customer_id, order_date DESC)
INCLUDE (total_amount);

-- Now:
-- 1. Search B+-tree on customer_id (3-4 I/Os)
-- 2. All needed columns are IN THE INDEX - no data block access needed
-- 3. Range scan within index for LIMIT 10: maybe 1 additional leaf block
--
-- Result: 4-5 I/Os instead of 100+ I/Os
--
-- Trade-off: Index is larger (includes total_amount)
-- But for frequently-executed queries, this is almost always worth it

-- EXPLAIN ANALYZE will show "Index Only Scan" when covering works
EXPLAIN ANALYZE
SELECT customer_id, order_date, total_amount
FROM orders
WHERE customer_id = 12345
ORDER BY order_date DESC
LIMIT 10;

-- Good output indicates:
-- "Index Only Scan using ix_orders_covering"
-- "Heap Fetches: 0" (no data block access needed)
```

Covering indexes are one of the highest-impact optimizations available. By including all columns needed by a query in the index, you eliminate the data block access, often the most expensive part of a query. For hot queries, always consider whether a covering index is practical.
Search efficiency in multi-level index trees is the product of algorithmic design, physical storage characteristics, and system configuration. Understanding all three dimensions is essential for database performance engineering.
What's next:
Searches are only half the story. The final page examines index maintenance—how insertions, deletions, and updates are handled while maintaining tree balance and search efficiency. Understanding maintenance costs is essential for write-heavy workloads and helps explain the trade-offs inherent in index design.
You now have a rigorous understanding of search efficiency in multi-level indexes: point query costs, range query dynamics, buffer pool impact, comparison with alternatives, and optimization strategies. This practical knowledge enables informed decisions about index design and database tuning.