The previous page revealed a fundamental limitation: single-level indexes require log₂(blocks) random disk accesses per lookup—a cost that becomes prohibitive as databases scale to millions or billions of records. Binary search, despite its elegant O(log n) complexity, fails us at the block level because each comparison requires reading an entire disk block.
The solution lies in a deceptively simple insight: we can build an index on top of an index. This hierarchical approach transforms the search problem from binary (base-2) to a much higher base determined by the number of entries per block. The result is a dramatic reduction in I/O operations—often from 15-20 block reads down to 3-4, even for datasets spanning billions of records.
By the end of this page, you will understand: (1) the core concept of multi-level indexing, (2) how building indexes on indexes achieves efficiency, (3) the mathematical relationship between index levels and block accesses, (4) how fanout determines index depth, and (5) the intuition behind why hierarchical organization fundamentally changes scalability.
The breakthrough of multi-level indexing comes from a single observation: a single-level index is itself a sorted file. And what do we do with sorted files to accelerate access? We build indexes on them.
Consider our running example: a 10-million-record customer database with a single-level index spanning 34,247 blocks. This index is sorted by customer_id, with each block containing 292 index entries (at 14 bytes per entry in 4 KB blocks).
Now imagine we build a second index—not on the original data, but on the first index. This second index contains one entry for each block of the first index: the key value of the first entry in that block, plus a pointer to the block.
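This construction reduces to a few lines of arithmetic. The following is a minimal Python sketch (the `build_levels` helper is illustrative, not from any database API) that derives each level's block count from the running example's 4 KB blocks and 14-byte entries:

```python
import math

BLOCK_SIZE = 4096   # bytes per block (from the running example)
ENTRY_SIZE = 14     # 8-byte key + 6-byte pointer
FANOUT = BLOCK_SIZE // ENTRY_SIZE  # 292 entries per block

def build_levels(num_entries):
    """Return the block count of each index level, bottom-up,
    stopping when a single root block remains."""
    levels = []
    blocks = math.ceil(num_entries / FANOUT)
    levels.append(blocks)
    while blocks > 1:
        # Each upper level holds one entry per block of the level below
        blocks = math.ceil(blocks / FANOUT)
        levels.append(blocks)
    return levels

# Dense first level: one entry per record
print(build_levels(10_000_000))  # [34247, 118, 1]
```

Level 1 needs 34,247 blocks, Level 2 needs 118, and Level 3 collapses to the single root block, matching the construction above.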
```
Multi-Level Index Construction
==============================

Level 0 (Data File):
- 10,000,000 customer records
- Record size: 200 bytes
- Total: ~2 GB across multiple data blocks

Level 1 (Primary Index / First-Level Index):
- One entry per data record (dense index)
- Entry size: 14 bytes (8-byte key + 6-byte pointer)
- Entries per block: 292
- Total blocks: 34,247
- Total size: ~134 MB

Level 2 (Secondary Index on Level 1):
- One entry per BLOCK of Level 1 (sparse index on Level 1)
- Entry size: 14 bytes (8-byte key + 6-byte pointer)
- Total entries needed: 34,247 (one per Level 1 block)
- Entries per block: 292
- Total blocks: ⌈34,247 / 292⌉ = 118 blocks
- Total size: ~472 KB

Level 3 (Tertiary Index on Level 2):
- One entry per BLOCK of Level 2 (sparse index on Level 2)
- Entry size: 14 bytes
- Total entries needed: 118
- Entries per block: 292
- Total blocks: ⌈118 / 292⌉ = 1 block
- Total size: 4 KB

TOTAL INDEX STRUCTURE:
  Level 3: 1 block (root)
  Level 2: 118 blocks
  Level 1: 34,247 blocks
  Level 0: Data (millions of blocks)
```

The crucial observation:
The second-level index has only 118 blocks (compared to 34,247 for the first level). We can build a third index on the second, which has only 1 block. This single root block contains entries pointing to all 118 second-level blocks.
Search process transformation:
With single-level index: Binary search over 34,247 blocks = 16 block reads.
With multi-level index:

1. Read the Level 3 root block and locate the correct Level 2 block: 1 I/O.
2. Read that Level 2 block and locate the correct Level 1 block: 1 I/O.
3. Read that Level 1 block and find the entry for the target key: 1 I/O.
4. Follow the entry's pointer to the data block: 1 I/O.

Total: 4 block reads (3 index + 1 data) instead of 16 index reads plus the data access.
We reduced 16 random disk accesses to 4 by adding just 119 additional blocks (118 + 1) of metadata. The overhead is minimal (119 blocks = 476 KB), but the performance improvement is 4x. This is the fundamental value proposition of multi-level indexing.
The efficiency of multi-level indexes stems from a mathematical relationship between the fanout (entries per block) and the number of levels required. Understanding this relationship is essential for database tuning and capacity planning.
Key definitions:
Fanout (f): The number of index entries that fit in one block. Also called branching factor.
Number of levels (L): The depth of the index hierarchy.
Records addressable at L levels: f^L records (approximately)
The exponential growth of addressable records:
| Levels (L) | Addressable Records (f^L) | Index Block Accesses | Total Index Size (dense, at capacity) |
|---|---|---|---|
| 1 | 292 | 1 | 4 KB |
| 2 | 85,264 | 2 | ~1.1 MB |
| 3 | 24,897,088 (~25M) | 3 | ~334 MB |
| 4 | 7,269,949,696 (~7.3B) | 4 | ~95 GB |
| 5 | 2,122,825,311,232 (~2.1T) | 5 | ~27 TB |
The logarithmic relationship:
To find the number of levels required to index r records:
L = ⌈log_f(r)⌉
For our 10-million-record database with f = 292:
```
L = ⌈log₂₉₂(10,000,000)⌉
  = ⌈ln(10,000,000) / ln(292)⌉
  = ⌈16.12 / 5.68⌉
  = ⌈2.84⌉
  = 3 levels
```
Three levels of index can address 24.9 million records. Our 10-million-record database fits comfortably.
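The calculation generalizes to any record count. Here is a minimal Python sketch (the `levels_required` helper is hypothetical, just restating the formula above):

```python
import math

def levels_required(records, fanout=292):
    """Index levels needed so that fanout**levels >= records."""
    return math.ceil(math.log(records) / math.log(fanout))

print(levels_required(1_000))          # 2
print(levels_required(10_000_000))     # 3
print(levels_required(1_000_000_000))  # 4
```

Even a billion records need only one more level than ten million, which is the essence of logarithm-base-fanout scaling.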
```sql
-- Calculating required index levels for various data sizes
-- Assuming: Block size = 4096 bytes, Entry size = 14 bytes
-- Fanout (f) = floor(4096 / 14) = 292
-- Formula: Levels = CEILING(LOG(records) / LOG(fanout))

WITH fanout_params AS (
    SELECT 4096 AS block_size,
           14 AS entry_size,
           FLOOR(4096.0 / 14) AS fanout  -- 292
),
data_scales AS (
    SELECT 1000 AS records UNION ALL
    SELECT 10000 UNION ALL
    SELECT 100000 UNION ALL
    SELECT 1000000 UNION ALL        -- 1 million
    SELECT 10000000 UNION ALL       -- 10 million
    SELECT 100000000 UNION ALL      -- 100 million
    SELECT 1000000000 UNION ALL     -- 1 billion
    SELECT 10000000000 UNION ALL    -- 10 billion
    SELECT 100000000000             -- 100 billion
)
SELECT
    d.records,
    f.fanout,
    CEILING(LOG(d.records) / LOG(f.fanout)) AS levels_required,
    -- Dense single-level index: one entry per record, 292 entries per block
    CEILING(d.records * 1.0 / f.fanout) AS single_level_blocks,
    -- Binary search cost over those blocks
    CEILING(LOG(CEILING(d.records * 1.0 / f.fanout)) / LOG(2)) AS binary_search_blocks
FROM data_scales d
CROSS JOIN fanout_params f;

-- Results:
-- Records          Fanout  Levels  Single-Level Blocks  Binary Search Blocks
-- ---------------  ------  ------  -------------------  --------------------
-- 1,000               292       2                    4                     2
-- 10,000              292       2                   35                     6
-- 100,000             292       3                  343                     9
-- 1,000,000           292       3                3,425                    12
-- 10,000,000          292       3               34,247                    16
-- 100,000,000         292       4              342,466                    19
-- 1,000,000,000       292       4            3,424,658                    22
-- 10,000,000,000      292       5           34,246,576                    26
-- 100,000,000,000     292       5          342,465,754                    29
```

With a fanout of 292, each level of the multi-level index effectively 'skips' 8-9 levels of binary search (since log₂(292) ≈ 8.2). A 4-level multi-level index has search efficiency comparable to a 32-level binary search, but we never need 32 binary search levels because the math caps us at ~29 even for the largest databases.
Multi-level indexes have a specific density characteristic at each level that is fundamental to their efficiency. Understanding this pattern reveals why the structure works.
The density pattern:
Bottom level (Level 1): Can be either dense or sparse, depending on whether data is sorted.
Upper levels (Level 2 and above): Always sparse—one entry per block of the level below.
This pattern ensures that each upper level has a predictable size: it has exactly ⌈(blocks at level below) / fanout⌉ blocks.
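That recurrence can be expressed directly in code. The following Python sketch (the `index_blocks` helper is illustrative) totals the blocks at every level for both a dense and a sparse bottom level, using the running example's parameters:

```python
import math

FANOUT = 292        # index entries per 4 KB block
RECORDS = 10_000_000
RECORD_SIZE = 200   # bytes
BLOCK_SIZE = 4096   # bytes

def index_blocks(bottom_entries, fanout=FANOUT):
    """Total blocks across all index levels, where every level above
    the bottom is sparse (one entry per block of the level below)."""
    total = 0
    blocks = math.ceil(bottom_entries / fanout)
    while True:
        total += blocks
        if blocks == 1:          # reached the single root block
            return total
        blocks = math.ceil(blocks / fanout)

# Dense bottom level: one entry per record
dense = index_blocks(RECORDS)                            # 34,247 + 118 + 1

# Sparse bottom level: one entry per data block
data_blocks = math.ceil(RECORDS * RECORD_SIZE / BLOCK_SIZE)
sparse = index_blocks(data_blocks)                       # 1,673 + 6 + 1

print(dense, sparse)  # 34366 1680
```

Both variants reach a single root in three levels; only the bottom level's density, and hence the total size, differs.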
```
Sparse vs Dense Multi-Level Index Comparison
============================================

Dataset: 10,000,000 records
Record size: 200 bytes
Block size: 4,096 bytes
Data blocks: ⌈10,000,000 × 200 / 4,096⌉ = 488,282 blocks

CASE 1: Dense Index (data not sorted by index key)
--------------------------------------------------
Level 1 (Dense - one entry per record):
  Entries: 10,000,000
  Blocks: ⌈10,000,000 / 292⌉ = 34,247 blocks

Level 2 (Sparse - one entry per Level 1 block):
  Entries: 34,247
  Blocks: ⌈34,247 / 292⌉ = 118 blocks

Level 3 (Sparse - one entry per Level 2 block):
  Entries: 118
  Blocks: ⌈118 / 292⌉ = 1 block (root)

Total index blocks: 34,247 + 118 + 1 = 34,366 blocks
Total index size: ~134 MB
Search cost: 3 index blocks + 1 data block = 4 I/Os

CASE 2: Sparse Index (data IS sorted by index key)
--------------------------------------------------
Level 1 (Sparse - one entry per DATA block):
  Entries: 488,282 (one per data block)
  Blocks: ⌈488,282 / 292⌉ = 1,673 blocks

Level 2 (Sparse - one entry per Level 1 block):
  Entries: 1,673
  Blocks: ⌈1,673 / 292⌉ = 6 blocks

Level 3 (Sparse - one entry per Level 2 block):
  Entries: 6
  Blocks: ⌈6 / 292⌉ = 1 block (root)

Total index blocks: 1,673 + 6 + 1 = 1,680 blocks
Total index size: ~6.5 MB
Search cost: 3 index blocks + 1 data block = 4 I/Os

COMPARISON:
  Dense index:  34,366 blocks (~134 MB)
  Sparse index:  1,680 blocks (~6.5 MB)
  Ratio: 20:1 size difference!
```

The sparse index is 20x smaller but has the SAME search cost. This is why clustering/primary-key indexes use sparse organization. A table can have only ONE clustered index (because data can only be physically sorted one way), which makes the choice of clustered index critical: it alone gets the efficiency of sparse indexing. All other indexes must be dense, paying the 20x size overhead. Choose your clustering key wisely.
Understanding how search works in a multi-level index solidifies the conceptual model and prepares you for the tree structures (B-trees, B+-trees) that implement this concept. The algorithm is elegant in its simplicity.
The search algorithm:
```python
def multi_level_search(target_key, root_block):
    """
    Search for a key in a multi-level index.

    At each level:
      1. Read the current block into memory
      2. Find the largest key K where K <= target_key
      3. Follow the pointer associated with K to the next level
      4. Repeat until we reach the leaf level
      5. At the leaf level, follow the pointer to the data record

    Time Complexity: O(L) block reads, where L = number of levels
    Space Complexity: O(1) additional memory (reuse one buffer per block)
    """
    current_block = root_block

    # Traverse from root to leaf level
    while current_block.level > 0:  # 0 = leaf level
        # Read block from disk (1 I/O)
        block_data = disk_read(current_block.block_id)

        # Binary search within the block (in-memory, fast):
        # find the largest entry where entry.key <= target_key
        entry = binary_search_le(block_data.entries, target_key)

        if entry is None:
            # Target key is smaller than every key in this block;
            # fall back to the first entry's pointer
            entry = block_data.entries[0]

        # Follow pointer to the next level
        current_block = entry.pointer

    # At leaf level - find the exact entry
    block_data = disk_read(current_block.block_id)  # 1 I/O
    entry = binary_search_exact(block_data.entries, target_key)

    if entry is None:
        return None  # Key not found

    # Follow pointer to the data record
    data_block = disk_read(entry.data_pointer.block_id)  # 1 I/O
    return extract_record(data_block, entry.data_pointer.offset)


def binary_search_le(entries, target_key):
    """
    Binary search within a single block (in memory).
    Find the entry with the largest key <= target_key.

    Time Complexity: O(log f) where f = fanout.
    This is FAST because it runs entirely in RAM.
    """
    low, high = 0, len(entries) - 1
    result = None
    while low <= high:
        mid = (low + high) // 2
        if entries[mid].key <= target_key:
            result = entries[mid]
            low = mid + 1  # look for a larger key still <= target
        else:
            high = mid - 1
    return result
```

Key aspects of the algorithm:
1. One block read per level
At each level, we read exactly one block. The block tells us which block to read at the next level. This is the source of the O(L) I/O complexity.
2. In-memory binary search is free (comparatively)
Within each block, we use binary search to find the right entry. With 292 entries per block, this takes ~8 comparisons—microseconds of CPU time. Compared to the milliseconds (HDD) or hundreds of microseconds (SSD) for the block read, this is negligible.
3. No wasted reads
Unlike single-level binary search (which might read blocks we don't need in later iterations), multi-level search reads only blocks on the direct path from root to leaf. Every I/O contributes to finding the answer.
4. Predictable performance
Every search follows the same pattern and accesses the same number of blocks (L levels + 1 data block). There are no worst cases or lucky cases—performance is consistent.
Single-level: O(log₂ blocks) I/Os. Multi-level: O(log_f records) I/Os. Since f ≈ 300 and log₂(300) ≈ 8, multi-level indexing effectively divides the number of I/Os by 8. For very large datasets, this means 3-4 reads instead of 20-25.
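The two cost formulas can be compared numerically. A short Python sketch (function names are illustrative, not from any database API):

```python
import math

def single_level_ios(records, fanout=292):
    """Block reads for binary search over a dense single-level index."""
    blocks = math.ceil(records / fanout)
    return math.ceil(math.log2(blocks))

def multi_level_ios(records, fanout=292):
    """Block reads to descend a multi-level index (one per level)."""
    return math.ceil(math.log(records) / math.log(fanout))

for r in (10_000_000, 1_000_000_000):
    print(r, single_level_ios(r), multi_level_ios(r))
# 10 million:  16 reads vs 3
# 1 billion:   22 reads vs 4
```

The gap widens with scale because one multi-level descent step replaces roughly log₂(292) ≈ 8 binary-search steps.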
A visual representation helps cement understanding of how multi-level indexes organize data. Let's trace through a concrete example with simplified numbers for clarity.
```
Multi-Level Index Structure (Simplified Example)
================================================

Scenario: Index 1,000 records with fanout f = 10
(Real systems have f ≈ 100-500, but 10 is easier to visualize)

                  LEVEL 3 (Root)
                 ┌─────────────┐
                 │ 1  100  200 │  <- 10 entries, pointing to Level 2 blocks
                 │ 300 400 500 │
                 │ 600 700 800 │  Each entry: (key, pointer to L2 block)
                 │ 900         │
                 └─────────────┘
                        │
     ┌──────────────────┼──────────────────┐
     ▼                  ▼                  ▼
  LEVEL 2            LEVEL 2            LEVEL 2
┌──────────┐      ┌──────────┐      ┌──────────┐
│ 1  10 20 │      │ 300 310  │      │ 900 910  │
│ 30 40 50 │      │ 320 330  │      │ 920 930  │
│ 60 70 80 │      │ 340 350  │      │ 940 950  │
│ 90       │      │ ...      │      │ 960 ...  │
└──────────┘      └──────────┘      └──────────┘
     │                  │                  │
     ▼                  ▼                  ▼
  LEVEL 1            LEVEL 1            LEVEL 1
(10 blocks per branch; 100 blocks in total)
Each block has 10 entries pointing to data records
     │                  │                  │
     ▼                  ▼                  ▼
                    DATA FILE
   (1,000 records distributed across data blocks)

SEARCH EXAMPLE: Find record with key = 347
------------------------------------------
Step 1: Read Level 3 (Root)
  - Scan entries: 1, 100, 200, 300, 400, ...
  - 300 <= 347 < 400, so follow pointer for key 300
  - → Go to Level 2 block for range [300, 400)

Step 2: Read Level 2 block (300 range)
  - Scan entries: 300, 310, 320, 330, 340, 350, ...
  - 340 <= 347 < 350, so follow pointer for key 340
  - → Go to Level 1 block for range [340, 350)

Step 3: Read Level 1 block (340 range)
  - Scan entries: 340, 341, 342, 343, 344, 345, 346, 347, 348, 349
  - Found exact match: 347
  - Follow data pointer to the actual record

Step 4: Read data block
  - Retrieve the actual record with key 347

Total I/O: 4 block reads (3 index + 1 data)

Compare to single-level binary search on 100 index blocks:
log₂(100) ≈ 6.6, so 7 block reads for the index alone,
PLUS 1 data read = 8 total.
```
Multi-level search saves 50% of I/Os even in this small example, and the savings grow with larger datasets. Notice how the multi-level index naturally forms a tree: one root, multiple branches at Level 2, even more at Level 1, and finally the leaves (data pointers). This isn't a coincidence; B-trees and B+-trees are the formalized, optimized versions of this exact structure.
Multi-level indexing isn't just a theoretical improvement—it has profound practical implications for database system design, capacity planning, and query performance. Understanding these implications is essential for database administrators and engineers.
```sql
-- Practical cache utilization for a multi-level index
-- 10 million records, fanout = 292

-- Index structure:
--   Level 3: 1 block (always cached)
--   Level 2: 118 blocks (easily cached)
--   Level 1: 34,247 blocks (selectively cached based on access patterns)

-- Memory allocation strategy:

-- Scenario 1: Minimal cache (< 1 MB allocated to this index)
--   Cached: Level 3 only (4 KB)
--   Search cost: 3 disk reads (L3 cached; L2 + L1 + data from disk)

-- Scenario 2: Moderate cache (1-5 MB allocated)
--   Cached: Level 3 + Level 2 (476 KB)
--   Search cost: 2 disk reads (L3 + L2 cached; L1 + data from disk)

-- Scenario 3: Generous cache (50+ MB allocated for hot data)
--   Cached: Level 3 + Level 2 + ~10% of Level 1 (hot ranges)
--   Search cost: ~1.3 disk reads on average (assuming a 70% hot-data hit rate)

-- Scenario 4: Full index cache (~140 MB allocated)
--   Cached: all index levels
--   Search cost: 1 disk read (data only)

-- KEY INSIGHT:
-- With just 476 KB of cache (Level 2 + Level 3), we cut disk I/O in half.
-- Multi-level indexes have excellent cache efficiency due to their
-- hierarchical access patterns.

SELECT 'Level 3 (Root)' AS level,
       1 AS blocks,
       1 * 4096.0 / 1024 AS size_kb,
       'Always cached - every query touches this' AS cache_strategy
UNION ALL
SELECT 'Level 2',
       118,
       118 * 4096.0 / 1024,
       'Should be cached - only 472 KB'
UNION ALL
SELECT 'Level 1',
       34247,
       34247 * 4096.0 / 1024,  -- ~137,000 KB (~134 MB)
       'Selective caching based on access patterns';
```

| Characteristic | Single-Level Index | Multi-Level Index | Improvement |
|---|---|---|---|
| Search I/O (cold cache) | 16 block reads | 4 block reads | 4x reduction |
| Search I/O (warm cache) | ~10 block reads (50% cached) | 2 block reads (upper levels cached) | 5x reduction |
| Cache efficiency | Random—binary search scatters access | Hierarchical—upper levels always hot | Much better |
| Memory required for good performance | ~134 MB (full index) | ~0.5 MB (upper levels) | 268x reduction |
| Scalability to 1B records | 22 block reads | 4 block reads | 5.5x reduction |
When tuning database cache sizes, prioritize memory for upper index levels. A small cache that holds all Level 2+ blocks outperforms a larger cache randomly holding Level 1 blocks. Most modern databases do this automatically, but understanding the principle helps in troubleshooting and capacity planning.
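The cache scenarios above reduce to simple arithmetic. Here is a hypothetical Python sketch of the trade-off, using the running example's level sizes (all names are illustrative and assume one data-block read per lookup):

```python
# Blocks per index level in the 10-million-record example
LEVEL_BLOCKS = {3: 1, 2: 118, 1: 34_247}
BLOCK_KB = 4

def reads_with_cache(cached_levels):
    """Disk reads per lookup: each uncached index level costs 1 I/O,
    plus 1 I/O for the data block itself."""
    index_reads = sum(1 for lvl in LEVEL_BLOCKS if lvl not in cached_levels)
    return index_reads + 1  # +1 for the data block

for cached in ([], [3], [3, 2], [3, 2, 1]):
    kb = sum(LEVEL_BLOCKS[lvl] for lvl in cached) * BLOCK_KB
    print(f"cache {kb:>7} KB -> {reads_with_cache(cached)} disk reads")
# cache       0 KB -> 4 disk reads
# cache       4 KB -> 3 disk reads
# cache     476 KB -> 2 disk reads
# cache  137464 KB -> 1 disk read
```

The first 476 KB of cache halves the I/O cost; the remaining ~134 MB buys only one more read. This steep diminishing return is why upper levels deserve cache priority.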
The multi-level index concept represents a fundamental breakthrough in database access efficiency. By building indexes on indexes, we transform the scaling behavior from logarithm-base-2 to logarithm-base-fanout—a change that reduces I/O operations by factors of 4-6x at scale.
What's next:
The multi-level concept naturally leads to tree-based data structures. The next page explores how this hierarchical organization is formalized into tree structures—specifically the balanced search trees that power every modern database index. We'll see how B-trees maintain balance, handle updates, and optimize for the block-based access patterns of disk storage.
You now understand the multi-level index concept: hierarchical organization, fanout-based efficiency, sparse upper levels, and the resulting search algorithm. This conceptual foundation prepares you for understanding B-trees and B+-trees—the practical implementations of multi-level indexing used in every major database system.