The multi-level index concept we explored in the previous page—building indexes on indexes to achieve logarithmic-base-fanout search—is elegant in theory. But a concept alone doesn't run on a database server. We need a concrete data structure that embodies this hierarchical organization while efficiently supporting not just searches, but also insertions, deletions, and modifications.
The answer is tree structures—specifically, balanced search trees designed for block-based storage. This page bridges the gap between the multi-level indexing concept and the B-trees and B+-trees that implement it in practice. Understanding this connection is essential: the tree isn't just an implementation detail—it's what makes multi-level indexing practical, maintainable, and robust.
By the end of this page, you will understand: (1) how multi-level indexes naturally form tree structures, (2) the key properties of balanced search trees for databases, (3) node organization and the role of internal vs. leaf nodes, (4) why balance is critical and how it's maintained, and (5) the fundamental differences between binary search trees and disk-optimized trees.
Let's formalize the relationship between multi-level indexes and tree structures. The mapping is straightforward and reveals why trees are the natural representation for hierarchical indexes.
The correspondence:
| Multi-Level Index Term | Tree Term | Description |
|---|---|---|
| Index block | Node | A unit of storage containing keys and pointers |
| Top-level index block | Root node | Single entry point; accessed first in every search |
| Bottom-level index block | Leaf node | Contains pointers to actual data records |
| Intermediate-level block | Internal node | Contains pointers to nodes at the next level down |
| Fanout | Branching factor | Number of child pointers per node |
| Number of levels | Height | Distance from root to leaves |
This correspondence isn't coincidental—a multi-level index is a tree, whether we call it that or not.
```
Multi-Level Index → Tree Structure Mapping
============================================

MULTI-LEVEL INDEX VIEW:                    TREE VIEW:

Level 3 (Root):  [Block]                   Root Node:      [1 block = 1 node]
        ↓                                        │
Level 2:  [Block] [Block] ...              Internal Nodes: [118 blocks = 118 nodes]
        ↓       ↓                                │
Level 1:  [B][B][B] [B][B][B] ...          Leaf Nodes:     [34,247 blocks = 34,247 nodes]
        ↓                                        │
Level 0:  DATA RECORDS                     DATA RECORDS

TREE TERMINOLOGY APPLIED:

Height (h)  = Number of levels = 3 (L3 → L2 → L1)
              Note: Height counts edges or levels depending on definition

Fanout (f)  = Max children per node = 292 (entries per block)

Degree (d)  = Typically: min children per internal node
              For B-trees: d = ⌈f/2⌉ (ensures at least half-full)

Root        = Single node at top; accessed first in every query

Internal    = Nodes between root and leaves; contain keys + child pointers
Nodes         Entry format: (key, child_pointer)

Leaf        = Bottom-level nodes; contain keys + data pointers
Nodes         Entry format: (key, data_pointer) OR (key, data_value)

Balance     = All paths from root to any leaf have the same length
              This is CRITICAL for predictable performance
```

Viewing multi-level indexes as trees isn't just naming—it unlocks a rich body of knowledge about tree algorithms, balance maintenance, and complexity analysis. When you understand index operations as tree operations, you can reason about performance, predict behavior, and troubleshoot issues using the extensive theory of balanced trees.
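To see the mapping computationally, the short sketch below builds the index bottom-up: it repeatedly divides by the fanout until a single root block remains. The figures it uses (10,000,000 data records, fanout 292) are taken from the examples on this page; treat it as a minimal sketch of the idea, not any particular database's algorithm.

```python
import math

def index_levels(num_records: int, fanout: int) -> list[int]:
    """Build a multi-level index bottom-up and return blocks per level.

    Level 1 indexes the data records; each higher level indexes the
    blocks of the level below, until a single root block remains.
    """
    levels = []
    entries = num_records
    while True:
        blocks = math.ceil(entries / fanout)
        levels.append(blocks)
        if blocks == 1:        # a single block: this is the root node
            break
        entries = blocks       # the next level indexes these blocks
    return levels

# Assumed figures from the running example: 10,000,000 records, fanout 292
for depth, blocks in enumerate(index_levels(10_000_000, 292), start=1):
    print(f"Level {depth}: {blocks:,} block(s)")
```

Running it reproduces the block counts in the diagram above: 34,247 leaf nodes, 118 internal nodes, and 1 root node, for a height of 3 levels.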
Each node in the index tree corresponds to a disk block. Understanding the internal organization of these nodes is essential for grasping how searches, insertions, and deletions work.
Internal node structure:
Internal nodes (including the root, unless it's also a leaf) contain a sorted sequence of key values that act as separators, plus child pointers to nodes at the next level down.
The relationship between keys and pointers follows a specific pattern: if a node has n keys, it has n+1 child pointers (or n pointers if using a slightly different convention).
```
Internal Node Structure
========================

Standard format (n keys, n+1 pointers):

┌─────────────────────────────────────────────────────────────────────┐
│ P₀ │ K₁ │ P₁ │ K₂ │ P₂ │ K₃ │ P₃ │ ... │ K_n │ P_n │ unused │
└─────────────────────────────────────────────────────────────────────┘

Where: K₁ < K₂ < K₃ < ... < K_n   (keys in sorted order)

Pointer meaning:
  P₀ points to subtree with all keys < K₁
  P₁ points to subtree with keys where K₁ ≤ key < K₂
  P₂ points to subtree with keys where K₂ ≤ key < K₃
  ...
  P_n points to subtree with all keys ≥ K_n

EXAMPLE: Internal node with 4 keys (5 pointers)

┌──────────────────────────────────────────────────────────────┐
│ P₀ │ 100 │ P₁ │ 200 │ P₂ │ 300 │ P₃ │ 400 │ P₄ │ unused │
└──────────────────────────────────────────────────────────────┘
   │          │          │          │          │
   ↓          ↓          ↓          ↓          ↓
  keys       keys       keys       keys       keys
  < 100     100-199    200-299    300-399    ≥ 400

Memory layout (assuming 4-byte keys, 6-byte pointers):
  - Each (pointer, key) pair: 6 + 4 = 10 bytes
  - First pointer alone: 6 bytes
  - 4 keys, 5 pointers: 6 + (4 × 10) = 46 bytes for entries
  - Block size 4096 bytes → up to ~400 entries possible
  - Actual capacity depends on header/metadata overhead

Leaf Node Structure (different from internal!)
==============================================

Leaf nodes store actual data pointers (or data values):

┌────────────────────────────────────────────────────────────────────────┐
│ K₁ │ DP₁ │ K₂ │ DP₂ │ K₃ │ DP₃ │ ... │ K_n │ DP_n │ next │ unused │
└────────────────────────────────────────────────────────────────────────┘

Where:
  K₁, K₂, ... K_n    = key values (sorted)
  DP₁, DP₂, ... DP_n = data pointers (to actual records)
  next               = pointer to next leaf node (for range scans) - B+-tree feature

Key differences from internal nodes:
  1. No "between keys" semantics - each key has exactly one data pointer
  2. May include sibling pointers for efficient range queries
  3. In B+-tree: ALL data pointers are in leaves, none in internal nodes
```

| Property | Internal Node | Leaf Node |
|---|---|---|
| Content | Keys + child pointers | Keys + data pointers (or data values) |
| Key count to pointer count | n keys → n+1 child pointers | n keys → n data pointers |
| Purpose | Navigate to correct subtree | Provide access to actual data |
| Sibling links | Not typically needed | Often linked for range scans |
| Key semantics | Separator/router values | Actual indexed values from data |
In internal nodes, keys serve as 'road signs' directing search traffic. They may or may not correspond to actual keys in the data. In leaf nodes, keys are the actual indexed values from the database records. This distinction becomes important when understanding B-tree vs. B+-tree designs.
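To make the "road sign" role concrete, here is a minimal Python sketch that picks which child pointer to follow for a given search key; `bisect_right` implements the n-keys-to-n+1-pointers rule directly. The node contents are hypothetical, mirroring the four-key example node above.

```python
from bisect import bisect_right

def choose_child(keys: list[int], children: list[str], search_key: int) -> str:
    """Pick the child pointer to follow in an internal node.

    With n sorted separator keys and n+1 child pointers, child i covers
    the range keys[i-1] <= key < keys[i]; bisect_right finds that slot.
    """
    assert len(children) == len(keys) + 1
    return children[bisect_right(keys, search_key)]

# Hypothetical internal node: 4 separator keys, 5 child pointers
keys = [100, 200, 300, 400]
children = ["P0", "P1", "P2", "P3", "P4"]

print(choose_child(keys, children, 57))    # P0  (keys < 100)
print(choose_child(keys, children, 250))   # P2  (200 <= key < 300)
print(choose_child(keys, children, 400))   # P4  (keys >= 400)
```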
The single most important property of database index trees is balance: all leaf nodes must be at the same depth from the root. This property guarantees predictable performance—every search follows a path of exactly the same length.
Why balance is non-negotiable:
Consider an unbalanced tree where some leaves are at depth 3 and others at depth 10. Searching for a key in a deep branch requires 10 I/O operations, while searching in a shallow branch requires only 3. This creates unpredictable performance: identical point queries can differ severalfold in cost, and as the imbalance grows, search degrades toward the O(n) worst case illustrated below.
```
Balance vs. Imbalance: A Concrete Example
==========================================

Inserting keys 1, 2, 3, 4, 5, 6, 7 in order:

UNBALANCED BST (simple binary search tree):

  1
   ╲
    2
     ╲
      3
       ╲
        4
         ╲
          5
           ╲
            6
             ╲
              7

  Height: 7 (equals number of nodes - worst case!)
  Search for key 7: 7 comparisons, 7 node accesses
  This is O(n), not O(log n)!

BALANCED TREE (e.g., B-tree, AVL, Red-Black):

        4
      ╱   ╲
     2     6
    ╱ ╲   ╱ ╲
   1   3 5   7

  Height: 3 (logarithmic in number of nodes)
  Search for ANY key: ≤3 comparisons, ≤3 node accesses
  This is O(log n), regardless of insertion order

FOR DATABASE INDEXES (with fanout f ≈ 300):

  Same 7 keys in a B-tree-style node:
  ┌───────────────────────────────────┐
  │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ ... │
  └───────────────────────────────────┘

  All 7 keys fit in a single node!
  Height: 1 (root is also a leaf)
  Search for any key: 1 node access

This is why high fanout is so powerful:
  - 7 keys             → 1 level (single block)
  - 7,000 keys         → 2 levels
  - 7,000,000 keys     → 3 levels
  - 7,000,000,000 keys → 4 levels

ALL of these are balanced, and the height grows very slowly.
```

Balance doesn't come for free. Maintaining balance during insertions and deletions requires extra work: node splits, node merges, key redistributions. But this maintenance cost is O(log n) per operation—well worth paying for guaranteed O(log n) search. The alternative (unbalanced trees) makes search O(n) in the worst case.
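The height figures in this example are easy to check. The sketch below is a rough calculation, assuming a fanout of 300 for the high-fanout case, that compares the chain produced by inserting sorted keys into a plain BST with the heights of a balanced binary tree and a B-tree-style node layout.

```python
import math

def bst_height_sequential(n: int) -> int:
    """Height of a plain BST after inserting 1..n in sorted order: a chain of n nodes."""
    return n

def balanced_binary_height(n: int) -> int:
    """Height (in levels) of a perfectly balanced binary tree holding n keys."""
    return math.ceil(math.log2(n + 1))

def high_fanout_height(n: int, fanout: int) -> int:
    """Levels needed when every node can hold up to `fanout` entries."""
    return max(1, math.ceil(math.log(n, fanout)))

for n in (7, 7_000, 7_000_000, 7_000_000_000):
    print(f"{n:>13,} keys: "
          f"chain BST = {bst_height_sequential(n):,} levels, "
          f"balanced binary = {balanced_binary_height(n)}, "
          f"fanout-300 tree = {high_fanout_height(n, 300)}")

# For 7 keys:             chain 7, balanced binary 3, fanout-300 tree 1
# For 7,000,000,000 keys: chain 7,000,000,000, balanced binary 33, fanout-300 tree 4
```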
If balance is the key property, why not use well-known balanced binary trees like AVL trees or Red-Black trees? These are proven data structures with guaranteed O(log n) operations. Yet no major database uses them for disk-based indexes. Understanding why reveals the fundamental optimization that makes B-trees suitable for databases.
The mismatch between binary trees and disk I/O:
| Characteristic | Binary Tree (f=2) | B-Tree (f≈300) | Impact |
|---|---|---|---|
| Keys per node | 1 | ~300 | Binary wastes 99.7% of block |
| Height for 1M keys | 20 levels | 3 levels | 7x more I/O for binary |
| Block utilization | ~20 bytes used of 4KB | ~4KB fully used | Binary is ~200x less efficient |
| Cache efficiency | Poor - many small nodes | Excellent - few large nodes | Binary thrashes cache |
| I/O per search | 20 random reads | 3 random reads | Binary is impractical |
```
Binary Tree vs. B-Tree: Quantitative Analysis
==============================================

Dataset:      10,000,000 records
Key size:     8 bytes
Pointer size: 6 bytes
Block size:   4,096 bytes

BINARY TREE (AVL/Red-Black):
----------------------------
Node content: 1 key + 2 child pointers = 8 + 6 + 6 = 20 bytes
Block utilization: 20 / 4096 = 0.5% (wasting 99.5% of block!)

Since we read whole blocks anyway, ONE BLOCK could hold:
  4096 / 20 = 204 nodes
But binary tree structure means we visit each node in sequence,
not all nodes in a block together.

Height of balanced binary tree with 10M nodes:
  log₂(10,000,000) = 23.25 → 24 levels

With tree nodes scattered across blocks:
  Worst case: 24 block reads per search
  (Each level might be in a different block)

Even if nodes are clustered:
  - Navigating the tree still requires following pointers
  - Each pointer might go to a different block
  - Expected I/O: ~20 block reads

B-TREE (fanout ≈ 300):
----------------------
Node content: ~300 keys + ~301 pointers = 300×8 + 301×6 = 4,206 bytes
Block utilization: ~100% (node sized to match block)

Height of B-tree with 10M keys:
  log₃₀₀(10,000,000) = 2.83 → 3 levels

Block reads per search: exactly 3 (one per level)

COMPARISON SUMMARY:
                     Binary Tree    B-Tree      Ratio
                     -----------    ------      -----
Height               24             3           8x
I/O per search       ~20            3           6.7x
Block utilization    0.5%           ~100%       200x
Cache friendliness   Very poor      Excellent

The B-tree is designed for block-based storage.
The binary tree is designed for pointer-based (RAM) storage.
Using binary trees on disk is a fundamental mismatch.
```

B-trees optimize for the actual cost function of disk storage: minimize block reads. Since each block read costs the same whether it holds 10 bytes or 4,000 bytes of useful data, we should maximize data per block. This means maximum fanout, which means nodes with hundreds of children—the opposite of binary trees.
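The same arithmetic can be reproduced in a few lines of Python. This sketch uses the block, key, and pointer sizes stated in the analysis above; the exact fanout and ratios differ slightly from the rounded figures because the node is constrained to fit the block.

```python
import math

BLOCK = 4096          # block size in bytes (as in the analysis above)
KEY, PTR = 8, 6       # key and pointer sizes in bytes
N = 10_000_000        # records in the dataset

# Binary tree: one key + two child pointers per node
binary_node = KEY + 2 * PTR                     # 20 bytes
binary_util = binary_node / BLOCK               # ~0.5% of the block
binary_height = math.ceil(math.log2(N))         # 24 levels

# B-tree: node sized to fill the block, so fanout ≈ BLOCK / (KEY + PTR)
fanout = BLOCK // (KEY + PTR)                   # ~292 children per node
btree_height = math.ceil(math.log(N, fanout))   # 3 levels

print(f"Binary node uses {binary_node} B of a {BLOCK} B block "
      f"({binary_util:.1%} utilization), height {binary_height}")
print(f"B-tree fanout {fanout}, height {btree_height}, "
      f"height ratio ~{binary_height / btree_height:.0f}x")
```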
Database index trees (B-trees and variants) maintain several invariants that ensure balance, efficiency, and correctness. These properties are non-negotiable—violating any of them breaks the structure.
The essential properties:
```python
class BTreeNode:
    """
    Representation of a B-tree node with invariant enforcement.

    Order (m): Maximum number of children per node

    For a B-tree of order m:
    - Each internal node has at most m children
    - Each internal node (except root) has at least ⌈m/2⌉ children
    - Root has at least 2 children (if not a leaf)
    - Each node has at most m-1 keys
    - Each node (except root) has at least ⌈m/2⌉-1 keys
    - All leaves are at the same depth
    """

    def __init__(self, order: int, is_leaf: bool = False):
        self.order = order              # Maximum children
        self.is_leaf = is_leaf          # Is this a leaf node?
        self.keys = []                  # Search keys (sorted)
        self.children = []              # Child pointers (one more than keys)
        self.data_pointers = []         # For leaves: pointers to actual data

    @property
    def min_keys(self) -> int:
        """Minimum keys required (except root)."""
        return (self.order + 1) // 2 - 1    # ⌈m/2⌉ - 1

    @property
    def max_keys(self) -> int:
        """Maximum keys allowed."""
        return self.order - 1

    @property
    def is_underfull(self) -> bool:
        """Node has fewer than minimum keys."""
        return len(self.keys) < self.min_keys

    @property
    def is_full(self) -> bool:
        """Node is at maximum capacity."""
        return len(self.keys) >= self.max_keys

    @property
    def can_donate(self) -> bool:
        """Node has more than minimum keys (can donate to sibling)."""
        return len(self.keys) > self.min_keys

    def validate(self) -> bool:
        """Check that all node invariants hold."""
        # Keys must be sorted
        if self.keys != sorted(self.keys):
            return False
        # Key count within bounds (ignoring root exception)
        if len(self.keys) > self.max_keys:
            return False
        # Internal nodes: children = keys + 1
        if not self.is_leaf:
            if len(self.children) != len(self.keys) + 1:
                return False
        # All keys unique (for primary indexes)
        if len(self.keys) != len(set(self.keys)):
            return False
        return True


def validate_btree(root: BTreeNode) -> tuple[bool, str]:
    """
    Validate entire B-tree structure.
    Returns (is_valid, error_message).
    """
    def check_depth(node: BTreeNode, current_depth: int) -> int:
        """Return depth of first leaf found, or -1 if inconsistent."""
        if node.is_leaf:
            return current_depth
        depths = [check_depth(child, current_depth + 1)
                  for child in node.children]
        if len(set(depths)) > 1 or -1 in depths:
            return -1   # Inconsistent depths
        return depths[0]

    leaf_depth = check_depth(root, 0)
    if leaf_depth == -1:
        return (False, "Leaves at different depths - not balanced!")
    return (True, f"Valid B-tree with height {leaf_depth}")
```

B-tree 'order' (often denoted m) has varying definitions in literature. Some define it as maximum children, others as maximum keys. The practical meaning is: order determines how many keys/children fit in a node. For disk-based trees, order is chosen to make node size equal block size.
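A minimal usage sketch, assuming the `BTreeNode` class and `validate_btree` function above are in scope: it hand-builds a tiny order-4 tree (unrealistically small for disk, but readable) and runs the validators.

```python
# Hand-build a tiny two-level tree and validate it.
root = BTreeNode(order=4)
leaf_a = BTreeNode(order=4, is_leaf=True)
leaf_b = BTreeNode(order=4, is_leaf=True)

leaf_a.keys, leaf_a.data_pointers = [10, 20], ["rec10", "rec20"]
leaf_b.keys, leaf_b.data_pointers = [30, 40], ["rec30", "rec40"]

root.keys = [30]                  # separator: left subtree < 30, right >= 30
root.children = [leaf_a, leaf_b]  # 1 key -> 2 child pointers

assert root.validate() and leaf_a.validate() and leaf_b.validate()
print(validate_btree(root))       # (True, 'Valid B-tree with height 1')
```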
The height of the index tree directly determines search cost. Since each level requires one block read, minimizing height is critical. Let's derive the height bounds mathematically.
Minimum height (best case - all nodes maximally full):
With nodes at maximum capacity (f children per internal node and f entries per leaf), the tree is as dense as possible. At depth d there are f^d nodes, so a tree of height h has f^(h-1) leaf nodes holding f entries each:
f^(h-1) leaves × f entries per leaf = f^h addressable records (the total node count is the geometric sum 1 + f + f² + ... + f^(h-1) = (f^h − 1) / (f − 1))
So a tree of height h can address approximately f^h records. Inverting:
h_min = ⌈log_f(n)⌉
Maximum height (worst case - all nodes minimally full):
With nodes at minimum capacity (⌈f/2⌉ children per internal node), the tree is sparsest. Let m = ⌈f/2⌉. Then:
h_max = ⌈log_m(n)⌉ = ⌈log_f(n) / log_f(m)⌉, which is only slightly larger than h_min: since m ≈ f/2, we have log_f(m) = 1 − log_f(2), so for f ≈ 300 this gives h_max ≈ 1.14 × h_min, at most about one extra level in practice.
In practice, nodes average 69-75% full, so actual height is close to minimum.
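Both bounds are straightforward to evaluate. The sketch below assumes a fanout of 300, matching the earlier examples, and computes h_min and h_max for a few table sizes; the results line up with the corresponding rows of the following table.

```python
import math

def height_bounds(n: int, fanout: int) -> tuple[int, int]:
    """Best/worst-case height from the formulas above.

    h_min assumes every node is full (f children);
    h_max assumes every node is minimally full (ceil(f/2) children).
    """
    m = math.ceil(fanout / 2)
    h_min = math.ceil(math.log(n, fanout))
    h_max = math.ceil(math.log(n, m))
    return h_min, h_max

# Assumed fanout of 300, as in the examples on this page
for n in (100_000, 10_000_000, 1_000_000_000):
    h_min, h_max = height_bounds(n, 300)
    print(f"{n:,} records: height between {h_min} and {h_max}")
# 100,000 records:       height between 3 and 3
# 10,000,000 records:    height between 3 and 4
# 1,000,000,000 records: height between 4 and 5
```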
| Records (n) | Min Height (h_min) | Max Height (h_max) | Typical Height | Block Reads to Search |
|---|---|---|---|---|
| 1,000 | 2 | 2 | 2 | 2 + 1 (data) |
| 100,000 | 3 | 3 | 3 | 3 + 1 (data) |
| 10,000,000 | 3 | 4 | 3-4 | 3-4 + 1 (data) |
| 1,000,000,000 | 4 | 5 | 4 | 4 + 1 (data) |
| 100,000,000,000 | 5 | 6 | 5 | 5 + 1 (data) |
| 10,000,000,000,000 | 5 | 6 | 5-6 | 5-6 + 1 (data) |
```sql
-- Calculating B+-tree height for various data sizes
-- Practical formula with realistic assumptions

WITH params AS (
    SELECT
        4096 AS block_size_bytes,
        8    AS key_size_bytes,
        6    AS pointer_size_bytes,   -- 6-8 bytes for block pointer
        0.70 AS typical_fill_factor   -- Nodes typically 70% full
),
calculated AS (
    SELECT
        FLOOR((block_size_bytes - 100) / (key_size_bytes + pointer_size_bytes))
            AS max_fanout,
        FLOOR((block_size_bytes - 100) / (key_size_bytes + pointer_size_bytes)
              * typical_fill_factor)
            AS typical_fanout
    FROM params
),
data_sizes AS (
    SELECT 1000 AS records, '1 thousand' AS label
    UNION ALL SELECT 1000000, '1 million'
    UNION ALL SELECT 1000000000, '1 billion'
    UNION ALL SELECT 1000000000000, '1 trillion'
)
SELECT
    d.label,
    d.records,
    c.typical_fanout AS fanout,
    CEILING(LOG(d.records) / LOG(c.typical_fanout)) AS tree_height,
    CEILING(LOG(d.records) / LOG(c.typical_fanout)) + 1 AS total_ios_per_search
FROM data_sizes d
CROSS JOIN calculated c;

-- Results with typical fanout ~200:
-- Records        Fanout   Height   Total I/Os (with data)
-- 1 thousand     199      2        3
-- 1 million      199      3        4
-- 1 billion      199      4        5
-- 1 trillion     199      6        7

/*
KEY INSIGHT:
  Even for a TRILLION records, a point lookup needs only 6 index block
  reads (7 including the data block)!

  This is the magic of high-fanout balanced trees:
  - Tree height grows as log_f(n)
  - With f ≈ 200, each level adds a factor of 200 to capacity
  - Physical limits (disk space) are reached before height becomes problematic
*/
```

For any database that fits on available storage (petabytes), the index tree height never exceeds 6-7 levels. This means every point query requires at most 7 I/O operations—a constant, predictable cost regardless of whether you have millions or trillions of records. This predictability is why B-trees are universal in databases.
Tree structures formalize the multi-level index concept into a robust, maintainable data structure. The key insight is designing trees for disk block access rather than pointer-based traversal—maximizing fanout to minimize height and I/O operations.
What's next:
With the tree structure foundation in place, the next page examines search efficiency in detail. We'll analyze how searches traverse the tree, quantify the exact I/O costs, compare with alternative approaches, and understand how caching and buffer management interact with tree-based search to achieve optimal performance.
You now understand how multi-level indexes manifest as tree structures: the node organization, balance properties, why high fanout is essential, and how height bounds guarantee efficient search. This prepares you for deeper exploration of search algorithms and tree maintenance operations in subsequent pages.