Multi-level index trees achieve their remarkable search efficiency through careful structural organization: balanced height, sorted keys, and predictable fanout. But databases aren't static—records are constantly inserted, updated, and deleted. Every modification potentially disrupts the tree's carefully maintained structure.
Index maintenance is the hidden cost of indexing. While we've seen that search requires 3-4 I/Os, maintaining the tree through modifications can require significantly more. Understanding these costs is essential for database design, especially for write-heavy workloads like OLTP systems, streaming data pipelines, and high-frequency trading platforms.
By the end of this page, you will understand: (1) how insertions work, including node splits when nodes overflow, (2) how deletions work, including underflow handling through redistribution and merging, (3) the I/O cost of maintenance operations, (4) how updates affect indexes, and (5) strategies for minimizing maintenance overhead in production systems.
Inserting a new record requires adding an entry to the index tree. The process starts by finding the correct leaf node (just like a search), then inserting the new key-pointer pair. The complexity arises when the target leaf is already full.
The insertion algorithm:
```python
def insert_into_btree(root: BTreeNode, key, data_pointer) -> BTreeNode:
    """
    Insert a key-pointer pair into a B+-tree.
    Returns the (possibly new) root node.

    This implementation uses recursive descent with split propagation.
    Real databases often use iterative approaches with parent pointers.
    """
    def insert_recursive(node: BTreeNode, key, data_pointer):
        """
        Returns: (new_key, new_child) if a split occurred, else (None, None)
          new_key:   the separator key to insert into the parent
          new_child: pointer to the new sibling node created by the split
        """
        if node.is_leaf:
            # Insert into leaf node
            insert_position = find_insert_position(node.keys, key)
            node.keys.insert(insert_position, key)
            node.data_pointers.insert(insert_position, data_pointer)

            if len(node.keys) > node.max_keys:
                # Overflow! Must split this leaf
                return split_leaf(node)
            else:
                return (None, None)  # No split needed
        else:
            # Internal node: find child and recurse
            child_index = find_child_index(node.keys, key)
            child = node.children[child_index]

            # Recursive insert into the appropriate subtree
            (promoted_key, new_child) = insert_recursive(child, key, data_pointer)

            if promoted_key is not None:
                # A split occurred below - must insert separator here
                node.keys.insert(child_index, promoted_key)
                node.children.insert(child_index + 1, new_child)

                if len(node.keys) > node.max_keys:
                    # This internal node also overflows - split it too
                    return split_internal(node)
            return (None, None)

    # Start insertion from the root
    (promoted_key, new_child) = insert_recursive(root, key, data_pointer)

    if promoted_key is not None:
        # Root was split - create a new root
        new_root = BTreeNode(order=root.order, is_leaf=False)
        new_root.keys = [promoted_key]
        new_root.children = [root, new_child]
        return new_root
    return root  # Root unchanged


def split_leaf(node: BTreeNode):
    """
    Split an overflowing leaf node into two nodes.
    Returns: (separator_key, new_sibling)
    """
    mid = len(node.keys) // 2

    # Create new sibling with the upper half of the entries
    sibling = BTreeNode(order=node.order, is_leaf=True)
    sibling.keys = node.keys[mid:]
    sibling.data_pointers = node.data_pointers[mid:]

    # Original node keeps the lower half
    node.keys = node.keys[:mid]
    node.data_pointers = node.data_pointers[:mid]

    # Maintain the leaf sibling chain (for B+-tree range scans)
    sibling.next_leaf = node.next_leaf
    node.next_leaf = sibling

    # Separator key: first key of the new sibling (copied up)
    separator = sibling.keys[0]
    return (separator, sibling)


def split_internal(node: BTreeNode):
    """
    Split an overflowing internal node.
    Returns: (separator_key, new_sibling)

    Key difference from a leaf split: the separator is pushed up, not copied.
    """
    mid = len(node.keys) // 2

    # The middle key is pushed up to the parent (kept in neither child)
    separator = node.keys[mid]

    # Create new sibling with the upper half
    sibling = BTreeNode(order=node.order, is_leaf=False)
    sibling.keys = node.keys[mid + 1:]  # Skip the separator
    sibling.children = node.children[mid + 1:]

    # Original keeps the lower half
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]

    return (separator, sibling)
```

Visualizing a node split:
When a leaf node with capacity 4 receives a 5th entry:
```text
Node Split Example
==================

Before insertion of key 35:

    Parent Node: [20 | 40 | 60]
                 /    |    \
                v     v     v
          [10,15]  [25,30,32,38]  [45,50]
                   ^^^^^^^^^^^^^
                   Target leaf (already full with 4 keys)

After inserting 35 (causes overflow → split):

  Step 1: Insert 35 in sorted position
          [25, 30, 32, 35, 38]   ← 5 entries, overflow!

  Step 2: Split leaf at midpoint
          [25, 30]  |  [32, 35, 38]
             ↑             ↑
          Original    New sibling

  Step 3: Promote separator (32) to parent

    Parent Node: [20 | 32 | 40 | 60]
                 /    |    |    \
                v     v    v     v
          [10,15] [25,30] [32,35,38] [45,50]

Note:
- Separator 32 is COPIED to parent (in B+-tree)
- Separator 32 remains in leaf (data pointer needed)
- Parent now has 4 keys (if max is 4, parent would also split)
```

A single insertion can trigger splits all the way up to the root. If the root splits, a new root is created, increasing tree height by 1. This cascade is rare (its probability decreases exponentially with each level) but can cause significant I/O when it happens. This is why B-trees 'grow from the bottom up.'
Let's quantify the I/O cost of insertions. The cost depends on whether splits occur, which depends on how full the tree nodes are.
Base cost (no split): h reads to descend to the target leaf, plus 1 write for the modified leaf. For our 10-million-record example (h=3): 3 reads + 1 write = 4 I/Os.

Cost with leaf split: 3 reads + 3 writes (the split leaf, its new sibling, and the updated parent) = 6 I/Os.

Cost with cascading split: if the parent also splits, add 2 more writes. If the cascade reaches the root, up to 2h+2 writes in total (two nodes per level, plus a new root).
| Scenario | Probability | Read I/Os | Write I/Os | Total I/Os |
|---|---|---|---|---|
| No split | ~65-70% | 3 | 1 | 4 |
| Leaf split only | ~25-30% | 3 | 3 | 6 |
| Leaf + parent split | ~3-5% | 3 | 5 | 8 |
| Split to root | <1% | 3 | 2h+1 | 2h+4 |
| New root created | <<1% | 3 | 2h+2 | 2h+5 |
```text
Expected Insertion I/O Cost Calculation
========================================

Assumptions:
- Tree height: h = 3
- Fanout: f = 250
- Node fill factor after splits: ~70% (empirical average)

Split probability at each level:

  Naive estimate (full nodes):
    P(leaf split) ≈ 1/f (one split per f insertions on average)
                  ≈ 1/250 ≈ 0.4% per insertion

  But nodes aren't always full. After many random insertions,
  nodes average ~69% full (ln 2 ≈ 0.693), so:

    Average keys per leaf: 0.69 × f ≈ 173 (for f = 250)

    P(leaf split | insertion into this leaf) depends on current
    fullness; averaged over the long term:
      P(split) ≈ 1 / (keys per leaf) ≈ 1/173 ≈ 0.6%

  More accurately, considering that splits create two ~50% full nodes:
    Average split rate ≈ 1 / (0.5 × f) = 1/125 ≈ 0.8% per insertion

  P(parent split | leaf split) ≈ same logic ≈ 0.8%
  P(root split | parent split) ≈ 0.8%

Expected I/O cost:

  E[I/O] = P(no split) × 4 + P(leaf split only) × 6 + P(cascade) × (6+)

  With p = 0.008 (0.8% split rate):

  E[I/O] ≈ (1-p) × 4 + p × (1-p) × 6 + p² × 8 + ...
         ≈ 0.992 × 4 + 0.00794 × 6 + 0.000064 × 8
         ≈ 3.968 + 0.048 + 0.0005
         ≈ 4.02 I/Os per insertion

RESULT: Expected insertion cost is only slightly above search cost!
- Search: 3-4 I/Os (read only)
- Insert: ~4 I/Os (mostly no split; occasional split overhead averages out)

This is remarkably efficient. The B-tree design minimizes overhead.
```

Despite the potential for cascading splits, the expected insertion cost is close to the search cost. B-tree insertions have low write amplification (~1.3x) on average. This is vastly better than maintaining a sorted array (which would require moving half the data on average) and is one reason B-trees are so efficient for OLTP workloads.
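The closed-form estimate above is easy to check in a few lines of Python. The 0.8% per-level split rate and the 4/6/8 I/O costs are the assumptions carried over from the calculation; the truncation after the two-level cascade term matches it as well.

```python
def expected_insert_ios(p=0.008):
    """Expected I/Os per insertion, mirroring the calculation above:
    4 I/Os with no split, 6 with a leaf split, 8 with a two-level
    cascade. p is the assumed per-level split rate (~0.8%)."""
    return (1 - p) * 4 + p * (1 - p) * 6 + p ** 2 * 8

print(round(expected_insert_ios(), 2))  # 4.02
```

With p = 0, the expression collapses to the 4-I/O no-split case, which is a quick sanity check on the model.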
Deleting a record requires removing its entry from the index tree. This is more complex than insertion because we must handle underflow—when a node falls below the minimum occupancy requirement.
A worked example of deletion with underflow handling:
```text
Deletion Example: Handling Underflow
=====================================

Tree parameters: minimum 2 keys per leaf (max 4)

Initial state:

    Parent: [30 | 50]
            /   |   \
           v    v    v
      [10,20] [35,40] [55,60,70,80]
                 ↑
        Delete key 40 from this leaf

Step 1: Delete 40
    [35]  ← only 1 key remaining, minimum is 2 → UNDERFLOW!

Step 2: Check siblings for redistribution

  Option A: Borrow from right sibling (has 4 keys, can donate)
    Before: [35] and [55,60,70,80]
    After:  [35,55] and [60,70,80]
    Parent separator changes: 50 → 60

    Result:
        Parent: [30 | 60]
                /   |   \
               v    v    v
          [10,20] [35,55] [60,70,80]

    I/O: 1 read (sibling) + 3 writes (leaf, sibling, parent) = 4 I/Os

  Option B: Borrow from left sibling (has 2 keys, can donate 1)
    Before: [10,20] and [35]
    After:  [10] and [20,35]
    Parent separator changes: 30 → 20

    But wait: [10] now has only 1 key → left sibling underflows!
    Would need to borrow from ITS sibling, or merge.
    In this case, borrowing from the right is better (more keys available).

What if a sibling can't donate?
===============================

Scenario: delete from [35,40] when the right sibling holds only [55,60]

  After deleting 40: [35]  ← underflow
  Right sibling [55,60] has exactly the minimum and can't donate.

  Must MERGE: [35] + separator(50) + [55,60] → [35,50,55,60]

  Before merge:
      Parent: [30 | 50]
              /   |   \
             v    v    v
        [10,20] [35] [55,60]

  After merge:
      Parent: [30]
              /   \
             v     v
        [10,20] [35,50,55,60]

  I/O: 1 read + 3 writes (leaf removed, sibling modified, parent modified)

Note: the parent lost a key. If the parent underflows, the merge
cascades upward. If the root ends up with 0 keys, tree height
decreases by 1.
```

Many databases use 'lazy deletion': marking entries as deleted without immediately removing them. This avoids underflow handling during the delete operation. The tree is later reorganized during maintenance windows or when a threshold of deleted entries is reached. This trades storage space for reduced deletion overhead.
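The example above can be condensed into code. This is a minimal single-level sketch under the example's assumptions (minimum 2 keys per leaf): the function name `delete_from_leaf` and the list-based node model are illustrative, cascading parent underflow is omitted, and the merge pulls the separator down into the merged node exactly as the worked example does (a pure B+-tree would discard the leaf separator instead).

```python
MIN_KEYS = 2  # minimum occupancy assumed in the example (max 4)

def delete_from_leaf(parent_keys, leaves, i, key):
    """Delete `key` from leaves[i], then repair any underflow.

    parent_keys: the len(leaves) - 1 separator keys of the parent.
    Mirrors the worked example: try redistribution (right sibling,
    then left), merge as a last resort. Mutates the lists in place.
    """
    leaves[i].remove(key)
    if len(leaves[i]) >= MIN_KEYS:
        return  # no underflow, done

    # Option A: borrow the smallest key of the right sibling
    if i + 1 < len(leaves) and len(leaves[i + 1]) > MIN_KEYS:
        leaves[i].append(leaves[i + 1].pop(0))
        parent_keys[i] = leaves[i + 1][0]  # separator: new first key on right
        return

    # Option B: borrow the largest key of the left sibling
    if i > 0 and len(leaves[i - 1]) > MIN_KEYS:
        borrowed = leaves[i - 1].pop()
        leaves[i].insert(0, borrowed)
        parent_keys[i - 1] = borrowed  # separator: the moved key
        return

    # Merge: combine with a sibling, pulling the separator down
    # (as in the example; parent underflow handling omitted here)
    j = i + 1 if i + 1 < len(leaves) else i - 1
    left, right = min(i, j), max(i, j)
    leaves[left] = leaves[left] + [parent_keys[left]] + leaves[right]
    del leaves[right]
    del parent_keys[left]
```

Running the example's two scenarios: with right sibling `[55,60,70,80]`, deleting 40 borrows 55 and the separator becomes 60; with right sibling `[55,60]`, deleting 40 merges into `[35,50,55,60]` and the parent drops to `[30]`.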
Updates come in two flavors, with very different index maintenance costs:
Non-key update: Changing a column that is NOT part of the index key. Example: updating a customer's email address when the index is on customer_id.
Key update: Changing a column that IS part of the index key. Example: correcting a customer_id value.
The difference is profound.
| Update Type | Index Impact | I/O Cost | Complexity |
|---|---|---|---|
| Non-key (covered column) | Update leaf entry value | h reads + 1 write | Low—modify in place |
| Non-key (non-covered) | No index change | 0 index I/Os | None—index unaffected |
| Key update | Delete + Insert | 2× (delete + insert cost) | High—full rebalancing possible |
```sql
-- Update Operation Examples
-- Table: customers with clustered index on customer_id
-- Non-clustered index on (last_name, first_name)

-- SCENARIO 1: Non-key update on non-indexed column
UPDATE customers SET email = 'new@email.com' WHERE customer_id = 12345;
-- Index impact: NONE
-- Clustered index used to find row (3-4 I/Os)
-- Only data page modified
-- last_name index: unchanged
-- Total: 4 I/Os (same as point query + 1 write)

-- SCENARIO 2: Non-key update on indexed column (not key)
-- Assume covering index: (customer_id) INCLUDE (status)
UPDATE customers SET status = 'active' WHERE customer_id = 12345;
-- Index impact: Update index entry to change status value
-- Find entry: 3-4 I/Os
-- Modify leaf entry in place: 1 write
-- Total: 4-5 I/Os

-- SCENARIO 3: Key update (changing primary key - rare!)
UPDATE customers SET customer_id = 99999 WHERE customer_id = 12345;
-- Index impact: MAJOR
-- 1. Delete old entry for customer_id = 12345
--    - Search: 3 I/Os
--    - Delete from leaf: 1 write
--    - Potential underflow handling: 0-4 more I/Os
--
-- 2. Insert new entry for customer_id = 99999
--    - Search: 3 I/Os
--    - Insert into leaf: 1 write
--    - Potential split handling: 0-4 more I/Os
--
-- 3. Data record must also move (clustered index)
--
-- Total: 8-20+ I/Os
-- This is why primary keys should be immutable!

-- SCENARIO 4: Update on secondary index key
UPDATE customers SET last_name = 'Smith' WHERE customer_id = 12345;
-- Secondary index on (last_name, first_name) affected:
-- 1. Delete old entry for (old_last_name, first_name)
-- 2. Insert new entry for ('Smith', first_name)
--
-- Clustered index (customer_id): just data update, no key change
--
-- Total: ~8-12 I/Os (search + delete + insert + data update)

-- Every indexed column update costs more than non-indexed updates!
```

Every index on a table adds maintenance overhead for inserts, deletes, and updates to indexed columns. A table with 5 indexes requires roughly 5x the write I/O of an unindexed table.
This 'index tax' is why good index design balances read performance gains against write performance costs.
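The 'index tax' can be put into a rough cost model. This is a hypothetical no-split estimate, assuming each index has height 3 and each insert touches one heap page plus one leaf page per index; real costs vary with splits, caching, and logging.

```python
def insert_io_estimate(num_indexes, height=3):
    """Rough I/O estimate for inserting one row, with no splits:
    1 heap-page write for the row itself, plus, per index,
    `height` reads to find the leaf and 1 write to modify it.
    Returns (reads, writes)."""
    reads = num_indexes * height
    writes = 1 + num_indexes  # heap page + one leaf per index
    return reads, writes

# An unindexed table vs. a table with 5 indexes:
print(insert_io_estimate(0))  # (0, 1)
print(insert_io_estimate(5))  # (15, 6)
```

Even in this best case the five-index table does 6 writes and 15 reads per insert, which is why every proposed index should justify its read benefit against this per-write cost.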
Row-by-row operations are optimized for transactional workloads. But what about loading millions of rows during data migration, ETL processes, or initial database population? The row-by-row approach becomes prohibitively expensive.
Bulk loading strategies:
```text
Bulk Load Performance Comparison
================================

Scenario: Load 10 million records into a table with a B+-tree index
Record size: 200 bytes
Index entry: 14 bytes
Block size:  4 KB

METHOD 1: Row-by-row INSERT with index maintenance
--------------------------------------------------
Per-row cost: ~4-6 I/Os (search + insert + occasional split)
Total I/Os:   10,000,000 × 5 = 50,000,000 I/Os
SSD time:     50M × 0.1ms = 5,000 seconds ≈ 83 minutes
HDD time:     50M × 10ms = 500,000 seconds ≈ 139 hours (!)

Problem: Completely impractical on HDD, slow even on SSD.

METHOD 2: Sorted bulk load (bottom-up construction)
---------------------------------------------------
Step 1: Sort data by index key
  - External sort: O(n log n) comparisons
  - I/O: ~2-4 passes over data = 2-4 × (10M × 200 bytes / 4 KB)
  - I/O: ~2 million block reads/writes

Step 2: Build leaf level
  - Write full leaf blocks sequentially
  - 10M entries / 250 per leaf = 40,000 leaf blocks
  - I/O: 40,000 sequential writes

Step 3: Build internal levels
  - Level 2: 40,000 / 250 = 160 blocks
  - Level 3: 160 / 250 → 1 block (root)
  - I/O: ~200 writes

Total I/Os: ~2,000,000 + 40,200 ≈ 2,040,000 I/Os
SSD time:   2M × 0.1ms = 200 seconds ≈ 3.3 minutes
HDD time:   2M × 0.1ms (sequential!) ≈ 200 seconds ≈ 3.3 minutes

SPEEDUP: ~25x faster than row-by-row!

METHOD 3: Drop index, load, rebuild
-----------------------------------
Step 1: Drop index
  - Instant

Step 2: Load data (no index maintenance)
  - Simple append to heap: 10M × 200 bytes = 2 GB
  - Sequential writes: 500,000 block writes
  - I/O: 500,000 writes

Step 3: Rebuild index (same as sorted bulk load)
  - But the data is not sorted! Must sort first.
  - Index build: ~2,040,000 I/Os (same as Method 2)

Total: ~2,540,000 I/Os
Slightly slower than Method 2, but simpler to implement.

CONCLUSION: For large loads, NEVER do row-by-row with index maintenance.
Either bulk load with sorted data, or drop and rebuild indexes.
```

When loading large datasets: (1) sort data by primary key before loading, (2) load without secondary indexes, (3) build secondary indexes after the load completes. This can reduce load times from hours to minutes. Most database bulk-loading utilities implement these optimizations automatically.
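The bottom-up build of Method 2 is straightforward to sketch. Assuming completely full nodes (real loaders usually leave fill-factor headroom), the per-level node counts come out exactly as in the comparison above; the function names here are illustrative.

```python
import math

def build_leaves(sorted_keys, fanout):
    """Pack already-sorted keys into full leaf nodes, in the order
    they would be written sequentially to disk (Method 2, step 2)."""
    return [sorted_keys[i:i + fanout]
            for i in range(0, len(sorted_keys), fanout)]

def level_sizes(num_entries, fanout):
    """Node count per level of the bottom-up build,
    leaf level first, root last (Method 2, steps 2-3)."""
    sizes = [math.ceil(num_entries / fanout)]
    while sizes[-1] > 1:
        sizes.append(math.ceil(sizes[-1] / fanout))
    return sizes

print(level_sizes(10_000_000, 250))  # [40000, 160, 1]
```

With 10 million entries and fanout 250 this reproduces the 40,000 leaf blocks, 160 level-2 blocks, and single root from the comparison, so the index-build portion really is only ~40,200 sequential writes.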
Index maintenance isn't just about individual operations—it's an ongoing operational concern. Over time, indexes can develop problems that degrade performance: fragmentation, bloat, and suboptimal structure. Understanding ongoing maintenance is essential for production systems.
Common index degradation issues:
```sql
-- Index Maintenance Operations

-- SQL Server: Check fragmentation
SELECT
    OBJECT_NAME(i.object_id) AS TableName,
    i.name AS IndexName,
    ips.avg_fragmentation_in_percent AS Fragmentation,
    ips.avg_page_space_used_in_percent AS PageDensity
FROM sys.indexes i
JOIN sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') ips
    ON i.object_id = ips.object_id AND i.index_id = ips.index_id
WHERE ips.avg_fragmentation_in_percent > 10
ORDER BY Fragmentation DESC;

-- SQL Server: Reorganize (light maintenance, online)
ALTER INDEX ix_customer_name ON customers REORGANIZE;
-- Defragments leaf level, compacts pages
-- Low overhead, can run during business hours

-- SQL Server: Rebuild (heavy maintenance, may block)
ALTER INDEX ix_customer_name ON customers REBUILD;
-- Drops and recreates entire index
-- Full defragmentation, resets fill factor
-- Can block queries (unless ONLINE option used)

-- PostgreSQL: Check index sizes (a starting point for bloat checks)
SELECT
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes;

-- PostgreSQL: Reindex (equivalent to rebuild)
REINDEX INDEX ix_customer_name;
-- Or for all indexes on a table:
REINDEX TABLE customers;

-- MySQL/InnoDB: Analyze to update statistics
ANALYZE TABLE customers;

-- MySQL: Optimize to rebuild and defragment
OPTIMIZE TABLE customers;
-- Recreates table and indexes, expensive but thorough

/*
Maintenance Schedule Guidelines:
================================
- Daily: Update statistics for tables with heavy DML
- Weekly: Reorganize indexes with 10-30% fragmentation
- Monthly: Rebuild indexes with >30% fragmentation
- After bulk loads: Always rebuild/reorganize affected indexes

Automation:
- Use maintenance plans (SQL Server)
- Use pg_repack extension (PostgreSQL) for online rebuilds
- Schedule during low-activity windows
- Monitor index health with regular fragmentation checks
*/
```

Proactive index maintenance (scheduled reorganization and statistics updates) is far cheaper than reactive troubleshooting. A fragmented index might slow queries by 2-3x, but diagnosing 'why is the database slow' takes hours. Regular maintenance prevents the crisis before it happens.
Index maintenance is the other side of the indexing coin. While indexes dramatically accelerate reads, they impose ongoing costs for modifications. Understanding these costs is essential for balanced database design.
Module complete:
You've now completed Module 5: Multi-Level Index, covering logarithmic search, tree balance properties, I/O cost analysis, and index maintenance.
This knowledge prepares you for the detailed study of B-trees and B+-trees in the next chapter, where you'll see exactly how these concepts are implemented in production database systems.
Congratulations! You now have a comprehensive understanding of multi-level indexing: from the theoretical foundations (logarithmic search, balance properties) to practical concerns (I/O costs, maintenance overhead, bulk loading). This knowledge is directly applicable to database design, performance tuning, and troubleshooting in production systems.