B+-Tree Performance Analysis

Level: Advanced

Duration: 75 mins

Topic: B+-Tree Performance

Practical Considerations

From Theory to Production Excellence

Understanding B+-tree theory is necessary but not sufficient for production excellence. Real-world systems present challenges that purely theoretical analysis cannot anticipate: fragmentation, workload changes, hardware failures, concurrent access patterns, and the constant pressure to balance performance against operational complexity.

This page distills hard-won practical wisdom for B+-tree management in production environments. You'll learn monitoring strategies, tuning approaches, common pitfalls, and the decision frameworks used by experienced database administrators and performance engineers.

What You Will Master

By the end of this page, you will understand production monitoring and health assessment, master tuning strategies for common scenarios, recognize and resolve performance anti-patterns, apply database-specific optimization techniques, and develop operational best practices for B+-tree index management.

Index Health Monitoring

Proactive monitoring prevents performance degradation before it impacts users. Effective B+-tree monitoring focuses on several key metrics.

Core Metrics to Monitor:

B+-Tree Health Metrics
Metric	Healthy Range	Warning Signs	Action Required
Index Size vs Data Size	0.1-0.5× data	>1× data size	Check for bloat, duplicates
Page Fill Factor	70-95%	<60% average	REINDEX or VACUUM
Tree Height	At most theoretical minimum +1	Expected +2 or more	Investigate fragmentation
Cache Hit Rate	>95%	<90%	Increase buffer pool
Dead Tuples %	<5%	>10%	Run maintenance
Scans per Second	Stable trend	Sudden increase	Check query patterns
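
The "theoretical minimum" height in the table can be estimated from the number of keys and the page fanout. The sketch below is a simplified model, not any engine's actual formula: it assumes every page (leaf or internal) holds the same number of entries, and the function names `min_btree_height` and `height_status` are illustrative.

```python
def min_btree_height(num_keys: int, fanout: int) -> int:
    """Smallest number of levels whose leaves can hold num_keys entries.

    Assumes every page holds exactly `fanout` entries. Real pages are
    rarely full, so treat the result as a lower bound.
    """
    if num_keys < 1 or fanout < 2:
        raise ValueError("need num_keys >= 1 and fanout >= 2")
    height, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout  # each extra level multiplies capacity by the fanout
        height += 1
    return height


def height_status(actual_height: int, num_keys: int, fanout: int) -> str:
    """Apply the table's rule: minimum + 1 is fine, anything taller needs a look."""
    excess = actual_height - min_btree_height(num_keys, fanout)
    return "healthy" if excess <= 1 else "investigate"
```

For example, `min_btree_height(1_000_000, 100)` evaluates to 3, so an observed height of 4 would still count as healthy under the table's rule, while a height of 5 would not.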

monitoring_queries.sql
SQL
-- PostgreSQL: Comprehensive Index Health Check
WITH index_stats AS (
    SELECT
        s.schemaname,
        s.relname AS table_name,      -- pg_stat_user_indexes exposes relname, not tablename
        s.indexrelname AS index_name,
        pg_relation_size(s.indexrelid) AS index_bytes,
        pg_relation_size(s.relid) AS table_bytes,
        s.idx_scan AS index_scans,
        s.idx_tup_read AS tuples_read,
        s.idx_tup_fetch AS tuples_fetched
    FROM pg_stat_user_indexes s
)
SELECT
    s.schemaname || '.' || s.table_name AS table_name,
    s.index_name,
    pg_size_pretty(s.index_bytes) AS index_size,
    pg_size_pretty(s.table_bytes) AS table_size,
    -- Index-to-table size ratio: a rough bloat hint, not a measurement
    CASE
        WHEN s.table_bytes > 0 THEN
            round(s.index_bytes::numeric / s.table_bytes * 100, 2)
        ELSE 0
    END || '%' AS size_ratio,
    s.index_scans,
    CASE
        WHEN s.index_scans = 0 THEN 'UNUSED'
        WHEN s.index_scans < 100 THEN 'LOW'
        ELSE 'ACTIVE'
    END AS usage_status
FROM index_stats s
ORDER BY s.index_bytes DESC
LIMIT 20;
 
-- MySQL InnoDB: Index Statistics
SELECT 
    TABLE_SCHEMA,
    TABLE_NAME,
    INDEX_NAME,
    NON_UNIQUE,
    CARDINALITY,
    SUB_PART,
    INDEX_TYPE
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = 'your_database'
ORDER BY TABLE_NAME, INDEX_NAME;
 
-- MySQL: InnoDB Buffer Pool Hit Rate
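SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- Calculate: hit_rate = 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)
-- Counters are cumulative since server start; compare two samples for a current rate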
 
-- Oracle: Index Clustering Factor Analysis
SELECT 
    index_name,
    num_rows,
    distinct_keys,
    clustering_factor,
    -- Lower is better: near table_blocks = well clustered; near num_rows = poorly clustered
    (SELECT blocks FROM user_tables WHERE table_name = ui.table_name) as table_blocks,
    blevel + 1 as tree_height
FROM user_indexes ui
WHERE index_type = 'NORMAL'
ORDER BY num_rows DESC;
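
The buffer pool hit-rate formula from the MySQL comment above can be sketched in a few lines. This is a minimal illustration, assuming the two counters have already been read from `SHOW GLOBAL STATUS` into dictionaries; the helper names `buffer_pool_hit_rate` and `hit_rate_between` are hypothetical.

```python
from typing import Optional


def buffer_pool_hit_rate(disk_reads: int, read_requests: int) -> Optional[float]:
    """hit_rate = 1 - (reads / read_requests); None when there were no requests."""
    if read_requests <= 0:
        return None
    return 1.0 - disk_reads / read_requests


def hit_rate_between(earlier: dict, later: dict) -> Optional[float]:
    """Hit rate over the interval between two samples of the cumulative counters."""
    reads = later["Innodb_buffer_pool_reads"] - earlier["Innodb_buffer_pool_reads"]
    requests = (later["Innodb_buffer_pool_read_requests"]
                - earlier["Innodb_buffer_pool_read_requests"])
    return buffer_pool_hit_rate(reads, requests)
```

Because the status counters accumulate from server start, the interval form is usually the more useful one for alerting: a long-running server can show a high lifetime hit rate while the last hour is much worse.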

Automated Monitoring

Set up automated alerts for: Index size growth > 10% per week, Cache hit rate drops below 90%, Query latency increases without load increase, Index scan count drops to zero (unused index). Many monitoring tools (pganalyze, PMM, OEM) provide B+-tree specific dashboards.
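
Three of the alert rules above can be expressed as a small check over two samples taken a week apart. This is a sketch under assumptions: the `IndexSample` structure and `index_alerts` function are hypothetical, and the latency rule is left out because it needs load data that the text does not quantify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexSample:
    size_bytes: int        # index size at sample time
    scans: int             # cumulative index scan counter
    cache_hit_rate: float  # fraction between 0.0 and 1.0


def index_alerts(prev: IndexSample, curr: IndexSample) -> List[str]:
    """Compare two samples (assumed one week apart) against the thresholds above."""
    alerts = []
    if prev.size_bytes > 0:
        growth = (curr.size_bytes - prev.size_bytes) / prev.size_bytes
        if growth > 0.10:
            alerts.append("size_growth")
    if curr.cache_hit_rate < 0.90:
        alerts.append("low_cache_hit_rate")
    if curr.scans == prev.scans:  # counter did not move: no scans in the interval
        alerts.append("no_scans")
    return alerts
```

In practice the samples would come from the monitoring queries shown earlier, stored by whatever scheduler or monitoring tool is already in place.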

Fragmentation and Maintenance

Over time, B+-trees develop fragmentation: non-sequential page layout, partially-filled pages, and accumulated dead space. This degrades performance in subtle ways.

Types of Fragmentation:

Fragmentation Types

•Internal Fragmentation: Partially-filled pages waste space; more pages to scan for same data
•External Fragmentation: Logical neighbors stored on non-adjacent disk blocks; sequential scans become random
•Page Splits Fragmentation: Splits scatter related data across non-adjacent pages
•Dead Space Accumulation: Deleted entries leave gaps; bloat increases over time
•Index Bloat: Most pronounced under PostgreSQL's MVCC; index entries for dead row versions occupy extra pages until VACUUM removes them
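
To make the cost of internal fragmentation concrete, the following sketch estimates how many leaf pages an index needs at a given average fill. It is a back-of-the-envelope model: the entries-per-page figure is an assumed input, and `leaf_pages` is an illustrative name.

```python
def leaf_pages(num_entries: int, entries_per_full_page: int, fill_percent: int) -> int:
    """Leaf pages needed when each page is filled to fill_percent of its capacity."""
    if not 0 < fill_percent <= 100:
        raise ValueError("fill_percent must be in 1..100")
    per_page = max(1, entries_per_full_page * fill_percent // 100)
    return (num_entries + per_page - 1) // per_page  # integer ceiling division


# One million entries, assuming 400 entries fit on a completely full page
healthy = leaf_pages(1_000_000, 400, 90)     # pages at 90% fill
fragmented = leaf_pages(1_000_000, 400, 50)  # pages at 50% fill
```

Under these assumptions the half-full index needs 5000 leaf pages against 2778 at 90% fill, so a full range scan reads roughly 1.8 times as many pages for the same data.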

fragmentation_detection.sql
SQL
-- PostgreSQL: Estimate Index Bloat
-- Using pgstattuple extension (most accurate)
CREATE EXTENSION IF NOT EXISTS pgstattuple;
 
SELECT
    stats.index_name,
    pg_size_pretty(pg_relation_size(stats.index_name)) AS index_size,
    stats.avg_leaf_density AS fill_percentage,
    CASE
        WHEN stats.avg_leaf_density < 50 THEN 'SEVERE BLOAT'
        WHEN stats.avg_leaf_density < 70 THEN 'MODERATE BLOAT'
        WHEN stats.avg_leaf_density < 85 THEN 'SOME BLOAT'
        ELSE 'HEALTHY'
    END AS health_status
FROM (
    SELECT
        i.indexrelid::regclass AS index_name,
        (pgstatindex(i.indexrelid::regclass)).avg_leaf_density
    FROM pg_index i
    JOIN pg_class c ON c.oid = i.indexrelid
    JOIN pg_am am ON am.oid = c.relam
    WHERE i.indisvalid
      AND NOT i.indisprimary
      AND am.amname = 'btree'   -- pgstatindex supports B-tree indexes only
    ORDER BY c.relpages DESC    -- Largest indexes first
    LIMIT 10                    -- pgstatindex reads the whole index, so keep the sample small
) stats
ORDER BY stats.avg_leaf_density ASC;
 
-- PostgreSQL: Identify Bloated Indexes Needing REINDEX
-- Compare current size to estimated minimal size
WITH index_info AS (
    SELECT
        c.relname as index_name,
        c.relpages as current_pages,
        -- Estimate minimal pages from the index entry count (reltuples),
        -- assuming ~500 entries per page; adjust for your key width
        GREATEST(1,
            CEIL(c.reltuples::numeric / 500)
        ) as estimated_min_pages
    FROM pg_class c
    JOIN pg_index i ON c.oid = i.indexrelid
    WHERE c.relkind = 'i'
)
SELECT 
    index_name,
    current_pages,
    estimated_min_pages,
    ROUND(((current_pages - estimated_min_pages)::float / current_pages * 100)::numeric, 2) 
        as bloat_pct
FROM index_info
WHERE current_pages > estimated_min_pages * 1.3  -- >30% bloat
ORDER BY (current_pages - estimated_min_pages) DESC;
 
-- MySQL InnoDB: Check for Fragmentation
SELECT 
    TABLE_NAME,
    DATA_LENGTH,
    INDEX_LENGTH,
    DATA_FREE,
    ROUND((DATA_FREE / (DATA_LENGTH + INDEX_LENGTH)) * 100, 2) as fragmentation_pct
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'your_database'
  AND DATA_FREE > 0
ORDER BY DATA_FREE DESC;

Maintenance Strategies:

Index Maintenance Operations by Database
Database	Operation	Locks	Downtime	When to Use
PostgreSQL	REINDEX	Blocks writes to the table and reads that use the index	Yes	Heavy bloat, corruption
PostgreSQL	REINDEX CONCURRENTLY (v12+)	Minimal	No	Production maintenance
PostgreSQL	VACUUM	Shared (does not block reads or writes)	No	Routine dead tuple removal
MySQL	OPTIMIZE TABLE	Brief exclusive lock (InnoDB online DDL)	Minimal for InnoDB	Rebuild table + indexes
MySQL	ALTER TABLE ... FORCE	Brief exclusive lock (InnoDB online DDL)	Minimal for InnoDB	InnoDB table rebuild
Oracle	ALTER INDEX REBUILD	Exclusive unless ONLINE is specified	Yes, without ONLINE	Full rebuild
Oracle	ALTER INDEX COALESCE	Shared	Minimal	Defragment leaf blocks
SQL Server	ALTER INDEX REBUILD	Exclusive unless ONLINE = ON is available	Yes, when offline	Full rebuild
SQL Server	ALTER INDEX REORGANIZE	Shared	No	Light defragmentation

Maintenance Window Planning

Most maintenance operations require significant I/O and may lock tables. Schedule during low-traffic periods. For large indexes (>10GB), operations can take hours. Always test on non-production first and have a rollback plan.

Tuning Strategies

Effective B+-tree tuning requires matching configuration to workload characteristics. Here are proven strategies for common scenarios.

Strategy 1: OLTP Optimization (High Throughput, Low Latency)

oltp_tuning.sql
SQL
-- PostgreSQL OLTP Tuning
 
-- 1. Buffer sizing on a dedicated server
-- In postgresql.conf:
-- shared_buffers = '4GB'         -- ~25% of RAM is the usual starting point (16GB RAM server)
-- effective_cache_size = '12GB'  -- Planner hint: shared_buffers + expected OS cache (~75% of RAM)
 
-- 2. Optimize for point lookups - smaller pages, more in cache
-- (Compile-time in PostgreSQL, but we can optimize indexes)
 
-- 3. Use covering indexes to avoid heap access
CREATE INDEX idx_orders_covering ON orders (
    customer_id,
    order_date
) INCLUDE (
    order_total,
    status
);
 
-- 4. Partial indexes for hot data
CREATE INDEX idx_orders_active ON orders (order_date)
WHERE status = 'active';
 
-- 5. Tune autovacuum for high-write tables
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- Vacuum at 5% dead tuples
    autovacuum_analyze_scale_factor = 0.02  -- Analyze at 2% changes
);
 
-- MySQL InnoDB OLTP Tuning
 
-- Buffer pool sizing (typically 70-80% of RAM)
-- SET GLOBAL innodb_buffer_pool_size = 12 * 1024 * 1024 * 1024;
 
-- Adaptive hash index (can speed up point lookups; its default setting varies by MySQL version)
-- SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';
 
-- Change buffer for secondary index updates
-- SET GLOBAL innodb_change_buffer_max_size = 25;  -- % of buffer pool

Strategy 2: OLAP Optimization (Scan Performance)

olap_tuning.sql
SQL
-- OLAP-Oriented B+-Tree Tuning
 
-- 1. Larger pages for better sequential scan (if configurable)
-- MySQL: innodb_page_size = 32768  -- 32KB, set at init
 
-- 2. Higher fill factor for read-heavy workloads
CREATE INDEX idx_sales_analytics ON sales (
    sale_date,
    region_id,
    product_category
) WITH (fillfactor = 95);  -- PostgreSQL
 
-- 3. Compressed indexes (Oracle, SQL Server)
-- Oracle (the COMPRESS prefix length cannot exceed the number of key columns):
-- CREATE INDEX idx_sales_region_date_c ON sales(region_id, sale_date) COMPRESS 1;
 
-- 4. Column order optimization for prefix scans
-- Most selective column that's always used in filters goes FIRST
-- Range scan column goes LAST
 
-- Good for: WHERE region = X AND date BETWEEN a AND b
CREATE INDEX idx_sales_region_date ON sales (region_id, sale_date);
 
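-- 5. Consider bitmap indexes for low-cardinality columns (Oracle)
-- Good for: Large tables, few distinct values, OR conditions
-- Oracle: CREATE BITMAP INDEX idx_sales_region_bm ON sales(region_id);
-- PostgreSQL has no on-disk bitmap index type; its planner can combine
-- ordinary B-tree indexes at query time using bitmap index scans.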

Write-Heavy Tuning

•Lower fill factor (80%) reduces splits
•Batch inserts when possible
•Consider partitioning by time
•Use append-only patterns (sequential keys)
•Increase checkpoint interval

Read-Heavy Tuning

•Higher fill factor (95%) packs more entries per page, so scans read fewer pages
•Covering indexes eliminate heap access
•Maximize buffer pool size
•Pre-warm cache after restart
•Consider read replicas for load distribution
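
The advice about sequential keys can be illustrated with a toy simulation of the leaf level only. This is a deliberately simplified model, not any engine's split algorithm: full pages split in half, except that an append to the rightmost page starts a fresh page and leaves the old one full, which approximates the right-edge optimization some engines apply. The function name `simulate_leaf_fill` is illustrative.

```python
import bisect
import random


def simulate_leaf_fill(keys, capacity):
    """Insert keys into a chain of sorted leaf pages; return (splits, average fill)."""
    pages, firsts, splits = [], [], 0   # firsts[i] is the smallest key on pages[i]
    for key in keys:
        if not pages:
            pages.append([key])
            firsts.append(key)
            continue
        idx = max(0, bisect.bisect_right(firsts, key) - 1)
        page = pages[idx]
        appended_at_right_edge = idx == len(pages) - 1 and key > page[-1]
        bisect.insort(page, key)
        firsts[idx] = page[0]
        if len(page) > capacity:
            splits += 1
            # Right-edge append: keep the old page full; otherwise split in half
            cut = capacity if appended_at_right_edge else len(page) // 2
            right = page[cut:]
            del page[cut:]
            pages.insert(idx + 1, right)
            firsts.insert(idx + 1, right[0])
    fill = sum(len(p) for p in pages) / (len(pages) * capacity)
    return splits, fill


sequential = simulate_leaf_fill(range(5000), capacity=50)
scattered = simulate_leaf_fill(random.Random(42).sample(range(1_000_000), 5000), capacity=50)
```

In this model the sequential load leaves every page full, while the randomly ordered load settles at a noticeably lower average fill with more splits, which is the effect a lower fill factor is meant to absorb on write-heavy tables.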

Common Anti-Patterns

Learning from common mistakes can save significant debugging time. Here are the most frequent B+-tree performance anti-patterns and their solutions.

B+-Tree Anti-Patterns
Anti-Pattern	Symptom	Root Cause	Solution
Over-indexing	Slow writes, large storage	Index for every query	Consolidate; use covering indexes
Under-indexing	Full table scans everywhere	No index design thought	Analyze query patterns; create targeted indexes
Wrong column order	Index not used by optimizer	Misunderstanding index usage	Leading column must be in WHERE/JOIN
Function on indexed column	Index scan becomes table scan	WHERE UPPER(name) = ...	Create expression index or modify query
Type mismatch	Implicit conversion prevents index use	WHERE int_col = '123'	Match types exactly in queries
Low selectivity leading column	Index returns too many rows	Indexing on gender, boolean	Move high-selectivity column first
Unused indexes	Storage waste, write overhead	Speculative index creation	Drop after monitoring period
Duplicate indexes	Double maintenance cost	Accidental creation	Audit with queries below

detect_anti_patterns.sql
SQL
-- PostgreSQL: Find Unused Indexes
SELECT 
    schemaname,
    relname AS table_name,  -- pg_stat_user_indexes exposes relname, not tablename
    indexrelname,
    idx_scan as total_scans,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0  -- Never used
  AND indexrelid NOT IN (  -- Exclude constraint-backing indexes
      SELECT conindid FROM pg_constraint WHERE conindid IS NOT NULL
  )
ORDER BY pg_relation_size(indexrelid) DESC;
 
-- PostgreSQL: Find Duplicate/Redundant Indexes
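SELECT
    a.indexrelid::regclass AS redundant_index,
    b.indexrelid::regclass AS covering_index,
    a.indkey AS redundant_columns,
    b.indkey AS covering_columns
FROM pg_index a
JOIN pg_index b ON a.indrelid = b.indrelid
    AND a.indexrelid != b.indexrelid
    -- a's column list is a leading prefix of b's (the trailing space stops "1" matching "10")
    AND (b.indkey::text || ' ') LIKE (a.indkey::text || ' %')
WHERE a.indisunique = false;  -- Keep indexes that enforce uniqueness
-- Review matches by hand: partial, expression, and non-B-tree indexes can differ despite equal column lists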
 
-- MySQL: Find Unused Indexes
SELECT 
    object_schema,
    object_name,
    index_name,
    count_read,
    count_write
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
  AND count_read = 0
  AND object_schema NOT IN ('mysql', 'performance_schema')
ORDER BY count_write DESC;
 
-- Detect Function Calls Preventing Index Use
-- In query plans, look for:
-- "Filter: (upper(name) = 'JOHN')" vs "Index Cond: (name = 'john')"
EXPLAIN ANALYZE
SELECT * FROM users WHERE UPPER(email) = 'TEST@EXAMPLE.COM';
-- Fix: CREATE INDEX idx_email_upper ON users (UPPER(email));
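
The function-on-column anti-pattern can be reproduced without a database server. The sketch below uses Python's built-in `sqlite3` module as a stand-in, so the plan text differs from PostgreSQL's `EXPLAIN ANALYZE` output, but the behavior is analogous: a plain index on `email` does not help a predicate on `upper(email)`, while an expression index does. The table layout and the `query_plan` helper are assumptions made for the demo.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan description


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.execute("CREATE INDEX idx_email ON users (email)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

query = "SELECT * FROM users WHERE upper(email) = 'USER42@EXAMPLE.COM'"

plan_before = query_plan(conn, query)  # plain index on email cannot serve upper(email)
conn.execute("CREATE INDEX idx_email_upper ON users (upper(email))")
plan_after = query_plan(conn, query)   # expression index matches the predicate

matching_rows = conn.execute(query).fetchall()
```

Printing `plan_before` and `plan_after` shows the switch from a full scan to a search on `idx_email_upper`. The same before-and-after comparison is worth running on the real system, since each optimizer has its own rules for matching expressions to indexes.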

The Index That Hurt

The most dangerous anti-pattern is an index that's used but shouldn't be. The optimizer may choose an index that returns 50% of the table when a sequential scan would be faster. This happens with outdated statistics or misleading histograms. Always EXPLAIN ANALYZE before and after index changes.

Index Selection Guidelines

Choosing the right indexes requires balancing multiple factors. Follow this decision framework for thoughtful index design.

The Index Selection Decision Tree:

When to Create a B+-Tree Index

•Equality lookups on high-cardinality columns: Primary key, unique identifiers, foreign keys
•Range queries: Date ranges, numeric ranges, alphabetical ranges
•ORDER BY optimization: Avoid sorts when index delivers pre-sorted data
•Join acceleration: Index on foreign key columns for nested loop joins
•Covering index opportunity: Frequently accessed columns can be included to avoid heap

When NOT to Use B+-Tree Index

•Very low cardinality: Status columns (Active/Inactive), boolean flags → Consider bitmap
•Very high update frequency: Columns updated every transaction → Cost may exceed benefit
•Small tables: Tables with < 1000 rows → Full scan is often faster
•Unselective queries: Retrieving > 10-15% of table → Full scan is faster
•Complex expressions without expression index: WHERE a + b > c → Won't use index

Column Order for Composite Indexes:

The order of columns in a composite index dramatically affects utility:

composite_index_order.sql
SQL
-- Composite Index Column Order Guide
 
-- RULE 1: Equality columns before range columns
-- Query: WHERE status = 'active' AND date BETWEEN x AND y
-- Good:  (status, date)  -- Equality first
-- Bad:   (date, status)  -- Range first breaks index use for status
 
-- RULE 2: High-selectivity before low-selectivity (for equality)
-- Query: WHERE department = 'Engineering' AND country = 'US'
-- If 5% are Engineering, 20% are US:
-- Good:  (department, country)  -- More selective first
-- Bad:   (country, department)  -- Less selective first
 
-- RULE 3: Consider all query patterns the index must serve
-- These queries all benefit from (customer_id, order_date, product_id):
-- SELECT * FROM orders WHERE customer_id = 100;
-- SELECT * FROM orders WHERE customer_id = 100 AND order_date > '2024-01-01';
-- SELECT * FROM orders WHERE customer_id = 100 AND order_date = '2024-01-01' 
--   AND product_id = 5;
 
-- But this query CANNOT use the index efficiently:
-- SELECT * FROM orders WHERE order_date > '2024-01-01';  -- No leading column!
 
-- RULE 4: ORDER BY columns at the end
-- Query: SELECT * FROM products WHERE category = 'Electronics' ORDER BY price;
-- Index: (category, price)  -- Avoids sort operation!
 
-- RULE 5: Consider covering index for read-heavy patterns
CREATE INDEX idx_orders_customer_covering ON orders (
    customer_id,           -- Equality condition
    order_date DESC        -- Range + ORDER BY
) INCLUDE (
    total_amount,          -- Selected columns
    status                 -- Avoiding heap access
);
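
Rules 1 and 3 above amount to a leftmost-prefix check, which can be sketched as a small function. This is a simplified model with a hypothetical name, `usable_prefix`: it only reports which leading index columns can bound the scan, and it ignores optimizer features such as skip scans or filtering on later columns inside the index.

```python
from typing import List, Optional, Sequence, Set


def usable_prefix(index_columns: Sequence[str],
                  equality: Set[str],
                  range_column: Optional[str] = None) -> List[str]:
    """Leading index columns that can narrow the B+-tree scan for a query."""
    used = []
    for col in index_columns:
        if col in equality:
            used.append(col)
        elif col == range_column:
            used.append(col)
            break  # columns after a range condition cannot narrow the scan further
        else:
            break  # gap in the prefix: later columns cannot position the scan
    return used


# Rule 1: equality column first lets both predicates bound the scan
good = usable_prefix(["status", "date"], {"status"}, "date")
bad = usable_prefix(["date", "status"], {"status"}, "date")
# Rule 3: no predicate on the leading column, so nothing is usable
none = usable_prefix(["customer_id", "order_date", "product_id"], set(), "order_date")
```

Here `good` contains both columns, `bad` contains only `date`, and `none` is empty, matching the Good, Bad, and "No leading column" cases in the guide.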

A '5-Second, 5-Percent' Rule of Thumb for Index Decisions

If a query takes > 5 seconds and returns < 5% of rows, an index will likely help. If a query is fast but runs 1000s of times per second, even small improvements justify an index. When in doubt, test with EXPLAIN ANALYZE on production-like data.

Production Operational Best Practices

Running B+-tree indexes in production requires ongoing attention and established procedures.

Operational Checklist

•Daily: Monitor cache hit rates, check for lock contention, review slow query logs
•Weekly: Check index usage statistics, identify unused indexes, review growth trends
•Monthly: Analyze fragmentation levels, run statistics updates, prune stale indexes
•Quarterly: Full index review, rebuild heavily fragmented indexes, capacity planning
•On Schema Change: Verify index impacts, rebuild affected indexes, update statistics

operational_procedures.sql
SQL
-- PostgreSQL: Safe Production Index Creation
-- Always use CONCURRENTLY in production
CREATE INDEX CONCURRENTLY idx_new_index ON large_table (column_name)
WHERE condition;  -- Consider partial index
 
-- Monitor progress
SELECT * FROM pg_stat_progress_create_index;
 
-- If creation fails, clean up invalid index
SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid;
DROP INDEX CONCURRENTLY idx_new_index;  -- If invalid
 
-- PostgreSQL: Statistics Update Procedure
-- After bulk data changes
ANALYZE table_name;  -- Quick statistics refresh
 
-- For more accurate statistics on important columns
ALTER TABLE orders ALTER COLUMN order_date 
SET STATISTICS 1000;  -- Default is 100
ANALYZE orders;
 
-- PostgreSQL: Online REINDEX (v12+)
REINDEX INDEX CONCURRENTLY idx_name;
 
-- Monitor reindex progress
SELECT 
    a.query,
    p.phase,
    p.lockers_total,
    p.blocks_total,
    p.blocks_done,
    round(100.0 * p.blocks_done / nullif(p.blocks_total, 0), 1) as pct_done
FROM pg_stat_activity a
JOIN pg_stat_progress_create_index p ON a.pid = p.pid
WHERE a.query LIKE '%REINDEX%';
 
-- MySQL: Online DDL for Index Operations (5.6+)
ALTER TABLE orders 
ADD INDEX idx_new (column_name),
ALGORITHM=INPLACE,  -- Online, no table copy
LOCK=NONE;          -- Concurrent DML allowed

Handling Index Emergencies:

Index Emergency Response
Emergency	Immediate Action	Root Cause Investigation	Prevention
Query suddenly slow	Check query plan changes	Statistics stale? Index dropped?	Regular ANALYZE, plan monitoring
Disk I/O spike	Identify offending query	Index scan turned table scan	Statistics update, EXPLAIN check
Lock contention surge	Identify blocking sessions	Hot page conflicts	Reduce transaction size, partition
Index corruption	Fail over if replica available	Hardware issue? Bug?	Checksums, RAID, regular REINDEX
Out of disk space	DROP unused indexes	Index bloat, poor retention	Monitoring, scheduled maintenance

The Importance of Baselines

Establish performance baselines during normal operations. Document: typical query latencies, cache hit rates, I/O patterns, index sizes. Without baselines, you can't recognize degradation or measure improvement.
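
A baseline only helps if current numbers are compared against it. The sketch below flags metrics that have drifted beyond a tolerance; the metric names, the 20% default tolerance, and the function name `deviations_from_baseline` are all assumptions chosen for illustration.

```python
def deviations_from_baseline(baseline: dict, current: dict, tolerance: float = 0.20) -> dict:
    """Return {metric: relative_change} for metrics that moved more than tolerance."""
    flagged = {}
    for name, base in baseline.items():
        if name not in current or base == 0:
            continue  # cannot compute a relative change
        change = (current[name] - base) / base
        if abs(change) > tolerance:
            flagged[name] = change
    return flagged


baseline = {"p95_latency_ms": 10.0, "cache_hit_rate": 0.98, "index_size_gb": 4.0}
current = {"p95_latency_ms": 15.0, "cache_hit_rate": 0.97, "index_size_gb": 4.2}
flagged = deviations_from_baseline(baseline, current)
```

With these sample values only `p95_latency_ms` is flagged, at a relative change of 0.5. A single relative tolerance is crude: metrics such as cache hit rate usually deserve tighter, absolute thresholds like the ones in the health metrics table.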

Database-Specific Considerations

Each database system has unique B+-tree implementation characteristics and tuning opportunities.

PostgreSQL B-Tree Specifics

•MVCC in Indexes: Indexes contain row version pointers; dead tuples cause bloat until VACUUM
•HOT Updates: Heap-Only-Tuples avoid index updates when indexed columns unchanged
•Deduplication (v13+): Automatically compresses duplicate keys in non-unique indexes
•Suffix Truncation (v12+): Internal pages store minimal separator keys
•Parallel Index Scans (v10+): Large index scans can use multiple workers
•INCLUDE columns (v11+): Add non-key columns to index for covering scans

Summary: Production Excellence with B+-Trees

We've comprehensively covered the practical aspects of B+-tree management in production systems.

Key Takeaways

•Monitor proactively: Cache hit rates, fragmentation, unused indexes before problems surface
•Maintain regularly: Schedule VACUUM/ANALYZE, check bloat, rebuild when needed
•Tune for workload: OLTP prefers small pages, high cache; OLAP prefers large pages, high fill
•Avoid anti-patterns: Function calls, type mismatches, wrong column order destroy index utility
•Design indexes thoughtfully: Column order matters; covering indexes eliminate heap access
•Know your database: Each system has unique features and tuning parameters

Module Complete:

You've now mastered B+-tree performance from mathematical foundations through practical production management. This knowledge enables you to design efficient indexes, predict performance characteristics, diagnose issues, and optimize database systems for any workload.

Module Complete

Congratulations! You've completed the B+-Tree Performance module. You now possess the analytical skills to predict index behavior, the practical knowledge to tune systems effectively, and the operational wisdom to maintain healthy indexes in production. This foundation will serve you throughout your database engineering career.

5 / 5

Loading learning content...

Database Management SystemsB+-Tree Performance

B+-Tree Performance Analysis

LevelAdvanced

Duration75 mins

TopicB+-Tree Performance

5 / 5

Practical Considerations

From Theory to Production Excellence

What You Will Master

Index Health Monitoring

Proactive monitoring prevents performance degradation before it impacts users. Effective B+-tree monitoring focuses on several key metrics.

Core Metrics to Monitor:

B+-Tree Health Metrics
Metric	Healthy Range	Warning Signs	Action Required
Index Size vs Data Size	0.1-0.5× data	1× data size	Check for bloat, duplicates
Page Fill Factor	70-95%	<60% average	REINDEX or VACUUM
Tree Height	Theoretical minimum +1	Expected +2	Investigate fragmentation
Cache Hit Rate	95%	<90%	Increase buffer pool
Dead Tuples %	<5%	10%	Run maintenance
Scans per Second	Stable trend	Sudden increase	Check query patterns

monitoring_queries.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
-- PostgreSQL: Comprehensive Index Health Check
WITH index_stats AS (
    SELECT
        schemaname,
        tablename,
        indexrelname as index_name,
        pg_relation_size(indexrelid) as index_size,
        pg_relation_size(relid) as table_size,
        idx_scan as index_scans,
        idx_tup_read as tuples_read,
        idx_tup_fetch as tuples_fetched
    FROM pg_stat_user_indexes
    JOIN pg_index ON indexrelid = pg_index.indexrelid
),
bloat_estimate AS (
    SELECT
        schemaname,
        tablename,
        indexrelname,
        -- Estimate bloat based on index vs table ratio
        CASE 
            WHEN table_size > 0 THEN 
                (index_size::float / table_size * 100)::numeric(5,2)
            ELSE 0 
        END as index_to_table_pct
    FROM index_stats
)
SELECT 
    s.schemaname || '.' || s.tablename as table_name,
    s.index_name,
    pg_size_pretty(s.index_size) as index_size,
    pg_size_pretty(s.table_size) as table_size,
    b.index_to_table_pct || '%' as size_ratio,
    s.index_scans,
    CASE 
        WHEN s.index_scans = 0 THEN 'UNUSED'
        WHEN s.index_scans < 100 THEN 'LOW'
        ELSE 'ACTIVE'
    END as usage_status
FROM index_stats s
JOIN bloat_estimate b ON s.index_name = b.indexrelname
ORDER BY s.index_size DESC
LIMIT 20;
 
-- MySQL InnoDB: Index Statistics
SELECT 
    TABLE_SCHEMA,
    TABLE_NAME,
    INDEX_NAME,
    NON_UNIQUE,
    CARDINALITY,
    SUB_PART,
    INDEX_TYPE
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_SCHEMA = 'your_database'
ORDER BY TABLE_NAME, INDEX_NAME;
 
-- MySQL: InnoDB Buffer Pool Hit Rate
SHOW STATUS LIKE 'innodb_buffer_pool_read%';
-- Calculate: 1 - (reads / read_requests) = hit_rate
 
-- Oracle: Index Clustering Factor Analysis
SELECT 
    index_name,
    num_rows,
    distinct_keys,
    clustering_factor,
    -- Lower is better; < table blocks means good clustering
    (SELECT blocks FROM user_tables WHERE table_name = ui.table_name) as table_blocks,
    blevel + 1 as tree_height
FROM user_indexes ui
WHERE index_type = 'NORMAL'
ORDER BY num_rows DESC;

Automated Monitoring

Fragmentation and Maintenance

Over time, B+-trees develop fragmentation: non-sequential page layout, partially-filled pages, and accumulated dead space. This degrades performance in subtle ways.

Types of Fragmentation:

Fragmentation Types

•Internal Fragmentation: Partially-filled pages waste space; more pages to scan for same data
•External Fragmentation: Logical neighbors stored on non-adjacent disk blocks; sequential scans become random
•Page Splits Fragmentation: Splits scatter related data across non-adjacent pages
•Dead Space Accumulation: Deleted entries leave gaps; bloat increases over time
•Index Bloat: PostgreSQL-specific; old row versions in indexes causing extra pages

fragmentation_detection.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
-- PostgreSQL: Estimate Index Bloat
-- Using pgstattuple extension (most accurate)
CREATE EXTENSION IF NOT EXISTS pgstattuple;
 
SELECT 
    indexrelid::regclass as index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size,
    avg_leaf_density,
    avg_leaf_density as fill_percentage,
    CASE 
        WHEN avg_leaf_density < 50 THEN 'SEVERE BLOAT'
        WHEN avg_leaf_density < 70 THEN 'MODERATE BLOAT'
        WHEN avg_leaf_density < 85 THEN 'SOME BLOAT'
        ELSE 'HEALTHY'
    END as health_status
FROM (
    SELECT 
        indexrelid,
        (pgstatindex(indexrelid::regclass)).avg_leaf_density
    FROM pg_index
    WHERE indisvalid AND NOT indisprimary
    LIMIT 10  -- Check top 10 indexes
) stats
ORDER BY avg_leaf_density ASC;
 
-- PostgreSQL: Identify Bloated Indexes Needing REINDEX
-- Compare current size to estimated minimal size
WITH index_info AS (
    SELECT
        c.relname as index_name,
        c.relpages as current_pages,
        -- Estimate optimal pages from distinct keys and page capacity
        GREATEST(1, 
            CEIL(i.indrelid::regclass::text::regclass::oid::bigint / 500.0)
        ) as estimated_min_pages
    FROM pg_class c
    JOIN pg_index i ON c.oid = i.indexrelid
    WHERE c.relkind = 'i'
)
SELECT 
    index_name,
    current_pages,
    estimated_min_pages,
    ROUND(((current_pages - estimated_min_pages)::float / current_pages * 100)::numeric, 2) 
        as bloat_pct
FROM index_info
WHERE current_pages > estimated_min_pages * 1.3  -- >30% bloat
ORDER BY (current_pages - estimated_min_pages) DESC;
 
-- MySQL InnoDB: Check for Fragmentation
SELECT 
    TABLE_NAME,
    DATA_LENGTH,
    INDEX_LENGTH,
    DATA_FREE,
    ROUND((DATA_FREE / (DATA_LENGTH + INDEX_LENGTH)) * 100, 2) as fragmentation_pct
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'your_database'
  AND DATA_FREE > 0
ORDER BY DATA_FREE DESC;

Maintenance Strategies:

Index Maintenance Operations by Database
Database	Operation	Locks	Downtime	When to Use
PostgreSQL	REINDEX	Exclusive	Yes	Heavy bloat, corruption
PostgreSQL	REINDEX CONCURRENTLY	Minimal	No	Production maintenance
PostgreSQL	VACUUM	Shared	No	Routine dead tuple removal
MySQL	OPTIMIZE TABLE	Exclusive	Yes	Rebuild table + indexes
MySQL	ALTER TABLE ... FORCE	Exclusive	Yes	InnoDB table rebuild
Oracle	ALTER INDEX REBUILD	Exclusive	Yes	Full rebuild
Oracle	ALTER INDEX COALESCE	Shared	Minimal	Defragment leaf blocks
SQL Server	ALTER INDEX REBUILD	Exclusive	Yes	Full rebuild
SQL Server	ALTER INDEX REORGANIZE	Shared	No	Light defragmentation

Maintenance Window Planning

Tuning Strategies

Effective B+-tree tuning requires matching configuration to workload characteristics. Here are proven strategies for common scenarios.

Strategy 1: OLTP Optimization (High Throughput, Low Latency)

oltp_tuning.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
-- PostgreSQL OLTP Tuning
 
-- 1. Aggressive buffer pool sizing (75% of RAM if dedicated)
-- In postgresql.conf:
-- shared_buffers = '12GB'  -- For 16GB RAM server
-- effective_cache_size = '14GB'  -- Total expected cache
 
-- 2. Optimize for point lookups - smaller pages, more in cache
-- (Compile-time in PostgreSQL, but we can optimize indexes)
 
-- 3. Use covering indexes to avoid heap access
CREATE INDEX idx_orders_covering ON orders (
    customer_id,
    order_date
) INCLUDE (
    order_total,
    status
);
 
-- 4. Partial indexes for hot data
CREATE INDEX idx_orders_active ON orders (order_date)
WHERE status = 'active';
 
-- 5. Tune autovacuum for high-write tables
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- Vacuum at 5% dead tuples
    autovacuum_analyze_scale_factor = 0.02  -- Analyze at 2% changes
);
 
-- MySQL InnoDB OLTP Tuning
 
-- Buffer pool sizing (typically 70-80% of RAM)
-- SET GLOBAL innodb_buffer_pool_size = 12 * 1024 * 1024 * 1024;
 
-- Adaptive hash index (enabled by default, helpful for point lookups)
-- SHOW VARIABLES LIKE 'innodb_adaptive_hash_index';
 
-- Change buffer for secondary index updates
-- SET GLOBAL innodb_change_buffer_max_size = 25;  -- % of buffer pool

Strategy 2: OLAP Optimization (Scan Performance)

olap_tuning.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- OLAP-Oriented B+-Tree Tuning
 
-- 1. Larger pages for better sequential scan (if configurable)
-- MySQL: innodb_page_size = 32768  -- 32KB, set at init
 
-- 2. Higher fill factor for read-heavy workloads
CREATE INDEX idx_sales_analytics ON sales (
    sale_date,
    region_id,
    product_category
) WITH (fillfactor = 95);  -- PostgreSQL
 
-- 3. Compressed indexes (Oracle, SQL Server)
-- Oracle: 
-- CREATE INDEX idx_sales_date ON sales(sale_date) COMPRESS 2;
 
-- 4. Column order optimization for prefix scans
-- Most selective column that's always used in filters goes FIRST
-- Range scan column goes LAST
 
-- Good for: WHERE region = X AND date BETWEEN a AND b
CREATE INDEX idx_sales_region_date ON sales (region_id, sale_date);
 
-- 5. Consider bitmap indexes for low-cardinality (Oracle, PostgreSQL)
-- Good for: Large tables, few distinct values, OR conditions
-- PostgreSQL extension: CREATE INDEX ON sales USING gin(region_id);

Write-Heavy Tuning

•Lower fill factor (80%) reduces splits
•Batch inserts when possible
•Consider partitioning by time
•Use append-only patterns (sequential keys)
•Increase checkpoint interval

Read-Heavy Tuning

•Higher fill factor (95%) maximizes fanout
•Covering indexes eliminate heap access
•Maximize buffer pool size
•Pre-warm cache after restart
•Consider read replicas for load distribution

Common Anti-Patterns

Learning from common mistakes can save significant debugging time. Here are the most frequent B+-tree performance anti-patterns and their solutions.

B+-Tree Anti-Patterns
Anti-Pattern	Symptom	Root Cause	Solution
Over-indexing	Slow writes, large storage	Index for every query	Consolidate; use covering indexes
Under-indexing	Full table scans everywhere	No index design thought	Analyze query patterns; create targeted indexes
Wrong column order	Index not used by optimizer	Misunderstanding index usage	Leading column must be in WHERE/JOIN
Function on indexed column	Index scan becomes table scan	WHERE UPPER(name) = ...	Create expression index or modify query
Type mismatch	Implicit conversion prevents index use	WHERE varchar_col = 123	Match types exactly in queries
Low selectivity leading column	Index returns too many rows	Indexing on gender, boolean	Move high-selectivity column first
Unused indexes	Storage waste, write overhead	Speculative index creation	Drop after monitoring period
Duplicate indexes	Double maintenance cost	Accidental creation	Audit with queries below

detect_anti_patterns.sql
SQL
-- PostgreSQL: Find Unused Indexes
SELECT 
    schemaname,
    relname as tablename,
    indexrelname,
    idx_scan as total_scans,
    pg_size_pretty(pg_relation_size(indexrelid)) as index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0  -- Never used
  AND indexrelid NOT IN (  -- Exclude constraint-backing indexes
      SELECT conindid FROM pg_constraint WHERE conindid IS NOT NULL
  )
ORDER BY pg_relation_size(indexrelid) DESC;
 
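-- PostgreSQL: Find Duplicate/Redundant Indexes
-- Compares column numbers only; check expressions, partial predicates,
-- and operator classes by hand before dropping anything.
SELECT 
    a.indexrelid::regclass as redundant_index,
    b.indexrelid::regclass as covering_index,
    a.indkey as redundant_columns,
    b.indkey as covering_columns
FROM pg_index a
JOIN pg_index b ON a.indrelid = b.indrelid 
    AND a.indexrelid != b.indexrelid
    -- a's columns are a leading prefix of b's; the trailing space stops "1" matching "10"
    AND (b.indkey::text || ' ') LIKE (a.indkey::text || ' %')
WHERE a.indisunique = false;  -- Keep unique indexes: they enforce constraints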
 
-- MySQL: Find Unused Indexes
SELECT 
    object_schema,
    object_name,
    index_name,
    count_read,
    count_write
FROM performance_schema.table_io_waits_summary_by_index_usage
WHERE index_name IS NOT NULL
  AND count_read = 0
  AND object_schema NOT IN ('mysql', 'performance_schema')
ORDER BY count_write DESC;
 
-- Detect Function Calls Preventing Index Use
-- In query plans, look for:
-- "Filter: (upper(name) = 'JOHN')" vs "Index Cond: (name = 'john')"
EXPLAIN ANALYZE
SELECT * FROM users WHERE UPPER(email) = 'TEST@EXAMPLE.COM';
-- Fix: CREATE INDEX idx_email_upper ON users (UPPER(email));
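
The type-mismatch anti-pattern can be spotted in a plan the same way. The sketch below assumes PostgreSQL and a hypothetical `users.id` column of type `integer` with a B-tree index; the literal is deliberately given the wrong type.

```sql
-- Mismatched: the integer column is cast to numeric for every row,
-- so the plan shows a filter on (id)::numeric instead of an index condition.
EXPLAIN
SELECT * FROM users WHERE id = 123::numeric;

-- Matched: the comparison uses the column's own type and can use the index.
EXPLAIN
SELECT * FROM users WHERE id = 123;

-- In MySQL the classic case is a string column compared with a numeric literal,
-- e.g. WHERE varchar_col = 123; quote the literal to match the column type.
```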

The Index That Hurt

Index Selection Guidelines

Choosing the right indexes requires balancing multiple factors. Follow this decision framework for thoughtful index design.

The Index Selection Decision Tree:

When to Create a B+-Tree Index

•Equality lookups on high-cardinality columns: Primary key, unique identifiers, foreign keys
•Range queries: Date ranges, numeric ranges, alphabetical ranges
•ORDER BY optimization: Avoid sorts when index delivers pre-sorted data
•Join acceleration: Index on foreign key columns for nested loop joins
•Covering index opportunity: Frequently accessed columns can be included to avoid heap

When NOT to Use B+-Tree Index

•Very low cardinality: Status columns (Active/Inactive), boolean flags → Consider bitmap
•Very high update frequency: Columns updated every transaction → Cost may exceed benefit
•Small tables: Tables with < 1000 rows → Full scan is often faster
•Unselective queries: Retrieving > 10-15% of table → Full scan is usually faster
•Complex expressions without expression index: WHERE a + b > c → Won't use index
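
When a low-cardinality column is skewed, a partial index is one way to index only the rare value instead of the whole column. This is a sketch for PostgreSQL, assuming a hypothetical `orders` table in which only a small fraction of rows have `status = 'pending'`.

```sql
-- PostgreSQL: index only the rows that queries actually look for.
CREATE INDEX idx_orders_pending ON orders (order_date)
WHERE status = 'pending';

-- A query whose WHERE clause implies the index predicate can use it:
SELECT *
FROM orders
WHERE status = 'pending'
  AND order_date < now() - interval '1 day';
```

The index stays small because rows in other states never enter it, which also reduces write overhead for updates that do not touch pending rows.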

Column Order for Composite Indexes:

The order of columns in a composite index dramatically affects utility:

composite_index_order.sql
SQL
-- Composite Index Column Order Guide
 
-- RULE 1: Equality columns before range columns
-- Query: WHERE status = 'active' AND date BETWEEN x AND y
-- Good:  (status, date)  -- Equality first
-- Bad:   (date, status)  -- Range first: status can only filter entries inside the date range
 
-- RULE 2: High-selectivity before low-selectivity (for equality)
-- Query: WHERE department = 'Engineering' AND country = 'US'
-- If 5% are Engineering, 20% are US:
-- Good:  (department, country)  -- More selective first
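-- Bad:   (country, department)  -- Less selective first
-- Note: when both columns are always supplied, either order finds the same rows;
-- the order matters most for queries that filter on the leading column alone.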
 
-- RULE 3: Consider all query patterns the index must serve
-- These queries all benefit from (customer_id, order_date, product_id):
-- SELECT * FROM orders WHERE customer_id = 100;
-- SELECT * FROM orders WHERE customer_id = 100 AND order_date > '2024-01-01';
-- SELECT * FROM orders WHERE customer_id = 100 AND order_date = '2024-01-01' 
--   AND product_id = 5;
 
-- But this query CANNOT use the index efficiently:
-- SELECT * FROM orders WHERE order_date > '2024-01-01';  -- No leading column!
 
-- RULE 4: ORDER BY columns at the end
-- Query: SELECT * FROM products WHERE category = 'Electronics' ORDER BY price;
-- Index: (category, price)  -- Avoids sort operation!
 
-- RULE 5: Consider covering index for read-heavy patterns (INCLUDE: PostgreSQL 11+, SQL Server)
CREATE INDEX idx_orders_customer_covering ON orders (
    customer_id,           -- Equality condition
    order_date DESC        -- Range + ORDER BY
) INCLUDE (
    total_amount,          -- Selected columns
    status                 -- Avoiding heap access
);
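
To check that the covering index from Rule 5 is doing its job, inspect the plan of a query it was designed for. This sketch assumes PostgreSQL and the `orders` columns used above; the `LIMIT` is an illustrative choice.

```sql
-- Expect "Index Only Scan using idx_orders_customer_covering" in the output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT order_date, total_amount, status
FROM orders
WHERE customer_id = 100
ORDER BY order_date DESC
LIMIT 20;

-- "Heap Fetches" should be low. If it is high, the visibility map is stale;
-- run VACUUM on the table and check again.
```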

The '5 Minute Rule' for Index Decisions

Production Operational Best Practices

Running B+-tree indexes in production requires ongoing attention and established procedures.

Operational Checklist

•Daily: Monitor cache hit rates, check for lock contention, review slow query logs
•Weekly: Check index usage statistics, identify unused indexes, review growth trends
•Monthly: Analyze fragmentation levels, run statistics updates, prune stale indexes
•Quarterly: Full index review, rebuild heavily fragmented indexes, capacity planning
•On Schema Change: Verify index impacts, rebuild affected indexes, update statistics
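
The daily cache check in the list above can be sketched for PostgreSQL with the `pg_statio_user_indexes` view. Counters accumulate since the last statistics reset, so trends between samples are more informative than absolute values.

```sql
-- PostgreSQL: per-index buffer cache hit rate, worst disk readers first.
SELECT
    schemaname,
    relname,
    indexrelname,
    idx_blks_hit,
    idx_blks_read,
    round(100.0 * idx_blks_hit / nullif(idx_blks_hit + idx_blks_read, 0), 2) AS hit_pct
FROM pg_statio_user_indexes
ORDER BY idx_blks_read DESC
LIMIT 20;
```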

operational_procedures.sql
SQL
-- PostgreSQL: Safe Production Index Creation
-- Use CONCURRENTLY in production (it cannot run inside a transaction block)
CREATE INDEX CONCURRENTLY idx_new_index ON large_table (column_name)
WHERE condition;  -- Optional: partial index predicate
 
-- Monitor progress
SELECT * FROM pg_stat_progress_create_index;
 
-- If creation fails, clean up invalid index
SELECT indexrelid::regclass FROM pg_index WHERE NOT indisvalid;
DROP INDEX CONCURRENTLY idx_new_index;  -- If invalid
 
-- PostgreSQL: Statistics Update Procedure
-- After bulk data changes
ANALYZE table_name;  -- Quick statistics refresh
 
-- For more accurate statistics on important columns
ALTER TABLE orders ALTER COLUMN order_date 
SET STATISTICS 1000;  -- Default is 100
ANALYZE orders;
 
-- PostgreSQL: Online REINDEX (v12+)
REINDEX INDEX CONCURRENTLY idx_name;
 
-- Monitor reindex progress
SELECT 
    a.query,
    p.phase,
    p.lockers_total,
    p.blocks_total,
    p.blocks_done,
    round(100.0 * p.blocks_done / nullif(p.blocks_total, 0), 1) as pct_done
FROM pg_stat_activity a
JOIN pg_stat_progress_create_index p ON a.pid = p.pid
WHERE a.query LIKE '%REINDEX%';
 
-- MySQL: Online DDL for Index Operations (5.6+)
ALTER TABLE orders 
ADD INDEX idx_new (column_name),
ALGORITHM=INPLACE,  -- Online, no table copy
LOCK=NONE;          -- Concurrent DML allowed

Handling Index Emergencies:

Index Emergency Response
Emergency	Immediate Action	Root Cause Investigation	Prevention
Query suddenly slow	Check query plan changes	Statistics stale? Index dropped?	Regular ANALYZE, plan monitoring
Disk I/O spike	Identify offending query	Index scan turned table scan	Statistics update, EXPLAIN check
Lock contention surge	Identify blocking sessions	Hot page conflicts	Reduce transaction size, partition
Index corruption	Fail over if replica available	Hardware issue? Bug?	Checksums, RAID, regular REINDEX
Out of disk space	DROP unused indexes	Index bloat, poor retention	Monitoring, scheduled maintenance
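
For the lock contention row, identifying blocking sessions can be sketched in PostgreSQL (9.6 or later) with `pg_blocking_pids`. The column aliases are illustrative.

```sql
-- PostgreSQL: who is blocked, and by whom.
SELECT
    blocked.pid                    AS blocked_pid,
    blocked.query                  AS blocked_query,
    blocking.pid                   AS blocking_pid,
    blocking.query                 AS blocking_query,
    now() - blocked.query_start    AS waiting_for
FROM pg_stat_activity AS blocked
JOIN pg_stat_activity AS blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid))
ORDER BY waiting_for DESC;
```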

The Importance of Baselines

Database-Specific Considerations

Each database system has unique B+-tree implementation characteristics and tuning opportunities.

PostgreSQL B-Tree Specifics

•MVCC in Indexes: Indexes contain row version pointers; dead tuples cause bloat until VACUUM
•HOT Updates: Heap-Only Tuples skip index updates when no indexed column changes and the new row version fits on the same heap page
•Deduplication (v13+): Stores repeated key values once per leaf page as a posting list, shrinking indexes with many duplicates
•Suffix Truncation (v12+): Internal pages store minimal separator keys
•Parallel Index Scans (v10+): Large index scans can use multiple workers
•INCLUDE columns (v11+): Add non-key columns to index for covering scans
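
Whether HOT updates are actually happening can be estimated from `pg_stat_user_tables`. This is a rough sketch; a low percentage on a heavily updated table may point to indexed columns being modified or to heap pages with no free space.

```sql
-- PostgreSQL: share of updates that avoided index maintenance (HOT).
SELECT
    relname,
    n_tup_upd,
    n_tup_hot_upd,
    round(100.0 * n_tup_hot_upd / nullif(n_tup_upd, 0), 1) AS hot_pct
FROM pg_stat_user_tables
WHERE n_tup_upd > 0
ORDER BY n_tup_upd DESC
LIMIT 20;
```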

Summary: Production Excellence with B+-Trees

We've comprehensively covered the practical aspects of B+-tree management in production systems.

Key Takeaways

•Monitor proactively: Cache hit rates, fragmentation, unused indexes before problems surface
•Maintain regularly: Schedule VACUUM/ANALYZE, check bloat, rebuild when needed
•Tune for workload: OLTP prefers small pages, high cache; OLAP prefers large pages, high fill
•Avoid anti-patterns: Function calls, type mismatches, wrong column order destroy index utility
•Design indexes thoughtfully: Column order matters; covering indexes eliminate heap access
•Know your database: Each system has unique features and tuning parameters

Module Complete:
