Index Types - Learning Module

Loading content...

0/241

Index Classification: A Systematic Taxonomy

A Unified Classification Framework

Throughout this module, we've explored primary indexes, secondary indexes, unique indexes, and composite indexes. But these aren't mutually exclusive categories—they represent different dimensions of classification. A single index can simultaneously be:

Primary (determines physical ordering) AND unique (no duplicates) AND composite (multiple columns)

Or:

Secondary (doesn't control ordering) AND non-unique (allows duplicates) AND single-column

To make informed indexing decisions, we need a comprehensive classification framework that considers all these dimensions together. This systematic approach transforms indexing from an art into an engineering discipline.

What You Will Learn

By the end of this page, you will understand the multiple dimensions along which indexes can be classified, how to systematically analyze indexing requirements, decision frameworks for index type selection, common index anti-patterns and how to avoid them, and practical guidelines for comprehensive index design.

Index Classification Dimensions

Indexes can be classified along several independent dimensions. Each dimension represents a different aspect of how the index is structured, behaves, or interacts with the underlying data.

The six fundamental classification dimensions:

Index Classification Dimensions
Dimension	Categories	Key Question
Physical Ordering	Clustered vs. Non-Clustered	Does the index control data row order?
Entry Density	Dense vs. Sparse	One entry per record or per block?
Key Uniqueness	Unique vs. Non-Unique	Are duplicate key values allowed?
Column Count	Single-Column vs. Composite	How many columns in the index key?
Data Structure	B+-Tree, Hash, Bitmap, etc.	What structure organizes entries?
Content Scope	Full vs. Partial (Filtered)	Are all rows indexed or a subset?

classification_examples.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
Index Classification Examples:
 
Example 1: MySQL InnoDB PRIMARY KEY
─────────────────────────────────────
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE
);
 
Classification:
• Physical Ordering: Clustered (data stored in PK order)
• Entry Density: Dense (B+-tree leaf = data rows)
• Key Uniqueness: Unique (PRIMARY KEY constraint)
• Column Count: Single-column
• Data Structure: B+-tree
• Content Scope: Full (all rows indexed)
 
 
Example 2: PostgreSQL Partial Unique Index
─────────────────────────────────────
CREATE UNIQUE INDEX idx_active_email
ON users (email)
WHERE is_active = TRUE;
 
Classification:
• Physical Ordering: Non-Clustered (PostgreSQL uses heap)
• Entry Density: Dense (one entry per qualifying row)
• Key Uniqueness: Unique (within matching rows)
• Column Count: Single-column
• Data Structure: B+-tree
• Content Scope: Partial (only is_active = TRUE rows)
 
 
Example 3: Composite Non-Unique Secondary Index
─────────────────────────────────────
CREATE INDEX idx_customer_date
ON orders (customer_id, order_date);
 
Classification:
• Physical Ordering: Non-Clustered
• Entry Density: Dense
• Key Uniqueness: Non-Unique
• Column Count: Composite (2 columns)
• Data Structure: B+-tree
• Content Scope: Full

Orthogonal Dimensions

These dimensions are largely independent—any combination is theoretically possible (though some are rare). A sparse clustered unique composite index on a hash structure is technically definable, even if unusual. Understanding each dimension separately helps you reason about any index you encounter.

Ordering-Based Classification

The most fundamental division in index classification is based on the relationship between index order and data order.

Clustered (Primary) Index:

Data rows are physically stored in index key order
Only ONE clustered index per table (physical ordering is singular)
Range queries benefit from sequential I/O
Changes to clustered key require row relocation

Non-Clustered (Secondary) Index:

Data rows are independent of index order
MULTIPLE non-clustered indexes allowed per table
Follows pointers for data access (extra I/O)
Key changes only affect index, not data location

Clustered Index Characteristics:

┌────────────────────────────┐
│     Index + Data           │
│   Integrated Structure     │
│                            │
│  ┌──────────────────────┐  │
│  │ Key 1 │ Full Row 1   │  │
│  │ Key 2 │ Full Row 2   │  │
│  │ Key 3 │ Full Row 3   │  │
│  └──────────────────────┘  │
│                            │
│  Data IS the leaf level    │
└────────────────────────────┘

Best for:

Range scans on key
ORDER BY on key
Point lookups

Non-Clustered Index:

┌───────────────────────────┐
│       Index Structure     │
│   ┌─────────┬─────────┐   │
│   │ Key 1   │ Pointer─┼──►│ Row in heap/table
│   │ Key 2   │ Pointer─┼──►│ Row in heap/table
│   │ Key 3   │ Pointer─┼──►│ Row in heap/table
│   └─────────┴─────────┘   │
│                           │
│  Index separate from data │
└───────────────────────────┘

Best for:

Multiple access paths
Frequently updated keys
Covering queries

Clustered vs. Non-Clustered Decision Factors
Factor	Favors Clustered	Favors Non-Clustered
Range queries on key	✅ Strong advantage	Sequential I/O lost
Multiple columns to filter	Can only cluster on one	✅ Create multiple indexes
Key value changes	Row relocation needed	✅ Only index entry moves
Join with key as driver	✅ Sequential access	Random access pattern
Wide rows	✅ No extra lookup	Pointer + lookup cost
Narrow covering queries	May be overkill	✅ Index-only scans

Density-Based Classification

Indexes can be classified by how many entries they contain relative to data records.

Dense Index:

Contains one entry for every record in the data file
Can locate any record directly through index
Larger index size, but more flexible
Required when data is not ordered on index key

Sparse Index:

Contains one entry per disk block (or per range)
Only possible when data is ordered on index key
Much smaller index size
Must scan within block after index lookup

dense_vs_sparse.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Dense Index                              Sparse Index
(1 entry per record)                     (1 entry per block)
 
Index:                                   Index:
┌───────┬────────┐                       ┌───────┬────────┐
│ Key 1 │ → Row 1│                       │ Key 1 │ → Blk 1│
│ Key 2 │ → Row 2│                       │ Key 4 │ → Blk 2│──┐
│ Key 3 │ → Row 3│                       │ Key 7 │ → Blk 3│  │
│ Key 4 │ → Row 4│                       └───────┴────────┘  │
│ Key 5 │ → Row 5│                                           │
│ Key 6 │ → Row 6│                       Data Blocks:        │
│ Key 7 │ → Row 7│                       ┌───────────────────┴──┐
│ ...   │ ...    │                       │ Block 2:              │
└───────┴────────┘                       │  Key 4, Key 5, Key 6  │
                                         └───────────────────────┘
Index entries: N
(where N = number of records)            Index entries: N / blocking_factor
                                         (much smaller!)
 
Size comparison (1M records, 50 records/block):
Dense:  1,000,000 entries
Sparse: 20,000 entries (50x smaller!)
 
Lookup for Key=5:
Dense:  Binary search → Direct to Row 5
Sparse: Binary search → Block 2 → Scan within block for Key 5

Modern B+-Trees Are Semi-Dense

Most modern B+-tree implementations blur the dense/sparse line. Internal nodes are sparse (entry per subtree), while leaf nodes are dense (entry per record). For clustered indexes in InnoDB, leaf nodes contain actual rows—making the distinction somewhat academic. The key takeaway: clustered indexes are inherently more compact than secondary indexes.

Structure-Based Classification

Indexes are implemented using different data structures, each optimized for certain operations.

Common index structures:

Index Data Structures Comparison
Structure	Best For	Point Query	Range Query	Insert
B+-Tree	General purpose, most common	O(log n)	O(log n + k)	O(log n)
Hash Index	Equality lookups only	O(1) average	❌ Not supported	O(1) average
Bitmap Index	Low-cardinality columns, OLAP	O(1)	Bitmap operations	Expensive
R-Tree	Spatial/geographic data	O(log n)	Spatial range	O(log n)
GiST/GIN	Full-text, arrays, JSON	Depends	Depends	Varies
LSM-Tree	Write-heavy workloads	O(log n)	O(log n + k)	O(1) amortized

structure_selection.sql
Examples
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- B+-Tree: Default and most versatile
CREATE INDEX idx_btree ON orders (customer_id);
-- Supports: =, <, >, <=, >=, BETWEEN, LIKE 'prefix%', ORDER BY
 
-- Hash Index (PostgreSQL, MySQL Memory engine)
CREATE INDEX idx_hash ON sessions (session_token) USING HASH;
-- Supports: = only
-- NOT useful for: <, >, BETWEEN, ORDER BY
-- Best for: Exact match lookups like session tokens, cache keys
 
-- Bitmap Index (Oracle, PostgreSQL with extensions)
-- Conceptually:
CREATE BITMAP INDEX idx_status ON orders (status);
-- Best for: Low-cardinality columns (few distinct values)
-- Excellent for: Complex AND/OR queries in analytics
 
-- B+-Tree vs Hash decision:
-- Q: "Find all orders with customer_id = 100"
-- Both work, Hash slightly faster for pure equality
 
-- Q: "Find all orders with customer_id BETWEEN 100 AND 200"
-- Only B+-Tree works (Hash cannot do ranges)
 
-- Q: "Find all orders with customer_id = 100 ORDER BY order_date"
-- B+-Tree on (customer_id, order_date) handles both filter and sort!
 
-- Recommendation: Default to B+-Tree unless you have specific Hash/Bitmap use case

B+-Tree Dominance

B+-tree indexes handle 90%+ of indexing needs. They support all comparison operators, ORDER BY optimization, and range queries efficiently. Hash and bitmap indexes are specialized tools for specific scenarios—use them consciously, not routinely.

Index Selection Decision Framework

Given the multiple classification dimensions, how do you decide which indexes to create? Here's a systematic framework:

Step 1: Identify Critical Queries

Collect the most important queries—those that are:

Executed most frequently (volume)
Performance-critical (latency SLAs)
Resource-intensive (full scans, sorts)

Step 2: Analyze Query Patterns

For each query, identify:

Filter columns (WHERE clause)
Join columns
Sort columns (ORDER BY)
Select columns (for covering potential)

decision_framework.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
Index Selection Decision Tree:
 
START: Analyzing query patterns
 
1. Does the table have a natural unique identifier?
   ├─ YES → Create PRIMARY KEY (clustered in most DBs)
   └─ NO → Consider surrogate key (auto-increment)
 
2. For each frequent query's WHERE clause:
   ├─ Single equality column?
   │   ├─ High cardinality → Secondary index
   │   └─ Low cardinality → Consider bitmap or skip (full scan may be OK)
   │
   ├─ Multiple equality columns?
   │   └─ Composite index (equality cols first, by selectivity)
   │
   ├─ Range predicate?
   │   └─ Composite index (equality cols, then range col last)
   │
   └─ Complex OR conditions?
       └─ Consider separate indexes + intersection, or restructure query
 
3. Does query have ORDER BY?
   ├─ Same as WHERE columns → Include in composite index order
   └─ Different columns → May need separate index or accept filesort
 
4. Is the query very narrow (few columns in SELECT)?
   ├─ YES → Covering index opportunity (add SELECT cols to index)
   └─ NO → Standard index sufficient
 
5. Is the column frequently updated?
   ├─ YES → Index carefully (update cost) or avoid if not critical
   └─ NO → Index freely
 
6. Review overall index count
   ├─ > 5-7 indexes → Review write performance impact
   └─ < 3 indexes → May have coverage gaps

Index Selection Checklist

•Primary key: Always define one (for row identification and clustering)
•Foreign keys: Index them (supports joins and referential integrity checks)
•Frequent WHERE columns: Index if selective; composite if multiple columns
•JOIN columns: Index the 'many' side of one-to-many relationships
•ORDER BY + LIMIT: Index can eliminate sort operation
•Covering opportunities: Add columns to avoid table lookups for narrow queries
•Unique constraints: Create via UNIQUE index for data integrity

Monitor, Don't Guess

Index selection should be data-driven. Use slow query logs, EXPLAIN plans, and index usage statistics to validate assumptions. An index that seems theoretically useful may be unused in practice, while a missing index may cause silent performance degradation.

Common Index Anti-Patterns

Even experienced developers make indexing mistakes. Here are common anti-patterns to avoid:

Anti-Pattern 1: Over-Indexing

Creating an index on every column 'just in case.' Each index adds write overhead and consumes storage. For write-heavy tables, excessive indexes can cut write throughput by 50% or more.

Index Anti-Patterns to Avoid

•Index everything: Creates write overhead without proportional read benefit
•Ignoring column order: Composite index (A, B) doesn't help queries filtering only on B
•Low-cardinality single indexes: Indexing boolean or status columns alone often useless
•Duplicate prefixes: (A, B) and (A, B, C) are redundant—the second covers the first
•Missing foreign key indexes: Causes slow cascades and join performance issues
•Functions on indexed columns: WHERE UPPER(email) = ? can't use index on email
•Type mismatches: WHERE int_column = '123' may prevent index use in some databases
•Ignoring covering potential: Extra table lookups when index could satisfy query
•Never validating: Creating indexes without checking EXPLAIN or measuring impact

anti_pattern_fixes.sql
Fixes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- Anti-pattern: Function on indexed column
-- Bad:
SELECT * FROM users WHERE UPPER(email) = 'ALICE@EXAMPLE.COM';
-- Index on email NOT used!
 
-- Fix: Expression index (PostgreSQL) or functional index (other DBs)
CREATE INDEX idx_email_upper ON users (UPPER(email));
 
-- Better fix: Normalize data on write, query directly
UPDATE users SET email = LOWER(email);
SELECT * FROM users WHERE email = 'alice@example.com';
 
-- Anti-pattern: Duplicate prefix indexes
-- Wasteful:
CREATE INDEX idx_a ON t(A);
CREATE INDEX idx_ab ON t(A, B);
CREATE INDEX idx_abc ON t(A, B, C);
-- idx_a is redundant! idx_ab covers queries on A.
-- idx_ab is redundant! idx_abc covers queries on A and (A, B).
 
-- Fixed: Single covering index
CREATE INDEX idx_abc ON t(A, B, C);
-- Serves: WHERE A=?, WHERE A=? AND B=?, WHERE A=? AND B=? AND C=?
 
-- Anti-pattern: Missing FK index
-- Schema:
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT REFERENCES customers(id)
);
-- Problem: No index on customer_id
-- Consequence: DELETE FROM customers WHERE id = 100 does FULL SCAN of orders!
 
-- Fixed:
CREATE INDEX idx_orders_customer ON orders(customer_id);

Unused Index Detection

Most databases provide statistics on index usage. PostgreSQL: pg_stat_user_indexes.idx_scan. MySQL: performance_schema.table_io_waits_summary_by_index_usage. Regularly audit indexes with zero or near-zero reads—candidates for removal.

Practical Classification Matrix

To synthesize everything we've learned, here's a comprehensive matrix that maps use cases to index types:

Index Type Selection by Use Case
Use Case	Recommended Index	Why
Row identification	PRIMARY KEY (clustered)	Mandatory; determines physical order
Natural unique field (SSN, email)	UNIQUE INDEX	Constraint + index combined
Foreign key column	Secondary non-unique	Supports joins and cascades
High-selectivity filter	Secondary B+-tree	Standard query acceleration
Low-selectivity filter	Consider bitmap or skip	B+-tree less effective
Multi-column equality filter	Composite (all eq cols)	Leftmost prefix serves query
Equality + range filter	Composite (eq first, range last)	Preserves range scan efficiency
ORDER BY with LIMIT	Index matching sort order	Eliminates filesort
Narrow covering query	Composite with INCLUDE	Index-only scan
Exact match only (tokens)	Hash index	O(1) lookup
Geographic/spatial data	R-tree / spatial index	Multi-dimensional queries
Full-text search	GIN / full-text index	Inverted index structure
Conditional uniqueness	Partial unique index	Filtered constraint

Context Is King

These recommendations are starting points, not absolute rules. The right index depends on your specific data distribution, query patterns, read/write ratio, and performance requirements. Always validate with actual data and workloads.

Summary: Index Classification Mastery

We've concluded our comprehensive exploration of index types with a systematic classification framework that enables principled index design decisions.

Key Takeaways

•Multi-dimensional classification: Indexes are classified along ordering, density, uniqueness, column count, structure, and scope dimensions
•Orthogonal dimensions: These dimensions are independent—a single index can be clustered + unique + composite + B+-tree + partial
•Ordering is fundamental: Clustered vs. non-clustered is the most impactful distinction, affecting I/O patterns dramatically
•B+-tree dominance: Default to B+-tree; use hash/bitmap/spatial only for specific needs
•Decision framework: Systematically analyze queries, identify filter/sort/covering needs, then select appropriate index type
•Anti-patterns destroy value: Over-indexing, wrong column order, missing FK indexes, and functions on columns are common mistakes
•Measurement is mandatory: Use EXPLAIN, slow logs, and index usage stats to validate index effectiveness
•Balance reads and writes: Every index is a read/write tradeoff—optimize for your actual workload

Module completion:

This completes Module 2: Index Types. You now have comprehensive knowledge of:

Primary indexes and their physical ordering relationship with data
Secondary indexes and their role as alternative access paths
Unique indexes combining performance with constraint enforcement
Composite indexes for multi-column query optimization
Index classification frameworks for systematic design decisions

With this foundation, you're prepared to explore advanced topics: B+-tree internals, hash indexing, specialized index structures, and query optimization techniques.

Module Complete

Congratulations! You've mastered the taxonomy of database indexes. You can now classify any index you encounter, make informed decisions about index creation, avoid common anti-patterns, and design index strategies that balance query performance with maintenance overhead. These skills are fundamental to database performance engineering.

Index Classification: A Systematic Taxonomy

A Unified Classification Framework

Primary (determines physical ordering) AND unique (no duplicates) AND composite (multiple columns)

Or:

Secondary (doesn't control ordering) AND non-unique (allows duplicates) AND single-column

What You Will Learn

Index Classification Dimensions

Indexes can be classified along several independent dimensions. Each dimension represents a different aspect of how the index is structured, behaves, or interacts with the underlying data.

The six fundamental classification dimensions:

Index Classification Dimensions
Dimension	Categories	Key Question
Physical Ordering	Clustered vs. Non-Clustered	Does the index control data row order?
Entry Density	Dense vs. Sparse	One entry per record or per block?
Key Uniqueness	Unique vs. Non-Unique	Are duplicate key values allowed?
Column Count	Single-Column vs. Composite	How many columns in the index key?
Data Structure	B+-Tree, Hash, Bitmap, etc.	What structure organizes entries?
Content Scope	Full vs. Partial (Filtered)	Are all rows indexed or a subset?

classification_examples.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
Index Classification Examples:
 
Example 1: MySQL InnoDB PRIMARY KEY
─────────────────────────────────────
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE
);
 
Classification:
• Physical Ordering: Clustered (data stored in PK order)
• Entry Density: Dense (B+-tree leaf = data rows)
• Key Uniqueness: Unique (PRIMARY KEY constraint)
• Column Count: Single-column
• Data Structure: B+-tree
• Content Scope: Full (all rows indexed)
 
 
Example 2: PostgreSQL Partial Unique Index
─────────────────────────────────────
CREATE UNIQUE INDEX idx_active_email
ON users (email)
WHERE is_active = TRUE;
 
Classification:
• Physical Ordering: Non-Clustered (PostgreSQL uses heap)
• Entry Density: Dense (one entry per qualifying row)
• Key Uniqueness: Unique (within matching rows)
• Column Count: Single-column
• Data Structure: B+-tree
• Content Scope: Partial (only is_active = TRUE rows)
 
 
Example 3: Composite Non-Unique Secondary Index
─────────────────────────────────────
CREATE INDEX idx_customer_date
ON orders (customer_id, order_date);
 
Classification:
• Physical Ordering: Non-Clustered
• Entry Density: Dense
• Key Uniqueness: Non-Unique
• Column Count: Composite (2 columns)
• Data Structure: B+-tree
• Content Scope: Full

Orthogonal Dimensions

Ordering-Based Classification

The most fundamental division in index classification is based on the relationship between index order and data order.

Clustered (Primary) Index:

Data rows are physically stored in index key order
Only ONE clustered index per table (physical ordering is singular)
Range queries benefit from sequential I/O
Changes to clustered key require row relocation

Non-Clustered (Secondary) Index:

Data rows are independent of index order
MULTIPLE non-clustered indexes allowed per table
Follows pointers for data access (extra I/O)
Key changes only affect index, not data location

Clustered Index Characteristics:

┌────────────────────────────┐
│     Index + Data           │
│   Integrated Structure     │
│                            │
│  ┌──────────────────────┐  │
│  │ Key 1 │ Full Row 1   │  │
│  │ Key 2 │ Full Row 2   │  │
│  │ Key 3 │ Full Row 3   │  │
│  └──────────────────────┘  │
│                            │
│  Data IS the leaf level    │
└────────────────────────────┘

Best for:

Range scans on key
ORDER BY on key
Point lookups

Non-Clustered Index:

┌───────────────────────────┐
│       Index Structure     │
│   ┌─────────┬─────────┐   │
│   │ Key 1   │ Pointer─┼──►│ Row in heap/table
│   │ Key 2   │ Pointer─┼──►│ Row in heap/table
│   │ Key 3   │ Pointer─┼──►│ Row in heap/table
│   └─────────┴─────────┘   │
│                           │
│  Index separate from data │
└───────────────────────────┘

Best for:

Multiple access paths
Frequently updated keys
Covering queries

Clustered vs. Non-Clustered Decision Factors
Factor	Favors Clustered	Favors Non-Clustered
Range queries on key	✅ Strong advantage	Sequential I/O lost
Multiple columns to filter	Can only cluster on one	✅ Create multiple indexes
Key value changes	Row relocation needed	✅ Only index entry moves
Join with key as driver	✅ Sequential access	Random access pattern
Wide rows	✅ No extra lookup	Pointer + lookup cost
Narrow covering queries	May be overkill	✅ Index-only scans

Density-Based Classification

Indexes can be classified by how many entries they contain relative to data records.

Dense Index:

Contains one entry for every record in the data file
Can locate any record directly through index
Larger index size, but more flexible
Required when data is not ordered on index key

Sparse Index:

Contains one entry per disk block (or per range)
Only possible when data is ordered on index key
Much smaller index size
Must scan within block after index lookup

dense_vs_sparse.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Dense Index                              Sparse Index
(1 entry per record)                     (1 entry per block)
 
Index:                                   Index:
┌───────┬────────┐                       ┌───────┬────────┐
│ Key 1 │ → Row 1│                       │ Key 1 │ → Blk 1│
│ Key 2 │ → Row 2│                       │ Key 4 │ → Blk 2│──┐
│ Key 3 │ → Row 3│                       │ Key 7 │ → Blk 3│  │
│ Key 4 │ → Row 4│                       └───────┴────────┘  │
│ Key 5 │ → Row 5│                                           │
│ Key 6 │ → Row 6│                       Data Blocks:        │
│ Key 7 │ → Row 7│                       ┌───────────────────┴──┐
│ ...   │ ...    │                       │ Block 2:              │
└───────┴────────┘                       │  Key 4, Key 5, Key 6  │
                                         └───────────────────────┘
Index entries: N
(where N = number of records)            Index entries: N / blocking_factor
                                         (much smaller!)
 
Size comparison (1M records, 50 records/block):
Dense:  1,000,000 entries
Sparse: 20,000 entries (50x smaller!)
 
Lookup for Key=5:
Dense:  Binary search → Direct to Row 5
Sparse: Binary search → Block 2 → Scan within block for Key 5

Modern B+-Trees Are Semi-Dense

Structure-Based Classification

Indexes are implemented using different data structures, each optimized for certain operations.

Common index structures:

Index Data Structures Comparison
Structure	Best For	Point Query	Range Query	Insert
B+-Tree	General purpose, most common	O(log n)	O(log n + k)	O(log n)
Hash Index	Equality lookups only	O(1) average	❌ Not supported	O(1) average
Bitmap Index	Low-cardinality columns, OLAP	O(1)	Bitmap operations	Expensive
R-Tree	Spatial/geographic data	O(log n)	Spatial range	O(log n)
GiST/GIN	Full-text, arrays, JSON	Depends	Depends	Varies
LSM-Tree	Write-heavy workloads	O(log n)	O(log n + k)	O(1) amortized

structure_selection.sql
Examples
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- B+-Tree: Default and most versatile
CREATE INDEX idx_btree ON orders (customer_id);
-- Supports: =, <, >, <=, >=, BETWEEN, LIKE 'prefix%', ORDER BY
 
-- Hash Index (PostgreSQL, MySQL Memory engine)
CREATE INDEX idx_hash ON sessions (session_token) USING HASH;
-- Supports: = only
-- NOT useful for: <, >, BETWEEN, ORDER BY
-- Best for: Exact match lookups like session tokens, cache keys
 
-- Bitmap Index (Oracle, PostgreSQL with extensions)
-- Conceptually:
CREATE BITMAP INDEX idx_status ON orders (status);
-- Best for: Low-cardinality columns (few distinct values)
-- Excellent for: Complex AND/OR queries in analytics
 
-- B+-Tree vs Hash decision:
-- Q: "Find all orders with customer_id = 100"
-- Both work, Hash slightly faster for pure equality
 
-- Q: "Find all orders with customer_id BETWEEN 100 AND 200"
-- Only B+-Tree works (Hash cannot do ranges)
 
-- Q: "Find all orders with customer_id = 100 ORDER BY order_date"
-- B+-Tree on (customer_id, order_date) handles both filter and sort!
 
-- Recommendation: Default to B+-Tree unless you have specific Hash/Bitmap use case

B+-Tree Dominance

Index Selection Decision Framework

Given the multiple classification dimensions, how do you decide which indexes to create? Here's a systematic framework:

Step 1: Identify Critical Queries

Collect the most important queries—those that are:

Executed most frequently (volume)
Performance-critical (latency SLAs)
Resource-intensive (full scans, sorts)

Step 2: Analyze Query Patterns

For each query, identify:

Filter columns (WHERE clause)
Join columns
Sort columns (ORDER BY)
Select columns (for covering potential)

decision_framework.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
Index Selection Decision Tree:
 
START: Analyzing query patterns
 
1. Does the table have a natural unique identifier?
   ├─ YES → Create PRIMARY KEY (clustered in most DBs)
   └─ NO → Consider surrogate key (auto-increment)
 
2. For each frequent query's WHERE clause:
   ├─ Single equality column?
   │   ├─ High cardinality → Secondary index
   │   └─ Low cardinality → Consider bitmap or skip (full scan may be OK)
   │
   ├─ Multiple equality columns?
   │   └─ Composite index (equality cols first, by selectivity)
   │
   ├─ Range predicate?
   │   └─ Composite index (equality cols, then range col last)
   │
   └─ Complex OR conditions?
       └─ Consider separate indexes + intersection, or restructure query
 
3. Does query have ORDER BY?
   ├─ Same as WHERE columns → Include in composite index order
   └─ Different columns → May need separate index or accept filesort
 
4. Is the query very narrow (few columns in SELECT)?
   ├─ YES → Covering index opportunity (add SELECT cols to index)
   └─ NO → Standard index sufficient
 
5. Is the column frequently updated?
   ├─ YES → Index carefully (update cost) or avoid if not critical
   └─ NO → Index freely
 
6. Review overall index count
   ├─ > 5-7 indexes → Review write performance impact
   └─ < 3 indexes → May have coverage gaps

Index Selection Checklist

•Primary key: Always define one (for row identification and clustering)
•Foreign keys: Index them (supports joins and referential integrity checks)
•Frequent WHERE columns: Index if selective; composite if multiple columns
•JOIN columns: Index the 'many' side of one-to-many relationships
•ORDER BY + LIMIT: Index can eliminate sort operation
•Covering opportunities: Add columns to avoid table lookups for narrow queries
•Unique constraints: Create via UNIQUE index for data integrity

Monitor, Don't Guess

Common Index Anti-Patterns

Even experienced developers make indexing mistakes. Here are common anti-patterns to avoid:

Anti-Pattern 1: Over-Indexing

Creating an index on every column 'just in case.' Each index adds write overhead and consumes storage. For write-heavy tables, excessive indexes can cut write throughput by 50% or more.

Index Anti-Patterns to Avoid

•Index everything: Creates write overhead without proportional read benefit
•Ignoring column order: Composite index (A, B) doesn't help queries filtering only on B
•Low-cardinality single indexes: Indexing boolean or status columns alone often useless
•Duplicate prefixes: (A, B) and (A, B, C) are redundant—the second covers the first
•Missing foreign key indexes: Causes slow cascades and join performance issues
•Functions on indexed columns: WHERE UPPER(email) = ? can't use index on email
•Type mismatches: WHERE int_column = '123' may prevent index use in some databases
•Ignoring covering potential: Extra table lookups when index could satisfy query
•Never validating: Creating indexes without checking EXPLAIN or measuring impact

anti_pattern_fixes.sql
Fixes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- Anti-pattern: Function on indexed column
-- Bad:
SELECT * FROM users WHERE UPPER(email) = 'ALICE@EXAMPLE.COM';
-- Index on email NOT used!
 
-- Fix: Expression index (PostgreSQL) or functional index (other DBs)
CREATE INDEX idx_email_upper ON users (UPPER(email));
 
-- Better fix: Normalize data on write, query directly
UPDATE users SET email = LOWER(email);
SELECT * FROM users WHERE email = 'alice@example.com';
 
-- Anti-pattern: Duplicate prefix indexes
-- Wasteful:
CREATE INDEX idx_a ON t(A);
CREATE INDEX idx_ab ON t(A, B);
CREATE INDEX idx_abc ON t(A, B, C);
-- idx_a is redundant! idx_ab covers queries on A.
-- idx_ab is redundant! idx_abc covers queries on A and (A, B).
 
-- Fixed: Single covering index
CREATE INDEX idx_abc ON t(A, B, C);
-- Serves: WHERE A=?, WHERE A=? AND B=?, WHERE A=? AND B=? AND C=?
 
-- Anti-pattern: Missing FK index
-- Schema:
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT REFERENCES customers(id)
);
-- Problem: No index on customer_id
-- Consequence: DELETE FROM customers WHERE id = 100 does FULL SCAN of orders!
 
-- Fixed:
CREATE INDEX idx_orders_customer ON orders(customer_id);

Unused Index Detection

Practical Classification Matrix

To synthesize everything we've learned, here's a comprehensive matrix that maps use cases to index types:

Index Type Selection by Use Case
Use Case	Recommended Index	Why
Row identification	PRIMARY KEY (clustered)	Mandatory; determines physical order
Natural unique field (SSN, email)	UNIQUE INDEX	Constraint + index combined
Foreign key column	Secondary non-unique	Supports joins and cascades
High-selectivity filter	Secondary B+-tree	Standard query acceleration
Low-selectivity filter	Consider bitmap or skip	B+-tree less effective
Multi-column equality filter	Composite (all eq cols)	Leftmost prefix serves query
Equality + range filter	Composite (eq first, range last)	Preserves range scan efficiency
ORDER BY with LIMIT	Index matching sort order	Eliminates filesort
Narrow covering query	Composite with INCLUDE	Index-only scan
Exact match only (tokens)	Hash index	O(1) lookup
Geographic/spatial data	R-tree / spatial index	Multi-dimensional queries
Full-text search	GIN / full-text index	Inverted index structure
Conditional uniqueness	Partial unique index	Filtered constraint

Context Is King

Summary: Index Classification Mastery

We've concluded our comprehensive exploration of index types with a systematic classification framework that enables principled index design decisions.

Key Takeaways

•Multi-dimensional classification: Indexes are classified along ordering, density, uniqueness, column count, structure, and scope dimensions
•Orthogonal dimensions: These dimensions are independent—a single index can be clustered + unique + composite + B+-tree + partial
•Ordering is fundamental: Clustered vs. non-clustered is the most impactful distinction, affecting I/O patterns dramatically
•B+-tree dominance: Default to B+-tree; use hash/bitmap/spatial only for specific needs
•Decision framework: Systematically analyze queries, identify filter/sort/covering needs, then select appropriate index type
•Anti-patterns destroy value: Over-indexing, wrong column order, missing FK indexes, and functions on columns are common mistakes
•Measurement is mandatory: Use EXPLAIN, slow logs, and index usage stats to validate index effectiveness
•Balance reads and writes: Every index is a read/write tradeoff—optimize for your actual workload

Module completion:

This completes Module 2: Index Types. You now have comprehensive knowledge of:

Primary indexes and their physical ordering relationship with data
Secondary indexes and their role as alternative access paths
Unique indexes combining performance with constraint enforcement
Composite indexes for multi-column query optimization
Index classification frameworks for systematic design decisions

With this foundation, you're prepared to explore advanced topics: B+-tree internals, hash indexing, specialized index structures, and query optimization techniques.

Module Complete