Loading content...
Throughout this module, we've explored primary indexes, secondary indexes, unique indexes, and composite indexes. But these aren't mutually exclusive categories—they represent different dimensions of classification. A single index can simultaneously be:
Or:
To make informed indexing decisions, we need a comprehensive classification framework that considers all these dimensions together. This systematic approach transforms indexing from an art into an engineering discipline.
By the end of this page, you will understand the multiple dimensions along which indexes can be classified, how to systematically analyze indexing requirements, decision frameworks for index type selection, common index anti-patterns and how to avoid them, and practical guidelines for comprehensive index design.
Indexes can be classified along several independent dimensions. Each dimension represents a different aspect of how the index is structured, behaves, or interacts with the underlying data.
The six fundamental classification dimensions:
| Dimension | Categories | Key Question |
|---|---|---|
| Physical Ordering | Clustered vs. Non-Clustered | Does the index control data row order? |
| Entry Density | Dense vs. Sparse | One entry per record or per block? |
| Key Uniqueness | Unique vs. Non-Unique | Are duplicate key values allowed? |
| Column Count | Single-Column vs. Composite | How many columns in the index key? |
| Data Structure | B+-Tree, Hash, Bitmap, etc. | What structure organizes entries? |
| Content Scope | Full vs. Partial (Filtered) | Are all rows indexed or a subset? |
12345678910111213141516171819202122232425262728293031323334353637383940414243444546
Index Classification Examples: Example 1: MySQL InnoDB PRIMARY KEY─────────────────────────────────────CREATE TABLE orders ( order_id INT PRIMARY KEY, customer_id INT, order_date DATE); Classification:• Physical Ordering: Clustered (data stored in PK order)• Entry Density: Dense (B+-tree leaf = data rows)• Key Uniqueness: Unique (PRIMARY KEY constraint)• Column Count: Single-column• Data Structure: B+-tree• Content Scope: Full (all rows indexed) Example 2: PostgreSQL Partial Unique Index─────────────────────────────────────CREATE UNIQUE INDEX idx_active_emailON users (email)WHERE is_active = TRUE; Classification:• Physical Ordering: Non-Clustered (PostgreSQL uses heap)• Entry Density: Dense (one entry per qualifying row)• Key Uniqueness: Unique (within matching rows)• Column Count: Single-column• Data Structure: B+-tree• Content Scope: Partial (only is_active = TRUE rows) Example 3: Composite Non-Unique Secondary Index─────────────────────────────────────CREATE INDEX idx_customer_dateON orders (customer_id, order_date); Classification:• Physical Ordering: Non-Clustered• Entry Density: Dense• Key Uniqueness: Non-Unique• Column Count: Composite (2 columns)• Data Structure: B+-tree• Content Scope: FullThese dimensions are largely independent—any combination is theoretically possible (though some are rare). A sparse clustered unique composite index on a hash structure is technically definable, even if unusual. Understanding each dimension separately helps you reason about any index you encounter.
The most fundamental division in index classification is based on the relationship between index order and data order.
Clustered (Primary) Index:
Non-Clustered (Secondary) Index:
Clustered Index Characteristics:
┌────────────────────────────┐
│ Index + Data │
│ Integrated Structure │
│ │
│ ┌──────────────────────┐ │
│ │ Key 1 │ Full Row 1 │ │
│ │ Key 2 │ Full Row 2 │ │
│ │ Key 3 │ Full Row 3 │ │
│ └──────────────────────┘ │
│ │
│ Data IS the leaf level │
└────────────────────────────┘
Best for:
Non-Clustered Index:
┌───────────────────────────┐
│ Index Structure │
│ ┌─────────┬─────────┐ │
│ │ Key 1 │ Pointer─┼──►│ Row in heap/table
│ │ Key 2 │ Pointer─┼──►│ Row in heap/table
│ │ Key 3 │ Pointer─┼──►│ Row in heap/table
│ └─────────┴─────────┘ │
│ │
│ Index separate from data │
└───────────────────────────┘
Best for:
| Factor | Favors Clustered | Favors Non-Clustered |
|---|---|---|
| Range queries on key | ✅ Strong advantage | Sequential I/O lost |
| Multiple columns to filter | Can only cluster on one | ✅ Create multiple indexes |
| Key value changes | Row relocation needed | ✅ Only index entry moves |
| Join with key as driver | ✅ Sequential access | Random access pattern |
| Wide rows | ✅ No extra lookup | Pointer + lookup cost |
| Narrow covering queries | May be overkill | ✅ Index-only scans |
Indexes can be classified by how many entries they contain relative to data records.
Dense Index:
Sparse Index:
1234567891011121314151617181920212223242526
Dense Index Sparse Index(1 entry per record) (1 entry per block) Index: Index:┌───────┬────────┐ ┌───────┬────────┐│ Key 1 │ → Row 1│ │ Key 1 │ → Blk 1││ Key 2 │ → Row 2│ │ Key 4 │ → Blk 2│──┐│ Key 3 │ → Row 3│ │ Key 7 │ → Blk 3│ ││ Key 4 │ → Row 4│ └───────┴────────┘ ││ Key 5 │ → Row 5│ ││ Key 6 │ → Row 6│ Data Blocks: ││ Key 7 │ → Row 7│ ┌───────────────────┴──┐│ ... │ ... │ │ Block 2: │└───────┴────────┘ │ Key 4, Key 5, Key 6 │ └───────────────────────┘Index entries: N(where N = number of records) Index entries: N / blocking_factor (much smaller!) Size comparison (1M records, 50 records/block):Dense: 1,000,000 entriesSparse: 20,000 entries (50x smaller!) Lookup for Key=5:Dense: Binary search → Direct to Row 5Sparse: Binary search → Block 2 → Scan within block for Key 5Most modern B+-tree implementations blur the dense/sparse line. Internal nodes are sparse (entry per subtree), while leaf nodes are dense (entry per record). For clustered indexes in InnoDB, leaf nodes contain actual rows—making the distinction somewhat academic. The key takeaway: clustered indexes are inherently more compact than secondary indexes.
Indexes are implemented using different data structures, each optimized for certain operations.
Common index structures:
| Structure | Best For | Point Query | Range Query | Insert |
|---|---|---|---|---|
| B+-Tree | General purpose, most common | O(log n) | O(log n + k) | O(log n) |
| Hash Index | Equality lookups only | O(1) average | ❌ Not supported | O(1) average |
| Bitmap Index | Low-cardinality columns, OLAP | O(1) | Bitmap operations | Expensive |
| R-Tree | Spatial/geographic data | O(log n) | Spatial range | O(log n) |
| GiST/GIN | Full-text, arrays, JSON | Depends | Depends | Varies |
| LSM-Tree | Write-heavy workloads | O(log n) | O(log n + k) | O(1) amortized |
123456789101112131415161718192021222324252627
-- B+-Tree: Default and most versatileCREATE INDEX idx_btree ON orders (customer_id);-- Supports: =, <, >, <=, >=, BETWEEN, LIKE 'prefix%', ORDER BY -- Hash Index (PostgreSQL, MySQL Memory engine)CREATE INDEX idx_hash ON sessions (session_token) USING HASH;-- Supports: = only-- NOT useful for: <, >, BETWEEN, ORDER BY-- Best for: Exact match lookups like session tokens, cache keys -- Bitmap Index (Oracle, PostgreSQL with extensions)-- Conceptually:CREATE BITMAP INDEX idx_status ON orders (status);-- Best for: Low-cardinality columns (few distinct values)-- Excellent for: Complex AND/OR queries in analytics -- B+-Tree vs Hash decision:-- Q: "Find all orders with customer_id = 100"-- Both work, Hash slightly faster for pure equality -- Q: "Find all orders with customer_id BETWEEN 100 AND 200"-- Only B+-Tree works (Hash cannot do ranges) -- Q: "Find all orders with customer_id = 100 ORDER BY order_date"-- B+-Tree on (customer_id, order_date) handles both filter and sort! -- Recommendation: Default to B+-Tree unless you have specific Hash/Bitmap use caseB+-tree indexes handle 90%+ of indexing needs. They support all comparison operators, ORDER BY optimization, and range queries efficiently. Hash and bitmap indexes are specialized tools for specific scenarios—use them consciously, not routinely.
Given the multiple classification dimensions, how do you decide which indexes to create? Here's a systematic framework:
Step 1: Identify Critical Queries
Collect the most important queries—those that are:
Step 2: Analyze Query Patterns
For each query, identify:
12345678910111213141516171819202122232425262728293031323334353637
Index Selection Decision Tree: START: Analyzing query patterns 1. Does the table have a natural unique identifier? ├─ YES → Create PRIMARY KEY (clustered in most DBs) └─ NO → Consider surrogate key (auto-increment) 2. For each frequent query's WHERE clause: ├─ Single equality column? │ ├─ High cardinality → Secondary index │ └─ Low cardinality → Consider bitmap or skip (full scan may be OK) │ ├─ Multiple equality columns? │ └─ Composite index (equality cols first, by selectivity) │ ├─ Range predicate? │ └─ Composite index (equality cols, then range col last) │ └─ Complex OR conditions? └─ Consider separate indexes + intersection, or restructure query 3. Does query have ORDER BY? ├─ Same as WHERE columns → Include in composite index order └─ Different columns → May need separate index or accept filesort 4. Is the query very narrow (few columns in SELECT)? ├─ YES → Covering index opportunity (add SELECT cols to index) └─ NO → Standard index sufficient 5. Is the column frequently updated? ├─ YES → Index carefully (update cost) or avoid if not critical └─ NO → Index freely 6. Review overall index count ├─ > 5-7 indexes → Review write performance impact └─ < 3 indexes → May have coverage gapsIndex selection should be data-driven. Use slow query logs, EXPLAIN plans, and index usage statistics to validate assumptions. An index that seems theoretically useful may be unused in practice, while a missing index may cause silent performance degradation.
Even experienced developers make indexing mistakes. Here are common anti-patterns to avoid:
Anti-Pattern 1: Over-Indexing
Creating an index on every column 'just in case.' Each index adds write overhead and consumes storage. For write-heavy tables, excessive indexes can cut write throughput by 50% or more.
1234567891011121314151617181920212223242526272829303132333435
-- Anti-pattern: Function on indexed column-- Bad:SELECT * FROM users WHERE UPPER(email) = 'ALICE@EXAMPLE.COM';-- Index on email NOT used! -- Fix: Expression index (PostgreSQL) or functional index (other DBs)CREATE INDEX idx_email_upper ON users (UPPER(email)); -- Better fix: Normalize data on write, query directlyUPDATE users SET email = LOWER(email);SELECT * FROM users WHERE email = 'alice@example.com'; -- Anti-pattern: Duplicate prefix indexes-- Wasteful:CREATE INDEX idx_a ON t(A);CREATE INDEX idx_ab ON t(A, B);CREATE INDEX idx_abc ON t(A, B, C);-- idx_a is redundant! idx_ab covers queries on A.-- idx_ab is redundant! idx_abc covers queries on A and (A, B). -- Fixed: Single covering indexCREATE INDEX idx_abc ON t(A, B, C);-- Serves: WHERE A=?, WHERE A=? AND B=?, WHERE A=? AND B=? AND C=? -- Anti-pattern: Missing FK index-- Schema:CREATE TABLE orders ( order_id INT PRIMARY KEY, customer_id INT REFERENCES customers(id));-- Problem: No index on customer_id-- Consequence: DELETE FROM customers WHERE id = 100 does FULL SCAN of orders! -- Fixed:CREATE INDEX idx_orders_customer ON orders(customer_id);Most databases provide statistics on index usage. PostgreSQL: pg_stat_user_indexes.idx_scan. MySQL: performance_schema.table_io_waits_summary_by_index_usage. Regularly audit indexes with zero or near-zero reads—candidates for removal.
To synthesize everything we've learned, here's a comprehensive matrix that maps use cases to index types:
| Use Case | Recommended Index | Why |
|---|---|---|
| Row identification | PRIMARY KEY (clustered) | Mandatory; determines physical order |
| Natural unique field (SSN, email) | UNIQUE INDEX | Constraint + index combined |
| Foreign key column | Secondary non-unique | Supports joins and cascades |
| High-selectivity filter | Secondary B+-tree | Standard query acceleration |
| Low-selectivity filter | Consider bitmap or skip | B+-tree less effective |
| Multi-column equality filter | Composite (all eq cols) | Leftmost prefix serves query |
| Equality + range filter | Composite (eq first, range last) | Preserves range scan efficiency |
| ORDER BY with LIMIT | Index matching sort order | Eliminates filesort |
| Narrow covering query | Composite with INCLUDE | Index-only scan |
| Exact match only (tokens) | Hash index | O(1) lookup |
| Geographic/spatial data | R-tree / spatial index | Multi-dimensional queries |
| Full-text search | GIN / full-text index | Inverted index structure |
| Conditional uniqueness | Partial unique index | Filtered constraint |
These recommendations are starting points, not absolute rules. The right index depends on your specific data distribution, query patterns, read/write ratio, and performance requirements. Always validate with actual data and workloads.
We've concluded our comprehensive exploration of index types with a systematic classification framework that enables principled index design decisions.
Module completion:
This completes Module 2: Index Types. You now have comprehensive knowledge of:
With this foundation, you're prepared to explore advanced topics: B+-tree internals, hash indexing, specialized index structures, and query optimization techniques.
Congratulations! You've mastered the taxonomy of database indexes. You can now classify any index you encounter, make informed decisions about index creation, avoid common anti-patterns, and design index strategies that balance query performance with maintenance overhead. These skills are fundamental to database performance engineering.