Database Management SystemsIndex Types

Index Types: Primary, Secondary, Unique, and Composite Indexes

LevelIntermediate

Duration60 mins

TopicIndex Types

2 / 5

Secondary Index: Alternative Access Paths to Data

Beyond the Primary Index

A table can have only one physical ordering—and thus only one primary (or clustered) index. But what happens when your queries need fast access through multiple fields? You might need to search employees by department, filter orders by customer, or look up products by category. These use cases demand secondary indexes.

Secondary indexes are the workhorses of database query optimization. While a primary index gets exclusive control over physical data ordering, secondary indexes provide alternative pathways to the same data. They enable multiple fast access patterns on a single table, transforming the database from a one-dimensional lookup table into a multi-dimensional query engine.

What You Will Learn

By the end of this page, you will understand how secondary indexes work without controlling physical ordering, the critical distinction between dense and sparse secondary indexes, why secondary indexes often require more storage than primary indexes, the concept of index indirection and its performance implications, and strategies for effective secondary index design.

What Is a Secondary Index?

A secondary index is an index on a field that is not the ordering field of the data file. The data file may be ordered on a different field (with a primary index), unordered (heap), or organized in any other manner—the secondary index doesn't care. It provides its own independent pathway to records.

Formal definition:

A secondary index (also called a non-clustered index) is an index structure where the indexed field's values are not in the same order as the physical records in the data file. The index entries contain search key values and pointers to the actual records (or to blocks containing those records).

Key insight: Because the data isn't sorted on the secondary index field, there's no guarantee that records with similar key values are physically adjacent. This has profound implications for query performance.

secondary_index_structure.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Data File (Ordered by Employee_ID = Primary Index)
┌─────────────────────────────────────────────────────────────────────┐
│ Block 1 │ (101, "Engineering") (102, "Sales") (103, "Engineering") │
│ Block 2 │ (104, "Marketing") (105, "Sales") (106, "Engineering")   │
│ Block 3 │ (107, "HR") (108, "Sales") (109, "Marketing")             │
│ Block 4 │ (110, "Engineering") (111, "HR") (112, "Sales")           │
└─────────────────────────────────────────────────────────────────────┘
                         ↑
    Data is sorted by Employee_ID, NOT by Department
 
Secondary Index on Department (Non-Ordering Field)
┌────────────────────────────────────────────────────────────────────┐
│  Index Entry Format: [Department] → [List of Record Pointers]     │
│                                                                     │
│  ┌──────────────┬───────────────────────────────────────────────┐  │
│  │ Engineering  │ → (101,B1) (103,B1) (106,B2) (110,B4)         │  │
│  ├──────────────┼───────────────────────────────────────────────┤  │
│  │ HR           │ → (107,B3) (111,B4)                           │  │
│  ├──────────────┼───────────────────────────────────────────────┤  │
│  │ Marketing    │ → (104,B2) (109,B3)                           │  │
│  ├──────────────┼───────────────────────────────────────────────┤  │
│  │ Sales        │ → (102,B1) (105,B2) (108,B3) (112,B4)         │  │
│  └──────────────┴───────────────────────────────────────────────┘  │
│                                                                     │
│  Note: Records for each department are SCATTERED across blocks     │
└────────────────────────────────────────────────────────────────────┘

The Scattered Records Problem

Unlike a primary index where records with adjacent key values are in adjacent blocks, secondary indexes point to records scattered throughout the data file. Searching for all 'Engineering' employees might require accessing 4 different blocks, even though there are only 4 matching records. This random I/O pattern is the fundamental cost of secondary indexes.

Primary Index vs. Secondary Index Comparison
Characteristic	Primary Index	Secondary Index
Data ordering	Index field = ordering field	Index field ≠ ordering field
Physical proximity	Related records adjacent	Related records scattered
Number per table	At most one	Multiple allowed
Sparse index option	Yes (one entry per block)	Only on unique fields
Range query I/O	Sequential reads	Random reads
Index size	Smaller (sparse)	Larger (often dense)

Dense Secondary Indexes

When a secondary index is built on a unique field (one where each value appears exactly once), the index can theoretically be sparse. However, secondary indexes are commonly dense—containing one entry for every record in the data file, not one per block.

Why dense indexes are often necessary:

Consider the challenge: The data file isn't sorted on the secondary index field. This means records with adjacent key values might be in completely different blocks. A sparse index (one entry per block) wouldn't help us find records—we'd have no way to know which block contains a given value.

Dense secondary index structure:

dense_secondary_index.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Dense Secondary Index on Social_Security_Number (Unique Field)
 
Data File (Ordered by Employee_ID):            Dense Secondary Index:
┌─────────────────────────────────────┐       ┌─────────────────────────────┐
│ Block 1:                            │       │  SSN         │ Record Ptr   │
│   (101, SSN: 456-78-9012, Alice)   │   ←───│  111-22-3333 │ → Block 3:2  │
│   (102, SSN: 789-01-2345, Bob)     │   ←───│  222-33-4444 │ → Block 4:1  │
│   (103, SSN: 345-67-8901, Carol)   │   ←───│  333-44-5555 │ → Block 2:3  │
├─────────────────────────────────────┤       │  345-67-8901 │ → Block 1:3  │
│ Block 2:                            │       │  456-78-9012 │ → Block 1:1  │
│   (104, SSN: 567-89-0123, David)   │   ←───│  555-66-7777 │ → Block 3:3  │
│   (105, SSN: 678-90-1234, Eve)     │   ←───│  567-89-0123 │ → Block 2:1  │
│   (106, SSN: 333-44-5555, Frank)   │   ←───│  678-90-1234 │ → Block 2:2  │
├─────────────────────────────────────┤       │  789-01-2345 │ → Block 1:2  │
│ Block 3:                            │       │  888-99-0000 │ → Block 4:2  │
│   (107, SSN: 111-22-3333, Grace)   │       └─────────────────────────────┘
│   (108, SSN: 890-12-3456, Henry)   │              ↑
│   (109, SSN: 555-66-7777, Iris)    │       Index is SORTED by SSN
├─────────────────────────────────────┤       Data is SORTED by Employee_ID
│ Block 4:                            │       
│   (110, SSN: 222-33-4444, Jake)    │       Notice: Index pointers go to
│   (111, SSN: 888-99-0000, Kate)    │       records scattered across 
│                                     │       ALL data blocks!
└─────────────────────────────────────┘

Size implications of dense indexes:

A dense index has one entry per record. For a table with 1,000,000 records:

Primary index (sparse, assuming 50 records/block): 20,000 entries
Dense secondary index: 1,000,000 entries

The secondary index is 50x larger than the primary index!

However, index entries are much smaller than full records. If each entry is 20 bytes (12-byte key + 8-byte pointer) vs. 200-byte records:

Secondary index: 1,000,000 × 20 = 20 MB
Data file: 1,000,000 × 200 = 200 MB

The index is still 10x smaller than the data—small enough to often fit in memory, making index traversal extremely fast.

Index-Only Access

If a query only needs values from the indexed columns, the database can satisfy the query entirely from the index without reading the data file at all. This is called an 'index-only scan' or 'covering index' behavior. Dense secondary indexes are particularly valuable for this optimization.

Secondary Indexes on Non-Unique Fields

When a secondary index is built on a field with duplicate values (non-unique), the design becomes more complex. There are several approaches to handle this:

Option 1: Dense Index with Repeated Keys

Every record gets an index entry, even if the key value is repeated. Simple but potentially wasteful.

Option 2: Variable-Length Entries

Each unique key value has one entry containing a list of all record pointers. Space-efficient but complicates the index structure.

Option 3: Two-Level Indirection

The secondary index points to an intermediate structure (bucket or block of pointers) which then points to actual records.

non_unique_index_options.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
Option 1: Repeated Keys (Dense, Simple)
┌───────────────┬─────────────┐
│ Department    │ Record Ptr  │
├───────────────┼─────────────┤
│ Engineering   │ → 101       │   4 entries for "Engineering"
│ Engineering   │ → 103       │   Wastes space on repeated key
│ Engineering   │ → 106       │   But simple B-tree structure
│ Engineering   │ → 110       │
│ HR            │ → 107       │
│ HR            │ → 111       │
│ ...           │ ...         │
└───────────────┴─────────────┘
 
Option 2: Variable-Length Pointer Lists
┌───────────────┬─────────────────────────────────┐
│ Department    │ Record Pointers                 │
├───────────────┼─────────────────────────────────┤
│ Engineering   │ [101, 103, 106, 110]            │   1 entry per unique value
│ HR            │ [107, 111]                      │   Variable-length lists
│ Marketing     │ [104, 109]                      │   Efficient for reads
│ Sales         │ [102, 105, 108, 112]            │   Complex for updates
└───────────────┴─────────────────────────────────┘
 
Option 3: Bucket-Based Indirection
Secondary Index                 Pointer Buckets           Data File
┌──────────────┐               ┌───────────────┐         ┌──────────┐
│ Engineering ─┼──────────────►│ 101 │ 103 │   │    ┌───►│ Block 1  │
│              │               │ 106 │ 110 │   │────┤    │ ...      │
├──────────────┤               ├───────────────┤    │    ├──────────┤
│ HR ──────────┼──────────────►│ 107 │ 111 │   │────┼───►│ Block 2  │
│              │               ├───────────────┤    │    │ ...      │
├──────────────┤               │ ...           │    │    ├──────────┤
│ Sales ───────┼──────────────►│ 102 │ 105 │   │────┘    │ Block 3  │
│              │               │ 108 │ 112 │   │         └──────────┘
└──────────────┴───────────────┴───────────────┘
        ↑                              ↑
     Sparse index               Bucket pages for
     on unique keys             each key value

Modern B+-Tree Implementation

Most modern databases use B+-trees for secondary indexes, storing (key, pointer) pairs in sorted order. For non-unique keys, duplicate entries are allowed in the leaf nodes. The tree remains balanced, and all operations maintain O(log n) complexity. The 'pointer buckets' approach is less common in modern systems.

Non-Unique Index Implementation Tradeoffs
Approach	Space Efficiency	Query Speed	Update Speed	Implementation
Repeated keys	Low (duplicates)	Good	Good	Simple B+-tree
Variable-length lists	High	Good	Complex	Custom structure
Bucket indirection	Medium	Extra I/O	Medium	Two-level structure

Index Indirection and Performance Implications

Secondary indexes use indirection—they don't contain data, they contain pointers to data. This layer of indirection has significant performance implications that every database professional must understand.

The random I/O problem:

When you traverse a secondary index and then follow pointers to retrieve data, each pointer might lead to a different disk block. If you're retrieving 100 records, you might need 100 separate random disk accesses—even if the data exists in far fewer blocks.

Contrast this with a primary index: once you locate the starting point, consecutive records are in consecutive disk locations, enabling efficient prefetching and sequential reads.

random_io_analysis.sql
Query Analysis
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Scenario: Retrieve all employees in 'Engineering' department
-- Data: 10,000 employees, 1,000 in Engineering
-- Data blocks: 200 (50 records per block on average)
 
-- Using Primary Index on Employee_ID (if searching by ID range):
-- I/O pattern: Sequential
-- Disk accesses: ~20-40 blocks (consecutive blocks)
-- Time: Fast (sequential prefetch works)
 
-- Using Secondary Index on Department:
-- I/O pattern: Random
-- Disk accesses: Could be up to 1,000 (one per Engineering employee)
-- Best case: ~200 blocks (Engineering employees evenly distributed)
-- Worst case: ~1,000 random accesses
 
-- The math of random I/O:
-- Sequential I/O: 0.1ms per block (disk head already positioned)
-- Random I/O: 10ms per block (seek + rotational latency)
 
-- 40 blocks sequential: 40 × 0.1ms = 4ms
-- 200 blocks random: 200 × 10ms = 2000ms = 2 seconds!
 
-- This explains why:
-- 1. Full table scans can beat secondary index lookups
-- 2. Query optimizers sometimes reject index usage
-- 3. "Index everything" is terrible advice

The Tipping Point

For queries returning more than 5-20% of a table, a full table scan with sequential I/O often outperforms secondary index access with random I/O. Query optimizers make this decision automatically based on statistics—this is why the same query might use an index for a selective WHERE clause but scan the table for a less selective one.

Strategies to mitigate random I/O:

Database systems employ several techniques to reduce the random I/O cost of secondary indexes:

Buffer pool caching: Cache frequently accessed blocks in memory
Index-organized tables: Some systems can cluster data by a secondary index key
Covering indexes: Include all needed columns in the index to avoid data access
Block prefetching: Read ahead when sequential access is predicted
Bitmap heap scans: Collect all block IDs, sort them, then access in order

When Secondary Indexes Excel

•Highly selective queries — Retrieving a small percentage of rows (<5-10%)
•Point queries — Looking up a single row by indexed value
•Index-only queries — Query satisfied entirely from index
•OLTP workloads — Many small, precise lookups
•Frequently accessed hot data — Index and data cached in buffer pool

Secondary Index Implementation in Modern Databases

Modern database systems implement secondary indexes primarily using B+-tree structures, but with important variations in how they reference data rows.

InnoDB Secondary Index Architecture:

In InnoDB, secondary indexes have a unique characteristic: they store the primary key as the pointer to the data row, not a physical row address.

This design has important implications:

Secondary index lookup requires two operations: (1) Find entry in secondary index → (2) Look up primary key in clustered index
This is called a double lookup or bookmark lookup
If you change the primary key, all secondary indexes automatically update (conceptually)
Physical row moves don't require secondary index updates

innodb_secondary_index.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
InnoDB Secondary Index Lookup Process
 
Secondary Index on Department         Clustered Index (on pk)
┌────────────────────────────┐        ┌────────────────────────────┐
│  Department  │ Primary Key │        │  pk  │  Full Row Data      │
├──────────────┼─────────────┤        ├──────┼─────────────────────┤
│ "Engineering"│     101     │───┐    │ 101  │ Alice, Engineering..│
│ "Engineering"│     103     │   │    │ 102  │ Bob, Sales, ...     │
│ "Engineering"│     106     │   │    │ 103  │ Carol, Engineering..│
│ "HR"         │     107     │   │    │ ...  │ ...                 │
└──────────────┴─────────────┘   │    └──────┴─────────────────────┘
                                 │           ▲
     Step 1: Find "Engineering"  │           │
     in secondary index         └───────────┘
                               Step 2: Use pk (101)
                               to find row in
                               clustered index
 
Total cost: 2 B+-tree traversals

INCLUDE Columns for Covering Indexes

SQL Server and PostgreSQL support INCLUDE clause in index creation. This adds non-key columns to the leaf level without affecting the tree structure. Example: CREATE INDEX idx ON Orders(CustomerID) INCLUDE (TotalAmount, OrderDate). This enables index-only scans for more queries.

Secondary Index Design Considerations

Creating effective secondary indexes requires balancing query performance against maintenance costs. Unlike primary indexes (which are often predetermined by the table's natural key), secondary indexes are strategic choices that should be driven by workload analysis.

Create Secondary Index When

•Queries filter on this column frequently in WHERE clauses
•High selectivity — Column has many distinct values
•Join operations use this column as the join key
•ORDER BY on this column is common (if used with LIMIT)
•Read-heavy workload — Queries far exceed modifications
•Foreign key columns — Support referential integrity checks

Avoid Secondary Index When

•Low selectivity — Few distinct values (e.g., boolean, gender)
•Small tables — Full scan is faster than index access
•Write-heavy workload — Index maintenance exceeds query benefit
•Rarely queried columns — Index cost not justified
•Frequently updated columns — Index churn causes fragmentation
•Very wide columns — Large index entries waste space

Index Overhead Is Real

Every secondary index must be updated on INSERT, and potentially on UPDATE/DELETE. With 5 secondary indexes, an INSERT becomes 6 write operations (data + 5 index updates). For high-volume OLTP, excessive indexes can reduce write throughput by 50% or more.

Selectivity and the break-even point:

Selectivity refers to what fraction of rows match a query predicate. A column with 1000 distinct values in a 100,000-row table has selectivity of 1000/100000 = 1%.

High selectivity (1-10%): Secondary index highly beneficial
Medium selectivity (10-30%): May or may not benefit; test empirically
Low selectivity (>30%): Full table scan likely faster

The exact threshold depends on row size, block size, and storage characteristics. Always validate with query plans and actual performance measurements.

Secondary Index Maintenance and Fragmentation

Secondary indexes require ongoing maintenance to remain performant. Over time, insert, update, and delete operations can cause fragmentation—where the index structure becomes less efficient than its ideal form.

Types of fragmentation:

Index Fragmentation Types

•Logical fragmentation: Index pages out of physical order (page 5 might precede page 2 on disk). Affects sequential scans.
•Page density fragmentation: Pages only partially filled due to splits. Wastes space, increases I/O for scans.
•Deletion markers: Deleted entries leave 'tombstones' that must be scanned past. Reduces effective index density.

index_maintenance.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- SQL Server: Analyze and fix fragmentation
-- Check fragmentation level
SELECT 
    index_id,
    avg_fragmentation_in_percent,
    page_count
FROM sys.dm_db_index_physical_stats(
    DB_ID(), OBJECT_ID('Employee'), NULL, NULL, 'DETAILED'
);
 
-- Reorganize (online, for <30% fragmentation)
ALTER INDEX IX_Employee_Department 
ON Employee REORGANIZE;
 
-- Rebuild (more thorough, for >30% fragmentation)
ALTER INDEX IX_Employee_Department 
ON Employee REBUILD;

Index Maintenance Strategy

Regular index maintenance should be scheduled based on workload: OLTP systems with heavy writes may need weekly or daily maintenance; analytics systems with batch loads can maintain after load cycles. Monitor fragmentation levels and set thresholds for action (e.g., reorganize at 10%, rebuild at 30%).

Summary: Secondary Index Mastery

We've explored secondary indexes in depth—the versatile index structures that provide alternative access paths to data without controlling physical storage ordering.

Key Takeaways

•Definition: Secondary indexes are built on non-ordering fields, providing lookup paths independent of physical data arrangement
•Scattered records: Unlike primary indexes, related records are physically distributed, resulting in random I/O patterns
•Dense nature: Secondary indexes typically maintain one entry per record (dense), making them larger than sparse primary indexes
•Multiple per table: Unlike primary indexes, you can create multiple secondary indexes on a single table
•Indirection cost: Secondary index access involves pointer traversal, and InnoDB specifically uses two-level lookup via primary key
•Random I/O tradeoff: For returning many rows (>5-20% of table), full table scan often outperforms secondary index access
•Write overhead: Every secondary index adds maintenance cost for INSERT/UPDATE/DELETE operations
•Design driven by workload: Index selection should be based on query patterns, selectivity analysis, and read/write ratio

What's next:

With primary and secondary indexes understood, we'll explore unique indexes—a specialized index type that combines fast lookup with constraint enforcement. Unique indexes serve dual purposes: accelerating queries while guaranteeing data integrity.

Page Complete

You now have comprehensive understanding of secondary indexes: their structure, the random I/O implications, dense vs sparse considerations, and how they're implemented across major database systems. This knowledge enables you to make informed decisions about when and how to create secondary indexes.

2 / 5

Loading learning content...

Database Management SystemsIndex Types

Index Types: Primary, Secondary, Unique, and Composite Indexes

LevelIntermediate

Duration60 mins

TopicIndex Types

2 / 5

Secondary Index: Alternative Access Paths to Data

Beyond the Primary Index

What You Will Learn

What Is a Secondary Index?

Formal definition:

secondary_index_structure.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
Data File (Ordered by Employee_ID = Primary Index)
┌─────────────────────────────────────────────────────────────────────┐
│ Block 1 │ (101, "Engineering") (102, "Sales") (103, "Engineering") │
│ Block 2 │ (104, "Marketing") (105, "Sales") (106, "Engineering")   │
│ Block 3 │ (107, "HR") (108, "Sales") (109, "Marketing")             │
│ Block 4 │ (110, "Engineering") (111, "HR") (112, "Sales")           │
└─────────────────────────────────────────────────────────────────────┘
                         ↑
    Data is sorted by Employee_ID, NOT by Department
 
Secondary Index on Department (Non-Ordering Field)
┌────────────────────────────────────────────────────────────────────┐
│  Index Entry Format: [Department] → [List of Record Pointers]     │
│                                                                     │
│  ┌──────────────┬───────────────────────────────────────────────┐  │
│  │ Engineering  │ → (101,B1) (103,B1) (106,B2) (110,B4)         │  │
│  ├──────────────┼───────────────────────────────────────────────┤  │
│  │ HR           │ → (107,B3) (111,B4)                           │  │
│  ├──────────────┼───────────────────────────────────────────────┤  │
│  │ Marketing    │ → (104,B2) (109,B3)                           │  │
│  ├──────────────┼───────────────────────────────────────────────┤  │
│  │ Sales        │ → (102,B1) (105,B2) (108,B3) (112,B4)         │  │
│  └──────────────┴───────────────────────────────────────────────┘  │
│                                                                     │
│  Note: Records for each department are SCATTERED across blocks     │
└────────────────────────────────────────────────────────────────────┘

The Scattered Records Problem

Primary Index vs. Secondary Index Comparison
Characteristic	Primary Index	Secondary Index
Data ordering	Index field = ordering field	Index field ≠ ordering field
Physical proximity	Related records adjacent	Related records scattered
Number per table	At most one	Multiple allowed
Sparse index option	Yes (one entry per block)	Only on unique fields
Range query I/O	Sequential reads	Random reads
Index size	Smaller (sparse)	Larger (often dense)

Dense Secondary Indexes

Why dense indexes are often necessary:

Dense secondary index structure:

dense_secondary_index.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Dense Secondary Index on Social_Security_Number (Unique Field)
 
Data File (Ordered by Employee_ID):            Dense Secondary Index:
┌─────────────────────────────────────┐       ┌─────────────────────────────┐
│ Block 1:                            │       │  SSN         │ Record Ptr   │
│   (101, SSN: 456-78-9012, Alice)   │   ←───│  111-22-3333 │ → Block 3:2  │
│   (102, SSN: 789-01-2345, Bob)     │   ←───│  222-33-4444 │ → Block 4:1  │
│   (103, SSN: 345-67-8901, Carol)   │   ←───│  333-44-5555 │ → Block 2:3  │
├─────────────────────────────────────┤       │  345-67-8901 │ → Block 1:3  │
│ Block 2:                            │       │  456-78-9012 │ → Block 1:1  │
│   (104, SSN: 567-89-0123, David)   │   ←───│  555-66-7777 │ → Block 3:3  │
│   (105, SSN: 678-90-1234, Eve)     │   ←───│  567-89-0123 │ → Block 2:1  │
│   (106, SSN: 333-44-5555, Frank)   │   ←───│  678-90-1234 │ → Block 2:2  │
├─────────────────────────────────────┤       │  789-01-2345 │ → Block 1:2  │
│ Block 3:                            │       │  888-99-0000 │ → Block 4:2  │
│   (107, SSN: 111-22-3333, Grace)   │       └─────────────────────────────┘
│   (108, SSN: 890-12-3456, Henry)   │              ↑
│   (109, SSN: 555-66-7777, Iris)    │       Index is SORTED by SSN
├─────────────────────────────────────┤       Data is SORTED by Employee_ID
│ Block 4:                            │       
│   (110, SSN: 222-33-4444, Jake)    │       Notice: Index pointers go to
│   (111, SSN: 888-99-0000, Kate)    │       records scattered across 
│                                     │       ALL data blocks!
└─────────────────────────────────────┘

Size implications of dense indexes:

A dense index has one entry per record. For a table with 1,000,000 records:

Primary index (sparse, assuming 50 records/block): 20,000 entries
Dense secondary index: 1,000,000 entries

The secondary index is 50x larger than the primary index!

However, index entries are much smaller than full records. If each entry is 20 bytes (12-byte key + 8-byte pointer) vs. 200-byte records:

Secondary index: 1,000,000 × 20 = 20 MB
Data file: 1,000,000 × 200 = 200 MB

The index is still 10x smaller than the data—small enough to often fit in memory, making index traversal extremely fast.

Index-Only Access

Secondary Indexes on Non-Unique Fields

When a secondary index is built on a field with duplicate values (non-unique), the design becomes more complex. There are several approaches to handle this:

Option 1: Dense Index with Repeated Keys

Every record gets an index entry, even if the key value is repeated. Simple but potentially wasteful.

Option 2: Variable-Length Entries

Each unique key value has one entry containing a list of all record pointers. Space-efficient but complicates the index structure.

Option 3: Two-Level Indirection

The secondary index points to an intermediate structure (bucket or block of pointers) which then points to actual records.

non_unique_index_options.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
Option 1: Repeated Keys (Dense, Simple)
┌───────────────┬─────────────┐
│ Department    │ Record Ptr  │
├───────────────┼─────────────┤
│ Engineering   │ → 101       │   4 entries for "Engineering"
│ Engineering   │ → 103       │   Wastes space on repeated key
│ Engineering   │ → 106       │   But simple B-tree structure
│ Engineering   │ → 110       │
│ HR            │ → 107       │
│ HR            │ → 111       │
│ ...           │ ...         │
└───────────────┴─────────────┘
 
Option 2: Variable-Length Pointer Lists
┌───────────────┬─────────────────────────────────┐
│ Department    │ Record Pointers                 │
├───────────────┼─────────────────────────────────┤
│ Engineering   │ [101, 103, 106, 110]            │   1 entry per unique value
│ HR            │ [107, 111]                      │   Variable-length lists
│ Marketing     │ [104, 109]                      │   Efficient for reads
│ Sales         │ [102, 105, 108, 112]            │   Complex for updates
└───────────────┴─────────────────────────────────┘
 
Option 3: Bucket-Based Indirection
Secondary Index                 Pointer Buckets           Data File
┌──────────────┐               ┌───────────────┐         ┌──────────┐
│ Engineering ─┼──────────────►│ 101 │ 103 │   │    ┌───►│ Block 1  │
│              │               │ 106 │ 110 │   │────┤    │ ...      │
├──────────────┤               ├───────────────┤    │    ├──────────┤
│ HR ──────────┼──────────────►│ 107 │ 111 │   │────┼───►│ Block 2  │
│              │               ├───────────────┤    │    │ ...      │
├──────────────┤               │ ...           │    │    ├──────────┤
│ Sales ───────┼──────────────►│ 102 │ 105 │   │────┘    │ Block 3  │
│              │               │ 108 │ 112 │   │         └──────────┘
└──────────────┴───────────────┴───────────────┘
        ↑                              ↑
     Sparse index               Bucket pages for
     on unique keys             each key value

Modern B+-Tree Implementation

Non-Unique Index Implementation Tradeoffs
Approach	Space Efficiency	Query Speed	Update Speed	Implementation
Repeated keys	Low (duplicates)	Good	Good	Simple B+-tree
Variable-length lists	High	Good	Complex	Custom structure
Bucket indirection	Medium	Extra I/O	Medium	Two-level structure

Index Indirection and Performance Implications

The random I/O problem:

Contrast this with a primary index: once you locate the starting point, consecutive records are in consecutive disk locations, enabling efficient prefetching and sequential reads.

random_io_analysis.sql
Query Analysis
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Scenario: Retrieve all employees in 'Engineering' department
-- Data: 10,000 employees, 1,000 in Engineering
-- Data blocks: 200 (50 records per block on average)
 
-- Using Primary Index on Employee_ID (if searching by ID range):
-- I/O pattern: Sequential
-- Disk accesses: ~20-40 blocks (consecutive blocks)
-- Time: Fast (sequential prefetch works)
 
-- Using Secondary Index on Department:
-- I/O pattern: Random
-- Disk accesses: Could be up to 1,000 (one per Engineering employee)
-- Best case: ~200 blocks (Engineering employees evenly distributed)
-- Worst case: ~1,000 random accesses
 
-- The math of random I/O:
-- Sequential I/O: 0.1ms per block (disk head already positioned)
-- Random I/O: 10ms per block (seek + rotational latency)
 
-- 40 blocks sequential: 40 × 0.1ms = 4ms
-- 200 blocks random: 200 × 10ms = 2000ms = 2 seconds!
 
-- This explains why:
-- 1. Full table scans can beat secondary index lookups
-- 2. Query optimizers sometimes reject index usage
-- 3. "Index everything" is terrible advice

The Tipping Point

Strategies to mitigate random I/O:

Database systems employ several techniques to reduce the random I/O cost of secondary indexes:

Buffer pool caching: Cache frequently accessed blocks in memory
Index-organized tables: Some systems can cluster data by a secondary index key
Covering indexes: Include all needed columns in the index to avoid data access
Block prefetching: Read ahead when sequential access is predicted
Bitmap heap scans: Collect all block IDs, sort them, then access in order

When Secondary Indexes Excel

•Highly selective queries — Retrieving a small percentage of rows (<5-10%)
•Point queries — Looking up a single row by indexed value
•Index-only queries — Query satisfied entirely from index
•OLTP workloads — Many small, precise lookups
•Frequently accessed hot data — Index and data cached in buffer pool

Secondary Index Implementation in Modern Databases

Modern database systems implement secondary indexes primarily using B+-tree structures, but with important variations in how they reference data rows.

InnoDB Secondary Index Architecture:

In InnoDB, secondary indexes have a unique characteristic: they store the primary key as the pointer to the data row, not a physical row address.

This design has important implications:

Secondary index lookup requires two operations: (1) Find entry in secondary index → (2) Look up primary key in clustered index
This is called a double lookup or bookmark lookup
If you change the primary key, all secondary indexes automatically update (conceptually)
Physical row moves don't require secondary index updates

innodb_secondary_index.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
InnoDB Secondary Index Lookup Process
 
Secondary Index on Department         Clustered Index (on pk)
┌────────────────────────────┐        ┌────────────────────────────┐
│  Department  │ Primary Key │        │  pk  │  Full Row Data      │
├──────────────┼─────────────┤        ├──────┼─────────────────────┤
│ "Engineering"│     101     │───┐    │ 101  │ Alice, Engineering..│
│ "Engineering"│     103     │   │    │ 102  │ Bob, Sales, ...     │
│ "Engineering"│     106     │   │    │ 103  │ Carol, Engineering..│
│ "HR"         │     107     │   │    │ ...  │ ...                 │
└──────────────┴─────────────┘   │    └──────┴─────────────────────┘
                                 │           ▲
     Step 1: Find "Engineering"  │           │
     in secondary index         └───────────┘
                               Step 2: Use pk (101)
                               to find row in
                               clustered index
 
Total cost: 2 B+-tree traversals

INCLUDE Columns for Covering Indexes

Secondary Index Design Considerations

Create Secondary Index When

•Queries filter on this column frequently in WHERE clauses
•High selectivity — Column has many distinct values
•Join operations use this column as the join key
•ORDER BY on this column is common (if used with LIMIT)
•Read-heavy workload — Queries far exceed modifications
•Foreign key columns — Support referential integrity checks

Avoid Secondary Index When

•Low selectivity — Few distinct values (e.g., boolean, gender)
•Small tables — Full scan is faster than index access
•Write-heavy workload — Index maintenance exceeds query benefit
•Rarely queried columns — Index cost not justified
•Frequently updated columns — Index churn causes fragmentation
•Very wide columns — Large index entries waste space

Index Overhead Is Real

Selectivity and the break-even point:

Selectivity refers to what fraction of rows match a query predicate. A column with 1000 distinct values in a 100,000-row table has selectivity of 1000/100000 = 1%.

High selectivity (1-10%): Secondary index highly beneficial
Medium selectivity (10-30%): May or may not benefit; test empirically
Low selectivity (>30%): Full table scan likely faster

The exact threshold depends on row size, block size, and storage characteristics. Always validate with query plans and actual performance measurements.

Secondary Index Maintenance and Fragmentation

Types of fragmentation:

Index Fragmentation Types

•Logical fragmentation: Index pages out of physical order (page 5 might precede page 2 on disk). Affects sequential scans.
•Page density fragmentation: Pages only partially filled due to splits. Wastes space, increases I/O for scans.
•Deletion markers: Deleted entries leave 'tombstones' that must be scanned past. Reduces effective index density.

index_maintenance.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- SQL Server: Analyze and fix fragmentation
-- Check fragmentation level
SELECT 
    index_id,
    avg_fragmentation_in_percent,
    page_count
FROM sys.dm_db_index_physical_stats(
    DB_ID(), OBJECT_ID('Employee'), NULL, NULL, 'DETAILED'
);
 
-- Reorganize (online, for <30% fragmentation)
ALTER INDEX IX_Employee_Department 
ON Employee REORGANIZE;
 
-- Rebuild (more thorough, for >30% fragmentation)
ALTER INDEX IX_Employee_Department 
ON Employee REBUILD;

Index Maintenance Strategy

Summary: Secondary Index Mastery

We've explored secondary indexes in depth—the versatile index structures that provide alternative access paths to data without controlling physical storage ordering.

Key Takeaways

•Definition: Secondary indexes are built on non-ordering fields, providing lookup paths independent of physical data arrangement
•Scattered records: Unlike primary indexes, related records are physically distributed, resulting in random I/O patterns
•Dense nature: Secondary indexes typically maintain one entry per record (dense), making them larger than sparse primary indexes
•Multiple per table: Unlike primary indexes, you can create multiple secondary indexes on a single table
•Indirection cost: Secondary index access involves pointer traversal, and InnoDB specifically uses two-level lookup via primary key
•Random I/O tradeoff: For returning many rows (>5-20% of table), full table scan often outperforms secondary index access
•Write overhead: Every secondary index adds maintenance cost for INSERT/UPDATE/DELETE operations
•Design driven by workload: Index selection should be based on query patterns, selectivity analysis, and read/write ratio

What's next:

Page Complete

2 / 5