Loading learning content...
A table can have only one physical ordering—and thus only one primary (or clustered) index. But what happens when your queries need fast access through multiple fields? You might need to search employees by department, filter orders by customer, or look up products by category. These use cases demand secondary indexes.
Secondary indexes are the workhorses of database query optimization. While a primary index gets exclusive control over physical data ordering, secondary indexes provide alternative pathways to the same data. They enable multiple fast access patterns on a single table, transforming the database from a one-dimensional lookup table into a multi-dimensional query engine.
By the end of this page, you will understand how secondary indexes work without controlling physical ordering, the critical distinction between dense and sparse secondary indexes, why secondary indexes often require more storage than primary indexes, the concept of index indirection and its performance implications, and strategies for effective secondary index design.
A secondary index is an index on a field that is not the ordering field of the data file. The data file may be ordered on a different field (with a primary index), unordered (heap), or organized in any other manner—the secondary index doesn't care. It provides its own independent pathway to records.
Formal definition:
A secondary index (also called a non-clustered index) is an index structure where the indexed field's values are not in the same order as the physical records in the data file. The index entries contain search key values and pointers to the actual records (or to blocks containing those records).
Key insight: Because the data isn't sorted on the secondary index field, there's no guarantee that records with similar key values are physically adjacent. This has profound implications for query performance.
1234567891011121314151617181920212223242526
Data File (Ordered by Employee_ID = Primary Index)┌─────────────────────────────────────────────────────────────────────┐│ Block 1 │ (101, "Engineering") (102, "Sales") (103, "Engineering") ││ Block 2 │ (104, "Marketing") (105, "Sales") (106, "Engineering") ││ Block 3 │ (107, "HR") (108, "Sales") (109, "Marketing") ││ Block 4 │ (110, "Engineering") (111, "HR") (112, "Sales") │└─────────────────────────────────────────────────────────────────────┘ ↑ Data is sorted by Employee_ID, NOT by Department Secondary Index on Department (Non-Ordering Field)┌────────────────────────────────────────────────────────────────────┐│ Index Entry Format: [Department] → [List of Record Pointers] ││ ││ ┌──────────────┬───────────────────────────────────────────────┐ ││ │ Engineering │ → (101,B1) (103,B1) (106,B2) (110,B4) │ ││ ├──────────────┼───────────────────────────────────────────────┤ ││ │ HR │ → (107,B3) (111,B4) │ ││ ├──────────────┼───────────────────────────────────────────────┤ ││ │ Marketing │ → (104,B2) (109,B3) │ ││ ├──────────────┼───────────────────────────────────────────────┤ ││ │ Sales │ → (102,B1) (105,B2) (108,B3) (112,B4) │ ││ └──────────────┴───────────────────────────────────────────────┘ ││ ││ Note: Records for each department are SCATTERED across blocks │└────────────────────────────────────────────────────────────────────┘Unlike a primary index where records with adjacent key values are in adjacent blocks, secondary indexes point to records scattered throughout the data file. Searching for all 'Engineering' employees might require accessing 4 different blocks, even though there are only 4 matching records. This random I/O pattern is the fundamental cost of secondary indexes.
| Characteristic | Primary Index | Secondary Index |
|---|---|---|
| Data ordering | Index field = ordering field | Index field ≠ ordering field |
| Physical proximity | Related records adjacent | Related records scattered |
| Number per table | At most one | Multiple allowed |
| Sparse index option | Yes (one entry per block) | Only on unique fields |
| Range query I/O | Sequential reads | Random reads |
| Index size | Smaller (sparse) | Larger (often dense) |
When a secondary index is built on a unique field (one where each value appears exactly once), the index can theoretically be sparse. However, secondary indexes are commonly dense—containing one entry for every record in the data file, not one per block.
Why dense indexes are often necessary:
Consider the challenge: The data file isn't sorted on the secondary index field. This means records with adjacent key values might be in completely different blocks. A sparse index (one entry per block) wouldn't help us find records—we'd have no way to know which block contains a given value.
Dense secondary index structure:
123456789101112131415161718192021222324
Dense Secondary Index on Social_Security_Number (Unique Field) Data File (Ordered by Employee_ID): Dense Secondary Index:┌─────────────────────────────────────┐ ┌─────────────────────────────┐│ Block 1: │ │ SSN │ Record Ptr ││ (101, SSN: 456-78-9012, Alice) │ ←───│ 111-22-3333 │ → Block 3:2 ││ (102, SSN: 789-01-2345, Bob) │ ←───│ 222-33-4444 │ → Block 4:1 ││ (103, SSN: 345-67-8901, Carol) │ ←───│ 333-44-5555 │ → Block 2:3 │├─────────────────────────────────────┤ │ 345-67-8901 │ → Block 1:3 ││ Block 2: │ │ 456-78-9012 │ → Block 1:1 ││ (104, SSN: 567-89-0123, David) │ ←───│ 555-66-7777 │ → Block 3:3 ││ (105, SSN: 678-90-1234, Eve) │ ←───│ 567-89-0123 │ → Block 2:1 ││ (106, SSN: 333-44-5555, Frank) │ ←───│ 678-90-1234 │ → Block 2:2 │├─────────────────────────────────────┤ │ 789-01-2345 │ → Block 1:2 ││ Block 3: │ │ 888-99-0000 │ → Block 4:2 ││ (107, SSN: 111-22-3333, Grace) │ └─────────────────────────────┘│ (108, SSN: 890-12-3456, Henry) │ ↑│ (109, SSN: 555-66-7777, Iris) │ Index is SORTED by SSN├─────────────────────────────────────┤ Data is SORTED by Employee_ID│ Block 4: │ │ (110, SSN: 222-33-4444, Jake) │ Notice: Index pointers go to│ (111, SSN: 888-99-0000, Kate) │ records scattered across │ │ ALL data blocks!└─────────────────────────────────────┘Size implications of dense indexes:
A dense index has one entry per record. For a table with 1,000,000 records:
The secondary index is 50x larger than the primary index!
However, index entries are much smaller than full records. If each entry is 20 bytes (12-byte key + 8-byte pointer) vs. 200-byte records:
The index is still 10x smaller than the data—small enough to often fit in memory, making index traversal extremely fast.
If a query only needs values from the indexed columns, the database can satisfy the query entirely from the index without reading the data file at all. This is called an 'index-only scan' or 'covering index' behavior. Dense secondary indexes are particularly valuable for this optimization.
When a secondary index is built on a field with duplicate values (non-unique), the design becomes more complex. There are several approaches to handle this:
Option 1: Dense Index with Repeated Keys
Every record gets an index entry, even if the key value is repeated. Simple but potentially wasteful.
Option 2: Variable-Length Entries
Each unique key value has one entry containing a list of all record pointers. Space-efficient but complicates the index structure.
Option 3: Two-Level Indirection
The secondary index points to an intermediate structure (bucket or block of pointers) which then points to actual records.
1234567891011121314151617181920212223242526272829303132333435363738
Option 1: Repeated Keys (Dense, Simple)┌───────────────┬─────────────┐│ Department │ Record Ptr │├───────────────┼─────────────┤│ Engineering │ → 101 │ 4 entries for "Engineering"│ Engineering │ → 103 │ Wastes space on repeated key│ Engineering │ → 106 │ But simple B-tree structure│ Engineering │ → 110 ││ HR │ → 107 ││ HR │ → 111 ││ ... │ ... │└───────────────┴─────────────┘ Option 2: Variable-Length Pointer Lists┌───────────────┬─────────────────────────────────┐│ Department │ Record Pointers │├───────────────┼─────────────────────────────────┤│ Engineering │ [101, 103, 106, 110] │ 1 entry per unique value│ HR │ [107, 111] │ Variable-length lists│ Marketing │ [104, 109] │ Efficient for reads│ Sales │ [102, 105, 108, 112] │ Complex for updates└───────────────┴─────────────────────────────────┘ Option 3: Bucket-Based IndirectionSecondary Index Pointer Buckets Data File┌──────────────┐ ┌───────────────┐ ┌──────────┐│ Engineering ─┼──────────────►│ 101 │ 103 │ │ ┌───►│ Block 1 ││ │ │ 106 │ 110 │ │────┤ │ ... │├──────────────┤ ├───────────────┤ │ ├──────────┤│ HR ──────────┼──────────────►│ 107 │ 111 │ │────┼───►│ Block 2 ││ │ ├───────────────┤ │ │ ... │├──────────────┤ │ ... │ │ ├──────────┤│ Sales ───────┼──────────────►│ 102 │ 105 │ │────┘ │ Block 3 ││ │ │ 108 │ 112 │ │ └──────────┘└──────────────┴───────────────┴───────────────┘ ↑ ↑ Sparse index Bucket pages for on unique keys each key valueMost modern databases use B+-trees for secondary indexes, storing (key, pointer) pairs in sorted order. For non-unique keys, duplicate entries are allowed in the leaf nodes. The tree remains balanced, and all operations maintain O(log n) complexity. The 'pointer buckets' approach is less common in modern systems.
| Approach | Space Efficiency | Query Speed | Update Speed | Implementation |
|---|---|---|---|---|
| Repeated keys | Low (duplicates) | Good | Good | Simple B+-tree |
| Variable-length lists | High | Good | Complex | Custom structure |
| Bucket indirection | Medium | Extra I/O | Medium | Two-level structure |
Secondary indexes use indirection—they don't contain data, they contain pointers to data. This layer of indirection has significant performance implications that every database professional must understand.
The random I/O problem:
When you traverse a secondary index and then follow pointers to retrieve data, each pointer might lead to a different disk block. If you're retrieving 100 records, you might need 100 separate random disk accesses—even if the data exists in far fewer blocks.
Contrast this with a primary index: once you locate the starting point, consecutive records are in consecutive disk locations, enabling efficient prefetching and sequential reads.
1234567891011121314151617181920212223242526
-- Scenario: Retrieve all employees in 'Engineering' department-- Data: 10,000 employees, 1,000 in Engineering-- Data blocks: 200 (50 records per block on average) -- Using Primary Index on Employee_ID (if searching by ID range):-- I/O pattern: Sequential-- Disk accesses: ~20-40 blocks (consecutive blocks)-- Time: Fast (sequential prefetch works) -- Using Secondary Index on Department:-- I/O pattern: Random-- Disk accesses: Could be up to 1,000 (one per Engineering employee)-- Best case: ~200 blocks (Engineering employees evenly distributed)-- Worst case: ~1,000 random accesses -- The math of random I/O:-- Sequential I/O: 0.1ms per block (disk head already positioned)-- Random I/O: 10ms per block (seek + rotational latency) -- 40 blocks sequential: 40 × 0.1ms = 4ms-- 200 blocks random: 200 × 10ms = 2000ms = 2 seconds! -- This explains why:-- 1. Full table scans can beat secondary index lookups-- 2. Query optimizers sometimes reject index usage-- 3. "Index everything" is terrible adviceFor queries returning more than 5-20% of a table, a full table scan with sequential I/O often outperforms secondary index access with random I/O. Query optimizers make this decision automatically based on statistics—this is why the same query might use an index for a selective WHERE clause but scan the table for a less selective one.
Strategies to mitigate random I/O:
Database systems employ several techniques to reduce the random I/O cost of secondary indexes:
Modern database systems implement secondary indexes primarily using B+-tree structures, but with important variations in how they reference data rows.
InnoDB Secondary Index Architecture:
In InnoDB, secondary indexes have a unique characteristic: they store the primary key as the pointer to the data row, not a physical row address.
This design has important implications:
12345678910111213141516171819
InnoDB Secondary Index Lookup Process Secondary Index on Department Clustered Index (on pk)┌────────────────────────────┐ ┌────────────────────────────┐│ Department │ Primary Key │ │ pk │ Full Row Data │├──────────────┼─────────────┤ ├──────┼─────────────────────┤│ "Engineering"│ 101 │───┐ │ 101 │ Alice, Engineering..││ "Engineering"│ 103 │ │ │ 102 │ Bob, Sales, ... ││ "Engineering"│ 106 │ │ │ 103 │ Carol, Engineering..││ "HR" │ 107 │ │ │ ... │ ... │└──────────────┴─────────────┘ │ └──────┴─────────────────────┘ │ ▲ Step 1: Find "Engineering" │ │ in secondary index └───────────┘ Step 2: Use pk (101) to find row in clustered index Total cost: 2 B+-tree traversalsSQL Server and PostgreSQL support INCLUDE clause in index creation. This adds non-key columns to the leaf level without affecting the tree structure. Example: CREATE INDEX idx ON Orders(CustomerID) INCLUDE (TotalAmount, OrderDate). This enables index-only scans for more queries.
Creating effective secondary indexes requires balancing query performance against maintenance costs. Unlike primary indexes (which are often predetermined by the table's natural key), secondary indexes are strategic choices that should be driven by workload analysis.
Every secondary index must be updated on INSERT, and potentially on UPDATE/DELETE. With 5 secondary indexes, an INSERT becomes 6 write operations (data + 5 index updates). For high-volume OLTP, excessive indexes can reduce write throughput by 50% or more.
Selectivity and the break-even point:
Selectivity refers to what fraction of rows match a query predicate. A column with 1000 distinct values in a 100,000-row table has selectivity of 1000/100000 = 1%.
The exact threshold depends on row size, block size, and storage characteristics. Always validate with query plans and actual performance measurements.
Secondary indexes require ongoing maintenance to remain performant. Over time, insert, update, and delete operations can cause fragmentation—where the index structure becomes less efficient than its ideal form.
Types of fragmentation:
1234567891011121314151617
-- SQL Server: Analyze and fix fragmentation-- Check fragmentation levelSELECT index_id, avg_fragmentation_in_percent, page_countFROM sys.dm_db_index_physical_stats( DB_ID(), OBJECT_ID('Employee'), NULL, NULL, 'DETAILED'); -- Reorganize (online, for <30% fragmentation)ALTER INDEX IX_Employee_Department ON Employee REORGANIZE; -- Rebuild (more thorough, for >30% fragmentation)ALTER INDEX IX_Employee_Department ON Employee REBUILD;Regular index maintenance should be scheduled based on workload: OLTP systems with heavy writes may need weekly or daily maintenance; analytics systems with batch loads can maintain after load cycles. Monitor fragmentation levels and set thresholds for action (e.g., reorganize at 10%, rebuild at 30%).
We've explored secondary indexes in depth—the versatile index structures that provide alternative access paths to data without controlling physical storage ordering.
What's next:
With primary and secondary indexes understood, we'll explore unique indexes—a specialized index type that combines fast lookup with constraint enforcement. Unique indexes serve dual purposes: accelerating queries while guaranteeing data integrity.
You now have comprehensive understanding of secondary indexes: their structure, the random I/O implications, dense vs sparse considerations, and how they're implemented across major database systems. This knowledge enables you to make informed decisions about when and how to create secondary indexes.