When designing an index for a database table, one of the most fundamental decisions concerns coverage: should the index contain an entry for every record in the underlying data file, or should it point to only a subset of records? This decision fundamentally shapes the index's storage footprint, search characteristics, and maintenance overhead.
A dense index answers this question with full commitment—it maintains one index entry for every single record in the data file. This comprehensive approach provides maximum lookup precision but requires proportionally more storage. Understanding dense indexes is essential before exploring the alternatives, as the dense approach represents the conceptually purest form of indexing.
By the end of this page, you will understand the complete architecture of dense indexes—from their formal definition to their internal structure, storage characteristics, search algorithms, maintenance operations, and practical applications. You'll be able to analyze when dense indexing is the optimal choice and recognize its implementation in real database systems.
A dense index is an index structure where there exists exactly one index entry for every record (tuple) in the indexed data file. Each index entry consists of a search key value and a pointer (record identifier or physical address) that directly locates the corresponding record in the underlying data file.
Formally:
Given a data file $D$ containing $n$ records $\{r_1, r_2, \ldots, r_n\}$ and an indexing attribute (search key) $K$, a dense index $I$ on $K$ satisfies:
$$|I| = n$$
where each index entry $e_i \in I$ is a pair $(k_i, p_i)$ such that $k_i$ is the value of the search key $K$ in record $r_i$, and $p_i$ is a pointer locating $r_i$ in the data file.
This one-to-one correspondence between data records and index entries is the defining characteristic that distinguishes dense indexes from their sparse counterparts.
An index entry is a compact metadata structure—typically containing just the key value and a pointer (often 4-12 bytes total). This is dramatically smaller than the full data record, which may contain dozens of columns and hundreds of bytes. This size differential is what makes indexing valuable despite the storage overhead.
The Pointer Abstraction:
The 'pointer' in an index entry can take several forms depending on the database architecture:
Physical Address (Page ID + Slot Number): Direct byte offset or page/slot coordinates within the data file. Example: (PageID: 4527, Slot: 12)
Row Identifier (RID): A logical identifier that the storage engine translates to a physical location. Example: RID: 0x00000A340000000C
Primary Key Value: In secondary indexes, the pointer may be the primary key of the record, requiring a subsequent lookup in the primary index (this is common in InnoDB).
The choice of pointer representation affects lookup performance, storage overhead, and the complexity of maintaining indexes during data reorganization.
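These pointer forms can be sketched as small Python types; the class and field names here are illustrative, not tied to any specific database engine:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PageSlotPointer:
    """Physical address: page ID plus slot number within the page."""
    page_id: int
    slot: int

@dataclass(frozen=True)
class PrimaryKeyPointer:
    """Logical pointer: the primary key of the target row (InnoDB-style).
    Resolving it requires a second lookup in the primary index."""
    pk: int

@dataclass(frozen=True)
class IndexEntry:
    """A dense-index entry: a search key value plus a pointer."""
    key: str
    pointer: object  # PageSlotPointer or PrimaryKeyPointer

# Example entry mirroring the (PageID: 4527, Slot: 12) form above
entry = IndexEntry(key="E101", pointer=PageSlotPointer(page_id=4527, slot=12))
```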
A dense index is not merely a list of key-pointer pairs—it's a carefully organized structure optimized for search operations. Let's dissect its components layer by layer.
Visual Structure:
Consider a dense index on an Employee table with employee_id as the search key:
DATA FILE (Employee records):
┌─────────────────────────────────────────────────────────────┐
│ Block 0: [E101, "Alice", "Engineering", 95000] │
│ [E102, "Bob", "Marketing", 72000] │
│ [E103, "Carol", "Engineering", 88000] │
├─────────────────────────────────────────────────────────────┤
│ Block 1: [E104, "David", "Sales", 67000] │
│ [E105, "Eve", "Engineering", 102000] │
│ [E106, "Frank", "HR", 58000] │
├─────────────────────────────────────────────────────────────┤
│ Block 2: [E107, "Grace", "Marketing", 79000] │
│ ... │
└─────────────────────────────────────────────────────────────┘
DENSE INDEX (on employee_id):
┌─────────────────────────────────┐
│ [E101 → (Block 0, Slot 0)] │ ← Entry for EVERY record
│ [E102 → (Block 0, Slot 1)] │
│ [E103 → (Block 0, Slot 2)] │
│ [E104 → (Block 1, Slot 0)] │
│ [E105 → (Block 1, Slot 1)] │
│ [E106 → (Block 1, Slot 2)] │
│ [E107 → (Block 2, Slot 0)] │
│ ... │
└─────────────────────────────────┘
Every single employee record has a corresponding index entry—this is the hallmark of dense indexing.
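The one-entry-per-record property is easy to sketch in Python. This toy model (block layout and helper name are illustrative) builds exactly one sorted `(key, (block, slot))` entry per record:

```python
# Data file modeled as a list of blocks, each a list of records (illustrative).
data_file = [
    [("E101", "Alice", "Engineering", 95000),
     ("E102", "Bob", "Marketing", 72000),
     ("E103", "Carol", "Engineering", 88000)],
    [("E104", "David", "Sales", 67000),
     ("E105", "Eve", "Engineering", 102000),
     ("E106", "Frank", "HR", 58000)],
    [("E107", "Grace", "Marketing", 79000)],
]

def build_dense_index(blocks):
    """One (key, (block, slot)) entry per record, kept sorted by key."""
    entries = [(rec[0], (b, s))
               for b, block in enumerate(blocks)
               for s, rec in enumerate(block)]
    return sorted(entries)

index = build_dense_index(data_file)
# Dense property: |I| = n
assert len(index) == sum(len(block) for block in data_file)
```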
For a dense index with a 4-byte integer key and an 8-byte pointer, each entry consumes 12 bytes. A table with 10 million records requires roughly 120 MB just for the index entries—before considering any tree structure overhead. This storage cost is the primary trade-off of dense indexing.
The power of a dense index lies in its search efficiency. Because every record has a corresponding index entry, and entries are sorted, we can apply efficient search algorithms that dramatically outperform scanning the data file directly.
Binary Search on Dense Index:
With sorted entries, binary search locates any key in logarithmic time:
BINARY_SEARCH(index I, key K):
    low ← 0
    high ← |I| - 1
    while low ≤ high:
        mid ← ⌊(low + high) / 2⌋
        if I[mid].key = K:
            return I[mid].pointer   // Found
        else if I[mid].key < K:
            low ← mid + 1
        else:
            high ← mid - 1
    return NULL                     // Not found
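The pseudocode translates directly to Python; here entries are `(key, pointer)` pairs sorted by key, with `(block, slot)` pointers as in the diagram above:

```python
def binary_search(index, key):
    """Return the pointer for `key`, or None if absent.
    `index` is a list of (key, pointer) pairs sorted by key."""
    low, high = 0, len(index) - 1
    while low <= high:
        mid = (low + high) // 2          # floor division = ⌊(low+high)/2⌋
        mid_key, mid_ptr = index[mid]
        if mid_key == key:
            return mid_ptr               # Found
        elif mid_key < key:
            low = mid + 1
        else:
            high = mid - 1
    return None                          # Not found

index = [("E101", (0, 0)), ("E102", (0, 1)), ("E103", (0, 2)),
         ("E104", (1, 0)), ("E105", (1, 1)), ("E106", (1, 2)),
         ("E107", (2, 0))]
assert binary_search(index, "E105") == (1, 1)
assert binary_search(index, "E999") is None
```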
Complexity Analysis:
Binary search performs $O(\log_2 n)$ comparisons: for 10 million records, at most $\lceil \log_2 10^7 \rceil = 24$. I/O matters more than comparisons: with 500 entries per block, the index occupies 20,000 blocks, and binary search over a flat sorted index touches about $\lceil \log_2 20{,}000 \rceil \approx 15$ of them. Organizing the same entries as a B-tree with fanout ~500 cuts this to only ~3-5 block reads to locate any record.
With a dense index, finding any single record never requires examining more than O(log n) index entries. For a table with 1 billion records, this means at most ~30 comparisons—regardless of which record you seek. This predictable, bounded performance is the foundation of efficient database operations.
Dense indexes provide complete coverage at the cost of storage. Understanding this trade-off requires careful analysis of both the absolute and relative storage requirements.
| Component | Size | Notes |
|---|---|---|
| Search Key (INT) | 4 bytes | Standard 32-bit integer primary key |
| Pointer (RID) | 8 bytes | Page ID (4 bytes) + Slot (4 bytes) |
| Entry Total | 12 bytes | Key + Pointer per record |
| 10M Records | 120 MB | Index entries alone |
| With Tree Overhead (~20%) | 144 MB | B-tree internal nodes |
Comparative Storage Analysis:
Consider a realistic scenario comparing data file size to dense index size:
Scenario: Employee table with 10 million records
DATA FILE:
├── employee_id (INT): 4 bytes
├── name (VARCHAR(100)): ~50 bytes avg
├── department (VARCHAR(50)): ~25 bytes avg
├── salary (DECIMAL): 8 bytes
├── hire_date (DATE): 4 bytes
├── email (VARCHAR(100)): ~40 bytes avg
├── Row header/overhead: ~20 bytes
└── TOTAL per row: ~151 bytes
Data File Size: 10M × 151 bytes ≈ 1.51 GB
DENSE INDEX (on employee_id):
├── Entry size: 12 bytes
├── B-tree overhead: ~20%
└── TOTAL: ~144 MB
Index-to-Data Ratio: 144 MB / 1.51 GB ≈ 9.5%
The dense index consumes roughly 9-10% of the data file size—substantial but often worthwhile for frequently-queried columns.
Using UUIDs as primary keys (common in distributed systems) transforms the storage calculation dramatically. A UUID stored in its 36-character text form plus an 8-byte pointer yields 44-byte entries. For 10 million records, the index entries alone require 440 MB, nearly 3.7x the integer-key scenario (a 16-byte binary UUID narrows the gap but does not close it). This often motivates hybrid approaches that use internal integer IDs with UUID mappings.
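The arithmetic behind these figures is easy to verify; the sizes below are the chapter's assumed figures, and the helper name is illustrative:

```python
def index_size_mb(n_records, key_bytes, ptr_bytes=8, tree_overhead=0.0):
    """Estimated dense-index size in MB (using 1 MB = 10**6 bytes)."""
    entry_bytes = key_bytes + ptr_bytes
    return n_records * entry_bytes * (1 + tree_overhead) / 1_000_000

n = 10_000_000
int_entries  = index_size_mb(n, key_bytes=4)                    # 4-byte INT key
uuid_entries = index_size_mb(n, key_bytes=36)                   # 36-char text UUID
int_with_tree = index_size_mb(n, key_bytes=4, tree_overhead=0.20)

print(f"INT key, entries only:  {int_entries:.0f} MB")   # 120 MB
print(f"UUID key, entries only: {uuid_entries:.0f} MB")  # 440 MB
print(f"INT key + 20% tree:     {int_with_tree:.0f} MB") # 144 MB
print(f"UUID/INT ratio:         {uuid_entries / int_entries:.1f}x")
```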
Dense indexes must be maintained in synchronization with the underlying data. Every insert, update, or delete on indexed columns triggers corresponding index modifications. Understanding these operations reveals the maintenance cost of dense indexing.
Inserting a New Record:
INSERT_WITH_DENSE_INDEX(table T, record R, index I):
    // Step 1: Insert record into data file
    pointer ← INSERT_INTO_DATA_FILE(T, R)
    // Step 2: Create index entry
    key ← EXTRACT_KEY(R, I.key_column)
    entry ← (key, pointer)
    // Step 3: Insert entry at correct sorted position
    INSERT_SORTED(I, entry)    // May cause block splits
Complexity: appending the record to a heap data file is O(1); locating the entry's sorted position in the index costs O(log n); and the entry insertion itself may cascade block (node) splits, for O(log n) maintenance work per index.
Consideration: Every new record requires a new index entry—no exceptions in a dense index.
Every data modification potentially triggers index modifications on ALL dense indexes for that table. A table with 5 dense indexes experiences 5x write amplification for indexed columns. This is why over-indexing is a common performance anti-pattern—the write cost can overwhelm the read benefits.
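A minimal sketch of this maintenance cost, assuming a heap data file and dense indexes modeled as sorted lists kept with the standard `bisect` module (class and field names are illustrative). Write amplification shows up directly: one index write per dense index on every insert:

```python
import bisect

class Table:
    """Toy heap table with dense indexes as sorted (key, rid) lists."""
    def __init__(self, indexed_columns):
        self.rows = []                                   # heap data file
        self.indexes = {col: [] for col in indexed_columns}
        self.index_writes = 0                            # write-amplification counter

    def insert(self, row):
        rid = len(self.rows)                             # Step 1: append to data file
        self.rows.append(row)
        for col, entries in self.indexes.items():        # Steps 2-3: one entry per index
            bisect.insort(entries, (row[col], rid))      # insert at sorted position
            self.index_writes += 1
        return rid

t = Table(indexed_columns=["id", "email", "dept", "salary", "hired"])
t.insert({"id": 1, "email": "a@x", "dept": "Eng", "salary": 95000, "hired": 2020})
assert t.index_writes == 5   # 5 dense indexes -> 5 index writes per insert
```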
The behavior and utility of a dense index depends significantly on whether the underlying data file is sorted on the indexed attribute. This distinction creates two fundamentally different use cases.
The Range Query Distinction:
QUERY: SELECT * FROM Employee WHERE salary BETWEEN 80000 AND 100000
Scenario A: Dense index on salary, data file sorted by salary
┌──────────────────────────────────────────────────────────────┐
│ Index scan finds contiguous range │
│ Data records for range are physically adjacent │
│ Sequential disk read: ~3-5 I/O operations │
│ Excellent cache locality │
└──────────────────────────────────────────────────────────────┘
Scenario B: Dense index on salary, data file sorted by employee_id
┌──────────────────────────────────────────────────────────────┐
│ Index scan finds contiguous range in INDEX │
│ But data records are scattered throughout data file │
│ Random disk reads: potentially 1 I/O per matching record │
│ Poor cache locality │
└──────────────────────────────────────────────────────────────┘
For range queries, the performance difference can be 100x or more depending on data volume and I/O characteristics.
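The Scenario A vs. B gap can be made concrete by counting distinct data blocks touched. This simplified model (block size and query shape are illustrative) charges one read per distinct block:

```python
def blocks_touched(rids, records_per_block=100):
    """Count distinct data blocks read for a set of matching record ids."""
    return len({rid // records_per_block for rid in rids})

# 500 matching records out of 10,000:
# Scenario A: data sorted on the queried key -> matches are contiguous rids
contiguous = range(4_000, 4_500)
# Scenario B: data sorted on another key -> matches scattered across the file
scattered = range(0, 10_000, 20)

print(blocks_touched(contiguous))  # 5 blocks, read sequentially
print(blocks_touched(scattered))   # 100 blocks, read randomly
```

Same result size, 20x the block reads; with random instead of sequential I/O on spinning disks, the real-world gap grows far larger.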
A data file can only be physically sorted in ONE order. Therefore, only ONE index can have a clustering relationship with the data. All other indexes are necessarily non-clustering, making this choice architecturally significant. Choosing the clustered index column is one of the most impactful database design decisions.
Dense indexes are not universally optimal—they represent a specific set of trade-offs. Understanding when these trade-offs favor dense indexing helps you make informed design decisions.
In practice, ALL secondary indexes are dense indexes. Since the data file can only be sorted on one attribute (used by the primary/clustered index), any additional index on a different column must include entries for every record to provide complete search capability. This makes understanding dense indexing essential for any non-trivial database schema.
Real-World Application Patterns:
| Use Case | Why Dense Index Works |
|---|---|
| User authentication (lookup by email) | Exact-match queries on unique unsorted column |
| Order lookup by customer ID | Secondary index on customer FK |
| Product search by SKU | Unique identifier point queries |
| Session management by session token | Random access to heap-organized data |
| Audit log by transaction ID | Unique identifier in append-optimized table |
Modern database systems implement dense indexing in various ways, often as part of more sophisticated index structures. Understanding these implementations connects theory to practice.
| Database | Dense Index Implementation | Key Characteristics |
|---|---|---|
| PostgreSQL | B-tree secondary indexes | Dense by default; each entry points to ctid (physical tuple) |
| MySQL InnoDB | Secondary indexes | Dense; entries contain primary key (requires PK lookup) |
| Oracle | Non-unique B-tree indexes | Dense on non-sorted columns; supports index compression |
| SQL Server | Non-clustered indexes | Dense; can include additional columns (covering index) |
| MongoDB | Secondary indexes | Dense B-tree; entries reference _id or document location |
-- Dense index automatically created on PRIMARY KEY
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,   -- Implicit B-tree index (dense)
    email VARCHAR(100) UNIQUE,        -- Implicit unique index (dense)
    department_id INT,
    salary DECIMAL(10,2)
);

-- Explicit dense secondary index on department_id
CREATE INDEX idx_employees_dept ON employees(department_id);

-- Partial dense index (dense over subset of rows)
CREATE INDEX idx_high_salary ON employees(salary)
WHERE salary > 100000;

-- Verify index structure
SELECT indexname, indexdef FROM pg_indexes
WHERE tablename = 'employees';

MySQL InnoDB stores the primary key value (not a physical pointer) in secondary index entries. Looking up via a secondary index therefore requires: (1) a secondary index traversal to find the primary key, then (2) a primary key index traversal to find the actual row. This 'double lookup' pattern trades storage efficiency for reduced index maintenance during row relocations.
We've thoroughly examined dense indexes—the complete, one-entry-per-record approach to database indexing. Let's consolidate the essential knowledge:
Looking Ahead:
Dense indexes represent one extreme of the coverage spectrum—complete but storage-intensive. In the next page, we'll explore sparse indexes, which take the opposite approach by indexing only selected records. Understanding both extremes, and the mathematical relationship between them, will equip you to make informed indexing decisions in any database design scenario.
You now have a comprehensive understanding of dense indexes—their structure, algorithms, storage implications, and practical applications. This foundation is essential for understanding the sparse index alternative we'll explore next.