The choice between dense and sparse indexing fundamentally trades storage space for search simplicity. Dense indexes provide direct record access at the cost of a larger index; sparse indexes achieve dramatic space savings at the cost of an extra in-block scan during lookups.
This page develops a quantitative framework for analyzing this trade-off. By understanding the mathematical relationships between record count, block size, entry size, and index dimensions, you'll be able to make informed decisions about index design and accurately predict storage requirements.
By the end of this page, you will be able to calculate index sizes precisely, compare dense vs sparse storage requirements for any scenario, analyze the impact of block size and entry size on index dimensions, and understand how index size affects memory utilization and I/O performance.
Let's establish the mathematical foundation for index size analysis. We'll define precise notation and derive the core formulas.
Variable Definitions:
| Symbol | Definition | Typical Values |
|---|---|---|
| $n$ | Number of records in data file | 10³ to 10¹⁰ |
| $R$ | Record size (bytes) | 100 - 2000 |
| $B$ | Block/page size (bytes) | 4,096 - 65,536 |
| $b_f$ | Blocking factor (records per block) = $\lfloor B/R \rfloor$ | 20 - 500 |
| $b$ | Number of data blocks = $\lceil n/b_f \rceil$ | Derived |
| $K$ | Key size (bytes) | 4 - 128 |
| $P$ | Pointer size (bytes) | 4 - 8 |
| $E$ | Index entry size = $K + P$ | 8 - 136 |
| $i_f$ | Index blocking factor = $\lfloor B/E \rfloor$ | 50 - 1000 |
Dense Index Size:
A dense index contains exactly $n$ entries (one per record):
$$S_{dense} = n \times E \text{ bytes}$$
Number of index blocks:
$$b_{dense} = \lceil n / i_f \rceil$$
Sparse Index Size:
A sparse index contains exactly $b$ entries (one per data block):
$$S_{sparse} = b \times E = \lceil n / b_f \rceil \times E \text{ bytes}$$
Number of index blocks:
$$b_{sparse} = \lceil b / i_f \rceil = \left\lceil \frac{n}{b_f \times i_f} \right\rceil$$
The ratio of sparse to dense index size equals the inverse of the blocking factor:
$$\frac{S_{sparse}}{S_{dense}} = \frac{b}{n} = \frac{1}{b_f}$$
If each block holds 100 records, the sparse index is roughly 100× smaller (exactly so when $n$ is a multiple of $b_f$). This ratio is independent of total record count—it depends only on record size and block size.
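To make the formulas concrete, here is a minimal sketch, runnable in psql, that plugs illustrative values into the definitions above. The inputs (10 million 100-byte records, 8 KB blocks, 16-byte entries) are assumptions chosen for the example, not figures from the text:

```sql
-- Hypothetical inputs: n = 10M records, B = 8192, R = 100, E = 16
WITH p AS (SELECT 10000000::bigint AS n, 8192 AS blk, 100 AS rec, 16 AS entry)
SELECT blk / rec                                      AS bf,           -- b_f = floor(B/R) = 81
       ceil(n / (blk / rec)::numeric)::bigint         AS data_blocks,  -- b = ceil(n/b_f)
       n * entry                                      AS dense_bytes,  -- S_dense = n * E
       ceil(n / (blk / rec)::numeric)::bigint * entry AS sparse_bytes  -- S_sparse = b * E
FROM p;
```

The dense index comes out near 160 MB, the sparse one near 2 MB—a ratio of $b_f = 81$, as the formula predicts.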
Let's apply these formulas to a realistic e-commerce scenario to see the concrete implications.
Scenario: Order History Table
Table: orders
├── order_id (BIGINT): 8 bytes ← Primary Key
├── customer_id (BIGINT): 8 bytes
├── order_date (TIMESTAMP): 8 bytes
├── status (VARCHAR(20)): 21 bytes
├── total_amount (DECIMAL): 16 bytes
├── shipping_address: 100 bytes (avg)
├── billing_address: 100 bytes (avg)
├── metadata (JSON): 200 bytes (avg)
├── Row overhead: 20 bytes
└── TOTAL RECORD SIZE: 481 bytes ≈ 500 bytes
Configuration:
├── Record count (n): 50,000,000 (50 million orders)
├── Block size (B): 8,192 bytes (8 KB)
├── Key size (K): 8 bytes (BIGINT order_id)
├── Pointer size (P): 8 bytes
└── Entry size (E): 16 bytes
Calculations:
Step 1: Data File Metrics
Blocking factor: b_f = ⌊8192 / 500⌋ = 16 records/block
Data blocks: b = ⌈50,000,000 / 16⌉ = 3,125,000 blocks
Data file size: 3,125,000 × 8 KB ≈ 25.6 GB
Step 2: Dense Index Size
Dense entries: n = 50,000,000
Dense index size: 50,000,000 × 16 bytes = 800 MB
Index blocking factor: i_f = ⌊8192 / 16⌋ = 512 entries/block
Dense index blocks: ⌈50,000,000 / 512⌉ = 97,657 blocks
Step 3: Sparse Index Size
Sparse entries: b = 3,125,000
Sparse index size: 3,125,000 × 16 bytes = 50 MB
Sparse index blocks: ⌈3,125,000 / 512⌉ = 6,104 blocks
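As a sanity check, the same arithmetic can be reproduced directly in psql; the constants simply mirror the scenario above:

```sql
-- Orders scenario: n = 50M, B = 8192 bytes, R = 500 bytes, E = 16 bytes
WITH p AS (SELECT 50000000::numeric AS n, 8192 AS blk, 500 AS rec, 16 AS entry)
SELECT blk / rec                                  AS bf,                -- 16 records/block
       ceil(n / (blk / rec))                      AS data_blocks,       -- 3,125,000
       round(n * entry / 1e6)                     AS dense_mb,          -- 800
       ceil(n / (blk / entry))                    AS dense_idx_blocks,  -- 97,657
       round(ceil(n / (blk / rec)) * entry / 1e6) AS sparse_mb,         -- 50
       ceil(n / ((blk / rec) * (blk / entry)))    AS sparse_idx_blocks  -- 6,104
FROM p;
```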
| Metric | Dense Index | Sparse Index | Ratio |
|---|---|---|---|
| Entries | 50,000,000 | 3,125,000 | 16:1 |
| Raw Size | 800 MB | 50 MB | 16:1 |
| Index Blocks | 97,657 | 6,104 | 16:1 |
| % of Data Size | 3.1% | 0.2% | 16:1 |
An 800 MB dense index likely exceeds typical buffer pool allocations and will require disk I/O for many lookups. A 50 MB sparse index fits comfortably in memory on almost any server, enabling pure in-memory index traversal. This 16× difference directly impacts query latency.
The blocking factor ($b_f$ = records per block) is the key determinant of sparse index efficiency. Let's analyze how record size and block size interact to affect this crucial ratio.
Blocking Factor Formula:
$$b_f = \left\lfloor \frac{B}{R} \right\rfloor$$
where $B$ is the block size and $R$ is the record size, both in bytes.
Space Compression Ratio:
$$\text{Compression} = \frac{\text{Dense Size}}{\text{Sparse Size}} = b_f = \left\lfloor \frac{B}{R} \right\rfloor$$
Higher blocking factors yield greater sparse index advantages; the table below quantifies this across common record and block sizes, and a short query after the analysis regenerates it.
| Record Size | Block Size 4KB | Block Size 8KB | Block Size 16KB |
|---|---|---|---|
| 50 bytes | 81 records/block | 163 records/block | 327 records/block |
| 100 bytes | 40 records/block | 81 records/block | 163 records/block |
| 200 bytes | 20 records/block | 40 records/block | 81 records/block |
| 500 bytes | 8 records/block | 16 records/block | 32 records/block |
| 1000 bytes | 4 records/block | 8 records/block | 16 records/block |
| 2000 bytes | 2 records/block | 4 records/block | 8 records/block |
Analysis Insights:
Small Records Favor Sparse: With 50-byte records and 8KB blocks, sparse indexes are 163× smaller than dense. The space savings are dramatic.
Large Records Diminish Advantage: With 2000-byte records and 4KB blocks, only 2 records fit per block, making sparse indexes only 2× smaller. The advantage becomes marginal.
Block Size Scaling: Larger blocks increase blocking factor linearly, amplifying sparse index advantages proportionally.
Threshold Consideration: When $b_f < 10$, the sparse index advantage may not justify the two-phase search overhead in some workloads.
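The blocking-factor table is easy to regenerate for other configurations; a small query (the record and block sizes are those from the table) does it in one pass:

```sql
-- b_f = floor(B / R); integer division performs the flooring
SELECT r AS record_bytes,
       4096 / r  AS bf_4kb,
       8192 / r  AS bf_8kb,
       16384 / r AS bf_16kb
FROM unnest(ARRAY[50, 100, 200, 500, 1000, 2000]) AS r;
```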
With variable-length records, the blocking factor varies across blocks. Use the average record size for estimation, but expect actual sparse index sizes to deviate by ±10-20% from predictions. Some blocks will be densely packed (small records), others sparse (large records).
While the dense-to-sparse ratio depends only on blocking factor, the absolute size of both indexes scales with key size. This has significant practical implications. The comparison below assumes 10 million records, 80 records per block, and an 8-byte pointer; a query reproducing it follows the key observations.
| Key Type | Key Size | Entry Size | Dense Size | Sparse Size |
|---|---|---|---|---|
| INT | 4 bytes | 12 bytes | 120 MB | 1.5 MB |
| BIGINT | 8 bytes | 16 bytes | 160 MB | 2.0 MB |
| UUID (binary) | 16 bytes | 24 bytes | 240 MB | 3.0 MB |
| UUID (string) | 36 bytes | 44 bytes | 440 MB | 5.5 MB |
| VARCHAR(50) avg | 50 bytes | 58 bytes | 580 MB | 7.25 MB |
| Composite (3 INTs) | 12 bytes | 20 bytes | 200 MB | 2.5 MB |
Key Observations:
String UUIDs Cost Nearly 4× Integer Keys: A table using string UUIDs (36 bytes) instead of integers (4 bytes) sees its dense index grow from 120 MB to 440 MB—a 3.7× increase.
Index Fanout Degrades: Larger keys mean fewer entries per index block. With 44-byte entries in 8KB blocks, fanout drops to 186 (vs 682 for 12-byte entries). This increases B-tree height.
Composite Keys Compound: Multi-column keys add sizes together. A composite key on (customer_id, order_id, product_id) with three BIGINTs creates 24-byte keys.
Sparse Still Proportional: Despite larger absolute sizes, the sparse-to-dense ratio remains constant at 1:80 (for 80 records/block). Sparse indexing benefits persist across key sizes.
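A sketch of the same comparison as a query; the key types, the 10-million-row count, and the 8-byte pointer are assumptions carried over from the table above:

```sql
-- Entry size E = K + P with an assumed 8-byte pointer; dense size for 10M rows
SELECT key_type,
       key_bytes + 8                           AS entry_bytes,
       8192 / (key_bytes + 8)                  AS fanout_8kb,    -- entries per 8 KB block
       round(10000000 * (key_bytes + 8) / 1e6) AS dense_mb_10m
FROM (VALUES ('INT', 4), ('BIGINT', 8), ('UUID binary', 16),
             ('UUID string', 36), ('VARCHAR(50) avg', 50),
             ('Composite (3 INTs)', 12)) AS k(key_type, key_bytes);
```

Note how the fanout column falls from 682 (INT) to 186 (string UUID), matching the fanout degradation described above.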
When storage or memory is constrained, prefer compact key types. Using BIGINT surrogate keys instead of natural VARCHAR keys can reduce index size by 50% or more—often the difference between fitting in memory and spilling to disk.
Real index structures like B-trees are multi-level. Understanding the total storage across all levels provides a complete picture of index overhead.
Multi-Level Size Formula:
For a multi-level sparse index with fanout $f$ (entries per index block):
$$\text{Total Index Blocks} = \sum_{i=1}^{h} \left\lceil \frac{b}{f^i} \right\rceil$$
This is a geometric series; for large $f$ it is well approximated by:
$$\text{Total Blocks} \approx \frac{b}{f-1}$$
Example Calculation:
Scenario:
├── Data blocks (b): 1,000,000
├── Index fanout (f): 500 entries/block
└── Block size: 8 KB
Level 1: ⌈1,000,000 / 500⌉ = 2,000 blocks
Level 2: ⌈2,000 / 500⌉ = 4 blocks
Level 3: ⌈4 / 500⌉ = 1 block (root)
Total index blocks: 2,005 blocks
Total index size: 2,005 × 8 KB ≈ 16 MB
Overhead ratio: 16 MB / (1M × 8 KB) = 16 MB / 8 GB = 0.2%
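The level-by-level computation maps naturally onto a recursive CTE. This sketch hard-codes the example's 1,000,000 data blocks and fanout of 500:

```sql
-- Each index level holds ceil(previous_level / f) blocks; stop at the single root
WITH RECURSIVE lvl(level, blocks) AS (
  SELECT 1, ceil(1000000 / 500.0)::bigint           -- level 1: one entry per data block
  UNION ALL
  SELECT level + 1, ceil(blocks / 500.0)::bigint
  FROM lvl
  WHERE blocks > 1
)
SELECT level, blocks, sum(blocks) OVER () AS total_blocks   -- 2,000 + 4 + 1 = 2,005
FROM lvl;
```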
Dense vs Sparse Multi-Level Comparison:
Same 1M block scenario:
SPARSE (indexing blocks):
├── Base entries: 1,000,000
├── Level 1 entries: 2,000
├── Level 2 entries: 4
├── Total entries: 1,002,004
└── Total size: ~16 MB
DENSE (indexing records with b_f = 100):
├── Base entries: 100,000,000 (n = b × b_f)
├── Level 1 entries: 200,000
├── Level 2 entries: 400
├── Level 3 entries: 1
├── Total entries: 100,200,401
└── Total size: ~1.6 GB
Ratio: 1.6 GB / 16 MB = 100:1
The 100:1 ratio matches the blocking factor ($b_f = 100$), confirming multi-level overhead is proportionally distributed.
In production B+trees, add 10-30% overhead for: (1) partial block utilization after splits, (2) per-entry metadata, (3) free space for future inserts, (4) node headers. Multiply calculated sizes by 1.2-1.3 for realistic estimates.
Index size directly impacts memory utilization. When indexes exceed available buffer pool memory, performance degrades dramatically. This section analyzes the memory implications of the dense/sparse choice.
The Memory Threshold Effect:
Performance Curve:

 Response │
Time (ms) │
       50 │                           ╭─────────────────
          │                           │
       20 │                           │
          │      index fits           │    index spills
        1 │──────in memory────────────╯    to disk
          └───────────────────────────┼────────────────→
                                 buffer pool       Index Size
                                 threshold
When the index fits in memory: sub-millisecond lookups. When it doesn't: disk I/O adds 5-20 ms per access. The table below assumes 16-byte entries, 80 records per block, and a 4 GB buffer pool.
| Scenario | Dense Size | Sparse Size | Dense Fits? | Sparse Fits? |
|---|---|---|---|---|
| 1M records | 16 MB | 0.2 MB | ✓ Yes | ✓ Yes |
| 10M records | 160 MB | 2 MB | ✓ Yes | ✓ Yes |
| 100M records | 1.6 GB | 20 MB | ✓ Yes | ✓ Yes |
| 500M records | 8 GB | 100 MB | ✗ No | ✓ Yes |
| 1B records | 16 GB | 200 MB | ✗ No | ✓ Yes |
| 10B records | 160 GB | 2 GB | ✗ No | ✓ Yes |
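The fit/no-fit columns follow mechanically from the sizes; here is a sketch under the same assumptions (16-byte entries, 80 records per block, a 4 GB buffer pool):

```sql
-- dense = n * 16 bytes; sparse = ceil(n / 80) * 16 bytes; pool = 4 GiB
SELECT n,
       n * 16                                        AS dense_bytes,
       ceil(n / 80.0)::bigint * 16                   AS sparse_bytes,
       n * 16 <= 4 * 1024.0 ^ 3                      AS dense_fits_4gb,
       ceil(n / 80.0)::bigint * 16 <= 4 * 1024.0 ^ 3 AS sparse_fits_4gb
FROM unnest(ARRAY[1000000, 10000000, 100000000, 500000000,
                  1000000000, 10000000000]::bigint[]) AS n;
```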
Analysis:
At 500M records, the dense index (8 GB) exceeds the 4 GB buffer pool. Index operations now require disk I/O, increasing latency 100-1000×.
The sparse index (100 MB) fits comfortably, maintaining in-memory performance even at 10 billion records.
Working Set Consideration:
Not all index entries are accessed equally. An index has a 'hot' working set of frequently accessed entries. If this working set fits in memory, performance remains acceptable even with a larger total index:
Working Set Size ≈ (Hot Key Fraction) × (Index Size)
Example:
├── 500M records, dense index = 8 GB
├── 1% of keys are frequently accessed
├── Working set = 80 MB (fits!)
└── Acceptable performance for hot queries
When access patterns are uniformly distributed (no hot set), the entire index becomes the working set. This is common in analytics workloads, batch processing, and testing scenarios. Sparse indexes provide crucial resilience against this worst case.
Beyond storage size, we must analyze the I/O cost of searches. Sparse indexes trade smaller size for additional search steps. Let's quantify this trade-off.
I/O Cost Models:
Dense Index Point Query:
I/O = (Index Height) + 1 (data block)
= ⌈log_f(n)⌉ + 1
Where f = index fanout (entries per block)
Sparse Index Point Query:
I/O = (Index Height) + 1 (data block); the final in-block scan adds no extra I/O, since the block is already in memory once it has been read
= ⌈log_f(b)⌉ + 1
Where b = number of data blocks = n / b_f
Since $b = n / b_f$:
Dense: ⌈log_f(n)⌉ + 1
Sparse: ⌈log_f(n/b_f)⌉ + 1 = ⌈log_f(n) - log_f(b_f)⌉ + 1
The difference is approximately $log_f(b_f)$, which is typically less than 1 for reasonable fanouts and blocking factors.
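Plugging in the orders example (n = 50M, $b_f$ = 16, and the 16-byte-entry fanout f = 512, all values from earlier) shows the gap can round away entirely:

```sql
-- Height ≈ ceil(log_f(entries)); log_f(x) = ln(x) / ln(f)
SELECT ceil(ln(50000000)        / ln(512)) AS dense_height,   -- ceil(2.84) = 3
       ceil(ln(50000000 / 16.0) / ln(512)) AS sparse_height,  -- ceil(2.40) = 3
       ln(16) / ln(512)                    AS height_gap;     -- log_512(16) ≈ 0.44
```

Here both indexes have height 3, so the sparse lookup costs no additional index I/O at all.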
| Records (n) | Dense Height | Sparse Height | Extra I/O |
|---|---|---|---|
| 1M | 3 | 2-3 | 0-1 |
| 100M | 4 | 3 | 1 |
| 10B | 5 | 4 | 1 |
| 1T | 6 | 5 | 1 |
Key Insight:
The I/O difference between dense and sparse indexes is typically at most 1 block read for the index traversal, plus the in-block scan time (microseconds in memory).
Given that the height difference is at most one level and the in-block scan costs only microseconds, the 'extra cost' of sparse indexes is negligible—often invisible in latency measurements.
The Real Trade-off:
The true benefit of sparse indexes isn't saving one index I/O—it's keeping the entire index in memory when a dense index would spill to disk. This can mean the difference between sub-millisecond in-memory lookups and multi-millisecond disk-bound ones.
Sparse indexes provide maximum benefit when: (1) the sparse index fits in memory while the dense equivalent wouldn't, AND (2) data is sorted on the index key. In this scenario, you get both the memory benefit AND the clustering benefit for range queries.
Based on our analysis, here are practical guidelines for estimating and planning index sizes in real systems.
```sql
-- Estimate index sizes before creation

-- Step 1: Get table statistics
SELECT relname AS table_name,
       n_live_tup AS row_count,
       pg_relation_size(relid) AS table_bytes
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Step 2: Estimate dense index size
-- Formula: entries × entry_size × overhead_factor
SELECT n_live_tup AS rows,
       n_live_tup * 16 AS raw_dense_bytes,             -- 8-byte key + 8-byte pointer
       n_live_tup * 16 * 1.3 AS estimated_dense,       -- With 30% overhead
       pg_size_pretty((n_live_tup * 16 * 1.3)::bigint) AS dense_readable
FROM pg_stat_user_tables
WHERE relname = 'orders';

-- Step 3: Actual index sizes (after creation)
SELECT indexrelname AS index_name,
       pg_relation_size(indexrelid) AS index_bytes,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname = 'orders';

-- Step 4: Index-to-table ratio
SELECT t.relname AS table_name,
       pg_size_pretty(pg_relation_size(t.relid)) AS table_size,
       pg_size_pretty(pg_indexes_size(t.relid)) AS total_index_size,
       round(100.0 * pg_indexes_size(t.relid)
             / NULLIF(pg_relation_size(t.relid), 0), 1) AS index_pct
FROM pg_stat_user_tables t
WHERE t.relname = 'orders';
```

Set up alerts for index size growth. If indexes are growing faster than expected (e.g., >20% per month), investigate potential causes: missing archival policies, unexpected data growth, or index bloat from fragmentation.
We've developed a quantitative framework for analyzing the storage trade-offs between dense and sparse indexes. The key results: $S_{dense} = n \times E$; $S_{sparse} = \lceil n/b_f \rceil \times E$; the sparse-to-dense size ratio is $1/b_f$; and a multi-level index totals roughly $b/(f-1)$ blocks. In practice, the decisive question is whether the index fits in the buffer pool.
Looking Ahead:
With the space trade-off quantified, we now turn to search efficiency—analyzing the time complexity and practical performance characteristics of each index type. Understanding both storage AND performance is essential for optimal index design.
You now have the quantitative tools to calculate and compare index sizes, predict memory requirements, and understand the storage implications of index design decisions. Next, we'll complete the analysis with search efficiency considerations.