While individual query analysis tells you what could be indexed, workload analysis tells you what should be indexed. A workload is the complete picture of database activity: all queries, their frequencies, their timing, their interactions, and their collective resource consumption.
Why Workload Analysis Matters:
Optimizing individual queries in isolation can backfire: each new index speeds one query but taxes every write to its table, and a few dozen locally sensible indexes can collectively cripple write throughput, exhaust storage, and crowd working data out of the buffer pool.
Workload analysis provides the holistic view necessary for strategic index decisions.
By the end of this page, you will understand how to characterize database workloads, identify temporal patterns, perform trade-off analysis across competing requirements, and build a workload model that guides long-term indexing strategy.
A workload characterization captures the essential properties of database activity in a structured format that enables systematic analysis. Think of it as creating a profile or fingerprint of your database usage.
| Dimension | What to Measure | Indexing Relevance |
|---|---|---|
| Query Mix | % SELECT / INSERT / UPDATE / DELETE | Determines read vs. write optimization priority |
| Query Frequency | Executions per second/minute for each query type | Identifies high-impact optimization targets |
| Data Access Patterns | Sequential vs. random; point vs. range queries | Influences index type selection (B+-tree, hash, etc.) |
| Temporal Patterns | Peak hours, daily cycles, monthly surges | May require different strategies for different periods |
| Concurrency | Active connections, transactions per second | Affects lock contention considerations |
| Data Characteristics | Table sizes, growth rates, cardinality | Impacts index storage and maintenance costs |
| Resource Constraints | Memory, CPU, disk I/O limits | Constrains what indexing is feasible |
| SLA Requirements | Latency targets, throughput requirements | Defines optimization goals |
Building a Workload Profile:
A formal workload profile documents your findings in a structured format. Here's a template for capturing workload characteristics:
```json
{
  "workload_name": "E-commerce Order Processing",
  "analysis_period": "2024-01-01 to 2024-01-31",
  "environment": "Production - Primary",
  "summary_metrics": {
    "total_queries_per_day": 12500000,
    "read_write_ratio": "92:8",
    "peak_queries_per_second": 850,
    "average_concurrent_connections": 45,
    "peak_concurrent_connections": 180
  },
  "tables": [
    {
      "name": "orders",
      "rows": 45000000,
      "size_gb": 12.5,
      "daily_inserts": 15000,
      "daily_updates": 45000,
      "daily_deletes": 500,
      "growth_rate_percent_monthly": 3.2
    }
  ],
  "query_classes": [
    {
      "class_id": "Q1",
      "description": "Get order by ID",
      "pattern": "SELECT * FROM orders WHERE id = ?",
      "frequency_per_day": 3500000,
      "average_latency_ms": 2.1,
      "target_latency_ms": 5,
      "status": "MEETS_SLA"
    },
    {
      "class_id": "Q2",
      "description": "Recent orders for customer",
      "pattern": "SELECT * FROM orders WHERE customer_id = ? ORDER BY created_at DESC LIMIT 20",
      "frequency_per_day": 850000,
      "average_latency_ms": 45,
      "target_latency_ms": 20,
      "status": "EXCEEDS_SLA"
    }
  ],
  "temporal_patterns": {
    "peak_hours": ["09:00-12:00", "14:00-17:00", "19:00-22:00"],
    "low_activity": ["02:00-06:00"],
    "monthly_spike": "Last 3 days of month",
    "batch_windows": ["03:00-05:00"]
  }
}
```

When building a workload profile, explicitly document your assumptions and data sources. "Peak QPS is 850" is less useful than "Peak QPS is 850 (measured on 2024-01-15, holiday sale event, may not reflect normal operations)." This context is crucial for decision-making.
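A profile in this structured format is easy to process programmatically. As a minimal sketch (the function name `sla_violations` is illustrative; the field names follow the template above), here is how you might flag query classes that miss their latency targets:

```python
import json

def sla_violations(profile: dict) -> list[str]:
    """Return class_ids of query classes whose measured average
    latency exceeds the target latency recorded in the profile."""
    return [
        q["class_id"]
        for q in profile.get("query_classes", [])
        if q["average_latency_ms"] > q["target_latency_ms"]
    ]

# Abbreviated profile using the same schema as the template above
profile = json.loads("""
{
  "query_classes": [
    {"class_id": "Q1", "average_latency_ms": 2.1, "target_latency_ms": 5},
    {"class_id": "Q2", "average_latency_ms": 45, "target_latency_ms": 20}
  ]
}
""")
print(sla_violations(profile))  # ['Q2']
```

Checks like this can run in CI against a committed profile, so SLA regressions surface as soon as the profile is refreshed.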
The read/write ratio is the single most important metric for indexing strategy. It determines whether you should optimize aggressively for reads (add indexes) or conservatively (minimize indexes).
Measuring Read/Write Ratio:
```sql
-- Per-table read/write analysis
SELECT
    relname AS table_name,
    seq_scan + idx_scan AS reads,
    n_tup_ins + n_tup_upd + n_tup_del AS writes,
    ROUND(
        100.0 * (seq_scan + idx_scan)
        / NULLIF(seq_scan + idx_scan + n_tup_ins + n_tup_upd + n_tup_del, 0),
        2
    ) AS read_percent,
    CASE
        WHEN (seq_scan + idx_scan) > 10 * (n_tup_ins + n_tup_upd + n_tup_del) THEN 'READ_HEAVY'
        WHEN (seq_scan + idx_scan) > 4 * (n_tup_ins + n_tup_upd + n_tup_del) THEN 'READ_DOMINANT'
        WHEN (seq_scan + idx_scan) > (n_tup_ins + n_tup_upd + n_tup_del) THEN 'BALANCED'
        WHEN (n_tup_ins + n_tup_upd + n_tup_del) > 4 * (seq_scan + idx_scan) THEN 'WRITE_DOMINANT'
        ELSE 'WRITE_HEAVY'
    END AS workload_type
FROM pg_stat_user_tables
ORDER BY (seq_scan + idx_scan) + (n_tup_ins + n_tup_upd + n_tup_del) DESC;

-- Update detail breakdown (which columns change?)
SELECT
    a.attname AS column_name,
    s.null_frac AS null_percentage,
    s.n_distinct AS distinct_values,
    CASE
        WHEN a.attnum = ANY(i.indkey) THEN 'INDEXED'
        ELSE 'NOT_INDEXED'
    END AS index_status
FROM pg_stats s
JOIN pg_attribute a
    ON a.attname = s.attname
   AND a.attrelid = (SELECT oid FROM pg_class WHERE relname = 'orders')
LEFT JOIN pg_index i
    ON i.indrelid = a.attrelid
   AND a.attnum = ANY(i.indkey)
WHERE s.tablename = 'orders';
```

A database might be 80% reads overall, but a specific logging table might be 95% writes. Apply indexing strategy at the table level, not the database level. Your session table doesn't need the same indexes as your order history table.
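The same bucketing logic can live in a monitoring script outside the database. A minimal Python sketch mirroring the CASE expression above (the threshold values 10× and 4× are the illustrative ones used in the query, not universal constants):

```python
def classify_workload(reads: int, writes: int) -> str:
    """Bucket a table by its read/write balance, mirroring the SQL CASE:
    thresholds of 10x and 4x are illustrative, not universal."""
    if reads > 10 * writes:
        return "READ_HEAVY"
    if reads > 4 * writes:
        return "READ_DOMINANT"
    if reads > writes:
        return "BALANCED"
    if writes > 4 * reads:
        return "WRITE_DOMINANT"
    return "WRITE_HEAVY"

print(classify_workload(reads=920_000, writes=80_000))  # READ_HEAVY
print(classify_workload(reads=5_000, writes=95_000))    # WRITE_DOMINANT
```

Feeding this with per-table counters from `pg_stat_user_tables` lets you track how each table's classification drifts over time.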
Database workloads are not uniform over time. Understanding temporal patterns is critical: the indexes your peak traffic needs may differ from what overnight batch jobs need, and maintenance must be scheduled for when it hurts least.
Common Temporal Patterns:
| Pattern | Example | Indexing Implication |
|---|---|---|
| Diurnal (Daily) | Business hours peak, overnight lull | Schedule index maintenance for off-peak. Design for peak load. |
| Weekly | Monday report surge, weekend quiet | Consider Monday-specific indexes. Heavy reports may need temporary indexes. |
| Monthly | End-of-month billing spike | Partition by date. Archive old data. Partial indexes on current month. |
| Seasonal | Holiday shopping surge | Capacity planning. Consider temporary indexes for seasonal queries. |
| Event-Driven | Sale events, product launches | Pre-create indexes for anticipated patterns. Monitor and add reactively. |
| Batch Windows | 3 AM ETL jobs | Batch queries may warrant different indexes than OLTP queries. |
```sql
-- Query volume by hour of day (requires query logging)
-- This example uses a custom query_log table
SELECT
    EXTRACT(HOUR FROM timestamp) AS hour,
    COUNT(*) AS query_count,
    AVG(duration_ms) AS avg_duration_ms,
    PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY duration_ms) AS p95_ms,
    SUM(CASE WHEN query_type = 'SELECT' THEN 1 ELSE 0 END) AS reads,
    SUM(CASE WHEN query_type IN ('INSERT', 'UPDATE', 'DELETE') THEN 1 ELSE 0 END) AS writes
FROM query_log
WHERE timestamp > NOW() - INTERVAL '7 days'
GROUP BY EXTRACT(HOUR FROM timestamp)
ORDER BY hour;

-- Day-of-week analysis
SELECT
    TO_CHAR(timestamp, 'Day') AS day_of_week,
    COUNT(*) AS query_count,
    AVG(duration_ms) AS avg_duration_ms
FROM query_log
WHERE timestamp > NOW() - INTERVAL '4 weeks'
GROUP BY TO_CHAR(timestamp, 'Day'), EXTRACT(DOW FROM timestamp)
ORDER BY EXTRACT(DOW FROM timestamp);

-- Identify batch job windows (sudden activity spikes)
SELECT
    DATE_TRUNC('hour', timestamp) AS hour,
    COUNT(*) AS queries,
    ROUND(AVG(rows_affected), 0) AS avg_rows_affected
FROM query_log
WHERE query_type IN ('INSERT', 'UPDATE', 'DELETE')
  AND timestamp > NOW() - INTERVAL '7 days'
GROUP BY DATE_TRUNC('hour', timestamp)
HAVING COUNT(*) > 1000 OR AVG(rows_affected) > 1000
ORDER BY queries DESC;
```

Multi-Mode Workloads:
Many production systems exhibit fundamentally different patterns at different times: interactive OLTP traffic during business hours, for example, giving way to heavy batch and reporting workloads overnight.
Each mode may benefit from different indexes. In extreme cases, you might create indexes just before a batch window and drop them afterward, or maintain reporting-only indexes on a read replica rather than the primary.
If your system must meet SLAs during peak load, design indexes for peak load. An index that works at 50 QPS but fails at 500 QPS is useless if your peak is 400 QPS. Always test index configurations under realistic load conditions.
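When validating an index configuration under load, evaluate the latency distribution, not just the mean. A minimal sketch of an SLA check over measured latencies (the function `meets_sla` and its quantile-index approximation are illustrative, not from any particular load-testing tool):

```python
def meets_sla(latencies_ms: list[float], target_ms: float,
              quantile: float = 0.95) -> bool:
    """Return True if the given quantile of measured latencies
    (approximated by the nearest-rank method) is within target."""
    s = sorted(latencies_ms)
    idx = min(len(s) - 1, int(quantile * len(s)))
    return s[idx] <= target_ms

# 1% slow outliers: p95 still fine
print(meets_sla([5.0] * 99 + [50.0], target_ms=20.0))       # True
# 10% slow outliers: p95 blows the budget even though the mean looks fine
print(meets_sla([5.0] * 90 + [50.0] * 10, target_ms=20.0))  # False
```

Run the same check at low and peak request rates; an index whose p95 holds at 50 QPS may still fail this test at 500 QPS.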
Hot spots are tables, indexes, or data ranges that receive disproportionate traffic. Identifying hot spots is essential because they often determine system performance and are the highest-value optimization targets.
Types of Hot Spots:
Hot spots occur at several levels: table-level (a few tables dominate total traffic), index-level (a handful of indexes serve most scans), and page- or row-level (contention concentrated on specific rows, such as a frequently updated counter). The queries below surface each level:
```sql
-- Table hot spots: most accessed tables
SELECT
    schemaname,
    relname AS table_name,
    seq_scan + idx_scan AS total_scans,
    seq_tup_read + idx_tup_fetch AS total_tuples_accessed,
    n_tup_ins AS inserts,
    n_tup_upd AS updates,
    n_tup_del AS deletes,
    ROUND(100.0 * (seq_scan + idx_scan)
          / NULLIF(SUM(seq_scan + idx_scan) OVER (), 0), 2) AS pct_of_total_scans
FROM pg_stat_user_tables
ORDER BY total_scans DESC
LIMIT 10;

-- Index hot spots: most used indexes
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched,
    ROUND(100.0 * idx_scan / NULLIF(SUM(idx_scan) OVER (), 0), 2) AS pct_of_total_scans
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC
LIMIT 15;

-- Page-level access patterns (requires pg_buffercache)
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT
    c.relname AS table_name,
    COUNT(*) AS buffered_pages,
    ROUND(100.0 * COUNT(*) / (SELECT COUNT(*) FROM pg_buffercache), 2) AS pct_of_buffer,
    SUM(usagecount) AS total_usage
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE c.relkind IN ('r', 'i')  -- tables and indexes
GROUP BY c.relname
ORDER BY total_usage DESC
LIMIT 20;
```

Standard indexing advice may not apply to hot spots. A table handling 80% of your writes needs minimal indexes regardless of query patterns. A constantly-updated row may need to be moved to a separate table. Think creatively about hot spot mitigation.
Index design involves navigating multiple competing concerns. A formal trade-off analysis helps make these tensions explicit and enables informed decision-making.
| Trade-off | Option A | Option B | Decision Factor |
|---|---|---|---|
| Read vs Write | More indexes (faster reads) | Fewer indexes (faster writes) | Read/write ratio of workload |
| Space vs Performance | Covering indexes (larger) | Minimal indexes (smaller) | Available storage and memory |
| Query A vs Query B | Index optimized for A | Index optimized for B | Relative frequency and criticality |
| Simplicity vs Optimization | Fewer, broader indexes | Many specialized indexes | Maintenance capability |
| Latency vs Throughput | Optimize tail latency | Optimize average throughput | SLA requirements |
| Current vs Future | Optimize for today's workload | Build for anticipated growth | Change velocity and predictability |
Quantitative Trade-off Analysis:
For significant indexing decisions, quantify the trade-offs:
```markdown
# Index Addition Trade-off Analysis

## Proposed Index

CREATE INDEX idx_orders_customer_status
ON orders (customer_id, status, created_at DESC);

## Current State
- Table size: 45 million rows
- Current indexes: 4

## Read Benefit Analysis
| Query | Current (ms) | With Index (ms) | Improvement | Daily Executions | Daily Time Saved |
|-------|--------------|-----------------|-------------|------------------|------------------|
| Q1: Customer orders | 450 | 15 | 435 ms | 850,000 | 102 hours |
| Q2: Status by customer | 380 | 8 | 372 ms | 120,000 | 12 hours |
| Q3: Recent customer orders | 520 | 12 | 508 ms | 200,000 | 28 hours |

**Total Daily Read Benefit: 142 compute-hours saved**

## Write Cost Analysis
| Operation | Current (ms) | With Index (ms) | Increase | Daily Ops | Daily Cost Added |
|-----------|--------------|-----------------|----------|-----------|------------------|
| INSERT | 0.8 | 1.1 | 0.3 ms | 15,000 | ~0.001 hours |
| UPDATE (status) | 1.2 | 1.8 | 0.6 ms | 45,000 | ~0.008 hours |
| DELETE | 0.9 | 1.3 | 0.4 ms | 500 | < 0.001 hours |

**Total Daily Write Cost: ~0.01 compute-hours added**

## Net Impact
- **Daily Time Saved: ~142 compute-hours**
- **Storage Cost: +1.8 GB**
- **Memory Requirement: +400 MB buffer preference**

## Recommendation
**CREATE THE INDEX** - at these volumes the read benefit exceeds the write
cost by several orders of magnitude. Even accounting for peak-hour write
concentration, the trade-off is strongly positive.
```

Create a decision log for significant indexing choices. Document what you considered, what you chose, and why. This enables learning from outcomes and provides context for future engineers who inherit your system.
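The table arithmetic is easy to sanity-check in code: each row is just a per-operation latency delta multiplied by daily volume, converted from milliseconds to hours. A short script over the read and write figures from the analysis above:

```python
MS_PER_HOUR = 3_600_000  # milliseconds in one hour

def daily_hours(delta_ms: float, ops_per_day: int) -> float:
    """Per-operation latency delta (ms) times daily volume, in compute-hours."""
    return delta_ms * ops_per_day / MS_PER_HOUR

# (latency delta in ms, daily operations) -- figures from the tables above
read_savings = [(435, 850_000), (372, 120_000), (508, 200_000)]
write_costs = [(0.3, 15_000), (0.6, 45_000), (0.4, 500)]

benefit = sum(daily_hours(d, n) for d, n in read_savings)
cost = sum(daily_hours(d, n) for d, n in write_costs)
print(f"benefit: {benefit:.1f} h/day, cost: {cost:.3f} h/day")
# benefit: 143.3 h/day, cost: 0.009 h/day
```

Automating this kind of check is worthwhile: unit mistakes (milliseconds read as seconds) are easy to make in a hand-built spreadsheet and can distort a trade-off by three orders of magnitude.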
Capacity planning for indexes involves projecting future index size and resource requirements based on data growth patterns. Without capacity planning, systems gradually degrade until a crisis forces emergency action.
```sql
-- Current index sizes with growth projections
WITH index_sizes AS (
    SELECT
        schemaname,
        relname AS tablename,
        indexrelname AS indexname,
        pg_relation_size(indexrelid) AS current_size_bytes
    FROM pg_stat_user_indexes
)
SELECT
    tablename,
    indexname,
    pg_size_pretty(current_size_bytes) AS current_size,
    -- Project growth (assuming 5% monthly growth)
    pg_size_pretty((current_size_bytes * 1.05)::bigint) AS size_1_month,
    pg_size_pretty((current_size_bytes * POWER(1.05, 6))::bigint) AS size_6_months,
    pg_size_pretty((current_size_bytes * POWER(1.05, 12))::bigint) AS size_1_year,
    pg_size_pretty((current_size_bytes * POWER(1.05, 24))::bigint) AS size_2_years
FROM index_sizes
ORDER BY current_size_bytes DESC;

-- Total index capacity projection, compared to the buffer pool
SELECT
    pg_size_pretty(SUM(pg_relation_size(indexrelid))) AS total_index_size,
    pg_size_pretty((SUM(pg_relation_size(indexrelid)) * 1.05)::bigint) AS size_1_month,
    pg_size_pretty((SUM(pg_relation_size(indexrelid)) * POWER(1.05, 6))::bigint) AS size_6_months,
    pg_size_pretty((SUM(pg_relation_size(indexrelid)) * POWER(1.05, 12))::bigint) AS size_1_year,
    current_setting('shared_buffers') AS buffer_pool,
    ROUND(100.0 * SUM(pg_relation_size(indexrelid))
          / pg_size_bytes(current_setting('shared_buffers')), 2) AS pct_of_buffer_now,
    ROUND((100.0 * SUM(pg_relation_size(indexrelid)) * POWER(1.05, 12)
          / pg_size_bytes(current_setting('shared_buffers')))::numeric, 2) AS pct_of_buffer_1_year
FROM pg_stat_user_indexes;
```

Capacity Thresholds to Monitor:
| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| Index size vs. buffer pool | < 50% | 50-80% | > 80% |
| Index growth vs. storage | < 70% capacity | 70-85% | > 85% |
| Maintenance window usage | < 60% of window | 60-85% | > 85% |
| Index build time (for rebuild) | < 2 hours | 2-8 hours | > 8 hours |
Set alerts at warning thresholds to enable proactive intervention before critical levels are reached.
Business growth often accelerates: new products, more users, expanding markets. Your 5% monthly growth assumption may become 15% after a successful product launch. Build headroom into capacity plans and revisit projections quarterly.
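The sensitivity to the growth assumption is easy to quantify: compound growth means a tripled monthly rate does far more than triple the one-year projection. A minimal sketch (the 10 GB starting size and the 5%/15% rates are illustrative):

```python
def project(size_gb: float, monthly_growth: float, months: int) -> float:
    """Compound-growth projection: size * (1 + g)^months."""
    return size_gb * (1 + monthly_growth) ** months

# An index that is 10 GB today, one year out:
print(round(project(10, 0.05, 12), 1))  # ~18.0 GB at 5%/month
print(round(project(10, 0.15, 12), 1))  # ~53.5 GB at 15%/month
```

The 15%/month scenario produces roughly 3× the storage of the 5% scenario after a single year, which is why projections need quarterly revisits rather than a one-time estimate.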
Different workload types call for fundamentally different indexing strategies. Here we map common workload patterns to proven indexing approaches.
Online Transaction Processing (OLTP) workloads prioritize low latency, high concurrency, and data consistency.
Characteristics:
- High volume of short transactions
- Point lookups and small range scans on key columns
- Frequent inserts and updates from many concurrent connections
- Strict latency targets, often single-digit milliseconds
Index Strategy:
- Index primary keys and the foreign keys used in joins
- Add highly selective secondary indexes only for frequent, verified query patterns
- Keep the total index count low to protect write latency
- Consider covering indexes for the hottest point-lookup queries
Workload analysis transforms index design from guesswork into engineering. By understanding your complete workload, you can make strategic decisions that serve the whole system, not just individual queries.
You now understand how to perform comprehensive workload analysis for index design. You can characterize workloads, identify patterns, make trade-off decisions, and plan for growth. In the final page, we'll explore automated index recommendation systems that apply these principles algorithmically.