Physical design culminates in performance—the measurable response time, throughput, and resource utilization of your database system. Storage structures, indexes, partitioning, and denormalization are means to an end; performance is that end.
This page synthesizes the physical design topics into a holistic performance framework. Database performance is not a single metric but an ecosystem of interacting factors: query plans, indexes, storage layout, memory, I/O, and workload patterns all influence one another.
Understanding these interactions enables you to diagnose performance problems systematically rather than guessing.
This page covers query optimization and execution plans, I/O optimization strategies, buffer pool management, workload characterization, performance monitoring, and systematic performance tuning methodology. You'll learn to reason about performance holistically and diagnose issues methodically.
Understanding how databases process queries reveals optimization opportunities at each stage.
Query processing stages:
Parsing — SQL text → parse tree
Optimization — Parse tree → execution plan
Execution — Execution plan → results
Performance tuning focuses primarily on optimization (improving plan selection) and execution (improving operator efficiency).
The query optimizer's role:
The query optimizer is the brain of database performance. Given a SQL query, it must enumerate candidate execution plans, estimate the cost of each using table statistics, and choose the cheapest.
Cost estimation accuracy is critical. With accurate statistics, the optimizer chooses good plans. With stale or missing statistics, it makes poor choices that can degrade performance by orders of magnitude.
Stale statistics are among the most common causes of poor query performance. After bulk loads, major deletes, or schema changes, update statistics explicitly. Most databases have auto-vacuum/auto-analyze, but high-change tables may need manual intervention or more aggressive settings.
```sql
-- Update statistics for a specific table
ANALYZE orders;

-- Update statistics for all tables
ANALYZE;

-- View statistics for a table
SELECT attname, n_distinct, correlation, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders';

-- Increase statistics target for better estimates on key columns
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
-- Default is 100; higher values = more accurate estimates, slower ANALYZE

-- Check when tables were last analyzed
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze,
       n_live_tup, n_dead_tup
FROM pg_stat_user_tables;
```

Execution plans reveal exactly how the database will (or did) execute your query. Reading plans is the most essential performance tuning skill.
Key execution plan elements:
Access methods: How tables are accessed
- Seq Scan / Table Scan — Full table read (often bad)
- Index Scan — Use index to find rows, then fetch from table
- Index Only Scan — Read only from index (covering index)
- Bitmap Index Scan — Build bitmap of matching rows, then fetch

Join methods: How tables are combined

- Nested Loop — For each row in outer, scan inner (good for small outer, indexed inner)
- Hash Join — Build hash table on smaller table, probe with larger (good for equality joins)
- Merge Join — Scan both sorted tables in parallel (good for pre-sorted or indexed data)

Cost estimates: Optimizer's predicted resource usage

- startup cost..total cost — Units are abstract (not milliseconds)
- rows — Estimated number of output rows
- width — Average row size in bytes
```sql
-- Basic execution plan (estimated)
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

-- Detailed plan with actual execution timings
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 123;

-- Example output interpretation:
/*
Index Scan using idx_customer_id on orders
    (cost=0.43..8.45 rows=3 width=120) (actual time=0.021..0.024 rows=3 loops=1)
  Index Cond: (customer_id = 123)
  Buffers: shared hit=4
Planning Time: 0.085 ms
Execution Time: 0.041 ms
*/

-- Interpretation:
-- - Using index idx_customer_id (good)
-- - Estimated 3 rows, actually got 3 rows (estimates accurate)
-- - Buffers: shared hit=4 means 4 blocks read from cache (no disk I/O)
-- - Total time: 0.041 ms (excellent)

-- Problem plan example:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE DATE_TRUNC('day', order_date) = '2024-05-15';

/*
Seq Scan on orders
    (cost=0.00..25000.00 rows=500 width=120) (actual time=0.015..245.000 rows=483 loops=1)
  Filter: (date_trunc('day', order_date) = '2024-05-15')
  Rows Removed by Filter: 999517
  Buffers: shared hit=12500 read=12500
*/

-- Problem: Function on column prevents index use → full table scan
-- Fix: Use sargable predicate
SELECT * FROM orders
WHERE order_date >= '2024-05-15' AND order_date < '2024-05-16';
```

| Warning Sign | Meaning | Typical Fix |
|---|---|---|
| Seq Scan / Full Table Scan on large table | No usable index found | Add appropriate index |
| Estimated vs actual rows differ greatly | Statistics outdated | ANALYZE the table |
| Nested Loop with Seq Scan on inner | Inner table scanned repeatedly | Add index on join column |
| Sort / Using filesort | Data sorted in memory/disk | Add index covering ORDER BY |
| Materialize / Using temporary | Intermediate results stored | Simplify query, add indexes |
| High Buffers: read count | Data fetched from disk, not cache | Increase buffer pool, or access pattern issue |
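To illustrate the "Nested Loop with Seq Scan on inner" warning sign, here is a minimal sketch; the orders and order_items tables and the index name are hypothetical:

```sql
-- Suppose the plan shows a Nested Loop whose inner side is a Seq Scan on
-- order_items, repeated once per outer row (loops=N in EXPLAIN ANALYZE output).
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, i.product_id
FROM orders o
JOIN order_items i ON i.order_id = o.id
WHERE o.customer_id = 123;

-- Adding an index on the join column lets the planner replace the repeated
-- sequential scan with an indexed lookup on the inner side.
CREATE INDEX idx_order_items_order_id ON order_items (order_id);

-- Re-run the EXPLAIN above: the inner node should become an Index Scan.
```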
Disk I/O is the ultimate bottleneck for database performance. All physical design decisions aim to minimize I/O—especially random I/O.
I/O optimization strategies:
1. Reduce total I/O volume: fetch only the columns and rows you need, use covering indexes so queries read index pages instead of the full table, and keep rows compact (see the sketch after this list).
2. Convert random I/O to sequential I/O: cluster related rows physically, favor range scans over scattered point lookups, and batch writes.
3. Reduce I/O latency: keep the hot working set in the buffer pool and place frequently accessed data on faster storage (SSD/NVMe).
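A minimal sketch of the first two strategies, assuming a hypothetical orders table: a covering index (PostgreSQL's INCLUDE clause) lets the query run as an Index Only Scan, reading far fewer pages and replacing scattered heap fetches with compact index reads.

```sql
-- Covering index: the filter column leads, and the selected columns are
-- stored in the index so the heap is never touched.
CREATE INDEX idx_orders_cust_date
    ON orders (customer_id, order_date) INCLUDE (total_amount);

EXPLAIN (ANALYZE, BUFFERS)
SELECT order_date, total_amount
FROM orders
WHERE customer_id = 123;
-- Look for "Index Only Scan" and "Heap Fetches: 0" in the output.
```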
Sequential vs Random I/O:
| Access Pattern | HDD Performance | SSD Performance |
|---|---|---|
| Sequential Read | 100-200 MB/s | 500-3000+ MB/s |
| Random Read (4KB) | ~100 IOPS | 50,000-500,000 IOPS |
| Time for 1000 random reads | ~10 seconds | 2-20 ms |
Even with SSDs, sequential access is 10-100x faster than random access. Physical design should favor sequential patterns.
The clustering advantage:
Clustered tables store related rows physically adjacent. For a range query, a clustered table reads a few contiguous blocks sequentially, while an unclustered table may perform one random block read per matching row.
This 20-200x difference explains why clustering key selection is critical.
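In PostgreSQL, clustering is a one-time rewrite rather than a maintained property. A sketch, assuming an existing index idx_orders_date on order_date:

```sql
-- Rewrite the table in index order (takes an exclusive lock; new rows are
-- not kept in order, so re-run periodically if clustering matters).
CLUSTER orders USING idx_orders_date;
ANALYZE orders;  -- refresh statistics after the rewrite

-- correlation near 1.0 means heap order closely follows column order
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'order_date';
```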
Use operating system tools (iostat, iotop) and database metrics to distinguish read vs write I/O, sequential vs random I/O, and identify which queries/tables generate the most I/O. This data guides physical design decisions—focusing optimization where impact is greatest.
```sql
-- Track I/O at the table level
SELECT schemaname, relname,
       heap_blks_read,  -- Blocks read from disk
       heap_blks_hit,   -- Blocks found in cache
       ROUND(100.0 * heap_blks_hit /
             NULLIF(heap_blks_hit + heap_blks_read, 0), 2) as cache_hit_ratio,
       idx_blks_read,
       idx_blks_hit
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;

-- Track I/O at the index level
SELECT indexrelname,
       idx_blks_read,
       idx_blks_hit,
       ROUND(100.0 * idx_blks_hit /
             NULLIF(idx_blks_hit + idx_blks_read, 0), 2) as cache_hit_ratio
FROM pg_statio_user_indexes
ORDER BY idx_blks_read DESC
LIMIT 10;

-- Overall database cache hit ratio (should be > 99% for OLTP)
SELECT sum(heap_blks_hit) as hits,
       sum(heap_blks_read) as reads,
       ROUND(100.0 * sum(heap_blks_hit) /
             NULLIF(sum(heap_blks_hit + heap_blks_read), 0), 2) as hit_ratio
FROM pg_statio_user_tables;
```

The buffer pool (or buffer cache) is the database's primary memory cache for data pages. Effective buffer management dramatically reduces disk I/O.
How the buffer pool works: when a query needs a page, the database checks the buffer pool first. On a hit the page is served from memory; on a miss it is read from disk into the pool, evicting a less recently used page if necessary. Modified ("dirty") pages are written back to disk asynchronously by background writers.
Buffer pool sizing: size the pool so the hot working set fits in memory, while leaving room for the operating system, connections, sorts, and other caches (see the table below).
Working set concept:
The working set is the data actively accessed over a time window. If the working set fits in buffer pool → excellent hit rate. If it exceeds buffer pool → thrashing (continuous eviction and re-reading).
| Database | Parameter | Recommendation |
|---|---|---|
| PostgreSQL | shared_buffers | 25-40% of RAM (OS caches the rest) |
| MySQL/InnoDB | innodb_buffer_pool_size | 70-80% of RAM (on dedicated server) |
| Oracle | sga_target / db_cache_size | 40-60% of RAM (depends on SGA components) |
| SQL Server | max server memory | Leave 4-8GB for OS; rest for SQL Server |
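A sketch of adjusting the PostgreSQL settings from the table above; the sizes are illustrative rather than recommendations, and shared_buffers takes effect only after a restart:

```sql
-- Example on a server with 32 GB RAM (illustrative values)
ALTER SYSTEM SET shared_buffers = '8GB';         -- ~25% of RAM
ALTER SYSTEM SET effective_cache_size = '24GB';  -- planner's estimate of OS cache + shared_buffers
-- Restart the server, then verify:
SHOW shared_buffers;
```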
Buffer pool monitoring:
Key metrics to track:
Hit ratio: Percentage of page requests served from memory
Eviction rate: How often pages are evicted
Dirty page ratio: Percentage of modified pages not yet written to disk
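A sketch of measuring the third metric with the pg_buffercache extension (the extension is installed in the monitoring block further down this page):

```sql
-- Estimate the dirty-page ratio inside shared_buffers
SELECT COUNT(*) FILTER (WHERE isdirty) AS dirty_buffers,
       COUNT(*) AS total_buffers,
       ROUND(100.0 * COUNT(*) FILTER (WHERE isdirty) / COUNT(*), 2) AS dirty_pct
FROM pg_buffercache;
```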
Advanced buffer pool features: databases also offer cache pre-warming after a restart (PostgreSQL's pg_prewarm), multiple buffer pool instances to reduce contention (MySQL), and dedicated keep/recycle pools for pinning hot objects (Oracle).
```sql
-- Check shared_buffers setting
SHOW shared_buffers;

-- Check hit ratios (from earlier I/O section)
SELECT sum(heap_blks_hit) as buffer_hits,
       sum(heap_blks_read) as disk_reads,
       ROUND(100.0 * sum(heap_blks_hit) /
             NULLIF(sum(heap_blks_hit + heap_blks_read), 0), 2) as hit_ratio
FROM pg_statio_user_tables;

-- Check buffer usage by table (pg_buffercache extension)
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname,
       COUNT(*) as buffers,
       pg_size_pretty(COUNT(*) * 8192) as buffer_size
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE c.relname NOT LIKE 'pg_%'
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```

Effective physical design requires understanding your workload—the mix of queries, their patterns, and their requirements. Different workload types demand different optimization strategies.
OLTP (Online Transaction Processing): many short transactions, point lookups and small writes, and strict per-query latency requirements.
Optimization focus: Indexes for point lookups, minimal locking, fast commits, connection pooling.
OLAP (Online Analytical Processing): fewer, long-running analytical queries that scan and aggregate large data volumes; overall throughput matters more than individual query latency.
Optimization focus: Columnar storage, bitmap indexes, materialized aggregates, parallel query execution.
| Aspect | OLTP Optimization | OLAP Optimization |
|---|---|---|
| Storage | Row-oriented (heap + indexes) | Column-oriented, compressed |
| Indexes | B-tree on filters/joins | Bitmap, bloom filters, zone maps |
| Buffer pool | Maximize hit ratio | Tolerate lower hit ratio, focus on sequential I/O |
| Partitioning | By tenant or hot/cold | By date for time-series analytics |
| Denormalization | Minimal, for specific hot paths | Heavy (star schema, fact/dimension) |
| Query execution | Single-threaded, fast | Parallel, batch-oriented |
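For the OLAP column of the table, a hedged sketch using a hypothetical sales fact table: a BRIN index behaves like a zone map, storing min/max values per block range, so it stays tiny and works well when rows are physically ordered by date.

```sql
CREATE INDEX idx_sales_date_brin ON sales USING brin (sale_date);

-- The planner skips block ranges whose min/max cannot match the predicate.
EXPLAIN (ANALYZE)
SELECT SUM(amount)
FROM sales
WHERE sale_date >= '2024-01-01' AND sale_date < '2024-02-01';
```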
Hybrid workloads (HTAP):
Modern systems often combine OLTP and OLAP requirements: the same data must serve live transactions and near-real-time dashboards or reports.
Solutions for HTAP: offload analytics to read replicas, maintain a columnar copy of hot tables alongside the row store, or stream changes into a separate analytical system.
Workload analysis queries:
Understand your workload before optimizing:
```sql
-- Enable pg_stat_statements extension for query tracking
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top queries by total time
SELECT queryid,
       LEFT(query, 80) as query_preview,
       calls,
       total_exec_time::NUMERIC(10,2) as total_ms,
       mean_exec_time::NUMERIC(10,2) as avg_ms,
       rows,
       shared_blks_hit,
       shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Read vs Write distribution
SELECT SUM(CASE WHEN query ~* '^select' THEN calls ELSE 0 END) as reads,
       SUM(CASE WHEN query ~* '^(insert|update|delete)' THEN calls ELSE 0 END) as writes
FROM pg_stat_statements;

-- Table access patterns
SELECT schemaname, relname,
       seq_scan,   -- Full table scans
       idx_scan,   -- Index scans
       n_tup_ins,  -- Inserts
       n_tup_upd,  -- Updates
       n_tup_del   -- Deletes
FROM pg_stat_user_tables
ORDER BY seq_scan + idx_scan DESC
LIMIT 10;
```

Never optimize blindly. Collect workload data for at least one full business cycle (day, week, or month depending on your patterns). Understand which queries matter most—the top 10% of queries often consume 90% of resources. Focus optimization effort there.
Performance tuning is effective only when approached systematically. The following methodology prevents wasted effort and ensures measurable results.
Phase 1: Establish Baselines

Before any optimization, record current performance: query latencies (p50/p95/p99), throughput, and resource utilization. Without a baseline you cannot prove a change helped.

Phase 2: Identify Bottlenecks

Determine what's limiting performance: CPU, memory, I/O, lock contention, or a handful of expensive queries. Use monitoring data and execution plans rather than intuition.

Phase 3: Targeted Optimization

Address the identified bottleneck with the single most promising change: an index, a query rewrite, a configuration adjustment, or a schema refinement.

Phase 4: Validate and Document

Re-measure against the baseline, confirm the improvement (and watch for regressions elsewhere), and record what changed and why.
Common tuning mistakes:
Make one change, measure, document. If you change five things simultaneously and performance improves, you don't know which change helped. Worse, one change may have helped while another hurt, and you've masked the problem.
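A sketch of this baseline-then-compare loop using pg_stat_statements; the perf_baseline table name is hypothetical:

```sql
-- 1. Snapshot current query statistics as the baseline
CREATE TABLE perf_baseline AS
SELECT now() AS captured_at, queryid, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements;

-- 2. Reset counters, apply ONE change, and let the workload run
SELECT pg_stat_statements_reset();

-- 3. Compare: a positive delta means the query got faster
SELECT s.queryid,
       b.mean_exec_time AS before_ms,
       s.mean_exec_time AS after_ms,
       (b.mean_exec_time - s.mean_exec_time) AS delta_ms
FROM pg_stat_statements s
JOIN perf_baseline b USING (queryid)
ORDER BY delta_ms DESC;
```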
Proactive monitoring catches performance issues before they become incidents. Establish comprehensive monitoring covering key database health metrics.
Essential metrics to monitor:
| Category | Metrics | Alert Thresholds |
|---|---|---|
| Latency | p50, p95, p99 query latency | p99 > 100ms (adjust per SLA) |
| Throughput | Queries/second, transactions/second | Below baseline, sudden drops |
| Resource - CPU | CPU utilization, system vs user time | > 80% sustained |
| Resource - Memory | Buffer pool hit ratio, swap usage | Hit ratio < 99%, any swap |
| Resource - I/O | IOPS, throughput, await time | Await > 20ms (HDD) / > 5ms (SSD) |
| Connections | Active connections, connection wait | Near max_connections, wait > 0 |
| Locks | Lock waits, deadlocks | Any deadlocks, wait > 1s |
| Replication | Replication lag | Lag above acceptable threshold (e.g., > 10s) |
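Hedged sketches for checking a few of these metrics directly in PostgreSQL (the replication query is meaningful only on a standby):

```sql
-- Connections: active vs. configured maximum
SELECT COUNT(*) FILTER (WHERE state = 'active') AS active,
       COUNT(*) AS total,
       current_setting('max_connections')::INT AS max_conn
FROM pg_stat_activity;

-- Locks: sessions currently waiting on a lock
SELECT pid, wait_event_type, wait_event, LEFT(query, 80) AS query_preview
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';

-- Replication lag in seconds (run on a standby; NULL on a primary)
SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;
```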
Monitoring stack components:
Data collection: Agents that gather metrics (Prometheus postgres_exporter, Percona PMM, Datadog agent)
Storage: Time-series database for metric history (Prometheus, InfluxDB, Datadog)
Visualization: Dashboards showing trends and current state (Grafana, built-in vendor dashboards)
Alerting: Notifications when thresholds are exceeded (PagerDuty, OpsGenie, Slack integrations)
Query analysis: Slow query logs, query profiling (pg_stat_statements, Performance Schema)
Slow query logging:
Capture queries exceeding latency threshold for analysis:
```sql
-- Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 100;  -- Log queries > 100ms
ALTER SYSTEM SET log_statement = 'none';            -- Don't log all statements
ALTER SYSTEM SET log_lock_waits = on;               -- Log lock waits
SELECT pg_reload_conf();

-- Also configure in postgresql.conf:
-- log_destination = 'csvlog'
-- logging_collector = on
-- log_directory = 'pg_log'

-- Query pg_stat_statements for slow query analysis
SELECT queryid,
       calls,
       mean_exec_time::NUMERIC(10,2) as avg_ms,
       max_exec_time::NUMERIC(10,2) as max_ms,
       stddev_exec_time::NUMERIC(10,2) as stddev_ms,
       LEFT(query, 100)
FROM pg_stat_statements
WHERE mean_exec_time > 100  -- Queries averaging > 100ms
ORDER BY mean_exec_time DESC;
```

Alert thresholds should be based on YOUR normal operations, not generic values. Collect 2-4 weeks of data, establish baselines (mean, p95, p99), then set alerts at 2-3 standard deviations above normal. This reduces false positives while catching true anomalies.
Performance is the culmination of all physical design decisions. Understanding the query processing pipeline, I/O patterns, buffer management, and workload characteristics enables systematic optimization rather than guesswork.
Module Complete:
With this page, you've completed the Physical Design module. You now understand how data is physically stored and accessed, how to design and evaluate indexes, when and how to partition, when denormalization pays off, and how to tune and monitor performance systematically.
These skills enable you to translate logical database designs into physical implementations that perform well under real-world workloads.
You've mastered the physical design phase of database development. You can now make informed decisions about storage structures, indexes, partitioning, and denormalization based on workload analysis. Combined with logical design knowledge, you possess the complete toolkit for professional database design and optimization.