Physical design culminates in performance—the measurable response time, throughput, and resource utilization of your database system. Storage structures, indexes, partitioning, and denormalization are means to an end; performance is that end.
This page synthesizes the physical design topics into a holistic performance framework. Database performance is not a single metric but an ecosystem of interacting factors: query plans, indexes, storage layout, memory, I/O, and workload patterns all influence one another.
Understanding these interactions enables you to diagnose performance problems systematically rather than guessing.
This page covers query optimization and execution plans, I/O optimization strategies, buffer pool management, workload characterization, performance monitoring, and systematic performance tuning methodology. You'll learn to reason about performance holistically and diagnose issues methodically.
Understanding how databases process queries reveals optimization opportunities at each stage.
Query processing stages:
Parsing — SQL text → parse tree
Optimization — Parse tree → execution plan
Execution — Execution plan → results
Performance tuning focuses primarily on optimization (improving plan selection) and execution (improving operator efficiency).
The query optimizer's role:
The query optimizer is the brain of database performance. Given a SQL query, it must enumerate candidate execution plans, estimate the cost of each using table statistics, and choose the cheapest.
Cost estimation accuracy is critical. With accurate statistics, the optimizer chooses good plans. With stale or missing statistics, it makes poor choices that can degrade performance by orders of magnitude.
Stale statistics are among the most common causes of poor query performance. After bulk loads, major deletes, or schema changes, update statistics explicitly. Most databases have auto-vacuum/auto-analyze, but high-change tables may need manual intervention or more aggressive settings.
```sql
-- Update statistics for a specific table
ANALYZE orders;

-- Update statistics for all tables
ANALYZE;

-- View statistics for a table
SELECT attname, n_distinct, correlation, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'orders';

-- Increase statistics target for better estimates on key columns
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;
-- Default is 100; higher values = more accurate estimates, slower ANALYZE

-- Check when tables were last analyzed
SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze,
       n_live_tup, n_dead_tup
FROM pg_stat_user_tables;
```

Execution plans reveal exactly how the database will (or did) execute your query. Reading plans is the most essential performance tuning skill.
Key execution plan elements:
Access methods: How tables are accessed
- Seq Scan / Table Scan — Full table read (often bad)
- Index Scan — Use index to find rows, then fetch from table
- Index Only Scan — Read only from index (covering index)
- Bitmap Index Scan — Build bitmap of matching rows, then fetch

Join methods: How tables are combined

- Nested Loop — For each row in outer, scan inner (good for small outer, indexed inner)
- Hash Join — Build hash table on smaller table, probe with larger (good for equality joins)
- Merge Join — Scan both sorted tables in parallel (good for pre-sorted or indexed data)

Cost estimates: Optimizer's predicted resource usage

- startup cost..total cost — Units are abstract (not milliseconds)
- rows — Estimated number of output rows
- width — Average row size in bytes
```sql
-- Basic execution plan (estimated)
EXPLAIN SELECT * FROM orders WHERE customer_id = 123;

-- Detailed plan with actual execution timings
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 123;

-- Example output interpretation:
/*
Index Scan using idx_customer_id on orders
    (cost=0.43..8.45 rows=3 width=120) (actual time=0.021..0.024 rows=3 loops=1)
  Index Cond: (customer_id = 123)
  Buffers: shared hit=4
Planning Time: 0.085 ms
Execution Time: 0.041 ms
*/

-- Interpretation:
-- - Using index idx_customer_id (good)
-- - Estimated 3 rows, actually got 3 rows (estimates accurate)
-- - Buffers: shared hit=4 means 4 blocks read from cache (no disk I/O)
-- - Total time: 0.041 ms (excellent)

-- Problem plan example:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders
WHERE DATE_TRUNC('day', order_date) = '2024-05-15';

/*
Seq Scan on orders
    (cost=0.00..25000.00 rows=500 width=120) (actual time=0.015..245.000 rows=483 loops=1)
  Filter: (date_trunc('day', order_date) = '2024-05-15')
  Rows Removed by Filter: 999517
  Buffers: shared hit=12500 read=12500
*/

-- Problem: Function on column prevents index use → full table scan
-- Fix: Use sargable predicate
SELECT * FROM orders
WHERE order_date >= '2024-05-15' AND order_date < '2024-05-16';
```

| Warning Sign | Meaning | Typical Fix |
|---|---|---|
| Seq Scan / Full Table Scan on large table | No usable index found | Add appropriate index |
| Estimated vs actual rows differ greatly | Statistics outdated | ANALYZE the table |
| Nested Loop with Seq Scan on inner | Inner table scanned repeatedly | Add index on join column |
| Sort / Using filesort | Data sorted in memory/disk | Add index covering ORDER BY |
| Materialize / Using temporary | Intermediate results stored | Simplify query, add indexes |
| High Buffers: read count | Data fetched from disk, not cache | Increase buffer pool, or access pattern issue |
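To illustrate the "Nested Loop with Seq Scan on inner" warning sign, here is a minimal sketch; the orders and order_items tables and the index name are hypothetical:

```sql
-- Suppose the plan shows a Nested Loop whose inner side is a Seq Scan on
-- order_items, repeated once per outer row (loops=N in EXPLAIN ANALYZE output).
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.id, i.product_id
FROM orders o
JOIN order_items i ON i.order_id = o.id
WHERE o.customer_id = 123;

-- Adding an index on the join column lets the planner replace the repeated
-- sequential scan with an indexed lookup on the inner side.
CREATE INDEX idx_order_items_order_id ON order_items (order_id);

-- Re-run the EXPLAIN above: the inner node should become an Index Scan.
```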
Disk I/O is the ultimate bottleneck for database performance. All physical design decisions aim to minimize I/O—especially random I/O.
I/O optimization strategies:
1. Reduce total I/O volume: fetch only the columns and rows you need, use covering indexes so queries read index pages instead of the full table, and keep rows compact (see the sketch after this list).
2. Convert random I/O to sequential I/O: cluster related rows physically, favor range scans over scattered point lookups, and batch writes.
3. Reduce I/O latency: keep the hot working set in the buffer pool and place frequently accessed data on faster storage (SSD/NVMe).
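A minimal sketch of the first two strategies, assuming a hypothetical orders table: a covering index (PostgreSQL's INCLUDE clause) lets the query run as an Index Only Scan, reading far fewer pages and replacing scattered heap fetches with compact index reads.

```sql
-- Covering index: the filter column leads, and the selected columns are
-- stored in the index so the heap is never touched.
CREATE INDEX idx_orders_cust_date
    ON orders (customer_id, order_date) INCLUDE (total_amount);

EXPLAIN (ANALYZE, BUFFERS)
SELECT order_date, total_amount
FROM orders
WHERE customer_id = 123;
-- Look for "Index Only Scan" and "Heap Fetches: 0" in the output.
```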
Sequential vs Random I/O:
| Access Pattern | HDD Performance | SSD Performance |
|---|---|---|
| Sequential Read | 100-200 MB/s | 500-3000+ MB/s |
| Random Read (4KB) | ~100 IOPS | 50,000-500,000 IOPS |
| Time for 1000 random reads | ~10 seconds | 2-20 ms |
Even with SSDs, sequential access is 10-100x faster than random access. Physical design should favor sequential patterns.
The clustering advantage:
Clustered tables store related rows physically adjacent. For a range query, a clustered table reads a few contiguous blocks sequentially, while an unclustered table may perform one random block read per matching row.
This 20-200x difference explains why clustering key selection is critical.
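In PostgreSQL, clustering is a one-time rewrite rather than a maintained property. A sketch, assuming an existing index idx_orders_date on order_date:

```sql
-- Rewrite the table in index order (takes an exclusive lock; new rows are
-- not kept in order, so re-run periodically if clustering matters).
CLUSTER orders USING idx_orders_date;
ANALYZE orders;  -- refresh statistics after the rewrite

-- correlation near 1.0 means heap order closely follows column order
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'order_date';
```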
Use operating system tools (iostat, iotop) and database metrics to distinguish read vs write I/O, sequential vs random I/O, and identify which queries/tables generate the most I/O. This data guides physical design decisions—focusing optimization where impact is greatest.
```sql
-- Track I/O at the table level
SELECT schemaname, relname,
       heap_blks_read,  -- Blocks read from disk
       heap_blks_hit,   -- Blocks found in cache
       ROUND(100.0 * heap_blks_hit /
             NULLIF(heap_blks_hit + heap_blks_read, 0), 2) as cache_hit_ratio,
       idx_blks_read,
       idx_blks_hit
FROM pg_statio_user_tables
ORDER BY heap_blks_read DESC
LIMIT 10;

-- Track I/O at the index level
SELECT indexrelname,
       idx_blks_read,
       idx_blks_hit,
       ROUND(100.0 * idx_blks_hit /
             NULLIF(idx_blks_hit + idx_blks_read, 0), 2) as cache_hit_ratio
FROM pg_statio_user_indexes
ORDER BY idx_blks_read DESC
LIMIT 10;

-- Overall database cache hit ratio (should be > 99% for OLTP)
SELECT sum(heap_blks_hit) as hits,
       sum(heap_blks_read) as reads,
       ROUND(100.0 * sum(heap_blks_hit) /
             NULLIF(sum(heap_blks_hit + heap_blks_read), 0), 2) as hit_ratio
FROM pg_statio_user_tables;
```

The buffer pool (or buffer cache) is the database's primary memory cache for data pages. Effective buffer management dramatically reduces disk I/O.
How the buffer pool works: when a query needs a page, the database checks the buffer pool first. On a hit the page is served from memory; on a miss it is read from disk into the pool, evicting a less recently used page if necessary. Modified ("dirty") pages are written back to disk asynchronously by background writers.
Buffer pool sizing: size the pool so the hot working set fits in memory, while leaving room for the operating system, connections, sorts, and other caches (see the table below).
Working set concept:
The working set is the data actively accessed over a time window. If the working set fits in buffer pool → excellent hit rate. If it exceeds buffer pool → thrashing (continuous eviction and re-reading).
| Database | Parameter | Recommendation |
|---|---|---|
| PostgreSQL | shared_buffers | 25-40% of RAM (OS caches the rest) |
| MySQL/InnoDB | innodb_buffer_pool_size | 70-80% of RAM (on dedicated server) |
| Oracle | sga_target / db_cache_size | 40-60% of RAM (depends on SGA components) |
| SQL Server | max server memory | Leave 4-8GB for OS; rest for SQL Server |
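A sketch of adjusting the PostgreSQL settings from the table above; the sizes are illustrative rather than recommendations, and shared_buffers takes effect only after a restart:

```sql
-- Example on a server with 32 GB RAM (illustrative values)
ALTER SYSTEM SET shared_buffers = '8GB';         -- ~25% of RAM
ALTER SYSTEM SET effective_cache_size = '24GB';  -- planner's estimate of OS cache + shared_buffers
-- Restart the server, then verify:
SHOW shared_buffers;
```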
Buffer pool monitoring:
Key metrics to track:
Hit ratio: Percentage of page requests served from memory
Eviction rate: How often pages are evicted
Dirty page ratio: Percentage of modified pages not yet written to disk
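A sketch of measuring the third metric with the pg_buffercache extension (the extension is installed in the monitoring block further down this page):

```sql
-- Estimate the dirty-page ratio inside shared_buffers
SELECT COUNT(*) FILTER (WHERE isdirty) AS dirty_buffers,
       COUNT(*) AS total_buffers,
       ROUND(100.0 * COUNT(*) FILTER (WHERE isdirty) / COUNT(*), 2) AS dirty_pct
FROM pg_buffercache;
```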
Advanced buffer pool features: databases also offer cache pre-warming after a restart (PostgreSQL's pg_prewarm), multiple buffer pool instances to reduce contention (MySQL), and dedicated keep/recycle pools for pinning hot objects (Oracle).
```sql
-- Check shared_buffers setting
SHOW shared_buffers;

-- Check hit ratios (from earlier I/O section)
SELECT sum(heap_blks_hit) as buffer_hits,
       sum(heap_blks_read) as disk_reads,
       ROUND(100.0 * sum(heap_blks_hit) /
             NULLIF(sum(heap_blks_hit + heap_blks_read), 0), 2) as hit_ratio
FROM pg_statio_user_tables;

-- Check buffer usage by table (pg_buffercache extension)
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

SELECT c.relname,
       COUNT(*) as buffers,
       pg_size_pretty(COUNT(*) * 8192) as buffer_size
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode
WHERE c.relname NOT LIKE 'pg_%'
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```

Effective physical design requires understanding your workload—the mix of queries, their patterns, and their requirements. Different workload types demand different optimization strategies.
OLTP (Online Transaction Processing): many short transactions, point lookups and small writes, and strict per-query latency requirements.
Optimization focus: Indexes for point lookups, minimal locking, fast commits, connection pooling.
OLAP (Online Analytical Processing): fewer, long-running analytical queries that scan and aggregate large data volumes; overall throughput matters more than individual query latency.
Optimization focus: Columnar storage, bitmap indexes, materialized aggregates, parallel query execution.
| Aspect | OLTP Optimization | OLAP Optimization |
|---|---|---|
| Storage | Row-oriented (heap + indexes) | Column-oriented, compressed |
| Indexes | B-tree on filters/joins | Bitmap, bloom filters, zone maps |
| Buffer pool | Maximize hit ratio | Tolerate lower hit ratio, focus on sequential I/O |
| Partitioning | By tenant or hot/cold | By date for time-series analytics |
| Denormalization | Minimal, for specific hot paths | Heavy (star schema, fact/dimension) |
| Query execution | Single-threaded, fast | Parallel, batch-oriented |
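For the OLAP column of the table, a hedged sketch using a hypothetical sales fact table: a BRIN index behaves like a zone map, storing min/max values per block range, so it stays tiny and works well when rows are physically ordered by date.

```sql
CREATE INDEX idx_sales_date_brin ON sales USING brin (sale_date);

-- The planner skips block ranges whose min/max cannot match the predicate.
EXPLAIN (ANALYZE)
SELECT SUM(amount)
FROM sales
WHERE sale_date >= '2024-01-01' AND sale_date < '2024-02-01';
```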
Hybrid workloads (HTAP):
Modern systems often combine OLTP and OLAP requirements: the same data must serve live transactions and near-real-time dashboards or reports.
Solutions for HTAP: offload analytics to read replicas, maintain a columnar copy of hot tables alongside the row store, or stream changes into a separate analytical system.
Workload analysis queries:
Understand your workload before optimizing:
```sql
-- Enable pg_stat_statements extension for query tracking
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top queries by total time
SELECT queryid,
       LEFT(query, 80) as query_preview,
       calls,
       total_exec_time::NUMERIC(10,2) as total_ms,
       mean_exec_time::NUMERIC(10,2) as avg_ms,
       rows,
       shared_blks_hit,
       shared_blks_read
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Read vs Write distribution
SELECT SUM(CASE WHEN query ~* '^select' THEN calls ELSE 0 END) as reads,
       SUM(CASE WHEN query ~* '^(insert|update|delete)' THEN calls ELSE 0 END) as writes
FROM pg_stat_statements;

-- Table access patterns
SELECT schemaname, relname,
       seq_scan,   -- Full table scans
       idx_scan,   -- Index scans
       n_tup_ins,  -- Inserts
       n_tup_upd,  -- Updates
       n_tup_del   -- Deletes
FROM pg_stat_user_tables
ORDER BY seq_scan + idx_scan DESC
LIMIT 10;
```

Never optimize blindly. Collect workload data for at least one full business cycle (day, week, or month depending on your patterns). Understand which queries matter most—the top 10% of queries often consume 90% of resources. Focus optimization effort there.
Performance tuning is effective only when approached systematically. The following methodology prevents wasted effort and ensures measurable results.
Phase 1: Establish Baselines

Before any optimization, record current performance: query latencies (p50/p95/p99), throughput, and resource utilization. Without a baseline you cannot prove a change helped.

Phase 2: Identify Bottlenecks

Determine what's limiting performance: CPU, memory, I/O, lock contention, or a handful of expensive queries. Use monitoring data and execution plans rather than intuition.

Phase 3: Targeted Optimization

Address the identified bottleneck with the single most promising change: an index, a query rewrite, a configuration adjustment, or a schema refinement.

Phase 4: Validate and Document

Re-measure against the baseline, confirm the improvement (and watch for regressions elsewhere), and record what changed and why.
Common tuning mistakes:
Make one change, measure, document. If you change five things simultaneously and performance improves, you don't know which change helped. Worse, one change may have helped while another hurt, and you've masked the problem.
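A sketch of this baseline-then-compare loop using pg_stat_statements; the perf_baseline table name is hypothetical:

```sql
-- 1. Snapshot current query statistics as the baseline
CREATE TABLE perf_baseline AS
SELECT now() AS captured_at, queryid, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements;

-- 2. Reset counters, apply ONE change, and let the workload run
SELECT pg_stat_statements_reset();

-- 3. Compare: a positive delta means the query got faster
SELECT s.queryid,
       b.mean_exec_time AS before_ms,
       s.mean_exec_time AS after_ms,
       (b.mean_exec_time - s.mean_exec_time) AS delta_ms
FROM pg_stat_statements s
JOIN perf_baseline b USING (queryid)
ORDER BY delta_ms DESC;
```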
Proactive monitoring catches performance issues before they become incidents. Establish comprehensive monitoring covering key database health metrics.
Essential metrics to monitor:
| Category | Metrics | Alert Thresholds |
|---|---|---|
| Latency | p50, p95, p99 query latency | p99 > 100ms (adjust per SLA) |
| Throughput | Queries/second, transactions/second | Below baseline, sudden drops |
| Resource - CPU | CPU utilization, system vs user time | > 80% sustained |
| Resource - Memory | Buffer pool hit ratio, swap usage | Hit ratio < 99%, any swap |
| Resource - I/O | IOPS, throughput, await time | Await > 20ms (HDD) / > 5ms (SSD) |
| Connections | Active connections, connection wait | Near max_connections, wait > 0 |
| Locks | Lock waits, deadlocks | Any deadlocks, wait > 1s |
| Replication | Replication lag | Lag above acceptable threshold (e.g., > 10s) |
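Hedged sketches for checking a few of these metrics directly in PostgreSQL (the replication query is meaningful only on a standby):

```sql
-- Connections: active vs. configured maximum
SELECT COUNT(*) FILTER (WHERE state = 'active') AS active,
       COUNT(*) AS total,
       current_setting('max_connections')::INT AS max_conn
FROM pg_stat_activity;

-- Locks: sessions currently waiting on a lock
SELECT pid, wait_event_type, wait_event, LEFT(query, 80) AS query_preview
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';

-- Replication lag in seconds (run on a standby; NULL on a primary)
SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())) AS lag_seconds;
```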
Monitoring stack components:
Data collection: Agents that gather metrics (Prometheus postgres_exporter, Percona PMM, Datadog agent)
Storage: Time-series database for metric history (Prometheus, InfluxDB, Datadog)
Visualization: Dashboards showing trends and current state (Grafana, built-in vendor dashboards)
Alerting: Notifications when thresholds are exceeded (PagerDuty, OpsGenie, Slack integrations)
Query analysis: Slow query logs, query profiling (pg_stat_statements, Performance Schema)
Slow query logging:
Capture queries exceeding latency threshold for analysis:
```sql
-- Enable slow query logging
ALTER SYSTEM SET log_min_duration_statement = 100;  -- Log queries > 100ms
ALTER SYSTEM SET log_statement = 'none';            -- Don't log all statements
ALTER SYSTEM SET log_lock_waits = on;               -- Log lock waits
SELECT pg_reload_conf();

-- Also configure in postgresql.conf:
-- log_destination = 'csvlog'
-- logging_collector = on
-- log_directory = 'pg_log'

-- Query pg_stat_statements for slow query analysis
SELECT queryid,
       calls,
       mean_exec_time::NUMERIC(10,2) as avg_ms,
       max_exec_time::NUMERIC(10,2) as max_ms,
       stddev_exec_time::NUMERIC(10,2) as stddev_ms,
       LEFT(query, 100)
FROM pg_stat_statements
WHERE mean_exec_time > 100  -- Queries averaging > 100ms
ORDER BY mean_exec_time DESC;
```

Alert thresholds should be based on YOUR normal operations, not generic values. Collect 2-4 weeks of data, establish baselines (mean, p95, p99), then set alerts at 2-3 standard deviations above normal. This reduces false positives while catching true anomalies.
Performance is the culmination of all physical design decisions. Understanding the query processing pipeline, I/O patterns, buffer management, and workload characteristics enables systematic optimization rather than guesswork.
Module Complete:
With this page, you've completed the Physical Design module. You now understand how data is physically stored and accessed, how to design and evaluate indexes, when and how to partition, when denormalization pays off, and how to tune and monitor performance systematically.
These skills enable you to translate logical database designs into physical implementations that perform well under real-world workloads.
You've mastered the physical design phase of database development. You can now make informed decisions about storage structures, indexes, partitioning, and denormalization based on workload analysis. Combined with logical design knowledge, you possess the complete toolkit for professional database design and optimization.