In most systems, the database is the ultimate arbiter of performance. You can optimize application code, add caching layers, and scale horizontally—but eventually, data must be read or written, and the database becomes the bottleneck of last resort.
Database query analysis is the discipline of understanding how databases execute queries, identifying inefficiencies, and systematically improving query performance. It turns database tuning from trial-and-error into a repeatable engineering practice.
A single poorly-written query can bring a system to its knees. Conversely, a well-optimized query can handle millions of operations efficiently. The difference often comes down to understanding how the database executes your queries—and that requires analysis.
By the end of this page, you will understand query execution plans (EXPLAIN), slow query logging, index analysis, query pattern identification, and N+1 problem detection. You'll learn to diagnose database performance issues systematically using the same techniques employed by principal engineers and database specialists.
Database operations fundamentally differ from application code in one critical way: they're I/O-bound and often involve disk access. While CPU-bound work benefits from faster processors, database operations are constrained by the physical limits of storage and network.
The Scale Amplification Problem:
Consider a query that takes 50 milliseconds. In isolation, this seems acceptable. But multiply it by the number of queries per request and the number of requests per second, and the math explains why databases become bottlenecks quickly: small inefficiencies compound into system-wide problems.
| Query Time | Queries/Request | Requests/Second | Database Load | System Impact |
|---|---|---|---|---|
| 5ms | 3 | 100 | 1.5s/s | Comfortable headroom |
| 50ms | 3 | 100 | 15s/s | Moderate load, limited scaling |
| 500ms | 3 | 100 | 150s/s | Database saturated, queue buildup |
| 5ms | 30 (N+1) | 100 | 15s/s | Hidden inefficiency, same problems |
The Invisibility Problem:
Database inefficiencies are often invisible during development: test datasets are small enough that full scans finish instantly, caches stay warm, and there is no concurrent load competing for resources.
Queries that perform perfectly in development can be catastrophically slow in production. Query analysis practices catch these issues before deployment.
Object-Relational Mappers (ORMs) abstract SQL generation, which often hides inefficient query patterns. The most common is the N+1 problem: fetching a list, then querying each item individually. An innocent-looking for loop becomes 100 separate database queries. Query analysis reveals these hidden patterns; we return to N+1 detection in detail later on this page.
Every major database provides an EXPLAIN command that reveals how it intends to execute a query. This is the single most important tool for query analysis. It shows which indexes will be used (or ignored), how tables will be joined and in what order, and the optimizer's estimated cost and row count for each step.
Understanding EXPLAIN output transforms query optimization from guesswork to science.
```sql
-- PostgreSQL EXPLAIN ANALYZE Example
-- EXPLAIN shows the plan; ANALYZE actually runs it with timing

EXPLAIN ANALYZE
SELECT o.order_id, o.order_date, c.customer_name, p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date >= '2024-01-01'
  AND c.country = 'USA';

-- Example Output:
/*
Hash Join  (cost=856.23..2341.56 rows=1523 width=128) (actual time=12.3..45.7 rows=1456 loops=1)
  Hash Cond: (oi.product_id = p.product_id)
  ->  Hash Join  (cost=423.12..1567.89 rows=1523 width=96) (actual time=8.1..32.4 rows=1456 loops=1)
        Hash Cond: (o.customer_id = c.customer_id)
        ->  Nested Loop  (cost=0.43..1089.34 rows=5234 width=56) (actual time=0.05..18.2 rows=4238 loops=1)
              ->  Index Scan using idx_orders_date on orders o  (cost=0.43..234.56 rows=2341 width=24) (actual time=0.03..2.1 rows=2156 loops=1)
                    Index Cond: (order_date >= '2024-01-01'::date)
              ->  Index Scan using idx_order_items_order on order_items oi  (cost=0.29..0.35 rows=2 width=32) (actual time=0.01..0.01 rows=2 loops=2156)
                    Index Cond: (order_id = o.order_id)
        ->  Hash  (cost=312.45..312.45 rows=8934 width=40) (actual time=7.8..7.8 rows=8234 loops=1)
              Buckets: 16384  Batches: 1  Memory Usage: 524kB
              ->  Seq Scan on customers c  (cost=0.00..312.45 rows=8934 width=40) (actual time=0.02..5.4 rows=8234 loops=1)
                    Filter: (country = 'USA'::text)
                    Rows Removed by Filter: 12456
  ->  Hash  (cost=234.56..234.56 rows=15234 width=32) (actual time=4.1..4.1 rows=15234 loops=1)
        Buckets: 32768  Batches: 1  Memory Usage: 892kB
        ->  Seq Scan on products p  (cost=0.00..234.56 rows=15234 width=32) (actual time=0.01..2.3 rows=15234 loops=1)
Planning Time: 0.892 ms
Execution Time: 46.234 ms
*/

-- Key things to look for in this output:

-- 1. SCAN TYPES:
--    - "Seq Scan" = Full table scan (potentially slow on large tables)
--    - "Index Scan" = Uses an index (generally good)
--    - "Index Only Scan" = All data from index, no table access (best)
--    - "Bitmap Heap Scan" = Combines multiple indexes

-- 2. JOIN TYPES:
--    - "Nested Loop" = For each row in A, scan B (good for small result sets)
--    - "Hash Join" = Build hash of one table, probe with other (good for larger sets)
--    - "Merge Join" = Walk sorted tables together (when both pre-sorted)

-- 3. RED FLAGS:
--    - "Seq Scan" on large tables with filters
--    - "Rows Removed by Filter" much larger than rows returned
--    - "loops=X" where X is very large (nested loop with many iterations)
--    - Estimated rows differ drastically from actual rows
```

Interpreting Cost Estimates:
Cost numbers in EXPLAIN output are relative, not absolute time. They represent the database's estimate of work in an internal unit (in PostgreSQL, the baseline is the cost of one sequential page read). What matters is the relative cost between candidate plans, not the absolute numbers.
Critical: a large discrepancy between estimated rows and actual rows is a major warning sign. It usually means the planner's statistics are stale; run ANALYZE to refresh them.
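As a quick sketch, reusing the orders table from the EXPLAIN example above, refreshing statistics and inspecting what the planner believes looks like this:

```sql
-- Refresh the planner's statistics for one table
ANALYZE orders;

-- Inspect the collected statistics for a column
SELECT attname, n_distinct, null_frac, most_common_vals
FROM pg_stats
WHERE tablename = 'orders'
  AND attname = 'order_date';
```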
A related trap: in EXPLAIN ANALYZE output, a node's actual time is the average per loop, so multiply it by the loops count to get the true total. 1,000 loops × 10ms is 10 seconds.

Plain EXPLAIN shows the plan without executing. EXPLAIN ANALYZE executes the query and shows actual timing. For debugging, always use ANALYZE—the optimizer's estimates may differ dramatically from reality. But remember: ANALYZE actually runs the query, so avoid it on data-modifying statements in production.
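When you do need real timings for a write, a common pattern is to wrap the statement in a transaction you roll back. A minimal sketch, using a hypothetical inventory table:

```sql
BEGIN;

-- The UPDATE genuinely executes here (and holds row locks until the
-- transaction ends), but ROLLBACK discards its effects.
EXPLAIN ANALYZE
UPDATE inventory
SET quantity = quantity - 1
WHERE product_id = 42;

ROLLBACK;
```

Keep such transactions short: the row locks the statement takes are real until the rollback completes.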
While EXPLAIN analyzes individual queries you're aware of, slow query logs capture problematic queries that emerge in production—including ones you never anticipated.
Every major database can be configured to log queries exceeding a time threshold. This creates a continuous record of performance problems as they occur in real workloads.
Why This Matters:
Your application may execute thousands of distinct query patterns. You can't EXPLAIN them all preemptively. Slow query logs reveal queries you never thought to examine, patterns that appear only at production data volumes, and regressions introduced by new code or data growth.
```sql
-- PostgreSQL Slow Query Log Configuration
-- Edit postgresql.conf or use ALTER SYSTEM

-- Log queries slower than 1 second
ALTER SYSTEM SET log_min_duration_statement = '1000ms';

-- Log all statements (verbose, for debugging only)
-- ALTER SYSTEM SET log_statement = 'all';

-- Include duration in log output
ALTER SYSTEM SET log_duration = 'on';

-- Log lock waits longer than 1 second
ALTER SYSTEM SET log_lock_waits = 'on';
ALTER SYSTEM SET deadlock_timeout = '1s';

-- Reload configuration
SELECT pg_reload_conf();

-----------------------------------------------------------

-- MySQL Slow Query Log Configuration
-- Edit my.cnf or use SET GLOBAL

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';

-- Log queries slower than 2 seconds
SET GLOBAL long_query_time = 2;

-- Log queries not using indexes
SET GLOBAL log_queries_not_using_indexes = 'ON';

-- Specify log file location
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

-----------------------------------------------------------

-- Example Slow Query Log Entry (MySQL):
/*
# Time: 2024-06-15T14:23:45.123456Z
# User@Host: app_user[app_user] @ app-server-01 [10.0.1.45]
# Query_time: 12.456789  Lock_time: 0.000234  Rows_sent: 1  Rows_examined: 8234567
SET timestamp=1718456625;
SELECT customer_id, email, created_at
FROM customers
WHERE LOWER(email) LIKE '%@example.com';
*/

-- RED FLAGS in this log entry:
-- 1. Query_time: 12.4 seconds (extremely slow)
-- 2. Rows_examined: 8,234,567 but Rows_sent: 1 (scanning millions for one result)
-- 3. LOWER(email) prevents index use (function on column)
-- 4. LIKE '%...' can't use index prefix (leading wildcard)

-- Solutions:
-- A functional index fixes the function-on-column problem (MySQL 8.0.13+ syntax):
--   CREATE INDEX idx_customers_email_lower ON customers ((LOWER(email)));
-- The leading wildcard, however, still defeats a B-tree scan; store the
-- normalized email domain in its own indexed column and filter on equality.
```

Analyzing Slow Query Logs at Scale:
In high-traffic systems, slow query logs can become massive. Direct reading is impractical. Use aggregation tools to identify patterns:
| Tool | Database | Key Features |
|---|---|---|
| pt-query-digest (Percona) | MySQL | Groups similar queries, shows worst offenders, provides optimization suggestions |
| pgBadger | PostgreSQL | Generates HTML reports, identifies patterns, shows hourly/daily breakdown |
| pg_stat_statements | PostgreSQL | Built-in extension, tracks all queries, cumulative statistics |
| Performance Insights (AWS) | RDS/Aurora | Cloud-native, real-time, identifies wait events |
| Query Store (SQL Server) | SQL Server | Built-in, tracks plan changes, identifies regressions |
```sql
-- PostgreSQL pg_stat_statements: Built-in Query Analysis

-- Enable the extension (requires restart):
-- shared_preload_libraries = 'pg_stat_statements'
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 queries by total execution time (cumulative impact)
SELECT
    substring(query, 1, 80) AS short_query,
    calls,
    round(total_exec_time::numeric, 2) AS total_time_ms,
    round(mean_exec_time::numeric, 2) AS avg_time_ms,
    round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) AS percentage,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Example output:
/*
 short_query                                          | calls  | total_time_ms | avg_time_ms | percentage | rows
------------------------------------------------------+--------+---------------+-------------+------------+--------
 SELECT o.*, c.name FROM orders o JOIN customers c... | 234567 |    4523456.23 |       19.28 |      34.56 | 234567
 UPDATE inventory SET quantity = quantity - $1 WHERE  |  89234 |    1234567.89 |       13.84 |       9.43 |  89234
 SELECT * FROM products WHERE category_id = $1 AND... | 456789 |     987654.32 |        2.16 |       7.54 | 892345
*/

-- Top 10 queries by average execution time (individual latency)
SELECT
    substring(query, 1, 80) AS short_query,
    calls,
    round(mean_exec_time::numeric, 2) AS avg_time_ms,
    round(stddev_exec_time::numeric, 2) AS stddev_ms,
    round(max_exec_time::numeric, 2) AS max_time_ms
FROM pg_stat_statements
WHERE calls > 100  -- Filter out rare queries
ORDER BY mean_exec_time DESC
LIMIT 10;

-- Queries with high variance (inconsistent performance)
SELECT
    substring(query, 1, 80) AS short_query,
    calls,
    round(mean_exec_time::numeric, 2) AS avg_time_ms,
    round(stddev_exec_time::numeric, 2) AS stddev_ms,
    round((stddev_exec_time / NULLIF(mean_exec_time, 0))::numeric, 2) AS coefficient_of_variation
FROM pg_stat_statements
WHERE calls > 100
  AND mean_exec_time > 10  -- At least 10ms average
ORDER BY (stddev_exec_time / NULLIF(mean_exec_time, 0)) DESC
LIMIT 10;

-- High variance often indicates:
-- 1. Parameter-dependent plan selection
-- 2. Lock contention under load
-- 3. Cold vs. warm cache performance differences
```

In most systems, a small number of query patterns dominate total database time. pg_stat_statements often reveals that 10-20 distinct query patterns account for 80%+ of database load. Focus optimization efforts on these high-impact queries first.
Indexes are the primary mechanism for query acceleration, but they're not free. Each index consumes disk space, must be maintained on every INSERT, UPDATE, and DELETE, and gives the query planner one more option to evaluate.
Index analysis determines whether indexes are being used effectively, identifies missing indexes, and finds unused indexes that should be removed.
```sql
-- PostgreSQL Index Analysis Queries

-- Find unused indexes (candidates for removal)
SELECT
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS times_used,
    idx_tup_read AS tuples_read
FROM pg_stat_user_indexes
WHERE idx_scan = 0  -- Never used
  AND indexrelid NOT IN (
      SELECT conindid FROM pg_constraint  -- Exclude constraint indexes
  )
ORDER BY pg_relation_size(indexrelid) DESC;

-- Output example:
/*
 schemaname | tablename  | indexname            | index_size | times_used | tuples_read
------------+------------+----------------------+------------+------------+-------------
 public     | orders     | idx_orders_region    | 2 GB       |          0 |           0
 public     | users      | idx_users_legacy_col | 500 MB     |          0 |           0
*/
-- These indexes are consuming space but providing no benefit!

-----------------------------------------------------------

-- Find missing indexes (tables with sequential scans)
SELECT
    schemaname,
    relname AS tablename,
    seq_scan,
    seq_tup_read,
    idx_scan,
    idx_tup_fetch,
    CASE WHEN seq_scan > 0
         THEN round((seq_tup_read::numeric / seq_scan), 2)
         ELSE 0
    END AS avg_rows_per_seq_scan,
    pg_size_pretty(pg_relation_size(relid)) AS table_size
FROM pg_stat_user_tables
WHERE seq_scan > 1000  -- Significant number of sequential scans
  AND pg_relation_size(relid) > 10000000  -- Tables > 10MB
ORDER BY seq_tup_read DESC
LIMIT 10;

-- Output example:
/*
 schemaname | tablename | seq_scan | seq_tup_read | idx_scan | avg_rows_per_seq_scan | table_size
------------+-----------+----------+--------------+----------+-----------------------+------------
 public     | events    |    45230 |   8923456780 |     1234 |             197345.67 | 15 GB
*/
-- 45,230 sequential scans averaging 197K rows each on a 15GB table!
-- This table desperately needs an index for the common query pattern.

-----------------------------------------------------------

-- Index efficiency: Are indexes being used effectively?
SELECT
    indexrelname AS indexname,
    relname AS tablename,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS index_scans,
    idx_tup_read AS rows_read,
    idx_tup_fetch AS rows_fetched,
    CASE WHEN idx_scan > 0
         THEN round((idx_tup_read::numeric / idx_scan), 2)
         ELSE 0
    END AS avg_rows_per_scan
FROM pg_stat_user_indexes
WHERE idx_scan > 0
ORDER BY avg_rows_per_scan DESC
LIMIT 10;

-- High avg_rows_per_scan might indicate:
-- 1. Index is not selective enough
-- 2. Queries aren't providing enough filter criteria
-- 3. Index could be improved (partial index, better column order)
```

Composite Index Column Order Matters:
For multi-column indexes, column order is critical. The index is organized by the first column, then by the second within each first-column value, and so on.
Principle: Order columns by selectivity and query patterns, placing equality-filtered columns first and range-filtered columns last:
```sql
-- Example: Composite Index Column Order

-- Query pattern:
SELECT * FROM orders
WHERE customer_id = 123
  AND status = 'pending'
  AND created_at > '2024-01-01';

-- Index options:

-- OPTION A: (customer_id, status, created_at)
-- Good if customer_id is highly selective
-- Filters to specific customer, then status, then scans dates

-- OPTION B: (status, customer_id, created_at)
-- Better if status reduces rows more than customer_id
-- Example: Only 1% of orders are 'pending'

-- OPTION C: (created_at, customer_id, status)
-- POOR: Range column first means we can't efficiently use
-- the rest of the index for equality filters

-- To determine best order, analyze cardinality:
SELECT
    COUNT(DISTINCT customer_id) AS customer_cardinality,
    COUNT(DISTINCT status) AS status_cardinality,
    COUNT(*) AS total_rows,
    COUNT(*) FILTER (WHERE status = 'pending') AS pending_count
FROM orders;

-- Example output:
-- customer_cardinality: 50,000
-- status_cardinality: 5
-- total_rows: 10,000,000
-- pending_count: 100,000 (1%)

-- Analysis:
-- customer_id: 10M / 50K = 200 rows per customer (good selectivity)
-- status = 'pending': 100K rows (1% of table)
--
-- Best index: (status, customer_id, created_at)
-- WHY: 'pending' immediately reduces to 100K rows,
-- then customer_id picks specific customer's pending orders,
-- then created_at range is applied
```

A composite index on (A, B, C) can satisfy queries filtering on A, or A+B, or A+B+C. It cannot efficiently satisfy queries filtering only on B, only on C, or B+C without A. This is the 'leftmost prefix rule.' Always check that your query patterns align with your index column order.
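A quick sketch of the rule in action, assuming the (status, customer_id, created_at) ordering recommended above (the index name is illustrative):

```sql
CREATE INDEX idx_orders_status_cust_date
    ON orders (status, customer_id, created_at);

-- Can use the index (a leftmost prefix is present):
SELECT * FROM orders WHERE status = 'pending';
SELECT * FROM orders WHERE status = 'pending' AND customer_id = 123;

-- Cannot use it efficiently (the leading column is missing):
SELECT * FROM orders WHERE customer_id = 123;
SELECT * FROM orders WHERE created_at > '2024-01-01';

-- Verify with EXPLAIN: the second pair should fall back to a Seq Scan
-- (or a different index) rather than idx_orders_status_cust_date.
```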
The N+1 query problem is one of the most common and devastating performance issues in database-backed applications. It's particularly insidious because it often hides behind innocent-looking application code, especially when using ORMs.
The Pattern: fetch a list of N records with one query, then issue one additional query per record to load related data, for N+1 queries in total.
This turns an O(1) query count into O(N), where each query carries the overhead of a network round-trip, query parsing, and lock acquisition.
```python
# N+1 Problem Example (Python with a SQLAlchemy-style ORM)
# Assumes Flask-SQLAlchemy-style models: Order (with an `items`
# relationship and a `customer_name` column) and OrderItem.
from sqlalchemy.orm import joinedload


# BAD: N+1 Query Pattern
def get_orders_with_items_bad():
    """
    This innocent-looking code causes N+1 queries.
    If there are 100 orders, this executes 101 queries!
    """
    # Query 1: Get all orders
    orders = Order.query.filter_by(status='pending').all()

    results = []
    for order in orders:
        # Queries 2..N+1: Each iteration triggers a new query!
        # The ORM lazy-loads order.items when accessed.
        items = order.items  # <-- Hidden query here!
        results.append({
            'order_id': order.id,
            'customer': order.customer_name,
            'items': [{'name': i.name, 'qty': i.quantity} for i in items]
        })
    return results

# SQL generated (100 orders = 101 queries):
#   SELECT * FROM orders WHERE status = 'pending';
#   SELECT * FROM order_items WHERE order_id = 1;
#   SELECT * FROM order_items WHERE order_id = 2;
#   SELECT * FROM order_items WHERE order_id = 3;
#   ... (97 more queries)
#   SELECT * FROM order_items WHERE order_id = 100;


# GOOD: Eager Loading Solution
def get_orders_with_items_good():
    """
    Use eager loading to fetch all data in 1-2 queries.
    """
    # Single query with JOIN (or a separate IN query via selectinload)
    orders = Order.query.filter_by(status='pending').options(
        joinedload(Order.items)  # Eager load items
    ).all()

    # No additional queries - data already loaded
    results = []
    for order in orders:
        items = order.items  # Already in memory!
        results.append({
            'order_id': order.id,
            'customer': order.customer_name,
            'items': [{'name': i.name, 'qty': i.quantity} for i in items]
        })
    return results

# SQL generated (always 1 query):
#   SELECT orders.*, order_items.*
#   FROM orders
#   LEFT JOIN order_items ON orders.id = order_items.order_id
#   WHERE orders.status = 'pending';


# ALTERNATIVE: Manual batching for complex cases
def get_orders_with_items_batched():
    """
    When a JOIN is inefficient (large result sets), batch manually.
    """
    # Query 1: Get all orders
    orders = Order.query.filter_by(status='pending').all()
    order_ids = [o.id for o in orders]

    # Query 2: Get all items for all orders in one query
    items = OrderItem.query.filter(OrderItem.order_id.in_(order_ids)).all()

    # Group items by order_id in Python (no DB query)
    items_by_order = {}
    for item in items:
        items_by_order.setdefault(item.order_id, []).append(item)

    results = []
    for order in orders:
        order_items = items_by_order.get(order.id, [])
        results.append({
            'order_id': order.id,
            'customer': order.customer_name,
            'items': [{'name': i.name, 'qty': i.quantity} for i in order_items]
        })
    return results

# SQL generated (always 2 queries):
#   SELECT * FROM orders WHERE status = 'pending';
#   SELECT * FROM order_items WHERE order_id IN (1, 2, 3, ..., 100);
```

Detecting N+1 Problems:
N+1 problems often hide until production load exposes them. Detection strategies include asserting query counts per request in integration tests, logging per-endpoint query counts in development, and watching query statistics for the telltale signature of a single-row lookup executed enormous numbers of times, as the sketch below shows.
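One hedged heuristic: in pg_stat_statements, N+1 loops tend to surface as queries with very high call counts that return about one row per call. A sketch, with illustrative thresholds:

```sql
SELECT
    substring(query, 1, 60) AS short_query,
    calls,
    rows,
    round(rows::numeric / NULLIF(calls, 0), 2) AS rows_per_call
FROM pg_stat_statements
WHERE calls > 10000                          -- executed very frequently...
  AND rows::numeric / NULLIF(calls, 0) <= 1  -- ...returning ~one row each time
ORDER BY calls DESC
LIMIT 10;
```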
The abstraction that makes ORMs convenient also hides N+1 problems. A simple object attribute access (order.items) might trigger a database query. Configure your ORM to raise exceptions on lazy loading in development, forcing explicit decisions about data fetching strategy.
Individual query optimization is necessary but not sufficient. Query pattern analysis examines how queries interact, their timing, concurrency effects, and aggregate behavior.
Pattern Categories:
| Pattern | Symptom | Root Cause | Solution |
|---|---|---|---|
| Hot Row | Lock contention on specific rows | Many transactions updating same row (counters, status) | Batching (see the sketch after this table), distributed counters, queue-based updates |
| Write Amplification | High replication lag, disk I/O | Updates to indexed columns, wide rows | Partial updates, narrow tables, index review |
| Temporal Skew | Performance degrades at specific times | Batch jobs conflicting with OLTP | Schedule isolation, read replicas for batch |
| Connection Exhaustion | Connection timeouts under load | Slow queries holding connections | Query optimization, connection pooling, timeouts |
| Cache Thrashing | Query cache hit rate drops | Query variation prevents caching | Query normalization, parameter binding |
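To make the Hot Row mitigation concrete, here is a minimal sketch in PostgreSQL syntax, assuming hypothetical page_views and page_view_events tables:

```sql
-- Hot-row pattern: every request contends for one counter row.
-- UPDATE page_views SET view_count = view_count + 1 WHERE page_id = 42;

-- Mitigation step 1: append-only inserts never contend on a shared row.
INSERT INTO page_view_events (page_id, viewed_at) VALUES (42, now());

-- Mitigation step 2: a single periodic job folds events into the
-- counters, consuming the events in the same transaction.
WITH consumed AS (
    DELETE FROM page_view_events
    RETURNING page_id
)
UPDATE page_views pv
SET view_count = pv.view_count + c.cnt
FROM (
    SELECT page_id, count(*) AS cnt
    FROM consumed
    GROUP BY page_id
) c
WHERE pv.page_id = c.page_id;
```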
Lock Analysis:
Query patterns that cause blocking are often invisible until under load. Lock contention analysis reveals these patterns:
```sql
-- PostgreSQL: View current locks and blocking relationships
SELECT
    blocked_locks.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking_locks.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocked_activity.query AS blocked_query,
    blocking_activity.query AS blocking_query,
    blocked_activity.state AS blocked_state,
    blocking_activity.state AS blocking_state,
    now() - blocked_activity.query_start AS blocked_duration
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
    ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
   AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
   AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
   AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
   AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
   AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
   AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
   AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
   AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
   AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
   AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity
    ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

-- Example output:
/*
 blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_query                               | blocking_query                              | blocked_duration
-------------+--------------+--------------+---------------+---------------------------------------------+---------------------------------------------+------------------
       12345 | app_user     |        12344 | app_user      | UPDATE inventory SET qty = $1 WHERE id = $2 | UPDATE inventory SET qty = $1 WHERE id = $2 | 00:00:05.234567
*/
-- This shows two transactions trying to update the same inventory row!
-- Solution: Application-level locking, queue-based updates, or optimistic concurrency

-- PostgreSQL: Queries spending the most time on block I/O
-- (requires pg_stat_statements with track_io_timing enabled)
SELECT
    substring(query, 1, 60) AS query,
    calls,
    round(total_exec_time::numeric / 1000, 2) AS total_exec_seconds,
    round(mean_exec_time::numeric, 2) AS avg_ms,
    round((blk_read_time + blk_write_time)::numeric, 2) AS total_io_time_ms
FROM pg_stat_statements
WHERE blk_read_time + blk_write_time > 0
ORDER BY blk_read_time + blk_write_time DESC
LIMIT 10;
```

Connection pools hide query patterns from the database. What looks like many short transactions might be a single long-held connection executing sequentially. Use application-level tracing to correlate queries with business operations. Connection poolers like PgBouncer can provide transaction-level visibility.
Query analysis shouldn't be a reactive, incident-driven activity. World-class engineering teams integrate continuous query analysis into their monitoring stack.
Key Database Metrics to Monitor: query latency percentiles (P50/P95/P99), slow query rate, connection pool utilization, replication lag, lock waits, and cache hit ratio. The alert examples below cover the most actionable of these.
Alerting Strategy:
Effective alerts catch problems before users notice:
```yaml
# Example: Prometheus + AlertManager Database Alerts

groups:
  - name: database_performance
    rules:
      # Query latency regression
      - alert: DatabaseQueryLatencyHigh
        expr: |
          histogram_quantile(0.95,
            rate(db_query_duration_seconds_bucket[5m])
          ) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Database query P95 latency above 500ms"
          description: "P95 query latency is {{ $value | humanizeDuration }}"

      # Sudden latency spike (deviation from baseline)
      - alert: DatabaseQueryLatencySpike
        expr: |
          histogram_quantile(0.95, rate(db_query_duration_seconds_bucket[5m]))
            /
          histogram_quantile(0.95, rate(db_query_duration_seconds_bucket[1h] offset 1d))
            > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Query latency 2x higher than same time yesterday"

      # Connection pool exhaustion
      - alert: DatabaseConnectionPoolNearLimit
        expr: |
          db_connections_active / db_connections_max > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Database connection pool at {{ $value | humanizePercentage }}"

      # Slow query rate increasing
      - alert: SlowQueryRateIncreasing
        expr: |
          rate(db_slow_queries_total[10m]) > 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Slow query rate: {{ $value }}/sec"

      # Replication lag
      - alert: ReplicationLagHigh
        expr: |
          db_replication_lag_seconds > 30
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Replication lag is {{ $value | humanizeDuration }}"
          description: "Read replicas are falling behind. Read-after-write may fail."
```

Database metrics in isolation are less useful than correlated views. A dashboard showing query latency, deploy markers, traffic volume, and error rate together reveals causality. Did latency increase after a deploy? Did a traffic spike cause the issue? Integrated observability answers these questions.
We've explored database query analysis from end to end: from reading execution plans to integrating continuous monitoring.
What's Next:
Database queries are one source of latency. The next page explores network monitoring—understanding and optimizing the network layer that connects distributed system components.
You now possess the knowledge to systematically analyze and optimize database queries. These skills apply across database technologies and are essential for any engineer working with data-intensive systems. Next, we'll turn our attention to network performance.