Every experienced database professional has war stories—queries that brought production to its knees, optimizations that made things worse, and silent time bombs that exploded at the worst possible moment.
These stories share common themes. The same pitfalls trap developer after developer, year after year. By learning to recognize these patterns before they cause outages, you can save yourself and your organization significant pain.
This page catalogs the most common performance pitfalls, explains why they occur, and provides actionable strategies for detection and prevention. Consider this your field guide to database performance antipatterns.
By the end of this page, you will recognize and prevent: N+1 query problems; unbounded queries missing pagination; lock contention patterns; statistics-related plan instability; over- and under-indexing; query plan regression triggers; and monitoring blind spots that hide problems until they explode.
The N+1 query problem is perhaps the most common performance issue in applications using ORMs. It occurs when code fetches a list of items (1 query), then fetches related data for each item individually (N queries).
```python
# The Problem (Python/SQLAlchemy style pseudocode)

# Query 1: Fetch 100 orders
orders = db.query("SELECT * FROM orders WHERE status = 'pending' LIMIT 100")

# Queries 2-101: Fetch customer for EACH order
for order in orders:
    customer = db.query(f"SELECT * FROM customers WHERE id = {order.customer_id}")
    print(f"{customer.name}: {order.total}")

# Result: 101 database round trips
# At 5ms per query = 505ms just for network latency

# The Solution: Eager Loading / Join

# Single query with JOIN
results = db.query("""
    SELECT o.*, c.name AS customer_name
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    WHERE o.status = 'pending'
    LIMIT 100
""")

# Result: 1 database round trip
# At 5ms + slightly more processing = ~8ms total
```

Why N+1 Is So Insidious:

The code looks correct, passes review, and runs fast in development, where N is small. The cost grows linearly with data volume, so the problem often surfaces only in production, long after the code shipped.
Detection Strategies:
| Method | How It Works |
|---|---|
| Query logging | Count distinct query patterns per request |
| APM tools | Flag endpoints with high query counts |
| Database metrics | Monitor queries-per-second spikes |
| Code review | Look for database access in loops |
| ORM debugging | Enable query logging in development |
Most ORMs default to 'lazy loading'—related objects are fetched only when accessed. This is convenient but creates N+1 problems. Consider making eager loading the default and lazy loading opt-in for your critical paths.
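When a JOIN is awkward (for example, when the related rows come from another database or a cache), batching the lookups into a single `IN (...)` query also eliminates the N+1 pattern. A minimal sketch using Python's built-in sqlite3; the schema and names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 99.0), (11, 2, 45.0), (12, 1, 12.5);
""")

# Query 1: fetch the orders
orders = conn.execute(
    "SELECT id, customer_id, total FROM orders ORDER BY id"
).fetchall()

# Instead of one query per order, collect the foreign keys
# and resolve them in ONE additional round trip
ids = {customer_id for _, customer_id, _ in orders}
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT id, name FROM customers WHERE id IN ({placeholders})", list(ids)
).fetchall()
customers = dict(rows)  # id -> name lookup built from a single query

report = [f"{customers[cid]}: {total}" for _, cid, total in orders]
print(report)  # ['Ada: 99.0', 'Grace: 45.0', 'Ada: 12.5']
```

Two round trips total, regardless of how many orders are returned. Most ORMs implement exactly this strategy under names like `selectinload` (SQLAlchemy) or `prefetch_related` (Django).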
An unbounded query is one that can return an unlimited number of rows. These queries work fine with small datasets but become catastrophic as data grows—eventually returning millions of rows, exhausting memory, and timing out.
```sql
-- DANGEROUS: No LIMIT
SELECT * FROM events WHERE created_at > '2024-01-01';
-- Could return 10 rows or 10 million rows

-- DANGEROUS: Admin dashboard showing "all users"
SELECT * FROM users ORDER BY created_at DESC;
-- Works with 1,000 users; OOM with 1,000,000 users

-- DANGEROUS: Export query
SELECT * FROM transactions WHERE merchant_id = 123;
-- High-volume merchants may have millions of transactions

-- DANGEROUS: Count without upper bound
SELECT COUNT(*) FROM logs WHERE level = 'ERROR';
-- Counting billions of rows is slow and hammers I/O

-- SAFE: Always use pagination
SELECT * FROM events
WHERE created_at > '2024-01-01'
ORDER BY created_at
LIMIT 100 OFFSET 0;

-- Or keyset pagination (see below)

-- SAFE: Bounded count with short-circuit
SELECT COUNT(*) FROM (
    SELECT 1 FROM logs WHERE level = 'ERROR' LIMIT 1001
) subq;
-- If > 1000, shows "1000+"; doesn't count the full table
```

The Pagination Performance Trap:
Even paginated queries can be problematic:
```sql
-- OFFSET-based pagination degrades linearly
SELECT * FROM products ORDER BY name LIMIT 20 OFFSET 10000;
-- Database must read and skip 10,000 rows to return 20

-- For page 5000 (OFFSET 100000):
-- Skip 100,000 rows → return 20 rows = terrible performance
```
Solution: Keyset Pagination
```sql
-- Initial page
SELECT * FROM products ORDER BY created_at, id LIMIT 20;

-- Next page (using last row's values)
SELECT * FROM products
WHERE (created_at, id) > ('2024-01-15 09:30:00', 12345)
ORDER BY created_at, id
LIMIT 20;
-- Uses index, no row skipping, constant performance
```
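The keyset approach can be demonstrated end to end with Python's built-in sqlite3 (row-value comparisons require SQLite 3.15+); the products table and its contents are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, created_at TEXT, name TEXT)"
)
conn.executemany(
    "INSERT INTO products (created_at, name) VALUES (?, ?)",
    [(f"2024-01-{d:02d}", f"product-{d}") for d in range(1, 31)],
)

PAGE = 10

def first_page():
    return conn.execute(
        "SELECT id, created_at FROM products ORDER BY created_at, id LIMIT ?",
        (PAGE,),
    ).fetchall()

def next_page(last_created_at, last_id):
    # Row-value comparison: strictly after the last row already returned
    return conn.execute(
        """SELECT id, created_at FROM products
           WHERE (created_at, id) > (?, ?)
           ORDER BY created_at, id
           LIMIT ?""",
        (last_created_at, last_id, PAGE),
    ).fetchall()

page1 = first_page()
last_id, last_ts = page1[-1][0], page1[-1][1]
page2 = next_page(last_ts, last_id)
print(page2[0])  # (11, '2024-01-11')
```

Note that the cursor for the next page is the last row's sort key, not a page number; clients must pass it back, which is why keyset pagination suits "infinite scroll" better than "jump to page 57".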
Always set statement timeouts at the connection or query level: SET statement_timeout = '30s' (PostgreSQL) or MAX_EXECUTION_TIME hint (MySQL). This prevents runaway queries from monopolizing resources. Better a timeout error than a crashed database.
Lock contention occurs when multiple transactions compete for the same resources. What seems like a "slow query" may actually be a fast query waiting for locks held by another transaction.
Common Lock Contention Patterns:
| Pattern | Cause | Impact |
|---|---|---|
| Long-running transactions | Web request with multiple writes held open | Locks held for seconds, blocking others |
| Hot row updates | Counter, queue, or popular item updated frequently | Serialized access, throughput collapse |
| Full table locks | DDL operations, some MySQL ALTER TABLE operations | All access blocked during operation |
| Gap locks (InnoDB) | Range scans in serializable/repeatable read isolation | Phantom rows prevented, but index ranges are locked, blocking inserts |
| Unindexed foreign keys (Oracle) | Parent-key UPDATE/DELETE takes a full table lock on the child table | Cascading lock escalation |
```sql
-- PostgreSQL: Find blocking queries
SELECT
    blocked.pid AS blocked_pid,
    blocked.query AS blocked_query,
    blocking.pid AS blocking_pid,
    blocking.query AS blocking_query,
    blocking.state,
    NOW() - blocking.query_start AS blocking_duration
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked
    ON blocked.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
    AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
    AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
    AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
    AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
    AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
    AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking
    ON blocking.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

-- MySQL: Find locks and waits
SELECT * FROM information_schema.innodb_lock_waits;  -- MySQL 5.7
SELECT * FROM performance_schema.data_lock_waits;    -- MySQL 8.0+

-- Prevention: Keep transactions short

-- BAD: Open transaction during external call
BEGIN;
INSERT INTO orders ...;
-- HTTP call to payment processor (3 seconds)
UPDATE orders SET payment_confirmed = true ...;
COMMIT;
-- Locks held for 3+ seconds

-- GOOD: Defer locks until needed
-- HTTP call to payment processor (3 seconds)
BEGIN;
INSERT INTO orders ...;
UPDATE orders SET payment_confirmed = true ...;
COMMIT;
-- Locks held for milliseconds
```

Databases detect and break deadlocks by aborting one transaction; the application should catch the error and retry. Occasional deadlocks are normal, but frequent deadlocks indicate a design problem, such as inconsistent lock ordering or unnecessary contention.
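The catch-and-retry logic can be sketched as a small helper. `DeadlockError` here stands in for whatever your driver actually raises (the exception class and error code vary by database and client library):

```python
import time

class DeadlockError(Exception):
    """Stand-in for the driver-specific error, e.g. MySQL error 1213
    or PostgreSQL SQLSTATE 40P01."""

def run_with_retry(txn_fn, max_attempts=3, base_delay=0.01):
    """Retry a transaction when the database aborts it as a deadlock victim."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except DeadlockError:
            if attempt == max_attempts:
                raise
            # Exponential backoff so retries don't immediately re-collide
            time.sleep(base_delay * 2 ** (attempt - 1))

# Demo: a transaction picked as the deadlock victim twice, then succeeding
attempts = []
def flaky_transaction():
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockError("deadlock detected")
    return "committed"

result = run_with_retry(flaky_transaction)
print(result, len(attempts))  # committed 3
```

The key design point: `txn_fn` must encapsulate the *entire* transaction, so a retry replays all of its statements, not just the one that failed.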
The query optimizer relies on statistics to estimate row counts and choose execution plans. Stale, missing, or misleading statistics cause the optimizer to make poor decisions—sometimes catastrophically poor.
```sql
-- PostgreSQL: Check statistics freshness
SELECT
    schemaname,
    relname,
    last_vacuum,
    last_autovacuum,
    last_analyze,
    last_autoanalyze,
    n_live_tup,
    n_dead_tup
FROM pg_stat_user_tables
ORDER BY last_analyze NULLS FIRST;

-- PostgreSQL: Manually update statistics
ANALYZE orders;   -- Specific table
ANALYZE;          -- All tables (caution in production)

-- PostgreSQL: Extended statistics for correlated columns
CREATE STATISTICS stats_customer_country_region
    ON country, region FROM customers;
ANALYZE customers;

-- MySQL: Check statistics freshness
SELECT table_name, n_rows, last_update
FROM mysql.innodb_table_stats
WHERE database_name = 'your_db';

-- MySQL: Update statistics
ANALYZE TABLE orders;

-- SQL Server: Update statistics
UPDATE STATISTICS orders;
-- Or with full scan for accuracy:
UPDATE STATISTICS orders WITH FULLSCAN;
```

After bulk loading data (ETL, migrations), statistics for the new data don't exist yet. The optimizer may assume the table is nearly empty and make disastrous plan choices. Always run ANALYZE immediately after bulk operations; some databases trigger this automatically, but verify.
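SQLite makes the "ANALYZE after bulk load" habit easy to see: its optimizer statistics live in the `sqlite_stat1` table, which exists only after ANALYZE has run. A small sketch with illustrative table and index names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE INDEX idx_orders_status ON orders (status);
""")

# Bulk load: the optimizer has no distribution statistics for this data yet
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("pending" if i % 10 == 0 else "done",) for i in range(1000)],
)

# Refresh statistics immediately after the bulk operation
conn.execute("ANALYZE")

# sqlite_stat1 now holds one row per analyzed index
rows_after = conn.execute(
    "SELECT tbl, idx FROM sqlite_stat1 WHERE tbl = 'orders'"
).fetchall()
print(rows_after)
```

The same discipline applies to the server databases above: the commands differ (`ANALYZE`, `ANALYZE TABLE`, `UPDATE STATISTICS`), but the failure mode of skipping the step is identical.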
SELECT * is convenient but creates performance and maintainability problems. Understanding when it's acceptable and when it's harmful helps avoid common pitfalls.
```sql
-- Table: users (id, name, email, bio TEXT, profile_photo BYTEA)

-- PROBLEM: Fetching binary data unnecessarily
SELECT * FROM users WHERE id = 123;
-- Returns profile_photo (1MB) even if you only need name

-- SOLUTION: Explicit columns
SELECT id, name, email FROM users WHERE id = 123;
-- Returns only needed data; can use covering index

-- PROBLEM: Covering index not used
CREATE INDEX idx_users_email_name ON users (email, name);

SELECT * FROM users WHERE email = 'test@example.com';
-- Cannot use index-only scan; must access table for all columns

SELECT email, name FROM users WHERE email = 'test@example.com';
-- Index-only scan possible; much faster

-- ACCEPTABLE uses of SELECT *:
-- 1. Column validation/exploration in development
-- 2. When you genuinely need all columns AND the table is small
-- 3. In EXISTS subqueries (SELECT 1 FROM ... is equivalent)
-- 4. Migrating data: INSERT INTO new_table SELECT * FROM old_table
```

The WIDTH Problem:
Row width (total bytes per row) directly affects performance: for tables with TEXT/BLOB columns or many wide VARCHAR columns, SELECT * can return 10x-100x more bytes than necessary, inflating network transfer, result buffering, and cache pressure.
Most ORMs support projection—fetching only specific columns. Django's .only() and .defer(), Rails' .select(), SQLAlchemy's load_only(), Entity Framework's .Select(). Use these for queries where you don't need all columns, especially for list views.
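The width difference is easy to measure. A small sqlite3 sketch with an illustrative users table holding a 1 MB profile photo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT,"
    " profile_photo BLOB)"
)
conn.execute(
    "INSERT INTO users VALUES (123, 'Ada', 'ada@example.com', ?)",
    (b"\x00" * 1_000_000,),  # simulate a 1 MB photo
)

# SELECT * drags the photo along even though we only need name/email
wide = conn.execute("SELECT * FROM users WHERE id = 123").fetchone()
narrow = conn.execute(
    "SELECT id, name, email FROM users WHERE id = 123"
).fetchone()

wide_bytes = len(wide[3])                       # the photo dominates the row
narrow_bytes = sum(len(str(v)) for v in narrow) # a few dozen bytes
print(wide_bytes, narrow_bytes)
```

In a real client/server setup, those extra bytes also cross the network and sit in connection buffers, so the gap hurts more than this in-process demo suggests.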
Indexing problems manifest in two opposite ways: missing indexes cause full scans on large tables, while excessive indexes slow writes and waste storage. Finding the right balance requires understanding actual query patterns.
Detecting Missing Indexes:
```sql
-- PostgreSQL: Find sequential scans on large tables
SELECT
    schemaname,
    relname,
    seq_scan,
    seq_tup_read,
    idx_scan,
    pg_size_pretty(pg_relation_size(relid)) AS size
FROM pg_stat_user_tables
WHERE seq_scan > 0
  AND pg_relation_size(relid) > 10000000  -- Tables > 10MB
ORDER BY seq_tup_read DESC;
-- High seq_tup_read on large tables indicates potential missing indexes

-- PostgreSQL: Find slow queries (requires pg_stat_statements)
SELECT
    query,
    calls,
    total_exec_time / 1000 AS total_seconds,
    mean_exec_time AS avg_ms,
    rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;

-- MySQL: Use Performance Schema
SELECT
    digest_text,
    count_star,
    avg_timer_wait / 1000000000 AS avg_ms,
    sum_rows_examined,
    sum_rows_sent
FROM performance_schema.events_statements_summary_by_digest
ORDER BY avg_timer_wait DESC
LIMIT 20;

-- SQL Server: Missing Index DMVs (built-in recommendations)
SELECT
    migs.avg_user_impact,
    mid.statement,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns
FROM sys.dm_db_missing_index_group_stats migs
JOIN sys.dm_db_missing_index_groups mig
    ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details mid
    ON mig.index_handle = mid.index_handle
ORDER BY migs.avg_user_impact * migs.user_seeks DESC;
```

Signs of Over-Indexing:
| Symptom | Cause |
|---|---|
| Slow INSERT/UPDATE | Every index must be updated |
| Large storage usage | Indexes may exceed table size |
| Many unused indexes | Indexes created but never scanned |
| Duplicate indexes | idx(a) and idx(a,b) both exist |
| Competing indexes | Multiple indexes on same columns in different orders |
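The duplicate-index check from the table above is mechanical enough to automate. A sketch under one assumption: you have already extracted each index's column list from your catalog (e.g. pg_index, information_schema.statistics) into an ordered tuple. An index is redundant when its columns are a leading prefix of another index on the same table:

```python
def find_redundant_indexes(indexes):
    """Flag an index whose column list is a leading prefix of another
    index's columns (e.g. idx(a) is covered by idx(a, b))."""
    redundant = []
    for name, cols in indexes.items():
        for other, other_cols in indexes.items():
            if (name != other
                    and len(cols) < len(other_cols)
                    and tuple(other_cols[:len(cols)]) == tuple(cols)):
                redundant.append((name, other))
    return redundant

# Hypothetical index definitions for one table
indexes = {
    "idx_orders_customer": ("customer_id",),
    "idx_orders_customer_status": ("customer_id", "status"),
    "idx_orders_created": ("created_at",),
}
print(find_redundant_indexes(indexes))
# [('idx_orders_customer', 'idx_orders_customer_status')]
```

One caveat before dropping anything flagged: the narrower index may still be deliberately kept for its smaller size or for a UNIQUE constraint, so treat the output as candidates for review, not an automatic drop list.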
SQL Server and Oracle suggest 'missing indexes' based on query workload. These suggestions are often good starting points but can recommend redundant indexes or indexes only useful for rare queries. Always evaluate whether the read improvement justifies the write overhead.
A query that was fast yesterday becomes slow today—not because the query changed, but because the execution plan changed. This is a "plan regression," and it's one of the most frustrating performance problems to diagnose.
Prevention and Mitigation:
| Strategy | Description |
|---|---|
| Plan baselines | SQL Server, Oracle: Lock known-good plans |
| Query store | Capture historical plans for comparison |
| Monitoring | Alert on query execution time increases |
| Controlled deploys | Deploy schema/index changes during low traffic |
| Testing | Load test with production-like data before changes |
| Rollback capability | Be prepared to revert statistics/settings |
```sql
-- SQL Server: Query Store for plan history
SELECT
    q.query_id,
    qt.query_sql_text,
    rs.avg_duration,
    p.query_plan,
    rs.first_execution_time,
    rs.last_execution_time
FROM sys.query_store_query q
JOIN sys.query_store_query_text qt
    ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan p
    ON q.query_id = p.query_id
JOIN sys.query_store_runtime_stats rs
    ON p.plan_id = rs.plan_id
WHERE qt.query_sql_text LIKE '%orders%'
ORDER BY rs.avg_duration DESC;

-- PostgreSQL: auto_explain for continuous plan logging
-- In postgresql.conf:
--   shared_preload_libraries = 'auto_explain'
--   auto_explain.log_min_duration = '1s'
--   auto_explain.log_analyze = true
-- Then check logs for plan changes over time

-- Force a specific plan (PostgreSQL - last resort)
-- First, find the good plan's structure, then use:
SET enable_seqscan = off;
SET enable_nestloop = off;
-- etc., to push the optimizer toward the good plan
-- Better: Fix the underlying cause (statistics, indexes)
```

While databases offer plan hints and plan forcing, these are maintenance burdens: the forced plan may become suboptimal as data changes. Focus on fixing root causes (better indexes, fresh statistics, or query rewrites) and use plan forcing only for temporary stabilization while addressing the real issue.
Many performance problems go undetected until they cause outages because monitoring focuses on the wrong metrics or misses critical signals. Understanding common blind spots helps you build comprehensive observability.
| Blind Spot | Why It's Hidden | How to Detect |
|---|---|---|
| Slow queries that don't time out | Average response time looks fine; P99 is terrible | Monitor P95/P99 latency, not just average |
| Lock wait time | Query itself is fast; wait time isn't measured | Monitor lock_time, time_waiting separately |
| Connection exhaustion | Queries work; new connections fail | Monitor connection pool usage, not just errors |
| Disk space exhaustion | Sudden; everything breaks at once | Alert at 70% disk usage, not 95% |
| Replication lag | Reads work; stale data returned | Monitor seconds_behind_master / replication lag |
| Checkpoint impact | Periodic slowdowns during writes | Correlate checkpoint events with latency spikes |
| Index bloat | Gradual degradation, hard to attribute | Track index size ratio to row count over time |
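The first blind spot above deserves emphasis: an average hides tail latency entirely. A quick sketch using nearest-rank percentiles over a hypothetical sample of request latencies:

```python
import math

# 100 requests: 95 fast ones, 5 slow ones (e.g. hitting a lock or cold cache)
latencies_ms = [20] * 95 + [2000] * 5

def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of
    samples at or below it."""
    ordered = sorted(samples)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

avg = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
print(avg, p50, p99)  # 119.0 20 2000
```

The average (119ms) and median (20ms) both look acceptable, while the p99 (2000ms) reveals that one request in twenty is timing out. This is why the table recommends alerting on P95/P99, not the mean.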
You now have a comprehensive understanding of the most common database performance pitfalls and how to detect, prevent, and resolve them. These patterns recur across all production environments—recognizing them quickly is a core skill for any database practitioner.
Congratulations on completing the Query Optimization module! You've learned the fundamental skills that separate database novices from experts—skills that directly impact application performance and user experience.
Your Next Steps:
Query optimization is a skill developed through practice. Apply these concepts to your own queries.
The techniques in this module form the foundation for all database performance work. As you encounter more complex scenarios, you'll build on these fundamentals.
You have completed the Query Optimization module within SQL Databases Deep Dive. You now possess the knowledge to diagnose and resolve the majority of SQL query performance problems encountered in production systems.