The fastest network in the world cannot save you from a slow database query. Databases are often the primary bottleneck in application latency—a single poorly optimized query can take seconds when it should take milliseconds, negating every other optimization in your system.
Consider this common scenario: Your API endpoint returns data in 50ms locally, but in production it takes 2 seconds. The network is fast, the server has plenty of CPU—but the database is scanning 10 million rows to return 50, because someone forgot to add an index. Or worse, the query is executing a cartesian product due to a missing join condition, turning a 50-row result into a 50-million-row intermediate set.
Database query optimization is where elite engineers separate themselves. It requires understanding query planners, indexing theory, data distribution, and the specific characteristics of your database engine. This page will give you that understanding.
By completing this page, you will understand how database query planners work, master indexing strategies for common access patterns, learn to read and interpret query execution plans, and acquire techniques that can reduce query latency from seconds to milliseconds.
Before optimizing queries, you must understand what happens when a query is executed. Databases don't execute SQL directly—they translate it into a query execution plan, a tree of physical operations that retrieve and process data.
The Query Processing Pipeline:
SQL Query → Parser → Query Rewriter → Query Optimizer → Execution Engine → Results
The Query Optimizer is the heart of database performance. It explores different ways to execute your query—which indexes to use, which join algorithms to apply, what order to join tables—and selects the plan with lowest estimated cost.
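As a quick illustration of that cost-based choice, here is a sketch (the table, column, and cost numbers are hypothetical) of the same query getting a different plan once an index offers a cheaper path:

```sql
-- Only one access path exists: the planner must scan the whole table
EXPLAIN SELECT * FROM users WHERE email = 'a@example.com';
-- Seq Scan on users  (cost=0.00..18584.00 rows=1 width=72)

-- Add an index and refresh statistics; the planner now re-costs its options
CREATE INDEX idx_users_email ON users(email);
ANALYZE users;

EXPLAIN SELECT * FROM users WHERE email = 'a@example.com';
-- Index Scan using idx_users_email on users  (cost=0.42..8.44 rows=1 width=72)
```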
| Operation | Description | Performance Characteristic |
|---|---|---|
| Sequential Scan (Seq Scan) | Read every row in table | O(n) - slow for large tables |
| Index Scan | Use index to locate rows | O(log n) - much faster for selective queries |
| Index Only Scan | Answer query entirely from index | Fastest - no table access needed |
| Bitmap Index Scan | Build bitmap of matching rows, then fetch | Good for moderate selectivity |
| Nested Loop Join | For each row in outer, scan inner | O(n×m) - fast when inner is small/indexed |
| Hash Join | Build hash table of smaller relation | O(n+m) - good for large unsorted tables |
| Merge Join | Merge two sorted relations | O(n+m) - requires sorted input |
| Sort | Sort rows by specified columns | O(n log n) - expensive for large datasets |
| Aggregate | Compute aggregate functions | O(n) - must process all matching rows |
Cost Estimation:
The optimizer assigns a cost to each possible plan based on factors such as estimated disk I/O (sequential and random page reads), CPU work per row processed, and memory needed for sorts and hash tables.
Cost estimates depend on table statistics—row counts, value distributions, null frequencies. Stale statistics lead to poor plans. This is why ANALYZE (PostgreSQL) or UPDATE STATISTICS (SQL Server) is critical.
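In PostgreSQL you can inspect the raw inputs behind these estimates: the planner's cost constants and the per-table statistics in pg_class. A minimal sketch, assuming an orders table:

```sql
-- Planner cost constants (defaults shown; tunable per instance)
SHOW seq_page_cost;     -- sequential page read, default 1.0
SHOW random_page_cost;  -- random page read, default 4.0
SHOW cpu_tuple_cost;    -- per-row processing, default 0.01

-- Table-level statistics the planner combines with those constants
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'orders';
```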
Selectivity:
Selectivity is the fraction of rows that match a condition. A condition matching 1% of rows has 0.01 selectivity—very selective. A condition matching 90% has 0.9 selectivity—not selective.
Selective conditions favor index scans; non-selective conditions favor sequential scans (reading whole table once is faster than thousands of random index lookups).
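PostgreSQL exposes the statistics behind its selectivity estimates in the pg_stats view. A sketch, assuming an orders table with a status column:

```sql
-- Distinct-value counts and common-value frequencies drive selectivity estimates
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'orders' AND attname = 'status';

-- Manual check: fraction of rows a condition actually matches
SELECT COUNT(*) FILTER (WHERE status = 'completed')::numeric / COUNT(*) AS selectivity
FROM orders;
```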
Databases make optimization decisions based on statistics collected about your data. After loading large amounts of data or significant data changes, run ANALYZE (PostgreSQL) or UPDATE STATISTICS (SQL Server) to refresh statistics. Stale statistics can cause the optimizer to choose catastrophically wrong plans—like choosing a sequential scan of 100 million rows instead of an index scan that would read 100 rows.
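A minimal sketch of refreshing and verifying statistics in PostgreSQL (the table name is illustrative):

```sql
-- Refresh statistics after a bulk load or large data change
ANALYZE orders;

-- Confirm when statistics were last gathered, manually or by autovacuum
SELECT relname, last_analyze, last_autoanalyze, n_live_tup
FROM pg_stat_user_tables
WHERE relname = 'orders';
```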
Execution plans are your window into database decision-making. Learning to read them reveals why queries are slow and what to optimize.
EXPLAIN vs EXPLAIN ANALYZE:
- EXPLAIN shows the planned execution (estimates only, doesn't run the query)
- EXPLAIN ANALYZE shows actual execution (runs the query, compares estimates to reality)

Always use EXPLAIN ANALYZE when debugging—estimated costs can be very different from actual costs when statistics are stale or data is skewed.
```sql
-- PostgreSQL: Basic query with execution plan
EXPLAIN ANALYZE
SELECT u.name, o.total_amount
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > '2024-01-01'
  AND o.status = 'completed';

-- Example output:
/*
Nested Loop  (cost=0.87..1234.56 rows=142 width=40) (actual time=0.045..2.341 rows=138 loops=1)
  ->  Index Scan using idx_orders_created_at on orders o  (cost=0.43..567.89 rows=142 width=20) (actual time=0.025..1.234 rows=138 loops=1)
        Index Cond: (created_at > '2024-01-01')
        Filter: (status = 'completed')
        Rows Removed by Filter: 45
  ->  Index Scan using users_pkey on users u  (cost=0.43..4.45 rows=1 width=24) (actual time=0.006..0.006 rows=1 loops=138)
        Index Cond: (id = o.user_id)
Planning Time: 0.234 ms
Execution Time: 2.456 ms
*/

-- Key things to look for:
-- 1. "Seq Scan" on large tables - usually bad, needs index
-- 2. "actual rows" >> "rows" estimate - stale statistics
-- 3. High "loops" count with nested loops - may need different join
-- 4. "Rows Removed by Filter" is high - filter not pushed to index
-- 5. Large "cost" numbers - expensive operations

-- PostgreSQL: Get detailed buffer/IO statistics
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM large_table WHERE id = 12345;

-- MySQL equivalent
EXPLAIN ANALYZE
SELECT u.name, o.total_amount
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.created_at > '2024-01-01';
```

Use visual tools for complex plans: pgAdmin has a graphical plan viewer for PostgreSQL, MySQL Workbench visualizes EXPLAIN output, and tools like explain.depesz.com (PostgreSQL) color-code slow operations. Visual representation makes it easier to spot bottlenecks in complex multi-join queries.
Indexes are the most powerful tool for query optimization. They transform O(n) table scans into O(log n) lookups. But indexing is nuanced—wrong indexes waste space and slow writes; right indexes make impossible queries instant.
How B-Tree Indexes Work:
Most database indexes are B-trees (balanced trees). A B-tree index maintains sorted key values in a tree structure where:

- Internal nodes hold key ranges and pointers to child pages.
- Leaf nodes hold the indexed values plus pointers to the table rows.
- The tree stays balanced, so every lookup traverses the same small number of levels.
For a table with 1 billion rows, a B-tree typically has depth 4-5. Finding any row requires at most 4-5 page reads, versus potentially millions for a sequential scan.
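If you want to verify that depth on a real index, PostgreSQL's pgstattuple extension reports it (the index name below is illustrative):

```sql
-- Requires: CREATE EXTENSION pgstattuple;
-- tree_level is the number of levels between the root and the leaf pages
SELECT tree_level, index_size, leaf_pages, avg_leaf_density
FROM pgstatindex('idx_orders_created_at');
```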
| Index Type | Best For | Not Good For |
|---|---|---|
| B-Tree (default) | Equality, range, sorting, prefix LIKE | Full-text search, array containment |
| Hash | Exact equality lookups only | Range queries, sorting |
| GIN (Generalized Inverted) | Full-text search, arrays, JSONB | Simple equality/range |
| GiST (Generalized Search Tree) | Geometric/spatial data, ranges | Simple equality |
| BRIN (Block Range) | Very large sorted tables (time-series) | Random access patterns |
| Bitmap | Multiple low-selectivity conditions | High-cardinality columns |
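For reference, this is roughly what creating the non-default types looks like in PostgreSQL; the table and column names are illustrative:

```sql
-- GIN: JSONB containment queries and full-text search
CREATE INDEX idx_events_data ON events USING GIN (data);

-- Hash: exact-equality lookups only
CREATE INDEX idx_sessions_token ON sessions USING HASH (token);

-- GiST: geometric and range types (e.g. a tstzrange column)
CREATE INDEX idx_bookings_period ON bookings USING GIST (period);

-- BRIN: huge, naturally ordered tables such as append-only time series
CREATE INDEX idx_events_created_brin ON events USING BRIN (created_at);
```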
Index Selection Principles:
```sql
-- Single column index for equality and range queries
CREATE INDEX idx_orders_created_at ON orders(created_at);

-- Composite index for multi-column filtering
-- Column order matters! Put equality columns first, then range
CREATE INDEX idx_orders_user_status_date
ON orders(user_id, status, created_at);

-- This index supports these queries efficiently:
-- WHERE user_id = X
-- WHERE user_id = X AND status = Y
-- WHERE user_id = X AND status = Y AND created_at > Z
-- BUT NOT: WHERE status = Y (can't skip user_id)

-- Covering index: includes all columns query needs
-- Enables "Index Only Scan" - no table access required
CREATE INDEX idx_orders_covering
ON orders(user_id, status) INCLUDE (order_total, created_at);

-- Partial index: only index subset of rows
-- Smaller, faster, perfect for skewed data
CREATE INDEX idx_active_orders ON orders(created_at)
WHERE status = 'active';

-- Expression index: index computed values
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Now this query uses the index:
-- WHERE LOWER(email) = 'user@example.com'

-- For text search with LIKE 'prefix%'
-- B-tree works for prefix, but needs a special operator class
CREATE INDEX idx_users_name_pattern ON users(name varchar_pattern_ops);

-- PostgreSQL: Check index usage
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan AS times_used,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;
```

Every index slows down writes—INSERT, UPDATE, and DELETE must maintain all indexes, so a table with 10 indexes pays for 10 extra index writes on every insert. Index only columns that benefit query performance, and regularly review unused indexes. In PostgreSQL, check pg_stat_user_indexes for idx_scan = 0 to find unused indexes.
Beyond basic indexing, advanced strategies unlock performance for complex query patterns.
Composite Index Column Ordering:
In composite indexes, column order is critical. The index can only use columns from left to right. Given index (A, B, C):
- WHERE A = 1 — Uses index
- WHERE A = 1 AND B = 2 — Uses index fully
- WHERE A = 1 AND B = 2 AND C > 3 — Uses all columns
- WHERE A = 1 AND C = 3 — Uses A only, scans for C
- WHERE B = 2 — Cannot use index (A skipped)

The Rule: Put equality conditions first, range conditions last. Put the most selective equality columns first for best filtering.
Conditions that can use an index:

- WHERE status = 'active' — Equality, uses B-tree
- WHERE created_at > '2024-01-01' — Range, uses B-tree
- WHERE email LIKE 'john%' — Prefix match, uses B-tree
- WHERE user_id IN (1,2,3) — Equality list, uses index
- ORDER BY created_at DESC LIMIT 10 — Index scan with limit
- WHERE (a, b) > (1, 2) — Row comparison, uses composite

Conditions that prevent index use:

- WHERE YEAR(created_at) = 2024 — Function on column
- WHERE amount * 1.1 > 100 — Expression on column
- WHERE email LIKE '%@gmail.com' — Suffix match
- WHERE name != 'John' — Negative condition
- WHERE status IS NOT NULL — Often low selectivity
- WHERE LOWER(email) = 'x' — Without an expression index

Covering Indexes:
A covering index includes all columns a query needs, allowing an Index Only Scan—the database never reads the table, only the index. This eliminates the expensive random I/O of table lookups.
```sql
-- Query needs user_id, email, name
SELECT email, name FROM users WHERE user_id = 123;

-- Covering index with INCLUDE clause
CREATE INDEX idx_users_covering ON users(user_id) INCLUDE (email, name);
```
Partial Indexes:
Partial indexes only index rows matching a condition. For tables where queries always filter on a condition, partial indexes are smaller and faster.
```sql
-- Active orders are 5% of all orders
CREATE INDEX idx_active_orders ON orders(created_at) WHERE status = 'active';
-- Index is 20x smaller, faster to scan, faster to maintain
```
Index-Organized Tables (Clustered Indexes):
In MySQL (InnoDB) and SQL Server, the primary key index stores the actual row data—rows are physically ordered by primary key. This means:

- Primary key lookups read the row directly from the clustered index, with no separate table access.
- Secondary index lookups take two steps: find the primary key in the secondary index, then find the row in the clustered index.
- Insert performance depends on whether new keys arrive in order or land at random positions in the index.
Choose primary keys carefully in these systems—UUID primary keys cause random I/O on every insert; sequential IDs allow efficient append-only writes.
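A minimal MySQL/InnoDB sketch of the contrast (table definitions are illustrative); the same reasoning applies to SQL Server clustered indexes:

```sql
-- Sequential primary key: new rows append at the right edge of the clustered index
CREATE TABLE events_seq (
    id BIGINT AUTO_INCREMENT PRIMARY KEY,
    payload JSON,
    created_at DATETIME
) ENGINE=InnoDB;

-- Random UUID primary key: each insert lands on a random page,
-- causing page splits and poor cache locality
CREATE TABLE events_uuid (
    id CHAR(36) PRIMARY KEY,  -- e.g. values from UUID()
    payload JSON,
    created_at DATETIME
) ENGINE=InnoDB;
```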
B-tree indexes become fragmented over time as rows are inserted and deleted. Fragmented indexes are larger and slower. Schedule regular REINDEX (PostgreSQL) or ALTER INDEX REBUILD (SQL Server) during low-traffic periods. Monitor index bloat with pg_stat_user_indexes and pg_relation_size.
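A sketch of what that maintenance looks like (index and table names are illustrative); REINDEX ... CONCURRENTLY requires PostgreSQL 12 or later:

```sql
-- Check how large an index has grown
SELECT pg_size_pretty(pg_relation_size('idx_orders_created_at'));

-- Rebuild without blocking reads and writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_orders_created_at;

-- SQL Server equivalent (online rebuild requires an edition that supports it)
-- ALTER INDEX idx_orders_created_at ON orders REBUILD WITH (ONLINE = ON);
```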
Sometimes the best optimization is rewriting the query itself. Semantically equivalent queries can have vastly different performance characteristics.
Common Query Patterns and Their Optimizations:
```sql
-- PROBLEM 1: Subquery in WHERE with correlation
-- Bad: Correlated subquery executes once per row
SELECT * FROM orders o
WHERE o.total > (
    SELECT AVG(total) FROM orders o2 WHERE o2.user_id = o.user_id
);

-- Better: Rewrite as JOIN with pre-aggregated data
SELECT o.* FROM orders o
JOIN (
    SELECT user_id, AVG(total) AS avg_total
    FROM orders
    GROUP BY user_id
) user_avg ON o.user_id = user_avg.user_id
WHERE o.total > user_avg.avg_total;

-- PROBLEM 2: OR conditions on different columns
-- Bad: Often can't use indexes effectively
SELECT * FROM products WHERE category_id = 5 OR brand_id = 10;

-- Better: UNION of two indexed queries
SELECT * FROM products WHERE category_id = 5
UNION
SELECT * FROM products WHERE brand_id = 10;

-- PROBLEM 3: DISTINCT with large result set
-- Bad: Sorts entire result for deduplication
SELECT DISTINCT user_id FROM orders WHERE created_at > '2024-01-01';

-- Better: GROUP BY (sometimes optimizes differently)
SELECT user_id FROM orders WHERE created_at > '2024-01-01' GROUP BY user_id;

-- Even better if an existence check is sufficient
SELECT user_id FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.user_id = u.id AND o.created_at > '2024-01-01'
);

-- PROBLEM 4: Counting large tables
-- Exact count must visit every matching row
SELECT COUNT(*) FROM orders WHERE status = 'completed';

-- Faster approximation: live-tuple statistics
SELECT n_live_tup AS approx_count
FROM pg_stat_user_tables WHERE relname = 'orders';

-- Or use planner statistics for a whole-table estimate
SELECT reltuples::bigint AS estimate
FROM pg_class WHERE relname = 'orders';

-- PROBLEM 5: Pagination with OFFSET
-- Bad: OFFSET 10000 still reads and discards 10000 rows
SELECT * FROM products ORDER BY id LIMIT 20 OFFSET 10000;

-- Better: Keyset pagination (seek method)
SELECT * FROM products
WHERE id > 10020  -- Last ID from previous page
ORDER BY id LIMIT 20;

-- PROBLEM 6: NOT IN with nullable column
-- Bad: NOT IN with NULLs returns unexpected results, can't use index
SELECT * FROM users
WHERE id NOT IN (SELECT user_id FROM deleted_users);

-- Better: NOT EXISTS (handles NULLs correctly, often faster)
SELECT * FROM users u
WHERE NOT EXISTS (SELECT 1 FROM deleted_users d WHERE d.user_id = u.id);

-- Or: LEFT JOIN / IS NULL
SELECT u.* FROM users u
LEFT JOIN deleted_users d ON u.id = d.user_id
WHERE d.user_id IS NULL;

-- PROBLEM 7: Multiple conditions on same table
-- Bad: Multiple subqueries hit same table repeatedly
SELECT * FROM products
WHERE category_id = (SELECT id FROM categories WHERE name = 'Electronics')
  AND brand_id = (SELECT id FROM brands WHERE name = 'Samsung');

-- Better: Single lookup with JOIN
SELECT p.* FROM products p
JOIN categories c ON p.category_id = c.id
JOIN brands b ON p.brand_id = b.id
WHERE c.name = 'Electronics' AND b.name = 'Samsung';
```

JOINs are where query optimization gets complex. The number of possible join orders grows factorially with table count—for 10 tables, there are 10! ≈ 3.6 million possible orders. Understanding join algorithms helps you structure queries for optimal execution.
Join Algorithm Comparison:
| Algorithm | Mechanism | Best When | Cost |
|---|---|---|---|
| Nested Loop | For each outer row, scan inner table/index | Small outer, indexed inner | O(n × m) or O(n × log m) with index |
| Hash Join | Build hash table on smaller table, probe with larger | Large unsorted tables, no useful indexes | O(n + m) with memory for hash table |
| Merge Join | Merge two sorted inputs | Both sides already sorted (by index or prior sort) | O(n + m) but requires sorted input |
Join Optimization Strategies:
1. Ensure Join Columns Are Indexed: Foreign key columns should almost always be indexed. Without an index, every join becomes a sequential scan of the inner table.
2. Filter Before Joining: Apply WHERE conditions to reduce rows before joining. The optimizer usually does this, but complex queries may fool it.
3. Consider Join Order: With multiple joins, order matters. Join the most restrictive conditions first to minimize intermediate result sizes. Most optimizers explore orderings automatically, but hints can help.
4. Avoid Cartesian Products: A missing join condition creates a cartesian product (every combination of rows). 1000 × 1000 = 1 million rows. Always verify join conditions completely connect all tables.
```sql
-- Ensure FK columns are indexed
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);
CREATE INDEX idx_order_items_product_id ON order_items(product_id);

-- PostgreSQL: Force specific join order when optimizer fails
SET join_collapse_limit = 1;  -- Respect explicit join order

SELECT *
FROM users u
JOIN orders o ON o.user_id = u.id            -- Join 1
JOIN order_items oi ON oi.order_id = o.id    -- Join 2
JOIN products p ON p.id = oi.product_id      -- Join 3
WHERE u.id = 12345;

RESET join_collapse_limit;

-- PostgreSQL: Hint to disable specific join type
SET enable_hashjoin = off;  -- Force nested loop or merge join
EXPLAIN ANALYZE SELECT ...;
SET enable_hashjoin = on;   -- Reset

-- Large join optimization: materialize expensive subquery
WITH expensive_filter AS MATERIALIZED (
    SELECT user_id, SUM(total) AS total_spend
    FROM orders
    WHERE created_at > '2024-01-01'
    GROUP BY user_id
    HAVING SUM(total) > 1000
)
SELECT u.*, ef.total_spend
FROM users u
JOIN expensive_filter ef ON u.id = ef.user_id;

-- Lateral join for efficient correlated logic
-- Get latest 3 orders per user efficiently
SELECT u.*, recent_orders.*
FROM users u,
LATERAL (
    SELECT order_id, total, created_at
    FROM orders
    WHERE user_id = u.id
    ORDER BY created_at DESC
    LIMIT 3
) recent_orders
WHERE u.status = 'active';

-- This is more efficient than:
-- SELECT * FROM users u
-- JOIN orders o ON o.user_id = u.id
-- WHERE ... (with window functions for top 3)
```

Most databases support hints to force specific join orders or algorithms (LEADING/USE_HASH in Oracle, OPTION (LOOP JOIN) in SQL Server, GUCs like enable_hashjoin in PostgreSQL). Use hints only as a last resort when the optimizer consistently fails. Hints are maintenance burdens—they become wrong as data changes. Prefer fixing statistics, adding indexes, or restructuring queries.
Certain query patterns appear repeatedly in applications. Knowing the optimized form for each saves debugging time.
Pattern 1: The N+1 Query Problem
One of the most common ORM-induced performance issues.
```typescript
// N+1 PROBLEM: 1 query for users, N queries for orders
const users = await prisma.user.findMany({ take: 100 });
for (const user of users) {
  // This executes 100 separate queries!
  const orders = await prisma.order.findMany({ where: { userId: user.id } });
}

// SOLUTION: Eager loading / batch fetching
const usersWithOrders = await prisma.user.findMany({
  take: 100,
  include: {
    orders: true  // Single query with JOIN
  }
});

// ALTERNATIVE: DataLoader pattern for GraphQL
import DataLoader from 'dataloader';
import type { Order } from '@prisma/client';

const orderLoader = new DataLoader(async (userIds: readonly string[]) => {
  const orders = await prisma.order.findMany({
    where: { userId: { in: [...userIds] } }
  });
  // Group by userId and return in same order as input
  const ordersByUser = new Map<string, Order[]>();
  for (const order of orders) {
    const existing = ordersByUser.get(order.userId) || [];
    ordersByUser.set(order.userId, [...existing, order]);
  }
  return userIds.map(id => ordersByUser.get(id) || []);
});

// Now resolvers use the loader - automatically batched
const resolvers = {
  User: {
    orders: (user) => orderLoader.load(user.id)
  }
};
```

Pattern 2: Leaderboards and Rankings
Getting top N with position is common but easy to implement poorly.
```sql
-- Efficient: Get top 100 with rank
-- Use window function with LIMIT
SELECT
    user_id,
    score,
    RANK() OVER (ORDER BY score DESC) AS position
FROM leaderboard
ORDER BY score DESC
LIMIT 100;

-- Get specific user's rank without scanning entire table
-- Create covering index
CREATE INDEX idx_leaderboard_score ON leaderboard(score DESC, user_id);

-- Count users with higher score
SELECT
    (SELECT COUNT(*) FROM leaderboard WHERE score > l.score) + 1 AS position,
    l.score
FROM leaderboard l
WHERE l.user_id = 12345;

-- For pagination in leaderboards, use keyset
SELECT user_id, score
FROM leaderboard
WHERE score < 9500  -- Last score from previous page
ORDER BY score DESC
LIMIT 100;

-- For ties, include user_id as tiebreaker
SELECT user_id, score
FROM leaderboard
WHERE (score, user_id) < (9500, 'prev_user_id')
ORDER BY score DESC, user_id DESC
LIMIT 100;
```

Pattern 3: Time-Series Queries
Queries over time ranges with aggregation.
```sql
-- Partition by time for efficient range queries
CREATE TABLE events (
    id BIGSERIAL,
    event_type VARCHAR(50),
    created_at TIMESTAMP,
    data JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2024_01 PARTITION OF events
FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02 PARTITION OF events
FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

-- Query automatically only scans relevant partition
SELECT * FROM events
WHERE created_at >= '2024-01-15' AND created_at < '2024-01-20';

-- For large time-series, use BRIN index (tiny, fast for sorted data)
CREATE INDEX idx_events_created_at_brin ON events
USING BRIN (created_at) WITH (pages_per_range = 128);

-- Aggregate with date_trunc for grouping
SELECT
    date_trunc('hour', created_at) AS hour,
    event_type,
    COUNT(*) AS count
FROM events
WHERE created_at >= NOW() - INTERVAL '24 hours'
GROUP BY 1, 2
ORDER BY 1 DESC, 3 DESC;

-- Pre-aggregate into summary table for fast dashboards
INSERT INTO event_hourly_summary (hour, event_type, count)
SELECT date_trunc('hour', created_at), event_type, COUNT(*)
FROM events
WHERE created_at >= NOW() - INTERVAL '1 hour'
GROUP BY 1, 2
ON CONFLICT (hour, event_type) DO UPDATE SET count = EXCLUDED.count;
```

Database queries are often the dominant source of application latency. Optimizing them requires understanding how databases execute queries and applying systematic techniques. The key principles:

- Read execution plans with EXPLAIN ANALYZE instead of guessing; the plan shows where time is spent.
- Index for your actual access patterns: equality columns before range columns, with covering, partial, and expression indexes where they pay off.
- Keep statistics fresh so the optimizer can estimate costs accurately.
- Rewrite problematic patterns (correlated subqueries, OFFSET pagination, NOT IN on nullable columns, N+1 query loops) into forms the database can execute efficiently.
What's Next:
Optimized queries are essential but not sufficient. When query optimization reaches its limits, caching becomes the next line of defense. The next page explores how to use caching specifically for latency reduction—not just throughput, but making responses faster through intelligent data placement.
You now understand the fundamentals of database query optimization—from execution plans to indexing strategies to query rewriting. These techniques can reduce query times from seconds to milliseconds, often delivering 10-1000x performance improvements.