Throughout this module, we've explored query compilation, prepared statements, caching, and plan reuse. We've discussed how these techniques work and why they matter. Now we answer the ultimate question: How much do they actually help?
Performance benefits vary dramatically based on workload characteristics, query complexity, and system configuration. A technique that delivers 100x improvement for one workload may provide negligible benefit—or even harm—for another. This page quantifies expected gains, identifies optimal application scenarios, and provides decision frameworks for real-world deployment.
The goal: Transform architectural knowledge into practical performance engineering.
By the end of this page, you will understand quantitative performance improvements from each technique, workload characteristics that maximize benefits, common pitfalls that negate expected gains, and how to establish baseline measurements for your own systems.
To understand performance benefits, we must first understand where time goes during query processing. Each phase consumes CPU cycles, and different techniques target different phases.
Typical OLTP Query Breakdown (simple point lookup):
| Phase | Typical Time | Percentage |
|---|---|---|
| Network round-trip | 0.5-5ms | Variable |
| Parse SQL | 0.05-0.2ms | 2-5% |
| Semantic analysis | 0.1-0.3ms | 3-8% |
| Optimization | 0.5-10ms | 15-60% |
| Plan compilation/JIT | 2-100ms | When applicable |
| Execution (cached data) | 0.1-1ms | 5-30% |
| Result serialization | 0.1-0.5ms | 3-10% |
Key observation: For simple queries, optimization often takes longer than execution itself. This is precisely where prepared statements and plan caching deliver massive wins.
Typical OLAP Query Breakdown (complex analytical):
| Phase | Typical Time | Percentage |
|---|---|---|
| Parse + Analyze | 1-10ms | 0.001-0.1% |
| Optimization | 50-500ms | 0.1-1% |
| JIT Compilation | 100-500ms | 0.2-1% |
| Execution (scanning, joins) | 10s-10min | 98-99% |
| Result serialization | 10-1000ms | 0.01-1% |
Key observation: For complex queries, execution dominates. Compilation techniques (JIT, vectorization) that accelerate execution provide the wins.
These distributions are typical, not universal. Your workload may differ significantly. Always measure actual phase timings using database instrumentation (pg_stat_statements, Query Store, AWR) before assuming where bottlenecks lie.
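For example, on PostgreSQL you can see how much of each statement's time goes to planning versus execution. A minimal sketch, assuming pg_stat_statements is installed and pg_stat_statements.track_planning is enabled (PostgreSQL 13 or later); Query Store and AWR expose comparable breakdowns on SQL Server and Oracle:

```sql
-- Planning vs. execution time per statement (top 20 by planning time).
-- Requires shared_preload_libraries = 'pg_stat_statements'
-- and pg_stat_statements.track_planning = on.
SELECT
    substring(query, 1, 60)       AS query_preview,
    calls,
    mean_plan_time::numeric(10,3) AS avg_plan_ms,
    mean_exec_time::numeric(10,3) AS avg_exec_ms,
    round((100 * total_plan_time
           / nullif(total_plan_time + total_exec_time, 0))::numeric, 1) AS plan_pct
FROM pg_stat_statements
ORDER BY total_plan_time DESC
LIMIT 20;
```

Statements with a high plan_pct are the ones where prepared statements and plan caching pay off most.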
Prepared statements eliminate redundant parsing and optimization. The magnitude of benefit depends on query complexity and execution frequency.
Measured improvements (industry benchmarks):
These gains assume high query frequency. For one-time queries, prepared statements add overhead without benefit.
| Query Type | Ad-Hoc (qps) | Prepared (qps) | Improvement | Notes |
|---|---|---|---|---|
| Simple SELECT by PK | 8,000 | 25,000 | 3.1x | Parse/optimize overhead eliminated |
| INSERT single row | 12,000 | 35,000 | 2.9x | Bound parameters avoid escaping |
| SELECT with 2 JOINs | 2,000 | 18,000 | 9x | Optimization savings compound |
| Complex aggregate | 500 | 8,000 | 16x | Heavy optimization cost amortized |
| Dynamic search (varies) | 6,000 | 6,500 | 1.08x | Low cache hit rate limits benefit |
```python
# Benchmark: Prepared vs Ad-Hoc Query Performance

import time
import statistics

import psycopg2


def benchmark_adhoc(conn, iterations=10000):
    """Benchmark ad-hoc queries (string formatting)."""
    cursor = conn.cursor()
    times = []

    for i in range(iterations):
        user_id = i % 100000  # Vary the parameter

        start = time.perf_counter()
        # WARNING: Don't do this in production (SQL injection risk)
        cursor.execute(f"SELECT * FROM users WHERE id = {user_id}")
        cursor.fetchone()
        elapsed = time.perf_counter() - start
        times.append(elapsed * 1000)  # Convert to ms

    return {
        'mean_ms': statistics.mean(times),
        'p99_ms': sorted(times)[int(len(times) * 0.99)],
        'qps': iterations / sum(times) * 1000
    }


def benchmark_prepared(conn, iterations=10000):
    """Benchmark prepared statements."""
    cursor = conn.cursor()
    times = []

    # Prepare once (in real apps, this happens at connection init)
    cursor.execute(
        "PREPARE user_lookup(int) AS "
        "SELECT * FROM users WHERE id = $1"
    )

    for i in range(iterations):
        user_id = i % 100000

        start = time.perf_counter()
        cursor.execute("EXECUTE user_lookup(%s)", (user_id,))
        cursor.fetchone()
        elapsed = time.perf_counter() - start
        times.append(elapsed * 1000)

    cursor.execute("DEALLOCATE user_lookup")

    return {
        'mean_ms': statistics.mean(times),
        'p99_ms': sorted(times)[int(len(times) * 0.99)],
        'qps': iterations / sum(times) * 1000
    }


def benchmark_native_prepared(conn, iterations=10000):
    """Benchmark using the driver's native parameterized query support."""
    times = []

    # Use a parameterized query (driver handles preparation)
    query = "SELECT * FROM users WHERE id = %s"

    for i in range(iterations):
        user_id = i % 100000
        cursor = conn.cursor()

        start = time.perf_counter()
        cursor.execute(query, (user_id,))
        cursor.fetchone()
        elapsed = time.perf_counter() - start
        times.append(elapsed * 1000)

        cursor.close()

    return {
        'mean_ms': statistics.mean(times),
        'p99_ms': sorted(times)[int(len(times) * 0.99)],
        'qps': iterations / sum(times) * 1000
    }


# Run benchmarks
conn = psycopg2.connect("dbname=benchmark user=postgres")

print("Warming up connection pool...")
benchmark_adhoc(conn, iterations=100)

print("\nBenchmark Results:")
print("-" * 50)

adhoc = benchmark_adhoc(conn, iterations=10000)
print("Ad-Hoc Queries:")
print(f"  Mean: {adhoc['mean_ms']:.3f}ms, P99: {adhoc['p99_ms']:.3f}ms")
print(f"  Throughput: {adhoc['qps']:.0f} qps")

prepared = benchmark_prepared(conn, iterations=10000)
print("\nPrepared Statements:")
print(f"  Mean: {prepared['mean_ms']:.3f}ms, P99: {prepared['p99_ms']:.3f}ms")
print(f"  Throughput: {prepared['qps']:.0f} qps")

improvement = prepared['qps'] / adhoc['qps']
print(f"\nImprovement: {improvement:.1f}x throughput increase")
```

JIT compilation provides performance benefits that differ qualitatively from plan caching. While plan caching reduces preparation overhead, JIT compilation accelerates execution itself.
Where JIT helps most:
- Expression-heavy aggregations and predicate evaluation over large row counts
- CPU-bound analytical queries where arithmetic and tuple deforming dominate
- Large hash joins with significant hash computation

Where JIT doesn't help:
- Short OLTP queries, where compilation overhead exceeds any execution savings
- I/O-bound scans, where time is spent waiting on storage rather than the CPU
| Query Type | JIT Off (ms) | JIT On (ms) | Speedup | Analysis |
|---|---|---|---|---|
| TPC-H Q1 (aggregation) | 4,500 | 2,100 | 2.1x | Expression-heavy aggregation benefits |
| TPC-H Q5 (complex joins) | 8,200 | 5,800 | 1.4x | Mix of compute and I/O |
| TPC-H Q6 (scan + filter) | 1,200 | 520 | 2.3x | Predicate evaluation accelerated |
| Simple point lookup | 0.8 | 1.2 | 0.67x | JIT overhead exceeds benefit |
| Large hash join | 15,000 | 9,500 | 1.6x | Hash computation accelerated |
| I/O-bound scan | 25,000 | 24,500 | 1.02x | Disk waiting dominates |
```sql
-- PostgreSQL JIT Performance Analysis

-- Enable JIT and set thresholds
SET jit = on;
SET jit_above_cost = 100000;           -- JIT for queries costing > this
SET jit_inline_above_cost = 500000;    -- Inline functions above this cost
SET jit_optimize_above_cost = 500000;  -- Apply optimizations above this

-- Analyze a query with JIT timing breakdown
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
    l_returnflag,
    l_linestatus,
    SUM(l_quantity) AS sum_qty,
    SUM(l_extendedprice) AS sum_base_price,
    SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
    AVG(l_quantity) AS avg_qty,
    AVG(l_extendedprice) AS avg_price,
    COUNT(*) AS count_order
FROM lineitem
WHERE l_shipdate <= DATE '1998-12-01' - INTERVAL '90 days'
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;

-- Sample output shows JIT timing:
--   JIT:
--     Functions: 21
--     Options: Inlining true, Optimization true, Expressions true, Deforming true
--     Timing: Generation 5.234 ms, Inlining 18.456 ms, Optimization 89.123 ms,
--             Emission 45.678 ms, Total 158.491 ms
--   Execution Time: 2134.567 ms

-- Compare with JIT disabled
SET jit = off;
-- Re-run query... Execution Time: 4567.890 ms

-- JIT benefit: 4567 / 2134 = 2.14x speedup
-- Note: JIT time (158ms) is small compared to execution savings (2433ms)

-- Query statistics including JIT metrics
SELECT
    substring(query, 1, 60) AS query_preview,
    calls,
    mean_exec_time::numeric(10,2) AS avg_ms,
    jit_functions AS jit_funcs,
    jit_generation_time::numeric(10,2) AS jit_gen_ms,
    jit_optimization_time::numeric(10,2) AS jit_opt_ms,
    jit_emission_time::numeric(10,2) AS jit_emit_ms
FROM pg_stat_statements
WHERE jit_functions > 0
ORDER BY jit_generation_time DESC
LIMIT 10;
```

Default JIT thresholds are often too conservative or too aggressive for specific workloads. In OLAP-heavy environments, lower jit_above_cost. In OLTP environments, raise it significantly or disable JIT entirely. Monitor JIT compilation time vs. execution time to validate settings.
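Those thresholds don't have to be cluster-wide. One way to scope them per workload is per database or per role; a minimal sketch, with hypothetical database and role names:

```sql
-- Lower the JIT threshold for an analytics database (hypothetical name).
ALTER DATABASE reporting SET jit_above_cost = 50000;

-- Disable JIT for a latency-sensitive OLTP application role (hypothetical name).
ALTER ROLE app_oltp SET jit = off;

-- Both settings apply to new sessions; verify from a fresh session:
SHOW jit;
SHOW jit_above_cost;
```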
Plan caching benefits compound across all database connections. Unlike per-session prepared statements, global plan caches avoid redundant optimization even for first-time executions within a session.
Key metrics to measure:
- Plan cache hit ratio (executions served without a fresh compile)
- Reuse ratio (average uses per cached plan) and total cache memory footprint
- Total and average compilation time
- Queries with a high compile-to-execution ratio, which indicate poor plan reuse
```sql
-- SQL Server: Plan Cache Impact Analysis

-- 1. Overall plan cache efficiency
SELECT
    objtype AS object_type,
    COUNT(*) AS cached_plans,
    SUM(usecounts) AS total_uses,
    SUM(size_in_bytes) / 1024 / 1024 AS cache_size_mb,
    AVG(usecounts) AS avg_reuse_count,
    SUM(usecounts) * 1.0 / COUNT(*) AS reuse_ratio
FROM sys.dm_exec_cached_plans
GROUP BY objtype
ORDER BY total_uses DESC;

-- 2. Cache hit ratio from performance counters
SELECT
    counter_name,
    cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Plan Cache%'
  AND counter_name IN (
      'Cache Hit Ratio',
      'Cache Hit Ratio Base',
      'Cache Object Counts',
      'Cache Pages'
  );

-- 3. Compilation time analysis
-- (Requires Query Store to be enabled; compile stats live on the plan row)
SELECT
    SUM(rs.count_executions) AS total_executions,
    SUM(p.count_compiles) AS total_compiles,
    (SUM(rs.count_executions) - SUM(p.count_compiles)) AS cache_hits,
    100.0 * (SUM(rs.count_executions) - SUM(p.count_compiles))
        / SUM(rs.count_executions) AS cache_hit_pct,
    SUM(p.count_compiles * p.avg_compile_duration) / 1000000.0 AS total_compile_sec,
    SUM(p.count_compiles * p.avg_compile_duration) / 1000000.0
        / SUM(p.count_compiles) AS avg_compile_sec
FROM sys.query_store_plan p
JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id;

-- 4. Queries that would benefit from better caching
-- (High compile count relative to execution count)
SELECT TOP 20
    qt.query_sql_text,
    p.count_compiles,
    rs.count_executions,
    (p.count_compiles * 1.0 / rs.count_executions) AS compile_ratio,
    p.avg_compile_duration / 1000.0 AS avg_compile_ms
FROM sys.query_store_plan p
JOIN sys.query_store_query q ON p.query_id = q.query_id
JOIN sys.query_store_query_text qt ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_runtime_stats rs ON p.plan_id = rs.plan_id
WHERE rs.count_executions > 100
ORDER BY compile_ratio DESC;
```

Theory meets practice in these real-world scenarios where query compilation techniques delivered measurable business impact.
Case Study 1: E-Commerce Product Search
Problem: Product search API response times averaged 450ms, with P99 at 2 seconds. Database CPU at 85% during peak hours.
Analysis: 80% of search queries used ad-hoc SQL with dynamic filter construction. Each search triggered a full parse-optimize cycle despite similar query shapes.
Solution: Implemented prepared statement templates for common filter combinations (category, price range, rating). Used parameterized queries instead of string concatenation.
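A minimal sketch of those templates, assuming PostgreSQL; the table and column names are illustrative, not taken from the actual system:

```sql
-- Prepared once per connection (typically at pool checkout / connection init).
PREPARE search_by_category(int, int, int) AS
    SELECT id, name, price, rating
    FROM products
    WHERE category_id = $1
    ORDER BY rating DESC
    LIMIT $2 OFFSET $3;

PREPARE search_by_category_price(int, numeric, numeric, int, int) AS
    SELECT id, name, price, rating
    FROM products
    WHERE category_id = $1
      AND price BETWEEN $2 AND $3
    ORDER BY rating DESC
    LIMIT $4 OFFSET $5;

-- The application picks the template that matches the active filters:
EXECUTE search_by_category_price(42, 10.00, 50.00, 20, 0);
```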
Result:
| Metric | Before | After | Improvement |
|---|---|---|---|
| Avg Response Time | 450ms | 85ms | 5.3x faster |
| P99 Response Time | 2000ms | 280ms | 7.1x faster |
| Peak CPU Usage | 85% | 35% | 50% reduction |
| Max Throughput | 5,000 qps | 10,000 qps | 2x capacity |
| Cost (monthly) | $45,000 | $22,500 | 50% reduction |
Case Study 2: Financial Reporting Dashboard
Problem: Daily financial reports took 4+ hours to generate. Business users couldn't access updated data until mid-morning.
Analysis: Complex aggregation queries processed hundreds of millions of rows. Interpreted execution spent 70% of its time on expression-evaluation overhead.
Solution: Enabled PostgreSQL JIT compilation. Tuned thresholds for the reporting workload. Added materialized views with incremental refresh for key aggregations.
Result:
Case Study 3: SaaS Multi-Tenant Application
Problem: Parameter sniffing caused intermittent performance degradation. Large tenants experienced 10-second queries while small tenants got millisecond responses.
Analysis: Shared execution plans were optimized for whichever tenant happened to execute first. Large-tenant queries then ran with plans built for small tenants (index seeks over 10 million rows) and vice versa.
Solution: Implemented OPTIMIZE FOR UNKNOWN for tenant-filtered queries. Added Query Store monitoring with automatic plan correction. Considered tenant-specific connection pools for extreme cases.
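A minimal T-SQL sketch of the OPTIMIZE FOR UNKNOWN piece of that fix; the procedure, table, and column names are hypothetical:

```sql
CREATE OR ALTER PROCEDURE dbo.get_tenant_orders
    @tenant_id int
AS
BEGIN
    -- Optimize for average column statistics instead of the sniffed @tenant_id,
    -- so one tenant's data skew doesn't dictate the shared plan.
    SELECT o.order_id, o.order_date, o.total_amount
    FROM dbo.orders AS o
    WHERE o.tenant_id = @tenant_id
    OPTION (OPTIMIZE FOR UNKNOWN);
END;
```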
Result:
Performance optimization often goes unrecognized because pre-optimization measurements weren't taken. Always benchmark before AND after changes. Document the business impact (cost savings, capacity increase, user satisfaction) alongside technical metrics.
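A simple way to capture such a baseline on PostgreSQL, assuming pg_stat_statements is available (Query Store and AWR snapshots serve the same purpose elsewhere):

```sql
-- 1. Reset counters at the start of the measurement window.
SELECT pg_stat_statements_reset();

-- 2. Run the workload (or let production traffic flow) for a fixed interval.

-- 3. Capture the baseline: top statements by total execution time.
SELECT
    substring(query, 1, 60)        AS query_preview,
    calls,
    total_exec_time::numeric(12,1) AS total_ms,
    mean_exec_time::numeric(10,3)  AS avg_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- 4. Apply the change, repeat steps 1-3, and compare the two captures.
```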
Different scenarios call for different techniques. This decision matrix helps you choose the right approach based on workload characteristics.
| Workload Type | Primary Technique | Secondary Technique | Avoid |
|---|---|---|---|
| High-frequency OLTP | Prepared statements | Plan caching | JIT compilation |
| Complex analytics | JIT compilation | Vectorization | Over-prepared patterns |
| Mixed OLTP/OLAP | Adaptive query processing | Workload segmentation | One-size-fits-all |
| Ad-hoc reporting | Plan caching | Materialized views | Per-query preparation |
| Microservices (many connections) | Connection pooling + prepared | Protocol-level preparation | Per-connection prep |
| Batch processing | Bulk operations | JIT for transforms | Row-by-row prepared |
Don't implement every technique simultaneously. Start with the lowest-effort, highest-impact change (usually prepared statements for OLTP). Measure the improvement. Only add complexity when there's proven need.
Performance optimization can backfire when applied incorrectly. These anti-patterns negate or reverse expected benefits.
```python
# Anti-Patterns in Query Optimization

import psycopg2

# ==========================================
# ANTI-PATTERN 1: Preparing inside the loop
# ==========================================

def process_orders_WRONG(order_ids: list):
    """
    WRONG: Prepares the statement for each order.
    Worse than ad-hoc because of PREPARE + EXECUTE + DEALLOCATE overhead per row.
    """
    conn = psycopg2.connect(...)
    cursor = conn.cursor()

    for order_id in order_ids:
        # Preparing inside the loop - defeats the purpose!
        cursor.execute(
            "PREPARE get_order AS SELECT * FROM orders WHERE id = $1"
        )
        cursor.execute("EXECUTE get_order(%s)", (order_id,))
        cursor.execute("DEALLOCATE get_order")
        # ... process order


def process_orders_CORRECT(order_ids: list):
    """
    CORRECT: Prepare once, execute many times.
    """
    conn = psycopg2.connect(...)
    cursor = conn.cursor()

    # Prepare once
    cursor.execute(
        "PREPARE get_order(int) AS SELECT * FROM orders WHERE id = $1"
    )

    # Execute many times
    for order_id in order_ids:
        cursor.execute("EXECUTE get_order(%s)", (order_id,))
        # ... process order

    # Deallocate when done
    cursor.execute("DEALLOCATE get_order")


# ==========================================
# ANTI-PATTERN 2: Parameterized SQL injection
# ==========================================

def search_products_WRONG(table_name: str, category: str):
    """
    WRONG: User input in a non-parameterizable position.
    Still vulnerable despite using a parameter for category!
    """
    conn = psycopg2.connect(...)
    cursor = conn.cursor()

    # table_name comes from user input - SQL injection!
    query = f"SELECT * FROM {table_name} WHERE category = %s"
    cursor.execute(query, (category,))  # Category is safe, the table is not!

    return cursor.fetchall()


def search_products_CORRECT(table_name: str, category: str):
    """
    CORRECT: Whitelist non-parameterizable inputs.
    """
    ALLOWED_TABLES = {'products', 'archived_products', 'featured_products'}
    if table_name not in ALLOWED_TABLES:
        raise ValueError(f"Invalid table: {table_name}")

    conn = psycopg2.connect(...)
    cursor = conn.cursor()

    # Safe because table_name is whitelisted
    query = f"SELECT * FROM {table_name} WHERE category = %s"
    cursor.execute(query, (category,))

    return cursor.fetchall()
```

We've quantified the performance benefits of query compilation techniques and established frameworks for their effective application. Let's consolidate the key insights:

- Prepared statements deliver roughly 3-16x throughput gains for frequently repeated OLTP queries, but little or nothing for one-off or highly variable queries.
- JIT compilation accelerates CPU-bound analytical execution (1.4-2.3x in the benchmarks above) while penalizing short queries; tune or scope its thresholds per workload.
- Plan caching amortizes optimization cost across all connections; track hit ratios and compile-to-execution ratios to confirm it is working.
- Match the technique to the workload, and benchmark before and after every change.
Module Conclusion:
Query compilation is not a single technique but a family of approaches that transform how databases execute queries. From the fundamental concept of eliminating interpretation overhead, through the practical mechanisms of prepared statements and plan caching, to the sophisticated strategies of adaptive processing and regression detection—these techniques form the foundation of high-performance database operation.
The key insight throughout: know your workload. Performance optimization is not about applying every technique everywhere. It's about understanding where time goes, which techniques address those costs, and measuring the impact of changes.
With this knowledge, you can diagnose performance issues at their root, select appropriate optimization strategies, and build systems that scale gracefully from prototype to production.
Congratulations! You've completed the Query Compilation module. You now understand compiled queries, prepared statements, query caching, plan reuse, and the performance benefits these techniques provide. You're equipped to analyze query processing overhead and apply appropriate optimization strategies for your workloads.