Sometimes the best optimization is to write a different query. Query rewriting transforms SQL statements to achieve identical results with dramatically better performance—changing how you ask the question without changing what you're asking.
The optimizer can only choose among execution plans for the query you wrote. It cannot rewrite your query into a fundamentally different structure. That's your job. A correlated subquery executing 100,000 times might become a single join. A DISTINCT on a large result set might become an EXISTS check. A function in WHERE might become a join to a reference table.
This page establishes systematic rewriting techniques that address common performance anti-patterns, transforming slow queries into efficient ones through structural changes.
By the end of this page, you will understand how to identify rewriting opportunities from execution plans and statistics, apply proven transformation patterns for subqueries, joins, aggregations, and predicates, and verify that rewrites preserve correctness while improving performance.
Not every slow query needs rewriting. Before restructuring SQL, consider whether simpler solutions exist:
Try These First: refreshed statistics, a new or adjusted index, or an invalidated stale plan. These help the optimizer without touching the SQL.
Rewrite When: the structure of the query itself is the bottleneck: correlated subqueries, non-SARGable predicates, redundant joins, or patterns the optimizer cannot transform on its own.
Query rewriting must preserve correctness. A fast query that returns wrong results is worse than a slow query that returns right results. Always verify rewrites against the original using representative test data before deployment.
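This verification step can be automated. A minimal sketch using Python's built-in sqlite3 module, with a toy schema standing in for real tables: run both the original and the rewritten query, then compare the result sets as order-independent collections.

```python
import sqlite3

# Toy stand-in data; real verification should use representative test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_date TEXT);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (10, 1, '2024-01-05'), (11, 1, '2024-03-01'),
                              (12, 2, '2024-02-10');
""")

original = """
    SELECT c.customer_id,
           (SELECT MAX(o.order_date) FROM orders o
             WHERE o.customer_id = c.customer_id) AS last_order
    FROM customers c
"""
rewritten = """
    SELECT c.customer_id, MAX(o.order_date) AS last_order
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id
"""

def rows(sql):
    # Sort so row order (which SQL does not guarantee) cannot cause
    # a false mismatch between the two result sets.
    return sorted(conn.execute(sql).fetchall(), key=repr)

assert rows(original) == rows(rewritten), "rewrite changed results!"
print(rows(original))  # [(1, '2024-03-01'), (2, '2024-02-10'), (3, None)]
```

Note that customer 3, who has no orders, appears with a NULL in both versions; a rewrite that used INNER JOIN instead of LEFT JOIN would drop that row and fail this check.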
Subqueries are intuitive to write but often execute inefficiently. Many subqueries can be transformed into joins, which typically perform better because the optimizer has more flexibility in choosing execution order and methods.
Pattern 1: Correlated Subquery → Join
```sql
-- ================================================================
-- Subquery to Join Transformations
-- ================================================================

-- =========================
-- Pattern 1: Correlated Subquery → LEFT JOIN
-- =========================

-- BEFORE: Correlated subquery (executes once per customer)
SELECT
    c.customer_id,
    c.customer_name,
    (SELECT MAX(o.order_date)
     FROM orders o
     WHERE o.customer_id = c.customer_id) AS last_order_date
FROM customers c;

/*
Execution: For each of 100,000 customers, runs a separate
subquery against the orders table. If orders has 1M rows,
this could be 100,000 index seeks or worse.
*/

-- AFTER: LEFT JOIN with aggregation (single pass)
SELECT
    c.customer_id,
    c.customer_name,
    MAX(o.order_date) AS last_order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name;

/*
Execution: Single hash join or merge join between customers
and orders, then aggregate. Much more efficient for set operations.
*/

-- =========================
-- Pattern 2: IN Subquery → INNER JOIN
-- =========================

-- BEFORE: IN with subquery
SELECT *
FROM products p
WHERE p.product_id IN (
    SELECT DISTINCT oi.product_id
    FROM order_items oi
    WHERE oi.quantity > 100
);

-- AFTER: JOIN (eliminates need for DISTINCT processing)
SELECT DISTINCT p.*
FROM products p
INNER JOIN order_items oi ON p.product_id = oi.product_id
WHERE oi.quantity > 100;

-- Or better, using EXISTS (often fastest):
SELECT p.*
FROM products p
WHERE EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
      AND oi.quantity > 100
);

/*
EXISTS vs IN vs JOIN comparison:
- IN with subquery: May materialize subquery results
- JOIN: May produce duplicates requiring DISTINCT
- EXISTS: Stops at first match, no duplicate handling needed

For "is there any matching row?" questions, EXISTS is typically optimal.
*/

-- =========================
-- Pattern 3: NOT IN → LEFT JOIN with NULL check
-- =========================

-- BEFORE: NOT IN (DANGEROUS with NULLable columns!)
SELECT c.*
FROM customers c
WHERE c.customer_id NOT IN (
    SELECT o.customer_id
    FROM orders o
    WHERE o.status = 'shipped'
);

/*
WARNING: If orders.customer_id contains ANY NULL value,
NOT IN returns zero rows! This is SQL semantics, not a bug.
Also, NOT IN often prevents efficient anti-join execution.
*/

-- AFTER: LEFT JOIN with NULL check (correct and efficient)
SELECT c.*
FROM customers c
LEFT JOIN orders o
    ON c.customer_id = o.customer_id
   AND o.status = 'shipped'
WHERE o.customer_id IS NULL;

/*
Execution: Hash anti-join or merge anti-join.
Correct behavior regardless of NULL values.
Clear intent: find customers with no matching shipped orders.
*/

-- Alternative: NOT EXISTS (often preferred)
SELECT c.*
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
      AND o.status = 'shipped'
);

/*
NOT EXISTS handles NULLs correctly and often generates
optimal anti-join plans.
Most readable for "no matching rows" semantics.
*/
```

Pattern 4: Scalar Subquery in SELECT → JOIN
Scalar subqueries in the SELECT list execute once per output row. For large result sets, this creates severe performance problems.
```sql
-- =========================
-- Pattern 4: Scalar Subquery in SELECT → JOIN
-- =========================

-- BEFORE: Multiple scalar subqueries (very slow!)
SELECT
    o.order_id,
    o.order_date,
    (SELECT c.customer_name FROM customers c
     WHERE c.customer_id = o.customer_id) AS customer_name,
    (SELECT COUNT(*) FROM order_items oi
     WHERE oi.order_id = o.order_id) AS item_count,
    (SELECT SUM(oi.quantity * oi.unit_price) FROM order_items oi
     WHERE oi.order_id = o.order_id) AS order_total
FROM orders o
WHERE o.order_date >= '2024-01-01';

/*
For 50,000 orders, this executes:
- 50,000 lookups to customers
- 50,000 COUNT queries against order_items
- 50,000 SUM queries against order_items
= 150,000 subquery executions!
*/

-- AFTER: JOINs with aggregation (3 scans maximum)
SELECT
    o.order_id,
    o.order_date,
    c.customer_name,
    oi_stats.item_count,
    oi_stats.order_total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN (
    SELECT
        order_id,
        COUNT(*) AS item_count,
        SUM(quantity * unit_price) AS order_total
    FROM order_items
    GROUP BY order_id
) oi_stats ON o.order_id = oi_stats.order_id
WHERE o.order_date >= '2024-01-01';

/*
Execution:
- Single scan of orders (filtered)
- Single scan of customers (hash/merge joined)
- Single scan of order_items (grouped, then joined)
- Much more efficient set-based processing
*/

-- =========================
-- CTE Approach for Clarity
-- =========================

WITH order_item_stats AS (
    SELECT
        order_id,
        COUNT(*) AS item_count,
        SUM(quantity * unit_price) AS order_total
    FROM order_items
    GROUP BY order_id
)
SELECT
    o.order_id,
    o.order_date,
    c.customer_name,
    ois.item_count,
    ois.order_total
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_item_stats ois ON o.order_id = ois.order_id
WHERE o.order_date >= '2024-01-01';

/*
CTEs improve readability without runtime cost (in most cases).
The optimizer typically inlines non-recursive CTEs.
*/
```

Modern optimizers can automatically transform some subqueries to joins (subquery unnesting/decorrelation).
However, this doesn't work for all patterns. When in doubt, write the efficient form explicitly. Check EXPLAIN to verify the optimizer's choice.
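The NOT IN trap from Pattern 3 is easy to reproduce on any engine. A small sqlite3 demonstration (toy schema) showing a single NULL in the subquery result silencing NOT IN entirely, while NOT EXISTS returns the expected anti-join rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER);
    CREATE TABLE orders (customer_id INTEGER, status TEXT);
    INSERT INTO customers VALUES (1), (2), (3);
    -- One shipped order has a NULL customer_id
    INSERT INTO orders VALUES (1, 'shipped'), (NULL, 'shipped');
""")

not_in = conn.execute("""
    SELECT customer_id FROM customers
    WHERE customer_id NOT IN
          (SELECT customer_id FROM orders WHERE status = 'shipped')
""").fetchall()

not_exists = conn.execute("""
    SELECT customer_id FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o
                      WHERE o.customer_id = c.customer_id
                        AND o.status = 'shipped')
""").fetchall()

print(not_in)      # [] -- the NULL makes every NOT IN comparison UNKNOWN
print(not_exists)  # [(2,), (3,)] -- correct anti-join semantics
```

The NOT IN version returns zero rows because `2 NOT IN (1, NULL)` evaluates to UNKNOWN, not TRUE; this is standard three-valued logic, not an engine quirk.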
Predicates (WHERE and JOIN conditions) often contain hidden performance problems. Small changes to how conditions are expressed can dramatically change execution plans.
SARGable Predicates
SARG = Search ARGument. A SARGable predicate allows the optimizer to use indexes. Non-SARGable predicates force table scans.
```sql
-- ================================================================
-- Predicate Optimization Patterns
-- ================================================================

-- =========================
-- Pattern 1: Remove Functions from Indexed Columns
-- =========================

-- NON-SARGABLE (cannot use index on created_at)
SELECT * FROM orders
WHERE YEAR(created_at) = 2024
  AND MONTH(created_at) = 6;

-- SARGABLE (uses index on created_at)
SELECT * FROM orders
WHERE created_at >= '2024-06-01'
  AND created_at < '2024-07-01';

/*
Why the difference?
- YEAR(created_at) = 2024: Must evaluate YEAR() for every row
- created_at >= '2024-06-01': Can seek directly in index

The optimizer can use an index range scan for direct comparisons
but cannot "see through" function calls to use indexes.
*/

-- NON-SARGABLE: Function on column
SELECT * FROM customers WHERE LOWER(email) = 'john@example.com';

-- SARGABLE: Compare the raw column (if collation is case-insensitive,
-- or values are normalized on write)
SELECT * FROM customers WHERE email = 'john@example.com';

-- Or create a functional index (PostgreSQL, MySQL 8+)
CREATE INDEX idx_customers_email_lower ON customers(LOWER(email));

-- =========================
-- Pattern 2: Avoid Implicit Type Conversions
-- =========================

-- PROBLEM: customer_id is INT, but we pass string
-- NON-SARGABLE (implicit conversion on every row)
SELECT * FROM orders WHERE customer_id = '12345';

-- SARGABLE (correct type, no conversion)
SELECT * FROM orders WHERE customer_id = 12345;

-- PROBLEM: Mixed types in JOIN
-- This may convert every row of one table
SELECT *
FROM orders o
JOIN order_legacy l ON o.order_id = CAST(l.order_code AS INT);

-- Better: Fix the source data or add a computed column
ALTER TABLE order_legacy ADD order_id_int AS CAST(order_code AS INT) PERSISTED;
CREATE INDEX idx_order_legacy_id ON order_legacy(order_id_int);

-- Then join on matching types
SELECT *
FROM orders o
JOIN order_legacy l ON o.order_id = l.order_id_int;

-- =========================
-- Pattern 3: Transform OR to UNION
-- =========================

-- PROBLEM: OR on different columns prevents single index use
-- This often results in a table scan
SELECT * FROM products
WHERE category_id = 5 OR supplier_id = 12;

-- SOLUTION: UNION allows index use on each condition
SELECT * FROM products WHERE category_id = 5
UNION
SELECT * FROM products WHERE supplier_id = 12;

-- Or UNION ALL with deduplication if overlap is rare
SELECT DISTINCT * FROM (
    SELECT * FROM products WHERE category_id = 5
    UNION ALL
    SELECT * FROM products WHERE supplier_id = 12
) combined;

/*
Each UNION branch can use its optimal index:
- First branch uses idx_products_category
- Second branch uses idx_products_supplier
Combined result is still smaller than a full table scan.
*/

-- =========================
-- Pattern 4: Rewrite LIKE for Index Usage
-- =========================

-- NON-SARGABLE: Leading wildcard (cannot use index)
SELECT * FROM products WHERE product_name LIKE '%widget%';

-- SARGABLE: Trailing wildcard only (uses index)
SELECT * FROM products WHERE product_name LIKE 'widget%';

-- For middle-of-string search, consider:
-- 1. Full-text search indexes
-- 2. Trigram indexes (PostgreSQL pg_trgm)
-- 3. Application-level search (Elasticsearch, etc.)

-- PostgreSQL: Trigram index for LIKE '%pattern%'
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_products_name_trgm
ON products USING gin(product_name gin_trgm_ops);

-- Now this can use the trigram index:
SELECT * FROM products WHERE product_name LIKE '%widget%';

-- =========================
-- Pattern 5: Transform NOT IN to Anti-Join
-- =========================

-- SLOW: NOT IN with large subquery
SELECT p.* FROM products p
WHERE p.category_id NOT IN (
    SELECT c.category_id FROM categories c WHERE c.active = 0
);

-- BETTER: LEFT JOIN with NULL check
SELECT p.* FROM products p
LEFT JOIN categories c
    ON p.category_id = c.category_id AND c.active = 0
WHERE c.category_id IS NULL;

-- BEST: NOT EXISTS (clearest intent, usually fastest)
SELECT p.* FROM products p
WHERE NOT EXISTS (
    SELECT 1 FROM categories c
    WHERE c.category_id = p.category_id AND c.active = 0
);
```

Type conversions often happen silently. When comparing VARCHAR to NVARCHAR, INT to BIGINT, or CHAR to VARCHAR, the optimizer may convert one side, preventing index usage. Check data types in table definitions and ensure predicates use matching types.
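SARGability is visible directly in execution plans. A small sqlite3 sketch (table and index names are illustrative, and `strftime` stands in for `YEAR()`): EXPLAIN QUERY PLAN reports SCAN when a function wraps the indexed column and SEARCH with the index when the predicate is a plain range.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, created_at TEXT);
    CREATE INDEX idx_orders_created ON orders(created_at);
""")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan description in column 3
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Non-SARGable: function wrapped around the indexed column
p1 = plan("SELECT * FROM orders WHERE strftime('%Y', created_at) = '2024'")

# SARGable: half-open range on the raw column
p2 = plan("SELECT * FROM orders WHERE created_at >= '2024-06-01' "
          "AND created_at < '2024-07-01'")

print(p1)  # contains SCAN (full table scan)
print(p2)  # contains SEARCH ... idx_orders_created (index range seek)
```

The same experiment works with `EXPLAIN` in PostgreSQL or MySQL; the wording differs (Seq Scan vs. Index Scan, type=ALL vs. type=range) but the distinction is identical.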
Joins are often the largest contributors to query cost. Rewriting joins can eliminate unnecessary work, reduce intermediate result sizes, and enable more efficient join algorithms.
```sql
-- ================================================================
-- Join Optimization Patterns
-- ================================================================

-- =========================
-- Pattern 1: Filter Early, Join Late
-- =========================

-- SLOW: Join all rows, then filter
SELECT c.customer_name, o.order_id, o.total
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01'
  AND o.status = 'completed'
  AND c.country = 'USA';

/*
Order of operations matters. If the optimizer doesn't push
predicates down, this may join ALL orders to ALL customers
first, then filter. Sometimes we need to make filtering explicit.
*/

-- BETTER: Subqueries materialize filtered data first
SELECT c.customer_name, o.order_id, o.total
FROM (
    SELECT customer_id, customer_name
    FROM customers
    WHERE country = 'USA'
) c
JOIN (
    SELECT customer_id, order_id, total
    FROM orders
    WHERE order_date >= '2024-01-01'
      AND status = 'completed'
) o ON c.customer_id = o.customer_id;

/*
This explicitly filters before joining.
Modern optimizers often do this automatically,
but explicit subqueries can help with complex queries.
Check EXPLAIN to verify predicate pushdown is happening.
*/

-- =========================
-- Pattern 2: Reduce Join Width
-- =========================

-- SLOW: Selecting many columns from large table
SELECT
    o.*,   -- All order columns
    c.*,   -- All customer columns
    oi.*   -- All order_item columns
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= '2024-01-01';

-- BETTER: Select only needed columns
SELECT
    o.order_id, o.order_date, o.total,
    c.customer_name, c.email,
    oi.product_id, oi.quantity, oi.unit_price
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE o.order_date >= '2024-01-01';

/*
Narrower rows = more rows fit in memory = fewer buffer operations.
Also enables covering indexes that include only needed columns.
*/

-- =========================
-- Pattern 3: Eliminate Unnecessary Joins
-- =========================

-- BEFORE: Join to get single column value
SELECT
    o.order_id,
    o.order_date,
    c.customer_id  -- Only using the key, which we already have!
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- AFTER: No join needed
SELECT
    order_id,
    order_date,
    customer_id  -- Already in orders table
FROM orders;

-- BEFORE: Join for existence check (returns duplicates)
SELECT DISTINCT o.order_id, o.total
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
WHERE oi.product_id = 100;

-- AFTER: EXISTS (no duplicate handling needed)
SELECT o.order_id, o.total
FROM orders o
WHERE EXISTS (
    SELECT 1 FROM order_items oi
    WHERE oi.order_id = o.order_id
      AND oi.product_id = 100
);

-- =========================
-- Pattern 4: Replace Multiple Joins with UNION
-- =========================

-- SLOW: Multiple LEFT JOINs to check multiple conditions
SELECT
    c.customer_id,
    CASE
        WHEN e.email_id IS NOT NULL THEN 'Email'
        WHEN p.phone_id IS NOT NULL THEN 'Phone'
        WHEN a.address_id IS NOT NULL THEN 'Mail'
        ELSE 'Unknown'
    END AS contact_method
FROM customers c
LEFT JOIN email_contacts e ON c.customer_id = e.customer_id
LEFT JOIN phone_contacts p ON c.customer_id = p.customer_id
LEFT JOIN address_contacts a ON c.customer_id = a.customer_id;

-- BETTER: UNION approach (for large tables)
SELECT customer_id, 'Email' AS contact_method
FROM customers
WHERE customer_id IN (SELECT customer_id FROM email_contacts)
UNION ALL
SELECT customer_id, 'Phone' AS contact_method
FROM customers
WHERE customer_id IN (SELECT customer_id FROM phone_contacts)
  AND customer_id NOT IN (SELECT customer_id FROM email_contacts)
UNION ALL
-- ... etc.

/*
The optimal approach depends on data distribution.
- If most customers have email: LEFT JOIN may be fine
- If contacts are sparse: EXISTS checks may be faster
- Profile both approaches with real data!
*/

-- =========================
-- Pattern 5: Rewrite Theta Joins
-- =========================

-- SLOW: Range join (non-equi join)
SELECT e.employee_name, s.salary_grade
FROM employees e
JOIN salary_grades s
    ON e.salary >= s.min_salary
   AND e.salary <= s.max_salary;

/*
Range joins can't use hash or merge joins efficiently.
They often result in nested loops with full scan of one side.
*/

-- OPTIMIZATION OPTIONS:
-- 1. Pre-compute join result in application
-- 2. Use bucketing/binning to convert to equi-join
-- 3. Add computed column for grade lookup

-- Example: Add precomputed salary grade to employees
ALTER TABLE employees ADD salary_grade_id INT;

-- Update using the range logic (one-time or batch)
UPDATE employees e
SET salary_grade_id = (
    SELECT s.grade_id FROM salary_grades s
    WHERE e.salary >= s.min_salary
      AND e.salary <= s.max_salary
);

-- Now queries can use efficient equi-join
SELECT e.employee_name, s.salary_grade
FROM employees e
JOIN salary_grades s ON e.salary_grade_id = s.grade_id;
```

The optimizer chooses join order, but it has limits on how many orderings it evaluates. For complex queries with many joins, try reordering tables in the FROM clause to suggest a good starting point. Some databases respect hints like STRAIGHT_JOIN (MySQL) or explicit join order.
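Pattern 3's claim that a DISTINCT join and an EXISTS are interchangeable for existence checks can be spot-checked directly. A sqlite3 sketch with toy data, including an order that contains the target product twice so the bare join would duplicate it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, total REAL);
    CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
    INSERT INTO orders VALUES (1, 50.0), (2, 75.0), (3, 20.0);
    -- order 1 contains product 100 twice: a plain join would duplicate it
    INSERT INTO order_items VALUES (1, 100), (1, 100), (2, 200), (3, 100);
""")

join_distinct = conn.execute("""
    SELECT DISTINCT o.order_id, o.total
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    WHERE oi.product_id = 100
    ORDER BY o.order_id
""").fetchall()

exists = conn.execute("""
    SELECT o.order_id, o.total
    FROM orders o
    WHERE EXISTS (SELECT 1 FROM order_items oi
                  WHERE oi.order_id = o.order_id AND oi.product_id = 100)
    ORDER BY o.order_id
""").fetchall()

print(join_distinct)  # [(1, 50.0), (3, 20.0)]
assert join_distinct == exists  # same rows, but EXISTS never built duplicates
```

The results match, but the EXISTS form never produced the duplicate row in the first place, which is exactly the work the DISTINCT in the join version exists to undo.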
Aggregation queries—GROUP BY, DISTINCT, and aggregate functions—can be expensive because they require sorting or hashing entire datasets. Strategic rewrites can dramatically reduce the data volume processed.
```sql
-- ================================================================
-- Aggregation Optimization Patterns
-- ================================================================

-- =========================
-- Pattern 1: Filter Before Aggregating
-- =========================

-- SLOW: Aggregate all, then filter with HAVING
SELECT product_id, SUM(quantity) AS total_quantity
FROM order_items
GROUP BY product_id
HAVING SUM(quantity) > 1000;

/*
This aggregates ALL products, then discards most results.
If 95% of products have quantity < 1000, wasted work.
*/

-- Better approach when possible: Pre-filter if column exists
SELECT product_id, SUM(quantity) AS total_quantity
FROM order_items oi
WHERE EXISTS (
    -- Only include products that COULD exceed threshold
    SELECT 1 FROM products p
    WHERE p.product_id = oi.product_id
      AND p.is_popular = 1  -- Some pre-filter criterion
)
GROUP BY product_id
HAVING SUM(quantity) > 1000;

-- Or maintain running totals separately for large datasets
-- (denormalization for read performance)

-- =========================
-- Pattern 2: Use DISTINCT vs GROUP BY Appropriately
-- =========================

-- These are semantically equivalent:
SELECT DISTINCT customer_id FROM orders;
SELECT customer_id FROM orders GROUP BY customer_id;

/*
Performance may differ:
- DISTINCT: Optimized for exact duplicates
- GROUP BY: Optimized when aggregates are needed

Generally, use DISTINCT for simple deduplication,
GROUP BY when calculating aggregates.
Check EXPLAIN to see if optimizer chooses different strategies.
*/

-- =========================
-- Pattern 3: Approximate Aggregation for Analytics
-- =========================

-- EXACT (potentially expensive for billions of rows)
SELECT COUNT(DISTINCT user_id) AS unique_users
FROM page_views
WHERE event_date >= '2024-01-01';

-- APPROXIMATE (much faster for analytics use cases)
-- SQL Server 2019+: APPROX_COUNT_DISTINCT
SELECT APPROX_COUNT_DISTINCT(user_id) AS approx_unique_users
FROM page_views
WHERE event_date >= '2024-01-01';

-- PostgreSQL offers HyperLogLog-based counting via extensions
-- (e.g. postgresql-hll); BigQuery and Spark SQL provide
-- APPROX_COUNT_DISTINCT / approx_count_distinct natively.

/*
Approximate functions trade 1-2% accuracy for 10-100x speed.
Excellent for dashboards, analytics, monitoring.
Not appropriate for financial reporting or exact counts.
*/

-- =========================
-- Pattern 4: Partial Aggregation with Pre-aggregated Tables
-- =========================

-- SLOW: Aggregate from transaction table every time
SELECT
    DATE_TRUNC('month', order_date) AS month,
    COUNT(*) AS order_count,
    SUM(total) AS revenue
FROM orders
WHERE order_date >= '2020-01-01'
GROUP BY DATE_TRUNC('month', order_date);

-- FAST: Maintain pre-aggregated summary table
-- Daily aggregation table (maintained by batch job or trigger)
CREATE TABLE order_daily_summary (
    summary_date DATE PRIMARY KEY,
    order_count INT,
    total_revenue DECIMAL(18,2)
);

-- Query from summary instead
SELECT
    DATE_TRUNC('month', summary_date) AS month,
    SUM(order_count) AS order_count,
    SUM(total_revenue) AS revenue
FROM order_daily_summary
WHERE summary_date >= '2020-01-01'
GROUP BY DATE_TRUNC('month', summary_date);

/*
Pre-aggregation trades storage and update complexity
for dramatically faster reads.

Standard practice for:
- Analytics dashboards
- Reporting queries
- Time-series data

Consider update strategy: batch, incremental, or trigger-based.
*/

-- =========================
-- Pattern 5: Window Functions vs JOIN for Rankings
-- =========================

-- SLOW: Self-join for "top per group"
SELECT o.*
FROM orders o
WHERE o.total = (
    SELECT MAX(o2.total)
    FROM orders o2
    WHERE o2.customer_id = o.customer_id
);

/*
Correlated subquery executes once per row.
For 1M orders, this is slow.
*/

-- FAST: Window function (single pass)
SELECT *
FROM (
    SELECT
        o.*,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id
            ORDER BY total DESC
        ) AS rn
    FROM orders o
) ranked
WHERE rn = 1;

/*
Window function processes all rows in a single pass,
partitioning and ranking efficiently.
Modern approach for any "top N per group" problem.
*/
```

COUNT(*) counts all rows; COUNT(column) counts non-NULL values. COUNT(*) can be optimized using statistics or narrow indexes. COUNT(column) may require checking each value. Use COUNT(*) for row counts unless you specifically need to exclude NULLs.
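The COUNT(*) vs. COUNT(column) distinction is a one-liner to verify. A sqlite3 sketch with a nullable `ship_date` column (names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, ship_date TEXT)")
# Two of four orders have not shipped yet (NULL ship_date)
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, '2024-01-02'), (2, None), (3, '2024-01-05'), (4, None)])

count_star, count_col = conn.execute(
    "SELECT COUNT(*), COUNT(ship_date) FROM orders").fetchone()

print(count_star)  # 4 -- every row
print(count_col)   # 2 -- only rows where ship_date IS NOT NULL
```

Here COUNT(ship_date) is the right tool only if the question is "how many orders have shipped?"; for "how many orders are there?" it silently undercounts.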
CTEs (Common Table Expressions) and derived tables (subqueries in FROM) provide structure and readability. However, their performance characteristics differ across databases—understanding these differences is essential for optimization.
```sql
-- ================================================================
-- CTE and Derived Table Optimization
-- ================================================================

-- =========================
-- Understanding CTE Execution Models
-- =========================

/*
Database systems handle CTEs differently:

PostgreSQL (pre-12): CTEs were ALWAYS materialized
  - Good: Prevents repeated execution
  - Bad: Prevents predicate pushdown

PostgreSQL (12+): CTEs are inlined unless:
  - Referenced multiple times
  - Marked with MATERIALIZED hint

SQL Server: CTEs are typically inlined
  - Optimizer treats as views
  - May execute multiple times if referenced multiple times

MySQL (8+): Similar to SQL Server
  - Inlining is default

Oracle: Can be materialized with hints
  - MATERIALIZE or INLINE hints available
*/

-- =========================
-- When to Force Materialization
-- =========================

-- SCENARIO: CTE used multiple times
WITH expensive_calculation AS (
    SELECT customer_id, SUM(total) AS lifetime_value
    FROM orders
    WHERE order_date >= '2020-01-01'
    GROUP BY customer_id
)
SELECT * FROM expensive_calculation WHERE lifetime_value > 10000
UNION ALL
SELECT * FROM expensive_calculation WHERE lifetime_value BETWEEN 5000 AND 10000
UNION ALL
SELECT * FROM expensive_calculation WHERE lifetime_value BETWEEN 1000 AND 5000;

/*
Without materialization, the CTE may execute THREE times.
Force materialization when a CTE is referenced multiple times.
*/

-- PostgreSQL 12+: Force materialization
WITH expensive_calculation AS MATERIALIZED (
    SELECT customer_id, SUM(total) AS lifetime_value
    FROM orders
    WHERE order_date >= '2020-01-01'
    GROUP BY customer_id
)
SELECT * FROM expensive_calculation WHERE lifetime_value > 10000
UNION ALL
SELECT * FROM expensive_calculation WHERE lifetime_value BETWEEN 5000 AND 10000;

-- SQL Server: Use temp table for explicit materialization
SELECT customer_id, SUM(total) AS lifetime_value
INTO #expensive_calc
FROM orders
WHERE order_date >= '2020-01-01'
GROUP BY customer_id;

CREATE INDEX idx_temp ON #expensive_calc(lifetime_value);

SELECT * FROM #expensive_calc WHERE lifetime_value > 10000
UNION ALL
SELECT * FROM #expensive_calc WHERE lifetime_value BETWEEN 5000 AND 10000;

DROP TABLE #expensive_calc;

-- =========================
-- When to Force Inlining
-- =========================

-- SCENARIO: CTE prevents predicate pushdown
WITH recent_orders AS (
    SELECT * FROM orders  -- All orders selected
)
SELECT * FROM recent_orders WHERE customer_id = 12345;

/*
In PostgreSQL < 12, this materialized ALL orders,
then filtered. Very inefficient.

PostgreSQL 12+ inlines by default, allowing pushdown.
*/

-- PostgreSQL 12+: Explicit inline (unnecessary but available)
WITH recent_orders AS NOT MATERIALIZED (
    SELECT * FROM orders
)
SELECT * FROM recent_orders WHERE customer_id = 12345;

-- Best practice: Write the predicate in the CTE
WITH recent_orders AS (
    SELECT * FROM orders
    WHERE customer_id = 12345  -- Filter pushed into CTE
)
SELECT * FROM recent_orders;

-- =========================
-- Recursive CTE Optimization
-- =========================

-- Recursive CTEs can be expensive; optimize the anchor and recursive terms

-- SLOW: Unoptimized recursive CTE
WITH RECURSIVE org_tree AS (
    -- Anchor: top-level employees (the recursion then visits everyone!)
    SELECT employee_id, manager_id, name, 1 AS level
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    -- Recursive: Join to find reports
    SELECT e.employee_id, e.manager_id, e.name, ot.level + 1
    FROM employees e
    JOIN org_tree ot ON e.manager_id = ot.employee_id
)
SELECT * FROM org_tree WHERE employee_id = 12345;

/*
This builds the ENTIRE org tree, then filters.
If you only need one employee's path, start from there instead.
*/

-- FAST: Start from target, walk up
WITH RECURSIVE emp_path AS (
    -- Anchor: Start from specific employee
    SELECT employee_id, manager_id, name, 1 AS level
    FROM employees
    WHERE employee_id = 12345

    UNION ALL

    -- Recursive: Walk up to manager
    SELECT e.employee_id, e.manager_id, e.name, ep.level + 1
    FROM employees e
    JOIN emp_path ep ON e.employee_id = ep.manager_id
)
SELECT * FROM emp_path;

/*
Processes only the path from employee to CEO,
not the entire organization.
*/
```

Don't assume CTE behavior—verify with EXPLAIN. Look for "CTE Scan" (materialized) vs. inline evaluation. The optimizer's choice may differ from your expectation, especially after database upgrades.
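The walk-up pattern runs on any engine with recursive CTE support, including sqlite3. A sketch with a four-level toy hierarchy (names and IDs are illustrative); only the target employee's chain of command is visited, not the sibling branch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, manager_id INTEGER, name TEXT);
    INSERT INTO employees VALUES
        (1, NULL, 'CEO'), (2, 1, 'VP'), (3, 2, 'Manager'), (4, 3, 'IC'),
        (5, 1, 'Other VP');  -- sibling branch, not on employee 4's path
""")

path = conn.execute("""
    WITH RECURSIVE emp_path AS (
        -- Anchor: start from the target employee
        SELECT employee_id, manager_id, name, 1 AS level
        FROM employees WHERE employee_id = 4
        UNION ALL
        -- Recursive: walk up to each manager in turn
        SELECT e.employee_id, e.manager_id, e.name, ep.level + 1
        FROM employees e
        JOIN emp_path ep ON e.employee_id = ep.manager_id
    )
    SELECT name FROM emp_path ORDER BY level
""").fetchall()

print(path)  # [('IC',), ('Manager',), ('VP',), ('CEO',)]
```

The recursion terminates naturally when it reaches the CEO, whose NULL manager_id matches no further rows; 'Other VP' is never touched.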
Query rewrites must be verified for both correctness and performance improvement. A systematic testing approach prevents introducing bugs or regressions.
```sql
-- ================================================================
-- Query Rewrite Verification Techniques
-- ================================================================

-- =========================
-- Technique 1: Row Count Comparison
-- =========================

-- Count from original
SELECT COUNT(*) AS original_count FROM (
    -- Original query here
    SELECT
        c.customer_id,
        c.customer_name,
        (SELECT MAX(o.order_date) FROM orders o
         WHERE o.customer_id = c.customer_id) AS last_order
    FROM customers c
) original;

-- Count from rewritten
SELECT COUNT(*) AS rewritten_count FROM (
    -- Rewritten query here
    SELECT
        c.customer_id,
        c.customer_name,
        MAX(o.order_date) AS last_order
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.customer_name
) rewritten;

-- Should be equal!

-- =========================
-- Technique 2: EXCEPT Comparison (Bi-directional)
-- =========================

-- Rows in original but not in rewritten (should be empty)
(
    SELECT
        c.customer_id,
        c.customer_name,
        (SELECT MAX(o.order_date) FROM orders o
         WHERE o.customer_id = c.customer_id) AS last_order
    FROM customers c
)
EXCEPT
(
    SELECT
        c.customer_id,
        c.customer_name,
        MAX(o.order_date) AS last_order
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.customer_name
);

-- Rows in rewritten but not in original (should be empty)
(
    SELECT
        c.customer_id,
        c.customer_name,
        MAX(o.order_date) AS last_order
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.customer_name
)
EXCEPT
(
    SELECT
        c.customer_id,
        c.customer_name,
        (SELECT MAX(o.order_date) FROM orders o
         WHERE o.customer_id = c.customer_id) AS last_order
    FROM customers c
);

-- Both should return 0 rows!

-- =========================
-- Technique 3: Performance Comparison
-- =========================

-- Original query timing
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT
    c.customer_id,
    c.customer_name,
    (SELECT MAX(o.order_date) FROM orders o
     WHERE o.customer_id = c.customer_id) AS last_order
FROM customers c;

/*
Capture:
- Execution time
- Buffer hits/reads
- Rows examined
*/

-- Rewritten query timing
EXPLAIN (ANALYZE, BUFFERS, TIMING)
SELECT
    c.customer_id,
    c.customer_name,
    MAX(o.order_date) AS last_order
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name;

/*
Compare metrics:
- Is execution time lower?
- Are buffer reads reduced?
- Is CPU time reduced?

Only deploy if clearly better across representative cases.
*/

-- =========================
-- Technique 4: Parameter Sensitivity Testing
-- =========================

-- Test with different parameters that produce different data volumes

-- Small result set parameter
EXPLAIN ANALYZE SELECT ... WHERE customer_id = 1;

-- Large result set parameter
EXPLAIN ANALYZE SELECT ... WHERE customer_id BETWEEN 1 AND 10000;

-- NULL parameter (if applicable)
EXPLAIN ANALYZE SELECT ... WHERE customer_id IS NULL;

-- Verify optimal plan chosen across parameter variations

-- =========================
-- Technique 5: NULL Behavior Verification
-- =========================

-- Insert test data with NULLs if not present
-- Then verify both queries handle identically

-- Check NULL count matches
SELECT
    SUM(CASE WHEN last_order IS NULL THEN 1 ELSE 0 END) AS null_count,
    SUM(CASE WHEN last_order IS NOT NULL THEN 1 ELSE 0 END) AS non_null_count
FROM (
    -- Original query
) original;

-- Compare to rewritten
SELECT
    SUM(CASE WHEN last_order IS NULL THEN 1 ELSE 0 END) AS null_count,
    SUM(CASE WHEN last_order IS NOT NULL THEN 1 ELSE 0 END) AS non_null_count
FROM (
    -- Rewritten query
) rewritten;
```

Query rewriting transforms structurally inefficient SQL into efficient alternatives.
Unlike index tuning which helps the optimizer execute your query better, rewriting gives the optimizer fundamentally better queries to work with.
What's next:
Query rewriting addresses individual queries. The final page covers continuous improvement—establishing processes and practices that maintain and improve query performance over time, across development, deployment, and production operations.
You now possess a comprehensive toolkit for query rewriting—subquery transformations, predicate optimization, join restructuring, and aggregation techniques. Combined with rigorous verification practices, you can confidently transform slow queries into efficient ones.