Common Table Expressions - Learning Module

CTE vs Subquery

Choosing Your Query Structure

You now understand both CTEs and traditional subqueries. But when should you use each? This isn't just an academic question—it affects readability, maintainability, performance, and even whether certain queries are possible at all.

The answer isn't always "use CTEs." Subqueries remain the right choice in many scenarios. Understanding the trade-offs enables you to choose the optimal approach for each situation, writing SQL that is not just correct but elegant and efficient.

What You Will Learn

By the end of this page, you will understand the fundamental differences between CTEs and subqueries, compare their performance characteristics, know when each approach excels, and have a practical decision framework for choosing between them in your daily SQL work.

Fundamental Differences

CTEs and subqueries solve similar problems—embedding one query's results into another—but they differ fundamentally in structure, scope, and capabilities.

CTE vs Subquery: Core Differences
Aspect	CTE (WITH Clause)	Subquery
Definition location	Before the main query	Inline, embedded in the query
Naming	Always named	Often anonymous (derived tables in FROM take an alias, which many databases require)
Reusability	Can be referenced multiple times	Must be duplicated for reuse
Self-reference	Allowed with RECURSIVE	Not possible
Reading order	Top-down (definition then usage)	Inside-out (nested context)
Maximum nesting	Flat structure (unlimited CTEs)	Deeply nested (practical limits)
Scope	Visible throughout statement	Visible only in immediate context
SQL Standard	SQL:1999 and later	Original SQL (pre-1999)

structural_comparison.sql
-- The SAME logical query, two different structures
 
-- SUBQUERY APPROACH: Nested, inside-out reading
SELECT 
    c.customer_name,
    customer_totals.total_amount
FROM customers c
INNER JOIN (
    -- Must read this first to understand the outer query
    SELECT 
        customer_id,
        SUM(amount) as total_amount
    FROM (
        -- Must read this first to understand the middle query
        SELECT order_id, customer_id, amount
        FROM orders
        WHERE status = 'completed'
        AND order_date >= CURRENT_DATE - INTERVAL '1 year'
    ) AS completed_orders
    GROUP BY customer_id
    HAVING SUM(amount) > (
        -- And this is a scalar subquery inside HAVING
        SELECT AVG(total)
        FROM (
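            SELECT customer_id, SUM(amount) as total
            FROM orders
            WHERE status = 'completed'
            AND order_date >= CURRENT_DATE - INTERVAL '1 year'  -- same window as above, so both versions compare against the same average
            GROUP BY customer_id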
        ) AS all_totals
    )
) AS customer_totals ON c.customer_id = customer_totals.customer_id;
 
 
-- CTE APPROACH: Flat, top-down reading
WITH 
    completed_orders AS (
        -- Step 1: Define completed orders
        SELECT order_id, customer_id, amount
        FROM orders
        WHERE status = 'completed'
        AND order_date >= CURRENT_DATE - INTERVAL '1 year'
    ),
    
    customer_totals AS (
        -- Step 2: Aggregate by customer
        SELECT customer_id, SUM(amount) as total_amount
        FROM completed_orders
        GROUP BY customer_id
    ),
    
    average_total AS (
        -- Step 3: Calculate overall average
        SELECT AVG(total_amount) as avg_amount
        FROM customer_totals
    )
 
-- Step 4: Final result
SELECT c.customer_name, ct.total_amount
FROM customers c
INNER JOIN customer_totals ct ON c.customer_id = ct.customer_id
CROSS JOIN average_total
WHERE ct.total_amount > average_total.avg_amount;
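
The claim that both structures express the same logic can be checked on a small dataset. The following is a minimal sketch, assuming Python's built-in sqlite3 module and hypothetical sample rows; the one-year date filter is left out because SQLite has no INTERVAL syntax, so this is a simplified variant of the queries above rather than a copy of them.

```python
import sqlite3

# Hypothetical sample data, chosen only to make the comparison concrete.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY, customer_id INTEGER,
        amount REAL, status TEXT
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES
        (1, 1, 100, 'completed'),
        (2, 1, 300, 'completed'),
        (3, 2,  50, 'completed'),
        (4, 3, 120, 'completed'),
        (5, 3, 900, 'cancelled');
""")

# Nested form: read from the innermost SELECT outward.
NESTED_SQL = """
    SELECT c.customer_name, t.total_amount
    FROM customers c
    JOIN (
        SELECT customer_id, SUM(amount) AS total_amount
        FROM (
            SELECT customer_id, amount FROM orders WHERE status = 'completed'
        ) AS completed_orders
        GROUP BY customer_id
        HAVING SUM(amount) > (
            SELECT AVG(total)
            FROM (
                SELECT SUM(amount) AS total
                FROM orders
                WHERE status = 'completed'
                GROUP BY customer_id
            ) AS all_totals
        )
    ) AS t ON c.customer_id = t.customer_id
    ORDER BY c.customer_name
"""

# CTE form: the same steps, named and read top-down.
CTE_SQL = """
    WITH
        completed_orders AS (
            SELECT customer_id, amount FROM orders WHERE status = 'completed'
        ),
        customer_totals AS (
            SELECT customer_id, SUM(amount) AS total_amount
            FROM completed_orders
            GROUP BY customer_id
        ),
        average_total AS (
            SELECT AVG(total_amount) AS avg_amount FROM customer_totals
        )
    SELECT c.customer_name, ct.total_amount
    FROM customers c
    JOIN customer_totals ct ON c.customer_id = ct.customer_id
    CROSS JOIN average_total
    WHERE ct.total_amount > average_total.avg_amount
    ORDER BY c.customer_name
"""

nested_rows = conn.execute(NESTED_SQL).fetchall()
cte_rows = conn.execute(CTE_SQL).fetchall()
print(nested_rows == cte_rows, cte_rows)
```

With this data only one customer has a completed-order total above the per-customer average, and both forms return that same row. Note that the CTE form computes the per-customer totals once by name, while the nested form has to spell the aggregation out twice.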

Readability Comparison

Readability is subjective, but it can be judged against concrete criteria. Code is readable when its purpose is clear without extensive study, its structure matches its logic, and modifications are straightforward.

CTE Readability Advantages

•Descriptive names convey meaning without reading the query body
•Sequential flow matches how we think step-by-step
•Flat structure eliminates nesting-induced cognitive load
•Modular organization groups related logic together
•Self-documenting reduces need for comments

Subquery Readability Issues

•Anonymous expressions require reading full implementation
•Nested structure forces inside-out reading
•Deep indentation obscures overall structure
•Scattered logic mixes conditions across levels
•Parenthesis matching becomes error-prone

The Complexity Threshold

For simple, single-use subqueries (one or two levels of nesting), inline subqueries often read better because they keep related logic together. The readability advantage of CTEs grows as query complexity increases. A reasonable rule of thumb is to switch to CTEs once nesting would exceed two levels or the same logic appears twice.

readability_threshold.sql
-- SUBQUERY WINS: Simple, single-use, low nesting
-- Reading "WHERE id IN (SELECT...)" is natural and clear
SELECT product_name, price
FROM products
WHERE category_id IN (
    SELECT category_id 
    FROM categories 
    WHERE department = 'Electronics'
);
 
-- No CTE needed - it would add verbosity without value
-- This CTE version is longer but not clearer:
WITH electronics_categories AS (
    SELECT category_id FROM categories WHERE department = 'Electronics'
)
SELECT product_name, price
FROM products
WHERE category_id IN (SELECT category_id FROM electronics_categories);
 
 
-- CTE WINS: Complex, multi-use, deep nesting
-- Compare the subquery monster:
SELECT 
    d.department_name,
    (SELECT COUNT(*) FROM employees e 
     WHERE e.department_id = d.department_id 
     AND e.salary > (SELECT AVG(salary) FROM employees WHERE department_id = d.department_id)) as above_avg_count
FROM departments d
WHERE d.department_id IN (
    SELECT department_id FROM employees 
    GROUP BY department_id 
    HAVING AVG(salary) > (SELECT AVG(salary) FROM employees)
);
 
-- Versus the CTE clarity:
WITH 
    dept_avg_salaries AS (
        SELECT department_id, AVG(salary) as dept_avg
        FROM employees GROUP BY department_id
    ),
    company_avg_salary AS (
        SELECT AVG(salary) as company_avg FROM employees
    ),
    above_avg_departments AS (
        SELECT department_id
        FROM dept_avg_salaries, company_avg_salary
        WHERE dept_avg > company_avg
    ),
    above_dept_avg_employees AS (
        SELECT e.department_id, COUNT(*) as count
        FROM employees e
        JOIN dept_avg_salaries das ON e.department_id = das.department_id
        WHERE e.salary > das.dept_avg
        GROUP BY e.department_id
    )
SELECT 
    d.department_name,
    COALESCE(aae.count, 0) as above_avg_count
FROM departments d
JOIN above_avg_departments aad ON d.department_id = aad.department_id
LEFT JOIN above_dept_avg_employees aae ON d.department_id = aae.department_id;

Performance Analysis

A common misconception is that CTEs are inherently slower or faster than subqueries. The truth is more nuanced: performance depends on how the query optimizer handles each construct, which varies by database and query structure.

Performance Factors

•Inlining — When a CTE is inlined (substituted like a macro), it performs identically to an equivalent subquery
•Materialization — When a CTE is materialized (computed once, stored in temp space), it may be faster (reuse) or slower (no predicate pushdown)
•Optimization barriers — Some databases treat CTEs as optimization barriers, preventing the optimizer from 'seeing through' them
•Reference count — CTEs referenced multiple times may benefit from materialization; single-use CTEs may suffer from it
•Database version — Modern database versions have smarter CTE optimization than older ones

Database-Specific CTE Performance Behavior
Database	Default Behavior	Performance Notes
PostgreSQL 11 and earlier	Always materialized	CTEs could be significantly slower due to optimization barrier
PostgreSQL 12+	Optimizer chooses	Much better; inlines single-reference CTEs by default
MySQL 8.0+	Generally inlined	Non-recursive CTEs merged into main query; minimal overhead
SQL Server	Optimizer chooses	May spool to tempdb; performance varies widely
Oracle	Generally inlined	/*+ MATERIALIZE */ hint forces materialization when beneficial
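SQLite	Optimizer chooses (3.35.0+)	Before 3.35.0, CTEs were treated as NOT MATERIALIZED; 3.35.0+ materializes a CTE referenced more than once by default and accepts MATERIALIZED / NOT MATERIALIZED hints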

performance_analysis.sql
-- SCENARIO 1: Single-reference CTE
-- Usually IDENTICAL performance to subquery
-- Optimizer inlines the CTE
 
WITH active_orders AS (
    SELECT * FROM orders WHERE status = 'active'
)
SELECT * FROM active_orders WHERE amount > 100;
 
-- Equivalent subquery (optimizer likely produces same plan):
SELECT * FROM (
    SELECT * FROM orders WHERE status = 'active'
) AS active_orders 
WHERE amount > 100;
 
 
-- SCENARIO 2: Multi-reference CTE
-- CTE MAY be faster (compute once, reuse)
-- Depends on whether optimizer materializes
 
WITH monthly_totals AS (
    SELECT 
        DATE_TRUNC('month', order_date) as month,
        SUM(amount) as total
    FROM orders
    WHERE status = 'completed'
    GROUP BY DATE_TRUNC('month', order_date)
)
SELECT 
    curr.month,
    curr.total,
    prev.total as prev_month
FROM monthly_totals curr
LEFT JOIN monthly_totals prev ON curr.month = prev.month + INTERVAL '1 month';
 
-- With subqueries, the aggregation is written twice and will typically execute TWICE:
SELECT 
    curr.month,
    curr.total,
    prev.total as prev_month
FROM (SELECT DATE_TRUNC('month', order_date) as month, SUM(amount) as total
      FROM orders WHERE status = 'completed' 
      GROUP BY DATE_TRUNC('month', order_date)) curr
LEFT JOIN (SELECT DATE_TRUNC('month', order_date) as month, SUM(amount) as total
           FROM orders WHERE status = 'completed' 
           GROUP BY DATE_TRUNC('month', order_date)) prev 
    ON curr.month = prev.month + INTERVAL '1 month';
 
 
-- SCENARIO 3: Predicate pushdown impact
-- CTE materialization can PREVENT predicate pushdown
 
-- CTE version (may materialize ALL active orders first)
WITH active_orders AS (
    SELECT * FROM orders WHERE status = 'active'
)
SELECT * FROM active_orders 
WHERE amount > 1000000;  -- Very selective filter
 
-- Subquery version (optimizer can push filter into subquery)
SELECT * 
FROM (SELECT * FROM orders WHERE status = 'active') ao
WHERE ao.amount > 1000000;  -- More likely to combine conditions
 
-- PostgreSQL 12+ hint for explicit control:
WITH active_orders AS NOT MATERIALIZED (
    SELECT * FROM orders WHERE status = 'active'
)
SELECT * FROM active_orders WHERE amount > 1000000;

Always Measure, Don't Assume

Never assume that a CTE or a subquery is faster without measuring. Use EXPLAIN ANALYZE (PostgreSQL), EXPLAIN (MySQL), SET STATISTICS IO ON (SQL Server), or EXPLAIN PLAN (Oracle) to compare plans and, where the tool reports them, actual execution statistics. Query optimizer behavior changes with data distribution, statistics, and database updates.
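
As a small illustration of measuring rather than assuming, the sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's built-in sqlite3 module. The table, index name, and rows are hypothetical, and SQLite stands in for whichever database you actually run; the plan text it prints differs between SQLite versions, so the example only collects the plans for side-by-side reading and checks that both forms return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT, amount REAL);
    CREATE INDEX idx_orders_status ON orders (status);  -- hypothetical index
    INSERT INTO orders (status, amount) VALUES
        ('active', 50), ('active', 150), ('closed', 500), ('active', 2000);
""")

CTE_SQL = """
    WITH active_orders AS (SELECT * FROM orders WHERE status = 'active')
    SELECT order_id, amount FROM active_orders
    WHERE amount > 100 ORDER BY order_id
"""
SUBQUERY_SQL = """
    SELECT order_id, amount
    FROM (SELECT * FROM orders WHERE status = 'active') AS active_orders
    WHERE amount > 100 ORDER BY order_id
"""

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable step text.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

cte_plan = plan(CTE_SQL)
subquery_plan = plan(SUBQUERY_SQL)
cte_rows = conn.execute(CTE_SQL).fetchall()
subquery_rows = conn.execute(SUBQUERY_SQL).fetchall()

print("CTE plan:     ", cte_plan)
print("Subquery plan:", subquery_plan)
print("Same rows:    ", cte_rows == subquery_rows)
```

If the two printed plans match, the optimizer treated the CTE and the subquery alike for this query. If they differ, that difference is the thing to investigate with realistic data volumes.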

Capability Comparison

Some things are only possible with CTEs. Others can be done by either approach. Knowing which is which sets the hard boundaries for your choice.

Feature Capability Matrix
Capability	CTE	Subquery	Notes
Recursion	✅ Yes	❌ No	CTEs are the ONLY way to do recursion in standard SQL
Self-reference	✅ Yes	❌ No	Required for hierarchies, graphs, series generation
Multiple references	✅ Native	⚠️ Copy-paste	Subqueries must be duplicated; CTEs reference by name
DML statements (INSERT/UPDATE/DELETE)	✅ Yes	✅ Yes	Both can feed rows into DML statements; data-modifying CTEs are a database-specific extension (for example, PostgreSQL)
Correlated use	⚠️ Limited	✅ Yes	Correlated subqueries reference outer query; CTEs cannot
Scalar context	⚠️ Awkward	✅ Natural	Scalar subqueries in SELECT list are more natural
EXISTS/NOT EXISTS	⚠️ Possible	✅ Natural	Subqueries are the idiomatic pattern for existence checks
LATERAL joins	✅ Yes	✅ Yes	Both can be used with LATERAL
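
The recursion row of the matrix can be tried out directly. This is a minimal sketch, assuming Python's built-in sqlite3 module and a hypothetical four-row nodes table; it walks a parent/child hierarchy with WITH RECURSIVE and records each node's depth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    -- Hypothetical tree: root -> (child_a -> grandchild), child_b
    INSERT INTO nodes VALUES
        (1, 'root', NULL),
        (2, 'child_a', 1),
        (3, 'child_b', 1),
        (4, 'grandchild', 2);
""")

hierarchy = conn.execute("""
    WITH RECURSIVE hierarchy AS (
        -- Anchor member: start at the rows that have no parent
        SELECT id, name, parent_id, 0 AS depth
        FROM nodes
        WHERE parent_id IS NULL
        UNION ALL
        -- Recursive member: attach children of rows found so far
        SELECT n.id, n.name, n.parent_id, h.depth + 1
        FROM nodes n
        JOIN hierarchy h ON n.parent_id = h.id
    )
    SELECT name, depth FROM hierarchy ORDER BY depth, id
""").fetchall()

for name, depth in hierarchy:
    print("  " * depth + name)
```

The recursive member refers to hierarchy by name, which is exactly the self-reference an anonymous subquery has no way to express.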

capability_examples.sql
-- CAPABILITY: Recursion (CTE only)
-- Impossible with subqueries
WITH RECURSIVE hierarchy AS (
    SELECT id, name, parent_id, 0 as depth
    FROM nodes WHERE parent_id IS NULL
    UNION ALL
    SELECT n.id, n.name, n.parent_id, h.depth + 1
    FROM nodes n JOIN hierarchy h ON n.parent_id = h.id
)
SELECT * FROM hierarchy;
 
 
-- CAPABILITY: Multiple references (CTE advantage)
-- CTE: Define once, use many times
WITH order_stats AS (
    SELECT 
        DATE_TRUNC('day', order_date) as day,
        COUNT(*) as orders,
        SUM(amount) as revenue
    FROM orders
    GROUP BY DATE_TRUNC('day', order_date)
)
SELECT 
    day,
    orders,
    revenue,
    AVG(orders) OVER (ORDER BY day ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) as avg_7day_orders,
    (SELECT AVG(revenue) FROM order_stats) as overall_avg_revenue,
    revenue - (SELECT AVG(revenue) FROM order_stats) as revenue_vs_avg
FROM order_stats
ORDER BY day;
 
 
-- CAPABILITY: Correlated subquery (Subquery advantage)
-- Each row's subquery references outer row
SELECT 
    p.product_name,
    p.price,
    (SELECT COUNT(*) 
     FROM order_items oi 
     WHERE oi.product_id = p.product_id) as times_ordered,  -- Correlated!
    (SELECT MAX(order_date) 
     FROM orders o 
     JOIN order_items oi ON o.order_id = oi.order_id 
     WHERE oi.product_id = p.product_id) as last_ordered    -- Correlated!
FROM products p;
 
-- CTE cannot be correlated in the same way:
-- This WON'T work:
/*
WITH product_order_count AS (
    SELECT COUNT(*) as cnt
    FROM order_items 
    WHERE product_id = p.product_id  -- ERROR: p is not visible here!
)
SELECT p.product_name, (SELECT cnt FROM product_order_count)
FROM products p;
*/
 
 
-- CAPABILITY: EXISTS pattern (Subquery more natural)
-- Checking existence is idiomatic with subqueries
SELECT c.customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1 
    FROM orders o 
    WHERE o.customer_id = c.customer_id 
    AND o.amount > 10000
);
 
-- CTE version is more verbose:
WITH high_value_customer_ids AS (
    SELECT DISTINCT customer_id
    FROM orders WHERE amount > 10000
)
SELECT c.customer_name
FROM customers c
WHERE c.customer_id IN (SELECT customer_id FROM high_value_customer_ids);
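
Although a CTE cannot see the outer row, the same result is often reachable by aggregating for all rows in the CTE and joining afterwards. The sketch below, which assumes Python's built-in sqlite3 module and hypothetical product data, compares a correlated scalar subquery with that aggregate-then-join rewrite. It is one possible rewrite, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE order_items (order_id INTEGER, product_id INTEGER);
    INSERT INTO products VALUES (1, 'keyboard'), (2, 'mouse'), (3, 'webcam');
    INSERT INTO order_items VALUES (10, 1), (11, 1), (12, 2);
""")

# Correlated scalar subquery: evaluated against each outer product row.
CORRELATED_SQL = """
    SELECT p.product_name,
           (SELECT COUNT(*)
            FROM order_items oi
            WHERE oi.product_id = p.product_id) AS times_ordered
    FROM products p
    ORDER BY p.product_id
"""

# CTE rewrite: count for every product up front, then LEFT JOIN so that
# products with no order items still appear (COALESCE turns NULL into 0).
CTE_SQL = """
    WITH product_order_counts AS (
        SELECT product_id, COUNT(*) AS cnt
        FROM order_items
        GROUP BY product_id
    )
    SELECT p.product_name, COALESCE(poc.cnt, 0) AS times_ordered
    FROM products p
    LEFT JOIN product_order_counts poc ON poc.product_id = p.product_id
    ORDER BY p.product_id
"""

correlated_rows = conn.execute(CORRELATED_SQL).fetchall()
cte_rows = conn.execute(CTE_SQL).fetchall()
print(correlated_rows == cte_rows, cte_rows)
```

The LEFT JOIN plus COALESCE is what keeps the never-ordered product in the output with a count of 0; an INNER JOIN would silently drop it, which is an easy mistake when converting correlated subqueries.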

Maintainability Considerations

Code maintainability—the ease of understanding, modifying, and debugging—often matters more than marginal performance differences. Consider how each approach affects long-term code health.

CTE Maintainability Strengths

•Modular debugging — Test each CTE independently by selecting from it
•Single point of change — Update logic in one CTE, all references benefit
•Clear data lineage — The flow from raw to result is explicitly named
•Easier code reviews — Reviewers can verify each step independently
•Better documentation — CTE names serve as inline documentation
•Refactoring-friendly — Extract CTEs to views or functions with minimal changes

Subquery Maintainability Challenges

•Scattered updates — Same logic in multiple subqueries must be updated in each location
•Debugging requires extraction — Must copy subquery out to test independently
•Hidden relationships — Dependencies between subqueries are implicit
•Easy to break — Moving a parenthesis can silently change query semantics
•Merge conflict magnets — Complex nested changes often conflict in version control

maintainability_example.sql
-- MAINTENANCE SCENARIO: "Change the date range from 1 year to 6 months"
 
-- SUBQUERY VERSION: Must find and update EVERY occurrence
SELECT ...
FROM (SELECT ... FROM orders WHERE order_date >= CURRENT_DATE - INTERVAL '1 year') a
JOIN (SELECT ... FROM orders WHERE order_date >= CURRENT_DATE - INTERVAL '1 year' ...) b
WHERE product_id IN (
    SELECT product_id FROM order_items 
    WHERE order_id IN (
        SELECT order_id FROM orders 
        WHERE order_date >= CURRENT_DATE - INTERVAL '1 year'  -- Another one!
    )
);
-- Risk: Miss one occurrence and have inconsistent logic
 
 
-- CTE VERSION: Update ONE place
WITH 
    date_range AS (
        -- SINGLE SOURCE OF TRUTH
        SELECT CURRENT_DATE - INTERVAL '6 months' AS start_date  -- Changed here only!
    ),
    relevant_orders AS (
        -- Select only the order columns so start_date does not leak into relevant_orders
        SELECT o.* FROM orders o CROSS JOIN date_range dr
        WHERE o.order_date >= dr.start_date
    ),
    order_products AS (
        SELECT DISTINCT product_id
        FROM order_items
        WHERE order_id IN (SELECT order_id FROM relevant_orders)
    )
SELECT ...
FROM relevant_orders a
JOIN relevant_orders b ON ...
WHERE product_id IN (SELECT product_id FROM order_products);
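
The single-source-of-truth idea can be sketched in runnable form. The example below assumes Python's built-in sqlite3 module, hypothetical order rows, and a fixed cutoff date passed as a bound parameter instead of CURRENT_DATE arithmetic, so that the result does not depend on the day it is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, '2024-01-15', 100),
        (2, '2024-06-01', 200),
        (3, '2024-09-10',  50);
""")

SUMMARY_SQL = """
    WITH
        date_range AS (
            -- The only place the cutoff appears
            SELECT ? AS start_date
        ),
        relevant_orders AS (
            SELECT o.*
            FROM orders o CROSS JOIN date_range dr
            WHERE o.order_date >= dr.start_date
        )
    SELECT
        (SELECT COUNT(*) FROM relevant_orders) AS order_count,
        (SELECT SUM(amount) FROM relevant_orders) AS revenue
"""

def summarize(start_date):
    """Run the summary for one cutoff date given as an ISO 'YYYY-MM-DD' string."""
    return conn.execute(SUMMARY_SQL, (start_date,)).fetchone()

since_march = summarize("2024-03-01")
since_july = summarize("2024-07-01")
print(since_march, since_july)
```

Both references to relevant_orders pick up the new cutoff automatically, so changing the range cannot leave one part of the query on the old value.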
 
 
-- DEBUGGING SCENARIO: "Why are some customers missing?"
 
-- CTE VERSION: Easy to debug step by step
WITH 
    step1_active_customers AS (
        SELECT * FROM customers WHERE status = 'active'
    ),
    step2_with_orders AS (
        SELECT c.*
        FROM step1_active_customers c
        WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
    ),
    step3_high_value AS (
        SELECT c.*
        FROM step2_with_orders c
        JOIN (SELECT customer_id, SUM(amount) as total 
              FROM orders GROUP BY customer_id) totals 
            ON c.customer_id = totals.customer_id
        WHERE totals.total > 1000
    )
-- Debug: Change final SELECT to check each step
-- SELECT COUNT(*) FROM step1_active_customers;  -- 5000
-- SELECT COUNT(*) FROM step2_with_orders;        -- 3200 (1800 never ordered)
-- SELECT COUNT(*) FROM step3_high_value;          -- 450 (most < $1000)
SELECT * FROM step3_high_value;
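
The step-by-step debugging technique can be automated by running the same WITH clause several times with a different final SELECT. This is a minimal sketch, assuming Python's built-in sqlite3 module and a tiny hypothetical dataset; the step names mirror the CTEs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'active'), (2, 'active'), (3, 'active'), (4, 'inactive');
    INSERT INTO orders VALUES (1, 1, 800), (2, 1, 700), (3, 2, 100), (4, 4, 5000);
""")

# Shared WITH clause; each debug query appends its own final SELECT.
PIPELINE = """
    WITH
        step1_active_customers AS (
            SELECT * FROM customers WHERE status = 'active'
        ),
        step2_with_orders AS (
            SELECT c.*
            FROM step1_active_customers c
            WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
        ),
        step3_high_value AS (
            SELECT c.*
            FROM step2_with_orders c
            JOIN (SELECT customer_id, SUM(amount) AS total
                  FROM orders GROUP BY customer_id) totals
                ON c.customer_id = totals.customer_id
            WHERE totals.total > 1000
        )
"""

STEPS = ["step1_active_customers", "step2_with_orders", "step3_high_value"]

# Step names come from the fixed list above, never from user input,
# so interpolating them into the SQL text is safe here.
step_counts = {
    step: conn.execute(PIPELINE + f"SELECT COUNT(*) FROM {step}").fetchone()[0]
    for step in STEPS
}

for step, count in step_counts.items():
    print(f"{step}: {count}")
```

The step where the count drops unexpectedly is the one whose filter or join deserves a closer look.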

Decision Framework

Based on everything we've covered, here's a practical decision framework for choosing between CTEs and subqueries.

The Quick Decision Guide

Use CTE when: you need recursion, multiple references, complex transformations, or team collaboration. Use Subquery when: it's simple and single-use, you need correlation, you're checking existence, or you're in a scalar context.

decision_flowchart.txt
DECISION FLOWCHART: CTE vs Subquery
 
START
  │
  ▼
┌─────────────────────────────────────┐
│ Do you need recursion?              │
│ (hierarchies, graphs, series)       │
└─────────────────────────────────────┘
  │ YES → USE CTE (only option)
  │ NO
  ▼
┌─────────────────────────────────────┐
│ Will you reference the same result  │
│ multiple times?                     │
└─────────────────────────────────────┘
  │ YES → USE CTE (avoid duplication)
  │ NO
  ▼
┌─────────────────────────────────────┐
│ Is nesting deeper than 2 levels?    │
└─────────────────────────────────────┘
  │ YES → USE CTE (improve readability)
  │ NO
  ▼
┌─────────────────────────────────────┐
│ Is this a correlated subquery?      │
│ (references outer query)            │
└─────────────────────────────────────┘
  │ YES → USE SUBQUERY (CTE cannot correlate)
  │ NO
  ▼
┌─────────────────────────────────────┐
│ Is this an EXISTS check?            │
└─────────────────────────────────────┘
  │ YES → USE SUBQUERY (more idiomatic)
  │ NO
  ▼
┌─────────────────────────────────────┐
│ Is this a simple scalar subquery?   │
│ (single value in SELECT list)       │
└─────────────────────────────────────┘
  │ YES → USE SUBQUERY (natural fit)
  │ NO
  ▼
┌─────────────────────────────────────┐
│ Will multiple people maintain this? │
└─────────────────────────────────────┘
  │ YES → USE CTE (better collaboration)
  │ NO
  ▼
┌─────────────────────────────────────┐
│ Is the logic complex enough to      │
│ benefit from a descriptive name?    │
└─────────────────────────────────────┘
  │ YES → USE CTE
  │ NO → USE SUBQUERY (simpler is fine)
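
The flowchart is an ordered list of checks, so it translates naturally into a small function. The sketch below is one possible encoding in Python; the QueryTraits class, its field names, and choose_structure are hypothetical names introduced here, and the questions are evaluated in the same order as the flowchart.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class QueryTraits:
    """Answers to the flowchart questions (all names are hypothetical)."""
    needs_recursion: bool = False
    referenced_multiple_times: bool = False
    nesting_depth: int = 1
    is_correlated: bool = False
    is_exists_check: bool = False
    is_simple_scalar: bool = False
    multiple_maintainers: bool = False
    benefits_from_name: bool = False

def choose_structure(traits: QueryTraits) -> Tuple[str, str]:
    """Return (choice, reason), checking questions in flowchart order."""
    checks = [
        (traits.needs_recursion, "CTE", "only option"),
        (traits.referenced_multiple_times, "CTE", "avoid duplication"),
        (traits.nesting_depth > 2, "CTE", "improve readability"),
        (traits.is_correlated, "SUBQUERY", "CTE cannot correlate"),
        (traits.is_exists_check, "SUBQUERY", "more idiomatic"),
        (traits.is_simple_scalar, "SUBQUERY", "natural fit"),
        (traits.multiple_maintainers, "CTE", "better collaboration"),
        (traits.benefits_from_name, "CTE", "descriptive name helps"),
    ]
    for condition, choice, reason in checks:
        if condition:
            return choice, reason
    return "SUBQUERY", "simpler is fine"

print(choose_structure(QueryTraits(needs_recursion=True)))
print(choose_structure(QueryTraits(is_correlated=True)))
print(choose_structure(QueryTraits()))
```

Because the first matching question wins, a query that is both correlated and referenced multiple times resolves to a CTE, exactly as it would when following the arrows by hand. In practice that case usually calls for the hybrid approach: a CTE for the shared part and a correlated subquery where the outer row is needed.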

Quick Reference: When to Use Each
Use CTE When	Use Subquery When
Recursion is needed	Simple, single-use derived table
Referenced 2+ times	Correlated reference to outer query
Complex multi-step transformation	EXISTS / NOT EXISTS checks
Nesting would exceed 2 levels	Scalar value in SELECT list
Team needs to maintain query	IN (SELECT ...) with simple filter
Debugging intermediate steps	Query fits easily on one screen
Named intermediate results add clarity	CTE would just add boilerplate

Real-World Examples

Let's look at real-world scenarios where the choice between CTE and subquery is clear, applying our decision framework.

Scenario: Find products currently in stock

This is a simple existence check—exactly where subqueries shine.

subquery_example.sql
-- SUBQUERY: Correct choice
-- Simple, single-use, existence check, correlated
SELECT p.product_name, p.price
FROM products p
WHERE EXISTS (
    SELECT 1 FROM inventory i 
    WHERE i.product_id = p.product_id 
    AND i.quantity > 0
);
 
-- CTE alternative: More verbose, no added benefit
WITH in_stock_products AS (
    SELECT DISTINCT product_id
    FROM inventory
    WHERE quantity > 0
)
SELECT p.product_name, p.price
FROM products p
WHERE p.product_id IN (SELECT product_id FROM in_stock_products);
 
-- Why subquery wins:
-- ✓ Simpler (7 lines vs 8)
-- ✓ More direct expression of intent
-- ✓ EXISTS pattern is idiomatic SQL
-- ✓ Optimizer handles EXISTS efficiently
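
To confirm that the two in-stock queries agree, they can be run against a small dataset. The sketch below assumes Python's built-in sqlite3 module and hypothetical products and inventory rows, including a product stocked in two warehouses and a product with no inventory row at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, product_name TEXT, price REAL);
    CREATE TABLE inventory (product_id INTEGER, warehouse TEXT, quantity INTEGER);
    INSERT INTO products VALUES
        (1, 'keyboard', 49.0), (2, 'mouse', 19.0), (3, 'webcam', 79.0);
    -- keyboard: stocked in two warehouses; mouse: listed but empty; webcam: no row
    INSERT INTO inventory VALUES (1, 'east', 5), (1, 'west', 2), (2, 'east', 0);
""")

EXISTS_SQL = """
    SELECT p.product_name, p.price
    FROM products p
    WHERE EXISTS (
        SELECT 1 FROM inventory i
        WHERE i.product_id = p.product_id
        AND i.quantity > 0
    )
    ORDER BY p.product_id
"""

CTE_SQL = """
    WITH in_stock_products AS (
        SELECT DISTINCT product_id
        FROM inventory
        WHERE quantity > 0
    )
    SELECT p.product_name, p.price
    FROM products p
    WHERE p.product_id IN (SELECT product_id FROM in_stock_products)
    ORDER BY p.product_id
"""

exists_rows = conn.execute(EXISTS_SQL).fetchall()
cte_rows = conn.execute(CTE_SQL).fetchall()
print(exists_rows == cte_rows, exists_rows)
```

Neither form duplicates the keyboard even though it has two qualifying inventory rows, because EXISTS and IN both test membership rather than joining. A plain JOIN against inventory would return it twice.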

Summary: CTE vs Subquery

We've completed our comprehensive exploration of Common Table Expressions. Let's consolidate everything from this final comparison page:

Key Takeaways: CTE vs Subquery

•Neither is universally better — Each has scenarios where it excels; mastery means knowing which to use when
•CTEs are required for recursion — This is a hard requirement, not a preference
•CTEs eliminate duplication — Multiple references follow a define-once, reference-many pattern
•Performance is database-dependent — Modern optimizers often produce identical plans; always measure
•Subqueries excel at correlation and existence — Correlated subqueries and EXISTS patterns are natural as subqueries
•Readability tips the scale for complex queries — Named, modular CTEs win when queries get complex
•Hybrid approaches work well — CTEs for structure, subqueries for simple operations

Module Complete: Common Table Expressions

You've now mastered CTEs comprehensively:

Page 1: CTE syntax, execution models, and scoping rules
Page 2: Named subqueries as modular building blocks
Page 3: Multiple CTEs and data transformation pipelines
Page 4: Recursive CTEs for hierarchies, graphs, and iteration
Page 5: CTE vs subquery decision framework

With this knowledge, you can write SQL that is not just correct, but elegant, maintainable, and powerful. CTEs transform how you approach complex data problems, enabling query designs that would be impractical—or impossible—with subqueries alone.

Module Complete

Congratulations! You've completed the Common Table Expressions module. You now possess the skills to write sophisticated CTEs, from basic named subqueries through recursive hierarchical traversals, and the judgment to choose the optimal query structure for any situation.