Common SQL Performance Issues

Level: Advanced

Duration: 90 mins

Topic: Common Performance Issues

Correlated Subquery Issues

The Hidden Loop That Kills Performance

Some SQL queries hide devastating performance characteristics in plain sight. They look reasonable, return correct results, and execute instantly on small datasets. But as data grows, their execution time grows quadratically rather than linearly.

The hidden culprit is often a correlated subquery: a subquery that references columns from the outer query and must be re-executed for every row the outer query examines.

Consider this innocent-looking query:

SELECT e.employee_id, e.name, e.salary,
       (SELECT AVG(salary) FROM employees e2 WHERE e2.department_id = e.department_id)
FROM employees e;

What you expect: The database calculates department averages and returns them with employee data.

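What actually happens: Suppose the table holds 100,000 employees spread across 100 departments of 1,000 employees each. Unless the optimizer rewrites the query, the database runs a separate query against the employees table for each employee to compute the department average. That's 100,000 subquery executions, each reading about 1,000 rows even when department_id is indexed.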

The complexity explosion:

•100,000 employees × 1,000 rows read per department average = 100,000,000 row examinations
•A query that should take about 100 milliseconds can take 10 minutes
•10x more data (10x the employees in 10x larger departments) → roughly 100x slower execution (quadratic growth)

What You Will Learn

By the end of this page, you will understand exactly how correlated subqueries execute, recognize patterns that cause performance problems, master rewrite techniques using JOINs and window functions, and know when (rarely) correlated subqueries are acceptable.

Correlated vs. Non-Correlated Subqueries

Before addressing performance issues, we must clearly distinguish between correlated and non-correlated subqueries, as they have fundamentally different execution characteristics.

Non-Correlated (Independent) Subquery:

A non-correlated subquery doesn't reference the outer query. It can be executed once, independently, and its result reused:

-- Non-correlated: subquery is independent
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
--              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
--              No reference to outer query's employees table

Execution: The inner query runs once, returns a single value (e.g., 75000), and then the outer query becomes WHERE salary > 75000—a simple, efficient scan or index seek.

Correlated (Dependent) Subquery:

A correlated subquery references columns from the outer query, creating a dependency:

-- Correlated: subquery references outer query
SELECT * FROM employees e
WHERE salary > (SELECT AVG(salary) FROM employees e2 
                WHERE e2.department_id = e.department_id);
--                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
--                    References e.department_id from outer query

Execution: For each row examined by the outer query, the inner query must run with that row's department_id. If the outer query examines 100,000 rows, the inner query runs 100,000 times.

Non-Correlated Subquery

•Executes once
•Independent of outer query
•Result can be cached and reused
•Generally efficient
•Complexity: O(n + m)

Correlated Subquery

•Executes once per outer row
•Depends on current outer row values
•Result changes for each outer row
•Often catastrophically slow
•Complexity: O(n × m)

The Correlation Trap

Correlated subqueries are syntactically compact and logically intuitive, which makes them attractive to write. But this readability can come at a severe performance cost. Unless the optimizer decorrelates the subquery or caches its results, the database must re-evaluate it for every outer row.

How Correlated Subqueries Execute

Understanding the execution model of correlated subqueries reveals why they cause performance problems.

The Nested Loop Execution Model:

At a conceptual level, a correlated subquery executes like a nested loop in procedural code:

correlated_execution_model.pseudo
-- SQL with correlated subquery
SELECT e.employee_id, e.name,
       (SELECT AVG(o.amount) 
        FROM orders o 
        WHERE o.employee_id = e.employee_id) AS avg_order
FROM employees e
WHERE e.status = 'active';
 
-- Conceptual execution (pseudocode)
result = []
FOR each employee e IN (SELECT * FROM employees WHERE status = 'active'):
    # For every single row in the outer result...
    
    subquery_result = EXECUTE(
        SELECT AVG(amount) FROM orders WHERE employee_id = e.employee_id
    )
    # ...we run a complete query against the orders table
    
    result.append(e.employee_id, e.name, subquery_result)
    
RETURN result
 
# If we have:
# - 10,000 active employees
# - Orders table with 5,000,000 rows
# 
# Execution:
# - Outer loop: 10,000 iterations
# - Inner query: Each may scan orders table (unless indexed)
# - Best case (indexed): 10,000 index seeks
# - Worst case (unindexed): 10,000 full table scans = 50 billion row reads
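
The execution model above can be made concrete with a small simulation. This is a minimal Python sketch, not a description of how any particular engine is implemented: the function names `correlated_avg` and `set_based_avg` and the sample data are hypothetical, and no index is modeled. It counts row reads for the nested-loop strategy against a single aggregation pass.

```python
from collections import defaultdict


def correlated_avg(employees, orders):
    """Nested-loop model: rescan `orders` once for every employee."""
    rows_read = 0
    result = []
    for emp_id in employees:
        total, count = 0.0, 0
        for order_emp_id, amount in orders:  # full scan, no index assumed
            rows_read += 1
            if order_emp_id == emp_id:
                total += amount
                count += 1
        result.append((emp_id, total / count if count else None))
    return result, rows_read


def set_based_avg(employees, orders):
    """Set-based model: aggregate `orders` once, then look up each employee."""
    rows_read = 0
    sums = defaultdict(lambda: [0.0, 0])  # emp_id -> [sum, count]
    for order_emp_id, amount in orders:
        rows_read += 1
        sums[order_emp_id][0] += amount
        sums[order_emp_id][1] += 1
    result = []
    for emp_id in employees:
        rows_read += 1
        total, count = sums.get(emp_id, (0.0, 0))
        result.append((emp_id, total / count if count else None))
    return result, rows_read


# Hypothetical data: 100 employees, 5,000 orders spread evenly across them.
employees = list(range(100))
orders = [(i % 100, float(i)) for i in range(5_000)]

nested_result, nested_reads = correlated_avg(employees, orders)
set_result, set_reads = set_based_avg(employees, orders)

print("same answers:", nested_result == set_result)
print("nested-loop row reads:", nested_reads)
print("set-based row reads:", set_reads)
```

With these inputs the nested-loop model reads 100 × 5,000 rows while the set-based model reads 5,000 + 100, and both produce the same averages. An index on the inner table would shrink the per-iteration cost, which is the "best case" described in the listing.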

Execution Plan Representation:

In actual execution plans, correlated subqueries appear as nested loop joins or specific operators:

Database	Operator Name	Description
SQL Server	Nested Loops (with index spool)	Inner query re-executed; may cache via spool
PostgreSQL	SubPlan	Subquery executed per outer row
MySQL	DEPENDENT SUBQUERY	Shown in the select_type column of EXPLAIN
Oracle	NESTED LOOPS or FILTER	Subquery in nested execution context

The Quadratic Growth Pattern:

The key insight is that complexity is multiplicative, not additive:

Total work ≈ (rows from outer query) × (work per subquery execution)

If the outer query returns 10,000 rows and each subquery execution examines 5,000 rows:

•Naive correlated: 10,000 × 5,000 = 50,000,000 row examinations
•Equivalent JOIN: 10,000 + 5,000 = 15,000 row examinations (plus join operations)

This difference of roughly 3,300x (50,000,000 / 15,000) explains why queries that work fine in development become unusable in production. With 100 rows on each side, the correlated form costs 100 × 100 = 10,000 row examinations; with 100,000 rows on each side, it costs 100,000 × 100,000 = 10,000,000,000.
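
The arithmetic can be tabulated with a short script. This is only a cost model under the simplifying assumptions used in the text (every subquery execution reads a fixed number of rows, and join overhead is ignored); the function names are hypothetical.

```python
def correlated_row_reads(outer_rows: int, rows_per_subquery: int) -> int:
    """Multiplicative cost: every outer row triggers one subquery execution."""
    return outer_rows * rows_per_subquery


def join_row_reads(outer_rows: int, inner_rows: int) -> int:
    """Additive cost: each input is read once (join work itself is ignored)."""
    return outer_rows + inner_rows


# The worked example from the text.
print(correlated_row_reads(10_000, 5_000))
print(join_row_reads(10_000, 5_000))

# Growing both sides by 10x multiplies the correlated cost by 100x,
# but the join cost by only 10x.
for n in (100, 1_000, 10_000, 100_000):
    print(
        f"n={n:>7,}  "
        f"correlated={correlated_row_reads(n, n):>15,}  "
        f"join={join_row_reads(n, n):>9,}"
    )
```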

Optimizer Rescue (Sometimes)

Modern query optimizers can sometimes 'decorrelate' subqueries, automatically rewriting them as joins. However, this optimization isn't always possible, and different databases have varying capabilities. Don't rely on the optimizer—write efficient queries explicitly.

Common Correlated Subquery Patterns

Correlated subqueries appear in several common patterns. Recognizing these patterns is the first step toward optimization.

Pattern 1: Scalar Subquery in SELECT (Per-Row Computation)

pattern1_select_subquery.sql
-- PROBLEM: Scalar subquery for each row
SELECT 
    c.customer_id,
    c.name,
    (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS order_count,
    (SELECT MAX(order_date) FROM orders o WHERE o.customer_id = c.customer_id) AS last_order,
    (SELECT SUM(total) FROM orders o WHERE o.customer_id = c.customer_id) AS total_spent
FROM customers c;
 
-- For 50,000 customers, this runs 150,000 subqueries (3 per row)!
-- Each subquery may scan or seek in the orders table
 
-- SOLUTION: Single JOIN with aggregation
SELECT 
    c.customer_id,
    c.name,
    COUNT(o.order_id) AS order_count,
    MAX(o.order_date) AS last_order,
    SUM(o.total) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
 
-- One scan of customers, one scan of orders, one join operation
-- Complexity: O(n + m) instead of O(n × m)
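
Before swapping one form for the other, it is worth confirming that both return the same rows. The sketch below does this with Python's built-in `sqlite3` module on a tiny in-memory dataset. The table contents are made up for illustration, and SQLite stands in for whichever database you actually use, so treat it as a correctness check rather than a performance test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_date  TEXT,
        total       REAL
    );
    -- Hypothetical sample data; customer 3 has no orders.
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES
        (10, 1, '2024-01-05', 100.0),
        (11, 1, '2024-02-10',  50.0),
        (12, 2, '2024-03-01',  75.0);
""")

correlated_sql = """
    SELECT c.customer_id, c.name,
           (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id),
           (SELECT MAX(o.order_date) FROM orders o WHERE o.customer_id = c.customer_id),
           (SELECT SUM(o.total) FROM orders o WHERE o.customer_id = c.customer_id)
    FROM customers c
    ORDER BY c.customer_id
"""

joined_sql = """
    SELECT c.customer_id, c.name,
           COUNT(o.order_id), MAX(o.order_date), SUM(o.total)
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
"""

correlated_rows = conn.execute(correlated_sql).fetchall()
joined_rows = conn.execute(joined_sql).fetchall()

for row in joined_rows:
    print(row)
print("identical:", correlated_rows == joined_rows)
```

Note that `COUNT(o.order_id)` rather than `COUNT(*)` is what keeps the count at 0 for the customer with no orders: the LEFT JOIN still produces one row for that customer, but its `order_id` is NULL.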

Pattern 2: Existence Check in WHERE (EXISTS/NOT EXISTS)

pattern2_exists_subquery.sql
-- PROBLEM: EXISTS with correlated subquery
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id 
    AND o.order_date >= '2024-01-01'
);
 
-- For each customer, check if matching orders exist
-- With proper indexing on orders(customer_id, order_date), 
-- EXISTS can short-circuit (stop at first match)
-- This is often acceptable, but alternatives may be faster
 
-- ALTERNATIVE 1: JOIN with DISTINCT (emulates a semi-join; DISTINCT removes
-- the duplicate customer rows produced when several orders match)
SELECT DISTINCT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01';
 
-- ALTERNATIVE 2: IN with non-correlated subquery
SELECT c.customer_id, c.name
FROM customers c
WHERE c.customer_id IN (
    SELECT DISTINCT o.customer_id 
    FROM orders o 
    WHERE o.order_date >= '2024-01-01'
);
-- Inner query runs once, returns set; outer query filters against set
 
-- NOT EXISTS is similar but for anti-joins:
-- PROBLEM: Find customers with no orders
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
 
-- SOLUTION: LEFT JOIN with NULL check (anti-join pattern)
-- Test a column that is never NULL in orders, such as its primary key
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;

Pattern 3: Comparison with Aggregate (Per-Group Comparison)

pattern3_aggregate_comparison.sql
-- PROBLEM: Find employees earning more than their department average
SELECT e.employee_id, e.name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
);
 
-- For each of 100,000 employees, compute department average
-- If 100 departments with 1,000 employees each:
-- 100,000 × 1,000 = 100,000,000 row examinations!
 
-- SOLUTION: Window function (calculate average once per partition)
SELECT employee_id, name, salary, department_id
FROM (
    SELECT 
        e.employee_id, 
        e.name, 
        e.salary, 
        e.department_id,
        AVG(e.salary) OVER (PARTITION BY e.department_id) AS dept_avg
    FROM employees e
) sub
WHERE salary > dept_avg;
 
-- Single scan of the employees table
-- Window function computes the aggregate once per department partition
-- Cost: one scan plus a sort or hash on department_id
-- (about O(n log n) when sorting), instead of O(n × m)
 
-- ALTERNATIVE: JOIN with pre-computed aggregates (CTE)
WITH dept_averages AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT e.employee_id, e.name, e.salary, e.department_id
FROM employees e
JOIN dept_averages da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary;
 
-- Two passes: one for aggregate, one for join/filter
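
The three formulations above should agree row for row. The following sketch checks that on a small in-memory SQLite database through Python's `sqlite3` module. The sample rows are hypothetical, and it assumes the bundled SQLite is version 3.25 or later, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        name          TEXT,
        salary        REAL,
        department_id INTEGER
    );
    -- Hypothetical data: department 1 averages 200, department 2 averages 100.
    INSERT INTO employees VALUES
        (1, 'A', 100.0, 1), (2, 'B', 200.0, 1), (3, 'C', 300.0, 1),
        (4, 'D',  50.0, 2), (5, 'E', 150.0, 2);
""")

queries = {
    "correlated": """
        SELECT e.employee_id
        FROM employees e
        WHERE e.salary > (SELECT AVG(e2.salary)
                          FROM employees e2
                          WHERE e2.department_id = e.department_id)
        ORDER BY e.employee_id
    """,
    "window": """
        SELECT employee_id
        FROM (
            SELECT e.employee_id, e.salary,
                   AVG(e.salary) OVER (PARTITION BY e.department_id) AS dept_avg
            FROM employees e
        ) sub
        WHERE salary > dept_avg
        ORDER BY employee_id
    """,
    "cte_join": """
        WITH dept_averages AS (
            SELECT department_id, AVG(salary) AS avg_salary
            FROM employees
            GROUP BY department_id
        )
        SELECT e.employee_id
        FROM employees e
        JOIN dept_averages da ON e.department_id = da.department_id
        WHERE e.salary > da.avg_salary
        ORDER BY e.employee_id
    """,
}

# Run each formulation and collect the matching employee ids.
results = {name: [row[0] for row in conn.execute(sql)] for name, sql in queries.items()}
for name, ids in results.items():
    print(name, ids)
```

Employee 2 earns exactly the department average and is excluded by all three queries, because the comparison is strict.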

Pattern 4: Row Limiting with Correlated Logic (Top-N Per Group)

pattern4_topn_per_group.sql
-- PROBLEM: Find the most recent order for each customer
SELECT o.*
FROM orders o
WHERE o.order_date = (
    SELECT MAX(o2.order_date)
    FROM orders o2
    WHERE o2.customer_id = o.customer_id
);
 
-- For each order, re-scan orders to find maximum date for that customer
-- Extremely expensive for large tables
 
-- SOLUTION 1: Window function with ROW_NUMBER()
SELECT order_id, customer_id, order_date, total
FROM (
    SELECT 
        o.*,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id 
            ORDER BY order_date DESC
        ) AS rn
    FROM orders o
) ranked
WHERE rn = 1;
 
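-- Single pass with window function ranking
-- Much more efficient
-- Note: ROW_NUMBER() keeps exactly one row per customer, whereas the
-- correlated version returns every order tied for the latest date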
 
-- SOLUTION 2: DISTINCT ON (PostgreSQL-specific)
SELECT DISTINCT ON (customer_id) *
FROM orders
ORDER BY customer_id, order_date DESC;
 
-- SOLUTION 3: LATERAL JOIN (PostgreSQL, MySQL 8.0.14+; SQL Server uses CROSS APPLY with TOP instead)
SELECT c.customer_id, c.name, lo.order_id, lo.order_date
FROM customers c
CROSS JOIN LATERAL (
    SELECT o.order_id, o.order_date
    FROM orders o
    WHERE o.customer_id = c.customer_id
    ORDER BY o.order_date DESC
    LIMIT 1
) lo;
 
-- LATERAL is still correlated but with explicit row limiting
-- Can be efficient with proper indexing
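
One behavioral difference is easy to miss when rewriting this pattern: ties. The sketch below uses Python's `sqlite3` module with made-up rows (and assumes SQLite 3.25 or later for window functions) to show that the correlated `MAX` query and `RANK()` both return every order tied for the latest date, while `ROW_NUMBER()` returns exactly one. The extra `order_id DESC` sort key is an addition of this example, included so the row that `ROW_NUMBER()` picks is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_date  TEXT,
        total       REAL
    );
    -- Hypothetical data: customer 1 has two orders on its latest date.
    INSERT INTO orders VALUES
        (10, 1, '2024-01-01', 20.0),
        (11, 1, '2024-02-01', 30.0),
        (12, 1, '2024-02-01', 40.0),
        (13, 2, '2024-01-15', 10.0);
""")


def order_ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


correlated = order_ids("""
    SELECT o.order_id
    FROM orders o
    WHERE o.order_date = (SELECT MAX(o2.order_date)
                          FROM orders o2
                          WHERE o2.customer_id = o.customer_id)
    ORDER BY o.order_id
""")

with_row_number = order_ids("""
    SELECT order_id
    FROM (
        SELECT o.order_id,
               ROW_NUMBER() OVER (
                   PARTITION BY o.customer_id
                   ORDER BY o.order_date DESC, o.order_id DESC  -- tie-breaker
               ) AS rn
        FROM orders o
    ) ranked
    WHERE rn = 1
    ORDER BY order_id
""")

with_rank = order_ids("""
    SELECT order_id
    FROM (
        SELECT o.order_id,
               RANK() OVER (
                   PARTITION BY o.customer_id
                   ORDER BY o.order_date DESC
               ) AS rnk
        FROM orders o
    ) ranked
    WHERE rnk = 1
    ORDER BY order_id
""")

print("correlated MAX:", correlated)
print("ROW_NUMBER():  ", with_row_number)
print("RANK():        ", with_rank)
```

Which behavior is correct depends on the requirement: use `RANK()` to preserve the original semantics, or `ROW_NUMBER()` with an explicit tie-breaker when exactly one row per group is wanted.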

When to Suspect Correlated Subqueries

Look for these SQL patterns: (1) Subquery references a table alias from the outer query, (2) Subquery in SELECT clause that isn't a constant, (3) EXISTS/NOT EXISTS with inner/outer table correlation, (4) Scalar comparison against an aggregate that varies per row.

Rewriting with JOINs

The most common optimization for correlated subqueries is rewriting with JOINs. This transforms the per-row execution model into a set-based operation that the optimizer can execute more efficiently.

The General Transformation Pattern:

Correlated Subquery:                     JOIN Equivalent:
─────────────────────                    ────────────────
SELECT outer.*, (subquery)          →    SELECT outer.*, agg.*
FROM outer                                FROM outer
WHERE condition with subquery             JOIN aggregated_subquery agg
                                          ON outer.key = agg.key
                                          WHERE condition

Step-by-Step Transformation:

join_transformation.sql
-- ORIGINAL: Correlated scalar subquery
SELECT 
    p.product_id,
    p.name,
    p.price,
    (SELECT COUNT(*) 
     FROM order_items oi 
     WHERE oi.product_id = p.product_id) AS times_ordered,
    (SELECT SUM(quantity) 
     FROM order_items oi 
     WHERE oi.product_id = p.product_id) AS total_quantity
FROM products p;
 
-- STEP 1: Identify the subquery aggregations
-- - COUNT(*) grouped by product_id
-- - SUM(quantity) grouped by product_id
 
-- STEP 2: Create the aggregated subquery as a derived table/CTE
WITH product_stats AS (
    SELECT 
        oi.product_id,
        COUNT(*) AS times_ordered,
        SUM(quantity) AS total_quantity
    FROM order_items oi
    GROUP BY oi.product_id
)
 
-- STEP 3: JOIN to the outer query
SELECT 
    p.product_id,
    p.name,
    p.price,
    COALESCE(ps.times_ordered, 0) AS times_ordered,
    COALESCE(ps.total_quantity, 0) AS total_quantity
FROM products p
LEFT JOIN product_stats ps ON p.product_id = ps.product_id;
 
-- LEFT JOIN ensures products with no orders are included
-- COALESCE turns the NULLs from unmatched rows into 0
-- (note: the original SUM subquery returns NULL, not 0, for such products)
 
-- PERFORMANCE COMPARISON (assuming 10,000 products, 500,000 order_items rows):
-- Correlated: 2 subqueries × 10,000 products, each a seek or scan on order_items
-- JOIN: 1 scan of products + 1 scan of order_items + join operation
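
The rewrite is not a strict drop-in replacement for products that were never ordered, which the following sketch makes visible. It runs both queries through Python's `sqlite3` module on hypothetical sample rows: for a product with no order items the correlated `SUM` yields NULL, while the `COALESCE` in the JOIN version yields 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE order_items (
        item_id    INTEGER PRIMARY KEY,
        product_id INTEGER,
        quantity   INTEGER
    );
    -- Hypothetical data: product 2 has never been ordered.
    INSERT INTO products VALUES (1, 'Widget', 9.99), (2, 'Gadget', 19.99);
    INSERT INTO order_items VALUES (100, 1, 2), (101, 1, 3);
""")

correlated = conn.execute("""
    SELECT p.product_id,
           (SELECT COUNT(*) FROM order_items oi
            WHERE oi.product_id = p.product_id) AS times_ordered,
           (SELECT SUM(oi.quantity) FROM order_items oi
            WHERE oi.product_id = p.product_id) AS total_quantity
    FROM products p
    ORDER BY p.product_id
""").fetchall()

joined = conn.execute("""
    WITH product_stats AS (
        SELECT oi.product_id,
               COUNT(*) AS times_ordered,
               SUM(oi.quantity) AS total_quantity
        FROM order_items oi
        GROUP BY oi.product_id
    )
    SELECT p.product_id,
           COALESCE(ps.times_ordered, 0),
           COALESCE(ps.total_quantity, 0)
    FROM products p
    LEFT JOIN product_stats ps ON p.product_id = ps.product_id
    ORDER BY p.product_id
""").fetchall()

print("correlated:", correlated)
print("joined:    ", joined)
```

If downstream code distinguishes "no orders" from "zero quantity", drop the `COALESCE` around `total_quantity` so the rewrite keeps returning NULL.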

EXISTS to JOIN Transformation:

exists_to_join.sql
-- ORIGINAL: EXISTS correlated subquery
SELECT c.*
FROM customers c
WHERE EXISTS (
    SELECT 1 
    FROM orders o 
    WHERE o.customer_id = c.customer_id 
    AND o.status = 'completed'
    AND o.total > 1000
);
 
-- REWRITE: JOIN with DISTINCT (emulates a semi-join)
SELECT DISTINCT c.*
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'completed' 
AND o.total > 1000;
 
-- OR: Using IN with derived set
SELECT c.*
FROM customers c
WHERE c.customer_id IN (
    SELECT DISTINCT o.customer_id
    FROM orders o
    WHERE o.status = 'completed' AND o.total > 1000
);
 
-- The IN version is non-correlated: inner query runs once
 
-- NOT EXISTS to LEFT JOIN transformation:
-- ORIGINAL:
SELECT c.*
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
 
-- REWRITE: Anti-join pattern
SELECT c.*
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
 
-- Many optimizers plan NOT EXISTS and LEFT JOIN ... IS NULL as the same anti-join;
-- compare execution plans on your database before preferring one form

Watch for Cardinality Changes

When converting correlated subqueries to JOINs, be careful about cardinality. A scalar subquery returns exactly one value per outer row, and EXISTS returns each outer row at most once. A JOIN may return multiple rows per outer row if the relationship isn't 1:1. To keep one row per outer row, use DISTINCT, make sure the join key is unique on the joined side, or aggregate the joined side down to one row per key.
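
The cardinality pitfall is easy to reproduce. This sketch uses Python's `sqlite3` module and hypothetical sample rows in which one customer has two orders; it compares `EXISTS`, a plain JOIN, and a JOIN with `DISTINCT`, then does the same for the anti-join pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    -- Hypothetical data: customer 1 has two orders, customer 3 has none.
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")


def customer_ids(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]


with_exists = customer_ids("""
    SELECT c.customer_id FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
    ORDER BY c.customer_id
""")

plain_join = customer_ids("""
    SELECT c.customer_id FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id
""")

distinct_join = customer_ids("""
    SELECT DISTINCT c.customer_id FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id
""")

not_exists = customer_ids("""
    SELECT c.customer_id FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
""")

left_join_null = customer_ids("""
    SELECT c.customer_id FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    WHERE o.order_id IS NULL
""")

print("EXISTS:           ", with_exists)
print("JOIN (duplicates):", plain_join)
print("JOIN + DISTINCT:  ", distinct_join)
print("NOT EXISTS:       ", not_exists)
print("LEFT JOIN IS NULL:", left_join_null)
```

The plain JOIN lists customer 1 twice, once per matching order, which is exactly the duplication that `DISTINCT` removes.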

Rewriting with Window Functions

Window functions are often the most elegant and efficient solution for correlated subquery patterns, particularly those involving per-group comparisons or rankings.

Why Window Functions Excel:

•Single-pass execution: data is scanned once, and aggregates are computed per partition
•Preserve row identity: unlike GROUP BY, individual rows remain accessible
•Multiple aggregates efficiently: multiple window functions share the same pass
•Express complex patterns simply: ranking, running totals, comparisons to group aggregates

window_function_rewrites.sql
-- ========================================
-- PATTERN 1: Compare to group aggregate
-- ========================================
 
-- PROBLEM (correlated):
SELECT e.*
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary) 
    FROM employees e2 
    WHERE e2.department_id = e.department_id
);
 
-- SOLUTION (window function):
SELECT *
FROM (
    SELECT 
        e.*,
        AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
    FROM employees e
) sub
WHERE salary > dept_avg;
 
-- ========================================
-- PATTERN 2: Find max/min row per group
-- ========================================
 
-- PROBLEM (correlated):
SELECT o.*
FROM orders o
WHERE o.order_date = (
    SELECT MAX(order_date) 
    FROM orders o2 
    WHERE o2.customer_id = o.customer_id
);
 
-- SOLUTION (window function with ROW_NUMBER):
SELECT *
FROM (
    SELECT 
        o.*,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id 
            ORDER BY order_date DESC
        ) AS rn
    FROM orders o
) ranked
WHERE rn = 1;
 
-- Use RANK() or DENSE_RANK() if ties should return multiple rows
 
-- ========================================
-- PATTERN 3: Running comparisons
-- ========================================
 
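-- PROBLEM: Find orders whose total exceeds the average of that customer's previous orders
-- SOLUTION (window function):
-- (a customer's first order has no previous orders, so its average is NULL and it is filtered out)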
SELECT *
FROM (
    SELECT 
        o.*,
        AVG(total) OVER (
            PARTITION BY customer_id 
            ORDER BY order_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        ) AS avg_previous_orders
    FROM orders o
) sub
WHERE total > avg_previous_orders;
 
-- ========================================
-- PATTERN 4: Percentage of group total
-- ========================================
 
-- PROBLEM (correlated):
SELECT 
    d.department_id,
    e.employee_id,
    e.salary,
    e.salary * 100.0 / (
        SELECT SUM(e2.salary) 
        FROM employees e2 
        WHERE e2.department_id = d.department_id
    ) AS pct_of_dept
FROM departments d
JOIN employees e ON d.department_id = e.department_id;
 
-- SOLUTION (window function):
SELECT 
    department_id,
    employee_id,
    salary,
    salary * 100.0 / SUM(salary) OVER (PARTITION BY department_id) AS pct_of_dept
FROM employees;
 
-- ========================================
-- PATTERN 5: Top N per group
-- ========================================
 
-- Find top 3 highest-paid employees per department
 
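-- SOLUTION (window function):
-- RANK() can return more than 3 rows per department when salaries tie;
-- use ROW_NUMBER() if exactly 3 rows are required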
SELECT *
FROM (
    SELECT 
        e.*,
        RANK() OVER (
            PARTITION BY department_id 
            ORDER BY salary DESC
        ) AS salary_rank
    FROM employees e
) ranked
WHERE salary_rank <= 3;
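
The running comparison (Pattern 3 in the listing above) is the least obvious of these rewrites, so here is a small runnable check. It is a sketch using Python's `sqlite3` module with hypothetical order rows, and it assumes SQLite 3.25 or later for window frame support. It prints the intermediate `avg_previous_orders` column so the frame clause's effect is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER,
        order_date  TEXT,
        total       REAL
    );
    -- Hypothetical data: three orders for customer 1, one for customer 2.
    INSERT INTO orders VALUES
        (1, 1, '2024-01-01', 100.0),
        (2, 1, '2024-02-01', 200.0),
        (3, 1, '2024-03-01', 120.0),
        (4, 2, '2024-01-10',  50.0);
""")

# Average of each customer's earlier orders, excluding the current row.
windowed_sql = """
    SELECT o.order_id,
           o.total,
           AVG(o.total) OVER (
               PARTITION BY o.customer_id
               ORDER BY o.order_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
           ) AS avg_previous_orders
    FROM orders o
"""

averages = conn.execute(
    f"SELECT order_id, avg_previous_orders FROM ({windowed_sql}) sub ORDER BY order_id"
).fetchall()

above_average = [
    row[0]
    for row in conn.execute(
        f"SELECT order_id FROM ({windowed_sql}) sub "
        "WHERE total > avg_previous_orders ORDER BY order_id"
    )
]

print("average of previous orders:", averages)
print("orders above that average: ", above_average)
```

Each customer's first order has an empty frame, so its average is NULL and the comparison silently drops it. If first orders should be kept, add an explicit `OR avg_previous_orders IS NULL` to the filter.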

Correlated Subquery to Window Function Mapping
Correlated Pattern	Window Function Solution	Key Insight
Compare to group AVG/SUM/COUNT	`AGG() OVER (PARTITION BY group_col)`	Aggregate computed once per partition
Find row with MAX/MIN per group	`ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` = 1	Ranking finds extreme values
Compare to row's peers	`LAG()/LEAD()` or frame clauses	Access adjacent rows without self-join
Running/cumulative calculations	`SUM() OVER (ORDER BY ... ROWS ...)`	Frame specification controls accumulation
Percentage of total	value / `SUM() OVER (PARTITION BY ...)`	Total computed per partition

Window Functions: Single Pass Power

Window functions process the entire result set in one pass while computing per-partition aggregates. Multiple window functions with the same PARTITION BY and ORDER BY can often share the same sort operation, making them extremely efficient for complex analytics.

Detecting Correlated Subquery Performance Issues

Identifying problematic correlated subqueries requires both code review and execution plan analysis.

Code-Level Detection:

Look for these patterns in SQL:

•Subquery that references a table alias defined in the outer query
•Subquery in SELECT clause that isn't a constant expression
•WHERE clause with = (SELECT ...) or IN (SELECT ...) where the inner SELECT references outer tables
•EXISTS/NOT EXISTS with inner-outer table correlation

Execution Plan Detection:

sqlserver_detect_correlated.sql
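-- Show the estimated execution plan (the query is compiled but not executed)
SET SHOWPLAN_TEXT ON;
GO
SELECT e.*, (SELECT AVG(salary) FROM employees e2 
             WHERE e2.department_id = e.department_id)
FROM employees e;
GO
SET SHOWPLAN_TEXT OFF;
GO
 
-- For actual row and execution counts, run the query with
-- SET STATISTICS PROFILE ON, or capture the actual plan in SSMS
 
-- Look for in execution plan:
-- 1. Nested Loops operator with inner side as subquery
-- 2. Inner-side operators whose execution count matches the outer row count
-- 3. Spools (Table Spool, Index Spool) indicating repeated access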
 
-- In graphical plan (SSMS):
-- - Thick arrows (indicating high row counts)
-- - Nested Loops with subquery on inner side
-- - "Number of Executions" equal to outer row count
 
-- Query for plans with high-cost subqueries
SELECT 
    qs.execution_count,
    qs.total_elapsed_time / 1000000.0 AS total_sec,
    qs.total_elapsed_time / qs.execution_count / 1000.0 AS avg_ms,
    SUBSTRING(qt.text, qs.statement_start_offset/2 + 1,
        (CASE qs.statement_end_offset 
            WHEN -1 THEN DATALENGTH(qt.text)
            ELSE qs.statement_end_offset END 
        - qs.statement_start_offset)/2 + 1) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE qt.text LIKE '%SELECT%SELECT%'  -- Nested SELECT (rough pattern)
AND qs.total_elapsed_time / qs.execution_count > 1000000  -- >1 sec avg
ORDER BY avg_ms DESC;

The 'loops' Indicator

In PostgreSQL's EXPLAIN ANALYZE output, the 'loops=N' value is critical. It shows how many times an operation was executed. For a correlated subquery, N often equals the number of rows in the outer query—a clear sign of per-row execution.

When Correlated Subqueries Are Acceptable

While this page emphasizes the performance dangers of correlated subqueries, they're not universally bad. In specific situations, they may be acceptable or even optimal.

Scenarios Where Correlated Subqueries May Be Acceptable:

Acceptable Use Cases

•Very small outer result sets — If the outer query returns only 10-100 rows, the subquery executes only 10-100 times. The overhead is negligible.
•EXISTS with early termination — EXISTS stops at the first match. If matches are common and indexed, each subquery execution is a single index probe.
•Subquery is highly selective and indexed — If the correlated subquery accesses a small amount of indexed data per execution, the overhead is bounded.
•Readability is paramount for rare queries — For ad-hoc analytics or rarely-run reports, clarity may outweigh micro-optimization.
•Optimizer decorrelates successfully — Modern optimizers sometimes rewrite correlated subqueries as joins. Check the execution plan to verify.
•LATERAL joins are the best solution — LATERAL (or CROSS APPLY) is explicitly correlated but with modern optimizer support for efficient execution.

acceptable_correlated.sql
-- Example 1: EXISTS with good indexing
-- Finding active users who have admin roles (rare combination)
SELECT u.*
FROM users u
WHERE u.status = 'active'  -- Very selective: 100 active users
AND EXISTS (
    SELECT 1 
    FROM user_roles ur 
    WHERE ur.user_id = u.user_id 
    AND ur.role_name = 'admin'
    -- Index on user_roles(user_id, role_name) makes this a single index seek per user
);
-- 100 users × 1 index seek each = 100 index seeks: acceptable
 
-- Example 2: LATERAL for top-N per group
-- (PostgreSQL syntax; SQL Server uses CROSS APPLY with TOP (3) instead of LIMIT)
SELECT c.customer_id, c.name, recent_orders.*
FROM customers c
CROSS JOIN LATERAL (
    SELECT o.order_id, o.order_date, o.total
    FROM orders o
    WHERE o.customer_id = c.customer_id
    ORDER BY o.order_date DESC
    LIMIT 3
) recent_orders;
-- LATERAL is correlated but allows LIMIT within the correlation
-- With index on orders(customer_id, order_date DESC), this is efficient
 
-- Example 3: Optimizer-decorrelated query
-- Some databases automatically rewrite this:
SELECT e.*
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary) FROM employees e2 WHERE e2.department_id = e.department_id
);
 
-- Check the execution plan:
-- If it shows Hash Join or Merge Join instead of Nested Loops,
-- the optimizer decorrelated the query. 
-- In that case, the correlated syntax has no performance penalty.

Always Verify with Execution Plans

Never assume a correlated subquery will be optimized away. Always check the execution plan. Look for the number of loops/executions, the join type (Nested Loops suggests correlation; Hash/Merge Join suggests decorrelation), and the total work performed.

Summary: Correlated Subquery Issues

Correlated subqueries are a common source of catastrophic query performance problems. Their innocent syntax hides quadratic execution complexity that only manifests at production scale.

Key Takeaways

•Correlated subqueries execute once per outer row — They create O(n × m) complexity instead of O(n + m) for equivalent joins.
•The signature is referencing outer query tables — Any subquery that uses table aliases from the outer query is correlated.
•Common patterns: scalar SELECT subqueries, per-group comparisons, EXISTS — These appear frequently in business logic queries.
•Rewrite to JOINs for set-based execution — Pre-compute aggregates in CTEs or derived tables, then join.
•Window functions elegantly solve many patterns — AVG() OVER, ROW_NUMBER() OVER, and ranking replace common correlated aggregates.
•Detect via execution plans: look for loops = outer row count — 'SubPlan with loops=10000' or 'DEPENDENT SUBQUERY' are warning signs.
•Correlated subqueries are sometimes acceptable — Small outer sets, EXISTS with indexes, and successful decorrelation by the optimizer.

Page Complete

You now understand why correlated subqueries cause performance problems and how to rewrite them using JOINs and window functions. Next, we'll examine the notorious N+1 query problem—a related pattern that plagues object-relational mapping and application code.

4 / 5

Loading learning content...

Database Management SystemsCommon Performance Issues

Common SQL Performance Issues

LevelAdvanced

Duration90 mins

TopicCommon Performance Issues

4 / 5

Correlated Subquery Issues

The Hidden Loop That Kills Performance

The hidden culprit is often a correlated subquery: a subquery that references columns from the outer query and must be re-executed for every row the outer query examines.

Consider this innocent-looking query:

SELECT e.employee_id, e.name, e.salary,
       (SELECT AVG(salary) FROM employees e2 WHERE e2.department_id = e.department_id)
FROM employees e;

What you expect: The database calculates department averages and returns them with employee data.

The complexity explosion:

100,000 employees × 1,000 rows per department average = 100,000,000 row examinations
What should take 100 milliseconds takes 10 minutes
10x more data → 100x slower execution (quadratic growth)

What You Will Learn

Correlated vs. Non-Correlated Subqueries

Before addressing performance issues, we must clearly distinguish between correlated and non-correlated subqueries, as they have fundamentally different execution characteristics.

Non-Correlated (Independent) Subquery:

A non-correlated subquery doesn't reference the outer query. It can be executed once, independently, and its result reused:

-- Non-correlated: subquery is independent
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
--              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
--              No reference to outer query's employees table

Execution: The inner query runs once, returns a single value (e.g., 75000), and then the outer query becomes WHERE salary > 75000—a simple, efficient scan or index seek.

Correlated (Dependent) Subquery:

A correlated subquery references columns from the outer query, creating a dependency:

-- Correlated: subquery references outer query
SELECT * FROM employees e
WHERE salary > (SELECT AVG(salary) FROM employees e2 
                WHERE e2.department_id = e.department_id);
--                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
--                    References e.department_id from outer query

Execution: For each row examined by the outer query, the inner query must run with that row's department_id. If the outer query examines 100,000 rows, the inner query runs 100,000 times.

Non-Correlated Subquery

•Executes once
•Independent of outer query
•Result can be cached and reused
•Generally efficient
•Complexity: O(n + m)

Correlated Subquery

•Executes once per outer row
•Depends on current outer row values
•Result changes for each outer row
•Often catastrophically slow
•Complexity: O(n × m)

The Correlation Trap

How Correlated Subqueries Execute

Understanding the execution model of correlated subqueries reveals why they cause performance problems.

The Nested Loop Execution Model:

At a conceptual level, a correlated subquery executes like a nested loop in procedural code:

correlated_execution_model.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- SQL with correlated subquery
SELECT e.employee_id, e.name,
       (SELECT AVG(o.amount) 
        FROM orders o 
        WHERE o.employee_id = e.employee_id) AS avg_order
FROM employees e
WHERE e.status = 'active';
 
-- Conceptual execution (pseudocode)
result = []
FOR each employee e IN (SELECT * FROM employees WHERE status = 'active'):
    # For every single row in the outer result...
    
    subquery_result = EXECUTE(
        SELECT AVG(amount) FROM orders WHERE employee_id = e.employee_id
    )
    # ...we run a complete query against the orders table
    
    result.append(e.employee_id, e.name, subquery_result)
    
RETURN result
 
# If we have:
# - 10,000 active employees
# - Orders table with 5,000,000 rows
# 
# Execution:
# - Outer loop: 10,000 iterations
# - Inner query: Each may scan orders table (unless indexed)
# - Best case (indexed): 10,000 index seeks
# - Worst case (unindexed): 10,000 full table scans = 50 billion row reads

Execution Plan Representation:

In actual execution plans, correlated subqueries appear as nested loop joins or specific operators:

Database	Operator Name	Description
SQL Server	Nested Loops (with index spool)	Inner query re-executed; may cache via spool
PostgreSQL	SubPlan	Subquery executed per outer row
MySQL	DEPENDENT SUBQUERY	Explicitly marked in EXPLAIN
Oracle	NESTED LOOPS or FILTER	Subquery in nested execution context

The Quadratic Growth Pattern:

The key insight is that complexity is multiplicative, not additive:

Total work ≈ (rows from outer query) × (work per subquery execution)

If the outer query returns 10,000 rows and each subquery execution examines 5,000 rows:

Naive correlated: 10,000 × 5,000 = 50,000,000 row examinations
Equivalent JOIN: 10,000 + 5,000 = 15,000 row examinations (plus join operations)

This 3,300x difference explains why queries that work fine in development (100 rows) become unusable in production (100,000 rows): 100 × 100 = 10,000, but 100,000 × 100,000 = 10,000,000,000.


Optimizer Rescue (Sometimes)

Common Correlated Subquery Patterns

Correlated subqueries appear in several common patterns. Recognizing these patterns is the first step toward optimization.

Pattern 1: Scalar Subquery in SELECT (Per-Row Computation)

pattern1_select_subquery.sql
-- PROBLEM: Scalar subquery for each row
SELECT 
    c.customer_id,
    c.name,
    (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) AS order_count,
    (SELECT MAX(order_date) FROM orders o WHERE o.customer_id = c.customer_id) AS last_order,
    (SELECT SUM(total) FROM orders o WHERE o.customer_id = c.customer_id) AS total_spent
FROM customers c;
 
-- For 50,000 customers, this runs 150,000 subqueries (3 per row)!
-- Each subquery may scan or seek in the orders table
 
-- SOLUTION: Single JOIN with aggregation
SELECT 
    c.customer_id,
    c.name,
    COUNT(o.order_id) AS order_count,
    MAX(o.order_date) AS last_order,
    SUM(o.total) AS total_spent
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.name;
 
-- One scan of customers, one scan of orders, one join operation
-- Complexity: O(n + m) instead of O(n × m)
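To check that the JOIN rewrite of Pattern 1 returns the same rows as the three scalar subqueries, the two queries can be run side by side on a tiny dataset. This is a minimal sketch using Python's built-in `sqlite3` module; the sample rows and the customer with no orders are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        order_date TEXT,
        total REAL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES
        (10, 1, '2024-01-05', 50.0),
        (11, 1, '2024-02-01', 25.0),
        (12, 2, '2024-03-10', 80.0);
""")

# Three correlated scalar subqueries per customer row.
correlated = conn.execute("""
    SELECT c.customer_id, c.name,
           (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id),
           (SELECT MAX(order_date) FROM orders o WHERE o.customer_id = c.customer_id),
           (SELECT SUM(total) FROM orders o WHERE o.customer_id = c.customer_id)
    FROM customers c
    ORDER BY c.customer_id
""").fetchall()

# One LEFT JOIN with aggregation.
joined = conn.execute("""
    SELECT c.customer_id, c.name,
           COUNT(o.order_id), MAX(o.order_date), SUM(o.total)
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
""").fetchall()

for row in joined:
    print(row)
```

The customer without orders is the case worth checking: `COUNT(o.order_id)` yields 0 while `MAX` and `SUM` yield NULL, matching the subquery version.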

Pattern 2: Existence Check in WHERE (EXISTS/NOT EXISTS)

pattern2_exists_subquery.sql
-- PROBLEM: EXISTS with correlated subquery
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id 
    AND o.order_date >= '2024-01-01'
);
 
-- For each customer, check if matching orders exist
-- With proper indexing on orders(customer_id, order_date), 
-- EXISTS can short-circuit (stop at first match)
-- This is often acceptable, but alternatives may be faster
 
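-- ALTERNATIVE 1: JOIN with DISTINCT (emulates a semi-join; often equivalent)
-- DISTINCT removes the duplicate rows produced when a customer has several matching orders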
SELECT DISTINCT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01';
 
-- ALTERNATIVE 2: IN with non-correlated subquery
SELECT c.customer_id, c.name
FROM customers c
WHERE c.customer_id IN (
    SELECT DISTINCT o.customer_id 
    FROM orders o 
    WHERE o.order_date >= '2024-01-01'
);
-- Inner query runs once, returns set; outer query filters against set
 
-- NOT EXISTS is similar but for anti-joins:
-- PROBLEM: Find customers with no orders
SELECT c.customer_id, c.name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
 
-- SOLUTION: LEFT JOIN with NULL check
SELECT c.customer_id, c.name
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
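The equivalences in Pattern 2 can be confirmed on a small dataset. The following sketch uses Python's `sqlite3` module with invented sample rows; `q` is a hypothetical helper that just runs a query and returns all rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        order_date TEXT
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES
        (10, 1, '2024-01-05'),
        (11, 1, '2024-02-01'),
        (12, 2, '2023-11-30');
""")

def q(sql):
    """Run a query and return all rows (illustrative helper)."""
    return conn.execute(sql).fetchall()

exists_rows = q("""
    SELECT c.customer_id, c.name FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o
                  WHERE o.customer_id = c.customer_id
                    AND o.order_date >= '2024-01-01')
    ORDER BY c.customer_id
""")
join_rows = q("""
    SELECT DISTINCT c.customer_id, c.name
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    WHERE o.order_date >= '2024-01-01'
    ORDER BY c.customer_id
""")
in_rows = q("""
    SELECT c.customer_id, c.name FROM customers c
    WHERE c.customer_id IN (SELECT o.customer_id FROM orders o
                            WHERE o.order_date >= '2024-01-01')
    ORDER BY c.customer_id
""")
not_exists_rows = q("""
    SELECT c.customer_id, c.name FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o
                      WHERE o.customer_id = c.customer_id)
""")
anti_join_rows = q("""
    SELECT c.customer_id, c.name
    FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    WHERE o.order_id IS NULL
""")
print(exists_rows, not_exists_rows)
```

Matching results say nothing about speed. Which form runs faster depends on the engine and its indexes, so the execution plan remains the deciding evidence.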

Pattern 3: Comparison with Aggregate (Per-Group Comparison)

pattern3_aggregate_comparison.sql
-- PROBLEM: Find employees earning more than their department average
SELECT e.employee_id, e.name, e.salary, e.department_id
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
);
 
-- For each of 100,000 employees, compute department average
-- If 100 departments with 1,000 employees each:
-- 100,000 × 1,000 = 100,000,000 row examinations!
 
-- SOLUTION: Window function (calculate average once per partition)
SELECT employee_id, name, salary, department_id
FROM (
    SELECT 
        e.employee_id, 
        e.name, 
        e.salary, 
        e.department_id,
        AVG(e.salary) OVER (PARTITION BY e.department_id) AS dept_avg
    FROM employees e
) sub
WHERE salary > dept_avg;
 
-- Single pass through employees table
-- Window function computes aggregate per department partition
-- Complexity: about O(n log n) if the partitioning needs a sort, versus O(n × m) for the correlated form
 
-- ALTERNATIVE: JOIN with pre-computed aggregates (CTE)
WITH dept_averages AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT e.employee_id, e.name, e.salary, e.department_id
FROM employees e
JOIN dept_averages da ON e.department_id = da.department_id
WHERE e.salary > da.avg_salary;
 
-- Two passes: one for aggregate, one for join/filter
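As a quick check that the three formulations of Pattern 3 agree, they can be executed against the same table. This sketch assumes Python's `sqlite3` module linked against SQLite 3.25 or newer (the first release with window functions); the five employees are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name TEXT,
        salary REAL,
        department_id INTEGER
    );
    INSERT INTO employees VALUES
        (1, 'A', 100, 10), (2, 'B', 200, 10), (3, 'C', 300, 10),
        (4, 'D',  50, 20), (5, 'E', 150, 20);
""")

correlated = conn.execute("""
    SELECT e.employee_id, e.name, e.salary, e.department_id
    FROM employees e
    WHERE e.salary > (SELECT AVG(e2.salary) FROM employees e2
                      WHERE e2.department_id = e.department_id)
    ORDER BY e.employee_id
""").fetchall()

windowed = conn.execute("""
    SELECT employee_id, name, salary, department_id
    FROM (
        SELECT e.*, AVG(e.salary) OVER (PARTITION BY e.department_id) AS dept_avg
        FROM employees e
    ) sub
    WHERE salary > dept_avg
    ORDER BY employee_id
""").fetchall()

cte_join = conn.execute("""
    WITH dept_averages AS (
        SELECT department_id, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department_id
    )
    SELECT e.employee_id, e.name, e.salary, e.department_id
    FROM employees e
    JOIN dept_averages da ON e.department_id = da.department_id
    WHERE e.salary > da.avg_salary
    ORDER BY e.employee_id
""").fetchall()

print(correlated)
```

Department 10 averages 200 and department 20 averages 100, so only employees 3 and 5 qualify in all three versions.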

Pattern 4: Row Limiting with Correlated Logic (Top-N Per Group)

pattern4_topn_per_group.sql
-- PROBLEM: Find the most recent order for each customer
SELECT o.*
FROM orders o
WHERE o.order_date = (
    SELECT MAX(o2.order_date)
    FROM orders o2
    WHERE o2.customer_id = o.customer_id
);
 
-- For each order, re-scan orders to find maximum date for that customer
-- Extremely expensive for large tables
 
-- SOLUTION 1: Window function with ROW_NUMBER()
SELECT order_id, customer_id, order_date, total
FROM (
    SELECT 
        o.*,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id 
            ORDER BY order_date DESC
        ) AS rn
    FROM orders o
) ranked
WHERE rn = 1;
 
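-- Single pass with window function ranking
-- Note: ROW_NUMBER() returns exactly one row per customer, while the MAX() version
-- returns every order tied for the latest date; use RANK() to keep ties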
 
-- SOLUTION 2: DISTINCT ON (PostgreSQL-specific)
SELECT DISTINCT ON (customer_id) *
FROM orders
ORDER BY customer_id, order_date DESC;
 
-- SOLUTION 3: LATERAL JOIN (PostgreSQL, MySQL 8.0.14+; SQL Server uses CROSS APPLY with TOP 1 instead of LIMIT 1)
SELECT c.customer_id, c.name, lo.order_id, lo.order_date
FROM customers c
CROSS JOIN LATERAL (
    SELECT o.order_id, o.order_date
    FROM orders o
    WHERE o.customer_id = c.customer_id
    ORDER BY o.order_date DESC
    LIMIT 1
) lo;
 
-- LATERAL is still correlated but with explicit row limiting
-- Can be efficient with proper indexing
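One detail of the Pattern 4 rewrites is tie handling: the `MAX()` comparison returns every order sharing the latest date, while `ROW_NUMBER()` keeps exactly one. The sketch below illustrates this with Python's `sqlite3` module (SQLite 3.25+ assumed for window functions). The sample orders and the `order_id DESC` tie-breaker are assumptions added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        order_date TEXT
    );
    INSERT INTO orders VALUES
        (1, 1, '2024-01-01'),
        (2, 1, '2024-02-01'),
        (3, 1, '2024-02-01'),   -- same latest date as order 2
        (4, 2, '2024-03-01');
""")

max_rows = conn.execute("""
    SELECT o.order_id FROM orders o
    WHERE o.order_date = (SELECT MAX(o2.order_date) FROM orders o2
                          WHERE o2.customer_id = o.customer_id)
    ORDER BY o.order_id
""").fetchall()

# ROW_NUMBER(): one row per customer; the extra ORDER BY key makes the pick deterministic.
row_number_rows = conn.execute("""
    SELECT order_id FROM (
        SELECT o.order_id,
               ROW_NUMBER() OVER (PARTITION BY o.customer_id
                                  ORDER BY o.order_date DESC, o.order_id DESC) AS rn
        FROM orders o
    ) ranked
    WHERE rn = 1
    ORDER BY order_id
""").fetchall()

# RANK(): tied rows share rank 1, reproducing the MAX() behaviour.
rank_rows = conn.execute("""
    SELECT order_id FROM (
        SELECT o.order_id,
               RANK() OVER (PARTITION BY o.customer_id
                            ORDER BY o.order_date DESC) AS rk
        FROM orders o
    ) ranked
    WHERE rk = 1
    ORDER BY order_id
""").fetchall()

print(max_rows, row_number_rows, rank_rows)
```

Which behaviour is correct depends on the requirement. If "most recent order" must be a single row, add a tie-breaker to the `ORDER BY`; if ties matter, use `RANK()`.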

When to Suspect Correlated Subqueries

Rewriting with JOINs

The General Transformation Pattern:

Correlated Subquery:                     JOIN Equivalent:
─────────────────────                    ────────────────
SELECT outer.*, (subquery)          →    SELECT outer.*, agg.*
FROM outer                                FROM outer
WHERE condition with subquery             JOIN aggregated_subquery agg
                                          ON outer.key = agg.key
                                          WHERE condition

Step-by-Step Transformation:

join_transformation.sql
-- ORIGINAL: Correlated scalar subquery
SELECT 
    p.product_id,
    p.name,
    p.price,
    (SELECT COUNT(*) 
     FROM order_items oi 
     WHERE oi.product_id = p.product_id) AS times_ordered,
    (SELECT SUM(quantity) 
     FROM order_items oi 
     WHERE oi.product_id = p.product_id) AS total_quantity
FROM products p;
 
-- STEP 1: Identify the subquery aggregations
-- - COUNT(*) grouped by product_id
-- - SUM(quantity) grouped by product_id
 
-- STEP 2: Create the aggregated subquery as a derived table/CTE
WITH product_stats AS (
    SELECT 
        oi.product_id,
        COUNT(*) AS times_ordered,
        SUM(quantity) AS total_quantity
    FROM order_items oi
    GROUP BY oi.product_id
)
 
-- STEP 3: JOIN to the outer query
SELECT 
    p.product_id,
    p.name,
    p.price,
    COALESCE(ps.times_ordered, 0) AS times_ordered,
    COALESCE(ps.total_quantity, 0) AS total_quantity
FROM products p
LEFT JOIN product_stats ps ON p.product_id = ps.product_id;
 
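-- LEFT JOIN ensures products with no orders are included
-- COALESCE turns the NULLs from unmatched rows into 0
-- Note: the original SUM(quantity) subquery returns NULL (not 0) for products
-- with no orders; drop the COALESCE on total_quantity to preserve that exactly
 
-- PERFORMANCE COMPARISON:
-- Correlated (no index): 10,000 products × 2 subqueries, each scanning up to 500,000 order_items rows
-- JOIN: 1 scan of products + 1 scan of order_items + join operation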
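The three-step transformation can be exercised end to end on a toy schema. This is a sketch with Python's `sqlite3` module and made-up rows; it also shows the one place where the rewrite differs from the original, namely `SUM` for a product that was never ordered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, price REAL);
    CREATE TABLE order_items (
        item_id INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL,
        quantity INTEGER
    );
    INSERT INTO products VALUES (1, 'Widget', 9.99), (2, 'Gadget', 19.99);
    INSERT INTO order_items VALUES (1, 1, 2), (2, 1, 3);  -- product 2 never ordered
""")

correlated = conn.execute("""
    SELECT p.product_id,
           (SELECT COUNT(*) FROM order_items oi WHERE oi.product_id = p.product_id),
           (SELECT SUM(quantity) FROM order_items oi WHERE oi.product_id = p.product_id)
    FROM products p
    ORDER BY p.product_id
""").fetchall()

STATS_CTE = """
    WITH product_stats AS (
        SELECT oi.product_id,
               COUNT(*) AS times_ordered,
               SUM(quantity) AS total_quantity
        FROM order_items oi
        GROUP BY oi.product_id
    )
"""

# Rewrite as on this page: COALESCE on both columns.
rewritten = conn.execute(STATS_CTE + """
    SELECT p.product_id,
           COALESCE(ps.times_ordered, 0),
           COALESCE(ps.total_quantity, 0)
    FROM products p
    LEFT JOIN product_stats ps ON p.product_id = ps.product_id
    ORDER BY p.product_id
""").fetchall()

# Exact equivalent: COALESCE only where the original would return 0.
faithful = conn.execute(STATS_CTE + """
    SELECT p.product_id,
           COALESCE(ps.times_ordered, 0),
           ps.total_quantity
    FROM products p
    LEFT JOIN product_stats ps ON p.product_id = ps.product_id
    ORDER BY p.product_id
""").fetchall()

print(correlated, rewritten, faithful)
```

Returning 0 instead of NULL is usually what a report wants, but it is a behaviour change worth making deliberately.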

EXISTS to JOIN Transformation:

exists_to_join.sql
-- ORIGINAL: EXISTS correlated subquery
SELECT c.*
FROM customers c
WHERE EXISTS (
    SELECT 1 
    FROM orders o 
    WHERE o.customer_id = c.customer_id 
    AND o.status = 'completed'
    AND o.total > 1000
);
 
-- REWRITE: JOIN with DISTINCT (emulates a semi-join; DISTINCT collapses customers with several matching orders)
SELECT DISTINCT c.*
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
WHERE o.status = 'completed' 
AND o.total > 1000;
 
-- OR: Using IN with derived set
SELECT c.*
FROM customers c
WHERE c.customer_id IN (
    SELECT DISTINCT o.customer_id
    FROM orders o
    WHERE o.status = 'completed' AND o.total > 1000
);
 
-- The IN version is non-correlated: inner query runs once
 
-- NOT EXISTS to LEFT JOIN transformation:
-- ORIGINAL:
SELECT c.*
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);
 
-- REWRITE: Anti-join pattern
SELECT c.*
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL;
 
-- Many optimizers plan this the same way as NOT EXISTS (as an anti-join);
-- it relies on o.order_id being NOT NULL, so compare execution plans before choosing

Watch for Cardinality Changes
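A JOIN rewrite can change how many rows come back, because a join emits one row per match while EXISTS and scalar subqueries emit one row per outer row. The sketch below shows two common cases using Python's `sqlite3` module and invented sample data: duplicate rows when DISTINCT is forgotten, and `COUNT(*)` over a LEFT JOIN reporting 1 for a customer with no orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

def q(sql):
    """Run a query and return all rows (illustrative helper)."""
    return conn.execute(sql).fetchall()

exists_rows = q("""
    SELECT c.customer_id FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
    ORDER BY c.customer_id
""")
# Plain JOIN: customer 1 appears once per matching order.
plain_join = q("""
    SELECT c.customer_id FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id
""")
distinct_join = q("""
    SELECT DISTINCT c.customer_id FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    ORDER BY c.customer_id
""")
# COUNT(*) counts the NULL-extended row; COUNT(o.order_id) does not.
count_star = q("""
    SELECT c.customer_id, COUNT(*) FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id ORDER BY c.customer_id
""")
count_column = q("""
    SELECT c.customer_id, COUNT(o.order_id) FROM customers c
    LEFT JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id ORDER BY c.customer_id
""")
print(plain_join, count_star, count_column)
```

Comparing row counts between the original and the rewritten query on representative data is a cheap safeguard before swapping one for the other.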

Rewriting with Window Functions

Window functions are often the most elegant and efficient solution for correlated subquery patterns, particularly those involving per-group comparisons or rankings.

Why Window Functions Excel:

Single pass execution — Data is scanned once; aggregates computed per partition
Preserve row identity — Unlike GROUP BY, individual rows remain accessible
Multiple aggregates efficiently — Multiple window functions share the same pass
Express complex patterns simply — Ranking, running totals, comparisons to group aggregates

window_function_rewrites.sql
-- ========================================
-- PATTERN 1: Compare to group aggregate
-- ========================================
 
-- PROBLEM (correlated):
SELECT e.*
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary) 
    FROM employees e2 
    WHERE e2.department_id = e.department_id
);
 
-- SOLUTION (window function):
SELECT *
FROM (
    SELECT 
        e.*,
        AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
    FROM employees e
) sub
WHERE salary > dept_avg;
 
-- ========================================
-- PATTERN 2: Find max/min row per group
-- ========================================
 
-- PROBLEM (correlated):
SELECT o.*
FROM orders o
WHERE o.order_date = (
    SELECT MAX(order_date) 
    FROM orders o2 
    WHERE o2.customer_id = o.customer_id
);
 
-- SOLUTION (window function with ROW_NUMBER):
SELECT *
FROM (
    SELECT 
        o.*,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id 
            ORDER BY order_date DESC
        ) AS rn
    FROM orders o
) ranked
WHERE rn = 1;
 
-- Use RANK() or DENSE_RANK() if ties should return multiple rows
 
-- ========================================
-- PATTERN 3: Running comparisons
-- ========================================
 
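-- PROBLEM: Find orders whose total exceeds the average of that customer's earlier orders
-- (a customer's first order has no earlier orders, so its average is NULL and the row is filtered out)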
-- SOLUTION (window function):
SELECT *
FROM (
    SELECT 
        o.*,
        AVG(total) OVER (
            PARTITION BY customer_id 
            ORDER BY order_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
        ) AS avg_previous_orders
    FROM orders o
) sub
WHERE total > avg_previous_orders;
 
-- ========================================
-- PATTERN 4: Percentage of group total
-- ========================================
 
-- PROBLEM (correlated):
SELECT 
    d.department_id,
    e.employee_id,
    e.salary,
    e.salary * 100.0 / (
        SELECT SUM(e2.salary) 
        FROM employees e2 
        WHERE e2.department_id = d.department_id
    ) AS pct_of_dept
FROM departments d
JOIN employees e ON d.department_id = e.department_id;
 
-- SOLUTION (window function):
SELECT 
    department_id,
    employee_id,
    salary,
    salary * 100.0 / SUM(salary) OVER (PARTITION BY department_id) AS pct_of_dept
FROM employees;
 
-- ========================================
-- PATTERN 5: Top N per group
-- ========================================
 
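-- Find top 3 highest-paid employees per department
-- (RANK() keeps ties, so a department can return more than 3 rows; use ROW_NUMBER() for exactly 3)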
 
-- SOLUTION (window function):
SELECT *
FROM (
    SELECT 
        e.*,
        RANK() OVER (
            PARTITION BY department_id 
            ORDER BY salary DESC
        ) AS salary_rank
    FROM employees e
) ranked
WHERE salary_rank <= 3;
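Pattern 3 above is shown without its correlated counterpart. One plausible correlated form, written here as an assumption rather than taken from the page, compares each order with the average of the same customer's strictly earlier orders. The sketch runs both with Python's `sqlite3` module (SQLite 3.25+ assumed) on sample data where order dates are unique per customer, which is what makes the two forms agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        order_date TEXT,
        total REAL
    );
    INSERT INTO orders VALUES
        (1, 1, '2024-01-01', 10),
        (2, 1, '2024-02-01', 30),
        (3, 1, '2024-03-01', 20),
        (4, 2, '2024-01-15',  5),
        (5, 2, '2024-02-15',  5);
""")

# Assumed correlated formulation: average of strictly earlier orders.
correlated = conn.execute("""
    SELECT o.order_id FROM orders o
    WHERE o.total > (SELECT AVG(o2.total) FROM orders o2
                     WHERE o2.customer_id = o.customer_id
                       AND o2.order_date < o.order_date)
    ORDER BY o.order_id
""").fetchall()

# Window formulation from the page: frame ends one row before the current row.
windowed = conn.execute("""
    SELECT order_id FROM (
        SELECT o.*,
               AVG(total) OVER (
                   PARTITION BY customer_id
                   ORDER BY order_date
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ) AS avg_previous_orders
        FROM orders o
    ) sub
    WHERE total > avg_previous_orders
    ORDER BY order_id
""").fetchall()

print(correlated, windowed)
```

Only order 2 qualifies: 30 exceeds the previous average of 10, while order 3 (20) merely equals the average of 10 and 30. First orders drop out in both versions because their average is NULL.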

Correlated Subquery to Window Function Mapping
Correlated Pattern	Window Function Solution	Key Insight
Compare to group AVG/SUM/COUNT	`AGG() OVER (PARTITION BY group_col)`	Aggregate computed once per partition
Find row with MAX/MIN per group	`ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` = 1	Ranking finds extreme values
Compare to row's peers	`LAG()/LEAD()` or frame clauses	Access adjacent rows without self-join
Running/cumulative calculations	`SUM() OVER (ORDER BY ... ROWS ...)`	Frame specification controls accumulation
Percentage of total	value / `SUM() OVER (PARTITION BY ...)`	Total computed per partition
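Two rows of the mapping, `LAG()` and the running `SUM()`, do not appear in the rewrite file above. The sketch below pairs each with a correlated equivalent so the correspondence is concrete. It uses Python's `sqlite3` module (SQLite 3.25+ assumed); the sample orders and the correlated forms are illustrative assumptions, and the equivalence relies on order dates being unique per customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        order_date TEXT,
        total INTEGER
    );
    INSERT INTO orders VALUES
        (1, 1, '2024-01-01', 10),
        (2, 1, '2024-02-01', 30),
        (3, 1, '2024-03-01', 20);
""")

# Correlated: one subquery for the previous order, one for the running total.
correlated = conn.execute("""
    SELECT o.order_id,
           (SELECT o2.total FROM orders o2
            WHERE o2.customer_id = o.customer_id
              AND o2.order_date < o.order_date
            ORDER BY o2.order_date DESC LIMIT 1) AS prev_total,
           (SELECT SUM(o2.total) FROM orders o2
            WHERE o2.customer_id = o.customer_id
              AND o2.order_date <= o.order_date) AS running_total
    FROM orders o
    ORDER BY o.order_id
""").fetchall()

# Window functions: both values over the same partition and ordering.
windowed = conn.execute("""
    SELECT order_id,
           LAG(total) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_total,
           SUM(total) OVER (
               PARTITION BY customer_id
               ORDER BY order_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_total
    FROM orders
    ORDER BY order_id
""").fetchall()

print(windowed)
```

The window version states the ordering once and reuses it for both columns, whereas the correlated version repeats the customer and date predicates in every subquery.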

Window Functions: Single Pass Power

Detecting Correlated Subquery Performance Issues

Identifying problematic correlated subqueries requires both code review and execution plan analysis.

Code-Level Detection:

Look for these patterns in SQL:

Subquery that references a table alias defined in the outer query
Subquery in SELECT clause that isn't a constant expression
WHERE clause with = (SELECT ...) or IN (SELECT ...) where the inner SELECT references outer tables
EXISTS/NOT EXISTS with inner-outer table correlation

Execution Plan Detection:

sqlserver_detect_correlated.sql
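-- Show the estimated plan as text (the query is compiled but not executed)
SET SHOWPLAN_TEXT ON;
GO
SELECT e.*, (SELECT AVG(salary) FROM employees e2 
             WHERE e2.department_id = e.department_id) AS dept_avg
FROM employees e;
GO
SET SHOWPLAN_TEXT OFF;
GO
 
-- Look for in execution plan:
-- 1. Nested Loops operator with inner side as subquery
-- 2. High "Rows" and "Executes" counts on the inner side
--    (actual counts need SET STATISTICS PROFILE ON or the actual plan in SSMS;
--    SHOWPLAN_TEXT reports estimates only)
-- 3. Spools (Table Spool, Index Spool) indicating repeated access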
 
-- In graphical plan (SSMS):
-- - Thick arrows (indicating high row counts)
-- - Nested Loops with subquery on inner side
-- - "Number of Executions" equal to outer row count
 
-- Query for plans with high-cost subqueries
SELECT 
    qs.execution_count,
    qs.total_elapsed_time / 1000000.0 AS total_sec,
    qs.total_elapsed_time / qs.execution_count / 1000.0 AS avg_ms,
    SUBSTRING(qt.text, qs.statement_start_offset/2 + 1,
        (CASE qs.statement_end_offset 
            WHEN -1 THEN DATALENGTH(qt.text)
            ELSE qs.statement_end_offset END 
        - qs.statement_start_offset)/2 + 1) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
WHERE qt.text LIKE '%SELECT%SELECT%'  -- Nested SELECT (rough pattern)
AND qs.total_elapsed_time / qs.execution_count > 1000000  -- >1 sec avg
ORDER BY avg_ms DESC;
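The same kind of plan inspection can be tried locally without a server. SQLite is not one of the engines discussed on this page; it is used here only because it ships with Python. The sketch relies on `EXPLAIN QUERY PLAN` labelling per-row subqueries with the word `CORRELATED`. The exact plan wording varies between SQLite versions, and `plan_details` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        department_id INTEGER,
        salary REAL
    )
""")

def plan_details(sql):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Subquery references e.department_id from the outer query.
correlated_plan = plan_details("""
    SELECT e.employee_id,
           (SELECT AVG(e2.salary) FROM employees e2
            WHERE e2.department_id = e.department_id)
    FROM employees e
""")

# Subquery is independent of the outer row, so it can be evaluated once.
independent_plan = plan_details("""
    SELECT e.employee_id, (SELECT AVG(salary) FROM employees)
    FROM employees e
""")

is_correlated = any("CORRELATED" in detail for detail in correlated_plan)
is_independent = not any("CORRELATED" in detail for detail in independent_plan)

for detail in correlated_plan:
    print(detail)
```

A check like this can be wrapped around a query log to flag candidates for review, though a flagged query is only a suspect until its actual row counts are measured.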

The 'loops' Indicator

When Correlated Subqueries Are Acceptable

While this page emphasizes the performance dangers of correlated subqueries, they're not universally bad. In specific situations, they may be acceptable or even optimal.

Scenarios Where Correlated Subqueries May Be Acceptable:

Acceptable Use Cases

•Very small outer result sets — If the outer query returns only 10-100 rows, the subquery executes only 10-100 times. The overhead is negligible.
•EXISTS with early termination — EXISTS stops at the first match. If matches are common and indexed, each subquery execution is a single index probe.
•Subquery is highly selective and indexed — If the correlated subquery accesses a small amount of indexed data per execution, the overhead is bounded.
•Readability is paramount for rare queries — For ad-hoc analytics or rarely-run reports, clarity may outweigh micro-optimization.
•Optimizer decorrelates successfully — Modern optimizers sometimes rewrite correlated subqueries as joins. Check the execution plan to verify.
•LATERAL joins are the best solution — LATERAL (or CROSS APPLY) is explicitly correlated but with modern optimizer support for efficient execution.

acceptable_correlated.sql
-- Example 1: EXISTS with good indexing
-- Finding active users who have admin roles (rare combination)
SELECT u.*
FROM users u
WHERE u.status = 'active'  -- Very selective: 100 active users
AND EXISTS (
    SELECT 1 
    FROM user_roles ur 
    WHERE ur.user_id = u.user_id 
    AND ur.role_name = 'admin'
    -- Index on user_roles(user_id, role_name) makes this one index seek per user
);
-- 100 users × 1 index seek = 100 index seeks: acceptable
 
-- Example 2: LATERAL for top-N per group (PostgreSQL; SQL Server uses CROSS APPLY with TOP 3 instead of LIMIT 3)
SELECT c.customer_id, c.name, recent_orders.*
FROM customers c
CROSS JOIN LATERAL (
    SELECT o.order_id, o.order_date, o.total
    FROM orders o
    WHERE o.customer_id = c.customer_id
    ORDER BY o.order_date DESC
    LIMIT 3
) recent_orders;
-- LATERAL is correlated but allows LIMIT within the correlation
-- With index on orders(customer_id, order_date DESC), this is efficient
 
-- Example 3: Optimizer-decorrelated query
-- Some databases automatically rewrite this:
SELECT e.*
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary) FROM employees e2 WHERE e2.department_id = e.department_id
);
 
-- Check the execution plan:
-- If it shows Hash Join or Merge Join instead of Nested Loops,
-- the optimizer decorrelated the query. 
-- In that case, the correlated syntax has no performance penalty.
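Example 1 can be reproduced on a small scale to see what plan the correlated EXISTS gets when the supporting index is present. This sketch uses Python's `sqlite3` module; the rows and the index name `idx_user_roles_user_role` are invented for illustration. The printed plan text depends on the SQLite version, so it is shown for inspection rather than relied on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE user_roles (user_id INTEGER NOT NULL, role_name TEXT NOT NULL);
    -- Composite index matching the correlated predicate (hypothetical name)
    CREATE INDEX idx_user_roles_user_role ON user_roles (user_id, role_name);

    INSERT INTO users VALUES (1, 'active'), (2, 'active'), (3, 'inactive');
    INSERT INTO user_roles VALUES (1, 'admin'), (2, 'editor'), (3, 'admin');
""")

ACTIVE_ADMINS = """
    SELECT u.user_id
    FROM users u
    WHERE u.status = 'active'
      AND EXISTS (
          SELECT 1 FROM user_roles ur
          WHERE ur.user_id = u.user_id
            AND ur.role_name = 'admin'
      )
    ORDER BY u.user_id
"""

active_admins = conn.execute(ACTIVE_ADMINS).fetchall()
print(active_admins)

# Print the plan so the access path for user_roles can be inspected.
for row in conn.execute("EXPLAIN QUERY PLAN " + ACTIVE_ADMINS):
    print(row[3])
```

If the plan shows `user_roles` being searched through the index rather than scanned, each EXISTS check costs one short index lookup, which is the situation described as acceptable above.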

Always Verify with Execution Plans

Summary: Correlated Subquery Issues

Correlated subqueries are a common source of catastrophic query performance problems. Their innocent syntax hides quadratic execution complexity that only manifests at production scale.

Key Takeaways

•Correlated subqueries execute once per outer row — They create O(n × m) complexity instead of O(n + m) for equivalent joins.
•The signature is referencing outer query tables — Any subquery that uses table aliases from the outer query is correlated.
•Common patterns: scalar SELECT subqueries, per-group comparisons, EXISTS — These appear frequently in business logic queries.
•Rewrite to JOINs for set-based execution — Pre-compute aggregates in CTEs or derived tables, then join.
•Window functions elegantly solve many patterns — AVG() OVER, ROW_NUMBER() OVER, and ranking replace common correlated aggregates.
•Detect via execution plans: look for loops = outer row count — 'SubPlan with loops=10000' or 'DEPENDENT SUBQUERY' are warning signs.
•Correlated subqueries are sometimes acceptable — Small outer sets, EXISTS with indexes, and successful decorrelation by the optimizer.
