Database Management Systems: SQL Joins & Subqueries

Correlated Subqueries

Level: Intermediate
Duration: 60 mins
Topic: SQL Joins & Subqueries

Correlated vs Non-Correlated

Two Flavors of Subqueries

Every subquery in SQL falls into one of two fundamental categories: correlated or non-correlated (also called independent or simple subqueries). This distinction isn't just academic—it determines:

• How the database engine executes the query
• Whether transformations to joins are possible
• The potential performance characteristics
• What questions the query can answer

Understanding this dichotomy deeply enables you to choose the right approach for each situation, recognize when queries can be rewritten for performance, and avoid common pitfalls that arise from confusion between the two types.

What You Will Learn

By the end of this page, you will clearly distinguish correlated from non-correlated subqueries, understand their execution models, know when each type is appropriate, and recognize opportunities to transform one into the other for clarity or performance.

The Fundamental Distinction

The defining characteristic that separates correlated from non-correlated subqueries is dependency on the outer query.

Non-Correlated (Independent) Subquery:

• Does NOT reference any columns from the outer query
• Can be executed independently, in isolation
• Returns the same result regardless of outer query rows
• Conceptually executed once, result reused for all outer rows

Correlated (Dependent) Subquery:

• References one or more columns from the outer query
• Cannot be executed independently; it needs outer row context
• Returns potentially different results for different outer rows
• Conceptually executed once per outer row

Non-Correlated Example:

non_correlated.sql
-- Find employees earning above
-- the company average
SELECT name, salary
FROM employees
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
);
 
-- Analysis:
-- • Subquery: SELECT AVG(salary) FROM employees
-- • No reference to outer query
-- • Executes ONCE, returns (e.g.) 75000
-- • Outer query becomes:
--   WHERE salary > 75000
-- • Same comparison for every row

Correlated Example:

correlated.sql
-- Find employees earning above
-- THEIR department's average
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.dept_id = e.dept_id
);
 
-- Analysis:
-- • Subquery references e.dept_id
-- • For Engineering: AVG might be 90000
-- • For Sales: AVG might be 65000  
-- • Different value per department
-- • Conceptually executes per row

Quick Identification

To identify subquery type: Look inside the subquery for any reference to tables or aliases defined only in the outer query. If found → correlated. If the subquery could run standalone and return meaningful results → non-correlated.

Execution Model Comparison

The execution models differ fundamentally, which directly impacts performance characteristics and optimization opportunities.

Non-Correlated Subquery Execution:

1. Execute subquery ONCE
2. Store result (single value, row, or table)
3. For each outer row:
   - Use stored subquery result in predicate
   - Evaluate outer row

Correlated Subquery Execution (Conceptual):

1. For each outer row:
   a. Substitute outer row values into subquery
   b. Execute subquery with those values
   c. Use subquery result for this specific row
   d. Evaluate outer row
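
The two step lists above can be sketched in plain Python. This is only an illustration of the conceptual model, not of how any real engine executes a query; the function names are hypothetical, and the four rows are the sample employees used in the execution trace on this page.

```python
from statistics import mean

# Sample rows as (name, salary, dept_id), taken from the page's running example.
EMPLOYEES = [
    ("Alice", 90000, 1),
    ("Bob", 70000, 1),
    ("Carol", 80000, 2),
    ("David", 60000, 2),
]


def non_correlated(rows):
    """Evaluate the subquery once, then reuse its result for every outer row."""
    company_avg = mean(salary for _, salary, _ in rows)  # the single execution
    executions = 1
    result = [name for name, salary, _ in rows if salary > company_avg]
    return result, executions


def correlated(rows):
    """Re-evaluate the subquery for each outer row, using that row's dept_id."""
    executions = 0
    result = []
    for name, salary, dept_id in rows:
        # Substitute the outer row's dept_id into the subquery, then run it.
        dept_avg = mean(s for _, s, d in rows if d == dept_id)
        executions += 1
        if salary > dept_avg:
            result.append(name)
    return result, executions


if __name__ == "__main__":
    print(non_correlated(EMPLOYEES))  # (['Alice', 'Carol'], 1)
    print(correlated(EMPLOYEES))      # (['Alice', 'Carol'], 4)
```

Both functions happen to select the same employees for this data; what differs is the number of subquery evaluations, one versus one per outer row.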

execution_trace_comparison.sql
-- Sample data:
-- employees: (1, 'Alice', 90000, 1), (2, 'Bob', 70000, 1),
--            (3, 'Carol', 80000, 2), (4, 'David', 60000, 2)
-- Dept 1 avg: 80000, Dept 2 avg: 70000, Company avg: 75000
 
-- NON-CORRELATED EXECUTION:
SELECT name, salary FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
 
-- Step 1: Execute subquery → 75000
-- Step 2: Process rows:
--   Alice:  90000 > 75000? YES → include
--   Bob:    70000 > 75000? NO  → exclude
--   Carol:  80000 > 75000? YES → include
--   David:  60000 > 75000? NO  → exclude
-- Result: Alice, Carol
 
-- CORRELATED EXECUTION (conceptual):
SELECT e.name, e.salary FROM employees e
WHERE e.salary > (SELECT AVG(e2.salary) FROM employees e2 
                  WHERE e2.dept_id = e.dept_id);
 
-- For Alice (dept_id=1):
--   Execute: SELECT AVG(salary) FROM employees WHERE dept_id=1 → 80000
--   90000 > 80000? YES → include
-- For Bob (dept_id=1):
--   Execute: SELECT AVG(salary) FROM employees WHERE dept_id=1 → 80000
--   70000 > 80000? NO → exclude
-- For Carol (dept_id=2):
--   Execute: SELECT AVG(salary) FROM employees WHERE dept_id=2 → 70000
--   80000 > 70000? YES → include
-- For David (dept_id=2):
--   Execute: SELECT AVG(salary) FROM employees WHERE dept_id=2 → 70000
--   60000 > 70000? NO → exclude
-- Result: Alice, Carol

Optimizer Transformations

Modern optimizers often transform both types. Non-correlated subqueries may be inlined as constants. Correlated subqueries may be converted to joins with aggregation. The 'conceptual' model describes semantics; actual execution may differ significantly.

Semantic Differences: What Each Can Express

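Beyond execution differences, correlated and non-correlated subqueries have different expressive capabilities. A non-correlated subquery used in a predicate yields one result for the whole query, so questions that need a different answer per outer row call for correlation or an equivalent rewrite, such as a join to a grouped derived table or a window function.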

Expressive Capability Comparison
Question Type	Non-Correlated	Correlated
Compare to global aggregate	✓ Natural fit	Possible but overkill
Compare to group-specific aggregate	✗ Cannot express	✓ Required
Check existence of ANY related row	✓ Via IN	✓ Via EXISTS (often better)
Check per-row relationship condition	✗ Cannot express	✓ Required
Compute row-specific derived value	✗ Cannot express	✓ Scalar correlated subquery
Find maximum in each group	Complex	✓ Natural with correlation
Compare row to its own group	✗ Cannot express	✓ Required (or window function)

semantic_requirements.sql
-- REQUIRES CORRELATION: Per-category comparison
-- "Find products priced above their category average"
SELECT p.name, p.price, p.category_id
FROM products p
WHERE p.price > (
    SELECT AVG(p2.price) FROM products p2
    WHERE p2.category_id = p.category_id  -- Must correlate on category
);
-- Each product needs comparison against a DIFFERENT average
 
-- NON-CORRELATED WORKS: Global comparison
-- "Find products priced above the overall average"
SELECT p.name, p.price
FROM products p
WHERE p.price > (SELECT AVG(price) FROM products);
-- Every product compared to SAME value
 
-- REQUIRES CORRELATION: Find latest order per customer
SELECT c.name, o.order_date, o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_date = (
    SELECT MAX(o2.order_date)
    FROM orders o2
    WHERE o2.customer_id = c.customer_id  -- Different max per customer
);
 
-- NON-CORRELATED ALTERNATIVE (different semantics!):
SELECT c.name, o.order_date, o.amount  
FROM customers c
JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_date = (SELECT MAX(order_date) FROM orders);
-- This finds orders matching the GLOBAL latest date, not per-customer

Semantic Precision Matters

The difference between 'above average' and 'above THEIR group's average' is semantically crucial. Using non-correlated when you need correlated (or vice versa) produces incorrect results, not errors. Always verify your query matches the business question.
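
To see how silently the two forms diverge, the latest-order queries can be run against a tiny in-memory SQLite database. This is a minimal sketch: the schema follows the column names used in the queries above, while the customers, dates, and amounts are made-up illustration data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        order_date  TEXT,  -- ISO-8601 strings sort chronologically
        amount      REAL
    );
    -- Hypothetical sample data
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO orders VALUES
        (1, 1, '2024-01-10', 50.0),
        (2, 1, '2024-03-05', 75.0),
        (3, 2, '2024-02-20', 20.0);
""")

# Correlated: MAX(order_date) is computed separately for each customer.
LATEST_PER_CUSTOMER = """
    SELECT c.name, o.order_date
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.order_date = (
        SELECT MAX(o2.order_date)
        FROM orders o2
        WHERE o2.customer_id = c.customer_id
    )
    ORDER BY c.name
"""

# Non-correlated: one global MAX(order_date) shared by all rows.
GLOBAL_LATEST = """
    SELECT c.name, o.order_date
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    WHERE o.order_date = (SELECT MAX(order_date) FROM orders)
    ORDER BY c.name
"""

per_customer = conn.execute(LATEST_PER_CUSTOMER).fetchall()
global_latest = conn.execute(GLOBAL_LATEST).fetchall()

print(per_customer)   # [('Ann', '2024-03-05'), ('Ben', '2024-02-20')]
print(global_latest)  # [('Ann', '2024-03-05')]
```

Neither query raises an error; Ben simply disappears from the second result because his latest order is not the globally latest one.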

Transformation Between Types

In many cases, queries can be rewritten from one form to another. Understanding these transformations helps optimize queries and understand optimizer behavior.

Rewriting Correlated Subqueries as JOINs:

Many correlated subqueries can be converted to JOINs with aggregation.

correlated_to_join.sql
-- CORRELATED VERSION:
SELECT e.name, e.salary, e.dept_id
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary) FROM employees e2
    WHERE e2.dept_id = e.dept_id
);
 
-- JOIN + AGGREGATION VERSION:
SELECT e.name, e.salary, e.dept_id
FROM employees e
JOIN (
    SELECT dept_id, AVG(salary) as avg_salary
    FROM employees
    GROUP BY dept_id
) dept_avgs ON dept_avgs.dept_id = e.dept_id
WHERE e.salary > dept_avgs.avg_salary;
 
-- Benefits of JOIN version:
-- • Subquery executes once, not per-row
-- • Explicit aggregate computation
-- • Often easier for optimizer to handle
-- • avg_salary can be selected if needed
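
One way to gain confidence in such a rewrite is to run both forms on the same data and compare the results. The sketch below does that in an in-memory SQLite database, using the four sample employees from the execution trace plus one hypothetical row (Eve) with a NULL dept_id to check that both forms treat it the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, dept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", 90000, 1),
        (2, "Bob", 70000, 1),
        (3, "Carol", 80000, 2),
        (4, "David", 60000, 2),
        (5, "Eve", 95000, None),  # hypothetical row with no department
    ],
)

CORRELATED = """
    SELECT e.name, e.salary, e.dept_id
    FROM employees e
    WHERE e.salary > (
        SELECT AVG(e2.salary) FROM employees e2
        WHERE e2.dept_id = e.dept_id
    )
    ORDER BY e.name
"""

JOIN_AGGREGATE = """
    SELECT e.name, e.salary, e.dept_id
    FROM employees e
    JOIN (
        SELECT dept_id, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY dept_id
    ) dept_avgs ON dept_avgs.dept_id = e.dept_id
    WHERE e.salary > dept_avgs.avg_salary
    ORDER BY e.name
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
join_rows = conn.execute(JOIN_AGGREGATE).fetchall()

print(correlated_rows)  # [('Alice', 90000, 1), ('Carol', 80000, 2)]
print(join_rows == correlated_rows)  # True
```

Eve is excluded by both queries: in the correlated form the subquery finds no matching rows and returns NULL, and in the join form a NULL dept_id never satisfies the join condition.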

Performance Characteristics

The theoretical performance model—non-correlated executes once, correlated executes per row—suggests correlated subqueries are always slower. Reality is more nuanced due to optimizer transformations and caching.

Performance Realities

• Optimizer decorrelation: Modern optimizers often 'decorrelate' subqueries, transforming them into joins internally. Your correlated subquery may execute as efficiently as a hand-written join.
• Result caching: If the same correlation values repeat, some engines cache the subquery result per distinct value. A correlated subquery over 1M rows with only 10 distinct department IDs might then execute only 10 times.
• Index utilization: Correlated subqueries can leverage indexes on correlation columns. A well-indexed correlated subquery may outperform a join that must scan full tables.
• Outer query cardinality: If the outer query returns few rows, per-row subquery execution is cheap. This is why selective WHERE conditions that shrink the outer row set before the subquery runs help performance.
• Subquery complexity: Simple subqueries (especially EXISTS) are easily optimized. Complex subqueries with multiple joins may resist optimization.
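
The result-caching point can be illustrated with a small Python sketch that memoizes the per-department average. It models the idea only; whether and how a given database caches subquery results is engine-specific, and the function name dept_avg is hypothetical.

```python
from functools import lru_cache
from statistics import mean

# Sample rows as (name, salary, dept_id), taken from the page's running example.
EMPLOYEES = [
    ("Alice", 90000, 1),
    ("Bob", 70000, 1),
    ("Carol", 80000, 2),
    ("David", 60000, 2),
]


@lru_cache(maxsize=None)
def dept_avg(dept_id):
    """Stand-in for the correlated subquery; cached per distinct dept_id."""
    return mean(s for _, s, d in EMPLOYEES if d == dept_id)


# The "outer query": one lookup per row, but only one real computation
# per distinct correlation value.
above_avg = [name for name, salary, dept_id in EMPLOYEES if salary > dept_avg(dept_id)]

info = dept_avg.cache_info()
print(above_avg)               # ['Alice', 'Carol']
print(info.misses, info.hits)  # 2 2  (two real executions, two cache hits)
```

With four outer rows and two departments, the subquery body runs twice instead of four times; the saving grows as the ratio of rows to distinct correlation values grows.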

performance_analysis.sql
-- ANALYZE EXECUTION PLANS:
 
-- Check if optimizer decorrelates your subquery:
EXPLAIN ANALYZE
SELECT e.name, e.salary
FROM employees e
WHERE e.salary > (
    SELECT AVG(e2.salary) FROM employees e2
    WHERE e2.dept_id = e.dept_id
);
 
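-- Look for (node names are PostgreSQL's; other engines label plans differently):
-- • "SubPlan" → subquery re-executed per outer row (potentially slow)
-- • "InitPlan" → subquery evaluated once and reused (non-correlated)
-- • "HashAggregate" + "Hash Join" → aggregated once, then joined (good)
-- • "Merge Join" with pre-computed aggregates → also a set-based plan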
 
-- COMPARE ALTERNATIVES:
EXPLAIN ANALYZE
SELECT e.name, e.salary
FROM employees e
JOIN (
    SELECT dept_id, AVG(salary) as avg_sal
    FROM employees
    GROUP BY dept_id  
) d ON d.dept_id = e.dept_id
WHERE e.salary > d.avg_sal;
 
-- Depending on the engine, both forms may produce identical plans.
-- If they differ, measure both and choose the faster one.
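
The same kind of check can be scripted. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module and looks for the word CORRELATED in the plan text. It assumes SQLite's current plan wording, which may change between versions, and the helper names plan and is_correlated are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, dept_id INTEGER)"
)

CORRELATED = """
    SELECT e.name, e.salary
    FROM employees e
    WHERE e.salary > (
        SELECT AVG(e2.salary) FROM employees e2
        WHERE e2.dept_id = e.dept_id
    )
"""

NON_CORRELATED = """
    SELECT name, salary
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
"""


def plan(sql):
    """Return the human-readable detail text of each plan row."""
    # Each EXPLAIN QUERY PLAN row has four columns; the last one is the text.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def is_correlated(sql):
    """True if SQLite reports a correlated subquery in the plan."""
    return any("CORRELATED" in detail for detail in plan(sql))


for label, sql in (("correlated", CORRELATED), ("non-correlated", NON_CORRELATED)):
    print(label, "->", plan(sql))
```

Plan output is meant for human inspection, so a string match like this is best treated as a quick diagnostic rather than something to build logic on.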

When to Worry

Worry about correlated subquery performance when: (1) EXPLAIN shows repeated subquery scans, (2) the outer query returns many rows with many distinct correlation values, (3) the subquery is complex with its own joins, (4) indexes on correlation columns are missing.

Choosing Between Correlated and Non-Correlated

Use this decision framework when writing subqueries:

decision_flowchart.txt
                    START HERE
                         │
                         ▼
         ┌───────────────────────────────┐
         │ Does the subquery need        │
         │ different results for         │
         │ different outer rows?         │
         └───────────────┬───────────────┘
                         │
              ┌──────────┴──────────┐
              │                     │
              ▼ NO                  ▼ YES
    ┌─────────────────┐   ┌─────────────────────┐
    │ NON-CORRELATED  │   │ CORRELATED required │
    │ subquery works  │   │ (or JOIN + GROUP BY │
    │                 │   │  or window function)│
    └────────┬────────┘   └──────────┬──────────┘
             │                       │
             ▼                       ▼
    ┌─────────────────┐   ┌─────────────────────┐
    │ Does subquery   │   │ Can use window      │
    │ return single   │   │ function instead?   │
    │ value, row, or  │   │                     │
    │ table?          │   │ (AVG() OVER, etc.)  │
    └────────┬────────┘   └──────────┬──────────┘
             │                       │
             ▼                       │
    Use appropriate                  ▼
    scalar/row/table          ┌──────┴──────┐
    subquery syntax           │ YES         │ NO
                              ▼             ▼
                     Prefer window    Use correlated
                     function for     subquery or
                     clarity          JOIN + aggregation

Quick Decision Reference
Scenario	Recommendation	Rationale
Compare to global statistic	Non-correlated	Same value for all rows
Compare to per-group statistic	Window function or correlated	Different value per group
Check if related rows exist	EXISTS (correlated)	Clearer; NOT EXISTS avoids the NULL trap of NOT IN
Compute per-row derived value	Window function preferred	Single pass, cleaner
Find max/min in each group	Window or correlated	Both work well
Simple membership test	IN with subquery	Clear, often optimized
Complex relationship filter	EXISTS with correlation	Most expressive
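
For the per-group statistic row, the window-function alternative can be sketched as follows, again against an in-memory SQLite database with the sample employees from this page. It assumes a SQLite build with window-function support (3.25 or later); the derived-table alias with_avg and the column alias dept_avg are illustrative names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, dept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "Alice", 90000, 1),
        (2, "Bob", 70000, 1),
        (3, "Carol", 80000, 2),
        (4, "David", 60000, 2),
    ],
)

# The window function attaches each row's department average to the row;
# the outer query then filters on it. A window function cannot appear
# directly in WHERE, hence the derived table.
WINDOW_SQL = """
    SELECT name, salary
    FROM (
        SELECT name, salary,
               AVG(salary) OVER (PARTITION BY dept_id) AS dept_avg
        FROM employees
    ) AS with_avg
    WHERE salary > dept_avg
    ORDER BY name
"""

# Window functions need SQLite 3.25+; skip gracefully on older builds.
SUPPORTS_WINDOWS = sqlite3.sqlite_version_info >= (3, 25, 0)
window_rows = conn.execute(WINDOW_SQL).fetchall() if SUPPORTS_WINDOWS else None

print(window_rows)  # [('Alice', 90000), ('Carol', 80000)] when supported
```

The result matches the correlated version traced earlier, and the department average is available as a column if the report needs to show it.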

Common Mistakes and Misconceptions

Understanding common errors helps you avoid them and debug problematic queries.

common_mistakes.sql
-- MISTAKE 1: Accidental correlation
-- Intended: Find employees above global average
SELECT name FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);  -- Correct ✓
 
-- But wrote (reused the outer alias 'e' inside the subquery):
SELECT e.name FROM employees e
WHERE e.salary > (SELECT AVG(salary) FROM employees e);  
-- The inner 'e' shadows the outer 'e': still non-correlated and correct, but confusing
 
-- Worse: accidental correlation on the outer row
SELECT e.name FROM employees e
WHERE e.salary > (SELECT AVG(salary) FROM employees WHERE dept_id = e.dept_id);
-- Was non-correlated intended? Now it's correlated, so it compares against the
-- department average and returns different results.
 
 
-- MISTAKE 2: Missing correlation when needed
-- Intended: Find products above their category average
SELECT name FROM products
WHERE price > (SELECT AVG(price) FROM products);  -- WRONG!
-- This compares to GLOBAL average, not per-category
 
-- Correct:
SELECT p.name FROM products p
WHERE p.price > (SELECT AVG(p2.price) FROM products p2 
                 WHERE p2.category_id = p.category_id);  -- Correlated ✓
 
 
-- MISTAKE 3: Assuming correlated is always slower
-- Sometimes correlated is BETTER due to selectivity:
-- If outer query returns 10 rows, correlated subquery runs 10 times
-- A join might compute all combinations unnecessarily
 
 
-- MISTAKE 4: Not using window functions when appropriate
-- Overly complex correlated approach:
SELECT e.name,
    (SELECT COUNT(*) FROM employees e2 WHERE e2.salary > e.salary) + 1 AS salary_rank
FROM employees e;
 
-- Simpler window function approach:
SELECT name, RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM employees;

Verify Your Intent

After writing a subquery, explicitly verify: 'Did I mean this to be correlated or independent?' Accidental correlation (or lack thereof) changes query semantics silently—no error, just wrong results.
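
A particularly quiet form of accidental correlation comes from unqualified column names: if a name does not exist in the subquery's own tables, SQL resolves it against the outer query instead of raising an error. The sketch below reproduces this in SQLite; the closed_depts table and its data are hypothetical, chosen so that its key column is named id rather than dept_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, dept_id INTEGER
    );
    -- Hypothetical table: its key column is named id, not dept_id
    CREATE TABLE closed_depts (id INTEGER PRIMARY KEY);
    INSERT INTO employees VALUES
        (1, 'Alice', 90000, 1), (2, 'Bob', 70000, 1),
        (3, 'Carol', 80000, 2), (4, 'David', 60000, 2);
    INSERT INTO closed_depts VALUES (2);
""")

# Intended: employees in closed departments. But closed_depts has no dept_id,
# so the inner dept_id silently resolves to employees.dept_id (correlated).
ACCIDENTAL = """
    SELECT name FROM employees
    WHERE dept_id IN (SELECT dept_id FROM closed_depts)
    ORDER BY name
"""

# Qualifying every column turns the silent bug into a visible error.
QUALIFIED_BUG = """
    SELECT e.name FROM employees e
    WHERE e.dept_id IN (SELECT c.dept_id FROM closed_depts c)
"""

# What was actually meant.
INTENDED = """
    SELECT e.name FROM employees e
    WHERE e.dept_id IN (SELECT c.id FROM closed_depts c)
    ORDER BY e.name
"""

accidental = [row[0] for row in conn.execute(ACCIDENTAL)]
intended = [row[0] for row in conn.execute(INTENDED)]

try:
    conn.execute(QUALIFIED_BUG)
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)

print(accidental)       # ['Alice', 'Bob', 'Carol', 'David']
print(intended)         # ['Carol', 'David']
print(qualified_error)  # an error naming the missing column
```

Qualifying every column with its table alias is a cheap habit that makes the intended scope explicit and lets the database reject this class of mistake.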

Summary: Correlated vs Non-Correlated

We've comprehensively compared correlated and non-correlated subqueries. Here are the essential takeaways:

Key Takeaways

• Dependency defines the type: Correlated subqueries reference outer query columns; non-correlated can execute independently.
• Different questions, different types: Per-row comparisons require correlation; global comparisons can use non-correlated subqueries.
• Execution models differ conceptually: Non-correlated runs once; correlated runs per row (conceptually; optimizers may transform it).
• Transformations exist: Many correlated subqueries can be rewritten as JOINs or window functions, often with equivalent performance.
• Optimizer magic: Modern databases decorrelate and optimize subqueries, so performance differences are often smaller than theory suggests.
• Choose for clarity first: Write the form that most clearly expresses your intent; optimize only if profiling reveals issues.

Coming up next: We'll dive deep into performance considerations for correlated subqueries—understanding when they're costly, how to identify performance issues, and techniques for optimization when necessary.

Sound Understanding

You now have a thorough understanding of the distinction between correlated and non-correlated subqueries. This knowledge enables you to choose the right type for each query, understand optimizer transformations, and avoid common semantic errors.
