Loading learning content...
Every subquery in SQL falls into one of two fundamental categories: correlated or non-correlated (also called independent or simple subqueries). This distinction isn't just academic—it determines:
Understanding this dichotomy deeply enables you to choose the right approach for each situation, recognize when queries can be rewritten for performance, and avoid common pitfalls that arise from confusion between the two types.
By the end of this page, you will clearly distinguish correlated from non-correlated subqueries, understand their execution models, know when each type is appropriate, and recognize opportunities to transform one into the other for clarity or performance.
The defining characteristic that separates correlated from non-correlated subqueries is dependency on the outer query.
Non-Correlated (Independent) Subquery:
Correlated (Dependent) Subquery:
Non-Correlated Example:
12345678910111213141516
-- Find employees earning above-- the company averageSELECT name, salaryFROM employeesWHERE salary > ( SELECT AVG(salary) FROM employees); -- Analysis:-- • Subquery: SELECT AVG(salary) FROM employees-- • No reference to outer query-- • Executes ONCE, returns (e.g.) 75000-- • Outer query becomes:-- WHERE salary > 75000-- • Same comparison for every rowCorrelated Example:
12345678910111213141516
-- Find employees earning above-- THEIR department's averageSELECT e.name, e.salaryFROM employees eWHERE e.salary > ( SELECT AVG(e2.salary) FROM employees e2 WHERE e2.dept_id = e.dept_id); -- Analysis:-- • Subquery references e.dept_id-- • For Engineering: AVG might be 90000-- • For Sales: AVG might be 65000 -- • Different value per department-- • Conceptually executes per rowTo identify subquery type: Look inside the subquery for any reference to tables or aliases defined only in the outer query. If found → correlated. If the subquery could run standalone and return meaningful results → non-correlated.
The execution models differ fundamentally, which directly impacts performance characteristics and optimization opportunities.
Non-Correlated Subquery Execution:
1. Execute subquery ONCE
2. Store result (single value, row, or table)
3. For each outer row:
- Use stored subquery result in predicate
- Evaluate outer row
Correlated Subquery Execution (Conceptual):
1. For each outer row:
a. Substitute outer row values into subquery
b. Execute subquery with those values
c. Use subquery result for this specific row
d. Evaluate outer row
1234567891011121314151617181920212223242526272829303132333435
-- Sample data:-- employees: (1, 'Alice', 90000, 1), (2, 'Bob', 70000, 1),-- (3, 'Carol', 80000, 2), (4, 'David', 60000, 2)-- Dept 1 avg: 80000, Dept 2 avg: 70000, Company avg: 75000 -- NON-CORRELATED EXECUTION:SELECT name, salary FROM employeesWHERE salary > (SELECT AVG(salary) FROM employees); -- Step 1: Execute subquery → 75000-- Step 2: Process rows:-- Alice: 90000 > 75000? YES → include-- Bob: 70000 > 75000? NO → exclude-- Carol: 80000 > 75000? YES → include-- David: 60000 > 75000? NO → exclude-- Result: Alice, Carol -- CORRELATED EXECUTION (conceptual):SELECT e.name, e.salary FROM employees eWHERE e.salary > (SELECT AVG(e2.salary) FROM employees e2 WHERE e2.dept_id = e.dept_id); -- For Alice (dept_id=1):-- Execute: SELECT AVG(salary) WHERE dept_id=1 → 80000-- 90000 > 80000? YES → include-- For Bob (dept_id=1):-- Execute: SELECT AVG(salary) WHERE dept_id=1 → 80000-- 70000 > 80000? NO → exclude-- For Carol (dept_id=2):-- Execute: SELECT AVG(salary) WHERE dept_id=2 → 70000-- 80000 > 70000? YES → include-- For David (dept_id=2):-- Execute: SELECT AVG(salary) WHERE dept_id=2 → 70000-- 60000 > 70000? NO → exclude-- Result: Alice, CarolModern optimizers often transform both types. Non-correlated subqueries may be inlined as constants. Correlated subqueries may be converted to joins with aggregation. The 'conceptual' model describes semantics; actual execution may differ significantly.
Beyond execution differences, correlated and non-correlated subqueries have different expressive capabilities. Some questions can only be answered with correlation.
| Question Type | Non-Correlated | Correlated |
|---|---|---|
| Compare to global aggregate | ✓ Natural fit | Possible but overkill |
| Compare to group-specific aggregate | ✗ Cannot express | ✓ Required |
| Check existence of ANY related row | ✓ Via IN | ✓ Via EXISTS (often better) |
| Check per-row relationship condition | ✗ Cannot express | ✓ Required |
| Compute row-specific derived value | ✗ Cannot express | ✓ Scalar correlated subquery |
| Find maximum in each group | Complex | ✓ Natural with correlation |
| Compare row to its own group | ✗ Cannot express | ✓ Required (or window function) |
123456789101112131415161718192021222324252627282930313233
-- REQUIRES CORRELATION: Per-department comparison-- "Find products priced above their category average"SELECT p.name, p.price, p.category_idFROM products pWHERE p.price > ( SELECT AVG(p2.price) FROM products p2 WHERE p2.category_id = p.category_id -- Must correlate on category);-- Each product needs comparison against a DIFFERENT average -- NON-CORRELATED WORKS: Global comparison-- "Find products priced above the overall average"SELECT p.name, p.priceFROM products pWHERE p.price > (SELECT AVG(price) FROM products);-- Every product compared to SAME value -- REQUIRES CORRELATION: Find latest order per customerSELECT c.name, o.order_date, o.amountFROM customers cJOIN orders o ON o.customer_id = c.customer_idWHERE o.order_date = ( SELECT MAX(o2.order_date) FROM orders o2 WHERE o2.customer_id = c.customer_id -- Different max per customer); -- NON-CORRELATED ALTERNATIVE (different semantics!):SELECT c.name, o.order_date, o.amount FROM customers cJOIN orders o ON o.customer_id = c.customer_idWHERE o.order_date = (SELECT MAX(order_date) FROM orders);-- This finds orders matching the GLOBAL latest date, not per-customerThe difference between 'above average' and 'above THEIR group's average' is semantically crucial. Using non-correlated when you need correlated (or vice versa) produces incorrect results, not errors. Always verify your query matches the business question.
In many cases, queries can be rewritten from one form to another. Understanding these transformations helps optimize queries and understand optimizer behavior.
Rewriting Correlated Subqueries as JOINs:
Many correlated subqueries can be converted to JOINs with aggregation.
1234567891011121314151617181920212223
-- CORRELATED VERSION:SELECT e.name, e.salary, e.dept_idFROM employees eWHERE e.salary > ( SELECT AVG(e2.salary) FROM employees e2 WHERE e2.dept_id = e.dept_id); -- JOIN + AGGREGATION VERSION:SELECT e.name, e.salary, e.dept_idFROM employees eJOIN ( SELECT dept_id, AVG(salary) as avg_salary FROM employees GROUP BY dept_id) dept_avgs ON dept_avgs.dept_id = e.dept_idWHERE e.salary > dept_avgs.avg_salary; -- Benefits of JOIN version:-- • Subquery executes once, not per-row-- • Explicit aggregate computation-- • Often easier for optimizer to handle-- • avg_salary can be selected if neededThe theoretical performance model—non-correlated executes once, correlated executes per row—suggests correlated subqueries are always slower. Reality is more nuanced due to optimizer transformations and caching.
1234567891011121314151617181920212223242526272829
-- ANALYZE EXECUTION PLANS: -- Check if optimizer decorrelates your subquery:EXPLAIN ANALYZESELECT e.name, e.salaryFROM employees eWHERE e.salary > ( SELECT AVG(e2.salary) FROM employees e2 WHERE e2.dept_id = e.dept_id); -- Look for:-- • "SubPlan" or "Subquery Scan" → executing as subquery (potentially slow)-- • "HashAggregate" + "Hash Join" → decorrelated to join (good)-- • "Merge Join" with pre-computed aggregates → optimal transformation -- COMPARE ALTERNATIVES:EXPLAIN ANALYZESELECT e.name, e.salaryFROM employees eJOIN ( SELECT dept_id, AVG(salary) as avg_sal FROM employees GROUP BY dept_id ) d ON d.dept_id = e.dept_idWHERE e.salary > d.avg_sal; -- Often, both produce identical plans!-- If not, choose the faster one.Worry about correlated subquery performance when: (1) EXPLAIN shows repeated subquery scans, (2) the outer query returns many rows with many distinct correlation values, (3) the subquery is complex with its own joins, (4) indexes on correlation columns are missing.
Use this decision framework when writing subqueries:
START HERE │ ▼ ┌───────────────────────────────┐ │ Does the subquery need │ │ different results for │ │ different outer rows? │ └───────────────┬───────────────┘ │ ┌──────────┴──────────┐ │ │ ▼ NO ▼ YES ┌─────────────────┐ ┌─────────────────────┐ │ NON-CORRELATED │ │ CORRELATED required │ │ subquery works │ │ (or JOIN + GROUP BY │ │ │ │ or window function)│ └────────┬────────┘ └──────────┬──────────┘ │ │ ▼ ▼ ┌─────────────────┐ ┌─────────────────────┐ │ Does subquery │ │ Can use window │ │ return single │ │ function instead? │ │ value, row, or │ │ │ │ table? │ │ (AVG() OVER, etc.) │ └────────┬────────┘ └──────────┬──────────┘ │ │ ▼ │ Use appropriate ▼ scalar/row/table ┌──────┴──────┐ subquery syntax │ YES │ NO ▼ ▼ Prefer window Use correlated function for subquery or clarity JOIN + aggregation| Scenario | Recommendation | Rationale |
|---|---|---|
| Compare to global statistic | Non-correlated | Same value for all rows |
| Compare to per-group statistic | Window function or correlated | Different value per group |
| Check if related rows exist | EXISTS (correlated) | Clearer, handles NULLs |
| Compute per-row derived value | Window function preferred | Single pass, cleaner |
| Find max/min in each group | Window or correlated | Both work well |
| Simple membership test | IN with subquery | Clear, often optimized |
| Complex relationship filter | EXISTS with correlation | Most expressive |
Understanding common errors helps you avoid them and debug problematic queries.
12345678910111213141516171819202122232425262728293031323334353637383940414243
-- MISTAKE 1: Accidental correlation-- Intended: Find employees above global averageSELECT name FROM employeesWHERE salary > (SELECT AVG(salary) FROM employees); -- Correct ✓ -- But wrote (typo: forgot alias in subquery):SELECT e.name FROM employees eWHERE e.salary > (SELECT AVG(salary) FROM employees e); -- The 'e' in subquery shadows outer 'e' - still correct, but confusing -- Worse: Accidental self-referenceSELECT e.name FROM employees eWHERE e.salary > (SELECT AVG(salary) FROM employees WHERE dept = e.dept);-- Was non-correlated intended? Now it's correlated! Different results. -- MISTAKE 2: Missing correlation when needed-- Intended: Find products above their category averageSELECT name FROM productsWHERE price > (SELECT AVG(price) FROM products); -- WRONG!-- This compares to GLOBAL average, not per-category -- Correct:SELECT p.name FROM products pWHERE p.price > (SELECT AVG(p2.price) FROM products p2 WHERE p2.category_id = p.category_id); -- Correlated ✓ -- MISTAKE 3: Assuming correlated is always slower-- Sometimes correlated is BETTER due to selectivity:-- If outer query returns 10 rows, correlated subquery runs 10 times-- A join might compute all combinations unnecessarily -- MISTAKE 4: Not using window functions when appropriate-- Overly complex correlated approach:SELECT e.name, (SELECT COUNT(*) FROM employees e2 WHERE e2.salary > e.salary) + 1 as rankFROM employees e; -- Simpler window function approach:SELECT name, RANK() OVER (ORDER BY salary DESC) as rankFROM employees;After writing a subquery, explicitly verify: 'Did I mean this to be correlated or independent?' Accidental correlation (or lack thereof) changes query semantics silently—no error, just wrong results.
We've comprehensively compared correlated and non-correlated subqueries. Here are the essential takeaways:
Coming up next: We'll dive deep into performance considerations for correlated subqueries—understanding when they're costly, how to identify performance issues, and techniques for optimization when necessary.
You now have a thorough understanding of the distinction between correlated and non-correlated subqueries. This knowledge enables you to choose the right type for each query, understand optimizer transformations, and avoid common semantic errors.