Loading content...
Consider this question: "For each employee, find their salary compared to the average salary in their own department." A simple subquery calculating the global average won't suffice—each employee needs comparison against a different average depending on which department they belong to.
This is where correlated subqueries enter the picture. Unlike regular subqueries that execute independently and return a single result, correlated subqueries reference values from the outer query, creating a dynamic relationship where the inner query is re-evaluated for each row processed by the outer query.
Correlated subqueries represent one of SQL's most powerful—and most frequently misunderstood—features. They enable queries that would otherwise require multiple passes, procedural logic, or complex joins. Understanding them deeply transforms your ability to express sophisticated data relationships in pure SQL.
By the end of this page, you will understand what makes a subquery 'correlated', how the database engine conceptually executes correlated subqueries, the distinction between correlation and simple nesting, and when correlated subqueries are the right tool for your query needs.
A correlated subquery (also called a synchronized subquery or repeating subquery) is a subquery that references one or more columns from the outer query. This reference creates a dependency between the inner and outer queries—the subquery cannot be evaluated in isolation because it requires values from the row currently being processed by the outer query.
The defining characteristic: In a correlated subquery, the inner query is conceptually executed once for each row of the outer query, using that row's values to parameterize the inner query.
Let's contrast this with a non-correlated (independent) subquery:
Non-Correlated Subquery:
12345678910111213
-- Find employees earning above-- the company-wide averageSELECT employee_id, name, salaryFROM employeesWHERE salary > ( SELECT AVG(salary) FROM employees); -- The inner query:-- - Has no reference to outer query-- - Executes ONCE, returning 75000-- - Outer query uses this constantCorrelated Subquery:
1234567891011121314
-- Find employees earning above-- THEIR department's averageSELECT e.employee_id, e.name, e.salaryFROM employees eWHERE e.salary > ( SELECT AVG(e2.salary) FROM employees e2 WHERE e2.dept_id = e.dept_id -- Correlation!); -- The inner query:-- - References e.dept_id from outer-- - Executes ONCE PER outer row-- - Returns different value per deptTo identify a correlated subquery, look for references to table aliases defined in the outer query. In the example above, 'e.dept_id' inside the subquery references the 'e' alias from the outer FROM clause—this creates the correlation. If the subquery only references its own tables, it's non-correlated.
Understanding how correlated subqueries execute is crucial for both writing correct queries and anticipating performance characteristics. While modern query optimizers may transform correlated subqueries internally, the conceptual execution model remains essential for understanding semantics.
Conceptual Execution Algorithm:
for each row R in outer_query_result:
substitute R's column values into the subquery
execute the parameterized subquery
use subquery result to evaluate WHERE/SELECT for row R
if row R satisfies conditions, include in final result
This row-by-row execution model explains both the power and potential performance implications of correlated subqueries.
Step-by-Step Execution Example:
Consider finding employees who earn more than their department's average:
123456789101112131415161718192021222324252627282930313233343536
-- Sample data:-- employees: (1, 'Alice', 90000, 'Engineering')-- (2, 'Bob', 70000, 'Engineering')-- (3, 'Carol', 80000, 'Sales')-- (4, 'David', 60000, 'Sales') SELECT e.name, e.salary, e.deptFROM employees eWHERE e.salary > ( SELECT AVG(e2.salary) FROM employees e2 WHERE e2.dept = e.dept); -- Execution trace:-- -- Row 1 (Alice, 90000, Engineering):-- Subquery: SELECT AVG(salary) WHERE dept='Engineering'-- Result: (90000 + 70000) / 2 = 80000-- Check: 90000 > 80000? YES → Include Alice---- Row 2 (Bob, 70000, Engineering):-- Subquery: SELECT AVG(salary) WHERE dept='Engineering'-- Result: 80000-- Check: 70000 > 80000? NO → Exclude Bob---- Row 3 (Carol, 80000, Sales):-- Subquery: SELECT AVG(salary) WHERE dept='Sales'-- Result: (80000 + 60000) / 2 = 70000-- Check: 80000 > 70000? YES → Include Carol---- Row 4 (David, 60000, Sales):-- Subquery: SELECT AVG(salary) WHERE dept='Sales'-- Result: 70000-- Check: 60000 > 70000? NO → Exclude David---- Final result: Alice, CarolModern database optimizers often transform correlated subqueries into joins or use caching to avoid redundant subquery executions. However, understanding the conceptual model helps you predict query behavior, debug unexpected results, and recognize when the optimizer might struggle.
Every correlated subquery has distinct structural components that work together. Understanding these components helps you construct correct queries and diagnose issues.
The Essential Components:
12345678910111213141516171819
SELECT -- Outer SELECT outer_table.column1, -- Outer columns outer_table.column2, ( -- Subquery begins SELECT aggregate(inner_table.column) FROM inner_table WHERE inner_table.key = outer_table.key -- CORRELATION POINT ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ References outer_table, creating dependency ) AS computed_column -- Subquery endsFROM outer_table -- Outer FROMWHERE outer_table.condition -- Outer WHERE -- Key structural elements:-- 1. OUTER QUERY: The main query operating on outer_table-- 2. INNER QUERY: The subquery operating on inner_table -- 3. CORRELATION PREDICATE: The WHERE clause in subquery-- that references the outer table-- 4. TABLE ALIASES: Critical for distinguishing outer vs innerAlways use meaningful table aliases and qualify all column references in correlated subqueries. 'SELECT * FROM employees e WHERE salary > (SELECT AVG(salary) FROM employees WHERE dept_id = dept_id)' is ambiguous—does dept_id refer to outer or inner? This query likely returns wrong results without errors.
Correlated subqueries can appear in multiple positions within a SQL query, each serving different purposes. Understanding these positions expands your query-writing toolkit significantly.
Filtering with Correlated Conditions
The most common position for correlated subqueries. Used to filter outer rows based on related data in other tables or the same table.
1234567891011121314151617
-- Find products with prices above their category averageSELECT p.product_name, p.price, p.category_idFROM products pWHERE p.price > ( SELECT AVG(p2.price) FROM products p2 WHERE p2.category_id = p.category_id); -- Find customers who have placed orders in the last monthSELECT c.customer_id, c.nameFROM customers cWHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND o.order_date >= CURRENT_DATE - INTERVAL '30 days');One of the most powerful applications of correlated subqueries is self-correlation, where a table is compared against itself. This pattern solves problems like:
1234567891011121314151617181920212223242526272829303132333435363738394041424344
-- 1. ABOVE-AVERAGE PATTERN-- Find employees earning more than their department's averageSELECT e.name, e.salary, e.departmentFROM employees eWHERE e.salary > ( SELECT AVG(e2.salary) FROM employees e2 WHERE e2.department = e.department); -- 2. MAXIMUM IN GROUP PATTERN -- Find the highest-paid employee in each departmentSELECT e.name, e.salary, e.departmentFROM employees eWHERE e.salary = ( SELECT MAX(e2.salary) FROM employees e2 WHERE e2.department = e.department); -- 3. COUNTING PATTERN-- Find employees who have more direct reports than averageSELECT m.name, m.employee_id, (SELECT COUNT(*) FROM employees e2 WHERE e2.manager_id = m.employee_id) as report_countFROM employees mWHERE (SELECT COUNT(*) FROM employees e2 WHERE e2.manager_id = m.employee_id) > ( SELECT AVG(cnt) FROM ( SELECT COUNT(*) as cnt FROM employees GROUP BY manager_id ) sub); -- 4. SEQUENCE GAP PATTERN-- Find orders with gaps in order numbersSELECT o.order_numberFROM orders oWHERE NOT EXISTS ( SELECT 1 FROM orders o2 WHERE o2.order_number = o.order_number - 1)AND o.order_number > (SELECT MIN(order_number) FROM orders);When self-correlating, use meaningful aliases that clarify the role of each table instance. Common patterns: 'e' and 'e2' (outer and inner), 'm' and 'r' (manager and report), 'current' and 'comparison'. Clear aliases prevent confusion and make queries self-documenting.
Certain correlated subquery patterns appear repeatedly across different domains. Recognizing these patterns accelerates query writing and helps you apply proven solutions to new problems.
| Pattern | Use Case | Structure |
|---|---|---|
| Row-to-Aggregate | Compare row value to group statistic | WHERE x > (SELECT AVG(x) WHERE group = outer.group) |
| Existence Check | Filter if related data exists | WHERE EXISTS (SELECT 1 WHERE fk = outer.pk) |
| Non-Existence Check | Filter if NO related data exists | WHERE NOT EXISTS (SELECT 1 WHERE fk = outer.pk) |
| Top-N per Group | Get best/worst N per category | LATERAL (SELECT ... ORDER BY ... LIMIT N) |
| Running Calculation | Cumulative sums, counts up to row | (SELECT SUM(x) WHERE date <= outer.date) |
| Row Numbering | Rank within partition (pre-window) | (SELECT COUNT(*) WHERE val > outer.val) |
12345678910111213141516171819202122232425262728293031
-- RUNNING TOTAL PATTERN (before window functions)SELECT o.order_date, o.amount, (SELECT SUM(o2.amount) FROM orders o2 WHERE o2.order_date <= o.order_date AND o2.customer_id = o.customer_id) AS running_totalFROM orders oORDER BY o.customer_id, o.order_date; -- DENSE RANK PATTERN (before window functions)SELECT e.name, e.salary, (SELECT COUNT(DISTINCT e2.salary) FROM employees e2 WHERE e2.salary > e.salary) + 1 AS salary_rankFROM employees eORDER BY salary_rank; -- LATEST-PER-GROUP PATTERN-- Get most recent order for each customerSELECT c.customer_id, c.name, o.order_date, o.amountFROM customers cJOIN orders o ON o.customer_id = c.customer_idWHERE o.order_date = ( SELECT MAX(o2.order_date) FROM orders o2 WHERE o2.customer_id = c.customer_id);Correlated subqueries are one tool among several for achieving similar results. Understanding when they're the best choice—and when alternatives are preferable—is key to writing effective SQL.
123456789101112131415161718192021
-- CORRELATED SUBQUERY approachSELECT e.name, e.salary, (SELECT AVG(e2.salary) FROM employees e2 WHERE e2.dept_id = e.dept_id) AS dept_avgFROM employees e; -- JOIN + GROUP BY approach (often more efficient)SELECT e.name, e.salary, d.dept_avgFROM employees eJOIN ( SELECT dept_id, AVG(salary) AS dept_avg FROM employees GROUP BY dept_id) d ON d.dept_id = e.dept_id; -- WINDOW FUNCTION approach (cleanest for analytics)SELECT name, salary, AVG(salary) OVER (PARTITION BY dept_id) AS dept_avgFROM employees;There's no universal 'best' approach. Correlated subqueries often express intent most clearly, which aids maintainability. Modern optimizers frequently transform between these forms internally. Start with clarity, then optimize if profiling reveals issues.
We've established a deep understanding of what makes subqueries correlated. Let's consolidate the key insights:
Coming up next: We'll explore the EXISTS operator, the most performant and expressive way to use correlated subqueries for existence testing. EXISTS is fundamental to advanced SQL and is often the preferred approach for checking related data conditions.
You now understand the fundamental concept of correlation in SQL subqueries. This foundation is essential for mastering the EXISTS operator, NOT EXISTS patterns, and understanding performance implications—all covered in the following pages.