Loading content...
Consider a deceptively simple question: "Find all employees who earn more than the average salary in their department."
At first glance, this seems straightforward. But pause and think about what's involved:
This isn't a simple filter—it requires relating each row to an aggregated value computed from other rows. The data you need to evaluate a row depends on computations over other rows.
This is precisely where subqueries shine. A subquery is a query nested inside another query, allowing you to use the result of one query as input to another. They transform SQL from a simple data retrieval language into a powerful compositional system where queries become building blocks for more complex operations.
By the end of this page, you will understand the fundamental concept of subqueries—what they are, how they execute, why they're essential, and how they relate to the outer (main) query. This conceptual foundation is critical before exploring the specific types of subqueries.
A subquery (also called a nested query or inner query) is a complete SELECT statement embedded within another SQL statement. The containing statement is called the outer query, main query, or parent query.
The key insight is that a subquery is a first-class query—it follows all the rules of a standalone SELECT statement. It can have its own FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses. The only difference is its context: instead of returning results directly to the user, it returns results to be used by the outer query.
Formal Definition:
A subquery is a
SELECTstatement nested inside another SQL statement (typicallySELECT,INSERT,UPDATE, orDELETE) that returns a result set used by the outer statement for evaluation, filtering, or data manipulation.
12345678910111213141516
-- Anatomy of a Subquery-- The outer query uses the result of the inner query SELECT employee_name, salaryFROM employeesWHERE salary > ( -- Outer query uses comparison SELECT AVG(salary) -- Inner query (subquery) FROM employees -- Subquery has its own FROM WHERE department_id = 10 -- Subquery has its own WHERE); -- Breaking it down:-- 1. The subquery: SELECT AVG(salary) FROM employees WHERE department_id = 10-- This is a complete, valid query that returns a single value (e.g., 75000)-- 2. The outer query: SELECT ... WHERE salary > 75000-- Uses the subquery result as if it were a literal valueThink of a subquery as a placeholder that gets replaced by its result before the outer query executes. In the example above, (SELECT AVG(salary) FROM employees WHERE department_id = 10) might evaluate to 75000, and then the outer query effectively becomes WHERE salary > 75000. This substitution model helps understand simple subqueries, though correlated subqueries (covered in Module 6) work differently.
Subqueries represent query composition—the ability to combine queries to create more powerful queries. This is analogous to function composition in programming, where you pass the output of one function as input to another.
Without subqueries, you would need to:
This is inefficient, error-prone, and loses the benefits of database optimization. With subqueries, the database handles all of this internally, often with significant performance advantages.
The Relational Algebra Connection:
In relational algebra, subqueries correspond to the composition of relational operators. The result of one operation becomes the input to another. This mathematical foundation ensures that subqueries are well-defined—the inner query produces a relation (possibly containing a single value), and the outer query consumes it.
This compositionality is what makes SQL expressive. You can nest queries arbitrarily deep:
SELECT * FROM A WHERE x IN (SELECT y FROM B WHERE z = (SELECT MAX(z) FROM C))
Each level evaluates from innermost to outermost (conceptually), building up the final result.
Understanding how subqueries execute is crucial for writing efficient SQL. The execution model differs based on whether the subquery references columns from the outer query.
Non-Correlated (Uncorrelated) Subqueries:
A non-correlated subquery is independent of the outer query—it can be executed on its own and produces the same result regardless of the outer query's context. These subqueries are conceptually executed once, and their result is used by the outer query.
Correlated Subqueries:
A correlated subquery references columns from the outer query, creating a dependency. These subqueries are conceptually executed once per row of the outer query, with the outer row's values substituted into the subquery. (We'll explore correlated subqueries in depth in Module 6.)
123456789101112131415161718192021222324252627282930
-- NON-CORRELATED SUBQUERY-- The subquery doesn't reference the outer query at all-- Conceptually executed ONCE SELECT employee_name, salaryFROM employeesWHERE salary > ( SELECT AVG(salary) FROM employees -- No reference to outer query);-- Execution: -- 1. Compute AVG(salary) from all employees → 65000-- 2. Scan employees, return rows where salary > 65000 -- CORRELATED SUBQUERY -- The subquery references the outer query's table-- Conceptually executed ONCE PER OUTER ROW SELECT e.employee_name, e.salary, e.department_idFROM employees eWHERE e.salary > ( SELECT AVG(salary) FROM employees WHERE department_id = e.department_id -- References outer query!);-- Conceptual Execution (not necessarily actual):-- For each employee row e:-- 1. Compute AVG(salary) for e's department-- 2. Compare e.salary against that average-- 3. Include row if comparison is trueThe 'once per row' execution model for correlated subqueries is conceptual. Modern query optimizers often transform correlated subqueries into joins or use semi-join optimizations, making actual execution much more efficient than naive row-by-row processing. However, understanding the conceptual model helps you reason about correctness and predict when optimization might struggle.
The Evaluation Order:
Logically, SQL evaluates queries in this order:
Subqueries can appear at multiple stages, and their evaluation fits into this logical order.
Not all subqueries are created equal. What a subquery returns determines where and how it can be used. This is a critical concept that affects SQL syntax and semantics.
There are three fundamental categories based on the result cardinality:
Each type has different use cases and syntactic requirements.
| Return Type | Result Shape | Common Usage | Operators |
|---|---|---|---|
| Scalar | 1 row × 1 column | Comparison, SELECT list, SET clause | =, >, <, >=, <=, <> |
| Row | 1 row × N columns | Row comparison, tuple matching | = (row), IN (row list) |
| Table | M rows × N columns | IN, EXISTS, ANY/ALL, FROM clause | IN, EXISTS, ANY, ALL |
123456789101112131415161718192021222324252627282930
-- SCALAR SUBQUERY: Returns one value-- Can be used anywhere a single value is expected SELECT employee_name, salary, (SELECT AVG(salary) FROM employees) AS company_avg -- Scalar in SELECTFROM employeesWHERE salary > (SELECT AVG(salary) FROM employees); -- Scalar in WHERE -- ROW SUBQUERY: Returns one row with multiple columns-- Used for tuple comparisons SELECT * FROM employeesWHERE (department_id, job_id) = ( SELECT department_id, job_id FROM employees WHERE employee_id = 101); -- TABLE SUBQUERY: Returns multiple rows-- Used with set operators like IN, EXISTS, ANY, ALL SELECT employee_name FROM employeesWHERE department_id IN ( SELECT department_id FROM departments WHERE location = 'New York');Using a subquery that returns multiple rows where a scalar is expected causes a runtime error: 'Subquery returned more than 1 row'. This is one of the most common subquery bugs. Always verify your subquery's cardinality matches its usage context. Use LIMIT 1, MAX/MIN, or ensure your WHERE clause guarantees single-row results.
Subqueries aren't just syntactic sugar—they solve problems that are difficult or impossible to express with only joins and simple filters. Understanding their unique strengths helps you choose the right tool for each situation.
Subqueries vs. Joins — The Key Distinction:
Joins combine data from multiple tables horizontally, potentially changing the row count through one-to-many relationships. Subqueries use data from other queries to filter, compute, or provide values—often without changing the fundamental structure of the outer query's result.
Consider finding employees in departments with more than 10 people:
With Join:
SELECT DISTINCT e.employee_name
FROM employees e
JOIN (
SELECT department_id, COUNT(*) as cnt
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 10
) d ON e.department_id = d.department_id;
With Subquery:
SELECT employee_name
FROM employees
WHERE department_id IN (
SELECT department_id
FROM employees
GROUP BY department_id
HAVING COUNT(*) > 10
);
Both work, but the subquery version is often cleaner when you don't need data from the inner query—just its filtering effect.
Every subquery must be enclosed in parentheses. This is non-negotiable syntax in SQL—the parentheses delimit where the subquery begins and ends, allowing the parser to distinguish nested structure.
Encapsulation Rules:
12345678910111213141516171819202122232425262728293031
-- CORRECT: Subquery properly enclosed in parenthesesSELECT * FROM employeesWHERE department_id = (SELECT department_id FROM departments WHERE name = 'Sales'); -- CORRECT: Subquery with its own clausesSELECT employee_name, salaryFROM employeesWHERE salary > ( SELECT AVG(salary) FROM employees WHERE department_id = 20 -- ORDER BY would be ignored here (no LIMIT)); -- CORRECT: Derived table subquery in FROM clause (requires alias)SELECT dept_stats.department_id, dept_stats.avg_salaryFROM ( SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id) AS dept_stats -- Alias required for derived tablesWHERE dept_stats.avg_salary > 50000; -- ERROR: Missing parentheses-- SELECT * FROM employees WHERE salary > SELECT AVG(salary) FROM employees; -- ERROR: Missing alias for derived table-- SELECT * FROM (SELECT department_id, COUNT(*) FROM employees GROUP BY department_id);When a subquery appears in the FROM clause (creating a 'derived table' or 'inline view'), most databases require you to give it an alias. This alias becomes the table name for referencing columns in the outer query. This requirement varies slightly by database—MySQL requires it, PostgreSQL requires it, but Oracle allows omitting it in some contexts.
A common question is whether subqueries are slower than joins. The answer is nuanced and depends heavily on the query structure and database optimizer.
Modern Query Optimization:
Sophisticated database engines (PostgreSQL, MySQL 8+, SQL Server, Oracle) often transform subqueries into equivalent joins during query optimization. This means the same execution plan may result from either syntax. The optimizer chooses based on statistics and cost estimation, not on whether you wrote a subquery or a join.
When Subqueries Can Be Slower:
| Scenario | Better Approach | Reason |
|---|---|---|
| Check existence only | EXISTS subquery | EXISTS stops at first match; JOIN retrieves all matches |
| Need data from both tables | JOIN | Subquery would require duplicating logic or multiple queries |
| Filter by aggregation | Either (optimizer equivalent) | Modern optimizers transform to same plan |
| Self-referential comparison | Correlated subquery or JOIN | Depends on data distribution and indexes |
| Complex derived calculations | Subquery (clarity) | Performance similar; subquery more readable |
Write the query in whichever form is clearest and most maintainable. Then examine the execution plan. If performance is inadequate, try alternative formulations. Don't prematurely optimize for perceived subquery overhead—profile first, then optimize based on evidence.
Certain subquery patterns appear repeatedly across different domains. Recognizing these patterns accelerates query design and helps you choose the right approach.
Pattern 1: Comparison Against Aggregate
The most common pattern—compare each row against a summarized value from the same or related table.
1234567891011121314151617
-- Find products priced above the category averageSELECT product_name, price, category_idFROM products pWHERE price > ( SELECT AVG(price) FROM products WHERE category_id = p.category_id -- Correlated to outer query); -- Find orders larger than the customer's average orderSELECT order_id, customer_id, total_amountFROM orders oWHERE total_amount > ( SELECT AVG(total_amount) FROM orders WHERE customer_id = o.customer_id);Pattern 2: Membership Check (IN Subquery)
Check if a value belongs to a dynamically computed set of values.
1234567891011121314151617
-- Find customers who have placed orders in the last monthSELECT customer_name, emailFROM customersWHERE customer_id IN ( SELECT DISTINCT customer_id FROM orders WHERE order_date >= DATE_SUB(CURRENT_DATE, INTERVAL 1 MONTH)); -- Find products that have never been orderedSELECT product_nameFROM productsWHERE product_id NOT IN ( SELECT DISTINCT product_id FROM order_items WHERE product_id IS NOT NULL -- Important! NULL in NOT IN causes issues);Pattern 3: Maximum/Minimum Row Selection
Retrieve the full row containing the maximum or minimum value.
12345678910111213
-- Find the employee(s) with the highest salarySELECT employee_name, salary, department_idFROM employeesWHERE salary = (SELECT MAX(salary) FROM employees); -- Find the most recent order for each customerSELECT o.*FROM orders oWHERE o.order_date = ( SELECT MAX(o2.order_date) FROM orders o2 WHERE o2.customer_id = o.customer_id);If the subquery in NOT IN returns any NULL values, the entire NOT IN condition evaluates to UNKNOWN (effectively false for all rows), returning no results. Always filter out NULLs in NOT IN subqueries or use NOT EXISTS instead, which handles NULLs correctly.
We've established the conceptual foundation for understanding subqueries. Let's consolidate the key points:
What's Next:
With the conceptual foundation in place, we'll dive deep into each subquery type. The next page explores scalar subqueries—queries that return exactly one value and can be used anywhere a single value is expected.
You now understand what subqueries are, how they execute, and why they're essential to SQL's expressive power. This conceptual foundation prepares you for mastering the specific subquery types covered in the following pages.