Loading content...
Consider a conceptually simple requirement: "For each employee, show how their salary compares to the company average."
You can solve this with a subquery—but WHERE you place it affects everything: syntax requirements, execution semantics, result structure, and performance.
-- Placement 1: In SELECT (adds column)
SELECT name, salary, (SELECT AVG(salary) FROM employees) AS avg FROM employees;
-- Placement 2: In WHERE (filters rows)
SELECT name, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);
-- Placement 3: In FROM (creates derived table)
SELECT e.name, e.salary, stats.avg FROM employees e, (SELECT AVG(salary) AS avg FROM employees) stats;
Each placement serves a different purpose. Understanding these differences—and knowing which to use when—is the mark of SQL mastery.
By the end of this page, you will understand exactly where subqueries can appear in SQL statements, how placement affects behavior, the specific requirements and restrictions of each context, and how to choose the optimal placement for your use case.
Subqueries can appear in virtually any part of a SQL statement where an expression or table reference is valid. Each context has specific rules about what subquery types are allowed and how results are used.
Complete Placement Map:
| Placement | Allowed Types | Primary Purpose | Example Pattern |
|---|---|---|---|
| SELECT list | Scalar only | Add computed columns | SELECT ..., (subquery) AS col |
| FROM clause | Table (derived tables) | Create virtual table source | FROM (subquery) AS alias |
| WHERE clause | Scalar, Row, Table | Filter source rows | WHERE col operator (subquery) |
| HAVING clause | Scalar, Table | Filter grouped results | HAVING AGG(col) > (subquery) |
| JOIN condition | Scalar | Dynamic join conditions | ON a.col = (subquery) |
| CASE expression | Scalar | Conditional logic | CASE WHEN (subquery) ... |
| INSERT VALUES | Scalar | Computed insert values | VALUES ((subquery), ...) |
| UPDATE SET | Scalar | Dynamic update values | SET col = (subquery) |
| DELETE WHERE | Scalar, Table | Dynamic deletion criteria | WHERE col IN (subquery) |
SQL's logical execution order is: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. Subqueries in each clause are evaluated at that stage. A WHERE subquery filters before grouping; a HAVING subquery filters after. Understanding this order explains why certain placements work differently.
Subqueries in the SELECT list add computed columns to each output row. Only scalar subqueries are allowed—the subquery must return exactly one value.
Characteristics:
123456789101112131415161718192021222324252627282930
-- Non-correlated: same value for all rowsSELECT employee_name, salary, (SELECT AVG(salary) FROM employees) AS company_avg, salary - (SELECT AVG(salary) FROM employees) AS diff_from_avgFROM employees; -- Correlated: different value per rowSELECT e.employee_name, e.salary, e.department_id, (SELECT d.department_name FROM departments d WHERE d.department_id = e.department_id) AS dept_name, (SELECT COUNT(*) FROM employees e2 WHERE e2.department_id = e.department_id) AS dept_size, (SELECT AVG(salary) FROM employees e3 WHERE e3.department_id = e.department_id) AS dept_avgFROM employees e; -- With expressions on subquery resultSELECT product_name, price, ROUND(price / (SELECT AVG(price) FROM products) * 100, 1) AS pct_of_avgFROM products;Only scalar subqueries are allowed in SELECT. A subquery returning multiple rows or multiple columns will cause an error. If your subquery might return multiple rows, you must aggregate it (MAX, MIN, etc.) or filter to a single row (LIMIT 1).
Performance Consideration:
Multiple correlated subqueries in SELECT can cause performance issues, as each subquery conceptually executes per output row. For 1,000 rows with 3 subqueries, that's 3,000 subquery evaluations. Modern optimizers may collapse these into joins, but complex queries warrant EXPLAIN verification.
Alternative: For heavy per-row computations, consider rewriting with JOINs to derived tables or using window functions.
Subqueries in the FROM clause create derived tables (inline views)—virtual tables that exist only for the query's duration. This is the most powerful subquery placement, enabling complex multi-step data transformations.
Characteristics:
12345678910111213141516171819202122232425262728293031323334353637383940
-- Basic derived tableSELECT dept_stats.department_id, dept_stats.avg_salaryFROM ( SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id) AS dept_stats -- Alias REQUIREDWHERE dept_stats.avg_salary > 75000; -- Derived table joined with physical tableSELECT e.employee_name, e.salary, stats.dept_avg, e.salary - stats.dept_avg AS diffFROM employees eJOIN ( SELECT department_id, AVG(salary) AS dept_avg FROM employees GROUP BY department_id) AS stats ON e.department_id = stats.department_id; -- Multiple derived tablesSELECT high_earners.department_id, high_earners.count AS high_earner_count, all_employees.count AS total_countFROM ( SELECT department_id, COUNT(*) AS count FROM employees WHERE salary > 100000 GROUP BY department_id) AS high_earnersJOIN ( SELECT department_id, COUNT(*) AS count FROM employees GROUP BY department_id) AS all_employees ON high_earners.department_id = all_employees.department_id;The WHERE clause is the most versatile subquery location, accepting all three subquery types (scalar, row, table) with different operators.
Subquery Types and Operators in WHERE:
| Subquery Type | Valid Operators | Example |
|---|---|---|
| Scalar | =, <>, <, >, <=, >= | WHERE salary > (SELECT AVG(salary) ...) |
| Scalar | IS NULL, IS NOT NULL | WHERE (SELECT ...) IS NOT NULL |
| Row | =, <>, <, >, <=, >= | WHERE (a, b) = (SELECT x, y ...) |
| Table | IN, NOT IN | WHERE id IN (SELECT id ...) |
| Table | ANY, SOME, ALL | WHERE salary > ALL (SELECT ...) |
| Table (correlated) | EXISTS, NOT EXISTS | WHERE EXISTS (SELECT 1 ...) |
1234567891011121314151617181920212223242526272829303132333435
-- Scalar comparisonSELECT * FROM employeesWHERE salary > (SELECT AVG(salary) FROM employees); -- Row (tuple) comparisonSELECT * FROM employeesWHERE (department_id, job_code) = ( SELECT department_id, job_code FROM employees WHERE employee_id = 100); -- IN with table subquerySELECT * FROM productsWHERE category_id IN ( SELECT category_id FROM categories WHERE active = TRUE); -- EXISTS with correlated subquerySELECT * FROM customers cWHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id AND o.total > 1000); -- ALL operatorSELECT * FROM employeesWHERE salary >= ALL ( SELECT salary FROM employees WHERE department_id = 10); -- Combined operatorsSELECT * FROM ordersWHERE customer_id IN (SELECT customer_id FROM vip_customers) AND total_amount > (SELECT AVG(total_amount) FROM orders);WHERE subqueries execute during the filtering phase, before GROUP BY. This means they filter individual rows, not grouped results. If you need to filter after aggregation, use HAVING instead.
HAVING clause subqueries filter groups produced by GROUP BY, operating on aggregated values. This is distinct from WHERE, which filters individual rows before grouping.
Key Difference:
1234567891011121314151617181920212223242526272829303132333435363738
-- Scalar subquery in HAVING: filter groups by aggregate comparisonSELECT department_id, AVG(salary) AS dept_avgFROM employeesGROUP BY department_idHAVING AVG(salary) > (SELECT AVG(salary) FROM employees);-- Returns departments with above-average salary -- Multiple subqueries in HAVINGSELECT department_id, COUNT(*) AS emp_count, AVG(salary) AS avg_salaryFROM employees GROUP BY department_idHAVING COUNT(*) >= (SELECT AVG(cnt) FROM ( SELECT COUNT(*) AS cnt FROM employees GROUP BY department_id ) AS counts) AND AVG(salary) > (SELECT AVG(salary) FROM employees); -- HAVING with IN (table subquery)SELECT department_id, SUM(salary) AS total_payroll FROM employeesGROUP BY department_idHAVING department_id IN ( SELECT department_id FROM departments WHERE budget > 1000000); -- HAVING with correlated subquery (rare but valid)SELECT e.department_id, AVG(e.salary) AS dept_avgFROM employees eGROUP BY e.department_idHAVING AVG(e.salary) > ( SELECT AVG(salary) * 0.8 FROM employees WHERE department_id = e.department_id -- Correlation still works in HAVING);If your subquery filter doesn't involve aggregates, it can go in either WHERE or HAVING (with same logical result). However, WHERE is more efficient—it filters early, reducing rows to process. Use HAVING only when filtering on aggregate results is necessary.
Subqueries can appear in JOIN ON conditions, enabling dynamic join criteria. Only scalar subqueries are typically allowed here.
Use Cases:
12345678910111213141516171819202122232425262728293031
-- Join with dynamic threshold from config tableSELECT o.order_id, o.total, d.discount_rateFROM orders oJOIN discount_tiers d ON o.total >= d.min_amountWHERE d.tier_name = ( SELECT current_tier FROM system_config WHERE config_key = 'active_discount'); -- Join using subquery for date rangeSELECT e.employee_name, t.training_nameFROM employees eJOIN training_attendance t ON e.employee_id = t.employee_idWHERE t.attendance_date >= ( SELECT MAX(review_date) FROM performance_reviews WHERE employee_id = e.employee_id); -- Self-join with subquery-derived criteriaSELECT current.employee_name, current.salary, previous.salary AS prev_salaryFROM employees currentJOIN salary_history previous ON current.employee_id = previous.employee_id AND previous.effective_date = ( SELECT MAX(effective_date) FROM salary_history WHERE employee_id = current.employee_id AND effective_date < current.hire_date );Subqueries in JOIN conditions can impact performance if the optimizer can't efficiently evaluate them. Non-correlated subqueries are cached, but correlated ones may execute per candidate join pair. For complex cases, compute the value in a derived table and join explicitly.
Subqueries are equally powerful in Data Manipulation Language (DML) statements—INSERT, UPDATE, and DELETE—enabling dynamic data modification based on computed values.
INSERT with Subqueries:
12345678910111213141516171819202122232425262728
-- Scalar subqueries in VALUESINSERT INTO audit_log (action, timestamp, avg_before_change)VALUES ( 'SALARY_ADJUSTMENT', CURRENT_TIMESTAMP, (SELECT AVG(salary) FROM employees)); -- INSERT ... SELECT (entire result set)INSERT INTO high_performers (employee_id, calculated_score)SELECT employee_id, (sales_total * 0.4 + customer_rating * 0.6)FROM employee_metricsWHERE (sales_total * 0.4 + customer_rating * 0.6) > ( SELECT AVG(sales_total * 0.4 + customer_rating * 0.6) FROM employee_metrics); -- INSERT with subquery computing valuesINSERT INTO department_summaries (department_id, emp_count, avg_salary, computed_date)SELECT department_id, COUNT(*), AVG(salary), CURRENT_DATEFROM employeesGROUP BY department_id;UPDATE with Subqueries:
12345678910111213141516171819202122232425262728293031323334
-- Scalar subquery in SET clauseUPDATE employeesSET bonus = salary * (SELECT bonus_rate FROM config WHERE year = 2024); -- Scalar subquery in WHERE clauseUPDATE productsSET on_sale = TRUEWHERE price < (SELECT AVG(price) FROM products); -- Correlated subquery in SET (update based on related data)UPDATE employees eSET department_name = ( SELECT department_name FROM departments d WHERE d.department_id = e.department_id); -- Correlated subquery in both SET and WHEREUPDATE orders oSET discount_applied = ( SELECT discount_rate FROM customer_tiers ct WHERE ct.tier_level = ( SELECT tier_level FROM customers c WHERE c.customer_id = o.customer_id ))WHERE EXISTS ( SELECT 1 FROM customers c WHERE c.customer_id = o.customer_id AND c.tier_level > 0);DELETE with Subqueries:
1234567891011121314151617181920212223242526
-- Delete using IN subqueryDELETE FROM ordersWHERE customer_id IN ( SELECT customer_id FROM customers WHERE status = 'inactive'); -- Delete using scalar comparisonDELETE FROM logsWHERE created_at < ( SELECT DATE_SUB(MAX(created_at), INTERVAL 90 DAY) FROM logs); -- Delete using EXISTS (often for referential cleanup)DELETE FROM orphaned_records oWHERE NOT EXISTS ( SELECT 1 FROM master_records m WHERE m.id = o.master_id); -- Delete using NOT IN (careful with NULLs!)DELETE FROM temp_dataWHERE record_id NOT IN ( SELECT record_id FROM permanent_data WHERE record_id IS NOT NULL);Some databases (especially MySQL) restrict subqueries that reference the table being modified. The error 'You can't specify target table X for update in FROM clause' indicates this. Workarounds: use derived tables, temporary tables, or CTEs to isolate the subquery computation.
With multiple valid placements for many subqueries, how do you choose? The decision depends on your goal, the data flow, and performance considerations.
Decision Framework:
| If You Need To... | Use Placement | Rationale |
|---|---|---|
| Add computed columns to output | SELECT | Only SELECT adds columns without filtering |
| Filter individual rows | WHERE | Executes before GROUP BY, most efficient |
| Filter aggregated groups | HAVING | Executes after GROUP BY, filters on aggregates |
| Create intermediate table for joining | FROM (derived table) | Enables multi-step transformations |
| Check existence of related data | WHERE EXISTS | Most efficient for existence checks |
| Match against a set of values | WHERE IN | Set membership testing |
| Compare against all values in set | WHERE > ALL | Universal comparison |
| Use subquery result multiple times | FROM or CTE | Avoids duplicate execution |
12345678910111213141516171819202122232425262728293031
-- SAME RESULT, DIFFERENT PLACEMENTS-- Goal: Employees above company average, showing the average -- Option A: SELECT + WHERE (two subqueries, possibly cached)SELECT employee_name, salary, (SELECT AVG(salary) FROM employees) AS company_avgFROM employeesWHERE salary > (SELECT AVG(salary) FROM employees); -- Option B: FROM clause (single computation, joined)SELECT e.employee_name, e.salary, stats.avg AS company_avgFROM employees eCROSS JOIN (SELECT AVG(salary) AS avg FROM employees) AS statsWHERE e.salary > stats.avg; -- Option C: Using CTE (clearest, single computation)WITH company_stats AS ( SELECT AVG(salary) AS avg FROM employees)SELECT e.employee_name, e.salary, s.avg AS company_avgFROM employees e, company_stats sWHERE e.salary > s.avg; -- All three produce identical results-- Option A: 2 subquery executions (possibly cached to 1)-- Option B: 1 subquery execution + CROSS JOIN-- Option C: 1 CTE execution + JOIN (most readable)Performance differences between placements are often negligible after optimization. Favor the placement that makes your query's intent clearest. A maintainable query beats a marginally faster one. Use EXPLAIN to verify when performance is critical.
Experience reveals patterns that work well and anti-patterns to avoid.
Recommended Patterns:
123456789101112131415161718192021222324252627
-- PATTERN: Derived table for aggregate-then-filterSELECT d.department_id, d.avg_salaryFROM ( SELECT department_id, AVG(salary) AS avg_salary FROM employees GROUP BY department_id) dWHERE d.avg_salary > 50000;-- Clear: compute aggregates, then filter -- PATTERN: EXISTS for existence checksSELECT * FROM customers cWHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);-- Efficient: stops at first match -- PATTERN: Scalar subquery for single reference valuesSELECT * FROM employeesWHERE hire_date < (SELECT company_founding_date FROM company_info LIMIT 1);-- Simple: one value, one comparison -- PATTERN: IN for set membershipSELECT * FROM productsWHERE category_id IN (SELECT category_id FROM featured_categories);-- Semantic: "is member of this set"Anti-Patterns to Avoid:
123456789101112131415161718192021222324252627282930313233343536
-- ANTI-PATTERN: Repeated identical subqueries-- BAD: same subquery executed twice (may not be cached)SELECT employee_name, salary - (SELECT AVG(salary) FROM employees) AS diff, 100 * salary / (SELECT SUM(salary) FROM employees) AS pctFROM employeesWHERE salary > (SELECT AVG(salary) FROM employees); -- BETTER: compute once, reuseWITH stats AS ( SELECT AVG(salary) AS avg, SUM(salary) AS total FROM employees)SELECT employee_name, salary - stats.avg AS diff, 100 * salary / stats.total AS pctFROM employees, statsWHERE salary > stats.avg; -- ANTI-PATTERN: NOT IN with potentially NULL subquery-- DANGEROUS: returns NO rows if subquery has NULLSELECT * FROM orders WHERE customer_id NOT IN (SELECT customer_id FROM banned_customers); -- SAFE: use NOT EXISTSSELECT * FROM orders oWHERE NOT EXISTS (SELECT 1 FROM banned_customers b WHERE b.customer_id = o.customer_id); -- ANTI-PATTERN: Correlated subquery where JOIN works-- SLOWER: correlated subquery per rowSELECT e.employee_name, (SELECT d.name FROM depts d WHERE d.id = e.dept_id) AS deptFROM employees e; -- FASTER: simple JOINSELECT e.employee_name, d.name AS deptFROM employees e LEFT JOIN depts d ON d.id = e.dept_id;Subquery placement is as important as subquery design. Let's consolidate the key insights:
Module Complete:
You have now completed the comprehensive study of SQL subqueries. From the foundational concept through scalar, row, and table subqueries to strategic placement, you possess the knowledge to leverage subqueries for any data challenge.
In the next module, we'll explore correlated subqueries in depth—subqueries that reference the outer query, enabling powerful row-by-row dependent computations.
Congratulations! You've mastered SQL subqueries—their types, behaviors, and strategic placement. You now understand how to compose queries for complex data retrieval, when to use each subquery type, and how to avoid common pitfalls. This knowledge forms a foundation for advanced SQL techniques including correlated subqueries and CTEs.