Subqueries - Learning Module


Subquery Placement

Location Matters: The Same Logic, Different Behavior

Consider a conceptually simple requirement: "For each employee, show how their salary compares to the company average."

You can solve this with a subquery—but WHERE you place it affects everything: syntax requirements, execution semantics, result structure, and performance.

-- Placement 1: In SELECT (adds column)
SELECT name, salary, (SELECT AVG(salary) FROM employees) AS avg FROM employees;

-- Placement 2: In WHERE (filters rows)
SELECT name, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);

-- Placement 3: In FROM (creates derived table)
SELECT e.name, e.salary, stats.avg FROM employees e, (SELECT AVG(salary) AS avg FROM employees) stats;

Each placement serves a different purpose. Understanding these differences—and knowing which to use when—is the mark of SQL mastery.
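To make the difference in result shape concrete, here is a minimal sketch that runs the three placements against a made-up three-row table. It assumes Python's bundled sqlite3 module purely as a convenient engine; the names and salaries are hypothetical, and the column alias is spelled avg_salary here for readability.

```python
import sqlite3

# Hypothetical data: three employees, company average = 70000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 50000), ("Bob", 70000), ("Cy", 90000)],
)

# Placement 1: SELECT list. Every row is kept and gains one column.
in_select = conn.execute("""
    SELECT name, salary, (SELECT AVG(salary) FROM employees) AS avg_salary
    FROM employees
    ORDER BY name
""").fetchall()

# Placement 2: WHERE. Rows are filtered; no column is added.
in_where = conn.execute("""
    SELECT name, salary
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY name
""").fetchall()

# Placement 3: FROM. A one-row derived table is joined to every employee.
in_from = conn.execute("""
    SELECT e.name, e.salary, stats.avg_salary
    FROM employees e
    CROSS JOIN (SELECT AVG(salary) AS avg_salary FROM employees) AS stats
    ORDER BY e.name
""").fetchall()

print(in_select)  # three rows, three columns
print(in_where)   # one row, two columns
print(in_from)    # same shape as in_select
```

With this data, placements 1 and 3 return the same three rows, while placement 2 returns only the employee above the average.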

What You Will Learn

By the end of this page, you will understand exactly where subqueries can appear in SQL statements, how placement affects behavior, the specific requirements and restrictions of each context, and how to choose the optimal placement for your use case.

Overview of Subquery Placement Contexts

Subqueries can appear in virtually any part of a SQL statement where an expression or table reference is valid. Each context has specific rules about what subquery types are allowed and how results are used.

Complete Placement Map:

Subquery Placement Contexts
Placement	Allowed Types	Primary Purpose	Example Pattern
SELECT list	Scalar only	Add computed columns	`SELECT ..., (subquery) AS col`
FROM clause	Table (derived tables)	Create virtual table source	`FROM (subquery) AS alias`
WHERE clause	Scalar, Row, Table	Filter source rows	`WHERE col operator (subquery)`
HAVING clause	Scalar, Table	Filter grouped results	`HAVING AGG(col) > (subquery)`
JOIN condition	Scalar, Table (via IN/EXISTS)	Dynamic join conditions	`ON a.col = (subquery)`
CASE expression	Scalar	Conditional logic	`CASE WHEN (subquery) ...`
INSERT VALUES	Scalar	Computed insert values	`VALUES ((subquery), ...)`
UPDATE SET	Scalar	Dynamic update values	`SET col = (subquery)`
DELETE WHERE	Scalar, Table	Dynamic deletion criteria	`WHERE col IN (subquery)`

Logical Query Execution Order

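SQL's logical processing order is: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY. Conceptually, a subquery is evaluated at the stage of the clause that contains it: a WHERE subquery filters rows before grouping, while a HAVING subquery filters groups afterward. The optimizer is free to choose a different physical order as long as the result is unchanged, but the logical order explains why the same subquery behaves differently in different clauses.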
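The effect of that ordering can be sketched with the same scalar subquery placed in WHERE and then in HAVING. This is an illustrative example only: the table contents are invented and sqlite3 is assumed as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 40), (1, 100), (2, 90), (2, 110)],  # hypothetical; company average = 85
)

# Subquery in WHERE: individual rows are filtered first, then grouped.
where_first = conn.execute("""
    SELECT department_id, COUNT(*), AVG(salary)
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    GROUP BY department_id
    ORDER BY department_id
""").fetchall()

# Subquery in HAVING: all rows are grouped first, then whole groups are filtered.
having_after = conn.execute("""
    SELECT department_id, COUNT(*), AVG(salary)
    FROM employees
    GROUP BY department_id
    HAVING AVG(salary) > (SELECT AVG(salary) FROM employees)
    ORDER BY department_id
""").fetchall()

print(where_first)   # both departments survive, built only from rows above 85
print(having_after)  # only department 2, whose full average (100) exceeds 85
```

Department 1 appears in the first result because one of its rows passes the row-level filter, but it is dropped from the second because its group average (70) is below the company average.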

Subqueries in SELECT Clause

Subqueries in the SELECT list add computed columns to each output row. Only scalar subqueries are allowed: the subquery must return a single column and at most one row. If it returns no rows, the result is NULL.

Characteristics:

Result appears as a column in output
Can be non-correlated (same value for all rows) or correlated (different per row)
Does not affect which rows are returned—only adds data to them
Multiple scalar subqueries allowed in same SELECT

select_placement.sql
-- Non-correlated: same value for all rows
SELECT 
    employee_name,
    salary,
    (SELECT AVG(salary) FROM employees) AS company_avg,
    salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
 
-- Correlated: different value per row
SELECT 
    e.employee_name,
    e.salary,
    e.department_id,
    (SELECT d.department_name 
     FROM departments d 
     WHERE d.department_id = e.department_id) AS dept_name,
    (SELECT COUNT(*) 
     FROM employees e2 
     WHERE e2.department_id = e.department_id) AS dept_size,
    (SELECT AVG(salary) 
     FROM employees e3 
     WHERE e3.department_id = e.department_id) AS dept_avg
FROM employees e;
 
-- With expressions on subquery result
SELECT 
    product_name,
    price,
    ROUND(price / (SELECT AVG(price) FROM products) * 100, 1) AS pct_of_avg
FROM products;

SELECT Subquery Limitations

Only scalar subqueries are allowed in SELECT. In most databases, a subquery that returns more than one row or more than one column raises an error. If your subquery might return multiple rows, aggregate it (MAX, MIN, etc.) or restrict it to a single row (for example, ORDER BY with LIMIT 1, or your dialect's equivalent such as TOP 1 or FETCH FIRST 1 ROW ONLY).
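The sketch below illustrates two edge cases of scalar subqueries in SELECT: a subquery that matches no rows yields NULL, and an aggregate guarantees a single value when several rows match. The tables and the phones example are hypothetical. Note that SQLite, assumed here only because it ships with Python, silently takes the first row of a multi-row scalar subquery instead of raising an error, so the aggregate is the portable safeguard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_name TEXT);
    CREATE TABLE employees (employee_id INTEGER, name TEXT, department_id INTEGER);
    CREATE TABLE phones (employee_id INTEGER, number TEXT);

    INSERT INTO departments VALUES (10, 'Sales');
    INSERT INTO employees VALUES (1, 'Ann', 10), (2, 'Bob', 99);  -- 99 has no department row
    INSERT INTO phones VALUES (1, '555-0100'), (1, '555-0199');   -- Ann has two phones
""")

rows = conn.execute("""
    SELECT
        e.name,
        -- No matching department row: the scalar subquery evaluates to NULL.
        (SELECT d.department_name
         FROM departments d
         WHERE d.department_id = e.department_id) AS dept_name,
        -- Several matching rows: MIN collapses them to exactly one value.
        (SELECT MIN(p.number)
         FROM phones p
         WHERE p.employee_id = e.employee_id) AS first_phone,
        -- COUNT always returns one row, even when nothing matches.
        (SELECT COUNT(*)
         FROM phones p
         WHERE p.employee_id = e.employee_id) AS phone_count
    FROM employees e
    ORDER BY e.employee_id
""").fetchall()

for row in rows:
    print(row)
```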

Performance Consideration:

Multiple correlated subqueries in SELECT can cause performance issues, as each subquery conceptually executes per output row. For 1,000 rows with 3 subqueries, that's 3,000 subquery evaluations. Modern optimizers may collapse these into joins, but complex queries warrant EXPLAIN verification.

Alternative: For heavy per-row computations, consider rewriting with JOINs to derived tables or using window functions.
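As a rough illustration of the window-function alternative, the following sketch computes a per-department average both ways and checks that the results match. It assumes an SQLite build with window-function support (version 3.25 or later) and uses invented data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department_id INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", 1, 40), ("Bob", 1, 60), ("Cy", 2, 100)],  # hypothetical rows
)

# Correlated scalar subquery: conceptually evaluated once per output row.
correlated = conn.execute("""
    SELECT e.name, e.salary,
           (SELECT AVG(e2.salary)
            FROM employees e2
            WHERE e2.department_id = e.department_id) AS dept_avg
    FROM employees e
    ORDER BY e.name
""").fetchall()

# Window function: the same per-department average without a subquery.
windowed = conn.execute("""
    SELECT name, salary,
           AVG(salary) OVER (PARTITION BY department_id) AS dept_avg
    FROM employees
    ORDER BY name
""").fetchall()

print(correlated == windowed)
```

Whether the window form is actually faster depends on the engine and data, so it is worth checking with EXPLAIN on the real workload.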

Subqueries in FROM Clause (Derived Tables)

Subqueries in the FROM clause create derived tables (inline views): virtual tables that exist only for the duration of the query. This placement is especially flexible because it enables complex multi-step data transformations.

Characteristics:

Must be table subqueries (can return any number of rows/columns)
Should always have an alias (required by MySQL, SQL Server, and PostgreSQL before version 16; optional in Oracle and SQLite)
Acts as a regular table: can be joined, filtered, aggregated
Column aliases from subquery become the derived table's schema

from_placement.sql
-- Basic derived table
SELECT dept_stats.department_id, dept_stats.avg_salary
FROM (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
) AS dept_stats                              -- Alias required by most databases
WHERE dept_stats.avg_salary > 75000;
 
 
-- Derived table joined with physical table
SELECT 
    e.employee_name,
    e.salary,
    stats.dept_avg,
    e.salary - stats.dept_avg AS diff
FROM employees e
JOIN (
    SELECT department_id, AVG(salary) AS dept_avg
    FROM employees
    GROUP BY department_id
) AS stats ON e.department_id = stats.department_id;
 
 
-- Multiple derived tables
SELECT 
    high_earners.department_id,
    high_earners.count AS high_earner_count,
    all_employees.count AS total_count
FROM (
    SELECT department_id, COUNT(*) AS count
    FROM employees
    WHERE salary > 100000
    GROUP BY department_id
) AS high_earners
JOIN (
    SELECT department_id, COUNT(*) AS count
    FROM employees
    GROUP BY department_id
) AS all_employees ON high_earners.department_id = all_employees.department_id;

Derived Table Advantages

•Enables multi-step computation
•Can filter on aggregate results
•Creates reusable intermediate results
•Allows complex transformations

Derived Table Limitations

•Cannot reference tables listed earlier in the same FROM clause (unless LATERAL is used)
•Deep nesting hurts readability
•May be materialized (memory usage)
•CTEs often clearer for complex cases
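The last point can be illustrated with a small sketch that writes the same aggregate-then-filter query once as a derived table and once as a CTE. The data and the threshold of 75 are made up, and sqlite3 is assumed as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (department_id INTEGER, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, 40), (1, 100), (2, 90), (2, 110)],  # hypothetical; averages are 70 and 100
)

# Derived table: the aggregation step is nested inside FROM.
derived = conn.execute("""
    SELECT d.department_id, d.avg_salary
    FROM (
        SELECT department_id, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department_id
    ) AS d
    WHERE d.avg_salary > 75
    ORDER BY d.department_id
""").fetchall()

# CTE: the same step is named up front and read top to bottom.
cte = conn.execute("""
    WITH dept_stats AS (
        SELECT department_id, AVG(salary) AS avg_salary
        FROM employees
        GROUP BY department_id
    )
    SELECT department_id, avg_salary
    FROM dept_stats
    WHERE avg_salary > 75
    ORDER BY department_id
""").fetchall()

print(derived, cte)
```

Both forms return the same rows; the CTE mainly helps readability once several such steps are chained.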

Subqueries in WHERE Clause

The WHERE clause is the most versatile subquery location, accepting all three subquery types (scalar, row, table) with different operators.

Subquery Types and Operators in WHERE:

WHERE Clause Subquery Patterns
Subquery Type	Valid Operators	Example
Scalar	=, <>, <, >, <=, >=	`WHERE salary > (SELECT AVG(salary) ...)`
Scalar	IS NULL, IS NOT NULL	`WHERE (SELECT ...) IS NOT NULL`
Row	=, <>, <, >, <=, >=	`WHERE (a, b) = (SELECT x, y ...)`
Table	IN, NOT IN	`WHERE id IN (SELECT id ...)`
Table	ANY, SOME, ALL	`WHERE salary > ALL (SELECT ...)`
Table (usually correlated)	EXISTS, NOT EXISTS	`WHERE EXISTS (SELECT 1 ...)`

where_placement.sql
-- Scalar comparison
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
 
-- Row (tuple) comparison
SELECT * FROM employees
WHERE (department_id, job_code) = (
    SELECT department_id, job_code
    FROM employees WHERE employee_id = 100
);
 
-- IN with table subquery
SELECT * FROM products
WHERE category_id IN (
    SELECT category_id FROM categories WHERE active = TRUE
);
 
-- EXISTS with correlated subquery
SELECT * FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id
    AND o.total > 1000
);
 
-- ALL operator
SELECT * FROM employees
WHERE salary >= ALL (
    SELECT salary FROM employees WHERE department_id = 10
);
 
-- Combined operators
SELECT * FROM orders
WHERE customer_id IN (SELECT customer_id FROM vip_customers)
  AND total_amount > (SELECT AVG(total_amount) FROM orders);

WHERE Subquery Execution

WHERE subqueries execute during the filtering phase, before GROUP BY. This means they filter individual rows, not grouped results. If you need to filter after aggregation, use HAVING instead.
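As a sketch of the row (tuple) comparison pattern, the example below finds employees who share both department and job code with employee 100, then repeats the filter with EXISTS for engines that lack row-value syntax. The data is hypothetical, and an SQLite version with row-value support (3.15 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, department_id INTEGER, job_code TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(100, 10, "ENG"), (101, 10, "ENG"), (102, 10, "OPS"), (103, 20, "ENG")],
)

# Row subquery: both columns must match the pair returned for employee 100.
row_match = conn.execute("""
    SELECT employee_id
    FROM employees
    WHERE (department_id, job_code) = (
        SELECT department_id, job_code
        FROM employees
        WHERE employee_id = 100
    )
    ORDER BY employee_id
""").fetchall()

# Equivalent correlated EXISTS, usable where row comparison is unsupported.
exists_match = conn.execute("""
    SELECT e.employee_id
    FROM employees e
    WHERE EXISTS (
        SELECT 1
        FROM employees r
        WHERE r.employee_id = 100
          AND r.department_id = e.department_id
          AND r.job_code = e.job_code
    )
    ORDER BY e.employee_id
""").fetchall()

print(row_match, exists_match)
```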

Subqueries in HAVING Clause

HAVING clause subqueries filter groups produced by GROUP BY, operating on aggregated values. This is distinct from WHERE, which filters individual rows before grouping.

Key Difference:

WHERE filters rows → happens before GROUP BY
HAVING filters groups → happens after GROUP BY and aggregate calculation

having_placement.sql
-- Scalar subquery in HAVING: filter groups by aggregate comparison
SELECT department_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY department_id
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);
-- Returns departments with above-average salary
 
 
-- Multiple subqueries in HAVING
SELECT department_id, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees  
GROUP BY department_id
HAVING COUNT(*) >= (SELECT AVG(cnt) FROM (
           SELECT COUNT(*) AS cnt FROM employees GROUP BY department_id
       ) AS counts)
   AND AVG(salary) > (SELECT AVG(salary) FROM employees);
 
 
-- HAVING with IN (table subquery)
SELECT department_id, SUM(salary) AS total_payroll  
FROM employees
GROUP BY department_id
HAVING department_id IN (
    SELECT department_id 
    FROM departments 
    WHERE budget > 1000000
);
 
 
-- HAVING with correlated subquery (rare but valid)
-- Departments whose total payroll exceeds their budget
SELECT e.department_id, SUM(e.salary) AS total_payroll
FROM employees e
GROUP BY e.department_id
HAVING SUM(e.salary) > (
    SELECT d.budget
    FROM departments d
    WHERE d.department_id = e.department_id  -- Correlates on the grouping column
);

HAVING vs. WHERE for Subqueries

If your subquery filter references only grouping columns and no aggregates, it can go in either WHERE or HAVING with the same logical result. WHERE is usually the better choice because it filters early and reduces the rows that must be grouped. Use HAVING only when you need to filter on aggregate results.
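A minimal sketch of that equivalence, assuming invented budget figures and sqlite3 as the engine: the same IN subquery on the grouping column is placed first in WHERE and then in HAVING.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, budget REAL);
    CREATE TABLE employees (department_id INTEGER, salary REAL);

    INSERT INTO departments VALUES (1, 500), (2, 2000000);  -- hypothetical budgets
    INSERT INTO employees VALUES (1, 40), (1, 100), (2, 90), (2, 110);
""")

# Filter in WHERE: rows of small-budget departments are dropped before grouping.
in_where = conn.execute("""
    SELECT department_id, SUM(salary) AS total_payroll
    FROM employees
    WHERE department_id IN (
        SELECT department_id FROM departments WHERE budget > 1000000
    )
    GROUP BY department_id
    ORDER BY department_id
""").fetchall()

# Filter in HAVING: every department is aggregated, then groups are discarded.
in_having = conn.execute("""
    SELECT department_id, SUM(salary) AS total_payroll
    FROM employees
    GROUP BY department_id
    HAVING department_id IN (
        SELECT department_id FROM departments WHERE budget > 1000000
    )
    ORDER BY department_id
""").fetchall()

print(in_where == in_having)
```

The results are identical, but the HAVING version aggregates department 1 only to throw it away, which is the wasted work the WHERE form avoids.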

Subqueries in JOIN Conditions

Subqueries can appear in JOIN ON conditions, enabling dynamic join criteria. Because ON accepts any boolean expression, IN and EXISTS subqueries are also valid here, although scalar subqueries are the most common form.

Use Cases:

Join on a value looked up dynamically
Apply business rules stored in a configuration table
Create date-relative or threshold-based joins

join_subqueries.sql
-- Join with dynamic threshold from config table (scalar subquery in ON)
SELECT o.order_id, o.total, d.discount_rate
FROM orders o
JOIN discount_tiers d
    ON o.total >= d.min_amount
   AND d.tier_name = (
       SELECT current_tier FROM system_config WHERE config_key = 'active_discount'
   );
 
 
-- Join using a correlated subquery for a date cutoff (subquery in ON)
SELECT e.employee_name, t.training_name
FROM employees e
JOIN training_attendance t
    ON e.employee_id = t.employee_id
   AND t.attendance_date >= (
       SELECT MAX(pr.review_date)
       FROM performance_reviews pr
       WHERE pr.employee_id = e.employee_id
   );
 
 
-- Join to one salary_history row chosen by a correlated subquery
-- (aliases e/sh avoid CURRENT, which is a reserved word in some databases)
SELECT e.employee_name, e.salary, sh.salary AS prev_salary
FROM employees e
JOIN salary_history sh
    ON e.employee_id = sh.employee_id
   AND sh.effective_date = (
       SELECT MAX(h.effective_date)
       FROM salary_history h
       WHERE h.employee_id = e.employee_id
         AND h.effective_date < e.hire_date
   );

Performance with JOIN Subqueries

Subqueries in JOIN conditions can hurt performance if the optimizer can't evaluate them efficiently. Non-correlated subqueries are typically evaluated once and reused, but correlated ones may execute once per candidate join pair. For complex cases, compute the value in a derived table and join to it explicitly.
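The derived-table rewrite suggested above might look like the following sketch, which fetches each employee's most recent salary_history row two ways. The schema is simplified and the rows are invented; latest_row and latest_date are illustrative names, and sqlite3 is assumed as the engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (employee_id INTEGER, employee_name TEXT);
    CREATE TABLE salary_history (employee_id INTEGER, effective_date TEXT, salary REAL);

    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO salary_history VALUES
        (1, '2022-01-01', 50), (1, '2023-01-01', 60), (2, '2021-06-01', 70);
""")

# Correlated scalar subquery inside the ON condition.
on_subquery = conn.execute("""
    SELECT e.employee_name, sh.salary
    FROM employees e
    JOIN salary_history sh
        ON sh.employee_id = e.employee_id
       AND sh.effective_date = (
           SELECT MAX(h.effective_date)
           FROM salary_history h
           WHERE h.employee_id = e.employee_id
       )
    ORDER BY e.employee_name
""").fetchall()

# Same logic with the per-employee maximum computed once in a derived table.
derived_join = conn.execute("""
    SELECT e.employee_name, sh.salary
    FROM employees e
    JOIN (
        SELECT employee_id, MAX(effective_date) AS latest_date
        FROM salary_history
        GROUP BY employee_id
    ) AS latest_row
        ON latest_row.employee_id = e.employee_id
    JOIN salary_history sh
        ON sh.employee_id = latest_row.employee_id
       AND sh.effective_date = latest_row.latest_date
    ORDER BY e.employee_name
""").fetchall()

print(on_subquery, derived_join)
```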

Subqueries in DML Statements

Subqueries are equally powerful in Data Manipulation Language (DML) statements—INSERT, UPDATE, and DELETE—enabling dynamic data modification based on computed values.

INSERT with Subqueries:

insert_subqueries.sql
-- Scalar subqueries in VALUES
INSERT INTO audit_log (action, timestamp, avg_before_change)
VALUES (
    'SALARY_ADJUSTMENT',
    CURRENT_TIMESTAMP,
    (SELECT AVG(salary) FROM employees)
);
 
 
-- INSERT ... SELECT (entire result set)
INSERT INTO high_performers (employee_id, calculated_score)
SELECT employee_id, (sales_total * 0.4 + customer_rating * 0.6)
FROM employee_metrics
WHERE (sales_total * 0.4 + customer_rating * 0.6) > (
    SELECT AVG(sales_total * 0.4 + customer_rating * 0.6)
    FROM employee_metrics
);
 
 
-- INSERT with subquery computing values
INSERT INTO department_summaries (department_id, emp_count, avg_salary, computed_date)
SELECT 
    department_id,
    COUNT(*),
    AVG(salary),
    CURRENT_DATE
FROM employees
GROUP BY department_id;

UPDATE with Subqueries:

update_subqueries.sql
-- Scalar subquery in SET clause
UPDATE employees
SET bonus = salary * (SELECT bonus_rate FROM config WHERE year = 2024);
 
 
-- Scalar subquery in WHERE clause
UPDATE products
SET on_sale = TRUE
WHERE price < (SELECT AVG(price) FROM products);
 
 
-- Correlated subquery in SET (update based on related data)
UPDATE employees e
SET department_name = (
    SELECT department_name 
    FROM departments d 
    WHERE d.department_id = e.department_id
);
 
 
-- Correlated subquery in both SET and WHERE
UPDATE orders o
SET discount_applied = (
    SELECT discount_rate 
    FROM customer_tiers ct 
    WHERE ct.tier_level = (
        SELECT tier_level FROM customers c WHERE c.customer_id = o.customer_id
    )
)
WHERE EXISTS (
    SELECT 1 FROM customers c 
    WHERE c.customer_id = o.customer_id 
    AND c.tier_level > 0
);
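One detail of the correlated SET pattern is worth seeing in action: when the subquery finds no matching row it yields NULL, so an unguarded UPDATE overwrites the column with NULL. The sketch below, using hypothetical tables and sqlite3 as the engine, compares the unguarded form with one protected by WHERE EXISTS. The helper name department_names_after_update is illustrative only.

```python
import sqlite3

def department_names_after_update(guarded: bool) -> list:
    """Run the correlated UPDATE on a fresh in-memory database and return the rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE departments (department_id INTEGER, department_name TEXT);
        CREATE TABLE employees (
            employee_id INTEGER, department_id INTEGER, department_name TEXT
        );
        INSERT INTO departments VALUES (10, 'Sales');
        -- Employee 2 points at department 99, which does not exist.
        INSERT INTO employees VALUES (1, 10, 'old name'), (2, 99, 'Unassigned');
    """)
    sql = """
        UPDATE employees
        SET department_name = (
            SELECT d.department_name
            FROM departments d
            WHERE d.department_id = employees.department_id
        )
    """
    if guarded:
        # Only touch rows for which the SET subquery will find a match.
        sql += """
        WHERE EXISTS (
            SELECT 1
            FROM departments d
            WHERE d.department_id = employees.department_id
        )
        """
    conn.execute(sql)
    return conn.execute(
        "SELECT employee_id, department_name FROM employees ORDER BY employee_id"
    ).fetchall()

unguarded = department_names_after_update(guarded=False)
guarded = department_names_after_update(guarded=True)
print(unguarded)  # employee 2 is overwritten with NULL
print(guarded)    # employee 2 keeps its previous value
```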

DELETE with Subqueries:

delete_subqueries.sql
-- Delete using IN subquery
DELETE FROM orders
WHERE customer_id IN (
    SELECT customer_id FROM customers WHERE status = 'inactive'
);
 
 
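-- Delete using scalar comparison
-- (DATE_SUB is MySQL syntax; MySQL also needs the extra derived table
--  because the subquery reads the table being deleted from)
DELETE FROM logs
WHERE created_at < (
    SELECT cutoff FROM (
        SELECT DATE_SUB(MAX(created_at), INTERVAL 90 DAY) AS cutoff FROM logs
    ) AS recent
);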
 
 
-- Delete using EXISTS (often for referential cleanup)
DELETE FROM orphaned_records o
WHERE NOT EXISTS (
    SELECT 1 FROM master_records m WHERE m.id = o.master_id
);
 
 
-- Delete using NOT IN (careful with NULLs!)
DELETE FROM temp_data
WHERE record_id NOT IN (
    SELECT record_id FROM permanent_data WHERE record_id IS NOT NULL
);

Same-Table Subquery Restrictions

Some databases (especially MySQL) restrict subqueries that reference the table being modified. The error 'You can't specify target table X for update in FROM clause' indicates this. Workarounds: use derived tables, temporary tables, or CTEs to isolate the subquery computation.

Choosing the Right Placement

With multiple valid placements for many subqueries, how do you choose? The decision depends on your goal, the data flow, and performance considerations.

Decision Framework:

Subquery Placement Decision Guide
If You Need To...	Use Placement	Rationale
Add computed columns to output	SELECT	Only SELECT adds columns without filtering
Filter individual rows	WHERE	Executes before GROUP BY, most efficient
Filter aggregated groups	HAVING	Executes after GROUP BY, filters on aggregates
Create intermediate table for joining	FROM (derived table)	Enables multi-step transformations
Check existence of related data	WHERE EXISTS	Most efficient for existence checks
Match against a set of values	WHERE IN	Set membership testing
Compare against all values in set	WHERE > ALL	Universal comparison
Use subquery result multiple times	FROM or CTE	Avoids duplicate execution

placement_comparison.sql
-- SAME RESULT, DIFFERENT PLACEMENTS
-- Goal: Employees above company average, showing the average
 
-- Option A: SELECT + WHERE (two subqueries, possibly cached)
SELECT 
    employee_name, 
    salary,
    (SELECT AVG(salary) FROM employees) AS company_avg
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
 
 
-- Option B: FROM clause (single computation, joined)
SELECT e.employee_name, e.salary, stats.avg AS company_avg
FROM employees e
CROSS JOIN (SELECT AVG(salary) AS avg FROM employees) AS stats
WHERE e.salary > stats.avg;
 
 
-- Option C: Using CTE (clearest, single computation)
WITH company_stats AS (
    SELECT AVG(salary) AS avg FROM employees
)
SELECT e.employee_name, e.salary, s.avg AS company_avg
FROM employees e, company_stats s
WHERE e.salary > s.avg;
 
-- All three produce identical results
-- Option A: 2 subquery executions (possibly cached to 1)
-- Option B: 1 subquery execution + CROSS JOIN
-- Option C: 1 CTE execution + JOIN (most readable)

When in Doubt, Favor Clarity

Performance differences between placements are often negligible after optimization. Favor the placement that makes your query's intent clearest. A maintainable query beats a marginally faster one. Use EXPLAIN to verify when performance is critical.
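One way to follow this advice is to check that candidate rewrites return the same rows and then compare their plans. The sketch below does that for the three options using SQLite's EXPLAIN QUERY PLAN; the data is made up, plan output differs between engines and versions, and other databases use their own EXPLAIN syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (employee_name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", 50000), ("Bob", 70000), ("Cy", 90000)],  # hypothetical rows
)

OPTIONS = {
    "A: SELECT + WHERE": """
        SELECT employee_name, salary,
               (SELECT AVG(salary) FROM employees) AS company_avg
        FROM employees
        WHERE salary > (SELECT AVG(salary) FROM employees)
        ORDER BY employee_name
    """,
    "B: derived table": """
        SELECT e.employee_name, e.salary, stats.avg_salary AS company_avg
        FROM employees e
        CROSS JOIN (SELECT AVG(salary) AS avg_salary FROM employees) AS stats
        WHERE e.salary > stats.avg_salary
        ORDER BY e.employee_name
    """,
    "C: CTE": """
        WITH company_stats AS (SELECT AVG(salary) AS avg_salary FROM employees)
        SELECT e.employee_name, e.salary, s.avg_salary AS company_avg
        FROM employees e
        CROSS JOIN company_stats s
        WHERE e.salary > s.avg_salary
        ORDER BY e.employee_name
    """,
}

# Run each variant, then ask the engine how it intends to execute it.
results = {label: conn.execute(sql).fetchall() for label, sql in OPTIONS.items()}
plans = {
    label: conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    for label, sql in OPTIONS.items()
}

for label in OPTIONS:
    print(label, results[label])
    for step in plans[label]:
        print("    plan:", step[-1])  # last column holds the human-readable detail
```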

Common Patterns and Anti-Patterns

Experience reveals patterns that work well and anti-patterns to avoid.

Recommended Patterns:

good_patterns.sql
-- PATTERN: Derived table for aggregate-then-filter
SELECT d.department_id, d.avg_salary
FROM (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
) d
WHERE d.avg_salary > 50000;
-- Clear: compute aggregates, then filter
 
 
-- PATTERN: EXISTS for existence checks
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
-- Efficient: stops at first match
 
 
-- PATTERN: Scalar subquery for single reference values
SELECT * FROM employees
WHERE hire_date < (SELECT company_founding_date FROM company_info LIMIT 1);
-- Simple: one value, one comparison
 
 
-- PATTERN: IN for set membership
SELECT * FROM products
WHERE category_id IN (SELECT category_id FROM featured_categories);
-- Semantic: "is member of this set"

Anti-Patterns to Avoid:

antipatterns.sql
-- ANTI-PATTERN: Repeated identical subqueries
-- BAD: same subquery executed twice (may not be cached)
SELECT employee_name,
       salary - (SELECT AVG(salary) FROM employees) AS diff,
       100 * salary / (SELECT SUM(salary) FROM employees) AS pct
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
 
-- BETTER: compute once, reuse
WITH stats AS (
    SELECT AVG(salary) AS avg, SUM(salary) AS total FROM employees
)
SELECT employee_name,
       salary - stats.avg AS diff,
       100 * salary / stats.total AS pct
FROM employees, stats
WHERE salary > stats.avg;
 
 
-- ANTI-PATTERN: NOT IN with potentially NULL subquery
-- DANGEROUS: returns NO rows if subquery has NULL
SELECT * FROM orders WHERE customer_id NOT IN (SELECT customer_id FROM banned_customers);
 
-- SAFE: use NOT EXISTS
SELECT * FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM banned_customers b WHERE b.customer_id = o.customer_id);
 
 
-- ANTI-PATTERN: Correlated subquery where JOIN works
-- OFTEN SLOWER: correlated subquery evaluated per row unless the optimizer rewrites it
SELECT e.employee_name, (SELECT d.name FROM depts d WHERE d.id = e.dept_id) AS dept
FROM employees e;
 
-- USUALLY FASTER: simple JOIN (same result when depts.id is unique)
SELECT e.employee_name, d.name AS dept
FROM employees e LEFT JOIN depts d ON d.id = e.dept_id;

Anti-Pattern Summary

•Repeated identical subqueries: use CTEs or derived tables for reuse
•NOT IN with NULL risk: use NOT EXISTS instead
•Correlated subquery for a simple lookup: use JOIN
•Correlated subquery for per-row aggregates: use window functions
•Deep nesting without necessity: flatten with CTEs for readability
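The NOT IN pitfall is easy to reproduce. In this sketch a single NULL in a hypothetical banned_customers table makes NOT IN return nothing, while NOT EXISTS and a NULL-filtered NOT IN behave as intended. sqlite3 is assumed as the engine, but the three-valued logic shown is standard SQL behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    CREATE TABLE banned_customers (customer_id INTEGER);

    INSERT INTO orders VALUES (1, 10), (2, 20);
    INSERT INTO banned_customers VALUES (20), (NULL);  -- one NULL is enough
""")

# NOT IN: "10 <> NULL" is unknown, so no row can ever satisfy the predicate.
not_in = conn.execute("""
    SELECT order_id FROM orders
    WHERE customer_id NOT IN (SELECT customer_id FROM banned_customers)
    ORDER BY order_id
""").fetchall()

# NOT EXISTS: only asks whether a matching row exists, so NULLs are harmless.
not_exists = conn.execute("""
    SELECT o.order_id FROM orders o
    WHERE NOT EXISTS (
        SELECT 1 FROM banned_customers b WHERE b.customer_id = o.customer_id
    )
    ORDER BY o.order_id
""").fetchall()

# NOT IN made safe by filtering NULLs out of the subquery.
not_in_filtered = conn.execute("""
    SELECT order_id FROM orders
    WHERE customer_id NOT IN (
        SELECT customer_id FROM banned_customers WHERE customer_id IS NOT NULL
    )
    ORDER BY order_id
""").fetchall()

print(not_in)           # empty
print(not_exists)       # order 1 only
print(not_in_filtered)  # order 1 only
```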

Summary: Mastering Subquery Placement

Subquery placement is as important as subquery design. Let's consolidate the key insights:

Key Takeaways

•SELECT clause — Scalar only; adds computed columns without filtering; correlated subqueries compute per row.
•FROM clause — Table subqueries only; creates derived tables; must have alias; enables multi-step transformations.
•WHERE clause — Most versatile; accepts all types; filters rows before grouping; use for primary filtering logic.
•HAVING clause — Filters groups after aggregation; use only when aggregate-based filtering is needed.
•DML statements — Subqueries enable dynamic INSERT/UPDATE/DELETE based on computed values.
•Placement choice — Depends on goal (filter vs. enrich), timing (pre vs. post aggregation), and performance.
•Avoid anti-patterns — Don't repeat subqueries; use NOT EXISTS over NOT IN; prefer JOINs for simple lookups.

Module Complete:

You have now completed the comprehensive study of SQL subqueries. From the foundational concept through scalar, row, and table subqueries to strategic placement, you possess the knowledge to leverage subqueries for any data challenge.

In the next module, we'll explore correlated subqueries in depth—subqueries that reference the outer query, enabling powerful row-by-row dependent computations.


Subquery Placement

Location Matters: The Same Logic, Different Behavior

Consider a conceptually simple requirement: "For each employee, show how their salary compares to the company average."

You can solve this with a subquery—but WHERE you place it affects everything: syntax requirements, execution semantics, result structure, and performance.

-- Placement 1: In SELECT (adds column)
SELECT name, salary, (SELECT AVG(salary) FROM employees) AS avg FROM employees;

-- Placement 2: In WHERE (filters rows)
SELECT name, salary FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);

-- Placement 3: In FROM (creates derived table)
SELECT e.name, e.salary, stats.avg FROM employees e, (SELECT AVG(salary) AS avg FROM employees) stats;

Each placement serves a different purpose. Understanding these differences—and knowing which to use when—is the mark of SQL mastery.

What You Will Learn

Overview of Subquery Placement Contexts

Complete Placement Map:

Subquery Placement Contexts
Placement	Allowed Types	Primary Purpose	Example Pattern
SELECT list	Scalar only	Add computed columns	`SELECT ..., (subquery) AS col`
FROM clause	Table (derived tables)	Create virtual table source	`FROM (subquery) AS alias`
WHERE clause	Scalar, Row, Table	Filter source rows	`WHERE col operator (subquery)`
HAVING clause	Scalar, Table	Filter grouped results	`HAVING AGG(col) > (subquery)`
JOIN condition	Scalar	Dynamic join conditions	`ON a.col = (subquery)`
CASE expression	Scalar	Conditional logic	`CASE WHEN (subquery) ...`
INSERT VALUES	Scalar	Computed insert values	`VALUES ((subquery), ...)`
UPDATE SET	Scalar	Dynamic update values	`SET col = (subquery)`
DELETE WHERE	Scalar, Table	Dynamic deletion criteria	`WHERE col IN (subquery)`

Logical Query Execution Order

Subqueries in SELECT Clause

Subqueries in the SELECT list add computed columns to each output row. Only scalar subqueries are allowed—the subquery must return exactly one value.

Characteristics:

Result appears as a column in output
Can be non-correlated (same value for all rows) or correlated (different per row)
Does not affect which rows are returned—only adds data to them
Multiple scalar subqueries allowed in same SELECT

select_placement.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Non-correlated: same value for all rows
SELECT 
    employee_name,
    salary,
    (SELECT AVG(salary) FROM employees) AS company_avg,
    salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
 
-- Correlated: different value per row
SELECT 
    e.employee_name,
    e.salary,
    e.department_id,
    (SELECT d.department_name 
     FROM departments d 
     WHERE d.department_id = e.department_id) AS dept_name,
    (SELECT COUNT(*) 
     FROM employees e2 
     WHERE e2.department_id = e.department_id) AS dept_size,
    (SELECT AVG(salary) 
     FROM employees e3 
     WHERE e3.department_id = e.department_id) AS dept_avg
FROM employees e;
 
-- With expressions on subquery result
SELECT 
    product_name,
    price,
    ROUND(price / (SELECT AVG(price) FROM products) * 100, 1) AS pct_of_avg
FROM products;

SELECT Subquery Limitations

Performance Consideration:

Alternative: For heavy per-row computations, consider rewriting with JOINs to derived tables or using window functions.

Subqueries in FROM Clause (Derived Tables)

Characteristics:

Must be table subqueries (can return any number of rows/columns)
MUST have an alias (database requirement)
Acts as a regular table: can be joined, filtered, aggregated
Column aliases from subquery become the derived table's schema

from_placement.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Basic derived table
SELECT dept_stats.department_id, dept_stats.avg_salary
FROM (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
) AS dept_stats                              -- Alias REQUIRED
WHERE dept_stats.avg_salary > 75000;
 
 
-- Derived table joined with physical table
SELECT 
    e.employee_name,
    e.salary,
    stats.dept_avg,
    e.salary - stats.dept_avg AS diff
FROM employees e
JOIN (
    SELECT department_id, AVG(salary) AS dept_avg
    FROM employees
    GROUP BY department_id
) AS stats ON e.department_id = stats.department_id;
 
 
-- Multiple derived tables
SELECT 
    high_earners.department_id,
    high_earners.count AS high_earner_count,
    all_employees.count AS total_count
FROM (
    SELECT department_id, COUNT(*) AS count
    FROM employees
    WHERE salary > 100000
    GROUP BY department_id
) AS high_earners
JOIN (
    SELECT department_id, COUNT(*) AS count
    FROM employees
    GROUP BY department_id
) AS all_employees ON high_earners.department_id = all_employees.department_id;

Derived Table Advantages

•Enables multi-step computation
•Can filter on aggregate results
•Creates reusable intermediate results
•Allows complex transformations

Derived Table Limitations

•Cannot reference earlier in same FROM (without LATERAL)
•Deep nesting hurts readability
•May be materialized (memory usage)
•CTEs often clearer for complex cases

Subqueries in WHERE Clause

The WHERE clause is the most versatile subquery location, accepting all three subquery types (scalar, row, table) with different operators.

Subquery Types and Operators in WHERE:

WHERE Clause Subquery Patterns
Subquery Type	Valid Operators	Example
Scalar	=, <>, <, >, <=, >=	`WHERE salary > (SELECT AVG(salary) ...)`
Scalar	IS NULL, IS NOT NULL	`WHERE (SELECT ...) IS NOT NULL`
Row	=, <>, <, >, <=, >=	`WHERE (a, b) = (SELECT x, y ...)`
Table	IN, NOT IN	`WHERE id IN (SELECT id ...)`
Table	ANY, SOME, ALL	`WHERE salary > ALL (SELECT ...)`
Table (correlated)	EXISTS, NOT EXISTS	`WHERE EXISTS (SELECT 1 ...)`

where_placement.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- Scalar comparison
SELECT * FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
 
-- Row (tuple) comparison
SELECT * FROM employees
WHERE (department_id, job_code) = (
    SELECT department_id, job_code
    FROM employees WHERE employee_id = 100
);
 
-- IN with table subquery
SELECT * FROM products
WHERE category_id IN (
    SELECT category_id FROM categories WHERE active = TRUE
);
 
-- EXISTS with correlated subquery
SELECT * FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o 
    WHERE o.customer_id = c.customer_id
    AND o.total > 1000
);
 
-- ALL operator
SELECT * FROM employees
WHERE salary >= ALL (
    SELECT salary FROM employees WHERE department_id = 10
);
 
-- Combined operators
SELECT * FROM orders
WHERE customer_id IN (SELECT customer_id FROM vip_customers)
  AND total_amount > (SELECT AVG(total_amount) FROM orders);

WHERE Subquery Execution

WHERE subqueries execute during the filtering phase, before GROUP BY. This means they filter individual rows, not grouped results. If you need to filter after aggregation, use HAVING instead.

Subqueries in HAVING Clause

HAVING clause subqueries filter groups produced by GROUP BY, operating on aggregated values. This is distinct from WHERE, which filters individual rows before grouping.

Key Difference:

WHERE filters rows → happens before GROUP BY
HAVING filters groups → happens after GROUP BY and aggregate calculation

having_placement.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- Scalar subquery in HAVING: filter groups by aggregate comparison
SELECT department_id, AVG(salary) AS dept_avg
FROM employees
GROUP BY department_id
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);
-- Returns departments with above-average salary
 
 
-- Multiple subqueries in HAVING
SELECT department_id, COUNT(*) AS emp_count, AVG(salary) AS avg_salary
FROM employees  
GROUP BY department_id
HAVING COUNT(*) >= (SELECT AVG(cnt) FROM (
           SELECT COUNT(*) AS cnt FROM employees GROUP BY department_id
       ) AS counts)
   AND AVG(salary) > (SELECT AVG(salary) FROM employees);
 
 
-- HAVING with IN (table subquery)
SELECT department_id, SUM(salary) AS total_payroll  
FROM employees
GROUP BY department_id
HAVING department_id IN (
    SELECT department_id 
    FROM departments 
    WHERE budget > 1000000
);
 
 
-- HAVING with correlated subquery (rare but valid)
SELECT e.department_id, AVG(e.salary) AS dept_avg
FROM employees e
GROUP BY e.department_id
HAVING AVG(e.salary) > (
    SELECT AVG(salary) * 0.8
    FROM employees
    WHERE department_id = e.department_id  -- Correlation still works in HAVING
);

HAVING vs. WHERE for Subqueries

Subqueries in JOIN Conditions

Subqueries can appear in JOIN ON conditions, enabling dynamic join criteria. Only scalar subqueries are typically allowed here.

Use Cases:

Join on a value looked up dynamically
Apply business rules stored in a configuration table
Create date-relative or threshold-based joins

join_subqueries.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- Join with dynamic threshold from config table (subquery in ON)
SELECT o.order_id, o.total, d.discount_rate
FROM orders o
JOIN discount_tiers d 
    ON o.total >= d.min_amount
   AND d.tier_name = (
       SELECT current_tier FROM system_config WHERE config_key = 'active_discount'
   );
 
 
-- Join using a correlated subquery for the date range (subquery in ON)
SELECT e.employee_name, t.training_name
FROM employees e
JOIN training_attendance t 
    ON e.employee_id = t.employee_id
   AND t.attendance_date >= (
       SELECT MAX(review_date) 
       FROM performance_reviews 
       WHERE employee_id = e.employee_id
   );
 
 
-- Join to a single history row chosen by a correlated subquery
-- (alias "cur" is used because CURRENT is a reserved word in some databases)
SELECT cur.employee_name, cur.salary, prev.salary AS prev_salary
FROM employees cur
JOIN salary_history prev 
    ON cur.employee_id = prev.employee_id
   AND prev.effective_date = (
       SELECT MAX(effective_date)
       FROM salary_history
       WHERE employee_id = cur.employee_id
         AND effective_date < cur.hire_date
   );
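The "pick one related row" join can be tried out with a short sqlite3 sketch. The table contents and the function name `latest_history_rows` are hypothetical, and the `hire_date` filter from the listing is left out so that the example keeps the most recent history row per employee.

```python
import sqlite3


def latest_history_rows():
    """Join each employee to their most recent salary_history row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
        CREATE TABLE salary_history (employee_id INTEGER, effective_date TEXT, salary REAL);
        -- Hypothetical sample data
        INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob');
        INSERT INTO salary_history VALUES
            (1, '2022-01-01', 70000), (1, '2023-01-01', 80000),
            (2, '2021-06-01', 50000);
    """)
    rows = conn.execute("""
        SELECT e.employee_name, h.effective_date, h.salary
        FROM employees e
        JOIN salary_history h
          ON h.employee_id = e.employee_id
         -- Correlated scalar subquery inside the ON condition
         AND h.effective_date = (
               SELECT MAX(effective_date)
               FROM salary_history
               WHERE employee_id = e.employee_id
         )
        ORDER BY e.employee_name
    """).fetchall()
    conn.close()
    return rows


for row in latest_history_rows():
    print(row)
```

Ann has two history rows, but only the later one satisfies the ON condition, so each employee appears once in the result.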

Performance with JOIN Subqueries

Subqueries in DML Statements

Subqueries are equally powerful in Data Manipulation Language (DML) statements—INSERT, UPDATE, and DELETE—enabling dynamic data modification based on computed values.

INSERT with Subqueries:

insert_subqueries.sql
-- Scalar subqueries in VALUES
INSERT INTO audit_log (action, timestamp, avg_before_change)
VALUES (
    'SALARY_ADJUSTMENT',
    CURRENT_TIMESTAMP,
    (SELECT AVG(salary) FROM employees)
);
 
 
-- INSERT ... SELECT (entire result set)
INSERT INTO high_performers (employee_id, calculated_score)
SELECT employee_id, (sales_total * 0.4 + customer_rating * 0.6)
FROM employee_metrics
WHERE (sales_total * 0.4 + customer_rating * 0.6) > (
    SELECT AVG(sales_total * 0.4 + customer_rating * 0.6)
    FROM employee_metrics
);
 
 
-- INSERT with subquery computing values
INSERT INTO department_summaries (department_id, emp_count, avg_salary, computed_date)
SELECT 
    department_id,
    COUNT(*),
    AVG(salary),
    CURRENT_DATE
FROM employees
GROUP BY department_id;
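As a rough check of the INSERT ... SELECT form, the sketch below loads three hypothetical rows into `employee_metrics` and copies only the above-average scorers into `high_performers`. The data values and the function name `insert_above_average` are assumptions; the weighting formula is the one from the listing above.

```python
import sqlite3


def insert_above_average():
    """Copy employees whose weighted score beats the average into high_performers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employee_metrics (
            employee_id INTEGER, sales_total REAL, customer_rating REAL);
        CREATE TABLE high_performers (employee_id INTEGER, calculated_score REAL);
        -- Hypothetical sample data
        INSERT INTO employee_metrics VALUES (1, 100, 5), (2, 200, 4), (3, 600, 3);
    """)
    # The scalar subquery in WHERE is evaluated against the source table,
    # and every row the SELECT returns is inserted
    conn.execute("""
        INSERT INTO high_performers (employee_id, calculated_score)
        SELECT employee_id, (sales_total * 0.4 + customer_rating * 0.6)
        FROM employee_metrics
        WHERE (sales_total * 0.4 + customer_rating * 0.6) > (
            SELECT AVG(sales_total * 0.4 + customer_rating * 0.6)
            FROM employee_metrics
        )
    """)
    ids = [r[0] for r in conn.execute(
        "SELECT employee_id FROM high_performers ORDER BY employee_id")]
    conn.close()
    return ids


print(insert_above_average())
```

With this data only employee 3 scores above the average, so a single row is inserted.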

UPDATE with Subqueries:

update_subqueries.sql
-- Scalar subquery in SET clause
UPDATE employees
SET bonus = salary * (SELECT bonus_rate FROM config WHERE year = 2024);
 
 
-- Scalar subquery in WHERE clause
-- (MySQL rejects a direct subquery on the table being updated; wrap it in a derived table there)
UPDATE products
SET on_sale = TRUE
WHERE price < (SELECT AVG(price) FROM products);
 
 
-- Correlated subquery in SET (update based on related data)
UPDATE employees e
SET department_name = (
    SELECT department_name 
    FROM departments d 
    WHERE d.department_id = e.department_id
);
 
 
-- Correlated subquery in both SET and WHERE
UPDATE orders o
SET discount_applied = (
    SELECT discount_rate 
    FROM customer_tiers ct 
    WHERE ct.tier_level = (
        SELECT tier_level FROM customers c WHERE c.customer_id = o.customer_id
    )
)
WHERE EXISTS (
    SELECT 1 FROM customers c 
    WHERE c.customer_id = o.customer_id 
    AND c.tier_level > 0
);
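One detail worth checking with correlated SET subqueries is what happens to rows with no match: the subquery yields NULL and the column is overwritten with it. The sqlite3 sketch below contrasts an unguarded update with one guarded by EXISTS. The sample rows, the department id 99, and the function name `update_department_names` are assumptions for illustration.

```python
import sqlite3


def update_department_names(guarded):
    """Copy department names onto employees, optionally skipping unmatched rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
        CREATE TABLE employees (
            employee_id INTEGER PRIMARY KEY, department_id INTEGER, department_name TEXT);
        -- Hypothetical sample data; department 99 does not exist
        INSERT INTO departments VALUES (1, 'Engineering');
        INSERT INTO employees VALUES (1, 1, 'old'), (2, 99, 'legacy');
    """)
    sql = """
        UPDATE employees
        SET department_name = (
            SELECT d.department_name
            FROM departments d
            WHERE d.department_id = employees.department_id
        )
    """
    if guarded:
        # Only touch rows that actually have a matching department
        sql += """
        WHERE EXISTS (
            SELECT 1 FROM departments d
            WHERE d.department_id = employees.department_id
        )
        """
    conn.execute(sql)
    rows = conn.execute(
        "SELECT employee_id, department_name FROM employees ORDER BY employee_id"
    ).fetchall()
    conn.close()
    return rows


print(update_department_names(guarded=False))  # employee 2 is set to NULL
print(update_department_names(guarded=True))   # employee 2 keeps 'legacy'
```

The guarded form mirrors the SET plus WHERE EXISTS pattern in the last listing above.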

DELETE with Subqueries:

delete_subqueries.sql
-- Delete using IN subquery
DELETE FROM orders
WHERE customer_id IN (
    SELECT customer_id FROM customers WHERE status = 'inactive'
);
 
 
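-- Delete using scalar comparison (MySQL date syntax)
-- MySQL rejects a direct subquery on the table being deleted from (error 1093),
-- so the aggregate is wrapped in a derived table
DELETE FROM logs
WHERE created_at < (
    SELECT cutoff
    FROM (
        SELECT DATE_SUB(MAX(created_at), INTERVAL 90 DAY) AS cutoff FROM logs
    ) AS recent
);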
 
 
-- Delete using EXISTS (often for referential cleanup)
DELETE FROM orphaned_records o
WHERE NOT EXISTS (
    SELECT 1 FROM master_records m WHERE m.id = o.master_id
);
 
 
-- Delete using NOT IN (careful with NULLs!)
DELETE FROM temp_data
WHERE record_id NOT IN (
    SELECT record_id FROM permanent_data WHERE record_id IS NOT NULL
);
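The NULL warning on the last DELETE can be reproduced with a small sqlite3 sketch. Here the `IS NOT NULL` filter is deliberately omitted from the NOT IN version to show the failure mode. The sample rows and the function name `delete_unmatched` are assumptions.

```python
import sqlite3


def delete_unmatched(use_not_exists):
    """Delete temp_data rows that have no counterpart in permanent_data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE temp_data (record_id INTEGER);
        CREATE TABLE permanent_data (record_id INTEGER);
        -- Hypothetical sample data; note the NULL in permanent_data
        INSERT INTO temp_data VALUES (1), (2), (3);
        INSERT INTO permanent_data VALUES (1), (NULL);
    """)
    if use_not_exists:
        conn.execute("""
            DELETE FROM temp_data
            WHERE NOT EXISTS (
                SELECT 1 FROM permanent_data p
                WHERE p.record_id = temp_data.record_id
            )
        """)
    else:
        # Unfiltered NOT IN: the NULL makes the predicate unknown for
        # every non-matching row, so nothing is deleted
        conn.execute("""
            DELETE FROM temp_data
            WHERE record_id NOT IN (SELECT record_id FROM permanent_data)
        """)
    remaining = [r[0] for r in conn.execute(
        "SELECT record_id FROM temp_data ORDER BY record_id")]
    conn.close()
    return remaining


print(delete_unmatched(use_not_exists=False))  # all three rows remain
print(delete_unmatched(use_not_exists=True))   # only record 1 remains
```

Filtering NULLs inside the subquery, as the listing above does, or switching to NOT EXISTS both avoid the silent no-op.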

Same-Table Subquery Restrictions

Choosing the Right Placement

With multiple valid placements for many subqueries, how do you choose? The decision depends on your goal, the data flow, and performance considerations.

Decision Framework:

Subquery Placement Decision Guide
If You Need To...	Use Placement	Rationale
Add computed columns to output	SELECT	Only SELECT adds columns without filtering
Filter individual rows	WHERE	Filters rows before GROUP BY, so less data reaches aggregation
Filter aggregated groups	HAVING	Executes after GROUP BY, filters on aggregates
Create intermediate table for joining	FROM (derived table)	Enables multi-step transformations
Check existence of related data	WHERE EXISTS	Can stop at the first matching row
Match against a set of values	WHERE IN	Set membership testing
Compare against all values in set	WHERE > ALL	Universal comparison
Use subquery result multiple times	FROM or CTE	Avoids duplicate execution

placement_comparison.sql
-- SAME RESULT, DIFFERENT PLACEMENTS
-- Goal: Employees above company average, showing the average
 
-- Option A: SELECT + WHERE (two subqueries, possibly cached)
SELECT 
    employee_name, 
    salary,
    (SELECT AVG(salary) FROM employees) AS company_avg
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
 
 
-- Option B: FROM clause (single computation, joined)
SELECT e.employee_name, e.salary, stats.avg AS company_avg
FROM employees e
CROSS JOIN (SELECT AVG(salary) AS avg FROM employees) AS stats
WHERE e.salary > stats.avg;
 
 
-- Option C: Using CTE (clearest, single computation)
WITH company_stats AS (
    SELECT AVG(salary) AS avg FROM employees
)
SELECT e.employee_name, e.salary, s.avg AS company_avg
FROM employees e
CROSS JOIN company_stats s
WHERE e.salary > s.avg;
 
-- All three produce identical results
-- Option A: 2 subquery executions (possibly cached to 1)
-- Option B: 1 subquery execution + CROSS JOIN
-- Option C: 1 CTE execution + JOIN (most readable)
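The claim that the three options return identical results can be checked with a short sqlite3 sketch. The three sample rows and the function name `compare_placements` are assumptions. The column alias is spelled `avg_salary` here rather than `avg` purely to keep it distinct from the function name.

```python
import sqlite3


def compare_placements():
    """Run options A, B and C on the same data and return all three results."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (employee_name TEXT, salary REAL);
        -- Hypothetical sample data; the average is 60000
        INSERT INTO employees VALUES ('Ann', 90000), ('Bob', 50000), ('Cy', 40000);
    """)
    option_a = conn.execute("""
        SELECT employee_name, salary,
               (SELECT AVG(salary) FROM employees) AS company_avg
        FROM employees
        WHERE salary > (SELECT AVG(salary) FROM employees)
        ORDER BY employee_name
    """).fetchall()
    option_b = conn.execute("""
        SELECT e.employee_name, e.salary, stats.avg_salary AS company_avg
        FROM employees e
        CROSS JOIN (SELECT AVG(salary) AS avg_salary FROM employees) AS stats
        WHERE e.salary > stats.avg_salary
        ORDER BY e.employee_name
    """).fetchall()
    option_c = conn.execute("""
        WITH company_stats AS (
            SELECT AVG(salary) AS avg_salary FROM employees
        )
        SELECT e.employee_name, e.salary, s.avg_salary AS company_avg
        FROM employees e
        CROSS JOIN company_stats s
        WHERE e.salary > s.avg_salary
        ORDER BY e.employee_name
    """).fetchall()
    conn.close()
    return option_a, option_b, option_c


a, b, c = compare_placements()
print(a == b == c, a)
```

The results match, but this says nothing about execution cost; how many times each subquery actually runs depends on the database's optimizer.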

When in Doubt, Favor Clarity

Common Patterns and Anti-Patterns

Experience reveals patterns that work well and anti-patterns to avoid.

Recommended Patterns:

good_patterns.sql
-- PATTERN: Derived table for aggregate-then-filter
SELECT d.department_id, d.avg_salary
FROM (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
) d
WHERE d.avg_salary > 50000;
-- Clear: compute aggregates, then filter
 
 
-- PATTERN: EXISTS for existence checks
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
-- Efficient: stops at first match
 
 
-- PATTERN: Scalar subquery for single reference values
SELECT * FROM employees
WHERE hire_date < (SELECT company_founding_date FROM company_info LIMIT 1);
-- Simple: one value, one comparison
 
 
-- PATTERN: IN for set membership
SELECT * FROM products
WHERE category_id IN (SELECT category_id FROM featured_categories);
-- Semantic: "is member of this set"

Anti-Patterns to Avoid:

antipatterns.sql
-- ANTI-PATTERN: Repeated identical subqueries
-- BAD: the AVG subquery appears twice and may be evaluated twice
SELECT employee_name,
       salary - (SELECT AVG(salary) FROM employees) AS diff,
       100.0 * salary / (SELECT SUM(salary) FROM employees) AS pct  -- 100.0 avoids integer division
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
 
-- BETTER: compute once, reuse
WITH stats AS (
    SELECT AVG(salary) AS avg, SUM(salary) AS total FROM employees
)
SELECT employee_name,
       salary - stats.avg AS diff,
       100.0 * salary / stats.total AS pct
FROM employees
CROSS JOIN stats
WHERE salary > stats.avg;
 
 
-- ANTI-PATTERN: NOT IN with potentially NULL subquery
-- DANGEROUS: returns NO rows if subquery has NULL
SELECT * FROM orders WHERE customer_id NOT IN (SELECT customer_id FROM banned_customers);
 
-- SAFE: use NOT EXISTS
SELECT * FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM banned_customers b WHERE b.customer_id = o.customer_id);
 
 
-- ANTI-PATTERN: Correlated subquery where JOIN works
-- OFTEN SLOWER: the subquery may be evaluated once per outer row
SELECT e.employee_name, (SELECT d.name FROM depts d WHERE d.id = e.dept_id) AS dept
FROM employees e;
 
-- USUALLY FASTER: simple JOIN (LEFT JOIN keeps employees without a department)
SELECT e.employee_name, d.name AS dept
FROM employees e LEFT JOIN depts d ON d.id = e.dept_id;

Anti-Pattern Summary

•Repeated identical subqueries — Use CTEs or derived tables for reuse
•NOT IN with NULL risk — Use NOT EXISTS instead
•Correlated subquery for simple lookup — Use JOIN
•Subquery used only to attach an aggregate to each row — Use window functions such as AVG(salary) OVER ()
•Deep nesting without necessity — Flatten with CTEs for readability
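The window-function alternative mentioned in the summary can be sketched as follows. This assumes an SQLite build with window-function support (version 3.25 or later). The sample data and the function name `salary_vs_average` are hypothetical.

```python
import sqlite3


def salary_vs_average():
    """Compute each salary's distance from the average two ways."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (employee_name TEXT, salary REAL);
        -- Hypothetical sample data; the average is 60000
        INSERT INTO employees VALUES ('Ann', 90000), ('Bob', 50000), ('Cy', 40000);
    """)
    # Scalar subquery in SELECT
    scalar = conn.execute("""
        SELECT employee_name,
               salary - (SELECT AVG(salary) FROM employees) AS diff
        FROM employees
        ORDER BY employee_name
    """).fetchall()
    # Window function: an empty OVER () makes the whole result set one window
    windowed = conn.execute("""
        SELECT employee_name,
               salary - AVG(salary) OVER () AS diff
        FROM employees
        ORDER BY employee_name
    """).fetchall()
    conn.close()
    return scalar, windowed


scalar, windowed = salary_vs_average()
print(scalar == windowed, windowed)
```

Both forms give the same numbers here. One difference to keep in mind: the window average is computed over the rows that remain after WHERE, whereas an uncorrelated scalar subquery always sees the whole table.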

Summary: Mastering Subquery Placement

Subquery placement is as important as subquery design. Let's consolidate the key insights:

Key Takeaways

•SELECT clause — Scalar only; adds computed columns without filtering; correlated subqueries compute per row.
•FROM clause — Table subqueries only; creates derived tables; must have alias; enables multi-step transformations.
•WHERE clause — Most versatile; accepts all types; filters rows before grouping; use for primary filtering logic.
•HAVING clause — Filters groups after aggregation; use only when aggregate-based filtering is needed.
•DML statements — Subqueries enable dynamic INSERT/UPDATE/DELETE based on computed values.
•Placement choice — Depends on goal (filter vs. enrich), timing (pre vs. post aggregation), and performance.
•Avoid anti-patterns — Don't repeat subqueries; use NOT EXISTS over NOT IN; prefer JOINs for simple lookups.

Module Complete:

In the next module, we'll explore correlated subqueries in depth—subqueries that reference the outer query, enabling powerful row-by-row dependent computations.
