SQL Query Writing for Interviews

Level: Advanced

Duration: 90 mins

Topic: SQL Query Writing

Subqueries

The Power of Nested Queries

Subqueries are queries nested within other queries, enabling solutions that would be impossible or extremely awkward with flat SQL alone. They allow you to compute values on-the-fly, filter based on aggregated results, and express complex logical conditions that reference other result sets.

In technical interviews, subqueries are often the key to solving problems that initially seem intractable. More importantly, understanding when to use subqueries versus joins—and recognizing the patterns where each excels—demonstrates the sophisticated SQL thinking that distinguishes exceptional candidates.

What You Will Learn

By the end of this page, you will master scalar subqueries for single-value computations, table subqueries for derived tables and inline views, correlated subqueries for row-by-row processing, Common Table Expressions (CTEs) for readable modular queries, and recursive CTEs for hierarchical and graph data.

Subquery Fundamentals

A subquery is a SELECT statement embedded within another SQL statement. Understanding where subqueries can appear and what they return is fundamental to using them effectively.

Subquery Types and Characteristics
Subquery Type	Returns	Usage Location	Correlation
Scalar Subquery	Single value (1 row, 1 column)	SELECT, WHERE, HAVING	Optional
Row Subquery	Single row (1 row, multiple columns)	WHERE (with row constructor)	Optional
Table Subquery	Multiple rows and columns	FROM, JOIN	Only with LATERAL (or APPLY in SQL Server)
Correlated Subquery	Any of above, references outer query	SELECT, WHERE, HAVING	Required
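
The table lists row subqueries, but none of the examples on this page exercise one. The following is a minimal sketch, assuming a PostgreSQL-style dialect that supports row constructors (SQL Server, for instance, does not accept this comparison form). The `demo_products` table and its rows are hypothetical and exist only for illustration.

```sql
-- Hypothetical sample data for illustration only
CREATE TEMPORARY TABLE demo_products (
    product_id   INT PRIMARY KEY,
    category_id  INT NOT NULL,
    unit_price   NUMERIC(10, 2) NOT NULL
);

INSERT INTO demo_products (product_id, category_id, unit_price) VALUES
    (1, 10, 25.00),
    (2, 10, 40.00),
    (3, 20, 40.00),
    (4, 10, 40.00);

-- Row subquery: returns one row with two columns, compared against a
-- row constructor. Finds products sharing category AND price with product 2.
SELECT product_id
FROM demo_products
WHERE (category_id, unit_price) = (
    SELECT category_id, unit_price
    FROM demo_products
    WHERE product_id = 2
)
ORDER BY product_id;
-- Returns product_id 2 and 4 (product 3 matches the price but not the category)
```

Like a scalar subquery, the row subquery here must not return more than one row. Filtering on the primary key guarantees that.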

Subquery Execution Conceptual Model:

To understand subquery behavior, it helps to visualize the execution model:

Non-correlated subqueries execute once and their result is reused
Correlated subqueries conceptually execute once per row of the outer query
Table subqueries (FROM) execute first, creating a derived table for the outer query

subquery_basics.sql
SQL
-- Scalar subquery in SELECT: Single value per row context
SELECT 
    product_name,
    unit_price,
    (SELECT AVG(unit_price) FROM products) AS avg_price,
    unit_price - (SELECT AVG(unit_price) FROM products) AS diff_from_avg
FROM products;
 
-- Scalar subquery in WHERE: Filter based on computed value
SELECT product_name, unit_price
FROM products
WHERE unit_price > (SELECT AVG(unit_price) FROM products);
 
-- Table subquery in FROM: Derived table
SELECT 
    category,
    avg_price,
    product_count
FROM (
    SELECT 
        category_id AS category,
        AVG(unit_price) AS avg_price,
        COUNT(*) AS product_count
    FROM products
    GROUP BY category_id
) AS category_stats
WHERE product_count >= 5;
 
-- Subquery with IN: Set membership test
SELECT customer_name, email
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
);
 
-- Subquery with EXISTS: Existence test
SELECT customer_name, email
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.total > 1000
);

Scalar Subquery Requirement

A scalar subquery must return exactly one column and at most one row. If it returns no rows, the result is NULL. If it returns more than one row, most databases raise an error at run time. Keep both cases in mind whenever a scalar subquery appears in WHERE or SELECT.
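
Both edge cases are easy to reproduce. The sketch below assumes PostgreSQL and uses a hypothetical `demo_orders` table with made-up rows. The statement that would fail is left commented out so the script runs end to end.

```sql
-- Hypothetical sample data for illustration only
CREATE TEMPORARY TABLE demo_orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,
    total       NUMERIC(10, 2) NOT NULL
);

INSERT INTO demo_orders (order_id, customer_id, total) VALUES
    (1, 100, 50.00),
    (2, 100, 75.00),
    (3, 200, 20.00);

-- Zero rows: customer 999 has no orders, so the scalar subquery yields NULL.
-- COALESCE turns that NULL into an explicit default.
SELECT COALESCE(
    (SELECT total FROM demo_orders WHERE customer_id = 999),
    0
) AS total_or_default;
-- Returns 0

-- Multiple rows: customer 100 has two orders, so this would raise an error
-- (PostgreSQL: "more than one row returned by a subquery used as an expression").
-- SELECT (SELECT total FROM demo_orders WHERE customer_id = 100);

-- Guard: an aggregate without GROUP BY always returns exactly one row.
SELECT (SELECT MAX(total) FROM demo_orders WHERE customer_id = 100) AS largest_total;
-- Returns 75.00
```

Wrapping the inner query in an aggregate is usually a cleaner guard than adding LIMIT 1, because it states which of the candidate values you actually want.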

Correlated Subqueries: Row-by-Row Processing

A correlated subquery references columns from the outer query, creating a dependency that (conceptually) causes the subquery to execute once for each row of the outer query. This enables powerful patterns but requires careful performance consideration.

Identifying Correlated Subqueries:

A subquery is correlated when it references a column from the outer query that is not defined within the subquery itself.

correlated_subqueries.sql
SQL
-- Correlated in WHERE: Find products priced above their category average
SELECT 
    p.product_name,
    p.category_id,
    p.unit_price
FROM products p
WHERE p.unit_price > (
    SELECT AVG(p2.unit_price)
    FROM products p2
    WHERE p2.category_id = p.category_id  -- Correlation: references outer p
);
 
-- Correlated in SELECT: Running count
SELECT 
    o.order_id,
    o.order_date,
    o.customer_id,
    (
        SELECT COUNT(*)
        FROM orders o2
        WHERE o2.customer_id = o.customer_id  -- Correlation
        AND o2.order_date <= o.order_date
    ) AS customer_order_number
FROM orders o
ORDER BY o.customer_id, o.order_date;
 
-- EXISTS with correlation: Customers with high-value orders
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id  -- Correlation
    AND o.total > 1000
);
 
-- NOT EXISTS with correlation: Customers with no orders this year
SELECT c.customer_id, c.name, c.email
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id  -- Correlation
    AND EXTRACT(YEAR FROM o.order_date) = EXTRACT(YEAR FROM CURRENT_DATE)
);
 
-- Correlated subquery for top-N per group (without window functions)
SELECT p.product_id, p.product_name, p.category_id, p.unit_price
FROM products p
WHERE (
    SELECT COUNT(*)
    FROM products p2
    WHERE p2.category_id = p.category_id  -- Same category
    AND p2.unit_price > p.unit_price       -- Higher price
) < 3;  -- Fewer than 3 products priced higher = top 3 (ties can return more than 3 rows)
 
-- Correlated UPDATE: Update with computed value from related table
UPDATE customers c
SET total_orders = (
    SELECT COUNT(*)
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

Performance Consideration

Correlated subqueries can be inefficient on large tables as they conceptually execute once per outer row. Modern optimizers often transform them into joins, but when performance matters, consider rewriting as a JOIN or using window functions. Always check EXPLAIN ANALYZE.
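
As a rough illustration of the two rewrites mentioned here, the sketch below re-expresses the "products priced above their category average" query without a correlated subquery. It assumes a dialect with window functions (PostgreSQL, MySQL 8+, SQL Server, SQLite 3.25+). The `perf_products` table and its rows are hypothetical.

```sql
-- Hypothetical sample data for illustration only
CREATE TEMPORARY TABLE perf_products (
    product_id   INT PRIMARY KEY,
    category_id  INT NOT NULL,
    unit_price   NUMERIC(10, 2) NOT NULL
);

INSERT INTO perf_products (product_id, category_id, unit_price) VALUES
    (1, 10, 10.00),
    (2, 10, 30.00),
    (3, 20, 5.00),
    (4, 20, 7.00),
    (5, 20, 12.00);

-- Rewrite 1: join to a derived table that computes each category average once
SELECT p.product_id, p.unit_price
FROM perf_products p
JOIN (
    SELECT category_id, AVG(unit_price) AS avg_price
    FROM perf_products
    GROUP BY category_id
) ca ON ca.category_id = p.category_id
WHERE p.unit_price > ca.avg_price
ORDER BY p.product_id;

-- Rewrite 2: a window function computes the average alongside each row
SELECT product_id, unit_price
FROM (
    SELECT
        product_id,
        unit_price,
        AVG(unit_price) OVER (PARTITION BY category_id) AS avg_price
    FROM perf_products
) priced
WHERE unit_price > avg_price
ORDER BY product_id;
-- Both rewrites return products 2 and 5
```

Whether either form is actually faster than the correlated version depends on the optimizer and the data, so compare the plans before committing to one.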

Common Interview Pattern: All-Match Condition

Finding rows that match ALL members of a set is a classic interview problem that correlated subqueries solve elegantly:

all_match_pattern.sql
SQL
-- Find students who have taken ALL required courses
-- Required courses: MATH101, COMP101, PHYS101
 
-- Method 1: Double NOT EXISTS (relational division)
SELECT s.student_id, s.name
FROM students s
WHERE NOT EXISTS (
    -- Find any required course this student hasn't taken
    SELECT 1
    FROM (VALUES ('MATH101'), ('COMP101'), ('PHYS101')) AS required(course_id)
    WHERE NOT EXISTS (
        SELECT 1
        FROM enrollments e
        WHERE e.student_id = s.student_id
        AND e.course_id = required.course_id
    )
);
 
-- Method 2: COUNT comparison
SELECT s.student_id, s.name
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
WHERE e.course_id IN ('MATH101', 'COMP101', 'PHYS101')
GROUP BY s.student_id, s.name
HAVING COUNT(DISTINCT e.course_id) = 3;  -- Must have all 3
 
-- Find suppliers who supply ALL products in a category
SELECT s.supplier_id, s.supplier_name
FROM suppliers s
WHERE NOT EXISTS (
    SELECT p.product_id
    FROM products p
    WHERE p.category_id = 'CAT-ELECTRONICS'
    AND NOT EXISTS (
        SELECT 1
        FROM supplier_products sp
        WHERE sp.supplier_id = s.supplier_id
        AND sp.product_id = p.product_id
    )
);
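
Method 2 above hard-codes the number of required courses. A possible variation, sketched here with hypothetical `req_courses` and `req_enrollments` tables and made-up rows, reads the required set from a table so the count stays in sync with the data.

```sql
-- Hypothetical sample data for illustration only
CREATE TEMPORARY TABLE req_courses (
    course_id VARCHAR(10) PRIMARY KEY
);

CREATE TEMPORARY TABLE req_enrollments (
    student_id INT NOT NULL,
    course_id  VARCHAR(10) NOT NULL
);

INSERT INTO req_courses (course_id) VALUES
    ('MATH101'), ('COMP101'), ('PHYS101');

INSERT INTO req_enrollments (student_id, course_id) VALUES
    (1, 'MATH101'), (1, 'COMP101'), (1, 'PHYS101'), (1, 'ART100'),
    (2, 'MATH101'), (2, 'COMP101'),
    (3, 'PHYS101'), (3, 'PHYS101'), (3, 'MATH101');  -- repeated enrollment

-- Keep only enrollments in required courses, then compare the number of
-- distinct required courses taken against the size of the required set.
SELECT e.student_id
FROM req_enrollments e
JOIN req_courses r ON r.course_id = e.course_id
GROUP BY e.student_id
HAVING COUNT(DISTINCT e.course_id) = (SELECT COUNT(*) FROM req_courses);
-- Returns student 1 only
```

The two methods differ when the required set is empty: the double NOT EXISTS form returns every student, while the COUNT form returns none. It is worth stating which behavior you intend.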

Common Table Expressions (CTEs)

Common Table Expressions (CTEs), introduced with the WITH clause, define named temporary result sets that exist only within the scope of a single statement. CTEs dramatically improve query readability and enable step-by-step query construction.

CTE Advantages:

Readability: Break complex queries into named, logical steps
Reusability: Reference the same CTE multiple times in one query
Debugging: Easier to isolate and test individual components
Recursion: Enable recursive queries for hierarchical data

cte_basics.sql
SQL
-- Basic CTE: Named temporary result set
WITH high_value_customers AS (
    SELECT 
        c.customer_id,  -- Qualified: customer_id exists in both tables
        c.name,
        SUM(o.total) AS lifetime_value
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.name
    HAVING SUM(o.total) > 10000
)
SELECT * FROM high_value_customers
ORDER BY lifetime_value DESC;
 
-- Multiple CTEs: Step-by-step query construction
WITH 
customer_orders AS (
    SELECT 
        customer_id,
        COUNT(*) AS order_count,
        SUM(total) AS total_spending,
        AVG(total) AS avg_order
    FROM orders
    GROUP BY customer_id
),
customer_segments AS (
    SELECT 
        customer_id,
        order_count,
        total_spending,
        avg_order,
        CASE 
            WHEN total_spending >= 10000 THEN 'VIP'
            WHEN total_spending >= 5000 THEN 'Gold'
            WHEN total_spending >= 1000 THEN 'Silver'
            ELSE 'Bronze'
        END AS segment
    FROM customer_orders
)
SELECT 
    c.name,
    c.email,
    cs.segment,
    cs.order_count,
    cs.total_spending,
    ROUND(cs.avg_order, 2) AS avg_order
FROM customers c
JOIN customer_segments cs ON c.customer_id = cs.customer_id
ORDER BY cs.total_spending DESC;
 
-- CTE referenced multiple times
WITH monthly_sales AS (
    SELECT 
        DATE_TRUNC('month', order_date) AS month,
        SUM(total) AS revenue
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
)
SELECT 
    curr.month,
    curr.revenue AS current_revenue,
    prev.revenue AS previous_revenue,
    curr.revenue - COALESCE(prev.revenue, 0) AS change,
    ROUND(
        100.0 * (curr.revenue - COALESCE(prev.revenue, 0)) / 
        NULLIF(prev.revenue, 0), 
        2
    ) AS pct_change
FROM monthly_sales curr
LEFT JOIN monthly_sales prev 
    ON curr.month = prev.month + INTERVAL '1 month'
ORDER BY curr.month;
 
-- CTE with INSERT (PostgreSQL, SQL Server)
WITH new_orders AS (
    SELECT 
        order_id, 
        customer_id, 
        total
    FROM staging_orders
    WHERE validated = true
)
INSERT INTO orders (order_id, customer_id, total, created_at)
SELECT order_id, customer_id, total, CURRENT_TIMESTAMP
FROM new_orders;

CTE Materialization

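Databases optimize CTEs differently. Before version 12, PostgreSQL always materialized CTEs (computed once, result stored), which blocked predicate pushdown from the outer query. From version 12 on, it inlines a non-recursive, side-effect-free CTE that is referenced only once, and you can override that choice with 'AS MATERIALIZED' or 'AS NOT MATERIALIZED'. Other engines make their own choices, so check your database's behavior when performance is critical.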
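
The materialization keywords can be sketched as follows. This assumes PostgreSQL 12 or later, since the syntax is not portable. The `mat_orders` table and its rows are hypothetical.

```sql
-- Hypothetical sample data for illustration only (PostgreSQL 12+)
CREATE TEMPORARY TABLE mat_orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,
    total       NUMERIC(10, 2) NOT NULL
);

INSERT INTO mat_orders (order_id, customer_id, total) VALUES
    (1, 100, 50.00),
    (2, 100, 75.00),
    (3, 200, 20.00);

-- Force the CTE to be computed once and stored before the outer query runs
WITH customer_totals AS MATERIALIZED (
    SELECT customer_id, SUM(total) AS total_spending
    FROM mat_orders
    GROUP BY customer_id
)
SELECT customer_id, total_spending
FROM customer_totals
WHERE customer_id = 100;

-- Ask the planner to inline the CTE so the outer filter can be pushed into it
WITH customer_totals AS NOT MATERIALIZED (
    SELECT customer_id, SUM(total) AS total_spending
    FROM mat_orders
    GROUP BY customer_id
)
SELECT customer_id, total_spending
FROM customer_totals
WHERE customer_id = 100;
-- Both return customer 100 with total_spending 125.00; only the plan differs
```

Prefixing each statement with EXPLAIN shows whether a separate CTE scan node appears in the plan.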

Recursive CTEs: Hierarchical and Graph Data

Recursive CTEs enable queries that reference themselves, essential for traversing hierarchical data (org charts, bill of materials) and graph structures (social networks, dependencies).

Recursive CTE Structure:

WITH RECURSIVE cte_name AS (
    -- Anchor member: Starting point (non-recursive)
    SELECT ... 
    UNION [ALL]
    -- Recursive member: References CTE itself
    SELECT ... FROM cte_name WHERE termination_condition
)
SELECT * FROM cte_name;
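
Before the larger examples, a minimal instance of this structure may help. The sketch below needs no tables and should run on PostgreSQL, MySQL 8+, and SQLite. The CTE name `counter` and column `n` are arbitrary choices.

```sql
WITH RECURSIVE counter AS (
    -- Anchor member: runs once and produces the first row
    SELECT 1 AS n

    UNION ALL

    -- Recursive member: runs repeatedly against the rows produced by the
    -- previous step, and stops when the WHERE clause yields no rows
    SELECT n + 1
    FROM counter
    WHERE n < 5
)
SELECT n
FROM counter
ORDER BY n;
-- Returns 1, 2, 3, 4, 5
```

Each step sees only the rows from the step before it. When the row with n = 5 is processed, the WHERE clause filters it out, the step produces nothing, and the recursion ends.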

recursive_ctes.sql
SQL
-- Classic: Employee hierarchy (org chart)
WITH RECURSIVE org_hierarchy AS (
    -- Anchor: Start with the CEO (no manager)
    SELECT 
        employee_id,
        name,
        manager_id,
        1 AS level,
        name::VARCHAR(1000) AS path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive: Find direct reports of current level
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        oh.level + 1,
        (oh.path || ' > ' || e.name)::VARCHAR(1000)
    FROM employees e
    JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT 
    employee_id,
    REPEAT('  ', level - 1) || name AS org_chart,
    level,
    path
FROM org_hierarchy
ORDER BY path;
 
-- Bill of Materials: Find all components of a product
WITH RECURSIVE bom AS (
    -- Anchor: Start with the top-level product
    SELECT 
        component_id,
        component_name,
        1 AS quantity,
        1 AS level
    FROM components
    WHERE component_id = 'PROD-001'
    
    UNION ALL
    
    -- Recursive: Find sub-components
    SELECT 
        c.component_id,
        c.component_name,
        b.quantity * pc.quantity,  -- Accumulated quantity
        b.level + 1
    FROM bom b
    JOIN product_components pc ON b.component_id = pc.parent_id
    JOIN components c ON pc.component_id = c.component_id
    WHERE b.level < 10  -- Prevent infinite recursion
)
SELECT 
    component_id,
    component_name,
    SUM(quantity) AS total_needed,
    MIN(level) AS first_level
FROM bom
GROUP BY component_id, component_name
ORDER BY first_level, component_name;
 
-- Graph traversal: Find all connections within N degrees
WITH RECURSIVE connections AS (
    -- Anchor: Direct connections of user 1001
    SELECT 
        friend_id AS user_id,
        1 AS degree,
        ARRAY[1001, friend_id] AS path
    FROM friendships
    WHERE user_id = 1001
    
    UNION
    
    -- Recursive: Friends of friends
    SELECT 
        f.friend_id,
        c.degree + 1,
        c.path || f.friend_id
    FROM connections c
    JOIN friendships f ON c.user_id = f.user_id
    WHERE c.degree < 3  -- Up to 3 degrees
    AND NOT f.friend_id = ANY(c.path)  -- Prevent cycles
)
SELECT user_id, MIN(degree) AS closest_degree  -- GROUP BY already yields one row per user
FROM connections
GROUP BY user_id
ORDER BY closest_degree, user_id;
 
-- Generate series (useful for filling gaps)
WITH RECURSIVE date_series AS (
    SELECT DATE '2024-01-01' AS date
    UNION ALL
    SELECT (date + INTERVAL '1 day')::DATE  -- Cast back: date + interval yields a timestamp
    FROM date_series
    WHERE date < DATE '2024-12-31'
)
SELECT ds.date, COALESCE(o.order_count, 0) AS orders
FROM date_series ds
LEFT JOIN (
    SELECT order_date::DATE, COUNT(*) AS order_count
    FROM orders
    GROUP BY order_date::DATE
) o ON ds.date = o.order_date
ORDER BY ds.date;

Recursion Safety

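Recursive CTEs can run indefinitely if not properly bounded. Always include: (1) a termination condition in the recursive member (WHERE level < N, or a path check that rejects already-visited nodes), and (2) a maximum recursion limit if your database supports one. SQL Server stops after 100 recursion levels by default (adjustable with OPTION (MAXRECURSION n)), and MySQL's cte_max_recursion_depth defaults to 1000. PostgreSQL has no built-in iteration limit, so the query itself must guarantee termination.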
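
To make the two safeguards concrete, here is a sketch that walks a deliberately cyclic graph. It assumes PostgreSQL, because it relies on array syntax. The `cyc_edges` table and its three rows are hypothetical.

```sql
-- Hypothetical sample data: a three-node cycle, a -> b -> c -> a
CREATE TEMPORARY TABLE cyc_edges (
    src TEXT NOT NULL,
    dst TEXT NOT NULL
);

INSERT INTO cyc_edges (src, dst) VALUES
    ('a', 'b'),
    ('b', 'c'),
    ('c', 'a');

WITH RECURSIVE walk AS (
    -- Anchor: edges leaving the start node
    SELECT dst AS node, 1 AS depth, ARRAY[src, dst] AS path
    FROM cyc_edges
    WHERE src = 'a'

    UNION ALL

    -- Recursive: extend each path by one edge
    SELECT e.dst, w.depth + 1, w.path || e.dst
    FROM walk w
    JOIN cyc_edges e ON e.src = w.node
    WHERE e.dst <> ALL (w.path)   -- Safeguard 1: never revisit a node on this path
      AND w.depth < 10            -- Safeguard 2: hard depth cap as a backstop
)
SELECT node, depth
FROM walk
ORDER BY depth;
-- Returns (b, 1) and (c, 2); the edge c -> a is rejected because a is already on the path
```

Without the path check, the depth cap alone would still stop the query, but only after it had circled the loop several times and produced redundant rows.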

Subqueries vs Joins: Making the Right Choice

Many problems can be solved with either subqueries or joins. Understanding when each approach is preferable demonstrates sophisticated SQL thinking.

Subqueries vs Joins: Decision Guide
Scenario	Prefer Subquery	Prefer Join
Existence/absence check	EXISTS/NOT EXISTS	—
Single aggregate comparison	Scalar subquery	—
Multiple columns from related table	—	JOIN
Need data from both tables	—	JOIN
Filtering against a computed set	IN with subquery	CTE + JOIN
Row-by-row computation	Correlated subquery	Window function
Complex multi-step logic	CTEs	CTEs
Performance-critical	Depends—test both!	Depends—test both!

subquery_vs_join.sql
SQL
-- Scenario: Find customers with orders > average order value
 
-- Approach 1: Subquery (clear intent)
SELECT customer_id, name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    WHERE total > (SELECT AVG(total) FROM orders)
);
 
-- Approach 2: JOIN (same result, different style)
SELECT DISTINCT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
CROSS JOIN (SELECT AVG(total) AS avg_total FROM orders) AS overall  -- Alias avoids reusing the function name AVG
WHERE o.total > overall.avg_total;
 
-- Scenario: Products with their sales, including products with no sales
 
-- Subquery approach (awkward for this case)
SELECT 
    p.product_id,
    p.product_name,
    (
        SELECT COALESCE(SUM(oi.quantity), 0)
        FROM order_items oi
        WHERE oi.product_id = p.product_id
    ) AS total_sold
FROM products p;
 
-- JOIN approach (more natural)
SELECT 
    p.product_id,
    p.product_name,
    COALESCE(SUM(oi.quantity), 0) AS total_sold
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name;
 
-- Scenario: Find the most recent order per customer
 
-- Correlated subquery (classic approach)
SELECT o1.*
FROM orders o1
WHERE o1.order_date = (
    SELECT MAX(o2.order_date)
    FROM orders o2
    WHERE o2.customer_id = o1.customer_id
);
 
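-- CTE with window function (modern approach)
-- Note: if a customer has several orders on the same latest date, the correlated
-- version returns all of them, while ROW_NUMBER keeps exactly one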
WITH ranked_orders AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id 
            ORDER BY order_date DESC
        ) AS rn
    FROM orders
)
SELECT * FROM ranked_orders WHERE rn = 1;
 
-- Scenario: Existence check (EXISTS is typically best)
 
-- EXISTS (preferred - can short-circuit)
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date >= CURRENT_DATE - INTERVAL '30 days'
);
 
-- IN (similar, optimizer may transform to EXISTS)
SELECT customer_id, name
FROM customers
WHERE customer_id IN (
    SELECT customer_id FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
);
 
-- JOIN with DISTINCT (potentially less efficient)
SELECT DISTINCT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '30 days';

Interview Strategy

In interviews, start with the approach that most clearly expresses your intent. Then, if asked about optimization, discuss alternatives. For example: 'I wrote it with NOT EXISTS for clarity because we only need to check for absence, but I could also use an anti-join with LEFT JOIN ... IS NULL if performance testing suggested it was faster.'
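
The LEFT JOIN ... IS NULL form is an anti-join, the counterpart of NOT EXISTS. The sketch below shows both forms on hypothetical `aj_customers` and `aj_orders` tables with made-up rows, along with the NOT IN variant, which behaves differently once the subquery can return NULL.

```sql
-- Hypothetical sample data for illustration only
CREATE TEMPORARY TABLE aj_customers (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(50) NOT NULL
);

CREATE TEMPORARY TABLE aj_orders (
    order_id    INT PRIMARY KEY,
    customer_id INT            -- Nullable on purpose
);

INSERT INTO aj_customers (customer_id, name) VALUES
    (1, 'Ada'), (2, 'Grace'), (3, 'Edsger');

INSERT INTO aj_orders (order_id, customer_id) VALUES
    (10, 1),
    (11, NULL);

-- Form 1: NOT EXISTS
SELECT c.customer_id
FROM aj_customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM aj_orders o
    WHERE o.customer_id = c.customer_id
)
ORDER BY c.customer_id;
-- Returns 2 and 3

-- Form 2: LEFT JOIN ... IS NULL (anti-join)
SELECT c.customer_id
FROM aj_customers c
LEFT JOIN aj_orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL
ORDER BY c.customer_id;
-- Returns 2 and 3

-- Form 3: NOT IN, which is NOT equivalent when the subquery contains a NULL
SELECT c.customer_id
FROM aj_customers c
WHERE c.customer_id NOT IN (SELECT customer_id FROM aj_orders);
-- Returns no rows: comparing against NULL yields UNKNOWN, so no row passes
```

Mentioning the NOT IN pitfall unprompted is a quick way to show you understand three-valued logic.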

Advanced Subquery Patterns

Beyond basic usage, several advanced subquery patterns appear in complex queries and interviews. Mastering these elevates your SQL capabilities significantly.

Derived Tables: Pre-computed Results in FROM

Derived tables (subqueries in FROM clause) create inline virtual tables that can be joined and filtered like regular tables:

derived_tables.sql
SQL
-- Aggregate before joining to prevent row multiplication
SELECT 
    c.customer_id,
    c.name,
    order_stats.total_orders,
    order_stats.total_spent,
    item_stats.total_items
FROM customers c
JOIN (
    SELECT 
        customer_id,
        COUNT(*) AS total_orders,
        SUM(total) AS total_spent
    FROM orders
    GROUP BY customer_id
) AS order_stats ON c.customer_id = order_stats.customer_id
JOIN (
    SELECT 
        o.customer_id,
        COUNT(oi.item_id) AS total_items
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    GROUP BY o.customer_id
) AS item_stats ON c.customer_id = item_stats.customer_id;
 
-- Filtering on aggregated data
SELECT *
FROM (
    SELECT 
        category_id,
        COUNT(*) AS product_count,
        AVG(unit_price) AS avg_price
    FROM products
    GROUP BY category_id
) AS category_stats
WHERE product_count >= 10
  AND avg_price > 50;
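
To see why aggregating before joining matters, the sketch below contrasts a naive join-then-aggregate query with the derived-table form. The `rm_orders` and `rm_order_items` tables and their rows are hypothetical.

```sql
-- Hypothetical sample data for illustration only
CREATE TEMPORARY TABLE rm_orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,
    total       NUMERIC(10, 2) NOT NULL
);

CREATE TEMPORARY TABLE rm_order_items (
    item_id  INT PRIMARY KEY,
    order_id INT NOT NULL
);

INSERT INTO rm_orders (order_id, customer_id, total) VALUES
    (1, 100, 50.00),
    (2, 100, 30.00);

INSERT INTO rm_order_items (item_id, order_id) VALUES
    (1, 1), (2, 1), (3, 1),   -- Order 1 has three items
    (4, 2);                   -- Order 2 has one item

-- Naive: join first, aggregate after. Order 1 appears three times in the
-- joined rows, so its total is counted three times.
SELECT
    o.customer_id,
    SUM(o.total)      AS total_spent,
    COUNT(oi.item_id) AS total_items
FROM rm_orders o
JOIN rm_order_items oi ON oi.order_id = o.order_id
GROUP BY o.customer_id;
-- Returns total_spent = 180.00 (50 * 3 + 30), which is wrong; total_items = 4

-- Fixed: aggregate each table at its own grain, then join the results
SELECT
    order_stats.customer_id,
    order_stats.total_spent,
    item_stats.total_items
FROM (
    SELECT customer_id, SUM(total) AS total_spent
    FROM rm_orders
    GROUP BY customer_id
) AS order_stats
JOIN (
    SELECT o.customer_id, COUNT(oi.item_id) AS total_items
    FROM rm_orders o
    JOIN rm_order_items oi ON oi.order_id = o.order_id
    GROUP BY o.customer_id
) AS item_stats ON item_stats.customer_id = order_stats.customer_id;
-- Returns total_spent = 80.00 and total_items = 4
```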

Interview Subquery Challenges

Let's work through challenging interview problems that showcase subquery mastery:

interview_subquery_problems.sql
SQL
/* Problem 1: Find departments where ALL employees earn more 
   than the company average salary */
 
WITH company_avg AS (
    SELECT AVG(salary) AS avg_salary FROM employees
)
SELECT d.department_id, d.department_name
FROM departments d
WHERE NOT EXISTS (
    -- Find any employee in this dept at or below the company average
    -- (a department with no employees also passes this check)
    SELECT 1
    FROM employees e
    CROSS JOIN company_avg ca
    WHERE e.department_id = d.department_id
    AND e.salary <= ca.avg_salary
);
 
/* Problem 2: Find the earliest order for each customer, but only 
   for customers who have placed at least 3 orders */
 
WITH customer_order_counts AS (
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(*) >= 3
),
first_orders AS (
    SELECT o.*
    FROM orders o
    WHERE o.order_date = (
        SELECT MIN(o2.order_date)
        FROM orders o2
        WHERE o2.customer_id = o.customer_id
    )
)
SELECT fo.*
FROM first_orders fo
WHERE fo.customer_id IN (SELECT customer_id FROM customer_order_counts);
 
/* Problem 3: For each product, calculate its revenue rank within 
   its category and overall, without using window functions */
 
SELECT 
    p.product_id,
    p.product_name,
    p.category_id,
    ps.product_revenue,
    (
        SELECT COUNT(DISTINCT ps2.product_revenue) + 1
        FROM (
            SELECT product_id, SUM(quantity * unit_price) AS product_revenue
            FROM order_items
            GROUP BY product_id
        ) ps2
        JOIN products p2 ON ps2.product_id = p2.product_id
        WHERE p2.category_id = p.category_id
        AND ps2.product_revenue > ps.product_revenue
    ) AS category_rank,
    (
        SELECT COUNT(DISTINCT ps2.product_revenue) + 1
        FROM (
            SELECT product_id, SUM(quantity * unit_price) AS product_revenue
            FROM order_items
            GROUP BY product_id
        ) ps2
        WHERE ps2.product_revenue > ps.product_revenue
    ) AS overall_rank
FROM products p
JOIN (
    SELECT product_id, SUM(quantity * unit_price) AS product_revenue
    FROM order_items
    GROUP BY product_id
) ps ON p.product_id = ps.product_id
ORDER BY category_id, category_rank;
 
/* Problem 4: Find pairs of products frequently bought together 
   (appearing in the same order at least 5 times) */
 
SELECT 
    p1.product_name AS product_1,
    p2.product_name AS product_2,
    pair_count
FROM (
    SELECT 
        oi1.product_id AS pid1,
        oi2.product_id AS pid2,
        COUNT(DISTINCT oi1.order_id) AS pair_count
    FROM order_items oi1
    JOIN order_items oi2 ON oi1.order_id = oi2.order_id
    WHERE oi1.product_id < oi2.product_id  -- Prevent duplicates and self-pairs
    GROUP BY oi1.product_id, oi2.product_id
    HAVING COUNT(DISTINCT oi1.order_id) >= 5
) pairs
JOIN products p1 ON pairs.pid1 = p1.product_id
JOIN products p2 ON pairs.pid2 = p2.product_id
ORDER BY pair_count DESC, product_1, product_2;
 
/* Problem 5: Find the median order value (without PERCENTILE functions) */
 
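WITH ordered_values AS (
    SELECT 
        total,
        ROW_NUMBER() OVER (ORDER BY total) AS rn,
        COUNT(*) OVER () AS cnt
    FROM orders
)
SELECT AVG(total) AS median_value
FROM ordered_values
-- One numbering plus the row count stays correct when totals tie; two
-- independent ROW_NUMBER orderings are not guaranteed to mirror each other
WHERE 2 * rn BETWEEN cnt AND cnt + 2;  -- Middle row (odd count) or two middle rows (even count)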

Problem-Solving Strategy

For complex subquery problems: (1) Identify what the final output should look like, (2) Work backwards—what intermediate results do you need?, (3) Consider if each piece is best expressed as a CTE, derived table, or inline subquery, (4) Build incrementally, testing each piece.
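
One way to apply these steps is sketched below for the goal "customers whose most recent order exceeds their own average order value". The `ps_orders` table and its rows are hypothetical, and the split into two CTEs is just one reasonable decomposition.

```sql
-- Hypothetical sample data for illustration only
CREATE TEMPORARY TABLE ps_orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date  DATE NOT NULL,
    total       NUMERIC(10, 2) NOT NULL
);

INSERT INTO ps_orders (order_id, customer_id, order_date, total) VALUES
    (1, 100, DATE '2024-01-05', 40.00),
    (2, 100, DATE '2024-02-10', 80.00),
    (3, 200, DATE '2024-01-20', 90.00),
    (4, 200, DATE '2024-03-01', 30.00);

-- Step 1: build the first intermediate result and inspect it on its own
WITH customer_avg AS (
    SELECT customer_id, AVG(total) AS avg_total
    FROM ps_orders
    GROUP BY customer_id
)
SELECT customer_id, avg_total
FROM customer_avg
ORDER BY customer_id;
-- avg_total is 60 for both customers

-- Step 2: add the second piece, then combine the two
WITH customer_avg AS (
    SELECT customer_id, AVG(total) AS avg_total
    FROM ps_orders
    GROUP BY customer_id
),
latest_order AS (
    SELECT o.customer_id, o.total
    FROM ps_orders o
    WHERE o.order_date = (
        SELECT MAX(o2.order_date)
        FROM ps_orders o2
        WHERE o2.customer_id = o.customer_id
    )
)
SELECT lo.customer_id
FROM latest_order lo
JOIN customer_avg ca ON ca.customer_id = lo.customer_id
WHERE lo.total > ca.avg_total;
-- Returns customer 100 (latest order 80 > average 60)
```

Running each CTE by itself before combining them makes it much easier to locate a wrong result than debugging the finished query as a whole.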

Summary: Subquery Mastery

You've now developed comprehensive knowledge of SQL subqueries—from basic scalar subqueries to advanced recursive CTEs and complex interview patterns.

Key Takeaways:

Core Concepts Mastered

•Subquery Types — Scalar (single value), row (single row), table (multiple rows/columns), and correlated subqueries each serve specific purposes.
•Correlated Subqueries — Reference outer query columns, conceptually execute per-row, powerful but potentially expensive.
•CTEs (WITH Clause) — Named temporary result sets that improve readability and enable step-by-step query construction.
•Recursive CTEs — Essential for hierarchical data (org charts, bill of materials) and graph traversal with anchor + recursive members.
•Subqueries vs Joins — Choose based on clarity and semantics; EXISTS for existence checks, JOINs when multiple columns needed.
•Advanced Patterns — Derived tables prevent row multiplication; multi-column subqueries match composite keys; nested depth for complex logic.

What's Next:

With subquery mastery in place, we'll explore Window Functions in the next page—the powerful analytical capabilities that enable ranking, running totals, moving averages, and row comparisons without self-joins or correlated subqueries.

Page Complete

You now command the full power of SQL subqueries. From simple scalar computations to recursive hierarchical queries, these techniques enable elegant solutions to problems that would otherwise require complex procedural code or multiple queries.

3 / 5

Loading learning content...

Database Management SystemsSQL Query Writing

SQL Query Writing for Interviews

LevelAdvanced

Duration90 mins

TopicSQL Query Writing

3 / 5

Subqueries

The Power of Nested Queries

What You Will Learn

Subquery Fundamentals

A subquery is a SELECT statement embedded within another SQL statement. Understanding where subqueries can appear and what they return is fundamental to using them effectively.

Subquery Types and Characteristics
Subquery Type	Returns	Usage Location	Correlation
Scalar Subquery	Single value (1 row, 1 column)	SELECT, WHERE, HAVING	Optional
Row Subquery	Single row (1 row, multiple columns)	WHERE (with row constructor)	Optional
Table Subquery	Multiple rows and columns	FROM, JOIN	Not allowed
Correlated Subquery	Any of above, references outer query	SELECT, WHERE, HAVING	Required

Subquery Execution Conceptual Model:

To understand subquery behavior, it helps to visualize the execution model:

Non-correlated subqueries execute once and their result is reused
Correlated subqueries conceptually execute once per row of the outer query
Table subqueries (FROM) execute first, creating a derived table for the outer query

subquery_basics.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
-- Scalar subquery in SELECT: Single value per row context
SELECT 
    product_name,
    unit_price,
    (SELECT AVG(unit_price) FROM products) AS avg_price,
    unit_price - (SELECT AVG(unit_price) FROM products) AS diff_from_avg
FROM products;
 
-- Scalar subquery in WHERE: Filter based on computed value
SELECT product_name, unit_price
FROM products
WHERE unit_price > (SELECT AVG(unit_price) FROM products);
 
-- Table subquery in FROM: Derived table
SELECT 
    category,
    avg_price,
    product_count
FROM (
    SELECT 
        category_id AS category,
        AVG(unit_price) AS avg_price,
        COUNT(*) AS product_count
    FROM products
    GROUP BY category_id
) AS category_stats
WHERE product_count >= 5;
 
-- Subquery with IN: Set membership test
SELECT customer_name, email
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
);
 
-- Subquery with EXISTS: Existence test
SELECT customer_name, email
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.total > 1000
);

Scalar Subquery Requirement

Correlated Subqueries: Row-by-Row Processing

Identifying Correlated Subqueries:

A subquery is correlated when it references a column from the outer query that is not defined within the subquery itself.

correlated_subqueries.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
-- Correlated in WHERE: Find products priced above their category average
SELECT 
    p.product_name,
    p.category_id,
    p.unit_price
FROM products p
WHERE p.unit_price > (
    SELECT AVG(p2.unit_price)
    FROM products p2
    WHERE p2.category_id = p.category_id  -- Correlation: references outer p
);
 
-- Correlated in SELECT: Running count
SELECT 
    o.order_id,
    o.order_date,
    o.customer_id,
    (
        SELECT COUNT(*)
        FROM orders o2
        WHERE o2.customer_id = o.customer_id  -- Correlation
        AND o2.order_date <= o.order_date
    ) AS customer_order_number
FROM orders o
ORDER BY o.customer_id, o.order_date;
 
-- EXISTS with correlation: Customers with high-value orders
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id  -- Correlation
    AND o.total > 1000
);
 
-- NOT EXISTS with correlation: Customers with no orders this year
SELECT c.customer_id, c.name, c.email
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id  -- Correlation
    AND EXTRACT(YEAR FROM o.order_date) = EXTRACT(YEAR FROM CURRENT_DATE)
);
 
-- Correlated subquery for top-N per group (without window functions)
SELECT p.product_id, p.product_name, p.category_id, p.unit_price
FROM products p
WHERE (
    SELECT COUNT(*)
    FROM products p2
    WHERE p2.category_id = p.category_id  -- Same category
    AND p2.unit_price > p.unit_price       -- Higher price
) < 3;  -- Fewer than 3 products have higher price = top 3
 
-- Correlated UPDATE: Update with computed value from related table
UPDATE customers c
SET total_orders = (
    SELECT COUNT(*)
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

Performance Consideration

Common Interview Pattern: All-Match Condition

Finding rows that match ALL members of a set is a classic interview problem that correlated subqueries solve elegantly:

all_match_pattern.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Find students who have taken ALL required courses
-- Required courses: MATH101, COMP101, PHYS101
 
-- Method 1: Double NOT EXISTS (relational division)
SELECT s.student_id, s.name
FROM students s
WHERE NOT EXISTS (
    -- Find any required course this student hasn't taken
    SELECT 1
    FROM (VALUES ('MATH101'), ('COMP101'), ('PHYS101')) AS required(course_id)
    WHERE NOT EXISTS (
        SELECT 1
        FROM enrollments e
        WHERE e.student_id = s.student_id
        AND e.course_id = required.course_id
    )
);
 
-- Method 2: COUNT comparison
SELECT s.student_id, s.name
FROM students s
JOIN enrollments e ON s.student_id = e.student_id
WHERE e.course_id IN ('MATH101', 'COMP101', 'PHYS101')
GROUP BY s.student_id, s.name
HAVING COUNT(DISTINCT e.course_id) = 3;  -- Must have all 3
 
-- Find suppliers who supply ALL products in a category
SELECT s.supplier_id, s.supplier_name
FROM suppliers s
WHERE NOT EXISTS (
    SELECT p.product_id
    FROM products p
    WHERE p.category_id = 'CAT-ELECTRONICS'
    AND NOT EXISTS (
        SELECT 1
        FROM supplier_products sp
        WHERE sp.supplier_id = s.supplier_id
        AND sp.product_id = p.product_id
    )
);

Common Table Expressions (CTEs)

CTE Advantages:

Readability: Break complex queries into named, logical steps
Reusability: Reference the same CTE multiple times in one query
Debugging: Easier to isolate and test individual components
Recursion: Enable recursive queries for hierarchical data

cte_basics.sql
SQL
-- Basic CTE: Named temporary result set
WITH high_value_customers AS (
    SELECT 
        c.customer_id,           -- Qualified: both tables have customer_id
        c.name,
        SUM(o.total) AS lifetime_value
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.name
    HAVING SUM(o.total) > 10000
)
SELECT * FROM high_value_customers
ORDER BY lifetime_value DESC;
 
-- Multiple CTEs: Step-by-step query construction
WITH 
customer_orders AS (
    SELECT 
        customer_id,
        COUNT(*) AS order_count,
        SUM(total) AS total_spending,
        AVG(total) AS avg_order
    FROM orders
    GROUP BY customer_id
),
customer_segments AS (
    SELECT 
        customer_id,
        order_count,
        total_spending,
        avg_order,
        CASE 
            WHEN total_spending >= 10000 THEN 'VIP'
            WHEN total_spending >= 5000 THEN 'Gold'
            WHEN total_spending >= 1000 THEN 'Silver'
            ELSE 'Bronze'
        END AS segment
    FROM customer_orders
)
SELECT 
    c.name,
    c.email,
    cs.segment,
    cs.order_count,
    cs.total_spending,
    ROUND(cs.avg_order, 2) AS avg_order
FROM customers c
JOIN customer_segments cs ON c.customer_id = cs.customer_id
ORDER BY cs.total_spending DESC;
 
-- CTE referenced multiple times
WITH monthly_sales AS (
    SELECT 
        DATE_TRUNC('month', order_date) AS month,
        SUM(total) AS revenue
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
)
SELECT 
    curr.month,
    curr.revenue AS current_revenue,
    prev.revenue AS previous_revenue,
    curr.revenue - COALESCE(prev.revenue, 0) AS change,
    ROUND(
        100.0 * (curr.revenue - COALESCE(prev.revenue, 0)) / 
        NULLIF(prev.revenue, 0), 
        2
    ) AS pct_change
FROM monthly_sales curr
LEFT JOIN monthly_sales prev 
    ON curr.month = prev.month + INTERVAL '1 month'
ORDER BY curr.month;
 
-- CTE with INSERT (PostgreSQL, SQL Server)
WITH new_orders AS (
    SELECT 
        order_id, 
        customer_id, 
        total
    FROM staging_orders
    WHERE validated = true
)
INSERT INTO orders (order_id, customer_id, total, created_at)
SELECT order_id, customer_id, total, CURRENT_TIMESTAMP
FROM new_orders;
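
The reusability advantage is easiest to see when one CTE feeds both the row source and a scalar subquery. The sketch below runs such a query in an in-memory SQLite database via Python's `sqlite3`; the `customer_totals` name and the sample orders are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO orders (customer_id, total) VALUES
        (1, 6000), (1, 5000),   -- customer 1 spends 11000
        (2, 3000), (2, 2500),   -- customer 2 spends 5500
        (3, 400);               -- customer 3 spends 400
""")

# customer_totals is defined once and referenced twice:
# as the FROM source and inside the scalar subquery that computes the average.
above_average = conn.execute("""
    WITH customer_totals AS (
        SELECT customer_id, SUM(total) AS total_spending
        FROM orders
        GROUP BY customer_id
    )
    SELECT customer_id, total_spending
    FROM customer_totals
    WHERE total_spending > (SELECT AVG(total_spending) FROM customer_totals)
    ORDER BY customer_id
""").fetchall()

print(above_average)  # [(1, 11000.0)]
```

Without the CTE, the aggregation over `orders` would have to be written out twice as two separate derived tables.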

CTE Materialization

Recursive CTEs: Hierarchical and Graph Data

Recursive CTEs enable queries that reference themselves, essential for traversing hierarchical data (org charts, bill of materials) and graph structures (social networks, dependencies).

Recursive CTE Structure:

WITH RECURSIVE cte_name AS (
    -- Anchor member: Starting point (non-recursive)
    SELECT ... 
    UNION [ALL]
    -- Recursive member: References CTE itself
    SELECT ... FROM cte_name WHERE termination_condition
)
SELECT * FROM cte_name;
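
As a smallest possible instance of this structure, the following sketch counts from 1 to 5 in an in-memory SQLite database through Python's `sqlite3`. The `counter` CTE and its column `n` are illustrative names only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

rows = conn.execute("""
    WITH RECURSIVE counter(n) AS (
        SELECT 1              -- anchor member: runs once
        UNION ALL
        SELECT n + 1          -- recursive member: reads the rows produced so far
        FROM counter
        WHERE n < 5           -- termination condition: no new rows once n reaches 5
    )
    SELECT n FROM counter ORDER BY n
""").fetchall()

numbers = [n for (n,) in rows]
print(numbers)  # [1, 2, 3, 4, 5]
```

The recursion stops when the recursive member produces no new rows, which is why the `WHERE` condition belongs in the recursive member rather than in the final `SELECT`.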

recursive_ctes.sql
SQL
-- Classic: Employee hierarchy (org chart)
WITH RECURSIVE org_hierarchy AS (
    -- Anchor: Start with the CEO (no manager)
    SELECT 
        employee_id,
        name,
        manager_id,
        1 AS level,
        name::VARCHAR(1000) AS path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive: Find direct reports of current level
    SELECT 
        e.employee_id,
        e.name,
        e.manager_id,
        oh.level + 1,
        (oh.path || ' > ' || e.name)::VARCHAR(1000)
    FROM employees e
    JOIN org_hierarchy oh ON e.manager_id = oh.employee_id
)
SELECT 
    employee_id,
    REPEAT('  ', level - 1) || name AS org_chart,
    level,
    path
FROM org_hierarchy
ORDER BY path;
 
-- Bill of Materials: Find all components of a product
WITH RECURSIVE bom AS (
    -- Anchor: Start with the top-level product
    SELECT 
        component_id,
        component_name,
        1 AS quantity,
        1 AS level
    FROM components
    WHERE component_id = 'PROD-001'
    
    UNION ALL
    
    -- Recursive: Find sub-components
    SELECT 
        c.component_id,
        c.component_name,
        b.quantity * pc.quantity,  -- Accumulated quantity
        b.level + 1
    FROM bom b
    JOIN product_components pc ON b.component_id = pc.parent_id
    JOIN components c ON pc.component_id = c.component_id
    WHERE b.level < 10  -- Prevent infinite recursion
)
SELECT 
    component_id,
    component_name,
    SUM(quantity) AS total_needed,
    MIN(level) AS first_level
FROM bom
GROUP BY component_id, component_name
ORDER BY first_level, component_name;
 
-- Graph traversal: Find all connections within N degrees
WITH RECURSIVE connections AS (
    -- Anchor: Direct connections of user 1001
    SELECT 
        friend_id AS user_id,
        1 AS degree,
        ARRAY[1001, friend_id] AS path
    FROM friendships
    WHERE user_id = 1001
    
    UNION
    
    -- Recursive: Friends of friends
    SELECT 
        f.friend_id,
        c.degree + 1,
        c.path || f.friend_id
    FROM connections c
    JOIN friendships f ON c.user_id = f.user_id
    WHERE c.degree < 3  -- Up to 3 degrees
    AND NOT f.friend_id = ANY(c.path)  -- Prevent cycles
)
SELECT user_id, MIN(degree) AS closest_degree  -- GROUP BY already yields one row per user
FROM connections
GROUP BY user_id
ORDER BY closest_degree, user_id;
 
-- Generate series (useful for filling gaps)
WITH RECURSIVE date_series AS (
    SELECT DATE '2024-01-01' AS date
    UNION ALL
    -- Cast back to DATE: in PostgreSQL, DATE + INTERVAL yields a TIMESTAMP,
    -- which would not match the anchor column's type
    SELECT (date + INTERVAL '1 day')::DATE
    FROM date_series
    WHERE date < DATE '2024-12-31'
)
SELECT ds.date, COALESCE(o.order_count, 0) AS orders
FROM date_series ds
LEFT JOIN (
    SELECT order_date::DATE AS order_day, COUNT(*) AS order_count
    FROM orders
    GROUP BY order_date::DATE
) o ON ds.date = o.order_day
ORDER BY ds.date;
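
The graph traversal pattern can be tried on a small cyclic graph. This is a sketch for SQLite via Python's `sqlite3`, under the assumption that a comma-delimited string stands in for the `ARRAY` path, since SQLite has no array type; the sample friendships are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER);
    INSERT INTO friendships VALUES
        (1, 2), (2, 3), (3, 1),   -- cycle: 1 -> 2 -> 3 -> 1
        (3, 4), (4, 5);           -- user 5 is four hops from user 1
""")

reachable = conn.execute("""
    WITH RECURSIVE connections(user_id, degree, path) AS (
        -- Anchor: direct connections of user 1; path looks like ',1,2,'
        SELECT friend_id, 1, ',1,' || friend_id || ','
        FROM friendships
        WHERE user_id = 1

        UNION ALL

        -- Recursive: extend each path by one hop
        SELECT f.friend_id, c.degree + 1, c.path || f.friend_id || ','
        FROM connections c
        JOIN friendships f ON f.user_id = c.user_id
        WHERE c.degree < 3                                   -- depth limit
          AND instr(c.path, ',' || f.friend_id || ',') = 0   -- skip users already on the path
    )
    SELECT user_id, MIN(degree) AS closest_degree
    FROM connections
    GROUP BY user_id
    ORDER BY closest_degree, user_id
""").fetchall()

print(reachable)  # [(2, 1), (3, 2), (4, 3)]
```

The path check stops the walk from returning to user 1 through the cycle, and the depth limit keeps user 5 out of the result. Either guard alone would terminate this query, but using both keeps the work bounded on dense graphs.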

Recursion Safety

Subqueries vs Joins: Making the Right Choice

Many problems can be solved with either subqueries or joins. Understanding when each approach is preferable demonstrates sophisticated SQL thinking.

Subqueries vs Joins: Decision Guide
Scenario	Prefer Subquery	Prefer Join
Existence/absence check	EXISTS/NOT EXISTS	—
Single aggregate comparison	Scalar subquery	—
Multiple columns from related table	—	JOIN
Need data from both tables	—	JOIN
Filtering against a computed set	IN with subquery	CTE + JOIN
Row-by-row computation	Correlated subquery	Window function
Complex multi-step logic	CTEs	CTEs
Performance-critical	Depends—test both!	Depends—test both!

subquery_vs_join.sql
SQL
-- Scenario: Find customers with orders > average order value
 
-- Approach 1: Subquery (clear intent)
SELECT customer_id, name
FROM customers
WHERE customer_id IN (
    SELECT customer_id
    FROM orders
    WHERE total > (SELECT AVG(total) FROM orders)
);
 
-- Approach 2: JOIN (same result, different style)
SELECT DISTINCT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
CROSS JOIN (SELECT AVG(total) AS avg_total FROM orders) avg
WHERE o.total > avg.avg_total;
 
-- Scenario: Products with their sales, including products with no sales
 
-- Subquery approach (awkward for this case)
SELECT 
    p.product_id,
    p.product_name,
    (
        SELECT COALESCE(SUM(oi.quantity), 0)
        FROM order_items oi
        WHERE oi.product_id = p.product_id
    ) AS total_sold
FROM products p;
 
-- JOIN approach (more natural)
SELECT 
    p.product_id,
    p.product_name,
    COALESCE(SUM(oi.quantity), 0) AS total_sold
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
GROUP BY p.product_id, p.product_name;
 
-- Scenario: Find the most recent order per customer
 
-- Correlated subquery (classic approach; returns every row tied for the latest date)
SELECT o1.*
FROM orders o1
WHERE o1.order_date = (
    SELECT MAX(o2.order_date)
    FROM orders o2
    WHERE o2.customer_id = o1.customer_id
);
 
-- CTE with window function (modern approach; exactly one row per customer, ties broken arbitrarily)
WITH ranked_orders AS (
    SELECT *,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id 
            ORDER BY order_date DESC
        ) AS rn
    FROM orders
)
SELECT * FROM ranked_orders WHERE rn = 1;
 
-- Scenario: Existence check (EXISTS is typically best)
 
-- EXISTS (preferred - can short-circuit)
SELECT c.customer_id, c.name
FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = c.customer_id
    AND o.order_date >= CURRENT_DATE - INTERVAL '30 days'
);
 
-- IN (similar, optimizer may transform to EXISTS)
SELECT customer_id, name
FROM customers
WHERE customer_id IN (
    SELECT customer_id FROM orders
    WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
);
 
-- JOIN with DISTINCT (potentially less efficient)
SELECT DISTINCT c.customer_id, c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '30 days';
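
The practical difference between the existence-check variants shows up as soon as a customer has more than one matching order. The sketch below, run in an in-memory SQLite database via Python's `sqlite3` with invented sample rows, compares EXISTS, a plain JOIN, and JOIN with DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO orders (customer_id, total) VALUES (1, 50), (1, 75), (2, 20);  -- Cy has no orders
""")

# EXISTS: each customer appears at most once, however many orders match
exists_rows = conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
    ORDER BY c.name
""").fetchall()

# Plain JOIN: one output row per matching order, so Ada appears twice
join_rows = conn.execute("""
    SELECT c.name
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY c.name
""").fetchall()

# JOIN + DISTINCT: duplicates are produced first, then removed
distinct_rows = conn.execute("""
    SELECT DISTINCT c.name
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    ORDER BY c.name
""").fetchall()

# NOT EXISTS: the absence check
no_orders = conn.execute("""
    SELECT c.name
    FROM customers c
    WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id)
""").fetchall()

print(exists_rows)    # [('Ada',), ('Ben',)]
print(join_rows)      # [('Ada',), ('Ada',), ('Ben',)]
print(distinct_rows)  # [('Ada',), ('Ben',)]
print(no_orders)      # [('Cy',)]
```

EXISTS states the intent directly and never generates the duplicate rows that DISTINCT then has to remove.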

Interview Strategy

Advanced Subquery Patterns

Beyond basic usage, several advanced subquery patterns appear in complex queries and interviews. Mastering these elevates your SQL capabilities significantly.

Derived Tables: Pre-computed Results in FROM

Derived tables (subqueries in FROM clause) create inline virtual tables that can be joined and filtered like regular tables:

derived_tables.sql
SQL
-- Aggregate before joining to prevent row multiplication
SELECT 
    c.customer_id,
    c.name,
    order_stats.total_orders,
    order_stats.total_spent,
    item_stats.total_items
FROM customers c
JOIN (
    SELECT 
        customer_id,
        COUNT(*) AS total_orders,
        SUM(total) AS total_spent
    FROM orders
    GROUP BY customer_id
) AS order_stats ON c.customer_id = order_stats.customer_id
JOIN (
    SELECT 
        o.customer_id,
        COUNT(oi.item_id) AS total_items
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_id
    GROUP BY o.customer_id
) AS item_stats ON c.customer_id = item_stats.customer_id;
 
-- Filtering on aggregated data
SELECT *
FROM (
    SELECT 
        category_id,
        COUNT(*) AS product_count,
        AVG(unit_price) AS avg_price
    FROM products
    GROUP BY category_id
) AS category_stats
WHERE product_count >= 10
  AND avg_price > 50;
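
To make the row-multiplication problem concrete, the following sketch (in-memory SQLite via Python's `sqlite3`, with invented sample data) compares a naive join-then-aggregate query against the derived-table version for a single customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE TABLE order_items (item_id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0);        -- true spend: 150
    INSERT INTO order_items (order_id) VALUES (1), (1), (1), (2);  -- 3 items + 1 item
""")

# Naive: joining first repeats each order's total once per item
naive = conn.execute("""
    SELECT o.customer_id, SUM(o.total) AS total_spent, COUNT(oi.item_id) AS total_items
    FROM orders o
    JOIN order_items oi ON oi.order_id = o.order_id
    GROUP BY o.customer_id
""").fetchone()

# Derived tables: aggregate each fact separately, then join one row per customer
pre_aggregated = conn.execute("""
    SELECT os.customer_id, os.total_spent, it.total_items
    FROM (
        SELECT customer_id, SUM(total) AS total_spent
        FROM orders
        GROUP BY customer_id
    ) AS os
    JOIN (
        SELECT o.customer_id, COUNT(oi.item_id) AS total_items
        FROM orders o
        JOIN order_items oi ON oi.order_id = o.order_id
        GROUP BY o.customer_id
    ) AS it ON it.customer_id = os.customer_id
""").fetchone()

print(naive)           # (1, 350.0, 4)  -> 100 counted three times
print(pre_aggregated)  # (1, 150.0, 4)
```

The item count is right in both queries; only the measure that lives on the "one" side of the one-to-many join gets inflated.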

Interview Subquery Challenges

Let's work through challenging interview problems that showcase subquery mastery:

interview_subquery_problems.sql
SQL
/* Problem 1: Find departments where ALL employees earn more 
   than the company average salary */
 
WITH company_avg AS (
    SELECT AVG(salary) AS avg_salary FROM employees
)
SELECT d.department_id, d.department_name
FROM departments d
WHERE NOT EXISTS (
    -- Find any employee in this dept at or below the company average
    -- (a department with no employees also passes, vacuously)
    SELECT 1
    FROM employees e
    CROSS JOIN company_avg ca
    WHERE e.department_id = d.department_id
    AND e.salary <= ca.avg_salary
);
 
/* Problem 2: Find the earliest order for each customer, but only 
   for customers who have placed at least 3 orders */
 
WITH customer_order_counts AS (
    SELECT customer_id, COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id
    HAVING COUNT(*) >= 3
),
first_orders AS (
    SELECT o.*
    FROM orders o
    WHERE o.order_date = (
        SELECT MIN(o2.order_date)
        FROM orders o2
        WHERE o2.customer_id = o.customer_id
    )
)
SELECT fo.*
FROM first_orders fo
WHERE fo.customer_id IN (SELECT customer_id FROM customer_order_counts);
 
/* Problem 3: For each product, calculate its revenue rank within 
   its category and overall, without using window functions */
 
SELECT 
    p.product_id,
    p.product_name,
    p.category_id,
    ps.product_revenue,
    (
        SELECT COUNT(DISTINCT ps2.product_revenue) + 1
        FROM (
            SELECT product_id, SUM(quantity * unit_price) AS product_revenue
            FROM order_items
            GROUP BY product_id
        ) ps2
        JOIN products p2 ON ps2.product_id = p2.product_id
        WHERE p2.category_id = p.category_id
        AND ps2.product_revenue > ps.product_revenue
    ) AS category_rank,
    (
        SELECT COUNT(DISTINCT ps2.product_revenue) + 1
        FROM (
            SELECT product_id, SUM(quantity * unit_price) AS product_revenue
            FROM order_items
            GROUP BY product_id
        ) ps2
        WHERE ps2.product_revenue > ps.product_revenue
    ) AS overall_rank
FROM products p
JOIN (
    SELECT product_id, SUM(quantity * unit_price) AS product_revenue
    FROM order_items
    GROUP BY product_id
) ps ON p.product_id = ps.product_id
ORDER BY category_id, category_rank;
 
/* Problem 4: Find pairs of products frequently bought together 
   (appearing in the same order at least 5 times) */
 
SELECT 
    p1.product_name AS product_1,
    p2.product_name AS product_2,
    pair_count
FROM (
    SELECT 
        oi1.product_id AS pid1,
        oi2.product_id AS pid2,
        COUNT(DISTINCT oi1.order_id) AS pair_count
    FROM order_items oi1
    JOIN order_items oi2 ON oi1.order_id = oi2.order_id
    WHERE oi1.product_id < oi2.product_id  -- Prevent duplicates and self-pairs
    GROUP BY oi1.product_id, oi2.product_id
    HAVING COUNT(DISTINCT oi1.order_id) >= 5
) pairs
JOIN products p1 ON pairs.pid1 = p1.product_id
JOIN products p2 ON pairs.pid2 = p2.product_id
ORDER BY pair_count DESC, product_1, product_2;
 
/* Problem 5: Find the median order value (without PERCENTILE functions) */
 
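WITH ordered_values AS (
    SELECT 
        total,
        -- order_id breaks ties so the two numberings mirror each other exactly;
        -- without it, duplicate totals can pair up inconsistently
        ROW_NUMBER() OVER (ORDER BY total, order_id) AS row_asc,
        ROW_NUMBER() OVER (ORDER BY total DESC, order_id DESC) AS row_desc
    FROM orders
)
SELECT AVG(total) AS median_value
FROM ordered_values
WHERE ABS(row_asc - row_desc) <= 1;  -- Middle value(s)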
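
The two-ROW_NUMBER median can be checked against Python's `statistics.median` on data that contains duplicate totals. This is a sketch in in-memory SQLite (window functions require SQLite 3.25 or later); the sample totals are invented, and `order_id` is used as the tiebreaker so both numberings mirror each other.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, total REAL);
    INSERT INTO orders (total) VALUES (10), (20), (20), (30), (40), (40);
""")

(median_value,) = conn.execute("""
    WITH ordered_values AS (
        SELECT
            total,
            ROW_NUMBER() OVER (ORDER BY total, order_id) AS row_asc,
            ROW_NUMBER() OVER (ORDER BY total DESC, order_id DESC) AS row_desc
        FROM orders
    )
    SELECT AVG(total)
    FROM ordered_values
    WHERE ABS(row_asc - row_desc) <= 1   -- one middle row (odd count) or two (even count)
""").fetchone()

# Cross-check with Python's own median over the same values
totals = [t for (t,) in conn.execute("SELECT total FROM orders")]
expected = statistics.median(totals)

print(median_value, expected)  # 25.0 25.0
```

With six rows, `row_asc + row_desc` is always 7, so only the rows numbered 3 and 4 satisfy the filter, and their average (20 and 30) is the median.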

Problem-Solving Strategy

Summary: Subquery Mastery

You've now developed comprehensive knowledge of SQL subqueries—from basic scalar subqueries to advanced recursive CTEs and complex interview patterns.

Key Takeaways:

Core Concepts Mastered

• Subquery Types: Scalar (single value), row (single row), table (multiple rows/columns), and correlated subqueries each serve specific purposes.
• Correlated Subqueries: Reference outer query columns and conceptually execute once per outer row; powerful but potentially expensive.
• CTEs (WITH Clause): Named temporary result sets that improve readability and enable step-by-step query construction.
• Recursive CTEs: Essential for hierarchical data (org charts, bills of materials) and graph traversal, built from an anchor member plus a recursive member.
• Subqueries vs Joins: Choose based on clarity and semantics; use EXISTS for existence checks and JOINs when you need columns from multiple tables.
• Advanced Patterns: Derived tables that aggregate before joining prevent row multiplication; multi-column subqueries match composite keys; deeper nesting handles complex logic.
