Database Management SystemsSubqueries

SQL Subqueries: Nested Query Power

LevelIntermediate

Duration60 mins

TopicSubqueries

2 / 5

Scalar Subqueries

When You Need Exactly One Value

Imagine displaying an employee report that shows each person's salary alongside the company average. Or updating prices where the discount depends on a calculated threshold. Or inserting records that need computed default values.

All these scenarios require embedding a computed single value directly into your SQL statement—not a list, not a table, but precisely one value.

This is the domain of scalar subqueries: the simplest and most versatile form of nested queries. A scalar subquery evaluates to exactly one row containing exactly one column—a single cell in database terms. This restriction is what gives scalar subqueries their power: they can appear anywhere SQL expects a literal value.

What You Will Learn

By the end of this page, you will master scalar subqueries—understanding where they can appear, how to guarantee they return exactly one value, how to handle edge cases, and when they excel versus alternative approaches. You'll learn to use them for computations, comparisons, and even data modification statements.

Defining Scalar Subqueries

A scalar subquery is a subquery that returns a result set consisting of exactly one row with exactly one column—in other words, a single value (a scalar).

Formal Definition:

A scalar subquery is a SELECT statement that, upon execution, yields a result containing at most one row and exactly one column. If the query returns zero rows, the result is NULL. If it returns more than one row, most databases raise a runtime error.

The term "scalar" comes from linear algebra, where a scalar is a single number (as opposed to a vector or matrix). In SQL, a scalar value can be any data type: integer, string, date, boolean, etc.

scalar_subquery_definition.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- Scalar subquery: returns exactly one value
-- This subquery returns a single number: the average salary
 
SELECT employee_name, salary,
       (SELECT AVG(salary) FROM employees) AS company_average
FROM employees;
 
-- For 100 employees, this query returns 100 rows.
-- Each row has three columns:
--   1. employee_name (from base table)
--   2. salary (from base table)  
--   3. company_average (same scalar value for all rows)
 
-- The subquery (SELECT AVG(salary) FROM employees) is scalar because:
--   - It returns exactly ONE row (aggregates collapse to one row)
--   - It returns exactly ONE column (AVG(salary))

Aggregate Functions Guarantee Scalarity

Aggregate functions (SUM, AVG, COUNT, MAX, MIN) without GROUP BY always return exactly one row, making them natural scalar subquery producers. When you need a scalar result, aggregate functions are your most reliable tool.

Where Scalar Subqueries Can Appear

The defining characteristic of scalar subqueries is their flexibility—they can appear anywhere a single value is expected. This includes contexts many developers don't initially consider:

In SELECT Statements:

scalar_in_select.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- 1. SELECT list (as a computed column)
SELECT 
    employee_name,
    salary,
    (SELECT AVG(salary) FROM employees) AS avg_salary,
    salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
 
 
-- 2. WHERE clause (comparison)
SELECT employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
 
 
-- 3. HAVING clause (filter on groups)
SELECT department_id, AVG(salary) as dept_avg
FROM employees
GROUP BY department_id
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);
 
 
-- 4. CASE expressions
SELECT 
    employee_name,
    salary,
    CASE 
        WHEN salary > (SELECT AVG(salary) FROM employees) THEN 'Above Average'
        WHEN salary = (SELECT AVG(salary) FROM employees) THEN 'At Average'
        ELSE 'Below Average'
    END AS salary_status
FROM employees;

In Data Modification Statements:

scalar_in_dml.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- 5. INSERT statement (computed values)
INSERT INTO salary_reports (report_date, avg_salary, employee_count)
VALUES (
    CURRENT_DATE,
    (SELECT AVG(salary) FROM employees),
    (SELECT COUNT(*) FROM employees)
);
 
 
-- 6. UPDATE statement (SET clause)
UPDATE employees
SET salary = salary * 1.10
WHERE salary < (SELECT AVG(salary) FROM employees);
 
 
-- 7. UPDATE with scalar in new value calculation
UPDATE products
SET discount_price = price * (
    SELECT discount_rate 
    FROM discount_policies 
    WHERE policy_name = 'seasonal_sale'
);
 
 
-- 8. DELETE statement (filter condition)  
DELETE FROM audit_logs
WHERE log_date < (
    SELECT DATE_SUB(MAX(log_date), INTERVAL 90 DAY) 
    FROM audit_logs
);

Scalar Subquery Placement Summary
Context	Role	Example Pattern
SELECT list	Computed column	`SELECT ..., (scalar_query) AS col`
WHERE clause	Comparison value	`WHERE column > (scalar_query)`
HAVING clause	Group filter value	`HAVING AGG(col) > (scalar_query)`
CASE expression	Conditional value	`CASE WHEN col > (scalar_query) ...`
INSERT VALUES	Value to insert	`VALUES (..., (scalar_query), ...)`
UPDATE SET	New column value	`SET col = (scalar_query)`
DELETE WHERE	Filter condition	`WHERE col < (scalar_query)`
ORDER BY	Sort expression	`ORDER BY (scalar_query)` (rare)

Guaranteeing Scalar Results

The most critical aspect of scalar subqueries is ensuring they actually return one row. If your subquery can return multiple rows, you'll get a runtime error—often in production, with real users watching.

Techniques to Guarantee Scalar Results:

1. Aggregate Functions Without GROUP BY

Aggregate functions collapse all rows into a single result row. Without GROUP BY, they always produce exactly one row.

guarantee_aggregate.sql
1
2
3
4
5
6
7
8
9
-- Always scalar: aggregate without GROUP BY
SELECT MAX(salary) FROM employees;                    -- One row: highest salary
SELECT COUNT(*) FROM orders WHERE status = 'pending'; -- One row: count
SELECT SUM(quantity) FROM order_items WHERE order_id = 42; -- One row: sum
 
-- These can NEVER return more than one row, making them inherently safe
SELECT employee_name
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);  -- Safe scalar

2. Filtering by Unique/Primary Key

Querying by a unique constraint or primary key ensures at most one row (or zero if not found).

guarantee_unique_key.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Scalar via unique constraint (employee_id is primary key)
SELECT salary 
FROM employees 
WHERE employee_id = 101;  -- Primary key: at most one row
 
-- Safe to use as scalar subquery
UPDATE bonuses
SET amount = (
    SELECT salary * 0.10 
    FROM employees 
    WHERE employee_id = 101  -- Guaranteed unique
);
 
-- Scalar via unique constraint (email is unique)
SELECT department_id
FROM employees
WHERE email = 'john.doe@company.com';  -- Unique: at most one row

3. LIMIT 1 (or TOP 1 / FETCH FIRST 1)

Explicitly limiting results to one row guarantees scalarity, though you should consider which row you're getting.

guarantee_limit.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Force scalar with LIMIT 1 (MySQL, PostgreSQL)
SELECT salary
FROM employees
ORDER BY salary DESC
LIMIT 1;  -- Highest salary (single value)
 
-- SQL Server: TOP 1
SELECT TOP 1 salary
FROM employees
ORDER BY salary DESC;
 
-- Standard SQL: FETCH FIRST
SELECT salary
FROM employees
ORDER BY salary DESC
FETCH FIRST 1 ROW ONLY;
 
-- Using in context
SELECT employee_name, salary
FROM employees
WHERE salary = (
    SELECT salary 
    FROM employees 
    ORDER BY salary DESC 
    LIMIT 1  -- Guaranteed scalar
);

LIMIT 1 Without ORDER BY Is Non-Deterministic

Using LIMIT 1 without ORDER BY returns an arbitrary row—whichever the database happens to find first. This is rarely intentional. Always pair LIMIT with ORDER BY when you care which row is returned. 'Give me any one row' is a valid use case; 'give me the first row but I don't care about order' is usually a bug.

Handling Zero Rows (NULL Result)

What happens when a scalar subquery matches zero rows? Unlike multi-row subqueries, this doesn't cause an error—the result is simply NULL.

This NULL behavior is both useful and dangerous:

Useful: It allows graceful handling of missing data
Dangerous: NULL propagates through expressions in unexpected ways

NULL Propagation Examples:

null_propagation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Subquery returns NULL when no rows match
SELECT (SELECT salary FROM employees WHERE employee_id = 99999);
-- Result: NULL (no employee with ID 99999)
 
-- NULL propagates through arithmetic
SELECT 
    salary,
    salary - (SELECT salary FROM employees WHERE employee_id = 99999) AS diff
FROM employees;
-- diff is NULL for all rows (number - NULL = NULL)
 
-- NULL propagates through comparisons (THREE-VALUED LOGIC)
SELECT *
FROM employees
WHERE salary > (SELECT salary FROM employees WHERE employee_id = 99999);
-- Returns ZERO rows! (salary > NULL) evaluates to UNKNOWN, not TRUE
 
-- This query might unexpectedly return nothing if subquery returns NULL
SELECT * 
FROM products
WHERE price > (
    SELECT discount_threshold 
    FROM discount_rules 
    WHERE rule_name = 'nonexistent_rule'  -- Returns NULL
);
-- All comparisons with NULL yield UNKNOWN → no rows returned

Defensive Techniques for NULL Handling:

null_handling_defensive.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- 1. COALESCE: Provide default value
SELECT 
    employee_name,
    salary - COALESCE(
        (SELECT AVG(salary) FROM employees WHERE department_id = 999),
        0  -- Default to 0 if subquery returns NULL
    ) AS diff_from_avg
FROM employees;
 
 
-- 2. IFNULL/NVL (database-specific)
-- MySQL:
SELECT IFNULL((SELECT salary FROM employees WHERE employee_id = 99999), 0);
-- Oracle:
SELECT NVL((SELECT salary FROM employees WHERE employee_id = 99999), 0) FROM dual;
 
 
-- 3. CASE expression with EXISTS check
SELECT 
    employee_name,
    CASE 
        WHEN EXISTS (SELECT 1 FROM departments WHERE department_id = 999)
        THEN (SELECT budget FROM departments WHERE department_id = 999)
        ELSE 0
    END AS dept_budget
FROM employees;
 
 
-- 4. Handle NULL in comparison explicitly
SELECT *
FROM employees
WHERE salary > COALESCE(
    (SELECT discount_threshold FROM discount_rules WHERE rule_name = 'summer'),
    0  -- If rule doesn't exist, compare against 0
);

The Silent Bug

A query that returns no rows due to NULL comparison doesn't produce an error—it just silently returns an empty result set. This makes NULL-related bugs extremely hard to detect. Always ask: 'What happens if my subquery returns NULL?'

Handling Multiple-Row Errors

When a scalar subquery returns more than one row, databases raise a runtime error. This is one of the most common subquery mistakes.

Error Messages by Database:

Multiple-Row Error Messages
Database	Error Message
MySQL	Subquery returns more than 1 row
PostgreSQL	more than one row returned by a subquery used as an expression
SQL Server	Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, >, <=, >= or when the subquery is used as an expression.
Oracle	ORA-01427: single-row subquery returns more than one row

multiple_row_error.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
-- This query WILL FAIL if there are multiple departments
SELECT employee_name
FROM employees
WHERE department_id = (
    SELECT department_id 
    FROM departments 
    WHERE location = 'New York'  -- Multiple departments possible!
);
-- Error: Subquery returns more than 1 row
 
-- SOLUTIONS:
 
-- Solution 1: Use IN instead of = (allow multiple matches)
SELECT employee_name
FROM employees
WHERE department_id IN (
    SELECT department_id FROM departments WHERE location = 'New York'
);
 
-- Solution 2: Use LIMIT 1 if you need exactly one (arbitrary choice)
SELECT employee_name
FROM employees
WHERE department_id = (
    SELECT department_id 
    FROM departments 
    WHERE location = 'New York'
    ORDER BY department_name  -- Deterministic order
    LIMIT 1
);
 
-- Solution 3: Use aggregate to reduce to scalar
SELECT employee_name
FROM employees
WHERE department_id = (
    SELECT MIN(department_id)  -- Reduce to one value
    FROM departments 
    WHERE location = 'New York'
);
 
-- Solution 4: Redesign query using EXISTS
SELECT employee_name
FROM employees e
WHERE EXISTS (
    SELECT 1 
    FROM departments d 
    WHERE d.department_id = e.department_id 
      AND d.location = 'New York'
);

Testing Subquery Cardinality

Before using a subquery as scalar, run it standalone and check the row count. Better yet: design your subquery with explicit guarantees (aggregate function, unique key filter, or LIMIT 1) rather than hoping your data happens to produce one row.

Scalar Subqueries in SELECT List

One of the most powerful uses of scalar subqueries is adding computed columns to query results. This allows you to enrich each row with related data or calculated values.

Use Case 1: Adding Aggregate Context

scalar_select_aggregates.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- Add company-wide statistics to each employee row
SELECT 
    employee_name,
    salary,
    (SELECT AVG(salary) FROM employees) AS company_avg,
    (SELECT MAX(salary) FROM employees) AS company_max,
    (SELECT MIN(salary) FROM employees) AS company_min,
    ROUND(
        salary * 100.0 / (SELECT SUM(salary) FROM employees), 
        2
    ) AS pct_of_total_payroll
FROM employees
ORDER BY salary DESC;
 
-- Output example:
-- | employee_name | salary | company_avg | company_max | company_min | pct_of_total |
-- |---------------|--------|-------------|-------------|-------------|--------------|
-- | Sarah CEO     | 250000 | 75000       | 250000      | 35000       | 3.33%        |
-- | John VP       | 180000 | 75000       | 250000      | 35000       | 2.40%        |

Use Case 2: Correlated Scalar for Per-Row Related Data

scalar_select_correlated.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- Add department-specific information to each employee
SELECT 
    e.employee_name,
    e.salary,
    e.department_id,
    (SELECT d.department_name 
     FROM departments d 
     WHERE d.department_id = e.department_id) AS dept_name,
    (SELECT COUNT(*) 
     FROM employees e2 
     WHERE e2.department_id = e.department_id) AS dept_size,
    (SELECT AVG(salary) 
     FROM employees e3 
     WHERE e3.department_id = e.department_id) AS dept_avg_salary
FROM employees e;
 
-- Each scalar subquery is correlated: references e.department_id
-- Conceptually executes once per employee row

Use Case 3: Conditional Scalar Selection

scalar_select_conditional.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Different calculations based on employee type
SELECT 
    employee_name,
    employee_type,
    base_salary,
    CASE employee_type
        WHEN 'HOURLY' THEN 
            (SELECT hourly_rate * 2080 FROM hourly_rates 
             WHERE employee_id = e.employee_id)
        WHEN 'SALARIED' THEN 
            base_salary
        WHEN 'CONTRACTOR' THEN 
            (SELECT contract_amount FROM contracts 
             WHERE contractor_id = e.employee_id 
             ORDER BY start_date DESC LIMIT 1)
    END AS annual_compensation
FROM employees e;

Performance Consideration: N+1 Query Pattern

Multiple correlated scalar subqueries in the SELECT list can cause N+1 performance issues. For 1000 employees with 3 scalar subqueries each, you might execute 3001 queries conceptually. Modern optimizers often transform this to joins, but large tables warrant using EXPLAIN to verify. Consider JOINs with window functions for complex per-row calculations.

Scalar Subqueries in WHERE Clause

The WHERE clause is where scalar subqueries shine brightest—enabling comparisons against dynamically computed values.

Comparison Operators with Scalar Subqueries:

All standard comparison operators work with scalar subqueries:

scalar_where_comparisons.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Equal to scalar
SELECT * FROM employees
WHERE department_id = (SELECT department_id FROM employees WHERE email = 'manager@company.com');
 
-- Greater than scalar
SELECT * FROM products
WHERE price > (SELECT AVG(price) FROM products);
 
-- Less than or equal to scalar  
SELECT * FROM orders
WHERE order_date <= (SELECT MAX(order_date) FROM orders WHERE status = 'completed');
 
-- Not equal to scalar
SELECT * FROM departments
WHERE department_id <> (SELECT department_id FROM employees WHERE employee_id = 1);
 
-- BETWEEN using two scalars
SELECT * FROM products
WHERE price BETWEEN 
    (SELECT AVG(price) * 0.8 FROM products)
    AND 
    (SELECT AVG(price) * 1.2 FROM products);

Combining Scalar Subqueries with Logical Operators:

scalar_where_logical.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- Multiple conditions with scalar subqueries
SELECT employee_name, salary, department_id
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
  AND department_id = (
      SELECT department_id FROM departments WHERE department_name = 'Engineering'
  )
  AND hire_date < (SELECT MAX(hire_date) FROM employees WHERE is_manager = TRUE);
 
-- OR conditions
SELECT product_name, price, category_id
FROM products
WHERE price = (SELECT MAX(price) FROM products WHERE category_id = 1)
   OR price = (SELECT MAX(price) FROM products WHERE category_id = 2)
   OR price = (SELECT MAX(price) FROM products WHERE category_id = 3);

Arithmetic with Scalar Subqueries:

scalar_where_arithmetic.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Salary within 10% of company average
SELECT employee_name, salary
FROM employees
WHERE ABS(salary - (SELECT AVG(salary) FROM employees)) <= 
      (SELECT AVG(salary) FROM employees) * 0.10;
 
-- Products priced at exactly 20% markup over cost
SELECT product_name, price, cost
FROM products
WHERE price = cost * (1 + (SELECT markup_rate FROM pricing_config WHERE config_name = 'standard'));
 
-- Orders from the last N days (N is stored in a config table)
SELECT * FROM orders
WHERE order_date >= DATE_SUB(
    CURRENT_DATE, 
    INTERVAL (SELECT value FROM settings WHERE key = 'recent_days_threshold') DAY
);

Scalar Subqueries in UPDATE and DELETE

Scalar subqueries aren't limited to SELECT—they're equally powerful in data modification statements. This enables dynamic updates based on computed values.

UPDATE with Scalar Subqueries:

scalar_update.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Set all salaries below average to the average
UPDATE employees
SET salary = (SELECT AVG(salary) FROM employees)
WHERE salary < (SELECT AVG(salary) FROM employees);
 
-- Note: Some databases (like MySQL) have restrictions on 
-- updating a table while selecting from it in a subquery.
-- Workaround: use a derived table or CTE
 
-- MySQL-safe version using derived table:
UPDATE employees
SET salary = (SELECT avg_sal FROM (SELECT AVG(salary) AS avg_sal FROM employees) AS temp)
WHERE salary < (SELECT avg_sal FROM (SELECT AVG(salary) AS avg_sal FROM employees) AS temp);
 
 
-- Update with correlated scalar subquery
-- Set each employee's bonus based on their department's performance
UPDATE employees e
SET bonus = salary * (
    SELECT bonus_multiplier 
    FROM department_performance dp
    WHERE dp.department_id = e.department_id
      AND dp.year = 2024
);
 
 
-- Update using scalar from different table
UPDATE products
SET featured = TRUE
WHERE price = (SELECT MAX(price) FROM products);

DELETE with Scalar Subqueries:

scalar_delete.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- Delete low-value orders (below average)
DELETE FROM orders
WHERE total_amount < (SELECT AVG(total_amount) FROM orders) / 2;
 
-- Delete records older than the retention period (stored in config)
DELETE FROM logs
WHERE created_at < DATE_SUB(
    CURRENT_DATE,
    INTERVAL (SELECT retention_days FROM system_config WHERE config_name = 'log_retention') DAY
);
 
-- Delete duplicate products keeping only the one with lowest ID
DELETE FROM products p1
WHERE p1.product_id > (
    SELECT MIN(p2.product_id)
    FROM products p2
    WHERE p2.product_name = p1.product_name
);

Same-Table Subquery Restrictions

Some databases (especially MySQL) restrict modifying a table while selecting from it in a subquery. The error 'You can't specify target table for update in FROM clause' indicates this. Solutions include using derived tables, CTEs (WITH clause), or temporary tables to cache the subquery result first.

Performance Characteristics

Understanding scalar subquery performance is essential for writing efficient SQL. The key question is: how often does the subquery execute?

Non-Correlated Scalar Subqueries:

These execute once regardless of outer query size. Modern databases cache the result, making them efficient even when referenced multiple times.

performance_noncorrelated.sql
1
2
3
4
5
6
7
8
9
10
-- This subquery executes ONCE, result is cached and reused
SELECT 
    employee_name,
    salary,
    salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
 
-- Even though (SELECT AVG(salary) FROM employees) appears conceptually
-- for each of 10,000 employees, it only executes once.
-- Execution: O(n) for outer query + O(m) for subquery = O(n + m)

Correlated Scalar Subqueries:

These can be expensive because they conceptually execute once per outer row. However, optimizers often transform them.

performance_correlated.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Correlated: conceptually executes per outer row
SELECT 
    e.employee_name,
    e.salary,
    (SELECT AVG(salary) FROM employees e2 WHERE e2.department_id = e.department_id) AS dept_avg
FROM employees e;
 
-- Naive execution: O(n * m) where n = employees, m = department size
-- Optimized execution: query optimizer may transform to JOIN + window function:
 
-- Equivalent optimized query (what optimizer might produce):
SELECT 
    e.employee_name,
    e.salary,
    AVG(e.salary) OVER (PARTITION BY e.department_id) AS dept_avg
FROM employees e;
-- Execution: O(n log n) for windowed aggregate

Scalar Subquery Performance Guide
Subquery Type	Execution Frequency	Optimization Potential	Recommendation
Non-correlated (simple)	Once	Always cached	Use freely; very efficient
Non-correlated (in SELECT, multiple)	Once each	Cached independently	Efficient; consider combining if complex
Correlated (in WHERE)	Per outer row (conceptual)	Often to semi-join	Check EXPLAIN; usually optimized
Correlated (in SELECT)	Per outer row (conceptual)	Sometimes to window	Watch for N+1; prefer JOIN if slow
Multiple correlated in SELECT	Each per outer row	Limited	Consider restructuring to JOIN

When in Doubt, EXPLAIN

Always use EXPLAIN or EXPLAIN ANALYZE on queries with scalar subqueries, especially correlated ones. The execution plan reveals whether the optimizer transformed your subquery efficiently or is executing it row-by-row. If you see 'DEPENDENT SUBQUERY' or 'Subplan' executed many times, consider restructuring.

Summary: Scalar Subquery Mastery

Scalar subqueries are the workhorse of SQL's compositional power. Let's consolidate the essential knowledge:

Key Takeaways

•Definition — Scalar subqueries return exactly one row with one column, producing a single value usable anywhere a literal is expected.
•Universal placement — Use scalar subqueries in SELECT lists, WHERE clauses, HAVING conditions, CASE expressions, and all DML statements (INSERT, UPDATE, DELETE).
•Guaranteeing scalarity — Use aggregate functions without GROUP BY, filter by unique keys, or apply LIMIT 1 to ensure single-row results.
•NULL handling — Zero-row subqueries return NULL, which propagates through expressions. Use COALESCE for defensive coding.
•Multiple-row errors — When a scalar position receives multiple rows, databases raise runtime errors. Design subqueries with explicit scalar guarantees.
•Performance — Non-correlated scalar subqueries execute once (efficient). Correlated ones execute per row conceptually, but optimizers often transform them to joins.

What's Next:

Scalar subqueries return one value. But what if you need to match multiple columns simultaneously—like finding employees with the same (department, job_title) as another specific employee? The next page explores row subqueries, which return a complete row for tuple-based comparisons.

Page Complete

You now have comprehensive mastery of scalar subqueries—their syntax, placement, guarantees, error handling, and performance characteristics. These skills form the foundation for the more complex subquery types ahead.

2 / 5

Loading learning content...

Database Management SystemsSubqueries

SQL Subqueries: Nested Query Power

LevelIntermediate

Duration60 mins

TopicSubqueries

2 / 5

Scalar Subqueries

When You Need Exactly One Value

All these scenarios require embedding a computed single value directly into your SQL statement—not a list, not a table, but precisely one value.

What You Will Learn

Defining Scalar Subqueries

A scalar subquery is a subquery that returns a result set consisting of exactly one row with exactly one column—in other words, a single value (a scalar).

Formal Definition:

A scalar subquery is a SELECT statement that, upon execution, yields a result containing at most one row and exactly one column. If the query returns zero rows, the result is NULL. If it returns more than one row, most databases raise a runtime error.

The term "scalar" comes from linear algebra, where a scalar is a single number (as opposed to a vector or matrix). In SQL, a scalar value can be any data type: integer, string, date, boolean, etc.

scalar_subquery_definition.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- Scalar subquery: returns exactly one value
-- This subquery returns a single number: the average salary
 
SELECT employee_name, salary,
       (SELECT AVG(salary) FROM employees) AS company_average
FROM employees;
 
-- For 100 employees, this query returns 100 rows.
-- Each row has three columns:
--   1. employee_name (from base table)
--   2. salary (from base table)  
--   3. company_average (same scalar value for all rows)
 
-- The subquery (SELECT AVG(salary) FROM employees) is scalar because:
--   - It returns exactly ONE row (aggregates collapse to one row)
--   - It returns exactly ONE column (AVG(salary))

Aggregate Functions Guarantee Scalarity

Where Scalar Subqueries Can Appear

The defining characteristic of scalar subqueries is their flexibility—they can appear anywhere a single value is expected. This includes contexts many developers don't initially consider:

In SELECT Statements:

scalar_in_select.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- 1. SELECT list (as a computed column)
SELECT 
    employee_name,
    salary,
    (SELECT AVG(salary) FROM employees) AS avg_salary,
    salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
 
 
-- 2. WHERE clause (comparison)
SELECT employee_name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
 
 
-- 3. HAVING clause (filter on groups)
SELECT department_id, AVG(salary) as dept_avg
FROM employees
GROUP BY department_id
HAVING AVG(salary) > (SELECT AVG(salary) FROM employees);
 
 
-- 4. CASE expressions
SELECT 
    employee_name,
    salary,
    CASE 
        WHEN salary > (SELECT AVG(salary) FROM employees) THEN 'Above Average'
        WHEN salary = (SELECT AVG(salary) FROM employees) THEN 'At Average'
        ELSE 'Below Average'
    END AS salary_status
FROM employees;

In Data Modification Statements:

scalar_in_dml.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- 5. INSERT statement (computed values)
INSERT INTO salary_reports (report_date, avg_salary, employee_count)
VALUES (
    CURRENT_DATE,
    (SELECT AVG(salary) FROM employees),
    (SELECT COUNT(*) FROM employees)
);
 
 
-- 6. UPDATE statement (SET clause)
UPDATE employees
SET salary = salary * 1.10
WHERE salary < (SELECT AVG(salary) FROM employees);
 
 
-- 7. UPDATE with scalar in new value calculation
UPDATE products
SET discount_price = price * (
    SELECT discount_rate 
    FROM discount_policies 
    WHERE policy_name = 'seasonal_sale'
);
 
 
-- 8. DELETE statement (filter condition)  
DELETE FROM audit_logs
WHERE log_date < (
    SELECT DATE_SUB(MAX(log_date), INTERVAL 90 DAY) 
    FROM audit_logs
);

Scalar Subquery Placement Summary
Context	Role	Example Pattern
SELECT list	Computed column	`SELECT ..., (scalar_query) AS col`
WHERE clause	Comparison value	`WHERE column > (scalar_query)`
HAVING clause	Group filter value	`HAVING AGG(col) > (scalar_query)`
CASE expression	Conditional value	`CASE WHEN col > (scalar_query) ...`
INSERT VALUES	Value to insert	`VALUES (..., (scalar_query), ...)`
UPDATE SET	New column value	`SET col = (scalar_query)`
DELETE WHERE	Filter condition	`WHERE col < (scalar_query)`
ORDER BY	Sort expression	`ORDER BY (scalar_query)` (rare)

Guaranteeing Scalar Results

Techniques to Guarantee Scalar Results:

1. Aggregate Functions Without GROUP BY

Aggregate functions collapse all rows into a single result row. Without GROUP BY, they always produce exactly one row.

guarantee_aggregate.sql
1
2
3
4
5
6
7
8
9
-- Always scalar: aggregate without GROUP BY
SELECT MAX(salary) FROM employees;                    -- One row: highest salary
SELECT COUNT(*) FROM orders WHERE status = 'pending'; -- One row: count
SELECT SUM(quantity) FROM order_items WHERE order_id = 42; -- One row: sum
 
-- These can NEVER return more than one row, making them inherently safe
SELECT employee_name
FROM employees
WHERE salary = (SELECT MAX(salary) FROM employees);  -- Safe scalar

2. Filtering by Unique/Primary Key

Querying by a unique constraint or primary key ensures at most one row (or zero if not found).

guarantee_unique_key.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Scalar via unique constraint (employee_id is primary key)
SELECT salary 
FROM employees 
WHERE employee_id = 101;  -- Primary key: at most one row
 
-- Safe to use as scalar subquery
UPDATE bonuses
SET amount = (
    SELECT salary * 0.10 
    FROM employees 
    WHERE employee_id = 101  -- Guaranteed unique
);
 
-- Scalar via unique constraint (email is unique)
SELECT department_id
FROM employees
WHERE email = 'john.doe@company.com';  -- Unique: at most one row

3. LIMIT 1 (or TOP 1 / FETCH FIRST 1)

Explicitly limiting results to one row guarantees scalarity, though you should consider which row you're getting.

guarantee_limit.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Force scalar with LIMIT 1 (MySQL, PostgreSQL)
SELECT salary
FROM employees
ORDER BY salary DESC
LIMIT 1;  -- Highest salary (single value)
 
-- SQL Server: TOP 1
SELECT TOP 1 salary
FROM employees
ORDER BY salary DESC;
 
-- Standard SQL: FETCH FIRST
SELECT salary
FROM employees
ORDER BY salary DESC
FETCH FIRST 1 ROW ONLY;
 
-- Using in context
SELECT employee_name, salary
FROM employees
WHERE salary = (
    SELECT salary 
    FROM employees 
    ORDER BY salary DESC 
    LIMIT 1  -- Guaranteed scalar
);

LIMIT 1 Without ORDER BY Is Non-Deterministic

Handling Zero Rows (NULL Result)

What happens when a scalar subquery matches zero rows? Unlike multi-row subqueries, this doesn't cause an error—the result is simply NULL.

This NULL behavior is both useful and dangerous:

Useful: It allows graceful handling of missing data
Dangerous: NULL propagates through expressions in unexpected ways

NULL Propagation Examples:

null_propagation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
-- Subquery returns NULL when no rows match
SELECT (SELECT salary FROM employees WHERE employee_id = 99999);
-- Result: NULL (no employee with ID 99999)
 
-- NULL propagates through arithmetic
SELECT 
    salary,
    salary - (SELECT salary FROM employees WHERE employee_id = 99999) AS diff
FROM employees;
-- diff is NULL for all rows (number - NULL = NULL)
 
-- NULL propagates through comparisons (THREE-VALUED LOGIC)
SELECT *
FROM employees
WHERE salary > (SELECT salary FROM employees WHERE employee_id = 99999);
-- Returns ZERO rows! (salary > NULL) evaluates to UNKNOWN, not TRUE
 
-- This query might unexpectedly return nothing if subquery returns NULL
SELECT * 
FROM products
WHERE price > (
    SELECT discount_threshold 
    FROM discount_rules 
    WHERE rule_name = 'nonexistent_rule'  -- Returns NULL
);
-- All comparisons with NULL yield UNKNOWN → no rows returned

Defensive Techniques for NULL Handling:

null_handling_defensive.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- 1. COALESCE: Provide default value
SELECT 
    employee_name,
    salary - COALESCE(
        (SELECT AVG(salary) FROM employees WHERE department_id = 999),
        0  -- Default to 0 if subquery returns NULL
    ) AS diff_from_avg
FROM employees;
 
 
-- 2. IFNULL/NVL (database-specific)
-- MySQL:
SELECT IFNULL((SELECT salary FROM employees WHERE employee_id = 99999), 0);
-- Oracle:
SELECT NVL((SELECT salary FROM employees WHERE employee_id = 99999), 0) FROM dual;
 
 
-- 3. CASE expression with EXISTS check
SELECT 
    employee_name,
    CASE 
        WHEN EXISTS (SELECT 1 FROM departments WHERE department_id = 999)
        THEN (SELECT budget FROM departments WHERE department_id = 999)
        ELSE 0
    END AS dept_budget
FROM employees;
 
 
-- 4. Handle NULL in comparison explicitly
SELECT *
FROM employees
WHERE salary > COALESCE(
    (SELECT discount_threshold FROM discount_rules WHERE rule_name = 'summer'),
    0  -- If rule doesn't exist, compare against 0
);

The Silent Bug

Handling Multiple-Row Errors

When a scalar subquery returns more than one row, databases raise a runtime error. This is one of the most common subquery mistakes.

Error Messages by Database:

Multiple-Row Error Messages
Database	Error Message
MySQL	Subquery returns more than 1 row
PostgreSQL	more than one row returned by a subquery used as an expression
SQL Server	Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, >, <=, >= or when the subquery is used as an expression.
Oracle	ORA-01427: single-row subquery returns more than one row

multiple_row_error.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
-- This query WILL FAIL if there are multiple departments
SELECT employee_name
FROM employees
WHERE department_id = (
    SELECT department_id 
    FROM departments 
    WHERE location = 'New York'  -- Multiple departments possible!
);
-- Error: Subquery returns more than 1 row
 
-- SOLUTIONS:
 
-- Solution 1: Use IN instead of = (allow multiple matches)
SELECT employee_name
FROM employees
WHERE department_id IN (
    SELECT department_id FROM departments WHERE location = 'New York'
);
 
-- Solution 2: Use LIMIT 1 if you need exactly one (arbitrary choice)
SELECT employee_name
FROM employees
WHERE department_id = (
    SELECT department_id 
    FROM departments 
    WHERE location = 'New York'
    ORDER BY department_name  -- Deterministic order
    LIMIT 1
);
 
-- Solution 3: Use aggregate to reduce to scalar
SELECT employee_name
FROM employees
WHERE department_id = (
    SELECT MIN(department_id)  -- Reduce to one value
    FROM departments 
    WHERE location = 'New York'
);
 
-- Solution 4: Redesign query using EXISTS
SELECT employee_name
FROM employees e
WHERE EXISTS (
    SELECT 1 
    FROM departments d 
    WHERE d.department_id = e.department_id 
      AND d.location = 'New York'
);

Testing Subquery Cardinality

Scalar Subqueries in SELECT List

One of the most powerful uses of scalar subqueries is adding computed columns to query results. This allows you to enrich each row with related data or calculated values.

Use Case 1: Adding Aggregate Context

scalar_select_aggregates.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- Add company-wide statistics to each employee row
SELECT 
    employee_name,
    salary,
    (SELECT AVG(salary) FROM employees) AS company_avg,
    (SELECT MAX(salary) FROM employees) AS company_max,
    (SELECT MIN(salary) FROM employees) AS company_min,
    ROUND(
        salary * 100.0 / (SELECT SUM(salary) FROM employees), 
        2
    ) AS pct_of_total_payroll
FROM employees
ORDER BY salary DESC;
 
-- Output example:
-- | employee_name | salary | company_avg | company_max | company_min | pct_of_total |
-- |---------------|--------|-------------|-------------|-------------|--------------|
-- | Sarah CEO     | 250000 | 75000       | 250000      | 35000       | 3.33%        |
-- | John VP       | 180000 | 75000       | 250000      | 35000       | 2.40%        |

Use Case 2: Correlated Scalar for Per-Row Related Data

scalar_select_correlated.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- Add department-specific information to each employee
SELECT 
    e.employee_name,
    e.salary,
    e.department_id,
    (SELECT d.department_name 
     FROM departments d 
     WHERE d.department_id = e.department_id) AS dept_name,
    (SELECT COUNT(*) 
     FROM employees e2 
     WHERE e2.department_id = e.department_id) AS dept_size,
    (SELECT AVG(salary) 
     FROM employees e3 
     WHERE e3.department_id = e.department_id) AS dept_avg_salary
FROM employees e;
 
-- Each scalar subquery is correlated: references e.department_id
-- Conceptually executes once per employee row

Use Case 3: Conditional Scalar Selection

scalar_select_conditional.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Different calculations based on employee type
SELECT 
    employee_name,
    employee_type,
    base_salary,
    CASE employee_type
        WHEN 'HOURLY' THEN 
            (SELECT hourly_rate * 2080 FROM hourly_rates 
             WHERE employee_id = e.employee_id)
        WHEN 'SALARIED' THEN 
            base_salary
        WHEN 'CONTRACTOR' THEN 
            (SELECT contract_amount FROM contracts 
             WHERE contractor_id = e.employee_id 
             ORDER BY start_date DESC LIMIT 1)
    END AS annual_compensation
FROM employees e;

Performance Consideration: N+1 Query Pattern

Scalar Subqueries in WHERE Clause

The WHERE clause is where scalar subqueries shine brightest—enabling comparisons against dynamically computed values.

Comparison Operators with Scalar Subqueries:

All standard comparison operators work with scalar subqueries:

scalar_where_comparisons.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Equal to scalar
SELECT * FROM employees
WHERE department_id = (SELECT department_id FROM employees WHERE email = 'manager@company.com');
 
-- Greater than scalar
SELECT * FROM products
WHERE price > (SELECT AVG(price) FROM products);
 
-- Less than or equal to scalar  
SELECT * FROM orders
WHERE order_date <= (SELECT MAX(order_date) FROM orders WHERE status = 'completed');
 
-- Not equal to scalar
SELECT * FROM departments
WHERE department_id <> (SELECT department_id FROM employees WHERE employee_id = 1);
 
-- BETWEEN using two scalars
SELECT * FROM products
WHERE price BETWEEN 
    (SELECT AVG(price) * 0.8 FROM products)
    AND 
    (SELECT AVG(price) * 1.2 FROM products);

Combining Scalar Subqueries with Logical Operators:

scalar_where_logical.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- Multiple conditions with scalar subqueries
SELECT employee_name, salary, department_id
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
  AND department_id = (
      SELECT department_id FROM departments WHERE department_name = 'Engineering'
  )
  AND hire_date < (SELECT MAX(hire_date) FROM employees WHERE is_manager = TRUE);
 
-- OR conditions
SELECT product_name, price, category_id
FROM products
WHERE price = (SELECT MAX(price) FROM products WHERE category_id = 1)
   OR price = (SELECT MAX(price) FROM products WHERE category_id = 2)
   OR price = (SELECT MAX(price) FROM products WHERE category_id = 3);

Arithmetic with Scalar Subqueries:

scalar_where_arithmetic.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Salary within 10% of company average
SELECT employee_name, salary
FROM employees
WHERE ABS(salary - (SELECT AVG(salary) FROM employees)) <= 
      (SELECT AVG(salary) FROM employees) * 0.10;
 
-- Products priced at exactly 20% markup over cost
SELECT product_name, price, cost
FROM products
WHERE price = cost * (1 + (SELECT markup_rate FROM pricing_config WHERE config_name = 'standard'));
 
-- Orders from the last N days (N is stored in a config table)
SELECT * FROM orders
WHERE order_date >= DATE_SUB(
    CURRENT_DATE, 
    INTERVAL (SELECT value FROM settings WHERE key = 'recent_days_threshold') DAY
);

Scalar Subqueries in UPDATE and DELETE

Scalar subqueries aren't limited to SELECT—they're equally powerful in data modification statements. This enables dynamic updates based on computed values.

UPDATE with Scalar Subqueries:

scalar_update.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-- Set all salaries below average to the average
UPDATE employees
SET salary = (SELECT AVG(salary) FROM employees)
WHERE salary < (SELECT AVG(salary) FROM employees);
 
-- Note: Some databases (like MySQL) have restrictions on 
-- updating a table while selecting from it in a subquery.
-- Workaround: use a derived table or CTE
 
-- MySQL-safe version using derived table:
UPDATE employees
SET salary = (SELECT avg_sal FROM (SELECT AVG(salary) AS avg_sal FROM employees) AS temp)
WHERE salary < (SELECT avg_sal FROM (SELECT AVG(salary) AS avg_sal FROM employees) AS temp);
 
 
-- Update with correlated scalar subquery
-- Set each employee's bonus based on their department's performance
UPDATE employees e
SET bonus = salary * (
    SELECT bonus_multiplier 
    FROM department_performance dp
    WHERE dp.department_id = e.department_id
      AND dp.year = 2024
);
 
 
-- Update using scalar from different table
UPDATE products
SET featured = TRUE
WHERE price = (SELECT MAX(price) FROM products);

DELETE with Scalar Subqueries:

scalar_delete.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
-- Delete low-value orders (below average)
DELETE FROM orders
WHERE total_amount < (SELECT AVG(total_amount) FROM orders) / 2;
 
-- Delete records older than the retention period (stored in config)
DELETE FROM logs
WHERE created_at < DATE_SUB(
    CURRENT_DATE,
    INTERVAL (SELECT retention_days FROM system_config WHERE config_name = 'log_retention') DAY
);
 
-- Delete duplicate products keeping only the one with lowest ID
DELETE FROM products p1
WHERE p1.product_id > (
    SELECT MIN(p2.product_id)
    FROM products p2
    WHERE p2.product_name = p1.product_name
);

Same-Table Subquery Restrictions

Performance Characteristics

Understanding scalar subquery performance is essential for writing efficient SQL. The key question is: how often does the subquery execute?

Non-Correlated Scalar Subqueries:

These execute once regardless of outer query size. Modern databases cache the result, making them efficient even when referenced multiple times.

performance_noncorrelated.sql
1
2
3
4
5
6
7
8
9
10
-- This subquery executes ONCE, result is cached and reused
SELECT 
    employee_name,
    salary,
    salary - (SELECT AVG(salary) FROM employees) AS diff_from_avg
FROM employees;
 
-- Even though (SELECT AVG(salary) FROM employees) appears conceptually
-- for each of 10,000 employees, it only executes once.
-- Execution: O(n) for outer query + O(m) for subquery = O(n + m)

Correlated Scalar Subqueries:

These can be expensive because they conceptually execute once per outer row. However, optimizers often transform them.

performance_correlated.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Correlated: conceptually executes per outer row
SELECT 
    e.employee_name,
    e.salary,
    (SELECT AVG(salary) FROM employees e2 WHERE e2.department_id = e.department_id) AS dept_avg
FROM employees e;
 
-- Naive execution: O(n * m) where n = employees, m = department size
-- Optimized execution: query optimizer may transform to JOIN + window function:
 
-- Equivalent optimized query (what optimizer might produce):
SELECT 
    e.employee_name,
    e.salary,
    AVG(e.salary) OVER (PARTITION BY e.department_id) AS dept_avg
FROM employees e;
-- Execution: O(n log n) for windowed aggregate

Scalar Subquery Performance Guide
Subquery Type	Execution Frequency	Optimization Potential	Recommendation
Non-correlated (simple)	Once	Always cached	Use freely; very efficient
Non-correlated (in SELECT, multiple)	Once each	Cached independently	Efficient; consider combining if complex
Correlated (in WHERE)	Per outer row (conceptual)	Often to semi-join	Check EXPLAIN; usually optimized
Correlated (in SELECT)	Per outer row (conceptual)	Sometimes to window	Watch for N+1; prefer JOIN if slow
Multiple correlated in SELECT	Each per outer row	Limited	Consider restructuring to JOIN

When in Doubt, EXPLAIN

Summary: Scalar Subquery Mastery

Scalar subqueries are the workhorse of SQL's compositional power. Let's consolidate the essential knowledge:

Key Takeaways

•Definition — Scalar subqueries return exactly one row with one column, producing a single value usable anywhere a literal is expected.
•Universal placement — Use scalar subqueries in SELECT lists, WHERE clauses, HAVING conditions, CASE expressions, and all DML statements (INSERT, UPDATE, DELETE).
•Guaranteeing scalarity — Use aggregate functions without GROUP BY, filter by unique keys, or apply LIMIT 1 to ensure single-row results.
•NULL handling — Zero-row subqueries return NULL, which propagates through expressions. Use COALESCE for defensive coding.
•Multiple-row errors — When a scalar position receives multiple rows, databases raise runtime errors. Design subqueries with explicit scalar guarantees.
•Performance — Non-correlated scalar subqueries execute once (efficient). Correlated ones execute per row conceptually, but optimizers often transform them to joins.

What's Next:

Page Complete

2 / 5