Loading learning content...
Imagine displaying an employee report that shows each person's salary alongside the company average. Or updating prices where the discount depends on a calculated threshold. Or inserting records that need computed default values.
All these scenarios require embedding a computed single value directly into your SQL statement—not a list, not a table, but precisely one value.
This is the domain of scalar subqueries: the simplest and most versatile form of nested queries. A scalar subquery evaluates to exactly one row containing exactly one column—a single cell in database terms. This restriction is what gives scalar subqueries their power: they can appear anywhere SQL expects a literal value.
By the end of this page, you will master scalar subqueries—understanding where they can appear, how to guarantee they return exactly one value, how to handle edge cases, and when they excel versus alternative approaches. You'll learn to use them for computations, comparisons, and even data modification statements.
A scalar subquery is a subquery that returns a result set consisting of exactly one row with exactly one column—in other words, a single value (a scalar).
Formal Definition:
A scalar subquery is a SELECT statement that, upon execution, yields a result containing at most one row and exactly one column. If the query returns zero rows, the result is NULL. If it returns more than one row, most databases raise a runtime error.
The term "scalar" comes from linear algebra, where a scalar is a single number (as opposed to a vector or matrix). In SQL, a scalar value can be any data type: integer, string, date, boolean, etc.
12345678910111213141516
-- Scalar subquery: returns exactly one value-- This subquery returns a single number: the average salary SELECT employee_name, salary, (SELECT AVG(salary) FROM employees) AS company_averageFROM employees; -- For 100 employees, this query returns 100 rows.-- Each row has three columns:-- 1. employee_name (from base table)-- 2. salary (from base table) -- 3. company_average (same scalar value for all rows) -- The subquery (SELECT AVG(salary) FROM employees) is scalar because:-- - It returns exactly ONE row (aggregates collapse to one row)-- - It returns exactly ONE column (AVG(salary))Aggregate functions (SUM, AVG, COUNT, MAX, MIN) without GROUP BY always return exactly one row, making them natural scalar subquery producers. When you need a scalar result, aggregate functions are your most reliable tool.
The defining characteristic of scalar subqueries is their flexibility—they can appear anywhere a single value is expected. This includes contexts many developers don't initially consider:
In SELECT Statements:
1234567891011121314151617181920212223242526272829303132
-- 1. SELECT list (as a computed column)SELECT employee_name, salary, (SELECT AVG(salary) FROM employees) AS avg_salary, salary - (SELECT AVG(salary) FROM employees) AS diff_from_avgFROM employees; -- 2. WHERE clause (comparison)SELECT employee_name, salaryFROM employeesWHERE salary > (SELECT AVG(salary) FROM employees); -- 3. HAVING clause (filter on groups)SELECT department_id, AVG(salary) as dept_avgFROM employeesGROUP BY department_idHAVING AVG(salary) > (SELECT AVG(salary) FROM employees); -- 4. CASE expressionsSELECT employee_name, salary, CASE WHEN salary > (SELECT AVG(salary) FROM employees) THEN 'Above Average' WHEN salary = (SELECT AVG(salary) FROM employees) THEN 'At Average' ELSE 'Below Average' END AS salary_statusFROM employees;In Data Modification Statements:
123456789101112131415161718192021222324252627282930
-- 5. INSERT statement (computed values)INSERT INTO salary_reports (report_date, avg_salary, employee_count)VALUES ( CURRENT_DATE, (SELECT AVG(salary) FROM employees), (SELECT COUNT(*) FROM employees)); -- 6. UPDATE statement (SET clause)UPDATE employeesSET salary = salary * 1.10WHERE salary < (SELECT AVG(salary) FROM employees); -- 7. UPDATE with scalar in new value calculationUPDATE productsSET discount_price = price * ( SELECT discount_rate FROM discount_policies WHERE policy_name = 'seasonal_sale'); -- 8. DELETE statement (filter condition) DELETE FROM audit_logsWHERE log_date < ( SELECT DATE_SUB(MAX(log_date), INTERVAL 90 DAY) FROM audit_logs);| Context | Role | Example Pattern |
|---|---|---|
| SELECT list | Computed column | SELECT ..., (scalar_query) AS col |
| WHERE clause | Comparison value | WHERE column > (scalar_query) |
| HAVING clause | Group filter value | HAVING AGG(col) > (scalar_query) |
| CASE expression | Conditional value | CASE WHEN col > (scalar_query) ... |
| INSERT VALUES | Value to insert | VALUES (..., (scalar_query), ...) |
| UPDATE SET | New column value | SET col = (scalar_query) |
| DELETE WHERE | Filter condition | WHERE col < (scalar_query) |
| ORDER BY | Sort expression | ORDER BY (scalar_query) (rare) |
The most critical aspect of scalar subqueries is ensuring they actually return one row. If your subquery can return multiple rows, you'll get a runtime error—often in production, with real users watching.
Techniques to Guarantee Scalar Results:
1. Aggregate Functions Without GROUP BY
Aggregate functions collapse all rows into a single result row. Without GROUP BY, they always produce exactly one row.
123456789
-- Always scalar: aggregate without GROUP BYSELECT MAX(salary) FROM employees; -- One row: highest salarySELECT COUNT(*) FROM orders WHERE status = 'pending'; -- One row: countSELECT SUM(quantity) FROM order_items WHERE order_id = 42; -- One row: sum -- These can NEVER return more than one row, making them inherently safeSELECT employee_nameFROM employeesWHERE salary = (SELECT MAX(salary) FROM employees); -- Safe scalar2. Filtering by Unique/Primary Key
Querying by a unique constraint or primary key ensures at most one row (or zero if not found).
1234567891011121314151617
-- Scalar via unique constraint (employee_id is primary key)SELECT salary FROM employees WHERE employee_id = 101; -- Primary key: at most one row -- Safe to use as scalar subqueryUPDATE bonusesSET amount = ( SELECT salary * 0.10 FROM employees WHERE employee_id = 101 -- Guaranteed unique); -- Scalar via unique constraint (email is unique)SELECT department_idFROM employeesWHERE email = 'john.doe@company.com'; -- Unique: at most one row3. LIMIT 1 (or TOP 1 / FETCH FIRST 1)
Explicitly limiting results to one row guarantees scalarity, though you should consider which row you're getting.
1234567891011121314151617181920212223242526
-- Force scalar with LIMIT 1 (MySQL, PostgreSQL)SELECT salaryFROM employeesORDER BY salary DESCLIMIT 1; -- Highest salary (single value) -- SQL Server: TOP 1SELECT TOP 1 salaryFROM employeesORDER BY salary DESC; -- Standard SQL: FETCH FIRSTSELECT salaryFROM employeesORDER BY salary DESCFETCH FIRST 1 ROW ONLY; -- Using in contextSELECT employee_name, salaryFROM employeesWHERE salary = ( SELECT salary FROM employees ORDER BY salary DESC LIMIT 1 -- Guaranteed scalar);Using LIMIT 1 without ORDER BY returns an arbitrary row—whichever the database happens to find first. This is rarely intentional. Always pair LIMIT with ORDER BY when you care which row is returned. 'Give me any one row' is a valid use case; 'give me the first row but I don't care about order' is usually a bug.
What happens when a scalar subquery matches zero rows? Unlike multi-row subqueries, this doesn't cause an error—the result is simply NULL.
This NULL behavior is both useful and dangerous:
NULL Propagation Examples:
1234567891011121314151617181920212223242526
-- Subquery returns NULL when no rows matchSELECT (SELECT salary FROM employees WHERE employee_id = 99999);-- Result: NULL (no employee with ID 99999) -- NULL propagates through arithmeticSELECT salary, salary - (SELECT salary FROM employees WHERE employee_id = 99999) AS diffFROM employees;-- diff is NULL for all rows (number - NULL = NULL) -- NULL propagates through comparisons (THREE-VALUED LOGIC)SELECT *FROM employeesWHERE salary > (SELECT salary FROM employees WHERE employee_id = 99999);-- Returns ZERO rows! (salary > NULL) evaluates to UNKNOWN, not TRUE -- This query might unexpectedly return nothing if subquery returns NULLSELECT * FROM productsWHERE price > ( SELECT discount_threshold FROM discount_rules WHERE rule_name = 'nonexistent_rule' -- Returns NULL);-- All comparisons with NULL yield UNKNOWN → no rows returnedDefensive Techniques for NULL Handling:
1234567891011121314151617181920212223242526272829303132333435
-- 1. COALESCE: Provide default valueSELECT employee_name, salary - COALESCE( (SELECT AVG(salary) FROM employees WHERE department_id = 999), 0 -- Default to 0 if subquery returns NULL ) AS diff_from_avgFROM employees; -- 2. IFNULL/NVL (database-specific)-- MySQL:SELECT IFNULL((SELECT salary FROM employees WHERE employee_id = 99999), 0);-- Oracle:SELECT NVL((SELECT salary FROM employees WHERE employee_id = 99999), 0) FROM dual; -- 3. CASE expression with EXISTS checkSELECT employee_name, CASE WHEN EXISTS (SELECT 1 FROM departments WHERE department_id = 999) THEN (SELECT budget FROM departments WHERE department_id = 999) ELSE 0 END AS dept_budgetFROM employees; -- 4. Handle NULL in comparison explicitlySELECT *FROM employeesWHERE salary > COALESCE( (SELECT discount_threshold FROM discount_rules WHERE rule_name = 'summer'), 0 -- If rule doesn't exist, compare against 0);A query that returns no rows due to NULL comparison doesn't produce an error—it just silently returns an empty result set. This makes NULL-related bugs extremely hard to detect. Always ask: 'What happens if my subquery returns NULL?'
When a scalar subquery returns more than one row, databases raise a runtime error. This is one of the most common subquery mistakes.
Error Messages by Database:
| Database | Error Message |
|---|---|
| MySQL | Subquery returns more than 1 row |
| PostgreSQL | more than one row returned by a subquery used as an expression |
| SQL Server | Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, >, <=, >= or when the subquery is used as an expression. |
| Oracle | ORA-01427: single-row subquery returns more than one row |
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748
-- This query WILL FAIL if there are multiple departmentsSELECT employee_nameFROM employeesWHERE department_id = ( SELECT department_id FROM departments WHERE location = 'New York' -- Multiple departments possible!);-- Error: Subquery returns more than 1 row -- SOLUTIONS: -- Solution 1: Use IN instead of = (allow multiple matches)SELECT employee_nameFROM employeesWHERE department_id IN ( SELECT department_id FROM departments WHERE location = 'New York'); -- Solution 2: Use LIMIT 1 if you need exactly one (arbitrary choice)SELECT employee_nameFROM employeesWHERE department_id = ( SELECT department_id FROM departments WHERE location = 'New York' ORDER BY department_name -- Deterministic order LIMIT 1); -- Solution 3: Use aggregate to reduce to scalarSELECT employee_nameFROM employeesWHERE department_id = ( SELECT MIN(department_id) -- Reduce to one value FROM departments WHERE location = 'New York'); -- Solution 4: Redesign query using EXISTSSELECT employee_nameFROM employees eWHERE EXISTS ( SELECT 1 FROM departments d WHERE d.department_id = e.department_id AND d.location = 'New York');Before using a subquery as scalar, run it standalone and check the row count. Better yet: design your subquery with explicit guarantees (aggregate function, unique key filter, or LIMIT 1) rather than hoping your data happens to produce one row.
One of the most powerful uses of scalar subqueries is adding computed columns to query results. This allows you to enrich each row with related data or calculated values.
Use Case 1: Adding Aggregate Context
12345678910111213141516171819
-- Add company-wide statistics to each employee rowSELECT employee_name, salary, (SELECT AVG(salary) FROM employees) AS company_avg, (SELECT MAX(salary) FROM employees) AS company_max, (SELECT MIN(salary) FROM employees) AS company_min, ROUND( salary * 100.0 / (SELECT SUM(salary) FROM employees), 2 ) AS pct_of_total_payrollFROM employeesORDER BY salary DESC; -- Output example:-- | employee_name | salary | company_avg | company_max | company_min | pct_of_total |-- |---------------|--------|-------------|-------------|-------------|--------------|-- | Sarah CEO | 250000 | 75000 | 250000 | 35000 | 3.33% |-- | John VP | 180000 | 75000 | 250000 | 35000 | 2.40% |Use Case 2: Correlated Scalar for Per-Row Related Data
123456789101112131415161718
-- Add department-specific information to each employeeSELECT e.employee_name, e.salary, e.department_id, (SELECT d.department_name FROM departments d WHERE d.department_id = e.department_id) AS dept_name, (SELECT COUNT(*) FROM employees e2 WHERE e2.department_id = e.department_id) AS dept_size, (SELECT AVG(salary) FROM employees e3 WHERE e3.department_id = e.department_id) AS dept_avg_salaryFROM employees e; -- Each scalar subquery is correlated: references e.department_id-- Conceptually executes once per employee rowUse Case 3: Conditional Scalar Selection
1234567891011121314151617
-- Different calculations based on employee typeSELECT employee_name, employee_type, base_salary, CASE employee_type WHEN 'HOURLY' THEN (SELECT hourly_rate * 2080 FROM hourly_rates WHERE employee_id = e.employee_id) WHEN 'SALARIED' THEN base_salary WHEN 'CONTRACTOR' THEN (SELECT contract_amount FROM contracts WHERE contractor_id = e.employee_id ORDER BY start_date DESC LIMIT 1) END AS annual_compensationFROM employees e;Multiple correlated scalar subqueries in the SELECT list can cause N+1 performance issues. For 1000 employees with 3 scalar subqueries each, you might execute 3001 queries conceptually. Modern optimizers often transform this to joins, but large tables warrant using EXPLAIN to verify. Consider JOINs with window functions for complex per-row calculations.
The WHERE clause is where scalar subqueries shine brightest—enabling comparisons against dynamically computed values.
Comparison Operators with Scalar Subqueries:
All standard comparison operators work with scalar subqueries:
12345678910111213141516171819202122
-- Equal to scalarSELECT * FROM employeesWHERE department_id = (SELECT department_id FROM employees WHERE email = 'manager@company.com'); -- Greater than scalarSELECT * FROM productsWHERE price > (SELECT AVG(price) FROM products); -- Less than or equal to scalar SELECT * FROM ordersWHERE order_date <= (SELECT MAX(order_date) FROM orders WHERE status = 'completed'); -- Not equal to scalarSELECT * FROM departmentsWHERE department_id <> (SELECT department_id FROM employees WHERE employee_id = 1); -- BETWEEN using two scalarsSELECT * FROM productsWHERE price BETWEEN (SELECT AVG(price) * 0.8 FROM products) AND (SELECT AVG(price) * 1.2 FROM products);Combining Scalar Subqueries with Logical Operators:
123456789101112131415
-- Multiple conditions with scalar subqueriesSELECT employee_name, salary, department_idFROM employeesWHERE salary > (SELECT AVG(salary) FROM employees) AND department_id = ( SELECT department_id FROM departments WHERE department_name = 'Engineering' ) AND hire_date < (SELECT MAX(hire_date) FROM employees WHERE is_manager = TRUE); -- OR conditionsSELECT product_name, price, category_idFROM productsWHERE price = (SELECT MAX(price) FROM products WHERE category_id = 1) OR price = (SELECT MAX(price) FROM products WHERE category_id = 2) OR price = (SELECT MAX(price) FROM products WHERE category_id = 3);Arithmetic with Scalar Subqueries:
1234567891011121314151617
-- Salary within 10% of company averageSELECT employee_name, salaryFROM employeesWHERE ABS(salary - (SELECT AVG(salary) FROM employees)) <= (SELECT AVG(salary) FROM employees) * 0.10; -- Products priced at exactly 20% markup over costSELECT product_name, price, costFROM productsWHERE price = cost * (1 + (SELECT markup_rate FROM pricing_config WHERE config_name = 'standard')); -- Orders from the last N days (N is stored in a config table)SELECT * FROM ordersWHERE order_date >= DATE_SUB( CURRENT_DATE, INTERVAL (SELECT value FROM settings WHERE key = 'recent_days_threshold') DAY);Scalar subqueries aren't limited to SELECT—they're equally powerful in data modification statements. This enables dynamic updates based on computed values.
UPDATE with Scalar Subqueries:
123456789101112131415161718192021222324252627282930
-- Set all salaries below average to the averageUPDATE employeesSET salary = (SELECT AVG(salary) FROM employees)WHERE salary < (SELECT AVG(salary) FROM employees); -- Note: Some databases (like MySQL) have restrictions on -- updating a table while selecting from it in a subquery.-- Workaround: use a derived table or CTE -- MySQL-safe version using derived table:UPDATE employeesSET salary = (SELECT avg_sal FROM (SELECT AVG(salary) AS avg_sal FROM employees) AS temp)WHERE salary < (SELECT avg_sal FROM (SELECT AVG(salary) AS avg_sal FROM employees) AS temp); -- Update with correlated scalar subquery-- Set each employee's bonus based on their department's performanceUPDATE employees eSET bonus = salary * ( SELECT bonus_multiplier FROM department_performance dp WHERE dp.department_id = e.department_id AND dp.year = 2024); -- Update using scalar from different tableUPDATE productsSET featured = TRUEWHERE price = (SELECT MAX(price) FROM products);DELETE with Scalar Subqueries:
123456789101112131415161718
-- Delete low-value orders (below average)DELETE FROM ordersWHERE total_amount < (SELECT AVG(total_amount) FROM orders) / 2; -- Delete records older than the retention period (stored in config)DELETE FROM logsWHERE created_at < DATE_SUB( CURRENT_DATE, INTERVAL (SELECT retention_days FROM system_config WHERE config_name = 'log_retention') DAY); -- Delete duplicate products keeping only the one with lowest IDDELETE FROM products p1WHERE p1.product_id > ( SELECT MIN(p2.product_id) FROM products p2 WHERE p2.product_name = p1.product_name);Some databases (especially MySQL) restrict modifying a table while selecting from it in a subquery. The error 'You can't specify target table for update in FROM clause' indicates this. Solutions include using derived tables, CTEs (WITH clause), or temporary tables to cache the subquery result first.
Understanding scalar subquery performance is essential for writing efficient SQL. The key question is: how often does the subquery execute?
Non-Correlated Scalar Subqueries:
These execute once regardless of outer query size. Modern databases cache the result, making them efficient even when referenced multiple times.
12345678910
-- This subquery executes ONCE, result is cached and reusedSELECT employee_name, salary, salary - (SELECT AVG(salary) FROM employees) AS diff_from_avgFROM employees; -- Even though (SELECT AVG(salary) FROM employees) appears conceptually-- for each of 10,000 employees, it only executes once.-- Execution: O(n) for outer query + O(m) for subquery = O(n + m)Correlated Scalar Subqueries:
These can be expensive because they conceptually execute once per outer row. However, optimizers often transform them.
1234567891011121314151617
-- Correlated: conceptually executes per outer rowSELECT e.employee_name, e.salary, (SELECT AVG(salary) FROM employees e2 WHERE e2.department_id = e.department_id) AS dept_avgFROM employees e; -- Naive execution: O(n * m) where n = employees, m = department size-- Optimized execution: query optimizer may transform to JOIN + window function: -- Equivalent optimized query (what optimizer might produce):SELECT e.employee_name, e.salary, AVG(e.salary) OVER (PARTITION BY e.department_id) AS dept_avgFROM employees e;-- Execution: O(n log n) for windowed aggregate| Subquery Type | Execution Frequency | Optimization Potential | Recommendation |
|---|---|---|---|
| Non-correlated (simple) | Once | Always cached | Use freely; very efficient |
| Non-correlated (in SELECT, multiple) | Once each | Cached independently | Efficient; consider combining if complex |
| Correlated (in WHERE) | Per outer row (conceptual) | Often to semi-join | Check EXPLAIN; usually optimized |
| Correlated (in SELECT) | Per outer row (conceptual) | Sometimes to window | Watch for N+1; prefer JOIN if slow |
| Multiple correlated in SELECT | Each per outer row | Limited | Consider restructuring to JOIN |
Always use EXPLAIN or EXPLAIN ANALYZE on queries with scalar subqueries, especially correlated ones. The execution plan reveals whether the optimizer transformed your subquery efficiently or is executing it row-by-row. If you see 'DEPENDENT SUBQUERY' or 'Subplan' executed many times, consider restructuring.
Scalar subqueries are the workhorse of SQL's compositional power. Let's consolidate the essential knowledge:
What's Next:
Scalar subqueries return one value. But what if you need to match multiple columns simultaneously—like finding employees with the same (department, job_title) as another specific employee? The next page explores row subqueries, which return a complete row for tuple-based comparisons.
You now have comprehensive mastery of scalar subqueries—their syntax, placement, guarantees, error handling, and performance characteristics. These skills form the foundation for the more complex subquery types ahead.