Subqueries - Learning Module

Row Subqueries

When Single Values Aren't Enough

Consider this query challenge: "Find all employees who work in the same department AND have the same job title as employee John Smith."

With scalar subqueries, you'd need two separate conditions:

WHERE department_id = (SELECT department_id FROM employees WHERE employee_name = 'John Smith')
  AND job_title = (SELECT job_title FROM employees WHERE employee_name = 'John Smith')

This works, but it's verbose and inefficient—you're querying John Smith's record twice. What if you could fetch both values in one subquery and compare them as a unit?

This is precisely what row subqueries enable. A row subquery returns one row with multiple columns, allowing you to compare tuples (ordered sets of values) directly. It's a powerful pattern for matching composite identities, replicating records, and expressing complex relationships concisely.

What You Will Learn

By the end of this page, you will understand row subqueries—how they differ from scalar subqueries, where they can be used, the syntax for row constructors and comparisons, database support variations, and practical patterns for multi-column matching.

Defining Row Subqueries

A row subquery is a subquery that returns exactly one row containing multiple columns. The result is a composite value—a tuple—that can be compared against another tuple of the same structure.

Formal Definition:

A row subquery is a SELECT statement that, upon execution, yields a result containing at most one row with two or more columns. The columns form a tuple that can be used in row comparisons. If the query returns zero rows, the comparison evaluates to NULL/UNKNOWN. If it returns more than one row, a runtime error occurs.
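The zero-row case in the definition is easy to check. The following is a minimal sketch, assuming Python's built-in sqlite3 module (SQLite supports row values since version 3.15) and a small hypothetical employees table; the names and data are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        employee_name TEXT,
        department_id INTEGER,
        job_title     TEXT
    );
    INSERT INTO employees VALUES
        (1, 'John Smith', 10, 'Engineer'),
        (2, 'Ana Lopez',  10, 'Engineer');
""")

# The subquery filters on a name that does not exist, so it yields zero rows.
# The row comparison is then not TRUE for any outer row, and nothing is returned.
no_match = conn.execute("""
    SELECT employee_name
    FROM employees
    WHERE (department_id, job_title) = (
        SELECT department_id, job_title
        FROM employees
        WHERE employee_name = 'Nobody Here'
    )
""").fetchall()

print(no_match)  # []
```

An empty result here is not an error. It only means the reference row was not found, which is worth checking separately if the caller expects it to exist.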

Terminology Note:

In relational theory, a tuple is an ordered collection of attribute values—essentially a row. Row subqueries produce tuples, and row comparisons match tuples against tuples.

row_subquery_definition.sql
-- Row subquery: returns ONE row with MULTIPLE columns
-- This subquery returns (department_id, job_title) for John Smith
 
SELECT employee_name, department_id, job_title
FROM employees
WHERE (department_id, job_title) = (
    SELECT department_id, job_title 
    FROM employees 
    WHERE employee_name = 'John Smith'
);
 
-- Breaking it down:
-- 1. The subquery returns one row: (10, 'Engineer')
-- 2. The outer WHERE compares each employee's (dept_id, job_title) tuple
--    against the subquery's tuple
-- 3. Only employees with BOTH values matching are returned
 
-- This is equivalent to:
-- WHERE department_id = 10 AND job_title = 'Engineer'
-- But derived dynamically from John Smith's record

Row Subquery vs. Multiple Scalar Subqueries

A row subquery retrieves multiple columns in a single subquery, while each scalar subquery is written separately and may be executed separately. When the columns come from the same row, the row form is semantically clearer and often more efficient, and it guarantees that the values are drawn from a single, consistent record.
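To see the two forms side by side, here is a small runnable sketch. It assumes Python's sqlite3 module and a hypothetical four-row employees table; only the SQL shape comes from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        employee_name TEXT,
        department_id INTEGER,
        job_title     TEXT
    );
    INSERT INTO employees VALUES
        (1, 'John Smith', 10, 'Engineer'),
        (2, 'Ana Lopez',  10, 'Engineer'),
        (3, 'Raj Patel',  10, 'Analyst'),
        (4, 'Mei Chen',   20, 'Engineer');
""")

# Row subquery: one lookup of John Smith's record, compared as a tuple.
row_form = conn.execute("""
    SELECT employee_name
    FROM employees
    WHERE (department_id, job_title) = (
        SELECT department_id, job_title
        FROM employees
        WHERE employee_name = 'John Smith'
    )
    ORDER BY employee_id
""").fetchall()

# Scalar form: the same record is looked up once per column.
scalar_form = conn.execute("""
    SELECT employee_name
    FROM employees
    WHERE department_id = (SELECT department_id FROM employees
                           WHERE employee_name = 'John Smith')
      AND job_title = (SELECT job_title FROM employees
                       WHERE employee_name = 'John Smith')
    ORDER BY employee_id
""").fetchall()

print(row_form)                 # [('John Smith',), ('Ana Lopez',)]
print(row_form == scalar_form)  # True
```

Both queries return the same employees. The row form simply states the intent once.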

Row Constructors (Tuple Syntax)

To compare a row subquery result, you need to express the outer query's values as a row constructor (also called a row value expression or tuple expression).

Syntax:

A row constructor groups multiple values into a tuple using parentheses:

(column1, column2, column3)  -- Three-element tuple

For explicit clarity, some databases support the ROW keyword:

ROW(column1, column2, column3)  -- Explicit row constructor

Note: The ROW keyword is optional in MySQL and PostgreSQL when the tuple has two or more elements. Oracle and SQLite accept only the parenthesized form, and SQL Server doesn't support standard row comparison syntax at all.

row_constructor_syntax.sql
-- Implicit row constructor (most common)
SELECT * FROM employees
WHERE (department_id, job_title) = (SELECT department_id, job_title FROM employees WHERE id = 1);
 
-- Explicit ROW keyword (PostgreSQL, MySQL)
SELECT * FROM employees
WHERE ROW(department_id, job_title) = (SELECT department_id, job_title FROM employees WHERE id = 1);
 
-- Row constructor with literals
SELECT * FROM employees
WHERE (department_id, job_title) = (10, 'Engineer');
 
-- Row constructor in SELECT (creating composite values - advanced usage)
-- PostgreSQL returns a composite value; MySQL rejects a multi-column operand here
SELECT employee_name, (department_id, job_title) AS dept_job_tuple
FROM employees;
 
 
-- Row constructor with expressions
-- (YEAR() is MySQL syntax; PostgreSQL uses EXTRACT(YEAR FROM created_date))
SELECT * FROM products
WHERE (category_id, YEAR(created_date)) = (
    SELECT category_id, YEAR(created_date) 
    FROM products 
    WHERE product_id = 100
);

Row Constructor Support by Database
Database	Implicit Tuple	ROW Keyword	Row Comparison Support
MySQL	✅ Supported	✅ Optional	Full support (=, <>, <, >, <=, >=)
PostgreSQL	✅ Supported	✅ Optional	Full support with rich operators
Oracle	✅ Supported	❌ Not used	Limited (= and <> only; no ordering comparisons)
SQL Server	❌ Not supported	❌ Not supported	No direct row comparison; use AND conditions
SQLite	✅ Supported	❌ Not used	Full support since version 3.15 (=, <>, <, >, <=, >=, IN)

Row Comparison Operators

Row comparisons extend scalar comparison operators to tuples. Understanding how these operators work with multiple values is essential for correct query semantics.

Equality (=) Comparison:

Two tuples are equal if and only if ALL corresponding elements are equal:

(a1, a2, a3) = (b1, b2, b3)
↔  a1 = b1 AND a2 = b2 AND a3 = b3

row_equality.sql
-- Row equality: all columns must match
SELECT * FROM employees
WHERE (department_id, job_title, location_id) = (
    SELECT department_id, job_title, location_id
    FROM employees
    WHERE employee_id = 101
);
 
-- Equivalent expanded form:
SELECT * FROM employees
WHERE department_id = (SELECT department_id FROM employees WHERE employee_id = 101)
  AND job_title = (SELECT job_title FROM employees WHERE employee_id = 101)
  AND location_id = (SELECT location_id FROM employees WHERE employee_id = 101);
 
-- The row form is preferred: cleaner, executes one subquery instead of three

Inequality (<>) Comparison:

Two tuples are unequal if ANY corresponding element differs:

(a1, a2) <> (b1, b2)
↔  a1 <> b1 OR a2 <> b2

row_inequality.sql
-- Row inequality: at least one column must differ
SELECT * FROM employees e
WHERE (e.department_id, e.job_title) <> (
    SELECT department_id, job_title
    FROM employees
    WHERE employee_id = 101
);
 
-- Returns all employees who differ in department OR job title OR both
-- from employee 101

Ordering Comparisons (<, >, <=, >=):

Row ordering uses lexicographic (dictionary) order—columns are compared left to right, with earlier columns taking precedence:

(a1, a2) < (b1, b2)
↔  (a1 < b1) OR (a1 = b1 AND a2 < b2)

This is like alphabetical sorting: 'AA' < 'AB' < 'B'. The first column is most significant.

row_ordering.sql
-- Row ordering: lexicographic comparison
-- (10, 'A') < (10, 'B') → TRUE (first elements equal, second compared)
-- (10, 'Z') < (20, 'A') → TRUE (first element determines order)
 
SELECT * FROM employees
WHERE (department_id, hire_date) < (10, '2020-01-01');
 
-- Returns:
-- 1. All employees in departments < 10 (ANY hire date)
-- 2. All employees in department 10 hired BEFORE 2020-01-01
 
-- Useful for composite ordering like pagination:
SELECT * FROM orders
WHERE (order_date, order_id) > ('2024-01-01', 1000)
ORDER BY order_date, order_id
LIMIT 100;
-- This efficiently fetches the next page after the given cursor
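The lexicographic rule can be verified by running the row form and its expanded OR/AND form against the same data. This is a sketch under assumptions: Python's sqlite3 module, a hypothetical employees table, and hire dates stored as ISO-8601 text so that string order matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        department_id INTEGER,
        hire_date     TEXT
    )
""")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, 5,  "2021-06-01"),  # department < 10: included whatever the hire date
    (2, 10, "2019-03-15"),  # department 10, hired before the cutoff: included
    (3, 10, "2020-01-01"),  # equal to the boundary tuple: excluded by strict <
    (4, 10, "2022-09-30"),  # department 10, hired after the cutoff: excluded
    (5, 20, "2018-01-01"),  # department > 10: excluded whatever the hire date
])

row_form = [r[0] for r in conn.execute("""
    SELECT employee_id FROM employees
    WHERE (department_id, hire_date) < (10, '2020-01-01')
    ORDER BY employee_id
""")]

expanded_form = [r[0] for r in conn.execute("""
    SELECT employee_id FROM employees
    WHERE department_id < 10
       OR (department_id = 10 AND hire_date < '2020-01-01')
    ORDER BY employee_id
""")]

print(row_form)                   # [1, 2]
print(row_form == expanded_form)  # True
```

Employee 5 shows why the first column dominates: an early hire date cannot compensate for a larger department_id.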

Lexicographic Order and Data Types

Lexicographic comparison requires all corresponding elements to be comparable. Comparing mismatched types, such as (INT, VARCHAR) against (INT, DATE), raises an error in strictly typed databases such as PostgreSQL and triggers implicit conversion in more permissive ones such as MySQL. Ensure tuple columns have matching or compatible types.

NULL Handling in Row Comparisons

NULL values introduce complexity in row comparisons, following SQL's three-valued logic. The behavior can be counterintuitive.

Equality with NULL:

If a corresponding element involves NULL, the comparison yields UNKNOWN unless another pair of elements already makes it FALSE:

(10, NULL) = (10, 'A')     → UNKNOWN (NULL = 'A' is unknown)
(10, NULL) = (10, NULL)    → UNKNOWN (NULL = NULL is unknown)
(NULL, 'A') = (10, 'A')    → UNKNOWN (NULL = 10 is unknown)
(20, NULL) = (10, 'A')     → FALSE (20 = 10 is false, so the AND is false)

row_null_behavior.sql
-- Create test data
CREATE TABLE test_rows (
    id INT,
    col1 INT,
    col2 VARCHAR(10)
);
 
INSERT INTO test_rows VALUES (1, 10, 'A');
INSERT INTO test_rows VALUES (2, 10, NULL);
INSERT INTO test_rows VALUES (3, NULL, 'A');
INSERT INTO test_rows VALUES (4, NULL, NULL);
 
-- Query: Find rows matching (10, 'A')
SELECT * FROM test_rows WHERE (col1, col2) = (10, 'A');
-- Returns: only row 1
-- Rows 2, 3, 4 involve NULL → comparison is UNKNOWN → not returned
 
-- Query: Find rows NOT matching (10, 'A')  
SELECT * FROM test_rows WHERE (col1, col2) <> (10, 'A');
-- Returns: ONLY rows where we're CERTAIN they differ
-- Row 2: (10, NULL) <> (10, 'A') → 10=10 but NULL<>'A' is unknown → UNKNOWN
-- Row 3: (NULL, 'A') <> (10, 'A') → NULL<>10 is unknown → UNKNOWN
-- Row 4: (NULL, NULL) <> (10, 'A') → UNKNOWN
-- No rows returned! (surprising but correct per SQL semantics)
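The same test data can be run end to end. The sketch below assumes Python's sqlite3 module; the helper ids() is a hypothetical convenience, and the third query wraps the comparison in IS NULL to list the rows whose result is UNKNOWN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_rows (id INT, col1 INT, col2 VARCHAR(10));
    INSERT INTO test_rows VALUES
        (1, 10,   'A'),
        (2, 10,   NULL),
        (3, NULL, 'A'),
        (4, NULL, NULL);
""")

def ids(where_clause):
    """Return the ids of test_rows that satisfy the given WHERE clause."""
    sql = f"SELECT id FROM test_rows WHERE {where_clause} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

equal_ids = ids("(col1, col2) = (10, 'A')")
not_equal_ids = ids("(col1, col2) <> (10, 'A')")
# Rows for which the equality test is neither TRUE nor FALSE.
unknown_ids = ids("((col1, col2) = (10, 'A')) IS NULL")

print(equal_ids)      # [1]
print(not_equal_ids)  # []
print(unknown_ids)    # [2, 3, 4]
```

Row 1 is the only definite match, and no row is a definite mismatch, so = and <> together account for just one of the four rows.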

IS DISTINCT FROM (NULL-safe comparison):

Some databases offer NULL-safe comparison operators that treat NULLs as regular values:

row_null_safe.sql
-- PostgreSQL: IS DISTINCT FROM (NULL-safe inequality)
SELECT * FROM test_rows
WHERE (col1, col2) IS DISTINCT FROM (10, 'A');
-- Returns rows 2, 3, 4 (treats NULL as a regular value)
 
-- PostgreSQL: IS NOT DISTINCT FROM (NULL-safe equality)
SELECT * FROM test_rows
WHERE (col1, col2) IS NOT DISTINCT FROM (10, NULL);
-- Returns row 2 (NULL matches NULL)
 
 
-- MySQL: NULL-safe equality operator <=>
SELECT * FROM test_rows
WHERE (col1, col2) <=> (10, NULL);
-- MySQL evaluates this as (col1 <=> 10) AND (col2 <=> NULL), so it returns row 2
 
 
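-- Workaround for databases without tuple NULL-safe operators
-- (with literals the IS NULL branches are always false; the pattern
-- matters when 10 and 'A' are replaced by parameters or columns):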
SELECT * FROM test_rows
WHERE (col1 = 10 OR (col1 IS NULL AND 10 IS NULL))
  AND (col2 = 'A' OR (col2 IS NULL AND 'A' IS NULL));
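As a runnable illustration of NULL-safe matching, the sketch below uses Python's sqlite3 module. SQLite spells its NULL-safe operators IS and IS NOT and accepts them on row values, so they stand in here for IS NOT DISTINCT FROM and IS DISTINCT FROM. The helper expanded_match() is hypothetical and shows the expanded workaround with bound parameters, where the IS NULL branches actually matter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_rows (id INT, col1 INT, col2 VARCHAR(10));
    INSERT INTO test_rows VALUES
        (1, 10,   'A'),
        (2, 10,   NULL),
        (3, NULL, 'A'),
        (4, NULL, NULL);
""")

# NULL-safe inequality on row values (SQLite: IS NOT).
distinct_ids = [r[0] for r in conn.execute(
    "SELECT id FROM test_rows WHERE (col1, col2) IS NOT (10, 'A') ORDER BY id")]

# NULL-safe equality on row values (SQLite: IS).
same_ids = [r[0] for r in conn.execute(
    "SELECT id FROM test_rows WHERE (col1, col2) IS (10, NULL) ORDER BY id")]

def expanded_match(v1, v2):
    """Portable expansion: a pair matches if equal, or if both sides are NULL."""
    sql = """
        SELECT id FROM test_rows
        WHERE (col1 = :v1 OR (col1 IS NULL AND :v1 IS NULL))
          AND (col2 = :v2 OR (col2 IS NULL AND :v2 IS NULL))
        ORDER BY id
    """
    return [r[0] for r in conn.execute(sql, {"v1": v1, "v2": v2})]

expanded_ids = expanded_match(10, None)

print(distinct_ids)  # [2, 3, 4]
print(same_ids)      # [2]
print(expanded_ids)  # [2]
```

The expanded form is longer, but it relies only on = and IS NULL, which is why it is the usual fallback when no NULL-safe tuple operator is available.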

Row Comparison and NULL Pitfalls

When any tuple element can be NULL, row comparisons may silently exclude rows you expect to match or include. Either filter out NULLs before comparison, use NULL-safe operators where available, or expand to explicit AND/OR conditions with IS NULL checks.

Ensuring Single-Row Results

Like scalar subqueries, row subqueries must return at most one row. If multiple rows are returned, a runtime error occurs.

Techniques for Guaranteeing Single Row:

1. Filter by Primary/Unique Key:

single_row_pk.sql
-- Guaranteed single row: primary key filter
SELECT * FROM employees
WHERE (department_id, job_title) = (
    SELECT department_id, job_title
    FROM employees
    WHERE employee_id = 101  -- PK: exactly 0 or 1 row
);
 
-- Guaranteed single row: unique constraint filter
SELECT * FROM products  
WHERE (category_id, brand_id) = (
    SELECT category_id, brand_id
    FROM products
    WHERE sku = 'PROD-001'  -- SKU is unique
);

2. Aggregate Functions to Collapse:

single_row_aggregate.sql
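-- An aggregate without GROUP BY always produces exactly one row.
-- Here the aggregate sits in the inner scalar subquery; ties on budget
-- are broken with ORDER BY so that LIMIT 1 is deterministic.
-- Find employees matching the department with highest budget
SELECT * FROM employees
WHERE (department_id, location_id) = (
    SELECT department_id, location_id
    FROM departments
    WHERE budget = (SELECT MAX(budget) FROM departments)
    ORDER BY department_id
    LIMIT 1  -- In case of ties, pick the lowest department_id
);
 
-- Using MIN/MAX to collapse the result to one row
-- Caution: each aggregate is computed independently, so the resulting
-- (department_id, job_title) pair may not belong to any single employee
SELECT * FROM employees
WHERE (department_id, job_title) = (
    SELECT MIN(department_id), MIN(job_title)  -- Aggregate → one row
    FROM employees
    WHERE hire_date = (SELECT MIN(hire_date) FROM employees)
);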
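One caveat with collapsing through several aggregates is that each aggregate is computed on its own, so the combined tuple need not exist in the table. The sketch below demonstrates this with Python's sqlite3 module and a hypothetical three-row employees table in which two people share the earliest hire date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        department_id INTEGER,
        job_title     TEXT,
        hire_date     TEXT
    );
    INSERT INTO employees VALUES
        (1, 10, 'Engineer', '2015-01-01'),
        (2, 20, 'Analyst',  '2015-01-01'),
        (3, 10, 'Analyst',  '2018-05-01');
""")

earliest = "hire_date = (SELECT MIN(hire_date) FROM employees)"

# Independent aggregates: MIN(department_id) and MIN(job_title) are
# taken from different rows of the tied group.
aggregated = conn.execute(
    f"SELECT MIN(department_id), MIN(job_title) FROM employees WHERE {earliest}"
).fetchone()

# The tuples that really exist among the earliest hires.
real_pairs = set(conn.execute(
    f"SELECT department_id, job_title FROM employees WHERE {earliest}"
).fetchall())

# ORDER BY plus LIMIT 1 picks one real row deterministically.
ordered_pick = conn.execute(
    f"SELECT department_id, job_title FROM employees WHERE {earliest} "
    "ORDER BY employee_id LIMIT 1"
).fetchone()

print(aggregated)                # (10, 'Analyst')
print(aggregated in real_pairs)  # False
print(ordered_pick)              # (10, 'Engineer')
```

When the tuple must describe an actual record, ordering by a unique column and taking one row is the safer way to break ties.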

3. LIMIT 1 with ORDER BY:

single_row_limit.sql
-- Explicit LIMIT 1 with deterministic ordering
SELECT * FROM employees
WHERE (department_id, job_title) = (
    SELECT department_id, job_title
    FROM employees
    WHERE salary > 100000
    ORDER BY hire_date ASC  -- Earliest high earner
    LIMIT 1
);
 
-- Top performer's attributes
SELECT * FROM employees
WHERE (department_id, job_title, office_id) = (
    SELECT department_id, job_title, office_id
    FROM employees
    ORDER BY performance_score DESC
    FETCH FIRST 1 ROW ONLY  -- Standard SQL; MySQL uses LIMIT 1 instead
);

Common Row Subquery Error

If your row subquery returns multiple rows, you'll see errors like 'Subquery returns more than 1 row' (MySQL) or 'more than one row returned by a subquery used as an expression' (PostgreSQL). Before deployment, test your subquery standalone to verify it never exceeds one row for any possible data state.
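Testing a subquery standalone can be as simple as counting its rows. Here is a minimal sketch, assuming Python's sqlite3 module; subquery_row_count() is a hypothetical helper, and the table deliberately contains two people with the same name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id   INTEGER PRIMARY KEY,
        employee_name TEXT,
        department_id INTEGER,
        job_title     TEXT
    );
    -- Two different people share a name, so a name filter is not unique.
    INSERT INTO employees VALUES
        (101, 'John Smith', 10, 'Engineer'),
        (102, 'John Smith', 20, 'Analyst');
""")

def subquery_row_count(subquery_sql, params=()):
    """Run a candidate row subquery standalone and count the rows it yields."""
    count_sql = f"SELECT COUNT(*) FROM ({subquery_sql})"
    return conn.execute(count_sql, params).fetchone()[0]

by_name = subquery_row_count(
    "SELECT department_id, job_title FROM employees WHERE employee_name = ?",
    ("John Smith",),
)
by_key = subquery_row_count(
    "SELECT department_id, job_title FROM employees WHERE employee_id = ?",
    (101,),
)

print(by_name, by_key)  # 2 1
```

The name filter would trigger the multi-row error in MySQL or PostgreSQL, while the primary-key filter cannot. SQLite's documentation describes a scalar subquery as taking the first row of its result rather than raising an error, so a count check like this one is the more reliable test there.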

Practical Use Cases

Row subqueries excel in scenarios involving composite identity matching, record copying, and multi-attribute relationships.

Use Case 1: Finding Related Records by Composite Key

usecase_composite_key.sql
-- Find all orders with the same (customer_id, shipping_address_id) as order #12345
SELECT order_id, order_date, total_amount
FROM orders
WHERE (customer_id, shipping_address_id) = (
    SELECT customer_id, shipping_address_id
    FROM orders
    WHERE order_id = 12345
)
AND order_id <> 12345;  -- Exclude the reference order itself
 
-- This finds orders shipped to the same customer/address combination
-- Useful for detecting patterns or grouping related orders

Use Case 2: Matching Configuration Tuples

usecase_config_match.sql
-- Find products with identical configuration to a reference product
SELECT product_name, sku
FROM products
WHERE (category_id, brand_id, size_id, color_id) = (
    SELECT category_id, brand_id, size_id, color_id
    FROM products
    WHERE sku = 'REF-PRODUCT-001'
);
 
-- Find users with matching (role, department, access_level)
SELECT user_name, email
FROM users
WHERE (role_id, department_id, access_level) = (
    SELECT role_id, department_id, access_level
    FROM users
    WHERE user_id = (SELECT manager_id FROM users WHERE user_id = 456)
);

Use Case 3: Pagination with Composite Cursor

usecase_pagination.sql
-- Cursor-based pagination using row comparison
-- Fetch next page of results after a known (timestamp, id) position
 
SELECT event_id, event_type, created_at, user_id
FROM events
WHERE (created_at, event_id) > ('2024-01-15 14:30:00', 50000)
ORDER BY created_at, event_id
LIMIT 50;
 
-- Row comparison handles the edge case where multiple events 
-- have the same timestamp by using event_id as tiebreaker
-- More efficient than OFFSET for deep pagination
 
-- Previous page (reverse direction)
SELECT * FROM (
    SELECT event_id, event_type, created_at, user_id
    FROM events
    WHERE (created_at, event_id) < ('2024-01-15 14:30:00', 50000)
    ORDER BY created_at DESC, event_id DESC
    LIMIT 50
) AS prev_page
ORDER BY created_at, event_id;
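A full forward pagination loop built on the row comparison might look like the sketch below. It assumes Python's sqlite3 module and a hypothetical events table in which several events share a timestamp; fetch_page() and the page size are illustrative choices, not part of the original queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_id   INTEGER PRIMARY KEY,
        created_at TEXT NOT NULL
    )
""")
# Ten events spread over three timestamps, so event_id must break ties.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, f"2024-01-15 14:{30 + i // 4:02d}:00") for i in range(1, 11)],
)

def fetch_page(cursor_pos, page_size=3):
    """Return the next page after cursor_pos, a (created_at, event_id) tuple."""
    if cursor_pos is None:  # first page: no lower bound yet
        sql = """SELECT created_at, event_id FROM events
                 ORDER BY created_at, event_id LIMIT ?"""
        return conn.execute(sql, (page_size,)).fetchall()
    sql = """SELECT created_at, event_id FROM events
             WHERE (created_at, event_id) > (?, ?)
             ORDER BY created_at, event_id LIMIT ?"""
    return conn.execute(sql, (*cursor_pos, page_size)).fetchall()

pages = []
cursor_pos = None
while True:
    page = fetch_page(cursor_pos)
    if not page:
        break
    pages.append(page)
    cursor_pos = page[-1]  # the last row of this page is the next cursor

all_rows = conn.execute(
    "SELECT created_at, event_id FROM events ORDER BY created_at, event_id"
).fetchall()
paged_rows = [row for page in pages for row in page]

print(len(pages))              # 4
print(paged_rows == all_rows)  # True
```

Because the cursor is the last row actually seen, no event is skipped or repeated even when a page boundary falls between events with the same timestamp.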

Use Case 4: Record Cloning/Comparison

usecase_record_clone.sql
-- Find exact duplicate records (all columns match)
SELECT a.*
FROM products a
WHERE EXISTS (
    SELECT 1 FROM products b
    WHERE b.product_id <> a.product_id
      AND (b.name, b.category_id, b.brand_id, b.price) = 
          (a.name, a.category_id, a.brand_id, a.price)
);
 
-- Audit: find changes from previous version
-- (aliases cur/prev avoid CURRENT, which is a reserved word in standard SQL)
SELECT cur.*
FROM employee_history cur
JOIN employee_history prev 
  ON cur.employee_id = prev.employee_id
 AND cur.version = prev.version + 1
WHERE (cur.salary, cur.department_id, cur.job_title) <>
      (prev.salary, prev.department_id, prev.job_title);

Row Subqueries with IN Operator

While a row subquery used with = must return at most one row, IN allows matching against a set of rows. This combines the power of row comparison with set membership testing.

Syntax:

WHERE (col1, col2) IN (SELECT colA, colB FROM table WHERE ...)

This returns TRUE if the tuple (col1, col2) matches ANY row in the subquery result.

row_in_operator.sql
-- Find employees in any of the high-performance department/role combinations
SELECT employee_name, salary
FROM employees
WHERE (department_id, job_title) IN (
    SELECT department_id, job_title
    FROM high_performer_profiles
    WHERE avg_rating > 4.5
);
 
-- Find products that match any (category, brand) combination of bestsellers
SELECT product_name, price
FROM products
WHERE (category_id, brand_id) IN (
    SELECT category_id, brand_id
    FROM products
    WHERE units_sold > 10000
);
 
-- NOT IN with row tuples: find orphan combinations
SELECT department_id, job_title
FROM salary_grades
WHERE (department_id, job_title) NOT IN (
    SELECT DISTINCT department_id, job_title
    FROM employees
    WHERE department_id IS NOT NULL 
      AND job_title IS NOT NULL  -- Crucial for NOT IN!
);

NULL in Row IN/NOT IN

The NULL trap with NOT IN also applies to row comparisons. If a subquery tuple contains a NULL, NOT IN evaluates to UNKNOWN for every outer row that matches that tuple on its non-NULL elements, so those rows silently drop out of the result. Either filter NULLs in both the tuple and subquery, or use NOT EXISTS, which handles NULLs predictably.

row_in_vs_exists.sql
-- NOT IN with potential NULLs: DANGEROUS
SELECT * FROM orders
WHERE (customer_id, product_id) NOT IN (
    SELECT customer_id, product_id FROM returns
    -- If any customer_id or product_id is NULL, this fails silently
);
 
-- NOT EXISTS equivalent: NULL-SAFE
SELECT * FROM orders o
WHERE NOT EXISTS (
    SELECT 1 FROM returns r
    WHERE r.customer_id = o.customer_id
      AND r.product_id = o.product_id
);
-- Returns correct results even with NULL values
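The difference between the two forms shows up as soon as the returns table contains a NULL. The sketch below assumes Python's sqlite3 module and hypothetical two-column orders and returns tables; the data is chosen so that one order partially matches a tuple containing NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders  (customer_id INTEGER, product_id INTEGER);
    CREATE TABLE returns (customer_id INTEGER, product_id INTEGER);
    INSERT INTO orders  VALUES (1, 100), (2, 200), (3, 300);
    -- The second return has an unknown customer.
    INSERT INTO returns VALUES (1, 100), (NULL, 200);
""")

not_in_rows = conn.execute("""
    SELECT customer_id, product_id FROM orders
    WHERE (customer_id, product_id) NOT IN (
        SELECT customer_id, product_id FROM returns
    )
    ORDER BY customer_id
""").fetchall()

not_exists_rows = conn.execute("""
    SELECT o.customer_id, o.product_id FROM orders o
    WHERE NOT EXISTS (
        SELECT 1 FROM returns r
        WHERE r.customer_id = o.customer_id
          AND r.product_id = o.product_id
    )
    ORDER BY o.customer_id
""").fetchall()

print((2, 200) in not_in_rows)  # False: the comparison with (NULL, 200) is UNKNOWN
print(not_exists_rows)          # [(2, 200), (3, 300)]
```

Order (2, 200) was never returned by customer 2, yet NOT IN drops it because the comparison with (NULL, 200) cannot be decided. NOT EXISTS treats the undecided comparison as "no matching row" and keeps the order.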

Database-Specific Considerations

Row subquery support varies significantly across database systems. Understanding these differences is crucial for writing portable SQL.

MySQL/MariaDB:

mysql_row_support.sql
-- MySQL: Full row comparison support
-- All comparison operators work with tuples
 
SELECT * FROM employees
WHERE (department_id, salary) > (10, 50000);  -- Lexicographic
 
-- MySQL-specific: ROW() syntax optional but supported
SELECT * FROM employees  
WHERE ROW(department_id, job_title) = ROW(10, 'Engineer');
 
-- MySQL optimizes row IN well
SELECT * FROM orders
WHERE (customer_id, product_id) IN (
    SELECT customer_id, product_id FROM wishlist
);

PostgreSQL:

postgresql_row_support.sql
-- PostgreSQL: Excellent row/composite type support
-- Most flexible implementation
 
SELECT * FROM employees
WHERE (department_id, job_title) = (10, 'Engineer');
 
-- Sophisticated NULL handling
SELECT * FROM employees
WHERE (department_id, job_title) IS DISTINCT FROM (10, 'Engineer');
 
-- Can compare ROW values directly
SELECT * FROM t1
WHERE ROW(a, b, c) = ROW(1, 2, 3);
 
-- PostgreSQL allows row expressions in more contexts
SELECT (department_id, hire_date) FROM employees;  -- Returns composite

SQL Server:

SQL Server does not support standard row comparison syntax. You must use expanded AND/OR conditions:

sqlserver_row_workaround.sql
-- SQL Server: NO direct row comparison support
-- This DOES NOT WORK in SQL Server:
-- SELECT * FROM emp WHERE (dept_id, job) = (SELECT dept_id, job FROM emp WHERE id = 1)
 
-- Must expand to individual comparisons:
SELECT e.*
FROM employees e
CROSS APPLY (
    SELECT department_id AS ref_dept, job_title AS ref_job
    FROM employees 
    WHERE employee_id = 1
) AS ref
WHERE e.department_id = ref.ref_dept
  AND e.job_title = ref.ref_job;
 
-- Or use multiple scalar subqueries:
SELECT * FROM employees
WHERE department_id = (SELECT department_id FROM employees WHERE employee_id = 1)
  AND job_title = (SELECT job_title FROM employees WHERE employee_id = 1);

Row Subquery Feature Matrix
Feature	MySQL	PostgreSQL	Oracle	SQL Server
Tuple equality (=)	✅	✅	✅	❌
Tuple inequality (<>)	✅	✅	✅	❌
Tuple ordering (<, >)	✅	✅	❌	❌
IS DISTINCT FROM	❌ (use <=>)	✅	❌	❌
Tuple IN subquery	✅	✅	✅	❌
ROW keyword	Optional	Optional	Not used	N/A

Performance Considerations

Row subqueries can be efficient or problematic depending on how they're used and optimized.

Advantages of Row Subqueries:

•Single execution: one subquery returns multiple values, reducing the number of subqueries
•Single-row consistency: all values come from the same row, which matters when the filter could match more than one row
•Clean syntax: more readable than multiple scalar subqueries

Optimization Behavior:

row_performance.sql
-- Row IN subqueries: often optimized to semi-joins
-- PostgreSQL execution plan typically shows:
-- Hash Semi Join
--   Hash Cond: ((employees.department_id = subquery.department_id) 
--               AND (employees.job_title = subquery.job_title))
 
EXPLAIN ANALYZE
SELECT * FROM employees
WHERE (department_id, job_title) IN (
    SELECT department_id, job_title
    FROM target_profiles
);
 
-- Composite index can help both sides:
CREATE INDEX idx_emp_dept_job ON employees(department_id, job_title);
CREATE INDEX idx_profile_dept_job ON target_profiles(department_id, job_title);
 
 
-- Row comparison for pagination: efficient with proper indexing
-- This query can use the composite index for seeking
CREATE INDEX idx_events_ts_id ON events(created_at, event_id);
 
SELECT * FROM events
WHERE (created_at, event_id) > ('2024-01-15 14:30:00', 50000)
ORDER BY created_at, event_id
LIMIT 50;

Composite Indexes for Row Comparisons

When using row comparisons for filtering or ordering, create composite indexes matching the column order in your tuple. A (col1, col2) index can support WHERE (col1, col2) > (val1, val2) through a B-tree range scan. Whether the optimizer actually uses the index for a row comparison depends on the database, so confirm with EXPLAIN.
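Inspecting the plan is the only reliable way to know whether an index is used. The sketch below does this with Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN, on a hypothetical events table. The wording of the plan differs between SQLite versions and between database systems, so the sketch prints the plan rather than assuming its text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        event_id   INTEGER PRIMARY KEY,
        event_type TEXT,
        created_at TEXT NOT NULL
    );
    CREATE INDEX idx_events_ts_id ON events(created_at, event_id);
    INSERT INTO events VALUES
        (1, 'login',  '2024-01-15 14:30:00'),
        (2, 'click',  '2024-01-15 14:30:00'),
        (3, 'click',  '2024-01-15 14:30:00'),
        (4, 'logout', '2024-01-15 14:31:00');
""")

query = """
    SELECT event_id FROM events
    WHERE (created_at, event_id) > ('2024-01-15 14:30:00', 2)
    ORDER BY created_at, event_id
    LIMIT 50
"""

# Each plan row ends with a human-readable description of one plan step.
plan_steps = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan_steps:
    print(step)

next_ids = [row[0] for row in conn.execute(query)]
print(next_ids)  # [3, 4]
```

If the printed plan mentions idx_events_ts_id, the row comparison is being served by the composite index. If it shows a full scan followed by a sort, rewriting the predicate in its expanded OR/AND form is one thing to try.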

Row Subquery Performance Guidelines
Pattern	Performance	Optimization Tips
Row = (non-correlated subquery)	Excellent	Subquery cached; index the lookup columns
Row IN (small result set)	Good	Semi-join optimization; index both sides
Row IN (large result set)	Variable	May scan; consider JOIN or hash strategy
Row comparison for pagination	Excellent	Composite index; avoids OFFSET overhead
Row comparison with functions	Poor	Index not usable; materialize values if possible

Summary: Row Subquery Mastery

Row subqueries extend SQL's expressive power to multi-column comparisons. Let's consolidate the key concepts:

Key Takeaways

•Definition — Row subqueries return exactly one row with multiple columns, producing a tuple for composite comparisons.
•Row constructors — Use (col1, col2, ...) or ROW(col1, col2, ...) syntax to create tuples for comparison.
•Comparison operators — Equality (=) requires ALL elements to match; ordering (<, >) uses lexicographic comparison.
•NULL complications — NULL in any tuple element can cause UNKNOWN results. Use NULL-safe operators or explicit IS NULL handling.
•IN with row tuples — Allows matching against a set of row tuples, supporting multi-column membership tests.
•Database variance — Support varies widely; SQL Server lacks row comparison entirely.
•Use cases — Composite key matching, pagination cursors, record comparison, and atomic multi-attribute retrieval.

What's Next:

Scalar subqueries return one value; row subqueries return one row. But many problems require working with sets of rows—multiple rows that feed into set operations or derived table processing. The next page explores table subqueries, which return full result sets for use in FROM clauses, set operators, and multi-row comparisons.

Page Complete

You now understand row subqueries and tuple comparisons—a powerful tool for matching composite identities, implementing efficient pagination, and expressing multi-column conditions concisely. This knowledge prepares you for the full power of table subqueries.

Row Subqueries

When Single Values Aren't Enough

Consider this query challenge: "Find all employees who work in the same department AND have the same job title as employee John Smith."

With scalar subqueries, you'd need two separate conditions:

WHERE department_id = (SELECT department_id FROM employees WHERE name = 'John Smith')
  AND job_title = (SELECT job_title FROM employees WHERE name = 'John Smith')

This works, but it's verbose and inefficient—you're querying John Smith's record twice. What if you could fetch both values in one subquery and compare them as a unit?

What You Will Learn

Defining Row Subqueries

Formal Definition:

A row subquery is a SELECT statement that, upon execution, yields a result containing at most one row with two or more columns. The columns form a tuple that can be used in row comparisons. If the query returns zero rows, the comparison evaluates to NULL/UNKNOWN. If it returns more than one row, a runtime error occurs.

Terminology Note:

In relational theory, a tuple is an ordered collection of attribute values—essentially a row. Row subqueries produce tuples, and row comparisons match tuples against tuples.

row_subquery_definition.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- Row subquery: returns ONE row with MULTIPLE columns
-- This subquery returns (department_id, job_title) for John Smith
 
SELECT employee_name, department_id, job_title
FROM employees
WHERE (department_id, job_title) = (
    SELECT department_id, job_title 
    FROM employees 
    WHERE employee_name = 'John Smith'
);
 
-- Breaking it down:
-- 1. The subquery returns one row: (10, 'Engineer')
-- 2. The outer WHERE compares each employee's (dept_id, job_title) tuple
--    against the subquery's tuple
-- 3. Only employees with BOTH values matching are returned
 
-- This is equivalent to:
-- WHERE department_id = 10 AND job_title = 'Engineer'
-- But derived dynamically from John Smith's record

Row Subquery vs. Multiple Scalar Subqueries

Row Constructors (Tuple Syntax)

To compare a row subquery result, you need to express the outer query's values as a row constructor (also called a row value expression or tuple expression).

Syntax:

A row constructor groups multiple values into a tuple using parentheses:

(column1, column2, column3)  -- Three-element tuple

For explicit clarity, some databases support the ROW keyword:

ROW(column1, column2, column3)  -- Explicit row constructor

Note: The ROW keyword is optional in many databases (MySQL, PostgreSQL) but required in others (SQL Server doesn't support standard row comparison syntax at all).

row_constructor_syntax.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Implicit row constructor (most common)
SELECT * FROM employees
WHERE (department_id, job_title) = (SELECT department_id, job_title FROM employees WHERE id = 1);
 
-- Explicit ROW keyword (PostgreSQL, MySQL)
SELECT * FROM employees
WHERE ROW(department_id, job_title) = (SELECT department_id, job_title FROM employees WHERE id = 1);
 
-- Row constructor with literals
SELECT * FROM employees
WHERE (department_id, job_title) = (10, 'Engineer');
 
-- Row constructor in SELECT (creating composite values - advanced usage)
SELECT employee_name, (department_id, job_title) AS dept_job_tuple
FROM employees;  -- Note: output format varies by database
 
 
-- Row constructor with expressions
SELECT * FROM products
WHERE (category_id, YEAR(created_date)) = (
    SELECT category_id, YEAR(created_date) 
    FROM products 
    WHERE product_id = 100
);

Row Constructor Support by Database
Database	Implicit Tuple	ROW Keyword	Row Comparison Support
MySQL	✅ Supported	✅ Optional	Full support (=, <>, <, >, <=, >=)
PostgreSQL	✅ Supported	✅ Optional	Full support with rich operators
Oracle	✅ Supported	❌ Not used	Limited (= comparison only)
SQL Server	❌ Not supported	❌ Not supported	No direct row comparison; use AND conditions
SQLite	✅ Supported	❌ Not used	Limited equality comparison

Row Comparison Operators

Row comparisons extend scalar comparison operators to tuples. Understanding how these operators work with multiple values is essential for correct query semantics.

Equality (=) Comparison:

Two tuples are equal if and only if ALL corresponding elements are equal:

(a1, a2, a3) = (b1, b2, b3)
↔  a1 = b1 AND a2 = b2 AND a3 = b3

row_equality.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
-- Row equality: all columns must match
SELECT * FROM employees
WHERE (department_id, job_title, location_id) = (
    SELECT department_id, job_title, location_id
    FROM employees
    WHERE employee_id = 101
);
 
-- Equivalent expanded form:
SELECT * FROM employees
WHERE department_id = (SELECT department_id FROM employees WHERE employee_id = 101)
  AND job_title = (SELECT job_title FROM employees WHERE employee_id = 101)
  AND location_id = (SELECT location_id FROM employees WHERE employee_id = 101);
 
-- The row form is preferred: cleaner, executes one subquery instead of three

Inequality (<>) Comparison:

Two tuples are unequal if ANY corresponding element differs:

(a1, a2) <> (b1, b2)
↔  a1 <> b1 OR a2 <> b2

row_inequality.sql
1
2
3
4
5
6
7
8
9
10
-- Row inequality: at least one column must differ
SELECT * FROM employees e
WHERE (e.department_id, e.job_title) <> (
    SELECT department_id, job_title
    FROM employees
    WHERE employee_id = 101
);
 
-- Returns all employees who differ in department OR job title OR both
-- from employee 101

Ordering Comparisons (<, >, <=, >=):

Row ordering uses lexicographic (dictionary) order—columns are compared left to right, with earlier columns taking precedence:

(a1, a2) < (b1, b2)
↔  (a1 < b1) OR (a1 = b1 AND a2 < b2)

This is like alphabetical sorting: 'AA' < 'AB' < 'B'. The first column is most significant.

row_ordering.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Row ordering: lexicographic comparison
-- (10, 'A') < (10, 'B') → TRUE (first elements equal, second compared)
-- (10, 'Z') < (20, 'A') → TRUE (first element determines order)
 
SELECT * FROM employees
WHERE (department_id, hire_date) < (10, '2020-01-01');
 
-- Returns:
-- 1. All employees in departments < 10 (ANY hire date)
-- 2. All employees in department 10 hired BEFORE 2020-01-01
 
-- Useful for composite ordering like pagination:
SELECT * FROM orders
WHERE (order_date, order_id) > ('2024-01-01', 1000)
ORDER BY order_date, order_id
LIMIT 100;
-- This efficiently fetches the next page after the given cursor

Lexicographic Order and Data Types

NULL Handling in Row Comparisons

NULL values introduce complexity in row comparisons, following SQL's three-valued logic. The behavior can be counterintuitive.

Equality with NULL:

If any corresponding element involves NULL, the comparison may yield UNKNOWN rather than TRUE or FALSE:

(10, NULL) = (10, 'A')     → UNKNOWN (NULL = 'A' is unknown)
(10, NULL) = (10, NULL)    → UNKNOWN (NULL = NULL is unknown)
(NULL, 'A') = (10, 'A')    → UNKNOWN (NULL = 10 is unknown)

row_null_behavior.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Create test data
CREATE TABLE test_rows (
    id INT,
    col1 INT,
    col2 VARCHAR(10)
);
 
INSERT INTO test_rows VALUES (1, 10, 'A');
INSERT INTO test_rows VALUES (2, 10, NULL);
INSERT INTO test_rows VALUES (3, NULL, 'A');
INSERT INTO test_rows VALUES (4, NULL, NULL);
 
-- Query: Find rows matching (10, 'A')
SELECT * FROM test_rows WHERE (col1, col2) = (10, 'A');
-- Returns: only row 1
-- Rows 2, 3, 4 involve NULL → comparison is UNKNOWN → not returned
 
-- Query: Find rows NOT matching (10, 'A')  
SELECT * FROM test_rows WHERE (col1, col2) <> (10, 'A');
-- Returns: ONLY rows where we're CERTAIN they differ
-- Row 2: (10, NULL) <> (10, 'A') → 10=10 but NULL<>'A' is unknown → UNKNOWN
-- Row 3: (NULL, 'A') <> (10, 'A') → NULL<>10 is unknown → UNKNOWN
-- Row 4: (NULL, NULL) <> (10, 'A') → UNKNOWN
-- No rows returned! (surprising but correct per SQL semantics)

IS DISTINCT FROM (NULL-safe comparison):

Some databases offer NULL-safe comparison operators that treat NULLs as regular values:

row_null_safe.sql
-- PostgreSQL: IS DISTINCT FROM (NULL-safe inequality)
SELECT * FROM test_rows
WHERE (col1, col2) IS DISTINCT FROM (10, 'A');
-- Returns rows 2, 3, 4 (treats NULL as a regular value)
 
-- PostgreSQL: IS NOT DISTINCT FROM (NULL-safe equality)
SELECT * FROM test_rows
WHERE (col1, col2) IS NOT DISTINCT FROM (10, NULL);
-- Returns row 2 (NULL matches NULL)
 
 
-- MySQL: NULL-safe equality operator <=>
SELECT * FROM test_rows
WHERE (col1, col2) <=> (10, NULL);
-- MySQL applies <=> element-wise to rows:
-- (a, b) <=> (x, y) is evaluated as (a <=> x) AND (b <=> y), so this returns row 2
 
 
-- Workaround for databases without tuple NULL-safe operators:
SELECT * FROM test_rows
WHERE (col1 = 10 OR (col1 IS NULL AND 10 IS NULL))
  AND (col2 = 'A' OR (col2 IS NULL AND 'A' IS NULL));
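
When both sides are columns rather than literals, the expanded form grows quickly. One alternative that is sometimes used is the INTERSECT idiom: set operators treat two NULLs as the same value, so a one-row INTERSECT behaves like a NULL-safe tuple equality. The sketch below finds rows of test_rows whose (col1, col2) equals that of the row with id = 2. It assumes a dialect that supports INTERSECT and a SELECT without FROM (for example PostgreSQL or SQL Server); the aliases t and ref are illustrative.

```sql
-- NULL-safe match against a reference row, without IS DISTINCT FROM
SELECT t.id
FROM test_rows AS t
JOIN test_rows AS ref
  ON ref.id = 2
WHERE EXISTS (
    SELECT t.col1, t.col2
    INTERSECT
    SELECT ref.col1, ref.col2
);
-- With the sample data this returns id = 2: (10, NULL) matches (10, NULL).
-- A plain (t.col1, t.col2) = (ref.col1, ref.col2) would return no rows.
```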

Row Comparison and NULL Pitfalls

Ensuring Single-Row Results

Like scalar subqueries, row subqueries must return at most one row. If multiple rows are returned, a runtime error occurs.

Techniques for Guaranteeing Single Row:

1. Filter by Primary/Unique Key:

single_row_pk.sql
-- Guaranteed single row: primary key filter
SELECT * FROM employees
WHERE (department_id, job_title) = (
    SELECT department_id, job_title
    FROM employees
    WHERE employee_id = 101  -- PK: exactly 0 or 1 row
);
 
-- Guaranteed single row: unique constraint filter
SELECT * FROM products  
WHERE (category_id, brand_id) = (
    SELECT category_id, brand_id
    FROM products
    WHERE sku = 'PROD-001'  -- SKU is unique
);

2. Aggregate Functions to Collapse:

single_row_aggregate.sql
-- Collapse the subquery to one row
-- Find employees matching the department with the highest budget
SELECT * FROM employees
WHERE (department_id, location_id) = (
    SELECT department_id, location_id
    FROM departments
    WHERE budget = (SELECT MAX(budget) FROM departments)
    ORDER BY department_id  -- Makes the tie-break deterministic
    LIMIT 1                 -- In case of ties, pick one
);
 
-- Using MIN/MAX: an aggregate without GROUP BY always yields one row
-- Caution: each aggregate is computed independently, so the resulting
-- (department_id, job_title) pair may not belong to any single employee
SELECT * FROM employees
WHERE (department_id, job_title) = (
    SELECT MIN(department_id), MIN(job_title)  -- Aggregate → one row
    FROM employees
    WHERE hire_date = (SELECT MIN(hire_date) FROM employees)
);

3. LIMIT 1 with ORDER BY:

single_row_limit.sql
-- Explicit LIMIT 1 with deterministic ordering
SELECT * FROM employees
WHERE (department_id, job_title) = (
    SELECT department_id, job_title
    FROM employees
    WHERE salary > 100000
    ORDER BY hire_date ASC, employee_id ASC  -- Earliest high earner; employee_id breaks ties
    LIMIT 1
);
 
-- Top performer's attributes
SELECT * FROM employees
WHERE (department_id, job_title, office_id) = (
    SELECT department_id, job_title, office_id
    FROM employees
    ORDER BY performance_score DESC, employee_id  -- employee_id breaks ties
    FETCH FIRST 1 ROW ONLY
);

Common Row Subquery Error
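
When the single-row assumption is wrong, the statement fails at run time. The wording depends on the system: PostgreSQL reports "more than one row returned by a subquery used as an expression", MySQL reports "Subquery returns more than 1 row", and Oracle raises ORA-01427. A quick way to check a filter before relying on it is to count how many rows each filter value selects. The sketch below does this for the name column used in the John Smith example from the introduction; treat it as a diagnostic, not a fix.

```sql
-- Filter values that would make a row subquery return more than one row
SELECT name, COUNT(*) AS matching_rows
FROM employees
GROUP BY name
HAVING COUNT(*) > 1
ORDER BY matching_rows DESC;
```

If this returns any rows, filter on a key instead, or apply one of the single-row techniques listed above.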

Practical Use Cases

Row subqueries excel in scenarios involving composite identity matching, record copying, and multi-attribute relationships.

Use Case 1: Finding Related Records by Composite Key

usecase_composite_key.sql
-- Find all orders with the same (customer_id, shipping_address) as order #12345
SELECT order_id, order_date, total_amount
FROM orders
WHERE (customer_id, shipping_address_id) = (
    SELECT customer_id, shipping_address_id
    FROM orders
    WHERE order_id = 12345
)
AND order_id <> 12345;  -- Exclude the reference order itself
 
-- This finds orders shipped to the same customer/address combination
-- Useful for detecting patterns or grouping related orders

Use Case 2: Matching Configuration Tuples

usecase_config_match.sql
-- Find products with identical configuration to a reference product
SELECT product_name, sku
FROM products
WHERE (category_id, brand_id, size_id, color_id) = (
    SELECT category_id, brand_id, size_id, color_id
    FROM products
    WHERE sku = 'REF-PRODUCT-001'
);
 
-- Find users with matching (role, department, access_level)
SELECT user_name, email
FROM users
WHERE (role_id, department_id, access_level) = (
    SELECT role_id, department_id, access_level
    FROM users
    WHERE user_id = (SELECT manager_id FROM users WHERE user_id = 456)
);

Use Case 3: Pagination with Composite Cursor

usecase_pagination.sql
-- Cursor-based pagination using row comparison
-- Fetch next page of results after a known (timestamp, id) position
 
SELECT event_id, event_type, created_at, user_id
FROM events
WHERE (created_at, event_id) > ('2024-01-15 14:30:00', 50000)
ORDER BY created_at, event_id
LIMIT 50;
 
-- Row comparison handles the edge case where multiple events 
-- have the same timestamp by using event_id as tiebreaker
-- More efficient than OFFSET for deep pagination
 
-- Previous page (reverse direction)
SELECT * FROM (
    SELECT event_id, event_type, created_at, user_id
    FROM events
    WHERE (created_at, event_id) < ('2024-01-15 14:30:00', 50000)
    ORDER BY created_at DESC, event_id DESC
    LIMIT 50
) AS prev_page
ORDER BY created_at, event_id;

Use Case 4: Record Cloning/Comparison

usecase_record_clone.sql
-- Find duplicate records (all listed columns match; NULLs never compare equal)
SELECT a.*
FROM products a
WHERE EXISTS (
    SELECT 1 FROM products b
    WHERE b.product_id <> a.product_id
      AND (b.name, b.category_id, b.brand_id, b.price) = 
          (a.name, a.category_id, a.brand_id, a.price)
);
 
-- Audit: find changes from previous version
-- (aliases cur/prev avoid CURRENT, which is a reserved word in some systems)
SELECT cur.*
FROM employee_history cur
JOIN employee_history prev 
  ON cur.employee_id = prev.employee_id
 AND cur.version = prev.version + 1
WHERE (cur.salary, cur.department_id, cur.job_title) <>
      (prev.salary, prev.department_id, prev.job_title);
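
The <> comparison in the audit query follows the NULL rules described earlier, so a version whose only change is from NULL to a value (or back) is not reported. A NULL-safe variant, assuming PostgreSQL and the same employee_history table, could look like the sketch below; the aliases cur and prev are illustrative.

```sql
-- Reports a version whenever any of the three attributes differs,
-- treating NULL as an ordinary, comparable value
SELECT cur.*
FROM employee_history AS cur
JOIN employee_history AS prev
  ON cur.employee_id = prev.employee_id
 AND cur.version = prev.version + 1
WHERE (cur.salary, cur.department_id, cur.job_title)
      IS DISTINCT FROM
      (prev.salary, prev.department_id, prev.job_title);
```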

Row Subqueries with IN Operator

While a row subquery used with = must return at most one row, IN allows matching against a set of rows. This combines the power of row comparison with set membership testing.

Syntax:

WHERE (col1, col2) IN (SELECT colA, colB FROM table WHERE ...)

This returns TRUE if the tuple (col1, col2) matches ANY row in the subquery result.

row_in_operator.sql
-- Find employees in any of the high-performance department/role combinations
SELECT employee_name, salary
FROM employees
WHERE (department_id, job_title) IN (
    SELECT department_id, job_title
    FROM high_performer_profiles
    WHERE avg_rating > 4.5
);
 
-- Find products that match any (category, brand) combination of bestsellers
SELECT product_name, price
FROM products
WHERE (category_id, brand_id) IN (
    SELECT category_id, brand_id
    FROM products
    WHERE units_sold > 10000
);
 
-- NOT IN with row tuples: find orphan combinations
SELECT department_id, job_title
FROM salary_grades
WHERE (department_id, job_title) NOT IN (
    SELECT DISTINCT department_id, job_title
    FROM employees
    WHERE department_id IS NOT NULL 
      AND job_title IS NOT NULL  -- Crucial for NOT IN!
);

NULL in Row IN/NOT IN

row_in_vs_exists.sql
-- NOT IN with potential NULLs: DANGEROUS
SELECT * FROM orders
WHERE (customer_id, product_id) NOT IN (
    SELECT customer_id, product_id FROM returns
    -- If returns has a NULL customer_id or product_id, the comparison can be
    -- UNKNOWN, and orders that should qualify are silently dropped
);
 
-- NOT EXISTS equivalent: NULL-SAFE
SELECT * FROM orders o
WHERE NOT EXISTS (
    SELECT 1 FROM returns r
    WHERE r.customer_id = o.customer_id
      AND r.product_id = o.product_id
);
-- Returns correct results even with NULL values
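
The difference is easiest to see on a tiny data set. The sketch below uses two hypothetical tables, demo_orders and demo_returns, named so they do not clash with the tables above. The stated results follow from standard three-valued logic; the multi-row VALUES syntax is assumed to be available in your dialect.

```sql
CREATE TABLE demo_orders  (customer_id INT, product_id INT);
CREATE TABLE demo_returns (customer_id INT, product_id INT);

INSERT INTO demo_orders  VALUES (1, 100), (2, 200);
INSERT INTO demo_returns VALUES (1, 100), (2, NULL);

-- NOT IN: returns no rows.
-- Order (1, 100) is a definite match. For order (2, 200), the comparison
-- with (2, NULL) is UNKNOWN, so NOT IN is UNKNOWN rather than TRUE.
SELECT * FROM demo_orders
WHERE (customer_id, product_id) NOT IN (
    SELECT customer_id, product_id FROM demo_returns
);

-- NOT EXISTS: returns (2, 200).
-- The (2, NULL) return row fails the correlation predicate and is ignored.
SELECT * FROM demo_orders o
WHERE NOT EXISTS (
    SELECT 1 FROM demo_returns r
    WHERE r.customer_id = o.customer_id
      AND r.product_id  = o.product_id
);
```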

Database-Specific Considerations

Row subquery support varies significantly across database systems. Understanding these differences is crucial for writing portable SQL.

MySQL/MariaDB:

mysql_row_support.sql
-- MySQL: Full row comparison support
-- All comparison operators work with tuples
 
SELECT * FROM employees
WHERE (department_id, salary) > (10, 50000);  -- Lexicographic
 
-- MySQL-specific: ROW() syntax optional but supported
SELECT * FROM employees  
WHERE ROW(department_id, job_title) = ROW(10, 'Engineer');
 
-- MySQL can usually execute a row IN subquery as a semi-join
SELECT * FROM orders
WHERE (customer_id, product_id) IN (
    SELECT customer_id, product_id FROM wishlist
);

PostgreSQL:

postgresql_row_support.sql
-- PostgreSQL: Excellent row/composite type support
-- Most flexible implementation
 
SELECT * FROM employees
WHERE (department_id, job_title) = (10, 'Engineer');
 
-- Sophisticated NULL handling
SELECT * FROM employees
WHERE (department_id, job_title) IS DISTINCT FROM (10, 'Engineer');
 
-- Can compare ROW values directly
SELECT * FROM t1
WHERE ROW(a, b, c) = ROW(1, 2, 3);
 
-- PostgreSQL allows row expressions in more contexts
SELECT (department_id, hire_date) FROM employees;  -- Returns composite

SQL Server:

SQL Server does not support standard row comparison syntax. You must use expanded AND/OR conditions:

sqlserver_row_workaround.sql
-- SQL Server: NO direct row comparison support
-- This DOES NOT WORK in SQL Server:
-- SELECT * FROM emp WHERE (dept_id, job) = (SELECT dept_id, job FROM emp WHERE id = 1)
 
-- Must expand to individual comparisons:
SELECT e.*
FROM employees e
CROSS APPLY (
    SELECT department_id AS ref_dept, job_title AS ref_job
    FROM employees 
    WHERE employee_id = 1
) AS ref
WHERE e.department_id = ref.ref_dept
  AND e.job_title = ref.ref_job;
 
-- Or use multiple scalar subqueries:
SELECT * FROM employees
WHERE department_id = (SELECT department_id FROM employees WHERE employee_id = 1)
  AND job_title = (SELECT job_title FROM employees WHERE employee_id = 1);
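
The same expansion applies to multi-column membership tests. A row IN subquery can be written as a correlated EXISTS, which SQL Server accepts. The sketch below rewrites the wishlist query from the MySQL example and assumes the same orders and wishlist tables.

```sql
-- Equivalent of:
--   WHERE (customer_id, product_id) IN (SELECT customer_id, product_id FROM wishlist)
SELECT o.*
FROM orders AS o
WHERE EXISTS (
    SELECT 1
    FROM wishlist AS w
    WHERE w.customer_id = o.customer_id
      AND w.product_id  = o.product_id
);
```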

Row Subquery Feature Matrix
Feature	MySQL	PostgreSQL	Oracle	SQL Server
Tuple equality (=)	✅	✅	✅	❌
Tuple inequality (<>)	✅	✅	✅	❌
Tuple ordering (<, >)	✅	✅	❌	❌
IS DISTINCT FROM	❌	✅	❌	❌
Tuple IN subquery	✅	✅	✅	❌
ROW keyword	Optional	Optional	Not used	N/A

Performance Considerations

Row subqueries can be efficient or problematic depending on how they're used and optimized.

Advantages of Row Subqueries:

Single execution: one subquery returns multiple values, so the reference row is looked up once
Consistent values: all values come from the same row, so the tuple can never mix columns from two different rows
Clean syntax: more readable than multiple scalar subqueries

Optimization Behavior:

row_performance.sql
-- Row IN subqueries: often optimized to semi-joins
-- PostgreSQL execution plan typically shows:
-- Hash Semi Join
--   Hash Cond: ((employees.department_id = subquery.department_id) 
--               AND (employees.job_title = subquery.job_title))
 
EXPLAIN ANALYZE
SELECT * FROM employees
WHERE (department_id, job_title) IN (
    SELECT department_id, job_title
    FROM target_profiles
);
 
-- Composite index can help both sides:
CREATE INDEX idx_emp_dept_job ON employees(department_id, job_title);
CREATE INDEX idx_profile_dept_job ON target_profiles(department_id, job_title);
 
 
-- Row comparison for pagination: efficient with proper indexing
-- This query can use the composite index for seeking
CREATE INDEX idx_events_ts_id ON events(created_at, event_id);
 
SELECT * FROM events
WHERE (created_at, event_id) > ('2024-01-15 14:30:00', 50000)
ORDER BY created_at, event_id
LIMIT 50;

Composite Indexes for Row Comparisons

Row Subquery Performance Guidelines
Pattern	Performance	Optimization Tips
Row = (non-correlated subquery)	Excellent	Subquery cached; index the lookup columns
Row IN (small result set)	Good	Semi-join optimization; index both sides
Row IN (large result set)	Variable	May scan; consider JOIN or hash strategy
Row comparison for pagination	Excellent	Composite index; avoids OFFSET overhead
Row comparison with functions	Poor	Index not usable; materialize values if possible
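
The last row of the table is easy to trip over in pagination queries. As a sketch, suppose pages are keyed by calendar day instead of the full timestamp: wrapping created_at in a cast means a plain (created_at, event_id) index no longer matches the predicate. One option in PostgreSQL is to index the expression itself. This assumes created_at is a TIMESTAMP without time zone (the cast must be immutable to be indexable); the index name idx_events_day_id is hypothetical.

```sql
-- Predicate on an expression: an index on (created_at, event_id) does not match it
SELECT event_id, created_at
FROM events
WHERE (CAST(created_at AS DATE), event_id) > (DATE '2024-01-15', 50000)
ORDER BY CAST(created_at AS DATE), event_id
LIMIT 50;

-- PostgreSQL: index the same expression so the comparison can seek again
CREATE INDEX idx_events_day_id
    ON events ((CAST(created_at AS DATE)), event_id);
```

Whether the planner actually uses the index depends on the data and statistics, so it is worth confirming with EXPLAIN.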

Summary: Row Subquery Mastery

Row subqueries extend SQL's expressive power to multi-column comparisons. Let's consolidate the key concepts:

Key Takeaways

• Definition: Row subqueries return at most one row with multiple columns, producing a tuple for composite comparisons.
• Row constructors: Use (col1, col2, ...) or ROW(col1, col2, ...) syntax to create tuples for comparison.
• Comparison operators: Equality (=) requires ALL elements to match; ordering (<, >) uses lexicographic comparison.
• NULL complications: NULL in any tuple element can cause UNKNOWN results. Use NULL-safe operators or explicit IS NULL handling.
• IN with row tuples: Allows matching against a set of row tuples, supporting multi-column membership tests.
• Database variance: Support varies widely; SQL Server lacks row comparison entirely.
• Use cases: Composite key matching, pagination cursors, record comparison, and atomic multi-attribute retrieval.
