The WHERE Clause: Filtering Data with Precision

Level: Beginner

Duration: 55 mins

Topic: WHERE Clause

Filtering Rows: The Foundation of Data Selection

The Art of Asking the Right Question

When you query a database, you're rarely interested in every row of a table. You want specific data: customers who placed orders this month, products below a price threshold, transactions flagged for review, employees in a particular department. The WHERE clause is SQL's answer to this fundamental need—it transforms broad table scans into surgical data extraction.

Consider a database with 100 million customer records. Without filtering, every query would return all 100 million rows—an impractical data deluge. The WHERE clause lets you express precisely which rows matter for your current question, reducing 100 million rows to exactly the subset you need.

What You Will Learn

By the end of this page, you will understand how the WHERE clause fundamentally works—from its role in the query execution pipeline to the mechanics of predicate evaluation. You'll grasp why WHERE is not just a convenience feature but an essential performance mechanism that determines whether queries complete in milliseconds or hours.

The Role of WHERE in SQL

The WHERE clause serves as SQL's predicate filter—a logical condition that each row must satisfy to be included in the query result. This simple concept underlies virtually every practical database operation.

Syntactic Position:

In the SQL statement structure, WHERE appears after the FROM clause (and any JOIN clauses) but before GROUP BY, HAVING, and ORDER BY:

SELECT column_list
FROM table_name
[JOIN ...]
WHERE filter_condition   -- Row filtering happens here
[GROUP BY ...]
[HAVING ...]
[ORDER BY ...]

This positioning is significant. The WHERE clause operates on individual rows before any grouping occurs, so it determines the raw rows that aggregation, sorting, and other later operations work on.

WHERE vs. HAVING: A Critical Distinction

WHERE filters rows before grouping; HAVING filters groups after aggregation. If you need to filter based on an aggregate result (like 'departments with more than 10 employees'), you use HAVING. If you're filtering individual rows ('employees hired after 2020'), you use WHERE. Misusing these leads to errors or incorrect results.
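The distinction can be tried out directly. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the employees table, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# Hypothetical sample data: (name, department, hire_year)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, hire_year INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        ("Ann", "Sales", 2019),
        ("Ben", "Sales", 2021),
        ("Cid", "Sales", 2022),
        ("Dee", "Legal", 2021),
    ],
)

# WHERE keeps or drops individual rows before any grouping happens.
hired_after_2020 = conn.execute(
    "SELECT name FROM employees WHERE hire_year > 2020 ORDER BY name"
).fetchall()

# HAVING keeps or drops whole groups after COUNT(*) has been computed.
big_departments = conn.execute(
    "SELECT department, COUNT(*) FROM employees "
    "GROUP BY department HAVING COUNT(*) > 2"
).fetchall()

print(hired_after_2020)  # [('Ben',), ('Cid',), ('Dee',)]
print(big_departments)   # [('Sales', 3)]
```

The row-level condition never sees a count, and the group-level condition never sees an individual hire year, which is why the two clauses are not interchangeable.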

The Query Execution Perspective:

Understanding WHERE requires knowing when it executes in the query pipeline. The logical order of SQL operations is:

FROM/JOIN — Determine the source tables and combine them
WHERE — Filter individual rows based on conditions
GROUP BY — Organize remaining rows into groups
HAVING — Filter groups based on aggregate conditions
SELECT — Choose which columns/expressions to output
DISTINCT — Remove duplicate rows from output
ORDER BY — Sort the final result set
LIMIT/OFFSET — Restrict the number of returned rows

This logical order explains why standard SQL (and most database engines) does not let you reference column aliases defined in SELECT within the WHERE clause: logically, SELECT has not been evaluated yet when WHERE runs.

where_execution_order.sql
-- This query demonstrates the logical execution order
-- WHERE executes BEFORE SELECT, so alias 'total' is not yet defined
 
-- ❌ INCORRECT: Cannot use SELECT alias in WHERE
SELECT price * quantity AS total
FROM order_items
WHERE total > 100;  -- Error: 'total' is not recognized
 
-- ✅ CORRECT: Repeat the expression in WHERE
SELECT price * quantity AS total
FROM order_items
WHERE price * quantity > 100;
 
-- ✅ ALTERNATIVE: Use a subquery or CTE
SELECT * FROM (
    SELECT price * quantity AS total
    FROM order_items
) AS computed
WHERE total > 100;

Predicate Evaluation Mechanics

A predicate is a logical expression that evaluates to TRUE, FALSE, or UNKNOWN for each row. The WHERE clause specifies one or more predicates, and only rows for which the entire WHERE expression evaluates to TRUE are included in the result.

The Three-Valued Logic Foundation:

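SQL uses three-valued logic because of NULL values. An ordinary comparison (=, <>, <, >, <=, >=) involving NULL yields UNKNOWN, not TRUE or FALSE; only NULL-aware tests such as IS NULL and IS NOT NULL return a definite answer. This has profound implications: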

NULL = NULL evaluates to UNKNOWN (not TRUE)
NULL <> NULL evaluates to UNKNOWN (not TRUE)
x > NULL evaluates to UNKNOWN for any value of x

Three-Valued Logic Truth Table (AND, OR, NOT)
Expression	? = TRUE	? = FALSE	? = UNKNOWN
TRUE AND ?	TRUE	FALSE	UNKNOWN
FALSE AND ?	FALSE	FALSE	FALSE
UNKNOWN AND ?	UNKNOWN	FALSE	UNKNOWN
TRUE OR ?	TRUE	TRUE	TRUE
FALSE OR ?	TRUE	FALSE	UNKNOWN
UNKNOWN OR ?	TRUE	UNKNOWN	UNKNOWN
NOT ?	FALSE	TRUE	UNKNOWN
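A few cells of the truth table can be checked against a real engine. This sketch assumes SQLite (through Python's sqlite3 module), where TRUE and FALSE are represented as 1 and 0 and UNKNOWN comes back to Python as None; the helper name evaluate is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expression):
    """Return the value of a SQL expression; None stands for UNKNOWN (NULL)."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]


null_equals_null = evaluate("NULL = NULL")        # UNKNOWN
true_and_unknown = evaluate("1 AND NULL")         # UNKNOWN
false_and_unknown = evaluate("0 AND NULL")        # FALSE decides the AND
true_or_unknown = evaluate("1 OR NULL")           # TRUE decides the OR
false_or_unknown = evaluate("0 OR NULL")          # UNKNOWN
not_unknown = evaluate("NOT (NULL = NULL)")       # UNKNOWN stays UNKNOWN

print(null_equals_null, true_and_unknown, false_and_unknown)  # None None 0
print(true_or_unknown, false_or_unknown, not_unknown)         # 1 None None
```

AND and OR return a definite answer only when the known operand alone settles the result; otherwise the UNKNOWN propagates.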

Row-by-Row Evaluation:

The database engine evaluates the WHERE predicate independently for each row (logically, at least—physical optimizations may differ). Consider this query:

SELECT * FROM employees WHERE department_id = 10 AND salary > 50000;

For each row in employees, the engine:

Retrieves the department_id value and checks if it equals 10
Retrieves the salary value and checks if it exceeds 50000
Combines results with AND logic
Includes the row only if the combined result is TRUE

The NULL Trap in WHERE

Because WHERE only includes rows evaluating to TRUE (not UNKNOWN), a condition such as WHERE column_name = value or WHERE column_name <> value never returns rows where column_name is NULL. The inequality case is the surprising one: rows with NULL silently disappear even though NULL is intuitively 'not equal' to the value. This is a common source of bugs when developers forget that NULL comparisons yield UNKNOWN.

null_predicate_behavior.sql
-- Consider a table with some NULL values in the 'region' column
-- Table: customers (id, name, region)
-- Sample data:
-- (1, 'Alice', 'North')
-- (2, 'Bob', NULL)
-- (3, 'Carol', 'South')
-- (4, 'Dave', NULL)
 
-- Find customers NOT in the North region
SELECT * FROM customers WHERE region <> 'North';
-- Returns: Carol (South)
-- MISSING: Bob and Dave! Their NULL <> 'North' evaluates to UNKNOWN
 
-- To include NULLs, explicitly handle them:
SELECT * FROM customers 
WHERE region <> 'North' OR region IS NULL;
-- Returns: Bob, Carol, Dave
 
-- Or use NULL-safe comparison (MySQL/MariaDB specific):
SELECT * FROM customers 
WHERE NOT (region <=> 'North');
-- <=> returns TRUE for NULL = NULL, FALSE for NULL vs non-NULL
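The sample data from the listing above can be loaded into an in-memory SQLite database to reproduce the behavior. This is a sketch under the assumption that SQLite is the engine: the `<=>` operator is not available there, so the NULL-safe variant uses SQLite's `IS NOT` operator instead. The helper name names is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Alice", "North"), (2, "Bob", None), (3, "Carol", "South"), (4, "Dave", None)],
)


def names(where_clause):
    """Run the filter and return matching customer names in id order."""
    rows = conn.execute(
        f"SELECT name FROM customers WHERE {where_clause} ORDER BY id"
    )
    return [name for (name,) in rows]


plain = names("region <> 'North'")                        # NULL rows are dropped
explicit = names("region <> 'North' OR region IS NULL")   # NULL rows added back
null_safe = names("region IS NOT 'North'")                # SQLite NULL-safe inequality

print(plain)      # ['Carol']
print(explicit)   # ['Bob', 'Carol', 'Dave']
print(null_safe)  # ['Bob', 'Carol', 'Dave']
```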

Simple Predicate Forms

Before exploring complex conditions, let's establish the fundamental predicate forms that WHERE clauses use. Each form addresses a specific type of question about your data.

Equality and Inequality:

The most basic predicates test whether a column value equals (or doesn't equal) a specified value:

equality_predicates.sql
-- Equality: Find exact matches
SELECT * FROM products WHERE category = 'Electronics';
SELECT * FROM orders WHERE status = 'pending';
SELECT * FROM users WHERE email = 'admin@example.com';
 
-- Inequality: Exclude specific values  
SELECT * FROM products WHERE category <> 'Electronics';
SELECT * FROM orders WHERE status != 'cancelled';  -- != is equivalent to <>
 
-- Note: String comparisons may be case-sensitive or case-insensitive
-- depending on the database collation settings
SELECT * FROM users WHERE LOWER(email) = LOWER('Admin@Example.com');

Relational Comparisons:

Beyond equality, WHERE supports full relational comparisons for ordered data types (numbers, dates, strings):

Comparison Operators in SQL
Operator	Meaning	Example
`=`	Equal to	`price = 100`
`<>` or `!=`	Not equal to	`status <> 'deleted'`
`<`	Less than	`quantity < 10`
`<=`	Less than or equal to	`age <= 65`
`>`	Greater than	`salary > 50000`
`>=`	Greater than or equal to	`created_at >= '2024-01-01'`

relational_predicates.sql
-- Numeric comparisons
SELECT * FROM products WHERE price < 50.00;
SELECT * FROM inventory WHERE stock_level <= reorder_point;
SELECT * FROM employees WHERE years_of_service >= 10;
 
-- Date comparisons
SELECT * FROM orders WHERE order_date > '2024-01-01';
SELECT * FROM subscriptions WHERE expiry_date <= CURRENT_DATE;
SELECT * FROM events WHERE event_time >= NOW() - INTERVAL '24 hours';  -- PostgreSQL interval syntax
 
-- String comparisons (lexicographic ordering)
SELECT * FROM products WHERE name < 'M';  -- Products A-L
SELECT * FROM customers WHERE last_name >= 'Smith';
 
-- Column-to-column comparisons
SELECT * FROM orders WHERE shipped_date > order_date + INTERVAL '7 days';
SELECT * FROM products WHERE sale_price < regular_price;

Type Coercion in Comparisons

When comparing values of different types, SQL applies implicit type coercion. For example, comparing a string '100' to a number 100 may convert the string to a number. However, this can lead to unexpected results or performance issues (index scans instead of seeks). Always compare like types for predictable behavior.
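Coercion rules differ between engines, so the following is only one concrete illustration, assuming SQLite: when a TEXT column is compared with a numeric literal, SQLite converts the number to text rather than the text to a number. The parts table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (code TEXT)")
conn.executemany("INSERT INTO parts VALUES (?)", [("100",), ("0100",), ("100.0",)])


def codes(where_clause):
    """Return the matching codes, sorted as text."""
    rows = conn.execute(f"SELECT code FROM parts WHERE {where_clause} ORDER BY code")
    return [code for (code,) in rows]


# Mixed types: the literal 100 is converted to the text '100' before comparing,
# so only the exact string '100' matches.
text_vs_number = codes("code = 100")

# Explicit conversion: every stored string is turned into an integer first.
cast_to_number = codes("CAST(code AS INTEGER) = 100")

print(text_vs_number)  # ['100']
print(cast_to_number)  # ['0100', '100', '100.0']
```

The two filters look similar but answer different questions, and the CAST version also wraps the column in a function, which typically prevents an ordinary index on code from being used.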

The Filtering Architecture

Understanding how the database engine implements WHERE filtering reveals why some queries fly while others crawl. The filtering architecture involves several key mechanisms.

Full Table Scan vs. Index Scan:

Without indexes, the database must examine every row in the table—a full table scan. For large tables, this is prohibitively slow. With appropriate indexes, the engine can directly locate matching rows—an index scan or index seek.

Full Table Scan

•Examines every row in the table
•O(n) complexity for n rows
•Acceptable for small tables (<1000 rows)
•Necessary when no suitable index exists
•May be chosen when returning most rows
•Resource-intensive for large tables

Index Seek/Scan

•Uses B-tree or hash index structure
•O(log n) seek + O(k) for k matching rows
•Critical for large tables (millions of rows)
•Requires index on filtered column(s)
•Optimizer chooses based on selectivity
•Dramatically faster for selective queries
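The difference between the two access paths shows up in a query plan. The sketch below assumes SQLite and its EXPLAIN QUERY PLAN statement; the exact wording of the plan text varies between SQLite versions, and the table, index name, and helper plan are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, department_id INTEGER, salary REAL)"
)


def plan(sql):
    """Return the human-readable detail text of the query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT * FROM employees WHERE department_id = 10"

# No index on department_id yet: the only option is to scan the table.
plan_without_index = plan(query)

conn.execute("CREATE INDEX idx_employees_department ON employees (department_id)")

# With the index, the planner can seek directly to department_id = 10.
plan_with_index = plan(query)

print(plan_without_index)
print(plan_with_index)
```

In SQLite's terminology a plan line beginning with SCAN visits every row, while SEARCH uses an index (or the primary key) to jump to the matching range.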

Predicate Pushdown:

Modern query optimizers employ predicate pushdown—applying WHERE conditions as early as possible in the execution plan. In complex queries involving joins, subqueries, or views, pushing predicates down to the base tables reduces the data volume flowing through subsequent operations.
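The effect of pushdown on intermediate data volume can be imitated in plain Python. This is a toy model, not how an optimizer is implemented: the customers and orders lists, the join helper, and the row counts are all hypothetical.

```python
# 100 customers, 10 of them in the North region; 1000 orders spread evenly.
customers = [{"id": i, "region": "North" if i % 10 == 0 else "South"} for i in range(100)]
orders = [{"order_id": n, "customer_id": n % 100} for n in range(1000)]


def join(customer_rows, order_rows):
    """Inner join of orders to customers on customer_id = id."""
    by_id = {c["id"]: c for c in customer_rows}
    return [
        {**o, **by_id[o["customer_id"]]}
        for o in order_rows
        if o["customer_id"] in by_id
    ]


# Filter applied late: the join first produces every order/customer pair.
joined_all = join(customers, orders)
late_filter = [row for row in joined_all if row["region"] == "North"]

# Filter pushed down: only North customers take part in the join.
north_customers = [c for c in customers if c["region"] == "North"]
pushed_down = join(north_customers, orders)

print(len(joined_all), len(pushed_down))  # 1000 100
print(late_filter == pushed_down)         # True
```

Both strategies return the same rows, but the pushed-down version builds an intermediate result one tenth the size.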

Sargable Predicates:

A predicate is sargable (Search ARGument ABLE) if it can use an index efficiently. Sargability is crucial for performance:

sargable_predicates.sql
-- ✅ SARGABLE: Index on 'salary' can be used
SELECT * FROM employees WHERE salary > 50000;
 
-- ❌ NON-SARGABLE: Function on column prevents index use
SELECT * FROM employees WHERE YEAR(hire_date) = 2024;
 
-- ✅ SARGABLE ALTERNATIVE: Rewrite as range
SELECT * FROM employees 
WHERE hire_date >= '2024-01-01' AND hire_date < '2025-01-01';
 
-- ❌ NON-SARGABLE: Arithmetic on column
SELECT * FROM products WHERE price * 1.1 < 100;
 
-- ✅ SARGABLE ALTERNATIVE: Move arithmetic to constant side
SELECT * FROM products WHERE price < 100 / 1.1;
 
-- ❌ NON-SARGABLE: Leading wildcard in LIKE
SELECT * FROM customers WHERE email LIKE '%@gmail.com';
 
-- ✅ SARGABLE: Trailing wildcard in LIKE
SELECT * FROM customers WHERE email LIKE 'john%';

The Sargability Rule

Keep the indexed column 'naked' on one side of the comparison. Any function, calculation, or type conversion applied to the column typically destroys sargability. Move all transformations to the constant/literal side of the expression.
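The rule can be observed in a query plan. This sketch assumes SQLite, where dates are stored as ISO 8601 text and strftime plays the role that YEAR() plays in other engines; the table, index name, and plan helper are hypothetical, and plan wording varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, hire_date TEXT)")
conn.execute("CREATE INDEX idx_employees_hire_date ON employees (hire_date)")


def plan(sql):
    """Return the human-readable detail text of the query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# Column wrapped in a function: the index on hire_date cannot be used.
wrapped_plan = plan(
    "SELECT * FROM employees WHERE strftime('%Y', hire_date) = '2024'"
)

# Column left bare, expressed as a half-open range: the index applies.
bare_plan = plan(
    "SELECT * FROM employees "
    "WHERE hire_date >= '2024-01-01' AND hire_date < '2025-01-01'"
)

print(wrapped_plan)
print(bare_plan)
```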

Selectivity and Cardinality

Selectivity measures the fraction of rows that satisfy a predicate (matching rows divided by total rows). Cardinality is the estimated number of rows a predicate will return. These metrics drive optimizer decisions.

Selectivity:

Selectivity ranges from 0 to 1:

Selectivity = 0: No rows match (e.g., WHERE 1 = 0)
Selectivity = 1: All rows match (e.g., WHERE 1 = 1)
Value close to 0: Highly selective predicate, few rows returned
Value close to 1: Weakly selective predicate, most rows returned

Selectivity Examples (1 million row table)
Predicate	Estimated Selectivity	Expected Rows	Index Beneficial?
`id = 12345` (unique key)	0.000001	1	Yes (index seek)
`status = 'active'` (10% active)	0.1	100,000	Maybe (depends on data distribution)
`created_at > '2024-01-01'` (50% recent)	0.5	500,000	Probably not (too many rows)
`is_deleted = false` (99% not deleted)	0.99	990,000	No (table scan faster)
`category = 'X' AND region = 'Y'`	0.01 (if independent)	10,000	Yes (composite index)
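The arithmetic behind the table is simple enough to sketch. The functions below are hypothetical helpers, and the 10% figure used for each of the two combined predicates is an assumption made only to reproduce the 0.01 in the last row; real optimizers estimate these values from statistics.

```python
def selectivity(matching_rows, total_rows):
    """Fraction of rows that satisfy a predicate."""
    return matching_rows / total_rows


def combined_selectivity(*selectivities):
    """Selectivity of predicates joined by AND, assuming they are independent."""
    product = 1.0
    for value in selectivities:
        product *= value
    return product


total_rows = 1_000_000

unique_key = selectivity(1, total_rows)          # id = 12345
active = selectivity(100_000, total_rows)        # status = 'active'

# Assumed: category = 'X' and region = 'Y' each match 10% of rows on their own.
both = combined_selectivity(0.1, 0.1)
estimated_rows = round(both * total_rows)

print(unique_key)      # 1e-06
print(active)          # 0.1
print(estimated_rows)  # 10000
```

The independence assumption is the weak point: if category and region are correlated, the true row count can be far from the product of the individual estimates.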

How the Optimizer Uses Selectivity:

The query optimizer estimates selectivity to choose between:

Index scan vs. table scan: Indexes help when the selectivity value is low (few matching rows)
Join order: More selective predicates filter early, reducing downstream work
Join method: Hash joins for large sets, nested loops for small sets

Statistics and Histograms:

Databases maintain statistics about data distribution:

Row counts: Total rows in each table
Distinct value counts: Number of unique values per column
Histograms: Distribution of values across ranges

These statistics power selectivity estimates. Outdated statistics lead to poor query plans—a common cause of sudden performance degradation.

statistics_commands.sql
-- PostgreSQL: Update statistics for a table
ANALYZE customers;
 
-- PostgreSQL: View table statistics
SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats WHERE tablename = 'customers';
 
-- SQL Server: Update statistics
UPDATE STATISTICS customers;
 
-- SQL Server: View index statistics
DBCC SHOW_STATISTICS ('customers', 'idx_customers_region');
 
-- MySQL: Update statistics
ANALYZE TABLE customers;
 
-- MySQL: View index cardinality
SHOW INDEX FROM customers;
 
-- Oracle: Gather table statistics
EXEC DBMS_STATS.GATHER_TABLE_STATS('schema_name', 'customers');

The Statistics Maintenance Problem

After bulk data loads or significant data changes, statistics may be stale. A query that ran in 100ms yesterday might take 10 minutes today, not because the query changed, but because the optimizer is choosing a plan based on statistics that no longer describe the data. Schedule regular statistics updates for production databases.
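Staleness is easy to reproduce on a small scale. This sketch assumes SQLite, which keeps its statistics in the sqlite_stat1 table and refreshes them only when ANALYZE is run; the table, index name, and row counts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.execute("CREATE INDEX idx_customers_region ON customers (region)")


def region_stats():
    """Return the stored statistics string for the region index."""
    return conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE idx = 'idx_customers_region'"
    ).fetchone()[0]


conn.executemany(
    "INSERT INTO customers (region) VALUES (?)",
    [("North",), ("North",), ("South",), ("South",)],
)
conn.execute("ANALYZE")
stats_initial = region_stats()  # first number is the row count seen by ANALYZE

# Bulk load: the data changes, the statistics do not.
conn.executemany("INSERT INTO customers (region) VALUES (?)", [("West",)] * 996)
stats_stale = region_stats()

conn.execute("ANALYZE")
stats_fresh = region_stats()

print(stats_initial.split()[0])  # 4
print(stats_stale.split()[0])    # 4
print(stats_fresh.split()[0])    # 1000
```

Between the bulk load and the second ANALYZE, the planner still believes the table holds 4 rows, which is the kind of mismatch that leads to a poor plan.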

Filter Expressions and Data Types

WHERE clauses must respect the data types of the columns they filter. Understanding type-specific filtering patterns prevents errors and ensures correct results.

Numeric Filtering:

numeric_filtering.sql
-- Integer comparisons
SELECT * FROM orders WHERE quantity = 5;
SELECT * FROM inventory WHERE stock_level BETWEEN 10 AND 100;
 
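-- Decimal/floating-point comparisons
-- Caution: equality on FLOAT/REAL/DOUBLE columns can be unreliable due to
-- binary rounding; exact types such as DECIMAL/NUMERIC compare exactly
SELECT * FROM products WHERE price = 19.99;  -- May miss rows if price is a float type
 
-- Range filters are less fragile than equality on floating-point columns:
SELECT * FROM products WHERE price >= 19.99 AND price < 20.00;
-- Tolerance-based equality (ABS() on the column is not sargable):
SELECT * FROM products WHERE ABS(price - 19.99) < 0.001;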
 
-- Scientific notation (some databases)
SELECT * FROM measurements WHERE value > 1e-6;
 
-- NULL handling in numeric columns
SELECT * FROM products WHERE price IS NOT NULL AND price > 0;
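The floating-point caution can be demonstrated with the classic 0.1 + 0.2 case. This sketch assumes SQLite with a REAL column; the readings table and the tolerance of 1e-9 are hypothetical choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (value REAL)")

# The stored value is the binary floating-point result of 0.1 + 0.2,
# which is slightly larger than 0.3.
conn.execute("INSERT INTO readings VALUES (?)", (0.1 + 0.2,))

exact_matches = conn.execute(
    "SELECT COUNT(*) FROM readings WHERE value = ?", (0.3,)
).fetchone()[0]

tolerant_matches = conn.execute(
    "SELECT COUNT(*) FROM readings WHERE ABS(value - ?) < ?", (0.3, 1e-9)
).fetchone()[0]

print(exact_matches)     # 0
print(tolerant_matches)  # 1
```

For money and other values that must compare exactly, an exact numeric type (or integer cents) avoids the problem altogether.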

Date and Time Filtering:

Date filtering is one of the most common WHERE clause patterns, with several syntax variations:

date_filtering.sql
-- Date literals (ISO 8601 format is most portable)
SELECT * FROM orders WHERE order_date = '2024-03-15';
 
-- Date ranges (common for reports)
SELECT * FROM orders 
WHERE order_date >= '2024-01-01' AND order_date < '2024-04-01';
 
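-- Using BETWEEN for dates (inclusive on both ends)
-- Caution: if order_date is a timestamp, rows later than midnight on
-- 2024-03-31 are excluded; the half-open range above avoids this
SELECT * FROM orders 
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';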
 
-- Current date comparisons
SELECT * FROM subscriptions WHERE expiry_date < CURRENT_DATE;  -- PostgreSQL/Standard
SELECT * FROM subscriptions WHERE expiry_date < CURDATE();     -- MySQL
 
-- Date arithmetic
SELECT * FROM orders 
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days';         -- PostgreSQL
 
SELECT * FROM orders 
WHERE order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY);          -- MySQL
 
-- Timestamp with time zone considerations
SELECT * FROM events 
WHERE event_time >= '2024-03-15 00:00:00+00'                   -- UTC timestamp
  AND event_time < '2024-03-16 00:00:00+00';
 
-- Extracting date parts (caution: may prevent index use)
SELECT * FROM orders WHERE EXTRACT(MONTH FROM order_date) = 3;
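The difference between BETWEEN and a half-open range matters as soon as the column carries a time of day. This sketch assumes SQLite, which stores timestamps as ISO 8601 text that sorts chronologically; the bounds are written out as midnight values, which is how a timestamp-typed column in other engines would typically interpret a date-only literal. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, ordered_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2024-01-01 00:00:00"),
        (2, "2024-03-31 00:00:00"),
        (3, "2024-03-31 15:30:00"),  # afternoon of the last day of the quarter
        (4, "2024-04-01 00:00:00"),
    ],
)


def ids(where_clause):
    """Return the ids of orders matching the filter, in id order."""
    rows = conn.execute(f"SELECT id FROM orders WHERE {where_clause} ORDER BY id")
    return [row_id for (row_id,) in rows]


between_ids = ids(
    "ordered_at BETWEEN '2024-01-01 00:00:00' AND '2024-03-31 00:00:00'"
)
half_open_ids = ids(
    "ordered_at >= '2024-01-01 00:00:00' AND ordered_at < '2024-04-01 00:00:00'"
)

print(between_ids)    # [1, 2]
print(half_open_ids)  # [1, 2, 3]
```

The half-open form (inclusive start, exclusive end) captures the whole last day without having to guess the largest possible time value.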

String Filtering:

String comparisons involve case sensitivity, collation, and encoding considerations:

string_filtering.sql
-- Exact match (case sensitivity depends on collation)
SELECT * FROM users WHERE username = 'johndoe';
 
-- Case-insensitive match (explicit)
SELECT * FROM users WHERE LOWER(username) = LOWER('JohnDoe');
SELECT * FROM users WHERE username ILIKE 'johndoe';  -- PostgreSQL
 
-- Empty string vs NULL
SELECT * FROM users WHERE email = '';        -- Empty string
SELECT * FROM users WHERE email IS NULL;     -- NULL value
SELECT * FROM users WHERE COALESCE(email, '') = '';  -- Both
 
-- Unicode and encoding
SELECT * FROM products WHERE name = N'日本語';  -- SQL Server Unicode literal
 
-- Collation-specific comparisons
SELECT * FROM products 
WHERE name COLLATE utf8_general_ci = 'WIDGET';  -- MySQL case-insensitive

Collation Matters

A column's collation determines how string comparisons work—case sensitivity, accent sensitivity, and sort order. Two strings that appear identical may compare as unequal under certain collations. Always verify your database's default collation for production systems.
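Collation effects can be seen with two otherwise identical filters. This sketch assumes SQLite, whose default collation (BINARY) is case-sensitive and whose built-in NOCASE collation ignores ASCII case; other engines use different collation names. The products table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")  # default BINARY collation
conn.executemany(
    "INSERT INTO products VALUES (?)", [("Widget",), ("WIDGET",), ("widget",)]
)


def matching_names(where_clause):
    """Return matching product names in insertion order."""
    rows = conn.execute(
        f"SELECT name FROM products WHERE {where_clause} ORDER BY rowid"
    )
    return [name for (name,) in rows]


binary_matches = matching_names("name = 'WIDGET'")
nocase_matches = matching_names("name COLLATE NOCASE = 'WIDGET'")

print(binary_matches)  # ['WIDGET']
print(nocase_matches)  # ['Widget', 'WIDGET', 'widget']
```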

Boolean and Expression Filtering

Boolean Column Filtering:

Boolean columns (TRUE/FALSE) allow concise filtering, but watch for NULL complications:

boolean_filtering.sql
-- Direct boolean filtering
SELECT * FROM users WHERE is_active = TRUE;
SELECT * FROM users WHERE is_active = FALSE;
 
-- Shorthand (column name alone is truthy in some databases)
SELECT * FROM users WHERE is_active;        -- PostgreSQL, MySQL
SELECT * FROM users WHERE NOT is_active;    -- Negation
 
-- Boolean with NULL handling
SELECT * FROM users WHERE is_verified IS TRUE;   -- Excludes NULL
SELECT * FROM users WHERE is_verified IS FALSE;  -- Excludes NULL
SELECT * FROM users WHERE is_verified IS NOT TRUE;  -- Includes FALSE and NULL
 
-- Converting to boolean
SELECT * FROM orders WHERE (total_amount > 0) = TRUE;
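How NULL interacts with boolean filters is easiest to see with one row per possible state. This sketch assumes SQLite 3.23 or later (which understands TRUE and IS NOT TRUE) and stores booleans as 1, 0, or NULL; the users table and names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_verified INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Ann", 1), ("Ben", 0), ("Cid", None)],
)


def users_where(where_clause):
    """Return the names of users matching the filter, in insertion order."""
    rows = conn.execute(f"SELECT name FROM users WHERE {where_clause} ORDER BY rowid")
    return [name for (name,) in rows]


verified = users_where("is_verified")             # TRUE only
not_verified = users_where("NOT is_verified")     # FALSE only; NOT NULL is UNKNOWN
not_true = users_where("is_verified IS NOT TRUE") # FALSE and NULL

print(verified)      # ['Ann']
print(not_verified)  # ['Ben']
print(not_true)      # ['Ben', 'Cid']
```

The shorthand filters and their negations do not partition the table: the NULL row belongs to neither, and only the IS form picks it up.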

Computed Expression Filtering:

WHERE clauses can filter based on computed expressions, though with performance implications:

expression_filtering.sql
-- Arithmetic expressions
SELECT * FROM order_items 
WHERE quantity * unit_price > 1000;  -- Line items over $1000
 
-- String expressions
SELECT * FROM products 
WHERE LENGTH(description) > 500;  -- Long descriptions
 
-- Conditional expressions (CASE)
SELECT * FROM employees WHERE (
    CASE 
        WHEN department = 'Engineering' THEN salary > 100000
        WHEN department = 'Sales' THEN commission > 10000
        ELSE FALSE
    END
);
 
-- Function-based filtering
SELECT * FROM users 
WHERE EXTRACT(YEAR FROM AGE(birth_date)) >= 18;  -- Adults only
 
-- Multi-column computations
SELECT * FROM products 
WHERE (regular_price - sale_price) / regular_price > 0.2;  -- 20%+ discount

Expression Performance Impact

Complex expressions in WHERE clauses must be evaluated for every candidate row. Unlike simple column comparisons that can use indexes, computed expressions often require full table scans. For frequently-used computed filters, consider creating a computed/generated column with an index, or using materialized views.
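One way to make a computed filter indexable is to index the expression itself. The sketch below swaps in SQLite's expression indexes as a stand-in for the computed or generated column approach mentioned above, since the syntax for generated columns differs between engines. The table, index name, and plan helper are hypothetical, and the plan text varies by SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_items (id INTEGER PRIMARY KEY, quantity INTEGER, unit_price REAL)"
)


def plan(sql):
    """Return the human-readable detail text of the query plan for sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


query = "SELECT * FROM order_items WHERE quantity * unit_price > 1000"

# Without help, the expression must be computed for every row.
plan_before = plan(query)

# Index the expression exactly as it is written in the filter.
conn.execute(
    "CREATE INDEX idx_order_items_line_total ON order_items (quantity * unit_price)"
)
plan_after = plan(query)

print(plan_before)
print(plan_after)
```

The filter has to repeat the indexed expression verbatim for the planner to match it, so this technique suits a small number of frequently used computed filters rather than ad hoc expressions.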

Summary: The Power of Precise Filtering

The WHERE clause is deceptively simple in syntax but profound in impact. We've established the foundational concepts that make WHERE the workhorse of SQL querying.

Key Takeaways

•WHERE operates on individual rows — It filters rows before grouping, sorting, or aggregation occurs
•Three-valued logic governs predicates — NULL comparisons yield UNKNOWN, which is treated as FALSE for filtering purposes
•Sargability determines performance — Keep columns 'naked' in predicates to enable index usage
•Selectivity guides optimizer choices — More selective predicates (fewer matching rows) benefit most from indexes
•Data types shape predicate syntax — Dates, strings, numbers, and booleans have type-specific filtering patterns
•Statistics maintenance is critical — Stale statistics lead to poor query plans and degraded performance

What's Next:

Now that we understand the foundational mechanics of WHERE clause filtering, the next page dives into comparison operators—the specific tools for expressing conditions. You'll learn the full arsenal of operators available: equality, inequality, relational comparisons, and their nuanced behaviors across different data types.

Page Complete

You now understand how WHERE clauses fundamentally work—from predicate evaluation to query execution order to filtering architecture. This foundation prepares you for mastering the specific operators and patterns that make WHERE clauses powerful and efficient.

1 / 5

Loading learning content...

Database Management SystemsWHERE Clause

The WHERE Clause: Filtering Data with Precision

LevelBeginner

Duration55 mins

TopicWHERE Clause

1 / 5

Filtering Rows: The Foundation of Data Selection

The Art of Asking the Right Question

What You Will Learn

The Role of WHERE in SQL

Syntactic Position:

In the SQL statement structure, WHERE appears after the FROM clause (and any JOIN clauses) but before GROUP BY, HAVING, and ORDER BY:

SELECT column_list
FROM table_name
[JOIN ...]
WHERE filter_condition   -- Row filtering happens here
[GROUP BY ...]
[HAVING ...]
[ORDER BY ...]

WHERE vs. HAVING: A Critical Distinction

The Query Execution Perspective:

Understanding WHERE requires knowing when it executes in the query pipeline. The logical order of SQL operations is:

FROM/JOIN — Determine the source tables and combine them
WHERE — Filter individual rows based on conditions
GROUP BY — Organize remaining rows into groups
HAVING — Filter groups based on aggregate conditions
SELECT — Choose which columns/expressions to output
DISTINCT — Remove duplicate rows from output
ORDER BY — Sort the final result set
LIMIT/OFFSET — Restrict the number of returned rows

This logical order explains why you cannot reference column aliases (defined in SELECT) within the WHERE clause—SELECT hasn't executed yet when WHERE runs.

where_execution_order.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
-- This query demonstrates the logical execution order
-- WHERE executes BEFORE SELECT, so alias 'total' is not yet defined
 
-- ❌ INCORRECT: Cannot use SELECT alias in WHERE
SELECT price * quantity AS total
FROM order_items
WHERE total > 100;  -- Error: 'total' is not recognized
 
-- ✅ CORRECT: Repeat the expression in WHERE
SELECT price * quantity AS total
FROM order_items
WHERE price * quantity > 100;
 
-- ✅ ALTERNATIVE: Use a subquery or CTE
SELECT * FROM (
    SELECT price * quantity AS total
    FROM order_items
) AS computed
WHERE total > 100;

Predicate Evaluation Mechanics

The Three-Valued Logic Foundation:

SQL uses three-valued logic because of NULL values. Any comparison involving NULL yields UNKNOWN, not TRUE or FALSE. This has profound implications:

NULL = NULL evaluates to UNKNOWN (not TRUE)
NULL <> NULL evaluates to UNKNOWN (not TRUE)
x > NULL evaluates to UNKNOWN for any value of x

Three-Valued Logic Truth Table (AND, OR, NOT)
Expression	TRUE	FALSE	UNKNOWN
TRUE AND ?	TRUE	FALSE	UNKNOWN
FALSE AND ?	FALSE	FALSE	FALSE
UNKNOWN AND ?	UNKNOWN	FALSE	UNKNOWN
TRUE OR ?	TRUE	TRUE	TRUE
FALSE OR ?	TRUE	FALSE	UNKNOWN
UNKNOWN OR ?	TRUE	UNKNOWN	UNKNOWN
NOT ?	FALSE	TRUE	UNKNOWN

Row-by-Row Evaluation:

The database engine evaluates the WHERE predicate independently for each row (logically, at least—physical optimizations may differ). Consider this query:

SELECT * FROM employees WHERE department_id = 10 AND salary > 50000;

For each row in employees, the engine:

Retrieves the department_id value and checks if it equals 10
Retrieves the salary value and checks if it exceeds 50000
Combines results with AND logic
Includes the row only if the combined result is TRUE

The NULL Trap in WHERE

null_predicate_behavior.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- Consider a table with some NULL values in the 'region' column
-- Table: customers (id, name, region)
-- Sample data:
-- (1, 'Alice', 'North')
-- (2, 'Bob', NULL)
-- (3, 'Carol', 'South')
-- (4, 'Dave', NULL)
 
-- Find customers NOT in the North region
SELECT * FROM customers WHERE region <> 'North';
-- Returns: Carol (South)
-- MISSING: Bob and Dave! Their NULL <> 'North' evaluates to UNKNOWN
 
-- To include NULLs, explicitly handle them:
SELECT * FROM customers 
WHERE region <> 'North' OR region IS NULL;
-- Returns: Bob, Carol, Dave
 
-- Or use NULL-safe comparison (MySQL/MariaDB specific):
SELECT * FROM customers 
WHERE NOT (region <=> 'North');
-- <=> returns TRUE for NULL = NULL, FALSE for NULL vs non-NULL

Simple Predicate Forms

Before exploring complex conditions, let's establish the fundamental predicate forms that WHERE clauses use. Each form addresses a specific type of question about your data.

Equality and Inequality:

The most basic predicates test whether a column value equals (or doesn't equal) a specified value:

equality_predicates.sql
1
2
3
4
5
6
7
8
9
10
11
12
-- Equality: Find exact matches
SELECT * FROM products WHERE category = 'Electronics';
SELECT * FROM orders WHERE status = 'pending';
SELECT * FROM users WHERE email = 'admin@example.com';
 
-- Inequality: Exclude specific values  
SELECT * FROM products WHERE category <> 'Electronics';
SELECT * FROM orders WHERE status != 'cancelled';  -- != is equivalent to <>
 
-- Note: String comparisons may be case-sensitive or case-insensitive
-- depending on the database collation settings
SELECT * FROM users WHERE LOWER(email) = LOWER('Admin@Example.com');

Relational Comparisons:

Beyond equality, WHERE supports full relational comparisons for ordered data types (numbers, dates, strings):

Comparison Operators in SQL
Operator	Meaning	Example
`=`	Equal to	`price = 100`
`<>` or `!=`	Not equal to	`status <> 'deleted'`
`<`	Less than	`quantity < 10`
`<=`	Less than or equal to	`age <= 65`
`>`	Greater than	`salary > 50000`
`>=`	Greater than or equal to	`created_at >= '2024-01-01'`

relational_predicates.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Numeric comparisons
SELECT * FROM products WHERE price < 50.00;
SELECT * FROM inventory WHERE stock_level <= reorder_point;
SELECT * FROM employees WHERE years_of_service >= 10;
 
-- Date comparisons
SELECT * FROM orders WHERE order_date > '2024-01-01';
SELECT * FROM subscriptions WHERE expiry_date <= CURRENT_DATE;
SELECT * FROM events WHERE event_time >= NOW() - INTERVAL '24 hours';
 
-- String comparisons (lexicographic ordering)
SELECT * FROM products WHERE name < 'M';  -- Products A-L
SELECT * FROM customers WHERE last_name >= 'Smith';
 
-- Column-to-column comparisons
SELECT * FROM orders WHERE shipped_date > order_date + INTERVAL '7 days';
SELECT * FROM products WHERE sale_price < regular_price;

Type Coercion in Comparisons

The Filtering Architecture

Understanding how the database engine implements WHERE filtering reveals why some queries fly while others crawl. The filtering architecture involves several key mechanisms.

Full Table Scan vs. Index Scan:

Full Table Scan

•Examines every row in the table
•O(n) complexity for n rows
•Acceptable for small tables (<1000 rows)
•Necessary when no suitable index exists
•May be chosen when returning most rows
•Resource-intensive for large tables

Index Seek/Scan

•Uses B-tree or hash index structure
•O(log n) seek + O(k) for k matching rows
•Critical for large tables (millions of rows)
•Requires index on filtered column(s)
•Optimizer chooses based on selectivity
•Dramatically faster for selective queries

Predicate Pushdown:

Sargable Predicates:

A predicate is sargable (Search ARGument ABLE) if it can use an index efficiently. Sargability is crucial for performance:

sargable_predicates.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- ✅ SARGABLE: Index on 'salary' can be used
SELECT * FROM employees WHERE salary > 50000;
 
-- ❌ NON-SARGABLE: Function on column prevents index use
SELECT * FROM employees WHERE YEAR(hire_date) = 2024;
 
-- ✅ SARGABLE ALTERNATIVE: Rewrite as range
SELECT * FROM employees 
WHERE hire_date >= '2024-01-01' AND hire_date < '2025-01-01';
 
-- ❌ NON-SARGABLE: Arithmetic on column
SELECT * FROM products WHERE price * 1.1 < 100;
 
-- ✅ SARGABLE ALTERNATIVE: Move arithmetic to constant side
SELECT * FROM products WHERE price < 100 / 1.1;
 
-- ❌ NON-SARGABLE: Leading wildcard in LIKE
SELECT * FROM customers WHERE email LIKE '%@gmail.com';
 
-- ✅ SARGABLE: Trailing wildcard in LIKE
SELECT * FROM customers WHERE email LIKE 'john%';

The Sargability Rule

Selectivity and Cardinality

Selectivity measures what fraction of rows a predicate filters out. Cardinality is the estimated number of rows a predicate will return. These metrics drive optimizer decisions.

Selectivity:

Selectivity ranges from 0 to 1:

Selectivity = 0: No rows match (e.g., WHERE 1 = 0)
Selectivity = 1: All rows match (e.g., WHERE 1 = 1)
Low selectivity (close to 0): Highly selective, few rows returned
High selectivity (close to 1): Most rows returned

Selectivity Examples (1 million row table)
Predicate	Estimated Selectivity	Expected Rows	Index Beneficial?
`id = 12345` (unique key)	0.000001	1	Yes (index seek)
`status = 'active'` (10% active)	0.1	100,000	Maybe (depends on data distribution)
`created_at > '2024-01-01'` (50% recent)	0.5	500,000	Probably not (too many rows)
`is_deleted = false` (99% not deleted)	0.99	990,000	No (table scan faster)
`category = 'X' AND region = 'Y'`	0.01 (if independent)	10,000	Yes (composite index)

How the Optimizer Uses Selectivity:

The query optimizer estimates selectivity to choose between:

Index scan vs. table scan: Indexes help when selectivity is low
Join order: More selective predicates filter early, reducing downstream work
Join method: Hash joins for large sets, nested loops for small sets

Statistics and Histograms:

Databases maintain statistics about data distribution:

Row counts: Total rows in each table
Distinct value counts: Number of unique values per column
Histograms: Distribution of values across ranges

These statistics power selectivity estimates. Outdated statistics lead to poor query plans—a common cause of sudden performance degradation.

statistics_commands.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- PostgreSQL: Update statistics for a table
ANALYZE customers;
 
-- PostgreSQL: View table statistics
SELECT attname, n_distinct, most_common_vals, histogram_bounds
FROM pg_stats WHERE tablename = 'customers';
 
-- SQL Server: Update statistics
UPDATE STATISTICS customers;
 
-- SQL Server: View index statistics
DBCC SHOW_STATISTICS ('customers', 'idx_customers_region');
 
-- MySQL: Update statistics
ANALYZE TABLE customers;
 
-- MySQL: View index cardinality
SHOW INDEX FROM customers;
 
-- Oracle: Gather table statistics
EXEC DBMS_STATS.GATHER_TABLE_STATS('schema_name', 'customers');

The Statistics Maintenance Problem

Filter Expressions and Data Types

WHERE clauses must respect the data types of the columns they filter. Understanding type-specific filtering patterns prevents errors and ensures correct results.

Numeric Filtering:

numeric_filtering.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Integer comparisons
SELECT * FROM orders WHERE quantity = 5;
SELECT * FROM inventory WHERE stock_level BETWEEN 10 AND 100;
 
-- Decimal/floating-point comparisons
-- Caution: Floating-point equality can be unreliable due to precision
SELECT * FROM products WHERE price = 19.99;  -- May miss due to precision
 
-- Better approach for floating-point ranges:
SELECT * FROM products WHERE price >= 19.99 AND price < 20.00;
SELECT * FROM products WHERE ABS(price - 19.99) < 0.001;
 
-- Scientific notation (some databases)
SELECT * FROM measurements WHERE value > 1e-6;
 
-- NULL handling in numeric columns
SELECT * FROM products WHERE price IS NOT NULL AND price > 0;
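
The floating-point caution above can be reproduced with a small experiment. The sketch below uses Python's built-in sqlite3 module purely because it needs no server; the table and values are made up for illustration, and other engines may display or round values differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")

# 0.1 + 0.2 is stored as 0.30000000000000004 in binary floating point, not 0.3
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("widget", 0.1 + 0.2), ("gadget", 0.5)],
)

# Exact equality misses the row that "should" cost 0.3
exact = conn.execute(
    "SELECT name FROM products WHERE price = 0.3"
).fetchall()

# A small tolerance finds it
tolerant = conn.execute(
    "SELECT name FROM products WHERE ABS(price - 0.3) < 0.001"
).fetchall()

print(exact)     # []
print(tolerant)  # [('widget',)]
```

Note that the tolerance version wraps the column in a function, so it is usually not index-friendly; the range form (price >= x AND price < y) keeps the column bare.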

Date and Time Filtering:

Date filtering is one of the most common WHERE clause patterns, with several syntax variations:

date_filtering.sql
-- Date literals (ISO 8601 format is most portable)
SELECT * FROM orders WHERE order_date = '2024-03-15';
 
-- Date ranges (common for reports)
SELECT * FROM orders 
WHERE order_date >= '2024-01-01' AND order_date < '2024-04-01';
 
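-- Using BETWEEN for dates (inclusive on both ends)
-- Caution: if order_date is a timestamp, '2024-03-31' means midnight,
-- so rows later on March 31 are excluded; prefer the half-open range above
SELECT * FROM orders 
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';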
 
-- Current date comparisons
SELECT * FROM subscriptions WHERE expiry_date < CURRENT_DATE;  -- PostgreSQL/Standard
SELECT * FROM subscriptions WHERE expiry_date < CURDATE();     -- MySQL
 
-- Date arithmetic
SELECT * FROM orders 
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days';         -- PostgreSQL
 
SELECT * FROM orders 
WHERE order_date >= DATE_SUB(NOW(), INTERVAL 30 DAY);          -- MySQL
 
-- Timestamp with time zone considerations
SELECT * FROM events 
WHERE event_time >= '2024-03-15 00:00:00+00'                   -- UTC timestamp
  AND event_time < '2024-03-16 00:00:00+00';
 
-- Extracting date parts (caution: may prevent index use)
SELECT * FROM orders WHERE EXTRACT(MONTH FROM order_date) = 3;
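
The difference between BETWEEN and a half-open range only shows up once the column carries a time of day. The following sketch illustrates it with Python's sqlite3 module, which stores these values as ISO 8601 text; the table and rows are hypothetical. Engines with a native timestamp type reach the same result for a different reason: the literal '2024-03-31' is interpreted as midnight at the start of that day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, order_date TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        (1, "2024-01-01 00:00:00"),
        (2, "2024-03-31 14:30:00"),  # afternoon of the last day of Q1
        (3, "2024-04-01 00:00:00"),  # first instant of Q2
    ],
)

# BETWEEN with date-only bounds silently drops the afternoon order
between_ids = [row[0] for row in conn.execute(
    "SELECT id FROM orders "
    "WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31' ORDER BY id"
)]

# Half-open range: inclusive start, exclusive end at the next period
half_open_ids = [row[0] for row in conn.execute(
    "SELECT id FROM orders "
    "WHERE order_date >= '2024-01-01' AND order_date < '2024-04-01' ORDER BY id"
)]

print(between_ids)    # [1]
print(half_open_ids)  # [1, 2]
```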

String Filtering:

String comparisons involve case sensitivity, collation, and encoding considerations:

string_filtering.sql
-- Exact match (case sensitivity depends on collation)
SELECT * FROM users WHERE username = 'johndoe';
 
-- Case-insensitive match (explicit)
SELECT * FROM users WHERE LOWER(username) = LOWER('JohnDoe');  -- Wrapping the column may prevent use of a plain index
SELECT * FROM users WHERE username ILIKE 'johndoe';  -- PostgreSQL; % and _ in the pattern act as wildcards
 
-- Empty string vs NULL
SELECT * FROM users WHERE email = '';        -- Empty string
SELECT * FROM users WHERE email IS NULL;     -- NULL value
SELECT * FROM users WHERE COALESCE(email, '') = '';  -- Both
 
-- Unicode and encoding
SELECT * FROM products WHERE name = N'日本語';  -- SQL Server Unicode literal
 
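-- Collation-specific comparisons
-- The collation must belong to the column's character set
-- (e.g. utf8mb4_general_ci for a utf8mb4 column)
SELECT * FROM products 
WHERE name COLLATE utf8_general_ci = 'WIDGET';  -- MySQL case-insensitive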
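
The empty-string versus NULL distinction is easy to verify on a three-row table. This is a minimal sketch using Python's sqlite3 module with made-up data; the helper name `names` is an assumption for the example. Behavior differs in Oracle, which treats an empty VARCHAR2 string as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", ""), ("carol", None)],
)

def names(where):
    """Run a WHERE clause against users and return matching usernames."""
    sql = "SELECT username FROM users WHERE " + where + " ORDER BY username"
    return [row[0] for row in conn.execute(sql)]

empty_only = names("email = ''")               # only the empty string
null_only = names("email IS NULL")             # only the NULL
missing = names("COALESCE(email, '') = ''")    # both kinds of "no email"
not_empty = names("email <> ''")               # NULL row is excluded here too

print(empty_only)  # ['bob']
print(null_only)   # ['carol']
print(missing)     # ['bob', 'carol']
print(not_empty)   # ['alice']
```

Note that carol appears in neither `email = ''` nor `email <> ''`: both comparisons evaluate to UNKNOWN for a NULL email, so the row is filtered out each time.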

Collation Matters
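
As a small illustration of how collation changes the result of an equality test, the sketch below uses SQLite (via Python's sqlite3 module), whose default BINARY collation is case-sensitive and whose built-in NOCASE collation folds ASCII case. The table and rows are hypothetical, and collation names and defaults vary by engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("Widget",), ("WIDGET",), ("gizmo",)],
)

def count(where):
    """Count the rows of products that satisfy a WHERE clause."""
    sql = "SELECT COUNT(*) FROM products WHERE " + where
    return conn.execute(sql).fetchone()[0]

# Default (BINARY) collation: only the exact-case row matches
binary_matches = count("name = 'WIDGET'")

# Explicit case-insensitive collation: both spellings match
nocase_matches = count("name COLLATE NOCASE = 'WIDGET'")

print(binary_matches)  # 1
print(nocase_matches)  # 2
```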

Boolean and Expression Filtering

Boolean Column Filtering:

Boolean columns (TRUE/FALSE) allow concise filtering, but watch for NULL complications:

boolean_filtering.sql
-- Direct boolean filtering
SELECT * FROM users WHERE is_active = TRUE;
SELECT * FROM users WHERE is_active = FALSE;
 
-- Shorthand (column name alone is truthy in some databases)
SELECT * FROM users WHERE is_active;        -- PostgreSQL, MySQL
SELECT * FROM users WHERE NOT is_active;    -- Negation
 
-- Boolean with NULL handling
SELECT * FROM users WHERE is_verified IS TRUE;   -- Excludes NULL
SELECT * FROM users WHERE is_verified IS FALSE;  -- Excludes NULL
SELECT * FROM users WHERE is_verified IS NOT TRUE;  -- Includes FALSE and NULL
 
-- Converting to boolean
SELECT * FROM orders WHERE (total_amount > 0) = TRUE;
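
The NULL complications mentioned above become visible with one TRUE, one FALSE, and one NULL row. This sketch uses Python's sqlite3 module, where booleans are stored as 1 and 0 and the IS NOT TRUE form requires SQLite 3.23 or later; the table, data, and `names` helper are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, is_verified INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", 1), ("bob", 0), ("carol", None)],
)

def names(where):
    """Run a WHERE clause against users and return matching usernames."""
    sql = "SELECT username FROM users WHERE " + where + " ORDER BY username"
    return [row[0] for row in conn.execute(sql)]

verified = names("is_verified")                 # TRUE rows only
not_verified = names("NOT is_verified")         # FALSE rows only; NOT NULL is still NULL
is_not_true = names("is_verified IS NOT TRUE")  # FALSE and NULL rows

print(verified)      # ['alice']
print(not_verified)  # ['bob']
print(is_not_true)   # ['bob', 'carol']
```

The practical consequence is that `WHERE is_verified` and `WHERE NOT is_verified` together do not cover the whole table when the column is nullable.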

Computed Expression Filtering:

WHERE clauses can filter based on computed expressions, though with performance implications:

expression_filtering.sql
-- Arithmetic expressions
SELECT * FROM order_items 
WHERE quantity * unit_price > 1000;  -- Line items over $1000
 
-- String expressions
SELECT * FROM products 
WHERE LENGTH(description) > 500;  -- Long descriptions
 
-- Conditional expressions (CASE returning a boolean; works in PostgreSQL and MySQL)
SELECT * FROM employees WHERE (
    CASE 
        WHEN department = 'Engineering' THEN salary > 100000
        WHEN department = 'Sales' THEN commission > 10000
        ELSE FALSE
    END
);
 
-- Function-based filtering (AGE is PostgreSQL-specific)
SELECT * FROM users 
WHERE EXTRACT(YEAR FROM AGE(birth_date)) >= 18;  -- Adults only
 
-- Multi-column computations (NULLIF guards against division by zero)
SELECT * FROM products 
WHERE (regular_price - sale_price) / NULLIF(regular_price, 0) > 0.2;  -- 20%+ discount

Expression Performance Impact
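
One way to see the cost of wrapping a column in an expression is to compare query plans. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the table, the index name `idx_orders_date`, and the `plan` helper are assumptions for the example, and `strftime` stands in for EXTRACT, which SQLite does not provide. The exact plan wording varies between SQLite versions, so no literal output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, total REAL)"
)
conn.execute("CREATE INDEX idx_orders_date ON orders (order_date)")

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql)
    return " | ".join(row[3] for row in rows)

# Column wrapped in a function: the index on order_date cannot be used
wrapped = plan(
    "SELECT * FROM orders WHERE strftime('%m', order_date) = '03'"
)

# Column left bare in a range predicate: the index can be searched
naked = plan(
    "SELECT * FROM orders "
    "WHERE order_date >= '2024-03-01' AND order_date < '2024-04-01'"
)

print(wrapped)  # a full scan of orders
print(naked)    # a search that names idx_orders_date
```

The two queries are not strictly equivalent (the first matches March of any year, the second only March 2024), but the contrast in plans is the point: rewriting an expression filter as a range on the bare column is what lets the optimizer consider the index.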

Summary: The Power of Precise Filtering

The WHERE clause is deceptively simple in syntax but profound in impact. We've established the foundational concepts that make WHERE the workhorse of SQL querying.

Key Takeaways

•WHERE operates on individual rows — It filters rows before grouping, sorting, or aggregation occurs
•Three-valued logic governs predicates — NULL comparisons yield UNKNOWN, which is treated as FALSE for filtering purposes
•Sargability determines performance — Keep columns 'naked' in predicates to enable index usage
•Selectivity guides optimizer choices — More selective predicates (fewer matching rows) benefit most from indexes
•Data types shape predicate syntax — Dates, strings, numbers, and booleans have type-specific filtering patterns
•Statistics maintenance is critical — Stale statistics lead to poor query plans and degraded performance

