Two SQL queries can ask the exact same question, return identical results, and yet differ in execution time by orders of magnitude. The difference often lies not in what you query, but how you express the conditions.
The way you write WHERE clauses, function calls, and comparisons directly impacts whether the database can use indexes, apply query shortcuts, or must fall back to expensive operations. A single function call around the wrong column can transform a millisecond index seek into a multi-second full table scan.
This page reveals the hidden rules governing SQL expression performance—knowledge that separates queries that fly from queries that crawl.
By the end of this page, you'll understand sargability (the key property determining index usage), master patterns for function-safe expressions, recognize implicit conversion traps, and learn expression rewriting techniques that unlock optimizer power.
Sargability (Search ARGument ABLE) refers to whether an expression allows the database to efficiently seek through an index. A sargable condition can be evaluated using an index seek; a non-sargable condition requires scanning every row.
The Core Principle:
For an expression to be sargable, the indexed column must appear alone on one side of the comparison, without any functions, operators, or transformations applied to it.
```sql
-- Assume index on: orders.order_date

-- SARGABLE: Column alone on one side
SELECT * FROM orders WHERE order_date = '2024-01-15';
SELECT * FROM orders WHERE order_date > '2024-01-01';
SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31';
-- Execution: Index seek directly to matching values

-- NON-SARGABLE: Function applied to indexed column
SELECT * FROM orders WHERE YEAR(order_date) = 2024;
SELECT * FROM orders WHERE DATE(order_date) = '2024-01-15';
SELECT * FROM orders WHERE UPPER(customer_name) = 'SMITH';
SELECT * FROM orders WHERE order_total + tax > 1000;
-- Execution: Must scan EVERY row, apply function, then compare
```

Why Functions Break Sargability:
An index on order_date is organized by the raw column value. When you apply YEAR(order_date), the database must read every row, compute YEAR() on each order_date value, and only then compare the result to 2024. The index knows where '2024-01-01' is located, but it has no idea where YEAR(order_date) = 2024 begins—the function transforms the value into something the index wasn't built to navigate.
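A quick way to confirm whether a rewrite restored index usage is to compare execution plans. The sketch below uses PostgreSQL's EXPLAIN and EXTRACT syntax; the index name and the exact plan wording are illustrative and will vary by database and by your data.

```sql
-- Non-sargable: the function hides the column from the index
EXPLAIN SELECT * FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024;
-- Typical plan: Seq Scan on orders (every row read, filter applied afterwards)

-- Sargable rewrite: the bare column lets the planner seek the index
EXPLAIN SELECT * FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';
-- Typical plan: Index Scan using idx_orders_order_date (range seek)
```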
| Expression Pattern | Sargable? | Index Usage |
|---|---|---|
| column = value | ✅ Yes | Index seek (equality match) |
| column > value | ✅ Yes | Index range scan |
| column BETWEEN a AND b | ✅ Yes | Index range scan |
| column LIKE 'prefix%' | ✅ Yes | Index range scan |
| FUNCTION(column) = value | ❌ No | Full scan required |
| column LIKE '%suffix' | ❌ No | Full scan required |
| column + 1 = value | ❌ No | Full scan required |
| column1 = column2 | ⚠️ Sometimes | Depends on optimizer |
Having an index doesn't guarantee it will be used. A non-sargable expression on an indexed column forces a full scan despite the index existing. The expression must be written to 'unlock' index access.
Many non-sargable expressions can be rewritten into sargable equivalents without changing the result. The goal is to move the transformation to the value side, leaving the column bare.
Pattern 1: Date/Time Functions → Range Comparisons:
```sql
-- Non-sargable: Function on column
SELECT * FROM orders WHERE YEAR(order_date) = 2024;

-- Sargable: Range comparison
SELECT * FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

-- Non-sargable: Extract month
SELECT * FROM orders WHERE MONTH(order_date) = 3;

-- Sargable: Range for a specific year's March (all years would need a union of ranges)
SELECT * FROM orders
WHERE order_date >= '2024-03-01' AND order_date < '2024-04-01';

-- Non-sargable: Date part extraction
SELECT * FROM events WHERE DATE(start_timestamp) = '2024-05-15';

-- Sargable: Full day range
SELECT * FROM events
WHERE start_timestamp >= '2024-05-15 00:00:00'
  AND start_timestamp < '2024-05-16 00:00:00';
```

Pattern 2: String Functions → Pattern Matching or Data Cleanup:
```sql
-- Non-sargable: Case conversion on query
SELECT * FROM customers WHERE UPPER(email) = 'JOHN@EXAMPLE.COM';

-- Better: Store normalized data, query directly
-- (Ensure email is stored as lowercase during INSERT/UPDATE)
SELECT * FROM customers WHERE email = 'john@example.com';

-- Alternative: Functional index (database support varies)
CREATE INDEX idx_customers_email_upper ON customers(UPPER(email));
-- Now UPPER(email) = 'JOHN@EXAMPLE.COM' uses the index

-- Non-sargable: Substring extraction
SELECT * FROM products WHERE SUBSTRING(sku, 1, 3) = 'ABC';

-- Sargable: LIKE with prefix
SELECT * FROM products WHERE sku LIKE 'ABC%';

-- Non-sargable: LIKE with leading wildcard
SELECT * FROM products WHERE product_name LIKE '%phone%';

-- Partial solution: Full-text search index
CREATE FULLTEXT INDEX idx_product_name ON products(product_name);
SELECT * FROM products WHERE MATCH(product_name) AGAINST('phone');
```

Pattern 3: Arithmetic on Columns → Arithmetic on Values:
```sql
-- Non-sargable: Arithmetic on column
SELECT * FROM orders WHERE total * 1.1 > 100;

-- Sargable: Solve for column algebraically
SELECT * FROM orders WHERE total > 100 / 1.1;  -- total > 90.91

-- Non-sargable: Addition on column
SELECT * FROM products WHERE price + shipping > 50;

-- Sargable: Rearrange
SELECT * FROM products WHERE price > 50 - shipping;
-- Note: If shipping varies per row, this doesn't help
-- In that case, consider a computed column or materialized value

-- Non-sargable: Date arithmetic on column
SELECT * FROM subscriptions WHERE created_date + INTERVAL '30 days' < NOW();

-- Sargable: Arithmetic on value
SELECT * FROM subscriptions WHERE created_date < NOW() - INTERVAL '30 days';
```

Treat the comparison as an equation and 'solve for the column.' If FUNCTION(column) = value, can you express it as column = INVERSE_FUNCTION(value)? This moves the function to the value side where it's computed once, not per row.
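As a small additional sketch of the "solve for the column" idea: when the stored data has a known, consistent form, the transformation can be applied to the constant instead of the column. The table and column here are hypothetical, and the rewrite assumes country_code is always stored in uppercase.

```sql
-- Non-sargable: function applied to the indexed column, evaluated per row
SELECT * FROM customers WHERE LOWER(country_code) = 'us';

-- Sargable: the function moves to the constant and is folded once at parse time
SELECT * FROM customers WHERE country_code = UPPER('us');  -- i.e. country_code = 'US'
```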
Implicit type conversion occurs when comparing different data types. The database converts one type to match the other—but which type converts to which can break sargability unexpectedly.
The Conversion Hierarchy:
Databases have type precedence rules. When types mismatch, the lower-precedence type converts to the higher-precedence type. If this converts the column, sargability is lost.
```sql
-- Scenario: phone_number column is VARCHAR(20), indexed

-- Non-sargable: Numeric comparison causes column conversion
SELECT * FROM customers WHERE phone_number = 5551234567;
-- Database converts: CAST(phone_number AS BIGINT) = 5551234567
-- The CAST on the column breaks sargability (and may error on non-numeric data)

-- Sargable: String comparison, no conversion needed
SELECT * FROM customers WHERE phone_number = '5551234567';
-- Direct string comparison uses index

-- Scenario: order_id is INT, parameter is string

-- Usually sargable: String converts to INT (column type preserved)
SELECT * FROM orders WHERE order_id = '12345';
-- Becomes: order_id = CAST('12345' AS INT), i.e. order_id = 12345
-- Column not wrapped, index usable

-- Scenario: created_at is TIMESTAMP, parameter is DATE

-- Behavior varies by database:
SELECT * FROM events WHERE created_at = '2024-01-15';
-- PostgreSQL: '2024-01-15' becomes '2024-01-15 00:00:00', works as expected
-- Some databases: May apply DATE() to created_at, breaking sargability
```

Common Conversion Traps:
| Column Type | Value Type | What Happens | Sargable? |
|---|---|---|---|
| VARCHAR | INT | VARCHAR → INT (column converted) | ❌ No |
| INT | VARCHAR | VARCHAR → INT (value converted) | ✅ Yes |
| DATETIME | DATE | DATE → DATETIME (value converted) | ✅ Usually |
| VARCHAR(10) | VARCHAR(100) | May truncate or pad | ⚠️ Depends |
| CHAR(10) | VARCHAR | CHAR padded, comparison affected | ⚠️ Tricky |
| DECIMAL | FLOAT | Precision differences | ✅ Usually |
Safe Practices:
```sql
-- Always match parameter type to column type:

-- Column: customer_id INT
WHERE customer_id = 12345        -- Good: INT literal
WHERE customer_id = '12345'      -- Usually OK: string to int
WHERE customer_id = :param       -- Ensure param is INT in application

-- Column: email VARCHAR(255)
WHERE email = 'user@example.com' -- Good: String literal

-- Column: price DECIMAL(10,2)
WHERE price = 99.99              -- Good: Numeric literal

-- For prepared statements, use explicit bindings:
-- Java:    ps.setInt(1, customerId);
-- Python:  cursor.execute(sql, (customer_id,))
-- Node.js: connection.query(sql, [customerId])
```

Comparing a VARCHAR column to a numeric value is one of the most common sargability destroyers. The database converts the VARCHAR column to a number for each row, making index usage impossible. Always quote string values: WHERE phone = '555123', not WHERE phone = 555123.
When you must filter on a function result, function-based indexes (also called expression indexes) provide a solution. These indexes pre-compute the function value for each row.
Creating Function-Based Indexes:
```sql
-- PostgreSQL: Expression indexes
CREATE INDEX idx_orders_year ON orders (EXTRACT(YEAR FROM order_date));
-- Now this is sargable:
SELECT * FROM orders WHERE EXTRACT(YEAR FROM order_date) = 2024;

-- Case-insensitive search
CREATE INDEX idx_customers_email_lower ON customers (LOWER(email));
SELECT * FROM customers WHERE LOWER(email) = 'john@example.com';

-- MySQL: Generated columns + index
ALTER TABLE customers ADD email_lower VARCHAR(255)
  GENERATED ALWAYS AS (LOWER(email)) STORED;
CREATE INDEX idx_email_lower ON customers(email_lower);

-- SQL Server: Computed columns + index
ALTER TABLE customers ADD email_lower AS LOWER(email);
CREATE INDEX idx_email_lower ON customers(email_lower);

-- Oracle: Function-based indexes
CREATE INDEX idx_cust_email_upper ON customers(UPPER(email));
```

When to Use Function Indexes:

Function indexes make sense when a query must filter on a function result that can't be rewritten into a sargable form (for example, case-insensitive lookups where you can't normalize the stored data), and when the query runs often enough to justify the extra storage and write overhead described below.
Cost Considerations:
```sql
-- Function indexes have trade-offs:

-- Cost 1: Storage space (duplicates computed values)
-- Cost 2: INSERT/UPDATE overhead (must compute function)
-- Cost 3: Maintenance (additional index to rebuild)

-- Example sizing:
-- Table: 10 million rows, email VARCHAR(255)
-- Function index on LOWER(email): ~300 MB additional storage
-- INSERT overhead: ~0.1ms per insert for function computation

-- Benefit calculation:
-- Queries using LOWER(email): 10,000/day
-- Without index: 2 seconds each (full scan) = 5.5 hours/day
-- With index: 5 ms each = 50 seconds/day
-- Clear win for high-frequency queries
```

Function indexes only work when the query expression exactly matches the indexed expression. CREATE INDEX ON (LOWER(email)) helps LOWER(email) = 'x' but NOT TRIM(LOWER(email)) = 'x' or email = UPPER('x'). Match the expression precisely.
OR expressions present special optimization challenges. While AND conditions narrow results (making indexes more useful), OR conditions widen results and can prevent index usage entirely.
OR on Same Column (Usually Optimizable):
```sql
-- OR on same column often converts to IN
SELECT * FROM orders
WHERE status = 'pending'
   OR status = 'processing'
   OR status = 'shipped';

-- Equivalent and often faster:
SELECT * FROM orders
WHERE status IN ('pending', 'processing', 'shipped');

-- Optimizer typically handles this, but explicit IN is cleaner
-- Both can use index on 'status' via multiple seeks or a range scan
```

OR on Different Columns (Problematic):
```sql
-- OR on different columns breaks simple index usage
SELECT * FROM customers
WHERE email = 'john@example.com' OR phone = '555-1234';

-- With separate indexes on email and phone:
-- Option 1: Full scan (can't use either index cleanly)
-- Option 2: Index union (scan both indexes, combine results)

-- Force index union (if optimizer doesn't choose it):
-- PostgreSQL
SELECT * FROM customers WHERE email = 'john@example.com'
UNION
SELECT * FROM customers WHERE phone = '555-1234';

-- This uses the index on email for the first query, the index on phone for the second
-- UNION automatically removes duplicates

-- For large result sets, the union can be faster than a full scan
```

OR with AND Mixed (Precedence Issues):
```sql
-- AND has higher precedence than OR
-- This expression:
WHERE status = 'active' AND region = 'West' OR region = 'East'

-- Is parsed as:
WHERE (status = 'active' AND region = 'West') OR region = 'East'
-- Returns: Active customers in West, OR ANY customer in East

-- Probably intended:
WHERE status = 'active' AND (region = 'West' OR region = 'East')
-- Returns: Active customers in West or East

-- Cleaner still with IN:
WHERE status = 'active' AND region IN ('West', 'East')

-- For complex OR conditions, consider UNION:
SELECT * FROM customers WHERE status = 'active' AND region = 'West'
UNION ALL
SELECT * FROM customers WHERE status = 'active' AND region = 'East';
```

| Pattern | Problem | Solution |
|---|---|---|
| OR on same column | Verbose | Convert to IN clause |
| OR on different indexed columns | Can't use single index | UNION of separate queries |
| OR with NULL check | NULL handling complexity | COALESCE or rewrite logic |
| OR in JOIN condition | Complex join handling | Restructure as separate joins with UNION |
| Nested OR/AND | Precedence confusion | Use explicit parentheses |
When converting OR to UNION, use UNION ALL if you're certain queries don't produce duplicates. UNION ALL is faster because it doesn't need to deduplicate. UNION removes duplicates (like the OR semantic), but with sort/hash overhead.
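To make the trade-off concrete, here is a sketch using the earlier examples; the specific rows are hypothetical, and the point is whether one row can satisfy both branches.

```sql
-- A customer whose email AND phone both match satisfies the original OR once,
-- but appears in both branches of the rewrite, so duplicates must be removed:
SELECT * FROM customers WHERE email = 'john@example.com'
UNION                      -- deduplicates (matches OR semantics, costs a sort/hash)
SELECT * FROM customers WHERE phone = '555-1234';

-- A row can't be in both 'West' and 'East', so the branches are disjoint
-- and UNION ALL is safe (and cheaper, with no deduplication step):
SELECT * FROM customers WHERE status = 'active' AND region = 'West'
UNION ALL
SELECT * FROM customers WHERE status = 'active' AND region = 'East';
```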
NULL handling in SQL follows three-valued logic (TRUE, FALSE, UNKNOWN) and can create both correctness and performance issues. Understanding NULL behavior prevents subtle bugs and enables better optimization.
NULL in Comparisons:
```sql
-- NULL is not equal to anything, including itself
SELECT * FROM customers WHERE discount = NULL;   -- Returns 0 rows always!
SELECT * FROM customers WHERE discount <> NULL;  -- Also returns 0 rows!

-- Correct NULL checks:
SELECT * FROM customers WHERE discount IS NULL;      -- Matches NULL values
SELECT * FROM customers WHERE discount IS NOT NULL;  -- Excludes NULL values

-- NULL with equality: UNKNOWN result fails WHERE
-- WHERE NULL = NULL → UNKNOWN → row excluded
-- WHERE NULL <> 1  → UNKNOWN → row excluded

-- Safe equality with COALESCE:
SELECT * FROM orders
WHERE COALESCE(discount_code, '') = COALESCE(:param, '');
-- Treats NULL as empty string for comparison
```

NULL-Safe Equality Operators:
Some databases provide NULL-safe comparison operators:
```sql
-- MySQL: NULL-safe equality (<=>)
SELECT * FROM customers WHERE discount <=> :param;
-- Returns TRUE if both NULL, or both equal non-NULL

-- PostgreSQL: IS NOT DISTINCT FROM
SELECT * FROM customers WHERE discount IS NOT DISTINCT FROM :param;
-- Same behavior: NULL matches NULL

-- SQL Standard: IS [NOT] DISTINCT FROM
SELECT * FROM orders WHERE status IS NOT DISTINCT FROM 'pending';

-- Without NULL-safe operators, use the verbose form:
SELECT * FROM customers
WHERE (discount = :param)
   OR (discount IS NULL AND :param IS NULL);
```

NULL and Indexes:
```sql
-- PostgreSQL, MySQL: NULLs included in B-tree indexes
CREATE INDEX idx_discount ON orders(discount);
SELECT * FROM orders WHERE discount IS NULL;  -- Uses index

-- Oracle: NULLs NOT in B-tree indexes (single-column)
-- This query cannot use a single-column index:
SELECT * FROM orders WHERE discount IS NULL;  -- Full scan

-- Oracle workaround: Composite index includes NULLs
CREATE INDEX idx_discount ON orders(discount, 0);
-- Now NULL discount values are indexed

-- Partial index for IS NOT NULL (PostgreSQL):
CREATE INDEX idx_orders_discount_nonnull ON orders(discount)
  WHERE discount IS NOT NULL;
-- Smaller index, only for non-NULL queries
```

Performance Impact of COALESCE/NVL:
```sql
-- COALESCE on column breaks sargability:
SELECT * FROM orders WHERE COALESCE(discount, 0) > 0;
-- Cannot use index on discount (function applied)

-- Sargable alternative:
SELECT * FROM orders WHERE discount > 0;
-- NULL values naturally excluded (NULL > 0 is UNKNOWN)

-- If you need NULLs treated as a specific value, consider:
-- 1. Store the default value instead of NULL
-- 2. Use a computed column: discount_effective AS COALESCE(discount, 0)
-- 3. Use a partial index or expression index
```

COUNT(*) counts all rows. COUNT(column) counts non-NULL values only. This semantic difference affects results when NULLs exist. For counting rows, use COUNT(*). For counting values present, use COUNT(column).
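A short illustration of the difference, assuming an orders table where some rows have a NULL discount (the counts shown are hypothetical):

```sql
SELECT COUNT(*)        AS all_rows,         -- counts every row, NULL or not
       COUNT(discount) AS discounted_rows   -- counts only rows with a non-NULL discount
FROM orders;
-- If 3 of 10 orders have a NULL discount: all_rows = 10, discounted_rows = 7
```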
The optimizer simplifies many expressions, but complex logic can hide opportunities. Manual simplification makes code clearer and helps the optimizer.
Constant Folding:
The optimizer pre-computes expressions involving only constants:
```sql
-- These are computed once at parse time:
WHERE created_at > CURRENT_DATE - INTERVAL '30 days'
-- Computed once: '2024-01-15' (example)

WHERE price > 100 * 1.1 + 5
-- Computed once: 115

-- But watch out for volatile functions:
WHERE created_at > CURRENT_TIMESTAMP - INTERVAL '1 hour'
-- CURRENT_TIMESTAMP may be evaluated once or per row
-- Behavior varies by database - check documentation
```

Redundant Condition Elimination:
```sql
-- Optimizer removes redundant conditions:
WHERE age > 18 AND age >= 10
-- Simplified to: WHERE age > 18 (> 18 implies >= 10)

WHERE status = 'active' AND status = 'active'
-- Simplified to: WHERE status = 'active'

-- But may not simplify complex redundancy:
WHERE category_id = 5 AND category_name = 'Electronics'
-- Both kept even if category_name is functionally dependent on category_id
-- Manual simplification helps if you know the dependency

-- Impossible conditions can short-circuit:
WHERE status = 'active' AND status = 'inactive'
-- Optimizer may detect this returns 0 rows (contradiction)
```

CASE Expression Optimization:
```sql
-- CASE in WHERE can prevent index usage:
SELECT * FROM orders
WHERE CASE
    WHEN :param_type = 'id' THEN order_id = :param_value
    WHEN :param_type = 'status' THEN status = :param_value
END;
-- This is one statement with a complex predicate - hard to optimize

-- Better: Use application logic to build dynamic SQL
-- Or use OR with static conditions:
SELECT * FROM orders
WHERE (:param_type = 'id' AND order_id = :param_value)
   OR (:param_type = 'status' AND status = :param_value);
-- Still not great, but more explicit

-- Best: Separate queries per type in the application
```

```javascript
if (paramType === 'id') {
  query = 'SELECT * FROM orders WHERE order_id = ?';
} else {
  query = 'SELECT * FROM orders WHERE status = ?';
}
```

IN List Optimization:
```sql
-- Small IN lists are fast:
WHERE status IN ('A', 'B', 'C')  -- Typically 3 index seeks or a range scan

-- Very large IN lists can be problematic:
WHERE product_id IN (1, 2, 3, ..., 10000)
-- Parse time increases
-- May exceed parameter limits
-- Consider alternatives:

-- Option 1: Temporary table
CREATE TEMPORARY TABLE filter_ids (id INT PRIMARY KEY);
INSERT INTO filter_ids VALUES (1), (2), ... (10000);
SELECT * FROM products p JOIN filter_ids f ON f.id = p.product_id;

-- Option 2: VALUES clause (PostgreSQL, modern MySQL)
SELECT * FROM products p
WHERE p.product_id IN (
    SELECT * FROM (VALUES (1), (2), (3), ...) AS v(id)
);

-- Option 3: Array parameter (PostgreSQL)
SELECT * FROM products WHERE product_id = ANY(:ids_array);
```

Most databases have practical limits on IN clause size (SQL Server: 2,100 parameters; Oracle: 1,000 values in an IN list). For large value sets, use temporary tables or array parameters. The join approach is often faster than long IN lists anyway.
While optimizers reorder predicates for efficiency, understanding evaluation order helps write clearer queries and avoid errors.
Optimizer Predicate Reordering:
```sql
-- You write:
WHERE department = 'Engineering'
  AND salary > 100000
  AND active = true

-- Optimizer may evaluate in a different order based on:
-- 1. Selectivity: Most selective predicate first (eliminates most rows)
-- 2. Index availability: Indexed predicates preferred
-- 3. Cost: Cheaper predicates evaluated first

-- If 'active' is most selective and indexed, optimizer may use:
-- 1. Index scan on active = true (100 rows)
-- 2. Filter by department = 'Engineering' (50 rows)
-- 3. Filter by salary > 100000 (10 rows)

-- Your written order doesn't constrain the optimizer
```

Short-Circuit Evaluation (Limited in SQL):
```sql
-- Unlike programming languages, SQL doesn't guarantee short-circuit evaluation

-- The guard written first doesn't guarantee the division is skipped:
WHERE quantity <> 0 AND amount / quantity > 10
-- The SQL standard allows evaluating predicates in any order
-- If the division is evaluated before the guard: division-by-zero error possible

-- Safe version using CASE ensures order:
WHERE CASE
    WHEN amount IS NULL THEN FALSE
    WHEN quantity = 0 THEN FALSE
    ELSE amount / quantity > 10
END

-- Or use NULLIF to avoid division by zero:
WHERE amount IS NOT NULL AND amount / NULLIF(quantity, 0) > 10
-- NULLIF(quantity, 0) returns NULL if quantity is 0
-- NULL / anything = NULL, and NULL > 10 = UNKNOWN = excluded
```

Filter Pushdown in CTEs and Subqueries:
```sql
-- Optimizer may push filters into CTEs/subqueries

WITH expensive_calculation AS (
    SELECT customer_id,
           SUM(total) AS lifetime_value,
           COUNT(*) AS order_count
    FROM orders
    GROUP BY customer_id
)
SELECT * FROM expensive_calculation
WHERE customer_id = 12345;

-- Smart optimizer pushes the filter into the CTE:
-- Only aggregates for customer 12345

-- But materialization prevents pushdown:
WITH expensive_calculation AS MATERIALIZED (  -- PostgreSQL
    SELECT customer_id, ...
)
SELECT * FROM expensive_calculation
WHERE customer_id = 12345;
-- CTE computed for ALL customers first, then filtered
```

| Practice | Why It Helps |
|---|---|
| Put indexed columns in predicates | Enables index usage |
| Avoid functions on columns | Preserves sargability |
| Use IS NOT NULL explicitly | Documents intent, helps optimizer |
| Separate expensive predicates | May enable parallel evaluation |
| Match column types in comparisons | Avoids implicit conversion |
| Use CASE for order-dependent logic | Guarantees evaluation order |
The order you write predicates is for human readability. Put the most important or restrictive conditions first for clarity, but know the optimizer will reorder based on its cost analysis. Write for humans; let the optimizer write for the database.
How you write SQL expressions directly impacts query performance. The patterns and principles in this page apply to every query you write. Let's consolidate the key insights:

- Keep indexed columns bare: move functions and arithmetic to the value side of the comparison so conditions stay sargable.
- Rewrite date/time extractions as range comparisons and substring checks as prefix LIKE patterns.
- Match parameter types to column types; implicit conversion of the column disables index usage.
- When a function filter can't be rewritten, use a function-based (expression) index that exactly matches the query expression.
- Convert OR on the same column to IN, and consider UNION for OR across different indexed columns.
- Use IS NULL / IS NOT NULL (or NULL-safe operators) instead of = NULL, and avoid wrapping indexed columns in COALESCE.
- Write predicates for readability; the optimizer chooses the evaluation order, so use CASE or NULLIF when order matters for correctness.
Module Complete:
You've completed the Query Optimization Tips module. You now understand how to avoid SELECT *, limit result sets efficiently, write performant joins and subqueries, and craft expressions that unlock optimizer power. These techniques, applied consistently, transform slow queries into fast ones and prevent performance problems before they occur.
Congratulations! You've mastered practical query optimization techniques. From column selection to expression crafting, you now have the knowledge to write SQL that performs efficiently at any scale. Apply these patterns in your daily work, and slow queries will become rare exceptions rather than daily frustrations.