Common Performance Issues - Learning Module

Loading content...

0/241

Functions on Indexed Columns

The Index-Blinding Trap

You've created a perfect index. The column it covers is exactly what your WHERE clause needs. You run the query—and the database ignores your index completely, scanning the entire table instead.

The culprit? A function applied to the indexed column.

This is one of the most common and frustrating performance anti-patterns in SQL. It's frustrating because:

The index exists — You did the right thing by creating it
The query looks reasonable — There's no obvious error
The fix is usually simple — Once you understand the problem

The issue is that functions transform column values, and the database can't navigate an index using transformed values. It's like trying to use a phone book organized by last name to find someone by their initials—the organizational structure exists, but it can't help with your specific search pattern.

This pattern is so common that it has earned its own name: the 'non-SARGable predicate' problem. SARG stands for Search ARGument, and a SARGable predicate is one that can be evaluated using an index seek. Applying a function to a column makes the predicate non-SARGable.

What You Will Learn

By the end of this page, you will understand exactly why functions prevent index usage, recognize the most common function-on-column patterns, master rewrite techniques to restore index eligibility, and learn when function-based indexes can solve problems that rewrites cannot.

Why Functions Prevent Index Usage

To understand why functions break index lookups, we need to examine how indexes enable efficient searching.

How B-Tree Index Seeks Work:

A B-tree index stores column values in sorted order. When you search for WHERE email = 'john@example.com', the database:

Starts at the root node
Compares 'john@example.com' against the node's values
Follows the appropriate branch (left, right, or an exact match)
Repeats until reaching a leaf node
Either finds the matching entry or determines it doesn't exist

This navigation is possible because the search value and the indexed values are directly comparable. The database knows that 'john@example.com' sorts between 'james@example.com' and 'kate@example.com' without examining every value.

The Function Problem:

When you write WHERE UPPER(email) = 'JOHN@EXAMPLE.COM', you're not searching for an email value—you're searching for the result of UPPER() applied to email values.

The index stores:

You're searching for 'JOHN@EXAMPLE.COM'. The database cannot navigate the index because:

The indexed values are the original values, not UPPER() results
To compare, it must UPPER() each indexed value
That means reading and transforming every single value
Which is exactly a full scan—the index provides no benefit

function_on_column_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- Table with indexed column
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    hire_date DATE NOT NULL
);
 
CREATE INDEX idx_employees_email ON employees(email);
CREATE INDEX idx_employees_hire_date ON employees(hire_date);
 
-- SEARCHABLE (SARGable) - Uses index seek
SELECT * FROM employees WHERE email = 'john.doe@company.com';
-- Database navigates index directly to 'john.doe@company.com'
 
-- NON-SEARCHABLE (Non-SARGable) - Full scan despite index
SELECT * FROM employees WHERE UPPER(email) = 'JOHN.DOE@COMPANY.COM';
-- Database must:
-- 1. Read every row in the index/table
-- 2. Apply UPPER() to each email value
-- 3. Compare result to 'JOHN.DOE@COMPANY.COM'
-- The index exists but provides NO navigational benefit
 
-- Another non-SARGable example
SELECT * FROM employees WHERE YEAR(hire_date) = 2023;
-- Must evaluate YEAR() on every hire_date value
-- Cannot seek to "2023" in the hire_date index
 
-- The fundamental rule:
-- function(column) = value    ← NON-SARGable, full scan
-- column = value              ← SARGable, index seek
-- column = function(value)    ← SARGable, index seek (function on the right is OK)

The SARGability Rule

A predicate is SARGable (index-eligible) when the indexed column appears 'naked' on one side of the comparison. Any expression, function, or operation that wraps the column makes the predicate non-SARGable. Functions on the other side of the comparison (the value side) are fine—those are evaluated once, not per row.

Converting Mermaid diagram...

Common Function-on-Column Patterns

Function-on-column anti-patterns appear in many forms. Some are obvious; others are subtle. Let's examine the most common offenders.

Pattern 1: String Case Functions

The most common pattern—using UPPER(), LOWER(), or similar functions for case-insensitive searches:

string_case_functions.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- PROBLEM: Case-insensitive search using function
SELECT * FROM customers WHERE UPPER(email) = UPPER('John@Example.com');
SELECT * FROM products WHERE LOWER(name) LIKE LOWER('wireless%');
 
-- Why this happens:
-- Developers want case-insensitive matching
-- Default collations are often case-sensitive
-- UPPER/LOWER seems like an obvious solution
 
-- The performance cost:
-- 1 million customers → 1 million UPPER() calls
-- Index completely ignored
 
-- SOLUTIONS:
 
-- Solution 1: Use case-insensitive collation at column level
ALTER TABLE customers 
ALTER COLUMN email VARCHAR(255) COLLATE Latin1_General_CI_AS;
-- CI = Case Insensitive
-- Now: WHERE email = 'John@Example.com' works case-insensitively
 
-- Solution 2: Use case-insensitive collation in query (SQL Server)
SELECT * FROM customers 
WHERE email COLLATE Latin1_General_CI_AS = 'John@Example.com';
 
-- Solution 3: PostgreSQL ILIKE for case-insensitive LIKE
SELECT * FROM products WHERE name ILIKE 'wireless%';
-- Still may not use index efficiently; consider citext extension
 
-- Solution 4: Store normalized version in separate indexed column
ALTER TABLE customers ADD email_lower VARCHAR(255);
UPDATE customers SET email_lower = LOWER(email);
CREATE INDEX idx_customers_email_lower ON customers(email_lower);
-- Query: WHERE email_lower = LOWER('John@Example.com')

Pattern 2: Date/Time Extraction Functions

Extracting components from dates (year, month, day) is extremely common and extremely problematic:

datetime_functions.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
-- PROBLEMS: Extracting date components
 
-- Finding all orders from 2023
SELECT * FROM orders WHERE YEAR(order_date) = 2023;
 
-- Finding all events in March
SELECT * FROM events WHERE MONTH(event_date) = 3;
 
-- Finding Monday events
SELECT * FROM events WHERE DATEPART(dw, event_date) = 2;
 
-- All of these apply functions to the indexed column
-- All trigger full table scans
 
-- SOLUTIONS: Rewrite as range predicates
 
-- Solution for YEAR() - convert to range
SELECT * FROM orders 
WHERE order_date >= '2023-01-01' 
  AND order_date < '2024-01-01';
-- Index on order_date can seek to start of 2023, scan until 2024
 
-- Solution for MONTH() - also range (but may span years)
SELECT * FROM events
WHERE (order_date >= '2023-03-01' AND order_date < '2023-04-01')
   OR (order_date >= '2024-03-01' AND order_date < '2024-04-01');
-- Multiple ranges if spanning multiple years
 
-- Alternative: Computed column for month (if frequent query pattern)
ALTER TABLE events ADD event_month AS MONTH(event_date) PERSISTED;
CREATE INDEX idx_events_month ON events(event_month);
-- Now: WHERE event_month = 3 uses the index
 
-- Solution for day of week - harder, may need computed column
ALTER TABLE events ADD event_dow AS DATEPART(dw, event_date) PERSISTED;
CREATE INDEX idx_events_dow ON events(event_dow);

Pattern 3: String Manipulation Functions

Substring extraction, trimming, and concatenation on indexed columns:

string_manipulation.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- PROBLEMS: String manipulation
 
-- Finding by area code (first 3 chars of phone)
SELECT * FROM customers WHERE LEFT(phone, 3) = '415';
SELECT * FROM customers WHERE SUBSTRING(phone, 1, 3) = '415';
 
-- Finding SKUs in a category (prefix match)
SELECT * FROM products WHERE LEFT(sku, 2) = 'EL';
 
-- Trimming whitespace
SELECT * FROM items WHERE TRIM(code) = 'ABC123';
 
-- SOLUTIONS:
 
-- For prefix matching: Use LIKE with trailing wildcard
SELECT * FROM customers WHERE phone LIKE '415%';
-- B-tree indexes support prefix matching (leading wildcard is the problem)
 
SELECT * FROM products WHERE sku LIKE 'EL%';
-- Index seek to 'EL', scan until values no longer start with 'EL'
 
-- For trim issues: Clean data on insert, not on query
-- Add CHECK constraint and trigger to ensure no leading/trailing whitespace
ALTER TABLE items ADD CONSTRAINT chk_code_trimmed 
CHECK (code = TRIM(code));
 
-- If legacy data has whitespace, clean it once:
UPDATE items SET code = TRIM(code) WHERE code != TRIM(code);
 
-- Alternatively, create computed column
ALTER TABLE items ADD code_trimmed AS TRIM(code) PERSISTED;
CREATE INDEX idx_items_code_trimmed ON items(code_trimmed);

Pattern 4: Mathematical Operations

Arithmetic on indexed numeric columns:

math_operations.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- PROBLEMS: Arithmetic on columns
 
-- Finding products with 20% margin
SELECT * FROM products WHERE price - cost > price * 0.20;
-- Operations on both price and cost columns
 
-- Finding transactions over a threshold after fee
SELECT * FROM transactions WHERE amount - 2.50 > 100;
 
-- Dividing to find rate
SELECT * FROM metrics WHERE total / count > 50;
 
-- SOLUTIONS: Algebraic manipulation to isolate column
 
-- Margin calculation: Rearrange inequality
-- price - cost > price * 0.20
-- price - cost > 0.2 * price
-- price - 0.2 * price > cost
-- 0.8 * price > cost
-- price > cost / 0.8
-- price > cost * 1.25
 
SELECT * FROM products WHERE price > cost * 1.25;
-- Still has operation on cost column...
-- If you can't avoid operations, consider computed column
 
-- Transaction threshold: Move constant to right side
-- amount - 2.50 > 100
-- amount > 102.50
SELECT * FROM transactions WHERE amount > 102.50;
-- Now index on amount can be used!
 
-- Rate calculation: Multiply both sides
-- total / count > 50
-- total > 50 * count
SELECT * FROM metrics WHERE total > 50 * count;
-- Operation still on count... 
-- For complex cases, computed columns may be necessary

Hidden Functions in ORMs

Object-Relational Mappers (ORMs) sometimes generate non-SARGable queries without you realizing it. Methods like .ToLower(), .Trim(), .Year, or date manipulations in LINQ/Entity Framework/Hibernate may translate to function calls on columns. Always inspect generated SQL for production-critical queries.

Rewriting Non-SARGable Predicates

The primary solution for function-on-column problems is query rewriting—restructuring the predicate so the indexed column appears 'naked' on one side of the comparison.

The General Principle:

NON-SARGable: function(column) = value
SARGable:     column = inverse_function(value)  -- if inverse exists
SARGable:     column IN (range derived from value)

Let's work through systematic rewrite techniques:

Common Non-SARGable to SARGable Transformations
Non-SARGable Pattern	SARGable Rewrite	Notes
`YEAR(date_col) = 2023`	`date_col >= '2023-01-01' AND date_col < '2024-01-01'`	Range predicate on indexed column
`MONTH(date_col) = 6`	`date_col >= '2023-06-01' AND date_col < '2023-07-01'`	Need to specify year(s)
`DATE(datetime_col) = '2023-05-15'`	`datetime_col >= '2023-05-15' AND datetime_col < '2023-05-16'`	Range for exact date match
`LEFT(col, 3) = 'ABC'`	`col LIKE 'ABC%'`	Prefix match uses index
`SUBSTRING(col, 1, 2) = 'XY'`	`col LIKE 'XY%'`	Same as LEFT()
`UPPER(col) = 'VALUE'`	`col = 'VALUE'` with CI collation	Or use citext/function index
`col + 10 = 50`	`col = 40`	Move constant to right side
`col * 2 > 100`	`col > 50`	Divide both sides
`COALESCE(col, 0) = 5`	`col = 5`	If seeking non-NULL anyway
`DATEDIFF(day, col, GETDATE()) < 7`	`col > DATEADD(day, -7, GETDATE())`	Compute boundary date instead

Deep Dive: Date Range Rewrites

Date-related functions are the most frequently encountered non-SARGable patterns. Here's a comprehensive rewrite reference:

date_range_rewrites.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
-- ========================================
-- YEAR() extraction
-- ========================================
 
-- Non-SARGable
SELECT * FROM orders WHERE YEAR(order_date) = 2023;
 
-- SARGable
SELECT * FROM orders 
WHERE order_date >= '2023-01-01' 
  AND order_date < '2024-01-01';
 
-- ========================================
-- MONTH() extraction (requires year context)
-- ========================================
 
-- Non-SARGable
SELECT * FROM orders WHERE MONTH(order_date) = 12;
 
-- SARGable (assuming specific year or range of years)
SELECT * FROM orders 
WHERE order_date >= '2023-12-01' AND order_date < '2024-01-01';
 
-- For all Decembers across multiple years:
SELECT * FROM orders 
WHERE (order_date >= '2022-12-01' AND order_date < '2023-01-01')
   OR (order_date >= '2023-12-01' AND order_date < '2024-01-01');
-- Consider partitioning or computed column if this is common
 
-- ========================================
-- DAY() extraction
-- ========================================
 
-- Non-SARGable (find all orders on 15th of any month)
SELECT * FROM orders WHERE DAY(order_date) = 15;
 
-- This is hard to rewrite efficiently
-- Options:
-- 1. Computed persisted column: ALTER TABLE ADD order_day AS DAY(order_date)
-- 2. Accept full scan for rare queries
-- 3. Partition by day if pattern is critical
 
-- ========================================
-- DATE() or CAST to DATE from DATETIME
-- ========================================
 
-- Non-SARGable
SELECT * FROM events WHERE DATE(event_time) = '2023-06-15';
SELECT * FROM events WHERE CAST(event_time AS DATE) = '2023-06-15';
 
-- SARGable
SELECT * FROM events 
WHERE event_time >= '2023-06-15 00:00:00'
  AND event_time < '2023-06-16 00:00:00';
 
-- Or more precisely:
SELECT * FROM events 
WHERE event_time >= '2023-06-15'
  AND event_time < DATEADD(day, 1, '2023-06-15');
 
-- ========================================
-- DATEDIFF() for "recent" records
-- ========================================
 
-- Non-SARGable (find orders in last 30 days)
SELECT * FROM orders 
WHERE DATEDIFF(day, order_date, GETDATE()) <= 30;
 
-- SARGable (compute the boundary date once)
SELECT * FROM orders 
WHERE order_date >= DATEADD(day, -30, GETDATE());
-- DATEADD is computed once; order_date is naked
 
-- ========================================
-- Time component extraction
-- ========================================
 
-- Non-SARGable (find events at noon)
SELECT * FROM events WHERE DATEPART(hour, event_time) = 12;
 
-- SARGable requires range within each day
-- Complex if spanning multiple days; consider computed column
 
-- ========================================
-- Date arithmetic
-- ========================================
 
-- Non-SARGable
SELECT * FROM subscriptions WHERE end_date + 30 >= GETDATE();
 
-- SARGable (move constant to right side)
SELECT * FROM subscriptions WHERE end_date >= DATEADD(day, -30, GETDATE());

The 'Function on Value, Not Column' Rule

Functions on the VALUE side of a comparison are fine—they're evaluated once, not per row. WHERE order_date >= DATEADD(day, -30, GETDATE()) is SARGable because DATEADD is applied to GETDATE(), not to order_date. The result is computed once and used for index navigation.

Computed Columns and Function-Based Indexes

When query rewrites are impossible or impractical, computed columns and function-based indexes can make non-SARGable patterns searchable.

Computed/Generated Columns:

A computed column stores the result of an expression, allowing you to index it like any other column:

computed_columns.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
-- SQL Server: Persisted Computed Column
-- The PERSISTED keyword stores the computed value physically
ALTER TABLE employees
ADD hire_year AS YEAR(hire_date) PERSISTED;
 
CREATE INDEX idx_employees_hire_year ON employees(hire_year);
 
-- Now this query uses the index:
SELECT * FROM employees WHERE hire_year = 2023;
-- Translates to seeking on idx_employees_hire_year
 
-- PostgreSQL: Generated Column (stored)
ALTER TABLE employees
ADD COLUMN hire_year INT GENERATED ALWAYS AS (EXTRACT(YEAR FROM hire_date)) STORED;
 
CREATE INDEX idx_employees_hire_year ON employees(hire_year);
 
-- MySQL 5.7+: Generated Column
ALTER TABLE employees
ADD COLUMN hire_year INT AS (YEAR(hire_date)) STORED;
 
CREATE INDEX idx_employees_hire_year ON employees(hire_year);
 
-- ========================================
-- Common computed column patterns
-- ========================================
 
-- Case-insensitive search column
ALTER TABLE customers
ADD email_lower AS LOWER(email) PERSISTED;
CREATE INDEX idx_customers_email_lower ON customers(email_lower);
-- Query: WHERE email_lower = LOWER(@input)
 
-- Area code extraction
ALTER TABLE contacts
ADD phone_area_code AS LEFT(phone, 3) PERSISTED;
CREATE INDEX idx_contacts_area_code ON contacts(phone_area_code);
-- Query: WHERE phone_area_code = '415'
 
-- Full name for searching
ALTER TABLE users
ADD full_name AS CONCAT(first_name, ' ', last_name) PERSISTED;
CREATE INDEX idx_users_full_name ON users(full_name);
-- Query: WHERE full_name LIKE 'John Doe%'

Function-Based Indexes (Expression Indexes):

Some databases support indexes directly on expressions, without requiring a computed column:

function_based_indexes.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- PostgreSQL: Expression Index
CREATE INDEX idx_customers_email_lower 
ON customers(LOWER(email));
 
-- The optimizer automatically uses this index when query matches:
SELECT * FROM customers WHERE LOWER(email) = 'john@example.com';
-- No query rewrite needed; index directly supports the expression
 
-- More PostgreSQL expression index examples
CREATE INDEX idx_orders_year ON orders(EXTRACT(YEAR FROM order_date));
CREATE INDEX idx_products_name_upper ON products(UPPER(name));
CREATE INDEX idx_events_date ON events(DATE(event_time));
 
-- Oracle: Function-Based Index
CREATE INDEX idx_customers_email_lower 
ON customers(LOWER(email));
 
CREATE INDEX idx_employees_hire_year 
ON employees(EXTRACT(YEAR FROM hire_date));
 
-- The function in the index MUST match the function in the query exactly
-- LOWER(email) index won't help with UPPER(email) query
 
-- MySQL 8.0+: Functional Index (as of 8.0.13)
CREATE INDEX idx_customers_email_lower 
ON customers((LOWER(email)));
-- Note the extra parentheses around the expression
 
CREATE INDEX idx_orders_year 
ON orders((YEAR(order_date)));
 
-- ========================================
-- Virtual vs Persisted/Stored
-- ========================================
 
-- Virtual computed columns (SQL Server, non-persisted):
ALTER TABLE employees ADD hire_year AS YEAR(hire_date);
-- Not stored; computed on each access
-- CAN be indexed, but index stores the value anyway
 
-- Expression indexes don't store a column, just the index:
-- Storage: Only index structure, not an extra column
-- Downside: Can't SELECT the computed value directly

Expression Must Match Exactly

For a function-based index to be used, the expression in your query must match the indexed expression exactly. LOWER(email) index won't be used for UPPER(email) query. Data type, collation, and function must all match precisely.

When to Use Each Approach:

Computed Column vs. Function-Based Index
Criteria	Computed Column	Function-Based Index
Need to SELECT the computed value	✅ Yes, directly queryable	❌ Must apply function in SELECT
Storage overhead	Higher (column + index)	Lower (index only)
Multiple expressions on same column	Multiple columns needed	Multiple indexes possible
Database support	Broad (most databases)	Varies (PostgreSQL, Oracle: excellent; MySQL: 8.0+)
Query rewrite required	Still need to query the computed column	Query can stay unchanged
Maintenance complexity	Lower (simpler concept)	Higher (must match expression exactly)

Collation and Case Sensitivity

One of the most common reasons for using UPPER()/LOWER() functions is to perform case-insensitive searches. Understanding collation allows you to achieve case-insensitivity without functions.

What is Collation?

Collation defines the rules for string comparison: how characters are sorted and whether comparisons are case-sensitive, accent-sensitive, etc.

Collation Suffixes (SQL Server example):

_CS = Case Sensitive
_CI = Case Insensitive
_AS = Accent Sensitive
_AI = Accent Insensitive

Example: Latin1_General_CI_AS = Latin1 General, Case Insensitive, Accent Sensitive

collation_solutions.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
-- ========================================
-- SQL Server Collation Solutions
-- ========================================
 
-- Check current column collation
SELECT 
    COLUMN_NAME, 
    COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'customers' AND COLUMN_NAME = 'email';
 
-- Change column to case-insensitive collation
ALTER TABLE customers
ALTER COLUMN email VARCHAR(255) COLLATE Latin1_General_CI_AS NOT NULL;
 
-- Now these queries are equivalent (and both use index):
SELECT * FROM customers WHERE email = 'John@Example.com';
SELECT * FROM customers WHERE email = 'john@example.com';
SELECT * FROM customers WHERE email = 'JOHN@EXAMPLE.COM';
-- All find the same rows; index is used
 
-- Query-level collation override (if you can't change column)
SELECT * FROM customers 
WHERE email COLLATE Latin1_General_CI_AS = 'John@Example.com';
-- Warning: This may still not use index efficiently
-- Column-level collation is better
 
-- ========================================
-- PostgreSQL Case-Insensitivity
-- ========================================
 
-- Option 1: Use ILIKE for case-insensitive pattern matching
SELECT * FROM products WHERE name ILIKE 'widget%';
-- ILIKE may not use index efficiently for B-tree indexes
 
-- Option 2: Use citext extension (case-insensitive text type)
CREATE EXTENSION IF NOT EXISTS citext;
 
ALTER TABLE customers ALTER COLUMN email TYPE citext;
-- Now equality comparisons are case-insensitive
 
SELECT * FROM customers WHERE email = 'John@Example.com';
-- Matches 'john@example.com' and uses index
 
-- Option 3: Expression index on LOWER()
CREATE INDEX idx_customers_email_lower ON customers(LOWER(email));
SELECT * FROM customers WHERE LOWER(email) = LOWER('John@Example.com');
-- Expression index handles this efficiently
 
-- ========================================
-- MySQL Case-Insensitivity
-- ========================================
 
-- MySQL default collations are often case-insensitive
-- Check current collation:
SHOW FULL COLUMNS FROM customers LIKE 'email';
 
-- Common case-insensitive collations:
-- utf8mb4_general_ci (case insensitive)
-- utf8mb4_unicode_ci (case insensitive, more accurate sorting)
 
-- Change column collation:
ALTER TABLE customers 
MODIFY email VARCHAR(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
 
-- Binary collation for case-sensitive when needed:
ALTER TABLE codes
MODIFY code VARCHAR(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;

Choose Collation at Design Time

The best time to set collation is during schema design. Changing collation on existing columns with data requires careful planning—it can lock tables, invalidate indexes, and cause character set conversion issues. Plan your case-sensitivity requirements before the first INSERT.

Detecting Non-SARGable Predicates

Identifying non-SARGable predicates requires examining execution plans and query patterns. Here are detection strategies for major databases.

sqlserver_detect_nonsargable.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Check execution plan for table/index scans
SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM employees WHERE YEAR(hire_date) = 2023;
GO
SET SHOWPLAN_TEXT OFF;
 
-- Look for:
-- |--Index Scan or Table Scan (instead of Seek)
 
-- In XML plan, look for SCAN operations with Predicate containing functions:
-- <Predicate>
--   <ScalarOperator ScalarString="datepart(year,[hire_date])=(2023)">
 
-- Query plan cache for scans with predicates
SELECT 
    qs.total_elapsed_time / qs.execution_count AS avg_duration_us,
    qs.execution_count,
    SUBSTRING(qt.text, qs.statement_start_offset/2 + 1,
        (CASE qs.statement_end_offset 
            WHEN -1 THEN DATALENGTH(qt.text)
            ELSE qs.statement_end_offset END 
        - qs.statement_start_offset)/2 + 1) AS query_text,
    qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp
WHERE CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE '%Index Scan%'
   OR CAST(qp.query_plan AS NVARCHAR(MAX)) LIKE '%Table Scan%'
ORDER BY avg_duration_us DESC;
 
-- Look for common function patterns in query text
SELECT text
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)
WHERE text LIKE '%YEAR(%' 
   OR text LIKE '%MONTH(%'
   OR text LIKE '%UPPER(%'
   OR text LIKE '%LOWER(%'
   OR text LIKE '%DATEPART(%';

Filter vs. Index Cond

In execution plans, look for where the predicate is applied. 'Index Cond' or 'Seek Predicate' means the condition is used for index navigation (good). 'Filter' means the condition is applied after reading data (potentially bad if it's filtering most rows from a scan).

ORM-Generated Non-SARGable Queries

Object-Relational Mappers (ORMs) like Entity Framework, Hibernate, and ActiveRecord can generate non-SARGable queries without the developer realizing it. Understanding common patterns helps you avoid these traps.

Common ORM Pitfalls:

orm_nonsargable_examples.cs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
// C# / Entity Framework Examples
 
// PROBLEM 1: String manipulation methods
// Developer writes:
var customers = context.Customers
    .Where(c => c.Email.ToLower() == email.ToLower())
    .ToList();
 
// EF generates:
// SELECT * FROM Customers WHERE LOWER(Email) = LOWER(@email)
// Function on column → full scan
 
// SOLUTION: Use case-insensitive collation or EF.Functions
var customers = context.Customers
    .Where(c => EF.Functions.Like(c.Email, email))  // Uses LIKE, respects collation
    .ToList();
 
// Or configure collation in model:
modelBuilder.Entity<Customer>()
    .Property(c => c.Email)
    .UseCollation("SQL_Latin1_General_CP1_CI_AS");
 
// ----------------------------------------
 
// PROBLEM 2: Date component properties
// Developer writes:
var orders = context.Orders
    .Where(o => o.OrderDate.Year == 2023)
    .ToList();
 
// EF generates:
// SELECT * FROM Orders WHERE DATEPART(year, OrderDate) = 2023
// Function on column → full scan
 
// SOLUTION: Use date range
DateTime startDate = new DateTime(2023, 1, 1);
DateTime endDate = new DateTime(2024, 1, 1);
var orders = context.Orders
    .Where(o => o.OrderDate >= startDate && o.OrderDate < endDate)
    .ToList();
 
// ----------------------------------------
 
// PROBLEM 3: String length comparisons
// Developer writes:
var longNames = context.Products
    .Where(p => p.Name.Length > 50)
    .ToList();
 
// EF generates:
// SELECT * FROM Products WHERE LEN(Name) > 50
// Function on column → full scan
 
// SOLUTION: Computed column in database
// ALTER TABLE Products ADD NameLength AS LEN(Name) PERSISTED;
// CREATE INDEX idx_name_length ON Products(NameLength);
// Then map to entity property

orm_nonsargable_java.java
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
// Java / Hibernate / JPA Examples
 
// PROBLEM 1: JPQL with functions
// Developer writes:
String jpql = "SELECT c FROM Customer c WHERE LOWER(c.email) = LOWER(:email)";
Query query = em.createQuery(jpql);
// Generates: SELECT * FROM customers WHERE LOWER(email) = LOWER(?)
// Full scan
 
// SOLUTION: Use database-level case-insensitivity
// Or use native query with collation-aware comparison
 
// PROBLEM 2: Criteria API with functions
CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<Customer> cq = cb.createQuery(Customer.class);
Root<Customer> root = cq.from(Customer.class);
 
// This applies function to column:
cq.where(cb.equal(cb.lower(root.get("email")), email.toLowerCase()));
// Generates: WHERE LOWER(email) = ?
 
// SOLUTION: Ensure data is normalized on insert
// Store email as lowercase, search with lowercase
cq.where(cb.equal(root.get("email"), email.toLowerCase()));
// Requires: email always stored as lowercase
 
// PROBLEM 3: Date criteria
// This is problematic:
cq.where(cb.equal(cb.function("YEAR", Integer.class, root.get("hireDate")), 2023));
 
// SOLUTION: Date range
Calendar start = new GregorianCalendar(2023, 0, 1);
Calendar end = new GregorianCalendar(2024, 0, 1);
cq.where(cb.and(
    cb.greaterThanOrEqualTo(root.get("hireDate"), start.getTime()),
    cb.lessThan(root.get("hireDate"), end.getTime())
));

Always Review Generated SQL

ORMs abstract away SQL, but you remain responsible for performance. For any query that runs frequently or against large tables, log and review the generated SQL. Configure your ORM to log queries in development, and check for function-on-column patterns.

Summary: Functions on Indexed Columns

Applying functions to indexed columns is one of the most common and easily preventable SQL performance anti-patterns. Mastering this concept is essential for writing performant database queries.

Key Takeaways

•Indexes store original column values — When you transform a value with a function, the index can't navigate to that transformed value.
•SARGable predicates keep columns 'naked' — column = value uses index; function(column) = value cannot.
•Common offenders: YEAR(), UPPER(), LEFT(), arithmetic — Date functions and string manipulations are frequent culprits.
•Rewrite queries to move functions to the value side — YEAR(col) = 2023 becomes col >= '2023-01-01' AND col < '2024-01-01'.
•Computed columns and function-based indexes solve hard cases — When rewriting is impossible, index the transformed value.
•Collation solves case-sensitivity without functions — Use case-insensitive collation instead of UPPER()/LOWER().
•ORMs frequently generate non-SARGable queries — Review generated SQL for .ToLower(), .Year, .Length and similar patterns.

Page Complete

You now understand why functions on indexed columns prevent efficient searches and how to rewrite queries for optimal index usage. Next, we'll examine correlated subquery performance issues—another pattern that can devastate query performance.