Database Management SystemsRelational Algebra

Selection and Projection

LevelIntermediate

Duration90 mins

TopicRelational Algebra

2 / 5

Predicate Conditions

The Language of Conditions

If the Selection operator is a filter, then the predicate is the filter's specification—the precise description of what passes through and what doesn't. The expressiveness and correctness of your queries depend entirely on how well you can articulate predicates.

Predicates in relational algebra form a miniature formal language: a collection of comparison operators, logical connectives, and evaluation rules that together allow you to express arbitrarily complex conditions. Every SQL WHERE clause, every index lookup condition, every join predicate—all are built from this fundamental vocabulary.

Mastering predicates means mastering the precision with which you can describe data requirements. An imprecise predicate returns wrong data. An inefficient predicate causes slow queries. A correct, efficient predicate is the mark of expert database craftsmanship.

What You Will Learn

By the end of this page, you will master comparison operators across different data types, understand logical connective semantics including short-circuit evaluation, comprehend operator precedence and its implications, handle complex predicate patterns including range queries and set membership, and understand predicate evaluation order and optimization.

Comparison Operators

At the atomic level, predicates are built from comparison operators that compare two values and return a boolean result. Understanding these operators—their syntax, semantics, and behavior across data types—is fundamental.

The Six Fundamental Comparison Operators:

Operator	Meaning	Formal Notation
`=`	Equal to	a = b ⟺ value(a) is identical to value(b)
`≠` or `<>`	Not equal to	a ≠ b ⟺ ¬(a = b)
`<`	Less than	a < b ⟺ value(a) precedes value(b) in domain ordering
`>`	Greater than	a > b ⟺ value(a) follows value(b) in domain ordering
`≤` or `<=`	Less than or equal	a ≤ b ⟺ (a < b) ∨ (a = b)
`≥` or `>=`	Greater than or equal	a ≥ b ⟺ (a > b) ∨ (a = b)

Operand Types:

Comparison operands can be:

Attribute references: Column names that resolve to the tuple's value for that attribute
Constants (literals): Fixed values like 100000, 'Engineering', DATE '2023-01-01'
Expressions: Computed values like salary * 1.1 or CONCAT(first_name, ' ', last_name)
Other attributes: Comparing two columns in the same tuple, like end_date > start_date

Comparison Operator ExamplesVarious comparison predicates applied to an Employee relation

Input

Output

Type-Specific Comparison Semantics:

Different data types have different comparison rules:

Numeric Types (INTEGER, DECIMAL, FLOAT):

Standard mathematical ordering
Mixed type comparisons: implicit promotion (e.g., INT compared to FLOAT promotes INT to FLOAT)
Precision considerations: floating-point comparison may have rounding issues

String Types (CHAR, VARCHAR, TEXT):

Lexicographic (dictionary) ordering based on character encoding
Case sensitivity depends on collation setting
Trailing space handling varies by database and type (CHAR pads, VARCHAR doesn't)
Collation determines ordering (e.g., 'ä' before or after 'b'?)

Date/Time Types (DATE, TIME, TIMESTAMP):

Chronological ordering
Time zone considerations for TIMESTAMP WITH TIME ZONE
Comparison across date and timestamp types requires care

Boolean Types:

FALSE < TRUE in ordering (by convention)
Direct equality comparison most common

Collation Pitfalls

String comparison behavior depends heavily on database collation settings. The query σ_{name < 'Bob'}(Employee) may or may not include 'alice' depending on case sensitivity. The query σ_{name < 'Böb'}(Employee) depends on locale-specific sorting rules. Always verify your database's collation when string comparisons affect correctness.

Logical Connectives

Atomic comparisons combine using logical connectives to form complex predicates. The three fundamental connectives—AND, OR, and NOT—provide the full expressiveness of propositional logic.

Conjunction (AND, ∧):

The conjunction of P₁ and P₂ is true if and only if both P₁ and P₂ are true.

$$P_1 ∧ P_2 = \begin{cases} TRUE & \text{if } P_1 = TRUE \text{ and } P_2 = TRUE \ FALSE & \text{if } P_1 = FALSE \text{ or } P_2 = FALSE \ UNKNOWN & \text{otherwise (NULL involvement)} \end{cases}$$

Disjunction (OR, ∨):

The disjunction of P₁ and P₂ is true if at least one of P₁ or P₂ is true.

$$P_1 ∨ P_2 = \begin{cases} TRUE & \text{if } P_1 = TRUE \text{ or } P_2 = TRUE \ FALSE & \text{if } P_1 = FALSE \text{ and } P_2 = FALSE \ UNKNOWN & \text{otherwise (NULL involvement)} \end{cases}$$

Negation (NOT, ¬):

Negation inverts the truth value of a predicate.

$$¬P = \begin{cases} TRUE & \text{if } P = FALSE \ FALSE & \text{if } P = TRUE \ UNKNOWN & \text{if } P = UNKNOWN \end{cases}$$

Complete Three-Valued Logic Truth Tables
P₁	P₂	P₁ AND P₂	P₁ OR P₂	NOT P₁
TRUE	TRUE	TRUE	TRUE	FALSE
TRUE	FALSE	FALSE	TRUE	FALSE
TRUE	UNKNOWN	UNKNOWN	TRUE	FALSE
FALSE	TRUE	FALSE	TRUE	TRUE
FALSE	FALSE	FALSE	FALSE	TRUE
FALSE	UNKNOWN	FALSE	UNKNOWN	TRUE
UNKNOWN	TRUE	UNKNOWN	TRUE	UNKNOWN
UNKNOWN	FALSE	FALSE	UNKNOWN	UNKNOWN
UNKNOWN	UNKNOWN	UNKNOWN	UNKNOWN	UNKNOWN

Key Observations from Three-Valued Logic:

TRUE dominates OR: If either operand is TRUE, the OR is TRUE regardless of the other operand (even if UNKNOWN).
FALSE dominates AND: If either operand is FALSE, the AND is FALSE regardless of the other operand (even if UNKNOWN).
UNKNOWN propagates: When the dominant value isn't present, UNKNOWN 'infects' the result.
NOT UNKNOWN = UNKNOWN: Negating an unknown value doesn't suddenly make it known.

Short-Circuit Evaluation:

Most database systems use short-circuit evaluation for efficiency:

P₁ AND P₂: If P₁ is FALSE, P₂ is not evaluated
P₁ OR P₂: If P₁ is TRUE, P₂ is not evaluated

However, relational algebra is declarative, and the optimizer may reorder predicates. Don't rely on evaluation order for correctness.

Compound Predicate ConstructionBuilding complex conditions from atomic comparisons

Input

Output

De Morgan's Laws

De Morgan's Laws allow predicate transformation: • ¬(P₁ ∧ P₂) = (¬P₁) ∨ (¬P₂) • ¬(P₁ ∨ P₂) = (¬P₁) ∧ (¬P₂)

These equivalences help rewrite complex predicates and are used by query optimizers. For example, NOT(salary > 100000 AND status = 'active') becomes (salary <= 100000 OR status <> 'active').

Operator Precedence and Parentheses

When multiple operators appear in a predicate, operator precedence determines the order of evaluation. Misunderstanding precedence is a common source of query bugs.

Standard Precedence (Highest to Lowest):

Precedence	Operators	Example
1 (Highest)	Arithmetic: *, /, %	salary * 1.1
2	Arithmetic: +, -	salary + bonus
3	Comparison: =, <>, <, >, <=, >=	salary > 100000
4	NOT	NOT active
5	AND	P1 AND P2
6 (Lowest)	OR	P1 OR P2

Critical Implication:

Because AND has higher precedence than OR, the expression:

A OR B AND C

is interpreted as:

A OR (B AND C)

not as:

(A OR B) AND C

These can produce very different results!

Precedence Pitfall DemonstrationShowing how precedence affects query results

Input

Output

Best Practices for Predicate Construction

•Always use parentheses when mixing AND and OR, even if you know the precedence. It prevents bugs and improves readability.
•Group related conditions together: (status = 'active' OR status = 'pending') AND department = 'Engineering'
•Write in normal form when possible: DNF (OR of ANDs) or CNF (AND of ORs) for clarity
•Test edge cases especially with NOT: verify predicates handle NULLs as expected
•Document complex predicates with comments explaining the business logic

Disjunctive vs Conjunctive Normal Form

DNF (Disjunctive Normal Form): OR of AND clauses, e.g., (A ∧ B) ∨ (C ∧ D). CNF (Conjunctive Normal Form): AND of OR clauses, e.g., (A ∨ B) ∧ (C ∨ D). Query optimizers often convert predicates to one of these forms for easier analysis and optimization.

Range and Membership Predicates

Beyond simple comparisons, relational systems support convenient syntactic forms for common predicate patterns: range queries and set membership.

BETWEEN: Closed Range Queries

The BETWEEN predicate tests whether a value falls within an inclusive range:

attribute BETWEEN low AND high ≡ attribute >= low AND attribute <= high

Key Characteristics:

Inclusive at both ends
Both endpoints must be specified
Works with any comparable type (numbers, dates, strings)
BETWEEN high AND low may return empty result if high < low (behavior varies by DBMS)

NOT BETWEEN:

attribute NOT BETWEEN low AND high ≡ attribute < low OR attribute > high

Range Predicate ExamplesVarious BETWEEN applications across data types

Input

Output

IN: Set Membership Testing

The IN predicate tests whether a value belongs to a specified set:

attribute IN (value1, value2, ..., valueN)

is equivalent to:

attribute = value1 OR attribute = value2 OR ... OR attribute = valueN

Key Characteristics:

More readable than multiple OR conditions
Order of values in the set doesn't matter
Works with any data type that supports equality comparison
Often optimized by database engines (hash-based lookup for large sets)

NOT IN:

attribute NOT IN (v1, v2, ..., vN) ≡ attribute ≠ v1 AND attribute ≠ v2 AND ... AND attribute ≠ vN

Set Membership Predicate ExamplesUsing IN for cleaner multi-value comparisons

Input

Output

NULL Gotcha with NOT IN

If the value list contains NULL, NOT IN behaves unexpectedly:

x NOT IN (1, 2, NULL) = x≠1 AND x≠2 AND x≠NULL

Since x≠NULL is UNKNOWN, the entire expression becomes UNKNOWN for any x (unless x matches 1 or 2). This means NOT IN with NULL in the list returns no rows! Always filter NULLs from value lists used with NOT IN.

Pattern Matching Predicates

While pure relational algebra doesn't include pattern matching, practical database systems (and SQL) provide powerful string matching capabilities. Understanding these is essential for working with real-world data.

LIKE: SQL Pattern Matching

The LIKE predicate matches strings against a pattern containing wildcards:

% (percent): Matches zero or more characters
_ (underscore): Matches exactly one character

Pattern Examples:

Pattern	Matches	Doesn't Match
`'John%'`	'John', 'Johnny', 'Johnson'	'john', 'JOhn' (case matters)
`'%son'`	'Johnson', 'son', 'Williamson'	'Sons', 'Sonata'
`'%oh%'`	'John', 'Mohawk', 'oh'	'ON', 'Owen'
`'J___'`	'John', 'Jane', 'Jack'	'Jo', 'James' (wrong length)
`'_o%'`	'Jo', 'John', 'Bob'	'Oscar' (o not in position 2)

Pattern Matching in PracticeUsing LIKE for flexible string matching

Input

Output

Escape Characters:

To match literal % or _ characters, use an escape character:

SELECT * FROM Products WHERE name LIKE '%50\%%' ESCAPE '\'
-- Matches: '50% off', 'Get 50% discount'

Regular Expressions:

Many databases support full regular expression matching:

PostgreSQL: column ~ 'pattern' or SIMILAR TO
MySQL: column REGEXP 'pattern'
Oracle: REGEXP_LIKE(column, 'pattern')
SQL Server: No native regex (use LIKE or CLR functions)

Performance Considerations:

Pattern matching predicates have significant performance implications:

Prefix patterns ('John%'): Can use B-tree indexes efficiently - O(log n) to find start point
Suffix patterns ('%son'): Cannot use standard indexes - requires full scan
Infix patterns ('%oh%'): Cannot use standard indexes - requires full scan
Full-text search: For efficient substring searching, consider full-text indexes (GIN, inverted indexes)

SARGable Pattern Matching

Only prefix patterns (patterns not starting with %) can leverage B-tree indexes. If you frequently search suffixes, consider storing a reversed copy of the column and using prefix search on it. For complex pattern needs, full-text search or specialized text indexes are far more efficient than LIKE with leading wildcards.

Predicate Evaluation Order and Optimization

The order in which predicates are evaluated can dramatically affect query performance. While the logical result doesn't depend on evaluation order (commutativity), the computational cost certainly does.

Evaluation Cost Factors:

Selectivity: More selective predicates eliminate more tuples early
Computational complexity: Some predicates are cheap (attribute comparison), others expensive (subquery, function call)
Index availability: Indexed predicates can be evaluated without full scan
Short-circuit opportunity: Earlier FALSE in AND, earlier TRUE in OR saves work

Optimal Evaluation Strategy:

For AND-connected predicates, evaluate in this order:

Most selective predicates first (eliminate tuples early)
Cheapest predicates first (if selectivity similar)
Indexed predicates first (if they dominate selectivity)

For OR-connected predicates:

Least selective predicates last (any TRUE satisfies)
Cheapest predicates first

Predicate Cost Comparison
Predicate Type	Relative Cost	Typical Selectivity	Index Usable
Primary key equality	Very Low	1/n (highly selective)	Yes
Indexed column equality	Low	1/NDV	Yes
Indexed range comparison	Low-Medium	Varies	Yes
Non-indexed equality	Medium	1/NDV	No (scan)
String LIKE 'prefix%'	Low	Varies	Yes (B-tree)
String LIKE '%suffix'	High	Varies	No (full scan)
Function on column	Medium-High	Varies	No (usually)
Subquery (uncorrelated)	High	Varies	Depends
Subquery (correlated)	Very High	Varies	Rarely

Evaluation Order ImpactDemonstrating how predicate order affects performance

Input

Output

Query Optimizer's Role:

Modern query optimizers automatically determine predicate evaluation order based on:

Cost-based optimization: Using statistics to estimate selectivity and cost
Index awareness: Pushing indexed predicates to access methods
Predicate pushdown: Evaluating predicates as early as possible in the plan
Predicate simplification: Combining or eliminating redundant predicates

When Optimizer Chooses Wrong:

Optimizers rely on statistics which can be outdated or inaccurate:

Correlated columns violate independence assumptions
Unusual data distributions not captured by histograms
Parameter binding hides actual values at plan time

Manual Intervention:

When optimizer chooses poorly, you can sometimes help:

Update statistics (ANALYZE / UPDATE STATISTICS)
Use optimizer hints (database-specific)
Rewrite predicates to expose optimization opportunities
Create appropriate indexes for selective predicates

Trust But Verify

Most of the time, the optimizer makes good choices—that's why we invest in sophisticated cost-based optimizers. But for critical queries, always examine the execution plan. Look for full table scans on large tables, unexpected predicate evaluation order, or missing index usage. Understanding how predicates are evaluated helps you diagnose and fix performance problems.

Complex Predicate Patterns

Real-world queries often require sophisticated predicate patterns. Recognizing these patterns helps you structure queries correctly and enables the optimizer to choose efficient execution strategies.

Pattern 1: Multi-Column Equality (Composite Key Lookup)

σ_{department = 'Eng' ∧ role = 'Senior' ∧ location = 'NYC'}(Employee)

This pattern benefits from composite indexes on (department, role, location).

Pattern 2: Range with Equality (Prefix + Range)

σ_{department = 'Eng' ∧ salary BETWEEN 100000 AND 150000}(Employee)

Optimal index: (department, salary) - equality column first, range column second.

Pattern 3: OR with Common Prefix

σ_{department = 'Eng' ∧ (role = 'Senior' ∨ role = 'Lead')}(Employee)

Equivalent to: department = 'Eng' AND role IN ('Senior', 'Lead')

Pattern 4: NOT with Complex Subexpression

σ_{¬(department = 'HR' ∧ salary < 50000)}(Employee)

Using De Morgan: department ≠ 'HR' OR salary >= 50000

Advanced Predicate PatternsComplex real-world predicate examples

Input

Output

Pattern Recognition Tips

•Multiple OR on same column → Convert to IN for clarity and potential optimization
•Range with equality → Consider column order in composite indexes
•NOT with compound → Apply De Morgan's laws for clearer logic
•NULL-aware comparisons → Always include explicit NULL handling when needed
•Temporal predicates → Be explicit about boundary inclusion/exclusion
•Hierarchical queries → Consider materialized path, nested sets, or recursive CTEs

Summary and Key Takeaways

Predicate conditions are the precise language for expressing data requirements. Mastering predicates enables you to write correct, efficient, and maintainable queries.

Key Takeaways

•Six comparison operators (=, ≠, <, >, ≤, ≥) form the atomic building blocks of predicates. Each has precise semantics across different data types.
•Logical connectives (AND, OR, NOT) combine atomics into complex expressions. Three-valued logic with UNKNOWN handles NULL correctly.
•Operator precedence follows NOT > AND > OR. Always use parentheses when mixing AND and OR to prevent subtle bugs.
•BETWEEN and IN provide convenient syntax for range and membership testing. Watch for NULL behavior, especially with NOT IN.
•Pattern matching with LIKE enables flexible string queries. Only prefix patterns are index-friendly; suffix/infix patterns require full scans.
•Evaluation order dramatically affects performance. High selectivity, low cost, and index availability guide optimal ordering.
•Complex patterns recur in real applications. Recognizing them helps structure queries for correctness and optimization.

What's Next:

Having thoroughly explored Selection and its predicates, we turn to the complementary operation: Projection. While Selection filters rows (horizontal), Projection selects columns (vertical). Together, they form the foundation for extracting precisely the data you need from any relation.

Page Complete

You now have a deep understanding of predicate conditions—from basic comparisons through complex patterns. This knowledge enables you to express any row-filtering requirement with precision and efficiency. Next, we'll explore the Projection operator for column selection.

2 / 5

Loading learning content...

Database Management SystemsRelational Algebra

Selection and Projection

LevelIntermediate

Duration90 mins

TopicRelational Algebra

2 / 5

Predicate Conditions

The Language of Conditions

What You Will Learn

Comparison Operators

The Six Fundamental Comparison Operators:

Operator	Meaning	Formal Notation
`=`	Equal to	a = b ⟺ value(a) is identical to value(b)
`≠` or `<>`	Not equal to	a ≠ b ⟺ ¬(a = b)
`<`	Less than	a < b ⟺ value(a) precedes value(b) in domain ordering
`>`	Greater than	a > b ⟺ value(a) follows value(b) in domain ordering
`≤` or `<=`	Less than or equal	a ≤ b ⟺ (a < b) ∨ (a = b)
`≥` or `>=`	Greater than or equal	a ≥ b ⟺ (a > b) ∨ (a = b)

Operand Types:

Comparison operands can be:

Attribute references: Column names that resolve to the tuple's value for that attribute
Constants (literals): Fixed values like 100000, 'Engineering', DATE '2023-01-01'
Expressions: Computed values like salary * 1.1 or CONCAT(first_name, ' ', last_name)
Other attributes: Comparing two columns in the same tuple, like end_date > start_date

Comparison Operator ExamplesVarious comparison predicates applied to an Employee relation

Input

Output

Type-Specific Comparison Semantics:

Different data types have different comparison rules:

Numeric Types (INTEGER, DECIMAL, FLOAT):

Standard mathematical ordering
Mixed type comparisons: implicit promotion (e.g., INT compared to FLOAT promotes INT to FLOAT)
Precision considerations: floating-point comparison may have rounding issues

String Types (CHAR, VARCHAR, TEXT):

Lexicographic (dictionary) ordering based on character encoding
Case sensitivity depends on collation setting
Trailing space handling varies by database and type (CHAR pads, VARCHAR doesn't)
Collation determines ordering (e.g., 'ä' before or after 'b'?)

Date/Time Types (DATE, TIME, TIMESTAMP):

Chronological ordering
Time zone considerations for TIMESTAMP WITH TIME ZONE
Comparison across date and timestamp types requires care

Boolean Types:

FALSE < TRUE in ordering (by convention)
Direct equality comparison most common

Collation Pitfalls

Logical Connectives

Atomic comparisons combine using logical connectives to form complex predicates. The three fundamental connectives—AND, OR, and NOT—provide the full expressiveness of propositional logic.

Conjunction (AND, ∧):

The conjunction of P₁ and P₂ is true if and only if both P₁ and P₂ are true.

Disjunction (OR, ∨):

The disjunction of P₁ and P₂ is true if at least one of P₁ or P₂ is true.

Negation (NOT, ¬):

Negation inverts the truth value of a predicate.

$$¬P = \begin{cases} TRUE & \text{if } P = FALSE \ FALSE & \text{if } P = TRUE \ UNKNOWN & \text{if } P = UNKNOWN \end{cases}$$

Complete Three-Valued Logic Truth Tables
P₁	P₂	P₁ AND P₂	P₁ OR P₂	NOT P₁
TRUE	TRUE	TRUE	TRUE	FALSE
TRUE	FALSE	FALSE	TRUE	FALSE
TRUE	UNKNOWN	UNKNOWN	TRUE	FALSE
FALSE	TRUE	FALSE	TRUE	TRUE
FALSE	FALSE	FALSE	FALSE	TRUE
FALSE	UNKNOWN	FALSE	UNKNOWN	TRUE
UNKNOWN	TRUE	UNKNOWN	TRUE	UNKNOWN
UNKNOWN	FALSE	FALSE	UNKNOWN	UNKNOWN
UNKNOWN	UNKNOWN	UNKNOWN	UNKNOWN	UNKNOWN

Key Observations from Three-Valued Logic:

TRUE dominates OR: If either operand is TRUE, the OR is TRUE regardless of the other operand (even if UNKNOWN).
FALSE dominates AND: If either operand is FALSE, the AND is FALSE regardless of the other operand (even if UNKNOWN).
UNKNOWN propagates: When the dominant value isn't present, UNKNOWN 'infects' the result.
NOT UNKNOWN = UNKNOWN: Negating an unknown value doesn't suddenly make it known.

Short-Circuit Evaluation:

Most database systems use short-circuit evaluation for efficiency:

P₁ AND P₂: If P₁ is FALSE, P₂ is not evaluated
P₁ OR P₂: If P₁ is TRUE, P₂ is not evaluated

However, relational algebra is declarative, and the optimizer may reorder predicates. Don't rely on evaluation order for correctness.

Compound Predicate ConstructionBuilding complex conditions from atomic comparisons

Input

Output

De Morgan's Laws

De Morgan's Laws allow predicate transformation: • ¬(P₁ ∧ P₂) = (¬P₁) ∨ (¬P₂) • ¬(P₁ ∨ P₂) = (¬P₁) ∧ (¬P₂)

These equivalences help rewrite complex predicates and are used by query optimizers. For example, NOT(salary > 100000 AND status = 'active') becomes (salary <= 100000 OR status <> 'active').

Operator Precedence and Parentheses

When multiple operators appear in a predicate, operator precedence determines the order of evaluation. Misunderstanding precedence is a common source of query bugs.

Standard Precedence (Highest to Lowest):

Precedence	Operators	Example
1 (Highest)	Arithmetic: *, /, %	salary * 1.1
2	Arithmetic: +, -	salary + bonus
3	Comparison: =, <>, <, >, <=, >=	salary > 100000
4	NOT	NOT active
5	AND	P1 AND P2
6 (Lowest)	OR	P1 OR P2

Critical Implication:

Because AND has higher precedence than OR, the expression:

A OR B AND C

is interpreted as:

A OR (B AND C)

not as:

(A OR B) AND C

These can produce very different results!

Precedence Pitfall DemonstrationShowing how precedence affects query results

Input

Output

Best Practices for Predicate Construction

•Always use parentheses when mixing AND and OR, even if you know the precedence. It prevents bugs and improves readability.
•Group related conditions together: (status = 'active' OR status = 'pending') AND department = 'Engineering'
•Write in normal form when possible: DNF (OR of ANDs) or CNF (AND of ORs) for clarity
•Test edge cases especially with NOT: verify predicates handle NULLs as expected
•Document complex predicates with comments explaining the business logic

Disjunctive vs Conjunctive Normal Form

Range and Membership Predicates

Beyond simple comparisons, relational systems support convenient syntactic forms for common predicate patterns: range queries and set membership.

BETWEEN: Closed Range Queries

The BETWEEN predicate tests whether a value falls within an inclusive range:

attribute BETWEEN low AND high ≡ attribute >= low AND attribute <= high

Key Characteristics:

Inclusive at both ends
Both endpoints must be specified
Works with any comparable type (numbers, dates, strings)
BETWEEN high AND low may return empty result if high < low (behavior varies by DBMS)

NOT BETWEEN:

attribute NOT BETWEEN low AND high ≡ attribute < low OR attribute > high

Range Predicate ExamplesVarious BETWEEN applications across data types

Input

Output

IN: Set Membership Testing

The IN predicate tests whether a value belongs to a specified set:

attribute IN (value1, value2, ..., valueN)

is equivalent to:

attribute = value1 OR attribute = value2 OR ... OR attribute = valueN

Key Characteristics:

More readable than multiple OR conditions
Order of values in the set doesn't matter
Works with any data type that supports equality comparison
Often optimized by database engines (hash-based lookup for large sets)

NOT IN:

attribute NOT IN (v1, v2, ..., vN) ≡ attribute ≠ v1 AND attribute ≠ v2 AND ... AND attribute ≠ vN

Set Membership Predicate ExamplesUsing IN for cleaner multi-value comparisons

Input

Output

NULL Gotcha with NOT IN

If the value list contains NULL, NOT IN behaves unexpectedly:

x NOT IN (1, 2, NULL) = x≠1 AND x≠2 AND x≠NULL

Pattern Matching Predicates

LIKE: SQL Pattern Matching

The LIKE predicate matches strings against a pattern containing wildcards:

% (percent): Matches zero or more characters
_ (underscore): Matches exactly one character

Pattern Examples:

Pattern	Matches	Doesn't Match
`'John%'`	'John', 'Johnny', 'Johnson'	'john', 'JOhn' (case matters)
`'%son'`	'Johnson', 'son', 'Williamson'	'Sons', 'Sonata'
`'%oh%'`	'John', 'Mohawk', 'oh'	'ON', 'Owen'
`'J___'`	'John', 'Jane', 'Jack'	'Jo', 'James' (wrong length)
`'_o%'`	'Jo', 'John', 'Bob'	'Oscar' (o not in position 2)

Pattern Matching in PracticeUsing LIKE for flexible string matching

Input

Output

Escape Characters:

To match literal % or _ characters, use an escape character:

SELECT * FROM Products WHERE name LIKE '%50\%%' ESCAPE '\'
-- Matches: '50% off', 'Get 50% discount'

Regular Expressions:

Many databases support full regular expression matching:

PostgreSQL: column ~ 'pattern' or SIMILAR TO
MySQL: column REGEXP 'pattern'
Oracle: REGEXP_LIKE(column, 'pattern')
SQL Server: No native regex (use LIKE or CLR functions)

Performance Considerations:

Pattern matching predicates have significant performance implications:

Prefix patterns ('John%'): Can use B-tree indexes efficiently - O(log n) to find start point
Suffix patterns ('%son'): Cannot use standard indexes - requires full scan
Infix patterns ('%oh%'): Cannot use standard indexes - requires full scan
Full-text search: For efficient substring searching, consider full-text indexes (GIN, inverted indexes)

SARGable Pattern Matching

Predicate Evaluation Order and Optimization

Evaluation Cost Factors:

Selectivity: More selective predicates eliminate more tuples early
Computational complexity: Some predicates are cheap (attribute comparison), others expensive (subquery, function call)
Index availability: Indexed predicates can be evaluated without full scan
Short-circuit opportunity: Earlier FALSE in AND, earlier TRUE in OR saves work

Optimal Evaluation Strategy:

For AND-connected predicates, evaluate in this order:

Most selective predicates first (eliminate tuples early)
Cheapest predicates first (if selectivity similar)
Indexed predicates first (if they dominate selectivity)

For OR-connected predicates:

Least selective predicates last (any TRUE satisfies)
Cheapest predicates first

Predicate Cost Comparison
Predicate Type	Relative Cost	Typical Selectivity	Index Usable
Primary key equality	Very Low	1/n (highly selective)	Yes
Indexed column equality	Low	1/NDV	Yes
Indexed range comparison	Low-Medium	Varies	Yes
Non-indexed equality	Medium	1/NDV	No (scan)
String LIKE 'prefix%'	Low	Varies	Yes (B-tree)
String LIKE '%suffix'	High	Varies	No (full scan)
Function on column	Medium-High	Varies	No (usually)
Subquery (uncorrelated)	High	Varies	Depends
Subquery (correlated)	Very High	Varies	Rarely

Evaluation Order ImpactDemonstrating how predicate order affects performance

Input

Output

Query Optimizer's Role:

Modern query optimizers automatically determine predicate evaluation order based on:

Cost-based optimization: Using statistics to estimate selectivity and cost
Index awareness: Pushing indexed predicates to access methods
Predicate pushdown: Evaluating predicates as early as possible in the plan
Predicate simplification: Combining or eliminating redundant predicates

When Optimizer Chooses Wrong:

Optimizers rely on statistics which can be outdated or inaccurate:

Correlated columns violate independence assumptions
Unusual data distributions not captured by histograms
Parameter binding hides actual values at plan time

Manual Intervention:

When optimizer chooses poorly, you can sometimes help:

Update statistics (ANALYZE / UPDATE STATISTICS)
Use optimizer hints (database-specific)
Rewrite predicates to expose optimization opportunities
Create appropriate indexes for selective predicates

Trust But Verify

Complex Predicate Patterns

Real-world queries often require sophisticated predicate patterns. Recognizing these patterns helps you structure queries correctly and enables the optimizer to choose efficient execution strategies.

Pattern 1: Multi-Column Equality (Composite Key Lookup)

σ_{department = 'Eng' ∧ role = 'Senior' ∧ location = 'NYC'}(Employee)

This pattern benefits from composite indexes on (department, role, location).

Pattern 2: Range with Equality (Prefix + Range)

σ_{department = 'Eng' ∧ salary BETWEEN 100000 AND 150000}(Employee)

Optimal index: (department, salary) - equality column first, range column second.

Pattern 3: OR with Common Prefix

σ_{department = 'Eng' ∧ (role = 'Senior' ∨ role = 'Lead')}(Employee)

Equivalent to: department = 'Eng' AND role IN ('Senior', 'Lead')

Pattern 4: NOT with Complex Subexpression

σ_{¬(department = 'HR' ∧ salary < 50000)}(Employee)

Using De Morgan: department ≠ 'HR' OR salary >= 50000

Advanced Predicate PatternsComplex real-world predicate examples

Input

Output

Pattern Recognition Tips

•Multiple OR on same column → Convert to IN for clarity and potential optimization
•Range with equality → Consider column order in composite indexes
•NOT with compound → Apply De Morgan's laws for clearer logic
•NULL-aware comparisons → Always include explicit NULL handling when needed
•Temporal predicates → Be explicit about boundary inclusion/exclusion
•Hierarchical queries → Consider materialized path, nested sets, or recursive CTEs

Summary and Key Takeaways

Predicate conditions are the precise language for expressing data requirements. Mastering predicates enables you to write correct, efficient, and maintainable queries.

Key Takeaways

•Six comparison operators (=, ≠, <, >, ≤, ≥) form the atomic building blocks of predicates. Each has precise semantics across different data types.
•Logical connectives (AND, OR, NOT) combine atomics into complex expressions. Three-valued logic with UNKNOWN handles NULL correctly.
•Operator precedence follows NOT > AND > OR. Always use parentheses when mixing AND and OR to prevent subtle bugs.
•BETWEEN and IN provide convenient syntax for range and membership testing. Watch for NULL behavior, especially with NOT IN.
•Pattern matching with LIKE enables flexible string queries. Only prefix patterns are index-friendly; suffix/infix patterns require full scans.
•Evaluation order dramatically affects performance. High selectivity, low cost, and index availability guide optimal ordering.
•Complex patterns recur in real applications. Recognizing them helps structure queries for correctness and optimization.

What's Next:

Page Complete

2 / 5