Loading learning content...
If the Selection operator is a filter, then the predicate is the filter's specification—the precise description of what passes through and what doesn't. The expressiveness and correctness of your queries depend entirely on how well you can articulate predicates.
Predicates in relational algebra form a miniature formal language: a collection of comparison operators, logical connectives, and evaluation rules that together allow you to express arbitrarily complex conditions. Every SQL WHERE clause, every index lookup condition, every join predicate—all are built from this fundamental vocabulary.
Mastering predicates means mastering the precision with which you can describe data requirements. An imprecise predicate returns wrong data. An inefficient predicate causes slow queries. A correct, efficient predicate is the mark of expert database craftsmanship.
By the end of this page, you will master comparison operators across different data types, understand logical connective semantics including short-circuit evaluation, comprehend operator precedence and its implications, handle complex predicate patterns including range queries and set membership, and understand predicate evaluation order and optimization.
At the atomic level, predicates are built from comparison operators that compare two values and return a boolean result. Understanding these operators—their syntax, semantics, and behavior across data types—is fundamental.
The Six Fundamental Comparison Operators:
| Operator | Meaning | Formal Notation |
|---|---|---|
= | Equal to | a = b ⟺ value(a) is identical to value(b) |
≠ or <> | Not equal to | a ≠ b ⟺ ¬(a = b) |
< | Less than | a < b ⟺ value(a) precedes value(b) in domain ordering |
> | Greater than | a > b ⟺ value(a) follows value(b) in domain ordering |
≤ or <= | Less than or equal | a ≤ b ⟺ (a < b) ∨ (a = b) |
≥ or >= | Greater than or equal | a ≥ b ⟺ (a > b) ∨ (a = b) |
Operand Types:
Comparison operands can be:
100000, 'Engineering', DATE '2023-01-01'salary * 1.1 or CONCAT(first_name, ' ', last_name)end_date > start_dateType-Specific Comparison Semantics:
Different data types have different comparison rules:
Numeric Types (INTEGER, DECIMAL, FLOAT):
String Types (CHAR, VARCHAR, TEXT):
Date/Time Types (DATE, TIME, TIMESTAMP):
Boolean Types:
String comparison behavior depends heavily on database collation settings. The query σ_{name < 'Bob'}(Employee) may or may not include 'alice' depending on case sensitivity. The query σ_{name < 'Böb'}(Employee) depends on locale-specific sorting rules. Always verify your database's collation when string comparisons affect correctness.
Atomic comparisons combine using logical connectives to form complex predicates. The three fundamental connectives—AND, OR, and NOT—provide the full expressiveness of propositional logic.
Conjunction (AND, ∧):
The conjunction of P₁ and P₂ is true if and only if both P₁ and P₂ are true.
$$P_1 ∧ P_2 = \begin{cases} TRUE & \text{if } P_1 = TRUE \text{ and } P_2 = TRUE \ FALSE & \text{if } P_1 = FALSE \text{ or } P_2 = FALSE \ UNKNOWN & \text{otherwise (NULL involvement)} \end{cases}$$
Disjunction (OR, ∨):
The disjunction of P₁ and P₂ is true if at least one of P₁ or P₂ is true.
$$P_1 ∨ P_2 = \begin{cases} TRUE & \text{if } P_1 = TRUE \text{ or } P_2 = TRUE \ FALSE & \text{if } P_1 = FALSE \text{ and } P_2 = FALSE \ UNKNOWN & \text{otherwise (NULL involvement)} \end{cases}$$
Negation (NOT, ¬):
Negation inverts the truth value of a predicate.
$$¬P = \begin{cases} TRUE & \text{if } P = FALSE \ FALSE & \text{if } P = TRUE \ UNKNOWN & \text{if } P = UNKNOWN \end{cases}$$
| P₁ | P₂ | P₁ AND P₂ | P₁ OR P₂ | NOT P₁ |
|---|---|---|---|---|
| TRUE | TRUE | TRUE | TRUE | FALSE |
| TRUE | FALSE | FALSE | TRUE | FALSE |
| TRUE | UNKNOWN | UNKNOWN | TRUE | FALSE |
| FALSE | TRUE | FALSE | TRUE | TRUE |
| FALSE | FALSE | FALSE | FALSE | TRUE |
| FALSE | UNKNOWN | FALSE | UNKNOWN | TRUE |
| UNKNOWN | TRUE | UNKNOWN | TRUE | UNKNOWN |
| UNKNOWN | FALSE | FALSE | UNKNOWN | UNKNOWN |
| UNKNOWN | UNKNOWN | UNKNOWN | UNKNOWN | UNKNOWN |
Key Observations from Three-Valued Logic:
TRUE dominates OR: If either operand is TRUE, the OR is TRUE regardless of the other operand (even if UNKNOWN).
FALSE dominates AND: If either operand is FALSE, the AND is FALSE regardless of the other operand (even if UNKNOWN).
UNKNOWN propagates: When the dominant value isn't present, UNKNOWN 'infects' the result.
NOT UNKNOWN = UNKNOWN: Negating an unknown value doesn't suddenly make it known.
Short-Circuit Evaluation:
Most database systems use short-circuit evaluation for efficiency:
P₁ AND P₂: If P₁ is FALSE, P₂ is not evaluatedP₁ OR P₂: If P₁ is TRUE, P₂ is not evaluatedHowever, relational algebra is declarative, and the optimizer may reorder predicates. Don't rely on evaluation order for correctness.
De Morgan's Laws allow predicate transformation: • ¬(P₁ ∧ P₂) = (¬P₁) ∨ (¬P₂) • ¬(P₁ ∨ P₂) = (¬P₁) ∧ (¬P₂)
These equivalences help rewrite complex predicates and are used by query optimizers. For example, NOT(salary > 100000 AND status = 'active') becomes (salary <= 100000 OR status <> 'active').
When multiple operators appear in a predicate, operator precedence determines the order of evaluation. Misunderstanding precedence is a common source of query bugs.
Standard Precedence (Highest to Lowest):
| Precedence | Operators | Example |
|---|---|---|
| 1 (Highest) | Arithmetic: *, /, % | salary * 1.1 |
| 2 | Arithmetic: +, - | salary + bonus |
| 3 | Comparison: =, <>, <, >, <=, >= | salary > 100000 |
| 4 | NOT | NOT active |
| 5 | AND | P1 AND P2 |
| 6 (Lowest) | OR | P1 OR P2 |
Critical Implication:
Because AND has higher precedence than OR, the expression:
A OR B AND C
is interpreted as:
A OR (B AND C)
not as:
(A OR B) AND C
These can produce very different results!
DNF (Disjunctive Normal Form): OR of AND clauses, e.g., (A ∧ B) ∨ (C ∧ D). CNF (Conjunctive Normal Form): AND of OR clauses, e.g., (A ∨ B) ∧ (C ∨ D). Query optimizers often convert predicates to one of these forms for easier analysis and optimization.
Beyond simple comparisons, relational systems support convenient syntactic forms for common predicate patterns: range queries and set membership.
BETWEEN: Closed Range Queries
The BETWEEN predicate tests whether a value falls within an inclusive range:
attribute BETWEEN low AND high ≡ attribute >= low AND attribute <= high
Key Characteristics:
BETWEEN high AND low may return empty result if high < low (behavior varies by DBMS)NOT BETWEEN:
attribute NOT BETWEEN low AND high ≡ attribute < low OR attribute > high
IN: Set Membership Testing
The IN predicate tests whether a value belongs to a specified set:
attribute IN (value1, value2, ..., valueN)
is equivalent to:
attribute = value1 OR attribute = value2 OR ... OR attribute = valueN
Key Characteristics:
NOT IN:
attribute NOT IN (v1, v2, ..., vN) ≡ attribute ≠ v1 AND attribute ≠ v2 AND ... AND attribute ≠ vN
If the value list contains NULL, NOT IN behaves unexpectedly:
x NOT IN (1, 2, NULL) = x≠1 AND x≠2 AND x≠NULL
Since x≠NULL is UNKNOWN, the entire expression becomes UNKNOWN for any x (unless x matches 1 or 2). This means NOT IN with NULL in the list returns no rows! Always filter NULLs from value lists used with NOT IN.
While pure relational algebra doesn't include pattern matching, practical database systems (and SQL) provide powerful string matching capabilities. Understanding these is essential for working with real-world data.
LIKE: SQL Pattern Matching
The LIKE predicate matches strings against a pattern containing wildcards:
% (percent): Matches zero or more characters_ (underscore): Matches exactly one characterPattern Examples:
| Pattern | Matches | Doesn't Match |
|---|---|---|
'John%' | 'John', 'Johnny', 'Johnson' | 'john', 'JOhn' (case matters) |
'%son' | 'Johnson', 'son', 'Williamson' | 'Sons', 'Sonata' |
'%oh%' | 'John', 'Mohawk', 'oh' | 'ON', 'Owen' |
'J___' | 'John', 'Jane', 'Jack' | 'Jo', 'James' (wrong length) |
'_o%' | 'Jo', 'John', 'Bob' | 'Oscar' (o not in position 2) |
Escape Characters:
To match literal % or _ characters, use an escape character:
SELECT * FROM Products WHERE name LIKE '%50\%%' ESCAPE '\'
-- Matches: '50% off', 'Get 50% discount'
Regular Expressions:
Many databases support full regular expression matching:
column ~ 'pattern' or SIMILAR TOcolumn REGEXP 'pattern'REGEXP_LIKE(column, 'pattern')Performance Considerations:
Pattern matching predicates have significant performance implications:
'John%'): Can use B-tree indexes efficiently - O(log n) to find start point'%son'): Cannot use standard indexes - requires full scan'%oh%'): Cannot use standard indexes - requires full scanOnly prefix patterns (patterns not starting with %) can leverage B-tree indexes. If you frequently search suffixes, consider storing a reversed copy of the column and using prefix search on it. For complex pattern needs, full-text search or specialized text indexes are far more efficient than LIKE with leading wildcards.
The order in which predicates are evaluated can dramatically affect query performance. While the logical result doesn't depend on evaluation order (commutativity), the computational cost certainly does.
Evaluation Cost Factors:
Optimal Evaluation Strategy:
For AND-connected predicates, evaluate in this order:
For OR-connected predicates:
| Predicate Type | Relative Cost | Typical Selectivity | Index Usable |
|---|---|---|---|
| Primary key equality | Very Low | 1/n (highly selective) | Yes |
| Indexed column equality | Low | 1/NDV | Yes |
| Indexed range comparison | Low-Medium | Varies | Yes |
| Non-indexed equality | Medium | 1/NDV | No (scan) |
| String LIKE 'prefix%' | Low | Varies | Yes (B-tree) |
| String LIKE '%suffix' | High | Varies | No (full scan) |
| Function on column | Medium-High | Varies | No (usually) |
| Subquery (uncorrelated) | High | Varies | Depends |
| Subquery (correlated) | Very High | Varies | Rarely |
Query Optimizer's Role:
Modern query optimizers automatically determine predicate evaluation order based on:
When Optimizer Chooses Wrong:
Optimizers rely on statistics which can be outdated or inaccurate:
Manual Intervention:
When optimizer chooses poorly, you can sometimes help:
ANALYZE / UPDATE STATISTICS)Most of the time, the optimizer makes good choices—that's why we invest in sophisticated cost-based optimizers. But for critical queries, always examine the execution plan. Look for full table scans on large tables, unexpected predicate evaluation order, or missing index usage. Understanding how predicates are evaluated helps you diagnose and fix performance problems.
Real-world queries often require sophisticated predicate patterns. Recognizing these patterns helps you structure queries correctly and enables the optimizer to choose efficient execution strategies.
Pattern 1: Multi-Column Equality (Composite Key Lookup)
σ_{department = 'Eng' ∧ role = 'Senior' ∧ location = 'NYC'}(Employee)
This pattern benefits from composite indexes on (department, role, location).
Pattern 2: Range with Equality (Prefix + Range)
σ_{department = 'Eng' ∧ salary BETWEEN 100000 AND 150000}(Employee)
Optimal index: (department, salary) - equality column first, range column second.
Pattern 3: OR with Common Prefix
σ_{department = 'Eng' ∧ (role = 'Senior' ∨ role = 'Lead')}(Employee)
Equivalent to: department = 'Eng' AND role IN ('Senior', 'Lead')
Pattern 4: NOT with Complex Subexpression
σ_{¬(department = 'HR' ∧ salary < 50000)}(Employee)
Using De Morgan: department ≠ 'HR' OR salary >= 50000
Predicate conditions are the precise language for expressing data requirements. Mastering predicates enables you to write correct, efficient, and maintainable queries.
What's Next:
Having thoroughly explored Selection and its predicates, we turn to the complementary operation: Projection. While Selection filters rows (horizontal), Projection selects columns (vertical). Together, they form the foundation for extracting precisely the data you need from any relation.
You now have a deep understanding of predicate conditions—from basic comparisons through complex patterns. This knowledge enables you to express any row-filtering requirement with precision and efficiency. Next, we'll explore the Projection operator for column selection.