Loading learning content...
Imagine you have a database containing millions of employee records, but you only need information about employees in the Engineering department who earn more than $100,000. How do you extract precisely the rows you need without retrieving and discarding irrelevant data? This is the fundamental problem that the Selection operator (σ) solves—and it does so with mathematical elegance and computational efficiency.
The Selection operator, denoted by the Greek letter sigma (σ), is one of the most frequently used operations in relational algebra. Every time you write a WHERE clause in SQL, you're invoking the conceptual machinery that Selection provides. Understanding this operator deeply isn't just academic—it's essential for writing efficient queries, understanding query optimization, and debugging performance problems in database systems.
By the end of this page, you will understand the formal mathematical definition of the Selection operator, master predicate construction and evaluation, comprehend the implementation strategies used by real database systems, and recognize how Selection forms the foundation for declarative data retrieval across all SQL operations.
The Selection operator is a unary operation in relational algebra, meaning it operates on a single relation to produce a new relation as output. Its purpose is horizontal filtering—selecting specific tuples (rows) from a relation based on a specified condition while preserving all attributes (columns).
Formal Definition:
Given a relation R with schema R(A₁, A₂, ..., Aₙ) and a selection predicate P, the Selection operation is defined as:
$$σ_P(R) = {t \mid t ∈ R ∧ P(t)}$$
In plain language: the Selection σₚ(R) returns the set of all tuples t from relation R where the predicate P evaluates to true for t.
Key Properties:
Type Preservation: The output schema is identical to the input schema. If R has attributes (A₁, A₂, ..., Aₙ), then σₚ(R) also has attributes (A₁, A₂, ..., Aₙ).
Cardinality Reduction: The number of tuples in the result is less than or equal to the number in the input: |σₚ(R)| ≤ |R|. Selection can only reduce or maintain cardinality, never increase it.
Closure Property: The output is always a valid relation, ensuring that Selection can be composed with other relational operations.
You may encounter different notational conventions: σ_{P}(R), σP, SELECTP, or R WHERE P. All express the same operation. The subscript or bracketed expression contains the selection predicate, and the relation appears either inside parentheses or to the left of the operator symbol. This course uses σ_{P}(R) as the standard notation.
Understanding the Selection operator requires grounding in set theory, as relational algebra is fundamentally a set-based formalism. The Selection operator performs set restriction—it produces a subset of the input relation.
Set-Theoretic Properties:
1. Subset Guarantee: For any relation R and predicate P: $$σ_P(R) ⊆ R$$
Every tuple in the result must be a tuple in the original relation. Selection never creates new tuples or modifies existing ones.
2. Idempotence: Applying the same Selection twice produces the same result: $$σ_P(σ_P(R)) = σ_P(R)$$
Once tuples not satisfying P are eliminated, applying the same filter again has no additional effect.
3. Commutativity: The order of cascaded Selections doesn't matter: $$σ_{P1}(σ_{P2}(R)) = σ_{P2}(σ_{P1}(R))$$
This property is crucial for query optimization, allowing the optimizer to reorder Selection operations.
4. Cascade into Conjunction: Multiple Selections can be combined into one with a conjunctive predicate: $$σ_{P1}(σ_{P2}(R)) = σ_{P1 ∧ P2}(R)$$
This equivalence allows query optimizers to either split or merge Selection operations based on execution efficiency.
| Law Name | Expression | Meaning |
|---|---|---|
| Identity | σ_{true}(R) = R | Selecting with always-true predicate returns entire relation |
| Empty Result | σ_{false}(R) = ∅ | Selecting with always-false predicate returns empty relation |
| Cascade | σ_{P1}(σ_{P2}(R)) = σ_{P1 ∧ P2}(R) | Cascaded selections combine into single selection |
| Commutativity | σ_{P1}(σ_{P2}(R)) = σ_{P2}(σ_{P1}(R)) | Order of cascaded selections doesn't matter |
| Idempotence | σ_P(σ_P(R)) = σ_P(R) | Repeated identical selection has no additional effect |
| Selectivity Bound | 0 ≤ |σ_P(R)| ≤ |R| | Result cardinality bounded by input cardinality |
Relationship to Predicate Logic:
The selection predicate P is a formula in first-order predicate logic. For each tuple t, P(t) evaluates to either true or false (we'll address NULL values later). The predicate can reference:
The expressiveness of selection predicates directly corresponds to the fragment of first-order logic restricted to single-tuple evaluation.
These algebraic laws aren't just theoretical curiosities—they're the foundation of query optimization. When a database optimizer sees a complex query, it uses these equivalences to rewrite the query into a more efficient form while guaranteeing identical results. Understanding these laws helps you predict optimizer behavior and write queries that are easier to optimize.
The power and flexibility of Selection lies in its predicate—the boolean expression that determines which tuples pass through the filter. Understanding predicate construction is essential for effective query writing.
Atomic Predicates (Simple Conditions):
The simplest predicates compare a single attribute to a value or another attribute:
salary > 100000end_date > start_dateGeneral form: <attribute> <comparison_op> <value|attribute>
Comparison operators include: =, ≠ (or <>), <, >, ≤ (<=), ≥ (>=)
Compound Predicates:
Atomic predicates combine using logical connectives to form compound predicates:
Conjunction (AND): Both conditions must be true
department = 'Engineering' AND salary > 100000Disjunction (OR): At least one condition must be true
department = 'Engineering' OR department = 'Research'Negation (NOT): The condition must be false
NOT (status = 'inactive')Operator Precedence (highest to lowest):
A OR B AND C evaluates as A OR (B AND C), not (A OR B) AND C. Always use explicit parentheses for clarity.salary > 100000 returns UNKNOWN (not false) when salary is NULL. We'll cover three-valued logic in detail.One of the most subtle aspects of Selection is how it handles NULL values. In standard relational algebra, tuples either satisfy a predicate or they don't—a two-valued Boolean world. However, SQL and practical database systems use three-valued logic (3VL) to handle unknown or missing values.
The Three Truth Values:
Selection passes only tuples where the predicate evaluates to TRUE. Tuples evaluating to FALSE or UNKNOWN are excluded.
NULL in Comparisons:
When any operand in a comparison is NULL, the result is UNKNOWN:
NULL = NULL → UNKNOWN (not TRUE!)NULL > 100 → UNKNOWN100 = NULL → UNKNOWN'Alice' <> NULL → UNKNOWN| P | Q | P AND Q | P OR Q | NOT P |
|---|---|---|---|---|
| TRUE | TRUE | TRUE | TRUE | FALSE |
| TRUE | FALSE | FALSE | TRUE | FALSE |
| TRUE | UNKNOWN | UNKNOWN | TRUE | FALSE |
| FALSE | TRUE | FALSE | TRUE | TRUE |
| FALSE | FALSE | FALSE | FALSE | TRUE |
| FALSE | UNKNOWN | FALSE | UNKNOWN | TRUE |
| UNKNOWN | TRUE | UNKNOWN | TRUE | UNKNOWN |
| UNKNOWN | FALSE | FALSE | UNKNOWN | UNKNOWN |
| UNKNOWN | UNKNOWN | UNKNOWN | UNKNOWN | UNKNOWN |
A critical implication: σ_{P}(R) ∪ σ_{NOT P}(R) ≠ R when NULLs are present. Tuples with NULL values in predicate-relevant columns disappear from both results! To include NULLs, you must explicitly test: salary > 100000 OR salary IS NULL.
Testing for NULL:
Since comparisons with NULL yield UNKNOWN, SQL provides special predicates:
IS NULL: Returns TRUE if the value is NULLIS NOT NULL: Returns TRUE if the value is not NULLThese are the only predicates that return TRUE or FALSE (never UNKNOWN) when evaluating NULL values.
Practical Implications:
When writing selections involving nullable columns:
Understanding the performance characteristics of Selection is essential for writing efficient queries and predicting system behavior. Two key concepts govern Selection performance: selectivity and evaluation cost.
Selectivity:
The selectivity of a predicate P on relation R is the fraction of tuples that satisfy the predicate:
$$sel(P, R) = \frac{|σ_P(R)|}{|R|}$$
Selectivity ranges from 0 (no tuples match) to 1 (all tuples match).
Examples of Selectivity:
emp_id = 'E001' on primary key: selectivity ≈ 1/n (highly selective)department = 'Engineering' on 5-department company: selectivity ≈ 0.2salary > 0 on salary column: selectivity ≈ 1.0 (non-selective)status = 'active' AND country = 'US' AND age > 30: compound selectivitySelectivity Estimation:
Database query optimizers estimate selectivity using:
| Predicate Type | Typical Selectivity Estimate | Assumptions |
|---|---|---|
| column = constant | 1/NDV(column) | Uniform distribution of values |
| column > constant | (max - constant)/(max - min) | Uniform distribution in range |
| column < constant | (constant - min)/(max - min) | Uniform distribution in range |
| column BETWEEN a AND b | (b - a)/(max - min) | Uniform distribution in range |
| column LIKE 'prefix%' | Varies | Uses histogram if available |
| P₁ AND P₂ | sel(P₁) × sel(P₂) | Independence assumption |
| P₁ OR P₂ | sel(P₁) + sel(P₂) - sel(P₁)×sel(P₂) | Inclusion-exclusion |
| NOT P | 1 - sel(P) | Complement |
Time Complexity:
Without Indexes: Selection requires a full table scan: O(n) where n is the number of tuples. Every tuple must be read and evaluated against the predicate.
With Indexes: If an index exists on the selection attribute, complexity improves dramatically:
Space Complexity:
Selection produces a result relation that requires storage. In the worst case (selectivity = 1), the output equals the input size. In practice:
Query optimizers use selectivity estimates to choose execution strategies. A highly selective predicate (sel < 0.01) might warrant an index lookup, while a non-selective predicate (sel > 0.3) might favor a full table scan. Inaccurate selectivity estimates cause poor query plans—one reason database statistics must be kept current.
Understanding how database systems implement Selection illuminates why some queries are fast and others are slow. The query execution engine has several strategies available, each suited to different scenarios.
Strategy 1: Full Table Scan
The simplest approach reads every tuple and evaluates the predicate:
Algorithm: FullTableScan(R, P)
result ← ∅
for each page p in R:
for each tuple t in p:
if P(t) = TRUE:
add t to result
return result
Strategy 2: Index Scan
When an index exists on selection attributes:
Algorithm: IndexScan(R, P, Index)
result ← ∅
keys ← SearchIndex(Index, P) -- Returns matching keys
for each key k in keys:
rid ← Index.lookup(k) -- Get record ID
t ← R.fetch(rid) -- Fetch tuple by ID
if P(t) = TRUE: -- Verify (for partial matches)
add t to result
return result
Strategy 3: Bitmap Index Scan
For complex predicates combining multiple conditions:
Algorithm: BitmapScan(R, P₁ AND P₂, Index₁, Index₂)
bitmap1 ← Index₁.getBitmap(P₁) -- Bitmap of RIDs matching P₁
bitmap2 ← Index₂.getBitmap(P₂) -- Bitmap of RIDs matching P₂
combined ← bitmap1 AND bitmap2 -- Bitwise AND
result ← ∅
for each bit set in combined:
rid ← position of bit
t ← R.fetch(rid)
add t to result
return result
Strategy 4: Index-Only Scan (Covering Index)
When all needed columns exist in the index:
Algorithm: IndexOnlyScan(Index, P)
result ← ∅
for each entry e in Index where P(e) = TRUE:
add ProjectedTuple(e) to result
return result
The Selection operator manifests in SQL through the WHERE clause. Understanding this connection helps bridge theoretical algebra with practical query writing.
Direct Mapping:
| Relational Algebra | SQL Equivalent |
|---|---|
| σ_{P}(R) | SELECT * FROM R WHERE P |
SQL extends the basic Selection capability with additional features not present in pure relational algebra:
12345678910111213141516171819202122232425262728293031323334353637
-- Basic Selection: σ_{department='Engineering'}(Employee)SELECT * FROM Employee WHERE department = 'Engineering'; -- Compound Selection: σ_{department='Engineering' ∧ salary>100000}(Employee)SELECT * FROM Employee WHERE department = 'Engineering' AND salary > 100000; -- Range SelectionSELECT * FROM Employee WHERE salary BETWEEN 80000 AND 120000; -- Pattern Matching (SQL extension beyond pure relational algebra)SELECT * FROM Employee WHERE name LIKE 'A%'; -- Names starting with 'A' -- Set Membership (simpler than multiple OR conditions)SELECT * FROM Employee WHERE department IN ('Engineering', 'Research', 'Product'); -- NULL HandlingSELECT * FROM Employee WHERE salary IS NOT NULL AND salary > 100000; -- Combining NULL check with ORSELECT * FROM Employee WHERE salary > 100000 OR salary IS NULL; -- Include unknown salariesSQL Predicates Beyond Basic Algebra:
SQL provides several predicate forms that extend beyond simple comparisons:
salary BETWEEN 50000 AND 100000 (inclusive range)department IN ('Eng', 'HR', 'Sales') (set membership)name LIKE 'John%' (pattern matching with wildcards)Expression in Predicates:
SQL allows expressions, not just attributes, in predicates:
-- Function application in predicate
SELECT * FROM Employee WHERE YEAR(hire_date) = 2023;
-- Arithmetic expression
SELECT * FROM Employee WHERE salary * 1.1 > 100000;
-- String function
SELECT * FROM Employee WHERE UPPER(department) = 'ENGINEERING';
Warning: Expressions on columns often prevent index usage, causing full table scans. Prefer hire_date >= '2023-01-01' AND hire_date < '2024-01-01' over YEAR(hire_date) = 2023.
A predicate is 'SARGable' (Search ARGument able) if it can leverage an index. Non-SARGable predicates include: functions on columns (YEAR(date)), arithmetic on columns (salary * 1.1), implicit type conversions, and LIKE patterns starting with wildcards ('%suffix'). Rewrite predicates to be SARGable whenever possible for optimal performance.
The Selection operator is the fundamental filtering mechanism in relational algebra, and mastering its properties is essential for database professionals. Let's consolidate what we've learned:
What's Next:
Having mastered the Selection operator for horizontal filtering, we'll next explore the predicates in greater depth. Understanding the different types of conditions you can express—and how databases evaluate them—is crucial for writing effective queries. The next page dives deep into predicate conditions, covering comparison types, logical operators, and advanced predicate patterns.
You now have a comprehensive understanding of the Selection operator—from its formal mathematical definition through implementation strategies to practical SQL usage. This knowledge forms the foundation for all filtering operations in relational databases. Next, we'll dive deeper into the predicate conditions that power Selection.