Loading content...
Selection filters rows. Projection selects columns. But real-world queries rarely need just one or the other—they need both, often multiple times, combined in sophisticated ways. Finding "the names and salaries of Engineering employees earning over $100,000" requires filtering (Selection) and column extraction (Projection), working together seamlessly.
The power of relational algebra lies in composability—the ability to combine simple operations into arbitrarily complex expressions. Because each operation produces a valid relation, that result can immediately become input to another operation. This closure property is the foundation of declarative query languages.
This page explores how Selection and Projection combine, the algebraic laws that govern their interaction, how query optimizers exploit these laws for efficiency, and practical patterns for constructing robust, maintainable queries.
By the end of this page, you will master composing Selection and Projection operations, understand commutativity and when operations can be reordered, comprehend predicate pushdown and projection pushdown optimizations, build complex relational algebra expressions systematically, and translate composite expressions to SQL effectively.
The closure property of relational algebra states that every operation produces a relation, which can then serve as input to any other operation. This enables operation composition or chaining.
Basic Composition:
Given operations Op₁ and Op₂, their composition is written as:
$$Op_1(Op_2(R))$$
Read from inside out: first apply Op₂ to R, then apply Op₁ to the result.
Common Composition Patterns:
1. Selection then Projection: $$π_{name, salary}(σ_{department='Engineering'}(Employee))$$
Filter rows first, then select columns.
2. Projection then Selection: $$σ_{salary>100000}(π_{name, salary}(Employee))$$
Select columns first, then filter rows.
3. Multiple Selections: $$σ_{status='active'}(σ_{department='Engineering'}(Employee))$$
Apply multiple filters sequentially.
4. Multiple Projections: $$π_{name}(π_{name, salary}(Employee))$$
Narrow columns progressively (the second projection further restricts the first).
Expression Trees:
Complex expressions are often visualized as trees:
π_{name, salary}
│
▼
σ_{dept='Eng' ∧ salary>100000}
│
▼
Employee
Leaves are base relations; internal nodes are operations. Evaluation proceeds bottom-up.
Canonical Form:
Queries can be written in different but equivalent forms. A canonical form is a standard representation that makes comparison and optimization easier. For Selection and Projection, a common canonical form places all Selections before all Projections:
$$π_L(σ_P(R))$$
This "filter first, project last" pattern often corresponds to efficient execution.
Composition enables declarative thinking: you describe WHAT you want (filter these conditions, show these columns), not HOW to compute it. The database system determines execution order and strategy. This separation of specification from execution is a key advantage of the relational model.
Understanding when operations can be reordered—and when they cannot—is fundamental to query optimization and correct query construction.
Selection-Selection Commutativity:
Multiple Selections always commute: $$σ_{P1}(σ_{P2}(R)) = σ_{P2}(σ_{P1}(R)) = σ_{P1 ∧ P2}(R)$$
The order of applying filters doesn't affect the result, and cascaded filters can be combined into a single conjunctive filter.
Projection-Projection Absorption:
For Projections, the outer one dominates: $$π_{L1}(π_{L2}(R)) = π_{L1}(R) \text{ where } L1 ⊆ L2$$
Projecting to a superset then to a subset equals projecting directly to the subset.
Selection-Projection Commutativity (Conditional):
Selection and Projection commute only if the selection predicate references only attributes retained by the projection:
$$π_L(σ_P(R)) = σ_P(π_L(R)) \text{ if } attributes(P) ⊆ L$$
| Law | Expression | Condition |
|---|---|---|
| Selection Cascade | σ_{P1}(σ_{P2}(R)) = σ_{P1 ∧ P2}(R) | Always valid |
| Selection Commutativity | σ_{P1}(σ_{P2}(R)) = σ_{P2}(σ_{P1}(R)) | Always valid |
| Projection Cascade | π_{L1}(π_{L2}(R)) = π_{L1}(R) | L1 ⊆ L2 |
| Selection-Projection Commute | π_L(σ_P(R)) = σ_P(π_L(R)) | attrs(P) ⊆ L |
| Selection Distribution (∨) | σ_{P1 ∨ P2}(R) = σ_{P1}(R) ∪ σ_{P2}(R) | Always valid |
| Selection Distribution (∧) | σ_{P1 ∧ P2}(R) = σ_{P1}(R) ∩ σ_{P2}(R) | Always valid |
Query optimizers use these equivalence laws to transform queries into more efficient forms. A query written as π_{name}(σ_{salary>100000}(R)) might be executed differently than written, but the result is guaranteed identical by algebraic law. Understanding these laws helps you predict optimizer behavior and write optimization-friendly queries.
One of the most important optimizations in query processing is predicate pushdown (also called filter pushdown or selection pushdown)—applying selection predicates as early as possible in the query plan.
The Principle:
Reducing the number of tuples early in a query plan reduces work for all subsequent operations. If you can filter 1 million rows down to 1,000 before joining, every subsequent operation processes 1,000 rows instead of 1 million.
Mathematical Basis:
Predicate pushdown relies on the cascade and commutativity laws:
$$σ_P(R_1 ⋈ R_2) = σ_P(R_1) ⋈ R_2 \text{ (if P references only } R_1 \text{ attributes)}$$
Example Transformation:
Original query plan:
σ_{E.dept='Eng'}(Employee ⋈ Department)
Optimized with pushdown:
(σ_{dept='Eng'}(Employee)) ⋈ Department
Filtering Employee before the join means the join processes fewer rows.
When Pushdown Applies:
Single-table predicates: Predicates referencing attributes from only one table can always be pushed to that table's scan.
Join predicates: Can be pushed into the join itself (becomes the join condition).
Predicates after aggregation: Some predicates on grouped results can be pushed before grouping if they reference only grouping columns.
When Pushdown Doesn't Apply:
Post-join predicates: Predicates comparing attributes from multiple tables must wait until after the join.
WHERE E.salary > D.avg_salary -- Must evaluate after join
Aggregate-dependent predicates: HAVING conditions can't be pushed before GROUP BY.
Outer join complications: For LEFT/RIGHT/FULL joins, pushdown rules are more complex—pushing to the 'null-supplying' side changes semantics.
SQL and Pushdown:
Optimizers automatically identify pushdown opportunities:
-- Written:
SELECT e.name, d.location
FROM Employee e
JOIN Department d ON e.dept_id = d.id
WHERE e.status = 'active' AND d.country = 'US';
-- Optimizer pushes:
-- e.status = 'active' → to Employee scan
-- d.country = 'US' → to Department scan
-- Both predicates evaluated before join
Predicate pushdown to the outer (null-preserving) side of a LEFT/RIGHT join can change query semantics. The optimizer must be careful:
• Predicates on the preserved side: Safe to push • Predicates on the null-supplying side: May turn outer join into inner join
Always verify execution plans for outer joins with complex predicates.
Complementing predicate pushdown, projection pushdown (also called column pruning) eliminates unnecessary columns as early as possible in query execution.
The Principle:
Carrying fewer columns through operations reduces:
Determining Necessary Columns:
The optimizer traces column usage backward from the output:
Example Transformation:
Interaction with Storage Engines:
Row-Oriented Storage:
Column-Oriented Storage:
Covering Indexes:
-- Query:
SELECT name, department FROM Employee WHERE status = 'active';
-- If index exists on (status, name, department):
-- Index-only scan possible—table never accessed!
SELECT * defeats projection pushdown because the optimizer must assume all columns are needed. Always specify columns explicitly in production queries. This enables optimization, documents intent, and protects against schema changes adding unexpected columns.
Real-world queries combine multiple operations. Developing a systematic approach to constructing complex relational algebra expressions improves correctness and maintainability.
Step-by-Step Construction Method:
Example Construction:
Query: Find names and salaries of active Engineering employees with their department locations, for departments in the US.
Analysis:
Common Expression Patterns:
Pattern 1: Simple Filter and Project
π_{columns}(σ_{predicate}(Table))
The most common pattern—equivalent to basic SELECT...FROM...WHERE.
Pattern 2: Join with Local Filters
π_{columns}(
σ_{T1.pred}(Table1) ⋈_{join_cond} σ_{T2.pred}(Table2)
)
Apply table-specific filters before joining.
Pattern 3: Join with Post-Join Filter
π_{columns}(
σ_{post_join_pred}(
Table1 ⋈_{join_cond} Table2
)
)
For predicates comparing attributes from both tables.
Pattern 4: Multi-Table Query
π_{columns}(
σ_{predicates}(
(Table1 ⋈ Table2) ⋈ Table3
)
)
Multiple joins with filtering and final projection.
Pattern 5: Derived Table / Subquery
π_{outer_columns}(
OuterTable ⋈ (
π_{inner_columns}(σ_{inner_pred}(InnerTable))
)
)
Nested expressions for subquery-like patterns.
When writing queries, optimize for readability first. Write the logical expression that clearly captures intent, then trust the optimizer to transform it efficiently. Prematurely optimizing query structure can make code harder to understand and maintain. Profile before hand-optimizing.
Understanding how composite relational algebra expressions map to SQL helps bridge theory and practice.
Direct Mapping:
$$π_{A_1, A_2, ..., A_n}(σ_P(R))$$
maps to:
SELECT DISTINCT A1, A2, ..., An
FROM R
WHERE P;
(DISTINCT for set semantics; omit for bag semantics)
Join Expression:
$$π_{columns}(σ_P(R_1 ⋈_{condition} R_2))$$
maps to:
SELECT columns
FROM R1
JOIN R2 ON condition
WHERE P;
Complex Expression:
$$π_{E.name, D.location}(σ_{E.salary>100000}(Employee ⋈_{dept_id=id} Department))$$
maps to:
SELECT e.name, d.location
FROM Employee e
JOIN Department d ON e.dept_id = d.id
WHERE e.salary > 100000;
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354
-- Pattern 1: Selection then Projection-- RA: π_{name, salary}(σ_{department='Engineering'}(Employee)) SELECT name, salaryFROM EmployeeWHERE department = 'Engineering'; -- Pattern 2: Multiple selections (cascade)-- RA: σ_{status='active'}(σ_{salary>100000}(Employee))-- RA: = σ_{status='active' ∧ salary>100000}(Employee) SELECT *FROM EmployeeWHERE status = 'active' AND salary > 100000; -- Pattern 3: Join with pushed predicates-- RA: π_{e.name, d.location}(-- σ_{e.department='Engineering'}(Employee) -- ⋈_{dept_id=id} -- σ_{d.country='US'}(Department)-- ) SELECT e.name, d.locationFROM Employee eJOIN Department d ON e.dept_id = d.idWHERE e.department = 'Engineering' AND d.country = 'US'; -- Pattern 4: Self-join with rename (conceptual)-- RA: π_{E1.name, E2.name}(-- σ_{E1.manager_id = E2.emp_id}(-- ρ_{E1}(Employee) × ρ_{E2}(Employee)-- )-- ) SELECT e1.name AS employee, e2.name AS managerFROM Employee e1JOIN Employee e2 ON e1.manager_id = e2.emp_id; -- Pattern 5: Subquery (derived table)-- RA: π_{name}(-- Employee ⋈_{dept_id ∈ result}(-- π_{id}(σ_{country='US'}(Department))-- )-- ) SELECT e.nameFROM Employee eWHERE e.dept_id IN ( SELECT id FROM Department WHERE country = 'US');SQL Clause Correspondence:
| Relational Algebra | SQL Clause |
|---|---|
| π (Projection) | SELECT |
| σ (Selection) | WHERE |
| × (Cartesian Product) | FROM with multiple tables, no JOIN |
| ⋈ (Join) | JOIN ... ON |
| ∪ (Union) | UNION |
| ∩ (Intersection) | INTERSECT |
| − (Difference) | EXCEPT / MINUS |
| ρ (Rename) | AS (alias) |
| γ (Grouping) | GROUP BY |
Key Differences:
When approaching a query problem, think in relational algebra terms: What tables? What joins? What filters? What output? This structured thinking produces cleaner SQL and helps you understand optimizer behavior. The algebra gives you a mental model; SQL gives you the syntax.
Understanding optimization principles helps you write queries that are easier for the optimizer to handle and predict execution behavior.
Heuristic Optimization Rules:
These rules generally improve performance without cost estimation:
Cost-Based Optimization Considerations:
Beyond heuristics, optimizers estimate costs:
| Transformation | Before | After | Benefit |
|---|---|---|---|
| Selection Pushdown | σ_P(R ⋈ S) | σ_P(R) ⋈ S | Fewer rows in join |
| Projection Pushdown | π_L(R ⋈ S) | π_L(π_{L∪join}(R) ⋈ π_{L∪join}(S)) | Narrower tuples |
| Selection Cascade | σ_P1(σ_P2(R)) | σ_{P1∧P2}(R) | Single filter pass |
| Selection Split | σ_{P1∧P2}(R ⋈ S) | σ_{P1}(R) ⋈ σ_{P2}(S) | Push to each table |
| Cartesian to Join | σ_{R.a=S.b}(R × S) | R ⋈_{a=b} S | Avoid full product |
| Join Reordering | (R ⋈ S) ⋈ T | R ⋈ (S ⋈ T) | Smaller intermediates |
Writing Optimization-Friendly Queries:
DO:
AVOID:
-- Optimization-friendly:
SELECT e.name, e.salary, d.dept_name
FROM Employee e
JOIN Department d ON e.dept_id = d.id
WHERE e.status = 'active'
AND e.hire_date >= '2020-01-01'
AND d.country = 'US';
-- Less optimization-friendly:
SELECT *
FROM Employee e, Department d
WHERE e.dept_id = d.id
AND (e.status = 'active' OR d.status = 'active')
AND YEAR(e.hire_date) >= 2020;
Optimizers are powerful but not omniscient. They rely on statistics that may be outdated, make independence assumptions that don't hold, and have limited time for exploring alternatives. Complex queries may not find optimal plans. When performance matters, examine execution plans and consider query restructuring.
Combining Selection and Projection operations forms the foundation of practical query construction. Understanding composition and optimization enables you to write efficient, maintainable queries.
Module Complete:
Congratulations! You have completed the Selection and Projection module. You now understand:
These foundational operations appear in virtually every database query. Master them, and you've mastered the core of relational data retrieval.
You have completed Module 2: Selection and Projection. These two fundamental unary operations—filtering rows with Selection and extracting columns with Projection—form the backbone of relational queries. Combined with the algebraic laws for their transformation, you now have the theoretical foundation and practical knowledge to construct and optimize queries effectively. The next module explores Set Operations: Union, Intersection, and Difference.