Selection And Projection - Learning Module

Loading content...

0/252

Combining Operations

From Building Blocks to Complex Queries

Selection filters rows. Projection selects columns. But real-world queries rarely need just one or the other—they need both, often multiple times, combined in sophisticated ways. Finding "the names and salaries of Engineering employees earning over $100,000" requires filtering (Selection) and column extraction (Projection), working together seamlessly.

The power of relational algebra lies in composability—the ability to combine simple operations into arbitrarily complex expressions. Because each operation produces a valid relation, that result can immediately become input to another operation. This closure property is the foundation of declarative query languages.

This page explores how Selection and Projection combine, the algebraic laws that govern their interaction, how query optimizers exploit these laws for efficiency, and practical patterns for constructing robust, maintainable queries.

What You Will Learn

By the end of this page, you will master composing Selection and Projection operations, understand commutativity and when operations can be reordered, comprehend predicate pushdown and projection pushdown optimizations, build complex relational algebra expressions systematically, and translate composite expressions to SQL effectively.

Operation Composition

The closure property of relational algebra states that every operation produces a relation, which can then serve as input to any other operation. This enables operation composition or chaining.

Basic Composition:

Given operations Op₁ and Op₂, their composition is written as:

$$Op_1(Op_2(R))$$

Read from inside out: first apply Op₂ to R, then apply Op₁ to the result.

Common Composition Patterns:

1. Selection then Projection: $$π_{name, salary}(σ_{department='Engineering'}(Employee))$$

Filter rows first, then select columns.

2. Projection then Selection: $$σ_{salary>100000}(π_{name, salary}(Employee))$$

Select columns first, then filter rows.

3. Multiple Selections: $$σ_{status='active'}(σ_{department='Engineering'}(Employee))$$

Apply multiple filters sequentially.

4. Multiple Projections: $$π_{name}(π_{name, salary}(Employee))$$

Narrow columns progressively (the second projection further restricts the first).

Step-by-Step Composition EvaluationTracing through a composed operation

Input

Output

Expression Trees:

Complex expressions are often visualized as trees:

          π_{name, salary}
               │
               ▼
    σ_{dept='Eng' ∧ salary>100000}
               │
               ▼
           Employee

Leaves are base relations; internal nodes are operations. Evaluation proceeds bottom-up.

Canonical Form:

Queries can be written in different but equivalent forms. A canonical form is a standard representation that makes comparison and optimization easier. For Selection and Projection, a common canonical form places all Selections before all Projections:

$$π_L(σ_P(R))$$

This "filter first, project last" pattern often corresponds to efficient execution.

Why Composition Matters

Composition enables declarative thinking: you describe WHAT you want (filter these conditions, show these columns), not HOW to compute it. The database system determines execution order and strategy. This separation of specification from execution is a key advantage of the relational model.

Commutativity and Equivalence

Understanding when operations can be reordered—and when they cannot—is fundamental to query optimization and correct query construction.

Selection-Selection Commutativity:

Multiple Selections always commute: $$σ_{P1}(σ_{P2}(R)) = σ_{P2}(σ_{P1}(R)) = σ_{P1 ∧ P2}(R)$$

The order of applying filters doesn't affect the result, and cascaded filters can be combined into a single conjunctive filter.

Projection-Projection Absorption:

For Projections, the outer one dominates: $$π_{L1}(π_{L2}(R)) = π_{L1}(R) \text{ where } L1 ⊆ L2$$

Projecting to a superset then to a subset equals projecting directly to the subset.

Selection-Projection Commutativity (Conditional):

Selection and Projection commute only if the selection predicate references only attributes retained by the projection:

$$π_L(σ_P(R)) = σ_P(π_L(R)) \text{ if } attributes(P) ⊆ L$$

When Commutativity Applies (and Fails)Examples of valid and invalid reorderings

Input

Output

Algebraic Equivalence Laws for Selection and Projection
Law	Expression	Condition
Selection Cascade	σ_{P1}(σ_{P2}(R)) = σ_{P1 ∧ P2}(R)	Always valid
Selection Commutativity	σ_{P1}(σ_{P2}(R)) = σ_{P2}(σ_{P1}(R))	Always valid
Projection Cascade	π_{L1}(π_{L2}(R)) = π_{L1}(R)	L1 ⊆ L2
Selection-Projection Commute	π_L(σ_P(R)) = σ_P(π_L(R))	attrs(P) ⊆ L
Selection Distribution (∨)	σ_{P1 ∨ P2}(R) = σ_{P1}(R) ∪ σ_{P2}(R)	Always valid
Selection Distribution (∧)	σ_{P1 ∧ P2}(R) = σ_{P1}(R) ∩ σ_{P2}(R)	Always valid

Optimizer's Toolbox

Query optimizers use these equivalence laws to transform queries into more efficient forms. A query written as π_{name}(σ_{salary>100000}(R)) might be executed differently than written, but the result is guaranteed identical by algebraic law. Understanding these laws helps you predict optimizer behavior and write optimization-friendly queries.

Predicate Pushdown Optimization

One of the most important optimizations in query processing is predicate pushdown (also called filter pushdown or selection pushdown)—applying selection predicates as early as possible in the query plan.

The Principle:

Reducing the number of tuples early in a query plan reduces work for all subsequent operations. If you can filter 1 million rows down to 1,000 before joining, every subsequent operation processes 1,000 rows instead of 1 million.

Mathematical Basis:

Predicate pushdown relies on the cascade and commutativity laws:

$$σ_P(R_1 ⋈ R_2) = σ_P(R_1) ⋈ R_2 \text{ (if P references only } R_1 \text{ attributes)}$$

Example Transformation:

Original query plan:

σ_{E.dept='Eng'}(Employee ⋈ Department)

Optimized with pushdown:

(σ_{dept='Eng'}(Employee)) ⋈ Department

Filtering Employee before the join means the join processes fewer rows.

Predicate Pushdown ImpactComparing execution with and without pushdown

Input

Output

When Pushdown Applies:

Single-table predicates: Predicates referencing attributes from only one table can always be pushed to that table's scan.
Join predicates: Can be pushed into the join itself (becomes the join condition).
Predicates after aggregation: Some predicates on grouped results can be pushed before grouping if they reference only grouping columns.

When Pushdown Doesn't Apply:

Post-join predicates: Predicates comparing attributes from multiple tables must wait until after the join.
```
WHERE E.salary > D.avg_salary  -- Must evaluate after join
```
Aggregate-dependent predicates: HAVING conditions can't be pushed before GROUP BY.
Outer join complications: For LEFT/RIGHT/FULL joins, pushdown rules are more complex—pushing to the 'null-supplying' side changes semantics.

SQL and Pushdown:

Optimizers automatically identify pushdown opportunities:

-- Written:
SELECT e.name, d.location
FROM Employee e
JOIN Department d ON e.dept_id = d.id
WHERE e.status = 'active' AND d.country = 'US';

-- Optimizer pushes:
-- e.status = 'active' → to Employee scan
-- d.country = 'US' → to Department scan
-- Both predicates evaluated before join

Outer Joins and Pushdown

Predicate pushdown to the outer (null-preserving) side of a LEFT/RIGHT join can change query semantics. The optimizer must be careful:

• Predicates on the preserved side: Safe to push • Predicates on the null-supplying side: May turn outer join into inner join

Always verify execution plans for outer joins with complex predicates.

Projection Pushdown Optimization

Complementing predicate pushdown, projection pushdown (also called column pruning) eliminates unnecessary columns as early as possible in query execution.

The Principle:

Carrying fewer columns through operations reduces:

Memory usage for intermediate results
I/O for reading from disk (in column stores)
Network transfer (in distributed systems)
Hash/sort key sizes for joins and deduplication

Determining Necessary Columns:

The optimizer traces column usage backward from the output:

Start with columns in final SELECT
Add columns used in ORDER BY, GROUP BY, HAVING
Add columns used in predicates (WHERE, JOIN conditions)
Add columns for any subqueries that reference outer columns

Example Transformation:

Projection Pushdown in ActionPushing column elimination into table scans

Input

Output

Interaction with Storage Engines:

Row-Oriented Storage:

Must read entire row from disk
Projection benefit: Less data copied to memory/processed
Still pays full I/O cost for each page

Column-Oriented Storage:

Reads only needed columns from disk
Projection benefit: Dramatic I/O reduction
Ideal for analytical queries selecting few columns from wide tables

Covering Indexes:

Include all queried columns
Projection satisfied entirely from index
No table access needed (index-only scan)

-- Query:
SELECT name, department FROM Employee WHERE status = 'active';

-- If index exists on (status, name, department):
-- Index-only scan possible—table never accessed!

SELECT * Considered Harmful

SELECT * defeats projection pushdown because the optimizer must assume all columns are needed. Always specify columns explicitly in production queries. This enables optimization, documents intent, and protects against schema changes adding unexpected columns.

Building Complex Expressions

Real-world queries combine multiple operations. Developing a systematic approach to constructing complex relational algebra expressions improves correctness and maintainability.

Step-by-Step Construction Method:

Identify the result schema: What columns should the final output contain?
Identify the base relations: Which tables are involved?
Identify filter conditions: What predicates must be satisfied?
Identify relationship conditions: How do tables relate (join conditions)?
Build from inside out: Start with base relations, add joins, then filters, then projections.

Example Construction:

Query: Find names and salaries of active Engineering employees with their department locations, for departments in the US.

Analysis:

Result: name, salary, location
Base relations: Employee, Department
Filters: status='active', department='Engineering', country='US'
Join: Employee.dept_id = Department.id

Systematic Expression ConstructionBuilding a complex query step by step

Input

Output

Common Expression Patterns:

Pattern 1: Simple Filter and Project

π_{columns}(σ_{predicate}(Table))

The most common pattern—equivalent to basic SELECT...FROM...WHERE.

Pattern 2: Join with Local Filters

π_{columns}(
  σ_{T1.pred}(Table1) ⋈_{join_cond} σ_{T2.pred}(Table2)
)

Apply table-specific filters before joining.

Pattern 3: Join with Post-Join Filter

π_{columns}(
  σ_{post_join_pred}(
    Table1 ⋈_{join_cond} Table2
  )
)

For predicates comparing attributes from both tables.

Pattern 4: Multi-Table Query

π_{columns}(
  σ_{predicates}(
    (Table1 ⋈ Table2) ⋈ Table3
  )
)

Multiple joins with filtering and final projection.

Pattern 5: Derived Table / Subquery

π_{outer_columns}(
  OuterTable ⋈ (
    π_{inner_columns}(σ_{inner_pred}(InnerTable))
  )
)

Nested expressions for subquery-like patterns.

Readability vs Optimization

When writing queries, optimize for readability first. Write the logical expression that clearly captures intent, then trust the optimizer to transform it efficiently. Prematurely optimizing query structure can make code harder to understand and maintain. Profile before hand-optimizing.

Translation to SQL

Understanding how composite relational algebra expressions map to SQL helps bridge theory and practice.

Direct Mapping:

$$π_{A_1, A_2, ..., A_n}(σ_P(R))$$

maps to:

SELECT DISTINCT A1, A2, ..., An
FROM R
WHERE P;

(DISTINCT for set semantics; omit for bag semantics)

Join Expression:

$$π_{columns}(σ_P(R_1 ⋈_{condition} R_2))$$

maps to:

SELECT columns
FROM R1
JOIN R2 ON condition
WHERE P;

Complex Expression:

$$π_{E.name, D.location}(σ_{E.salary>100000}(Employee ⋈_{dept_id=id} Department))$$

maps to:

SELECT e.name, d.location
FROM Employee e
JOIN Department d ON e.dept_id = d.id  
WHERE e.salary > 100000;

expression_to_sql.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
-- Pattern 1: Selection then Projection
-- RA: π_{name, salary}(σ_{department='Engineering'}(Employee))
 
SELECT name, salary
FROM Employee
WHERE department = 'Engineering';
 
-- Pattern 2: Multiple selections (cascade)
-- RA: σ_{status='active'}(σ_{salary>100000}(Employee))
-- RA: = σ_{status='active' ∧ salary>100000}(Employee)
 
SELECT *
FROM Employee
WHERE status = 'active' 
  AND salary > 100000;
 
-- Pattern 3: Join with pushed predicates
-- RA: π_{e.name, d.location}(
--       σ_{e.department='Engineering'}(Employee) 
--       ⋈_{dept_id=id} 
--       σ_{d.country='US'}(Department)
--     )
 
SELECT e.name, d.location
FROM Employee e
JOIN Department d ON e.dept_id = d.id
WHERE e.department = 'Engineering'
  AND d.country = 'US';
 
-- Pattern 4: Self-join with rename (conceptual)
-- RA: π_{E1.name, E2.name}(
--       σ_{E1.manager_id = E2.emp_id}(
--         ρ_{E1}(Employee) × ρ_{E2}(Employee)
--       )
--     )
 
SELECT e1.name AS employee, e2.name AS manager
FROM Employee e1
JOIN Employee e2 ON e1.manager_id = e2.emp_id;
 
-- Pattern 5: Subquery (derived table)
-- RA: π_{name}(
--       Employee ⋈_{dept_id ∈ result}(
--         π_{id}(σ_{country='US'}(Department))
--       )
--     )
 
SELECT e.name
FROM Employee e
WHERE e.dept_id IN (
    SELECT id 
    FROM Department 
    WHERE country = 'US'
);

SQL Clause Correspondence:

Relational Algebra	SQL Clause
π (Projection)	SELECT
σ (Selection)	WHERE
× (Cartesian Product)	FROM with multiple tables, no JOIN
⋈ (Join)	JOIN ... ON
∪ (Union)	UNION
∩ (Intersection)	INTERSECT
− (Difference)	EXCEPT / MINUS
ρ (Rename)	AS (alias)
γ (Grouping)	GROUP BY

Key Differences:

Execution order: SQL is declarative; evaluation order differs from written order
Duplicate handling: SQL preserves duplicates by default; RA eliminates them
Extended features: SQL has aggregates, subqueries, window functions beyond basic RA
NULL handling: SQL uses three-valued logic; pure RA typically assumes no NULLs

Think Algebraically, Write SQL

When approaching a query problem, think in relational algebra terms: What tables? What joins? What filters? What output? This structured thinking produces cleaner SQL and helps you understand optimizer behavior. The algebra gives you a mental model; SQL gives you the syntax.

Query Optimization Principles

Understanding optimization principles helps you write queries that are easier for the optimizer to handle and predict execution behavior.

Heuristic Optimization Rules:

These rules generally improve performance without cost estimation:

Push selections down: Apply filters as early as possible
Push projections down: Eliminate unused columns early
Combine cascaded selections: Multiple filters into one
Combine cascaded projections: Keep only final column set
Identify selection before Cartesian product: Convert to join
Preprocess selections with indexes: Use available indexes for selective predicates

Cost-Based Optimization Considerations:

Beyond heuristics, optimizers estimate costs:

Selectivity estimation: How many rows pass each filter?
Join order selection: Which table order minimizes intermediate results?
Access path selection: Table scan vs index scan vs index-only scan?
Join algorithm selection: Nested loop vs hash vs merge join?

Optimization Transformation Summary
Transformation	Before	After	Benefit
Selection Pushdown	σ_P(R ⋈ S)	σ_P(R) ⋈ S	Fewer rows in join
Projection Pushdown	π_L(R ⋈ S)	π_L(π_{L∪join}(R) ⋈ π_{L∪join}(S))	Narrower tuples
Selection Cascade	σ_P1(σ_P2(R))	σ_{P1∧P2}(R)	Single filter pass
Selection Split	σ_{P1∧P2}(R ⋈ S)	σ_{P1}(R) ⋈ σ_{P2}(S)	Push to each table
Cartesian to Join	σ_{R.a=S.b}(R × S)	R ⋈_{a=b} S	Avoid full product
Join Reordering	(R ⋈ S) ⋈ T	R ⋈ (S ⋈ T)	Smaller intermediates

Writing Optimization-Friendly Queries:

DO:

Specify explicit column lists (enables projection pushdown)
Use JOIN syntax with ON clause (clearer join conditions)
Put most selective predicates where they can be pushed
Use SARGable predicates (functions after, not on, indexed columns)
Group related conditions logically

AVOID:

SELECT * (defeats column pruning)
OR conditions spanning multiple tables (prevents pushdown)
Functions on indexed columns in predicates
Implicit joining via Cartesian product and WHERE
Unnecessary DISTINCT (costly if not needed)

-- Optimization-friendly:
SELECT e.name, e.salary, d.dept_name
FROM Employee e
JOIN Department d ON e.dept_id = d.id
WHERE e.status = 'active'
  AND e.hire_date >= '2020-01-01'
  AND d.country = 'US';

-- Less optimization-friendly:
SELECT *
FROM Employee e, Department d
WHERE e.dept_id = d.id
  AND (e.status = 'active' OR d.status = 'active')
  AND YEAR(e.hire_date) >= 2020;

Optimizer Limits

Optimizers are powerful but not omniscient. They rely on statistics that may be outdated, make independence assumptions that don't hold, and have limited time for exploring alternatives. Complex queries may not find optimal plans. When performance matters, examine execution plans and consider query restructuring.

Summary and Key Takeaways

Combining Selection and Projection operations forms the foundation of practical query construction. Understanding composition and optimization enables you to write efficient, maintainable queries.

Key Takeaways

•Closure enables composition — Every operation produces a relation, enabling arbitrary chaining. Complex queries build from simple operations.
•Algebraic laws govern equivalence — Selection cascades and commutes freely; Selection-Projection commutes conditionally. These laws enable safe query transformation.
•Predicate pushdown is fundamental — Applying selections early reduces rows for subsequent operations. This is the most impactful optimization.
•Projection pushdown reduces data volume — Eliminating unused columns early saves memory, I/O, and processing throughout query execution.
•Build expressions systematically — Identify output, sources, conditions, and joins; construct from inside out. This method produces correct, clear queries.
•SQL maps directly from algebra — SELECT=Projection, WHERE=Selection, JOIN=Join. The conceptual model translates cleanly to practical syntax.
•Write for the optimizer — Explicit columns, SARGable predicates, clear join conditions help the optimizer help you.

Module Complete:

Congratulations! You have completed the Selection and Projection module. You now understand:

The Selection operator (σ) for row filtering
Predicate conditions and their evaluation
The Projection operator (π) for column selection
Duplicate elimination in theory and practice
Composing operations into complex queries

These foundational operations appear in virtually every database query. Master them, and you've mastered the core of relational data retrieval.

Module Complete

You have completed Module 2: Selection and Projection. These two fundamental unary operations—filtering rows with Selection and extracting columns with Projection—form the backbone of relational queries. Combined with the algebraic laws for their transformation, you now have the theoretical foundation and practical knowledge to construct and optimize queries effectively. The next module explores Set Operations: Union, Intersection, and Difference.

Combining Operations

From Building Blocks to Complex Queries

What You Will Learn

Operation Composition

Basic Composition:

Given operations Op₁ and Op₂, their composition is written as:

$$Op_1(Op_2(R))$$

Read from inside out: first apply Op₂ to R, then apply Op₁ to the result.

Common Composition Patterns:

1. Selection then Projection: $$π_{name, salary}(σ_{department='Engineering'}(Employee))$$

Filter rows first, then select columns.

2. Projection then Selection: $$σ_{salary>100000}(π_{name, salary}(Employee))$$

Select columns first, then filter rows.

3. Multiple Selections: $$σ_{status='active'}(σ_{department='Engineering'}(Employee))$$

Apply multiple filters sequentially.

4. Multiple Projections: $$π_{name}(π_{name, salary}(Employee))$$

Narrow columns progressively (the second projection further restricts the first).

Step-by-Step Composition EvaluationTracing through a composed operation

Input

Output

Expression Trees:

Complex expressions are often visualized as trees:

          π_{name, salary}
               │
               ▼
    σ_{dept='Eng' ∧ salary>100000}
               │
               ▼
           Employee

Leaves are base relations; internal nodes are operations. Evaluation proceeds bottom-up.

Canonical Form:

$$π_L(σ_P(R))$$

This "filter first, project last" pattern often corresponds to efficient execution.

Why Composition Matters

Commutativity and Equivalence

Understanding when operations can be reordered—and when they cannot—is fundamental to query optimization and correct query construction.

Selection-Selection Commutativity:

Multiple Selections always commute: $$σ_{P1}(σ_{P2}(R)) = σ_{P2}(σ_{P1}(R)) = σ_{P1 ∧ P2}(R)$$

The order of applying filters doesn't affect the result, and cascaded filters can be combined into a single conjunctive filter.

Projection-Projection Absorption:

For Projections, the outer one dominates: $$π_{L1}(π_{L2}(R)) = π_{L1}(R) \text{ where } L1 ⊆ L2$$

Projecting to a superset then to a subset equals projecting directly to the subset.

Selection-Projection Commutativity (Conditional):

Selection and Projection commute only if the selection predicate references only attributes retained by the projection:

$$π_L(σ_P(R)) = σ_P(π_L(R)) \text{ if } attributes(P) ⊆ L$$

When Commutativity Applies (and Fails)Examples of valid and invalid reorderings

Input

Output

Algebraic Equivalence Laws for Selection and Projection
Law	Expression	Condition
Selection Cascade	σ_{P1}(σ_{P2}(R)) = σ_{P1 ∧ P2}(R)	Always valid
Selection Commutativity	σ_{P1}(σ_{P2}(R)) = σ_{P2}(σ_{P1}(R))	Always valid
Projection Cascade	π_{L1}(π_{L2}(R)) = π_{L1}(R)	L1 ⊆ L2
Selection-Projection Commute	π_L(σ_P(R)) = σ_P(π_L(R))	attrs(P) ⊆ L
Selection Distribution (∨)	σ_{P1 ∨ P2}(R) = σ_{P1}(R) ∪ σ_{P2}(R)	Always valid
Selection Distribution (∧)	σ_{P1 ∧ P2}(R) = σ_{P1}(R) ∩ σ_{P2}(R)	Always valid

Optimizer's Toolbox

Predicate Pushdown Optimization

The Principle:

Mathematical Basis:

Predicate pushdown relies on the cascade and commutativity laws:

$$σ_P(R_1 ⋈ R_2) = σ_P(R_1) ⋈ R_2 \text{ (if P references only } R_1 \text{ attributes)}$$

Example Transformation:

Original query plan:

σ_{E.dept='Eng'}(Employee ⋈ Department)

Optimized with pushdown:

(σ_{dept='Eng'}(Employee)) ⋈ Department

Filtering Employee before the join means the join processes fewer rows.

Predicate Pushdown ImpactComparing execution with and without pushdown

Input

Output

When Pushdown Applies:

Single-table predicates: Predicates referencing attributes from only one table can always be pushed to that table's scan.
Join predicates: Can be pushed into the join itself (becomes the join condition).
Predicates after aggregation: Some predicates on grouped results can be pushed before grouping if they reference only grouping columns.

When Pushdown Doesn't Apply:

Post-join predicates: Predicates comparing attributes from multiple tables must wait until after the join.
```
WHERE E.salary > D.avg_salary  -- Must evaluate after join
```
Aggregate-dependent predicates: HAVING conditions can't be pushed before GROUP BY.
Outer join complications: For LEFT/RIGHT/FULL joins, pushdown rules are more complex—pushing to the 'null-supplying' side changes semantics.

SQL and Pushdown:

Optimizers automatically identify pushdown opportunities:

-- Written:
SELECT e.name, d.location
FROM Employee e
JOIN Department d ON e.dept_id = d.id
WHERE e.status = 'active' AND d.country = 'US';

-- Optimizer pushes:
-- e.status = 'active' → to Employee scan
-- d.country = 'US' → to Department scan
-- Both predicates evaluated before join

Outer Joins and Pushdown

Predicate pushdown to the outer (null-preserving) side of a LEFT/RIGHT join can change query semantics. The optimizer must be careful:

• Predicates on the preserved side: Safe to push • Predicates on the null-supplying side: May turn outer join into inner join

Always verify execution plans for outer joins with complex predicates.

Projection Pushdown Optimization

Complementing predicate pushdown, projection pushdown (also called column pruning) eliminates unnecessary columns as early as possible in query execution.

The Principle:

Carrying fewer columns through operations reduces:

Memory usage for intermediate results
I/O for reading from disk (in column stores)
Network transfer (in distributed systems)
Hash/sort key sizes for joins and deduplication

Determining Necessary Columns:

The optimizer traces column usage backward from the output:

Start with columns in final SELECT
Add columns used in ORDER BY, GROUP BY, HAVING
Add columns used in predicates (WHERE, JOIN conditions)
Add columns for any subqueries that reference outer columns

Example Transformation:

Projection Pushdown in ActionPushing column elimination into table scans

Input

Output

Interaction with Storage Engines:

Row-Oriented Storage:

Must read entire row from disk
Projection benefit: Less data copied to memory/processed
Still pays full I/O cost for each page

Column-Oriented Storage:

Reads only needed columns from disk
Projection benefit: Dramatic I/O reduction
Ideal for analytical queries selecting few columns from wide tables

Covering Indexes:

Include all queried columns
Projection satisfied entirely from index
No table access needed (index-only scan)

-- Query:
SELECT name, department FROM Employee WHERE status = 'active';

-- If index exists on (status, name, department):
-- Index-only scan possible—table never accessed!

SELECT * Considered Harmful

Building Complex Expressions

Real-world queries combine multiple operations. Developing a systematic approach to constructing complex relational algebra expressions improves correctness and maintainability.

Step-by-Step Construction Method:

Identify the result schema: What columns should the final output contain?
Identify the base relations: Which tables are involved?
Identify filter conditions: What predicates must be satisfied?
Identify relationship conditions: How do tables relate (join conditions)?
Build from inside out: Start with base relations, add joins, then filters, then projections.

Example Construction:

Query: Find names and salaries of active Engineering employees with their department locations, for departments in the US.

Analysis:

Result: name, salary, location
Base relations: Employee, Department
Filters: status='active', department='Engineering', country='US'
Join: Employee.dept_id = Department.id

Systematic Expression ConstructionBuilding a complex query step by step

Input

Output

Common Expression Patterns:

Pattern 1: Simple Filter and Project

π_{columns}(σ_{predicate}(Table))

The most common pattern—equivalent to basic SELECT...FROM...WHERE.

Pattern 2: Join with Local Filters

π_{columns}(
  σ_{T1.pred}(Table1) ⋈_{join_cond} σ_{T2.pred}(Table2)
)

Apply table-specific filters before joining.

Pattern 3: Join with Post-Join Filter

π_{columns}(
  σ_{post_join_pred}(
    Table1 ⋈_{join_cond} Table2
  )
)

For predicates comparing attributes from both tables.

Pattern 4: Multi-Table Query

π_{columns}(
  σ_{predicates}(
    (Table1 ⋈ Table2) ⋈ Table3
  )
)

Multiple joins with filtering and final projection.

Pattern 5: Derived Table / Subquery

π_{outer_columns}(
  OuterTable ⋈ (
    π_{inner_columns}(σ_{inner_pred}(InnerTable))
  )
)

Nested expressions for subquery-like patterns.

Readability vs Optimization

Translation to SQL

Understanding how composite relational algebra expressions map to SQL helps bridge theory and practice.

Direct Mapping:

$$π_{A_1, A_2, ..., A_n}(σ_P(R))$$

maps to:

SELECT DISTINCT A1, A2, ..., An
FROM R
WHERE P;

(DISTINCT for set semantics; omit for bag semantics)

Join Expression:

$$π_{columns}(σ_P(R_1 ⋈_{condition} R_2))$$

maps to:

SELECT columns
FROM R1
JOIN R2 ON condition
WHERE P;

Complex Expression:

$$π_{E.name, D.location}(σ_{E.salary>100000}(Employee ⋈_{dept_id=id} Department))$$

maps to:

SELECT e.name, d.location
FROM Employee e
JOIN Department d ON e.dept_id = d.id  
WHERE e.salary > 100000;

expression_to_sql.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
-- Pattern 1: Selection then Projection
-- RA: π_{name, salary}(σ_{department='Engineering'}(Employee))
 
SELECT name, salary
FROM Employee
WHERE department = 'Engineering';
 
-- Pattern 2: Multiple selections (cascade)
-- RA: σ_{status='active'}(σ_{salary>100000}(Employee))
-- RA: = σ_{status='active' ∧ salary>100000}(Employee)
 
SELECT *
FROM Employee
WHERE status = 'active' 
  AND salary > 100000;
 
-- Pattern 3: Join with pushed predicates
-- RA: π_{e.name, d.location}(
--       σ_{e.department='Engineering'}(Employee) 
--       ⋈_{dept_id=id} 
--       σ_{d.country='US'}(Department)
--     )
 
SELECT e.name, d.location
FROM Employee e
JOIN Department d ON e.dept_id = d.id
WHERE e.department = 'Engineering'
  AND d.country = 'US';
 
-- Pattern 4: Self-join with rename (conceptual)
-- RA: π_{E1.name, E2.name}(
--       σ_{E1.manager_id = E2.emp_id}(
--         ρ_{E1}(Employee) × ρ_{E2}(Employee)
--       )
--     )
 
SELECT e1.name AS employee, e2.name AS manager
FROM Employee e1
JOIN Employee e2 ON e1.manager_id = e2.emp_id;
 
-- Pattern 5: Subquery (derived table)
-- RA: π_{name}(
--       Employee ⋈_{dept_id ∈ result}(
--         π_{id}(σ_{country='US'}(Department))
--       )
--     )
 
SELECT e.name
FROM Employee e
WHERE e.dept_id IN (
    SELECT id 
    FROM Department 
    WHERE country = 'US'
);

SQL Clause Correspondence:

Relational Algebra	SQL Clause
π (Projection)	SELECT
σ (Selection)	WHERE
× (Cartesian Product)	FROM with multiple tables, no JOIN
⋈ (Join)	JOIN ... ON
∪ (Union)	UNION
∩ (Intersection)	INTERSECT
− (Difference)	EXCEPT / MINUS
ρ (Rename)	AS (alias)
γ (Grouping)	GROUP BY

Key Differences:

Execution order: SQL is declarative; evaluation order differs from written order
Duplicate handling: SQL preserves duplicates by default; RA eliminates them
Extended features: SQL has aggregates, subqueries, window functions beyond basic RA
NULL handling: SQL uses three-valued logic; pure RA typically assumes no NULLs

Think Algebraically, Write SQL

Query Optimization Principles

Understanding optimization principles helps you write queries that are easier for the optimizer to handle and predict execution behavior.

Heuristic Optimization Rules:

These rules generally improve performance without cost estimation:

Push selections down: Apply filters as early as possible
Push projections down: Eliminate unused columns early
Combine cascaded selections: Multiple filters into one
Combine cascaded projections: Keep only final column set
Identify selection before Cartesian product: Convert to join
Preprocess selections with indexes: Use available indexes for selective predicates

Cost-Based Optimization Considerations:

Beyond heuristics, optimizers estimate costs:

Selectivity estimation: How many rows pass each filter?
Join order selection: Which table order minimizes intermediate results?
Access path selection: Table scan vs index scan vs index-only scan?
Join algorithm selection: Nested loop vs hash vs merge join?

Optimization Transformation Summary
Transformation	Before	After	Benefit
Selection Pushdown	σ_P(R ⋈ S)	σ_P(R) ⋈ S	Fewer rows in join
Projection Pushdown	π_L(R ⋈ S)	π_L(π_{L∪join}(R) ⋈ π_{L∪join}(S))	Narrower tuples
Selection Cascade	σ_P1(σ_P2(R))	σ_{P1∧P2}(R)	Single filter pass
Selection Split	σ_{P1∧P2}(R ⋈ S)	σ_{P1}(R) ⋈ σ_{P2}(S)	Push to each table
Cartesian to Join	σ_{R.a=S.b}(R × S)	R ⋈_{a=b} S	Avoid full product
Join Reordering	(R ⋈ S) ⋈ T	R ⋈ (S ⋈ T)	Smaller intermediates

Writing Optimization-Friendly Queries:

DO:

Specify explicit column lists (enables projection pushdown)
Use JOIN syntax with ON clause (clearer join conditions)
Put most selective predicates where they can be pushed
Use SARGable predicates (functions after, not on, indexed columns)
Group related conditions logically

AVOID:

SELECT * (defeats column pruning)
OR conditions spanning multiple tables (prevents pushdown)
Functions on indexed columns in predicates
Implicit joining via Cartesian product and WHERE
Unnecessary DISTINCT (costly if not needed)

-- Optimization-friendly:
SELECT e.name, e.salary, d.dept_name
FROM Employee e
JOIN Department d ON e.dept_id = d.id
WHERE e.status = 'active'
  AND e.hire_date >= '2020-01-01'
  AND d.country = 'US';

-- Less optimization-friendly:
SELECT *
FROM Employee e, Department d
WHERE e.dept_id = d.id
  AND (e.status = 'active' OR d.status = 'active')
  AND YEAR(e.hire_date) >= 2020;

Optimizer Limits

Summary and Key Takeaways

Combining Selection and Projection operations forms the foundation of practical query construction. Understanding composition and optimization enables you to write efficient, maintainable queries.

Key Takeaways

•Closure enables composition — Every operation produces a relation, enabling arbitrary chaining. Complex queries build from simple operations.
•Algebraic laws govern equivalence — Selection cascades and commutes freely; Selection-Projection commutes conditionally. These laws enable safe query transformation.
•Predicate pushdown is fundamental — Applying selections early reduces rows for subsequent operations. This is the most impactful optimization.
•Projection pushdown reduces data volume — Eliminating unused columns early saves memory, I/O, and processing throughout query execution.
•Build expressions systematically — Identify output, sources, conditions, and joins; construct from inside out. This method produces correct, clear queries.
•SQL maps directly from algebra — SELECT=Projection, WHERE=Selection, JOIN=Join. The conceptual model translates cleanly to practical syntax.
•Write for the optimizer — Explicit columns, SARGable predicates, clear join conditions help the optimizer help you.

Module Complete:

Congratulations! You have completed the Selection and Projection module. You now understand:

The Selection operator (σ) for row filtering
Predicate conditions and their evaluation
The Projection operator (π) for column selection
Duplicate elimination in theory and practice
Composing operations into complex queries

These foundational operations appear in virtually every database query. Master them, and you've mastered the core of relational data retrieval.

Module Complete