Database Management SystemsQuery Optimization

Equivalence Rules

LevelIntermediate

Duration60 mins

TopicQuery Optimization

1 / 5

Relational Algebra Equivalences

The Mathematics of Query Transformation

When you write a SQL query, you express what data you want—not how to retrieve it. The database's query optimizer must bridge this gap, transforming your declarative specification into an efficient execution plan. At the heart of this transformation process lies a profound mathematical insight: relational algebra expressions can be rewritten into equivalent forms that produce identical results but with vastly different performance characteristics.

This is not merely an optimization trick—it's a rigorous mathematical framework. Just as algebraic identities like a + b = b + a allow mathematicians to manipulate equations while preserving truth, relational algebra equivalences allow query optimizers to manipulate queries while preserving correctness. Understanding these equivalences is fundamental to understanding how databases achieve their remarkable performance.

What You Will Learn

By the end of this page, you will understand: (1) the formal definition of expression equivalence in relational algebra, (2) the major categories of equivalence rules, (3) how these rules preserve query semantics, and (4) why equivalence-based optimization is both sound and complete for relational query processing.

The Foundation of Equivalence

Before diving into specific rules, we must establish what it means for two relational algebra expressions to be equivalent. This definition is the bedrock upon which all query optimization rests.

Definition: Expression Equivalence

Two relational algebra expressions E₁ and E₂ are said to be equivalent, denoted E₁ ≡ E₂, if and only if:

For every legal database instance D, the expressions E₁ and E₂ produce exactly the same result relation when evaluated on D.

This definition has several critical implications:

1. Universal Quantification: Equivalence must hold for all possible database states, not just the current one. An optimizer cannot assume anything about the data beyond schema constraints.

2. Set Semantics: Under standard relational algebra, relations are sets (no duplicates). Two expressions are equivalent if they produce the same set of tuples. Under bag (multiset) semantics, multiplicities must also match.

3. Order Independence: Relational algebra operates on unordered sets. The physical order of tuples in the result is irrelevant for equivalence (though it matters for ORDER BY in SQL).

4. Attribute Preservation: Equivalent expressions must produce relations with the same schema (attribute names and types).

Soundness Guarantee

When a query optimizer applies an equivalence rule, it is mathematically guaranteed to preserve the query's semantics. This is not heuristic-based guessing—it's provably correct transformation. The optimizer can aggressively rewrite queries knowing the result will always be identical.

Why Equivalence Matters for Optimization

The power of equivalence lies in this fundamental observation: while equivalent expressions produce the same result, they may have dramatically different costs. Consider a simple example:

Suppose we have tables Employees (1 million rows) and Departments (100 rows), and we want employees in the 'Engineering' department.

Expression 1: σ_{dept='Engineering'}(Employees ⋈ Departments)

First join: 1M × 100 = potential 100M intermediate tuples
Then filter to find Engineering employees

Expression 2: Employees ⋈ (σ_{dept='Engineering'}(Departments))

First filter Departments: 100 rows → ~1 row
Then join: 1M × 1 = 1M comparisons at most

Both expressions produce identical results, but Expression 2 may be orders of magnitude faster. The optimizer's job is to find such efficient equivalents using a repertoire of transformation rules.

Categories of Equivalence Rules

Relational algebra equivalence rules fall into several major categories, each addressing different aspects of query optimization. Understanding this taxonomy helps you reason about what transformations are possible and why they preserve correctness.

Major Categories of Equivalence Rules
Category	Purpose	Key Operations	Primary Benefit
Commutativity Rules	Reorder operands	Selection, Join	Enable optimal operation ordering
Associativity Rules	Regroup operations	Join, Union, Intersection	Enable join order optimization
Distributivity Rules	Push/pull operations	Selection, Projection, Join	Reduce intermediate result sizes
Idempotence Rules	Eliminate redundancy	Selection, Projection	Remove unnecessary operations
Decomposition Rules	Split/merge operations	Selection (conjunctions), Projection	Enable fine-grained optimization
Null/Empty Rules	Handle edge cases	All operations with empty sets	Short-circuit unnecessary computation

Let's examine each category in detail, understanding both the mathematical basis and the practical implications for query processing.

Understanding Rule Categories

•Commutativity tells us that the order of operands doesn't affect the result. For binary operations like join, this means we can choose which relation to process first, which has significant cost implications.
•Associativity tells us that grouping doesn't matter: (A ⋈ B) ⋈ C = A ⋈ (B ⋈ C). This is the foundation of join ordering optimization, one of the most impactful optimizations.
•Distributivity allows operations to be pushed through other operations. The classic example is pushing selection through join, which is often the single most important optimization.
•Idempotence means applying an operation twice is the same as applying it once. σ_p(σ_p(R)) = σ_p(R). This helps eliminate redundant operations.
•Decomposition allows complex operations to be broken into simpler pieces. σ_{p∧q}(R) = σ_p(σ_q(R)) lets us apply selections incrementally.

Fundamental Equivalence Rules

Now let's formalize the core equivalence rules that every query optimizer employs. These rules form the mathematical toolkit that transforms naive query plans into efficient ones.

Selection Rules (σ)

Selection is the workhorse of query optimization. These rules govern how selection predicates can be manipulated:

Rule S1: Cascade of Selections (Decomposition)

σ_{p₁∧p₂∧...∧pₙ}(R) ≡ σ_{p₁}(σ_{p₂}(...σ_{pₙ}(R)...))

A conjunction of conditions can be split into a cascade of individual selections. This enables predicates to be applied at different points in the query plan.

Rule S2: Commutativity of Selections

σ_{p₁}(σ_{p₂}(R)) ≡ σ_{p₂}(σ_{p₁}(R))

The order in which selections are applied doesn't matter (mathematically). However, performance differs—more selective predicates should be applied first.

Rule S3: Idempotence of Selection

σ_p(σ_p(R)) ≡ σ_p(R)

Applying the same selection twice is redundant. Optimizers use this to eliminate duplicate predicates.

**Rule S4: Selection with OR (Disjunction)

σ_{p₁∨p₂}(R) ≡ σ_{p₁}(R) ∪ σ_{p₂}(R)  [if p₁ and p₂ are disjoint]

OR conditions can potentially be split into unions, enabling parallel or index-based processing.

Cascade vs. Combine

The cascade rule works both ways. Sometimes splitting is beneficial (to push predicates closer to base tables), and sometimes combining is better (to enable index usage on compound conditions). Good optimizers consider both directions.

Projection Rules (π)

Projection removes attributes, reducing the width of tuples flowing through the query plan:

Rule P1: Cascade of Projections

π_{L₁}(π_{L₂}(R)) ≡ π_{L₁}(R)  [if L₁ ⊆ L₂]

If we project on a subset of attributes, an intermediate projection on a superset is redundant.

Rule P2: Projection Commuting with Selection

π_L(σ_p(R)) ≡ σ_p(π_{L∪attrs(p)}(R))  [if attrs(p) ⊆ L]

Projection can move past selection if the selection's attributes are preserved. This is subtle—if selection needs attributes not in L, we must temporarily include them.

Set Operation Rules

Union, intersection, and set difference have their own equivalence properties:

Rule U1: Commutativity of Union and Intersection

R ∪ S ≡ S ∪ R
R ∩ S ≡ S ∩ R

Rule U2: Associativity of Union and Intersection

(R ∪ S) ∪ T ≡ R ∪ (S ∪ T)
(R ∩ S) ∩ T ≡ R ∩ (S ∩ T)

Note: Set difference is neither commutative nor associative: R - S ≢ S - R.

Join Equivalence Rules

Join operations are typically the most expensive components of query execution. Consequently, the equivalence rules governing joins are among the most impactful for optimization.

Rule J1: Commutativity of Natural Join

R ⋈ S ≡ S ⋈ R

The order of relations in a natural join doesn't affect the result. This is fundamental for choosing which relation to use as the inner vs. outer in nested-loop algorithms.

Rule J2: Commutativity of Theta Join

R ⋈_θ S ≡ S ⋈_θ R  [with θ appropriately transformed]

Theta joins are also commutative, though the join condition may need to be rewritten if it's not symmetric.

Rule J3: Associativity of Natural Join

(R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T)

This is the cornerstone of join ordering optimization. For n tables, there are n!/2 different join orders to consider.

Rule J4: Associativity of Theta Join

(R ⋈_{θ₁} S) ⋈_{θ₂} T ≡ R ⋈_{θ₁} (S ⋈_{θ₂∧θ₃} T)
[where θ₃ represents conditions involving attributes from R and T]

Theta join associativity is more complex because predicates may reference multiple relations.

Benefits of Join Commutativity

•Choose the smaller relation as the inner operand in nested loops
•Build hash tables on smaller relations
•Utilize indexes on either operand
•Optimize memory usage patterns
•Improve cache locality

Benefits of Join Associativity

•Minimize intermediate result sizes
•Enable early application of selective joins
•Exploit available indexes
•Balance parallel execution workloads
•Avoid Cartesian products

Rule J5: Join and Cartesian Product

R ⋈_θ S ≡ σ_θ(R × S)

A theta join is equivalent to a Cartesian product followed by selection. This equivalence is typically traversed in the opposite direction—converting Cartesian products with subsequent selections into joins.

Rule J6: Natural Join via Projection

R ⋈ S ≡ π_{attrs(R)∪attrs(S)}(R ⋈_{R.A=S.A∧...} S)

A natural join is a theta join on common attributes with appropriate projection. This makes explicit what natural join implies.

Outer Joins Are Different

These commutativity and associativity rules apply to inner joins. Outer joins (LEFT, RIGHT, FULL) have much more limited equivalences. For example, R LEFT JOIN S ≢ S LEFT JOIN R. Outer join optimization requires specialized rules and careful handling.

Distributivity Rules

Distributivity rules specify how operations can be pushed through other operations. These rules are the mathematical foundation for "pushing" selections and projections toward leaf nodes in the query tree, which is typically the most effective optimization.

Rule D1: Selection Distributes over Join

σ_p(R ⋈ S) ≡ σ_p(R) ⋈ S  [if p involves only attributes of R]

If a selection predicate only references one relation, it can be pushed to that relation before the join. This is the canonical "selection pushdown."

Rule D2: Selection Distributes over Join (Conjunctive)

σ_{p₁∧p₂}(R ⋈ S) ≡ σ_{p₁}(R) ⋈ σ_{p₂}(S)
[if p₁ involves only R, p₂ involves only S]

Conjunctive predicates can be split and pushed to their respective relations.

Rule D3: Selection Distributes over Union

σ_p(R ∪ S) ≡ σ_p(R) ∪ σ_p(S)

Selection can be pushed through union to both operands.

Rule D4: Selection Distributes over Set Difference

σ_p(R - S) ≡ σ_p(R) - S ≡ σ_p(R) - σ_p(S)

Selection pushes through set difference, but note the subtle point: we can apply it to just R, or to both.

Rule D5: Projection Distributes over Join

π_L(R ⋈ S) ≡ π_{L₁}(R) ⋈ π_{L₂}(S)
[where L₁ ⊆ attrs(R), L₂ ⊆ attrs(S), and join attributes are included]

Projection can be pushed down to remove unused attributes before joining, reducing tuple width.

Selection Distributivity Example
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- Original Query
SELECT e.name, d.dept_name
FROM Employees e
JOIN Departments d ON e.dept_id = d.id
WHERE e.salary > 100000 AND d.location = 'NYC';
 
-- The optimizer transforms this using distributivity rules:
-- Step 1: Decompose the selection (Rule S1)
-- σ_{salary>100000 ∧ location='NYC'}(E ⋈ D)
-- ≡ σ_{salary>100000}(σ_{location='NYC'}(E ⋈ D))
 
-- Step 2: Push down selections using D2
-- ≡ σ_{salary>100000}(E) ⋈ σ_{location='NYC'}(D)
 
-- This is equivalent to:
SELECT e.name, d.dept_name
FROM (SELECT * FROM Employees WHERE salary > 100000) e
JOIN (SELECT * FROM Departments WHERE location = 'NYC') d
ON e.dept_id = d.id;
 
-- The second form filters BEFORE joining, dramatically reducing work

The Power of Pushing Early

Distributivity rules enable what's often called 'early selection' or 'early projection.' By reducing data volume as early as possible in the query plan, we minimize I/O, memory usage, and CPU time for all downstream operations. This is often the single most impactful class of optimizations.

Formal Properties of Equivalences

The collection of equivalence rules forms a mathematical system with important formal properties that guarantee the correctness and completeness of equivalence-based optimization.

Property 1: Reflexivity

E ≡ E (every expression is equivalent to itself)

Property 2: Symmetry

If E₁ ≡ E₂, then E₂ ≡ E₁

Property 3: Transitivity

If E₁ ≡ E₂ and E₂ ≡ E₃, then E₁ ≡ E₃

These three properties establish that equivalence is an equivalence relation (in the formal mathematical sense), partitioning all possible expressions into equivalence classes. Each equivalence class contains all expressions that produce the same result.

The Optimization Paradigm

Given these properties, optimization can be viewed as:

Start with the original query expression E
Define the equivalence class [E] containing all equivalent expressions
Estimate the cost of expressions in [E]
Select the minimum-cost expression from [E]

In practice, enumerating the entire equivalence class is infeasible—there are typically infinitely many equivalent expressions (since rules can be applied repeatedly). Instead, optimizers use heuristics to explore promising portions of the search space.

Properties of Relational Operations Under Equivalence
Operation	Commutative?	Associative?	Idempotent?	Notes
Selection (σ)	Yes (cascade)	Yes (cascade)	Yes	Can decompose conjunctions
Projection (π)	N/A	N/A	Yes	Nested projections collapse
Natural Join (⋈)	Yes	Yes	Yes (R⋈R≡R)	Full flexibility
Theta Join (⋈_θ)	Yes	Conditional	Conditional	Depends on condition
Union (∪)	Yes	Yes	Yes	Full flexibility
Intersection (∩)	Yes	Yes	Yes	Full flexibility
Set Difference (-)	No	No	No	Very limited transformations
Cartesian Product (×)	Yes	Yes	No	Avoid when possible

Confluence and Termination

For rule-based optimization systems, two properties are critical:

Confluence: If we can reach expressions E₂ and E₃ from E₁, can we reach a common expression from both E₂ and E₃? Confluence ensures that the order of rule application doesn't matter for the final result.

Termination: Does repeated rule application eventually stop? Without termination guarantees, optimization could loop forever.

Most query optimizers don't have strict confluence or termination—they use cost-based cutoffs and don't explore all possibilities. However, for specific rule subsets (like selection pushdown), these properties can be proven.

Practical Implications for Query Writing

Understanding equivalence rules changes how you think about writing queries. While optimizers try to find efficient plans regardless of how you write your SQL, there are practical implications.

Query Writing Insights

•Don't pre-optimize obsessively — Good optimizers apply equivalence rules automatically. Writing convoluted SQL to 'help' the optimizer often backfires.
•Trust predicate pushdown — If you write WHERE conditions on joined tables, modern optimizers will push them down appropriately. You don't need to use subqueries to force early filtering.
•Join order usually doesn't matter (in SQL) — The textual order of tables in your FROM clause is rarely the execution order. The optimizer reorders based on statistics.
•Complex expressions may not optimize well — Very complex predicates or non-standard constructs may block optimization. Simpler is often better.
•Understand optimizer limitations — Outer joins, correlated subqueries, and certain functions can limit the optimizer's ability to apply equivalence rules.

When Equivalence Breaks Down

Not all SQL constructs participate fully in equivalence-based optimization: (1) User-defined functions may have side effects, preventing reordering. (2) Non-deterministic functions (RAND(), CURRENT_TIMESTAMP) can't be safely moved. (3) Queries with ORDER BY, LIMIT, or OFFSET have ordering semantics the optimizer must preserve.

The Declarative Promise

Equivalence rules make SQL's declarative nature powerful. When you specify:

SELECT ... FROM A, B, C WHERE <conditions>

You're not saying "scan A, cross with B, cross with C, then filter." You're saying "give me tuples satisfying these conditions from the Cartesian product of A, B, C." The optimizer is free to:

Reorder the joins (associativity, commutativity)
Push selections down (distributivity)
Eliminate projections early (distributivity)
Choose any algorithm for each operation

This separation of specification from implementation is what makes database systems so powerful and durable across decades of hardware evolution.

Summary: The Algebra of Query Optimization

We've established the mathematical foundation that enables query optimizers to transform queries while guaranteeing correctness. Let's consolidate the key insights:

Key Takeaways

•Equivalence is rigorous — Two expressions are equivalent if they produce the same result for every possible database state. This is not approximation—it's mathematical certainty.
•Rules form categories — Commutativity, associativity, distributivity, and idempotence rules govern different aspects of query transformation.
•Selection rules enable filtering pushdown — Decomposing and pushing selections is often the most impactful optimization.
•Join rules enable order optimization — Commutativity and associativity give optimizers freedom to find optimal join sequences.
•Distributivity enables early reduction — Pushing operations toward leaf nodes reduces data volume throughout the plan.
•Equivalence is an equivalence relation — Reflexivity, symmetry, and transitivity guarantee consistent transformation.

What's Next

Now that we understand the general framework of equivalence rules, we'll dive deep into selection pushdown—arguably the single most important optimization technique. We'll see exactly how and when selections can be pushed through various operations, and the dramatic performance improvements this enables.

Page Complete

You now understand the algebraic foundations of query optimization. These equivalence rules are what enable databases to execute your declarative SQL specifications efficiently, regardless of how you write them. Next, we'll explore selection pushdown in comprehensive detail.

1 / 5

Loading learning content...

Database Management SystemsQuery Optimization

Equivalence Rules

LevelIntermediate

Duration60 mins

TopicQuery Optimization

1 / 5

Relational Algebra Equivalences

The Mathematics of Query Transformation

What You Will Learn

The Foundation of Equivalence

Before diving into specific rules, we must establish what it means for two relational algebra expressions to be equivalent. This definition is the bedrock upon which all query optimization rests.

Definition: Expression Equivalence

Two relational algebra expressions E₁ and E₂ are said to be equivalent, denoted E₁ ≡ E₂, if and only if:

For every legal database instance D, the expressions E₁ and E₂ produce exactly the same result relation when evaluated on D.

This definition has several critical implications:

1. Universal Quantification: Equivalence must hold for all possible database states, not just the current one. An optimizer cannot assume anything about the data beyond schema constraints.

3. Order Independence: Relational algebra operates on unordered sets. The physical order of tuples in the result is irrelevant for equivalence (though it matters for ORDER BY in SQL).

4. Attribute Preservation: Equivalent expressions must produce relations with the same schema (attribute names and types).

Soundness Guarantee

Why Equivalence Matters for Optimization

The power of equivalence lies in this fundamental observation: while equivalent expressions produce the same result, they may have dramatically different costs. Consider a simple example:

Suppose we have tables Employees (1 million rows) and Departments (100 rows), and we want employees in the 'Engineering' department.

Expression 1: σ_{dept='Engineering'}(Employees ⋈ Departments)

First join: 1M × 100 = potential 100M intermediate tuples
Then filter to find Engineering employees

Expression 2: Employees ⋈ (σ_{dept='Engineering'}(Departments))

First filter Departments: 100 rows → ~1 row
Then join: 1M × 1 = 1M comparisons at most

Both expressions produce identical results, but Expression 2 may be orders of magnitude faster. The optimizer's job is to find such efficient equivalents using a repertoire of transformation rules.

Categories of Equivalence Rules

Major Categories of Equivalence Rules
Category	Purpose	Key Operations	Primary Benefit
Commutativity Rules	Reorder operands	Selection, Join	Enable optimal operation ordering
Associativity Rules	Regroup operations	Join, Union, Intersection	Enable join order optimization
Distributivity Rules	Push/pull operations	Selection, Projection, Join	Reduce intermediate result sizes
Idempotence Rules	Eliminate redundancy	Selection, Projection	Remove unnecessary operations
Decomposition Rules	Split/merge operations	Selection (conjunctions), Projection	Enable fine-grained optimization
Null/Empty Rules	Handle edge cases	All operations with empty sets	Short-circuit unnecessary computation

Let's examine each category in detail, understanding both the mathematical basis and the practical implications for query processing.

Understanding Rule Categories

•Commutativity tells us that the order of operands doesn't affect the result. For binary operations like join, this means we can choose which relation to process first, which has significant cost implications.
•Associativity tells us that grouping doesn't matter: (A ⋈ B) ⋈ C = A ⋈ (B ⋈ C). This is the foundation of join ordering optimization, one of the most impactful optimizations.
•Distributivity allows operations to be pushed through other operations. The classic example is pushing selection through join, which is often the single most important optimization.
•Idempotence means applying an operation twice is the same as applying it once. σ_p(σ_p(R)) = σ_p(R). This helps eliminate redundant operations.
•Decomposition allows complex operations to be broken into simpler pieces. σ_{p∧q}(R) = σ_p(σ_q(R)) lets us apply selections incrementally.

Fundamental Equivalence Rules

Now let's formalize the core equivalence rules that every query optimizer employs. These rules form the mathematical toolkit that transforms naive query plans into efficient ones.

Selection Rules (σ)

Selection is the workhorse of query optimization. These rules govern how selection predicates can be manipulated:

Rule S1: Cascade of Selections (Decomposition)

σ_{p₁∧p₂∧...∧pₙ}(R) ≡ σ_{p₁}(σ_{p₂}(...σ_{pₙ}(R)...))

A conjunction of conditions can be split into a cascade of individual selections. This enables predicates to be applied at different points in the query plan.

Rule S2: Commutativity of Selections

σ_{p₁}(σ_{p₂}(R)) ≡ σ_{p₂}(σ_{p₁}(R))

The order in which selections are applied doesn't matter (mathematically). However, performance differs—more selective predicates should be applied first.

Rule S3: Idempotence of Selection

σ_p(σ_p(R)) ≡ σ_p(R)

Applying the same selection twice is redundant. Optimizers use this to eliminate duplicate predicates.

**Rule S4: Selection with OR (Disjunction)

σ_{p₁∨p₂}(R) ≡ σ_{p₁}(R) ∪ σ_{p₂}(R)  [if p₁ and p₂ are disjoint]

OR conditions can potentially be split into unions, enabling parallel or index-based processing.

Cascade vs. Combine

Projection Rules (π)

Projection removes attributes, reducing the width of tuples flowing through the query plan:

Rule P1: Cascade of Projections

π_{L₁}(π_{L₂}(R)) ≡ π_{L₁}(R)  [if L₁ ⊆ L₂]

If we project on a subset of attributes, an intermediate projection on a superset is redundant.

Rule P2: Projection Commuting with Selection

π_L(σ_p(R)) ≡ σ_p(π_{L∪attrs(p)}(R))  [if attrs(p) ⊆ L]

Projection can move past selection if the selection's attributes are preserved. This is subtle—if selection needs attributes not in L, we must temporarily include them.

Set Operation Rules

Union, intersection, and set difference have their own equivalence properties:

Rule U1: Commutativity of Union and Intersection

R ∪ S ≡ S ∪ R
R ∩ S ≡ S ∩ R

Rule U2: Associativity of Union and Intersection

(R ∪ S) ∪ T ≡ R ∪ (S ∪ T)
(R ∩ S) ∩ T ≡ R ∩ (S ∩ T)

Note: Set difference is neither commutative nor associative: R - S ≢ S - R.

Join Equivalence Rules

Join operations are typically the most expensive components of query execution. Consequently, the equivalence rules governing joins are among the most impactful for optimization.

Rule J1: Commutativity of Natural Join

R ⋈ S ≡ S ⋈ R

The order of relations in a natural join doesn't affect the result. This is fundamental for choosing which relation to use as the inner vs. outer in nested-loop algorithms.

Rule J2: Commutativity of Theta Join

R ⋈_θ S ≡ S ⋈_θ R  [with θ appropriately transformed]

Theta joins are also commutative, though the join condition may need to be rewritten if it's not symmetric.

Rule J3: Associativity of Natural Join

(R ⋈ S) ⋈ T ≡ R ⋈ (S ⋈ T)

This is the cornerstone of join ordering optimization. For n tables, there are n!/2 different join orders to consider.

Rule J4: Associativity of Theta Join

(R ⋈_{θ₁} S) ⋈_{θ₂} T ≡ R ⋈_{θ₁} (S ⋈_{θ₂∧θ₃} T)
[where θ₃ represents conditions involving attributes from R and T]

Theta join associativity is more complex because predicates may reference multiple relations.

Benefits of Join Commutativity

•Choose the smaller relation as the inner operand in nested loops
•Build hash tables on smaller relations
•Utilize indexes on either operand
•Optimize memory usage patterns
•Improve cache locality

Benefits of Join Associativity

•Minimize intermediate result sizes
•Enable early application of selective joins
•Exploit available indexes
•Balance parallel execution workloads
•Avoid Cartesian products

Rule J5: Join and Cartesian Product

R ⋈_θ S ≡ σ_θ(R × S)

Rule J6: Natural Join via Projection

R ⋈ S ≡ π_{attrs(R)∪attrs(S)}(R ⋈_{R.A=S.A∧...} S)

A natural join is a theta join on common attributes with appropriate projection. This makes explicit what natural join implies.

Outer Joins Are Different

Distributivity Rules

Rule D1: Selection Distributes over Join

σ_p(R ⋈ S) ≡ σ_p(R) ⋈ S  [if p involves only attributes of R]

If a selection predicate only references one relation, it can be pushed to that relation before the join. This is the canonical "selection pushdown."

Rule D2: Selection Distributes over Join (Conjunctive)

σ_{p₁∧p₂}(R ⋈ S) ≡ σ_{p₁}(R) ⋈ σ_{p₂}(S)
[if p₁ involves only R, p₂ involves only S]

Conjunctive predicates can be split and pushed to their respective relations.

Rule D3: Selection Distributes over Union

σ_p(R ∪ S) ≡ σ_p(R) ∪ σ_p(S)

Selection can be pushed through union to both operands.

Rule D4: Selection Distributes over Set Difference

σ_p(R - S) ≡ σ_p(R) - S ≡ σ_p(R) - σ_p(S)

Selection pushes through set difference, but note the subtle point: we can apply it to just R, or to both.

Rule D5: Projection Distributes over Join

π_L(R ⋈ S) ≡ π_{L₁}(R) ⋈ π_{L₂}(S)
[where L₁ ⊆ attrs(R), L₂ ⊆ attrs(S), and join attributes are included]

Projection can be pushed down to remove unused attributes before joining, reducing tuple width.

Selection Distributivity Example
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
-- Original Query
SELECT e.name, d.dept_name
FROM Employees e
JOIN Departments d ON e.dept_id = d.id
WHERE e.salary > 100000 AND d.location = 'NYC';
 
-- The optimizer transforms this using distributivity rules:
-- Step 1: Decompose the selection (Rule S1)
-- σ_{salary>100000 ∧ location='NYC'}(E ⋈ D)
-- ≡ σ_{salary>100000}(σ_{location='NYC'}(E ⋈ D))
 
-- Step 2: Push down selections using D2
-- ≡ σ_{salary>100000}(E) ⋈ σ_{location='NYC'}(D)
 
-- This is equivalent to:
SELECT e.name, d.dept_name
FROM (SELECT * FROM Employees WHERE salary > 100000) e
JOIN (SELECT * FROM Departments WHERE location = 'NYC') d
ON e.dept_id = d.id;
 
-- The second form filters BEFORE joining, dramatically reducing work

The Power of Pushing Early

Formal Properties of Equivalences

The collection of equivalence rules forms a mathematical system with important formal properties that guarantee the correctness and completeness of equivalence-based optimization.

Property 1: Reflexivity

E ≡ E (every expression is equivalent to itself)

Property 2: Symmetry

If E₁ ≡ E₂, then E₂ ≡ E₁

Property 3: Transitivity

If E₁ ≡ E₂ and E₂ ≡ E₃, then E₁ ≡ E₃

The Optimization Paradigm

Given these properties, optimization can be viewed as:

Start with the original query expression E
Define the equivalence class [E] containing all equivalent expressions
Estimate the cost of expressions in [E]
Select the minimum-cost expression from [E]

Properties of Relational Operations Under Equivalence
Operation	Commutative?	Associative?	Idempotent?	Notes
Selection (σ)	Yes (cascade)	Yes (cascade)	Yes	Can decompose conjunctions
Projection (π)	N/A	N/A	Yes	Nested projections collapse
Natural Join (⋈)	Yes	Yes	Yes (R⋈R≡R)	Full flexibility
Theta Join (⋈_θ)	Yes	Conditional	Conditional	Depends on condition
Union (∪)	Yes	Yes	Yes	Full flexibility
Intersection (∩)	Yes	Yes	Yes	Full flexibility
Set Difference (-)	No	No	No	Very limited transformations
Cartesian Product (×)	Yes	Yes	No	Avoid when possible

Confluence and Termination

For rule-based optimization systems, two properties are critical:

Termination: Does repeated rule application eventually stop? Without termination guarantees, optimization could loop forever.

Practical Implications for Query Writing

Understanding equivalence rules changes how you think about writing queries. While optimizers try to find efficient plans regardless of how you write your SQL, there are practical implications.

Query Writing Insights

•Don't pre-optimize obsessively — Good optimizers apply equivalence rules automatically. Writing convoluted SQL to 'help' the optimizer often backfires.
•Trust predicate pushdown — If you write WHERE conditions on joined tables, modern optimizers will push them down appropriately. You don't need to use subqueries to force early filtering.
•Join order usually doesn't matter (in SQL) — The textual order of tables in your FROM clause is rarely the execution order. The optimizer reorders based on statistics.
•Complex expressions may not optimize well — Very complex predicates or non-standard constructs may block optimization. Simpler is often better.
•Understand optimizer limitations — Outer joins, correlated subqueries, and certain functions can limit the optimizer's ability to apply equivalence rules.

When Equivalence Breaks Down

The Declarative Promise

Equivalence rules make SQL's declarative nature powerful. When you specify:

SELECT ... FROM A, B, C WHERE <conditions>

You're not saying "scan A, cross with B, cross with C, then filter." You're saying "give me tuples satisfying these conditions from the Cartesian product of A, B, C." The optimizer is free to:

Reorder the joins (associativity, commutativity)
Push selections down (distributivity)
Eliminate projections early (distributivity)
Choose any algorithm for each operation

This separation of specification from implementation is what makes database systems so powerful and durable across decades of hardware evolution.

Summary: The Algebra of Query Optimization

We've established the mathematical foundation that enables query optimizers to transform queries while guaranteeing correctness. Let's consolidate the key insights:

Key Takeaways

•Equivalence is rigorous — Two expressions are equivalent if they produce the same result for every possible database state. This is not approximation—it's mathematical certainty.
•Rules form categories — Commutativity, associativity, distributivity, and idempotence rules govern different aspects of query transformation.
•Selection rules enable filtering pushdown — Decomposing and pushing selections is often the most impactful optimization.
•Join rules enable order optimization — Commutativity and associativity give optimizers freedom to find optimal join sequences.
•Distributivity enables early reduction — Pushing operations toward leaf nodes reduces data volume throughout the plan.
•Equivalence is an equivalence relation — Reflexivity, symmetry, and transitivity guarantee consistent transformation.

What's Next

Page Complete

1 / 5