Database Management SystemsRelational Algebra

Set Operations in Relational Algebra

LevelIntermediate

Duration90 mins

TopicRelational Algebra

5 / 5

Operation Rules

The Rules of the Game: Governing Set Operations

Understanding the rules that govern set operations is essential for writing efficient, correct, and optimizable relational algebra expressions. These rules determine how operations can be combined, reordered, and transformed—forming the theoretical foundation for query optimization.

Just as arithmetic has rules like commutativity (a + b = b + a) and distributivity (a × (b + c) = a×b + a×c), relational algebra has its own laws that govern how operations interact. These laws enable query optimizers to transform your queries into equivalent but more efficient forms.

This page covers the complete set of rules for relational set operations: algebraic properties, operator precedence, equivalence transformations, and the optimization opportunities they enable.

What You Will Learn

By the end of this page, you will master all algebraic properties of set operations, understand operator precedence in relational expressions, know the key equivalence rules for query transformation, apply these rules for query optimization, and recognize patterns that query optimizers exploit.

Algebraic Properties of Set Operations

The three set operations—union, intersection, and difference—exhibit distinct algebraic properties. Understanding these properties is crucial for query manipulation and optimization.

Commutativity:

An operation ⊕ is commutative if A ⊕ B = B ⊕ A for all A and B.

Operation	Commutative?	Example
Union (∪)	✓ Yes	R ∪ S = S ∪ R
Intersection (∩)	✓ Yes	R ∩ S = S ∩ R
Difference (−)	✗ No	R − S ≠ S − R (in general)

Associativity:

An operation ⊕ is associative if (A ⊕ B) ⊕ C = A ⊕ (B ⊕ C).

Operation	Associative?	Example
Union (∪)	✓ Yes	(R ∪ S) ∪ T = R ∪ (S ∪ T)
Intersection (∩)	✓ Yes	(R ∩ S) ∩ T = R ∩ (S ∩ T)
Difference (−)	✗ No	(R − S) − T ≠ R − (S − T)

Complete Algebraic Properties Summary
Property	Union (∪)	Intersection (∩)	Difference (−)
Commutative	R ∪ S = S ∪ R ✓	R ∩ S = S ∩ R ✓	R − S ≠ S − R ✗
Associative	(R∪S)∪T = R∪(S∪T) ✓	(R∩S)∩T = R∩(S∩T) ✓	(R−S)−T ≠ R−(S−T) ✗
Idempotent	R ∪ R = R ✓	R ∩ R = R ✓	R − R = ∅ (not idempotent)
Identity Element	R ∪ ∅ = R (∅ is identity)	None (no universal)	R − ∅ = R (∅ for right)
Annihilator	R ∪ U = U (universal)	R ∩ ∅ = ∅	∅ − R = ∅

Difference is Non-Associative!

Never assume parentheses don't matter with difference. Consider: R={1,2,3}, S={2,3}, T={3}. (R−S)−T = {1}−{3} = {1}. But R−(S−T) = {1,2,3}−{2} = {1,3}. The results differ!

Implications for Query Writing:

Union chains can be freely reordered: A ∪ B ∪ C = C ∪ A ∪ B = any order
Intersection chains can be freely reordered: A ∩ B ∩ C = B ∩ C ∩ A
Difference chains MUST use explicit parentheses: (A − B) − C ≠ A − (B − C)
Mixed operations follow precedence rules (covered in Section 3)

Distributive Laws and Interactions

Set operations interact with each other through distributive laws and absorption laws. These rules enable complex transformations between equivalent expressions.

Distributivity:

An operation ⊗ distributes over ⊕ if: A ⊗ (B ⊕ C) = (A ⊗ B) ⊕ (A ⊗ C)

Distributive Laws for Set Operations
Law	Equation	Direction
∩ distributes over ∪	R ∩ (S ∪ T) = (R ∩ S) ∪ (R ∩ T)	Both directions ✓
∪ distributes over ∩	R ∪ (S ∩ T) = (R ∪ S) ∩ (R ∪ T)	Both directions ✓
− distributes over ∪ (right)	R − (S ∪ T) = (R − S) ∩ (R − T)	De Morgan variant
− distributes over ∩ (right)	R − (S ∩ T) = (R − S) ∪ (R − T)	De Morgan variant

Absorption Laws:

Absorption describes how one operation can "absorb" a combined expression:

R ∪ (R ∩ S) = R (union absorbs intersection)
R ∩ (R ∪ S) = R (intersection absorbs union)

Proof of R ∪ (R ∩ S) = R:

R ∩ S ⊆ R (by definition of intersection)
R ⊆ R ∪ (anything) (by definition of union)
If A ⊆ B, then A ∪ B = B
Since R ∩ S ⊆ R, we have R ∪ (R ∩ S) = R ∎

De Morgan's Laws

De Morgan's laws are essential for query transformation: • R − (S ∪ T) = (R − S) ∩ (R − T) • R − (S ∩ T) = (R − S) ∪ (R − T)

These allow converting between union-based and intersection-based expressions, enabling different execution strategies.

Converting Mermaid diagram...

Operator Precedence in Relational Expressions

When relational algebra expressions contain multiple operators without explicit parentheses, precedence rules determine the order of evaluation. Understanding precedence prevents ambiguity and ensures correct query interpretation.

Standard Precedence (Highest to Lowest):

Relational Algebra Operator Precedence
Priority	Operators	Description	Example
1 (Highest)	σ, π, ρ	Unary operators (selection, projection, rename)	σ_x>5 binds tightest
2	×, ⋈	Product and join operators	R × S evaluated before union
3	∩	Intersection	Higher than union/difference
4	∪, −	Union and difference	Same level, left-to-right
5 (Lowest)	←	Assignment	Evaluated last

Precedence Examples:

σ_salary>50000(Employees) ∪ Managers

Evaluates as: (σ_salary>50000(Employees)) ∪ Managers Selection binds to Employees first, then union.

R ∪ S ∩ T

Evaluates as: R ∪ (S ∩ T) Intersection has higher precedence than union.

R − S ∪ T

Evaluates as: (R − S) ∪ T Same precedence for − and ∪, so left-to-right.

π_name(R ∪ S)

Explicit parentheses: Union first, then project on result.

π_name(R) ∪ π_name(S)

Without parentheses on R and S, projection binds first.

Best Practice: Use Parentheses

Even when precedence rules give the desired interpretation, explicit parentheses improve readability and prevent errors. Write (R ∪ S) ∩ T instead of relying on readers knowing precedence rules.

precedence_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
-- SQL set operation precedence
-- INTERSECT has higher precedence than UNION/EXCEPT
 
-- This query:
SELECT * FROM A
UNION
SELECT * FROM B
INTERSECT
SELECT * FROM C;
 
-- Evaluates as:
SELECT * FROM A
UNION
(SELECT * FROM B INTERSECT SELECT * FROM C);
 
-- NOT as:
(SELECT * FROM A UNION SELECT * FROM B)
INTERSECT
SELECT * FROM C;
 
-- For clarity, always use explicit parentheses:
SELECT * FROM A
UNION
(
    SELECT * FROM B
    INTERSECT
    SELECT * FROM C
);
 
-- Or if you want union first:
(
    SELECT * FROM A
    UNION
    SELECT * FROM B
)
INTERSECT
SELECT * FROM C;
 
-- EXCEPT and UNION have same precedence (left-to-right)
SELECT * FROM A
EXCEPT
SELECT * FROM B
UNION
SELECT * FROM C;
 
-- Evaluates as:
(
    (SELECT * FROM A EXCEPT SELECT * FROM B)
    UNION
    SELECT * FROM C
);

Equivalence Rules for Query Transformation

Equivalence rules define when two relational algebra expressions produce identical results for all possible database states. These rules form the foundation of query optimization—optimizers transform queries into equivalent but more efficient forms.

Fundamental Equivalences for Set Operations:

Key Equivalence Rules
Rule Name	Equivalence	Application
Commutativity of ∪	R ∪ S ≡ S ∪ R	Reorder for smaller relation first
Commutativity of ∩	R ∩ S ≡ S ∩ R	Build hash on smaller relation
Associativity of ∪	(R ∪ S) ∪ T ≡ R ∪ (S ∪ T)	Group for optimal execution
Associativity of ∩	(R ∩ S) ∩ T ≡ R ∩ (S ∩ T)	Process smallest intersections first
Intersection via Difference	R ∩ S ≡ R − (R − S)	Use when difference is indexed
De Morgan (union)	R − (S ∪ T) ≡ (R − S) ∩ (R − T)	Split complex exclusions
De Morgan (intersection)	R − (S ∩ T) ≡ (R − S) ∪ (R − T)	Parallelize exclusion checks
Absorption (union)	R ∪ (R ∩ S) ≡ R	Eliminate redundant operations
Absorption (intersection)	R ∩ (R ∪ S) ≡ R	Simplify expressions

Equivalences Involving Selection (σ):

Selection interacts with set operations in powerful ways:

-- Selection distributes over union:
σ_p(R ∪ S) ≡ σ_p(R) ∪ σ_p(S)

-- Selection distributes over intersection:
σ_p(R ∩ S) ≡ σ_p(R) ∩ σ_p(S)

-- Selection distributes over difference:
σ_p(R − S) ≡ σ_p(R) − σ_p(S)
           ≡ σ_p(R) − S  (if p applies to R's attributes)

Why these matter:

Pushing selection down (applying it earlier) often dramatically reduces the number of tuples processed by subsequent operations. This is one of the most effective optimization techniques.

Selection Pushdown

If you're selecting from a union of 1M and 2M row tables, applying selection AFTER union processes 3M rows. Pushing selection to BEFORE union might reduce each to 10K rows, processing only 20K total—a 150x improvement!

Equivalences Involving Projection (π):

-- Projection distributes over union:
π_L(R ∪ S) ≡ π_L(R) ∪ π_L(S)

-- BUT NOT over intersection or difference in general!
π_L(R ∩ S) ≢ π_L(R) ∩ π_L(S)  -- May produce different results!
π_L(R − S) ≢ π_L(R) − π_L(S)  -- May produce different results!

Why projection doesn't distribute over intersection/difference:

Projection can merge previously distinct tuples. Consider:

R = {(1, 'a'), (2, 'a')}
S = {(1, 'a'), (3, 'a')}
π_B(R ∩ S) = π_B({(1, 'a')}) = {('a')}
π_B(R) ∩ π_B(S) = {('a')} ∩ {('a')} = {('a')} -- Same here, but...

With different data, results can differ when projection causes duplicates to collapse differently.

Query Optimization Using Set Operation Rules

Query optimizers apply equivalence rules to transform queries into more efficient execution plans. Understanding these techniques helps you write optimizer-friendly queries and understand execution plans.

Key Optimization Strategies:

Optimization Techniques for Set Operations

•Push Selection Down: Apply σ before set operations to reduce tuple counts early • σ_p(R ∪ S) → σ_p(R) ∪ σ_p(S)
•Push Projection Down (for Union): Project before union to reduce tuple width • π_L(R ∪ S) → π_L(R) ∪ π_L(S)
•Reorder Commutative Operations: Process smaller relations first • If |S| < |R|, compute S ∩ R starting with S
•Exploit Associativity: Group operations to minimize intermediate results • (R ∪ S) ∪ T with small R,S → compute R ∪ S first
•Eliminate Redundancy: Use absorption and idempotence • R ∪ (R ∩ S) → R
•Parallelize via De Morgan: Split for concurrent execution • R − (S ∪ T) → (R − S) ∩ (R − T) if S and T can be processed in parallel

Example: Optimizing a Complex Query

Original query:

π_name(σ_dept='Engineering'(Employees ∪ Contractors))

Step 1: Push selection inside union:

π_name(σ_dept='Engineering'(Employees) ∪ σ_dept='Engineering'(Contractors))

Step 2: Push projection inside unions:

π_name(σ_dept='Engineering'(Employees)) ∪ π_name(σ_dept='Engineering'(Contractors))

Cost Analysis:

Step	Tuples Processed	Tuple Width
Original	100K + 50K = 150K	Full width
After σ push	10K + 5K = 15K	Full width
After π push	10K + 5K = 15K	Narrow (name only)

The optimized version processes 10x fewer tuples with narrower tuples—dramatic improvement!

Helping the Optimizer

While optimizers apply transformations automatically, you can help by: (1) Writing conditions that can be pushed down, (2) Avoiding unnecessary columns in SELECT, (3) Using appropriate indexes, and (4) Providing statistics through ANALYZE commands.

Set Operations with Other Relational Operations

Set operations don't exist in isolation—they interact with joins, selections, projections, and other operations. Understanding these interactions enables sophisticated query design.

Set Operations and Joins:

Join Interactions with Set Operations
Pattern	Equivalence	Notes
Join distributes over union	(R ∪ S) ⋈ T ≡ (R ⋈ T) ∪ (S ⋈ T)	Valid when join condition applies to both
Join over intersection	(R ∩ S) ⋈ T ⊆ (R ⋈ T) ∩ (S ⋈ T)	Subset, not equivalence
Join over difference	(R − S) ⋈ T ⊆ (R ⋈ T) − (S ⋈ T)	Subset, not equivalence

Warning: Be Careful with Join Distribution

Join distributing over union is valid:

(R ∪ S) ⋈ T = (R ⋈ T) ∪ (S ⋈ T)

But join does NOT fully distribute over intersection or difference! Consider this counterexample for intersection:

R = {(1, 'a')}, S = {(1, 'b')}, T = {(1, 'c')}

-- Left side: (R ∩ S) ⋈ T
R ∩ S = {} (no common tuples)
{} ⋈ T = {} 

-- Right side: (R ⋈ T) ∩ (S ⋈ T)
R ⋈ T = {(1, 'a', 'c')}
S ⋈ T = {(1, 'b', 'c')}
(R ⋈ T) ∩ (S ⋈ T) = {} (different tuples)

-- Same result here, but not in general!

Combining Set Operations in Complex Queries:

-- Find employees who have worked on all projects in Department X
-- This is a division problem, but can use set operations:

-- All (employee, project) pairs required:
Required ← π_emp_id(Employees) × π_proj_id(σ_dept='X'(Projects))

-- Actual assignments:
Actual ← π_emp_id,proj_id(Assignments)

-- Missing assignments:
Missing ← Required − Actual

-- Employees with no missing assignments (completed all):
Qualified ← π_emp_id(Employees) − π_emp_id(Missing)

This pattern uses Cartesian product to generate requirements, difference to find gaps, and another difference to find employees without gaps.

Division via Set Operations

The pattern above is how relational division is derived from primitive operations. It's a powerful template for 'for all' queries: finding entities that satisfy a complete set of requirements.

Common Patterns and Query Idioms

Experienced database practitioners recognize common patterns where set operations provide elegant solutions. Internalizing these patterns accelerates query design.

Pattern 1: OR Decomposition

Complex OR conditions can be decomposed into unions:

-- Instead of:
σ_{condition1 OR condition2 OR condition3}(R)

-- Consider:
σ_condition1(R) ∪ σ_condition2(R) ∪ σ_condition3(R)

Advantage: Each selection can use a different index.

Pattern 2: NOT IN as Difference

-- Find customers with no orders:
π_customer_id(Customers) − π_customer_id(Orders)

-- Equivalent to:
SELECT customer_id FROM Customers WHERE customer_id NOT IN (SELECT customer_id FROM Orders)

Pattern 3: Multi-way AND as Chained Intersection

-- Customers who bought products A, B, AND C:
π_cust_id(σ_product='A'(Purchases))
∩ π_cust_id(σ_product='B'(Purchases))
∩ π_cust_id(σ_product='C'(Purchases))

Pattern 4: Symmetric Difference for Change Detection

-- Records that changed (in old XOR new):
(OldData − NewData) ∪ (NewData − OldData)

-- Added records:
NewData − OldData

-- Deleted records:
OldData − NewData

-- Unchanged records:
OldData ∩ NewData

Pattern 5: Completeness Check via Division

-- Students who completed ALL required courses:
AllRequired ← π_student_id(Students) × π_course_id(RequiredCourses)
Completed ← π_student_id, course_id(Enrollments)
Incomplete ← AllRequired − Completed
CompleteStudents ← π_student_id(Students) − π_student_id(Incomplete)

pattern_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
-- Pattern 1: OR Decomposition for Index Usage
-- Instead of OR which may not use indexes:
SELECT * FROM Products 
WHERE category = 'Electronics' OR category = 'Books' OR category = 'Music';
 
-- Decompose to unions (each can use index):
SELECT * FROM Products WHERE category = 'Electronics'
UNION
SELECT * FROM Products WHERE category = 'Books'
UNION
SELECT * FROM Products WHERE category = 'Music';
 
-- Pattern 2: NOT EXISTS via EXCEPT
-- Customers without orders:
SELECT customer_id FROM Customers
EXCEPT
SELECT DISTINCT customer_id FROM Orders;
 
-- Pattern 3: Multi-criteria AND via INTERSECT
-- Products in ALL warehouse locations:
SELECT product_id FROM Inventory WHERE warehouse = 'East'
INTERSECT
SELECT product_id FROM Inventory WHERE warehouse = 'West'
INTERSECT
SELECT product_id FROM Inventory WHERE warehouse = 'Central';
 
-- Pattern 4: Change Detection
-- Added records:
SELECT * FROM new_data EXCEPT SELECT * FROM old_data;
 
-- Deleted records:
SELECT * FROM old_data EXCEPT SELECT * FROM new_data;
 
-- Changed or new (for merge operations):
SELECT *, 'ADDED' as change_type 
FROM (SELECT * FROM new_data EXCEPT SELECT * FROM old_data) added
UNION ALL
SELECT *, 'DELETED' as change_type 
FROM (SELECT * FROM old_data EXCEPT SELECT * FROM new_data) deleted;

Pattern Library

Build your mental library of set operation patterns. When you see 'all of,' think intersection. When you see 'none of' or 'missing,' think difference. When you see 'any of,' think union. This pattern recognition accelerates query design.

Summary: Mastering Set Operation Rules

We've comprehensively explored the rules governing set operations in relational algebra. These rules are fundamental to query correctness and optimization. Let's consolidate the essential knowledge:

Key Takeaways

•Algebraic Properties: Union and intersection are commutative and associative; difference is neither
•Distributive Laws: ∩ distributes over ∪ and vice versa; De Morgan's laws handle − distribution
•Precedence: Unary ops > Products/Joins > ∩ > ∪/− ; use parentheses for clarity
•Selection Push: σ distributes over all set operations—push down for efficiency
•Projection Push: π distributes over ∪ only—NOT over ∩ or −
•Join Distribution: ⋈ distributes over ∪; subset relationship only for ∩ and −
•Optimization: Reorder, push down, and eliminate redundancy using equivalence rules
•Common Patterns: OR→union, NOT IN→difference, AND across records→intersection

Module Complete!

Congratulations! You've completed Module 3: Set Operations. You now have comprehensive mastery of union, intersection, and difference operations, including their mathematical properties, union compatibility requirements, implementation strategies, and the rules that govern their use. You're equipped to write efficient, correct queries using set operations and understand how query optimizers transform your queries. Continue to the next module to explore Cartesian Product and Rename operations.

5 / 5

Loading learning content...

Database Management SystemsRelational Algebra

Set Operations in Relational Algebra

LevelIntermediate

Duration90 mins

TopicRelational Algebra

5 / 5

Operation Rules

The Rules of the Game: Governing Set Operations

This page covers the complete set of rules for relational set operations: algebraic properties, operator precedence, equivalence transformations, and the optimization opportunities they enable.

What You Will Learn

Algebraic Properties of Set Operations

The three set operations—union, intersection, and difference—exhibit distinct algebraic properties. Understanding these properties is crucial for query manipulation and optimization.

Commutativity:

An operation ⊕ is commutative if A ⊕ B = B ⊕ A for all A and B.

Operation	Commutative?	Example
Union (∪)	✓ Yes	R ∪ S = S ∪ R
Intersection (∩)	✓ Yes	R ∩ S = S ∩ R
Difference (−)	✗ No	R − S ≠ S − R (in general)

Associativity:

An operation ⊕ is associative if (A ⊕ B) ⊕ C = A ⊕ (B ⊕ C).

Operation	Associative?	Example
Union (∪)	✓ Yes	(R ∪ S) ∪ T = R ∪ (S ∪ T)
Intersection (∩)	✓ Yes	(R ∩ S) ∩ T = R ∩ (S ∩ T)
Difference (−)	✗ No	(R − S) − T ≠ R − (S − T)

Complete Algebraic Properties Summary
Property	Union (∪)	Intersection (∩)	Difference (−)
Commutative	R ∪ S = S ∪ R ✓	R ∩ S = S ∩ R ✓	R − S ≠ S − R ✗
Associative	(R∪S)∪T = R∪(S∪T) ✓	(R∩S)∩T = R∩(S∩T) ✓	(R−S)−T ≠ R−(S−T) ✗
Idempotent	R ∪ R = R ✓	R ∩ R = R ✓	R − R = ∅ (not idempotent)
Identity Element	R ∪ ∅ = R (∅ is identity)	None (no universal)	R − ∅ = R (∅ for right)
Annihilator	R ∪ U = U (universal)	R ∩ ∅ = ∅	∅ − R = ∅

Difference is Non-Associative!

Never assume parentheses don't matter with difference. Consider: R={1,2,3}, S={2,3}, T={3}. (R−S)−T = {1}−{3} = {1}. But R−(S−T) = {1,2,3}−{2} = {1,3}. The results differ!

Implications for Query Writing:

Union chains can be freely reordered: A ∪ B ∪ C = C ∪ A ∪ B = any order
Intersection chains can be freely reordered: A ∩ B ∩ C = B ∩ C ∩ A
Difference chains MUST use explicit parentheses: (A − B) − C ≠ A − (B − C)
Mixed operations follow precedence rules (covered in Section 3)

Distributive Laws and Interactions

Set operations interact with each other through distributive laws and absorption laws. These rules enable complex transformations between equivalent expressions.

Distributivity:

An operation ⊗ distributes over ⊕ if: A ⊗ (B ⊕ C) = (A ⊗ B) ⊕ (A ⊗ C)

Distributive Laws for Set Operations
Law	Equation	Direction
∩ distributes over ∪	R ∩ (S ∪ T) = (R ∩ S) ∪ (R ∩ T)	Both directions ✓
∪ distributes over ∩	R ∪ (S ∩ T) = (R ∪ S) ∩ (R ∪ T)	Both directions ✓
− distributes over ∪ (right)	R − (S ∪ T) = (R − S) ∩ (R − T)	De Morgan variant
− distributes over ∩ (right)	R − (S ∩ T) = (R − S) ∪ (R − T)	De Morgan variant

Absorption Laws:

Absorption describes how one operation can "absorb" a combined expression:

R ∪ (R ∩ S) = R (union absorbs intersection)
R ∩ (R ∪ S) = R (intersection absorbs union)

Proof of R ∪ (R ∩ S) = R:

R ∩ S ⊆ R (by definition of intersection)
R ⊆ R ∪ (anything) (by definition of union)
If A ⊆ B, then A ∪ B = B
Since R ∩ S ⊆ R, we have R ∪ (R ∩ S) = R ∎

De Morgan's Laws

De Morgan's laws are essential for query transformation: • R − (S ∪ T) = (R − S) ∩ (R − T) • R − (S ∩ T) = (R − S) ∪ (R − T)

These allow converting between union-based and intersection-based expressions, enabling different execution strategies.

Converting Mermaid diagram...

Operator Precedence in Relational Expressions

Standard Precedence (Highest to Lowest):

Relational Algebra Operator Precedence
Priority	Operators	Description	Example
1 (Highest)	σ, π, ρ	Unary operators (selection, projection, rename)	σ_x>5 binds tightest
2	×, ⋈	Product and join operators	R × S evaluated before union
3	∩	Intersection	Higher than union/difference
4	∪, −	Union and difference	Same level, left-to-right
5 (Lowest)	←	Assignment	Evaluated last

Precedence Examples:

σ_salary>50000(Employees) ∪ Managers

Evaluates as: (σ_salary>50000(Employees)) ∪ Managers Selection binds to Employees first, then union.

R ∪ S ∩ T

Evaluates as: R ∪ (S ∩ T) Intersection has higher precedence than union.

R − S ∪ T

Evaluates as: (R − S) ∪ T Same precedence for − and ∪, so left-to-right.

π_name(R ∪ S)

Explicit parentheses: Union first, then project on result.

π_name(R) ∪ π_name(S)

Without parentheses on R and S, projection binds first.

Best Practice: Use Parentheses

Even when precedence rules give the desired interpretation, explicit parentheses improve readability and prevent errors. Write (R ∪ S) ∩ T instead of relying on readers knowing precedence rules.

precedence_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
-- SQL set operation precedence
-- INTERSECT has higher precedence than UNION/EXCEPT
 
-- This query:
SELECT * FROM A
UNION
SELECT * FROM B
INTERSECT
SELECT * FROM C;
 
-- Evaluates as:
SELECT * FROM A
UNION
(SELECT * FROM B INTERSECT SELECT * FROM C);
 
-- NOT as:
(SELECT * FROM A UNION SELECT * FROM B)
INTERSECT
SELECT * FROM C;
 
-- For clarity, always use explicit parentheses:
SELECT * FROM A
UNION
(
    SELECT * FROM B
    INTERSECT
    SELECT * FROM C
);
 
-- Or if you want union first:
(
    SELECT * FROM A
    UNION
    SELECT * FROM B
)
INTERSECT
SELECT * FROM C;
 
-- EXCEPT and UNION have same precedence (left-to-right)
SELECT * FROM A
EXCEPT
SELECT * FROM B
UNION
SELECT * FROM C;
 
-- Evaluates as:
(
    (SELECT * FROM A EXCEPT SELECT * FROM B)
    UNION
    SELECT * FROM C
);

Equivalence Rules for Query Transformation

Fundamental Equivalences for Set Operations:

Key Equivalence Rules
Rule Name	Equivalence	Application
Commutativity of ∪	R ∪ S ≡ S ∪ R	Reorder for smaller relation first
Commutativity of ∩	R ∩ S ≡ S ∩ R	Build hash on smaller relation
Associativity of ∪	(R ∪ S) ∪ T ≡ R ∪ (S ∪ T)	Group for optimal execution
Associativity of ∩	(R ∩ S) ∩ T ≡ R ∩ (S ∩ T)	Process smallest intersections first
Intersection via Difference	R ∩ S ≡ R − (R − S)	Use when difference is indexed
De Morgan (union)	R − (S ∪ T) ≡ (R − S) ∩ (R − T)	Split complex exclusions
De Morgan (intersection)	R − (S ∩ T) ≡ (R − S) ∪ (R − T)	Parallelize exclusion checks
Absorption (union)	R ∪ (R ∩ S) ≡ R	Eliminate redundant operations
Absorption (intersection)	R ∩ (R ∪ S) ≡ R	Simplify expressions

Equivalences Involving Selection (σ):

Selection interacts with set operations in powerful ways:

-- Selection distributes over union:
σ_p(R ∪ S) ≡ σ_p(R) ∪ σ_p(S)

-- Selection distributes over intersection:
σ_p(R ∩ S) ≡ σ_p(R) ∩ σ_p(S)

-- Selection distributes over difference:
σ_p(R − S) ≡ σ_p(R) − σ_p(S)
           ≡ σ_p(R) − S  (if p applies to R's attributes)

Why these matter:

Pushing selection down (applying it earlier) often dramatically reduces the number of tuples processed by subsequent operations. This is one of the most effective optimization techniques.

Selection Pushdown

Equivalences Involving Projection (π):

-- Projection distributes over union:
π_L(R ∪ S) ≡ π_L(R) ∪ π_L(S)

-- BUT NOT over intersection or difference in general!
π_L(R ∩ S) ≢ π_L(R) ∩ π_L(S)  -- May produce different results!
π_L(R − S) ≢ π_L(R) − π_L(S)  -- May produce different results!

Why projection doesn't distribute over intersection/difference:

Projection can merge previously distinct tuples. Consider:

R = {(1, 'a'), (2, 'a')}
S = {(1, 'a'), (3, 'a')}
π_B(R ∩ S) = π_B({(1, 'a')}) = {('a')}
π_B(R) ∩ π_B(S) = {('a')} ∩ {('a')} = {('a')} -- Same here, but...

With different data, results can differ when projection causes duplicates to collapse differently.

Query Optimization Using Set Operation Rules

Key Optimization Strategies:

Optimization Techniques for Set Operations

•Push Selection Down: Apply σ before set operations to reduce tuple counts early • σ_p(R ∪ S) → σ_p(R) ∪ σ_p(S)
•Push Projection Down (for Union): Project before union to reduce tuple width • π_L(R ∪ S) → π_L(R) ∪ π_L(S)
•Reorder Commutative Operations: Process smaller relations first • If |S| < |R|, compute S ∩ R starting with S
•Exploit Associativity: Group operations to minimize intermediate results • (R ∪ S) ∪ T with small R,S → compute R ∪ S first
•Eliminate Redundancy: Use absorption and idempotence • R ∪ (R ∩ S) → R
•Parallelize via De Morgan: Split for concurrent execution • R − (S ∪ T) → (R − S) ∩ (R − T) if S and T can be processed in parallel

Example: Optimizing a Complex Query

Original query:

π_name(σ_dept='Engineering'(Employees ∪ Contractors))

Step 1: Push selection inside union:

π_name(σ_dept='Engineering'(Employees) ∪ σ_dept='Engineering'(Contractors))

Step 2: Push projection inside unions:

π_name(σ_dept='Engineering'(Employees)) ∪ π_name(σ_dept='Engineering'(Contractors))

Cost Analysis:

Step	Tuples Processed	Tuple Width
Original	100K + 50K = 150K	Full width
After σ push	10K + 5K = 15K	Full width
After π push	10K + 5K = 15K	Narrow (name only)

The optimized version processes 10x fewer tuples with narrower tuples—dramatic improvement!

Helping the Optimizer

Set Operations with Other Relational Operations

Set operations don't exist in isolation—they interact with joins, selections, projections, and other operations. Understanding these interactions enables sophisticated query design.

Set Operations and Joins:

Join Interactions with Set Operations
Pattern	Equivalence	Notes
Join distributes over union	(R ∪ S) ⋈ T ≡ (R ⋈ T) ∪ (S ⋈ T)	Valid when join condition applies to both
Join over intersection	(R ∩ S) ⋈ T ⊆ (R ⋈ T) ∩ (S ⋈ T)	Subset, not equivalence
Join over difference	(R − S) ⋈ T ⊆ (R ⋈ T) − (S ⋈ T)	Subset, not equivalence

Warning: Be Careful with Join Distribution

Join distributing over union is valid:

(R ∪ S) ⋈ T = (R ⋈ T) ∪ (S ⋈ T)

But join does NOT fully distribute over intersection or difference! Consider this counterexample for intersection:

R = {(1, 'a')}, S = {(1, 'b')}, T = {(1, 'c')}

-- Left side: (R ∩ S) ⋈ T
R ∩ S = {} (no common tuples)
{} ⋈ T = {} 

-- Right side: (R ⋈ T) ∩ (S ⋈ T)
R ⋈ T = {(1, 'a', 'c')}
S ⋈ T = {(1, 'b', 'c')}
(R ⋈ T) ∩ (S ⋈ T) = {} (different tuples)

-- Same result here, but not in general!

Combining Set Operations in Complex Queries:

-- Find employees who have worked on all projects in Department X
-- This is a division problem, but can use set operations:

-- All (employee, project) pairs required:
Required ← π_emp_id(Employees) × π_proj_id(σ_dept='X'(Projects))

-- Actual assignments:
Actual ← π_emp_id,proj_id(Assignments)

-- Missing assignments:
Missing ← Required − Actual

-- Employees with no missing assignments (completed all):
Qualified ← π_emp_id(Employees) − π_emp_id(Missing)

This pattern uses Cartesian product to generate requirements, difference to find gaps, and another difference to find employees without gaps.

Division via Set Operations

The pattern above is how relational division is derived from primitive operations. It's a powerful template for 'for all' queries: finding entities that satisfy a complete set of requirements.

Common Patterns and Query Idioms

Experienced database practitioners recognize common patterns where set operations provide elegant solutions. Internalizing these patterns accelerates query design.

Pattern 1: OR Decomposition

Complex OR conditions can be decomposed into unions:

-- Instead of:
σ_{condition1 OR condition2 OR condition3}(R)

-- Consider:
σ_condition1(R) ∪ σ_condition2(R) ∪ σ_condition3(R)

Advantage: Each selection can use a different index.

Pattern 2: NOT IN as Difference

-- Find customers with no orders:
π_customer_id(Customers) − π_customer_id(Orders)

-- Equivalent to:
SELECT customer_id FROM Customers WHERE customer_id NOT IN (SELECT customer_id FROM Orders)

Pattern 3: Multi-way AND as Chained Intersection

-- Customers who bought products A, B, AND C:
π_cust_id(σ_product='A'(Purchases))
∩ π_cust_id(σ_product='B'(Purchases))
∩ π_cust_id(σ_product='C'(Purchases))

Pattern 4: Symmetric Difference for Change Detection

-- Records that changed (in old XOR new):
(OldData − NewData) ∪ (NewData − OldData)

-- Added records:
NewData − OldData

-- Deleted records:
OldData − NewData

-- Unchanged records:
OldData ∩ NewData

Pattern 5: Completeness Check via Division

-- Students who completed ALL required courses:
AllRequired ← π_student_id(Students) × π_course_id(RequiredCourses)
Completed ← π_student_id, course_id(Enrollments)
Incomplete ← AllRequired − Completed
CompleteStudents ← π_student_id(Students) − π_student_id(Incomplete)

pattern_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
-- Pattern 1: OR Decomposition for Index Usage
-- Instead of OR which may not use indexes:
SELECT * FROM Products 
WHERE category = 'Electronics' OR category = 'Books' OR category = 'Music';
 
-- Decompose to unions (each can use index):
SELECT * FROM Products WHERE category = 'Electronics'
UNION
SELECT * FROM Products WHERE category = 'Books'
UNION
SELECT * FROM Products WHERE category = 'Music';
 
-- Pattern 2: NOT EXISTS via EXCEPT
-- Customers without orders:
SELECT customer_id FROM Customers
EXCEPT
SELECT DISTINCT customer_id FROM Orders;
 
-- Pattern 3: Multi-criteria AND via INTERSECT
-- Products in ALL warehouse locations:
SELECT product_id FROM Inventory WHERE warehouse = 'East'
INTERSECT
SELECT product_id FROM Inventory WHERE warehouse = 'West'
INTERSECT
SELECT product_id FROM Inventory WHERE warehouse = 'Central';
 
-- Pattern 4: Change Detection
-- Added records:
SELECT * FROM new_data EXCEPT SELECT * FROM old_data;
 
-- Deleted records:
SELECT * FROM old_data EXCEPT SELECT * FROM new_data;
 
-- Changed or new (for merge operations):
SELECT *, 'ADDED' as change_type 
FROM (SELECT * FROM new_data EXCEPT SELECT * FROM old_data) added
UNION ALL
SELECT *, 'DELETED' as change_type 
FROM (SELECT * FROM old_data EXCEPT SELECT * FROM new_data) deleted;

Pattern Library

Summary: Mastering Set Operation Rules

We've comprehensively explored the rules governing set operations in relational algebra. These rules are fundamental to query correctness and optimization. Let's consolidate the essential knowledge:

Key Takeaways

•Algebraic Properties: Union and intersection are commutative and associative; difference is neither
•Distributive Laws: ∩ distributes over ∪ and vice versa; De Morgan's laws handle − distribution
•Precedence: Unary ops > Products/Joins > ∩ > ∪/− ; use parentheses for clarity
•Selection Push: σ distributes over all set operations—push down for efficiency
•Projection Push: π distributes over ∪ only—NOT over ∩ or −
•Join Distribution: ⋈ distributes over ∪; subset relationship only for ∩ and −
•Optimization: Reorder, push down, and eliminate redundancy using equivalence rules
•Common Patterns: OR→union, NOT IN→difference, AND across records→intersection

Module Complete!

5 / 5