Loading learning content...
Understanding the rules that govern set operations is essential for writing efficient, correct, and optimizable relational algebra expressions. These rules determine how operations can be combined, reordered, and transformed—forming the theoretical foundation for query optimization.
Just as arithmetic has rules like commutativity (a + b = b + a) and distributivity (a × (b + c) = a×b + a×c), relational algebra has its own laws that govern how operations interact. These laws enable query optimizers to transform your queries into equivalent but more efficient forms.
This page covers the complete set of rules for relational set operations: algebraic properties, operator precedence, equivalence transformations, and the optimization opportunities they enable.
By the end of this page, you will master all algebraic properties of set operations, understand operator precedence in relational expressions, know the key equivalence rules for query transformation, apply these rules for query optimization, and recognize patterns that query optimizers exploit.
The three set operations—union, intersection, and difference—exhibit distinct algebraic properties. Understanding these properties is crucial for query manipulation and optimization.
Commutativity:
An operation ⊕ is commutative if A ⊕ B = B ⊕ A for all A and B.
| Operation | Commutative? | Example |
|---|---|---|
| Union (∪) | ✓ Yes | R ∪ S = S ∪ R |
| Intersection (∩) | ✓ Yes | R ∩ S = S ∩ R |
| Difference (−) | ✗ No | R − S ≠ S − R (in general) |
Associativity:
An operation ⊕ is associative if (A ⊕ B) ⊕ C = A ⊕ (B ⊕ C).
| Operation | Associative? | Example |
|---|---|---|
| Union (∪) | ✓ Yes | (R ∪ S) ∪ T = R ∪ (S ∪ T) |
| Intersection (∩) | ✓ Yes | (R ∩ S) ∩ T = R ∩ (S ∩ T) |
| Difference (−) | ✗ No | (R − S) − T ≠ R − (S − T) |
| Property | Union (∪) | Intersection (∩) | Difference (−) |
|---|---|---|---|
| Commutative | R ∪ S = S ∪ R ✓ | R ∩ S = S ∩ R ✓ | R − S ≠ S − R ✗ |
| Associative | (R∪S)∪T = R∪(S∪T) ✓ | (R∩S)∩T = R∩(S∩T) ✓ | (R−S)−T ≠ R−(S−T) ✗ |
| Idempotent | R ∪ R = R ✓ | R ∩ R = R ✓ | R − R = ∅ (not idempotent) |
| Identity Element | R ∪ ∅ = R (∅ is identity) | None (no universal) | R − ∅ = R (∅ for right) |
| Annihilator | R ∪ U = U (universal) | R ∩ ∅ = ∅ | ∅ − R = ∅ |
Never assume parentheses don't matter with difference. Consider: R={1,2,3}, S={2,3}, T={3}. (R−S)−T = {1}−{3} = {1}. But R−(S−T) = {1,2,3}−{2} = {1,3}. The results differ!
Implications for Query Writing:
Set operations interact with each other through distributive laws and absorption laws. These rules enable complex transformations between equivalent expressions.
Distributivity:
An operation ⊗ distributes over ⊕ if: A ⊗ (B ⊕ C) = (A ⊗ B) ⊕ (A ⊗ C)
| Law | Equation | Direction |
|---|---|---|
| ∩ distributes over ∪ | R ∩ (S ∪ T) = (R ∩ S) ∪ (R ∩ T) | Both directions ✓ |
| ∪ distributes over ∩ | R ∪ (S ∩ T) = (R ∪ S) ∩ (R ∪ T) | Both directions ✓ |
| − distributes over ∪ (right) | R − (S ∪ T) = (R − S) ∩ (R − T) | De Morgan variant |
| − distributes over ∩ (right) | R − (S ∩ T) = (R − S) ∪ (R − T) | De Morgan variant |
Absorption Laws:
Absorption describes how one operation can "absorb" a combined expression:
Proof of R ∪ (R ∩ S) = R:
De Morgan's laws are essential for query transformation: • R − (S ∪ T) = (R − S) ∩ (R − T) • R − (S ∩ T) = (R − S) ∪ (R − T)
These allow converting between union-based and intersection-based expressions, enabling different execution strategies.
When relational algebra expressions contain multiple operators without explicit parentheses, precedence rules determine the order of evaluation. Understanding precedence prevents ambiguity and ensures correct query interpretation.
Standard Precedence (Highest to Lowest):
| Priority | Operators | Description | Example |
|---|---|---|---|
| 1 (Highest) | σ, π, ρ | Unary operators (selection, projection, rename) | σ_x>5 binds tightest |
| 2 | ×, ⋈ | Product and join operators | R × S evaluated before union |
| 3 | ∩ | Intersection | Higher than union/difference |
| 4 | ∪, − | Union and difference | Same level, left-to-right |
| 5 (Lowest) | ← | Assignment | Evaluated last |
Precedence Examples:
σ_salary>50000(Employees) ∪ Managers
Evaluates as: (σ_salary>50000(Employees)) ∪ Managers Selection binds to Employees first, then union.
R ∪ S ∩ T
Evaluates as: R ∪ (S ∩ T) Intersection has higher precedence than union.
R − S ∪ T
Evaluates as: (R − S) ∪ T Same precedence for − and ∪, so left-to-right.
π_name(R ∪ S)
Explicit parentheses: Union first, then project on result.
π_name(R) ∪ π_name(S)
Without parentheses on R and S, projection binds first.
Even when precedence rules give the desired interpretation, explicit parentheses improve readability and prevent errors. Write (R ∪ S) ∩ T instead of relying on readers knowing precedence rules.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051
-- SQL set operation precedence-- INTERSECT has higher precedence than UNION/EXCEPT -- This query:SELECT * FROM AUNIONSELECT * FROM BINTERSECTSELECT * FROM C; -- Evaluates as:SELECT * FROM AUNION(SELECT * FROM B INTERSECT SELECT * FROM C); -- NOT as:(SELECT * FROM A UNION SELECT * FROM B)INTERSECTSELECT * FROM C; -- For clarity, always use explicit parentheses:SELECT * FROM AUNION( SELECT * FROM B INTERSECT SELECT * FROM C); -- Or if you want union first:( SELECT * FROM A UNION SELECT * FROM B)INTERSECTSELECT * FROM C; -- EXCEPT and UNION have same precedence (left-to-right)SELECT * FROM AEXCEPTSELECT * FROM BUNIONSELECT * FROM C; -- Evaluates as:( (SELECT * FROM A EXCEPT SELECT * FROM B) UNION SELECT * FROM C);Equivalence rules define when two relational algebra expressions produce identical results for all possible database states. These rules form the foundation of query optimization—optimizers transform queries into equivalent but more efficient forms.
Fundamental Equivalences for Set Operations:
| Rule Name | Equivalence | Application |
|---|---|---|
| Commutativity of ∪ | R ∪ S ≡ S ∪ R | Reorder for smaller relation first |
| Commutativity of ∩ | R ∩ S ≡ S ∩ R | Build hash on smaller relation |
| Associativity of ∪ | (R ∪ S) ∪ T ≡ R ∪ (S ∪ T) | Group for optimal execution |
| Associativity of ∩ | (R ∩ S) ∩ T ≡ R ∩ (S ∩ T) | Process smallest intersections first |
| Intersection via Difference | R ∩ S ≡ R − (R − S) | Use when difference is indexed |
| De Morgan (union) | R − (S ∪ T) ≡ (R − S) ∩ (R − T) | Split complex exclusions |
| De Morgan (intersection) | R − (S ∩ T) ≡ (R − S) ∪ (R − T) | Parallelize exclusion checks |
| Absorption (union) | R ∪ (R ∩ S) ≡ R | Eliminate redundant operations |
| Absorption (intersection) | R ∩ (R ∪ S) ≡ R | Simplify expressions |
Equivalences Involving Selection (σ):
Selection interacts with set operations in powerful ways:
-- Selection distributes over union:
σ_p(R ∪ S) ≡ σ_p(R) ∪ σ_p(S)
-- Selection distributes over intersection:
σ_p(R ∩ S) ≡ σ_p(R) ∩ σ_p(S)
-- Selection distributes over difference:
σ_p(R − S) ≡ σ_p(R) − σ_p(S)
≡ σ_p(R) − S (if p applies to R's attributes)
Why these matter:
Pushing selection down (applying it earlier) often dramatically reduces the number of tuples processed by subsequent operations. This is one of the most effective optimization techniques.
If you're selecting from a union of 1M and 2M row tables, applying selection AFTER union processes 3M rows. Pushing selection to BEFORE union might reduce each to 10K rows, processing only 20K total—a 150x improvement!
Equivalences Involving Projection (π):
-- Projection distributes over union:
π_L(R ∪ S) ≡ π_L(R) ∪ π_L(S)
-- BUT NOT over intersection or difference in general!
π_L(R ∩ S) ≢ π_L(R) ∩ π_L(S) -- May produce different results!
π_L(R − S) ≢ π_L(R) − π_L(S) -- May produce different results!
Why projection doesn't distribute over intersection/difference:
Projection can merge previously distinct tuples. Consider:
With different data, results can differ when projection causes duplicates to collapse differently.
Query optimizers apply equivalence rules to transform queries into more efficient execution plans. Understanding these techniques helps you write optimizer-friendly queries and understand execution plans.
Key Optimization Strategies:
Example: Optimizing a Complex Query
Original query:
π_name(σ_dept='Engineering'(Employees ∪ Contractors))
Step 1: Push selection inside union:
π_name(σ_dept='Engineering'(Employees) ∪ σ_dept='Engineering'(Contractors))
Step 2: Push projection inside unions:
π_name(σ_dept='Engineering'(Employees)) ∪ π_name(σ_dept='Engineering'(Contractors))
Cost Analysis:
| Step | Tuples Processed | Tuple Width |
|---|---|---|
| Original | 100K + 50K = 150K | Full width |
| After σ push | 10K + 5K = 15K | Full width |
| After π push | 10K + 5K = 15K | Narrow (name only) |
The optimized version processes 10x fewer tuples with narrower tuples—dramatic improvement!
While optimizers apply transformations automatically, you can help by: (1) Writing conditions that can be pushed down, (2) Avoiding unnecessary columns in SELECT, (3) Using appropriate indexes, and (4) Providing statistics through ANALYZE commands.
Set operations don't exist in isolation—they interact with joins, selections, projections, and other operations. Understanding these interactions enables sophisticated query design.
Set Operations and Joins:
| Pattern | Equivalence | Notes |
|---|---|---|
| Join distributes over union | (R ∪ S) ⋈ T ≡ (R ⋈ T) ∪ (S ⋈ T) | Valid when join condition applies to both |
| Join over intersection | (R ∩ S) ⋈ T ⊆ (R ⋈ T) ∩ (S ⋈ T) | Subset, not equivalence |
| Join over difference | (R − S) ⋈ T ⊆ (R ⋈ T) − (S ⋈ T) | Subset, not equivalence |
Warning: Be Careful with Join Distribution
Join distributing over union is valid:
(R ∪ S) ⋈ T = (R ⋈ T) ∪ (S ⋈ T)
But join does NOT fully distribute over intersection or difference! Consider this counterexample for intersection:
R = {(1, 'a')}, S = {(1, 'b')}, T = {(1, 'c')}
-- Left side: (R ∩ S) ⋈ T
R ∩ S = {} (no common tuples)
{} ⋈ T = {}
-- Right side: (R ⋈ T) ∩ (S ⋈ T)
R ⋈ T = {(1, 'a', 'c')}
S ⋈ T = {(1, 'b', 'c')}
(R ⋈ T) ∩ (S ⋈ T) = {} (different tuples)
-- Same result here, but not in general!
Combining Set Operations in Complex Queries:
-- Find employees who have worked on all projects in Department X
-- This is a division problem, but can use set operations:
-- All (employee, project) pairs required:
Required ← π_emp_id(Employees) × π_proj_id(σ_dept='X'(Projects))
-- Actual assignments:
Actual ← π_emp_id,proj_id(Assignments)
-- Missing assignments:
Missing ← Required − Actual
-- Employees with no missing assignments (completed all):
Qualified ← π_emp_id(Employees) − π_emp_id(Missing)
This pattern uses Cartesian product to generate requirements, difference to find gaps, and another difference to find employees without gaps.
The pattern above is how relational division is derived from primitive operations. It's a powerful template for 'for all' queries: finding entities that satisfy a complete set of requirements.
Experienced database practitioners recognize common patterns where set operations provide elegant solutions. Internalizing these patterns accelerates query design.
Pattern 1: OR Decomposition
Complex OR conditions can be decomposed into unions:
-- Instead of:
σ_{condition1 OR condition2 OR condition3}(R)
-- Consider:
σ_condition1(R) ∪ σ_condition2(R) ∪ σ_condition3(R)
Advantage: Each selection can use a different index.
Pattern 2: NOT IN as Difference
-- Find customers with no orders:
π_customer_id(Customers) − π_customer_id(Orders)
-- Equivalent to:
SELECT customer_id FROM Customers WHERE customer_id NOT IN (SELECT customer_id FROM Orders)
Pattern 3: Multi-way AND as Chained Intersection
-- Customers who bought products A, B, AND C:
π_cust_id(σ_product='A'(Purchases))
∩ π_cust_id(σ_product='B'(Purchases))
∩ π_cust_id(σ_product='C'(Purchases))
Pattern 4: Symmetric Difference for Change Detection
-- Records that changed (in old XOR new):
(OldData − NewData) ∪ (NewData − OldData)
-- Added records:
NewData − OldData
-- Deleted records:
OldData − NewData
-- Unchanged records:
OldData ∩ NewData
Pattern 5: Completeness Check via Division
-- Students who completed ALL required courses:
AllRequired ← π_student_id(Students) × π_course_id(RequiredCourses)
Completed ← π_student_id, course_id(Enrollments)
Incomplete ← AllRequired − Completed
CompleteStudents ← π_student_id(Students) − π_student_id(Incomplete)
123456789101112131415161718192021222324252627282930313233343536373839
-- Pattern 1: OR Decomposition for Index Usage-- Instead of OR which may not use indexes:SELECT * FROM Products WHERE category = 'Electronics' OR category = 'Books' OR category = 'Music'; -- Decompose to unions (each can use index):SELECT * FROM Products WHERE category = 'Electronics'UNIONSELECT * FROM Products WHERE category = 'Books'UNIONSELECT * FROM Products WHERE category = 'Music'; -- Pattern 2: NOT EXISTS via EXCEPT-- Customers without orders:SELECT customer_id FROM CustomersEXCEPTSELECT DISTINCT customer_id FROM Orders; -- Pattern 3: Multi-criteria AND via INTERSECT-- Products in ALL warehouse locations:SELECT product_id FROM Inventory WHERE warehouse = 'East'INTERSECTSELECT product_id FROM Inventory WHERE warehouse = 'West'INTERSECTSELECT product_id FROM Inventory WHERE warehouse = 'Central'; -- Pattern 4: Change Detection-- Added records:SELECT * FROM new_data EXCEPT SELECT * FROM old_data; -- Deleted records:SELECT * FROM old_data EXCEPT SELECT * FROM new_data; -- Changed or new (for merge operations):SELECT *, 'ADDED' as change_type FROM (SELECT * FROM new_data EXCEPT SELECT * FROM old_data) addedUNION ALLSELECT *, 'DELETED' as change_type FROM (SELECT * FROM old_data EXCEPT SELECT * FROM new_data) deleted;Build your mental library of set operation patterns. When you see 'all of,' think intersection. When you see 'none of' or 'missing,' think difference. When you see 'any of,' think union. This pattern recognition accelerates query design.
We've comprehensively explored the rules governing set operations in relational algebra. These rules are fundamental to query correctness and optimization. Let's consolidate the essential knowledge:
Congratulations! You've completed Module 3: Set Operations. You now have comprehensive mastery of union, intersection, and difference operations, including their mathematical properties, union compatibility requirements, implementation strategies, and the rules that govern their use. You're equipped to write efficient, correct queries using set operations and understand how query optimizers transform your queries. Continue to the next module to explore Cartesian Product and Rename operations.