Loading learning content...
In relational algebra, the union operation represents one of the most fundamental mechanisms for combining data from multiple relations. Just as in set theory, where the union of two sets contains all elements from both sets, the relational union combines tuples from two relations into a single, unified result.
Understanding the union operation is essential because real-world database scenarios frequently require consolidating data from multiple sources—merging customer lists from different branches, combining transaction records from multiple time periods, or aggregating data from distributed database fragments. The union operation provides the formal, mathematical foundation for these essential operations.
This page will guide you through the complete theory and practice of the union operation, from its mathematical foundations to its practical implementation in modern database systems.
By the end of this page, you will master the formal definition of union, understand union compatibility requirements, be able to construct union expressions using proper notation, visualize union operations through diagrams, and apply union to solve real-world database problems. You'll also understand the subtle differences between relational union and SQL's UNION operations.
The union operation in relational algebra draws directly from set theory, which provides its mathematical foundation. To fully understand relational union, we must first establish its set-theoretic roots and then extend these concepts to the relational model.
Set-Theoretic Definition:
In classical set theory, given two sets A and B, the union A ∪ B is defined as:
A ∪ B = { x | x ∈ A ∨ x ∈ B }
This reads: "The union of A and B is the set of all elements x such that x is a member of A OR x is a member of B (or both)."
The key characteristics of set union that carry over to relational algebra include:
While set theory deals with unstructured sets of elements, relational algebra deals with relations—sets of tuples with defined schemas. This introduces the critical concept of 'union compatibility,' which ensures that combining two relations produces a meaningful result with a well-defined schema.
Relational Union Definition:
Formally, given two relations R(A₁, A₂, ..., Aₙ) and S(B₁, B₂, ..., Bₙ) that are union-compatible, the union R ∪ S is defined as:
R ∪ S = { t | t ∈ R ∨ t ∈ S }
Where:
Critical Requirement: Unlike general set union, relational union requires that both operand relations have compatible schemas. Without this compatibility, the resulting relation would have no well-defined structure, violating the fundamental requirements of the relational model.
| Property | Definition | Significance |
|---|---|---|
| Commutativity | R ∪ S = S ∪ R | Order of operands does not affect the result; query optimizers can freely reorder union operations |
| Associativity | (R ∪ S) ∪ T = R ∪ (S ∪ T) | Grouping of multiple unions does not affect result; enables flexible query restructuring |
| Idempotence | R ∪ R = R | Union of a relation with itself yields the original relation; useful for query simplification |
| Identity | R ∪ ∅ = R | Union with empty relation yields original; empty sets are neutral elements |
| Absorption | R ∪ (R ∩ S) = R | Demonstrates interaction with intersection; used in query optimization |
Union compatibility is the critical precondition that must be satisfied before two relations can participate in a union operation. This requirement ensures that the resulting relation has a coherent, well-defined structure.
Two relations R and S are union-compatible if and only if they satisfy these conditions:
Same Degree: Both relations must have the same number of attributes
Domain Compatibility: Corresponding attributes must have compatible domains
Note that attribute names do not need to be identical—only the number of attributes and their corresponding domains must match.
Domain compatibility does NOT require identical domains. It requires that values from both domains can be meaningfully combined. For example, INTEGER and SMALLINT are typically compatible, as are VARCHAR(50) and VARCHAR(100). However, INTEGER and VARCHAR are NOT compatible, even if both represent the same number of columns.
Result Schema Determination:
When R ∪ S is computed between union-compatible relations, the result schema is typically determined as follows:
Formal Definition with Schema:
Let R be a relation with schema (A₁: D₁, A₂: D₂, ..., Aₙ: Dₙ) Let S be a relation with schema (B₁: E₁, B₂: E₂, ..., Bₙ: Eₙ) Where Dᵢ and Eᵢ are compatible for all i
Then R ∪ S produces a relation with schema (A₁: F₁, A₂: F₂, ..., Aₙ: Fₙ) Where Fᵢ = resolve(Dᵢ, Eᵢ) represents the resolved domain
If you need to union non-compatible relations, you can often achieve compatibility through projection (π). For example, if R has attributes (A, B, C) and S has attributes (X, Y), you can project R onto just two attributes: π_{A,B}(R) ∪ S, assuming the projected attributes are domain-compatible with S's attributes.
The union operation is expressed using various notational systems across different contexts. Understanding these notations is essential for reading academic literature, implementing database systems, and communicating with other database professionals.
Standard Mathematical Notation:
The most common notation uses the union symbol (∪) as an infix operator:
Result ← R ∪ S
This reads: "Result is assigned the union of R and S."
For complex expressions, parentheses clarify precedence:
Result ← (R ∪ S) ∪ T Result ← R ∪ (σ_condition(S))
| System | Notation | Example |
|---|---|---|
| Mathematical | ∪ (infix operator) | Students ∪ Alumni |
| ASCII Text | UNION or U (keyword) | Students UNION Alumni |
| SQL Standard | UNION keyword | SELECT * FROM Students UNION SELECT * FROM Alumni |
| Query Trees | ∪ node with two children | (See diagram below) |
| Algebraic Functions | union(R, S) | union(Students, Alumni) |
| LINQ/Programming | .Union() method | students.Union(alumni) |
Operator Precedence:
In complex relational algebra expressions, understanding operator precedence prevents ambiguity:
| Priority | Operators | Notes |
|---|---|---|
| Highest | σ (selection), π (projection), ρ (rename) | Unary operators bind tightest |
| High | × (Cartesian product), ⋈ (join) | Binary product operators |
| Medium | ∩ (intersection) | Binary set operators |
| Lower | ∪ (union), − (difference) | Binary set operators |
| Lowest | ← (assignment) | Assignment operator |
Examples with Precedence:
σ_salary>50000(Employees) ∪ Managers
This applies selection FIRST (to Employees only), then unions with Managers.
π_name(Students ∪ Alumni)
This computes union FIRST, then projects the name attribute from the result.
(σ_dept='CS'(Employees)) ∪ (σ_dept='IT'(Employees))
Explicit parentheses for clarity, though not strictly required.
12345678910111213141516171819
-- Basic Union ExpressionResult ← Students ∪ Alumni -- Union with Prior SelectionActiveUsers ← σ_status='active'(Users) ∪ σ_verified=true(Users) -- Union with Prior Projection (ensuring compatibility)AllNames ← π_name, email(Customers) ∪ π_name, email(Suppliers) -- Multi-way Union (associativity allows any grouping)AllPeople ← Students ∪ Staff ∪ Alumni ∪ Visitors -- Union in Complex ExpressionSeniorEmployees ← (σ_years>10(Employees)) ∪ (σ_title='Manager'(Employees)) -- Nested OperationsResult ← π_ProductID, Price(σ_Category='Electronics'(Products)) ∪ π_ProductID, Price(σ_Category='Computers'(Products))Visualizing the union operation helps build intuition for how tuples are combined. We'll explore multiple visual representations: Venn diagrams, table-based examples, and query trees.
Concrete Table Example:
Consider two relations representing employees from different departments:
| EmpID | Name | Salary |
|---|---|---|
| E001 | Alice Chen | 85000 |
| E002 | Bob Smith | 92000 |
| E003 | Carol Davis | 78000 |
| EmpID | Name | Salary |
|---|---|---|
| E003 | Carol Davis | 78000 |
| E004 | David Lee | 89000 |
| E005 | Eve Wilson | 95000 |
| EmpID | Name | Salary |
|---|---|---|
| E001 | Alice Chen | 85000 |
| E002 | Bob Smith | 92000 |
| E003 | Carol Davis | 78000 |
| E004 | David Lee | 89000 |
| E005 | Eve Wilson | 95000 |
Notice that Carol Davis (E003) appears in both input relations but only once in the result. This is the fundamental distinction of set-based union: duplicate elimination. In SQL, use UNION (not UNION ALL) to achieve this behavior.
Query Tree Representation:
In query processing, union operations are represented as binary tree nodes with two child relations:
Understanding the cardinality (number of tuples) of union results is essential for query optimization and resource planning. The union operation's output size depends on the overlap between the input relations.
Cardinality Bounds:
For R ∪ S where |R| denotes the cardinality of R:
max(|R|, |S|) ≤ |R ∪ S| ≤ |R| + |S|
Lower Bound: If all tuples of the smaller relation are duplicates of tuples in the larger, the result equals the larger relation.
Upper Bound: If there is no overlap (no common tuples), the result contains all tuples from both relations.
| |R| | |S| | Overlap | |R ∪ S| | Scenario |
|---|---|---|---|---|
| 1000 | 500 | 0 | 1500 | Disjoint relations (maximum result) |
| 1000 | 500 | 500 | 1000 | S ⊆ R (complete overlap of smaller) |
| 1000 | 1000 | 1000 | 1000 | R = S (identical relations) |
| 1000 | 500 | 200 | 1300 | Partial overlap (typical case) |
| 0 | 500 | 0 | 500 | One empty relation |
Query optimizers estimate union cardinality using statistics like histograms and distinct value counts. For independent relations (no foreign key relationship), optimizers often assume a default overlap percentage. Accurate statistics lead to better query plans.
Performance Considerations:
Time Complexity:
The time complexity of computing R ∪ S depends on the duplicate elimination strategy:
| Strategy | Time Complexity | Space Complexity | Best For |
|---|---|---|---|
| Sort-based | O(( | R | + |
| Hash-based | O( | R | + |
| Index-based | O( | S | × log |
Implementation Approaches:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071
def hash_based_union(R: List[Tuple], S: List[Tuple]) -> List[Tuple]: """ Hash-based union algorithm. Time: O(|R| + |S|) expected Space: O(|R| + |S|) for result, O(min(|R|,|S|)) for hash table """ # Use set for O(1) membership testing result_set = set() result = [] # Process first relation for tuple_r in R: if tuple_r not in result_set: result_set.add(tuple_r) result.append(tuple_r) # Process second relation for tuple_s in S: if tuple_s not in result_set: result_set.add(tuple_s) result.append(tuple_s) return result def sort_merge_union(R: List[Tuple], S: List[Tuple]) -> List[Tuple]: """ Sort-merge union algorithm. Time: O((|R| + |S|) log(|R| + |S|)) for sorting Space: O(1) additional after sorting """ # Sort both relations sorted_R = sorted(R) sorted_S = sorted(S) result = [] i, j = 0, 0 last_added = None while i < len(sorted_R) and j < len(sorted_S): if sorted_R[i] < sorted_S[j]: if sorted_R[i] != last_added: result.append(sorted_R[i]) last_added = sorted_R[i] i += 1 elif sorted_R[i] > sorted_S[j]: if sorted_S[j] != last_added: result.append(sorted_S[j]) last_added = sorted_S[j] j += 1 else: # Equal tuples if sorted_R[i] != last_added: result.append(sorted_R[i]) last_added = sorted_R[i] i += 1 j += 1 # Process remaining tuples while i < len(sorted_R): if sorted_R[i] != last_added: result.append(sorted_R[i]) last_added = sorted_R[i] i += 1 while j < len(sorted_S): if sorted_S[j] != last_added: result.append(sorted_S[j]) last_added = sorted_S[j] j += 1 return resultThe union operation finds extensive use in real-world database scenarios. Understanding these applications helps you recognize when union is the right tool for your query requirements.
Common Use Cases:
Case Study: E-Commerce Platform
Consider an e-commerce platform that maintains separate tables for different product categories due to their distinct attribute requirements, but needs to display a unified search result:
-- Products are stored in category-specific tables
Electronics(product_id, name, price, warranty_months)
Clothing(product_id, name, price, size_chart)
Books(product_id, name, price, isbn)
-- Customer searches for products under $50
-- Need to combine results from all categories
-- Using relational algebra:
SearchResult ← π_product_id,name,price(σ_price<50(Electronics))
∪ π_product_id,name,price(σ_price<50(Clothing))
∪ π_product_id,name,price(σ_price<50(Books))
When unioning relations with different schemas, use projection (π) to extract compatible subsets of attributes. This is a common pattern when combining data from structurally different but conceptually related sources.
Application: Implementing OR Logic
The union operation provides an alternative way to express OR conditions in queries. This transformation is significant for query optimization:
-- Original query with OR:
σ_{status='active' OR priority='high'}(Tasks)
-- Equivalent using Union:
σ_{status='active'}(Tasks) ∪ σ_{priority='high'}(Tasks)
Why would we use the union form?
status and priority, the union form can use both; the OR form might only use one or neitherThe OR-to-Union transformation preserves semantic equivalence only because union eliminates duplicates. A tuple satisfying BOTH conditions would appear twice in both selections, but union removes the duplicate—exactly matching OR behavior.
While SQL's UNION operation is based on relational algebra's union, there are important differences and variations that database practitioners must understand.
SQL UNION Variants:
| SQL Operator | Behavior | Relational Algebra Equivalent |
|---|---|---|
| UNION | Eliminates duplicates | R ∪ S (exact equivalent) |
| UNION ALL | Retains all duplicates | Not directly in classic RA |
| UNION DISTINCT | Same as UNION | R ∪ S |
UNION ALL: Breaking Set Semantics
SQL's UNION ALL is a significant departure from pure relational algebra. It treats relations as bags (multisets) rather than sets, preserving duplicate counts:
-- With these relations:
-- R: {a, b, c, c}
-- S: {c, c, d}
SELECT * FROM R UNION SELECT * FROM S;
-- Result: {a, b, c, d} -- 4 distinct tuples
SELECT * FROM R UNION ALL SELECT * FROM S;
-- Result: {a, b, c, c, c, c, d} -- 7 tuples total
| Aspect | UNION | UNION ALL |
|---|---|---|
| Duplicate Handling | Eliminates duplicates | Preserves all duplicates |
| Mathematical Model | Set semantics | Bag (multiset) semantics |
| Performance | Slower (requires deduplication) | Faster (no deduplication) |
| Result Size | |R ∪ S| (set cardinality) | |R| + |S| (always) |
| Sorting Required | Often yes (for dedup) | No |
| Use Case | When duplicates are unwanted | When all rows matter (e.g., aggregates) |
UNION ALL is significantly faster than UNION because it skips the expensive deduplication step. In data warehousing and ETL operations, UNION ALL is often preferred when duplicates are known to be absent or when they are acceptable. Always use UNION ALL when you're certain there are no duplicates to avoid unnecessary overhead.
12345678910111213141516171819202122232425262728293031323334353637383940
-- Standard UNION (eliminates duplicates)-- Use when you need distinct combined resultsSELECT customer_id, name, email FROM US_CustomersUNIONSELECT customer_id, name, email FROM EU_Customers; -- UNION ALL (preserves duplicates)-- Use for performance when duplicates are acceptable or impossibleSELECT order_id, product_id, quantity FROM Q1_OrdersUNION ALLSELECT order_id, product_id, quantity FROM Q2_OrdersUNION ALLSELECT order_id, product_id, quantity FROM Q3_OrdersUNION ALLSELECT order_id, product_id, quantity FROM Q4_Orders; -- Column aliasing: Column names come from FIRST querySELECT product_id AS id, name AS item_name, price AS cost FROM ElectronicsUNIONSELECT product_id, name, price FROM Clothing; -- Column names: id, item_name, cost -- ORDER BY applies to final resultSELECT name, salary FROM EngineersUNIONSELECT name, salary FROM ManagersORDER BY salary DESC; -- Orders the combined result -- Common mistake: ORDER BY in individual queries-- This is INVALID in most databases:-- SELECT name FROM A ORDER BY name-- UNION-- SELECT name FROM B; -- ERROR: ORDER BY not allowed here -- Correct approach: Subqueries with ORDER BY need wrappingSELECT * FROM ( SELECT name, 'Engineer' AS role FROM Engineers UNION ALL SELECT name, 'Manager' AS role FROM Managers) AS combinedORDER BY role, name;We've conducted a comprehensive exploration of the union operation in relational algebra. Let's consolidate the essential knowledge:
You now have a thorough understanding of the union operation in relational algebra. You can identify union-compatible relations, write union expressions in proper notation, analyze cardinality and performance, and apply union to real-world query scenarios. Next, we'll explore the intersection operation (∩), which identifies tuples common to both relations.