Database Management SystemsRelational Algebra

Set Operations in Relational Algebra

LevelIntermediate

Duration90 mins

TopicRelational Algebra

1 / 5

Union (∪) Operation

Combining Relations: The Power of Union

In relational algebra, the union operation represents one of the most fundamental mechanisms for combining data from multiple relations. Just as in set theory, where the union of two sets contains all elements from both sets, the relational union combines tuples from two relations into a single, unified result.

Understanding the union operation is essential because real-world database scenarios frequently require consolidating data from multiple sources—merging customer lists from different branches, combining transaction records from multiple time periods, or aggregating data from distributed database fragments. The union operation provides the formal, mathematical foundation for these essential operations.

This page will guide you through the complete theory and practice of the union operation, from its mathematical foundations to its practical implementation in modern database systems.

What You Will Learn

By the end of this page, you will master the formal definition of union, understand union compatibility requirements, be able to construct union expressions using proper notation, visualize union operations through diagrams, and apply union to solve real-world database problems. You'll also understand the subtle differences between relational union and SQL's UNION operations.

Mathematical Foundations of Union

The union operation in relational algebra draws directly from set theory, which provides its mathematical foundation. To fully understand relational union, we must first establish its set-theoretic roots and then extend these concepts to the relational model.

Set-Theoretic Definition:

In classical set theory, given two sets A and B, the union A ∪ B is defined as:

A ∪ B = { x | x ∈ A ∨ x ∈ B }

This reads: "The union of A and B is the set of all elements x such that x is a member of A OR x is a member of B (or both)."

The key characteristics of set union that carry over to relational algebra include:

Inclusion of All Elements: Every element from both sets appears in the result
Elimination of Duplicates: Each distinct element appears exactly once
Commutativity: A ∪ B = B ∪ A
Associativity: (A ∪ B) ∪ C = A ∪ (B ∪ C)

From Sets to Relations

While set theory deals with unstructured sets of elements, relational algebra deals with relations—sets of tuples with defined schemas. This introduces the critical concept of 'union compatibility,' which ensures that combining two relations produces a meaningful result with a well-defined schema.

Relational Union Definition:

Formally, given two relations R(A₁, A₂, ..., Aₙ) and S(B₁, B₂, ..., Bₙ) that are union-compatible, the union R ∪ S is defined as:

R ∪ S = { t | t ∈ R ∨ t ∈ S }

Where:

t represents a tuple
The result contains all tuples that appear in either R, or S, or both
Duplicate tuples are eliminated (each distinct tuple appears once)
The result schema is derived from the operand schemas

Critical Requirement: Unlike general set union, relational union requires that both operand relations have compatible schemas. Without this compatibility, the resulting relation would have no well-defined structure, violating the fundamental requirements of the relational model.

Mathematical Properties of Relational Union
Property	Definition	Significance
Commutativity	R ∪ S = S ∪ R	Order of operands does not affect the result; query optimizers can freely reorder union operations
Associativity	(R ∪ S) ∪ T = R ∪ (S ∪ T)	Grouping of multiple unions does not affect result; enables flexible query restructuring
Idempotence	R ∪ R = R	Union of a relation with itself yields the original relation; useful for query simplification
Identity	R ∪ ∅ = R	Union with empty relation yields original; empty sets are neutral elements
Absorption	R ∪ (R ∩ S) = R	Demonstrates interaction with intersection; used in query optimization

Union Compatibility: The Essential Prerequisite

Union compatibility is the critical precondition that must be satisfied before two relations can participate in a union operation. This requirement ensures that the resulting relation has a coherent, well-defined structure.

Two relations R and S are union-compatible if and only if they satisfy these conditions:

Same Degree: Both relations must have the same number of attributes
- degree(R) = degree(S) = n
Domain Compatibility: Corresponding attributes must have compatible domains
- For each i from 1 to n: dom(Aᵢ) is compatible with dom(Bᵢ)

Note that attribute names do not need to be identical—only the number of attributes and their corresponding domains must match.

Domain Compatibility vs. Exact Match

Domain compatibility does NOT require identical domains. It requires that values from both domains can be meaningfully combined. For example, INTEGER and SMALLINT are typically compatible, as are VARCHAR(50) and VARCHAR(100). However, INTEGER and VARCHAR are NOT compatible, even if both represent the same number of columns.

Union-Compatible Relations

•Students(id: INT, name: VARCHAR, gpa: DECIMAL) and Alumni(code: INT, fullname: VARCHAR, final_gpa: DECIMAL)
•US_Customers(cid: INT, email: VARCHAR) and EU_Customers(customer_no: INT, contact: VARCHAR)
•January_Sales(product_id: INT, quantity: INT, revenue: DECIMAL) and February_Sales(item_id: INT, units: INT, total: DECIMAL)

NOT Union-Compatible

•Employees(id, name, salary) and Departments(dept_id, dept_name) — Different number of attributes
•Products(id: INT, name: VARCHAR, price: DECIMAL) and Reviews(id: INT, rating: VARCHAR, date: DATE) — Incompatible domains for attributes 2 and 3
•Orders(order_id, customer_id, total, date) and Items(item_id, name, price) — Different degree (4 vs 3)

Result Schema Determination:

When R ∪ S is computed between union-compatible relations, the result schema is typically determined as follows:

Attribute Names: Usually inherited from the first (left) operand
Domain Resolution: When domains differ but are compatible, the result domain is typically the "wider" or more general type
Constraint Preservation: Constraints from neither operand are automatically preserved in the result

Formal Definition with Schema:

Let R be a relation with schema (A₁: D₁, A₂: D₂, ..., Aₙ: Dₙ) Let S be a relation with schema (B₁: E₁, B₂: E₂, ..., Bₙ: Eₙ) Where Dᵢ and Eᵢ are compatible for all i

Then R ∪ S produces a relation with schema (A₁: F₁, A₂: F₂, ..., Aₙ: Fₙ) Where Fᵢ = resolve(Dᵢ, Eᵢ) represents the resolved domain

Making Non-Compatible Relations Compatible

If you need to union non-compatible relations, you can often achieve compatibility through projection (π). For example, if R has attributes (A, B, C) and S has attributes (X, Y), you can project R onto just two attributes: π_{A,B}(R) ∪ S, assuming the projected attributes are domain-compatible with S's attributes.

Notation and Syntax Systems

The union operation is expressed using various notational systems across different contexts. Understanding these notations is essential for reading academic literature, implementing database systems, and communicating with other database professionals.

Standard Mathematical Notation:

The most common notation uses the union symbol (∪) as an infix operator:

Result ← R ∪ S

This reads: "Result is assigned the union of R and S."

For complex expressions, parentheses clarify precedence:

Result ← (R ∪ S) ∪ T Result ← R ∪ (σ_condition(S))

Union Notation Across Different Systems
System	Notation	Example
Mathematical	∪ (infix operator)	Students ∪ Alumni
ASCII Text	UNION or U (keyword)	Students UNION Alumni
SQL Standard	UNION keyword	SELECT * FROM Students UNION SELECT * FROM Alumni
Query Trees	∪ node with two children	(See diagram below)
Algebraic Functions	union(R, S)	union(Students, Alumni)
LINQ/Programming	.Union() method	students.Union(alumni)

Operator Precedence:

In complex relational algebra expressions, understanding operator precedence prevents ambiguity:

Priority	Operators	Notes
Highest	σ (selection), π (projection), ρ (rename)	Unary operators bind tightest
High	× (Cartesian product), ⋈ (join)	Binary product operators
Medium	∩ (intersection)	Binary set operators
Lower	∪ (union), − (difference)	Binary set operators
Lowest	← (assignment)	Assignment operator

Examples with Precedence:

σ_salary>50000(Employees) ∪ Managers

This applies selection FIRST (to Employees only), then unions with Managers.

π_name(Students ∪ Alumni)

This computes union FIRST, then projects the name attribute from the result.

(σ_dept='CS'(Employees)) ∪ (σ_dept='IT'(Employees))

Explicit parentheses for clarity, though not strictly required.

union_notation_examples.txt

-- Basic Union Expression
Result ← Students ∪ Alumni
 
-- Union with Prior Selection
ActiveUsers ← σ_status='active'(Users) ∪ σ_verified=true(Users)
 
-- Union with Prior Projection (ensuring compatibility)
AllNames ← π_name, email(Customers) ∪ π_name, email(Suppliers)
 
-- Multi-way Union (associativity allows any grouping)
AllPeople ← Students ∪ Staff ∪ Alumni ∪ Visitors
 
-- Union in Complex Expression
SeniorEmployees ← (σ_years>10(Employees)) ∪ (σ_title='Manager'(Employees))
 
-- Nested Operations
Result ← π_ProductID, Price(σ_Category='Electronics'(Products))
         ∪ 
         π_ProductID, Price(σ_Category='Computers'(Products))

Visual Representation of Union

Visualizing the union operation helps build intuition for how tuples are combined. We'll explore multiple visual representations: Venn diagrams, table-based examples, and query trees.

Converting Mermaid diagram...

Concrete Table Example:

Consider two relations representing employees from different departments:

Relation R: Engineering Department (Engineering_Staff)
EmpID	Name	Salary
E001	Alice Chen	85000
E002	Bob Smith	92000
E003	Carol Davis	78000

Relation S: Research Department (Research_Staff)
EmpID	Name	Salary
E003	Carol Davis	78000
E004	David Lee	89000
E005	Eve Wilson	95000

Result: Engineering_Staff ∪ Research_Staff
EmpID	Name	Salary
E001	Alice Chen	85000
E002	Bob Smith	92000
E003	Carol Davis	78000
E004	David Lee	89000
E005	Eve Wilson	95000

Duplicate Elimination in Action

Notice that Carol Davis (E003) appears in both input relations but only once in the result. This is the fundamental distinction of set-based union: duplicate elimination. In SQL, use UNION (not UNION ALL) to achieve this behavior.

Query Tree Representation:

In query processing, union operations are represented as binary tree nodes with two child relations:

Converting Mermaid diagram...

Cardinality Analysis and Performance

Understanding the cardinality (number of tuples) of union results is essential for query optimization and resource planning. The union operation's output size depends on the overlap between the input relations.

Cardinality Bounds:

For R ∪ S where |R| denotes the cardinality of R:

max(|R|, |S|) ≤ |R ∪ S| ≤ |R| + |S|

Lower Bound: If all tuples of the smaller relation are duplicates of tuples in the larger, the result equals the larger relation.

Upper Bound: If there is no overlap (no common tuples), the result contains all tuples from both relations.

Cardinality Scenarios for Union
\|R\|	\|S\|	Overlap	\|R ∪ S\|	Scenario
1000	500	0	1500	Disjoint relations (maximum result)
1000	500	500	1000	S ⊆ R (complete overlap of smaller)
1000	1000	1000	1000	R = S (identical relations)
1000	500	200	1300	Partial overlap (typical case)
0	500	0	500	One empty relation

Selectivity Estimation

Query optimizers estimate union cardinality using statistics like histograms and distinct value counts. For independent relations (no foreign key relationship), optimizers often assume a default overlap percentage. Accurate statistics lead to better query plans.

Performance Considerations:

Time Complexity:

The time complexity of computing R ∪ S depends on the duplicate elimination strategy:

Strategy	Time Complexity	Space Complexity	Best For
Sort-based	O((	R	+
Hash-based	O(	R	+
Index-based	O(	S	× log

Implementation Approaches:

Sort-Merge Union: Sort both relations, then merge while eliminating duplicates
Hash Union: Build hash table from smaller relation, probe with larger, output unique tuples
External Sort Union: For relations exceeding memory, use external sorting algorithms

union_algorithm.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
def hash_based_union(R: List[Tuple], S: List[Tuple]) -> List[Tuple]:
    """
    Hash-based union algorithm.
    Time: O(|R| + |S|) expected
    Space: O(|R| + |S|) for result, O(min(|R|,|S|)) for hash table
    """
    # Use set for O(1) membership testing
    result_set = set()
    result = []
    
    # Process first relation
    for tuple_r in R:
        if tuple_r not in result_set:
            result_set.add(tuple_r)
            result.append(tuple_r)
    
    # Process second relation
    for tuple_s in S:
        if tuple_s not in result_set:
            result_set.add(tuple_s)
            result.append(tuple_s)
    
    return result
 
 
def sort_merge_union(R: List[Tuple], S: List[Tuple]) -> List[Tuple]:
    """
    Sort-merge union algorithm.
    Time: O((|R| + |S|) log(|R| + |S|)) for sorting
    Space: O(1) additional after sorting
    """
    # Sort both relations
    sorted_R = sorted(R)
    sorted_S = sorted(S)
    
    result = []
    i, j = 0, 0
    last_added = None
    
    while i < len(sorted_R) and j < len(sorted_S):
        if sorted_R[i] < sorted_S[j]:
            if sorted_R[i] != last_added:
                result.append(sorted_R[i])
                last_added = sorted_R[i]
            i += 1
        elif sorted_R[i] > sorted_S[j]:
            if sorted_S[j] != last_added:
                result.append(sorted_S[j])
                last_added = sorted_S[j]
            j += 1
        else:  # Equal tuples
            if sorted_R[i] != last_added:
                result.append(sorted_R[i])
                last_added = sorted_R[i]
            i += 1
            j += 1
    
    # Process remaining tuples
    while i < len(sorted_R):
        if sorted_R[i] != last_added:
            result.append(sorted_R[i])
            last_added = sorted_R[i]
        i += 1
    
    while j < len(sorted_S):
        if sorted_S[j] != last_added:
            result.append(sorted_S[j])
            last_added = sorted_S[j]
        j += 1
    
    return result

Practical Applications of Union

The union operation finds extensive use in real-world database scenarios. Understanding these applications helps you recognize when union is the right tool for your query requirements.

Common Use Cases:

Real-World Union Applications

•Data Consolidation: Merging customer records from multiple regional databases into a unified view
•Time-Based Aggregation: Combining sales data from different months or quarters for analysis
•Multi-Source Queries: Querying both active and archived records in a single operation
•OR Conditions Optimization: Decomposing complex OR conditions into separate selections unified via union
•Distributed Database Queries: Combining results from horizontally partitioned tables
•Report Generation: Creating comprehensive reports that include data from multiple categories

Case Study: E-Commerce Platform

Consider an e-commerce platform that maintains separate tables for different product categories due to their distinct attribute requirements, but needs to display a unified search result:

-- Products are stored in category-specific tables
Electronics(product_id, name, price, warranty_months)
Clothing(product_id, name, price, size_chart)
Books(product_id, name, price, isbn)

-- Customer searches for products under $50
-- Need to combine results from all categories

-- Using relational algebra:
SearchResult ← π_product_id,name,price(σ_price<50(Electronics)) 
              ∪ π_product_id,name,price(σ_price<50(Clothing))
              ∪ π_product_id,name,price(σ_price<50(Books))

Projection Before Union

When unioning relations with different schemas, use projection (π) to extract compatible subsets of attributes. This is a common pattern when combining data from structurally different but conceptually related sources.

Application: Implementing OR Logic

The union operation provides an alternative way to express OR conditions in queries. This transformation is significant for query optimization:

-- Original query with OR:
σ_{status='active' OR priority='high'}(Tasks)

-- Equivalent using Union:
σ_{status='active'}(Tasks) ∪ σ_{priority='high'}(Tasks)

Why would we use the union form?

Index Utilization: If separate indexes exist on status and priority, the union form can use both; the OR form might only use one or neither
Parallel Execution: The two selections can be executed in parallel, then unioned
Query Plan Flexibility: Optimizers have more rearrangement options with union form

Semantic Equivalence Caveat

The OR-to-Union transformation preserves semantic equivalence only because union eliminates duplicates. A tuple satisfying BOTH conditions would appear twice in both selections, but union removes the duplicate—exactly matching OR behavior.

Relational Union vs. SQL UNION

While SQL's UNION operation is based on relational algebra's union, there are important differences and variations that database practitioners must understand.

SQL UNION Variants:

SQL Operator	Behavior	Relational Algebra Equivalent
UNION	Eliminates duplicates	R ∪ S (exact equivalent)
UNION ALL	Retains all duplicates	Not directly in classic RA
UNION DISTINCT	Same as UNION	R ∪ S

UNION ALL: Breaking Set Semantics

SQL's UNION ALL is a significant departure from pure relational algebra. It treats relations as bags (multisets) rather than sets, preserving duplicate counts:

-- With these relations:
-- R: {a, b, c, c}
-- S: {c, c, d}

SELECT * FROM R UNION SELECT * FROM S;
-- Result: {a, b, c, d}  -- 4 distinct tuples

SELECT * FROM R UNION ALL SELECT * FROM S;
-- Result: {a, b, c, c, c, c, d}  -- 7 tuples total

UNION vs. UNION ALL Comparison
Aspect	UNION	UNION ALL
Duplicate Handling	Eliminates duplicates	Preserves all duplicates
Mathematical Model	Set semantics	Bag (multiset) semantics
Performance	Slower (requires deduplication)	Faster (no deduplication)
Result Size	\|R ∪ S\| (set cardinality)	\|R\| + \|S\| (always)
Sorting Required	Often yes (for dedup)	No
Use Case	When duplicates are unwanted	When all rows matter (e.g., aggregates)

Performance Implications

UNION ALL is significantly faster than UNION because it skips the expensive deduplication step. In data warehousing and ETL operations, UNION ALL is often preferred when duplicates are known to be absent or when they are acceptable. Always use UNION ALL when you're certain there are no duplicates to avoid unnecessary overhead.

sql_union_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Standard UNION (eliminates duplicates)
-- Use when you need distinct combined results
SELECT customer_id, name, email FROM US_Customers
UNION
SELECT customer_id, name, email FROM EU_Customers;
 
-- UNION ALL (preserves duplicates)
-- Use for performance when duplicates are acceptable or impossible
SELECT order_id, product_id, quantity FROM Q1_Orders
UNION ALL
SELECT order_id, product_id, quantity FROM Q2_Orders
UNION ALL
SELECT order_id, product_id, quantity FROM Q3_Orders
UNION ALL
SELECT order_id, product_id, quantity FROM Q4_Orders;
 
-- Column aliasing: Column names come from FIRST query
SELECT product_id AS id, name AS item_name, price AS cost FROM Electronics
UNION
SELECT product_id, name, price FROM Clothing;  -- Column names: id, item_name, cost
 
-- ORDER BY applies to final result
SELECT name, salary FROM Engineers
UNION
SELECT name, salary FROM Managers
ORDER BY salary DESC;  -- Orders the combined result
 
-- Common mistake: ORDER BY in individual queries
-- This is INVALID in most databases:
-- SELECT name FROM A ORDER BY name
-- UNION
-- SELECT name FROM B;  -- ERROR: ORDER BY not allowed here
 
-- Correct approach: Subqueries with ORDER BY need wrapping
SELECT * FROM (
    SELECT name, 'Engineer' AS role FROM Engineers
    UNION ALL
    SELECT name, 'Manager' AS role FROM Managers
) AS combined
ORDER BY role, name;

Summary: Mastering the Union Operation

We've conducted a comprehensive exploration of the union operation in relational algebra. Let's consolidate the essential knowledge:

Key Takeaways

•Definition: R ∪ S produces a relation containing all tuples that appear in R, S, or both, with duplicates eliminated
•Union Compatibility: Operands must have the same degree (number of attributes) and compatible domains for corresponding attributes
•Set Properties: Union is commutative (R ∪ S = S ∪ R), associative ((R ∪ S) ∪ T = R ∪ (S ∪ T)), and idempotent (R ∪ R = R)
•Cardinality: max(|R|, |S|) ≤ |R ∪ S| ≤ |R| + |S|; actual result depends on tuple overlap
•Implementation: Hash-based union offers O(n) expected time; sort-merge provides O(n log n) with minimal space
•SQL Mapping: UNION eliminates duplicates (matching RA); UNION ALL preserves them (bag semantics, faster)
•Query Optimization: Union can replace OR conditions to enable better index utilization and parallel execution

Page Complete

You now have a thorough understanding of the union operation in relational algebra. You can identify union-compatible relations, write union expressions in proper notation, analyze cardinality and performance, and apply union to real-world query scenarios. Next, we'll explore the intersection operation (∩), which identifies tuples common to both relations.

1 / 5

Loading learning content...

Database Management SystemsRelational Algebra

Set Operations in Relational Algebra

LevelIntermediate

Duration90 mins

TopicRelational Algebra

1 / 5

Union (∪) Operation

Combining Relations: The Power of Union

This page will guide you through the complete theory and practice of the union operation, from its mathematical foundations to its practical implementation in modern database systems.

What You Will Learn

Mathematical Foundations of Union

Set-Theoretic Definition:

In classical set theory, given two sets A and B, the union A ∪ B is defined as:

A ∪ B = { x | x ∈ A ∨ x ∈ B }

This reads: "The union of A and B is the set of all elements x such that x is a member of A OR x is a member of B (or both)."

The key characteristics of set union that carry over to relational algebra include:

Inclusion of All Elements: Every element from both sets appears in the result
Elimination of Duplicates: Each distinct element appears exactly once
Commutativity: A ∪ B = B ∪ A
Associativity: (A ∪ B) ∪ C = A ∪ (B ∪ C)

From Sets to Relations

Relational Union Definition:

Formally, given two relations R(A₁, A₂, ..., Aₙ) and S(B₁, B₂, ..., Bₙ) that are union-compatible, the union R ∪ S is defined as:

R ∪ S = { t | t ∈ R ∨ t ∈ S }

Where:

t represents a tuple
The result contains all tuples that appear in either R, or S, or both
Duplicate tuples are eliminated (each distinct tuple appears once)
The result schema is derived from the operand schemas

Mathematical Properties of Relational Union
Property	Definition	Significance
Commutativity	R ∪ S = S ∪ R	Order of operands does not affect the result; query optimizers can freely reorder union operations
Associativity	(R ∪ S) ∪ T = R ∪ (S ∪ T)	Grouping of multiple unions does not affect result; enables flexible query restructuring
Idempotence	R ∪ R = R	Union of a relation with itself yields the original relation; useful for query simplification
Identity	R ∪ ∅ = R	Union with empty relation yields original; empty sets are neutral elements
Absorption	R ∪ (R ∩ S) = R	Demonstrates interaction with intersection; used in query optimization

Union Compatibility: The Essential Prerequisite

Two relations R and S are union-compatible if and only if they satisfy these conditions:

Same Degree: Both relations must have the same number of attributes
- degree(R) = degree(S) = n
Domain Compatibility: Corresponding attributes must have compatible domains
- For each i from 1 to n: dom(Aᵢ) is compatible with dom(Bᵢ)

Note that attribute names do not need to be identical—only the number of attributes and their corresponding domains must match.

Domain Compatibility vs. Exact Match

Union-Compatible Relations

•Students(id: INT, name: VARCHAR, gpa: DECIMAL) and Alumni(code: INT, fullname: VARCHAR, final_gpa: DECIMAL)
•US_Customers(cid: INT, email: VARCHAR) and EU_Customers(customer_no: INT, contact: VARCHAR)
•January_Sales(product_id: INT, quantity: INT, revenue: DECIMAL) and February_Sales(item_id: INT, units: INT, total: DECIMAL)

NOT Union-Compatible

•Employees(id, name, salary) and Departments(dept_id, dept_name) — Different number of attributes
•Products(id: INT, name: VARCHAR, price: DECIMAL) and Reviews(id: INT, rating: VARCHAR, date: DATE) — Incompatible domains for attributes 2 and 3
•Orders(order_id, customer_id, total, date) and Items(item_id, name, price) — Different degree (4 vs 3)

Result Schema Determination:

When R ∪ S is computed between union-compatible relations, the result schema is typically determined as follows:

Attribute Names: Usually inherited from the first (left) operand
Domain Resolution: When domains differ but are compatible, the result domain is typically the "wider" or more general type
Constraint Preservation: Constraints from neither operand are automatically preserved in the result

Formal Definition with Schema:

Let R be a relation with schema (A₁: D₁, A₂: D₂, ..., Aₙ: Dₙ) Let S be a relation with schema (B₁: E₁, B₂: E₂, ..., Bₙ: Eₙ) Where Dᵢ and Eᵢ are compatible for all i

Then R ∪ S produces a relation with schema (A₁: F₁, A₂: F₂, ..., Aₙ: Fₙ) Where Fᵢ = resolve(Dᵢ, Eᵢ) represents the resolved domain

Making Non-Compatible Relations Compatible

Notation and Syntax Systems

Standard Mathematical Notation:

The most common notation uses the union symbol (∪) as an infix operator:

Result ← R ∪ S

This reads: "Result is assigned the union of R and S."

For complex expressions, parentheses clarify precedence:

Result ← (R ∪ S) ∪ T Result ← R ∪ (σ_condition(S))

Union Notation Across Different Systems
System	Notation	Example
Mathematical	∪ (infix operator)	Students ∪ Alumni
ASCII Text	UNION or U (keyword)	Students UNION Alumni
SQL Standard	UNION keyword	SELECT * FROM Students UNION SELECT * FROM Alumni
Query Trees	∪ node with two children	(See diagram below)
Algebraic Functions	union(R, S)	union(Students, Alumni)
LINQ/Programming	.Union() method	students.Union(alumni)

Operator Precedence:

In complex relational algebra expressions, understanding operator precedence prevents ambiguity:

Priority	Operators	Notes
Highest	σ (selection), π (projection), ρ (rename)	Unary operators bind tightest
High	× (Cartesian product), ⋈ (join)	Binary product operators
Medium	∩ (intersection)	Binary set operators
Lower	∪ (union), − (difference)	Binary set operators
Lowest	← (assignment)	Assignment operator

Examples with Precedence:

σ_salary>50000(Employees) ∪ Managers

This applies selection FIRST (to Employees only), then unions with Managers.

π_name(Students ∪ Alumni)

This computes union FIRST, then projects the name attribute from the result.

(σ_dept='CS'(Employees)) ∪ (σ_dept='IT'(Employees))

Explicit parentheses for clarity, though not strictly required.

union_notation_examples.txt

-- Basic Union Expression
Result ← Students ∪ Alumni
 
-- Union with Prior Selection
ActiveUsers ← σ_status='active'(Users) ∪ σ_verified=true(Users)
 
-- Union with Prior Projection (ensuring compatibility)
AllNames ← π_name, email(Customers) ∪ π_name, email(Suppliers)
 
-- Multi-way Union (associativity allows any grouping)
AllPeople ← Students ∪ Staff ∪ Alumni ∪ Visitors
 
-- Union in Complex Expression
SeniorEmployees ← (σ_years>10(Employees)) ∪ (σ_title='Manager'(Employees))
 
-- Nested Operations
Result ← π_ProductID, Price(σ_Category='Electronics'(Products))
         ∪ 
         π_ProductID, Price(σ_Category='Computers'(Products))

Visual Representation of Union

Visualizing the union operation helps build intuition for how tuples are combined. We'll explore multiple visual representations: Venn diagrams, table-based examples, and query trees.

Converting Mermaid diagram...

Concrete Table Example:

Consider two relations representing employees from different departments:

Relation R: Engineering Department (Engineering_Staff)
EmpID	Name	Salary
E001	Alice Chen	85000
E002	Bob Smith	92000
E003	Carol Davis	78000

Relation S: Research Department (Research_Staff)
EmpID	Name	Salary
E003	Carol Davis	78000
E004	David Lee	89000
E005	Eve Wilson	95000

Result: Engineering_Staff ∪ Research_Staff
EmpID	Name	Salary
E001	Alice Chen	85000
E002	Bob Smith	92000
E003	Carol Davis	78000
E004	David Lee	89000
E005	Eve Wilson	95000

Duplicate Elimination in Action

Query Tree Representation:

In query processing, union operations are represented as binary tree nodes with two child relations:

Converting Mermaid diagram...

Cardinality Analysis and Performance

Cardinality Bounds:

For R ∪ S where |R| denotes the cardinality of R:

max(|R|, |S|) ≤ |R ∪ S| ≤ |R| + |S|

Lower Bound: If all tuples of the smaller relation are duplicates of tuples in the larger, the result equals the larger relation.

Upper Bound: If there is no overlap (no common tuples), the result contains all tuples from both relations.

Cardinality Scenarios for Union
\|R\|	\|S\|	Overlap	\|R ∪ S\|	Scenario
1000	500	0	1500	Disjoint relations (maximum result)
1000	500	500	1000	S ⊆ R (complete overlap of smaller)
1000	1000	1000	1000	R = S (identical relations)
1000	500	200	1300	Partial overlap (typical case)
0	500	0	500	One empty relation

Selectivity Estimation

Performance Considerations:

Time Complexity:

The time complexity of computing R ∪ S depends on the duplicate elimination strategy:

Strategy	Time Complexity	Space Complexity	Best For
Sort-based	O((	R	+
Hash-based	O(	R	+
Index-based	O(	S	× log

Implementation Approaches:

Sort-Merge Union: Sort both relations, then merge while eliminating duplicates
Hash Union: Build hash table from smaller relation, probe with larger, output unique tuples
External Sort Union: For relations exceeding memory, use external sorting algorithms

union_algorithm.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
def hash_based_union(R: List[Tuple], S: List[Tuple]) -> List[Tuple]:
    """
    Hash-based union algorithm.
    Time: O(|R| + |S|) expected
    Space: O(|R| + |S|) for result, O(min(|R|,|S|)) for hash table
    """
    # Use set for O(1) membership testing
    result_set = set()
    result = []
    
    # Process first relation
    for tuple_r in R:
        if tuple_r not in result_set:
            result_set.add(tuple_r)
            result.append(tuple_r)
    
    # Process second relation
    for tuple_s in S:
        if tuple_s not in result_set:
            result_set.add(tuple_s)
            result.append(tuple_s)
    
    return result
 
 
def sort_merge_union(R: List[Tuple], S: List[Tuple]) -> List[Tuple]:
    """
    Sort-merge union algorithm.
    Time: O((|R| + |S|) log(|R| + |S|)) for sorting
    Space: O(1) additional after sorting
    """
    # Sort both relations
    sorted_R = sorted(R)
    sorted_S = sorted(S)
    
    result = []
    i, j = 0, 0
    last_added = None
    
    while i < len(sorted_R) and j < len(sorted_S):
        if sorted_R[i] < sorted_S[j]:
            if sorted_R[i] != last_added:
                result.append(sorted_R[i])
                last_added = sorted_R[i]
            i += 1
        elif sorted_R[i] > sorted_S[j]:
            if sorted_S[j] != last_added:
                result.append(sorted_S[j])
                last_added = sorted_S[j]
            j += 1
        else:  # Equal tuples
            if sorted_R[i] != last_added:
                result.append(sorted_R[i])
                last_added = sorted_R[i]
            i += 1
            j += 1
    
    # Process remaining tuples
    while i < len(sorted_R):
        if sorted_R[i] != last_added:
            result.append(sorted_R[i])
            last_added = sorted_R[i]
        i += 1
    
    while j < len(sorted_S):
        if sorted_S[j] != last_added:
            result.append(sorted_S[j])
            last_added = sorted_S[j]
        j += 1
    
    return result

Practical Applications of Union

The union operation finds extensive use in real-world database scenarios. Understanding these applications helps you recognize when union is the right tool for your query requirements.

Common Use Cases:

Real-World Union Applications

•Data Consolidation: Merging customer records from multiple regional databases into a unified view
•Time-Based Aggregation: Combining sales data from different months or quarters for analysis
•Multi-Source Queries: Querying both active and archived records in a single operation
•OR Conditions Optimization: Decomposing complex OR conditions into separate selections unified via union
•Distributed Database Queries: Combining results from horizontally partitioned tables
•Report Generation: Creating comprehensive reports that include data from multiple categories

Case Study: E-Commerce Platform

Consider an e-commerce platform that maintains separate tables for different product categories due to their distinct attribute requirements, but needs to display a unified search result:

-- Products are stored in category-specific tables
Electronics(product_id, name, price, warranty_months)
Clothing(product_id, name, price, size_chart)
Books(product_id, name, price, isbn)

-- Customer searches for products under $50
-- Need to combine results from all categories

-- Using relational algebra:
SearchResult ← π_product_id,name,price(σ_price<50(Electronics)) 
              ∪ π_product_id,name,price(σ_price<50(Clothing))
              ∪ π_product_id,name,price(σ_price<50(Books))

Projection Before Union

Application: Implementing OR Logic

The union operation provides an alternative way to express OR conditions in queries. This transformation is significant for query optimization:

-- Original query with OR:
σ_{status='active' OR priority='high'}(Tasks)

-- Equivalent using Union:
σ_{status='active'}(Tasks) ∪ σ_{priority='high'}(Tasks)

Why would we use the union form?

Index Utilization: If separate indexes exist on status and priority, the union form can use both; the OR form might only use one or neither
Parallel Execution: The two selections can be executed in parallel, then unioned
Query Plan Flexibility: Optimizers have more rearrangement options with union form

Semantic Equivalence Caveat

Relational Union vs. SQL UNION

While SQL's UNION operation is based on relational algebra's union, there are important differences and variations that database practitioners must understand.

SQL UNION Variants:

SQL Operator	Behavior	Relational Algebra Equivalent
UNION	Eliminates duplicates	R ∪ S (exact equivalent)
UNION ALL	Retains all duplicates	Not directly in classic RA
UNION DISTINCT	Same as UNION	R ∪ S

UNION ALL: Breaking Set Semantics

SQL's UNION ALL is a significant departure from pure relational algebra. It treats relations as bags (multisets) rather than sets, preserving duplicate counts:

-- With these relations:
-- R: {a, b, c, c}
-- S: {c, c, d}

SELECT * FROM R UNION SELECT * FROM S;
-- Result: {a, b, c, d}  -- 4 distinct tuples

SELECT * FROM R UNION ALL SELECT * FROM S;
-- Result: {a, b, c, c, c, c, d}  -- 7 tuples total

UNION vs. UNION ALL Comparison
Aspect	UNION	UNION ALL
Duplicate Handling	Eliminates duplicates	Preserves all duplicates
Mathematical Model	Set semantics	Bag (multiset) semantics
Performance	Slower (requires deduplication)	Faster (no deduplication)
Result Size	\|R ∪ S\| (set cardinality)	\|R\| + \|S\| (always)
Sorting Required	Often yes (for dedup)	No
Use Case	When duplicates are unwanted	When all rows matter (e.g., aggregates)

Performance Implications

sql_union_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Standard UNION (eliminates duplicates)
-- Use when you need distinct combined results
SELECT customer_id, name, email FROM US_Customers
UNION
SELECT customer_id, name, email FROM EU_Customers;
 
-- UNION ALL (preserves duplicates)
-- Use for performance when duplicates are acceptable or impossible
SELECT order_id, product_id, quantity FROM Q1_Orders
UNION ALL
SELECT order_id, product_id, quantity FROM Q2_Orders
UNION ALL
SELECT order_id, product_id, quantity FROM Q3_Orders
UNION ALL
SELECT order_id, product_id, quantity FROM Q4_Orders;
 
-- Column aliasing: Column names come from FIRST query
SELECT product_id AS id, name AS item_name, price AS cost FROM Electronics
UNION
SELECT product_id, name, price FROM Clothing;  -- Column names: id, item_name, cost
 
-- ORDER BY applies to final result
SELECT name, salary FROM Engineers
UNION
SELECT name, salary FROM Managers
ORDER BY salary DESC;  -- Orders the combined result
 
-- Common mistake: ORDER BY in individual queries
-- This is INVALID in most databases:
-- SELECT name FROM A ORDER BY name
-- UNION
-- SELECT name FROM B;  -- ERROR: ORDER BY not allowed here
 
-- Correct approach: Subqueries with ORDER BY need wrapping
SELECT * FROM (
    SELECT name, 'Engineer' AS role FROM Engineers
    UNION ALL
    SELECT name, 'Manager' AS role FROM Managers
) AS combined
ORDER BY role, name;

Summary: Mastering the Union Operation

We've conducted a comprehensive exploration of the union operation in relational algebra. Let's consolidate the essential knowledge:

Key Takeaways

•Definition: R ∪ S produces a relation containing all tuples that appear in R, S, or both, with duplicates eliminated
•Union Compatibility: Operands must have the same degree (number of attributes) and compatible domains for corresponding attributes
•Set Properties: Union is commutative (R ∪ S = S ∪ R), associative ((R ∪ S) ∪ T = R ∪ (S ∪ T)), and idempotent (R ∪ R = R)
•Cardinality: max(|R|, |S|) ≤ |R ∪ S| ≤ |R| + |S|; actual result depends on tuple overlap
•Implementation: Hash-based union offers O(n) expected time; sort-merge provides O(n log n) with minimal space
•SQL Mapping: UNION eliminates duplicates (matching RA); UNION ALL preserves them (bag semantics, faster)
•Query Optimization: Union can replace OR conditions to enable better index utilization and parallel execution

Page Complete

1 / 5