Set Operations in Relational Algebra

Level: Intermediate

Duration: 90 mins

Topic: Relational Algebra

Intersection (∩) Operation

Finding Common Ground: The Intersection Operation

While union combines all tuples from two relations, intersection identifies what is common to both. The intersection operation answers a fundamentally different question: "Which tuples exist in BOTH relations?"

Intersection is indispensable when you need to find overlapping data—customers who purchased from both product lines, employees who have skills in multiple domains, or records that satisfy multiple independent criteria. Understanding intersection deeply involves not just its definition, but also its fascinating theoretical property: it can be derived from other fundamental operations.

This page covers the intersection operation from its mathematical foundations through practical implementation, including a rigorous proof of its derivability from set difference.

What You Will Learn

By the end of this page, you will master the formal definition of intersection, understand why it's not a primitive operator (and can be derived from difference), compare intersection with other operations, analyze performance characteristics, and apply intersection to solve real-world database queries.

Mathematical Foundations of Intersection

The intersection operation in relational algebra, like union, is grounded in classical set theory. Understanding this foundation provides the rigorous framework needed for precise query formulation.

Set-Theoretic Definition:

In classical set theory, given two sets A and B, the intersection A ∩ B is defined as:

A ∩ B = { x | x ∈ A ∧ x ∈ B }

This reads: "The intersection of A and B is the set of all elements x such that x is a member of A AND x is a member of B."

The key characteristics of set intersection include:

Membership Requirement: An element must belong to BOTH sets to appear in the result
Subset Property: A ∩ B ⊆ A and A ∩ B ⊆ B
Commutativity: A ∩ B = B ∩ A
Associativity: (A ∩ B) ∩ C = A ∩ (B ∩ C)

Intersection as a Filter

Conceptually, intersection acts as a mutual filter: it retains only those tuples that 'pass through' both relations. This filtering perspective is helpful when reasoning about query results—the intersection can never contain more tuples than the smaller of its operands.

Relational Intersection Definition:

Formally, given two union-compatible relations R and S, the intersection R ∩ S is defined as:

R ∩ S = { t | t ∈ R ∧ t ∈ S }

Where:

t represents a tuple
The result contains only tuples that appear in BOTH R and S
The result schema matches the operand schemas
Like union, intersection requires union compatibility

Critical Observation: The intersection of two relations is always a subset of both operands. This has important implications for cardinality analysis.
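
The set-builder definition above translates almost word for word into code. The following is a minimal sketch, assuming a relation is modeled as a frozenset of Python tuples; the helper name intersect and the sample data are illustrative only.

```python
from typing import FrozenSet, Tuple

Row = Tuple[object, ...]
Relation = FrozenSet[Row]


def intersect(r: Relation, s: Relation) -> Relation:
    """Return { t | t in r and t in s }, mirroring the set-builder definition."""
    return frozenset(t for t in r if t in s)


# Hypothetical sample relations over the same two-attribute schema.
R = frozenset({(1, "a"), (2, "b"), (3, "c")})
S = frozenset({(2, "b"), (3, "c"), (4, "d")})

common = intersect(R, S)
print(sorted(common))             # [(2, 'b'), (3, 'c')]
print(common <= R, common <= S)   # True True (subset of both operands)
```

Python's built-in operator R & S computes the same result; the explicit comprehension is used here only to keep the correspondence with the definition visible.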

Mathematical Properties of Relational Intersection
Property	Definition	Significance
Commutativity	R ∩ S = S ∩ R	Order of operands does not affect result
Associativity	(R ∩ S) ∩ T = R ∩ (S ∩ T)	Grouping of multiple intersections doesn't matter
Idempotence	R ∩ R = R	Intersection of a relation with itself yields the original
Identity	R ∩ U = R	Intersection with the universal relation U (every possible tuple over the same schema) yields the original
Annihilation	R ∩ ∅ = ∅	Intersection with empty relation yields empty
Absorption	R ∩ (R ∪ S) = R	Intersection absorbs union with same operand
Distribution over Union	R ∩ (S ∪ T) = (R ∩ S) ∪ (R ∩ T)	Intersection distributes over union
Subset Result	R ∩ S ⊆ R and R ∩ S ⊆ S	Result is subset of both operands
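
The identities in this table can be checked mechanically on a small example. The sketch below is an illustration rather than a proof: it assumes a tiny hypothetical universe of three single-attribute tuples, enumerates every relation over it, and tests each property for all triples (R, S, T).

```python
from itertools import combinations, product

# A tiny hypothetical universe of single-attribute tuples.
UNIVERSE = [(0,), (1,), (2,)]
U = frozenset(UNIVERSE)   # the "universal relation" over this schema
EMPTY = frozenset()


def all_relations(rows):
    """Enumerate every subset of rows, i.e. every possible relation."""
    return [frozenset(c)
            for k in range(len(rows) + 1)
            for c in combinations(rows, k)]


def properties_hold() -> bool:
    """Check each identity from the table for all triples (R, S, T)."""
    for R, S, T in product(all_relations(UNIVERSE), repeat=3):
        checks = [
            R & S == S & R,                      # commutativity
            (R & S) & T == R & (S & T),          # associativity
            R & R == R,                          # idempotence
            R & U == R,                          # identity
            R & EMPTY == EMPTY,                  # annihilation
            R & (R | S) == R,                    # absorption
            R & (S | T) == (R & S) | (R & T),    # distribution over union
            R & S <= R and R & S <= S,           # subset result
        ]
        if not all(checks):
            return False
    return True


print(properties_hold())  # True
```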

De Morgan's Laws in Relational Algebra:

The famous De Morgan's laws from set theory also apply to relational algebra:

First Law: R − (S ∪ T) = (R − S) ∩ (R − T)
Second Law: R − (S ∩ T) = (R − S) ∪ (R − T)

These laws are invaluable for query transformation and optimization, allowing queries to be restructured into equivalent forms that may be more efficiently executable.
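
A small worked example makes the two laws concrete. The relations below are hypothetical single-attribute relations chosen so that every region of the Venn diagram is non-empty; Python's set operators -, | and & stand in for difference, union and intersection.

```python
# Hypothetical single-attribute relations.
R = {(1,), (2,), (3,), (4,), (5,)}
S = {(2,), (3,), (6,)}
T = {(3,), (4,), (7,)}

# First law: R - (S ∪ T) = (R - S) ∩ (R - T)
first_lhs = R - (S | T)
first_rhs = (R - S) & (R - T)
print(sorted(first_lhs), first_lhs == first_rhs)    # [(1,), (5,)] True

# Second law: R - (S ∩ T) = (R - S) ∪ (R - T)
second_lhs = R - (S & T)
second_rhs = (R - S) | (R - T)
print(sorted(second_lhs), second_lhs == second_rhs)  # [(1,), (2,), (4,), (5,)] True
```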

Derivability: Intersection from Set Difference

A remarkable theoretical property of relational intersection is that it is not a primitive operator. Unlike selection, projection, union, difference, and Cartesian product (the five fundamental operations, or six when rename is counted separately), intersection can be derived from these primitives.

The Derivation:

Intersection can be expressed using only set difference:

R ∩ S = R − (R − S)

Equivalently:

R ∩ S = S − (S − R)

Why does this work?

Let's trace through the logic step by step:

Derivation Proof: R ∩ S = R − (R − S)

•Start with R − S: This produces all tuples in R that are NOT in S (tuples unique to R)
•Compute R − (R − S): From R, remove all tuples that were unique to R (those not in S)
•What remains?: Only tuples that are in R AND also in S—exactly the intersection!

Intuitive Understanding

Think of it this way: (R − S) collects everything in R that R doesn't share with S. When we then compute R − (R − S), we remove those 'not shared' tuples from R, leaving only the shared tuples: the intersection.

Formal Proof:

We prove R ∩ S = R − (R − S) by showing mutual subset inclusion.

Part 1: R ∩ S ⊆ R − (R − S)

Let t ∈ R ∩ S (arbitrary tuple in the intersection)

Then t ∈ R and t ∈ S (by definition of intersection)
Since t ∈ S, we know t ∉ (R − S) (because R − S contains only tuples NOT in S)
Since t ∈ R and t ∉ (R − S), we have t ∈ R − (R − S)
Therefore R ∩ S ⊆ R − (R − S) ✓

Part 2: R − (R − S) ⊆ R ∩ S

Let t ∈ R − (R − S) (arbitrary tuple in the difference)

Then t ∈ R and t ∉ (R − S) (by definition of difference)
Since t ∉ (R − S), either t ∉ R or t ∈ S
We know t ∈ R, so it must be that t ∈ S
Since t ∈ R and t ∈ S, we have t ∈ R ∩ S
Therefore R − (R − S) ⊆ R ∩ S ✓

Conclusion: Since both subset relations hold, R ∩ S = R − (R − S) ∎
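
The proof above holds for all relations. As a sanity check rather than a substitute for it, the identity can also be tested exhaustively on a small case. The sketch assumes a hypothetical universe of four tuples and compares R ∩ S, R − (R − S) and S − (S − R) for every pair of subsets.

```python
from itertools import combinations

# Hypothetical universe of four single-attribute tuples.
UNIVERSE = [("a",), ("b",), ("c",), ("d",)]


def powerset(rows):
    """Return every subset of rows as a frozenset."""
    return [frozenset(c)
            for k in range(len(rows) + 1)
            for c in combinations(rows, k)]


def derivation_holds() -> bool:
    """True if R ∩ S = R - (R - S) = S - (S - R) for every pair (R, S)."""
    subsets = powerset(UNIVERSE)
    return all(
        R & S == R - (R - S) == S - (S - R)
        for R in subsets
        for S in subsets
    )


print(derivation_holds())  # True, checked over all 16 * 16 pairs
```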

Practical Implications

Despite being derivable, most database systems implement intersection as a distinct operation for efficiency. Computing R − (R − S) requires two difference operations, while a direct intersection implementation can be more efficient. The derivability matters in theory (it shows that intersection adds no expressive power beyond the primitive operators), but it is not necessarily the implementation strategy.

Union Compatibility for Intersection

Like union, intersection requires union compatibility between its operands. The requirements are identical:

Same Degree: Both relations must have the same number of attributes
Domain Compatibility: Corresponding attributes must have compatible domains

However, the behavior implications differ from union:

For Union: Incompatible schemas make it impossible to create a coherent result relation

For Intersection: Even with compatible schemas, the intersection might be empty if there's no actual tuple overlap—which is a valid, meaningful result

Intersection Outcomes Based on Operand Relationship
Relationship	Result	Cardinality
R ⊆ S (R is subset of S)	R ∩ S = R	|R ∩ S| = |R|
S ⊆ R (S is subset of R)	R ∩ S = S	|R ∩ S| = |S|
R = S (identical)	R ∩ S = R = S	|R ∩ S| = |R| = |S|
R and S disjoint	R ∩ S = ∅	|R ∩ S| = 0
Partial overlap	Common tuples only	0 < |R ∩ S| < min(|R|, |S|)

Ensuring Compatibility:

When working with relations that are not directly union-compatible, projection and renaming can be used to achieve compatibility:

-- Original incompatible relations:
Employees(emp_id, name, department, salary, hire_date)
Contractors(contractor_id, full_name, dept_code, rate)

-- To find people working in both capacities (by name and department):
π_name,department(Employees) ∩ ρ_name←full_name,department←dept_code(π_full_name,dept_code(Contractors))

The steps:

Project Employees to just (name, department)
Project Contractors to (full_name, dept_code)
Rename Contractors' attributes to match Employees'
Compute intersection

This pattern is essential when comparing data from different sources with different schemas.
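
The project-rename-intersect steps can be sketched in code. This is a minimal illustration, assuming a relation is a schema (tuple of attribute names) plus a set of rows; the Relation class, the helper functions and all sample rows are hypothetical, and only the schemas come from the example above.

```python
from typing import Dict, FrozenSet, NamedTuple, Sequence, Tuple


class Relation(NamedTuple):
    schema: Tuple[str, ...]
    rows: FrozenSet[tuple]


def project(rel: Relation, attrs: Sequence[str]) -> Relation:
    """π: keep only the named attributes (duplicates collapse in the set)."""
    idx = [rel.schema.index(a) for a in attrs]
    return Relation(tuple(attrs),
                    frozenset(tuple(row[i] for i in idx) for row in rel.rows))


def rename(rel: Relation, mapping: Dict[str, str]) -> Relation:
    """ρ: rename attributes according to {old_name: new_name}."""
    return Relation(tuple(mapping.get(a, a) for a in rel.schema), rel.rows)


def intersect(r: Relation, s: Relation) -> Relation:
    """∩: defined only when both schemas match (union compatibility)."""
    if r.schema != s.schema:
        raise ValueError(f"not union-compatible: {r.schema} vs {s.schema}")
    return Relation(r.schema, r.rows & s.rows)


# Hypothetical sample data for the two schemas from the text.
employees = Relation(
    ("emp_id", "name", "department", "salary", "hire_date"),
    frozenset({(1, "Ana Ruiz", "ENG", 90000, "2021-03-01"),
               (2, "Ben Ito", "OPS", 70000, "2020-07-15")}),
)
contractors = Relation(
    ("contractor_id", "full_name", "dept_code", "rate"),
    frozenset({(501, "Ana Ruiz", "ENG", 85.0),
               (502, "Cy Park", "ENG", 95.0)}),
)

left = project(employees, ["name", "department"])
right = rename(project(contractors, ["full_name", "dept_code"]),
               {"full_name": "name", "dept_code": "department"})
both = intersect(left, right)
print(both.schema, sorted(both.rows))
# ('name', 'department') [('Ana Ruiz', 'ENG')]
```

Raising an error on mismatched schemas is a design choice of this sketch: it makes the union-compatibility requirement explicit instead of silently comparing unrelated columns.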

Semantic Compatibility

Domain compatibility alone doesn't guarantee meaningful results. Even if two VARCHAR columns are compatible, intersecting 'customer_name' with 'product_name' is semantically meaningless. Always ensure that the attributes being compared represent the same real-world concept.

Visual Representation of Intersection

Visualizing intersection helps build intuition for identifying common tuples between relations. The classic Venn diagram representation is particularly effective for intersection.

Concrete Table Example:

Consider two relations representing employees who have completed different training programs:

Relation R: Completed Safety Training
EmpID	Name	Department
E001	Alice Chen	Engineering
E002	Bob Smith	Engineering
E003	Carol Davis	Research
E004	David Lee	Marketing

Relation S: Completed Security Training
EmpID	Name	Department
E002	Bob Smith	Engineering
E003	Carol Davis	Research
E005	Eve Wilson	Finance
E006	Frank Brown	Research

Result: SafetyTraining ∩ SecurityTraining (Completed BOTH)
EmpID	Name	Department
E002	Bob Smith	Engineering
E003	Carol Davis	Research

Practical Interpretation

The result shows employees who have completed BOTH training programs. Only Bob Smith and Carol Davis appear in both input relations. This is exactly the kind of query that intersection is designed for—finding overlap between two sets based on complete tuple matching.
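
The training example can be reproduced with Python sets of tuples. This is a small sketch using the rows from the two tables above; the variable names are illustrative.

```python
# Rows copied from the Safety Training and Security Training tables.
safety_training = {
    ("E001", "Alice Chen", "Engineering"),
    ("E002", "Bob Smith", "Engineering"),
    ("E003", "Carol Davis", "Research"),
    ("E004", "David Lee", "Marketing"),
}
security_training = {
    ("E002", "Bob Smith", "Engineering"),
    ("E003", "Carol Davis", "Research"),
    ("E005", "Eve Wilson", "Finance"),
    ("E006", "Frank Brown", "Research"),
}

# Intersection compares complete tuples: all three attributes must match.
completed_both = safety_training & security_training

for emp_id, name, department in sorted(completed_both):
    print(emp_id, name, department)
# E002 Bob Smith Engineering
# E003 Carol Davis Research
```

Because whole tuples are compared, an employee whose department differed between the two relations would not appear in the result, even with a matching EmpID.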

Query Tree Representation:

In query processing, intersection is represented as a binary operator node (∩) whose two children are the operand relations, or the subexpressions that produce them.

Cardinality Analysis and Performance

Understanding intersection cardinality is crucial for query optimization, as intersection results can range from empty to the full size of the smaller operand.

Cardinality Bounds:

For R ∩ S:

0 ≤ |R ∩ S| ≤ min(|R|, |S|)

Lower Bound: Zero (when relations are completely disjoint)

Upper Bound: Size of the smaller relation (when smaller is a subset of larger)

Cardinality Scenarios for Intersection
|R|	|S|	Relationship	|R ∩ S|	Scenario
1000	500	Disjoint	0	No common tuples
1000	500	S ⊆ R	500	All of S is in R (maximum)
1000	1000	R = S	1000	Identical relations
1000	500	200 common	200	Partial overlap
1000	0	S empty	0	Empty operand yields empty result
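
The scenarios in the table can be reconstructed with synthetic data to confirm the cardinality bounds. The sketch below assumes single-attribute relations built from integer ranges; the helper make and the particular ranges are illustrative choices, not part of the original example.

```python
def make(start: int, n: int) -> set:
    """Build a relation of n single-attribute tuples: (start,), (start+1,), ..."""
    return {(i,) for i in range(start, start + n)}


# (R, S) pairs mirroring the rows of the table above.
scenarios = {
    "Disjoint":   (make(0, 1000), make(1000, 500)),
    "S ⊆ R":      (make(0, 1000), make(0, 500)),
    "R = S":      (make(0, 1000), make(0, 1000)),
    "200 common": (make(0, 1000), make(800, 500)),  # overlap is 800..999
    "S empty":    (make(0, 1000), set()),
}

sizes = {}
for label, (R, S) in scenarios.items():
    size = len(R & S)
    # The bound 0 <= |R ∩ S| <= min(|R|, |S|) must hold in every scenario.
    assert 0 <= size <= min(len(R), len(S))
    sizes[label] = size
    print(f"{label:<12} |R|={len(R):<5} |S|={len(S):<5} |R ∩ S|={size}")
```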

Selectivity Considerations:

Intersection is typically highly selective: it produces small results relative to its inputs unless the relations overlap significantly. Query optimizers use this characteristic:

Early Intersection: When possible, computing intersection early reduces data volume for subsequent operations
Build-Side Selection: In hash-based intersection, the smaller relation should be the build side
Statistics Importance: Accurate overlap statistics are critical for cost estimation

Performance Characteristics:

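Implementation	Time Complexity	Space Complexity
Sort-Merge	O(|R| log |R| + |S| log |S|)	O(1) beyond the sorted inputs
Hash-based	O(|R| + |S|) expected	O(min(|R|, |S|)) for the hash table
Index-based	O(|S| × log |R|), given an ordered index on R	O(1) beyond the existing index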

Optimization Insight

Because |R ∩ S| ≤ min(|R|, |S|), intersection never increases data volume. This makes it an excellent 'early filter' in query plans. If a plan combines intersection with other operations, performing the intersection first often reduces the cost of the operations that follow.
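
The index-based strategy probes an existing ordered index on one relation once per tuple of the other. The following is a minimal sketch, assuming a sorted Python list stands in for a B-tree-style index on R and that tuples are mutually comparable; build_index and index_based_intersection are hypothetical names.

```python
from bisect import bisect_left
from typing import Iterable, List, Tuple


def build_index(R: Iterable[Tuple]) -> List[Tuple]:
    """Stand-in for an ordered index on R: a sorted, duplicate-free list."""
    return sorted(set(R))


def index_based_intersection(index: List[Tuple], S: Iterable[Tuple]) -> List[Tuple]:
    """Probe the index once per tuple of S: O(|S| log |R|) comparisons."""
    result = []
    seen = set()  # keeps the output duplicate-free (set semantics)
    for t in S:
        pos = bisect_left(index, t)  # binary search, O(log |R|)
        if pos < len(index) and index[pos] == t and t not in seen:
            result.append(t)
            seen.add(t)
    return result


# Hypothetical sample data, including duplicates in both inputs.
R = [(3, "c"), (1, "a"), (2, "b"), (2, "b")]
S = [(2, "b"), (4, "d"), (1, "a"), (2, "b")]

r_index = build_index(R)
print(index_based_intersection(r_index, S))  # [(2, 'b'), (1, 'a')]
```

In a real system the index already exists, so its construction cost is not charged to the query; here build_index only simulates that precondition.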

intersection_algorithm.py
Python
from typing import List, Tuple


def hash_based_intersection(R: List[Tuple], S: List[Tuple]) -> List[Tuple]:
    """
    Hash-based intersection algorithm.
    Time: O(|R| + |S|) expected
    Space: O(min(|R|, |S|)) for hash table
    
    Strategy: Build hash set from smaller relation,
              probe with larger relation.
    """
    # Use smaller relation for hash table (build side)
    if len(R) <= len(S):
        build_relation, probe_relation = R, S
    else:
        build_relation, probe_relation = S, R
    
    # Build phase: Create hash set from smaller relation
    build_set = set(build_relation)
    
    # Probe phase: Check each tuple from larger relation
    result = []
    seen = set()  # Avoid duplicates in result
    
    for tuple_p in probe_relation:
        if tuple_p in build_set and tuple_p not in seen:
            result.append(tuple_p)
            seen.add(tuple_p)
    
    return result
 
 
def sort_merge_intersection(R: List[Tuple], S: List[Tuple]) -> List[Tuple]:
    """
    Sort-merge intersection algorithm.
    Time: O(|R| log|R| + |S| log|S|) for sorting, plus O(|R| + |S|) to merge
    Space: O(|R| + |S|) for the sorted copies; O(1) extra for the merge itself
    """
    sorted_R = sorted(R)
    sorted_S = sorted(S)
    
    result = []
    i, j = 0, 0
    last_added = None
    
    while i < len(sorted_R) and j < len(sorted_S):
        if sorted_R[i] < sorted_S[j]:
            i += 1
        elif sorted_R[i] > sorted_S[j]:
            j += 1
        else:  # Tuples are equal - found intersection
            if sorted_R[i] != last_added:
                result.append(sorted_R[i])
                last_added = sorted_R[i]
            i += 1
            j += 1
    
    return result
 
 
def intersection_via_difference(R: List[Tuple], S: List[Tuple]) -> List[Tuple]:
    """
    Intersection computed via the derivation formula:
    R ∩ S = R − (R − S)
    
    Demonstrates the theoretical derivability,
    but NOT recommended for production use.
    """
    def set_difference(A: List[Tuple], B: List[Tuple]) -> List[Tuple]:
        b_set = set(B)
        return [t for t in A if t not in b_set]
    
    # Step 1: R − S (tuples in R but not in S)
    r_minus_s = set_difference(R, S)
    
    # Step 2: R − (R − S) (remove R-only tuples from R)
    result = set_difference(R, r_minus_s)
    
    return list(dict.fromkeys(result))  # Remove duplicates, keeping R's order

Intersection vs. Join: Understanding the Difference

A common source of confusion for database learners is distinguishing between intersection and natural join. While both operations can identify related data, they are fundamentally different:

Intersection (∩):

Requires union-compatible operands (same schema)
Returns complete tuples that appear in both relations
Result has same schema as operands
Compares entire tuples

Natural Join (⋈):

Works with relations whose schemas may differ; union compatibility is not required
Combines tuples based on matching attribute values for common attribute names
Result schema is the union of both schemas (shared attributes appear once)
Compares specific attribute values, not entire tuples

Use Intersection When

•Relations have identical schemas
•You need tuples that are exactly the same in both relations
•Finding common records between two sources with same structure
•Example: Customers in both Q1 and Q2 sales data

Use Join When

•Relations have different schemas with some shared attributes
•You need to combine information from both relations
•Linking related records across tables via a key
•Example: Combining customer info with their orders

Illustrative Comparison:

-- Relations:
R(A, B, C): {(1, 2, 3), (4, 5, 6), (7, 8, 9)}
S(A, B, C): {(1, 2, 3), (4, 5, 7), (10, 11, 12)}

-- Intersection: Same schema, find identical tuples
R ∩ S = {(1, 2, 3)}  -- Only tuple appearing in both

-- Now consider different schemas:
P(A, B): {(1, 2), (4, 5), (7, 8)}
Q(B, C): {(2, 3), (5, 6), (8, 100)}

-- Natural Join: Match on common attribute B, combine
P ⋈ Q = {(1, 2, 3), (4, 5, 6), (7, 8, 100)}

Note how intersection requires identical tuples, while join combines tuples based on matching attribute values and produces a wider result schema.
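
The contrast can be made executable. The sketch below uses the relations R, S, P and Q from the comparison above and a deliberately simple nested-loop natural_join, a hypothetical helper written for illustration rather than efficiency.

```python
from typing import Sequence, Set, Tuple


def natural_join(p_schema: Sequence[str], p_rows: Set[tuple],
                 q_schema: Sequence[str], q_rows: Set[tuple]) -> Tuple[tuple, set]:
    """Nested-loop natural join: match on shared attribute names."""
    common = [a for a in p_schema if a in q_schema]
    q_extra = [a for a in q_schema if a not in common]
    out_schema = tuple(p_schema) + tuple(q_extra)  # shared attributes appear once
    out = set()
    for p in p_rows:
        p_vals = dict(zip(p_schema, p))
        for q in q_rows:
            q_vals = dict(zip(q_schema, q))
            if all(p_vals[a] == q_vals[a] for a in common):
                out.add(tuple(p) + tuple(q_vals[a] for a in q_extra))
    return out_schema, out


# Same schema: intersection compares entire tuples.
R = {(1, 2, 3), (4, 5, 6), (7, 8, 9)}
S = {(1, 2, 3), (4, 5, 7), (10, 11, 12)}
print(R & S)  # {(1, 2, 3)}

# Different schemas: natural join matches on the shared attribute B.
P = {(1, 2), (4, 5), (7, 8)}
Q = {(2, 3), (5, 6), (8, 100)}
schema, joined = natural_join(("A", "B"), P, ("B", "C"), Q)
print(schema, sorted(joined))
# ('A', 'B', 'C') [(1, 2, 3), (4, 5, 6), (7, 8, 100)]
```

One consequence worth noticing: when both operands have exactly the same attribute names, every attribute is a join attribute, so this natural join returns the same tuples as the intersection.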

Common Mistake

Don't use intersection when you mean join! If you're trying to find 'employees and their departments,' you need a join (linking employee table to department table via department_id). Intersection would only make sense if both tables had the exact same structure and you wanted to find records appearing in both.

SQL Implementation: INTERSECT

SQL provides the INTERSECT operator to implement relational intersection. Understanding its behavior and compatibility with different database systems is essential for practical database work.

SQL INTERSECT Syntax:

SELECT column_list FROM table1
INTERSECT
SELECT column_list FROM table2;

SQL INTERSECT Variants:

SQL Operator	Behavior	Standard Support
INTERSECT	Eliminates duplicates (set semantics)	SQL Standard
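INTERSECT ALL	Keeps duplicates: a row occurring m times on the left and n times on the right appears min(m, n) times (bag semantics)	SQL Standard; vendor support varies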
INTERSECT DISTINCT	Same as INTERSECT	Explicit form
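
The difference between set and bag semantics can be modeled outside SQL. In this sketch Python's set plays the role of INTERSECT and collections.Counter plays the role of INTERSECT ALL, since Counter's & operator keeps the minimum count of each element; the customer IDs are hypothetical.

```python
from collections import Counter

# Hypothetical single-column query results, with duplicates.
q1_customers = ["c1", "c1", "c1", "c2", "c3"]
q2_customers = ["c1", "c1", "c3", "c3", "c4"]

# INTERSECT: duplicates are eliminated (set semantics).
set_result = sorted(set(q1_customers) & set(q2_customers))
print(set_result)  # ['c1', 'c3']

# INTERSECT ALL: each row appears min(m, n) times (bag semantics).
bag = Counter(q1_customers) & Counter(q2_customers)
bag_result = sorted(bag.elements())
print(bag_result)  # ['c1', 'c1', 'c3']
```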

sql_intersect_examples.sql
SQL
-- Basic INTERSECT: Find customers who ordered in both Q1 and Q2
SELECT customer_id, customer_name
FROM Q1_Orders
INTERSECT
SELECT customer_id, customer_name
FROM Q2_Orders;
 
-- Find products available in both US and EU warehouses
SELECT product_id, product_name, price
FROM US_Inventory
INTERSECT
SELECT product_id, product_name, price
FROM EU_Inventory;
 
-- Multi-way intersection: Find employees who are on all three teams
SELECT employee_id, name
FROM Engineering_Team
INTERSECT
SELECT employee_id, name
FROM Research_Team
INTERSECT
SELECT employee_id, name
FROM Leadership_Team;
 
-- INTERSECT with ORDER BY (applies to final result)
SELECT product_id, product_name
FROM Premium_Products
INTERSECT
SELECT product_id, product_name
FROM Sale_Products
ORDER BY product_name;
 
-- Using INTERSECT to implement AND logic across rows of the same table
-- (an equivalent subquery version follows):
SELECT customer_id
FROM Orders
WHERE product_category = 'Electronics'
INTERSECT
SELECT customer_id
FROM Orders
WHERE product_category = 'Books';
 
-- Equivalent subquery version:
SELECT DISTINCT customer_id
FROM Orders o1
WHERE o1.product_category = 'Electronics'
  AND EXISTS (
    SELECT 1 FROM Orders o2
    WHERE o2.customer_id = o1.customer_id
      AND o2.product_category = 'Books'
  );
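
The last two queries above can be compared on real data using Python's built-in sqlite3 module, since SQLite supports INTERSECT. This is a minimal sketch: the Orders table layout and the sample rows are assumptions made for illustration, and only the two queries are taken from the listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER,"
    " product_category TEXT)"
)
# Hypothetical sample orders: customers 1 and 4 bought in both categories.
conn.executemany(
    "INSERT INTO Orders (customer_id, product_category) VALUES (?, ?)",
    [(1, "Electronics"), (1, "Books"), (2, "Electronics"),
     (3, "Books"), (4, "Books"), (4, "Electronics"), (4, "Electronics")],
)

intersect_sql = """
    SELECT customer_id FROM Orders WHERE product_category = 'Electronics'
    INTERSECT
    SELECT customer_id FROM Orders WHERE product_category = 'Books'
"""
exists_sql = """
    SELECT DISTINCT customer_id
    FROM Orders o1
    WHERE o1.product_category = 'Electronics'
      AND EXISTS (
        SELECT 1 FROM Orders o2
        WHERE o2.customer_id = o1.customer_id
          AND o2.product_category = 'Books'
      )
"""

via_intersect = sorted(row[0] for row in conn.execute(intersect_sql))
via_exists = sorted(row[0] for row in conn.execute(exists_sql))
print(via_intersect, via_exists)  # [1, 4] [1, 4]
conn.close()
```

Customer 4 has two Electronics orders but appears once in both results: INTERSECT removes duplicates by definition, while the subquery version needs DISTINCT to do the same.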

Vendor Support

INTERSECT is supported by most major database systems including PostgreSQL, SQL Server, Oracle, SQLite, and MySQL (8.0.31 and later). Earlier MySQL versions didn't support INTERSECT, requiring workarounds using EXISTS subqueries or INNER JOINs.

Alternative Approaches (when INTERSECT unavailable):

-- Using EXISTS subquery:
SELECT DISTINCT a.col1, a.col2
FROM TableA a
WHERE EXISTS (
    SELECT 1 FROM TableB b
    WHERE b.col1 = a.col1 AND b.col2 = a.col2
);

-- Using INNER JOIN with DISTINCT:
SELECT DISTINCT a.col1, a.col2
FROM TableA a
INNER JOIN TableB b
    ON a.col1 = b.col1 AND a.col2 = b.col2;

-- Using IN (works only when comparing a single column):
SELECT DISTINCT col1
FROM TableA
WHERE col1 IN (SELECT col1 FROM TableB);

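The native INTERSECT operator is generally more readable and may be optimized better by the query planner. It also treats NULLs differently: INTERSECT considers two NULLs a match, whereas the = comparisons in the rewrites above never match NULL, so the alternatives are equivalent only when the compared columns contain no NULLs.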

Practical Applications of Intersection

Intersection is invaluable for queries that require identifying commonality across datasets. Here are key real-world applications:

Common Intersection Use Cases

•Multi-criteria Qualification: Finding entities that meet ALL of several criteria (e.g., employees who passed all required certifications)
•Cross-channel Analysis: Customers who engaged through multiple channels (web AND mobile)
•Time-based Retention: Users active in both current and previous periods
•Multi-source Validation: Records that appear in multiple independent sources (data quality)
•Feature Overlap: Products/services with features from multiple categories
•Compliance Checking: Items satisfying requirements from multiple regulatory frameworks

Case Study: Multi-Certification Policy

A company requires all project managers to hold both PMP and Agile certifications. The database stores certifications in a single Certifications table (one credential per record):

-- Find employees with both PMP AND Agile certifications:
π_emp_id,name(σ_cert='PMP'(Certifications))
∩
π_emp_id,name(σ_cert='Agile'(Certifications))

This intersection finds the exact set of employees qualified to lead projects under the dual-certification policy.
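
The select-project-intersect pattern in this case study can be sketched directly. The code assumes the Certifications relation is a list of dictionaries with attributes emp_id, name and cert; the helper functions select and project and all sample rows are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Set

Row = Dict[str, object]

# Hypothetical contents of Certifications (one credential per record).
certifications: List[Row] = [
    {"emp_id": 1, "name": "Ana", "cert": "PMP"},
    {"emp_id": 1, "name": "Ana", "cert": "Agile"},
    {"emp_id": 2, "name": "Ben", "cert": "PMP"},
    {"emp_id": 3, "name": "Cy", "cert": "Agile"},
    {"emp_id": 4, "name": "Dee", "cert": "Agile"},
    {"emp_id": 4, "name": "Dee", "cert": "PMP"},
]


def select(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """σ: keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]


def project(rows: Iterable[Row], attrs: Sequence[str]) -> Set[tuple]:
    """π: keep the named attributes; the set removes duplicate tuples."""
    return {tuple(r[a] for a in attrs) for r in rows}


pmp = project(select(certifications, lambda r: r["cert"] == "PMP"),
              ("emp_id", "name"))
agile = project(select(certifications, lambda r: r["cert"] == "Agile"),
                ("emp_id", "name"))

qualified = pmp & agile
print(sorted(qualified))  # [(1, 'Ana'), (4, 'Dee')]
```

A single selection such as cert = 'PMP' AND cert = 'Agile' would return nothing, because no individual record holds both values; the two conditions apply to different rows, which is why the intersection is needed.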

Case Study: Customer Retention Analysis

Identify customers who made purchases in both 2024 and 2025 (retained customers):

-- Retained customers:
π_customer_id,customer_name(σ_year=2024(Sales))
∩
π_customer_id,customer_name(σ_year=2025(Sales))

The result represents the company's retained customer base—a critical metric for business analysis.

Pattern Recognition

Whenever you see 'AND' in a requirement that spans different records (not columns within a record), think intersection. 'Customers who bought A AND B' requires intersection because A and B are separate purchase records, not columns in a single row.

Summary: Mastering the Intersection Operation

We've explored the intersection operation comprehensively. Let's consolidate the key knowledge:

Key Takeaways

•Definition: R ∩ S produces a relation containing only tuples that appear in BOTH R and S
•Union Compatibility: Like union, intersection requires same degree and compatible domains
•Derivability: Intersection is NOT primitive—it can be derived from difference: R ∩ S = R − (R − S)
•Cardinality: 0 ≤ |R ∩ S| ≤ min(|R|, |S|); intersection never increases data volume
•Properties: Commutative, associative, idempotent; obeys De Morgan's laws
•Implementation: Hash-based runs in O(|R| + |S|) expected time; sort-merge runs in O(|R| log |R| + |S| log |S|) and is attractive when the inputs are already sorted
•vs. Join: Intersection compares complete tuples; join matches on specific attributes
•SQL: Use INTERSECT operator; alternatives include EXISTS subqueries and JOINs

Page Complete

You now have thorough mastery of the intersection operation in relational algebra. You understand its mathematical foundations, can prove its derivability from set difference, recognize when to use intersection vs. join, and can implement intersection in both relational algebra and SQL. Next, we'll explore set difference (−), which identifies tuples in one relation but not another.

2 / 5

Loading learning content...

Database Management SystemsRelational Algebra

Set Operations in Relational Algebra

LevelIntermediate

Duration90 mins

TopicRelational Algebra

2 / 5

Intersection (∩) Operation

Finding Common Ground: The Intersection Operation

What You Will Learn

Mathematical Foundations of Intersection

The intersection operation in relational algebra, like union, is grounded in classical set theory. Understanding this foundation provides the rigorous framework needed for precise query formulation.

Set-Theoretic Definition:

In classical set theory, given two sets A and B, the intersection A ∩ B is defined as:

A ∩ B = { x | x ∈ A ∧ x ∈ B }

This reads: "The intersection of A and B is the set of all elements x such that x is a member of A AND x is a member of B."

The key characteristics of set intersection include:

Membership Requirement: An element must belong to BOTH sets to appear in the result
Subset Property: A ∩ B ⊆ A and A ∩ B ⊆ B
Commutativity: A ∩ B = B ∩ A
Associativity: (A ∩ B) ∩ C = A ∩ (B ∩ C)

Intersection as a Filter

Relational Intersection Definition:

Formally, given two union-compatible relations R and S, the intersection R ∩ S is defined as:

R ∩ S = { t | t ∈ R ∧ t ∈ S }

Where:

t represents a tuple
The result contains only tuples that appear in BOTH R and S
The result schema matches the operand schemas
Like union, intersection requires union compatibility

Critical Observation: The intersection of two relations is always a subset of both operands. This has important implications for cardinality analysis.

Mathematical Properties of Relational Intersection
Property	Definition	Significance
Commutativity	R ∩ S = S ∩ R	Order of operands does not affect result
Associativity	(R ∩ S) ∩ T = R ∩ (S ∩ T)	Grouping of multiple intersections doesn't matter
Idempotence	R ∩ R = R	Intersection of a relation with itself yields the original
Identity	R ∩ Universal = R	Intersection with universal relation yields original
Annihilation	R ∩ ∅ = ∅	Intersection with empty relation yields empty
Absorption	R ∩ (R ∪ S) = R	Intersection absorbs union with same operand
Distribution over Union	R ∩ (S ∪ T) = (R ∩ S) ∪ (R ∩ T)	Intersection distributes over union
Subset Result	R ∩ S ⊆ R and R ∩ S ⊆ S	Result is subset of both operands

De Morgan's Laws in Relational Algebra:

The famous De Morgan's laws from set theory also apply to relational algebra:

First Law: R − (S ∪ T) = (R − S) ∩ (R − T)
Second Law: R − (S ∩ T) = (R − S) ∪ (R − T)

These laws are invaluable for query transformation and optimization, allowing queries to be restructured into equivalent forms that may be more efficiently executable.

Derivability: Intersection from Set Difference

The Derivation:

Intersection can be expressed using only set difference:

R ∩ S = R − (R − S)

Equivalently:

R ∩ S = S − (S − R)

Why does this work?

Let's trace through the logic step by step:

Derivation Proof: R ∩ S = R − (R − S)

•Start with R − S: This produces all tuples in R that are NOT in S (tuples unique to R)
•Compute R − (R − S): From R, remove all tuples that were unique to R (those not in S)
•What remains?: Only tuples that are in R AND also in S—exactly the intersection!

Intuitive Understanding

Formal Proof:

We prove R ∩ S = R − (R − S) by showing mutual subset inclusion.

Part 1: R ∩ S ⊆ R − (R − S)

Let t ∈ R ∩ S (arbitrary tuple in the intersection)

Then t ∈ R and t ∈ S (by definition of intersection)
Since t ∈ S, we know t ∉ (R − S) (because R − S contains only tuples NOT in S)
Since t ∈ R and t ∉ (R − S), we have t ∈ R − (R − S)
Therefore R ∩ S ⊆ R − (R − S) ✓

Part 2: R − (R − S) ⊆ R ∩ S

Let t ∈ R − (R − S) (arbitrary tuple in the difference)

Then t ∈ R and t ∉ (R − S) (by definition of difference)
Since t ∉ (R − S), either t ∉ R or t ∈ S
We know t ∈ R, so it must be that t ∈ S
Since t ∈ R and t ∈ S, we have t ∈ R ∩ S
Therefore R − (R − S) ⊆ R ∩ S ✓

Conclusion: Since both subset relations hold, R ∩ S = R − (R − S) ∎

Converting Mermaid diagram...

Practical Implications

Union Compatibility for Intersection

Like union, intersection requires union compatibility between its operands. The requirements are identical:

Same Degree: Both relations must have the same number of attributes
Domain Compatibility: Corresponding attributes must have compatible domains

However, the behavior implications differ from union:

For Union: Incompatible schemas make it impossible to create a coherent result relation

For Intersection: Even with compatible schemas, the intersection might be empty if there's no actual tuple overlap—which is a valid, meaningful result

Intersection Outcomes Based on Operand Relationship
Relationship	Result	Cardinality
R ⊆ S (R is subset of S)	R ∩ S = R	\|R ∩ S\| = \|R\|
S ⊆ R (S is subset of R)	R ∩ S = S	\|R ∩ S\| = \|S\|
R = S (identical)	R ∩ S = R = S	\|R ∩ S\| = \|R\| = \|S\|
R and S disjoint	R ∩ S = ∅	\|R ∩ S\| = 0
Partial overlap	Common tuples only	0 < \|R ∩ S\| < min(\|R\|, \|S\|)

Ensuring Compatibility:

When working with relations that are not directly union-compatible, projection and renaming can be used to achieve compatibility:

-- Original incompatible relations:
Employees(emp_id, name, department, salary, hire_date)
Contractors(contractor_id, full_name, dept_code, rate)

-- To find people working in both capacities (by name and department):
π_name,department(Employees) ∩ ρ_name←full_name,department←dept_code(π_full_name,dept_code(Contractors))

The steps:

Project Employees to just (name, department)
Project Contractors to (full_name, dept_code)
Rename Contractors' attributes to match Employees'
Compute intersection

This pattern is essential when comparing data from different sources with different schemas.

Semantic Compatibility

Visual Representation of Intersection

Visualizing intersection helps build intuition for identifying common tuples between relations. The classic Venn diagram representation is particularly effective for intersection.

Converting Mermaid diagram...

Concrete Table Example:

Consider two relations representing employees who have completed different training programs:

Relation R: Completed Safety Training
EmpID	Name	Department
E001	Alice Chen	Engineering
E002	Bob Smith	Engineering
E003	Carol Davis	Research
E004	David Lee	Marketing

Relation S: Completed Security Training
EmpID	Name	Department
E002	Bob Smith	Engineering
E003	Carol Davis	Research
E005	Eve Wilson	Finance
E006	Frank Brown	Research

Result: SafetyTraining ∩ SecurityTraining (Completed BOTH)
EmpID	Name	Department
E002	Bob Smith	Engineering
E003	Carol Davis	Research

Practical Interpretation

Query Tree Representation:

In query processing, intersection is represented as a binary operator with two child relations:

Converting Mermaid diagram...

Cardinality Analysis and Performance

Understanding intersection cardinality is crucial for query optimization, as intersection results can range from empty to the full size of the smaller operand.

Cardinality Bounds:

For R ∩ S:

0 ≤ |R ∩ S| ≤ min(|R|, |S|)

Lower Bound: Zero (when relations are completely disjoint)

Upper Bound: Size of the smaller relation (when smaller is a subset of larger)

Cardinality Scenarios for Intersection
\|R\|	\|S\|	Relationship	\|R ∩ S\|	Scenario
1000	500	Disjoint	0	No common tuples
1000	500	S ⊆ R	500	All of S is in R (maximum)
1000	1000	R = S	1000	Identical relations
1000	500	200 common	200	Partial overlap
1000	0	S empty	0	Empty operand yields empty result

Selectivity Considerations:

Intersection typically has low selectivity (produces small results relative to inputs) unless the relations have significant overlap. Query optimizers use this characteristic:

Early Intersection: When possible, computing intersection early reduces data volume for subsequent operations
Build-Side Selection: In hash-based intersection, the smaller relation should be the build side
Statistics Importance: Accurate overlap statistics are critical for cost estimation

Performance Characteristics:

Implementation	Time Complexity	Space Complexity	Best For
Sort-Merge	O(	R	log
Hash-based	O(	R	+
Index-based	O(	S	× log

Optimization Insight

intersection_algorithm.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
def hash_based_intersection(R: List[Tuple], S: List[Tuple]) -> List[Tuple]:
    """
    Hash-based intersection algorithm.
    Time: O(|R| + |S|) expected
    Space: O(min(|R|, |S|)) for hash table
    
    Strategy: Build hash set from smaller relation,
              probe with larger relation.
    """
    # Use smaller relation for hash table (build side)
    if len(R) <= len(S):
        build_relation, probe_relation = R, S
    else:
        build_relation, probe_relation = S, R
    
    # Build phase: Create hash set from smaller relation
    build_set = set(build_relation)
    
    # Probe phase: Check each tuple from larger relation
    result = []
    seen = set()  # Avoid duplicates in result
    
    for tuple_p in probe_relation:
        if tuple_p in build_set and tuple_p not in seen:
            result.append(tuple_p)
            seen.add(tuple_p)
    
    return result
 
 
def sort_merge_intersection(R: List[Tuple], S: List[Tuple]) -> List[Tuple]:
    """
    Sort-merge intersection algorithm.
    Time: O(|R| log|R| + |S| log|S|) for sorting
    Space: O(1) additional after sorting
    """
    sorted_R = sorted(R)
    sorted_S = sorted(S)
    
    result = []
    i, j = 0, 0
    last_added = None
    
    while i < len(sorted_R) and j < len(sorted_S):
        if sorted_R[i] < sorted_S[j]:
            i += 1
        elif sorted_R[i] > sorted_S[j]:
            j += 1
        else:  # Tuples are equal - found intersection
            if sorted_R[i] != last_added:
                result.append(sorted_R[i])
                last_added = sorted_R[i]
            i += 1
            j += 1
    
    return result
 
 
def intersection_via_difference(R: List[Tuple], S: List[Tuple]) -> List[Tuple]:
    """
    Intersection computed via the derivation formula:
    R ∩ S = R − (R − S)
    
    Demonstrates the theoretical derivability,
    but NOT recommended for production use.
    """
    def set_difference(A: List[Tuple], B: List[Tuple]) -> List[Tuple]:
        b_set = set(B)
        return [t for t in A if t not in b_set]
    
    # Step 1: R − S (tuples in R but not in S)
    r_minus_s = set_difference(R, S)
    
    # Step 2: R − (R − S) (remove R-only tuples from R)
    result = set_difference(R, r_minus_s)
    
    return list(set(result))  # Remove any remaining duplicates

Intersection vs. Join: Understanding the Difference

A common source of confusion for database learners is distinguishing between intersection and natural join. While both operations can identify related data, they are fundamentally different:

Intersection (∩):

Requires union-compatible operands (same schema)
Returns complete tuples that appear in both relations
Result has same schema as operands
Compares entire tuples

Natural Join (⋈):

Works with relations of any schemas; union compatibility is not required
Combines tuples based on matching attribute values for common attribute names
Result schema is the union of both schemas (shared attributes appear once)
Compares specific attribute values, not entire tuples

Use Intersection When

•Relations have identical schemas
•You need tuples that are exactly the same in both relations
•Finding common records between two sources with same structure
•Example: Customers in both Q1 and Q2 sales data

Use Join When

•Relations have different schemas with some shared attributes
•You need to combine information from both relations
•Linking related records across tables via a key
•Example: Combining customer info with their orders

Illustrative Comparison:

-- Relations:
R(A, B, C): {(1, 2, 3), (4, 5, 6), (7, 8, 9)}
S(A, B, C): {(1, 2, 3), (4, 5, 7), (10, 11, 12)}

-- Intersection: Same schema, find identical tuples
R ∩ S = {(1, 2, 3)}  -- Only tuple appearing in both

-- Now consider different schemas:
P(A, B): {(1, 2), (4, 5), (7, 8)}
Q(B, C): {(2, 3), (5, 6), (8, 100)}

-- Natural Join: Match on common attribute B, combine
P ⋈ Q = {(1, 2, 3), (4, 5, 6), (7, 8, 100)}

Note how intersection requires identical tuples, while join combines tuples based on matching attribute values and produces a wider result schema.
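
The comparison above can be reproduced with a minimal Python sketch. It assumes relations are modeled as sets of positional tuples; the helper name `natural_join_on_b` is hypothetical and hard-codes the shared attribute B for brevity rather than discovering common attribute names.

```python
# Relations from the example, modeled as sets of positional tuples.
R = {(1, 2, 3), (4, 5, 6), (7, 8, 9)}
S = {(1, 2, 3), (4, 5, 7), (10, 11, 12)}

# Intersection compares whole tuples, so (4, 5, 6) and (4, 5, 7) do not match.
r_intersect_s = R & S

P = {(1, 2), (4, 5), (7, 8)}      # schema P(A, B)
Q = {(2, 3), (5, 6), (8, 100)}    # schema Q(B, C)


def natural_join_on_b(p, q):
    """Hypothetical helper: join P(A, B) with Q(B, C) on the shared attribute B."""
    return {(a, b, c) for (a, b) in p for (b2, c) in q if b == b2}


p_join_q = natural_join_on_b(P, Q)

print(r_intersect_s)
print(sorted(p_join_q))
```

The intersection result keeps the three-attribute schema of R and S, while the join result widens from two attributes per operand to three.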

SQL Implementation: INTERSECT

SQL provides the INTERSECT operator to implement relational intersection. Understanding its behavior and compatibility with different database systems is essential for practical database work.

SQL INTERSECT Syntax:

SELECT column_list FROM table1
INTERSECT
SELECT column_list FROM table2;

SQL INTERSECT Variants:

SQL Operator	Behavior	Standard Support
INTERSECT	Eliminates duplicates (set semantics)	SQL Standard
INTERSECT ALL	Keeps min(m, n) copies of a row appearing m and n times (bag semantics)	SQL Standard; vendor support varies
INTERSECT DISTINCT	Same as INTERSECT	Explicit form
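
The difference between set and bag semantics can be sketched in Python. This is an illustration only, assuming relations are lists of tuples; the function names `intersect_distinct` and `intersect_all` are hypothetical and simply mirror the SQL keywords.

```python
from collections import Counter


def intersect_distinct(r, s):
    """Set semantics: each common row appears exactly once."""
    return sorted(set(r) & set(s))


def intersect_all(r, s):
    """Bag semantics: a row occurring m times in r and n times in s
    appears min(m, n) times in the result."""
    common = Counter(r) & Counter(s)  # Counter '&' keeps the minimum count per key
    return sorted(common.elements())


r = [(1,), (1,), (1,), (2,), (3,)]
s = [(1,), (1,), (2,), (2,), (4,)]

distinct_rows = intersect_distinct(r, s)
all_rows = intersect_all(r, s)

print(distinct_rows)
print(all_rows)
```

Here the row `(1,)` occurs three times in `r` and twice in `s`, so the bag version keeps two copies while the set version keeps one.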

sql_intersect_examples.sql
-- Basic INTERSECT: Find customers who ordered in both Q1 and Q2
SELECT customer_id, customer_name
FROM Q1_Orders
INTERSECT
SELECT customer_id, customer_name
FROM Q2_Orders;
 
-- Find products available in both US and EU warehouses
SELECT product_id, product_name, price
FROM US_Inventory
INTERSECT
SELECT product_id, product_name, price
FROM EU_Inventory;
 
-- Multi-way intersection: Find employees who belong to all three teams
SELECT employee_id, name
FROM Engineering_Team
INTERSECT
SELECT employee_id, name
FROM Research_Team
INTERSECT
SELECT employee_id, name
FROM Leadership_Team;
 
-- INTERSECT with ORDER BY (applies to final result)
SELECT product_id, product_name
FROM Premium_Products
INTERSECT
SELECT product_id, product_name
FROM Sale_Products
ORDER BY product_name;
 
-- Using INTERSECT to implement AND logic across rows of the same table:
-- customers who ordered from both Electronics and Books
SELECT customer_id
FROM Orders
WHERE product_category = 'Electronics'
INTERSECT
SELECT customer_id
FROM Orders
WHERE product_category = 'Books';
 
-- Equivalent subquery version:
SELECT DISTINCT customer_id
FROM Orders o1
WHERE o1.product_category = 'Electronics'
  AND EXISTS (
    SELECT 1 FROM Orders o2
    WHERE o2.customer_id = o1.customer_id
      AND o2.product_category = 'Books'
  );

Alternative Approaches (when INTERSECT unavailable):

-- Using EXISTS subquery:
SELECT DISTINCT a.col1, a.col2
FROM TableA a
WHERE EXISTS (
    SELECT 1 FROM TableB b
    WHERE b.col1 = a.col1 AND b.col2 = a.col2
);

-- Using INNER JOIN with DISTINCT:
SELECT DISTINCT a.col1, a.col2
FROM TableA a
INNER JOIN TableB b
    ON a.col1 = b.col1 AND a.col2 = b.col2;

-- Using IN (single-column case only):
SELECT DISTINCT col1
FROM TableA
WHERE col1 IN (SELECT col1 FROM TableB);

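The native INTERSECT operator is generally more readable and may be optimized better by the query planner. It also treats NULLs as equal when comparing rows, whereas the EXISTS, JOIN, and IN rewrites above use ordinary equality and therefore never match rows on NULL columns.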
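
As a quick check that the rewrites agree with INTERSECT, the sketch below runs three of the forms against an in-memory SQLite database using Python's standard `sqlite3` module. The rows are invented for illustration and deliberately contain no NULLs, since NULL comparison is where the rewrites can diverge from INTERSECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (col1 INTEGER, col2 TEXT);
    CREATE TABLE TableB (col1 INTEGER, col2 TEXT);
    INSERT INTO TableA VALUES (1, 'x'), (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO TableB VALUES (1, 'x'), (2, 'q'), (3, 'z'), (3, 'z');
""")

queries = {
    "intersect": """
        SELECT col1, col2 FROM TableA
        INTERSECT
        SELECT col1, col2 FROM TableB
    """,
    "exists": """
        SELECT DISTINCT a.col1, a.col2
        FROM TableA a
        WHERE EXISTS (
            SELECT 1 FROM TableB b
            WHERE b.col1 = a.col1 AND b.col2 = a.col2
        )
    """,
    "join": """
        SELECT DISTINCT a.col1, a.col2
        FROM TableA a
        INNER JOIN TableB b
            ON a.col1 = b.col1 AND a.col2 = b.col2
    """,
}

# Sort each result so the comparison does not depend on row order.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
conn.close()

for name, rows in results.items():
    print(name, rows)
```

The duplicate rows in both tables show why DISTINCT is needed in the EXISTS and JOIN forms: without it, they would return repeated rows that INTERSECT eliminates.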

Practical Applications of Intersection

Intersection is invaluable for queries that require identifying commonality across datasets. Here are key real-world applications:

Common Intersection Use Cases

•Multi-criteria Qualification: Finding entities that meet ALL of several criteria (e.g., employees who passed all required certifications)
•Cross-channel Analysis: Customers who engaged through multiple channels (web AND mobile)
•Time-based Retention: Users active in both current and previous periods
•Multi-source Validation: Records that appear in multiple independent sources (data quality)
•Feature Overlap: Products/services with features from multiple categories
•Compliance Checking: Items satisfying requirements from multiple regulatory frameworks

Case Study: Multi-Certification Policy

A company requires all project managers to hold both PMP and Agile certifications. The database stores certifications in a single Certifications table, one credential per record:

-- Find employees with both PMP AND Agile certifications:
π_emp_id,name(σ_cert='PMP'(Certifications))
∩
π_emp_id,name(σ_cert='Agile'(Certifications))

This intersection finds the exact set of employees qualified to lead projects under the dual-certification policy.
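
A minimal Python sketch of this query follows. The sample rows and the helper name `holders` are hypothetical; the helper combines the selection on cert with the projection onto (emp_id, name), returning a set so that duplicates are eliminated as in relational algebra.

```python
# Hypothetical rows of Certifications(emp_id, name, cert), invented for illustration.
certifications = [
    (101, "Ana", "PMP"),
    (101, "Ana", "Agile"),
    (102, "Ben", "PMP"),
    (103, "Chen", "Agile"),
    (104, "Dee", "PMP"),
    (104, "Dee", "Agile"),
    (104, "Dee", "Six Sigma"),
]


def holders(rows, cert):
    """Select rows with the given cert, then project onto (emp_id, name)."""
    return {(emp_id, name) for (emp_id, name, c) in rows if c == cert}


# Intersect the two projections: employees holding both credentials.
qualified = holders(certifications, "PMP") & holders(certifications, "Agile")

print(sorted(qualified))
```

Projecting before intersecting matters: the raw rows for one employee differ in the cert attribute, so intersecting unprojected selections would always be empty.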

Case Study: Customer Retention Analysis

Identify customers who made purchases in both 2024 and 2025 (retained customers):

-- Retained customers:
π_customer_id,customer_name(σ_year=2024(Sales))
∩
π_customer_id,customer_name(σ_year=2025(Sales))

The result represents the company's retained customer base—a critical metric for business analysis.
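
The same expression translates directly to SQL INTERSECT. The sketch below runs it on an in-memory SQLite database through Python's standard `sqlite3` module; the Sales schema is reduced to the three attributes the query uses, and the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (customer_id INTEGER, customer_name TEXT, year INTEGER);
    INSERT INTO Sales VALUES
        (1, 'Acme', 2024), (1, 'Acme', 2025), (1, 'Acme', 2025),
        (2, 'Birch', 2024),
        (3, 'Cedar', 2025),
        (4, 'Dune', 2024), (4, 'Dune', 2025);
""")

# Each SELECT plays the role of one projected selection; INTERSECT keeps the common rows.
retained = conn.execute("""
    SELECT customer_id, customer_name FROM Sales WHERE year = 2024
    INTERSECT
    SELECT customer_id, customer_name FROM Sales WHERE year = 2025
    ORDER BY customer_id
""").fetchall()
conn.close()

print(retained)
```

Customer 1 has two sales in 2025 but appears once in the result, because INTERSECT eliminates duplicates.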

Summary: Mastering the Intersection Operation

We've explored the intersection operation comprehensively. Let's consolidate the key knowledge:

Key Takeaways

•Definition: R ∩ S produces a relation containing only tuples that appear in BOTH R and S
•Union Compatibility: Like union, intersection requires same degree and compatible domains
•Derivability: Intersection is NOT primitive—it can be derived from difference: R ∩ S = R − (R − S)
•Cardinality: 0 ≤ |R ∩ S| ≤ min(|R|, |S|); intersection never increases data volume
•Properties: Commutative, associative, idempotent; obeys De Morgan's laws
•Implementation: Hash-based runs in O(|R| + |S|) expected time; sort-merge runs in O(|R| log|R| + |S| log|S|) and needs no hash table
•vs. Join: Intersection compares complete tuples; join matches on specific attributes
•SQL: Use INTERSECT operator; alternatives include EXISTS subqueries and JOINs
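
Several of these takeaways can be spot-checked with Python's built-in sets. This is a sanity check on sample data, not a proof; the function name `check_intersection_laws` is hypothetical, and De Morgan's law is checked relative to an assumed universe U taken as the union of the three sample relations.

```python
def check_intersection_laws(R, S, T):
    """Return True if the listed properties hold for the given sets of tuples."""
    U = R | S | T  # assumed universe for the complement in De Morgan's law
    return all([
        R & S == S & R,                           # commutative
        (R & S) & T == R & (S & T),               # associative
        R & R == R,                               # idempotent
        R & S == R - (R - S),                     # derivable from difference
        0 <= len(R & S) <= min(len(R), len(S)),   # cardinality bound
        U - (R & S) == (U - R) | (U - S),         # De Morgan, relative to U
    ])


R = {(1, "a"), (2, "b"), (3, "c")}
S = {(2, "b"), (3, "c"), (4, "d")}
T = {(3, "c"), (5, "e")}

laws_hold = check_intersection_laws(R, S, T)
print(laws_hold)
```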
