Loading learning content...
While union combines all tuples from two relations, intersection identifies what is common to both. The intersection operation answers a fundamentally different question: "Which tuples exist in BOTH relations?"
Intersection is indispensable when you need to find overlapping data—customers who purchased from both product lines, employees who have skills in multiple domains, or records that satisfy multiple independent criteria. Understanding intersection deeply involves not just its definition, but also its fascinating theoretical property: it can be derived from other fundamental operations.
This page delivers a complete mastery of the intersection operation, from mathematical foundations through practical implementation, including a rigorous proof of its derivability from set difference.
By the end of this page, you will master the formal definition of intersection, understand why it's not a primitive operator (and can be derived from difference), compare intersection with other operations, analyze performance characteristics, and apply intersection to solve real-world database queries.
The intersection operation in relational algebra, like union, is grounded in classical set theory. Understanding this foundation provides the rigorous framework needed for precise query formulation.
Set-Theoretic Definition:
In classical set theory, given two sets A and B, the intersection A ∩ B is defined as:
A ∩ B = { x | x ∈ A ∧ x ∈ B }
This reads: "The intersection of A and B is the set of all elements x such that x is a member of A AND x is a member of B."
The key characteristics of set intersection include:
Conceptually, intersection acts as a mutual filter: it retains only those tuples that 'pass through' both relations. This filtering perspective is helpful when reasoning about query results—the intersection can never contain more tuples than the smaller of its operands.
Relational Intersection Definition:
Formally, given two union-compatible relations R and S, the intersection R ∩ S is defined as:
R ∩ S = { t | t ∈ R ∧ t ∈ S }
Where:
Critical Observation: The intersection of two relations is always a subset of both operands. This has important implications for cardinality analysis.
| Property | Definition | Significance |
|---|---|---|
| Commutativity | R ∩ S = S ∩ R | Order of operands does not affect result |
| Associativity | (R ∩ S) ∩ T = R ∩ (S ∩ T) | Grouping of multiple intersections doesn't matter |
| Idempotence | R ∩ R = R | Intersection of a relation with itself yields the original |
| Identity | R ∩ Universal = R | Intersection with universal relation yields original |
| Annihilation | R ∩ ∅ = ∅ | Intersection with empty relation yields empty |
| Absorption | R ∩ (R ∪ S) = R | Intersection absorbs union with same operand |
| Distribution over Union | R ∩ (S ∪ T) = (R ∩ S) ∪ (R ∩ T) | Intersection distributes over union |
| Subset Result | R ∩ S ⊆ R and R ∩ S ⊆ S | Result is subset of both operands |
De Morgan's Laws in Relational Algebra:
The famous De Morgan's laws from set theory also apply to relational algebra:
These laws are invaluable for query transformation and optimization, allowing queries to be restructured into equivalent forms that may be more efficiently executable.
A remarkable theoretical property of relational intersection is that it is not a primitive operator. Unlike selection, projection, union, difference, and Cartesian product (the five fundamental operations), intersection can be derived from these primitives.
The Derivation:
Intersection can be expressed using only set difference:
R ∩ S = R − (R − S)
Equivalently:
R ∩ S = S − (S − R)
Why does this work?
Let's trace through the logic step by step:
Think of it this way: (R − S) removes from R everything that R doesn't share with S. When we then compute R − (R − S), we're removing those 'not shared' tuples from R, leaving only the shared tuples—the intersection.
Formal Proof:
We prove R ∩ S = R − (R − S) by showing mutual subset inclusion.
Part 1: R ∩ S ⊆ R − (R − S)
Let t ∈ R ∩ S (arbitrary tuple in the intersection)
Part 2: R − (R − S) ⊆ R ∩ S
Let t ∈ R − (R − S) (arbitrary tuple in the difference)
Conclusion: Since both subset relations hold, R ∩ S = R − (R − S) ∎
Despite being derivable, most database systems implement intersection as a distinct operation for efficiency. Computing R − (R − S) requires two difference operations, while a direct intersection implementation can be more efficient. The derivability is theoretically important (proving relational algebra's closure properties) but not necessarily the implementation strategy.
Like union, intersection requires union compatibility between its operands. The requirements are identical:
However, the behavior implications differ from union:
For Union: Incompatible schemas make it impossible to create a coherent result relation
For Intersection: Even with compatible schemas, the intersection might be empty if there's no actual tuple overlap—which is a valid, meaningful result
| Relationship | Result | Cardinality |
|---|---|---|
| R ⊆ S (R is subset of S) | R ∩ S = R | |R ∩ S| = |R| |
| S ⊆ R (S is subset of R) | R ∩ S = S | |R ∩ S| = |S| |
| R = S (identical) | R ∩ S = R = S | |R ∩ S| = |R| = |S| |
| R and S disjoint | R ∩ S = ∅ | |R ∩ S| = 0 |
| Partial overlap | Common tuples only | 0 < |R ∩ S| < min(|R|, |S|) |
Ensuring Compatibility:
When working with relations that are not directly union-compatible, projection and renaming can be used to achieve compatibility:
-- Original incompatible relations:
Employees(emp_id, name, department, salary, hire_date)
Contractors(contractor_id, full_name, dept_code, rate)
-- To find people working in both capacities (by name and department):
π_name,department(Employees) ∩ ρ_name←full_name,department←dept_code(π_full_name,dept_code(Contractors))
The steps:
This pattern is essential when comparing data from different sources with different schemas.
Domain compatibility alone doesn't guarantee meaningful results. Even if two VARCHAR columns are compatible, intersecting 'customer_name' with 'product_name' is semantically meaningless. Always ensure that the attributes being compared represent the same real-world concept.
Visualizing intersection helps build intuition for identifying common tuples between relations. The classic Venn diagram representation is particularly effective for intersection.
Concrete Table Example:
Consider two relations representing employees who have completed different training programs:
| EmpID | Name | Department |
|---|---|---|
| E001 | Alice Chen | Engineering |
| E002 | Bob Smith | Engineering |
| E003 | Carol Davis | Research |
| E004 | David Lee | Marketing |
| EmpID | Name | Department |
|---|---|---|
| E002 | Bob Smith | Engineering |
| E003 | Carol Davis | Research |
| E005 | Eve Wilson | Finance |
| E006 | Frank Brown | Research |
| EmpID | Name | Department |
|---|---|---|
| E002 | Bob Smith | Engineering |
| E003 | Carol Davis | Research |
The result shows employees who have completed BOTH training programs. Only Bob Smith and Carol Davis appear in both input relations. This is exactly the kind of query that intersection is designed for—finding overlap between two sets based on complete tuple matching.
Query Tree Representation:
In query processing, intersection is represented as a binary operator with two child relations:
Understanding intersection cardinality is crucial for query optimization, as intersection results can range from empty to the full size of the smaller operand.
Cardinality Bounds:
For R ∩ S:
0 ≤ |R ∩ S| ≤ min(|R|, |S|)
Lower Bound: Zero (when relations are completely disjoint)
Upper Bound: Size of the smaller relation (when smaller is a subset of larger)
| |R| | |S| | Relationship | |R ∩ S| | Scenario |
|---|---|---|---|---|
| 1000 | 500 | Disjoint | 0 | No common tuples |
| 1000 | 500 | S ⊆ R | 500 | All of S is in R (maximum) |
| 1000 | 1000 | R = S | 1000 | Identical relations |
| 1000 | 500 | 200 common | 200 | Partial overlap |
| 1000 | 0 | S empty | 0 | Empty operand yields empty result |
Selectivity Considerations:
Intersection typically has low selectivity (produces small results relative to inputs) unless the relations have significant overlap. Query optimizers use this characteristic:
Performance Characteristics:
| Implementation | Time Complexity | Space Complexity | Best For |
|---|---|---|---|
| Sort-Merge | O( | R | log |
| Hash-based | O( | R | + |
| Index-based | O( | S | × log |
Because |R ∩ S| ≤ min(|R|, |S|), intersection never increases data volume. This makes it an excellent 'early filter' in query plans. If you need to intersect with other operations, doing intersection first often reduces the cost of subsequent operations.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677
def hash_based_intersection(R: List[Tuple], S: List[Tuple]) -> List[Tuple]: """ Hash-based intersection algorithm. Time: O(|R| + |S|) expected Space: O(min(|R|, |S|)) for hash table Strategy: Build hash set from smaller relation, probe with larger relation. """ # Use smaller relation for hash table (build side) if len(R) <= len(S): build_relation, probe_relation = R, S else: build_relation, probe_relation = S, R # Build phase: Create hash set from smaller relation build_set = set(build_relation) # Probe phase: Check each tuple from larger relation result = [] seen = set() # Avoid duplicates in result for tuple_p in probe_relation: if tuple_p in build_set and tuple_p not in seen: result.append(tuple_p) seen.add(tuple_p) return result def sort_merge_intersection(R: List[Tuple], S: List[Tuple]) -> List[Tuple]: """ Sort-merge intersection algorithm. Time: O(|R| log|R| + |S| log|S|) for sorting Space: O(1) additional after sorting """ sorted_R = sorted(R) sorted_S = sorted(S) result = [] i, j = 0, 0 last_added = None while i < len(sorted_R) and j < len(sorted_S): if sorted_R[i] < sorted_S[j]: i += 1 elif sorted_R[i] > sorted_S[j]: j += 1 else: # Tuples are equal - found intersection if sorted_R[i] != last_added: result.append(sorted_R[i]) last_added = sorted_R[i] i += 1 j += 1 return result def intersection_via_difference(R: List[Tuple], S: List[Tuple]) -> List[Tuple]: """ Intersection computed via the derivation formula: R ∩ S = R − (R − S) Demonstrates the theoretical derivability, but NOT recommended for production use. """ def set_difference(A: List[Tuple], B: List[Tuple]) -> List[Tuple]: b_set = set(B) return [t for t in A if t not in b_set] # Step 1: R − S (tuples in R but not in S) r_minus_s = set_difference(R, S) # Step 2: R − (R − S) (remove R-only tuples from R) result = set_difference(R, r_minus_s) return list(set(result)) # Remove any remaining duplicatesA common source of confusion for database learners is distinguishing between intersection and natural join. While both operations can identify related data, they are fundamentally different:
Intersection (∩):
Natural Join (⋈):
Illustrative Comparison:
-- Relations:
R(A, B, C): {(1, 2, 3), (4, 5, 6), (7, 8, 9)}
S(A, B, C): {(1, 2, 3), (4, 5, 7), (10, 11, 12)}
-- Intersection: Same schema, find identical tuples
R ∩ S = {(1, 2, 3)} -- Only tuple appearing in both
-- Now consider different schemas:
P(A, B): {(1, 2), (4, 5), (7, 8)}
Q(B, C): {(2, 3), (5, 6), (8, 100)}
-- Natural Join: Match on common attribute B, combine
P ⋈ Q = {(1, 2, 3), (4, 5, 6), (7, 8, 100)}
Note how intersection requires identical tuples, while join combines tuples based on matching attribute values and produces a wider result schema.
Don't use intersection when you mean join! If you're trying to find 'employees and their departments,' you need a join (linking employee table to department table via department_id). Intersection would only make sense if both tables had the exact same structure and you wanted to find records appearing in both.
SQL provides the INTERSECT operator to implement relational intersection. Understanding its behavior and compatibility with different database systems is essential for practical database work.
SQL INTERSECT Syntax:
SELECT column_list FROM table1
INTERSECT
SELECT column_list FROM table2;
SQL INTERSECT Variants:
| SQL Operator | Behavior | Standard Support |
|---|---|---|
| INTERSECT | Eliminates duplicates (set semantics) | SQL Standard |
| INTERSECT ALL | Preserves duplicate count (bag semantics) | Some vendors |
| INTERSECT DISTINCT | Same as INTERSECT | Explicit form |
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051
-- Basic INTERSECT: Find customers who ordered in both Q1 and Q2SELECT customer_id, customer_nameFROM Q1_OrdersINTERSECTSELECT customer_id, customer_nameFROM Q2_Orders; -- Find products available in both US and EU warehousesSELECT product_id, product_name, priceFROM US_InventoryINTERSECTSELECT product_id, product_name, priceFROM EU_Inventory; -- Multi-way intersection: Find employees in all three departmentsSELECT employee_id, nameFROM Engineering_TeamINTERSECTSELECT employee_id, nameFROM Research_TeamINTERSECTSELECT employee_id, nameFROM Leadership_Team; -- INTERSECT with ORDER BY (applies to final result)SELECT product_id, product_nameFROM Premium_ProductsINTERSECTSELECT product_id, product_nameFROM Sale_ProductsORDER BY product_name; -- Using INTERSECT to implement AND logic across tables-- Equivalent to subquery approach:SELECT customer_idFROM OrdersWHERE product_category = 'Electronics'INTERSECTSELECT customer_idFROM OrdersWHERE product_category = 'Books'; -- Equivalent subquery version:SELECT DISTINCT customer_idFROM Orders o1WHERE o1.product_category = 'Electronics' AND EXISTS ( SELECT 1 FROM Orders o2 WHERE o2.customer_id = o1.customer_id AND o2.product_category = 'Books' );INTERSECT is supported by most major database systems including PostgreSQL, SQL Server, Oracle, SQLite, and MySQL (8.0+). Some older MySQL versions didn't support INTERSECT, requiring workarounds using EXISTS subqueries or INNER JOINs.
Alternative Approaches (when INTERSECT unavailable):
-- Using EXISTS subquery:
SELECT DISTINCT a.col1, a.col2
FROM TableA a
WHERE EXISTS (
SELECT 1 FROM TableB b
WHERE b.col1 = a.col1 AND b.col2 = a.col2
);
-- Using INNER JOIN with DISTINCT:
SELECT DISTINCT a.col1, a.col2
FROM TableA a
INNER JOIN TableB b
ON a.col1 = b.col1 AND a.col2 = b.col2;
-- Using IN with composite key (if single-valued):
SELECT col1
FROM TableA
WHERE col1 IN (SELECT col1 FROM TableB);
The native INTERSECT operator is generally more readable and may be optimized better by the query planner.
Intersection is invaluable for queries that require identifying commonality across datasets. Here are key real-world applications:
Case Study: Multi-Certification Policy
A company requires all project managers to hold both PMP and Agile certifications. The database stores certifications in separate tables (one credential per record):
-- Find employees with both PMP AND Agile certifications:
π_emp_id,name(σ_cert='PMP'(Certifications))
∩
π_emp_id,name(σ_cert='Agile'(Certifications))
This intersection finds the exact set of employees qualified to lead projects under the dual-certification policy.
Case Study: Customer Retention Analysis
Identify customers who made purchases in both 2024 and 2025 (retained customers):
-- Retained customers:
π_customer_id,customer_name(σ_year=2024(Sales))
∩
π_customer_id,customer_name(σ_year=2025(Sales))
The result represents the company's retained customer base—a critical metric for business analysis.
Whenever you see 'AND' in a requirement that spans different records (not columns within a record), think intersection. 'Customers who bought A AND B' requires intersection because A and B are separate purchase records, not columns in a single row.
We've explored the intersection operation comprehensively. Let's consolidate the key knowledge:
You now have thorough mastery of the intersection operation in relational algebra. You understand its mathematical foundations, can prove its derivability from set difference, recognize when to use intersection vs. join, and can implement intersection in both relational algebra and SQL. Next, we'll explore set difference (−), which identifies tuples in one relation but not another.