Set Operations - Learning Module

Loading content...

0/252

Union Compatibility

The Gateway to Set Operations: Union Compatibility

Every set operation in relational algebra—union, intersection, and difference—requires a critical precondition: union compatibility. Without it, these operations are undefined, because the resulting relation would have no coherent schema.

Union compatibility ensures that when we combine tuples from two relations, the result forms a valid relation with a well-defined structure. It is the gatekeeper that preserves the integrity of the relational model.

This page provides exhaustive coverage of union compatibility: the formal requirements, how database systems evaluate compatibility, common mistakes, and practical techniques for making incompatible relations compatible through projection and renaming.

What You Will Learn

By the end of this page, you will master the formal definition of union compatibility, understand the difference between structural and semantic compatibility, know how database systems resolve schema differences, learn techniques to achieve compatibility between mismatched relations, and recognize common compatibility errors and how to avoid them.

Formal Definition of Union Compatibility

Union compatibility (also called type compatibility or schema compatibility) is a relationship between two relations that permits set operations. It is a necessary condition—without it, union, intersection, and difference are undefined.

Formal Definition:

Two relations R(A₁, A₂, ..., Aₙ) and S(B₁, B₂, ..., Bₘ) are union-compatible if and only if:

Same Degree (Arity): n = m
- Both relations must have the same number of attributes
Domain Compatibility: For each i from 1 to n:
- dom(Aᵢ) is compatible with dom(Bᵢ)
- Corresponding attributes must have compatible data types

Notation:

We write R ≈ S to denote that R and S are union-compatible.

If R ≈ S, then:

R ∪ S is defined
R ∩ S is defined
R − S is defined
S − R is defined

Attribute Names Don't Matter (Theoretically)

In pure relational algebra, union compatibility is based on position, not names. The first attribute of R is compared with the first attribute of S, regardless of what they're called. However, in practice (especially SQL), having meaningful, matching names improves query clarity and some systems offer name-based compatibility modes.

Union Compatibility Requirements
Requirement	Definition	Example
Same Degree	Both relations have equal number of attributes	R(A,B,C) and S(X,Y,Z) both have degree 3 ✓
Domain Compatibility	For each position i, dom(Rᵢ) ≈ dom(Sᵢ)	INT compatible with INT, VARCHAR with VARCHAR
Position-Based	Comparison by attribute position, not name	R.name (pos 1) vs S.product_name (pos 1)

What Union Compatibility Does NOT Require:

Same attribute names: Names can differ between relations
Identical domains: Compatible domains (not identical) are sufficient
Same cardinality: Relations can have different numbers of tuples
Same constraints: Constraints are not compared for compatibility

Result Schema:

When R ∪ S (or R ∩ S, R − S) is computed, the result schema is typically determined by:

Attribute names from the first operand (R)
Domains resolved to accommodate both operands (widening)
Constraints are NOT automatically inherited

Domain Compatibility: Types and Rules

Domain compatibility is more nuanced than simple type equality. Database systems typically allow compatible types that can be meaningfully compared, even if not identical.

Compatible Type Categories:

Domains are generally compatible if they belong to the same type family and values can be compared without loss of information:

Common Domain Compatibility Groups
Type Family	Compatible Types	Notes
Integer Numeric	TINYINT, SMALLINT, INT, BIGINT	Smaller types widen to larger
Decimal Numeric	FLOAT, DOUBLE, DECIMAL, NUMERIC	Precision may vary
Fixed String	CHAR(n) with various n	Padding differences handled
Variable String	VARCHAR(n) with various n	Length limits may widen
Date/Time	DATE, TIME, DATETIME, TIMESTAMP	Usually compatible within family
Boolean	BOOLEAN, BIT, TINYINT(1)	Vendor-specific variations

Compatible Domain Pairs

•INTEGER and BIGINT
•VARCHAR(50) and VARCHAR(100)
•CHAR(10) and CHAR(20)
•FLOAT and DOUBLE
•DECIMAL(10,2) and DECIMAL(12,4)
•DATE and DATE
•NUMERIC and DECIMAL

Incompatible Domain Pairs

•INTEGER and VARCHAR
•DATE and INTEGER
•BOOLEAN and FLOAT
•BLOB and VARCHAR
•TIMESTAMP and DECIMAL
•JSON and INTEGER
•ARRAY and SCALAR types

Domain Resolution (Widening):

When compatible domains differ, the result domain is typically the "wider" type that can accommodate values from both:

INT + BIGINT → BIGINT (larger integer type)
VARCHAR(50) + VARCHAR(100) → VARCHAR(100) (larger capacity)
FLOAT + DOUBLE → DOUBLE (higher precision)
DECIMAL(10,2) + DECIMAL(8,4) → DECIMAL(12,4) (covers both)

Implicit Conversion:

Some databases perform implicit conversion for set operations:

-- This might work (integer to string conversion)
SELECT id, name FROM Table1  -- id is INT
UNION
SELECT product_code, name FROM Table2  -- product_code is VARCHAR

However, relying on implicit conversion is risky and not recommended. Explicit casting ensures clarity and portability.

Implicit Conversion Dangers

While some databases allow implicit conversion between types (e.g., INT to VARCHAR), this can lead to subtle bugs. The same integer might become different strings depending on leading zeros, localization settings, or padding. Always use explicit CAST when type conversion is needed.

Structural vs. Semantic Compatibility

A critical distinction that many database learners overlook is the difference between structural compatibility (what the database checks) and semantic compatibility (what makes business sense).

Structural Compatibility:

Same number of columns
Compatible data types for corresponding columns
Verified automatically by the database engine
If unsatisfied, results in a compile-time or execution error

Semantic Compatibility:

Columns represent the same real-world concept
Values are comparable in meaning, not just type
NOT verified by the database (human responsibility)
If violated, results in meaningless but syntactically valid results

The Semantic Trap

A database will happily union 'customer_age' with 'product_price' if both are INTEGER. The query runs without error, but the result is garbage. The database checks syntax; YOU must check semantics.

Examples of Semantic Mismatches:

-- Structurally compatible, semantically WRONG:

-- Example 1: Different entities
SELECT id, name FROM Customers
UNION
SELECT id, title FROM Products;  -- Meaningless mix!

-- Example 2: Different units
SELECT product_id, price_usd FROM US_Store
UNION
SELECT product_id, price_eur FROM EU_Store;  -- Currency mismatch!

-- Example 3: Different time zones
SELECT event_id, timestamp_utc FROM US_Events
UNION
SELECT event_id, timestamp_local FROM EU_Events;  -- Time zone chaos!

-- Example 4: Different scales
SELECT account_id, balance_cents FROM SystemA
UNION
SELECT account_id, balance_dollars FROM SystemB;  -- 100x error!

In each case, the query executes successfully but produces nonsensical or dangerous results.

Semantic Compatibility Checklist

•Same Entity Type: Are both relations representing the same kind of thing (customers, products, orders)?
•Same Attribute Meaning: Do corresponding columns mean the same thing (price, not one being quantity)?
•Same Units: Are numeric values in the same units (dollars vs cents, meters vs feet)?
•Same Time Reference: Are timestamps in the same timezone, using the same epoch?
•Same Encoding: Are codes/IDs from the same system with the same meaning?
•Same Granularity: Are values at the same level of detail (daily vs monthly totals)?

Best Practice

Before executing any set operation, verbalize what you're combining: 'I'm unioning customer records from last year with customer records from this year.' If the statement sounds coherent, proceed. If it sounds like 'I'm unioning customers with products,' stop and reconsider.

Achieving Compatibility: Projection and Renaming

When relations are not naturally union-compatible, we can often achieve compatibility through the strategic use of projection (π) and renaming (ρ). These operations allow us to reshape relations for set operations.

Strategy 1: Projection to Shared Schema

When relations have different degrees, project both to a common subset of compatible attributes:

-- Original incompatible relations:
Customers(cust_id, name, email, phone, address)
Suppliers(supp_id, company_name, contact_email)

-- Project to common compatible schema:
π_{cust_id, name}(Customers) ∪ π_{supp_id, company_name}(Suppliers)

-- Now both have degree 2 with compatible domains

Strategy 2: Renaming for Clarity

While attribute names don't affect compatibility, renaming improves clarity when combining relations:

-- Rename for unified result schema:
ρ_{id, name, email}(π_{cust_id, name, email}(Customers))
∪
ρ_{id, name, email}(π_{supp_id, company_name, contact_email}(Suppliers))

-- Result has clean, unified attribute names: (id, name, email)

Strategy 3: Adding Constant Columns

When one relation is missing an attribute present in another, use constants:

-- SQL: Add source indicator
SELECT cust_id, name, email, 'Customer' AS source_type
FROM Customers
UNION
SELECT supp_id, company_name, contact_email, 'Supplier' AS source_type
FROM Suppliers;

compatibility_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- Problem: Relations with different structures
-- Customers(id INT, name VARCHAR, email VARCHAR, loyalty_points INT)
-- Prospects(lead_id INT, full_name VARCHAR, contact_email VARCHAR)
 
-- Solution 1: Project to common columns
SELECT id, name, email FROM Customers
UNION
SELECT lead_id, full_name, contact_email FROM Prospects;
 
-- Solution 2: Add NULL padding for missing columns
SELECT id, name, email, loyalty_points, 'customer' AS type
FROM Customers
UNION
SELECT lead_id, full_name, contact_email, NULL, 'prospect' AS type
FROM Prospects;
 
-- Solution 3: Use CAST for type compatibility issues
SELECT id, CAST(product_code AS VARCHAR(50)), name FROM Table1
UNION
SELECT id, sku, name FROM Table2;
 
-- Solution 4: Reorder columns for positional matching
-- If Table1 has (A, B, C) and Table2 has (C, A, B)
SELECT A, B, C FROM Table1
UNION
SELECT A, B, C FROM Table2;  -- Explicitly order to match
 
-- Solution 5: Handle NULL-heavy columns with COALESCE
SELECT id, name, COALESCE(email, 'no-email') AS email
FROM Customers
UNION
SELECT id, name, COALESCE(contact, 'no-contact') AS email
FROM Vendors;

Order Matters in Projection

When projecting for union compatibility, ensure columns are listed in the corresponding order. π_{A,B,C}(R) ∪ π_{X,Y,Z}(S) matches A with X, B with Y, C with Z. Mismatched order with mismatched semantics leads to nonsensical results.

Common Compatibility Errors and Solutions

Understanding common union compatibility errors helps you diagnose issues quickly and write correct queries from the start. Here are the most frequent problems and their solutions:

Common Union Compatibility Errors
Error	Cause	Solution
Different column count	SELECT * from tables with different schemas	Use explicit column lists with same count
Type mismatch	Corresponding columns have incompatible types	Use CAST/CONVERT to unify types
Column order mismatch	Attributes listed in different order	Reorder SELECT columns consistently
Implicit vs explicit NULL	One query has NULLs where other has values	Use COALESCE or add NULL constants
Missing constants	Need discriminator column	Add literal values in SELECT
Subquery degree error	Subquery returns wrong column count	Verify subquery schema matches

error_examples.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
-- ERROR 1: Different column count
-- Bad:
SELECT * FROM Customers  -- 5 columns
UNION
SELECT * FROM SimpleContacts;  -- 3 columns
-- Error: Number of columns must match
 
-- Fix:
SELECT id, name, email FROM Customers
UNION
SELECT contact_id, contact_name, email FROM SimpleContacts;
 
 
-- ERROR 2: Type mismatch
-- Bad:
SELECT order_id, order_date FROM Orders  -- order_id is INT
UNION
SELECT invoice_no, invoice_date FROM Invoices;  -- invoice_no is VARCHAR
-- Error (in strict mode): Type mismatch for column 1
 
-- Fix:
SELECT CAST(order_id AS VARCHAR(20)), order_date FROM Orders
UNION
SELECT invoice_no, invoice_date FROM Invoices;
 
 
-- ERROR 3: Column order mismatch (semantic error, not syntax)
-- Bad (runs but wrong):
SELECT name, email, id FROM TableA
UNION
SELECT id, name, email FROM TableB;
-- Result mixes name with id, email with name!
 
-- Fix: Ensure consistent column order
SELECT id, name, email FROM TableA
UNION
SELECT id, name, email FROM TableB;
 
 
-- ERROR 4: Aggregate mismatch
-- Bad:
SELECT customer_id, SUM(amount) FROM Orders GROUP BY customer_id
UNION
SELECT customer_id, amount FROM SinglePurchases;  -- amount vs SUM
-- Might run but semantics are wrong (aggregate vs detail)
 
-- Fix: Either aggregate both or detail both
SELECT customer_id, SUM(amount) FROM Orders GROUP BY customer_id
UNION
SELECT customer_id, SUM(amount) FROM SinglePurchases GROUP BY customer_id;
 
 
-- ERROR 5: UNION in wrong position
-- Bad:
SELECT a FROM t1 ORDER BY a
UNION
SELECT a FROM t2;  -- ORDER BY before UNION is often an error
 
-- Fix: ORDER BY after all UNION operations
SELECT a FROM t1
UNION
SELECT a FROM t2
ORDER BY a;

Debug Strategy

When union compatibility errors occur: 1) Run each SELECT independently to verify it works, 2) Compare column counts (SELECT NULL, NULL, ... to match), 3) Check data types of each position, 4) Use explicit CASTs if types differ. Most IDEs will show column types on hover.

Database System Differences

Different database systems have varying levels of strictness and behavior regarding union compatibility. Understanding these differences is essential for writing portable SQL and diagnosing cross-platform issues.

Union Compatibility Behavior by Database
Database	Type Strictness	Implicit Conversion	Name Resolution
PostgreSQL	Strict	Limited	First query names used
MySQL	Lenient	Extensive	First query names used
SQL Server	Moderate	Common types	First query names used
Oracle	Moderate	Common types	First query names used
SQLite	Very lenient	Dynamic typing	First query names used

PostgreSQL (Strictest):

Requires exact type matches or explicit compatible types
Won't implicitly convert INTEGER to VARCHAR
Clear error messages for type mismatches

-- PostgreSQL: This fails
SELECT 1 UNION SELECT 'a';  -- ERROR: types integer and text cannot be matched

-- Fix requires explicit cast
SELECT CAST(1 AS TEXT) UNION SELECT 'a';

MySQL (Most Lenient):

Extensive implicit type conversion
May convert between very different types
Can produce unexpected results

-- MySQL: This works (but might not be what you want)
SELECT 1 UNION SELECT 'abc';  -- Returns 1 and 'abc' as strings

SQLite (Dynamic Typing):

Type affinity rather than strict types
Almost any combination works
Least predictable for complex types

-- SQLite: Very permissive
SELECT 123 UNION SELECT 'hello' UNION SELECT 45.67;

Portability Strategy

For maximum portability, write for the strictest system (PostgreSQL). Use explicit CASTs, match types exactly, and specify column lists explicitly. Code that works on PostgreSQL will work everywhere; code that only works on MySQL may fail on stricter systems.

Advanced Compatibility Patterns

Beyond basic projection and casting, several advanced patterns help achieve compatibility in complex scenarios.

Pattern 1: Synthetic Key Generation

When combining relations with incompatible keys, generate a synthetic unified key:

-- Create unified key with source prefix
SELECT CONCAT('CUS_', CAST(customer_id AS VARCHAR(20))) AS unified_id,
       name, email, 'Customer' AS source
FROM Customers
UNION
SELECT CONCAT('SUP_', CAST(supplier_id AS VARCHAR(20))) AS unified_id,
       company_name, contact_email, 'Supplier' AS source
FROM Suppliers;

Pattern 2: Polymorphic Schema with NULLs

Create a "supertype" schema that accommodates attributes from all subtypes:

-- Union of different entity types with type-specific columns
SELECT 'Employee' AS entity_type,
       emp_id AS id,
       name,
       salary,
       NULL AS hourly_rate,
       NULL AS project_code
FROM Employees
UNION ALL
SELECT 'Contractor' AS entity_type,
       contractor_id,
       name,
       NULL,
       hourly_rate,
       project_code
FROM Contractors;

Pattern 3: Temporal Schema Alignment

When unioning data from different time periods with schema evolution:

-- Old schema: (id, name, address)
-- New schema: (id, first_name, last_name, street, city, zip)

SELECT id, 
       name AS full_name,
       address AS full_address,
       NULL AS street,
       NULL AS city,
       NULL AS zip,
       'legacy' AS schema_version
FROM Customers_Archive
UNION ALL
SELECT id,
       CONCAT(first_name, ' ', last_name),
       CONCAT(street, ', ', city, ' ', zip),
       street,
       city,
       zip,
       'current' AS schema_version
FROM Customers_Current;

Pattern 4: JSON/XML Expansion

For semi-structured data, expand to compatible relations:

-- Expand JSON customer field to match relational schema
SELECT customer_data->>'id' AS id,
       customer_data->>'name' AS name,
       customer_data->>'email' AS email
FROM JSON_Customers
UNION
SELECT CAST(id AS TEXT), name, email
FROM Relational_Customers;

Schema Evolution

In long-lived systems, schemas evolve over time. Archived data may have different structures than current data. Union compatibility patterns for temporal alignment are essential for historical analysis and data migration projects.

Summary: Mastering Union Compatibility

We've completed a thorough exploration of union compatibility—the essential prerequisite for all relational set operations. Let's consolidate the key knowledge:

Key Takeaways

•Definition: Union compatibility requires same degree and compatible domains for corresponding attributes
•Position-Based: Compatibility is checked by attribute position, not by name
•Domain Rules: Types in the same family (INT/BIGINT, VARCHAR variants) are typically compatible
•Semantic vs Structural: The database checks structure; YOU must ensure semantic validity
•Result Schema: Attribute names come from the first operand; domains may widen
•Achieving Compatibility: Use projection, renaming, and casting to reshape relations
•Database Differences: PostgreSQL is strict; MySQL/SQLite are lenient; write for the strictest
•Common Patterns: NULL padding, synthetic keys, polymorphic schemas enable complex unions

Page Complete

You now have comprehensive understanding of union compatibility in relational algebra and SQL. You can diagnose compatibility errors, reshape relations for set operations, and understand the nuances across database systems. Next, we'll explore the rules governing all set operations, including operator precedence, algebraic properties, and query optimization opportunities.