Database Management SystemsRelations and Tables

Relations and Tables: The Foundation of Relational Databases

LevelBeginner

Duration60 mins

TopicRelations and Tables

2 / 5

Mathematical Foundation: The Set-Theoretic Basis

Why Mathematics Matters in Databases

The relational model's greatest strength lies not in its intuitive table metaphor, but in its rigorous mathematical foundation. This foundation is what enables query optimization, formal correctness proofs, and the remarkable fact that a query written in 1980 still runs correctly on modern databases.

Edgar Codd deliberately chose set theory as the basis for his model because set theory provides:

Precision: Every concept has an unambiguous mathematical definition
Compositionality: Complex operations can be built from simpler ones
Provability: Properties of queries and transformations can be formally verified
Independence: Logical operations are independent of physical implementation

This page explores the mathematical structures underlying relations. Understanding this foundation transforms database work from rote SQL writing to principled data engineering.

Learning Objectives

By the end of this page, you will: (1) Understand sets, tuples, and Cartesian products as they apply to relations, (2) Recognize how set theory operations correspond to relational operations, (3) Appreciate closure properties and their importance, (4) Grasp the algebraic structure that enables query optimization, and (5) Connect mathematical concepts to practical database behavior.

Set Theory Fundamentals

The relational model is built upon set theory, one of the most fundamental branches of mathematics. Before we can understand relations mathematically, we must establish the set-theoretic concepts upon which they rest.

Definition (Set): A set is an unordered collection of distinct elements. We denote sets using curly braces: {a, b, c}.

Key Properties of Sets:

Fundamental Set Properties

•No Duplicates: An element either is or is not in a set; it cannot appear twice. {1, 1, 2} = {1, 2}.
•No Order: Sets have no inherent ordering. {1, 2, 3} = {3, 1, 2} = {2, 3, 1}.
•Membership (∈): We write x ∈ S to indicate element x is a member of set S.
•Cardinality (|S|): The number of elements in set S. |{a, b, c}| = 3.
•Empty Set (∅): The set with no elements. |∅| = 0.
•Subset (⊆): A ⊆ B means every element of A is also in B.

Set Operations:

Sets support fundamental operations that correspond directly to relational operations:

Operation	Symbol	Definition	Example
Union	A ∪ B	All elements in A or B (or both)	{1,2} ∪ {2,3} = {1,2,3}
Intersection	A ∩ B	Elements in both A and B	{1,2} ∩ {2,3} = {2}
Difference	A − B	Elements in A but not in B	{1,2} − {2,3} = {1}
Symmetric Diff.	A △ B	Elements in exactly one of A or B	{1,2} △ {2,3} = {1,3}
Subset	A ⊆ B	True if all elements of A are in B	{2} ⊆ {1,2,3} = True

These operations are closed over sets: applying any operation to sets produces another set. This closure property is crucial—it means we can chain operations without leaving the domain of sets.

Sets and Relations

Because a relation is a set of tuples, all set operations apply directly to relations. The UNION of two relations is the union of their tuple sets. The INTERSECT of two relations is the intersection. This mathematical foundation is what makes SQL set operations possible and well-defined.

Ordered Pairs and Tuples

While sets are unordered, relations require ordered structures to associate attributes with values. This is where ordered pairs and tuples come in.

Definition (Ordered Pair): An ordered pair (a, b) is a pair of elements where order matters. Unlike sets, (a, b) ≠ (b, a) unless a = b.

Definition (n-Tuple): An n-tuple (or simply tuple) is an ordered sequence of n elements: (a₁, a₂, ..., aₙ). The value n is called the degree or arity of the tuple.

Crucial Property: Two tuples are equal if and only if they have the same degree and corresponding elements are equal:

(a₁, a₂, ..., aₙ) = (b₁, b₂, ..., bₙ) ⟺ (n = m) ∧ (a₁ = b₁) ∧ (a₂ = b₂) ∧ ... ∧ (aₙ = bₙ)

Tuple Equality ExamplesUnderstanding when tuples are equal:

Input

Compare: (1, 2, 3), (1, 2, 3), (3, 2, 1), (1, 2)

Output

(1, 2, 3) = (1, 2, 3) ✓ Equal
(1, 2, 3) ≠ (3, 2, 1) ✗ Same elements, different order
(1, 2, 3) ≠ (1, 2) ✗ Different degrees (3 vs 2)

Explanation

Order and degree both matter for tuple equality. This is why attribute ordering in formal relation definitions is significant, even though SQL abstracts this with named columns.

Component Access:

We reference individual components of a tuple using subscript notation:

For tuple t = (a₁, a₂, ..., aₙ), the i-th component is t[i] = aᵢ

In database terminology, if t is a tuple in a relation with attributes A₁, A₂, ..., Aₙ, we write:

t[Aᵢ] or t.Aᵢ to access the value of attribute Aᵢ in tuple t

Example: Given relation Employee with attributes (ID, Name, Salary) and tuple t = (101, 'Alice', 75000):

t[1] = t.ID = 101
t[2] = t.Name = 'Alice'
t[3] = t.Salary = 75000

Named vs. Positional Access

The mathematical definition uses positional access (t[i]), but practical systems and SQL use named access (t.Salary). Named access is more robust to schema changes—if columns are reordered, named references still work. The formal model supports both interpretations.

Cartesian Products: The Foundation of Relations

The Cartesian product is the operation that connects individual domains to form the space of all possible tuples. Understanding Cartesian products is essential for grasping what a relation really represents.

Definition (Cartesian Product of Two Sets): The Cartesian product of sets A and B, denoted A × B, is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B:

A × B = {(a, b) | a ∈ A ∧ b ∈ B}

Definition (Generalized Cartesian Product): The Cartesian product of n sets D₁, D₂, ..., Dₙ is:

D₁ × D₂ × ... × Dₙ = {(d₁, d₂, ..., dₙ) | d₁ ∈ D₁ ∧ d₂ ∈ D₂ ∧ ... ∧ dₙ ∈ Dₙ}

Cardinality of Cartesian Products:

The size of a Cartesian product grows multiplicatively:

|A × B| = |A| · |B|

|D₁ × D₂ × ... × Dₙ| = |D₁| · |D₂| · ... · |Dₙ|

This explosive growth is why relations are subsets of Cartesian products—we only include tuples that represent true facts, not every possible combination.

Cartesian Product VisualizationLet's construct the Cartesian product of two small sets:

Input

A = {a, b}, B = {1, 2, 3}

Output

A × B = {(a,1), (a,2), (a,3), (b,1), (b,2), (b,3)}

Explanation

|A × B| = |A| · |B| = 2 · 3 = 6 tuples. Each element of A is paired with every element of B. This represents ALL possible combinations—a relation selects which of these actually exist.

Visualizing Cartesian Products:

Cartesian products can be visualized as a grid where rows represent elements of one set and columns represent elements of another:

       | 1       | 2       | 3
-------|---------|---------|--------
a      | (a,1)   | (a,2)   | (a,3)
b      | (b,1)   | (b,2)   | (b,3)

For three sets A × B × C, we'd need a 3D grid—each point in the cube represents a possible 3-tuple.

Connection to Relations:

A relation R over domains D₁, D₂, ..., Dₙ is defined as:

R ⊆ D₁ × D₂ × ... × Dₙ

This means R is a selection of points from the n-dimensional space of all possible combinations. The relation asserts which combinations actually occur.

Cartesian Product in SQL

In SQL, the Cartesian product is performed by joining tables without a join condition (CROSS JOIN). If TableA has 1,000 rows and TableB has 10,000 rows, the result has 10,000,000 rows. This explosive growth is why accidental Cartesian products (forgot the WHERE clause) can crash queries and exhaust resources.

Domains and Their Semantic Properties

In the relational model, a domain is not merely a data type—it's a named pool of values with semantic meaning.

Definition (Domain): A domain D is a named set of atomic values. Each value in a domain represents an indivisible unit of information.

Key Properties of Domains:

Domain Characteristics

•Named: Each domain has a unique name indicating its semantic purpose (e.g., EmployeeID, ProductPrice, PhoneNumber).
•Atomic Values: Values cannot be decomposed within the relational model. A phone number is atomic even if it has area code/number components in the real world.
•Finite or Infinite: Domains may be finite ({A, B, C, D, F} for grades) or conceptually infinite (all positive integers for IDs).
•Data Type Foundation: Each domain has an underlying data type determining representation and valid operations.
•Semantic Meaning: Domains from the same data type may be semantically distinct. EmployeeID and ProductID might both be integers, but comparing them is meaningless.

Domain Compatibility:

Two attributes are domain compatible (or union compatible) if they draw values from the same domain. This concept is crucial for:

Set Operations: UNION, INTERSECT, and EXCEPT require corresponding attributes to be domain compatible.
Comparisons: Comparing values from incompatible domains (EmployeeID = ProductID) is semantically meaningless.
Joins: Natural joins match attributes by name, assuming domain compatibility.

Example of Domain Distinction:

Semantically Distinct Domains with Same Data Type
Domain Name	Data Type	Sample Values	Semantic Meaning
EmployeeID	INTEGER	1001, 1002, 1003	Unique employee identifier
DepartmentID	INTEGER	10, 20, 30, 40	Unique department identifier
Salary	INTEGER	50000, 75000, 100000	Annual compensation in dollars
Age	INTEGER	25, 30, 45, 62	Years since birth

All four domains use INTEGER, but:

Adding Salary + Age makes no sense
Comparing EmployeeID = DepartmentID is semantically wrong
Setting Salary = Age would be a logic error

Strong domain discipline prevents these errors. In formal relational theory, domains enforce semantic constraints that pure data types cannot.

SQL and Domains:

Standard SQL includes a CREATE DOMAIN statement for defining semantic domains, but many databases don't fully implement it. Instead, developers use CHECK constraints, foreign keys, and application logic to approximate domain semantics.

Domain-Driven Design in Practice

Even if your database doesn't enforce domains, thinking in terms of domains improves design. Use distinct column names that imply domains (employee_id vs department_id, not just id). Document domain constraints. Consider custom types in PostgreSQL (CREATE TYPE) for important semantic distinctions.

The Relation as a Mathematical Structure

Having established sets, tuples, Cartesian products, and domains, we can now formally define relations with complete mathematical precision.

Definition (Relation Schema): A relation schema R(A₁:D₁, A₂:D₂, ..., Aₙ:Dₙ) consists of:

A relation name R
A set of attribute names {A₁, A₂, ..., Aₙ}
A mapping from each attribute Aᵢ to its domain Dᵢ

The degree of the schema is n (the number of attributes).

Definition (Relation Instance): A relation instance r of schema R is a finite subset of D₁ × D₂ × ... × Dₙ where each n-tuple t = (d₁, d₂, ..., dₙ) satisfies dᵢ ∈ Dᵢ for all i.

The cardinality of instance r is |r| (the number of tuples).

relation_notation.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Formal Schema Notation:
R(A₁:D₁, A₂:D₂, ..., Aₙ:Dₙ)
 
-- Example: Employee Relation Schema
Employee(
    EmployeeID: PositiveInteger,
    Name: String(50),
    Department: DepartmentName,
    Salary: PositiveDecimal,
    HireDate: Date
)
 
-- Relation Instance (a specific state of the relation):
r(Employee) = {
    (101, 'Alice Chen', 'Engineering', 95000.00, 2020-03-15),
    (102, 'Bob Smith', 'Marketing', 72000.00, 2019-07-01),
    (103, 'Carol Davis', 'Engineering', 105000.00, 2018-01-20),
    (104, 'David Kim', 'Sales', 68000.00, 2021-09-10)
}
 
-- Properties:
--   Schema degree: 5 (five attributes)
--   Instance cardinality: 4 (four tuples)

The Two-Level Structure:

Notice the critical distinction:

Schema Level (Intension): Defines what is possible—the structure, the attributes, the domains. Schemas are relatively stable; they change only when the database structure is altered.
Instance Level (Extension): Defines what is true now—the actual data. Instances change constantly as data is inserted, updated, and deleted.

This separation mirrors the distinction in logic between:

Definition: What a predicate means (schema)
Extension: What satisfies the predicate (instance)

For the Employee relation, the schema says 'we can record employees with IDs, names, departments, salaries, and hire dates.' The instance says 'these four specific employees exist with these specific values.'

Intension vs Extension

In database parlance, the schema is the INTENSION (it intensionally defines the structure), while the current data is the EXTENSION (it extends the schema with actual facts). Understanding this distinction is key to understanding constraints, views, and data independence.

Algebraic Properties of Relations

Relations, being sets, inherit algebraic properties from set theory. Additional properties emerge from the specific operations of relational algebra. These properties are not mere mathematical curiosities—they form the foundation for query optimization.

Closure Property:

The most fundamental property is closure: every relational operator takes relations as input and produces a relation as output.

If R and S are relations, then:

σ(R) is a relation (selection)
π(R) is a relation (projection)
R ∪ S is a relation (union)
R ∩ S is a relation (intersection)
R − S is a relation (difference)
R × S is a relation (Cartesian product)
R ⋈ S is a relation (join)

Closure enables composition: we can chain operations arbitrarily. The result of one operation becomes input to the next, enabling complex queries from simple primitives.

Key Algebraic Properties for Optimization

•Commutativity of Join: R ⋈ S = S ⋈ R. The optimizer can choose which table to access first.
•Associativity of Join: (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T). The optimizer can group joins optimally.
•Commutativity of Union/Intersection: R ∪ S = S ∪ R, R ∩ S = S ∩ R.
•Distributivity of Selection over Join: σ(R ⋈ S) = σ(R) ⋈ S when σ references only R's attributes. Enables 'pushing selections down.'
•Cascade of Selections: σ_p(σ_q(R)) = σ_{p∧q}(R). Multiple selections combine into one.
•Cascade of Projections: π_A(π_B(R)) = π_A(R) when A ⊆ B. Inner projections can be eliminated.
•Selection and Projection Commute: When selection predicate involves only projected attributes.

Why These Properties Enable Optimization:

Consider a query: 'Find names of employees in Engineering earning over $100K.'

Naive Execution:

Join all employees with all departments (potentially millions of combinations)
Filter to Engineering (remove most rows)
Filter to salary > 100K (remove more rows)
Project to Name column

Optimized Execution (using algebraic properties):

Filter employees to salary > 100K first (distributivity of selection)
Filter to Engineering department (cascade of selections)
Project to needed columns early (reduce data volume)
Only now do any remaining joins

The optimizer uses algebraic properties to prove these executions are equivalent (produce identical results), then chooses the more efficient one. Without mathematical foundations, such transformations would be guesswork.

The Power of Algebraic Equivalence

Query optimizers work by finding algebraically equivalent query plans and choosing the cheapest. A single SQL query might have thousands of equivalent relational algebra expressions. The mathematical properties guarantee they all produce the same result, freeing the optimizer to choose based purely on performance.

NULL Values and Three-Valued Logic

Codd's original relational model assumed complete information—every attribute of every tuple has a value from its domain. Reality is messier: sometimes data is missing, unknown, or not applicable. SQL databases handle this with NULL values, introducing significant mathematical complexity.

What NULL Represents:

NULL is not a value—it's the absence of a value. It can represent:

Unknown: The value exists but we don't know it (e.g., a person's birth date that wasn't recorded)
Inapplicable: The attribute doesn't apply to this entity (e.g., spouse name for an unmarried person)
Withheld: The value exists and is known but not disclosed

SQL's NULL conflates these distinct cases, which can lead to logical surprises.

Three-Valued Logic (3VL):

With NULL, boolean expressions evaluate to three possible values: TRUE, FALSE, or UNKNOWN.

The truth tables extend standard boolean logic:

Three-Valued Logic: AND Operation
P	Q	P AND Q
TRUE	TRUE	TRUE
TRUE	FALSE	FALSE
TRUE	UNKNOWN	UNKNOWN
FALSE	TRUE	FALSE
FALSE	FALSE	FALSE
FALSE	UNKNOWN	FALSE
UNKNOWN	TRUE	UNKNOWN
UNKNOWN	FALSE	FALSE
UNKNOWN	UNKNOWN	UNKNOWN

Three-Valued Logic: OR Operation
P	Q	P OR Q
TRUE	TRUE	TRUE
TRUE	FALSE	TRUE
TRUE	UNKNOWN	TRUE
FALSE	TRUE	TRUE
FALSE	FALSE	FALSE
FALSE	UNKNOWN	UNKNOWN
UNKNOWN	TRUE	TRUE
UNKNOWN	FALSE	UNKNOWN
UNKNOWN	UNKNOWN	UNKNOWN

Practical Implications:

Comparisons with NULL are UNKNOWN:
- NULL = NULL → UNKNOWN (not TRUE!)
- NULL <> NULL → UNKNOWN
- 5 = NULL → UNKNOWN
WHERE Clauses Only Return TRUE Rows:
- WHERE salary > 50000 excludes rows where salary is NULL
- This can cause unexpected missing rows!
NOT UNKNOWN = UNKNOWN:
- NOT(NULL = 5) → NOT(UNKNOWN) → UNKNOWN
Use IS NULL / IS NOT NULL for NULL Testing:
- x IS NULL → TRUE if x is NULL, FALSE otherwise (never UNKNOWN)
- x IS NOT NULL → TRUE if x has a value

NULL Gotchas

Common mistake: WHERE column <> 'value' excludes NULLs. If you want rows where column is not 'value' INCLUDING NULLs, you need: WHERE column <> 'value' OR column IS NULL. Three-valued logic trips up even experienced SQL developers.

Summary: Mathematics to Practice

We've traversed the mathematical landscape underlying the relational model. Let's consolidate how these abstract concepts connect to practical database work:

Key Concepts Mastered

•Set Theory Foundation: Relations are sets of tuples, inheriting closure and operation properties that enable query composition and optimization.
•Tuples as Ordered Structures: Unlike sets, tuples maintain order, enabling attribute-value association. Access by name or position.
•Cartesian Products Define Possibility Space: A relation is a subset of the Cartesian product of its domains—selecting which fact combinations are true.
•Domains Provide Semantic Meaning: Beyond data types, domains define what values mean and which comparisons are sensible.
•Schema vs. Instance: The two-level structure separates what's possible (schema/intension) from what's true (instance/extension).
•Algebraic Properties Enable Optimization: Commutativity, associativity, and distributivity allow query rewriting for efficiency.
•Three-Valued Logic Handles Missing Data: NULL introduces UNKNOWN, complicating boolean evaluation in ways that demand careful attention.

Why This Matters in Practice:

Mathematical Concept	Practical Application
Set properties	UNION, INTERSECT, EXCEPT queries work correctly
Tuple structure	Column ordering, SELECT *, explicit column lists
Cartesian products	CROSS JOIN, understanding join explosions
Domain semantics	Type safety, meaningful foreign keys
Algebraic properties	Query optimizer transforms queries automatically
Three-valued logic	Handling NULLs correctly in WHERE, CASE, JOINs

What's Next:

With the mathematical foundation established, we'll now examine how these abstract relations materialize as tables—the concrete structures you interact with in every SQL database. The next page bridges the gap from mathematical purity to practical implementation.

Mathematical Foundation Complete

You now understand the set-theoretic and algebraic basis of the relational model. This foundation enables you to reason about database behavior mathematically—understanding not just what SQL does, but why it works and how optimizers transform your queries. Next, we'll see how these abstractions become concrete tables.

2 / 5

Loading learning content...

Database Management SystemsRelations and Tables

Relations and Tables: The Foundation of Relational Databases

LevelBeginner

Duration60 mins

TopicRelations and Tables

2 / 5

Mathematical Foundation: The Set-Theoretic Basis

Why Mathematics Matters in Databases

Edgar Codd deliberately chose set theory as the basis for his model because set theory provides:

Precision: Every concept has an unambiguous mathematical definition
Compositionality: Complex operations can be built from simpler ones
Provability: Properties of queries and transformations can be formally verified
Independence: Logical operations are independent of physical implementation

This page explores the mathematical structures underlying relations. Understanding this foundation transforms database work from rote SQL writing to principled data engineering.

Learning Objectives

Set Theory Fundamentals

Definition (Set): A set is an unordered collection of distinct elements. We denote sets using curly braces: {a, b, c}.

Key Properties of Sets:

Fundamental Set Properties

•No Duplicates: An element either is or is not in a set; it cannot appear twice. {1, 1, 2} = {1, 2}.
•No Order: Sets have no inherent ordering. {1, 2, 3} = {3, 1, 2} = {2, 3, 1}.
•Membership (∈): We write x ∈ S to indicate element x is a member of set S.
•Cardinality (|S|): The number of elements in set S. |{a, b, c}| = 3.
•Empty Set (∅): The set with no elements. |∅| = 0.
•Subset (⊆): A ⊆ B means every element of A is also in B.

Set Operations:

Sets support fundamental operations that correspond directly to relational operations:

Operation	Symbol	Definition	Example
Union	A ∪ B	All elements in A or B (or both)	{1,2} ∪ {2,3} = {1,2,3}
Intersection	A ∩ B	Elements in both A and B	{1,2} ∩ {2,3} = {2}
Difference	A − B	Elements in A but not in B	{1,2} − {2,3} = {1}
Symmetric Diff.	A △ B	Elements in exactly one of A or B	{1,2} △ {2,3} = {1,3}
Subset	A ⊆ B	True if all elements of A are in B	{2} ⊆ {1,2,3} = True

These operations are closed over sets: applying any operation to sets produces another set. This closure property is crucial—it means we can chain operations without leaving the domain of sets.

Sets and Relations

Ordered Pairs and Tuples

While sets are unordered, relations require ordered structures to associate attributes with values. This is where ordered pairs and tuples come in.

Definition (Ordered Pair): An ordered pair (a, b) is a pair of elements where order matters. Unlike sets, (a, b) ≠ (b, a) unless a = b.

Definition (n-Tuple): An n-tuple (or simply tuple) is an ordered sequence of n elements: (a₁, a₂, ..., aₙ). The value n is called the degree or arity of the tuple.

Crucial Property: Two tuples are equal if and only if they have the same degree and corresponding elements are equal:

(a₁, a₂, ..., aₙ) = (b₁, b₂, ..., bₙ) ⟺ (n = m) ∧ (a₁ = b₁) ∧ (a₂ = b₂) ∧ ... ∧ (aₙ = bₙ)

Tuple Equality ExamplesUnderstanding when tuples are equal:

Input

Compare: (1, 2, 3), (1, 2, 3), (3, 2, 1), (1, 2)

Output

(1, 2, 3) = (1, 2, 3) ✓ Equal
(1, 2, 3) ≠ (3, 2, 1) ✗ Same elements, different order
(1, 2, 3) ≠ (1, 2) ✗ Different degrees (3 vs 2)

Explanation

Order and degree both matter for tuple equality. This is why attribute ordering in formal relation definitions is significant, even though SQL abstracts this with named columns.

Component Access:

We reference individual components of a tuple using subscript notation:

For tuple t = (a₁, a₂, ..., aₙ), the i-th component is t[i] = aᵢ

In database terminology, if t is a tuple in a relation with attributes A₁, A₂, ..., Aₙ, we write:

t[Aᵢ] or t.Aᵢ to access the value of attribute Aᵢ in tuple t

Example: Given relation Employee with attributes (ID, Name, Salary) and tuple t = (101, 'Alice', 75000):

t[1] = t.ID = 101
t[2] = t.Name = 'Alice'
t[3] = t.Salary = 75000

Named vs. Positional Access

Cartesian Products: The Foundation of Relations

Definition (Cartesian Product of Two Sets): The Cartesian product of sets A and B, denoted A × B, is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B:

A × B = {(a, b) | a ∈ A ∧ b ∈ B}

Definition (Generalized Cartesian Product): The Cartesian product of n sets D₁, D₂, ..., Dₙ is:

D₁ × D₂ × ... × Dₙ = {(d₁, d₂, ..., dₙ) | d₁ ∈ D₁ ∧ d₂ ∈ D₂ ∧ ... ∧ dₙ ∈ Dₙ}

Cardinality of Cartesian Products:

The size of a Cartesian product grows multiplicatively:

|A × B| = |A| · |B|

|D₁ × D₂ × ... × Dₙ| = |D₁| · |D₂| · ... · |Dₙ|

This explosive growth is why relations are subsets of Cartesian products—we only include tuples that represent true facts, not every possible combination.

Cartesian Product VisualizationLet's construct the Cartesian product of two small sets:

Input

A = {a, b}, B = {1, 2, 3}

Output

A × B = {(a,1), (a,2), (a,3), (b,1), (b,2), (b,3)}

Explanation

|A × B| = |A| · |B| = 2 · 3 = 6 tuples. Each element of A is paired with every element of B. This represents ALL possible combinations—a relation selects which of these actually exist.

Visualizing Cartesian Products:

Cartesian products can be visualized as a grid where rows represent elements of one set and columns represent elements of another:

       | 1       | 2       | 3
-------|---------|---------|--------
a      | (a,1)   | (a,2)   | (a,3)
b      | (b,1)   | (b,2)   | (b,3)

For three sets A × B × C, we'd need a 3D grid—each point in the cube represents a possible 3-tuple.

Connection to Relations:

A relation R over domains D₁, D₂, ..., Dₙ is defined as:

R ⊆ D₁ × D₂ × ... × Dₙ

This means R is a selection of points from the n-dimensional space of all possible combinations. The relation asserts which combinations actually occur.

Cartesian Product in SQL

Domains and Their Semantic Properties

In the relational model, a domain is not merely a data type—it's a named pool of values with semantic meaning.

Definition (Domain): A domain D is a named set of atomic values. Each value in a domain represents an indivisible unit of information.

Key Properties of Domains:

Domain Characteristics

•Named: Each domain has a unique name indicating its semantic purpose (e.g., EmployeeID, ProductPrice, PhoneNumber).
•Atomic Values: Values cannot be decomposed within the relational model. A phone number is atomic even if it has area code/number components in the real world.
•Finite or Infinite: Domains may be finite ({A, B, C, D, F} for grades) or conceptually infinite (all positive integers for IDs).
•Data Type Foundation: Each domain has an underlying data type determining representation and valid operations.
•Semantic Meaning: Domains from the same data type may be semantically distinct. EmployeeID and ProductID might both be integers, but comparing them is meaningless.

Domain Compatibility:

Two attributes are domain compatible (or union compatible) if they draw values from the same domain. This concept is crucial for:

Set Operations: UNION, INTERSECT, and EXCEPT require corresponding attributes to be domain compatible.
Comparisons: Comparing values from incompatible domains (EmployeeID = ProductID) is semantically meaningless.
Joins: Natural joins match attributes by name, assuming domain compatibility.

Example of Domain Distinction:

Semantically Distinct Domains with Same Data Type
Domain Name	Data Type	Sample Values	Semantic Meaning
EmployeeID	INTEGER	1001, 1002, 1003	Unique employee identifier
DepartmentID	INTEGER	10, 20, 30, 40	Unique department identifier
Salary	INTEGER	50000, 75000, 100000	Annual compensation in dollars
Age	INTEGER	25, 30, 45, 62	Years since birth

All four domains use INTEGER, but:

Adding Salary + Age makes no sense
Comparing EmployeeID = DepartmentID is semantically wrong
Setting Salary = Age would be a logic error

Strong domain discipline prevents these errors. In formal relational theory, domains enforce semantic constraints that pure data types cannot.

SQL and Domains:

Domain-Driven Design in Practice

The Relation as a Mathematical Structure

Having established sets, tuples, Cartesian products, and domains, we can now formally define relations with complete mathematical precision.

Definition (Relation Schema): A relation schema R(A₁:D₁, A₂:D₂, ..., Aₙ:Dₙ) consists of:

A relation name R
A set of attribute names {A₁, A₂, ..., Aₙ}
A mapping from each attribute Aᵢ to its domain Dᵢ

The degree of the schema is n (the number of attributes).

The cardinality of instance r is |r| (the number of tuples).

relation_notation.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Formal Schema Notation:
R(A₁:D₁, A₂:D₂, ..., Aₙ:Dₙ)
 
-- Example: Employee Relation Schema
Employee(
    EmployeeID: PositiveInteger,
    Name: String(50),
    Department: DepartmentName,
    Salary: PositiveDecimal,
    HireDate: Date
)
 
-- Relation Instance (a specific state of the relation):
r(Employee) = {
    (101, 'Alice Chen', 'Engineering', 95000.00, 2020-03-15),
    (102, 'Bob Smith', 'Marketing', 72000.00, 2019-07-01),
    (103, 'Carol Davis', 'Engineering', 105000.00, 2018-01-20),
    (104, 'David Kim', 'Sales', 68000.00, 2021-09-10)
}
 
-- Properties:
--   Schema degree: 5 (five attributes)
--   Instance cardinality: 4 (four tuples)

The Two-Level Structure:

Notice the critical distinction:

Schema Level (Intension): Defines what is possible—the structure, the attributes, the domains. Schemas are relatively stable; they change only when the database structure is altered.
Instance Level (Extension): Defines what is true now—the actual data. Instances change constantly as data is inserted, updated, and deleted.

This separation mirrors the distinction in logic between:

Definition: What a predicate means (schema)
Extension: What satisfies the predicate (instance)

Intension vs Extension

Algebraic Properties of Relations

Closure Property:

The most fundamental property is closure: every relational operator takes relations as input and produces a relation as output.

If R and S are relations, then:

σ(R) is a relation (selection)
π(R) is a relation (projection)
R ∪ S is a relation (union)
R ∩ S is a relation (intersection)
R − S is a relation (difference)
R × S is a relation (Cartesian product)
R ⋈ S is a relation (join)

Closure enables composition: we can chain operations arbitrarily. The result of one operation becomes input to the next, enabling complex queries from simple primitives.

Key Algebraic Properties for Optimization

•Commutativity of Join: R ⋈ S = S ⋈ R. The optimizer can choose which table to access first.
•Associativity of Join: (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T). The optimizer can group joins optimally.
•Commutativity of Union/Intersection: R ∪ S = S ∪ R, R ∩ S = S ∩ R.
•Distributivity of Selection over Join: σ(R ⋈ S) = σ(R) ⋈ S when σ references only R's attributes. Enables 'pushing selections down.'
•Cascade of Selections: σ_p(σ_q(R)) = σ_{p∧q}(R). Multiple selections combine into one.
•Cascade of Projections: π_A(π_B(R)) = π_A(R) when A ⊆ B. Inner projections can be eliminated.
•Selection and Projection Commute: When selection predicate involves only projected attributes.

Why These Properties Enable Optimization:

Consider a query: 'Find names of employees in Engineering earning over $100K.'

Naive Execution:

Join all employees with all departments (potentially millions of combinations)
Filter to Engineering (remove most rows)
Filter to salary > 100K (remove more rows)
Project to Name column

Optimized Execution (using algebraic properties):

Filter employees to salary > 100K first (distributivity of selection)
Filter to Engineering department (cascade of selections)
Project to needed columns early (reduce data volume)
Only now do any remaining joins

The Power of Algebraic Equivalence

NULL Values and Three-Valued Logic

What NULL Represents:

NULL is not a value—it's the absence of a value. It can represent:

Unknown: The value exists but we don't know it (e.g., a person's birth date that wasn't recorded)
Inapplicable: The attribute doesn't apply to this entity (e.g., spouse name for an unmarried person)
Withheld: The value exists and is known but not disclosed

SQL's NULL conflates these distinct cases, which can lead to logical surprises.

Three-Valued Logic (3VL):

With NULL, boolean expressions evaluate to three possible values: TRUE, FALSE, or UNKNOWN.

The truth tables extend standard boolean logic:

Three-Valued Logic: AND Operation
P	Q	P AND Q
TRUE	TRUE	TRUE
TRUE	FALSE	FALSE
TRUE	UNKNOWN	UNKNOWN
FALSE	TRUE	FALSE
FALSE	FALSE	FALSE
FALSE	UNKNOWN	FALSE
UNKNOWN	TRUE	UNKNOWN
UNKNOWN	FALSE	FALSE
UNKNOWN	UNKNOWN	UNKNOWN

Three-Valued Logic: OR Operation
P	Q	P OR Q
TRUE	TRUE	TRUE
TRUE	FALSE	TRUE
TRUE	UNKNOWN	TRUE
FALSE	TRUE	TRUE
FALSE	FALSE	FALSE
FALSE	UNKNOWN	UNKNOWN
UNKNOWN	TRUE	TRUE
UNKNOWN	FALSE	UNKNOWN
UNKNOWN	UNKNOWN	UNKNOWN

Practical Implications:

Comparisons with NULL are UNKNOWN:
- NULL = NULL → UNKNOWN (not TRUE!)
- NULL <> NULL → UNKNOWN
- 5 = NULL → UNKNOWN
WHERE Clauses Only Return TRUE Rows:
- WHERE salary > 50000 excludes rows where salary is NULL
- This can cause unexpected missing rows!
NOT UNKNOWN = UNKNOWN:
- NOT(NULL = 5) → NOT(UNKNOWN) → UNKNOWN
Use IS NULL / IS NOT NULL for NULL Testing:
- x IS NULL → TRUE if x is NULL, FALSE otherwise (never UNKNOWN)
- x IS NOT NULL → TRUE if x has a value

NULL Gotchas

Summary: Mathematics to Practice

We've traversed the mathematical landscape underlying the relational model. Let's consolidate how these abstract concepts connect to practical database work:

Key Concepts Mastered

•Set Theory Foundation: Relations are sets of tuples, inheriting closure and operation properties that enable query composition and optimization.
•Tuples as Ordered Structures: Unlike sets, tuples maintain order, enabling attribute-value association. Access by name or position.
•Cartesian Products Define Possibility Space: A relation is a subset of the Cartesian product of its domains—selecting which fact combinations are true.
•Domains Provide Semantic Meaning: Beyond data types, domains define what values mean and which comparisons are sensible.
•Schema vs. Instance: The two-level structure separates what's possible (schema/intension) from what's true (instance/extension).
•Algebraic Properties Enable Optimization: Commutativity, associativity, and distributivity allow query rewriting for efficiency.
•Three-Valued Logic Handles Missing Data: NULL introduces UNKNOWN, complicating boolean evaluation in ways that demand careful attention.

Why This Matters in Practice:

Mathematical Concept	Practical Application
Set properties	UNION, INTERSECT, EXCEPT queries work correctly
Tuple structure	Column ordering, SELECT *, explicit column lists
Cartesian products	CROSS JOIN, understanding join explosions
Domain semantics	Type safety, meaningful foreign keys
Algebraic properties	Query optimizer transforms queries automatically
Three-valued logic	Handling NULLs correctly in WHERE, CASE, JOINs

What's Next:

Mathematical Foundation Complete

2 / 5