Relational Model - Learning Module

Loading content...

0/241

Mathematical Foundation

The Mathematics Behind the Magic

When E.F. Codd introduced the relational model in 1970, he didn't just propose a new way to organize data—he grounded database management in rigorous mathematics. This wasn't an academic exercise; it was a strategic decision that would prove transformative.

Before the relational model, databases were engineering artifacts. Queries against hierarchical and network databases required intimate knowledge of physical storage structures. Each optimization was ad-hoc, each query language proprietary.

Codd's insight was that by basing data management on set theory and first-order predicate logic, query operations could be defined formally, optimized systematically, and proven correct mathematically. The database could reason about data without knowing how it was stored.

This mathematical foundation is why you can write SELECT * FROM employees WHERE salary > 50000 and trust that the database will find all matching records—regardless of how they're indexed, partitioned, or distributed across storage.

What You Will Learn

By the end of this page, you will understand how set theory provides the foundation for relations and operations, how predicate logic enables expressive query languages, the role of relational algebra and calculus in query processing, and why mathematical foundations enable query optimization.

Set Theory as the Foundation

The relational model is built directly on set theory—the mathematical study of collections of objects. Understanding set theory illuminates why relations behave as they do.

Basic Set Concepts

A set is an unordered collection of distinct elements. Key properties:

No duplicates: Each element appears at most once
No ordering: {a, b, c} = {c, a, b}
Membership: An element either is or isn't in a set (no partial membership)

Sets can be defined by:

Enumeration: S = {1, 2, 3, 4, 5}
Set-builder notation: S = {x | x is an integer and 1 ≤ x ≤ 5}
Characteristic function: For each x, define whether x ∈ S

Set Operations and Their Relational Equivalents
Set Operation	Symbol	Relational Equivalent	Example
Union	A ∪ B	UNION	Combine employees from two departments
Intersection	A ∩ B	INTERSECT	Employees in both project A AND project B
Difference	A − B	EXCEPT / MINUS	Employees in A but not in B
Cartesian Product	A × B	CROSS JOIN	All possible employee-department pairs
Subset	A ⊆ B	(no direct SQL)	All managers are employees
Power Set	P(A)	(no direct SQL)	All possible subsets of attributes

How Sets Map to Relations

Recall that a relation is formally a set of tuples. This set-theoretic definition immediately gives us:

1. No Duplicate Rows Since sets cannot contain duplicates, relations cannot contain duplicate tuples by definition. This is why SQL's DISTINCT keyword exists—SQL tables can violate pure set semantics.

2. Set Operations Work on Relations Because relations are sets, set operations (union, intersection, difference) apply directly. We can combine relations, find common elements, or compute differences—all with well-defined mathematical semantics.

3. Relations are Subsets of Cartesian Products Given domains D₁, D₂, ..., Dₙ for n attributes:

The Cartesian product D₁ × D₂ × ... × Dₙ contains ALL possible combinations
A relation R is a SUBSET of this Cartesian product
R contains only the combinations that actually exist in our database

This is profound: the relation represents which facts are true out of all facts that could be true.

Relations as Subsets of Cartesian ProductsIllustrating the mathematical view of relations

Input

Domains:
  Department = {'Engineering', 'Marketing', 'Finance'}
  Level = {'Junior', 'Senior', 'Lead'}

Cartesian Product Department × Level (all 9 possible combinations):
  {('Engineering', 'Junior'), ('Engineering', 'Senior'), ('Engineering', 'Lead'),
   ('Marketing', 'Junior'), ('Marketing', 'Senior'), ('Marketing', 'Lead'),
   ('Finance', 'Junior'), ('Finance', 'Senior'), ('Finance', 'Lead')}

Output

OpenPositions Relation (a subset representing actual open positions):
  {('Engineering', 'Senior'), ('Marketing', 'Junior'), ('Finance', 'Lead')}

This relation asserts:
  ✓ Engineering needs a Senior
  ✓ Marketing needs a Junior  
  ✓ Finance needs a Lead
  ✗ All other combinations are NOT open positions

The Power of Set Semantics

Because relations are sets, every query result is also a set (and therefore a relation). This closure property means operations can be composed arbitrarily: the output of any query can be the input to another. This enables query languages to express arbitrarily complex retrievals through operation composition.

First-Order Predicate Logic

While set theory provides the structural foundation, first-order predicate logic (also called first-order logic or predicate calculus) provides the foundation for expressing queries and constraints.

What is Predicate Logic?

Predicate logic extends propositional logic with:

Variables that range over a domain of values
Predicates that express properties of or relationships between variables
Quantifiers that express how many elements satisfy a predicate

Predicates in the Relational Model

In relational databases, each relation can be viewed as a predicate:

Employee(emp_id, name, department, salary)

This is a 4-place predicate. The tuple (1001, 'Alice', 'Engineering', 95000) makes the predicate true:

Employee(1001, 'Alice', 'Engineering', 95000) = TRUE

A tuple NOT in the relation makes the predicate false:

Employee(9999, 'Nobody', 'Nowhere', 0) = FALSE

Quantifiers

Predicate logic uses two quantifiers:

Universal Quantifier (∀) — "for all"

∀x P(x) means "P(x) is true for every x in the domain"
Example: ∀emp (Employee(emp) → emp.salary > 0)
- "Every employee has a positive salary"

Existential Quantifier (∃) — "there exists"

∃x P(x) means "there is at least one x for which P(x) is true"
Example: ∃emp (Employee(emp) ∧ emp.department = 'Engineering')
- "There exists at least one employee in Engineering"

Logical Connectives

Predicates can be combined using:

∧ (AND): Both conditions must be true
∨ (OR): At least one condition must be true
¬ (NOT): Negation of a condition
→ (IMPLIES): If first is true, second must be true
↔ (IFF): Both conditions have the same truth value

Queries as Predicate Logic FormulasTranslating SQL queries to predicate logic

Input

SQL Query:
SELECT name FROM Employee WHERE salary > 80000 AND department = 'Engineering'

Predicate Logic:
{name | ∃emp_id, dept, sal (Employee(emp_id, name, dept, sal) 
                            ∧ sal > 80000 
                            ∧ dept = 'Engineering')}

Output

Reading the formula:
"The set of all names such that there exists an employee with that name
who has a salary greater than 80000 and works in Engineering"

This is Tuple Relational Calculus notation—a formal query language
based directly on predicate logic.

From Logic to SQL

SQL's WHERE clause is essentially predicate logic with user-friendly syntax. WHERE salary > 50000 AND department = 'Engineering' is a predicate that each row must satisfy. The SELECT clause specifies which variables (attributes) to return. JOINs combine predicates from multiple relations. Understanding this connection demystifies SQL and reveals its expressive power.

Relational Algebra: The Procedural Foundation

Relational algebra is a procedural query language that operates on relations and produces relations. It consists of a set of operations that take one or two relations as input and produce a new relation as output.

Relational algebra is procedural: it specifies not just what to retrieve, but how to compute it—a sequence of operations.

The Fundamental Operations

Codd identified six fundamental operations from which all other operations can be derived:

Fundamental Relational Algebra Operations
Operation	Symbol	Description	SQL Equivalent
Selection	σ (sigma)	Filter rows by a predicate	WHERE clause
Projection	π (pi)	Select specific columns	SELECT column list
Union	∪	Combine all rows from two compatible relations	UNION
Set Difference	−	Rows in first but not second relation	EXCEPT
Cartesian Product	×	All combinations of rows from two relations	CROSS JOIN
Rename	ρ (rho)	Rename relation or attributes	AS (alias)

Derived Operations

From the six fundamentals, we can derive useful shortcut operations:

Intersection (∩) Rows common to both relations. Derived: R ∩ S = R − (R − S)

Join (⋈) Combine related rows from two relations.

Theta Join (⋈θ): Like Cartesian product, filtered by condition θ
Natural Join (⋈): Join on common attribute names, removing duplicates
Equijoin: Theta join where θ uses only equality

Division (÷) Rows in R associated with ALL rows in S. Useful for "find X that has all Y" queries.

Outer Joins Preserve rows that don't match:

Left Outer Join (⟕): Keep all rows from left
Right Outer Join (⟖): Keep all rows from right
Full Outer Join (⟗): Keep all rows from both

Building a Query with Relational AlgebraStep-by-step query construction using algebra operations

Input

Goal: Find names of employees in Engineering earning over $80,000

Relations:
  Employee(emp_id, name, department, salary)

Step-by-step Relational Algebra:
  1. Select Engineering employees:
     σ_department='Engineering' (Employee)
  
  2. Select high earners from step 1:
     σ_salary>80000 (σ_department='Engineering' (Employee))
  
  3. Project just the names:
     π_name (σ_salary>80000 (σ_department='Engineering' (Employee)))

Output

Alternative combined notation:
  π_name (σ_department='Engineering' ∧ salary>80000 (Employee))

This algebraic expression is EQUIVALENT to:
  SELECT name 
  FROM Employee 
  WHERE department = 'Engineering' AND salary > 80000

The algebra provides a formal basis for query execution.

Why Relational Algebra Matters

You rarely write relational algebra directly, but it's the internal language of database query engines. SQL queries are parsed into algebraic expressions, optimized by reordering operations, then executed. Understanding algebra helps you understand query plans, predict performance, and write more efficient SQL.

Relational Calculus: The Declarative Foundation

While relational algebra is procedural (specify HOW), relational calculus is declarative (specify WHAT). It describes the desired result without specifying how to compute it.

There are two equivalent forms of relational calculus:

1. Tuple Relational Calculus (TRC)

Variables range over tuples. Example:

{t.name | Employee(t) ∧ t.salary > 80000}

Reads: "The set of name values from tuples t where t is in Employee and t's salary exceeds 80000."

2. Domain Relational Calculus (DRC)

Variables range over domain values (attributes). Example:

{<n> | ∃e, d, s (Employee(e, n, d, s) ∧ s > 80000)}

Reads: "The set of name values n where there exist values e, d, s such that (e, n, d, s) is in Employee and s exceeds 80000."

Relational Algebra

•Procedural: Specifies operations step by step
•Operations: σ, π, ∪, −, ×, ρ
•Closer to how queries execute
•Basis for query optimization
•Each operation transforms relations

Relational Calculus

•Declarative: Describes desired result
•Uses predicate logic formulas
•Closer to how queries are written
•Basis for SQL semantics
•Expresses conditions to satisfy

Codd's Theorem: Equivalence

A fundamental result in database theory is Codd's Theorem, which states:

Relational algebra and the safe subset of relational calculus are expressively equivalent.

This means:

Anything expressible in relational algebra can be expressed in relational calculus
Anything expressible in safe relational calculus can be expressed in relational algebra
They define the same class of queries

The "safe" qualifier is important. Unsafe calculus expressions can describe infinite results:

{t | ¬Employee(t)}  -- All tuples that are NOT employees

This describes infinitely many tuples! Safe calculus restricts expressions to guarantee finite results.

Why This Equivalence Matters

You can write queries declaratively (tell the database what you want) and the database can transform them to a procedural execution plan (how to get it). The mathematical equivalence guarantees the transformation is correct.

SQL's Dual Nature

SQL combines declarative and procedural elements. The FROM clause establishes relations, WHERE filters (like σ), SELECT projects (like π), and JOIN combines (like ×, ⋈). Subqueries can be viewed as calculus-style expressions. This hybrid nature makes SQL expressive yet sometimes confusing.

Mathematical Foundations of Query Optimization

One of the greatest benefits of mathematical foundations is query optimization. Because relational operations obey algebraic laws, databases can transform queries into more efficient forms while guaranteeing identical results.

Algebraic Equivalences

Just as arithmetic obeys laws (a + b = b + a), relational algebra obeys laws:

Selection Laws:

Cascade: σ_p1(σ_p2(R)) = σ_p1∧p2(R)
Commutativity: σ_p1(σ_p2(R)) = σ_p2(σ_p1(R))

Projection Laws:

Cascade: π_L1(π_L2(R)) = π_L1(R) if L1 ⊆ L2

Selection-Projection Commutativity:

π_L(σ_p(R)) = σ_p(π_L(R)) if p uses only attributes in L

Join Laws:

Commutativity: R ⋈ S = S ⋈ R
Associativity: (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)

Query Optimization Through Algebraic TransformationDemonstrating how equivalences improve performance

Input

Original Query (inefficient):
  π_name(Employee ⋈ Department) where department.budget > 1000000

This joins FIRST (expensive), then filters (cheap on small result).

Let Employee have 10,000 rows, Department have 100 rows.
Join produces all 10,000 employee-department combinations first.

Output

Optimized Query (equivalent but faster):
  π_name(Employee ⋈ (σ_budget>1000000(Department)))

This filters Department FIRST (cheap), then joins with smaller set.

If only 10 departments have budget > 1M:
- Original: Join 10000 × 100, then filter → 1,000,000 comparisons
- Optimized: Filter to 10 depts, join 10000 × 10 → 100,000 comparisons

10x faster, mathematically guaranteed to give identical results!

Common Optimization Strategies

1. Push Selections Down Apply filters as early as possible to reduce intermediate result sizes. This is almost always beneficial.

2. Push Projections Down Remove unneeded columns early to reduce memory and I/O. Be careful: some operations need columns that might be projected out prematurely.

3. Reorder Joins Since joins are commutative and associative, the optimizer can choose the most efficient join order. For n tables, there are (2n-2)! / (n-1)! possible orders.

4. Choose Join Algorithms Based on table sizes and indexes, choose between:

Nested Loop Join: Good for small outer table
Hash Join: Good for equality joins on large tables
Merge Join: Good when inputs are sorted

5. Use Indexes Replace table scans with index lookups when conditions use indexed columns.

Reading Query Plans

Modern databases show query execution plans (EXPLAIN in PostgreSQL/MySQL). These reveal the algebraic operations the optimizer chose. Understanding relational algebra helps you read these plans, identify bottlenecks, and write queries that optimize better.

Mathematical Integrity Constraints

The mathematical foundation enables us to express and enforce integrity constraints—rules that define valid database states. Constraints can be expressed as predicate logic formulas that must be true for every valid database instance.

Types of Integrity Constraints

Common Integrity Constraints in Predicate Logic
Constraint Type	Predicate Logic Expression	SQL Equivalent
Domain Constraint	∀t ∈ R: t.age ∈ {1..120}	CHECK (age BETWEEN 1 AND 120)
Not Null	∀t ∈ R: t.name ≠ NULL	name NOT NULL
Unique Key	∀t1,t2 ∈ R: t1.key = t2.key → t1 = t2	UNIQUE (key)
Primary Key	Unique ∧ Not Null on key attributes	PRIMARY KEY (id)
Foreign Key	∀t ∈ R: ∃s ∈ S: t.fk = s.pk	FOREIGN KEY REFERENCES S(pk)
Entity Integrity	∀t ∈ R: t.pk ≠ NULL	PRIMARY KEY implies NOT NULL

Referential Integrity in Depth

Referential integrity ensures that foreign key values match existing primary keys. Formally:

∀r ∈ R: r.foreign_key = NULL ∨ ∃s ∈ S: r.foreign_key = s.primary_key

This means: Every foreign key value is either NULL (if allowed) or refers to an existing primary key.

Enforcement Actions:

When a referenced primary key is deleted or updated:

RESTRICT / NO ACTION: Reject the operation if references exist
CASCADE: Delete/update referencing rows too
SET NULL: Set foreign key to NULL
SET DEFAULT: Set foreign key to default value

Expressing Business Rules as ConstraintsComplex constraints expressed in predicate logic

Input

Business Rule: 
"A manager's salary must be greater than all their direct reports"

Predicate Logic:
∀m, e ∈ Employee: 
  (e.manager_id = m.emp_id) → (m.salary > e.salary)

SQL CHECK (complex, requires triggers in most DBMSs):
Only expressible via triggers or application logic in standard SQL

Output

Another Rule:
"Every department must have at least one employee"

Predicate Logic:
∀d ∈ Department: ∃e ∈ Employee: e.department_id = d.id

This is an ASSERTION (not directly supported in most DBMSs).
Workaround: Use triggers or application-level enforcement.

The gap between theoretical expressiveness and practical SQL
is why understanding the theory helps—you know what's missing.

The Constraint Expression Gap

While predicate logic can express almost any constraint, SQL cannot enforce all of them directly. Complex constraints involving multiple tables, aggregations, or temporal conditions often require triggers, stored procedures, or application-level enforcement. Knowing the theory helps you identify what the database CAN'T enforce automatically.

Functional Dependencies and Normalization

Functional dependencies (FDs) are a mathematical concept central to database design. They describe how attribute values depend on each other.

Definition

A functional dependency X → Y holds in relation R if, whenever two tuples have the same values for attributes X, they must also have the same values for attributes Y.

In predicate logic:

∀t1, t2 ∈ R: t1.X = t2.X → t1.Y = t2.Y

Example:

emp_id → name, department, salary

If two tuples have the same emp_id, they must have the same name, department, and salary. This makes emp_id a candidate key.
zip_code → city, state (in US)

A zip code determines the city and state.

Armstrong's Axioms

Functional dependencies obey a complete set of inference rules called Armstrong's Axioms:

1. Reflexivity: If Y ⊆ X, then X → Y

Any set of attributes determines its subsets
{A, B} → B is always true (trivial FD)

2. Augmentation: If X → Y, then XZ → YZ

Adding attributes to both sides preserves the FD
If emp_id → name, then {emp_id, dept} → {name, dept}

3. Transitivity: If X → Y and Y → Z, then X → Z

Dependencies chain together
If emp_id → dept_id and dept_id → dept_name, then emp_id → dept_name

Derived Rules:

Union: If X → Y and X → Z, then X → YZ
Decomposition: If X → YZ, then X → Y and X → Z
Pseudotransitivity: If X → Y and WY → Z, then WX → Z

These rules are sound (only produce valid implications) and complete (can derive all valid implications).

Using Functional DependenciesFinding candidate keys from FDs

Input

Relation: CourseSection(course_id, section, semester, year, instructor, room)

Functional Dependencies:
  1. {course_id, section, semester, year} → instructor
  2. {course_id, section, semester, year} → room
  3. {room, semester, year} → course_id, section  (room is used for one course at a time)

Question: What are the candidate keys?

Output

Analysis:
  - From FDs 1 & 2 (union): {course_id, section, semester, year} → {instructor, room}
  - This means {course_id, section, semester, year} determines ALL other attributes
  - Therefore it's a superkey
  
  - Is it minimal? Yes—no subset determines all attributes
  - Therefore it's a CANDIDATE KEY

  - From FD 3: {room, semester, year} → {course_id, section}
  - Combined with FD 1: {room, semester, year} → entire relation
  - Therefore {room, semester, year} is ANOTHER candidate key!

The relation has two candidate keys.

FDs Enable Normalization

Functional dependencies are the mathematical basis for normalization—the process of organizing relations to eliminate redundancy and anomalies. Each normal form (1NF, 2NF, 3NF, BCNF) is defined in terms of FDs. Without understanding FDs mathematically, normalization becomes a set of mysterious rules rather than logical derivations.

The Power of Mathematical Formalism

The mathematical foundations of the relational model aren't just theoretical elegance—they enable practical capabilities that would be impossible otherwise.

Practical Benefits of Mathematical Foundations

•Query Optimization — Algebraic equivalences allow databases to automatically rewrite queries for better performance without changing results. You write readable SQL; the database executes efficiently.
•Correctness Proofs — Operations are formally defined, so we can prove query transformations preserve semantics. You can trust the optimizer.
•View Updates — Mathematical analysis determines when updates to views can be unambiguously propagated to base tables.
•Constraint Propagation — Formal constraint languages enable automatic enforcement and violation detection across complex relationships.
•Normalization Theory — Mathematical definitions of normal forms enable systematic design that provably eliminates redundancy and anomalies.
•Transaction Semantics — ACID properties can be formally defined and verified, ensuring data integrity under concurrent access.
•Parallel Execution — Algebraic properties identify operations that can safely run in parallel, enabling distributed query processing.

Relational Completeness

A query language is relationally complete if it can express any query that relational algebra/calculus can express. Codd defined this as the minimum bar for a relational query language.

SQL is relationally complete (and more—it includes aggregation, recursion in modern versions, etc.). But this baseline ensures you can always express the fundamental queries the relational model was designed to support.

Beyond Completeness: Extensions

Modern SQL extends beyond relational completeness with:

Aggregation (SUM, COUNT, GROUP BY)
Transitive closure / recursion (WITH RECURSIVE)
Window functions (RANK, OVER)
Temporal queries (AS OF, BETWEEN)

Each extension has its own theoretical foundations, but all build on the solid set-theoretic and logical base.

Theory in Practice

You don't need to work through predicate logic proofs to use a database. But when you need to understand why a query is slow, debug a constraint violation, or design a normalized schema, the mathematical foundations provide the framework for reasoning precisely about what's happening and what should happen.

Summary: The Mathematical Core

The relational model's mathematical foundations are what elevate it from a data storage technique to a principled data management system. Let's consolidate the key concepts:

Key Takeaways

•Set theory provides structural semantics — Relations are sets of tuples; set operations apply directly; closure under operations enables query composition.
•Predicate logic provides query semantics — Tuples represent true propositions; queries express conditions tuples must satisfy; quantifiers express complex conditions.
•Relational algebra is procedural — Operations transform relations step by step; forms the basis for query execution plans and optimization.
•Relational calculus is declarative — Describes desired results without specifying computation; closer to how SQL queries are written.
•Codd's theorem proves equivalence — Algebra and safe calculus express the same queries; enables declarative input with procedural execution.
•Algebraic laws enable optimization — Equivalences allow query rewriting for efficiency; this is why declarative SQL can execute efficiently.
•Functional dependencies formalize design — FDs describe attribute relationships; Armstrong's axioms enable systematic normalization.

What's Next:

Having explored the mathematical foundations, we'll now examine Codd's 12 Rules—the definitive criteria that Dr. Codd established to determine whether a database system is truly relational. These rules translate the mathematical theory into practical requirements for database implementations.

Page Complete

You now understand the mathematical foundations that power the relational model—set theory, predicate logic, relational algebra, and relational calculus. This isn't abstract theory; it's the engine that enables query optimization, constraint enforcement, normalization, and every other principled aspect of relational databases. Next, we'll see how Codd codified these principles into practical rules for RDBMS implementations.

Mathematical Foundation

The Mathematics Behind the Magic

What You Will Learn

Set Theory as the Foundation

The relational model is built directly on set theory—the mathematical study of collections of objects. Understanding set theory illuminates why relations behave as they do.

Basic Set Concepts

A set is an unordered collection of distinct elements. Key properties:

No duplicates: Each element appears at most once
No ordering: {a, b, c} = {c, a, b}
Membership: An element either is or isn't in a set (no partial membership)

Sets can be defined by:

Enumeration: S = {1, 2, 3, 4, 5}
Set-builder notation: S = {x | x is an integer and 1 ≤ x ≤ 5}
Characteristic function: For each x, define whether x ∈ S

Set Operations and Their Relational Equivalents
Set Operation	Symbol	Relational Equivalent	Example
Union	A ∪ B	UNION	Combine employees from two departments
Intersection	A ∩ B	INTERSECT	Employees in both project A AND project B
Difference	A − B	EXCEPT / MINUS	Employees in A but not in B
Cartesian Product	A × B	CROSS JOIN	All possible employee-department pairs
Subset	A ⊆ B	(no direct SQL)	All managers are employees
Power Set	P(A)	(no direct SQL)	All possible subsets of attributes

How Sets Map to Relations

Recall that a relation is formally a set of tuples. This set-theoretic definition immediately gives us:

3. Relations are Subsets of Cartesian Products Given domains D₁, D₂, ..., Dₙ for n attributes:

The Cartesian product D₁ × D₂ × ... × Dₙ contains ALL possible combinations
A relation R is a SUBSET of this Cartesian product
R contains only the combinations that actually exist in our database

This is profound: the relation represents which facts are true out of all facts that could be true.

Relations as Subsets of Cartesian ProductsIllustrating the mathematical view of relations

Input

Domains:
  Department = {'Engineering', 'Marketing', 'Finance'}
  Level = {'Junior', 'Senior', 'Lead'}

Cartesian Product Department × Level (all 9 possible combinations):
  {('Engineering', 'Junior'), ('Engineering', 'Senior'), ('Engineering', 'Lead'),
   ('Marketing', 'Junior'), ('Marketing', 'Senior'), ('Marketing', 'Lead'),
   ('Finance', 'Junior'), ('Finance', 'Senior'), ('Finance', 'Lead')}

Output

OpenPositions Relation (a subset representing actual open positions):
  {('Engineering', 'Senior'), ('Marketing', 'Junior'), ('Finance', 'Lead')}

This relation asserts:
  ✓ Engineering needs a Senior
  ✓ Marketing needs a Junior  
  ✓ Finance needs a Lead
  ✗ All other combinations are NOT open positions

The Power of Set Semantics

First-Order Predicate Logic

What is Predicate Logic?

Predicate logic extends propositional logic with:

Variables that range over a domain of values
Predicates that express properties of or relationships between variables
Quantifiers that express how many elements satisfy a predicate

Predicates in the Relational Model

In relational databases, each relation can be viewed as a predicate:

Employee(emp_id, name, department, salary)

This is a 4-place predicate. The tuple (1001, 'Alice', 'Engineering', 95000) makes the predicate true:

Employee(1001, 'Alice', 'Engineering', 95000) = TRUE

A tuple NOT in the relation makes the predicate false:

Employee(9999, 'Nobody', 'Nowhere', 0) = FALSE

Quantifiers

Predicate logic uses two quantifiers:

Universal Quantifier (∀) — "for all"

∀x P(x) means "P(x) is true for every x in the domain"
Example: ∀emp (Employee(emp) → emp.salary > 0)
- "Every employee has a positive salary"

Existential Quantifier (∃) — "there exists"

∃x P(x) means "there is at least one x for which P(x) is true"
Example: ∃emp (Employee(emp) ∧ emp.department = 'Engineering')
- "There exists at least one employee in Engineering"

Logical Connectives

Predicates can be combined using:

∧ (AND): Both conditions must be true
∨ (OR): At least one condition must be true
¬ (NOT): Negation of a condition
→ (IMPLIES): If first is true, second must be true
↔ (IFF): Both conditions have the same truth value

Queries as Predicate Logic FormulasTranslating SQL queries to predicate logic

Input

SQL Query:
SELECT name FROM Employee WHERE salary > 80000 AND department = 'Engineering'

Predicate Logic:
{name | ∃emp_id, dept, sal (Employee(emp_id, name, dept, sal) 
                            ∧ sal > 80000 
                            ∧ dept = 'Engineering')}

Output

Reading the formula:
"The set of all names such that there exists an employee with that name
who has a salary greater than 80000 and works in Engineering"

This is Tuple Relational Calculus notation—a formal query language
based directly on predicate logic.

From Logic to SQL

Relational Algebra: The Procedural Foundation

Relational algebra is procedural: it specifies not just what to retrieve, but how to compute it—a sequence of operations.

The Fundamental Operations

Codd identified six fundamental operations from which all other operations can be derived:

Fundamental Relational Algebra Operations
Operation	Symbol	Description	SQL Equivalent
Selection	σ (sigma)	Filter rows by a predicate	WHERE clause
Projection	π (pi)	Select specific columns	SELECT column list
Union	∪	Combine all rows from two compatible relations	UNION
Set Difference	−	Rows in first but not second relation	EXCEPT
Cartesian Product	×	All combinations of rows from two relations	CROSS JOIN
Rename	ρ (rho)	Rename relation or attributes	AS (alias)

Derived Operations

From the six fundamentals, we can derive useful shortcut operations:

Intersection (∩) Rows common to both relations. Derived: R ∩ S = R − (R − S)

Join (⋈) Combine related rows from two relations.

Theta Join (⋈θ): Like Cartesian product, filtered by condition θ
Natural Join (⋈): Join on common attribute names, removing duplicates
Equijoin: Theta join where θ uses only equality

Division (÷) Rows in R associated with ALL rows in S. Useful for "find X that has all Y" queries.

Outer Joins Preserve rows that don't match:

Left Outer Join (⟕): Keep all rows from left
Right Outer Join (⟖): Keep all rows from right
Full Outer Join (⟗): Keep all rows from both

Building a Query with Relational AlgebraStep-by-step query construction using algebra operations

Input

Goal: Find names of employees in Engineering earning over $80,000

Relations:
  Employee(emp_id, name, department, salary)

Step-by-step Relational Algebra:
  1. Select Engineering employees:
     σ_department='Engineering' (Employee)
  
  2. Select high earners from step 1:
     σ_salary>80000 (σ_department='Engineering' (Employee))
  
  3. Project just the names:
     π_name (σ_salary>80000 (σ_department='Engineering' (Employee)))

Output

Alternative combined notation:
  π_name (σ_department='Engineering' ∧ salary>80000 (Employee))

This algebraic expression is EQUIVALENT to:
  SELECT name 
  FROM Employee 
  WHERE department = 'Engineering' AND salary > 80000

The algebra provides a formal basis for query execution.

Why Relational Algebra Matters

Relational Calculus: The Declarative Foundation

While relational algebra is procedural (specify HOW), relational calculus is declarative (specify WHAT). It describes the desired result without specifying how to compute it.

There are two equivalent forms of relational calculus:

1. Tuple Relational Calculus (TRC)

Variables range over tuples. Example:

{t.name | Employee(t) ∧ t.salary > 80000}

Reads: "The set of name values from tuples t where t is in Employee and t's salary exceeds 80000."

2. Domain Relational Calculus (DRC)

Variables range over domain values (attributes). Example:

{<n> | ∃e, d, s (Employee(e, n, d, s) ∧ s > 80000)}

Reads: "The set of name values n where there exist values e, d, s such that (e, n, d, s) is in Employee and s exceeds 80000."

Relational Algebra

•Procedural: Specifies operations step by step
•Operations: σ, π, ∪, −, ×, ρ
•Closer to how queries execute
•Basis for query optimization
•Each operation transforms relations

Relational Calculus

•Declarative: Describes desired result
•Uses predicate logic formulas
•Closer to how queries are written
•Basis for SQL semantics
•Expresses conditions to satisfy

Codd's Theorem: Equivalence

A fundamental result in database theory is Codd's Theorem, which states:

Relational algebra and the safe subset of relational calculus are expressively equivalent.

This means:

Anything expressible in relational algebra can be expressed in relational calculus
Anything expressible in safe relational calculus can be expressed in relational algebra
They define the same class of queries

The "safe" qualifier is important. Unsafe calculus expressions can describe infinite results:

{t | ¬Employee(t)}  -- All tuples that are NOT employees

This describes infinitely many tuples! Safe calculus restricts expressions to guarantee finite results.

Why This Equivalence Matters

SQL's Dual Nature

Mathematical Foundations of Query Optimization

Algebraic Equivalences

Just as arithmetic obeys laws (a + b = b + a), relational algebra obeys laws:

Selection Laws:

Cascade: σ_p1(σ_p2(R)) = σ_p1∧p2(R)
Commutativity: σ_p1(σ_p2(R)) = σ_p2(σ_p1(R))

Projection Laws:

Cascade: π_L1(π_L2(R)) = π_L1(R) if L1 ⊆ L2

Selection-Projection Commutativity:

π_L(σ_p(R)) = σ_p(π_L(R)) if p uses only attributes in L

Join Laws:

Commutativity: R ⋈ S = S ⋈ R
Associativity: (R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)

Query Optimization Through Algebraic TransformationDemonstrating how equivalences improve performance

Input

Original Query (inefficient):
  π_name(Employee ⋈ Department) where department.budget > 1000000

This joins FIRST (expensive), then filters (cheap on small result).

Let Employee have 10,000 rows, Department have 100 rows.
Join produces all 10,000 employee-department combinations first.

Output

Optimized Query (equivalent but faster):
  π_name(Employee ⋈ (σ_budget>1000000(Department)))

This filters Department FIRST (cheap), then joins with smaller set.

If only 10 departments have budget > 1M:
- Original: Join 10000 × 100, then filter → 1,000,000 comparisons
- Optimized: Filter to 10 depts, join 10000 × 10 → 100,000 comparisons

10x faster, mathematically guaranteed to give identical results!

Common Optimization Strategies

1. Push Selections Down Apply filters as early as possible to reduce intermediate result sizes. This is almost always beneficial.

2. Push Projections Down Remove unneeded columns early to reduce memory and I/O. Be careful: some operations need columns that might be projected out prematurely.

3. Reorder Joins Since joins are commutative and associative, the optimizer can choose the most efficient join order. For n tables, there are (2n-2)! / (n-1)! possible orders.

4. Choose Join Algorithms Based on table sizes and indexes, choose between:

Nested Loop Join: Good for small outer table
Hash Join: Good for equality joins on large tables
Merge Join: Good when inputs are sorted

5. Use Indexes Replace table scans with index lookups when conditions use indexed columns.

Reading Query Plans

Mathematical Integrity Constraints

Types of Integrity Constraints

Common Integrity Constraints in Predicate Logic
Constraint Type	Predicate Logic Expression	SQL Equivalent
Domain Constraint	∀t ∈ R: t.age ∈ {1..120}	CHECK (age BETWEEN 1 AND 120)
Not Null	∀t ∈ R: t.name ≠ NULL	name NOT NULL
Unique Key	∀t1,t2 ∈ R: t1.key = t2.key → t1 = t2	UNIQUE (key)
Primary Key	Unique ∧ Not Null on key attributes	PRIMARY KEY (id)
Foreign Key	∀t ∈ R: ∃s ∈ S: t.fk = s.pk	FOREIGN KEY REFERENCES S(pk)
Entity Integrity	∀t ∈ R: t.pk ≠ NULL	PRIMARY KEY implies NOT NULL

Referential Integrity in Depth

Referential integrity ensures that foreign key values match existing primary keys. Formally:

∀r ∈ R: r.foreign_key = NULL ∨ ∃s ∈ S: r.foreign_key = s.primary_key

This means: Every foreign key value is either NULL (if allowed) or refers to an existing primary key.

Enforcement Actions:

When a referenced primary key is deleted or updated:

RESTRICT / NO ACTION: Reject the operation if references exist
CASCADE: Delete/update referencing rows too
SET NULL: Set foreign key to NULL
SET DEFAULT: Set foreign key to default value

Expressing Business Rules as ConstraintsComplex constraints expressed in predicate logic

Input

Business Rule: 
"A manager's salary must be greater than all their direct reports"

Predicate Logic:
∀m, e ∈ Employee: 
  (e.manager_id = m.emp_id) → (m.salary > e.salary)

SQL CHECK (complex, requires triggers in most DBMSs):
Only expressible via triggers or application logic in standard SQL

Output

Another Rule:
"Every department must have at least one employee"

Predicate Logic:
∀d ∈ Department: ∃e ∈ Employee: e.department_id = d.id

This is an ASSERTION (not directly supported in most DBMSs).
Workaround: Use triggers or application-level enforcement.

The gap between theoretical expressiveness and practical SQL
is why understanding the theory helps—you know what's missing.

The Constraint Expression Gap

Functional Dependencies and Normalization

Functional dependencies (FDs) are a mathematical concept central to database design. They describe how attribute values depend on each other.

Definition

A functional dependency X → Y holds in relation R if, whenever two tuples have the same values for attributes X, they must also have the same values for attributes Y.

In predicate logic:

∀t1, t2 ∈ R: t1.X = t2.X → t1.Y = t2.Y

Example:

emp_id → name, department, salary

If two tuples have the same emp_id, they must have the same name, department, and salary. This makes emp_id a candidate key.
zip_code → city, state (in US)

A zip code determines the city and state.

Armstrong's Axioms

Functional dependencies obey a complete set of inference rules called Armstrong's Axioms:

1. Reflexivity: If Y ⊆ X, then X → Y

Any set of attributes determines its subsets
{A, B} → B is always true (trivial FD)

2. Augmentation: If X → Y, then XZ → YZ

Adding attributes to both sides preserves the FD
If emp_id → name, then {emp_id, dept} → {name, dept}

3. Transitivity: If X → Y and Y → Z, then X → Z

Dependencies chain together
If emp_id → dept_id and dept_id → dept_name, then emp_id → dept_name

Derived Rules:

Union: If X → Y and X → Z, then X → YZ
Decomposition: If X → YZ, then X → Y and X → Z
Pseudotransitivity: If X → Y and WY → Z, then WX → Z

These rules are sound (only produce valid implications) and complete (can derive all valid implications).

Using Functional DependenciesFinding candidate keys from FDs

Input

Relation: CourseSection(course_id, section, semester, year, instructor, room)

Functional Dependencies:
  1. {course_id, section, semester, year} → instructor
  2. {course_id, section, semester, year} → room
  3. {room, semester, year} → course_id, section  (room is used for one course at a time)

Question: What are the candidate keys?

Output

Analysis:
  - From FDs 1 & 2 (union): {course_id, section, semester, year} → {instructor, room}
  - This means {course_id, section, semester, year} determines ALL other attributes
  - Therefore it's a superkey
  
  - Is it minimal? Yes—no subset determines all attributes
  - Therefore it's a CANDIDATE KEY

  - From FD 3: {room, semester, year} → {course_id, section}
  - Combined with FD 1: {room, semester, year} → entire relation
  - Therefore {room, semester, year} is ANOTHER candidate key!

The relation has two candidate keys.

FDs Enable Normalization

The Power of Mathematical Formalism

The mathematical foundations of the relational model aren't just theoretical elegance—they enable practical capabilities that would be impossible otherwise.

Practical Benefits of Mathematical Foundations

•Query Optimization — Algebraic equivalences allow databases to automatically rewrite queries for better performance without changing results. You write readable SQL; the database executes efficiently.
•Correctness Proofs — Operations are formally defined, so we can prove query transformations preserve semantics. You can trust the optimizer.
•View Updates — Mathematical analysis determines when updates to views can be unambiguously propagated to base tables.
•Constraint Propagation — Formal constraint languages enable automatic enforcement and violation detection across complex relationships.
•Normalization Theory — Mathematical definitions of normal forms enable systematic design that provably eliminates redundancy and anomalies.
•Transaction Semantics — ACID properties can be formally defined and verified, ensuring data integrity under concurrent access.
•Parallel Execution — Algebraic properties identify operations that can safely run in parallel, enabling distributed query processing.

Relational Completeness

A query language is relationally complete if it can express any query that relational algebra/calculus can express. Codd defined this as the minimum bar for a relational query language.

Beyond Completeness: Extensions

Modern SQL extends beyond relational completeness with:

Aggregation (SUM, COUNT, GROUP BY)
Transitive closure / recursion (WITH RECURSIVE)
Window functions (RANK, OVER)
Temporal queries (AS OF, BETWEEN)

Each extension has its own theoretical foundations, but all build on the solid set-theoretic and logical base.

Theory in Practice

Summary: The Mathematical Core

The relational model's mathematical foundations are what elevate it from a data storage technique to a principled data management system. Let's consolidate the key concepts:

Key Takeaways

•Set theory provides structural semantics — Relations are sets of tuples; set operations apply directly; closure under operations enables query composition.
•Predicate logic provides query semantics — Tuples represent true propositions; queries express conditions tuples must satisfy; quantifiers express complex conditions.
•Relational algebra is procedural — Operations transform relations step by step; forms the basis for query execution plans and optimization.
•Relational calculus is declarative — Describes desired results without specifying computation; closer to how SQL queries are written.
•Codd's theorem proves equivalence — Algebra and safe calculus express the same queries; enables declarative input with procedural execution.
•Algebraic laws enable optimization — Equivalences allow query rewriting for efficiency; this is why declarative SQL can execute efficiently.
•Functional dependencies formalize design — FDs describe attribute relationships; Armstrong's axioms enable systematic normalization.

What's Next:

Page Complete