Loading learning content...
In the previous page, we stated Codd's Theorem: relational algebra and safe relational calculus possess identical expressive power. Such a claim demands proof—and not just any proof, but a constructive proof that provides actual algorithms for translating queries between these paradigms.
This page presents the key elements of that proof. We'll walk through the translations in both directions, demonstrating with concrete examples how calculus queries become algebraic expressions and vice versa. This isn't merely academic exercise—these translation algorithms mirror what real database systems do when compiling SQL into execution plans.
By the end of this page, you will understand the structural induction technique used to prove equivalence, the specific translation rules for each algebraic operator and calculus construct, worked examples showing complete translations, and the algorithmic nature of these proofs that enables implementation in database compilers.
The equivalence proof proceeds in two directions, each using structural induction on the syntax of query expressions:
Direction 1: Algebra → Calculus
{ t | R(t) }Direction 2: Calculus → Algebra (Safe Queries Only)
Both directions are constructive: they don't just prove equivalence exists, they provide explicit translation procedures.
Structural induction proves properties about recursively defined structures. For algebraic expressions: (1) prove the property holds for base expressions (relations), (2) prove that if it holds for subexpressions, it holds for any expression built from them. This covers all possible expressions in the language.
Notation conventions for this page:
| Symbol | Meaning |
|---|---|
| R, S | Base relations |
| E, E₁, E₂ | Algebraic expressions |
| φ, ψ | Calculus formulas |
| t, u, v | Tuple variables (TRC) |
| x, y, z | Domain variables (DRC) |
| σ, π, ×, ∪, − | Algebraic operators |
| ∧, ∨, ¬, ∃, ∀ | Logical connectives |
| [E] | The calculus equivalent of algebraic expression E |
| 〈φ〉 | The algebraic equivalent of calculus formula φ |
We prove that every relational algebra expression has an equivalent tuple relational calculus query. The proof proceeds by induction on the structure of algebraic expressions.
For any base relation R with schema R(A₁, A₂, ..., Aₙ), the equivalent TRC query is: { t | R(t) }. This simply retrieves all tuples in R.
Inductive Step: Translation Rules
Assume E₁ and E₂ are algebraic expressions with TRC equivalents [E₁] = { t | φ₁(t) } and [E₂] = { u | φ₂(u) }. We show each algebraic operator has a calculus equivalent:
1. Union (E₁ ∪ E₂)
[E₁ ∪ E₂] = { t | φ₁(t) ∨ φ₂(t) }
A tuple is in the union iff it satisfies the first formula OR the second.
2. Set Difference (E₁ − E₂)
[E₁ − E₂] = { t | φ₁(t) ∧ ¬φ₂(t) }
A tuple is in the difference iff it satisfies the first formula AND NOT the second.
3. Cartesian Product (E₁ × E₂)
[E₁ × E₂] = { (t, u) | φ₁(t) ∧ φ₂(u) }
A combined tuple is in the product iff the first part satisfies φ₁ and the second satisfies φ₂.
4. Selection σ_C(E₁) where C is a condition
[σ_C(E₁)] = { t | φ₁(t) ∧ C(t) }
A tuple is selected iff it satisfies the original formula AND the selection condition.
5. Projection π_{A₁,...,Aₖ}(E₁)
[π_{A₁,...,Aₖ}(E₁)] = { (t.A₁, ..., t.Aₖ) | ∃t(φ₁(t)) }
The projected attributes appear iff there EXISTS a tuple in E₁ with those values.
| Algebraic Expression | TRC Equivalent | Key Insight |
|---|---|---|
| R (base relation) | { t | R(t) } | Direct membership test |
| E₁ ∪ E₂ | { t | φ₁(t) ∨ φ₂(t) } | Union = disjunction |
| E₁ − E₂ | { t | φ₁(t) ∧ ¬φ₂(t) } | Difference = negated conjunction |
| E₁ × E₂ | { (t,u) | φ₁(t) ∧ φ₂(u) } | Product = independent conjunction |
| σ_C(E₁) | { t | φ₁(t) ∧ C(t) } | Selection = added condition |
| π_{A}(E₁) | { t.A | ∃t(φ₁(t)) } | Projection = existential hiding |
| E₁ ⋈_C E₂ | { (t,u) | φ₁(t) ∧ φ₂(u) ∧ C(t,u) } | Join = filtered product |
| E₁ ∩ E₂ | { t | φ₁(t) ∧ φ₂(t) } | Intersection = conjunction |
Let's translate a complete algebraic expression to TRC step by step.
Given algebraic query:
π_{name}(σ_{salary > 50000 ∧ dept = 'Engineering'}(Employee))
Goal: Find names of Engineering employees earning more than 50000.
Step-by-step translation:
Step 1: Translate the base relation
[Employee] = { e | Employee(e) }
Step 2: Translate the selection
[σ_{salary > 50000 ∧ dept = 'Engineering'}(Employee)]
= { e | Employee(e) ∧ e.salary > 50000 ∧ e.dept = 'Engineering' }
Step 3: Translate the projection
[π_{name}(σ_{...}(Employee))]
= { (e.name) | ∃e(Employee(e) ∧ e.salary > 50000 ∧ e.dept = 'Engineering') }
Final TRC query:
{ t | ∃e(Employee(e) ∧ e.salary > 50000 ∧ e.dept = 'Engineering' ∧ t.name = e.name) }
Or in a cleaner form using attribute binding:
{ ⟨n⟩ | ∃e(Employee(e) ∧ e.salary > 50000 ∧ e.dept = 'Engineering' ∧ e.name = n) }
Both the original algebraic expression and the TRC query describe the same set: names of employees in Engineering with salary above 50000. The translation preserves semantics—this is what the proof guarantees.
A more complex example with join:
Algebraic query:
π_{E.name, D.dname}(σ_{E.dno = D.dno}(Employee × Department))
Translation:
Step 1: [Employee] = { e | Employee(e) }
Step 2: [Department] = { d | Department(d) }
Step 3: [Employee × Department] = { (e, d) | Employee(e) ∧ Department(d) }
Step 4: [σ_{E.dno = D.dno}(...)] = { (e, d) | Employee(e) ∧ Department(d) ∧ e.dno = d.dno }
Step 5: [π_{name, dname}(...)] = { (n, dn) | ∃e∃d(Employee(e) ∧ Department(d) ∧ e.dno = d.dno
∧ e.name = n ∧ d.dname = dn) }
Final TRC query:
{ ⟨n, dn⟩ | ∃e∃d(Employee(e) ∧ Department(d) ∧ e.dno = d.dno ∧ e.name = n ∧ d.dname = dn) }
This reads naturally: 'Find name-department pairs where an employee exists in that department.'
The reverse direction—translating calculus to algebra—is more challenging because calculus formulas have arbitrary logical structure. The key technique is to systematically process each type of formula construct.
This translation only works for SAFE calculus queries. For unsafe queries (infinite or domain-dependent results), no algebraic equivalent exists. The translation algorithm will produce nonsensical or infinite operations for unsafe inputs.
The translation algorithm (overview):
Translation rules for atomic formulas:
〈R(t)〉 = R // Base relation membership
〈t.A = c〉 = σ_{A=c}(DOM) // Constant comparison
〈t.A = u.B〉 = σ_{A=B}(DOM × DOM) // Attribute comparison
〈t.A < c〉 = σ_{A<c}(DOM) // Inequality
Where DOM represents the active domain relation.
Translation rules for connectives:
〈φ₁ ∧ φ₂〉 = 〈φ₁〉 ∩ 〈φ₂〉 = 〈φ₁〉 − (〈φ₁〉 − 〈φ₂〉) // Conjunction
〈φ₁ ∨ φ₂〉 = 〈φ₁〉 ∪ 〈φ₂〉 // Disjunction
〈¬φ₁〉 = DOM − 〈φ₁〉 // Negation (requires active domain)
Translation rules for quantifiers:
〈∃x.φ(x,y)〉 = π_{y}(〈φ(x,y)〉) // Existential = projection
〈∀x.φ(x,y)〉 = DOM − π_{y}(DOM × DOM_{x} − 〈φ(x,y)〉) // Universal (complex)
The universal quantifier translation uses: ∀x.φ ≡ ¬∃x.¬φ
Let's translate a TRC query to relational algebra.
Given TRC query:
{ e.name | Employee(e) ∧ e.salary > 50000 ∧ ∃d(Department(d) ∧ d.dno = e.dno ∧ d.dname = 'Research') }
Goal: Names of employees earning over 50000 who work in Research.
Step-by-step translation:
Step 1: Identify the outermost structure
The query projects e.name from a formula with:
Step 2: Translate the existential subformula
∃d(Department(d) ∧ d.dno = e.dno ∧ d.dname = 'Research')
This checks if there exists a Research department matching the employee's dno.
〈∃d(Department(d) ∧ d.dno = e.dno ∧ d.dname = 'Research')〉
= π_{dno}(σ_{dname='Research'}(Department))
Projecting dno gives us the department numbers of Research departments.
Step 3: Combine with Employee conditions
Employee(e) ∧ e.salary > 50000 ∧ (e.dno ∈ Research departments)
Algebraically:
σ_{salary > 50000}(Employee) ⋈_{dno = dno} π_{dno}(σ_{dname='Research'}(Department))
Or using semi-join:
σ_{salary > 50000}(Employee) ⋉_{dno} σ_{dname='Research'}(Department)
Step 4: Apply the final projection
π_{name}(σ_{salary > 50000}(Employee) ⋈_{dno = dno} π_{dno}(σ_{dname='Research'}(Department)))
Final algebraic expression:
π_{name}(
σ_{salary > 50000}(Employee)
⋈_{Employee.dno = Department.dno}
σ_{dname = 'Research'}(Department)
)
Both expressions compute the same result: names of high-earning Research employees. The TRC describes WHAT, the algebra describes HOW. The translation is semantics-preserving.
Universal quantifiers (∀) require special attention. They appear in queries like 'find students who passed ALL courses' or 'find suppliers who supply ALL parts.' Translating them to algebra involves the division operation or its equivalent using difference.
Example: Universal quantifier query
{ s.sname | Student(s) ∧ ∀c(Course(c) → Enrolled(s.sid, c.cid)) }
Translation approach using ∀x.φ ≡ ¬∃x.¬φ:
∀c(Course(c) → Enrolled(s.sid, c.cid))
≡ ∀c(¬Course(c) ∨ Enrolled(s.sid, c.cid))
≡ ¬∃c(Course(c) ∧ ¬Enrolled(s.sid, c.cid))
≡ ¬∃c(c is a course AND s is not enrolled in c)
Algebraic translation:
-- All courses
π_{cid}(Course)
-- Courses each student is enrolled in
π_{sid, cid}(Enrolled)
-- Students missing at least one course
-- (Students paired with all courses) - (Actual enrollments)
Missing = π_{sid}((π_{sid}(Student) × π_{cid}(Course)) − Enrolled)
-- Students enrolled in ALL courses = All students - Those missing any
π_{sid}(Student) − Missing
-- Get names
π_{sname}(Student ⋈ (π_{sid}(Student) − Missing))
Recognizing this as division:
The operation 'find students enrolled in all courses' is exactly what the division operator (÷) computes:
π_{sid, cid}(Enrolled) ÷ π_{cid}(Course)
This returns sids of students whose enrolled courses include ALL courses.
When you see a universal quantifier over a membership test (∀x ∈ S, predicate(y,x)), consider the division operation. Division explicitly computes 'which y values are associated with ALL x values in S.'
A subtle but crucial aspect of the calculus-to-algebra translation involves the active domain—the set of values that actually appear in the database. This matters when translating negation.
The problem:
Consider the TRC query:
{ t | R(t) ∧ ¬S(t) }
This asks for tuples in R but not in S—simply the difference R − S.
But consider:
{ t | ¬R(t) }
This asks for tuples NOT in R. But from what universe? All possible tuples? That's infinite.
The active domain solution:
For safe queries, all free variables are constrained to range over values appearing in the database. The active domain ADOM is:
ADOM = π_{A₁}(R₁) ∪ π_{A₂}(R₁) ∪ ... ∪ π_{B₁}(R₂) ∪ ...
Union of all attribute values from all relations.
Safe negation translation:
〈¬R(t)〉 = ADOM^n − R // where n is the arity of R
This computes all tuples of active domain values that are not in R—a finite, computable set.
Example:
{ t | Employee(t) ∧ ¬Manager(t) }
Translation:
Employee − Manager
No need for ADOM here because Employee already constrains the domain.
Safe queries guarantee that every free variable is 'limited'—constrained by some positive occurrence in a relation. This ensures the active domain contains all relevant values. Unsafe queries fail precisely because they reference values outside ADOM, leading to infinite or domain-dependent results.
For the equivalence proof to be complete, we must establish that the translations preserve semantics in all cases.
The proof technique:
Each translation rule is verified individually:
Base case verification:
Inductive case verification:
Similar verification for each operator and construct
The full proof appears in database theory textbooks (Ullman, Abiteboul-Hull-Vianu) with all edge cases addressed.
Key theorems established:
Theorem (Algebra-Calculus Equivalence): For every relational schema S:
- For every RA expression E over S, there exists a safe TRC formula φ such that E ≡ { t | φ(t) }
- For every safe TRC formula φ over S, there exists an RA expression E such that { t | φ(t) } ≡ E
The translations are effective (algorithmic) and semantics-preserving.
We've now seen the constructive proof of Codd's Theorem—the algorithms that translate between algebra and calculus.
What's next:
With the equivalence established, we turn to an important question: What can't relational algebra (and equivalently, safe calculus) express? Page 4 examines the inherent limitations of the relational model—queries that are impossible to express regardless of how cleverly we write them. These limitations motivate the extensions we'll explore in Page 5.
You now understand the formal proof of Codd's Theorem at a level suitable for database professionals. The translation algorithms we examined are not merely theoretical—they mirror the actual compilation process in database systems, where SQL (calculus-like) becomes operator trees (algebra) for optimization and execution.