Tuple Relational Calculus - Learning Module

Loading content...

0/252

Range Expression

Declaring Where Tuple Variables Range

In the previous section, we established that tuple variables are symbols that iterate over tuples. But a critical question remains: How do we specify which relation a tuple variable iterates over?

This is the role of the range expression—the formal declaration that binds a tuple variable to a specific relation. Without range expressions, tuple variables would be untethered symbols with no defined domain of values.

Range expressions are not merely syntactic sugar. They serve fundamental purposes:

Domain Specification: They define the set of possible bindings for a tuple variable
Type Checking: They enable schema validation—ensuring attribute references are valid
Query Safety: They are essential for ensuring queries produce finite results
Optimization: They inform query planners which relations must be accessed

This page provides a comprehensive treatment of range expressions—their syntax, semantics, variations, and implications for query construction.

What You Will Learn

By the end of this page, you will master the syntax and semantics of range expressions across different TRC notations, understand how range declarations affect query safety, and recognize how range expressions translate to SQL FROM clauses and subquery structures.

Formal Syntax of Range Expressions

Different textbooks and formal treatments of TRC use varying syntax for range expressions. Understanding these variations prepares you for diverse academic and practical contexts.

Notation Style 1: Membership Notation

The most intuitive notation expresses range as set membership:

t ∈ R

This reads: "tuple variable t is an element of (ranges over) relation R."

Notation Style 2: Predicate Notation

An alternative treats the relation name as a predicate:

R(t)

This reads: "tuple variable t satisfies the predicate R" (i.e., t is a tuple in R).

Notation Style 3: Explicit Range Declaration

Some formal systems separate range declarations from the formula:

{ t | t.range = R ∧ P(t) }

or with explicit range specification:

{ t : R | P(t) }

where the : R binds t to relation R.

Notation Style 4: Subscript Notation

For compactness in mathematical texts:

{tᵣ | P(t)}

where the subscript indicates the relation.

Range Expression Syntax Variants
Style	Syntax	Reading	Common In
Membership	`t ∈ R`	`t` is in `R`	Set theory texts
Predicate	`R(t)`	`R` holds for `t`	Logic/Codd's original
Typed Target	`{ t : R \| P }`	`t` of type `R`	Formal specifications
Conditional	`t ∈ R ∧ ...`	Combined in formula	Database textbooks
Subscript	`tᵣ`	Subscript denotes range	Compact notation

Semantic Equivalence

All these notations are semantically equivalent—they all declare that a tuple variable ranges over a specific relation. The choice is primarily stylistic and contextual. In this module, we primarily use the membership notation (t ∈ R) and predicate notation (R(t)) as they are most common in database literature.

Complete Query Structure with Range:

A well-formed TRC query explicitly declares ranges for all tuple variables. The general form is:

{ target-expression | range-declarations ∧ selection-condition }

For example:

{ e.name, e.salary | Employee(e) ∧ e.dept_id = 5 ∧ e.salary > 50000 }

Here:

e.name, e.salary is the target expression (what to return)
Employee(e) is the range declaration
e.dept_id = 5 ∧ e.salary > 50000 is the selection condition

Semantic Interpretation of Range Expressions

Beyond syntax, we must understand the semantic meaning of range expressions—what they imply about query evaluation.

The Universal Domain Perspective:

In pure first-order logic, variables range over a universal domain of all possible values. This creates problems for databases:

The universe includes infinitely many possible tuples
Queries might need to enumerate all these tuples to find results
Results could be infinite

The Relation-Restricted Perspective:

TRC resolves this by restricting variables to specific relations:

Employee(e) means e ranges only over existing tuples in Employee
The domain is finite (the current database instance)
Query evaluation is guaranteed to terminate

This is a crucial semantic restriction that makes TRC practically evaluable.

Instance vs. Schema:

Range expressions operate at two levels:

Schema Level: Employee(e) declares that e has the same structure as Employee tuples—the same attributes with the same domains. This enables type checking.
Instance Level: At evaluation time, e ranges over the current instance of Employee—whatever tuples actually exist in the database at query time.

Dynamic Binding

Range expressions don't fix which tuples the variable binds to—they specify the relation. The actual tuples depend on the database state when the query executes. A query written today may return different results tomorrow if the underlying data changes.

Closed World Assumption:

TRC (and SQL) operate under the Closed World Assumption (CWA):

Facts in the database are true
Facts not in the database are false
There are no unknown facts

For range expressions, this means:

Employee(e)  // e can only be bound to tuples explicitly stored in Employee
             // If a person exists but isn't in the Employee table, they can't bind to e

This assumption enables negation: "Find employees with no recorded dependents" can be evaluated because absence from the Dependent table means no dependents exist.

Open World Alternative:

Contrast with semantic web databases using the Open World Assumption:

Absence of a fact doesn't mean it's false
It might just be unknown
Negation becomes problematic

Standard TRC assumes closed world, making range expressions definitive rather than tentative.

Range Semantics ExampleUnderstanding what a range expression actually binds

Input

Given: Employee table with tuples {E1, E2, E3}
Query: { e | Employee(e) ∧ e.salary > 50000 }

Output

Subset of {E1, E2, E3} where salary > 50000

Explanation

The range expression Employee(e) restricts e to bind only to E1, E2, or E3. Even if you imagine an employee E4 with salary 60000, if E4 isn't in the table, the query cannot return it. The formula is evaluated exactly three times—once for each existing tuple.

Multiple Relation Ranges

Real queries frequently involve multiple tuple variables, each ranging over potentially different relations. This section explores the semantics and syntax of multi-relation range declarations.

Cartesian Product Semantics:

When a query has multiple free tuple variables with ranges, the conceptual evaluation considers all combinations:

{ e, d | Employee(e) ∧ Department(d) ∧ condition }

Semantically, this iterates over:

for each e in Employee:
    for each d in Department:
        if condition(e, d) is TRUE:
            include (e, d) in result

This is the Cartesian product of Employee × Department, filtered by the condition.

Join Conditions:

Typically, a join condition connects the variables meaningfully:

{ e, d | Employee(e) ∧ Department(d) ∧ e.dept_id = d.dept_id }

The e.dept_id = d.dept_id predicate filters the Cartesian product to matching pairs—an equijoin.

Same-Relation Multi-Ranging:

Multiple variables can range over the same relation:

{ e₁, e₂ | Employee(e₁) ∧ Employee(e₂) ∧ e₁.manager_id = e₂.emp_id }

Here, both e₁ and e₂ range over Employee, but they bind independently. This is essential for self-joins—comparing tuples within one relation.

multiple-ranges.trc
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
-- Three-Way Join: Employees, Departments, Locations
{ e, d, l | Employee(e) ∧ 
            Department(d) ∧ 
            Location(l) ∧ 
            e.dept_id = d.dept_id ∧ 
            d.location_id = l.location_id }
 
-- Self-Join: Find employees earning more than their manager
{ emp | Employee(emp) ∧ 
        ∃mgr(Employee(mgr) ∧ 
             emp.manager_id = mgr.emp_id ∧ 
             emp.salary > mgr.salary) }
 
-- Mixed: Free and Bound Variables with Multiple Ranges
{ e | Employee(e) ∧ 
      ∃d(Department(d) ∧ e.dept_id = d.dept_id ∧ d.budget > 1000000) ∧
      ∃p(Project(p) ∧ p.lead_id = e.emp_id) }
 
-- Interpretation: 
-- Find employees who work in high-budget departments AND lead a project

Range Declarations Must Be Complete

Every tuple variable used in a TRC query must have its range declared. Using a variable without a range declaration (e.g., e.salary > 50000 without Employee(e)) is a malformed query. Some notations make this implicit; others require explicit declaration. Always ensure every variable's range is clear.

Order Independence:

Range declarations are order-independent in their semantics. The following are equivalent:

{ e, d | Employee(e) ∧ Department(d) ∧ ... }
{ e, d | Department(d) ∧ Employee(e) ∧ ... }

Both declare the same ranges. The query optimizer decides actual evaluation order based on efficiency, not declaration order.

Combining Ranges with Conditions:

Range predicates and selection conditions combine via conjunction (∧):

{ e | Employee(e) ∧ e.salary > 50000 ∧ e.dept_id = 5 }

Employee(e) is the range
e.salary > 50000 and e.dept_id = 5 are selection conditions

All must be true for e to be included in the result.

Range Expressions and Query Safety

Range expressions play a critical role in ensuring query safety—the guarantee that a query produces a finite result. This is not merely an academic concern; unsafe queries are unevaluable in practice.

The Safety Problem:

Consider this apparently innocent query:

{ e | ¬Employee(e) }

This asks for "all tuples that are NOT in Employee." But what is the domain? If we consider all possible tuples (infinite), the result is infinite—we cannot enumerate it.

How Range Expressions Ensure Safety:

A key safety requirement is that every tuple variable in the result must be positively bounded—its range must be constrained to a finite set (a stored relation or derived from one).

Safe Range Rules:

Positive Range Declaration: The variable appears in a positive (non-negated) range predicate like R(t)
Existential Safety: Variables under existential quantifiers must also be positively bounded within their scope
Universal Safety: Universal quantifiers must have their range clearly limited

Safe vs. Unsafe Query Examples
Query	Safe?	Reason
`{ e \| Employee(e) ∧ e.sal > 50000 }`	✓ Safe	e positively bounded by Employee(e)
`{ e \| ¬Employee(e) }`	✗ Unsafe	e not positively bounded; infinite complement
`{ e \| Employee(e) ∧ ¬∃d(Department(d) ∧ e.dept=d.id) }`	✓ Safe	e bounded; d is existential and bounded
`{ t \| t.salary > 50000 }`	✗ Unsafe	t has no range declaration at all
`{ e \| Employee(e) ∨ e.salary > 50000 }`	✗ Unsafe	Disjunction: e might match via unbounded branch
`{ e \| ∃d(Department(d) ∧ e.dept_id = d.dept_id) }`	✗ Unsafe	e itself has no positive range

Safety is Decidable

For a well-defined subset of TRC (safe TRC), there are syntactic rules that guarantee safety. Query processors can check these rules before execution. The full treatment of safety conditions appears in a dedicated module, but range expressions are the primary mechanism for establishing positive bounds.

Domain-Independent Queries:

A related concept is domain independence: a query's result should depend only on the database contents, not on the underlying attribute domains.

Consider:

{ t | ¬Employee(t) ∧ t.salary = 50000 }

If the salary domain includes all integers, this returns infinitely many "non-employee" tuples with salary 50000. The result depends on the domain definition, not the data. Such queries are domain-dependent and typically rejected.

Range expressions ensure domain independence by grounding all variables in actual relations. When every free variable has a positive range declaration over a stored relation, the query result depends only on stored data.

The Safe TRC Subset:

Practical query languages (including SQL's underlying formalism) restrict to safe TRC:

All free variables must appear in a positive range predicate
Existentially quantified variables must be positively bounded within their scope
Universal quantifications must be safely constructed (typically as ∀t(R(t) → ...) ensuring t ranges over R)

This safe subset remains expressively equivalent to relational algebra while guaranteeing evaluable queries.

Implicit vs. Explicit Range Declarations

Different TRC formalizations vary in how explicitly they require range declarations. Understanding this spectrum helps when reading diverse literature.

Fully Explicit Style:

Some notations require every variable's range to be declared in a dedicated section:

RANGE OF e IS Employee
RANGE OF d IS Department

{ e.name, d.name | e.dept_id = d.dept_id ∧ e.salary > 50000 }

This style, reminiscent of early QUEL (Query Language) from Ingres, separates range declarations from the predicate.

Integrated Style:

More common is integrating range declarations as predicates:

{ e.name, d.name | Employee(e) ∧ Department(d) ∧ e.dept_id = d.dept_id ∧ e.salary > 50000 }

Here, Employee(e) and Department(d) serve as both range declarations and filter predicates.

Implicit/Inferred Style:

Some theoretical treatments allow attribute references to imply ranges:

{ e.name | e.salary > 50000 }

If only one relation has a salary attribute, e's range is inferred. This is convenient but potentially ambiguous in schemas with overlapping attribute names.

Explicit Declaration Benefits

•Unambiguous—no confusion about variable ranges
•Enables schema validation before execution
•Works for any attribute naming scheme
•Clear separation of concerns (range vs. filter)
•Required for safety checking algorithms

Implicit Declaration Benefits

•More concise query expressions
•Natural for simple, single-relation queries
•Reduces syntactic overhead
•Closer to natural language descriptions
•User-friendly for interactive environments

SQL's Approach:

SQL takes a middle ground:

SELECT e.name, d.name
FROM Employee AS e, Department AS d
WHERE e.dept_id = d.dept_id AND e.salary > 50000

Ranges are explicit in the FROM clause
The predicate (WHERE) is separate from ranges
Aliases (e, d) function as tuple variable names

This is essentially the explicit TRC style with specific syntactic structure. The FROM clause declares ranges; the WHERE clause specifies conditions.

Best Practice:

For clarity and correctness, prefer explicit range declarations, especially in:

Formal specifications
Complex multi-relation queries
Educational contexts
Systems that validate query safety

Implicit ranges are acceptable only in unambiguous contexts (single-relation queries in well-known schemas).

Range Expressions with Quantifiers

Quantified variables (those under ∃ or ∀) require careful range handling. The range declaration appears within the quantifier's scope.

Existential Quantifier with Range:

Standard form:

∃t(R(t) ∧ condition(t))

R(t) declares that t ranges over R
condition(t) specifies additional requirements
The entire expression is true if at least one tuple in R satisfies the condition

Example:

{ e | Employee(e) ∧ ∃d(Department(d) ∧ e.dept_id = d.dept_id ∧ d.budget > 1000000) }

For each employee e, check if there exists a department d such that e is in d and d has budget over 1M.

Universal Quantifier with Range:

Universal quantification requires special care for safety:

∀t(R(t) → condition(t))

This reads: "For all t, if t is in R, then condition(t) holds."

The implication (→) restricts the universal to tuples in R
Without the implication, the universal would quantify over all possible tuples (unsafe)
This is the restricted universal form, essential for safe TRC

quantifier-ranges.trc
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
-- EXISTENTIAL: Find departments that have at least one employee
{ d | Department(d) ∧ ∃e(Employee(e) ∧ e.dept_id = d.dept_id) }
 
-- UNIVERSAL: Find departments where ALL employees earn > 50000
-- Note the implication structure for safety
{ d | Department(d) ∧ 
      ∀e(Employee(e) ∧ e.dept_id = d.dept_id → e.salary > 50000) }
 
-- Alternative universal form (equivalent)
{ d | Department(d) ∧ 
      ¬∃e(Employee(e) ∧ e.dept_id = d.dept_id ∧ e.salary ≤ 50000) }
 
-- NESTED QUANTIFIERS: Employees who have worked on ALL projects in their department
{ e | Employee(e) ∧ 
      ∀p(Project(p) ∧ p.dept_id = e.dept_id → 
         ∃a(Assignment(a) ∧ a.emp_id = e.emp_id ∧ a.proj_id = p.proj_id)) }

Universal Quantifier Pitfall

Writing ∀t(condition(t)) without restricting to a relation range is unsafe—it quantifies over all possible tuples in the universe. Always use the form ∀t(R(t) → ...) or equivalently, convert to negated existential. This restricted form ensures the universal only considers tuples in R.

The Implication for Safety:

Why does ∀t(R(t) → condition(t)) work?

Logically, A → B is true when:

A is false (regardless of B), OR
Both A and B are true

For tuples NOT in R:

R(t) is false
So R(t) → condition(t) is true (implication with false antecedent)
These tuples trivially satisfy the universal

For tuples IN R:

R(t) is true
So condition(t) must be true for the implication to hold
These are the tuples we actually care about

Thus, the formula effectively means: "For all tuples in R, the condition holds." The implication restricts the universal to the finite set R.

SQL Equivalence:

SQL expresses universals via double negation:

-- "All employees in department 5 earn > 50000"
SELECT * FROM Department d
WHERE NOT EXISTS (
    SELECT 1 FROM Employee e
    WHERE e.dept_id = d.dept_id AND e.salary <= 50000
)

This is the NOT EXISTS (negated existential) pattern equivalent to the restricted universal.

Range Expressions in Extended TRC

Pure TRC ranges over base relations. Extended versions of TRC allow more sophisticated range specifications.

Ranging Over Query Results:

Some formalisms allow tuple variables to range over derived relations (query results):

{ t | (SELECT * FROM Employee WHERE salary > 50000)(t) ∧ ... }

Or with named derived relations:

LET HighEarners = { e | Employee(e) ∧ e.salary > 50000 }
{ h | HighEarners(h) ∧ h.dept_id = 5 }

This is essentially view composition—building queries on top of named subqueries.

Ranging Over Expressions:

Advanced systems allow ranges over computed sets:

{ t | t ∈ (Employee ∪ Contractor) ∧ ... }

Here, the range is the union of two relations. The tuple variable t can bind to tuples from either.

SQL Equivalents:

SQL supports these patterns through:

Subqueries in FROM:

SELECT * FROM (SELECT * FROM Employee WHERE salary > 50000) AS HighEarners
WHERE dept_id = 5

UNION in FROM (in some systems):

SELECT * FROM (
    SELECT * FROM Employee
    UNION ALL
    SELECT * FROM Contractor
) AS AllWorkers
WHERE ...

Common Table Expressions (CTEs):

WITH HighEarners AS (
    SELECT * FROM Employee WHERE salary > 50000
)
SELECT * FROM HighEarners WHERE dept_id = 5

Compositionality Principle

The ability to range over derived relations reflects TRC's compositionality—queries can be built from subqueries. This is fundamental to relational query languages and enables complex logic to be decomposed into manageable pieces. SQL CTEs and subqueries are the practical manifestation of this theoretical capability.

Domain Calculations:

Some extensions allow ranging over active domain values:

{ x | x ∈ π_salary(Employee) }

This yields all distinct salary values that appear in Employee—ranging over the active domain of the salary attribute rather than tuples.

This connects to Domain Relational Calculus (DRC), where variables range over domain values rather than tuples. We'll explore DRC in a later module.

Practical Implications:

Extended range capabilities enable:

Query modularization (named subqueries)
Recursive query patterns (in extended formalisms)
Aggregate-based ranges (after aggregation, range over groups)
Set operation results as ranges

While pure TRC restricts to base relations, practical query languages embrace these extensions for expressiveness.

Summary: Mastering Range Expressions

Range expressions, while seemingly simple, are foundational to TRC's semantics and practicality. Let's consolidate the key concepts:

Key Takeaways

•Purpose: Range expressions declare which relation a tuple variable iterates over, defining its domain of possible bindings
•Syntax Variants: Multiple notations exist (t ∈ R, R(t), { t : R | ... }) with equivalent semantics—choose based on context
•Semantic Role: Ranges restrict variables to finite sets (database relations), enabling query evaluation and ensuring domain independence
•Safety Foundation: Positive range declarations are essential for query safety—every free and bound variable must be properly ranged
•Multiple Ranges: Queries can have multiple tuple variables, each with its own range, creating Cartesian products filtered by conditions
•Quantifier Ranges: Bound variables under quantifiers require range declarations within their scope; universals need the implication pattern for safety
•SQL Correspondence: The FROM clause in SQL directly corresponds to TRC range declarations; table aliases are tuple variable names

What's Next:

With tuple variables and their range declarations established, we now turn to Formula Syntax—the complete grammar for expressing conditions in TRC. This includes logical connectives, comparison operators, and the formal structure of well-formed TRC expressions.

Page Complete

You now understand range expressions comprehensively—from syntax variations to semantic implications, from safety requirements to quantifier integration. This knowledge is essential for constructing correct, evaluable TRC queries.