Loading content...
In the previous section, we established that tuple variables are symbols that iterate over tuples. But a critical question remains: How do we specify which relation a tuple variable iterates over?
This is the role of the range expression—the formal declaration that binds a tuple variable to a specific relation. Without range expressions, tuple variables would be untethered symbols with no defined domain of values.
Range expressions are not merely syntactic sugar. They serve fundamental purposes:
This page provides a comprehensive treatment of range expressions—their syntax, semantics, variations, and implications for query construction.
By the end of this page, you will master the syntax and semantics of range expressions across different TRC notations, understand how range declarations affect query safety, and recognize how range expressions translate to SQL FROM clauses and subquery structures.
Different textbooks and formal treatments of TRC use varying syntax for range expressions. Understanding these variations prepares you for diverse academic and practical contexts.
Notation Style 1: Membership Notation
The most intuitive notation expresses range as set membership:
t ∈ R
This reads: "tuple variable t is an element of (ranges over) relation R."
Notation Style 2: Predicate Notation
An alternative treats the relation name as a predicate:
R(t)
This reads: "tuple variable t satisfies the predicate R" (i.e., t is a tuple in R).
Notation Style 3: Explicit Range Declaration
Some formal systems separate range declarations from the formula:
{ t | t.range = R ∧ P(t) }
or with explicit range specification:
{ t : R | P(t) }
where the : R binds t to relation R.
Notation Style 4: Subscript Notation
For compactness in mathematical texts:
{tᵣ | P(t)}
where the subscript indicates the relation.
| Style | Syntax | Reading | Common In |
|---|---|---|---|
| Membership | t ∈ R | t is in R | Set theory texts |
| Predicate | R(t) | R holds for t | Logic/Codd's original |
| Typed Target | { t : R | P } | t of type R | Formal specifications |
| Conditional | t ∈ R ∧ ... | Combined in formula | Database textbooks |
| Subscript | tᵣ | Subscript denotes range | Compact notation |
All these notations are semantically equivalent—they all declare that a tuple variable ranges over a specific relation. The choice is primarily stylistic and contextual. In this module, we primarily use the membership notation (t ∈ R) and predicate notation (R(t)) as they are most common in database literature.
Complete Query Structure with Range:
A well-formed TRC query explicitly declares ranges for all tuple variables. The general form is:
{ target-expression | range-declarations ∧ selection-condition }
For example:
{ e.name, e.salary | Employee(e) ∧ e.dept_id = 5 ∧ e.salary > 50000 }
Here:
e.name, e.salary is the target expression (what to return)Employee(e) is the range declaratione.dept_id = 5 ∧ e.salary > 50000 is the selection conditionBeyond syntax, we must understand the semantic meaning of range expressions—what they imply about query evaluation.
The Universal Domain Perspective:
In pure first-order logic, variables range over a universal domain of all possible values. This creates problems for databases:
The Relation-Restricted Perspective:
TRC resolves this by restricting variables to specific relations:
Employee(e) means e ranges only over existing tuples in EmployeeThis is a crucial semantic restriction that makes TRC practically evaluable.
Instance vs. Schema:
Range expressions operate at two levels:
Schema Level: Employee(e) declares that e has the same structure as Employee tuples—the same attributes with the same domains. This enables type checking.
Instance Level: At evaluation time, e ranges over the current instance of Employee—whatever tuples actually exist in the database at query time.
Range expressions don't fix which tuples the variable binds to—they specify the relation. The actual tuples depend on the database state when the query executes. A query written today may return different results tomorrow if the underlying data changes.
Closed World Assumption:
TRC (and SQL) operate under the Closed World Assumption (CWA):
For range expressions, this means:
Employee(e) // e can only be bound to tuples explicitly stored in Employee
// If a person exists but isn't in the Employee table, they can't bind to e
This assumption enables negation: "Find employees with no recorded dependents" can be evaluated because absence from the Dependent table means no dependents exist.
Open World Alternative:
Contrast with semantic web databases using the Open World Assumption:
Standard TRC assumes closed world, making range expressions definitive rather than tentative.
Given: Employee table with tuples {E1, E2, E3}
Query: { e | Employee(e) ∧ e.salary > 50000 }Subset of {E1, E2, E3} where salary > 50000The range expression Employee(e) restricts e to bind only to E1, E2, or E3. Even if you imagine an employee E4 with salary 60000, if E4 isn't in the table, the query cannot return it. The formula is evaluated exactly three times—once for each existing tuple.
Real queries frequently involve multiple tuple variables, each ranging over potentially different relations. This section explores the semantics and syntax of multi-relation range declarations.
Cartesian Product Semantics:
When a query has multiple free tuple variables with ranges, the conceptual evaluation considers all combinations:
{ e, d | Employee(e) ∧ Department(d) ∧ condition }
Semantically, this iterates over:
for each e in Employee:
for each d in Department:
if condition(e, d) is TRUE:
include (e, d) in result
This is the Cartesian product of Employee × Department, filtered by the condition.
Join Conditions:
Typically, a join condition connects the variables meaningfully:
{ e, d | Employee(e) ∧ Department(d) ∧ e.dept_id = d.dept_id }
The e.dept_id = d.dept_id predicate filters the Cartesian product to matching pairs—an equijoin.
Same-Relation Multi-Ranging:
Multiple variables can range over the same relation:
{ e₁, e₂ | Employee(e₁) ∧ Employee(e₂) ∧ e₁.manager_id = e₂.emp_id }
Here, both e₁ and e₂ range over Employee, but they bind independently. This is essential for self-joins—comparing tuples within one relation.
1234567891011121314151617181920
-- Three-Way Join: Employees, Departments, Locations{ e, d, l | Employee(e) ∧ Department(d) ∧ Location(l) ∧ e.dept_id = d.dept_id ∧ d.location_id = l.location_id } -- Self-Join: Find employees earning more than their manager{ emp | Employee(emp) ∧ ∃mgr(Employee(mgr) ∧ emp.manager_id = mgr.emp_id ∧ emp.salary > mgr.salary) } -- Mixed: Free and Bound Variables with Multiple Ranges{ e | Employee(e) ∧ ∃d(Department(d) ∧ e.dept_id = d.dept_id ∧ d.budget > 1000000) ∧ ∃p(Project(p) ∧ p.lead_id = e.emp_id) } -- Interpretation: -- Find employees who work in high-budget departments AND lead a projectEvery tuple variable used in a TRC query must have its range declared. Using a variable without a range declaration (e.g., e.salary > 50000 without Employee(e)) is a malformed query. Some notations make this implicit; others require explicit declaration. Always ensure every variable's range is clear.
Order Independence:
Range declarations are order-independent in their semantics. The following are equivalent:
{ e, d | Employee(e) ∧ Department(d) ∧ ... }
{ e, d | Department(d) ∧ Employee(e) ∧ ... }
Both declare the same ranges. The query optimizer decides actual evaluation order based on efficiency, not declaration order.
Combining Ranges with Conditions:
Range predicates and selection conditions combine via conjunction (∧):
{ e | Employee(e) ∧ e.salary > 50000 ∧ e.dept_id = 5 }
Employee(e) is the rangee.salary > 50000 and e.dept_id = 5 are selection conditionsAll must be true for e to be included in the result.
Range expressions play a critical role in ensuring query safety—the guarantee that a query produces a finite result. This is not merely an academic concern; unsafe queries are unevaluable in practice.
The Safety Problem:
Consider this apparently innocent query:
{ e | ¬Employee(e) }
This asks for "all tuples that are NOT in Employee." But what is the domain? If we consider all possible tuples (infinite), the result is infinite—we cannot enumerate it.
How Range Expressions Ensure Safety:
A key safety requirement is that every tuple variable in the result must be positively bounded—its range must be constrained to a finite set (a stored relation or derived from one).
Safe Range Rules:
R(t)| Query | Safe? | Reason |
|---|---|---|
{ e | Employee(e) ∧ e.sal > 50000 } | ✓ Safe | e positively bounded by Employee(e) |
{ e | ¬Employee(e) } | ✗ Unsafe | e not positively bounded; infinite complement |
{ e | Employee(e) ∧ ¬∃d(Department(d) ∧ e.dept=d.id) } | ✓ Safe | e bounded; d is existential and bounded |
{ t | t.salary > 50000 } | ✗ Unsafe | t has no range declaration at all |
{ e | Employee(e) ∨ e.salary > 50000 } | ✗ Unsafe | Disjunction: e might match via unbounded branch |
{ e | ∃d(Department(d) ∧ e.dept_id = d.dept_id) } | ✗ Unsafe | e itself has no positive range |
For a well-defined subset of TRC (safe TRC), there are syntactic rules that guarantee safety. Query processors can check these rules before execution. The full treatment of safety conditions appears in a dedicated module, but range expressions are the primary mechanism for establishing positive bounds.
Domain-Independent Queries:
A related concept is domain independence: a query's result should depend only on the database contents, not on the underlying attribute domains.
Consider:
{ t | ¬Employee(t) ∧ t.salary = 50000 }
If the salary domain includes all integers, this returns infinitely many "non-employee" tuples with salary 50000. The result depends on the domain definition, not the data. Such queries are domain-dependent and typically rejected.
Range expressions ensure domain independence by grounding all variables in actual relations. When every free variable has a positive range declaration over a stored relation, the query result depends only on stored data.
The Safe TRC Subset:
Practical query languages (including SQL's underlying formalism) restrict to safe TRC:
∀t(R(t) → ...) ensuring t ranges over R)This safe subset remains expressively equivalent to relational algebra while guaranteeing evaluable queries.
Different TRC formalizations vary in how explicitly they require range declarations. Understanding this spectrum helps when reading diverse literature.
Fully Explicit Style:
Some notations require every variable's range to be declared in a dedicated section:
RANGE OF e IS Employee
RANGE OF d IS Department
{ e.name, d.name | e.dept_id = d.dept_id ∧ e.salary > 50000 }
This style, reminiscent of early QUEL (Query Language) from Ingres, separates range declarations from the predicate.
Integrated Style:
More common is integrating range declarations as predicates:
{ e.name, d.name | Employee(e) ∧ Department(d) ∧ e.dept_id = d.dept_id ∧ e.salary > 50000 }
Here, Employee(e) and Department(d) serve as both range declarations and filter predicates.
Implicit/Inferred Style:
Some theoretical treatments allow attribute references to imply ranges:
{ e.name | e.salary > 50000 }
If only one relation has a salary attribute, e's range is inferred. This is convenient but potentially ambiguous in schemas with overlapping attribute names.
SQL's Approach:
SQL takes a middle ground:
SELECT e.name, d.name
FROM Employee AS e, Department AS d
WHERE e.dept_id = d.dept_id AND e.salary > 50000
FROM clauseThis is essentially the explicit TRC style with specific syntactic structure. The FROM clause declares ranges; the WHERE clause specifies conditions.
Best Practice:
For clarity and correctness, prefer explicit range declarations, especially in:
Implicit ranges are acceptable only in unambiguous contexts (single-relation queries in well-known schemas).
Quantified variables (those under ∃ or ∀) require careful range handling. The range declaration appears within the quantifier's scope.
Existential Quantifier with Range:
Standard form:
∃t(R(t) ∧ condition(t))
R(t) declares that t ranges over Rcondition(t) specifies additional requirementsExample:
{ e | Employee(e) ∧ ∃d(Department(d) ∧ e.dept_id = d.dept_id ∧ d.budget > 1000000) }
For each employee e, check if there exists a department d such that e is in d and d has budget over 1M.
Universal Quantifier with Range:
Universal quantification requires special care for safety:
∀t(R(t) → condition(t))
This reads: "For all t, if t is in R, then condition(t) holds."
12345678910111213141516
-- EXISTENTIAL: Find departments that have at least one employee{ d | Department(d) ∧ ∃e(Employee(e) ∧ e.dept_id = d.dept_id) } -- UNIVERSAL: Find departments where ALL employees earn > 50000-- Note the implication structure for safety{ d | Department(d) ∧ ∀e(Employee(e) ∧ e.dept_id = d.dept_id → e.salary > 50000) } -- Alternative universal form (equivalent){ d | Department(d) ∧ ¬∃e(Employee(e) ∧ e.dept_id = d.dept_id ∧ e.salary ≤ 50000) } -- NESTED QUANTIFIERS: Employees who have worked on ALL projects in their department{ e | Employee(e) ∧ ∀p(Project(p) ∧ p.dept_id = e.dept_id → ∃a(Assignment(a) ∧ a.emp_id = e.emp_id ∧ a.proj_id = p.proj_id)) }Writing ∀t(condition(t)) without restricting to a relation range is unsafe—it quantifies over all possible tuples in the universe. Always use the form ∀t(R(t) → ...) or equivalently, convert to negated existential. This restricted form ensures the universal only considers tuples in R.
The Implication for Safety:
Why does ∀t(R(t) → condition(t)) work?
Logically, A → B is true when:
For tuples NOT in R:
R(t) is falseR(t) → condition(t) is true (implication with false antecedent)For tuples IN R:
R(t) is truecondition(t) must be true for the implication to holdThus, the formula effectively means: "For all tuples in R, the condition holds." The implication restricts the universal to the finite set R.
SQL Equivalence:
SQL expresses universals via double negation:
-- "All employees in department 5 earn > 50000"
SELECT * FROM Department d
WHERE NOT EXISTS (
SELECT 1 FROM Employee e
WHERE e.dept_id = d.dept_id AND e.salary <= 50000
)
This is the NOT EXISTS (negated existential) pattern equivalent to the restricted universal.
Pure TRC ranges over base relations. Extended versions of TRC allow more sophisticated range specifications.
Ranging Over Query Results:
Some formalisms allow tuple variables to range over derived relations (query results):
{ t | (SELECT * FROM Employee WHERE salary > 50000)(t) ∧ ... }
Or with named derived relations:
LET HighEarners = { e | Employee(e) ∧ e.salary > 50000 }
{ h | HighEarners(h) ∧ h.dept_id = 5 }
This is essentially view composition—building queries on top of named subqueries.
Ranging Over Expressions:
Advanced systems allow ranges over computed sets:
{ t | t ∈ (Employee ∪ Contractor) ∧ ... }
Here, the range is the union of two relations. The tuple variable t can bind to tuples from either.
SQL Equivalents:
SQL supports these patterns through:
SELECT * FROM (SELECT * FROM Employee WHERE salary > 50000) AS HighEarners
WHERE dept_id = 5
SELECT * FROM (
SELECT * FROM Employee
UNION ALL
SELECT * FROM Contractor
) AS AllWorkers
WHERE ...
WITH HighEarners AS (
SELECT * FROM Employee WHERE salary > 50000
)
SELECT * FROM HighEarners WHERE dept_id = 5
The ability to range over derived relations reflects TRC's compositionality—queries can be built from subqueries. This is fundamental to relational query languages and enables complex logic to be decomposed into manageable pieces. SQL CTEs and subqueries are the practical manifestation of this theoretical capability.
Domain Calculations:
Some extensions allow ranging over active domain values:
{ x | x ∈ π_salary(Employee) }
This yields all distinct salary values that appear in Employee—ranging over the active domain of the salary attribute rather than tuples.
This connects to Domain Relational Calculus (DRC), where variables range over domain values rather than tuples. We'll explore DRC in a later module.
Practical Implications:
Extended range capabilities enable:
While pure TRC restricts to base relations, practical query languages embrace these extensions for expressiveness.
Range expressions, while seemingly simple, are foundational to TRC's semantics and practicality. Let's consolidate the key concepts:
t ∈ R, R(t), { t : R | ... }) with equivalent semantics—choose based on contextWhat's Next:
With tuple variables and their range declarations established, we now turn to Formula Syntax—the complete grammar for expressing conditions in TRC. This includes logical connectives, comparison operators, and the formal structure of well-formed TRC expressions.
You now understand range expressions comprehensively—from syntax variations to semantic implications, from safety requirements to quantifier integration. This knowledge is essential for constructing correct, evaluable TRC queries.