Domain Relational Calculus - Learning Module

Domain Variables: The Foundation of DRC

A Different Perspective on Variables

In our exploration of Tuple Relational Calculus (TRC), we worked with tuple variables—variables that range over entire tuples (rows) in a relation. While powerful and intuitive, this approach represents just one way of thinking about declarative queries in relational databases.

Domain Relational Calculus (DRC) offers an alternative paradigm—one where variables don't represent entire tuples but rather individual domain values (the atomic values that populate individual columns). This seemingly subtle distinction leads to a fundamentally different query formulation style, one that would eventually inspire one of the most influential database query interfaces ever created: Query-by-Example (QBE).

Understanding domain variables is essential because they represent a different level of abstraction—closer to the actual data values stored in the database. This perspective provides unique insights into query construction and has practical implications for how users interact with database systems.

What You Will Learn

By the end of this page, you will understand what domain variables are, how they differ fundamentally from tuple variables, how they range over attribute domains, and why this distinction matters for query formulation. You'll also see how domain variables provide a more granular view of data and enable alternative query expression styles.

What Are Domain Variables?

A domain variable is a variable that ranges over the values of a single domain (also called a data type or value set). In the context of relational databases, a domain is the set of all possible values that an attribute can take.

Formal Definition:

Let D be a domain (e.g., the set of all possible employee names, or the set of all valid salary values). A domain variable x is a variable such that for any valid assignment, x ∈ D—meaning x takes a value from domain D.

Key Insight: While a tuple variable represents an entire row with multiple attributes, a domain variable represents just one atomic value—a single cell in the relational table.

Mathematical Foundation:

In set-theoretic terms, if we have a relation R with schema R(A₁, A₂, ..., Aₙ), each attribute Aᵢ is associated with a domain dom(Aᵢ). A domain variable xᵢ ranges over dom(Aᵢ), meaning:

xᵢ ∈ dom(Aᵢ)

This contrasts with a tuple variable t that would range over the entire relation:

t ∈ R, where t = (v₁, v₂, ..., vₙ) and each vᵢ ∈ dom(Aᵢ)

The Granularity Difference

Think of it this way: A tuple variable is like selecting an entire row from a spreadsheet, while domain variables are like selecting individual cells. With tuple variables, you say 'give me this row.' With domain variables, you say 'give me the value in this column from a row where certain conditions hold.'
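As a rough illustration of the row-versus-cell distinction, the following Python sketch contrasts the two styles on a small in-memory relation. The sample rows and the names `Row`, `employee`, `trc_style`, and `drc_style` are assumptions made for this example; they are not part of any formal notation.

```python
from collections import namedtuple

# Hypothetical sample relation; attribute names are assumed for illustration.
Row = namedtuple("Row", ["emp_id", "name", "salary"])
employee = {
    Row("E001", "Alice", 75000),
    Row("E002", "Bob", 62000),
}

# Tuple-variable style: t stands for a whole row; attributes are reached via t.attr.
trc_style = {(t.name, t.salary) for t in employee if t.salary > 70000}

# Domain-variable style: e, n, s each stand for one cell value of a row.
drc_style = {(n, s) for (e, n, s) in employee if s > 70000}

print(trc_style == drc_style)  # both styles describe the same result set
```

Both comprehensions select the same data; the difference lies only in whether the variable names a row or a single value.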

Tuple Variables vs Domain Variables
Aspect	Tuple Variables (TRC)	Domain Variables (DRC)
What it represents	An entire tuple (row)	A single attribute value (cell)
Ranges over	Tuples in a relation	Values in a domain
Access notation	t.attribute (dot notation)	Variable names directly
Number needed per tuple	One per relation reference	One per attribute
Conceptual level	Row-level abstraction	Value-level abstraction
Closer analogy	Selecting a row	Selecting individual cells

Understanding Domains in Depth

Before we can fully appreciate domain variables, we must have a rigorous understanding of what domains are in the relational model.

Definition of Domain:

A domain is a named set of atomic (indivisible) values. Each domain has a logical definition that specifies what values belong to it. Domains can be:

Primitive domains — Built-in types like INTEGER, REAL, CHAR(n), VARCHAR(n), DATE, BOOLEAN
User-defined domains — Custom value sets defined for specific applications
Enumerated domains — Finite sets of explicitly listed values

Examples of Domains:

domain_examples.sql
-- Primitive domains (built-in types), described informally
-- EmployeeID_Domain = INTEGER        (positive integers only)
-- Salary_Domain     = DECIMAL(10,2)  (monetary values)
-- Name_Domain       = VARCHAR(100)   (character strings up to 100 chars)

-- User-defined domains (CREATE DOMAIN is standardized from SQL-92 onward;
-- the ~ regex operator used below is PostgreSQL-specific)
CREATE DOMAIN PhoneNumber AS VARCHAR(20)
    CHECK (VALUE ~ '^[0-9()+ -]+$');

CREATE DOMAIN PositiveInteger AS INTEGER
    CHECK (VALUE > 0);

-- Enumerated domains (finite value sets), described informally
-- Department_Domain = {'Engineering', 'Sales', 'HR', 'Finance', 'Marketing'}
-- Status_Domain     = {'Active', 'Inactive', 'Pending', 'Terminated'}
-- Grade_Domain      = {'A', 'B', 'C', 'D', 'F'}

Domain Compatibility:

In the relational model, attributes are associated with domains, and meaningful comparisons can only be made between values from the same or compatible domains. For example:

Comparing employee.salary with budget.amount makes sense if both are numeric monetary values
Comparing employee.name with department.id doesn't make semantic sense even if both could be stored as strings
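
One way to picture domain compatibility is to tag each value with the name of its domain and refuse comparisons across different domains. The sketch below is only an illustration of that idea; `DomainValue`, `less_than`, and the domain names "Money", "PersonName", and "DeptID" are hypothetical and do not come from any particular DBMS.

```python
from typing import NamedTuple


class DomainValue(NamedTuple):
    domain: str    # name of the domain the value is drawn from (assumed label)
    value: object  # the atomic value itself


def less_than(a: DomainValue, b: DomainValue) -> bool:
    """Compare two values only if they come from the same domain."""
    if a.domain != b.domain:
        raise TypeError(f"cannot compare {a.domain} with {b.domain}")
    return a.value < b.value


salary = DomainValue("Money", 75000)
budget = DomainValue("Money", 90000)
name = DomainValue("PersonName", "Alice")
dept_id = DomainValue("DeptID", "D10")

print(less_than(salary, budget))  # same domain, so the comparison is allowed

try:
    less_than(name, dept_id)      # both are strings, but the domains differ
except TypeError as err:
    print("rejected:", err)
```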

The Active Domain:

While a domain D defines all possible values, the active domain (also called the database domain) at any point in time is the set of values that actually appear in the current database instance. This distinction becomes important when we discuss safe queries later.

Active_Domain(A) = {t.A | t ∈ R} ⊆ dom(A)

The active domain is always a subset of the full domain, containing only those values currently stored in the database.
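
The set-builder expression above translates almost directly into code. The following minimal sketch computes the active domain of one attribute by position; the sample `employee` rows and the helper name `active_domain` are assumptions for illustration.

```python
# Sample Employee(EmpID, Name, Salary, DeptID) instance, assumed for illustration.
employee = {
    ("E001", "Alice Chen", 75000, "D10"),
    ("E002", "Bob Smith", 62000, "D10"),
    ("E003", "Carol Davis", 88000, "D20"),
    ("E004", "David Lee", 55000, "D30"),
}


def active_domain(relation, position):
    """Return {t[position] | t in relation}, the values actually stored."""
    return {t[position] for t in relation}


dept_ids = active_domain(employee, 3)
print(sorted(dept_ids))  # ['D10', 'D20', 'D30']
```

Although four tuples are stored, only three distinct department IDs occur, so the active domain of DeptID has three elements regardless of how large dom(DeptID) is.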

Why Domains Matter for DRC

In DRC, when we declare a domain variable, we're implicitly stating which domain it ranges over. This gives the query evaluator precise information about what values the variable can take, which is crucial for query optimization and for ensuring query safety (finite results).

Domain Variable Declaration and Binding

In Domain Relational Calculus, each domain variable must be bound to a specific domain before it can be used meaningfully in a query. This binding can be:

Explicit — The domain is declared directly
Implicit — The domain is inferred from usage in a membership condition

Explicit Declaration:

In formal DRC notation, we might write:

{<x, y, z> | x ∈ dom(Name), y ∈ dom(Salary), z ∈ dom(DeptID), ...}

This explicitly states that x ranges over names, y over salaries, and z over department IDs.

Implicit Binding Through Membership:

More commonly, domain variables are bound implicitly through their use in membership predicates. When we write:

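{<x, y> | ∃z (Employee(x, y, z))}

Assuming the three-attribute schema Employee(Name, Salary, DeptID) implied by the explicit declaration above, the variable x is implicitly bound to dom(Name) because it occupies the first position of the Employee predicate. Likewise, y is bound to dom(Salary) and z to dom(DeptID). Because z does not appear in the target list, it must be existentially quantified.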

Key Binding Concepts

• Positional binding: Variables are bound based on their position in a relation's attribute list. The first variable corresponds to the first attribute, second to second, and so on.
• Type consistency: Once bound, a variable maintains its domain type throughout the query. You cannot compare a name-domain variable with a salary-domain value.
• Multiple occurrences: A variable can appear multiple times in a formula, but all occurrences refer to the same value (like in algebra).
• Free vs. bound variables: Variables in the target list are free (they define the output), while variables bound by quantifiers (∃, ∀) are bound within their scope.

Binding Example in Detail:

Consider an Employee relation with schema:

Employee(EmpID, Name, Salary, DeptID)

If we write a DRC query:

{<n, s> | ∃e ∃d (Employee(e, n, s, d) ∧ s > 50000)}

Here:

e is bound to dom(EmpID) — the domain of employee IDs
n is bound to dom(Name) — the domain of employee names
s is bound to dom(Salary) — the domain of salary values
d is bound to dom(DeptID) — the domain of department IDs

The variables n and s are free (they appear in the target list), while e and d are existentially bound (they appear after ∃).
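
To make the roles of the four variables concrete, here is a small sketch that evaluates the query above by brute force. It assumes a sample Employee instance and restricts each variable to the values that occur in its column so that the loops are finite; the names `emp_ids`, `names`, `salaries`, and `dept_ids` are illustrative.

```python
# Sample Employee(EmpID, Name, Salary, DeptID) instance, assumed for illustration.
employee = {
    ("E001", "Alice Chen", 75000, "D10"),
    ("E002", "Bob Smith", 62000, "D10"),
    ("E003", "Carol Davis", 88000, "D20"),
    ("E004", "David Lee", 55000, "D30"),
}

# Candidate values for each variable, taken by position from the relation.
emp_ids = {t[0] for t in employee}   # range of e
names = {t[1] for t in employee}     # range of n
salaries = {t[2] for t in employee}  # range of s
dept_ids = {t[3] for t in employee}  # range of d

# n and s are free (they form the output); e and d are existentially
# quantified, which any(...) expresses directly.
result = {
    (n, s)
    for n in names
    for s in salaries
    if s > 50000
    and any((e, n, s, d) in employee for e in emp_ids for d in dept_ids)
}

print(sorted(result))
```

With this sample data every employee earns more than 50000, so all four (name, salary) pairs are returned. The values of e and d never reach the output; they only need to exist.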

The Importance of Proper Binding

Every domain variable must be bound either explicitly to a domain or implicitly through a relation membership condition. Unbound variables lead to undefined semantics and potentially infinite or meaningless results. Query safety ensures all variables are properly constrained.

The Relationship Between Domain Variables and Tuples

Understanding how domain variables relate to tuples is crucial for mastering DRC. While we work with individual values, these values combine to form tuples.

The Tuple Reconstruction Principle:

In DRC, a tuple is represented as an ordered list of domain variables. If a relation R has attributes A₁, A₂, ..., Aₙ, then a tuple in R is represented by domain variables x₁, x₂, ..., xₙ where:

R(x₁, x₂, ..., xₙ)

This predicate is true if and only if the tuple (x₁, x₂, ..., xₙ) exists in relation R.

Visual Representation:

Employee Relation Instance
EmpID (e)	Name (n)	Salary (s)	DeptID (d)
E001	Alice Chen	75000	D10
E002	Bob Smith	62000	D10
E003	Carol Davis	88000	D20
E004	David Lee	55000	D30

For the above relation:

Employee('E001', 'Alice Chen', 75000, 'D10') is true
Employee('E001', 'Bob Smith', 75000, 'D10') is false (no such tuple exists)
Employee('E005', 'Eve Wilson', 90000, 'D40') is false (no such tuple exists)
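
Viewed operationally, a membership predicate is just a set-membership test. The sketch below models the relation instance shown above as a Python set of tuples; the function name `Employee` mirrors the predicate and is otherwise an illustrative choice.

```python
# The Employee instance from the table above, stored as a set of tuples.
employee = {
    ("E001", "Alice Chen", 75000, "D10"),
    ("E002", "Bob Smith", 62000, "D10"),
    ("E003", "Carol Davis", 88000, "D20"),
    ("E004", "David Lee", 55000, "D30"),
}


def Employee(e, n, s, d):
    """True iff the tuple (e, n, s, d) exists in the relation."""
    return (e, n, s, d) in employee


print(Employee("E001", "Alice Chen", 75000, "D10"))  # True
print(Employee("E001", "Bob Smith", 75000, "D10"))   # False
print(Employee("E005", "Eve Wilson", 90000, "D40"))  # False
```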

Contrast with TRC:

In TRC, we would represent the same concept differently:

TRC: {t | Employee(t) ∧ t.Salary > 70000}
      One variable t represents the entire row
      Access attributes via t.Attribute notation

DRC: {<n, s> | ∃e ∃d (Employee(e, n, s, d) ∧ s > 70000)}
      Four variables e, n, s, d represent individual values
      No dot notation needed—variables ARE the values

The Trade-off:

DRC requires more variable declarations (one per attribute potentially used), but eliminates the need for dot notation and makes the relationship between values and domains explicit. This verbosity becomes an advantage in certain contexts, particularly in visual query languages.

When DRC Shines

DRC is particularly intuitive when the query result requires only specific columns rather than entire tuples. Selecting Name and Salary from a 15-column table puts just 2 domain variables in the target list, although the membership predicate still needs one variable per column, with the other 13 existentially quantified. TRC would instead work with the full tuple variable and project onto the two attributes in the result.

Domain Variables in Formulas

Domain variables appear in three main contexts within DRC formulas:

1. Target List (Result Specification)

The target list specifies which domain values appear in the query result:

{<x₁, x₂, ..., xₖ> | formula}

The variables x₁ through xₖ must all be free in the formula (not bound by quantifiers in the outermost scope) and must have their domains constrained by the formula.

2. Membership Predicates (Tuple Existence)

Membership predicates assert that certain domain values form a tuple in a relation:

Employee(e, n, s, d)  -- Variables take values from a tuple in Employee
Department(d, dn, m)  -- Variables take values from a tuple in Department

3. Comparison and Logical Predicates

Domain variables participate in comparisons and logical expressions:

s > 50000                -- Comparison predicate
n = 'Alice Chen'         -- Equality comparison
d = d'                   -- Join condition (variables from different tuples)
s₁ < s₂                  -- Comparing values from different tuples

drc_formula_structure.txt
DRC Query Structure:
═══════════════════
 
{ <target_list> | formula }
 
Components:
───────────
<target_list> ::= <domain_var> | <domain_var>, <target_list>
                   
formula ::= atomic_formula
          | ¬ formula
          | formula ∧ formula
          | formula ∨ formula
          | formula → formula
          | ∃ domain_var (formula)
          | ∀ domain_var (formula)
 
atomic_formula ::= R(var₁, var₂, ..., varₙ)    -- Membership
                 | var θ var                    -- Comparison (θ ∈ {=,≠,<,>,≤,≥})
                 | var θ constant               -- Constant comparison
 
Example Query:
─────────────
Find names and salaries of employees earning over $50,000:
 
{ <n, s> | ∃e ∃d ( Employee(e, n, s, d) ∧ s > 50000 ) }
 
Breakdown:
• <n, s> = Target list (result columns)
• ∃e ∃d = Existential quantification (there exist values for e and d)
• Employee(e, n, s, d) = Membership predicate (binding all four variables)
• s > 50000 = Selection condition

Variable Scope and Quantification:

The scope of a quantifier is the formula that immediately follows it, conventionally delimited by parentheses. Consider:

{<n> | ∃e ∃s ∃d (Employee(e, n, s, d) ∧ ∀dm ∀m (Department(d, dm, m) → s > 60000))}

In this (admittedly complex) example:

n is free: it appears in the target list and is not quantified
e, s, d are existentially bound in the outer scope
dm, m are universally bound in the inner formula
d and s also occur inside the inner formula, where they are still bound by the outer ∃d and ∃s

The scope rules ensure that each variable reference is unambiguous.

The Freedom Principle

A fundamental rule in DRC: Every variable in the target list must be free in the overall formula. You cannot return a value for a variable that's been existentially or universally bound—such a variable doesn't have a specific value to return.
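
The rule can be checked mechanically by computing the free variables of a formula. The sketch below does this for a deliberately small fragment (atoms, ¬, ∧, ∃); the class names `Atom`, `Not`, `And`, `Exists` and the functions `free_vars` and `target_is_valid` are hypothetical, and ∨, →, and ∀ would be handled analogously.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    variables: tuple  # variable names occurring in a membership or comparison atom


@dataclass(frozen=True)
class Not:
    body: object


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Exists:
    var: str
    body: object


def free_vars(f):
    """Return the set of variables that occur free in formula f."""
    if isinstance(f, Atom):
        return set(f.variables)
    if isinstance(f, Not):
        return free_vars(f.body)
    if isinstance(f, And):
        return free_vars(f.left) | free_vars(f.right)
    if isinstance(f, Exists):
        return free_vars(f.body) - {f.var}  # the quantifier binds f.var
    raise TypeError(f"unsupported formula node: {f!r}")


def target_is_valid(target, f):
    """Every target-list variable must be free in the formula."""
    return set(target) <= free_vars(f)


# {<n, s> | ∃e ∃d (Employee(e, n, s, d) ∧ s > 50000)}
body = And(Atom(("e", "n", "s", "d")), Atom(("s",)))
formula = Exists("e", Exists("d", body))

print(sorted(free_vars(formula)))            # ['n', 's']
print(target_is_valid(["n", "s"], formula))  # True
print(target_is_valid(["e"], formula))       # False: e is bound by ∃e
```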

Multiple Domain Variables and Joins

When queries involve multiple relations, domain variables from different membership predicates can be connected through equality conditions, effectively implementing joins.

Join Through Variable Reuse:

The most elegant way to express a join in DRC is to use the same variable in corresponding positions of different relation predicates:

Employee(e, n, s, d) ∧ Department(d, dn, m)

Here, the variable d appears in both predicates:

In Employee, it binds to DeptID (4th position)
In Department, it binds to the department key (1st position)

Because it's the same variable, the formula is only satisfied when both positions have the same value—this is the join condition!

Join Through Explicit Equality:

Alternatively, we can use different variables and add an explicit equality:

Employee(e, n, s, d₁) ∧ Department(d₂, dn, m) ∧ d₁ = d₂

This is semantically equivalent but more verbose.

drc_join_examples.txt
Example: Find employee names with their department names
 
Schema:
  Employee(EmpID, Name, Salary, DeptID)
  Department(DeptID, DeptName, ManagerID)
 
Method 1: Variable Reuse (Preferred)
────────────────────────────────────
{ <n, dn> | ∃e ∃s ∃d ∃m ( 
    Employee(e, n, s, d) ∧ 
    Department(d, dn, m) 
) }
 
Explanation:
• Variable 'd' binds to Employee.DeptID
• Same 'd' binds to Department.DeptID
• Natural join on department ID
 
Method 2: Explicit Equality
───────────────────────────
{ <n, dn> | ∃e ∃s ∃d₁ ∃d₂ ∃m ( 
    Employee(e, n, s, d₁) ∧ 
    Department(d₂, dn, m) ∧
    d₁ = d₂
) }
 
Both queries produce identical results:
┌─────────────┬─────────────┐
│    Name     │  DeptName   │
├─────────────┼─────────────┤
│ Alice Chen  │ Engineering │
│ Bob Smith   │ Engineering │
│ Carol Davis │ Marketing   │
│ David Lee   │ Sales       │
└─────────────┴─────────────┘
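
The equivalence of the two methods can be checked on sample data. In the sketch below, the Department rows are assumptions chosen to match the result table above; in particular, the manager ID of the Sales row is a placeholder. Method 1 tests Department(d, dn, m) with the same d taken from the Employee tuple, while Method 2 uses two variables and an explicit equality.

```python
# Sample instances, assumed for illustration.
employee = {
    ("E001", "Alice Chen", 75000, "D10"),
    ("E002", "Bob Smith", 62000, "D10"),
    ("E003", "Carol Davis", 88000, "D20"),
    ("E004", "David Lee", 55000, "D30"),
}
department = {
    ("D10", "Engineering", "E001"),
    ("D20", "Marketing", "E003"),
    ("D30", "Sales", "E004"),  # manager ID is a placeholder assumption
}

# Method 1: variable reuse. The same d appears in both membership tests.
method1 = {
    (n, dn)
    for (e, n, s, d) in employee
    for (_, dn, m) in department       # candidate values for dn and m
    if (d, dn, m) in department        # Department(d, dn, m) with the shared d
}

# Method 2: distinct variables d1, d2 plus an explicit equality condition.
method2 = {
    (n, dn)
    for (e, n, s, d1) in employee
    for (d2, dn, m) in department
    if d1 == d2
}

print(method1 == method2)
print(sorted(method1))
```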

Multi-Way Joins:

For queries involving three or more relations, the same principles extend naturally:

{<n, dn, pn> | ∃e ∃s ∃d ∃m ∃p (
    Employee(e, n, s, d) ∧ 
    Department(d, dn, m) ∧
    Project(p, pn, d)           -- Project.DeptID = Employee.DeptID
)}

This finds employees with their department names and project names for projects in their department. Project is assumed here to have three attributes, with the department ID in the third position.

Self-Joins with Domain Variables:

Self-joins require careful variable management since we're referencing the same relation twice:

-- Find pairs of employees in the same department
{<n₁, n₂> | ∃e₁ ∃s₁ ∃d ∃e₂ ∃s₂ (
    Employee(e₁, n₁, s₁, d) ∧ 
    Employee(e₂, n₂, s₂, d) ∧
    e₁ < e₂  -- Avoid duplicate pairs and self-pairs
)}

Note we use different variables e₁, n₁, s₁ and e₂, n₂, s₂ for the two references to Employee, but the same d to enforce the same-department condition.
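
A short sketch of the same self-join on sample data shows why the e₁ < e₂ condition matters. The `employee` rows are assumed for illustration, and the second loop uses a separate variable `d2` with an equality test because Python unpacking cannot share a variable across two loops.

```python
# Sample Employee(EmpID, Name, Salary, DeptID) instance, assumed for illustration.
employee = {
    ("E001", "Alice Chen", 75000, "D10"),
    ("E002", "Bob Smith", 62000, "D10"),
    ("E003", "Carol Davis", 88000, "D20"),
    ("E004", "David Lee", 55000, "D30"),
}

# Same department, with e1 < e2 to drop self-pairs and mirrored duplicates.
pairs = {
    (n1, n2)
    for (e1, n1, s1, d) in employee
    for (e2, n2, s2, d2) in employee
    if d == d2 and e1 < e2
}

# Without the ordering condition, every employee pairs with itself
# and each real pair appears twice.
unfiltered = {
    (n1, n2)
    for (e1, n1, s1, d) in employee
    for (e2, n2, s2, d2) in employee
    if d == d2
}

print(sorted(pairs))    # [('Alice Chen', 'Bob Smith')]
print(len(unfiltered))  # 6
```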

Elegant Join Semantics

The variable-reuse join style in DRC is remarkably elegant: by simply using the same variable name in corresponding positions across relations, we express natural join semantics without explicit join operators or conditions. This declarative style says 'these must be the same value' implicitly.

Domain Variables and the Active Domain

A critical concept when working with domain variables is the distinction between the full domain and the active domain.

The Full Domain Problem:

Consider a simple-looking query:

{<x> | ¬∃e ∃s ∃d (Employee(e, x, s, d))}

"Find all names that are NOT employee names."

If x ranges over the full domain of names (all possible strings), the result is infinite: every string that is not the name of some employee qualifies. This is the safety problem in DRC.

The Active Domain Solution:

The active domain is the set of values that actually appear in the current database instance:

ADOM = ⋃ { πAᵢ(R) | R is a relation, Aᵢ is an attribute of R }

For safe query evaluation, we restrict domain variables to range over the active domain rather than the full theoretical domain.

Active Domain Properties

• Finiteness: The active domain is always finite (bounded by the total number of values in all relations).
• Dynamic: The active domain changes as the database is updated. Inserting new values expands it; deleting the last occurrence of a value contracts it.
• Includes constants: Constants appearing in the query are typically added to the active domain for evaluation purposes.
• Relation-specific: We can speak of the active domain for a specific attribute or relation, not just globally.
• Safety criterion: A query is safe if it can be evaluated using only active domain values and produces finite results.

Example: Active Domain Computation

Given these relations:

Employee(E001, 'Alice', 75000, D10)
Employee(E002, 'Bob', 62000, D10)
Department(D10, 'Engineering', E001)
Department(D20, 'Marketing', E003)

The active domain for relevant attributes:

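ADOM(Name) = {'Alice', 'Bob'}
ADOM(DeptName) = {'Engineering', 'Marketing'}
ADOM(Salary) = {75000, 62000}
ADOM(DeptID) = {D10, D20}
ADOM(EmpID) = {E001, E002, E003} (E003 occurs only as a ManagerID, which draws from the employee-ID domain)
Global ADOM = Union of all attribute active domains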

Safe Query Evaluation:

When evaluating a DRC query, we:

1. Compute the active domain augmented with query constants
2. Consider only assignments where domain variables take active domain values
3. Evaluate the formula for all such assignments
4. Return tuples where the formula evaluates to true

This ensures finite computation time and finite results.
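
The evaluation steps above can be sketched as a brute-force loop over the active domain. This is a naive illustration under stated assumptions, not how a real query processor works: the helper names `active_domain`, `evaluate`, and `not_an_employee_name` are hypothetical, the data is the small instance from the active domain example, and the query is the negation query from earlier written with explicit quantifiers.

```python
from itertools import product

# Small instance, assumed for illustration.
employee = {
    ("E001", "Alice", 75000, "D10"),
    ("E002", "Bob", 62000, "D10"),
}
department = {
    ("D10", "Engineering", "E001"),
    ("D20", "Marketing", "E003"),
}


def active_domain(relations, constants=()):
    """Step 1: every value stored in any relation, plus query constants."""
    adom = set(constants)
    for rel in relations:
        for t in rel:
            adom.update(t)
    return adom


def evaluate(target_arity, formula, adom):
    """Steps 2-4: try every assignment of active-domain values to the target list."""
    return {
        vals
        for vals in product(adom, repeat=target_arity)
        if formula(adom, *vals)
    }


# {<x> | ¬∃e ∃s ∃d (Employee(e, x, s, d))}, with quantifiers ranging over adom
def not_an_employee_name(adom, x):
    return not any(
        (e, x, s, d) in employee for e, s, d in product(adom, repeat=3)
    )


adom = active_domain([employee, department])
result = evaluate(1, not_an_employee_name, adom)

print(len(adom))    # 11 distinct values in the instance
print(len(result))  # 9: everything in the active domain except 'Alice' and 'Bob'
```

The result is finite because x can only take the 11 active-domain values. It also contains values such as department IDs and salaries that are not names at all, since this sketch uses a single global active domain with no per-domain typing.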

Safety Is Non-Negotiable

The active domain restriction isn't just a performance optimization—it's a semantic necessity. Without it, many reasonable-looking queries would have undefined (infinite) results. We'll explore safety conditions in depth in a later page, but remember: domain variables must ultimately be constrained to produce meaningful results.

Summary: Domain Variables

We've established the foundational understanding of domain variables—the building blocks of Domain Relational Calculus. Let's consolidate the key insights:

Key Takeaways

• Domain variables range over individual values: Unlike tuple variables that represent entire rows, domain variables represent single atomic values from a specific domain.
• Domains define the space of possible values: Each attribute has an associated domain that constrains what values are legal for that attribute.
• Variables are bound through membership predicates: The expression R(x, y, z) binds x, y, z to the domains of R's attributes based on position.
• Joins use variable sharing: Using the same variable in multiple relation predicates elegantly expresses join conditions without explicit operators.
• Free variables define results: Variables in the target list must be free in the formula and determine what appears in the query output.
• Active domain ensures safety: Restricting variable ranges to values actually in the database guarantees finite, meaningful results.

What's Next:

Now that we understand domain variables, we'll explore the complete DRC syntax—the formal grammar that governs how DRC queries are constructed. We'll learn the precise rules for forming valid queries, the operators available, and how complex conditions are expressed using logical connectives and quantifiers.
