Relational CalculusSafe Queries

Safe Queries in Relational Calculus

LevelIntermediate

Duration60 mins

TopicSafe Queries

2 / 5

Domain Dependence

When Query Results Depend on the Invisible

In the previous page, we discovered that certain relational calculus queries can produce infinite results. But there's an even more subtle problem lurking in these expressions: domain dependence. A query is domain-dependent if its result changes based on what we assume the underlying domain to be—even when the database content remains exactly the same.

This phenomenon strikes at the heart of what it means for a query to have a well-defined answer. If the same query on the same database produces different results depending on an implicit assumption about 'all possible values,' then the query's meaning is fundamentally ambiguous.

What You Will Learn

By the end of this page, you will understand domain dependence formally, recognize why domain-independent queries are essential for database systems, and be able to analyze whether a given relational calculus expression is domain-dependent or domain-independent.

Understanding Domains in Relational Theory

Before examining domain dependence, we need precision about what 'domain' means in relational database theory. The term appears in multiple contexts with distinct meanings:

Types of Domains in Relational Theory
Domain Type	Symbol	Definition	Example
Attribute Domain	dom(A)	Set of legal values for attribute A	dom(Age) = {0, 1, 2, ..., 150}
Active Domain	adom(I)	All values appearing in database instance I	adom(Employees) = {'Alice', 'Bob', 'Sales', ...}
Underlying Domain	D	Universal set of all possible values	D = all strings ∪ all integers ∪ ...
Query Domain	dom(Q, I)	Values relevant to evaluating Q on I	Depends on query structure

The Active-Underlying Distinction:

The active domain (adom) consists of all values that actually appear in some tuple of some relation in the current database instance. For a database with Employee and Department relations, the active domain includes all employee names, all department names, all salaries—every concrete value stored in the database.

The underlying domain (D) is the mathematical universe from which values are drawn. If we model employee names as strings, the underlying domain includes ALL possible strings—not just those currently stored. This includes 'Alice', 'Bob', but also 'Xyzzy', 'AAAA', names in Chinese, and infinitely many others.

The key insight: adom(I) ⊂ D — the active domain is always a finite subset of the (typically infinite) underlying domain.

Why This Matters

When evaluating a relational calculus expression, we need to know which domain to consider. If a formula says '∃x P(x)', do we search for x in the active domain (finite, searchable) or the underlying domain (infinite, unsearchable)? This choice fundamentally affects whether the query is computable.

Domain Dependence Formally Defined

We can now formally define domain dependence and its crucial counterpart, domain independence.

Formal Definition

A relational calculus query Q is domain-independent if for all database instances I and all underlying domains D₁ and D₂ where adom(I) ⊆ D₁ and adom(I) ⊆ D₂, evaluating Q over I produces the same result using either D₁ or D₂. If Q is not domain-independent, it is domain-dependent.

Intuitive Interpretation:

A domain-independent query produces the same result regardless of what we assume about values not in the database. The query's answer is determined entirely by the database content—nothing else.

A domain-dependent query, by contrast, gives different answers depending on assumptions about the broader universe. This makes the query's meaning ambiguous and its result unpredictable.

Domain-Independent (Good)

•Result determined only by database content
•Same answer regardless of domain choice
•Always produces finite results
•Can be translated to relational algebra
•Equivalent to SQL queries

Domain-Dependent (Bad)

•Result depends on domain assumption
•Different answers for different domains
•Often produces infinite results
•No relational algebra equivalent
•Cannot be expressed in SQL

The Critical Theorem:

A query is safe (produces finite results) if and only if it is domain-independent.

This equivalence is profound: it connects the practical requirement (finite results) with the semantic requirement (unambiguous meaning). Safe queries are exactly those whose answers don't depend on invisible domain assumptions.

Domain Dependence in Action

Let's examine concrete examples showing how the same query produces different results under different domain assumptions. This demonstrates domain dependence viscerally.

Example Database:

Employee(name, department)

(Alice, Engineering)
(Bob, Sales)

adom = {'Alice', 'Bob', 'Engineering', 'Sales'}

Query: {t.name | ¬(∃e)(Employee(e) ∧ e.name = t.name ∧ e.department = 'Engineering')}

Find all names NOT associated with Engineering department.

Result Under Different Domains
Underlying Domain D	Query Result	Result Size
D = adom = {'Alice', 'Bob', 'Engineering', 'Sales'}	{'Bob', 'Engineering', 'Sales'}	3 values
D = adom ∪ {'Carol', 'Dave'}	{'Bob', 'Engineering', 'Sales', 'Carol', 'Dave'}	5 values
D = all strings	All strings except 'Alice'	Infinite!

Domain-Dependent!

The query result changes based on domain choice. Under the active domain, we get 3 results. Under a larger domain, we get more. Under infinite domains, we get infinite results. This query is domain-dependent.

The Pattern Emerges:

Notice the crucial difference between Examples 1 and 3:

Example 1: The free variable t.name is constrained only by negation. Any name NOT in Engineering qualifies—including names not in the database.
Example 3: The free variable e is first constrained to be an Employee (positive predicate), then filtered by a negative condition. Only database values can appear in the result.

This pattern—ensuring variables are bound by positive predicates before applying negative conditions—is the key to domain independence.

Mathematical Analysis of Domain Independence

Understanding domain independence mathematically helps us develop intuition for when queries are safe. Let's formalize the key concepts.

Evaluation Semantics:

Given a query Q = {t | φ(t)}, we define the result as:

Q(I, D) = {t ∈ Dⁿ | I ⊨ φ(t)}

Where:

I is the database instance
D is the underlying domain
Dⁿ is the n-fold Cartesian product (all possible n-tuples over D)
I ⊨ φ(t) means t satisfies formula φ under instance I

Domain independence requires: Q(I, D₁) = Q(I, D₂) for all D₁, D₂ containing adom(I)

Formal Analysis

Math

-- Domain-Dependent Query Analysis
Query Q₁ = {x | ¬R(x)}
 
Evaluation with D₁ = {1, 2} and I = {R = {(1)}}:
  Q₁(I, D₁) = {x ∈ D₁ | I ⊨ ¬R(x)}
            = {2}    -- 2 is not in R
 
Evaluation with D₂ = {1, 2, 3} and same I:
  Q₁(I, D₂) = {x ∈ D₂ | I ⊨ ¬R(x)}
            = {2, 3}  -- 2 and 3 are not in R
 
Q₁(I, D₁) ≠ Q₁(I, D₂)  →  Q₁ is domain-dependent
 
 
-- Domain-Independent Query Analysis  
Query Q₂ = {x | R(x) ∧ x > 0}
 
Evaluation with D₁ = {1, 2} and I = {R = {(1)}}:
  Q₂(I, D₁) = {x ∈ D₁ | I ⊨ R(x) ∧ x > 0}
            = {1}    -- Only R values satisfying x > 0
 
Evaluation with D₂ = {1, 2, 3, 4, 5} and same I:
  Q₂(I, D₂) = {x ∈ D₂ | I ⊨ R(x) ∧ x > 0}
            = {1}    -- Still only R values satisfying x > 0
 
Q₂(I, D₁) = Q₂(I, D₂) for all D₁, D₂ ⊇ adom(I)  →  Q₂ is domain-independent

The Monotonicity Insight:

For domain-dependent queries, increasing the domain generally increases (or changes) the result:

D₁ ⊆ D₂ may imply Q(I, D₁) ⊆ Q(I, D₂) (for negation-based queries)

But for domain-independent queries, the result is stable:

D₁ ⊇ adom(I) and D₂ ⊇ adom(I) implies Q(I, D₁) = Q(I, D₂)

Active Domain Theorem

For a domain-independent query Q, we can evaluate Q using only the active domain: Q(I, D) = Q(I, adom(I)) for any D ⊇ adom(I). This is why domain-independent queries are computable—we only need to consider the finite active domain.

Detecting Domain Dependence

Given a relational calculus expression, how can we determine if it's domain-dependent? While the general problem is undecidable (we cannot always determine domain independence for arbitrary expressions), we can identify sufficient conditions that guarantee domain independence.

Indicators of Domain Dependence

•Unguarded Negation: Free variables appearing only under negation ({x | ¬R(x)})
•Unguarded Universal Quantification: Variables quantified universally without positive bounds ({t | ∀x(P(t,x))})
•Unsafe Disjunction: Variables appearing in only one disjunct ({x | R(x) ∨ x=5})
•Comparison with Constants: Without positive predicates constraining the result ({x | x > 100})
•Implication without Guards: Antecedent doesn't restrict result variables ({t | R(x) → S(t)})

The Positive Predicate Principle:

The fundamental principle for ensuring domain independence is:

Every free variable in the result must be constrained by at least one positive (non-negated) database predicate.

This ensures the variable's possible values come from the database, not the infinite domain.

Analyzing expressions:

Domain Dependence Analysis

Analysis

-- Query 1: {x | ¬R(x)}
Free variable: x
Positive predicates on x: NONE
Verdict: DOMAIN-DEPENDENT (x takes values from domain, not DB)
 
 
-- Query 2: {x | R(x) ∧ ¬S(x)}  
Free variable: x
Positive predicates on x: R(x) ✓
Verdict: DOMAIN-INDEPENDENT (x must be in R)
 
 
-- Query 3: {(x,y) | R(x) ∧ ¬S(y)}
Free variables: x, y
Positive predicates on x: R(x) ✓
Positive predicates on y: NONE
Verdict: DOMAIN-DEPENDENT (y is unconstrained)
 
 
-- Query 4: {x | (∀y)(R(y) → S(x,y))}
Free variable: x
Positive predicates on x: NONE (S appears in implication consequence)
Verdict: DOMAIN-DEPENDENT 
 
 
-- Query 5: {x | R(x) ∧ (∀y)(T(y) → S(x,y))}
Free variable: x
Positive predicates on x: R(x) ✓
Universal y ranges over T: bounded ✓ 
Verdict: DOMAIN-INDEPENDENT

Undecidability Warning

Determining whether an arbitrary relational calculus expression is domain-independent is undecidable in general. The analysis rules we provide are sufficient conditions—queries satisfying them are guaranteed domain-independent. But some domain-independent queries may fail these tests yet still be semantically safe.

Connection to Query Safety

Domain independence and query safety are two perspectives on the same fundamental requirement. Understanding their relationship clarifies why both concepts matter.

The Fundamental Equivalence:

For relational calculus queries:

A query is safe if and only if it always produces a finite result for any finite database instance.
A query is domain-independent if and only if its result doesn't depend on the underlying domain.
These conditions are equivalent. A query is safe ⟺ it is domain-independent.

Two Views of the Same Requirement
Perspective	Question Asked	Good Queries	Bad Queries
Safety (Computability)	Can we compute the result?	Safe: finite results	Unsafe: infinite results
Domain Independence (Semantics)	Is the meaning well-defined?	Domain-independent: stable result	Domain-dependent: ambiguous result
Practical (SQL)	Can we translate to SQL?	Expressible in SQL	Not expressible in SQL

Proof Intuition:

Why does domain independence imply safety?

If a query is domain-independent, its result is the same whether we use the active domain or any larger domain. Since evaluating over the active domain necessarily produces a finite result (the active domain is finite), the query is safe.

Why does domain dependence imply unsafety?

If a query is domain-dependent, there exist two domains D₁ ⊂ D₂ such that Q(I, D₁) ≠ Q(I, D₂). Specifically, we can choose D₂ to be infinite (e.g., all integers). The query result over D₂ will then include values from outside any finite domain—potentially infinitely many.

Practical Implication

This equivalence means we can reason about safety either way: asking 'will this produce finite results?' is the same as asking 'does this depend on the domain?'. Different phrasings suit different contexts, but they identify the same class of well-behaved queries.

Transforming Domain-Dependent Queries

Often, a domain-dependent query expresses a legitimate user intent that can be reformulated as a domain-independent query. The key is understanding what the user actually wants and adding appropriate constraints.

User Intent: Find customers who haven't placed orders.

Naive (Domain-Dependent):

TRC

1
2

{c | ¬(∃o)(Order(o) ∧ o.customer_id = c)}
-- Problem: c ranges over entire domain

Corrected (Domain-Independent):

TRC

{c.id | Customer(c) ∧ ¬(∃o)(Order(o) ∧ o.customer_id = c.id)}
-- Fix: c is bound to Customer table first
 
-- SQL equivalent:
SELECT c.id FROM Customer c
WHERE NOT EXISTS (
    SELECT 1 FROM Order o WHERE o.customer_id = c.id
)

Transformation Pattern

The general pattern: identify what domain of values the user intends, and add a positive predicate constraining the free variable to that domain. Usually, there's a table that defines the 'universe of interest' (Customers, Students, Departments, etc.).

Summary: Domain Dependence

Domain dependence captures a fundamental semantic issue in relational calculus: queries whose meaning depends on assumptions beyond the database content.

Key Takeaways

•Active domain vs. underlying domain — The active domain (database values) is finite; the underlying domain (all possible values) is typically infinite.
•Domain-dependent queries have ambiguous semantics — Their result changes based on domain assumptions, making them ill-defined.
•Domain independence ⟺ Safety — A query is domain-independent if and only if it is safe (produces finite results).
•Free variables need positive constraints — Every result variable must be bound by at least one positive database predicate.
•Domain-dependent queries can often be fixed — Adding appropriate positive predicates to constrain free variables usually resolves the issue.

What's Next:

Now that we understand why domain independence matters, we'll examine the formal safety conditions—syntactic rules that guarantee a query is domain-independent. These conditions provide a practical checklist for ensuring queries are well-formed.

Page Complete

You now understand domain dependence and its equivalence to query safety. This semantic foundation prepares you to learn the formal safety conditions that guarantee computable, well-defined queries in relational calculus.

2 / 5

Loading learning content...

Relational CalculusSafe Queries

Safe Queries in Relational Calculus

LevelIntermediate

Duration60 mins

TopicSafe Queries

2 / 5

Domain Dependence

When Query Results Depend on the Invisible

What You Will Learn

Understanding Domains in Relational Theory

Before examining domain dependence, we need precision about what 'domain' means in relational database theory. The term appears in multiple contexts with distinct meanings:

Types of Domains in Relational Theory
Domain Type	Symbol	Definition	Example
Attribute Domain	dom(A)	Set of legal values for attribute A	dom(Age) = {0, 1, 2, ..., 150}
Active Domain	adom(I)	All values appearing in database instance I	adom(Employees) = {'Alice', 'Bob', 'Sales', ...}
Underlying Domain	D	Universal set of all possible values	D = all strings ∪ all integers ∪ ...
Query Domain	dom(Q, I)	Values relevant to evaluating Q on I	Depends on query structure

The Active-Underlying Distinction:

The key insight: adom(I) ⊂ D — the active domain is always a finite subset of the (typically infinite) underlying domain.

Why This Matters

Domain Dependence Formally Defined

We can now formally define domain dependence and its crucial counterpart, domain independence.

Formal Definition

Intuitive Interpretation:

A domain-independent query produces the same result regardless of what we assume about values not in the database. The query's answer is determined entirely by the database content—nothing else.

A domain-dependent query, by contrast, gives different answers depending on assumptions about the broader universe. This makes the query's meaning ambiguous and its result unpredictable.

Domain-Independent (Good)

•Result determined only by database content
•Same answer regardless of domain choice
•Always produces finite results
•Can be translated to relational algebra
•Equivalent to SQL queries

Domain-Dependent (Bad)

•Result depends on domain assumption
•Different answers for different domains
•Often produces infinite results
•No relational algebra equivalent
•Cannot be expressed in SQL

The Critical Theorem:

A query is safe (produces finite results) if and only if it is domain-independent.

Domain Dependence in Action

Let's examine concrete examples showing how the same query produces different results under different domain assumptions. This demonstrates domain dependence viscerally.

Example Database:

Employee(name, department)

(Alice, Engineering)
(Bob, Sales)

adom = {'Alice', 'Bob', 'Engineering', 'Sales'}

Query: {t.name | ¬(∃e)(Employee(e) ∧ e.name = t.name ∧ e.department = 'Engineering')}

Find all names NOT associated with Engineering department.

Result Under Different Domains
Underlying Domain D	Query Result	Result Size
D = adom = {'Alice', 'Bob', 'Engineering', 'Sales'}	{'Bob', 'Engineering', 'Sales'}	3 values
D = adom ∪ {'Carol', 'Dave'}	{'Bob', 'Engineering', 'Sales', 'Carol', 'Dave'}	5 values
D = all strings	All strings except 'Alice'	Infinite!

Domain-Dependent!

The Pattern Emerges:

Notice the crucial difference between Examples 1 and 3:

Example 1: The free variable t.name is constrained only by negation. Any name NOT in Engineering qualifies—including names not in the database.
Example 3: The free variable e is first constrained to be an Employee (positive predicate), then filtered by a negative condition. Only database values can appear in the result.

This pattern—ensuring variables are bound by positive predicates before applying negative conditions—is the key to domain independence.

Mathematical Analysis of Domain Independence

Understanding domain independence mathematically helps us develop intuition for when queries are safe. Let's formalize the key concepts.

Evaluation Semantics:

Given a query Q = {t | φ(t)}, we define the result as:

Q(I, D) = {t ∈ Dⁿ | I ⊨ φ(t)}

Where:

I is the database instance
D is the underlying domain
Dⁿ is the n-fold Cartesian product (all possible n-tuples over D)
I ⊨ φ(t) means t satisfies formula φ under instance I

Domain independence requires: Q(I, D₁) = Q(I, D₂) for all D₁, D₂ containing adom(I)

Formal Analysis

Math

-- Domain-Dependent Query Analysis
Query Q₁ = {x | ¬R(x)}
 
Evaluation with D₁ = {1, 2} and I = {R = {(1)}}:
  Q₁(I, D₁) = {x ∈ D₁ | I ⊨ ¬R(x)}
            = {2}    -- 2 is not in R
 
Evaluation with D₂ = {1, 2, 3} and same I:
  Q₁(I, D₂) = {x ∈ D₂ | I ⊨ ¬R(x)}
            = {2, 3}  -- 2 and 3 are not in R
 
Q₁(I, D₁) ≠ Q₁(I, D₂)  →  Q₁ is domain-dependent
 
 
-- Domain-Independent Query Analysis  
Query Q₂ = {x | R(x) ∧ x > 0}
 
Evaluation with D₁ = {1, 2} and I = {R = {(1)}}:
  Q₂(I, D₁) = {x ∈ D₁ | I ⊨ R(x) ∧ x > 0}
            = {1}    -- Only R values satisfying x > 0
 
Evaluation with D₂ = {1, 2, 3, 4, 5} and same I:
  Q₂(I, D₂) = {x ∈ D₂ | I ⊨ R(x) ∧ x > 0}
            = {1}    -- Still only R values satisfying x > 0
 
Q₂(I, D₁) = Q₂(I, D₂) for all D₁, D₂ ⊇ adom(I)  →  Q₂ is domain-independent

The Monotonicity Insight:

For domain-dependent queries, increasing the domain generally increases (or changes) the result:

D₁ ⊆ D₂ may imply Q(I, D₁) ⊆ Q(I, D₂) (for negation-based queries)

But for domain-independent queries, the result is stable:

D₁ ⊇ adom(I) and D₂ ⊇ adom(I) implies Q(I, D₁) = Q(I, D₂)

Active Domain Theorem

Detecting Domain Dependence

Indicators of Domain Dependence

•Unguarded Negation: Free variables appearing only under negation ({x | ¬R(x)})
•Unguarded Universal Quantification: Variables quantified universally without positive bounds ({t | ∀x(P(t,x))})
•Unsafe Disjunction: Variables appearing in only one disjunct ({x | R(x) ∨ x=5})
•Comparison with Constants: Without positive predicates constraining the result ({x | x > 100})
•Implication without Guards: Antecedent doesn't restrict result variables ({t | R(x) → S(t)})

The Positive Predicate Principle:

The fundamental principle for ensuring domain independence is:

Every free variable in the result must be constrained by at least one positive (non-negated) database predicate.

This ensures the variable's possible values come from the database, not the infinite domain.

Analyzing expressions:

Domain Dependence Analysis

Analysis

-- Query 1: {x | ¬R(x)}
Free variable: x
Positive predicates on x: NONE
Verdict: DOMAIN-DEPENDENT (x takes values from domain, not DB)
 
 
-- Query 2: {x | R(x) ∧ ¬S(x)}  
Free variable: x
Positive predicates on x: R(x) ✓
Verdict: DOMAIN-INDEPENDENT (x must be in R)
 
 
-- Query 3: {(x,y) | R(x) ∧ ¬S(y)}
Free variables: x, y
Positive predicates on x: R(x) ✓
Positive predicates on y: NONE
Verdict: DOMAIN-DEPENDENT (y is unconstrained)
 
 
-- Query 4: {x | (∀y)(R(y) → S(x,y))}
Free variable: x
Positive predicates on x: NONE (S appears in implication consequence)
Verdict: DOMAIN-DEPENDENT 
 
 
-- Query 5: {x | R(x) ∧ (∀y)(T(y) → S(x,y))}
Free variable: x
Positive predicates on x: R(x) ✓
Universal y ranges over T: bounded ✓ 
Verdict: DOMAIN-INDEPENDENT

Undecidability Warning

Connection to Query Safety

Domain independence and query safety are two perspectives on the same fundamental requirement. Understanding their relationship clarifies why both concepts matter.

The Fundamental Equivalence:

For relational calculus queries:

A query is safe if and only if it always produces a finite result for any finite database instance.
A query is domain-independent if and only if its result doesn't depend on the underlying domain.
These conditions are equivalent. A query is safe ⟺ it is domain-independent.

Two Views of the Same Requirement
Perspective	Question Asked	Good Queries	Bad Queries
Safety (Computability)	Can we compute the result?	Safe: finite results	Unsafe: infinite results
Domain Independence (Semantics)	Is the meaning well-defined?	Domain-independent: stable result	Domain-dependent: ambiguous result
Practical (SQL)	Can we translate to SQL?	Expressible in SQL	Not expressible in SQL

Proof Intuition:

Why does domain independence imply safety?

Why does domain dependence imply unsafety?

Practical Implication

Transforming Domain-Dependent Queries

User Intent: Find customers who haven't placed orders.

Naive (Domain-Dependent):

TRC

1
2

{c | ¬(∃o)(Order(o) ∧ o.customer_id = c)}
-- Problem: c ranges over entire domain

Corrected (Domain-Independent):

TRC

{c.id | Customer(c) ∧ ¬(∃o)(Order(o) ∧ o.customer_id = c.id)}
-- Fix: c is bound to Customer table first
 
-- SQL equivalent:
SELECT c.id FROM Customer c
WHERE NOT EXISTS (
    SELECT 1 FROM Order o WHERE o.customer_id = c.id
)

Transformation Pattern

Summary: Domain Dependence

Domain dependence captures a fundamental semantic issue in relational calculus: queries whose meaning depends on assumptions beyond the database content.

Key Takeaways

•Active domain vs. underlying domain — The active domain (database values) is finite; the underlying domain (all possible values) is typically infinite.
•Domain-dependent queries have ambiguous semantics — Their result changes based on domain assumptions, making them ill-defined.
•Domain independence ⟺ Safety — A query is domain-independent if and only if it is safe (produces finite results).
•Free variables need positive constraints — Every result variable must be bound by at least one positive database predicate.
•Domain-dependent queries can often be fixed — Adding appropriate positive predicates to constrain free variables usually resolves the issue.

What's Next:

Page Complete

2 / 5