Loading learning content...
In the previous page, we discovered that certain relational calculus queries can produce infinite results. But there's an even more subtle problem lurking in these expressions: domain dependence. A query is domain-dependent if its result changes based on what we assume the underlying domain to be—even when the database content remains exactly the same.
This phenomenon strikes at the heart of what it means for a query to have a well-defined answer. If the same query on the same database produces different results depending on an implicit assumption about 'all possible values,' then the query's meaning is fundamentally ambiguous.
By the end of this page, you will understand domain dependence formally, recognize why domain-independent queries are essential for database systems, and be able to analyze whether a given relational calculus expression is domain-dependent or domain-independent.
Before examining domain dependence, we need precision about what 'domain' means in relational database theory. The term appears in multiple contexts with distinct meanings:
| Domain Type | Symbol | Definition | Example |
|---|---|---|---|
| Attribute Domain | dom(A) | Set of legal values for attribute A | dom(Age) = {0, 1, 2, ..., 150} |
| Active Domain | adom(I) | All values appearing in database instance I | adom(Employees) = {'Alice', 'Bob', 'Sales', ...} |
| Underlying Domain | D | Universal set of all possible values | D = all strings ∪ all integers ∪ ... |
| Query Domain | dom(Q, I) | Values relevant to evaluating Q on I | Depends on query structure |
The Active-Underlying Distinction:
The active domain (adom) consists of all values that actually appear in some tuple of some relation in the current database instance. For a database with Employee and Department relations, the active domain includes all employee names, all department names, all salaries—every concrete value stored in the database.
The underlying domain (D) is the mathematical universe from which values are drawn. If we model employee names as strings, the underlying domain includes ALL possible strings—not just those currently stored. This includes 'Alice', 'Bob', but also 'Xyzzy', 'AAAA', names in Chinese, and infinitely many others.
The key insight: adom(I) ⊂ D — the active domain is always a finite subset of the (typically infinite) underlying domain.
When evaluating a relational calculus expression, we need to know which domain to consider. If a formula says '∃x P(x)', do we search for x in the active domain (finite, searchable) or the underlying domain (infinite, unsearchable)? This choice fundamentally affects whether the query is computable.
We can now formally define domain dependence and its crucial counterpart, domain independence.
A relational calculus query Q is domain-independent if for all database instances I and all underlying domains D₁ and D₂ where adom(I) ⊆ D₁ and adom(I) ⊆ D₂, evaluating Q over I produces the same result using either D₁ or D₂. If Q is not domain-independent, it is domain-dependent.
Intuitive Interpretation:
A domain-independent query produces the same result regardless of what we assume about values not in the database. The query's answer is determined entirely by the database content—nothing else.
A domain-dependent query, by contrast, gives different answers depending on assumptions about the broader universe. This makes the query's meaning ambiguous and its result unpredictable.
The Critical Theorem:
A query is safe (produces finite results) if and only if it is domain-independent.
This equivalence is profound: it connects the practical requirement (finite results) with the semantic requirement (unambiguous meaning). Safe queries are exactly those whose answers don't depend on invisible domain assumptions.
Let's examine concrete examples showing how the same query produces different results under different domain assumptions. This demonstrates domain dependence viscerally.
Example Database:
Employee(name, department)
adom = {'Alice', 'Bob', 'Engineering', 'Sales'}
Query: {t.name | ¬(∃e)(Employee(e) ∧ e.name = t.name ∧ e.department = 'Engineering')}
Find all names NOT associated with Engineering department.
| Underlying Domain D | Query Result | Result Size |
|---|---|---|
| D = adom = {'Alice', 'Bob', 'Engineering', 'Sales'} | {'Bob', 'Engineering', 'Sales'} | 3 values |
| D = adom ∪ {'Carol', 'Dave'} | {'Bob', 'Engineering', 'Sales', 'Carol', 'Dave'} | 5 values |
| D = all strings | All strings except 'Alice' | Infinite! |
The query result changes based on domain choice. Under the active domain, we get 3 results. Under a larger domain, we get more. Under infinite domains, we get infinite results. This query is domain-dependent.
The Pattern Emerges:
Notice the crucial difference between Examples 1 and 3:
Example 1: The free variable t.name is constrained only by negation. Any name NOT in Engineering qualifies—including names not in the database.
Example 3: The free variable e is first constrained to be an Employee (positive predicate), then filtered by a negative condition. Only database values can appear in the result.
This pattern—ensuring variables are bound by positive predicates before applying negative conditions—is the key to domain independence.
Understanding domain independence mathematically helps us develop intuition for when queries are safe. Let's formalize the key concepts.
Evaluation Semantics:
Given a query Q = {t | φ(t)}, we define the result as:
Q(I, D) = {t ∈ Dⁿ | I ⊨ φ(t)}
Where:
Domain independence requires: Q(I, D₁) = Q(I, D₂) for all D₁, D₂ containing adom(I)
1234567891011121314151617181920212223242526
-- Domain-Dependent Query AnalysisQuery Q₁ = {x | ¬R(x)} Evaluation with D₁ = {1, 2} and I = {R = {(1)}}: Q₁(I, D₁) = {x ∈ D₁ | I ⊨ ¬R(x)} = {2} -- 2 is not in R Evaluation with D₂ = {1, 2, 3} and same I: Q₁(I, D₂) = {x ∈ D₂ | I ⊨ ¬R(x)} = {2, 3} -- 2 and 3 are not in R Q₁(I, D₁) ≠ Q₁(I, D₂) → Q₁ is domain-dependent -- Domain-Independent Query Analysis Query Q₂ = {x | R(x) ∧ x > 0} Evaluation with D₁ = {1, 2} and I = {R = {(1)}}: Q₂(I, D₁) = {x ∈ D₁ | I ⊨ R(x) ∧ x > 0} = {1} -- Only R values satisfying x > 0 Evaluation with D₂ = {1, 2, 3, 4, 5} and same I: Q₂(I, D₂) = {x ∈ D₂ | I ⊨ R(x) ∧ x > 0} = {1} -- Still only R values satisfying x > 0 Q₂(I, D₁) = Q₂(I, D₂) for all D₁, D₂ ⊇ adom(I) → Q₂ is domain-independentThe Monotonicity Insight:
For domain-dependent queries, increasing the domain generally increases (or changes) the result:
D₁ ⊆ D₂ may imply Q(I, D₁) ⊆ Q(I, D₂) (for negation-based queries)
But for domain-independent queries, the result is stable:
D₁ ⊇ adom(I) and D₂ ⊇ adom(I) implies Q(I, D₁) = Q(I, D₂)
For a domain-independent query Q, we can evaluate Q using only the active domain: Q(I, D) = Q(I, adom(I)) for any D ⊇ adom(I). This is why domain-independent queries are computable—we only need to consider the finite active domain.
Given a relational calculus expression, how can we determine if it's domain-dependent? While the general problem is undecidable (we cannot always determine domain independence for arbitrary expressions), we can identify sufficient conditions that guarantee domain independence.
{x | ¬R(x)}){t | ∀x(P(t,x))}){x | R(x) ∨ x=5}){x | x > 100}){t | R(x) → S(t)})The Positive Predicate Principle:
The fundamental principle for ensuring domain independence is:
Every free variable in the result must be constrained by at least one positive (non-negated) database predicate.
This ensures the variable's possible values come from the database, not the infinite domain.
Analyzing expressions:
123456789101112131415161718192021222324252627282930
-- Query 1: {x | ¬R(x)}Free variable: xPositive predicates on x: NONEVerdict: DOMAIN-DEPENDENT (x takes values from domain, not DB) -- Query 2: {x | R(x) ∧ ¬S(x)} Free variable: xPositive predicates on x: R(x) ✓Verdict: DOMAIN-INDEPENDENT (x must be in R) -- Query 3: {(x,y) | R(x) ∧ ¬S(y)}Free variables: x, yPositive predicates on x: R(x) ✓Positive predicates on y: NONEVerdict: DOMAIN-DEPENDENT (y is unconstrained) -- Query 4: {x | (∀y)(R(y) → S(x,y))}Free variable: xPositive predicates on x: NONE (S appears in implication consequence)Verdict: DOMAIN-DEPENDENT -- Query 5: {x | R(x) ∧ (∀y)(T(y) → S(x,y))}Free variable: xPositive predicates on x: R(x) ✓Universal y ranges over T: bounded ✓ Verdict: DOMAIN-INDEPENDENTDetermining whether an arbitrary relational calculus expression is domain-independent is undecidable in general. The analysis rules we provide are sufficient conditions—queries satisfying them are guaranteed domain-independent. But some domain-independent queries may fail these tests yet still be semantically safe.
Domain independence and query safety are two perspectives on the same fundamental requirement. Understanding their relationship clarifies why both concepts matter.
The Fundamental Equivalence:
For relational calculus queries:
A query is safe if and only if it always produces a finite result for any finite database instance.
A query is domain-independent if and only if its result doesn't depend on the underlying domain.
These conditions are equivalent. A query is safe ⟺ it is domain-independent.
| Perspective | Question Asked | Good Queries | Bad Queries |
|---|---|---|---|
| Safety (Computability) | Can we compute the result? | Safe: finite results | Unsafe: infinite results |
| Domain Independence (Semantics) | Is the meaning well-defined? | Domain-independent: stable result | Domain-dependent: ambiguous result |
| Practical (SQL) | Can we translate to SQL? | Expressible in SQL | Not expressible in SQL |
Proof Intuition:
Why does domain independence imply safety?
If a query is domain-independent, its result is the same whether we use the active domain or any larger domain. Since evaluating over the active domain necessarily produces a finite result (the active domain is finite), the query is safe.
Why does domain dependence imply unsafety?
If a query is domain-dependent, there exist two domains D₁ ⊂ D₂ such that Q(I, D₁) ≠ Q(I, D₂). Specifically, we can choose D₂ to be infinite (e.g., all integers). The query result over D₂ will then include values from outside any finite domain—potentially infinitely many.
This equivalence means we can reason about safety either way: asking 'will this produce finite results?' is the same as asking 'does this depend on the domain?'. Different phrasings suit different contexts, but they identify the same class of well-behaved queries.
Often, a domain-dependent query expresses a legitimate user intent that can be reformulated as a domain-independent query. The key is understanding what the user actually wants and adding appropriate constraints.
User Intent: Find customers who haven't placed orders.
Naive (Domain-Dependent):
12
{c | ¬(∃o)(Order(o) ∧ o.customer_id = c)}-- Problem: c ranges over entire domainCorrected (Domain-Independent):
12345678
{c.id | Customer(c) ∧ ¬(∃o)(Order(o) ∧ o.customer_id = c.id)}-- Fix: c is bound to Customer table first -- SQL equivalent:SELECT c.id FROM Customer cWHERE NOT EXISTS ( SELECT 1 FROM Order o WHERE o.customer_id = c.id)The general pattern: identify what domain of values the user intends, and add a positive predicate constraining the free variable to that domain. Usually, there's a table that defines the 'universe of interest' (Customers, Students, Departments, etc.).
Domain dependence captures a fundamental semantic issue in relational calculus: queries whose meaning depends on assumptions beyond the database content.
What's Next:
Now that we understand why domain independence matters, we'll examine the formal safety conditions—syntactic rules that guarantee a query is domain-independent. These conditions provide a practical checklist for ensuring queries are well-formed.
You now understand domain dependence and its equivalence to query safety. This semantic foundation prepares you to learn the formal safety conditions that guarantee computable, well-defined queries in relational calculus.