Database Management SystemsExpressive Power

Expressive Power of Relational Languages

LevelAdvanced

Duration90 mins

TopicExpressive Power

4 / 5

Limitations

The Boundaries of Relational Power

Every computational formalism has boundaries—problems it cannot solve, questions it cannot answer. Relational algebra and calculus, despite their elegance and power, are no exception. Understanding these limitations is not a sign of weakness in the relational model; rather, it reveals the precise tradeoffs the model makes: what it gains in decidability, optimization, and simplicity, and what it sacrifices in raw expressiveness.

This page explores the fundamental limitations of the relational model. We'll examine specific types of queries that cannot be expressed, understand why these limitations exist, and see how recognizing them informs both database design and the development of extended query languages.

What You Will Learn

By the end of this page, you will understand the specific query types that exceed relational expressiveness (transitive closure, aggregation, arithmetic, string operations), the theoretical reasons these limitations exist (first-order logic boundaries, lack of recursion), and how these limitations have shaped the evolution of practical query languages like SQL.

The Fundamental Limitation: First-Order Logic

To understand relational limitations, we must first understand what relational algebra and calculus fundamentally are: they are equivalent to first-order logic over finite structures (with equality and a fixed set of relation names).

First-order logic (FOL) is characterized by:

Variables that range over individual elements (domain values)
Predicates that test relationships between elements
Connectives (∧, ∨, ¬, →, ↔)
Quantifiers (∃, ∀) that range over individuals

What FOL lacks:

Quantification over predicates themselves (that's second-order logic)
Quantification over sets of elements (also second-order)
Fixed-point operators for defining recursive structures
Any form of counting or arithmetic

The Core Insight

Relational algebra is equivalent to first-order logic over finite structures. Therefore, ANY query that requires more than first-order logic cannot be expressed in relational algebra. This single fact explains all the specific limitations we'll examine.

Why this matters:

First-order logic is remarkably expressive for many purposes, but it has a fundamental weakness: it cannot express properties that require unbounded iteration or counting. Consider:

'Is the graph connected?' — Requires checking paths of arbitrary length
'How many employees are there?' — Requires counting
'What is the total salary?' — Requires arithmetic aggregation

These queries are not first-order definable over finite structures, and therefore cannot be expressed in relational algebra or calculus.

The mathematical foundation:

The study of what can and cannot be expressed in FOL over finite structures is the domain of finite model theory. Key results include:

Ehrenfeucht-Fraïssé games — A technique for proving queries are not FOL-definable
0-1 laws — Asymptotic results about FOL expressibility
Descriptive complexity — Connections between logic and computational complexity

While we won't dive deep into these mathematical tools, knowing they exist explains why limitations are precisely characterizable rather than merely observed.

Limitation: Transitive Closure and Recursion

The most famous limitation of relational algebra is its inability to compute transitive closure—the reachability relation in a graph. This limitation has profound practical implications for hierarchical and network data.

The Transitive Closure Problem

Given a relation Edge(from, to) representing a directed graph, the query 'find all pairs (x, y) such that y is reachable from x' CANNOT be expressed in relational algebra. This holds regardless of how cleverly we combine operators.

Why is transitive closure inexpressible?

Reachability can require arbitrarily many steps:

A → B → C → D → ... → Z

To detect that Z is reachable from A, we need to follow an unbounded chain. Relational algebra expressions have fixed depth—each expression specifies a finite number of operations. No fixed-depth expression can handle arbitrary-length paths.

A concrete example:

Consider the ParentOf(parent, child) relation:

ParentOf(Alice, Bob)
ParentOf(Bob, Carol)
ParentOf(Carol, David)

Query: Find all ancestor-descendant pairs.

'Ancestors' includes grandparents, great-grandparents, etc.—chains of arbitrary length.

We can express fixed-length chains:

-- Parents (1 step)
π_{parent, child}(ParentOf)

-- Grandparents (2 steps)
π_{P1.parent, P2.child}(ParentOf P1 ⋈_{P1.child = P2.parent} ParentOf P2)

-- Great-grandparents (3 steps)
... (more joins)

But for arbitrary depth? Impossible in pure relational algebra.

Practical impact:

Organizational hierarchies (who reports to whom, transitively)
Bill of materials (what parts compose a product, recursively)
Social networks (who can reach whom through connections)
Transportation networks (which cities are connected)

All these require transitive closure and exceed basic relational expressiveness.

What We CAN vs. CANNOT Express
Query Type	Expressible in RA?	Reason
Direct edges (1 step)	✓ Yes	Just select from Edge
2-step paths	✓ Yes	Join Edge with itself
k-step paths (fixed k)	✓ Yes	k-1 self-joins
Paths of length ≤ k	✓ Yes	Union of 1..k step queries
All paths (arbitrary length)	✗ No	Would require unbounded joins
Reachability	✗ No	Transitive closure
Shortest paths	✗ No	Requires both TC and counting

Limitation: Aggregation and Counting

Relational algebra operates on sets of tuples. It can filter, combine, and transform tuples, but it cannot count them, sum their values, or compute any aggregate function over a set.

The Counting Problem

Queries like 'How many employees are in each department?', 'What is the average salary?', or 'What is the maximum order value?' CANNOT be expressed in relational algebra. Counting and aggregation are fundamentally beyond first-order logic.

Why counting is inexpressible:

First-order logic quantifiers (∃, ∀) can express:

'There exists an x such that...'
'For all x...'

But they cannot express:

'There are exactly 5 x's such that...'
'There are more x's satisfying φ than ψ'

The mathematical proof:

This is a famous result in finite model theory. Using Ehrenfeucht-Fraïssé games, one can prove that no first-order sentence can distinguish between a set of 100 elements and a set of 101 elements. Therefore, counting is not first-order definable.

Practical queries that fail:

-- These require aggregation (not in basic RA)
SELECT department, COUNT(*) FROM Employee GROUP BY department;
SELECT AVG(salary) FROM Employee;
SELECT MAX(order_total) FROM Orders;
SELECT department, SUM(salary) FROM Employee GROUP BY department;

What CAN we express?

Relational algebra can express:

'There exists an employee' — ∃
'All employees have salary > 0' — ∀
'There are at least k employees' (for fixed k) — ∃x₁∃x₂...∃xₖ (all distinct)
'There are at most k employees' (for fixed k) — Negation of 'at least k+1'
'There are exactly k employees' (for fixed k) — Combination of above

But k must be known at query compile time, not computed from data.

Aggregation Operations Beyond RA

•COUNT — Number of tuples (or distinct values)
•SUM — Total of numeric attribute values
•AVG — Arithmetic mean of values
•MIN / MAX — Extreme values (though MAX/MIN can sometimes be simulated)
•GROUP BY — Partition relation and aggregate per group
•HAVING — Filter groups by aggregate conditions

Limitation: Arithmetic Computation

Pure relational algebra treats domain values as abstract, uninterpreted elements. It can test equality (x = y) and, with extensions, inequality (x < y), but it cannot compute new values through arithmetic operations.

What pure RA cannot express:

-- Compute a derived value
SELECT name, salary * 1.1 AS raised_salary FROM Employee;

-- Arithmetic in conditions (beyond comparisons)
SELECT * FROM Orders WHERE quantity * price > 1000;

-- Date arithmetic
SELECT * FROM Subscriptions WHERE end_date - start_date > 30;

-- String concatenation
SELECT first_name || ' ' || last_name AS full_name FROM Person;

The theoretical reason:

Pure first-order logic doesn't include arithmetic. The domain is just a set of values with equality. While we can add comparison predicates (<, ≤, >, ≥) to the logic, these are interpreted predicates—they test properties but don't compute new values.

The expressibility spectrum:

Operation	In Pure RA?	In Extended RA?	Notes
x = y	✓	✓	Basic equality
x < y	With extension	✓	Requires ordered domain
x + y	✗	Sometimes	Requires arithmetic extension
x * y	✗	Sometimes	Requires arithmetic extension
f(x) for function f	✗	Sometimes	Requires function extension
String concatenation	✗	Sometimes	Requires string extension

Practical note:

SQL and all practical query languages include arithmetic. When we speak of 'relational algebra' in implementation, we mean an extended version with:

Arithmetic expressions in selection conditions
Computed attributes in projection (extended projection)
Type-specific operations (date arithmetic, string functions)

Pure vs. Practical

The 'limitations' we discuss are of PURE relational algebra—the theoretical model. Practical systems extend RA with arithmetic, aggregates, and more. The pure model is important for theoretical analysis and defining the baseline of relational completeness.

Limitation: Ordering and Ranking

Relation algebra operates on sets—unordered collections where tuple position has no meaning. This fundamentally conflicts with ordering and ranking operations.

The Ordering Problem

Queries like 'Return the top 10 highest-paid employees' or 'Rank departments by total sales' are not expressible in relational algebra. The result of an RA expression is an unordered set; 'first' and 'last' have no meaning.

Why ordering is outside the model:

Relational algebra is defined over relations, which are sets of tuples. In set theory:

{1, 2, 3} = {3, 1, 2} = {2, 3, 1}
Sets have no inherent order

Ordering is a property of sequences (lists), not sets. To return a 'top-k' result, we would need:

A comparison relation (e.g., salary ordering)
A way to limit output size
A way to produce an ordered sequence, not a set

RA can do (1) partially via selection, but (2) and (3) are outside its scope.

Related inexpressible queries:

-- Top-k queries
SELECT * FROM Employee ORDER BY salary DESC LIMIT 10;

-- Ranking
SELECT name, RANK() OVER (ORDER BY salary DESC) FROM Employee;

-- Percentiles
SELECT * FROM Employee WHERE salary > PERCENTILE_CONT(0.9) ...;

-- Running aggregates
SELECT date, SUM(sales) OVER (ORDER BY date) FROM DailySales;

Theoretical vs. practical perspective:

In the relational model, ORDER BY is not a relational operator—it produces an ordered list, not a relation. The result of ORDER BY is outside the relational model entirely.

In practice, SQL includes ORDER BY as a presentation operation. The optimizer knows it doesn't affect relational semantics, so it can be applied after all relational operations complete.

Set Operations vs. Ordered Operations
Property	Relational (Set) Model	Ordered (List) Model
Duplicate handling	Eliminated (set semantics)	Preserved (bag semantics)
Element order	No order (sets are unordered)	Position matters
'First' and 'Last'	Meaningless	Well-defined
Equality	{1,2,3} = {3,2,1}	[1,2,3] ≠ [3,2,1]
Top-k queries	Not expressible	Natural operation

Limitation: String and Text Operations

While relational algebra can test string equality, it cannot perform string manipulation or pattern matching beyond what's achievable through element-wise comparison.

Inexpressible string operations:

-- Substring matching
SELECT * FROM Products WHERE name LIKE '%laptop%';

-- Regular expression matching
SELECT * FROM LogEntries WHERE message ~ 'error.*fatal';

-- String functions
SELECT UPPER(name), LENGTH(description) FROM Products;

-- Full-text search
SELECT * FROM Documents WHERE body @@ 'database & optimization';

The theoretical reason:

In pure relational algebra:

String values are atomic, unstructured elements
We can test if two strings are equal or (with ordering) lexicographically ordered
We cannot 'see inside' a string to examine substrings, characters, or patterns

The LIKE predicate:

SQL's LIKE operator (with % and _ wildcards) requires an extended predicate not in basic FOL. It's testing a property (pattern matching) that involves examining the internal structure of string values.

Regular expressions:

Even more powerful, regular expressions require finite automata to evaluate. While regular languages and FOL are incomparable in general, pattern matching within strings is definitely outside standard relational operators.

Practical approach:

SQL systems implement string operations as built-in functions:

These are extensions to the relational model
They're evaluated during tuple processing
They enhance expressiveness beyond pure RA

Full-Text Search

Full-text search is particularly far from RA—it involves tokenization, stemming, ranking by relevance, and more. Database systems implement full-text search as separate subsystems (inverted indexes, ranking algorithms) that aren't part of the relational model.

Limitation: Null Values and Incomplete Information

The pure relational model assumes all data is known and present. Null values—representing missing, unknown, or inapplicable information—introduce complexities that the basic model doesn't naturally handle.

The null value complications:

Three-valued logic:
- TRUE, FALSE, UNKNOWN
- NULL = NULL is UNKNOWN, not TRUE
- Logical operations: TRUE ∧ UNKNOWN = UNKNOWN
Aggregation ambiguity:
- What is SUM of {10, NULL, 20}? (Most systems: 30, ignoring NULL)
- What is AVG? (30/2 = 15, or 30/3 = 10?)
Join semantics:
- Should NULL join with NULL? (Usually no)
- Outer joins introduce NULLs for non-matching tuples

Why this exceeds the pure model:

Pure relational algebra assumes:

Every tuple has a value for every attribute
Equality tests have definite outcomes (TRUE or FALSE)
All operations are total (defined for all inputs)

Nulls violate all three assumptions.

SQL's approach:

SQL incorporates nulls and three-valued logic:

IS NULL / IS NOT NULL predicates
COALESCE and NULLIF functions
Special handling in aggregates
Outer join operators that can produce nulls

These are extensions beyond the pure relational model, though they're so fundamental to SQL that they feel core to the 'relational' approach.

Codd's Criticism

Interestingly, Codd himself criticized how SQL handled nulls, proposing instead a system with multiple null types (missing-but-applicable, missing-and-inapplicable). The practical SQL approach is a compromise that works but doesn't fully resolve the theoretical issues.

Why These Limitations Exist: The Design Tradeoffs

The limitations we've examined aren't accidents or oversights—they're the result of deliberate (or at least inevitable) design tradeoffs. Understanding these tradeoffs illuminates why the relational model succeeded despite its limitations.

What the Limitations Buy Us

•Decidability — All RA queries terminate. There's no halting problem; every query produces a result in finite time on finite data. Adding recursion (for transitive closure) risks non-termination.
•Predictable Complexity — RA query evaluation is in polynomial time (specifically, polynomial in data size). Adding counting or arithmetic can increase complexity dramatically.
•Optimization Freedom — The equivalences used by query optimizers rely on RA semantics. Adding features that break these equivalences limits optimization.
•Simple Semantics — RA has clear, mathematical semantics. Every expression has a well-defined meaning. Adding features complicates the semantic foundation.
•Implementation Clarity — Each RA operator has a direct implementation (scan, filter, join, etc.). More complex operations require more sophisticated algorithms.

The expressiveness-tractability tradeoff:

In computational logic, a general principle holds:

More expressive logics are harder to evaluate and less amenable to optimization.

Relational algebra (equivalent to FOL) sits at a sweet spot:

Powerful enough to express most naturally occurring queries
Weak enough to ensure polynomial-time evaluation
Structured enough to enable algebraic optimization

Adding transitive closure crosses a complexity boundary. Adding unrestricted recursion makes evaluation undecidable. Adding aggregation and arithmetic changes the complexity characteristics.

Historical vindication:

The success of relational databases over 50 years validates these tradeoffs. Systems that tried to be 'more expressive' (like early object databases) often lost the optimization and simplicity benefits. The relational model's limitations are features as much as bugs—they define a coherent, implementable, optimizable system.

The Key Insight

Limitations define the boundary within which powerful guarantees hold. Cross the boundary, and you may gain expressiveness but lose optimization, decidability, or predictability. The art of database system design is knowing when to cross these boundaries and how to contain the consequences.

Summary: The Price of Elegance

We've explored the fundamental limitations of relational algebra and calculus—the boundaries of what the pure relational model can express.

Key Takeaways

•Relational algebra equals first-order logic over finite structures. Any query requiring more than FOL is inexpressible.
•Transitive closure (reachability, arbitrary-length paths) is the canonical example of an inexpressible query—it requires unbounded iteration.
•Counting and aggregation (COUNT, SUM, AVG) are not first-order definable and thus exceed relational expressiveness.
•Arithmetic computation on values is outside pure RA, which treats values as abstract elements.
•Ordering and ranking conflict with the set-based semantics of relations, where tuples have no inherent position.
•String operations beyond equality require examining the internal structure of string values, which RA doesn't support.
•These limitations are tradeoffs, not failures—they ensure decidability, polynomial-time evaluation, and optimization freedom.

What's next:

Recognizing limitations naturally leads to the question: How do we extend the relational model to overcome them? Page 5 examines the extensions developed over decades—from SQL's aggregation and grouping to recursive CTEs, window functions, and beyond. We'll see how practical query languages transcend pure relational expressiveness while preserving (as much as possible) the benefits of the relational foundation.

Page Complete

You now understand the fundamental limitations of relational algebra—what it cannot express and why. These limitations aren't weaknesses but rather the price of the elegance, simplicity, and optimizability that made the relational model the dominant paradigm for database systems.

4 / 5

Loading learning content...

Database Management SystemsExpressive Power

Expressive Power of Relational Languages

LevelAdvanced

Duration90 mins

TopicExpressive Power

4 / 5

Limitations

The Boundaries of Relational Power

What You Will Learn

The Fundamental Limitation: First-Order Logic

First-order logic (FOL) is characterized by:

Variables that range over individual elements (domain values)
Predicates that test relationships between elements
Connectives (∧, ∨, ¬, →, ↔)
Quantifiers (∃, ∀) that range over individuals

What FOL lacks:

Quantification over predicates themselves (that's second-order logic)
Quantification over sets of elements (also second-order)
Fixed-point operators for defining recursive structures
Any form of counting or arithmetic

The Core Insight

Why this matters:

First-order logic is remarkably expressive for many purposes, but it has a fundamental weakness: it cannot express properties that require unbounded iteration or counting. Consider:

'Is the graph connected?' — Requires checking paths of arbitrary length
'How many employees are there?' — Requires counting
'What is the total salary?' — Requires arithmetic aggregation

These queries are not first-order definable over finite structures, and therefore cannot be expressed in relational algebra or calculus.

The mathematical foundation:

The study of what can and cannot be expressed in FOL over finite structures is the domain of finite model theory. Key results include:

Ehrenfeucht-Fraïssé games — A technique for proving queries are not FOL-definable
0-1 laws — Asymptotic results about FOL expressibility
Descriptive complexity — Connections between logic and computational complexity

While we won't dive deep into these mathematical tools, knowing they exist explains why limitations are precisely characterizable rather than merely observed.

Limitation: Transitive Closure and Recursion

The Transitive Closure Problem

Why is transitive closure inexpressible?

Reachability can require arbitrarily many steps:

A → B → C → D → ... → Z

A concrete example:

Consider the ParentOf(parent, child) relation:

ParentOf(Alice, Bob)
ParentOf(Bob, Carol)
ParentOf(Carol, David)

Query: Find all ancestor-descendant pairs.

'Ancestors' includes grandparents, great-grandparents, etc.—chains of arbitrary length.

We can express fixed-length chains:

-- Parents (1 step)
π_{parent, child}(ParentOf)

-- Grandparents (2 steps)
π_{P1.parent, P2.child}(ParentOf P1 ⋈_{P1.child = P2.parent} ParentOf P2)

-- Great-grandparents (3 steps)
... (more joins)

But for arbitrary depth? Impossible in pure relational algebra.

Practical impact:

Organizational hierarchies (who reports to whom, transitively)
Bill of materials (what parts compose a product, recursively)
Social networks (who can reach whom through connections)
Transportation networks (which cities are connected)

All these require transitive closure and exceed basic relational expressiveness.

What We CAN vs. CANNOT Express
Query Type	Expressible in RA?	Reason
Direct edges (1 step)	✓ Yes	Just select from Edge
2-step paths	✓ Yes	Join Edge with itself
k-step paths (fixed k)	✓ Yes	k-1 self-joins
Paths of length ≤ k	✓ Yes	Union of 1..k step queries
All paths (arbitrary length)	✗ No	Would require unbounded joins
Reachability	✗ No	Transitive closure
Shortest paths	✗ No	Requires both TC and counting

Limitation: Aggregation and Counting

Relational algebra operates on sets of tuples. It can filter, combine, and transform tuples, but it cannot count them, sum their values, or compute any aggregate function over a set.

The Counting Problem

Why counting is inexpressible:

First-order logic quantifiers (∃, ∀) can express:

'There exists an x such that...'
'For all x...'

But they cannot express:

'There are exactly 5 x's such that...'
'There are more x's satisfying φ than ψ'

The mathematical proof:

Practical queries that fail:

-- These require aggregation (not in basic RA)
SELECT department, COUNT(*) FROM Employee GROUP BY department;
SELECT AVG(salary) FROM Employee;
SELECT MAX(order_total) FROM Orders;
SELECT department, SUM(salary) FROM Employee GROUP BY department;

What CAN we express?

Relational algebra can express:

'There exists an employee' — ∃
'All employees have salary > 0' — ∀
'There are at least k employees' (for fixed k) — ∃x₁∃x₂...∃xₖ (all distinct)
'There are at most k employees' (for fixed k) — Negation of 'at least k+1'
'There are exactly k employees' (for fixed k) — Combination of above

But k must be known at query compile time, not computed from data.

Aggregation Operations Beyond RA

•COUNT — Number of tuples (or distinct values)
•SUM — Total of numeric attribute values
•AVG — Arithmetic mean of values
•MIN / MAX — Extreme values (though MAX/MIN can sometimes be simulated)
•GROUP BY — Partition relation and aggregate per group
•HAVING — Filter groups by aggregate conditions

Limitation: Arithmetic Computation

What pure RA cannot express:

-- Compute a derived value
SELECT name, salary * 1.1 AS raised_salary FROM Employee;

-- Arithmetic in conditions (beyond comparisons)
SELECT * FROM Orders WHERE quantity * price > 1000;

-- Date arithmetic
SELECT * FROM Subscriptions WHERE end_date - start_date > 30;

-- String concatenation
SELECT first_name || ' ' || last_name AS full_name FROM Person;

The theoretical reason:

The expressibility spectrum:

Operation	In Pure RA?	In Extended RA?	Notes
x = y	✓	✓	Basic equality
x < y	With extension	✓	Requires ordered domain
x + y	✗	Sometimes	Requires arithmetic extension
x * y	✗	Sometimes	Requires arithmetic extension
f(x) for function f	✗	Sometimes	Requires function extension
String concatenation	✗	Sometimes	Requires string extension

Practical note:

SQL and all practical query languages include arithmetic. When we speak of 'relational algebra' in implementation, we mean an extended version with:

Arithmetic expressions in selection conditions
Computed attributes in projection (extended projection)
Type-specific operations (date arithmetic, string functions)

Pure vs. Practical

Limitation: Ordering and Ranking

Relation algebra operates on sets—unordered collections where tuple position has no meaning. This fundamentally conflicts with ordering and ranking operations.

The Ordering Problem

Why ordering is outside the model:

Relational algebra is defined over relations, which are sets of tuples. In set theory:

{1, 2, 3} = {3, 1, 2} = {2, 3, 1}
Sets have no inherent order

Ordering is a property of sequences (lists), not sets. To return a 'top-k' result, we would need:

A comparison relation (e.g., salary ordering)
A way to limit output size
A way to produce an ordered sequence, not a set

RA can do (1) partially via selection, but (2) and (3) are outside its scope.

Related inexpressible queries:

-- Top-k queries
SELECT * FROM Employee ORDER BY salary DESC LIMIT 10;

-- Ranking
SELECT name, RANK() OVER (ORDER BY salary DESC) FROM Employee;

-- Percentiles
SELECT * FROM Employee WHERE salary > PERCENTILE_CONT(0.9) ...;

-- Running aggregates
SELECT date, SUM(sales) OVER (ORDER BY date) FROM DailySales;

Theoretical vs. practical perspective:

In the relational model, ORDER BY is not a relational operator—it produces an ordered list, not a relation. The result of ORDER BY is outside the relational model entirely.

In practice, SQL includes ORDER BY as a presentation operation. The optimizer knows it doesn't affect relational semantics, so it can be applied after all relational operations complete.

Set Operations vs. Ordered Operations
Property	Relational (Set) Model	Ordered (List) Model
Duplicate handling	Eliminated (set semantics)	Preserved (bag semantics)
Element order	No order (sets are unordered)	Position matters
'First' and 'Last'	Meaningless	Well-defined
Equality	{1,2,3} = {3,2,1}	[1,2,3] ≠ [3,2,1]
Top-k queries	Not expressible	Natural operation

Limitation: String and Text Operations

While relational algebra can test string equality, it cannot perform string manipulation or pattern matching beyond what's achievable through element-wise comparison.

Inexpressible string operations:

-- Substring matching
SELECT * FROM Products WHERE name LIKE '%laptop%';

-- Regular expression matching
SELECT * FROM LogEntries WHERE message ~ 'error.*fatal';

-- String functions
SELECT UPPER(name), LENGTH(description) FROM Products;

-- Full-text search
SELECT * FROM Documents WHERE body @@ 'database & optimization';

The theoretical reason:

In pure relational algebra:

String values are atomic, unstructured elements
We can test if two strings are equal or (with ordering) lexicographically ordered
We cannot 'see inside' a string to examine substrings, characters, or patterns

The LIKE predicate:

Regular expressions:

Practical approach:

SQL systems implement string operations as built-in functions:

These are extensions to the relational model
They're evaluated during tuple processing
They enhance expressiveness beyond pure RA

Full-Text Search

Limitation: Null Values and Incomplete Information

The null value complications:

Three-valued logic:
- TRUE, FALSE, UNKNOWN
- NULL = NULL is UNKNOWN, not TRUE
- Logical operations: TRUE ∧ UNKNOWN = UNKNOWN
Aggregation ambiguity:
- What is SUM of {10, NULL, 20}? (Most systems: 30, ignoring NULL)
- What is AVG? (30/2 = 15, or 30/3 = 10?)
Join semantics:
- Should NULL join with NULL? (Usually no)
- Outer joins introduce NULLs for non-matching tuples

Why this exceeds the pure model:

Pure relational algebra assumes:

Every tuple has a value for every attribute
Equality tests have definite outcomes (TRUE or FALSE)
All operations are total (defined for all inputs)

Nulls violate all three assumptions.

SQL's approach:

SQL incorporates nulls and three-valued logic:

IS NULL / IS NOT NULL predicates
COALESCE and NULLIF functions
Special handling in aggregates
Outer join operators that can produce nulls

These are extensions beyond the pure relational model, though they're so fundamental to SQL that they feel core to the 'relational' approach.

Codd's Criticism

Why These Limitations Exist: The Design Tradeoffs

What the Limitations Buy Us

•Decidability — All RA queries terminate. There's no halting problem; every query produces a result in finite time on finite data. Adding recursion (for transitive closure) risks non-termination.
•Predictable Complexity — RA query evaluation is in polynomial time (specifically, polynomial in data size). Adding counting or arithmetic can increase complexity dramatically.
•Optimization Freedom — The equivalences used by query optimizers rely on RA semantics. Adding features that break these equivalences limits optimization.
•Simple Semantics — RA has clear, mathematical semantics. Every expression has a well-defined meaning. Adding features complicates the semantic foundation.
•Implementation Clarity — Each RA operator has a direct implementation (scan, filter, join, etc.). More complex operations require more sophisticated algorithms.

The expressiveness-tractability tradeoff:

In computational logic, a general principle holds:

More expressive logics are harder to evaluate and less amenable to optimization.

Relational algebra (equivalent to FOL) sits at a sweet spot:

Powerful enough to express most naturally occurring queries
Weak enough to ensure polynomial-time evaluation
Structured enough to enable algebraic optimization

Adding transitive closure crosses a complexity boundary. Adding unrestricted recursion makes evaluation undecidable. Adding aggregation and arithmetic changes the complexity characteristics.

Historical vindication:

The Key Insight

Summary: The Price of Elegance

We've explored the fundamental limitations of relational algebra and calculus—the boundaries of what the pure relational model can express.

Key Takeaways

•Relational algebra equals first-order logic over finite structures. Any query requiring more than FOL is inexpressible.
•Transitive closure (reachability, arbitrary-length paths) is the canonical example of an inexpressible query—it requires unbounded iteration.
•Counting and aggregation (COUNT, SUM, AVG) are not first-order definable and thus exceed relational expressiveness.
•Arithmetic computation on values is outside pure RA, which treats values as abstract elements.
•Ordering and ranking conflict with the set-based semantics of relations, where tuples have no inherent position.
•String operations beyond equality require examining the internal structure of string values, which RA doesn't support.
•These limitations are tradeoffs, not failures—they ensure decidability, polynomial-time evaluation, and optimization freedom.

What's next:

Page Complete

4 / 5