Loading learning content...
You've designed a decomposition that looks elegant on paper. The relations are normalized, the schema is clean, and you're confident it will perform well. But how do you know that all your functional dependencies are preserved? Intuition isn't enough when data integrity is at stake.
Testing dependency preservation might seem straightforward at first—just check if each FD appears in some projected dependency set. But the reality is more nuanced: dependencies can be preserved through derivation, not just explicit membership. This means we need algorithms that can detect both direct and transitive preservation.
This page covers the complete algorithmic toolkit for testing dependency preservation. You'll learn the naive approach (and why it fails at scale), the efficient polynomial-time algorithm, step-by-step worked examples, and implementation strategies. By the end, you'll be able to rigorously verify any decomposition.
The most direct way to test dependency preservation follows directly from the definition:
Naive Algorithm:
This approach is correct but has a fatal flaw: the closure F⁺ can be exponentially large in the number of attributes.
For a relation with n attributes, F⁺ can contain O(2ⁿ) functional dependencies. For a relation with just 20 attributes, this means potentially millions of FDs. For 30 attributes, billions. The naive algorithm is computationally infeasible for real-world schemas.
| Attributes (n) | Potential |F⁺| | Practical Feasibility |
|---|---|---|
| 5 | ~32 | ✓ Trivial |
| 10 | ~1,024 | ✓ Reasonable |
| 15 | ~32,768 | ⚠ Slow |
| 20 | ~1,048,576 | ⚠ Very Slow |
| 25 | ~33,554,432 | ✗ Impractical |
| 30 | ~1,073,741,824 | ✗ Impossible |
The insight:
We don't actually need to compute F⁺ or even the full projected closures. We can test preservation one FD at a time using a much more efficient approach that leverages attribute closure computation—an operation that runs in polynomial time.
The key insight is that we can test preservation per-FD rather than computing entire closures. For each FD X → Y in F, we check if X → Y is in the closure of the projected dependencies (without computing that closure explicitly).
Core Principle:
An FD X → Y is preserved if Y ⊆ X⁺G where X⁺G is the closure of X computed using only the projected dependencies G = ∪ πRᵢ(F).
The Efficient Algorithm:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859
def test_dependency_preservation(F, decomposition): """ Efficiently test if all FDs in F are preserved by the decomposition. Time Complexity: O(|F| × n² × k) where n = #attributes, k = #relations This is polynomial, not exponential! Parameters: - F: Set of functional dependencies {(lhs_set, rhs_set), ...} - decomposition: List of attribute sets [R1_attrs, R2_attrs, ...] Returns: - (True, None) if all preserved - (False, violating_fd) if some FD is not preserved """ for fd in F: lhs, rhs = fd # X → Y if not is_fd_preserved(lhs, rhs, F, decomposition): return (False, fd) return (True, None) def is_fd_preserved(X, Y, F, decomposition): """ Test if a single FD X → Y is preserved in the decomposition. Uses modified attribute closure that only considers projected FDs. """ # Compute X+ using only projected dependencies closure = compute_closure_with_projection(X, F, decomposition) # X → Y is preserved iff Y ⊆ closure return Y.issubset(closure) def compute_closure_with_projection(X, F, decomposition): """ Compute attribute closure of X, but restrict to FDs that are fully contained within some Ri (the projected dependencies). This is the key innovation: we compute closure iteratively, only adding attributes when the FD is actually projectable. """ closure = set(X) # Start with X changed = True while changed: changed = False for relation_attrs in decomposition: # Consider only FDs that project onto this relation # For each FD in F: X' → Y' where X' ⊆ Ri and Y' ⊆ Ri for fd_lhs, fd_rhs in F: # Check if this FD is fully contained in this relation if fd_lhs.issubset(relation_attrs) and fd_rhs.issubset(relation_attrs): # If we have the LHS in our closure, add the RHS if fd_lhs.issubset(closure) and not fd_rhs.issubset(closure): closure.update(fd_rhs) changed = True return closureThe algorithm works because attribute closure computation is equivalent to testing membership in F⁺. By modifying the closure computation to only use projected FDs, we effectively test membership in (∪πRᵢ(F))⁺ without computing that set explicitly. The polynomial time bound comes from the fact that each closure computation takes O(n² × |F|) time.
Let's walk through the algorithm with a detailed example to build deep intuition.
Given:
Relation R(A, B, C, D) with FDs:
Decomposition:
| Relation | Attributes | Projected FDs from F |
|---|---|---|
| R₁ | {A, B} | A → B (fully contained) |
| R₂ | {B, C} | B → C (fully contained) |
| R₃ | {C, D} | C → D (fully contained) |
Step 2: Test Each FD in F for Preservation
Testing A → B:
Testing B → C:
Testing C → D:
Testing A → C:
Conclusion: All FDs are preserved! This decomposition is dependency-preserving.
Notice how A → C was preserved even though it doesn't explicitly appear in any single relation. It was preserved through transitivity: A → B (in R₁) and B → C (in R₂) combine to give A → C. The algorithm correctly detected this derived preservation.
Now let's examine a decomposition that fails the preservation test.
Given:
Relation R(A, B, C) with FDs:
Decomposition:
Identify Projected Dependencies:
| Relation | Attributes | Projected FDs |
|---|---|---|
| R₁ | {A, C} | A → C (fully contained) |
| R₂ | {B, C} | B → C (fully contained) |
Note: A → B is NOT projected onto either relation because:
Testing A → B:
A → B is NOT preserved!
This decomposition fails the preservation test. To enforce A → B, we would need to join R₁ and R₂ on C, then check the constraint—an expensive operation. This decomposition should be rejected or redesigned.
The Fix:
To preserve A → B, we need a relation containing both A and B. Options:
Option 1 maintains normalization while restoring preservation. The new decomposition:
Now A → B is directly enforceable in R₃.
For production use, the basic algorithm can be significantly optimized. Here are key strategies employed in database design tools.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293
from typing import Set, List, Tuple, Dict, Optionalfrom dataclasses import dataclass, field @dataclassclass FD: lhs: frozenset # Determinant (left-hand side) rhs: frozenset # Dependent (right-hand side) def is_contained_in(self, attrs: Set[str]) -> bool: """Check if FD is fully contained in attribute set.""" return self.lhs.issubset(attrs) and self.rhs.issubset(attrs) @dataclass class OptimizedPreservationChecker: """ Optimized dependency preservation checker with pre-computed indexes. """ fds: List[FD] decomposition: List[Set[str]] # Pre-computed indexes fd_by_relation: Dict[int, List[FD]] = field(default_factory=dict) trivially_preserved: Set[int] = field(default_factory=set) def __post_init__(self): self._build_indexes() def _build_indexes(self): """Pre-compute FD-to-relation mappings.""" for i, rel_attrs in enumerate(self.decomposition): self.fd_by_relation[i] = [] for j, fd in enumerate(self.fds): if fd.is_contained_in(rel_attrs): self.fd_by_relation[i].append(fd) self.trivially_preserved.add(j) def check_all(self) -> Tuple[bool, Optional[FD]]: """ Check if all FDs are preserved. Returns (True, None) if preserved, (False, failing_fd) otherwise. """ for i, fd in enumerate(self.fds): # Skip trivially preserved (already in some relation) if i in self.trivially_preserved: continue if not self._is_preserved(fd): return (False, fd) return (True, None) def _is_preserved(self, target_fd: FD) -> bool: """Test if a single FD is preserved using projected closure.""" closure = set(target_fd.lhs) changed = True while changed: changed = False for rel_idx, projected_fds in self.fd_by_relation.items(): for fd in projected_fds: if fd.lhs.issubset(closure) and not fd.rhs.issubset(closure): closure.update(fd.rhs) changed = True # Early exit if we've found target if target_fd.rhs.issubset(closure): return True return target_fd.rhs.issubset(closure) # Example usagedef example(): fds = [ FD(frozenset(['A']), frozenset(['B'])), FD(frozenset(['B']), frozenset(['C'])), FD(frozenset(['C']), frozenset(['D'])), ] decomposition = [ {'A', 'B'}, {'B', 'C'}, {'C', 'D'}, ] checker = OptimizedPreservationChecker(fds, decomposition) preserved, failing = checker.check_all() if preserved: print("✓ All dependencies preserved!") else: print(f"✗ Not preserved: {failing.lhs} → {failing.rhs}")During schema design, you often have multiple valid decomposition options. The preservation test helps evaluate which options preserve all dependencies—allowing you to make informed trade-off decisions.
Scenario:
Relation: Student(SID, SName, Major, Advisor, Department)
FDs:
Two decomposition options for BCNF:
| FD | Option A Preserves? | Option B Preserves? |
|---|---|---|
| FD1: SID → SName, Major, Advisor, Department | Derived (SID→...→Dept via R₁+R₂) | Derived (multiple paths) |
| FD2: Advisor → Department | ✓ Direct in R₂ | ✓ Direct in R₂ |
| FD3: Major → Department | ✗ NOT preserved | ✓ Direct in R₃ |
In Option A, testing Major → Department: closure({Major}) = {Major} (no applicable projected FDs). Department is not reachable. FD3 is lost. To enforce that each major belongs to exactly one department, we'd need to join R₁ and R₂, then verify the constraint.
Recommendation: Option B adds a small extra relation but preserves all dependencies. The overhead of maintaining R₃(Major, Department) is minimal compared to the cost of enforcing FD3 through joins on every student insert/update.
Decision Framework:
The preservation test algorithm is robust, but implementation requires attention to subtle edge cases.
1234567891011121314151617181920212223242526272829303132333435363738394041424344
def is_fd_preserved_robust(X: Set, Y: Set, F: List, decomposition: List) -> bool: """ Robust preservation test with edge case handling. """ # Edge case 1: Trivial FD (Y ⊆ X) if Y.issubset(X): return True # Always preserved # Edge case 2: Empty determinant if not X: # FD {} → Y means Y must be the same in all tuples # These require special handling - typically preserved if Y is single-valued return True # Assuming normalized schema # Edge case 3: Decomposition has empty components valid_relations = [r for r in decomposition if len(r) >= 2] # Standard closure computation closure = set(X) changed = True iterations = 0 max_iterations = len(F) * len(decomposition) + 10 # Safety bound while changed and iterations < max_iterations: changed = False iterations += 1 for rel_attrs in valid_relations: for fd_lhs, fd_rhs in F: # Edge case 4: Handle multi-attribute RHS if not isinstance(fd_rhs, set): fd_rhs = {fd_rhs} if fd_lhs.issubset(rel_attrs) and fd_rhs.issubset(rel_attrs): if fd_lhs.issubset(closure) and not fd_rhs.issubset(closure): closure.update(fd_rhs) changed = True # Edge case 5: Ensure termination was clean if iterations >= max_iterations: raise RuntimeError("Closure computation did not converge - check for cycles") # Full RHS must be in closure, not just partial return Y.issubset(closure)We've developed a complete algorithmic toolkit for testing dependency preservation. Let's consolidate the key concepts:
You now have the algorithmic tools to rigorously verify dependency preservation in any decomposition. In the next page, we'll explore the trade-offs between dependency preservation and other normalization goals—particularly when achieving BCNF might require sacrificing some dependencies.