Database Management SystemsDependency Preserving Decomposition

Dependency Preserving Decomposition

LevelIntermediate

Duration60 mins

TopicDependency Preserving Decomposition

2 / 5

Testing Preservation

The Verification Challenge

You've designed a decomposition that looks elegant on paper. The relations are normalized, the schema is clean, and you're confident it will perform well. But how do you know that all your functional dependencies are preserved? Intuition isn't enough when data integrity is at stake.

Testing dependency preservation might seem straightforward at first—just check if each FD appears in some projected dependency set. But the reality is more nuanced: dependencies can be preserved through derivation, not just explicit membership. This means we need algorithms that can detect both direct and transitive preservation.

What You Will Learn

This page covers the complete algorithmic toolkit for testing dependency preservation. You'll learn the naive approach (and why it fails at scale), the efficient polynomial-time algorithm, step-by-step worked examples, and implementation strategies. By the end, you'll be able to rigorously verify any decomposition.

The Naive Approach (And Why It Fails)

The most direct way to test dependency preservation follows directly from the definition:

Naive Algorithm:

Compute F⁺ (closure of original FDs)
For each relation Rᵢ, compute πRᵢ(F⁺) (project closure onto relation's attributes)
Compute G = ∪ πRᵢ(F⁺) (union of all projections)
Compute G⁺ (closure of the union)
Check if G⁺ = F⁺

This approach is correct but has a fatal flaw: the closure F⁺ can be exponentially large in the number of attributes.

Exponential Explosion

For a relation with n attributes, F⁺ can contain O(2ⁿ) functional dependencies. For a relation with just 20 attributes, this means potentially millions of FDs. For 30 attributes, billions. The naive algorithm is computationally infeasible for real-world schemas.

Naive Algorithm Complexity Growth
Attributes (n)	Potential \|F⁺\|	Practical Feasibility
5	~32	✓ Trivial
10	~1,024	✓ Reasonable
15	~32,768	⚠ Slow
20	~1,048,576	⚠ Very Slow
25	~33,554,432	✗ Impractical
30	~1,073,741,824	✗ Impossible

The insight:

We don't actually need to compute F⁺ or even the full projected closures. We can test preservation one FD at a time using a much more efficient approach that leverages attribute closure computation—an operation that runs in polynomial time.

The Efficient Preservation Test Algorithm

The key insight is that we can test preservation per-FD rather than computing entire closures. For each FD X → Y in F, we check if X → Y is in the closure of the projected dependencies (without computing that closure explicitly).

Core Principle:

An FD X → Y is preserved if Y ⊆ X⁺G where X⁺G is the closure of X computed using only the projected dependencies G = ∪ πRᵢ(F).

The Efficient Algorithm:

efficient_preservation_test.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
def test_dependency_preservation(F, decomposition):
    """
    Efficiently test if all FDs in F are preserved by the decomposition.
    
    Time Complexity: O(|F| × n² × k) where n = #attributes, k = #relations
    This is polynomial, not exponential!
    
    Parameters:
    - F: Set of functional dependencies {(lhs_set, rhs_set), ...}
    - decomposition: List of attribute sets [R1_attrs, R2_attrs, ...]
    
    Returns:
    - (True, None) if all preserved
    - (False, violating_fd) if some FD is not preserved
    """
    for fd in F:
        lhs, rhs = fd  # X → Y
        if not is_fd_preserved(lhs, rhs, F, decomposition):
            return (False, fd)
    return (True, None)
 
 
def is_fd_preserved(X, Y, F, decomposition):
    """
    Test if a single FD X → Y is preserved in the decomposition.
    Uses modified attribute closure that only considers projected FDs.
    """
    # Compute X+ using only projected dependencies
    closure = compute_closure_with_projection(X, F, decomposition)
    
    # X → Y is preserved iff Y ⊆ closure
    return Y.issubset(closure)
 
 
def compute_closure_with_projection(X, F, decomposition):
    """
    Compute attribute closure of X, but restrict to FDs that are
    fully contained within some Ri (the projected dependencies).
    
    This is the key innovation: we compute closure iteratively,
    only adding attributes when the FD is actually projectable.
    """
    closure = set(X)  # Start with X
    changed = True
    
    while changed:
        changed = False
        for relation_attrs in decomposition:
            # Consider only FDs that project onto this relation
            # For each FD in F: X' → Y' where X' ⊆ Ri and Y' ⊆ Ri
            for fd_lhs, fd_rhs in F:
                # Check if this FD is fully contained in this relation
                if fd_lhs.issubset(relation_attrs) and fd_rhs.issubset(relation_attrs):
                    # If we have the LHS in our closure, add the RHS
                    if fd_lhs.issubset(closure) and not fd_rhs.issubset(closure):
                        closure.update(fd_rhs)
                        changed = True
    
    return closure

Why This Works

The algorithm works because attribute closure computation is equivalent to testing membership in F⁺. By modifying the closure computation to only use projected FDs, we effectively test membership in (∪πRᵢ(F))⁺ without computing that set explicitly. The polynomial time bound comes from the fact that each closure computation takes O(n² × |F|) time.

Step-by-Step Algorithm Walkthrough

Let's walk through the algorithm with a detailed example to build deep intuition.

Given:

Relation R(A, B, C, D) with FDs:

F = {A → B, B → C, C → D, A → C}

Decomposition:

R₁(A, B)
R₂(B, C)
R₃(C, D)

Step 1: Identify Projected Dependencies
Relation	Attributes	Projected FDs from F
R₁	{A, B}	A → B (fully contained)
R₂	{B, C}	B → C (fully contained)
R₃	{C, D}	C → D (fully contained)

Step 2: Test Each FD in F for Preservation

Testing A → B:

Compute closure of {A} using projected FDs only
Start: closure = {A}
Check R₁: FD A → B; A ⊆ {A}, so add B. closure = {A, B}
Check R₂: FD B → C; B ⊆ {A, B}, so add C. closure = {A, B, C}
Check R₃: FD C → D; C ⊆ {A, B, C}, so add D. closure = {A, B, C, D}
Is {B} ⊆ {A, B, C, D}? Yes! A → B is preserved. ✓

Testing B → C:

Compute closure of {B} using projected FDs
Start: closure = {B}
Check R₁: FD A → B; A ⊄ {B}, skip
Check R₂: FD B → C; B ⊆ {B}, add C. closure = {B, C}
Check R₃: FD C → D; C ⊆ {B, C}, add D. closure = {B, C, D}
Is {C} ⊆ {B, C, D}? Yes! B → C is preserved. ✓

Testing C → D:

Compute closure of {C} using projected FDs
Start: closure = {C}
Check R₁: FD A → B; A ⊄ {C}, skip
Check R₂: FD B → C; B ⊄ {C}, skip
Check R₃: FD C → D; C ⊆ {C}, add D. closure = {C, D}
Is {D} ⊆ {C, D}? Yes! C → D is preserved. ✓

Testing A → C:

Compute closure of {A} using projected FDs
Start: closure = {A}
Check R₁: FD A → B; A ⊆ {A}, add B. closure = {A, B}
Check R₂: FD B → C; B ⊆ {A, B}, add C. closure = {A, B, C}
Is {C} ⊆ {A, B, C}? Yes! A → C is preserved. ✓

Conclusion: All FDs are preserved! This decomposition is dependency-preserving.

Verification Complete

Notice how A → C was preserved even though it doesn't explicitly appear in any single relation. It was preserved through transitivity: A → B (in R₁) and B → C (in R₂) combine to give A → C. The algorithm correctly detected this derived preservation.

Example: Detecting Non-Preservation

Now let's examine a decomposition that fails the preservation test.

Given:

Relation R(A, B, C) with FDs:

F = {A → B, B → C, A → C}

Decomposition:

R₁(A, C)
R₂(B, C)

Identify Projected Dependencies:

Relation	Attributes	Projected FDs
R₁	{A, C}	A → C (fully contained)
R₂	{B, C}	B → C (fully contained)

Note: A → B is NOT projected onto either relation because:

In R₁: B is not present
In R₂: A is not present

Testing A → B:

Compute closure of {A} using projected FDs only
Start: closure = {A}
Check R₁: FD A → C; A ⊆ {A}, add C. closure = {A, C}
Check R₂: FD B → C; B ⊄ {A, C}, skip
No more changes. Final closure = {A, C}
Is {B} ⊆ {A, C}? NO! B is not in the closure.

A → B is NOT preserved!

Preservation Failure Detected

This decomposition fails the preservation test. To enforce A → B, we would need to join R₁ and R₂ on C, then check the constraint—an expensive operation. This decomposition should be rejected or redesigned.

The Fix:

To preserve A → B, we need a relation containing both A and B. Options:

Add a new relation: R₃(A, B)
Modify an existing relation: Change R₁(A, C) to R₁(A, B, C)
Redesign completely: Use different decomposition boundaries

Option 1 maintains normalization while restoring preservation. The new decomposition:

R₁(A, C)
R₂(B, C)
R₃(A, B)

Now A → B is directly enforceable in R₃.

Optimized Implementation Strategies

For production use, the basic algorithm can be significantly optimized. Here are key strategies employed in database design tools.

Optimization Techniques

•Pre-filter FDs — Before testing, identify which FDs are trivially preserved (fully contained in one relation). Only test the remaining FDs.
•Index FDs by determinant — Store FDs in a hash map keyed by determinant attributes. O(1) lookup when checking applicability.
•Relation-FD mapping — Pre-compute which FDs project onto each relation. Avoid re-checking containment in the inner loop.
•Early termination — Stop as soon as any FD fails preservation (if only checking preservability).
•Incremental closure — When checking multiple FDs, cache partial closure results for shared prefixes.
•Bit-vector attributes — Represent attribute sets as bit vectors for O(1) subset operations.

optimized_preservation.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
from typing import Set, List, Tuple, Dict, Optional
from dataclasses import dataclass, field
 
@dataclass
class FD:
    lhs: frozenset  # Determinant (left-hand side)
    rhs: frozenset  # Dependent (right-hand side)
    
    def is_contained_in(self, attrs: Set[str]) -> bool:
        """Check if FD is fully contained in attribute set."""
        return self.lhs.issubset(attrs) and self.rhs.issubset(attrs)
 
 
@dataclass  
class OptimizedPreservationChecker:
    """
    Optimized dependency preservation checker with pre-computed indexes.
    """
    fds: List[FD]
    decomposition: List[Set[str]]
    
    # Pre-computed indexes
    fd_by_relation: Dict[int, List[FD]] = field(default_factory=dict)
    trivially_preserved: Set[int] = field(default_factory=set)
    
    def __post_init__(self):
        self._build_indexes()
    
    def _build_indexes(self):
        """Pre-compute FD-to-relation mappings."""
        for i, rel_attrs in enumerate(self.decomposition):
            self.fd_by_relation[i] = []
            for j, fd in enumerate(self.fds):
                if fd.is_contained_in(rel_attrs):
                    self.fd_by_relation[i].append(fd)
                    self.trivially_preserved.add(j)
    
    def check_all(self) -> Tuple[bool, Optional[FD]]:
        """
        Check if all FDs are preserved.
        Returns (True, None) if preserved, (False, failing_fd) otherwise.
        """
        for i, fd in enumerate(self.fds):
            # Skip trivially preserved (already in some relation)
            if i in self.trivially_preserved:
                continue
            
            if not self._is_preserved(fd):
                return (False, fd)
        
        return (True, None)
    
    def _is_preserved(self, target_fd: FD) -> bool:
        """Test if a single FD is preserved using projected closure."""
        closure = set(target_fd.lhs)
        changed = True
        
        while changed:
            changed = False
            for rel_idx, projected_fds in self.fd_by_relation.items():
                for fd in projected_fds:
                    if fd.lhs.issubset(closure) and not fd.rhs.issubset(closure):
                        closure.update(fd.rhs)
                        changed = True
                        
                        # Early exit if we've found target
                        if target_fd.rhs.issubset(closure):
                            return True
        
        return target_fd.rhs.issubset(closure)
 
 
# Example usage
def example():
    fds = [
        FD(frozenset(['A']), frozenset(['B'])),
        FD(frozenset(['B']), frozenset(['C'])),
        FD(frozenset(['C']), frozenset(['D'])),
    ]
    
    decomposition = [
        {'A', 'B'},
        {'B', 'C'},
        {'C', 'D'},
    ]
    
    checker = OptimizedPreservationChecker(fds, decomposition)
    preserved, failing = checker.check_all()
    
    if preserved:
        print("✓ All dependencies preserved!")
    else:
        print(f"✗ Not preserved: {failing.lhs} → {failing.rhs}")

Comparing Multiple Decomposition Options

During schema design, you often have multiple valid decomposition options. The preservation test helps evaluate which options preserve all dependencies—allowing you to make informed trade-off decisions.

Scenario:

Relation: Student(SID, SName, Major, Advisor, Department)

FDs:

FD1: SID → SName, Major, Advisor, Department
FD2: Advisor → Department
FD3: Major → Department

Two decomposition options for BCNF:

Option A

•R₁(SID, SName, Major, Advisor)
•R₂(Advisor, Department)

Option B

•R₁(SID, SName, Major, Advisor)
•R₂(Advisor, Department)
•R₃(Major, Department)

Preservation Analysis Comparison
FD	Option A Preserves?	Option B Preserves?
FD1: SID → SName, Major, Advisor, Department	Derived (SID→...→Dept via R₁+R₂)	Derived (multiple paths)
FD2: Advisor → Department	✓ Direct in R₂	✓ Direct in R₂
FD3: Major → Department	✗ NOT preserved	✓ Direct in R₃

Option A Loses FD3

In Option A, testing Major → Department: closure({Major}) = {Major} (no applicable projected FDs). Department is not reachable. FD3 is lost. To enforce that each major belongs to exactly one department, we'd need to join R₁ and R₂, then verify the constraint.

Recommendation: Option B adds a small extra relation but preserves all dependencies. The overhead of maintaining R₃(Major, Department) is minimal compared to the cost of enforcing FD3 through joins on every student insert/update.

Decision Framework:

Run preservation test on all candidate decompositions
Prefer decompositions that preserve all FDs
If no option preserves all FDs, evaluate: which FDs are most critical?
Consider adding small "preservation-restoring" relations for critical FDs

Edge Cases and Common Pitfalls

The preservation test algorithm is robust, but implementation requires attention to subtle edge cases.

Common Pitfalls to Avoid

•Trivial FDs — FDs like A → A are always preserved but may pollute test results. Filter them early.
•Redundant FDs in F — If F contains redundant FDs, ensure your test handles them (they should be preserved if f is preserved).
•Multi-attribute RHS — For FD X → YZ, ensure you test that BOTH Y and Z are in the closure, not just one.
•Empty decomposition components — Handle edge cases where a relation has very few attributes.
•Circular FD chains — FDs like A → B, B → C, C → A should terminate correctly (closure reaches fixed point).
•Very large attribute sets — Use bit vectors instead of sets for schemas with 64+ attributes.

edge_case_handling.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
def is_fd_preserved_robust(X: Set, Y: Set, F: List, decomposition: List) -> bool:
    """
    Robust preservation test with edge case handling.
    """
    # Edge case 1: Trivial FD (Y ⊆ X)
    if Y.issubset(X):
        return True  # Always preserved
    
    # Edge case 2: Empty determinant
    if not X:
        # FD {} → Y means Y must be the same in all tuples
        # These require special handling - typically preserved if Y is single-valued
        return True  # Assuming normalized schema
    
    # Edge case 3: Decomposition has empty components
    valid_relations = [r for r in decomposition if len(r) >= 2]
    
    # Standard closure computation
    closure = set(X)
    changed = True
    iterations = 0
    max_iterations = len(F) * len(decomposition) + 10  # Safety bound
    
    while changed and iterations < max_iterations:
        changed = False
        iterations += 1
        
        for rel_attrs in valid_relations:
            for fd_lhs, fd_rhs in F:
                # Edge case 4: Handle multi-attribute RHS
                if not isinstance(fd_rhs, set):
                    fd_rhs = {fd_rhs}
                
                if fd_lhs.issubset(rel_attrs) and fd_rhs.issubset(rel_attrs):
                    if fd_lhs.issubset(closure) and not fd_rhs.issubset(closure):
                        closure.update(fd_rhs)
                        changed = True
    
    # Edge case 5: Ensure termination was clean
    if iterations >= max_iterations:
        raise RuntimeError("Closure computation did not converge - check for cycles")
    
    # Full RHS must be in closure, not just partial
    return Y.issubset(closure)

Summary: Testing Preservation

We've developed a complete algorithmic toolkit for testing dependency preservation. Let's consolidate the key concepts:

Key Takeaways

•Naive approach is exponential — Computing F⁺ explicitly is infeasible for real schemas.
•Efficient algorithm is polynomial — Test per-FD using modified attribute closure computation.
•Core insight: Y ⊆ X⁺G tests if X → Y is in G⁺ without computing G⁺.
•Pre-computation accelerates testing — Index FDs by relation containment for faster inner loops.
•Use the test to compare options — When multiple decompositions exist, preservation testing guides selection.
•Handle edge cases carefully — Trivial FDs, multi-attribute RHS, and empty relations need attention.

Page Complete

You now have the algorithmic tools to rigorously verify dependency preservation in any decomposition. In the next page, we'll explore the trade-offs between dependency preservation and other normalization goals—particularly when achieving BCNF might require sacrificing some dependencies.

2 / 5

Loading learning content...

Database Management SystemsDependency Preserving Decomposition

Dependency Preserving Decomposition

LevelIntermediate

Duration60 mins

TopicDependency Preserving Decomposition

2 / 5

Testing Preservation

The Verification Challenge

What You Will Learn

The Naive Approach (And Why It Fails)

The most direct way to test dependency preservation follows directly from the definition:

Naive Algorithm:

Compute F⁺ (closure of original FDs)
For each relation Rᵢ, compute πRᵢ(F⁺) (project closure onto relation's attributes)
Compute G = ∪ πRᵢ(F⁺) (union of all projections)
Compute G⁺ (closure of the union)
Check if G⁺ = F⁺

This approach is correct but has a fatal flaw: the closure F⁺ can be exponentially large in the number of attributes.

Exponential Explosion

Naive Algorithm Complexity Growth
Attributes (n)	Potential \|F⁺\|	Practical Feasibility
5	~32	✓ Trivial
10	~1,024	✓ Reasonable
15	~32,768	⚠ Slow
20	~1,048,576	⚠ Very Slow
25	~33,554,432	✗ Impractical
30	~1,073,741,824	✗ Impossible

The insight:

The Efficient Preservation Test Algorithm

Core Principle:

An FD X → Y is preserved if Y ⊆ X⁺G where X⁺G is the closure of X computed using only the projected dependencies G = ∪ πRᵢ(F).

The Efficient Algorithm:

efficient_preservation_test.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
def test_dependency_preservation(F, decomposition):
    """
    Efficiently test if all FDs in F are preserved by the decomposition.
    
    Time Complexity: O(|F| × n² × k) where n = #attributes, k = #relations
    This is polynomial, not exponential!
    
    Parameters:
    - F: Set of functional dependencies {(lhs_set, rhs_set), ...}
    - decomposition: List of attribute sets [R1_attrs, R2_attrs, ...]
    
    Returns:
    - (True, None) if all preserved
    - (False, violating_fd) if some FD is not preserved
    """
    for fd in F:
        lhs, rhs = fd  # X → Y
        if not is_fd_preserved(lhs, rhs, F, decomposition):
            return (False, fd)
    return (True, None)
 
 
def is_fd_preserved(X, Y, F, decomposition):
    """
    Test if a single FD X → Y is preserved in the decomposition.
    Uses modified attribute closure that only considers projected FDs.
    """
    # Compute X+ using only projected dependencies
    closure = compute_closure_with_projection(X, F, decomposition)
    
    # X → Y is preserved iff Y ⊆ closure
    return Y.issubset(closure)
 
 
def compute_closure_with_projection(X, F, decomposition):
    """
    Compute attribute closure of X, but restrict to FDs that are
    fully contained within some Ri (the projected dependencies).
    
    This is the key innovation: we compute closure iteratively,
    only adding attributes when the FD is actually projectable.
    """
    closure = set(X)  # Start with X
    changed = True
    
    while changed:
        changed = False
        for relation_attrs in decomposition:
            # Consider only FDs that project onto this relation
            # For each FD in F: X' → Y' where X' ⊆ Ri and Y' ⊆ Ri
            for fd_lhs, fd_rhs in F:
                # Check if this FD is fully contained in this relation
                if fd_lhs.issubset(relation_attrs) and fd_rhs.issubset(relation_attrs):
                    # If we have the LHS in our closure, add the RHS
                    if fd_lhs.issubset(closure) and not fd_rhs.issubset(closure):
                        closure.update(fd_rhs)
                        changed = True
    
    return closure

Why This Works

Step-by-Step Algorithm Walkthrough

Let's walk through the algorithm with a detailed example to build deep intuition.

Given:

Relation R(A, B, C, D) with FDs:

F = {A → B, B → C, C → D, A → C}

Decomposition:

R₁(A, B)
R₂(B, C)
R₃(C, D)

Step 1: Identify Projected Dependencies
Relation	Attributes	Projected FDs from F
R₁	{A, B}	A → B (fully contained)
R₂	{B, C}	B → C (fully contained)
R₃	{C, D}	C → D (fully contained)

Step 2: Test Each FD in F for Preservation

Testing A → B:

Compute closure of {A} using projected FDs only
Start: closure = {A}
Check R₁: FD A → B; A ⊆ {A}, so add B. closure = {A, B}
Check R₂: FD B → C; B ⊆ {A, B}, so add C. closure = {A, B, C}
Check R₃: FD C → D; C ⊆ {A, B, C}, so add D. closure = {A, B, C, D}
Is {B} ⊆ {A, B, C, D}? Yes! A → B is preserved. ✓

Testing B → C:

Compute closure of {B} using projected FDs
Start: closure = {B}
Check R₁: FD A → B; A ⊄ {B}, skip
Check R₂: FD B → C; B ⊆ {B}, add C. closure = {B, C}
Check R₃: FD C → D; C ⊆ {B, C}, add D. closure = {B, C, D}
Is {C} ⊆ {B, C, D}? Yes! B → C is preserved. ✓

Testing C → D:

Compute closure of {C} using projected FDs
Start: closure = {C}
Check R₁: FD A → B; A ⊄ {C}, skip
Check R₂: FD B → C; B ⊄ {C}, skip
Check R₃: FD C → D; C ⊆ {C}, add D. closure = {C, D}
Is {D} ⊆ {C, D}? Yes! C → D is preserved. ✓

Testing A → C:

Compute closure of {A} using projected FDs
Start: closure = {A}
Check R₁: FD A → B; A ⊆ {A}, add B. closure = {A, B}
Check R₂: FD B → C; B ⊆ {A, B}, add C. closure = {A, B, C}
Is {C} ⊆ {A, B, C}? Yes! A → C is preserved. ✓

Conclusion: All FDs are preserved! This decomposition is dependency-preserving.

Verification Complete

Example: Detecting Non-Preservation

Now let's examine a decomposition that fails the preservation test.

Given:

Relation R(A, B, C) with FDs:

F = {A → B, B → C, A → C}

Decomposition:

R₁(A, C)
R₂(B, C)

Identify Projected Dependencies:

Relation	Attributes	Projected FDs
R₁	{A, C}	A → C (fully contained)
R₂	{B, C}	B → C (fully contained)

Note: A → B is NOT projected onto either relation because:

In R₁: B is not present
In R₂: A is not present

Testing A → B:

Compute closure of {A} using projected FDs only
Start: closure = {A}
Check R₁: FD A → C; A ⊆ {A}, add C. closure = {A, C}
Check R₂: FD B → C; B ⊄ {A, C}, skip
No more changes. Final closure = {A, C}
Is {B} ⊆ {A, C}? NO! B is not in the closure.

A → B is NOT preserved!

Preservation Failure Detected

The Fix:

To preserve A → B, we need a relation containing both A and B. Options:

Add a new relation: R₃(A, B)
Modify an existing relation: Change R₁(A, C) to R₁(A, B, C)
Redesign completely: Use different decomposition boundaries

Option 1 maintains normalization while restoring preservation. The new decomposition:

R₁(A, C)
R₂(B, C)
R₃(A, B)

Now A → B is directly enforceable in R₃.

Optimized Implementation Strategies

For production use, the basic algorithm can be significantly optimized. Here are key strategies employed in database design tools.

Optimization Techniques

•Pre-filter FDs — Before testing, identify which FDs are trivially preserved (fully contained in one relation). Only test the remaining FDs.
•Index FDs by determinant — Store FDs in a hash map keyed by determinant attributes. O(1) lookup when checking applicability.
•Relation-FD mapping — Pre-compute which FDs project onto each relation. Avoid re-checking containment in the inner loop.
•Early termination — Stop as soon as any FD fails preservation (if only checking preservability).
•Incremental closure — When checking multiple FDs, cache partial closure results for shared prefixes.
•Bit-vector attributes — Represent attribute sets as bit vectors for O(1) subset operations.

optimized_preservation.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
from typing import Set, List, Tuple, Dict, Optional
from dataclasses import dataclass, field
 
@dataclass
class FD:
    lhs: frozenset  # Determinant (left-hand side)
    rhs: frozenset  # Dependent (right-hand side)
    
    def is_contained_in(self, attrs: Set[str]) -> bool:
        """Check if FD is fully contained in attribute set."""
        return self.lhs.issubset(attrs) and self.rhs.issubset(attrs)
 
 
@dataclass  
class OptimizedPreservationChecker:
    """
    Optimized dependency preservation checker with pre-computed indexes.
    """
    fds: List[FD]
    decomposition: List[Set[str]]
    
    # Pre-computed indexes
    fd_by_relation: Dict[int, List[FD]] = field(default_factory=dict)
    trivially_preserved: Set[int] = field(default_factory=set)
    
    def __post_init__(self):
        self._build_indexes()
    
    def _build_indexes(self):
        """Pre-compute FD-to-relation mappings."""
        for i, rel_attrs in enumerate(self.decomposition):
            self.fd_by_relation[i] = []
            for j, fd in enumerate(self.fds):
                if fd.is_contained_in(rel_attrs):
                    self.fd_by_relation[i].append(fd)
                    self.trivially_preserved.add(j)
    
    def check_all(self) -> Tuple[bool, Optional[FD]]:
        """
        Check if all FDs are preserved.
        Returns (True, None) if preserved, (False, failing_fd) otherwise.
        """
        for i, fd in enumerate(self.fds):
            # Skip trivially preserved (already in some relation)
            if i in self.trivially_preserved:
                continue
            
            if not self._is_preserved(fd):
                return (False, fd)
        
        return (True, None)
    
    def _is_preserved(self, target_fd: FD) -> bool:
        """Test if a single FD is preserved using projected closure."""
        closure = set(target_fd.lhs)
        changed = True
        
        while changed:
            changed = False
            for rel_idx, projected_fds in self.fd_by_relation.items():
                for fd in projected_fds:
                    if fd.lhs.issubset(closure) and not fd.rhs.issubset(closure):
                        closure.update(fd.rhs)
                        changed = True
                        
                        # Early exit if we've found target
                        if target_fd.rhs.issubset(closure):
                            return True
        
        return target_fd.rhs.issubset(closure)
 
 
# Example usage
def example():
    fds = [
        FD(frozenset(['A']), frozenset(['B'])),
        FD(frozenset(['B']), frozenset(['C'])),
        FD(frozenset(['C']), frozenset(['D'])),
    ]
    
    decomposition = [
        {'A', 'B'},
        {'B', 'C'},
        {'C', 'D'},
    ]
    
    checker = OptimizedPreservationChecker(fds, decomposition)
    preserved, failing = checker.check_all()
    
    if preserved:
        print("✓ All dependencies preserved!")
    else:
        print(f"✗ Not preserved: {failing.lhs} → {failing.rhs}")

Comparing Multiple Decomposition Options

Scenario:

Relation: Student(SID, SName, Major, Advisor, Department)

FDs:

FD1: SID → SName, Major, Advisor, Department
FD2: Advisor → Department
FD3: Major → Department

Two decomposition options for BCNF:

Option A

•R₁(SID, SName, Major, Advisor)
•R₂(Advisor, Department)

Option B

•R₁(SID, SName, Major, Advisor)
•R₂(Advisor, Department)
•R₃(Major, Department)

Preservation Analysis Comparison
FD	Option A Preserves?	Option B Preserves?
FD1: SID → SName, Major, Advisor, Department	Derived (SID→...→Dept via R₁+R₂)	Derived (multiple paths)
FD2: Advisor → Department	✓ Direct in R₂	✓ Direct in R₂
FD3: Major → Department	✗ NOT preserved	✓ Direct in R₃

Option A Loses FD3

Decision Framework:

Run preservation test on all candidate decompositions
Prefer decompositions that preserve all FDs
If no option preserves all FDs, evaluate: which FDs are most critical?
Consider adding small "preservation-restoring" relations for critical FDs

Edge Cases and Common Pitfalls

The preservation test algorithm is robust, but implementation requires attention to subtle edge cases.

Common Pitfalls to Avoid

•Trivial FDs — FDs like A → A are always preserved but may pollute test results. Filter them early.
•Redundant FDs in F — If F contains redundant FDs, ensure your test handles them (they should be preserved if f is preserved).
•Multi-attribute RHS — For FD X → YZ, ensure you test that BOTH Y and Z are in the closure, not just one.
•Empty decomposition components — Handle edge cases where a relation has very few attributes.
•Circular FD chains — FDs like A → B, B → C, C → A should terminate correctly (closure reaches fixed point).
•Very large attribute sets — Use bit vectors instead of sets for schemas with 64+ attributes.

edge_case_handling.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
def is_fd_preserved_robust(X: Set, Y: Set, F: List, decomposition: List) -> bool:
    """
    Robust preservation test with edge case handling.
    """
    # Edge case 1: Trivial FD (Y ⊆ X)
    if Y.issubset(X):
        return True  # Always preserved
    
    # Edge case 2: Empty determinant
    if not X:
        # FD {} → Y means Y must be the same in all tuples
        # These require special handling - typically preserved if Y is single-valued
        return True  # Assuming normalized schema
    
    # Edge case 3: Decomposition has empty components
    valid_relations = [r for r in decomposition if len(r) >= 2]
    
    # Standard closure computation
    closure = set(X)
    changed = True
    iterations = 0
    max_iterations = len(F) * len(decomposition) + 10  # Safety bound
    
    while changed and iterations < max_iterations:
        changed = False
        iterations += 1
        
        for rel_attrs in valid_relations:
            for fd_lhs, fd_rhs in F:
                # Edge case 4: Handle multi-attribute RHS
                if not isinstance(fd_rhs, set):
                    fd_rhs = {fd_rhs}
                
                if fd_lhs.issubset(rel_attrs) and fd_rhs.issubset(rel_attrs):
                    if fd_lhs.issubset(closure) and not fd_rhs.issubset(closure):
                        closure.update(fd_rhs)
                        changed = True
    
    # Edge case 5: Ensure termination was clean
    if iterations >= max_iterations:
        raise RuntimeError("Closure computation did not converge - check for cycles")
    
    # Full RHS must be in closure, not just partial
    return Y.issubset(closure)

Summary: Testing Preservation

We've developed a complete algorithmic toolkit for testing dependency preservation. Let's consolidate the key concepts:

Key Takeaways

•Naive approach is exponential — Computing F⁺ explicitly is infeasible for real schemas.
•Efficient algorithm is polynomial — Test per-FD using modified attribute closure computation.
•Core insight: Y ⊆ X⁺G tests if X → Y is in G⁺ without computing G⁺.
•Pre-computation accelerates testing — Index FDs by relation containment for faster inner loops.
•Use the test to compare options — When multiple decompositions exist, preservation testing guides selection.
•Handle edge cases carefully — Trivial FDs, multi-attribute RHS, and empty relations need attention.

Page Complete

2 / 5