A database decomposition that loses data is useless. A decomposition where constraints can't be efficiently enforced is dangerous. The gold standard of database design achieves both lossless join and dependency preservation simultaneously—ensuring that you can reconstruct your data perfectly AND enforce all your business rules efficiently.
These two properties form the twin pillars of sound decomposition theory. Understanding how they interact—and how to verify that both are achieved—is essential for designing production-quality schemas.
This page explores the relationship between lossless join and dependency preservation, presents algorithms for verifying both properties, demonstrates how 3NF synthesis achieves both, and provides comprehensive verification strategies for your decompositions.
Before combining properties, let's ensure a solid understanding of lossless join decomposition.
Definition:
A decomposition of relation R into R₁, R₂, ..., Rₙ is lossless (or lossless-join) if for every valid instance r of R:
πR₁(r) ⨝ πR₂(r) ⨝ ... ⨝ πRₙ(r) = r
In words: projecting the data onto the decomposed relations and joining them back together yields exactly the original data—no tuples lost, no spurious tuples added.
The term 'lossless' refers specifically to the join reconstruction property. It does NOT mean that dependencies are preserved or that the schema is normalized. A decomposition can be lossless but still lose constraint information, and vice versa.
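This behavior can be seen directly on a tiny hypothetical instance (the data below is invented for illustration): projecting a lossy decomposition and joining the projections back together manufactures spurious tuples.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, as a set of sorted tuples."""
    return {tuple((a, row[a]) for a in sorted(attrs)) for row in rows}

def natural_join(r1, r2):
    """Natural join of two projections (sets of sorted (attr, value) tuples)."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            common = d1.keys() & d2.keys()
            # Tuples combine only when they agree on all shared attributes
            if all(d1[a] == d2[a] for a in common):
                out.add(tuple(sorted({**d1, **d2}.items())))
    return out

# R(A, B, C) with two tuples that share the same B value
r = [{'A': 1, 'B': 'x', 'C': 10}, {'A': 2, 'B': 'x', 'C': 20}]

# Decompose into R1(A, B) and R2(B, C) when B determines nothing: lossy
r1 = project(r, {'A', 'B'})
r2 = project(r, {'B', 'C'})
joined = natural_join(r1, r2)
original = {tuple(sorted(row.items())) for row in r}

print(len(joined))           # 4: two spurious tuples appeared
print(joined == original)    # False: the decomposition is lossy
```

Every original tuple survives the round trip, but two fabricated combinations join them, so the reconstruction is not equal to `r`.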
Testing for Lossless Join (Binary Decomposition):
For a decomposition of R into two relations R₁ and R₂, a simple test exists:
The decomposition is lossless if and only if at least one of the following dependencies is in F⁺ (the closure of the functional dependencies):
- (R₁ ∩ R₂) → R₁
- (R₁ ∩ R₂) → R₂
Intuition: The common attributes must functionally determine all attributes of at least one of the relations. This ensures no spurious tuples are created during the join.
| R₁ | R₂ | Common | FDs | Lossless? |
|---|---|---|---|---|
| {A, B, C} | {C, D, E} | {C} | C → D, E | Yes (C → R₂) |
| {A, B} | {B, C} | {B} | A → B | No (B determines nothing) |
| {A, B} | {B, C} | {B} | B → C | Yes (B → R₂) |
| {A, B, C} | {A, D} | {A} | A → B, C, D | Yes (A → both) |
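The binary test can be mechanized with a small attribute-closure helper (a sketch, not tied to any library; FDs are represented as `(lhs, rhs)` frozenset pairs):

```python
def closure(attrs, fds):
    """Compute X+ by repeatedly firing FDs whose LHS is inside the closure."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def binary_lossless(r1, r2, fds):
    """Lossless iff (R1 ∩ R2) → R1 or (R1 ∩ R2) → R2 is in F+."""
    c = closure(r1 & r2, fds)
    return r1 <= c or r2 <= c

# Rows of the table above:
fds = [(frozenset('C'), frozenset('DE'))]
print(binary_lossless(set('ABC'), set('CDE'), fds))   # True  (C → R2)

fds2 = [(frozenset('A'), frozenset('B'))]
print(binary_lossless(set('AB'), set('BC'), fds2))    # False (B determines nothing)
```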
A critical insight is that lossless join and dependency preservation are independent properties. Each can hold without the other, both can hold together, or neither can hold. Let's look at a concrete example:
Example: Both Lossless and Dependency-Preserving
R(A, B, C) with FDs: A → B, B → C
Decomposition: R₁(A, B), R₂(B, C)
Lossless Test: R₁ ∩ R₂ = {B}, and B → C gives B⁺ = {B, C} ⊇ R₂, so the common attribute determines all of R₂. Lossless. ✓
Preservation Test: A → B lies entirely within R₁, and B → C lies entirely within R₂, so every FD can be enforced within a single relation. ✓
Result: This decomposition achieves both properties. It's the gold standard.
Key Insight:
The independence of these properties means you must verify each separately. Achieving one does not guarantee the other. A well-designed decomposition process should explicitly target both.
The 3NF Synthesis Algorithm (also known as the 3NF Decomposition Algorithm or Bernstein's Algorithm) is a landmark result in database theory. It guarantees that the resulting decomposition is:
- In Third Normal Form (3NF)
- Lossless-join
- Dependency-preserving

All three properties are guaranteed—by construction!
Unlike BCNF decomposition (which may sacrifice preservation), 3NF synthesis ALWAYS achieves both lossless join and dependency preservation. This is why 3NF is often the practical target for production schemas.
The Algorithm:
Input: Relation R with attributes U and FDs F
Output: A set of relations {R₁, R₂, ..., Rₙ} in 3NF that is lossless-join and dependency-preserving
```python
def three_nf_synthesis(attributes: set, fds: list) -> list:
    """
    3NF Synthesis Algorithm (Bernstein's Algorithm)

    Guarantees:
    - Resulting decomposition is in 3NF
    - Decomposition is lossless-join
    - All dependencies are preserved

    Parameters:
    - attributes: Set of all attributes in the relation
    - fds: List of functional dependencies as (lhs, rhs) tuples

    Returns:
    - List of relation schemas (each schema is a set of attributes)
    """
    # Step 1: Compute canonical cover (minimal cover) of F
    # This removes redundant FDs and extraneous attributes
    canonical = compute_canonical_cover(fds)

    # Step 2: Create a relation for each FD in the canonical cover
    # For each X → Y, create relation with attributes X ∪ Y
    relations = []
    for lhs, rhs in canonical:
        relation_attrs = lhs.union(rhs)
        relations.append(relation_attrs)

    # Step 3: Merge relations with the same key (LHS)
    # If multiple FDs have the same determinant, combine them
    merged = merge_by_key(relations, canonical)

    # Step 4: Remove redundant relations
    # If one relation's attributes are a subset of another, remove it
    merged = remove_subsets(merged)

    # Step 5: Ensure lossless join by adding a key relation (if needed)
    # If no relation contains a candidate key of R, add one
    if not any_contains_candidate_key(merged, attributes, fds):
        candidate_key = find_candidate_key(attributes, fds)
        merged.append(candidate_key)

    return merged


def compute_canonical_cover(fds):
    """
    Compute the canonical (minimal) cover of a set of FDs.

    Steps:
    1. Replace X → YZ with X → Y and X → Z (single attribute RHS)
    2. Remove extraneous LHS attributes
    3. Remove extraneous RHS attributes
    4. Remove redundant FDs
    5. Recombine FDs with same LHS
    """
    # Implementation details...
    pass


def merge_by_key(relations, canonical):
    """Merge relations that have the same left-hand side (key)."""
    key_to_attrs = {}
    for lhs, rhs in canonical:
        lhs_frozen = frozenset(lhs)
        if lhs_frozen not in key_to_attrs:
            key_to_attrs[lhs_frozen] = set(lhs)
        key_to_attrs[lhs_frozen].update(rhs)
    return list(key_to_attrs.values())


def remove_subsets(relations):
    """Remove relations whose attributes are a subset of another."""
    result = []
    for r in relations:
        if not any(r < other for other in relations if r != other):
            result.append(r)
    return result


def any_contains_candidate_key(relations, all_attrs, fds):
    """Check if any relation contains a candidate key of the original R."""
    candidate_keys = find_all_candidate_keys(all_attrs, fds)
    for rel in relations:
        for key in candidate_keys:
            if key.issubset(rel):
                return True
    return False


def find_candidate_key(attributes, fds):
    """Find a candidate key using attribute closure."""
    # Start with all attributes, remove those that appear only on RHS
    # Then minimize
    pass  # Implementation details...
```

Let's walk through the complete 3NF synthesis algorithm with a detailed example.
Given:
Relation: R(A, B, C, D, E, F)
Functional Dependencies:
- A → BC
- C → DE
- E → F
- F → A
Step 1: Compute Canonical Cover
First, we decompose multi-attribute RHS into single attributes:
- A → B, A → C
- C → D, C → E
- E → F
- F → A

Check for extraneous attributes (none here). Check for redundant FDs (none). Recombine FDs that share a LHS:
Canonical Cover:
- A → BC
- C → DE
- E → F
- F → A
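The canonical-cover computation that the earlier code left as a stub can be sketched as follows. This is a minimal illustration for FDs given as (lhs-string, rhs-string) pairs, not a production-hardened implementation:

```python
def closure(attrs, fds):
    """X+ under FDs given as (lhs-set, rhs-set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    """Minimal cover: singleton RHSs, no extraneous LHS attrs, no redundant FDs."""
    # Step a: split into singleton right-hand sides
    cover = [(set(l), {a}) for l, r in fds for a in r]
    # Step b: drop extraneous LHS attributes (A is extraneous in X → Y
    # if Y still follows from X − {A})
    for i, (lhs, rhs) in enumerate(cover):
        for a in sorted(lhs):
            reduced = lhs - {a}
            if reduced and rhs <= closure(reduced, cover):
                cover[i] = (reduced, rhs)
                lhs = reduced
    # Step c: drop redundant FDs (X → Y is redundant if Y follows
    # from the remaining FDs alone)
    i = 0
    while i < len(cover):
        lhs, rhs = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if rhs <= closure(lhs, rest):
            cover = rest
        else:
            i += 1
    # Step d: recombine FDs that share a left-hand side
    merged = {}
    for lhs, rhs in cover:
        merged.setdefault(frozenset(lhs), set()).update(rhs)
    return [(set(l), r) for l, r in merged.items()]

cc = canonical_cover([('A', 'BC'), ('C', 'DE'), ('E', 'F'), ('F', 'A')])
for lhs, rhs in sorted(cc, key=lambda p: min(p[0])):
    print(''.join(sorted(lhs)), '->', ''.join(sorted(rhs)))
# A -> BC
# C -> DE
# E -> F
# F -> A
```

On this example every FD survives both elimination passes, so the canonical cover equals the input, matching the hand computation above.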
Step 2: Create Relations for Each FD
| FD | Relation Attributes |
|---|---|
| A → BC | R₁(A, B, C) |
| C → DE | R₂(C, D, E) |
| E → F | R₃(E, F) |
| F → A | R₄(F, A) |
Step 3: Merge Relations with Same Key
All FDs have different determinants, so no merging needed.
Step 4: Remove Redundant Relations
No relation is a subset of another, so all four remain.
Step 5: Ensure Lossless Join
Find candidate keys of original R: A⁺ = {A, B, C, D, E, F}, so {A} is a candidate key. Because F → A, E → F, and C → DE close a cycle, {C}, {E}, and {F} are candidate keys as well.
Check if any Rᵢ contains a candidate key: R₁(A, B, C) contains {A}. ✓
Lossless join is guaranteed—no additional relation needed.
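Step 5's key check can be confirmed with the same attribute-closure helper used throughout this page (a self-contained sketch with FDs as (lhs-set, rhs-set) pairs):

```python
def closure(attrs, fds):
    """Compute X+ by repeatedly firing FDs whose LHS is inside the closure."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(set('A'), set('BC')), (set('C'), set('DE')),
       (set('E'), set('F')), (set('F'), set('A'))]

print(sorted(closure({'A'}, fds)))   # ['A', 'B', 'C', 'D', 'E', 'F']: {A} is a key
print(sorted(closure({'B'}, fds)))   # ['B']: B alone determines nothing

# R1(A, B, C) contains the candidate key {A}, so no extra relation is needed
print({'A'} <= set('ABC'))           # True
```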
| Relation | Attributes | Key | Preserved FDs |
|---|---|---|---|
| R₁ | {A, B, C} | A | A → BC |
| R₂ | {C, D, E} | C | C → DE |
| R₃ | {E, F} | E | E → F |
| R₄ | {F, A} | F | F → A |
This decomposition is: (1) In 3NF—each relation has only key-based dependencies, (2) Lossless—R₁ contains candidate key A, (3) Dependency-preserving—every FD in F is fully contained in some Rᵢ.
For decompositions not generated by 3NF synthesis (e.g., BCNF decomposition or manual design), you must verify both properties independently.
```python
def verify_decomposition(original_attrs, fds, decomposition):
    """
    Verify that a decomposition is both lossless-join and
    dependency-preserving.

    Parameters:
    - original_attrs: Set of all attributes in original relation
    - fds: List of functional dependencies
    - decomposition: List of attribute sets (decomposed relations)

    Returns:
    - dict with 'lossless', 'preserving', and diagnostic information
    """
    result = {
        'lossless': False,
        'preserving': False,
        'lost_fds': [],
        'lossless_via': None,
        'issues': []
    }

    # ===== Test Lossless Join =====
    # Use the chase algorithm for n-way decompositions
    # For binary, use the simple common attribute test
    if len(decomposition) == 2:
        # Binary decomposition: simple test
        r1, r2 = decomposition
        common = r1.intersection(r2)
        if not common:
            result['issues'].append("No common attributes - cannot be lossless")
        else:
            # Check if common → r1 or common → r2
            closure_common = compute_attribute_closure(common, fds)
            if r1.issubset(closure_common):
                result['lossless'] = True
                result['lossless_via'] = "Common attrs → R1"
            elif r2.issubset(closure_common):
                result['lossless'] = True
                result['lossless_via'] = "Common attrs → R2"
            else:
                result['issues'].append(
                    f"Common {common} doesn't determine either relation"
                )
    else:
        # N-way decomposition: use chase algorithm
        result['lossless'] = chase_test(original_attrs, fds, decomposition)
        result['lossless_via'] = "Chase algorithm"

    # ===== Test Dependency Preservation =====
    for fd_lhs, fd_rhs in fds:
        if not is_fd_preserved(fd_lhs, fd_rhs, fds, decomposition):
            result['lost_fds'].append((fd_lhs, fd_rhs))

    result['preserving'] = len(result['lost_fds']) == 0

    # Overall assessment
    if result['lossless'] and result['preserving']:
        result['assessment'] = "EXCELLENT: Both properties achieved"
    elif result['lossless']:
        result['assessment'] = "ACCEPTABLE: Lossless but some FDs lost"
    elif result['preserving']:
        result['assessment'] = "PROBLEMATIC: FDs preserved but lossy join"
    else:
        result['assessment'] = "CRITICAL: Neither property achieved"

    return result


def is_fd_preserved(fd_lhs, fd_rhs, fds, decomposition):
    """
    Test whether X → Y is enforceable from the projected FDs alone,
    using the projection-closure test: repeatedly close the growing set
    within each relation until a fixed point is reached.
    """
    z = set(fd_lhs)
    changed = True
    while changed:
        changed = False
        for rel in decomposition:
            gained = compute_attribute_closure(z & rel, fds) & rel
            if not gained.issubset(z):
                z |= gained
                changed = True
    return set(fd_rhs).issubset(z)


def chase_test(original_attrs, fds, decomposition):
    """
    Chase algorithm for testing lossless join of n-way decomposition.

    Creates a tableau and applies FDs until no more changes.
    If any row becomes all-subscript-free, decomposition is lossless.
    """
    # Create initial tableau
    # Rows = decomposed relations, Columns = original attributes
    # Entry (i,j) = 'a_j' if attr j is in relation i, else 'b_ij'
    attrs_list = sorted(original_attrs)

    # Initialize tableau
    tableau = []
    for i, rel in enumerate(decomposition):
        row = {}
        for attr in attrs_list:
            if attr in rel:
                row[attr] = ('a', attr)       # Distinguished symbol
            else:
                row[attr] = ('b', i, attr)    # Subscripted symbol
        tableau.append(row)

    # Apply FDs until fixed point
    changed = True
    while changed:
        changed = False
        for fd_lhs, fd_rhs in fds:
            # Find rows that agree on all LHS attributes
            groups = {}
            for row_idx, row in enumerate(tableau):
                lhs_values = tuple(row[attr] for attr in sorted(fd_lhs))
                groups.setdefault(lhs_values, []).append(row_idx)

            # For each group, make RHS values agree
            for row_indices in groups.values():
                if len(row_indices) > 1:
                    for rhs_attr in fd_rhs:
                        values = [tableau[i][rhs_attr] for i in row_indices]
                        # Prefer 'a' symbols over 'b' symbols
                        best = min(values,
                                   key=lambda v: (0 if v[0] == 'a' else 1, v))
                        for i in row_indices:
                            if tableau[i][rhs_attr] != best:
                                tableau[i][rhs_attr] = best
                                changed = True

    # Check if any row is all 'a' symbols
    for row in tableau:
        if all(v[0] == 'a' for v in row.values()):
            return True
    return False


def compute_attribute_closure(attrs, fds):
    """Compute the closure of a set of attributes under FDs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs.issubset(closure) and not rhs.issubset(closure):
                closure.update(rhs)
                changed = True
    return closure
```

The chase algorithm is the gold standard for testing lossless join in n-way decompositions.
It's polynomial time and provides a constructive proof of losslessness. While the implementation is complex, understanding its existence and guarantees is valuable for rigorous schema verification.
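As a compact illustration, here is a stripped-down, self-contained version of the chase applied to the four-relation decomposition from the worked example. The symbols are simplified to one 'b' marker per row rather than per cell, which is safe because the chase only ever compares values within a single column:

```python
def chase_lossless(attrs, fds, decomposition):
    """Minimal chase: True iff some tableau row becomes all-distinguished."""
    attrs = sorted(attrs)
    # 'a' marks a distinguished symbol; ('b', i) a subscripted one for row i
    tableau = [{a: 'a' if a in rel else ('b', i) for a in attrs}
               for i, rel in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(tableau)):
                for j in range(i + 1, len(tableau)):
                    # Rows agreeing on the LHS must be equated on the RHS
                    if all(tableau[i][a] == tableau[j][a] for a in lhs):
                        for a in rhs:
                            vi, vj = tableau[i][a], tableau[j][a]
                            if vi != vj:
                                best = 'a' if 'a' in (vi, vj) else min(vi, vj)
                                tableau[i][a] = tableau[j][a] = best
                                changed = True
    return any(all(v == 'a' for v in row.values()) for row in tableau)

fds = [(set('A'), set('BC')), (set('C'), set('DE')),
       (set('E'), set('F')), (set('F'), set('A'))]
decomp = [set('ABC'), set('CDE'), set('EF'), set('FA')]
print(chase_lossless(set('ABCDEF'), fds, decomp))   # True: lossless
```

Running the chase here drives the R₁ row to all distinguished symbols, confirming the Step 5 result that no extra key relation was needed.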
Beyond algorithms, practical design strategies help ensure your decompositions achieve both properties:
The Preservation-Restoring Relation Pattern:
When a necessary FD is lost during decomposition, you can often restore preservation by adding a small relation containing just that FD's attributes:
If FD X → Y is lost (X and Y are split across relations), add a relation R_fix(X ∪ Y).
This relation:
- Has X as its key (since X → Y holds within it)
- Lets the FD be checked inside a single table, without joins
- Restores dependency preservation at the cost of some redundancy
Adding a preservation-restoring relation introduces controlled redundancy. This is a deliberate trade-off: a small amount of redundancy in exchange for efficient constraint enforcement. Document this design decision clearly.
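The pattern can be demonstrated with the standard projection-closure preservation test on a hypothetical R(A, B, C) with A → B and B → C, decomposed into R₁(A, B) and R₂(A, C); this example and its attribute names are invented for illustration:

```python
def closure(attrs, fds):
    """X+ under FDs given as (lhs-set, rhs-set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd_preserved(lhs, rhs, fds, decomposition):
    """Projection-closure test: grow lhs using only attributes visible
    inside one relation at a time; preserved iff rhs ends up covered."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in decomposition:
            gained = closure(z & rel, fds) & rel
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z

fds = [(set('A'), set('B')), (set('B'), set('C'))]
decomp = [set('AB'), set('AC')]
print(fd_preserved(set('B'), set('C'), fds, decomp))   # False: B → C is lost

# Add the preservation-restoring relation R_fix(B, C) for the lost FD
decomp.append(set('BC'))
print(fd_preserved(set('B'), set('C'), fds, decomp))   # True: restored
```

Note that R_fix(B, C) stores data already derivable from the other two relations; that controlled redundancy is exactly the trade-off described above.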
Even experienced designers make mistakes when trying to achieve both properties. Here are common pitfalls:
The pitfall: designing relations intuitively, then hoping they're both lossless and preserving, and skipping formal verification because 'it looks right.'
The remedy: start with 3NF synthesis for guaranteed properties, verify any modifications algorithmically, and document any accepted trade-offs explicitly.
We've explored how to achieve both lossless join and dependency preservation—the twin pillars of sound decomposition. Let's consolidate:
You now understand how to achieve both lossless join and dependency preservation in your decompositions, with 3NF synthesis as the guaranteed path and verification strategies for any decomposition approach. In the final page, we'll present the complete algorithm for dependency-preserving decomposition, bringing together all concepts into a unified procedure.