A database decomposition that loses data is useless. A decomposition where constraints can't be efficiently enforced is dangerous. The gold standard of database design achieves both lossless join and dependency preservation simultaneously—ensuring that you can reconstruct your data perfectly AND enforce all your business rules efficiently.
These two properties form the twin pillars of sound decomposition theory. Understanding how they interact—and how to verify that both are achieved—is essential for designing production-quality schemas.
This page explores the relationship between lossless join and dependency preservation, presents algorithms for verifying both properties, demonstrates how 3NF synthesis achieves both, and provides comprehensive verification strategies for your decompositions.
Before combining properties, let's ensure a solid understanding of lossless join decomposition.
Definition:
A decomposition of relation R into R₁, R₂, ..., Rₙ is lossless (or lossless-join) if for every valid instance r of R:
πR₁(r) ⨝ πR₂(r) ⨝ ... ⨝ πRₙ(r) = r
In words: projecting the data onto the decomposed relations and joining them back together yields exactly the original data—no tuples lost, no spurious tuples added.
The term 'lossless' refers specifically to the join reconstruction property. It does NOT mean that dependencies are preserved or that the schema is normalized. A decomposition can be lossless but still lose constraint information, and vice versa.
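This behavior can be seen directly on a tiny hypothetical instance (the data below is invented for illustration): projecting a lossy decomposition and joining the projections back together manufactures spurious tuples.

```python
def project(rows, attrs):
    """Project a relation (list of dicts) onto attrs, as a set of sorted tuples."""
    return {tuple((a, row[a]) for a in sorted(attrs)) for row in rows}

def natural_join(r1, r2):
    """Natural join of two projections (sets of sorted (attr, value) tuples)."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            common = d1.keys() & d2.keys()
            # Tuples combine only when they agree on all shared attributes
            if all(d1[a] == d2[a] for a in common):
                out.add(tuple(sorted({**d1, **d2}.items())))
    return out

# R(A, B, C) with two tuples that share the same B value
r = [{'A': 1, 'B': 'x', 'C': 10}, {'A': 2, 'B': 'x', 'C': 20}]

# Decompose into R1(A, B) and R2(B, C) when B determines nothing: lossy
r1 = project(r, {'A', 'B'})
r2 = project(r, {'B', 'C'})
joined = natural_join(r1, r2)
original = {tuple(sorted(row.items())) for row in r}

print(len(joined))           # 4: two spurious tuples appeared
print(joined == original)    # False: the decomposition is lossy
```

Every original tuple survives the round trip, but two fabricated combinations join them, so the reconstruction is not equal to `r`.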
Testing for Lossless Join (Binary Decomposition):
For a decomposition of R into two relations R₁ and R₂, a simple test exists:
The decomposition is lossless if and only if at least one of the following dependencies is in F⁺ (the closure of the functional dependencies):
- (R₁ ∩ R₂) → R₁
- (R₁ ∩ R₂) → R₂
Intuition: The common attributes must functionally determine all attributes of at least one of the relations. This ensures no spurious tuples are created during the join.
| R₁ | R₂ | Common | FDs | Lossless? |
|---|---|---|---|---|
| {A, B, C} | {C, D, E} | {C} | C → D, E | Yes (C → R₂) |
| {A, B} | {B, C} | {B} | A → B | No (B determines nothing) |
| {A, B} | {B, C} | {B} | B → C | Yes (B → R₂) |
| {A, B, C} | {A, D} | {A} | A → B, C, D | Yes (A → both) |
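The binary test can be mechanized with a small attribute-closure helper (a sketch, not tied to any library; FDs are represented as `(lhs, rhs)` frozenset pairs):

```python
def closure(attrs, fds):
    """Compute X+ by repeatedly firing FDs whose LHS is inside the closure."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def binary_lossless(r1, r2, fds):
    """Lossless iff (R1 ∩ R2) → R1 or (R1 ∩ R2) → R2 is in F+."""
    c = closure(r1 & r2, fds)
    return r1 <= c or r2 <= c

# Rows of the table above:
fds = [(frozenset('C'), frozenset('DE'))]
print(binary_lossless(set('ABC'), set('CDE'), fds))   # True  (C → R2)

fds2 = [(frozenset('A'), frozenset('B'))]
print(binary_lossless(set('AB'), set('BC'), fds2))    # False (B determines nothing)
```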
A critical insight is that lossless join and dependency preservation are independent properties. Each can hold without the other, both can hold together, or neither can hold. Let's look at a concrete example:
Example: Both Lossless and Dependency-Preserving
R(A, B, C) with FDs: A → B, B → C
Decomposition: R₁(A, B), R₂(B, C)
Lossless Test: R₁ ∩ R₂ = {B}, and B → C gives B⁺ = {B, C} ⊇ R₂, so the common attribute determines all of R₂. Lossless. ✓
Preservation Test: A → B lies entirely within R₁, and B → C lies entirely within R₂, so every FD can be enforced within a single relation. ✓
Result: This decomposition achieves both properties. It's the gold standard.
Key Insight:
The independence of these properties means you must verify each separately. Achieving one does not guarantee the other. A well-designed decomposition process should explicitly target both.
The 3NF Synthesis Algorithm (also known as the 3NF Decomposition Algorithm or Bernstein's Algorithm) is a landmark result in database theory. It guarantees that the resulting decomposition is:
- In Third Normal Form (3NF)
- Lossless-join
- Dependency-preserving

All three properties are guaranteed—by construction!
Unlike BCNF decomposition (which may sacrifice preservation), 3NF synthesis ALWAYS achieves both lossless join and dependency preservation. This is why 3NF is often the practical target for production schemas.
The Algorithm:
Input: Relation R with attributes U and FDs F
Output: A set of relations {R₁, R₂, ..., Rₙ} in 3NF that is lossless-join and dependency-preserving
```python
def three_nf_synthesis(attributes: set, fds: list) -> list:
    """
    3NF Synthesis Algorithm (Bernstein's Algorithm)

    Guarantees:
    - Resulting decomposition is in 3NF
    - Decomposition is lossless-join
    - All dependencies are preserved

    Parameters:
    - attributes: Set of all attributes in the relation
    - fds: List of functional dependencies as (lhs, rhs) tuples

    Returns:
    - List of relation schemas (each schema is a set of attributes)
    """
    # Step 1: Compute canonical cover (minimal cover) of F
    # This removes redundant FDs and extraneous attributes
    canonical = compute_canonical_cover(fds)

    # Step 2: Create a relation for each FD in the canonical cover
    # For each X → Y, create relation with attributes X ∪ Y
    relations = []
    for lhs, rhs in canonical:
        relation_attrs = lhs.union(rhs)
        relations.append(relation_attrs)

    # Step 3: Merge relations with the same key (LHS)
    # If multiple FDs have the same determinant, combine them
    merged = merge_by_key(relations, canonical)

    # Step 4: Remove redundant relations
    # If one relation's attributes are a subset of another, remove it
    merged = remove_subsets(merged)

    # Step 5: Ensure lossless join by adding a key relation (if needed)
    # If no relation contains a candidate key of R, add one
    if not any_contains_candidate_key(merged, attributes, fds):
        candidate_key = find_candidate_key(attributes, fds)
        merged.append(candidate_key)

    return merged


def compute_canonical_cover(fds):
    """
    Compute the canonical (minimal) cover of a set of FDs.

    Steps:
    1. Replace X → YZ with X → Y and X → Z (single attribute RHS)
    2. Remove extraneous LHS attributes
    3. Remove extraneous RHS attributes
    4. Remove redundant FDs
    5. Recombine FDs with same LHS
    """
    # Implementation details...
    pass


def merge_by_key(relations, canonical):
    """Merge relations that have the same left-hand side (key)."""
    key_to_attrs = {}
    for lhs, rhs in canonical:
        lhs_frozen = frozenset(lhs)
        if lhs_frozen not in key_to_attrs:
            key_to_attrs[lhs_frozen] = set(lhs)
        key_to_attrs[lhs_frozen].update(rhs)
    return list(key_to_attrs.values())


def remove_subsets(relations):
    """Remove relations whose attributes are a subset of another."""
    result = []
    for r in relations:
        if not any(r < other for other in relations if r != other):
            result.append(r)
    return result


def any_contains_candidate_key(relations, all_attrs, fds):
    """Check if any relation contains a candidate key of the original R."""
    candidate_keys = find_all_candidate_keys(all_attrs, fds)
    for rel in relations:
        for key in candidate_keys:
            if key.issubset(rel):
                return True
    return False


def find_candidate_key(attributes, fds):
    """Find a candidate key using attribute closure."""
    # Start with all attributes, remove those that appear only on RHS
    # Then minimize
    pass  # Implementation details...
```

Let's walk through the complete 3NF synthesis algorithm with a detailed example.
Given:
Relation: R(A, B, C, D, E, F)
Functional Dependencies:
- A → BC
- C → DE
- E → F
- F → A
Step 1: Compute Canonical Cover
First, we decompose multi-attribute RHS into single attributes:
- A → B, A → C
- C → D, C → E
- E → F
- F → A

Check for extraneous attributes (none here). Check for redundant FDs (none). Recombine FDs that share a LHS:
Canonical Cover:
- A → BC
- C → DE
- E → F
- F → A
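The canonical-cover computation that the earlier code left as a stub can be sketched as follows. This is a minimal illustration for FDs given as (lhs-string, rhs-string) pairs, not a production-hardened implementation:

```python
def closure(attrs, fds):
    """X+ under FDs given as (lhs-set, rhs-set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    """Minimal cover: singleton RHSs, no extraneous LHS attrs, no redundant FDs."""
    # Step a: split into singleton right-hand sides
    cover = [(set(l), {a}) for l, r in fds for a in r]
    # Step b: drop extraneous LHS attributes (A is extraneous in X → Y
    # if Y still follows from X − {A})
    for i, (lhs, rhs) in enumerate(cover):
        for a in sorted(lhs):
            reduced = lhs - {a}
            if reduced and rhs <= closure(reduced, cover):
                cover[i] = (reduced, rhs)
                lhs = reduced
    # Step c: drop redundant FDs (X → Y is redundant if Y follows
    # from the remaining FDs alone)
    i = 0
    while i < len(cover):
        lhs, rhs = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if rhs <= closure(lhs, rest):
            cover = rest
        else:
            i += 1
    # Step d: recombine FDs that share a left-hand side
    merged = {}
    for lhs, rhs in cover:
        merged.setdefault(frozenset(lhs), set()).update(rhs)
    return [(set(l), r) for l, r in merged.items()]

cc = canonical_cover([('A', 'BC'), ('C', 'DE'), ('E', 'F'), ('F', 'A')])
for lhs, rhs in sorted(cc, key=lambda p: min(p[0])):
    print(''.join(sorted(lhs)), '->', ''.join(sorted(rhs)))
# A -> BC
# C -> DE
# E -> F
# F -> A
```

On this example every FD survives both elimination passes, so the canonical cover equals the input, matching the hand computation above.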
Step 2: Create Relations for Each FD
| FD | Relation Attributes |
|---|---|
| A → BC | R₁(A, B, C) |
| C → DE | R₂(C, D, E) |
| E → F | R₃(E, F) |
| F → A | R₄(F, A) |
Step 3: Merge Relations with Same Key
All FDs have different determinants, so no merging needed.
Step 4: Remove Redundant Relations
No relation is a subset of another, so all four remain.
Step 5: Ensure Lossless Join
Find candidate keys of original R: A⁺ = {A, B, C, D, E, F}, so {A} is a candidate key. Because F → A, E → F, and C → DE close a cycle, {C}, {E}, and {F} are candidate keys as well.
Check if any Rᵢ contains a candidate key: R₁(A, B, C) contains {A}. ✓
Lossless join is guaranteed—no additional relation needed.
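Step 5's key check can be confirmed with the same attribute-closure helper used throughout this page (a self-contained sketch with FDs as (lhs-set, rhs-set) pairs):

```python
def closure(attrs, fds):
    """Compute X+ by repeatedly firing FDs whose LHS is inside the closure."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(set('A'), set('BC')), (set('C'), set('DE')),
       (set('E'), set('F')), (set('F'), set('A'))]

print(sorted(closure({'A'}, fds)))   # ['A', 'B', 'C', 'D', 'E', 'F']: {A} is a key
print(sorted(closure({'B'}, fds)))   # ['B']: B alone determines nothing

# R1(A, B, C) contains the candidate key {A}, so no extra relation is needed
print({'A'} <= set('ABC'))           # True
```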
| Relation | Attributes | Key | Preserved FDs |
|---|---|---|---|
| R₁ | {A, B, C} | A | A → BC |
| R₂ | {C, D, E} | C | C → DE |
| R₃ | {E, F} | E | E → F |
| R₄ | {F, A} | F | F → A |
This decomposition is: (1) In 3NF—each relation has only key-based dependencies, (2) Lossless—R₁ contains candidate key A, (3) Dependency-preserving—every FD in F is fully contained in some Rᵢ.
For decompositions not generated by 3NF synthesis (e.g., BCNF decomposition or manual design), you must verify both properties independently.
```python
def verify_decomposition(original_attrs, fds, decomposition):
    """
    Verify that a decomposition is both lossless-join and
    dependency-preserving.

    Parameters:
    - original_attrs: Set of all attributes in original relation
    - fds: List of functional dependencies
    - decomposition: List of attribute sets (decomposed relations)

    Returns:
    - dict with 'lossless', 'preserving', and diagnostic information
    """
    result = {
        'lossless': False,
        'preserving': False,
        'lost_fds': [],
        'lossless_via': None,
        'issues': []
    }

    # ===== Test Lossless Join =====
    # Use the chase algorithm for n-way decompositions
    # For binary, use the simple common attribute test
    if len(decomposition) == 2:
        # Binary decomposition: simple test
        r1, r2 = decomposition
        common = r1.intersection(r2)
        if not common:
            result['issues'].append("No common attributes - cannot be lossless")
        else:
            # Check if common → r1 or common → r2
            closure_common = compute_attribute_closure(common, fds)
            if r1.issubset(closure_common):
                result['lossless'] = True
                result['lossless_via'] = "Common attrs → R1"
            elif r2.issubset(closure_common):
                result['lossless'] = True
                result['lossless_via'] = "Common attrs → R2"
            else:
                result['issues'].append(
                    f"Common {common} doesn't determine either relation"
                )
    else:
        # N-way decomposition: use chase algorithm
        result['lossless'] = chase_test(original_attrs, fds, decomposition)
        result['lossless_via'] = "Chase algorithm"

    # ===== Test Dependency Preservation =====
    for fd_lhs, fd_rhs in fds:
        if not is_fd_preserved(fd_lhs, fd_rhs, fds, decomposition):
            result['lost_fds'].append((fd_lhs, fd_rhs))

    result['preserving'] = len(result['lost_fds']) == 0

    # Overall assessment
    if result['lossless'] and result['preserving']:
        result['assessment'] = "EXCELLENT: Both properties achieved"
    elif result['lossless']:
        result['assessment'] = "ACCEPTABLE: Lossless but some FDs lost"
    elif result['preserving']:
        result['assessment'] = "PROBLEMATIC: FDs preserved but lossy join"
    else:
        result['assessment'] = "CRITICAL: Neither property achieved"

    return result


def is_fd_preserved(fd_lhs, fd_rhs, fds, decomposition):
    """
    Test whether X → Y is enforceable from the projected FDs alone,
    using the projection-closure test: repeatedly close the growing set
    within each relation until a fixed point is reached.
    """
    z = set(fd_lhs)
    changed = True
    while changed:
        changed = False
        for rel in decomposition:
            gained = compute_attribute_closure(z & rel, fds) & rel
            if not gained.issubset(z):
                z |= gained
                changed = True
    return set(fd_rhs).issubset(z)


def chase_test(original_attrs, fds, decomposition):
    """
    Chase algorithm for testing lossless join of n-way decomposition.

    Creates a tableau and applies FDs until no more changes.
    If any row becomes all-subscript-free, decomposition is lossless.
    """
    # Create initial tableau
    # Rows = decomposed relations, Columns = original attributes
    # Entry (i,j) = 'a_j' if attr j is in relation i, else 'b_ij'
    attrs_list = sorted(original_attrs)

    # Initialize tableau
    tableau = []
    for i, rel in enumerate(decomposition):
        row = {}
        for attr in attrs_list:
            if attr in rel:
                row[attr] = ('a', attr)       # Distinguished symbol
            else:
                row[attr] = ('b', i, attr)    # Subscripted symbol
        tableau.append(row)

    # Apply FDs until fixed point
    changed = True
    while changed:
        changed = False
        for fd_lhs, fd_rhs in fds:
            # Find rows that agree on all LHS attributes
            groups = {}
            for row_idx, row in enumerate(tableau):
                lhs_values = tuple(row[attr] for attr in sorted(fd_lhs))
                groups.setdefault(lhs_values, []).append(row_idx)

            # For each group, make RHS values agree
            for row_indices in groups.values():
                if len(row_indices) > 1:
                    for rhs_attr in fd_rhs:
                        values = [tableau[i][rhs_attr] for i in row_indices]
                        # Prefer 'a' symbols over 'b' symbols
                        best = min(values,
                                   key=lambda v: (0 if v[0] == 'a' else 1, v))
                        for i in row_indices:
                            if tableau[i][rhs_attr] != best:
                                tableau[i][rhs_attr] = best
                                changed = True

    # Check if any row is all 'a' symbols
    for row in tableau:
        if all(v[0] == 'a' for v in row.values()):
            return True
    return False


def compute_attribute_closure(attrs, fds):
    """Compute the closure of a set of attributes under FDs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs.issubset(closure) and not rhs.issubset(closure):
                closure.update(rhs)
                changed = True
    return closure
```

The chase algorithm is the gold standard for testing lossless join in n-way decompositions.
It's polynomial time and provides a constructive proof of losslessness. While the implementation is complex, understanding its existence and guarantees is valuable for rigorous schema verification.
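As a compact illustration, here is a stripped-down, self-contained version of the chase applied to the four-relation decomposition from the worked example. The symbols are simplified to one 'b' marker per row rather than per cell, which is safe because the chase only ever compares values within a single column:

```python
def chase_lossless(attrs, fds, decomposition):
    """Minimal chase: True iff some tableau row becomes all-distinguished."""
    attrs = sorted(attrs)
    # 'a' marks a distinguished symbol; ('b', i) a subscripted one for row i
    tableau = [{a: 'a' if a in rel else ('b', i) for a in attrs}
               for i, rel in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(tableau)):
                for j in range(i + 1, len(tableau)):
                    # Rows agreeing on the LHS must be equated on the RHS
                    if all(tableau[i][a] == tableau[j][a] for a in lhs):
                        for a in rhs:
                            vi, vj = tableau[i][a], tableau[j][a]
                            if vi != vj:
                                best = 'a' if 'a' in (vi, vj) else min(vi, vj)
                                tableau[i][a] = tableau[j][a] = best
                                changed = True
    return any(all(v == 'a' for v in row.values()) for row in tableau)

fds = [(set('A'), set('BC')), (set('C'), set('DE')),
       (set('E'), set('F')), (set('F'), set('A'))]
decomp = [set('ABC'), set('CDE'), set('EF'), set('FA')]
print(chase_lossless(set('ABCDEF'), fds, decomp))   # True: lossless
```

Running the chase here drives the R₁ row to all distinguished symbols, confirming the Step 5 result that no extra key relation was needed.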
Beyond algorithms, practical design strategies help ensure your decompositions achieve both properties:
The Preservation-Restoring Relation Pattern:
When a necessary FD is lost during decomposition, you can often restore preservation by adding a small relation containing just that FD's attributes:
If FD X → Y is lost (X and Y are split across relations), add a relation R_fix(X ∪ Y).
This relation:
- Has X as its key (since X → Y holds within it)
- Lets the FD be checked inside a single table, without joins
- Restores dependency preservation at the cost of some redundancy
Adding a preservation-restoring relation introduces controlled redundancy. This is a deliberate trade-off: a small amount of redundancy in exchange for efficient constraint enforcement. Document this design decision clearly.
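The pattern can be demonstrated with the standard projection-closure preservation test on a hypothetical R(A, B, C) with A → B and B → C, decomposed into R₁(A, B) and R₂(A, C); this example and its attribute names are invented for illustration:

```python
def closure(attrs, fds):
    """X+ under FDs given as (lhs-set, rhs-set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd_preserved(lhs, rhs, fds, decomposition):
    """Projection-closure test: grow lhs using only attributes visible
    inside one relation at a time; preserved iff rhs ends up covered."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in decomposition:
            gained = closure(z & rel, fds) & rel
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z

fds = [(set('A'), set('B')), (set('B'), set('C'))]
decomp = [set('AB'), set('AC')]
print(fd_preserved(set('B'), set('C'), fds, decomp))   # False: B → C is lost

# Add the preservation-restoring relation R_fix(B, C) for the lost FD
decomp.append(set('BC'))
print(fd_preserved(set('B'), set('C'), fds, decomp))   # True: restored
```

Note that R_fix(B, C) stores data already derivable from the other two relations; that controlled redundancy is exactly the trade-off described above.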
Even experienced designers make mistakes when trying to achieve both properties. Here are common pitfalls:
The pitfall: designing relations intuitively, then hoping they're both lossless and preserving, and skipping formal verification because 'it looks right.'
The remedy: start with 3NF synthesis for guaranteed properties, verify any modifications algorithmically, and document any accepted trade-offs explicitly.
We've explored how to achieve both lossless join and dependency preservation—the twin pillars of sound decomposition. Let's consolidate:
You now understand how to achieve both lossless join and dependency preservation in your decompositions, with 3NF synthesis as the guaranteed path and verification strategies for any decomposition approach. In the final page, we'll present the complete algorithm for dependency-preserving decomposition, bringing together all concepts into a unified procedure.