Dependency Preserving Decomposition - Learning Module

Dependency Preservation

When Decomposition Breaks Constraints

Imagine you've successfully decomposed a large, redundant relation into smaller, well-normalized tables. You've eliminated update anomalies, reduced storage costs, and achieved lossless join decomposition—meaning you can reconstruct the original data perfectly through natural joins. But there's a hidden danger lurking in your design: you may have inadvertently made it impossible to efficiently enforce your original functional dependencies.

This scenario represents one of the most subtle yet critical challenges in database normalization. While lossless decomposition ensures data reconstruction, dependency preservation ensures constraint enforcement. Without dependency preservation, you may find that enforcing a simple integrity rule requires joining multiple tables—transforming what should be a straightforward constraint check into an expensive, error-prone operation.

What You Will Learn

By the end of this page, you will understand dependency preservation as a fundamental decomposition property, grasp its formal definition, and recognize why this property is essential for practical database design. You'll see how ignoring dependency preservation leads to constraint enforcement nightmares, and how preserving dependencies enables efficient integrity maintenance.

The Fundamental Problem

To understand dependency preservation, we must first understand the problem it solves. Consider a relation that tracks course assignments:

CourseAssignment(StudentID, CourseName, InstructorID, Department)

With the following functional dependencies:

FD1: InstructorID → Department (each instructor belongs to exactly one department)
FD2: CourseName → Department (each course is offered by exactly one department)
FD3: StudentID, CourseName → InstructorID (each student-course pair has one instructor)

Now, suppose we decompose this relation for normalization purposes into:

R₁(StudentID, CourseName, InstructorID) R₂(InstructorID, Department)

This decomposition is lossless: the shared attribute InstructorID is a key of R₂ (by FD1), so a natural join on InstructorID reconstructs the original data. However, let's examine what happened to our functional dependencies:

Dependency Preservation Analysis
Functional Dependency	Can Check in R₁ Alone?	Can Check in R₂ Alone?	Preserved?
FD1: InstructorID → Department	No (no Department attribute)	✓ Yes	Yes
FD2: CourseName → Department	No (no Department attribute)	No (no CourseName attribute)	No!
FD3: StudentID, CourseName → InstructorID	✓ Yes	No	Yes

The critical observation: FD2 (CourseName → Department) cannot be checked using either R₁ or R₂ alone. To verify this constraint, we must:

1. Join R₁ and R₂ on InstructorID
2. Check that each CourseName maps to exactly one Department in the joined result

This join operation may involve millions of rows in a production database, turning a simple constraint check into an expensive operation that degrades database performance.
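The two steps above can be sketched in a few lines of Python. This is only an illustration: the sample rows and the helper names `join_on_instructor` and `violates_fd` are invented here, and a real system would run the equivalent join inside the database. Each table looks consistent on its own, and the FD2 violation only becomes visible after the join.

```python
# Hypothetical sample data for R1(StudentID, CourseName, InstructorID)
# and R2(InstructorID, Department); the values are illustrative only.
r1 = [
    ("s1", "Databases", "i1"),
    ("s2", "Databases", "i2"),
    ("s3", "Algebra", "i3"),
]
r2 = {"i1": "CS", "i2": "Math", "i3": "Math"}  # InstructorID -> Department


def join_on_instructor(r1_rows, r2_map):
    """Step 1: natural join of R1 and R2 on InstructorID."""
    return [
        (student, course, instructor, r2_map[instructor])
        for student, course, instructor in r1_rows
        if instructor in r2_map
    ]


def violates_fd(rows, lhs_index, rhs_index):
    """Step 2: return lhs values that map to more than one rhs value."""
    seen = {}
    for row in rows:
        seen.setdefault(row[lhs_index], set()).add(row[rhs_index])
    return {lhs: rhs for lhs, rhs in seen.items() if len(rhs) > 1}


joined = join_on_instructor(r1, r2)
# CourseName is column 1 and Department is column 3 in the joined rows.
violations = violates_fd(joined, 1, 3)
print(violations)  # "Databases" maps to both CS and Math
```

The check has to build the joined rows before it can inspect a single CourseName, which is exactly the overhead a preserved dependency avoids.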

The Hidden Cost of Lost Dependencies

When a dependency is not preserved, every INSERT and UPDATE operation that could affect this constraint requires a multi-table join to validate. In high-transaction environments, this overhead can become a severe bottleneck—or worse, the constraint may be ignored entirely, leading to data corruption.

Formal Definition of Dependency Preservation

With the intuition established, let's formalize the concept precisely. This formal understanding is essential for rigorous analysis and algorithm development.

Definition: Projection of Functional Dependencies

Given a relation R with a set of functional dependencies F, and a decomposition of R into R₁, R₂, ..., Rₙ, we define the projection of F onto Rᵢ (denoted πRᵢ(F)) as:

πRᵢ(F) = { X → Y ∈ F⁺ | X ∪ Y ⊆ attributes(Rᵢ) }

In words: the projection of F onto Rᵢ contains all functional dependencies in the closure of F (F⁺) where both the determinant (X) and the dependent (Y) consist only of attributes present in Rᵢ.

Definition: Dependency-Preserving Decomposition

A decomposition of relation R into R₁, R₂, ..., Rₙ is dependency-preserving with respect to a set of functional dependencies F if and only if:

(πR₁(F) ∪ πR₂(F) ∪ ... ∪ πRₙ(F))⁺ = F⁺

This states that the closure of the union of all projected dependencies equals the closure of the original dependency set. In other words, all original dependencies can be derived from the dependencies enforceable on individual relations.

Understanding the Closure Requirement

The definition uses closures rather than direct set equality because dependencies can be logically equivalent without being literally identical. For example, if we have A → B and B → C, then A → C is in the closure. We need all original constraints to be derivable—whether explicitly or through Armstrong's axioms—from the projected sets.
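Membership in a closure is usually tested through attribute closure rather than by listing F⁺. The sketch below assumes FDs are represented as `(lhs, rhs)` pairs of frozensets; the function names `attribute_closure` and `implies` are illustrative choices, not part of any library.

```python
def attribute_closure(attrs, fds):
    """Return all attributes determined by `attrs` under `fds`.

    `fds` is an iterable of (lhs, rhs) pairs of frozensets.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If every attribute of lhs is already known, rhs follows too.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is in the closure of `fds`."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(implies(F, "A", "C"))  # True: A -> C follows by transitivity
print(implies(F, "C", "A"))  # False: nothing determines A from C
```

Because X → Y is in F⁺ exactly when Y lies inside the closure of X, this loop answers the question without generating every derivable dependency.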

Practical Interpretation:

A decomposition preserves dependencies if every functional dependency in F either:

Explicitly appears in some πRᵢ(F), or
Can be derived from the dependencies in ∪πRᵢ(F) using Armstrong's axioms

This means we don't need the exact original dependency in a single relation—we need the ability to enforce the constraint's effect using the decomposed schema.

dependency_preservation_check.py
from itertools import combinations


def attribute_closure(attrs, fds):
    """
    Compute the closure of an attribute set under a set of FDs.

    Each FD is a (lhs, rhs) pair of frozensets. X → Y is in F+ exactly when
    Y is a subset of the closure of X, so this avoids materializing F+
    itself, which is exponential in the number of attributes.
    """
    closure = set(attrs)
    changed = True

    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once every attribute of lhs is known, rhs follows
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True

    return frozenset(closure)


def project_fds(F, relation_attrs):
    """
    Project F onto one relation, following the definition above.

    For every non-empty X within the relation, emit
    X → (X+ ∩ relation) minus X. The result is a cover of the projection.
    Enumerating subsets is exponential; this is the theoretical algorithm,
    and practical implementations optimize heavily.
    """
    relation_attrs = frozenset(relation_attrs)
    projected = set()

    for size in range(1, len(relation_attrs) + 1):
        for subset in combinations(sorted(relation_attrs), size):
            lhs = frozenset(subset)
            rhs = (attribute_closure(lhs, F) & relation_attrs) - lhs
            if rhs:
                projected.add((lhs, rhs))

    return projected


def is_dependency_preserved(F, decomposition):
    """
    Check if a decomposition preserves all functional dependencies.

    Parameters:
    - F: Set of (lhs, rhs) frozenset pairs on the original relation
    - decomposition: List of attribute sets (one per decomposed relation)

    Returns:
    - True if dependency-preserving, False otherwise
    """
    # Compute the union of all projections
    projected_union = set()
    for relation_attrs in decomposition:
        projected_union |= project_fds(F, relation_attrs)

    # Every projected FD is already in F+, so the two closures are equal
    # iff every FD in F can be derived from the projected union
    return all(
        rhs <= attribute_closure(lhs, projected_union)
        for lhs, rhs in F
    )

Why Dependency Preservation Matters

Understanding the why behind dependency preservation transforms it from an abstract property into a practical design requirement. Let's explore the concrete implications.

Practical Benefits of Dependency Preservation

•Efficient Constraint Checking: Every dependency is enforced by checks that each examine a single table. No joins required. With an index on the determinant, a check touches only the matching rows of one table instead of a join result.
•Simplified Triggers and Checks: Key constraints, UNIQUE constraints, and triggers can be declared on individual tables, with no multi-table conditions. Implementation becomes straightforward.
•Application Logic Simplicity: Application code that validates data before insertion doesn't need to query multiple tables. Validation is localized and fast.
•Reduced Lock Contention: Without multi-table constraint checks, transactions touch and lock fewer tables, which can improve concurrent throughput.
•Better Query Optimizer Behavior: When preserved dependencies are declared as keys or unique constraints, the query optimizer can use them, for example to eliminate unnecessary joins or choose better indexes.

Without Dependency Preservation

•Constraint check: Join + scan (expensive)
•INSERT requires multi-table transaction
•Complex trigger logic with race conditions
•Potential for constraint violations that go undetected
•Higher lock contention under load
•Difficult to scale horizontally

With Dependency Preservation

•Constraint check: Single-table lookup (fast)
•INSERT validates locally before commit
•Simple per-table key or UNIQUE constraints
•Declarative constraints validated atomically by the DBMS
•Minimal lock footprint per operation
•Easier distribution and sharding

The Real-World Impact

In production systems handling thousands of transactions per second, the difference between a single-table constraint check and a multi-table join can mean the difference between sub-millisecond response times and multi-second delays. Non-preserved dependencies are a common source of unexplained 'slow INSERTs' that frustrate both developers and DBAs.

Visualizing Dependency Preservation

A visual representation helps solidify understanding of how dependencies distribute across decomposed relations. Consider the following example with a more complex schema.

Original Relation: Employee(EmpID, Name, DeptID, DeptName, ManagerID, ProjectID, ProjectName)

With functional dependencies:

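FD1: EmpID → Name, DeptID, ManagerID
FD2: DeptID → DeptName, ManagerID
FD3: ProjectID → ProjectName, DeptID
FD4: EmpID, ProjectID → (assignment relationship; the pair identifies each assignment and so determines every other attribute)

[Diagram not rendered: decomposition of Employee into the relations R1, R2, and R3 referenced below]

Analysis of This Decomposition:

FD1 (EmpID → Name, DeptID, ManagerID): The part EmpID → Name, DeptID is fully contained in R1. Preserved ✓
The remaining part, EmpID → ManagerID, spans R1 and R2, but is derivable:
- EmpID → DeptID (from FD1 in R1)
- DeptID → ManagerID (from FD2 in R2)
- Therefore, EmpID → ManagerID (by transitivity). Preserved ✓
FD2 (DeptID → DeptName, ManagerID): Fully contained in R2. Preserved ✓
FD3 (ProjectID → ProjectName, DeptID): Fully contained in R3. Preserved ✓
FD4 (EmpID, ProjectID → every other attribute): Adds no independent constraint, because it is derivable from FD1, FD2, and FD3 by union and transitivity. Preserved ✓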

This decomposition is dependency-preserving because all original FDs are either directly enforceable in one relation or derivable from the preserved dependencies through Armstrong's axioms.

Common Patterns in Dependency Preservation

Experience with decomposition reveals recurring patterns that frequently preserve or fail to preserve dependencies. Recognizing these patterns accelerates schema design.

Patterns in Dependency Preservation
Pattern	Preservation Likelihood	Why
Decomposition along FD boundaries	High	Each FD naturally falls into one relation
Key-based decomposition	High	Key attributes propagate correctly across relations
Transitive FD decomposition	Medium	Transitivity enables derivation but requires care
Arbitrary attribute splitting	Low	Random splits likely separate FD components
Multi-attribute determinant FDs	Low-Medium	Determinant may be split across relations

The Golden Rule of Preservation

When decomposing, always keep the determinant (left-hand side) and dependent (right-hand side) of each important functional dependency together in at least one relation. This simple heuristic prevents most preservation failures.
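The heuristic can be turned into a quick first-pass check. The sketch below is illustrative: the function name `fds_not_kept_together` is made up for this example, and FDs are assumed to be `(lhs, rhs)` pairs of attribute sets. An FD flagged here is not necessarily lost, since it may still be derivable from the others, so a flagged FD means "run the full preservation test", not "preservation failed".

```python
def fds_not_kept_together(fds, decomposition):
    """Return the FDs whose attributes do not all appear in any single relation."""
    split = []
    for lhs, rhs in fds:
        needed = set(lhs) | set(rhs)
        if not any(needed <= set(relation) for relation in decomposition):
            split.append((lhs, rhs))
    return split


# The CourseAssignment example from earlier on this page.
fds = [
    ({"InstructorID"}, {"Department"}),
    ({"CourseName"}, {"Department"}),
    ({"StudentID", "CourseName"}, {"InstructorID"}),
]
decomposition = [
    {"StudentID", "CourseName", "InstructorID"},
    {"InstructorID", "Department"},
]

flagged = fds_not_kept_together(fds, decomposition)
print(flagged)  # only CourseName -> Department is flagged
```

Here the flagged dependency is the same FD2 that the earlier analysis found to be lost.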

Anti-Patterns to Avoid:

Splitting composite keys arbitrarily — If FD is {A,B} → C, ensure A, B, and C appear together somewhere.
Focusing only on lossless property — A decomposition can be lossless but lose dependencies. Both properties must be verified.
Ignoring transitive dependencies during decomposition — When removing transitive FDs for 3NF, ensure the intermediate step attributes are properly placed.
Over-decomposition — Creating too many small relations increases the risk of splitting FD components.

Dependency Preservation vs Lossless Join

These two properties are often discussed together but are fundamentally independent. Understanding their relationship is crucial for database design.

Comparison of Decomposition Properties
Property	Lossless Join	Dependency Preservation
Goal	Reconstruct original data exactly	Enforce original constraints efficiently
Concern	Data integrity (no spurious tuples)	Constraint enforcement overhead
Testing	For two relations: common attributes form a superkey of R₁ or R₂	Every FD in F is derivable from the projected FDs
Failure consequence	Incorrect query results from joins	Expensive or impossible constraint checks
3NF guarantee	Always achievable together	Always achievable together
BCNF guarantee	Always achievable	Sometimes sacrificed for BCNF

Independence of Properties

A decomposition can be lossless but not dependency-preserving, dependency-preserving but not lossless, both, or neither. They address orthogonal concerns: data reconstruction vs. constraint enforcement. Well-designed schemas aim for both.

Example Demonstrating Independence:

Consider R(A, B, C) with FDs: A → B, B → C

Decomposition 1: R₁(A, B) and R₂(A, C)

Lossless? Yes (A is common, A is key of R₁)
Preserves FDs? A → B preserved in R₁. But B → C is lost!

Decomposition 2: R₁(A, B) and R₂(B, C)

Lossless? Yes (B is common, B is key of R₂)
Preserves FDs? Both A → B in R₁ and B → C in R₂. Preserved!

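Decomposition 2 achieves both properties; Decomposition 1 achieves only lossless join. The reverse failure is also possible: if the only FD is A → B, then Decomposition 2 preserves it in R₁ but is not lossless, because the common attribute B is a key of neither relation.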
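Both properties can be tested mechanically on this small example. The following is a minimal sketch under stated assumptions: FDs are `(lhs, rhs)` pairs of frozensets, the helper names are invented for illustration, the lossless test is the standard one for a two-relation decomposition only, and the projection enumerates all attribute subsets, which is fine for three attributes but exponential in general.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """The shared attributes must determine all of r1 or all of r2."""
    common = closure(r1 & r2, fds)
    return r1 <= common or r2 <= common


def project(fds, relation):
    """A cover of the projection of `fds` onto `relation`."""
    projected = []
    for size in range(1, len(relation) + 1):
        for subset in combinations(sorted(relation), size):
            lhs = frozenset(subset)
            rhs = (closure(lhs, fds) & relation) - lhs
            if rhs:
                projected.append((lhs, rhs))
    return projected


def preserves(fds, decomposition):
    """True if every FD follows from the union of the projections."""
    projected = [fd for rel in decomposition for fd in project(fds, rel)]
    return all(rhs <= closure(lhs, projected) for lhs, rhs in fds)


F = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
d1 = [frozenset("AB"), frozenset("AC")]
d2 = [frozenset("AB"), frozenset("BC")]

print(is_lossless_binary(*d1, F), preserves(F, d1))  # True False
print(is_lossless_binary(*d2, F), preserves(F, d2))  # True True

# With only A -> B, the second split keeps the FD but loses the join property.
F_small = [(frozenset("A"), frozenset("B"))]
print(is_lossless_binary(*d2, F_small), preserves(F_small, d2))  # False True
```

The three printed pairs cover three of the four combinations, which is why neither property can stand in for the other.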

Practical Implications in Database Design

Let's translate theory into practice with concrete database design guidance.

Best Practices for Dependency-Preserving Design

•Document all FDs before decomposition — You must know what you're trying to preserve before you can succeed.
•Use the 3NF synthesis algorithm — It guarantees both lossless join and dependency preservation by construction.
•When choosing BCNF, verify preservation — BCNF decomposition may lose dependencies; verify and consciously accept trade-offs.
•Add preservation-restoring relations if needed — Sometimes adding a small relation that contains just an FD's components restores preservation.
•Test the final schema — After decomposition, algorithmically verify that all FDs are preserved before implementation.
•Document any sacrificed dependencies — If you must sacrifice preservation for higher normal forms, document which constraints require application-level enforcement.

validate_schema.sql
-- Example: checking and enforcing a preserved FD inside a single table
-- (DeptID → DeptName in the Department table).
-- The trigger syntax below assumes MySQL.

-- First, look for existing violations of the FD in the data
SELECT
    DeptID,
    COUNT(DISTINCT DeptName) AS distinct_dept_names
FROM Department
GROUP BY DeptID
HAVING COUNT(DISTINCT DeptName) > 1;
-- If this returns any rows, the FD is violated!

-- For preserved dependencies, we can create enforcing constraints:
-- Option 1: UNIQUE constraint on the determinant (when it is not already the PK).
-- Note that UNIQUE (DeptID, DeptName) would NOT enforce the FD, because it
-- still allows two rows with the same DeptID and different DeptName values.
ALTER TABLE Department
ADD CONSTRAINT enforce_dept_name_fd
UNIQUE (DeptID);

-- Option 2: When the determinant cannot be a key (the table may hold several
-- rows per DeptID), use a trigger. MySQL takes one event per trigger, so an
-- analogous BEFORE UPDATE trigger is needed as well.
DELIMITER //
CREATE TRIGGER validate_dept_fd_insert
BEFORE INSERT ON Department
FOR EACH ROW
BEGIN
    -- Reject the row if the same DeptID already exists with another name
    IF EXISTS (
        SELECT 1
        FROM Department
        WHERE DeptID = NEW.DeptID
          AND DeptName <> NEW.DeptName
    ) THEN
        SIGNAL SQLSTATE '45000'
        SET MESSAGE_TEXT = 'FD violation: DeptID -> DeptName';
    END IF;
END//
DELIMITER ;
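To try the single-table check without a database server, the same idea can be run against an in-memory SQLite database from Python. This is a sketch under assumptions: it uses Python's standard `sqlite3` module rather than the SQL dialect shown above, and the sample rows and the `DepartmentKeyed` table name are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Department (DeptID INTEGER, DeptName TEXT)")
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [(1, "Sales"), (2, "Research"), (1, "Marketing")],  # DeptID 1 has two names
)

# Single-table violation check for DeptID -> DeptName.
violations = conn.execute(
    """
    SELECT DeptID, COUNT(DISTINCT DeptName) AS distinct_dept_names
    FROM Department
    GROUP BY DeptID
    HAVING COUNT(DISTINCT DeptName) > 1
    """
).fetchall()
print(violations)  # [(1, 2)]

# A key on the determinant makes the database reject such rows up front.
conn.execute(
    "CREATE TABLE DepartmentKeyed (DeptID INTEGER PRIMARY KEY, DeptName TEXT)"
)
conn.execute("INSERT INTO DepartmentKeyed VALUES (1, 'Sales')")
try:
    conn.execute("INSERT INTO DepartmentKeyed VALUES (1, 'Marketing')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Both the detection query and the enforcing key involve only one table, which is the practical payoff of a preserved dependency.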

Summary: Dependency Preservation

We've established the foundational understanding of dependency preservation—a critical property for practical database design. Let's consolidate the key insights:

Key Takeaways

•Dependency preservation ensures efficient constraint enforcement — FDs can be checked within individual tables without expensive joins.
•Formal definition uses closure equality — (∪πRᵢ(F))⁺ = F⁺ defines preservation rigorously.
•Preservation is independent of lossless join — Both properties are desirable but address different concerns.
•Lost dependencies require multi-table validation — This creates performance overhead and implementation complexity.
•3NF synthesis guarantees preservation — The algorithm is designed to achieve both lossless join and dependency preservation.
•BCNF may sacrifice preservation — Higher normal forms sometimes require accepting this trade-off.

You now understand what dependency preservation means and why it matters for database design. In the next page, we'll explore algorithms for testing whether a decomposition preserves dependencies—giving you the tools to verify your designs rigorously.