Throughout this module, we've developed a rigorous theoretical framework for understanding FD set equivalence. Now we bring this theory to life through practical applications that database professionals encounter regularly.
The concepts of equivalence, cover, and derivation are not merely academic exercises. They form the computational backbone of canonical cover computation, normalization and decomposition algorithms, schema migration and consolidation, and the constraint handling built into database tools.
This page demonstrates how equivalence theory integrates into the workflows of practicing database engineers and architects.
By the end of this page, you will:
- understand how equivalence supports canonical cover computation,
- apply equivalence testing to verify normalizations,
- use coverage analysis in decomposition algorithms,
- recognize equivalence patterns in database tooling, and
- approach real-world database design with confidence in constraint management.
The canonical cover (also called minimal cover) of an FD set F is the simplest equivalent FD set. Computing it is a cornerstone application of equivalence theory.
Definition: Canonical Cover
A canonical cover Fc of F satisfies:
- Fc ≡ F (Fc and F imply exactly the same dependencies),
- no FD in Fc contains an extraneous attribute on either side, and
- no FD in Fc is redundant (derivable from the remaining FDs).
Canonical Cover Algorithm
The algorithm repeatedly applies simplification steps while maintaining equivalence:
function CANONICAL_COVER(F):
    """
    Computes a canonical cover of F.
    Invariant: the working set F' is always equivalent to F.
    """
    # Step 1: Decompose all composite RHS
    F' = DECOMPOSE_RHS(F)   # {A → BC} becomes {A → B, A → C}

    # Step 2: Remove extraneous LHS attributes
    repeat:
        for each (XY → Z) in F' where |XY| > 1:
            for each attribute A in XY:
                if Z is in ATTRIBUTE_CLOSURE(XY - {A}, F'):
                    # A is extraneous in the LHS: XY - {A} already determines Z
                    F' = F' - {XY → Z} ∪ {(XY - {A}) → Z}
    until no change

    # Step 3: Remove extraneous RHS attributes (already singletons, so skip)

    # Step 4: Remove redundant FDs
    repeat:
        for each (X → A) in F':
            if A is in ATTRIBUTE_CLOSURE(X, F' - {X → A}):
                # X → A is redundant (derivable from the other FDs)
                F' = F' - {X → A}
    until no change

    # Step 5: Combine FDs with the same LHS (optional, for readability)
    F' = MERGE_SAME_LHS(F')

    assert EQUIVALENT(F, F')   # the post-condition discussed below
    return F'

Equivalence Guarantees Correctness
At every step of the algorithm, the modification preserves equivalence: decomposing a composite RHS, dropping an extraneous LHS attribute, removing a redundant FD, and merging FDs with the same LHS each replace the working set with an equivalent one, so the invariant F' ≡ F holds throughout.
The final assertion EQUIVALENT(F, Fc) provides a correctness check on the entire algorithm.
Always run EQUIVALENT(F, Fc) as a post-condition check in your canonical cover implementation. This catches bugs that might arise from incorrect extraneous attribute detection or premature FD removal.
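The check itself rests on three primitives: attribute closure, coverage, and equivalence. Below is a minimal Python sketch, assuming FDs are represented as (LHS, RHS) pairs of frozensets; the names attribute_closure, covers, and equivalent are illustrative rather than part of any library.

from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]   # (LHS, RHS), e.g. ({'A'}, {'B', 'C'})

def attribute_closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Compute X+ under fds: every attribute functionally determined by attrs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return frozenset(closure)

def covers(f: List[FD], g: List[FD]) -> bool:
    """True if F covers G: every FD in G is implied by F."""
    return all(rhs <= attribute_closure(lhs, f) for lhs, rhs in g)

def equivalent(f: List[FD], g: List[FD]) -> bool:
    """True if F and G are equivalent: each covers the other."""
    return covers(f, g) and covers(g, f)

# Post-condition from the tip above: the computed cover must be equivalent to F.
F  = [(frozenset("A"), frozenset("BC"))]                                   # A → BC
Fc = [(frozenset("A"), frozenset("B")), (frozenset("A"), frozenset("C"))]  # A → B, A → C
assert equivalent(F, Fc)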
Normalization algorithms transform schemas to eliminate redundancy and anomalies. Equivalence testing plays crucial roles in these algorithms.
3NF Synthesis Algorithm
The 3NF synthesis algorithm creates a lossless, dependency-preserving decomposition into 3NF relations. It relies heavily on FD equivalence:
function SYNTHESIZE_3NF(R, F):
    """
    Decomposes R into 3NF relations that are:
    - Lossless (R can be reconstructed via natural join)
    - Dependency-preserving (local FDs cover the original F)
    """
    # Step 1: Compute canonical cover
    Fc = CANONICAL_COVER(F)   # Uses equivalence: Fc ≡ F

    # Step 2: Create a relation for each FD
    relations = []
    for each (X → A) in Fc:
        relations.append(Relation(attributes = X ∪ {A}))

    # Step 3: Merge relations with the same key
    relations = MERGE_SAME_KEY(relations)

    # Step 4: Ensure a lossless join (add a candidate-key relation if needed)
    K = FIND_CANDIDATE_KEY(R, Fc)
    if no relation contains K:
        relations.append(Relation(attributes = K))

    # Step 5: Remove redundant relations
    relations = REMOVE_SUBSUMED(relations)

    return relations

Why Canonical Cover Matters for 3NF
Using F directly instead of Fc can create redundant relations (one per derivable FD), relations padded with extraneous attributes, and a larger decomposition than necessary.
The equivalence Fc ≡ F guarantees that the 3NF decomposition based on Fc enforces exactly the same constraints as one based on F, but more efficiently.
BCNF Decomposition
BCNF decomposition works differently but also uses equivalence:
function DECOMPOSE_BCNF(R, F):
    """
    Decomposes R into BCNF relations.
    Note: may NOT preserve all dependencies.
    """
    result = {R}

    while exists R' in result that violates BCNF:
        # Find a violating FD: X → Y where X is not a superkey of R'
        (X → Y) = FIND_BCNF_VIOLATION(R', F)

        # Decompose R' into two relations
        R1 = X ∪ Y                        # Contains the violating FD
        R2 = (R'.attributes - Y) ∪ X      # R' minus Y, keeping X
        result = result - {R'} ∪ {R1, R2}

        # Update FDs for the new relations
        # (projection of F onto each new relation's attributes)

    # After decomposition, check dependency preservation
    F_preserved = UNION(PROJECT(F, Ri) for Ri in result)
    if not COVERS(F_preserved, F):
        warn("Some dependencies not preserved!")

    return result

BCNF decomposition may lose dependencies (the projected FDs may not cover the original F). The coverage check at the end identifies lost constraints. This is why 3NF is sometimes preferred: it guarantees dependency preservation.
When a schema is decomposed (whether by algorithm or by hand), we must verify two critical properties: lossless join and dependency preservation. Equivalence testing is central to the latter.
Dependency Preservation Test
Given a relation R, an FD set F, and a decomposition of R into R₁, R₂, ..., Rₙ:
For each Rᵢ, compute the projected FD set Fᵢ = π_Rᵢ(F), containing the FDs in F⁺ that involve only attributes of Rᵢ.
The decomposition is dependency-preserving if:
$$(F_1 \cup F_2 \cup ... \cup F_n)\ \text{covers}\ F$$
Example: Verifying Preservation
Let R(A, B, C, D) with F = {A → B, B → C, C → D}.
Decomposition 1: R₁(A, B), R₂(B, C), R₃(C, D). The projections are F₁ = {A → B}, F₂ = {B → C}, F₃ = {C → D}, and their union covers F. ✓
Decomposition 2: R₁(A, B), R₂(A, C, D). The projections are F₁ = {A → B} and F₂ = {A → C, A → D, C → D}; under their union, {B}⁺ = {B}, so B → C is not implied. ✗
Decomposition 2 loses the dependency B → C!
When B → C is lost, the DBMS cannot enforce it via local constraints on R₁ or R₂ alone. Ensuring every B value has exactly one C value requires checking across both tables—expensive and error-prone. This is why preservation matters.
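This kind of check can be automated. Below is a Python sketch of the preservation test, reusing the attribute_closure and covers helpers from the earlier sketch; projection is done by brute force over subsets of each fragment, which is exponential but fine for schemas of this size, and project_fds and preserves_dependencies are illustrative names.

from itertools import combinations

def project_fds(rel_attrs, fds):
    # π_Ri(F): for every subset X of the fragment, keep X → (X+ ∩ Ri), dropping the trivial part.
    projected = []
    attrs = sorted(rel_attrs)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            lhs = frozenset(combo)
            rhs = (attribute_closure(lhs, fds) & frozenset(rel_attrs)) - lhs
            if rhs:
                projected.append((lhs, frozenset(rhs)))
    return projected

def preserves_dependencies(fragments, fds):
    # The union of the projected FD sets must cover the original F.
    f_union = [fd for rel in fragments for fd in project_fds(rel, fds)]
    return covers(f_union, fds)

# R(A, B, C, D) with F = {A → B, B → C, C → D}, as in the example above.
F = [(frozenset("A"), frozenset("B")),
     (frozenset("B"), frozenset("C")),
     (frozenset("C"), frozenset("D"))]
print(preserves_dependencies([{"A", "B"}, {"B", "C"}, {"C", "D"}], F))   # True
print(preserves_dependencies([{"A", "B"}, {"A", "C", "D"}], F))          # False: B → C is lost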
Chase-Based Lossless Join Test
While not directly about equivalence, the lossless join test is closely related. We mention it for completeness:
A decomposition is lossless if the natural join of all Rᵢ exactly reconstructs R. The chase algorithm tests this by checking if the original relation's attributes can be derived from the decomposed relations using the FDs.
For FD-only schemas, a simpler test exists: the decomposition into R₁(X) and R₂(Y) is lossless iff X ∩ Y → X or X ∩ Y → Y is in F⁺.
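A sketch of this binary test in Python, again reusing the attribute_closure helper from the earlier sketch (lossless_binary is an illustrative name):

def lossless_binary(x, y, fds):
    # Splitting R into R1(X) and R2(Y) is lossless iff the shared attributes
    # determine one side: X ∩ Y → X or X ∩ Y → Y must be in F+.
    common = frozenset(x) & frozenset(y)
    closure = attribute_closure(common, fds)
    return frozenset(x) <= closure or frozenset(y) <= closure

# R(A, B, C) with F = {B → C}, split into R1(A, B) and R2(B, C): B+ covers R2, so the split is lossless.
F = [(frozenset("B"), frozenset("C"))]
print(lossless_binary({"A", "B"}, {"B", "C"}, F))   # True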
Database migrations—upgrading schema versions, consolidating systems, or modernizing legacy databases—require careful constraint management. Equivalence testing ensures migrations preserve data integrity.
Migration Scenario: Legacy Modernization
Consider migrating a legacy system with hand-written constraint documentation to a modern DBMS with formal constraints:
Legacy documentation says:
- "Employee ID determines Department"
- "Department determines Manager"
New schema defines:
- EMPLOYEE(EmpID, DeptID, DeptName, ManagerID)
- FDs: EmpID → DeptID, EmpID → DeptName, EmpID → ManagerID, DeptID → ManagerID, DeptID → DeptName
Question: Does the new schema capture the legacy constraints?
Analysis:
Legacy FDs (formalized): F_legacy = {EmpID → DeptID, DeptID → ManagerID}
New schema FDs: F_new = {EmpID → DeptID, EmpID → DeptName, EmpID → ManagerID, DeptID → ManagerID, DeptID → DeptName}
Test: Does F_new cover F_legacy? Both legacy FDs (EmpID → DeptID and DeptID → ManagerID) appear verbatim in F_new.
✓ New schema covers legacy constraints.
Test: Does F_legacy cover F_new? Under F_legacy, {DeptID}⁺ = {DeptID, ManagerID}, which does not contain DeptName, so DeptID → DeptName is not implied.
✗ Legacy doesn't cover new. The new schema has additional constraints.
Interpretation: The new schema is more constrained than the legacy one. It captures everything the legacy system enforced, plus additional requirements (like DeptID → DeptName). This is safe—new constraints don't invalidate old data, they just prevent certain future updates.
When migrating, ensure the new FD set covers the old one (no constraints lost). It's acceptable for the new schema to impose additional constraints (progressive tightening). It's dangerous when the old schema implies a constraint the new one doesn't, because that means constraint loss.
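These directional checks are just calls to the covers helper sketched earlier; a usage example with the FD sets from this migration scenario:

# Legacy constraints: EmpID → DeptID, DeptID → ManagerID
f_legacy = [(frozenset({"EmpID"}),  frozenset({"DeptID"})),
            (frozenset({"DeptID"}), frozenset({"ManagerID"}))]

# New schema constraints, as declared for EMPLOYEE above
f_new = [(frozenset({"EmpID"}),  frozenset({"DeptID"})),
         (frozenset({"EmpID"}),  frozenset({"DeptName"})),
         (frozenset({"EmpID"}),  frozenset({"ManagerID"})),
         (frozenset({"DeptID"}), frozenset({"ManagerID"})),
         (frozenset({"DeptID"}), frozenset({"DeptName"}))]

print(covers(f_new, f_legacy))   # True:  no legacy constraint is lost
print(covers(f_legacy, f_new))   # False: the new schema is strictly tighter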
Schema Consolidation
When merging two database systems:
System A: {EmpID → DeptID, DeptID → Location}
System B: {EmpID → DeptID, EmpID → Location}
Are A and B equivalent?
To check equivalence, test coverage in both directions. Under A, {EmpID}⁺ = {EmpID, DeptID, Location}, so A implies every FD in B. Under B, {DeptID}⁺ = {DeptID}, so B does not imply DeptID → Location.
A ≢ B. System A has a constraint System B lacks, and consolidation must address this discrepancy.
Modern database tools leverage equivalence algorithms internally. Understanding how they work helps you use them effectively.
Data Modeling Tools
Tools like ERwin, PowerDesigner, and similar products manage keys and dependency metadata as part of the data model.
When you modify FDs in such a tool, it typically recomputes canonical cover and alerts you if constraints changed semantically (not just syntactically).
Schema Diff Tools
Schema diff tools compare two versions of a database schema and must decide whether constraint changes are semantic or merely syntactic.
For example, if the old schema declares PRIMARY KEY (A) and the new schema declares A as UNIQUE and NOT NULL, both make A a key and therefore imply A → (every other column). The tool recognizes these as equivalent for FD purposes.
Automated Constraint Discovery
Data profiling tools discover FDs that actually hold in the data and compare them with the documented FDs.
The comparison uses equivalence testing: do the discovered FDs cover the documented ones, and do the documented FDs cover the discovered ones?
When discovered FDs don't match documented FDs, investigate! Either the documentation is wrong (needs updating), or the data has integrity issues (needs cleaning). Equivalence testing quantifies the discrepancy.
Query Optimization
Some query optimizers use FD information when validating and simplifying queries.
Optimizers may compute closure to determine if a column is functionally determined by GROUP BY columns, affecting whether it can appear in SELECT without aggregation (SQL's infamous "single-value rule").
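A sketch of that single-value check, reusing the attribute_closure helper from earlier; the name selectable_without_aggregation is illustrative and not drawn from any real optimizer.

def selectable_without_aggregation(select_cols, group_by_cols, fds):
    # A non-aggregated SELECT column is legal iff it is functionally
    # determined by the GROUP BY columns, i.e. lies in their closure.
    return frozenset(select_cols) <= attribute_closure(group_by_cols, fds)

# With DeptID → DeptName (as in the migration example), grouping by DeptID
# allows DeptName in the SELECT list without an aggregate; EmpID is not allowed.
fds = [(frozenset({"DeptID"}), frozenset({"DeptName"}))]
print(selectable_without_aggregation({"DeptID", "DeptName"}, {"DeptID"}, fds))   # True
print(selectable_without_aggregation({"EmpID"}, {"DeptID"}, fds))                # False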
Let's walk through a realistic case study demonstrating FD equivalence in a complete database design workflow.
Scenario:
An e-commerce company has a legacy ORDER table:
ORDER(OrderID, CustomerID, CustomerName, CustomerEmail,
ProductID, ProductName, ProductPrice, Quantity, OrderDate)
Known business rules:
- Each order is placed by one customer, for one product, in one quantity, on one date.
- Each customer has one name and one email address.
- An email address identifies exactly one customer.
- Each product has one name and one price.
Step 1: Formalize FDs
From the business rules:
F = {
OrderID → CustomerID ProductID Quantity OrderDate,
CustomerID → CustomerName CustomerEmail,
CustomerEmail → CustomerID,
ProductID → ProductName ProductPrice
}
Step 2: Compute Canonical Cover
After decomposing composite RHS and checking for extraneous attributes:
Fc = {
OrderID → CustomerID,
OrderID → ProductID,
OrderID → Quantity,
OrderID → OrderDate,
CustomerID → CustomerName,
CustomerID → CustomerEmail,
CustomerEmail → CustomerID,
ProductID → ProductName,
ProductID → ProductPrice
}
Verify: EQUIVALENT(F, Fc) = true ✓
Step 3: Identify Candidate Keys
Compute {OrderID}⁺_Fc = {OrderID, CustomerID, ProductID, Quantity, OrderDate, CustomerName, CustomerEmail, ProductName, ProductPrice} = all attributes.
OrderID is a candidate key: {OrderID}⁺ = U, and since OrderID is a single attribute it is trivially minimal.
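Steps 2 and 3 can be checked mechanically with the helpers sketched earlier; for example, the candidate-key claim is a single closure computation.

# Fc from Step 2, written as (LHS, RHS) pairs
Fc = [(frozenset({"OrderID"}),       frozenset({"CustomerID"})),
      (frozenset({"OrderID"}),       frozenset({"ProductID"})),
      (frozenset({"OrderID"}),       frozenset({"Quantity"})),
      (frozenset({"OrderID"}),       frozenset({"OrderDate"})),
      (frozenset({"CustomerID"}),    frozenset({"CustomerName"})),
      (frozenset({"CustomerID"}),    frozenset({"CustomerEmail"})),
      (frozenset({"CustomerEmail"}), frozenset({"CustomerID"})),
      (frozenset({"ProductID"}),     frozenset({"ProductName"})),
      (frozenset({"ProductID"}),     frozenset({"ProductPrice"}))]

all_attrs = frozenset({"OrderID", "CustomerID", "CustomerName", "CustomerEmail",
                       "ProductID", "ProductName", "ProductPrice", "Quantity", "OrderDate"})

# {OrderID}+ reaches every attribute, so OrderID is a candidate key.
print(attribute_closure({"OrderID"}, Fc) == all_attrs)   # True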
Step 4: Normalize to 3NF
Apply 3NF synthesis:
Create a relation for each FD group:
- ORDER_CORE(OrderID, CustomerID, ProductID, Quantity, OrderDate)
- CUSTOMER(CustomerID, CustomerName, CustomerEmail)
- PRODUCT(ProductID, ProductName, ProductPrice)
Note: CustomerEmail → CustomerID makes CustomerEmail an alternate key within CUSTOMER; it does not require a separate relation.
Step 5: Verify Decomposition
Lossless? ORDER_CORE contains the candidate key OrderID of the original relation, so the natural join of the three relations reconstructs R. ✓
Dependency-Preserving?
F_union = F_ORDER_CORE ∪ F_CUSTOMER ∪ F_PRODUCT
Does F_union cover Fc? Check each FD in Fc... all present locally. ✓
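The same check can be run with the preserves_dependencies sketch from earlier, using the Fc list defined above and the three fragments from Step 4:

fragments = [{"OrderID", "CustomerID", "ProductID", "Quantity", "OrderDate"},   # ORDER_CORE
             {"CustomerID", "CustomerName", "CustomerEmail"},                   # CUSTOMER
             {"ProductID", "ProductName", "ProductPrice"}]                      # PRODUCT
print(preserves_dependencies(fragments, Fc))   # True: every FD in Fc is enforced locally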
The decomposition is lossless, dependency-preserving, and in 3NF. Equivalence testing verified that no constraints were lost. The new design eliminates redundancy (CustomerName repeated per order) while maintaining all business rules.
Beyond the core applications, FD equivalence appears in several advanced database topics.
View Updates and Equivalent Rewriting
Views can be updated if the update can be translated to base table updates that preserve the view's defining constraints. FD equivalence helps determine whether such a translation exists and whether it is unique.
Materialized View Maintenance
When base tables change, materialized views need updates. FD analysis determines how a base-table change affects the view and whether the view can be maintained incrementally instead of being recomputed.
Semantic Query Optimization
Queries can be simplified using FD knowledge:
-- Original query
SELECT DISTINCT A, B, C
FROM R
WHERE A = 5
-- If A is a key of R (A → every attribute), the optimizer can rewrite:
SELECT A, B, C
FROM R
WHERE A = 5
-- DISTINCT is unnecessary: A = 5 matches at most one row
The optimizer uses closure computation to identify such opportunities.
Schema Evolution
As business requirements evolve, schemas change. Each change should be analyzed: does the new FD set cover the old one, does it merely restate the same constraints, or does it silently drop some?
Equivalence testing automates this analysis.
Distributed Database Design
In distributed systems, data is partitioned and replicated. FDs help determine which constraints each fragment can enforce locally and which would require coordination across sites.
Active research areas include approximate FDs (FDs that hold for most but not all rows), conditional FDs (FDs that hold under certain conditions), and FD discovery in big data systems. All build on the equivalence foundations covered in this module.
We have explored the rich landscape of practical applications for FD set equivalence. These applications span the entire lifecycle of database systems, from design to evolution.
Module Complete
You have now completed the module on Equivalence of FD Sets. You understand:
- the formal definitions of cover and equivalence for FD sets,
- how to test coverage and equivalence using attribute closure,
- how to compute and verify canonical covers, and
- how equivalence underpins normalization, decomposition verification, migration, and database tooling.
This knowledge forms a critical part of your database expertise, enabling you to reason rigorously about constraints and make sound design decisions.
Congratulations! You have mastered the theory and practice of FD set equivalence. From formal definitions through efficient algorithms to real-world applications, you now possess the tools to analyze, verify, and optimize database constraints like a seasoned database architect.