BCNF Decomposition

Level: Intermediate

Duration: 75 mins

Topic: BCNF Decomposition

1 / 5

BCNF Decomposition Algorithm

The Art of Decomposing Relations into BCNF

In our previous study of Boyce-Codd Normal Form (BCNF), we established that a relation is in BCNF when the determinant of every non-trivial functional dependency is a superkey. This definition eliminates all redundancy that arises from functional dependencies. But knowing what BCNF is differs fundamentally from knowing how to achieve it.

This page presents the BCNF decomposition algorithm—a systematic procedure for transforming any relation schema into a set of BCNF relations. Unlike theoretical definitions, this algorithm is actionable: it takes a relation and its functional dependencies as input and produces a guaranteed BCNF decomposition as output.

The algorithm we study here is not merely academic. It forms the foundation of automated database design tools, schema normalization utilities, and the mental framework that experienced database architects use when designing real-world systems.

What You Will Learn

By the end of this page, you will be able to: (1) State the formal BCNF decomposition algorithm, (2) Apply it step-by-step to any relation schema, (3) Prove that the algorithm always terminates, (4) Understand why it guarantees lossless decomposition, and (5) Implement the algorithm in practice.

Prerequisites and Foundations

Before diving into the algorithm, let us consolidate the foundational concepts that make the decomposition procedure possible.

Essential Prerequisites

•Functional Dependency (FD): A constraint X → Y stating that the value of attribute set X uniquely determines the value of attribute set Y. If two tuples agree on X, they must agree on Y.
•Candidate Key: A minimal set of attributes that functionally determines all attributes in the relation. No proper subset of a candidate key is also a superkey.
•Superkey: Any set of attributes that functionally determines all attributes. Every candidate key is a superkey, but not vice versa.
•BCNF Definition: A relation R is in BCNF if and only if for every non-trivial functional dependency X → Y that holds in R, X is a superkey of R.
•Attribute Closure (X⁺): The set of all attributes functionally determined by X. Used to test whether X is a superkey (X⁺ = all attributes).
•Lossless Decomposition: A decomposition where the natural join of the decomposed relations always reconstructs the original relation exactly—no spurious tuples are generated.

The BCNF Violation Pattern

A BCNF violation occurs when we have a functional dependency X → Y where X is NOT a superkey. This means X determines something, but X doesn't determine everything—creating the potential for redundancy. The decomposition algorithm systematically eliminates these violations.

The Central Insight:

When a relation violates BCNF due to a functional dependency X → Y (where X is not a superkey), we can decompose the relation into two smaller relations:

One relation containing X ∪ Y (everything involved in the problematic FD)
Another relation containing X ∪ (R - Y) (X plus every attribute of R that is not in Y)

This split isolates the problematic dependency into its own relation while preserving the ability to reconstruct the original data through joins.
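As a rough illustration, the split can be written with plain Python sets. The function name `split_on_fd` and the toy Employee relation are assumptions made for this sketch; they do not appear elsewhere on this page.

```python
def split_on_fd(relation: set, x: set, y: set) -> tuple:
    """Split `relation` on the violating FD x -> y.

    Returns (x ∪ y, x ∪ (relation - y)); x is shared by both parts.
    """
    part_fd = x | y
    part_rest = x | (relation - y)
    return part_fd, part_rest


# Hypothetical toy relation: Employee(EmpID, Dept, DeptHead) with Dept -> DeptHead
employee = {"EmpID", "Dept", "DeptHead"}
with_fd, rest = split_on_fd(employee, {"Dept"}, {"DeptHead"})
print(sorted(with_fd))  # ['Dept', 'DeptHead']
print(sorted(rest))     # ['Dept', 'EmpID']
```

The shared attribute Dept is what later allows the two parts to be joined back together.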

The BCNF Decomposition Algorithm

The BCNF decomposition algorithm is an iterative procedure that repeatedly identifies and eliminates BCNF violations until all resulting relations are in BCNF.

Formal Algorithm Statement

Input: A relation schema R with attribute set U and a set of functional dependencies F.

Output: A decomposition of R into relations R₁, R₂, ..., Rₙ, each in BCNF.

Procedure:

1. Initialize result = {R}.
2. While there exists a relation Rᵢ in result that is not in BCNF:
   a. Find a non-trivial FD X → Y that violates BCNF in Rᵢ (X is not a superkey of Rᵢ).
   b. Compute X⁺ with respect to F.
   c. Replace Rᵢ with two relations:
      • R₁ = X⁺ ∩ Rᵢ (all attributes of Rᵢ determined by X)
      • R₂ = X ∪ (Rᵢ - X⁺) = Rᵢ - (X⁺ - X) (X plus the attributes that X does not determine)
3. Return result.

A Cleaner Formulation:

In practice, the algorithm is often stated more simply:

Given: Relation R with attributes U and FD set F

1. result := {R}
2. While some Rᵢ ∈ result violates BCNF:
   2.1. Find X → Y violating BCNF in Rᵢ (X not a superkey, X → Y non-trivial)
   2.2. Decompose Rᵢ into:
        • Rₐ = XY (attributes of the violating FD)
        • Rᵦ = Rᵢ - Y + X = Rᵢ - (Y - X) (remaining attributes plus X)
   2.3. Replace Rᵢ with Rₐ and Rᵦ in result
3. Return result

The key insight is that X becomes a superkey of Rₐ (since X → Y and Rₐ = XY), and X serves as a foreign key in Rᵦ to reference Rₐ.

Algorithm Components Explained
Component	Description	Purpose
X → Y	The BCNF-violating functional dependency	Identifies what needs to be separated
Rₐ = XY	The new relation containing the violating FD	X becomes a key here, satisfying BCNF for this FD
Rᵦ = R - (Y - X)	Original relation minus the dependent attributes (keeping X)	Removes redundancy while keeping join capability
X as common attribute	X appears in both Rₐ and Rᵦ	Enables lossless join reconstruction

Step-by-Step Worked Example

Let's trace the algorithm through a comprehensive example that demonstrates multiple decomposition steps.

Example Schema

Relation: CourseSchedule(StudentID, CourseID, Instructor, Room, Building, Capacity)

Functional Dependencies:

•F₁: {StudentID, CourseID} → Instructor
•F₂: CourseID → Instructor
•F₃: Room → Building
•F₄: Room → Capacity
•F₅: {Room, Building} → Capacity (derivable, but listed explicitly)

Candidate Key: {StudentID, CourseID, Room}

Step 0: Initial State

result = {CourseSchedule(StudentID, CourseID, Instructor, Room, Building, Capacity)}

Step 1: Check for BCNF Violations

We examine each FD to see if its determinant is a superkey:

FD	Determinant	Is Superkey?	BCNF Violation?
CourseID → Instructor	{CourseID}	No (doesn't determine StudentID or Room)	YES
Room → Building	{Room}	No (doesn't determine StudentID or CourseID)	YES
Room → Capacity	{Room}	No	YES

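Multiple violations exist. F₁ and F₅ violate BCNF as well, because {StudentID, CourseID} and {Room, Building} are not superkeys either, but both are implied by F₂ and F₄, so resolving those resolves them too. We pick CourseID → Instructor first.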
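The superkey test in the table can be reproduced mechanically. The sketch below uses a small hypothetical helper, `closure`, and encodes each FD as a pair of Python sets; that encoding is an assumption of this example.

```python
def closure(attrs, fds):
    """Attribute closure: keep applying FDs whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


course_schedule = {"StudentID", "CourseID", "Instructor",
                   "Room", "Building", "Capacity"}
fds = [
    ({"StudentID", "CourseID"}, {"Instructor"}),  # F1
    ({"CourseID"}, {"Instructor"}),               # F2
    ({"Room"}, {"Building"}),                     # F3
    ({"Room"}, {"Capacity"}),                     # F4
    ({"Room", "Building"}, {"Capacity"}),         # F5
]

violations = []
for lhs, rhs in fds:
    # A determinant is a superkey only if its closure covers the whole relation
    if not course_schedule <= closure(lhs, fds):
        violations.append((lhs, rhs))
        print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```

Run against all five listed FDs, the check flags every one of them, since none of the determinants reaches all six attributes on its own.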

Step 2: First Decomposition (CourseID → Instructor)

Violating FD: CourseID → Instructor

Decompose CourseSchedule into:

R₁ = {CourseID, Instructor} (the violating FD's attributes)
R₂ = CourseSchedule - {Instructor} + {CourseID} = {StudentID, CourseID, Room, Building, Capacity}

Now result = {R₁(CourseID, Instructor), R₂(StudentID, CourseID, Room, Building, Capacity)}

Verify R₁ is in BCNF:

FD: CourseID → Instructor
Is CourseID a superkey of R₁? Yes! CourseID⁺ = {CourseID, Instructor} = R₁
R₁ is in BCNF ✓

Step 3: Check R₂ for BCNF Violations

R₂ = {StudentID, CourseID, Room, Building, Capacity}

FDs that apply to R₂:

Room → Building (since Room and Building are in R₂)
Room → Capacity

Is {Room} a superkey of R₂? Room⁺ = {Room, Building, Capacity} ≠ R₂

Room → Building violates BCNF in R₂!

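Step 4: Second Decomposition (Room → {Building, Capacity})

Because Room⁺ includes both Building and Capacity, we combine the two FDs and split on Room → {Building, Capacity}. Splitting on Room → Building alone would leave Room → Capacity as a further violation to resolve in a later step.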

Decompose R₂ into:

R₃ = {Room, Building, Capacity} (Room⁺ restricted to R₂)
R₄ = R₂ - {Building, Capacity} + {Room} = {StudentID, CourseID, Room}

Now result = {R₁, R₃, R₄}

Verify R₃ is in BCNF:

FDs: Room → Building, Room → Capacity
Is Room a superkey? Room⁺ = {Room, Building, Capacity} = R₃ ✓
R₃ is in BCNF ✓

Verify R₄ is in BCNF:

What FDs apply? CourseID and Room are both in R₄, but the attributes they determine (Instructor, Building, Capacity) are not, so projecting F onto R₄ leaves only trivial FDs.
The only candidate key is {StudentID, CourseID, Room} itself.
With no non-trivial FDs, there can be no violations.
R₄ is in BCNF ✓

Final BCNF Decomposition

Original: CourseSchedule(StudentID, CourseID, Instructor, Room, Building, Capacity)

BCNF Decomposition:

•R₁(CourseID, Instructor), key: CourseID
•R₃(Room, Building, Capacity), key: Room
•R₄(StudentID, CourseID, Room), key: {StudentID, CourseID, Room}

Each relation is now in BCNF. Redundancy has been eliminated!
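To see the result on actual data, the sketch below projects a small sample instance onto R₁, R₃, and R₄ and joins the projections back together. The four sample rows and the helper names `project` and `natural_join` are invented for illustration; the rows were chosen so that they satisfy CourseID → Instructor and Room → {Building, Capacity}.

```python
def project(rows, attrs):
    """Project dict rows onto attrs; using a set removes duplicate tuples."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}


def natural_join(left, right):
    """Natural join of two relations whose rows are tuples of (attr, value) pairs."""
    joined = set()
    for lrow in left:
        for rrow in right:
            ld, rd = dict(lrow), dict(rrow)
            # Rows combine only if they agree on every shared attribute
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


ALL = ["StudentID", "CourseID", "Instructor", "Room", "Building", "Capacity"]
sample = [  # hypothetical instance of CourseSchedule
    dict(zip(ALL, ("s1", "c1", "Ada", "r1", "North", 30))),
    dict(zip(ALL, ("s2", "c1", "Ada", "r1", "North", 30))),
    dict(zip(ALL, ("s1", "c2", "Bob", "r2", "South", 50))),
    dict(zip(ALL, ("s3", "c2", "Bob", "r1", "North", 30))),
]

r1 = project(sample, ["CourseID", "Instructor"])
r3 = project(sample, ["Room", "Building", "Capacity"])
r4 = project(sample, ["StudentID", "CourseID", "Room"])

original = project(sample, ALL)
joined = natural_join(natural_join(r4, r1), r3)
print(joined == original)  # True: no tuples lost, no spurious tuples
print(len(r1), len(r3))    # 2 2: each course and each room is stored once
```

Instructor names and room details now appear once per course and once per room instead of once per enrollment.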

Why the Algorithm Works

The correctness of the BCNF decomposition algorithm rests on two fundamental guarantees: termination (it always finishes) and losslessness (the decomposition preserves all information). Let's examine each.

Termination Guarantee

•Observation: Each decomposition step replaces a relation with two relations that each have strictly fewer attributes.
•Specifically: When we split R into Rₐ = XY and Rᵦ = R - (Y - X), Rₐ is smaller than R because X is not a superkey (so XY cannot cover all of R), and Rᵦ is smaller because the FD is non-trivial (so Y - X is non-empty).
•Lower Bound: A relation with only one or two attributes cannot violate BCNF (any FD is trivial or makes the determinant a key).
•Conclusion: Since every split produces strictly smaller relations and relations with at most two attributes are never split, the algorithm must terminate.
•Worst Case: For a relation with n attributes, no chain of successive splits is longer than n - 2, because each split removes at least one attribute and two-attribute relations are already in BCNF.

Proof of Lossless Join:

The decomposition of R into Rₐ = XY and Rᵦ = R - (Y - X) satisfies the lossless join condition.

Theorem: A decomposition of R into R₁ and R₂ is lossless if and only if:

(R₁ ∩ R₂) → R₁, OR
(R₁ ∩ R₂) → R₂

In our decomposition:

R₁ = XY, R₂ = R - (Y - X) = X ∪ (R - Y)
R₁ ∩ R₂ = X (The common attributes)
Does X → XY? Yes, because X → Y is the violating FD, so X → X (trivial) and X → Y, therefore X → XY.

Since (R₁ ∩ R₂) → R₁, the decomposition is lossless!
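The theorem translates directly into a closure test. The following is a minimal sketch; the function names are assumptions, and the second split (on StudentID) is a deliberately bad, hypothetical one added for contrast.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (each FD is a (lhs, rhs) pair of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_split(r1, r2, fds):
    """Binary test: the shared attributes must determine all of r1 or all of r2."""
    reach = closure(r1 & r2, fds)
    return r1 <= reach or r2 <= reach


fds = [({"CourseID"}, {"Instructor"}),
       ({"Room"}, {"Building", "Capacity"})]
rest = {"StudentID", "CourseID", "Room", "Building", "Capacity"}

# Split used in the worked example: the shared attribute CourseID determines R1
print(is_lossless_split({"CourseID", "Instructor"}, rest, fds))   # True
# Hypothetical bad split: StudentID determines neither side
print(is_lossless_split({"StudentID", "Instructor"}, rest, fds))  # False
```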

Intuitive Understanding

Think of it this way: X is like a foreign key in R₂ (Rᵦ) that references the primary key X in R₁ (Rₐ). When we join them back together, each tuple in R₂ matches exactly one tuple in R₁ (because X functionally determines Y). No spurious tuples can appear because the relationship is many-to-one, not many-to-many.

Algorithm Variations and Refinements

The basic BCNF decomposition algorithm admits several variations that can affect the final result and its properties.

Algorithm Variations
Variation	Description	Trade-offs
Maximal Decomposition	Use X⁺ instead of just XY for the first sub-relation	Fewer relations, but may miss opportunities for dependency preservation
Minimal Decomposition	Use exactly XY where Y is a single attribute	More relations, but easier to analyze and modify
Choice of Violating FD	Different orderings of FD selection produce different decompositions	All are correct BCNF, but may have different numbers of relations
Canonical Cover First	Compute canonical cover of F before decomposing	Reduces redundant work and can simplify the final schema

The Choice Matters:

Consider a relation R(A, B, C, D) with FDs: A → B, A → C, A → D.

Approach 1 (Maximal):

A⁺ = {A, B, C, D} = R
A is already a superkey!
R is in BCNF. No decomposition needed.

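Approach 2 (Checking FDs without closure):

If we looked only at A → B and noticed that B is not all of R, we might wrongly treat it as a violation and split R into (A, B) and (A, C, D), a lossless but unnecessary decomposition.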

The lesson: always compute the attribute closure to determine if you really have a violation. The determinant might be a superkey after all!

Non-Determinism Warning

The BCNF decomposition algorithm is non-deterministic: different choices of which violation to address first can lead to different final decompositions. All results are valid BCNF decompositions, but they may have different numbers of relations and different dependency preservation properties.
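A small experiment makes the non-determinism concrete. The sketch below assumes a hypothetical relation R(A, B, C, D) with A → B and B → C, enumerates every violation by brute force, and lets a caller-supplied `pick` function choose which one to split on. All names here are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations(rel, fds):
    """All violations X -> Y inside rel, where Y = (X+ ∩ rel) - X."""
    found = []
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            reach = closure(x, fds) & rel
            if reach != x and reach != rel:  # non-trivial and X not a superkey
                found.append((x, reach - x))
    return found


def decompose(rel, fds, pick):
    """Split on the violation chosen by pick until no violation remains."""
    found = violations(rel, fds)
    if not found:
        return [rel]
    x, y = pick(found)
    return decompose(x | y, fds, pick) + decompose(rel - y, fds, pick)


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
r = {"A", "B", "C", "D"}
first = decompose(r, fds, lambda v: v[0])   # always take the first violation found
last = decompose(r, fds, lambda v: v[-1])   # always take the last one
print(["".join(sorted(s)) for s in first])  # ['BC', 'AB', 'AD']
print(["".join(sorted(s)) for s in last])   # ['BC', 'BD', 'AB', 'AD']
```

Both outputs are BCNF decompositions of the same relation, yet one has three relations and the other four.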

Implementing the Algorithm

Let's examine how to implement the BCNF decomposition algorithm programmatically. The implementation requires careful handling of attribute sets and functional dependency projection.

bcnf_decomposition.py
Python
from itertools import combinations
from typing import Optional


def compute_closure(attrs: set, fds: list) -> set:
    """Compute the closure of a set of attributes under the given FDs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD once its left side is covered and it adds something new
            if lhs.issubset(closure) and not rhs.issubset(closure):
                closure |= rhs
                changed = True
    return closure


def is_superkey(attrs: set, relation: set, fds: list) -> bool:
    """Check if attrs is a superkey of the relation."""
    return relation.issubset(compute_closure(attrs, fds))


def find_bcnf_violation(relation: set, fds: list) -> Optional[tuple]:
    """Find an FD (lhs, rhs) that violates BCNF in relation, or return None.

    Every non-empty proper subset of the relation is tested, not only the
    left-hand sides listed in fds, because a violation can come from an FD
    that is merely implied by fds. This is exponential in the attribute count.
    """
    attrs = sorted(relation)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            lhs = set(combo)
            determined = compute_closure(lhs, fds) & relation
            # Non-trivial (determines something new) and lhs is NOT a superkey
            if determined != lhs and not is_superkey(lhs, relation, fds):
                return lhs, determined - lhs
    return None


def project_fds(fds: list, relation: set) -> list:
    """Project FDs onto a relation: every non-trivial X -> (X+ ∩ relation) - X."""
    projected = []
    attrs = sorted(relation)
    # The closure of each subset is needed; this is expensive but correct
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            lhs = set(combo)
            new_rhs = (compute_closure(lhs, fds) & relation) - lhs
            if new_rhs:
                projected.append((lhs, new_rhs))
    return projected


def bcnf_decomposition(relation: set, fds: list) -> list:
    """
    Decompose a relation into BCNF.
    Returns a list of relations (each a set of attributes) in BCNF.
    """
    result = [set(relation)]

    while True:
        # Find a relation that violates BCNF
        found_violation = False
        for i, rel in enumerate(result):
            violation = find_bcnf_violation(rel, fds)
            if violation:
                found_violation = True
                lhs, rhs = violation

                # R1 = closure of lhs restricted to the relation
                closure = compute_closure(lhs, fds)
                r1 = closure & rel

                # R2 = relation - (closure - lhs)
                r2 = rel - (closure - lhs)

                # Replace the violating relation, then rescan from the start
                result.pop(i)
                result.append(r1)
                result.append(r2)
                break

        if not found_violation:
            break

    return result


# Example usage
if __name__ == "__main__":
    # CourseSchedule example
    relation = {'StudentID', 'CourseID', 'Instructor', 'Room', 'Building', 'Capacity'}
    fds = [
        ({'CourseID'}, {'Instructor'}),
        ({'Room'}, {'Building'}),
        ({'Room'}, {'Capacity'}),
    ]

    result = bcnf_decomposition(relation, fds)
    print("BCNF Decomposition:")
    for i, r in enumerate(result, 1):
        print(f"  R{i}: {sorted(r)}")

Implementation Complexity

The naive implementation has exponential complexity because projecting FDs correctly requires considering all subsets. In practice, optimizations like using the canonical cover and caching closures make the algorithm tractable for typical database schemas with tens of attributes.
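The need to consider all subsets is easy to underestimate. In the hypothetical schema below (R(A, B, C, D, E) with A → B and BC → D, restricted to the sub-relation (A, C, D, E)), no listed FD fits inside the sub-relation, yet the implied FD AC → D violates BCNF there. Both function names are assumptions of this sketch.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def listed_fd_violation(rel, fds):
    """Shortcut check: only listed FDs whose attributes all lie in rel."""
    for lhs, rhs in fds:
        if lhs <= rel and rhs <= rel and not rhs <= lhs:
            if not rel <= closure(lhs, fds):
                return lhs, rhs
    return None


def projected_violation(rel, fds):
    """Exhaustive check over all non-empty proper subsets of rel (exponential)."""
    attrs = sorted(rel)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            reach = closure(x, fds) & rel
            if reach != x and reach != rel:
                return x, reach - x
    return None


fds = [({"A"}, {"B"}), ({"B", "C"}, {"D"})]
sub = {"A", "C", "D", "E"}

print(listed_fd_violation(sub, fds))   # None: the shortcut sees nothing wrong
lhs, rhs = projected_violation(sub, fds)
print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))  # AC -> D
```

The exhaustive loop visits up to 2ⁿ subsets, which is where the exponential cost comes from.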

Common Pitfalls and Mistakes

Students and practitioners frequently make errors when applying the BCNF decomposition algorithm. Understanding these pitfalls helps avoid them.

Critical Mistakes to Avoid

•Forgetting to recompute applicable FDs: After decomposition, you must determine which FDs hold on each new relation. A listed FD X → Y applies to Rᵢ directly when X ⊆ Rᵢ and Y ⊆ Rᵢ, but FDs implied by F (members of F⁺) whose attributes all lie in Rᵢ apply as well.
•Not using attribute closure: Checking if X is a superkey requires computing X⁺, not just looking at explicit FDs. Derived FDs (via transitivity) can make X a superkey.
•Stopping too early: Even if the original FD is resolved, new violations may exist in the decomposed relations. Always re-check each piece.
•Confusing keys across relations: The candidate keys of R are NOT automatically the keys of decomposed relations. Each sub-relation has its own keys.
•Ignoring trivial FDs: Trivial FDs (Y ⊆ X, such as X → X) never cause BCNF violations, so exclude them before testing determinants; otherwise every attribute set that is not a superkey looks like a violating determinant.
•Not verifying losslessness: Although the algorithm guarantees losslessness, it's wise to verify by checking that R₁ ∩ R₂ is a superkey of R₁ or R₂.

Wrong Approach

Checking only explicit FDs:

'A → B is a violation because I don't see A → C, A → D, etc.'

This ignores that A⁺ might include C and D through transitivity!

Correct Approach

Always compute A⁺:

'A⁺ = {A, B, C, D} which equals the entire relation, so A IS a superkey and A → B does NOT violate BCNF.'

Summary and Key Takeaways

The BCNF decomposition algorithm is a fundamental technique in database design. Let's consolidate the essential points.

Key Takeaways

•Algorithm Core: Repeatedly find a BCNF violation X → Y and split the relation into XY and R - (Y - X). Continue until all relations are in BCNF.
•Termination: The algorithm always terminates because each split produces relations with strictly fewer attributes than the one being split.
•Lossless Guarantee: Every split preserves information because X (the common attribute) functionally determines the separated attributes.
•Non-Determinism: Different violation choices lead to different (but all valid) decompositions.
•Attribute Closure is Essential: Always compute X⁺ to correctly determine superkeys and identify violations.
•FD Projection: After decomposition, determine which FDs hold on each new relation, including implied FDs found via attribute closure, not only the listed FDs whose attributes are contained in it.

Page Complete

You now understand the BCNF decomposition algorithm: its formal statement, step-by-step execution, correctness guarantees, and implementation. On the next page, we'll explore the lossless guarantee in greater depth, examining the mathematical foundations and practical implications.
