Detecting BCNF violations is only half the battle. The real work lies in eliminating violations while preserving the essential information content of the original relation. This is where the BCNF decomposition algorithm enters—a systematic procedure that transforms any relation into a collection of BCNF-compliant relations.
The decomposition algorithm is one of the most elegant results in database theory. It guarantees that we can always achieve BCNF (unlike some other normal forms with more restrictive criteria), and it provides a deterministic path from a violated schema to a normalized one. However, this power comes with trade-offs we must understand.
By the end of this page, you will master the BCNF decomposition algorithm, understand why it guarantees lossless-join decomposition, work through multiple examples of increasing complexity, and appreciate the algorithm's computational characteristics.
The BCNF decomposition algorithm is conceptually simple yet powerful. It repeatedly identifies violations and decomposes relations until no violations remain.
```
Algorithm: BCNF Decomposition
Input:  Relation R with attribute set U, functional dependency set F
Output: Set of relations {R₁, R₂, ..., Rₙ}, all in BCNF

function decomposeToBCNF(R: Relation, F: FunctionalDependencies) -> Set<Relation>:
    result = {R}    // Start with the original relation

    while exists relation Rᵢ in result that is not in BCNF:
        // Step 1: Find a BCNF-violating FD in Rᵢ:
        // X → Y where X is not a superkey of Rᵢ and Y ⊄ X
        (X → Y) = findViolatingFD(Rᵢ, projectFDs(F, Rᵢ))

        // Step 2: Decompose Rᵢ into two relations
        // R₁ = X ∪ Y (contains the violating dependency)
        // R₂ = X ∪ (attributes(Rᵢ) - Y) (remaining attributes, with X as the bridge)
        R₁ = X ∪ Y
        R₂ = X ∪ (attributes(Rᵢ) - Y)

        // Step 3: Replace Rᵢ with R₁ and R₂
        result.remove(Rᵢ)
        result.add(R₁)
        result.add(R₂)

    return result

// Helper: project FDs onto a subset of attributes
function projectFDs(F: FunctionalDependencies, R: Relation) -> FunctionalDependencies:
    projected = {}
    for each subset X of attributes(R):
        closure = computeClosure(X, F)
        for each attribute A in (closure ∩ attributes(R)):
            if A not in X:
                projected.add(X → A)
    // Return a minimal cover of the projected FDs
    return minimalCover(projected)
```

Algorithm Walkthrough:
Initialization: We start with the original relation R in our result set.
Iteration: While any relation in our result set violates BCNF:
Find a violating dependency X → Y in some relation Rᵢ
Decompose into two relations: R₁ = X ∪ Y and R₂ = X ∪ (Rᵢ - Y)
Replace the violated relation with the two new relations.
Termination: When no relation has BCNF violations, we're done.
Key Insight: Each decomposition step removes at least one attribute from consideration for the violating relation (Y - X moves to R₁), so the algorithm must terminate—we can't decompose forever.
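The steps above can be sketched as runnable Python. The set-based representation and the helper names (`closure`, `find_violation`, `decompose_bcnf`) are mine, and the subset enumeration is exponential, so treat this as an illustration rather than a production implementation:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(rel, fds):
    """Return a projected BCNF-violating FD (X, Y) within `rel`, or None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            cl = closure(x, fds)
            y = (cl & rel) - x          # nontrivial attributes X determines inside rel
            if y and not cl >= rel:     # X determines something, yet is no superkey
                return x, y
    return None

def decompose_bcnf(rel, fds):
    """Split relations on violating FDs until every fragment is in BCNF."""
    pending, done = [frozenset(rel)], []
    while pending:
        r = pending.pop()
        violation = find_violation(r, fds)
        if violation is None:
            done.append(r)
        else:
            x, y = violation
            pending.append(x | y)       # R1 = X ∪ Y
            pending.append(r - y)       # R2 = X ∪ (R - Y); X survives since X ∩ Y = ∅
    return done
```

Note that `find_violation` returns everything X determines inside the relation, not a single attribute, which quietly folds in the "full closure" optimization mentioned later on this page.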
When decomposing R based on X → Y, we create R₁ = XY and R₂ = R - (Y - X). The attribute set X appears in BOTH relations—this is essential for the lossless-join property. X serves as the "bridge" that allows us to reconstruct original tuples.
A critical property of the BCNF decomposition algorithm is that it produces lossless-join decompositions. This means the original relation can always be perfectly reconstructed by joining the decomposed relations—no information is lost, and no spurious tuples are created.
A decomposition of R into R₁ and R₂ is lossless-join with respect to F if and only if:
(R₁ ∩ R₂) → (R₁ - R₂) ∈ F⁺ OR (R₁ ∩ R₂) → (R₂ - R₁) ∈ F⁺
In words: The common attributes must functionally determine all attributes of at least one of the decomposed relations.
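This condition is mechanical to check once you have an attribute-closure routine. A minimal sketch (the function names are mine):

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary lossless-join test: the shared attributes must
    functionally determine every attribute of at least one side."""
    common = closure(r1 & r2, fds)
    return common >= r1 or common >= r2
```

For R(A, B, C) with A → B, the split {A, B} / {A, C} passes (the bridge A determines all of {A, B}), while the split {A, B} / {B, C} does not.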
Proof That BCNF Decomposition Is Lossless:
When we decompose R on the violation X → Y, we create R₁ = X ∪ Y and R₂ = X ∪ (R - Y).

The intersection is: R₁ ∩ R₂ = X. The attributes of Y - X live only in R₁, and every attribute outside X ∪ Y lives only in R₂.

Now, consider which relation X determines: since X → Y, augmentation gives X → X ∪ Y, i.e., (R₁ ∩ R₂) → R₁, which certainly covers (R₁ - R₂).

The lossless-join condition is satisfied. □
Intuitive Explanation:
When we decompose on X → Y, we're effectively saying: record each X-to-Y association exactly once in R₁, and keep a copy of X in R₂ so that every remaining tuple can point back to its Y value.
Since X → Y, there's exactly one Y value for each X value. When we join on X, we reconstruct the original pairing perfectly—no duplicates, no losses.
| Step | R₁ (X ∪ Y) | R₂ (X ∪ remaining) | R₁ ⋈ R₂ |
|---|---|---|---|
| Original R = {A, B, C} with A → B | R₁ = {A, B} | R₂ = {A, C} | Should equal R |
| Sample data: (a1, b1, c1), (a1, b1, c2), (a2, b2, c3) | R₁: (a1, b1), (a2, b2) | R₂: (a1, c1), (a1, c2), (a2, c3) | |
| Join on A | | | (a1, b1, c1), (a1, b1, c2), (a2, b2, c3) = R ✓ |
A lossy decomposition creates spurious tuples when joined—tuples that weren't in the original data. This corrupts query results and makes the database unreliable. The guarantee of lossless-join is non-negotiable for any normalization algorithm.
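We can watch spurious tuples appear with a tiny simulation of projection and natural join over dict-based tuples. The data below is my own example, chosen (unlike the table above) so that B does not determine A:

```python
def project(rows, attrs):
    """Project rows (dicts) onto `attrs`, deduplicating."""
    return {frozenset((a, r[a]) for a in attrs) for r in rows}

def natural_join(r1, r2):
    """Natural join: pair up tuples that agree on their shared attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

# A -> B holds (both A-values map to b1), but B determines nothing
rows = [{'A': 'a1', 'B': 'b1', 'C': 'c1'},
        {'A': 'a2', 'B': 'b1', 'C': 'c2'}]
original = {frozenset(r.items()) for r in rows}

# Lossless split on A -> B: the shared attribute A bridges the halves
recovered = natural_join(project(rows, 'AB'), project(rows, 'AC'))

# Lossy split sharing only B: B determines nothing, so the join over-generates
lossy = natural_join(project(rows, 'AB'), project(rows, 'BC'))
```

Here `recovered` equals the original two tuples, while `lossy` contains four tuples, including (a1, b1, c2) and (a2, b1, c1), which were never in the data.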
Let's work through a complete decomposition example, step by step, to solidify understanding of the algorithm.
Relation: R(A, B, C, D, E)
Functional Dependencies:
• A → BC
• C → D
• BD → E
Decompose R into BCNF.
Step 0: Find Candidate Keys
First, identify what the candidate keys are so we can detect violations.
Attributes never on RHS: A (A doesn't appear on any right-hand side)
Compute A⁺: start from {A}; A → BC adds B and C; C → D adds D; then BD → E adds E. So A⁺ = {A, B, C, D, E}.
A⁺ = R, so {A} is a superkey. It's also minimal (single attribute).
Candidate Key: {A}
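This closure computation is easy to verify mechanically with a standard fixpoint routine (the helper name is mine; it is repeated here so the snippet stands alone):

```python
def closure(attrs, fds):
    """Fixpoint: repeatedly fire any FD whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(frozenset('A'), frozenset('BC')),
       (frozenset('C'), frozenset('D')),
       (frozenset('BD'), frozenset('E'))]

a_plus = closure({'A'}, fds)    # {A} gains BC, then D, then E: all of R
```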
Step 1: Find BCNF Violations
Check each FD:
• A → BC: {A} is a superkey, so no violation.
• C → D: C⁺ = {C, D} ≠ R, so C is not a superkey. Violation.
• BD → E: (BD)⁺ = {B, D, E} ≠ R, so BD is not a superkey. Violation.
We have two violations. We'll address C → D first (arbitrary choice).
Step 2: First Decomposition (on C → D)
Violating FD: C → D. Decompose: R₁ = {C} ∪ {D} = {C, D} and R₂ = R - {D} = {A, B, C, E}.
Result so far: {R₁(C, D), R₂(A, B, C, E)}
Step 3: Check R₁ for BCNF
R₁(C, D) with projected FDs: C → D (and nothing else, since D⁺ = {D}).
C⁺ in R₁ = {C, D} = R₁, so C is a superkey.
R₁ is in BCNF. ✓
Step 4: Check R₂ for BCNF
R₂(A, B, C, E) with projected FDs:

We need to compute the projection properly: a projected FD may combine several original FDs. What FDs hold over {A, B, C, E}?

Using original F:
• A → BC carries over directly, since A, B, and C are all in R₂.
• A⁺ = {A, B, C, D, E} contains E, so A → E also holds within R₂ (via C → D and then BD → E), even though D itself is gone.

Projected FDs on R₂ so far: A → BCE (within R₂)
Now check BCNF: the determinant A is a superkey of R₂ (A⁺ ⊇ {A, B, C, E}), so A → BCE is not a violation. At first glance, R₂ appears to be in BCNF.
Wait—What About BD → E?
Let me reconsider. BD → E was in original F, but D is not in R₂. Does any FD involving only {A, B, C, E} violate BCNF?
We need to check if there's any non-superkey X within {A, B, C, E} that determines some Y within {A, B, C, E}.
For B: B⁺ = {B}. B alone determines nothing.

For C: C⁺ = {C, D}, which is just {C} within R₂. Trivial.

For E: E⁺ = {E}. Nothing.

For BC: (BC)⁺ = {B, C, D, E} (C → D gives D, then BD → E gives E). Within R₂ this means BC → E. But BC is not a superkey of R₂, since A ∉ (BC)⁺. This is a BCNF violation.

For BE: (BE)⁺ = {B, E}. Nothing new.

So R₂ is not in BCNF after all: the projected FD BC → E, inherited transitively from C → D and BD → E, has a non-superkey determinant. We must decompose again.

Step 5: Second Decomposition (on BC → E)
• R₃ = BC ∪ E = {B, C, E}. Here BC is a key ((BC)⁺ ∩ R₃ = R₃), and no other nontrivial projected FD holds, so R₃ is in BCNF. ✓
• R₄ = {A, B, C}. Its projected FDs are A → BC (A is a key) and nothing else ((BC)⁺ ∩ R₄ = {B, C} is trivial), so R₄ is in BCNF. ✓

Final Result: {R₁(C, D), R₃(B, C, E), R₄(A, B, C)}

All three relations are in BCNF.

Original R(A, B, C, D, E) decomposes to:
• R₁(C, D) with FD C → D, key C
• R₃(B, C, E) with FD BC → E, key BC
• R₄(A, B, C) with FD A → BC, key A

Joining R₄ with R₃ on {B, C}, and then the result with R₁ on C, reconstructs the original relation.
Sometimes, a single decomposition step isn't enough. The resulting relations may themselves violate BCNF, requiring further decomposition. Let's examine such a case.
Relation: R(A, B, C, D)
Functional Dependencies:
• AB → CD
• C → A
The candidate key is {A, B}... or is it?
Step 0: Find Candidate Keys
Attributes never on RHS: B (B never appears on right side)
Compute closures: (AB)⁺ = {A, B, C, D} = R, so AB is a candidate key. And since C → A, (BC)⁺ = {B, C, A, D} = R as well, so BC is also a candidate key.
Candidate Keys: {AB}, {BC}
Prime attributes: A, B, C (all in some candidate key) Non-prime: D
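Both candidate-key claims reduce to closure checks (a standard fixpoint sketch; the helper name is mine):

```python
def closure(attrs, fds):
    """Fixpoint attribute closure over (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(frozenset('AB'), frozenset('CD')),
       (frozenset('C'), frozenset('A'))]

ab_plus = closure(set('AB'), fds)   # AB -> CD fires directly
bc_plus = closure(set('BC'), fds)   # C -> A fires first, then AB -> CD
```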
Step 1: Check BCNF
• AB → CD: AB is a superkey, so no violation.
• C → A: C⁺ = {A, C} ≠ R, so C is not a superkey. Violation.
Step 2: Decompose on C → A: R₁ = {C} ∪ {A} = {A, C} and R₂ = R - {A} = {B, C, D}
Result: {R₁(A, C), R₂(B, C, D)}
Step 3: Check R₁(A, C)
Projected FDs: C → A, and nothing else (A⁺ = {A}, so A does not determine C). C is a key of R₁.
R₁ is in BCNF. ✓
Step 4: Check R₂(B, C, D)
Projected FDs on {B, C, D}:

From AB → CD: A ∉ R₂, so this FD does not carry over directly. Does anything else determine D?

Compute closures within R₂:
• B⁺ = {B}, D⁺ = {D}, and C⁺ = {A, C}, which is just {C} within R₂. All trivial.
• (BC)⁺ = {A, B, C, D} (C → A gives A, then AB → CD gives D). Within R₂: BC → D.
Check: Is BC a superkey of R₂? Yes: (BC)⁺ ∩ R₂ = {B, C, D} = R₂.
R₂ is in BCNF with candidate key {BC}. ✓
Final Result: {R₁(A, C), R₂(B, C, D)}
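The two pivotal facts of this example, that C → A violates BCNF in R and that BC is a key of R₂, are both closure checks (sketch; the helper name is mine):

```python
def closure(attrs, fds):
    """Fixpoint attribute closure over (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(frozenset('AB'), frozenset('CD')),
       (frozenset('C'), frozenset('A'))]

c_plus = closure({'C'}, fds)                       # {A, C}: C is not a superkey of R
bc_in_r2 = closure(set('BC'), fds) & set('BCD')    # (BC)+ restricted to R2's attributes
```

Restricting the full closure to R₂'s attributes is exactly how FD projection works: X → A holds in R₂ iff A ∈ X⁺ and both sides lie within R₂.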
A Case Requiring Multiple Iterations:
Consider R(A, B, C, D, E) with: A → B and B → C. Nothing determines D or E, so the only candidate key is {A, D, E}.
Iteration 1: Decompose on A → B (A is not a superkey, since any key must contain D and E). This yields R₁(A, B) and R₂(A, C, D, E).
Iteration 2: Check R₂(A, C, D, E). Does B → C apply directly? No, B ∉ R₂. But A → C holds transitively through B and survives the projection: A⁺ = {A, B, C}, so A → C within R₂, while A is still not a superkey of R₂ (it determines neither D nor E).
So we need more decomposition... This illustrates how projecting FDs can create new violations to handle.
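Assuming the FDs here are A → B and B → C with candidate key {A, D, E}, consistent with the discussion above, the cascade can be traced with closures (sketch; the helper name is mine):

```python
def closure(attrs, fds):
    """Fixpoint attribute closure over (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(frozenset('A'), frozenset('B')),
       (frozenset('B'), frozenset('C'))]

# Iteration 1 decomposed R(A,B,C,D,E) on A -> B into AB and ACDE
r2 = set('ACDE')
a_in_r2 = closure({'A'}, fds) & r2   # the transitive FD A -> C survives projection
# A determines C but neither D nor E, so A -> C still violates BCNF in R2
```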
Despite potentially multiple iterations, BCNF decomposition always terminates. Each decomposition reduces the size of at least one relation. Eventually, all relations are either single attributes (trivially in BCNF) or have all FDs satisfied by superkey determinants.
Visualizing the decomposition process helps understand how relations branch and when the algorithm terminates.
Tree Interpretation:
The decomposition forms a tree: the original relation is the root, each decomposition step gives a node two children (R₁ and R₂), and the leaves are the final BCNF relations.
Each path from root to leaf represents a sequence of decompositions for that portion of the schema.
Decision Points:
At each step, we choose: which violating relation to process next, and which of its violating FDs to decompose on.
Different choices may lead to different final decompositions, though all valid decompositions will be in BCNF and lossless-join.
The BCNF decomposition is not unique. Different orderings of addressing violations can produce different (but equally valid) final schemas. Choose based on semantic meaningfulness—each final relation should represent a coherent real-world concept.
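For instance (my own example, not drawn from the text above): take R(A, B, C, D) with A → B and B → C. Decomposing on B → C first ends in {BC, AB, AD}; decomposing on A → B first ends in {AB, AC, AD}. Both paths stay lossless at every step, which we can verify with closures (helper names mine):

```python
def closure(attrs, fds):
    """Fixpoint attribute closure over (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(frozenset('A'), frozenset('B')),
       (frozenset('B'), frozenset('C'))]

def is_lossless(r1, r2):
    """Shared attributes must determine all of one side."""
    common = closure(r1 & r2, fds)
    return common >= r1 or common >= r2

# Path 1: split R on B -> C, then ABD on A -> B  =>  {BC, AB, AD}
path1_ok = is_lossless(set('BC'), set('ABD')) and is_lossless(set('AB'), set('AD'))

# Path 2: split R on A -> B, then ACD on A -> C  =>  {AB, AC, AD}
path2_ok = is_lossless(set('AB'), set('ACD')) and is_lossless(set('AC'), set('AD'))
```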
When implementing BCNF decomposition in practice—whether in a normalization tool or as part of schema design—several practical considerations arise.
```
// Optimized BCNF decomposition with practical enhancements

function decomposeToBCNFOptimized(R, F):
    result = {R}
    while true:
        // Find any relation with a violation
        violated = null
        violatingFD = null
        for each Rᵢ in result:
            // Optimization: only check the given FDs, not all projected FDs
            for each (X → Y) in F where X ⊆ attributes(Rᵢ) and Y ⊆ attributes(Rᵢ):
                XClosure = computeClosureRestricted(X, F, attributes(Rᵢ))
                if XClosure ≠ attributes(Rᵢ):   // X is not a superkey
                    if not Y ⊆ X:               // non-trivial dependency
                        violated = Rᵢ
                        violatingFD = (X → Y)
                        break
            if violated: break

        if not violated:
            return result   // Done - all relations in BCNF

        // Decompose on the violating FD
        (X, Y) = violatingFD

        // Optimization: put everything X determines into R₁, not just Y;
        // this can resolve several violations in a single step
        XFullClosure = computeClosure(X, F)
        R₁Attrs = XFullClosure ∩ attributes(violated)
        R₂Attrs = X ∪ (attributes(violated) - R₁Attrs)

        result.remove(violated)
        result.add(RelationWith(R₁Attrs, projectFDsOptimized(F, R₁Attrs)))
        result.add(RelationWith(R₂Attrs, projectFDsOptimized(F, R₂Attrs)))

// Optimized FD projection - prune redundant dependencies as we go
function projectFDsOptimized(F, Attrs):
    projected = {}
    for each X ⊆ Attrs:
        closure = computeClosureRestricted(X, F, Attrs)
        for each A in (closure - X):
            if not isRedundant(X → A, projected):   // skip FDs derivable from a smaller X
                projected.add(X → A)
    return minimalCover(projected)
```

When decomposing on X → Y, consider putting ALL attributes determined by X (not just Y) into R₁. This can reduce the number of decomposition iterations by handling multiple violations in one step.
Understanding the computational complexity of BCNF decomposition helps in both theoretical analysis and practical implementation decisions.
| Component | Complexity | Notes |
|---|---|---|
| Closure computation | O(n²) per FD set | n = number of attributes; iterate until no change |
| Candidate key finding | O(2ⁿ) worst case | NP-complete; practical cases much better |
| BCNF violation check | O(|F| × n²) | Check each FD, compute closure |
| FD projection | O(2ⁿ × n²) | Check all subsets; expensive but bounded |
| Number of decompositions | O(n) | Each cuts at least one attribute; at most n-1 cuts |
| Overall algorithm | O(n × 2ⁿ × n²) = O(n³ × 2ⁿ) | Exponential in attributes |
Why Exponential Isn't Always Bad:
While the worst-case complexity is exponential in the number of attributes, practical database schemas have:
Limited attribute counts — Most relations have 5-20 attributes. 2²⁰ ≈ 1 million is manageable.
Sparse FD sets — Real schemas don't have every possible FD. The number of FDs is typically O(n).
Heuristics apply — Must-be-in-key attributes prune the search space dramatically.
Early termination — Often, the first candidate key found is sufficient; we don't need all of them.
In Practice:
For typical database schemas (fewer than 50 attributes, fewer than 100 FDs), BCNF decomposition completes in milliseconds. The exponential nature only matters for pathological cases unlikely in real-world design.
Database theory often deals with worst-case exponential algorithms that work fine in practice. BCNF decomposition is one example. Understanding the theoretical bounds helps recognize when a problem might be computationally difficult, but practical experience shows it's rarely an issue for schema design.
We've thoroughly covered the BCNF decomposition algorithm—the practical tool for transforming violated schemas into normalized ones. Let's consolidate:
• The algorithm repeatedly splits a relation on a violating FD X → Y into R₁ = X ∪ Y and R₂ = X ∪ (R - Y).
• Every split is lossless-join, because the shared attributes X functionally determine all of R₁.
• Projected FDs can surface new violations, so several iterations may be needed.
• The final decomposition is not unique; different orderings of violations yield different, equally valid schemas.
• The worst case is exponential in the number of attributes, but typical schemas normalize in milliseconds.
What's Next:
The BCNF decomposition algorithm guarantees lossless joins, but it does NOT guarantee dependency preservation. The next page explores this critical trade-off—why some functional dependencies may become uncheckable after decomposition, what this means for data integrity, and how to navigate this fundamental limitation of BCNF.
You now understand how to systematically decompose any relation into BCNF form using the standard algorithm. The next page addresses the most significant trade-off of BCNF: the potential loss of dependency preservation.