Detecting BCNF violations is only half the battle. The real work lies in eliminating violations while preserving the essential information content of the original relation. This is where the BCNF decomposition algorithm enters—a systematic procedure that transforms any relation into a collection of BCNF-compliant relations.
The decomposition algorithm is one of the most elegant results in database theory. It guarantees that we can always achieve BCNF (unlike some other normal forms with more restrictive criteria), and it provides a deterministic path from a violated schema to a normalized one. However, this power comes with trade-offs we must understand.
By the end of this page, you will master the BCNF decomposition algorithm, understand why it guarantees lossless-join decomposition, work through multiple examples of increasing complexity, and appreciate the algorithm's computational characteristics.
The BCNF decomposition algorithm is conceptually simple yet powerful. It repeatedly identifies violations and decomposes relations until no violations remain.
```
Algorithm: BCNF Decomposition
Input:  Relation R with attribute set U, functional dependency set F
Output: Set of relations {R₁, R₂, ..., Rₙ}, all in BCNF

function decomposeToBCNF(R: Relation, F: FunctionalDependencies) -> Set<Relation>:
    result = {R}    // Start with the original relation

    while exists relation Rᵢ in result that is not in BCNF:
        // Step 1: Find a BCNF-violating FD in Rᵢ:
        // X → Y where X is not a superkey of Rᵢ and Y ⊄ X
        (X → Y) = findViolatingFD(Rᵢ, projectFDs(F, Rᵢ))

        // Step 2: Decompose Rᵢ into two relations
        // R₁ = X ∪ Y (contains the violating dependency)
        // R₂ = X ∪ (attributes(Rᵢ) - Y) (remaining attributes, with X as the bridge)
        R₁ = X ∪ Y
        R₂ = X ∪ (attributes(Rᵢ) - Y)

        // Step 3: Replace Rᵢ with R₁ and R₂
        result.remove(Rᵢ)
        result.add(R₁)
        result.add(R₂)

    return result

// Helper: project FDs onto a subset of attributes
function projectFDs(F: FunctionalDependencies, R: Relation) -> FunctionalDependencies:
    projected = {}
    for each subset X of attributes(R):
        closure = computeClosure(X, F)
        for each attribute A in (closure ∩ attributes(R)):
            if A not in X:
                projected.add(X → A)
    // Return a minimal cover of the projected FDs
    return minimalCover(projected)
```

Algorithm Walkthrough:
Initialization: We start with the original relation R in our result set.
Iteration: While any relation in our result set violates BCNF:
Find a violating dependency X → Y in some relation Rᵢ
Decompose into two relations: R₁ = X ∪ Y and R₂ = X ∪ (Rᵢ - Y)
Replace the violated relation with the two new relations.
Termination: When no relation has BCNF violations, we're done.
Key Insight: Each decomposition step removes at least one attribute from consideration for the violating relation (Y - X moves to R₁), so the algorithm must terminate—we can't decompose forever.
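The steps above can be sketched as runnable Python. The set-based representation and the helper names (`closure`, `find_violation`, `decompose_bcnf`) are mine, and the subset enumeration is exponential, so treat this as an illustration rather than a production implementation:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(rel, fds):
    """Return a projected BCNF-violating FD (X, Y) within `rel`, or None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            cl = closure(x, fds)
            y = (cl & rel) - x          # nontrivial attributes X determines inside rel
            if y and not cl >= rel:     # X determines something, yet is no superkey
                return x, y
    return None

def decompose_bcnf(rel, fds):
    """Split relations on violating FDs until every fragment is in BCNF."""
    pending, done = [frozenset(rel)], []
    while pending:
        r = pending.pop()
        violation = find_violation(r, fds)
        if violation is None:
            done.append(r)
        else:
            x, y = violation
            pending.append(x | y)       # R1 = X ∪ Y
            pending.append(r - y)       # R2 = X ∪ (R - Y); X survives since X ∩ Y = ∅
    return done
```

Note that `find_violation` returns everything X determines inside the relation, not a single attribute, which quietly folds in the "full closure" optimization mentioned later on this page.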
When decomposing R based on X → Y, we create R₁ = XY and R₂ = R - (Y - X). The attribute set X appears in BOTH relations—this is essential for the lossless-join property. X serves as the "bridge" that allows us to reconstruct original tuples.
A critical property of the BCNF decomposition algorithm is that it produces lossless-join decompositions. This means the original relation can always be perfectly reconstructed by joining the decomposed relations—no information is lost, and no spurious tuples are created.
A decomposition of R into R₁ and R₂ is lossless-join with respect to F if and only if:
(R₁ ∩ R₂) → (R₁ - R₂) ∈ F⁺ OR (R₁ ∩ R₂) → (R₂ - R₁) ∈ F⁺
In words: The common attributes must functionally determine all attributes of at least one of the decomposed relations.
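This condition is mechanical to check once you have an attribute-closure routine. A minimal sketch (the function names are mine):

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary lossless-join test: the shared attributes must
    functionally determine every attribute of at least one side."""
    common = closure(r1 & r2, fds)
    return common >= r1 or common >= r2
```

For R(A, B, C) with A → B, the split {A, B} / {A, C} passes (the bridge A determines all of {A, B}), while the split {A, B} / {B, C} does not.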
Proof That BCNF Decomposition Is Lossless:
When we decompose R on the violation X → Y, we create R₁ = X ∪ Y and R₂ = X ∪ (R - Y).

The intersection is: R₁ ∩ R₂ = X. The attributes of Y - X live only in R₁, and every attribute outside X ∪ Y lives only in R₂.

Now, consider which relation X determines: since X → Y, augmentation gives X → X ∪ Y, i.e., (R₁ ∩ R₂) → R₁, which certainly covers (R₁ - R₂).

The lossless-join condition is satisfied. □
Intuitive Explanation:
When we decompose on X → Y, we're effectively saying: record each X-to-Y association exactly once in R₁, and keep a copy of X in R₂ so that every remaining tuple can point back to its Y value.
Since X → Y, there's exactly one Y value for each X value. When we join on X, we reconstruct the original pairing perfectly—no duplicates, no losses.
| Step | R₁ (X ∪ Y) | R₂ (X ∪ remaining) | R₁ ⋈ R₂ |
|---|---|---|---|
| Original R = {A, B, C} with A → B | R₁ = {A, B} | R₂ = {A, C} | Should equal R |
| Sample data: (a1, b1, c1), (a1, b1, c2), (a2, b2, c3) | R₁: (a1, b1), (a2, b2) | R₂: (a1, c1), (a1, c2), (a2, c3) | |
| Join on A | | | (a1, b1, c1), (a1, b1, c2), (a2, b2, c3) = R ✓ |
A lossy decomposition creates spurious tuples when joined—tuples that weren't in the original data. This corrupts query results and makes the database unreliable. The guarantee of lossless-join is non-negotiable for any normalization algorithm.
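We can watch spurious tuples appear with a tiny simulation of projection and natural join over dict-based tuples. The data below is my own example, chosen (unlike the table above) so that B does not determine A:

```python
def project(rows, attrs):
    """Project rows (dicts) onto `attrs`, deduplicating."""
    return {frozenset((a, r[a]) for a in attrs) for r in rows}

def natural_join(r1, r2):
    """Natural join: pair up tuples that agree on their shared attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

# A -> B holds (both A-values map to b1), but B determines nothing
rows = [{'A': 'a1', 'B': 'b1', 'C': 'c1'},
        {'A': 'a2', 'B': 'b1', 'C': 'c2'}]
original = {frozenset(r.items()) for r in rows}

# Lossless split on A -> B: the shared attribute A bridges the halves
recovered = natural_join(project(rows, 'AB'), project(rows, 'AC'))

# Lossy split sharing only B: B determines nothing, so the join over-generates
lossy = natural_join(project(rows, 'AB'), project(rows, 'BC'))
```

Here `recovered` equals the original two tuples, while `lossy` contains four tuples, including (a1, b1, c2) and (a2, b1, c1), which were never in the data.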
Let's work through a complete decomposition example, step by step, to solidify understanding of the algorithm.
Relation: R(A, B, C, D, E)
Functional Dependencies:
• A → BC
• C → D
• BD → E
Decompose R into BCNF.
Step 0: Find Candidate Keys
First, identify what the candidate keys are so we can detect violations.
Attributes never on RHS: A (A doesn't appear on any right-hand side)
Compute A⁺: start from {A}; A → BC adds B and C; C → D adds D; then BD → E adds E. So A⁺ = {A, B, C, D, E}.
A⁺ = R, so {A} is a superkey. It's also minimal (single attribute).
Candidate Key: {A}
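This closure computation is easy to verify mechanically with a standard fixpoint routine (the helper name is mine; it is repeated here so the snippet stands alone):

```python
def closure(attrs, fds):
    """Fixpoint: repeatedly fire any FD whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(frozenset('A'), frozenset('BC')),
       (frozenset('C'), frozenset('D')),
       (frozenset('BD'), frozenset('E'))]

a_plus = closure({'A'}, fds)    # {A} gains BC, then D, then E: all of R
```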
Step 1: Find BCNF Violations
Check each FD:
• A → BC: {A} is a superkey, so no violation.
• C → D: C⁺ = {C, D} ≠ R, so C is not a superkey. Violation.
• BD → E: (BD)⁺ = {B, D, E} ≠ R, so BD is not a superkey. Violation.
We have two violations. We'll address C → D first (arbitrary choice).
Step 2: First Decomposition (on C → D)
Violating FD: C → D. Decompose: R₁ = {C} ∪ {D} = {C, D} and R₂ = R - {D} = {A, B, C, E}.
Result so far: {R₁(C, D), R₂(A, B, C, E)}
Step 3: Check R₁ for BCNF
R₁(C, D) with projected FDs: C → D (and nothing else, since D⁺ = {D}).
C⁺ in R₁ = {C, D} = R₁, so C is a superkey.
R₁ is in BCNF. ✓
Step 4: Check R₂ for BCNF
R₂(A, B, C, E) with projected FDs:

We need to compute the projection properly: a projected FD may combine several original FDs. What FDs hold over {A, B, C, E}?

Using original F:
• A → BC carries over directly, since A, B, and C are all in R₂.
• A⁺ = {A, B, C, D, E} contains E, so A → E also holds within R₂ (via C → D and then BD → E), even though D itself is gone.

Projected FDs on R₂ so far: A → BCE (within R₂)
Now check BCNF: the determinant A is a superkey of R₂ (A⁺ ⊇ {A, B, C, E}), so A → BCE is not a violation. At first glance, R₂ appears to be in BCNF.
Wait—What About BD → E?
Let me reconsider. BD → E was in original F, but D is not in R₂. Does any FD involving only {A, B, C, E} violate BCNF?
We need to check if there's any non-superkey X within {A, B, C, E} that determines some Y within {A, B, C, E}.
For B: B⁺ = {B}. B alone determines nothing.

For C: C⁺ = {C, D}, which is just {C} within R₂. Trivial.

For E: E⁺ = {E}. Nothing.

For BC: (BC)⁺ = {B, C, D, E} (C → D gives D, then BD → E gives E). Within R₂ this means BC → E. But BC is not a superkey of R₂, since A ∉ (BC)⁺. This is a BCNF violation.

For BE: (BE)⁺ = {B, E}. Nothing new.

So R₂ is not in BCNF after all: the projected FD BC → E, inherited transitively from C → D and BD → E, has a non-superkey determinant. We must decompose again.

Step 5: Second Decomposition (on BC → E)
• R₃ = BC ∪ E = {B, C, E}. Here BC is a key ((BC)⁺ ∩ R₃ = R₃), and no other nontrivial projected FD holds, so R₃ is in BCNF. ✓
• R₄ = {A, B, C}. Its projected FDs are A → BC (A is a key) and nothing else ((BC)⁺ ∩ R₄ = {B, C} is trivial), so R₄ is in BCNF. ✓

Final Result: {R₁(C, D), R₃(B, C, E), R₄(A, B, C)}

All three relations are in BCNF.

Original R(A, B, C, D, E) decomposes to:
• R₁(C, D) with FD C → D, key C
• R₃(B, C, E) with FD BC → E, key BC
• R₄(A, B, C) with FD A → BC, key A

Joining R₄ with R₃ on {B, C}, and then the result with R₁ on C, reconstructs the original relation.
Sometimes, a single decomposition step isn't enough. The resulting relations may themselves violate BCNF, requiring further decomposition. Let's examine such a case.
Relation: R(A, B, C, D)
Functional Dependencies:
• AB → CD
• C → A
The candidate key is {A, B}... or is it?
Step 0: Find Candidate Keys
Attributes never on RHS: B (B never appears on right side)
Compute closures: (AB)⁺ = {A, B, C, D} = R, so AB is a candidate key. And since C → A, (BC)⁺ = {B, C, A, D} = R as well, so BC is also a candidate key.
Candidate Keys: {AB}, {BC}
Prime attributes: A, B, C (all in some candidate key) Non-prime: D
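Both candidate-key claims reduce to closure checks (a standard fixpoint sketch; the helper name is mine):

```python
def closure(attrs, fds):
    """Fixpoint attribute closure over (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(frozenset('AB'), frozenset('CD')),
       (frozenset('C'), frozenset('A'))]

ab_plus = closure(set('AB'), fds)   # AB -> CD fires directly
bc_plus = closure(set('BC'), fds)   # C -> A fires first, then AB -> CD
```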
Step 1: Check BCNF
• AB → CD: AB is a superkey, so no violation.
• C → A: C⁺ = {A, C} ≠ R, so C is not a superkey. Violation.
Step 2: Decompose on C → A: R₁ = {C} ∪ {A} = {A, C} and R₂ = R - {A} = {B, C, D}
Result: {R₁(A, C), R₂(B, C, D)}
Step 3: Check R₁(A, C)
Projected FDs: C → A, and nothing else (A⁺ = {A}, so A does not determine C). C is a key of R₁.
R₁ is in BCNF. ✓
Step 4: Check R₂(B, C, D)
Projected FDs on {B, C, D}:

From AB → CD: A ∉ R₂, so this FD does not carry over directly. Does anything else determine D?

Compute closures within R₂:
• B⁺ = {B}, D⁺ = {D}, and C⁺ = {A, C}, which is just {C} within R₂. All trivial.
• (BC)⁺ = {A, B, C, D} (C → A gives A, then AB → CD gives D). Within R₂: BC → D.
Check: Is BC a superkey of R₂? Yes: (BC)⁺ ∩ R₂ = {B, C, D} = R₂.
R₂ is in BCNF with candidate key {BC}. ✓
Final Result: {R₁(A, C), R₂(B, C, D)}
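The two pivotal facts of this example, that C → A violates BCNF in R and that BC is a key of R₂, are both closure checks (sketch; the helper name is mine):

```python
def closure(attrs, fds):
    """Fixpoint attribute closure over (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(frozenset('AB'), frozenset('CD')),
       (frozenset('C'), frozenset('A'))]

c_plus = closure({'C'}, fds)                       # {A, C}: C is not a superkey of R
bc_in_r2 = closure(set('BC'), fds) & set('BCD')    # (BC)+ restricted to R2's attributes
```

Restricting the full closure to R₂'s attributes is exactly how FD projection works: X → A holds in R₂ iff A ∈ X⁺ and both sides lie within R₂.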
A Case Requiring Multiple Iterations:
Consider R(A, B, C, D, E) with: A → B and B → C. Nothing determines D or E, so the only candidate key is {A, D, E}.
Iteration 1: Decompose on A → B (A is not a superkey, since any key must contain D and E). This yields R₁(A, B) and R₂(A, C, D, E).
Iteration 2: Check R₂(A, C, D, E). Does B → C apply directly? No, B ∉ R₂. But A → C holds transitively through B and survives the projection: A⁺ = {A, B, C}, so A → C within R₂, while A is still not a superkey of R₂ (it determines neither D nor E).
So we need more decomposition... This illustrates how projecting FDs can create new violations to handle.
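Assuming the FDs here are A → B and B → C with candidate key {A, D, E}, consistent with the discussion above, the cascade can be traced with closures (sketch; the helper name is mine):

```python
def closure(attrs, fds):
    """Fixpoint attribute closure over (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(frozenset('A'), frozenset('B')),
       (frozenset('B'), frozenset('C'))]

# Iteration 1 decomposed R(A,B,C,D,E) on A -> B into AB and ACDE
r2 = set('ACDE')
a_in_r2 = closure({'A'}, fds) & r2   # the transitive FD A -> C survives projection
# A determines C but neither D nor E, so A -> C still violates BCNF in R2
```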
Despite potentially multiple iterations, BCNF decomposition always terminates. Each decomposition reduces the size of at least one relation. Eventually, all relations are either single attributes (trivially in BCNF) or have all FDs satisfied by superkey determinants.
Visualizing the decomposition process helps understand how relations branch and when the algorithm terminates.
Tree Interpretation:
The decomposition forms a tree: the original relation is the root, each decomposition step gives a node two children (R₁ and R₂), and the leaves are the final BCNF relations.
Each path from root to leaf represents a sequence of decompositions for that portion of the schema.
Decision Points:
At each step, we choose: which violating relation to process next, and which of its violating FDs to decompose on.
Different choices may lead to different final decompositions, though all valid decompositions will be in BCNF and lossless-join.
The BCNF decomposition is not unique. Different orderings of addressing violations can produce different (but equally valid) final schemas. Choose based on semantic meaningfulness—each final relation should represent a coherent real-world concept.
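For instance (my own example, not drawn from the text above): take R(A, B, C, D) with A → B and B → C. Decomposing on B → C first ends in {BC, AB, AD}; decomposing on A → B first ends in {AB, AC, AD}. Both paths stay lossless at every step, which we can verify with closures (helper names mine):

```python
def closure(attrs, fds):
    """Fixpoint attribute closure over (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [(frozenset('A'), frozenset('B')),
       (frozenset('B'), frozenset('C'))]

def is_lossless(r1, r2):
    """Shared attributes must determine all of one side."""
    common = closure(r1 & r2, fds)
    return common >= r1 or common >= r2

# Path 1: split R on B -> C, then ABD on A -> B  =>  {BC, AB, AD}
path1_ok = is_lossless(set('BC'), set('ABD')) and is_lossless(set('AB'), set('AD'))

# Path 2: split R on A -> B, then ACD on A -> C  =>  {AB, AC, AD}
path2_ok = is_lossless(set('AB'), set('ACD')) and is_lossless(set('AC'), set('AD'))
```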
When implementing BCNF decomposition in practice—whether in a normalization tool or as part of schema design—several practical considerations arise.
```
// Optimized BCNF decomposition with practical enhancements

function decomposeToBCNFOptimized(R, F):
    result = {R}
    while true:
        // Find any relation with a violation
        violated = null
        violatingFD = null
        for each Rᵢ in result:
            // Optimization: only check the given FDs, not all projected FDs
            for each (X → Y) in F where X ⊆ attributes(Rᵢ) and Y ⊆ attributes(Rᵢ):
                XClosure = computeClosureRestricted(X, F, attributes(Rᵢ))
                if XClosure ≠ attributes(Rᵢ):   // X is not a superkey
                    if not Y ⊆ X:               // non-trivial dependency
                        violated = Rᵢ
                        violatingFD = (X → Y)
                        break
            if violated: break

        if not violated:
            return result   // Done - all relations in BCNF

        // Decompose on the violating FD
        (X, Y) = violatingFD

        // Optimization: put everything X determines into R₁, not just Y;
        // this can resolve several violations in a single step
        XFullClosure = computeClosure(X, F)
        R₁Attrs = XFullClosure ∩ attributes(violated)
        R₂Attrs = X ∪ (attributes(violated) - R₁Attrs)

        result.remove(violated)
        result.add(RelationWith(R₁Attrs, projectFDsOptimized(F, R₁Attrs)))
        result.add(RelationWith(R₂Attrs, projectFDsOptimized(F, R₂Attrs)))

// Optimized FD projection - prune redundant dependencies as we go
function projectFDsOptimized(F, Attrs):
    projected = {}
    for each X ⊆ Attrs:
        closure = computeClosureRestricted(X, F, Attrs)
        for each A in (closure - X):
            if not isRedundant(X → A, projected):   // skip FDs derivable from a smaller X
                projected.add(X → A)
    return minimalCover(projected)
```

When decomposing on X → Y, consider putting ALL attributes determined by X (not just Y) into R₁. This can reduce the number of decomposition iterations by handling multiple violations in one step.
Understanding the computational complexity of BCNF decomposition helps in both theoretical analysis and practical implementation decisions.
| Component | Complexity | Notes |
|---|---|---|
| Closure computation | O(n²) per FD set | n = number of attributes; iterate until no change |
| Candidate key finding | O(2ⁿ) worst case | NP-complete; practical cases much better |
| BCNF violation check | O(|F| × n²) | Check each FD, compute closure |
| FD projection | O(2ⁿ × n²) | Check all subsets; expensive but bounded |
| Number of decompositions | O(n) | Each cuts at least one attribute; at most n-1 cuts |
| Overall algorithm | O(n × 2ⁿ × n²) = O(n³ × 2ⁿ) | Exponential in attributes |
Why Exponential Isn't Always Bad:
While the worst-case complexity is exponential in the number of attributes, practical database schemas have:
Limited attribute counts — Most relations have 5-20 attributes. 2²⁰ ≈ 1 million is manageable.
Sparse FD sets — Real schemas don't have every possible FD. The number of FDs is typically O(n).
Heuristics apply — Must-be-in-key attributes prune the search space dramatically.
Early termination — Often, the first candidate key found is sufficient; we don't need all of them.
In Practice:
For typical database schemas (fewer than 50 attributes, fewer than 100 FDs), BCNF decomposition completes in milliseconds. The exponential nature only matters for pathological cases unlikely in real-world design.
Database theory often deals with worst-case exponential algorithms that work fine in practice. BCNF decomposition is one example. Understanding the theoretical bounds helps recognize when a problem might be computationally difficult, but practical experience shows it's rarely an issue for schema design.
We've thoroughly covered the BCNF decomposition algorithm—the practical tool for transforming violated schemas into normalized ones. Let's consolidate:
• The algorithm repeatedly splits a relation on a violating FD X → Y into R₁ = X ∪ Y and R₂ = X ∪ (R - Y).
• Every split is lossless-join, because the shared attributes X functionally determine all of R₁.
• Projected FDs can surface new violations, so several iterations may be needed.
• The final decomposition is not unique; different orderings of violations yield different, equally valid schemas.
• The worst case is exponential in the number of attributes, but typical schemas normalize in milliseconds.
What's Next:
The BCNF decomposition algorithm guarantees lossless joins, but it does NOT guarantee dependency preservation. The next page explores this critical trade-off—why some functional dependencies may become uncheckable after decomposition, what this means for data integrity, and how to navigate this fundamental limitation of BCNF.
You now understand how to systematically decompose any relation into BCNF form using the standard algorithm. The next page addresses the most significant trade-off of BCNF: the potential loss of dependency preservation.