Loading learning content...
In our previous study of Boyce-Codd Normal Form (BCNF), we established that a relation is in BCNF when every determinant is a candidate key. This elegant definition eliminates all functional dependency-based redundancy. But knowing what BCNF is differs fundamentally from knowing how to achieve it.
This page presents the BCNF decomposition algorithm—a systematic procedure for transforming any relation schema into a set of BCNF relations. Unlike theoretical definitions, this algorithm is actionable: it takes a relation and its functional dependencies as input and produces a guaranteed BCNF decomposition as output.
The algorithm we study here is not merely academic. It forms the foundation of automated database design tools, schema normalization utilities, and the mental framework that experienced database architects use when designing real-world systems.
By the end of this page, you will be able to: (1) State the formal BCNF decomposition algorithm, (2) Apply it step-by-step to any relation schema, (3) Prove that the algorithm always terminates, (4) Understand why it guarantees lossless decomposition, and (5) Implement the algorithm in practice.
Before diving into the algorithm, let us consolidate the foundational concepts that make the decomposition procedure possible.
A BCNF violation occurs when we have a functional dependency X → Y where X is NOT a superkey. This means X determines something, but X doesn't determine everything—creating the potential for redundancy. The decomposition algorithm systematically eliminates these violations.
The Central Insight:
When a relation violates BCNF due to a functional dependency X → Y (where X is not a superkey), we can decompose the relation into two smaller relations:
This split isolates the problematic dependency into its own relation while preserving the ability to reconstruct the original data through joins.
The BCNF decomposition algorithm is an iterative procedure that repeatedly identifies and eliminates BCNF violations until all resulting relations are in BCNF.
Input: A relation schema R with attribute set U and a set of functional dependencies F.
Output: A decomposition of R into relations R₁, R₂, ..., Rₙ, each in BCNF.
Procedure:
A Cleaner Formulation:
In practice, the algorithm is often stated more simply:
Given: Relation R with attributes U and FD set F
1. result := {R}
2. While some Rᵢ ∈ result violates BCNF:
2.1. Find X → Y violating BCNF in Rᵢ (X not a superkey, X → Y non-trivial)
2.2. Decompose Rᵢ into:
• Rₐ = XY (attributes of the violating FD)
• Rᵦ = Rᵢ - Y + X = Rᵢ - (Y - X) (remaining attributes plus X)
2.3. Replace Rᵢ with Rₐ and Rᵦ in result
3. Return result
The key insight is that X becomes a key in Rₐ (since X → Y and Rₐ = XY), and X serves as a foreign key in Rᵦ to reference Rₐ.
| Component | Description | Purpose |
|---|---|---|
| X → Y | The BCNF-violating functional dependency | Identifies what needs to be separated |
| Rₐ = XY | The new relation containing the violating FD | X becomes a key here, satisfying BCNF for this FD |
| Rᵦ = R - (Y - X) | Original relation minus the dependent attributes (keeping X) | Removes redundancy while keeping join capability |
| X as common attribute | X appears in both Rₐ and Rᵦ | Enables lossless join reconstruction |
Let's trace the algorithm through a comprehensive example that demonstrates multiple decomposition steps.
Relation: CourseSchedule(StudentID, CourseID, Instructor, Room, Building, Capacity)
Functional Dependencies: • F₁: {StudentID, CourseID} → Instructor • F₂: CourseID → Instructor • F₃: Room → Building • F₄: Room → Capacity • F₅: {Room, Building} → Capacity (derivable, but explicit)
Candidate Key: {StudentID, CourseID, Room}
Step 0: Initial State
result = {CourseSchedule(StudentID, CourseID, Instructor, Room, Building, Capacity)}
Step 1: Check for BCNF Violations
We examine each FD to see if its determinant is a superkey:
| FD | Determinant | Is Superkey? | BCNF Violation? |
|---|---|---|---|
| CourseID → Instructor | {CourseID} | No (doesn't determine StudentID or Room) | YES |
| Room → Building | {Room} | No (doesn't determine StudentID or CourseID) | YES |
| Room → Capacity | {Room} | No | YES |
Multiple violations exist. We pick CourseID → Instructor first.
Step 2: First Decomposition (CourseID → Instructor)
Violating FD: CourseID → Instructor
Decompose CourseSchedule into:
Now result = {R₁(CourseID, Instructor), R₂(StudentID, CourseID, Room, Building, Capacity)}
Verify R₁ is in BCNF:
Step 3: Check R₂ for BCNF Violations
R₂ = {StudentID, CourseID, Room, Building, Capacity}
FDs that apply to R₂:
Is {Room} a superkey of R₂? Room⁺ = {Room, Building, Capacity} ≠ R₂
Room → Building violates BCNF in R₂!
Step 4: Second Decomposition (Room → Building)
Actually, let's use Room → {Building, Capacity} since Room⁺ includes both.
Decompose R₂ into:
Now result = {R₁, R₃, R₄}
Verify R₃ is in BCNF:
Verify R₄ is in BCNF:
Original: CourseSchedule(StudentID, CourseID, Instructor, Room, Building, Capacity)
BCNF Decomposition: • R₁(CourseID, Instructor) — Key: CourseID • R₃(Room, Building, Capacity) — Key: Room • R₄(StudentID, CourseID, Room) — Key: {StudentID, CourseID, Room}
Each relation is now in BCNF. Redundancy has been eliminated!
The correctness of the BCNF decomposition algorithm rests on two fundamental guarantees: termination (it always finishes) and losslessness (the decomposition preserves all information). Let's examine each.
Proof of Lossless Join:
The decomposition of R into Rₐ = XY and Rᵦ = R - (Y - X) satisfies the lossless join condition.
Theorem: A decomposition of R into R₁ and R₂ is lossless if and only if:
In our decomposition:
Since (R₁ ∩ R₂) → R₁, the decomposition is lossless!
Think of it this way: X is like a foreign key in R₂ (Rᵦ) that references the primary key X in R₁ (Rₐ). When we join them back together, each tuple in R₂ matches exactly one tuple in R₁ (because X functionally determines Y). No spurious tuples can appear because the relationship is many-to-one, not many-to-many.
The basic BCNF decomposition algorithm admits several variations that can affect the final result and its properties.
| Variation | Description | Trade-offs |
|---|---|---|
| Maximal Decomposition | Use X⁺ instead of just XY for the first sub-relation | Fewer relations, but may miss opportunities for dependency preservation |
| Minimal Decomposition | Use exactly XY where Y is a single attribute | More relations, but easier to analyze and modify |
| Choice of Violating FD | Different orderings of FD selection produce different decompositions | All are correct BCNF, but may have different numbers of relations |
| Canonical Cover First | Compute canonical cover of F before decomposing | Reduces redundant work and can simplify the final schema |
The Choice Matters:
Consider a relation R(A, B, C, D) with FDs: A → B, A → C, A → D.
Approach 1 (Maximal):
Approach 2 (Minimal steps on different ordering):
The lesson: always compute the attribute closure to determine if you really have a violation. The determinant might be a superkey after all!
The BCNF decomposition algorithm is non-deterministic: different choices of which violation to address first can lead to different final decompositions. All results are valid BCNF decompositions, but they may have different numbers of relations and different dependency preservation properties.
Let's examine how to implement the BCNF decomposition algorithm programmatically. The implementation requires careful handling of attribute sets and functional dependency projection.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990
def compute_closure(attrs: set, fds: list) -> set: """Compute the closure of a set of attributes under given FDs.""" closure = attrs.copy() changed = True while changed: changed = False for lhs, rhs in fds: if lhs.issubset(closure) and not rhs.issubset(closure): closure = closure.union(rhs) changed = True return closure def is_superkey(attrs: set, relation: set, fds: list) -> bool: """Check if attrs is a superkey of the relation.""" closure = compute_closure(attrs, fds) return relation.issubset(closure) def find_bcnf_violation(relation: set, fds: list) -> tuple: """Find an FD that violates BCNF, or return None if in BCNF.""" for lhs, rhs in fds: # Check if FD is applicable to this relation if lhs.issubset(relation) and rhs.issubset(relation): # Check if it's non-trivial if not rhs.issubset(lhs): # Check if lhs is NOT a superkey if not is_superkey(lhs, relation, fds): return (lhs, rhs) return None def project_fds(fds: list, relation: set) -> list: """Project FDs onto a relation (compute FDs that apply).""" projected = [] # We need to compute the closure of each subset # This is expensive but correct for lhs, rhs in fds: if lhs.issubset(relation): closure = compute_closure(lhs, fds) new_rhs = closure.intersection(relation) - lhs if new_rhs: projected.append((lhs, new_rhs)) return projected def bcnf_decomposition(relation: set, fds: list) -> list: """ Decompose a relation into BCNF. Returns a list of relations (each a set of attributes) in BCNF. """ result = [relation] while True: # Find a relation that violates BCNF found_violation = False for i, rel in enumerate(result): violation = find_bcnf_violation(rel, fds) if violation: found_violation = True lhs, rhs = violation # Compute R1 = closure of lhs restricted to relation closure = compute_closure(lhs, fds) r1 = closure.intersection(rel) # Compute R2 = relation - (closure - lhs) r2 = rel - (closure - lhs) # Replace the violating relation result.pop(i) result.append(r1) result.append(r2) break if not found_violation: break return result # Example usageif __name__ == "__main__": # CourseSchedule example relation = {'StudentID', 'CourseID', 'Instructor', 'Room', 'Building', 'Capacity'} fds = [ ({'CourseID'}, {'Instructor'}), ({'Room'}, {'Building'}), ({'Room'}, {'Capacity'}), ] result = bcnf_decomposition(relation, fds) print("BCNF Decomposition:") for i, r in enumerate(result, 1): print(f" R{i}: {r}")The naive implementation has exponential complexity because projecting FDs correctly requires considering all subsets. In practice, optimizations like using the canonical cover and caching closures make the algorithm tractable for typical database schemas with tens of attributes.
Students and practitioners frequently make errors when applying the BCNF decomposition algorithm. Understanding these pitfalls helps avoid them.
Checking only explicit FDs:
'A → B is not a violation because I don't see A → C, A → D, etc.'
This ignores that A⁺ might include C and D through transitivity!
Always compute A⁺:
'A⁺ = {A, B, C, D} which equals the entire relation, so A IS a superkey and A → B does NOT violate BCNF.'
The BCNF decomposition algorithm is a fundamental technique in database design. Let's consolidate the essential points.
You now understand the BCNF decomposition algorithm: its formal statement, step-by-step execution, correctness guarantees, and implementation. In the next page, we'll explore the lossless guarantee in greater depth, examining the mathematical foundations and practical implications.