Loading content...
The BCNF decomposition algorithm is powerful—it always produces lossless-join decompositions in BCNF. But there's a catch: it does not always preserve dependencies. This is not a flaw in the algorithm; it's a fundamental limitation of BCNF itself.
For certain schemas, achieving BCNF necessarily sacrifices the ability to check some functional dependencies using single-table constraints. This trade-off lies at the heart of database design decisions and represents one of the most important concepts in normalization theory. Understanding when and why dependency preservation fails, and what to do about it, is essential knowledge for any database professional.
By the end of this page, you will understand what dependency preservation means, why BCNF decomposition can fail to preserve dependencies, how to detect when preservation is lost, the practical implications of non-preservation, and strategies for handling this trade-off.
Before examining why BCNF can fail to preserve dependencies, we must precisely define what dependency preservation means.
A decomposition of relation R into relations R₁, R₂, ..., Rₙ is dependency preserving with respect to FD set F if:
(πᵣ₁(F) ∪ πᵣ₂(F) ∪ ... ∪ πᵣₙ(F))⁺ = F⁺
Where πᵣᵢ(F) denotes the projection of F onto the attributes of Rᵢ.
In simpler terms: The FDs that can be checked within individual decomposed relations, taken together, should be equivalent to the original FD set.
Why Dependency Preservation Matters:
Consider a functional dependency AB → C. If A is in relation R₁ and C is in relation R₂ (with only B in both), how do we check this constraint?
With dependency preservation:
Without dependency preservation:
The Efficiency Argument:
Preserved dependencies can be enforced by:
Non-preserved dependencies require:
An unpreserved dependency isn't just inconvenient—it's a potential data integrity risk. If checking a constraint requires a join, concurrent transactions may create violations before the check completes. Without careful locking, the database can end up with invalid data.
BCNF decomposition can fail to preserve dependencies because of how it separates attributes. The algorithm splits relations based on violations, but this splitting can scatter the attributes of certain FDs across multiple resulting relations.
The Mechanism of Loss:
Consider a relation R with FDs including:
When we decompose on X → Y:
If Z and W are split such that Z is entirely in R₁ and W is entirely in R₂ (or vice versa), the FD Z → W cannot be checked in either relation alone.
Critical Insight:
The decomposition algorithm doesn't consider which FDs will be preserved—it only considers which FDs are violated. This "blind spot" can inadvertently split apart dependencies that weren't even violated to begin with.
Mathematical Formulation:
For FD Z → W to be preserved in a decomposition {R₁, R₂, ..., Rₙ}:
A dependency may appear lost but still be checkable. If Z → W is derivable from preserved dependencies via Armstrong's axioms, the constraint is implicitly enforced. We only have a real problem when some FD is not derivable from the preserved set.
Let's examine the canonical example where BCNF decomposition fails to preserve all dependencies. This example appears in virtually every database textbook for good reason—it perfectly illustrates the trade-off.
Relation: R(Student, Course, Instructor)
Semantics: • Each student-course pair has exactly one instructor • Each instructor teaches only one course
Functional Dependencies: • F1: {Student, Course} → Instructor • F2: Instructor → Course
Step 1: Analyze the Original Relation
Candidate Keys:
Both {Student, Course} and {Student, Instructor} are candidate keys.
BCNF Check:
Step 2: Decompose on the Violation
Decomposing on Instructor → Course:
Wait, let me recalculate R₂:
Decomposition: {R₁(Instructor, Course), R₂(Student, Instructor)}
Step 3: Verify BCNF of Decomposed Relations
R₁(Instructor, Course):
R₂(Student, Instructor):
Step 4: Check Dependency Preservation
Original FDs to preserve:
Preserved FDs:
Is {Student, Course} → Instructor preserved?
To check {Student, Course} → Instructor, we need both Student and Course in the same relation. They are NOT in the same relation!
The dependency {Student, Course} → Instructor is NOT preserved.
The FD {Student, Course} → Instructor cannot be verified in either R₁ or R₂: • R₁ has Instructor and Course but not Student • R₂ has Student and Instructor but not Course
Enforcing this constraint now requires joining R₁ and R₂.
The Semantic Meaning of the Lost Dependency:
The lost dependency {Student, Course} → Instructor means: "For any given student in a given course, there's exactly one instructor."
Without this constraint enforced, the database could contain:
This would violate the business rule that Alice has only one instructor for CS101.
Why Did This Happen?
The decomposition separated Student (in R₂) from Course (in R₁). Since the FD {Student, Course} → Instructor requires both Student and Course as the determinant, they must be together to check the constraint. The algorithm's focus on eliminating the Instructor → Course violation blindly destroyed this capability.
Before committing to a BCNF decomposition, you should verify whether all dependencies are preserved. Here's a systematic approach to detection.
1234567891011121314151617181920212223242526272829303132333435363738394041424344
Algorithm: Check Dependency PreservationInput: Original FDs F, Decomposition {R₁, R₂, ..., Rₙ}Output: true if all FDs are preserved, false otherwise function isDependencyPreserving(F, decomposition): // Collect all preserved FDs preservedFDs = {} for each Rᵢ in decomposition: projectedFDs = projectFDs(F, attributes(Rᵢ)) preservedFDs = preservedFDs ∪ projectedFDs // Check if preserved FDs are equivalent to original // We need: (preservedFDs)⁺ = F⁺ // Equivalently: Every FD in F must be derivable from preservedFDs for each (X → Y) in F: // Check if X → Y is derivable from preservedFDs XClosure = computeClosure(X, preservedFDs) if not Y.isSubsetOf(XClosure): return false // This FD is not preserved! return true // More efficient: Check each FD individually using this algorithmfunction isFDPreserved(X → Y, decomposition, F): // Can we derive X → Y from projected FDs only? // Use modified closure computation that only applies FDs // that are entirely within some Rᵢ result = X changed = true while changed: changed = false for each Rᵢ in decomposition: // Consider only the portion of result in Rᵢ Z = result ∩ attributes(Rᵢ) ZClosure = computeClosure(Z, projectFDs(F, attributes(Rᵢ))) if not ZClosure.isSubsetOf(result): result = result ∪ ZClosure changed = true return Y.isSubsetOf(result)Algorithm Explanation:
Approach 1: Full Preservation Check
Approach 2: Per-FD Check (More Efficient)
The second approach is more efficient because it doesn't require computing all projected FDs upfront—only checking whether specific FDs are derivable.
For quick manual analysis: If an FD X → Y has X and Y attributes all in the same relation Rᵢ after decomposition, it's definitely preserved. If attributes are split across relations, further analysis is needed—it might still be preserved through derivation.
The dependency preservation issue isn't just an algorithm limitation—it's a fundamental theoretical barrier. For certain schemas, there is no decomposition that achieves both BCNF and dependency preservation.
There exist relation schemas R with functional dependency sets F such that no decomposition of R can simultaneously:
This is a proven impossibility, not an algorithm deficiency.
Proof Sketch:
Consider R(A, B, C) with F = {A → B, B → C, C → A}.
This creates a cycle of dependencies with no clear key—actually, each attribute determines everything:
So {A}, {B}, and {C} are all candidate keys. Each dependency has a candidate key (hence superkey) on the left.
This relation is already in BCNF!
But this is atypical. Let's construct a true impossibility case:
R(J, K, L) with F = {JK → L, L → K}
Candidate Keys:
BCNF Check:
Decompose on L → K:
Check Preservation:
Is JK → L derivable from projected FDs?
Can we derive JK → L?
JK → L is NOT preserved.
Can We Do Better?
No. Any BCNF decomposition of R(J, K, L) with {JK → L, L → K} must separate J and K (to put L → K in one relation), which destroys the JK → L dependency.
This is the impossibility: the structure of the FDs forces a conflict between BCNF and dependency preservation.
Third Normal Form always has a dependency-preserving decomposition (via the 3NF synthesis algorithm). BCNF's stricter constraints create cases where no dependency-preserving decomposition exists. This is why 3NF remains relevant despite BCNF's theoretical superiority.
When dependency preservation is lost, database designers face practical challenges. Understanding these implications helps inform the choice between BCNF and alternatives like 3NF.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950
-- Example: Enforcing {Student, Course} → Instructor across decomposed tables-- Tables: InstructorCourse(Instructor, Course), StudentInstructor(Student, Instructor) -- Option 1: Trigger-based enforcementCREATE OR REPLACE FUNCTION check_student_course_instructor()RETURNS TRIGGER AS $$BEGIN -- When inserting into StudentInstructor, verify no conflict exists IF EXISTS ( SELECT 1 FROM StudentInstructor si1 JOIN InstructorCourse ic1 ON si1.Instructor = ic1.Instructor JOIN InstructorCourse ic2 ON ic2.Course = ic1.Course JOIN StudentInstructor si2 ON si2.Instructor = ic2.Instructor WHERE si1.Student = NEW.Student AND si2.Student = NEW.Student AND si1.Instructor <> si2.Instructor ) THEN RAISE EXCEPTION 'Violation: Student % would have multiple instructors for same course', NEW.Student; END IF; RETURN NEW;END;$$ LANGUAGE plpgsql; CREATE TRIGGER enforce_sc_instructor BEFORE INSERT OR UPDATE ON StudentInstructor FOR EACH ROW EXECUTE FUNCTION check_student_course_instructor(); -- Option 2: Materialized view with unique constraintCREATE MATERIALIZED VIEW StudentCourseInstructor ASSELECT si.Student, ic.Course, si.InstructorFROM StudentInstructor siJOIN InstructorCourse ic ON si.Instructor = ic.Instructor; -- Refresh after each relevant transaction-- Check uniqueness violation by querying the view -- Option 3: Application-level check (pseudo-code)/*function addStudentInstructor(student, instructor): course = query("SELECT Course FROM InstructorCourse WHERE Instructor = ?", instructor) existingInstructor = query( "SELECT si.Instructor FROM StudentInstructor si JOIN InstructorCourse ic ON si.Instructor = ic.Instructor WHERE si.Student = ? AND ic.Course = ?", student, course ) if existingInstructor and existingInstructor != instructor: throw "Constraint violation: Student already has different instructor for this course" insert(student, instructor)*/Before accepting a BCNF decomposition with lost dependencies, estimate: (1) How often will the constraint be tested? (2) What's the cost of enforcement overhead? (3) What's the cost of potential violations? If constraints are rarely checked or violations have low impact, BCNF may still be preferred.
When BCNF decomposition loses dependencies, designers face a choice: accept the loss and stay in BCNF, or retreat to 3NF (which always preserves dependencies). Here's a framework for making this decision.
| Factor | Favors BCNF | Favors 3NF |
|---|---|---|
| Update frequency on redundant data | Low (redundancy cost is minimal) | High (update anomalies are real risk) |
| Constraint enforcement complexity | Acceptable complexity/resources available | Must use simple single-table constraints |
| Read vs write ratio | Write-heavy (reduce redundancy) | Read-heavy (can tolerate redundancy) |
| Storage costs | Storage is expensive | Storage is cheap |
| Data integrity criticality | Lower stakes; some violations acceptable | High stakes; violations unacceptable |
| Application architecture | Single app, can enforce at app level | Multiple apps, need DB-level enforcement |
| Concurrency requirements | Low concurrency, can afford locking | High concurrency, need fast constraints |
Decision Flowchart:
Does BCNF decomposition preserve all dependencies?
How critical are the non-preserved dependencies?
Can you implement cross-table enforcement reliably?
Is the 3NF redundancy tolerable?
The Pragmatic View:
In practice, many database designers:
There's no universally "correct" choice—it depends on specific requirements.
Sometimes the answer is neither pure BCNF nor 3NF. You might use BCNF for some relations, 3NF for others, or even denormalize certain areas. Schema design is ultimately about optimizing for your specific workload and constraints.
Beyond the simple BCNF-or-3NF choice, several alternative strategies can help manage the dependency preservation trade-off.
Whatever strategy you choose, document it thoroughly. Future maintainers need to understand: (1) Which normal form each table is in, (2) What constraints are enforced where, (3) What invariants must be maintained by application code. Undocumented schema design decisions become landmines.
We've explored the fundamental tension between BCNF's strictness and the practical need for dependency preservation. This trade-off is one of the most important concepts in database design. Let's consolidate:
Congratulations! You've completed the BCNF module. You now understand BCNF's definition, its relationship to 3NF, how to identify violations, the decomposition algorithm, and the crucial dependency preservation trade-off. You're equipped to make informed normalization decisions in real database design scenarios.
Module Summary:
Across five comprehensive pages, we've covered:
This knowledge positions you to engage deeply with normalization theory and apply it effectively in practice. The next module in this chapter explores BCNF decomposition in greater depth, with additional algorithms and edge cases.