Bcnf - Learning Module | OneNoughtOne

Dependency Preservation Issue

The Trade-Off at the Heart of BCNF

The BCNF decomposition algorithm is powerful—it always produces lossless-join decompositions in BCNF. But there's a catch: it does not always preserve dependencies. This is not a flaw in the algorithm; it's a fundamental limitation of BCNF itself.

For certain schemas, achieving BCNF necessarily sacrifices the ability to check some functional dependencies using single-table constraints. This trade-off lies at the heart of database design decisions and represents one of the most important concepts in normalization theory. Understanding when and why dependency preservation fails, and what to do about it, is essential knowledge for any database professional.

What You Will Learn

By the end of this page, you will understand what dependency preservation means, why BCNF decomposition can fail to preserve dependencies, how to detect when preservation is lost, the practical implications of non-preservation, and strategies for handling this trade-off.

What Is Dependency Preservation?

Before examining why BCNF can fail to preserve dependencies, we must precisely define what dependency preservation means.

Dependency Preservation Definition

A decomposition of relation R into relations R₁, R₂, ..., Rₙ is dependency preserving with respect to FD set F if:

(πᵣ₁(F) ∪ πᵣ₂(F) ∪ ... ∪ πᵣₙ(F))⁺ = F⁺

Where πᵣᵢ(F) denotes the projection of F onto the attributes of Rᵢ: all FDs X → Y in F⁺ with X ∪ Y ⊆ attributes(Rᵢ).

In simpler terms: The FDs that can be checked within individual decomposed relations, taken together, should be equivalent to the original FD set.

Why Dependency Preservation Matters:

Consider a functional dependency AB → C. If A is in relation R₁ and C is in relation R₂ (with only B in both), how do we check this constraint?

With dependency preservation:

The FD AB → C can be checked within a single relation
INSERT/UPDATE operations can validate the constraint immediately
A simple UNIQUE constraint or trigger on one table suffices

Without dependency preservation:

The FD AB → C spans multiple relations
Checking requires joining R₁ and R₂
Every INSERT/UPDATE must perform a multi-table check
This is slower, more complex, and error-prone

The Efficiency Argument:

Preserved dependencies can be enforced by:

Unique indexes within a single table
CHECK constraints
Simple triggers

Non-preserved dependencies require:

Multi-table triggers
Application-level enforcement
Periodic batch verification
Or accepting that the constraint may be violated
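As a small illustration of the single-table case, the sketch below enforces AB → C with one UNIQUE constraint. It uses Python's built-in sqlite3 module; the table name r1, the column names, and the helper try_insert are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table r1(a, b, c): all attributes of AB -> C sit in one table,
# so a UNIQUE constraint on (a, b) is enough to enforce the FD.
conn.execute(
    "CREATE TABLE r1 (a TEXT NOT NULL, b TEXT NOT NULL, c TEXT NOT NULL, UNIQUE (a, b))"
)

def try_insert(a, b, c):
    """Return True if the row is accepted, False if the constraint rejects it."""
    try:
        conn.execute("INSERT INTO r1 VALUES (?, ?, ?)", (a, b, c))
        return True
    except sqlite3.IntegrityError:
        return False

first = try_insert("a1", "b1", "c1")      # accepted
other_key = try_insert("a1", "b2", "c1")  # accepted: different (a, b) pair
conflict = try_insert("a1", "b1", "c2")   # rejected: same (a, b), different c
```

The database rejects the conflicting row at insert time, with no join and no trigger involved.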

The Integrity Risk

An unpreserved dependency isn't just inconvenient—it's a potential data integrity risk. If checking a constraint requires a join, concurrent transactions may create violations before the check completes. Without careful locking, the database can end up with invalid data.

Why BCNF Can Fail to Preserve Dependencies

BCNF decomposition can fail to preserve dependencies because of how it separates attributes. The algorithm splits relations based on violations, but this splitting can scatter the attributes of certain FDs across multiple resulting relations.

The Mechanism of Loss:

Consider a relation R with FDs including:

Violating FD: X → Y (where X is not a superkey)
Another FD: Z → W (where Z and W may overlap with X and Y)

When we decompose on X → Y:

R₁ = X ∪ Y
R₂ = X ∪ (R - Y)

If Z and W are split such that Z is entirely in R₁ and W is entirely in R₂ (or vice versa), the FD Z → W cannot be checked in either relation alone.

Critical Insight:

The decomposition algorithm doesn't consider which FDs will be preserved—it only considers which FDs are violated. This "blind spot" can inadvertently split apart dependencies that weren't even violated to begin with.

Mathematical Formulation:

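An FD Z → W can be checked directly in a decomposition {R₁, R₂, ..., Rₙ} when:

There exists some Rᵢ such that Z ∪ W ⊆ attributes(Rᵢ)
If no such Rᵢ exists, Z → W is preserved only if it can be derived from the FDs that do fit inside individual relations; otherwise, checking it requires joining multiple relations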

Not All Lost Dependencies Are Problems

A dependency may appear lost but still be checkable. If Z → W is derivable from preserved dependencies via Armstrong's axioms, the constraint is implicitly enforced. We only have a real problem when some FD is not derivable from the preserved set.
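A minimal sketch of this situation, assuming a hypothetical schema R(A, B, C) with FDs A → B, B → C, and A → C, decomposed into R₁(A, B) and R₂(B, C). The attributes of A → C are split across the two relations, yet the FD follows from the two preserved ones. The closure helper is the standard attribute-closure loop, written here for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# FDs that fit inside a single relation of the decomposition.
preserved = [
    (frozenset("A"), frozenset("B")),  # checkable inside R1(A, B)
    (frozenset("B"), frozenset("C")),  # checkable inside R2(B, C)
]

# A -> C has A in R1 and C in R2, yet it follows by transitivity.
a_closure = closure({"A"}, preserved)
a_to_c_is_implied = "C" in a_closure
```

Because C appears in the closure of {A}, enforcing the two local FDs is enough to guarantee A → C.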

The Classic Example

Let's examine the canonical example where BCNF decomposition fails to preserve all dependencies. This example appears in virtually every database textbook for good reason—it perfectly illustrates the trade-off.

The Example: Student-Course-Instructor

Relation: R(Student, Course, Instructor)

Semantics:
•Each student-course pair has exactly one instructor
•Each instructor teaches only one course

Functional Dependencies:
•F1: {Student, Course} → Instructor
•F2: Instructor → Course

Step 1: Analyze the Original Relation

Candidate Keys:

{Student, Course}⁺ = {Student, Course, Instructor} = R ✓
{Student, Instructor}⁺: Instructor → Course gives {Student, Instructor, Course} = R ✓

Both {Student, Course} and {Student, Instructor} are candidate keys.

BCNF Check:

F1: {Student, Course} → Instructor. SC is a candidate key. ✓
F2: Instructor → Course. Instructor⁺ = {Instructor, Course} ≠ R. Violation!
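The key search and the BCNF check above can be reproduced mechanically. The following is a brute-force sketch suitable only for small schemas; the function names are illustrative, and bcnf_violations inspects only the FDs listed in F rather than every FD in F⁺.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

R = frozenset({"Student", "Course", "Instructor"})
F = [
    (frozenset({"Student", "Course"}), frozenset({"Instructor"})),
    (frozenset({"Instructor"}), frozenset({"Course"})),
]

def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            candidate = frozenset(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= relation:
                keys.append(candidate)
    return keys

def bcnf_violations(relation, fds):
    """Non-trivial FDs in fds whose left side is not a superkey of relation."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= relation
    ]

keys = candidate_keys(R, F)
violations = bcnf_violations(R, F)
```

Here keys contains {Student, Course} and {Student, Instructor}, and violations contains only Instructor → Course, matching the manual analysis.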

Step 2: Decompose on the Violation

Decomposing on Instructor → Course:

R₁ = X ∪ Y = {Instructor, Course}
R₂ = R - (Y - X), with X = {Instructor} and Y = {Course}:

Original attributes: {Student, Course, Instructor}
Y - X = {Course}
R₂ = {Student, Course, Instructor} - {Course} = {Student, Instructor}

Decomposition: {R₁(Instructor, Course), R₂(Student, Instructor)}
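The split itself is a two-line set computation. A sketch, with decompose_on as a hypothetical helper name:

```python
def decompose_on(relation, lhs, rhs):
    """Split relation on the violating FD lhs -> rhs.

    R1 = lhs | rhs and R2 = relation - (rhs - lhs).
    """
    r1 = frozenset(lhs | rhs)
    r2 = frozenset(relation - (rhs - lhs))
    return r1, r2

R = frozenset({"Student", "Course", "Instructor"})
r1, r2 = decompose_on(R, frozenset({"Instructor"}), frozenset({"Course"}))
shared = r1 & r2  # attributes the two relations have in common
```

The shared attribute set is {Instructor}, which is a key of R₁; that is what makes the split lossless.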

Step 3: Verify BCNF of Decomposed Relations

R₁(Instructor, Course):

Projected FD: Instructor → Course
Instructor⁺ in R₁ = {Instructor, Course} = R₁. Superkey. ✓ BCNF.

R₂(Student, Instructor):

What FDs apply?
{Student, Course} → Instructor? Course ∉ R₂, so no.
Instructor → Course? Course ∉ R₂, so no.
Any FD within {Student, Instructor}? Neither determines the other in general.
No non-trivial FDs in R₂. The key is {Student, Instructor} (the whole relation).
✓ BCNF by default (no FDs to violate).

Step 4: Check Dependency Preservation

Original FDs to preserve:

{Student, Course} → Instructor
Instructor → Course

Preserved FDs:

From R₁: Instructor → Course ✓
From R₂: None (no non-trivial FDs)

Is {Student, Course} → Instructor preserved?

Student is in R₂
Course is in R₁
Instructor is in both R₁ and R₂

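To check {Student, Course} → Instructor directly, we need Student, Course, and Instructor in the same relation. They are NOT in the same relation! The FD cannot be derived either: the only projected FD is Instructor → Course, and the closure of {Student, Course} under it is just {Student, Course}, which does not contain Instructor.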

The dependency {Student, Course} → Instructor is NOT preserved.

Dependency Loss Confirmed

The FD {Student, Course} → Instructor cannot be verified in either R₁ or R₂:
•R₁ has Instructor and Course but not Student
•R₂ has Student and Instructor but not Course

Enforcing this constraint now requires joining R₁ and R₂.

The Semantic Meaning of the Lost Dependency:

The lost dependency {Student, Course} → Instructor means: "For any given student in a given course, there's exactly one instructor."

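Without this constraint enforced, the database could contain:

R₁: (Dr.Smith, CS101) and (Dr.Jones, CS101)
R₂: (Alice, Dr.Smith) and (Alice, Dr.Jones)

Each table is valid on its own, but joining R₂ with R₁ yields both (Alice, CS101, Dr.Smith) and (Alice, CS101, Dr.Jones).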

This would violate the business rule that Alice has only one instructor for CS101.
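The scenario can be replayed with plain Python sets. The rows below are hypothetical sample data; the point is that the local FD holds in R₁ while the joined result breaks {Student, Course} → Instructor.

```python
# Hypothetical rows; each table on its own satisfies its local constraints.
r1 = {("Dr.Smith", "CS101"), ("Dr.Jones", "CS101")}  # (Instructor, Course)
r2 = {("Alice", "Dr.Smith"), ("Alice", "Dr.Jones")}  # (Student, Instructor)

# Instructor -> Course holds in r1: every instructor appears with one course.
instructors = [instructor for instructor, _ in r1]
local_fd_holds = len(instructors) == len(set(instructors))

# Natural join on Instructor rebuilds (Student, Course, Instructor) rows.
joined = {
    (student, course, instructor)
    for student, instructor in r2
    for r1_instructor, course in r1
    if instructor == r1_instructor
}

# Group instructors by (Student, Course) to expose the violation.
by_pair = {}
for student, course, instructor in joined:
    by_pair.setdefault((student, course), set()).add(instructor)
violations = {pair: names for pair, names in by_pair.items() if len(names) > 1}
```

The only entry in violations is the pair (Alice, CS101), mapped to both instructors.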

Why Did This Happen?

The decomposition separated Student (in R₂) from Course (in R₁). Since the FD {Student, Course} → Instructor requires both Student and Course as the determinant, they must be together to check the constraint. The algorithm's focus on eliminating the Instructor → Course violation blindly destroyed this capability.

Detecting Dependency Loss

Before committing to a BCNF decomposition, you should verify whether all dependencies are preserved. Here's a systematic approach to detection.

check_preservation.pseudo
Algorithm: Check Dependency Preservation
Input: Original FDs F, Decomposition {R₁, R₂, ..., Rₙ}
Output: true if all FDs are preserved, false otherwise
 
function isDependencyPreserving(F, decomposition):
    // Collect all preserved FDs
    preservedFDs = {}
    for each Rᵢ in decomposition:
        projectedFDs = projectFDs(F, attributes(Rᵢ))
        preservedFDs = preservedFDs ∪ projectedFDs
    
    // Check if preserved FDs are equivalent to original
    // We need: (preservedFDs)⁺ = F⁺
    // Equivalently: Every FD in F must be derivable from preservedFDs
    
    for each (X → Y) in F:
        // Check if X → Y is derivable from preservedFDs
        XClosure = computeClosure(X, preservedFDs)
        if not Y.isSubsetOf(XClosure):
            return false  // This FD is not preserved!
    
    return true
 
// More efficient: Check each FD individually using this algorithm
function isFDPreserved(X → Y, decomposition, F):
    // Can we derive X → Y from projected FDs only?
    // Use modified closure computation that only applies FDs
    // that are entirely within some Rᵢ
    
    result = X
    changed = true
    
    while changed:
        changed = false
        for each Rᵢ in decomposition:
            // Consider only the portion of result in Rᵢ
            Z = result ∩ attributes(Rᵢ)
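            // Close Z under the full FD set F and keep only attributes of Rᵢ.
            // This equals the closure under the FDs projected onto Rᵢ,
            // without ever computing the projection.
            ZClosure = computeClosure(Z, F) ∩ attributes(Rᵢ)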
            
            if not ZClosure.isSubsetOf(result):
                result = result ∪ ZClosure
                changed = true
    
    return Y.isSubsetOf(result)

Algorithm Explanation:

Approach 1: Full Preservation Check

Compute projected FDs for each relation in the decomposition
Combine all projected FDs into one set
For each original FD, check if it's derivable from the combined set
If any FD is not derivable, preservation fails

Approach 2: Per-FD Check (More Efficient)

For each FD X → Y in F, run a modified closure algorithm
Start with X
Iteratively expand using FDs that are entirely within some relation
If Y can be reached, the FD is preserved

The second approach is more efficient because it doesn't require computing all projected FDs upfront—only checking whether specific FDs are derivable.
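A runnable Python sketch of the per-FD check, applied to the Student-Course-Instructor decomposition. It is one possible rendering of the pseudocode, not a reference implementation: it closes each intersection under the full FD set and then restricts to the relation's attributes, which avoids computing any projection.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_fd_preserved(lhs, rhs, decomposition, fds):
    """True if lhs -> rhs follows from the FDs projected onto each relation."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # Close the part of result inside ri, keep only ri's attributes.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

F = [
    (frozenset({"Student", "Course"}), frozenset({"Instructor"})),
    (frozenset({"Instructor"}), frozenset({"Course"})),
]
decomposition = [
    frozenset({"Instructor", "Course"}),
    frozenset({"Student", "Instructor"}),
]

lost = [(lhs, rhs) for lhs, rhs in F if not is_fd_preserved(lhs, rhs, decomposition, F)]
```

For this input, lost contains exactly one FD, {Student, Course} → Instructor.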

Quick Check Heuristic

For quick manual analysis: If an FD X → Y has X and Y attributes all in the same relation Rᵢ after decomposition, it's definitely preserved. If attributes are split across relations, further analysis is needed—it might still be preserved through derivation.

The Impossibility Theorem

The dependency preservation issue isn't just an algorithm limitation—it's a fundamental theoretical barrier. For certain schemas, there is no decomposition that achieves both BCNF and dependency preservation.

The Impossibility Theorem

There exist relation schemas R with functional dependency sets F such that no decomposition of R can simultaneously:

Be in BCNF
Be lossless-join
Be dependency-preserving

This is a proven impossibility, not an algorithm deficiency.

Proof Sketch:

Not every tangled FD set causes trouble. Consider R(A, B, C) with F = {A → B, B → C, C → A}. The dependencies form a cycle, and each attribute determines everything:

A⁺ = {A, B, C}
B⁺ = {B, C, A}
C⁺ = {C, A, B}

So {A}, {B}, and {C} are all candidate keys. Each dependency has a candidate key (hence a superkey) on the left, so this relation is already in BCNF and nothing needs to be decomposed.

The impossibility arises with a different structure:

R(J, K, L) with F = {JK → L, L → K}

Candidate Keys:

(JK)⁺ = {J, K, L} = R. {JK} is a candidate key.
(JL)⁺: L → K gives {J, L, K} = R. {JL} is a candidate key.

BCNF Check:

JK → L: JK is a candidate key. ✓
L → K: L⁺ = {L, K} ≠ R. L is NOT a superkey. Violation!

Decompose on L → K:

R₁ = {L, K}
R₂ = {J, L}

Check Preservation:

L → K: Both L and K are in R₁. ✓ Preserved.
JK → L: J is in R₂, K is in R₁, L is in both. Not in same relation.

Is JK → L derivable from projected FDs?

From R₁: L → K
From R₂: No non-trivial FDs (neither J nor L determines the other)

Can we derive JK → L?

Start with {J, K}
In R₂: Only J is there; nothing new derived.
In R₁: Only K is there; nothing new derived (L → K doesn't help).
Cannot reach L from {J, K} using only projected FDs.

JK → L is NOT preserved.

Can We Do Better?

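No. Because of L → K, any relation that contains all of J, K, and L violates BCNF, so every relation in a BCNF decomposition holds at most two of the three attributes. The only non-trivial FD that fits inside such a relation is L → K, and JK → L cannot be derived from it.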

This is the impossibility: the structure of the FDs forces a conflict between BCNF and dependency preservation.
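The argument can be checked by brute force. The sketch below projects F onto every two-attribute subset of {J, K, L}, pools the resulting FDs, and tests whether JK → L follows. The helper project is an illustrative, exponential-time projection that is only sensible for tiny schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project(fds, attrs):
    """Non-trivial FDs implied by fds that mention only attributes in attrs."""
    projected = []
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            lhs = frozenset(combo)
            rhs = frozenset(closure(lhs, fds) & attrs) - lhs
            if rhs:
                projected.append((lhs, rhs))
    return projected

R = frozenset("JKL")
F = [(frozenset("JK"), frozenset("L")), (frozenset("L"), frozenset("K"))]

# R itself is not in BCNF, so a BCNF decomposition uses only proper subsets
# of R. Every proper subset lies inside some two-attribute subset.
available = []
for pair in combinations(sorted(R), 2):
    available.extend(project(F, frozenset(pair)))

jk_closure = closure({"J", "K"}, available)
```

The pooled set available contains only L → K, and jk_closure stays at {J, K}, so L is never reached.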

Why 3NF Escapes This Issue

Third Normal Form always has a dependency-preserving decomposition (via the 3NF synthesis algorithm). BCNF's stricter constraints create cases where no dependency-preserving decomposition exists. This is why 3NF remains relevant despite BCNF's theoretical superiority.
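For comparison, here is a simplified sketch of 3NF synthesis applied to the Student-Course-Instructor schema. It assumes the FD set passed in is already a minimal cover (true for F1 and F2 here) and skips the cover-minimization step of the full textbook procedure; the function name synthesize_3nf is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def synthesize_3nf(relation, minimal_cover):
    """3NF synthesis, assuming minimal_cover is already a minimal cover."""
    # One schema per FD in the cover.
    schemas = []
    for lhs, rhs in minimal_cover:
        schema = frozenset(lhs | rhs)
        if schema not in schemas:
            schemas.append(schema)
    # Drop schemas strictly contained in another schema.
    schemas = [s for s in schemas if not any(s < other for other in schemas)]
    # For a lossless join, some schema must contain a key of the relation.
    if not any(closure(s, minimal_cover) >= relation for s in schemas):
        key = set(relation)
        for attr in sorted(relation):
            if closure(key - {attr}, minimal_cover) >= relation:
                key.discard(attr)
        schemas.append(frozenset(key))
    return schemas

R = frozenset({"Student", "Course", "Instructor"})
F = [
    (frozenset({"Student", "Course"}), frozenset({"Instructor"})),
    (frozenset({"Instructor"}), frozenset({"Course"})),
]
schemas = synthesize_3nf(R, F)
```

The result is the single schema {Student, Course, Instructor}: the original relation is already in 3NF, so both FDs stay inside one table, at the price of repeating each instructor's course once per student.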

Practical Implications

When dependency preservation is lost, database designers face practical challenges. Understanding these implications helps inform the choice between BCNF and alternatives like 3NF.

Implications of Lost Dependencies

•Cross-Table Constraint Enforcement — Triggers or stored procedures must enforce constraints that span tables. These are more complex to write, test, and maintain than single-table constraints.
•Performance Overhead — Checking constraints via joins is slower than checking within a single table, especially for high-volume transactional systems.
•Concurrency Challenges — Without careful locking, concurrent transactions may create violations between constraint check and commit. Serialization or complex isolation may be needed.
•Application-Layer Burden — Some organizations push constraint enforcement to application code. This risks inconsistency if multiple applications access the database.
•Testing Complexity — Validating that constraints are properly enforced across tables requires more sophisticated testing strategies.

enforce_cross_table_fd.sql
-- Example: Enforcing {Student, Course} → Instructor across decomposed tables
-- Tables: InstructorCourse(Instructor, Course), StudentInstructor(Student, Instructor)
 
-- Option 1: Trigger-based enforcement
CREATE OR REPLACE FUNCTION check_student_course_instructor()
RETURNS TRIGGER AS $$
BEGIN
    -- Reject the new row if this student already has a different instructor
    -- who teaches the same course as NEW.Instructor.
    -- Note: on UPDATE the row being changed is still visible to this query,
    -- so a production version would also exclude the OLD row.
    IF EXISTS (
        SELECT 1
        FROM StudentInstructor si
        JOIN InstructorCourse ic_old ON ic_old.Instructor = si.Instructor
        JOIN InstructorCourse ic_new ON ic_new.Course = ic_old.Course
        WHERE si.Student = NEW.Student
          AND ic_new.Instructor = NEW.Instructor
          AND si.Instructor <> NEW.Instructor
    ) THEN
        RAISE EXCEPTION 'Violation: Student % would have multiple instructors for same course', NEW.Student;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER enforce_sc_instructor
    BEFORE INSERT OR UPDATE ON StudentInstructor
    FOR EACH ROW EXECUTE FUNCTION check_student_course_instructor();
 
-- Option 2: Materialized view with unique constraint
CREATE MATERIALIZED VIEW StudentCourseInstructor AS
SELECT si.Student, ic.Course, si.Instructor
FROM StudentInstructor si
JOIN InstructorCourse ic ON si.Instructor = ic.Instructor;
 
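-- PostgreSQL does not accept constraints on a materialized view, but it does
-- accept a unique index; a refresh that would produce duplicate
-- (Student, Course) pairs then fails. The index name is illustrative.
CREATE UNIQUE INDEX student_course_uniq
    ON StudentCourseInstructor (Student, Course);

-- Violations surface only at refresh time, so refresh after each relevant transaction:
-- REFRESH MATERIALIZED VIEW StudentCourseInstructor;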
 
-- Option 3: Application-level check (pseudo-code)
/*
function addStudentInstructor(student, instructor):
    course = query("SELECT Course FROM InstructorCourse WHERE Instructor = ?", instructor)
    existingInstructor = query(
        "SELECT si.Instructor FROM StudentInstructor si 
         JOIN InstructorCourse ic ON si.Instructor = ic.Instructor
         WHERE si.Student = ? AND ic.Course = ?",
        student, course
    )
    if existingInstructor and existingInstructor != instructor:
        throw "Constraint violation: Student already has different instructor for this course"
    insert(student, instructor)
*/
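Periodic batch verification, mentioned earlier as a fallback, can be sketched as a single audit query. The example below uses Python's built-in sqlite3 module with hypothetical sample rows; the table names follow the SQL above, and AUDIT_SQL is a name chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InstructorCourse (
        Instructor TEXT PRIMARY KEY,
        Course     TEXT NOT NULL
    );
    CREATE TABLE StudentInstructor (
        Student    TEXT NOT NULL,
        Instructor TEXT NOT NULL,
        PRIMARY KEY (Student, Instructor)
    );
    INSERT INTO InstructorCourse VALUES
        ('Dr.Smith', 'CS101'), ('Dr.Jones', 'CS101'), ('Dr.Lee', 'MA201');
    INSERT INTO StudentInstructor VALUES
        ('Alice', 'Dr.Smith'), ('Alice', 'Dr.Jones'), ('Bob', 'Dr.Lee');
""")

# Report every (Student, Course) pair that maps to more than one instructor.
AUDIT_SQL = """
    SELECT si.Student, ic.Course, COUNT(DISTINCT si.Instructor) AS instructors
    FROM StudentInstructor si
    JOIN InstructorCourse ic ON ic.Instructor = si.Instructor
    GROUP BY si.Student, ic.Course
    HAVING COUNT(DISTINCT si.Instructor) > 1
"""
violations = conn.execute(AUDIT_SQL).fetchall()
```

With the sample rows, the audit reports Alice in CS101 with two instructors. A check like this detects violations after the fact; it does not prevent them.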

Cost-Benefit Analysis

Before accepting a BCNF decomposition with lost dependencies, estimate: (1) How often will the constraint be tested? (2) What's the cost of enforcement overhead? (3) What's the cost of potential violations? If constraints are rarely checked or violations have low impact, BCNF may still be preferred.

Decision Framework: BCNF vs 3NF

When BCNF decomposition loses dependencies, designers face a choice: accept the loss and stay in BCNF, or retreat to 3NF (which always preserves dependencies). Here's a framework for making this decision.

BCNF vs 3NF Decision Factors
Factor	Favors BCNF	Favors 3NF
Update frequency on redundant data	High (update anomalies are a real risk)	Low (redundancy cost is minimal)
Constraint enforcement complexity	Acceptable complexity/resources available	Must use simple single-table constraints
Read vs write ratio	Write-heavy (reduce redundancy)	Read-heavy (can tolerate redundancy)
Storage costs	Storage is expensive	Storage is cheap
Data integrity criticality	Lower stakes; some violations acceptable	High stakes; violations unacceptable
Application architecture	Single app, can enforce at app level	Multiple apps, need DB-level enforcement
Concurrency requirements	Low concurrency, can afford locking	High concurrency, need fast constraints

Decision Flowchart:

1. Does BCNF decomposition preserve all dependencies?
- Yes → Use BCNF (best of both worlds)
- No → Continue to step 2
2. How critical are the non-preserved dependencies?
- Not business-critical → Consider accepting BCNF with application-level enforcement
- Business-critical → Continue to step 3
3. Can you implement cross-table enforcement reliably?
- Yes, with acceptable performance → Use BCNF with enforcement mechanisms
- No, too complex or slow → Continue to step 4
4. Is the 3NF redundancy tolerable?
- Yes, minimal update anomaly risk → Use 3NF
- No, redundancy causes serious issues → Return to step 3, invest in enforcement
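The flowchart reduces to a short chain of conditions. The function below is only a transcription of those steps for illustration; the parameter names and the returned labels are invented here, and a real decision would weigh the factors in the table above rather than four booleans.

```python
def choose_normal_form(preserves_all, business_critical, can_enforce, redundancy_tolerable):
    """Walk the four flowchart questions in order and return a recommendation."""
    if preserves_all:
        return "BCNF"
    if not business_critical:
        return "BCNF with application-level enforcement"
    if can_enforce:
        return "BCNF with cross-table enforcement"
    if redundancy_tolerable:
        return "3NF"
    # Neither option is comfortable: go back and invest in enforcement.
    return "BCNF, after investing in enforcement"

# Critical lost dependency, no reliable enforcement, tolerable redundancy.
recommendation = choose_normal_form(
    preserves_all=False,
    business_critical=True,
    can_enforce=False,
    redundancy_tolerable=True,
)
```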

The Pragmatic View:

In practice, many database designers:

Default to 3NF for transactional systems (dependency preservation is valuable)
Use BCNF for analytical/warehousing systems (reads dominate, joins are expected)
Evaluate on a case-by-case basis for complex systems

There's no universally "correct" choice—it depends on specific requirements.

Hybrid Approaches

Sometimes the answer is neither pure BCNF nor 3NF. You might use BCNF for some relations, 3NF for others, or even denormalize certain areas. Schema design is ultimately about optimizing for your specific workload and constraints.

Alternative Strategies

Beyond the simple BCNF-or-3NF choice, several alternative strategies can help manage the dependency preservation trade-off.

Strategy 1: Computed Columns / Views

•Approach: Keep BCNF decomposition but create a view that joins the relations and appears to have the lost FD.
•Implementation: Create a view or materialized view combining the decomposed tables. Add a unique constraint on the view (if DB supports) or check via application.
•Pros: BCNF data storage with apparent constraint checking.
•Cons: Views on writes can be complex; materialized views need refresh.

Strategy 2: Redundant Table for Constraint

•Approach: Add a table specifically to enforce the lost constraint.
•Example: For {Student, Course} → Instructor, add table StudentCourseInstructor(Student, Course, Instructor) with UNIQUE(Student, Course).
•Maintenance: Use triggers to keep this table synchronized with the normalized tables.
•Pros: Single-table constraint checking; clear enforcement.
•Cons: Extra storage and synchronization overhead.
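A minimal sketch of Strategy 2 using Python's built-in sqlite3 module. The trigger name and the enroll helper are assumptions for this example, and only inserts into StudentInstructor are synchronized; updates and deletes would need further triggers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE InstructorCourse (
        Instructor TEXT PRIMARY KEY,
        Course     TEXT NOT NULL
    );
    CREATE TABLE StudentInstructor (
        Student    TEXT NOT NULL,
        Instructor TEXT NOT NULL,
        PRIMARY KEY (Student, Instructor)
    );
    -- Redundant table whose only job is to carry UNIQUE (Student, Course).
    CREATE TABLE StudentCourseInstructor (
        Student    TEXT NOT NULL,
        Course     TEXT NOT NULL,
        Instructor TEXT NOT NULL,
        UNIQUE (Student, Course)
    );
    -- Keep the redundant table in step with inserts into StudentInstructor.
    CREATE TRIGGER sync_student_course_instructor
    AFTER INSERT ON StudentInstructor
    BEGIN
        INSERT INTO StudentCourseInstructor (Student, Course, Instructor)
        SELECT NEW.Student, ic.Course, NEW.Instructor
        FROM InstructorCourse ic
        WHERE ic.Instructor = NEW.Instructor;
    END;
    INSERT INTO InstructorCourse VALUES ('Dr.Smith', 'CS101'), ('Dr.Jones', 'CS101');
""")

def enroll(student, instructor):
    """Return True if the row is accepted, False if the FD would be violated."""
    try:
        conn.execute(
            "INSERT INTO StudentInstructor VALUES (?, ?)", (student, instructor)
        )
        return True
    except sqlite3.IntegrityError:
        return False

first = enroll("Alice", "Dr.Smith")   # accepted
second = enroll("Alice", "Dr.Jones")  # rejected: second instructor for CS101
rows = conn.execute("SELECT Student, Instructor FROM StudentInstructor").fetchall()
```

The UNIQUE violation raised inside the trigger aborts the outer INSERT, so the rejected row never reaches StudentInstructor.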

Strategy 3: Controlled Denormalization

•Approach: Accept that some redundancy is the lesser evil and partially denormalize.
•Implementation: Keep some relations in 3NF rather than BCNF. Document the intentional redundancy.
•Management: Put triggers in place to synchronize redundant attributes on update.
•Pros: Simple single-table constraints; no cross-table enforcement.
•Cons: Update anomaly risk; must maintain synchronization.

Strategy 4: Event Sourcing / CQRS

•Approach: Use event sourcing where constraints are checked at event creation time.
•Implementation: All modifications come through a single entry point that validates against all constraints before generating events.
•Read model: Can be denormalized for query performance while source of truth maintains integrity.
•Pros: Flexible constraint enforcement; clean separation.
•Cons: Architectural complexity; not suitable for all systems.

Document Your Decisions

Whatever strategy you choose, document it thoroughly. Future maintainers need to understand: (1) Which normal form each table is in, (2) What constraints are enforced where, (3) What invariants must be maintained by application code. Undocumented schema design decisions become landmines.

Summary: Dependency Preservation Issue

We've explored the fundamental tension between BCNF's strictness and the practical need for dependency preservation. This trade-off is one of the most important concepts in database design. Let's consolidate:

Key Takeaways

•Dependency preservation means constraints can be checked without joins — When preserved, FDs can be enforced with simple single-table constraints.
•BCNF decomposition can split FD attributes — The algorithm focuses on violations, potentially scattering other FDs' attributes.
•For some schemas, BCNF and preservation are incompatible — This is a proven impossibility, not algorithm weakness.
•Non-preservation has real costs — Cross-table enforcement is complex, slower, and harder to maintain reliably.
•3NF always preserves dependencies — This is 3NF's major advantage, explaining its continued relevance.
•The choice depends on specific requirements — No universal answer; evaluate based on update patterns, constraint criticality, and enforcement capabilities.
•Alternative strategies exist — Views, redundant tables, controlled denormalization, and architectural patterns can mitigate trade-offs.

Module Complete: BCNF

Congratulations! You've completed the BCNF module. You now understand BCNF's definition, its relationship to 3NF, how to identify violations, the decomposition algorithm, and the crucial dependency preservation trade-off. You're equipped to make informed normalization decisions in real database design scenarios.

Module Summary:

Across five comprehensive pages, we've covered:

BCNF Definition — The elegant requirement that every determinant be a superkey
BCNF vs 3NF — When they differ, and why BCNF is stricter
BCNF Violations — Systematic detection methods and common patterns
Decomposition Algorithm — The lossless procedure for achieving BCNF
Dependency Preservation — The fundamental trade-off and decision framework

This knowledge positions you to engage deeply with normalization theory and apply it effectively in practice. The next module in this chapter explores BCNF decomposition in greater depth, with additional algorithms and edge cases.

Dependency Preservation Issue

The Trade-Off at the Heart of BCNF

What You Will Learn

What Is Dependency Preservation?

Before examining why BCNF can fail to preserve dependencies, we must precisely define what dependency preservation means.

Dependency Preservation Definition

A decomposition of relation R into relations R₁, R₂, ..., Rₙ is dependency preserving with respect to FD set F if:

(πᵣ₁(F) ∪ πᵣ₂(F) ∪ ... ∪ πᵣₙ(F))⁺ = F⁺

Where πᵣᵢ(F) denotes the projection of F onto the attributes of Rᵢ.

In simpler terms: The FDs that can be checked within individual decomposed relations, taken together, should be equivalent to the original FD set.

Why Dependency Preservation Matters:

Consider a functional dependency AB → C. If A is in relation R₁ and C is in relation R₂ (with only B in both), how do we check this constraint?

With dependency preservation:

The FD AB → C can be checked within a single relation
INSERT/UPDATE operations can validate the constraint immediately
A simple UNIQUE constraint or trigger on one table suffices

Without dependency preservation:

The FD AB → C spans multiple relations
Checking requires joining R₁ and R₂
Every INSERT/UPDATE must perform a multi-table check
This is slower, more complex, and error-prone

The Efficiency Argument:

Preserved dependencies can be enforced by:

Unique indexes within a single table
CHECK constraints
Simple triggers

Non-preserved dependencies require:

Multi-table triggers
Application-level enforcement
Periodic batch verification
Or accepting that the constraint may be violated

The Integrity Risk

Why BCNF Can Fail to Preserve Dependencies

The Mechanism of Loss:

Consider a relation R with FDs including:

Violating FD: X → Y (where X is not a superkey)
Another FD: Z → W (where Z and W may overlap with X and Y)

When we decompose on X → Y:

R₁ = X ∪ Y
R₂ = X ∪ (R - Y)

If Z and W are split such that Z is entirely in R₁ and W is entirely in R₂ (or vice versa), the FD Z → W cannot be checked in either relation alone.

Critical Insight:

Mathematical Formulation:

For FD Z → W to be preserved in a decomposition {R₁, R₂, ..., Rₙ}:

There must exist some Rᵢ such that Z ∪ W ⊆ attributes(Rᵢ)
If no such Rᵢ exists, checking Z → W requires joining multiple relations

Not All Lost Dependencies Are Problems

The Classic Example

The Example: Student-Course-Instructor

Relation: R(Student, Course, Instructor)

Semantics: • Each student-course pair has exactly one instructor • Each instructor teaches only one course

Functional Dependencies: • F1: {Student, Course} → Instructor • F2: Instructor → Course

Step 1: Analyze the Original Relation

Candidate Keys:

{Student, Course}⁺ = {Student, Course, Instructor} = R ✓
{Student, Instructor}⁺: Instructor → Course gives {Student, Instructor, Course} = R ✓

Both {Student, Course} and {Student, Instructor} are candidate keys.

BCNF Check:

F1: {Student, Course} → Instructor. SC is a candidate key. ✓
F2: Instructor → Course. Instructor⁺ = {Instructor, Course} ≠ R. Violation!

Step 2: Decompose on the Violation

Decomposing on Instructor → Course:

R₁ = {Instructor, Course}
R₂ = {Student, Instructor} (= {Student, Course, Instructor} - {Course} + {Instructor})

Wait, let me recalculate R₂:

Original attributes: {Student, Course, Instructor}
X = {Instructor}, Y = {Course}
R₂ = R - (Y - X) = {Student, Course, Instructor} - {Course} = {Student, Instructor}

Decomposition: {R₁(Instructor, Course), R₂(Student, Instructor)}

Step 3: Verify BCNF of Decomposed Relations

R₁(Instructor, Course):

Projected FD: Instructor → Course
Instructor⁺ in R₁ = {Instructor, Course} = R₁. Superkey. ✓ BCNF.

R₂(Student, Instructor):

What FDs apply?
{Student, Course} → Instructor? Course ∉ R₂, so no.
Instructor → Course? Course ∉ R₂, so no.
Any FD within {Student, Instructor}? Neither determines the other in general.
No non-trivial FDs in R₂. The key is {Student, Instructor} (the whole relation).
✓ BCNF by default (no FDs to violate).

Step 4: Check Dependency Preservation

Original FDs to preserve:

{Student, Course} → Instructor
Instructor → Course

Preserved FDs:

From R₁: Instructor → Course ✓
From R₂: None (no non-trivial FDs)

Is {Student, Course} → Instructor preserved?

Student is in R₂
Course is in R₁
Instructor is in both R₁ and R₂

To check {Student, Course} → Instructor, we need both Student and Course in the same relation. They are NOT in the same relation!

The dependency {Student, Course} → Instructor is NOT preserved.

Dependency Loss Confirmed

The FD {Student, Course} → Instructor cannot be verified in either R₁ or R₂: • R₁ has Instructor and Course but not Student • R₂ has Student and Instructor but not Course

Enforcing this constraint now requires joining R₁ and R₂.

The Semantic Meaning of the Lost Dependency:

The lost dependency {Student, Course} → Instructor means: "For any given student in a given course, there's exactly one instructor."

Without this constraint enforced, the database could contain:

(Alice, CS101, Dr.Smith) via R₂ join R₁
(Alice, CS101, Dr.Jones) via a different R₂ join R₁ path

This would violate the business rule that Alice has only one instructor for CS101.

Why Did This Happen?

Detecting Dependency Loss

Before committing to a BCNF decomposition, you should verify whether all dependencies are preserved. Here's a systematic approach to detection.

check_preservation.pseudo
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
Algorithm: Check Dependency Preservation
Input: Original FDs F, Decomposition {R₁, R₂, ..., Rₙ}
Output: true if all FDs are preserved, false otherwise
 
function isDependencyPreserving(F, decomposition):
    // Collect all preserved FDs
    preservedFDs = {}
    for each Rᵢ in decomposition:
        projectedFDs = projectFDs(F, attributes(Rᵢ))
        preservedFDs = preservedFDs ∪ projectedFDs
    
    // Check if preserved FDs are equivalent to original
    // We need: (preservedFDs)⁺ = F⁺
    // Equivalently: Every FD in F must be derivable from preservedFDs
    
    for each (X → Y) in F:
        // Check if X → Y is derivable from preservedFDs
        XClosure = computeClosure(X, preservedFDs)
        if not Y.isSubsetOf(XClosure):
            return false  // This FD is not preserved!
    
    return true
 
// More efficient: Check each FD individually using this algorithm
function isFDPreserved(X → Y, decomposition, F):
    // Can we derive X → Y from projected FDs only?
    // Use modified closure computation that only applies FDs
    // that are entirely within some Rᵢ
    
    result = X
    changed = true
    
    while changed:
        changed = false
        for each Rᵢ in decomposition:
            // Consider only the portion of result in Rᵢ
            Z = result ∩ attributes(Rᵢ)
            ZClosure = computeClosure(Z, projectFDs(F, attributes(Rᵢ)))
            
            if not ZClosure.isSubsetOf(result):
                result = result ∪ ZClosure
                changed = true
    
    return Y.isSubsetOf(result)

Algorithm Explanation:

Approach 1: Full Preservation Check

Compute projected FDs for each relation in the decomposition
Combine all projected FDs into one set
For each original FD, check if it's derivable from the combined set
If any FD is not derivable, preservation fails

Approach 2: Per-FD Check (More Efficient)

For each FD X → Y in F, run a modified closure algorithm
Start with X
Iteratively expand using FDs that are entirely within some relation
If Y can be reached, the FD is preserved

The second approach is more efficient because it doesn't require computing all projected FDs upfront—only checking whether specific FDs are derivable.

Quick Check Heuristic

The Impossibility Theorem

There exist relation schemas R with functional dependency sets F such that no decomposition of R can simultaneously:

Be in BCNF
Be lossless-join
Be dependency-preserving

This is a proven impossibility, not an algorithm deficiency.

Proof Sketch:

Consider R(A, B, C) with F = {A → B, B → C, C → A}.

This creates a cycle of dependencies with no clear key—actually, each attribute determines everything:

A⁺ = {A, B, C}
B⁺ = {B, C, A}
C⁺ = {C, A, B}

So {A}, {B}, and {C} are all candidate keys. Each dependency has a candidate key (hence superkey) on the left.

This relation is already in BCNF!

But this is atypical. Let's construct a true impossibility case:

R(J, K, L) with F = {JK → L, L → K}

Candidate Keys:

(JK)⁺ = {J, K, L} = R. {JK} is a candidate key.
(JL)⁺: L → K gives {J, L, K} = R. {JL} is a candidate key.

BCNF Check:

JK → L: JK is a candidate key. ✓
L → K: L⁺ = {L, K} ≠ R. L is NOT a superkey. Violation!

Decompose on L → K:

R₁ = {L, K}
R₂ = {J, L}

Check Preservation:

L → K: Both L and K are in R₁. ✓ Preserved.
JK → L: J is in R₂, K is in R₁, L is in both. Not in same relation.

Is JK → L derivable from projected FDs?

From R₁: L → K
From R₂: No non-trivial FDs (neither J nor L determines the other, so the only key of R₂ is {J, L} itself)

Can we derive JK → L?

Start with {J, K}
In R₂: Only J is there; nothing new derived.
In R₁: Only K is there; nothing new derived (L → K doesn't help).
Cannot reach L from {J, K} using only projected FDs.

JK → L is NOT preserved.

Can We Do Better?

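No. The only relation that contains J, K, and L together is R itself, which violates BCNF, so every BCNF decomposition must use relations with at most two attributes. Projecting F onto {J, K} or {J, L} yields no non-trivial FDs, and projecting onto {K, L} yields only L → K. JK → L therefore cannot be derived from the projected FDs of any BCNF decomposition of R.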

This is the impossibility: the structure of the FDs forces a conflict between BCNF and dependency preservation.
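The argument can be checked by brute force with Approach 1 in miniature: project F onto every proper subset of {J, K, L} and see what survives. This is a sketch under the same single-letter encoding; `project` is a hypothetical helper name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def project(fds, rel):
    """Non-trivial FDs X → (X⁺ ∩ rel) − X for each non-empty proper subset X of rel."""
    projected = []
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            gained = (closure(x, fds) & rel) - x
            if gained:
                projected.append((x, gained))
    return projected


R = frozenset("JKL")
F = [(frozenset("JK"), frozenset("L")), (frozenset("L"), frozenset("K"))]

# Collect the projected FDs of every proper sub-relation of R.
all_projected = []
for size in range(1, len(R)):
    for combo in combinations(sorted(R), size):
        all_projected.extend(project(F, frozenset(combo)))

print(all_projected)                          # FDs available to any decomposition
print("L" in closure("JK", all_projected))    # can JK → L be derived from them?
```

Even with every proper sub-relation included at once, the only projected FD is L → K, so no choice of smaller relations can recover JK → L.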

Practical Implications

When dependency preservation is lost, database designers face practical challenges. Understanding these implications helps inform the choice between BCNF and alternatives like 3NF.

Implications of Lost Dependencies

•Cross-Table Constraint Enforcement — Triggers or stored procedures must enforce constraints that span tables. These are more complex to write, test, and maintain than single-table constraints.
•Performance Overhead — Checking constraints via joins is slower than checking within a single table, especially for high-volume transactional systems.
•Concurrency Challenges — Without careful locking, concurrent transactions may create violations between constraint check and commit. Serialization or complex isolation may be needed.
•Application-Layer Burden — Some organizations push constraint enforcement to application code. This risks inconsistency if multiple applications access the database.
•Testing Complexity — Validating that constraints are properly enforced across tables requires more sophisticated testing strategies.

enforce_cross_table_fd.sql
-- Example: Enforcing {Student, Course} → Instructor across decomposed tables
-- Tables: InstructorCourse(Instructor, Course), StudentInstructor(Student, Instructor)
 
-- Option 1: Trigger-based enforcement
CREATE OR REPLACE FUNCTION check_student_course_instructor()
RETURNS TRIGGER AS $$
BEGIN
    -- Reject the new row if this student already has a different instructor
    -- who teaches the same course as NEW.Instructor.
    -- Note: this simplified check also rejects an UPDATE that swaps one
    -- instructor for another who teaches the same course.
    IF EXISTS (
        SELECT 1
        FROM StudentInstructor si
        JOIN InstructorCourse ic_old ON ic_old.Instructor = si.Instructor
        JOIN InstructorCourse ic_new ON ic_new.Course = ic_old.Course
        WHERE si.Student = NEW.Student
          AND ic_new.Instructor = NEW.Instructor
          AND si.Instructor <> NEW.Instructor
    ) THEN
        RAISE EXCEPTION 'Violation: Student % would have multiple instructors for same course', NEW.Student;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER enforce_sc_instructor
    BEFORE INSERT OR UPDATE ON StudentInstructor
    FOR EACH ROW EXECUTE FUNCTION check_student_course_instructor();
 
-- Option 2: Materialized view with unique constraint
CREATE MATERIALIZED VIEW StudentCourseInstructor AS
SELECT si.Student, ic.Course, si.Instructor
FROM StudentInstructor si
JOIN InstructorCourse ic ON si.Instructor = ic.Instructor;
 
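-- Assumption: a unique index on the view (index name is illustrative)
-- makes the refresh fail when the FD is violated
CREATE UNIQUE INDEX student_course_unique
    ON StudentCourseInstructor (Student, Course);
 
-- Refresh after each relevant transaction; a failed refresh signals a
-- violation that already exists in the base tables
REFRESH MATERIALIZED VIEW StudentCourseInstructor;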
 
-- Option 3: Application-level check (pseudo-code)
/*
function addStudentInstructor(student, instructor):
    course = query("SELECT Course FROM InstructorCourse WHERE Instructor = ?", instructor)
    existingInstructor = query(
        "SELECT si.Instructor FROM StudentInstructor si 
         JOIN InstructorCourse ic ON si.Instructor = ic.Instructor
         WHERE si.Student = ? AND ic.Course = ?",
        student, course
    )
    if existingInstructor and existingInstructor != instructor:
        throw "Constraint violation: Student already has different instructor for this course"
    insert(student, instructor)
*/

Decision Framework: BCNF vs 3NF

BCNF vs 3NF Decision Factors
Factor	Favors BCNF	Favors 3NF
Update frequency on redundant data	High (update anomalies are a real risk)	Low (redundancy cost is minimal)
Constraint enforcement complexity	Acceptable complexity/resources available	Must use simple single-table constraints
Read vs write ratio	Write-heavy (reduce redundancy)	Read-heavy (can tolerate redundancy)
Storage costs	Storage is expensive	Storage is cheap
Data integrity criticality	Lower stakes; some violations acceptable	High stakes; violations unacceptable
Application architecture	Single app, can enforce at app level	Multiple apps, need DB-level enforcement
Concurrency requirements	Low concurrency, can afford locking	High concurrency, need fast constraints

Decision Flowchart:

1. Does BCNF decomposition preserve all dependencies?
- Yes → Use BCNF (best of both worlds)
- No → Continue to step 2
2. How critical are the non-preserved dependencies?
- Not business-critical → Consider accepting BCNF with application-level enforcement
- Business-critical → Continue to step 3
3. Can you implement cross-table enforcement reliably?
- Yes, with acceptable performance → Use BCNF with enforcement mechanisms
- No, too complex or slow → Continue to step 4
4. Is the 3NF redundancy tolerable?
- Yes, minimal update anomaly risk → Use 3NF
- No, redundancy causes serious issues → Return to step 3, invest in enforcement

The Pragmatic View:

In practice, many database designers:

Default to 3NF for transactional systems (dependency preservation is valuable)
Use BCNF for analytical/warehousing systems (reads dominate, joins are expected)
Evaluate on a case-by-case basis for complex systems

There's no universally "correct" choice—it depends on specific requirements.

Alternative Strategies

Beyond the simple BCNF-or-3NF choice, several alternative strategies can help manage the dependency preservation trade-off.

Strategy 1: Computed Columns / Views

•Approach: Keep the BCNF decomposition, but create a view that joins the relations so the attributes of the lost FD appear together.
•Implementation: Create a view or materialized view that joins the decomposed tables. If the DBMS supports it, add a unique constraint or unique index on the view; otherwise check uniqueness from the application.
•Pros: Data is stored in BCNF, yet the constraint can be checked as if it lived in one table.
•Cons: Writing through views can be complex; materialized views must be refreshed, so violations are detected only at refresh time.

Strategy 2: Redundant Table for Constraint

•Approach: Add a table specifically to enforce the lost constraint.
•Example: For {Student, Course} → Instructor, add table StudentCourseInstructor(Student, Course, Instructor) with UNIQUE(Student, Course).
•Maintenance: Use triggers to keep this table synchronized with the normalized tables.
•Pros: Single-table constraint checking; clear enforcement.
•Cons: Extra storage and synchronization overhead.
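To make Strategy 2 concrete, here is a small sketch of the constraint table using Python's built-in `sqlite3` module with an in-memory database. SQLite stands in for the production DBMS, the sample rows are invented, and the synchronization triggers mentioned above are deliberately left out; `make_constraint_table` and `try_insert` are hypothetical helper names.

```python
import sqlite3


def make_constraint_table() -> sqlite3.Connection:
    """Create the redundant table whose only job is to enforce the lost FD."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE StudentCourseInstructor (
            Student    TEXT NOT NULL,
            Course     TEXT NOT NULL,
            Instructor TEXT NOT NULL,
            UNIQUE (Student, Course)  -- {Student, Course} → Instructor
        )
        """
    )
    return conn


def try_insert(conn: sqlite3.Connection, student: str, course: str, instructor: str) -> bool:
    """Insert one row; return False if the UNIQUE constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO StudentCourseInstructor VALUES (?, ?, ?)",
                (student, course, instructor),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_constraint_table()
print(try_insert(conn, "alice", "DB101", "smith"))  # first instructor for the course
print(try_insert(conn, "alice", "DB101", "jones"))  # second instructor, same course
print(try_insert(conn, "alice", "OS201", "jones"))  # different course
```

The point of the design is that the violation is caught by an ordinary single-table constraint, with no join needed at check time. The cost is keeping this table in step with the normalized ones.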

Strategy 3: Controlled Denormalization

•Approach: Accept that some redundancy is the lesser evil and partially denormalize.
•Implementation: Keep some relations in 3NF rather than BCNF. Document the intentional redundancy.
•Management: Put triggers in place to synchronize redundant attributes on update.
•Pros: Simple single-table constraints; no cross-table enforcement.
•Cons: Update anomaly risk; must maintain synchronization.

Strategy 4: Event Sourcing / CQRS

•Approach: Use event sourcing where constraints are checked at event creation time.
•Implementation: All modifications come through a single entry point that validates against all constraints before generating events.
•Read model: Can be denormalized for query performance while source of truth maintains integrity.
•Pros: Flexible constraint enforcement; clean separation.
•Cons: Architectural complexity; not suitable for all systems.

Summary: Dependency Preservation Issue

Key Takeaways

•Dependency preservation means constraints can be checked without joins — When preserved, FDs can be enforced with simple single-table constraints.
•BCNF decomposition can split FD attributes — The algorithm focuses on violations, potentially scattering other FDs' attributes.
•For some schemas, BCNF and preservation are incompatible — This is a proven impossibility, not algorithm weakness.
•Non-preservation has real costs — Cross-table enforcement is complex, slower, and harder to maintain reliably.
•3NF always preserves dependencies — This is 3NF's major advantage, explaining its continued relevance.
•The choice depends on specific requirements — No universal answer; evaluate based on update patterns, constraint criticality, and enforcement capabilities.
•Alternative strategies exist — Views, redundant tables, controlled denormalization, and architectural patterns can mitigate trade-offs.

Module Complete: BCNF

Module Summary:

Across five comprehensive pages, we've covered:

BCNF Definition — The elegant requirement that every determinant be a superkey
BCNF vs 3NF — When they differ, and why BCNF is stricter
BCNF Violations — Systematic detection methods and common patterns
Decomposition Algorithm — The lossless procedure for achieving BCNF
Dependency Preservation — The fundamental trade-off and decision framework