So far, we've learned how to test whether a decomposition is lossless. But in practice, you rarely receive a decomposition and need to verify it—instead, you're creating decompositions as part of normalization. The question becomes: How do we systematically construct decompositions that are guaranteed to be lossless?
This page presents the key decomposition algorithms used in database normalization. These algorithms don't just produce decompositions—they produce lossless decompositions by construction. When you follow these algorithms correctly, losslessness is guaranteed without needing to run the Chase algorithm afterward.
We'll cover the binary lossless decomposition principle, the BCNF decomposition algorithm, and the 3NF synthesis algorithm, along with guidance on when to use each.
By the end of this page, you will understand how to systematically decompose relations while preserving information, apply the BCNF decomposition and 3NF synthesis algorithms, understand why these algorithms guarantee losslessness, and know when to use each approach.
Before diving into specific algorithms, let's establish the core principle that makes decomposition algorithms produce lossless results.
The Binary Lossless Decomposition Principle:
Given a relation R and a non-trivial functional dependency X → Y (that is, Y ⊈ X), we can decompose R into:
R₁ = X ∪ Y (The FD's attributes)
R₂ = R - (Y - X) (Everything except Y's non-X part)
= (R - Y) ∪ X (equivalently: R minus Y, with X added back)
Or more simply: put the FD's attributes in their own relation, and keep everything else (together with X) in the other.
This decomposition is always lossless because R₁ ∩ R₂ = X, and X → Y means X determines every attribute of R₁; by the binary lossless-join test, joining R₁ and R₂ reconstructs R exactly.
Decomposing along functional dependencies is inherently safe. The determinant of the FD becomes the join attribute, and it's a key for the relation containing the FD. This structural guarantee is why normalization algorithms work.
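To make the principle concrete, here is a minimal Python sketch of the attribute-closure computation and the binary split, applied to the example worked in the block below. The representation (attribute sets, FDs as `(lhs, rhs)` frozenset pairs) and the helper names `closure` and `binary_split` are my own choices, not part of the page's pseudocode.

```python
def closure(attrs, fds):
    """Compute attrs+ under a list of FDs given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def binary_split(R, X, Y, fds):
    """Split R on the FD X -> Y into (R1, R2); lossless because X is a superkey of R1."""
    R1 = X | Y
    R2 = R - (Y - X)
    common = R1 & R2                   # always equals X here
    assert R1 <= closure(common, fds)  # binary lossless-join test
    return R1, R2

# Example from the block below: R = {A,B,C,D,E}, FDs B -> D and A -> C
fds = [(frozenset("B"), frozenset("D")), (frozenset("A"), frozenset("C"))]
R1, R2 = binary_split(set("ABCDE"), frozenset("B"), frozenset("D"), fds)
print(sorted(R1), sorted(R2))  # ['B', 'D'] ['A', 'B', 'C', 'E']
```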
```
BINARY LOSSLESS DECOMPOSITION
=============================

Given: Relation R with FD X → Y (where Y ⊈ X)

Decompose into:
  R₁ = X ∪ Y
  R₂ = R - (Y - X)        [R minus the non-trivial part of Y]

Alternatively stated as:
  R₁ = X ∪ Y
  R₂ = (R - Y) ∪ X        [R minus Y, reunited with X]

Properties:
  - R₁ ∩ R₂ = X (the determinant)
  - X ∪ Y ⊆ R₁ and X → Y holds, so X is a superkey of R₁
  - Therefore: (R₁ ∩ R₂) → R₁ holds
  - LOSSLESS by binary test ✓

EXAMPLE:
========
R = (A, B, C, D, E)
FD: B → D

Decompose:
  R₁ = {B, D}        (the FD)
  R₂ = {A, B, C, E}  (R minus D, B stays as join key)

Check:
  Common = {B}
  B → D, so B → R₁ = {B, D} ✓ Lossless!

Another FD: A → C

Further decompose R₂:
  R₂₁ = {A, C}
  R₂₂ = {A, B, E}

Check:
  Common = {A}
  A → C, so A → R₂₁ = {A, C} ✓ Lossless!

Final decomposition: {R₁, R₂₁, R₂₂} = {{B,D}, {A,C}, {A,B,E}}
```

The BCNF Decomposition Algorithm repeatedly applies the binary decomposition principle to eliminate BCNF violations until all resulting relations are in BCNF.
Recall BCNF Definition:
A relation R is in BCNF if and only if for every non-trivial FD X → Y that holds in R, X is a superkey of R.
BCNF Violation:
An FD X → Y violates BCNF if it is non-trivial and X is not a superkey of R (X does not determine every attribute of R).
The algorithm finds such violations and decomposes the relation until no violations remain.
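In code, spotting a violation boils down to a closure check. The sketch below reuses the hypothetical `closure` helper from the earlier snippet; note that trying only the left-hand sides that appear in F is a simplification — a complete BCNF check on a projected schema may need to examine other subsets of Rᵢ as well.

```python
def is_superkey(X, Ri, fds):
    """X is a superkey of schema Ri iff Ri is contained in the closure of X under F."""
    return Ri <= closure(X, fds)

def find_violation(Ri, fds):
    """Return (X, Y) such that X -> Y is a non-trivial BCNF violation in Ri, or None.
    Only left-hand sides that appear in F are tried (a simplification)."""
    for X, _ in fds:
        if X <= Ri and not is_superkey(X, Ri, fds):
            Y = (closure(X, fds) & Ri) - X   # everything X determines inside Ri
            if Y:
                return X, Y
    return None
```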
```
BCNF DECOMPOSITION ALGORITHM
============================

INPUT:
  - Relation R with schema {A₁, A₂, ..., Aₙ}
  - Set of functional dependencies F

OUTPUT:
  - Set of relation schemas D = {R₁, R₂, ..., Rₘ} such that:
    1. Each Rᵢ is in BCNF
    2. The decomposition is lossless
    [Note: Dependency preservation is NOT guaranteed]

ALGORITHM:

FUNCTION BCNF_Decompose(R, F):
    result = {R}

    WHILE there exists Rᵢ in result that is not in BCNF:

        // Find a BCNF violation in Rᵢ
        Find X → Y in F⁺ such that:
          - X → Y is non-trivial (Y ⊈ X)
          - X → Y holds in Rᵢ (X ∪ Y ⊆ Rᵢ)
          - X is not a superkey of Rᵢ

        // Decompose Rᵢ using this violation
        R₁ = X ∪ Y           // Contains the violating FD
        R₂ = Rᵢ - (Y - X)    // Everything else, keeping X

        // Update result
        result = (result - {Rᵢ}) ∪ {R₁, R₂}

        // Note: Compute projections of F onto R₁ and R₂
        // for checking BCNF in subsequent iterations

    RETURN result

CORRECTNESS:
------------
- BCNF: Algorithm terminates only when no violations exist
- Lossless: Each binary split uses the fundamental principle
  (the common attributes X form a superkey of R₁)
- Termination: Each split produces relations with strictly fewer
  attributes than the one they replace, so the process is finite
```

BCNF decomposition guarantees losslessness but NOT dependency preservation: after decomposition, some FDs may no longer fit inside any single relation and can only be checked by joining. For applications where constraints must be enforceable on single tables, consider 3NF synthesis instead.
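Putting the pieces together, here is one way the loop above can be sketched in Python, reusing the hypothetical `find_violation` helper from the previous snippet. Which relations come out depends on which violation is picked first and on how much of X⁺ is split off, so different runs (or different hand traces) can produce different, equally valid BCNF decompositions.

```python
def bcnf_decompose(R, fds):
    """Split relations on detected violations until none remain."""
    worklist, done = [set(R)], []
    while worklist:
        Ri = worklist.pop()
        violation = find_violation(Ri, fds)
        if violation is None:
            done.append(Ri)          # no detected violation: keep Ri
        else:
            X, Y = violation
            worklist.append(X | Y)   # R1 = X ∪ Y, the violating FD's attributes
            worklist.append(Ri - Y)  # R2 keeps X as the join key (Y excludes X here)
    return done
```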
Let's trace through the BCNF decomposition algorithm with a comprehensive example.
Problem:
R = (A, B, C, D, E)
Functional Dependencies F:
A → B
B → C
C → D
Step 0: Initial Analysis
First, find the candidate key(s) of R. A⁺ = {A, B, C, D} (via A → B → C → D), which is missing E, so A alone is not a key. A and E appear on no right-hand side, so every key must contain both; since (A, E)⁺ = {A, B, C, D, E} = R, (A, E) is the unique candidate key.
Candidate Key: (A, E)
Check BCNF: A → B is non-trivial and A is not a superkey of R (A⁺ ≠ R), so it violates BCNF; B → C and C → D are violations for the same reason.
R is not in BCNF. Begin decomposition.
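If you want to check this mechanically, the hypothetical `closure` helper sketched earlier reproduces the key computation:

```python
fds = [(frozenset("A"), frozenset("B")),
       (frozenset("B"), frozenset("C")),
       (frozenset("C"), frozenset("D"))]

print(sorted(closure({"A"}, fds)))       # ['A', 'B', 'C', 'D'] -- no E, so A alone is not a key
print(sorted(closure({"A", "E"}, fds)))  # ['A', 'B', 'C', 'D', 'E'] -- (A, E) is a candidate key
```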
Iteration 1: Decompose on A → B
R₁ = {A, B} (the FD A → B)
R₂ = {A, C, D, E} (R minus B, keep A)
Check R₁ = {A, B}: the only non-trivial FD is A → B, and A is a key of R₁, so R₁ is in BCNF ✓.
Check R₂ = {A, C, D, E}:
Project F onto {A, C, D, E} using closures: A⁺ = {A, B, C, D}, so A → C and A → D hold in R₂; C⁺ = {C, D}, so C → D holds. (B → C does not apply directly, since B ∉ R₂, but its effect survives through the derived FD A → C.)
So in R₂: A → CD and C → D.
Key of R₂: (A, E), since (A, E)⁺ ⊇ {A, C, D, E} and neither attribute can be dropped.
Check BCNF for R₂: A → CD and C → D are non-trivial, and neither A nor C is a superkey of R₂, so both violate BCNF.
After Iteration 1: R₁ = {A,B} is in BCNF ✓, but R₂ = {A,C,D,E} still has a violation (C → D). Continue decomposing R₂.
Iteration 2: Decompose R₂ on C → D
R₂₁ = {C, D} (the FD C → D)
R₂₂ = {A, C, E} (R₂ minus D, keep C)
Check R₂₁ = {C, D}: the only non-trivial FD is C → D, and C is a key of R₂₁, so R₂₁ is in BCNF ✓.
Check R₂₂ = {A, C, E}:
Projected FDs: A → C holds (since C ∈ A⁺); nothing determines E.
Key of R₂₂: (A, E), since (A, E)⁺ ⊇ {A, C, E}.
Check BCNF for R₂₂: A → C is non-trivial, and A is not a superkey of R₂₂ (A⁺ does not contain E).
R₂₂ is not in BCNF. Continue.
Iteration 3: Decompose R₂₂ on A → C
R₂₂₁ = {A, C} (the FD A → C)
R₂₂₂ = {A, E} (R₂₂ minus C, keep A)
Check R₂₂₁ = {A, C}: the only non-trivial FD is A → C, and A is a key, so it is in BCNF ✓.
Check R₂₂₂ = {A, E}: no non-trivial FDs hold; the key is (A, E), and the relation is trivially in BCNF ✓.
Final decomposition into BCNF: {R₁, R₂₁, R₂₂₁, R₂₂₂} = {{A,B}, {C,D}, {A,C}, {A,E}}. Each relation is in BCNF, and the decomposition is lossless by construction.
| Relation | Schema | Key | FDs | BCNF? |
|---|---|---|---|---|
| R₁ | {A, B} | A | A → B | ✓ |
| R₂₁ | {C, D} | C | C → D | ✓ |
| R₂₂₁ | {A, C} | A | A → C | ✓ |
| R₂₂₂ | {A, E} | (A, E) | none | ✓ |
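As an aside, running the `bcnf_decompose` sketch from earlier on this input (with the same FD encoding) yields a different but equally valid BCNF decomposition, because that sketch splits off everything the violating determinant can reach rather than a single FD:

```python
fds = [(frozenset("A"), frozenset("B")),
       (frozenset("B"), frozenset("C")),
       (frozenset("C"), frozenset("D"))]

for ri in bcnf_decompose(set("ABCDE"), fds):
    print(sorted(ri))
# One possible output: ['A', 'E'], ['A', 'B'], ['B', 'C'], ['C', 'D'] --
# not the same relations as the hand trace, but still lossless and in BCNF.
```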
Verifying Losslessness:
Each decomposition step used an FD where the determinant becomes the common attribute:
R split into R₁{A,B} and R₂{A,C,D,E} on A → B
R₂ split into R₂₁{C,D} and R₂₂{A,C,E} on C → D
R₂₂ split into R₂₂₁{A,C} and R₂₂₂{A,E} on A → C
Reconstruction order:
R₂₂₁ ⋈ R₂₂₂ = R₂₂ (join on A)
R₂₂ ⋈ R₂₁ = R₂ (join on C)
R₂ ⋈ R₁ = R (join on A)
Each join is lossless, so the entire decomposition is lossless.
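As a sanity check, the reconstruction can also be tested on concrete tuples. The snippet below is self-contained and uses made-up data that satisfies the given FDs; `natural_join` and `project` are illustrative helpers of my own, not a library API.

```python
def natural_join(s, t):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for row_s in s:
        for row_t in t:
            shared = set(row_s) & set(row_t)
            if all(row_s[a] == row_t[a] for a in shared):
                merged = dict(row_s)
                merged.update(row_t)
                out.append(merged)
    return out

def project(rel, attrs):
    """Project a relation onto a set of attributes, removing duplicates."""
    seen = []
    for row in rel:
        p = {a: row[a] for a in attrs}
        if p not in seen:
            seen.append(p)
    return seen

# Made-up instance of R(A, B, C, D, E) consistent with A->B, B->C, C->D
R = [{"A": 1, "B": 1, "C": 1, "D": 1, "E": 1},
     {"A": 1, "B": 1, "C": 1, "D": 1, "E": 2},
     {"A": 2, "B": 1, "C": 1, "D": 1, "E": 1}]

r1   = project(R, {"A", "B"})
r21  = project(R, {"C", "D"})
r221 = project(R, {"A", "C"})
r222 = project(R, {"A", "E"})

# Follow the reconstruction order above: join on A, then C, then A
rebuilt = natural_join(natural_join(natural_join(r221, r222), r21), r1)
assert all(row in R for row in rebuilt) and all(row in rebuilt for row in R)
print("lossless on this instance:", len(rebuilt) == len(R))
```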
The 3NF Synthesis Algorithm takes a different approach. Instead of iteratively decomposing based on violations, it synthesizes relations from functional dependencies, guaranteeing both losslessness and dependency preservation.
This is important because BCNF decomposition can lose the ability to enforce some FDs on single tables. 3NF synthesis avoids this by designing around the FDs from the start.
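To see what is at stake, here is a small self-contained check of the simple "each FD fits inside one schema" criterion, applied to the BCNF result from the example above; `preserves_dependencies` is an illustrative helper of my own.

```python
def preserves_dependencies(decomposition, fds):
    """Sufficient check: every FD X -> Y sits wholly inside some schema.
    (The weaker closure-based equivalence test can succeed even when this fails.)"""
    return all(any(lhs | rhs <= ri for ri in decomposition) for lhs, rhs in fds)

fds = [(frozenset("A"), frozenset("B")),
       (frozenset("B"), frozenset("C")),
       (frozenset("C"), frozenset("D"))]

bcnf_result = [set("AB"), set("CD"), set("AC"), set("AE")]  # from the example above
three_nf    = [set("AB"), set("BC"), set("CD"), set("AE")]  # synthesized in this section

print(preserves_dependencies(bcnf_result, fds))  # False: B -> C fits in no single schema
print(preserves_dependencies(three_nf, fds))     # True
```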
```
3NF SYNTHESIS ALGORITHM
=======================

INPUT:
  - Universal relation schema R = {A₁, A₂, ..., Aₙ}
  - Set of functional dependencies F

OUTPUT:
  - Set of relation schemas D = {R₁, R₂, ..., Rₘ} such that:
    1. Each Rᵢ is in 3NF
    2. The decomposition is lossless
    3. The decomposition is dependency-preserving

ALGORITHM:

STEP 1: Compute Canonical Cover
-------------------------------
Fc = canonical_cover(F)
// Removes extraneous attributes and redundant FDs
// Each FD has a singleton right-hand side

STEP 2: Create Relations from FDs
---------------------------------
D = {}
FOR each FD (X → A) in Fc:
    // Group FDs with the same LHS into one relation
    IF some Rᵢ in D was created for the same left-hand side X:
        Add A to Rᵢ
    ELSE:
        Add new relation X ∪ {A} to D

STEP 3: Ensure Losslessness (Add Key Relation)
----------------------------------------------
// Check if any existing relation contains a candidate key
candidate_keys = find_all_candidate_keys(R, F)

IF no relation in D contains any candidate key:
    Add one candidate key as a new relation to D

STEP 4: Remove Redundant Relations (Optional)
---------------------------------------------
// If one relation's schema is a subset of another's
FOR each pair (Rᵢ, Rⱼ) where Rᵢ ⊂ Rⱼ:
    Remove Rᵢ from D

RETURN D

PROPERTIES GUARANTEED:
----------------------
1. LOSSLESS: The candidate-key relation ensures reconstruction
2. DEPENDENCY-PRESERVING: Each FD is contained in one relation
3. 3NF: Each relation built from the FDs with LHS X has X as a key
```

The key insight: by adding a relation containing a candidate key (if one is not already present), we ensure there is a way to reconstruct the original relation. The candidate key ties all the pieces together, acting as the anchor for joining the decomposed relations.
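Here is a compact Python sketch of Steps 2–4. It assumes the canonical cover and one candidate key are supplied as inputs (computing them is outside this sketch, and a full implementation would consider all candidate keys in Step 3); the function name `synthesize_3nf` and the data representation are my own.

```python
def synthesize_3nf(R, fc, candidate_key):
    """3NF synthesis from a canonical cover `fc` given as (lhs, rhs) frozenset
    pairs with singleton rhs; `candidate_key` is one precomputed key of R."""
    # Step 2: group FDs by left-hand side and build one schema per group
    schemas = {}
    for lhs, rhs in fc:
        schemas.setdefault(lhs, set(lhs)).update(rhs)
    relations = list(schemas.values())

    # Step 3: add a key relation if no schema already contains the candidate key
    if not any(set(candidate_key) <= ri for ri in relations):
        relations.append(set(candidate_key))

    # Step 4: drop any schema that is a proper subset of another
    relations = [ri for ri in relations
                 if not any(ri < rj for rj in relations if ri is not rj)]
    return relations

# Running example: R = {A,B,C,D,E}, Fc = {A->B, B->C, C->D}, key (A, E)
fc = [(frozenset("A"), frozenset("B")),
      (frozenset("B"), frozenset("C")),
      (frozenset("C"), frozenset("D"))]
for ri in synthesize_3nf(set("ABCDE"), fc, {"A", "E"}):
    print(sorted(ri))  # ['A', 'B'], ['B', 'C'], ['C', 'D'], ['A', 'E'] (in some order)
```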
Let's apply the 3NF synthesis algorithm to the same example.
Problem:
R = (A, B, C, D, E)
Functional Dependencies F:
A → B
B → C
C → D
STEP 1: Compute Canonical Cover
F is already minimal:
Fc = {A → B, B → C, C → D}
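This claim can be spot-checked with closures (again using the hypothetical `closure` helper from earlier): removing any one FD should make its right-hand side underivable, and since every left-hand side is a single attribute, no LHS attribute can be extraneous.

```python
fds = [(frozenset("A"), frozenset("B")),
       (frozenset("B"), frozenset("C")),
       (frozenset("C"), frozenset("D"))]

# No FD is redundant: dropping any one of them, its RHS is no longer derivable
for i, (lhs, rhs) in enumerate(fds):
    rest = fds[:i] + fds[i + 1:]
    assert not rhs <= closure(lhs, rest)

print("F is its own canonical cover")
```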
STEP 2: Create Relations from FDs
Group by left-hand side: A → B gives {A, B}, B → C gives {B, C}, and C → D gives {C, D}.
D = {{A,B}, {B,C}, {C,D}}
STEP 3: Ensure Losslessness
Find candidate keys of R: as computed earlier, the only candidate key is (A, E), since A and E appear on no right-hand side and (A, E)⁺ = R.
Check if any relation in D contains (A, E): {A, B}, {B, C}, and {C, D} all lack E.
No relation contains the candidate key!
Add R₄ = {A, E} (the candidate key itself).
D = {{A,B}, {B,C}, {C,D}, {A,E}}
STEP 4: Remove Redundant Relations
Check for subset relationships: none of {A,B}, {B,C}, {C,D}, {A,E} is a subset of another.
No redundancies to remove.
Final Result:
3NF Decomposition = {{A,B}, {B,C}, {C,D}, {A,E}}
| Relation | Schema | Key | FDs Preserved | 3NF? |
|---|---|---|---|---|
| R₁ | {A, B} | A | A → B | ✓ |
| R₂ | {B, C} | B | B → C | ✓ |
| R₃ | {C, D} | C | C → D | ✓ |
| R₄ | {A, E} | (A, E) | none (key relation) | ✓ |
Notice the 3NF synthesis gave us {{A,B}, {B,C}, {C,D}, {A,E}}, while BCNF decomposition gave {{A,B}, {C,D}, {A,C}, {A,E}}. Both are lossless, but 3NF synthesis keeps B → C inside a single relation (dependency-preserving), whereas in the BCNF result B → C is not contained in any one relation and can only be enforced by joining.
Both algorithms guarantee losslessness, but they have different trade-offs. Understanding these helps you choose the right approach for your schema design.
| Property | BCNF Decomposition | 3NF Synthesis |
|---|---|---|
| Lossless? | ✓ Always | ✓ Always |
| Dependency Preserving? | ✗ Not guaranteed | ✓ Always |
| Normal Form Achieved | BCNF (stronger) | 3NF (weaker) |
| Approach | Top-down decomposition | Bottom-up synthesis |
| Redundancy | Minimal (BCNF) | May have some (3NF) |
| Constraint Enforcement | May require joins | Single-table checks |
| Algorithm Complexity | Simpler to understand | Requires canonical cover |
In practice, most production systems settle on 3NF for operational data (where constraints matter) and may denormalize or use BCNF for analytical workloads (where query performance matters more than constraint enforcement).
Even when using these algorithms, it's good practice to verify the results. Here's a checklist approach:
```
DECOMPOSITION VERIFICATION CHECKLIST
====================================

Given: Original R, Decomposition D = {R₁, R₂, ..., Rₘ}, FDs F

1. ATTRIBUTE PRESERVATION
-------------------------
□ Union of all Rᵢ equals R
  Check: R₁ ∪ R₂ ∪ ... ∪ Rₘ = R

2. LOSSLESSNESS
---------------
□ Binary test for the 2-relation case:
  - Common = R₁ ∩ R₂
  - Common⁺ ⊇ R₁ OR Common⁺ ⊇ R₂
□ For multi-relation decompositions, run the Chase or check sequentially:
  - Can the original be reconstructed through pairwise lossless joins?

3. DEPENDENCY PRESERVATION (if required)
----------------------------------------
□ For each FD (X → Y) in F:
  - Check if X ∪ Y ⊆ some Rᵢ
□ Alternatively: (F₁ ∪ F₂ ∪ ... ∪ Fₘ)⁺ = F⁺
  where Fᵢ = projection of F onto Rᵢ

4. NORMAL FORM VERIFICATION
---------------------------
□ For each Rᵢ, with projected FDs Fᵢ:

  For 3NF: Every non-trivial FD X → A in Fᵢ satisfies:
    - X is a superkey of Rᵢ, OR
    - A is part of some candidate key of Rᵢ

  For BCNF: Every non-trivial FD X → A in Fᵢ satisfies:
    - X is a superkey of Rᵢ

5. PRACTICAL SANITY CHECKS
--------------------------
□ Each relation has a meaningful key
□ No attribute appears in suspiciously many relations
□ Domain relationships make semantic sense
□ The reconstruction join path is clear
```

You now understand how to systematically create lossless decompositions through established algorithms.
You can now create lossless decompositions using BCNF decomposition or 3NF synthesis. Next, we'll explore how to ensure information is truly preserved during these transformations—understanding what 'information preservation' really means at a deeper level.