So far, we've learned how to test whether a decomposition is lossless. But in practice, you rarely receive a decomposition and need to verify it—instead, you're creating decompositions as part of normalization. The question becomes: How do we systematically construct decompositions that are guaranteed to be lossless?
This page presents the key decomposition algorithms used in database normalization. These algorithms don't just produce decompositions—they produce lossless decompositions by construction. When you follow these algorithms correctly, losslessness is guaranteed without needing to run the Chase algorithm afterward.
We'll cover the binary lossless decomposition principle, the BCNF decomposition algorithm, and the 3NF synthesis algorithm, along with guidance on when to use each.
By the end of this page, you will understand how to systematically decompose relations while preserving information, apply the BCNF decomposition and 3NF synthesis algorithms, understand why these algorithms guarantee losslessness, and know when to use each approach.
Before diving into specific algorithms, let's establish the core principle that makes decomposition algorithms produce lossless results.
The Binary Lossless Decomposition Principle:
Given a relation R and a non-trivial functional dependency X → Y (that is, Y ⊈ X), we can decompose R into:
R₁ = X ∪ Y (The FD's attributes)
R₂ = R - (Y - X) (Everything except Y's non-X part)
= (R - Y) ∪ X (equivalently: R minus Y, with X added back)
Or more simply: put the FD's attributes in their own relation, and keep everything else (together with X) in the other.
This decomposition is always lossless because R₁ ∩ R₂ = X, and X → Y means X determines every attribute of R₁; by the binary lossless-join test, joining R₁ and R₂ reconstructs R exactly.
Decomposing along functional dependencies is inherently safe. The determinant of the FD becomes the join attribute, and it's a key for the relation containing the FD. This structural guarantee is why normalization algorithms work.
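To make the principle concrete, here is a minimal Python sketch of the attribute-closure computation and the binary split, applied to the example worked in the block below. The representation (attribute sets, FDs as `(lhs, rhs)` frozenset pairs) and the helper names `closure` and `binary_split` are my own choices, not part of the page's pseudocode.

```python
def closure(attrs, fds):
    """Compute attrs+ under a list of FDs given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def binary_split(R, X, Y, fds):
    """Split R on the FD X -> Y into (R1, R2); lossless because X is a superkey of R1."""
    R1 = X | Y
    R2 = R - (Y - X)
    common = R1 & R2                   # always equals X here
    assert R1 <= closure(common, fds)  # binary lossless-join test
    return R1, R2

# Example from the block below: R = {A,B,C,D,E}, FDs B -> D and A -> C
fds = [(frozenset("B"), frozenset("D")), (frozenset("A"), frozenset("C"))]
R1, R2 = binary_split(set("ABCDE"), frozenset("B"), frozenset("D"), fds)
print(sorted(R1), sorted(R2))  # ['B', 'D'] ['A', 'B', 'C', 'E']
```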
```
BINARY LOSSLESS DECOMPOSITION
=============================

Given: Relation R with FD X → Y (where Y ⊈ X)

Decompose into:
  R₁ = X ∪ Y
  R₂ = R - (Y - X)        [R minus the non-trivial part of Y]

Alternatively stated as:
  R₁ = X ∪ Y
  R₂ = (R - Y) ∪ X        [R minus Y, reunited with X]

Properties:
  - R₁ ∩ R₂ = X (the determinant)
  - X ∪ Y ⊆ R₁ and X → Y holds, so X is a superkey of R₁
  - Therefore: (R₁ ∩ R₂) → R₁ holds
  - LOSSLESS by binary test ✓

EXAMPLE:
========
R = (A, B, C, D, E)
FD: B → D

Decompose:
  R₁ = {B, D}        (the FD)
  R₂ = {A, B, C, E}  (R minus D, B stays as join key)

Check:
  Common = {B}
  B → D, so B → R₁ = {B, D} ✓ Lossless!

Another FD: A → C

Further decompose R₂:
  R₂₁ = {A, C}
  R₂₂ = {A, B, E}

Check:
  Common = {A}
  A → C, so A → R₂₁ = {A, C} ✓ Lossless!

Final decomposition: {R₁, R₂₁, R₂₂} = {{B,D}, {A,C}, {A,B,E}}
```

The BCNF Decomposition Algorithm repeatedly applies the binary decomposition principle to eliminate BCNF violations until all resulting relations are in BCNF.
Recall BCNF Definition:
A relation R is in BCNF if and only if for every non-trivial FD X → Y that holds in R, X is a superkey of R.
BCNF Violation:
An FD X → Y violates BCNF if it is non-trivial and X is not a superkey of R (X does not determine every attribute of R).
The algorithm finds such violations and decomposes the relation until no violations remain.
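In code, spotting a violation boils down to a closure check. The sketch below reuses the hypothetical `closure` helper from the earlier snippet; note that trying only the left-hand sides that appear in F is a simplification — a complete BCNF check on a projected schema may need to examine other subsets of Rᵢ as well.

```python
def is_superkey(X, Ri, fds):
    """X is a superkey of schema Ri iff Ri is contained in the closure of X under F."""
    return Ri <= closure(X, fds)

def find_violation(Ri, fds):
    """Return (X, Y) such that X -> Y is a non-trivial BCNF violation in Ri, or None.
    Only left-hand sides that appear in F are tried (a simplification)."""
    for X, _ in fds:
        if X <= Ri and not is_superkey(X, Ri, fds):
            Y = (closure(X, fds) & Ri) - X   # everything X determines inside Ri
            if Y:
                return X, Y
    return None
```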
```
BCNF DECOMPOSITION ALGORITHM
============================

INPUT:
  - Relation R with schema {A₁, A₂, ..., Aₙ}
  - Set of functional dependencies F

OUTPUT:
  - Set of relation schemas D = {R₁, R₂, ..., Rₘ} such that:
    1. Each Rᵢ is in BCNF
    2. The decomposition is lossless
    [Note: Dependency preservation is NOT guaranteed]

ALGORITHM:

FUNCTION BCNF_Decompose(R, F):
    result = {R}

    WHILE there exists Rᵢ in result that is not in BCNF:

        // Find a BCNF violation in Rᵢ
        Find X → Y in F⁺ such that:
          - X → Y is non-trivial (Y ⊈ X)
          - X → Y holds in Rᵢ (X ∪ Y ⊆ Rᵢ)
          - X is not a superkey of Rᵢ

        // Decompose Rᵢ using this violation
        R₁ = X ∪ Y           // Contains the violating FD
        R₂ = Rᵢ - (Y - X)    // Everything else, keeping X

        // Update result
        result = (result - {Rᵢ}) ∪ {R₁, R₂}

        // Note: Compute projections of F onto R₁ and R₂
        // for checking BCNF in subsequent iterations

    RETURN result

CORRECTNESS:
------------
- BCNF: Algorithm terminates only when no violations exist
- Lossless: Each binary split uses the fundamental principle
  (the common attributes X form a superkey of R₁)
- Termination: Each split produces relations with strictly fewer
  attributes than the one they replace, so the process is finite
```

BCNF decomposition guarantees losslessness but NOT dependency preservation: after decomposition, some FDs may no longer fit inside any single relation and can only be checked by joining. For applications where constraints must be enforceable on single tables, consider 3NF synthesis instead.
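Putting the pieces together, here is one way the loop above can be sketched in Python, reusing the hypothetical `find_violation` helper from the previous snippet. Which relations come out depends on which violation is picked first and on how much of X⁺ is split off, so different runs (or different hand traces) can produce different, equally valid BCNF decompositions.

```python
def bcnf_decompose(R, fds):
    """Split relations on detected violations until none remain."""
    worklist, done = [set(R)], []
    while worklist:
        Ri = worklist.pop()
        violation = find_violation(Ri, fds)
        if violation is None:
            done.append(Ri)          # no detected violation: keep Ri
        else:
            X, Y = violation
            worklist.append(X | Y)   # R1 = X ∪ Y, the violating FD's attributes
            worklist.append(Ri - Y)  # R2 keeps X as the join key (Y excludes X here)
    return done
```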
Let's trace through the BCNF decomposition algorithm with a comprehensive example.
Problem:
R = (A, B, C, D, E)
Functional Dependencies F:
A → B
B → C
C → D
Step 0: Initial Analysis
First, find the candidate key(s) of R. A⁺ = {A, B, C, D} (via A → B → C → D), which is missing E, so A alone is not a key. A and E appear on no right-hand side, so every key must contain both; since (A, E)⁺ = {A, B, C, D, E} = R, (A, E) is the unique candidate key.
Candidate Key: (A, E)
Check BCNF: A → B is non-trivial and A is not a superkey of R (A⁺ ≠ R), so it violates BCNF; B → C and C → D are violations for the same reason.
R is not in BCNF. Begin decomposition.
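If you want to check this mechanically, the hypothetical `closure` helper sketched earlier reproduces the key computation:

```python
fds = [(frozenset("A"), frozenset("B")),
       (frozenset("B"), frozenset("C")),
       (frozenset("C"), frozenset("D"))]

print(sorted(closure({"A"}, fds)))       # ['A', 'B', 'C', 'D'] -- no E, so A alone is not a key
print(sorted(closure({"A", "E"}, fds)))  # ['A', 'B', 'C', 'D', 'E'] -- (A, E) is a candidate key
```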
Iteration 1: Decompose on A → B
R₁ = {A, B} (the FD A → B)
R₂ = {A, C, D, E} (R minus B, keep A)
Check R₁ = {A, B}: the only non-trivial FD is A → B, and A is a key of R₁, so R₁ is in BCNF ✓.
Check R₂ = {A, C, D, E}:
Project F onto {A, C, D, E} using closures: A⁺ = {A, B, C, D}, so A → C and A → D hold in R₂; C⁺ = {C, D}, so C → D holds. (B → C does not apply directly, since B ∉ R₂, but its effect survives through the derived FD A → C.)
So in R₂: A → CD and C → D.
Key of R₂: (A, E), since (A, E)⁺ ⊇ {A, C, D, E} and neither attribute can be dropped.
Check BCNF for R₂: A → CD and C → D are non-trivial, and neither A nor C is a superkey of R₂, so both violate BCNF.
After Iteration 1: R₁ = {A,B} is in BCNF ✓, but R₂ = {A,C,D,E} still has a violation (C → D). Continue decomposing R₂.
Iteration 2: Decompose R₂ on C → D
R₂₁ = {C, D} (the FD C → D)
R₂₂ = {A, C, E} (R₂ minus D, keep C)
Check R₂₁ = {C, D}: the only non-trivial FD is C → D, and C is a key of R₂₁, so R₂₁ is in BCNF ✓.
Check R₂₂ = {A, C, E}:
Projected FDs: A → C holds (since C ∈ A⁺); nothing determines E.
Key of R₂₂: (A, E), since (A, E)⁺ ⊇ {A, C, E}.
Check BCNF for R₂₂: A → C is non-trivial, and A is not a superkey of R₂₂ (A⁺ does not contain E).
R₂₂ is not in BCNF. Continue.
Iteration 3: Decompose R₂₂ on A → C
R₂₂₁ = {A, C} (the FD A → C)
R₂₂₂ = {A, E} (R₂₂ minus C, keep A)
Check R₂₂₁ = {A, C}: the only non-trivial FD is A → C, and A is a key, so it is in BCNF ✓.
Check R₂₂₂ = {A, E}: no non-trivial FDs hold; the key is (A, E), and the relation is trivially in BCNF ✓.
Final decomposition into BCNF: {R₁, R₂₁, R₂₂₁, R₂₂₂} = {{A,B}, {C,D}, {A,C}, {A,E}}. Each relation is in BCNF, and the decomposition is lossless by construction.
| Relation | Schema | Key | FDs | BCNF? |
|---|---|---|---|---|
| R₁ | {A, B} | A | A → B | ✓ |
| R₂₁ | {C, D} | C | C → D | ✓ |
| R₂₂₁ | {A, C} | A | A → C | ✓ |
| R₂₂₂ | {A, E} | (A, E) | none | ✓ |
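As an aside, running the `bcnf_decompose` sketch from earlier on this input (with the same FD encoding) yields a different but equally valid BCNF decomposition, because that sketch splits off everything the violating determinant can reach rather than a single FD:

```python
fds = [(frozenset("A"), frozenset("B")),
       (frozenset("B"), frozenset("C")),
       (frozenset("C"), frozenset("D"))]

for ri in bcnf_decompose(set("ABCDE"), fds):
    print(sorted(ri))
# One possible output: ['A', 'E'], ['A', 'B'], ['B', 'C'], ['C', 'D'] --
# not the same relations as the hand trace, but still lossless and in BCNF.
```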
Verifying Losslessness:
Each decomposition step used an FD where the determinant becomes the common attribute:
R split into R₁{A,B} and R₂{A,C,D,E} on A → B
R₂ split into R₂₁{C,D} and R₂₂{A,C,E} on C → D
R₂₂ split into R₂₂₁{A,C} and R₂₂₂{A,E} on A → C
Reconstruction order:
R₂₂₁ ⋈ R₂₂₂ = R₂₂ (join on A)
R₂₂ ⋈ R₂₁ = R₂ (join on C)
R₂ ⋈ R₁ = R (join on A)
Each join is lossless, so the entire decomposition is lossless.
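As a sanity check, the reconstruction can also be tested on concrete tuples. The snippet below is self-contained and uses made-up data that satisfies the given FDs; `natural_join` and `project` are illustrative helpers of my own, not a library API.

```python
def natural_join(s, t):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for row_s in s:
        for row_t in t:
            shared = set(row_s) & set(row_t)
            if all(row_s[a] == row_t[a] for a in shared):
                merged = dict(row_s)
                merged.update(row_t)
                out.append(merged)
    return out

def project(rel, attrs):
    """Project a relation onto a set of attributes, removing duplicates."""
    seen = []
    for row in rel:
        p = {a: row[a] for a in attrs}
        if p not in seen:
            seen.append(p)
    return seen

# Made-up instance of R(A, B, C, D, E) consistent with A->B, B->C, C->D
R = [{"A": 1, "B": 1, "C": 1, "D": 1, "E": 1},
     {"A": 1, "B": 1, "C": 1, "D": 1, "E": 2},
     {"A": 2, "B": 1, "C": 1, "D": 1, "E": 1}]

r1   = project(R, {"A", "B"})
r21  = project(R, {"C", "D"})
r221 = project(R, {"A", "C"})
r222 = project(R, {"A", "E"})

# Follow the reconstruction order above: join on A, then C, then A
rebuilt = natural_join(natural_join(natural_join(r221, r222), r21), r1)
assert all(row in R for row in rebuilt) and all(row in rebuilt for row in R)
print("lossless on this instance:", len(rebuilt) == len(R))
```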
The 3NF Synthesis Algorithm takes a different approach. Instead of iteratively decomposing based on violations, it synthesizes relations from functional dependencies, guaranteeing both losslessness and dependency preservation.
This is important because BCNF decomposition can lose the ability to enforce some FDs on single tables. 3NF synthesis avoids this by designing around the FDs from the start.
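To see what is at stake, here is a small self-contained check of the simple "each FD fits inside one schema" criterion, applied to the BCNF result from the example above; `preserves_dependencies` is an illustrative helper of my own.

```python
def preserves_dependencies(decomposition, fds):
    """Sufficient check: every FD X -> Y sits wholly inside some schema.
    (The weaker closure-based equivalence test can succeed even when this fails.)"""
    return all(any(lhs | rhs <= ri for ri in decomposition) for lhs, rhs in fds)

fds = [(frozenset("A"), frozenset("B")),
       (frozenset("B"), frozenset("C")),
       (frozenset("C"), frozenset("D"))]

bcnf_result = [set("AB"), set("CD"), set("AC"), set("AE")]  # from the example above
three_nf    = [set("AB"), set("BC"), set("CD"), set("AE")]  # synthesized in this section

print(preserves_dependencies(bcnf_result, fds))  # False: B -> C fits in no single schema
print(preserves_dependencies(three_nf, fds))     # True
```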
```
3NF SYNTHESIS ALGORITHM
=======================

INPUT:
  - Universal relation schema R = {A₁, A₂, ..., Aₙ}
  - Set of functional dependencies F

OUTPUT:
  - Set of relation schemas D = {R₁, R₂, ..., Rₘ} such that:
    1. Each Rᵢ is in 3NF
    2. The decomposition is lossless
    3. The decomposition is dependency-preserving

ALGORITHM:

STEP 1: Compute Canonical Cover
-------------------------------
Fc = canonical_cover(F)
// Removes extraneous attributes and redundant FDs
// Each FD has a singleton right-hand side

STEP 2: Create Relations from FDs
---------------------------------
D = {}
FOR each FD (X → A) in Fc:
    // Group FDs with the same LHS into one relation
    IF some Rᵢ in D was created for the same left-hand side X:
        Add A to Rᵢ
    ELSE:
        Add new relation X ∪ {A} to D

STEP 3: Ensure Losslessness (Add Key Relation)
----------------------------------------------
// Check if any existing relation contains a candidate key
candidate_keys = find_all_candidate_keys(R, F)

IF no relation in D contains any candidate key:
    Add one candidate key as a new relation to D

STEP 4: Remove Redundant Relations (Optional)
---------------------------------------------
// If one relation's schema is a subset of another's
FOR each pair (Rᵢ, Rⱼ) where Rᵢ ⊂ Rⱼ:
    Remove Rᵢ from D

RETURN D

PROPERTIES GUARANTEED:
----------------------
1. LOSSLESS: The candidate-key relation ensures reconstruction
2. DEPENDENCY-PRESERVING: Each FD is contained in one relation
3. 3NF: Each relation built from the FDs with LHS X has X as a key
```

The key insight: by adding a relation containing a candidate key (if one is not already present), we ensure there is a way to reconstruct the original relation. The candidate key ties all the pieces together, acting as the anchor for joining the decomposed relations.
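Here is a compact Python sketch of Steps 2–4. It assumes the canonical cover and one candidate key are supplied as inputs (computing them is outside this sketch, and a full implementation would consider all candidate keys in Step 3); the function name `synthesize_3nf` and the data representation are my own.

```python
def synthesize_3nf(R, fc, candidate_key):
    """3NF synthesis from a canonical cover `fc` given as (lhs, rhs) frozenset
    pairs with singleton rhs; `candidate_key` is one precomputed key of R."""
    # Step 2: group FDs by left-hand side and build one schema per group
    schemas = {}
    for lhs, rhs in fc:
        schemas.setdefault(lhs, set(lhs)).update(rhs)
    relations = list(schemas.values())

    # Step 3: add a key relation if no schema already contains the candidate key
    if not any(set(candidate_key) <= ri for ri in relations):
        relations.append(set(candidate_key))

    # Step 4: drop any schema that is a proper subset of another
    relations = [ri for ri in relations
                 if not any(ri < rj for rj in relations if ri is not rj)]
    return relations

# Running example: R = {A,B,C,D,E}, Fc = {A->B, B->C, C->D}, key (A, E)
fc = [(frozenset("A"), frozenset("B")),
      (frozenset("B"), frozenset("C")),
      (frozenset("C"), frozenset("D"))]
for ri in synthesize_3nf(set("ABCDE"), fc, {"A", "E"}):
    print(sorted(ri))  # ['A', 'B'], ['B', 'C'], ['C', 'D'], ['A', 'E'] (in some order)
```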
Let's apply the 3NF synthesis algorithm to the same example.
Problem:
R = (A, B, C, D, E)
Functional Dependencies F:
A → B
B → C
C → D
STEP 1: Compute Canonical Cover
F is already minimal:
Fc = {A → B, B → C, C → D}
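This claim can be spot-checked with closures (again using the hypothetical `closure` helper from earlier): removing any one FD should make its right-hand side underivable, and since every left-hand side is a single attribute, no LHS attribute can be extraneous.

```python
fds = [(frozenset("A"), frozenset("B")),
       (frozenset("B"), frozenset("C")),
       (frozenset("C"), frozenset("D"))]

# No FD is redundant: dropping any one of them, its RHS is no longer derivable
for i, (lhs, rhs) in enumerate(fds):
    rest = fds[:i] + fds[i + 1:]
    assert not rhs <= closure(lhs, rest)

print("F is its own canonical cover")
```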
STEP 2: Create Relations from FDs
Group by left-hand side: A → B gives {A, B}, B → C gives {B, C}, and C → D gives {C, D}.
D = {{A,B}, {B,C}, {C,D}}
STEP 3: Ensure Losslessness
Find candidate keys of R: as computed earlier, the only candidate key is (A, E), since A and E appear on no right-hand side and (A, E)⁺ = R.
Check if any relation in D contains (A, E): {A, B}, {B, C}, and {C, D} all lack E.
No relation contains the candidate key!
Add R₄ = {A, E} (the candidate key itself).
D = {{A,B}, {B,C}, {C,D}, {A,E}}
STEP 4: Remove Redundant Relations
Check for subset relationships: none of {A,B}, {B,C}, {C,D}, {A,E} is a subset of another.
No redundancies to remove.
Final Result:
3NF Decomposition = {{A,B}, {B,C}, {C,D}, {A,E}}
| Relation | Schema | Key | FDs Preserved | 3NF? |
|---|---|---|---|---|
| R₁ | {A, B} | A | A → B | ✓ |
| R₂ | {B, C} | B | B → C | ✓ |
| R₃ | {C, D} | C | C → D | ✓ |
| R₄ | {A, E} | (A, E) | none (key relation) | ✓ |
Notice the 3NF synthesis gave us {{A,B}, {B,C}, {C,D}, {A,E}}, while BCNF decomposition gave {{A,B}, {C,D}, {A,C}, {A,E}}. Both are lossless, but 3NF synthesis keeps B → C inside a single relation (dependency-preserving), whereas in the BCNF result B → C is not contained in any one relation and can only be enforced by joining.
Both algorithms guarantee losslessness, but they have different trade-offs. Understanding these helps you choose the right approach for your schema design.
| Property | BCNF Decomposition | 3NF Synthesis |
|---|---|---|
| Lossless? | ✓ Always | ✓ Always |
| Dependency Preserving? | ✗ Not guaranteed | ✓ Always |
| Normal Form Achieved | BCNF (stronger) | 3NF (weaker) |
| Approach | Top-down decomposition | Bottom-up synthesis |
| Redundancy | Minimal (BCNF) | May have some (3NF) |
| Constraint Enforcement | May require joins | Single-table checks |
| Algorithm Complexity | Simpler to understand | Requires canonical cover |
In practice, most production systems settle on 3NF for operational data (where constraints matter) and may denormalize or use BCNF for analytical workloads (where query performance matters more than constraint enforcement).
Even when using these algorithms, it's good practice to verify the results. Here's a checklist approach:
```
DECOMPOSITION VERIFICATION CHECKLIST
====================================

Given: Original R, Decomposition D = {R₁, R₂, ..., Rₘ}, FDs F

1. ATTRIBUTE PRESERVATION
-------------------------
□ Union of all Rᵢ equals R
  Check: R₁ ∪ R₂ ∪ ... ∪ Rₘ = R

2. LOSSLESSNESS
---------------
□ Binary test for the 2-relation case:
  - Common = R₁ ∩ R₂
  - Common⁺ ⊇ R₁ OR Common⁺ ⊇ R₂
□ For multi-relation decompositions, run the Chase or check sequentially:
  - Can the original be reconstructed through pairwise lossless joins?

3. DEPENDENCY PRESERVATION (if required)
----------------------------------------
□ For each FD (X → Y) in F:
  - Check if X ∪ Y ⊆ some Rᵢ
□ Alternatively: (F₁ ∪ F₂ ∪ ... ∪ Fₘ)⁺ = F⁺
  where Fᵢ = projection of F onto Rᵢ

4. NORMAL FORM VERIFICATION
---------------------------
□ For each Rᵢ, with projected FDs Fᵢ:

  For 3NF: Every non-trivial FD X → A in Fᵢ satisfies:
    - X is a superkey of Rᵢ, OR
    - A is part of some candidate key of Rᵢ

  For BCNF: Every non-trivial FD X → A in Fᵢ satisfies:
    - X is a superkey of Rᵢ

5. PRACTICAL SANITY CHECKS
--------------------------
□ Each relation has a meaningful key
□ No attribute appears in suspiciously many relations
□ Domain relationships make semantic sense
□ The reconstruction join path is clear
```

You now understand how to systematically create lossless decompositions through established algorithms.
You can now create lossless decompositions using BCNF decomposition or 3NF synthesis. Next, we'll explore how to ensure information is truly preserved during these transformations—understanding what 'information preservation' really means at a deeper level.