Loading content...
We've established what MVD violations look like and the problems they cause. Now comes the actionable part: how do we fix them?
The solution is decomposition—splitting a relation with MVD violations into multiple smaller relations that collectively preserve all information while eliminating redundancy. The 4NF decomposition algorithm is elegant, principled, and guarantees a lossless result.
This page presents the algorithm in full detail, with rigorous worked examples and analysis of its properties.
By the end of this page, you will understand the 4NF decomposition algorithm, be able to apply it to any relation with MVD violations, verify that your decomposition is lossless, and handle edge cases and complex scenarios.
Before diving into the algorithm, let's understand why decomposition works for MVD violations.
The Core Insight:
An MVD violation occurs when two independent sets of values are stored together, requiring all combinations. By separating these independent sets into different relations, we:
The Mathematical Foundation:
If a relation R has an MVD X →→ Y, then R can be decomposed into:
This decomposition is lossless: R = R₁ ⋈ R₂ (natural join reconstructs original).
The proof relies on the MVD definition: the 'swap' property ensures all combinations exist, so the join produces exactly the original tuples.
R = EmpSkillProject
MVD: EmpID →→ Skill
X = {EmpID}, Y = {Skill}, Z = {Project}R₁ = EmpSkill(EmpID, Skill)
R₂ = EmpProject(EmpID, Project)Why the Join Reconstructs Correctly:
Consider the original data:
| EmpID | Skill | Project |
|---|---|---|
| E1 | Java | Alpha |
| E1 | Java | Beta |
| E1 | Python | Alpha |
| E1 | Python | Beta |
Decomposed:
R₁ = EmpSkill:
| EmpID | Skill |
|---|---|
| E1 | Java |
| E1 | Python |
R₂ = EmpProject:
| EmpID | Project |
|---|---|
| E1 | Alpha |
| E1 | Beta |
R₁ ⋈ R₂ (joining on EmpID):
Every row in R₁ with E1 matches every row in R₂ with E1:
This is exactly the original relation—the join is lossless.
Decomposition based on an MVD is always lossless. This is a fundamental theorem: if X →→ Y holds on R, then R = π_{XY}(R) ⋈ π_{X(R-Y)}(R). The MVD's 'tuple swapping' property ensures no spurious tuples are created.
The 4NF decomposition algorithm iteratively removes MVD violations until all remaining relations satisfy 4NF. Here is the formal algorithm:
123456789101112131415161718192021222324252627282930313233
Algorithm: 4NF_Decomposition(R, D)Input: - R: Relation schema with attributes - D: Set of functional dependencies and multivalued dependencies Output: - A set of relation schemas, each in 4NF Procedure: 1. Initialize result = {R} 2. Compute D⁺, the closure of D (all implied FDs and MVDs) 3. WHILE there exists a schema Rᵢ in result that is NOT in 4NF: a. Find a non-trivial MVD X →→ Y in D⁺ such that: - X →→ Y applies to Rᵢ - X is NOT a superkey of Rᵢ b. Decompose Rᵢ into: - R₁ = X ∪ Y - R₂ = X ∪ (Rᵢ - Y) Note: These are attribute sets; create new schemas with these attributes c. Remove Rᵢ from result d. Add R₁ and R₂ to result e. Project constraints from D onto R₁ and R₂: - For FDs: Use attribute closure to determine relevant FDs - For MVDs: Handle context sensitivity carefully 4. RETURN resultAlgorithm Walkthrough:
Step 1: Initialization Start with the original relation as the only element in our result set.
Step 2: Closure Computation Compute all implied dependencies. This is important because decomposition may require MVDs that aren't explicitly stated but are implied by others.
Step 3: Violation Detection and Resolution Repeatedly find violations and decompose. Each decomposition strictly reduces violation count because:
Step 4: Termination The algorithm terminates because each decomposition reduces the total number of attributes per relation, and relations with 2 or fewer attributes are automatically in 4NF.
Key Properties:
When decomposing, treat FDs as special cases of MVDs. If the original relation has both FD and MVD violations, address them together. Every FD X → Y implies X →→ Y, so the MVD-based decomposition handles both constraint types.
Let's work through a comprehensive example step by step.
Initial Relation:
EmployeeData(EmpID, Skill, Project, Department)
Dependencies:
Candidate Keys: {EmpID, Skill, Project} is the only candidate key (Department is functionally determined).
| EmpID | Skill | Project | Department |
|---|---|---|---|
| E1 | Java | Alpha | Engineering |
| E1 | Java | Beta | Engineering |
| E1 | Python | Alpha | Engineering |
| E1 | Python | Beta | Engineering |
| E2 | SQL | Gamma | Data |
| E2 | C++ | Gamma | Data |
Step 1: Identify Violations
Check MVD EmpID →→ Skill:
Check MVD EmpID →→ Project:
Check FD EmpID → Department:
Step 2: First Decomposition
Let's decompose using EmpID →→ Skill:
| Relation | Attributes | Instance |
|---|---|---|
| R₁ = EmpSkill | (EmpID, Skill) | (E1, Java), (E1, Python), (E2, SQL), (E2, C++) |
| R₂ = EmpProjectDept | (EmpID, Project, Department) | (E1, Alpha, Engineering), (E1, Beta, Engineering), (E2, Gamma, Data) |
Step 3: Check R₁ = EmpSkill(EmpID, Skill)
Step 4: Check R₂ = EmpProjectDept(EmpID, Project, Department)
MVD EmpID →→ Project: Is it trivial? Project ⊄ EmpID, EmpID ∪ Project ≠ R₂. Non-trivial.
Is EmpID a superkey of R₂? Key is {EmpID, Project} (since EmpID → Department). EmpID is a proper subset, so not a superkey. Violation!
FD EmpID → Department: Is EmpID a superkey? No. BCNF Violation!
Step 5: Second Decomposition
Decompose R₂ using EmpID →→ Project:
| Relation | Attributes | Instance |
|---|---|---|
| EmpSkill | (EmpID, Skill) | (E1, Java), (E1, Python), (E2, SQL), (E2, C++) |
| EmpProject | (EmpID, Project) | (E1, Alpha), (E1, Beta), (E2, Gamma) |
| EmpDept | (EmpID, Department) | (E1, Engineering), (E2, Data) |
Step 6: Verify All Relations Are in 4NF
EmpSkill(EmpID, Skill):
EmpProject(EmpID, Project):
EmpDept(EmpID, Department):
Final Result:
Three relations in 4NF:
Storage Comparison:
Actually, let's count correctly:
But total row count can be similar. The benefit is in maintenance and data integrity, not always raw row count.
Decomposition doesn't always reduce total row count. It reduces REDUNDANCY (repeated facts). Each fact is stored once. The benefit is in anomaly elimination, not necessarily storage reduction.
A critical property of any decomposition is that it must be lossless—the original relation can be perfectly reconstructed by joining the decomposed relations. Let's verify this for our example.
Lossless Join Theorem for MVDs:
If R is decomposed into R₁ and R₂, the decomposition is lossless if and only if:
(R₁ ∩ R₂) →→ (R₁ - R₂) OR (R₁ ∩ R₂) →→ (R₂ - R₁)
in the closure of the dependencies on R.
For MVD decomposition, this is guaranteed by construction: we decompose on an MVD X →→ Y where R₁ = XY and R₂ = X(R-Y). The common attributes are X, and X →→ Y holds.
Practical Verification via Join:
To verify losslessness empirically, we can:
-- Reconstruct from decomposition
SELECT es.EmpID, es.Skill, ep.Project, ed.Department
FROM EmpSkill es
NATURAL JOIN EmpProject ep
NATURAL JOIN EmpDept ed;
-- Compare with original
-- Should produce identical rows (in set-equality sense)
Why MVD Decomposition Is Always Lossless:
The key insight is that MVD X →→ Y guarantees the 'swap' property: if tuples (x, y₁, z₁) and (x, y₂, z₂) exist, then (x, y₁, z₂) and (x, y₂, z₁) also exist.
When we project and join:
But by the MVD, all these combinations already exist in R. So the join produces exactly R, no more, no less.
When you decompose based on a valid MVD, losslessness is guaranteed. Unlike FD-based BCNF decomposition (which requires careful checking), MVD-based 4NF decomposition is automatically lossless by construction.
While 4NF decomposition is always lossless, dependency preservation is more nuanced. Some dependencies may not be enforceable on the decomposed relations alone.
Dependency Preservation Defined:
A decomposition preserves dependencies if every dependency in the original set D can be enforced by checking constraints on individual decomposed relations, without joining them.
The Challenge with MVDs:
MVDs are particularly tricky for preservation. An MVD X →→ Y on R may not be directly enforceable after decomposition because:
Example Analysis:
Original: EmployeeData(EmpID, Skill, Project, Department) MVDs: EmpID →→ Skill, EmpID →→ Project
After decomposition:
The original MVDs are 'satisfied' because:
The Practical Implication:
For most 4NF decompositions, the primary constraints are naturally preserved:
When Extra Enforcement Is Needed:
If the original relation had a complex constraint like 'every employee must have at least one skill', this becomes a cross-table constraint after decomposition. You would need:
Trade-off Analysis:
| Aspect | Before Decomposition | After Decomposition |
|---|---|---|
| Redundancy | High | None |
| Update anomalies | Present | Eliminated |
| Single-table constraints | Easy | Easy |
| Cross-table constraints | N/A | Requires triggers |
Losslessness is mandatory—you must be able to reconstruct data. Dependency preservation is desirable but not always achievable. The 4NF decomposition algorithm prioritizes losslessness and eliminates redundancy; dependency preservation may require supplementary enforcement mechanisms.
The basic 4NF decomposition algorithm has several variations and optimizations for practical application.
Variation 1: MVD Selection Strategy
When multiple MVD violations exist, the order of decomposition affects the intermediate steps (but not the final result in terms of 4NF satisfaction).
Greedy approach: Choose the MVD whose decomposition reduces redundancy most.
Predictable approach: Process MVDs in a deterministic order (e.g., by determinant size, then dependent size).
Variation 2: Combined FD/MVD Processing
Rather than handling FDs and MVDs separately:
This simplifies implementation and ensures a coherent result.
12345678910111213141516171819202122232425262728293031323334353637
Algorithm: Optimized_4NF_Decomposition(R, FDs, MVDs) # Step 1: Convert FDs to MVDs all_mvds = MVDs ∪ { X →→ Y : X → Y ∈ FDs } # Step 2: Initialize result = {R} # Step 3: Iterative decomposition changed = True WHILE changed: changed = False FOR each Rᵢ in result: # Find violating MVD with smallest determinant # (optimization: decompose from smaller determinants first) violations = find_violations(Rᵢ, all_mvds) IF violations is not empty: # Select MVD with minimal |X| mvd = min(violations, key = lambda m: |m.determinant|) # Decompose R₁, R₂ = decompose(Rᵢ, mvd) # Update result result.remove(Rᵢ) result.add(R₁) result.add(R₂) # Project MVDs for new relations all_mvds = project_mvds(all_mvds, R₁, R₂) changed = True BREAK # Restart loop with updated result RETURN resultVariation 3: Minimal Decomposition
The basic algorithm may produce more relations than necessary. A minimal decomposition variant seeks the fewest relations:
Example:
For R(A, B, C, D) with A →→ B and A →→ C (both independent of each other and D):
Iterative approach:
Direct approach: Recognize three independent facts: A-B relationship, A-C relationship, A-D relationship. Directly construct: {AB}, {AC}, {AD}.
Both produce the same result, but the direct approach requires less iteration.
Optimization: Avoiding Redundant Decompositions
After each decomposition, check if any resulting relation is a projection of another. If so, it can be eliminated (its information is already contained).
This prevents producing unnecessarily many small relations when the data has hierarchical MVD structure.
For most real-world cases, analyze the semantic meaning of MVDs first. This often reveals the natural decomposition directly: each independent multi-valued fact gets its own relation, plus one relation for single-valued facts. The iterative algorithm serves as verification.
Not all 4NF decompositions are straightforward. Let's examine scenarios that require careful handling.
Scenario 1: Multiple Independent MVD Groups
Consider: R(A, B, C, D, E) with:
Decomposition:
Note that D and E are NOT independent of each other—they're stored together in R₃.
Scenario 2: Overlapping MVDs
Consider: R(A, B, C) with:
Is there a 4NF violation? Let's check:
If the only candidate key is ABC, then A is not a superkey, and A →→ B violates 4NF.
Decomposition: R₁(A, B), R₂(A, C)? No—we lose the AB → C dependency.
Actually, if AB →→ C is the only interesting constraint besides A →→ B:
The decomposition works but the AB →→ C constraint is now implicit (enforced by the join structure).
Scenario 3: Embedded FDs Affecting MVDs
Consider: R(A, B, C, D) with:
Violations:
Decomposition options:
Option A: Address MVD first
Let's clarify: We need more information about D. Assume A → D also.
Option B: Address FD first
The order can matter for intermediate steps, but both should reach 4NF.
Complex scenarios require careful dependency analysis. Always clearly identify all FDs and MVDs, determine all candidate keys, and then systematically apply the decomposition algorithm. Drawing dependency diagrams can help visualize relationships.
We've covered the complete 4NF decomposition process. Let's consolidate the key principles:
What's Next:
With the decomposition algorithm mastered, the next page explores the relationship between 4NF and BCNF—examining when 4NF truly adds value beyond BCNF, and when BCNF alone is sufficient.
You now have a complete understanding of the 4NF decomposition algorithm, including the formal procedure, worked examples, lossless verification, dependency preservation considerations, and handling of complex scenarios. Next, we'll compare 4NF with BCNF in depth.