Fourth Normal Form - Learning Module

4NF Decomposition

From Violation to Resolution

We've established what MVD violations look like and the problems they cause. Now comes the actionable part: how do we fix them?

The solution is decomposition—splitting a relation with MVD violations into multiple smaller relations that collectively preserve all information while eliminating redundancy. The 4NF decomposition algorithm is elegant, principled, and guarantees a lossless result.

This page presents the algorithm in full detail, with rigorous worked examples and analysis of its properties.

What You Will Learn

By the end of this page, you will understand the 4NF decomposition algorithm, be able to apply it to any relation with MVD violations, verify that your decomposition is lossless, and handle edge cases and complex scenarios.

The Decomposition Principle

Before diving into the algorithm, let's understand why decomposition works for MVD violations.

The Core Insight:

An MVD violation occurs when two independent sets of values are stored together, requiring all combinations. By separating these independent sets into different relations, we:

Store each set exactly once (eliminating redundancy)
Can reconstruct the original data through natural join
Maintain the semantic meaning of the data

The Mathematical Foundation:

If a relation R has an MVD X →→ Y, then R can be decomposed into:

R₁ = π_{XY}(R), the projection on attributes X ∪ Y
R₂ = π_{XZ}(R), the projection on attributes X ∪ Z, where Z = R - X - Y

This decomposition is lossless: R = R₁ ⋈ R₂ (natural join reconstructs original).

The proof relies on the MVD definition: the 'swap' property ensures all combinations exist, so the join produces exactly the original tuples.

Decomposition in Action

Consider EmpSkillProject(EmpID, Skill, Project) with MVD EmpID →→ Skill:

Input

R = EmpSkillProject
MVD: EmpID →→ Skill
X = {EmpID}, Y = {Skill}, Z = {Project}

Output

R₁ = EmpSkill(EmpID, Skill)
R₂ = EmpProject(EmpID, Project)

Why the Join Reconstructs Correctly:

Consider the original data:

EmpID	Skill	Project
E1	Java	Alpha
E1	Java	Beta
E1	Python	Alpha
E1	Python	Beta

Decomposed:

R₁ = EmpSkill:

EmpID	Skill
E1	Java
E1	Python

R₂ = EmpProject:

EmpID	Project
E1	Alpha
E1	Beta

R₁ ⋈ R₂ (joining on EmpID):

Every row in R₁ with E1 matches every row in R₂ with E1:

(E1, Java) ⋈ (E1, Alpha) → (E1, Java, Alpha)
(E1, Java) ⋈ (E1, Beta) → (E1, Java, Beta)
(E1, Python) ⋈ (E1, Alpha) → (E1, Python, Alpha)
(E1, Python) ⋈ (E1, Beta) → (E1, Python, Beta)

This is exactly the original relation—the join is lossless.
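The round trip above can be reproduced in a few lines of Python. This is a minimal sketch, assuming a relation is represented as a tuple of attribute names plus a set of row tuples; the helper names `project` and `natural_join` are illustrative and not part of any library.

```python
def project(attrs, rows, keep):
    """Project rows onto the attributes in keep, removing duplicates."""
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(row[i] for i in idx) for row in rows}


def natural_join(left, right):
    """Natural join of two (attrs, rows) relations on their shared attributes."""
    l_attrs, l_rows = left
    r_attrs, r_rows = right
    common = [a for a in l_attrs if a in r_attrs]
    extra = [a for a in r_attrs if a not in l_attrs]
    out = set()
    for l in l_rows:
        for r in r_rows:
            if all(l[l_attrs.index(a)] == r[r_attrs.index(a)] for a in common):
                out.add(l + tuple(r[r_attrs.index(a)] for a in extra))
    return l_attrs + tuple(extra), out


attrs = ("EmpID", "Skill", "Project")
rows = {
    ("E1", "Java", "Alpha"),
    ("E1", "Java", "Beta"),
    ("E1", "Python", "Alpha"),
    ("E1", "Python", "Beta"),
}

emp_skill = project(attrs, rows, ("EmpID", "Skill"))
emp_project = project(attrs, rows, ("EmpID", "Project"))
joined_attrs, joined_rows = natural_join(emp_skill, emp_project)

print(joined_attrs == attrs and joined_rows == rows)  # True
```

The nested loop is quadratic and only meant to mirror the row-by-row matching shown above; a real system would join on an index or hash.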

The Lossless Guarantee

Decomposition based on an MVD is always lossless. This is a fundamental theorem: if X →→ Y holds on R, then R = π_{XY}(R) ⋈ π_{X(R-Y)}(R). The MVD's 'tuple swapping' property ensures no spurious tuples are created.
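The guarantee depends on the MVD actually holding. As a contrast, the sketch below uses a hypothetical two-row instance (not taken from the examples on this page) in which EmpID →→ Skill does not hold, and shows the spurious tuples that the same split then produces.

```python
# Hypothetical instance: E1 uses Java only on Alpha and Python only on Beta,
# so Skill and Project are NOT independent and EmpID ->> Skill fails.
rows = {("E1", "Java", "Alpha"), ("E1", "Python", "Beta")}

emp_skill = {(e, s) for e, s, _ in rows}
emp_project = {(e, p) for e, _, p in rows}

# Natural join on EmpID.
rejoined = {
    (e, s, p)
    for e, s in emp_skill
    for e2, p in emp_project
    if e == e2
}

spurious = rejoined - rows
print(sorted(spurious))
# [('E1', 'Java', 'Beta'), ('E1', 'Python', 'Alpha')]
```

The two extra rows are facts the original data never stated, which is why the split is only safe when the MVD is a genuine rule of the domain.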

The 4NF Decomposition Algorithm

The 4NF decomposition algorithm iteratively removes MVD violations until all remaining relations satisfy 4NF. Here is the formal algorithm:

4NF Decomposition Algorithm
Algorithm: 4NF_Decomposition(R, D)
Input: 
  - R: Relation schema with attributes
  - D: Set of functional dependencies and multivalued dependencies
 
Output:
  - A set of relation schemas, each in 4NF
 
Procedure:
  1. Initialize result = {R}
  
  2. Compute D⁺, the closure of D (all implied FDs and MVDs)
  
  3. WHILE there exists a schema Rᵢ in result that is NOT in 4NF:
       
       a. Find a non-trivial MVD X →→ Y in D⁺ such that:
          - X →→ Y applies to Rᵢ
          - X is NOT a superkey of Rᵢ
       
       b. Decompose Rᵢ into:
          - R₁ = X ∪ Y
          - R₂ = X ∪ (Rᵢ - Y)
          Note: These are attribute sets; create new schemas with these attributes
       
       c. Remove Rᵢ from result
       
       d. Add R₁ and R₂ to result
       
       e. Project constraints from D onto R₁ and R₂:
          - For FDs: Use attribute closure to determine relevant FDs
          - For MVDs: Handle context sensitivity carefully
  
  4. RETURN result

Algorithm Walkthrough:

Step 1: Initialization Start with the original relation as the only element in our result set.

Step 2: Closure Computation Compute all implied dependencies. This is important because decomposition may require MVDs that aren't explicitly stated but are implied by others.

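Step 3: Violation Detection and Resolution Repeatedly find violations and decompose. Each decomposition makes progress because:

The violating MVD is now trivial in R₁ (X ∪ Y covers all of its attributes)
R₂ no longer contains Y, so the same violation cannot recur there

Either schema may still violate 4NF through a different dependency, in which case the loop splits it again.

Step 4: Termination The algorithm terminates because each decomposition replaces Rᵢ with two schemas that each have strictly fewer attributes than Rᵢ (the MVD is non-trivial, so X ∪ Y ≠ Rᵢ and Y is not contained in X), and relations with 2 or fewer attributes are automatically in 4NF.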

Key Properties:

Terminates: Every split yields strictly smaller schemas, so no chain of successive splits is longer than n − 2 for n attributes
Lossless: Each decomposition step preserves information
Produces 4NF: Final relations have no 4NF violations

FD Handling

When decomposing, treat FDs as special cases of MVDs. If the original relation has both FD and MVD violations, address them together. Every FD X → Y implies X →→ Y, so the MVD-based decomposition handles both constraint types.
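The loop can be sketched in runnable form as follows. This is a simplified illustration, not a complete implementation: it assumes the stated dependencies are enough (it does not compute the full closure D⁺), it tests superkeys with FD attribute closure only (ignoring FD/MVD interaction rules), and it restricts each MVD X →→ Y to a sub-schema as X →→ (Y ∩ Rᵢ). The function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(schema, fds, mvds):
    """Return (X, Y) for a non-trivial MVD on schema with X not a superkey."""
    for x, y in mvds:
        if not x <= schema:
            continue
        y_local = (y & schema) - x        # dependent side restricted to schema
        if not y_local or x | y_local == schema:
            continue                      # trivial on this schema
        if schema <= closure(x, fds):
            continue                      # X is a superkey of this schema
        return x, y_local
    return None


def decompose_4nf(schema, fds, mvds):
    """Split schemas until no violation is found; returns a list of frozensets."""
    all_mvds = list(mvds) + list(fds)     # every FD X -> Y also acts as X ->> Y
    result, pending = [], [frozenset(schema)]
    while pending:
        current = pending.pop()
        violation = find_violation(current, fds, all_mvds)
        if violation is None:
            result.append(current)
        else:
            x, y = violation
            pending.append(x | y)         # R1 = X ∪ Y
            pending.append(current - y)   # R2 = X ∪ (Ri − Y)
    return result


fs = frozenset
fds = [(fs({"EmpID"}), fs({"Department"}))]
mvds = [(fs({"EmpID"}), fs({"Skill"})), (fs({"EmpID"}), fs({"Project"}))]
schemas = decompose_4nf({"EmpID", "Skill", "Project", "Department"}, fds, mvds)

print(sorted(sorted(s) for s in schemas))
# [['Department', 'EmpID'], ['EmpID', 'Project'], ['EmpID', 'Skill']]
```

A work list replaces the WHILE loop over `result`: each popped schema is either accepted or replaced by its two parts, which are strictly smaller, so the list eventually empties.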

Detailed Worked Example

Let's work through a comprehensive example step by step.

Initial Relation:

EmployeeData(EmpID, Skill, Project, Department)

Dependencies:

MVD: EmpID →→ Skill (independent of project and department)
MVD: EmpID →→ Project (independent of skill and department)
FD: EmpID → Department (each employee in one department)

Candidate Keys: {EmpID, Skill, Project} is the only candidate key (Department is functionally determined).

Initial EmployeeData Instance
EmpID	Skill	Project	Department
E1	Java	Alpha	Engineering
E1	Java	Beta	Engineering
E1	Python	Alpha	Engineering
E1	Python	Beta	Engineering
E2	SQL	Gamma	Data
E2	C++	Gamma	Data

Step 1: Identify Violations

Check MVD EmpID →→ Skill:

Is it trivial? Skill ⊈ EmpID and EmpID ∪ Skill ≠ {EmpID, Skill, Project, Department}. Non-trivial.
Is EmpID a superkey? No: the only candidate key is {EmpID, Skill, Project}, and {EmpID} does not contain it. Violation!

Check MVD EmpID →→ Project:

Is it trivial? Project ⊄ EmpID and EmpID ∪ Project ≠ R. Non-trivial.
Is EmpID a superkey? No. Violation!

Check FD EmpID → Department:

Is EmpID a superkey? No. BCNF Violation! (Thus also 4NF violation, since FD implies MVD)
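The superkey checks in this step come down to attribute closure. The sketch below, with an illustrative `closure` helper, computes the closure of {EmpID} and of the candidate key under the single FD of this example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


schema = {"EmpID", "Skill", "Project", "Department"}
fds = [({"EmpID"}, {"Department"})]

emp_closure = closure({"EmpID"}, fds)
key_closure = closure({"EmpID", "Skill", "Project"}, fds)

print(sorted(emp_closure))    # ['Department', 'EmpID']
print(emp_closure >= schema)  # False: EmpID is not a superkey
print(key_closure >= schema)  # True: {EmpID, Skill, Project} is a superkey
```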

Step 2: First Decomposition

Let's decompose using EmpID →→ Skill:

X = {EmpID}, Y = {Skill}
R₁ = X ∪ Y = {EmpID, Skill}
R₂ = X ∪ (R - Y) = {EmpID} ∪ {EmpID, Project, Department} = {EmpID, Project, Department}

After First Decomposition
Relation	Attributes	Instance
R₁ = EmpSkill	(EmpID, Skill)	(E1, Java), (E1, Python), (E2, SQL), (E2, C++)
R₂ = EmpProjectDept	(EmpID, Project, Department)	(E1, Alpha, Engineering), (E1, Beta, Engineering), (E2, Gamma, Data)

Step 3: Check R₁ = EmpSkill(EmpID, Skill)

MVD EmpID →→ Skill: X ∪ Y = entire relation. Trivial. ✓
No other dependencies apply.
R₁ is in 4NF.

Step 4: Check R₂ = EmpProjectDept(EmpID, Project, Department)

MVD EmpID →→ Project: Is it trivial? Project ⊄ EmpID, EmpID ∪ Project ≠ R₂. Non-trivial.
Is EmpID a superkey of R₂? Key is {EmpID, Project} (since EmpID → Department). EmpID is a proper subset, so not a superkey. Violation!
FD EmpID → Department: Is EmpID a superkey? No. BCNF Violation!

Step 5: Second Decomposition

Decompose R₂ using EmpID →→ Project:

X = {EmpID}, Y = {Project}
R₂₁ = {EmpID, Project}
R₂₂ = {EmpID, Department}

After Second Decomposition
Relation	Attributes	Instance
EmpSkill	(EmpID, Skill)	(E1, Java), (E1, Python), (E2, SQL), (E2, C++)
EmpProject	(EmpID, Project)	(E1, Alpha), (E1, Beta), (E2, Gamma)
EmpDept	(EmpID, Department)	(E1, Engineering), (E2, Data)

Step 6: Verify All Relations Are in 4NF

EmpSkill(EmpID, Skill):

Binary relation → automatically 4NF ✓

EmpProject(EmpID, Project):

Binary relation → automatically 4NF ✓

EmpDept(EmpID, Department):

FD: EmpID → Department
EmpID is a superkey (determines all attributes)
No MVD violations ✓

Final Result:

Three relations in 4NF:

EmpSkill(EmpID, Skill) — All employee skills
EmpProject(EmpID, Project) — All employee projects
EmpDept(EmpID, Department) — Employee department assignments

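Storage Comparison:

Original: 6 rows × 4 attributes = 24 stored values
Decomposed: 4 + 3 + 2 = 9 rows × 2 attributes = 18 stored values

The decomposition has more rows in total but stores fewer values, because Department is no longer repeated for every skill/project pair and each skill and each project is listed once per employee.

The gap widens as the data grows: an employee with s skills and p projects needs s × p rows in the original but only s + p + 1 rows after decomposition. Even so, the main benefit is in maintenance and data integrity, not raw row count.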

Row Count vs. Redundancy

Decomposition doesn't always reduce total row count. It reduces REDUNDANCY (repeated facts). Each fact is stored once. The benefit is in anomaly elimination, not necessarily storage reduction.
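The counts for the worked example can be checked directly. The sketch below projects the six-row instance and compares rows and stored values; `rows_needed` is an illustrative helper for one employee with a given number of skills and projects.

```python
original = [
    ("E1", "Java", "Alpha", "Engineering"),
    ("E1", "Java", "Beta", "Engineering"),
    ("E1", "Python", "Alpha", "Engineering"),
    ("E1", "Python", "Beta", "Engineering"),
    ("E2", "SQL", "Gamma", "Data"),
    ("E2", "C++", "Gamma", "Data"),
]

# Set comprehensions remove the duplicates that projection would remove.
emp_skill = {(e, s) for e, s, _, _ in original}
emp_project = {(e, p) for e, _, p, _ in original}
emp_dept = {(e, d) for e, _, _, d in original}

rows_before = len(original)
rows_after = len(emp_skill) + len(emp_project) + len(emp_dept)
values_before = rows_before * 4   # four attributes per row
values_after = rows_after * 2     # two attributes per row

print(rows_before, rows_after)      # 6 9
print(values_before, values_after)  # 24 18


def rows_needed(skills, projects):
    """Rows for one employee: (single wide table, three decomposed tables)."""
    return skills * projects, skills + projects + 1


print(rows_needed(10, 10))  # (100, 21)
```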

Lossless Join Verification

A critical property of any decomposition is that it must be lossless—the original relation can be perfectly reconstructed by joining the decomposed relations. Let's verify this for our example.

Lossless Join Theorem for MVDs:

If R is decomposed into R₁ and R₂, the decomposition is lossless if and only if:

(R₁ ∩ R₂) →→ (R₁ - R₂) OR (R₁ ∩ R₂) →→ (R₂ - R₁)

in the closure of the dependencies on R.

For MVD decomposition, this is guaranteed by construction: we decompose on an MVD X →→ Y where R₁ = XY and R₂ = X(R-Y). The common attributes are X, and X →→ Y holds.

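Verifying Losslessness

For our decomposition of EmployeeData into EmpSkill, EmpProject, EmpDept, apply the theorem once per split:

Split 1: EmployeeData into EmpSkill and EmpProjectDept. The common attributes are {EmpID}, and EmpID →→ Skill holds, so the split is lossless.
Split 2: EmpProjectDept into EmpProject and EmpDept. The common attributes are {EmpID}, and EmpID →→ Project holds on EmpProjectDept, so the split is lossless.

Because each individual split is lossless, joining all three relations reconstructs EmployeeData.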

Practical Verification via Join:

To verify losslessness empirically, we can:

Take the decomposed relations
Perform natural joins
Check if result equals original

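-- Reconstruct from decomposition.
-- The join column is left unqualified: some SQL dialects reject a
-- table qualifier on a NATURAL JOIN column.
SELECT EmpID, Skill, Project, Department
FROM EmpSkill
NATURAL JOIN EmpProject
NATURAL JOIN EmpDept;

-- Compare with the original: the result should contain exactly the
-- rows of EmployeeData (set equality, ignoring row order).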
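One way to run this comparison end to end is with Python's built-in `sqlite3` module and an in-memory database. This is a sketch under assumptions: the table definitions and the use of `EXCEPT` for the set comparison are choices made here, and other database systems may need a different operator (for example `MINUS`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmployeeData (EmpID TEXT, Skill TEXT, Project TEXT, Department TEXT);
INSERT INTO EmployeeData VALUES
  ('E1', 'Java',   'Alpha', 'Engineering'),
  ('E1', 'Java',   'Beta',  'Engineering'),
  ('E1', 'Python', 'Alpha', 'Engineering'),
  ('E1', 'Python', 'Beta',  'Engineering'),
  ('E2', 'SQL',    'Gamma', 'Data'),
  ('E2', 'C++',    'Gamma', 'Data');
CREATE TABLE EmpSkill   AS SELECT DISTINCT EmpID, Skill      FROM EmployeeData;
CREATE TABLE EmpProject AS SELECT DISTINCT EmpID, Project    FROM EmployeeData;
CREATE TABLE EmpDept    AS SELECT DISTINCT EmpID, Department FROM EmployeeData;
""")

rejoined = """
SELECT EmpID, Skill, Project, Department
FROM EmpSkill NATURAL JOIN EmpProject NATURAL JOIN EmpDept
"""

# Rows of the original that the join fails to produce (should be none).
missing = conn.execute(
    f"SELECT * FROM EmployeeData EXCEPT {rejoined}"
).fetchall()

# Rows the join produces that were never in the original (should be none).
spurious = conn.execute(
    f"{rejoined} EXCEPT SELECT * FROM EmployeeData"
).fetchall()

print(missing, spurious)  # [] []
conn.close()
```

Two empty results together mean set equality: nothing lost and nothing invented.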

Why MVD Decomposition Is Always Lossless:

The key insight is that MVD X →→ Y guarantees the 'swap' property: if tuples (x, y₁, z₁) and (x, y₂, z₂) exist, then (x, y₁, z₂) and (x, y₂, z₁) also exist.

When we project and join:

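R₁ = π_{XY}(R) contains (x, y₁) and (x, y₂)
R₂ = π_{XZ}(R) contains (x, z₁) and (x, z₂)
R₁ ⋈ R₂ produces all combinations: (x, y₁, z₁), (x, y₁, z₂), (x, y₂, z₁), (x, y₂, z₂)

By the MVD, all these combinations already exist in R, so the join adds no spurious tuples. Conversely, every tuple of R projects into both R₁ and R₂ and is therefore regenerated by the join. So the join produces exactly R, no more, no less.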
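The swap property can also be tested on a concrete instance. The sketch below is an instance-level check only: passing it shows that the current data is consistent with the MVD, not that the MVD is a rule of the domain. The function name `mvd_holds` is illustrative.

```python
from itertools import product


def mvd_holds(attrs, rows, x, y):
    """Check X ->> Y on an instance using the swap property."""
    z = [a for a in attrs if a not in x and a not in y]

    def pick(row, names):
        return tuple(row[attrs.index(a)] for a in names)

    present = {(pick(r, x), pick(r, y), pick(r, z)) for r in rows}
    # For every pair of tuples agreeing on X, the swapped tuple must exist.
    for (x1, y1, _), (x2, _, z2) in product(present, repeat=2):
        if x1 == x2 and (x1, y1, z2) not in present:
            return False
    return True


attrs = ("EmpID", "Skill", "Project", "Department")
rows = [
    ("E1", "Java", "Alpha", "Engineering"),
    ("E1", "Java", "Beta", "Engineering"),
    ("E1", "Python", "Alpha", "Engineering"),
    ("E1", "Python", "Beta", "Engineering"),
    ("E2", "SQL", "Gamma", "Data"),
    ("E2", "C++", "Gamma", "Data"),
]

print(mvd_holds(attrs, rows, ["EmpID"], ["Skill"]))       # True
# Dropping one combination breaks the independence.
print(mvd_holds(attrs, rows[:3], ["EmpID"], ["Skill"]))   # False
```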

Automatic Losslessness

When you decompose based on a valid MVD, losslessness is guaranteed. As with FD-based BCNF decomposition, which splits on a violating FD, MVD-based 4NF decomposition is lossless by construction. What does need careful checking is that the MVD really holds for all legal data, not just for the current instance.

Dependency Preservation Considerations

While 4NF decomposition is always lossless, dependency preservation is more nuanced. Some dependencies may not be enforceable on the decomposed relations alone.

Dependency Preservation Defined:

A decomposition preserves dependencies if every dependency in the original set D can be enforced by checking constraints on individual decomposed relations, without joining them.

The Challenge with MVDs:

MVDs are particularly tricky for preservation. An MVD X →→ Y on R may not be directly enforceable after decomposition because:

MVDs are context-sensitive (may not hold on projections)
The MVD may only be 'visible' when relations are joined

Example Analysis:

Original: EmployeeData(EmpID, Skill, Project, Department) MVDs: EmpID →→ Skill, EmpID →→ Project

After decomposition:

EmpSkill(EmpID, Skill) — Binary relation; MVD is trivial here
EmpProject(EmpID, Project) — Binary relation; MVD is trivial here
EmpDept(EmpID, Department) — No MVDs apply

The original MVDs are 'satisfied' because:

The independence is now structural (separate relations)
No combinatorial enforcement is needed
The MVD semantics are preserved by the decomposition itself

Preserved After Decomposition

•FDs that fit entirely in one relation
•MVDs that become trivial after split
•Key constraints on individual relations
•Referential integrity (can use foreign keys)

May Require Extra Enforcement

•Cross-relation FDs (spanning multiple tables)
•Complex MVDs with non-trivial remainders
•Business rules involving multiple entities
•Cardinality constraints

The Practical Implication:

For most 4NF decompositions, the primary constraints are naturally preserved:

FDs like EmpID → Department are preserved in their respective relation (EmpDept)
MVDs are 'enforced' by the structure—separate relations prevent combinatorial storage
Keys are preserved with appropriate primary key definitions

When Extra Enforcement Is Needed:

If the original relation had a complex constraint like 'every employee must have at least one skill', this becomes a cross-table constraint after decomposition. You would need:

Application-level validation
Database triggers
Deferred constraint checking
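As an illustration of the application-level option, the sketch below checks the "at least one skill" rule across two decomposed tables held in memory. The employee E3 and the function name are hypothetical, added only to show a failing case.

```python
def employees_without_skill(emp_dept, emp_skill):
    """Return employees that appear in EmpDept but have no row in EmpSkill."""
    skilled = {emp for emp, _ in emp_skill}
    return sorted(emp for emp, _ in emp_dept if emp not in skilled)


emp_skill = {("E1", "Java"), ("E1", "Python"), ("E2", "SQL"), ("E2", "C++")}
emp_dept = {
    ("E1", "Engineering"),
    ("E2", "Data"),
    ("E3", "Data"),  # hypothetical employee with no skills recorded
}

print(employees_without_skill(emp_dept, emp_skill))  # ['E3']
```

A check like this would typically run before commit, so that an employee row and its first skill row can be inserted in the same transaction.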

Trade-off Analysis:

Aspect	Before Decomposition	After Decomposition
Redundancy	High	None
Update anomalies	Present	Eliminated
Single-table constraints	Easy	Easy
Cross-table constraints	N/A	Requires triggers

Dependency Preservation vs. Losslessness

Losslessness is mandatory—you must be able to reconstruct data. Dependency preservation is desirable but not always achievable. The 4NF decomposition algorithm prioritizes losslessness and eliminates redundancy; dependency preservation may require supplementary enforcement mechanisms.

Algorithm Variations and Optimizations

The basic 4NF decomposition algorithm has several variations and optimizations for practical application.

Variation 1: MVD Selection Strategy

When multiple MVD violations exist, the order of decomposition affects the intermediate steps (but not the final result in terms of 4NF satisfaction).

Greedy approach: Choose the MVD whose decomposition reduces redundancy most.

Predictable approach: Process MVDs in a deterministic order (e.g., by determinant size, then dependent size).

Variation 2: Combined FD/MVD Processing

Rather than handling FDs and MVDs separately:

Treat all FDs as MVDs (X → Y implies X →→ Y)
Apply the 4NF algorithm, which will achieve both BCNF and 4NF

This simplifies implementation and ensures a coherent result.

Optimized 4NF Decomposition
Algorithm: Optimized_4NF_Decomposition(R, FDs, MVDs)
  
  # Step 1: Treat every FD as an MVD, but keep FDs for superkey tests
  all_mvds = MVDs ∪ { X →→ Y : X → Y ∈ FDs }
  
  # Step 2: Initialize
  result = {R}
  
  # Step 3: Iterative decomposition
  changed = True
  WHILE changed:
    changed = False
    
    FOR each Rᵢ in result:
      # Violations are checked against Rᵢ itself: restrict each MVD
      # X →→ Y with X ⊆ Rᵢ to X →→ (Y ∩ Rᵢ), and use FDs to test
      # whether X is a superkey of Rᵢ
      violations = find_violations(Rᵢ, all_mvds, FDs)
      
      IF violations is not empty:
        # Select MVD with minimal |X|
        # (optimization: decompose from smaller determinants first)
        mvd = min(violations, key = lambda m: |m.determinant|)
        
        # Decompose
        R₁, R₂ = decompose(Rᵢ, mvd)
        
        # Update result; all_mvds is left unchanged so that
        # dependencies relevant to other schemas are not lost
        result.remove(Rᵢ)
        result.add(R₁)
        result.add(R₂)
        
        changed = True
        BREAK  # Restart loop with updated result
  
  RETURN result

Variation 3: Minimal Decomposition

The basic algorithm may produce more relations than necessary. A minimal decomposition variant seeks the fewest relations:

Identify all independent MVDs
Compute the minimal number of relations needed (one per independent fact type, plus one for non-MVD attributes)
Construct relations directly rather than iteratively

Example:

For R(A, B, C, D) with A →→ B and A →→ C (both independent of each other and D):

Iterative approach:

Decompose on A →→ B: {AB}, {ACD}
Decompose {ACD} on A →→ C: {AC}, {AD}
Final: {AB}, {AC}, {AD}

Direct approach: Recognize three independent facts: A-B relationship, A-C relationship, A-D relationship. Directly construct: {AB}, {AC}, {AD}.

Both produce the same result, but the direct approach requires less iteration.
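The direct approach amounts to pairing the shared determinant with each independent group. A minimal sketch, assuming the groups have already been identified by semantic analysis (the function and parameter names are illustrative):

```python
def direct_decomposition(determinant, multivalued_groups, remaining):
    """Build one schema per independent group, plus one for the leftovers."""
    base = frozenset(determinant)
    schemas = [base | frozenset(group) for group in multivalued_groups]
    if remaining:
        schemas.append(base | frozenset(remaining))
    return schemas


schemas = direct_decomposition({"A"}, [{"B"}, {"C"}], {"D"})
print(sorted("".join(sorted(s)) for s in schemas))  # ['AB', 'AC', 'AD']
```

The function does no dependency checking of its own, so its output is only as good as the grouping it is given; running the iterative algorithm on the result is a reasonable cross-check.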

Optimization: Avoiding Redundant Decompositions

After each decomposition, check if any resulting relation is a projection of another. If so, it can be eliminated (its information is already contained).

This prevents producing unnecessarily many small relations when the data has hierarchical MVD structure.
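At the schema level, "is a projection of another" means that one attribute set is a proper subset of another. A sketch of that pruning step, with an illustrative function name and a made-up list of schemas:

```python
def drop_subsumed(schemas):
    """Remove duplicate schemas and any schema strictly contained in another."""
    unique = set(map(frozenset, schemas))
    return [s for s in unique if not any(s < other for other in unique)]


candidates = [{"A", "B"}, {"A", "B", "C"}, {"A", "D"}, {"A", "B"}]
kept = drop_subsumed(candidates)
print(sorted("".join(sorted(s)) for s in kept))  # ['ABC', 'AD']
```

Dropping {A, B} here is safe for reconstruction because its rows can always be obtained by projecting {A, B, C}.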

Practical Recommendation

For most real-world cases, analyze the semantic meaning of MVDs first. This often reveals the natural decomposition directly: each independent multi-valued fact gets its own relation, plus one relation for single-valued facts. The iterative algorithm serves as verification.

Complex Decomposition Scenarios

Not all 4NF decompositions are straightforward. Let's examine scenarios that require careful handling.

Scenario 1: Multiple Independent MVD Groups

Consider: R(A, B, C, D, E) with:

A →→ B (B independent of C, D, E given A)
A →→ C (C independent of B, D, E given A)
A →→ DE (D and E together, independent of B, C)

Decomposition:

R₁(A, B) — A-B relationship
R₂(A, C) — A-C relationship
R₃(A, D, E) — A-DE relationship (D and E are grouped)

Note that D and E are NOT independent of each other—they're stored together in R₃.

Scenario 2: Overlapping MVDs

Consider: R(A, B, C) with:

A →→ B
AB →→ C

Is there a 4NF violation? Let's check:

A →→ B: Is A a superkey? Depends on other constraints.
AB →→ C: Since AB →→ C and the relation is ABC, this is trivial (covers all attributes).

If the only candidate key is ABC, then A is not a superkey, and A →→ B violates 4NF.

Decomposition: split on A →→ B to get R₁(A, B) and R₂(A, C).

Nothing is lost by doing so:

AB →→ C is trivial on R(A, B, C), so it holds on every instance and places no constraint on the data. It has no counterpart in R₁ or R₂ because neither schema contains all of A, B, and C.
R₁(A, B) and R₂(A, C) are binary, so both are automatically in 4NF.

The decomposition is lossless because it follows A →→ B. Since AB →→ C is trivial, it also holds automatically on R₁ ⋈ R₂ and needs no separate enforcement.

Scenario 3: Embedded FDs Affecting MVDs

Consider: R(A, B, C, D) with:

A → B (FD)
A → D (FD)
A →→ C (MVD, independent of B, D)
Key: AC (A determines B and D, and nothing determines C)

Violations:

A → B: Is A a superkey? No (key is AC). BCNF violation.
A → D: Is A a superkey? No. BCNF violation.
A →→ C: Is A a superkey? No. 4NF violation.

Decomposition options:

Option A: Address MVD first

Decompose on A →→ C: R₁(A, C), R₂(A, B, D)
R₁(A, C): Key is AC. A →→ C is trivial (covers relation). 4NF ✓
R₂(A, B, D): A → B and A → D. Key is A. Both FDs have superkey determinant. BCNF ✓, 4NF ✓

Option B: Address FD first

Decompose on A → B: R₁(A, B), R₂(A, C, D)
R₁(A, B): Key is A. 4NF ✓
R₂(A, C, D): A →→ C still holds on the projection, and the key of R₂ is AC because A does not determine C. A is not a superkey, so A →→ C (and likewise A → D) is a violation.
Decompose R₂ on A →→ C: R₂₁(A, C), R₂₂(A, D). Both are binary. 4NF ✓

Option A ends with {AC, ABD}; Option B ends with {AB, AC, AD}. The order changes the intermediate steps, and here also the final schemas, but both results are in 4NF and both are lossless.
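The schemas in both options can be checked mechanically. The sketch below assumes A → B, A → D, and A →→ C, uses FD attribute closure for the superkey test, and restricts each dependency to the schema being checked; the helper names are illustrative. Note that on (A, C, D) the closure of A is {A, B, D}, which does not contain C, so A is not a superkey of that schema.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations(schema, fds, mvds):
    """List (X, Y) pairs that violate 4NF on schema; FDs count as MVDs."""
    schema = frozenset(schema)
    found = []
    for lhs, rhs in list(mvds) + list(fds):
        x = frozenset(lhs)
        y_local = (frozenset(rhs) & schema) - x
        if not x <= schema or not y_local:
            continue                      # dependency does not apply here
        if x | y_local == schema:
            continue                      # trivial on this schema
        if schema <= closure(x, fds):
            continue                      # X is a superkey of this schema
        found.append(("".join(sorted(x)), "".join(sorted(y_local))))
    return found


fds = [("A", "B"), ("A", "D")]
mvds = [("A", "C")]

# Option A: {AC, ABD}
print(violations("AC", fds, mvds), violations("ABD", fds, mvds))  # [] []

# Option B after the first split: {AB, ACD}
print(violations("AB", fds, mvds))   # []
print(violations("ACD", fds, mvds))  # [('A', 'C'), ('A', 'D')]

# Option B after splitting ACD: {AB, AC, AD}
print(violations("AD", fds, mvds))   # []
```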

Analyze Carefully

Complex scenarios require careful dependency analysis. Always clearly identify all FDs and MVDs, determine all candidate keys, and then systematically apply the decomposition algorithm. Drawing dependency diagrams can help visualize relationships.

Summary: 4NF Decomposition

We've covered the complete 4NF decomposition process. Let's consolidate the key principles:

Key Takeaways

•Decomposition separates independent facts — Each MVD-violating relation splits into relations for each independent multi-valued fact.
•The algorithm is iterative — Find a violation, decompose, repeat until all relations are 4NF.
•Decomposition is always lossless — MVD-based decomposition guarantees perfect reconstruction via join.
•Dependency preservation varies — FDs within single relations are preserved; cross-relation constraints may need extra enforcement.
•Optimization is possible — MVD selection order and direct decomposition can improve efficiency.
•Complex scenarios require care — Multiple MVD groups, overlapping MVDs, and embedded FDs need systematic analysis.

What's Next:

With the decomposition algorithm mastered, the next page explores the relationship between 4NF and BCNF—examining when 4NF truly adds value beyond BCNF, and when BCNF alone is sufficient.

Page Complete

You now have a complete understanding of the 4NF decomposition algorithm, including the formal procedure, worked examples, lossless verification, dependency preservation considerations, and handling of complex scenarios. Next, we'll compare 4NF with BCNF in depth.

4NF Decomposition

From Violation to Resolution

We've established what MVD violations look like and the problems they cause. Now comes the actionable part: how do we fix them?

This page presents the algorithm in full detail, with rigorous worked examples and analysis of its properties.

What You Will Learn

The Decomposition Principle

Before diving into the algorithm, let's understand why decomposition works for MVD violations.

The Core Insight:

An MVD violation occurs when two independent sets of values are stored together, requiring all combinations. By separating these independent sets into different relations, we:

Store each set exactly once (eliminating redundancy)
Can reconstruct the original data through natural join
Maintain the semantic meaning of the data

The Mathematical Foundation:

If a relation R has an MVD X →→ Y, then R can be decomposed into:

R₁ = πₓᵧ(R) — projection on attributes X ∪ Y
R₂ = πₓz(R) — projection on attributes X ∪ Z, where Z = R - X - Y

This decomposition is lossless: R = R₁ ⋈ R₂ (natural join reconstructs original).

The proof relies on the MVD definition: the 'swap' property ensures all combinations exist, so the join produces exactly the original tuples.

Decomposition in ActionConsider EmpSkillProject(EmpID, Skill, Project) with MVD EmpID →→ Skill:

Input

R = EmpSkillProject
MVD: EmpID →→ Skill
X = {EmpID}, Y = {Skill}, Z = {Project}

Output

R₁ = EmpSkill(EmpID, Skill)
R₂ = EmpProject(EmpID, Project)

Why the Join Reconstructs Correctly:

Consider the original data:

EmpID	Skill	Project
E1	Java	Alpha
E1	Java	Beta
E1	Python	Alpha
E1	Python	Beta

Decomposed:

R₁ = EmpSkill:

EmpID	Skill
E1	Java
E1	Python

R₂ = EmpProject:

EmpID	Project
E1	Alpha
E1	Beta

R₁ ⋈ R₂ (joining on EmpID):

Every row in R₁ with E1 matches every row in R₂ with E1:

(E1, Java) ⋈ (E1, Alpha) → (E1, Java, Alpha)
(E1, Java) ⋈ (E1, Beta) → (E1, Java, Beta)
(E1, Python) ⋈ (E1, Alpha) → (E1, Python, Alpha)
(E1, Python) ⋈ (E1, Beta) → (E1, Python, Beta)

This is exactly the original relation—the join is lossless.

The Lossless Guarantee

The 4NF Decomposition Algorithm

The 4NF decomposition algorithm iteratively removes MVD violations until all remaining relations satisfy 4NF. Here is the formal algorithm:

4NF Decomposition Algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
Algorithm: 4NF_Decomposition(R, D)
Input: 
  - R: Relation schema with attributes
  - D: Set of functional dependencies and multivalued dependencies
 
Output:
  - A set of relation schemas, each in 4NF
 
Procedure:
  1. Initialize result = {R}
  
  2. Compute D⁺, the closure of D (all implied FDs and MVDs)
  
  3. WHILE there exists a schema Rᵢ in result that is NOT in 4NF:
       
       a. Find a non-trivial MVD X →→ Y in D⁺ such that:
          - X →→ Y applies to Rᵢ
          - X is NOT a superkey of Rᵢ
       
       b. Decompose Rᵢ into:
          - R₁ = X ∪ Y
          - R₂ = X ∪ (Rᵢ - Y)
          Note: These are attribute sets; create new schemas with these attributes
       
       c. Remove Rᵢ from result
       
       d. Add R₁ and R₂ to result
       
       e. Project constraints from D onto R₁ and R₂:
          - For FDs: Use attribute closure to determine relevant FDs
          - For MVDs: Handle context sensitivity carefully
  
  4. RETURN result

Algorithm Walkthrough:

Step 1: Initialization Start with the original relation as the only element in our result set.

Step 2: Closure Computation Compute all implied dependencies. This is important because decomposition may require MVDs that aren't explicitly stated but are implied by others.

Step 3: Violation Detection and Resolution Repeatedly find violations and decompose. Each decomposition strictly reduces violation count because:

The violating MVD is now trivial in R₁ (covers all attributes)
R₂ has fewer violations or none

Step 4: Termination The algorithm terminates because each decomposition reduces the total number of attributes per relation, and relations with 2 or fewer attributes are automatically in 4NF.

Key Properties:

Terminates: Maximum log₂(n) iterations for n attributes
Lossless: Each decomposition step preserves information
Produces 4NF: Final relations have no 4NF violations

FD Handling

Detailed Worked Example

Let's work through a comprehensive example step by step.

Initial Relation:

EmployeeData(EmpID, Skill, Project, Department)

Dependencies:

MVD: EmpID →→ Skill (independent of project and department)
MVD: EmpID →→ Project (independent of skill and department)
FD: EmpID → Department (each employee in one department)

Candidate Keys: {EmpID, Skill, Project} is the only candidate key (Department is functionally determined).

Initial EmployeeData Instance
EmpID	Skill	Project	Department
E1	Java	Alpha	Engineering
E1	Java	Beta	Engineering
E1	Python	Alpha	Engineering
E1	Python	Beta	Engineering
E2	SQL	Gamma	Data
E2	C++	Gamma	Data

Step 1: Identify Violations

Check MVD EmpID →→ Skill:

Is it trivial? Skill ⊄ EmpID and EmpID ∪ Skill ≠ {EmpID, Skill, Project, Department}. Non-trivial.
Is EmpID a superkey? No, superkey is {EmpID, Skill, Project}. Violation!

Check MVD EmpID →→ Project:

Is it trivial? Project ⊄ EmpID and EmpID ∪ Project ≠ R. Non-trivial.
Is EmpID a superkey? No. Violation!

Check FD EmpID → Department:

Is EmpID a superkey? No. BCNF Violation! (Thus also 4NF violation, since FD implies MVD)

Step 2: First Decomposition

Let's decompose using EmpID →→ Skill:

X = {EmpID}, Y = {Skill}
R₁ = X ∪ Y = {EmpID, Skill}
R₂ = X ∪ (R - Y) = {EmpID} ∪ {Project, Department} = {EmpID, Project, Department}

After First Decomposition
Relation	Attributes	Instance
R₁ = EmpSkill	(EmpID, Skill)	(E1, Java), (E1, Python), (E2, SQL), (E2, C++)
R₂ = EmpProjectDept	(EmpID, Project, Department)	(E1, Alpha, Engineering), (E1, Beta, Engineering), (E2, Gamma, Data)

Step 3: Check R₁ = EmpSkill(EmpID, Skill)

MVD EmpID →→ Skill: X ∪ Y = entire relation. Trivial. ✓
No other dependencies apply.
R₁ is in 4NF.

Step 4: Check R₂ = EmpProjectDept(EmpID, Project, Department)

MVD EmpID →→ Project: Is it trivial? Project ⊄ EmpID, EmpID ∪ Project ≠ R₂. Non-trivial.
Is EmpID a superkey of R₂? Key is {EmpID, Project} (since EmpID → Department). EmpID is a proper subset, so not a superkey. Violation!
FD EmpID → Department: Is EmpID a superkey? No. BCNF Violation!

Step 5: Second Decomposition

Decompose R₂ using EmpID →→ Project:

X = {EmpID}, Y = {Project}
R₂₁ = {EmpID, Project}
R₂₂ = {EmpID, Department}

After Second Decomposition
Relation	Attributes	Instance
EmpSkill	(EmpID, Skill)	(E1, Java), (E1, Python), (E2, SQL), (E2, C++)
EmpProject	(EmpID, Project)	(E1, Alpha), (E1, Beta), (E2, Gamma)
EmpDept	(EmpID, Department)	(E1, Engineering), (E2, Data)

Step 6: Verify All Relations Are in 4NF

EmpSkill(EmpID, Skill):

Binary relation → automatically 4NF ✓

EmpProject(EmpID, Project):

Binary relation → automatically 4NF ✓

EmpDept(EmpID, Department):

FD: EmpID → Department
EmpID is a superkey (determines all attributes)
No MVD violations ✓

Final Result:

Three relations in 4NF:

EmpSkill(EmpID, Skill) — All employee skills
EmpProject(EmpID, Project) — All employee projects
EmpDept(EmpID, Department) — Employee department assignments

Storage Comparison:

Original: 6 rows
Decomposed: 4 + 3 + 2 = 9 total rows... wait, that's more!

Actually, let's count correctly:

Original redundancy: Department repeated, skills/projects repeated
Decomposed: No redundancy within relations

But total row count can be similar. The benefit is in maintenance and data integrity, not always raw row count.

Row Count vs. Redundancy

Decomposition doesn't always reduce total row count. It reduces REDUNDANCY (repeated facts). Each fact is stored once. The benefit is in anomaly elimination, not necessarily storage reduction.

Lossless Join Verification

A critical property of any decomposition is that it must be lossless—the original relation can be perfectly reconstructed by joining the decomposed relations. Let's verify this for our example.

Lossless Join Theorem for MVDs:

If R is decomposed into R₁ and R₂, the decomposition is lossless if and only if:

(R₁ ∩ R₂) →→ (R₁ - R₂) OR (R₁ ∩ R₂) →→ (R₂ - R₁)

in the closure of the dependencies on R.

For MVD decomposition, this is guaranteed by construction: we decompose on an MVD X →→ Y where R₁ = XY and R₂ = X(R-Y). The common attributes are X, and X →→ Y holds.

Verifying LosslessnessFor our decomposition of EmployeeData into EmpSkill, EmpProject, EmpDept:

Input

Output

Practical Verification via Join:

To verify losslessness empirically, we can:

Take the decomposed relations
Perform natural joins
Check if result equals original

-- Reconstruct from decomposition
SELECT es.EmpID, es.Skill, ep.Project, ed.Department
FROM EmpSkill es
NATURAL JOIN EmpProject ep
NATURAL JOIN EmpDept ed;

-- Compare with original
-- Should produce identical rows (in set-equality sense)

Why MVD Decomposition Is Always Lossless:

The key insight is that MVD X →→ Y guarantees the 'swap' property: if tuples (x, y₁, z₁) and (x, y₂, z₂) exist, then (x, y₁, z₂) and (x, y₂, z₁) also exist.

When we project and join:

R₁ = πₓᵧ(R) contains (x, y₁) and (x, y₂)
R₂ = πₓz(R) contains (x, z₁) and (x, z₂)
R₁ ⋈ R₂ produces all combinations: (x, y₁, z₁), (x, y₁, z₂), (x, y₂, z₁), (x, y₂, z₂)

But by the MVD, all these combinations already exist in R. So the join produces exactly R, no more, no less.

Automatic Losslessness

Dependency Preservation Considerations

While 4NF decomposition is always lossless, dependency preservation is more nuanced. Some dependencies may not be enforceable on the decomposed relations alone.

Dependency Preservation Defined:

A decomposition preserves dependencies if every dependency in the original set D can be enforced by checking constraints on individual decomposed relations, without joining them.

The Challenge with MVDs:

MVDs are particularly tricky for preservation. An MVD X →→ Y on R may not be directly enforceable after decomposition because:

MVDs are context-sensitive (may not hold on projections)
The MVD may only be 'visible' when relations are joined

Example Analysis:

Original: EmployeeData(EmpID, Skill, Project, Department) MVDs: EmpID →→ Skill, EmpID →→ Project

After decomposition:

EmpSkill(EmpID, Skill) — Binary relation; MVD is trivial here
EmpProject(EmpID, Project) — Binary relation; MVD is trivial here
EmpDept(EmpID, Department) — No MVDs apply

The original MVDs are 'satisfied' because:

The independence is now structural (separate relations)
No combinatorial enforcement is needed
The MVD semantics are preserved by the decomposition itself

Preserved After Decomposition

•FDs that fit entirely in one relation
•MVDs that become trivial after split
•Key constraints on individual relations
•Referential integrity (can use foreign keys)

May Require Extra Enforcement

•Cross-relation FDs (spanning multiple tables)
•Complex MVDs with non-trivial remainders
•Business rules involving multiple entities
•Cardinality constraints

The Practical Implication:

For most 4NF decompositions, the primary constraints are naturally preserved:

FDs like EmpID → Department are preserved in their respective relation (EmpDept)
MVDs are 'enforced' by the structure—separate relations prevent combinatorial storage
Keys are preserved with appropriate primary key definitions

When Extra Enforcement Is Needed:

If the original relation had a complex constraint like 'every employee must have at least one skill', this becomes a cross-table constraint after decomposition. You would need:

Application-level validation
Database triggers
Deferred constraint checking
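
As an illustration of the first option, application-level validation, here is a minimal sketch. It assumes the decomposed tables are loaded as sets of tuples; the function name `employees_without_skill` and the sample rows are hypothetical. A trigger or deferred constraint would express the same check inside the database.

```python
def employees_without_skill(emp_dept, emp_skill):
    """Return EmpIDs present in EmpDept that have no row in EmpSkill."""
    all_employees = {emp_id for emp_id, _dept in emp_dept}
    skilled = {emp_id for emp_id, _skill in emp_skill}
    return sorted(all_employees - skilled)


# Hypothetical contents of the decomposed relations
emp_dept = {(1, "Engineering"), (2, "Engineering"), (3, "Operations")}
emp_skill = {(1, "SQL"), (1, "Python"), (2, "Java")}

missing = employees_without_skill(emp_dept, emp_skill)
if missing:
    print(f"Constraint violated for EmpIDs: {missing}")
```

The check has to read both tables, which is exactly what makes it a cross-table constraint.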

Trade-off Analysis:

Aspect	Before Decomposition	After Decomposition
MVD-induced redundancy	High	Eliminated
Update anomalies	Present	Eliminated
Single-table constraints	Easy	Easy
Cross-table constraints	Not applicable (single table)	Require triggers, deferred checks, or application logic

Dependency Preservation vs. Losslessness

Algorithm Variations and Optimizations

The basic 4NF decomposition algorithm has several variations and optimizations for practical application.

Variation 1: MVD Selection Strategy

When multiple MVD violations exist, the order of decomposition affects the intermediate steps and can change which relations you end up with, but every order terminates in a set of relations that are all in 4NF.

Greedy approach: Choose the MVD whose decomposition reduces redundancy most.

Predictable approach: Process MVDs in a deterministic order (e.g., by determinant size, then dependent size).

Variation 2: Combined FD/MVD Processing

Rather than handling FDs and MVDs separately:

Treat all FDs as MVDs (X → Y implies X →→ Y)
Apply the 4NF algorithm, which will achieve both BCNF and 4NF

This simplifies implementation and ensures a coherent result.
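
The implication X → Y ⇒ X →→ Y can be checked on a concrete instance. The sketch below assumes relations are sets of tuples; `satisfies_fd`, `satisfies_mvd`, and the sample rows are illustrative names and data, not part of the module. It also shows that the converse does not hold: an MVD can be satisfied while the corresponding FD is not.

```python
def pick(row, idx):
    """Extract the values at the given positions as a tuple."""
    return tuple(row[i] for i in idx)


def satisfies_fd(rows, schema, x, y):
    """True if X -> Y holds: each X value maps to a single Y value."""
    xi = [schema.index(a) for a in x]
    yi = [schema.index(a) for a in y]
    seen = {}
    for row in rows:
        if seen.setdefault(pick(row, xi), pick(row, yi)) != pick(row, yi):
            return False
    return True


def satisfies_mvd(rows, schema, x, y):
    """True if X ->> Y holds: within each X group, every Y value
    appears with every Z value (Z = remaining attributes)."""
    xi = [schema.index(a) for a in x]
    yi = [schema.index(a) for a in y if a not in x]
    zi = [i for i in range(len(schema)) if i not in xi and i not in yi]
    groups = {}
    for row in rows:
        ys, zs, pairs = groups.setdefault(pick(row, xi), (set(), set(), set()))
        ys.add(pick(row, yi))
        zs.add(pick(row, zi))
        pairs.add((pick(row, yi), pick(row, zi)))
    # pairs is always a subset of ys x zs, so equal sizes means equal sets
    return all(len(p) == len(ys) * len(zs) for ys, zs, p in groups.values())


schema = ["EmpID", "Department", "Skill"]
rows = {(1, "Eng", "SQL"), (1, "Eng", "Python"), (2, "Ops", "Java")}

fd_dept = satisfies_fd(rows, schema, ["EmpID"], ["Department"])
mvd_dept = satisfies_mvd(rows, schema, ["EmpID"], ["Department"])
fd_skill = satisfies_fd(rows, schema, ["EmpID"], ["Skill"])
mvd_skill = satisfies_mvd(rows, schema, ["EmpID"], ["Skill"])
```

In this instance the FD EmpID → Department holds and so does the MVD EmpID →→ Department, whereas EmpID →→ Skill holds even though EmpID → Skill does not.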

Optimized 4NF Decomposition
Algorithm: Optimized_4NF_Decomposition(R, FDs, MVDs)
  
  # Step 1: Convert FDs to MVDs
  all_mvds = MVDs ∪ { X →→ Y : X → Y ∈ FDs }
  
  # Step 2: Initialize
  result = {R}
  
  # Step 3: Iterative decomposition
  changed = True
  WHILE changed:
    changed = False
    
    FOR each Rᵢ in result:
      # Find violating MVD with smallest determinant
      # (optimization: decompose from smaller determinants first)
      violations = find_violations(Rᵢ, all_mvds)
      
      IF violations is not empty:
        # Select MVD with minimal |X|
        mvd = min(violations, key = lambda m: |m.determinant|)
        
        # Decompose
        R₁, R₂ = decompose(Rᵢ, mvd)
        
        # Update result
        result.remove(Rᵢ)
        result.add(R₁)
        result.add(R₂)
        
        # Keep all_mvds unchanged: find_violations projects each MVD
        # onto Rᵢ, so MVDs that apply to other relations are not lost
        
        changed = True
        BREAK  # Restart loop with updated result
  
  RETURN result
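
The pseudocode above can be turned into a runnable schema-level sketch. This version makes several simplifying assumptions that are not from the module: dependencies are given as `(determinant, dependent)` pairs of frozensets, only the declared dependencies are considered (implied MVDs are not derived), and the superkey test uses attribute closure under the declared FDs only. The names `closure`, `find_violations`, and `decompose_4nf` are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violations(rel, fds, mvds):
    """Declared MVDs that are nontrivial on rel and whose determinant
    is not a superkey of rel. Each MVD is projected onto rel first."""
    found = []
    for x, y in mvds:
        if not x <= rel:
            continue                      # determinant not in this relation
        y_in = (y & rel) - x              # projected dependent
        if not y_in or x | y_in == rel:
            continue                      # trivial on this relation
        if rel <= closure(x, fds):
            continue                      # X is a superkey of rel
        found.append((x, y_in))
    return found


def decompose_4nf(relation, fds, mvds):
    """Split relation until no declared dependency violates 4NF."""
    all_mvds = list(mvds) + list(fds)     # every FD X -> Y is also X ->> Y
    result = {frozenset(relation)}
    while True:
        for rel in result:
            violations = find_violations(rel, fds, all_mvds)
            if violations:
                # Smallest determinant first; ties broken deterministically
                x, y = min(violations,
                           key=lambda m: (len(m[0]), sorted(m[0]), sorted(m[1])))
                result = (result - {rel}) | {x | y, rel - y}
                break                     # restart with the updated result
        else:
            return result                 # no relation has a violation


E = frozenset({"EmpID"})
fds = [(E, frozenset({"Department"}))]
mvds = [(E, frozenset({"Skill"})), (E, frozenset({"Project"}))]
schemas = decompose_4nf({"EmpID", "Skill", "Project", "Department"}, fds, mvds)
```

For the EmployeeData example this yields the three schemas {EmpID, Skill}, {EmpID, Project}, and {EmpID, Department}. A production implementation would also need to reason about dependencies implied by the declared ones.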

Variation 3: Minimal Decomposition

The basic algorithm may produce more relations than necessary. A minimal decomposition variant seeks the fewest relations:

Identify all independent MVDs
Compute the minimal number of relations needed (one per independent fact type, plus one for non-MVD attributes)
Construct relations directly rather than iteratively

Example:

For R(A, B, C, D) with A →→ B and A →→ C (both independent of each other and D):

Iterative approach:

Decompose on A →→ B: {AB}, {ACD}
Decompose {ACD} on A →→ C: {AC}, {AD}
Final: {AB}, {AC}, {AD}

Direct approach: Recognize three independent facts: A-B relationship, A-C relationship, A-D relationship. Directly construct: {AB}, {AC}, {AD}.

Both produce the same result, but the direct approach requires less iteration.
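
The direct approach can be sketched in a few lines. This sketch assumes that all MVDs share the same determinant and that the dependents are mutually independent, as in the example; the function name `direct_decomposition` is hypothetical.

```python
def direct_decomposition(relation, determinant, dependents):
    """Build one schema per independent fact type, plus one schema
    for any attributes not covered by an MVD."""
    x = frozenset(determinant)
    schemas = [x | frozenset(y) for y in dependents]
    covered = x.union(*map(frozenset, dependents))
    rest = frozenset(relation) - covered
    if rest:
        schemas.append(x | rest)
    return schemas


# R(A, B, C, D) with A ->> B and A ->> C
schemas = direct_decomposition("ABCD", "A", ["B", "C"])
for s in schemas:
    print(sorted(s))
```

The function does no violation checking of its own; it relies on the independence analysis having been done beforehand.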

Optimization: Avoiding Redundant Decompositions

After each decomposition, check whether the attribute set of any resulting relation is a subset of another's. If so, the smaller relation is just a projection of the larger one and can be eliminated (its information is already contained).

This prevents producing unnecessarily many small relations when the data has hierarchical MVD structure.
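
A minimal sketch of this check, assuming schemas are given as sets of attribute names and that all of them are projections of the same original relation; the function name `drop_subsumed` is illustrative.

```python
def drop_subsumed(schemas):
    """Remove any schema whose attributes are a proper subset of
    another schema's attributes. Exact duplicates collapse too."""
    unique = set(map(frozenset, schemas))
    return {s for s in unique if not any(s < other for other in unique)}


kept = drop_subsumed([{"A", "B"}, {"A", "B", "C"}, {"A", "D"}])
```

Here {A, B} is dropped because {A, B, C} already contains it, while {A, D} is kept.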

Practical Recommendation

Complex Decomposition Scenarios

Not all 4NF decompositions are straightforward. Let's examine scenarios that require careful handling.

Scenario 1: Multiple Independent MVD Groups

Consider: R(A, B, C, D, E) with:

A →→ B (B independent of C, D, E given A)
A →→ C (C independent of B, D, E given A)
A →→ DE (D and E together, independent of B, C)

Decomposition:

R₁(A, B) — A-B relationship
R₂(A, C) — A-C relationship
R₃(A, D, E) — A-DE relationship (D and E are grouped)

Note that D and E are NOT independent of each other—they're stored together in R₃.

Scenario 2: Overlapping MVDs

Consider: R(A, B, C) with:

A →→ B
AB →→ C

Is there a 4NF violation? Let's check:

A →→ B: Is A a superkey? Depends on other constraints.
AB →→ C: Since AB →→ C and the relation is ABC, this is trivial (covers all attributes).

If the only candidate key is ABC, then A is not a superkey, and A →→ B violates 4NF.

Decomposition: Decompose on A →→ B into R₁(A, B) and R₂(A, C). Nothing is lost by doing so: AB →→ C is trivial in R(A, B, C), so it holds in every instance and imposes no constraint that needs preserving.

Checking the results:

R₁(A, B) is binary, so every MVD on it is trivial. 4NF ✓
R₂(A, C): by complementation, A →→ B on R implies A →→ C on R, and A →→ C is trivial in the binary relation R₂. 4NF ✓

The decomposition is lossless, and because AB →→ C was trivial to begin with, it needs no separate enforcement after the split.

Scenario 3: Embedded FDs Affecting MVDs

Consider: R(A, B, C, D) with:

A → B (FD)
A → D (FD)
A →→ C (MVD, independent of B, D)
Key: AC (A determines B and D, while C is multi-valued for each A)

Violations:

A → B and A → D: Is A a superkey? No (key is AC). BCNF violation.
A →→ C: Is A a superkey? No. 4NF violation.

Decomposition options:

Option A: Address MVD first

Decompose on A →→ C: R₁(A, C), R₂(A, B, D)
R₁(A, C): Key is AC. A →→ C is trivial (covers relation). 4NF ✓
R₂(A, B, D): A → B and A → D. Key is A. Both FDs have superkey determinant. BCNF ✓, 4NF ✓

Option B: Address FD first

Decompose on A → B: R₁(A, B), R₂(A, C, D)
R₁(A, B): Key is A. 4NF ✓
R₂(A, C, D): A →→ C still holds in the projection, and the key is AC (A → D, but A does not determine C). A is not a superkey, so R₂ still violates 4NF.
Decompose R₂ on A →→ C: R₃(A, C), R₄(A, D). Both are binary. 4NF ✓

If A does not determine D, the key of R is ACD. Decomposing on A →→ C then yields (A, C) and (A, B, D), where A → B still violates BCNF and forces one more split into (A, B) and (A, D).

The order changes the intermediate steps and can change the final schema: Option A ends with two relations, Option B with three. Both results are lossless and in 4NF.
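
Superkey questions like the ones in this scenario can be answered mechanically with attribute closure. The sketch below assumes the FDs A → B and A → D and uses hypothetical helper names `closure` and `is_superkey`. It considers FDs only, since an MVD such as A →→ C never lets A determine C.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    """True if attrs functionally determines every attribute of relation."""
    return set(relation) <= closure(attrs, fds)


fds = [({"A"}, {"B"}), ({"A"}, {"D"})]

a_in_abd = is_superkey({"A"}, {"A", "B", "D"}, fds)
a_in_acd = is_superkey({"A"}, {"A", "C", "D"}, fds)
ac_in_abcd = is_superkey({"A", "C"}, {"A", "B", "C", "D"}, fds)
```

With these FDs, A is a superkey of (A, B, D) but not of (A, C, D), and AC is a superkey of the full relation.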

Analyze Carefully

Summary: 4NF Decomposition

We've covered the complete 4NF decomposition process. Let's consolidate the key principles:

Key Takeaways

•Decomposition separates independent facts — Each MVD-violating relation splits into relations for each independent multi-valued fact.
•The algorithm is iterative — Find a violation, decompose, repeat until all relations are 4NF.
•Decomposition is always lossless — MVD-based decomposition guarantees perfect reconstruction via join.
•Dependency preservation varies — FDs within single relations are preserved; cross-relation constraints may need extra enforcement.
•Optimization is possible — MVD selection order and direct decomposition can improve efficiency.
•Complex scenarios require care — Multiple MVD groups, overlapping MVDs, and embedded FDs need systematic analysis.

What's Next:

With the decomposition algorithm mastered, the next page explores the relationship between 4NF and BCNF—examining when 4NF truly adds value beyond BCNF, and when BCNF alone is sufficient.
