Database Management SystemsSecond Normal Form (2NF)

Second Normal Form: Eliminating Partial Dependencies

LevelIntermediate

Duration55 mins

TopicSecond Normal Form (2NF)

4 / 5

Decomposition for Second Normal Form

From Diagnosis to Treatment

In the previous pages, you mastered identifying 2NF violations. Now comes the crucial step: fixing them. The cure for partial dependencies is decomposition—splitting a violating relation into smaller relations that each satisfy 2NF.

But decomposition must be done carefully. A careless split can lose information, break constraints, or create other problems. This page teaches you the systematic approach to decomposition that guarantees correctness while eliminating partial dependencies.

What You Will Master

By the end of this page, you will understand the 2NF decomposition algorithm, why it produces correct results, how to verify lossless join and dependency preservation, and how to apply decomposition to complex real-world schemas. You'll be able to confidently refactor any 2NF-violating relation.

The Decomposition Principle

The fundamental principle behind 2NF decomposition is elegantly simple:

Core Idea:

If attribute A partially depends on subset S of the key K, then A "belongs" in a separate relation where S is the full key.

Why This Works:

Partial dependencies exist because we're storing an attribute (A) in a relation where the key (K) contains "extra" information beyond what's needed to determine A. By moving A to a relation where its true determinant (S) is the key, we:

Eliminate redundancy: A is stored once per unique value of S, not repeated for every combination of K
Enable proper updates: Changing A requires updating one row, not many
Prevent anomalies: Insert/delete anomalies disappear because A exists independently

The Decomposition Trade-off:

We split one table into multiple tables. This means:

Read operations may require joins to reconstruct the original data
Write operations may span multiple tables
Schema complexity increases

However, the benefits of data integrity and reduced redundancy typically outweigh these costs, especially as data scales.

The Design Philosophy

Decomposition follows the principle: 'Each fact in one place.' If StudentName is a fact about a Student (determined by StudentID), it should be stored exactly once in a Student table, not scattered across every table that references students.

The 2NF Decomposition Algorithm

Here is the systematic algorithm for decomposing a relation to achieve 2NF:

Algorithm: DECOMPOSE-TO-2NF(R, F)

Input:

R: A relation with attributes and primary key K
F: Functional dependencies over R

Output:

A set of relations, all in 2NF, that together preserve R's information

2NF Decomposition Algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
DECOMPOSE-TO-2NF(R, F):
    result = {R}  // Start with the original relation
    
    // Step 1: Find all partial dependencies
    partial_deps = DETECT-2NF-VIOLATIONS(R, F)
    
    // Step 2: Group partial dependencies by their determinant
    groups = GROUP-BY-DETERMINANT(partial_deps)
    // e.g., { {StudentID}: [StudentName], 
    //         {CourseID}: [CourseName, Credits] }
    
    // Step 3: For each determinant group, create a new relation
    for each (determinant, attributes) in groups:
        
        // Create new relation with determinant as key
        new_relation = CREATE-RELATION(
            attributes = determinant ∪ attributes,
            key = determinant
        )
        
        // Add to result set
        result.add(new_relation)
        
        // Remove partially dependent attributes from original
        R.attributes = R.attributes - attributes
    
    // Step 4: The modified R (with partial deps removed) stays in result
    // R now contains only the key and fully dependent attributes
    
    return result

Algorithm Explanation:

Step 1: Use the detection algorithm from the previous page to find all partial dependencies.

Step 2: Group the partially dependent attributes by their determinant. Attributes with the same determinant go into the same new relation.

Step 3: For each group:

Create a new relation containing the determinant and its dependent attributes
The determinant becomes the primary key of this new relation
Remove the dependent attributes from the original relation

Step 4: The original relation now contains only:

The full composite key
Attributes that fully depend on the entire key

Key Insight: The original composite key remains in the decomposed relation as a foreign key reference to the new relations, maintaining the relationships.

Worked Example: Complete Decomposition

Let's apply the algorithm to a comprehensive example.

Original Relation:

StudentCourse(StudentID, CourseID, StudentName, StudentEmail, 
              CourseName, Credits, InstructorID, Grade, EnrollDate)

Primary Key: {StudentID, CourseID}

Functional Dependencies:

•FD1: {StudentID, CourseID} → Grade, EnrollDate (full dependency)
•FD2: {StudentID} → StudentName, StudentEmail (partial dependency)
•FD3: {CourseID} → CourseName, Credits, InstructorID (partial dependency)

Step 1: Detect Partial Dependencies

From analysis:

{StudentID} → StudentName ✗
{StudentID} → StudentEmail ✗
{CourseID} → CourseName ✗
{CourseID} → Credits ✗
{CourseID} → InstructorID ✗

Five partial dependencies found.

Step 2: Group by Determinant

Grouping Partial Dependencies
Determinant	Dependent Attributes	New Relation Name
{StudentID}	StudentName, StudentEmail	Student
{CourseID}	CourseName, Credits, InstructorID	Course

Step 3: Create New Relations

New Relation 1: Student

CREATE TABLE Student (
    StudentID    INT PRIMARY KEY,
    StudentName  VARCHAR(100) NOT NULL,
    StudentEmail VARCHAR(100) UNIQUE
);

New Relation 2: Course

CREATE TABLE Course (
    CourseID     INT PRIMARY KEY,
    CourseName   VARCHAR(100) NOT NULL,
    Credits      INT NOT NULL,
    InstructorID INT NOT NULL
);

Step 4: Modified Original Relation

Remove partially dependent attributes, keep only:

The composite key: StudentID, CourseID
Fully dependent attributes: Grade, EnrollDate

CREATE TABLE Enrollment (
    StudentID  INT,
    CourseID   INT,
    Grade      CHAR(2),
    EnrollDate DATE NOT NULL,
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
);

Decomposition Complete

The original 7-attribute relation is now three relations: • Student (StudentID, StudentName, StudentEmail) — Key: StudentID • Course (CourseID, CourseName, Credits, InstructorID) — Key: CourseID • Enrollment (StudentID, CourseID, Grade, EnrollDate) — Key: {StudentID, CourseID}

All three are in 2NF (the first two have single-attribute keys; the third has no partial dependencies).

Ensuring Lossless Join

A decomposition is lossless (or lossless-join) if we can reconstruct the original relation exactly by natural joining the decomposed relations. No information is lost or spuriously created.

Formal Definition:

A decomposition of R into R₁ and R₂ is lossless if:

R = R₁ ⋈ R₂  (natural join)

The Lossless Join Condition:

Decomposition of R into R₁(X) and R₂(Y) is lossless if and only if:

(X ∩ Y) → (X - Y)  OR  (X ∩ Y) → (Y - X)

In words: The common attributes must be a key for at least one of the decomposed relations.

Why 2NF Decomposition Is Always Lossless:

When we decompose for 2NF:

New relation R₁ contains determinant S and its dependent attributes A
Original relation R keeps S (as part of the key) and removes A
The common attribute is S
S is the key of R₁ (by construction)
Therefore, (S) → (attributes of R₁ - S) holds
The lossless join condition is satisfied!

Verifying Losslessness in Our Example:

Original: StudentCourse(StudentID, CourseID, StudentName, StudentEmail, CourseName, Credits, InstructorID, Grade, EnrollDate)

Decomposition 1: Extracting Student

R₁ = Student(StudentID, StudentName, StudentEmail)
R₂ = Remaining(StudentID, CourseID, CourseName, Credits, InstructorID, Grade, EnrollDate)

Common: {StudentID}

Check: StudentID → (StudentName, StudentEmail)? YES Result: Lossless ✓

Decomposition 2: Extracting Course from Remaining

R₁ = Course(CourseID, CourseName, Credits, InstructorID)
R₂ = Enrollment(StudentID, CourseID, Grade, EnrollDate)

Common: {CourseID}

Check: CourseID → (CourseName, Credits, InstructorID)? YES Result: Lossless ✓

Full Verification:

Student ⋈ Course ⋈ Enrollment = original StudentCourse

Lossless Guarantee

The 2NF decomposition algorithm ALWAYS produces lossless decompositions because we always extract on functional dependencies where the determinant becomes the key of the new relation. This satisfies the lossless join condition by construction.

Dependency Preservation

A decomposition is dependency-preserving if all original functional dependencies can be enforced without computing joins across the decomposed relations.

Why It Matters:

If a dependency like A → B was in the original relation but A ends up in R₁ and B ends up in R₂, we can't enforce A → B using single-table constraints. We'd need triggers, application logic, or expensive cross-table checks.

Formal Definition:

Let F be the original FD set. Let F₁, F₂, ..., Fₙ be the projections of F onto the decomposed relations.

Decomposition is dependency-preserving if:

(F₁ ∪ F₂ ∪ ... ∪ Fₙ)⁺ = F⁺

That is, the closure of the union of projected dependencies equals the original closure.

Analysis for Our Example:

Original FDs:

FD1: {StudentID, CourseID} → Grade, EnrollDate
FD2: {StudentID} → StudentName, StudentEmail
FD3: {CourseID} → CourseName, Credits, InstructorID

After Decomposition:

Relation	Attributes	Enforced FDs
Student	StudentID, StudentName, StudentEmail	FD2 (StudentID → Name, Email) ✓
Course	CourseID, CourseName, Credits, InstructorID	FD3 (CourseID → Name, Credits, InstructorID) ✓
Enrollment	StudentID, CourseID, Grade, EnrollDate	FD1 ({StudentID, CourseID} → Grade, EnrollDate) ✓

All original FDs are preserved in one of the decomposed relations!

This is not a coincidence. The 2NF decomposition algorithm preserves dependencies because:

Partial dependencies become single-table constraints (the extracted relations)
Full dependencies stay in the original relation
No dependency spans multiple tables after decomposition

Dependency Preservation Achieved

The 2NF decomposition algorithm naturally preserves functional dependencies because partial dependencies are extracted wholesale (determinant + all dependents together), and full dependencies remain in place. Each original FD exists entirely within exactly one resulting relation.

SQL Implementation of Decomposition

Let's see the complete SQL implementation for decomposing a 2NF-violating table.

Step 1: Create the New Normalized Tables

SQL: Create Normalized Structure
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- Original violating table
-- StudentCourse(StudentID, CourseID, StudentName, StudentEmail, 
--               CourseName, Credits, InstructorID, Grade, EnrollDate)
 
-- Step 1a: Create Student table (for partial dep on StudentID)
CREATE TABLE Student (
    StudentID    INT PRIMARY KEY,
    StudentName  VARCHAR(100) NOT NULL,
    StudentEmail VARCHAR(100) NOT NULL UNIQUE
);
 
-- Step 1b: Create Course table (for partial dep on CourseID)
CREATE TABLE Course (
    CourseID     INT PRIMARY KEY,
    CourseName   VARCHAR(100) NOT NULL,
    Credits      INT NOT NULL CHECK (Credits > 0),
    InstructorID INT NOT NULL
);
 
-- Step 1c: Create Enrollment table (remaining full dependencies)
CREATE TABLE Enrollment (
    StudentID  INT NOT NULL,
    CourseID   INT NOT NULL,
    Grade      CHAR(2),
    EnrollDate DATE NOT NULL DEFAULT CURRENT_DATE,
    
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID)
        ON DELETE CASCADE ON UPDATE CASCADE,
    FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
        ON DELETE CASCADE ON UPDATE CASCADE
);

Step 2: Migrate Data from the Original Table

SQL: Data Migration
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- Assuming original data is in StudentCourse_Original
 
-- Step 2a: Populate Student (distinct students only)
INSERT INTO Student (StudentID, StudentName, StudentEmail)
SELECT DISTINCT StudentID, StudentName, StudentEmail
FROM StudentCourse_Original;
 
-- Step 2b: Populate Course (distinct courses only)
INSERT INTO Course (CourseID, CourseName, Credits, InstructorID)
SELECT DISTINCT CourseID, CourseName, Credits, InstructorID
FROM StudentCourse_Original;
 
-- Step 2c: Populate Enrollment (all enrollment relationships)
INSERT INTO Enrollment (StudentID, CourseID, Grade, EnrollDate)
SELECT StudentID, CourseID, Grade, EnrollDate
FROM StudentCourse_Original;
 
-- Step 3: Verify data integrity
-- Check row counts
SELECT 'Original' AS Source, COUNT(*) AS Rows FROM StudentCourse_Original
UNION ALL
SELECT 'After Join' AS Source, COUNT(*) AS Rows 
FROM Student s
JOIN Enrollment e ON s.StudentID = e.StudentID
JOIN Course c ON e.CourseID = c.CourseID;
 
-- Counts should match! If not, investigate data anomalies.

Step 3: Verify the Decomposition

SQL: Verification Queries
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- Reconstruct original view (proves lossless join)
CREATE VIEW StudentCourse_Reconstructed AS
SELECT 
    s.StudentID, e.CourseID,
    s.StudentName, s.StudentEmail,
    c.CourseName, c.Credits, c.InstructorID,
    e.Grade, e.EnrollDate
FROM Student s
JOIN Enrollment e ON s.StudentID = e.StudentID
JOIN Course c ON e.CourseID = c.CourseID;
 
-- Compare with original (should return no rows if identical)
SELECT * FROM StudentCourse_Original
EXCEPT
SELECT * FROM StudentCourse_Reconstructed;
 
-- Verify: Student info is now stored once per student
SELECT StudentID, COUNT(*) AS StorageCount
FROM Student
GROUP BY StudentID
HAVING COUNT(*) > 1;  -- Should return no rows
 
-- Verify: Course info is now stored once per course
SELECT CourseID, COUNT(*) AS StorageCount
FROM Course  
GROUP BY CourseID
HAVING COUNT(*) > 1;  -- Should return no rows

Migration Considerations

Before migration, check for data inconsistencies in the original table (same StudentID with different names, etc.). These must be resolved first, as the normalized structure will enforce consistency going forward.

Handling Complex Cases

Real-world decomposition often involves additional complexities. Here are strategies for common challenges:

Case 1: Overlapping Partial Dependencies

When the same attribute appears in multiple partial dependencies:

R(A, B, C, D, E)
Key: {A, B}
FDs: A → C, B → C, {A,B} → D, E

Here, C depends on BOTH A and B independently. This is unusual (indicates C might be a constant or there's a design issue). Resolution:

Investigate if the FDs are actually correct (domain expert consultation)
If valid, choose one determinant for C (typically the more restrictive one)
Consider if C should be duplicated or if there's a deeper normalization issue

Case 2: Chained Partial Dependencies

R(A, B, C, D, E)
Key: {A, B}
FDs: A → C, C → D, {A,B} → E

Here, D depends on C, and C depends on A. This is both a partial dependency AND a transitive dependency.

Decomposition Approach:

First address partial dependencies (for 2NF):
- Extract {A, C} → This also brings D along due to C → D
- Or extract {A, C, D} directly
After 2NF, check for 3NF (C → D is transitive)

Result:

R1(A, C, D) — Key: A, but has transitive dep C → D
R2(A, B, E) — Key: {A, B}, in 2NF

R1 now needs 3NF decomposition (extract {C, D}).

Case 3: Multiple Candidate Keys

R(A, B, C, D, E)
Candidate Keys: {A, B}, {C, D}
FDs: A → E, C → E, {A,B} → ..., {C,D} → ...

Partial dependencies exist for BOTH composite keys.

Strategy:

Identify which key to use as primary (design decision)
Note that E partially depends on BOTH keys
When extracting E, include BOTH partial determinants if they determine E:
- Option 1: {A, E} and {C, E} as separate tables (if A → E and C → E are different facts)
- Option 2: {A, C, E} if the relationships are connected
The choice depends on semantics: Does E have one value determined by A, or by C, or are these independent?

When in Doubt, Consult Domain Experts

Complex decomposition decisions often require domain knowledge. A database schema encodes business rules. When FDs seem contradictory or overlapping, the answer usually lies in understanding the actual business meaning of the data.

Summary: Mastering 2NF Decomposition

You now have complete command of 2NF decomposition. Let's consolidate:

Key Takeaways

•Core Principle: Extract partially dependent attributes into new relations where their determinant is the full key.
•The Algorithm: Detect violations → Group by determinant → Create new relations → Remove from original.
•Lossless Guarantee: The algorithm always produces lossless decompositions because the common attribute is the key of the extracted relation.
•Dependency Preservation: All FDs remain enforceable within single tables—no cross-table constraints needed.
•SQL Implementation: Use SELECT DISTINCT to populate extracted tables, migrate remaining data, verify with joins.
•Complex Cases: Overlapping or chained dependencies require careful analysis; consult domain knowledge.

What's Next:

With the decomposition algorithm mastered, the next page brings everything together with comprehensive examples. You'll see complete 2NF normalization scenarios from start to finish, reinforcing your skills with varied real-world cases.

Decomposition Mastered

You can now take any 2NF-violating relation and systematically decompose it into a set of relations that satisfy 2NF while preserving all information and constraints. This is a fundamental database design skill that you'll use throughout your career.

4 / 5

Loading learning content...

Database Management SystemsSecond Normal Form (2NF)

Second Normal Form: Eliminating Partial Dependencies

LevelIntermediate

Duration55 mins

TopicSecond Normal Form (2NF)

4 / 5

Decomposition for Second Normal Form

From Diagnosis to Treatment

What You Will Master

The Decomposition Principle

The fundamental principle behind 2NF decomposition is elegantly simple:

Core Idea:

If attribute A partially depends on subset S of the key K, then A "belongs" in a separate relation where S is the full key.

Why This Works:

Eliminate redundancy: A is stored once per unique value of S, not repeated for every combination of K
Enable proper updates: Changing A requires updating one row, not many
Prevent anomalies: Insert/delete anomalies disappear because A exists independently

The Decomposition Trade-off:

We split one table into multiple tables. This means:

Read operations may require joins to reconstruct the original data
Write operations may span multiple tables
Schema complexity increases

However, the benefits of data integrity and reduced redundancy typically outweigh these costs, especially as data scales.

The Design Philosophy

The 2NF Decomposition Algorithm

Here is the systematic algorithm for decomposing a relation to achieve 2NF:

Algorithm: DECOMPOSE-TO-2NF(R, F)

Input:

R: A relation with attributes and primary key K
F: Functional dependencies over R

Output:

A set of relations, all in 2NF, that together preserve R's information

2NF Decomposition Algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
DECOMPOSE-TO-2NF(R, F):
    result = {R}  // Start with the original relation
    
    // Step 1: Find all partial dependencies
    partial_deps = DETECT-2NF-VIOLATIONS(R, F)
    
    // Step 2: Group partial dependencies by their determinant
    groups = GROUP-BY-DETERMINANT(partial_deps)
    // e.g., { {StudentID}: [StudentName], 
    //         {CourseID}: [CourseName, Credits] }
    
    // Step 3: For each determinant group, create a new relation
    for each (determinant, attributes) in groups:
        
        // Create new relation with determinant as key
        new_relation = CREATE-RELATION(
            attributes = determinant ∪ attributes,
            key = determinant
        )
        
        // Add to result set
        result.add(new_relation)
        
        // Remove partially dependent attributes from original
        R.attributes = R.attributes - attributes
    
    // Step 4: The modified R (with partial deps removed) stays in result
    // R now contains only the key and fully dependent attributes
    
    return result

Algorithm Explanation:

Step 1: Use the detection algorithm from the previous page to find all partial dependencies.

Step 2: Group the partially dependent attributes by their determinant. Attributes with the same determinant go into the same new relation.

Step 3: For each group:

Create a new relation containing the determinant and its dependent attributes
The determinant becomes the primary key of this new relation
Remove the dependent attributes from the original relation

Step 4: The original relation now contains only:

The full composite key
Attributes that fully depend on the entire key

Key Insight: The original composite key remains in the decomposed relation as a foreign key reference to the new relations, maintaining the relationships.

Worked Example: Complete Decomposition

Let's apply the algorithm to a comprehensive example.

Original Relation:

StudentCourse(StudentID, CourseID, StudentName, StudentEmail, 
              CourseName, Credits, InstructorID, Grade, EnrollDate)

Primary Key: {StudentID, CourseID}

Functional Dependencies:

•FD1: {StudentID, CourseID} → Grade, EnrollDate (full dependency)
•FD2: {StudentID} → StudentName, StudentEmail (partial dependency)
•FD3: {CourseID} → CourseName, Credits, InstructorID (partial dependency)

Step 1: Detect Partial Dependencies

From analysis:

{StudentID} → StudentName ✗
{StudentID} → StudentEmail ✗
{CourseID} → CourseName ✗
{CourseID} → Credits ✗
{CourseID} → InstructorID ✗

Five partial dependencies found.

Step 2: Group by Determinant

Grouping Partial Dependencies
Determinant	Dependent Attributes	New Relation Name
{StudentID}	StudentName, StudentEmail	Student
{CourseID}	CourseName, Credits, InstructorID	Course

Step 3: Create New Relations

New Relation 1: Student

CREATE TABLE Student (
    StudentID    INT PRIMARY KEY,
    StudentName  VARCHAR(100) NOT NULL,
    StudentEmail VARCHAR(100) UNIQUE
);

New Relation 2: Course

CREATE TABLE Course (
    CourseID     INT PRIMARY KEY,
    CourseName   VARCHAR(100) NOT NULL,
    Credits      INT NOT NULL,
    InstructorID INT NOT NULL
);

Step 4: Modified Original Relation

Remove partially dependent attributes, keep only:

The composite key: StudentID, CourseID
Fully dependent attributes: Grade, EnrollDate

CREATE TABLE Enrollment (
    StudentID  INT,
    CourseID   INT,
    Grade      CHAR(2),
    EnrollDate DATE NOT NULL,
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
);

Decomposition Complete

All three are in 2NF (the first two have single-attribute keys; the third has no partial dependencies).

Ensuring Lossless Join

A decomposition is lossless (or lossless-join) if we can reconstruct the original relation exactly by natural joining the decomposed relations. No information is lost or spuriously created.

Formal Definition:

A decomposition of R into R₁ and R₂ is lossless if:

R = R₁ ⋈ R₂  (natural join)

The Lossless Join Condition:

Decomposition of R into R₁(X) and R₂(Y) is lossless if and only if:

(X ∩ Y) → (X - Y)  OR  (X ∩ Y) → (Y - X)

In words: The common attributes must be a key for at least one of the decomposed relations.

Why 2NF Decomposition Is Always Lossless:

When we decompose for 2NF:

New relation R₁ contains determinant S and its dependent attributes A
Original relation R keeps S (as part of the key) and removes A
The common attribute is S
S is the key of R₁ (by construction)
Therefore, (S) → (attributes of R₁ - S) holds
The lossless join condition is satisfied!

Verifying Losslessness in Our Example:

Original: StudentCourse(StudentID, CourseID, StudentName, StudentEmail, CourseName, Credits, InstructorID, Grade, EnrollDate)

Decomposition 1: Extracting Student

R₁ = Student(StudentID, StudentName, StudentEmail)
R₂ = Remaining(StudentID, CourseID, CourseName, Credits, InstructorID, Grade, EnrollDate)

Common: {StudentID}

Check: StudentID → (StudentName, StudentEmail)? YES Result: Lossless ✓

Decomposition 2: Extracting Course from Remaining

R₁ = Course(CourseID, CourseName, Credits, InstructorID)
R₂ = Enrollment(StudentID, CourseID, Grade, EnrollDate)

Common: {CourseID}

Check: CourseID → (CourseName, Credits, InstructorID)? YES Result: Lossless ✓

Full Verification:

Student ⋈ Course ⋈ Enrollment = original StudentCourse

Lossless Guarantee

Dependency Preservation

A decomposition is dependency-preserving if all original functional dependencies can be enforced without computing joins across the decomposed relations.

Why It Matters:

Formal Definition:

Let F be the original FD set. Let F₁, F₂, ..., Fₙ be the projections of F onto the decomposed relations.

Decomposition is dependency-preserving if:

(F₁ ∪ F₂ ∪ ... ∪ Fₙ)⁺ = F⁺

That is, the closure of the union of projected dependencies equals the original closure.

Analysis for Our Example:

Original FDs:

FD1: {StudentID, CourseID} → Grade, EnrollDate
FD2: {StudentID} → StudentName, StudentEmail
FD3: {CourseID} → CourseName, Credits, InstructorID

After Decomposition:

Relation	Attributes	Enforced FDs
Student	StudentID, StudentName, StudentEmail	FD2 (StudentID → Name, Email) ✓
Course	CourseID, CourseName, Credits, InstructorID	FD3 (CourseID → Name, Credits, InstructorID) ✓
Enrollment	StudentID, CourseID, Grade, EnrollDate	FD1 ({StudentID, CourseID} → Grade, EnrollDate) ✓

All original FDs are preserved in one of the decomposed relations!

This is not a coincidence. The 2NF decomposition algorithm preserves dependencies because:

Partial dependencies become single-table constraints (the extracted relations)
Full dependencies stay in the original relation
No dependency spans multiple tables after decomposition

Dependency Preservation Achieved

SQL Implementation of Decomposition

Let's see the complete SQL implementation for decomposing a 2NF-violating table.

Step 1: Create the New Normalized Tables

SQL: Create Normalized Structure
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- Original violating table
-- StudentCourse(StudentID, CourseID, StudentName, StudentEmail, 
--               CourseName, Credits, InstructorID, Grade, EnrollDate)
 
-- Step 1a: Create Student table (for partial dep on StudentID)
CREATE TABLE Student (
    StudentID    INT PRIMARY KEY,
    StudentName  VARCHAR(100) NOT NULL,
    StudentEmail VARCHAR(100) NOT NULL UNIQUE
);
 
-- Step 1b: Create Course table (for partial dep on CourseID)
CREATE TABLE Course (
    CourseID     INT PRIMARY KEY,
    CourseName   VARCHAR(100) NOT NULL,
    Credits      INT NOT NULL CHECK (Credits > 0),
    InstructorID INT NOT NULL
);
 
-- Step 1c: Create Enrollment table (remaining full dependencies)
CREATE TABLE Enrollment (
    StudentID  INT NOT NULL,
    CourseID   INT NOT NULL,
    Grade      CHAR(2),
    EnrollDate DATE NOT NULL DEFAULT CURRENT_DATE,
    
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID)
        ON DELETE CASCADE ON UPDATE CASCADE,
    FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
        ON DELETE CASCADE ON UPDATE CASCADE
);

Step 2: Migrate Data from the Original Table

SQL: Data Migration
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- Assuming original data is in StudentCourse_Original
 
-- Step 2a: Populate Student (distinct students only)
INSERT INTO Student (StudentID, StudentName, StudentEmail)
SELECT DISTINCT StudentID, StudentName, StudentEmail
FROM StudentCourse_Original;
 
-- Step 2b: Populate Course (distinct courses only)
INSERT INTO Course (CourseID, CourseName, Credits, InstructorID)
SELECT DISTINCT CourseID, CourseName, Credits, InstructorID
FROM StudentCourse_Original;
 
-- Step 2c: Populate Enrollment (all enrollment relationships)
INSERT INTO Enrollment (StudentID, CourseID, Grade, EnrollDate)
SELECT StudentID, CourseID, Grade, EnrollDate
FROM StudentCourse_Original;
 
-- Step 3: Verify data integrity
-- Check row counts
SELECT 'Original' AS Source, COUNT(*) AS Rows FROM StudentCourse_Original
UNION ALL
SELECT 'After Join' AS Source, COUNT(*) AS Rows 
FROM Student s
JOIN Enrollment e ON s.StudentID = e.StudentID
JOIN Course c ON e.CourseID = c.CourseID;
 
-- Counts should match! If not, investigate data anomalies.

Step 3: Verify the Decomposition

SQL: Verification Queries
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-- Reconstruct original view (proves lossless join)
CREATE VIEW StudentCourse_Reconstructed AS
SELECT 
    s.StudentID, e.CourseID,
    s.StudentName, s.StudentEmail,
    c.CourseName, c.Credits, c.InstructorID,
    e.Grade, e.EnrollDate
FROM Student s
JOIN Enrollment e ON s.StudentID = e.StudentID
JOIN Course c ON e.CourseID = c.CourseID;
 
-- Compare with original (should return no rows if identical)
SELECT * FROM StudentCourse_Original
EXCEPT
SELECT * FROM StudentCourse_Reconstructed;
 
-- Verify: Student info is now stored once per student
SELECT StudentID, COUNT(*) AS StorageCount
FROM Student
GROUP BY StudentID
HAVING COUNT(*) > 1;  -- Should return no rows
 
-- Verify: Course info is now stored once per course
SELECT CourseID, COUNT(*) AS StorageCount
FROM Course  
GROUP BY CourseID
HAVING COUNT(*) > 1;  -- Should return no rows

Migration Considerations

Handling Complex Cases

Real-world decomposition often involves additional complexities. Here are strategies for common challenges:

Case 1: Overlapping Partial Dependencies

When the same attribute appears in multiple partial dependencies:

R(A, B, C, D, E)
Key: {A, B}
FDs: A → C, B → C, {A,B} → D, E

Here, C depends on BOTH A and B independently. This is unusual (indicates C might be a constant or there's a design issue). Resolution:

Investigate if the FDs are actually correct (domain expert consultation)
If valid, choose one determinant for C (typically the more restrictive one)
Consider if C should be duplicated or if there's a deeper normalization issue

Case 2: Chained Partial Dependencies

R(A, B, C, D, E)
Key: {A, B}
FDs: A → C, C → D, {A,B} → E

Here, D depends on C, and C depends on A. This is both a partial dependency AND a transitive dependency.

Decomposition Approach:

First address partial dependencies (for 2NF):
- Extract {A, C} → This also brings D along due to C → D
- Or extract {A, C, D} directly
After 2NF, check for 3NF (C → D is transitive)

Result:

R1(A, C, D) — Key: A, but has transitive dep C → D
R2(A, B, E) — Key: {A, B}, in 2NF

R1 now needs 3NF decomposition (extract {C, D}).

Case 3: Multiple Candidate Keys

R(A, B, C, D, E)
Candidate Keys: {A, B}, {C, D}
FDs: A → E, C → E, {A,B} → ..., {C,D} → ...

Partial dependencies exist for BOTH composite keys.

Strategy:

Identify which key to use as primary (design decision)
Note that E partially depends on BOTH keys
When extracting E, include BOTH partial determinants if they determine E:
- Option 1: {A, E} and {C, E} as separate tables (if A → E and C → E are different facts)
- Option 2: {A, C, E} if the relationships are connected
The choice depends on semantics: Does E have one value determined by A, or by C, or are these independent?

When in Doubt, Consult Domain Experts

Summary: Mastering 2NF Decomposition

You now have complete command of 2NF decomposition. Let's consolidate:

Key Takeaways

•Core Principle: Extract partially dependent attributes into new relations where their determinant is the full key.
•The Algorithm: Detect violations → Group by determinant → Create new relations → Remove from original.
•Lossless Guarantee: The algorithm always produces lossless decompositions because the common attribute is the key of the extracted relation.
•Dependency Preservation: All FDs remain enforceable within single tables—no cross-table constraints needed.
•SQL Implementation: Use SELECT DISTINCT to populate extracted tables, migrate remaining data, verify with joins.
•Complex Cases: Overlapping or chained dependencies require careful analysis; consult domain knowledge.

What's Next:

Decomposition Mastered

4 / 5