Loading learning content...
In the previous pages, you mastered identifying 2NF violations. Now comes the crucial step: fixing them. The cure for partial dependencies is decomposition—splitting a violating relation into smaller relations that each satisfy 2NF.
But decomposition must be done carefully. A careless split can lose information, break constraints, or create other problems. This page teaches you the systematic approach to decomposition that guarantees correctness while eliminating partial dependencies.
By the end of this page, you will understand the 2NF decomposition algorithm, why it produces correct results, how to verify lossless join and dependency preservation, and how to apply decomposition to complex real-world schemas. You'll be able to confidently refactor any 2NF-violating relation.
The fundamental principle behind 2NF decomposition is elegantly simple:
Core Idea:
If attribute A partially depends on subset S of the key K, then A "belongs" in a separate relation where S is the full key.
Why This Works:
Partial dependencies exist because we're storing an attribute (A) in a relation where the key (K) contains "extra" information beyond what's needed to determine A. By moving A to a relation where its true determinant (S) is the key, we:
The Decomposition Trade-off:
We split one table into multiple tables. This means:
However, the benefits of data integrity and reduced redundancy typically outweigh these costs, especially as data scales.
Decomposition follows the principle: 'Each fact in one place.' If StudentName is a fact about a Student (determined by StudentID), it should be stored exactly once in a Student table, not scattered across every table that references students.
Here is the systematic algorithm for decomposing a relation to achieve 2NF:
Algorithm: DECOMPOSE-TO-2NF(R, F)
Input:
Output:
123456789101112131415161718192021222324252627282930
DECOMPOSE-TO-2NF(R, F): result = {R} // Start with the original relation // Step 1: Find all partial dependencies partial_deps = DETECT-2NF-VIOLATIONS(R, F) // Step 2: Group partial dependencies by their determinant groups = GROUP-BY-DETERMINANT(partial_deps) // e.g., { {StudentID}: [StudentName], // {CourseID}: [CourseName, Credits] } // Step 3: For each determinant group, create a new relation for each (determinant, attributes) in groups: // Create new relation with determinant as key new_relation = CREATE-RELATION( attributes = determinant ∪ attributes, key = determinant ) // Add to result set result.add(new_relation) // Remove partially dependent attributes from original R.attributes = R.attributes - attributes // Step 4: The modified R (with partial deps removed) stays in result // R now contains only the key and fully dependent attributes return resultAlgorithm Explanation:
Step 1: Use the detection algorithm from the previous page to find all partial dependencies.
Step 2: Group the partially dependent attributes by their determinant. Attributes with the same determinant go into the same new relation.
Step 3: For each group:
Step 4: The original relation now contains only:
Key Insight: The original composite key remains in the decomposed relation as a foreign key reference to the new relations, maintaining the relationships.
Let's apply the algorithm to a comprehensive example.
Original Relation:
StudentCourse(StudentID, CourseID, StudentName, StudentEmail,
CourseName, Credits, InstructorID, Grade, EnrollDate)
Primary Key: {StudentID, CourseID}
Functional Dependencies:
Step 1: Detect Partial Dependencies
From analysis:
Five partial dependencies found.
Step 2: Group by Determinant
| Determinant | Dependent Attributes | New Relation Name |
|---|---|---|
| {StudentID} | StudentName, StudentEmail | Student |
| {CourseID} | CourseName, Credits, InstructorID | Course |
Step 3: Create New Relations
New Relation 1: Student
CREATE TABLE Student (
StudentID INT PRIMARY KEY,
StudentName VARCHAR(100) NOT NULL,
StudentEmail VARCHAR(100) UNIQUE
);
New Relation 2: Course
CREATE TABLE Course (
CourseID INT PRIMARY KEY,
CourseName VARCHAR(100) NOT NULL,
Credits INT NOT NULL,
InstructorID INT NOT NULL
);
Step 4: Modified Original Relation
Remove partially dependent attributes, keep only:
CREATE TABLE Enrollment (
StudentID INT,
CourseID INT,
Grade CHAR(2),
EnrollDate DATE NOT NULL,
PRIMARY KEY (StudentID, CourseID),
FOREIGN KEY (StudentID) REFERENCES Student(StudentID),
FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
);
The original 7-attribute relation is now three relations: • Student (StudentID, StudentName, StudentEmail) — Key: StudentID • Course (CourseID, CourseName, Credits, InstructorID) — Key: CourseID • Enrollment (StudentID, CourseID, Grade, EnrollDate) — Key: {StudentID, CourseID}
All three are in 2NF (the first two have single-attribute keys; the third has no partial dependencies).
A decomposition is lossless (or lossless-join) if we can reconstruct the original relation exactly by natural joining the decomposed relations. No information is lost or spuriously created.
Formal Definition:
A decomposition of R into R₁ and R₂ is lossless if:
R = R₁ ⋈ R₂ (natural join)
The Lossless Join Condition:
Decomposition of R into R₁(X) and R₂(Y) is lossless if and only if:
(X ∩ Y) → (X - Y) OR (X ∩ Y) → (Y - X)
In words: The common attributes must be a key for at least one of the decomposed relations.
Why 2NF Decomposition Is Always Lossless:
When we decompose for 2NF:
Verifying Losslessness in Our Example:
Original: StudentCourse(StudentID, CourseID, StudentName, StudentEmail, CourseName, Credits, InstructorID, Grade, EnrollDate)
Decomposition 1: Extracting Student
Common: {StudentID}
Check: StudentID → (StudentName, StudentEmail)? YES Result: Lossless ✓
Decomposition 2: Extracting Course from Remaining
Common: {CourseID}
Check: CourseID → (CourseName, Credits, InstructorID)? YES Result: Lossless ✓
Full Verification:
Student ⋈ Course ⋈ Enrollment = original StudentCourse
The 2NF decomposition algorithm ALWAYS produces lossless decompositions because we always extract on functional dependencies where the determinant becomes the key of the new relation. This satisfies the lossless join condition by construction.
A decomposition is dependency-preserving if all original functional dependencies can be enforced without computing joins across the decomposed relations.
Why It Matters:
If a dependency like A → B was in the original relation but A ends up in R₁ and B ends up in R₂, we can't enforce A → B using single-table constraints. We'd need triggers, application logic, or expensive cross-table checks.
Formal Definition:
Let F be the original FD set. Let F₁, F₂, ..., Fₙ be the projections of F onto the decomposed relations.
Decomposition is dependency-preserving if:
(F₁ ∪ F₂ ∪ ... ∪ Fₙ)⁺ = F⁺
That is, the closure of the union of projected dependencies equals the original closure.
Analysis for Our Example:
Original FDs:
After Decomposition:
| Relation | Attributes | Enforced FDs |
|---|---|---|
| Student | StudentID, StudentName, StudentEmail | FD2 (StudentID → Name, Email) ✓ |
| Course | CourseID, CourseName, Credits, InstructorID | FD3 (CourseID → Name, Credits, InstructorID) ✓ |
| Enrollment | StudentID, CourseID, Grade, EnrollDate | FD1 ({StudentID, CourseID} → Grade, EnrollDate) ✓ |
All original FDs are preserved in one of the decomposed relations!
This is not a coincidence. The 2NF decomposition algorithm preserves dependencies because:
The 2NF decomposition algorithm naturally preserves functional dependencies because partial dependencies are extracted wholesale (determinant + all dependents together), and full dependencies remain in place. Each original FD exists entirely within exactly one resulting relation.
Let's see the complete SQL implementation for decomposing a 2NF-violating table.
Step 1: Create the New Normalized Tables
1234567891011121314151617181920212223242526272829303132
-- Original violating table-- StudentCourse(StudentID, CourseID, StudentName, StudentEmail, -- CourseName, Credits, InstructorID, Grade, EnrollDate) -- Step 1a: Create Student table (for partial dep on StudentID)CREATE TABLE Student ( StudentID INT PRIMARY KEY, StudentName VARCHAR(100) NOT NULL, StudentEmail VARCHAR(100) NOT NULL UNIQUE); -- Step 1b: Create Course table (for partial dep on CourseID)CREATE TABLE Course ( CourseID INT PRIMARY KEY, CourseName VARCHAR(100) NOT NULL, Credits INT NOT NULL CHECK (Credits > 0), InstructorID INT NOT NULL); -- Step 1c: Create Enrollment table (remaining full dependencies)CREATE TABLE Enrollment ( StudentID INT NOT NULL, CourseID INT NOT NULL, Grade CHAR(2), EnrollDate DATE NOT NULL DEFAULT CURRENT_DATE, PRIMARY KEY (StudentID, CourseID), FOREIGN KEY (StudentID) REFERENCES Student(StudentID) ON DELETE CASCADE ON UPDATE CASCADE, FOREIGN KEY (CourseID) REFERENCES Course(CourseID) ON DELETE CASCADE ON UPDATE CASCADE);Step 2: Migrate Data from the Original Table
123456789101112131415161718192021222324252627
-- Assuming original data is in StudentCourse_Original -- Step 2a: Populate Student (distinct students only)INSERT INTO Student (StudentID, StudentName, StudentEmail)SELECT DISTINCT StudentID, StudentName, StudentEmailFROM StudentCourse_Original; -- Step 2b: Populate Course (distinct courses only)INSERT INTO Course (CourseID, CourseName, Credits, InstructorID)SELECT DISTINCT CourseID, CourseName, Credits, InstructorIDFROM StudentCourse_Original; -- Step 2c: Populate Enrollment (all enrollment relationships)INSERT INTO Enrollment (StudentID, CourseID, Grade, EnrollDate)SELECT StudentID, CourseID, Grade, EnrollDateFROM StudentCourse_Original; -- Step 3: Verify data integrity-- Check row countsSELECT 'Original' AS Source, COUNT(*) AS Rows FROM StudentCourse_OriginalUNION ALLSELECT 'After Join' AS Source, COUNT(*) AS Rows FROM Student sJOIN Enrollment e ON s.StudentID = e.StudentIDJOIN Course c ON e.CourseID = c.CourseID; -- Counts should match! If not, investigate data anomalies.Step 3: Verify the Decomposition
123456789101112131415161718192021222324252627
-- Reconstruct original view (proves lossless join)CREATE VIEW StudentCourse_Reconstructed ASSELECT s.StudentID, e.CourseID, s.StudentName, s.StudentEmail, c.CourseName, c.Credits, c.InstructorID, e.Grade, e.EnrollDateFROM Student sJOIN Enrollment e ON s.StudentID = e.StudentIDJOIN Course c ON e.CourseID = c.CourseID; -- Compare with original (should return no rows if identical)SELECT * FROM StudentCourse_OriginalEXCEPTSELECT * FROM StudentCourse_Reconstructed; -- Verify: Student info is now stored once per studentSELECT StudentID, COUNT(*) AS StorageCountFROM StudentGROUP BY StudentIDHAVING COUNT(*) > 1; -- Should return no rows -- Verify: Course info is now stored once per courseSELECT CourseID, COUNT(*) AS StorageCountFROM Course GROUP BY CourseIDHAVING COUNT(*) > 1; -- Should return no rowsBefore migration, check for data inconsistencies in the original table (same StudentID with different names, etc.). These must be resolved first, as the normalized structure will enforce consistency going forward.
Real-world decomposition often involves additional complexities. Here are strategies for common challenges:
Case 1: Overlapping Partial Dependencies
When the same attribute appears in multiple partial dependencies:
R(A, B, C, D, E)
Key: {A, B}
FDs: A → C, B → C, {A,B} → D, E
Here, C depends on BOTH A and B independently. This is unusual (indicates C might be a constant or there's a design issue). Resolution:
Case 2: Chained Partial Dependencies
R(A, B, C, D, E)
Key: {A, B}
FDs: A → C, C → D, {A,B} → E
Here, D depends on C, and C depends on A. This is both a partial dependency AND a transitive dependency.
Decomposition Approach:
First address partial dependencies (for 2NF):
After 2NF, check for 3NF (C → D is transitive)
Result:
R1(A, C, D) — Key: A, but has transitive dep C → D
R2(A, B, E) — Key: {A, B}, in 2NF
R1 now needs 3NF decomposition (extract {C, D}).
Case 3: Multiple Candidate Keys
R(A, B, C, D, E)
Candidate Keys: {A, B}, {C, D}
FDs: A → E, C → E, {A,B} → ..., {C,D} → ...
Partial dependencies exist for BOTH composite keys.
Strategy:
Identify which key to use as primary (design decision)
Note that E partially depends on BOTH keys
When extracting E, include BOTH partial determinants if they determine E:
The choice depends on semantics: Does E have one value determined by A, or by C, or are these independent?
Complex decomposition decisions often require domain knowledge. A database schema encodes business rules. When FDs seem contradictory or overlapping, the answer usually lies in understanding the actual business meaning of the data.
You now have complete command of 2NF decomposition. Let's consolidate:
What's Next:
With the decomposition algorithm mastered, the next page brings everything together with comprehensive examples. You'll see complete 2NF normalization scenarios from start to finish, reinforcing your skills with varied real-world cases.
You can now take any 2NF-violating relation and systematically decompose it into a set of relations that satisfy 2NF while preserving all information and constraints. This is a fundamental database design skill that you'll use throughout your career.