Loading learning content...
Normalization is one of the most important and most misunderstood concepts in database design. At its core, normalization is the systematic application of formal techniques to transform database relations in ways that reduce data redundancy and improve data integrity.
The process was first introduced by Edgar F. Codd in 1970 and has since been refined into a hierarchy of normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF), each building upon the previous to eliminate increasingly subtle forms of redundancy and anomalies.
Why does redundancy matter?
Redundant data—the same fact stored in multiple places—creates three types of anomalies:
Insertion Anomaly: Cannot insert data without complete information. Example: Can't add a new department until at least one employee is assigned.
Update Anomaly: Changing a fact requires multiple updates. Example: If a department moves locations, every employee row with that department must be updated.
Deletion Anomaly: Removing data unintentionally deletes related facts. Example: Deleting the last employee in a department loses the department's information.
Normalization systematically eliminates these anomalies by decomposing relations into smaller, well-structured relations that store each fact exactly once.
This page provides mastery of normalization theory and practice: functional dependencies, the full normal form hierarchy (1NF through 5NF), decomposition algorithms, lossless join and dependency preservation properties, and practical guidelines for determining appropriate normalization levels.
Before understanding normal forms, we must master functional dependencies (FDs)—the mathematical relationships that normalization seeks to manage.
Definition:
A functional dependency X → Y (read "X determines Y" or "Y is functionally dependent on X") exists in a relation R if and only if, for any two tuples t₁ and t₂ in R:
If t₁[X] = t₂[X], then t₁[Y] = t₂[Y]
In other words: if two tuples have the same value(s) for attribute(s) X, they must have the same value(s) for attribute(s) Y.
Key Insights:
| Relation | Functional Dependency | Interpretation |
|---|---|---|
| Employee | EmployeeID → Name, Salary, DeptID | ID uniquely determines all employee attributes |
| Employee | Email → EmployeeID | Email is an alternate key |
| Employee | DeptID → DeptName, DeptLocation | Department ID determines department info (redundancy!) |
| CourseSection | CourseID, Semester → InstructorID | A course section has one instructor |
| CourseSection | InstructorID, Semester → OfficeHours | Instructor's hours per semester |
Types of Functional Dependencies:
Trivial FD: Y is a subset of X. Always true by definition. Example: {EmployeeID, Name} → Name
Non-trivial FD: Y is not a subset of X. These are meaningful constraints. Example: EmployeeID → Name
Completely Non-trivial FD: X and Y have no common attributes. Example: EmployeeID → Salary
Partial Dependency: Y depends on part of a composite key, not the whole. Example: In R(A, B, C) with key {A, B}, if B → C, then C is partially dependent.
Transitive Dependency: X → Y and Y → Z implies X → Z (transitivity). Example: EmployeeID → DeptID → DeptLocation
All FDs can be derived using Armstrong's Axioms: (1) Reflexivity: If Y ⊆ X, then X → Y. (2) Augmentation: If X → Y, then XZ → YZ. (3) Transitivity: If X → Y and Y → Z, then X → Z. From these, we can derive Union, Decomposition, and Pseudotransitivity rules.
First Normal Form (1NF) is the foundation—the minimum requirement for a relation to exist in the relational model.
Definition:
A relation is in 1NF if and only if:
The Atomicity Requirement:
"Atomic" means the value cannot be meaningfully subdivided for query purposes. This is context-dependent:
The key question: Will you ever need to access or query parts of this value independently?
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105
-- ==================================================-- FIRST NORMAL FORM (1NF) VIOLATIONS AND CORRECTIONS-- ================================================== -- ❌ VIOLATION 1: Repeating Groups (Multiple values in one cell)-- Problem Table:-- | StudentID | Name | PhoneNumbers |-- |-----------|-------|----------------------------------|-- | 1 | Alice | 555-1234, 555-5678, 555-9999 |-- | 2 | Bob | 555-4321 | -- Issue: PhoneNumbers contains multiple values, violating atomicity -- ✅ CORRECTION: Separate table for multivalued attributesCREATE TABLE Student ( StudentID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL); CREATE TABLE StudentPhone ( StudentID INT NOT NULL, PhoneNumber VARCHAR(20) NOT NULL, PhoneType VARCHAR(20) DEFAULT 'Mobile', PRIMARY KEY (StudentID, PhoneNumber), FOREIGN KEY (StudentID) REFERENCES Student(StudentID) ON DELETE CASCADE); -- Data now atomic:-- Student: (1, 'Alice'), (2, 'Bob')-- StudentPhone: (1, '555-1234', 'Mobile'), (1, '555-5678', 'Home'), -- (1, '555-9999', 'Work'), (2, '555-4321', 'Mobile') -- ❌ VIOLATION 2: Repeating Groups as Columns-- Problem Table:-- | OrderID | CustomerID | Product1 | Price1 | Product2 | Price2 |-- |---------|------------|------------|--------|------------|--------|-- | 1001 | 1 | Widget | 10.00 | Gadget | 25.00 |-- | 1002 | 2 | Widget | 10.00 | NULL | NULL | -- Issues: -- 1) Fixed number of products limits flexibility-- 2) NULL values waste space-- 3) Cannot easily query "all products in order" -- ✅ CORRECTION: Normalize to separate order linesCREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT NOT NULL, OrderDate DATE NOT NULL DEFAULT CURRENT_DATE); CREATE TABLE OrderLine ( OrderID INT NOT NULL, LineNumber INT NOT NULL, ProductName VARCHAR(100) NOT NULL, UnitPrice DECIMAL(10,2) NOT NULL, Quantity INT NOT NULL DEFAULT 1, PRIMARY KEY (OrderID, LineNumber), FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) ON DELETE CASCADE); -- ❌ VIOLATION 3: Composite/Non-atomic Values-- Problem:-- | EmployeeID | FullAddress |-- |------------|----------------------------------------|-- | 1 | 123 Main St, New York, NY 10001, USA | -- Cannot efficiently query by city, state, or zip code -- ✅ CORRECTION: Decompose into atomic componentsCREATE TABLE Employee ( EmployeeID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL); CREATE TABLE EmployeeAddress ( EmployeeID INT PRIMARY KEY, Street VARCHAR(200), City VARCHAR(100), State VARCHAR(50), PostalCode VARCHAR(20), Country CHAR(3) DEFAULT 'USA', FOREIGN KEY (EmployeeID) REFERENCES Employee(EmployeeID) ON DELETE CASCADE); -- Now queries like "SELECT * FROM EmployeeAddress WHERE City = 'New York'" work -- ❌ VIOLATION 4: Missing Primary Key-- Problem:-- | Name | Department | Salary |-- |-------|------------|--------|-- | Alice | Sales | 50000 |-- | Alice | HR | 55000 | -- Same name, different department-- | Bob | Sales | 60000 | -- Without PK, cannot uniquely identify rows or establish relationships -- ✅ CORRECTION: Add primary keyCREATE TABLE EmployeeWithPK ( EmployeeID SERIAL PRIMARY KEY, -- Surrogate key Name VARCHAR(100) NOT NULL, Department VARCHAR(50) NOT NULL, Salary DECIMAL(12,2), UNIQUE (Name, Department) -- If business rule requires uniqueness);What's "atomic" depends on requirements. A date is atomic for most purposes but not if you frequently query by month alone. JSON columns in modern SQL can be atomic (the whole document) or non-atomic (query individual fields). Design based on actual query patterns.
Second Normal Form (2NF) addresses partial dependencies in relations with composite primary keys.
Definition:
A relation is in 2NF if and only if:
Key Terminology:
Important Note:
If a relation has a single-attribute primary key, it automatically satisfies 2NF (there's no "part of" a single attribute). 2NF only matters for composite keys.
Why Partial Dependencies Cause Problems:
When non-key attributes depend on only part of the key, the same fact gets stored multiple times—once for each combination with the irrelevant key portion.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182
-- ==================================================-- SECOND NORMAL FORM (2NF) VIOLATION AND CORRECTION-- ================================================== -- ❌ VIOLATION: Partial Dependency-- Table: StudentCourse-- Primary Key: (StudentID, CourseID)-- Attributes: StudentName, CourseName, InstructorID, Grade -- | StudentID | CourseID | StudentName | CourseName | InstructorID | Grade |-- |-----------|----------|-------------|------------|--------------|-------|-- | 1 | CS101 | Alice | Intro CS | 501 | A |-- | 1 | CS201 | Alice | Data Str | 502 | B |-- | 2 | CS101 | Bob | Intro CS | 501 | B |-- | 3 | CS101 | Carol | Intro CS | 501 | A | -- Functional Dependencies:-- 1) StudentID, CourseID → StudentName, CourseName, InstructorID, Grade (Full - OK)-- 2) StudentID → StudentName (PARTIAL - violates 2NF!)-- 3) CourseID → CourseName, InstructorID (PARTIAL - violates 2NF!)-- 4) StudentID, CourseID → Grade (Full - OK) -- Problems caused:-- 1) UPDATE ANOMALY: If Alice changes name, must update all her enrollments-- 2) INSERTION ANOMALY: Can't add new course until a student enrolls-- 3) DELETION ANOMALY: If Alice drops all courses, we lose her information-- 4) REDUNDANCY: "Alice" stored 2 times, "Intro CS" stored 3 times -- ✅ CORRECTION: Decompose to eliminate partial dependencies -- Relation 1: Student (captures StudentID → StudentName)CREATE TABLE Student ( StudentID INT PRIMARY KEY, StudentName VARCHAR(100) NOT NULL); -- Relation 2: Course (captures CourseID → CourseName, InstructorID)CREATE TABLE Course ( CourseID VARCHAR(10) PRIMARY KEY, CourseName VARCHAR(100) NOT NULL, InstructorID INT NOT NULL); -- Relation 3: Enrollment (only the full dependency remains)CREATE TABLE Enrollment ( StudentID INT NOT NULL, CourseID VARCHAR(10) NOT NULL, Grade CHAR(2), EnrollmentDate DATE NOT NULL DEFAULT CURRENT_DATE, PRIMARY KEY (StudentID, CourseID), FOREIGN KEY (StudentID) REFERENCES Student(StudentID), FOREIGN KEY (CourseID) REFERENCES Course(CourseID)); -- Now each fact stored exactly once:-- Student: (1, 'Alice'), (2, 'Bob'), (3, 'Carol')-- Course: ('CS101', 'Intro CS', 501), ('CS201', 'Data Str', 502)-- Enrollment: (1, 'CS101', 'A'), (1, 'CS201', 'B'), (2, 'CS101', 'B'), (3, 'CS101', 'A') -- ==================================================-- 2NF DECOMPOSITION ALGORITHM-- ================================================== /*Given relation R(A, B, C, D) with key {A, B} and FDs: - {A, B} → C, D - A → D (partial dependency: D depends on part of key) Step 1: Identify partial dependencies - A → D violates 2NF (D depends on A alone, not full key {A,B}) Step 2: Create new relation with partial dependency - Create R1(A, D) with key {A} Step 3: Remove partially dependent attributes from original - Create R2(A, B, C) with key {A, B} Result: Both R1 and R2 are in 2NF - R1(A, D): A → D (full dependency, single-attribute key) - R2(A, B, C): {A, B} → C (full dependency)*/While 2NF is conceptually important, well-designed ER-to-relational mappings rarely produce 2NF violations. If you map entities to separate tables and relationships properly, partial dependencies don't arise. 2NF violations typically occur when trying to store relationship attributes in entity tables.
Third Normal Form (3NF) eliminates transitive dependencies and is often considered the practical target for most database designs.
Definition:
A relation is in 3NF if and only if:
Alternative Definition (More Precise):
For every non-trivial FD X → A in R:
What is Transitive Dependency?
If X → Y and Y → Z, then X → Z transitively. In 3NF terms, a violation occurs when:
The Core Issue:
Transitive dependencies embed facts about one entity (the intermediate) within the table of another entity. This creates redundancy and anomalies similar to partial dependencies.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899
-- ==================================================-- THIRD NORMAL FORM (3NF) VIOLATION AND CORRECTION-- ================================================== -- ❌ VIOLATION: Transitive Dependency-- Table: Employee-- Primary Key: EmployeeID-- -- | EmpID | EmpName | DeptID | DeptName | DeptLocation |-- |-------|---------|--------|-------------|--------------|-- | 1 | Alice | D1 | Engineering | Building A |-- | 2 | Bob | D1 | Engineering | Building A |-- | 3 | Carol | D2 | Marketing | Building B |-- | 4 | Dave | D2 | Marketing | Building B | -- Functional Dependencies:-- 1) EmployeeID → EmpName, DeptID, DeptName, DeptLocation (OK - key determines all)-- 2) DeptID → DeptName, DeptLocation (Non-key → Non-key: TRANSITIVE!) -- Transitive chain: EmployeeID → DeptID → DeptName, DeptLocation-- DeptName and DeptLocation are about Department, not Employee -- Problems:-- 1) UPDATE ANOMALY: Change "Engineering" location requires updating all employees-- 2) INSERTION ANOMALY: Can't add new department until employee is hired-- 3) DELETION ANOMALY: If Alice and Bob leave, we lose Engineering department info-- 4) REDUNDANCY: "Engineering, Building A" repeated for every Engineering employee -- ✅ CORRECTION: Remove transitive dependency -- Relation 1: Department (new entity based on determinant of transitive FD)CREATE TABLE Department ( DeptID VARCHAR(10) PRIMARY KEY, DeptName VARCHAR(100) NOT NULL UNIQUE, DeptLocation VARCHAR(100)); -- Relation 2: Employee (transitively dependent attributes removed)CREATE TABLE Employee ( EmployeeID INT PRIMARY KEY, EmployeeName VARCHAR(100) NOT NULL, Salary DECIMAL(12,2), DeptID VARCHAR(10) NOT NULL, FOREIGN KEY (DeptID) REFERENCES Department(DeptID)); -- Data normalized:-- Department: ('D1', 'Engineering', 'Building A'), ('D2', 'Marketing', 'Building B')-- Employee: (1, 'Alice', 75000, 'D1'), (2, 'Bob', 80000, 'D1'), -- (3, 'Carol', 70000, 'D2'), (4, 'Dave', 72000, 'D2') -- ==================================================-- 3NF DECOMPOSITION ALGORITHM-- ================================================== /*Given relation R(A, B, C, D, E) with key {A} and FDs: - A → B, C, D, E - C → D, E (transitive: non-key C determines non-keys D, E) Step 1: Identify transitive dependencies - A → C → D, E (C is the intermediate, D and E are transitively dependent) Step 2: Create new relation for the determinant - Create R1(C, D, E) with key {C} Step 3: Remove transitively dependent attributes from original - Create R2(A, B, C) with key {A} Result: Both relations are in 3NF - R1(C, D, E): C → D, E (C is a key, so allowed) - R2(A, B, C): A → B, C (A is a key, so allowed)*/ -- ==================================================-- ANOTHER EXAMPLE: Order with derived customer info-- ================================================== -- ❌ VIOLATION:-- Order(OrderID, CustomerID, CustomerName, CustomerEmail, OrderDate, Total)-- FDs: OrderID → CustomerID, CustomerID → CustomerName, CustomerEmail-- Transitive: OrderID → CustomerID → CustomerName, CustomerEmail -- ✅ CORRECTION:CREATE TABLE Customer ( CustomerID INT PRIMARY KEY, CustomerName VARCHAR(100) NOT NULL, CustomerEmail VARCHAR(254) NOT NULL UNIQUE); CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT NOT NULL, OrderDate DATE NOT NULL, Total DECIMAL(12,2) NOT NULL, FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID));"Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key." The whole key (2NF) eliminates partial dependencies. Nothing but the key (3NF) eliminates transitive dependencies.
Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF that handles edge cases involving candidate keys.
Definition:
A relation is in BCNF if and only if, for every non-trivial functional dependency X → Y:
That's it. Unlike 3NF, there's no exception for prime attributes. Every determinant must be a superkey.
3NF vs. BCNF:
3NF allows: X → A where A is prime (part of a candidate key) BCNF does not allow this exception
When Does 3NF ≠ BCNF?
The difference only matters when:
This is relatively rare in practice, but when it occurs, BCNF violations can cause anomalies missed by 3NF analysis.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106
-- ==================================================-- BCNF VIOLATION EXAMPLE-- ================================================== -- Table: CourseInstructor-- Records which instructors teach which courses in which rooms-- -- | Course | Instructor | Room |-- |---------|------------|-------|-- | CS101 | Prof. Smith| R101 |-- | CS101 | Prof. Jones| R102 |-- | CS201 | Prof. Smith| R101 |-- | CS201 | Prof. Brown| R103 | -- Business Rules (leading to overlapping candidate keys):-- 1. An instructor teaches in only ONE room (Instructor → Room)-- 2. A course in a specific room has one instructor (Course, Room → Instructor) -- Candidate Keys:-- CK1: {Course, Instructor} - determines Room via Instructor → Room-- CK2: {Course, Room} - determines Instructor per business rule 2 -- FDs:-- {Course, Instructor} → Room (CK1 → Room, OK)-- {Course, Room} → Instructor (CK2 → Instructor, OK)-- Instructor → Room (NOT a superkey, BCNF VIOLATION!) -- 3NF Analysis: Instructor → Room-- Is Instructor a superkey? NO-- Is Room a prime attribute? YES (part of CK2)-- Therefore: 3NF satisfied, but BCNF violated! -- The Problem:-- | Course | Instructor | Room |-- | CS101 | Prof. Smith| R101 | -- | CS201 | Prof. Smith| R101 | -- Room R101 stored twice for Smith-- | CS301 | Prof. Smith| R101 | -- Redundancy! If Smith moves, update 3 rows -- ✅ CORRECTION: Decompose to achieve BCNF -- Relation 1: InstructorRoom (captures Instructor → Room)CREATE TABLE InstructorRoom ( Instructor VARCHAR(100) PRIMARY KEY, Room VARCHAR(10) NOT NULL); -- Relation 2: CourseInstructor (remaining attributes)CREATE TABLE CourseInstructor ( Course VARCHAR(10) NOT NULL, Instructor VARCHAR(100) NOT NULL, PRIMARY KEY (Course, Instructor), FOREIGN KEY (Instructor) REFERENCES InstructorRoom(Instructor)); -- Now normalized:-- InstructorRoom: ('Prof. Smith', 'R101'), ('Prof. Jones', 'R102'), ('Prof. Brown', 'R103')-- CourseInstructor: ('CS101', 'Prof. Smith'), ('CS101', 'Prof. Jones'), -- ('CS201', 'Prof. Smith'), ('CS201', 'Prof. Brown') -- No redundancy: Smith's room stored once only -- ==================================================-- BCNF TRADE-OFF: DEPENDENCY PRESERVATION-- ================================================== /*IMPORTANT CAVEAT: The original FD {Course, Room} → Instructor cannot be enforcedby a single-table constraint in either decomposed table! To enforce this, you would need:1. Application logic validation2. A trigger that checks across both tables3. A view with a CHECK constraint (if supported) This illustrates the BCNF trade-off:- BCNF guarantees no redundancy- But may sacrifice dependency preservation- 3NF can always preserve all FDs in single-table constraints- Sometimes 3NF is preferred for practical enforcement reasons*/ -- Enforcing the cross-table constraint via triggerCREATE OR REPLACE FUNCTION check_course_room_instructor()RETURNS TRIGGER AS $$BEGIN -- Check if the same course-room pair already has a different instructor IF EXISTS ( SELECT 1 FROM CourseInstructor ci JOIN InstructorRoom ir ON ci.Instructor = ir.Instructor WHERE ci.Course = NEW.Course AND ir.Room = (SELECT Room FROM InstructorRoom WHERE Instructor = NEW.Instructor) AND ci.Instructor != NEW.Instructor ) THEN RAISE EXCEPTION 'Constraint violation: Course % in this room already has different instructor', NEW.Course; END IF; RETURN NEW;END;$$ LANGUAGE plpgsql; CREATE TRIGGER trg_course_room_instructorBEFORE INSERT OR UPDATE ON CourseInstructorFOR EACH ROW EXECUTE FUNCTION check_course_room_instructor();BCNF decomposition may lose some functional dependencies—they can't be checked in a single table. 3NF always allows dependency-preserving decomposition. In practice, weigh redundancy (favoring BCNF) against constraint enforcement simplicity (favoring 3NF).
Beyond BCNF, higher normal forms address more subtle redundancies caused by multivalued dependencies (4NF) and join dependencies (5NF).
Fourth Normal Form (4NF):
A relation is in 4NF if it is in BCNF and contains no non-trivial multivalued dependencies.
Multivalued Dependency (MVD):
X ↠ Y (X multi-determines Y) means that the set of Y-values associated with a given X-value is independent of other attributes.
Formal: For all pairs of tuples with equal X-values, swapping their Y-values produces tuples that also exist in the relation.
Example of MVD:
Employee(EmpID, Skill, Language)
If an employee's skills are independent of languages they speak:
This creates ALL combinations of skills and languages per employee—massive redundancy!
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100
-- ==================================================-- FOURTH NORMAL FORM (4NF) VIOLATION AND CORRECTION-- ================================================== -- ❌ VIOLATION: Independent Multivalued Dependencies-- Table: EmployeeSkillLanguage-- | EmpID | Skill | Language |-- |-------|------------|----------|-- | 1 | Python | English |-- | 1 | Python | Spanish |-- | 1 | Java | English |-- | 1 | Java | Spanish |-- | 2 | JavaScript | French |-- | 2 | JavaScript | German |-- | 2 | TypeScript | French |-- | 2 | TypeScript | German | -- MVDs: EmpID ↠ Skill, EmpID ↠ Language-- Each skill appears with EVERY language the employee knows-- Employee 1 knows 2 skills × 2 languages = 4 rows (exponential growth!) -- Problems:-- 1) REDUNDANCY: Each skill stored once per language known-- 2) UPDATE ANOMALY: Add new language → add row for EVERY skill-- 3) DELETION ANOMALY: Remove last language → lose all skill info -- ✅ CORRECTION: Decompose to eliminate MVDs CREATE TABLE EmployeeSkill ( EmpID INT NOT NULL, Skill VARCHAR(50) NOT NULL, PRIMARY KEY (EmpID, Skill)); CREATE TABLE EmployeeLanguage ( EmpID INT NOT NULL, Language VARCHAR(50) NOT NULL, PRIMARY KEY (EmpID, Language)); -- Now:-- EmployeeSkill: (1, 'Python'), (1, 'Java'), (2, 'JavaScript'), (2, 'TypeScript')-- EmployeeLanguage: (1, 'English'), (1, 'Spanish'), (2, 'French'), (2, 'German')-- 4 + 4 = 8 rows instead of 8 (no savings here, but scales linearly not exponentially)-- 3 skills × 3 languages = 6 rows instead of 9 -- ==================================================-- FIFTH NORMAL FORM (5NF) - Join Dependencies-- ================================================== -- 5NF addresses join dependencies that cannot be expressed as MVDs or FDs-- A relation is in 5NF if it cannot be losslessly decomposed further -- Example Scenario:-- Agents represent Companies for Products-- But the relationships are pairwise constrained:-- 1) Agent-Company: authorized agents for each company-- 2) Company-Product: products each company sells-- 3) Agent-Product: products each agent can sell -- If the combination (Agent, Company, Product) is valid ONLY when ALL THREE -- pairwise relationships exist, then the three-way table contains redundancy -- | Agent | Company | Product |-- |-------|---------|---------|-- | A1 | C1 | P1 |-- | A1 | C1 | P2 |-- | A1 | C2 | P1 |-- | A2 | C1 | P1 | -- Join Dependency: *(AC, CP, AP)-- The table can be reconstructed by joining three projections -- ✅ CORRECTION: Decompose into projections CREATE TABLE AgentCompany ( Agent VARCHAR(10) NOT NULL, Company VARCHAR(10) NOT NULL, PRIMARY KEY (Agent, Company)); CREATE TABLE CompanyProduct ( Company VARCHAR(10) NOT NULL, Product VARCHAR(10) NOT NULL, PRIMARY KEY (Company, Product)); CREATE TABLE AgentProduct ( Agent VARCHAR(10) NOT NULL, Product VARCHAR(10) NOT NULL, PRIMARY KEY (Agent, Product)); -- Reconstruction via natural join:-- SELECT DISTINCT ac.Agent, ac.Company, cp.Product-- FROM AgentCompany ac-- JOIN CompanyProduct cp ON ac.Company = cp.Company-- JOIN AgentProduct ap ON ac.Agent = ap.Agent AND cp.Product = ap.Product; -- NOTE: 5NF is rarely needed in practice and can make queries complex| Normal Form | Eliminates | Based On | Practical Use |
|---|---|---|---|
| 1NF | Non-atomic values, repeating groups | Atomicity | Universal requirement |
| 2NF | Partial dependencies | Full FD on key | Rare issue if ER mapping done well |
| 3NF | Transitive dependencies | Non-key → non-key FDs | Standard target for most schemas |
| BCNF | All non-superkey determinants | Stricter than 3NF | Used when 3NF has overlapping keys |
| 4NF | Non-trivial MVDs | Independent multivalued facts | Uncommon, specific patterns |
| 5NF | Join dependencies | Cyclic constraints | Very rare, academic interest |
Theory provides the framework, but practical normalization requires judgment. Here are essential considerations for applying normalization in real database design.
When to Stop Normalizing:
When to Denormalize:
Denormalization—intentionally violating normal form rules—is appropriate when:
Denormalization Strategies:
Senior engineers normalize first, then selectively denormalize. They don't skip normalization to 'optimize early.' Denormalization decisions are documented and reversible. The goal is a schema that is as normalized as possible while meeting performance requirements.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788
-- ==================================================-- EXAMPLE: NORMALIZED vs. DENORMALIZED DESIGN-- ================================================== -- SCENARIO: E-commerce order history page-- Requirement: Show orders with customer name and line item count-- Pattern: Heavy reads, rare updates, millions of orders -- ====================-- NORMALIZED (3NF) DESIGN-- ==================== CREATE TABLE Customer ( CustomerID SERIAL PRIMARY KEY, CustomerName VARCHAR(100) NOT NULL); CREATE TABLE Orders ( OrderID SERIAL PRIMARY KEY, CustomerID INT NOT NULL REFERENCES Customer(CustomerID), OrderDate TIMESTAMP NOT NULL DEFAULT NOW(), TotalAmount DECIMAL(12,2)); CREATE TABLE OrderLine ( OrderLineID SERIAL PRIMARY KEY, OrderID INT NOT NULL REFERENCES Orders(OrderID), ProductID INT NOT NULL, Quantity INT NOT NULL, LineTotal DECIMAL(12,2)); -- Query for order history:SELECT o.OrderID, c.CustomerName, o.OrderDate, o.TotalAmount, COUNT(ol.OrderLineID) AS LineItemCountFROM Orders oJOIN Customer c ON o.CustomerID = c.CustomerIDLEFT JOIN OrderLine ol ON o.OrderID = ol.OrderIDWHERE c.CustomerID = 12345GROUP BY o.OrderID, c.CustomerName, o.OrderDate, o.TotalAmountORDER BY o.OrderDate DESCLIMIT 50; -- Performance concern: JOIN + GROUP BY on millions of orders -- ====================-- DENORMALIZED DESIGN-- ==================== CREATE TABLE Orders_Denormalized ( OrderID SERIAL PRIMARY KEY, CustomerID INT NOT NULL, CustomerName VARCHAR(100) NOT NULL, -- Denormalized from Customer OrderDate TIMESTAMP NOT NULL DEFAULT NOW(), TotalAmount DECIMAL(12,2), LineItemCount INT NOT NULL DEFAULT 0 -- Pre-computed aggregate); -- Faster query (no JOINs, no GROUP BY):SELECT OrderID, CustomerName, OrderDate, TotalAmount, LineItemCountFROM Orders_DenormalizedWHERE CustomerID = 12345ORDER BY OrderDate DESCLIMIT 50; -- Maintenance required:-- 1) Trigger to update LineItemCount on OrderLine changes-- 2) Trigger or application code to update CustomerName if it changes CREATE OR REPLACE FUNCTION update_order_line_count()RETURNS TRIGGER AS $$BEGIN IF TG_OP = 'INSERT' THEN UPDATE Orders_Denormalized SET LineItemCount = LineItemCount + 1 WHERE OrderID = NEW.OrderID; ELSIF TG_OP = 'DELETE' THEN UPDATE Orders_Denormalized SET LineItemCount = LineItemCount - 1 WHERE OrderID = OLD.OrderID; END IF; RETURN NULL;END;$$ LANGUAGE plpgsql; -- Trade-off analysis:-- Normalized: Clean, no redundancy, complex read queries-- Denormalized: Redundant CustomerName, faster reads, maintenance burdenNormalization is both science and craft—rigorous theory applied with practical judgment. Let's consolidate the essential knowledge:
What Comes Next:
With normalization mastered, we move to constraint specification—the art of encoding business rules directly in the database schema. Constraints are the guardians of data integrity, ensuring that even when applications fail or errors occur, the database rejects invalid data.
You now command the full normalization toolkit: from fundamental functional dependencies through the complete normal form hierarchy, including practical guidelines for when to normalize and when (carefully) to denormalize. This knowledge is central to professional database design.