Logical Design: From Concepts to Schemas

Level: Intermediate

Duration: 90 mins

Topic: Logical Design

Normalization

The Science of Eliminating Redundancy

Normalization is one of the most important and most misunderstood concepts in database design. At its core, normalization is the systematic application of formal techniques to transform database relations in ways that reduce data redundancy and improve data integrity.

The process was first introduced by Edgar F. Codd in 1970 and has since been refined into a hierarchy of normal forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF), each building upon the previous to eliminate increasingly subtle forms of redundancy and anomalies.

Why does redundancy matter?

Redundant data—the same fact stored in multiple places—creates three types of anomalies:

Insertion Anomaly: A fact cannot be recorded unless unrelated data is supplied along with it. Example: A new department cannot be added until at least one employee is assigned to it.
Update Anomaly: Changing a fact requires multiple updates. Example: If a department moves locations, every employee row with that department must be updated.
Deletion Anomaly: Removing data unintentionally deletes related facts. Example: Deleting the last employee in a department loses the department's information.

Normalization systematically eliminates these anomalies by decomposing relations into smaller, well-structured relations that store each fact exactly once.

Learning Objectives

This page provides mastery of normalization theory and practice: functional dependencies, the full normal form hierarchy (1NF through 5NF), decomposition algorithms, lossless join and dependency preservation properties, and practical guidelines for determining appropriate normalization levels.

Functional Dependencies: The Foundation

Before understanding normal forms, we must master functional dependencies (FDs)—the mathematical relationships that normalization seeks to manage.

Definition:

A functional dependency X → Y (read "X determines Y" or "Y is functionally dependent on X") exists in a relation R if and only if, for any two tuples t₁ and t₂ in R:

If t₁[X] = t₂[X], then t₁[Y] = t₂[Y]

In other words: if two tuples have the same value(s) for attribute(s) X, they must have the same value(s) for attribute(s) Y.

Key Insights:

X is called the determinant (left-hand side)
Y is called the dependent (right-hand side)
X and Y can be single attributes or sets of attributes
FDs represent real-world constraints, not accidental patterns in current data
FDs must hold for ALL possible valid instances, not just the current state
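
The definition above can be sketched as a check over one relation instance. This is a minimal illustration, not part of the original material: the function name `fd_holds` and the sample rows are assumptions. Note that such a check can only refute an FD. A result of True means the current rows do not contradict it, not that the constraint holds for every valid instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return False if two rows agree on `lhs` but differ on `rhs`, else True."""
    seen = {}  # maps each lhs value combination to the rhs values first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data, loosely following the Employee examples on this page.
employees = [
    {"EmployeeID": 1, "Name": "Alice", "DeptID": "D1", "DeptName": "Engineering"},
    {"EmployeeID": 2, "Name": "Bob", "DeptID": "D1", "DeptName": "Engineering"},
    {"EmployeeID": 3, "Name": "Carol", "DeptID": "D2", "DeptName": "Marketing"},
]

dept_fd = fd_holds(employees, ["DeptID"], ["DeptName"])         # not contradicted
reverse_fd = fd_holds(employees, ["DeptName"], ["EmployeeID"])  # contradicted by rows 1 and 2
print(dept_fd, reverse_fd)
```

Here `DeptName → EmployeeID` fails because two rows share "Engineering" but have different employee IDs.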

Functional Dependency Examples
Relation	Functional Dependency	Interpretation
Employee	EmployeeID → Name, Salary, DeptID	ID uniquely determines all employee attributes
Employee	Email → EmployeeID	Email is an alternate key
Employee	DeptID → DeptName, DeptLocation	Department ID determines department info (redundancy!)
CourseSection	CourseID, Semester → InstructorID	A course section has one instructor
CourseSection	InstructorID, Semester → OfficeHours	Instructor's hours per semester

Types of Functional Dependencies:

Trivial FD: Y is a subset of X. Always true by definition. Example: {EmployeeID, Name} → Name
Non-trivial FD: Y is not a subset of X. These are meaningful constraints. Example: EmployeeID → Name
Completely Non-trivial FD: X and Y have no common attributes. Example: EmployeeID → Salary
Partial Dependency: Y depends on part of a composite key, not the whole. Example: In R(A, B, C) with key {A, B}, if B → C, then C is partially dependent.
Transitive Dependency: X → Y and Y → Z implies X → Z (transitivity). Example: EmployeeID → DeptID → DeptLocation

Armstrong's Axioms

Every FD logically implied by a given set of FDs can be derived using Armstrong's Axioms, which are both sound and complete: (1) Reflexivity: If Y ⊆ X, then X → Y. (2) Augmentation: If X → Y, then XZ → YZ. (3) Transitivity: If X → Y and Y → Z, then X → Z. From these, we can derive the Union, Decomposition, and Pseudotransitivity rules.
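
In practice the axioms are rarely applied by hand. A common shortcut is the attribute closure: the set of all attributes determined by a given set X. The sketch below is one way to compute it, assuming FDs are represented as (left-hand side, right-hand side) pairs of Python sets; the name `closure` and that representation are choices made for this example.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by `attrs`.

    `fds` is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, so is the right.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# FDs taken from the Employee rows of the examples table above.
fds = [
    ({"EmployeeID"}, {"Name", "Salary", "DeptID"}),
    ({"DeptID"}, {"DeptName", "DeptLocation"}),
]

print(sorted(closure({"EmployeeID"}, fds)))
print(sorted(closure({"DeptID"}, fds)))
```

An FD X → Y follows from the given set exactly when Y is contained in the closure of X, so `EmployeeID → DeptLocation` holds here while `DeptID → Name` does not.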

First Normal Form (1NF)

First Normal Form (1NF) is the foundation—the minimum requirement for a relation to exist in the relational model.

Definition:

A relation is in 1NF if and only if:

It has a defined primary key
All attributes contain only atomic (indivisible) values
There are no repeating groups or arrays within a single row
Each column contains values from a single domain
Each row is unique (enforced by the primary key)

The Atomicity Requirement:

"Atomic" means the value cannot be meaningfully subdivided for query purposes. This is context-dependent:

Is "123 Main St, New York, NY 10001" atomic? For simple storage, yes. For queries filtering by city, no—it should be decomposed.
Is a phone number atomic? Depends on whether you need to query by area code.

The key question: Will you ever need to access or query parts of this value independently?

first_normal_form_examples.sql
-- ==================================================
-- FIRST NORMAL FORM (1NF) VIOLATIONS AND CORRECTIONS
-- ==================================================
 
-- ❌ VIOLATION 1: Repeating Groups (Multiple values in one cell)
-- Problem Table:
-- | StudentID | Name  | PhoneNumbers                     |
-- |-----------|-------|----------------------------------|
-- | 1         | Alice | 555-1234, 555-5678, 555-9999    |
-- | 2         | Bob   | 555-4321                         |
 
-- Issue: PhoneNumbers contains multiple values, violating atomicity
 
-- ✅ CORRECTION: Separate table for multivalued attributes
CREATE TABLE Student (
    StudentID       INT             PRIMARY KEY,
    Name            VARCHAR(100)    NOT NULL
);
 
CREATE TABLE StudentPhone (
    StudentID       INT             NOT NULL,
    PhoneNumber     VARCHAR(20)     NOT NULL,
    PhoneType       VARCHAR(20)     DEFAULT 'Mobile',
    PRIMARY KEY (StudentID, PhoneNumber),
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID) ON DELETE CASCADE
);
 
-- Data now atomic:
-- Student: (1, 'Alice'), (2, 'Bob')
-- StudentPhone: (1, '555-1234', 'Mobile'), (1, '555-5678', 'Home'), 
--               (1, '555-9999', 'Work'), (2, '555-4321', 'Mobile')
 
-- ❌ VIOLATION 2: Repeating Groups as Columns
-- Problem Table:
-- | OrderID | CustomerID | Product1   | Price1 | Product2   | Price2 |
-- |---------|------------|------------|--------|------------|--------|
-- | 1001    | 1          | Widget     | 10.00  | Gadget     | 25.00  |
-- | 1002    | 2          | Widget     | 10.00  | NULL       | NULL   |
 
-- Issues: 
-- 1) Fixed number of products limits flexibility
-- 2) Unused product slots must be padded with NULLs
-- 3) Cannot easily query "all products in order"
 
-- ✅ CORRECTION: Normalize to separate order lines
CREATE TABLE Orders (
    OrderID         INT             PRIMARY KEY,
    CustomerID      INT             NOT NULL,
    OrderDate       DATE            NOT NULL DEFAULT CURRENT_DATE
);
 
CREATE TABLE OrderLine (
    OrderID         INT             NOT NULL,
    LineNumber      INT             NOT NULL,
    ProductName     VARCHAR(100)    NOT NULL,
    UnitPrice       DECIMAL(10,2)   NOT NULL,
    Quantity        INT             NOT NULL DEFAULT 1,
    PRIMARY KEY (OrderID, LineNumber),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) ON DELETE CASCADE
);
 
-- ❌ VIOLATION 3: Composite/Non-atomic Values
-- Problem:
-- | EmployeeID | FullAddress                            |
-- |------------|----------------------------------------|
-- | 1          | 123 Main St, New York, NY 10001, USA   |
 
-- Cannot efficiently query by city, state, or zip code
 
-- ✅ CORRECTION: Decompose into atomic components
CREATE TABLE Employee (
    EmployeeID      INT             PRIMARY KEY,
    Name            VARCHAR(100)    NOT NULL
);
 
CREATE TABLE EmployeeAddress (
    EmployeeID      INT             PRIMARY KEY,
    Street          VARCHAR(200),
    City            VARCHAR(100),
    State           VARCHAR(50),
    PostalCode      VARCHAR(20),
    Country         CHAR(3)         DEFAULT 'USA',
    FOREIGN KEY (EmployeeID) REFERENCES Employee(EmployeeID) ON DELETE CASCADE
);
 
-- Now queries like "SELECT * FROM EmployeeAddress WHERE City = 'New York'" work
 
-- ❌ VIOLATION 4: Missing Primary Key
-- Problem:
-- | Name  | Department | Salary |
-- |-------|------------|--------|
-- | Alice | Sales      | 50000  |
-- | Alice | HR         | 55000  |  -- Same name, different department
-- | Bob   | Sales      | 60000  |
 
-- Without PK, cannot uniquely identify rows or establish relationships
 
-- ✅ CORRECTION: Add primary key
CREATE TABLE EmployeeWithPK (
    EmployeeID      SERIAL          PRIMARY KEY,  -- Surrogate key
    Name            VARCHAR(100)    NOT NULL,
    Department      VARCHAR(50)     NOT NULL,
    Salary          DECIMAL(12,2),
    UNIQUE (Name, Department)  -- If business rule requires uniqueness
);

Context-Dependent Atomicity

What's "atomic" depends on requirements. A date is atomic for most purposes but not if you frequently query by month alone. JSON columns in modern SQL can be atomic (the whole document) or non-atomic (query individual fields). Design based on actual query patterns.

Second Normal Form (2NF)

Second Normal Form (2NF) addresses partial dependencies in relations with composite primary keys.

Definition:

A relation is in 2NF if and only if:

It is in 1NF, AND
Every non-prime attribute is fully functionally dependent on the entire primary key

Key Terminology:

Prime attribute: An attribute that is part of at least one candidate key
Non-prime attribute: An attribute that is not part of any candidate key
Full functional dependency: X → Y is full if no proper subset of X determines Y
Partial dependency: X → Y where some proper subset of X also determines Y

Important Note:

If every candidate key of a relation is a single attribute, the relation automatically satisfies 2NF (there's no "part of" a single attribute). 2NF only matters when some candidate key is composite.

Why Partial Dependencies Cause Problems:

When non-key attributes depend on only part of the key, the same fact gets stored multiple times—once for each combination with the irrelevant key portion.

second_normal_form_examples.sql
-- ==================================================
-- SECOND NORMAL FORM (2NF) VIOLATION AND CORRECTION
-- ==================================================
 
-- ❌ VIOLATION: Partial Dependency
-- Table: StudentCourse
-- Primary Key: (StudentID, CourseID)
-- Attributes: StudentName, CourseName, InstructorID, Grade
 
-- | StudentID | CourseID | StudentName | CourseName | InstructorID | Grade |
-- |-----------|----------|-------------|------------|--------------|-------|
-- | 1         | CS101    | Alice       | Intro CS   | 501          | A     |
-- | 1         | CS201    | Alice       | Data Str   | 502          | B     |
-- | 2         | CS101    | Bob         | Intro CS   | 501          | B     |
-- | 3         | CS101    | Carol       | Intro CS   | 501          | A     |
 
-- Functional Dependencies:
-- 1) StudentID, CourseID → StudentName, CourseName, InstructorID, Grade (key determines every attribute)
-- 2) StudentID → StudentName (PARTIAL - violates 2NF!)
-- 3) CourseID → CourseName, InstructorID (PARTIAL - violates 2NF!)
-- 4) StudentID, CourseID → Grade (Full - OK)
 
-- Problems caused:
-- 1) UPDATE ANOMALY: If Alice changes name, must update all her enrollments
-- 2) INSERTION ANOMALY: Can't add new course until a student enrolls
-- 3) DELETION ANOMALY: If Alice drops all courses, we lose her information
-- 4) REDUNDANCY: "Alice" stored 2 times, "Intro CS" stored 3 times
 
-- ✅ CORRECTION: Decompose to eliminate partial dependencies
 
-- Relation 1: Student (captures StudentID → StudentName)
CREATE TABLE Student (
    StudentID       INT             PRIMARY KEY,
    StudentName     VARCHAR(100)    NOT NULL
);
 
-- Relation 2: Course (captures CourseID → CourseName, InstructorID)
CREATE TABLE Course (
    CourseID        VARCHAR(10)     PRIMARY KEY,
    CourseName      VARCHAR(100)    NOT NULL,
    InstructorID    INT             NOT NULL
);
 
-- Relation 3: Enrollment (only the full dependency remains)
CREATE TABLE Enrollment (
    StudentID       INT             NOT NULL,
    CourseID        VARCHAR(10)     NOT NULL,
    Grade           CHAR(2),
    EnrollmentDate  DATE            NOT NULL DEFAULT CURRENT_DATE,
    
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
);
 
-- Now each fact stored exactly once:
-- Student: (1, 'Alice'), (2, 'Bob'), (3, 'Carol')
-- Course: ('CS101', 'Intro CS', 501), ('CS201', 'Data Str', 502)
-- Enrollment: (1, 'CS101', 'A'), (1, 'CS201', 'B'), (2, 'CS101', 'B'), (3, 'CS101', 'A')
 
-- ==================================================
-- 2NF DECOMPOSITION ALGORITHM
-- ==================================================
 
/*
Given relation R(A, B, C, D) with key {A, B} and FDs:
  - {A, B} → C, D
  - A → D  (partial dependency: D depends on part of key)
 
Step 1: Identify partial dependencies
  - A → D violates 2NF (D depends on A alone, not full key {A,B})
 
Step 2: Create new relation with partial dependency
  - Create R1(A, D) with key {A}
 
Step 3: Remove partially dependent attributes from original
  - Create R2(A, B, C) with key {A, B}
 
Result: Both R1 and R2 are in 2NF
  - R1(A, D): A → D (full dependency, single-attribute key)
  - R2(A, B, C): {A, B} → C (full dependency)
*/
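
Step 1 of the algorithm above (identifying partial dependencies) can be sketched in Python. This is an illustrative sketch under two assumptions: the relation has a single candidate key, so "non-key" and "non-prime" coincide, and FDs are given as pairs of sets. The helper names `closure` and `partial_dependencies` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return all attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """List (key_subset, dependents) for each proper subset of `key`
    that determines at least one non-key attribute."""
    non_key = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(sorted(key), size):
            dependents = closure(subset, fds) & non_key
            if dependents:
                found.append((set(subset), dependents))
    return found


# The StudentCourse relation from the listing above.
attributes = {"StudentID", "CourseID", "StudentName", "CourseName", "InstructorID", "Grade"}
key = {"StudentID", "CourseID"}
fds = [
    ({"StudentID"}, {"StudentName"}),
    ({"CourseID"}, {"CourseName", "InstructorID"}),
    ({"StudentID", "CourseID"}, {"Grade"}),
]

violations = partial_dependencies(key, attributes, fds)
for subset, dependents in violations:
    print(sorted(subset), "->", sorted(dependents))
```

Each reported pair corresponds to one new relation in Step 2 (here Course and Student), and whatever remains with the full key forms the third relation (Enrollment).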

2NF in Practice

While 2NF is conceptually important, well-designed ER-to-relational mappings rarely produce 2NF violations. If you map entities and relationships to separate tables properly, partial dependencies don't arise. 2NF violations typically occur when entity attributes (such as StudentName) are stored in a relationship table (such as StudentCourse).

Third Normal Form (3NF)

Third Normal Form (3NF) eliminates transitive dependencies and is often considered the practical target for most database designs.

Definition:

A relation is in 3NF if and only if:

It is in 2NF, AND
No non-prime attribute is transitively dependent on the primary key

Alternative Definition (More Precise):

For every non-trivial FD X → A in R:

X is a superkey, OR
A is a prime attribute (part of some candidate key)

What is Transitive Dependency?

If X → Y and Y → Z, then X → Z transitively. In 3NF terms, a violation occurs when:

A non-key attribute (Z) depends on another non-key attribute (Y)
Which in turn depends on the key (X)
And Y is not a candidate key

The Core Issue:

Transitive dependencies embed facts about one entity (the intermediate) within the table of another entity. This creates redundancy and anomalies similar to partial dependencies.

third_normal_form_examples.sql
-- ==================================================
-- THIRD NORMAL FORM (3NF) VIOLATION AND CORRECTION
-- ==================================================
 
-- ❌ VIOLATION: Transitive Dependency
-- Table: Employee
-- Primary Key: EmployeeID
-- 
-- | EmpID | EmpName | DeptID | DeptName    | DeptLocation |
-- |-------|---------|--------|-------------|--------------|
-- | 1     | Alice   | D1     | Engineering | Building A   |
-- | 2     | Bob     | D1     | Engineering | Building A   |
-- | 3     | Carol   | D2     | Marketing   | Building B   |
-- | 4     | Dave    | D2     | Marketing   | Building B   |
 
-- Functional Dependencies:
-- 1) EmployeeID → EmpName, DeptID, DeptName, DeptLocation (OK - key determines all)
-- 2) DeptID → DeptName, DeptLocation (Non-key → Non-key: TRANSITIVE!)
 
-- Transitive chain: EmployeeID → DeptID → DeptName, DeptLocation
-- DeptName and DeptLocation are about Department, not Employee
 
-- Problems:
-- 1) UPDATE ANOMALY: Change "Engineering" location requires updating all employees
-- 2) INSERTION ANOMALY: Can't add new department until employee is hired
-- 3) DELETION ANOMALY: If Alice and Bob leave, we lose Engineering department info
-- 4) REDUNDANCY: "Engineering, Building A" repeated for every Engineering employee
 
-- ✅ CORRECTION: Remove transitive dependency
 
-- Relation 1: Department (new entity based on determinant of transitive FD)
CREATE TABLE Department (
    DeptID          VARCHAR(10)     PRIMARY KEY,
    DeptName        VARCHAR(100)    NOT NULL UNIQUE,
    DeptLocation    VARCHAR(100)
);
 
-- Relation 2: Employee (transitively dependent attributes removed)
CREATE TABLE Employee (
    EmployeeID      INT             PRIMARY KEY,
    EmployeeName    VARCHAR(100)    NOT NULL,
    Salary          DECIMAL(12,2),
    DeptID          VARCHAR(10)     NOT NULL,
    
    FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
);
 
-- Data normalized:
-- Department: ('D1', 'Engineering', 'Building A'), ('D2', 'Marketing', 'Building B')
-- Employee: (1, 'Alice', 75000, 'D1'), (2, 'Bob', 80000, 'D1'), 
--           (3, 'Carol', 70000, 'D2'), (4, 'Dave', 72000, 'D2')
 
-- ==================================================
-- 3NF DECOMPOSITION ALGORITHM
-- ==================================================
 
/*
Given relation R(A, B, C, D, E) with key {A} and FDs:
  - A → B, C, D, E
  - C → D, E  (transitive: non-key C determines non-keys D, E)
 
Step 1: Identify transitive dependencies
  - A → C → D, E (C is the intermediate, D and E are transitively dependent)
 
Step 2: Create new relation for the determinant
  - Create R1(C, D, E) with key {C}
 
Step 3: Remove transitively dependent attributes from original
  - Create R2(A, B, C) with key {A}
 
Result: Both relations are in 3NF
  - R1(C, D, E): C → D, E (C is a key, so allowed)
  - R2(A, B, C): A → B, C (A is a key, so allowed)
*/
 
-- ==================================================
-- ANOTHER EXAMPLE: Order with derived customer info
-- ==================================================
 
-- ❌ VIOLATION:
-- Order(OrderID, CustomerID, CustomerName, CustomerEmail, OrderDate, Total)
-- FDs: OrderID → CustomerID; CustomerID → CustomerName, CustomerEmail
-- Transitive: OrderID → CustomerID → CustomerName, CustomerEmail
 
-- ✅ CORRECTION:
CREATE TABLE Customer (
    CustomerID      INT             PRIMARY KEY,
    CustomerName    VARCHAR(100)    NOT NULL,
    CustomerEmail   VARCHAR(254)    NOT NULL UNIQUE
);
 
CREATE TABLE Orders (
    OrderID         INT             PRIMARY KEY,
    CustomerID      INT             NOT NULL,
    OrderDate       DATE            NOT NULL,
    Total           DECIMAL(12,2)   NOT NULL,
    
    FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)
);
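
The decompositions in this listing lose no information because the attribute shared by the two new tables is a key of one of them. A standard test for a two-way split under FDs is that the common attributes determine all of one side. The sketch below applies that test; the function names and the set-based FD representation are assumptions made for this example.

```python
def closure(attrs, fds):
    """Return all attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """True if the attributes common to r1 and r2 determine all of r1 or all of r2."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach


# FDs of the unnormalized Employee table from the listing above.
fds = [
    ({"EmployeeID"}, {"EmpName", "DeptID"}),
    ({"DeptID"}, {"DeptName", "DeptLocation"}),
]

# The split used in the correction: the shared attribute DeptID is Department's key.
good = is_lossless(
    {"DeptID", "DeptName", "DeptLocation"},
    {"EmployeeID", "EmpName", "DeptID"},
    fds,
)

# A hypothetical bad split that shares only DeptName, which determines nothing here.
bad = is_lossless(
    {"DeptID", "DeptName", "DeptLocation"},
    {"EmployeeID", "EmpName", "DeptName"},
    fds,
)
print(good, bad)
```

Joining the tables of the bad split on DeptName could pair employees with the wrong department rows whenever two departments share a name, which is why the shared attribute needs to be a key.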

The 3NF Mnemonic

"Every non-key attribute must provide a fact about the key, the whole key, and nothing but the key." The whole key (2NF) eliminates partial dependencies. Nothing but the key (3NF) eliminates transitive dependencies.

Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF that handles edge cases involving candidate keys.

Definition:

A relation is in BCNF if and only if, for every non-trivial functional dependency X → Y:

X is a superkey

That's it. Unlike 3NF, there's no exception for prime attributes. Every determinant must be a superkey.

3NF vs. BCNF:

3NF allows: X → A where A is prime (part of a candidate key)
BCNF does not allow this exception

When Does 3NF ≠ BCNF?

The difference only matters when:

A relation has multiple overlapping candidate keys
Some candidate keys are composite
The composite keys share common attributes

This is relatively rare in practice, but when it occurs, BCNF violations can cause anomalies missed by 3NF analysis.
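
To make the difference concrete, the sketch below finds candidate keys by brute force and then tests each FD against the 3NF and BCNF conditions. It is only an illustration: the attribute names and the two FDs (Instructor → Room and {Course, Room} → Instructor) are assumed for the example, the helper names are hypothetical, and the subset search is only practical for small relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return all attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= set(attributes):
                keys.append(candidate)
    return keys


def classify(attributes, fds):
    """For each non-trivial FD, report (lhs, rhs, satisfies_3nf, satisfies_bcnf)."""
    keys = candidate_keys(attributes, fds)
    prime = set().union(*keys)
    report = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD, never a violation
        is_superkey = closure(lhs, fds) >= set(attributes)
        ok_3nf = is_superkey or (rhs - lhs) <= prime
        report.append((lhs, rhs, ok_3nf, is_superkey))
    return keys, report


attributes = {"Course", "Instructor", "Room"}
fds = [
    ({"Instructor"}, {"Room"}),
    ({"Course", "Room"}, {"Instructor"}),
]

keys, report = classify(attributes, fds)
print("candidate keys:", [sorted(k) for k in keys])
for lhs, rhs, ok_3nf, ok_bcnf in report:
    print(sorted(lhs), "->", sorted(rhs), "3NF:", ok_3nf, "BCNF:", ok_bcnf)
```

The two candidate keys overlap on Course, every attribute is prime, and Instructor → Room passes the 3NF test while failing the BCNF test.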

bcnf_examples.sql
-- ==================================================
-- BCNF VIOLATION EXAMPLE
-- ==================================================
 
-- Table: CourseInstructor
-- Records which instructors teach which courses in which rooms
-- 
-- | Course  | Instructor | Room  |
-- |---------|------------|-------|
-- | CS101   | Prof. Smith| R101  |
-- | CS101   | Prof. Jones| R102  |
-- | CS201   | Prof. Smith| R101  |
-- | CS201   | Prof. Brown| R103  |
 
-- Business Rules (leading to overlapping candidate keys):
-- 1. An instructor teaches in only ONE room (Instructor → Room)
-- 2. A course in a specific room has one instructor (Course, Room → Instructor)
 
-- Candidate Keys:
-- CK1: {Course, Instructor} - determines Room via Instructor → Room
-- CK2: {Course, Room} - determines Instructor per business rule 2
 
-- FDs:
-- {Course, Instructor} → Room (CK1 → Room, OK)
-- {Course, Room} → Instructor (CK2 → Instructor, OK)
-- Instructor → Room (NOT a superkey, BCNF VIOLATION!)
 
-- 3NF Analysis: Instructor → Room
-- Is Instructor a superkey? NO
-- Is Room a prime attribute? YES (part of CK2)
-- Therefore: 3NF satisfied, but BCNF violated!
 
-- The Problem:
-- | Course  | Instructor | Room  |
-- | CS101   | Prof. Smith| R101  | 
-- | CS201   | Prof. Smith| R101  |  -- Room R101 stored twice for Smith
-- | CS301   | Prof. Smith| R101  |  -- Redundancy! If Smith moves, update 3 rows
 
-- ✅ CORRECTION: Decompose to achieve BCNF
 
-- Relation 1: InstructorRoom (captures Instructor → Room)
CREATE TABLE InstructorRoom (
    Instructor      VARCHAR(100)    PRIMARY KEY,
    Room            VARCHAR(10)     NOT NULL
);
 
-- Relation 2: CourseInstructor (remaining attributes)
CREATE TABLE CourseInstructor (
    Course          VARCHAR(10)     NOT NULL,
    Instructor      VARCHAR(100)    NOT NULL,
    
    PRIMARY KEY (Course, Instructor),
    FOREIGN KEY (Instructor) REFERENCES InstructorRoom(Instructor)
);
 
-- Now normalized:
-- InstructorRoom: ('Prof. Smith', 'R101'), ('Prof. Jones', 'R102'), ('Prof. Brown', 'R103')
-- CourseInstructor: ('CS101', 'Prof. Smith'), ('CS101', 'Prof. Jones'), 
--                   ('CS201', 'Prof. Smith'), ('CS201', 'Prof. Brown')
 
-- No redundancy: Smith's room stored once only
 
-- ==================================================
-- BCNF TRADE-OFF: DEPENDENCY PRESERVATION
-- ==================================================
 
/*
IMPORTANT CAVEAT:
 
The original FD {Course, Room} → Instructor cannot be enforced
by a single-table constraint in either decomposed table!
 
To enforce this, you would need:
1. Application logic validation
2. A trigger that checks across both tables
3. A materialized (indexed) view over the join with a unique constraint on (Course, Room), if the DBMS supports it
 
This illustrates the BCNF trade-off:
- BCNF removes all redundancy that arises from functional dependencies
- But may sacrifice dependency preservation
- 3NF can always preserve all FDs in single-table constraints
- Sometimes 3NF is preferred for practical enforcement reasons
*/
 
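-- Enforcing the cross-table constraint via trigger (PostgreSQL syntax).
-- Note: this covers writes to CourseInstructor only. Changing an instructor's
-- room in InstructorRoom can also break the rule, so a complete solution needs
-- a matching trigger on that table.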
CREATE OR REPLACE FUNCTION check_course_room_instructor()
RETURNS TRIGGER AS $$
BEGIN
    -- Check if the same course-room pair already has a different instructor
    IF EXISTS (
        SELECT 1 
        FROM CourseInstructor ci
        JOIN InstructorRoom ir ON ci.Instructor = ir.Instructor
        WHERE ci.Course = NEW.Course 
          AND ir.Room = (SELECT Room FROM InstructorRoom WHERE Instructor = NEW.Instructor)
          AND ci.Instructor != NEW.Instructor
    ) THEN
        RAISE EXCEPTION 'Constraint violation: Course % in this room already has different instructor', NEW.Course;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER trg_course_room_instructor
BEFORE INSERT OR UPDATE ON CourseInstructor
FOR EACH ROW EXECUTE FUNCTION check_course_room_instructor();

BCNF vs. Dependency Preservation

BCNF decomposition may lose some functional dependencies—they can't be checked in a single table. 3NF always allows dependency-preserving decomposition. In practice, weigh redundancy (favoring BCNF) against constraint enforcement simplicity (favoring 3NF).
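
Whether a decomposition preserves a given FD can be tested without computing every implied dependency. The sketch below uses the usual iterative test: grow the set of attributes reachable from the left-hand side using only what each decomposed relation can check locally. The function names and the set-based representation are assumptions for this example.

```python
def closure(attrs, fds):
    """Return all attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(fd, relations, fds):
    """True if `fd` can be enforced using only FDs checkable inside single relations."""
    lhs, rhs = fd
    reach = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in relations:
            rel = set(rel)
            # Attributes this relation alone can derive from what is reachable so far.
            gained = closure(reach & rel, fds) & rel
            if not gained <= reach:
                reach |= gained
                changed = True
    return rhs <= reach


# The course/instructor/room FDs discussed above and their BCNF decomposition.
fds = [
    ({"Instructor"}, {"Room"}),
    ({"Course", "Room"}, {"Instructor"}),
]
decomposition = [{"Instructor", "Room"}, {"Course", "Instructor"}]

kept = is_preserved(fds[0], decomposition, fds)
lost = is_preserved(fds[1], decomposition, fds)
print(kept, lost)
```

Instructor → Room survives because both attributes sit in one table, while {Course, Room} → Instructor is lost because no single table holds all three attributes and nothing local derives Instructor from Course or Room alone.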

Higher Normal Forms: 4NF and 5NF

Beyond BCNF, higher normal forms address more subtle redundancies caused by multivalued dependencies (4NF) and join dependencies (5NF).

Fourth Normal Form (4NF):

A relation is in 4NF if, for every non-trivial multivalued dependency X ↠ Y that holds in it, X is a superkey. Every relation in 4NF is also in BCNF.

Multivalued Dependency (MVD):

X ↠ Y (X multi-determines Y) means that the set of Y-values associated with a given X-value is independent of other attributes.

Formal: For all pairs of tuples with equal X-values, swapping their Y-values produces tuples that also exist in the relation.

Example of MVD:

Employee(EmpID, Skill, Language)

If an employee's skills are independent of languages they speak:

EmpID ↠ Skill (employee's skills don't depend on languages known)
EmpID ↠ Language (languages don't depend on skills)

This creates ALL combinations of skills and languages per employee—massive redundancy!
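
The tuple-swapping definition can be checked directly against a relation instance. The sketch below is a minimal illustration with hypothetical names and sample rows. As with FDs, a check on one instance can only show that an MVD is violated; it cannot prove that the MVD is a real constraint.

```python
def mvd_holds(rows, attributes, x, y):
    """Check X ->> Y on `rows`: for every pair of rows that agree on X,
    the row that takes its Y-values from the first and everything else
    from the second must also be present."""
    present = {tuple(row[a] for a in attributes) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                swapped = dict(t2)
                swapped.update({a: t1[a] for a in y})
                if tuple(swapped[a] for a in attributes) not in present:
                    return False
    return True


attributes = ["EmpID", "Skill", "Language"]

# Employee 1 with every skill/language combination stored.
full = [
    {"EmpID": 1, "Skill": "Python", "Language": "English"},
    {"EmpID": 1, "Skill": "Python", "Language": "Spanish"},
    {"EmpID": 1, "Skill": "Java", "Language": "English"},
    {"EmpID": 1, "Skill": "Java", "Language": "Spanish"},
]

# The same data with one combination missing.
incomplete = full[:3]

complete_ok = mvd_holds(full, attributes, ["EmpID"], ["Skill"])
incomplete_ok = mvd_holds(incomplete, attributes, ["EmpID"], ["Skill"])
print(complete_ok, incomplete_ok)
```

The second result shows the maintenance burden: once skills and languages are independent, dropping a single combination row makes the table inconsistent with the MVD.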

fourth_fifth_normal_form.sql
-- ==================================================
-- FOURTH NORMAL FORM (4NF) VIOLATION AND CORRECTION
-- ==================================================
 
-- ❌ VIOLATION: Independent Multivalued Dependencies
-- Table: EmployeeSkillLanguage
-- | EmpID | Skill      | Language |
-- |-------|------------|----------|
-- | 1     | Python     | English  |
-- | 1     | Python     | Spanish  |
-- | 1     | Java       | English  |
-- | 1     | Java       | Spanish  |
-- | 2     | JavaScript | French   |
-- | 2     | JavaScript | German   |
-- | 2     | TypeScript | French   |
-- | 2     | TypeScript | German   |
 
-- MVDs: EmpID ↠ Skill, EmpID ↠ Language
-- Each skill appears with EVERY language the employee knows
-- Employee 1 knows 2 skills × 2 languages = 4 rows (multiplicative growth!)
 
-- Problems:
-- 1) REDUNDANCY: Each skill stored once per language known
-- 2) UPDATE ANOMALY: Add new language → add row for EVERY skill
-- 3) DELETION ANOMALY: Remove last language → lose all skill info
 
-- ✅ CORRECTION: Decompose to eliminate MVDs
 
CREATE TABLE EmployeeSkill (
    EmpID           INT             NOT NULL,
    Skill           VARCHAR(50)     NOT NULL,
    PRIMARY KEY (EmpID, Skill)
);
 
CREATE TABLE EmployeeLanguage (
    EmpID           INT             NOT NULL,
    Language        VARCHAR(50)     NOT NULL,
    PRIMARY KEY (EmpID, Language)
);
 
-- Now:
-- EmployeeSkill: (1, 'Python'), (1, 'Java'), (2, 'JavaScript'), (2, 'TypeScript')
-- EmployeeLanguage: (1, 'English'), (1, 'Spanish'), (2, 'French'), (2, 'German')
-- 4 + 4 = 8 rows instead of 8 (no savings here, but row count grows additively, not multiplicatively)
-- With 3 skills and 3 languages for one employee: 3 + 3 = 6 rows instead of 3 × 3 = 9
 
-- ==================================================
-- FIFTH NORMAL FORM (5NF) - Join Dependencies
-- ==================================================
 
-- 5NF addresses join dependencies that cannot be expressed as MVDs or FDs
-- A relation is in 5NF if every non-trivial join dependency in it is implied by its candidate keys
 
-- Example Scenario:
-- Agents represent Companies for Products
-- But the relationships are pairwise constrained:
-- 1) Agent-Company: authorized agents for each company
-- 2) Company-Product: products each company sells
-- 3) Agent-Product: products each agent can sell
 
-- If the combination (Agent, Company, Product) is valid ONLY when ALL THREE 
-- pairwise relationships exist, then the three-way table contains redundancy
 
-- | Agent | Company | Product |
-- |-------|---------|---------|
-- | A1    | C1      | P1      |
-- | A1    | C1      | P2      |
-- | A1    | C2      | P1      |
-- | A2    | C1      | P1      |
 
-- Join Dependency: *(AC, CP, AP)
-- The table can be reconstructed by joining three projections
 
-- ✅ CORRECTION: Decompose into projections
 
CREATE TABLE AgentCompany (
    Agent           VARCHAR(10)     NOT NULL,
    Company         VARCHAR(10)     NOT NULL,
    PRIMARY KEY (Agent, Company)
);
 
CREATE TABLE CompanyProduct (
    Company         VARCHAR(10)     NOT NULL,
    Product         VARCHAR(10)     NOT NULL,
    PRIMARY KEY (Company, Product)
);
 
CREATE TABLE AgentProduct (
    Agent           VARCHAR(10)     NOT NULL,
    Product         VARCHAR(10)     NOT NULL,
    PRIMARY KEY (Agent, Product)
);
 
-- Reconstruction via natural join:
-- SELECT DISTINCT ac.Agent, ac.Company, cp.Product
-- FROM AgentCompany ac
-- JOIN CompanyProduct cp ON ac.Company = cp.Company
-- JOIN AgentProduct ap ON ac.Agent = ap.Agent AND cp.Product = ap.Product;
 
-- NOTE: 5NF is rarely needed in practice and can make queries complex
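
The join dependency in the agent/company/product example can be verified on the sample rows. The sketch below (hypothetical helper name, rows copied from the table in the listing) projects the three-way table onto its three pairs, then shows that joining only two projections produces a spurious row, while joining all three reproduces the original exactly.

```python
def project(rows, attrs):
    """Project `rows` onto `attrs`, returning a set of tuples (duplicates removed)."""
    return {tuple(row[a] for a in attrs) for row in rows}


rows = [
    {"Agent": "A1", "Company": "C1", "Product": "P1"},
    {"Agent": "A1", "Company": "C1", "Product": "P2"},
    {"Agent": "A1", "Company": "C2", "Product": "P1"},
    {"Agent": "A2", "Company": "C1", "Product": "P1"},
]

original = project(rows, ["Agent", "Company", "Product"])
ac = project(rows, ["Agent", "Company"])
cp = project(rows, ["Company", "Product"])
ap = project(rows, ["Agent", "Product"])

# Join AgentCompany with CompanyProduct on Company only.
two_way = {(a, c, p) for (a, c) in ac for (c2, p) in cp if c == c2}

# Add the third projection: keep rows whose (Agent, Product) pair is allowed.
three_way = {(a, c, p) for (a, c, p) in two_way if (a, p) in ap}

spurious = two_way - original
print("spurious after two-way join:", sorted(spurious))
print("three-way join equals original:", three_way == original)
```

The spurious row pairs agent A2 with product P2 through company C1; only the AgentProduct projection rules it out, which is why all three tables are needed.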

Normal Form Summary
Normal Form	Eliminates	Based On	Practical Use
1NF	Non-atomic values, repeating groups	Atomicity	Universal requirement
2NF	Partial dependencies	Full FD on key	Rare issue if ER mapping done well
3NF	Transitive dependencies	Non-key → non-key FDs	Standard target for most schemas
BCNF	All non-superkey determinants	Stricter than 3NF	Used when 3NF has overlapping keys
4NF	Non-trivial MVDs	Independent multivalued facts	Uncommon, specific patterns
5NF	Join dependencies	Cyclic constraints	Very rare, academic interest

Normalization in Practice

Theory provides the framework, but practical normalization requires judgment. Here are essential considerations for applying normalization in real database design.

When to Stop Normalizing:

3NF is usually sufficient for most OLTP (transactional) systems
BCNF is worthwhile when overlapping candidate keys exist and redundancy is problematic
4NF and 5NF address rare patterns; decompose only if you observe the specific problems they solve

When to Denormalize:

Denormalization—intentionally violating normal form rules—is appropriate when:

Read performance is critical and joins are bottlenecks
Data rarely changes after initial load
The redundancy is manageable (limited rows, automated updates)
Query patterns strongly favor denormalized structure

Denormalization Strategies:

Materialized views: Store computed joins, refresh periodically
Redundant columns: Store derived or frequently-joined values
Summary tables: Pre-aggregate for reporting queries
Caching layers: Keep normalized DB, cache denormalized views

Normalization Decision Framework

• Start normalized (3NF minimum): Always begin with a normalized design. Denormalize only when measured performance problems justify it.
• Measure before denormalizing: Profile slow queries. The bottleneck is often indexing, not normalization level.
• Document denormalization: Record what normal form is violated, why, and what maintenance mechanisms exist.
• Consider the write/read ratio: High-write systems suffer more from denormalization (update anomalies). High-read systems benefit more.
• Factor in consistency requirements: Denormalization risks inconsistency. Is eventual consistency acceptable?
• Automate maintenance: If denormalizing, use triggers, stored procedures, or application code to maintain derived values.

The Expert's Approach

Senior engineers normalize first, then selectively denormalize. They don't skip normalization to 'optimize early.' Denormalization decisions are documented and reversible. The goal is a schema that is as normalized as possible while meeting performance requirements.

normalization_vs_performance.sql
-- ==================================================
-- EXAMPLE: NORMALIZED vs. DENORMALIZED DESIGN
-- ==================================================
 
-- SCENARIO: E-commerce order history page
-- Requirement: Show orders with customer name and line item count
-- Pattern: Heavy reads, rare updates, millions of orders
 
-- ====================
-- NORMALIZED (3NF) DESIGN
-- ====================
 
CREATE TABLE Customer (
    CustomerID      SERIAL PRIMARY KEY,
    CustomerName    VARCHAR(100) NOT NULL
);
 
CREATE TABLE Orders (
    OrderID         SERIAL PRIMARY KEY,
    CustomerID      INT NOT NULL REFERENCES Customer(CustomerID),
    OrderDate       TIMESTAMP NOT NULL DEFAULT NOW(),
    TotalAmount     DECIMAL(12,2)
);
 
CREATE TABLE OrderLine (
    OrderLineID     SERIAL PRIMARY KEY,
    OrderID         INT NOT NULL REFERENCES Orders(OrderID),
    ProductID       INT NOT NULL,
    Quantity        INT NOT NULL,
    LineTotal       DECIMAL(12,2)
);
 
-- Query for order history:
SELECT o.OrderID, c.CustomerName, o.OrderDate, o.TotalAmount,
       COUNT(ol.OrderLineID) AS LineItemCount
FROM Orders o
JOIN Customer c ON o.CustomerID = c.CustomerID
LEFT JOIN OrderLine ol ON o.OrderID = ol.OrderID
WHERE c.CustomerID = 12345
GROUP BY o.OrderID, c.CustomerName, o.OrderDate, o.TotalAmount
ORDER BY o.OrderDate DESC
LIMIT 50;
 
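-- Performance concern: every request pays for a JOIN plus a GROUP BY over the
-- customer's order lines; at millions of orders this depends on good indexes
-- on Orders(CustomerID) and OrderLine(OrderID)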
 
-- ====================
-- DENORMALIZED DESIGN
-- ====================
 
CREATE TABLE Orders_Denormalized (
    OrderID         SERIAL PRIMARY KEY,
    CustomerID      INT NOT NULL,
    CustomerName    VARCHAR(100) NOT NULL,  -- Denormalized from Customer
    OrderDate       TIMESTAMP NOT NULL DEFAULT NOW(),
    TotalAmount     DECIMAL(12,2),
    LineItemCount   INT NOT NULL DEFAULT 0  -- Pre-computed aggregate
);
 
-- Faster query (no JOINs, no GROUP BY):
SELECT OrderID, CustomerName, OrderDate, TotalAmount, LineItemCount
FROM Orders_Denormalized
WHERE CustomerID = 12345
ORDER BY OrderDate DESC
LIMIT 50;
 
-- Maintenance required:
-- 1) Trigger to update LineItemCount on OrderLine changes
-- 2) Trigger or application code to update CustomerName if it changes
 
CREATE OR REPLACE FUNCTION update_order_line_count()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE Orders_Denormalized 
        SET LineItemCount = LineItemCount + 1 
        WHERE OrderID = NEW.OrderID;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE Orders_Denormalized 
        SET LineItemCount = LineItemCount - 1 
        WHERE OrderID = OLD.OrderID;
    END IF;
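    RETURN NULL;  -- The return value of an AFTER row trigger is ignored
END;
$$ LANGUAGE plpgsql;
 
-- Attach the function to OrderLine; without this the counter is never updated.
-- (The trigger name is illustrative. An UPDATE that moves a line to another
-- OrderID is not handled here and would need its own branch.)
CREATE TRIGGER trg_order_line_count
AFTER INSERT OR DELETE ON OrderLine
FOR EACH ROW EXECUTE FUNCTION update_order_line_count();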
 
-- Trade-off analysis:
-- Normalized: Clean, no redundancy, complex read queries
-- Denormalized: Redundant CustomerName, faster reads, maintenance burden
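
The maintenance burden of a pre-computed counter can be tried out locally. The following is a minimal sketch in Python using the standard-library sqlite3 module rather than PostgreSQL, so the trigger syntax differs from the listing above (SQLite uses one trigger per event instead of a shared function). The trigger names and the drifted_orders helper are assumptions made for illustration, and the table is trimmed to the columns the counter needs.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a trigger-maintained LineItemCount."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Orders_Denormalized (
            OrderID       INTEGER PRIMARY KEY,
            CustomerID    INTEGER NOT NULL,
            LineItemCount INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE OrderLine (
            OrderLineID INTEGER PRIMARY KEY,
            OrderID     INTEGER NOT NULL REFERENCES Orders_Denormalized(OrderID),
            ProductID   INTEGER NOT NULL
        );
        CREATE TRIGGER trg_line_insert AFTER INSERT ON OrderLine
        BEGIN
            UPDATE Orders_Denormalized
            SET LineItemCount = LineItemCount + 1
            WHERE OrderID = NEW.OrderID;
        END;
        CREATE TRIGGER trg_line_delete AFTER DELETE ON OrderLine
        BEGIN
            UPDATE Orders_Denormalized
            SET LineItemCount = LineItemCount - 1
            WHERE OrderID = OLD.OrderID;
        END;
        """
    )
    return conn


def drifted_orders(conn: sqlite3.Connection) -> list:
    """Return OrderIDs whose stored count disagrees with the actual line count."""
    rows = conn.execute(
        """
        SELECT o.OrderID
        FROM Orders_Denormalized o
        LEFT JOIN OrderLine ol ON ol.OrderID = o.OrderID
        GROUP BY o.OrderID, o.LineItemCount
        HAVING o.LineItemCount <> COUNT(ol.OrderLineID)
        ORDER BY o.OrderID
        """
    ).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
conn.execute("INSERT INTO Orders_Denormalized (OrderID, CustomerID) VALUES (1, 12345)")
conn.executemany(
    "INSERT INTO OrderLine (OrderID, ProductID) VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 12)],
)
conn.execute("DELETE FROM OrderLine WHERE ProductID = 11")

count = conn.execute(
    "SELECT LineItemCount FROM Orders_Denormalized WHERE OrderID = 1"
).fetchone()[0]
print(count)                 # 2 (three inserts, one delete)
print(drifted_orders(conn))  # [] (stored counts match the real ones)
```

A periodic audit query like drifted_orders is one way to detect the inconsistency that denormalization risks, for example after a bulk load that bypassed the triggers.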

Summary and Key Takeaways

Normalization is both science and craft—rigorous theory applied with practical judgment. Let's consolidate the essential knowledge:

Normalization Essentials

•Functional dependencies are constraints that define which attributes determine others. They drive normalization analysis.
•1NF requires atomic values and a primary key—the foundation for being 'relational' at all.
•2NF eliminates partial dependencies—non-key attributes must depend on the WHOLE composite key.
•3NF eliminates transitive dependencies—non-key attributes must depend on NOTHING BUT the key.
•BCNF is stricter: every determinant must be a superkey. Handles edge cases 3NF misses.
•4NF and 5NF address multivalued and join dependencies—rarely needed but valuable to recognize.
•Denormalization is intentional redundancy for performance; document it, automate maintenance, and measure first.
•3NF is the practical standard for most OLTP schemas; go further only when specific problems justify it.

What Comes Next:

With normalization mastered, we move to constraint specification—the art of encoding business rules directly in the database schema. Constraints are the guardians of data integrity, ensuring that even when applications fail or errors occur, the database rejects invalid data.

Page Complete

You now command the full normalization toolkit: from fundamental functional dependencies through the complete normal form hierarchy, including practical guidelines for when to normalize and when (carefully) to denormalize. This knowledge is central to professional database design.


Why does redundancy matter?

Redundant data—the same fact stored in multiple places—creates three types of anomalies:

Insertion Anomaly: Cannot insert data without complete information. Example: Can't add a new department until at least one employee is assigned.
Update Anomaly: Changing a fact requires multiple updates. Example: If a department moves locations, every employee row with that department must be updated.
Deletion Anomaly: Removing data unintentionally deletes related facts. Example: Deleting the last employee in a department loses the department's information.

Normalization systematically eliminates these anomalies by decomposing relations into smaller, well-structured relations that store each fact exactly once.

Learning Objectives

Functional Dependencies: The Foundation

Before understanding normal forms, we must master functional dependencies (FDs)—the mathematical relationships that normalization seeks to manage.

Definition:

A functional dependency X → Y (read "X determines Y" or "Y is functionally dependent on X") exists in a relation R if and only if, for any two tuples t₁ and t₂ in R:

If t₁[X] = t₂[X], then t₁[Y] = t₂[Y]

In other words: if two tuples have the same value(s) for attribute(s) X, they must have the same value(s) for attribute(s) Y.

Key Insights:

X is called the determinant (left-hand side)
Y is called the dependent (right-hand side)
X and Y can be single attributes or sets of attributes
FDs represent real-world constraints, not accidental patterns in current data
FDs must hold for ALL possible valid instances, not just the current state
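
The definition translates directly into a check over a table instance. The sketch below is a hypothetical helper (fd_holds is not part of any library) that tests whether any two rows agree on X but differ on Y. In line with the last two points above, it can only refute an FD: passing on sample data does not prove that the constraint holds for every valid instance.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return False if two rows agree on lhs but differ on rhs, else True."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y seen for each X; any later mismatch refutes X -> Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Illustrative sample data, not taken from a real system.
employees = [
    {"EmployeeID": 1, "Name": "Alice", "DeptID": "D1", "DeptName": "Engineering"},
    {"EmployeeID": 2, "Name": "Bob", "DeptID": "D1", "DeptName": "Engineering"},
    {"EmployeeID": 3, "Name": "Carol", "DeptID": "D2", "DeptName": "Marketing"},
]

print(fd_holds(employees, ["DeptID"], ["DeptName"]))      # True
print(fd_holds(employees, ["DeptName"], ["EmployeeID"]))  # False
```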

Functional Dependency Examples
Relation	Functional Dependency	Interpretation
Employee	EmployeeID → Name, Salary, DeptID	ID uniquely determines all employee attributes
Employee	Email → EmployeeID	Email is an alternate key
Employee	DeptID → DeptName, DeptLocation	Department ID determines department info (redundancy!)
CourseSection	CourseID, Semester → InstructorID	A course section has one instructor
CourseSection	InstructorID, Semester → OfficeHours	Instructor's hours per semester

Types of Functional Dependencies:

Trivial FD: Y is a subset of X. Always true by definition. Example: {EmployeeID, Name} → Name
Non-trivial FD: Y is not a subset of X. These are meaningful constraints. Example: EmployeeID → Name
Completely Non-trivial FD: X and Y have no common attributes. Example: EmployeeID → Salary
Partial Dependency: Y depends on part of a composite key, not the whole. Example: In R(A, B, C) with key {A, B}, if B → C, then C is partially dependent.
Transitive Dependency: X → Y and Y → Z implies X → Z (transitivity). Example: EmployeeID → DeptID → DeptLocation
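
The first three categories depend only on how the attribute sets X and Y overlap, so they can be told apart with plain set operations. This is a small sketch with a hypothetical function name. Note that the categories nest: EmployeeID → Name is non-trivial and also completely non-trivial, because the two sides share no attributes.

```python
from typing import Iterable


def classify_fd(lhs: Iterable[str], rhs: Iterable[str]) -> str:
    """Classify X -> Y by the overlap between X and Y."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"                   # Y is a subset of X
    if x & y:
        return "non-trivial"               # some overlap, but Y adds attributes
    return "completely non-trivial"        # X and Y are disjoint


print(classify_fd({"EmployeeID", "Name"}, {"Name"}))            # trivial
print(classify_fd({"EmployeeID", "Name"}, {"Name", "Salary"}))  # non-trivial
print(classify_fd({"EmployeeID"}, {"Salary"}))                  # completely non-trivial
```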

Armstrong's Axioms

First Normal Form (1NF)

First Normal Form (1NF) is the foundation—the minimum requirement for a relation to exist in the relational model.

Definition:

A relation is in 1NF if and only if:

It has a defined primary key
All attributes contain only atomic (indivisible) values
There are no repeating groups or arrays within a single row
Each column contains values from a single domain
Each row is unique (enforced by the primary key)

The Atomicity Requirement:

"Atomic" means the value cannot be meaningfully subdivided for query purposes. This is context-dependent:

Is "123 Main St, New York, NY 10001" atomic? For simple storage, yes. For queries filtering by city, no—it should be decomposed.
Is a phone number atomic? Depends on whether you need to query by area code.

The key question: Will you ever need to access or query parts of this value independently?
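
When the answer is yes, the value is split so that each part gets its own row or column. As a rough illustration, the sketch below turns a hypothetical comma-separated PhoneNumbers column into one (StudentID, PhoneNumber) pair per value, which is the shape a separate child table would store. The column names and sample rows are assumptions.

```python
from typing import Dict, List, Tuple


def explode_phone_numbers(rows: List[Dict[str, object]]) -> List[Tuple[int, str]]:
    """Turn one multi-valued cell per student into one row per phone number."""
    pairs = []
    for row in rows:
        for raw in str(row["PhoneNumbers"]).split(","):
            number = raw.strip()
            if number:  # skip empty fragments such as a trailing comma
                pairs.append((row["StudentID"], number))
    return pairs


students = [
    {"StudentID": 1, "Name": "Alice", "PhoneNumbers": "555-1234, 555-5678, 555-9999"},
    {"StudentID": 2, "Name": "Bob", "PhoneNumbers": "555-4321"},
]

for pair in explode_phone_numbers(students):
    print(pair)
# (1, '555-1234')
# (1, '555-5678')
# (1, '555-9999')
# (2, '555-4321')
```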

first_normal_form_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
-- ==================================================
-- FIRST NORMAL FORM (1NF) VIOLATIONS AND CORRECTIONS
-- ==================================================
 
-- ❌ VIOLATION 1: Repeating Groups (Multiple values in one cell)
-- Problem Table:
-- | StudentID | Name  | PhoneNumbers                     |
-- |-----------|-------|----------------------------------|
-- | 1         | Alice | 555-1234, 555-5678, 555-9999    |
-- | 2         | Bob   | 555-4321                         |
 
-- Issue: PhoneNumbers contains multiple values, violating atomicity
 
-- ✅ CORRECTION: Separate table for multivalued attributes
CREATE TABLE Student (
    StudentID       INT             PRIMARY KEY,
    Name            VARCHAR(100)    NOT NULL
);
 
CREATE TABLE StudentPhone (
    StudentID       INT             NOT NULL,
    PhoneNumber     VARCHAR(20)     NOT NULL,
    PhoneType       VARCHAR(20)     DEFAULT 'Mobile',
    PRIMARY KEY (StudentID, PhoneNumber),
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID) ON DELETE CASCADE
);
 
-- Data now atomic:
-- Student: (1, 'Alice'), (2, 'Bob')
-- StudentPhone: (1, '555-1234', 'Mobile'), (1, '555-5678', 'Home'), 
--               (1, '555-9999', 'Work'), (2, '555-4321', 'Mobile')
 
-- ❌ VIOLATION 2: Repeating Groups as Columns
-- Problem Table:
-- | OrderID | CustomerID | Product1   | Price1 | Product2   | Price2 |
-- |---------|------------|------------|--------|------------|--------|
-- | 1001    | 1          | Widget     | 10.00  | Gadget     | 25.00  |
-- | 1002    | 2          | Widget     | 10.00  | NULL       | NULL   |
 
-- Issues: 
-- 1) Fixed number of products limits flexibility
-- 2) NULL values waste space
-- 3) Cannot easily query "all products in order"
 
-- ✅ CORRECTION: Normalize to separate order lines
CREATE TABLE Orders (
    OrderID         INT             PRIMARY KEY,
    CustomerID      INT             NOT NULL,
    OrderDate       DATE            NOT NULL DEFAULT CURRENT_DATE
);
 
CREATE TABLE OrderLine (
    OrderID         INT             NOT NULL,
    LineNumber      INT             NOT NULL,
    ProductName     VARCHAR(100)    NOT NULL,
    UnitPrice       DECIMAL(10,2)   NOT NULL,
    Quantity        INT             NOT NULL DEFAULT 1,
    PRIMARY KEY (OrderID, LineNumber),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) ON DELETE CASCADE
);
 
-- ❌ VIOLATION 3: Composite/Non-atomic Values
-- Problem:
-- | EmployeeID | FullAddress                            |
-- |------------|----------------------------------------|
-- | 1          | 123 Main St, New York, NY 10001, USA   |
 
-- Cannot efficiently query by city, state, or zip code
 
-- ✅ CORRECTION: Decompose into atomic components
CREATE TABLE Employee (
    EmployeeID      INT             PRIMARY KEY,
    Name            VARCHAR(100)    NOT NULL
);
 
CREATE TABLE EmployeeAddress (
    EmployeeID      INT             PRIMARY KEY,
    Street          VARCHAR(200),
    City            VARCHAR(100),
    State           VARCHAR(50),
    PostalCode      VARCHAR(20),
    Country         CHAR(3)         DEFAULT 'USA',
    FOREIGN KEY (EmployeeID) REFERENCES Employee(EmployeeID) ON DELETE CASCADE
);
 
-- Now queries like "SELECT * FROM EmployeeAddress WHERE City = 'New York'" work
 
-- ❌ VIOLATION 4: Missing Primary Key
-- Problem:
-- | Name  | Department | Salary |
-- |-------|------------|--------|
-- | Alice | Sales      | 50000  |
-- | Alice | HR         | 55000  |  -- Same name, different department
-- | Bob   | Sales      | 60000  |
 
-- Without PK, cannot uniquely identify rows or establish relationships
 
-- ✅ CORRECTION: Add primary key
CREATE TABLE EmployeeWithPK (
    EmployeeID      SERIAL          PRIMARY KEY,  -- Surrogate key
    Name            VARCHAR(100)    NOT NULL,
    Department      VARCHAR(50)     NOT NULL,
    Salary          DECIMAL(12,2),
    UNIQUE (Name, Department)  -- If business rule requires uniqueness
);

Context-Dependent Atomicity

Second Normal Form (2NF)

Second Normal Form (2NF) addresses partial dependencies in relations with composite primary keys.

Definition:

A relation is in 2NF if and only if:

It is in 1NF, AND
Every non-prime attribute is fully functionally dependent on every candidate key (the primary key included)

Key Terminology:

Prime attribute: An attribute that is part of any candidate key
Non-prime attribute: An attribute that is not part of any candidate key
Full functional dependency: X → Y is full if no proper subset of X determines Y
Partial dependency: X → Y where some proper subset of X also determines Y

Important Note:

If every candidate key of a relation is a single attribute, the relation automatically satisfies 2NF (there's no "part of" a single attribute). 2NF only matters for composite keys.

Why Partial Dependencies Cause Problems:

When non-key attributes depend on only part of the key, the same fact gets stored multiple times—once for each combination with the irrelevant key portion.
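
Spotting such dependencies can be mechanized once the key and the FDs are written down. The sketch below is a simplified check under two stated assumptions: the relation has a single candidate key, and only the explicitly listed FDs are examined (implied ones are not derived). The function name and the sample FDs are illustrative.

```python
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def partial_dependencies(key: Iterable[str], fds: List[FD]) -> list:
    """List FDs whose left side is a proper subset of the key and whose
    right side contains at least one non-key attribute."""
    key_set = frozenset(key)
    found = []
    for lhs, rhs in fds:
        dependents = set(rhs) - key_set
        if frozenset(lhs) < key_set and dependents:
            found.append((sorted(lhs), sorted(dependents)))
    return found


fds = [
    ({"StudentID", "CourseID"}, {"Grade"}),
    ({"StudentID"}, {"StudentName"}),
    ({"CourseID"}, {"CourseName", "InstructorID"}),
]

for lhs, dependents in partial_dependencies({"StudentID", "CourseID"}, fds):
    print(lhs, "->", dependents)
# ['StudentID'] -> ['StudentName']
# ['CourseID'] -> ['CourseName', 'InstructorID']
```

Each reported dependency names the attributes that would move into their own relation, keyed by the left-hand side.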

second_normal_form_examples.sql
-- ==================================================
-- SECOND NORMAL FORM (2NF) VIOLATION AND CORRECTION
-- ==================================================
 
-- ❌ VIOLATION: Partial Dependency
-- Table: StudentCourse
-- Primary Key: (StudentID, CourseID)
-- Attributes: StudentName, CourseName, InstructorID, Grade
 
-- | StudentID | CourseID | StudentName | CourseName | InstructorID | Grade |
-- |-----------|----------|-------------|------------|--------------|-------|
-- | 1         | CS101    | Alice       | Intro CS   | 501          | A     |
-- | 1         | CS201    | Alice       | Data Str   | 502          | B     |
-- | 2         | CS101    | Bob         | Intro CS   | 501          | B     |
-- | 3         | CS101    | Carol       | Intro CS   | 501          | A     |
 
-- Functional Dependencies:
-- 1) StudentID, CourseID → StudentName, CourseName, InstructorID, Grade (holds because the key determines every attribute, but it is FULL only for Grade)
-- 2) StudentID → StudentName (PARTIAL - violates 2NF!)
-- 3) CourseID → CourseName, InstructorID (PARTIAL - violates 2NF!)
-- 4) StudentID, CourseID → Grade (Full - OK)
 
-- Problems caused:
-- 1) UPDATE ANOMALY: If Alice changes name, must update all her enrollments
-- 2) INSERTION ANOMALY: Can't add new course until a student enrolls
-- 3) DELETION ANOMALY: If Alice drops all courses, we lose her information
-- 4) REDUNDANCY: "Alice" stored 2 times, "Intro CS" stored 3 times
 
-- ✅ CORRECTION: Decompose to eliminate partial dependencies
 
-- Relation 1: Student (captures StudentID → StudentName)
CREATE TABLE Student (
    StudentID       INT             PRIMARY KEY,
    StudentName     VARCHAR(100)    NOT NULL
);
 
-- Relation 2: Course (captures CourseID → CourseName, InstructorID)
CREATE TABLE Course (
    CourseID        VARCHAR(10)     PRIMARY KEY,
    CourseName      VARCHAR(100)    NOT NULL,
    InstructorID    INT             NOT NULL
);
 
-- Relation 3: Enrollment (only the full dependency remains)
CREATE TABLE Enrollment (
    StudentID       INT             NOT NULL,
    CourseID        VARCHAR(10)     NOT NULL,
    Grade           CHAR(2),
    EnrollmentDate  DATE            NOT NULL DEFAULT CURRENT_DATE,
    
    PRIMARY KEY (StudentID, CourseID),
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID),
    FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
);
 
-- Now each fact stored exactly once:
-- Student: (1, 'Alice'), (2, 'Bob'), (3, 'Carol')
-- Course: ('CS101', 'Intro CS', 501), ('CS201', 'Data Str', 502)
-- Enrollment (EnrollmentDate omitted): (1, 'CS101', 'A'), (1, 'CS201', 'B'), (2, 'CS101', 'B'), (3, 'CS101', 'A')
 
-- ==================================================
-- 2NF DECOMPOSITION ALGORITHM
-- ==================================================
 
/*
Given relation R(A, B, C, D) with key {A, B} and FDs:
  - {A, B} → C, D
  - A → D  (partial dependency: D depends on part of key)
 
Step 1: Identify partial dependencies
  - A → D violates 2NF (D depends on A alone, not full key {A,B})
 
Step 2: Create new relation with partial dependency
  - Create R1(A, D) with key {A}
 
Step 3: Remove partially dependent attributes from original
  - Create R2(A, B, C) with key {A, B}
 
Result: Both R1 and R2 are in 2NF
  - R1(A, D): A → D (full dependency, single-attribute key)
  - R2(A, B, C): {A, B} → C (full dependency)
*/
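
A decomposition like the one above is only safe if joining the pieces gives back exactly the original relation. For a split into two relations, the standard test is that the shared attributes functionally determine all of R1 or all of R2. The sketch below implements that test with an attribute-closure helper; the function names are assumptions, and the FDs are the ones from the R(A, B, C, D) example.

```python
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Compute the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1: Iterable[str], r2: Iterable[str], fds: List[FD]) -> bool:
    """Binary lossless-join test: (R1 ∩ R2) must determine R1 or R2."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach


fds = [({"A", "B"}, {"C", "D"}), ({"A"}, {"D"})]

print(is_lossless({"A", "D"}, {"A", "B", "C"}, fds))  # True: shared A determines R1
print(is_lossless({"B", "D"}, {"A", "B", "C"}, fds))  # False: shared B determines neither
```

The second call shows why the new relation must be built around the determinant A: splitting D off with B instead would produce spurious rows on rejoin.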

2NF in Practice

Third Normal Form (3NF)

Third Normal Form (3NF) eliminates transitive dependencies and is often considered the practical target for most database designs.

Definition:

A relation is in 3NF if and only if:

It is in 2NF, AND
No non-prime attribute is transitively dependent on any candidate key

Alternative Definition (More Precise):

For every non-trivial FD X → A in R:

X is a superkey, OR
A is a prime attribute (part of some candidate key)

What is Transitive Dependency?

If X → Y and Y → Z, then X → Z transitively. In 3NF terms, a violation occurs when:

A non-key attribute (Z) depends on another non-key attribute (Y)
Which in turn depends on the key (X)
And Y is not a candidate key

The Core Issue:

Transitive dependencies embed facts about one entity (the intermediate) within the table of another entity. This creates redundancy and anomalies similar to partial dependencies.
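
The alternative definition above can be checked mechanically for small schemas. The following sketch finds candidate keys by brute force (fine for a handful of attributes, exponential in general) and then reports every listed FD whose left side is not a superkey and whose right side contains a non-prime attribute. Function names and the sample Employee FDs are illustrative assumptions.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes: Iterable[str], fds: List[FD]) -> list:
    """Minimal attribute sets whose closure covers the whole relation."""
    attrs = frozenset(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = frozenset(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) >= attrs:
                keys.append(subset)
    return keys


def third_nf_violations(attributes: Iterable[str], fds: List[FD]) -> list:
    """Return (determinant, attribute) pairs that break the 3NF condition."""
    attrs = set(attributes)
    prime = set().union(*candidate_keys(attrs, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # determinant is a superkey: allowed
        for attr in sorted(set(rhs) - set(lhs)):
            if attr not in prime:
                violations.append((sorted(lhs), attr))
    return violations


attributes = {"EmployeeID", "EmpName", "DeptID", "DeptName", "DeptLocation"}
fds = [
    ({"EmployeeID"}, {"EmpName", "DeptID"}),
    ({"DeptID"}, {"DeptName", "DeptLocation"}),
]

print(third_nf_violations(attributes, fds))
# [(['DeptID'], 'DeptLocation'), (['DeptID'], 'DeptName')]
```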

third_normal_form_examples.sql
-- ==================================================
-- THIRD NORMAL FORM (3NF) VIOLATION AND CORRECTION
-- ==================================================
 
-- ❌ VIOLATION: Transitive Dependency
-- Table: Employee
-- Primary Key: EmployeeID (abbreviated to EmpID in the sample rows below)
-- 
-- | EmpID | EmpName | DeptID | DeptName    | DeptLocation |
-- |-------|---------|--------|-------------|--------------|
-- | 1     | Alice   | D1     | Engineering | Building A   |
-- | 2     | Bob     | D1     | Engineering | Building A   |
-- | 3     | Carol   | D2     | Marketing   | Building B   |
-- | 4     | Dave    | D2     | Marketing   | Building B   |
 
-- Functional Dependencies:
-- 1) EmployeeID → EmpName, DeptID, DeptName, DeptLocation (OK - key determines all)
-- 2) DeptID → DeptName, DeptLocation (Non-key → Non-key: TRANSITIVE!)
 
-- Transitive chain: EmployeeID → DeptID → DeptName, DeptLocation
-- DeptName and DeptLocation are about Department, not Employee
 
-- Problems:
-- 1) UPDATE ANOMALY: Change "Engineering" location requires updating all employees
-- 2) INSERTION ANOMALY: Can't add new department until employee is hired
-- 3) DELETION ANOMALY: If Alice and Bob leave, we lose Engineering department info
-- 4) REDUNDANCY: "Engineering, Building A" repeated for every Engineering employee
 
-- ✅ CORRECTION: Remove transitive dependency
 
-- Relation 1: Department (new entity based on determinant of transitive FD)
CREATE TABLE Department (
    DeptID          VARCHAR(10)     PRIMARY KEY,
    DeptName        VARCHAR(100)    NOT NULL UNIQUE,
    DeptLocation    VARCHAR(100)
);
 
-- Relation 2: Employee (transitively dependent attributes removed)
CREATE TABLE Employee (
    EmployeeID      INT             PRIMARY KEY,
    EmployeeName    VARCHAR(100)    NOT NULL,
    Salary          DECIMAL(12,2),
    DeptID          VARCHAR(10)     NOT NULL,
    
    FOREIGN KEY (DeptID) REFERENCES Department(DeptID)
);
 
-- Data normalized:
-- Department: ('D1', 'Engineering', 'Building A'), ('D2', 'Marketing', 'Building B')
-- Employee: (1, 'Alice', 75000, 'D1'), (2, 'Bob', 80000, 'D1'), 
--           (3, 'Carol', 70000, 'D2'), (4, 'Dave', 72000, 'D2')
 
-- ==================================================
-- 3NF DECOMPOSITION ALGORITHM
-- ==================================================
 
/*
Given relation R(A, B, C, D, E) with key {A} and FDs:
  - A → B, C, D, E
  - C → D, E  (transitive: non-key C determines non-keys D, E)
 
Step 1: Identify transitive dependencies
  - A → C → D, E (C is the intermediate, D and E are transitively dependent)
 
Step 2: Create new relation for the determinant
  - Create R1(C, D, E) with key {C}
 
Step 3: Remove transitively dependent attributes from original
  - Create R2(A, B, C) with key {A}
 
Result: Both relations are in 3NF
  - R1(C, D, E): C → D, E (C is a key, so allowed)
  - R2(A, B, C): A → B, C (A is a key, so allowed)
*/
 
-- ==================================================
-- ANOTHER EXAMPLE: Order with derived customer info
-- ==================================================
 
-- ❌ VIOLATION:
-- Order(OrderID, CustomerID, CustomerName, CustomerEmail, OrderDate, Total)
-- FDs: OrderID → CustomerID; CustomerID → CustomerName, CustomerEmail
-- Transitive: OrderID → CustomerID → CustomerName, CustomerEmail
 
-- ✅ CORRECTION:
CREATE TABLE Customer (
    CustomerID      INT             PRIMARY KEY,
    CustomerName    VARCHAR(100)    NOT NULL,
    CustomerEmail   VARCHAR(254)    NOT NULL UNIQUE
);
 
CREATE TABLE Orders (
    OrderID         INT             PRIMARY KEY,
    CustomerID      INT             NOT NULL,
    OrderDate       DATE            NOT NULL,
    Total           DECIMAL(12,2)   NOT NULL,
    
    FOREIGN KEY (CustomerID) REFERENCES Customer(CustomerID)
);

The 3NF Mnemonic

Boyce-Codd Normal Form (BCNF)

Boyce-Codd Normal Form (BCNF) is a stricter version of 3NF that handles edge cases involving candidate keys.

Definition:

A relation is in BCNF if and only if, for every non-trivial functional dependency X → Y:

X is a superkey

That's it. Unlike 3NF, there's no exception for prime attributes. Every determinant must be a superkey.

3NF vs. BCNF:

3NF allows: X → A where A is prime (part of a candidate key)
BCNF does not allow this exception

When Does 3NF ≠ BCNF?

The difference only matters when:

A relation has multiple overlapping candidate keys
Some candidate keys are composite
The composite keys share common attributes

This is relatively rare in practice, but when it occurs, BCNF violations can cause anomalies missed by 3NF analysis.
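
To make the gap between the two forms concrete, the sketch below evaluates both conditions on a small hypothetical relation with attributes Course, Instructor and Room, where an instructor always uses one room and a course in a given room has one instructor. It uses brute-force key discovery, so it is only meant for small schemas, and all function names are assumptions.

```python
from itertools import combinations
from typing import Iterable, List, Set, Tuple

FD = Tuple[Set[str], Set[str]]


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes: Iterable[str], fds: List[FD]) -> list:
    """Minimal attribute sets whose closure covers the whole relation."""
    attrs = frozenset(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            subset = frozenset(combo)
            if any(key <= subset for key in keys):
                continue
            if closure(subset, fds) >= attrs:
                keys.append(subset)
    return keys


def normal_form_report(attributes: Iterable[str], fds: List[FD]) -> dict:
    """Collect the listed FDs that violate BCNF and, separately, 3NF."""
    attrs = set(attributes)
    prime = set().union(*candidate_keys(attrs, fds))
    report = {"bcnf_violations": [], "third_nf_violations": []}
    for lhs, rhs in fds:
        dependents = set(rhs) - set(lhs)
        if not dependents or closure(lhs, fds) >= attrs:
            continue  # trivial FD or superkey determinant
        report["bcnf_violations"].append((sorted(lhs), sorted(dependents)))
        non_prime = dependents - prime
        if non_prime:  # 3NF tolerates the FD when every dependent is prime
            report["third_nf_violations"].append((sorted(lhs), sorted(non_prime)))
    return report


attributes = {"Course", "Instructor", "Room"}
fds = [({"Instructor"}, {"Room"}), ({"Course", "Room"}, {"Instructor"})]

print([sorted(key) for key in candidate_keys(attributes, fds)])
# [['Course', 'Instructor'], ['Course', 'Room']]
print(normal_form_report(attributes, fds))
# {'bcnf_violations': [(['Instructor'], ['Room'])], 'third_nf_violations': []}
```

The two candidate keys overlap on Course, which is exactly the situation described above: Room is prime, so 3NF accepts Instructor → Room, while BCNF rejects it because Instructor is not a superkey.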

bcnf_examples.sql
-- ==================================================
-- BCNF VIOLATION EXAMPLE
-- ==================================================
 
-- Table: CourseInstructor
-- Records which instructors teach which courses in which rooms
-- 
-- | Course  | Instructor | Room  |
-- |---------|------------|-------|
-- | CS101   | Prof. Smith| R101  |
-- | CS101   | Prof. Jones| R102  |
-- | CS201   | Prof. Smith| R101  |
-- | CS201   | Prof. Brown| R103  |
 
-- Business Rules (leading to overlapping candidate keys):
-- 1. An instructor teaches in only ONE room (Instructor → Room)
-- 2. A course in a specific room has one instructor (Course, Room → Instructor)
 
-- Candidate Keys:
-- CK1: {Course, Instructor} - determines Room via Instructor → Room
-- CK2: {Course, Room} - determines Instructor per business rule 2
 
-- FDs:
-- {Course, Instructor} → Room (CK1 → Room, OK)
-- {Course, Room} → Instructor (CK2 → Instructor, OK)
-- Instructor → Room (NOT a superkey, BCNF VIOLATION!)
 
-- 3NF Analysis: Instructor → Room
-- Is Instructor a superkey? NO
-- Is Room a prime attribute? YES (part of CK2)
-- Therefore: 3NF satisfied, but BCNF violated!
 
-- The Problem:
-- | Course  | Instructor | Room  |
-- | CS101   | Prof. Smith| R101  | 
-- | CS201   | Prof. Smith| R101  |  -- Room R101 stored twice for Smith
-- | CS301   | Prof. Smith| R101  |  -- Redundancy! If Smith moves, update 3 rows
 
-- ✅ CORRECTION: Decompose to achieve BCNF
 
-- Relation 1: InstructorRoom (captures Instructor → Room)
CREATE TABLE InstructorRoom (
    Instructor      VARCHAR(100)    PRIMARY KEY,
    Room            VARCHAR(10)     NOT NULL
);
 
-- Relation 2: CourseInstructor (remaining attributes)
CREATE TABLE CourseInstructor (
    Course          VARCHAR(10)     NOT NULL,
    Instructor      VARCHAR(100)    NOT NULL,
    
    PRIMARY KEY (Course, Instructor),
    FOREIGN KEY (Instructor) REFERENCES InstructorRoom(Instructor)
);
 
-- Now normalized:
-- InstructorRoom: ('Prof. Smith', 'R101'), ('Prof. Jones', 'R102'), ('Prof. Brown', 'R103')
-- CourseInstructor: ('CS101', 'Prof. Smith'), ('CS101', 'Prof. Jones'), 
--                   ('CS201', 'Prof. Smith'), ('CS201', 'Prof. Brown')
 
-- No redundancy: Smith's room stored once only
 
-- ==================================================
-- BCNF TRADE-OFF: DEPENDENCY PRESERVATION
-- ==================================================
 
/*
IMPORTANT CAVEAT:
 
The original FD {Course, Room} → Instructor cannot be enforced
by a single-table constraint in either decomposed table!
 
To enforce this, you would need:
1. Application logic validation
2. A trigger that checks across both tables
3. A declarative assertion (CREATE ASSERTION in the SQL standard), which few DBMSs implement
 
This illustrates the BCNF trade-off:
- BCNF removes all redundancy that arises from functional dependencies
- But a BCNF decomposition may sacrifice dependency preservation
- A lossless, dependency-preserving decomposition into 3NF always exists
- Sometimes 3NF is preferred for practical enforcement reasons
*/
 
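-- Enforcing the cross-table constraint via trigger. This covers writes to
-- CourseInstructor only; changing a Room in InstructorRoom would need a matching check.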
CREATE OR REPLACE FUNCTION check_course_room_instructor()
RETURNS TRIGGER AS $$
BEGIN
    -- Check if the same course-room pair already has a different instructor
    IF EXISTS (
        SELECT 1 
        FROM CourseInstructor ci
        JOIN InstructorRoom ir ON ci.Instructor = ir.Instructor
        WHERE ci.Course = NEW.Course 
          AND ir.Room = (SELECT Room FROM InstructorRoom WHERE Instructor = NEW.Instructor)
          AND ci.Instructor != NEW.Instructor
    ) THEN
        RAISE EXCEPTION 'Constraint violation: Course % in this room already has different instructor', NEW.Course;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;
 
CREATE TRIGGER trg_course_room_instructor
BEFORE INSERT OR UPDATE ON CourseInstructor
FOR EACH ROW EXECUTE FUNCTION check_course_room_instructor();

BCNF vs. Dependency Preservation

Higher Normal Forms: 4NF and 5NF

Beyond BCNF, higher normal forms address more subtle redundancies caused by multivalued dependencies (4NF) and join dependencies (5NF).

Fourth Normal Form (4NF):

A relation is in 4NF if, for every non-trivial multivalued dependency X ↠ Y that holds in it, X is a superkey. Every 4NF relation is also in BCNF.

Multivalued Dependency (MVD):

X ↠ Y (X multi-determines Y) means that the set of Y-values associated with a given X-value is independent of other attributes.

Formal: For all pairs of tuples with equal X-values, swapping their Y-values produces tuples that also exist in the relation.

Example of MVD:

Employee(EmpID, Skill, Language)

If an employee's skills are independent of languages they speak:

EmpID ↠ Skill (employee's skills don't depend on languages known)
EmpID ↠ Language (languages don't depend on skills)

This creates ALL combinations of skills and languages per employee—massive redundancy!
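
The swap-based definition can be tested directly on a table instance. The sketch below is a hypothetical helper (mvd_holds) that checks, for every pair of rows agreeing on X, whether the row combining the first row's Y-values with the second row's remaining values is present. The sample rows are illustrative. As with FDs, a passing check on sample data does not prove the dependency holds in general.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def mvd_holds(rows: List[Row], attributes: Sequence[str],
              x: Sequence[str], y: Sequence[str]) -> bool:
    """Check X ->> Y on a table instance using the tuple-swap definition."""
    z = [a for a in attributes if a not in x and a not in y]
    order = [*x, *y, *z]
    present = {tuple(row[a] for a in order) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # X and Y values from t1, remaining values from t2
                swapped = tuple(t1[a] for a in [*x, *y]) + tuple(t2[a] for a in z)
                if swapped not in present:
                    return False
    return True


attributes = ["EmpID", "Skill", "Language"]
rows = [
    {"EmpID": 1, "Skill": "Python", "Language": "English"},
    {"EmpID": 1, "Skill": "Python", "Language": "Spanish"},
    {"EmpID": 1, "Skill": "Java", "Language": "English"},
    {"EmpID": 1, "Skill": "Java", "Language": "Spanish"},
]

print(mvd_holds(rows, attributes, ["EmpID"], ["Skill"]))       # True
print(mvd_holds(rows[:-1], attributes, ["EmpID"], ["Skill"]))  # False
```

Dropping the last row breaks the dependency because the combination (Java, Spanish) is then missing, which shows how a single forgotten row corrupts a table that relies on storing every combination.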

fourth_fifth_normal_form.sql
-- ==================================================
-- FOURTH NORMAL FORM (4NF) VIOLATION AND CORRECTION
-- ==================================================
 
-- ❌ VIOLATION: Independent Multivalued Dependencies
-- Table: EmployeeSkillLanguage
-- | EmpID | Skill      | Language |
-- |-------|------------|----------|
-- | 1     | Python     | English  |
-- | 1     | Python     | Spanish  |
-- | 1     | Java       | English  |
-- | 1     | Java       | Spanish  |
-- | 2     | JavaScript | French   |
-- | 2     | JavaScript | German   |
-- | 2     | TypeScript | French   |
-- | 2     | TypeScript | German   |
 
-- MVDs: EmpID ↠ Skill, EmpID ↠ Language
-- Each skill appears with EVERY language the employee knows
-- Employee 1 has 2 skills × 2 languages = 4 rows (row count is the product, not the sum)
 
-- Problems:
-- 1) REDUNDANCY: Each skill stored once per language known
-- 2) INSERTION ANOMALY: Adding a new language requires one new row for EVERY skill
-- 3) DELETION ANOMALY: Remove last language → lose all skill info
 
-- ✅ CORRECTION: Decompose to eliminate MVDs
 
CREATE TABLE EmployeeSkill (
    EmpID           INT             NOT NULL,
    Skill           VARCHAR(50)     NOT NULL,
    PRIMARY KEY (EmpID, Skill)
);
 
CREATE TABLE EmployeeLanguage (
    EmpID           INT             NOT NULL,
    Language        VARCHAR(50)     NOT NULL,
    PRIMARY KEY (EmpID, Language)
);
 
-- Now:
-- EmployeeSkill: (1, 'Python'), (1, 'Java'), (2, 'JavaScript'), (2, 'TypeScript')
-- EmployeeLanguage: (1, 'English'), (1, 'Spanish'), (2, 'French'), (2, 'German')
-- 4 + 4 = 8 rows instead of 8 (no savings here, but rows grow with the sum, not the product)
-- An employee with 3 skills and 3 languages needs 3 + 3 = 6 rows instead of 3 × 3 = 9
 
-- ==================================================
-- FIFTH NORMAL FORM (5NF) - Join Dependencies
-- ==================================================
 
-- 5NF addresses join dependencies that cannot be expressed as MVDs or FDs
-- A relation is in 5NF if every non-trivial join dependency in it is implied by its candidate keys
 
-- Example Scenario:
-- Agents represent Companies for Products
-- But the relationships are pairwise constrained:
-- 1) Agent-Company: authorized agents for each company
-- 2) Company-Product: products each company sells
-- 3) Agent-Product: products each agent can sell
 
-- If the combination (Agent, Company, Product) is valid ONLY when ALL THREE 
-- pairwise relationships exist, then the three-way table contains redundancy
 
-- | Agent | Company | Product |
-- |-------|---------|---------|
-- | A1    | C1      | P1      |
-- | A1    | C1      | P2      |
-- | A1    | C2      | P1      |
-- | A2    | C1      | P1      |
 
-- Join Dependency: *(AC, CP, AP)
-- The table can be reconstructed by joining three projections
 
-- ✅ CORRECTION: Decompose into projections
 
CREATE TABLE AgentCompany (
    Agent           VARCHAR(10)     NOT NULL,
    Company         VARCHAR(10)     NOT NULL,
    PRIMARY KEY (Agent, Company)
);
 
CREATE TABLE CompanyProduct (
    Company         VARCHAR(10)     NOT NULL,
    Product         VARCHAR(10)     NOT NULL,
    PRIMARY KEY (Company, Product)
);
 
CREATE TABLE AgentProduct (
    Agent           VARCHAR(10)     NOT NULL,
    Product         VARCHAR(10)     NOT NULL,
    PRIMARY KEY (Agent, Product)
);
 
-- Reconstruction via natural join:
-- SELECT DISTINCT ac.Agent, ac.Company, cp.Product
-- FROM AgentCompany ac
-- JOIN CompanyProduct cp ON ac.Company = cp.Company
-- JOIN AgentProduct ap ON ac.Agent = ap.Agent AND cp.Product = ap.Product;
 
-- NOTE: 5NF is rarely needed in practice and can make queries complex
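
Whether a three-way table really satisfies the join dependency *(AC, CP, AP) can be checked by projecting it onto the three attribute pairs and joining the projections back together. The sketch below does this in plain Python on the four sample rows from the listing; the rejoin helper and the counterexample rows are assumptions added for illustration.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (Agent, Company, Product)


def rejoin(rows: Set[Triple]) -> Set[Triple]:
    """Project onto (Agent, Company), (Company, Product), (Agent, Product)
    and natural-join the three projections."""
    ac = {(a, c) for a, c, _ in rows}
    cp = {(c, p) for _, c, p in rows}
    ap = {(a, p) for a, _, p in rows}
    return {
        (a, c, p)
        for (a, c) in ac
        for (c2, p) in cp
        if c == c2 and (a, p) in ap
    }


sample = {
    ("A1", "C1", "P1"),
    ("A1", "C1", "P2"),
    ("A1", "C2", "P1"),
    ("A2", "C1", "P1"),
}
print(rejoin(sample) == sample)  # True: the decomposition is lossless here

# A hypothetical instance that does NOT satisfy the join dependency:
counterexample = {
    ("A1", "C1", "P1"),
    ("A1", "C2", "P2"),
    ("A2", "C1", "P2"),
}
print(sorted(rejoin(counterexample) - counterexample))
# [('A1', 'C1', 'P2')]  (a spurious row created by the join)
```

The counterexample shows why the three-table design is only valid when the business rule guarantees the join dependency: without that rule, the decomposition invents combinations that were never recorded.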

Normal Form Summary
Normal Form	Eliminates	Based On	Practical Use
1NF	Non-atomic values, repeating groups	Atomicity	Universal requirement
2NF	Partial dependencies	Full FD on key	Rare issue if ER mapping done well
3NF	Transitive dependencies	Non-key → non-key FDs	Standard target for most schemas
BCNF	FDs whose determinant is not a superkey	Every determinant is a superkey	Used when a 3NF schema has overlapping candidate keys
4NF	Non-trivial MVDs not determined by a superkey	Independent multivalued facts	Uncommon, specific patterns
5NF	Join dependencies not implied by candidate keys	Cyclic constraints	Very rare, mostly of academic interest
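
The BCNF row can be turned into a mechanical test: compute the closure of each determinant and flag any that does not reach every attribute. The sketch below assumes a hypothetical relation Employee(EmpID, DeptID, DeptLocation) with the dependencies EmpID → DeptID and DeptID → DeptLocation; the function names are illustrative. It checks BCNF only; a 3NF check would additionally have to allow dependencies whose right-hand side is part of a candidate key.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something new
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose determinant is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= set(relation)
    ]

# Hypothetical relation and dependencies, for illustration only
relation = {"EmpID", "DeptID", "DeptLocation"}
fds = [
    (frozenset({"EmpID"}), frozenset({"DeptID"})),
    (frozenset({"DeptID"}), frozenset({"DeptLocation"})),
]

emp_closure = closure({"EmpID"}, fds)
violations = bcnf_violations(relation, fds)
```

Here EmpID determines every attribute, so it is a key, while DeptID → DeptLocation is reported as a violation. That is the transitive dependency that decomposing into separate employee and department relations removes.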

Normalization in Practice

Theory provides the framework, but practical normalization requires judgment. Here are essential considerations for applying normalization in real database design.

When to Stop Normalizing:

3NF is sufficient for most OLTP (transactional) systems
BCNF is worthwhile when overlapping candidate keys exist and redundancy is problematic
4NF and 5NF address rare patterns; decompose only if you observe the specific problems they solve

When to Denormalize:

Denormalization—intentionally violating normal form rules—is appropriate when:

Read performance is critical and joins are bottlenecks
Data rarely changes after initial load
The redundancy is manageable (limited rows, automated updates)
Query patterns strongly favor denormalized structure

Denormalization Strategies:

Materialized views: Store computed joins, refresh periodically
Redundant columns: Store derived or frequently-joined values
Summary tables: Pre-aggregate for reporting queries
Caching layers: Keep normalized DB, cache denormalized views
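
As one illustration of the summary-table strategy, the sketch below uses Python's built-in sqlite3 module. The table `CustomerOrderSummary` and the function `refresh_summary` are hypothetical names chosen for this example, and amounts are stored as integer cents to keep the arithmetic exact. The summary is rebuilt from the normalized `Orders` table on demand, which mirrors the "refresh periodically" approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders (
        OrderID     INTEGER PRIMARY KEY,
        CustomerID  INTEGER NOT NULL,
        TotalAmount INTEGER NOT NULL      -- cents, to avoid float rounding
    );
    -- Hypothetical summary table: one pre-aggregated row per customer
    CREATE TABLE CustomerOrderSummary (
        CustomerID  INTEGER PRIMARY KEY,
        OrderCount  INTEGER NOT NULL,
        TotalSpent  INTEGER NOT NULL
    );
    INSERT INTO Orders (CustomerID, TotalAmount) VALUES (1, 10), (1, 15), (2, 40);
""")

def refresh_summary(conn):
    """Rebuild the summary from the normalized source in a single transaction."""
    with conn:
        conn.execute("DELETE FROM CustomerOrderSummary")
        conn.execute("""
            INSERT INTO CustomerOrderSummary (CustomerID, OrderCount, TotalSpent)
            SELECT CustomerID, COUNT(*), SUM(TotalAmount)
            FROM Orders
            GROUP BY CustomerID
        """)

def read_summary(conn):
    return conn.execute(
        "SELECT CustomerID, OrderCount, TotalSpent "
        "FROM CustomerOrderSummary ORDER BY CustomerID"
    ).fetchall()

refresh_summary(conn)
initial = read_summary(conn)

# A new order makes the summary stale until the next refresh.
conn.execute("INSERT INTO Orders (CustomerID, TotalAmount) VALUES (2, 5)")
stale = read_summary(conn)
refresh_summary(conn)
fresh = read_summary(conn)
```

The gap between `stale` and `fresh` is the cost of this strategy: reporting queries read a small table, but they see data only as recent as the last refresh.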

Normalization Decision Framework

•Start normalized (3NF minimum): Always begin with a normalized design. Denormalize only when measured performance problems justify it.
•Measure before denormalizing: Profile slow queries. The bottleneck is often indexing, not normalization level.
•Document denormalization: Record what normal form is violated, why, and what maintenance mechanisms exist.
•Consider the write/read ratio: High-write systems suffer more from denormalization (update anomalies). High-read systems benefit more.
•Factor in consistency requirements: Denormalization risks inconsistency. Is eventual consistency acceptable?
•Automate maintenance: If denormalizing, use triggers, stored procedures, or application code to maintain derived values.
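
The "measure before denormalizing" step can be sketched by inspecting a query plan before and after adding an index. The example below uses SQLite through Python's sqlite3 module purely for convenience; the index name `idx_orders_customer` and the helper `plan` are assumptions made for illustration. In PostgreSQL the corresponding tools are `EXPLAIN` and `EXPLAIN ANALYZE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL,
        OrderDate  TEXT NOT NULL
    )
""")

QUERY = "SELECT OrderID, OrderDate FROM Orders WHERE CustomerID = ?"

def plan(conn, sql, params):
    """Return the plan steps SQLite reports for a query, as text."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

plan_before = plan(conn, QUERY, (12345,))

# Hypothetical index on the filter column
conn.execute("CREATE INDEX idx_orders_customer ON Orders (CustomerID)")
plan_after = plan(conn, QUERY, (12345,))

print(plan_before)
print(plan_after)
```

Without the index the plan reports a scan of the whole table; with it, the plan reports a search using `idx_orders_customer`. If a slow query changes this way after indexing, the schema's normalization level was not the bottleneck.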

The Expert's Approach

normalization_vs_performance.sql
-- ==================================================
-- EXAMPLE: NORMALIZED vs. DENORMALIZED DESIGN
-- ==================================================
 
-- SCENARIO: E-commerce order history page
-- Requirement: Show orders with customer name and line item count
-- Pattern: Heavy reads, rare updates, millions of orders
 
-- ====================
-- NORMALIZED (3NF) DESIGN
-- ====================
 
CREATE TABLE Customer (
    CustomerID      SERIAL PRIMARY KEY,
    CustomerName    VARCHAR(100) NOT NULL
);
 
CREATE TABLE Orders (
    OrderID         SERIAL PRIMARY KEY,
    CustomerID      INT NOT NULL REFERENCES Customer(CustomerID),
    OrderDate       TIMESTAMP NOT NULL DEFAULT NOW(),
    TotalAmount     DECIMAL(12,2)
);
 
CREATE TABLE OrderLine (
    OrderLineID     SERIAL PRIMARY KEY,
    OrderID         INT NOT NULL REFERENCES Orders(OrderID),
    ProductID       INT NOT NULL,
    Quantity        INT NOT NULL,
    LineTotal       DECIMAL(12,2)
);
 
-- Query for order history:
SELECT o.OrderID, c.CustomerName, o.OrderDate, o.TotalAmount,
       COUNT(ol.OrderLineID) AS LineItemCount
FROM Orders o
JOIN Customer c ON o.CustomerID = c.CustomerID
LEFT JOIN OrderLine ol ON o.OrderID = ol.OrderID
WHERE c.CustomerID = 12345
GROUP BY o.OrderID, c.CustomerName, o.OrderDate, o.TotalAmount
ORDER BY o.OrderDate DESC
LIMIT 50;
 
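-- Performance concern: every page view pays for a JOIN + GROUP BY. PostgreSQL
-- does not index foreign-key columns automatically, so without indexes on
-- Orders(CustomerID) and OrderLine(OrderID) this scans millions of rows.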
 
-- ====================
-- DENORMALIZED DESIGN
-- ====================
 
CREATE TABLE Orders_Denormalized (
    OrderID         SERIAL PRIMARY KEY,
    CustomerID      INT NOT NULL,
    CustomerName    VARCHAR(100) NOT NULL,  -- Denormalized from Customer
    OrderDate       TIMESTAMP NOT NULL DEFAULT NOW(),
    TotalAmount     DECIMAL(12,2),
    LineItemCount   INT NOT NULL DEFAULT 0  -- Pre-computed aggregate
);
 
-- Faster query (no JOINs, no GROUP BY):
SELECT OrderID, CustomerName, OrderDate, TotalAmount, LineItemCount
FROM Orders_Denormalized
WHERE CustomerID = 12345
ORDER BY OrderDate DESC
LIMIT 50;
 
-- Maintenance required:
-- 1) Trigger to update LineItemCount on OrderLine changes
-- 2) Trigger or application code to update CustomerName if it changes
 
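-- Keeps Orders_Denormalized.LineItemCount in step with OrderLine.
-- Handles INSERT and DELETE only; an UPDATE that moves a line to a
-- different order is not covered here.
CREATE OR REPLACE FUNCTION update_order_line_count()
RETURNS TRIGGER AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE Orders_Denormalized
        SET LineItemCount = LineItemCount + 1
        WHERE OrderID = NEW.OrderID;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE Orders_Denormalized
        SET LineItemCount = LineItemCount - 1
        WHERE OrderID = OLD.OrderID;
    END IF;
    RETURN NULL;  -- the return value is ignored for AFTER row triggers
END;
$$ LANGUAGE plpgsql;
 
-- Attach the function; without this statement it never runs.
-- (Trigger name is illustrative. EXECUTE FUNCTION requires PostgreSQL 11+;
-- older versions use EXECUTE PROCEDURE.)
CREATE TRIGGER trg_order_line_count
AFTER INSERT OR DELETE ON OrderLine
FOR EACH ROW EXECUTE FUNCTION update_order_line_count();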
 
-- Trade-off analysis:
-- Normalized: Clean, no redundancy, complex read queries
-- Denormalized: Redundant CustomerName, faster reads, maintenance burden
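
The maintenance burden becomes concrete once the stored counter is compared with the rows it summarizes. The sketch below is a SQLite adaptation run through Python's sqlite3 module, not the PostgreSQL code above: SQLite has no plpgsql, so the logic is split into two triggers whose names are illustrative, and the schema is trimmed to the columns needed. It then runs an audit query that reports any order whose stored `LineItemCount` disagrees with the actual number of lines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Orders_Denormalized (
        OrderID       INTEGER PRIMARY KEY,
        CustomerID    INTEGER NOT NULL,
        CustomerName  TEXT NOT NULL,
        LineItemCount INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE OrderLine (
        OrderLineID INTEGER PRIMARY KEY,
        OrderID     INTEGER NOT NULL REFERENCES Orders_Denormalized(OrderID),
        ProductID   INTEGER NOT NULL
    );
    -- SQLite triggers fire once per affected row
    CREATE TRIGGER order_line_added AFTER INSERT ON OrderLine
    BEGIN
        UPDATE Orders_Denormalized SET LineItemCount = LineItemCount + 1
        WHERE OrderID = NEW.OrderID;
    END;
    CREATE TRIGGER order_line_removed AFTER DELETE ON OrderLine
    BEGIN
        UPDATE Orders_Denormalized SET LineItemCount = LineItemCount - 1
        WHERE OrderID = OLD.OrderID;
    END;

    INSERT INTO Orders_Denormalized (OrderID, CustomerID, CustomerName)
    VALUES (1, 12345, 'Ada');
    INSERT INTO OrderLine (OrderID, ProductID) VALUES (1, 10), (1, 11), (1, 12);
    DELETE FROM OrderLine WHERE ProductID = 11;
""")

# Audit: orders whose stored count differs from the real number of lines
DRIFT_SQL = """
    SELECT d.OrderID, d.LineItemCount, COUNT(ol.OrderLineID) AS ActualCount
    FROM Orders_Denormalized d
    LEFT JOIN OrderLine ol ON ol.OrderID = d.OrderID
    GROUP BY d.OrderID, d.LineItemCount
    HAVING d.LineItemCount <> COUNT(ol.OrderLineID)
"""

stored_count = conn.execute(
    "SELECT LineItemCount FROM Orders_Denormalized WHERE OrderID = 1"
).fetchone()[0]
drift_before = conn.execute(DRIFT_SQL).fetchall()

# Simulate a write that bypasses the triggers, e.g. a manual correction
conn.execute("UPDATE Orders_Denormalized SET LineItemCount = 99 WHERE OrderID = 1")
drift_after = conn.execute(DRIFT_SQL).fetchall()
```

After three inserts and one delete the triggers leave the counter at 2 and the audit returns no rows. Once a write bypasses the triggers, the audit reports the stored value next to the real one. Running a check like this on a schedule is one way to detect the inconsistency that denormalization risks.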

Summary and Key Takeaways

Normalization is both science and craft—rigorous theory applied with practical judgment. Let's consolidate the essential knowledge:

Normalization Essentials

•Functional dependencies are constraints that define which attributes determine others. They drive normalization analysis.
•1NF requires atomic values and a primary key—the foundation for being 'relational' at all.
•2NF eliminates partial dependencies—non-key attributes must depend on the WHOLE composite key.
•3NF eliminates transitive dependencies—non-key attributes must depend on NOTHING BUT the key.
•BCNF is stricter: every determinant must be a superkey. Handles edge cases 3NF misses.
•4NF and 5NF address multivalued and join dependencies—rarely needed but valuable to recognize.
•Denormalization is intentional redundancy for performance; document it, automate maintenance, and measure first.
•3NF is the practical standard for most OLTP schemas; go further only when specific problems justify it.
