Fourth Normal Form - Learning Module

4NF Examples

Learning Through Practice

Theory becomes mastery through application. This page presents a comprehensive collection of Fourth Normal Form examples spanning multiple domains—from e-commerce to healthcare, from education to manufacturing.

Each example follows a consistent structure:

Scenario description — The real-world context
Initial schema — The relation with potential issues
Dependency analysis — Identifying FDs and MVDs
4NF assessment — Determining if violations exist
Decomposition — Resolving violations when present
Verification — Confirming the result is in 4NF

These examples will cement your understanding and prepare you for applying 4NF analysis to your own database designs.

What You Will Learn

By the end of this page, you will have internalized the 4NF analysis process through multiple worked examples, recognizing violation patterns across different domains and confidently applying decomposition techniques.

Example 1: E-Commerce Product Catalog

Scenario:

An e-commerce platform tracks products with their available colors, sizes, and target markets. A product can come in multiple colors, multiple sizes, and be sold in multiple markets. Initially, all this information is stored in a single relation.

Initial Schema:

ProductCatalog(ProductID, ProductName, Color, Size, Market, BasePrice)

ProductCatalog Instance
ProductID	ProductName	Color	Size	Market	BasePrice
P001	Classic T-Shirt	Red	S	USA	$19.99
P001	Classic T-Shirt	Red	S	EU	$19.99
P001	Classic T-Shirt	Red	M	USA	$19.99
P001	Classic T-Shirt	Red	M	EU	$19.99
P001	Classic T-Shirt	Blue	S	USA	$19.99
P001	Classic T-Shirt	Blue	S	EU	$19.99
P001	Classic T-Shirt	Blue	M	USA	$19.99
P001	Classic T-Shirt	Blue	M	EU	$19.99

Dependency Analysis:

Functional Dependencies:

ProductID → ProductName (each product has one name)
ProductID → BasePrice (each product has one base price)

Multivalued Dependencies:

ProductID →→ Color (colors independent of sizes and markets)
ProductID →→ Size (sizes independent of colors and markets)
ProductID →→ Market (markets independent of colors and sizes)
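
The independence claims above can be tested on a sample instance. The following is a minimal sketch, assuming each row is represented as a Python dict; `mvd_holds` is a hypothetical helper written for this page, not a library function. A passing check only shows that the sample data is consistent with the MVD; whether the MVD holds in general is a question about business rules.

```python
from itertools import product

def mvd_holds(rows, x, y):
    """Return True if the MVD x ->> y holds in this instance.

    rows: list of dicts (one per tuple); x, y: lists of attribute names.
    For every pair of tuples that agree on x, the tuple that takes its
    y values from the first and everything else from the second must exist.
    """
    attrs = list(rows[0].keys())
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            swapped = tuple(t1[a] if a in y else t2[a] for a in attrs)
            if swapped not in present:
                return False
    return True

# ProductCatalog instance from the table above (name and price omitted).
catalog = [
    {"ProductID": "P001", "Color": c, "Size": s, "Market": m}
    for c, s, m in product(["Red", "Blue"], ["S", "M"], ["USA", "EU"])
]

print(mvd_holds(catalog, ["ProductID"], ["Color"]))       # True
print(mvd_holds(catalog[:-1], ["ProductID"], ["Color"]))  # False: one combination is missing
```

Dropping a single row breaks the MVD, which is exactly the update anomaly the single-table design invites.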

4NF Assessment:

Candidate key: {ProductID, Color, Size, Market}
Check MVD ProductID →→ Color: ProductID is NOT a superkey. 4NF Violation!
Check MVD ProductID →→ Size: ProductID is NOT a superkey. 4NF Violation!
Check MVD ProductID →→ Market: ProductID is NOT a superkey. 4NF Violation!
Check FD ProductID → ProductName: ProductID is NOT a superkey. BCNF Violation!
Check FD ProductID → BasePrice: ProductID is NOT a superkey. BCNF Violation!

Severity Analysis:

With 2 colors × 2 sizes × 2 markets = 8 rows for one product. With 10 colors × 10 sizes × 50 markets = 5,000 rows per product!

ProductName and BasePrice each repeated 5,000 times per product.

Decomposition:

Step 1: Address the FD violations first (BCNF)

Decompose on ProductID → ProductName, BasePrice:
- Product(ProductID, ProductName, BasePrice)
- ProductVariants(ProductID, Color, Size, Market)

Step 2: Address MVD violations on ProductVariants

Decompose on ProductID →→ Color:
- ProductColor(ProductID, Color)
- ProductSizeMarket(ProductID, Size, Market)

Step 3: Continue with ProductSizeMarket

Decompose on ProductID →→ Size:
- ProductSize(ProductID, Size)
- ProductMarket(ProductID, Market)

Final 4NF Schema:

Product(ProductID PK, ProductName, BasePrice)
ProductColor(ProductID PK FK, Color PK)
ProductSize(ProductID PK FK, Size PK)
ProductMarket(ProductID PK FK, Market PK)

Verification:

Product: FD ProductID → (ProductName, BasePrice). ProductID is key. BCNF ✓, 4NF ✓
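ProductColor: All-key binary relation; ProductID →→ Color is trivial here because the two attributes make up the whole relation. 4NF ✓
ProductSize: All-key binary relation; ProductID →→ Size is trivial. 4NF ✓
ProductMarket: All-key binary relation; ProductID →→ Market is trivial. 4NF ✓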

Storage Comparison:

Original: 8 rows (5,000 with more options)
Decomposed: 1 + 2 + 2 + 2 = 7 rows (1 + 10 + 10 + 50 = 71 with more options)

Reduction: 5,000 → 71 = 98.6% reduction in row count!
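
The arithmetic behind this comparison can be reproduced with a few lines. This is a small sketch for one product; `row_counts` is a hypothetical helper name, and it assumes every combination of options is stored in the single-table design.

```python
from math import prod

def row_counts(option_counts):
    """Rows for one product: single table vs. 4NF decomposition.

    option_counts: number of values for each independent attribute
    (colors, sizes, markets).
    """
    single_table = prod(option_counts)   # every combination is stored
    decomposed = 1 + sum(option_counts)  # 1 Product row plus one row per value
    return single_table, decomposed

for counts in [(2, 2, 2), (10, 10, 50)]:
    before, after = row_counts(counts)
    reduction = 100 * (before - after) / before
    print(counts, before, after, f"{reduction:.1f}%")
```

The single table grows with the product of the option counts, while the decomposed schema grows with their sum.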

Massive Improvement

This example demonstrates the dramatic storage and maintenance benefits of 4NF. The combinatorial explosion of 5,000 rows collapses to just 71 rows, while eliminating all update, insertion, and deletion anomalies.

Example 2: University Course Management

Scenario:

A university tracks courses with their instructors and required textbooks. A course can be taught by multiple instructors and require multiple textbooks. Instructor assignments and textbook requirements are independent decisions.

Initial Schema:

CourseInfo(CourseID, CourseName, InstructorID, InstructorName, TextbookISBN, TextbookTitle)

CourseInfo Instance
CourseID	CourseName	InstructorID	InstructorName	TextbookISBN	TextbookTitle
CS101	Intro to Programming	I001	Dr. Smith	978-0134	Python Basics
CS101	Intro to Programming	I001	Dr. Smith	978-0256	Coding Fundamentals
CS101	Intro to Programming	I002	Dr. Jones	978-0134	Python Basics
CS101	Intro to Programming	I002	Dr. Jones	978-0256	Coding Fundamentals
CS201	Data Structures	I002	Dr. Jones	978-0378	DSA Guide

Dependency Analysis:

Functional Dependencies:

CourseID → CourseName
InstructorID → InstructorName
TextbookISBN → TextbookTitle

Multivalued Dependencies:

CourseID →→ InstructorID (instructors assigned independently of textbooks)
CourseID →→ TextbookISBN (textbooks assigned independently of instructors)

4NF Assessment:

Candidate key: {CourseID, InstructorID, TextbookISBN}
All three FDs have non-superkey determinants: BCNF violations
Both MVDs have non-superkey determinant: 4NF violations

Decomposition:

This requires careful ordering to handle both FD and MVD violations:

Step 1: Decompose on CourseID → CourseName

Course(CourseID, CourseName)
R1(CourseID, InstructorID, InstructorName, TextbookISBN, TextbookTitle)

Step 2: Decompose R1 on InstructorID → InstructorName

Instructor(InstructorID, InstructorName)
R2(CourseID, InstructorID, TextbookISBN, TextbookTitle)

Step 3: Decompose R2 on TextbookISBN → TextbookTitle

Textbook(TextbookISBN, TextbookTitle)
R3(CourseID, InstructorID, TextbookISBN)

Step 4: Now R3 still has MVD violations!

CourseID →→ InstructorID in R3: Non-superkey determinant. 4NF Violation!
Decompose on CourseID →→ InstructorID:
- CourseInstructor(CourseID, InstructorID)
- CourseTextbook(CourseID, TextbookISBN)

Final 4NF Schema:

Course(CourseID PK, CourseName)
Instructor(InstructorID PK, InstructorName)
Textbook(TextbookISBN PK, TextbookTitle)
CourseInstructor(CourseID PK FK, InstructorID PK FK)
CourseTextbook(CourseID PK FK, TextbookISBN PK FK)

Verification:

Course: CourseID is key, CourseID → CourseName. BCNF ✓, 4NF ✓
Instructor: InstructorID is key, InstructorID → InstructorName. BCNF ✓, 4NF ✓
Textbook: TextbookISBN is key, TextbookISBN → TextbookTitle. BCNF ✓, 4NF ✓
CourseInstructor: Binary relation. 4NF ✓
CourseTextbook: Binary relation. 4NF ✓
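
One way to make the final schema concrete is to create it in an in-memory SQLite database and rebuild the original rows with a join. This is a sketch under assumptions: the column types and `NOT NULL` constraints are illustrative choices, not part of the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE Course (CourseID TEXT PRIMARY KEY, CourseName TEXT NOT NULL);
CREATE TABLE Instructor (InstructorID TEXT PRIMARY KEY, InstructorName TEXT NOT NULL);
CREATE TABLE Textbook (TextbookISBN TEXT PRIMARY KEY, TextbookTitle TEXT NOT NULL);
CREATE TABLE CourseInstructor (
    CourseID TEXT REFERENCES Course(CourseID),
    InstructorID TEXT REFERENCES Instructor(InstructorID),
    PRIMARY KEY (CourseID, InstructorID)
);
CREATE TABLE CourseTextbook (
    CourseID TEXT REFERENCES Course(CourseID),
    TextbookISBN TEXT REFERENCES Textbook(TextbookISBN),
    PRIMARY KEY (CourseID, TextbookISBN)
);
""")

# Data taken from the CourseInfo instance above.
conn.executemany("INSERT INTO Course VALUES (?, ?)",
                 [("CS101", "Intro to Programming"), ("CS201", "Data Structures")])
conn.executemany("INSERT INTO Instructor VALUES (?, ?)",
                 [("I001", "Dr. Smith"), ("I002", "Dr. Jones")])
conn.executemany("INSERT INTO Textbook VALUES (?, ?)",
                 [("978-0134", "Python Basics"), ("978-0256", "Coding Fundamentals"),
                  ("978-0378", "DSA Guide")])
conn.executemany("INSERT INTO CourseInstructor VALUES (?, ?)",
                 [("CS101", "I001"), ("CS101", "I002"), ("CS201", "I002")])
conn.executemany("INSERT INTO CourseTextbook VALUES (?, ?)",
                 [("CS101", "978-0134"), ("CS101", "978-0256"), ("CS201", "978-0378")])

# Joining the two relationship tables on CourseID rebuilds the original combinations.
rebuilt = conn.execute("""
    SELECT ci.CourseID, ci.InstructorID, ct.TextbookISBN
    FROM CourseInstructor ci
    JOIN CourseTextbook ct ON ct.CourseID = ci.CourseID
    ORDER BY 1, 2, 3
""").fetchall()
print(len(rebuilt))  # 5, the same number of rows as the CourseInfo instance
```

The join returns the five instructor/textbook combinations of the original table, so nothing was lost by storing the two relationships separately.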

Anomalies Eliminated:

Anomaly	Before	After
Add new instructor to course	Insert multiple rows (one per textbook)	Insert 1 row in CourseInstructor
Change instructor name	Update everywhere name appears	Update 1 row in Instructor
Remove textbook from course	Delete multiple rows (one per instructor)	Delete 1 row in CourseTextbook
Change course name	Update in every row	Update 1 row in Course

Natural Entity Boundaries

Notice how the final schema reflects natural entity boundaries: Course, Instructor, Textbook as entities; CourseInstructor and CourseTextbook as M:N relationships. Good 4NF decomposition often recovers the 'proper' ER model.

Example 3: Healthcare Patient Records

Scenario:

A healthcare system tracks patients with their known allergies and prescribed medications. Allergies and medications are recorded independently—allergy information comes from patient history, while medications are prescribed based on current treatment needs.

Initial Schema:

PatientMedical(PatientID, PatientName, DateOfBirth, Allergy, Medication)

PatientMedical Instance
PatientID	PatientName	DateOfBirth	Allergy	Medication
P001	John Doe	1985-03-15	Penicillin	Lisinopril
P001	John Doe	1985-03-15	Penicillin	Metformin
P001	John Doe	1985-03-15	Penicillin	Aspirin
P001	John Doe	1985-03-15	Sulfa	Lisinopril
P001	John Doe	1985-03-15	Sulfa	Metformin
P001	John Doe	1985-03-15	Sulfa	Aspirin

Dependency Analysis:

Functional Dependencies:

PatientID → PatientName, DateOfBirth

Multivalued Dependencies:

PatientID →→ Allergy (allergies recorded independently of medications)
PatientID →→ Medication (medications prescribed independently of allergies)

Critical Observation:

Allergies and medications ARE semantically independent:

Allergies are historical facts about patient physiology
Medications are current prescriptions based on conditions
A patient's allergy list doesn't determine their medication list

Yet in this schema, we store every combination, creating the Cartesian product.

4NF Assessment:

Candidate key: {PatientID, Allergy, Medication}
FD PatientID → (PatientName, DateOfBirth): Non-superkey determinant. BCNF Violation!
MVD PatientID →→ Allergy: Non-superkey determinant. 4NF Violation!
MVD PatientID →→ Medication: Non-superkey determinant. 4NF Violation!

Decomposition:

Step 1: Decompose on FD PatientID → (PatientName, DateOfBirth)

Patient(PatientID, PatientName, DateOfBirth)
PatientAllergyMed(PatientID, Allergy, Medication)

Step 2: Decompose PatientAllergyMed on MVD PatientID →→ Allergy

PatientAllergy(PatientID, Allergy)
PatientMedication(PatientID, Medication)

Final 4NF Schema:

Patient(PatientID PK, PatientName, DateOfBirth)
PatientAllergy(PatientID PK FK, Allergy PK)
PatientMedication(PatientID PK FK, Medication PK)

Verification:

Patient: PatientID is key. BCNF ✓, 4NF ✓
PatientAllergy: Binary, automatically 4NF ✓
PatientMedication: Binary, automatically 4NF ✓

Healthcare-Specific Benefits:

Safety: No risk of forgetting to add allergy info for each new medication
Accuracy: Patient demographics updated in one place
Audit trail: Each fact change is isolated and trackable
Query efficiency: Find all patients allergic to Penicillin with simple index scan
Data entry: Add allergy without knowing medications, and vice versa
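
The data-entry and query benefits can be illustrated with the decomposed relations held as plain Python sets. This is a minimal sketch; the "Latex" allergy is a hypothetical value added for illustration and does not appear in the instance above.

```python
# Final 4NF relations as sets of tuples (data from the PatientMedical instance).
patient = {("P001", "John Doe", "1985-03-15")}
patient_allergy = {("P001", "Penicillin"), ("P001", "Sulfa")}
patient_medication = {("P001", "Lisinopril"), ("P001", "Metformin"), ("P001", "Aspirin")}

# Data entry: record a new allergy without touching medications.
# "Latex" is a hypothetical value used only for this sketch.
patient_allergy.add(("P001", "Latex"))

# Query: all patients allergic to Penicillin.
allergic = sorted(pid for pid, allergy in patient_allergy if allergy == "Penicillin")
print(allergic)  # ['P001']

# In the single-table design, the same new allergy would need one row
# per current medication of that patient.
rows_needed_before = len({med for pid, med in patient_medication if pid == "P001"})
print(rows_needed_before)  # 3
```

One insert replaces three, and the allergy fact no longer depends on which medications happen to be prescribed.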

Critical in Healthcare

In healthcare, data integrity is literally life-or-death. A 4NF violation here could mean: adding a medication but forgetting to replicate allergy info → potential adverse reaction not flagged. Proper normalization is a patient safety measure.

Example 4: Manufacturing Supply Chain

Scenario:

A manufacturing company tracks components, their approved suppliers, and warehouses where they're stocked. A component can be sourced from multiple suppliers and stored in multiple warehouses. Supplier approval and warehouse stocking are independent decisions.

Initial Schema:

ComponentSourcing(ComponentID, ComponentName, SupplierID, SupplierName, WarehouseID, WarehouseLocation)

ComponentSourcing Instance
ComponentID	ComponentName	SupplierID	SupplierName	WarehouseID	WarehouseLocation
C001	Steel Bolt M10	S001	FastenCo	W001	Chicago
C001	Steel Bolt M10	S001	FastenCo	W002	Detroit
C001	Steel Bolt M10	S002	BoltWorks	W001	Chicago
C001	Steel Bolt M10	S002	BoltWorks	W002	Detroit
C001	Steel Bolt M10	S003	MetalMax	W001	Chicago
C001	Steel Bolt M10	S003	MetalMax	W002	Detroit

Dependency Analysis:

Functional Dependencies:

ComponentID → ComponentName
SupplierID → SupplierName
WarehouseID → WarehouseLocation

Multivalued Dependencies:

ComponentID →→ SupplierID (approved suppliers independent of storage locations)
ComponentID →→ WarehouseID (storage locations independent of suppliers)

4NF Assessment:

Candidate key: {ComponentID, SupplierID, WarehouseID}
All FDs have non-superkey determinants: BCNF Violations
Both MVDs have non-superkey determinant: 4NF Violations

Business Impact of Violations:

With 100 components, avg 5 suppliers and 10 warehouses each:

Violation: 100 × 5 × 10 = 5,000 rows
Proper 4NF: 100 (components) + 500 (approvals) + 1,000 (stocking) + entities = ~1,600 rows

The bloated schema wastes storage and makes updates error-prone.

Decomposition:

Step 1: Extract entities (address FD violations)

Component(ComponentID, ComponentName)
Supplier(SupplierID, SupplierName)
Warehouse(WarehouseID, WarehouseLocation)
R1(ComponentID, SupplierID, WarehouseID)

Step 2: Decompose R1 on MVD ComponentID →→ SupplierID

ComponentSupplier(ComponentID, SupplierID)
ComponentWarehouse(ComponentID, WarehouseID)

Final 4NF Schema:

Component(ComponentID PK, ComponentName)
Supplier(SupplierID PK, SupplierName)
Warehouse(WarehouseID PK, WarehouseLocation)
ComponentSupplier(ComponentID PK FK, SupplierID PK FK) -- Approved suppliers
ComponentWarehouse(ComponentID PK FK, WarehouseID PK FK) -- Stocking locations

Operational Benefits:

Operation	Before 4NF	After 4NF
Approve new supplier	Insert 10 rows (one per warehouse)	Insert 1 row
Add warehouse location	Insert 5 rows (one per supplier)	Insert 1 row
Revoke supplier approval	Delete 10 rows	Delete 1 row
Query: All suppliers for C001	Scan 50 rows, deduplicate	Scan 5 rows
Update supplier name	Update 10 rows	Update 1 row
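
The "approve new supplier" row of the table can be simulated on the sample instance, which has 3 suppliers and 2 warehouses for C001. This is a sketch only; `approve_supplier_single_table` is a hypothetical helper and supplier "S004" is an invented ID.

```python
from itertools import product

suppliers = ["S001", "S002", "S003"]
warehouses = ["W001", "W002"]

# Single-table design: one row per (supplier, warehouse) combination for C001.
sourcing = {("C001", s, w) for s, w in product(suppliers, warehouses)}

# 4NF design: two independent relations.
component_supplier = {("C001", s) for s in suppliers}
component_warehouse = {("C001", w) for w in warehouses}

def approve_supplier_single_table(rows, component, supplier):
    """Rows that must be inserted so the MVDs keep holding in the single table."""
    current_warehouses = {w for c, _, w in rows if c == component}
    return {(component, supplier, w) for w in current_warehouses}

# "S004" is a hypothetical new supplier.
new_rows = approve_supplier_single_table(sourcing, "C001", "S004")
print(len(new_rows))  # 2: one row per warehouse stocking C001

# 4NF design: the same approval is a single insert.
component_supplier.add(("C001", "S004"))
print(len(component_supplier))  # 4
```

With 10 warehouses per component, as in the business-impact estimate, the single-table insert would grow to 10 rows while the 4NF insert stays at 1.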

Supply Chain Scalability

Manufacturing supply chains often involve many-to-many relationships (components ↔ suppliers, components ↔ warehouses). Proper 4NF design ensures the database scales linearly with business growth rather than multiplicatively.

Example 5: Human Resources Skills Matrix

Scenario:

An HR department tracks employees with their technical skills, professional certifications, and language proficiencies. These three attribute sets are independent—skills come from experience, certifications from exams, and languages from personal background.

Initial Schema:

EmployeeCompetencies(EmpID, EmpName, Skill, Certification, Language)

EmployeeCompetencies Instance (Partial)
EmpID	EmpName	Skill	Certification	Language
E001	Alice	Python	AWS-SAA	English
E001	Alice	Python	AWS-SAA	Spanish
E001	Alice	Python	PMP	English
E001	Alice	Python	PMP	Spanish
E001	Alice	Java	AWS-SAA	English
E001	Alice	Java	AWS-SAA	Spanish
E001	Alice	Java	PMP	English
E001	Alice	Java	PMP	Spanish

Dependency Analysis:

Functional Dependencies:

EmpID → EmpName

Multivalued Dependencies:

EmpID →→ Skill (skills independent of certs and languages)
EmpID →→ Certification (certs independent of skills and languages)
EmpID →→ Language (languages independent of skills and certs)

Severity Analysis:

Alice has: 2 skills × 2 certifications × 2 languages = 8 rows

A senior employee with 15 skills, 5 certifications, 3 languages: 15 × 5 × 3 = 225 rows for ONE person!

For 1,000 employees with similar profiles: 225,000 rows vs. about 24,000 rows when properly normalized (1 + 15 + 5 + 3 = 24 rows per employee).

4NF Assessment:

Candidate key: {EmpID, Skill, Certification, Language}
FD EmpID → EmpName: Non-superkey determinant. BCNF Violation
All three MVDs have non-superkey determinant: 4NF Violations

Decomposition:

This is a three-way independent MVD situation requiring four relations:

Step 1: Extract Employee entity

Employee(EmpID, EmpName)
R1(EmpID, Skill, Certification, Language)

Step 2: Decompose R1 on EmpID →→ Skill

EmployeeSkill(EmpID, Skill)
R2(EmpID, Certification, Language)

Step 3: Decompose R2 on EmpID →→ Certification

EmployeeCertification(EmpID, Certification)
EmployeeLanguage(EmpID, Language)

Final 4NF Schema:

Employee(EmpID PK, EmpName)
EmployeeSkill(EmpID PK FK, Skill PK)
EmployeeCertification(EmpID PK FK, Certification PK)
EmployeeLanguage(EmpID PK FK, Language PK)
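
A quick way to convince yourself that this decomposition is lossless is to project Alice's rows onto the three binary relations and join them back. The sketch below uses plain Python sets and omits EmpName for brevity.

```python
from itertools import product

# Alice's 8 rows from the EmployeeCompetencies instance (EmpName omitted).
rows = {("E001", s, c, l)
        for s, c, l in product(["Python", "Java"],
                               ["AWS-SAA", "PMP"],
                               ["English", "Spanish"])}

# Project onto the three binary relations.
emp_skill = {(e, s) for e, s, _, _ in rows}
emp_cert = {(e, c) for e, _, c, _ in rows}
emp_lang = {(e, l) for e, _, _, l in rows}

# Natural join on EmpID.
rejoined = {(e, s, c, l)
            for (e, s) in emp_skill
            for (e2, c) in emp_cert if e2 == e
            for (e3, l) in emp_lang if e3 == e}

print(len(emp_skill) + len(emp_cert) + len(emp_lang))  # 6
print(rejoined == rows)  # True
```

Six rows in the binary relations carry the same information as the eight combined rows, and the join reproduces the original instance exactly.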

Row Count Comparison:

Scenario	Original	4NF
Alice (2-2-2)	8 rows	1+2+2+2 = 7 rows
Senior (15-5-3)	225 rows	1+15+5+3 = 24 rows
1000 employees	~225,000 rows	~24,000 rows

Reduction: ~90% fewer rows!

HR Analytics Benefit

Beyond storage savings, HR analytics queries become far more efficient. 'Find all Python developers' queries a focused EmployeeSkill table instead of scanning massive combinatorial data. This directly impacts dashboard performance and reporting speed.

Example 6: When It's NOT a 4NF Violation

Scenario:

A restaurant management system tracks menu items with their ingredients and portion sizes. Each menu item has specific ingredients, and each ingredient has a specific quantity per portion size.

Initial Schema:

MenuItemRecipe(ItemID, ItemName, Ingredient, PortionSize, Quantity)

MenuItemRecipe Instance
ItemID	ItemName	Ingredient	PortionSize	Quantity
M001	Caesar Salad	Romaine Lettuce	Small	100g
M001	Caesar Salad	Romaine Lettuce	Large	200g
M001	Caesar Salad	Parmesan	Small	30g
M001	Caesar Salad	Parmesan	Large	60g
M001	Caesar Salad	Croutons	Small	20g
M001	Caesar Salad	Croutons	Large	40g

Dependency Analysis:

Functional Dependencies:

ItemID → ItemName
{ItemID, Ingredient, PortionSize} → Quantity

Potential MVDs:

Does ItemID →→ Ingredient hold independently? Let's check.
Does ItemID →→ PortionSize hold independently? Let's check.

The Critical Question: Are these independent?

At first glance, this looks like our previous examples. But examine the data:

Does every ingredient appear with every portion size? Yes (in this case)
Can we separate them?

Wait—there's a problem!

The Quantity attribute depends on the combination of Ingredient and PortionSize. If we decompose:

ItemIngredient(ItemID, Ingredient)
ItemPortionSize(ItemID, PortionSize)

We lose the Quantity information!

The quantity of lettuce in a small salad (100g) is different from a large salad (200g). This information cannot be recovered from a join of the decomposed relations.
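
The loss can be demonstrated on the instance above. The sketch below projects the rows onto the two candidate relations, joins them back, and checks the MVD ItemID →→ Ingredient with and without the Quantity column; `mvd_item_ingredient_holds` is a hypothetical helper that assumes ItemID is the first column and Ingredient the second.

```python
recipe = [
    ("M001", "Romaine Lettuce", "Small", "100g"),
    ("M001", "Romaine Lettuce", "Large", "200g"),
    ("M001", "Parmesan", "Small", "30g"),
    ("M001", "Parmesan", "Large", "60g"),
    ("M001", "Croutons", "Small", "20g"),
    ("M001", "Croutons", "Large", "40g"),
]

# Project onto the two relations a naive split would produce.
item_ingredient = {(item, ing) for item, ing, _, _ in recipe}
item_portion = {(item, size) for item, _, size, _ in recipe}

# Natural join on ItemID: the combinations come back, the quantities do not.
rejoined = {(item, ing, size)
            for item, ing in item_ingredient
            for item2, size in item_portion if item2 == item}

def mvd_item_ingredient_holds(rows):
    """Check ItemID ->> Ingredient: taking Ingredient from one row and the
    remaining columns from another row of the same item must give a row
    that is already present."""
    present = set(rows)
    return all((t1[0], t1[1]) + t2[2:] in present
               for t1 in rows for t2 in rows if t1[0] == t2[0])

with_quantity = mvd_item_ingredient_holds(recipe)
without_quantity = mvd_item_ingredient_holds(sorted(rejoined))
print(len(rejoined), with_quantity, without_quantity)  # 6 False True
```

The MVD holds only once Quantity is thrown away, which is why it cannot justify decomposing the full relation.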

Correct Analysis:

This is NOT a 4NF violation situation because:

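The MVD ItemID →→ Ingredient does NOT hold in the full relation: it would require rows that pair one ingredient with another ingredient's quantity, such as (M001, Romaine Lettuce, Small, 30g), and no such row exists
Quantity functionally depends on {ItemID, Ingredient, PortionSize}
The 'Cartesian product' appearance is not redundancy; each combination carries its own Quantity value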

Proper Schema:

The original schema is actually almost correct. We just need to address the FD:

MenuItem(ItemID PK, ItemName)
RecipeDetail(ItemID PK FK, Ingredient PK, PortionSize PK, Quantity)

MenuItem: Addresses the FD violation
RecipeDetail: Has key {ItemID, Ingredient, PortionSize}, FD to Quantity is fine (key → non-key)

Verification:

RecipeDetail: Key is {ItemID, Ingredient, PortionSize}
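Check for MVDs: ItemID →→ Ingredient? If we ignore Quantity and look only at (ItemID, Ingredient, PortionSize), every Ingredient appears with every PortionSize per Item. Once Quantity is included, though, the MVD fails, because each (Ingredient, PortionSize) pair has its own Quantity.
The candidate MVDs ItemID →→ Ingredient and ItemID →→ PortionSize do not hold in RecipeDetail, and its only nontrivial FD has the key as determinant, so there is no redundancy to remove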

This is NOT a 4NF violation. No further decomposition is appropriate.

Don't Over-Normalize

This example demonstrates a critical lesson: not every Cartesian-product-looking structure is a 4NF violation. If the combinations carry unique data (like quantities per ingredient/size pair), they're necessary. Over-normalizing would lose information.

Example 7: Partial Independence

Scenario:

A publishing company tracks authors with their books, publishers, and genres. An author can write multiple books with multiple publishers in multiple genres. However, books are tied to specific publishers (each book has one publisher), while genres are author-level (an author writes in certain genres across all books).

Initial Schema:

AuthorPublishing(AuthorID, AuthorName, BookISBN, BookTitle, PublisherID, Genre)

Dependency Analysis:

Functional Dependencies:

AuthorID → AuthorName
BookISBN → BookTitle, PublisherID (each book has one title and one publisher)

Multivalued Dependencies:

AuthorID →→ Genre (an author's genres are independent of specific books)
AuthorID →→ BookISBN? Not on its own: each book carries its dependent attributes with it, so the MVD that holds is AuthorID →→ {BookISBN, BookTitle, PublisherID}

The Partial Independence:

Here we have:

Genres: Independent of books (author-level attribute)
Books: Each carries dependent info (title, publisher)

This is partial independence—Genre is independent, but BookISBN is part of a larger structure.

4NF Issue:

If we store (AuthorID, BookISBN, BookTitle, PublisherID, Genre), each book appears with every genre:

Author writes in {Sci-Fi, Fantasy}
Author has books {B1, B2, B3}
We get 6 rows instead of 2 (genres) + 3 (books) = 5 facts
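
The count can be reproduced directly. In this small sketch the author ID "A1" is hypothetical, and the book attributes (title, publisher) are left out because they depend on BookISBN alone.

```python
from itertools import product

genres = ["Sci-Fi", "Fantasy"]
books = ["B1", "B2", "B3"]

# Single-table design: every book is paired with every genre.
# "A1" is a hypothetical author ID used only for this sketch.
combined = [("A1", book, genre) for book, genre in product(books, genres)]

# Separate storage: one row per fact.
author_book = [("A1", book) for book in books]
author_genre = [("A1", genre) for genre in genres]

print(len(combined), len(author_book) + len(author_genre))  # 6 5
```

The gap is small here, but it widens quickly: an author with 20 books and 4 genres would need 80 combined rows against 24 separate facts.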

Decomposition Strategy:

Step 1: Separate entities with FDs

Author(AuthorID, AuthorName)
Book(BookISBN, BookTitle, PublisherID)

Step 2: Create relationship tables

AuthorBook(AuthorID, BookISBN) — Which authors wrote which books
AuthorGenre(AuthorID, Genre) — Which genres each author writes in

Final 4NF Schema:

Author(AuthorID PK, AuthorName)
Book(BookISBN PK, BookTitle, PublisherID FK)
Publisher(PublisherID PK, PublisherName) -- Implied
AuthorBook(AuthorID PK FK, BookISBN PK FK)
AuthorGenre(AuthorID PK FK, Genre PK)

Verification:

Author: Key is AuthorID, single FD to AuthorName. 4NF ✓
Book: Key is BookISBN, FDs to BookTitle and PublisherID. 4NF ✓
AuthorBook: Binary relationship. 4NF ✓
AuthorGenre: Binary relationship. 4NF ✓

Key Insight:

We had to recognize that:

Genre is author-level (independent of books)
Book details are book-level (not independent)

Partial independence requires careful analysis to identify which attributes are truly independent and which are bound together.

Semantic Analysis Required

Complex scenarios require understanding the business semantics. Ask domain experts: 'Are genres assigned per author or per book?' 'Does each book have its own publisher or do authors work with publishers?' The answers determine independence and thus proper decomposition.

Summary: 4NF Examples

We've examined a diverse collection of 4NF scenarios. Let's consolidate the patterns and lessons:

Key Patterns Observed

•Product variants (color/size/market) — Classic triple independence, massive redundancy reduction possible.
•Course management (instructor/textbook) — Two independent M:N relationships on same entity.
•Patient records (allergy/medication) — Independent health facts, critical for safety.
•Supply chain (supplier/warehouse) — Business relationships that are independently managed.
•HR competencies (skill/cert/language) — Multiple independent attribute sets per person.
•Recipe details — NOT a violation when combinations carry unique data (quantity).
•Partial independence — Only some attributes are independent; requires careful analysis.

Example Summary
Example	Violation?	Key Lesson
E-Commerce	Yes	Triple independence causes cubic row growth
University	Yes	Combine FD and MVD decomposition
Healthcare	Yes	Data integrity is patient safety
Manufacturing	Yes	Supply chains scale linearly with 4NF
HR Skills	Yes	Three+ independent sets = severe redundancy
Restaurant	No	Combinations with unique data aren't violations
Publishing	Partial	Distinguish entity-level from instance-level attributes

General Methodology:

Identify all attributes — List every attribute in the relation
Determine FDs — Find single-valued dependencies
Determine MVDs — Find set-valued dependencies
Assess independence — Are the multi-valued facts truly independent?
Check triviality — Filter out trivial MVDs
Compare to keys — Are MVD determinants superkeys?
Decompose if needed — Apply the 4NF algorithm
Verify result — Confirm all relations are 4NF
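
The triviality and superkey checks in this methodology can be sketched in code. The helpers below (`closure`, `violates_4nf`) are hypothetical names, and the sketch makes a simplifying assumption: superkeys are determined from the listed FDs alone, without deriving extra FDs from interactions between FDs and MVDs.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def violates_4nf(schema, fds, mvd):
    """True if the MVD x ->> y is nontrivial and x is not a superkey of schema."""
    x, y = mvd
    trivial = y <= x or (x | y) == schema
    is_superkey = closure(x, fds) == schema
    return not trivial and not is_superkey

pid = frozenset({"ProductID"})

# ProductVariants from Example 1: ProductID ->> Color is a violation.
variants = frozenset({"ProductID", "Color", "Size", "Market"})
print(violates_4nf(variants, [], (pid, frozenset({"Color"}))))  # True

# ProductColor: the same MVD is trivial in the binary relation.
product_color = frozenset({"ProductID", "Color"})
print(violates_4nf(product_color, [], (pid, frozenset({"Color"}))))  # False

# Product: ProductID is a key, so dependencies on it are fine.
product = frozenset({"ProductID", "ProductName", "BasePrice"})
fds = [(pid, frozenset({"ProductName", "BasePrice"}))]
print(violates_4nf(product, fds, (pid, frozenset({"ProductName"}))))  # False
```

The function covers only the triviality and superkey steps; deciding which MVDs actually hold remains a semantic judgment, as Examples 6 and 7 show.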

You are now equipped to handle 4NF analysis in any domain!

Module Complete

You have completed the Fourth Normal Form module! You now understand the formal definition of 4NF, can identify and analyze MVD violations, apply the decomposition algorithm, compare 4NF with BCNF, and work through complex real-world examples. This knowledge positions you to design databases that are free from both FD and MVD-based redundancy.

4NF Examples

Learning Through Practice

Each example follows a consistent structure:

Scenario description — The real-world context
Initial schema — The relation with potential issues
Dependency analysis — Identifying FDs and MVDs
4NF assessment — Determining if violations exist
Decomposition — Resolving violations when present
Verification — Confirming the result is in 4NF

These examples will cement your understanding and prepare you for applying 4NF analysis to your own database designs.

What You Will Learn

Example 1: E-Commerce Product Catalog

Scenario:

Initial Schema:

ProductCatalog(ProductID, ProductName, Color, Size, Market, BasePrice)

ProductCatalog Instance
ProductID	ProductName	Color	Size	Market	BasePrice
P001	Classic T-Shirt	Red	S	USA	$19.99
P001	Classic T-Shirt	Red	S	EU	$19.99
P001	Classic T-Shirt	Red	M	USA	$19.99
P001	Classic T-Shirt	Red	M	EU	$19.99
P001	Classic T-Shirt	Blue	S	USA	$19.99
P001	Classic T-Shirt	Blue	S	EU	$19.99
P001	Classic T-Shirt	Blue	M	USA	$19.99
P001	Classic T-Shirt	Blue	M	EU	$19.99

Dependency Analysis:

Functional Dependencies:

ProductID → ProductName (each product has one name)
ProductID → BasePrice (each product has one base price)

Multivalued Dependencies:

ProductID →→ Color (colors independent of sizes and markets)
ProductID →→ Size (sizes independent of colors and markets)
ProductID →→ Market (markets independent of colors and sizes)

4NF Assessment:

Candidate key: {ProductID, Color, Size, Market}
Check MVD ProductID →→ Color: ProductID is NOT a superkey. 4NF Violation!
Check MVD ProductID →→ Size: ProductID is NOT a superkey. 4NF Violation!
Check MVD ProductID →→ Market: ProductID is NOT a superkey. 4NF Violation!
Check FD ProductID → ProductName: ProductID is NOT a superkey. BCNF Violation!
Check FD ProductID → BasePrice: ProductID is NOT a superkey. BCNF Violation!

Severity Analysis:

With 2 colors × 2 sizes × 2 markets = 8 rows for one product. With 10 colors × 10 sizes × 50 markets = 5,000 rows per product!

ProductName and BasePrice each repeated 5,000 times per product.

Decomposition:

Step 1: Address the FD violations first (BCNF)

Decompose on ProductID → ProductName, BasePrice:
- Product(ProductID, ProductName, BasePrice)
- ProductVariants(ProductID, Color, Size, Market)

Step 2: Address MVD violations on ProductVariants

Decompose on ProductID →→ Color:
- ProductColor(ProductID, Color)
- ProductSizeMarket(ProductID, Size, Market)

Step 3: Continue with ProductSizeMarket

Decompose on ProductID →→ Size:
- ProductSize(ProductID, Size)
- ProductMarket(ProductID, Market)

Final 4NF Schema:

Product(ProductID PK, ProductName, BasePrice)
ProductColor(ProductID PK FK, Color PK)
ProductSize(ProductID PK FK, Size PK)
ProductMarket(ProductID PK FK, Market PK)

Verification:

Product: FD ProductID → (ProductName, BasePrice). ProductID is key. BCNF ✓, 4NF ✓
ProductColor: Binary relation. Automatically 4NF ✓
ProductSize: Binary relation. Automatically 4NF ✓
ProductMarket: Binary relation. Automatically 4NF ✓

Storage Comparison:

Original: 8 rows (and 5,000 with more options) Decomposed: 1 + 2 + 2 + 2 = 7 rows (1 + 10 + 10 + 50 = 71 with more options)

Reduction: 5,000 → 71 = 98.6% reduction in row count!

Massive Improvement

Example 2: University Course Management

Scenario:

Initial Schema:

CourseInfo(CourseID, CourseName, InstructorID, InstructorName, TextbookISBN, TextbookTitle)

CourseInfo Instance
CourseID	CourseName	InstructorID	InstructorName	TextbookISBN	TextbookTitle
CS101	Intro to Programming	I001	Dr. Smith	978-0134	Python Basics
CS101	Intro to Programming	I001	Dr. Smith	978-0256	Coding Fundamentals
CS101	Intro to Programming	I002	Dr. Jones	978-0134	Python Basics
CS101	Intro to Programming	I002	Dr. Jones	978-0256	Coding Fundamentals
CS201	Data Structures	I002	Dr. Jones	978-0378	DSA Guide

Dependency Analysis:

Functional Dependencies:

CourseID → CourseName
InstructorID → InstructorName
TextbookISBN → TextbookTitle

Multivalued Dependencies:

CourseID →→ InstructorID (instructors assigned independently of textbooks)
CourseID →→ TextbookISBN (textbooks assigned independently of instructors)

4NF Assessment:

Candidate key: {CourseID, InstructorID, TextbookISBN}
All three FDs have non-superkey determinants: BCNF violations
Both MVDs have non-superkey determinant: 4NF violations

Decomposition:

This requires careful ordering to handle both FD and MVD violations:

Step 1: Decompose on CourseID → CourseName

Course(CourseID, CourseName)
R1(CourseID, InstructorID, InstructorName, TextbookISBN, TextbookTitle)

Step 2: Decompose R1 on InstructorID → InstructorName

Instructor(InstructorID, InstructorName)
R2(CourseID, InstructorID, TextbookISBN, TextbookTitle)

Step 3: Decompose R2 on TextbookISBN → TextbookTitle

Textbook(TextbookISBN, TextbookTitle)
R3(CourseID, InstructorID, TextbookISBN)

Step 4: Now R3 still has MVD violations!

CourseID →→ InstructorID in R3: Non-superkey determinant. 4NF Violation!
Decompose on CourseID →→ InstructorID:
- CourseInstructor(CourseID, InstructorID)
- CourseTextbook(CourseID, TextbookISBN)

Final 4NF Schema:

Course(CourseID PK, CourseName)
Instructor(InstructorID PK, InstructorName)
Textbook(TextbookISBN PK, TextbookTitle)
CourseInstructor(CourseID PK FK, InstructorID PK FK)
CourseTextbook(CourseID PK FK, TextbookISBN PK FK)

Verification:

Course: CourseID is key, CourseID → CourseName. BCNF ✓, 4NF ✓
Instructor: InstructorID is key, InstructorID → InstructorName. BCNF ✓, 4NF ✓
Textbook: TextbookISBN is key, TextbookISBN → TextbookTitle. BCNF ✓, 4NF ✓
CourseInstructor: Binary relation. 4NF ✓
CourseTextbook: Binary relation. 4NF ✓

Anomalies Eliminated:

Anomaly	Before	After
Add new instructor to course	Update multiple rows (one per textbook)	Insert 1 row in CourseInstructor
Change instructor name	Update everywhere name appears	Update 1 row in Instructor
Remove textbook from course	Delete multiple rows (one per instructor)	Delete 1 row in CourseTextbook
Change course name	Update in every row	Update 1 row in Course

Natural Entity Boundaries

Example 3: Healthcare Patient Records

Scenario:

Initial Schema:

PatientMedical(PatientID, PatientName, DateOfBirth, Allergy, Medication)

PatientMedical Instance
PatientID	PatientName	DateOfBirth	Allergy	Medication
P001	John Doe	1985-03-15	Penicillin	Lisinopril
P001	John Doe	1985-03-15	Penicillin	Metformin
P001	John Doe	1985-03-15	Penicillin	Aspirin
P001	John Doe	1985-03-15	Sulfa	Lisinopril
P001	John Doe	1985-03-15	Sulfa	Metformin
P001	John Doe	1985-03-15	Sulfa	Aspirin

Dependency Analysis:

Functional Dependencies:

PatientID → PatientName, DateOfBirth

Multivalued Dependencies:

PatientID →→ Allergy (allergies recorded independently of medications)
PatientID →→ Medication (medications prescribed independently of allergies)

Critical Observation:

Allergies and medications ARE semantically independent:

Allergies are historical facts about patient physiology
Medications are current prescriptions based on conditions
A patient's allergy list doesn't determine their medication list

Yet in this schema, we store every combination, creating the Cartesian product.

4NF Assessment:

Candidate key: {PatientID, Allergy, Medication}
FD PatientID → (PatientName, DateOfBirth): Non-superkey determinant. BCNF Violation!
MVD PatientID →→ Allergy: Non-superkey determinant. 4NF Violation!
MVD PatientID →→ Medication: Non-superkey determinant. 4NF Violation!

Decomposition:

Step 1: Decompose on FD PatientID → (PatientName, DateOfBirth)

Patient(PatientID, PatientName, DateOfBirth)
PatientAllergyMed(PatientID, Allergy, Medication)

Step 2: Decompose PatientAllergyMed on MVD PatientID →→ Allergy

PatientAllergy(PatientID, Allergy)
PatientMedication(PatientID, Medication)

Final 4NF Schema:

Patient(PatientID PK, PatientName, DateOfBirth)
PatientAllergy(PatientID PK FK, Allergy PK)
PatientMedication(PatientID PK FK, Medication PK)

Verification:

Patient: PatientID is key. BCNF ✓, 4NF ✓
PatientAllergy: Binary, automatically 4NF ✓
PatientMedication: Binary, automatically 4NF ✓

Healthcare-Specific Benefits:

Safety: No risk of forgetting to add allergy info for each new medication
Accuracy: Patient demographics updated in one place
Audit trail: Each fact change is isolated and trackable
Query efficiency: Find all patients allergic to Penicillin with simple index scan
Data entry: Add allergy without knowing medications, and vice versa

Critical in Healthcare

Example 4: Manufacturing Supply Chain

Scenario:

Initial Schema:

ComponentSourcing(ComponentID, ComponentName, SupplierID, SupplierName, WarehouseID, WarehouseLocation)

ComponentSourcing Instance
ComponentID	ComponentName	SupplierID	SupplierName	WarehouseID	WarehouseLocation
C001	Steel Bolt M10	S001	FastenCo	W001	Chicago
C001	Steel Bolt M10	S001	FastenCo	W002	Detroit
C001	Steel Bolt M10	S002	BoltWorks	W001	Chicago
C001	Steel Bolt M10	S002	BoltWorks	W002	Detroit
C001	Steel Bolt M10	S003	MetalMax	W001	Chicago
C001	Steel Bolt M10	S003	MetalMax	W002	Detroit

Dependency Analysis:

Functional Dependencies:

ComponentID → ComponentName
SupplierID → SupplierName
WarehouseID → WarehouseLocation

Multivalued Dependencies:

ComponentID →→ SupplierID (approved suppliers independent of storage locations)
ComponentID →→ WarehouseID (storage locations independent of suppliers)

4NF Assessment:

Candidate key: {ComponentID, SupplierID, WarehouseID}
All FDs have non-superkey determinants: BCNF Violations
Both MVDs have non-superkey determinants: 4NF Violations

Business Impact of Violations:

With 100 components, each with an average of 5 suppliers and 10 warehouses:

Single relation (4NF violation): 100 × 5 × 10 = 5,000 rows
Proper 4NF: 100 (components) + 500 (approvals) + 1,000 (stocking) = 1,600 rows, plus one row per distinct supplier and warehouse

The bloated schema wastes storage and makes updates error-prone.

Decomposition:

Step 1: Extract entities (address FD violations)

Component(ComponentID, ComponentName)
Supplier(SupplierID, SupplierName)
Warehouse(WarehouseID, WarehouseLocation)
R1(ComponentID, SupplierID, WarehouseID)

Step 2: Decompose R1 on MVD ComponentID →→ SupplierID

ComponentSupplier(ComponentID, SupplierID)
ComponentWarehouse(ComponentID, WarehouseID)
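
One way to sanity-check Step 2 is to project the sample instance onto the two new relations and confirm that their natural join gives back exactly the original rows. The sketch below does this in plain Python for the ID columns of the ComponentSourcing instance shown earlier. The `project` and `natural_join` helpers are illustrative names, not part of any library.

```python
# Rows of R1(ComponentID, SupplierID, WarehouseID), taken from the sample instance.
r1 = {
    ("C001", "S001", "W001"), ("C001", "S001", "W002"),
    ("C001", "S002", "W001"), ("C001", "S002", "W002"),
    ("C001", "S003", "W001"), ("C001", "S003", "W002"),
}

def project(rows, indexes):
    """Project a set of tuples onto the given column positions (duplicates collapse)."""
    return {tuple(row[i] for i in indexes) for row in rows}

def natural_join(left, right):
    """Join two binary relations on their first column (ComponentID)."""
    return {(c, s, w) for (c, s) in left for (c2, w) in right if c == c2}

component_supplier = project(r1, (0, 1))    # ComponentSupplier(ComponentID, SupplierID)
component_warehouse = project(r1, (0, 2))   # ComponentWarehouse(ComponentID, WarehouseID)

# The decomposition is lossless on this instance if the join rebuilds R1 exactly.
rejoined = natural_join(component_supplier, component_warehouse)
lossless = rejoined == r1
print(len(component_supplier), len(component_warehouse), lossless)
```

On this instance the six original rows collapse to three approval rows and two stocking rows, and the join reproduces the original set. A check on one instance illustrates the lossless property; it does not prove it for every valid database state.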

Final 4NF Schema:

Component(ComponentID PK, ComponentName)
Supplier(SupplierID PK, SupplierName)
Warehouse(WarehouseID PK, WarehouseLocation)
ComponentSupplier(ComponentID PK FK, SupplierID PK FK) -- Approved suppliers
ComponentWarehouse(ComponentID PK FK, WarehouseID PK FK) -- Stocking locations

Operational Benefits:

Operation	Before 4NF	After 4NF
Approve new supplier	Insert 10 rows (one per warehouse)	Insert 1 row
Add warehouse location	Insert 5 rows (one per supplier)	Insert 1 row
Revoke supplier approval	Delete 10 rows	Delete 1 row
Query: All suppliers for C001	Scan 50 rows, deduplicate	Scan 5 rows
Update supplier name	Update 10 rows for each component the supplier is approved for	Update 1 row
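
The "Approve new supplier" row can be illustrated on the sample instance, which records two warehouses for C001 rather than the ten assumed in the table. The following is a minimal sketch, with `S004` as a hypothetical new supplier and `rows_to_approve_supplier` as an illustrative helper: in the single-relation design, the new supplier has to be paired with every warehouse already recorded for the component, while in the 4NF design one row is enough.

```python
# Unnormalized rows as (ComponentID, SupplierID, WarehouseID) from the sample instance.
component_sourcing = {
    (c, s, w)
    for c in ["C001"]
    for s in ["S001", "S002", "S003"]
    for w in ["W001", "W002"]
}

def rows_to_approve_supplier(rows, component_id, supplier_id):
    """Rows that must be inserted so the new supplier is paired with every
    warehouse already recorded for the component (keeps the MVDs satisfied)."""
    warehouses = {w for (c, _, w) in rows if c == component_id}
    return {(component_id, supplier_id, w) for w in warehouses}

# S004 is a hypothetical supplier used only for this illustration.
before_4nf = rows_to_approve_supplier(component_sourcing, "C001", "S004")
after_4nf = {("C001", "S004")}  # a single ComponentSupplier row

print(len(before_4nf), len(after_4nf))
```

With two warehouses the single-relation design needs two inserts; with the ten warehouses assumed in the table it would need ten.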

Example 5: Human Resources Skills Matrix

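Scenario:

An HR system records each employee's skills, certifications, and spoken languages. The three sets are independent of one another.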

Initial Schema:

EmployeeCompetencies(EmpID, EmpName, Skill, Certification, Language)

EmployeeCompetencies Instance (Partial)
EmpID	EmpName	Skill	Certification	Language
E001	Alice	Python	AWS-SAA	English
E001	Alice	Python	AWS-SAA	Spanish
E001	Alice	Python	PMP	English
E001	Alice	Python	PMP	Spanish
E001	Alice	Java	AWS-SAA	English
E001	Alice	Java	AWS-SAA	Spanish
E001	Alice	Java	PMP	English
E001	Alice	Java	PMP	Spanish

Dependency Analysis:

Functional Dependencies:

EmpID → EmpName

Multivalued Dependencies:

EmpID →→ Skill (skills independent of certs and languages)
EmpID →→ Certification (certs independent of skills and languages)
EmpID →→ Language (languages independent of skills and certs)

Severity Analysis:

Alice has: 2 skills × 2 certifications × 2 languages = 8 rows

A senior employee with 15 skills, 5 certifications, 3 languages: 15 × 5 × 3 = 225 rows for ONE person!

For 1,000 employees with similar profiles: 225,000 rows vs. roughly 24,000 rows when properly normalized (24 per employee, counting the Employee row).

4NF Assessment:

Candidate key: {EmpID, Skill, Certification, Language}
FD EmpID → EmpName: Non-superkey determinant. BCNF Violation
All three MVDs have non-superkey determinants: 4NF Violations

Decomposition:

This is a three-way independent MVD situation requiring four relations:

Step 1: Extract Employee entity

Employee(EmpID, EmpName)
R1(EmpID, Skill, Certification, Language)

Step 2: Decompose R1 on EmpID →→ Skill

EmployeeSkill(EmpID, Skill)
R2(EmpID, Certification, Language)

Step 3: Decompose R2 on EmpID →→ Certification

EmployeeCertification(EmpID, Certification)
EmployeeLanguage(EmpID, Language)
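
The steps above can be replayed on Alice's rows. The sketch below is one possible way to do it in Python. `split_on_mvd` is a hypothetical helper that splits a relation on X →→ Y into the projection on X ∪ Y and the projection on the remaining attributes. It assumes the MVD has already been established and does not verify it.

```python
def split_on_mvd(rows, x, y):
    """Split a relation (list of dicts) on the MVD x ->> y.

    Returns the projection on x + y and the projection on every attribute
    not in y. Assumes the MVD has already been verified.
    """
    attrs = list(rows[0].keys())
    rest = [a for a in attrs if a not in y]

    def project(names):
        seen = {tuple(row[a] for a in names) for row in rows}
        return [dict(zip(names, values)) for values in sorted(seen)]

    return project(list(x) + list(y)), project(rest)

# R1 for Alice: every skill paired with every certification and language.
r1 = [
    {"EmpID": "E001", "Skill": skill, "Certification": cert, "Language": lang}
    for skill in ("Python", "Java")
    for cert in ("AWS-SAA", "PMP")
    for lang in ("English", "Spanish")
]

# Step 2: split R1 on EmpID ->> Skill.
employee_skill, r2 = split_on_mvd(r1, ["EmpID"], ["Skill"])
# Step 3: split R2 on EmpID ->> Certification.
employee_cert, employee_language = split_on_mvd(r2, ["EmpID"], ["Certification"])

print(len(r1), len(employee_skill), len(employee_cert), len(employee_language))
```

Alice's eight combined rows reduce to two rows in each of the three binary relations.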

Final 4NF Schema:

Employee(EmpID PK, EmpName)
EmployeeSkill(EmpID PK FK, Skill PK)
EmployeeCertification(EmpID PK FK, Certification PK)
EmployeeLanguage(EmpID PK FK, Language PK)

Row Count Comparison:

Scenario	Original	4NF
Alice (2-2-2)	8 rows	1+2+2+2 = 7 rows
Senior (15-5-3)	225 rows	1+15+5+3 = 24 rows
1000 employees	~225,000 rows	~24,000 rows

Reduction: roughly 89% fewer rows (24,000 vs. 225,000)!
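
The counts in the table come down to a product versus a sum. A small sketch of that arithmetic follows; the function names are illustrative, and the 1,000-employee case assumes every employee has the senior profile.

```python
from math import prod

def rows_single_relation(set_sizes):
    """Rows per employee when every combination is stored: the product of the set sizes."""
    return prod(set_sizes)

def rows_4nf(set_sizes):
    """Rows per employee after decomposition: one Employee row plus one row per fact."""
    return 1 + sum(set_sizes)

alice = (2, 2, 2)      # skills, certifications, languages
senior = (15, 5, 3)

print(rows_single_relation(alice), rows_4nf(alice))      # 8 7
print(rows_single_relation(senior), rows_4nf(senior))    # 225 24

# Share of rows eliminated for 1,000 employees with the senior profile.
employees = 1000
reduction = 1 - (employees * rows_4nf(senior)) / (employees * rows_single_relation(senior))
print(f"{reduction:.1%}")   # 89.3%
```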

Example 6: When It's NOT a 4NF Violation

Scenario:

A restaurant management system tracks menu items with their ingredients and portion sizes. Each menu item has specific ingredients, and each ingredient has a specific quantity per portion size.

Initial Schema:

MenuItemRecipe(ItemID, ItemName, Ingredient, PortionSize, Quantity)

MenuItemRecipe Instance
ItemID	ItemName	Ingredient	PortionSize	Quantity
M001	Caesar Salad	Romaine Lettuce	Small	100g
M001	Caesar Salad	Romaine Lettuce	Large	200g
M001	Caesar Salad	Parmesan	Small	30g
M001	Caesar Salad	Parmesan	Large	60g
M001	Caesar Salad	Croutons	Small	20g
M001	Caesar Salad	Croutons	Large	40g

Dependency Analysis:

Functional Dependencies:

ItemID → ItemName
{ItemID, Ingredient, PortionSize} → Quantity

Potential MVDs:

Does ItemID →→ Ingredient hold independently? Let's check.
Does ItemID →→ PortionSize hold independently? Let's check.

The Critical Question: Are these independent?

At first glance, this looks like our previous examples. But examine the data:

Does every ingredient appear with every portion size? Yes (in this case)
Can we separate them?

Wait—there's a problem!

The Quantity attribute depends on the combination of Ingredient and PortionSize. If we decompose:

ItemIngredient(ItemID, Ingredient)
ItemPortionSize(ItemID, PortionSize)

We lose the Quantity information!

The quantity of lettuce in a small salad (100g) is different from a large salad (200g). This information cannot be recovered from a join of the decomposed relations.

Correct Analysis:

This is NOT a 4NF violation situation because:

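The MVD ItemID →→ Ingredient does NOT hold in this relation: it would require each ingredient to pair with every (PortionSize, Quantity) combination of the item, but Quantity differs per ingredient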
Quantity functionally depends on {ItemID, Ingredient, PortionSize}
The 'Cartesian product' appearance is not redundancy—it's genuinely required data

Proper Schema:

The original schema is almost correct. We only need to address the FD ItemID → ItemName:

MenuItem(ItemID PK, ItemName)
RecipeDetail(ItemID PK FK, Ingredient PK, PortionSize PK, Quantity)

MenuItem: Addresses the FD violation
RecipeDetail: Has key {ItemID, Ingredient, PortionSize}, FD to Quantity is fine (key → non-key)

Verification:

RecipeDetail: Key is {ItemID, Ingredient, PortionSize}
Check for MVDs: ItemID →→ Ingredient? Ignoring Quantity, every Ingredient does appear with every PortionSize for an item. But an MVD must hold over all remaining attributes, including Quantity: it would require rows such as (M001, Romaine Lettuce, Large, 60g), which do not exist.
No nontrivial MVD holds, so the combinations are not redundant: each one carries its own Quantity value

This is NOT a 4NF violation. No further decomposition is appropriate.
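
This conclusion can be checked mechanically against the definition of an MVD: X →→ Y holds if, for any two rows that agree on X, the row that takes its Y values from the first and all remaining values from the second is also present. The sketch below applies that swap test to the sample instance. `mvd_holds` is an illustrative helper, and a test on a single instance can refute an MVD but cannot prove it for all valid data.

```python
from itertools import product

ATTRS = ("ItemID", "Ingredient", "PortionSize", "Quantity")

recipe_detail = {
    ("M001", "Romaine Lettuce", "Small", "100g"),
    ("M001", "Romaine Lettuce", "Large", "200g"),
    ("M001", "Parmesan", "Small", "30g"),
    ("M001", "Parmesan", "Large", "60g"),
    ("M001", "Croutons", "Small", "20g"),
    ("M001", "Croutons", "Large", "40g"),
}

def mvd_holds(rows, attrs, x, y):
    """Return True if x ->> y holds in this instance (swap test from the MVD definition)."""
    xi = [attrs.index(a) for a in x]
    yi = [attrs.index(a) for a in y]
    for t1, t2 in product(rows, repeat=2):
        if all(t1[i] == t2[i] for i in xi):
            # Take Y from t1 and everything else from t2; this row must exist too.
            t3 = tuple(t1[i] if i in yi else t2[i] for i in range(len(attrs)))
            if t3 not in rows:
                return False
    return True

# With Quantity in the relation, the swap produces rows such as
# ("M001", "Romaine Lettuce", "Large", "60g"), which are not in the data.
with_quantity = mvd_holds(recipe_detail, ATTRS, ["ItemID"], ["Ingredient"])

# Without Quantity, ingredients and portion sizes do form a full cross product.
no_quantity = {row[:3] for row in recipe_detail}
without_quantity = mvd_holds(no_quantity, ATTRS[:3], ["ItemID"], ["Ingredient"])

print(with_quantity, without_quantity)
```

The MVD fails once Quantity is part of the relation and holds only on the projection that drops it, which is exactly the projection that would lose the recipe quantities.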

Example 7: Partial Independence

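Scenario:

A publishing database records authors, the books they have written (each book with one title and one publisher), and the genres each author writes in.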

Initial Schema:

AuthorPublishing(AuthorID, AuthorName, BookISBN, BookTitle, PublisherID, Genre)

Dependency Analysis:

Functional Dependencies:

AuthorID → AuthorName
BookISBN → BookTitle, PublisherID (each book has one title and one publisher)

Multivalued Dependencies:

AuthorID →→ Genre (an author's genres are independent of specific books)
AuthorID →→ BookISBN? Not exactly—books carry additional dependent info (title, publisher)

The Partial Independence:

Here we have:

Genres: Independent of books (author-level attribute)
Books: Each carries dependent info (title, publisher)

This is partial independence—Genre is independent, but BookISBN is part of a larger structure.

4NF Issue:

If we store (AuthorID, BookISBN, BookTitle, PublisherID, Genre), each book appears with every genre:

Author writes in {Sci-Fi, Fantasy}
Author has books {B1, B2, B3}
We get 2 × 3 = 6 rows to record only 2 (genres) + 3 (books) = 5 facts, and the gap widens with every additional book or genre

Decomposition Strategy:

Step 1: Separate entities with FDs

Author(AuthorID, AuthorName)
Book(BookISBN, BookTitle, PublisherID)

Step 2: Create relationship tables

AuthorBook(AuthorID, BookISBN) — Which authors wrote which books
AuthorGenre(AuthorID, Genre) — Which genres each author writes in

Final 4NF Schema:

Author(AuthorID PK, AuthorName)
Book(BookISBN PK, BookTitle, PublisherID FK)
Publisher(PublisherID PK, PublisherName) -- Implied
AuthorBook(AuthorID PK FK, BookISBN PK FK)
AuthorGenre(AuthorID PK FK, Genre PK)

Verification:

Author: Key is AuthorID, single FD to AuthorName. 4NF ✓
Book: Key is BookISBN, FDs to BookTitle and PublisherID. 4NF ✓
AuthorBook: Binary relationship. 4NF ✓
AuthorGenre: Binary relationship. 4NF ✓

Key Insight:

We had to recognize that:

Genre is author-level (independent of books)
Book details are book-level (not independent)

Partial independence requires careful analysis to identify which attributes are truly independent and which are bound together.

Summary: 4NF Examples

We've examined a diverse collection of 4NF scenarios. Let's consolidate the patterns and lessons:

Key Patterns Observed

Product variants (color/size/market) — Classic triple independence, massive redundancy reduction possible.
Course management (instructor/textbook) — Two independent M:N relationships on same entity.
Patient records (allergy/medication) — Independent health facts, critical for safety.
Supply chain (supplier/warehouse) — Business relationships that are independently managed.
HR competencies (skill/cert/language) — Multiple independent attribute sets per person.
Recipe details — NOT a violation when combinations carry unique data (quantity).
Partial independence — Only some attributes are independent; requires careful analysis.

Example Summary
Example	Violation?	Key Lesson
E-Commerce	Yes	Triple independence causes multiplicative row growth
University	Yes	Combine FD and MVD decomposition
Healthcare	Yes	Data integrity is patient safety
Manufacturing	Yes	Supply chains scale linearly with 4NF
HR Skills	Yes	Three+ independent sets = severe redundancy
Restaurant	No	Combinations with unique data aren't violations
Publishing	Partial	Distinguish entity-level from instance-level attributes

General Methodology:

Identify all attributes — List every attribute in the relation
Determine FDs — Find single-valued dependencies
Determine MVDs — Find set-valued dependencies
Assess independence — Are the multi-valued facts truly independent?
Check triviality — Filter out trivial MVDs
Compare to keys — Are MVD determinants superkeys?
Decompose if needed — Apply the 4NF algorithm
Verify result — Confirm all relations are 4NF

You are now equipped to handle 4NF analysis in any domain!
