Loading content...
In medicine, diagnosis precedes treatment. A physician doesn't prescribe medication without first understanding what's wrong. Database design works the same way. Before you can normalize a schema—decompose it, restructure it, improve it—you must first diagnose its current condition: What normal form does it satisfy? Where exactly does it fall short?
This diagnostic skill separates superficial database knowledge from deep understanding. Many engineers can recite the definitions of 1NF through BCNF, but far fewer can reliably analyze a given relation and pinpoint precisely which normal forms it satisfies and which violations exist. In interviews, this analytical precision is exactly what distinguishes strong candidates.
By the end of this page, you will possess a systematic methodology for analyzing any relation against all standard normal forms (1NF, 2NF, 3NF, BCNF, and higher). You'll understand not just the definitions, but the reasoning behind each check, enabling you to explain your analysis clearly in interviews and design reviews. You'll also learn the common edge cases and traps that cause candidates to give incorrect answers.
Normal forms are nested requirements—each higher form includes all requirements of lower forms plus additional constraints. Understanding this hierarchy is essential for systematic analysis.
The Progression:
1NF → 2NF → 3NF → BCNF → 4NF → 5NF
Each arrow means "plus additional requirements."
Key Insight: If a relation violates 2NF, it also technically violates 3NF, BCNF, etc. (since 2NF is a prerequisite). Therefore, when analyzing, we work from bottom to top—first check 1NF, then 2NF, etc. Stop at the first violation; that's the highest normal form achieved.
| Normal Form | Core Requirement | What It Eliminates | Prerequisites |
|---|---|---|---|
| 1NF | Atomic values, unique rows | Repeating groups, multi-valued attributes | None (base requirement) |
| 2NF | No partial dependencies | Attributes depending on part of key | 1NF |
| 3NF | No transitive dependencies (from non-keys) | Non-key → Non-key dependencies | 2NF |
| BCNF | Every determinant is a superkey | Non-superkey determinants | 3NF |
| 4NF | No non-trivial MVDs (except superkeys) | Independent multivalued facts | BCNF |
| 5NF (PJNF) | No join dependencies (except from keys) | Cyclic join dependencies | 4NF |
In most database interviews, 1NF through BCNF are the focus. 4NF and 5NF are rarely tested unless the role specifically involves advanced database design. Master 1NF–BCNF thoroughly before worrying about higher forms.
1NF is the foundation—the minimum requirement for a relation to even be called "relational." Yet violations are surprisingly common in practice, especially when mapping from non-relational sources.
1NF Requirements:
How to Check for 1NF Violations:
Example 1 - Multi-valued attribute:
Student(ID, Name, PhoneNumbers)
- Row: (1, 'Alice', '555-1234, 555-5678')
Example 2 - Repeating groups:
Order(OrderID, Product1, Qty1, Product2, Qty2, Product3, Qty3)
Example 3 - Nested structure:
Employee(ID, Name, Address{Street, City, State, ZIP})**Violations:**
Example 1: PhoneNumbers contains multiple values in one cell → **Violates 1NF**
Fix: Create separate Phone table (StudentID, PhoneNumber)
Example 2: Product1/Qty1, Product2/Qty2, etc. are repeating groups → **Violates 1NF**
Fix: Create OrderLine(OrderID, ProductID, Quantity)
Example 3: Address is a composite/nested structure → **Violates 1NF** (depending on interpretation)
Fix: Flatten to separate columns (Street, City, State, ZIP)Modern databases support JSON and array types. Does storing [1, 2, 3] in a column violate 1NF? Technically yes by classic theory. Practically, it depends on query patterns. If you never need to query individual array elements, storage atomicity is less critical. In interviews, acknowledge both perspectives—show you understand the theory while recognizing practical flexibility.
1NF Analysis Checklist:
If any check fails → Not in 1NF (stop here, no need to check higher forms)
2NF addresses partial dependencies—when a non-key attribute depends on only part of a composite primary key. This is only relevant when the primary key has multiple attributes.
Critical Observation: If the primary key is a single attribute, the relation is automatically in 2NF (assuming 1NF). Partial dependencies only exist with composite keys.
2NF Definition:
A relation is in 2NF iff:
Full Functional Dependency: X → A is a full FD if removing any attribute from X breaks the dependency. Partial dependency means some proper subset of X also determines A.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546
/** * 2NF Violation Detection Algorithm * Input: Relation R, Set of FDs F, Candidate Key(s) K * Output: List of partial dependencies (2NF violations) */function detect2NFViolations(R, F, candidateKeys) { let violations = []; // If all candidate keys are single attributes, no 2NF violations possible if (candidateKeys.every(k => k.attributes.length === 1)) { return []; // Automatically in 2NF } // Identify all prime attributes (part of any candidate key) let primeAttrs = new Set(); for (const key of candidateKeys) { for (const attr of key.attributes) { primeAttrs.add(attr); } } // Identify non-prime attributes let nonPrimeAttrs = R.attributes.filter(a => !primeAttrs.has(a)); // For each non-prime attribute for (const attr of nonPrimeAttrs) { // For each candidate key for (const key of candidateKeys) { if (key.attributes.length > 1) { // Check if any proper subset of key determines attr for (const subset of properSubsets(key.attributes)) { let subsetClosure = computeClosure(subset, F); if (subsetClosure.has(attr)) { violations.push({ attribute: attr, partialDeterminant: subset, fullKey: key }); } } } } } return violations;}StudentCourse(StudentID, CourseID, StudentName, CourseName, Grade)
FDs:
- StudentID → StudentName
- CourseID → CourseName
- {StudentID, CourseID} → Grade
Primary Key: {StudentID, CourseID}**Step 1: Verify 1NF**
All attributes are atomic, PK defined → ✅ In 1NF
**Step 2: Identify prime and non-prime attributes**
Prime: {StudentID, CourseID}
Non-prime: {StudentName, CourseName, Grade}
**Step 3: Check for partial dependencies**
• StudentName: Determined by StudentID alone (proper subset of key)
**Partial Dependency Found** → 2NF Violation
• CourseName: Determined by CourseID alone (proper subset of key)
**Partial Dependency Found** → 2NF Violation
• Grade: Requires full key {StudentID, CourseID}
Full dependency → No violation
**Conclusion: NOT in 2NF**
Violating FDs:
- StudentID → StudentName (partial)
- CourseID → CourseName (partial)2NF only concerns non-prime attributes. A prime attribute (one that's part of any candidate key) can have partial dependencies without violating 2NF. This subtlety is sometimes tested in interviews. Example: If {A, B} and {C, D} are both candidate keys, and C depends on A, this doesn't violate 2NF because C is prime.
3NF addresses transitive dependencies—when a non-key attribute depends on another non-key attribute, creating an indirect path from the key.
3NF Definition (Standard):
A relation is in 3NF iff for every non-trivial FD X → A:
Alternative (Codd's Original):
A relation is in 3NF iff:
Transitive Dependency: A is transitively dependent on key K if:
Understanding the 3NF Definition:
The two conditions ("X is superkey" OR "A is prime") might seem odd. Here's the intuition:
If X is a superkey: The dependency is fine—keys are supposed to determine everything.
If A is prime: Even if X isn't a superkey, we tolerate this because prime attributes have special status (they help identify tuples).
The violation case: X is NOT a superkey AND A is NOT prime. This is the transitive dependency pattern.
Employee(EmpID, EmpName, DeptID, DeptName, DeptLocation)
FDs:
- EmpID → EmpName, DeptID
- DeptID → DeptName, DeptLocation
Candidate Key: {EmpID}**Step 1: Verify 2NF**
Key is single attribute → Automatically in 2NF ✅
**Step 2: List all non-trivial FDs and analyze**
**FD: EmpID → EmpName**
- Is EmpID a superkey? Yes ✅
- No violation
**FD: EmpID → DeptID**
- Is EmpID a superkey? Yes ✅
- No violation
**FD: DeptID → DeptName**
- Is DeptID a superkey? No (DeptID⁺ = {DeptID, DeptName, DeptLocation} ≠ all attributes)
- Is DeptName prime? No (not part of any candidate key)
- **3NF Violation!**
**FD: DeptID → DeptLocation**
- Is DeptID a superkey? No
- Is DeptLocation prime? No
- **3NF Violation!**
**Conclusion: NOT in 3NF**
Transitive dependencies:
- EmpID → DeptID → DeptName
- EmpID → DeptID → DeptLocationBCNF is the "purest" form of the classical normal forms. It's stricter than 3NF and eliminates the "prime attribute exception."
BCNF Definition:
A relation is in BCNF iff for every non-trivial FD X → A:
That's it. No exceptions for prime attributes.
Comparison with 3NF:
| Condition | 3NF | BCNF |
|---|---|---|
| X → A where X is superkey | ✅ OK | ✅ OK |
| X → A where A is prime | ✅ OK | ❌ Violation |
| X → A where X not superkey AND A not prime | ❌ Violation | ❌ Violation |
A relation can be in 3NF but not BCNF. This happens when a non-superkey determines a prime attribute. Classic example: multiple overlapping candidate keys where attributes of one key are determined by attributes of another.
TeachingAssignment(Student, Subject, Teacher)
Constraints:
- Each student takes each subject from exactly one teacher
- Each teacher teaches exactly one subject
- Each subject can be taught by multiple teachers
FDs:
- {Student, Subject} → Teacher
- Teacher → Subject
Candidate Keys: {Student, Subject} and {Student, Teacher}**Step 1: Check 3NF**
**FD: {Student, Subject} → Teacher**
- Is {Student, Subject} a superkey? Yes ✅
- OK for 3NF
**FD: Teacher → Subject**
- Is Teacher a superkey? No (Teacher⁺ = {Teacher, Subject} ≠ all attributes)
- Is Subject prime? **Yes!** (part of candidate key {Student, Subject})
- **OK for 3NF** (prime attribute exception applies)
**Conclusion: IN 3NF ✅**
**Step 2: Check BCNF**
**FD: Teacher → Subject**
- Is Teacher a superkey? No
- **BCNF Violation!** (no prime attribute exception in BCNF)
**Conclusion: NOT in BCNF ❌**When BCNF Differs from 3NF:
The key pattern is overlapping candidate keys. If two candidate keys share some but not all attributes, you often get situations where an attribute of one key determines an attribute of another key without being a full superkey.
The Trade-off:
BCNF is "cleaner" but sometimes decomposing to BCNF loses dependency preservation. 3NF decomposition always preserves dependencies. This is why 3NF is sometimes preferred in practice—we'll explore this trade-off in Page 3.
123456789101112131415161718192021222324252627282930
/** * BCNF Violation Detection * Input: Relation R, Set of FDs F * Output: List of BCNF violations */function checkBCNF(R, F) { let violations = []; for (const fd of F) { if (!isTrivial(fd)) { // X → A where A ⊆ X is trivial // Compute closure of left side let leftClosure = computeClosure(fd.leftSide, F); // If closure doesn't include all attributes of R, // left side is not a superkey if (!leftClosure.equals(R.attributes)) { violations.push({ fd: fd, reason: fd.leftSide + " is not a superkey" }); } } } return violations;} // Note: For BCNF, we check EVERY non-trivial FD.// The determinant must ALWAYS be a superkey.// No exceptions for prime attributes (unlike 3NF).While 4NF and 5NF are rarely the focus of interviews, understanding them demonstrates advanced knowledge and is occasionally tested for senior roles.
Fourth Normal Form (4NF)
4NF addresses multivalued dependencies (MVDs)—situations where two or more independent multivalued facts are stored in the same relation.
4NF Definition:
A relation is in 4NF iff for every non-trivial MVD X ↠ Y:
Multivalued Dependency (MVD): X ↠ Y means that for each X value, the set of Y values is independent of other attributes. The formal definition: If tuples (x, y1, z1) and (x, y2, z2) exist, then (x, y1, z2) and (x, y2, z1) must also exist.
When 4NF Matters:
When a relation stores two independent one-to-many relationships from the same key.
CourseInfo(CourseID, Instructor, Textbook)
Constraints:
- A course can have multiple instructors
- A course can have multiple textbooks
- Instructor and Textbook are independent choices
Sample Data:
CourseID | Instructor | Textbook
---------|------------|----------
CS101 | Prof. A | Book X
CS101 | Prof. A | Book Y
CS101 | Prof. B | Book X
CS101 | Prof. B | Book Y**Analysis:**
MVDs present:
- CourseID ↠ Instructor (instructors are independent of textbook choice)
- CourseID ↠ Textbook (textbooks are independent of instructor choice)
Is CourseID a superkey? **No.** (CourseID alone doesn't determine a unique tuple)
**4NF Violation!**
**Problem:** Massive redundancy. Adding a third instructor or textbook requires many new rows.
**Solution:** Decompose into:
- CourseInstructor(CourseID, Instructor)
- CourseTextbook(CourseID, Textbook)
Now each fact is stored once, and the two are independent.Fifth Normal Form (5NF / Project-Join Normal Form)
5NF addresses join dependencies—situations where a relation can only be losslessly reconstructed by joining three or more projections.
5NF Definition:
A relation is in 5NF iff every join dependency is implied by the candidate keys.
This is highly theoretical. 5NF problems involve cyclic constraints that can't be expressed as FDs or MVDs.
Practical Relevance:
In practice, 4NF violations are rare; 5NF violations are rarer still. If you encounter one in an interview, it's likely a trick question designed to see if you recognize the pattern or if you overly apply normalization.
If asked about 4NF or 5NF:
Let's consolidate everything into a systematic framework you can apply to any relation in an interview.
The Complete NF Analysis Process:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566
function analyzeNormalForm(R, F, MVDs = []) { // STEP 1: Find all candidate keys let candidateKeys = findCandidateKeys(R, F); let primeAttrs = union(candidateKeys); let nonPrimeAttrs = R.attributes.minus(primeAttrs); // STEP 2: Check 1NF if (hasMultiValuedAttributes(R) || hasRepeatingGroups(R) || hasNoPrimaryKey(R)) { return "NOT IN 1NF"; } // STEP 3: Check 2NF (only if composite keys exist) if (candidateKeys.some(k => k.size > 1)) { for (const fd of F) { if (nonPrimeAttrs.contains(fd.rightSide)) { for (const key of candidateKeys) { if (key.size > 1 && isProperSubset(fd.leftSide, key)) { return "IN 1NF, NOT IN 2NF (partial dependency: " + fd + ")"; } } } } } // STEP 4: Check 3NF for (const fd of F) { if (!isTrivial(fd)) { let leftIsSuperkey = computeClosure(fd.leftSide, F) .equals(R.attributes); let rightIsPrime = primeAttrs.contains(fd.rightSide); if (!leftIsSuperkey && !rightIsPrime) { return "IN 2NF, NOT IN 3NF (transitive: " + fd + ")"; } } } // STEP 5: Check BCNF for (const fd of F) { if (!isTrivial(fd)) { let leftIsSuperkey = computeClosure(fd.leftSide, F) .equals(R.attributes); if (!leftIsSuperkey) { return "IN 3NF, NOT IN BCNF (" + fd.leftSide + " is not superkey)"; } } } // STEP 6: Check 4NF (if MVDs provided) for (const mvd of MVDs) { if (!isTrivial(mvd)) { let leftIsSuperkey = computeClosure(mvd.leftSide, F) .equals(R.attributes); if (!leftIsSuperkey) { return "IN BCNF, NOT IN 4NF (MVD: " + mvd + ")"; } } } // STEP 7: Assume 5NF if no violations found return "IN BCNF" + (MVDs.length > 0 ? " AND 4NF" : "");}Interviewers often include subtle traps to test deep understanding. Here are the most common ones:
R(A, B, C)
FDs: A → B, B → C, C → A
Determine the normal form.**Step 1: Find candidate keys**
- A⁺ = {A, B, C} → A is a key
- B⁺ = {B, C, A} → B is a key
- C⁺ = {C, A, B} → C is a key
Candidate keys: {A}, {B}, {C}
**Step 2: Identify prime vs non-prime**
Prime: {A, B, C} (all attributes!)
Non-prime: {} (none!)
**Step 3: Check 3NF**
Every attribute is prime, so every FD trivially satisfies condition 2 (RHS is prime).
**In 3NF ✅**
**Step 4: Check BCNF**
- A → B: Is A a superkey? A⁺ = {A,B,C} = R. Yes! ✅
- B → C: Is B a superkey? B⁺ = {B,C,A} = R. Yes! ✅
- C → A: Is C a superkey? C⁺ = {C,A,B} = R. Yes! ✅
**In BCNF ✅**
**Result: This cyclic FD structure creates a relation in BCNF.**When solving NF analysis in interviews:
Let's apply the complete framework to practice problems.
R(A, B, C, D, E)
FDs: AB → CDE, C → D, D → E
Determine the highest normal form satisfied.**Step 1: Find candidate keys**
{A,B}⁺ = {A,B} → {A,B,C,D,E} (via AB→CDE)
{A,B} determines all, so it's a superkey.
Is {A} a key? A⁺ = {A}. No.
Is {B} a key? B⁺ = {B}. No.
**Candidate Key: {A, B}**
**Step 2: Classify attributes**
Prime: {A, B}
Non-prime: {C, D, E}
**Step 3: Check 2NF**
Does any proper subset of {A,B} determine non-prime attributes?
- A⁺ = {A}. A → (nothing non-prime directly)
- B⁺ = {B}. B → (nothing non-prime directly)
No partial dependencies. **In 2NF ✅**
**Step 4: Check 3NF**
- AB → CDE: AB is superkey? AB⁺ = R. Yes ✅
- C → D: C is superkey? C⁺ = {C,D,E} ≠ R. No. Is D prime? No.
**3NF Violation!**
**Result: IN 2NF, NOT IN 3NF**
Violating FD: C → D (transitive dependency through AB → C → D)R(A, B, C, D)
FDs: A → B, BC → D, D → A
Determine the highest normal form satisfied.**Step 1: Find candidate keys**
Try {A,C}⁺ = {A} → {A,B} → (need D)
{A,C}⁺ via BC→D? No, we have AC, not BC.
Try {B,C}⁺ = {B,C} → {B,C,D} (BC→D) → {B,C,D,A} (D→A)
= {A,B,C,D} = R. **{B,C} is a superkey.**
Minimal? B⁺={B}, C⁺={C}. Neither alone works.
**{B,C} is a candidate key.**
Try {C,D}⁺ = {C,D} → {A,C,D} (D→A) → {A,B,C,D} (A→B)
= R. **{C,D} is also a candidate key.**
Try {A,C}⁺ = {A,C} → {A,B,C} (A→B). Still missing D.
Not a superkey.
**Candidate Keys: {B,C} and {C,D}**
**Step 2: Classify attributes**
Prime: {B, C, D}
Non-prime: {A}
**Step 3: Check 3NF**
- A → B: Is A superkey? A⁺ = {A,B} ≠ R. No.
Is B prime? Yes! (part of key {B,C})
**OK for 3NF** ✅
- BC → D: {B,C} superkey? Yes ✅
- D → A: Is D superkey? D⁺ = {D,A,B} ≠ R. No.
Is A prime? **No!**
**3NF Violation!**
**Result: IN 2NF, NOT IN 3NF**
Violating FD: D → AWhat's Next:
Now that you can analyze and diagnose normal form violations, Page 3: Decomposition will teach you how to fix them—how to systematically decompose relations into higher normal forms while preserving losslessness and dependencies.
You now possess a systematic framework for analyzing any relation against normal forms. Practice with real schemas—the pattern recognition becomes intuitive over time. Every database you encounter is an opportunity to diagnose its design quality.