Normalization Problems - Learning Module

Loading content...

0/241

Normal Form Analysis

The Art of Diagnosing Database Design

In medicine, diagnosis precedes treatment. A physician doesn't prescribe medication without first understanding what's wrong. Database design works the same way. Before you can normalize a schema—decompose it, restructure it, improve it—you must first diagnose its current condition: What normal form does it satisfy? Where exactly does it fall short?

This diagnostic skill separates superficial database knowledge from deep understanding. Many engineers can recite the definitions of 1NF through BCNF, but far fewer can reliably analyze a given relation and pinpoint precisely which normal forms it satisfies and which violations exist. In interviews, this analytical precision is exactly what distinguishes strong candidates.

What You Will Master

By the end of this page, you will possess a systematic methodology for analyzing any relation against all standard normal forms (1NF, 2NF, 3NF, BCNF, and higher). You'll understand not just the definitions, but the reasoning behind each check, enabling you to explain your analysis clearly in interviews and design reviews. You'll also learn the common edge cases and traps that cause candidates to give incorrect answers.

The Normal Form Hierarchy

Normal forms are nested requirements—each higher form includes all requirements of lower forms plus additional constraints. Understanding this hierarchy is essential for systematic analysis.

The Progression:

1NF → 2NF → 3NF → BCNF → 4NF → 5NF

Each arrow means "plus additional requirements."

Key Insight: If a relation violates 2NF, it also technically violates 3NF, BCNF, etc. (since 2NF is a prerequisite). Therefore, when analyzing, we work from bottom to top—first check 1NF, then 2NF, etc. Stop at the first violation; that's the highest normal form achieved.

Normal Form Summary
Normal Form	Core Requirement	What It Eliminates	Prerequisites
1NF	Atomic values, unique rows	Repeating groups, multi-valued attributes	None (base requirement)
2NF	No partial dependencies	Attributes depending on part of key	1NF
3NF	No transitive dependencies (from non-keys)	Non-key → Non-key dependencies	2NF
BCNF	Every determinant is a superkey	Non-superkey determinants	3NF
4NF	No non-trivial MVDs (except superkeys)	Independent multivalued facts	BCNF
5NF (PJNF)	No join dependencies (except from keys)	Cyclic join dependencies	4NF

Interview Focus

In most database interviews, 1NF through BCNF are the focus. 4NF and 5NF are rarely tested unless the role specifically involves advanced database design. Master 1NF–BCNF thoroughly before worrying about higher forms.

First Normal Form (1NF) Analysis

1NF is the foundation—the minimum requirement for a relation to even be called "relational." Yet violations are surprisingly common in practice, especially when mapping from non-relational sources.

1NF Requirements:

Atomic Values: Each cell contains exactly one value (no arrays, lists, or nested structures)
Unique Rows: No duplicate tuples (enforced by having a primary key)
Single-Valued Attributes: Each attribute has a single domain, no multi-valued fields
No Repeating Groups: No sequences like Phone1, Phone2, Phone3

How to Check for 1NF Violations:

1NF Violation ExamplesCommon patterns that violate First Normal Form

Input

Example 1 - Multi-valued attribute:
Student(ID, Name, PhoneNumbers)
- Row: (1, 'Alice', '555-1234, 555-5678')

Example 2 - Repeating groups:
Order(OrderID, Product1, Qty1, Product2, Qty2, Product3, Qty3)

Example 3 - Nested structure:
Employee(ID, Name, Address{Street, City, State, ZIP})

Output

**Violations:**

Example 1: PhoneNumbers contains multiple values in one cell → **Violates 1NF**
Fix: Create separate Phone table (StudentID, PhoneNumber)

Example 2: Product1/Qty1, Product2/Qty2, etc. are repeating groups → **Violates 1NF**
Fix: Create OrderLine(OrderID, ProductID, Quantity)

Example 3: Address is a composite/nested structure → **Violates 1NF** (depending on interpretation)
Fix: Flatten to separate columns (Street, City, State, ZIP)

The JSON/Array Debate

Modern databases support JSON and array types. Does storing [1, 2, 3] in a column violate 1NF? Technically yes by classic theory. Practically, it depends on query patterns. If you never need to query individual array elements, storage atomicity is less critical. In interviews, acknowledge both perspectives—show you understand the theory while recognizing practical flexibility.

1NF Analysis Checklist:

✅ Does every column contain single, atomic values?
✅ Is there a defined primary key ensuring unique rows?
✅ Are there no repeating groups (Column1, Column2, Column3...)?
✅ Are there no nested structures or composite fields?

If any check fails → Not in 1NF (stop here, no need to check higher forms)

Second Normal Form (2NF) Analysis

2NF addresses partial dependencies—when a non-key attribute depends on only part of a composite primary key. This is only relevant when the primary key has multiple attributes.

Critical Observation: If the primary key is a single attribute, the relation is automatically in 2NF (assuming 1NF). Partial dependencies only exist with composite keys.

2NF Definition:

A relation is in 2NF iff:

It is in 1NF, AND
Every non-prime attribute is fully functionally dependent on every candidate key

Full Functional Dependency: X → A is a full FD if removing any attribute from X breaks the dependency. Partial dependency means some proper subset of X also determines A.

2NF Analysis Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
/**
 * 2NF Violation Detection Algorithm
 * Input: Relation R, Set of FDs F, Candidate Key(s) K
 * Output: List of partial dependencies (2NF violations)
 */
function detect2NFViolations(R, F, candidateKeys) {
    let violations = [];
    
    // If all candidate keys are single attributes, no 2NF violations possible
    if (candidateKeys.every(k => k.attributes.length === 1)) {
        return []; // Automatically in 2NF
    }
    
    // Identify all prime attributes (part of any candidate key)
    let primeAttrs = new Set();
    for (const key of candidateKeys) {
        for (const attr of key.attributes) {
            primeAttrs.add(attr);
        }
    }
    
    // Identify non-prime attributes
    let nonPrimeAttrs = R.attributes.filter(a => !primeAttrs.has(a));
    
    // For each non-prime attribute
    for (const attr of nonPrimeAttrs) {
        // For each candidate key
        for (const key of candidateKeys) {
            if (key.attributes.length > 1) {
                // Check if any proper subset of key determines attr
                for (const subset of properSubsets(key.attributes)) {
                    let subsetClosure = computeClosure(subset, F);
                    if (subsetClosure.has(attr)) {
                        violations.push({
                            attribute: attr,
                            partialDeterminant: subset,
                            fullKey: key
                        });
                    }
                }
            }
        }
    }
    
    return violations;
}

2NF Analysis WalkthroughAnalyzing a relation for partial dependencies

Input

StudentCourse(StudentID, CourseID, StudentName, CourseName, Grade)

FDs:
- StudentID → StudentName
- CourseID → CourseName
- {StudentID, CourseID} → Grade

Primary Key: {StudentID, CourseID}

Output

**Step 1: Verify 1NF**
All attributes are atomic, PK defined → ✅ In 1NF

**Step 2: Identify prime and non-prime attributes**
Prime: {StudentID, CourseID}
Non-prime: {StudentName, CourseName, Grade}

**Step 3: Check for partial dependencies**

• StudentName: Determined by StudentID alone (proper subset of key)
  **Partial Dependency Found** → 2NF Violation

• CourseName: Determined by CourseID alone (proper subset of key)
  **Partial Dependency Found** → 2NF Violation

• Grade: Requires full key {StudentID, CourseID}
  Full dependency → No violation

**Conclusion: NOT in 2NF**

Violating FDs:
- StudentID → StudentName (partial)
- CourseID → CourseName (partial)

The Prime Attribute Nuance

2NF only concerns non-prime attributes. A prime attribute (one that's part of any candidate key) can have partial dependencies without violating 2NF. This subtlety is sometimes tested in interviews. Example: If {A, B} and {C, D} are both candidate keys, and C depends on A, this doesn't violate 2NF because C is prime.

Third Normal Form (3NF) Analysis

3NF addresses transitive dependencies—when a non-key attribute depends on another non-key attribute, creating an indirect path from the key.

3NF Definition (Standard):

A relation is in 3NF iff for every non-trivial FD X → A:

X is a superkey, OR
A is a prime attribute (part of some candidate key)

Alternative (Codd's Original):

A relation is in 3NF iff:

It is in 2NF, AND
No non-prime attribute is transitively dependent on any candidate key

Transitive Dependency: A is transitively dependent on key K if:

K → B (directly)
B → A (and B is not a superkey)
Therefore K → A indirectly through B

Understanding the 3NF Definition:

The two conditions ("X is superkey" OR "A is prime") might seem odd. Here's the intuition:

If X is a superkey: The dependency is fine—keys are supposed to determine everything.
If A is prime: Even if X isn't a superkey, we tolerate this because prime attributes have special status (they help identify tuples).

The violation case: X is NOT a superkey AND A is NOT prime. This is the transitive dependency pattern.

3NF Analysis WalkthroughDetecting transitive dependencies

Input

Employee(EmpID, EmpName, DeptID, DeptName, DeptLocation)

FDs:
- EmpID → EmpName, DeptID
- DeptID → DeptName, DeptLocation

Candidate Key: {EmpID}

Output

**Step 1: Verify 2NF**
Key is single attribute → Automatically in 2NF ✅

**Step 2: List all non-trivial FDs and analyze**

**FD: EmpID → EmpName**
- Is EmpID a superkey? Yes ✅
- No violation

**FD: EmpID → DeptID**
- Is EmpID a superkey? Yes ✅
- No violation

**FD: DeptID → DeptName**
- Is DeptID a superkey? No (DeptID⁺ = {DeptID, DeptName, DeptLocation} ≠ all attributes)
- Is DeptName prime? No (not part of any candidate key)
- **3NF Violation!**

**FD: DeptID → DeptLocation**
- Is DeptID a superkey? No
- Is DeptLocation prime? No
- **3NF Violation!**

**Conclusion: NOT in 3NF**

Transitive dependencies:
- EmpID → DeptID → DeptName
- EmpID → DeptID → DeptLocation

In 3NF When

•Every determinant is a superkey
•Or every determined attribute is prime
•No non-key → non-key dependencies exist
•All dependencies flow directly from keys

Violates 3NF When

•X → A where X is not a superkey
•AND A is not a prime attribute
•Classic pattern: Key → X → Y
•Non-key attributes determine other non-key attributes

Boyce-Codd Normal Form (BCNF) Analysis

BCNF is the "purest" form of the classical normal forms. It's stricter than 3NF and eliminates the "prime attribute exception."

BCNF Definition:

A relation is in BCNF iff for every non-trivial FD X → A:

X is a superkey

That's it. No exceptions for prime attributes.

Comparison with 3NF:

Condition	3NF	BCNF
X → A where X is superkey	✅ OK	✅ OK
X → A where A is prime	✅ OK	❌ Violation
X → A where X not superkey AND A not prime	❌ Violation	❌ Violation

BCNF vs 3NF: The Subtle Difference

A relation can be in 3NF but not BCNF. This happens when a non-superkey determines a prime attribute. Classic example: multiple overlapping candidate keys where attributes of one key are determined by attributes of another.

3NF but not BCNFA relation satisfying 3NF but violating BCNF

Input

TeachingAssignment(Student, Subject, Teacher)

Constraints:
- Each student takes each subject from exactly one teacher
- Each teacher teaches exactly one subject
- Each subject can be taught by multiple teachers

FDs:
- {Student, Subject} → Teacher
- Teacher → Subject

Candidate Keys: {Student, Subject} and {Student, Teacher}

Output

**Step 1: Check 3NF**

**FD: {Student, Subject} → Teacher**
- Is {Student, Subject} a superkey? Yes ✅
- OK for 3NF

**FD: Teacher → Subject**
- Is Teacher a superkey? No (Teacher⁺ = {Teacher, Subject} ≠ all attributes)
- Is Subject prime? **Yes!** (part of candidate key {Student, Subject})
- **OK for 3NF** (prime attribute exception applies)

**Conclusion: IN 3NF ✅**

**Step 2: Check BCNF**

**FD: Teacher → Subject**
- Is Teacher a superkey? No
- **BCNF Violation!** (no prime attribute exception in BCNF)

**Conclusion: NOT in BCNF ❌**

When BCNF Differs from 3NF:

The key pattern is overlapping candidate keys. If two candidate keys share some but not all attributes, you often get situations where an attribute of one key determines an attribute of another key without being a full superkey.

The Trade-off:

BCNF is "cleaner" but sometimes decomposing to BCNF loses dependency preservation. 3NF decomposition always preserves dependencies. This is why 3NF is sometimes preferred in practice—we'll explore this trade-off in Page 3.

BCNF Check Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
/**
 * BCNF Violation Detection
 * Input: Relation R, Set of FDs F
 * Output: List of BCNF violations
 */
function checkBCNF(R, F) {
    let violations = [];
    
    for (const fd of F) {
        if (!isTrivial(fd)) {  // X → A where A ⊆ X is trivial
            // Compute closure of left side
            let leftClosure = computeClosure(fd.leftSide, F);
            
            // If closure doesn't include all attributes of R,
            // left side is not a superkey
            if (!leftClosure.equals(R.attributes)) {
                violations.push({
                    fd: fd,
                    reason: fd.leftSide + " is not a superkey"
                });
            }
        }
    }
    
    return violations;
}
 
// Note: For BCNF, we check EVERY non-trivial FD.
// The determinant must ALWAYS be a superkey.
// No exceptions for prime attributes (unlike 3NF).

Higher Normal Forms (4NF and 5NF)

While 4NF and 5NF are rarely the focus of interviews, understanding them demonstrates advanced knowledge and is occasionally tested for senior roles.

Fourth Normal Form (4NF)

4NF addresses multivalued dependencies (MVDs)—situations where two or more independent multivalued facts are stored in the same relation.

4NF Definition:

A relation is in 4NF iff for every non-trivial MVD X ↠ Y:

X is a superkey

Multivalued Dependency (MVD): X ↠ Y means that for each X value, the set of Y values is independent of other attributes. The formal definition: If tuples (x, y1, z1) and (x, y2, z2) exist, then (x, y1, z2) and (x, y2, z1) must also exist.

When 4NF Matters:

When a relation stores two independent one-to-many relationships from the same key.

4NF Violation ExampleIndependent multivalued facts causing redundancy

Input

CourseInfo(CourseID, Instructor, Textbook)

Constraints:
- A course can have multiple instructors
- A course can have multiple textbooks
- Instructor and Textbook are independent choices

Sample Data:
CourseID | Instructor | Textbook
---------|------------|----------
CS101    | Prof. A    | Book X
CS101    | Prof. A    | Book Y
CS101    | Prof. B    | Book X
CS101    | Prof. B    | Book Y

Output

**Analysis:**

MVDs present:
- CourseID ↠ Instructor (instructors are independent of textbook choice)
- CourseID ↠ Textbook (textbooks are independent of instructor choice)

Is CourseID a superkey? **No.** (CourseID alone doesn't determine a unique tuple)

**4NF Violation!**

**Problem:** Massive redundancy. Adding a third instructor or textbook requires many new rows.

**Solution:** Decompose into:
- CourseInstructor(CourseID, Instructor)
- CourseTextbook(CourseID, Textbook)

Now each fact is stored once, and the two are independent.

Fifth Normal Form (5NF / Project-Join Normal Form)

5NF addresses join dependencies—situations where a relation can only be losslessly reconstructed by joining three or more projections.

5NF Definition:

A relation is in 5NF iff every join dependency is implied by the candidate keys.

This is highly theoretical. 5NF problems involve cyclic constraints that can't be expressed as FDs or MVDs.

Practical Relevance:

In practice, 4NF violations are rare; 5NF violations are rarer still. If you encounter one in an interview, it's likely a trick question designed to see if you recognize the pattern or if you overly apply normalization.

Interview Strategy for Higher NFs

If asked about 4NF or 5NF:

Briefly explain the concept (MVDs for 4NF, JDs for 5NF)
Acknowledge that they're uncommon in practice
Focus on recognizing the patterns (independent multivalued facts for 4NF)
Don't spend excessive time on them unless explicitly asked

Systematic Normal Form Analysis Framework

Let's consolidate everything into a systematic framework you can apply to any relation in an interview.

The Complete NF Analysis Process:

Complete NF Analysis Framework
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
function analyzeNormalForm(R, F, MVDs = []) {
    // STEP 1: Find all candidate keys
    let candidateKeys = findCandidateKeys(R, F);
    let primeAttrs = union(candidateKeys);
    let nonPrimeAttrs = R.attributes.minus(primeAttrs);
    
    // STEP 2: Check 1NF
    if (hasMultiValuedAttributes(R) || 
        hasRepeatingGroups(R) || 
        hasNoPrimaryKey(R)) {
        return "NOT IN 1NF";
    }
    
    // STEP 3: Check 2NF (only if composite keys exist)
    if (candidateKeys.some(k => k.size > 1)) {
        for (const fd of F) {
            if (nonPrimeAttrs.contains(fd.rightSide)) {
                for (const key of candidateKeys) {
                    if (key.size > 1 && isProperSubset(fd.leftSide, key)) {
                        return "IN 1NF, NOT IN 2NF (partial dependency: " 
                               + fd + ")";
                    }
                }
            }
        }
    }
    
    // STEP 4: Check 3NF
    for (const fd of F) {
        if (!isTrivial(fd)) {
            let leftIsSuperkey = computeClosure(fd.leftSide, F)
                                 .equals(R.attributes);
            let rightIsPrime = primeAttrs.contains(fd.rightSide);
            
            if (!leftIsSuperkey && !rightIsPrime) {
                return "IN 2NF, NOT IN 3NF (transitive: " + fd + ")";
            }
        }
    }
    
    // STEP 5: Check BCNF
    for (const fd of F) {
        if (!isTrivial(fd)) {
            let leftIsSuperkey = computeClosure(fd.leftSide, F)
                                 .equals(R.attributes);
            if (!leftIsSuperkey) {
                return "IN 3NF, NOT IN BCNF (" + fd.leftSide 
                       + " is not superkey)";
            }
        }
    }
    
    // STEP 6: Check 4NF (if MVDs provided)
    for (const mvd of MVDs) {
        if (!isTrivial(mvd)) {
            let leftIsSuperkey = computeClosure(mvd.leftSide, F)
                                 .equals(R.attributes);
            if (!leftIsSuperkey) {
                return "IN BCNF, NOT IN 4NF (MVD: " + mvd + ")";
            }
        }
    }
    
    // STEP 7: Assume 5NF if no violations found
    return "IN BCNF" + (MVDs.length > 0 ? " AND 4NF" : "");
}

Quick Analysis Checklist

•Find candidate keys FIRST — This enables all subsequent checks. Use closure algorithm.
•1NF: Atomicity check — Any multi-valued attributes or repeating groups?
•2NF: Partial dependency check — Only relevant for composite keys. Does any subset of key determine non-prime attributes?
•3NF: Transitive dependency check — Any non-superkey → non-prime FDs?
•BCNF: Superkey check — Every determinant must be a superkey. No exceptions.
•4NF: MVD check — Any independent multivalued facts? (Look for cartesian-product-like redundancy)

Common Interview Traps and Edge Cases

Interviewers often include subtle traps to test deep understanding. Here are the most common ones:

Common Traps

•Trap 1: Assuming single-key relations have 2NF issues. If the only candidate key is a single attribute, 2NF is automatically satisfied—no partial dependencies possible.
•Trap 2: Forgetting alternate candidate keys. A relation can have multiple candidate keys. Check ALL of them for prime/non-prime classification. An attribute is prime if it's in ANY candidate key.
•Trap 3: Confusing 3NF and BCNF. 3NF allows non-superkey determinants if the determined attribute is prime. BCNF doesn't. Know when to apply each definition.
•Trap 4: Trivial FDs. X → X (reflexivity) is always true and never a violation. Don't report trivial FDs as problems.
•Trap 5: Derived FDs. Check all FDs in the closure, not just the given ones. A → B and B → C imply A → C, which might reveal violations.
•Trap 6: The 'all key' relation. If every attribute is part of some candidate key, the relation is automatically in 3NF (no non-prime attributes exist). But it might still violate BCNF.

Edge Case: All-Key RelationA relation where every attribute is prime

Input

R(A, B, C)
FDs: A → B, B → C, C → A

Determine the normal form.

Output

**Step 1: Find candidate keys**
- A⁺ = {A, B, C} → A is a key
- B⁺ = {B, C, A} → B is a key  
- C⁺ = {C, A, B} → C is a key

Candidate keys: {A}, {B}, {C}

**Step 2: Identify prime vs non-prime**
Prime: {A, B, C} (all attributes!)
Non-prime: {} (none!)

**Step 3: Check 3NF**
Every attribute is prime, so every FD trivially satisfies condition 2 (RHS is prime).
**In 3NF ✅**

**Step 4: Check BCNF**
- A → B: Is A a superkey? A⁺ = {A,B,C} = R. Yes! ✅
- B → C: Is B a superkey? B⁺ = {B,C,A} = R. Yes! ✅
- C → A: Is C a superkey? C⁺ = {C,A,B} = R. Yes! ✅

**In BCNF ✅**

**Result: This cyclic FD structure creates a relation in BCNF.**

Interview Strategy

When solving NF analysis in interviews:

Start by finding ALL candidate keys (this is often the hardest part)
Clearly separate prime and non-prime attributes
Check each normal form in order (1NF → 2NF → 3NF → BCNF)
Stop at first violation and report it
Show your reasoning—don't just give an answer, explain why

Practice Problems and Summary

Let's apply the complete framework to practice problems.

Practice Problem 1Complete NF analysis

Input

R(A, B, C, D, E)
FDs: AB → CDE, C → D, D → E

Determine the highest normal form satisfied.

Output

**Step 1: Find candidate keys**
{A,B}⁺ = {A,B} → {A,B,C,D,E} (via AB→CDE)
{A,B} determines all, so it's a superkey.
Is {A} a key? A⁺ = {A}. No.
Is {B} a key? B⁺ = {B}. No.

**Candidate Key: {A, B}**

**Step 2: Classify attributes**
Prime: {A, B}
Non-prime: {C, D, E}

**Step 3: Check 2NF**
Does any proper subset of {A,B} determine non-prime attributes?
- A⁺ = {A}. A → (nothing non-prime directly)
- B⁺ = {B}. B → (nothing non-prime directly)

No partial dependencies. **In 2NF ✅**

**Step 4: Check 3NF**
- AB → CDE: AB is superkey? AB⁺ = R. Yes ✅
- C → D: C is superkey? C⁺ = {C,D,E} ≠ R. No. Is D prime? No.
  **3NF Violation!**

**Result: IN 2NF, NOT IN 3NF**
Violating FD: C → D (transitive dependency through AB → C → D)

Practice Problem 2Tricky candidate key identification

Input

R(A, B, C, D)
FDs: A → B, BC → D, D → A

Determine the highest normal form satisfied.

Output

**Step 1: Find candidate keys**
Try {A,C}⁺ = {A} → {A,B} → (need D)
{A,C}⁺ via BC→D? No, we have AC, not BC.

Try {B,C}⁺ = {B,C} → {B,C,D} (BC→D) → {B,C,D,A} (D→A)
= {A,B,C,D} = R. **{B,C} is a superkey.**
Minimal? B⁺={B}, C⁺={C}. Neither alone works.
**{B,C} is a candidate key.**

Try {C,D}⁺ = {C,D} → {A,C,D} (D→A) → {A,B,C,D} (A→B)
= R. **{C,D} is also a candidate key.**

Try {A,C}⁺ = {A,C} → {A,B,C} (A→B). Still missing D.
Not a superkey.

**Candidate Keys: {B,C} and {C,D}**

**Step 2: Classify attributes**
Prime: {B, C, D}
Non-prime: {A}

**Step 3: Check 3NF**
- A → B: Is A superkey? A⁺ = {A,B} ≠ R. No.
  Is B prime? Yes! (part of key {B,C})
  **OK for 3NF** ✅
  
- BC → D: {B,C} superkey? Yes ✅

- D → A: Is D superkey? D⁺ = {D,A,B} ≠ R. No.
  Is A prime? **No!**
  **3NF Violation!**

**Result: IN 2NF, NOT IN 3NF**
Violating FD: D → A

Key Takeaways

•Normal forms are nested — Higher forms include all requirements of lower ones. Check systematically from 1NF upward.
•Candidate keys are foundational — Finding ALL candidate keys is essential for correct prime/non-prime classification.
•2NF only matters for composite keys — Single-attribute keys can't have partial dependencies.
•3NF vs BCNF differs in prime attribute handling — 3NF allows non-superkey → prime. BCNF doesn't.
•Show your work — In interviews, explain the reasoning, not just the answer.
•Watch for traps — Multiple candidate keys, all-prime relations, and derived FDs are common interview tricks.

What's Next:

Now that you can analyze and diagnose normal form violations, Page 3: Decomposition will teach you how to fix them—how to systematically decompose relations into higher normal forms while preserving losslessness and dependencies.

Page Complete

You now possess a systematic framework for analyzing any relation against normal forms. Practice with real schemas—the pattern recognition becomes intuitive over time. Every database you encounter is an opportunity to diagnose its design quality.

Normal Form Analysis

The Art of Diagnosing Database Design

What You Will Master

The Normal Form Hierarchy

Normal forms are nested requirements—each higher form includes all requirements of lower forms plus additional constraints. Understanding this hierarchy is essential for systematic analysis.

The Progression:

1NF → 2NF → 3NF → BCNF → 4NF → 5NF

Each arrow means "plus additional requirements."

Normal Form Summary
Normal Form	Core Requirement	What It Eliminates	Prerequisites
1NF	Atomic values, unique rows	Repeating groups, multi-valued attributes	None (base requirement)
2NF	No partial dependencies	Attributes depending on part of key	1NF
3NF	No transitive dependencies (from non-keys)	Non-key → Non-key dependencies	2NF
BCNF	Every determinant is a superkey	Non-superkey determinants	3NF
4NF	No non-trivial MVDs (except superkeys)	Independent multivalued facts	BCNF
5NF (PJNF)	No join dependencies (except from keys)	Cyclic join dependencies	4NF

Interview Focus

First Normal Form (1NF) Analysis

1NF is the foundation—the minimum requirement for a relation to even be called "relational." Yet violations are surprisingly common in practice, especially when mapping from non-relational sources.

1NF Requirements:

Atomic Values: Each cell contains exactly one value (no arrays, lists, or nested structures)
Unique Rows: No duplicate tuples (enforced by having a primary key)
Single-Valued Attributes: Each attribute has a single domain, no multi-valued fields
No Repeating Groups: No sequences like Phone1, Phone2, Phone3

How to Check for 1NF Violations:

1NF Violation ExamplesCommon patterns that violate First Normal Form

Input

Example 1 - Multi-valued attribute:
Student(ID, Name, PhoneNumbers)
- Row: (1, 'Alice', '555-1234, 555-5678')

Example 2 - Repeating groups:
Order(OrderID, Product1, Qty1, Product2, Qty2, Product3, Qty3)

Example 3 - Nested structure:
Employee(ID, Name, Address{Street, City, State, ZIP})

Output

**Violations:**

Example 1: PhoneNumbers contains multiple values in one cell → **Violates 1NF**
Fix: Create separate Phone table (StudentID, PhoneNumber)

Example 2: Product1/Qty1, Product2/Qty2, etc. are repeating groups → **Violates 1NF**
Fix: Create OrderLine(OrderID, ProductID, Quantity)

Example 3: Address is a composite/nested structure → **Violates 1NF** (depending on interpretation)
Fix: Flatten to separate columns (Street, City, State, ZIP)

The JSON/Array Debate

1NF Analysis Checklist:

✅ Does every column contain single, atomic values?
✅ Is there a defined primary key ensuring unique rows?
✅ Are there no repeating groups (Column1, Column2, Column3...)?
✅ Are there no nested structures or composite fields?

If any check fails → Not in 1NF (stop here, no need to check higher forms)

Second Normal Form (2NF) Analysis

2NF addresses partial dependencies—when a non-key attribute depends on only part of a composite primary key. This is only relevant when the primary key has multiple attributes.

Critical Observation: If the primary key is a single attribute, the relation is automatically in 2NF (assuming 1NF). Partial dependencies only exist with composite keys.

2NF Definition:

A relation is in 2NF iff:

It is in 1NF, AND
Every non-prime attribute is fully functionally dependent on every candidate key

Full Functional Dependency: X → A is a full FD if removing any attribute from X breaks the dependency. Partial dependency means some proper subset of X also determines A.

2NF Analysis Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
/**
 * 2NF Violation Detection Algorithm
 * Input: Relation R, Set of FDs F, Candidate Key(s) K
 * Output: List of partial dependencies (2NF violations)
 */
function detect2NFViolations(R, F, candidateKeys) {
    let violations = [];
    
    // If all candidate keys are single attributes, no 2NF violations possible
    if (candidateKeys.every(k => k.attributes.length === 1)) {
        return []; // Automatically in 2NF
    }
    
    // Identify all prime attributes (part of any candidate key)
    let primeAttrs = new Set();
    for (const key of candidateKeys) {
        for (const attr of key.attributes) {
            primeAttrs.add(attr);
        }
    }
    
    // Identify non-prime attributes
    let nonPrimeAttrs = R.attributes.filter(a => !primeAttrs.has(a));
    
    // For each non-prime attribute
    for (const attr of nonPrimeAttrs) {
        // For each candidate key
        for (const key of candidateKeys) {
            if (key.attributes.length > 1) {
                // Check if any proper subset of key determines attr
                for (const subset of properSubsets(key.attributes)) {
                    let subsetClosure = computeClosure(subset, F);
                    if (subsetClosure.has(attr)) {
                        violations.push({
                            attribute: attr,
                            partialDeterminant: subset,
                            fullKey: key
                        });
                    }
                }
            }
        }
    }
    
    return violations;
}

2NF Analysis WalkthroughAnalyzing a relation for partial dependencies

Input

StudentCourse(StudentID, CourseID, StudentName, CourseName, Grade)

FDs:
- StudentID → StudentName
- CourseID → CourseName
- {StudentID, CourseID} → Grade

Primary Key: {StudentID, CourseID}

Output

**Step 1: Verify 1NF**
All attributes are atomic, PK defined → ✅ In 1NF

**Step 2: Identify prime and non-prime attributes**
Prime: {StudentID, CourseID}
Non-prime: {StudentName, CourseName, Grade}

**Step 3: Check for partial dependencies**

• StudentName: Determined by StudentID alone (proper subset of key)
  **Partial Dependency Found** → 2NF Violation

• CourseName: Determined by CourseID alone (proper subset of key)
  **Partial Dependency Found** → 2NF Violation

• Grade: Requires full key {StudentID, CourseID}
  Full dependency → No violation

**Conclusion: NOT in 2NF**

Violating FDs:
- StudentID → StudentName (partial)
- CourseID → CourseName (partial)

The Prime Attribute Nuance

Third Normal Form (3NF) Analysis

3NF addresses transitive dependencies—when a non-key attribute depends on another non-key attribute, creating an indirect path from the key.

3NF Definition (Standard):

A relation is in 3NF iff for every non-trivial FD X → A:

X is a superkey, OR
A is a prime attribute (part of some candidate key)

Alternative (Codd's Original):

A relation is in 3NF iff:

It is in 2NF, AND
No non-prime attribute is transitively dependent on any candidate key

Transitive Dependency: A is transitively dependent on key K if:

K → B (directly)
B → A (and B is not a superkey)
Therefore K → A indirectly through B

Understanding the 3NF Definition:

The two conditions ("X is superkey" OR "A is prime") might seem odd. Here's the intuition:

If X is a superkey: The dependency is fine—keys are supposed to determine everything.
If A is prime: Even if X isn't a superkey, we tolerate this because prime attributes have special status (they help identify tuples).

The violation case: X is NOT a superkey AND A is NOT prime. This is the transitive dependency pattern.

3NF Analysis WalkthroughDetecting transitive dependencies

Input

Employee(EmpID, EmpName, DeptID, DeptName, DeptLocation)

FDs:
- EmpID → EmpName, DeptID
- DeptID → DeptName, DeptLocation

Candidate Key: {EmpID}

Output

**Step 1: Verify 2NF**
Key is single attribute → Automatically in 2NF ✅

**Step 2: List all non-trivial FDs and analyze**

**FD: EmpID → EmpName**
- Is EmpID a superkey? Yes ✅
- No violation

**FD: EmpID → DeptID**
- Is EmpID a superkey? Yes ✅
- No violation

**FD: DeptID → DeptName**
- Is DeptID a superkey? No (DeptID⁺ = {DeptID, DeptName, DeptLocation} ≠ all attributes)
- Is DeptName prime? No (not part of any candidate key)
- **3NF Violation!**

**FD: DeptID → DeptLocation**
- Is DeptID a superkey? No
- Is DeptLocation prime? No
- **3NF Violation!**

**Conclusion: NOT in 3NF**

Transitive dependencies:
- EmpID → DeptID → DeptName
- EmpID → DeptID → DeptLocation

In 3NF When

•Every determinant is a superkey
•Or every determined attribute is prime
•No non-key → non-key dependencies exist
•All dependencies flow directly from keys

Violates 3NF When

•X → A where X is not a superkey
•AND A is not a prime attribute
•Classic pattern: Key → X → Y
•Non-key attributes determine other non-key attributes

Boyce-Codd Normal Form (BCNF) Analysis

BCNF is the "purest" form of the classical normal forms. It's stricter than 3NF and eliminates the "prime attribute exception."

BCNF Definition:

A relation is in BCNF iff for every non-trivial FD X → A:

X is a superkey

That's it. No exceptions for prime attributes.

Comparison with 3NF:

Condition	3NF	BCNF
X → A where X is superkey	✅ OK	✅ OK
X → A where A is prime	✅ OK	❌ Violation
X → A where X not superkey AND A not prime	❌ Violation	❌ Violation

BCNF vs 3NF: The Subtle Difference

3NF but not BCNFA relation satisfying 3NF but violating BCNF

Input

TeachingAssignment(Student, Subject, Teacher)

Constraints:
- Each student takes each subject from exactly one teacher
- Each teacher teaches exactly one subject
- Each subject can be taught by multiple teachers

FDs:
- {Student, Subject} → Teacher
- Teacher → Subject

Candidate Keys: {Student, Subject} and {Student, Teacher}

Output

**Step 1: Check 3NF**

**FD: {Student, Subject} → Teacher**
- Is {Student, Subject} a superkey? Yes ✅
- OK for 3NF

**FD: Teacher → Subject**
- Is Teacher a superkey? No (Teacher⁺ = {Teacher, Subject} ≠ all attributes)
- Is Subject prime? **Yes!** (part of candidate key {Student, Subject})
- **OK for 3NF** (prime attribute exception applies)

**Conclusion: IN 3NF ✅**

**Step 2: Check BCNF**

**FD: Teacher → Subject**
- Is Teacher a superkey? No
- **BCNF Violation!** (no prime attribute exception in BCNF)

**Conclusion: NOT in BCNF ❌**

When BCNF Differs from 3NF:

The Trade-off:

BCNF Check Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
/**
 * BCNF Violation Detection
 * Input: Relation R, Set of FDs F
 * Output: List of BCNF violations
 */
function checkBCNF(R, F) {
    let violations = [];
    
    for (const fd of F) {
        if (!isTrivial(fd)) {  // X → A where A ⊆ X is trivial
            // Compute closure of left side
            let leftClosure = computeClosure(fd.leftSide, F);
            
            // If closure doesn't include all attributes of R,
            // left side is not a superkey
            if (!leftClosure.equals(R.attributes)) {
                violations.push({
                    fd: fd,
                    reason: fd.leftSide + " is not a superkey"
                });
            }
        }
    }
    
    return violations;
}
 
// Note: For BCNF, we check EVERY non-trivial FD.
// The determinant must ALWAYS be a superkey.
// No exceptions for prime attributes (unlike 3NF).

Higher Normal Forms (4NF and 5NF)

While 4NF and 5NF are rarely the focus of interviews, understanding them demonstrates advanced knowledge and is occasionally tested for senior roles.

Fourth Normal Form (4NF)

4NF addresses multivalued dependencies (MVDs)—situations where two or more independent multivalued facts are stored in the same relation.

4NF Definition:

A relation is in 4NF iff for every non-trivial MVD X ↠ Y:

X is a superkey

When 4NF Matters:

When a relation stores two independent one-to-many relationships from the same key.

4NF Violation ExampleIndependent multivalued facts causing redundancy

Input

CourseInfo(CourseID, Instructor, Textbook)

Constraints:
- A course can have multiple instructors
- A course can have multiple textbooks
- Instructor and Textbook are independent choices

Sample Data:
CourseID | Instructor | Textbook
---------|------------|----------
CS101    | Prof. A    | Book X
CS101    | Prof. A    | Book Y
CS101    | Prof. B    | Book X
CS101    | Prof. B    | Book Y

Output

**Analysis:**

MVDs present:
- CourseID ↠ Instructor (instructors are independent of textbook choice)
- CourseID ↠ Textbook (textbooks are independent of instructor choice)

Is CourseID a superkey? **No.** (CourseID alone doesn't determine a unique tuple)

**4NF Violation!**

**Problem:** Massive redundancy. Adding a third instructor or textbook requires many new rows.

**Solution:** Decompose into:
- CourseInstructor(CourseID, Instructor)
- CourseTextbook(CourseID, Textbook)

Now each fact is stored once, and the two are independent.

Fifth Normal Form (5NF / Project-Join Normal Form)

5NF addresses join dependencies—situations where a relation can only be losslessly reconstructed by joining three or more projections.

5NF Definition:

A relation is in 5NF iff every join dependency is implied by the candidate keys.

This is highly theoretical. 5NF problems involve cyclic constraints that can't be expressed as FDs or MVDs.

Practical Relevance:

Interview Strategy for Higher NFs

If asked about 4NF or 5NF:

Briefly explain the concept (MVDs for 4NF, JDs for 5NF)
Acknowledge that they're uncommon in practice
Focus on recognizing the patterns (independent multivalued facts for 4NF)
Don't spend excessive time on them unless explicitly asked

Systematic Normal Form Analysis Framework

Let's consolidate everything into a systematic framework you can apply to any relation in an interview.

The Complete NF Analysis Process:

Complete NF Analysis Framework
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
function analyzeNormalForm(R, F, MVDs = []) {
    // STEP 1: Find all candidate keys
    let candidateKeys = findCandidateKeys(R, F);
    let primeAttrs = union(candidateKeys);
    let nonPrimeAttrs = R.attributes.minus(primeAttrs);
    
    // STEP 2: Check 1NF
    if (hasMultiValuedAttributes(R) || 
        hasRepeatingGroups(R) || 
        hasNoPrimaryKey(R)) {
        return "NOT IN 1NF";
    }
    
    // STEP 3: Check 2NF (only if composite keys exist)
    if (candidateKeys.some(k => k.size > 1)) {
        for (const fd of F) {
            if (nonPrimeAttrs.contains(fd.rightSide)) {
                for (const key of candidateKeys) {
                    if (key.size > 1 && isProperSubset(fd.leftSide, key)) {
                        return "IN 1NF, NOT IN 2NF (partial dependency: " 
                               + fd + ")";
                    }
                }
            }
        }
    }
    
    // STEP 4: Check 3NF
    for (const fd of F) {
        if (!isTrivial(fd)) {
            let leftIsSuperkey = computeClosure(fd.leftSide, F)
                                 .equals(R.attributes);
            let rightIsPrime = primeAttrs.contains(fd.rightSide);
            
            if (!leftIsSuperkey && !rightIsPrime) {
                return "IN 2NF, NOT IN 3NF (transitive: " + fd + ")";
            }
        }
    }
    
    // STEP 5: Check BCNF
    for (const fd of F) {
        if (!isTrivial(fd)) {
            let leftIsSuperkey = computeClosure(fd.leftSide, F)
                                 .equals(R.attributes);
            if (!leftIsSuperkey) {
                return "IN 3NF, NOT IN BCNF (" + fd.leftSide 
                       + " is not superkey)";
            }
        }
    }
    
    // STEP 6: Check 4NF (if MVDs provided)
    for (const mvd of MVDs) {
        if (!isTrivial(mvd)) {
            let leftIsSuperkey = computeClosure(mvd.leftSide, F)
                                 .equals(R.attributes);
            if (!leftIsSuperkey) {
                return "IN BCNF, NOT IN 4NF (MVD: " + mvd + ")";
            }
        }
    }
    
    // STEP 7: Assume 5NF if no violations found
    return "IN BCNF" + (MVDs.length > 0 ? " AND 4NF" : "");
}

Quick Analysis Checklist

•Find candidate keys FIRST — This enables all subsequent checks. Use closure algorithm.
•1NF: Atomicity check — Any multi-valued attributes or repeating groups?
•2NF: Partial dependency check — Only relevant for composite keys. Does any subset of key determine non-prime attributes?
•3NF: Transitive dependency check — Any non-superkey → non-prime FDs?
•BCNF: Superkey check — Every determinant must be a superkey. No exceptions.
•4NF: MVD check — Any independent multivalued facts? (Look for cartesian-product-like redundancy)

Common Interview Traps and Edge Cases

Interviewers often include subtle traps to test deep understanding. Here are the most common ones:

Common Traps

•Trap 1: Assuming single-key relations have 2NF issues. If the only candidate key is a single attribute, 2NF is automatically satisfied—no partial dependencies possible.
•Trap 2: Forgetting alternate candidate keys. A relation can have multiple candidate keys. Check ALL of them for prime/non-prime classification. An attribute is prime if it's in ANY candidate key.
•Trap 3: Confusing 3NF and BCNF. 3NF allows non-superkey determinants if the determined attribute is prime. BCNF doesn't. Know when to apply each definition.
•Trap 4: Trivial FDs. X → X (reflexivity) is always true and never a violation. Don't report trivial FDs as problems.
•Trap 5: Derived FDs. Check all FDs in the closure, not just the given ones. A → B and B → C imply A → C, which might reveal violations.
•Trap 6: The 'all key' relation. If every attribute is part of some candidate key, the relation is automatically in 3NF (no non-prime attributes exist). But it might still violate BCNF.

Edge Case: All-Key RelationA relation where every attribute is prime

Input

R(A, B, C)
FDs: A → B, B → C, C → A

Determine the normal form.

Output

**Step 1: Find candidate keys**
- A⁺ = {A, B, C} → A is a key
- B⁺ = {B, C, A} → B is a key  
- C⁺ = {C, A, B} → C is a key

Candidate keys: {A}, {B}, {C}

**Step 2: Identify prime vs non-prime**
Prime: {A, B, C} (all attributes!)
Non-prime: {} (none!)

**Step 3: Check 3NF**
Every attribute is prime, so every FD trivially satisfies condition 2 (RHS is prime).
**In 3NF ✅**

**Step 4: Check BCNF**
- A → B: Is A a superkey? A⁺ = {A,B,C} = R. Yes! ✅
- B → C: Is B a superkey? B⁺ = {B,C,A} = R. Yes! ✅
- C → A: Is C a superkey? C⁺ = {C,A,B} = R. Yes! ✅

**In BCNF ✅**

**Result: This cyclic FD structure creates a relation in BCNF.**

Interview Strategy

When solving NF analysis in interviews:

Start by finding ALL candidate keys (this is often the hardest part)
Clearly separate prime and non-prime attributes
Check each normal form in order (1NF → 2NF → 3NF → BCNF)
Stop at first violation and report it
Show your reasoning—don't just give an answer, explain why

Practice Problems and Summary

Let's apply the complete framework to practice problems.

Practice Problem 1Complete NF analysis

Input

R(A, B, C, D, E)
FDs: AB → CDE, C → D, D → E

Determine the highest normal form satisfied.

Output

**Step 1: Find candidate keys**
{A,B}⁺ = {A,B} → {A,B,C,D,E} (via AB→CDE)
{A,B} determines all, so it's a superkey.
Is {A} a key? A⁺ = {A}. No.
Is {B} a key? B⁺ = {B}. No.

**Candidate Key: {A, B}**

**Step 2: Classify attributes**
Prime: {A, B}
Non-prime: {C, D, E}

**Step 3: Check 2NF**
Does any proper subset of {A,B} determine non-prime attributes?
- A⁺ = {A}. A → (nothing non-prime directly)
- B⁺ = {B}. B → (nothing non-prime directly)

No partial dependencies. **In 2NF ✅**

**Step 4: Check 3NF**
- AB → CDE: AB is superkey? AB⁺ = R. Yes ✅
- C → D: C is superkey? C⁺ = {C,D,E} ≠ R. No. Is D prime? No.
  **3NF Violation!**

**Result: IN 2NF, NOT IN 3NF**
Violating FD: C → D (transitive dependency through AB → C → D)

Practice Problem 2Tricky candidate key identification

Input

R(A, B, C, D)
FDs: A → B, BC → D, D → A

Determine the highest normal form satisfied.

Output

**Step 1: Find candidate keys**
Try {A,C}⁺ = {A} → {A,B} → (need D)
{A,C}⁺ via BC→D? No, we have AC, not BC.

Try {B,C}⁺ = {B,C} → {B,C,D} (BC→D) → {B,C,D,A} (D→A)
= {A,B,C,D} = R. **{B,C} is a superkey.**
Minimal? B⁺={B}, C⁺={C}. Neither alone works.
**{B,C} is a candidate key.**

Try {C,D}⁺ = {C,D} → {A,C,D} (D→A) → {A,B,C,D} (A→B)
= R. **{C,D} is also a candidate key.**

Try {A,C}⁺ = {A,C} → {A,B,C} (A→B). Still missing D.
Not a superkey.

**Candidate Keys: {B,C} and {C,D}**

**Step 2: Classify attributes**
Prime: {B, C, D}
Non-prime: {A}

**Step 3: Check 3NF**
- A → B: Is A superkey? A⁺ = {A,B} ≠ R. No.
  Is B prime? Yes! (part of key {B,C})
  **OK for 3NF** ✅
  
- BC → D: {B,C} superkey? Yes ✅

- D → A: Is D superkey? D⁺ = {D,A,B} ≠ R. No.
  Is A prime? **No!**
  **3NF Violation!**

**Result: IN 2NF, NOT IN 3NF**
Violating FD: D → A

Key Takeaways

•Normal forms are nested — Higher forms include all requirements of lower ones. Check systematically from 1NF upward.
•Candidate keys are foundational — Finding ALL candidate keys is essential for correct prime/non-prime classification.
•2NF only matters for composite keys — Single-attribute keys can't have partial dependencies.
•3NF vs BCNF differs in prime attribute handling — 3NF allows non-superkey → prime. BCNF doesn't.
•Show your work — In interviews, explain the reasoning, not just the answer.
•Watch for traps — Multiple candidate keys, all-prime relations, and derived FDs are common interview tricks.

What's Next:

Page Complete