Loading learning content...
Throughout this module, we've built a comprehensive understanding of how keys emerge from functional dependencies. Now we complete the picture by examining how this key analysis directly drives normalization decisions.
Normalization isn't an abstract exercise—it's a systematic process guided by the relationship between keys and non-key attributes. Understanding this relationship transforms normalization from rote application of rules into principled schema design.
By the end of this page, you will understand how normal forms are defined in terms of keys, see how key analysis reveals normalization violations, master the dependency chain concept that underlies all normal form definitions, and apply a unified framework for schema analysis and improvement.
Every normal form from 2NF through BCNF is fundamentally a statement about the relationship between keys and non-keys. Let's reframe each normal form through this lens:
The Core Question:
What may determine what in a well-designed relation?
The normal forms provide increasingly strict answers to this question.
| Normal Form | Key Requirement | Informal Statement | What May Determine Non-Primes? |
|---|---|---|---|
| 1NF | Must have a key | Atomic values, defined structure | Anything (no rules yet) |
| 2NF | No partial key dependencies | Non-primes depend on FULL key | Full keys only (not partial keys) |
| 3NF | No transitive dependencies (with prime exception) | Non-primes depend directly on key | Superkeys (or they're prime) |
| BCNF | All determinants are superkeys | Every determinant is a key | Only superkeys determine anything |
The Progressive Tightening:
Notice how each normal form progressively tightens the rules:
1NF: Just have a key.
2NF: Depend on the WHOLE key (no partial dependencies).
3NF: Depend ONLY on the key (no transitive dependencies) — unless you're prime.
BCNF: Depend ONLY on the key (no exceptions).
This progression is often summarized as:
"Every non-prime attribute must depend on the key, the whole key, and nothing but the key — so help me Codd."
(A play on the courtroom oath, attributed to William Kent.)
• The key: Must depend on a key (1NF establishes keys exist) • The whole key: Not just part of it (2NF eliminates partial dependencies) • Nothing but the key: No transitive chains through non-keys (3NF/BCNF)
This captures the essence of normalization in one memorable phrase.
Second Normal Form requires that every non-prime attribute is fully functionally dependent on every candidate key. This is directly tied to key structure:
Full Functional Dependency:
Attribute A is fully functionally dependent on key K if:
The Key Connection:
2NF violations can only occur when:
If all candidate keys are single-attribute, 2NF is automatically satisfied!
Complete 2NF Analysis Framework:
Given: Relation R with attribute set U and FD set F
Step 1: Find all candidate keys
Step 2: Identify prime attributes (union of all candidate keys)
Step 3: Identify non-prime attributes (U - Prime)
Step 4: For each candidate key K that is composite:
For each non-prime attribute A:
For each proper subset S of K:
If S → A exists (check S⁺ contains A):
VIOLATION: A is partially dependent on K via S
Relation: OrderLine(OrderID, LineNum, ProductID, Quantity)
FDs:
OrderID, LineNum → ProductID, Quantity
Analysis:
Result: No partial dependencies. ✓ In 2NF
Both non-prime attributes depend on the full key, not any proper subset.
Third Normal Form adds the transitive dependency check while providing an exception for prime attributes:
3NF Definition (Key-Focused):
For every non-trivial FD X → A in F⁺:
The Key Connection:
3NF violations occur when:
The second condition creates the prime attribute exception: if A is part of a candidate key, it can be determined by non-superkeys without violating 3NF.
Complete 3NF Analysis Framework:
Given: Relation R with attribute set U and FD set F
Step 1: Find all candidate keys → Identify superkeys
Step 2: Identify prime attributes
Step 3: For each non-trivial FD X → A in F:
If X is NOT a superkey:
If A is NOT prime:
VIOLATION: Non-superkey determines non-prime
Step 4: If no violations, relation is in 3NF
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116
def is_superkey(attrs: set, all_attrs: set, fds: list) -> bool: """Check if attribute set is a superkey.""" closure = compute_closure(attrs, fds) return closure == all_attrs def analyze_3nf(all_attrs: set, fds: list, candidate_keys: list) -> dict: """ Complete 3NF analysis of a relation. Returns: Dictionary with analysis results including violations """ # Compute prime attributes prime = set() for key in candidate_keys: prime.update(key) # Check each FD violations = [] for lhs, rhs in fds: for attr in rhs: # Skip trivial (A → A) if attr in lhs: continue # Check conditions is_sk = is_superkey(lhs, all_attrs, fds) is_prime = attr in prime if not is_sk and not is_prime: violations.append({ 'fd': f"{lhs} → {attr}", 'determinant': lhs, 'dependent': attr, 'issue': 'Non-superkey determines non-prime' }) return { 'candidate_keys': candidate_keys, 'prime_attributes': prime, 'non_prime_attributes': all_attrs - prime, 'is_3nf': len(violations) == 0, 'violations': violations } def decompose_to_3nf(all_attrs: set, fds: list, candidate_keys: list) -> list: """ Decompose relation to 3NF using the synthesis algorithm. The synthesis algorithm guarantees: 1. Lossless join 2. Dependency preservation 3. 3NF """ # Step 1: Find minimal cover minimal = minimal_cover(fds) # Assume this exists # Step 2: Group FDs by left-hand side groups = {} for lhs, rhs in minimal: lhs_key = frozenset(lhs) if lhs_key not in groups: groups[lhs_key] = set() groups[lhs_key].update(rhs) # Step 3: Create relations from groups relations = [] for lhs, rhs in groups.items(): relations.append({ 'attributes': set(lhs) | rhs, 'key': set(lhs) }) # Step 4: Ensure a candidate key is represented key_found = False for key in candidate_keys: for rel in relations: if key.issubset(rel['attributes']): key_found = True break if key_found: break if not key_found and candidate_keys: relations.append({ 'attributes': set(candidate_keys[0]), 'key': set(candidate_keys[0]) }) return relations # Example analysisif __name__ == "__main__": # Employee relation with transitive dependency all_attrs = {'EmpID', 'EmpName', 'DeptID', 'DeptName', 'DeptHead'} fds = [ ({'EmpID'}, {'EmpName', 'DeptID'}), ({'DeptID'}, {'DeptName', 'DeptHead'}) ] candidate_keys = [{'EmpID'}] result = analyze_3nf(all_attrs, fds, candidate_keys) print("3NF Analysis:") print(f" Candidate Keys: {result['candidate_keys']}") print(f" Prime: {result['prime_attributes']}") print(f" Non-Prime: {result['non_prime_attributes']}") print(f" Is 3NF: {result['is_3nf']}") if result['violations']: print(" Violations:") for v in result['violations']: print(f" - {v['fd']}: {v['issue']}")The exception for prime attributes in 3NF exists to allow certain natural structures that would otherwise require awkward decompositions. However, this same exception means 3NF doesn't fully eliminate all redundancy—that's what BCNF addresses.
BCNF removes the prime attribute exception, providing the strictest key-based constraint:
BCNF Definition:
For every non-trivial FD X → A:
That's it. No exceptions. Every determinant must be a superkey.
Why BCNF Matters:
3NF allows anomalies when overlapping candidate keys exist. Consider:
Relation: CourseInstructor(Course, Instructor, Textbook)
FDs:
Course, Instructor → Textbook
Textbook → Course
Candidate Keys: {Course, Instructor}, {Instructor, Textbook}
Prime: {Course, Instructor, Textbook} (all!)
Non-Prime: {} (none!)
3NF Check:
BCNF Check:
The relation is in 3NF but not BCNF!
The 3NF vs BCNF Trade-off:
Why not always use BCNF? Because BCNF decomposition may not preserve all dependencies.
Original: CourseInstructor(Course, Instructor, Textbook)
FDs: Course, Instructor → Textbook
Textbook → Course
BCNF Decomposition:
R1(Textbook, Course) -- Key: Textbook
R2(Instructor, Textbook) -- Key: Instructor, Textbook
Preserved FDs:
✓ Textbook → Course (in R1)
Lost FD:
✗ Course, Instructor → Textbook (spans both tables!)
To enforce Course, Instructor → Textbook, we'd need to join R1 and R2, which is expensive. The dependency is not preserved.
| Property | 3NF Synthesis | BCNF Decomposition |
|---|---|---|
| Eliminates redundancy | Mostly (with prime exception) | Completely |
| Lossless join | ✓ Guaranteed | ✓ Guaranteed |
| Dependency preservation | ✓ Guaranteed | ✗ Not guaranteed |
| Algorithm complexity | Moderate | Simpler |
| Resulting relations | May be fewer | May be more |
In practice, most relations that are in 3NF are also in BCNF. The difference only matters when there are overlapping composite candidate keys—a relatively rare scenario. When in doubt, aim for BCNF but accept 3NF if dependency preservation is critical.
We can now present a unified approach to analyzing and improving database schemas based on key analysis:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283
PROCEDURE AnalyzeAndNormalize(R, F): // R = relation schema with attributes U // F = set of functional dependencies // ═══════════════════════════════════════════════════ // PHASE 1: KEY DISCOVERY // ═══════════════════════════════════════════════════ 1.1 Classify attributes: L_only ← attributes only on left side of FDs R_only ← attributes only on right side of FDs Both ← attributes on both sides Neither ← attributes not in any FD 1.2 Compute core: Core ← L_only ∪ Neither 1.3 Find all candidate keys: IF Core⁺ = U THEN CandidateKeys ← {Core} ELSE CandidateKeys ← {} FOR each subset S of Both (smallest first): Candidate ← Core ∪ S IF Candidate⁺ = U AND Candidate is minimal: Add Candidate to CandidateKeys // ═══════════════════════════════════════════════════ // PHASE 2: ATTRIBUTE CLASSIFICATION // ═══════════════════════════════════════════════════ 2.1 Compute prime attributes: Prime ← ⋃ CandidateKeys 2.2 Compute non-prime attributes: NonPrime ← U - Prime // ═══════════════════════════════════════════════════ // PHASE 3: NORMAL FORM ANALYSIS // ═══════════════════════════════════════════════════ 3.1 Check 2NF (only if any key is composite): FOR each composite key K in CandidateKeys: FOR each proper subset S of K: FOR each A in NonPrime: IF A ∈ S⁺ THEN Report 2NF violation: A partially depends on K via S 3.2 Check 3NF: FOR each FD (X → A) in F: IF A ∉ X THEN // Non-trivial IF X is not a superkey AND A ∉ Prime THEN Report 3NF violation: X → A 3.3 Check BCNF: FOR each FD (X → A) in F: IF A ∉ X THEN // Non-trivial IF X is not a superkey THEN Report BCNF violation: X → A // ═══════════════════════════════════════════════════ // PHASE 4: DECOMPOSITION (if needed) // ═══════════════════════════════════════════════════ 4.1 Choose target normal form: IF dependency preservation is critical THEN Target ← 3NF ELSE Target ← BCNF 4.2 Apply appropriate algorithm: IF Target = 3NF THEN Result ← Synthesize3NF(F, CandidateKeys) ELSE Result ← DecomposeBCNF(R, F) 4.3 Verify result: For each relation in Result: Verify target normal form is achieved Verify lossless join property If 3NF: Verify dependency preservation RETURN ResultThe Framework in Action:
Let's trace through a complete example:
Relation: StudentCourse(SID, SName, CID, CName, Instructor, Grade)
FDs:
SID → SName
CID → CName, Instructor
SID, CID → Grade
Phase 1: Key Discovery
| Attribute | L | R | Classification |
|---|---|---|---|
| SID | ✓ | ✗ | L-only |
| SName | ✗ | ✓ | R-only |
| CID | ✓ | ✗ | L-only |
| CName | ✗ | ✓ | R-only |
| Instructor | ✗ | ✓ | R-only |
| Grade | ✗ | ✓ | R-only |
Core = {SID, CID} {SID, CID}⁺ = {SID, CID} → SName (via SID→SName) → CName,Instructor (via CID→...) → Grade = All attributes ✓
Candidate Key: {{SID, CID}}
Phase 2: Attribute Classification
Prime = {SID, CID} Non-Prime = {SName, CName, Instructor, Grade}
Phase 3: Normal Form Analysis
2NF Check (key is composite):
Not in 2NF — multiple partial dependencies!
(Note: If not in 2NF, automatically not in 3NF or BCNF)
Phase 4: Decomposition
Decompose to eliminate partial dependencies:
Student(SID, SName) -- Key: SID
FD: SID → SName
Course(CID, CName, Instructor) -- Key: CID
FDs: CID → CName, Instructor
Enrollment(SID, CID, Grade) -- Key: SID, CID
FD: SID, CID → Grade
Each resulting relation is in BCNF:
Beyond normalization algorithms, understanding keys provides design intuition:
The Entity Rule:
Each candidate key represents an entity identity. If your relation has multiple unrelated keys, you may be storing multiple entity types.
Bad: PersonVehicle(SSN, Name, VIN, Make, Model)
Keys: {SSN} for person, {VIN} for vehicle
These are two different entities forced into one table!
The Determinant Rule:
Every non-trivial determinant should correspond to an entity or relationship identity.
Examine: Employee(EmpID, Name, DeptID, DeptName, ManagerID)
Determinants:
EmpID → ... (Employee entity)
DeptID → DeptName, ManagerID (Department entity!)
DeptID determines department-specific data but isn't the table's key.
This indicates Department should be a separate table.
A well-designed schema's keys tell you what the database is about. Looking at the keys of a database, you should be able to understand: what entities exist (one table per entity type), how they're identified (candidate keys), and how they relate (foreign keys referencing those candidates).
Several advanced topics connect keys to broader database design concerns:
Natural vs. Synthetic Keys:
Candidate keys derived from FDs are natural keys—they emerge from the domain semantics. Synthetic keys (auto-increment IDs, UUIDs) are added for practical reasons.
Natural Key: {Email} for User table
Synthetic Key: {UserID} added for:
- Stability (emails change more than IDs)
- Efficiency (integers are smaller than strings)
- Privacy (ID in URLs is less revealing than email)
Best practice: Implement synthetic primary keys but document natural candidate keys with UNIQUE constraints.
Keys and Temporal Data:
When tracking history, keys often expand:
Current: Employee(EmpID, Name, DeptID, Salary)
Key: {EmpID}
Historical: EmployeeHistory(EmpID, EffectiveDate, Name, DeptID, Salary)
Key: {EmpID, EffectiveDate}
The temporal dimension becomes part of the identity.
Keys in Distributed Systems:
In distributed databases, key choice affects:
Local: OrderID = auto-increment (conflicts across nodes!)
Distributed: OrderID = {NodeID, LocalSequence} or UUID
While normalization theory focuses on natural keys derived from FDs, practical database design balances this with operational concerns. The theory tells you what CAN uniquely identify data; engineering tells you what SHOULD.
This module has built a complete understanding of how functional dependencies, keys, and normalization interrelate. Let's consolidate the key insights:
Module Complete:
You've now completed the module on Functional Dependencies and Keys. You can:
This knowledge forms the foundation for the normalization chapters ahead, where you'll apply these concepts to systematically improve database designs.
Congratulations! You've mastered the relationship between functional dependencies, keys, and normalization. This understanding is essential for all database design work and provides the theoretical foundation for the normal forms covered in subsequent chapters.