
Functional Dependencies and Keys

Level: Intermediate

Duration: 75 mins

Topic: FD and Keys


Key Relationship to Normalization

The Key-Normalization Connection

Throughout this module, we've built a comprehensive understanding of how keys emerge from functional dependencies. Now we complete the picture by examining how this key analysis directly drives normalization decisions.

Normalization isn't an abstract exercise—it's a systematic process guided by the relationship between keys and non-key attributes. Understanding this relationship transforms normalization from rote application of rules into principled schema design.

What You Will Learn

By the end of this page, you will understand how normal forms are defined in terms of keys, see how key analysis reveals normalization violations, master the dependency chain concept that underlies all normal form definitions, and apply a unified framework for schema analysis and improvement.

The Key-Centric View of Normal Forms

Every normal form from 2NF through BCNF is fundamentally a statement about the relationship between keys and non-keys. Let's reframe each normal form through this lens:

The Core Question:

What may determine what in a well-designed relation?

The normal forms provide increasingly strict answers to this question.

Normal Forms as Key Constraints
Normal Form	Key Requirement	Informal Statement	What May Determine a Non-Prime Attribute?
1NF	A key exists	Atomic values, defined structure	Anything (no rule yet)
2NF	No partial key dependencies	Non-primes depend on the FULL key	Anything except a proper subset of a candidate key
3NF	No transitive dependencies (prime attributes exempt)	Non-primes depend directly on a key	Superkeys only (prime attributes are exempt)
BCNF	All determinants are superkeys	Every determinant identifies the row	Superkeys only, and the rule covers prime attributes too

The Progressive Tightening:

Notice how each normal form progressively tightens the rules:

1NF: Just have a key.
2NF: Depend on the WHOLE key (no partial dependencies).
3NF: Depend ONLY on the key (no transitive dependencies) — unless you're prime.
BCNF: Depend ONLY on the key (no exceptions).

This progression is often summarized as:

"Every non-prime attribute must depend on the key, the whole key, and nothing but the key, so help me Codd."

(A play on the courtroom oath. The "key, whole key, nothing but the key" formulation is attributed to William Kent; "so help me Codd" is a common later addition.)

The Codd Mantra Decoded

• The key: Must depend on a key (1NF establishes keys exist)
• The whole key: Not just part of it (2NF eliminates partial dependencies)
• Nothing but the key: No transitive chains through non-keys (3NF/BCNF)

This captures the essence of normalization in one memorable phrase.

Keys in Second Normal Form Analysis

Second Normal Form requires that every non-prime attribute is fully functionally dependent on every candidate key. This is directly tied to key structure:

Full Functional Dependency:

Attribute A is fully functionally dependent on key K if:

K → A (K determines A)
No proper subset of K determines A
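Both conditions reduce to attribute-closure computations. The following is a minimal Python sketch. It assumes FDs are stored as (lhs, rhs) pairs of Python sets, a representation chosen here for illustration. `closure` and `is_fully_dependent` are hypothetical helper names, not from any library. Subsets are enumerated from size 1 upward, so the empty set is skipped; it only matters for FDs with an empty left side.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_fully_dependent(key, attr, fds):
    """True if key -> attr holds and no non-empty proper subset of key determines attr."""
    key = set(key)
    if attr not in closure(key, fds):
        return False  # condition 1 fails: key does not determine attr at all
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            if attr in closure(set(subset), fds):
                return False  # condition 2 fails: a smaller set already suffices
    return True


# OrderLine attributes; OrderID -> OrderDate is a hypothetical extra FD.
fds = [({'OrderID', 'LineNum'}, {'ProductID', 'Quantity'}),
       ({'OrderID'}, {'OrderDate'})]
print(is_fully_dependent({'OrderID', 'LineNum'}, 'Quantity', fds))   # True
print(is_fully_dependent({'OrderID', 'LineNum'}, 'OrderDate', fds))  # False
```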

The Key Connection:

2NF violations can only occur when:

There is a composite candidate key (multiple attributes)
A non-prime attribute depends on a proper subset of that key

If all candidate keys are single-attribute, 2NF is automatically satisfied!

Complete 2NF Analysis Framework:

Given: Relation R with attribute set U and FD set F

Step 1: Find all candidate keys
Step 2: Identify prime attributes (union of all candidate keys)
Step 3: Identify non-prime attributes (U - Prime)
Step 4: For each candidate key K that is composite:
    For each non-prime attribute A:
        For each proper subset S of K:
            If S → A exists (check S⁺ contains A):
                VIOLATION: A is partially dependent on K via S
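The four steps map onto a short loop. This sketch makes the same assumptions: FDs are (lhs, rhs) set pairs, and the candidate keys are already known. `find_2nf_violations` is a hypothetical name. The FD OrderID → OrderDate is hypothetical too, added only to show a violation being reported.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_2nf_violations(candidate_keys, all_attrs, fds):
    """Return (subset, attr, key) triples where a proper subset of a
    candidate key determines a non-prime attribute."""
    prime = set().union(*candidate_keys)   # Step 2
    non_prime = set(all_attrs) - prime     # Step 3
    violations = []
    for key in candidate_keys:             # Step 4
        # For single-attribute keys this range is empty: nothing to check.
        for size in range(1, len(key)):
            for subset in combinations(sorted(key), size):
                for attr in sorted(non_prime & closure(set(subset), fds)):
                    violations.append((frozenset(subset), attr, frozenset(key)))
    return violations


attrs = {'OrderID', 'LineNum', 'ProductID', 'Quantity'}
fds = [({'OrderID', 'LineNum'}, {'ProductID', 'Quantity'})]
keys = [{'OrderID', 'LineNum'}]
print(find_2nf_violations(keys, attrs, fds))  # [] (OrderLine is in 2NF)

# Hypothetical variant: add OrderDate with OrderID -> OrderDate.
attrs2 = attrs | {'OrderDate'}
fds2 = fds + [({'OrderID'}, {'OrderDate'})]
print(find_2nf_violations(keys, attrs2, fds2))
```

The second call reports OrderDate as partially dependent on {OrderID, LineNum} via {OrderID}.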

Relation: OrderLine(OrderID, LineNum, ProductID, Quantity)

FDs:

OrderID, LineNum → ProductID, Quantity

Analysis:

Candidate Key: {OrderID, LineNum}
Prime: {OrderID, LineNum}
Non-Prime: {ProductID, Quantity}
Check partial dependencies:
- Does {OrderID} → ProductID? No.
- Does {OrderID} → Quantity? No.
- Does {LineNum} → ProductID? No.
- Does {LineNum} → Quantity? No.

Result: No partial dependencies. ✓ In 2NF

Both non-prime attributes depend on the full key, not any proper subset.

Keys in Third Normal Form Analysis

Third Normal Form adds the transitive dependency check while providing an exception for prime attributes:

3NF Definition (Key-Focused):

For every non-trivial FD X → A in F⁺:

X is a superkey, OR
A is a prime attribute

The Key Connection:

3NF violations occur when:

A non-superkey determines a non-prime attribute
There's a "chain" like Key → B → C, where B is not a superkey and C is non-prime

The second clause of the definition (A is prime) creates the prime attribute exception. If A belongs to some candidate key, a non-superkey may determine it without violating 3NF.

Complete 3NF Analysis Framework:

Given: Relation R with attribute set U and FD set F

Step 1: Find all candidate keys → Identify superkeys
Step 2: Identify prime attributes
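Step 3: For each non-trivial FD X → A in F (right-hand sides split into single attributes):
    If X is NOT a superkey:
        If A is NOT prime:
            VIOLATION: Non-superkey determines non-prime
Step 4: If no violations, relation is in 3NF

Checking the given F rather than all of F⁺ is enough when F is the complete set of FDs on R. Suppose some X → A in F⁺ violated 3NF. Consider the FD Y → Z in F that first adds A while computing X⁺. Its left side Y is not a superkey, since Y ⊆ X⁺ and a superkey Y would make X one too. So Y → A violates 3NF as well.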

3NF Analysis Implementation
Python
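def compute_closure(attrs: set, fds: list) -> set:
    """Return the attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the left side is derivable, so is everything on the right.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_superkey(attrs: set, all_attrs: set, fds: list) -> bool:
    """Check if attribute set is a superkey: its closure covers every attribute."""
    return compute_closure(attrs, fds) >= set(all_attrs)


def minimal_cover(fds: list) -> list:
    """Return a minimal cover of fds as (lhs, {attr}) pairs."""
    # 1. Split right-hand sides into single attributes.
    cover = [(frozenset(lhs), attr) for lhs, rhs in fds for attr in sorted(rhs)]
    as_fds = [(set(lhs), {attr}) for lhs, attr in cover]

    # 2. Drop extraneous left-hand attributes: B is extraneous in X -> A
    #    if A is still derivable from X - {B}.
    reduced = []
    for lhs, attr in cover:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and attr in compute_closure(lhs - {b}, as_fds):
                lhs.discard(b)
        reduced.append((frozenset(lhs), attr))
    result = list(dict.fromkeys(reduced))  # remove duplicates, keep order

    # 3. Drop redundant FDs that the remaining ones already imply.
    for fd in list(result):
        others = [(set(l), {a}) for l, a in result if (l, a) != fd]
        if fd[1] in compute_closure(fd[0], others):
            result.remove(fd)
    return [(set(lhs), {attr}) for lhs, attr in result]


def analyze_3nf(all_attrs: set, fds: list, candidate_keys: list) -> dict:
    """
    Complete 3NF analysis of a relation.

    Tests only the FDs in fds (not all of F+), which suffices when fds is
    the complete set of dependencies on the relation.

    Returns:
        Dictionary with analysis results including violations
    """
    # Prime attributes: union of all candidate keys
    prime = set().union(*candidate_keys)

    violations = []
    for lhs, rhs in fds:
        is_sk = is_superkey(lhs, all_attrs, fds)  # same answer for every attr in rhs
        for attr in sorted(rhs):
            # Skip the trivial part (attr already on the left side)
            if attr in lhs:
                continue
            if not is_sk and attr not in prime:
                violations.append({
                    'fd': f"{', '.join(sorted(lhs))} → {attr}",
                    'determinant': lhs,
                    'dependent': attr,
                    'issue': 'Non-superkey determines non-prime'
                })

    return {
        'candidate_keys': candidate_keys,
        'prime_attributes': prime,
        'non_prime_attributes': set(all_attrs) - prime,
        'is_3nf': len(violations) == 0,
        'violations': violations
    }


def decompose_to_3nf(all_attrs: set, fds: list, candidate_keys: list) -> list:
    """
    Decompose relation to 3NF using the synthesis algorithm.

    The synthesis algorithm guarantees:
    1. Lossless join
    2. Dependency preservation
    3. 3NF
    """
    # Step 1: Find minimal cover
    minimal = minimal_cover(fds)

    # Step 2: Group FDs by left-hand side
    groups = {}
    for lhs, rhs in minimal:
        groups.setdefault(frozenset(lhs), set()).update(rhs)

    # Step 3: Create one relation per group; its left-hand side is its key
    relations = [{'attributes': set(lhs) | rhs, 'key': set(lhs)}
                 for lhs, rhs in groups.items()]

    # Step 4: Ensure some relation contains a candidate key (needed for lossless join)
    key_found = any(set(key) <= rel['attributes']
                    for key in candidate_keys for rel in relations)
    if not key_found and candidate_keys:
        relations.append({
            'attributes': set(candidate_keys[0]),
            'key': set(candidate_keys[0])
        })

    return relations


# Example analysis
if __name__ == "__main__":
    # Employee relation with transitive dependency
    all_attrs = {'EmpID', 'EmpName', 'DeptID', 'DeptName', 'DeptHead'}
    fds = [
        ({'EmpID'}, {'EmpName', 'DeptID'}),
        ({'DeptID'}, {'DeptName', 'DeptHead'})
    ]
    candidate_keys = [{'EmpID'}]

    result = analyze_3nf(all_attrs, fds, candidate_keys)

    print("3NF Analysis:")
    print(f"  Candidate Keys: {result['candidate_keys']}")
    print(f"  Prime: {result['prime_attributes']}")
    print(f"  Non-Prime: {result['non_prime_attributes']}")
    print(f"  Is 3NF: {result['is_3nf']}")

    if result['violations']:
        print("  Violations:")
        for v in result['violations']:
            print(f"    - {v['fd']}: {v['issue']}")

    print("3NF synthesis:")
    for rel in decompose_to_3nf(all_attrs, fds, candidate_keys):
        print(f"  {sorted(rel['attributes'])}  key: {sorted(rel['key'])}")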

The Prime Exception

The prime-attribute exception is what guarantees that every relation has a 3NF decomposition that is both lossless and dependency-preserving. The cost is that 3NF does not eliminate all FD-based redundancy. BCNF, which drops the exception, does eliminate it.

Keys in Boyce-Codd Normal Form Analysis

BCNF removes the prime attribute exception, providing the strictest key-based constraint:

BCNF Definition:

For every non-trivial FD X → A:

X is a superkey

That's it. No exceptions. Every determinant must be a superkey.

Why BCNF Matters:

3NF allows anomalies when overlapping candidate keys exist. Consider:

Relation: CourseInstructor(Course, Instructor, Textbook)
FDs:
  Course, Instructor → Textbook
  Textbook → Course

Candidate Keys: {Course, Instructor}, {Instructor, Textbook}
Prime: {Course, Instructor, Textbook}  (all!)
Non-Prime: {} (none!)

3NF Check:

Textbook → Course: Is Textbook a superkey? No. Is Course prime? Yes! ✓ Passes 3NF.

BCNF Check:

Textbook → Course: Is Textbook a superkey? No. ✗ Violates BCNF.

The relation is in 3NF but not BCNF!
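The two checks differ by exactly one condition, which a small script makes concrete. This sketch assumes FDs are (lhs, rhs) set pairs. It finds candidate keys by brute force over all attribute subsets, which is fine for toy relations but exponential in general. Like the frameworks above, it tests only the FDs listed in F. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Brute force: minimal attribute sets whose closure covers all_attrs."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            cand = set(combo)
            if any(k <= cand for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(cand, fds) >= all_attrs:
                keys.append(cand)
    return keys


def check_forms(all_attrs, fds):
    """Return (keys, 3NF violations, BCNF violations) for the FDs in fds."""
    keys = candidate_keys(all_attrs, fds)
    prime = set().union(*keys)
    nf3, bcnf = [], []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= all_attrs
        for attr in sorted(rhs - lhs):  # non-trivial part only
            if not is_superkey:
                bcnf.append((frozenset(lhs), attr))     # BCNF: no exceptions
                if attr not in prime:
                    nf3.append((frozenset(lhs), attr))  # 3NF: prime attrs exempt
    return keys, nf3, bcnf


attrs = {'Course', 'Instructor', 'Textbook'}
fds = [({'Course', 'Instructor'}, {'Textbook'}),
       ({'Textbook'}, {'Course'})]
keys, nf3, bcnf = check_forms(attrs, fds)
print(sorted(sorted(k) for k in keys))  # [['Course', 'Instructor'], ['Instructor', 'Textbook']]
print(nf3)   # []
print(bcnf)  # [(frozenset({'Textbook'}), 'Course')]
```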

The 3NF vs BCNF Trade-off:

Why not always use BCNF? Because BCNF decomposition may not preserve all dependencies.

Original: CourseInstructor(Course, Instructor, Textbook)
FDs: Course, Instructor → Textbook
     Textbook → Course

BCNF Decomposition:
  R1(Textbook, Course)     -- Key: Textbook
  R2(Instructor, Textbook) -- Key: Instructor, Textbook

Preserved FDs:
  ✓ Textbook → Course (in R1)

Lost FD:
  ✗ Course, Instructor → Textbook (spans both tables!)

To enforce Course, Instructor → Textbook, we'd need to join R1 and R2, which is expensive. The dependency is not preserved.
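You can test whether an FD survives a decomposition without materializing the projected FD sets. Start from X, then repeatedly let each table contribute whatever it can derive from the attributes it shares with the current result. Below is a sketch of that standard closure-based test. It also includes the binary lossless-join check: the shared attributes must be a superkey of one side. Function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(lhs, rhs, fds, schemas):
    """True if lhs -> rhs follows from the FDs enforceable inside the schemas."""
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            # What this table alone can derive from the attributes known so far.
            gained = closure(result & schema, fds) & schema
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


def is_lossless_binary(r1, r2, fds):
    """A binary split is lossless if the shared attributes are a superkey of r1 or r2."""
    shared = closure(r1 & r2, fds)
    return r1 <= shared or r2 <= shared


fds = [({'Course', 'Instructor'}, {'Textbook'}),
       ({'Textbook'}, {'Course'})]
r1 = {'Textbook', 'Course'}
r2 = {'Instructor', 'Textbook'}
print(is_lossless_binary(r1, r2, fds))                                      # True
print(is_preserved({'Textbook'}, {'Course'}, fds, [r1, r2]))                # True
print(is_preserved({'Course', 'Instructor'}, {'Textbook'}, fds, [r1, r2]))  # False
```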

3NF vs BCNF Trade-offs
Property	3NF Synthesis	BCNF Decomposition
Eliminates redundancy	Mostly (prime attributes are exempt)	All redundancy caused by FDs
Lossless join	✓ Guaranteed	✓ Guaranteed
Dependency preservation	✓ Guaranteed	✗ Not guaranteed
Algorithm complexity	Polynomial (minimal cover, then group FDs)	Simple to state, but testing BCNF on projected relations can take exponential time
Resulting relations	May be fewer	May be more

Practical Guidance

In practice, most relations that are in 3NF are also in BCNF. The difference only matters when there are overlapping composite candidate keys—a relatively rare scenario. When in doubt, aim for BCNF but accept 3NF if dependency preservation is critical.

A Unified Framework for Key-Based Schema Analysis

We can now present a unified approach to analyzing and improving database schemas based on key analysis:

Unified Schema Analysis Framework
Pseudocode
PROCEDURE AnalyzeAndNormalize(R, F):
    // R = relation schema with attributes U
    // F = set of functional dependencies
    
    // ═══════════════════════════════════════════════════
    // PHASE 1: KEY DISCOVERY
    // ═══════════════════════════════════════════════════
    
    1.1 Classify attributes:
        L_only ← attributes only on left side of FDs
        R_only ← attributes only on right side of FDs
        Both   ← attributes on both sides
        Neither ← attributes not in any FD
        
    1.2 Compute core:
        Core ← L_only ∪ Neither
        
    1.3 Find all candidate keys:
        IF Core⁺ = U THEN
            CandidateKeys ← {Core}
        ELSE
            CandidateKeys ← {}
            FOR each subset S of Both (smallest first):
                Candidate ← Core ∪ S
                IF Candidate⁺ = U AND no key already in CandidateKeys is a subset of Candidate THEN
                    Add Candidate to CandidateKeys   // minimal, since smaller sets were tried first
                    
    // ═══════════════════════════════════════════════════
    // PHASE 2: ATTRIBUTE CLASSIFICATION
    // ═══════════════════════════════════════════════════
    
    2.1 Compute prime attributes:
        Prime ← ⋃ CandidateKeys
        
    2.2 Compute non-prime attributes:
        NonPrime ← U - Prime
        
    // ═══════════════════════════════════════════════════
    // PHASE 3: NORMAL FORM ANALYSIS
    // ═══════════════════════════════════════════════════
    
    3.1 Check 2NF (only if any key is composite):
        FOR each composite key K in CandidateKeys:
            FOR each proper subset S of K:
                FOR each A in NonPrime:
                    IF A ∈ S⁺ THEN
                        Report 2NF violation: A partially depends on K via S
                        
    3.2 Check 3NF:
        FOR each FD (X → A) in F:
            IF A ∉ X THEN  // Non-trivial
                IF X is not a superkey AND A ∉ Prime THEN
                    Report 3NF violation: X → A
                    
    3.3 Check BCNF:
        FOR each FD (X → A) in F:
            IF A ∉ X THEN  // Non-trivial
                IF X is not a superkey THEN
                    Report BCNF violation: X → A
                    
    // ═══════════════════════════════════════════════════
    // PHASE 4: DECOMPOSITION (if needed)
    // ═══════════════════════════════════════════════════
    
    4.1 Choose target normal form:
        IF dependency preservation is critical THEN
            Target ← 3NF
        ELSE
            Target ← BCNF
            
    4.2 Apply appropriate algorithm:
        IF Target = 3NF THEN
            Result ← Synthesize3NF(F, CandidateKeys)
        ELSE
            Result ← DecomposeBCNF(R, F)
            
    4.3 Verify result:
        FOR each relation in Result:
            Verify Target normal form is achieved
        Verify the decomposition as a whole is lossless-join
        IF Target = 3NF THEN Verify dependency preservation
            
    RETURN Result
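Phase 1 is the only part of the procedure that needs real search. Here is a Python sketch of steps 1.1 to 1.3, assuming FDs are (lhs, rhs) set pairs. `find_candidate_keys` is a hypothetical name. The search over subsets of Both is exponential in |Both|, which is acceptable for textbook-sized relations.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_candidate_keys(all_attrs, fds):
    """Phase 1: classify attributes, build the core, then extend it minimally."""
    # 1.1 Classify attributes by where they appear in the FDs.
    left = set().union(*(lhs for lhs, _ in fds))
    right = set().union(*(rhs for _, rhs in fds))
    both = left & right
    neither = set(all_attrs) - left - right
    # 1.2 L-only and unused attributes can never be derived, so every key has them.
    core = (left - right) | neither
    # 1.3 Try the core alone, then add subsets of Both, smallest first.
    if closure(core, fds) >= all_attrs:
        return [core]
    keys = []
    for size in range(1, len(both) + 1):
        for extra in combinations(sorted(both), size):
            cand = core | set(extra)
            if any(k <= cand for k in keys):
                continue  # a smaller key is already inside: not minimal
            if closure(cand, fds) >= all_attrs:
                keys.append(cand)
    return keys


student_course = {'SID', 'SName', 'CID', 'CName', 'Instructor', 'Grade'}
sc_fds = [({'SID'}, {'SName'}),
          ({'CID'}, {'CName', 'Instructor'}),
          ({'SID', 'CID'}, {'Grade'})]
print(find_candidate_keys(student_course, sc_fds))  # a single key: {SID, CID}

course_instructor = {'Course', 'Instructor', 'Textbook'}
ci_fds = [({'Course', 'Instructor'}, {'Textbook'}), ({'Textbook'}, {'Course'})]
print(sorted(sorted(k) for k in find_candidate_keys(course_instructor, ci_fds)))
# [['Course', 'Instructor'], ['Instructor', 'Textbook']]
```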

The Framework in Action:

Let's trace through a complete example:

Relation: StudentCourse(SID, SName, CID, CName, Instructor, Grade)

FDs:
  SID → SName
  CID → CName, Instructor
  SID, CID → Grade

Phase 1: Key Discovery

Attribute	L	R	Classification
SID	✓	✗	L-only
SName	✗	✓	R-only
CID	✓	✗	L-only
CName	✗	✓	R-only
Instructor	✗	✓	R-only
Grade	✗	✓	R-only

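Core = {SID, CID}

{SID, CID}⁺:
  start with {SID, CID}
  add SName (via SID → SName)
  add CName, Instructor (via CID → CName, Instructor)
  add Grade (via SID, CID → Grade)
  = all attributes ✓

Since Core⁺ = U, the core itself is the only candidate key.

Candidate Key: {SID, CID}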
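The closure trace for StudentCourse can be reproduced mechanically by recording which FD fires at each step. This sketch uses the same (lhs, rhs) set-pair representation. `closure_with_trace` is a hypothetical helper.

```python
def closure_with_trace(attrs, fds):
    """Return (closure, steps); each step is (FD left side, attributes it newly added)."""
    result = set(attrs)
    steps = []
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            new = rhs - result
            if lhs <= result and new:
                result |= new
                steps.append((sorted(lhs), sorted(new)))
                changed = True
    return result, steps


all_attrs = {'SID', 'SName', 'CID', 'CName', 'Instructor', 'Grade'}
fds = [({'SID'}, {'SName'}),
       ({'CID'}, {'CName', 'Instructor'}),
       ({'SID', 'CID'}, {'Grade'})]

result, steps = closure_with_trace({'SID', 'CID'}, fds)
for lhs, added in steps:
    print(f"add {', '.join(added)} (via {', '.join(lhs)})")
print("covers all attributes:", result == all_attrs)
```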

Phase 2: Attribute Classification

Prime = {SID, CID}
Non-Prime = {SName, CName, Instructor, Grade}

Phase 3: Normal Form Analysis

2NF Check (key is composite):

{SID} → SName? Yes! SName is non-prime. 2NF VIOLATION
{SID} → CName? No. {SID}⁺ = {SID, SName}
{CID} → CName? Yes! CName is non-prime. 2NF VIOLATION
{CID} → Instructor? Yes! 2NF VIOLATION

Not in 2NF — multiple partial dependencies!

(Note: If not in 2NF, automatically not in 3NF or BCNF)

Phase 4: Decomposition

Decompose to eliminate partial dependencies:

Student(SID, SName)          -- Key: SID
  FD: SID → SName
  
Course(CID, CName, Instructor) -- Key: CID
  FDs: CID → CName, Instructor
  
Enrollment(SID, CID, Grade)  -- Key: SID, CID
  FD: SID, CID → Grade

Each resulting relation is in BCNF:

Student: SID is only determinant, SID is superkey ✓
Course: CID is only determinant, CID is superkey ✓
Enrollment: {SID, CID} is only determinant, {SID, CID} is superkey ✓
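Checking BCNF on a decomposed table means checking the FDs that hold on its projection, and these may include FDs not written in F. The following is a brute-force sketch, exponential in table width, so only for small schemas. For every proper subset X of the table, it computes X⁺ under the original FDs and intersects the result with the table. It flags X if X determines something new without determining the whole table. Function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations_in_projection(schema, fds):
    """Non-superkey X inside schema that determine extra attributes of schema."""
    violations = []
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            implied = closure(x, fds) & schema  # FDs that hold on the projection
            if implied != x and not implied >= schema:
                violations.append((sorted(x), sorted(implied - x)))
    return violations


fds = [({'SID'}, {'SName'}),
       ({'CID'}, {'CName', 'Instructor'}),
       ({'SID', 'CID'}, {'Grade'})]
tables = {
    'Student': {'SID', 'SName'},
    'Course': {'CID', 'CName', 'Instructor'},
    'Enrollment': {'SID', 'CID', 'Grade'},
}
for name, schema in tables.items():
    print(name, bcnf_violations_in_projection(schema, fds))  # [] for each table

original = set().union(*tables.values())
print('StudentCourse', bcnf_violations_in_projection(original, fds)[:2])
```

The last line shows that the original, undecomposed relation does have violations, such as SID → SName.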

Keys as Design Guides

Beyond normalization algorithms, understanding keys provides design intuition:

The Entity Rule:

Each candidate key represents an entity identity. If a relation contains identifiers for several unrelated things, you may be storing multiple entity types.

Bad: PersonVehicle(SSN, Name, VIN, Make, Model)
Identifiers: SSN for the person (SSN → Name), VIN for the vehicle (VIN → Make, Model)

These are two different entities forced into one table!

The Determinant Rule:

Every non-trivial determinant should correspond to an entity or relationship identity.

Examine: Employee(EmpID, Name, DeptID, DeptName, ManagerID)

Determinants:
  EmpID → ... (Employee entity)
  DeptID → DeptName, ManagerID (Department entity!)
  
DeptID determines department-specific data but isn't the table's key.
This indicates Department should be a separate table.
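The Determinant Rule can be applied mechanically by listing every FD whose left side is not a superkey. This sketch assumes FDs are (lhs, rhs) set pairs. The Employee example above writes EmpID's dependencies as "...", so this sketch assumes EmpID → Name, DeptID. For PersonVehicle it assumes SSN → Name and VIN → Make, Model. `non_key_determinants` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def non_key_determinants(all_attrs, fds):
    """FD left sides that are not superkeys, with what they determine."""
    found = []
    for lhs, rhs in fds:
        if not closure(lhs, fds) >= all_attrs:
            found.append((sorted(lhs), sorted(rhs - lhs)))
    return found


employee = {'EmpID', 'Name', 'DeptID', 'DeptName', 'ManagerID'}
employee_fds = [
    ({'EmpID'}, {'Name', 'DeptID'}),           # assumed: stands in for "..." in the text
    ({'DeptID'}, {'DeptName', 'ManagerID'}),
]
print(non_key_determinants(employee, employee_fds))
# [(['DeptID'], ['DeptName', 'ManagerID'])]  -> hidden Department entity

person_vehicle = {'SSN', 'Name', 'VIN', 'Make', 'Model'}
pv_fds = [({'SSN'}, {'Name'}), ({'VIN'}, {'Make', 'Model'})]  # assumed FDs
print(non_key_determinants(person_vehicle, pv_fds))
# [(['SSN'], ['Name']), (['VIN'], ['Make', 'Model'])]  -> two entities
```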

Key-Based Design Principles

•One entity, one relation: Each logical entity should have its own relation with its natural key
•Keys identify, non-keys describe: Prime attributes establish identity; non-prime attributes provide information about that identity
•Determinants reveal structure: Non-superkey determinants suggest hidden entities that should be extracted
•Key overlap indicates rich semantics: Multiple overlapping keys often indicate complex real-world constraints worth documenting
•Synthetic keys for stability: When natural keys are unstable or large, consider synthetic keys but document the natural candidates

Keys Tell the Story

A well-designed schema's keys tell you what the database is about. Looking at the keys of a database, you should be able to understand: what entities exist (one table per entity type), how they're identified (candidate keys), and how they relate (foreign keys referencing those candidates).

Advanced Considerations

Several advanced topics connect keys to broader database design concerns:

Natural vs. Synthetic Keys:

Candidate keys derived from FDs are natural keys—they emerge from the domain semantics. Synthetic keys (auto-increment IDs, UUIDs) are added for practical reasons.

Natural Key: {Email} for User table
Synthetic Key: {UserID} added for:
  - Stability (emails change more than IDs)
  - Efficiency (integers are smaller than strings)
  - Privacy (ID in URLs is less revealing than email)

Best practice: Use a synthetic primary key, but still declare each natural candidate key with a UNIQUE constraint so the database enforces it.
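In SQL, this practice means a surrogate primary key plus a UNIQUE constraint on the natural key. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names (users, user_id, email) are hypothetical.

```python
import sqlite3


def make_user_table() -> sqlite3.Connection:
    """In-memory table with a synthetic key and an enforced natural key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " user_id INTEGER PRIMARY KEY,"  # synthetic key (SQLite assigns it)
        " email TEXT NOT NULL UNIQUE"    # natural candidate key, still enforced
        ")"
    )
    return conn


def add_user(conn: sqlite3.Connection, email: str) -> bool:
    """Insert a user; return False if the natural key (email) is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


conn = make_user_table()
results = [add_user(conn, "ada@example.com"), add_user(conn, "ada@example.com")]
print(results)  # [True, False]
```

The surrogate user_id stays stable even if a user later changes their email. Meanwhile, the UNIQUE constraint still rejects a second account for the same address.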

Keys and Temporal Data:

When tracking history, keys often expand:

Current: Employee(EmpID, Name, DeptID, Salary)
Key: {EmpID}

Historical: EmployeeHistory(EmpID, EffectiveDate, Name, DeptID, Salary)
Key: {EmpID, EffectiveDate}

The temporal dimension becomes part of the identity.
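Declaring the widened key makes the database enforce it. The same employee may appear once per effective date, but never twice for the same date. This sketch uses Python's built-in sqlite3 module and reuses the relation's attribute names as column names. The column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeHistory ("
    " EmpID INTEGER NOT NULL,"
    " EffectiveDate TEXT NOT NULL,"          # ISO-8601 date string
    " Name TEXT, DeptID INTEGER, Salary INTEGER,"
    " PRIMARY KEY (EmpID, EffectiveDate)"    # the temporal key {EmpID, EffectiveDate}
    ")"
)


def record(emp_id, effective_date, name, dept_id, salary):
    """Insert one history row; return False if (EmpID, EffectiveDate) already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO EmployeeHistory VALUES (?, ?, ?, ?, ?)",
                (emp_id, effective_date, name, dept_id, salary),
            )
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    record(1, "2023-01-01", "Ada", 10, 5000),
    record(1, "2024-01-01", "Ada", 20, 6000),  # same EmpID, new date: allowed
    record(1, "2024-01-01", "Ada", 30, 6500),  # same EmpID and date: rejected
]
print(results)  # [True, True, False]
```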

Keys in Distributed Systems:

In distributed databases, key choice affects:

Partitioning: Data is typically sharded by primary key
Locality: Related data should share key prefixes
Conflict resolution: Keys must be globally unique across nodes

Local: OrderID = auto-increment  (conflicts across nodes!)
Distributed: OrderID = {NodeID, LocalSequence} or UUID
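Here is a minimal Python sketch of the two distributed options. The first is a composite (NodeID, LocalSequence) key: each node counts independently, and the node id keeps the keys apart. The second is a random UUID from the standard uuid module. The class and its names are hypothetical. The scheme assumes every node is configured with a distinct node id.

```python
import itertools
import uuid


class NodeKeyGenerator:
    """Produces (NodeID, LocalSequence) keys.

    Keys are globally unique only if every node uses a distinct node_id
    (an assumption of this sketch). Not thread-safe as written.
    """

    def __init__(self, node_id: int):
        self.node_id = node_id
        self._sequence = itertools.count(1)  # local counter, like auto-increment

    def next_key(self) -> tuple:
        return (self.node_id, next(self._sequence))


node_a, node_b = NodeKeyGenerator(1), NodeKeyGenerator(2)
a_keys = [node_a.next_key(), node_a.next_key()]
b_keys = [node_b.next_key()]
print(a_keys, b_keys)  # [(1, 1), (1, 2)] [(2, 1)]

# Alternative: a random UUID needs no coordination between nodes.
print(uuid.uuid4())
```

Both nodes start their local sequence at 1, yet the composite keys never collide. That is the failure a bare auto-increment would hit.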

Keys in Practice

While normalization theory focuses on natural keys derived from FDs, practical database design balances this with operational concerns. The theory tells you what CAN uniquely identify data; engineering tells you what SHOULD.

Summary: Keys and Normalization

This module has built a complete understanding of how functional dependencies, keys, and normalization interrelate. Let's consolidate the key insights:

Key Takeaways

•Keys emerge from FDs — Candidate keys are derived mathematically from functional dependencies using attribute closure
•Prime/Non-Prime matters — This classification directly affects normal form definitions and analysis
•Normal forms are key constraints — 2NF, 3NF, BCNF progressively tighten rules about what may determine what
•The Codd Mantra applies — Non-primes depend on 'the key, the whole key, and nothing but the key'
•3NF vs BCNF trade-off — BCNF eliminates all redundancy but may sacrifice dependency preservation
•Keys guide design — Understanding keys reveals entity structure and suggests decomposition

Module Complete:

You've now completed the module on Functional Dependencies and Keys. You can:

Derive all candidate keys from any FD set
Classify attributes as prime or non-prime
Analyze relations for 2NF, 3NF, and BCNF compliance
Apply decomposition algorithms to achieve target normal forms
Use key analysis to guide intuitive schema design

This knowledge forms the foundation for the normalization chapters ahead, where you'll apply these concepts to systematically improve database designs.

Module Complete

Congratulations! You've mastered the relationship between functional dependencies, keys, and normalization. This understanding is essential for all database design work and provides the theoretical foundation for the normal forms covered in subsequent chapters.

5 / 5

Loading learning content...

Database Management SystemsFD and Keys

Functional Dependencies and Keys

LevelIntermediate

Duration75 mins

TopicFD and Keys

5 / 5

Key Relationship to Normalization

The Key-Normalization Connection

What You Will Learn

The Key-Centric View of Normal Forms

Every normal form from 2NF through BCNF is fundamentally a statement about the relationship between keys and non-keys. Let's reframe each normal form through this lens:

The Core Question:

What may determine what in a well-designed relation?

The normal forms provide increasingly strict answers to this question.

Normal Forms as Key Constraints
Normal Form	Key Requirement	Informal Statement	What May Determine Non-Primes?
1NF	Must have a key	Atomic values, defined structure	Anything (no rules yet)
2NF	No partial key dependencies	Non-primes depend on FULL key	Full keys only (not partial keys)
3NF	No transitive dependencies (with prime exception)	Non-primes depend directly on key	Superkeys (or they're prime)
BCNF	All determinants are superkeys	Every determinant is a key	Only superkeys determine anything

The Progressive Tightening:

Notice how each normal form progressively tightens the rules:

1NF: Just have a key.
2NF: Depend on the WHOLE key (no partial dependencies).
3NF: Depend ONLY on the key (no transitive dependencies) — unless you're prime.
BCNF: Depend ONLY on the key (no exceptions).

This progression is often summarized as:

"Every non-prime attribute must depend on the key, the whole key, and nothing but the key — so help me Codd."

(A play on the courtroom oath, attributed to William Kent.)

The Codd Mantra Decoded

This captures the essence of normalization in one memorable phrase.

Keys in Second Normal Form Analysis

Second Normal Form requires that every non-prime attribute is fully functionally dependent on every candidate key. This is directly tied to key structure:

Full Functional Dependency:

Attribute A is fully functionally dependent on key K if:

K → A (K determines A)
No proper subset of K determines A

The Key Connection:

2NF violations can only occur when:

There is a composite candidate key (multiple attributes)
A non-prime attribute depends on a proper subset of that key

If all candidate keys are single-attribute, 2NF is automatically satisfied!

Complete 2NF Analysis Framework:

Given: Relation R with attribute set U and FD set F

Step 1: Find all candidate keys
Step 2: Identify prime attributes (union of all candidate keys)
Step 3: Identify non-prime attributes (U - Prime)
Step 4: For each candidate key K that is composite:
    For each non-prime attribute A:
        For each proper subset S of K:
            If S → A exists (check S⁺ contains A):
                VIOLATION: A is partially dependent on K via S

Relation: OrderLine(OrderID, LineNum, ProductID, Quantity)

FDs:

OrderID, LineNum → ProductID, Quantity

Analysis:

Candidate Key: {OrderID, LineNum}
Prime: {OrderID, LineNum}
Non-Prime: {ProductID, Quantity}
Check partial dependencies:
- Does {OrderID} → ProductID? No.
- Does {OrderID} → Quantity? No.
- Does {LineNum} → ProductID? No.
- Does {LineNum} → Quantity? No.

Result: No partial dependencies. ✓ In 2NF

Both non-prime attributes depend on the full key, not any proper subset.

Keys in Third Normal Form Analysis

Third Normal Form adds the transitive dependency check while providing an exception for prime attributes:

3NF Definition (Key-Focused):

For every non-trivial FD X → A in F⁺:

X is a superkey, OR
A is a prime attribute

The Key Connection:

3NF violations occur when:

A non-superkey determines a non-prime attribute
There's a "chain" like: Key → B → C (where B and C are non-prime)

The second condition creates the prime attribute exception: if A is part of a candidate key, it can be determined by non-superkeys without violating 3NF.

Complete 3NF Analysis Framework:

Given: Relation R with attribute set U and FD set F

Step 1: Find all candidate keys → Identify superkeys
Step 2: Identify prime attributes
Step 3: For each non-trivial FD X → A in F:
    If X is NOT a superkey:
        If A is NOT prime:
            VIOLATION: Non-superkey determines non-prime
Step 4: If no violations, relation is in 3NF

3NF Analysis Implementation
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
def is_superkey(attrs: set, all_attrs: set, fds: list) -> bool:
    """Check if attribute set is a superkey."""
    closure = compute_closure(attrs, fds)
    return closure == all_attrs
 
 
def analyze_3nf(all_attrs: set, fds: list, candidate_keys: list) -> dict:
    """
    Complete 3NF analysis of a relation.
    
    Returns:
        Dictionary with analysis results including violations
    """
    # Compute prime attributes
    prime = set()
    for key in candidate_keys:
        prime.update(key)
    
    # Check each FD
    violations = []
    for lhs, rhs in fds:
        for attr in rhs:
            # Skip trivial (A → A)
            if attr in lhs:
                continue
            
            # Check conditions
            is_sk = is_superkey(lhs, all_attrs, fds)
            is_prime = attr in prime
            
            if not is_sk and not is_prime:
                violations.append({
                    'fd': f"{lhs} → {attr}",
                    'determinant': lhs,
                    'dependent': attr,
                    'issue': 'Non-superkey determines non-prime'
                })
    
    return {
        'candidate_keys': candidate_keys,
        'prime_attributes': prime,
        'non_prime_attributes': all_attrs - prime,
        'is_3nf': len(violations) == 0,
        'violations': violations
    }
 
 
def decompose_to_3nf(all_attrs: set, fds: list, candidate_keys: list) -> list:
    """
    Decompose relation to 3NF using the synthesis algorithm.
    
    The synthesis algorithm guarantees:
    1. Lossless join
    2. Dependency preservation
    3. 3NF
    """
    # Step 1: Find minimal cover
    minimal = minimal_cover(fds)  # Assume this exists
    
    # Step 2: Group FDs by left-hand side
    groups = {}
    for lhs, rhs in minimal:
        lhs_key = frozenset(lhs)
        if lhs_key not in groups:
            groups[lhs_key] = set()
        groups[lhs_key].update(rhs)
    
    # Step 3: Create relations from groups
    relations = []
    for lhs, rhs in groups.items():
        relations.append({
            'attributes': set(lhs) | rhs,
            'key': set(lhs)
        })
    
    # Step 4: Ensure a candidate key is represented
    key_found = False
    for key in candidate_keys:
        for rel in relations:
            if key.issubset(rel['attributes']):
                key_found = True
                break
        if key_found:
            break
    
    if not key_found and candidate_keys:
        relations.append({
            'attributes': set(candidate_keys[0]),
            'key': set(candidate_keys[0])
        })
    
    return relations
 
 
# Example analysis
if __name__ == "__main__":
    # Employee relation with transitive dependency
    all_attrs = {'EmpID', 'EmpName', 'DeptID', 'DeptName', 'DeptHead'}
    fds = [
        ({'EmpID'}, {'EmpName', 'DeptID'}),
        ({'DeptID'}, {'DeptName', 'DeptHead'})
    ]
    candidate_keys = [{'EmpID'}]
    
    result = analyze_3nf(all_attrs, fds, candidate_keys)
    
    print("3NF Analysis:")
    print(f"  Candidate Keys: {result['candidate_keys']}")
    print(f"  Prime: {result['prime_attributes']}")
    print(f"  Non-Prime: {result['non_prime_attributes']}")
    print(f"  Is 3NF: {result['is_3nf']}")
    
    if result['violations']:
        print("  Violations:")
        for v in result['violations']:
            print(f"    - {v['fd']}: {v['issue']}")

The Prime Exception

Keys in Boyce-Codd Normal Form Analysis

BCNF removes the prime attribute exception, providing the strictest key-based constraint:

BCNF Definition:

For every non-trivial FD X → A:

X is a superkey

That's it. No exceptions. Every determinant must be a superkey.

Why BCNF Matters:

3NF allows anomalies when overlapping candidate keys exist. Consider:

Relation: CourseInstructor(Course, Instructor, Textbook)
FDs:
  Course, Instructor → Textbook
  Textbook → Course

Candidate Keys: {Course, Instructor}, {Instructor, Textbook}
Prime: {Course, Instructor, Textbook}  (all!)
Non-Prime: {} (none!)

3NF Check:

Textbook → Course: Is Textbook a superkey? No. Is Course prime? Yes! ✓ Passes 3NF.

BCNF Check:

Textbook → Course: Is Textbook a superkey? No. ✗ Violates BCNF.

The relation is in 3NF but not BCNF!

The 3NF vs BCNF Trade-off:

Why not always use BCNF? Because BCNF decomposition may not preserve all dependencies.

Original: CourseInstructor(Course, Instructor, Textbook)
FDs: Course, Instructor → Textbook
     Textbook → Course

BCNF Decomposition:
  R1(Textbook, Course)     -- Key: Textbook
  R2(Instructor, Textbook) -- Key: Instructor, Textbook

Preserved FDs:
  ✓ Textbook → Course (in R1)

Lost FD:
  ✗ Course, Instructor → Textbook (spans both tables!)

To enforce Course, Instructor → Textbook, we'd need to join R1 and R2, which is expensive. The dependency is not preserved.

3NF vs BCNF Trade-offs
Property	3NF Synthesis	BCNF Decomposition
Eliminates redundancy	Mostly (with prime exception)	Completely
Lossless join	✓ Guaranteed	✓ Guaranteed
Dependency preservation	✓ Guaranteed	✗ Not guaranteed
Algorithm complexity	Moderate	Simpler
Resulting relations	May be fewer	May be more

Practical Guidance

A Unified Framework for Key-Based Schema Analysis

We can now present a unified approach to analyzing and improving database schemas based on key analysis:

Unified Schema Analysis Framework
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
PROCEDURE AnalyzeAndNormalize(R, F):
    // R = relation schema with attributes U
    // F = set of functional dependencies
    
    // ═══════════════════════════════════════════════════
    // PHASE 1: KEY DISCOVERY
    // ═══════════════════════════════════════════════════
    
    1.1 Classify attributes:
        L_only ← attributes only on left side of FDs
        R_only ← attributes only on right side of FDs
        Both   ← attributes on both sides
        Neither ← attributes not in any FD
        
    1.2 Compute core:
        Core ← L_only ∪ Neither
        
    1.3 Find all candidate keys:
        IF Core⁺ = U THEN
            CandidateKeys ← {Core}
        ELSE
            CandidateKeys ← {}
            FOR each subset S of Both (smallest first):
                Candidate ← Core ∪ S
                IF Candidate⁺ = U AND Candidate is minimal:
                    Add Candidate to CandidateKeys
                    
    // ═══════════════════════════════════════════════════
    // PHASE 2: ATTRIBUTE CLASSIFICATION
    // ═══════════════════════════════════════════════════
    
    2.1 Compute prime attributes:
        Prime ← ⋃ CandidateKeys
        
    2.2 Compute non-prime attributes:
        NonPrime ← U - Prime
        
    // ═══════════════════════════════════════════════════
    // PHASE 3: NORMAL FORM ANALYSIS
    // ═══════════════════════════════════════════════════
    
    3.1 Check 2NF (only if any key is composite):
        FOR each composite key K in CandidateKeys:
            FOR each non-empty proper subset S of K:
                FOR each A in NonPrime:
                    IF A ∈ S⁺ THEN
                        Report 2NF violation: A partially depends on K via S
                        
    3.2 Check 3NF:
        FOR each FD (X → A) in F:
            IF A ∉ X THEN  // Non-trivial
                IF X is not a superkey AND A ∉ Prime THEN
                    Report 3NF violation: X → A
                    
    3.3 Check BCNF:
        FOR each FD (X → A) in F:
            IF A ∉ X THEN  // Non-trivial
                IF X is not a superkey THEN
                    Report BCNF violation: X → A
                    
    // ═══════════════════════════════════════════════════
    // PHASE 4: DECOMPOSITION (if needed)
    // ═══════════════════════════════════════════════════
    
    4.1 Choose target normal form:
        IF dependency preservation is critical THEN
            Target ← 3NF
        ELSE
            Target ← BCNF
            
    4.2 Apply appropriate algorithm:
        IF Target = 3NF THEN
            Result ← Synthesize3NF(F, CandidateKeys)
        ELSE
            Result ← DecomposeBCNF(R, F)
            
    4.3 Verify result:
        FOR each relation Ri in Result:
            Verify Ri is in the Target normal form
        Verify the decomposition has the lossless-join property
        IF Target = 3NF THEN
            Verify dependency preservation
            
    RETURN Result
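Phases 1 and 2 translate fairly directly into Python. The sketch below is one possible implementation, assuming FDs are pairs of frozensets; the helper names (`fd`, `closure`, `candidate_keys`, `prime_attributes`) are our own, not from any library. It enumerates subsets of the Both attributes smallest-first and skips any candidate that already contains a found key, which is the minimality test from step 1.3.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names, e.g. fd("SID CID", "Grade")."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure X+: keep applying FDs whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(universe, fds):
    """Phase 1: all candidate keys, built as Core plus a subset of the Both attributes."""
    universe = frozenset(universe)
    left = frozenset().union(*(lhs for lhs, _ in fds))
    right = frozenset().union(*(rhs for _, rhs in fds))
    core = universe - right        # L_only and Neither: never derivable, so in every key
    both = sorted(left & right)    # R_only attributes are never needed in a key
    keys = []
    for size in range(len(both) + 1):          # smallest subsets first
        for extra in combinations(both, size):
            cand = core | frozenset(extra)
            if any(key <= cand for key in keys):
                continue                       # contains a smaller key: not minimal
            if closure(cand, fds) == universe:
                keys.append(cand)
    return keys


def prime_attributes(keys):
    """Phase 2: an attribute is prime if it appears in at least one candidate key."""
    return frozenset().union(*keys)


# StudentCourse example traced below
U = "SID SName CID CName Instructor Grade".split()
F = [fd("SID", "SName"), fd("CID", "CName Instructor"), fd("SID CID", "Grade")]
keys = candidate_keys(U, F)
print([sorted(k) for k in keys])       # [['CID', 'SID']]
print(sorted(prime_attributes(keys)))  # ['CID', 'SID']
```

On the Course/Instructor/Textbook relation from the BCNF example, the same function finds two overlapping keys, {Course, Instructor} and {Instructor, Textbook}, so every attribute there is prime. The subset enumeration is exponential in the size of Both, which is acceptable for textbook-sized schemas.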

The Framework in Action:

Let's trace through a complete example:

Relation: StudentCourse(SID, SName, CID, CName, Instructor, Grade)

FDs:
  SID → SName
  CID → CName, Instructor
  SID, CID → Grade

Phase 1: Key Discovery

Attribute	On Left?	On Right?	Classification
SID	✓	✗	L-only
SName	✗	✓	R-only
CID	✓	✗	L-only
CName	✗	✓	R-only
Instructor	✗	✓	R-only
Grade	✗	✓	R-only

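Core = {SID, CID}  (the L-only attributes; there are no Neither attributes)

{SID, CID}⁺:
  start with {SID, CID}
  add SName              (via SID → SName)
  add CName, Instructor  (via CID → CName, Instructor)
  add Grade              (via SID, CID → Grade)
  = all attributes ✓

Since Core⁺ = U, step 1.3 stops immediately: Core is the only candidate key.

Candidate keys: {{SID, CID}}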

Phase 2: Attribute Classification

Prime = {SID, CID}
Non-Prime = {SName, CName, Instructor, Grade}

Phase 3: Normal Form Analysis

2NF Check (key is composite):

{SID} → SName? Yes! SName is non-prime. 2NF VIOLATION
{SID} → CName? No. {SID}⁺ = {SID, SName}
{CID} → CName? Yes! CName is non-prime. 2NF VIOLATION
{CID} → Instructor? Yes! Instructor is non-prime. 2NF VIOLATION
{SID} → Grade or {CID} → Grade? No. Grade depends on the whole key ✓

Not in 2NF — multiple partial dependencies!

(Note: If not in 2NF, automatically not in 3NF or BCNF)
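The 2NF test in step 3.1 is short enough to sketch on its own. This Python version is a minimal illustration, assuming FDs are pairs of sets; `partial_dependencies` is a hypothetical helper name. It reproduces the three violations found above and, as expected, does not report Grade.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: apply FDs until no new attribute is added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(keys, non_prime, fds):
    """Step 3.1: list (subset, key, attribute) for every non-prime attribute
    determined by a non-empty proper subset of a composite candidate key."""
    violations = []
    for key in keys:
        for size in range(1, len(key)):  # non-empty proper subsets only
            for subset in combinations(sorted(key), size):
                for attr in sorted(non_prime & closure(subset, fds)):
                    violations.append((subset, tuple(sorted(key)), attr))
    return violations


F = [
    ({"SID"}, {"SName"}),
    ({"CID"}, {"CName", "Instructor"}),
    ({"SID", "CID"}, {"Grade"}),
]
non_prime = {"SName", "CName", "Instructor", "Grade"}
violations = partial_dependencies([{"SID", "CID"}], non_prime, F)
for subset, key, attr in violations:
    print(f"2NF violation: {attr} depends on {subset}, part of key {key}")
```

Single-attribute keys produce no proper subsets to test, which matches the rule that only composite keys can cause 2NF violations.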

Phase 4: Decomposition

Decompose to eliminate partial dependencies:

Student(SID, SName)          -- Key: SID
  FD: SID → SName
  
Course(CID, CName, Instructor) -- Key: CID
  FDs: CID → CName, Instructor
  
Enrollment(SID, CID, Grade)  -- Key: SID, CID
  FD: SID, CID → Grade

Each resulting relation is in BCNF:

Student: SID is the only determinant, and SID is a superkey ✓
Course: CID is the only determinant, and CID is a superkey ✓
Enrollment: {SID, CID} is the only determinant, and {SID, CID} is a superkey ✓
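The per-relation BCNF check can be sketched the same way. This is an illustration under one assumption: each relation comes with a cover of the FDs projected onto its attributes (true here, since projecting F onto each table yields exactly the FDs listed). The `decomposition` dict and `bcnf_violations` helper are hypothetical names.

```python
def closure(attrs, fds):
    """Attribute closure: apply FDs until no new attribute is added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Non-trivial FDs X -> Y whose left side X is not a superkey of the relation.
    Assumes fds is a cover of the FDs projected onto this relation's attributes."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not set(attrs) <= closure(lhs, fds)
    ]


decomposition = {
    "Student": ({"SID", "SName"}, [({"SID"}, {"SName"})]),
    "Course": ({"CID", "CName", "Instructor"}, [({"CID"}, {"CName", "Instructor"})]),
    "Enrollment": ({"SID", "CID", "Grade"}, [({"SID", "CID"}, {"Grade"})]),
}
for name, (attrs, fds) in decomposition.items():
    print(name, "BCNF" if not bcnf_violations(attrs, fds) else "not BCNF")

# For contrast: the undecomposed StudentCourse relation with all three FDs
student_course = {"SID", "SName", "CID", "CName", "Instructor", "Grade"}
all_fds = [dep for _, fds in decomposition.values() for dep in fds]
print(len(bcnf_violations(student_course, all_fds)))  # 2: SID -> SName and CID -> CName, Instructor
```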

Keys as Design Guides

Beyond normalization algorithms, understanding keys provides design intuition:

The Entity Rule:

Each candidate key represents an entity identity. If your relation has multiple unrelated keys, you may be storing multiple entity types.

Bad: PersonVehicle(SSN, Name, VIN, Make, Model)
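Identifiers: {SSN} for person, {VIN} for vehicle (with only SSN → Name and VIN → Make, Model, the table's candidate key is the composite {SSN, VIN})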

These are two different entities forced into one table!

The Determinant Rule:

Every non-trivial determinant should correspond to an entity or relationship identity.

Examine: Employee(EmpID, Name, DeptID, DeptName, ManagerID)

Determinants:
  EmpID → ... (Employee entity)
  DeptID → DeptName, ManagerID (Department entity!)
  
DeptID determines department-specific data but isn't the table's key.
This indicates Department should be a separate table.
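The Determinant Rule can be checked mechanically: any determinant whose closure does not cover the whole relation points at a hidden entity. The sketch below assumes the Employee FDs are EmpID → Name, DeptID and DeptID → DeptName, ManagerID (the text elides EmpID's right side); `hidden_entities` is a hypothetical helper that proposes extracting X⁺ with X as its key.

```python
def closure(attrs, fds):
    """Attribute closure: apply FDs until no new attribute is added."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def hidden_entities(attrs, fds):
    """For each determinant X that is not a superkey of attrs, propose
    extracting a relation on X+ (restricted to attrs) keyed by X."""
    attrs = set(attrs)
    proposals = []
    for lhs, _ in fds:
        reach = closure(lhs, fds) & attrs
        if reach != attrs:  # X is not a superkey: candidate hidden entity
            proposals.append((sorted(lhs), sorted(reach)))
    return proposals


employee = {"EmpID", "Name", "DeptID", "DeptName", "ManagerID"}
F = [
    ({"EmpID"}, {"Name", "DeptID"}),          # assumed right side for EmpID
    ({"DeptID"}, {"DeptName", "ManagerID"}),
]
for key, cols in hidden_entities(employee, F):
    print(f"extract relation {cols} with key {key}")
# extract relation ['DeptID', 'DeptName', 'ManagerID'] with key ['DeptID']
```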

Key-Based Design Principles

•One entity, one relation: Each logical entity should have its own relation with its natural key
•Keys identify, non-keys describe: Prime attributes establish identity; non-prime attributes provide information about that identity
•Determinants reveal structure: Non-superkey determinants suggest hidden entities that should be extracted
•Key overlap indicates rich semantics: Multiple overlapping keys often indicate complex real-world constraints worth documenting
•Synthetic keys for stability: When natural keys are unstable or large, consider synthetic keys but document the natural candidates


Advanced Considerations

Several advanced topics connect keys to broader database design concerns:

Natural vs. Synthetic Keys:

Candidate keys derived from FDs are natural keys—they emerge from the domain semantics. Synthetic keys (auto-increment IDs, UUIDs) are added for practical reasons.

Natural Key: {Email} for User table
Synthetic Key: {UserID} added for:
  - Stability (emails change more than IDs)
  - Efficiency (integers are smaller than strings)
  - Privacy (ID in URLs is less revealing than email)

Best practice: Use a synthetic primary key, but declare each natural candidate key with a UNIQUE constraint so the database still enforces it.
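A minimal sketch of this practice using Python's built-in `sqlite3` module; the `Users` table and its columns are illustrative. The synthetic `UserID` is the primary key, while the UNIQUE constraint keeps `Email` a genuine candidate key that the database rejects duplicates for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Users (
        UserID INTEGER PRIMARY KEY,   -- synthetic key (SQLite assigns it)
        Email  TEXT NOT NULL UNIQUE,  -- natural candidate key, still enforced
        Name   TEXT
    )
""")
conn.execute("INSERT INTO Users (Email, Name) VALUES (?, ?)", ("ana@example.com", "Ana"))

try:
    # Same natural key, different row: the UNIQUE constraint should reject it.
    conn.execute("INSERT INTO Users (Email, Name) VALUES (?, ?)", ("ana@example.com", "Ana B."))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```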

Keys and Temporal Data:

When tracking history, keys often expand:

Current: Employee(EmpID, Name, DeptID, Salary)
Key: {EmpID}

Historical: EmployeeHistory(EmpID, EffectiveDate, Name, DeptID, Salary)
Key: {EmpID, EffectiveDate}

The temporal dimension becomes part of the identity.
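The expanded key can be declared directly as a composite primary key. This `sqlite3` sketch uses made-up rows: two versions of the same employee on different dates are accepted, a second version on the same date is rejected, and the current state is read as the latest version (one common convention, assumed here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE EmployeeHistory (
        EmpID         INTEGER NOT NULL,
        EffectiveDate TEXT    NOT NULL,  -- ISO-8601 date string
        Name          TEXT,
        DeptID        INTEGER,
        Salary        INTEGER,
        PRIMARY KEY (EmpID, EffectiveDate)
    )
""")
rows = [
    (1, "2023-01-01", "Ana", 10, 50000),
    (1, "2024-01-01", "Ana", 20, 55000),  # same employee, new version: allowed
]
conn.executemany("INSERT INTO EmployeeHistory VALUES (?, ?, ?, ?, ?)", rows)

try:
    # Two versions of employee 1 on the same date violate {EmpID, EffectiveDate}.
    conn.execute("INSERT INTO EmployeeHistory VALUES (1, '2024-01-01', 'Ana', 30, 60000)")
    same_date_rejected = False
except sqlite3.IntegrityError:
    same_date_rejected = True

# Current state = the version with the latest EffectiveDate.
current = conn.execute("""
    SELECT DeptID, Salary FROM EmployeeHistory
    WHERE EmpID = 1 ORDER BY EffectiveDate DESC LIMIT 1
""").fetchone()
print(same_date_rejected, current)  # True (20, 55000)
```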

Keys in Distributed Systems:

In distributed databases, key choice affects:

Partitioning: Data is typically sharded by primary key
Locality: Related data should share key prefixes
Uniqueness: Keys must be globally unique across nodes, even when each node generates them independently

Local: OrderID = auto-increment  (conflicts across nodes!)
Distributed: OrderID = {NodeID, LocalSequence} or UUID
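Both options can be sketched in a few lines of Python. `NodeKeyGenerator` is a hypothetical helper, and it assumes each node is assigned a distinct NodeID out of band. The local sequences collide (both nodes issue 1), but the (NodeID, LocalSequence) pair stays unique; `uuid.uuid4()` from the standard library avoids coordination entirely at the cost of larger, unordered keys.

```python
import itertools
import uuid


class NodeKeyGenerator:
    """Issues (NodeID, LocalSequence) keys; unique across nodes only if
    every node is given a distinct NodeID (an assumption of this sketch)."""

    def __init__(self, node_id):
        self.node_id = node_id
        self._seq = itertools.count(1)  # local auto-increment, starting at 1

    def next_key(self):
        return (self.node_id, next(self._seq))


node_a, node_b = NodeKeyGenerator(1), NodeKeyGenerator(2)
keys = [node_a.next_key(), node_b.next_key(), node_a.next_key()]
print(keys)  # [(1, 1), (2, 1), (1, 2)]

# Alternative: a random (version 4) UUID needs no node coordination.
order_id = uuid.uuid4()
print(order_id.version)  # 4
```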


Summary: Keys and Normalization

This module has built a complete understanding of how functional dependencies, keys, and normalization interrelate. Let's consolidate the key insights:

Key Takeaways

•Keys emerge from FDs: Candidate keys are derived systematically from functional dependencies using attribute closure
•Prime/non-prime matters: This classification directly affects normal form definitions and analysis
•Normal forms are key constraints: 2NF, 3NF, and BCNF progressively tighten the rules about what may determine what
•The Codd Mantra applies: Non-primes depend on "the key, the whole key, and nothing but the key"
•3NF vs BCNF trade-off: BCNF eliminates all FD-based redundancy but may sacrifice dependency preservation
•Keys guide design: Understanding keys reveals entity structure and suggests decomposition

Module Complete:

You've now completed the module on Functional Dependencies and Keys. You can:

Derive all candidate keys from any FD set
Classify attributes as prime or non-prime
Analyze relations for 2NF, 3NF, and BCNF compliance
Apply decomposition algorithms to achieve target normal forms
Use key analysis to guide intuitive schema design

This knowledge forms the foundation for the normalization chapters ahead, where you'll apply these concepts to systematically improve database designs.
