Database Management SystemsFD and Keys

Functional Dependencies and Keys

LevelIntermediate

Duration75 mins

TopicFD and Keys

1 / 5

Key from FDs

The Foundation of Database Integrity

Every well-designed database schema rests upon a fundamental question: What uniquely identifies each entity in our data? This seemingly simple question has profound implications for data integrity, query performance, and system reliability.

Functional dependencies provide the mathematical framework to answer this question rigorously. Rather than relying on intuition or business assumptions alone, we can derive keys systematically from the functional relationships inherent in our data—ensuring that our schema design is both correct and complete.

What You Will Learn

By the end of this page, you will understand how functional dependencies encode the information necessary to identify keys, master the attribute closure algorithm for key derivation, and develop the analytical intuition to determine whether any attribute set qualifies as a superkey or candidate key.

Keys and FDs: The Fundamental Connection

To understand how keys emerge from functional dependencies, we must first precisely define what a key actually is in the context of relational theory.

Formal Definition of a Key:

Given a relation schema R with attribute set U and a set of functional dependencies F:

A superkey is any attribute set K ⊆ U such that K → U (i.e., K functionally determines all attributes in U)
A candidate key is a minimal superkey—no proper subset of K is also a superkey
The primary key is one designated candidate key chosen by the database designer

The crucial insight is this: the set F completely determines which attribute combinations can serve as keys. This is not a matter of designer preference or business rules—it's a mathematical consequence of the functional dependencies.

The Mathematical Perspective

Think of functional dependencies as encoding constraints on possible database states. A key is simply an attribute set strong enough (under these constraints) to uniquely determine everything else. The FDs don't suggest keys—they mandate them.

Example: Understanding the Connection

Consider a relation schema:

Student(StudentID, Email, Name, Department, Advisor)

With functional dependencies:

F = {
    StudentID → Email, Name, Department, Advisor
    Email → StudentID, Name, Department, Advisor
    Department → Advisor
}

From these FDs, we can immediately derive:

StudentID is a superkey because StudentID → all other attributes
Email is a superkey because Email → all other attributes
{StudentID, Name} is a superkey (StudentID alone determines everything)
{Department} is NOT a superkey because Department ↛ StudentID, Email, Name

But which are candidate keys? StudentID and Email are both minimal superkeys—removing any attribute from either (which is impossible since they're single attributes) would destroy the superkey property. Therefore, both StudentID and Email are candidate keys.

Key Hierarchy and Relationships
Concept	Definition	Relationship to FDs	Example
Superkey	Any set that determines all attributes	K⁺ = U (closure equals all attributes)	{StudentID}, {Email}, {StudentID, Name}
Candidate Key	Minimal superkey	No proper subset has closure = U	{StudentID}, {Email}
Primary Key	Designated candidate key	Designer choice from candidates	{StudentID}
Alternate Key	Non-primary candidate key	Remaining candidates after PK choice	{Email}

The Attribute Closure Method for Key Derivation

The attribute closure (X⁺) is the cornerstone tool for deriving keys from functional dependencies. Given an attribute set X and FD set F, the closure X⁺ contains all attributes that can be determined from X using the FDs in F.

The Key Insight:

An attribute set X is a superkey if and only if X⁺ contains all attributes of the relation. In symbols:

X is a superkey ⟺ X⁺ = U (where U is the set of all attributes)

This transforms key identification into a computational problem with a well-defined algorithm.

Attribute Closure Algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
ALGORITHM: ComputeClosure(X, F)
// Input: X = set of attributes, F = set of functional dependencies
// Output: X⁺ = closure of X under F
 
FUNCTION ComputeClosure(X, F):
    closure ← X                    // Initialize with input attributes
    changed ← true
    
    WHILE changed DO:
        changed ← false
        FOR EACH dependency (A → B) IN F:
            // If we have all attributes on the left side
            IF A ⊆ closure AND B ⊄ closure THEN:
                closure ← closure ∪ B   // Add right-side attributes
                changed ← true
        END FOR
    END WHILE
    
    RETURN closure
 
// Example usage:
// X = {StudentID}
// F = {StudentID → Email, Email → Name, Name → Department}
// ComputeClosure(X, F) returns {StudentID, Email, Name, Department}

Algorithm Complexity

The closure algorithm runs in O(n²) time in the worst case, where n is the number of attributes and FDs. Each iteration adds at least one attribute, and each iteration examines all FDs. This is efficient enough for practical database design scenarios.

Systematic Key Derivation Process

While the closure algorithm tells us whether a specific attribute set is a superkey, we need a systematic process to find all candidate keys. This requires understanding the role different attributes play in functional dependencies.

Attribute Classification:

For a given relation schema R with FD set F, each attribute falls into one of four categories:

L-only (Left-only): Appears only on the left side of FDs
R-only (Right-only): Appears only on the right side of FDs
Both: Appears on both left and right sides of different FDs
Neither: Does not appear in any FD (but is in the schema)

This classification dramatically reduces the key search space.

Attribute Classification and Key Implications
Classification	Key Implication	Reason
L-only	MUST be in every candidate key	No FD can derive this attribute—it must be given
R-only	NEVER in any candidate key	Can always be derived—adding it creates redundancy
Neither	MUST be in every candidate key	No FD involves this attribute—it cannot be derived
Both	MAY be in some candidate keys	Sometimes needed, sometimes derivable

The Key Finding Algorithm:

With this classification, we can efficiently find all candidate keys:

Partition attributes into L-only, R-only, Both, and Neither categories
Compute the core: Core = (L-only attributes) ∪ (Neither attributes)
Check if core is sufficient: If Core⁺ = U, then Core is the unique candidate key
If not, extend systematically: Try adding subsets of 'Both' attributes to the core
Verify minimality: For each superkey found, ensure no proper subset is also a superkey

Find All Candidate Keys
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
from itertools import combinations
 
def classify_attributes(all_attrs: set, fds: list[tuple[set, set]]) -> dict:
    """
    Classify attributes based on their appearance in FDs.
    
    Returns dict with keys: 'left_only', 'right_only', 'both', 'neither'
    """
    left_side = set()
    right_side = set()
    
    for lhs, rhs in fds:
        left_side.update(lhs)
        right_side.update(rhs)
    
    return {
        'left_only': left_side - right_side,
        'right_only': right_side - left_side,
        'both': left_side & right_side,
        'neither': all_attrs - left_side - right_side
    }
 
 
def find_all_candidate_keys(all_attrs: set, fds: list[tuple[set, set]]) -> list[set]:
    """
    Find all candidate keys for a relation given its FDs.
    
    Uses attribute classification to prune the search space efficiently.
    """
    classification = classify_attributes(all_attrs, fds)
    
    # Core attributes that MUST be in every candidate key
    core = classification['left_only'] | classification['neither']
    
    # Attributes that might extend the core
    maybe_needed = classification['both']
    
    # Check if core alone is a superkey
    if compute_closure(core, fds) == all_attrs:
        return [core] if core else [set()]  # Handle empty core edge case
    
    candidate_keys = []
    
    # Try adding subsets of 'maybe_needed' to the core
    # Start with smallest subsets (for minimality)
    for size in range(1, len(maybe_needed) + 1):
        for subset in combinations(maybe_needed, size):
            candidate = core | set(subset)
            
            # Check if it's a superkey
            if compute_closure(candidate, fds) != all_attrs:
                continue
            
            # Check if it's minimal (no proper subset is a superkey)
            is_minimal = True
            for existing in candidate_keys:
                if existing < candidate:  # Existing is proper subset
                    is_minimal = False
                    break
            
            # Also check within 'maybe_needed' subset
            if is_minimal:
                for attr in subset:
                    smaller = core | (set(subset) - {attr})
                    if compute_closure(smaller, fds) == all_attrs:
                        is_minimal = False
                        break
            
            if is_minimal:
                candidate_keys.append(candidate)
    
    return candidate_keys
 
 
# Extended example
if __name__ == "__main__":
    # Relation: R(A, B, C, D, E)
    # FDs: A → B, BC → D, D → E, E → A
    all_attrs = {'A', 'B', 'C', 'D', 'E'}
    fds = [
        ({'A'}, {'B'}),
        ({'B', 'C'}, {'D'}),
        ({'D'}, {'E'}),
        ({'E'}, {'A'})
    ]
    
    classification = classify_attributes(all_attrs, fds)
    print("Attribute Classification:")
    for category, attrs in classification.items():
        print(f"  {category}: {attrs if attrs else '{}'}")
    
    candidate_keys = find_all_candidate_keys(all_attrs, fds)
    print(f"\nCandidate Keys: {[sorted(k) for k in candidate_keys]}")

Worked Examples: Key Derivation in Practice

Let's work through complete examples to solidify understanding. These examples progressively increase in complexity.

Relation: Employee(EmpID, Name, Email, DeptID, DeptName, Manager)

Functional Dependencies:

F = {
    EmpID → Name, Email, DeptID
    DeptID → DeptName, Manager
}

Step 1: Classify attributes

Attribute	Appears in LHS	Appears in RHS	Category
EmpID	Yes	No	L-only
Name	No	Yes	R-only
Email	No	Yes	R-only
DeptID	Yes	Yes	Both
DeptName	No	Yes	R-only
Manager	No	Yes	R-only

Step 2: Identify core

Core = L-only ∪ Neither = {EmpID} ∪ {} = {EmpID}

Step 3: Compute core closure

{EmpID}⁺:

Start: {EmpID}
Apply EmpID → Name, Email, DeptID: {EmpID, Name, Email, DeptID}
Apply DeptID → DeptName, Manager: {EmpID, Name, Email, DeptID, DeptName, Manager}

{EmpID}⁺ = all attributes ✓

Result: The unique candidate key is {EmpID}

Key Derivation and Schema Design

Understanding how keys emerge from FDs has profound implications for database design:

1. Validation of Schema Correctness

When designing a database, we often start with intuitions about keys ("StudentID should be the primary key"). Key derivation from FDs validates these intuitions:

If StudentID⁺ ≠ all attributes, our assumed key is wrong
If there are other candidate keys we didn't expect, we might be missing important FDs
If no candidate key exists, the schema is malformed

2. Discovery of Hidden Keys

Sometimes the FDs reveal keys that weren't obvious:

Relation: Order(OrderNum, CustomerEmail, CustomerPhone, ProductID, Quantity)
FDs: 
    OrderNum → CustomerEmail, ProductID, Quantity
    CustomerEmail → CustomerPhone

Obvious key: {OrderNum}

But what if business rules also specify:

    CustomerEmail, ProductID → OrderNum  (each customer-product pair has one active order)

Now {CustomerEmail, ProductID} is also a candidate key!

Keys from Data vs. Keys from Semantics

Never derive FDs (and thus keys) solely from examining data. A sample showing unique StudentID values doesn't prove StudentID is a key—maybe two students just haven't been assigned the same ID yet. Keys must be derived from semantic understanding of the domain, expressed as FDs that hold for ALL possible valid states of the database.

3. Implications for Foreign Keys

Candidate keys in one table become the options for foreign key references in another:

-- If Student has candidate keys {StudentID} and {Email}
-- Then Enrollment can reference either:

CREATE TABLE Enrollment (
    EnrollmentID INT PRIMARY KEY,
    StudentRef INT REFERENCES Student(StudentID),  -- or
    -- StudentEmail VARCHAR REFERENCES Student(Email)
    CourseID INT,
    ...
);

The choice affects query patterns, index requirements, and even storage efficiency.

Best Practices for Key-Based Schema Design

•Document all FDs explicitly — Don't assume keys; derive them from documented FDs
•Consider all candidate keys — The "best" primary key isn't always the most obvious one
•Synthetic vs. natural keys — Sometimes derived candidate keys are awkward (multi-attribute); consider adding a synthetic key
•Validate against business rules — Each FD should correspond to a verifiable business rule
•Re-derive keys when FDs change — Schema modifications that change FDs may change candidate keys

Common Pitfalls and Edge Cases

Key derivation seems straightforward in theory, but several pitfalls trap even experienced practitioners:

Common Mistakes

•Forgetting transitivity — If A → B and B → C, then A⁺ includes C
•Ignoring composite FDs — {A, B} → C is different from A → C and B → C
•Assuming uniqueness = key — Observing unique values doesn't prove FD holds
•Missing attributes in closure — If F doesn't mention an attribute X, include it in core
•Stopping at first key found — Relations can have multiple candidate keys

Correct Approach

•Apply closure fully — Iterate until no new attributes added
•Consider composite keys — Test attribute combinations systematically
•Derive FDs from semantics — Business rules → FDs → Keys
•Track all attributes — Classify using the L/R/Both/Neither system
•Find all candidate keys — Exhaustive search over 'Both' category

Edge Case: The Empty Key

What if the FD set implies a constant attribute?

Relation: Config(Version, Setting, Value)
FDs: {} → Version  (Version is constant across all tuples)

In this case, {} (the empty set) determines Version. The empty set is technically a subset of every key. This means the table can only have one distinct Version value—effectively, it's not a "true" attribute but a schema-level constant.

Edge Case: All Attributes Are Keys

If no non-trivial FDs exist, every proper superset of {} is a superkey. The unique candidate key is the set of all attributes:

Relation: EventLog(Timestamp, UserID, Action, IPAddress)
FDs: {} (none, all combinations are possible)

Candidate key: {Timestamp, UserID, Action, IPAddress}

This often indicates either:

A logging/audit table (correct design)
Missing FD documentation (review business rules)

Summary: Key Derivation from FDs

We've established the theoretical and practical foundations for deriving keys from functional dependencies. Let's consolidate the key insights:

Key Takeaways

•Keys are derived, not assumed — The FD set mathematically determines which attribute combinations are keys
•Attribute closure is the key tool — X is a superkey iff X⁺ = all attributes
•Classification accelerates search — L-only and Neither must be in keys; R-only never is
•Multiple candidate keys are possible — Find all of them for complete schema understanding
•Design implications are profound — Key derivation validates schema correctness and reveals hidden constraints

What's Next:

With the ability to derive keys from FDs, we now turn to categorizing the attributes within those keys. The next page explores prime and non-prime attributes—a classification that becomes essential for understanding normal forms and guiding normalization decisions.

Page Complete

You now understand how functional dependencies encode the information necessary to identify all candidate keys in a relation. This forms the foundation for the attribute classifications and normalization theory to come.

1 / 5

Loading learning content...

Database Management SystemsFD and Keys

Functional Dependencies and Keys

LevelIntermediate

Duration75 mins

TopicFD and Keys

1 / 5

Key from FDs

The Foundation of Database Integrity

What You Will Learn

Keys and FDs: The Fundamental Connection

To understand how keys emerge from functional dependencies, we must first precisely define what a key actually is in the context of relational theory.

Formal Definition of a Key:

Given a relation schema R with attribute set U and a set of functional dependencies F:

A superkey is any attribute set K ⊆ U such that K → U (i.e., K functionally determines all attributes in U)
A candidate key is a minimal superkey—no proper subset of K is also a superkey
The primary key is one designated candidate key chosen by the database designer

The Mathematical Perspective

Example: Understanding the Connection

Consider a relation schema:

Student(StudentID, Email, Name, Department, Advisor)

With functional dependencies:

F = {
    StudentID → Email, Name, Department, Advisor
    Email → StudentID, Name, Department, Advisor
    Department → Advisor
}

From these FDs, we can immediately derive:

StudentID is a superkey because StudentID → all other attributes
Email is a superkey because Email → all other attributes
{StudentID, Name} is a superkey (StudentID alone determines everything)
{Department} is NOT a superkey because Department ↛ StudentID, Email, Name

Key Hierarchy and Relationships
Concept	Definition	Relationship to FDs	Example
Superkey	Any set that determines all attributes	K⁺ = U (closure equals all attributes)	{StudentID}, {Email}, {StudentID, Name}
Candidate Key	Minimal superkey	No proper subset has closure = U	{StudentID}, {Email}
Primary Key	Designated candidate key	Designer choice from candidates	{StudentID}
Alternate Key	Non-primary candidate key	Remaining candidates after PK choice	{Email}

The Attribute Closure Method for Key Derivation

The Key Insight:

An attribute set X is a superkey if and only if X⁺ contains all attributes of the relation. In symbols:

X is a superkey ⟺ X⁺ = U (where U is the set of all attributes)

This transforms key identification into a computational problem with a well-defined algorithm.

Attribute Closure Algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
ALGORITHM: ComputeClosure(X, F)
// Input: X = set of attributes, F = set of functional dependencies
// Output: X⁺ = closure of X under F
 
FUNCTION ComputeClosure(X, F):
    closure ← X                    // Initialize with input attributes
    changed ← true
    
    WHILE changed DO:
        changed ← false
        FOR EACH dependency (A → B) IN F:
            // If we have all attributes on the left side
            IF A ⊆ closure AND B ⊄ closure THEN:
                closure ← closure ∪ B   // Add right-side attributes
                changed ← true
        END FOR
    END WHILE
    
    RETURN closure
 
// Example usage:
// X = {StudentID}
// F = {StudentID → Email, Email → Name, Name → Department}
// ComputeClosure(X, F) returns {StudentID, Email, Name, Department}

Algorithm Complexity

Systematic Key Derivation Process

Attribute Classification:

For a given relation schema R with FD set F, each attribute falls into one of four categories:

L-only (Left-only): Appears only on the left side of FDs
R-only (Right-only): Appears only on the right side of FDs
Both: Appears on both left and right sides of different FDs
Neither: Does not appear in any FD (but is in the schema)

This classification dramatically reduces the key search space.

Attribute Classification and Key Implications
Classification	Key Implication	Reason
L-only	MUST be in every candidate key	No FD can derive this attribute—it must be given
R-only	NEVER in any candidate key	Can always be derived—adding it creates redundancy
Neither	MUST be in every candidate key	No FD involves this attribute—it cannot be derived
Both	MAY be in some candidate keys	Sometimes needed, sometimes derivable

The Key Finding Algorithm:

With this classification, we can efficiently find all candidate keys:

Partition attributes into L-only, R-only, Both, and Neither categories
Compute the core: Core = (L-only attributes) ∪ (Neither attributes)
Check if core is sufficient: If Core⁺ = U, then Core is the unique candidate key
If not, extend systematically: Try adding subsets of 'Both' attributes to the core
Verify minimality: For each superkey found, ensure no proper subset is also a superkey

Find All Candidate Keys
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
from itertools import combinations
 
def classify_attributes(all_attrs: set, fds: list[tuple[set, set]]) -> dict:
    """
    Classify attributes based on their appearance in FDs.
    
    Returns dict with keys: 'left_only', 'right_only', 'both', 'neither'
    """
    left_side = set()
    right_side = set()
    
    for lhs, rhs in fds:
        left_side.update(lhs)
        right_side.update(rhs)
    
    return {
        'left_only': left_side - right_side,
        'right_only': right_side - left_side,
        'both': left_side & right_side,
        'neither': all_attrs - left_side - right_side
    }
 
 
def find_all_candidate_keys(all_attrs: set, fds: list[tuple[set, set]]) -> list[set]:
    """
    Find all candidate keys for a relation given its FDs.
    
    Uses attribute classification to prune the search space efficiently.
    """
    classification = classify_attributes(all_attrs, fds)
    
    # Core attributes that MUST be in every candidate key
    core = classification['left_only'] | classification['neither']
    
    # Attributes that might extend the core
    maybe_needed = classification['both']
    
    # Check if core alone is a superkey
    if compute_closure(core, fds) == all_attrs:
        return [core] if core else [set()]  # Handle empty core edge case
    
    candidate_keys = []
    
    # Try adding subsets of 'maybe_needed' to the core
    # Start with smallest subsets (for minimality)
    for size in range(1, len(maybe_needed) + 1):
        for subset in combinations(maybe_needed, size):
            candidate = core | set(subset)
            
            # Check if it's a superkey
            if compute_closure(candidate, fds) != all_attrs:
                continue
            
            # Check if it's minimal (no proper subset is a superkey)
            is_minimal = True
            for existing in candidate_keys:
                if existing < candidate:  # Existing is proper subset
                    is_minimal = False
                    break
            
            # Also check within 'maybe_needed' subset
            if is_minimal:
                for attr in subset:
                    smaller = core | (set(subset) - {attr})
                    if compute_closure(smaller, fds) == all_attrs:
                        is_minimal = False
                        break
            
            if is_minimal:
                candidate_keys.append(candidate)
    
    return candidate_keys
 
 
# Extended example
if __name__ == "__main__":
    # Relation: R(A, B, C, D, E)
    # FDs: A → B, BC → D, D → E, E → A
    all_attrs = {'A', 'B', 'C', 'D', 'E'}
    fds = [
        ({'A'}, {'B'}),
        ({'B', 'C'}, {'D'}),
        ({'D'}, {'E'}),
        ({'E'}, {'A'})
    ]
    
    classification = classify_attributes(all_attrs, fds)
    print("Attribute Classification:")
    for category, attrs in classification.items():
        print(f"  {category}: {attrs if attrs else '{}'}")
    
    candidate_keys = find_all_candidate_keys(all_attrs, fds)
    print(f"\nCandidate Keys: {[sorted(k) for k in candidate_keys]}")

Worked Examples: Key Derivation in Practice

Let's work through complete examples to solidify understanding. These examples progressively increase in complexity.

Relation: Employee(EmpID, Name, Email, DeptID, DeptName, Manager)

Functional Dependencies:

F = {
    EmpID → Name, Email, DeptID
    DeptID → DeptName, Manager
}

Step 1: Classify attributes

Attribute	Appears in LHS	Appears in RHS	Category
EmpID	Yes	No	L-only
Name	No	Yes	R-only
Email	No	Yes	R-only
DeptID	Yes	Yes	Both
DeptName	No	Yes	R-only
Manager	No	Yes	R-only

Step 2: Identify core

Core = L-only ∪ Neither = {EmpID} ∪ {} = {EmpID}

Step 3: Compute core closure

{EmpID}⁺:

Start: {EmpID}
Apply EmpID → Name, Email, DeptID: {EmpID, Name, Email, DeptID}
Apply DeptID → DeptName, Manager: {EmpID, Name, Email, DeptID, DeptName, Manager}

{EmpID}⁺ = all attributes ✓

Result: The unique candidate key is {EmpID}

Key Derivation and Schema Design

Understanding how keys emerge from FDs has profound implications for database design:

1. Validation of Schema Correctness

When designing a database, we often start with intuitions about keys ("StudentID should be the primary key"). Key derivation from FDs validates these intuitions:

If StudentID⁺ ≠ all attributes, our assumed key is wrong
If there are other candidate keys we didn't expect, we might be missing important FDs
If no candidate key exists, the schema is malformed

2. Discovery of Hidden Keys

Sometimes the FDs reveal keys that weren't obvious:

Relation: Order(OrderNum, CustomerEmail, CustomerPhone, ProductID, Quantity)
FDs: 
    OrderNum → CustomerEmail, ProductID, Quantity
    CustomerEmail → CustomerPhone

Obvious key: {OrderNum}

But what if business rules also specify:

    CustomerEmail, ProductID → OrderNum  (each customer-product pair has one active order)

Now {CustomerEmail, ProductID} is also a candidate key!

Keys from Data vs. Keys from Semantics

3. Implications for Foreign Keys

Candidate keys in one table become the options for foreign key references in another:

-- If Student has candidate keys {StudentID} and {Email}
-- Then Enrollment can reference either:

CREATE TABLE Enrollment (
    EnrollmentID INT PRIMARY KEY,
    StudentRef INT REFERENCES Student(StudentID),  -- or
    -- StudentEmail VARCHAR REFERENCES Student(Email)
    CourseID INT,
    ...
);

The choice affects query patterns, index requirements, and even storage efficiency.

Best Practices for Key-Based Schema Design

•Document all FDs explicitly — Don't assume keys; derive them from documented FDs
•Consider all candidate keys — The "best" primary key isn't always the most obvious one
•Synthetic vs. natural keys — Sometimes derived candidate keys are awkward (multi-attribute); consider adding a synthetic key
•Validate against business rules — Each FD should correspond to a verifiable business rule
•Re-derive keys when FDs change — Schema modifications that change FDs may change candidate keys

Common Pitfalls and Edge Cases

Key derivation seems straightforward in theory, but several pitfalls trap even experienced practitioners:

Common Mistakes

•Forgetting transitivity — If A → B and B → C, then A⁺ includes C
•Ignoring composite FDs — {A, B} → C is different from A → C and B → C
•Assuming uniqueness = key — Observing unique values doesn't prove FD holds
•Missing attributes in closure — If F doesn't mention an attribute X, include it in core
•Stopping at first key found — Relations can have multiple candidate keys

Correct Approach

•Apply closure fully — Iterate until no new attributes added
•Consider composite keys — Test attribute combinations systematically
•Derive FDs from semantics — Business rules → FDs → Keys
•Track all attributes — Classify using the L/R/Both/Neither system
•Find all candidate keys — Exhaustive search over 'Both' category

Edge Case: The Empty Key

What if the FD set implies a constant attribute?

Relation: Config(Version, Setting, Value)
FDs: {} → Version  (Version is constant across all tuples)

Edge Case: All Attributes Are Keys

If no non-trivial FDs exist, every proper superset of {} is a superkey. The unique candidate key is the set of all attributes:

Relation: EventLog(Timestamp, UserID, Action, IPAddress)
FDs: {} (none, all combinations are possible)

Candidate key: {Timestamp, UserID, Action, IPAddress}

This often indicates either:

A logging/audit table (correct design)
Missing FD documentation (review business rules)

Summary: Key Derivation from FDs

We've established the theoretical and practical foundations for deriving keys from functional dependencies. Let's consolidate the key insights:

Key Takeaways

•Keys are derived, not assumed — The FD set mathematically determines which attribute combinations are keys
•Attribute closure is the key tool — X is a superkey iff X⁺ = all attributes
•Classification accelerates search — L-only and Neither must be in keys; R-only never is
•Multiple candidate keys are possible — Find all of them for complete schema understanding
•Design implications are profound — Key derivation validates schema correctness and reveals hidden constraints

What's Next:

Page Complete

1 / 5