Many database designers stop after finding one candidate key—typically the most obvious one suggested by business requirements. But relations often have multiple candidate keys, and missing them leads to incomplete analysis.
Finding ALL candidate keys is essential for accurate normalization analysis, correct classification of prime and non-prime attributes, and informed primary key selection.
This page presents systematic algorithms that guarantee discovery of every candidate key.
By the end of this page, you will master the attribute classification technique for pruning the key search space, understand the exhaustive algorithm for finding all candidate keys, work through complex examples with multiple overlapping keys, and recognize patterns that indicate multiple keys exist.
The Brute Force Approach (and Why It Fails):
Naively, we could test every possible subset of attributes to see if it's a candidate key:
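A minimal sketch of this brute-force enumeration, assuming a simple closure-based superkey test (the helper names `closure` and `brute_force_keys` are illustrative, not from any library):

```python
from itertools import combinations

def closure(attrs, fds):
    """Expand attrs using FDs until no more attributes can be derived."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def brute_force_keys(all_attrs, fds):
    """Test every non-empty subset; keep the minimal superkeys. O(2^n) closures!"""
    superkeys = [set(s) for r in range(1, len(all_attrs) + 1)
                 for s in combinations(sorted(all_attrs), r)
                 if closure(s, fds) == all_attrs]
    # A superkey is a candidate key if no other superkey is a proper subset of it
    return [k for k in superkeys
            if not any(other < k for other in superkeys)]

# R(A, B, C) with A → B and B → C: only {A} is a candidate key
keys = brute_force_keys({"A", "B", "C"}, [({"A"}, {"B"}), ({"B"}, {"C"})])
```

Correct, but the outer loop visits every subset, which is exactly the exponential blowup the next table quantifies.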
The problem: a relation with n attributes has 2ⁿ subsets. For a modest table with 20 attributes, that's over a million subsets to check. For 30 attributes, over a billion.
| Attributes (n) | Subsets (2ⁿ) | At 1μs per check |
|---|---|---|
| 10 | 1,024 | ~1 ms |
| 15 | 32,768 | ~33 ms |
| 20 | 1,048,576 | ~1 second |
| 25 | 33,554,432 | ~34 seconds |
| 30 | 1,073,741,824 | ~18 minutes |
| 40 | 1,099,511,627,776 | ~13 days |
The Smarter Approach:
We can dramatically reduce the search space using attribute classification. By analyzing which attributes appear on which side of functional dependencies, we can determine which attributes must appear in every key, which can never appear in any key, and which require further exploration.
This typically reduces the search from 2ⁿ subsets to 2ᵐ, where m is much smaller than n.
In practice, m is often very small (0-5 attributes), making the search tractable even for large schemas. A relation with 30 attributes but only 3 in the 'maybe' category requires checking just 8 combinations rather than a billion.
Every attribute in a relation can be classified based on its appearance in functional dependencies:
The Four Categories:
| Category | Definition | Key Status | Reason |
|---|---|---|---|
| L-only | Appears only on the LEFT side of FDs | MUST be in EVERY key | No FD can derive it |
| R-only | Appears only on the RIGHT side of FDs | Can NEVER be in ANY key | Always derivable; including it makes key non-minimal |
| Both | Appears on both LEFT and RIGHT sides | MAY be in SOME keys | Sometimes derivable, sometimes needed |
| Neither | Does not appear in ANY FD | MUST be in EVERY key | Cannot be derived by any means |
The Core Computation:
```
L = {attributes appearing on left side of some FD}
R = {attributes appearing on right side of some FD}

L-only  = L - R
R-only  = R - L
Both    = L ∩ R
Neither = U - L - R        (where U is all attributes)

Core     = L-only ∪ Neither   // MUST be in every key
Excluded = R-only             // Can NEVER be in any key
Explore  = Both               // Need to try combinations
```
```python
def classify_attributes(all_attrs: set, fds: list[tuple[set, set]]) -> dict:
    """
    Classify all attributes based on FD appearance.

    Args:
        all_attrs: Complete set of attributes in the relation
        fds: List of functional dependencies as (lhs, rhs) tuples

    Returns:
        Dictionary with 'core', 'excluded', 'explore', and detailed categories
    """
    left_side = set()
    right_side = set()
    for lhs, rhs in fds:
        left_side.update(lhs)
        right_side.update(rhs)

    # The four categories
    l_only = left_side - right_side               # Only on left
    r_only = right_side - left_side               # Only on right
    both = left_side & right_side                 # On both sides
    neither = all_attrs - left_side - right_side  # Not in any FD

    # Key-finding implications
    core = l_only | neither  # Must be in every key
    excluded = r_only        # Never in any key
    explore = both           # Need to try combinations

    return {
        'l_only': l_only,
        'r_only': r_only,
        'both': both,
        'neither': neither,
        'core': core,
        'excluded': excluded,
        'explore': explore
    }


def demonstrate_classification(name: str, all_attrs: set, fds: list):
    """Show classification results for a relation."""
    print(f"\n{'='*60}")
    print(f"Relation: {name}")
    print(f"Attributes: {sorted(all_attrs)}")
    print("\nFunctional Dependencies:")
    for lhs, rhs in fds:
        print(f"  {sorted(lhs)} → {sorted(rhs)}")

    result = classify_attributes(all_attrs, fds)

    print("\nClassification:")
    print(f"  L-only (must be in key):  {sorted(result['l_only'])}")
    print(f"  R-only (never in key):    {sorted(result['r_only'])}")
    print(f"  Both (explore):           {sorted(result['both'])}")
    print(f"  Neither (must be in key): {sorted(result['neither'])}")

    print("\nSearch Parameters:")
    print(f"  Core (definite):               {sorted(result['core'])}")
    print(f"  Explore (try combinations of): {sorted(result['explore'])}")
    print(f"  Excluded (ignore):             {sorted(result['excluded'])}")
    print(f"  Search space: 2^{len(result['explore'])} = "
          f"{2**len(result['explore'])} combinations")
    print(f"{'='*60}")


# Examples
if __name__ == "__main__":
    # Example 1: Simple relation
    demonstrate_classification(
        "Employee(EmpID, Name, DeptID, Salary)",
        {"EmpID", "Name", "DeptID", "Salary"},
        [
            ({"EmpID"}, {"Name", "DeptID", "Salary"})
        ]
    )

    # Example 2: Multiple keys possible
    demonstrate_classification(
        "Student(SID, SSN, Email, Name, GPA)",
        {"SID", "SSN", "Email", "Name", "GPA"},
        [
            ({"SID"}, {"SSN", "Email", "Name", "GPA"}),
            ({"SSN"}, {"SID", "Email", "Name", "GPA"}),
            ({"Email"}, {"SID", "SSN", "Name", "GPA"})
        ]
    )

    # Example 3: Complex with explore attributes
    demonstrate_classification(
        "Enrollment(StudentID, CourseID, Semester, Grade, InstructorID)",
        {"StudentID", "CourseID", "Semester", "Grade", "InstructorID"},
        [
            ({"StudentID", "CourseID", "Semester"}, {"Grade", "InstructorID"}),
            ({"CourseID", "Semester"}, {"InstructorID"})
        ]
    )
```

With attribute classification in place, we can now present the complete algorithm for finding all candidate keys:
Algorithm Overview:
```python
from itertools import combinations
from functools import lru_cache


@lru_cache(maxsize=None)
def compute_closure(attributes: frozenset, fds: tuple) -> frozenset:
    """
    Compute attribute closure with caching for efficiency.
    Uses immutable types (frozenset, tuple) for caching compatibility.
    """
    closure = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs.issubset(closure) and not rhs.issubset(closure):
                closure.update(rhs)
                changed = True
    return frozenset(closure)


def is_superkey(attrs: set, all_attrs: set, fds: list) -> bool:
    """Check if attribute set is a superkey."""
    fds_tuple = tuple((frozenset(lhs), frozenset(rhs)) for lhs, rhs in fds)
    closure = compute_closure(frozenset(attrs), fds_tuple)
    return set(closure) == all_attrs


def is_minimal_superkey(attrs: set, all_attrs: set, fds: list) -> bool:
    """Check if a superkey is minimal (i.e., a candidate key)."""
    if not is_superkey(attrs, all_attrs, fds):
        return False
    # Try removing each attribute
    for attr in attrs:
        subset = attrs - {attr}
        if is_superkey(subset, all_attrs, fds):
            return False  # Not minimal
    return True


def find_all_candidate_keys(all_attrs: set, fds: list) -> list[set]:
    """
    Find ALL candidate keys for a relation.

    Uses attribute classification to minimize search space.
    Returns list of sets, each set being a candidate key.
    NOTE: relies on classify_attributes from the previous listing.
    """
    # Step 1: Classify attributes
    classification = classify_attributes(all_attrs, fds)
    core = classification['core']
    explore = classification['explore']

    # Step 2: Check if core alone is sufficient. Every core attribute
    # must appear in every key, so a core that is a superkey is
    # automatically minimal and is the unique candidate key.
    if is_superkey(core, all_attrs, fds):
        return [core]

    # Step 3: Try adding subsets of explore attributes,
    # smallest subsets first
    candidate_keys = []
    for size in range(1, len(explore) + 1):
        for subset in combinations(sorted(explore), size):
            candidate = core | set(subset)

            # Skip supersets of keys we have already found
            if any(found_key < candidate for found_key in candidate_keys):
                continue

            # Check if this is a superkey
            if not is_superkey(candidate, all_attrs, fds):
                continue

            # Check if it's minimal
            if is_minimal_superkey(candidate, all_attrs, fds):
                candidate_keys.append(candidate)

    # Defensive fallback: the entire attribute set is always a superkey
    return candidate_keys if candidate_keys else [all_attrs]


def analyze_keys(name: str, all_attrs: set, fds: list):
    """Complete key analysis with detailed output."""
    print(f"\n{'='*70}")
    print(f"CANDIDATE KEY ANALYSIS: {name}")
    print(f"{'='*70}")
    print(f"\nAttributes: {sorted(all_attrs)}")
    print("\nFunctional Dependencies:")
    for lhs, rhs in fds:
        print(f"  {sorted(lhs)} → {sorted(rhs)}")

    # Classification
    classification = classify_attributes(all_attrs, fds)
    print("\nAttribute Classification:")
    print(f"  Core (must be in key):   {sorted(classification['core'])}")
    print(f"  Explore (may be in key): {sorted(classification['explore'])}")
    print(f"  Excluded (never in key): {sorted(classification['excluded'])}")

    # Find all candidate keys
    candidate_keys = find_all_candidate_keys(all_attrs, fds)
    print(f"\nCandidate Keys Found: {len(candidate_keys)}")
    for i, key in enumerate(candidate_keys, 1):
        print(f"  Key {i}: {{{', '.join(sorted(key))}}}")

    # Verify each key
    print("\nVerification:")
    for key in candidate_keys:
        fds_tuple = tuple((frozenset(lhs), frozenset(rhs)) for lhs, rhs in fds)
        closure = compute_closure(frozenset(key), fds_tuple)
        print(f"  {{{', '.join(sorted(key))}}}⁺ = {{{', '.join(sorted(closure))}}}")

    # Prime/Non-prime classification
    prime = set()
    for key in candidate_keys:
        prime.update(key)
    non_prime = all_attrs - prime
    print("\nAttribute Status:")
    print(f"  Prime:     {sorted(prime)}")
    print(f"  Non-prime: {sorted(non_prime)}")
    print(f"{'='*70}")
    return candidate_keys


# Comprehensive examples
if __name__ == "__main__":
    # Example 1: Single key
    analyze_keys(
        "Product(ProdID, Name, Category, Price)",
        {"ProdID", "Name", "Category", "Price"},
        [
            ({"ProdID"}, {"Name", "Category", "Price"})
        ]
    )

    # Example 2: Three equivalent keys
    analyze_keys(
        "Person(SSN, PassportNum, DriverLicense, Name, Address)",
        {"SSN", "PassportNum", "DriverLicense", "Name", "Address"},
        [
            ({"SSN"}, {"PassportNum", "DriverLicense", "Name", "Address"}),
            ({"PassportNum"}, {"SSN", "DriverLicense", "Name", "Address"}),
            ({"DriverLicense"}, {"SSN", "PassportNum", "Name", "Address"})
        ]
    )

    # Example 3: Cyclic dependencies
    analyze_keys(
        "Cycle(A, B, C, D, E)",
        {"A", "B", "C", "D", "E"},
        [
            ({"A"}, {"B"}),
            ({"B"}, {"C"}),
            ({"C"}, {"A"}),
            ({"D"}, {"E"})
        ]
    )

    # Example 4: Composite keys
    analyze_keys(
        "Enrollment(StudentID, CourseID, Semester, Grade)",
        {"StudentID", "CourseID", "Semester", "Grade"},
        [
            ({"StudentID", "CourseID", "Semester"}, {"Grade"})
        ]
    )
```

Let's work through increasingly complex examples to build intuition for the key-finding process.
Relation: R(A, B, C, D)
Functional Dependencies:
A → B
B → C
C → D
D → A
Step 1: Classify Attributes
| Attr | In LHS? | In RHS? | Category |
|---|---|---|---|
| A | Yes (A→B) | Yes (D→A) | Both |
| B | Yes (B→C) | Yes (A→B) | Both |
| C | Yes (C→D) | Yes (B→C) | Both |
| D | Yes (D→A) | Yes (C→D) | Both |
```
L-only  = {}
R-only  = {}
Both    = {A, B, C, D}
Neither = {}

Core    = {} (empty)
Explore = {A, B, C, D}
```
Step 2: Check if core is sufficient
{}⁺ = {} ≠ {A, B, C, D}. Not sufficient.
Step 3: Try combinations of explore attributes
Size 1: {A}⁺ = {A, B, C, D}, and by the cycle's symmetry {B}⁺, {C}⁺, and {D}⁺ give the same result.
All single-attribute sets are superkeys!
Step 4: Check minimality
Each is a single attribute—no proper subsets except {}. All minimal.
Result: 4 Candidate Keys: {A}, {B}, {C}, {D}
Prime Attributes: {A, B, C, D} — all attributes! Non-Prime Attributes: {} — none!
Insight: Cyclic dependencies often create many equivalent keys.
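The four-key result is easy to verify mechanically. A small standalone script (with its own closure helper, independent of the listings above) confirms that each single attribute reaches the full relation through the cycle:

```python
def closure(attrs, fds):
    """Compute the attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# R(A, B, C, D) with the cyclic FDs A→B, B→C, C→D, D→A
U = {"A", "B", "C", "D"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"A"})]

# Every single attribute walks the cycle and reaches all of U
for attr in sorted(U):
    print(attr, closure({attr}, fds) == U)   # True for all four
```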
Certain FD patterns strongly suggest the presence of multiple candidate keys. Recognizing these patterns helps you anticipate when deeper analysis is needed:
Pattern Deep Dive: Bidirectional Dependencies
Relation: Employee(EmpNumber, SSN, Email, Name)
FDs:
EmpNumber → SSN, Email, Name
SSN → EmpNumber, Email, Name
Email → EmpNumber, SSN, Name
Each of these three attributes can serve as a standalone key because they mutually determine each other. This often happens when a domain has multiple natural identifiers, such as an internal employee number, a government-issued SSN, and a work email address.
All three are equally valid candidate keys.
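Running these mutually determining FDs through attribute classification (a self-contained sketch; the variable names are illustrative) shows why this pattern yields several keys: the core comes out empty, and all three identifiers land in the explore set:

```python
U = {"EmpNumber", "SSN", "Email", "Name"}
fds = [
    ({"EmpNumber"}, {"SSN", "Email", "Name"}),
    ({"SSN"}, {"EmpNumber", "Email", "Name"}),
    ({"Email"}, {"EmpNumber", "SSN", "Name"}),
]

L = set().union(*(lhs for lhs, _ in fds))   # left sides: the three identifiers
R = set().union(*(rhs for _, rhs in fds))   # right sides: everything, incl. Name

core = (L - R) | (U - L - R)   # L-only plus Neither: empty here
explore = L & R                # the three interchangeable identifiers
excluded = R - L               # Name: never part of any key

print(core, explore, excluded)
```

An empty core is exactly the signal discussed later on this page: every key must be assembled from the explore attributes alone.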
Pattern Deep Dive: Shared Anchor with Alternatives
Relation: Booking(CustomerID, FlightID, Date, Seat, BookingRef)
FDs:
CustomerID, FlightID, Date → Seat, BookingRef
BookingRef → CustomerID, FlightID, Date, Seat
Here we have a composite natural key, {CustomerID, FlightID, Date}, and a single-attribute surrogate key, {BookingRef}, each determining all other attributes.
Both are candidate keys. This pattern is common when a system introduces a compact reference code (a surrogate) on top of a natural composite identifier.
Candidate Keys: {CustomerID, FlightID, Date}, {BookingRef}
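Both claimed keys can be checked with closures. A standalone sketch (with its own closure helper) verifies that each reaches all attributes while a proper subset of the composite does not:

```python
def closure(attrs, fds):
    """Expand attrs under fds until a fixpoint is reached."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

U = {"CustomerID", "FlightID", "Date", "Seat", "BookingRef"}
fds = [
    ({"CustomerID", "FlightID", "Date"}, {"Seat", "BookingRef"}),
    ({"BookingRef"}, {"CustomerID", "FlightID", "Date", "Seat"}),
]

# The composite natural key and the surrogate both reach all of U
print(closure({"CustomerID", "FlightID", "Date"}, fds) == U)  # True
print(closure({"BookingRef"}, fds) == U)                      # True
# But no proper subset of the composite works, e.g.:
print(closure({"CustomerID", "FlightID"}, fds) == U)          # False
```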
When attribute classification yields an empty core (no L-only or Neither attributes), you should expect multiple candidate keys. This is because every potential key must come from the 'explore' category, and different combinations from this category may all work as keys.
Even with systematic algorithms, several pitfalls can lead to incorrect or incomplete key discovery:
Edge Case: The All-Key Relation
Sometimes every attribute must be in the key:
Relation: EventLog(Timestamp, UserID, Action, Result)
FDs: {} (none—each combination is unique)
With no FDs, no attribute can be derived, so the only candidate key is the entire attribute set: {Timestamp, UserID, Action, Result}.
All attributes are prime. This is fine for logging tables but may indicate missing FDs in business data.
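With an empty FD list, the closure of any attribute set is just itself, so only the full attribute set can be a superkey. A quick standalone check (helper name illustrative):

```python
def closure(attrs, fds):
    """With no FDs, the loop body never fires, so closure(X) == X."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

U = {"Timestamp", "UserID", "Action", "Result"}
fds = []   # no functional dependencies

print(closure({"Timestamp", "UserID"}, fds) == U)  # False: nothing derivable
print(closure(U, fds) == U)                        # True: the only key is all of U
```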
Edge Case: The Constant Attribute
Relation: Config(Version, SettingName, Value)
FDs:
{} → Version (Version is constant across all rows)
SettingName → Value
The FD {} → Version means Version is the same for all tuples. This is unusual but valid. The key analysis proceeds normally, with SettingName likely being the candidate key.
Such FDs often indicate data that should be metadata or configuration, not row data.
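The closure algorithm handles an empty left-hand side naturally: the empty set is a subset of every closure, so Version is added from the very first pass. A standalone sketch (helper name illustrative):

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An empty lhs is a subset of anything, so its rhs always fires
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

U = {"Version", "SettingName", "Value"}
fds = [
    (set(), {"Version"}),            # Version is constant across all rows
    ({"SettingName"}, {"Value"}),
]

# SettingName alone derives everything, so it is the candidate key
print(closure({"SettingName"}, fds) == U)  # True
print(closure(set(), fds))                 # even {}⁺ is non-empty here
```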
Discovering all candidate keys has significant practical implications:
Primary Key Selection Criteria:
When multiple candidate keys exist, consider:
| Criterion | Preference | Reason |
|---|---|---|
| Stability | More stable | Minimize cascading updates |
| Size | Smaller | Better index efficiency |
| Semantics | More meaningful | Easier debugging/querying |
| Privacy | Less sensitive | Safer in logs/URLs |
| Uniqueness domain | Broader | Cross-system compatibility |
```sql
-- After discovering multiple candidate keys,
-- implement them in the schema:

-- Example: Student table with three candidate keys
CREATE TABLE Student (
    -- Primary key: synthetic for stability
    StudentID INT PRIMARY KEY,

    -- Alternate key: government identifier
    SSN VARCHAR(11) NOT NULL UNIQUE,

    -- Alternate key: institutional identifier
    Email VARCHAR(100) NOT NULL UNIQUE,

    -- Non-prime attributes
    Name VARCHAR(100) NOT NULL,
    Major VARCHAR(50),
    GPA DECIMAL(3,2),

    -- Document the key structure
    CONSTRAINT chk_ssn_format CHECK (SSN LIKE '___-__-____'),
    CONSTRAINT chk_email_format CHECK (Email LIKE '%@%.%')
);

-- Each UNIQUE constraint reflects a candidate key
-- and automatically creates an index

-- References can use any candidate key
CREATE TABLE Enrollment (
    EnrollmentID INT PRIMARY KEY,
    -- Could use any of these:
    StudentID INT REFERENCES Student(StudentID),
    -- OR: SSN VARCHAR(11) REFERENCES Student(SSN),
    -- OR: Email VARCHAR(100) REFERENCES Student(Email),
    CourseID INT NOT NULL,
    Semester VARCHAR(10) NOT NULL
);

-- Index all candidate keys for lookup efficiency
-- (UNIQUE constraints already create indexes, but explicit:)
-- CREATE INDEX idx_student_ssn ON Student(SSN);
-- CREATE INDEX idx_student_email ON Student(Email);
```

Let's consolidate the key insights from this comprehensive exploration:
What's Next:
With the ability to find all candidate keys and classify attributes as prime/non-prime, we're now ready to explore the relationship between keys and normalization. The final page of this module ties together keys, FDs, and normal forms into a unified framework for schema analysis.
You now possess a complete toolkit for discovering all candidate keys in any relation. This capability is essential for accurate normalization analysis and informed database design.