Loading learning content...
Finding candidate keys is the culmination of closure analysis. While earlier pages covered the concepts and basic algorithms, real-world schemas can present challenging cases: multiple overlapping candidate keys, cyclic dependencies, large attribute sets, and intricate FD patterns.
This page elevates your candidate key finding skills to expert level. You'll learn refined algorithms that handle complex cases efficiently, understand when a relation has many candidate keys versus few, and master techniques used in database design tools.
The Complete Candidate Key Problem:
Given relation R with attributes U and functional dependencies F, find ALL sets K such that:
By the end of this page, you will master efficient candidate key algorithms, understand exponential cases and how to handle them, work through complex multi-key examples, and recognize patterns that indicate key structure.
Building on the categorization approach, here's a refined algorithm with optimizations for efficiency and completeness.
Algorithm: FIND-ALL-CANDIDATE-KEYS(R, F)
Input: R — set of all attributes
F — set of functional dependencies
Output: Set of all candidate keys
1. Categorize attributes:
- L = attributes appearing only on left side of FDs
- R = attributes appearing only on right side of FDs
- B = attributes appearing on both sides
- N = attributes appearing in neither side
2. Must = L ∪ N { These MUST be in every candidate key }
May = B { These MAY or may not be in candidate keys }
Never = R { These are NEVER in any candidate key }
3. If closure(Must) = R:
Return { Must } { Only one candidate key }
4. CandidateKeys = ∅
Compute candidate keys containing Must and subsets of May
5. For each minimal set S ⊆ May such that closure(Must ∪ S) = R:
Add (Must ∪ S) to CandidateKeys
6. Return CandidateKeys
Optimization: Level-wise search with pruning
In step 5, search subsets of May by increasing size. When you find a key K, prune all its supersets (they can't be minimal).
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101
from itertools import combinationsfrom typing import Set, List, Tuple, FrozenSet def find_all_candidate_keys( R: Set[str], F: List[Tuple[Set[str], Set[str]]]) -> List[FrozenSet[str]]: """ Find all candidate keys of relation R under FDs F. Returns list of candidate keys (each as frozenset for immutability). """ # Step 1: Categorize attributes left_attrs = set() right_attrs = set() for det, dep in F: left_attrs |= det right_attrs |= dep L = left_attrs - right_attrs # Left-only: MUST be in every key R_only = right_attrs - left_attrs # Right-only: NEVER in any key B = left_attrs & right_attrs # Both: MAY be in key N = R - (left_attrs | right_attrs) # Neither: MUST be in every key # Step 2: Compute mandatory base Must = L | N May = B # Step 3: Check if Must alone is sufficient if compute_closure(Must, F) == R: return [frozenset(Must)] # Step 4 & 5: Search for minimal supersets of Must candidate_keys = [] May_list = sorted(May) # Sort for deterministic order # Level-wise search: smallest subsets first for size in range(1, len(May) + 1): for subset in combinations(May_list, size): candidate = Must | set(subset) # Pruning: skip if superset of known candidate key candidate_frozen = frozenset(candidate) is_superset = any( ck < candidate_frozen # strict subset for ck in candidate_keys ) if is_superset: continue # Check if this is a superkey if compute_closure(candidate, F) == R: candidate_keys.append(candidate_frozen) return candidate_keys def compute_closure(X: Set[str], F: List[Tuple[Set[str], Set[str]]]) -> Set[str]: """Compute attribute closure X⁺ under F.""" result = set(X) changed = True while changed: changed = False for det, dep in F: if det <= result and not dep <= result: result |= dep changed = True return result # ============ EXAMPLE USAGE ============if __name__ == "__main__": # Example: R(A,B,C,D,E) with cyclic dependencies R = {'A', 'B', 'C', 'D', 'E'} F = [ ({'A', 'B'}, {'C'}), # AB → C ({'C'}, {'D'}), # C → D ({'D'}, {'E'}), # D → E ({'E'}, {'A'}), # E → A (creates cycle) ] keys = find_all_candidate_keys(R, F) print("Candidate Keys:") for k in keys: print(f" {set(k)}") # Analysis: # L = {B} (B only on left) # R_only = {} (all right attrs also on left somewhere) # B = {A, C, D, E} (all appear on both sides) # N = {} # Must = {B} # # {B}⁺ = {B} ≠ R, so we need to add from {A,C,D,E} # Due to cycle E→A, AB→C, C→D, D→E: # {B,A}⁺ = {A,B,C,D,E} ✓ # {B,C}⁺ = {B,C,D,E,A} ✓ # {B,D}⁺ = {B,D,E,A,C} ✓ (wait, need to verify) # {B,E}⁺ = {B,E,A,C,D} ✓ # # All four are candidate keys: {AB}, {BC}, {BD}, {BE}Let's work through a challenging example that demonstrates multiple interacting candidate keys.
Problem:
Relation: Student_Enrollment(StudentID, CourseID, Semester, Grade, InstructorID, InstructorName)
Simplified as: R(S, C, E, G, I, N) where:
Functional Dependencies:
Step 1: Categorize Attributes
| Attribute | Left side of | Right side of | Category |
|---|---|---|---|
| S | F1 | — | L (left-only) |
| C | F1, F2 | — | L (left-only) |
| E | F1, F2 | — | L (left-only) |
| G | — | F1 | R (right-only) |
| I | F3 | F2 | B (both) |
| N | — | F3 | R (right-only) |
Categories:
Must = L ∪ N = {S, C, E}
Step 2: Test Must Alone
Compute {S, C, E}⁺:
Result: {S, C, E}⁺ = {S, C, E, G, I, N} = R ✓
Since Must alone determines all attributes, {S, C, E} = {StudentID, CourseID, Semester} is the ONLY candidate key.
This makes semantic sense: a student's enrollment in a specific course during a specific semester uniquely identifies the record. The grade, instructor, and instructor name are all derivable from this combination.
Now let's examine a relation with multiple candidate keys stemming from equivalent identifiers.
Problem:
Relation: Employee(EmpID, SSN, Email, Name, DeptID, DeptName)
Simplified as: R(E, S, M, N, D, T) where:
Functional Dependencies:
Step 1: Categorize Attributes
| Attribute | Left side | Right side | Category |
|---|---|---|---|
| E | F1 | F2, F3 | B (both) |
| S | F2 | F1 | B (both) |
| M | F3 | F1 | B (both) |
| N | — | F1 | R (right-only) |
| D | F4 | F1 | B (both) |
| T | — | F4 | R (right-only) |
Categories:
Must = L ∪ N = ∅ (empty!)
Step 2: Test Must = ∅
Clearly, ∅⁺ = ∅ ≠ R. We need to find minimal subsets of B = {E, S, M, D} that determine everything.
Step 3: Level-wise Search
Size 1 subsets:
All size-1 keys found: {E}, {S}, {M}
| Candidate | Closure | Is Superkey? | Is Minimal? |
|---|---|---|---|
| {E} | {E,S,M,N,D,T} = R | ✓ Yes | ✓ Yes |
| {S} | {S,E,M,N,D,T} = R | ✓ Yes | ✓ Yes |
| {M} | {M,E,S,N,D,T} = R | ✓ Yes | ✓ Yes |
| {D} | {D,T} | ✗ No | — |
| {E,S} | R | ✓ Yes | ✗ No (E alone works) |
| {E,D} | R | ✓ Yes | ✗ No (E alone works) |
Candidate Keys: {EmpID}, {SSN}, {Email}
This reflects the business reality: an employee can be uniquely identified by their employee ID, social security number, OR email address. All three are valid unique identifiers, so all three are candidate keys.
Prime attributes: E, S, M (each belongs to some candidate key) Non-prime attributes: N, D, T (none belongs to any candidate key)
In the worst case, the number of candidate keys can be exponential in the number of attributes. This section explains when this happens and strategies to handle it.
When Do We Get Exponentially Many Keys?
The classic example is a relation with pairwise equivalent attributes:
R(A₁, A₂, ..., Aₙ)
FDs: A₁ → A₂, A₂ → A₁, (A₁ ↔ A₂)
A₃ → A₄, A₄ → A₃, (A₃ ↔ A₄)
...
If we have n/2 pairs of equivalent attributes, the number of candidate keys is 2^(n/2)—each pair contributes a binary choice.
F = {A → B, B → A, C → D, D → C}4 Candidate Keys: {A,C}, {A,D}, {B,C}, {B,D}Analysis: • L = ∅ (all attrs on both sides due to bidirectional FDs) • R = ∅ • B = {A, B, C, D} • Must = ∅
Key insight: (A,B) are equivalent—choose one. (C,D) are equivalent—choose one. 2 choices × 2 choices = 4 candidate keys.
With n/2 such pairs, we'd have 2^(n/2) candidate keys.
Verification: • {A,C}⁺: A→B, C→D → {A,B,C,D} ✓ • {A,D}⁺: A→B, D→C → {A,B,C,D} ✓ • {B,C}⁺: B→A, C→D → {A,B,C,D} ✓ • {B,D}⁺: B→A, D→C → {A,B,C,D} ✓
Strategies for Large Key Spaces:
Recognize equivalence classes: When attributes are mutually dependent (A ↔ B), they form an equivalence class. Within each class, pick any representative. This reduces the effective attribute count.
Early termination: For some applications, finding ANY candidate key suffices. Stop after finding the first.
Parallel search: The search through subsets of B can be parallelized since different subsets are independent.
Caching closures: Memoize closure computations to avoid redundant work when testing overlapping subsets.
SAT-based approaches: For very complex FD sets, encode the key-finding problem as a Boolean satisfiability problem and use SAT solvers.
In real database schemas, exponentially many candidate keys are rare. Most relations have 1-3 candidate keys, arising from natural unique identifiers (ID, SSN, Email, etc.). The exponential case is mainly of theoretical interest.
After finding a superkey, we must verify it's minimal (a candidate key). This section details the verification process.
Algorithm: VERIFY-CANDIDATE-KEY(K, R, F)
Input: K — proposed candidate key
R — all attributes
F — functional dependencies
Output: true if K is a candidate key, false otherwise
1. If K⁺ ≠ R:
Return false { Not even a superkey }
2. For each attribute A ∈ K:
If (K - {A})⁺ = R:
Return false { K - {A} is also a superkey, so K not minimal }
3. Return true { K is minimal superkey = candidate key }
Complexity: O(|K|) closure computations, each O(|F| × |R|²), giving O(|K| × |F| × |R|²) total.
Test K = {A, B, C}Not a candidate key (not minimal)Step 1: Check if K is a superkey • {A,B,C}⁺: A→B (have B), C→D → {A,B,C,D} = R ✓ • Yes, {A,B,C} is a superkey.
Step 2: Check minimality by removing each attribute
• Remove A: {B,C}⁺ = {B,C,D} ≠ R ✓ (A is essential) • Remove B: {A,C}⁺ = {A,B,C,D} = R ✗ (B not essential!) • (Can stop here)
Conclusion: {A,B,C} is NOT a candidate key because {A,C} is also a superkey. The actual candidate key is {A,C}.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354
def is_candidate_key(K: set, R: set, F: list) -> bool: """ Verify if K is a candidate key (minimal superkey). Args: K: Proposed candidate key R: All attributes F: Functional dependencies Returns: True if K is a candidate key, False otherwise """ # Step 1: Must be a superkey if compute_closure(K, F) != R: return False # Step 2: Check minimality for attr in K: K_minus_attr = K - {attr} if compute_closure(K_minus_attr, F) == R: # Found a smaller superkey! return False return True def find_minimal_superkey(K: set, R: set, F: list) -> set: """ Given a superkey K, find a minimal subset that's still a superkey. Useful for reducing a known superkey to a candidate key. """ if compute_closure(K, F) != R: raise ValueError("K is not a superkey") minimal = set(K) for attr in list(K): # Iterate over copy since we modify if attr in minimal: test = minimal - {attr} if compute_closure(test, F) == R: minimal = test return minimal # ExampleR = {'A', 'B', 'C', 'D'}F = [({'A'}, {'B'}), ({'C'}, {'D'})] K = {'A', 'B', 'C'}print(f"{K} is candidate key: {is_candidate_key(K, R, F)}") # False minimal = find_minimal_superkey(K, R, F)print(f"Minimal superkey from {K}: {minimal}") # {A, C}print(f"{minimal} is candidate key: {is_candidate_key(minimal, R, F)}") # TrueHere's a comprehensive, production-ready implementation that combines all techniques:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136
from itertools import combinationsfrom dataclasses import dataclassfrom typing import Set, List, Tuple, FrozenSet @dataclassclass KeyAnalysisResult: """Complete analysis of a relation's key structure.""" candidate_keys: List[FrozenSet[str]] prime_attributes: Set[str] non_prime_attributes: Set[str] must_include: Set[str] # L ∪ N never_include: Set[str] # R optional: Set[str] # B def summary(self) -> str: lines = [ "=== Key Analysis Result ===", f"Candidate Keys: {len(self.candidate_keys)}", ] for i, key in enumerate(self.candidate_keys, 1): lines.append(f" {i}. {set(key)}") lines.extend([ f"Prime Attributes: {self.prime_attributes}", f"Non-Prime Attributes: {self.non_prime_attributes}", f"Must be in every key: {self.must_include}", f"Never in any key: {self.never_include}", ]) return "\n".join(lines) def analyze_keys( R: Set[str], F: List[Tuple[Set[str], Set[str]]]) -> KeyAnalysisResult: """ Complete key analysis: find all candidate keys and classify attributes. """ # Categorize attributes left_attrs = set() right_attrs = set() for det, dep in F: left_attrs |= det right_attrs |= dep L = left_attrs - right_attrs R_only = right_attrs - left_attrs B = left_attrs & right_attrs N = R - (left_attrs | right_attrs) Must = L | N # Find candidate keys candidate_keys = _find_keys(R, F, Must, B) # Compute prime/non-prime attributes prime = set() for key in candidate_keys: prime |= key non_prime = R - prime return KeyAnalysisResult( candidate_keys=candidate_keys, prime_attributes=prime, non_prime_attributes=non_prime, must_include=Must, never_include=R_only, optional=B ) def _find_keys( R: Set[str], F: List[Tuple[Set[str], Set[str]]], Must: Set[str], May: Set[str]) -> List[FrozenSet[str]]: """Internal: find all candidate keys given Must and May sets.""" # Check if Must alone works if compute_closure(Must, F) == R: return [frozenset(Must)] candidate_keys = [] May_list = sorted(May) # Level-wise search for size in range(1, len(May) + 1): for subset in combinations(May_list, size): candidate = Must | set(subset) candidate_frozen = frozenset(candidate) # Prune: skip supersets of found keys if any(ck < candidate_frozen for ck in candidate_keys): continue if compute_closure(candidate, F) == R: candidate_keys.append(candidate_frozen) return candidate_keys def compute_closure(X: Set[str], F: List[Tuple[Set[str], Set[str]]]) -> Set[str]: """Compute X⁺ under F.""" result = set(X) changed = True while changed: changed = False for det, dep in F: if det <= result and not dep <= result: result |= dep changed = True return result # ============ DEMONSTRATION ============if __name__ == "__main__": print("Example 1: Employee Relation") R1 = {'E', 'S', 'M', 'N', 'D', 'T'} F1 = [ ({'E'}, {'S', 'M', 'N', 'D'}), ({'S'}, {'E'}), ({'M'}, {'E'}), ({'D'}, {'T'}), ] result1 = analyze_keys(R1, F1) print(result1.summary()) print() print("Example 2: Cyclic Dependencies") R2 = {'A', 'B', 'C', 'D'} F2 = [ ({'A'}, {'B'}), ({'B'}, {'A'}), ({'C'}, {'D'}), ({'D'}, {'C'}), ] result2 = analyze_keys(R2, F2) print(result2.summary())Experienced database designers recognize patterns that indicate key structure without full computation. These heuristics speed up analysis:
Pattern 1: Single Source Attribute
If one attribute X is in L (left-only) and no other attribute is in L or N, then every candidate key must contain X. If {X}⁺ = R, it's the only candidate key.
Pattern 2: Equivalent Identifiers
Bidirectional FDs (A → B and B → A) indicate equivalent identifiers. One can substitute for the other in keys, creating multiple candidate keys of the same size.
Pattern 3: Composite to Atomic Chain
When composite attributes (like StudentID, CourseID) start a chain of atomic dependencies, the composite is likely the only candidate key.
Pattern 4: Independent Attribute Groups
If attributes form independent groups (no FDs between groups), each group needs representation in the key. This often happens with denormalized relations.
| Pattern | Indicator | Likely Key Structure |
|---|---|---|
| Single source | One attr in L, others in R or B | Single candidate key = that attr |
| Multiple IDs | A↔B, both in category B | Two keys: one with A, one with B |
| Hierarchical | A→B→C→D chain | Key likely at start of chain (A) |
| Independent parts | No FD connects group X to group Y | Key needs one attr from each group |
| Everything derived | All attrs in R except one in L | That one L attr is the key |
The fastest way to identify keys is often semantic understanding. What uniquely identifies an entity? In an orders table: OrderID. In a line items table: OrderID + LineNumber. In a many-to-many: both FKs together. Formal analysis confirms what business logic suggests.
Candidate key finding is the practical culmination of attribute closure theory. Let's consolidate everything you've learned in this module:
Looking Ahead:
With mastery of attribute closure and candidate key finding, you're prepared for the advanced topics ahead:
The closure algorithm you've mastered is the workhorse underlying all these advanced techniques.
Congratulations! You've completed Module 3: Attribute Closure. You can now compute closure, find all candidate keys, analyze superkey structure, and understand the key structure of any relation. These skills are essential for normalization, schema design, and advanced database theory.