Many database designers stop after finding one candidate key—typically the most obvious one suggested by business requirements. But relations often have multiple candidate keys, and missing them leads to incomplete analysis.
Finding ALL candidate keys is essential for accurate normalization analysis, correct classification of prime and non-prime attributes, and informed primary key selection.
This page presents systematic algorithms that guarantee discovery of every candidate key.
By the end of this page, you will master the attribute classification technique for pruning the key search space, understand the exhaustive algorithm for finding all candidate keys, work through complex examples with multiple overlapping keys, and recognize patterns that indicate multiple keys exist.
The Brute Force Approach (and Why It Fails):
Naively, we could test every possible subset of attributes to see if it's a candidate key:
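A minimal sketch of this brute-force enumeration, assuming a simple closure-based superkey test (the helper names `closure` and `brute_force_keys` are illustrative, not from any library):

```python
from itertools import combinations

def closure(attrs, fds):
    """Expand attrs using FDs until no more attributes can be derived."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def brute_force_keys(all_attrs, fds):
    """Test every non-empty subset; keep the minimal superkeys. O(2^n) closures!"""
    superkeys = [set(s) for r in range(1, len(all_attrs) + 1)
                 for s in combinations(sorted(all_attrs), r)
                 if closure(s, fds) == all_attrs]
    # A superkey is a candidate key if no other superkey is a proper subset of it
    return [k for k in superkeys
            if not any(other < k for other in superkeys)]

# R(A, B, C) with A → B and B → C: only {A} is a candidate key
keys = brute_force_keys({"A", "B", "C"}, [({"A"}, {"B"}), ({"B"}, {"C"})])
```

Correct, but the outer loop visits every subset, which is exactly the exponential blowup the next table quantifies.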
The problem: a relation with n attributes has 2ⁿ subsets. For a modest table with 20 attributes, that's over a million subsets to check. For 30 attributes, over a billion.
| Attributes (n) | Subsets (2ⁿ) | At 1μs per check |
|---|---|---|
| 10 | 1,024 | ~1 ms |
| 15 | 32,768 | ~33 ms |
| 20 | 1,048,576 | ~1 second |
| 25 | 33,554,432 | ~34 seconds |
| 30 | 1,073,741,824 | ~18 minutes |
| 40 | 1,099,511,627,776 | ~13 days |
The Smarter Approach:
We can dramatically reduce the search space using attribute classification. By analyzing which attributes appear on which side of functional dependencies, we can determine which attributes must appear in every key, which can never appear in any key, and which require further exploration.
This typically reduces the search from 2ⁿ subsets to 2ᵐ, where m is much smaller than n.
In practice, m is often very small (0-5 attributes), making the search tractable even for large schemas. A relation with 30 attributes but only 3 in the 'maybe' category requires checking just 8 combinations rather than a billion.
Every attribute in a relation can be classified based on its appearance in functional dependencies:
The Four Categories:
| Category | Definition | Key Status | Reason |
|---|---|---|---|
| L-only | Appears only on the LEFT side of FDs | MUST be in EVERY key | No FD can derive it |
| R-only | Appears only on the RIGHT side of FDs | Can NEVER be in ANY key | Always derivable; including it makes key non-minimal |
| Both | Appears on both LEFT and RIGHT sides | MAY be in SOME keys | Sometimes derivable, sometimes needed |
| Neither | Does not appear in ANY FD | MUST be in EVERY key | Cannot be derived by any means |
The Core Computation:
```
L = {attributes appearing on left side of some FD}
R = {attributes appearing on right side of some FD}

L-only  = L - R
R-only  = R - L
Both    = L ∩ R
Neither = U - L - R        (where U is all attributes)

Core     = L-only ∪ Neither   // MUST be in every key
Excluded = R-only             // Can NEVER be in any key
Explore  = Both               // Need to try combinations
```
```python
def classify_attributes(all_attrs: set, fds: list[tuple[set, set]]) -> dict:
    """
    Classify all attributes based on FD appearance.

    Args:
        all_attrs: Complete set of attributes in the relation
        fds: List of functional dependencies as (lhs, rhs) tuples

    Returns:
        Dictionary with 'core', 'excluded', 'explore', and detailed categories
    """
    left_side = set()
    right_side = set()
    for lhs, rhs in fds:
        left_side.update(lhs)
        right_side.update(rhs)

    # The four categories
    l_only = left_side - right_side               # Only on left
    r_only = right_side - left_side               # Only on right
    both = left_side & right_side                 # On both sides
    neither = all_attrs - left_side - right_side  # Not in any FD

    # Key-finding implications
    core = l_only | neither  # Must be in every key
    excluded = r_only        # Never in any key
    explore = both           # Need to try combinations

    return {
        'l_only': l_only,
        'r_only': r_only,
        'both': both,
        'neither': neither,
        'core': core,
        'excluded': excluded,
        'explore': explore
    }


def demonstrate_classification(name: str, all_attrs: set, fds: list):
    """Show classification results for a relation."""
    print(f"\n{'='*60}")
    print(f"Relation: {name}")
    print(f"Attributes: {sorted(all_attrs)}")
    print("\nFunctional Dependencies:")
    for lhs, rhs in fds:
        print(f"  {sorted(lhs)} → {sorted(rhs)}")

    result = classify_attributes(all_attrs, fds)

    print("\nClassification:")
    print(f"  L-only (must be in key):  {sorted(result['l_only'])}")
    print(f"  R-only (never in key):    {sorted(result['r_only'])}")
    print(f"  Both (explore):           {sorted(result['both'])}")
    print(f"  Neither (must be in key): {sorted(result['neither'])}")

    print("\nSearch Parameters:")
    print(f"  Core (definite):               {sorted(result['core'])}")
    print(f"  Explore (try combinations of): {sorted(result['explore'])}")
    print(f"  Excluded (ignore):             {sorted(result['excluded'])}")
    print(f"  Search space: 2^{len(result['explore'])} = "
          f"{2**len(result['explore'])} combinations")
    print(f"{'='*60}")


# Examples
if __name__ == "__main__":
    # Example 1: Simple relation
    demonstrate_classification(
        "Employee(EmpID, Name, DeptID, Salary)",
        {"EmpID", "Name", "DeptID", "Salary"},
        [
            ({"EmpID"}, {"Name", "DeptID", "Salary"})
        ]
    )

    # Example 2: Multiple keys possible
    demonstrate_classification(
        "Student(SID, SSN, Email, Name, GPA)",
        {"SID", "SSN", "Email", "Name", "GPA"},
        [
            ({"SID"}, {"SSN", "Email", "Name", "GPA"}),
            ({"SSN"}, {"SID", "Email", "Name", "GPA"}),
            ({"Email"}, {"SID", "SSN", "Name", "GPA"})
        ]
    )

    # Example 3: Complex with explore attributes
    demonstrate_classification(
        "Enrollment(StudentID, CourseID, Semester, Grade, InstructorID)",
        {"StudentID", "CourseID", "Semester", "Grade", "InstructorID"},
        [
            ({"StudentID", "CourseID", "Semester"}, {"Grade", "InstructorID"}),
            ({"CourseID", "Semester"}, {"InstructorID"})
        ]
    )
```

With attribute classification in place, we can now present the complete algorithm for finding all candidate keys:
Algorithm Overview:
```python
from itertools import combinations
from functools import lru_cache


@lru_cache(maxsize=None)
def compute_closure(attributes: frozenset, fds: tuple) -> frozenset:
    """
    Compute attribute closure with caching for efficiency.
    Uses immutable types (frozenset, tuple) for caching compatibility.
    """
    closure = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs.issubset(closure) and not rhs.issubset(closure):
                closure.update(rhs)
                changed = True
    return frozenset(closure)


def is_superkey(attrs: set, all_attrs: set, fds: list) -> bool:
    """Check if attribute set is a superkey."""
    fds_tuple = tuple((frozenset(lhs), frozenset(rhs)) for lhs, rhs in fds)
    closure = compute_closure(frozenset(attrs), fds_tuple)
    return set(closure) == all_attrs


def is_minimal_superkey(attrs: set, all_attrs: set, fds: list) -> bool:
    """Check if a superkey is minimal (i.e., a candidate key)."""
    if not is_superkey(attrs, all_attrs, fds):
        return False
    # Try removing each attribute
    for attr in attrs:
        subset = attrs - {attr}
        if is_superkey(subset, all_attrs, fds):
            return False  # Not minimal
    return True


def find_all_candidate_keys(all_attrs: set, fds: list) -> list[set]:
    """
    Find ALL candidate keys for a relation.

    Uses attribute classification to minimize search space.
    Returns list of sets, each set being a candidate key.
    NOTE: relies on classify_attributes from the previous listing.
    """
    # Step 1: Classify attributes
    classification = classify_attributes(all_attrs, fds)
    core = classification['core']
    explore = classification['explore']

    # Step 2: Check if core alone is sufficient. Every core attribute
    # must appear in every key, so a core that is a superkey is
    # automatically minimal and is the unique candidate key.
    if is_superkey(core, all_attrs, fds):
        return [core]

    # Step 3: Try adding subsets of explore attributes,
    # smallest subsets first
    candidate_keys = []
    for size in range(1, len(explore) + 1):
        for subset in combinations(sorted(explore), size):
            candidate = core | set(subset)

            # Skip supersets of keys we have already found
            if any(found_key < candidate for found_key in candidate_keys):
                continue

            # Check if this is a superkey
            if not is_superkey(candidate, all_attrs, fds):
                continue

            # Check if it's minimal
            if is_minimal_superkey(candidate, all_attrs, fds):
                candidate_keys.append(candidate)

    # Defensive fallback: the entire attribute set is always a superkey
    return candidate_keys if candidate_keys else [all_attrs]


def analyze_keys(name: str, all_attrs: set, fds: list):
    """Complete key analysis with detailed output."""
    print(f"\n{'='*70}")
    print(f"CANDIDATE KEY ANALYSIS: {name}")
    print(f"{'='*70}")
    print(f"\nAttributes: {sorted(all_attrs)}")
    print("\nFunctional Dependencies:")
    for lhs, rhs in fds:
        print(f"  {sorted(lhs)} → {sorted(rhs)}")

    # Classification
    classification = classify_attributes(all_attrs, fds)
    print("\nAttribute Classification:")
    print(f"  Core (must be in key):   {sorted(classification['core'])}")
    print(f"  Explore (may be in key): {sorted(classification['explore'])}")
    print(f"  Excluded (never in key): {sorted(classification['excluded'])}")

    # Find all candidate keys
    candidate_keys = find_all_candidate_keys(all_attrs, fds)
    print(f"\nCandidate Keys Found: {len(candidate_keys)}")
    for i, key in enumerate(candidate_keys, 1):
        print(f"  Key {i}: {{{', '.join(sorted(key))}}}")

    # Verify each key
    print("\nVerification:")
    for key in candidate_keys:
        fds_tuple = tuple((frozenset(lhs), frozenset(rhs)) for lhs, rhs in fds)
        closure = compute_closure(frozenset(key), fds_tuple)
        print(f"  {{{', '.join(sorted(key))}}}⁺ = {{{', '.join(sorted(closure))}}}")

    # Prime/Non-prime classification
    prime = set()
    for key in candidate_keys:
        prime.update(key)
    non_prime = all_attrs - prime
    print("\nAttribute Status:")
    print(f"  Prime:     {sorted(prime)}")
    print(f"  Non-prime: {sorted(non_prime)}")
    print(f"{'='*70}")
    return candidate_keys


# Comprehensive examples
if __name__ == "__main__":
    # Example 1: Single key
    analyze_keys(
        "Product(ProdID, Name, Category, Price)",
        {"ProdID", "Name", "Category", "Price"},
        [
            ({"ProdID"}, {"Name", "Category", "Price"})
        ]
    )

    # Example 2: Three equivalent keys
    analyze_keys(
        "Person(SSN, PassportNum, DriverLicense, Name, Address)",
        {"SSN", "PassportNum", "DriverLicense", "Name", "Address"},
        [
            ({"SSN"}, {"PassportNum", "DriverLicense", "Name", "Address"}),
            ({"PassportNum"}, {"SSN", "DriverLicense", "Name", "Address"}),
            ({"DriverLicense"}, {"SSN", "PassportNum", "Name", "Address"})
        ]
    )

    # Example 3: Cyclic dependencies
    analyze_keys(
        "Cycle(A, B, C, D, E)",
        {"A", "B", "C", "D", "E"},
        [
            ({"A"}, {"B"}),
            ({"B"}, {"C"}),
            ({"C"}, {"A"}),
            ({"D"}, {"E"})
        ]
    )

    # Example 4: Composite keys
    analyze_keys(
        "Enrollment(StudentID, CourseID, Semester, Grade)",
        {"StudentID", "CourseID", "Semester", "Grade"},
        [
            ({"StudentID", "CourseID", "Semester"}, {"Grade"})
        ]
    )
```

Let's work through increasingly complex examples to build intuition for the key-finding process.
Relation: R(A, B, C, D)
Functional Dependencies:
A → B
B → C
C → D
D → A
Step 1: Classify Attributes
| Attr | In LHS? | In RHS? | Category |
|---|---|---|---|
| A | Yes (A→B) | Yes (D→A) | Both |
| B | Yes (B→C) | Yes (A→B) | Both |
| C | Yes (C→D) | Yes (B→C) | Both |
| D | Yes (D→A) | Yes (C→D) | Both |
```
L-only  = {}
R-only  = {}
Both    = {A, B, C, D}
Neither = {}

Core    = {} (empty)
Explore = {A, B, C, D}
```
Step 2: Check if core is sufficient
{}⁺ = {} ≠ {A, B, C, D}. Not sufficient.
Step 3: Try combinations of explore attributes
Size 1: {A}⁺ = {A, B, C, D}, and by the cycle's symmetry {B}⁺, {C}⁺, and {D}⁺ give the same result.
All single-attribute sets are superkeys!
Step 4: Check minimality
Each is a single attribute—no proper subsets except {}. All minimal.
Result: 4 Candidate Keys: {A}, {B}, {C}, {D}
Prime Attributes: {A, B, C, D} — all attributes! Non-Prime Attributes: {} — none!
Insight: Cyclic dependencies often create many equivalent keys.
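The four-key result is easy to verify mechanically. A small standalone script (with its own closure helper, independent of the listings above) confirms that each single attribute reaches the full relation through the cycle:

```python
def closure(attrs, fds):
    """Compute the attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# R(A, B, C, D) with the cyclic FDs A→B, B→C, C→D, D→A
U = {"A", "B", "C", "D"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C"}, {"D"}), ({"D"}, {"A"})]

# Every single attribute walks the cycle and reaches all of U
for attr in sorted(U):
    print(attr, closure({attr}, fds) == U)   # True for all four
```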
Certain FD patterns strongly suggest the presence of multiple candidate keys. Recognizing these patterns helps you anticipate when deeper analysis is needed:
Pattern Deep Dive: Bidirectional Dependencies
Relation: Employee(EmpNumber, SSN, Email, Name)
FDs:
EmpNumber → SSN, Email, Name
SSN → EmpNumber, Email, Name
Email → EmpNumber, SSN, Name
Each of these three attributes can serve as a standalone key because they mutually determine each other. This often happens when a domain has multiple natural identifiers, such as an internal employee number, a government-issued SSN, and a work email address.
All three are equally valid candidate keys.
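Running these mutually determining FDs through attribute classification (a self-contained sketch; the variable names are illustrative) shows why this pattern yields several keys: the core comes out empty, and all three identifiers land in the explore set:

```python
U = {"EmpNumber", "SSN", "Email", "Name"}
fds = [
    ({"EmpNumber"}, {"SSN", "Email", "Name"}),
    ({"SSN"}, {"EmpNumber", "Email", "Name"}),
    ({"Email"}, {"EmpNumber", "SSN", "Name"}),
]

L = set().union(*(lhs for lhs, _ in fds))   # left sides: the three identifiers
R = set().union(*(rhs for _, rhs in fds))   # right sides: everything, incl. Name

core = (L - R) | (U - L - R)   # L-only plus Neither: empty here
explore = L & R                # the three interchangeable identifiers
excluded = R - L               # Name: never part of any key

print(core, explore, excluded)
```

An empty core is exactly the signal discussed later on this page: every key must be assembled from the explore attributes alone.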
Pattern Deep Dive: Shared Anchor with Alternatives
Relation: Booking(CustomerID, FlightID, Date, Seat, BookingRef)
FDs:
CustomerID, FlightID, Date → Seat, BookingRef
BookingRef → CustomerID, FlightID, Date, Seat
Here we have a composite natural key, {CustomerID, FlightID, Date}, and a single-attribute surrogate key, {BookingRef}, each determining all other attributes.
Both are candidate keys. This pattern is common when a system introduces a compact reference code (a surrogate) on top of a natural composite identifier.
Candidate Keys: {CustomerID, FlightID, Date}, {BookingRef}
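Both claimed keys can be checked with closures. A standalone sketch (with its own closure helper) verifies that each reaches all attributes while a proper subset of the composite does not:

```python
def closure(attrs, fds):
    """Expand attrs under fds until a fixpoint is reached."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

U = {"CustomerID", "FlightID", "Date", "Seat", "BookingRef"}
fds = [
    ({"CustomerID", "FlightID", "Date"}, {"Seat", "BookingRef"}),
    ({"BookingRef"}, {"CustomerID", "FlightID", "Date", "Seat"}),
]

# The composite natural key and the surrogate both reach all of U
print(closure({"CustomerID", "FlightID", "Date"}, fds) == U)  # True
print(closure({"BookingRef"}, fds) == U)                      # True
# But no proper subset of the composite works, e.g.:
print(closure({"CustomerID", "FlightID"}, fds) == U)          # False
```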
When attribute classification yields an empty core (no L-only or Neither attributes), you should expect multiple candidate keys. This is because every potential key must come from the 'explore' category, and different combinations from this category may all work as keys.
Even with systematic algorithms, several pitfalls can lead to incorrect or incomplete key discovery:
Edge Case: The All-Key Relation
Sometimes every attribute must be in the key:
Relation: EventLog(Timestamp, UserID, Action, Result)
FDs: {} (none—each combination is unique)
With no FDs, no attribute can be derived, so the only candidate key is the entire attribute set: {Timestamp, UserID, Action, Result}.
All attributes are prime. This is fine for logging tables but may indicate missing FDs in business data.
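With an empty FD list, the closure of any attribute set is just itself, so only the full attribute set can be a superkey. A quick standalone check (helper name illustrative):

```python
def closure(attrs, fds):
    """With no FDs, the loop body never fires, so closure(X) == X."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

U = {"Timestamp", "UserID", "Action", "Result"}
fds = []   # no functional dependencies

print(closure({"Timestamp", "UserID"}, fds) == U)  # False: nothing derivable
print(closure(U, fds) == U)                        # True: the only key is all of U
```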
Edge Case: The Constant Attribute
Relation: Config(Version, SettingName, Value)
FDs:
{} → Version (Version is constant across all rows)
SettingName → Value
The FD {} → Version means Version is the same for all tuples. This is unusual but valid. The key analysis proceeds normally, with SettingName likely being the candidate key.
Such FDs often indicate data that should be metadata or configuration, not row data.
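The closure algorithm handles an empty left-hand side naturally: the empty set is a subset of every closure, so Version is added from the very first pass. A standalone sketch (helper name illustrative):

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An empty lhs is a subset of anything, so its rhs always fires
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

U = {"Version", "SettingName", "Value"}
fds = [
    (set(), {"Version"}),            # Version is constant across all rows
    ({"SettingName"}, {"Value"}),
]

# SettingName alone derives everything, so it is the candidate key
print(closure({"SettingName"}, fds) == U)  # True
print(closure(set(), fds))                 # even {}⁺ is non-empty here
```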
Discovering all candidate keys has significant practical implications:
Primary Key Selection Criteria:
When multiple candidate keys exist, consider:
| Criterion | Preference | Reason |
|---|---|---|
| Stability | More stable | Minimize cascading updates |
| Size | Smaller | Better index efficiency |
| Semantics | More meaningful | Easier debugging/querying |
| Privacy | Less sensitive | Safer in logs/URLs |
| Uniqueness domain | Broader | Cross-system compatibility |
```sql
-- After discovering multiple candidate keys,
-- implement them in the schema:

-- Example: Student table with three candidate keys
CREATE TABLE Student (
    -- Primary key: synthetic for stability
    StudentID INT PRIMARY KEY,

    -- Alternate key: government identifier
    SSN VARCHAR(11) NOT NULL UNIQUE,

    -- Alternate key: institutional identifier
    Email VARCHAR(100) NOT NULL UNIQUE,

    -- Non-prime attributes
    Name VARCHAR(100) NOT NULL,
    Major VARCHAR(50),
    GPA DECIMAL(3,2),

    -- Document the key structure
    CONSTRAINT chk_ssn_format CHECK (SSN LIKE '___-__-____'),
    CONSTRAINT chk_email_format CHECK (Email LIKE '%@%.%')
);

-- Each UNIQUE constraint reflects a candidate key
-- and automatically creates an index

-- References can use any candidate key
CREATE TABLE Enrollment (
    EnrollmentID INT PRIMARY KEY,
    -- Could use any of these:
    StudentID INT REFERENCES Student(StudentID),
    -- OR: SSN VARCHAR(11) REFERENCES Student(SSN),
    -- OR: Email VARCHAR(100) REFERENCES Student(Email),
    CourseID INT NOT NULL,
    Semester VARCHAR(10) NOT NULL
);

-- Index all candidate keys for lookup efficiency
-- (UNIQUE constraints already create indexes, but explicit:)
-- CREATE INDEX idx_student_ssn ON Student(SSN);
-- CREATE INDEX idx_student_email ON Student(Email);
```

Let's consolidate the key insights from this comprehensive exploration:
What's Next:
With the ability to find all candidate keys and classify attributes as prime/non-prime, we're now ready to explore the relationship between keys and normalization. The final page of this module ties together keys, FDs, and normal forms into a unified framework for schema analysis.
You now possess a complete toolkit for discovering all candidate keys in any relation. This capability is essential for accurate normalization analysis and informed database design.