Loading content...
Every database table should have a primary key, but how do you determine what that key should be? In simple cases, it's obvious—an EmployeeID uniquely identifies an employee. But in complex schemas with multiple interrelated attributes and dozens of functional dependencies, finding all possible keys becomes a sophisticated analytical challenge.
The key discovery problem asks: Given a relation R and a set of functional dependencies F, what are all the candidate keys of R? This question is fundamental to database design because:
By the end of this page, you will master systematic techniques for finding candidate keys from functional dependencies. You'll learn the attribute categorization approach, understand why certain attributes must always be in every key, and be able to find all candidate keys of any relation—not just one.
Before diving into key discovery algorithms, let's ensure our definitions are crystal clear. These terms have precise meanings that are essential for correct reasoning.
Superkey: A set of attributes K is a superkey of relation R if K functionally determines all attributes of R. Formally:
K is a superkey ⟺ K⁺ = R (the closure of K is the entire relation)
Candidate Key: A candidate key is a minimal superkey—no proper subset of it is also a superkey. Formally:
K is a candidate key ⟺ K is a superkey AND for all proper subsets K' ⊂ K, K' is not a superkey
Primary Key: The primary key is the candidate key chosen by the database designer to be the main identifier. All other candidate keys become alternate keys.
Prime Attribute: An attribute is prime if it belongs to at least one candidate key.
Non-Prime Attribute: An attribute is non-prime if it does not belong to any candidate key.
| Concept | Definition | Example (R = ABCDE, key = AB) |
|---|---|---|
| Superkey | Determines all attributes | {AB}, {ABC}, {ABCD}, {ABE}, {ABCDE} |
| Candidate Key | Minimal superkey | {AB} (if no subset works) |
| Prime Attribute | In some candidate key | A, B |
| Non-Prime Attribute | In no candidate key | C, D, E |
Testing if K is a superkey is straightforward: compute K⁺. If K⁺ contains all attributes of R, K is a superkey. For testing if K is a candidate key, also verify that no proper subset of K has this property.
The most powerful technique for finding candidate keys is to categorize attributes based on their role in functional dependencies. This categorization dramatically reduces the search space.
Four Categories of Attributes:
L (Left-only): Appears only on the left side (determinant) of FDs, never on the right (dependent)
R (Right-only): Appears only on the right side (dependent) of FDs, never on the left
B (Both): Appears on both left and right sides in different FDs
N (Neither): Does not appear in any FD (neither side)
| Category | In Determinant? | In Dependent? | Key Membership |
|---|---|---|---|
| L (Left-only) | Yes | No | MUST be in every key |
| R (Right-only) | No | Yes | NEVER in any key |
| B (Both) | Yes | Yes | MAY or may not be in key |
| N (Neither) | No | No | MUST be in every key |
• L attributes can determine others but nothing determines them, so they must be given (in the key). • R attributes are determined by other attributes, so including them in a key is unnecessary. • N attributes are not involved in any FD, so they can only be known by being explicitly given (in the key). • B attributes are the candidates for optional inclusion—we must test whether adding them helps.
Categorize all attributesL = {A}, R = {E}, B = {B, C, D}, N = {F}Analyzing each attribute:
• A: Left of A→B, never on right side → L (left-only) • B: Right of A→B, left of BC→D → B (both) • C: Left of BC→D, never on right side → L (left-only)... wait, C is also B?
Let me recategorize: • A: determinant in A→B only → L (left-only) • B: dependent in A→B, determinant in BC→D → B (both) • C: determinant in BC→D only → L (left-only) • D: dependent in BC→D, determinant in D→E → B (both) • E: dependent in D→E only → R (right-only) • F: not in any FD → N (neither)
Actual categories: • L = {A, C} • R = {E} • B = {B, D} • N = {F}
Must be in every key: A, C, F Never in any key: E May or may not be in key: B, D
Using attribute categorization, we can formulate a systematic algorithm to find all candidate keys.
Algorithm: FIND-CANDIDATE-KEYS(R, F)
Input: R — relation schema (set of attributes)
F — set of functional dependencies
Output: All candidate keys of R
1. Categorize attributes into L, R, B, N sets
2. Base = L ∪ N (attributes that must be in every key)
3. If Base⁺ = R:
Return {Base} as the only candidate key
4. Else:
Find all minimal sets X ⊆ B such that (Base ∪ X)⁺ = R
Return all such (Base ∪ X) as candidate keys
Key Insight: We only need to explore subsets of B (the "both" category). L, N are always included; R is never included. This dramatically reduces the search space from 2^n to 2^|B|.
When exploring subsets of B, use a level-wise search:
This guarantees finding minimal keys first. Once you find a key, any superset is a superkey (not minimal), so you can prune it.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465
from itertools import combinations def find_candidate_keys(R: set, F: list[tuple[set, set]]) -> list[set]: """ Find all candidate keys of relation R under FDs F. Args: R: Set of all attributes in the relation F: List of FDs as (determinant, dependent) tuples Returns: List of candidate keys (each key is a set of attributes) """ # Step 1: Categorize attributes left_attrs = set() right_attrs = set() for det, dep in F: left_attrs |= det right_attrs |= dep L = left_attrs - right_attrs # Left-only R_only = right_attrs - left_attrs # Right-only B = left_attrs & right_attrs # Both N = R - (left_attrs | right_attrs) # Neither # Step 2: Compute base (must be in every key) base = L | N # Step 3: Check if base alone is a key if compute_closure(base, F) == R: return [base] # Step 4: Search through subsets of B candidate_keys = [] B_list = list(B) # Level-wise search: smallest subsets first for size in range(1, len(B) + 1): for subset in combinations(B_list, size): candidate = base | set(subset) # Skip if superset of known candidate key if any(ck <= candidate and ck != candidate for ck in candidate_keys): continue # Check if this is a superkey if compute_closure(candidate, F) == R: candidate_keys.append(candidate) return candidate_keys # Example usageR = {'A', 'B', 'C', 'D', 'E', 'F'}F = [ ({'A'}, {'B'}), # A → B ({'B', 'C'}, {'D'}), # BC → D ({'D'}, {'E'}), # D → E] keys = find_candidate_keys(R, F)print("Candidate keys:", [sorted(k) for k in keys])# Output: Candidate keys: [['A', 'C', 'F']]Let's work through a comprehensive example that has multiple candidate keys.
Problem:
Relation: R(A, B, C, D, E)
Functional Dependencies:
Step 1: Categorize Attributes
| Attribute | Left side of | Right side of | Category |
|---|---|---|---|
| A | AB→C | E→A | B (Both) |
| B | AB→C | — | L (Left-only) |
| C | C→D | AB→C | B (Both) |
| D | D→E | C→D | B (Both) |
| E | E→A | D→E | B (Both) |
Categories:
Base = L ∪ N = {B}
Step 2: Test Base Alone
Compute {B}⁺:
Since {B}⁺ ≠ R, we need to add something from B = {A, C, D, E}.
| Candidate | Closure Computation | Is Superkey? |
|---|---|---|
| {B, A} | {BA}⁺: BA→C⊕, C→D⊕, D→E⊕ → {A,B,C,D,E} | ✓ YES |
| {B, C} | {BC}⁺: C→D⊕, D→E⊕, E→A⊕, AB→C(already) → {A,B,C,D,E} | ✓ YES |
| {B, D} | {BD}⁺: D→E⊕, E→A⊕, AB→C⊕, C→D(already) → {A,B,C,D,E} | ✓ YES |
| {B, E} | {BE}⁺: E→A⊕, AB→C⊕, C→D⊕, D→E(already) → {A,B,C,D,E} | ✓ YES |
All four combinations {B, A}, {B, C}, {B, D}, {B, E} are superkeys. Since they're all the same size (2 attributes) and no single attribute is a superkey, they are all candidate keys:
Candidate Keys: {AB}, {BC}, {BD}, {BE}
We don't need to test larger subsets (like {ABC}) because we've already found keys of size 2, making any superset a non-minimal superkey.
Why does this relation have so many candidate keys?
The cycle E → A, AB → C, C → D, D → E creates equivalences. Once you have B plus any attribute in the cycle, you can derive all others through the circular dependencies. This is characteristic of relations with cyclic FD structures.
Let's examine a case with a single candidate key.
Problem:
Relation: R(A, B, C, D, E, F)
Functional Dependencies:
Step 1: Categorize Attributes
| Attribute | Appears on Left? | Appears on Right? | Category |
|---|---|---|---|
| A | Yes (A→B, A→C) | No | L (Left-only) |
| B | No | Yes (A→B) | R (Right-only) |
| C | Yes (C→D) | Yes (A→C) | B (Both) |
| D | Yes (D→E) | Yes (C→D) | B (Both) |
| E | No | Yes (D→E) | R (Right-only) |
| F | No | No | N (Neither) |
Categories:
Base = L ∪ N = {A, F}
Step 2: Test Base Alone
Compute {A, F}⁺:
Result: {A, F}⁺ = {A, B, C, D, E, F} = R ✓
Conclusion: Base alone is a superkey, so {A, F} is the only candidate key.
A must be in every key because nothing determines it. F must be in every key because it's not involved in any FD (neither side). Together, A and F are sufficient because A triggers a chain A→B, A→C, C→D, D→E that covers everything else.
The attributes C and D (category B) are not needed in the key because A already determines them.
Some relations present more challenging key discovery scenarios. Let's examine strategies for handling them.
Case 1: Large B Category
When many attributes are in category B, the number of subsets to test can become large (2^|B|). Strategies to manage this:
Level-wise search with pruning: Test smallest subsets first. Once you find a key of size k, don't test supersets of found keys.
Compute closure incrementally: If {Base, X}⁺ ⊂ R, then adding some attributes from B might still not help. Use partial closure info to prune.
Focus on essential attributes: An attribute X ∈ B might be essential (must be in every key even though it's in B) if it's the only way to reach certain attributes.
Find candidate keysCandidate keys: {A, C, E}, {A, D, E}, {B, C, E}, {B, D, E}Analysis: • L = ∅ (all determinant attrs also appear on right) • R = ∅ (all dependent attrs also appear on left) • B = {A, B, C, D, E} (all appear on both sides) • N = ∅
Base = ∅, so we must test subsets of B = all attributes!
Key observations:
• A and B determine each other (A→B, B→A), so either works
• C and D determine each other (C→D, D→C), so either works
• E→E is trivial; E must be explicitly given
Every key needs: one of {A,B} + one of {C,D} + E This gives 2×2×1 = 4 candidate keys.
Case 2: Empty Left-Only Category
When L = ∅, every determinant attribute also appears as a dependent somewhere. This often indicates cyclic dependencies. The algorithm still works:
In the worst case (all attributes in category B), we might need to check 2^n candidate sets. However, this is rare in practice. Real-world schemas typically have clear L and N attributes that form most of the key, with only a few B attributes to test.
Finding keys correctly requires careful reasoning. Here are common mistakes to avoid and verification strategies to ensure correctness.
Verify or refute this claimINCORRECT — {B, C} is not even a superkeyLet's check:
{B, C}⁺: • Start: {B, C} • A → B: A ⊆ {B,C}? No • B → C: B ⊆ {B,C}? Yes, but C already present • C → D: C ⊆ {B,C}? Yes, add D → {B, C, D}
{B, C}⁺ = {B, C, D} ≠ R (missing A)
The student's error: ignoring that A is in category L (left-only) and MUST be in every key.
Correct key: {A} since {A}⁺ = {A, B, C, D} = R
Finding candidate keys from functional dependencies is a fundamental skill in database design. Let's consolidate the key techniques:
The Algorithm in Short:
What's Next:
Now that we can find keys, the next page explores superkey identification in more depth—understanding the structure of all superkeys, how they relate to candidate keys, and why this matters for schema design.
You now have a systematic method for finding all candidate keys of a relation from its functional dependencies. This skill is essential for normalization analysis, where distinguishing prime from non-prime attributes determines the normal form. Next, we examine superkey identification in detail.