Loading content...
Throughout our normalization journey, we've repeatedly needed to know what the candidate keys are. Normal form analysis requires distinguishing prime from non-prime attributes—which hinges on knowing all candidate keys. Decomposition algorithms require starting from a candidate key. Dependency analysis requires checking if determinants are superkeys.
Key finding is the critical first step in every normalization problem.
Yet many students and interview candidates struggle with this step, especially when given complex FD sets with multiple candidate keys. They either miss some keys (leading to incorrect prime/non-prime classification) or waste time with inefficient approaches. This page gives you systematic, reliable algorithms that work every time.
By the end of this page, you will know how to systematically identify ALL candidate keys for any relation given its functional dependencies. You'll understand multiple approaches (closure-based, attribute classification, and systematic search) and know which to apply in different scenarios. You'll also handle tricky cases like overlapping keys and cyclic dependencies.
Let's establish precise definitions before diving into algorithms.
Superkey: A set of attributes S is a superkey of relation R iff S⁺ = R (the closure of S equals all attributes). Every tuple can be uniquely identified by S.
Candidate Key: A superkey K that is minimal—no proper subset of K is also a superkey. Removing any attribute from K makes it no longer a superkey.
Primary Key: One chosen candidate key designated as the "main" identifier. (A design choice, not a mathematical concept.)
Prime Attribute: An attribute that is part of any candidate key.
Non-Prime Attribute: An attribute that is not part of any candidate key.
| Concept | Definition | Example (R = ABCDE, Key = AB) |
|---|---|---|
| Superkey | Any set whose closure is R | AB, ABC, ABD, ABCD, ABCDE |
| Candidate Key | Minimal superkey | AB only (if no other key exists) |
| Primary Key | Chosen candidate key | AB (design decision) |
| Prime Attribute | In some candidate key | A, B |
| Non-Prime Attribute | In no candidate key | C, D, E |
A relation can have multiple candidate keys. You must find ALL of them to correctly classify attributes as prime or non-prime. Missing even one candidate key can lead to incorrect normal form conclusions. Never assume there's only one key!
Before computing closures, we can narrow down candidates by classifying attributes based on their role in the FDs.
Attribute Classification:
Look at all FDs and classify each attribute:
L (Left-only): Appears only on left side of FDs, never on right
R (Right-only): Appears only on right side of FDs, never on left
LR (Both sides): Appears on both left and right
N (Neither): Appears in neither side of any FD
Starting Point for Key Finding:
Minimum Key = L ∪ N (all left-only and neither attributes)
If (L ∪ N)⁺ = R, then L ∪ N is the only candidate key. Otherwise, add LR attributes systematically.
1234567891011121314151617181920212223242526272829303132333435363738
/** * Classify attributes for key finding */function classifyAttributes(R, F) { let leftSides = new Set(); let rightSides = new Set(); // Collect all appearances for (let fd of F) { for (let attr of fd.left) leftSides.add(attr); for (let attr of fd.right) rightSides.add(attr); } let classification = { L: [], // Left only - MUST be in key R: [], // Right only - CANNOT be in key LR: [], // Both sides - might be in key N: [] // Neither - MUST be in key }; for (let attr of R) { let inLeft = leftSides.has(attr); let inRight = rightSides.has(attr); if (inLeft && !inRight) classification.L.push(attr); else if (!inLeft && inRight) classification.R.push(attr); else if (inLeft && inRight) classification.LR.push(attr); else classification.N.push(attr); // Neither } return classification;} // Minimum key starting point:// minKey = L ∪ N (must be in every candidate key)// // If minKey⁺ = R, then minKey is THE candidate key.// Otherwise, try adding subsets of LR to minKey.R(A, B, C, D, E, F)
FDs: A → B, BC → D, D → E, E → F**Step 1: Classify Attributes**
Left sides: A, B, C, D, E
Right sides: B, D, E, F
- A: Left only (L) — **Must be in key**
- B: Both (LR) — Might be in key
- C: Left only (L) — **Must be in key**
- D: Both (LR) — Might be in key
- E: Both (LR) — Might be in key
- F: Right only (R) — **Cannot be in key**
**L = {A, C}** (must be in every key)
**R = {F}** (never in any key)
**LR = {B, D, E}** (check if needed)
**N = {}** (empty)
---
**Step 2: Test Minimum Key**
minKey = L ∪ N = {A, C}
Compute {A, C}⁺:
- {A, C} → {A, C, B} (A → B)
- {A, C, B} → {A, C, B, D} (BC → D)
- {A, C, B, D} → {A, C, B, D, E} (D → E)
- {A, C, B, D, E} → {A, C, B, D, E, F} (E → F)
{A, C}⁺ = {A, B, C, D, E, F} = R ✓
**Candidate Key: {A, C}**
Since minKey alone covers R, {A, C} is the **only** candidate key.
No need to test LR additions.When attribute classification alone doesn't uniquely determine the key (because (L ∪ N)⁺ ≠ R), we need to systematically search for all candidate keys.
The Algorithm:
Optimization: Since we need minimal keys, we can search smallest to largest. Once we find a superkey of size k, any superset of it is not a candidate key.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748
/** * Find ALL candidate keys */function findAllCandidateKeys(R, F) { let classification = classifyAttributes(R, F); let base = classification.L.concat(classification.N); let LR = classification.LR; // Check if base alone is a key if (computeClosure(base, F).equals(R)) { return [new Set(base)]; // Only one candidate key } // Need to add from LR attributes let candidateKeys = []; // Generate all subsets of LR, from smallest to largest for (let size = 1; size <= LR.length; size++) { for (let subset of combinations(LR, size)) { let potentialKey = new Set([...base, ...subset]); // Skip if superset of existing candidate key if (candidateKeys.some(k => isSubset(k, potentialKey))) { continue; } // Check if it's a superkey if (computeClosure(potentialKey, F).equals(R)) { // Check minimality (no proper subset is a superkey) let isMinimal = true; for (let attr of potentialKey) { let reduced = new Set(potentialKey); reduced.delete(attr); if (computeClosure(reduced, F).equals(R)) { isMinimal = false; break; } } if (isMinimal) { candidateKeys.push(potentialKey); } } } } return candidateKeys;}R(A, B, C, D)
FDs: A → B, B → A, C → D, D → C**Step 1: Classify Attributes**
Left: A, B, C, D
Right: B, A, D, C
- A: Both (LR)
- B: Both (LR)
- C: Both (LR)
- D: Both (LR)
**L = {}**, **R = {}**, **LR = {A, B, C, D}**, **N = {}**
base = L ∪ N = {} (empty!)
---
**Step 2: Systematic Search**
Need to find minimal subsets of LR that cover R.
**Size 1:**
- A⁺ = {A, B} ≠ R
- B⁺ = {A, B} ≠ R
- C⁺ = {C, D} ≠ R
- D⁺ = {C, D} ≠ R
No size-1 keys.
**Size 2:**
- {A, C}⁺ = {A, B} ∪ {C, D} = {A, B, C, D} = R ✓ **Candidate Key!**
- {A, D}⁺ = {A, B} ∪ {C, D} = R ✓ **Candidate Key!**
- {B, C}⁺ = {A, B} ∪ {C, D} = R ✓ **Candidate Key!**
- {B, D}⁺ = {A, B} ∪ {C, D} = R ✓ **Candidate Key!**
- {A, B}⁺ = {A, B} ≠ R
- {C, D}⁺ = {C, D} ≠ R
---
**Candidate Keys: {A,C}, {A,D}, {B,C}, {B,D}**
**Prime Attributes:** A, B, C, D (all!)
**Non-Prime Attributes:** None
This relation is in 3NF (and actually BCNF) because every attribute is prime!Several edge cases frequently appear in interviews. Recognizing these patterns saves time and prevents errors.
R(A, B, C, D, E)
FDs: A → B, B → C, C → A, D → E**Step 1: Classify Attributes**
L = {D} (only determines, never determined by non-cycle)
R = {E} (only determined, never determines)
LR = {A, B, C} (in the cycle)
N = {}
**Step 2: Analyze the Cycle**
A → B → C → A forms a cycle.
Any one of A, B, or C can determine all three.
**Step 3: Test Keys**
base = {D}
D⁺ = {D, E} ≠ R. Need more.
Try {A, D}:
- A⁺ = {A, B, C} (full cycle)
- {A, D}⁺ = {A, B, C, D, E} = R ✓
Is it minimal?
- {A}⁺ = {A, B, C} ≠ R (missing D, E)
- {D}⁺ = {D, E} ≠ R
- Both needed. **{A, D} is a candidate key.**
**Step 4: Check Other Cycle Members**
{B, D}⁺ = {B, C, A, D, E} = R ✓ **Candidate key!**
{C, D}⁺ = {C, A, B, D, E} = R ✓ **Candidate key!**
---
**Candidate Keys: {A,D}, {B,D}, {C,D}**
Note: D must be in every key (L attribute).
Any cycle member can substitute for others.R(A, B, C, D)
FDs: A → B, A → C, A → D
All of B, C, D are right-only. Only A is left-only.
What's the key?**Classification:**
- L = {A}
- R = {B, C, D}
- LR = {}
- N = {}
**Test:**
base = {A}
A⁺ = {A, B, C, D} = R ✓
**Candidate Key: {A}**
This is straightforward—single-attribute key where one attribute determines everything else.In interviews, key finding is often a preliminary step before the "real" question (like normalization or decomposition). Here's how to approach it efficiently.
In ~80% of interview problems, the base (L ∪ N) alone is the candidate key. Always check this first! Only dive into systematic search if the base doesn't cover all attributes.
Common Interview Mistakes:
Stopping after finding one key — There might be more! Always check for alternatives.
Including R attributes in keys — Attributes that only appear on the right side of FDs are NEVER in keys.
Forgetting N attributes — If an attribute isn't mentioned in any FD, it MUST be in every key.
Not verifying minimality — Finding a superkey isn't finding a candidate key. Always verify no attribute can be removed.
Inefficient brute force — Don't try all possible subsets. Use classification to narrow down quickly.
Understanding why key finding matters for normalization crystallizes both concepts.
Keys in 2NF Analysis:
2NF violations require partial dependencies—when part of a composite key determines a non-prime attribute.
Keys in 3NF Analysis:
3NF requires: For X → A, either X is superkey OR A is prime.
Keys in BCNF Analysis:
BCNF requires: Every determinant is a superkey.
| Normal Form | Why Keys Matter | Consequence of Missing a Key |
|---|---|---|
| 2NF | Need to know if keys are composite | Might miss that 2NF is automatically satisfied |
| 3NF | Need all keys for prime classification | Might misclassify prime as non-prime, report false violation |
| BCNF | Need to verify if determinants are superkeys | Might fail to recognize a superkey, report false violation |
R(A, B, C, D)
FDs: A → B, C → D, BD → A
Question: Is this in 3NF?**Common Mistake (incomplete key finding):**
Quickly find {A, C} as a key (A→B, C→D covers most).
Assume it's the only key.
Prime = {A, C}, Non-prime = {B, D}
Check BD → A:
- Is BD superkey? BD⁺ = ? Let's see... BD → A, A → B, so BDA → B...
Actually BD⁺ with A→B... BD contains B and D.
Need to check properly.
---
**Correct Analysis (complete key finding):**
**Classify:**
- A: LR (left in A→B, right in BD→A)
- B: LR (left in BD→A, right in A→B)
- C: L (left only)
- D: LR (left in BD→A, right in C→D)
L = {C}, base = {C}
C⁺ = {C, D} ≠ R. Need more.
Try {A, C}:
{A, C}⁺ = {A, B, C, D} = R ✓ Candidate key!
Try {B, C} (since B is LR like A):
{B, C}⁺ = {B, C, D} (C→D) → any more? B alone doesn't give A.
Not a superkey. ✗
Try {B, D, C}:
{B, D, C}⁺ = {B, D, C, A} (BD→A) → {A, B, C, D} = R ✓
But is it minimal? {B, D}⁺ = {B, D, A} → {A, B, D} ≠ R (no C).
{B, C}⁺ = {B, C, D} ≠ R.
{D, C}⁺ = {C, D} ≠ R.
All three needed. **{B, C, D} is a candidate key!**
**Candidate Keys: {A, C} and {B, C, D}**
Prime = {A, B, C, D} (all!)
Non-prime = {} (none!)
**Re-check 3NF:**
- A → B: A not superkey, but B is prime ✓
- C → D: C not superkey, but D is prime ✓
- BD → A: BD not superkey, but A is prime ✓
**IN 3NF!**
If we had missed key {B,C,D}, we might have thought B and D were non-prime and concluded 3NF violation!For complex schemas, these advanced techniques can help.
Technique 1: Dependency Graph Analysis
Draw a graph where nodes are attributes and edges represent FDs. Attributes with no incoming edges (sources) must be in every key. Attributes with only incoming edges (sinks) are never in keys.
Technique 2: Equivalence Classes
When FDs form cycles (A ↔ B), treat the cycle as a single unit. Any member can substitute for others.
Technique 3: Attribute Elimination
Start with all attributes as a potential key. Remove attributes one by one if their removal doesn't break superkey property. What remains is a candidate key.
12345678910111213141516171819202122232425262728293031323334353637
/** * Find ONE candidate key via elimination * (Useful when you just need one key quickly) */function findOneCandidateKey(R, F) { let key = new Set(R); // Start with all attributes for (let attr of R) { // Try removing this attribute let reduced = new Set(key); reduced.delete(attr); // Check if still a superkey if (computeClosure(reduced, F).equals(R)) { // Can remove this attribute key = reduced; } // Otherwise, keep the attribute } return key; // This is a candidate key} // Note: This finds ONE candidate key.// To find ALL keys, you need the systematic approach.// But for quickly establishing one key, this works. // Example:// R = {A, B, C, D}, F = {A → B, C → D}// Start: key = {A, B, C, D}// Try remove A: {B, C, D}⁺ = {B, C, D} ≠ R. Keep A.// Try remove B: {A, C, D}⁺ = {A, B, C, D} = R. Remove B!// key = {A, C, D}// Try remove C: {A, D}⁺ = {A, B, D} ≠ R. Keep C.// Try remove D: {A, C}⁺ = {A, B, C, D} = R. Remove D!// key = {A, C}// Final: {A, C} is a candidate key.The elimination method finds one candidate key, but different elimination orders might yield different keys if multiple exist. To find all keys, use the systematic search from Section 3, or run elimination multiple times with different orders.
Let's practice with interview-style problems.
R(A, B, C, D, E, F)
FDs: AB → C, C → D, D → E, CF → B**Step 1: Classify**
- A: L (only in AB→C left, never right)
- B: LR (left in AB→C, right in CF→B)
- C: LR (left in C→D, CF→B; right in AB→C)
- D: LR (left in D→E, right in C→D)
- E: R (only right)
- F: L (only in CF→B left)
**L = {A, F}**, R = {E}, LR = {B, C, D}
**Step 2: Test Base**
{A, F}⁺ = {A, F}. Need more.
**Step 3: Add LR Attributes**
Try {A, F, B}:
{A, F, B}⁺ = {A,F,B} → {A,F,B,C} (AB→C) → {A,F,B,C,D} (C→D) → {A,F,B,C,D,E} (D→E)
= R ✓ **Superkey!**
Minimal? Try removing:
- {A,F}⁺ = {A,F} ≠ R
- {A,B}⁺ = {A,B,C,D,E} ≠ R (no F)
- {F,B}⁺ = {F,B} ≠ R
**{A, F, B} is a candidate key!**
Try {A, F, C}:
{A,F,C}⁺ = {A,F,C} → {A,F,C,D} (C→D) → {A,F,C,D,E} (D→E) → {A,F,C,D,E,B} (CF→B)
= R ✓ **Superkey!**
Minimal?
- {A,F}⁺ = {A,F} ≠ R
- {A,C}⁺ = {A,C,D,E} ≠ R
- {F,C}⁺ = {F,C,D,E,B} ≠ R (no A)
**{A, F, C} is a candidate key!**
Any others? {A,F,D} would give A,F,D,E but not B or C. Need C for B (CF→B).
**Candidate Keys: {A, B, F} and {A, C, F}**
Prime: {A, B, C, F}
Non-Prime: {D, E}R(A, B, C, D)
FDs: A → B, B → C, C → D, D → A**This is a full cycle!**
A → B → C → D → A
Any single attribute determines all others:
- A⁺ = {A,B,C,D} = R ✓
- B⁺ = {B,C,D,A} = R ✓
- C⁺ = {C,D,A,B} = R ✓
- D⁺ = {D,A,B,C} = R ✓
**Candidate Keys: {A}, {B}, {C}, {D}**
All four single-attribute keys!
**Prime Attributes:** All (A, B, C, D)
**Non-Prime:** None
This relation is automatically in 3NF and BCNF (every determinant is a superkey, every attribute is prime).What's Next:
With key finding mastered, Page 5: Trade-off Analysis brings together everything we've learned. You'll learn to make informed decisions about when to normalize further vs. when to stop, when to accept denormalization, and how to communicate these trade-offs in interviews—the practical wisdom that separates textbook knowledge from engineering judgment.
You now have systematic algorithms for finding all candidate keys in any relation. Practice on diverse FD sets to build fluency. Key finding should become automatic—a quick 1-2 minute step before any normalization analysis.