Database Management SystemsAttribute Closure

Attribute Closure

LevelIntermediate

Duration60 mins

TopicAttribute Closure

5 / 5

Candidate Key Finding

The Ultimate Goal: All Candidate Keys

Finding candidate keys is the culmination of closure analysis. While earlier pages covered the concepts and basic algorithms, real-world schemas can present challenging cases: multiple overlapping candidate keys, cyclic dependencies, large attribute sets, and intricate FD patterns.

This page elevates your candidate key finding skills to expert level. You'll learn refined algorithms that handle complex cases efficiently, understand when a relation has many candidate keys versus few, and master techniques used in database design tools.

The Complete Candidate Key Problem:

Given relation R with attributes U and functional dependencies F, find ALL sets K such that:

K⁺ = U (K is a superkey)
For all K' ⊂ K: K'⁺ ≠ U (K is minimal)

What You Will Learn

By the end of this page, you will master efficient candidate key algorithms, understand exponential cases and how to handle them, work through complex multi-key examples, and recognize patterns that indicate key structure.

The Refined Candidate Key Algorithm

Building on the categorization approach, here's a refined algorithm with optimizations for efficiency and completeness.

Algorithm: FIND-ALL-CANDIDATE-KEYS(R, F)

Input:  R — set of all attributes
        F — set of functional dependencies
Output: Set of all candidate keys

1. Categorize attributes:
   - L = attributes appearing only on left side of FDs
   - R = attributes appearing only on right side of FDs
   - B = attributes appearing on both sides
   - N = attributes appearing in neither side

2. Must = L ∪ N    { These MUST be in every candidate key }
   May = B         { These MAY or may not be in candidate keys }
   Never = R       { These are NEVER in any candidate key }

3. If closure(Must) = R:
      Return { Must }   { Only one candidate key }

4. CandidateKeys = ∅
   Compute candidate keys containing Must and subsets of May
   
5. For each minimal set S ⊆ May such that closure(Must ∪ S) = R:
      Add (Must ∪ S) to CandidateKeys

6. Return CandidateKeys

Optimization: Level-wise search with pruning

In step 5, search subsets of May by increasing size. When you find a key K, prune all its supersets (they can't be minimal).

complete_candidate_key_finder
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
from itertools import combinations
from typing import Set, List, Tuple, FrozenSet
 
def find_all_candidate_keys(
    R: Set[str], 
    F: List[Tuple[Set[str], Set[str]]]
) -> List[FrozenSet[str]]:
    """
    Find all candidate keys of relation R under FDs F.
    
    Returns list of candidate keys (each as frozenset for immutability).
    """
    # Step 1: Categorize attributes
    left_attrs = set()
    right_attrs = set()
    for det, dep in F:
        left_attrs |= det
        right_attrs |= dep
    
    L = left_attrs - right_attrs       # Left-only: MUST be in every key
    R_only = right_attrs - left_attrs  # Right-only: NEVER in any key
    B = left_attrs & right_attrs       # Both: MAY be in key
    N = R - (left_attrs | right_attrs) # Neither: MUST be in every key
    
    # Step 2: Compute mandatory base
    Must = L | N
    May = B
    
    # Step 3: Check if Must alone is sufficient
    if compute_closure(Must, F) == R:
        return [frozenset(Must)]
    
    # Step 4 & 5: Search for minimal supersets of Must
    candidate_keys = []
    May_list = sorted(May)  # Sort for deterministic order
    
    # Level-wise search: smallest subsets first
    for size in range(1, len(May) + 1):
        for subset in combinations(May_list, size):
            candidate = Must | set(subset)
            
            # Pruning: skip if superset of known candidate key
            candidate_frozen = frozenset(candidate)
            is_superset = any(
                ck < candidate_frozen  # strict subset
                for ck in candidate_keys
            )
            if is_superset:
                continue
            
            # Check if this is a superkey
            if compute_closure(candidate, F) == R:
                candidate_keys.append(candidate_frozen)
    
    return candidate_keys
 
 
def compute_closure(X: Set[str], F: List[Tuple[Set[str], Set[str]]]) -> Set[str]:
    """Compute attribute closure X⁺ under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for det, dep in F:
            if det <= result and not dep <= result:
                result |= dep
                changed = True
    return result
 
 
# ============ EXAMPLE USAGE ============
if __name__ == "__main__":
    # Example: R(A,B,C,D,E) with cyclic dependencies
    R = {'A', 'B', 'C', 'D', 'E'}
    F = [
        ({'A', 'B'}, {'C'}),   # AB → C
        ({'C'}, {'D'}),         # C → D
        ({'D'}, {'E'}),         # D → E
        ({'E'}, {'A'}),         # E → A  (creates cycle)
    ]
    
    keys = find_all_candidate_keys(R, F)
    print("Candidate Keys:")
    for k in keys:
        print(f"  {set(k)}")
    
    # Analysis:
    # L = {B} (B only on left)
    # R_only = {} (all right attrs also on left somewhere)
    # B = {A, C, D, E} (all appear on both sides)
    # N = {} 
    # Must = {B}
    #
    # {B}⁺ = {B} ≠ R, so we need to add from {A,C,D,E}
    # Due to cycle E→A, AB→C, C→D, D→E:
    # {B,A}⁺ = {A,B,C,D,E} ✓
    # {B,C}⁺ = {B,C,D,E,A} ✓
    # {B,D}⁺ = {B,D,E,A,C} ✓ (wait, need to verify)
    # {B,E}⁺ = {B,E,A,C,D} ✓
    #
    # All four are candidate keys: {AB}, {BC}, {BD}, {BE}

Complex Multi-Key Example

Let's work through a challenging example that demonstrates multiple interacting candidate keys.

Problem:

Relation: Student_Enrollment(StudentID, CourseID, Semester, Grade, InstructorID, InstructorName)

Simplified as: R(S, C, E, G, I, N) where:

S = StudentID
C = CourseID
E = Semester
G = Grade
I = InstructorID
N = InstructorName

Functional Dependencies:

F1: S, C, E → G (student's grade in a course-semester is unique)
F2: C, E → I (each course-semester has one instructor)
F3: I → N (instructor has one name)

Step 1: Categorize Attributes

Attribute	Left side of	Right side of	Category
S	F1	—	L (left-only)
C	F1, F2	—	L (left-only)
E	F1, F2	—	L (left-only)
G	—	F1	R (right-only)
I	F3	F2	B (both)
N	—	F3	R (right-only)

Categories:

L = {S, C, E}
R = {G, N}
B = {I}
N = ∅

Must = L ∪ N = {S, C, E}

Step 2: Test Must Alone

Compute {S, C, E}⁺:

Start: {S, C, E}
F1 (S,C,E → G): S,C,E ⊆ {S,C,E}? YES → add G → {S, C, E, G}
F2 (C,E → I): C,E ⊆ {S,C,E,G}? YES → add I → {S, C, E, G, I}
F3 (I → N): I ⊆ {S,C,E,G,I}? YES → add N → {S, C, E, G, I, N}

Result: {S, C, E}⁺ = {S, C, E, G, I, N} = R ✓

Single Candidate Key Found

Since Must alone determines all attributes, {S, C, E} = {StudentID, CourseID, Semester} is the ONLY candidate key.

This makes semantic sense: a student's enrollment in a specific course during a specific semester uniquely identifies the record. The grade, instructor, and instructor name are all derivable from this combination.

Example with Multiple Candidate Keys

Now let's examine a relation with multiple candidate keys stemming from equivalent identifiers.

Problem:

Relation: Employee(EmpID, SSN, Email, Name, DeptID, DeptName)

Simplified as: R(E, S, M, N, D, T) where:

E = EmpID
S = SSN
M = Email
N = Name
D = DeptID
T = DeptName (T for "title")

Functional Dependencies:

F1: E → S, M, N, D (EmpID determines all employee info)
F2: S → E (SSN uniquely identifies employee)
F3: M → E (Email uniquely identifies employee)
F4: D → T (DeptID determines DeptName)

Step 1: Categorize Attributes

Attribute	Left side	Right side	Category
E	F1	F2, F3	B (both)
S	F2	F1	B (both)
M	F3	F1	B (both)
N	—	F1	R (right-only)
D	F4	F1	B (both)
T	—	F4	R (right-only)

Categories:

L = ∅ (no left-only attributes!)
R = {N, T}
B = {E, S, M, D}
N = ∅

Must = L ∪ N = ∅ (empty!)

Step 2: Test Must = ∅

Clearly, ∅⁺ = ∅ ≠ R. We need to find minimal subsets of B = {E, S, M, D} that determine everything.

Step 3: Level-wise Search

Size 1 subsets:

{E}⁺: E → S,M,N,D gives {E,S,M,N,D}, D → T gives {E,S,M,N,D,T} = R ✓ CANDIDATE KEY
{S}⁺: S → E gives {S,E}, then E → S,M,N,D already have S,E so add M,N,D, D → T gives R ✓ CANDIDATE KEY
{M}⁺: M → E gives {M,E}, similar chain gives R ✓ CANDIDATE KEY
{D}⁺: D → T gives {D,T} ≠ R ✗ (not a superkey)

All size-1 keys found: {E}, {S}, {M}

Candidate Key Analysis
Candidate	Closure	Is Superkey?	Is Minimal?
{E}	{E,S,M,N,D,T} = R	✓ Yes	✓ Yes
{S}	{S,E,M,N,D,T} = R	✓ Yes	✓ Yes
{M}	{M,E,S,N,D,T} = R	✓ Yes	✓ Yes
{D}	{D,T}	✗ No	—
{E,S}	R	✓ Yes	✗ No (E alone works)
{E,D}	R	✓ Yes	✗ No (E alone works)

Three Candidate Keys

Candidate Keys: {EmpID}, {SSN}, {Email}

This reflects the business reality: an employee can be uniquely identified by their employee ID, social security number, OR email address. All three are valid unique identifiers, so all three are candidate keys.

Prime attributes: E, S, M (each belongs to some candidate key) Non-prime attributes: N, D, T (none belongs to any candidate key)

Handling Exponential Cases

In the worst case, the number of candidate keys can be exponential in the number of attributes. This section explains when this happens and strategies to handle it.

When Do We Get Exponentially Many Keys?

The classic example is a relation with pairwise equivalent attributes:

R(A₁, A₂, ..., Aₙ)
FDs: A₁ → A₂, A₂ → A₁,    (A₁ ↔ A₂)
     A₃ → A₄, A₄ → A₃,    (A₃ ↔ A₄)
     ...

If we have n/2 pairs of equivalent attributes, the number of candidate keys is 2^(n/2)—each pair contributes a binary choice.

Exponential Keys ExampleR(A, B, C, D) with F = {A ↔ B, C ↔ D}

Input

F = {A → B, B → A, C → D, D → C}

Output

4 Candidate Keys: {A,C}, {A,D}, {B,C}, {B,D}

Explanation

Analysis: • L = ∅ (all attrs on both sides due to bidirectional FDs) • R = ∅ • B = {A, B, C, D} • Must = ∅

Key insight: (A,B) are equivalent—choose one. (C,D) are equivalent—choose one. 2 choices × 2 choices = 4 candidate keys.

With n/2 such pairs, we'd have 2^(n/2) candidate keys.

Verification: • {A,C}⁺: A→B, C→D → {A,B,C,D} ✓ • {A,D}⁺: A→B, D→C → {A,B,C,D} ✓ • {B,C}⁺: B→A, C→D → {A,B,C,D} ✓ • {B,D}⁺: B→A, D→C → {A,B,C,D} ✓

Strategies for Large Key Spaces:

Recognize equivalence classes: When attributes are mutually dependent (A ↔ B), they form an equivalence class. Within each class, pick any representative. This reduces the effective attribute count.
Early termination: For some applications, finding ANY candidate key suffices. Stop after finding the first.
Parallel search: The search through subsets of B can be parallelized since different subsets are independent.
Caching closures: Memoize closure computations to avoid redundant work when testing overlapping subsets.
SAT-based approaches: For very complex FD sets, encode the key-finding problem as a Boolean satisfiability problem and use SAT solvers.

Practical Reality

In real database schemas, exponentially many candidate keys are rare. Most relations have 1-3 candidate keys, arising from natural unique identifiers (ID, SSN, Email, etc.). The exponential case is mainly of theoretical interest.

Minimality Verification

After finding a superkey, we must verify it's minimal (a candidate key). This section details the verification process.

Algorithm: VERIFY-CANDIDATE-KEY(K, R, F)

Input:  K — proposed candidate key
        R — all attributes
        F — functional dependencies
Output: true if K is a candidate key, false otherwise

1. If K⁺ ≠ R:
      Return false   { Not even a superkey }

2. For each attribute A ∈ K:
      If (K - {A})⁺ = R:
          Return false   { K - {A} is also a superkey, so K not minimal }

3. Return true   { K is minimal superkey = candidate key }

Complexity: O(|K|) closure computations, each O(|F| × |R|²), giving O(|K| × |F| × |R|²) total.

Minimality Verification ExampleR(A,B,C,D) with F = {A → B, C → D}. Is {A, B, C} a candidate key?

Input

Test K = {A, B, C}

Output

Not a candidate key (not minimal)

Explanation

Step 1: Check if K is a superkey • {A,B,C}⁺: A→B (have B), C→D → {A,B,C,D} = R ✓ • Yes, {A,B,C} is a superkey.

Step 2: Check minimality by removing each attribute

• Remove A: {B,C}⁺ = {B,C,D} ≠ R ✓ (A is essential) • Remove B: {A,C}⁺ = {A,B,C,D} = R ✗ (B not essential!) • (Can stop here)

Conclusion: {A,B,C} is NOT a candidate key because {A,C} is also a superkey. The actual candidate key is {A,C}.

minimality_check
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
def is_candidate_key(K: set, R: set, F: list) -> bool:
    """
    Verify if K is a candidate key (minimal superkey).
    
    Args:
        K: Proposed candidate key
        R: All attributes
        F: Functional dependencies
    
    Returns:
        True if K is a candidate key, False otherwise
    """
    # Step 1: Must be a superkey
    if compute_closure(K, F) != R:
        return False
    
    # Step 2: Check minimality
    for attr in K:
        K_minus_attr = K - {attr}
        if compute_closure(K_minus_attr, F) == R:
            # Found a smaller superkey!
            return False
    
    return True
 
 
def find_minimal_superkey(K: set, R: set, F: list) -> set:
    """
    Given a superkey K, find a minimal subset that's still a superkey.
    Useful for reducing a known superkey to a candidate key.
    """
    if compute_closure(K, F) != R:
        raise ValueError("K is not a superkey")
    
    minimal = set(K)
    for attr in list(K):  # Iterate over copy since we modify
        if attr in minimal:
            test = minimal - {attr}
            if compute_closure(test, F) == R:
                minimal = test
    
    return minimal
 
 
# Example
R = {'A', 'B', 'C', 'D'}
F = [({'A'}, {'B'}), ({'C'}, {'D'})]
 
K = {'A', 'B', 'C'}
print(f"{K} is candidate key: {is_candidate_key(K, R, F)}")  # False
 
minimal = find_minimal_superkey(K, R, F)
print(f"Minimal superkey from {K}: {minimal}")  # {A, C}
print(f"{minimal} is candidate key: {is_candidate_key(minimal, R, F)}")  # True

Complete Production-Ready Algorithm

Here's a comprehensive, production-ready implementation that combines all techniques:

Attribute categorization for efficiency
Level-wise search with pruning
Minimality verification
Comprehensive reporting

production_key_finder
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
from itertools import combinations
from dataclasses import dataclass
from typing import Set, List, Tuple, FrozenSet
 
@dataclass
class KeyAnalysisResult:
    """Complete analysis of a relation's key structure."""
    candidate_keys: List[FrozenSet[str]]
    prime_attributes: Set[str]
    non_prime_attributes: Set[str]
    must_include: Set[str]  # L ∪ N
    never_include: Set[str]  # R
    optional: Set[str]  # B
    
    def summary(self) -> str:
        lines = [
            "=== Key Analysis Result ===",
            f"Candidate Keys: {len(self.candidate_keys)}",
        ]
        for i, key in enumerate(self.candidate_keys, 1):
            lines.append(f"  {i}. {set(key)}")
        lines.extend([
            f"Prime Attributes: {self.prime_attributes}",
            f"Non-Prime Attributes: {self.non_prime_attributes}",
            f"Must be in every key: {self.must_include}",
            f"Never in any key: {self.never_include}",
        ])
        return "\n".join(lines)
 
 
def analyze_keys(
    R: Set[str], 
    F: List[Tuple[Set[str], Set[str]]]
) -> KeyAnalysisResult:
    """
    Complete key analysis: find all candidate keys and classify attributes.
    """
    # Categorize attributes
    left_attrs = set()
    right_attrs = set()
    for det, dep in F:
        left_attrs |= det
        right_attrs |= dep
    
    L = left_attrs - right_attrs
    R_only = right_attrs - left_attrs
    B = left_attrs & right_attrs
    N = R - (left_attrs | right_attrs)
    
    Must = L | N
    
    # Find candidate keys
    candidate_keys = _find_keys(R, F, Must, B)
    
    # Compute prime/non-prime attributes
    prime = set()
    for key in candidate_keys:
        prime |= key
    non_prime = R - prime
    
    return KeyAnalysisResult(
        candidate_keys=candidate_keys,
        prime_attributes=prime,
        non_prime_attributes=non_prime,
        must_include=Must,
        never_include=R_only,
        optional=B
    )
 
 
def _find_keys(
    R: Set[str], 
    F: List[Tuple[Set[str], Set[str]]],
    Must: Set[str],
    May: Set[str]
) -> List[FrozenSet[str]]:
    """Internal: find all candidate keys given Must and May sets."""
    
    # Check if Must alone works
    if compute_closure(Must, F) == R:
        return [frozenset(Must)]
    
    candidate_keys = []
    May_list = sorted(May)
    
    # Level-wise search
    for size in range(1, len(May) + 1):
        for subset in combinations(May_list, size):
            candidate = Must | set(subset)
            candidate_frozen = frozenset(candidate)
            
            # Prune: skip supersets of found keys
            if any(ck < candidate_frozen for ck in candidate_keys):
                continue
            
            if compute_closure(candidate, F) == R:
                candidate_keys.append(candidate_frozen)
    
    return candidate_keys
 
 
def compute_closure(X: Set[str], F: List[Tuple[Set[str], Set[str]]]) -> Set[str]:
    """Compute X⁺ under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for det, dep in F:
            if det <= result and not dep <= result:
                result |= dep
                changed = True
    return result
 
 
# ============ DEMONSTRATION ============
if __name__ == "__main__":
    print("Example 1: Employee Relation")
    R1 = {'E', 'S', 'M', 'N', 'D', 'T'}
    F1 = [
        ({'E'}, {'S', 'M', 'N', 'D'}),
        ({'S'}, {'E'}),
        ({'M'}, {'E'}),
        ({'D'}, {'T'}),
    ]
    result1 = analyze_keys(R1, F1)
    print(result1.summary())
    print()
    
    print("Example 2: Cyclic Dependencies")
    R2 = {'A', 'B', 'C', 'D'}
    F2 = [
        ({'A'}, {'B'}), ({'B'}, {'A'}),
        ({'C'}, {'D'}), ({'D'}, {'C'}),
    ]
    result2 = analyze_keys(R2, F2)
    print(result2.summary())

Common Patterns and Heuristics

Experienced database designers recognize patterns that indicate key structure without full computation. These heuristics speed up analysis:

Pattern 1: Single Source Attribute

If one attribute X is in L (left-only) and no other attribute is in L or N, then every candidate key must contain X. If {X}⁺ = R, it's the only candidate key.

Pattern 2: Equivalent Identifiers

Bidirectional FDs (A → B and B → A) indicate equivalent identifiers. One can substitute for the other in keys, creating multiple candidate keys of the same size.

Pattern 3: Composite to Atomic Chain

When composite attributes (like StudentID, CourseID) start a chain of atomic dependencies, the composite is likely the only candidate key.

Pattern 4: Independent Attribute Groups

If attributes form independent groups (no FDs between groups), each group needs representation in the key. This often happens with denormalized relations.

Quick Key Identification Heuristics
Pattern	Indicator	Likely Key Structure
Single source	One attr in L, others in R or B	Single candidate key = that attr
Multiple IDs	A↔B, both in category B	Two keys: one with A, one with B
Hierarchical	A→B→C→D chain	Key likely at start of chain (A)
Independent parts	No FD connects group X to group Y	Key needs one attr from each group
Everything derived	All attrs in R except one in L	That one L attr is the key

Semantic Understanding

The fastest way to identify keys is often semantic understanding. What uniquely identifies an entity? In an orders table: OrderID. In a line items table: OrderID + LineNumber. In a many-to-many: both FKs together. Formal analysis confirms what business logic suggests.

Summary: Mastering Candidate Key Discovery

Candidate key finding is the practical culmination of attribute closure theory. Let's consolidate everything you've learned in this module:

Module Summary: Attribute Closure

•Closure Definition — X⁺ is all attributes determined by X under F, computed by fixed-point iteration.
•Closure Algorithm — Iteratively add attributes until no FD can add more. O(n²m) naive, O(nm) optimized.
•Finding Keys — Use attribute categorization (L, R, B, N). Must = L ∪ N is in every key; search subsets of B.
•Superkey Structure — Superkeys form an upward-closed set; candidate keys are minimal elements.
•Candidate Keys — Minimal superkeys. Verify minimality by checking that removing any attribute breaks the superkey property.
•Complexity — Can have exponentially many keys in theory; rare in practice (usually 1-3).
•Applications — Primary key selection, normalization analysis, integrity constraints, query optimization.

Looking Ahead:

With mastery of attribute closure and candidate key finding, you're prepared for the advanced topics ahead:

Module 4: Equivalence of FD sets — comparing different dependency representations
Module 5: Canonical cover — finding minimal equivalent FD sets
Module 6: FD and Keys — deeper connections between dependencies and key structure

The closure algorithm you've mastered is the workhorse underlying all these advanced techniques.

Module Complete

Congratulations! You've completed Module 3: Attribute Closure. You can now compute closure, find all candidate keys, analyze superkey structure, and understand the key structure of any relation. These skills are essential for normalization, schema design, and advanced database theory.

5 / 5

Loading learning content...

Database Management SystemsAttribute Closure

Attribute Closure

LevelIntermediate

Duration60 mins

TopicAttribute Closure

5 / 5

Candidate Key Finding

The Ultimate Goal: All Candidate Keys

The Complete Candidate Key Problem:

Given relation R with attributes U and functional dependencies F, find ALL sets K such that:

K⁺ = U (K is a superkey)
For all K' ⊂ K: K'⁺ ≠ U (K is minimal)

What You Will Learn

The Refined Candidate Key Algorithm

Building on the categorization approach, here's a refined algorithm with optimizations for efficiency and completeness.

Algorithm: FIND-ALL-CANDIDATE-KEYS(R, F)

Input:  R — set of all attributes
        F — set of functional dependencies
Output: Set of all candidate keys

1. Categorize attributes:
   - L = attributes appearing only on left side of FDs
   - R = attributes appearing only on right side of FDs
   - B = attributes appearing on both sides
   - N = attributes appearing in neither side

2. Must = L ∪ N    { These MUST be in every candidate key }
   May = B         { These MAY or may not be in candidate keys }
   Never = R       { These are NEVER in any candidate key }

3. If closure(Must) = R:
      Return { Must }   { Only one candidate key }

4. CandidateKeys = ∅
   Compute candidate keys containing Must and subsets of May
   
5. For each minimal set S ⊆ May such that closure(Must ∪ S) = R:
      Add (Must ∪ S) to CandidateKeys

6. Return CandidateKeys

Optimization: Level-wise search with pruning

In step 5, search subsets of May by increasing size. When you find a key K, prune all its supersets (they can't be minimal).

complete_candidate_key_finder
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
from itertools import combinations
from typing import Set, List, Tuple, FrozenSet
 
def find_all_candidate_keys(
    R: Set[str], 
    F: List[Tuple[Set[str], Set[str]]]
) -> List[FrozenSet[str]]:
    """
    Find all candidate keys of relation R under FDs F.
    
    Returns list of candidate keys (each as frozenset for immutability).
    """
    # Step 1: Categorize attributes
    left_attrs = set()
    right_attrs = set()
    for det, dep in F:
        left_attrs |= det
        right_attrs |= dep
    
    L = left_attrs - right_attrs       # Left-only: MUST be in every key
    R_only = right_attrs - left_attrs  # Right-only: NEVER in any key
    B = left_attrs & right_attrs       # Both: MAY be in key
    N = R - (left_attrs | right_attrs) # Neither: MUST be in every key
    
    # Step 2: Compute mandatory base
    Must = L | N
    May = B
    
    # Step 3: Check if Must alone is sufficient
    if compute_closure(Must, F) == R:
        return [frozenset(Must)]
    
    # Step 4 & 5: Search for minimal supersets of Must
    candidate_keys = []
    May_list = sorted(May)  # Sort for deterministic order
    
    # Level-wise search: smallest subsets first
    for size in range(1, len(May) + 1):
        for subset in combinations(May_list, size):
            candidate = Must | set(subset)
            
            # Pruning: skip if superset of known candidate key
            candidate_frozen = frozenset(candidate)
            is_superset = any(
                ck < candidate_frozen  # strict subset
                for ck in candidate_keys
            )
            if is_superset:
                continue
            
            # Check if this is a superkey
            if compute_closure(candidate, F) == R:
                candidate_keys.append(candidate_frozen)
    
    return candidate_keys
 
 
def compute_closure(X: Set[str], F: List[Tuple[Set[str], Set[str]]]) -> Set[str]:
    """Compute attribute closure X⁺ under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for det, dep in F:
            if det <= result and not dep <= result:
                result |= dep
                changed = True
    return result
 
 
# ============ EXAMPLE USAGE ============
if __name__ == "__main__":
    # Example: R(A,B,C,D,E) with cyclic dependencies
    R = {'A', 'B', 'C', 'D', 'E'}
    F = [
        ({'A', 'B'}, {'C'}),   # AB → C
        ({'C'}, {'D'}),         # C → D
        ({'D'}, {'E'}),         # D → E
        ({'E'}, {'A'}),         # E → A  (creates cycle)
    ]
    
    keys = find_all_candidate_keys(R, F)
    print("Candidate Keys:")
    for k in keys:
        print(f"  {set(k)}")
    
    # Analysis:
    # L = {B} (B only on left)
    # R_only = {} (all right attrs also on left somewhere)
    # B = {A, C, D, E} (all appear on both sides)
    # N = {} 
    # Must = {B}
    #
    # {B}⁺ = {B} ≠ R, so we need to add from {A,C,D,E}
    # Due to cycle E→A, AB→C, C→D, D→E:
    # {B,A}⁺ = {A,B,C,D,E} ✓
    # {B,C}⁺ = {B,C,D,E,A} ✓
    # {B,D}⁺ = {B,D,E,A,C} ✓ (wait, need to verify)
    # {B,E}⁺ = {B,E,A,C,D} ✓
    #
    # All four are candidate keys: {AB}, {BC}, {BD}, {BE}

Complex Multi-Key Example

Let's work through a challenging example that demonstrates multiple interacting candidate keys.

Problem:

Relation: Student_Enrollment(StudentID, CourseID, Semester, Grade, InstructorID, InstructorName)

Simplified as: R(S, C, E, G, I, N) where:

S = StudentID
C = CourseID
E = Semester
G = Grade
I = InstructorID
N = InstructorName

Functional Dependencies:

F1: S, C, E → G (student's grade in a course-semester is unique)
F2: C, E → I (each course-semester has one instructor)
F3: I → N (instructor has one name)

Step 1: Categorize Attributes

Attribute	Left side of	Right side of	Category
S	F1	—	L (left-only)
C	F1, F2	—	L (left-only)
E	F1, F2	—	L (left-only)
G	—	F1	R (right-only)
I	F3	F2	B (both)
N	—	F3	R (right-only)

Categories:

L = {S, C, E}
R = {G, N}
B = {I}
N = ∅

Must = L ∪ N = {S, C, E}

Step 2: Test Must Alone

Compute {S, C, E}⁺:

Start: {S, C, E}
F1 (S,C,E → G): S,C,E ⊆ {S,C,E}? YES → add G → {S, C, E, G}
F2 (C,E → I): C,E ⊆ {S,C,E,G}? YES → add I → {S, C, E, G, I}
F3 (I → N): I ⊆ {S,C,E,G,I}? YES → add N → {S, C, E, G, I, N}

Result: {S, C, E}⁺ = {S, C, E, G, I, N} = R ✓

Single Candidate Key Found

Since Must alone determines all attributes, {S, C, E} = {StudentID, CourseID, Semester} is the ONLY candidate key.

Example with Multiple Candidate Keys

Now let's examine a relation with multiple candidate keys stemming from equivalent identifiers.

Problem:

Relation: Employee(EmpID, SSN, Email, Name, DeptID, DeptName)

Simplified as: R(E, S, M, N, D, T) where:

E = EmpID
S = SSN
M = Email
N = Name
D = DeptID
T = DeptName (T for "title")

Functional Dependencies:

F1: E → S, M, N, D (EmpID determines all employee info)
F2: S → E (SSN uniquely identifies employee)
F3: M → E (Email uniquely identifies employee)
F4: D → T (DeptID determines DeptName)

Step 1: Categorize Attributes

Attribute	Left side	Right side	Category
E	F1	F2, F3	B (both)
S	F2	F1	B (both)
M	F3	F1	B (both)
N	—	F1	R (right-only)
D	F4	F1	B (both)
T	—	F4	R (right-only)

Categories:

L = ∅ (no left-only attributes!)
R = {N, T}
B = {E, S, M, D}
N = ∅

Must = L ∪ N = ∅ (empty!)

Step 2: Test Must = ∅

Clearly, ∅⁺ = ∅ ≠ R. We need to find minimal subsets of B = {E, S, M, D} that determine everything.

Step 3: Level-wise Search

Size 1 subsets:

{E}⁺: E → S,M,N,D gives {E,S,M,N,D}, D → T gives {E,S,M,N,D,T} = R ✓ CANDIDATE KEY
{S}⁺: S → E gives {S,E}, then E → S,M,N,D already have S,E so add M,N,D, D → T gives R ✓ CANDIDATE KEY
{M}⁺: M → E gives {M,E}, similar chain gives R ✓ CANDIDATE KEY
{D}⁺: D → T gives {D,T} ≠ R ✗ (not a superkey)

All size-1 keys found: {E}, {S}, {M}

Candidate Key Analysis
Candidate	Closure	Is Superkey?	Is Minimal?
{E}	{E,S,M,N,D,T} = R	✓ Yes	✓ Yes
{S}	{S,E,M,N,D,T} = R	✓ Yes	✓ Yes
{M}	{M,E,S,N,D,T} = R	✓ Yes	✓ Yes
{D}	{D,T}	✗ No	—
{E,S}	R	✓ Yes	✗ No (E alone works)
{E,D}	R	✓ Yes	✗ No (E alone works)

Three Candidate Keys

Candidate Keys: {EmpID}, {SSN}, {Email}

Prime attributes: E, S, M (each belongs to some candidate key) Non-prime attributes: N, D, T (none belongs to any candidate key)

Handling Exponential Cases

In the worst case, the number of candidate keys can be exponential in the number of attributes. This section explains when this happens and strategies to handle it.

When Do We Get Exponentially Many Keys?

The classic example is a relation with pairwise equivalent attributes:

R(A₁, A₂, ..., Aₙ)
FDs: A₁ → A₂, A₂ → A₁,    (A₁ ↔ A₂)
     A₃ → A₄, A₄ → A₃,    (A₃ ↔ A₄)
     ...

If we have n/2 pairs of equivalent attributes, the number of candidate keys is 2^(n/2)—each pair contributes a binary choice.

Exponential Keys ExampleR(A, B, C, D) with F = {A ↔ B, C ↔ D}

Input

F = {A → B, B → A, C → D, D → C}

Output

4 Candidate Keys: {A,C}, {A,D}, {B,C}, {B,D}

Explanation

Analysis: • L = ∅ (all attrs on both sides due to bidirectional FDs) • R = ∅ • B = {A, B, C, D} • Must = ∅

Key insight: (A,B) are equivalent—choose one. (C,D) are equivalent—choose one. 2 choices × 2 choices = 4 candidate keys.

With n/2 such pairs, we'd have 2^(n/2) candidate keys.

Verification: • {A,C}⁺: A→B, C→D → {A,B,C,D} ✓ • {A,D}⁺: A→B, D→C → {A,B,C,D} ✓ • {B,C}⁺: B→A, C→D → {A,B,C,D} ✓ • {B,D}⁺: B→A, D→C → {A,B,C,D} ✓

Strategies for Large Key Spaces:

Recognize equivalence classes: When attributes are mutually dependent (A ↔ B), they form an equivalence class. Within each class, pick any representative. This reduces the effective attribute count.
Early termination: For some applications, finding ANY candidate key suffices. Stop after finding the first.
Parallel search: The search through subsets of B can be parallelized since different subsets are independent.
Caching closures: Memoize closure computations to avoid redundant work when testing overlapping subsets.
SAT-based approaches: For very complex FD sets, encode the key-finding problem as a Boolean satisfiability problem and use SAT solvers.

Practical Reality

Minimality Verification

After finding a superkey, we must verify it's minimal (a candidate key). This section details the verification process.

Algorithm: VERIFY-CANDIDATE-KEY(K, R, F)

Input:  K — proposed candidate key
        R — all attributes
        F — functional dependencies
Output: true if K is a candidate key, false otherwise

1. If K⁺ ≠ R:
      Return false   { Not even a superkey }

2. For each attribute A ∈ K:
      If (K - {A})⁺ = R:
          Return false   { K - {A} is also a superkey, so K not minimal }

3. Return true   { K is minimal superkey = candidate key }

Complexity: O(|K|) closure computations, each O(|F| × |R|²), giving O(|K| × |F| × |R|²) total.

Minimality Verification ExampleR(A,B,C,D) with F = {A → B, C → D}. Is {A, B, C} a candidate key?

Input

Test K = {A, B, C}

Output

Not a candidate key (not minimal)

Explanation

Step 1: Check if K is a superkey • {A,B,C}⁺: A→B (have B), C→D → {A,B,C,D} = R ✓ • Yes, {A,B,C} is a superkey.

Step 2: Check minimality by removing each attribute

• Remove A: {B,C}⁺ = {B,C,D} ≠ R ✓ (A is essential) • Remove B: {A,C}⁺ = {A,B,C,D} = R ✗ (B not essential!) • (Can stop here)

Conclusion: {A,B,C} is NOT a candidate key because {A,C} is also a superkey. The actual candidate key is {A,C}.

minimality_check
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
def is_candidate_key(K: set, R: set, F: list) -> bool:
    """
    Verify if K is a candidate key (minimal superkey).
    
    Args:
        K: Proposed candidate key
        R: All attributes
        F: Functional dependencies
    
    Returns:
        True if K is a candidate key, False otherwise
    """
    # Step 1: Must be a superkey
    if compute_closure(K, F) != R:
        return False
    
    # Step 2: Check minimality
    for attr in K:
        K_minus_attr = K - {attr}
        if compute_closure(K_minus_attr, F) == R:
            # Found a smaller superkey!
            return False
    
    return True
 
 
def find_minimal_superkey(K: set, R: set, F: list) -> set:
    """
    Given a superkey K, find a minimal subset that's still a superkey.
    Useful for reducing a known superkey to a candidate key.
    """
    if compute_closure(K, F) != R:
        raise ValueError("K is not a superkey")
    
    minimal = set(K)
    for attr in list(K):  # Iterate over copy since we modify
        if attr in minimal:
            test = minimal - {attr}
            if compute_closure(test, F) == R:
                minimal = test
    
    return minimal
 
 
# Example
R = {'A', 'B', 'C', 'D'}
F = [({'A'}, {'B'}), ({'C'}, {'D'})]
 
K = {'A', 'B', 'C'}
print(f"{K} is candidate key: {is_candidate_key(K, R, F)}")  # False
 
minimal = find_minimal_superkey(K, R, F)
print(f"Minimal superkey from {K}: {minimal}")  # {A, C}
print(f"{minimal} is candidate key: {is_candidate_key(minimal, R, F)}")  # True

Complete Production-Ready Algorithm

Here's a comprehensive, production-ready implementation that combines all techniques:

Attribute categorization for efficiency
Level-wise search with pruning
Minimality verification
Comprehensive reporting

production_key_finder
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
from itertools import combinations
from dataclasses import dataclass
from typing import Set, List, Tuple, FrozenSet
 
@dataclass
class KeyAnalysisResult:
    """Complete analysis of a relation's key structure."""
    candidate_keys: List[FrozenSet[str]]
    prime_attributes: Set[str]
    non_prime_attributes: Set[str]
    must_include: Set[str]  # L ∪ N
    never_include: Set[str]  # R
    optional: Set[str]  # B
    
    def summary(self) -> str:
        lines = [
            "=== Key Analysis Result ===",
            f"Candidate Keys: {len(self.candidate_keys)}",
        ]
        for i, key in enumerate(self.candidate_keys, 1):
            lines.append(f"  {i}. {set(key)}")
        lines.extend([
            f"Prime Attributes: {self.prime_attributes}",
            f"Non-Prime Attributes: {self.non_prime_attributes}",
            f"Must be in every key: {self.must_include}",
            f"Never in any key: {self.never_include}",
        ])
        return "\n".join(lines)
 
 
def analyze_keys(
    R: Set[str], 
    F: List[Tuple[Set[str], Set[str]]]
) -> KeyAnalysisResult:
    """
    Complete key analysis: find all candidate keys and classify attributes.
    """
    # Categorize attributes
    left_attrs = set()
    right_attrs = set()
    for det, dep in F:
        left_attrs |= det
        right_attrs |= dep
    
    L = left_attrs - right_attrs
    R_only = right_attrs - left_attrs
    B = left_attrs & right_attrs
    N = R - (left_attrs | right_attrs)
    
    Must = L | N
    
    # Find candidate keys
    candidate_keys = _find_keys(R, F, Must, B)
    
    # Compute prime/non-prime attributes
    prime = set()
    for key in candidate_keys:
        prime |= key
    non_prime = R - prime
    
    return KeyAnalysisResult(
        candidate_keys=candidate_keys,
        prime_attributes=prime,
        non_prime_attributes=non_prime,
        must_include=Must,
        never_include=R_only,
        optional=B
    )
 
 
def _find_keys(
    R: Set[str], 
    F: List[Tuple[Set[str], Set[str]]],
    Must: Set[str],
    May: Set[str]
) -> List[FrozenSet[str]]:
    """Internal: find all candidate keys given Must and May sets."""
    
    # Check if Must alone works
    if compute_closure(Must, F) == R:
        return [frozenset(Must)]
    
    candidate_keys = []
    May_list = sorted(May)
    
    # Level-wise search
    for size in range(1, len(May) + 1):
        for subset in combinations(May_list, size):
            candidate = Must | set(subset)
            candidate_frozen = frozenset(candidate)
            
            # Prune: skip supersets of found keys
            if any(ck < candidate_frozen for ck in candidate_keys):
                continue
            
            if compute_closure(candidate, F) == R:
                candidate_keys.append(candidate_frozen)
    
    return candidate_keys
 
 
def compute_closure(X: Set[str], F: List[Tuple[Set[str], Set[str]]]) -> Set[str]:
    """Compute X⁺ under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for det, dep in F:
            if det <= result and not dep <= result:
                result |= dep
                changed = True
    return result
 
 
# ============ DEMONSTRATION ============
if __name__ == "__main__":
    print("Example 1: Employee Relation")
    R1 = {'E', 'S', 'M', 'N', 'D', 'T'}
    F1 = [
        ({'E'}, {'S', 'M', 'N', 'D'}),
        ({'S'}, {'E'}),
        ({'M'}, {'E'}),
        ({'D'}, {'T'}),
    ]
    result1 = analyze_keys(R1, F1)
    print(result1.summary())
    print()
    
    print("Example 2: Cyclic Dependencies")
    R2 = {'A', 'B', 'C', 'D'}
    F2 = [
        ({'A'}, {'B'}), ({'B'}, {'A'}),
        ({'C'}, {'D'}), ({'D'}, {'C'}),
    ]
    result2 = analyze_keys(R2, F2)
    print(result2.summary())

Common Patterns and Heuristics

Experienced database designers recognize patterns that indicate key structure without full computation. These heuristics speed up analysis:

Pattern 1: Single Source Attribute

If one attribute X is in L (left-only) and no other attribute is in L or N, then every candidate key must contain X. If {X}⁺ = R, it's the only candidate key.

Pattern 2: Equivalent Identifiers

Bidirectional FDs (A → B and B → A) indicate equivalent identifiers. One can substitute for the other in keys, creating multiple candidate keys of the same size.

Pattern 3: Composite to Atomic Chain

When composite attributes (like StudentID, CourseID) start a chain of atomic dependencies, the composite is likely the only candidate key.

Pattern 4: Independent Attribute Groups

If attributes form independent groups (no FDs between groups), each group needs representation in the key. This often happens with denormalized relations.

Quick Key Identification Heuristics
Pattern	Indicator	Likely Key Structure
Single source	One attr in L, others in R or B	Single candidate key = that attr
Multiple IDs	A↔B, both in category B	Two keys: one with A, one with B
Hierarchical	A→B→C→D chain	Key likely at start of chain (A)
Independent parts	No FD connects group X to group Y	Key needs one attr from each group
Everything derived	All attrs in R except one in L	That one L attr is the key

Semantic Understanding

Summary: Mastering Candidate Key Discovery

Candidate key finding is the practical culmination of attribute closure theory. Let's consolidate everything you've learned in this module:

Module Summary: Attribute Closure

•Closure Definition — X⁺ is all attributes determined by X under F, computed by fixed-point iteration.
•Closure Algorithm — Iteratively add attributes until no FD can add more. O(n²m) naive, O(nm) optimized.
•Finding Keys — Use attribute categorization (L, R, B, N). Must = L ∪ N is in every key; search subsets of B.
•Superkey Structure — Superkeys form an upward-closed set; candidate keys are minimal elements.
•Candidate Keys — Minimal superkeys. Verify minimality by checking that removing any attribute breaks the superkey property.
•Complexity — Can have exponentially many keys in theory; rare in practice (usually 1-3).
•Applications — Primary key selection, normalization analysis, integrity constraints, query optimization.

Looking Ahead:

With mastery of attribute closure and candidate key finding, you're prepared for the advanced topics ahead:

Module 4: Equivalence of FD sets — comparing different dependency representations
Module 5: Canonical cover — finding minimal equivalent FD sets
Module 6: FD and Keys — deeper connections between dependencies and key structure

The closure algorithm you've mastered is the workhorse underlying all these advanced techniques.

Module Complete

5 / 5