Keys - Learning Module | OneNoughtOne

Loading content...

0/241

Candidate Key

From Abundance to Precision

In our exploration of superkeys, we discovered a paradox of abundance: a single relation can have hundreds or even thousands of superkeys. While mathematically interesting, this abundance creates a practical problem—which superkey should we use for identification?

Consider a relation with attributes {A, B, C, D, E} where {A} is a superkey. By the superset property, all of the following are also superkeys:

{A, B}, {A, C}, {A, D}, {A, E}
{A, B, C}, {A, B, D}, {A, B, E}, {A, C, D}, {A, C, E}, {A, D, E}
And so on...

This explosion raises critical questions:

Why use {A, B, C, D, E} to identify a tuple when {A} alone suffices?
What is the 'smallest' or 'most efficient' superkey?
Are there multiple equally minimal options?

The candidate key concept answers these questions by identifying the minimal superkeys—those with no redundant attributes.

What You Will Learn

By the end of this page, you will understand the formal definition of candidate keys and their relationship to superkeys, master algorithms for finding candidate keys from functional dependencies, appreciate why relations may have multiple candidate keys, and develop intuition for identifying candidate keys in practical database design scenarios.

The Minimality Concept

Before defining candidate keys formally, we must understand what 'minimal' means in this context. Minimality isn't about the number of attributes—it's about irreducibility.

A set is minimal with respect to a property if removing any element causes the property to be lost. For superkeys, this means:

Understanding Minimality

A superkey K is minimal if no proper subset of K is also a superkey. Equivalently, removing any single attribute from K destroys its ability to uniquely identify tuples.

Example: Examining Minimality

Consider Student(StudentID, Email, Name, Major) where both {StudentID} and {Email} uniquely identify tuples:

Is {StudentID} minimal?

Check {}: The empty set cannot uniquely identify tuples (all tuples match)
Removing the only attribute destroys superkey status
Therefore, {StudentID} is minimal ✓

Is {StudentID, Email} minimal?

Check {StudentID}: Still a superkey (uniquely identifies tuples)
Therefore, {StudentID, Email} is NOT minimal ✗
We can remove Email and still have a superkey

Is {Name, Major} minimal? (Assuming it's a superkey)

Check {Name}: Cannot uniquely identify (students can share names)
Check {Major}: Cannot uniquely identify (students share majors)
Neither single attribute works, but together they might
If {Name, Major} is indeed a superkey, it would be minimal

Key insight: Minimality is about removing single attributes one at a time, not about having the smallest possible count.

Minimality Clarifications

•Minimal ≠ Smallest possible: A 2-attribute candidate key can exist alongside a 1-attribute candidate key. Both are minimal within their own structure.
•Minimal ≠ Unique: A relation can have multiple minimal superkeys (candidate keys). Each is irreducible, but they may differ in composition.
•Minimal = Irreducible: You cannot shrink a candidate key and maintain the superkey property. Every attribute is necessary.
•Testing requires checking each attribute: To verify minimality, remove each attribute one at a time and test if the result is still a superkey.

Formal Definition of Candidate Key

With the minimality concept clarified, we can precisely define candidate keys:

Definition: Candidate Key

A candidate key of a relation R is a minimal superkey of R. That is, K is a candidate key if: (1) K is a superkey — K → R (K functionally determines all attributes of R), and (2) K is minimal — for no proper subset K' ⊂ K is K' also a superkey.

Mathematical formalization:

Let R be a relation schema with attribute set U. Let F be the set of functional dependencies on R.

A set K ⊆ U is a candidate key of R if:

K⁺ = U (the closure of K under F equals the complete attribute set)
∀A ∈ K: (K - {A})⁺ ≠ U (removing any single attribute causes the closure to shrink below U)

Equivalently, using functional dependencies:

K is a candidate key if:

K → U (K determines everything)
∀A ∈ K: K - {A} ↛ U (K minus any attribute doesn't determine everything)

The two conditions are essential:

Condition	What it ensures	Consequence if violated
Superkey property (K → R)	K can identify tuples	K is useless for identification
Minimality (no proper subset works)	K has no redundancy	K wastes storage and complicates constraints

Why Both Conditions Matter

The superkey property ensures correctness—K actually works for unique identification. The minimality property ensures efficiency—K uses no more attributes than necessary. Together, they define the 'tightest' possible unique identifier.

Finding Candidate Keys

Finding all candidate keys of a relation given its functional dependencies is a fundamental database task. We present a systematic algorithm and then work through detailed examples.

Algorithm: Finding Candidate Keys

Step 1: Identify attributes that appear only on the LEFT side of FDs (must be in every key). Step 2: Identify attributes that appear only on the RIGHT side of FDs (never in any key). Step 3: Identify attributes on both sides (may or may not be in keys). Step 4: Start with left-only attributes, compute closure. If complete, this is a candidate key. Step 5: If incomplete, systematically add combinations of 'both sides' attributes until closure is complete. Step 6: For each candidate found, verify minimality.

Detailed Example:

Consider R(A, B, C, D, E) with functional dependencies:

F = {AB → C, C → D, D → E, B → E}

Step 1: Classify attributes

Attribute	Appears on Left	Appears on Right	Classification
A	Yes (AB→C)	No	Left-only
B	Yes (AB→C, B→E)	No	Left-only
C	Yes (C→D)	Yes (from AB→C)	Both
D	Yes (D→E)	Yes (from C→D)	Both
E	No	Yes (from D→E, B→E)	Right-only

Step 2: Start with left-only attributes

Left-only: {A, B} — these MUST be in every candidate key.

Step 3: Compute closure of {A, B}

Start: {A, B}⁺ = {A, B}
Apply AB → C: {A, B}⁺ = {A, B, C}
Apply C → D: {A, B}⁺ = {A, B, C, D}
Apply D → E: {A, B}⁺ = {A, B, C, D, E}
Apply B → E: Already have E

Closure = {A, B, C, D, E} = R ✓

Step 4: Verify minimality

{A}⁺ = {A} ≠ R — A alone doesn't work
{B}⁺ = {B, E} ≠ R — B alone doesn't work

Both attributes are necessary, so {A, B} is minimal.

Conclusion: The only candidate key is {A, B}

Right-Only Attributes

Attributes that appear only on the right side of functional dependencies can NEVER be part of any candidate key. They are always determined by other attributes and never determine anything themselves. In the example, E is right-only and cannot appear in any candidate key.

Multiple Candidate Keys

A critical principle in relational theory is that relations can have multiple candidate keys. Each represents a valid, minimal way to uniquely identify tuples. The database designer must choose one as the primary key.

Example: Multiple Candidate Keys

Consider Employee(EmployeeID, SSN, Email, Name, Department)

Functional dependencies (from real-world semantics):

EmployeeID → SSN, Email, Name, Department
SSN → EmployeeID, Email, Name, Department
Email → EmployeeID, SSN, Name, Department

Let's verify each single-attribute candidate:

Candidate Key Analysis for Employee
Attribute	Closure	Is Superkey?	Is Minimal?	Candidate Key?
{EmployeeID}	{All attributes}	Yes	Yes (single attribute)	Yes ✓
{SSN}	{All attributes}	Yes	Yes (single attribute)	Yes ✓
{Email}	{All attributes}	Yes	Yes (single attribute)	Yes ✓
{Name}	{Name}	No	N/A	No
{Department}	{Department}	No	N/A	No

This relation has three candidate keys: {EmployeeID}, {SSN}, and {Email}.

Each is:

A superkey (uniquely identifies all tuples)
Minimal (single attribute, can't be reduced)

Why multiple candidate keys exist:

Real-world identity has multiple facets: An employee is uniquely identified by their company ID, government SSN, and work email—each from a different domain of identification.
Historical accumulation: Systems often acquire multiple identification mechanisms over time as requirements evolve.
Integration requirements: When systems merge, each may bring its own identification scheme.

Design implications:

All candidate keys deserve equal theoretical consideration
One will be designated the primary key
Others become alternate keys
All should have uniqueness constraints enforced

The Terminology Hierarchy

Superkey ⊇ Candidate Key ⊇ Primary Key. Every primary key is a candidate key. Every candidate key is a superkey. But the reverse is not true—most superkeys are not candidate keys (they're not minimal), and most candidate keys are not the primary key (only one is chosen).

Composite Candidate Keys

While single-attribute candidate keys are common, many relations require composite candidate keys—keys consisting of multiple attributes. Understanding composite keys is essential for proper database design.

Definition: Composite Key

A composite key (or compound key) is a candidate key consisting of two or more attributes. The individual attributes are called key attributes or prime attributes. None of them alone is sufficient for unique identification—they must work together.

Example: Course Enrollment

Consider Enrollment(StudentID, CourseID, Semester, Grade, EnrollDate)

Business rules:

A student can enroll in a course at most once per semester
The same course runs every semester
Many students take each course

Functional dependencies:

{StudentID, CourseID, Semester} → Grade, EnrollDate

Analysis:

Attribute Combination	Uniquely Identifies?	Why?
{StudentID}	No	Same student in many courses
{CourseID}	No	Many students per course
{Semester}	No	Many enrollments per semester
{StudentID, CourseID}	No	Same student, same course in different semesters
{StudentID, Semester}	No	Student takes multiple courses per semester
{CourseID, Semester}	No	Many students per course per semester
{StudentID, CourseID, Semester}	Yes	At most one enrollment per combination

The candidate key is {StudentID, CourseID, Semester} — a 3-attribute composite key.

Minimality check:

Remove StudentID: {CourseID, Semester} — not unique (multiple students)
Remove CourseID: {StudentID, Semester} — not unique (multiple courses)
Remove Semester: {StudentID, CourseID} — not unique (repeatable enrollment)

All three attributes are necessary — the key is minimal.

Common Composite Key Patterns

•Junction tables (many-to-many): {Entity1_ID, Entity2_ID} — e.g., StudentCourse, BookAuthor, OrderProduct
•Time-series data: {Entity_ID, Timestamp} — e.g., StockPrice, SensorReading, AuditLog
•Multi-tenant systems: {TenantID, LocalID} — isolation between organizations sharing a database
•Versioned records: {Entity_ID, Version} or {Entity_ID, ValidFrom} — tracking historical changes
•Hierarchical namespaces: {Parent_ID, LocalName} — directory structures, organizational hierarchies

Prime and Non-Prime Attributes

The distinction between prime and non-prime attributes is fundamental to normalization theory and database design. These categories determine how functional dependencies should be structured.

Prime Attributes

An attribute is prime (or key attribute) if it is a member of ANY candidate key of the relation. Prime attributes participate in tuple identification.

Non-Prime Attributes

An attribute is non-prime (or non-key attribute) if it is NOT a member of any candidate key. Non-prime attributes are determined by keys but don't participate in identification.

Example: Identifying Prime and Non-Prime Attributes

Consider R(A, B, C, D, E) with candidate keys {A, B} and {A, C}:

Attribute	In Candidate Key {A,B}?	In Candidate Key {A,C}?	Classification
A	Yes	Yes	Prime
B	Yes	No	Prime
C	No	Yes	Prime
D	No	No	Non-Prime
E	No	No	Non-Prime

Notice:

A is in BOTH candidate keys — it's definitely prime
B is only in {A, B} but that's enough — still prime
C is only in {A, C} but that's enough — still prime
D and E are in NO candidate key — non-prime

Why this classification matters:

Second Normal Form (2NF): Requires no partial dependencies on candidate keys. If a non-prime attribute depends on only part of a composite candidate key, 2NF is violated.
Third Normal Form (3NF): Requires no transitive dependencies through non-prime attributes. Non-prime → non-prime dependencies create update anomalies.
Boyce-Codd Normal Form (BCNF): Requires every determinant to be a superkey. The distinction helps identify violations.
Index design: Prime attributes often need indexes for efficient joins and lookups.

Common Misconception

An attribute doesn't need to be in EVERY candidate key to be prime—membership in ANY candidate key suffices. In the example above, B is prime even though it's not in candidate key {A, C}.

Advanced Candidate Key Detection

For complex schemas with many functional dependencies, we need a more sophisticated approach to find all candidate keys. Here's a comprehensive algorithm:

candidate_key_algorithm.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
def find_all_candidate_keys(attributes, fds):
    """
    Find all candidate keys of a relation.
    
    Args:
        attributes: Set of all attributes in the relation
        fds: List of functional dependencies as (lhs, rhs) tuples
    
    Returns:
        List of candidate keys (each key is a frozenset)
    """
    def closure(attr_set, fds):
        """Compute attribute closure under FDs"""
        result = set(attr_set)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs.issubset(result) and not rhs.issubset(result):
                    result.update(rhs)
                    changed = True
        return result
    
    def is_superkey(attr_set, attributes, fds):
        """Check if attr_set is a superkey"""
        return closure(attr_set, fds) == attributes
    
    def is_minimal(attr_set, attributes, fds):
        """Check if superkey is minimal (no proper subset is superkey)"""
        for attr in attr_set:
            subset = attr_set - {attr}
            if is_superkey(subset, attributes, fds):
                return False
        return True
    
    # Classify attributes
    left_only = set()
    right_only = set()
    both_sides = set()
    
    for lhs, rhs in fds:
        left_only.update(lhs)
        right_only.update(rhs)
    
    neither = attributes - left_only - right_only
    both_sides = left_only & right_only
    left_only = left_only - both_sides
    right_only = right_only - both_sides
    
    # Start with left-only and neither (must be in all keys)
    must_have = left_only | neither
    
    # Candidates to try: add subsets of 'both_sides'
    from itertools import combinations
    
    candidate_keys = []
    maybe_attrs = list(both_sides)
    
    for size in range(len(maybe_attrs) + 1):
        for combo in combinations(maybe_attrs, size):
            candidate = must_have | set(combo)
            if is_superkey(candidate, attributes, fds):
                if is_minimal(candidate, attributes, fds):
                    # Check not superset of existing key
                    if not any(ck < candidate for ck in candidate_keys):
                        candidate_keys.append(frozenset(candidate))
    
    return candidate_keys

Algorithm walkthrough with complex example:

Consider R(A, B, C, D, E, F) with FDs:

AB → C
C → D
D → E
CF → B
E → A

Step 1: Classify attributes

Left-only: {F} (only on left in CF→B)
Right-only: none (all left-side attrs also appear on right somewhere... need careful analysis)
Actually: A appears right (E→A), B appears right (CF→B), C appears right (AB→C), D appears right (C→D), E appears right (D→E), F never on right
So F is left-only, must be in every candidate key

Step 2: Try combinations with F

{F}⁺ = {F} — not a superkey
{A, F}⁺ = {A, F} — not a superkey
{B, F}⁺ = {B, F} — not a superkey
{C, F}⁺ = {C, F, D, E, A, B} = all! — superkey! ✓
- Check minimality: {C}⁺ = {C, D, E, A} ≠ R, {F}⁺ = {F} ≠ R
- {C, F} is a candidate key ✓
{A, B, F}⁺ = {A, B, F, C, D, E} = all — superkey
- Check: {A, B}⁺ = {A, B, C, D, E} ≠ R (no F)
- Check: {A, F}⁺ = {A, F} ≠ R
- Check: {B, F}⁺ = {B, F} ≠ R
- {A, B, F} is a candidate key ✓

Candidate keys: {C, F} and {A, B, F}

Summary: Candidate Key Mastery

We've explored candidate keys in depth—the minimal superkeys that form the foundation for primary key selection. Let's consolidate the key insights:

Key Takeaways

•Candidate keys are minimal superkeys — They uniquely identify tuples using no more attributes than necessary.
•Minimality means irreducibility — Removing any single attribute from a candidate key destroys its superkey property.
•Relations can have multiple candidate keys — Each represents a valid identification mechanism; one becomes the primary key.
•Composite keys use multiple attributes — Common in junction tables, time-series data, and hierarchical structures.
•Prime/non-prime classification matters — It's fundamental to normalization theory and determines valid FD structures.
•Systematic algorithms exist — Attribute classification and closure computation enable finding all candidate keys.

What's next:

Now that we understand candidate keys, we'll explore primary keys—the specific candidate key chosen to serve as the main identification mechanism for a relation. We'll examine selection criteria, implementation considerations, and the critical properties that primary keys must maintain.

Page Complete

You now understand candidate keys as minimal superkeys. You can formally define and identify candidate keys, apply algorithms to find all candidate keys from functional dependencies, recognize composite keys and their applications, and classify attributes as prime or non-prime. Next, we'll explore primary key selection and its implications for database design.

Candidate Key

From Abundance to Precision

Consider a relation with attributes {A, B, C, D, E} where {A} is a superkey. By the superset property, all of the following are also superkeys:

{A, B}, {A, C}, {A, D}, {A, E}
{A, B, C}, {A, B, D}, {A, B, E}, {A, C, D}, {A, C, E}, {A, D, E}
And so on...

This explosion raises critical questions:

Why use {A, B, C, D, E} to identify a tuple when {A} alone suffices?
What is the 'smallest' or 'most efficient' superkey?
Are there multiple equally minimal options?

The candidate key concept answers these questions by identifying the minimal superkeys—those with no redundant attributes.

What You Will Learn

The Minimality Concept

Before defining candidate keys formally, we must understand what 'minimal' means in this context. Minimality isn't about the number of attributes—it's about irreducibility.

A set is minimal with respect to a property if removing any element causes the property to be lost. For superkeys, this means:

Understanding Minimality

A superkey K is minimal if no proper subset of K is also a superkey. Equivalently, removing any single attribute from K destroys its ability to uniquely identify tuples.

Example: Examining Minimality

Consider Student(StudentID, Email, Name, Major) where both {StudentID} and {Email} uniquely identify tuples:

Is {StudentID} minimal?

Check {}: The empty set cannot uniquely identify tuples (all tuples match)
Removing the only attribute destroys superkey status
Therefore, {StudentID} is minimal ✓

Is {StudentID, Email} minimal?

Check {StudentID}: Still a superkey (uniquely identifies tuples)
Therefore, {StudentID, Email} is NOT minimal ✗
We can remove Email and still have a superkey

Is {Name, Major} minimal? (Assuming it's a superkey)

Check {Name}: Cannot uniquely identify (students can share names)
Check {Major}: Cannot uniquely identify (students share majors)
Neither single attribute works, but together they might
If {Name, Major} is indeed a superkey, it would be minimal

Key insight: Minimality is about removing single attributes one at a time, not about having the smallest possible count.

Minimality Clarifications

•Minimal ≠ Smallest possible: A 2-attribute candidate key can exist alongside a 1-attribute candidate key. Both are minimal within their own structure.
•Minimal ≠ Unique: A relation can have multiple minimal superkeys (candidate keys). Each is irreducible, but they may differ in composition.
•Minimal = Irreducible: You cannot shrink a candidate key and maintain the superkey property. Every attribute is necessary.
•Testing requires checking each attribute: To verify minimality, remove each attribute one at a time and test if the result is still a superkey.

Formal Definition of Candidate Key

With the minimality concept clarified, we can precisely define candidate keys:

Definition: Candidate Key

Mathematical formalization:

Let R be a relation schema with attribute set U. Let F be the set of functional dependencies on R.

A set K ⊆ U is a candidate key of R if:

K⁺ = U (the closure of K under F equals the complete attribute set)
∀A ∈ K: (K - {A})⁺ ≠ U (removing any single attribute causes the closure to shrink below U)

Equivalently, using functional dependencies:

K is a candidate key if:

K → U (K determines everything)
∀A ∈ K: K - {A} ↛ U (K minus any attribute doesn't determine everything)

The two conditions are essential:

Condition	What it ensures	Consequence if violated
Superkey property (K → R)	K can identify tuples	K is useless for identification
Minimality (no proper subset works)	K has no redundancy	K wastes storage and complicates constraints

Why Both Conditions Matter

Finding Candidate Keys

Finding all candidate keys of a relation given its functional dependencies is a fundamental database task. We present a systematic algorithm and then work through detailed examples.

Algorithm: Finding Candidate Keys

Detailed Example:

Consider R(A, B, C, D, E) with functional dependencies:

F = {AB → C, C → D, D → E, B → E}

Step 1: Classify attributes

Attribute	Appears on Left	Appears on Right	Classification
A	Yes (AB→C)	No	Left-only
B	Yes (AB→C, B→E)	No	Left-only
C	Yes (C→D)	Yes (from AB→C)	Both
D	Yes (D→E)	Yes (from C→D)	Both
E	No	Yes (from D→E, B→E)	Right-only

Step 2: Start with left-only attributes

Left-only: {A, B} — these MUST be in every candidate key.

Step 3: Compute closure of {A, B}

Start: {A, B}⁺ = {A, B}
Apply AB → C: {A, B}⁺ = {A, B, C}
Apply C → D: {A, B}⁺ = {A, B, C, D}
Apply D → E: {A, B}⁺ = {A, B, C, D, E}
Apply B → E: Already have E

Closure = {A, B, C, D, E} = R ✓

Step 4: Verify minimality

{A}⁺ = {A} ≠ R — A alone doesn't work
{B}⁺ = {B, E} ≠ R — B alone doesn't work

Both attributes are necessary, so {A, B} is minimal.

Conclusion: The only candidate key is {A, B}

Right-Only Attributes

Multiple Candidate Keys

Example: Multiple Candidate Keys

Consider Employee(EmployeeID, SSN, Email, Name, Department)

Functional dependencies (from real-world semantics):

EmployeeID → SSN, Email, Name, Department
SSN → EmployeeID, Email, Name, Department
Email → EmployeeID, SSN, Name, Department

Let's verify each single-attribute candidate:

Candidate Key Analysis for Employee
Attribute	Closure	Is Superkey?	Is Minimal?	Candidate Key?
{EmployeeID}	{All attributes}	Yes	Yes (single attribute)	Yes ✓
{SSN}	{All attributes}	Yes	Yes (single attribute)	Yes ✓
{Email}	{All attributes}	Yes	Yes (single attribute)	Yes ✓
{Name}	{Name}	No	N/A	No
{Department}	{Department}	No	N/A	No

This relation has three candidate keys: {EmployeeID}, {SSN}, and {Email}.

Each is:

A superkey (uniquely identifies all tuples)
Minimal (single attribute, can't be reduced)

Why multiple candidate keys exist:

Real-world identity has multiple facets: An employee is uniquely identified by their company ID, government SSN, and work email—each from a different domain of identification.
Historical accumulation: Systems often acquire multiple identification mechanisms over time as requirements evolve.
Integration requirements: When systems merge, each may bring its own identification scheme.

Design implications:

All candidate keys deserve equal theoretical consideration
One will be designated the primary key
Others become alternate keys
All should have uniqueness constraints enforced

The Terminology Hierarchy

Composite Candidate Keys

Definition: Composite Key

Example: Course Enrollment

Consider Enrollment(StudentID, CourseID, Semester, Grade, EnrollDate)

Business rules:

A student can enroll in a course at most once per semester
The same course runs every semester
Many students take each course

Functional dependencies:

{StudentID, CourseID, Semester} → Grade, EnrollDate

Analysis:

Attribute Combination	Uniquely Identifies?	Why?
{StudentID}	No	Same student in many courses
{CourseID}	No	Many students per course
{Semester}	No	Many enrollments per semester
{StudentID, CourseID}	No	Same student, same course in different semesters
{StudentID, Semester}	No	Student takes multiple courses per semester
{CourseID, Semester}	No	Many students per course per semester
{StudentID, CourseID, Semester}	Yes	At most one enrollment per combination

The candidate key is {StudentID, CourseID, Semester} — a 3-attribute composite key.

Minimality check:

Remove StudentID: {CourseID, Semester} — not unique (multiple students)
Remove CourseID: {StudentID, Semester} — not unique (multiple courses)
Remove Semester: {StudentID, CourseID} — not unique (repeatable enrollment)

All three attributes are necessary — the key is minimal.

Common Composite Key Patterns

•Junction tables (many-to-many): {Entity1_ID, Entity2_ID} — e.g., StudentCourse, BookAuthor, OrderProduct
•Time-series data: {Entity_ID, Timestamp} — e.g., StockPrice, SensorReading, AuditLog
•Multi-tenant systems: {TenantID, LocalID} — isolation between organizations sharing a database
•Versioned records: {Entity_ID, Version} or {Entity_ID, ValidFrom} — tracking historical changes
•Hierarchical namespaces: {Parent_ID, LocalName} — directory structures, organizational hierarchies

Prime and Non-Prime Attributes

The distinction between prime and non-prime attributes is fundamental to normalization theory and database design. These categories determine how functional dependencies should be structured.

Prime Attributes

An attribute is prime (or key attribute) if it is a member of ANY candidate key of the relation. Prime attributes participate in tuple identification.

Non-Prime Attributes

An attribute is non-prime (or non-key attribute) if it is NOT a member of any candidate key. Non-prime attributes are determined by keys but don't participate in identification.

Example: Identifying Prime and Non-Prime Attributes

Consider R(A, B, C, D, E) with candidate keys {A, B} and {A, C}:

Attribute	In Candidate Key {A,B}?	In Candidate Key {A,C}?	Classification
A	Yes	Yes	Prime
B	Yes	No	Prime
C	No	Yes	Prime
D	No	No	Non-Prime
E	No	No	Non-Prime

Notice:

A is in BOTH candidate keys — it's definitely prime
B is only in {A, B} but that's enough — still prime
C is only in {A, C} but that's enough — still prime
D and E are in NO candidate key — non-prime

Why this classification matters:

Second Normal Form (2NF): Requires no partial dependencies on candidate keys. If a non-prime attribute depends on only part of a composite candidate key, 2NF is violated.
Third Normal Form (3NF): Requires no transitive dependencies through non-prime attributes. Non-prime → non-prime dependencies create update anomalies.
Boyce-Codd Normal Form (BCNF): Requires every determinant to be a superkey. The distinction helps identify violations.
Index design: Prime attributes often need indexes for efficient joins and lookups.

Common Misconception

An attribute doesn't need to be in EVERY candidate key to be prime—membership in ANY candidate key suffices. In the example above, B is prime even though it's not in candidate key {A, C}.

Advanced Candidate Key Detection

For complex schemas with many functional dependencies, we need a more sophisticated approach to find all candidate keys. Here's a comprehensive algorithm:

candidate_key_algorithm.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
def find_all_candidate_keys(attributes, fds):
    """
    Find all candidate keys of a relation.
    
    Args:
        attributes: Set of all attributes in the relation
        fds: List of functional dependencies as (lhs, rhs) tuples
    
    Returns:
        List of candidate keys (each key is a frozenset)
    """
    def closure(attr_set, fds):
        """Compute attribute closure under FDs"""
        result = set(attr_set)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs.issubset(result) and not rhs.issubset(result):
                    result.update(rhs)
                    changed = True
        return result
    
    def is_superkey(attr_set, attributes, fds):
        """Check if attr_set is a superkey"""
        return closure(attr_set, fds) == attributes
    
    def is_minimal(attr_set, attributes, fds):
        """Check if superkey is minimal (no proper subset is superkey)"""
        for attr in attr_set:
            subset = attr_set - {attr}
            if is_superkey(subset, attributes, fds):
                return False
        return True
    
    # Classify attributes
    left_only = set()
    right_only = set()
    both_sides = set()
    
    for lhs, rhs in fds:
        left_only.update(lhs)
        right_only.update(rhs)
    
    neither = attributes - left_only - right_only
    both_sides = left_only & right_only
    left_only = left_only - both_sides
    right_only = right_only - both_sides
    
    # Start with left-only and neither (must be in all keys)
    must_have = left_only | neither
    
    # Candidates to try: add subsets of 'both_sides'
    from itertools import combinations
    
    candidate_keys = []
    maybe_attrs = list(both_sides)
    
    for size in range(len(maybe_attrs) + 1):
        for combo in combinations(maybe_attrs, size):
            candidate = must_have | set(combo)
            if is_superkey(candidate, attributes, fds):
                if is_minimal(candidate, attributes, fds):
                    # Check not superset of existing key
                    if not any(ck < candidate for ck in candidate_keys):
                        candidate_keys.append(frozenset(candidate))
    
    return candidate_keys

Algorithm walkthrough with complex example:

Consider R(A, B, C, D, E, F) with FDs:

AB → C
C → D
D → E
CF → B
E → A

Step 1: Classify attributes

Left-only: {F} (only on left in CF→B)
Right-only: none (all left-side attrs also appear on right somewhere... need careful analysis)
Actually: A appears right (E→A), B appears right (CF→B), C appears right (AB→C), D appears right (C→D), E appears right (D→E), F never on right
So F is left-only, must be in every candidate key

Step 2: Try combinations with F

{F}⁺ = {F} — not a superkey
{A, F}⁺ = {A, F} — not a superkey
{B, F}⁺ = {B, F} — not a superkey
{C, F}⁺ = {C, F, D, E, A, B} = all! — superkey! ✓
- Check minimality: {C}⁺ = {C, D, E, A} ≠ R, {F}⁺ = {F} ≠ R
- {C, F} is a candidate key ✓
{A, B, F}⁺ = {A, B, F, C, D, E} = all — superkey
- Check: {A, B}⁺ = {A, B, C, D, E} ≠ R (no F)
- Check: {A, F}⁺ = {A, F} ≠ R
- Check: {B, F}⁺ = {B, F} ≠ R
- {A, B, F} is a candidate key ✓

Candidate keys: {C, F} and {A, B, F}

Summary: Candidate Key Mastery

We've explored candidate keys in depth—the minimal superkeys that form the foundation for primary key selection. Let's consolidate the key insights:

Key Takeaways

•Candidate keys are minimal superkeys — They uniquely identify tuples using no more attributes than necessary.
•Minimality means irreducibility — Removing any single attribute from a candidate key destroys its superkey property.
•Relations can have multiple candidate keys — Each represents a valid identification mechanism; one becomes the primary key.
•Composite keys use multiple attributes — Common in junction tables, time-series data, and hierarchical structures.
•Prime/non-prime classification matters — It's fundamental to normalization theory and determines valid FD structures.
•Systematic algorithms exist — Attribute classification and closure computation enable finding all candidate keys.

What's next:

Page Complete