Attribute Closure

Level: Intermediate

Duration: 60 mins

Topic: Attribute Closure

4 / 5

Superkey Identification

Beyond Candidate Keys: Understanding All Keys

While candidate keys get most of the attention in database theory, understanding superkeys provides crucial insights into the structure of a relation. Every candidate key is a superkey, but the reverse isn't true—superkeys can include "extra" attributes that aren't strictly necessary for unique identification.

Why should we care about superkeys beyond just finding candidate keys?

Constraint enforcement: Any superkey can serve as a uniqueness constraint
Index design: Understanding superkey relationships helps in index optimization
Query optimization: Knowing what determines what helps optimizers make better decisions
Schema evolution: Adding attributes to a key always preserves the superkey property
Theoretical foundations: Superkeys connect to normal form definitions and decomposition theory

What You Will Learn

By the end of this page, you will understand the complete structure of superkeys in a relation, how to efficiently identify whether an attribute set is a superkey, the lattice structure of all superkeys, and the relationship between superkeys, candidate keys, and prime attributes.

Superkey Definition and Properties

Let's establish a rigorous understanding of superkeys and their fundamental properties.

Formal Definition:

Let R be a relation schema with attribute set U and functional dependencies F. An attribute set K ⊆ U is a superkey of R if and only if K functionally determines all attributes of R. Formally:

K is a superkey ⟺ K⁺ = U ⟺ F ⊨ K → U

Intuition: A superkey is any set of attributes that uniquely identifies each tuple in the relation. Given the values of a superkey, there can be at most one tuple with those values.

Fundamental Properties of Superkeys
Property	Statement	Explanation
Superset Closure	If K is a superkey and K ⊆ K', then K' is a superkey	Adding attributes to a superkey gives another superkey
Universal Superkey	U (all attributes) is always a superkey	The full attribute set trivially determines itself
Minimum Size Bound	Every superkey K contains some candidate key, so |K| ≥ size of the smallest candidate key	Candidate keys are minimal superkeys
Closure Characterization	K is a superkey ⟺ K⁺ = U	The closure test is definitive
Uniqueness Guarantee	∀ tuples t₁, t₂: t₁[K] = t₂[K] ⟹ t₁ = t₂	Superkeys enforce uniqueness
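
The uniqueness guarantee in the table can be illustrated on concrete data. The sketch below is only an illustration: the function name `duplicated_projections` and the sample columns (`emp_id`, `dept`, `email`) are hypothetical and not part of the original material. Note that a data check like this can only refute a superkey claim; whether K really is a superkey is decided by the functional dependencies, not by one instance.

```python
from collections import Counter

def duplicated_projections(rows, K):
    """Return the K-projections shared by two or more distinct rows.

    An empty result means this instance is consistent with K being a superkey.
    """
    cols = sorted(K)
    # A relation is a set of tuples, so identical rows count only once.
    distinct_rows = {tuple(sorted(row.items())) for row in rows}
    counts = Counter(tuple(dict(r)[c] for c in cols) for r in distinct_rows)
    return sorted(proj for proj, n in counts.items() if n > 1)

# Hypothetical sample data
rows = [
    {"emp_id": 1, "dept": "Sales", "email": "a@x.org"},
    {"emp_id": 2, "dept": "Sales", "email": "b@x.org"},
    {"emp_id": 3, "dept": "Ops", "email": "c@x.org"},
]

print(duplicated_projections(rows, {"emp_id"}))  # []
print(duplicated_projections(rows, {"dept"}))    # [('Sales',)]
```

Here `{"dept"}` cannot be a superkey because two different rows agree on it, while `{"emp_id"}` shows no conflict in this sample.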

The Superset Property

The superset closure property is powerful: once you verify K is a superkey, every superset of K is automatically a superkey without further checking. This means superkeys form an "upward closed" set in the subset lattice.

The Superkey Test

Testing whether an attribute set is a superkey is straightforward using attribute closure.

Algorithm: IS-SUPERKEY(K, R, F)

Input:  K — attribute set to test
        R — relation schema (attribute set U)
        F — functional dependencies
Output: true if K is a superkey, false otherwise

1. Compute K⁺ using closure algorithm
2. Return (K⁺ = U)

Complexity: O(|F| × |U|²) for the simple fixed-point closure computation. The superkey test adds only one final set comparison, so it has the same complexity as computing a single closure.

superkey_test
Python
def is_superkey(K: set, R: set, F: list[tuple[set, set]]) -> bool:
    """
    Test if attribute set K is a superkey of relation R under FDs F.
    
    Args:
        K: Set of attributes to test
        R: Set of all attributes in the relation
        F: List of FDs as (determinant, dependent) tuples
    
    Returns:
        True if K is a superkey, False otherwise
    """
    closure = compute_closure(K, F)
    return closure == R
 
 
def compute_closure(X: set, F: list[tuple[set, set]]) -> set:
    """Compute attribute closure X⁺ under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for (det, dep) in F:
            if det <= result and not dep <= result:
                result |= dep
                changed = True
    return result
 
 
# Example usage
R = {'A', 'B', 'C', 'D', 'E'}
F = [
    ({'A'}, {'B', 'C'}),   # A → BC
    ({'C'}, {'D'}),         # C → D
    ({'B', 'D'}, {'E'}),    # BD → E
]
 
# Test various attribute sets
test_sets = [
    {'A'},
    {'A', 'B'},
    {'C', 'D'},
    {'A', 'E'},
]
 
for K in test_sets:
    closure = compute_closure(K, F)
    # Sets are printed sorted so the output order is deterministic
    print(f"{sorted(K)}⁺ = {sorted(closure)}")
    print(f"  Is {sorted(K)} a superkey? {is_superkey(K, R, F)}\n")

# Output:
# ['A']⁺ = ['A', 'B', 'C', 'D', 'E']
#   Is ['A'] a superkey? True
#
# ['A', 'B']⁺ = ['A', 'B', 'C', 'D', 'E']
#   Is ['A', 'B'] a superkey? True
#
# ['C', 'D']⁺ = ['C', 'D']      (C → D adds nothing new; BD → E needs B)
#   Is ['C', 'D'] a superkey? False
#
# ['A', 'E']⁺ = ['A', 'B', 'C', 'D', 'E']
#   Is ['A', 'E'] a superkey? True

Multiple Superkey Tests

For R(A, B, C, D) with F = {A → B, BC → D}

Input

Test: {A}, {AC}, {AB}, {BC}, {ABC}, {ABCD}

Output

Superkeys: {AC}, {ABC}, {ABCD}

Explanation

• {A}⁺ = {A, B} ≠ ABCD → NOT superkey
• {AC}⁺ = {A, B, C, D} = R → SUPERKEY ✓
• {AB}⁺ = {A, B} ≠ ABCD → NOT superkey
• {BC}⁺ = {B, C, D} ≠ ABCD → NOT superkey
• {ABC}⁺ = {A, B, C, D} = R → SUPERKEY ✓ (but not minimal)
• {ABCD}⁺ = {A, B, C, D} = R → SUPERKEY ✓ (trivially)

Note: {AC} is the only candidate key here (minimal superkey). {ABC} and {ABCD} are superkeys but not candidate keys.
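
The example above can be reproduced mechanically. The following is a minimal sketch, assuming FDs are stored as (left side, right side) pairs of sets; the names `closure`, `FDS`, and `candidates` are chosen for this illustration only.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

U = set("ABCD")
FDS = [(set("A"), set("B")), (set("BC"), set("D"))]  # A → B, BC → D

candidates = ["A", "AC", "AB", "BC", "ABC", "ABCD"]
superkeys = [k for k in candidates if closure(set(k), FDS) == U]
print(superkeys)  # ['AC', 'ABC', 'ABCD']
```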

The Lattice of Superkeys

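The set of all superkeys of a relation has a useful mathematical structure: it forms an upward-closed set (also called an up-set or upper set) in the subset lattice of attributes. It is generally not a filter in the order-theoretic sense, because the intersection of two superkeys need not be a superkey.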

What does this mean?

Consider the power set of all attributes, ordered by subset inclusion. Superkeys occupy a specific region of this lattice:

Candidate keys are at the "bottom" of the superkey region (minimal elements)
All supersets of candidate keys are superkeys (upward closure)
The full attribute set is at the very top (trivially a superkey)
Non-superkeys lie outside this region: a set that contains no candidate key is never a superkey

This structure has practical implications for enumeration and reasoning about keys.

[Diagram: subset lattice of the attributes, with superkeys, candidate keys, and non-superkeys marked]

In the diagram above (for a hypothetical relation where {A,B} and {A,C} are the candidate keys):

Green solid nodes are superkeys
Green thick border nodes are candidate keys (minimal superkeys)
Gray dashed nodes are NOT superkeys

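Key Observation: The candidate keys are the minimal elements of the superkey region, so they form the "frontier" between superkeys and non-superkeys. Every set that contains a candidate key is a superkey; every set that contains none (including every proper subset of a candidate key) is a non-superkey.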
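
For a small relation, the whole superkey region can be enumerated by brute force. The sketch below assumes a three-attribute relation R(A, B, C) with F = {AB → C, AC → B}, chosen here only because it yields the candidate keys {A,B} and {A,C}; the diagram's actual relation is not specified, and the helper names are hypothetical. The enumeration is exponential in the number of attributes, so it is meant for illustration, not for real schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def all_superkeys(U, fds):
    """Every subset of U whose closure is U."""
    attrs = sorted(U)
    subsets = (frozenset(c) for r in range(len(attrs) + 1)
               for c in combinations(attrs, r))
    return {s for s in subsets if closure(s, fds) == set(U)}

def minimal_elements(superkeys):
    """Superkeys with no proper subset that is also a superkey."""
    return {k for k in superkeys if not any(other < k for other in superkeys)}

U = set("ABC")
FDS = [(set("AB"), set("C")), (set("AC"), set("B"))]  # assumed FDs

sks = all_superkeys(U, FDS)
cks = minimal_elements(sks)
print(sorted("".join(sorted(s)) for s in sks))  # ['AB', 'ABC', 'AC']
print(sorted("".join(sorted(s)) for s in cks))  # ['AB', 'AC']

# Upward closure: adding any attribute to a superkey gives a superkey.
print(all(s | {a} in sks for s in sks for a in U))  # True
```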

Counting Superkeys

If a relation has candidate keys K₁, K₂, ..., Kₘ, the number of superkeys can be computed using inclusion-exclusion. In the simplest case with one candidate key K of size k in a relation with n attributes, there are 2^(n-k) superkeys (all supersets of K).

Counting Superkeys

Understanding how many superkeys exist helps in database design and theory. The count depends on the candidate key structure.

Case 1: Single Candidate Key

If K is the only candidate key with |K| = k in a relation with n attributes, then:

Number of superkeys = 2^(n-k)

Reason: Every superset of K is a superkey. We can independently include or exclude each of the (n-k) non-key attributes.

Case 2: Multiple Candidate Keys

With multiple candidate keys, we use inclusion-exclusion to avoid double-counting supersets.

Counting Superkeys Example

R(A, B, C, D, E) with candidate keys {A, B} and {C, D}

Input

Count all superkeys

Output

Number of superkeys = 14

Explanation

Using inclusion-exclusion:

Let SK(K) = the set of all supersets of candidate key K

|SK({A,B})| = 2^(5-2) = 2³ = 8 (a superset of {A,B} may add any subset of {C,D,E})

|SK({C,D})| = 2^(5-2) = 2³ = 8 (a superset of {C,D} may add any subset of {A,B,E})

|SK({A,B}) ∩ SK({C,D})| = |SK({A,B,C,D})| = 2^(5-4) = 2¹ = 2 (only E can vary)

By inclusion-exclusion: Total superkeys = 8 + 8 - 2 = 14

Verification:
• 8 supersets of {A,B}: {AB}, {ABC}, {ABD}, {ABE}, {ABCD}, {ABCE}, {ABDE}, {ABCDE}
• 8 supersets of {C,D}: {CD}, {ACD}, {BCD}, {CDE}, {ABCD}, {ACDE}, {BCDE}, {ABCDE}
• Counted twice: {ABCD}, {ABCDE} → subtract 2
• Unique superkeys: 8 + 8 - 2 = 14 ✓
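
The count can also be confirmed by brute force. This is a small sketch, assuming the candidate keys are already known and passed in as strings of single-letter attributes; `count_superkeys` is a hypothetical helper name.

```python
from itertools import combinations

def count_superkeys(attributes, candidate_keys):
    """Count subsets of attributes that contain at least one candidate key."""
    attrs = sorted(attributes)
    keys = [frozenset(k) for k in candidate_keys]
    count = 0
    for r in range(len(attrs) + 1):
        for combo in combinations(attrs, r):
            if any(k <= frozenset(combo) for k in keys):
                count += 1
    return count

print(count_superkeys("ABCDE", ["AB", "CD"]))  # 14
```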

Superkey Count Formulas
Scenario	Formula	Example
Single key K, |K|=k, n attrs	2^(n-k)	K={A}, n=5 → 2⁴ = 16 superkeys
Two disjoint keys K₁, K₂	2^(n-|K₁|) + 2^(n-|K₂|) - 2^(n-|K₁∪K₂|)	As computed above
Keys K₁, K₂ with overlap	Same formula, with |K₁∪K₂| = |K₁| + |K₂| - |K₁∩K₂|	Keys {A,B}, {A,C}, n=5 → 8 + 8 - 4 = 12
m pairwise disjoint keys of size k	m × 2^(n-k) - (m choose 2) × 2^(n-2k) + ...	Full inclusion-exclusion
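
The inclusion-exclusion count generalizes to any number of candidate keys, because the sets that contain several keys at once are exactly the supersets of their union. The sketch below is one possible formulation under that observation; `count_superkeys_ie` is a hypothetical name, and since each term uses the size of the union, overlapping keys are handled as well.

```python
from itertools import combinations

def count_superkeys_ie(n, candidate_keys):
    """Count superkeys of an n-attribute relation by inclusion-exclusion."""
    keys = [frozenset(k) for k in candidate_keys]
    total = 0
    for size in range(1, len(keys) + 1):
        sign = 1 if size % 2 == 1 else -1
        for group in combinations(keys, size):
            union = frozenset().union(*group)
            # Supersets of every key in the group = supersets of their union
            total += sign * 2 ** (n - len(union))
    return total

print(count_superkeys_ie(5, ["A"]))         # 16
print(count_superkeys_ie(5, ["AB", "CD"]))  # 14
print(count_superkeys_ie(5, ["AB", "AC"]))  # 12
```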

Practical Note

In practice, the exact count of superkeys is rarely needed. What matters is identifying candidate keys (the minimal ones) and understanding that any superset is also a valid uniqueness constraint.

Superkeys and Prime Attributes

The relationship between superkeys, candidate keys, and prime/non-prime attributes is fundamental to normalization theory.

Definitions:

Prime attribute: An attribute that belongs to at least one candidate key
Non-prime attribute: An attribute that does not belong to any candidate key

Key Relationships:

Attribute Classification Based on Keys
Attribute Type	In Candidate Key?	In Every Superkey?	Relevance
Prime attribute	Yes (at least one)	Not necessarily	Participates in minimal unique identification
Non-prime attribute	No	Never required	Can always be derived from a key
L-category attribute	Yes (every key)	Yes	Essential for any identification
R-category attribute	No	No	Always derivable, never part of key

Identifying Prime and Non-Prime Attributes

R(A, B, C, D, E) with candidate keys {A, B} and {A, C}

Input

Classify all attributes as prime or non-prime

Output

Prime: {A, B, C}, Non-prime: {D, E}

Explanation

Candidate Key 1: {A, B}
• A is in this key → prime
• B is in this key → prime

Candidate Key 2: {A, C}
• A is in this key → prime (already prime)
• C is in this key → prime

Prime attributes: A, B, C (each appears in at least one candidate key)
Non-prime attributes: D, E (neither appears in any candidate key)

Note: A is "super-prime" in the sense that it appears in EVERY candidate key, and therefore in every superkey. Attributes in the L category (those that appear only on the left-hand side of FDs) always have this property. B and C are prime but not in every key.
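
Once the candidate keys are known, the classification reduces to set operations. A minimal sketch, assuming single-letter attribute names and a hypothetical helper `classify_attributes`:

```python
def classify_attributes(attributes, candidate_keys):
    """Split attributes into prime, non-prime, and those in every candidate key."""
    keys = [set(k) for k in candidate_keys]
    prime = set().union(*keys)              # in at least one candidate key
    non_prime = set(attributes) - prime     # in no candidate key
    in_every_key = set.intersection(*keys)  # in all candidate keys
    return prime, non_prime, in_every_key

prime, non_prime, in_every_key = classify_attributes("ABCDE", ["AB", "AC"])
print(sorted(prime))         # ['A', 'B', 'C']
print(sorted(non_prime))     # ['D', 'E']
print(sorted(in_every_key))  # ['A']
```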

Why This Matters for Normalization

The 2NF, 3NF, and BCNF definitions all reference:
• Whether a dependency's determinant is a superkey
• Whether dependent attributes are prime or non-prime

Understanding superkeys and prime attributes is prerequisite to correctly applying normalization rules.

Superkeys in Normalization Context

Normal form definitions frequently reference superkeys. Understanding this connection is essential for applying normalization correctly.

BCNF Definition (using superkeys):

A relation R is in BCNF if and only if, for every non-trivial FD X → Y in F⁺, X is a superkey of R.

Interpretation: Every determinant of a non-trivial dependency must be a superkey. In other words, no attribute set that fails to be a superkey can determine any attribute outside itself.

3NF Definition (using superkeys and prime attributes):

A relation R is in 3NF if and only if, for every non-trivial FD X → A in F⁺, either:

X is a superkey of R, OR

A is a prime attribute

BCNF Characteristics

• Every determinant of a non-trivial FD is a superkey
• No exceptions allowed
• Eliminates all redundancy caused by FDs
• Decomposition into BCNF is not always dependency-preserving
• Strictest normal form defined using functional dependencies alone

3NF Characteristics

• Determinant is a superkey OR dependent attribute is prime
• Prime attribute exception exists
• May retain some redundancy
• A lossless, dependency-preserving decomposition into 3NF always exists
• Good balance of normalization and practicality

Testing Normal Forms Using Superkey Knowledge

R(A, B, C, D) with F = {A → B, BC → D} and candidate key {A, C}

Input

Is R in BCNF? Is R in 3NF?

Output

Not BCNF, not 3NF

Explanation

Candidate key: {A, C} (verify: {A,C}⁺ starts as {A, C}; A → B gives {A, B, C}; BC → D gives {A, B, C, D} ✓)

Prime attributes: A, C
Non-prime attributes: B, D

Check each non-trivial FD:

FD: A → B
• Is {A} a superkey? {A}⁺ = {A, B} ≠ {A, B, C, D} → NO → BCNF violated ✗
• Is B prime? B ∉ {A, C} → NO → 3NF violated ✗

FD: BC → D
• Is {B, C} a superkey? {B, C}⁺ = {B, C, D} ≠ {A, B, C, D} → NO → BCNF violated ✗
• Is D prime? D ∉ {A, C} → NO → 3NF violated ✗

A single violating FD is enough to fail a normal form; here both FDs violate both conditions.

Conclusion: R is in neither BCNF nor 3NF.
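
The per-FD checks in this example follow a fixed pattern, which can be sketched in code. The function below is an illustration under stated assumptions: it examines only the FDs listed in F (not all of F⁺), it takes the candidate keys as given, and the name `normal_form_violations` is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def normal_form_violations(U, fds, candidate_keys):
    """Return (BCNF violations, 3NF violations) among the listed FDs."""
    prime = set().union(*(set(k) for k in candidate_keys))
    bcnf, third_nf = [], []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(U):
            continue  # determinant is a superkey: fine for both forms
        for attr in sorted(rhs - lhs):  # skip the trivial part of the FD
            label = f"{''.join(sorted(lhs))} → {attr}"
            bcnf.append(label)          # non-superkey determinant
            if attr not in prime:
                third_nf.append(label)  # and the dependent is non-prime
    return bcnf, third_nf

U = set("ABCD")
FDS = [(set("A"), set("B")), (set("BC"), set("D"))]  # A → B, BC → D
bcnf, third_nf = normal_form_violations(U, FDS, ["AC"])
print(bcnf)      # ['A → B', 'BC → D']
print(third_nf)  # ['A → B', 'BC → D']
```

Both lists are non-empty, which matches the conclusion that R is in neither BCNF nor 3NF.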

Practical Applications of Superkey Analysis

Beyond normalization theory, superkey analysis has practical applications in database design and optimization.

Real-World Applications

• UNIQUE Constraints: Any superkey can be the target of a UNIQUE constraint, although declaring it on a candidate key avoids redundant columns. Database designers choose based on semantic meaning and query patterns.
• Index Selection: A unique index on a superkey guarantees that an equality lookup on all of its columns returns at most one row. Indexing a candidate key keeps that index as narrow as possible.
• Foreign Key References: Foreign keys typically reference the primary key. In SQL, the referenced columns must be declared PRIMARY KEY or UNIQUE, so a foreign key can also reference an alternate candidate key.
• Query Optimization: If the selected columns include a superkey of the table, DISTINCT cannot remove any rows, so the optimizer can skip duplicate elimination.
• Data Validation: During data loading, superkey checks can identify duplicate records before insertion.
• Schema Documentation: Documenting all candidate keys (not just the primary key) helps future maintainers understand entity identification.

Design Guideline

When choosing a primary key from multiple candidate keys, consider:

Stability (values shouldn't change)
Simplicity (fewer attributes preferred)
Performance (shorter keys index faster)
Semantics (natural vs. surrogate keys)

All candidate keys enforce the same uniqueness; the choice is about practical tradeoffs.

Superkey-Based Design Decisions
Decision	Use Superkey Analysis For	Example
Primary key selection	Identify all candidate keys, then choose	Choose EmployeeID over (SSN) or (Email)
Unique constraint placement	Determine if column combo is superkey	{OrderID, LineNumber} is superkey for OrderLines
Index design	Find the narrowest unique index	A unique index on a candidate key enforces uniqueness with the fewest columns
Join elimination	Optimizer can remove a redundant join when the FK references a unique key	A join to a table whose columns are not used can be dropped if each row matches exactly one referenced row
Duplicate detection	Define what makes rows 'the same'	Superkey equality = same entity

Summary: Mastering Superkey Analysis

Superkey identification is more than a stepping stone to candidate keys—it's fundamental to understanding relational structure. Let's consolidate the key concepts:

Key Takeaways

• Definition: K is a superkey ⟺ K⁺ = U (the closure equals the full attribute set).
• Superset Property: Every superset of a superkey is also a superkey.
• Lattice Structure: Superkeys form an upward-closed set with candidate keys at the boundary.
• Candidate Keys: The minimal superkeys; every superkey contains at least one candidate key.
• Prime Attributes: Attributes appearing in some candidate key; non-prime attributes appear in none.
• Normalization Connection: BCNF requires every determinant of a non-trivial FD to be a superkey; 3NF allows an exception when the dependent attribute is prime.
• Practical Uses: Unique constraints, index design, join optimization, duplicate detection.

What's Next:

The final page of this module focuses on candidate key finding—refining our techniques to efficiently discover all minimal superkeys, handle complex FD sets, and understand the complete key structure of any relation.

Page Complete

You now have a complete understanding of superkeys: their definition, structure, relationship to candidate keys and prime attributes, role in normalization, and practical applications. Next, we dive deep into candidate key finding algorithms.
