Loading learning content...
While candidate keys get most of the attention in database theory, understanding superkeys provides crucial insights into the structure of a relation. Every candidate key is a superkey, but the reverse isn't true—superkeys can include "extra" attributes that aren't strictly necessary for unique identification.
Why should we care about superkeys beyond just finding candidate keys?
By the end of this page, you will understand the complete structure of superkeys in a relation, how to efficiently identify whether an attribute set is a superkey, the lattice structure of all superkeys, and the relationship between superkeys, candidate keys, and prime attributes.
Let's establish a rigorous understanding of superkeys and their fundamental properties.
Formal Definition:
Let R be a relation schema with attribute set U and functional dependencies F. An attribute set K ⊆ U is a superkey of R if and only if K functionally determines all attributes of R. Formally:
K is a superkey ⟺ K⁺ = U ⟺ F ⊨ K → U
Intuition: A superkey is any set of attributes that uniquely identifies each tuple in the relation. Given the values of a superkey, there can be at most one tuple with those values.
| Property | Statement | Explanation |
|---|---|---|
| Superset Closure | If K is a superkey and K ⊆ K', then K' is a superkey | Adding attributes to a superkey gives another superkey |
| Universal Superkey | U (all attributes) is always a superkey | The full attribute set trivially determines itself |
| Minimum Size Bound | Every superkey K satisfies |K| ≥ |candidate key| | Candidate keys are minimal superkeys |
| Closure Characterization | K is a superkey ⟺ K⁺ = U | The closure test is definitive |
| Uniqueness Guarantee | ∀ tuples t₁, t₂: t₁[K] = t₂[K] ⟹ t₁ = t₂ | Superkeys enforce uniqueness |
The superset closure property is powerful: once you verify K is a superkey, every superset of K is automatically a superkey without further checking. This means superkeys form an "upward closed" set in the subset lattice.
Testing whether an attribute set is a superkey is straightforward using attribute closure.
Algorithm: IS-SUPERKEY(K, R, F)
Input: K — attribute set to test
R — relation schema (attribute set U)
F — functional dependencies
Output: true if K is a superkey, false otherwise
1. Compute K⁺ using closure algorithm
2. Return (K⁺ = U)
Complexity: O(|F| × |U|²) for the closure computation — identical to closure algorithm complexity.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364
def is_superkey(K: set, R: set, F: list[tuple[set, set]]) -> bool: """ Test if attribute set K is a superkey of relation R under FDs F. Args: K: Set of attributes to test R: Set of all attributes in the relation F: List of FDs as (determinant, dependent) tuples Returns: True if K is a superkey, False otherwise """ closure = compute_closure(K, F) return closure == R def compute_closure(X: set, F: list[tuple[set, set]]) -> set: """Compute attribute closure X⁺ under F.""" result = set(X) changed = True while changed: changed = False for (det, dep) in F: if det <= result and not dep <= result: result |= dep changed = True return result # Example usageR = {'A', 'B', 'C', 'D', 'E'}F = [ ({'A'}, {'B', 'C'}), # A → BC ({'C'}, {'D'}), # C → D ({'B', 'D'}, {'E'}), # BD → E] # Test various attribute setstest_sets = [ {'A'}, {'A', 'B'}, {'C', 'D'}, {'A', 'E'},] for K in test_sets: result = is_superkey(K, R, F) closure = compute_closure(K, F) print(f"{K}⁺ = {closure}") print(f" Is {K} a superkey? {result}\n") # Output:# {'A'}⁺ = {'A', 'B', 'C', 'D', 'E'}# Is {'A'} a superkey? True## {'A', 'B'}⁺ = {'A', 'B', 'C', 'D', 'E'}# Is {'A', 'B'} a superkey? True## {'C', 'D'}⁺ = {'C', 'D'}... needs more analysis# Actually: C→D (already have D), no more → {'C', 'D'}# Is {'C', 'D'} a superkey? False## {'A', 'E'}⁺ = {'A', 'B', 'C', 'D', 'E'}# Is {'A', 'E'} a superkey? TrueTest: {A}, {AC}, {AB}, {BC}, {ABC}, {ABCD}Superkeys: {AC}, {ABC}, {ABCD}• {A}⁺ = {A, B} ≠ ABCD → NOT superkey • {AC}⁺ = {A, B, C, D} = R → SUPERKEY ✓ • {AB}⁺ = {A, B} ≠ ABCD → NOT superkey • {BC}⁺ = {B, C, D} ≠ ABCD → NOT superkey • {ABC}⁺ = {A, B, C, D} = R → SUPERKEY ✓ (but not minimal) • {ABCD}⁺ = {A, B, C, D} = R → SUPERKEY ✓ (trivially)
Note: {AC} is the only candidate key here (minimal superkey). {ABC} and {ABCD} are superkeys but not candidate keys.
The set of all superkeys of a relation has a beautiful mathematical structure: it forms an upward closed set (also called an upset or filter) in the subset lattice of attributes.
What does this mean?
Consider the power set of all attributes, ordered by subset inclusion. Superkeys occupy a specific region of this lattice:
This structure has practical implications for enumeration and reasoning about keys.
In the diagram above (for a hypothetical relation where {A,B} and {A,C} are the candidate keys):
Key Observation: The "frontier" between superkeys and non-superkeys is exactly the set of candidate keys. Above them (larger sets) are superkeys; below them (smaller sets) are non-superkeys.
If a relation has candidate keys K₁, K₂, ..., Kₘ, the number of superkeys can be computed using inclusion-exclusion. In the simplest case with one candidate key K of size k in a relation with n attributes, there are 2^(n-k) superkeys (all supersets of K).
Understanding how many superkeys exist helps in database design and theory. The count depends on the candidate key structure.
Case 1: Single Candidate Key
If K is the only candidate key with |K| = k in a relation with n attributes, then:
Number of superkeys = 2^(n-k)
Reason: Every superset of K is a superkey. We can independently include or exclude each of the (n-k) non-key attributes.
Case 2: Multiple Candidate Keys
With multiple candidate keys, we use inclusion-exclusion to avoid double-counting supersets.
Count all superkeysNumber of superkeys = 14Using inclusion-exclusion:
Let SK(K) = supersets of candidate key K
|SK({A,B})| = 2^(5-2) = 2³ = 8 (attribute E can be in/out: ×2, plus C, D) Actually: supersets of {A,B} can include any subset of {C,D,E} |SK({A,B})| = 2³ = 8
|SK({C,D})| = 2^(5-2) = 8
|SK({A,B}) ∩ SK({C,D})| = |SK({A,B,C,D})| = 2^(5-4) = 2¹ = 2 (only E can vary)
By inclusion-exclusion: Total superkeys = 8 + 8 - 2 = 14
Verification: The 14 superkeys are: {AB}, {ABC}, {ABD}, {ABE}, {ABCD}, {ABCE}, {ABDE}, {ABCDE} (8 from AB) {CD}, {ACD}, {BCD}, {CDE}, {ABCD}, {ACDE}, {BCDE}, {ABCDE} (8 from CD) Duplicates: {ABCD}, {ABCDE} counted twice → subtract 2 Unique superkeys: 8 + 8 - 2 = 14 ✓
| Scenario | Formula | Example |
|---|---|---|
| Single key K, |K|=k, n attrs | 2^(n-k) | K={A}, n=5 → 2⁴ = 16 superkeys |
| Two disjoint keys K₁, K₂ | 2^(n-|K₁|) + 2^(n-|K₂|) - 2^(n-|K₁∪K₂|) | As computed above |
| Keys K₁, K₂ with overlap | More complex inclusion-exclusion | Depends on K₁ ∩ K₂ |
| m disjoint keys of size k | m × 2^(n-k) - (m choose 2) × 2^(n-2k) + ... | Full inclusion-exclusion |
In practice, the exact count of superkeys is rarely needed. What matters is identifying candidate keys (the minimal ones) and understanding that any superset is also a valid uniqueness constraint.
The relationship between superkeys, candidate keys, and prime/non-prime attributes is fundamental to normalization theory.
Definitions:
Key Relationships:
| Attribute Type | In Candidate Key? | In Every Superkey? | Relevance |
|---|---|---|---|
| Prime attribute | Yes (at least one) | Not necessarily | Participates in minimal unique identification |
| Non-prime attribute | No | Never required | Can always be derived from a key |
| L-category attribute | Yes (every key) | Yes | Essential for any identification |
| R-category attribute | No | No | Always derivable, never part of key |
Classify all attributes as prime or non-primePrime: {A, B, C}, Non-prime: {D, E}Candidate Key 1: {A, B} • A is in this key → prime • B is in this key → prime
Candidate Key 2: {A, C} • A is in this key → prime (already prime) • C is in this key → prime
Prime attributes: A, B, C (each appears in at least one candidate key) Non-prime attributes: D, E (neither appears in any candidate key)
Note: A is "super-prime" in the sense it appears in EVERY candidate key (it's in the L category). B and C are prime but not in every key.
The 2NF, 3NF, and BCNF definitions all reference: • Whether a dependency's determinant is a superkey • Whether dependent attributes are prime or non-prime
Understanding superkeys and prime attributes is prerequisite to correctly applying normalization rules.
Normal form definitions frequently reference superkeys. Understanding this connection is essential for applying normalization correctly.
BCNF Definition (using superkeys):
A relation R is in BCNF if and only if, for every non-trivial FD X → Y in F⁺, X is a superkey of R.
Interpretation: Every determinant that provides new information must be a superkey. This means no non-key attribute determines anything about the relation.
3NF Definition (using superkeys and prime attributes):
A relation R is in 3NF if and only if, for every non-trivial FD X → A in F⁺, either:
- X is a superkey of R, OR
- A is a prime attribute
Is R in BCNF? Is R in 3NF?Not BCNF, Yes 3NFCheck each non-trivial FD:
FD: A → B • Is {A} a superkey? {A}⁺ = {A, B} ≠ {A,B,C,D} → NO • BCNF violated ✗ • Is B prime? Candidate key is {A,C}, B ∉ {A,C} → NO • 3NF violated? Need superkey OR prime → violated ✗
Wait, let me reconsider. Actually:
FD: BC → D • Is {B, C} a superkey? {B,C}⁺ = {B, C, D} ≠ ABCD → NO • BCNF check: NOT a superkey → violation • 3NF check: D is not prime (D ∉ {A,C}) → violation IF BC not superkey
Actually this relation is NOT in 3NF either. Let me recalculate:
Candidate key = {A, C} (verify: {A,C}⁺ = A→B gives {A,B,C}, BC→D gives {A,B,C,D} ✓)
Prime attributes: A, C Non-prime: B, D
A → B: A not superkey, B not prime → 3NF violated BC → D: BC not superkey, D not prime → 3NF violated
Correct answer: R is in neither BCNF nor 3NF.
Beyond normalization theory, superkey analysis has practical applications in database design and optimization.
When choosing a primary key from multiple candidate keys, consider:
All candidate keys enforce the same uniqueness; the choice is about practical tradeoffs.
| Decision | Use Superkey Analysis For | Example |
|---|---|---|
| Primary key selection | Identify all candidate keys, then choose | Choose EmployeeID over (SSN) or (Email) |
| Unique constraint placement | Determine if column combo is superkey | {OrderID, LineNumber} is superkey for OrderLines |
| Index design | Find minimal covering index | Index on candidate key covers all queries needing uniqueness |
| Join elimination | Optimizer removes redundant joins if FK references superkey | Join on 1:1 FK can be eliminated in projections |
| Duplicate detection | Define what makes rows 'the same' | Superkey equality = same entity |
Superkey identification is more than a stepping stone to candidate keys—it's fundamental to understanding relational structure. Let's consolidate the key concepts:
What's Next:
The final page of this module focuses on candidate key finding—refining our techniques to efficiently discover all minimal superkeys, handle complex FD sets, and understand the complete key structure of any relation.
You now have a complete understanding of superkeys: their definition, structure, relationship to candidate keys and prime attributes, role in normalization, and practical applications. Next, we dive deep into candidate key finding algorithms.