Loading learning content...
Every well-designed database schema rests upon a fundamental question: What uniquely identifies each entity in our data? This seemingly simple question has profound implications for data integrity, query performance, and system reliability.
Functional dependencies provide the mathematical framework to answer this question rigorously. Rather than relying on intuition or business assumptions alone, we can derive keys systematically from the functional relationships inherent in our data—ensuring that our schema design is both correct and complete.
By the end of this page, you will understand how functional dependencies encode the information necessary to identify keys, master the attribute closure algorithm for key derivation, and develop the analytical intuition to determine whether any attribute set qualifies as a superkey or candidate key.
To understand how keys emerge from functional dependencies, we must first precisely define what a key actually is in the context of relational theory.
Formal Definition of a Key:
Given a relation schema R with attribute set U and a set of functional dependencies F:
The crucial insight is this: the set F completely determines which attribute combinations can serve as keys. This is not a matter of designer preference or business rules—it's a mathematical consequence of the functional dependencies.
Think of functional dependencies as encoding constraints on possible database states. A key is simply an attribute set strong enough (under these constraints) to uniquely determine everything else. The FDs don't suggest keys—they mandate them.
Example: Understanding the Connection
Consider a relation schema:
Student(StudentID, Email, Name, Department, Advisor)
With functional dependencies:
F = {
StudentID → Email, Name, Department, Advisor
Email → StudentID, Name, Department, Advisor
Department → Advisor
}
From these FDs, we can immediately derive:
But which are candidate keys? StudentID and Email are both minimal superkeys—removing any attribute from either (which is impossible since they're single attributes) would destroy the superkey property. Therefore, both StudentID and Email are candidate keys.
| Concept | Definition | Relationship to FDs | Example |
|---|---|---|---|
| Superkey | Any set that determines all attributes | K⁺ = U (closure equals all attributes) | {StudentID}, {Email}, {StudentID, Name} |
| Candidate Key | Minimal superkey | No proper subset has closure = U | {StudentID}, {Email} |
| Primary Key | Designated candidate key | Designer choice from candidates | {StudentID} |
| Alternate Key | Non-primary candidate key | Remaining candidates after PK choice | {Email} |
The attribute closure (X⁺) is the cornerstone tool for deriving keys from functional dependencies. Given an attribute set X and FD set F, the closure X⁺ contains all attributes that can be determined from X using the FDs in F.
The Key Insight:
An attribute set X is a superkey if and only if X⁺ contains all attributes of the relation. In symbols:
X is a superkey ⟺ X⁺ = U (where U is the set of all attributes)
This transforms key identification into a computational problem with a well-defined algorithm.
123456789101112131415161718192021222324
ALGORITHM: ComputeClosure(X, F)// Input: X = set of attributes, F = set of functional dependencies// Output: X⁺ = closure of X under F FUNCTION ComputeClosure(X, F): closure ← X // Initialize with input attributes changed ← true WHILE changed DO: changed ← false FOR EACH dependency (A → B) IN F: // If we have all attributes on the left side IF A ⊆ closure AND B ⊄ closure THEN: closure ← closure ∪ B // Add right-side attributes changed ← true END FOR END WHILE RETURN closure // Example usage:// X = {StudentID}// F = {StudentID → Email, Email → Name, Name → Department}// ComputeClosure(X, F) returns {StudentID, Email, Name, Department}The closure algorithm runs in O(n²) time in the worst case, where n is the number of attributes and FDs. Each iteration adds at least one attribute, and each iteration examines all FDs. This is efficient enough for practical database design scenarios.
While the closure algorithm tells us whether a specific attribute set is a superkey, we need a systematic process to find all candidate keys. This requires understanding the role different attributes play in functional dependencies.
Attribute Classification:
For a given relation schema R with FD set F, each attribute falls into one of four categories:
This classification dramatically reduces the key search space.
| Classification | Key Implication | Reason |
|---|---|---|
| L-only | MUST be in every candidate key | No FD can derive this attribute—it must be given |
| R-only | NEVER in any candidate key | Can always be derived—adding it creates redundancy |
| Neither | MUST be in every candidate key | No FD involves this attribute—it cannot be derived |
| Both | MAY be in some candidate keys | Sometimes needed, sometimes derivable |
The Key Finding Algorithm:
With this classification, we can efficiently find all candidate keys:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293
from itertools import combinations def classify_attributes(all_attrs: set, fds: list[tuple[set, set]]) -> dict: """ Classify attributes based on their appearance in FDs. Returns dict with keys: 'left_only', 'right_only', 'both', 'neither' """ left_side = set() right_side = set() for lhs, rhs in fds: left_side.update(lhs) right_side.update(rhs) return { 'left_only': left_side - right_side, 'right_only': right_side - left_side, 'both': left_side & right_side, 'neither': all_attrs - left_side - right_side } def find_all_candidate_keys(all_attrs: set, fds: list[tuple[set, set]]) -> list[set]: """ Find all candidate keys for a relation given its FDs. Uses attribute classification to prune the search space efficiently. """ classification = classify_attributes(all_attrs, fds) # Core attributes that MUST be in every candidate key core = classification['left_only'] | classification['neither'] # Attributes that might extend the core maybe_needed = classification['both'] # Check if core alone is a superkey if compute_closure(core, fds) == all_attrs: return [core] if core else [set()] # Handle empty core edge case candidate_keys = [] # Try adding subsets of 'maybe_needed' to the core # Start with smallest subsets (for minimality) for size in range(1, len(maybe_needed) + 1): for subset in combinations(maybe_needed, size): candidate = core | set(subset) # Check if it's a superkey if compute_closure(candidate, fds) != all_attrs: continue # Check if it's minimal (no proper subset is a superkey) is_minimal = True for existing in candidate_keys: if existing < candidate: # Existing is proper subset is_minimal = False break # Also check within 'maybe_needed' subset if is_minimal: for attr in subset: smaller = core | (set(subset) - {attr}) if compute_closure(smaller, fds) == all_attrs: is_minimal = False break if is_minimal: candidate_keys.append(candidate) return candidate_keys # Extended exampleif __name__ == "__main__": # Relation: R(A, B, C, D, E) # FDs: A → B, BC → D, D → E, E → A all_attrs = {'A', 'B', 'C', 'D', 'E'} fds = [ ({'A'}, {'B'}), ({'B', 'C'}, {'D'}), ({'D'}, {'E'}), ({'E'}, {'A'}) ] classification = classify_attributes(all_attrs, fds) print("Attribute Classification:") for category, attrs in classification.items(): print(f" {category}: {attrs if attrs else '{}'}") candidate_keys = find_all_candidate_keys(all_attrs, fds) print(f"\nCandidate Keys: {[sorted(k) for k in candidate_keys]}")Let's work through complete examples to solidify understanding. These examples progressively increase in complexity.
Relation: Employee(EmpID, Name, Email, DeptID, DeptName, Manager)
Functional Dependencies:
F = {
EmpID → Name, Email, DeptID
DeptID → DeptName, Manager
}
Step 1: Classify attributes
| Attribute | Appears in LHS | Appears in RHS | Category |
|---|---|---|---|
| EmpID | Yes | No | L-only |
| Name | No | Yes | R-only |
| No | Yes | R-only | |
| DeptID | Yes | Yes | Both |
| DeptName | No | Yes | R-only |
| Manager | No | Yes | R-only |
Step 2: Identify core
Core = L-only ∪ Neither = {EmpID} ∪ {} = {EmpID}
Step 3: Compute core closure
{EmpID}⁺:
{EmpID}⁺ = all attributes ✓
Result: The unique candidate key is {EmpID}
Understanding how keys emerge from FDs has profound implications for database design:
1. Validation of Schema Correctness
When designing a database, we often start with intuitions about keys ("StudentID should be the primary key"). Key derivation from FDs validates these intuitions:
2. Discovery of Hidden Keys
Sometimes the FDs reveal keys that weren't obvious:
Relation: Order(OrderNum, CustomerEmail, CustomerPhone, ProductID, Quantity)
FDs:
OrderNum → CustomerEmail, ProductID, Quantity
CustomerEmail → CustomerPhone
Obvious key: {OrderNum}
But what if business rules also specify:
CustomerEmail, ProductID → OrderNum (each customer-product pair has one active order)
Now {CustomerEmail, ProductID} is also a candidate key!
Never derive FDs (and thus keys) solely from examining data. A sample showing unique StudentID values doesn't prove StudentID is a key—maybe two students just haven't been assigned the same ID yet. Keys must be derived from semantic understanding of the domain, expressed as FDs that hold for ALL possible valid states of the database.
3. Implications for Foreign Keys
Candidate keys in one table become the options for foreign key references in another:
-- If Student has candidate keys {StudentID} and {Email}
-- Then Enrollment can reference either:
CREATE TABLE Enrollment (
EnrollmentID INT PRIMARY KEY,
StudentRef INT REFERENCES Student(StudentID), -- or
-- StudentEmail VARCHAR REFERENCES Student(Email)
CourseID INT,
...
);
The choice affects query patterns, index requirements, and even storage efficiency.
Key derivation seems straightforward in theory, but several pitfalls trap even experienced practitioners:
Edge Case: The Empty Key
What if the FD set implies a constant attribute?
Relation: Config(Version, Setting, Value)
FDs: {} → Version (Version is constant across all tuples)
In this case, {} (the empty set) determines Version. The empty set is technically a subset of every key. This means the table can only have one distinct Version value—effectively, it's not a "true" attribute but a schema-level constant.
Edge Case: All Attributes Are Keys
If no non-trivial FDs exist, every proper superset of {} is a superkey. The unique candidate key is the set of all attributes:
Relation: EventLog(Timestamp, UserID, Action, IPAddress)
FDs: {} (none, all combinations are possible)
Candidate key: {Timestamp, UserID, Action, IPAddress}
This often indicates either:
We've established the theoretical and practical foundations for deriving keys from functional dependencies. Let's consolidate the key insights:
What's Next:
With the ability to derive keys from FDs, we now turn to categorizing the attributes within those keys. The next page explores prime and non-prime attributes—a classification that becomes essential for understanding normal forms and guiding normalization decisions.
You now understand how functional dependencies encode the information necessary to identify all candidate keys in a relation. This forms the foundation for the attribute classifications and normalization theory to come.