Loading content...
Understanding what attribute closure means is essential, but as database professionals, we need to compute it. Whether you're verifying if a set of attributes forms a key, checking if a functional dependency is implied by others, or normalizing a schema, you need a reliable, systematic procedure.
The attribute closure algorithm is elegantly simple yet remarkably powerful. It transforms the abstract concept of closure into a concrete, step-by-step procedure that can be executed by hand or implemented in code. What makes this algorithm particularly beautiful is that it always terminates, always produces the correct result, and does so efficiently.
By the end of this page, you will master the closure algorithm in its standard form, understand its correctness guarantees, analyze its time complexity, and be able to trace its execution step-by-step on any input. You'll also learn optimized variants used in production database tools.
The attribute closure algorithm follows a simple iterative approach: start with the input set, and repeatedly expand it by applying functional dependencies until no more attributes can be added.
Algorithm: COMPUTE-CLOSURE(X, F)
Input: X — a set of attributes
F — a set of functional dependencies
Output: X⁺ — the closure of X under F
1. Initialize: result ← X
2. Repeat:
2.1. old_result ← result
2.2. For each functional dependency (Y → Z) in F:
If Y ⊆ result, then:
result ← result ∪ Z
2.3. If result = old_result, exit loop (no changes)
3. Return result as X⁺
Key Insight: The algorithm terminates when a complete pass through all FDs adds no new attributes. At that point, we've reached a fixed point—the closure.
Each iteration can only add attributes to the result (never remove). Since the attribute set is finite, eventually we must reach a state where nothing new can be added. This fixed point is exactly the closure by definition—all attributes that can be derived have been derived.
123456789101112131415161718192021222324252627282930313233343536373839
def compute_closure(X: set, F: list[tuple[set, set]]) -> set: """ Compute the attribute closure X⁺ under functional dependencies F. Args: X: Initial set of attributes (e.g., {'A', 'B'}) F: List of FDs as tuples (determinant, dependent) e.g., [({'A'}, {'B', 'C'}), ({'B'}, {'D'})] Returns: The closure X⁺ as a set of attributes """ result = set(X) # Start with X (satisfies reflexivity) changed = True while changed: changed = False for (determinant, dependent) in F: # If determinant is subset of current result if determinant <= result: # Add dependent attributes not already in result new_attrs = dependent - result if new_attrs: result |= new_attrs changed = True return result # Example usageX = {'A'}F = [ ({'A'}, {'B'}), # A → B ({'B'}, {'C'}), # B → C ({'C'}, {'D'}), # C → D ({'E'}, {'F'}) # E → F (won't apply)] closure = compute_closure(X, F)print(f"{{A}}⁺ = {closure}") # {A}⁺ = {'A', 'B', 'C', 'D'}Let's trace through a comprehensive example that demonstrates the algorithm's behavior with multiple functional dependencies and a non-trivial starting set.
Problem Setup:
Relation: R(A, B, C, D, E, F, G)
Functional Dependencies F:
Goal: Compute {A}⁺
| Iteration | Current Result | FD Checked | Condition Met? | Attributes Added |
|---|---|---|---|---|
| Init | {A} | — | — | — |
| 1.1 | {A} | A → BC | A ⊆ {A}? ✓ | B, C |
| 1.2 | {A,B,C} | B → D | B ⊆ {A,B,C}? ✓ | D |
| 1.3 | {A,B,C,D} | CD → E | CD ⊆ {A,B,C,D}? ✓ | E |
| 1.4 | {A,B,C,D,E} | E → F | E ⊆ {A,B,C,D,E}? ✓ | F |
| 1.5 | {A,B,C,D,E,F} | F → G | F ⊆ {A,B,C,D,E,F}? ✓ | G |
| 2.1 | {A,B,C,D,E,F,G} | All FDs | No new attributes | None |
| Done | {A,B,C,D,E,F,G} | — | Fixed point | — |
Result: {A}⁺ = {A, B, C, D, E, F, G} = R
Since {A}⁺ contains all attributes of R, A is a superkey of R. Furthermore, since A is a single attribute, no proper subset of {A} exists (other than ∅), so A is also a candidate key.
The final result is the same regardless of the order we check FDs within an iteration. However, the order affects how many iterations we need. With optimal ordering, we might reach the fixed point in fewer iterations, but the result is always correct.
Not every attribute set determines all attributes. Let's examine a case where the closure is proper subset of R.
Problem Setup:
Relation: R(A, B, C, D, E)
Functional Dependencies F:
Goal: Compute {A}⁺
X = {A}, F = {A → B, BC → D, D → E}{A}⁺ = {A, B}Iteration 1: • Start: result = {A} • A → B: A ⊆ {A}? YES → add B → result = {A, B} • BC → D: BC ⊆ {A, B}? NO (missing C) → no change • D → E: D ⊆ {A, B}? NO → no change
Iteration 2: • result = {A, B} (unchanged from previous check) • No FD adds new attributes • Fixed point reached
Result: {A}⁺ = {A, B}
Since {A}⁺ ≠ R, A is NOT a superkey. We cannot determine C, D, or E from A alone. The FD BC → D requires C, which we don't have.
Follow-up Question: What is {AC}⁺?
Let's compute:
| Step | Current Result | FD Applied | New Attributes |
|---|---|---|---|
| Init | {A, C} | — | — |
| 1 | {A, C} | A → B | B |
| 2 | {A, B, C} | BC → D | D |
| 3 | {A, B, C, D} | D → E | E |
| 4 | {A, B, C, D, E} | None apply | None |
{AC}⁺ = {A, B, C, D, E} = R
Therefore, {A, C} is a superkey. Is it a candidate key? We need to verify that neither {A} nor {C} alone is a superkey: • {A}⁺ = {A, B} ≠ R (not a superkey) • {C}⁺ = {C} ≠ R (not a superkey; no FD has C alone as determinant)
Since no proper subset of {A, C} is a superkey, {A, C} is a candidate key.
For a critical tool like closure computation, we need confidence that the algorithm is correct. Let's examine the formal correctness properties.
Claim: The COMPUTE-CLOSURE algorithm returns exactly X⁺.
Proof Sketch:
We prove two directions:
Direction 1: Soundness (result ⊆ X⁺) Every attribute added to result is legitimately in X⁺.
Proof by induction:
Direction 2: Completeness (X⁺ ⊆ result) Every attribute in X⁺ is captured in result.
Proof by contradiction: Suppose attribute A ∈ X⁺ but A ∉ result at termination. Since A ∈ X⁺, there exists a derivation chain from X to A using FDs in F. But if every FD in that chain is checked, and all prerequisites are met (by induction), A would be added. Contradiction. ✓
The algorithm always terminates because: • result can only grow (never shrink) • result is bounded by U (finite attributes) • Each iteration adds at least one attribute or exits • Maximum iterations: |U| - |X|
The algorithm is deterministic—the same inputs always produce the same output. While the internal order of checking FDs may vary, the final result is unique because closure itself is well-defined.
Understanding the algorithm's efficiency is crucial for practical applications, especially in automated database design tools.
Notation:
| Component | Cost | Explanation |
|---|---|---|
| Maximum iterations | O(n) | Each iteration adds ≥1 attribute; at most n-|X| iterations |
| FDs checked per iteration | O(m) | We check every FD once per iteration |
| Subset check per FD | O(k) or O(n) | Checking if determinant ⊆ result |
| Set union per FD | O(k) | Adding dependent attributes to result |
| Total (naive) | O(n × m × n) | O(n²m) in the worst case |
Optimized Implementation:
The naive algorithm can be improved significantly:
Tracking applicable FDs: Instead of checking all FDs each iteration, track which FDs become newly applicable when attributes are added.
Counting approach: For each FD Y → Z, maintain a counter of how many attributes from Y are in result. When counter equals |Y|, the FD fires.
Linked structure: Link FDs to the attributes in their determinant. When attribute A is added, only check FDs that include A.
With these optimizations, the algorithm achieves O(n × m) or even better in practice.
123456789101112131415161718192021222324252627282930313233343536
def compute_closure_optimized(X: set, F: list[tuple[set, set]], all_attrs: set) -> set: """ Optimized closure computation using counter technique. Time Complexity: O(|F| × avg_determinant_size) Space Complexity: O(|F| + |attributes|) """ result = set(X) # For each FD, count how many determinant attrs are in result # fd_counts[i] = number of determinant attrs of F[i] in result fd_counts = [len(det & result) for (det, _) in F] # Track which FDs have fully satisfied determinants (will fire) pending = [i for i, (det, _) in enumerate(F) if det <= result] while pending: fd_index = pending.pop() det, dep = F[fd_index] for attr in dep: if attr not in result: result.add(attr) # Update counts for FDs that include this attribute in determinant for j, (det_j, _) in enumerate(F): if attr in det_j: fd_counts[j] += 1 # If this FD is now satisfied, add to pending if fd_counts[j] == len(det_j) and j not in pending: pending.append(j) return result # This achieves O(|F| × k) where k = avg determinant size# Much better than O(n² × m) for large schemasIn practice, closure computation is very fast even for schemas with hundreds of attributes and thousands of FDs. Modern database design tools compute thousands of closures per second. The theoretical bounds are pessimistic; real-world FD sets are usually sparse.
Robust understanding requires examining corner cases. Let's explore special situations that the algorithm handles correctly.
| Situation | Example | Result | Why It Works |
|---|---|---|---|
| Empty FD set | X = {A}, F = ∅ | {A}⁺ = {A} | No FDs to apply; result stays at X |
| Empty starting set | X = ∅, F = {A→B} | ∅⁺ = ∅ | No determinant is subset of ∅ except ∅→anything (rare) |
| Self-referential FD | X = {A}, F = {A→A} | {A}⁺ = {A} | A→A is trivial; doesn't add anything |
| Cyclic dependencies | F = {A→B, B→A} | {A}⁺ = {A,B} = {B}⁺ | Both determine each other |
| Full determinant | X = all attrs | X⁺ = X | Already have everything; reflects by being a superkey |
| Overlapping FDs | F = {A→BC, A→BD} | {A}⁺ includes B,C,D | Union of all dependents applies |
Compute {A}⁺, {B}⁺, and {C}⁺All three closures equal {A, B, C}• {A}⁺: A→B gives {A,B}, B→C gives {A,B,C}. Done. • {B}⁺: B→C gives {B,C}, C→A gives {A,B,C}. Done. • {C}⁺: C→A gives {A,C}, A→B gives {A,B,C}. Done.
This shows that A, B, and C are all equivalent as determinants—each one determines all others. In such cases, any single attribute is a candidate key!
When computing closure by hand, don't forget to continue applying FDs after adding new attributes. A common mistake is to only check each FD once. The algorithm requires repeated passes until nothing changes.
One of the most important applications of closure is testing whether a functional dependency is implied by a given set of FDs. This is fundamental for:
The Implication Test:
To check if F ⊨ X → Y (does F imply X → Y?):
- Compute X⁺ under F
- If Y ⊆ X⁺, then F implies X → Y. Otherwise, it does not.
Check: Does F ⊨ A → D?Yes, F implies A → DStep 1: Compute {A}⁺ under F • Start: {A} • A → B: add B → {A, B} • B → C: add C → {A, B, C} • C → D: add D → {A, B, C, D} • Result: {A}⁺ = {A, B, C, D}
Step 2: Check if D ⊆ {A}⁺ • D ∈ {A, B, C, D}? YES
Conclusion: F ⊨ A → D ✓
Interpretation: Although A → D is not explicitly in F, it can be derived via transitivity: A → B, B → C, C → D implies A → D.
Check: Does F ⊨ A → D?No, F does not imply A → DStep 1: Compute {A}⁺ under F • Start: {A} • A → B: add B → {A, B} • C → D: C ⊆ {A, B}? NO, cannot apply • Result: {A}⁺ = {A, B}
Step 2: Check if D ⊆ {A}⁺ • D ∈ {A, B}? NO
Conclusion: F ⊭ A → D ✗
Interpretation: There's no chain of FDs connecting A to D. Knowing A tells us B, but nothing about C or D. We would need C in our starting set to determine D.
Testing F ⊨ X → Y via closure is much more efficient than computing F⁺ (all implied FDs). We only compute one closure, taking polynomial time. Computing F⁺ can produce exponentially many FDs. Always use closure for implication testing!
The closure algorithm is your primary tool for working with functional dependencies. Let's consolidate the key points:
Practice Strategy:
What's Next:
Now that you can compute closure, the next page applies this skill to its most important use case: finding keys from functional dependencies. You'll learn systematic methods to discover all candidate keys of a relation.
You now have a systematic, proven algorithm for computing attribute closure. This algorithm forms the foundation for key discovery, FD implication testing, canonical cover computation, and normalization verification. Next, we apply closure to find keys.