In the previous pages, you learned what minimal covers are and how to identify extraneous attributes. Now it's time to put these pieces together into a complete, systematic algorithm that transforms any set of functional dependencies into its canonical (minimal) cover.
The canonical cover algorithm is one of the foundational algorithms in database theory. It appears in virtually every database textbook and is a prerequisite for normalization algorithms like 3NF synthesis. Understanding this algorithm deeply—not just memorizing its steps—enables you to reason about FD sets, debug normalization issues, and design optimal schemas.
This page presents the algorithm in complete detail, with rigorous correctness analysis and extensive worked examples.
By the end of this page, you will understand every step of the canonical cover algorithm, why each step is necessary, and how to execute it correctly. You will work through multiple complete examples and understand the algorithm's correctness guarantees.
The canonical cover algorithm has three main phases:
Phase 1: Decompose Right-Hand Sides
Convert all FDs to have single-attribute right-hand sides. Replace X → AB with X → A and X → B.
Phase 2: Remove Extraneous Left-Hand Attributes
For each FD, check if any attribute on the left side can be removed. If so, remove it.
Phase 3: Remove Redundant FDs
Check if any FD can be derived from the others. If so, remove it.
Post-Processing: Combine FDs with Same LHS (Optional)
For readability, FDs with identical left-hand sides can be combined: X → A and X → B become X → AB.
Key Invariant:
Throughout the algorithm, the closure of the FD set remains unchanged. Each step preserves equivalence while simplifying the representation.
```
ALGORITHM: Canonical Cover (Minimal Cover)

INPUT:  F  (set of functional dependencies)
OUTPUT: Fc (canonical cover of F)

// ═══════════════════════════════════════════════
// PHASE 1: Decompose to Single-Attribute RHS
// ═══════════════════════════════════════════════
FOR each FD (X → Y) in F where |Y| > 1:
    Remove (X → Y) from F
    FOR each attribute A in Y:
        Add (X → A) to F

// ═══════════════════════════════════════════════
// PHASE 2: Remove Extraneous LHS Attributes
// ═══════════════════════════════════════════════
REPEAT:
    changed = false
    FOR each FD (X → A) in F where |X| > 1:
        FOR each attribute B in X:
            X' = X - {B}
            IF A ∈ closure(X', F):
                Replace (X → A) with (X' → A) in F
                changed = true
                BREAK inner loop
        IF changed:
            BREAK outer loop
UNTIL NOT changed

// ═══════════════════════════════════════════════
// PHASE 3: Remove Redundant FDs
// ═══════════════════════════════════════════════
REPEAT:
    changed = false
    FOR each FD (X → A) in F:
        F' = F - {X → A}
        IF A ∈ closure(X, F'):
            F = F'   // Remove the redundant FD
            changed = true
            BREAK loop
UNTIL NOT changed

RETURN F   // This is now the canonical cover Fc
```

Removing one extraneous attribute or redundant FD may make others extraneous or redundant. The REPEAT loops ensure we continue until the set is fully minimized. This is crucial for correctness.
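The pseudocode repeatedly calls `closure(X, F)`, the attribute-closure subroutine. A minimal Python sketch of that subroutine, assuming FDs are modeled as `(lhs, rhs)` pairs of frozensets (this representation and the function name are illustrative, not from the text):

```python
def closure(attrs, fds):
    """Return the set of attributes derivable from `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An FD "fires" when its entire LHS is already in the closure.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

F = [(frozenset("A"), frozenset("B")),
     (frozenset("B"), frozenset("C")),
     (frozenset("AC"), frozenset("D"))]
print(closure({"A"}, F))  # {'A', 'B', 'C', 'D'}: A→B, B→C, then AC→D fires
```

All three phases of the algorithm reduce to membership tests against closures computed this way.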
The first phase converts all FDs to have exactly one attribute on the right-hand side. This is justified by the decomposition rule from Armstrong's axioms:
If X → YZ, then X → Y and X → Z
Conversely, the union rule allows us to combine them back:
If X → Y and X → Z, then X → YZ
These rules guarantee that decomposition preserves the closure—no information is lost.
Why Decompose First?
With a single attribute on every right-hand side, the later phases become simpler: the extraneous-attribute and redundancy tests each reduce to one membership check ("is A in this closure?") rather than a subset check over several attributes, and removing a redundant FD never accidentally discards a still-needed part of a multi-attribute RHS.
Example:
Original: {A → BCD, BC → D, D → A}
After Phase 1:
{A → B, A → C, A → D, BC → D, D → A}
| Original FD | After Decomposition |
|---|---|
| A → BC | A → B, A → C |
| AB → CDE | AB → C, AB → D, AB → E |
| X → Y (single attr) | X → Y (no change) |
| PQ → RST | PQ → R, PQ → S, PQ → T |
Unlike later phases, Phase 1 is deterministic. Every FD with multi-attribute RHS is decomposed the same way, regardless of order. The result of Phase 1 is unique.
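Phase 1 can be sketched in a few lines of Python, using the same `(lhs, rhs)` frozenset-pair representation as above (the function name `decompose` is illustrative). Note that decomposition can expose duplicates, which are dropped here:

```python
def decompose(fds):
    """Phase 1: split every FD into single-attribute-RHS FDs, dropping duplicates."""
    out = []
    for lhs, rhs in fds:
        for attr in sorted(rhs):
            fd = (lhs, frozenset(attr))
            if fd not in out:  # e.g. A→B appearing both directly and via A→BCD
                out.append(fd)
    return out

F = [(frozenset("A"), frozenset("BCD")),
     (frozenset("BC"), frozenset("D")),
     (frozenset("D"), frozenset("A"))]
for lhs, rhs in decompose(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B, A -> C, A -> D, BC -> D, D -> A
```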
Phase 2 systematically checks each attribute in the left-hand side of each FD. For each attribute, we ask: "Can we still derive the RHS without this attribute?"
The Test:
For FD X → A where B ∈ X: remove B to get X' = X - {B}, compute the closure (X')⁺ under the full F, and check whether A ∈ (X')⁺. If it is, B is extraneous, and X → A can be replaced with X' → A.
Important Details: the closure is computed against ALL of F, including X → A itself, and the phase must restart from the beginning whenever an attribute is removed.
Example Walkthrough:
F = {A → B, B → C, AC → D}
Test AC → D:
Is A extraneous? Compute C⁺ = {C}. Is D ∈ {C}? No. A is essential.
Is C extraneous? Compute A⁺ under the full F (including AC → D): start with {A}; A → B adds B; B → C adds C; now AC → D fires and adds D. So A⁺ = {A, B, C, D}, and since D ∈ A⁺, C is extraneous. Replace AC → D with A → D.
F after Phase 2: {A → B, B → C, A → D}
When testing if B is extraneous in X → A, compute (X-{B})⁺ using ALL FDs in F, including X → A. The FD itself might be triggered if the reduced LHS, combined with other FDs, produces enough attributes.
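The Phase 2 test, including the restart-after-change behavior and duplicate handling, can be sketched in Python. This reuses the `closure` helper from earlier; the representation (frozenset pairs) and function names are illustrative:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def remove_extraneous_lhs(fds):
    """Phase 2: shrink each LHS while the RHS stays derivable.
    The closure test uses the FULL current F, including the FD being reduced."""
    fds = list(fds)
    changed = True
    while changed:
        changed = False
        for i, (lhs, rhs) in enumerate(fds):
            if len(lhs) <= 1:
                continue
            for attr in sorted(lhs):
                reduced = lhs - {attr}
                if rhs <= closure(reduced, fds):   # attr is extraneous
                    fds[i] = (reduced, rhs)
                    if fds.count(fds[i]) > 1:      # reduction may create a duplicate
                        del fds[i]
                    changed = True
                    break
            if changed:
                break   # restart the scan against the modified F
    return fds

F = [(frozenset("A"), frozenset("B")),
     (frozenset("B"), frozenset("C")),
     (frozenset("AC"), frozenset("D"))]
print(remove_extraneous_lhs(F))  # AC → D becomes A → D, matching the walkthrough
```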
DETAILED PHASE 2 ALGORITHM: changed = trueWHILE changed: changed = false FOR each (X → A) in F: IF |X| == 1: CONTINUE // Skip single-attribute LHS FOR each B in X: X' = X - {B} closure = compute_closure(X', F) // Use FULL F IF A ∈ closure: // B is extraneous Remove (X → A) from F Add (X' → A) to F changed = true BREAK both loops // Restart outer while // After Phase 2, every FD has minimal LHSPhase 3 checks whether each FD is derivable from the others. If so, the FD is redundant and should be removed.
The Test:
For FD X → A in F: form F' = F - {X → A}, compute X⁺ under F', and check whether A ∈ X⁺. If it is, X → A is redundant and can be removed.
Why This Works:
If A can be derived from X using only the other FDs, then X → A doesn't add any new derivation power. It's logically implied by the rest of F.
Example:
F = {A → B, B → C, A → C}
Test A → C:
Form F' = {A → B, B → C} and compute A⁺ under F': A → B adds B, then B → C adds C, so A⁺ = {A, B, C} and C ∈ A⁺.
A → C is redundant. Remove it.
F after: {A → B, B → C}
When testing if X → A is redundant, compute X⁺ WITHOUT including X → A. This is the opposite of Phase 2: in Phase 2, we include the target FD; in Phase 3, we exclude it.
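The Phase 3 test looks like this in Python (same illustrative frozenset-pair representation; `closure` repeated so the sketch is self-contained):

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def remove_redundant(fds):
    """Phase 3: drop any FD derivable from the others.
    Unlike Phase 2, the closure is computed WITHOUT the FD under test."""
    fds = list(fds)
    changed = True
    while changed:
        changed = False
        for fd in fds:
            lhs, rhs = fd
            rest = [g for g in fds if g != fd]   # F' = F - {X → A}
            if rhs <= closure(lhs, rest):
                fds = rest                        # removal is permanent
                changed = True
                break                             # restart against the smaller F
    return fds

F = [(frozenset("A"), frozenset("B")),
     (frozenset("B"), frozenset("C")),
     (frozenset("A"), frozenset("C"))]
print(remove_redundant(F))  # A → C is dropped, leaving A → B and B → C
```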
```
DETAILED PHASE 3 ALGORITHM:

changed = true
WHILE changed:
    changed = false
    FOR each (X → A) in F:
        F' = F - {X → A}                   // Temporarily exclude
        closure = compute_closure(X, F')   // Use F'
        IF A ∈ closure:
            // FD is redundant - make removal permanent
            F = F'
            changed = true
            BREAK loop   // Restart

// After Phase 3, no FD is derivable from others
```

Let's work through a complete example from start to finish.
Given FD Set:
F = {
A → BC,
B → C,
A → B,
AB → C
}
Schema: R(A, B, C)
PHASE 1: Decompose RHS
F = {
A → B, from A → BC
A → C, from A → BC
B → C,
A → B, duplicate! (from original A → B)
AB → C
}
Remove duplicate:
F = {A → B, A → C, B → C, AB → C}
PHASE 2: Remove Extraneous LHS Attributes
Check AB → C (only FD with |LHS| > 1):
Is A extraneous in AB → C?
Compute B⁺ under the full F: B → C gives B⁺ = {B, C}, and C ∈ B⁺, so A is extraneous. Replace AB → C with B → C. But B → C already exists! This creates a duplicate.
F = {A → B, A → C, B → C, B → C}
Remove duplicate:
F = {A → B, A → C, B → C}
No more FDs with |LHS| > 1. Phase 2 complete.
PHASE 3: Remove Redundant FDs
Test A → C: with F' = {A → B, B → C}, A⁺ under F' is {A, B, C}, so C ∈ A⁺ and A → C is redundant. Remove it. A → B and B → C each survive the same test.
Final Canonical Cover:
Fc = {A → B, B → C}
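The full three-phase pipeline can be assembled into one runnable sketch and checked against this example. Names (`canonical_cover`, the frozenset-pair representation) are illustrative; Phase 3 additionally removes A → C as redundant (derivable via A → B, B → C), leaving {A → B, B → C}:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    # Phase 1: single-attribute RHS, duplicates dropped.
    F = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (lhs, frozenset(a))
            if fd not in F:
                F.append(fd)
    # Phase 2: remove extraneous LHS attributes (closure uses the FULL F).
    changed = True
    while changed:
        changed = False
        for i, (lhs, rhs) in enumerate(F):
            for a in sorted(lhs):
                if len(lhs) > 1 and rhs <= closure(lhs - {a}, F):
                    F[i] = (lhs - {a}, rhs)
                    if F.count(F[i]) > 1:   # reduction produced a duplicate
                        del F[i]
                    changed = True
                    break
            if changed:
                break   # restart after any modification
    # Phase 3: remove redundant FDs (closure uses F' = F - {fd}).
    changed = True
    while changed:
        changed = False
        for fd in F:
            rest = [g for g in F if g != fd]
            if fd[1] <= closure(fd[0], rest):
                F = rest
                changed = True
                break   # restart against the smaller F
    return F

# The worked example: F = {A → BC, B → C, A → B, AB → C}
F = [(frozenset("A"), frozenset("BC")),
     (frozenset("B"), frozenset("C")),
     (frozenset("A"), frozenset("B")),
     (frozenset("AB"), frozenset("C"))]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
```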
Why does this algorithm produce a valid minimal cover? Let's examine the correctness guarantees.
Theorem: The algorithm terminates with a valid minimal cover.
Proof Sketch:
Part 1: Termination
Each phase can only decrease the "size" of F: Phase 1 runs once; Phase 2 only removes attributes from left-hand sides (and possibly duplicate FDs); Phase 3 only removes FDs.
Since there are finitely many attributes and FDs, the algorithm must terminate.
Part 2: Equivalence Preservation
Each transformation preserves the closure:
Decomposition: X → AB ≡ {X → A, X → B}, by the decomposition and union rules.
Extraneous-attribute removal: if A is extraneous in XA → Y, then X → Y is derivable from F, so removing A preserves the closure.
Redundant-FD removal: the removed FD is, by the test, derivable from the remaining FDs, so the closure is unchanged.
Part 3: Minimality
The result satisfies all minimal cover conditions: every RHS is a single attribute, no LHS contains an extraneous attribute, and no FD is derivable from the others.
Therefore, the output is a valid minimal cover of the input F.
The algorithm has polynomial time complexity. For n FDs with at most a attributes per FD over a schema of m attributes: Phase 1 is O(n·a); Phase 2 is O(n²·a·closure_cost); Phase 3 is O(n²·closure_cost). Closure computation is O(n·a), making overall complexity O(n³·a²).
Let's work through a more complex example that exercises all parts of the algorithm.
Given FD Set:
F = {
AB → C,
C → A,
BC → D,
ACD → B,
D → EG,
BE → C,
CG → BD,
CE → AG
}
Schema: R(A, B, C, D, E, G)
PHASE 1: Decompose RHS
F = {
AB → C,
C → A,
BC → D,
ACD → B,
D → E,
D → G,
BE → C,
CG → B,
CG → D,
CE → A,
CE → G
}
(11 FDs after decomposition)
PHASE 2: Remove Extraneous LHS Attributes
Test each FD with |LHS| > 1:
AB → C:
Is A extraneous? B⁺ = {B}; C ∉ B⁺. Is B extraneous? A⁺ = {A}; C ∉ A⁺. Both attributes are essential.
BC → D:
Is B extraneous? C⁺ = {C, A}; D ∉ C⁺. Is C extraneous? B⁺ = {B}; D ∉ B⁺. Both essential.
ACD → B:
Is A extraneous? Compute CD⁺ under the full F: C → A adds A, D → E and D → G add E and G, then CG → B fires and adds B. So B ∈ CD⁺ and A is extraneous. Replace ACD → B with CD → B.
Restart Phase 2 with modified F...
CD → B: (newly modified)
Is C extraneous? D⁺ = {D, E, G}; B ∉. Is D extraneous? C⁺ = {C, A}; B ∉. Both essential.
BE → C:
Is B extraneous? E⁺ = {E}; C ∉. Is E extraneous? B⁺ = {B}; C ∉. Both essential.
CG → B, CG → D:
Is C extraneous? G⁺ = {G}. Is G extraneous? C⁺ = {C, A}. Neither closure contains B or D, so both attributes are essential in both FDs.
CE → A, CE → G:
Is C extraneous in CE → A? Compute E⁺ = {E}. A ∉ {E}. No.
Is E extraneous in CE → A? Compute C⁺ = {C, A}. A ∈ C⁺, so yes.
E is extraneous. Replace CE → A with C → A.
But C → A already exists! Remove duplicate.
For CE → G: C⁺ = {C, A} does not contain G, and E⁺ = {E} does not contain G, so both attributes are essential.
After Phase 2:
F = {
AB → C,
C → A,
BC → D,
CD → B,
D → E,
D → G,
BE → C,
CG → B,
CG → D,
CE → G
}
(10 FDs)
PHASE 3: Remove Redundant FDs
Test each FD for redundancy:
AB → C:
Compute AB⁺ under F' = F - {AB → C}: AB⁺ = {A, B}; C ∉. Not redundant.
C → A: Not redundant (only source of A from C).
BC → D:
BC⁺ under F' = {B, C, A}; D ∉. Not redundant.
CD → B:
CD⁺ under F' = F - {CD → B}: C → A, D → E, D → G give {C, D, A, E, G}, then CG → B fires, so B ∈ CD⁺. CD → B is redundant. Remove it.
Restart Phase 3...
CG → B:
CG⁺ under F' = {C, G, A, D, E} (via C → A, CG → D, D → E); B ∉. Not redundant.
CG → D:
CG⁺ under F': CG → B adds B, C → A adds A, then BC → D fires, so D ∈ CG⁺. CG → D is redundant. Remove it.
Continue rechecking the remaining FDs; none of them is derivable from the others, so no further removals occur.
Final Canonical Cover:
Fc = {
AB → C,
C → A,
BC → D,
D → E,
D → G,
BE → C,
CG → B,
CE → G
}
(8 FDs, down from 11 after decomposition)
Implementing the canonical cover algorithm correctly requires attention to subtle details. Here are common mistakes and how to avoid them.
Pitfall 1: Not Restarting After Changes
After removing an extraneous attribute or redundant FD, the FD set changes. Previous tests may now give different results. Always restart the relevant phase after any modification.
Pitfall 2: Wrong F for Closure Computation
Phase 2 computes the closure against the FULL current F (including the FD being reduced); Phase 3 computes it against F' = F - {X → A} (excluding the FD being tested). Mixing these up produces incorrect results.
Pitfall 3: Forgetting to Handle Duplicates
Phase 2 can create duplicate FDs:
- AB → C becomes B → C, but B → C already exists
- CD → A becomes C → A, but C → A already exists
Always check for and remove duplicates.
Pitfall 4: Empty LHS
If an FD's LHS becomes empty (all attributes extraneous), this indicates a constant—an attribute that has only one possible value. Handle this as a special case.
Pitfall 5: Confusing Equivalence with Equality
Two minimal covers may look different but be equivalent. Don't expect a unique answer—different processing orders yield different valid results.
| Pitfall | Incorrect Behavior | Correct Behavior |
|---|---|---|
| Not Restarting | Continue testing with stale F | Restart phase after each change |
| Wrong F for Closure | Use F for Phase 3 closure | Use F' = F - {FD} for Phase 3 |
| Ignoring Duplicates | Multiple identical FDs | Remove duplicates immediately |
| Empty LHS | Crash or undefined behavior | Flag as constant attribute |
| Expecting Uniqueness | Reject valid minimal covers | Accept equivalent variations |
Always verify that the output is equivalent to the input by checking both directions: every FD in the input should be derivable from the output, and every FD in the output should be derivable from the input. This catches most implementation bugs.
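This verification can be sketched directly from the definition of equivalence (both covering checks), reusing the closure helper; names are illustrative:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def equivalent(F, G):
    """F ≡ G iff every FD of each set follows from the other set."""
    def covers(src, tgt):
        return all(rhs <= closure(lhs, src) for lhs, rhs in tgt)
    return covers(F, G) and covers(G, F)

F  = [(frozenset("A"), frozenset("BC")), (frozenset("B"), frozenset("C"))]
Fc = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(equivalent(F, Fc))  # True: A → C is derivable in Fc via A → B, B → C
```

Because equivalence is checked via closures rather than syntactic equality, this test also accepts the different-but-equivalent minimal covers mentioned in Pitfall 5.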
We've covered the complete canonical cover algorithm in detail. Let's consolidate the key points: decompose to single-attribute right-hand sides first; remove extraneous LHS attributes using closures over the full F; remove redundant FDs using closures over F minus the FD under test; restart the relevant phase after every change; and remember that the result is equivalent to the input but not necessarily unique.
What's Next:
The next page explores the concept of removing redundancy more deeply—examining not just how to remove it, but why redundancy arises in practical database design scenarios and strategies for preventing it upfront.
You now understand the complete canonical cover algorithm and can execute it correctly on any FD set. You understand the correctness guarantees and common implementation pitfalls.