Loading learning content...
After all the work of decomposing, testing, and removing—what do you have? A simplified FD set, also known as a canonical cover or minimal cover. This is not just a cleaned-up version of your original FDs. It's the essential representation of your data's structural constraints—the irreducible core that captures everything about how your attributes relate.
This final page brings the entire module together. We'll examine what makes a simplified FD set valuable, how it connects to normalization, best practices for working with it, and common real-world scenarios. By the end, you'll not only know how to compute canonical covers but understand their place in the broader context of database design.
By the end of this page, you will understand the properties that make a simplified FD set useful, how it directly drives 3NF synthesis and other normalization algorithms, best practices for documenting and maintaining FD sets, and how to handle FD sets in real-world database projects.
A simplified FD set (canonical cover) satisfies four key properties that together guarantee its minimality and utility.
Property 1: Equivalence to Original
The simplified set Fc has the same closure as the original F. Every FD derivable from F is derivable from Fc, and vice versa.
Fc ≡ F ⟺ Fc⁺ = F⁺
This guarantees that no constraint information is lost.
Property 2: Single-Attribute Right-Hand Sides
Every FD in Fc has exactly one attribute on its right-hand side. This uniform structure simplifies all subsequent processing.
∀(X → A) ∈ Fc: |A| = 1
Property 3: Minimal Left-Hand Sides
No attribute on any left-hand side is extraneous. Each attribute is essential for deriving the right-hand side.
∀(X → A) ∈ Fc, ∀B ∈ X: A ∉ (X - {B})⁺ under Fc
Property 4: No Redundant FDs
No FD in Fc can be derived from the remaining FDs. Each one contributes unique information.
∀(X → A) ∈ Fc: A ∉ X⁺ under (Fc - {X → A})
| Property | What It Guarantees | Why It Matters |
|---|---|---|
| Equivalence | Same constraints as original | No information lost |
| Single RHS | Uniform FD structure | Simpler algorithms |
| Minimal LHS | Essential determinants only | Reveals true keys |
| No redundancy | Each FD is essential | Optimal representation |
We use 'simplified' to emphasize that this is the cleanest, most direct representation of your FD set. 'Canonical' emphasizes standardization; 'minimal' emphasizes size reduction. All three terms refer to the same concept.
The primary application of simplified FD sets is the 3NF synthesis algorithm, which constructs a normalized schema directly from the canonical cover.
3NF Synthesis Algorithm:
INPUT: Simplified FD set Fc over schema R
OUTPUT: 3NF decomposition of R
1. For each FD (X → A) in Fc:
Create a relation schema with attributes X ∪ {A}
2. If no relation from step 1 contains a candidate key of R:
Add a relation containing any candidate key
3. Remove any relation whose attributes are a subset of another's
Why Canonical Cover is Required:
Redundant FDs create extra tables: If A → C is in Fc but is redundant (derivable from A → B, B → C), step 1 creates an unnecessary relation {A, C}.
Extraneous LHS creates wrong keys: If AB → C is in Fc but B is extraneous, the relation {A, B, C} has an unnecessarily complex key.
Multi-attribute RHS creates incomplete tables: If A → BC is in Fc without decomposition, you get one relation {A, B, C} instead of potentially separate relations.
EXAMPLE: 3NF Synthesis from Canonical Cover Canonical Cover Fc:{ StudentID → StudentName, (1) CourseID → CourseTitle, (2) CourseID → DepartmentID, (3) InstructorID → InstructorName, (4) DepartmentID → DepartmentName, (5) StudentID, CourseID → Grade, (6) StudentID, CourseID → InstructorID, (7) StudentID, CourseID → Semester (8)} Step 1: Create relations for each FD: R1(StudentID, StudentName) -- from (1)R2(CourseID, CourseTitle) -- from (2)R3(CourseID, DepartmentID) -- from (3)R4(InstructorID, InstructorName) -- from (4)R5(DepartmentID, DepartmentName) -- from (5)R6(StudentID, CourseID, Grade) -- from (6)R7(StudentID, CourseID, InstructorID) -- from (7)R8(StudentID, CourseID, Semester) -- from (8) Step 2: Check for candidate key coverage- Candidate key: {StudentID, CourseID}- R6, R7, and R8 all contain the candidate key ✓ Step 3: Remove subset relations- R2 and R3 can be merged: R2'(CourseID, CourseTitle, DepartmentID)- R6, R7, R8 can be merged: R6'(StudentID, CourseID, Grade, InstructorID, Semester) Final 3NF Schema:R1(StudentID, StudentName) -- Key: StudentIDR2(CourseID, CourseTitle, DepartmentID) -- Key: CourseIDR4(InstructorID, InstructorName) -- Key: InstructorIDR5(DepartmentID, DepartmentName) -- Key: DepartmentIDR6(StudentID, CourseID, Grade, InstructorID, Semester) -- Key: (StudentID, CourseID)In practice, FDs with identical left-hand sides produce relations that can be merged. StudentID, CourseID → Grade and StudentID, CourseID → InstructorID become a single relation (StudentID, CourseID, Grade, InstructorID). This is the 'undecompose' step after synthesis.
While 3NF synthesis uses the canonical cover directly, BCNF decomposition uses a different approach. However, the canonical cover still plays an important role.
BCNF Decomposition Algorithm:
INPUT: Relation R with FD set F
OUTPUT: BCNF decomposition of R
WHILE there exists a relation Ri violating BCNF:
Find an FD X → A in Ri where X is not a superkey
Decompose Ri into:
- R1 = X ∪ {A} (with FD X → A)
- R2 = Ri - {A} (with projected FDs)
Role of Canonical Cover:
Example:
R(A, B, C, D) with Fc = {A → B, BC → D}
Candidate key: {A, C} (since A→B and BC→D cover all attributes)
BCNF check:
- A → B: Is A a superkey? {A}⁺ = {A, B}. Not a superkey. VIOLATION!
Decompose:
- R1(A, B) with key A
- R2(A, C, D)
Check R2 for BCNF:
- Project FDs onto R2: Need to derive FDs over {A, C, D} from Fc
- BC → D projects to... nothing (B not in R2)
- We need to check if any FD X → A holds where X ⊂ {A, C, D}
- AC → D? Compute (AC)⁺ = {A, C, B, D}. Yes!
- Is AC a superkey of R2? Yes! (covers all of A, C, D)
- No BCNF violation in R2.
Final: R1(A, B), R2(A, C, D) - both in BCNF.
Unlike 3NF synthesis which guarantees dependency preservation, BCNF decomposition may split FDs across tables. In the example, BC → D cannot be checked without joining R1 and R2. This is a fundamental tradeoff between BCNF and 3NF.
As we've discussed, a single FD set can have multiple valid canonical covers. In practice, this raises questions about which to use and whether the choice matters.
When Choice Matters:
Cover 1: {A → B, B → A, A → C} → Tables: R1(A,B), R2(A,C)
Cover 2: {A → B, B → A, B → C} → Tables: R1(A,B), R2(B,C)
Both are valid 3NF. Which is better depends on query patterns.
Cover 1: {EmployeeID → DeptID, DeptID → ManagerID}
Cover 2: {EmployeeID → DeptID, EmployeeID → ManagerID}
If business users think in terms of "employee's manager via department",
Cover 1 is clearer.
When Choice Doesn't Matter:
When multiple canonical covers exist, prefer the one where: (1) FDs most directly express business rules, (2) Determinants align with expected query predicates, (3) The number of FDs is smallest (fewer constraint checks). Often, domain knowledge guides the best choice.
SCENARIO: E-commerce Order System Original FDs:{ OrderID → CustomerID, CustomerID → CustomerEmail, OrderID → CustomerEmail, // Transitive ProductID → ProductName, ProductID → CategoryID, CategoryID → CategoryName, ProductID → CategoryName // Transitive} Canonical Cover Option 1 (Transitive chains):{ OrderID → CustomerID, CustomerID → CustomerEmail, ProductID → ProductName, ProductID → CategoryID, CategoryID → CategoryName} Canonical Cover Option 2 (Direct relationships):{ OrderID → CustomerID, OrderID → CustomerEmail, // Keep this, remove CustomerID → CustomerEmail ... invalid - can't derive CustomerID → CustomerEmail} Actually, Option 2 isn't valid because we'd lose CustomerID → CustomerEmail. The point: sometimes there's really only one valid canonical cover,determined by the fundamental structure of dependencies.A canonical cover is only useful if it's properly documented and maintained. Here are best practices for working with FD sets in practice.
Documentation Format:
# Functional Dependencies for [Schema Name]
# Last computed: [Date]
# Original FD count: [N], Canonical cover count: [M]
## Entity: Student
StudentID → StudentName // Student's full name
StudentID → Email // University email (unique)
StudentID → EnrollmentYear // Year of first enrollment
## Entity: Course
CourseID → CourseTitle // Official course title
CourseID → DepartmentID // Offering department
CourseID → Credits // Credit hours (3 or 4)
## Relationship: Enrollment
(StudentID, CourseID) → Grade // Final grade (A-F)
(StudentID, CourseID) → Semester // When taken (F22, S23, etc.)
(StudentID, CourseID) → InstructorID // Section instructor
Organizational Guidelines:
FD documentation should be treated as living documentation, updated whenever the schema changes. An outdated FD set can lead to incorrect normalization decisions when the schema evolves.
While understanding the algorithms is essential, practical work with FDs often involves tool support.
What Tools Can Do:
Types of Tools:
Academic/Educational Tools:
Database Design Tools:
Custom Scripts:
Limitations:
| Capability | Web Calculators | Design Tools | Custom Scripts |
|---|---|---|---|
| Closure computation | ✓ | ✓ | ✓ |
| Key finding | ✓ | ✓ | ✓ |
| Canonical cover | ✓ | Sometimes | ✓ |
| Normal form check | ✓ | ✓ | ✓ |
| Decomposition | Sometimes | ✓ | Custom |
| Large FD sets | Limited | Good | Depends |
| Integration | None | Some | Full control |
| Cost | Free | $$-$$$ | Dev time |
Use tools to speed up routine calculations, but understand the algorithms well enough to verify results. A tool might have bugs, or your input might be wrong. Manual verification catches these issues.
Let's examine a realistic scenario where canonical cover computation is essential.
Scenario: Healthcare Records Integration
A hospital is merging three legacy systems:
Each system has its own FD set, gathered from different teams:
-- From Demographics System:
PatientID → Name, DOB, SSN
SSN → PatientID (unique identifier)
PatientID → PrimaryPhysicianID
PrimaryPhysicianID → PhysicianName
-- From Scheduling System:
AppointmentID → PatientID, PhysicianID, Date, Time
PatientID, Date, PhysicianID → AppointmentID (business rule: one apt per patient/doctor/day)
PhysicianID → PhysicianName (duplicate!)
PhysicianID → SpecialtyID
-- From Medical Records:
RecordID → PatientID, PhysicianID, Date, Diagnosis
PatientID → PatientName (different attribute name!)
PatientID → SSN (derived, but listed separately)
PhysicianID → SpecialtyID (duplicate!)
SpecialtyID → SpecialtyName
Step 1: Normalize Attribute Names
Name, PatientName → PatientName (standardize)
PrimaryPhysicianID → PhysicianID (if same entity)
Step 2: Merge All FDs
Combined F = {
PatientID → PatientName, DOB, SSN,
SSN → PatientID,
PatientID → PhysicianID, // Primary physician
PhysicianID → PhysicianName, SpecialtyID,
SpecialtyID → SpecialtyName,
AppointmentID → PatientID, PhysicianID, Date, Time,
PatientID, Date, PhysicianID → AppointmentID,
RecordID → PatientID, PhysicianID, Date, Diagnosis,
PatientID → SSN // Redundant with first FD!
}
Step 3: Compute Canonical Cover
After decomposition, extraneous removal, and redundancy elimination:
Fc = {
PatientID → PatientName,
PatientID → DOB,
PatientID → SSN,
SSN → PatientID,
PatientID → PhysicianID,
PhysicianID → PhysicianName,
PhysicianID → SpecialtyID,
SpecialtyID → SpecialtyName,
AppointmentID → PatientID,
AppointmentID → PhysicianID,
AppointmentID → Date,
AppointmentID → Time,
PatientID, Date, PhysicianID → AppointmentID,
RecordID → PatientID,
RecordID → PhysicianID,
RecordID → Date,
RecordID → Diagnosis
}
Result:
3NF Schema:
Patient(PatientID, PatientName, DOB, SSN, PhysicianID)
Physician(PhysicianID, PhysicianName, SpecialtyID)
Specialty(SpecialtyID, SpecialtyName)
Appointment(AppointmentID, PatientID, PhysicianID, Date, Time)
MedicalRecord(RecordID, PatientID, PhysicianID, Date, Diagnosis)
What started as 20+ FDs from three systems became a clean, non-redundant canonical cover. The resulting schema eliminates duplicate data, clarifies keys, and provides a solid foundation for the merged system.
Even with solid understanding, practitioners encounter common pitfalls when working with canonical covers.
Pitfall 1: Incomplete FD Discovery
Problem: FD set doesn't capture all business rules. Symptom: Normalization produces schema that can't enforce constraints. Solution: Systematic requirements gathering; verify FDs with multiple stakeholders.
Pitfall 2: Conflicting FDs
Problem: Different sources provide contradictory dependencies. Symptom: No valid canonical cover; algorithm fails. Solution: Resolve conflicts before computing cover; determine which constraint is correct.
Pitfall 3: Over-Reliance on Data Mining
Problem: FDs inferred from data may be accidental. Symptom: False dependencies that break with new data. Solution: Verify mined FDs against business rules; don't trust data alone.
Pitfall 4: Ignoring Nulls
Problem: FD theory assumes no nulls; real data has nulls. Symptom: FDs that hold in theory fail in practice. Solution: Document null handling; consider FDs on non-null subsets.
Pitfall 5: Static Analysis
Problem: FD set computed once, never updated. Symptom: Schema drift; FDs no longer match reality. Solution: Include FD review in change management process.
| Pitfall | Detection | Prevention | Recovery |
|---|---|---|---|
| Incomplete FDs | Schema can't enforce rules | Systematic discovery | Revisit requirements |
| Conflicting FDs | Algorithm failure | Source reconciliation | Resolve before computing |
| False mined FDs | Constraint violations | Business rule verification | Remove false FDs |
| Null issues | Runtime errors | Explicit null handling | Adjust FD scope |
| Stale FDs | Schema drift | Change management | Recompute cover |
Congratulations! You've completed the comprehensive treatment of canonical covers. Let's review everything we've covered in this module:
The Bigger Picture:
Canonical cover computation is a foundational technique in database theory. It connects FD analysis to normalization, enabling the systematic construction of well-designed schemas. Every professional database designer should be able to:
You now have all these capabilities.
You have mastered canonical cover computation—from theory to practice. You understand not just HOW to compute minimal covers, but WHY each step is necessary and HOW the result connects to normalized schema design. This is world-class understanding of a fundamental database concept.
What's Next:
The next module, FD and Keys, explores the relationship between functional dependencies and candidate keys in greater depth. You'll learn to systematically find all candidate keys from an FD set and understand how keys connect to normalization levels.