Loading content...
Functional dependencies (FDs) are the DNA of database design. Just as geneticists must identify gene sequences before they can understand heredity, database engineers must identify functional dependencies before they can normalize effectively. This seemingly simple concept—that one attribute determines another—underlies every normalization decision, every decomposition strategy, and every trade-off analysis you'll encounter in interviews and production systems.
Yet here's the challenge: FD identification is rarely taught as a systematic skill. Most students learn the definition ("X → Y means X functionally determines Y") but struggle when faced with a real-world schema and asked: What are all the functional dependencies here? The gap between understanding the concept and applying it reliably under interview pressure is where most candidates stumble.
By the end of this page, you will possess a systematic methodology for identifying functional dependencies from business rules, existing data, schema constraints, and interview problem statements. You'll understand the difference between discovering FDs and inferring them, and you'll be able to generate the complete closure of FDs from a minimal set—skills that transform normalization from guesswork into engineering precision.
Before we can identify functional dependencies reliably, we must first understand what they truly represent—not just memorize the textbook definition.
The Formal Definition:
A functional dependency X → Y holds in a relation R if and only if, for any two tuples t₁ and t₂ in R:
If t₁[X] = t₂[X], then t₁[Y] = t₂[Y]
In plain English: Whenever two rows have the same value(s) for attribute(s) X, they must have the same value(s) for attribute(s) Y.
But this formal definition obscures the deeper insight: A functional dependency encodes a business constraint. It represents a semantic rule about your domain—a promise that the real world makes about how certain pieces of information relate to each other.
Every valid FD corresponds to a business rule. StudentID → StudentName means 'Each student has exactly one name.' ISBN → Title means 'Each book has exactly one title.' When identifying FDs, you're really asking: What real-world constraints govern this data? This perspective transforms FD identification from a mechanical exercise into domain modeling.
The Distinction Between Data and Schema FDs:
Here's a subtle but critical point that trips up many candidates:
Schema-level FDs are constraints that must hold for ALL valid instances of a relation. These are the FDs that matter for normalization.
Data-level observations are patterns that happen to hold in the current data but might not be guaranteed.
Example: Suppose in your current employee table, all employees named 'John' happen to work in the Sales department. The data suggests Name → Department. But is this a business rule? Almost certainly not—the next John hired could work in Engineering. This is an accidental coincidence, not a functional dependency.
Critical insight for interviews: When given sample data, don't just look for patterns in the data. Ask: Is this pattern a business rule that must always hold, or a coincidence of the sample? The interviewer is testing whether you understand the difference between correlation in data and causation in design.
| Situation | Example | Is It an FD? | Why? |
|---|---|---|---|
| Business rule explicitly stated | Each order has one customer | Yes (OrderID → CustomerID) | Business explicitly guarantees this |
| Unique constraint exists | Email is unique per user | Yes (Email → UserID) | Database enforces uniqueness |
| Pattern in sample data | All red cars are sedans in data | Probably No | No business rule requires this |
| Domain knowledge | SSN identifies a person | Yes (SSN → PersonDetails) | Legal/government guarantee |
| Assumption about uniqueness | All employees have different names | No (Name is not a key) | Names are not guaranteed unique |
In real-world scenarios and interviews, FDs emerge from multiple sources. A skilled database engineer draws from all of these to build a complete picture of the dependencies.
Source 1: Explicit Business Rules
The most reliable source. When stakeholders say things like:
These translate directly into FDs:
Source 2: Unique Constraints and Keys
Any attribute(s) marked as UNIQUE or PRIMARY KEY are the left side of FDs determining all other attributes:
This is why key identification and FD identification are deeply linked—we'll explore this connection in Page 4.
A candidate key K of relation R is a minimal set of attributes such that K → R (K determines all attributes) and no proper subset of K has this property. Thus, every key gives us FDs, and conversely, FDs help us find keys. This bidirectional relationship is fundamental to normalization theory.
Source 3: Domain Knowledge
Some FDs are so universally true that they're assumed:
Interview tip: If you recognize domain-specific dependencies, mention them—it shows depth of understanding.
Source 4: Derived/Transitive Dependencies
Once you identify some FDs, you can derive others using Armstrong's Axioms:
This is crucial for identifying potential 3NF violations.
Source 5: Composite Dependencies
Sometimes no single attribute determines another, but a combination does:
These composite determinants often become parts of composite primary keys.
Let's formalize a systematic process for FD identification that you can apply consistently in interviews and design reviews.
Phase 1: Enumerate All Attributes
Start by listing every attribute in the relation. For a relation R(A, B, C, D, E), write:
Phase 2: Identify Obvious Single-Attribute Determinants
For each attribute, ask: Does this attribute alone determine any others?
Phase 3: Identify Composite Determinants
For each pair, triple, etc., ask: Does this combination determine attributes that single attributes don't?
Phase 4: Apply Transitivity
For each FD X → Y you've found, check: Does Y determine anything else?
Phase 5: Validate Against Business Rules
Review each FD you've identified:
Relation: Enrollment(StudentID, StudentName, CourseID, CourseName, InstructorID, InstructorName, Grade)
Business Rules:
1. Each student has a unique StudentID
2. Each course has a unique CourseID
3. Each instructor has a unique InstructorID
4. Each course has one instructor
5. A student can enroll in many courses
6. A student's grade is specific to a course**Identified FDs:**
• StudentID → StudentName (Rule 1: students are unique)
• CourseID → CourseName, InstructorID (Rule 2, 4: courses are unique and have one instructor)
• InstructorID → InstructorName (Rule 3: instructors are unique)
• {StudentID, CourseID} → Grade (Rule 6: grade is per enrollment)
**Derived FDs (via transitivity):**
• CourseID → InstructorName (CourseID → InstructorID → InstructorName)
**Candidate Key:** {StudentID, CourseID}
(This minimal set determines all attributes)Mistake 1: Assuming X → Y because Y appears after X in the schema. Column order is irrelevant to FDs.
Mistake 2: Confusing many-to-one with one-to-many. 'Each employee has one department' means EmployeeID → DeptID, not DeptID → EmployeeID.
Mistake 3: Missing composite keys. When no single attribute is a key, a combination must be.
Mistake 4: Assuming symmetry. X → Y does NOT imply Y → X unless both are candidate keys.
Once you've identified initial FDs, Armstrong's Axioms let you derive all implied FDs—essential for complete normalization analysis.
Armstrong's Axioms (Sound and Complete)
1. Reflexivity: If Y ⊆ X, then X → Y
2. Augmentation: If X → Y, then XZ → YZ
3. Transitivity: If X → Y and Y → Z, then X → Z
Derived Rules (from the axioms):
4. Union: If X → Y and X → Z, then X → YZ
5. Decomposition: If X → YZ, then X → Y and X → Z
6. Pseudotransitivity: If X → Y and WY → Z, then WX → Z
1234567891011121314151617181920212223242526272829303132333435363738394041
/** * Attribute Closure Algorithm * Input: Set of attributes X, Set of FDs F * Output: X+ (all attributes determined by X under F) */function computeClosure(X: Set<Attribute>, F: Set<FD>): Set<Attribute> { let closure = new Set(X); // Initialize with X itself let changed = true; while (changed) { changed = false; for (const fd of F) { // If left side of FD is contained in closure if (isSubset(fd.leftSide, closure)) { // Add right side to closure const sizeBefore = closure.size; addAll(closure, fd.rightSide); // Check if closure expanded if (closure.size > sizeBefore) { changed = true; } } } } return closure;} // Example: Computing closure of {StudentID} given FDs// FDs: {StudentID → Name, StudentID → DeptID, DeptID → DeptName}// // Iteration 1: closure = {StudentID}// - Apply StudentID → Name: closure = {StudentID, Name}// - Apply StudentID → DeptID: closure = {StudentID, Name, DeptID}// // Iteration 2: closure = {StudentID, Name, DeptID}// - Apply DeptID → DeptName: closure = {StudentID, Name, DeptID, DeptName}// // Result: {StudentID}+ = {StudentID, Name, DeptID, DeptName}A common interview question: 'Given FDs {A → B, B → C, C → D}, what is A⁺?' Use the algorithm: Start with {A}. Apply A → B: {A, B}. Apply B → C: {A, B, C}. Apply C → D: {A, B, C, D}. Answer: A⁺ = {A, B, C, D}. This means A is a superkey (determines everything) and potentially a candidate key.
Why Closure Matters:
Key Identification: X is a superkey iff X⁺ = R (all attributes). X is a candidate key iff X is a minimal superkey.
FD Verification: To check if X → Y holds, compute X⁺ and verify Y ⊆ X⁺.
Minimal Cover: Computing closures helps find a minimal set of FDs equivalent to the original.
Normalization: Closure reveals all transitive and partial dependencies for NF analysis.
Given a set of FDs, there may be redundancy—some FDs can be derived from others. The canonical cover (or minimal basis) is the smallest equivalent set of FDs, crucial for efficient database design.
Why We Need Canonical Covers:
Properties of a Canonical Cover Fc:
12345678910111213141516171819202122232425262728293031323334353637383940414243
/** * Canonical Cover Algorithm * Input: Set of FDs F * Output: Minimal canonical cover Fc equivalent to F */function computeCanonicalCover(F: Set<FD>): Set<FD> { // Step 1: Decompose right sides to single attributes let Fc = decomposeRightSides(F); // {A → BC} becomes {A → B, A → C} // Step 2: Remove extraneous left-side attributes for (const fd of Fc) { for (const attr of fd.leftSide) { // Check if attr is extraneous const reducedLeft = fd.leftSide.minus(attr); const closureWithout = computeClosure(reducedLeft, Fc); if (closureWithout.contains(fd.rightSide)) { // attr is extraneous, remove it fd.leftSide = reducedLeft; } } } // Step 3: Remove extraneous right-side attributes // (After Step 1, right sides are singletons, so skip) // Step 4: Remove redundant FDs for (const fd of Fc) { const FcMinusFD = Fc.minus(fd); const closureWithout = computeClosure(fd.leftSide, FcMinusFD); if (closureWithout.contains(fd.rightSide)) { // fd is redundant, remove it Fc = FcMinusFD; } } // Step 5: Combine FDs with same left side (optional) // {A → B, A → C} can become {A → BC} return Fc;}F = {A → BC, B → C, A → B, AB → C}
Find the canonical cover.**Step 1: Decompose right sides**
F₁ = {A → B, A → C, B → C, A → B, AB → C}
**Step 2: Remove duplicate {A → B}**
F₂ = {A → B, A → C, B → C, AB → C}
**Step 3: Check for extraneous left attributes in AB → C**
- Is A extraneous? Check if B → C is in F₂. Yes! B → C exists.
- So AB → C becomes B → C (but it's already there)
- Remove AB → C as redundant
F₃ = {A → B, A → C, B → C}
**Step 4: Check A → C for redundancy**
- Under {A → B, B → C}, compute A⁺ = {A, B, C}
- C is in A⁺, so A → C is derivable (by transitivity)
- Remove A → C
**Canonical Cover: Fc = {A → B, B → C}**Let's apply everything to realistic interview problems. These are the types of scenarios you'll encounter.
Scenario Type 1: Given Business Description
You're given a natural language description of a business and asked to identify FDs.
Example: "A hospital tracks patients, doctors, and appointments. Each patient has an ID, name, and date of birth. Each doctor has an ID, name, and specialty. Appointments are scheduled between a patient and a doctor on a specific date and time, with notes recorded."
FD Extraction:
Scenario Type 2: Given Schema with Constraints
You're given a CREATE TABLE statement and must identify FDs.
1234567891011121314151617181920212223242526272829
-- Given this schema, identify all FDsCREATE TABLE OrderDetails ( OrderID INT, ProductID INT, CustomerID INT, CustomerName VARCHAR(100), ProductName VARCHAR(100), UnitPrice DECIMAL(10,2), Quantity INT, OrderDate DATE, PRIMARY KEY (OrderID, ProductID), FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID), FOREIGN KEY (ProductID) REFERENCES Products(ProductID)); -- FD Analysis:-- 1. From PRIMARY KEY: {OrderID, ProductID} → all attributes-- 2. From FK to Customers: CustomerID → CustomerName-- 3. From FK to Products: ProductID → ProductName, UnitPrice-- 4. From business logic: OrderID → CustomerID, CustomerName, OrderDate-- (each order belongs to one customer, has one date)-- 5. Combining: {OrderID, ProductID} → Quantity-- (quantity is specific to each line item) -- Complete FD Set:-- OrderID → CustomerID, CustomerName, OrderDate-- CustomerID → CustomerName (transitive via Orders-Customer)-- ProductID → ProductName, UnitPrice-- {OrderID, ProductID} → QuantityScenario Type 3: Given Sample Data
You're shown a table with sample data and asked to identify FDs. This tests the critical distinction between data coincidence and schema constraint.
| EmpID | Name | DeptID | DeptName | ManagerID | Salary |
|---|---|---|---|---|---|
| E001 | Alice | D1 | Engineering | E005 | 90000 |
| E002 | Bob | D1 | Engineering | E005 | 85000 |
| E003 | Charlie | D2 | Marketing | E006 | 75000 |
| E004 | Diana | D2 | Marketing | E006 | 80000 |
| E005 | Eve | D1 | Engineering | NULL | 120000 |
Analyzing the Sample Data:
✅ Valid FDs (business rules):
❌ Invalid patterns (data coincidence):
⚠️ Question the interviewer: "Is the department manager fixed per department, or can employees in the same department have different managers?"
This clarifying question demonstrates sophisticated thinking.
For completeness, let's cover advanced FD concepts that occasionally appear in interviews and are essential for complete mastery.
Multivalued Dependencies (MVDs)
Notation: X ↠ Y (read "X multi-determines Y")
Meaning: For a given X value, the set of Y values is independent of the set of Z values (where Z is all other attributes).
Example: A course can have multiple instructors and multiple textbooks independently. CourseID ↠ Instructor and CourseID ↠ Textbook. The choice of instructors doesn't depend on the choice of textbooks.
MVDs are relevant for 4NF—a table violating 4NF has non-trivial MVDs that aren't FDs.
Join Dependencies (JDs)
Notation: (R1, R2, ..., Rn)
Meaning: The table can be losslessly decomposed into R1, R2, ..., Rn and rejoined.
JDs are relevant for 5NF (Project-Join Normal Form). Most practical databases don't require 5NF analysis, but awareness demonstrates depth.
Embedded/Implied Dependencies
Some FDs are embedded within larger FDs:
In most interviews, demonstrating mastery of basic FDs is sufficient. Mention MVDs and JDs only if: (1) the interviewer explicitly asks about higher normal forms, (2) the problem clearly involves multivalued facts, or (3) you want to show depth by briefly acknowledging the existence of these concepts. Don't over-complicate simple problems.
Lossless-Join Decomposition Test
When decomposing R into R1 and R2, the decomposition is lossless iff:
In plain English: The common attributes must be a superkey of at least one decomposed relation.
This is critical for the decomposition problems we'll cover in Page 3.
Let's consolidate your learning with practice problems that mirror real interview questions.
OrderLine(OrderID, ProductID, CustomerID, CustomerEmail, ProductName, ProductCategory, Quantity, UnitPrice, TotalPrice, OrderDate)
Business Rules:
- Each order belongs to one customer
- Customers have unique emails
- Products have unique names and belong to one category
- An order can have multiple products (line items)
- TotalPrice = Quantity × UnitPrice**FDs Identified:**
1. CustomerID → CustomerEmail
2. CustomerEmail → CustomerID (email is unique, so bidirectional)
3. ProductID → ProductName, ProductCategory, UnitPrice
4. OrderID → CustomerID, CustomerEmail, OrderDate
5. {OrderID, ProductID} → Quantity, TotalPrice
**Derived FD:**
6. {Quantity, UnitPrice} → TotalPrice (calculated field)
**Candidate Key:** {OrderID, ProductID}
**Normalization Issues:**
- CustomerEmail is transitively dependent (OrderID → CustomerID → CustomerEmail): 3NF violation
- ProductName, ProductCategory, UnitPrice are partially dependent on key: 2NF violation
- Schema needs decompositionR(A, B, C, D, E)
FDs: {A → B, BC → D, D → E, E → A}
Compute A⁺, B⁺, {B,C}⁺, and identify all candidate keys.**A⁺ Computation:**
{A} → {A, B} (A → B)
No more expansions possible.
**A⁺ = {A, B}**
**B⁺ Computation:**
{B} - no applicable FDs
**B⁺ = {B}**
**{B,C}⁺ Computation:**
{B, C} → {B, C, D} (BC → D)
{B, C, D} → {B, C, D, E} (D → E)
{B, C, D, E} → {A, B, C, D, E} (E → A)
**{B,C}⁺ = {A, B, C, D, E} = R**
**Candidate Keys:**
- {B, C} is a superkey (proven above)
- Is {B} alone a key? B⁺ = {B} ≠ R. No.
- Is {C} alone a key? C⁺ = {C} ≠ R. No.
- Therefore, {B, C} is a **candidate key** (minimal superkey)
**Other candidate keys?**
- From cycle E → A → B, check if any leads back...
- {C, E}: {C, E} → {A, C, E} → {A, B, C, E} → {A, B, C, D, E}. Yes! {C, E} is also a candidate key.
**Candidate Keys: {B, C} and {C, E}**What's Next:
With FD identification mastered, we'll move to Page 2: Normal Form Analysis—where you'll learn to systematically determine which normal form a relation satisfies and identify the specific violations that need to be addressed. The ability to identify FDs is prerequisite; now we'll put those FDs to work.
You now possess a systematic, rigorous approach to functional dependency identification. This foundational skill unlocks all subsequent normalization analysis. Practice with real schemas—every database you encounter is an opportunity to identify FDs and verify your understanding.