Normalization Problems - Learning Module

Loading content...

0/241

FD Identification

The Foundation of Normalization Excellence

Functional dependencies (FDs) are the DNA of database design. Just as geneticists must identify gene sequences before they can understand heredity, database engineers must identify functional dependencies before they can normalize effectively. This seemingly simple concept—that one attribute determines another—underlies every normalization decision, every decomposition strategy, and every trade-off analysis you'll encounter in interviews and production systems.

Yet here's the challenge: FD identification is rarely taught as a systematic skill. Most students learn the definition ("X → Y means X functionally determines Y") but struggle when faced with a real-world schema and asked: What are all the functional dependencies here? The gap between understanding the concept and applying it reliably under interview pressure is where most candidates stumble.

What You Will Master

By the end of this page, you will possess a systematic methodology for identifying functional dependencies from business rules, existing data, schema constraints, and interview problem statements. You'll understand the difference between discovering FDs and inferring them, and you'll be able to generate the complete closure of FDs from a minimal set—skills that transform normalization from guesswork into engineering precision.

Understanding Functional Dependencies at a Deep Level

Before we can identify functional dependencies reliably, we must first understand what they truly represent—not just memorize the textbook definition.

The Formal Definition:

A functional dependency X → Y holds in a relation R if and only if, for any two tuples t₁ and t₂ in R:

If t₁[X] = t₂[X], then t₁[Y] = t₂[Y]

In plain English: Whenever two rows have the same value(s) for attribute(s) X, they must have the same value(s) for attribute(s) Y.

But this formal definition obscures the deeper insight: A functional dependency encodes a business constraint. It represents a semantic rule about your domain—a promise that the real world makes about how certain pieces of information relate to each other.

The Business Rule Perspective

Every valid FD corresponds to a business rule. StudentID → StudentName means 'Each student has exactly one name.' ISBN → Title means 'Each book has exactly one title.' When identifying FDs, you're really asking: What real-world constraints govern this data? This perspective transforms FD identification from a mechanical exercise into domain modeling.

The Distinction Between Data and Schema FDs:

Here's a subtle but critical point that trips up many candidates:

Schema-level FDs are constraints that must hold for ALL valid instances of a relation. These are the FDs that matter for normalization.
Data-level observations are patterns that happen to hold in the current data but might not be guaranteed.

Example: Suppose in your current employee table, all employees named 'John' happen to work in the Sales department. The data suggests Name → Department. But is this a business rule? Almost certainly not—the next John hired could work in Engineering. This is an accidental coincidence, not a functional dependency.

Critical insight for interviews: When given sample data, don't just look for patterns in the data. Ask: Is this pattern a business rule that must always hold, or a coincidence of the sample? The interviewer is testing whether you understand the difference between correlation in data and causation in design.

Schema FDs vs. Data Coincidences
Situation	Example	Is It an FD?	Why?
Business rule explicitly stated	Each order has one customer	Yes (OrderID → CustomerID)	Business explicitly guarantees this
Unique constraint exists	Email is unique per user	Yes (Email → UserID)	Database enforces uniqueness
Pattern in sample data	All red cars are sedans in data	Probably No	No business rule requires this
Domain knowledge	SSN identifies a person	Yes (SSN → PersonDetails)	Legal/government guarantee
Assumption about uniqueness	All employees have different names	No (Name is not a key)	Names are not guaranteed unique

Sources of Functional Dependencies

In real-world scenarios and interviews, FDs emerge from multiple sources. A skilled database engineer draws from all of these to build a complete picture of the dependencies.

Source 1: Explicit Business Rules

The most reliable source. When stakeholders say things like:

"Each customer has exactly one billing address"
"An order can only belong to one customer"
"Each product has a unique SKU"

These translate directly into FDs:

CustomerID → BillingAddress
OrderID → CustomerID
SKU → ProductDetails

Source 2: Unique Constraints and Keys

Any attribute(s) marked as UNIQUE or PRIMARY KEY are the left side of FDs determining all other attributes:

PRIMARY KEY (EmployeeID) implies: EmployeeID → {Name, Department, Salary, ...}
UNIQUE (Email) implies: Email → {UserID, Name, ...}

This is why key identification and FD identification are deeply linked—we'll explore this connection in Page 4.

The Key-FD Relationship

A candidate key K of relation R is a minimal set of attributes such that K → R (K determines all attributes) and no proper subset of K has this property. Thus, every key gives us FDs, and conversely, FDs help us find keys. This bidirectional relationship is fundamental to normalization theory.

Source 3: Domain Knowledge

Some FDs are so universally true that they're assumed:

ISBN → BookTitle, Author, Publisher (publishing industry standard)
SSN → PersonName, DateOfBirth (government ID)
ZIPCode → City, State (US postal standard—mostly)
VIN → CarMake, Model, Year (automotive standard)

Interview tip: If you recognize domain-specific dependencies, mention them—it shows depth of understanding.

Source 4: Derived/Transitive Dependencies

Once you identify some FDs, you can derive others using Armstrong's Axioms:

If StudentID → DepartmentID and DepartmentID → DepartmentName
Then StudentID → DepartmentName (transitivity)

This is crucial for identifying potential 3NF violations.

Source 5: Composite Dependencies

Sometimes no single attribute determines another, but a combination does:

StudentID, CourseID → Grade (a student's grade in a specific course)
OrderID, ProductID → Quantity (quantity of a specific product in an order)

These composite determinants often become parts of composite primary keys.

FD Source Checklist for Interviews

•Read the problem statement carefully — Business rules are often stated directly. 'Each X has one Y' means X → Y.
•Identify all keys and unique constraints — Each key K implies K → (all attributes).
•Apply domain knowledge — Recognize standard identifiers (SSN, ISBN, etc.) and their dependencies.
•Look for composite dependencies — When single attributes don't determine others, consider combinations.
•Use Armstrong's Axioms — Derive additional FDs from those you've found.
•Question coincidences — Patterns in sample data aren't FDs unless guaranteed by business rules.

Systematic FD Identification Process

Let's formalize a systematic process for FD identification that you can apply consistently in interviews and design reviews.

Phase 1: Enumerate All Attributes

Start by listing every attribute in the relation. For a relation R(A, B, C, D, E), write:

Attributes: {A, B, C, D, E}

Phase 2: Identify Obvious Single-Attribute Determinants

For each attribute, ask: Does this attribute alone determine any others?

Is A a key or unique? If yes, A → {B, C, D, E}
Does A have special domain meaning (like SSN, ISBN)?

Phase 3: Identify Composite Determinants

For each pair, triple, etc., ask: Does this combination determine attributes that single attributes don't?

Often, the primary key is a composite (like {StudentID, CourseID})
Look for many-to-many relationships that require combinations

Phase 4: Apply Transitivity

For each FD X → Y you've found, check: Does Y determine anything else?

If X → Y and Y → Z, then X → Z (record this, as it reveals normalization opportunities)

Phase 5: Validate Against Business Rules

Review each FD you've identified:

Is this guaranteed by a business rule?
Or is it just an artifact of sample data?

FD Identification WalkthroughApplying systematic FD identification to an enrollment schema

Input

Relation: Enrollment(StudentID, StudentName, CourseID, CourseName, InstructorID, InstructorName, Grade)

Business Rules:
1. Each student has a unique StudentID
2. Each course has a unique CourseID
3. Each instructor has a unique InstructorID
4. Each course has one instructor
5. A student can enroll in many courses
6. A student's grade is specific to a course

Output

**Identified FDs:**

• StudentID → StudentName (Rule 1: students are unique)
• CourseID → CourseName, InstructorID (Rule 2, 4: courses are unique and have one instructor)
• InstructorID → InstructorName (Rule 3: instructors are unique)
• {StudentID, CourseID} → Grade (Rule 6: grade is per enrollment)

**Derived FDs (via transitivity):**
• CourseID → InstructorName (CourseID → InstructorID → InstructorName)

**Candidate Key:** {StudentID, CourseID}
(This minimal set determines all attributes)

Common Identification Mistakes

Mistake 1: Assuming X → Y because Y appears after X in the schema. Column order is irrelevant to FDs.

Mistake 2: Confusing many-to-one with one-to-many. 'Each employee has one department' means EmployeeID → DeptID, not DeptID → EmployeeID.

Mistake 3: Missing composite keys. When no single attribute is a key, a combination must be.

Mistake 4: Assuming symmetry. X → Y does NOT imply Y → X unless both are candidate keys.

Armstrong's Axioms and Attribute Closure

Once you've identified initial FDs, Armstrong's Axioms let you derive all implied FDs—essential for complete normalization analysis.

Armstrong's Axioms (Sound and Complete)

1. Reflexivity: If Y ⊆ X, then X → Y

{A, B} → A (trivially true)
Used less in practice, but proves completeness

2. Augmentation: If X → Y, then XZ → YZ

If StudentID → Name, then {StudentID, Age} → {Name, Age}
Adding attributes to both sides preserves the dependency

3. Transitivity: If X → Y and Y → Z, then X → Z

If CourseID → InstructorID and InstructorID → InstructorName
Then CourseID → InstructorName
Critical for identifying 3NF violations

Derived Rules (from the axioms):

4. Union: If X → Y and X → Z, then X → YZ

Combine multiple dependencies with the same determinant

5. Decomposition: If X → YZ, then X → Y and X → Z

Split the right side for analysis

6. Pseudotransitivity: If X → Y and WY → Z, then WX → Z

Generalized transitivity with additional attributes

Attribute Closure Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
/**
 * Attribute Closure Algorithm
 * Input: Set of attributes X, Set of FDs F
 * Output: X+ (all attributes determined by X under F)
 */
function computeClosure(X: Set<Attribute>, F: Set<FD>): Set<Attribute> {
    let closure = new Set(X);  // Initialize with X itself
    let changed = true;
    
    while (changed) {
        changed = false;
        
        for (const fd of F) {
            // If left side of FD is contained in closure
            if (isSubset(fd.leftSide, closure)) {
                // Add right side to closure
                const sizeBefore = closure.size;
                addAll(closure, fd.rightSide);
                
                // Check if closure expanded
                if (closure.size > sizeBefore) {
                    changed = true;
                }
            }
        }
    }
    
    return closure;
}
 
// Example: Computing closure of {StudentID} given FDs
// FDs: {StudentID → Name, StudentID → DeptID, DeptID → DeptName}
// 
// Iteration 1: closure = {StudentID}
//   - Apply StudentID → Name: closure = {StudentID, Name}
//   - Apply StudentID → DeptID: closure = {StudentID, Name, DeptID}
//   
// Iteration 2: closure = {StudentID, Name, DeptID}
//   - Apply DeptID → DeptName: closure = {StudentID, Name, DeptID, DeptName}
//   
// Result: {StudentID}+ = {StudentID, Name, DeptID, DeptName}

Interview Closure Questions

A common interview question: 'Given FDs {A → B, B → C, C → D}, what is A⁺?' Use the algorithm: Start with {A}. Apply A → B: {A, B}. Apply B → C: {A, B, C}. Apply C → D: {A, B, C, D}. Answer: A⁺ = {A, B, C, D}. This means A is a superkey (determines everything) and potentially a candidate key.

Why Closure Matters:

Key Identification: X is a superkey iff X⁺ = R (all attributes). X is a candidate key iff X is a minimal superkey.
FD Verification: To check if X → Y holds, compute X⁺ and verify Y ⊆ X⁺.
Minimal Cover: Computing closures helps find a minimal set of FDs equivalent to the original.
Normalization: Closure reveals all transitive and partial dependencies for NF analysis.

Canonical Cover and Minimal Basis

Given a set of FDs, there may be redundancy—some FDs can be derived from others. The canonical cover (or minimal basis) is the smallest equivalent set of FDs, crucial for efficient database design.

Why We Need Canonical Covers:

Reduced Overhead: Fewer FDs mean fewer constraints to enforce
Clearer Design: Shows only essential dependencies
BCNF/3NF Decomposition: Algorithms often require minimal FDs

Properties of a Canonical Cover Fc:

Equivalent: Fc⁺ = F⁺ (same closure)
No redundant FDs: Removing any FD changes the closure
No extraneous attributes:
- No extraneous left-side attributes (can't remove any)
- No extraneous right-side attributes (can't remove any)
Single attribute on right: Each FD has form X → A (singleton right side)

Canonical Cover Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
/**
 * Canonical Cover Algorithm
 * Input: Set of FDs F
 * Output: Minimal canonical cover Fc equivalent to F
 */
function computeCanonicalCover(F: Set<FD>): Set<FD> {
    // Step 1: Decompose right sides to single attributes
    let Fc = decomposeRightSides(F);
    // {A → BC} becomes {A → B, A → C}
    
    // Step 2: Remove extraneous left-side attributes
    for (const fd of Fc) {
        for (const attr of fd.leftSide) {
            // Check if attr is extraneous
            const reducedLeft = fd.leftSide.minus(attr);
            const closureWithout = computeClosure(reducedLeft, Fc);
            
            if (closureWithout.contains(fd.rightSide)) {
                // attr is extraneous, remove it
                fd.leftSide = reducedLeft;
            }
        }
    }
    
    // Step 3: Remove extraneous right-side attributes
    // (After Step 1, right sides are singletons, so skip)
    
    // Step 4: Remove redundant FDs
    for (const fd of Fc) {
        const FcMinusFD = Fc.minus(fd);
        const closureWithout = computeClosure(fd.leftSide, FcMinusFD);
        
        if (closureWithout.contains(fd.rightSide)) {
            // fd is redundant, remove it
            Fc = FcMinusFD;
        }
    }
    
    // Step 5: Combine FDs with same left side (optional)
    // {A → B, A → C} can become {A → BC}
    
    return Fc;
}

Computing Canonical CoverReducing a set of FDs to its minimal form

Input

F = {A → BC, B → C, A → B, AB → C}

Find the canonical cover.

Output

**Step 1: Decompose right sides**
F₁ = {A → B, A → C, B → C, A → B, AB → C}

**Step 2: Remove duplicate {A → B}**
F₂ = {A → B, A → C, B → C, AB → C}

**Step 3: Check for extraneous left attributes in AB → C**
- Is A extraneous? Check if B → C is in F₂. Yes! B → C exists.
- So AB → C becomes B → C (but it's already there)
- Remove AB → C as redundant
F₃ = {A → B, A → C, B → C}

**Step 4: Check A → C for redundancy**
- Under {A → B, B → C}, compute A⁺ = {A, B, C}
- C is in A⁺, so A → C is derivable (by transitivity)
- Remove A → C

**Canonical Cover: Fc = {A → B, B → C}**

FD Identification in Interview Scenarios

Let's apply everything to realistic interview problems. These are the types of scenarios you'll encounter.

Scenario Type 1: Given Business Description

You're given a natural language description of a business and asked to identify FDs.

Example: "A hospital tracks patients, doctors, and appointments. Each patient has an ID, name, and date of birth. Each doctor has an ID, name, and specialty. Appointments are scheduled between a patient and a doctor on a specific date and time, with notes recorded."

FD Extraction:

PatientID → PatientName, DateOfBirth
DoctorID → DoctorName, Specialty
{PatientID, DoctorID, Date, Time} → Notes (appointment key)

Scenario Type 2: Given Schema with Constraints

You're given a CREATE TABLE statement and must identify FDs.

Schema Analysis Example
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Given this schema, identify all FDs
CREATE TABLE OrderDetails (
    OrderID INT,
    ProductID INT,
    CustomerID INT,
    CustomerName VARCHAR(100),
    ProductName VARCHAR(100),
    UnitPrice DECIMAL(10,2),
    Quantity INT,
    OrderDate DATE,
    PRIMARY KEY (OrderID, ProductID),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID),
    FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);
 
-- FD Analysis:
-- 1. From PRIMARY KEY: {OrderID, ProductID} → all attributes
-- 2. From FK to Customers: CustomerID → CustomerName
-- 3. From FK to Products: ProductID → ProductName, UnitPrice
-- 4. From business logic: OrderID → CustomerID, CustomerName, OrderDate
--    (each order belongs to one customer, has one date)
-- 5. Combining: {OrderID, ProductID} → Quantity
--    (quantity is specific to each line item)
 
-- Complete FD Set:
-- OrderID → CustomerID, CustomerName, OrderDate
-- CustomerID → CustomerName (transitive via Orders-Customer)
-- ProductID → ProductName, UnitPrice
-- {OrderID, ProductID} → Quantity

Strong Answers Include

•Methodical identification from all sources
•Explicit reasoning for each FD
•Recognition of transitive dependencies
•Identification of candidate keys
•Discussion of normalization implications

Weak Answers Exhibit

•Missing FDs due to incomplete analysis
•Confusing data patterns with FDs
•Ignoring composite dependencies
•Missing transitive dependencies
•Vague reasoning or guessing

Scenario Type 3: Given Sample Data

You're shown a table with sample data and asked to identify FDs. This tests the critical distinction between data coincidence and schema constraint.

Sample Data for FD Analysis
EmpID	Name	DeptID	DeptName	ManagerID	Salary
E001	Alice	D1	Engineering	E005	90000
E002	Bob	D1	Engineering	E005	85000
E003	Charlie	D2	Marketing	E006	75000
E004	Diana	D2	Marketing	E006	80000
E005	Eve	D1	Engineering	NULL	120000

Analyzing the Sample Data:

✅ Valid FDs (business rules):

EmpID → Name, DeptID, ManagerID, Salary (employees are unique)
DeptID → DeptName (departments have fixed names)

❌ Invalid patterns (data coincidence):

It looks like DeptID → ManagerID (all D1 emps have E005 as manager)
But this is likely coincidence—managers could change, departments could have multiple managers in different teams

⚠️ Question the interviewer: "Is the department manager fixed per department, or can employees in the same department have different managers?"

This clarifying question demonstrates sophisticated thinking.

Advanced FD Concepts

For completeness, let's cover advanced FD concepts that occasionally appear in interviews and are essential for complete mastery.

Multivalued Dependencies (MVDs)

Notation: X ↠ Y (read "X multi-determines Y")

Meaning: For a given X value, the set of Y values is independent of the set of Z values (where Z is all other attributes).

Example: A course can have multiple instructors and multiple textbooks independently. CourseID ↠ Instructor and CourseID ↠ Textbook. The choice of instructors doesn't depend on the choice of textbooks.

MVDs are relevant for 4NF—a table violating 4NF has non-trivial MVDs that aren't FDs.

Join Dependencies (JDs)

Notation: (R1, R2, ..., Rn)

Meaning: The table can be losslessly decomposed into R1, R2, ..., Rn and rejoined.

JDs are relevant for 5NF (Project-Join Normal Form). Most practical databases don't require 5NF analysis, but awareness demonstrates depth.

Embedded/Implied Dependencies

Some FDs are embedded within larger FDs:

If {A, B} → C and A → B, then A → C (A alone determines C)
Recognizing embedded dependencies helps minimize key sizes

When to Mention Advanced Concepts

In most interviews, demonstrating mastery of basic FDs is sufficient. Mention MVDs and JDs only if: (1) the interviewer explicitly asks about higher normal forms, (2) the problem clearly involves multivalued facts, or (3) you want to show depth by briefly acknowledging the existence of these concepts. Don't over-complicate simple problems.

Lossless-Join Decomposition Test

When decomposing R into R1 and R2, the decomposition is lossless iff:

(R1 ∩ R2) → R1, OR
(R1 ∩ R2) → R2

In plain English: The common attributes must be a superkey of at least one decomposed relation.

This is critical for the decomposition problems we'll cover in Page 3.

Practice Problems and Summary

Let's consolidate your learning with practice problems that mirror real interview questions.

Practice Problem 1: E-Commerce SchemaIdentify all FDs and candidate keys

Input

OrderLine(OrderID, ProductID, CustomerID, CustomerEmail, ProductName, ProductCategory, Quantity, UnitPrice, TotalPrice, OrderDate)

Business Rules:
- Each order belongs to one customer
- Customers have unique emails
- Products have unique names and belong to one category
- An order can have multiple products (line items)
- TotalPrice = Quantity × UnitPrice

Output

**FDs Identified:**

1. CustomerID → CustomerEmail
2. CustomerEmail → CustomerID (email is unique, so bidirectional)
3. ProductID → ProductName, ProductCategory, UnitPrice
4. OrderID → CustomerID, CustomerEmail, OrderDate
5. {OrderID, ProductID} → Quantity, TotalPrice

**Derived FD:**
6. {Quantity, UnitPrice} → TotalPrice (calculated field)

**Candidate Key:** {OrderID, ProductID}

**Normalization Issues:**
- CustomerEmail is transitively dependent (OrderID → CustomerID → CustomerEmail): 3NF violation
- ProductName, ProductCategory, UnitPrice are partially dependent on key: 2NF violation
- Schema needs decomposition

Practice Problem 2: Closure ComputationCompute attribute closures and identify keys

Input

R(A, B, C, D, E)
FDs: {A → B, BC → D, D → E, E → A}

Compute A⁺, B⁺, {B,C}⁺, and identify all candidate keys.

Output

**A⁺ Computation:**
{A} → {A, B} (A → B)
No more expansions possible.
**A⁺ = {A, B}**

**B⁺ Computation:**
{B} - no applicable FDs
**B⁺ = {B}**

**{B,C}⁺ Computation:**
{B, C} → {B, C, D} (BC → D)
{B, C, D} → {B, C, D, E} (D → E)
{B, C, D, E} → {A, B, C, D, E} (E → A)
**{B,C}⁺ = {A, B, C, D, E} = R**

**Candidate Keys:**
- {B, C} is a superkey (proven above)
- Is {B} alone a key? B⁺ = {B} ≠ R. No.
- Is {C} alone a key? C⁺ = {C} ≠ R. No.
- Therefore, {B, C} is a **candidate key** (minimal superkey)

**Other candidate keys?**
- From cycle E → A → B, check if any leads back...
- {C, E}: {C, E} → {A, C, E} → {A, B, C, E} → {A, B, C, D, E}. Yes! {C, E} is also a candidate key.

**Candidate Keys: {B, C} and {C, E}**

Key Takeaways

•FDs encode business rules — They're semantic constraints, not just data patterns. Always validate against business logic.
•Multiple sources for FDs — Use explicit rules, keys, domain knowledge, and derivation via Armstrong's Axioms.
•Systematic identification is reliable — Apply the 5-phase process: enumerate attributes, find single determinants, find composite determinants, apply transitivity, validate.
•Attribute closure is fundamental — Used for key identification, FD verification, and canonical cover computation.
•Canonical cover minimizes redundancy — Essential for efficient design and as input to normalization algorithms.
•Interview excellence requires precision — Distinguish data coincidence from constraints, ask clarifying questions, and show your reasoning.

What's Next:

With FD identification mastered, we'll move to Page 2: Normal Form Analysis—where you'll learn to systematically determine which normal form a relation satisfies and identify the specific violations that need to be addressed. The ability to identify FDs is prerequisite; now we'll put those FDs to work.

Page Complete

You now possess a systematic, rigorous approach to functional dependency identification. This foundational skill unlocks all subsequent normalization analysis. Practice with real schemas—every database you encounter is an opportunity to identify FDs and verify your understanding.

FD Identification

The Foundation of Normalization Excellence

What You Will Master

Understanding Functional Dependencies at a Deep Level

Before we can identify functional dependencies reliably, we must first understand what they truly represent—not just memorize the textbook definition.

The Formal Definition:

A functional dependency X → Y holds in a relation R if and only if, for any two tuples t₁ and t₂ in R:

If t₁[X] = t₂[X], then t₁[Y] = t₂[Y]

In plain English: Whenever two rows have the same value(s) for attribute(s) X, they must have the same value(s) for attribute(s) Y.

The Business Rule Perspective

The Distinction Between Data and Schema FDs:

Here's a subtle but critical point that trips up many candidates:

Schema-level FDs are constraints that must hold for ALL valid instances of a relation. These are the FDs that matter for normalization.
Data-level observations are patterns that happen to hold in the current data but might not be guaranteed.

Schema FDs vs. Data Coincidences
Situation	Example	Is It an FD?	Why?
Business rule explicitly stated	Each order has one customer	Yes (OrderID → CustomerID)	Business explicitly guarantees this
Unique constraint exists	Email is unique per user	Yes (Email → UserID)	Database enforces uniqueness
Pattern in sample data	All red cars are sedans in data	Probably No	No business rule requires this
Domain knowledge	SSN identifies a person	Yes (SSN → PersonDetails)	Legal/government guarantee
Assumption about uniqueness	All employees have different names	No (Name is not a key)	Names are not guaranteed unique

Sources of Functional Dependencies

In real-world scenarios and interviews, FDs emerge from multiple sources. A skilled database engineer draws from all of these to build a complete picture of the dependencies.

Source 1: Explicit Business Rules

The most reliable source. When stakeholders say things like:

"Each customer has exactly one billing address"
"An order can only belong to one customer"
"Each product has a unique SKU"

These translate directly into FDs:

CustomerID → BillingAddress
OrderID → CustomerID
SKU → ProductDetails

Source 2: Unique Constraints and Keys

Any attribute(s) marked as UNIQUE or PRIMARY KEY are the left side of FDs determining all other attributes:

PRIMARY KEY (EmployeeID) implies: EmployeeID → {Name, Department, Salary, ...}
UNIQUE (Email) implies: Email → {UserID, Name, ...}

This is why key identification and FD identification are deeply linked—we'll explore this connection in Page 4.

The Key-FD Relationship

Source 3: Domain Knowledge

Some FDs are so universally true that they're assumed:

ISBN → BookTitle, Author, Publisher (publishing industry standard)
SSN → PersonName, DateOfBirth (government ID)
ZIPCode → City, State (US postal standard—mostly)
VIN → CarMake, Model, Year (automotive standard)

Interview tip: If you recognize domain-specific dependencies, mention them—it shows depth of understanding.

Source 4: Derived/Transitive Dependencies

Once you identify some FDs, you can derive others using Armstrong's Axioms:

If StudentID → DepartmentID and DepartmentID → DepartmentName
Then StudentID → DepartmentName (transitivity)

This is crucial for identifying potential 3NF violations.

Source 5: Composite Dependencies

Sometimes no single attribute determines another, but a combination does:

StudentID, CourseID → Grade (a student's grade in a specific course)
OrderID, ProductID → Quantity (quantity of a specific product in an order)

These composite determinants often become parts of composite primary keys.

FD Source Checklist for Interviews

•Read the problem statement carefully — Business rules are often stated directly. 'Each X has one Y' means X → Y.
•Identify all keys and unique constraints — Each key K implies K → (all attributes).
•Apply domain knowledge — Recognize standard identifiers (SSN, ISBN, etc.) and their dependencies.
•Look for composite dependencies — When single attributes don't determine others, consider combinations.
•Use Armstrong's Axioms — Derive additional FDs from those you've found.
•Question coincidences — Patterns in sample data aren't FDs unless guaranteed by business rules.

Systematic FD Identification Process

Let's formalize a systematic process for FD identification that you can apply consistently in interviews and design reviews.

Phase 1: Enumerate All Attributes

Start by listing every attribute in the relation. For a relation R(A, B, C, D, E), write:

Attributes: {A, B, C, D, E}

Phase 2: Identify Obvious Single-Attribute Determinants

For each attribute, ask: Does this attribute alone determine any others?

Is A a key or unique? If yes, A → {B, C, D, E}
Does A have special domain meaning (like SSN, ISBN)?

Phase 3: Identify Composite Determinants

For each pair, triple, etc., ask: Does this combination determine attributes that single attributes don't?

Often, the primary key is a composite (like {StudentID, CourseID})
Look for many-to-many relationships that require combinations

Phase 4: Apply Transitivity

For each FD X → Y you've found, check: Does Y determine anything else?

If X → Y and Y → Z, then X → Z (record this, as it reveals normalization opportunities)

Phase 5: Validate Against Business Rules

Review each FD you've identified:

Is this guaranteed by a business rule?
Or is it just an artifact of sample data?

FD Identification WalkthroughApplying systematic FD identification to an enrollment schema

Input

Relation: Enrollment(StudentID, StudentName, CourseID, CourseName, InstructorID, InstructorName, Grade)

Business Rules:
1. Each student has a unique StudentID
2. Each course has a unique CourseID
3. Each instructor has a unique InstructorID
4. Each course has one instructor
5. A student can enroll in many courses
6. A student's grade is specific to a course

Output

**Identified FDs:**

• StudentID → StudentName (Rule 1: students are unique)
• CourseID → CourseName, InstructorID (Rule 2, 4: courses are unique and have one instructor)
• InstructorID → InstructorName (Rule 3: instructors are unique)
• {StudentID, CourseID} → Grade (Rule 6: grade is per enrollment)

**Derived FDs (via transitivity):**
• CourseID → InstructorName (CourseID → InstructorID → InstructorName)

**Candidate Key:** {StudentID, CourseID}
(This minimal set determines all attributes)

Common Identification Mistakes

Mistake 1: Assuming X → Y because Y appears after X in the schema. Column order is irrelevant to FDs.

Mistake 2: Confusing many-to-one with one-to-many. 'Each employee has one department' means EmployeeID → DeptID, not DeptID → EmployeeID.

Mistake 3: Missing composite keys. When no single attribute is a key, a combination must be.

Mistake 4: Assuming symmetry. X → Y does NOT imply Y → X unless both are candidate keys.

Armstrong's Axioms and Attribute Closure

Once you've identified initial FDs, Armstrong's Axioms let you derive all implied FDs—essential for complete normalization analysis.

Armstrong's Axioms (Sound and Complete)

1. Reflexivity: If Y ⊆ X, then X → Y

{A, B} → A (trivially true)
Used less in practice, but proves completeness

2. Augmentation: If X → Y, then XZ → YZ

If StudentID → Name, then {StudentID, Age} → {Name, Age}
Adding attributes to both sides preserves the dependency

3. Transitivity: If X → Y and Y → Z, then X → Z

If CourseID → InstructorID and InstructorID → InstructorName
Then CourseID → InstructorName
Critical for identifying 3NF violations

Derived Rules (from the axioms):

4. Union: If X → Y and X → Z, then X → YZ

Combine multiple dependencies with the same determinant

5. Decomposition: If X → YZ, then X → Y and X → Z

Split the right side for analysis

6. Pseudotransitivity: If X → Y and WY → Z, then WX → Z

Generalized transitivity with additional attributes

Attribute Closure Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
/**
 * Attribute Closure Algorithm
 * Input: Set of attributes X, Set of FDs F
 * Output: X+ (all attributes determined by X under F)
 */
function computeClosure(X: Set<Attribute>, F: Set<FD>): Set<Attribute> {
    let closure = new Set(X);  // Initialize with X itself
    let changed = true;
    
    while (changed) {
        changed = false;
        
        for (const fd of F) {
            // If left side of FD is contained in closure
            if (isSubset(fd.leftSide, closure)) {
                // Add right side to closure
                const sizeBefore = closure.size;
                addAll(closure, fd.rightSide);
                
                // Check if closure expanded
                if (closure.size > sizeBefore) {
                    changed = true;
                }
            }
        }
    }
    
    return closure;
}
 
// Example: Computing closure of {StudentID} given FDs
// FDs: {StudentID → Name, StudentID → DeptID, DeptID → DeptName}
// 
// Iteration 1: closure = {StudentID}
//   - Apply StudentID → Name: closure = {StudentID, Name}
//   - Apply StudentID → DeptID: closure = {StudentID, Name, DeptID}
//   
// Iteration 2: closure = {StudentID, Name, DeptID}
//   - Apply DeptID → DeptName: closure = {StudentID, Name, DeptID, DeptName}
//   
// Result: {StudentID}+ = {StudentID, Name, DeptID, DeptName}

Interview Closure Questions

Why Closure Matters:

Key Identification: X is a superkey iff X⁺ = R (all attributes). X is a candidate key iff X is a minimal superkey.
FD Verification: To check if X → Y holds, compute X⁺ and verify Y ⊆ X⁺.
Minimal Cover: Computing closures helps find a minimal set of FDs equivalent to the original.
Normalization: Closure reveals all transitive and partial dependencies for NF analysis.

Canonical Cover and Minimal Basis

Why We Need Canonical Covers:

Reduced Overhead: Fewer FDs mean fewer constraints to enforce
Clearer Design: Shows only essential dependencies
BCNF/3NF Decomposition: Algorithms often require minimal FDs

Properties of a Canonical Cover Fc:

Equivalent: Fc⁺ = F⁺ (same closure)
No redundant FDs: Removing any FD changes the closure
No extraneous attributes:
- No extraneous left-side attributes (can't remove any)
- No extraneous right-side attributes (can't remove any)
Single attribute on right: Each FD has form X → A (singleton right side)

Canonical Cover Algorithm
Pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
/**
 * Canonical Cover Algorithm
 * Input: Set of FDs F
 * Output: Minimal canonical cover Fc equivalent to F
 */
function computeCanonicalCover(F: Set<FD>): Set<FD> {
    // Step 1: Decompose right sides to single attributes
    let Fc = decomposeRightSides(F);
    // {A → BC} becomes {A → B, A → C}
    
    // Step 2: Remove extraneous left-side attributes
    for (const fd of Fc) {
        for (const attr of fd.leftSide) {
            // Check if attr is extraneous
            const reducedLeft = fd.leftSide.minus(attr);
            const closureWithout = computeClosure(reducedLeft, Fc);
            
            if (closureWithout.contains(fd.rightSide)) {
                // attr is extraneous, remove it
                fd.leftSide = reducedLeft;
            }
        }
    }
    
    // Step 3: Remove extraneous right-side attributes
    // (After Step 1, right sides are singletons, so skip)
    
    // Step 4: Remove redundant FDs
    for (const fd of Fc) {
        const FcMinusFD = Fc.minus(fd);
        const closureWithout = computeClosure(fd.leftSide, FcMinusFD);
        
        if (closureWithout.contains(fd.rightSide)) {
            // fd is redundant, remove it
            Fc = FcMinusFD;
        }
    }
    
    // Step 5: Combine FDs with same left side (optional)
    // {A → B, A → C} can become {A → BC}
    
    return Fc;
}

Computing Canonical CoverReducing a set of FDs to its minimal form

Input

F = {A → BC, B → C, A → B, AB → C}

Find the canonical cover.

Output

**Step 1: Decompose right sides**
F₁ = {A → B, A → C, B → C, A → B, AB → C}

**Step 2: Remove duplicate {A → B}**
F₂ = {A → B, A → C, B → C, AB → C}

**Step 3: Check for extraneous left attributes in AB → C**
- Is A extraneous? Check if B → C is in F₂. Yes! B → C exists.
- So AB → C becomes B → C (but it's already there)
- Remove AB → C as redundant
F₃ = {A → B, A → C, B → C}

**Step 4: Check A → C for redundancy**
- Under {A → B, B → C}, compute A⁺ = {A, B, C}
- C is in A⁺, so A → C is derivable (by transitivity)
- Remove A → C

**Canonical Cover: Fc = {A → B, B → C}**

FD Identification in Interview Scenarios

Let's apply everything to realistic interview problems. These are the types of scenarios you'll encounter.

Scenario Type 1: Given Business Description

You're given a natural language description of a business and asked to identify FDs.

FD Extraction:

PatientID → PatientName, DateOfBirth
DoctorID → DoctorName, Specialty
{PatientID, DoctorID, Date, Time} → Notes (appointment key)

Scenario Type 2: Given Schema with Constraints

You're given a CREATE TABLE statement and must identify FDs.

Schema Analysis Example
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
-- Given this schema, identify all FDs
CREATE TABLE OrderDetails (
    OrderID INT,
    ProductID INT,
    CustomerID INT,
    CustomerName VARCHAR(100),
    ProductName VARCHAR(100),
    UnitPrice DECIMAL(10,2),
    Quantity INT,
    OrderDate DATE,
    PRIMARY KEY (OrderID, ProductID),
    FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID),
    FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);
 
-- FD Analysis:
-- 1. From PRIMARY KEY: {OrderID, ProductID} → all attributes
-- 2. From FK to Customers: CustomerID → CustomerName
-- 3. From FK to Products: ProductID → ProductName, UnitPrice
-- 4. From business logic: OrderID → CustomerID, CustomerName, OrderDate
--    (each order belongs to one customer, has one date)
-- 5. Combining: {OrderID, ProductID} → Quantity
--    (quantity is specific to each line item)
 
-- Complete FD Set:
-- OrderID → CustomerID, CustomerName, OrderDate
-- CustomerID → CustomerName (transitive via Orders-Customer)
-- ProductID → ProductName, UnitPrice
-- {OrderID, ProductID} → Quantity

Strong Answers Include

•Methodical identification from all sources
•Explicit reasoning for each FD
•Recognition of transitive dependencies
•Identification of candidate keys
•Discussion of normalization implications

Weak Answers Exhibit

•Missing FDs due to incomplete analysis
•Confusing data patterns with FDs
•Ignoring composite dependencies
•Missing transitive dependencies
•Vague reasoning or guessing

Scenario Type 3: Given Sample Data

You're shown a table with sample data and asked to identify FDs. This tests the critical distinction between data coincidence and schema constraint.

Sample Data for FD Analysis
EmpID	Name	DeptID	DeptName	ManagerID	Salary
E001	Alice	D1	Engineering	E005	90000
E002	Bob	D1	Engineering	E005	85000
E003	Charlie	D2	Marketing	E006	75000
E004	Diana	D2	Marketing	E006	80000
E005	Eve	D1	Engineering	NULL	120000

Analyzing the Sample Data:

✅ Valid FDs (business rules):

EmpID → Name, DeptID, ManagerID, Salary (employees are unique)
DeptID → DeptName (departments have fixed names)

❌ Invalid patterns (data coincidence):

It looks like DeptID → ManagerID (all D1 emps have E005 as manager)
But this is likely coincidence—managers could change, departments could have multiple managers in different teams

⚠️ Question the interviewer: "Is the department manager fixed per department, or can employees in the same department have different managers?"

This clarifying question demonstrates sophisticated thinking.

Advanced FD Concepts

For completeness, let's cover advanced FD concepts that occasionally appear in interviews and are essential for complete mastery.

Multivalued Dependencies (MVDs)

Notation: X ↠ Y (read "X multi-determines Y")

Meaning: For a given X value, the set of Y values is independent of the set of Z values (where Z is all other attributes).

MVDs are relevant for 4NF—a table violating 4NF has non-trivial MVDs that aren't FDs.

Join Dependencies (JDs)

Notation: (R1, R2, ..., Rn)

Meaning: The table can be losslessly decomposed into R1, R2, ..., Rn and rejoined.

JDs are relevant for 5NF (Project-Join Normal Form). Most practical databases don't require 5NF analysis, but awareness demonstrates depth.

Embedded/Implied Dependencies

Some FDs are embedded within larger FDs:

If {A, B} → C and A → B, then A → C (A alone determines C)
Recognizing embedded dependencies helps minimize key sizes

When to Mention Advanced Concepts

Lossless-Join Decomposition Test

When decomposing R into R1 and R2, the decomposition is lossless iff:

(R1 ∩ R2) → R1, OR
(R1 ∩ R2) → R2

In plain English: The common attributes must be a superkey of at least one decomposed relation.

This is critical for the decomposition problems we'll cover in Page 3.

Practice Problems and Summary

Let's consolidate your learning with practice problems that mirror real interview questions.

Practice Problem 1: E-Commerce SchemaIdentify all FDs and candidate keys

Input

OrderLine(OrderID, ProductID, CustomerID, CustomerEmail, ProductName, ProductCategory, Quantity, UnitPrice, TotalPrice, OrderDate)

Business Rules:
- Each order belongs to one customer
- Customers have unique emails
- Products have unique names and belong to one category
- An order can have multiple products (line items)
- TotalPrice = Quantity × UnitPrice

Output

**FDs Identified:**

1. CustomerID → CustomerEmail
2. CustomerEmail → CustomerID (email is unique, so bidirectional)
3. ProductID → ProductName, ProductCategory, UnitPrice
4. OrderID → CustomerID, CustomerEmail, OrderDate
5. {OrderID, ProductID} → Quantity, TotalPrice

**Derived FD:**
6. {Quantity, UnitPrice} → TotalPrice (calculated field)

**Candidate Key:** {OrderID, ProductID}

**Normalization Issues:**
- CustomerEmail is transitively dependent (OrderID → CustomerID → CustomerEmail): 3NF violation
- ProductName, ProductCategory, UnitPrice are partially dependent on key: 2NF violation
- Schema needs decomposition

Practice Problem 2: Closure ComputationCompute attribute closures and identify keys

Input

R(A, B, C, D, E)
FDs: {A → B, BC → D, D → E, E → A}

Compute A⁺, B⁺, {B,C}⁺, and identify all candidate keys.

Output

**A⁺ Computation:**
{A} → {A, B} (A → B)
No more expansions possible.
**A⁺ = {A, B}**

**B⁺ Computation:**
{B} - no applicable FDs
**B⁺ = {B}**

**{B,C}⁺ Computation:**
{B, C} → {B, C, D} (BC → D)
{B, C, D} → {B, C, D, E} (D → E)
{B, C, D, E} → {A, B, C, D, E} (E → A)
**{B,C}⁺ = {A, B, C, D, E} = R**

**Candidate Keys:**
- {B, C} is a superkey (proven above)
- Is {B} alone a key? B⁺ = {B} ≠ R. No.
- Is {C} alone a key? C⁺ = {C} ≠ R. No.
- Therefore, {B, C} is a **candidate key** (minimal superkey)

**Other candidate keys?**
- From cycle E → A → B, check if any leads back...
- {C, E}: {C, E} → {A, C, E} → {A, B, C, E} → {A, B, C, D, E}. Yes! {C, E} is also a candidate key.

**Candidate Keys: {B, C} and {C, E}**

Key Takeaways

•FDs encode business rules — They're semantic constraints, not just data patterns. Always validate against business logic.
•Multiple sources for FDs — Use explicit rules, keys, domain knowledge, and derivation via Armstrong's Axioms.
•Systematic identification is reliable — Apply the 5-phase process: enumerate attributes, find single determinants, find composite determinants, apply transitivity, validate.
•Attribute closure is fundamental — Used for key identification, FD verification, and canonical cover computation.
•Canonical cover minimizes redundancy — Essential for efficient design and as input to normalization algorithms.
•Interview excellence requires precision — Distinguish data coincidence from constraints, ask clarifying questions, and show your reasoning.

What's Next:

Page Complete