Loading learning content...
Beneath every well-designed relational database lies a hidden mathematical structure—a set of rules that govern how attributes relate to one another. These rules, called Functional Dependencies (FDs), are the invisible scaffolding upon which data integrity, normalization theory, and query optimization are built.
Functional dependencies aren't merely academic abstractions. They're the reason your banking database doesn't allow duplicate account numbers with different balances. They're why a properly designed system prevents an employee from having two different salaries in the same pay period. They're what database engines use to determine whether an index will make your query faster.
Without understanding functional dependencies, you're designing databases by intuition rather than science—and intuition fails at scale.
By the end of this page, you will understand the formal definition of functional dependency, recognize why this mathematical concept is essential for database design, and be able to identify functional dependencies in real-world data scenarios. This knowledge forms the foundation for all normalization theory that follows.
Before diving into formal definitions, let's build intuition. The word functional comes from mathematics, where a function maps each input to exactly one output. If you know the input, you can determine the output—there's no ambiguity.
Consider these everyday examples:
The first three examples exhibit functional dependencies: knowing one attribute value lets you uniquely determine another. The last example does not—the relationship is many-to-many, not deterministic.
Think of functional dependency like a mathematical function f(x) = y. If you plug in the same x, you always get the same y. In database terms: if you have the same SSN, you must have the same PersonName. The SSN 'determines' or 'functionally determines' the PersonName.
Key Insight: Determinism, Not Correlation
Functional dependency is about determinism, not correlation. Just because two attributes often appear together doesn't mean one determines the other. Consider:
In contrast:
This distinction is crucial. Database theory deals with certainties, not probabilities.
Now we're ready for the formal definition that appears in every database textbook. Understanding this precisely is essential—normalization algorithms depend on it.
Definition:
Let R be a relation schema with attributes {A₁, A₂, ..., Aₙ}. Let X and Y be subsets of these attributes (X ⊆ R and Y ⊆ R).
We say that X functionally determines Y, written X → Y, if and only if:
For any two tuples t₁ and t₂ in any valid instance r of R: If t₁[X] = t₂[X], then t₁[Y] = t₂[Y]
In plain English: whenever two rows have the same values for attributes X, they must also have the same values for attributes Y.
This definition has three critical implications:
Universal Quantification: The condition must hold for ALL pairs of tuples—no exceptions. A single counter-example violates the dependency.
Instance Independence: The dependency is a constraint on ALL valid instances of the relation, not just the current data. Future data must also satisfy it.
Set-Based: X and Y can be single attributes or sets of attributes. {StudentID, CourseID} → {Grade} is valid—both StudentID AND CourseID together determine the Grade.
12345678910111213141516171819202122
-- Consider this EMPLOYEE relation:-- EMPLOYEE(EmpID, EmpName, DeptID, DeptName, Salary) -- Functional Dependencies that likely hold:-- EmpID → EmpName (Employee ID determines Employee Name)-- EmpID → DeptID (Employee ID determines Department ID) -- EmpID → Salary (Employee ID determines Salary)-- DeptID → DeptName (Department ID determines Department Name) -- Functional Dependency that does NOT hold:-- DeptID → Salary (Many employees in same dept have different salaries)-- EmpName → EmpID (Two employees could share the same name) -- Verifying a Functional Dependency (conceptually):-- For EmpID → EmpName to hold, this query must return 0 rows:SELECT e1.EmpID, e1.EmpName, e2.EmpNameFROM EMPLOYEE e1JOIN EMPLOYEE e2 ON e1.EmpID = e2.EmpIDWHERE e1.EmpName <> e2.EmpName; -- If any rows are returned, we'd have the same EmpID -- with different EmpNames, violating the FDMathematical Precision Matters
Notice the careful phrasing: for any two tuples. This means:
This last point is subtle but important. The dependency X → Y says nothing about tuples that differ on X. It only constrains tuples that match on X.
Let's make functional dependencies concrete with a tabular example. Consider this instance of a COURSE_ENROLLMENT relation:
| StudentID | StudentName | CourseID | CourseName | Instructor | Grade |
|---|---|---|---|---|---|
| S001 | Alice Chen | CS101 | Intro to Programming | Dr. Smith | A |
| S001 | Alice Chen | CS201 | Data Structures | Dr. Jones | A- |
| S002 | Bob Wilson | CS101 | Intro to Programming | Dr. Smith | B+ |
| S002 | Bob Wilson | CS301 | Algorithms | Dr. Brown | B |
| S003 | Carol Davis | CS201 | Data Structures | Dr. Jones | A |
| S003 | Carol Davis | CS101 | Intro to Programming | Dr. Smith | A- |
Analyzing Potential Functional Dependencies:
Does StudentID → StudentName hold?
Let's check all pairs of tuples with matching StudentID:
Yes, StudentID → StudentName holds in this instance.
Does CourseID → Instructor hold?
Check tuples with matching CourseID:
Yes, CourseID → Instructor holds in this instance.
Does StudentID → Grade hold?
Check tuples with matching StudentID:
No, StudentID → Grade does NOT hold. A student can have different grades in different courses.
Critical distinction: An FD holding in a particular instance doesn't guarantee it's a true constraint. Perhaps we haven't yet observed a counter-example. True FDs come from understanding the MEANING of the data—the real-world constraints the database models. More on this in later pages.
Functional dependencies have several important properties that directly impact how we reason about database design. Understanding these properties sets the stage for Armstrong's Axioms (covered in the next module).
12345678910111213141516171819202122232425262728
-- Demonstrating FD Properties with EMPLOYEE(EmpID, Name, DeptID, DeptName, Salary) -- REFLEXIVITY (Trivial FD):-- {EmpID, Name} → {EmpID} -- Always true because EmpID is part of the determining set -- LEFT-SIDE SUPERSET:-- Given: EmpID → Name-- Then: {EmpID, DeptID} → Name (adding DeptID preserves the FD)-- Proof: If two rows match on both EmpID AND DeptID,-- they must match on EmpID alone,-- therefore they must match on Name. -- NON-SYMMETRY Example:-- EmpID → DeptID (employee works in one department)-- DeptID → EmpID DOES NOT HOLD (department has many employees) -- RIGHT-SIDE UNION:-- Given: EmpID → Name AND EmpID → DeptID-- Therefore: EmpID → {Name, DeptID}-- One EmpID determines both attributes together -- RIGHT-SIDE DECOMPOSITION:-- Given: EmpID → {Name, DeptID, Salary}-- Therefore: -- EmpID → Name-- EmpID → DeptID -- EmpID → SalaryThese aren't abstract mathematical curiosities. They're tools for reasoning about FDs. When you're analyzing a schema, you'll use right-side decomposition to simplify complex FDs and left-side supersets to understand composite keys. Armstrong's Axioms (next module) formalize these into a complete reasoning system.
An important distinction in FD theory separates trivial from non-trivial dependencies. This distinction affects normalization and is tested in database design problems.
Completely Non-Trivial FDs:
An FD X → Y is completely non-trivial if X ∩ Y = ∅ (no overlap between left and right sides).
When analyzing FDs for normalization, we often focus on completely non-trivial dependencies because they represent genuine constraints without redundancy.
In practice, when we list FDs for a schema, we almost always mean non-trivial FDs. Saying 'SSN → SSN' provides no information. When database theory references 'the set of FDs for relation R,' assume non-trivial FDs unless stated otherwise.
Functional dependencies and keys are deeply connected. In fact, the formal definition of a key relies entirely on functional dependencies.
Definition: Superkey
A set of attributes K is a superkey of relation R if K functionally determines ALL attributes of R:
K → R (equivalently, K → {all attributes in R})
Definition: Candidate Key
A candidate key is a minimal superkey—removing any attribute from it would cause it to stop being a superkey.
Definition: Primary Key
A primary key is a candidate key chosen by the database designer as the main identifier for tuples.
| Concept | FD Definition | Example |
|---|---|---|
| Superkey | K → R (K determines all attributes) | In EMPLOYEE, {EmpID, Name} is a superkey because it determines everything |
| Candidate Key | Minimal K such that K → R | In EMPLOYEE, {EmpID} is a candidate key—minimal, determines all |
| Primary Key | Chosen candidate key | EmpID designated as the primary key |
| Alternate Key | Non-chosen candidate keys | If both SSN and EmpID are candidate keys, the one not chosen as PK |
Key Insight: Keys Are Special FDs
When we declare a primary key, we're implicitly declaring a functional dependency:
CREATE TABLE Employee (
EmpID INT PRIMARY KEY,
Name VARCHAR(100),
...
);
This implicitly states: EmpID → {all other attributes}.
The database enforces this FD by preventing duplicate EmpID values. If two rows could have the same EmpID, the FD would be violated.
Composite Keys and FDs:
For a composite primary key like (StudentID, CourseID):
CREATE TABLE Enrollment (
StudentID INT,
CourseID INT,
Grade CHAR(2),
PRIMARY KEY (StudentID, CourseID)
);
The implicit FD is: {StudentID, CourseID} → {Grade}. Neither StudentID alone nor CourseID alone determines Grade—only their combination does.
Declaring a primary key captures the 'main' FD but not all FDs. If your table has DeptID and DeptName where DeptID → DeptName, this FD exists regardless of what you chose as the primary key. These 'extra' FDs cause normalization problems. Understanding all FDs—not just key-based ones—is essential for proper design.
Functional dependencies aren't just theoretical constructs—they're practical tools that solve real database design problems. Understanding their importance motivates the normalization theory we'll study next.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546
-- Example: FD-Based Redundancy Analysis-- Original table with redundancy:CREATE TABLE EmployeeProject ( EmpID INT, EmpName VARCHAR(100), DeptID INT, DeptName VARCHAR(100), ProjectID INT, ProjectBudget DECIMAL(12,2), PRIMARY KEY (EmpID, ProjectID)); -- FDs in this table:-- EmpID → EmpName, DeptID (Employee info)-- DeptID → DeptName (Department info - REDUNDANT!)-- ProjectID → ProjectBudget (Project info - REDUNDANT!)-- {EmpID, ProjectID} → Grade (Assignment-specific) -- Problem: If employee E001 works on 5 projects, we store:-- - EmpName 5 times (redundancy from EmpID → EmpName)-- - DeptID 5 times (redundancy from EmpID → DeptID) -- - DeptName 5 times (redundancy, depends on DeptID) -- Proper decomposition based on FD analysis:CREATE TABLE Employee ( EmpID INT PRIMARY KEY, EmpName VARCHAR(100), DeptID INT -- Foreign key); CREATE TABLE Department ( DeptID INT PRIMARY KEY, DeptName VARCHAR(100)); CREATE TABLE Project ( ProjectID INT PRIMARY KEY, ProjectBudget DECIMAL(12,2)); CREATE TABLE Assignment ( EmpID INT, ProjectID INT, -- Only assignment-specific data here PRIMARY KEY (EmpID, ProjectID));We've established the fundamental concept of functional dependencies. Let's consolidate what we've learned:
What's Next:
With the definition established, we'll next explore the notation X → Y in depth—learning to read, write, and interpret functional dependencies precisely. This notation is the universal language used across all database literature and essential for working with normalization algorithms.
You now understand the formal definition of functional dependency—the mathematical foundation of database normalization. This concept will appear in every subsequent topic: Armstrong's Axioms, attribute closure, normal forms, and decomposition algorithms all build on this definition. Keep it sharp.