Loading learning content...
By now, you've mastered functional dependencies—the backbone of normalization theory through BCNF. You understand that a functional dependency X → Y means that X values uniquely determine Y values. But what happens when you encounter relationships that FDs simply cannot express?
Consider an employee who speaks multiple languages and has multiple skills. There's no functional dependency between languages and skills—an employee's language capabilities don't determine their skills, and vice versa. Yet these two sets of attributes are independently associated with the employee. If you store this data naively, you'll encounter redundancy that BCNF cannot eliminate.
This is precisely where Multivalued Dependencies (MVDs) enter the picture.
By the end of this page, you will understand the formal definition of multivalued dependencies, recognize why they represent a fundamental extension beyond functional dependencies, and grasp the semantic intuition behind independent relationships. You'll see how MVDs capture a pattern of redundancy that FDs fundamentally cannot address.
Before diving into MVDs, let's crystallize exactly where functional dependencies fall short. This understanding is crucial because MVDs exist specifically to address a class of redundancy that FDs cannot capture.
Functional dependencies capture determinacy:
When we say X → Y, we assert that given any X value, there is exactly one corresponding Y value. This is a powerful constraint that eliminates certain types of redundancy—if Y is determined by X, storing Y with every occurrence of X would be redundant (hence normalization).
But consider this scenario:
| EmpID | EmpName | Skill | Language |
|---|---|---|---|
| E001 | Alice | Java | English |
| E001 | Alice | Java | Spanish |
| E001 | Alice | Python | English |
| E001 | Alice | Python | Spanish |
| E002 | Bob | C++ | French |
| E002 | Bob | SQL | French |
Observe the structure carefully:
Alice has skills {Java, Python} and speaks languages {English, Spanish}. To represent all possible combinations, we need four tuples for Alice. This is the Cartesian product of her skills and languages.
Now ask: What functional dependencies exist here?
There are essentially no non-trivial functional dependencies among {EmpID, Skill, Language}.
Yet there is obvious redundancy! Alice's name is repeated four times. More subtly, the pairing of "Java" with "English" appears, but this pairing has no semantic meaning—Alice knows Java independently of speaking English.
The table is in BCNF (assuming the only FD is EmpID → EmpName and we decompose accordingly). Yet after BCNF decomposition, the redundancy from the Cartesian product remains. This redundancy causes update anomalies: if Alice learns German, we must add TWO new tuples (one for each skill). If she learns a new skill, we must add TWO new tuples (one for each language). The number of tuples grows multiplicatively, not additively.
The core insight:
Functional dependencies capture one-to-one or many-to-one relationships. But the relationship between an employee and their skills is one-to-many, and so is the relationship between an employee and their languages. These two one-to-many relationships are independent of each other.
We need a new type of constraint to express: "The set of skills for an employee is independent of the set of languages for that employee."
This constraint is called a Multivalued Dependency.
Let's build intuition before presenting the formal definition.
Informal Statement:
A multivalued dependency X →→ Y (read as "X multi-determines Y" or "Y is multidependent on X") holds in a relation R if, for any given value of X, the set of Y values is determined independently of the values of the remaining attributes R - X - Y.
In our example:
EmpID →→ Skill means: For any given employee, the set of skills is determined by the employee alone, independent of what languages they speak.
EmpID →→ Language means: For any given employee, the set of languages is determined by the employee alone, independent of what skills they have.
Think of it this way:
If you want to know Alice's skills, you only need to know that it's Alice. Her languages are irrelevant to determining her skills—the skills form an independent set associated with Alice.
The Cartesian Product Symptom Explained:
If EmpID →→ Skill holds in our relation, and an employee E001 has skills {Java, Python} and languages {English, Spanish}, then for E001, every skill must pair with every language. The tuples for E001 form the Cartesian product:
{Java, Python} × {English, Spanish} = {(Java, English), (Java, Spanish), (Python, English), (Python, Spanish)}
This is the "signature" of MVDs in data—and exactly the redundancy we want to eliminate.
Now we present the rigorous definition that underlies all MVD theory.
Formal Definition:
Let R be a relation schema with attribute set U. Let X, Y ⊆ U. The multivalued dependency X →→ Y holds in R if and only if:
For any relation instance r of R, whenever two tuples t₁ and t₂ exist in r such that t₁[X] = t₂[X], then there also exist tuples t₃ and t₄ in r such that:
Let Z = U - X - Y (the remaining attributes).
In essence: If we have two tuples that agree on X, we can "swap" their Y and Z values to create two more tuples that must also be in the relation.
Think of the formal definition as a "tuple swapping" rule. If X →→ Y holds, and we have tuples (x, y₁, z₁) and (x, y₂, z₂), then we MUST also have tuples (x, y₁, z₂) and (x, y₂, z₁). The Y values and Z values can be freely interchanged while keeping X fixed. This is the mathematical formalization of "independence".
Visualizing the Formal Definition:
Consider relation R(X, Y, Z) where X →→ Y holds. Let's say we have these tuples:
| Tuple | X | Y | Z |
|---|---|---|---|
| t₁ | x | y₁ | z₁ |
| t₂ | x | y₂ | z₂ |
Since t₁[X] = t₂[X] = x, the MVD X →→ Y requires that these tuples also exist:
| Tuple | X | Y | Z | Origin |
|---|---|---|---|---|
| t₃ | x | y₁ | z₂ | Y from t₁, Z from t₂ |
| t₄ | x | y₂ | z₁ | Y from t₂, Z from t₁ |
All four tuples must exist. This creates the Cartesian product structure: for each X value, every Y value combines with every Z value.
Applying to Our Example:
With R(EmpID, Skill, Language):
If we have:
Then EmpID →→ Skill requires:
And indeed, these tuples exist in our example table, confirming the MVD holds.
Understanding the relationship between MVDs and FDs is crucial for mastering normalization theory. Let's examine this relationship rigorously.
Theorem: Every FD is an MVD
If X → Y holds in relation R, then X →→ Y also holds in R.
Proof:
Assume X → Y holds. Consider any two tuples t₁, t₂ with t₁[X] = t₂[X].
By the functional dependency, t₁[Y] = t₂[Y] (they must have the same Y value).
Now construct t₃ with:
But since t₁[Y] = t₂[Y], we have t₃[Y] = t₂[Y] as well.
So t₃ = (t₁[X], t₁[Y], t₂[Z]) = (t₂[X], t₂[Y], t₂[Z]) = t₂!
Similarly, t₄ = t₁.
The required tuples already exist (they're just t₁ and t₂ themselves), so the MVD holds. ∎
Multivalued dependencies are a generalization of functional dependencies. Every FD X → Y implies X →→ Y. However, the converse is false: X →→ Y does NOT imply X → Y. MVDs are strictly more general—they capture constraints that FDs cannot express.
Example Where MVD Holds But FD Does Not:
In our EmpID →→ Skill example:
The MVD expresses something fundamentally different from an FD—it's not about unique determination, but about independent association.
How do you identify when a multivalued dependency holds in a given dataset? Let's develop systematic techniques.
Method 1: The Cartesian Product Test
For each distinct X value, examine the sets of Y values and Z values:
If the count equals |Y values| × |Z values| for every X value, and all combinations appear, the MVD likely holds.
| EmpID | Skills (Y) | Languages (Z) | |Y| | |Z| | Tuples | Product? |
|---|---|---|---|---|---|---|
| E001 | {Java, Python} | {English, Spanish} | 2 | 2 | 4 | ✓ 2×2=4 |
| E002 | {C++, SQL} | {French} | 2 | 1 | 2 | ✓ 2×1=2 |
Method 2: The Missing Tuple Check
Apply the formal MVD definition directly:
If any required tuple is missing, the MVD does NOT hold.
R(CourseID, Instructor, TimeSlot)
┌──────────┬────────────┬──────────┐
│ CourseID │ Instructor │ TimeSlot │
├──────────┼────────────┼──────────┤
│ CS101 │ Smith │ 9AM │
│ CS101 │ Jones │ 2PM │
│ CS101 │ Smith │ 2PM │
└──────────┴────────────┴──────────┘
Testing with t₁ = (CS101, Smith, 9AM) and t₂ = (CS101, Jones, 2PM):
Required: t₃ = (CS101, Smith, 2PM) — EXISTS ✓
Required: t₄ = (CS101, Jones, 9AM) — MISSING ✗The MVD CourseID →→ Instructor does NOT hold because
the tuple (CS101, Jones, 9AM) is missing.
For the MVD to hold, Jones must teach at BOTH
time slots if Smith teaches at both.Method 3: Semantic Analysis
Often the most reliable method is to reason about the real-world semantics:
"Are the sets of Y values and Z values for a given X truly independent?"
In our employee example:
This semantic independence is the essence of MVDs. If you can articulate why Y and Z vary independently given X, you've identified an MVD.
MVDs aren't just theoretical constructs—they emerge naturally in many real-world data modeling scenarios. Understanding these patterns helps you recognize MVDs in your own designs.
When MVDs exist but aren't addressed through proper normalization (4NF), the data grows as the product of the multivalued attributes. If an employee has 5 skills and speaks 4 languages, you need 20 tuples. Add 3 certifications, and you need 60 tuples. This multiplicative growth causes storage waste and update anomalies—exactly the problems normalization aims to solve.
Anti-Pattern: Forcing Non-Independent Attributes Together
A common database design mistake is creating "universal" tables that combine multiple independent multi-valued facts:
BadDesign(EmpID, EmpName, Skill, Language, Certification, Project)
If skills, languages, certifications, and projects are all independent of each other (for a given employee), you have FOUR MVDs:
The result? For an employee with 3 skills, 2 languages, 4 certifications, and 5 projects, you need 3 × 2 × 4 × 5 = 120 tuples to represent data that semantically requires only 3 + 2 + 4 + 5 = 14 facts.
This is catastrophic redundancy that MVD theory helps us understand and eliminate.
Before we explore MVD axioms in detail (in a later page), let's establish some fundamental properties that distinguish MVDs from FDs.
Property 1: Complementation Rule
If X →→ Y holds in R(U), then X →→ (U - X - Y) also holds.
Interpretation: If Y is independent of the rest (given X), then the rest is independent of Y (given X). Independence is symmetric.
Example: If EmpID →→ Skill holds in R(EmpID, Skill, Language), then EmpID →→ Language automatically holds.
This is not true for FDs: X → Y does NOT imply X → (U - X - Y).
Due to the complementation rule, multivalued dependencies always come in pairs. When you state X →→ Y, you're implicitly also stating X →→ Z where Z is everything else. This is why MVD notation sometimes writes X →→ Y | Z to emphasize both sides of the independence.
Property 2: MVD in Binary Relations
In a relation with only two attributes R(X, Y), every MVD X →→ Y is trivially satisfied.
Reason: There are no other attributes Z to be independent of.
Property 3: Context Dependence
MVDs are highly context-dependent—they depend on the total set of attributes in the relation. An MVD that holds in R(A, B, C) might not hold if we add attribute D.
Example: EmpID →→ Skill holds in R(EmpID, Skill, Language). But in an expanded relation R(EmpID, Skill, Language, SkillLevel), if SkillLevel is functionally dependent on (EmpID, Skill), the MVD structure changes.
Property 4: Embedded MVDs
We often speak of MVDs holding "embedded" in a subset of attributes. Formally, X →→ Y[R] means the MVD holds when considering only the attributes of schema R. This notation is important when discussing projections.
| Property | Functional Dependency | Multivalued Dependency |
|---|---|---|
| Symmetry | X → Y does NOT imply Y → X | X →→ Y implies X →→ (R - X - Y) |
| Trivial Case | Trivial if Y ⊆ X | Trivial if Y ⊆ X OR XY = R |
| Binary Relations | Can be non-trivial | Always trivially satisfied |
| Context | Less context-sensitive | Highly context-dependent |
| Relationship | Special case of MVD | Generalization of FD |
We've established the foundational understanding of multivalued dependencies. Let's consolidate the key concepts:
What's Next:
Now that you understand what MVDs are and why they matter, the next page explores the notation system for MVDs in depth. We'll examine the X →→ Y notation, discuss conventions for specifying MVDs, and understand how MVDs are written in formal database literature.
You now understand the formal definition of multivalued dependencies, how they differ from functional dependencies, and why they represent a fundamental extension of dependency theory. This understanding is essential for mastering Fourth Normal Form (4NF), which specifically addresses the redundancy caused by MVDs.