Loading content...
Consider a Person entity. How many phone numbers does a person have? One? Perhaps. But in reality, many people have:
Or consider a Book entity. How many authors does a book have? One author is simple. But what about:
These are multivalued attributes—attributes where a single entity instance can have multiple values. They represent a fundamental departure from the 'one entity, one value per attribute' model we've been building, and they require special handling in both conceptual design and physical implementation.
By the end of this page, you will understand what distinguishes multivalued attributes from simple and composite attributes, how to identify them in requirements, standard ER notation for representation, and multiple strategies for implementing them in relational databases—along with the First Normal Form implications.
A multivalued attribute is an attribute that can hold a set of values for a single entity instance. Unlike single-valued attributes (where each entity has exactly one value, possibly NULL), multivalued attributes allow zero, one, or many values per entity.
Formal Definition:
A multivalued attribute is an attribute whose domain includes sets of values, where each entity instance can have any number of values from the domain, including none.
Key Characteristics:
Single-Valued vs. Multivalued:
| Characteristic | Single-Valued | Multivalued |
|---|---|---|
| Values per entity | Exactly one (or NULL) | Zero to many |
| Storage in relation | One column | Requires special handling |
| Example | date_of_birth | phone_numbers |
| 1NF compliance | Natural | Requires normalization |
| Query complexity | Simple (= comparison) | Requires set operations |
The relational model is built on atomic values in cells—one value per cell. Multivalued attributes violate this fundamental assumption. That's why multivalued attributes in ER models must be transformed during the mapping to relational schema. They cannot exist 'as-is' in a properly normalized relational database.
During requirements analysis and conceptual modeling, recognizing multivalued attributes requires careful attention to language and domain semantics. Here are the key indicators:
| Entity | Multivalued Attribute | Typical Cardinality | Domain |
|---|---|---|---|
| Person | phone_numbers | 1-5 typical | Formatted phone strings |
| Person | email_addresses | 1-3 typical | Email format strings |
| Employee | skills | 5-20 typical | Skill names/IDs |
| Employee | certifications | 0-10 typical | Certification records |
| Student | majors | 1-2 typical | Major codes |
| Book | authors | 1-10 typical | Author names/IDs |
| Product | colors | 1-5 typical | Color values |
| Movie | genres | 1-4 typical | Genre categories |
| Property | amenities | 5-30 typical | Amenity codes |
If an attribute is usually single-valued but occasionally has multiple values, treat it as multivalued. A book might typically have one author, but if even 10% of books have multiple authors, model it as multivalued. Don't let the common case blind you to valid variations.
Multivalued vs. Separate Entity:
A critical question arises: should 'phone numbers' be a multivalued attribute of Person, or should Phone be its own entity with a relationship to Person?
Use multivalued attribute when:
Use separate entity when:
ER diagrams have distinct notation for multivalued attributes to distinguish them from single-valued attributes. The notation immediately signals that special implementation handling will be required.
Chen Notation (Original ER):
In Chen notation, multivalued attributes are represented with a double oval (double ellipse):
┌─────────────────┐
│ PERSON │
└────────┬────────┘
│
┌─────────────┼─────────────┐
│ │ │
╱───┴───╲ ╭───┴───╮ ╱───┴───╲
( Name ) ╱╱ ╲╲ ( DOB )
╲───────╱ (( Phone )) ╲───────╱
╲╲ ╱╱
╰───────╯
Single Oval Double Oval Single Oval
(simple) (multivalued) (simple)
The double border clearly distinguishes multivalued from simple attributes at a glance.
Extended Chen Notation with Cardinality:
Some variations add cardinality constraints:
╭───────────╮
╱╱ ╲╲
(( Phone {1,*} )) -- At least 1, no upper limit
╲╲ ╱╱
╰───────────╯
╭───────────╮
╱╱ ╲╲
(( Email {0,3} )) -- Optional, max 3
╲╲ ╱╱
╰───────────╯
The {min, max} notation specifies bounds on the number of values allowed.
Many diagramming tools don't natively support double-oval notation. Common workarounds include: asterisk suffix (phone*), curly braces (phone{}), or bracketed multiplicity [*]. Whatever notation your team uses, ensure it's documented and consistent.
Crow's Foot and Modern Notations:
In Crow's Foot (IE) notation, multivalued attributes are often shown with special markers or simply listed with notes:
┌────────────────────────────────┐
│ EMPLOYEE │
├────────────────────────────────┤
│ employee_id (PK) │
│ first_name VARCHAR(50) │
│ last_name VARCHAR(50) │
│ phone_numbers* VARCHAR(20) │ ← asterisk indicates multivalued
│ email_addresses* VARCHAR(100) │
│ skills* VARCHAR(50) │
│ hire_date DATE │
└────────────────────────────────┘
* = Multivalued (will become separate table)
Since Crow's Foot notation is closer to physical design, some tools show multivalued attributes as what they'll become—separate linked tables.
Composite + Multivalued Combination:
Attributes can be both composite AND multivalued. Consider multiple addresses where each address has components:
Person
│
╭─────┴─────╮
╱╱ ╲╲
(( Addresses )) ← Multivalued (double oval)
╲╲ ╱╱
╰─────┬─────╯
┌───────────────┼───────────────┐
│ │ │
╱───┴───╲ ╱────┴────╲ ╱────┴────╲
( Street ) ( City ) ( ZIP )
╲───────╱ ╲─────────╱ ╲─────────╱
Each person can have multiple addresses, and each address decomposes into components. This combines both concepts.
To understand how multivalued attributes must be handled in relational databases, we need to understand First Normal Form (1NF)—the foundational rule of relational database design.
First Normal Form Requirement:
A relation is in First Normal Form (1NF) if and only if all domains contain only atomic values—no repeating groups, no sets, no lists within a single column.
Multivalued attributes, by definition, hold sets of values. They directly violate 1NF.
Why 1NF Matters:
| Problem with Non-Atomic Values | Consequence |
|---|---|
| Can't index individual values | Full table scans for lookups |
| Can't enforce uniqueness | Duplicate values may creep in |
| Can't establish foreign keys per value | Referential integrity lost |
| Query complexity explodes | Need string parsing, difficult WHERE clauses |
| Updates are error-prone | Must parse, modify, and rewrite the whole set |
| Storage is unpredictable | Variable-length sets complicate capacity planning |
A common mistake is storing multivalued attributes as comma-separated strings: 'red,blue,green'. This violates 1NF and creates nightmares: searching for 'red' might match 'darkred', updating is string manipulation, max values are by string length not count. Never do this in production systems.
Example of 1NF Violation:
❌ VIOLATES 1NF:
┌─────────────┬───────────────┬────────────────────────────┐
│ employee_id │ name │ phone_numbers │
├─────────────┼───────────────┼────────────────────────────┤
│ 1 │ Alice Smith │ 555-1234, 555-5678 │ ← Multiple values!
│ 2 │ Bob Jones │ 555-9999 │
│ 3 │ Carol Brown │ 555-1111, 555-2222, 555-3333│ ← Variable count
└─────────────┴───────────────┴────────────────────────────┘
-- Query nightmare: Find all employees with phone 555-5678
SELECT * FROM employees WHERE phone_numbers LIKE '%555-5678%';
-- This might match 555-56789 or 1555-5678 too!
The solution is to transform the multivalued attribute during ER-to-relational mapping, which we'll explore next.
When mapping ER models with multivalued attributes to relational schemas, several strategies exist. The choice depends on query patterns, cardinality bounds, and database capabilities.
Strategy 1: Create a Separate Table
This is the standard, normalized approach. The multivalued attribute becomes its own table, linked to the parent via foreign key.
123456789101112131415161718192021222324252627282930313233343536373839
-- Parent entity tableCREATE TABLE employees ( employee_id SERIAL PRIMARY KEY, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, hire_date DATE NOT NULL); -- Separate table for multivalued attributeCREATE TABLE employee_phone_numbers ( phone_id SERIAL PRIMARY KEY, employee_id INTEGER NOT NULL, phone_number VARCHAR(20) NOT NULL, phone_type VARCHAR(20), -- mobile, home, work is_primary BOOLEAN DEFAULT false, FOREIGN KEY (employee_id) REFERENCES employees(employee_id) ON DELETE CASCADE, -- Prevent duplicate numbers for same employee UNIQUE (employee_id, phone_number), -- Only one primary phone per employee UNIQUE (employee_id, is_primary) WHERE is_primary = true -- Partial unique index (PostgreSQL)); -- Query: Get all phone numbers for an employeeSELECT e.first_name, e.last_name, p.phone_number, p.phone_typeFROM employees eJOIN employee_phone_numbers p ON e.employee_id = p.employee_idWHERE e.employee_id = 123; -- Query: Find employees with a specific phone numberSELECT e.* FROM employees eJOIN employee_phone_numbers p ON e.employee_id = p.employee_idWHERE p.phone_number = '555-1234';Advantages:
Disadvantages:
Let's examine multivalued attributes in complete scenarios, from requirements to implementation.
Scenario: Employee Skill Tracking
Requirement: Track skills for employees to enable project staffing.
Analysis:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556
-- Skills catalog (normalized skill names)CREATE TABLE skills ( skill_id SERIAL PRIMARY KEY, skill_name VARCHAR(100) UNIQUE NOT NULL, skill_category VARCHAR(50), -- Programming, Database, Cloud, etc. is_active BOOLEAN DEFAULT true); -- Employees table (without multivalued attribute)CREATE TABLE employees ( employee_id SERIAL PRIMARY KEY, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, department VARCHAR(50), hire_date DATE NOT NULL); -- Bridge table for multivalued attribute with per-value attributesCREATE TABLE employee_skills ( employee_id INTEGER NOT NULL, skill_id INTEGER NOT NULL, proficiency VARCHAR(20) CHECK (proficiency IN ('Beginner', 'Intermediate', 'Advanced', 'Expert')), years_experience DECIMAL(4,1), last_used_year INTEGER, certified BOOLEAN DEFAULT false, PRIMARY KEY (employee_id, skill_id), FOREIGN KEY (employee_id) REFERENCES employees(employee_id) ON DELETE CASCADE, FOREIGN KEY (skill_id) REFERENCES skills(skill_id)); -- Sample queries -- Find all Python expertsSELECT e.first_name, e.last_name, es.years_experienceFROM employees eJOIN employee_skills es ON e.employee_id = es.employee_idJOIN skills s ON es.skill_id = s.skill_idWHERE s.skill_name = 'Python' AND es.proficiency = 'Expert'; -- Find employees with 3+ skillsSELECT e.employee_id, e.first_name, e.last_name, COUNT(*) as skill_countFROM employees eJOIN employee_skills es ON e.employee_id = es.employee_idGROUP BY e.employee_id, e.first_name, e.last_nameHAVING COUNT(*) >= 3; -- Skills gap analysis: employees missing specific skillSELECT e.*FROM employees eWHERE NOT EXISTS ( SELECT 1 FROM employee_skills es JOIN skills s ON es.skill_id = s.skill_id WHERE es.employee_id = e.employee_id AND s.skill_name = 'Kubernetes');Choosing the right approach for multivalued attributes involves weighing multiple factors. Here's a decision framework:
| Factor | Separate Table | Array/JSON | Fixed Columns |
|---|---|---|---|
| Cardinality upper bound | Unlimited ✓ | Practical limit ~100s | Small (2-4) ✓ |
| Per-value attributes | Full support ✓ | JSON only | Repetitive |
| Value-level constraints | Full support ✓ | Limited | Manual check |
| Query simplicity | Requires JOINs | Special functions | Simple ✓ |
| Index efficiency | Excellent ✓ | Good (GIN) | Multiple indexes |
| Schema evolution | Add columns ✓ | Flexible ✓ | Schema change |
| Portability | Universal ✓ | DB-specific | Universal ✓ |
| Storage efficiency | Normalized ✓ | Compact | Sparse waste |
Document expected cardinality during modeling: 'A product typically has 1-5 colors, max 20' or 'Employees average 8 skills, rarely exceed 30.' This guides implementation choices and helps catch requirement violations early (an employee with 100 skills may indicate a data quality issue).
Multivalued attributes capture the reality that entities often have multiple values for certain properties. Understanding how to identify, represent, and implement them is essential for accurate data modeling.
What's Next:
We've covered attributes that hold data directly—whether simple, composite, or multivalued. But some attributes don't store data at all; they compute values from other attributes. These are derived attributes, and understanding when to store vs. compute is a fundamental design decision—which we'll explore next.
You now understand multivalued attributes—their nature, notation, normalization implications, and implementation strategies. You can make informed decisions about separate tables vs. arrays vs. JSON, and recognize when multivalued attributes in requirements signal the need for special database design considerations. Next: derived attributes.