Loading content...
Consider a simple question: How many phone numbers does a customer have? One? Maybe. Two? Often. Three or more? Increasingly common. Zero? Possible. The answer is inherently variable—there's no fixed count.
This is the essence of multivalued attributes: attributes that can hold multiple values for a single entity instance. Phone numbers, email addresses, skills, hobbies, languages spoken, degrees earned, previous addresses—these are all multivalued. An employee might have three phone numbers; a person might speak five languages; a product might have dozens of tags.
The ER model handles multivalued attributes elegantly—a double-oval notation indicates that the attribute can have multiple values. But the relational model has a fundamental problem: First Normal Form (1NF) requires that every attribute value be atomic. A cell in a relation cannot contain a set of values; it must contain exactly one indivisible value.
This normalization requirement forces us to transform multivalued ER attributes into relational structures that preserve the semantic meaning while maintaining atomicity. The transformation isn't optional—it's mandatory for a valid relational schema.
By the end of this page, you will understand why multivalued attributes cannot exist directly in relational tables, master the standard transformation technique of creating separate tables, handle composite multivalued attributes, and appreciate modern alternatives like array types. You'll be equipped to correctly map any multivalued attribute while preserving all semantic information.
To appreciate the solution, we must first understand the problem. First Normal Form (1NF) is the foundational requirement of the relational model. It states:
A relation is in 1NF if and only if all underlying domains contain atomic values only.
In practical terms, each cell in a table must contain a single, indivisible value—not a list, not a set, not a repeating group. The atomicity requirement stems from relational algebra's foundational assumptions about how operations work on relations.
123456789101112131415161718
-- ❌ VIOLATES 1NF: Multivalued in columnCREATE TABLE Employee_Bad ( employee_id INT PRIMARY KEY, name VARCHAR(100), -- Multiple values in one column! phone_numbers VARCHAR(200) -- Stores: "555-1234,555-5678,555-9012"); -- Problems this creates:-- 1. How many phone numbers? Parse the string.-- 2. Find employee by phone? LIKE '%555-1234%'-- (slow, error-prone, can't use indexes)-- 3. Add a phone? String manipulation-- 4. Remove a phone? Complex string parsing-- 5. What if phone has country code? Delimiter conflicts-- 6. Enforce uniqueness? Impossible-- 7. Data validation? Per-phone format? No.1234567891011121314151617181920212223
-- ✓ SATISFIES 1NF: Separate tableCREATE TABLE Employee ( employee_id INT PRIMARY KEY, name VARCHAR(100)); CREATE TABLE Employee_Phone ( employee_id INT, phone_number VARCHAR(20), phone_type VARCHAR(20), PRIMARY KEY (employee_id, phone_number), FOREIGN KEY (employee_id) REFERENCES Employee(employee_id)); -- Benefits:-- 1. Count phones? COUNT(*) with GROUP BY-- 2. Find by phone? Direct WHERE clause, indexed-- 3. Add phone? Simple INSERT-- 4. Remove phone? Simple DELETE-- 5. Structured data with country code? Easy-- 6. Uniqueness? PK enforces per employee-- 7. Validation? CHECK constraints workThe Practical Problems:
Storing multivalued attributes as delimited strings (comma-separated values) or JSON arrays might seem convenient, but it creates severe problems:
Storing comma-separated values in a single column is one of the most common database design mistakes. It seems simpler at first but creates technical debt that compounds over time. Every query involving these values becomes a workaround rather than a clean operation.
The universally accepted solution for multivalued attributes is to create a separate table that stores the values individually, linked back to the parent entity via a foreign key. This approach fully normalizes the data, enables proper indexing and constraints, and allows all standard relational operations.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859
-- Parent Entity: EmployeeCREATE TABLE Employee ( employee_id SERIAL PRIMARY KEY, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, email VARCHAR(100) NOT NULL UNIQUE, hire_date DATE NOT NULL); -- Multivalued Attribute 1: Phone NumbersCREATE TABLE Employee_Phone ( employee_id INTEGER NOT NULL, phone_number VARCHAR(20) NOT NULL, phone_type VARCHAR(20) DEFAULT 'mobile', is_primary BOOLEAN DEFAULT FALSE, -- Composite primary key ensures unique phones per employee PRIMARY KEY (employee_id, phone_number), -- Foreign key with cascade delete FOREIGN KEY (employee_id) REFERENCES Employee(employee_id) ON DELETE CASCADE, -- Only one primary phone per employee (via trigger or app logic) CONSTRAINT chk_phone_type CHECK (phone_type IN ('mobile', 'home', 'work', 'fax'))); -- Multivalued Attribute 2: SkillsCREATE TABLE Employee_Skill ( employee_id INTEGER NOT NULL, skill_name VARCHAR(100) NOT NULL, proficiency INTEGER CHECK (proficiency BETWEEN 1 AND 5), years_exp INTEGER, PRIMARY KEY (employee_id, skill_name), FOREIGN KEY (employee_id) REFERENCES Employee(employee_id) ON DELETE CASCADE); -- Multivalued Attribute 3: CertificationsCREATE TABLE Employee_Certification ( employee_id INTEGER NOT NULL, cert_name VARCHAR(200) NOT NULL, issuing_org VARCHAR(200), issue_date DATE NOT NULL, expiry_date DATE, credential_id VARCHAR(100), PRIMARY KEY (employee_id, cert_name, issue_date), FOREIGN KEY (employee_id) REFERENCES Employee(employee_id) ON DELETE CASCADE, CONSTRAINT chk_cert_dates CHECK (expiry_date IS NULL OR expiry_date > issue_date));For multivalued attribute tables, common naming patterns include: Entity_Attribute (Employee_Phone), EntityAttribute (EmployeePhone), or AttributeOfEntity (PhoneOfEmployee). Choose one convention and apply it consistently. The Entity_Attribute format is most explicit about the parent-child relationship.
Attributes can be both composite AND multivalued. An employee might have multiple addresses, where each address is itself a composite of street, city, state, and postal code. A student might have multiple degrees, where each degree comprises institution, field of study, degree type, and graduation year.
Mapping Composite Multivalued Attributes:
The process combines the techniques from composite and multivalued mapping:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081
-- Example 1: Multiple Addresses (composite multivalued)-- Each employee can have multiple addresses (home, work, mailing, etc.)-- Each address has multiple components CREATE TABLE Employee_Address ( -- Option 1: Composite natural key -- employee_id INTEGER NOT NULL, -- address_type VARCHAR(20) NOT NULL, -- PRIMARY KEY (employee_id, address_type) -- Option 2: Surrogate key (often cleaner) address_id SERIAL PRIMARY KEY, employee_id INTEGER NOT NULL, -- Composite attribute components (flattened) address_type VARCHAR(20) NOT NULL, -- home, work, mailing street VARCHAR(100) NOT NULL, apartment VARCHAR(20), city VARCHAR(50) NOT NULL, state VARCHAR(50) NOT NULL, postal_code VARCHAR(20) NOT NULL, country VARCHAR(50) DEFAULT 'USA', -- Metadata is_primary BOOLEAN DEFAULT FALSE, valid_from DATE, valid_to DATE, -- NULL if current FOREIGN KEY (employee_id) REFERENCES Employee(employee_id) ON DELETE CASCADE, -- Unique: one address of each type per employee UNIQUE (employee_id, address_type), CONSTRAINT chk_address_type CHECK (address_type IN ('home', 'work', 'mailing', 'emergency')), CONSTRAINT chk_valid_dates CHECK (valid_to IS NULL OR valid_to >= valid_from)); -- Example 2: Education History (composite multivalued)CREATE TABLE Employee_Education ( education_id SERIAL PRIMARY KEY, employee_id INTEGER NOT NULL, -- Composite components institution VARCHAR(200) NOT NULL, degree_type VARCHAR(50) NOT NULL, -- BS, MS, PhD, etc. field_of_study VARCHAR(200) NOT NULL, graduation_year INTEGER, gpa DECIMAL(3, 2), honors VARCHAR(100), -- Cum Laude, etc. FOREIGN KEY (employee_id) REFERENCES Employee(employee_id) ON DELETE CASCADE, CONSTRAINT chk_year CHECK (graduation_year IS NULL OR (graduation_year >= 1950 AND graduation_year <= 2100)), CONSTRAINT chk_gpa CHECK (gpa IS NULL OR (gpa >= 0 AND gpa <= 4.0))); -- Example 3: Work History (composite multivalued with dates)CREATE TABLE Employee_Work_History ( history_id SERIAL PRIMARY KEY, employee_id INTEGER NOT NULL, -- Composite components company_name VARCHAR(200) NOT NULL, job_title VARCHAR(200) NOT NULL, start_date DATE NOT NULL, end_date DATE, -- NULL if current responsibilities TEXT, FOREIGN KEY (employee_id) REFERENCES Employee(employee_id) ON DELETE CASCADE, CONSTRAINT chk_dates CHECK (end_date IS NULL OR end_date >= start_date));For composite multivalued attributes, natural composite keys can become complex. A surrogate key (auto-increment ID) often simplifies the schema, especially when referencing from other tables. The foreign key to the parent entity, combined with UNIQUE constraints, still enforces the necessary uniqueness.
Once multivalued attributes are properly mapped to separate tables, querying them becomes a matter of using JOINs, subqueries, and aggregation functions. Here are the essential query patterns:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475
-- PATTERN 1: Get entity with all its values (LEFT JOIN)SELECT e.employee_id, e.first_name, e.last_name, p.phone_number, p.phone_typeFROM Employee eLEFT JOIN Employee_Phone p ON e.employee_id = p.employee_idWHERE e.employee_id = 101ORDER BY p.is_primary DESC; -- PATTERN 2: Find entities with a specific valueSELECT e.employee_id, e.first_name, e.last_nameFROM Employee eINNER JOIN Employee_Skill s ON e.employee_id = s.employee_idWHERE s.skill_name = 'Python' AND s.proficiency >= 3; -- PATTERN 3: Count values per entitySELECT e.employee_id, e.first_name, e.last_name, COUNT(p.phone_number) AS phone_count, COUNT(s.skill_name) AS skill_countFROM Employee eLEFT JOIN Employee_Phone p ON e.employee_id = p.employee_idLEFT JOIN Employee_Skill s ON e.employee_id = s.employee_idGROUP BY e.employee_id, e.first_name, e.last_name; -- PATTERN 4: Find entities with multiple valuesSELECT e.employee_id, e.first_nameFROM Employee eINNER JOIN Employee_Phone p ON e.employee_id = p.employee_idGROUP BY e.employee_id, e.first_nameHAVING COUNT(*) >= 2; -- Has 2+ phone numbers -- PATTERN 5: Find entities with ALL specified values (set containment)-- Find employees who know BOTH Python AND JavaSELECT e.employee_id, e.first_nameFROM Employee eWHERE e.employee_id IN ( SELECT employee_id FROM Employee_Skill WHERE skill_name = 'Python')AND e.employee_id IN ( SELECT employee_id FROM Employee_Skill WHERE skill_name = 'Java'); -- Alternative using GROUP BYSELECT e.employee_id, e.first_nameFROM Employee eINNER JOIN Employee_Skill s ON e.employee_id = s.employee_idWHERE s.skill_name IN ('Python', 'Java')GROUP BY e.employee_id, e.first_nameHAVING COUNT(DISTINCT s.skill_name) = 2; -- PATTERN 6: Aggregate values into a list (database-specific)-- PostgreSQL: STRING_AGG / ARRAY_AGGSELECT e.employee_id, e.first_name, STRING_AGG(p.phone_number, ', ') AS all_phones, ARRAY_AGG(s.skill_name ORDER BY s.skill_name) AS skills_arrayFROM Employee eLEFT JOIN Employee_Phone p ON e.employee_id = p.employee_idLEFT JOIN Employee_Skill s ON e.employee_id = s.employee_idGROUP BY e.employee_id, e.first_name; -- MySQL: GROUP_CONCATSELECT e.employee_id, e.first_name, GROUP_CONCAT(p.phone_number SEPARATOR ', ') AS all_phonesFROM Employee eLEFT JOIN Employee_Phone p ON e.employee_id = p.employee_idGROUP BY e.employee_id, e.first_name; -- PATTERN 7: Find entities WITHOUT a specific valueSELECT e.employee_id, e.first_nameFROM Employee eWHERE NOT EXISTS ( SELECT 1 FROM Employee_Skill s WHERE s.employee_id = e.employee_id AND s.skill_name = 'COBOL');The multivalued attribute table's foreign key column should always be indexed (it often is, as part of the primary key). For queries that filter on the value column, add an index there too. For example, if you frequently search for employees by skill name, index Employee_Skill(skill_name).
Modern relational databases offer alternative approaches to handling multivalued attributes using array types and JSON columns. These features can simplify schemas and application code but come with tradeoffs.
PostgreSQL Array Types:
PostgreSQL allows columns to store arrays of values, enabling multivalued attributes in a single column while still supporting array-aware queries:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384
-- Using PostgreSQL Arrays for Simple Multivalued Attributes CREATE TABLE Employee_With_Arrays ( employee_id SERIAL PRIMARY KEY, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, email VARCHAR(100) NOT NULL UNIQUE, -- Array types for simple multivalued attributes phone_numbers VARCHAR(20)[], -- Array of phone strings skills TEXT[], -- Array of skill names languages TEXT[] -- Array of languages spoken); -- Inserting array dataINSERT INTO Employee_With_Arrays (first_name, last_name, email, phone_numbers, skills, languages)VALUES ('Jane', 'Smith', 'jane@example.com', ARRAY['555-1234', '555-5678'], ARRAY['Python', 'SQL', 'Data Analysis'], ARRAY['English', 'Spanish', 'French']); -- Querying arrays-- Contains a specific value?SELECT * FROM Employee_With_ArraysWHERE 'Python' = ANY(skills); -- Contains all of these values?SELECT * FROM Employee_With_ArraysWHERE skills @> ARRAY['Python', 'SQL']; -- Contains both -- Array lengthSELECT first_name, array_length(skills, 1) AS skill_countFROM Employee_With_Arrays; -- Unnest (expand array to rows)SELECT e.first_name, unnest(e.skills) AS skillFROM Employee_With_Arrays e; -- GIN index for array containment queriesCREATE INDEX idx_skills ON Employee_With_Arrays USING GIN (skills); -- ----------------------- Using JSONB for composite multivalued attributes-- --------------------- CREATE TABLE Employee_With_JSON ( employee_id SERIAL PRIMARY KEY, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, -- JSONB for structured multivalued data phone_numbers JSONB, -- [{type: "mobile", number: "555-1234"}, ...] addresses JSONB, -- [{type: "home", street: "...", city: "..."}, ...] education JSONB -- [{institution: "...", degree: "...", year: 2020}, ...]); -- Inserting JSONB dataINSERT INTO Employee_With_JSON (first_name, last_name, phone_numbers, addresses)VALUES ( 'John', 'Doe', '[{"type": "mobile", "number": "555-1234", "primary": true}, {"type": "work", "number": "555-5678", "primary": false}]'::jsonb, '[{"type": "home", "street": "123 Main St", "city": "Springfield", "state": "IL"}]'::jsonb); -- Querying JSONB-- Get first phone numberSELECT first_name, phone_numbers->0->>'number' AS first_phoneFROM Employee_With_JSON; -- Find by value inside JSONB arraySELECT * FROM Employee_With_JSONWHERE phone_numbers @> '[{"number": "555-1234"}]'::jsonb; -- Expand JSONB array to rowsSELECT e.first_name, p->>'type' AS phone_type, p->>'number' AS phone_numberFROM Employee_With_JSON e, jsonb_array_elements(e.phone_numbers) AS p; -- GIN index for JSONB queriesCREATE INDEX idx_phones ON Employee_With_JSON USING GIN (phone_numbers);| Aspect | Separate Table | Array Column | JSONB Column |
|---|---|---|---|
| 1NF Compliant | ✓ Yes | ✗ Technically no | ✗ Technically no |
| Full SQL support | ✓ Full | ◐ Partial | ◐ Partial |
| Indexing | ✓ Standard B-tree | ◐ GIN index | ◐ GIN index |
| Constraints on values | ✓ CHECK, NOT NULL | ✗ Limited | ✗ JSON Schema only |
| Foreign keys on values | ✓ Yes | ✗ No | ✗ No |
| JOIN performance | ✓ Optimized | ◐ Requires unnest | ◐ Requires extraction |
| Schema flexibility | ○ Fixed columns | ◐ Mixed | ✓ Highly flexible |
| ORM support | ✓ Full support | ◐ Varies by ORM | ◐ Varies |
| Portability | ✓ Standard SQL | ✗ PostgreSQL | ◐ Most modern DBs |
Use arrays/JSON when: (1) values are simple and don't need individual constraints, (2) you don't need foreign keys referencing individual values, (3) the data is document-like, (4) application expects array/object format, (5) portability isn't a concern. Use separate tables (the traditional approach) when: referential integrity matters, you need individual value constraints, query patterns focus on individual values, or you need maximum portability.
Real-world business rules often limit how many values a multivalued attribute can have. "A customer may have at most 5 phone numbers." "An employee must have at least one emergency contact." "A product can have between 1 and 10 tags."
These cardinality constraints (minimum and maximum counts) must be enforced in the relational schema:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485
-- MAXIMUM CARDINALITY: "Up to 5 phone numbers per customer"-- Option 1: Check in a triggerCREATE OR REPLACE FUNCTION check_max_phones()RETURNS TRIGGER AS $$BEGIN IF (SELECT COUNT(*) FROM Customer_Phone WHERE customer_id = NEW.customer_id) >= 5 THEN RAISE EXCEPTION 'Customer cannot have more than 5 phone numbers'; END IF; RETURN NEW;END;$$ LANGUAGE plpgsql; CREATE TRIGGER trg_max_phonesBEFORE INSERT ON Customer_PhoneFOR EACH ROW EXECUTE FUNCTION check_max_phones(); -- Option 2: Fixed columns for small fixed maximums-- If truly limited (e.g., max 3), consider denormalizingCREATE TABLE Customer_With_Fixed_Phones ( customer_id BIGINT PRIMARY KEY, name VARCHAR(100) NOT NULL, phone_1 VARCHAR(20), phone_1_type VARCHAR(20), phone_2 VARCHAR(20), phone_2_type VARCHAR(20), phone_3 VARCHAR(20), phone_3_type VARCHAR(20));-- Note: This introduces NULLs and complicates queries. Use cautiously. -- MINIMUM CARDINALITY: "Must have at least one emergency contact"-- Enforced via application logic or deferred constraint checking-- Cannot easily enforce via standard SQL without procedural code -- Example: Trigger to prevent deleting last contactCREATE OR REPLACE FUNCTION prevent_last_contact_delete()RETURNS TRIGGER AS $$BEGIN IF (SELECT COUNT(*) FROM Employee_Emergency_Contact WHERE employee_id = OLD.employee_id) <= 1 THEN RAISE EXCEPTION 'Employee must have at least one emergency contact'; END IF; RETURN OLD;END;$$ LANGUAGE plpgsql; CREATE TRIGGER trg_min_contactsBEFORE DELETE ON Employee_Emergency_ContactFOR EACH ROW EXECUTE FUNCTION prevent_last_contact_delete(); -- EXACTLY N values: "Employee must list exactly 3 references"-- Usually enforced at application level during form submission-- Can be verified via CHECK on status change: CREATE TABLE Job_Application ( application_id BIGINT PRIMARY KEY, applicant_name VARCHAR(100) NOT NULL, status VARCHAR(20) DEFAULT 'draft'); CREATE TABLE Application_Reference ( application_id BIGINT, reference_name VARCHAR(100), reference_phone VARCHAR(20), PRIMARY KEY (application_id, reference_name), FOREIGN KEY (application_id) REFERENCES Job_Application(application_id)); -- Trigger: Verify 3 references before status changes to 'submitted'CREATE OR REPLACE FUNCTION validate_references()RETURNS TRIGGER AS $$BEGIN IF NEW.status = 'submitted' AND (SELECT COUNT(*) FROM Application_Reference WHERE application_id = NEW.application_id) != 3 THEN RAISE EXCEPTION 'Application must have exactly 3 references'; END IF; RETURN NEW;END;$$ LANGUAGE plpgsql; CREATE TRIGGER trg_validate_refsBEFORE UPDATE ON Job_ApplicationFOR EACH ROW EXECUTE FUNCTION validate_references();Minimum cardinality ("at least N") is notoriously difficult to enforce purely in SQL because it requires checking counts during insert/delete operations across transaction boundaries. Many systems enforce minimums at the application level during business workflows (e.g., form validation) rather than at the database level.
Certain multivalued attribute scenarios recur across many database designs. Here are established patterns with best practices:
123456789101112131415161718192021222324
-- Pattern: Flexible Contact Information-- Handles multiple types of contact (phone, email, social) -- with unified structure CREATE TABLE Contact_Info ( contact_id SERIAL PRIMARY KEY, entity_type VARCHAR(50) NOT NULL, -- 'customer', 'vendor' entity_id BIGINT NOT NULL, contact_type VARCHAR(50) NOT NULL, -- 'phone', 'email', 'twitter' contact_value VARCHAR(200) NOT NULL, label VARCHAR(50), -- 'work', 'personal', 'main' is_primary BOOLEAN DEFAULT FALSE, is_verified BOOLEAN DEFAULT FALSE, created_at TIMESTAMP DEFAULT NOW(), -- Polymorphic association index -- (Could also use separate tables per entity type) UNIQUE (entity_type, entity_id, contact_type, contact_value)); -- Indexes for common queriesCREATE INDEX idx_contact_entity ON Contact_Info(entity_type, entity_id);CREATE INDEX idx_contact_lookup ON Contact_Info(contact_type, contact_value);Multivalued attributes represent one-to-many relationships between an entity and a set of values. The relational model's 1NF requirement prohibits storing multiple values in a single cell, necessitating transformation. Let's consolidate what we've learned:
What's Next:
We've covered how to map stored attributes—simple, composite, and multivalued. But not all attributes represent stored data. Some are computed from other attributes. The next page explores Derived Attribute Mapping—handling calculated values like age (from birthdate), total_price (from quantity × unit_price), and other derived data.
You now have comprehensive knowledge of multivalued attribute mapping. You understand why separate tables are necessary, how to structure them properly, how to query them effectively, and when modern alternatives might be appropriate. Next, we tackle derived attributes.