Loading content...
Imagine a database with 10,000 Customer records. You need to update the phone number for 'John Smith.' But wait—there are 47 people named John Smith. Which one do you update?
This is the identity problem. In the real world, we distinguish people through context, visual recognition, and social knowledge. In a database, we have none of that. We need a reliable, unambiguous mechanism to uniquely identify each entity instance.
This is the purpose of key attributes—attributes (or combinations of attributes) whose values uniquely distinguish each entity instance from all others. Keys are the foundation of entity integrity, relationship navigation, and the entire relational model.
By the end of this page, you will understand what makes an attribute a key, the types of keys in ER modeling (simple vs. composite, natural vs. surrogate), ER diagram notation for key attributes, best practices for key selection, and common pitfalls to avoid in key design.
A key attribute is an attribute (or set of attributes) whose values uniquely identify each instance of an entity within an entity set. No two entity instances can have the same value for their key attribute(s).
Formal Definition:
A key is a minimal set of attributes such that no two distinct entity instances have the same combination of values for those attributes. A key attribute is any attribute that is part of a key.
The Uniqueness Property:
The fundamental property of a key is uniqueness: given a key value, you can identify at most one entity instance.
In ER modeling, we identify 'key attributes' that will uniquely identify entities. These become PRIMARY KEYs when mapped to relational tables. The concepts of superkey, candidate key, and alternate key are more relevant in the relational model, but the foundational idea—unique identification—is established right here in conceptual design.
Key attributes can be classified along several dimensions. Understanding these classifications helps you make better design decisions.
Simple Key vs. Composite Key:
| Type | Description | Example |
|---|---|---|
| Simple Key | A single attribute that uniquely identifies entities | student_id, isbn, ssn |
| Composite Key | Multiple attributes together form the unique identifier | (course_id, semester, year) for a course offering |
Natural Key vs. Surrogate Key:
This distinction is fundamental to database design philosophy:
Modern database design strongly favors surrogate keys as primary keys, while maintaining natural keys as unique constraints. This gives you stable internal references (surrogate) plus user-friendly lookup paths (natural). Example: Use user_id (surrogate) as PK, with email as a UNIQUE constraint.
ER diagrams use specific notation to identify key attributes, making the entity's identity mechanism immediately visible.
Chen Notation (Original ER):
In Chen notation, key attributes are shown with their names underlined:
┌─────────────────┐
│ EMPLOYEE │
└────────┬────────┘
│
┌─────────────┼─────────────┐
│ │ │
╱───┴───╲ ╱───┴───╲ ╱───┴───╲
( emp_id ) ( Name ) ( Salary )
╲━━━━━━━╱ ╲───────╱ ╲───────╱
‾‾‾‾‾‾
Underlined = Key Attribute
The underline immediately identifies which attribute(s) form the entity's key.
Composite Key Notation:
When multiple attributes together form the key, all participating attributes are underlined:
┌─────────────────────┐
│ COURSE_OFFERING │
└─────────┬───────────┘
│
┌───────────────┼───────────────┐
│ │ │
╱───┴───╲ ╱───┴───╲ ╱───┴───╲
( course_id) ( semester ) ( year ) ...
╲━━━━━━━━╱ ╲━━━━━━━━╱ ╲━━━━━━━╱
‾‾‾‾‾‾‾‾ ‾‾‾‾‾‾‾‾ ‾‾‾‾‾‾
All underlined = Composite Key
The combination of course_id, semester, and year uniquely identifies a course offering.
Crow's Foot (IE) Notation:
In Crow's Foot notation, the key attribute is typically:
┌─────────────────────────────┐
│ EMPLOYEE │
├─────────────────────────────┤
│ employee_id (PK) │ ← Key attribute marked
│ first_name VARCHAR │
│ last_name VARCHAR │
│ hire_date DATE │
│ salary DECIMAL │
└─────────────────────────────┘
ER diagrams typically show only the chosen primary key. If an entity has multiple candidate keys (e.g., both SSN and employee_id could identify an employee), the ER diagram marks one as the key; the others become alternate keys noted in documentation or implemented as UNIQUE constraints.
Choosing the right key attribute is one of the most consequential decisions in database design. A poor key choice creates technical debt that's expensive to fix later.
| Candidate Key | Unique? | Stable? | Simple? | Verdict |
|---|---|---|---|---|
| Employee email | In practice, yes | No—changes jobs | Yes | ❌ Use as unique constraint, not PK |
| SSN | Legally yes | Rarely changes | Yes | ❌ Privacy risk; use as unique constraint |
| Auto-increment ID | Guaranteed | Never changes | Yes | ✓ Excellent surrogate key |
| UUID | Mathematically yes | Never changes | Long string | ✓ Good for distributed systems |
| Phone number | Not at all | Frequently changes | Yes | ❌ Not suitable as key |
| (Country, Passport#) | Yes within country | Passport renewals | Composite | ⚠️ Limited use cases |
| ISBN | Yes by design | Book never changes | Yes | ✓ Good natural key for books |
The most robust pattern: use a surrogate key (auto-increment or UUID) as the primary key for internal references and JOINs, while defining natural keys as additional UNIQUE constraints for user-facing lookups. Example: user_id (PK) + email (UNIQUE) + username (UNIQUE).
Sometimes no single attribute can uniquely identify an entity. In these cases, we need a composite key—a combination of multiple attributes that together provide uniqueness.
When Composite Keys Are Needed:
Associative (Junction) Entities — An enrollment connects a student and a course; neither alone identifies the enrollment.
Weak Entities — A dependent relies on its owning entity's key plus a discriminator.
Time-Versioned Records — History tables often key on (entity_id, effective_date).
Multi-Tenant Systems — Records may key on (tenant_id, local_id).
Naturally Compound Identifiers — A course section is uniquely identified by course + semester + year + section.
1234567891011121314151617181920212223242526272829303132333435363738394041424344
-- Example 1: Associative Entity (Junction Table)-- Enrollment connects Student and CourseCREATE TABLE enrollments ( student_id INTEGER NOT NULL, course_id INTEGER NOT NULL, semester VARCHAR(20) NOT NULL, year INTEGER NOT NULL, grade CHAR(2), -- Composite primary key PRIMARY KEY (student_id, course_id, semester, year), FOREIGN KEY (student_id) REFERENCES students(student_id), FOREIGN KEY (course_id) REFERENCES courses(course_id)); -- Example 2: Weak Entity-- Order Line Items are identified by order + line numberCREATE TABLE order_items ( order_id INTEGER NOT NULL, line_number INTEGER NOT NULL, -- Discriminator product_id INTEGER NOT NULL, quantity INTEGER NOT NULL, unit_price DECIMAL(10,2) NOT NULL, -- Composite key: order + line number PRIMARY KEY (order_id, line_number), FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE); -- Example 3: Time-Versioned Data-- Price history tracks prices over timeCREATE TABLE product_prices ( product_id INTEGER NOT NULL, effective_date DATE NOT NULL, price DECIMAL(10,2) NOT NULL, created_by INTEGER, -- Composite key: product + effective date PRIMARY KEY (product_id, effective_date), FOREIGN KEY (product_id) REFERENCES products(product_id));Composite keys are powerful but have drawbacks: (1) Every foreign key referencing this table must include ALL key columns, (2) JOINs become more complex, (3) URLs and APIs become awkward (how do you URL-encode course_id/semester/year?). Consider whether a surrogate key plus a unique constraint would be simpler.
Composite Key vs. Surrogate Key Decision:
| Factor | Composite Key | Surrogate + Unique |
|---|---|---|
| Foreign key complexity | All columns repeated | Single column |
| JOIN complexity | Multiple ON conditions | Single ON condition |
| Index size | Larger (multi-column) | Smaller |
| API/URL friendliness | Poor | Good |
| Self-documenting | Yes | No |
| Natural relationships | Clear | Hidden |
| Cascading updates | Risky if key changes | Safe (surrogate stable) |
Let's examine key attribute design across several real-world domains.
Scenario: User Account System
Multiple natural identifiers exist, but which should be the key?
12345678910111213141516171819202122232425262728293031323334353637383940414243
CREATE TABLE users ( -- Surrogate primary key user_id SERIAL PRIMARY KEY, -- Natural identifiers as unique constraints email VARCHAR(254) UNIQUE NOT NULL, username VARCHAR(50) UNIQUE NOT NULL, -- Other attributes password_hash VARCHAR(255) NOT NULL, first_name VARCHAR(50), last_name VARCHAR(50), phone VARCHAR(20), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, is_active BOOLEAN DEFAULT true); -- Why this design?-- 1. user_id (surrogate):-- - Stable internal reference-- - Efficient for JOINs and indexes-- - Safe for foreign keys (never changes)---- 2. email (unique):-- - User-facing login identifier-- - BUT: users change email addresses-- - NOT the PK because it's mutable---- 3. username (unique):-- - Display name / vanity URL-- - MAY be changed in some systems-- - NOT the PK because it could change -- Related tables reference the stable surrogate keyCREATE TABLE user_sessions ( session_id UUID PRIMARY KEY, user_id INTEGER NOT NULL REFERENCES users(user_id), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, expires_at TIMESTAMP NOT NULL); -- If email changes, only the users table is updated-- user_sessions, orders, posts, etc. are unaffectedKey design errors are among the most expensive to fix later. Here are the patterns to avoid:
Once assigned, a primary key value should NEVER change. If you find yourself wanting to update a primary key, that's a design smell. Instead: (1) Add a new unique constraint for the mutable value, (2) Keep a stable surrogate as PK. Example: user_id (PK, immutable) + username (unique, changeable).
The Meaningful Key Trap in Depth:
❌ Wrong: SKU = 'ELE-TV-55-SAM-BLK-2023'
Encodes: Category(ELE) + Type(TV) + Size(55) + Brand(SAM) + Color(BLK) + Year(2023)
Problems:
- Samsung rebrands to 'SSG' → SKU is misleading
- Product moves from Electronics to Home category → SKU is wrong
- Black variant becomes 'Midnight' → SKU doesn't match marketing
✓ Right: SKU = 'P0012847'
Separate columns: category_id, product_type_id, brand_id, color_id, year
Benefits:
- SKU never changes
- Attributes can change independently
- No parsing required
Key attributes are the foundation of entity identity—the mechanism by which we uniquely identify and reference each entity instance. Proper key design is essential for data integrity, efficient operations, and system scalability.
Module Complete:
With key attributes, we've completed our comprehensive exploration of attribute types:
These concepts form the vocabulary for describing any entity in any domain. In the next module, we'll explore Relationships—how entities connect to each other and the constraints that govern those connections.
Congratulations! You now have a comprehensive understanding of attributes in ER modeling. You can classify attributes by their structure (simple, composite, multivalued), their nature (stored, derived), and their role (key, descriptive). This knowledge is fundamental to effective database design. Next: defining relationships between entities.