Attributes - Learning Module

Loading content...

0/241

Key Attributes

The Identity Problem

Imagine a database with 10,000 Customer records. You need to update the phone number for 'John Smith.' But wait—there are 47 people named John Smith. Which one do you update?

This is the identity problem. In the real world, we distinguish people through context, visual recognition, and social knowledge. In a database, we have none of that. We need a reliable, unambiguous mechanism to uniquely identify each entity instance.

This is the purpose of key attributes—attributes (or combinations of attributes) whose values uniquely distinguish each entity instance from all others. Keys are the foundation of entity integrity, relationship navigation, and the entire relational model.

What You Will Learn

By the end of this page, you will understand what makes an attribute a key, the types of keys in ER modeling (simple vs. composite, natural vs. surrogate), ER diagram notation for key attributes, best practices for key selection, and common pitfalls to avoid in key design.

What is a Key Attribute?

A key attribute is an attribute (or set of attributes) whose values uniquely identify each instance of an entity within an entity set. No two entity instances can have the same value for their key attribute(s).

Formal Definition:

A key is a minimal set of attributes such that no two distinct entity instances have the same combination of values for those attributes. A key attribute is any attribute that is part of a key.

The Uniqueness Property:

The fundamental property of a key is uniqueness: given a key value, you can identify at most one entity instance.

Properties of Key Attributes

•Uniqueness — Every entity instance has a distinct key value; no duplicates allowed
•Non-null — Key attributes cannot have NULL values (how can you identify something with 'unknown'?)
•Immutability (ideally) — Key values should rarely or never change; changing keys is operationally complex
•Minimality — A key should contain no unnecessary attributes; removing any attribute would break uniqueness
•Stability — Key values remain valid over time; they don't 'expire' or change meaning

Keys in the ER Model vs. Relational Model

In ER modeling, we identify 'key attributes' that will uniquely identify entities. These become PRIMARY KEYs when mapped to relational tables. The concepts of superkey, candidate key, and alternate key are more relevant in the relational model, but the foundational idea—unique identification—is established right here in conceptual design.

Types of Key Attributes

Key attributes can be classified along several dimensions. Understanding these classifications helps you make better design decisions.

Simple Key vs. Composite Key:

Type	Description	Example
Simple Key	A single attribute that uniquely identifies entities	`student_id`, `isbn`, `ssn`
Composite Key	Multiple attributes together form the unique identifier	(`course_id`, `semester`, `year`) for a course offering

Natural Key vs. Surrogate Key:

This distinction is fundamental to database design philosophy:

Natural Keys

•Definition: Attributes that exist naturally in the domain and happen to be unique
•Examples: SSN, ISBN, email, VIN, passport number
•Advantage: Meaningful to users; self-documenting
•Advantage: Already unique by external authority
•Disadvantage: May change (email, passport)
•Disadvantage: May be missing (not everyone has SSN)
•Disadvantage: Privacy concerns (SSN as identifier)

Surrogate Keys

•Definition: Artificial identifiers created solely for database purposes
•Examples: auto-increment IDs, UUIDs, generated codes
•Advantage: Guaranteed unique and stable
•Advantage: Compact and efficient for joins
•Advantage: No privacy exposure
•Disadvantage: Meaningless to users
•Disadvantage: Hidden meaning for debugging

Industry Practice

Modern database design strongly favors surrogate keys as primary keys, while maintaining natural keys as unique constraints. This gives you stable internal references (surrogate) plus user-friendly lookup paths (natural). Example: Use user_id (surrogate) as PK, with email as a UNIQUE constraint.

Representing Key Attributes in ER Diagrams

ER diagrams use specific notation to identify key attributes, making the entity's identity mechanism immediately visible.

Chen Notation (Original ER):

In Chen notation, key attributes are shown with their names underlined:

              ┌─────────────────┐
              │    EMPLOYEE     │
              └────────┬────────┘
                       │
         ┌─────────────┼─────────────┐
         │             │             │
     ╱───┴───╲     ╱───┴───╲     ╱───┴───╲
    ( emp_id  )   (  Name   )   ( Salary  )
     ╲━━━━━━━╱     ╲───────╱     ╲───────╱
      ‾‾‾‾‾‾
      Underlined = Key Attribute

The underline immediately identifies which attribute(s) form the entity's key.

Composite Key Notation:

When multiple attributes together form the key, all participating attributes are underlined:

              ┌─────────────────────┐
              │   COURSE_OFFERING   │
              └─────────┬───────────┘
                        │
        ┌───────────────┼───────────────┐
        │               │               │
    ╱───┴───╲       ╱───┴───╲       ╱───┴───╲
   ( course_id)    ( semester )    (  year   )     ...
    ╲━━━━━━━━╱      ╲━━━━━━━━╱      ╲━━━━━━━╱
     ‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾‾‾        ‾‾‾‾‾‾
      All underlined = Composite Key

The combination of course_id, semester, and year uniquely identifies a course offering.

Crow's Foot (IE) Notation:

In Crow's Foot notation, the key attribute is typically:

Listed first in the attribute list
Marked with 'PK' (Primary Key)
Sometimes shown in a separate compartment

┌─────────────────────────────┐
│          EMPLOYEE           │
├─────────────────────────────┤
│ employee_id (PK)            │  ← Key attribute marked
│ first_name       VARCHAR    │
│ last_name        VARCHAR    │
│ hire_date        DATE       │
│ salary           DECIMAL    │
└─────────────────────────────┘

Candidate Keys in ER

ER diagrams typically show only the chosen primary key. If an entity has multiple candidate keys (e.g., both SSN and employee_id could identify an employee), the ER diagram marks one as the key; the others become alternate keys noted in documentation or implemented as UNIQUE constraints.

Selecting the Right Key Attribute

Choosing the right key attribute is one of the most consequential decisions in database design. A poor key choice creates technical debt that's expensive to fix later.

Key Selection Criteria

•Guaranteed uniqueness — Will this value ALWAYS be unique? Not just 'usually' but 'contractually.' Consider edge cases.
•Stability — Will this value change? Name changes happen. Email changes happen. IDs shouldn't.
•Non-nullable — Will every entity always have a value? If not, it can't be a key.
•Simplicity — Simpler keys are easier to use in JOINs, URLs, APIs, and debugging.
•Efficiency — Smaller keys are faster for indexing and JOIN operations (integers beat strings).
•Privacy — Does using this as a key expose sensitive information? SSN as PK is a security risk.
•Control — Do you control the key's uniqueness, or does an external system? Relying on external uniqueness is risky.

Key Selection Analysis Examples
Candidate Key	Unique?	Stable?	Simple?	Verdict
Employee email	In practice, yes	No—changes jobs	Yes	❌ Use as unique constraint, not PK
SSN	Legally yes	Rarely changes	Yes	❌ Privacy risk; use as unique constraint
Auto-increment ID	Guaranteed	Never changes	Yes	✓ Excellent surrogate key
UUID	Mathematically yes	Never changes	Long string	✓ Good for distributed systems
Phone number	Not at all	Frequently changes	Yes	❌ Not suitable as key
(Country, Passport#)	Yes within country	Passport renewals	Composite	⚠️ Limited use cases
ISBN	Yes by design	Book never changes	Yes	✓ Good natural key for books

The Hybrid Approach

The most robust pattern: use a surrogate key (auto-increment or UUID) as the primary key for internal references and JOINs, while defining natural keys as additional UNIQUE constraints for user-facing lookups. Example: user_id (PK) + email (UNIQUE) + username (UNIQUE).

Composite Keys in Depth

Sometimes no single attribute can uniquely identify an entity. In these cases, we need a composite key—a combination of multiple attributes that together provide uniqueness.

When Composite Keys Are Needed:

Associative (Junction) Entities — An enrollment connects a student and a course; neither alone identifies the enrollment.
Weak Entities — A dependent relies on its owning entity's key plus a discriminator.
Time-Versioned Records — History tables often key on (entity_id, effective_date).
Multi-Tenant Systems — Records may key on (tenant_id, local_id).
Naturally Compound Identifiers — A course section is uniquely identified by course + semester + year + section.

composite-keys.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
-- Example 1: Associative Entity (Junction Table)
-- Enrollment connects Student and Course
CREATE TABLE enrollments (
    student_id      INTEGER NOT NULL,
    course_id       INTEGER NOT NULL,
    semester        VARCHAR(20) NOT NULL,
    year            INTEGER NOT NULL,
    grade           CHAR(2),
    
    -- Composite primary key
    PRIMARY KEY (student_id, course_id, semester, year),
    
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
 
-- Example 2: Weak Entity
-- Order Line Items are identified by order + line number
CREATE TABLE order_items (
    order_id        INTEGER NOT NULL,
    line_number     INTEGER NOT NULL,  -- Discriminator
    product_id      INTEGER NOT NULL,
    quantity        INTEGER NOT NULL,
    unit_price      DECIMAL(10,2) NOT NULL,
    
    -- Composite key: order + line number
    PRIMARY KEY (order_id, line_number),
    
    FOREIGN KEY (order_id) REFERENCES orders(order_id) ON DELETE CASCADE
);
 
-- Example 3: Time-Versioned Data
-- Price history tracks prices over time
CREATE TABLE product_prices (
    product_id      INTEGER NOT NULL,
    effective_date  DATE NOT NULL,
    price           DECIMAL(10,2) NOT NULL,
    created_by      INTEGER,
    
    -- Composite key: product + effective date
    PRIMARY KEY (product_id, effective_date),
    
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Composite Key Trade-offs

Composite keys are powerful but have drawbacks: (1) Every foreign key referencing this table must include ALL key columns, (2) JOINs become more complex, (3) URLs and APIs become awkward (how do you URL-encode course_id/semester/year?). Consider whether a surrogate key plus a unique constraint would be simpler.

Composite Key vs. Surrogate Key Decision:

Factor	Composite Key	Surrogate + Unique
Foreign key complexity	All columns repeated	Single column
JOIN complexity	Multiple ON conditions	Single ON condition
Index size	Larger (multi-column)	Smaller
API/URL friendliness	Poor	Good
Self-documenting	Yes	No
Natural relationships	Clear	Hidden
Cascading updates	Risky if key changes	Safe (surrogate stable)

Key Attributes in Real-World Domains

Let's examine key attribute design across several real-world domains.

Scenario: User Account System

Multiple natural identifiers exist, but which should be the key?

user-keys.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
CREATE TABLE users (
    -- Surrogate primary key
    user_id         SERIAL PRIMARY KEY,
    
    -- Natural identifiers as unique constraints
    email           VARCHAR(254) UNIQUE NOT NULL,
    username        VARCHAR(50) UNIQUE NOT NULL,
    
    -- Other attributes
    password_hash   VARCHAR(255) NOT NULL,
    first_name      VARCHAR(50),
    last_name       VARCHAR(50),
    phone           VARCHAR(20),
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active       BOOLEAN DEFAULT true
);
 
-- Why this design?
-- 1. user_id (surrogate):
--    - Stable internal reference
--    - Efficient for JOINs and indexes
--    - Safe for foreign keys (never changes)
--
-- 2. email (unique):
--    - User-facing login identifier
--    - BUT: users change email addresses
--    - NOT the PK because it's mutable
--
-- 3. username (unique):
--    - Display name / vanity URL
--    - MAY be changed in some systems
--    - NOT the PK because it could change
 
-- Related tables reference the stable surrogate key
CREATE TABLE user_sessions (
    session_id      UUID PRIMARY KEY,
    user_id         INTEGER NOT NULL REFERENCES users(user_id),
    created_at      TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at      TIMESTAMP NOT NULL
);
 
-- If email changes, only the users table is updated
-- user_sessions, orders, posts, etc. are unaffected

Common Mistakes with Key Attributes

Key design errors are among the most expensive to fix later. Here are the patterns to avoid:

Key Design Mistakes to Avoid

•Using mutable attributes as primary keys — Email, phone, username—anything users can change. When keys change, all foreign keys referencing them must also change (cascade updates) or break.
•Using meaningful keys that encode data — A product code like 'CAT-RED-LG-2023' where segments mean category, color, size, year. If any segment's meaning changes, the 'key' becomes misleading or wrong.
•Overloading keys with multiple purposes — Using SSN as both identity and a routing mechanism, or UPC as both identifier and category encoder. Keys should identify, nothing more.
•Not planning for edge cases — 'SSN is always unique' until you have international employees. 'Email is always present' until you support social login without email.
•Giant composite keys — A 6-column composite key makes every foreign key include 6 columns. Consider a surrogate.
•Exposing internal sequences — Auto-increment IDs in URLs reveal entity count and creation order. Consider UUIDs for public-facing IDs.
•Reusing keys after deletion — 'A' is deleted, ID 1001 is freed, 'B' gets ID 1001. Historical references to 'ID 1001' now point to wrong entity. Don't reuse keys.

The Immutability Principle

Once assigned, a primary key value should NEVER change. If you find yourself wanting to update a primary key, that's a design smell. Instead: (1) Add a new unique constraint for the mutable value, (2) Keep a stable surrogate as PK. Example: user_id (PK, immutable) + username (unique, changeable).

The Meaningful Key Trap in Depth:

❌ Wrong: SKU = 'ELE-TV-55-SAM-BLK-2023'
   Encodes: Category(ELE) + Type(TV) + Size(55) + Brand(SAM) + Color(BLK) + Year(2023)
   
   Problems:
   - Samsung rebrands to 'SSG' → SKU is misleading
   - Product moves from Electronics to Home category → SKU is wrong
   - Black variant becomes 'Midnight' → SKU doesn't match marketing
   
✓ Right: SKU = 'P0012847'
   Separate columns: category_id, product_type_id, brand_id, color_id, year
   
   Benefits:
   - SKU never changes
   - Attributes can change independently
   - No parsing required

Summary: Key Attributes

Key attributes are the foundation of entity identity—the mechanism by which we uniquely identify and reference each entity instance. Proper key design is essential for data integrity, efficient operations, and system scalability.

Key Takeaways

•Keys provide unique identification — Every entity must have a key; no two instances share the same key value.
•Underlined in ER diagrams — Key attributes are marked with underlining in Chen notation; 'PK' in Crow's Foot.
•Natural vs. Surrogate — Natural keys exist in the domain; surrogate keys are created for database purposes. Modern practice favors surrogate PKs with natural unique constraints.
•Composite keys combine attributes — When no single attribute is unique, multiple attributes together form the key.
•Stability is paramount — Keys should never change. Design for immutability from the start.
•Avoid encoding meaning — Keys should identify, not describe. Keep semantics in regular attributes.

Module Complete:

With key attributes, we've completed our comprehensive exploration of attribute types:

Simple Attributes — Atomic, indivisible values
Composite Attributes — Structured attributes with meaningful components
Multivalued Attributes — Attributes that can hold multiple values
Derived Attributes — Computed from other stored data
Key Attributes — Uniquely identify entity instances

These concepts form the vocabulary for describing any entity in any domain. In the next module, we'll explore Relationships—how entities connect to each other and the constraints that govern those connections.

Module Complete

Congratulations! You now have a comprehensive understanding of attributes in ER modeling. You can classify attributes by their structure (simple, composite, multivalued), their nature (stored, derived), and their role (key, descriptive). This knowledge is fundamental to effective database design. Next: defining relationships between entities.