Database Management Systems: ER to Relational Mapping

Entity Mapping

Level: Intermediate

Duration: 60 mins

Topic: ER to Relational Mapping

Key Attributes

The Foundation of Entity Identity

In any database, we must answer a fundamental question: How do we uniquely identify each entity instance? Without unique identification, we cannot reliably retrieve, update, or reference specific records. We cannot establish relationships between entities. The entire relational foundation collapses.

Key attributes solve this problem. They are the attributes (or combinations of attributes) that uniquely identify each entity instance. In ER diagrams, key attributes are underlined, visually marking them as special. In the relational model, they become PRIMARY KEY constraints—the fundamental guarantee of row uniqueness.

But mapping keys isn't always straightforward. An entity may have multiple candidate keys. Keys may be composite (multiple attributes). We must decide between natural keys (meaningful real-world values) and surrogate keys (system-generated identifiers). Each choice has profound implications for data integrity, performance, and maintainability.

This page completes our entity mapping journey by examining how to properly identify, select, and implement key attributes in a relational schema.

What You Will Learn

By the end of this page, you will understand the complete taxonomy of keys (superkey, candidate key, primary key, alternate key, foreign key), master the mapping of ER keys to relational constraints, handle composite keys effectively, and make informed decisions in the natural vs. surrogate key debate.

The Complete Taxonomy of Keys

Before mapping keys, we must understand the precise terminology. The relational model defines a hierarchy of key concepts, each building on the previous:

Key Hierarchy Definitions

•Superkey: Any set of attributes that uniquely identifies each tuple in a relation. A superkey may contain extra attributes beyond the minimum needed. Example: {student_id, name, email} is a superkey for Student (but student_id alone suffices).
•Candidate Key: A minimal superkey—no proper subset of it is also a superkey. Each candidate key is a superkey, but a superkey is not necessarily a candidate key. Example: {student_id} and {email} might both be candidate keys if both are unique.
•Primary Key: The candidate key chosen as the principal identifier for the relation. There is exactly one primary key per relation. It becomes the PRIMARY KEY constraint.
•Alternate Key: Any candidate key not chosen as the primary key. Alternate keys become UNIQUE constraints—they still enforce uniqueness but aren't the primary identifier.
•Foreign Key: An attribute (or set) in one relation that references the primary key of another relation (SQL also allows references to any column set declared UNIQUE). Foreign keys implement relationships between tables.
•Simple Key: A key consisting of a single attribute. Example: student_id.
•Composite Key: A key consisting of multiple attributes. Example: {department_code, course_number} for a Course entity.

Example: Student Entity

Consider a Student entity with attributes:

student_id (university-assigned identifier)
ssn (social security number)
email (university email address)
first_name, last_name, date_of_birth, etc.

Superkeys include:

{student_id}
{ssn}
{email}
{student_id, ssn}
{student_id, first_name}
{email, last_name}
... (any superset of a candidate key)

Candidate Keys (minimal):

{student_id}
{ssn}
{email}

Primary Key Selection: We choose {student_id} as primary key.

Alternate Keys: {ssn} and {email} remain as UNIQUE constraints.

Minimality Is Key

A candidate key must be minimal—removing any attribute would break uniqueness. If {A, B} uniquely identifies tuples but {A} alone also does, then {A, B} is NOT a candidate key; only {A} is. This minimality ensures candidate keys have no redundant components.
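
The superkey and minimality definitions can be checked mechanically against sample data. The sketch below is illustrative only: the rows are hypothetical, and the helper names `is_superkey` and `candidate_keys` are our own. Note that sample data can only refute a key, never prove one; real keys are declared by the designer from business rules.

```python
from itertools import combinations

# Hypothetical sample rows for the Student entity described above.
ROWS = [
    {"student_id": "S001", "ssn": "111-11-1111", "email": "ana@uni.example", "last_name": "Lee"},
    {"student_id": "S002", "ssn": "222-22-2222", "email": "ben@uni.example", "last_name": "Lee"},
    {"student_id": "S003", "ssn": "333-33-3333", "email": "cy@uni.example", "last_name": "Diaz"},
]


def is_superkey(attrs, rows):
    """True if no two rows agree on every attribute in attrs."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows):
    """Return the minimal attribute sets that are unique within rows."""
    attributes = sorted(rows[0])
    found = []
    # Try small sets first so that every key found is minimal.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(key) <= set(combo) for key in found):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(combo, rows):
                found.append(combo)
    return found


print(is_superkey(("student_id", "last_name"), ROWS))  # superkey, but not minimal
print(is_superkey(("last_name",), ROWS))               # two students share a last name
print(candidate_keys(ROWS))
```

On this sample, {student_id, last_name} passes the superkey test but is skipped by `candidate_keys` because {student_id} alone already suffices.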

Mapping ER Keys to PRIMARY KEY

In ER diagrams, key attributes are indicated by underlining the attribute name. When mapping to the relational schema, these underlined attributes become the PRIMARY KEY constraint of the relation.

Primary Key Mapping Steps

•Identify Key Attributes: In the ER diagram, locate the underlined attribute(s) of the entity. This is the declared key.
•Create Column(s): Include the key attribute(s) as column(s) in the relation with appropriate data types.
•Apply PRIMARY KEY Constraint: Declare the column(s) as PRIMARY KEY. This enforces uniqueness and NOT NULL automatically.
•If Composite: For multi-attribute keys, include all component columns in the PRIMARY KEY definition.
•Identify Alternate Keys: If the entity has other candidate keys (other unique identifiers not chosen as primary), apply UNIQUE constraints to those.

primary_key_mapping.sql
-- Example 1: Simple Primary Key
-- ER: Student entity with underlined 'student_id'
CREATE TABLE Student (
    student_id      VARCHAR(20)     PRIMARY KEY,  -- From ER key attribute
    first_name      VARCHAR(50)     NOT NULL,
    last_name       VARCHAR(50)     NOT NULL,
    email           VARCHAR(100)    NOT NULL UNIQUE,  -- Alternate key
    ssn             CHAR(11)        UNIQUE,           -- Alternate key
    date_of_birth   DATE            NOT NULL
);
 
-- Example 2: Composite Primary Key
-- ER: Course_Section with composite key {course_id, section_number, semester, year}
CREATE TABLE Course_Section (
    course_id       VARCHAR(10)     NOT NULL,
    section_number  VARCHAR(5)      NOT NULL,
    semester        VARCHAR(10)     NOT NULL,
    year            INTEGER         NOT NULL,
    instructor_id   INTEGER,
    room_number     VARCHAR(20),
    schedule        VARCHAR(100),
    
    -- Composite primary key (all columns from ER key)
    PRIMARY KEY (course_id, section_number, semester, year),
    
    -- Foreign keys
    FOREIGN KEY (course_id) REFERENCES Course(course_id),
    FOREIGN KEY (instructor_id) REFERENCES Instructor(instructor_id)
);
 
-- Example 3: Multiple Candidate Keys (choosing one as primary)
-- ER shows both 'product_id' and 'sku' as unique identifiers
CREATE TABLE Product (
    product_id      SERIAL          PRIMARY KEY,    -- Chosen as primary
    sku             VARCHAR(50)     NOT NULL UNIQUE, -- Alternate key
    upc             VARCHAR(20)     UNIQUE,          -- Another alternate key
    product_name    VARCHAR(200)    NOT NULL,
    price           DECIMAL(10, 2)  NOT NULL
);
 
-- The PRIMARY KEY constraint automatically implies:
-- - NOT NULL (key columns cannot be null)
-- - UNIQUE (key values cannot be duplicated)
-- - Creates an index (performance optimization)

PRIMARY KEY Semantics

A PRIMARY KEY constraint is conceptually equivalent to NOT NULL + UNIQUE, plus it designates the principal identifier. Most databases also create an index automatically. You could combine NOT NULL and UNIQUE by hand, and foreign keys may reference such a column too, but PRIMARY KEY declares intent explicitly and is the default target when a REFERENCES clause names only the table.
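
To see these guarantees in action, here is a minimal sketch that runs a cut-down Student table in an in-memory SQLite database through Python's standard sqlite3 module. The table is simplified and the helper `try_insert` is hypothetical. One assumption worth flagging: SQLite, unlike most systems, does not imply NOT NULL for a non-integer PRIMARY KEY, so the sketch spells it out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        student_id TEXT NOT NULL PRIMARY KEY,  -- NOT NULL spelled out for SQLite
        email      TEXT NOT NULL UNIQUE,       -- alternate key
        last_name  TEXT NOT NULL
    )
""")
conn.execute("INSERT INTO Student VALUES ('S001', 'ana@uni.example', 'Lee')")


def try_insert(row):
    """Attempt an insert and report whether the constraints accepted it."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = {
    "duplicate primary key": try_insert(("S001", "other@uni.example", "Kim")),
    "duplicate alternate key": try_insert(("S002", "ana@uni.example", "Kim")),
    "null primary key": try_insert((None, "nil@uni.example", "Kim")),
    "fresh row": try_insert(("S003", "cy@uni.example", "Diaz")),
}
for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

Only the last insert is accepted; the other three fail on the primary key, the alternate key, and the NOT NULL rule respectively.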

Handling Composite Keys

When no single attribute uniquely identifies an entity, multiple attributes combine to form a composite key. This is common in associative entities (junction tables) and in weak entities, whose identity depends on an owning entity (a room within a building, for example).

Common Composite Key Scenarios
Entity	Composite Key	Rationale
Order_Item	(order_id, product_id)	Same product can appear in multiple orders; same order can have multiple products
Course_Section	(course_id, section_number, semester, year)	Same course has multiple sections across terms
Employee_Project	(employee_id, project_id)	Many-to-many relationship
Flight_Segment	(flight_number, segment_number)	A flight may have multiple segments
Version	(artifact_id, version_number)	Multiple versions per artifact
Room	(building_id, room_number)	Room numbers unique only within building

composite_keys.sql
-- Composite Key Example 1: Order Line Items
CREATE TABLE Order_Item (
    order_id        INTEGER         NOT NULL,
    product_id      INTEGER         NOT NULL,
    quantity        INTEGER         NOT NULL CHECK (quantity > 0),
    unit_price      DECIMAL(10, 2)  NOT NULL,
    discount_pct    DECIMAL(5, 2)   DEFAULT 0,
    
    -- Composite primary key
    PRIMARY KEY (order_id, product_id),
    
    -- Foreign keys (ORDER is a reserved word, so the table name must be quoted)
    FOREIGN KEY (order_id) REFERENCES "Order"(order_id) ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES Product(product_id)
);
 
-- Composite Key Example 2: Many-to-Many with Attributes
CREATE TABLE Student_Course_Enrollment (
    student_id      VARCHAR(20)     NOT NULL,  -- Same type as Student.student_id
    course_id       VARCHAR(10)     NOT NULL,
    semester        VARCHAR(10)     NOT NULL,
    year            INTEGER         NOT NULL,
    
    -- Enrollment attributes
    enrollment_date DATE            NOT NULL DEFAULT CURRENT_DATE,
    grade           CHAR(2),
    status          VARCHAR(20)     DEFAULT 'enrolled',
    
    -- Four-column composite key
    PRIMARY KEY (student_id, course_id, semester, year),
    
    FOREIGN KEY (student_id) REFERENCES Student(student_id),
    FOREIGN KEY (course_id) REFERENCES Course(course_id),
    
    CONSTRAINT chk_status CHECK (status IN ('enrolled', 'withdrawn', 'completed'))
);
 
-- Composite Key Example 3: Hierarchical Identifier
CREATE TABLE Course_Module_Lesson (
    course_id       VARCHAR(10)     NOT NULL,
    module_number   INTEGER         NOT NULL,
    lesson_number   INTEGER         NOT NULL,
    lesson_title    VARCHAR(200)    NOT NULL,
    content_url     VARCHAR(500),
    duration_min    INTEGER,
    
    -- Three-level composite key
    PRIMARY KEY (course_id, module_number, lesson_number),
    
    FOREIGN KEY (course_id) REFERENCES Course(course_id)
);
 
-- Referencing a composite key from another table
CREATE TABLE Lesson_Quiz (
    quiz_id         SERIAL          PRIMARY KEY,
    course_id       VARCHAR(10)     NOT NULL,
    module_number   INTEGER         NOT NULL,
    lesson_number   INTEGER         NOT NULL,
    quiz_title      VARCHAR(200)    NOT NULL,
    pass_score      INTEGER         DEFAULT 70,
    
    -- Foreign key referencing composite primary key
    FOREIGN KEY (course_id, module_number, lesson_number) 
        REFERENCES Course_Module_Lesson(course_id, module_number, lesson_number)
        ON DELETE CASCADE
);
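
The composite primary key and the composite foreign key above can be exercised with a small sketch. It uses an in-memory SQLite database with simplified column types, so treat it as an approximation of the PostgreSQL listing rather than a copy. The helper `rejected` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Course_Module_Lesson (
        course_id     TEXT    NOT NULL,
        module_number INTEGER NOT NULL,
        lesson_number INTEGER NOT NULL,
        lesson_title  TEXT    NOT NULL,
        PRIMARY KEY (course_id, module_number, lesson_number)
    );
    CREATE TABLE Lesson_Quiz (
        quiz_id       INTEGER PRIMARY KEY,
        course_id     TEXT    NOT NULL,
        module_number INTEGER NOT NULL,
        lesson_number INTEGER NOT NULL,
        FOREIGN KEY (course_id, module_number, lesson_number)
            REFERENCES Course_Module_Lesson (course_id, module_number, lesson_number)
            ON DELETE CASCADE
    );
    INSERT INTO Course_Module_Lesson VALUES ('DB101', 1, 1, 'Keys');
    INSERT INTO Course_Module_Lesson VALUES ('DB101', 1, 2, 'Joins');
    INSERT INTO Lesson_Quiz VALUES (1, 'DB101', 1, 2);
""")


def rejected(sql):
    """Run a statement and report whether a constraint blocked it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Individual components may repeat; only the full combination must be unique.
dup_lesson = rejected("INSERT INTO Course_Module_Lesson VALUES ('DB101', 1, 2, 'Again')")
# The composite foreign key must match all three parent columns at once.
orphan_quiz = rejected("INSERT INTO Lesson_Quiz VALUES (2, 'DB101', 9, 9)")
# Deleting the parent lesson cascades to its quiz.
conn.execute(
    "DELETE FROM Course_Module_Lesson "
    "WHERE course_id = 'DB101' AND module_number = 1 AND lesson_number = 2"
)
quizzes_left = conn.execute("SELECT COUNT(*) FROM Lesson_Quiz").fetchone()[0]
print(dup_lesson, orphan_quiz, quizzes_left)
```

Note how the DELETE has to name all three key columns, which is the verbosity the disadvantages list below refers to.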

Composite Key Advantages

•Natural representation of identity
•No separate ID column needed
•Self-documenting (columns explain what makes it unique)
•Enforces business rules through structure

Composite Key Disadvantages

•Foreign key references are verbose
•URLs/APIs with multiple IDs are complex
•Larger index size (multiple columns)
•Join conditions more complex

Wide Composite Keys

If a composite key exceeds 3-4 columns, consider whether a surrogate key might simplify the schema. Very wide keys make foreign key references cumbersome and can impact performance. Balance natural semantics against practical usability.
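
One way to act on this advice, sketched under assumptions: give Course_Section a surrogate `section_id` and keep the four business columns as a composite UNIQUE constraint. The `section_id` column and the `Section_Meeting` child table are hypothetical additions for illustration, and the sketch runs on in-memory SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Course_Section (
        section_id     INTEGER PRIMARY KEY,   -- hypothetical surrogate key
        course_id      TEXT    NOT NULL,
        section_number TEXT    NOT NULL,
        semester       TEXT    NOT NULL,
        year           INTEGER NOT NULL,
        -- The wide natural key survives as an alternate key.
        CONSTRAINT uq_section UNIQUE (course_id, section_number, semester, year)
    );
    CREATE TABLE Section_Meeting (
        meeting_id INTEGER PRIMARY KEY,
        -- One narrow column instead of a four-column foreign key.
        section_id INTEGER NOT NULL REFERENCES Course_Section (section_id),
        weekday    TEXT    NOT NULL
    );
""")

insert_section = (
    "INSERT INTO Course_Section (course_id, section_number, semester, year) "
    "VALUES (?, ?, ?, ?)"
)
section_id = conn.execute(insert_section, ("DB101", "A", "Fall", 2024)).lastrowid
conn.execute(
    "INSERT INTO Section_Meeting (section_id, weekday) VALUES (?, ?)",
    (section_id, "Mon"),
)

try:
    conn.execute(insert_section, ("DB101", "A", "Fall", 2024))
    duplicate_blocked = False
except sqlite3.IntegrityError:
    duplicate_blocked = True

print(section_id, duplicate_blocked)
```

The business rule (one section per course, number, and term) is still enforced, while child tables carry a single integer.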

Natural vs. Surrogate Keys

One of the most debated topics in database design: Should primary keys be natural (meaningful business values) or surrogate (system-generated identifiers)?

Natural Keys are real-world identifiers with business meaning:

Social Security Number, Email Address
ISBN (for books), VIN (for vehicles)
Employee ID assigned by HR system
Composite business attributes (country + passport number)

Surrogate Keys are artificial identifiers with no business meaning:

Auto-increment integers (1, 2, 3, ...)
UUIDs (550e8400-e29b-41d4-a716-446655440000)
Generated sequences

Both have legitimate use cases. The right choice depends on your specific requirements.

Advantages of Natural Keys

•Inherent Meaning: Values are recognizable and meaningful in reports, logs, and debugging
•No Additional Column: Uses existing data, no extra storage for artificial IDs
•Reduces Joins: Entity can be identified by meaningful value without joining to get 'the real ID'
•Data Quality: Forces proper handling of the natural identifier
•Integration-Friendly: Often matches external system identifiers

Disadvantages of Natural Keys

•Keys Can Change: Business values sometimes change (email, SSN corrections, mergers)
•Cascading Updates: Change to key requires updating all foreign key references
•Privacy Concerns: Sensitive data (SSN) as key exposes it everywhere it's referenced
•Size/Performance: Strings and wide composites are slower to index than integers
•Non-Existence Problem: What if the natural key isn't known yet (temporary record)?
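
The "Keys Can Change" and "Cascading Updates" points can be made concrete with a small sketch. The `Member` and `Loan` tables are hypothetical, and the example runs on in-memory SQLite with ON UPDATE CASCADE doing the propagation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Member (
        email TEXT NOT NULL PRIMARY KEY      -- natural key used as primary key
    );
    CREATE TABLE Loan (
        loan_id      INTEGER PRIMARY KEY,
        member_email TEXT NOT NULL
            REFERENCES Member (email) ON UPDATE CASCADE
    );
    INSERT INTO Member VALUES ('ana@old.example');
    INSERT INTO Loan (member_email) VALUES ('ana@old.example');
    INSERT INTO Loan (member_email) VALUES ('ana@old.example');
    INSERT INTO Loan (member_email) VALUES ('ana@old.example');
""")

# One logical change to the parent row...
conn.execute("UPDATE Member SET email = 'ana@new.example' WHERE email = 'ana@old.example'")

# ...forces a rewrite of every referencing child row.
rewritten = conn.execute(
    "SELECT COUNT(*) FROM Loan WHERE member_email = 'ana@new.example'"
).fetchone()[0]
stale = conn.execute(
    "SELECT COUNT(*) FROM Loan WHERE member_email = 'ana@old.example'"
).fetchone()[0]
print(rewritten, stale)
```

With three loans the cost is trivial, but the same update against millions of referencing rows, spread over several tables, is where natural keys start to hurt.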

Decision Framework

Use natural keys when: (1) they are truly immutable (ISBN, VIN), (2) they are compact, (3) they are always known at insert time, (4) they match integration requirements. Use surrogate keys when: (1) natural key might change, (2) natural key is large/composite, (3) natural key is sensitive, (4) record may exist before natural key is assigned. When in doubt, use the hybrid approach: a surrogate primary key for references, plus UNIQUE constraints on the natural identifiers.
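
A hybrid design can be sketched as a surrogate primary key with the natural identifier kept as an alternate key. The `Member` and `Loan` tables below are hypothetical and run on in-memory SQLite; the point is that changing the natural identifier touches one row and leaves every reference alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Member (
        member_id INTEGER PRIMARY KEY,     -- surrogate key, used for references
        email     TEXT NOT NULL UNIQUE     -- natural identifier, kept as alternate key
    );
    CREATE TABLE Loan (
        loan_id   INTEGER PRIMARY KEY,
        member_id INTEGER NOT NULL REFERENCES Member (member_id)
    );
    INSERT INTO Member (member_id, email) VALUES (1, 'ana@old.example');
    INSERT INTO Loan (member_id) VALUES (1), (1), (1);
""")

loans_before = conn.execute("SELECT loan_id, member_id FROM Loan ORDER BY loan_id").fetchall()

# The business value changes, but the identity (member_id) does not.
cursor = conn.execute("UPDATE Member SET email = 'ana@new.example' WHERE member_id = 1")
rows_touched = cursor.rowcount

loans_after = conn.execute("SELECT loan_id, member_id FROM Loan ORDER BY loan_id").fetchall()

# Lookups by the natural identifier still work through the UNIQUE index.
found_id = conn.execute(
    "SELECT member_id FROM Member WHERE email = 'ana@new.example'"
).fetchone()[0]
print(rows_touched, loans_before == loans_after, found_id)
```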

Alternate Keys and UNIQUE Constraints

When an entity has multiple candidate keys, one becomes PRIMARY KEY and the others become alternate keys. Alternate keys are implemented using UNIQUE constraints—they enforce uniqueness but aren't the principal identifier.

Why Alternate Keys Matter:

Business Rules: Email must be unique even if we use surrogate ID as primary
Query Patterns: Users search by natural identifiers (username, email, SKU)
Data Integrity: Prevent accidental duplicates on business-meaningful attributes
API Design: RESTful APIs often expose natural keys in URLs

alternate_keys.sql
-- Multiple UNIQUE constraints for alternate keys
CREATE TABLE User_Account (
    id              BIGSERIAL       PRIMARY KEY,         -- Surrogate key (internal)
    
    -- Alternate keys (natural identifiers)
    username        VARCHAR(50)     NOT NULL UNIQUE,     -- Users login by username
    email           VARCHAR(100)    NOT NULL UNIQUE,     -- Password reset by email
    phone_number    VARCHAR(20)     UNIQUE,              -- Optional, but unique if provided
    
    -- Other attributes
    display_name    VARCHAR(100),
    password_hash   VARCHAR(200)    NOT NULL,
    created_at      TIMESTAMP       DEFAULT NOW()
);
 
-- Composite alternate key
CREATE TABLE Employee (
    employee_id     BIGSERIAL       PRIMARY KEY,
    
    -- Composite alternate key: company + badge number
    company_code    VARCHAR(10)     NOT NULL,
    badge_number    VARCHAR(20)     NOT NULL,
    
    -- Other attributes
    first_name      VARCHAR(50)     NOT NULL,
    last_name       VARCHAR(50)     NOT NULL,
    email           VARCHAR(100)    NOT NULL UNIQUE,
    
    -- Composite UNIQUE constraint
    CONSTRAINT uq_company_badge UNIQUE (company_code, badge_number)
);
 
-- Named UNIQUE constraints (recommended for maintainability)
CREATE TABLE Product (
    product_id      BIGSERIAL       PRIMARY KEY,
    
    -- Multiple alternate keys with named constraints
    sku             VARCHAR(50)     NOT NULL,
    upc             VARCHAR(20),
    manufacturer_part_no VARCHAR(50),
    
    product_name    VARCHAR(200)    NOT NULL,
    
    CONSTRAINT uq_product_sku UNIQUE (sku),
    CONSTRAINT uq_product_upc UNIQUE (upc),
    CONSTRAINT uq_product_mpn UNIQUE (manufacturer_part_no)
);
 
-- Partial unique index: uniqueness is enforced only for rows matching a condition
-- (PostgreSQL syntax shown; SQL Server calls this a filtered index)
CREATE TABLE Customer (
    customer_id     BIGSERIAL       PRIMARY KEY,
    email           VARCHAR(100),
    is_active       BOOLEAN         DEFAULT TRUE
);
 
-- Email must be unique, but only among active customers
CREATE UNIQUE INDEX uq_active_customer_email 
ON Customer(email) 
WHERE is_active = TRUE;
 
-- Unique constraint with NULL handling
-- By default, NULL = NULL is unknown, so multiple NULLs allowed in UNIQUE
-- To prohibit, either make NOT NULL or use partial index

UNIQUE and NULL

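Most databases, following the SQL standard's default, allow multiple NULL values in a UNIQUE column: comparing NULL with NULL yields unknown rather than true, so two NULLs are never treated as duplicates. A UNIQUE column without NOT NULL can therefore hold many null entries (SQL Server is a notable exception and permits only one). To forbid nulls, add NOT NULL; to allow at most one, use a database-specific feature such as filtered indexes or PostgreSQL 15's NULLS NOT DISTINCT option.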
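
Both behaviours, repeated NULLs under UNIQUE and conditional uniqueness through a partial index, can be observed in a short sketch. It assumes in-memory SQLite, which follows the multiple-NULL rule and supports partial indexes; the sample values are hypothetical, and `is_active` is stored as 0/1 because SQLite has no separate boolean type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        customer_id  INTEGER PRIMARY KEY,
        phone_number TEXT UNIQUE,                 -- optional alternate key
        email        TEXT,
        is_active    INTEGER NOT NULL DEFAULT 1
    );
    -- Email must be unique, but only among active customers.
    CREATE UNIQUE INDEX uq_active_customer_email
        ON Customer (email) WHERE is_active = 1;
""")

# Two rows with NULL phone numbers: UNIQUE does not treat them as duplicates.
conn.execute("INSERT INTO Customer (phone_number, email) VALUES (NULL, 'a@shop.example')")
# Same email again, but inactive, so the partial index ignores this row.
conn.execute(
    "INSERT INTO Customer (phone_number, email, is_active) "
    "VALUES (NULL, 'a@shop.example', 0)"
)
null_phones = conn.execute(
    "SELECT COUNT(*) FROM Customer WHERE phone_number IS NULL"
).fetchone()[0]

# A second ACTIVE customer with the same email violates the partial index.
try:
    conn.execute(
        "INSERT INTO Customer (phone_number, email) VALUES ('555-0100', 'a@shop.example')"
    )
    active_duplicate_blocked = False
except sqlite3.IntegrityError:
    active_duplicate_blocked = True

print(null_phones, active_duplicate_blocked)
```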

Key Data Types and Generation

The data type and generation strategy for keys impacts performance, storage, and usability. Here are the common options and their tradeoffs:

Key Type Comparison
Key Type	Size	Generation	Pros	Cons
INTEGER AUTO_INCREMENT	4 bytes	Database sequence	Compact, ordered, fast joins	Predictable, limited range (2B)
BIGINT AUTO_INCREMENT	8 bytes	Database sequence	Huge range (9×10¹⁸)	Larger index, still predictable
UUID v4	16 bytes (128 bit)	Random generation	Globally unique, unpredictable	Large, random = poor index locality
UUID v7 / ULID	16 bytes	Time-ordered random	Time-sortable, unique, unguessable	Relatively large
VARCHAR natural key	Variable	Business assigned	Meaningful, no generation needed	May change, variable size
CHAR fixed-width	Fixed n bytes	Business assigned	Consistent size for codes	Padding overhead
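
The range figures in the table follow directly from the signed integer widths. As a rough back-of-the-envelope sketch, the snippet below also estimates how long a key space lasts at a constant insert rate; the rate of 1,000 rows per second is purely an assumed figure, and `years_until_exhausted` is a hypothetical helper.

```python
# Largest values of signed 32-bit and 64-bit integers.
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_until_exhausted(max_value, inserts_per_second):
    """Years until a sequence starting at 1 reaches max_value at a steady rate."""
    return max_value / (inserts_per_second * SECONDS_PER_YEAR)


print(f"INTEGER max: {INT32_MAX:,}")
print(f"BIGINT max:  {INT64_MAX:,}")
print(f"INTEGER at 1,000 inserts/s: {years_until_exhausted(INT32_MAX, 1_000) * 365:.1f} days")
print(f"BIGINT at 1,000 inserts/s:  {years_until_exhausted(INT64_MAX, 1_000):,.0f} years")
```

At that assumed rate a 4-byte key runs out in under a month, which is why high-volume tables such as event logs usually start with BIGINT.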

key_types.sql
-- 1. Auto-increment Integer (most common)
CREATE TABLE Order_AutoInt (
    order_id    SERIAL PRIMARY KEY,  -- PostgreSQL: 1, 2, 3, ...
    -- or: INTEGER GENERATED ALWAYS AS IDENTITY
    customer_id INTEGER NOT NULL,
    order_date  DATE NOT NULL
);
 
-- MySQL equivalent
-- order_id INT AUTO_INCREMENT PRIMARY KEY
 
-- SQL Server equivalent
-- order_id INT IDENTITY(1,1) PRIMARY KEY
 
 
-- 2. BIGINT for very large tables
CREATE TABLE Event_Log (
    event_id    BIGSERIAL PRIMARY KEY,  -- 8 bytes, huge range
    event_type  VARCHAR(50),
    event_data  JSONB,
    created_at  TIMESTAMP DEFAULT NOW()
);
 
 
-- 3. UUID (Universally Unique Identifier)
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";  -- PostgreSQL; version 13+ also has built-in gen_random_uuid()
 
CREATE TABLE Document (
    document_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    title       VARCHAR(200) NOT NULL,
    content     TEXT,
    created_at  TIMESTAMP DEFAULT NOW()
);
 
-- Alternative: UUID v7 (time-sortable) - requires extension or app-generated
 
 
-- 4. Natural key with specific format
CREATE TABLE Country (
    country_code    CHAR(3) PRIMARY KEY,  -- ISO 3166-1 alpha-3
    country_name    VARCHAR(100) NOT NULL,
    continent       VARCHAR(50)
);
 
CREATE TABLE Currency (
    currency_code   CHAR(3) PRIMARY KEY,  -- ISO 4217
    currency_name   VARCHAR(100) NOT NULL,
    symbol          VARCHAR(5)
);
 
 
-- 5. Using a custom sequence
CREATE SEQUENCE order_number_seq START 100000;
 
CREATE TABLE Customer_Order (
    order_number    VARCHAR(20) PRIMARY KEY 
                    DEFAULT 'ORD-' || nextval('order_number_seq'),
    customer_id     INTEGER NOT NULL,
    order_date      DATE NOT NULL
);
-- Generates: ORD-100000, ORD-100001, ...
 
 
-- 6. Application-generated structured ID
-- Common pattern: Prefix + Date + Sequence
-- Example: INV-2024-001234
-- Usually generated in application code, stored as VARCHAR

UUID vs. Auto-Increment

Auto-increment is best for internal systems where ordering helps performance and exposing IDs is acceptable. UUID is better for distributed systems, public-facing APIs where you don't want guessable IDs, and data that may be merged from multiple sources. UUID v7 (time-ordered) offers a compromise with ordering benefits.
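
The size and guessability trade-offs can be illustrated with Python's standard uuid module. This is a sketch only: the `neighbours` helper is hypothetical and simply lists the IDs an outsider could try after seeing one sequential value.

```python
import uuid


def neighbours(sequential_id, width=2):
    """IDs an outsider could guess after seeing one sequential ID."""
    return [sequential_id + d for d in range(-width, width + 1) if d != 0]


doc_id = uuid.uuid4()             # random (version 4) UUID
binary_size = len(doc_id.bytes)   # size in a native UUID or 16-byte binary column
text_size = len(str(doc_id))      # size when stored as hyphenated text

print(f"UUID v{doc_id.version}: {binary_size} bytes native, {text_size} characters as text")
print(f"Seeing order 1042 suggests trying: {neighbours(1042)}")
```

Storing UUIDs in a native 16-byte type rather than as text recovers more than half of the size penalty; the random ordering, and its effect on index locality, remains.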

Key Design Best Practices

Drawing from decades of database design experience, here are the established best practices for key attribute mapping:

Key Design Best Practices

•Every table must have a primary key. No exceptions. This is fundamental to the relational model, and many database features and tools (logical replication, ORMs, etc.) depend on it.
•Primary keys should be immutable. Once assigned, a key should never change. If natural keys might change, use surrogate key as primary.
•Primary keys should be NOT NULL. This is automatic with PRIMARY KEY constraint, but worth emphasizing—null identifiers are meaningless.
•Keep keys simple. Single-column integer or UUID keys are easiest to work with. Use composite keys when semantically required, but avoid them when a surrogate would suffice.
•Enforce alternate keys. Even with surrogate primary keys, use UNIQUE constraints on natural identifiers to enforce business uniqueness.
•Name constraints descriptively. Use names like 'pk_customer', 'uq_customer_email', 'fk_order_customer' for maintainability.
•Consider key size for frequently-joined tables. A 4-byte integer primary key is cheaper to index and join than a 16-byte UUID, and far cheaper than a UUID stored as a 36-character string.
•Don't expose auto-increment IDs in public APIs if security matters. Sequential IDs reveal information (how many records exist) and invite enumeration attacks. Use UUIDs or opaque mapped identifiers.
•Document key selection rationale. When the choice isn't obvious, document why you chose natural vs. surrogate, what the alternate keys represent.
•Test key-related queries early. Ensure your key choices support the expected query patterns and perform adequately.

Anti-Pattern: No Primary Key

Some developers omit primary keys from tables (especially 'temporary' or 'log' tables). This causes problems: nothing guarantees uniqueness, replication may break, ORMs cannot track rows, and individual rows cannot be addressed reliably or efficiently. Always define a primary key, even if just an auto-increment ID.
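
A small sketch shows the addressing problem. The two log tables are hypothetical and run on in-memory SQLite; the sketch deliberately ignores SQLite's hidden rowid, which most other systems do not offer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = [("login", "2024-01-01"), ("login", "2024-01-01")]  # two identical events

# Without a key, identical rows are indistinguishable.
conn.execute("CREATE TABLE Event_Log_NoKey (event_type TEXT, created_on TEXT)")
conn.executemany("INSERT INTO Event_Log_NoKey VALUES (?, ?)", rows)
cursor = conn.execute(
    "UPDATE Event_Log_NoKey SET event_type = 'logout' "
    "WHERE event_type = 'login' AND created_on = '2024-01-01'"
)
changed_without_key = cursor.rowcount  # both rows match; neither can be singled out

# With a surrogate key, each row can be targeted individually.
conn.execute(
    "CREATE TABLE Event_Log (event_id INTEGER PRIMARY KEY, event_type TEXT, created_on TEXT)"
)
conn.executemany("INSERT INTO Event_Log (event_type, created_on) VALUES (?, ?)", rows)
cursor = conn.execute("UPDATE Event_Log SET event_type = 'logout' WHERE event_id = 1")
changed_with_key = cursor.rowcount

print(changed_without_key, changed_with_key)
```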

Summary: Key Attribute Mapping

Key attributes are the foundation of entity identity in both ER modeling and relational databases. Proper key design ensures data integrity, enables relationships, and affects query performance. Let's consolidate what we've learned:

Key Takeaways

•Key Hierarchy: Superkeys → Candidate Keys (minimal) → Primary Key (chosen) + Alternate Keys (others).
•ER Keys → PRIMARY KEY: Underlined ER attributes become PRIMARY KEY constraints in the relational schema.
•Composite Keys: Multiple attributes can form a single key; all components go into PRIMARY KEY definition.
•Natural vs. Surrogate: Natural keys are meaningful but may change; surrogates are stable but meaningless. The hybrid approach (surrogate primary key plus UNIQUE natural identifiers) is often best.
•Alternate Keys → UNIQUE: Non-primary candidate keys become UNIQUE constraints to enforce business uniqueness.
•Data Type Matters: INTEGER is compact and fast; UUID is globally unique; choose based on requirements.
•Every Table Needs a Key: No exceptions—primary keys are fundamental to relational operations.
•Document Decisions: Key selection has lasting impact; document your reasoning for future maintainers.

Module Complete: Entity Mapping

We've now covered all aspects of mapping ER entities to relational tables:

Regular Entity Mapping — The foundational algorithm
Composite Attributes — Flattening hierarchical structures
Multivalued Attributes — Separate tables for multiple values
Derived Attributes — Computed values and storage decisions
Key Attributes — Identity, uniqueness, and key selection

You now have the complete toolkit for transforming any ER entity into a well-designed relational table. The next module will cover Relationship Mapping, where we'll learn how to transform the connections between entities into foreign keys and junction tables.

Page Complete

You now understand the complete theory and practice of key attribute mapping. You can identify and classify keys, map them to appropriate constraints, make informed natural vs. surrogate decisions, and apply best practices for robust key design. You've completed the Entity Mapping module!

5 / 5

Loading learning content...

Database Management SystemsER to Relational Mapping

Entity Mapping

LevelIntermediate

Duration60 mins

TopicER to Relational Mapping

5 / 5

Key Attributes

The Foundation of Entity Identity

This page completes our entity mapping journey by examining how to properly identify, select, and implement key attributes in a relational schema.

What You Will Learn

The Complete Taxonomy of Keys

Before mapping keys, we must understand the precise terminology. The relational model defines a hierarchy of key concepts, each building on the previous:

Key Hierarchy Definitions

•Superkey: Any set of attributes that uniquely identifies each tuple in a relation. A superkey may contain extra attributes beyond the minimum needed. Example: {student_id, name, email} is a superkey for Student (but student_id alone suffices).
•Candidate Key: A minimal superkey—no proper subset of it is also a superkey. Each candidate key is a superkey, but a superkey is not necessarily a candidate key. Example: {student_id} and {email} might both be candidate keys if both are unique.
•Primary Key: The candidate key chosen as the principal identifier for the relation. There is exactly one primary key per relation. It becomes the PRIMARY KEY constraint.
•Alternate Key: Any candidate key not chosen as the primary key. Alternate keys become UNIQUE constraints—they still enforce uniqueness but aren't the primary identifier.
•Foreign Key: An attribute (or set) in one relation that references the primary key of another relation. Foreign keys implement relationships between tables.
•Simple Key: A key consisting of a single attribute. Example: student_id.
•Composite Key: A key consisting of multiple attributes. Example: {department_code, course_number} for a Course entity.

Converting Mermaid diagram...

Example: Student Entity

Consider a Student entity with attributes:

student_id (university-assigned identifier)
ssn (social security number)
email (university email address)
first_name, last_name, date_of_birth, etc.

Superkeys include:

{student_id}
{ssn}
{email}
{student_id, ssn}
{student_id, first_name}
{email, last_name}
... (any superset of a candidate key)

Candidate Keys (minimal):

{student_id}
{ssn}
{email}

Primary Key Selection: We choose {student_id} as primary key.

Alternate Keys: {ssn} and {email} remain as UNIQUE constraints.

Minimality Is Key

Mapping ER Keys to PRIMARY KEY

Primary Key Mapping Steps

•Identify Key Attributes: In the ER diagram, locate the underlined attribute(s) of the entity. This is the declared key.
•Create Column(s): Include the key attribute(s) as column(s) in the relation with appropriate data types.
•Apply PRIMARY KEY Constraint: Declare the column(s) as PRIMARY KEY. This enforces uniqueness and NOT NULL automatically.
•If Composite: For multi-attribute keys, include all component columns in the PRIMARY KEY definition.
•Identify Alternate Keys: If the entity has other candidate keys (other unique identifiers not chosen as primary), apply UNIQUE constraints to those.

primary_key_mapping.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
-- Example 1: Simple Primary Key
-- ER: Student entity with underlined 'student_id'
CREATE TABLE Student (
    student_id      VARCHAR(20)     PRIMARY KEY,  -- From ER key attribute
    first_name      VARCHAR(50)     NOT NULL,
    last_name       VARCHAR(50)     NOT NULL,
    email           VARCHAR(100)    NOT NULL UNIQUE,  -- Alternate key
    ssn             CHAR(11)        UNIQUE,           -- Alternate key
    date_of_birth   DATE            NOT NULL
);
 
-- Example 2: Composite Primary Key
-- ER: Course_Section with composite key {course_id, section_number, semester, year}
CREATE TABLE Course_Section (
    course_id       VARCHAR(10)     NOT NULL,
    section_number  VARCHAR(5)      NOT NULL,
    semester        VARCHAR(10)     NOT NULL,
    year            INTEGER         NOT NULL,
    instructor_id   INTEGER,
    room_number     VARCHAR(20),
    schedule        VARCHAR(100),
    
    -- Composite primary key (all columns from ER key)
    PRIMARY KEY (course_id, section_number, semester, year),
    
    -- Foreign keys
    FOREIGN KEY (course_id) REFERENCES Course(course_id),
    FOREIGN KEY (instructor_id) REFERENCES Instructor(instructor_id)
);
 
-- Example 3: Multiple Candidate Keys (choosing one as primary)
-- ER shows both 'product_id' and 'sku' as unique identifiers
CREATE TABLE Product (
    product_id      SERIAL          PRIMARY KEY,    -- Chosen as primary
    sku             VARCHAR(50)     NOT NULL UNIQUE, -- Alternate key
    upc             VARCHAR(20)     UNIQUE,          -- Another alternate key
    product_name    VARCHAR(200)    NOT NULL,
    price           DECIMAL(10, 2)  NOT NULL
);
 
-- The PRIMARY KEY constraint automatically implies:
-- - NOT NULL (key columns cannot be null)
-- - UNIQUE (key values cannot be duplicated)
-- - Creates an index (performance optimization)

PRIMARY KEY Semantics

Handling Composite Keys

Common Composite Key Scenarios
Entity	Composite Key	Rationale
Order_Item	(order_id, product_id)	Same product can appear in multiple orders; same order can have multiple products
Course_Section	(course_id, section, semester, year)	Same course has multiple sections across terms
Employee_Project	(employee_id, project_id)	Many-to-many relationship
Flight_Segment	(flight_number, segment_number)	A flight may have multiple segments
Version	(artifact_id, version_number)	Multiple versions per artifact
Room	(building_id, room_number)	Room numbers unique only within building

composite_keys.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
-- Composite Key Example 1: Order Line Items
CREATE TABLE Order_Item (
    order_id        INTEGER         NOT NULL,
    product_id      INTEGER         NOT NULL,
    quantity        INTEGER         NOT NULL CHECK (quantity > 0),
    unit_price      DECIMAL(10, 2)  NOT NULL,
    discount_pct    DECIMAL(5, 2)   DEFAULT 0,
    
    -- Composite primary key
    PRIMARY KEY (order_id, product_id),
    
    -- Foreign keys
    FOREIGN KEY (order_id) REFERENCES Order(order_id) ON DELETE CASCADE,
    FOREIGN KEY (product_id) REFERENCES Product(product_id)
);
 
-- Composite Key Example 2: Many-to-Many with Attributes
CREATE TABLE Student_Course_Enrollment (
    student_id      INTEGER         NOT NULL,
    course_id       VARCHAR(10)     NOT NULL,
    semester        VARCHAR(10)     NOT NULL,
    year            INTEGER         NOT NULL,
    
    -- Enrollment attributes
    enrollment_date DATE            NOT NULL DEFAULT CURRENT_DATE,
    grade           CHAR(2),
    status          VARCHAR(20)     DEFAULT 'enrolled',
    
    -- Four-column composite key
    PRIMARY KEY (student_id, course_id, semester, year),
    
    FOREIGN KEY (student_id) REFERENCES Student(student_id),
    FOREIGN KEY (course_id) REFERENCES Course(course_id),
    
    CONSTRAINT chk_status CHECK (status IN ('enrolled', 'withdrawn', 'completed'))
);
 
-- Composite Key Example 3: Hierarchical Identifier
CREATE TABLE Course_Module_Lesson (
    course_id       VARCHAR(10)     NOT NULL,
    module_number   INTEGER         NOT NULL,
    lesson_number   INTEGER         NOT NULL,
    lesson_title    VARCHAR(200)    NOT NULL,
    content_url     VARCHAR(500),
    duration_min    INTEGER,
    
    -- Three-level composite key
    PRIMARY KEY (course_id, module_number, lesson_number),
    
    FOREIGN KEY (course_id) REFERENCES Course(course_id)
);
 
-- Referencing a composite key from another table
CREATE TABLE Lesson_Quiz (
    quiz_id         SERIAL          PRIMARY KEY,
    course_id       VARCHAR(10)     NOT NULL,
    module_number   INTEGER         NOT NULL,
    lesson_number   INTEGER         NOT NULL,
    quiz_title      VARCHAR(200)    NOT NULL,
    pass_score      INTEGER         DEFAULT 70,
    
    -- Foreign key referencing composite primary key
    FOREIGN KEY (course_id, module_number, lesson_number) 
        REFERENCES Course_Module_Lesson(course_id, module_number, lesson_number)
        ON DELETE CASCADE
);

Composite Key Advantages

•Natural representation of identity
•No separate ID column needed
•Self-documenting (columns explain what makes it unique)
•Enforces business rules through structure

Composite Key Disadvantages

•Foreign key references are verbose
•URLs/APIs with multiple IDs are complex
•Larger index size (multiple columns)
•Join conditions more complex
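
To make the last two disadvantages concrete, here is a minimal sketch of a join against the three-column key defined above. It assumes the Course_Module_Lesson and Lesson_Quiz tables from the earlier listing; the course code 'CS101' is a made-up value used only for illustration.

```sql
-- Every column of the composite key must appear in the join condition.
-- Omitting one of them silently matches quizzes to the wrong lessons.
SELECT l.lesson_title,
       q.quiz_title,
       q.pass_score
FROM   Lesson_Quiz AS q
JOIN   Course_Module_Lesson AS l
       ON  l.course_id     = q.course_id
       AND l.module_number = q.module_number
       AND l.lesson_number = q.lesson_number
WHERE  q.course_id = 'CS101';  -- hypothetical course code
```

With a single-column surrogate key on the lesson table, the same join would need only one equality condition, at the cost of an extra column that carries no business meaning.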

Wide Composite Keys

Natural vs. Surrogate Keys

One of the most debated questions in database design is whether primary keys should be natural (meaningful business values) or surrogate (system-generated identifiers).

Natural Keys are real-world identifiers with business meaning:

•Social Security Number, Email Address
•ISBN (for books), VIN (for vehicles)
•Employee ID assigned by HR system
•Composite business attributes (country + passport number)

Surrogate Keys are artificial identifiers with no business meaning:

•Auto-increment integers (1, 2, 3, ...)
•UUIDs (550e8400-e29b-41d4-a716-446655440000)
•Generated sequences

Both have legitimate use cases. The right choice depends on your specific requirements.

Advantages of Natural Keys

•Inherent Meaning: Values are recognizable and meaningful in reports, logs, and debugging
•No Additional Column: Uses existing data, no extra storage for artificial IDs
•Reduces Joins: Entity can be identified by meaningful value without joining to get 'the real ID'
•Data Quality: Forces proper handling of the natural identifier
•Integration-Friendly: Often matches external system identifiers

Disadvantages of Natural Keys

•Keys Can Change: Business values sometimes change (email, SSN corrections, mergers)
•Cascading Updates: Change to key requires updating all foreign key references
•Privacy Concerns: Sensitive data (SSN) as key exposes it everywhere it's referenced
•Size/Performance: Strings and wide composites are slower to index than integers
•Non-Existence Problem: What if the natural key isn't known yet (temporary record)?
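
The "Cascading Updates" cost can be seen in a small sketch. The Department and Staff_Member tables below are hypothetical and exist only for this illustration; the point is that changing a natural key forces the database to rewrite every row that references it.

```sql
-- Hypothetical tables: a natural key (dept_code) referenced by a child table.
CREATE TABLE Department (
    dept_code   VARCHAR(10)     PRIMARY KEY,
    dept_name   VARCHAR(100)    NOT NULL
);

CREATE TABLE Staff_Member (
    staff_id    SERIAL          PRIMARY KEY,
    full_name   VARCHAR(100)    NOT NULL,
    dept_code   VARCHAR(10)     NOT NULL,

    -- Without ON UPDATE CASCADE, the UPDATE below would be rejected
    -- while any Staff_Member row still references the old code.
    FOREIGN KEY (dept_code) REFERENCES Department(dept_code)
        ON UPDATE CASCADE
);

INSERT INTO Department (dept_code, dept_name) VALUES ('MKT', 'Marketing');
INSERT INTO Staff_Member (full_name, dept_code) VALUES ('Ada Example', 'MKT');

-- Changing the natural key rewrites every referencing row as well.
UPDATE Department SET dept_code = 'MKTG' WHERE dept_code = 'MKT';

SELECT full_name, dept_code FROM Staff_Member;  -- dept_code is now 'MKTG'
```

On a large child table this single UPDATE touches many rows and their indexes, which is one reason a key that might change is usually kept as a UNIQUE alternate key rather than as the primary key.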

Decision Framework

Alternate Keys and UNIQUE Constraints

Why Alternate Keys Matter:

•Business Rules: Email must be unique even if we use surrogate ID as primary
•Query Patterns: Users search by natural identifiers (username, email, SKU)
•Data Integrity: Prevent accidental duplicates on business-meaningful attributes
•API Design: RESTful APIs often expose natural keys in URLs

alternate_keys.sql
SQL
-- Multiple UNIQUE constraints for alternate keys
CREATE TABLE User_Account (
    id              BIGSERIAL       PRIMARY KEY,         -- Surrogate key (internal)
    
    -- Alternate keys (natural identifiers)
    username        VARCHAR(50)     NOT NULL UNIQUE,     -- Users login by username
    email           VARCHAR(100)    NOT NULL UNIQUE,     -- Password reset by email
    phone_number    VARCHAR(20)     UNIQUE,              -- Optional, but unique if provided
    
    -- Other attributes
    display_name    VARCHAR(100),
    password_hash   VARCHAR(200)    NOT NULL,
    created_at      TIMESTAMP       DEFAULT NOW()
);
 
-- Composite alternate key
CREATE TABLE Employee (
    employee_id     BIGSERIAL       PRIMARY KEY,
    
    -- Composite alternate key: company + badge number
    company_code    VARCHAR(10)     NOT NULL,
    badge_number    VARCHAR(20)     NOT NULL,
    
    -- Other attributes
    first_name      VARCHAR(50)     NOT NULL,
    last_name       VARCHAR(50)     NOT NULL,
    email           VARCHAR(100)    NOT NULL UNIQUE,
    
    -- Composite UNIQUE constraint
    CONSTRAINT uq_company_badge UNIQUE (company_code, badge_number)
);
 
-- Named UNIQUE constraints (recommended for maintainability)
CREATE TABLE Product (
    product_id      BIGSERIAL       PRIMARY KEY,
    
    -- Multiple alternate keys with named constraints
    sku             VARCHAR(50)     NOT NULL,
    upc             VARCHAR(20),
    manufacturer_part_no VARCHAR(50),
    
    product_name    VARCHAR(200)    NOT NULL,
    
    CONSTRAINT uq_product_sku UNIQUE (sku),
    CONSTRAINT uq_product_upc UNIQUE (upc),
    CONSTRAINT uq_product_mpn UNIQUE (manufacturer_part_no)
);
 
-- Partial unique: unique only when not null or under certain conditions
-- (PostgreSQL specific - filtered unique index)
CREATE TABLE Customer (
    customer_id     BIGSERIAL       PRIMARY KEY,
    email           VARCHAR(100),
    is_active       BOOLEAN         DEFAULT TRUE
);
 
-- Email must be unique, but only among active customers
CREATE UNIQUE INDEX uq_active_customer_email 
ON Customer(email) 
WHERE is_active = TRUE;
 
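-- Unique constraint with NULL handling
-- In PostgreSQL (and standard SQL), NULL = NULL evaluates to unknown,
-- so a UNIQUE column accepts any number of NULLs.
-- Some systems differ: SQL Server allows only one NULL per UNIQUE constraint.
-- To prohibit NULLs entirely, declare the column NOT NULL; to allow at most
-- one, use UNIQUE NULLS NOT DISTINCT (PostgreSQL 15+) or a partial unique index.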

UNIQUE and NULL
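
A short sketch of the default behavior in PostgreSQL follows. The Contact and Contact_Strict tables are hypothetical names introduced only for this example, and the second table relies on syntax available from PostgreSQL 15 onward.

```sql
-- Default behavior: NULLs are treated as distinct from one another.
CREATE TABLE Contact (
    contact_id  SERIAL          PRIMARY KEY,
    phone       VARCHAR(20)     UNIQUE
);

INSERT INTO Contact (phone) VALUES (NULL);
INSERT INTO Contact (phone) VALUES (NULL);        -- accepted: a second NULL is allowed
INSERT INTO Contact (phone) VALUES ('555-0100');
-- INSERT INTO Contact (phone) VALUES ('555-0100');  -- would fail: duplicate key value

-- PostgreSQL 15+ only: treat NULLs as equal, so at most one NULL is allowed.
CREATE TABLE Contact_Strict (
    contact_id  SERIAL          PRIMARY KEY,
    phone       VARCHAR(20),

    CONSTRAINT uq_contact_strict_phone UNIQUE NULLS NOT DISTINCT (phone)
);

INSERT INTO Contact_Strict (phone) VALUES (NULL);
-- INSERT INTO Contact_Strict (phone) VALUES (NULL);  -- would fail: duplicate key value
```

This matters for alternate keys on optional attributes: an optional phone number can stay UNIQUE without forcing every row to supply one.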

Key Data Types and Generation

The data type and generation strategy for keys impacts performance, storage, and usability. Here are the common options and their tradeoffs:

Key Type Comparison
Key Type	Size	Generation	Pros	Cons
INTEGER AUTO_INCREMENT	4 bytes	Database sequence	Compact, ordered, fast joins	Predictable, limited range (2B)
BIGINT AUTO_INCREMENT	8 bytes	Database sequence	Huge range (9×10¹⁸)	Larger index, still predictable
UUID v4	16 bytes (128 bit)	Random generation	Globally unique, unpredictable	Large, random = poor index locality
UUID v7 / ULID	16 bytes	Time-ordered random	Time-sortable, unique, hard to guess	Relatively large; creation time is embedded in the ID
VARCHAR natural key	Variable	Business assigned	Meaningful, no generation needed	May change, variable size
CHAR fixed-width	Fixed n bytes	Business assigned	Consistent size for codes	Padding overhead

key_types.sql
SQL
-- 1. Auto-increment Integer (most common)
CREATE TABLE Order_AutoInt (
    order_id    SERIAL PRIMARY KEY,  -- PostgreSQL: 1, 2, 3, ...
    -- or: INTEGER GENERATED ALWAYS AS IDENTITY
    customer_id INTEGER NOT NULL,
    order_date  DATE NOT NULL
);
 
-- MySQL equivalent
-- order_id INT AUTO_INCREMENT PRIMARY KEY
 
-- SQL Server equivalent
-- order_id INT IDENTITY(1,1) PRIMARY KEY
 
 
-- 2. BIGINT for very large tables
CREATE TABLE Event_Log (
    event_id    BIGSERIAL PRIMARY KEY,  -- 8 bytes, huge range
    event_type  VARCHAR(50),
    event_data  JSONB,
    created_at  TIMESTAMP DEFAULT NOW()
);
 
 
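-- 3. UUID (Universally Unique Identifier)
-- uuid_generate_v4() comes from the uuid-ossp extension.
-- PostgreSQL 13+ also ships gen_random_uuid() built in, with no extension needed.
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";  -- PostgreSQL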
 
CREATE TABLE Document (
    document_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
    title       VARCHAR(200) NOT NULL,
    content     TEXT,
    created_at  TIMESTAMP DEFAULT NOW()
);
 
-- Alternative: UUID v7 (time-sortable) - requires extension or app-generated
 
 
-- 4. Natural key with specific format
CREATE TABLE Country (
    country_code    CHAR(3) PRIMARY KEY,  -- ISO 3166-1 alpha-3
    country_name    VARCHAR(100) NOT NULL,
    continent       VARCHAR(50)
);
 
CREATE TABLE Currency (
    currency_code   CHAR(3) PRIMARY KEY,  -- ISO 4217
    currency_name   VARCHAR(100) NOT NULL,
    symbol          VARCHAR(5)
);
 
 
-- 5. Using a custom sequence
CREATE SEQUENCE order_number_seq START 100000;
 
CREATE TABLE Customer_Order (
    order_number    VARCHAR(20) PRIMARY KEY 
                    DEFAULT 'ORD-' || nextval('order_number_seq'),
    customer_id     INTEGER NOT NULL,
    order_date      DATE NOT NULL
);
-- Generates: ORD-100000, ORD-100001, ...
 
 
-- 6. Application-generated structured ID
-- Common pattern: Prefix + Date + Sequence
-- Example: INV-2024-001234
-- Usually generated in application code, stored as VARCHAR
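
Although this pattern is usually implemented in application code, it can also be sketched in the database. The following is one possible PostgreSQL version, not the only way to do it; the invoice_seq sequence and the Invoice table are hypothetical names. Note that the sequence here does not reset each year, and lpad truncates numbers longer than six digits.

```sql
-- Hypothetical sequence that feeds the numeric part of the identifier.
CREATE SEQUENCE invoice_seq START 1234;

CREATE TABLE Invoice (
    -- Prefix + current year + zero-padded sequence value,
    -- e.g. INV-<current year>-001234 for the first row.
    invoice_number  VARCHAR(20)     PRIMARY KEY
                    DEFAULT ('INV-' || to_char(CURRENT_DATE, 'YYYY') || '-'
                             || lpad(nextval('invoice_seq')::text, 6, '0')),
    customer_id     INTEGER         NOT NULL,
    issued_on       DATE            NOT NULL DEFAULT CURRENT_DATE
);

-- The key is generated by the DEFAULT; RETURNING shows the assigned value.
INSERT INTO Invoice (customer_id) VALUES (42) RETURNING invoice_number;
```

Because the identifier encodes a date, it behaves like a natural key in one respect: anything that depends on its format (parsing the year, for example) becomes a contract that is hard to change later.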

UUID vs. Auto-Increment

Key Design Best Practices

The following practices are widely recommended for key attribute mapping:

•Every table must have a primary key. No exceptions. This is fundamental to the relational model and required for most database features (replication, ORMs, etc.).
•Primary keys should be immutable. Once assigned, a key should never change. If natural keys might change, use surrogate key as primary.
•Primary keys should be NOT NULL. This is automatic with PRIMARY KEY constraint, but worth emphasizing—null identifiers are meaningless.
•Keep keys simple. Single-column integer or UUID keys are easiest to work with. Use composite keys when semantically required, but avoid them when a surrogate would suffice.
•Enforce alternate keys. Even with surrogate primary keys, use UNIQUE constraints on natural identifiers to enforce business uniqueness.
•Name constraints descriptively. Use names like 'pk_customer', 'uq_customer_email', 'fk_order_customer' for maintainability.
•Consider key size for frequently-joined tables. A 4-byte integer primary key is faster to join than a UUID stored as a 36-character string. If UUIDs are required, store them in a native 16-byte UUID type rather than as text.
•Don't expose auto-increment IDs in public APIs if security matters. Sequential IDs reveal information (how many records, enumeration attacks). Use UUIDs or obscure mapped identifiers.
•Document key selection rationale. When the choice isn't obvious, document why you chose natural vs. surrogate, what the alternate keys represent.
•Test key-related queries early. Ensure your key choices support the expected query patterns and perform adequately.
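
Several of these practices (a surrogate primary key, an enforced alternate key, and descriptive constraint names) can be combined in one small sketch. The Customer_Account and Sales_Order tables are hypothetical names chosen to avoid clashing with the earlier listings; the constraint names follow the pk_/uq_/fk_ convention suggested above.

```sql
CREATE TABLE Customer_Account (
    customer_id     BIGINT          GENERATED ALWAYS AS IDENTITY,
    email           VARCHAR(100)    NOT NULL,

    CONSTRAINT pk_customer       PRIMARY KEY (customer_id),   -- surrogate, immutable
    CONSTRAINT uq_customer_email UNIQUE (email)                -- alternate key
);

CREATE TABLE Sales_Order (
    order_id        BIGINT          GENERATED ALWAYS AS IDENTITY,
    customer_id     BIGINT          NOT NULL,
    order_date      DATE            NOT NULL DEFAULT CURRENT_DATE,

    CONSTRAINT pk_order          PRIMARY KEY (order_id),
    CONSTRAINT fk_order_customer FOREIGN KEY (customer_id)
        REFERENCES Customer_Account(customer_id)
);
```

Named constraints pay off when something fails: an error that mentions uq_customer_email points to the violated rule directly, whereas a system-generated name has to be looked up first.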

Anti-Pattern: No Primary Key

Summary: Key Attribute Mapping

Key Takeaways

•Key Hierarchy: Superkeys → Candidate Keys (minimal) → Primary Key (chosen) + Alternate Keys (others).
•ER Keys → PRIMARY KEY: Underlined ER attributes become PRIMARY KEY constraints in the relational schema.
•Composite Keys: Multiple attributes can form a single key; all components go into PRIMARY KEY definition.
•Natural vs. Surrogate: Natural keys are meaningful but may change; surrogates are stable but meaningless. Hybrid approach often best.
•Alternate Keys → UNIQUE: Non-primary candidate keys become UNIQUE constraints to enforce business uniqueness.
•Data Type Matters: INTEGER is compact and fast; UUID is globally unique; choose based on requirements.
•Every Table Needs a Key: No exceptions—primary keys are fundamental to relational operations.
•Document Decisions: Key selection has lasting impact; document your reasoning for future maintainers.

Module Complete: Entity Mapping

We've now covered all aspects of mapping ER entities to relational tables:

Regular Entity Mapping — The foundational algorithm
Composite Attributes — Flattening hierarchical structures
Multivalued Attributes — Separate tables for multiple values
Derived Attributes — Computed values and storage decisions
Key Attributes — Identity, uniqueness, and key selection
