Loading learning content...
In any database, we must answer a fundamental question: How do we uniquely identify each entity instance? Without unique identification, we cannot reliably retrieve, update, or reference specific records. We cannot establish relationships between entities. The entire relational foundation collapses.
Key attributes solve this problem. They are the attributes (or combinations of attributes) that uniquely identify each entity instance. In ER diagrams, key attributes are underlined, visually marking them as special. In the relational model, they become PRIMARY KEY constraints—the fundamental guarantee of row uniqueness.
But mapping keys isn't always straightforward. An entity may have multiple candidate keys. Keys may be composite (multiple attributes). We must decide between natural keys (meaningful real-world values) and surrogate keys (system-generated identifiers). Each choice has profound implications for data integrity, performance, and maintainability.
This page completes our entity mapping journey by examining how to properly identify, select, and implement key attributes in a relational schema.
By the end of this page, you will understand the complete taxonomy of keys (superkey, candidate key, primary key, alternate key, foreign key), master the mapping of ER keys to relational constraints, handle composite keys effectively, and make informed decisions in the natural vs. surrogate key debate.
Before mapping keys, we must understand the precise terminology. The relational model defines a hierarchy of key concepts, each building on the previous:
Example: Student Entity
Consider a Student entity with attributes:
Superkeys include:
Candidate Keys (minimal):
Primary Key Selection: We choose {student_id} as primary key.
Alternate Keys: {ssn} and {email} remain as UNIQUE constraints.
A candidate key must be minimal—removing any attribute would break uniqueness. If {A, B} uniquely identifies tuples but {A} alone also does, then {A, B} is NOT a candidate key; only {A} is. This minimality ensures candidate keys have no redundant components.
In ER diagrams, key attributes are indicated by underlining the attribute name. When mapping to the relational schema, these underlined attributes become the PRIMARY KEY constraint of the relation.
1234567891011121314151617181920212223242526272829303132333435363738394041424344
-- Example 1: Simple Primary Key-- ER: Student entity with underlined 'student_id'CREATE TABLE Student ( student_id VARCHAR(20) PRIMARY KEY, -- From ER key attribute first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, email VARCHAR(100) NOT NULL UNIQUE, -- Alternate key ssn CHAR(11) UNIQUE, -- Alternate key date_of_birth DATE NOT NULL); -- Example 2: Composite Primary Key-- ER: Course_Section with composite key {course_id, section_number, semester, year}CREATE TABLE Course_Section ( course_id VARCHAR(10) NOT NULL, section_number VARCHAR(5) NOT NULL, semester VARCHAR(10) NOT NULL, year INTEGER NOT NULL, instructor_id INTEGER, room_number VARCHAR(20), schedule VARCHAR(100), -- Composite primary key (all columns from ER key) PRIMARY KEY (course_id, section_number, semester, year), -- Foreign keys FOREIGN KEY (course_id) REFERENCES Course(course_id), FOREIGN KEY (instructor_id) REFERENCES Instructor(instructor_id)); -- Example 3: Multiple Candidate Keys (choosing one as primary)-- ER shows both 'product_id' and 'sku' as unique identifiersCREATE TABLE Product ( product_id SERIAL PRIMARY KEY, -- Chosen as primary sku VARCHAR(50) NOT NULL UNIQUE, -- Alternate key upc VARCHAR(20) UNIQUE, -- Another alternate key product_name VARCHAR(200) NOT NULL, price DECIMAL(10, 2) NOT NULL); -- The PRIMARY KEY constraint automatically implies:-- - NOT NULL (key columns cannot be null)-- - UNIQUE (key values cannot be duplicated)-- - Creates an index (performance optimization)A PRIMARY KEY constraint is conceptually equivalent to NOT NULL + UNIQUE, plus it designates the principal identifier. Most databases also create an index automatically. While you could manually combine NOT NULL and UNIQUE, using PRIMARY KEY explicitly declares intent and enables proper foreign key references.
When no single attribute uniquely identifies an entity, multiple attributes combine to form a composite key. This is common in associative entities (junction tables) and entities that are implicitly dependent on context.
| Entity | Composite Key | Rationale |
|---|---|---|
| Order_Item | (order_id, product_id) | Same product can appear in multiple orders; same order can have multiple products |
| Course_Section | (course_id, section, semester, year) | Same course has multiple sections across terms |
| Employee_Project | (employee_id, project_id) | Many-to-many relationship |
| Flight_Segment | (flight_number, segment_number) | A flight may have multiple segments |
| Version | (artifact_id, version_number) | Multiple versions per artifact |
| Room | (building_id, room_number) | Room numbers unique only within building |
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566
-- Composite Key Example 1: Order Line ItemsCREATE TABLE Order_Item ( order_id INTEGER NOT NULL, product_id INTEGER NOT NULL, quantity INTEGER NOT NULL CHECK (quantity > 0), unit_price DECIMAL(10, 2) NOT NULL, discount_pct DECIMAL(5, 2) DEFAULT 0, -- Composite primary key PRIMARY KEY (order_id, product_id), -- Foreign keys FOREIGN KEY (order_id) REFERENCES Order(order_id) ON DELETE CASCADE, FOREIGN KEY (product_id) REFERENCES Product(product_id)); -- Composite Key Example 2: Many-to-Many with AttributesCREATE TABLE Student_Course_Enrollment ( student_id INTEGER NOT NULL, course_id VARCHAR(10) NOT NULL, semester VARCHAR(10) NOT NULL, year INTEGER NOT NULL, -- Enrollment attributes enrollment_date DATE NOT NULL DEFAULT CURRENT_DATE, grade CHAR(2), status VARCHAR(20) DEFAULT 'enrolled', -- Four-column composite key PRIMARY KEY (student_id, course_id, semester, year), FOREIGN KEY (student_id) REFERENCES Student(student_id), FOREIGN KEY (course_id) REFERENCES Course(course_id), CONSTRAINT chk_status CHECK (status IN ('enrolled', 'withdrawn', 'completed'))); -- Composite Key Example 3: Hierarchical IdentifierCREATE TABLE Course_Module_Lesson ( course_id VARCHAR(10) NOT NULL, module_number INTEGER NOT NULL, lesson_number INTEGER NOT NULL, lesson_title VARCHAR(200) NOT NULL, content_url VARCHAR(500), duration_min INTEGER, -- Three-level composite key PRIMARY KEY (course_id, module_number, lesson_number), FOREIGN KEY (course_id) REFERENCES Course(course_id)); -- Referencing a composite key from another tableCREATE TABLE Lesson_Quiz ( quiz_id SERIAL PRIMARY KEY, course_id VARCHAR(10) NOT NULL, module_number INTEGER NOT NULL, lesson_number INTEGER NOT NULL, quiz_title VARCHAR(200) NOT NULL, pass_score INTEGER DEFAULT 70, -- Foreign key referencing composite primary key FOREIGN KEY (course_id, module_number, lesson_number) REFERENCES Course_Module_Lesson(course_id, module_number, lesson_number) ON DELETE CASCADE);If a composite key exceeds 3-4 columns, consider whether a surrogate key might simplify the schema. Very wide keys make foreign key references cumbersome and can impact performance. Balance natural semantics against practical usability.
One of the most debated topics in database design: Should primary keys be natural (meaningful business values) or surrogate (system-generated identifiers)?
Natural Keys are real-world identifiers with business meaning:
Surrogate Keys are artificial identifiers with no business meaning:
Both have legitimate use cases. The right choice depends on your specific requirements.
Use natural keys when: (1) they are truly immutable (ISBN, VIN), (2) they are compact, (3) they are always known at insert time, (4) they match integration requirements. Use surrogate keys when: (1) natural key might change, (2) natural key is large/composite, (3) natural key is sensitive, (4) record may exist before natural key is assigned. When in doubt, use the hybrid approach.
When an entity has multiple candidate keys, one becomes PRIMARY KEY and the others become alternate keys. Alternate keys are implemented using UNIQUE constraints—they enforce uniqueness but aren't the principal identifier.
Why Alternate Keys Matter:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364
-- Multiple UNIQUE constraints for alternate keysCREATE TABLE User_Account ( id BIGSERIAL PRIMARY KEY, -- Surrogate key (internal) -- Alternate keys (natural identifiers) username VARCHAR(50) NOT NULL UNIQUE, -- Users login by username email VARCHAR(100) NOT NULL UNIQUE, -- Password reset by email phone_number VARCHAR(20) UNIQUE, -- Optional, but unique if provided -- Other attributes display_name VARCHAR(100), password_hash VARCHAR(200) NOT NULL, created_at TIMESTAMP DEFAULT NOW()); -- Composite alternate keyCREATE TABLE Employee ( employee_id BIGSERIAL PRIMARY KEY, -- Composite alternate key: company + badge number company_code VARCHAR(10) NOT NULL, badge_number VARCHAR(20) NOT NULL, -- Other attributes first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, email VARCHAR(100) NOT NULL UNIQUE, -- Composite UNIQUE constraint CONSTRAINT uq_company_badge UNIQUE (company_code, badge_number)); -- Named UNIQUE constraints (recommended for maintainability)CREATE TABLE Product ( product_id BIGSERIAL PRIMARY KEY, -- Multiple alternate keys with named constraints sku VARCHAR(50) NOT NULL, upc VARCHAR(20), manufacturer_part_no VARCHAR(50), product_name VARCHAR(200) NOT NULL, CONSTRAINT uq_product_sku UNIQUE (sku), CONSTRAINT uq_product_upc UNIQUE (upc), CONSTRAINT uq_product_mpn UNIQUE (manufacturer_part_no)); -- Partial unique: unique only when not null or under certain conditions-- (PostgreSQL specific - filtered unique index)CREATE TABLE Customer ( customer_id BIGSERIAL PRIMARY KEY, email VARCHAR(100), is_active BOOLEAN DEFAULT TRUE); -- Email must be unique, but only among active customersCREATE UNIQUE INDEX uq_active_customer_email ON Customer(email) WHERE is_active = TRUE; -- Unique constraint with NULL handling-- By default, NULL = NULL is unknown, so multiple NULLs allowed in UNIQUE-- To prohibit, either make NOT NULL or use partial indexStandard SQL allows multiple NULL values in a UNIQUE column because NULL ≠ NULL (null is unknown). This means UNIQUE without NOT NULL can have many null entries. If you want to ensure at most one null or no nulls, add NOT NULL or use database-specific features like filtered indexes.
The data type and generation strategy for keys impacts performance, storage, and usability. Here are the common options and their tradeoffs:
| Key Type | Size | Generation | Pros | Cons |
|---|---|---|---|---|
| INTEGER AUTO_INCREMENT | 4 bytes | Database sequence | Compact, ordered, fast joins | Predictable, limited range (2B) |
| BIGINT AUTO_INCREMENT | 8 bytes | Database sequence | Huge range (9×10¹⁸) | Larger index, still predictable |
| UUID v4 | 16 bytes (128 bit) | Random generation | Globally unique, unpredictable | Large, random = poor index locality |
| UUID v7 / ULID | 16 bytes | Time-ordered random | Time-sortable, unique, unguessable | Relatively large |
| VARCHAR natural key | Variable | Business assigned | Meaningful, no generation needed | May change, variable size |
| CHAR fixed-width | Fixed n bytes | Business assigned | Consistent size for codes | Padding overhead |
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667
-- 1. Auto-increment Integer (most common)CREATE TABLE Order_AutoInt ( order_id SERIAL PRIMARY KEY, -- PostgreSQL: 1, 2, 3, ... -- or: INTEGER GENERATED ALWAYS AS IDENTITY customer_id INTEGER NOT NULL, order_date DATE NOT NULL); -- MySQL equivalent-- order_id INT AUTO_INCREMENT PRIMARY KEY -- SQL Server equivalent-- order_id INT IDENTITY(1,1) PRIMARY KEY -- 2. BIGINT for very large tablesCREATE TABLE Event_Log ( event_id BIGSERIAL PRIMARY KEY, -- 8 bytes, huge range event_type VARCHAR(50), event_data JSONB, created_at TIMESTAMP DEFAULT NOW()); -- 3. UUID (Universally Unique Identifier)CREATE EXTENSION IF NOT EXISTS "uuid-ossp"; -- PostgreSQL CREATE TABLE Document ( document_id UUID PRIMARY KEY DEFAULT uuid_generate_v4(), title VARCHAR(200) NOT NULL, content TEXT, created_at TIMESTAMP DEFAULT NOW()); -- Alternative: UUID v7 (time-sortable) - requires extension or app-generated -- 4. Natural key with specific formatCREATE TABLE Country ( country_code CHAR(3) PRIMARY KEY, -- ISO 3166-1 alpha-3 country_name VARCHAR(100) NOT NULL, continent VARCHAR(50)); CREATE TABLE Currency ( currency_code CHAR(3) PRIMARY KEY, -- ISO 4217 currency_name VARCHAR(100) NOT NULL, symbol VARCHAR(5)); -- 5. Using a custom sequenceCREATE SEQUENCE order_number_seq START 100000; CREATE TABLE Customer_Order ( order_number VARCHAR(20) PRIMARY KEY DEFAULT 'ORD-' || nextval('order_number_seq'), customer_id INTEGER NOT NULL, order_date DATE NOT NULL);-- Generates: ORD-100000, ORD-100001, ... -- 6. Application-generated structured ID-- Common pattern: Prefix + Date + Sequence-- Example: INV-2024-001234-- Usually generated in application code, stored as VARCHARAuto-increment is best for internal systems where ordering helps performance and exposing IDs is acceptable. UUID is better for distributed systems, public-facing APIs where you don't want guessable IDs, and data that may be merged from multiple sources. UUID v7 (time-ordered) offers a compromise with ordering benefits.
Drawing from decades of database design experience, here are the established best practices for key attribute mapping:
Some developers omit primary keys from tables (especially 'temporary' or 'log' tables). This causes problems: no guaranteed uniqueness, replication issues, ORM incompatibility, performance problems without implicit row identifier. Always define a primary key, even if just an auto-increment ID.
Key attributes are the foundation of entity identity in both ER modeling and relational databases. Proper key design ensures data integrity, enables relationships, and affects query performance. Let's consolidate what we've learned:
Module Complete: Entity Mapping
We've now covered all aspects of mapping ER entities to relational tables:
You now have the complete toolkit for transforming any ER entity into a well-designed relational table. The next module will cover Relationship Mapping, where we'll learn how to transform the connections between entities into foreign keys and junction tables.
You now understand the complete theory and practice of key attribute mapping. You can identify and classify keys, map them to appropriate constraints, make informed natural vs. surrogate decisions, and apply best practices for robust key design. You've completed the Entity Mapping module!