Loading learning content...
In the physical world, no two snowflakes are identical. In a well-designed database, no two rows representing the same type of entity should be indistinguishable. Key constraints are the mechanisms that enforce this principle of uniqueness.
At its core, a key constraint guarantees that a particular attribute or combination of attributes uniquely identifies each row in a table. This seemingly simple guarantee has profound implications: it enables precise querying, reliable updates, consistent referential integrity, and meaningful data analysis.
This page provides an exhaustive exploration of key constraints, examining the hierarchy of keys (superkeys, candidate keys, primary keys, alternate keys), their formal properties, practical implementation across database systems, and the critical design decisions involved in choosing appropriate keys.
By the end of this page, you will understand: (1) The formal definitions of superkeys, candidate keys, primary keys, and alternate keys, (2) The uniqueness property and its enforcement mechanisms, (3) UNIQUE constraints vs PRIMARY KEY constraints, (4) Composite key design considerations, (5) Natural keys vs surrogate keys in depth, and (6) Key selection best practices and common pitfalls.
The relational model defines a precise hierarchy of key concepts, each building on the previous. Understanding this hierarchy is essential for proper database design.
Illustrating the hierarchy:
Consider an Employee table with attributes: {emp_id, ssn, email, first_name, last_name, department}
Let's analyze the keys:
| Attribute Set | Superkey? | Candidate Key? | Rationale |
|---|---|---|---|
| {emp_id, ssn, email, first_name, last_name, department} | Yes | No | All attributes together are unique, but not minimal |
| {emp_id, ssn, email} | Yes | No | Unique but contains redundant uniqueness |
| {emp_id, first_name} | Yes | No | Unique but first_name is unnecessary |
| {emp_id} | Yes | Yes | Minimal—removing emp_id loses uniqueness |
| {ssn} | Yes | Yes | Minimal—unique per person (in US context) |
| {email} | Yes | Yes | Minimal—unique per employee (if enforced) |
| {first_name, last_name} | Maybe | Maybe | Depends on data—could have duplicate names |
| {department} | No | No | Multiple employees in same department |
From this analysis:
The primary key choice is a design decision, not determined by the data itself. We'll explore selection criteria later in this page.
The distinction between superkey and candidate key is minimality. A candidate key is a superkey with no unnecessary attributes. This matters for indexing efficiency, foreign key references, and schema reasoning. Always identify the minimal set of attributes needed for uniqueness.
Mathematically, a relation is a set of tuples. By the definition of a set, no two elements can be identical. Therefore, every relation inherently has the uniqueness property—no two rows can be exactly the same across all attributes.
However, in practice, we need stronger guarantees. We need certain attributes (the keys) to uniquely identify rows, even if other attributes happen to be identical.
Formal definition:
A set of attributes K is a superkey of relation R if and only if, for any two tuples t₁ and t₂ in R where t₁ ≠ t₂, we have t₁[K] ≠ t₂[K].
In plain language: no two different rows can have the same values for all attributes in K.
123456789101112131415161718192021222324252627282930313233343536373839404142
-- Sample data demonstrating uniquenessCREATE TABLE employees ( emp_id INT PRIMARY KEY, ssn CHAR(11) UNIQUE, email VARCHAR(255) UNIQUE, first_name VARCHAR(50), last_name VARCHAR(50), department VARCHAR(50)); -- Valid insertions: each key is uniqueINSERT INTO employees VALUES (1, '123-45-6789', 'alice@co.com', 'Alice', 'Smith', 'Engineering'), (2, '234-56-7890', 'bob@co.com', 'Bob', 'Jones', 'Marketing'), (3, '345-67-8901', 'carol@co.com', 'Carol', 'Smith', 'Engineering');-- Note: last_name 'Smith' and department 'Engineering' repeat (not keys) -- Duplicate primary key: REJECTEDINSERT INTO employees VALUES (1, '456-78-9012', 'dave@co.com', 'Dave', 'Brown', 'Sales');-- ERROR: duplicate key value violates unique constraint "employees_pkey" -- Duplicate alternate key (SSN): REJECTEDINSERT INTO employees VALUES (4, '123-45-6789', 'eve@co.com', 'Eve', 'Wilson', 'Finance');-- ERROR: duplicate key value violates unique constraint "employees_ssn_key" -- Duplicate alternate key (email): REJECTEDINSERT INTO employees VALUES (5, '567-89-0123', 'alice@co.com', 'Alice', 'Taylor', 'HR');-- ERROR: duplicate key value violates unique constraint "employees_email_key" -- Duplicate non-key values are allowedINSERT INTO employees VALUES (6, '678-90-1234', 'frank@co.com', 'Frank', 'Smith', 'Engineering');-- OK: same last_name and department as other employees (not keys) -- Uniqueness check querySELECT emp_id, COUNT(*)FROM employeesGROUP BY emp_idHAVING COUNT(*) > 1; -- Returns nothing if PK constraint workingThe NULL exception:
A critical nuance in SQL (though not in pure relational theory): UNIQUE constraints typically allow multiple NULL values because NULL ≠ NULL (it evaluates to UNKNOWN). Two rows with NULL in a UNIQUE column are not considered duplicates.
This is why the PRIMARY KEY constraint is special—it combines UNIQUE with NOT NULL, closing the NULL loophole.
In PostgreSQL, SQL Server, and SQLite, a UNIQUE constraint allows multiple NULL values. In MySQL (InnoDB), UNIQUE allows multiple NULLs by default. Only PRIMARY KEY guarantees both uniqueness AND non-nullability. If NULL should be prohibited, explicitly add NOT NULL to UNIQUE columns.
Both PRIMARY KEY and UNIQUE enforce uniqueness, but they serve different purposes and have distinct behaviors.
| Characteristic | PRIMARY KEY | UNIQUE Constraint |
|---|---|---|
| Number per table | Exactly one | Zero to many |
| NULL values | Never allowed (implicit NOT NULL) | Allowed by default (multiple NULLs possible) |
| Purpose | Principal identifier for tuples | Additional uniqueness requirements |
| Foreign key target | Default reference target | Can be referenced with explicit syntax |
| Clustered index (SQL Server) | Creates clustered index by default | Creates non-clustered index |
| Semantic meaning | 'This is THE identifier' | 'This must also be unique' |
| Entity integrity | Enforces entity integrity rule | Does not guarantee entity integrity |
12345678910111213141516171819202122232425262728293031323334353637383940414243
-- A table demonstrating both PRIMARY KEY and UNIQUECREATE TABLE users ( user_id SERIAL PRIMARY KEY, -- THE identifier; auto-generated username VARCHAR(50) NOT NULL UNIQUE, -- Alternate key: must be unique, no NULLs email VARCHAR(255) NOT NULL UNIQUE, -- Another alternate key: unique, no NULLs phone VARCHAR(20) UNIQUE, -- UNIQUE but nullable (optional phone) display_name VARCHAR(100), -- No uniqueness requirement created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP); -- Primary key: user_id-- Alternate keys: username, email-- phone is UNIQUE but not a candidate key (allows NULL) -- Valid: all constraints satisfiedINSERT INTO users (username, email, display_name)VALUES ('johndoe', 'john@example.com', 'John Doe'); -- NULL phone is allowed (UNIQUE permits NULL)INSERT INTO users (username, email, phone)VALUES ('janedoe', 'jane@example.com', NULL); -- Another NULL phone is STILL allowed (UNIQUE allows multiple NULLs)INSERT INTO users (username, email, phone)VALUES ('bobsmith', 'bob@example.com', NULL); -- Duplicate phone (non-NULL) is rejectedINSERT INTO users (username, email, phone)VALUES ('alicew', 'alice@example.com', '555-0100');INSERT INTO users (username, email, phone)VALUES ('charlieb', 'charlie@example.com', '555-0100');-- ERROR: duplicate key value violates unique constraint "users_phone_key" -- Foreign key can reference non-primary UNIQUE columnCREATE TABLE user_sessions ( session_id UUID PRIMARY KEY DEFAULT gen_random_uuid(), username VARCHAR(50) NOT NULL, started_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- Reference username instead of user_id (unusual but valid) FOREIGN KEY (username) REFERENCES users(username) ON UPDATE CASCADE -- Important: username might change);If a column should be a true alternate key (capable of uniquely identifying rows), declare it as NOT NULL UNIQUE. The UNIQUE alone allows NULL values, which means the column cannot reliably identify rows. Reserve nullable UNIQUE for columns that are unique when present but optional.
When no single attribute uniquely identifies tuples, we use composite keys—combinations of two or more attributes that together provide uniqueness. Composite keys are especially common in junction tables (bridge tables) that implement many-to-many relationships.
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950
-- Classic composite key: Junction table for M:N relationshipCREATE TABLE student_courses ( student_id INT NOT NULL, course_id VARCHAR(10) NOT NULL, enrollment_date DATE NOT NULL DEFAULT CURRENT_DATE, grade CHAR(2), -- Composite primary key PRIMARY KEY (student_id, course_id), FOREIGN KEY (student_id) REFERENCES students(student_id), FOREIGN KEY (course_id) REFERENCES courses(course_id)); -- Each combination of (student_id, course_id) is uniqueINSERT INTO student_courses VALUES (1001, 'CS101', '2024-01-15', NULL);INSERT INTO student_courses VALUES (1001, 'CS102', '2024-01-15', NULL); -- OK: different courseINSERT INTO student_courses VALUES (1002, 'CS101', '2024-01-15', NULL); -- OK: different studentINSERT INTO student_courses VALUES (1001, 'CS101', '2024-01-15', 'A'); -- ERROR: duplicate -- Composite key with three attributesCREATE TABLE exam_results ( student_id INT NOT NULL, course_id VARCHAR(10) NOT NULL, exam_number INT NOT NULL, -- 1 for midterm, 2 for final, etc. score DECIMAL(5,2) NOT NULL, taken_at TIMESTAMP NOT NULL, PRIMARY KEY (student_id, course_id, exam_number), FOREIGN KEY (student_id, course_id) REFERENCES student_courses(student_id, course_id)); -- Composite UNIQUE constraint (alternate composite key)CREATE TABLE inventory_locations ( location_id SERIAL PRIMARY KEY, -- Surrogate key warehouse_code CHAR(5) NOT NULL, aisle VARCHAR(3) NOT NULL, shelf INT NOT NULL, position INT NOT NULL, -- Natural composite key as alternate key UNIQUE (warehouse_code, aisle, shelf, position)); -- Both can identify a row:SELECT * FROM inventory_locations WHERE location_id = 42;SELECT * FROM inventory_locations WHERE warehouse_code = 'WH001' AND aisle = 'A12' AND shelf = 3 AND position = 5;For junction tables where the composite key naturally represents the relationship (student enrolled in course), composite keys are often preferred—they enforce uniqueness directly and require no extra columns. For entity tables where the natural key is complex or might change, a surrogate key with a UNIQUE constraint on the natural key provides stability while preserving business logic.
One of the most debated topics in database design is the choice between natural keys (real-world identifiers) and surrogate keys (system-generated identifiers). Both have ardent advocates, and the optimal choice depends on context.
Natural Key: An attribute or combination of attributes that has inherent meaning in the business domain and uniquely identifies an entity.
Surrogate Key: A system-generated, meaningless identifier (typically an auto-incrementing integer or UUID) used solely for database mechanics.
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455
-- Example 1: Strong case for NATURAL key-- ISO country codes are stable, meaningful, and compactCREATE TABLE countries ( country_code CHAR(2) PRIMARY KEY, -- 'US', 'GB', 'JP' country_name VARCHAR(100) NOT NULL, currency_code CHAR(3) NOT NULL); -- Joins are readableSELECT o.order_id, c.country_nameFROM orders oJOIN countries c ON o.shipping_country_code = c.country_code;-- vs: ON o.shipping_country_id = c.country_id (meaningless) -- Example 2: Strong case for SURROGATE key-- Customers have no stable, single natural identifierCREATE TABLE customers ( customer_id BIGSERIAL PRIMARY KEY, -- Surrogate -- Natural identifiers as alternate keys (may change!) email VARCHAR(255) NOT NULL UNIQUE, phone VARCHAR(20) UNIQUE, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL); -- Foreign keys use stable surrogateCREATE TABLE orders ( order_id BIGSERIAL PRIMARY KEY, customer_id BIGINT NOT NULL REFERENCES customers(customer_id), -- ... ); -- When email changes, no cascade needed!UPDATE customers SET email = 'newemail@example.com' WHERE customer_id = 12345; -- Example 3: HYBRID approach (recommended)-- Surrogate PK with natural UNIQUE constraintsCREATE TABLE products ( product_id BIGSERIAL PRIMARY KEY, -- Surrogate for internal use sku VARCHAR(50) NOT NULL UNIQUE, -- Natural key for business use upc CHAR(12) UNIQUE, -- Another natural key (barcode) product_name VARCHAR(200) NOT NULL, price DECIMAL(10,2) NOT NULL); -- Internal code uses surrogateSELECT * FROM order_items WHERE product_id = 12345; -- User-facing / API uses SKUSELECT * FROM products WHERE sku = 'WIDGET-XL-BLUE';Most production systems use a hybrid: surrogate primary keys for stability and join efficiency, with UNIQUE constraints on natural keys for business logic. This provides the best of both worlds—stable internal references and meaningful external identifiers. The surrogate key is for the database; the natural key is for the business.
While key constraints are conceptually similar across databases, implementation details vary in important ways that affect performance and administration.
1234567891011121314151617181920212223242526272829303132333435363738394041
-- PostgreSQL: Key constraint implementation -- Primary key creates B-tree index automaticallyCREATE TABLE orders ( order_id BIGSERIAL PRIMARY KEY -- Implicit: NOT NULL + UNIQUE index); -- View the auto-created indexSELECT indexname, indexdef FROM pg_indexes WHERE tablename = 'orders';-- orders_pkey | CREATE UNIQUE INDEX orders_pkey ON orders USING btree (order_id) -- Named primary key constraintCREATE TABLE products ( product_id BIGSERIAL, sku VARCHAR(50) NOT NULL, CONSTRAINT pk_products PRIMARY KEY (product_id), CONSTRAINT uq_products_sku UNIQUE (sku)); -- Unique constraint on expression (PostgreSQL-specific)CREATE TABLE users ( user_id SERIAL PRIMARY KEY, email VARCHAR(255) NOT NULL);CREATE UNIQUE INDEX uq_users_email_lower ON users (LOWER(email));-- 'John@Example.com' and 'john@example.com' are now considered duplicates -- Partial unique constraint (PostgreSQL 9.0+)CREATE TABLE subscriptions ( sub_id SERIAL PRIMARY KEY, user_id INT NOT NULL, plan_type VARCHAR(20) NOT NULL, is_active BOOLEAN DEFAULT TRUE);-- Only one active subscription per userCREATE UNIQUE INDEX uq_active_subscription ON subscriptions (user_id) WHERE is_active = TRUE;SQL Server treats NULL as a value for UNIQUE constraint purposes—only one NULL is allowed. PostgreSQL and MySQL allow multiple NULLs. This is a critical difference when migrating between systems. Use filtered indexes in SQL Server if you need PostgreSQL-like behavior.
Choosing the right key for each table is one of the most impactful design decisions you'll make. Poor key choices create technical debt that's expensive to fix later.
| Mistake | Problem | Better Approach |
|---|---|---|
| Email as PK | Users change email addresses | Surrogate PK + email as UNIQUE constraint |
| Composite natural PK | Complex FKs; any part might change | Surrogate PK + UNIQUE on natural composite |
| Business code as PK | Codes get redefined; mergers change them | Surrogate PK + code as UNIQUE |
| Sequential INT for distributed | Collisions across servers | UUID or distributed sequence (Snowflake ID) |
| UUID for everything | Storage overhead; poor index locality | BIGINT for internal tables; UUID only when needed |
| No alternate keys | Business can't query by natural identifier | Add UNIQUE constraints on business identifiers |
| Nullable UNIQUE without NOT NULL | Illusion of uniqueness with NULL holes | Add NOT NULL for true alternate keys |
1234567891011121314151617181920212223242526272829303132333435363738
-- GOOD: Surrogate PK with natural UNIQUE constraintsCREATE TABLE employees ( employee_id BIGSERIAL PRIMARY KEY, -- Stable surrogate employee_number VARCHAR(20) NOT NULL UNIQUE, -- Business identifier email VARCHAR(255) NOT NULL UNIQUE, -- Another business identifier ssn CHAR(11) UNIQUE, -- Optional but unique first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL); -- GOOD: Natural key for truly stable reference dataCREATE TABLE currencies ( currency_code CHAR(3) PRIMARY KEY, -- ISO 4217: 'USD', 'EUR', 'JPY' currency_name VARCHAR(100) NOT NULL, symbol VARCHAR(5), decimal_places INT NOT NULL DEFAULT 2); -- GOOD: Composite key for junction tablesCREATE TABLE product_categories ( product_id BIGINT NOT NULL REFERENCES products(product_id), category_id BIGINT NOT NULL REFERENCES categories(category_id), is_primary BOOLEAN DEFAULT FALSE, PRIMARY KEY (product_id, category_id)); -- GOOD: Ensure only one primary category per productCREATE UNIQUE INDEX uq_product_primary_category ON product_categories (product_id) WHERE is_primary = TRUE; -- AVOID: Natural key that might changeCREATE TABLE customers_bad ( email VARCHAR(255) PRIMARY KEY, -- What if they change email? name VARCHAR(100) -- All foreign keys to this table break on email change!);If you're uncertain whether a natural key is stable enough, default to a surrogate key with the natural key as a UNIQUE constraint. You can always add application logic to query by natural key while keeping internal references stable. The cost of adding a BIGINT column is tiny compared to the cost of migrating a broken natural key later.
Key constraints are the backbone of data identification in relational databases. Let's consolidate the essential knowledge:
What's next:
We've covered the core integrity constraints: entity integrity, referential integrity, domain constraints, and key constraints. The final page explores Semantic Constraints—the complex business rules that go beyond simple value validation, capturing the intricate logic that keeps data meaningful in real-world applications.
You now have a comprehensive understanding of key constraints—from the theoretical key hierarchy through practical implementation across database systems. Properly designed keys form the foundation for reliable querying, consistent relationships, and scalable data management.