Database Management Systems / Integrity Constraints

Integrity Constraints in the Relational Model

Level: Intermediate

Duration: 90 mins

Topic: Integrity Constraints


Key Constraints

Uniqueness as a Fundamental Property

In the physical world, no two snowflakes are identical. In a well-designed database, no two rows representing the same type of entity should be indistinguishable. Key constraints are the mechanisms that enforce this principle of uniqueness.

At its core, a key constraint guarantees that a particular attribute or combination of attributes uniquely identifies each row in a table. This seemingly simple guarantee has profound implications: it enables precise querying, reliable updates, consistent referential integrity, and meaningful data analysis.

This page provides an exhaustive exploration of key constraints, examining the hierarchy of keys (superkeys, candidate keys, primary keys, alternate keys), their formal properties, practical implementation across database systems, and the critical design decisions involved in choosing appropriate keys.

What You Will Master

By the end of this page, you will understand: (1) The formal definitions of superkeys, candidate keys, primary keys, and alternate keys, (2) The uniqueness property and its enforcement mechanisms, (3) UNIQUE constraints vs PRIMARY KEY constraints, (4) Composite key design considerations, (5) Natural keys vs surrogate keys in depth, and (6) Key selection best practices and common pitfalls.

The Key Hierarchy: From Superkeys to Primary Keys

The relational model defines a precise hierarchy of key concepts, each building on the previous. Understanding this hierarchy is essential for proper database design.

Key Definitions

•Superkey — Any set of attributes that uniquely identifies each tuple in a relation. A superkey may contain more attributes than necessary. Every relation has at least one superkey: the set of all its attributes.
•Candidate Key — A minimal superkey—a superkey from which no attribute can be removed without losing the uniqueness property. A relation may have multiple candidate keys.
•Primary Key — The candidate key selected by the database designer as the principal means of identifying tuples. A table can have at most one primary key, and a well-designed table has exactly one.
•Alternate Key — Any candidate key that is not chosen as the primary key. Alternate keys are also unique identifiers but secondary to the primary key.
•Simple Key — A key consisting of a single attribute.
•Composite Key — A key consisting of two or more attributes that together uniquely identify a tuple.

Illustrating the hierarchy:

Consider an Employee table with attributes: {emp_id, ssn, email, first_name, last_name, department}

Let's analyze the keys:

Key Analysis for Employee Table
Attribute Set	Superkey?	Candidate Key?	Rationale
{emp_id, ssn, email, first_name, last_name, department}	Yes	No	All attributes together are unique, but not minimal
{emp_id, ssn, email}	Yes	No	Unique, but each of these attributes is already unique on its own
{emp_id, first_name}	Yes	No	Unique but first_name is unnecessary
{emp_id}	Yes	Yes	Minimal—removing emp_id loses uniqueness
{ssn}	Yes	Yes	Minimal—unique per person (in US context)
{email}	Yes	Yes	Minimal—unique per employee (if enforced)
{first_name, last_name}	No	No	Two employees can share a name; a key must hold in every valid state of the table, not just today's data
{department}	No	No	Multiple employees in same department

From this analysis:

Candidate Keys: {emp_id}, {ssn}, {email}
Primary Key: Designer chooses {emp_id}
Alternate Keys: {ssn} and {email}

The primary key choice is a design decision, not determined by the data itself. We'll explore selection criteria later on this page.

Minimality is Crucial

The distinction between superkey and candidate key is minimality. A candidate key is a superkey with no unnecessary attributes. This matters for indexing efficiency, foreign key references, and schema reasoning. Always identify the minimal set of attributes needed for uniqueness.

The Uniqueness Property: Mathematical Foundation

Mathematically, a relation is a set of tuples. By the definition of a set, no two elements can be identical. Therefore, every relation inherently has the uniqueness property: no two rows can be exactly the same across all attributes. SQL tables, however, are multisets; without a key constraint, a table can hold exact duplicate rows.

However, in practice, we need stronger guarantees. We need certain attributes (the keys) to uniquely identify rows, even if other attributes happen to be identical.

Formal definition:

A set of attributes K is a superkey of relation R if and only if, for any two tuples t₁ and t₂ in R where t₁ ≠ t₂, we have t₁[K] ≠ t₂[K].

In plain language: no two different rows can have the same values for all attributes in K.
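The definition can be checked mechanically against a relation instance. The Python sketch below is illustrative only: `is_superkey` and `is_candidate_key` are names chosen for this example, and the rows are the sample employees used on this page. Because a key is a property of the schema, a sample of rows can refute a proposed key but never prove one.

```python
from typing import Iterable, Mapping, Sequence

Row = Mapping[str, object]


def is_superkey(rows: Sequence[Row], attrs: Iterable[str]) -> bool:
    """True if no two rows agree on every attribute in attrs, i.e. t1[K] != t2[K].

    Assumes rows holds no exact duplicates, as a relation (a set) cannot.
    """
    attrs = list(attrs)
    seen = set()
    for row in rows:
        projection = tuple(row[a] for a in attrs)  # t[K]
        if projection in seen:
            return False
        seen.add(projection)
    return True


def is_candidate_key(rows: Sequence[Row], attrs: Iterable[str]) -> bool:
    """True if attrs is a superkey and dropping any one attribute breaks it.

    Testing single-attribute removals suffices: every superset of a superkey
    is also a superkey, so if any smaller subset were unique, some subset
    missing exactly one attribute would be unique too.
    """
    attrs = list(attrs)
    if not is_superkey(rows, attrs):
        return False
    return all(
        not is_superkey(rows, [a for a in attrs if a != dropped])
        for dropped in attrs
    )


employees = [
    {"emp_id": 1, "ssn": "123-45-6789", "first_name": "Alice", "last_name": "Smith", "department": "Engineering"},
    {"emp_id": 2, "ssn": "234-56-7890", "first_name": "Bob", "last_name": "Jones", "department": "Marketing"},
    {"emp_id": 3, "ssn": "345-67-8901", "first_name": "Carol", "last_name": "Smith", "department": "Engineering"},
    {"emp_id": 6, "ssn": "678-90-1234", "first_name": "Frank", "last_name": "Smith", "department": "Engineering"},
]

print(is_candidate_key(employees, ["emp_id"]))                # True
print(is_candidate_key(employees, ["emp_id", "department"]))  # False: emp_id alone is unique
print(is_superkey(employees, ["last_name", "department"]))    # False: three Smiths in Engineering
print(is_superkey(employees, ["first_name"]))                 # True, but only by accident of this sample
```

The last line is the point of the caveat: first names happen to be distinct in four rows, which says nothing about the next hire.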

uniqueness_examples.sql
-- Sample data demonstrating uniqueness
CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    ssn CHAR(11) NOT NULL UNIQUE,      -- alternate key: NOT NULL closes the NULL loophole
    email VARCHAR(255) NOT NULL UNIQUE,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    department VARCHAR(50)
);
 
-- Valid insertions: each key is unique
INSERT INTO employees VALUES 
    (1, '123-45-6789', 'alice@co.com', 'Alice', 'Smith', 'Engineering'),
    (2, '234-56-7890', 'bob@co.com', 'Bob', 'Jones', 'Marketing'),
    (3, '345-67-8901', 'carol@co.com', 'Carol', 'Smith', 'Engineering');
-- Note: last_name 'Smith' and department 'Engineering' repeat (not keys)
 
-- Duplicate primary key: REJECTED
INSERT INTO employees VALUES 
    (1, '456-78-9012', 'dave@co.com', 'Dave', 'Brown', 'Sales');
-- ERROR: duplicate key value violates unique constraint "employees_pkey"
 
-- Duplicate alternate key (SSN): REJECTED
INSERT INTO employees VALUES 
    (4, '123-45-6789', 'eve@co.com', 'Eve', 'Wilson', 'Finance');
-- ERROR: duplicate key value violates unique constraint "employees_ssn_key"
 
-- Duplicate alternate key (email): REJECTED
INSERT INTO employees VALUES 
    (5, '567-89-0123', 'alice@co.com', 'Alice', 'Taylor', 'HR');
-- ERROR: duplicate key value violates unique constraint "employees_email_key"
 
-- Duplicate non-key values are allowed
INSERT INTO employees VALUES 
    (6, '678-90-1234', 'frank@co.com', 'Frank', 'Smith', 'Engineering');
-- OK: same last_name and department as other employees (not keys)
 
-- Uniqueness check query
SELECT emp_id, COUNT(*)
FROM employees
GROUP BY emp_id
HAVING COUNT(*) > 1;  -- Returns no rows while the PRIMARY KEY constraint is enforced
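The audit query is most useful before a constraint exists, for example when you are about to add uniqueness to a legacy column that was never constrained. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table `legacy_employees` and its rows are hypothetical. SQLite cannot add a UNIQUE constraint through ALTER TABLE, so the sketch creates a unique index instead, which enforces the same rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical legacy table: email was never declared UNIQUE.
conn.execute("CREATE TABLE legacy_employees (emp_id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO legacy_employees (emp_id, email) VALUES (?, ?)",
    [(1, "alice@co.com"), (2, "bob@co.com"), (3, "alice@co.com")],
)
conn.commit()

# Same pattern as the uniqueness check query, applied to the proposed key.
AUDIT = """
    SELECT email, COUNT(*)
    FROM legacy_employees
    GROUP BY email
    HAVING COUNT(*) > 1
"""
ADD_UNIQUE = "CREATE UNIQUE INDEX uq_legacy_email ON legacy_employees (email)"

print(conn.execute(AUDIT).fetchall())  # [('alice@co.com', 2)]

# Enforcing uniqueness fails while the duplicate is still present.
try:
    conn.execute(ADD_UNIQUE)
    first_attempt_ok = True
except sqlite3.IntegrityError:
    first_attempt_ok = False
print(first_attempt_ok)  # False

# Resolve the duplicate (here, by dropping the later row), then retry.
conn.execute("DELETE FROM legacy_employees WHERE emp_id = 3")
print(conn.execute(AUDIT).fetchall())  # []
conn.execute(ADD_UNIQUE)
```

How to resolve duplicates (delete, merge, or renumber) is a business decision; the audit only tells you where they are.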

The NULL exception:

A critical nuance in SQL (though not in pure relational theory): UNIQUE constraints typically allow multiple NULL values, because comparing NULL with NULL yields UNKNOWN rather than TRUE. Two rows with NULL in a UNIQUE column are therefore not considered duplicates.

This is why the PRIMARY KEY constraint is special—it combines UNIQUE with NOT NULL, closing the NULL loophole.

Multiple NULLs in UNIQUE Columns

In PostgreSQL, MySQL (InnoDB), and SQLite, a UNIQUE constraint allows multiple NULL values. SQL Server is the notable exception: it treats NULLs as equal for UNIQUE purposes and allows only one. PostgreSQL 15 and later can opt into that single-NULL behavior with UNIQUE NULLS NOT DISTINCT. Only PRIMARY KEY guarantees both uniqueness AND non-nullability. If NULL should be prohibited, explicitly add NOT NULL to UNIQUE columns.
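SQLite follows the multiple-NULL behavior, so Python's built-in `sqlite3` module is enough to see the difference between a nullable UNIQUE column and a NOT NULL UNIQUE one. This is a sketch with a hypothetical `users` table modeled on the examples on this page; the helper `try_insert` is ours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        email   TEXT NOT NULL UNIQUE,  -- true alternate key
        phone   TEXT UNIQUE            -- unique when present, but optional
    )
    """
)


def try_insert(email, phone):
    """Insert one user; return None on success or the constraint error text."""
    try:
        conn.execute("INSERT INTO users (email, phone) VALUES (?, ?)", (email, phone))
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


print(try_insert("jane@example.com", None))         # None: NULL phone accepted
print(try_insert("bob@example.com", None))          # None: a second NULL phone is accepted too
print(try_insert("alice@example.com", "555-0100"))  # None
print(try_insert("carol@example.com", "555-0100"))  # UNIQUE constraint failed: users.phone
print(try_insert(None, "555-0199"))                 # NOT NULL constraint failed: users.email
```

One SQLite quirk is relevant here: for historical reasons, a PRIMARY KEY column that is not an INTEGER PRIMARY KEY accepts NULL in SQLite unless the table is WITHOUT ROWID or the column is also declared NOT NULL. On SQLite it is therefore prudent to write NOT NULL explicitly even on primary key columns.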

PRIMARY KEY vs UNIQUE Constraints

Both PRIMARY KEY and UNIQUE enforce uniqueness, but they serve different purposes and have distinct behaviors.

PRIMARY KEY vs UNIQUE Comparison
Characteristic	PRIMARY KEY	UNIQUE Constraint
Number per table	Exactly one	Zero to many
NULL values	Never allowed (implicit NOT NULL)	Allowed by default (multiple NULLs in most systems; only one in SQL Server)
Purpose	Principal identifier for tuples	Additional uniqueness requirements
Foreign key target	Default reference target	Can be referenced with explicit syntax
Clustered index (SQL Server)	Creates a clustered index by default, if the table has none yet	Creates a non-clustered index by default
Semantic meaning	'This is THE identifier'	'This must also be unique'
Entity integrity	Enforces entity integrity rule	Does not guarantee entity integrity

pk_vs_unique.sql
-- A table demonstrating both PRIMARY KEY and UNIQUE
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,           -- THE identifier; auto-generated
    username VARCHAR(50) NOT NULL UNIQUE, -- Alternate key: must be unique, no NULLs
    email VARCHAR(255) NOT NULL UNIQUE,   -- Another alternate key: unique, no NULLs
    phone VARCHAR(20) UNIQUE,             -- UNIQUE but nullable (optional phone)
    display_name VARCHAR(100),            -- No uniqueness requirement
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
 
-- Primary key: user_id
-- Alternate keys: username, email
-- phone is UNIQUE but not a candidate key (allows NULL)
 
-- Valid: all constraints satisfied
INSERT INTO users (username, email, display_name)
VALUES ('johndoe', 'john@example.com', 'John Doe');
 
-- NULL phone is allowed (UNIQUE permits NULL)
INSERT INTO users (username, email, phone)
VALUES ('janedoe', 'jane@example.com', NULL);
 
-- Another NULL phone is STILL allowed (PostgreSQL permits multiple NULLs; SQL Server would reject this row)
INSERT INTO users (username, email, phone)
VALUES ('bobsmith', 'bob@example.com', NULL);
 
-- Duplicate phone (non-NULL) is rejected
INSERT INTO users (username, email, phone)
VALUES ('alicew', 'alice@example.com', '555-0100');
INSERT INTO users (username, email, phone)
VALUES ('charlieb', 'charlie@example.com', '555-0100');
-- ERROR: duplicate key value violates unique constraint "users_phone_key"
 
-- Foreign key can reference non-primary UNIQUE column
CREATE TABLE user_sessions (
    session_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    username VARCHAR(50) NOT NULL,
    started_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    
    -- Reference username instead of user_id (unusual but valid)
    FOREIGN KEY (username) REFERENCES users(username)
        ON UPDATE CASCADE  -- Important: username might change
);
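The `user_sessions` example relies on two things: a foreign key may target a non-primary column as long as that column is UNIQUE, and ON UPDATE CASCADE rewrites referencing rows when the target value changes. The sketch below reproduces both in SQLite through Python's `sqlite3`, with simplified column types (INTEGER keys in place of SERIAL and UUID). Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript(
    """
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE      -- UNIQUE makes it a valid FK target
    );
    CREATE TABLE user_sessions (
        session_id INTEGER PRIMARY KEY,
        username   TEXT NOT NULL
            REFERENCES users (username) ON UPDATE CASCADE
    );
    INSERT INTO users (username) VALUES ('johndoe');
    INSERT INTO user_sessions (username) VALUES ('johndoe'), ('johndoe');
    """
)

# Renaming the user propagates to every referencing session row.
conn.execute("UPDATE users SET username = 'john.doe' WHERE username = 'johndoe'")
print(conn.execute("SELECT DISTINCT username FROM user_sessions").fetchall())  # [('john.doe',)]

# A session for a username that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO user_sessions (username) VALUES ('ghost')")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False
print(orphan_accepted)  # False
```

Had the foreign key targeted `user_id` instead, the rename would not have touched `user_sessions` at all, which is the usual argument for referencing the surrogate key.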

Best Practice: Combine UNIQUE with NOT NULL

If a column should be a true alternate key (capable of uniquely identifying rows), declare it NOT NULL UNIQUE. UNIQUE alone allows NULL values, so the column cannot reliably identify every row. Reserve nullable UNIQUE for columns that are unique when present but optional.

Composite Keys: Multi-Attribute Identification

When no single attribute uniquely identifies tuples, we use composite keys—combinations of two or more attributes that together provide uniqueness. Composite keys are especially common in junction tables (bridge tables) that implement many-to-many relationships.

composite_keys.sql
-- Classic composite key: Junction table for M:N relationship
CREATE TABLE student_courses (
    student_id INT NOT NULL,
    course_id VARCHAR(10) NOT NULL,
    enrollment_date DATE NOT NULL DEFAULT CURRENT_DATE,
    grade CHAR(2),
    
    -- Composite primary key
    PRIMARY KEY (student_id, course_id),
    
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
 
-- Each combination of (student_id, course_id) is unique
INSERT INTO student_courses VALUES (1001, 'CS101', '2024-01-15', NULL);
INSERT INTO student_courses VALUES (1001, 'CS102', '2024-01-15', NULL);  -- OK: different course
INSERT INTO student_courses VALUES (1002, 'CS101', '2024-01-15', NULL);  -- OK: different student
INSERT INTO student_courses VALUES (1001, 'CS101', '2024-01-15', 'A');   -- ERROR: duplicate
 
-- Composite key with three attributes
CREATE TABLE exam_results (
    student_id INT NOT NULL,
    course_id VARCHAR(10) NOT NULL,
    exam_number INT NOT NULL,  -- 1 for midterm, 2 for final, etc.
    score DECIMAL(5,2) NOT NULL,
    taken_at TIMESTAMP NOT NULL,
    
    PRIMARY KEY (student_id, course_id, exam_number),
    
    FOREIGN KEY (student_id, course_id) 
        REFERENCES student_courses(student_id, course_id)
);
 
-- Composite UNIQUE constraint (alternate composite key)
CREATE TABLE inventory_locations (
    location_id SERIAL PRIMARY KEY,  -- Surrogate key
    warehouse_code CHAR(5) NOT NULL,
    aisle VARCHAR(3) NOT NULL,
    shelf INT NOT NULL,
    position INT NOT NULL,
    
    -- Natural composite key as alternate key
    UNIQUE (warehouse_code, aisle, shelf, position)
);
 
-- Both can identify a row:
SELECT * FROM inventory_locations WHERE location_id = 42;
SELECT * FROM inventory_locations 
WHERE warehouse_code = 'WH001' AND aisle = 'A12' AND shelf = 3 AND position = 5;

Composite Key Design Considerations

•Minimality — Include only attributes necessary for uniqueness. Adding extra attributes wastes storage and complicates foreign keys.
•Stability — All component attributes should be stable. If course IDs change, all references break.
•Size — Wider keys consume more index space and slow comparisons. Consider surrogate key + UNIQUE constraint for very wide natural keys.
•Order matters — In composite indexes, column order affects query performance: a B-tree index is most useful when a query constrains a leftmost prefix of its columns. Put the columns most often used in equality filters first.
•Foreign key complexity — Tables referencing composite keys must include all component columns in their foreign key, cascading complexity.
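The composite-key behavior above, including the last point about foreign keys, can be seen in a small SQLite sketch via Python's `sqlite3`. It reuses the shapes of `student_courses` and `exam_results` from this section but drops the references to `students` and `courses`, which this page does not define; the helper `accepted` is ours.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE student_courses (
        student_id INTEGER NOT NULL,
        course_id  TEXT    NOT NULL,
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)
    );
    CREATE TABLE exam_results (
        student_id  INTEGER NOT NULL,
        course_id   TEXT    NOT NULL,
        exam_number INTEGER NOT NULL,
        score       REAL    NOT NULL,
        PRIMARY KEY (student_id, course_id, exam_number),
        -- The FK must list every column of the referenced composite key.
        FOREIGN KEY (student_id, course_id)
            REFERENCES student_courses (student_id, course_id)
    );
    """
)


def accepted(sql, params):
    """Run one statement; return True if every constraint was satisfied."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


enroll = "INSERT INTO student_courses (student_id, course_id) VALUES (?, ?)"
print(accepted(enroll, (1001, "CS101")))  # True
print(accepted(enroll, (1001, "CS102")))  # True: same student, different course
print(accepted(enroll, (1002, "CS101")))  # True: same course, different student
print(accepted(enroll, (1001, "CS101")))  # False: the pair already exists

exam = "INSERT INTO exam_results VALUES (?, ?, ?, ?)"
print(accepted(exam, (1001, "CS101", 1, 88.5)))  # True: enrollment exists
print(accepted(exam, (1001, "CS101", 1, 90.0)))  # False: exam 1 already recorded
print(accepted(exam, (1003, "CS101", 1, 75.0)))  # False: 1003 is not enrolled in CS101
```

Each component value may repeat on its own; only the full combination is unique, and the child table has to carry both `student_id` and `course_id` to reference it.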

When to Choose Composite vs Surrogate Keys

For junction tables where the composite key naturally represents the relationship (student enrolled in course), composite keys are often preferred—they enforce uniqueness directly and require no extra columns. For entity tables where the natural key is complex or might change, a surrogate key with a UNIQUE constraint on the natural key provides stability while preserving business logic.

Natural Keys vs Surrogate Keys: The Great Debate

One of the most debated topics in database design is the choice between natural keys (real-world identifiers) and surrogate keys (system-generated identifiers). Both have ardent advocates, and the optimal choice depends on context.

Natural Key: An attribute or combination of attributes that has inherent meaning in the business domain and uniquely identifies an entity.

Surrogate Key: A system-generated, meaningless identifier (typically an auto-incrementing integer or UUID) used solely for database mechanics.

Natural Key Advantages

•Self-documenting — Key value conveys meaning ('US', 'EUR', 'ISBN-13')
•Data validation — Invalid keys are often obviously wrong
•Business alignment — Matches how users think and search
•No extra column — Uses existing data; no redundant storage
•Cross-system consistency — Same ID in database, API, and UI

Natural Key Disadvantages

•May change — Email addresses, company names change over time
•Complex joins — Multi-column keys complicate foreign keys
•Size overhead — VARCHAR keys are larger than INT keys
•Availability — May not be known at record creation time
•Privacy — May expose PII (SSN, email) in logs and errors

Surrogate Key Advantages

•Immutable — Auto-generated values never change
•Compact — A single INT or BIGINT column is about as compact as a key gets
•Always available — Generated on insert; no delay
•Simple joins — Single-column equality comparisons
•Privacy-safe — Reveals nothing about the entity

Surrogate Key Disadvantages

•Meaningless — 'User 12847' conveys nothing about the user
•Extra column — Adds storage and one more thing to maintain
•Still needs uniqueness — The natural key still needs its own UNIQUE constraint
•Sequence issues — Sequential values exposed externally can reveal record counts and growth rates
•Cross-system mapping — External systems may use different IDs

key_choice_examples.sql
-- Example 1: Strong case for NATURAL key
-- ISO country codes are stable, meaningful, and compact
CREATE TABLE countries (
    country_code CHAR(2) PRIMARY KEY,  -- 'US', 'GB', 'JP'
    country_name VARCHAR(100) NOT NULL,
    currency_code CHAR(3) NOT NULL
);
 
-- Joins are readable
SELECT o.order_id, c.country_name
FROM orders o
JOIN countries c ON o.shipping_country_code = c.country_code;
-- vs: ON o.shipping_country_id = c.country_id (meaningless)
 
 
-- Example 2: Strong case for SURROGATE key
-- Customers have no stable, single natural identifier
CREATE TABLE customers (
    customer_id BIGSERIAL PRIMARY KEY,  -- Surrogate
    
    -- Natural identifiers as alternate keys (may change!)
    email VARCHAR(255) NOT NULL UNIQUE,
    phone VARCHAR(20) UNIQUE,
    
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL
);
 
-- Foreign keys use stable surrogate
CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers(customer_id),
    shipping_country_code CHAR(2) NOT NULL REFERENCES countries(country_code)
    -- ... 
);
 
-- When email changes, no cascade needed!
UPDATE customers SET email = 'newemail@example.com' WHERE customer_id = 12345;
 
 
-- Example 3: HYBRID approach (recommended)
-- Surrogate PK with natural UNIQUE constraints
CREATE TABLE products (
    product_id BIGSERIAL PRIMARY KEY,   -- Surrogate for internal use
    sku VARCHAR(50) NOT NULL UNIQUE,    -- Natural key for business use
    upc CHAR(12) UNIQUE,                -- Another natural key (barcode)
    
    product_name VARCHAR(200) NOT NULL,
    price DECIMAL(10,2) NOT NULL
);
 
-- Internal code uses surrogate
SELECT * FROM order_items WHERE product_id = 12345;
 
-- User-facing / API uses SKU
SELECT * FROM products WHERE sku = 'WIDGET-XL-BLUE';

The Pragmatic Hybrid Approach

Most production systems use a hybrid: surrogate primary keys for stability and join efficiency, with UNIQUE constraints on natural keys for business logic. This provides the best of both worlds—stable internal references and meaningful external identifiers. The surrogate key is for the database; the natural key is for the business.
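To make the trade-off concrete, the sketch below runs the same business event (a customer changing email) against a natural-key design and a hybrid design, using SQLite through Python's `sqlite3`. The tables `customers_natural` and `orders_natural` and the customer "Ann" are hypothetical; the hybrid tables are simplified versions of `customers` and `orders` from the examples above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Natural key: email is the PK, so each order stores the email itself.
    CREATE TABLE customers_natural (
        email TEXT NOT NULL PRIMARY KEY,
        name  TEXT NOT NULL
    );
    CREATE TABLE orders_natural (
        order_id       INTEGER PRIMARY KEY,
        customer_email TEXT NOT NULL
            REFERENCES customers_natural (email) ON UPDATE CASCADE
    );

    -- Hybrid: surrogate PK for references, email kept as NOT NULL UNIQUE.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        email       TEXT NOT NULL UNIQUE,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
    );

    INSERT INTO customers_natural VALUES ('ann@example.com', 'Ann');
    INSERT INTO orders_natural (customer_email)
        VALUES ('ann@example.com'), ('ann@example.com');
    INSERT INTO customers VALUES (1, 'ann@example.com', 'Ann');
    INSERT INTO orders (customer_id) VALUES (1), (1);
    """
)

# The same business event in both designs: Ann changes her email address.
conn.execute(
    "UPDATE customers_natural SET email = 'ann@new.example' WHERE email = 'ann@example.com'"
)
conn.execute("UPDATE customers SET email = 'ann@new.example' WHERE customer_id = 1")

# Natural key: the cascade rewrote every referencing order row.
print(conn.execute("SELECT customer_email FROM orders_natural").fetchall())
# [('ann@new.example',), ('ann@new.example',)]

# Surrogate key: order rows are untouched; a join picks up the new email.
print(conn.execute(
    "SELECT o.order_id, c.email FROM orders AS o "
    "JOIN customers AS c ON c.customer_id = o.customer_id ORDER BY o.order_id"
).fetchall())
# [(1, 'ann@new.example'), (2, 'ann@new.example')]
```

With two orders the difference is invisible; the point is that the natural-key cascade touches every referencing row in every referencing table, while the hybrid design changes exactly one row.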

Implementation Across Database Systems

While key constraints are conceptually similar across databases, implementation details vary in important ways that affect performance and administration.

-- PostgreSQL: Key constraint implementation
 
-- Primary key creates B-tree index automatically
CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY  -- Implicit: NOT NULL + UNIQUE index
);
 
-- View the auto-created index
SELECT indexname, indexdef 
FROM pg_indexes 
WHERE tablename = 'orders';
-- orders_pkey | CREATE UNIQUE INDEX orders_pkey ON orders USING btree (order_id)
 
-- Named primary key constraint
CREATE TABLE products (
    product_id BIGSERIAL,
    sku VARCHAR(50) NOT NULL,
    
    CONSTRAINT pk_products PRIMARY KEY (product_id),
    CONSTRAINT uq_products_sku UNIQUE (sku)
);
 
-- Unique index on an expression (PostgreSQL syntax; other systems offer similar functional indexes)
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL
);
CREATE UNIQUE INDEX uq_users_email_lower ON users (LOWER(email));
-- 'John@Example.com' and 'john@example.com' are now considered duplicates
 
-- Partial unique index: uniqueness enforced only on rows matching the WHERE clause
CREATE TABLE subscriptions (
    sub_id SERIAL PRIMARY KEY,
    user_id INT NOT NULL,
    plan_type VARCHAR(20) NOT NULL,
    is_active BOOLEAN DEFAULT TRUE
);
-- Only one active subscription per user
CREATE UNIQUE INDEX uq_active_subscription 
ON subscriptions (user_id) 
WHERE is_active = TRUE;

SQL Server NULL Behavior Difference

SQL Server treats NULL as a value for UNIQUE constraint purposes: only one NULL is allowed. PostgreSQL and MySQL allow multiple NULLs. This is a critical difference when migrating between systems. In SQL Server, use a filtered unique index whose WHERE clause excludes NULLs (for example, WHERE phone IS NOT NULL) if you need PostgreSQL-like behavior.
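The expression index and partial index from the PostgreSQL example have close equivalents in SQLite (indexes on expressions since 3.9.0, partial indexes since 3.8.0), so they can be tried through Python's `sqlite3`. In this sketch `is_active` is stored as INTEGER 0/1 because SQLite has no separate boolean storage class, and the helper `accepted` is ours. A unique index whose WHERE clause excludes NULLs is the same partial-index idea that SQL Server's filtered indexes use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        email   TEXT NOT NULL
    );
    -- Unique index on an expression: case-insensitive email uniqueness.
    CREATE UNIQUE INDEX uq_users_email_lower ON users (lower(email));

    CREATE TABLE subscriptions (
        sub_id    INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL,
        plan_type TEXT    NOT NULL,
        is_active INTEGER NOT NULL DEFAULT 1  -- 1 = active, 0 = inactive
    );
    -- Partial unique index: at most one active subscription per user.
    CREATE UNIQUE INDEX uq_active_subscription
        ON subscriptions (user_id) WHERE is_active = 1;
    """
)


def accepted(sql, params):
    """Run one statement; return True if the unique indexes allowed it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


add_user = "INSERT INTO users (email) VALUES (?)"
print(accepted(add_user, ("John@Example.com",)))  # True
print(accepted(add_user, ("john@example.com",)))  # False: same value after lower()

add_sub = "INSERT INTO subscriptions (user_id, plan_type, is_active) VALUES (?, ?, ?)"
print(accepted(add_sub, (1, "basic", 0)))  # True: inactive rows are outside the index
print(accepted(add_sub, (1, "basic", 0)))  # True: any number of inactive rows
print(accepted(add_sub, (1, "pro", 1)))    # True: first active subscription
print(accepted(add_sub, (1, "team", 1)))   # False: second active one for user 1
```

SQLite's built-in `lower()` folds only ASCII letters, so this sketch is not a complete case-insensitivity solution for international addresses.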

Key Selection Best Practices

Choosing the right key for each table is one of the most impactful design decisions you'll make. Poor key choices create technical debt that's expensive to fix later.

Primary Key Selection Criteria

•Uniqueness is guaranteed — The key must uniquely identify every row, with no exceptions, now and forever.
•Never NULL — No primary key column may contain NULL, including every column of a composite primary key.
•Immutable — Keys should never change. Updates cascade to all foreign keys and can be catastrophic at scale.
•Compact — Smaller keys mean faster comparisons, smaller indexes, and smaller foreign key columns.
•Simple — Single-column keys are easier to work with than composite keys; prefer surrogate if natural key is complex.
•Available at insert — The key value must be known when the row is created.
•Meaningless (for surrogates) — Surrogate keys should reveal nothing about the entity.

Common Key Selection Mistakes
Mistake	Problem	Better Approach
Email as PK	Users change email addresses	Surrogate PK + email as UNIQUE constraint
Composite natural PK	Complex FKs; any part might change	Surrogate PK + UNIQUE on natural composite
Business code as PK	Codes get redefined; mergers change them	Surrogate PK + code as UNIQUE
Sequential INT in distributed systems	Independent sequences on different servers collide	UUID or distributed sequence (Snowflake ID)
UUID for everything	Twice the size of BIGINT (16 vs 8 bytes); random (v4) UUIDs give poor index locality	BIGINT for internal tables; UUID only when needed
No alternate keys	Business can't query by natural identifier	Add UNIQUE constraints on business identifiers
Nullable UNIQUE without NOT NULL	Illusion of uniqueness with NULL holes	Add NOT NULL for true alternate keys

best_practice_examples.sql
-- GOOD: Surrogate PK with natural UNIQUE constraints
CREATE TABLE employees (
    employee_id BIGSERIAL PRIMARY KEY,           -- Stable surrogate
    employee_number VARCHAR(20) NOT NULL UNIQUE, -- Business identifier
    email VARCHAR(255) NOT NULL UNIQUE,          -- Another business identifier
    ssn CHAR(11) UNIQUE,                         -- Optional but unique
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL
);
 
-- GOOD: Natural key for truly stable reference data
CREATE TABLE currencies (
    currency_code CHAR(3) PRIMARY KEY,    -- ISO 4217: 'USD', 'EUR', 'JPY'
    currency_name VARCHAR(100) NOT NULL,
    symbol VARCHAR(5),
    decimal_places INT NOT NULL DEFAULT 2
);
 
-- GOOD: Composite key for junction tables
CREATE TABLE product_categories (
    product_id BIGINT NOT NULL REFERENCES products(product_id),
    category_id BIGINT NOT NULL REFERENCES categories(category_id),
    is_primary BOOLEAN DEFAULT FALSE,
    
    PRIMARY KEY (product_id, category_id)
);
 
-- GOOD: Ensure only one primary category per product
CREATE UNIQUE INDEX uq_product_primary_category 
ON product_categories (product_id) 
WHERE is_primary = TRUE;
 
-- AVOID: Natural key that might change
CREATE TABLE customers_bad (
    email VARCHAR(255) PRIMARY KEY,  -- What if they change email?
    name VARCHAR(100)
    -- Every referencing row must be updated (or the change blocked) when an email changes
);

When in Doubt, Use Surrogate

If you're uncertain whether a natural key is stable enough, default to a surrogate key with the natural key as a UNIQUE constraint. You can always add application logic to query by natural key while keeping internal references stable. The cost of adding a BIGINT column is tiny compared to the cost of migrating a broken natural key later.

Summary: Key Constraints Mastery

Key constraints are the backbone of data identification in relational databases. Let's consolidate the essential knowledge:

Key Takeaways

•Key hierarchy — Superkey → Candidate key (minimal) → Primary key (chosen) + Alternate keys (unchosen candidates).
•Uniqueness property — No two rows can have the same values for all key attributes; NULL comparisons are UNKNOWN.
•PRIMARY KEY vs UNIQUE — A table has at most one PK, which is implicitly NOT NULL and enforces entity integrity. A table can have many UNIQUE constraints, and they allow NULL unless the column is declared NOT NULL.
•Composite keys — Needed when no single attribute is unique; common in junction tables; all attributes must be non-NULL for PK.
•Natural vs surrogate — Natural keys carry meaning but may change; surrogate keys are stable but meaningless. Hybrid approach is often best.
•Implementation varies — NULL handling in UNIQUE differs across databases; SQL Server allows only one NULL.
•Best practices — Prioritize stability, compactness, availability; use surrogate PK with natural UNIQUE constraints.

What's next:

We've covered the core integrity constraints: entity integrity, referential integrity, domain constraints, and key constraints. The final page explores Semantic Constraints—the complex business rules that go beyond simple value validation, capturing the intricate logic that keeps data meaningful in real-world applications.


You now have a comprehensive understanding of key constraints—from the theoretical key hierarchy through practical implementation across database systems. Properly designed keys form the foundation for reliable querying, consistent relationships, and scalable data management.

4 / 5

Loading learning content...

Database Management SystemsIntegrity Constraints

Integrity Constraints in the Relational Model

LevelIntermediate

Duration90 mins

TopicIntegrity Constraints

4 / 5

Key Constraints

Uniqueness as a Fundamental Property

What You Will Master

The Key Hierarchy: From Superkeys to Primary Keys

The relational model defines a precise hierarchy of key concepts, each building on the previous. Understanding this hierarchy is essential for proper database design.

Key Definitions

•Superkey — Any set of attributes that uniquely identifies each tuple in a relation. A superkey may contain more attributes than necessary. Every relation has at least one superkey: the set of all its attributes.
•Candidate Key — A minimal superkey—a superkey from which no attribute can be removed without losing the uniqueness property. A relation may have multiple candidate keys.
•Primary Key — The candidate key selected by the database designer as the principal means of identifying tuples. There is exactly one primary key per table.
•Alternate Key — Any candidate key that is not chosen as the primary key. Alternate keys are also unique identifiers but secondary to the primary key.
•Simple Key — A key consisting of a single attribute.
•Composite Key — A key consisting of two or more attributes that together uniquely identify a tuple.

Illustrating the hierarchy:

Consider an Employee table with attributes: {emp_id, ssn, email, first_name, last_name, department}

Let's analyze the keys:

Key Analysis for Employee Table
Attribute Set	Superkey?	Candidate Key?	Rationale
{emp_id, ssn, email, first_name, last_name, department}	Yes	No	All attributes together are unique, but not minimal
{emp_id, ssn, email}	Yes	No	Unique but contains redundant uniqueness
{emp_id, first_name}	Yes	No	Unique but first_name is unnecessary
{emp_id}	Yes	Yes	Minimal—removing emp_id loses uniqueness
{ssn}	Yes	Yes	Minimal—unique per person (in US context)
{email}	Yes	Yes	Minimal—unique per employee (if enforced)
{first_name, last_name}	Maybe	Maybe	Depends on data—could have duplicate names
{department}	No	No	Multiple employees in same department

From this analysis:

Candidate Keys: {emp_id}, {ssn}, {email}
Primary Key: Designer chooses {emp_id}
Alternate Keys: {ssn} and {email}

The primary key choice is a design decision, not determined by the data itself. We'll explore selection criteria later in this page.

Minimality is Crucial

The Uniqueness Property: Mathematical Foundation

However, in practice, we need stronger guarantees. We need certain attributes (the keys) to uniquely identify rows, even if other attributes happen to be identical.

Formal definition:

A set of attributes K is a superkey of relation R if and only if, for any two tuples t₁ and t₂ in R where t₁ ≠ t₂, we have t₁[K] ≠ t₂[K].

In plain language: no two different rows can have the same values for all attributes in K.

uniqueness_examples.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
-- Sample data demonstrating uniqueness
CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    ssn CHAR(11) UNIQUE,
    email VARCHAR(255) UNIQUE,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    department VARCHAR(50)
);
 
-- Valid insertions: each key is unique
INSERT INTO employees VALUES 
    (1, '123-45-6789', 'alice@co.com', 'Alice', 'Smith', 'Engineering'),
    (2, '234-56-7890', 'bob@co.com', 'Bob', 'Jones', 'Marketing'),
    (3, '345-67-8901', 'carol@co.com', 'Carol', 'Smith', 'Engineering');
-- Note: last_name 'Smith' and department 'Engineering' repeat (not keys)
 
-- Duplicate primary key: REJECTED
INSERT INTO employees VALUES 
    (1, '456-78-9012', 'dave@co.com', 'Dave', 'Brown', 'Sales');
-- ERROR: duplicate key value violates unique constraint "employees_pkey"
 
-- Duplicate alternate key (SSN): REJECTED
INSERT INTO employees VALUES 
    (4, '123-45-6789', 'eve@co.com', 'Eve', 'Wilson', 'Finance');
-- ERROR: duplicate key value violates unique constraint "employees_ssn_key"
 
-- Duplicate alternate key (email): REJECTED
INSERT INTO employees VALUES 
    (5, '567-89-0123', 'alice@co.com', 'Alice', 'Taylor', 'HR');
-- ERROR: duplicate key value violates unique constraint "employees_email_key"
 
-- Duplicate non-key values are allowed
INSERT INTO employees VALUES 
    (6, '678-90-1234', 'frank@co.com', 'Frank', 'Smith', 'Engineering');
-- OK: same last_name and department as other employees (not keys)
 
-- Uniqueness check query
SELECT emp_id, COUNT(*)
FROM employees
GROUP BY emp_id
HAVING COUNT(*) > 1;  -- Returns nothing if PK constraint working

The NULL exception:

This is why the PRIMARY KEY constraint is special—it combines UNIQUE with NOT NULL, closing the NULL loophole.

Multiple NULLs in UNIQUE Columns

PRIMARY KEY vs UNIQUE Constraints

Both PRIMARY KEY and UNIQUE enforce uniqueness, but they serve different purposes and have distinct behaviors.

PRIMARY KEY vs UNIQUE Comparison
Characteristic	PRIMARY KEY	UNIQUE Constraint
Number per table	Exactly one	Zero to many
NULL values	Never allowed (implicit NOT NULL)	Allowed by default (multiple NULLs possible)
Purpose	Principal identifier for tuples	Additional uniqueness requirements
Foreign key target	Default reference target	Can be referenced with explicit syntax
Clustered index (SQL Server)	Creates clustered index by default	Creates non-clustered index
Semantic meaning	'This is THE identifier'	'This must also be unique'
Entity integrity	Enforces entity integrity rule	Does not guarantee entity integrity

pk_vs_unique.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
-- A table demonstrating both PRIMARY KEY and UNIQUE
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,           -- THE identifier; auto-generated
    username VARCHAR(50) NOT NULL UNIQUE, -- Alternate key: must be unique, no NULLs
    email VARCHAR(255) NOT NULL UNIQUE,   -- Another alternate key: unique, no NULLs
    phone VARCHAR(20) UNIQUE,             -- UNIQUE but nullable (optional phone)
    display_name VARCHAR(100),            -- No uniqueness requirement
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
 
-- Primary key: user_id
-- Alternate keys: username, email
-- phone is UNIQUE but not a candidate key (allows NULL)
 
-- Valid: all constraints satisfied
INSERT INTO users (username, email, display_name)
VALUES ('johndoe', 'john@example.com', 'John Doe');
 
-- NULL phone is allowed (UNIQUE permits NULL)
INSERT INTO users (username, email, phone)
VALUES ('janedoe', 'jane@example.com', NULL);
 
-- Another NULL phone is STILL allowed (UNIQUE allows multiple NULLs)
INSERT INTO users (username, email, phone)
VALUES ('bobsmith', 'bob@example.com', NULL);
 
-- Duplicate phone (non-NULL) is rejected
INSERT INTO users (username, email, phone)
VALUES ('alicew', 'alice@example.com', '555-0100');
INSERT INTO users (username, email, phone)
VALUES ('charlieb', 'charlie@example.com', '555-0100');
-- ERROR: duplicate key value violates unique constraint "users_phone_key"
 
-- Foreign key can reference non-primary UNIQUE column
CREATE TABLE user_sessions (
    session_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- gen_random_uuid() is built in from PostgreSQL 13
    username VARCHAR(50) NOT NULL,
    started_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    
    -- Reference username instead of user_id (unusual but valid)
    FOREIGN KEY (username) REFERENCES users(username)
        ON UPDATE CASCADE  -- Important: username might change
);

Best Practice: Combine UNIQUE with NOT NULL
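A minimal PostgreSQL sketch of why the pairing matters, using a hypothetical `user_accounts` table. A nullable UNIQUE column accepts any number of NULLs, so it cannot identify those rows, whereas UNIQUE plus NOT NULL behaves like a true alternate key. The placeholder values used to backfill `tax_id` are purely illustrative.

```sql
-- Hypothetical table (not part of the examples above)
CREATE TABLE user_accounts (
    account_id BIGSERIAL PRIMARY KEY,
    tax_id     VARCHAR(20) UNIQUE,            -- UNIQUE alone: NULLs slip through
    login_name VARCHAR(50) NOT NULL UNIQUE    -- UNIQUE + NOT NULL: a true alternate key
);

-- Both rows are accepted: two NULL tax_ids do not conflict
INSERT INTO user_accounts (tax_id, login_name) VALUES (NULL, 'ana');
INSERT INTO user_accounts (tax_id, login_name) VALUES (NULL, 'ben');

-- tax_id cannot locate either row: tax_id = NULL is UNKNOWN, so no rows come back
SELECT account_id FROM user_accounts WHERE tax_id = NULL;

-- login_name always identifies exactly one row
SELECT account_id FROM user_accounts WHERE login_name = 'ana';

-- To promote tax_id to an alternate key, fill the gaps first
-- (illustrative placeholder values), then forbid NULLs
UPDATE user_accounts SET tax_id = 'TX-' || account_id WHERE tax_id IS NULL;
ALTER TABLE user_accounts ALTER COLUMN tax_id SET NOT NULL;
```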

Composite Keys: Multi-Attribute Identification

composite_keys.sql
-- Classic composite key: Junction table for M:N relationship
CREATE TABLE student_courses (
    student_id INT NOT NULL,
    course_id VARCHAR(10) NOT NULL,
    enrollment_date DATE NOT NULL DEFAULT CURRENT_DATE,
    grade CHAR(2),
    
    -- Composite primary key
    PRIMARY KEY (student_id, course_id),
    
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
 
-- Each combination of (student_id, course_id) is unique
INSERT INTO student_courses VALUES (1001, 'CS101', '2024-01-15', NULL);
INSERT INTO student_courses VALUES (1001, 'CS102', '2024-01-15', NULL);  -- OK: different course
INSERT INTO student_courses VALUES (1002, 'CS101', '2024-01-15', NULL);  -- OK: different student
INSERT INTO student_courses VALUES (1001, 'CS101', '2024-01-15', 'A');   -- ERROR: duplicate
 
-- Composite key with three attributes
CREATE TABLE exam_results (
    student_id INT NOT NULL,
    course_id VARCHAR(10) NOT NULL,
    exam_number INT NOT NULL,  -- 1 for midterm, 2 for final, etc.
    score DECIMAL(5,2) NOT NULL,
    taken_at TIMESTAMP NOT NULL,
    
    PRIMARY KEY (student_id, course_id, exam_number),
    
    FOREIGN KEY (student_id, course_id) 
        REFERENCES student_courses(student_id, course_id)
);
 
-- Composite UNIQUE constraint (alternate composite key)
CREATE TABLE inventory_locations (
    location_id SERIAL PRIMARY KEY,  -- Surrogate key
    warehouse_code CHAR(5) NOT NULL,
    aisle VARCHAR(3) NOT NULL,
    shelf INT NOT NULL,
    position INT NOT NULL,
    
    -- Natural composite key as alternate key
    UNIQUE (warehouse_code, aisle, shelf, position)
);
 
-- Both can identify a row:
SELECT * FROM inventory_locations WHERE location_id = 42;
SELECT * FROM inventory_locations 
WHERE warehouse_code = 'WH001' AND aisle = 'A12' AND shelf = 3 AND position = 5;

Composite Key Design Considerations

•Minimality — Include only attributes necessary for uniqueness. Adding extra attributes wastes storage and complicates foreign keys.
•Stability — All component attributes should be stable. If a course ID changes, every row that references it must be updated as well.
•Size — Wider keys consume more index space and slow comparisons. Consider surrogate key + UNIQUE constraint for very wide natural keys.
•Order matters — A B-tree index on (a, b) supports lookups on a alone or on (a, b), but generally not efficient lookups on b alone. Put the columns most often filtered on by themselves first.
•Foreign key complexity — Tables referencing a composite key must carry every component column in their foreign key, so the complexity propagates to each child table.
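A minimal PostgreSQL sketch of the column-order point, using a hypothetical `enrollments` table shaped like `student_courses` (foreign keys omitted so it runs on its own). The plans `EXPLAIN` prints depend on table size and statistics, and on an empty table the planner may choose a sequential scan for both queries, so the comments describe what the index can support rather than a guaranteed plan.

```sql
-- Hypothetical table mirroring student_courses (FKs omitted so the sketch is self-contained)
CREATE TABLE enrollments (
    student_id INT NOT NULL,
    course_id  VARCHAR(10) NOT NULL,
    PRIMARY KEY (student_id, course_id)    -- backed by a B-tree on (student_id, course_id)
);

-- student_id is the leading column, so the PK index can seek straight to it
EXPLAIN SELECT * FROM enrollments WHERE student_id = 1001;

-- course_id is not a leading prefix, so the PK index is generally a poor fit here
EXPLAIN SELECT * FROM enrollments WHERE course_id = 'CS101';

-- If "who is enrolled in this course?" is a frequent question, index it separately
CREATE INDEX idx_enrollments_course ON enrollments (course_id);
```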

When to Choose Composite vs Surrogate Keys

Natural Keys vs Surrogate Keys: The Great Debate

Natural Key: An attribute or combination of attributes that has inherent meaning in the business domain and uniquely identifies an entity.

Surrogate Key: A system-generated, meaningless identifier (typically an auto-incrementing integer or UUID) used solely for database mechanics.

Natural Key Advantages

•Self-documenting — Key value conveys meaning ('US', 'EUR', an ISBN-13)
•Data validation — Invalid keys are often obviously wrong
•Business alignment — Matches how users think and search
•No extra column — Uses existing data; no redundant storage
•Cross-system consistency — Same ID in database, API, and UI

Natural Key Disadvantages

•May change — Email addresses, company names change over time
•Complex joins — Multi-column keys complicate foreign keys
•Size overhead — VARCHAR keys are larger than INT keys
•Availability — May not be known at record creation time
•Privacy — May expose PII (SSN, email) in logs and errors

Surrogate Key Advantages

•Immutable — Auto-generated values never change
•Compact — A single INT (4 bytes) or BIGINT (8 bytes) is small and fixed-width
•Always available — Generated on insert; no delay
•Simple joins — Single-column equality comparisons
•Privacy-safe — Reveals nothing about the entity

Surrogate Key Disadvantages

•Meaningless — 'User 12847' conveys nothing about the user
•Extra column — Adds storage and one more thing to maintain
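•Still needs uniqueness — The natural key still needs its own UNIQUE constraint, or duplicate entities slip in
•Sequence issues — Sequential values can leak information (such as record counts or growth rate) if exposed externally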
•Cross-system mapping — External systems may use different IDs
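The "still needs uniqueness" point is easy to underestimate, so here is a small PostgreSQL sketch with hypothetical `members_loose` and `members` tables. A surrogate primary key alone never conflicts, because every insert receives a fresh value. Only a UNIQUE constraint on the natural key stops the same entity from being stored twice.

```sql
-- Hypothetical tables. Surrogate PK only, no UNIQUE on the natural key:
CREATE TABLE members_loose (
    member_id BIGSERIAL PRIMARY KEY,
    email     VARCHAR(255) NOT NULL
);
INSERT INTO members_loose (email) VALUES ('kim@example.com');
INSERT INTO members_loose (email) VALUES ('kim@example.com');  -- also accepted: new member_id, no PK conflict

-- The same person now exists twice
SELECT email, count(*) AS copies
FROM members_loose
GROUP BY email
HAVING count(*) > 1;

-- Hybrid fix: keep the surrogate PK, declare the natural key as an alternate key
CREATE TABLE members (
    member_id BIGSERIAL PRIMARY KEY,
    email     VARCHAR(255) NOT NULL UNIQUE
);
INSERT INTO members (email) VALUES ('kim@example.com');
-- A plain second INSERT would now raise a unique violation;
-- ON CONFLICT lets the application skip the duplicate explicitly
INSERT INTO members (email) VALUES ('kim@example.com') ON CONFLICT (email) DO NOTHING;
```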

key_choice_examples.sql
-- Example 1: Strong case for NATURAL key
-- ISO country codes are stable, meaningful, and compact
CREATE TABLE countries (
    country_code CHAR(2) PRIMARY KEY,  -- 'US', 'GB', 'JP'
    country_name VARCHAR(100) NOT NULL,
    currency_code CHAR(3) NOT NULL
);
 
-- Joins are readable (assumes an orders table with a shipping_country_code column)
SELECT o.order_id, c.country_name
FROM orders o
JOIN countries c ON o.shipping_country_code = c.country_code;
-- vs: ON o.shipping_country_id = c.country_id (meaningless)
 
 
-- Example 2: Strong case for SURROGATE key
-- Customers have no stable, single natural identifier
CREATE TABLE customers (
    customer_id BIGSERIAL PRIMARY KEY,  -- Surrogate
    
    -- Natural identifiers as alternate keys (may change!)
    email VARCHAR(255) NOT NULL UNIQUE,
    phone VARCHAR(20) UNIQUE,
    
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL
);
 
-- Foreign keys use stable surrogate
CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers(customer_id)
    -- ... other order columns omitted
);
 
-- When email changes, no cascade needed!
UPDATE customers SET email = 'newemail@example.com' WHERE customer_id = 12345;
 
 
-- Example 3: HYBRID approach (recommended)
-- Surrogate PK with natural UNIQUE constraints
CREATE TABLE products (
    product_id BIGSERIAL PRIMARY KEY,   -- Surrogate for internal use
    sku VARCHAR(50) NOT NULL UNIQUE,    -- Natural key for business use
    upc CHAR(12) UNIQUE,                -- Another natural key (barcode)
    
    product_name VARCHAR(200) NOT NULL,
    price DECIMAL(10,2) NOT NULL
);
 
-- Internal code uses surrogate
SELECT * FROM order_items WHERE product_id = 12345;
 
-- User-facing / API uses SKU
SELECT * FROM products WHERE sku = 'WIDGET-XL-BLUE';

The Pragmatic Hybrid Approach
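A minimal sketch of the hybrid pattern in PostgreSQL, assuming hypothetical `items` and `order_lines` tables. The SKU is accepted at the edge of the system and resolved to the surrogate once. From then on, only the surrogate is stored in child rows, so recoding a SKU touches a single row.

```sql
-- Hypothetical tables for this sketch
CREATE TABLE items (
    item_id   BIGSERIAL PRIMARY KEY,        -- surrogate: used by other tables
    sku       VARCHAR(50) NOT NULL UNIQUE,  -- natural key: used by people and APIs
    item_name VARCHAR(200) NOT NULL
);
CREATE TABLE order_lines (
    order_id BIGINT NOT NULL,
    item_id  BIGINT NOT NULL REFERENCES items (item_id),
    quantity INT NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, item_id)
);

INSERT INTO items (sku, item_name) VALUES ('WIDGET-XL-BLUE', 'Widget XL (blue)');

-- The API receives a SKU and translates it to the surrogate once, at the boundary
INSERT INTO order_lines (order_id, item_id, quantity)
SELECT 1, item_id, 2 FROM items WHERE sku = 'WIDGET-XL-BLUE';

-- Recoding the SKU touches one row; order_lines still points at the same item_id
UPDATE items SET sku = 'WDG-XL-BLUE' WHERE sku = 'WIDGET-XL-BLUE';

-- Reports join back to display the current business identifier
SELECT ol.order_id, i.sku, ol.quantity
FROM order_lines ol
JOIN items i USING (item_id);
```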

Implementation Across Database Systems

While key constraints are conceptually similar across databases, implementation details vary in important ways that affect performance and administration.

-- PostgreSQL: Key constraint implementation
 
-- Primary key creates B-tree index automatically
CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY  -- Implicit: NOT NULL + UNIQUE index
);
 
-- View the auto-created index
SELECT indexname, indexdef 
FROM pg_indexes 
WHERE tablename = 'orders';
-- orders_pkey | CREATE UNIQUE INDEX orders_pkey ON orders USING btree (order_id)
 
-- Named primary key constraint
CREATE TABLE products (
    product_id BIGSERIAL,
    sku VARCHAR(50) NOT NULL,
    
    CONSTRAINT pk_products PRIMARY KEY (product_id),
    CONSTRAINT uq_products_sku UNIQUE (sku)
);
 
-- Unique index on an expression (a UNIQUE constraint accepts only plain columns)
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL
);
CREATE UNIQUE INDEX uq_users_email_lower ON users (LOWER(email));
-- 'John@Example.com' and 'john@example.com' are now considered duplicates
 
-- Partial unique index: uniqueness is enforced only on rows matching the WHERE clause
CREATE TABLE subscriptions (
    sub_id SERIAL PRIMARY KEY,
    user_id INT NOT NULL,
    plan_type VARCHAR(20) NOT NULL,
    is_active BOOLEAN NOT NULL DEFAULT TRUE  -- a NULL would silently fall outside the partial index
);
-- Only one active subscription per user
CREATE UNIQUE INDEX uq_active_subscription 
ON subscriptions (user_id) 
WHERE is_active = TRUE;
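To see the partial unique index at work, here is a self-contained PostgreSQL version of the `subscriptions` example with a few sample rows (user 7 and the plan names are illustrative). Inactive rows are not in the index, so history is unlimited, while a second active row for the same user is rejected. Switching plans therefore means deactivating the current row first.

```sql
-- Self-contained version of the subscriptions example above
CREATE TABLE subscriptions (
    sub_id    SERIAL PRIMARY KEY,
    user_id   INT NOT NULL,
    plan_type VARCHAR(20) NOT NULL,
    is_active BOOLEAN NOT NULL DEFAULT TRUE
);
CREATE UNIQUE INDEX uq_active_subscription
ON subscriptions (user_id)
WHERE is_active = TRUE;

-- Rows outside the WHERE clause are not indexed, so any number of them is fine
INSERT INTO subscriptions (user_id, plan_type, is_active)
VALUES (7, 'trial', FALSE), (7, 'basic', FALSE);

INSERT INTO subscriptions (user_id, plan_type) VALUES (7, 'pro');  -- first active row: OK

-- Switching plans: deactivate the current row, then insert the new one,
-- inside one transaction so the change is atomic
BEGIN;
UPDATE subscriptions SET is_active = FALSE WHERE user_id = 7 AND is_active;
INSERT INTO subscriptions (user_id, plan_type) VALUES (7, 'enterprise');
COMMIT;
```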

SQL Server NULL Behavior Difference
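SQL Server's UNIQUE constraint treats NULLs as equal, so it admits a single NULL, whereas PostgreSQL by default admits any number. The sketch below assumes PostgreSQL 15 or later for the `NULLS NOT DISTINCT` option and uses hypothetical table names to show both behaviors side by side. Going the other way, the usual SQL Server approach to allowing many NULLs is a filtered unique index such as `CREATE UNIQUE INDEX ux_contacts_phone ON contacts (phone) WHERE phone IS NOT NULL;` (table and index names illustrative).

```sql
-- Requires PostgreSQL 15+ for NULLS NOT DISTINCT
CREATE TABLE phones_default (
    contact_id SERIAL PRIMARY KEY,
    phone      VARCHAR(20) UNIQUE                      -- default: NULLS DISTINCT
);
CREATE TABLE phones_single_null (
    contact_id SERIAL PRIMARY KEY,
    phone      VARCHAR(20) UNIQUE NULLS NOT DISTINCT   -- SQL Server-like: at most one NULL
);

-- PostgreSQL default: NULLs never conflict with each other
INSERT INTO phones_default (phone) VALUES (NULL), (NULL);

-- NULLS NOT DISTINCT: the first NULL is accepted ...
INSERT INTO phones_single_null (phone) VALUES (NULL);
-- ... and a second one would raise a unique violation, as in SQL Server:
-- INSERT INTO phones_single_null (phone) VALUES (NULL);
```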

Key Selection Best Practices

Choosing the right key for each table is one of the most impactful design decisions you'll make. Poor key choices create technical debt that's expensive to fix later.

Primary Key Selection Criteria

•Uniqueness is guaranteed — The key must uniquely identify every row, with no exceptions, now and forever.
•Never NULL — Primary keys cannot be NULL; ensure the value is always known at insert time.
•Immutable — Keys should never change. A key update must be propagated to every referencing foreign key, which becomes expensive and risky at scale.
•Compact — Smaller keys mean faster comparisons, smaller indexes, and smaller foreign key columns.
•Simple — Single-column keys are easier to work with than composite keys; prefer surrogate if natural key is complex.
•Available at insert — The key value must be known when the row is created.
•Meaningless (for surrogates) — Surrogate keys should reveal nothing about the entity.
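To make the "Immutable" criterion concrete, here is a small PostgreSQL sketch with hypothetical tables keyed by email. With `ON UPDATE CASCADE`, one change to the parent key rewrites every referencing child row. Without it, the default `NO ACTION` rule would simply reject the update while references exist.

```sql
-- Hypothetical tables with a mutable natural key
CREATE TABLE accounts_by_email (
    email VARCHAR(255) PRIMARY KEY,
    name  VARCHAR(100)
);
CREATE TABLE invoices_by_email (
    invoice_id    BIGSERIAL PRIMARY KEY,
    account_email VARCHAR(255) NOT NULL
        REFERENCES accounts_by_email (email) ON UPDATE CASCADE
);

INSERT INTO accounts_by_email VALUES ('lee@example.com', 'Lee');
INSERT INTO invoices_by_email (account_email)
SELECT 'lee@example.com' FROM generate_series(1, 1000);

-- One logical change to the parent key rewrites all 1,000 child rows
UPDATE accounts_by_email
SET email = 'lee@new.example'
WHERE email = 'lee@example.com';
```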

Common Key Selection Mistakes
Mistake	Problem	Better Approach
Email as PK	Users change email addresses	Surrogate PK + email as UNIQUE constraint
Composite natural PK	Complex FKs; any part might change	Surrogate PK + UNIQUE on natural composite
Business code as PK	Codes get redefined; mergers change them	Surrogate PK + code as UNIQUE
Per-node sequential INT in distributed systems	Independent sequences collide across servers	UUID or distributed sequence (Snowflake ID)
UUID for everything	Storage overhead; poor index locality	BIGINT for internal tables; UUID only when needed
No alternate keys	Business can't query by natural identifier	Add UNIQUE constraints on business identifiers
Nullable UNIQUE without NOT NULL	Illusion of uniqueness with NULL holes	Add NOT NULL for true alternate keys
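The storage side of the "UUID for everything" row can be quantified per value with PostgreSQL's `pg_column_size`. Every foreign key column and index entry that stores the key pays this cost again. The query assumes `gen_random_uuid()` is available, which it is by default from PostgreSQL 13.

```sql
-- Bytes needed to store one value of each candidate key type
SELECT pg_column_size(1::int)            AS int_bytes,     -- 4
       pg_column_size(1::bigint)         AS bigint_bytes,  -- 8
       pg_column_size(gen_random_uuid()) AS uuid_bytes;    -- 16
```

The "poor index locality" half of the row comes from randomness rather than size: random (version 4) UUIDs land at arbitrary positions in a B-tree, while increasing BIGINT values are appended at its right edge.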

best_practice_examples.sql
-- GOOD: Surrogate PK with natural UNIQUE constraints
CREATE TABLE employees (
    employee_id BIGSERIAL PRIMARY KEY,           -- Stable surrogate
    employee_number VARCHAR(20) NOT NULL UNIQUE, -- Business identifier
    email VARCHAR(255) NOT NULL UNIQUE,          -- Another business identifier
    ssn CHAR(11) UNIQUE,                         -- Optional but unique
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL
);
 
-- GOOD: Natural key for truly stable reference data
CREATE TABLE currencies (
    currency_code CHAR(3) PRIMARY KEY,    -- ISO 4217: 'USD', 'EUR', 'JPY'
    currency_name VARCHAR(100) NOT NULL,
    symbol VARCHAR(5),
    decimal_places INT NOT NULL DEFAULT 2
);
 
-- GOOD: Composite key for junction tables
CREATE TABLE product_categories (
    product_id BIGINT NOT NULL REFERENCES products(product_id),
    category_id BIGINT NOT NULL REFERENCES categories(category_id),
    is_primary BOOLEAN DEFAULT FALSE,
    
    PRIMARY KEY (product_id, category_id)
);
 
-- GOOD: Ensure only one primary category per product
CREATE UNIQUE INDEX uq_product_primary_category 
ON product_categories (product_id) 
WHERE is_primary = TRUE;
 
-- AVOID: Natural key that might change
CREATE TABLE customers_bad (
    email VARCHAR(255) PRIMARY KEY,  -- What if they change email?
    name VARCHAR(100)
    -- An email change must be cascaded to every referencing row, or the FKs block it
);

When in Doubt, Use Surrogate
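If a table like `customers_bad` already exists, it can be moved to the surrogate-plus-UNIQUE pattern in place. The sketch below assumes PostgreSQL, the default constraint name `customers_bad_pkey`, and no foreign keys pointing at `email`; any that exist would have to be dropped and re-pointed at the new column first. The sample rows and the constraint name `uq_customers_email` are illustrative.

```sql
-- Self-contained: recreate the table from the AVOID example, with sample rows
CREATE TABLE customers_bad (
    email VARCHAR(255) PRIMARY KEY,
    name  VARCHAR(100)
);
INSERT INTO customers_bad VALUES ('ann@example.com', 'Ann'), ('raj@example.com', 'Raj');

BEGIN;
-- 1. Add the surrogate; existing rows are numbered from the new sequence
ALTER TABLE customers_bad ADD COLUMN customer_id BIGSERIAL;
-- 2. Move the primary key to the surrogate
ALTER TABLE customers_bad DROP CONSTRAINT customers_bad_pkey;
ALTER TABLE customers_bad ADD PRIMARY KEY (customer_id);
-- 3. Keep email as an alternate key (UNIQUE + NOT NULL)
ALTER TABLE customers_bad ALTER COLUMN email SET NOT NULL;
ALTER TABLE customers_bad ADD CONSTRAINT uq_customers_email UNIQUE (email);
COMMIT;
```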

Summary: Key Constraints Mastery

Key constraints are the backbone of data identification in relational databases. Let's consolidate the essential knowledge:

Key Takeaways

•Key hierarchy — Superkey → Candidate key (minimal) → Primary key (chosen) + Alternate keys (unchosen candidates).
•Uniqueness property — No two rows can have the same values for all key attributes; NULL comparisons are UNKNOWN.
•PRIMARY KEY vs UNIQUE — A table has at most one PRIMARY KEY, which is implicitly NOT NULL and enforces entity integrity; it can have many UNIQUE constraints, which allow NULLs unless the columns are declared NOT NULL.
•Composite keys — Needed when no single attribute is unique; common in junction tables; all attributes must be non-NULL for PK.
•Natural vs surrogate — Natural keys carry meaning but may change; surrogate keys are stable but meaningless. Hybrid approach is often best.
•Implementation varies — NULL handling in UNIQUE differs across databases: SQL Server allows only one NULL per UNIQUE constraint, while PostgreSQL allows many by default.
•Best practices — Prioritize stability, compactness, availability; use surrogate PK with natural UNIQUE constraints.
