Database indexes typically serve a single purpose: accelerate data retrieval. But unique indexes are special—they simultaneously provide fast lookup capabilities AND enforce a critical data integrity constraint: no duplicate values allowed.
This dual nature makes unique indexes one of the most important tools in the database designer's arsenal. They're not just about speed; they're about correctness. When you define a unique index on an email column, you're making a promise to your application: every email address will be distinct, and the database itself will enforce this guarantee—no application code required.
By the end of this page, you will understand how unique indexes combine indexing with constraint enforcement, the relationship between unique indexes and primary keys, how different databases handle NULL values in unique indexes, the implementation mechanics that enable duplicate detection, and patterns for effective unique index design.
A unique index is an index that enforces a uniqueness constraint on the indexed column(s). It guarantees that no two rows in the table can have the same value(s) for the indexed field(s)—with specific handling for NULL values that varies by database system.
Formal definition:
A unique index is an index structure where:

- every indexed key value appears at most once, and
- any INSERT or UPDATE that would introduce a duplicate key is rejected with an error.

Dual purpose:
Unique indexes serve two distinct roles:

- Performance: fast O(log n) lookups, exactly like a non-unique index.
- Integrity: enforcement of the no-duplicates rule on every write, with no application code required.
```sql
-- Creating a unique index explicitly
CREATE UNIQUE INDEX idx_employee_email
ON Employee (Email);

-- Creating a unique constraint (implicitly creates unique index)
ALTER TABLE Employee
ADD CONSTRAINT uq_employee_ssn UNIQUE (SSN);

-- The engine enforces uniqueness on every operation:

-- This succeeds (first occurrence)
INSERT INTO Employee (ID, Email, SSN)
VALUES (1, 'alice@company.com', '123-45-6789');

-- This FAILS with constraint violation
INSERT INTO Employee (ID, Email, SSN)
VALUES (2, 'alice@company.com', '987-65-4321');
-- Error: Duplicate entry 'alice@company.com' for key 'idx_employee_email'

-- Updates are also checked:
UPDATE Employee SET Email = 'bob@company.com'
WHERE ID = 1;
-- Succeeds (if no other row has this email)

UPDATE Employee SET Email = 'existing@company.com'
WHERE ID = 1;
-- FAILS if another row already has this email
```

There's a subtle distinction: UNIQUE CONSTRAINT is a logical concept (a rule about data), while UNIQUE INDEX is a physical structure. In practice, creating a unique constraint always creates an underlying unique index in most databases. The constraint provides semantic meaning; the index provides enforcement and performance.
| Characteristic | Unique Index | Non-Unique Index |
|---|---|---|
| Duplicate values | Rejected (error thrown) | Allowed |
| Lookup performance | O(log n) | O(log n) |
| Data integrity | Enforces uniqueness | No constraint |
| Storage overhead | Same B+-tree structure and size | Same B+-tree structure and size |
| Insert check | Must verify no duplicate exists | No uniqueness check |
| Query optimization | Optimizer knows at most 1 row matches | Must assume multiple matches |
The relationship between unique indexes and primary keys is fundamental but often confused. Let's clarify their connection.
Primary Key = Unique + NOT NULL
A primary key constraint combines two properties: uniqueness (no two rows share the key value) and NOT NULL (every row must have a value).
A unique index, by itself, typically allows NULL values (with variations by database). The primary key is essentially a unique index with an additional NOT NULL constraint.
Implementation relationship:
When you create a PRIMARY KEY constraint, the database automatically creates a unique index to enforce it. In some databases (like InnoDB), this becomes the clustered index.
```sql
-- These are logically equivalent:

-- Option 1: Primary Key Constraint
CREATE TABLE Employee (
    ID INT PRIMARY KEY,  -- Unique + NOT NULL (implicit)
    Email VARCHAR(255)
);

-- Option 2: Unique Index + NOT NULL
CREATE TABLE Employee (
    ID INT NOT NULL,
    Email VARCHAR(255)
);
CREATE UNIQUE INDEX pk_employee_id ON Employee(ID);

-- Key differences:

-- 1. PRIMARY KEY implies NOT NULL
ALTER TABLE t ADD CONSTRAINT pk PRIMARY KEY (col);
-- Automatically ensures col is NOT NULL

-- 2. You can have only ONE primary key per table
ALTER TABLE t ADD PRIMARY KEY (col1);
ALTER TABLE t ADD PRIMARY KEY (col2);  -- ERROR: Only one PK allowed

-- 3. You can have MULTIPLE unique indexes
CREATE UNIQUE INDEX idx1 ON t(col1);
CREATE UNIQUE INDEX idx2 ON t(col2);
CREATE UNIQUE INDEX idx3 ON t(col3);  -- All valid!

-- 4. Unique index can include nullable columns
CREATE UNIQUE INDEX idx_nullable ON t(nullable_col);  -- Usually OK
-- (NULL handling varies by database, see next section)
```

| Property | Primary Key | Unique Index |
|---|---|---|
| NULL allowed | No (implicit NOT NULL) | Usually yes (varies) |
| Multiple per table | No (only one) | Yes (many allowed) |
| Creates index | Yes (unique index) | By definition |
| Semantic meaning | Identifies the row | Enforces uniqueness only |
| Foreign key reference | Common FK target | Can also be FK target |
| Clustered by default | Often (varies by DB) | Usually not |
In relational database theory, any column or combination of columns that could serve as a primary key is called a 'candidate key.' The chosen primary key is one candidate key; other candidate keys are often implemented as unique indexes. For example, an Employee table might use numeric ID as primary key but have unique indexes on SSN and Email—all are candidate keys.
How unique indexes handle NULL values is one of the most contentious and database-specific aspects of the SQL standard. The question: Do two NULL values count as duplicates?
The SQL standard says NULL is not equal to anything, including another NULL, so the common interpretation is that multiple NULLs do not violate uniqueness. However, databases implement this differently:

- PostgreSQL, MySQL, and SQLite treat NULLs as distinct: multiple NULLs are allowed in a unique index.
- SQL Server treats NULLs as equal for uniqueness purposes: only one NULL is allowed.
- Oracle does not index entirely-NULL keys, so multiple rows with a NULL key are allowed.
PostgreSQL NULL Behavior:
PostgreSQL follows the SQL standard interpretation: NULL values are considered distinct, so multiple NULLs are allowed in a unique index by default.
```sql
-- PostgreSQL: Multiple NULLs allowed by default
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    phone VARCHAR(20)
);

CREATE UNIQUE INDEX idx_unique_phone ON users(phone);

-- These ALL succeed:
INSERT INTO users (phone) VALUES ('555-1234');  -- OK
INSERT INTO users (phone) VALUES (NULL);        -- OK
INSERT INTO users (phone) VALUES (NULL);        -- OK! Second NULL allowed
INSERT INTO users (phone) VALUES (NULL);        -- OK! Third NULL allowed

-- A partial index makes this behavior explicit: index only non-NULL values
CREATE UNIQUE INDEX idx_unique_phone_not_null
ON users(phone)
WHERE phone IS NOT NULL;

-- Multiple NULLs still allowed, but non-NULL values must be unique.
-- To disallow multiple NULLs instead (PostgreSQL 15+):
-- CREATE UNIQUE INDEX idx_one_null ON users(phone) NULLS NOT DISTINCT;
```

NULL handling in unique indexes is a common source of bugs when migrating between databases. Always test NULL behavior explicitly in your target database, and document your assumptions in code comments. What works in PostgreSQL may fail in SQL Server!
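For contrast, SQL Server treats NULLs as equal for unique-index purposes, so only one NULL row is allowed; the standard workaround is a filtered unique index. A sketch (table and index names are illustrative):

```sql
-- SQL Server: only ONE NULL allowed in a plain unique index
CREATE TABLE users (
    id INT IDENTITY PRIMARY KEY,
    phone VARCHAR(20)
);

CREATE UNIQUE INDEX idx_unique_phone ON users(phone);

INSERT INTO users (phone) VALUES (NULL);  -- OK
INSERT INTO users (phone) VALUES (NULL);  -- ERROR: duplicate key (NULL counts as a value)

-- Workaround: filtered unique index (SQL Server 2008+)
DROP INDEX idx_unique_phone ON users;
CREATE UNIQUE INDEX idx_unique_phone
ON users(phone)
WHERE phone IS NOT NULL;

INSERT INTO users (phone) VALUES (NULL);  -- OK
INSERT INTO users (phone) VALUES (NULL);  -- OK: NULL rows are outside the index
```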
Understanding how databases enforce uniqueness at the implementation level reveals why unique indexes have specific performance characteristics.
The enforcement mechanism:
When you attempt to INSERT or UPDATE a row with a unique-indexed column, the engine:

1. Traverses the index to search for the new key value.
2. If the value is found, rejects the operation with a duplicate-key error.
3. If not found, inserts the new index entry while holding a lock that blocks concurrent inserts of the same key.
This check happens within a transaction, using appropriate locking to prevent race conditions.
```text
INSERT operation with a unique index

Transaction T1: INSERT INTO users (email) VALUES ('new@email.com')

Step 1: Search unique index for 'new@email.com'
        B+-tree traversal: O(log n) comparisons
        Found? → YES or NO
              │
     ┌────────┴────────┐
     ▼                 ▼
   FOUND            NOT FOUND
   (duplicate)      (unique OK)
     │                 │
     ▼                 ▼
  REJECT INSERT     Step 2: Acquire lock on index leaf page
  Return error:     Step 3: Re-check (no concurrent insert?)
  Duplicate entry   Step 4: Insert entry in index AND data
                    Step 5: Commit

Performance note:
- Best case: O(log n) for index lookup + O(1) for insert
- The uniqueness check adds ~1 index traversal per insert
- For batch inserts, this can be optimized with deferred checks
```

Concurrency and race conditions:
Without proper locking, two concurrent transactions could both check for the same value, both find it missing, and both attempt to insert—creating a duplicate. Databases prevent this through:

- locks or latches on the index page (or key range) being modified, held across the check and the insert;
- key-range/next-key locking (e.g., InnoDB) that blocks concurrent inserts of the same value;
- re-validation of uniqueness at insert time, inside the same critical section as the initial check.
This is why unique constraints are more reliable than application-level checks—the database guarantees atomicity of the check-and-insert operation.
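The unsafe application-level pattern the constraint replaces can be sketched as follows (table and email values are illustrative):

```sql
-- UNSAFE: application-level check-then-insert (race condition)
-- Transaction A                        -- Transaction B
SELECT COUNT(*) FROM users
WHERE email = 'x@example.com';  -- 0
                                        -- SELECT COUNT(*) FROM users
                                        -- WHERE email = 'x@example.com';  -- also 0
INSERT INTO users (email)
VALUES ('x@example.com');       -- OK
                                        -- INSERT INTO users (email)
                                        -- VALUES ('x@example.com');  -- also OK → duplicate!

-- SAFE: let the unique index do the check atomically.
-- Both transactions simply attempt the insert; the index guarantees
-- exactly one succeeds and the other receives a duplicate-key error.
```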
Modern databases provide 'upsert' operations that leverage unique indexes: INSERT ... ON DUPLICATE KEY UPDATE (MySQL), INSERT ... ON CONFLICT (PostgreSQL), or MERGE (SQL Server). These atomically insert if unique or update if duplicate, avoiding the race condition of separate SELECT + INSERT/UPDATE.
```sql
-- PostgreSQL: INSERT ON CONFLICT (requires unique index)
INSERT INTO users (email, name, login_count)
VALUES ('alice@example.com', 'Alice', 1)
ON CONFLICT (email) DO UPDATE
SET login_count = users.login_count + 1;

-- This atomically:
-- 1. Tries to insert
-- 2. If unique violation on email → updates instead
-- No race condition possible!
```

A composite unique index (or multi-column unique index) enforces uniqueness on a combination of columns. The uniqueness constraint applies to the tuple of values, not individual columns.
Key concept: Individual columns can have duplicates, but the combination must be unique.
```sql
-- Scenario: Each user can rate a product only once
CREATE TABLE product_ratings (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL,
    product_id INT NOT NULL,
    rating INT CHECK (rating BETWEEN 1 AND 5)
);

-- Composite unique index: (user_id, product_id) must be unique
CREATE UNIQUE INDEX idx_unique_user_product
ON product_ratings (user_id, product_id);

-- Valid inserts:
INSERT INTO product_ratings (user_id, product_id, rating)
VALUES (1, 100, 5);  -- OK: (1, 100) first time

INSERT INTO product_ratings (user_id, product_id, rating)
VALUES (1, 200, 4);  -- OK: (1, 200) - same user, different product

INSERT INTO product_ratings (user_id, product_id, rating)
VALUES (2, 100, 3);  -- OK: (2, 100) - different user, same product

-- Invalid insert (duplicate combination):
INSERT INTO product_ratings (user_id, product_id, rating)
VALUES (1, 100, 4);  -- ERROR: (1, 100) already exists
-- "Duplicate entry '1-100' for key 'idx_unique_user_product'"

-- This pattern is common for:
-- - User-resource relationships (favorites, follows, subscriptions)
-- - Scheduling (employee-shift, room-timeslot)
-- - Tagging (post-tag, product-category)
-- - Versioning (document-version)
```

| Domain | Columns | Meaning |
|---|---|---|
| E-commerce | (user_id, product_id) | One review per user per product |
| Social Media | (follower_id, following_id) | Follow relationship exists once |
| Scheduling | (room_id, date, time_slot) | No double-booking |
| Inventory | (warehouse_id, product_sku) | One stock record per location-product |
| CMS | (slug, tenant_id) | Unique URLs per tenant |
While column order doesn't affect uniqueness enforcement, it significantly impacts query performance. The index can efficiently support queries that filter on the leftmost prefix: (user_id), (user_id, product_id). A query filtering only on product_id cannot use this index efficiently.
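Using the product_ratings table from above, the prefix rule plays out like this; the extra index name (idx_ratings_product) is illustrative:

```sql
-- Served efficiently by idx_unique_user_product (user_id, product_id):
SELECT * FROM product_ratings WHERE user_id = 1;                     -- leftmost prefix
SELECT * FROM product_ratings WHERE user_id = 1 AND product_id = 7;  -- full key

-- NOT served efficiently (product_id is not a leftmost prefix):
SELECT * FROM product_ratings WHERE product_id = 7;

-- If that query matters, add a separate (non-unique) index:
CREATE INDEX idx_ratings_product ON product_ratings (product_id);
```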
Unique indexes provide valuable information to the query optimizer beyond just fast lookups. The guarantee of uniqueness enables optimization strategies that aren't possible with regular indexes.
Optimizer knowledge from unique indexes:
```sql
-- Example 1: Early termination
-- With unique index on employee.id:
SELECT * FROM employee WHERE id = 100;
-- Optimizer KNOWS there's at most 1 result
-- Can stop after finding first match (no need to check further)

-- Example 2: Join optimization
-- Unique index on products.id:
SELECT o.*, p.name
FROM orders o
JOIN products p ON o.product_id = p.id;
-- Optimizer knows: each order matches exactly 0 or 1 product
-- No duplicate checking needed in join result

-- Example 3: Subquery to join conversion
-- Unique index on users.email:
SELECT * FROM orders
WHERE customer_email IN (SELECT email FROM users WHERE status = 'active');
-- Can be optimized to a semi-join knowing email is unique
-- No DISTINCT required in subquery result

-- Example 4: GROUP BY elimination
-- Unique index on order_items.id:
SELECT id, SUM(amount)
FROM order_items
WHERE order_id = 100
GROUP BY id;
-- Grouping by a unique key is trivial: each group contains exactly one row

-- Without a unique constraint, the optimizer must assume:
-- - Multiple rows might match
-- - Duplicate elimination may be needed
-- - It cannot stop searching early
```

Use EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) to see how unique indexes affect query plans. Look for 'const' or 'eq_ref' access types in MySQL, 'Unique Index Scan' in PostgreSQL, or 'Clustered Index Seek' with 'Rows = 1' in SQL Server—these indicate the optimizer is leveraging uniqueness.
Sometimes you need uniqueness to apply only under certain conditions. Partial indexes (PostgreSQL terminology) or filtered indexes (SQL Server) enforce uniqueness on a subset of rows defined by a predicate.
Use cases:

- Soft deletes: uniqueness enforced only among non-deleted rows.
- Versioning: exactly one "current" row per logical entity.
- Status-scoped rules: e.g., at most one active or pending record per user.
```sql
-- PostgreSQL: Partial unique index
-- Scenario: Email must be unique among active users,
-- but deleted users can have duplicate emails

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    is_deleted BOOLEAN DEFAULT FALSE
);

CREATE UNIQUE INDEX idx_unique_active_email
ON users (email)
WHERE is_deleted = FALSE;

-- Works:
INSERT INTO users (email) VALUES ('alice@example.com');  -- OK
UPDATE users SET is_deleted = TRUE WHERE email = 'alice@example.com';  -- OK
INSERT INTO users (email) VALUES ('alice@example.com');  -- OK! (first is deleted)

-- Another example: Unique current version
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    doc_id INT NOT NULL,
    version INT NOT NULL,
    is_current BOOLEAN DEFAULT TRUE
);

CREATE UNIQUE INDEX idx_unique_current_doc
ON documents (doc_id)
WHERE is_current = TRUE;

-- Ensures only ONE current version per document
-- Multiple historical versions allowed
```

| Database | Syntax | Notes |
|---|---|---|
| PostgreSQL | CREATE UNIQUE INDEX ... WHERE condition | Full support since v7.2 |
| SQL Server | CREATE UNIQUE INDEX ... WHERE condition | Since SQL Server 2008 |
| MySQL/InnoDB | Not supported directly | Use generated columns or triggers |
| Oracle | Function-based or virtual columns | No direct WHERE clause support |
| SQLite | CREATE UNIQUE INDEX ... WHERE condition | Since SQLite 3.8.0 |
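Since MySQL lacks partial indexes, the generated-column workaround mentioned in the table above is worth sketching; it reuses the active-email scenario from the PostgreSQL example and relies on MySQL's unique indexes permitting multiple NULLs (the email_active column name is illustrative):

```sql
-- MySQL: emulate a partial unique index with a generated column
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    is_deleted BOOLEAN DEFAULT FALSE,
    -- NULL when deleted, so deleted rows escape the uniqueness check
    email_active VARCHAR(255)
        AS (IF(is_deleted, NULL, email)) STORED,
    UNIQUE KEY idx_unique_active_email (email_active)
);

-- Active duplicates are rejected; deleted rows may share an email,
-- because their email_active is NULL and MySQL allows multiple NULLs.
```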
For a partial unique index to be used in a query, the query's WHERE clause must be compatible with the index's predicate. If querying WHERE is_deleted = TRUE, the index on is_deleted = FALSE won't help—it doesn't contain those rows!
We've explored unique indexes comprehensively—the dual-purpose structures that provide both performance and integrity guarantees at the database level.
What's next:
With unique indexes covered, we'll explore composite indexes (also called multi-column or concatenated indexes) in greater depth—examining column order strategy, prefix matching, and how to design indexes that serve multiple query patterns efficiently.
You now understand unique indexes from definition through implementation: their relationship with primary keys, the critical NULL handling differences between databases, how uniqueness is enforced at the transaction level, and advanced techniques like partial unique indexes. This knowledge enables you to design robust data integrity constraints.