Database indexes typically serve a single purpose: accelerate data retrieval. But unique indexes are special—they simultaneously provide fast lookup capabilities AND enforce a critical data integrity constraint: no duplicate values allowed.
This dual nature makes unique indexes one of the most important tools in the database designer's arsenal. They're not just about speed; they're about correctness. When you define a unique index on an email column, you're making a promise to your application: every email address will be distinct, and the database itself will enforce this guarantee—no application code required.
By the end of this page, you will understand how unique indexes combine indexing with constraint enforcement, the relationship between unique indexes and primary keys, how different databases handle NULL values in unique indexes, the implementation mechanics that enable duplicate detection, and patterns for effective unique index design.
A unique index is an index that enforces a uniqueness constraint on the indexed column(s). It guarantees that no two rows in the table can have the same value(s) for the indexed field(s)—with specific handling for NULL values that varies by database system.
Formal definition:
A unique index is an index structure where:

- every indexed key value appears at most once, and
- any INSERT or UPDATE that would introduce a duplicate key is rejected with an error.

Dual purpose:
Unique indexes serve two distinct roles:

- Performance: fast O(log n) lookups, exactly like a non-unique index.
- Integrity: enforcement of the no-duplicates rule on every write, with no application code required.
```sql
-- Creating a unique index explicitly
CREATE UNIQUE INDEX idx_employee_email
ON Employee (Email);

-- Creating a unique constraint (implicitly creates unique index)
ALTER TABLE Employee
ADD CONSTRAINT uq_employee_ssn UNIQUE (SSN);

-- The engine enforces uniqueness on every operation:

-- This succeeds (first occurrence)
INSERT INTO Employee (ID, Email, SSN)
VALUES (1, 'alice@company.com', '123-45-6789');

-- This FAILS with constraint violation
INSERT INTO Employee (ID, Email, SSN)
VALUES (2, 'alice@company.com', '987-65-4321');
-- Error: Duplicate entry 'alice@company.com' for key 'idx_employee_email'

-- Updates are also checked:
UPDATE Employee SET Email = 'bob@company.com'
WHERE ID = 1;
-- Succeeds (if no other row has this email)

UPDATE Employee SET Email = 'existing@company.com'
WHERE ID = 1;
-- FAILS if another row already has this email
```

There's a subtle distinction: UNIQUE CONSTRAINT is a logical concept (a rule about data), while UNIQUE INDEX is a physical structure. In practice, creating a unique constraint always creates an underlying unique index in most databases. The constraint provides semantic meaning; the index provides enforcement and performance.
| Characteristic | Unique Index | Non-Unique Index |
|---|---|---|
| Duplicate values | Rejected (error thrown) | Allowed |
| Lookup performance | O(log n) | O(log n) |
| Data integrity | Enforces uniqueness | No constraint |
| Storage overhead | Same B+-tree structure and size | Same B+-tree structure and size |
| Insert check | Must verify no duplicate exists | No uniqueness check |
| Query optimization | Optimizer knows at most 1 row matches | Must assume multiple matches |
The relationship between unique indexes and primary keys is fundamental but often confused. Let's clarify their connection.
Primary Key = Unique + NOT NULL
A primary key constraint combines two properties: uniqueness (no two rows share the key value) and NOT NULL (every row must have a value).
A unique index, by itself, typically allows NULL values (with variations by database). The primary key is essentially a unique index with an additional NOT NULL constraint.
Implementation relationship:
When you create a PRIMARY KEY constraint, the database automatically creates a unique index to enforce it. In some databases (like InnoDB), this becomes the clustered index.
```sql
-- These are logically equivalent:

-- Option 1: Primary Key Constraint
CREATE TABLE Employee (
    ID INT PRIMARY KEY,  -- Unique + NOT NULL (implicit)
    Email VARCHAR(255)
);

-- Option 2: Unique Index + NOT NULL
CREATE TABLE Employee (
    ID INT NOT NULL,
    Email VARCHAR(255)
);
CREATE UNIQUE INDEX pk_employee_id ON Employee(ID);

-- Key differences:

-- 1. PRIMARY KEY implies NOT NULL
ALTER TABLE t ADD CONSTRAINT pk PRIMARY KEY (col);
-- Automatically ensures col is NOT NULL

-- 2. You can have only ONE primary key per table
ALTER TABLE t ADD PRIMARY KEY (col1);
ALTER TABLE t ADD PRIMARY KEY (col2);  -- ERROR: Only one PK allowed

-- 3. You can have MULTIPLE unique indexes
CREATE UNIQUE INDEX idx1 ON t(col1);
CREATE UNIQUE INDEX idx2 ON t(col2);
CREATE UNIQUE INDEX idx3 ON t(col3);  -- All valid!

-- 4. Unique index can include nullable columns
CREATE UNIQUE INDEX idx_nullable ON t(nullable_col);  -- Usually OK
-- (NULL handling varies by database, see next section)
```

| Property | Primary Key | Unique Index |
|---|---|---|
| NULL allowed | No (implicit NOT NULL) | Usually yes (varies) |
| Multiple per table | No (only one) | Yes (many allowed) |
| Creates index | Yes (unique index) | By definition |
| Semantic meaning | Identifies the row | Enforces uniqueness only |
| Foreign key reference | Common FK target | Can also be FK target |
| Clustered by default | Often (varies by DB) | Usually not |
In relational database theory, any column or combination of columns that could serve as a primary key is called a 'candidate key.' The chosen primary key is one candidate key; other candidate keys are often implemented as unique indexes. For example, an Employee table might use numeric ID as primary key but have unique indexes on SSN and Email—all are candidate keys.
How unique indexes handle NULL values is one of the most contentious and database-specific aspects of the SQL standard. The question: Do two NULL values count as duplicates?
The SQL standard says NULL is not equal to anything, including another NULL, so the common interpretation is that multiple NULLs do not violate uniqueness. However, databases implement this differently:

- PostgreSQL, MySQL, and SQLite treat NULLs as distinct: multiple NULLs are allowed in a unique index.
- SQL Server treats NULLs as equal for uniqueness purposes: only one NULL is allowed.
- Oracle does not index entirely-NULL keys, so multiple rows with a NULL key are allowed.
PostgreSQL NULL Behavior:
PostgreSQL follows the SQL standard interpretation: NULL values are considered distinct, so multiple NULLs are allowed in a unique index by default.
```sql
-- PostgreSQL: Multiple NULLs allowed by default
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    phone VARCHAR(20)
);

CREATE UNIQUE INDEX idx_unique_phone ON users(phone);

-- These ALL succeed:
INSERT INTO users (phone) VALUES ('555-1234');  -- OK
INSERT INTO users (phone) VALUES (NULL);        -- OK
INSERT INTO users (phone) VALUES (NULL);        -- OK! Second NULL allowed
INSERT INTO users (phone) VALUES (NULL);        -- OK! Third NULL allowed

-- A partial index makes this behavior explicit: index only non-NULL values
CREATE UNIQUE INDEX idx_unique_phone_not_null
ON users(phone)
WHERE phone IS NOT NULL;

-- Multiple NULLs still allowed, but non-NULL values must be unique.
-- To disallow multiple NULLs instead (PostgreSQL 15+):
-- CREATE UNIQUE INDEX idx_one_null ON users(phone) NULLS NOT DISTINCT;
```

NULL handling in unique indexes is a common source of bugs when migrating between databases. Always test NULL behavior explicitly in your target database, and document your assumptions in code comments. What works in PostgreSQL may fail in SQL Server!
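For contrast, SQL Server treats NULLs as equal for unique-index purposes, so only one NULL row is allowed; the standard workaround is a filtered unique index. A sketch (table and index names are illustrative):

```sql
-- SQL Server: only ONE NULL allowed in a plain unique index
CREATE TABLE users (
    id INT IDENTITY PRIMARY KEY,
    phone VARCHAR(20)
);

CREATE UNIQUE INDEX idx_unique_phone ON users(phone);

INSERT INTO users (phone) VALUES (NULL);  -- OK
INSERT INTO users (phone) VALUES (NULL);  -- ERROR: duplicate key (NULL counts as a value)

-- Workaround: filtered unique index (SQL Server 2008+)
DROP INDEX idx_unique_phone ON users;
CREATE UNIQUE INDEX idx_unique_phone
ON users(phone)
WHERE phone IS NOT NULL;

INSERT INTO users (phone) VALUES (NULL);  -- OK
INSERT INTO users (phone) VALUES (NULL);  -- OK: NULL rows are outside the index
```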
Understanding how databases enforce uniqueness at the implementation level reveals why unique indexes have specific performance characteristics.
The enforcement mechanism:
When you attempt to INSERT or UPDATE a row with a unique-indexed column, the engine:

1. Traverses the index to search for the new key value.
2. If the value is found, rejects the operation with a duplicate-key error.
3. If not found, inserts the new index entry while holding a lock that blocks concurrent inserts of the same key.
This check happens within a transaction, using appropriate locking to prevent race conditions.
```text
INSERT operation with a unique index

Transaction T1: INSERT INTO users (email) VALUES ('new@email.com')

Step 1: Search unique index for 'new@email.com'
        B+-tree traversal: O(log n) comparisons
        Found? → YES or NO
              │
     ┌────────┴────────┐
     ▼                 ▼
   FOUND            NOT FOUND
   (duplicate)      (unique OK)
     │                 │
     ▼                 ▼
  REJECT INSERT     Step 2: Acquire lock on index leaf page
  Return error:     Step 3: Re-check (no concurrent insert?)
  Duplicate entry   Step 4: Insert entry in index AND data
                    Step 5: Commit

Performance note:
- Best case: O(log n) for index lookup + O(1) for insert
- The uniqueness check adds ~1 index traversal per insert
- For batch inserts, this can be optimized with deferred checks
```

Concurrency and race conditions:
Without proper locking, two concurrent transactions could both check for the same value, both find it missing, and both attempt to insert—creating a duplicate. Databases prevent this through:

- locks or latches on the index page (or key range) being modified, held across the check and the insert;
- key-range/next-key locking (e.g., InnoDB) that blocks concurrent inserts of the same value;
- re-validation of uniqueness at insert time, inside the same critical section as the initial check.
This is why unique constraints are more reliable than application-level checks—the database guarantees atomicity of the check-and-insert operation.
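The unsafe application-level pattern the constraint replaces can be sketched as follows (table and email values are illustrative):

```sql
-- UNSAFE: application-level check-then-insert (race condition)
-- Transaction A                        -- Transaction B
SELECT COUNT(*) FROM users
WHERE email = 'x@example.com';  -- 0
                                        -- SELECT COUNT(*) FROM users
                                        -- WHERE email = 'x@example.com';  -- also 0
INSERT INTO users (email)
VALUES ('x@example.com');       -- OK
                                        -- INSERT INTO users (email)
                                        -- VALUES ('x@example.com');  -- also OK → duplicate!

-- SAFE: let the unique index do the check atomically.
-- Both transactions simply attempt the insert; the index guarantees
-- exactly one succeeds and the other receives a duplicate-key error.
```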
Modern databases provide 'upsert' operations that leverage unique indexes: INSERT ... ON DUPLICATE KEY UPDATE (MySQL), INSERT ... ON CONFLICT (PostgreSQL), or MERGE (SQL Server). These atomically insert if unique or update if duplicate, avoiding the race condition of separate SELECT + INSERT/UPDATE.
```sql
-- PostgreSQL: INSERT ON CONFLICT (requires unique index)
INSERT INTO users (email, name, login_count)
VALUES ('alice@example.com', 'Alice', 1)
ON CONFLICT (email) DO UPDATE
SET login_count = users.login_count + 1;

-- This atomically:
-- 1. Tries to insert
-- 2. If unique violation on email → updates instead
-- No race condition possible!
```

A composite unique index (or multi-column unique index) enforces uniqueness on a combination of columns. The uniqueness constraint applies to the tuple of values, not individual columns.
Key concept: Individual columns can have duplicates, but the combination must be unique.
```sql
-- Scenario: Each user can rate a product only once
CREATE TABLE product_ratings (
    id SERIAL PRIMARY KEY,
    user_id INT NOT NULL,
    product_id INT NOT NULL,
    rating INT CHECK (rating BETWEEN 1 AND 5)
);

-- Composite unique index: (user_id, product_id) must be unique
CREATE UNIQUE INDEX idx_unique_user_product
ON product_ratings (user_id, product_id);

-- Valid inserts:
INSERT INTO product_ratings (user_id, product_id, rating)
VALUES (1, 100, 5);  -- OK: (1, 100) first time

INSERT INTO product_ratings (user_id, product_id, rating)
VALUES (1, 200, 4);  -- OK: (1, 200) - same user, different product

INSERT INTO product_ratings (user_id, product_id, rating)
VALUES (2, 100, 3);  -- OK: (2, 100) - different user, same product

-- Invalid insert (duplicate combination):
INSERT INTO product_ratings (user_id, product_id, rating)
VALUES (1, 100, 4);  -- ERROR: (1, 100) already exists
-- "Duplicate entry '1-100' for key 'idx_unique_user_product'"

-- This pattern is common for:
-- - User-resource relationships (favorites, follows, subscriptions)
-- - Scheduling (employee-shift, room-timeslot)
-- - Tagging (post-tag, product-category)
-- - Versioning (document-version)
```

| Domain | Columns | Meaning |
|---|---|---|
| E-commerce | (user_id, product_id) | One review per user per product |
| Social Media | (follower_id, following_id) | Follow relationship exists once |
| Scheduling | (room_id, date, time_slot) | No double-booking |
| Inventory | (warehouse_id, product_sku) | One stock record per location-product |
| CMS | (slug, tenant_id) | Unique URLs per tenant |
While column order doesn't affect uniqueness enforcement, it significantly impacts query performance. The index can efficiently support queries that filter on the leftmost prefix: (user_id), (user_id, product_id). A query filtering only on product_id cannot use this index efficiently.
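Using the product_ratings table from above, the prefix rule plays out like this; the extra index name (idx_ratings_product) is illustrative:

```sql
-- Served efficiently by idx_unique_user_product (user_id, product_id):
SELECT * FROM product_ratings WHERE user_id = 1;                     -- leftmost prefix
SELECT * FROM product_ratings WHERE user_id = 1 AND product_id = 7;  -- full key

-- NOT served efficiently (product_id is not a leftmost prefix):
SELECT * FROM product_ratings WHERE product_id = 7;

-- If that query matters, add a separate (non-unique) index:
CREATE INDEX idx_ratings_product ON product_ratings (product_id);
```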
Unique indexes provide valuable information to the query optimizer beyond just fast lookups. The guarantee of uniqueness enables optimization strategies that aren't possible with regular indexes.
Optimizer knowledge from unique indexes:
```sql
-- Example 1: Early termination
-- With unique index on employee.id:
SELECT * FROM employee WHERE id = 100;
-- Optimizer KNOWS there's at most 1 result
-- Can stop after finding first match (no need to check further)

-- Example 2: Join optimization
-- Unique index on products.id:
SELECT o.*, p.name
FROM orders o
JOIN products p ON o.product_id = p.id;
-- Optimizer knows: each order matches exactly 0 or 1 product
-- No duplicate checking needed in join result

-- Example 3: Subquery to join conversion
-- Unique index on users.email:
SELECT * FROM orders
WHERE customer_email IN (SELECT email FROM users WHERE status = 'active');
-- Can be optimized to a semi-join knowing email is unique
-- No DISTINCT required in subquery result

-- Example 4: GROUP BY elimination
-- Unique index on order_items.id:
SELECT id, SUM(amount)
FROM order_items
WHERE order_id = 100
GROUP BY id;
-- Grouping by a unique key is trivial: each group contains exactly one row

-- Without a unique constraint, the optimizer must assume:
-- - Multiple rows might match
-- - Duplicate elimination may be needed
-- - It cannot stop searching early
```

Use EXPLAIN (or EXPLAIN ANALYZE in PostgreSQL) to see how unique indexes affect query plans. Look for 'const' or 'eq_ref' access types in MySQL, 'Unique Index Scan' in PostgreSQL, or 'Clustered Index Seek' with 'Rows = 1' in SQL Server—these indicate the optimizer is leveraging uniqueness.
Sometimes you need uniqueness to apply only under certain conditions. Partial indexes (PostgreSQL terminology) or filtered indexes (SQL Server) enforce uniqueness on a subset of rows defined by a predicate.
Use cases:

- Soft deletes: uniqueness enforced only among non-deleted rows.
- Versioning: exactly one "current" row per logical entity.
- Status-scoped rules: e.g., at most one active or pending record per user.
```sql
-- PostgreSQL: Partial unique index
-- Scenario: Email must be unique among active users,
-- but deleted users can have duplicate emails

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    is_deleted BOOLEAN DEFAULT FALSE
);

CREATE UNIQUE INDEX idx_unique_active_email
ON users (email)
WHERE is_deleted = FALSE;

-- Works:
INSERT INTO users (email) VALUES ('alice@example.com');  -- OK
UPDATE users SET is_deleted = TRUE WHERE email = 'alice@example.com';  -- OK
INSERT INTO users (email) VALUES ('alice@example.com');  -- OK! (first is deleted)

-- Another example: Unique current version
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    doc_id INT NOT NULL,
    version INT NOT NULL,
    is_current BOOLEAN DEFAULT TRUE
);

CREATE UNIQUE INDEX idx_unique_current_doc
ON documents (doc_id)
WHERE is_current = TRUE;

-- Ensures only ONE current version per document
-- Multiple historical versions allowed
```

| Database | Syntax | Notes |
|---|---|---|
| PostgreSQL | CREATE UNIQUE INDEX ... WHERE condition | Full support since v7.2 |
| SQL Server | CREATE UNIQUE INDEX ... WHERE condition | Since SQL Server 2008 |
| MySQL/InnoDB | Not supported directly | Use generated columns or triggers |
| Oracle | Function-based or virtual columns | No direct WHERE clause support |
| SQLite | CREATE UNIQUE INDEX ... WHERE condition | Since SQLite 3.8.0 |
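Since MySQL lacks partial indexes, the generated-column workaround mentioned in the table above is worth sketching; it reuses the active-email scenario from the PostgreSQL example and relies on MySQL's unique indexes permitting multiple NULLs (the email_active column name is illustrative):

```sql
-- MySQL: emulate a partial unique index with a generated column
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    is_deleted BOOLEAN DEFAULT FALSE,
    -- NULL when deleted, so deleted rows escape the uniqueness check
    email_active VARCHAR(255)
        AS (IF(is_deleted, NULL, email)) STORED,
    UNIQUE KEY idx_unique_active_email (email_active)
);

-- Active duplicates are rejected; deleted rows may share an email,
-- because their email_active is NULL and MySQL allows multiple NULLs.
```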
For a partial unique index to be used in a query, the query's WHERE clause must be compatible with the index's predicate. If querying WHERE is_deleted = TRUE, the index on is_deleted = FALSE won't help—it doesn't contain those rows!
We've explored unique indexes comprehensively—the dual-purpose structures that provide both performance and integrity guarantees at the database level.
What's next:
With unique indexes covered, we'll explore composite indexes (also called multi-column or concatenated indexes) in greater depth—examining column order strategy, prefix matching, and how to design indexes that serve multiple query patterns efficiently.
You now understand unique indexes from definition through implementation: their relationship with primary keys, the critical NULL handling differences between databases, how uniqueness is enforced at the transaction level, and advanced techniques like partial unique indexes. This knowledge enables you to design robust data integrity constraints.