Keys - Learning Module | OneNoughtOne

Loading content...

0/252

Primary Key

The Designated Identifier

In the previous pages, we established that a relation may have multiple candidate keys—each capable of uniquely identifying tuples. But in practice, databases don't treat all candidate keys equally. One must be designated as the primary means of identification.

Consider an Employee table with three candidate keys: {EmployeeID}, {SSN}, and {Email}. Any of these could technically serve as the identifier, but the database system needs to know which one is primary—the one used for:

Establishing relationships with other tables
Clustering data for physical storage
Default indexing strategies
Foreign key references from other relations

This designated candidate key is called the primary key—arguably the most important concept in relational database design.

What You Will Learn

By the end of this page, you will understand the formal definition and properties of primary keys, master the criteria for selecting the best primary key from available candidates, appreciate the distinction between natural and surrogate primary keys, and recognize primary key implementation patterns across different database systems.

Formal Definition of Primary Key

Definition: Primary Key

A primary key is a candidate key that has been explicitly designated as the principal identifier for tuples in a relation. Each relation has exactly one primary key, though it may have multiple candidate keys.

Essential properties of a primary key:

It is a candidate key: This means it's a superkey (uniquely identifies tuples) and it's minimal (no proper subset is a superkey).
It is singular: A relation has exactly one primary key. Other candidate keys become alternate keys.
It enforces entity integrity: The primary key cannot contain NULL values in any component attribute.
It is the target of foreign keys: Other relations reference this relation through its primary key.

Mathematical formalization:

Let R be a relation schema with candidate keys CK = {K₁, K₂, ..., Kₙ}. The primary key is a designated element PK ∈ CK such that:

PK → R (superkey property)
∀A ∈ PK: (PK - {A})⁺ ≠ R (minimality)
∀t ∈ r(R), ∀A ∈ PK: t[A] ≠ NULL (entity integrity)

The selection of PK from CK is a design decision, not a mathematical determination. The relational model tells us the candidates; the designer chooses the primary.

Primary Key vs Candidate Key Properties
Property	Candidate Key	Primary Key
Uniqueness	Yes	Yes
Minimality	Yes	Yes
Multiple per relation?	Yes (can be many)	No (exactly one)
NULL values allowed?	Depends on DBMS	Never (entity integrity)
Foreign key target?	Can be, but unusual	Primary target for FKs
Physical implications?	Usually none	Often affects clustering, indexing

The Entity Integrity Constraint

One of the fundamental constraints of the relational model is entity integrity, and it applies specifically to primary keys.

Entity Integrity Rule

No attribute of a primary key can be NULL. If the primary key is composite (consists of multiple attributes), NONE of those attributes may contain NULL values for any tuple.

Why entity integrity matters:

The primary key serves as the identity of each tuple. If we allowed NULL values:

Identity becomes ambiguous: A tuple with a NULL primary key cannot be reliably identified or referenced.
Comparisons fail: In SQL, NULL = NULL evaluates to UNKNOWN (not TRUE). This means we couldn't reliably determine if two tuples were 'the same'.
Foreign key references break: Other tables referencing this tuple would reference... nothing? The semantics become undefined.
Uniqueness enforcement fails: We can't guarantee uniqueness when NULL is involved, since NULL ≠ NULL.

Example: The NULL Problem

Suppose we allowed NULL in primary keys:

EmployeeID	Name	Department
E001	John	Engineering
NULL	Jane	Marketing
NULL	Bob	Sales

Problems:

Are Jane and Bob the same employee (both have NULL ID)? No way to know.
How do we update 'the employee with NULL ID'? Ambiguous.
How does Payroll.EmployeeID reference Jane? It can't reliably.
SELECT * WHERE EmployeeID = NULL returns nothing (NULL ≠ NULL in SQL).

Entity integrity prevents this chaos by requiring primary keys to always have meaningful, non-NULL values.

Composite Key Caveat

For composite primary keys, entity integrity applies to EVERY attribute in the key. If the primary key is {StudentID, CourseID, Semester}, then ALL THREE must be non-NULL for every tuple. A NULL in any component violates entity integrity.

Primary Key Selection Criteria

When a relation has multiple candidate keys, choosing the primary key is a crucial design decision. Several criteria guide this choice:

Primary Key Selection Criteria

•Stability (Immutability): Choose keys whose values rarely or never change. SSN can change (legal name changes, corrections). Employee IDs rarely change. Prefer the stable option.
•Simplicity: Prefer smaller keys. Single-attribute keys over composite keys. Shorter data types over longer ones. Numeric over strings (faster comparison, smaller indexes).
•Familiarity: Choose keys meaningful to users and applications. 'CustomerID' is more intuitive than an internal sequence. Balance with other criteria.
•Privacy/Security: Avoid keys that expose sensitive information. SSN as a primary key means it appears everywhere—foreign keys, logs, URLs, error messages.
•Existence Guarantee: Choose keys that always have values. An employee might not have an SSN (international workers), but always has an EmployeeID.
•Uniqueness Scope: Prefer globally unique keys over locally unique. UUIDs work across merged databases. Sequences might collide.
•Performance: Consider indexing and join performance. Integer keys are fastest. UUIDs have random distribution (B-tree fragmentation).

Example: Applying Selection Criteria

For an Employee table with candidate keys {EmployeeID}, {SSN}, {Email}:

Criterion	EmployeeID	SSN	Email
Stability	⭐⭐⭐ Rarely changes	⭐⭐ Can change	⭐ Often changes
Simplicity	⭐⭐⭐ Short integer	⭐⭐ 9 digits	⭐ Variable length
Privacy	⭐⭐⭐ Not sensitive	⭐ Highly sensitive	⭐⭐ Semi-sensitive
Existence	⭐⭐⭐ Always assigned	⭐⭐ Not universal	⭐⭐ Usually exists
Performance	⭐⭐⭐ Integer comparison	⭐⭐ String	⭐ Variable string

Winner: EmployeeID — Best overall score across criteria.

SSN becomes an alternate key (with unique constraint). Email becomes an alternate key (with unique constraint but may change).

No Perfect Rule

Primary key selection involves tradeoffs. In some domains, a natural key (like ISBN for books) is ideal. In others, a synthetic/surrogate key is better. Context matters—understand your domain's specific requirements.

Natural Keys vs Surrogate Keys

One of the most debated topics in database design is whether to use natural keys (derived from real-world attributes) or surrogate keys (artificially generated identifiers).

Natural Keys

A natural key is a candidate key composed of attributes that exist in the real world and have business meaning. Examples: ISBN for books, SSN for US residents, VIN for vehicles, country code + passport number for travelers.

Natural Key Advantages

•Self-describing: Values have inherent meaning
•No extra storage: Using existing data
•Query simplicity: No joins to understand data
•Domain validation: Invalid values are often obvious
•Data interchange: External systems recognize them

Surrogate Keys

A surrogate key is an artificial attribute created solely to serve as the primary key. It has no business meaning outside the database. Examples: auto-increment integers (1, 2, 3...), UUIDs, system-generated codes.

Surrogate Key Advantages

•Immutable: Never needs to change
•Compact: Optimized for storage and indexing
•Privacy-safe: Reveals nothing about data
•Always exists: Assigned at creation
•System-controlled: No business rule changes affect it

When to use natural keys:

The natural key is truly stable: ISBN doesn't change for a book edition. Country codes are internationally standardized.
External integration is critical: APIs, data exports, and external systems use the natural identifier.
Lookup patterns match: Users search by the natural key (e.g., find book by ISBN).
Uniqueness is guaranteed externally: An authority ensures no duplicates (ISBN agency, passport authority).

When to use surrogate keys:

No good natural key exists: People change names. Addresses change. Email changes.
Natural keys are composite: A 5-attribute natural key makes joins expensive.
Privacy matters: Don't expose SSN in every foreign key column.
Mergers/acquisitions are possible: Different divisions may have overlapping natural key spaces.
Performance is critical: Integer auto-increment outperforms string comparisons in joins.

The Hybrid Approach

Many systems use BOTH: a surrogate primary key for relationships and indexing, PLUS natural keys as alternate keys (unique constraints) for validation and external integration. This combines the advantages of both approaches.

Auto-Generated Primary Keys

Modern databases provide various mechanisms for automatically generating primary key values. Understanding these options is essential for practical database design.

Auto-Generation Strategies Comparison
Strategy	Format Example	Pros	Cons
Auto-Increment (IDENTITY/SERIAL)	1, 2, 3, 4, 5...	Simple, compact, ordered, fast	Sequential (security risk), single-DB only
UUID/GUID	550e8400-e29b-41d4-a716-446655440000	Globally unique, distributed-safe	Large (16 bytes), random (index fragmentation)
ULID	01ARZ3NDEKTSV4RRFFQ69G5FAV	Sortable, distributed-safe, compact	Newer standard, less support
Snowflake ID	1382971839198277632	Sortable, distributed, compact (64-bit)	Requires ID service, clock dependency
Sequence	Explicit NEXT VALUE	Controlled, predictable	Requires extra object, potential bottleneck

SQL Implementation Examples:

primary_key_generation.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- Auto-Increment (MySQL)
CREATE TABLE Customer (
    CustomerID INT AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Email VARCHAR(255) UNIQUE
);
 
-- IDENTITY (SQL Server)
CREATE TABLE Customer (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL,
    Email NVARCHAR(255) UNIQUE
);
 
-- SERIAL (PostgreSQL)
CREATE TABLE Customer (
    CustomerID SERIAL PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Email VARCHAR(255) UNIQUE
);
 
-- UUID (PostgreSQL with extension)
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
 
CREATE TABLE Customer (
    CustomerID UUID DEFAULT uuid_generate_v4() PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Email VARCHAR(255) UNIQUE
);
 
-- Sequence (Oracle/PostgreSQL)
CREATE SEQUENCE customer_seq START WITH 1 INCREMENT BY 1;
 
CREATE TABLE Customer (
    CustomerID INT DEFAULT NEXT VALUE FOR customer_seq PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Email VARCHAR(255) UNIQUE
);

Choosing the Right Strategy

For single-database applications, auto-increment integers are usually best (simple, fast, compact). For distributed systems or when data merging is expected, UUIDs or Snowflake IDs prevent collisions. Consider your specific scalability and integration requirements.

Composite Primary Keys

When the natural candidate key consists of multiple attributes, we face a design decision: use the composite key as the primary key, or introduce a surrogate. Understanding the implications of composite primary keys is crucial.

Composite Primary Key Definition

A composite primary key (or compound primary key) consists of two or more columns that together uniquely identify each row. No single column suffices—the combination is required.

Example: Order Line Items

CREATE TABLE OrderLine (
    OrderID INT NOT NULL,
    LineNumber INT NOT NULL,
    ProductID INT NOT NULL,
    Quantity INT NOT NULL,
    UnitPrice DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (OrderID, LineNumber),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID),
    FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);

Here, {OrderID, LineNumber} is the composite primary key:

OrderID alone doesn't work (many lines per order)
LineNumber alone doesn't work (same number in different orders)
Together, they uniquely identify each line item

Implications of composite primary keys:

Advantages

•Natural semantics: Key reflects business reality
•No extra column: Uses existing meaningful data
•Automatic constraint: Uniqueness is inherent
•Clear relationships: Foreign key structure is obvious
•Data integrity: Business rules encoded in schema

Challenges

•Verbose foreign keys: Referencing tables need multiple columns
•Join complexity: More columns to match in joins
•Update cascades: More columns to cascade on updates
•Index size: Larger composite indexes
•ORM complications: Some ORMs handle composites poorly

When to use composite vs surrogate:

Use composite primary key when:

The table represents a many-to-many relationship (junction table)
The composite key has exactly 2 attributes from foreign keys
No other tables will reference this table (no cascade complexity)
The attributes are stable and non-nullable

Use surrogate primary key when:

Multiple tables will reference this table
The natural key has 3+ attributes
Key attributes might change
ORM/framework compatibility is important

Primary Key Implementation

Different database management systems implement primary keys with subtle variations. Understanding these differences is essential for practical database work.

primary_key_syntax.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Column-level constraint (single-column key)
CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,  -- Inline constraint
    Name VARCHAR(100) NOT NULL
);
 
-- Table-level constraint (single or composite)
CREATE TABLE Employee (
    EmployeeID INT NOT NULL,
    Name VARCHAR(100) NOT NULL,
    PRIMARY KEY (EmployeeID)     -- Separate declaration
);
 
-- Table-level with composite key
CREATE TABLE Enrollment (
    StudentID INT NOT NULL,
    CourseID INT NOT NULL,
    Semester VARCHAR(20) NOT NULL,
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID, Semester)
);
 
-- Named constraint (recommended for maintenance)
CREATE TABLE Employee (
    EmployeeID INT NOT NULL,
    Name VARCHAR(100) NOT NULL,
    CONSTRAINT pk_employee PRIMARY KEY (EmployeeID)
);
 
-- Adding PK to existing table (ALTER statement)
ALTER TABLE Employee
ADD CONSTRAINT pk_employee PRIMARY KEY (EmployeeID);
 
-- PostgreSQL: Primary key with custom tablespace
CREATE TABLE Customer (
    CustomerID SERIAL,
    Name VARCHAR(100) NOT NULL,
    CONSTRAINT pk_customer PRIMARY KEY (CustomerID)
    USING INDEX TABLESPACE fast_tablespace
);

DBMS-Specific Primary Key Behaviors
DBMS	Default Index Type	Clustering	Auto-Increment Syntax
MySQL (InnoDB)	B+Tree	PK is clustered index	AUTO_INCREMENT
PostgreSQL	B-Tree	Not clustered by default (use CLUSTER)	SERIAL / IDENTITY
SQL Server	B-Tree	Clustered by default (configurable)	IDENTITY(seed, increment)
Oracle	B-Tree	Not clustered (IOT optional)	SEQUENCE / IDENTITY (12c+)
SQLite	B-Tree	INTEGER PRIMARY KEY is rowid	AUTOINCREMENT (different from default)

Clustered Index Implications

In MySQL InnoDB and SQL Server (by default), the primary key determines physical row ordering. This means: (1) PK determines I/O patterns, (2) Random PKs (like UUID) cause fragmentation, (3) Wide PKs increase secondary index sizes. Choose your primary key with storage implications in mind.

Summary: Primary Key Mastery

We've comprehensively covered primary keys—the designated identifier that serves as the foundational reference point for relational data. Let's consolidate the key insights:

Key Takeaways

•Primary key is the designated candidate key — One candidate key is chosen as the primary means of identification for a relation.
•Entity integrity is mandatory — No attribute of a primary key may contain NULL values; this preserves tuple identity.
•Selection criteria matter — Stability, simplicity, privacy, existence, and performance guide primary key selection.
•Natural vs surrogate is context-dependent — Natural keys work when stable and meaningful; surrogates work for flexibility and privacy.
•Auto-generation strategies vary — From simple auto-increment to globally-unique UUIDs, choose based on distribution and scalability needs.
•Implementation differs by DBMS — Understand your database's specific behaviors regarding clustering, indexing, and syntax.

What's next:

Now that we understand primary keys, we'll explore alternate keys—the candidate keys that were not selected as primary but still serve important roles in ensuring data integrity and providing alternate access paths.

Page Complete

You now understand primary keys as the designated tuple identifier. You can apply selection criteria to choose the best primary key, evaluate natural versus surrogate key tradeoffs, implement primary keys across different DBMS platforms, and appreciate the physical implications of primary key choices. Next, we'll explore alternate keys and their role in database integrity.

Primary Key

The Designated Identifier

Establishing relationships with other tables
Clustering data for physical storage
Default indexing strategies
Foreign key references from other relations

This designated candidate key is called the primary key—arguably the most important concept in relational database design.

What You Will Learn

Formal Definition of Primary Key

Definition: Primary Key

Essential properties of a primary key:

It is a candidate key: This means it's a superkey (uniquely identifies tuples) and it's minimal (no proper subset is a superkey).
It is singular: A relation has exactly one primary key. Other candidate keys become alternate keys.
It enforces entity integrity: The primary key cannot contain NULL values in any component attribute.
It is the target of foreign keys: Other relations reference this relation through its primary key.

Mathematical formalization:

Let R be a relation schema with candidate keys CK = {K₁, K₂, ..., Kₙ}. The primary key is a designated element PK ∈ CK such that:

PK → R (superkey property)
∀A ∈ PK: (PK - {A})⁺ ≠ R (minimality)
∀t ∈ r(R), ∀A ∈ PK: t[A] ≠ NULL (entity integrity)

The selection of PK from CK is a design decision, not a mathematical determination. The relational model tells us the candidates; the designer chooses the primary.

Primary Key vs Candidate Key Properties
Property	Candidate Key	Primary Key
Uniqueness	Yes	Yes
Minimality	Yes	Yes
Multiple per relation?	Yes (can be many)	No (exactly one)
NULL values allowed?	Depends on DBMS	Never (entity integrity)
Foreign key target?	Can be, but unusual	Primary target for FKs
Physical implications?	Usually none	Often affects clustering, indexing

The Entity Integrity Constraint

One of the fundamental constraints of the relational model is entity integrity, and it applies specifically to primary keys.

Entity Integrity Rule

No attribute of a primary key can be NULL. If the primary key is composite (consists of multiple attributes), NONE of those attributes may contain NULL values for any tuple.

Why entity integrity matters:

The primary key serves as the identity of each tuple. If we allowed NULL values:

Identity becomes ambiguous: A tuple with a NULL primary key cannot be reliably identified or referenced.
Comparisons fail: In SQL, NULL = NULL evaluates to UNKNOWN (not TRUE). This means we couldn't reliably determine if two tuples were 'the same'.
Foreign key references break: Other tables referencing this tuple would reference... nothing? The semantics become undefined.
Uniqueness enforcement fails: We can't guarantee uniqueness when NULL is involved, since NULL ≠ NULL.

Example: The NULL Problem

Suppose we allowed NULL in primary keys:

EmployeeID	Name	Department
E001	John	Engineering
NULL	Jane	Marketing
NULL	Bob	Sales

Problems:

Are Jane and Bob the same employee (both have NULL ID)? No way to know.
How do we update 'the employee with NULL ID'? Ambiguous.
How does Payroll.EmployeeID reference Jane? It can't reliably.
SELECT * WHERE EmployeeID = NULL returns nothing (NULL ≠ NULL in SQL).

Entity integrity prevents this chaos by requiring primary keys to always have meaningful, non-NULL values.

Composite Key Caveat

Primary Key Selection Criteria

When a relation has multiple candidate keys, choosing the primary key is a crucial design decision. Several criteria guide this choice:

Primary Key Selection Criteria

•Stability (Immutability): Choose keys whose values rarely or never change. SSN can change (legal name changes, corrections). Employee IDs rarely change. Prefer the stable option.
•Simplicity: Prefer smaller keys. Single-attribute keys over composite keys. Shorter data types over longer ones. Numeric over strings (faster comparison, smaller indexes).
•Familiarity: Choose keys meaningful to users and applications. 'CustomerID' is more intuitive than an internal sequence. Balance with other criteria.
•Privacy/Security: Avoid keys that expose sensitive information. SSN as a primary key means it appears everywhere—foreign keys, logs, URLs, error messages.
•Existence Guarantee: Choose keys that always have values. An employee might not have an SSN (international workers), but always has an EmployeeID.
•Uniqueness Scope: Prefer globally unique keys over locally unique. UUIDs work across merged databases. Sequences might collide.
•Performance: Consider indexing and join performance. Integer keys are fastest. UUIDs have random distribution (B-tree fragmentation).

Example: Applying Selection Criteria

For an Employee table with candidate keys {EmployeeID}, {SSN}, {Email}:

Criterion	EmployeeID	SSN	Email
Stability	⭐⭐⭐ Rarely changes	⭐⭐ Can change	⭐ Often changes
Simplicity	⭐⭐⭐ Short integer	⭐⭐ 9 digits	⭐ Variable length
Privacy	⭐⭐⭐ Not sensitive	⭐ Highly sensitive	⭐⭐ Semi-sensitive
Existence	⭐⭐⭐ Always assigned	⭐⭐ Not universal	⭐⭐ Usually exists
Performance	⭐⭐⭐ Integer comparison	⭐⭐ String	⭐ Variable string

Winner: EmployeeID — Best overall score across criteria.

SSN becomes an alternate key (with unique constraint). Email becomes an alternate key (with unique constraint but may change).

No Perfect Rule

Natural Keys vs Surrogate Keys

One of the most debated topics in database design is whether to use natural keys (derived from real-world attributes) or surrogate keys (artificially generated identifiers).

Natural Keys

Natural Key Advantages

•Self-describing: Values have inherent meaning
•No extra storage: Using existing data
•Query simplicity: No joins to understand data
•Domain validation: Invalid values are often obvious
•Data interchange: External systems recognize them

Surrogate Keys

Surrogate Key Advantages

•Immutable: Never needs to change
•Compact: Optimized for storage and indexing
•Privacy-safe: Reveals nothing about data
•Always exists: Assigned at creation
•System-controlled: No business rule changes affect it

When to use natural keys:

The natural key is truly stable: ISBN doesn't change for a book edition. Country codes are internationally standardized.
External integration is critical: APIs, data exports, and external systems use the natural identifier.
Lookup patterns match: Users search by the natural key (e.g., find book by ISBN).
Uniqueness is guaranteed externally: An authority ensures no duplicates (ISBN agency, passport authority).

When to use surrogate keys:

No good natural key exists: People change names. Addresses change. Email changes.
Natural keys are composite: A 5-attribute natural key makes joins expensive.
Privacy matters: Don't expose SSN in every foreign key column.
Mergers/acquisitions are possible: Different divisions may have overlapping natural key spaces.
Performance is critical: Integer auto-increment outperforms string comparisons in joins.

The Hybrid Approach

Auto-Generated Primary Keys

Modern databases provide various mechanisms for automatically generating primary key values. Understanding these options is essential for practical database design.

Auto-Generation Strategies Comparison
Strategy	Format Example	Pros	Cons
Auto-Increment (IDENTITY/SERIAL)	1, 2, 3, 4, 5...	Simple, compact, ordered, fast	Sequential (security risk), single-DB only
UUID/GUID	550e8400-e29b-41d4-a716-446655440000	Globally unique, distributed-safe	Large (16 bytes), random (index fragmentation)
ULID	01ARZ3NDEKTSV4RRFFQ69G5FAV	Sortable, distributed-safe, compact	Newer standard, less support
Snowflake ID	1382971839198277632	Sortable, distributed, compact (64-bit)	Requires ID service, clock dependency
Sequence	Explicit NEXT VALUE	Controlled, predictable	Requires extra object, potential bottleneck

SQL Implementation Examples:

primary_key_generation.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- Auto-Increment (MySQL)
CREATE TABLE Customer (
    CustomerID INT AUTO_INCREMENT PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Email VARCHAR(255) UNIQUE
);
 
-- IDENTITY (SQL Server)
CREATE TABLE Customer (
    CustomerID INT IDENTITY(1,1) PRIMARY KEY,
    Name NVARCHAR(100) NOT NULL,
    Email NVARCHAR(255) UNIQUE
);
 
-- SERIAL (PostgreSQL)
CREATE TABLE Customer (
    CustomerID SERIAL PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Email VARCHAR(255) UNIQUE
);
 
-- UUID (PostgreSQL with extension)
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";
 
CREATE TABLE Customer (
    CustomerID UUID DEFAULT uuid_generate_v4() PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Email VARCHAR(255) UNIQUE
);
 
-- Sequence (Oracle/PostgreSQL)
CREATE SEQUENCE customer_seq START WITH 1 INCREMENT BY 1;
 
CREATE TABLE Customer (
    CustomerID INT DEFAULT NEXT VALUE FOR customer_seq PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Email VARCHAR(255) UNIQUE
);

Choosing the Right Strategy

Composite Primary Keys

Composite Primary Key Definition

A composite primary key (or compound primary key) consists of two or more columns that together uniquely identify each row. No single column suffices—the combination is required.

Example: Order Line Items

CREATE TABLE OrderLine (
    OrderID INT NOT NULL,
    LineNumber INT NOT NULL,
    ProductID INT NOT NULL,
    Quantity INT NOT NULL,
    UnitPrice DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (OrderID, LineNumber),
    FOREIGN KEY (OrderID) REFERENCES Orders(OrderID),
    FOREIGN KEY (ProductID) REFERENCES Products(ProductID)
);

Here, {OrderID, LineNumber} is the composite primary key:

OrderID alone doesn't work (many lines per order)
LineNumber alone doesn't work (same number in different orders)
Together, they uniquely identify each line item

Implications of composite primary keys:

Advantages

•Natural semantics: Key reflects business reality
•No extra column: Uses existing meaningful data
•Automatic constraint: Uniqueness is inherent
•Clear relationships: Foreign key structure is obvious
•Data integrity: Business rules encoded in schema

Challenges

•Verbose foreign keys: Referencing tables need multiple columns
•Join complexity: More columns to match in joins
•Update cascades: More columns to cascade on updates
•Index size: Larger composite indexes
•ORM complications: Some ORMs handle composites poorly

When to use composite vs surrogate:

Use composite primary key when:

The table represents a many-to-many relationship (junction table)
The composite key has exactly 2 attributes from foreign keys
No other tables will reference this table (no cascade complexity)
The attributes are stable and non-nullable

Use surrogate primary key when:

Multiple tables will reference this table
The natural key has 3+ attributes
Key attributes might change
ORM/framework compatibility is important

Primary Key Implementation

Different database management systems implement primary keys with subtle variations. Understanding these differences is essential for practical database work.

primary_key_syntax.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
-- Column-level constraint (single-column key)
CREATE TABLE Employee (
    EmployeeID INT PRIMARY KEY,  -- Inline constraint
    Name VARCHAR(100) NOT NULL
);
 
-- Table-level constraint (single or composite)
CREATE TABLE Employee (
    EmployeeID INT NOT NULL,
    Name VARCHAR(100) NOT NULL,
    PRIMARY KEY (EmployeeID)     -- Separate declaration
);
 
-- Table-level with composite key
CREATE TABLE Enrollment (
    StudentID INT NOT NULL,
    CourseID INT NOT NULL,
    Semester VARCHAR(20) NOT NULL,
    Grade CHAR(2),
    PRIMARY KEY (StudentID, CourseID, Semester)
);
 
-- Named constraint (recommended for maintenance)
CREATE TABLE Employee (
    EmployeeID INT NOT NULL,
    Name VARCHAR(100) NOT NULL,
    CONSTRAINT pk_employee PRIMARY KEY (EmployeeID)
);
 
-- Adding PK to existing table (ALTER statement)
ALTER TABLE Employee
ADD CONSTRAINT pk_employee PRIMARY KEY (EmployeeID);
 
-- PostgreSQL: Primary key with custom tablespace
CREATE TABLE Customer (
    CustomerID SERIAL,
    Name VARCHAR(100) NOT NULL,
    CONSTRAINT pk_customer PRIMARY KEY (CustomerID)
    USING INDEX TABLESPACE fast_tablespace
);

DBMS-Specific Primary Key Behaviors
DBMS	Default Index Type	Clustering	Auto-Increment Syntax
MySQL (InnoDB)	B+Tree	PK is clustered index	AUTO_INCREMENT
PostgreSQL	B-Tree	Not clustered by default (use CLUSTER)	SERIAL / IDENTITY
SQL Server	B-Tree	Clustered by default (configurable)	IDENTITY(seed, increment)
Oracle	B-Tree	Not clustered (IOT optional)	SEQUENCE / IDENTITY (12c+)
SQLite	B-Tree	INTEGER PRIMARY KEY is rowid	AUTOINCREMENT (different from default)

Clustered Index Implications

Summary: Primary Key Mastery

We've comprehensively covered primary keys—the designated identifier that serves as the foundational reference point for relational data. Let's consolidate the key insights:

Key Takeaways

•Primary key is the designated candidate key — One candidate key is chosen as the primary means of identification for a relation.
•Entity integrity is mandatory — No attribute of a primary key may contain NULL values; this preserves tuple identity.
•Selection criteria matter — Stability, simplicity, privacy, existence, and performance guide primary key selection.
•Natural vs surrogate is context-dependent — Natural keys work when stable and meaningful; surrogates work for flexibility and privacy.
•Auto-generation strategies vary — From simple auto-increment to globally-unique UUIDs, choose based on distribution and scalability needs.
•Implementation differs by DBMS — Understand your database's specific behaviors regarding clustering, indexing, and syntax.

What's next:

Page Complete