Database Management SystemsIntegrity Constraints

Integrity Constraints in the Relational Model

LevelIntermediate

Duration90 mins

TopicIntegrity Constraints

1 / 5

Entity Integrity

The Fundamental Contract of Relational Data

In 1970, E.F. Codd introduced the relational model and forever changed how we think about data storage and retrieval. But Codd understood something crucial: a data model without rules is merely a container for chaos. Entity integrity emerged as one of the two fundamental integrity rules that transform relational databases from passive storage into active guardians of data quality.

Entity integrity addresses a deceptively simple question: How do we ensure that every row in a table can be uniquely identified and meaningfully referenced? The answer to this question has profound implications for every database operation—from simple queries to complex transactions spanning multiple tables.

This page provides an exhaustive exploration of entity integrity, examining its theoretical foundations, practical manifestations, implementation strategies, and the subtle edge cases that can undermine even well-designed systems.

What You Will Master

By the end of this page, you will understand: (1) The formal definition and theoretical basis of entity integrity, (2) Why NULL values are prohibited in primary keys, (3) How entity integrity relates to entity identity in the real world, (4) Implementation mechanisms across different database systems, (5) Common violations and how to prevent them, and (6) The relationship between entity integrity and other constraints.

Formal Definition of Entity Integrity

Entity integrity is formally stated as follows:

Entity Integrity Rule: No attribute participating in the primary key of a base relation is allowed to accept NULL values.

This statement, while concise, encodes several critical concepts that deserve careful unpacking:

Deconstructing the Definition

•'No attribute' — This applies to every column that forms any part of the primary key, whether the key consists of one column or multiple columns (composite key).
•'Participating in the primary key' — The constraint is specific to primary keys, not to candidate keys or other unique constraints, though good design often extends non-nullability to all candidate keys.
•'Base relation' — The rule applies to stored tables (base relations), not to derived tables (views) where the integrity is inherited from underlying base relations.
•'NULL values' — NULL represents the absence of a value—either unknown, not applicable, or not yet assigned. Primary keys must have actual, meaningful values.

The mathematical perspective:

From a set-theoretic viewpoint, a relation is a subset of the Cartesian product of its attribute domains. Each tuple (row) in a relation represents a fact about some entity in the real world. The primary key serves as the unique identifier for this entity.

If we permitted NULL values in primary keys, we would face a fundamental logical paradox. Consider the comparison NULL = NULL. In SQL and relational logic, this evaluates to UNKNOWN, not TRUE. Therefore, two tuples with NULL primary keys cannot be definitively compared for equality, violating the fundamental requirement that primary keys uniquely identify tuples.

The Three-Valued Logic Problem

Relational databases operate with three-valued logic: TRUE, FALSE, and UNKNOWN. NULL comparisons always yield UNKNOWN. If primary keys could be NULL, the fundamental operation of determining tuple uniqueness would become logically undefined. Two rows with NULL keys might be the same entity, or they might not—we cannot know.

Entity Identity: The Philosophical Foundation

To truly understand entity integrity, we must examine what it means to identify an entity. In the relational model, each row in a table corresponds to an entity or fact in the real world. The primary key serves as the surrogate or natural identifier for this entity.

The Entity-Identity Principle:

Every entity in the real world has an identity—something that makes it distinguishable from all other entities of the same type. For a person, this might be a national ID number. For a product, it might be a manufacturer's serial number or a generated SKU. For an event, it might be a timestamp combined with a location.

When we model entities in a relational database, we must capture this real-world identity in the primary key. If the primary key could be NULL, we would essentially be saying: "This entity might exist, but we cannot identify it." Such a statement is a logical contradiction—how can something exist if we cannot distinguish it from anything else?

Entity Identity in Different Domains
Domain	Entity Type	Natural Identity	Why NULL Would Be Problematic
Healthcare	Patient	Medical Record Number	Cannot track treatments, allocate resources, or maintain history
Finance	Transaction	Transaction ID	Cannot audit, reverse, or reconcile financial movements
E-commerce	Order	Order Number	Cannot fulfill, track, or customer service the order
Government	Citizen	National ID	Cannot verify identity, grant benefits, or enforce obligations
Education	Student	Student ID	Cannot record grades, attendance, or degree progress
Manufacturing	Component	Serial Number	Cannot trace defects, manage recalls, or ensure quality

The practical manifestation:

Consider an employee database. Each employee has a unique employee ID that identifies them within the organization. If we allowed NULL employee IDs:

How would payroll know whom to pay?
How would managers assign work to a specific person?
How would HR track benefits enrollment?
How would security systems grant building access?

NULL in a primary key essentially creates a phantom entity—something that occupies space in the database but cannot participate meaningfully in any operation that requires identification.

Existence Implies Identity

A useful mental model: if an entity is important enough to store in a database, it is important enough to identify. Entity integrity simply enforces this principle at the database level. The moment you decide to insert a row, you are asserting that this entity exists and matters—therefore, it must be identifiable.

Why NULL Values Are Prohibited in Primary Keys

The prohibition of NULL values in primary keys is not arbitrary—it emerges from several fundamental requirements of relational systems:

Core Reasons for NULL Prohibition

•Unique identification requirement — The primary key must uniquely identify each tuple. If NULL were allowed, multiple tuples could have NULL as their key value, violating uniqueness because NULL = NULL evaluates to UNKNOWN, not TRUE.
•Foreign key referencing — Referential integrity depends on foreign keys pointing to existing primary keys. A foreign key cannot meaningfully reference a NULL primary key—the reference would point to 'nothing definite.'
•Indexing efficiency — B-tree indexes, which most databases use for primary keys, rely on key ordering. NULL has no definite position in an ordering, complicating index maintenance and lookup operations.
•Join semantics — Join operations match rows based on key equality. NULL keys would never match (since NULL = anything is UNKNOWN), creating orphaned rows that appear in no join results.
•Update and delete targeting — UPDATE and DELETE statements identify target rows by key values. A NULL key would match no WHERE clause reliably, making modification operations undefined.

A detailed example of the problems:

Consider an Orders table where order_id is the primary key. Suppose we allowed NULL values:

problematic_null_keys.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
-- Hypothetical scenario: NULL allowed in primary key (this is NOT valid SQL)
CREATE TABLE Orders (
    order_id INT PRIMARY KEY,  -- Hypothetically allows NULL
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
);
 
-- Insert orders with NULL keys
INSERT INTO Orders VALUES (NULL, 101, '2024-01-15', 250.00);
INSERT INTO Orders VALUES (NULL, 102, '2024-01-16', 175.50);
INSERT INTO Orders VALUES (1001, 101, '2024-01-17', 320.00);
 
-- PROBLEM 1: Which order does the OrderItems reference?
CREATE TABLE OrderItems (
    item_id INT PRIMARY KEY,
    order_id INT REFERENCES Orders(order_id),  -- Foreign key
    product_id INT,
    quantity INT
);
 
-- This item belongs to... which NULL order?
INSERT INTO OrderItems VALUES (1, NULL, 5001, 2);
 
-- PROBLEM 2: How do we update the NULL order?
UPDATE Orders 
SET total_amount = 300.00 
WHERE order_id = NULL;  -- This matches NOTHING (NULL = NULL is UNKNOWN)
 
-- PROBLEM 3: How do we join for reporting?
SELECT o.order_id, c.customer_name
FROM Orders o
JOIN Customers c ON o.customer_id = c.customer_id
WHERE o.order_id = NULL;  -- Returns no rows!
 
-- PROBLEM 4: Are the two NULL orders the same order?
-- We cannot know! They might represent the same entity or different entities.

The Cascade of Failures

Each NULL in a primary key creates ripple effects throughout the database. Foreign keys become meaningless. Queries return unexpected results. Updates fail silently. Reports omit data. The relational model's elegant algebra breaks down completely. Entity integrity is not merely a best practice—it is a structural necessity.

Entity Integrity and Composite Primary Keys

Entity integrity becomes more nuanced when dealing with composite primary keys—primary keys consisting of multiple attributes. The rule is unambiguous: no attribute participating in the primary key may be NULL.

This means every component column of a composite key must have a non-NULL value. A single NULL in any component invalidates the entire key.

composite_key_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Enrollment table with composite primary key
CREATE TABLE Enrollment (
    student_id INT NOT NULL,
    course_id VARCHAR(10) NOT NULL,
    semester CHAR(6) NOT NULL,
    enrollment_date DATE NOT NULL,
    grade CHAR(2),  -- Can be NULL until graded
    
    PRIMARY KEY (student_id, course_id, semester)
);
 
-- VALID: All primary key components have values
INSERT INTO Enrollment VALUES (1001, 'CS101', '2024FA', '2024-08-25', NULL);
 
-- INVALID: student_id is NULL (violates entity integrity)
INSERT INTO Enrollment VALUES (NULL, 'CS101', '2024FA', '2024-08-25', 'A');
 
-- INVALID: course_id is NULL (violates entity integrity)  
INSERT INTO Enrollment VALUES (1001, NULL, '2024FA', '2024-08-25', 'A');
 
-- INVALID: semester is NULL (violates entity integrity)
INSERT INTO Enrollment VALUES (1001, 'CS101', NULL, '2024-08-25', 'A');
 
-- Note: grade CAN be NULL because it's not part of the primary key

Why each component matters:

In the enrollment example above, the composite key (student_id, course_id, semester) represents a unique enrollment event. Each component carries essential identifying information:

student_id identifies who is enrolled
course_id identifies what they are enrolled in
semester identifies when they are enrolled

If any of these were NULL, the enrollment record would be incomplete:

NULL student_id: Enrollment for no one in particular
NULL course_id: Enrollment in some unknown course
NULL semester: Enrollment at some undefined time

None of these scenarios represents a meaningful fact worth recording.

The Junction Table Pattern

Composite primary keys frequently appear in junction tables (bridge tables) that implement many-to-many relationships. In these tables, entity integrity is especially critical because the junction table's purpose is precisely to connect two entities. A NULL in either foreign key component means the connection is incomplete—connecting an entity to nothing.

Composite Key Analysis
Table	Composite Key Components	Meaning of Each Component	Consequence of NULL
OrderItems	order_id + product_id	Which order contains which product	Item belongs to no order or is no product
FlightBooking	flight_id + passenger_id + seat_number	Who sits where on which flight	Unbookable reservation
MovieCast	movie_id + actor_id + role_name	Who plays what role in which movie	Unattributable performance
ExamScore	student_id + exam_id + question_number	Specific student response to specific question	Unscorable answer

Implementation Across Database Systems

Modern relational database management systems automatically enforce entity integrity when you declare a primary key. However, the implementation details and additional options vary across systems. Understanding these mechanisms helps you leverage them effectively.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- PostgreSQL automatically enforces NOT NULL on primary key columns
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE,
    hire_date DATE DEFAULT CURRENT_DATE
);
 
-- The PRIMARY KEY constraint implicitly adds:
-- 1. NOT NULL constraint on employee_id
-- 2. UNIQUE constraint on employee_id
-- 3. Creates a B-tree index on employee_id
 
-- Attempting NULL insertion fails:
INSERT INTO employees (employee_id, first_name, last_name)
VALUES (NULL, 'John', 'Doe');
-- ERROR: null value in column "employee_id" violates not-null constraint
 
-- SERIAL auto-generates values, preventing NULL naturally
INSERT INTO employees (first_name, last_name, email)
VALUES ('John', 'Doe', 'john.doe@example.com');
-- employee_id automatically assigned (e.g., 1)
 
-- Composite key example
CREATE TABLE order_items (
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL DEFAULT 1,
    unit_price DECIMAL(10,2) NOT NULL,
    
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Auto-Generation Strategies

All major databases provide mechanisms to automatically generate primary key values (SERIAL, AUTO_INCREMENT, IDENTITY, Sequences). These features don't just prevent NULL—they eliminate the need for application code to generate and manage unique identifiers. Use them unless you have a specific reason for natural keys.

Surrogate Keys vs Natural Keys: Entity Integrity Implications

The choice between surrogate keys (artificially generated identifiers) and natural keys (real-world identifiers) has significant implications for entity integrity maintenance.

Surrogate keys are system-generated values with no inherent meaning outside the database. Examples include auto-incremented integers, UUIDs, or database-generated sequences.

Natural keys are values that have meaning in the real world and uniquely identify entities. Examples include Social Security Numbers, ISBN numbers, or email addresses.

Surrogate Keys: Entity Integrity Advantages

•Never NULL by design — Auto-generation guarantees a value exists before insertion completes
•Immutable — No external process can change the identifier, preventing key modification cascades
•Uniform — Same data type and size across all tables, simplifying indexing and joins
•Privacy-neutral — No sensitive information encoded in the key itself
•Stable — Real-world changes (name changes, reorganizations) don't affect the key

Natural Keys: Entity Integrity Risks

•May be unknown at insertion time — New employee might not have SSN yet; violates entity integrity
•Can change — Email addresses change; key changes cascade to all foreign keys
•Variable size — Different natural identifiers have different lengths, complicating indexing
•Privacy concerns — Natural identifiers often contain PII, appearing in logs and debug output
•Format variations — Same real-world ID formatted differently ('US-12345' vs 'US12345') can cause duplicates

The practical recommendation:

For operational databases, surrogate keys are generally preferred because they inherently satisfy entity integrity and avoid the pitfalls of natural key management. However, a hybrid approach often works best:

Use a surrogate key as the primary key (satisfies entity integrity automatically)
Declare natural keys as unique constraints (enforces business uniqueness)
Index natural keys if they're frequently used for lookups

hybrid_key_approach.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Hybrid approach: Surrogate PK + Natural Key Uniqueness
CREATE TABLE customers (
    -- Surrogate primary key: auto-generated, never NULL
    customer_id BIGSERIAL PRIMARY KEY,
    
    -- Natural key: enforced unique, used for business operations
    email VARCHAR(255) NOT NULL UNIQUE,
    
    -- Additional natural identifier (may be NULL until verified)
    ssn CHAR(11) UNIQUE,  -- Can be NULL; not part of PK
    
    -- Business attributes
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
 
-- Foreign keys reference the surrogate key (stable, efficient)
CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers(customer_id),
    order_date DATE NOT NULL DEFAULT CURRENT_DATE,
    total_amount DECIMAL(12,2) NOT NULL
);
 
-- Application lookups can use natural key (email)
SELECT customer_id, first_name, last_name 
FROM customers 
WHERE email = 'user@example.com';
 
-- But foreign keys use stable surrogate
SELECT o.order_id, o.order_date, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

When Natural Keys Make Sense

Natural keys are appropriate when: (1) The identifier is guaranteed to exist before record creation, (2) The identifier is immutable by policy (ISO country codes), (3) The table is a reference/lookup table that rarely changes, or (4) The natural key is already required for all business operations. Examples include ISO currency codes, airport codes, or standardized product codes like ISBNs.

Common Violations and Prevention Strategies

While databases enforce entity integrity at the constraint level, violations can still occur through various pathways. Understanding these failure modes helps you design robust systems.

Common Violation Patterns

•Application bypass — Bulk loading tools that disable constraints, or direct SQL that skips validation triggers
•ETL pipeline failures — Data extraction from sources where the natural key wasn't actually unique or complete
•Legacy data migration — Historical data with NULL identifiers being moved to a new system with stricter enforcement
•Constraint-disabled maintenance — DBA disables constraints for maintenance and forgets to re-enable them
•Race conditions — Two transactions simultaneously inserting rows with auto-generated keys on systems with flawed sequence management
•Weak typing — ORM layers that serialize NULL as an empty string or zero, bypassing NULL checks

Prevention strategies:

Entity Integrity Defense Mechanisms
Layer	Strategy	Implementation
Database	Always-on constraints	Never disable PK constraints; use TRUNCATE instead of constraint-off bulk deletes
Database	Auto-generation defaults	Use SERIAL/IDENTITY/AUTO_INCREMENT to prevent NULL at the source
Application	Validation before persist	Validate entity has identifier before attempting database save
Application	ORM configuration	Configure ORM to reject NULL primary keys; use database-generated IDs
ETL	Pre-load validation	Validate source data for key completeness before loading
ETL	Staging tables	Load into staging with same constraints; fix violations before promotion
Operations	Constraint monitoring	Automated checks that all tables have enabled primary key constraints
Operations	Change control	Require approval for any DDL that modifies or disables constraints

integrity_monitoring.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
-- PostgreSQL: Query to find tables without primary keys
SELECT 
    t.table_schema,
    t.table_name 
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints tc
    ON t.table_schema = tc.table_schema 
    AND t.table_name = tc.table_name 
    AND tc.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
    AND t.table_schema NOT IN ('pg_catalog', 'information_schema')
    AND tc.constraint_name IS NULL
ORDER BY t.table_schema, t.table_name;
 
-- PostgreSQL: Find disabled constraints (requires pg_catalog access)
SELECT 
    conrelid::regclass AS table_name,
    conname AS constraint_name,
    contype AS constraint_type,
    NOT convalidated AS is_not_validated
FROM pg_constraint
WHERE contype = 'p'  -- Primary key
    AND NOT convalidated;
 
-- MySQL: Tables without primary keys
SELECT 
    t.TABLE_SCHEMA, 
    t.TABLE_NAME
FROM information_schema.TABLES t
LEFT JOIN information_schema.TABLE_CONSTRAINTS tc
    ON t.TABLE_SCHEMA = tc.TABLE_SCHEMA 
    AND t.TABLE_NAME = tc.TABLE_NAME 
    AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
WHERE t.TABLE_TYPE = 'BASE TABLE'
    AND t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema')
    AND tc.CONSTRAINT_NAME IS NULL;

Proactive Monitoring

Include entity integrity checks in your database health monitoring. Regular automated scans for tables missing primary keys, disabled constraints, or NULL values in key columns can catch problems before they cascade into application failures.

Summary: Entity Integrity Mastery

Entity integrity is the bedrock of reliable relational databases. Let's consolidate the essential knowledge:

Key Takeaways

•Definition — Entity integrity requires that no attribute of the primary key may accept NULL values, ensuring every tuple is uniquely identifiable.
•Logical necessity — NULL in a primary key creates logical paradoxes because NULL = NULL evaluates to UNKNOWN, not TRUE, breaking uniqueness semantics.
•Real-world mapping — Entities important enough to store must be important enough to identify; NULL keys represent phantom entities.
•Composite keys — Every component of a composite primary key must be non-NULL; a single NULL component invalidates the entire key.
•Automatic enforcement — Modern databases enforce entity integrity automatically when PRIMARY KEY is declared.
•Surrogate key preference — Auto-generated surrogate keys inherently satisfy entity integrity and avoid natural key pitfalls.
•Defense in depth — Protect entity integrity at database, application, and operational layers.

What's next:

Entity integrity ensures that each row can be identified. But databases are networks of related tables, not isolated silos. Referential integrity—the subject of the next page—ensures that relationships between tables remain consistent. Together, entity and referential integrity form the twin pillars of relational data reliability.

Page Complete

You now possess a comprehensive understanding of entity integrity—from its theoretical foundations in set theory and logic, through its practical manifestations in SQL DDL, to the strategies for preventing violations across your entire data stack. Next, we explore referential integrity, the constraint that governs relationships between tables.

1 / 5

Loading learning content...

Database Management SystemsIntegrity Constraints

Integrity Constraints in the Relational Model

LevelIntermediate

Duration90 mins

TopicIntegrity Constraints

1 / 5

Entity Integrity

The Fundamental Contract of Relational Data

What You Will Master

Formal Definition of Entity Integrity

Entity integrity is formally stated as follows:

Entity Integrity Rule: No attribute participating in the primary key of a base relation is allowed to accept NULL values.

This statement, while concise, encodes several critical concepts that deserve careful unpacking:

Deconstructing the Definition

•'No attribute' — This applies to every column that forms any part of the primary key, whether the key consists of one column or multiple columns (composite key).
•'Participating in the primary key' — The constraint is specific to primary keys, not to candidate keys or other unique constraints, though good design often extends non-nullability to all candidate keys.
•'Base relation' — The rule applies to stored tables (base relations), not to derived tables (views) where the integrity is inherited from underlying base relations.
•'NULL values' — NULL represents the absence of a value—either unknown, not applicable, or not yet assigned. Primary keys must have actual, meaningful values.

The mathematical perspective:

The Three-Valued Logic Problem

Entity Identity: The Philosophical Foundation

The Entity-Identity Principle:

Entity Identity in Different Domains
Domain	Entity Type	Natural Identity	Why NULL Would Be Problematic
Healthcare	Patient	Medical Record Number	Cannot track treatments, allocate resources, or maintain history
Finance	Transaction	Transaction ID	Cannot audit, reverse, or reconcile financial movements
E-commerce	Order	Order Number	Cannot fulfill, track, or customer service the order
Government	Citizen	National ID	Cannot verify identity, grant benefits, or enforce obligations
Education	Student	Student ID	Cannot record grades, attendance, or degree progress
Manufacturing	Component	Serial Number	Cannot trace defects, manage recalls, or ensure quality

The practical manifestation:

Consider an employee database. Each employee has a unique employee ID that identifies them within the organization. If we allowed NULL employee IDs:

How would payroll know whom to pay?
How would managers assign work to a specific person?
How would HR track benefits enrollment?
How would security systems grant building access?

NULL in a primary key essentially creates a phantom entity—something that occupies space in the database but cannot participate meaningfully in any operation that requires identification.

Existence Implies Identity

Why NULL Values Are Prohibited in Primary Keys

The prohibition of NULL values in primary keys is not arbitrary—it emerges from several fundamental requirements of relational systems:

Core Reasons for NULL Prohibition

•Unique identification requirement — The primary key must uniquely identify each tuple. If NULL were allowed, multiple tuples could have NULL as their key value, violating uniqueness because NULL = NULL evaluates to UNKNOWN, not TRUE.
•Foreign key referencing — Referential integrity depends on foreign keys pointing to existing primary keys. A foreign key cannot meaningfully reference a NULL primary key—the reference would point to 'nothing definite.'
•Indexing efficiency — B-tree indexes, which most databases use for primary keys, rely on key ordering. NULL has no definite position in an ordering, complicating index maintenance and lookup operations.
•Join semantics — Join operations match rows based on key equality. NULL keys would never match (since NULL = anything is UNKNOWN), creating orphaned rows that appear in no join results.
•Update and delete targeting — UPDATE and DELETE statements identify target rows by key values. A NULL key would match no WHERE clause reliably, making modification operations undefined.

A detailed example of the problems:

Consider an Orders table where order_id is the primary key. Suppose we allowed NULL values:

problematic_null_keys.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
-- Hypothetical scenario: NULL allowed in primary key (this is NOT valid SQL)
CREATE TABLE Orders (
    order_id INT PRIMARY KEY,  -- Hypothetically allows NULL
    customer_id INT,
    order_date DATE,
    total_amount DECIMAL(10,2)
);
 
-- Insert orders with NULL keys
INSERT INTO Orders VALUES (NULL, 101, '2024-01-15', 250.00);
INSERT INTO Orders VALUES (NULL, 102, '2024-01-16', 175.50);
INSERT INTO Orders VALUES (1001, 101, '2024-01-17', 320.00);
 
-- PROBLEM 1: Which order does the OrderItems reference?
CREATE TABLE OrderItems (
    item_id INT PRIMARY KEY,
    order_id INT REFERENCES Orders(order_id),  -- Foreign key
    product_id INT,
    quantity INT
);
 
-- This item belongs to... which NULL order?
INSERT INTO OrderItems VALUES (1, NULL, 5001, 2);
 
-- PROBLEM 2: How do we update the NULL order?
UPDATE Orders 
SET total_amount = 300.00 
WHERE order_id = NULL;  -- This matches NOTHING (NULL = NULL is UNKNOWN)
 
-- PROBLEM 3: How do we join for reporting?
SELECT o.order_id, c.customer_name
FROM Orders o
JOIN Customers c ON o.customer_id = c.customer_id
WHERE o.order_id = NULL;  -- Returns no rows!
 
-- PROBLEM 4: Are the two NULL orders the same order?
-- We cannot know! They might represent the same entity or different entities.

The Cascade of Failures

Entity Integrity and Composite Primary Keys

This means every component column of a composite key must have a non-NULL value. A single NULL in any component invalidates the entire key.

composite_key_example.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
-- Enrollment table with composite primary key
CREATE TABLE Enrollment (
    student_id INT NOT NULL,
    course_id VARCHAR(10) NOT NULL,
    semester CHAR(6) NOT NULL,
    enrollment_date DATE NOT NULL,
    grade CHAR(2),  -- Can be NULL until graded
    
    PRIMARY KEY (student_id, course_id, semester)
);
 
-- VALID: All primary key components have values
INSERT INTO Enrollment VALUES (1001, 'CS101', '2024FA', '2024-08-25', NULL);
 
-- INVALID: student_id is NULL (violates entity integrity)
INSERT INTO Enrollment VALUES (NULL, 'CS101', '2024FA', '2024-08-25', 'A');
 
-- INVALID: course_id is NULL (violates entity integrity)  
INSERT INTO Enrollment VALUES (1001, NULL, '2024FA', '2024-08-25', 'A');
 
-- INVALID: semester is NULL (violates entity integrity)
INSERT INTO Enrollment VALUES (1001, 'CS101', NULL, '2024-08-25', 'A');
 
-- Note: grade CAN be NULL because it's not part of the primary key

Why each component matters:

In the enrollment example above, the composite key (student_id, course_id, semester) represents a unique enrollment event. Each component carries essential identifying information:

student_id identifies who is enrolled
course_id identifies what they are enrolled in
semester identifies when they are enrolled

If any of these were NULL, the enrollment record would be incomplete:

NULL student_id: Enrollment for no one in particular
NULL course_id: Enrollment in some unknown course
NULL semester: Enrollment at some undefined time

None of these scenarios represents a meaningful fact worth recording.

The Junction Table Pattern

Composite Key Analysis
Table	Composite Key Components	Meaning of Each Component	Consequence of NULL
OrderItems	order_id + product_id	Which order contains which product	Item belongs to no order or is no product
FlightBooking	flight_id + passenger_id + seat_number	Who sits where on which flight	Unbookable reservation
MovieCast	movie_id + actor_id + role_name	Who plays what role in which movie	Unattributable performance
ExamScore	student_id + exam_id + question_number	Specific student response to specific question	Unscorable answer

Implementation Across Database Systems

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
-- PostgreSQL automatically enforces NOT NULL on primary key columns
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE,
    hire_date DATE DEFAULT CURRENT_DATE
);
 
-- The PRIMARY KEY constraint implicitly adds:
-- 1. NOT NULL constraint on employee_id
-- 2. UNIQUE constraint on employee_id
-- 3. Creates a B-tree index on employee_id
 
-- Attempting NULL insertion fails:
INSERT INTO employees (employee_id, first_name, last_name)
VALUES (NULL, 'John', 'Doe');
-- ERROR: null value in column "employee_id" violates not-null constraint
 
-- SERIAL auto-generates values, preventing NULL naturally
INSERT INTO employees (first_name, last_name, email)
VALUES ('John', 'Doe', 'john.doe@example.com');
-- employee_id automatically assigned (e.g., 1)
 
-- Composite key example
CREATE TABLE order_items (
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL DEFAULT 1,
    unit_price DECIMAL(10,2) NOT NULL,
    
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

Auto-Generation Strategies

Surrogate Keys vs Natural Keys: Entity Integrity Implications

The choice between surrogate keys (artificially generated identifiers) and natural keys (real-world identifiers) has significant implications for entity integrity maintenance.

Surrogate keys are system-generated values with no inherent meaning outside the database. Examples include auto-incremented integers, UUIDs, or database-generated sequences.

Natural keys are values that have meaning in the real world and uniquely identify entities. Examples include Social Security Numbers, ISBN numbers, or email addresses.

Surrogate Keys: Entity Integrity Advantages

•Never NULL by design — Auto-generation guarantees a value exists before insertion completes
•Immutable — No external process can change the identifier, preventing key modification cascades
•Uniform — Same data type and size across all tables, simplifying indexing and joins
•Privacy-neutral — No sensitive information encoded in the key itself
•Stable — Real-world changes (name changes, reorganizations) don't affect the key

Natural Keys: Entity Integrity Risks

•May be unknown at insertion time — New employee might not have SSN yet; violates entity integrity
•Can change — Email addresses change; key changes cascade to all foreign keys
•Variable size — Different natural identifiers have different lengths, complicating indexing
•Privacy concerns — Natural identifiers often contain PII, appearing in logs and debug output
•Format variations — Same real-world ID formatted differently ('US-12345' vs 'US12345') can cause duplicates

The practical recommendation:

Use a surrogate key as the primary key (satisfies entity integrity automatically)
Declare natural keys as unique constraints (enforces business uniqueness)
Index natural keys if they're frequently used for lookups

hybrid_key_approach.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
-- Hybrid approach: Surrogate PK + Natural Key Uniqueness
CREATE TABLE customers (
    -- Surrogate primary key: auto-generated, never NULL
    customer_id BIGSERIAL PRIMARY KEY,
    
    -- Natural key: enforced unique, used for business operations
    email VARCHAR(255) NOT NULL UNIQUE,
    
    -- Additional natural identifier (may be NULL until verified)
    ssn CHAR(11) UNIQUE,  -- Can be NULL; not part of PK
    
    -- Business attributes
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
 
-- Foreign keys reference the surrogate key (stable, efficient)
CREATE TABLE orders (
    order_id BIGSERIAL PRIMARY KEY,
    customer_id BIGINT NOT NULL REFERENCES customers(customer_id),
    order_date DATE NOT NULL DEFAULT CURRENT_DATE,
    total_amount DECIMAL(12,2) NOT NULL
);
 
-- Application lookups can use natural key (email)
SELECT customer_id, first_name, last_name 
FROM customers 
WHERE email = 'user@example.com';
 
-- But foreign keys use stable surrogate
SELECT o.order_id, o.order_date, c.email
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

When Natural Keys Make Sense

Common Violations and Prevention Strategies

While databases enforce entity integrity at the constraint level, violations can still occur through various pathways. Understanding these failure modes helps you design robust systems.

Common Violation Patterns

•Application bypass — Bulk loading tools that disable constraints, or direct SQL that skips validation triggers
•ETL pipeline failures — Data extraction from sources where the natural key wasn't actually unique or complete
•Legacy data migration — Historical data with NULL identifiers being moved to a new system with stricter enforcement
•Constraint-disabled maintenance — DBA disables constraints for maintenance and forgets to re-enable them
•Race conditions — Two transactions simultaneously inserting rows with auto-generated keys on systems with flawed sequence management
•Weak typing — ORM layers that serialize NULL as an empty string or zero, bypassing NULL checks

Prevention strategies:

Entity Integrity Defense Mechanisms
Layer	Strategy	Implementation
Database	Always-on constraints	Never disable PK constraints; use TRUNCATE instead of constraint-off bulk deletes
Database	Auto-generation defaults	Use SERIAL/IDENTITY/AUTO_INCREMENT to prevent NULL at the source
Application	Validation before persist	Validate entity has identifier before attempting database save
Application	ORM configuration	Configure ORM to reject NULL primary keys; use database-generated IDs
ETL	Pre-load validation	Validate source data for key completeness before loading
ETL	Staging tables	Load into staging with same constraints; fix violations before promotion
Operations	Constraint monitoring	Automated checks that all tables have enabled primary key constraints
Operations	Change control	Require approval for any DDL that modifies or disables constraints

integrity_monitoring.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
-- PostgreSQL: Query to find tables without primary keys
SELECT 
    t.table_schema,
    t.table_name 
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints tc
    ON t.table_schema = tc.table_schema 
    AND t.table_name = tc.table_name 
    AND tc.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
    AND t.table_schema NOT IN ('pg_catalog', 'information_schema')
    AND tc.constraint_name IS NULL
ORDER BY t.table_schema, t.table_name;
 
-- PostgreSQL: Find disabled constraints (requires pg_catalog access)
SELECT 
    conrelid::regclass AS table_name,
    conname AS constraint_name,
    contype AS constraint_type,
    NOT convalidated AS is_not_validated
FROM pg_constraint
WHERE contype = 'p'  -- Primary key
    AND NOT convalidated;
 
-- MySQL: Tables without primary keys
SELECT 
    t.TABLE_SCHEMA, 
    t.TABLE_NAME
FROM information_schema.TABLES t
LEFT JOIN information_schema.TABLE_CONSTRAINTS tc
    ON t.TABLE_SCHEMA = tc.TABLE_SCHEMA 
    AND t.TABLE_NAME = tc.TABLE_NAME 
    AND tc.CONSTRAINT_TYPE = 'PRIMARY KEY'
WHERE t.TABLE_TYPE = 'BASE TABLE'
    AND t.TABLE_SCHEMA NOT IN ('mysql', 'information_schema', 'performance_schema')
    AND tc.CONSTRAINT_NAME IS NULL;

Proactive Monitoring

Summary: Entity Integrity Mastery

Entity integrity is the bedrock of reliable relational databases. Let's consolidate the essential knowledge:

Key Takeaways

•Definition — Entity integrity requires that no attribute of the primary key may accept NULL values, ensuring every tuple is uniquely identifiable.
•Logical necessity — NULL in a primary key creates logical paradoxes because NULL = NULL evaluates to UNKNOWN, not TRUE, breaking uniqueness semantics.
•Real-world mapping — Entities important enough to store must be important enough to identify; NULL keys represent phantom entities.
•Composite keys — Every component of a composite primary key must be non-NULL; a single NULL component invalidates the entire key.
•Automatic enforcement — Modern databases enforce entity integrity automatically when PRIMARY KEY is declared.
•Surrogate key preference — Auto-generated surrogate keys inherently satisfy entity integrity and avoid natural key pitfalls.
•Defense in depth — Protect entity integrity at database, application, and operational layers.

What's next:

Page Complete

1 / 5