A database that can store any data in any structure with any operation is not a useful database—it's just a file dump. What transforms a data store into a trustworthy database is the presence of constraints: rules that define what data is valid and what is forbidden.
Constraints are the laws that govern data behavior. Just as physical laws determine what's possible in the natural world, database constraints determine what's possible in the data world. They ensure that despite the chaos of concurrent users, application bugs, and evolving requirements, the database maintains its integrity—its fundamental correctness and reliability.
The constraint component of a data model specifies what invariants the data must satisfy. These aren't suggestions or documentation—they are enforced rules. The database actively rejects any operation that would violate a constraint, protecting data quality at the deepest level.
By the end of this page, you will understand the constraint component of data models, including the categories of constraints (inherent, implicit, explicit, semantic), common constraint types, how constraints are enforced, and the critical role constraints play in data quality. You'll appreciate why constraints are not optional decorations but essential guarantees.
Before examining constraint types, we must understand why constraints are so critical. Without constraints, databases deteriorate into unreliable data swamps.
The data quality problem:
Consider a database without constraints. What happens? Orders point to customers that no longer exist, the same product is entered under three slightly different names, prices go negative, and required fields are left empty.
Each issue might seem minor in isolation, but together they destroy trust in the data. When data can't be trusted, every application must implement its own validation, every report must be manually verified, and every decision based on data is suspect.
Studies estimate that bad data costs organizations 15-25% of revenue through failed operations, mistaken decisions, and remediation efforts. Database constraints are the first line of defense—preventing bad data from entering the system in the first place.
Constraints as documentation:
Beyond enforcement, constraints serve as executable documentation of data rules. When a developer sees:
ALTER TABLE Orders
ADD CONSTRAINT fk_customer
FOREIGN KEY (customer_id) REFERENCES Customers(id);
They immediately understand: every order must have a valid customer. This is clearer than any comment or wiki page, and it can never become outdated because the database enforces it.
Constraints enable optimization:
Database optimizers use constraint information to improve query performance. For example, a NOT NULL declaration lets the planner skip null handling, a PRIMARY KEY or UNIQUE constraint tells it that a lookup returns at most one row, and CHECK constraints can let it prove that a predicate can never match and skip the scan entirely (see the sketch below).
Constraints aren't just about correctness; they're about performance too.
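As a hedged illustration of that last point, assuming PostgreSQL with constraint_exclusion set to on (which extends constraint checks to ordinary tables) and the Products table shown later with CHECK (price > 0):
-- Sketch: the planner uses the CHECK constraint to prove the query returns nothing
SET constraint_exclusion = on;
EXPLAIN SELECT * FROM Products WHERE price < 0;
-- Expected: the plan collapses to a Result node with "One-Time Filter: false",
-- because CHECK (price > 0) contradicts the predicate, so no table scan occurs.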
Constraints can be categorized by their source and how they're expressed. Understanding these categories helps in deciding where different rules should be enforced.
1. Inherent (Model-Based) Constraints:
These derive from the data model itself and cannot be violated under any circumstances. For example, the formal relational model does not permit duplicate tuples, and a document store accepts only well-formed JSON documents.
Inherent constraints require no explicit declaration; they're built into how the model works.
2. Schema-Based (Implicit) Constraints:
These arise from the schema definition but aren't separately declared. Declaring a column as INTEGER implicitly restricts it to whole numbers, and a table accepts only the attributes its schema names.
Schema constraints are enforced automatically once the schema is defined.
3. Explicit (Declarative) Constraints:
These are explicitly declared in the schema definition:
-- Explicit constraint declarations
CREATE TABLE Products (
id INTEGER PRIMARY KEY, -- Key constraint
name VARCHAR(100) NOT NULL, -- NOT NULL constraint
sku VARCHAR(50) UNIQUE, -- Uniqueness constraint
price DECIMAL(10,2) CHECK (price > 0), -- Check constraint
category_id INTEGER REFERENCES Categories(id) -- Foreign key constraint
);
Explicit constraints are the primary mechanism for expressing business rules.
4. Semantic (Business Rule) Constraints:
Complex business rules that may span multiple tables or involve temporal conditions, such as "a department's salary total cannot exceed its budget" or "a shipped order must have a ship date."
Semantic constraints often require triggers, stored procedures, or application-level enforcement because they're too complex for simple declarative constraints.
| Category | Source | Declaration | Enforcement | Examples |
|---|---|---|---|---|
| Inherent | Data model definition | Implicit in model | Automatic, unavoidable | No duplicate tuples, valid JSON |
| Schema-based | Schema definition | Implicit in schema | Automatic after schema creation | Data types, attribute existence |
| Explicit | DDL statements | Declared explicitly | Database enforces | PRIMARY KEY, FOREIGN KEY, CHECK |
| Semantic | Business requirements | Triggers/procedures/app | Custom code | Complex multi-table rules |
Keys are fundamental to the relational model, providing guaranteed uniqueness and enabling relationships between tables. Understanding key constraints is essential for proper database design.
Superkey:
A superkey is any combination of attributes that uniquely identifies each tuple. Many superkeys exist for any relation—including the extreme case of all attributes together.
Candidate Key:
A candidate key is a minimal superkey—a superkey from which no attribute can be removed without losing the uniqueness property. A relation may have multiple candidate keys.
-- Employees table with two candidate keys
CREATE TABLE Employees (
employee_id INTEGER, -- Candidate key 1
email VARCHAR(255), -- Candidate key 2 (also unique)
ssn CHAR(11), -- Candidate key 3 (also unique)
name VARCHAR(100),
department_id INTEGER,
PRIMARY KEY (employee_id),
UNIQUE (email),
UNIQUE (ssn)
);
Primary Key:
The primary key is the candidate key chosen to be the main identifier for the relation. This choice has several implications: the key columns become implicitly NOT NULL, the database typically backs the key with an index, and the key is the default target of foreign key references.
-- Common primary key patterns
-- Surrogate key (generated ID)
CREATE TABLE Orders (
id SERIAL PRIMARY KEY, -- Auto-generated
customer_id INTEGER NOT NULL,
order_date DATE NOT NULL
);
-- Natural key (business identifier)
CREATE TABLE Countries (
country_code CHAR(3) PRIMARY KEY, -- ISO country code
name VARCHAR(100) NOT NULL
);
-- Composite key (multiple columns)
CREATE TABLE OrderItems (
order_id INTEGER REFERENCES Orders(id),
product_id INTEGER REFERENCES Products(id),
quantity INTEGER NOT NULL,
PRIMARY KEY (order_id, product_id) -- Composite
);
Surrogate keys (auto-generated IDs) provide guaranteed uniqueness and immutability but lack business meaning. Natural keys (like email, SSN, ISBN) have business meaning but may change or have format issues. Many systems use surrogate keys as primary keys with unique constraints on natural keys—getting the benefits of both.
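A minimal sketch of that combined pattern, using a hypothetical Books table:
-- Surrogate primary key for stable, meaningless identity;
-- unique constraint on the natural business identifier (ISBN).
CREATE TABLE Books (
id SERIAL PRIMARY KEY, -- Surrogate: never changes, no business meaning
isbn CHAR(13) NOT NULL UNIQUE, -- Natural: meaningful, still enforced unique
title VARCHAR(200) NOT NULL
);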
Foreign Key:
Foreign keys establish relationships between tables by referencing the primary key (or unique key) of another table:
CREATE TABLE Orders (
id INTEGER PRIMARY KEY,
customer_id INTEGER NOT NULL,
status VARCHAR(20),
-- Foreign key with referential actions
CONSTRAINT fk_customer
FOREIGN KEY (customer_id)
REFERENCES Customers(id)
ON DELETE RESTRICT -- Prevent deleting customers with orders
ON UPDATE CASCADE -- Update references if customer ID changes
);
Referential actions specify what happens when the referenced row is deleted or updated: CASCADE propagates the change to the referencing rows, SET NULL and SET DEFAULT rewrite the foreign key value, and RESTRICT and NO ACTION reject the change while references still exist.
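A sketch of swapping in a different delete action for the Orders table above (PostgreSQL syntax; the column must be made nullable for SET NULL to apply):
-- Replace the RESTRICT behavior declared above with SET NULL
ALTER TABLE Orders ALTER COLUMN customer_id DROP NOT NULL;
ALTER TABLE Orders DROP CONSTRAINT fk_customer;
ALTER TABLE Orders
ADD CONSTRAINT fk_customer
FOREIGN KEY (customer_id) REFERENCES Customers(id)
ON DELETE SET NULL -- Deleting a customer keeps their orders but clears the link
ON UPDATE CASCADE;
DELETE FROM Customers WHERE id = 42;
-- Any orders for customer 42 now have customer_id = NULL instead of blocking the delete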
Every attribute in a relational schema has a domain—the set of all possible values it can hold. Domain constraints ensure that attribute values come from appropriate sets.
Built-in Type Constraints:
SQL provides standard data types, each with its own domain:
CREATE TABLE Products (
id INTEGER, -- Whole numbers
name VARCHAR(100), -- Strings up to 100 chars
price DECIMAL(10,2), -- Precise decimal (10 digits, 2 after point)
weight FLOAT, -- Approximate floating-point
is_active BOOLEAN, -- TRUE/FALSE
created_at TIMESTAMP, -- Date and time
description TEXT -- Unlimited text
);
Type mismatches are caught at insert/update time:
INSERT INTO Products (id, price) VALUES ('abc', 'expensive');
-- ERROR: invalid input syntax for integer: "abc"
-- ERROR: invalid input syntax for type numeric: "expensive"
Custom Domains:
SQL allows defining custom domains with additional constraints:
-- Create a custom domain for email addresses
CREATE DOMAIN email_address AS VARCHAR(255)
CHECK (VALUE ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$');
-- Create a domain for positive money
CREATE DOMAIN positive_money AS DECIMAL(10,2)
CHECK (VALUE > 0);
-- Create a domain for percentage (0-100)
CREATE DOMAIN percentage AS DECIMAL(5,2)
CHECK (VALUE >= 0 AND VALUE <= 100);
-- Use custom domains in table definitions
CREATE TABLE Customers (
id INTEGER PRIMARY KEY,
email email_address NOT NULL,
discount percentage DEFAULT 0
);
CREATE TABLE Products (
id INTEGER PRIMARY KEY,
price positive_money NOT NULL
);
Custom domains centralize validation logic and make schemas self-documenting.
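As a quick check against the Customers table above (the error text is assumed from PostgreSQL's format; other systems word it differently):
INSERT INTO Customers (id, email) VALUES (1, 'not-an-email');
-- ERROR: value for domain email_address violates check constraint "email_address_check"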
Document databases like MongoDB are often called 'schema-less,' but they still have type constraints. JSON values have types (string, number, boolean, array, object). The difference is these constraints are checked at access time rather than schema definition time—and different documents can have different structures.
The relational model defines two fundamental integrity constraints that are central to database correctness:
Entity Integrity:
Entity integrity requires that primary key values cannot be NULL. The rationale is fundamental:
Therefore, every tuple must have a known, non-NULL primary key.
-- Entity integrity is automatically enforced for PRIMARY KEY
CREATE TABLE Employees (
id INTEGER PRIMARY KEY, -- Implicitly NOT NULL
name VARCHAR(100)
);
INSERT INTO Employees (id, name) VALUES (NULL, 'Alice');
-- ERROR: null value in column "id" violates not-null constraint
Referential Integrity:
Referential integrity requires that foreign key values must reference existing rows (or be NULL if the relationship is optional). This prevents "dangling references"—pointers to non-existent data.
CREATE TABLE Departments (
id INTEGER PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
CREATE TABLE Employees (
id INTEGER PRIMARY KEY,
name VARCHAR(100) NOT NULL,
department_id INTEGER REFERENCES Departments(id)
);
-- Insert a department
INSERT INTO Departments VALUES (1, 'Engineering');
-- Valid: References existing department
INSERT INTO Employees VALUES (1, 'Alice', 1); -- OK
-- Valid: NULL department (if column allows NULL)
INSERT INTO Employees VALUES (2, 'Bob', NULL); -- OK
-- Invalid: References non-existent department
INSERT INTO Employees VALUES (3, 'Carol', 99);
-- ERROR: insert or update on table "employees" violates
-- foreign key constraint "employees_department_id_fkey"
Many NoSQL databases don't enforce referential integrity, leaving it to the application. This can be acceptable for denormalized data where relationships are embedded, but for data with references (document IDs, etc.), applications must be extremely careful to maintain consistency manually—a significant source of bugs.
CHECK Constraints:
CHECK constraints allow arbitrary boolean conditions on column values:
CREATE TABLE Products (
id INTEGER PRIMARY KEY,
name VARCHAR(100) NOT NULL,
price DECIMAL(10,2) NOT NULL CHECK (price > 0),
discount DECIMAL(5,2) CHECK (discount >= 0 AND discount <= 100),
min_stock INTEGER CHECK (min_stock >= 0),
max_stock INTEGER CHECK (max_stock >= min_stock),
status VARCHAR(20) CHECK (status IN ('active', 'discontinued', 'pending'))
);
-- Table-level CHECK constraint (spans multiple columns)
CREATE TABLE DateRanges (
id INTEGER PRIMARY KEY,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CONSTRAINT valid_date_range CHECK (end_date >= start_date)
);
Common CHECK Constraint Patterns:
-- Enumeration (limited values)
CHECK (priority IN ('low', 'medium', 'high', 'critical'))
-- Range constraint
CHECK (rating BETWEEN 1 AND 5)
-- Pattern matching (PostgreSQL)
CHECK (phone ~ '^\+?[0-9]{10,15}$')
-- Conditional requirement
CHECK (status != 'shipped' OR ship_date IS NOT NULL)
-- Cross-column validation
CHECK (sale_price <= regular_price)
-- Complex business rule
CHECK (
(account_type = 'premium' AND credit_limit >= 10000) OR
(account_type = 'standard' AND credit_limit <= 5000) OR
(account_type = 'basic' AND credit_limit <= 1000)
)
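To see one of these patterns rejecting data, here is a minimal sketch using a hypothetical Shipments table and the conditional-requirement rule above:
CREATE TABLE Shipments (
id INTEGER PRIMARY KEY,
status VARCHAR(20) NOT NULL,
ship_date DATE,
CHECK (status <> 'shipped' OR ship_date IS NOT NULL) -- Shipped rows need a date
);
INSERT INTO Shipments VALUES (1, 'pending', NULL); -- OK: not shipped yet
INSERT INTO Shipments VALUES (2, 'shipped', NULL); -- Rejected: violates the CHECK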
Assertions (Multi-Table Constraints):
SQL defines ASSERTION for constraints spanning multiple tables, though few databases implement them:
-- Theoretical syntax (rarely supported)
CREATE ASSERTION budget_check
CHECK (
NOT EXISTS (
SELECT d.id
FROM Departments d
WHERE (
SELECT SUM(e.salary)
FROM Employees e
WHERE e.department_id = d.id
) > d.budget
)
);
In practice, multi-table constraints require triggers that run custom checks on each modification, stored procedures that wrap the modifications, application-level validation, or periodic batch checks that detect violations after the fact.
When CHECK constraints aren't powerful enough, triggers can enforce arbitrary complex rules. However, triggers add complexity and can impact performance. Use them sparingly and document thoroughly. Sometimes, accepting eventual consistency and using batch validation is more practical than real-time trigger enforcement.
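For illustration, here is a minimal trigger sketch enforcing the department-budget rule from the ASSERTION above. It assumes Employees has a salary column and Departments a budget column, as that example does, and uses PostgreSQL syntax:
-- Reject any insert/update that would push a department's salary total over its budget
CREATE OR REPLACE FUNCTION check_department_budget() RETURNS trigger AS $$
BEGIN
IF (SELECT COALESCE(SUM(e.salary), 0)
FROM Employees e
WHERE e.department_id = NEW.department_id)
> (SELECT d.budget FROM Departments d WHERE d.id = NEW.department_id) THEN
RAISE EXCEPTION 'Salary total for department % exceeds its budget', NEW.department_id;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_budget_check
AFTER INSERT OR UPDATE OF salary, department_id ON Employees
FOR EACH ROW EXECUTE FUNCTION check_department_budget();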
Understanding when and how constraints are enforced is crucial for both correctness and performance.
Enforcement Timing:
Constraints can be checked at different times:
Immediate checking: Constraints are verified after each individual statement. This is the default and ensures the database is never in a violated state.
Deferred checking: Constraints are verified only at transaction commit. This allows temporary violations within a transaction that are resolved before commit.
-- Deferred constraint checking (PostgreSQL)
CREATE TABLE Nodes (
id INTEGER PRIMARY KEY,
parent_id INTEGER,
CONSTRAINT fk_parent FOREIGN KEY (parent_id)
REFERENCES Nodes(id) DEFERRABLE INITIALLY DEFERRED
);
BEGIN;
-- Can temporarily violate the constraint
INSERT INTO Nodes (id, parent_id) VALUES (2, 1); -- parent 1 doesn't exist yet!
INSERT INTO Nodes (id, parent_id) VALUES (1, NULL); -- Now parent exists
COMMIT; -- Constraint checked here, passes
Enforcement Mechanisms:
Databases use various mechanisms to enforce constraints:
Index-based enforcement: PRIMARY KEY and UNIQUE constraints are backed by a unique index; an insert or update that would create a duplicate fails when the index entry can't be added.
Lookup-based enforcement: FOREIGN KEY constraints perform a lookup in the referenced table on insert or update, and a lookup in the referencing table when a parent row is deleted or updated.
Expression evaluation: CHECK and NOT NULL constraints evaluate a condition against the candidate row before it is written; a false result rejects the statement (for CHECK, an unknown result counts as satisfied).
Trigger-based enforcement: complex semantic rules run custom validation code in triggers or stored procedures around each modification.
| Constraint Type | Enforcement Cost | Mitigation Strategies |
|---|---|---|
| PRIMARY KEY | Index maintenance on insert | Use efficient key types (integer vs. UUID) |
| UNIQUE | Index lookup + maintenance | Consider partial unique indexes |
| FOREIGN KEY | Lookup on insert; cascade on delete | Index foreign key columns; consider deferred |
| CHECK | Expression evaluation | Keep expressions simple; avoid functions |
| NOT NULL | Very low (null check) | No real mitigation needed |
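One common mitigation from the table, sketched for the Orders table above (PostgreSQL, for example, does not create this index automatically):
-- Index the referencing column so joins and ON DELETE/ON UPDATE checks against
-- Customers don't need to scan the whole Orders table.
CREATE INDEX idx_orders_customer_id ON Orders (customer_id);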
Every constraint requires enforcement work. Heavily constrained tables are slower to write than unconstrained ones. This is a worthwhile trade-off for data integrity, but be aware of it. Batch loads often disable constraints temporarily for performance, then re-enable and validate afterward.
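A hedged sketch of that batch-load pattern in PostgreSQL syntax (the load step itself is elided):
-- Drop the constraint for the load, then re-add it without scanning existing rows
-- (NOT VALID) and validate in a separate step once the load is complete.
ALTER TABLE Orders DROP CONSTRAINT fk_customer;
-- ... bulk load rows here ...
ALTER TABLE Orders
ADD CONSTRAINT fk_customer
FOREIGN KEY (customer_id) REFERENCES Customers(id) NOT VALID;
ALTER TABLE Orders VALIDATE CONSTRAINT fk_customer; -- checks the loaded rows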
The constraint component defines what must always be true about data: the invariants that ensure quality, consistency, and correctness. Key constraints guarantee identity and uniqueness, domain constraints keep values within valid sets, entity and referential integrity keep relationships sound, and CHECK constraints, assertions, and triggers capture business rules, all enforced by the database itself rather than left to application discipline.
What's next:
We've now explored all three components of a data model: structure (what data looks like), operations (what we can do), and constraints (what must be true). The next page examines model evolution—how data models have changed over time and how they continue to evolve to meet new challenges. Understanding this evolution reveals why we have so many models and where the field is heading.
You now understand the constraint component of data models—the laws that govern data behavior. Constraints transform storage into trustworthy databases by ensuring that data is always valid, consistent, and meaningful. Next, we'll explore how data models have evolved over time.