A database that can store any data in any structure with any operation is not a useful database—it's just a file dump. What transforms a data store into a trustworthy database is the presence of constraints: rules that define what data is valid and what is forbidden.
Constraints are the laws that govern data behavior. Just as physical laws determine what's possible in the natural world, database constraints determine what's possible in the data world. They ensure that despite the chaos of concurrent users, application bugs, and evolving requirements, the database maintains its integrity—its fundamental correctness and reliability.
The constraint component of a data model specifies what invariants the data must satisfy. These aren't suggestions or documentation—they are enforced rules. The database actively rejects any operation that would violate a constraint, protecting data quality at the deepest level.
By the end of this page, you will understand the constraint component of data models, including the categories of constraints (inherent, implicit, explicit, semantic), common constraint types, how constraints are enforced, and the critical role constraints play in data quality. You'll appreciate why constraints are not optional decorations but essential guarantees.
Before examining constraint types, we must understand why constraints are so critical. Without constraints, databases deteriorate into unreliable data swamps.
The data quality problem:
Consider a database without constraints. What happens? Orders point to customers that no longer exist, the same product is entered under three slightly different names, prices go negative, and required fields are left empty.
Each issue might seem minor in isolation, but together they destroy trust in the data. When data can't be trusted, every application must implement its own validation, every report must be manually verified, and every decision based on data is suspect.
Studies estimate that bad data costs organizations 15-25% of revenue through failed operations, mistaken decisions, and remediation efforts. Database constraints are the first line of defense—preventing bad data from entering the system in the first place.
Constraints as documentation:
Beyond enforcement, constraints serve as executable documentation of data rules. When a developer sees:
ALTER TABLE Orders
ADD CONSTRAINT fk_customer
FOREIGN KEY (customer_id) REFERENCES Customers(id);
They immediately understand: every order must have a valid customer. This is clearer than any comment or wiki page, and it can never become outdated because the database enforces it.
Constraints enable optimization:
Database optimizers use constraint information to improve query performance. For example, a NOT NULL declaration lets the planner skip null handling, a PRIMARY KEY or UNIQUE constraint tells it that a lookup returns at most one row, and CHECK constraints can let it prove that a predicate can never match and skip the scan entirely (see the sketch below).
Constraints aren't just about correctness; they're about performance too.
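As a hedged illustration of that last point, assuming PostgreSQL with constraint_exclusion set to on (which extends constraint checks to ordinary tables) and the Products table shown later with CHECK (price > 0):
-- Sketch: the planner uses the CHECK constraint to prove the query returns nothing
SET constraint_exclusion = on;
EXPLAIN SELECT * FROM Products WHERE price < 0;
-- Expected: the plan collapses to a Result node with "One-Time Filter: false",
-- because CHECK (price > 0) contradicts the predicate, so no table scan occurs.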
Constraints can be categorized by their source and how they're expressed. Understanding these categories helps in deciding where different rules should be enforced.
1. Inherent (Model-Based) Constraints:
These derive from the data model itself and cannot be violated under any circumstances. For example, the formal relational model does not permit duplicate tuples, and a document store accepts only well-formed JSON documents.
Inherent constraints require no explicit declaration; they're built into how the model works.
2. Schema-Based (Implicit) Constraints:
These arise from the schema definition but aren't separately declared. Declaring a column as INTEGER implicitly restricts it to whole numbers, and a table accepts only the attributes its schema names.
Schema constraints are enforced automatically once the schema is defined.
3. Explicit (Declarative) Constraints:
These are explicitly declared in the schema definition:
-- Explicit constraint declarations
CREATE TABLE Products (
id INTEGER PRIMARY KEY, -- Key constraint
name VARCHAR(100) NOT NULL, -- NOT NULL constraint
sku VARCHAR(50) UNIQUE, -- Uniqueness constraint
price DECIMAL(10,2) CHECK (price > 0), -- Check constraint
category_id INTEGER REFERENCES Categories(id) -- Foreign key constraint
);
Explicit constraints are the primary mechanism for expressing business rules.
4. Semantic (Business Rule) Constraints:
Complex business rules that may span multiple tables or involve temporal conditions, such as "a department's salary total cannot exceed its budget" or "a shipped order must have a ship date."
Semantic constraints often require triggers, stored procedures, or application-level enforcement because they're too complex for simple declarative constraints.
| Category | Source | Declaration | Enforcement | Examples |
|---|---|---|---|---|
| Inherent | Data model definition | Implicit in model | Automatic, unavoidable | No duplicate tuples, valid JSON |
| Schema-based | Schema definition | Implicit in schema | Automatic after schema creation | Data types, attribute existence |
| Explicit | DDL statements | Declared explicitly | Database enforces | PRIMARY KEY, FOREIGN KEY, CHECK |
| Semantic | Business requirements | Triggers/procedures/app | Custom code | Complex multi-table rules |
Keys are fundamental to the relational model, providing guaranteed uniqueness and enabling relationships between tables. Understanding key constraints is essential for proper database design.
Superkey:
A superkey is any combination of attributes that uniquely identifies each tuple. Many superkeys exist for any relation—including the extreme case of all attributes together.
Candidate Key:
A candidate key is a minimal superkey—a superkey from which no attribute can be removed without losing the uniqueness property. A relation may have multiple candidate keys.
-- Employees table with two candidate keys
CREATE TABLE Employees (
employee_id INTEGER, -- Candidate key 1
email VARCHAR(255), -- Candidate key 2 (also unique)
ssn CHAR(11), -- Candidate key 3 (also unique)
name VARCHAR(100),
department_id INTEGER,
PRIMARY KEY (employee_id),
UNIQUE (email),
UNIQUE (ssn)
);
Primary Key:
The primary key is the candidate key chosen to be the main identifier for the relation. This choice has several implications: the key columns become implicitly NOT NULL, the database typically backs the key with an index, and the key is the default target of foreign key references.
-- Common primary key patterns
-- Surrogate key (generated ID)
CREATE TABLE Orders (
id SERIAL PRIMARY KEY, -- Auto-generated
customer_id INTEGER NOT NULL,
order_date DATE NOT NULL
);
-- Natural key (business identifier)
CREATE TABLE Countries (
country_code CHAR(3) PRIMARY KEY, -- ISO country code
name VARCHAR(100) NOT NULL
);
-- Composite key (multiple columns)
CREATE TABLE OrderItems (
order_id INTEGER REFERENCES Orders(id),
product_id INTEGER REFERENCES Products(id),
quantity INTEGER NOT NULL,
PRIMARY KEY (order_id, product_id) -- Composite
);
Surrogate keys (auto-generated IDs) provide guaranteed uniqueness and immutability but lack business meaning. Natural keys (like email, SSN, ISBN) have business meaning but may change or have format issues. Many systems use surrogate keys as primary keys with unique constraints on natural keys—getting the benefits of both.
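A minimal sketch of that combined pattern, using a hypothetical Books table:
-- Surrogate primary key for stable, meaningless identity;
-- unique constraint on the natural business identifier (ISBN).
CREATE TABLE Books (
id SERIAL PRIMARY KEY, -- Surrogate: never changes, no business meaning
isbn CHAR(13) NOT NULL UNIQUE, -- Natural: meaningful, still enforced unique
title VARCHAR(200) NOT NULL
);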
Foreign Key:
Foreign keys establish relationships between tables by referencing the primary key (or unique key) of another table:
CREATE TABLE Orders (
id INTEGER PRIMARY KEY,
customer_id INTEGER NOT NULL,
status VARCHAR(20),
-- Foreign key with referential actions
CONSTRAINT fk_customer
FOREIGN KEY (customer_id)
REFERENCES Customers(id)
ON DELETE RESTRICT -- Prevent deleting customers with orders
ON UPDATE CASCADE -- Update references if customer ID changes
);
Referential actions specify what happens when the referenced row is deleted or updated: CASCADE propagates the change to the referencing rows, SET NULL and SET DEFAULT rewrite the foreign key value, and RESTRICT and NO ACTION reject the change while references still exist.
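A sketch of swapping in a different delete action for the Orders table above (PostgreSQL syntax; the column must be made nullable for SET NULL to apply):
-- Replace the RESTRICT behavior declared above with SET NULL
ALTER TABLE Orders ALTER COLUMN customer_id DROP NOT NULL;
ALTER TABLE Orders DROP CONSTRAINT fk_customer;
ALTER TABLE Orders
ADD CONSTRAINT fk_customer
FOREIGN KEY (customer_id) REFERENCES Customers(id)
ON DELETE SET NULL -- Deleting a customer keeps their orders but clears the link
ON UPDATE CASCADE;
DELETE FROM Customers WHERE id = 42;
-- Any orders for customer 42 now have customer_id = NULL instead of blocking the delete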
Every attribute in a relational schema has a domain—the set of all possible values it can hold. Domain constraints ensure that attribute values come from appropriate sets.
Built-in Type Constraints:
SQL provides standard data types, each with its own domain:
CREATE TABLE Products (
id INTEGER, -- Whole numbers
name VARCHAR(100), -- Strings up to 100 chars
price DECIMAL(10,2), -- Precise decimal (10 digits, 2 after point)
weight FLOAT, -- Approximate floating-point
is_active BOOLEAN, -- TRUE/FALSE
created_at TIMESTAMP, -- Date and time
description TEXT -- Unlimited text
);
Type mismatches are caught at insert/update time:
INSERT INTO Products (id, price) VALUES ('abc', 'expensive');
-- ERROR: invalid input syntax for integer: "abc"
-- ERROR: invalid input syntax for type numeric: "expensive"
Custom Domains:
SQL allows defining custom domains with additional constraints:
-- Create a custom domain for email addresses
CREATE DOMAIN email_address AS VARCHAR(255)
CHECK (VALUE ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$');
-- Create a domain for positive money
CREATE DOMAIN positive_money AS DECIMAL(10,2)
CHECK (VALUE > 0);
-- Create a domain for percentage (0-100)
CREATE DOMAIN percentage AS DECIMAL(5,2)
CHECK (VALUE >= 0 AND VALUE <= 100);
-- Use custom domains in table definitions
CREATE TABLE Customers (
id INTEGER PRIMARY KEY,
email email_address NOT NULL,
discount percentage DEFAULT 0
);
CREATE TABLE Products (
id INTEGER PRIMARY KEY,
price positive_money NOT NULL
);
Custom domains centralize validation logic and make schemas self-documenting.
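As a quick check against the Customers table above (the error text is assumed from PostgreSQL's format; other systems word it differently):
INSERT INTO Customers (id, email) VALUES (1, 'not-an-email');
-- ERROR: value for domain email_address violates check constraint "email_address_check"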
Document databases like MongoDB are often called 'schema-less,' but they still have type constraints. JSON values have types (string, number, boolean, array, object). The difference is these constraints are checked at access time rather than schema definition time—and different documents can have different structures.
The relational model defines two fundamental integrity constraints that are central to database correctness:
Entity Integrity:
Entity integrity requires that primary key values cannot be NULL. The rationale is fundamental:
Therefore, every tuple must have a known, non-NULL primary key.
-- Entity integrity is automatically enforced for PRIMARY KEY
CREATE TABLE Employees (
id INTEGER PRIMARY KEY, -- Implicitly NOT NULL
name VARCHAR(100)
);
INSERT INTO Employees (id, name) VALUES (NULL, 'Alice');
-- ERROR: null value in column "id" violates not-null constraint
Referential Integrity:
Referential integrity requires that foreign key values must reference existing rows (or be NULL if the relationship is optional). This prevents "dangling references"—pointers to non-existent data.
CREATE TABLE Departments (
id INTEGER PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
CREATE TABLE Employees (
id INTEGER PRIMARY KEY,
name VARCHAR(100) NOT NULL,
department_id INTEGER REFERENCES Departments(id)
);
-- Insert a department
INSERT INTO Departments VALUES (1, 'Engineering');
-- Valid: References existing department
INSERT INTO Employees VALUES (1, 'Alice', 1); -- OK
-- Valid: NULL department (if column allows NULL)
INSERT INTO Employees VALUES (2, 'Bob', NULL); -- OK
-- Invalid: References non-existent department
INSERT INTO Employees VALUES (3, 'Carol', 99);
-- ERROR: insert or update on table "employees" violates
-- foreign key constraint "employees_department_id_fkey"
Many NoSQL databases don't enforce referential integrity, leaving it to the application. This can be acceptable for denormalized data where relationships are embedded, but for data with references (document IDs, etc.), applications must be extremely careful to maintain consistency manually—a significant source of bugs.
CHECK Constraints:
CHECK constraints allow arbitrary boolean conditions on column values:
CREATE TABLE Products (
id INTEGER PRIMARY KEY,
name VARCHAR(100) NOT NULL,
price DECIMAL(10,2) NOT NULL CHECK (price > 0),
discount DECIMAL(5,2) CHECK (discount >= 0 AND discount <= 100),
min_stock INTEGER CHECK (min_stock >= 0),
max_stock INTEGER CHECK (max_stock >= min_stock),
status VARCHAR(20) CHECK (status IN ('active', 'discontinued', 'pending'))
);
-- Table-level CHECK constraint (spans multiple columns)
CREATE TABLE DateRanges (
id INTEGER PRIMARY KEY,
start_date DATE NOT NULL,
end_date DATE NOT NULL,
CONSTRAINT valid_date_range CHECK (end_date >= start_date)
);
Common CHECK Constraint Patterns:
-- Enumeration (limited values)
CHECK (priority IN ('low', 'medium', 'high', 'critical'))
-- Range constraint
CHECK (rating BETWEEN 1 AND 5)
-- Pattern matching (PostgreSQL)
CHECK (phone ~ '^\+?[0-9]{10,15}$')
-- Conditional requirement
CHECK (status != 'shipped' OR ship_date IS NOT NULL)
-- Cross-column validation
CHECK (sale_price <= regular_price)
-- Complex business rule
CHECK (
(account_type = 'premium' AND credit_limit >= 10000) OR
(account_type = 'standard' AND credit_limit <= 5000) OR
(account_type = 'basic' AND credit_limit <= 1000)
)
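To see one of these patterns rejecting data, here is a minimal sketch using a hypothetical Shipments table and the conditional-requirement rule above:
CREATE TABLE Shipments (
id INTEGER PRIMARY KEY,
status VARCHAR(20) NOT NULL,
ship_date DATE,
CHECK (status <> 'shipped' OR ship_date IS NOT NULL) -- Shipped rows need a date
);
INSERT INTO Shipments VALUES (1, 'pending', NULL); -- OK: not shipped yet
INSERT INTO Shipments VALUES (2, 'shipped', NULL); -- Rejected: violates the CHECK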
Assertions (Multi-Table Constraints):
SQL defines ASSERTION for constraints spanning multiple tables, though few databases implement them:
-- Theoretical syntax (rarely supported)
CREATE ASSERTION budget_check
CHECK (
NOT EXISTS (
SELECT d.id
FROM Departments d
WHERE (
SELECT SUM(e.salary)
FROM Employees e
WHERE e.department_id = d.id
) > d.budget
)
);
In practice, multi-table constraints require triggers that run custom checks on each modification, stored procedures that wrap the modifications, application-level validation, or periodic batch checks that detect violations after the fact.
When CHECK constraints aren't powerful enough, triggers can enforce arbitrary complex rules. However, triggers add complexity and can impact performance. Use them sparingly and document thoroughly. Sometimes, accepting eventual consistency and using batch validation is more practical than real-time trigger enforcement.
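For illustration, here is a minimal trigger sketch enforcing the department-budget rule from the ASSERTION above. It assumes Employees has a salary column and Departments a budget column, as that example does, and uses PostgreSQL syntax:
-- Reject any insert/update that would push a department's salary total over its budget
CREATE OR REPLACE FUNCTION check_department_budget() RETURNS trigger AS $$
BEGIN
IF (SELECT COALESCE(SUM(e.salary), 0)
FROM Employees e
WHERE e.department_id = NEW.department_id)
> (SELECT d.budget FROM Departments d WHERE d.id = NEW.department_id) THEN
RAISE EXCEPTION 'Salary total for department % exceeds its budget', NEW.department_id;
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER trg_budget_check
AFTER INSERT OR UPDATE OF salary, department_id ON Employees
FOR EACH ROW EXECUTE FUNCTION check_department_budget();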
Understanding when and how constraints are enforced is crucial for both correctness and performance.
Enforcement Timing:
Constraints can be checked at different times:
Immediate checking: Constraints are verified after each individual statement. This is the default and ensures the database is never in a violated state.
Deferred checking: Constraints are verified only at transaction commit. This allows temporary violations within a transaction that are resolved before commit.
-- Deferred constraint checking (PostgreSQL)
CREATE TABLE Nodes (
id INTEGER PRIMARY KEY,
parent_id INTEGER,
CONSTRAINT fk_parent FOREIGN KEY (parent_id)
REFERENCES Nodes(id) DEFERRABLE INITIALLY DEFERRED
);
BEGIN;
-- Can temporarily violate the constraint
INSERT INTO Nodes (id, parent_id) VALUES (2, 1); -- parent 1 doesn't exist yet!
INSERT INTO Nodes (id, parent_id) VALUES (1, NULL); -- Now parent exists
COMMIT; -- Constraint checked here, passes
Enforcement Mechanisms:
Databases use various mechanisms to enforce constraints:
Index-based enforcement: PRIMARY KEY and UNIQUE constraints are backed by a unique index; an insert or update that would create a duplicate fails when the index entry can't be added.
Lookup-based enforcement: FOREIGN KEY constraints perform a lookup in the referenced table on insert or update, and a lookup in the referencing table when a parent row is deleted or updated.
Expression evaluation: CHECK and NOT NULL constraints evaluate a condition against the candidate row before it is written; a false result rejects the statement (for CHECK, an unknown result counts as satisfied).
Trigger-based enforcement: complex semantic rules run custom validation code in triggers or stored procedures around each modification.
| Constraint Type | Enforcement Cost | Mitigation Strategies |
|---|---|---|
| PRIMARY KEY | Index maintenance on insert | Use efficient key types (integer vs. UUID) |
| UNIQUE | Index lookup + maintenance | Consider partial unique indexes |
| FOREIGN KEY | Lookup on insert; cascade on delete | Index foreign key columns; consider deferred |
| CHECK | Expression evaluation | Keep expressions simple; avoid functions |
| NOT NULL | Very low (null check) | No real mitigation needed |
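One common mitigation from the table, sketched for the Orders table above (PostgreSQL, for example, does not create this index automatically):
-- Index the referencing column so joins and ON DELETE/ON UPDATE checks against
-- Customers don't need to scan the whole Orders table.
CREATE INDEX idx_orders_customer_id ON Orders (customer_id);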
Every constraint requires enforcement work. Heavily constrained tables are slower to write than unconstrained ones. This is a worthwhile trade-off for data integrity, but be aware of it. Batch loads often disable constraints temporarily for performance, then re-enable and validate afterward.
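A hedged sketch of that batch-load pattern in PostgreSQL syntax (the load step itself is elided):
-- Drop the constraint for the load, then re-add it without scanning existing rows
-- (NOT VALID) and validate in a separate step once the load is complete.
ALTER TABLE Orders DROP CONSTRAINT fk_customer;
-- ... bulk load rows here ...
ALTER TABLE Orders
ADD CONSTRAINT fk_customer
FOREIGN KEY (customer_id) REFERENCES Customers(id) NOT VALID;
ALTER TABLE Orders VALIDATE CONSTRAINT fk_customer; -- checks the loaded rows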
The constraint component defines what must always be true about data: the invariants that ensure quality, consistency, and correctness. Key constraints guarantee identity and uniqueness, domain constraints keep values within valid sets, entity and referential integrity keep relationships sound, and CHECK constraints, assertions, and triggers capture business rules, all enforced by the database itself rather than left to application discipline.
What's next:
We've now explored all three components of a data model: structure (what data looks like), operations (what we can do), and constraints (what must be true). The next page examines model evolution—how data models have changed over time and how they continue to evolve to meet new challenges. Understanding this evolution reveals why we have so many models and where the field is heading.
You now understand the constraint component of data models—the laws that govern data behavior. Constraints transform storage into trustworthy databases by ensuring that data is always valid, consistent, and meaningful. Next, we'll explore how data models have evolved over time.