Loading content...
In 1999, Mars Climate Orbiter—a $327.6 million spacecraft—disintegrated in the Martian atmosphere. The cause? One engineering team used metric units (Newton-seconds), while another used imperial units (pound-seconds). The navigation software calculated thrust incorrectly, and the orbiter entered the atmosphere at an angle too steep to survive.
This wasn't a calculation error—it was a data integrity failure. The data (thruster measurements) was syntactically correct but semantically invalid in its context. No constraint prevented the mismatch.
Data integrity is the assurance that data is accurate, consistent, and valid throughout its lifecycle—from creation through all modifications to eventual deletion. While the Mars Orbiter example is dramatic, data integrity failures occur constantly in business systems: invalid phone numbers, impossible dates, negative prices, orphaned records, and violated business rules.
Database Management Systems provide systematic mechanisms to define and enforce integrity, transforming data validation from scattered application code to centralized, guaranteed rules.
By the end of this page, you will understand the four categories of data integrity constraints—entity, referential, domain, and semantic—and how DBMS enforces each. You'll learn to design constraint systems that protect data quality at the database layer, regardless of application behavior.
Data integrity encompasses multiple dimensions of data quality:
DBMS primarily addresses accuracy, consistency, and validity through integrity constraints—declarative rules that the database engine enforces automatically. These constraints form a safety net that catches errors before they corrupt the data.
| Category | Definition | Enforced By | Example |
|---|---|---|---|
| Entity Integrity | Every table has a primary key; no NULL in primary key | PRIMARY KEY constraint | CustomerID cannot be NULL or duplicate |
| Referential Integrity | Foreign keys must reference existing primary keys | FOREIGN KEY constraint | Order.CustomerID must exist in Customer.CustomerID |
| Domain Integrity | Column values must fall within defined valid sets | Data types, CHECK, NOT NULL | Age must be positive integer; Status in ('Active', 'Inactive') |
| Semantic/Business Integrity | Data must satisfy business rules and logic | CHECK constraints, triggers, stored procedures | EndDate must be after StartDate; Discount <= 50% |
Integrity constraints form a hierarchy. Domain constraints are checked first (is this value a valid integer?), then entity constraints (is this key unique?), then referential constraints (does this reference exist?), and finally semantic constraints (does this satisfy business logic?). A value must pass all applicable constraints to be accepted.
Entity integrity ensures that every row in a table is uniquely identifiable and that the identifying attributes (the primary key) are always present and unique. This fundamental constraint provides the foundation for all other integrity mechanisms.
Entity Integrity Rule:
No attribute of a primary key may be NULL, and every primary key value must be unique within the table.
This rule stems from the relational model's requirement that every tuple be distinguishable. If primary keys could be NULL or duplicate, we couldn't reliably identify which row to update, delete, or reference.
123456789101112131415161718192021222324252627282930313233343536
-- Entity Integrity: Primary Key Examples -- Simple auto-increment surrogate keyCREATE TABLE Customers ( CustomerID INT PRIMARY KEY AUTO_INCREMENT, -- Never NULL, always unique Email VARCHAR(255) NOT NULL UNIQUE, -- Natural key as alternate Name VARCHAR(100) NOT NULL); -- Composite primary keyCREATE TABLE OrderItems ( OrderID INT NOT NULL, ProductID INT NOT NULL, Quantity INT NOT NULL, UnitPrice DECIMAL(10,2) NOT NULL, PRIMARY KEY (OrderID, ProductID) -- Combination must be unique); -- UUID for distributed systems (no sequence coordination needed)CREATE TABLE DistributedEvents ( EventID UUID PRIMARY KEY DEFAULT gen_random_uuid(), EventType VARCHAR(50) NOT NULL, Payload JSONB, CreatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP); -- DBMS enforces entity integrity automatically:-- This FAILS: Duplicate primary keyINSERT INTO Customers (CustomerID, Email, Name) VALUES (1, 'a@b.com', 'Alice');INSERT INTO Customers (CustomerID, Email, Name) VALUES (1, 'c@d.com', 'Bob');-- Error: Duplicate key value violates unique constraint -- This FAILS: NULL primary keyINSERT INTO OrderItems (OrderID, ProductID, Quantity, UnitPrice)VALUES (NULL, 100, 5, 29.99);-- Error: NULL value in column "orderid" violates not-null constraintFor most applications, surrogate keys (auto-generated IDs) are preferred. Natural keys (like SSN or email) can change, may have privacy implications, and may not exist when the record is first created. Surrogate keys provide stable, meaningless identifiers that never need to change.
Referential integrity ensures that relationships between tables remain valid. When a table has a foreign key referencing another table's primary key, referential integrity guarantees that:1
Without referential integrity, you could have orders for non-existent customers, enrollments in non-existent courses, or payments to non-existent accounts. The database would contain meaningless references—"orphaned" records pointing to nothing.
| On Delete/Update | Behavior | Use Case |
|---|---|---|
| RESTRICT / NO ACTION | Prevent the operation if referenced | Cannot delete customer with active orders |
| CASCADE | Propagate the operation to child records | Delete order → delete all order items |
| SET NULL | Set foreign key to NULL | Manager leaves → employees have no manager |
| SET DEFAULT | Set foreign key to default value | Category deleted → products go to 'Uncategorized' |
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768
-- Demonstrating all referential integrity actions -- Parent tablesCREATE TABLE Departments ( DeptID INT PRIMARY KEY, DeptName VARCHAR(100) NOT NULL); CREATE TABLE Managers ( ManagerID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL); -- Child table with various FK actionsCREATE TABLE Employees ( EmpID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL, -- CASCADE: If department deleted, employee records also deleted -- Dangerous! Use when child has no meaning without parent DeptID INT REFERENCES Departments(DeptID) ON DELETE CASCADE ON UPDATE CASCADE, -- SET NULL: If manager leaves, employee.ManagerID becomes NULL -- Safe default for optional relationships ManagerID INT REFERENCES Managers(ManagerID) ON DELETE SET NULL ON UPDATE CASCADE); -- Example: Complex order system with appropriate actionsCREATE TABLE Customers ( CustomerID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL, Status VARCHAR(20) DEFAULT 'Active'); CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT NOT NULL, OrderDate DATE NOT NULL, -- RESTRICT: Cannot delete customer with orders (business rule) FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID) ON DELETE RESTRICT ON UPDATE CASCADE); CREATE TABLE OrderItems ( OrderItemID INT PRIMARY KEY, OrderID INT NOT NULL, ProductID INT NOT NULL, Quantity INT NOT NULL, -- CASCADE: Deleting order removes all its items (logical) FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) ON DELETE CASCADE); -- Practical demonstration:DELETE FROM Departments WHERE DeptID = 10;-- Cascades: All employees in Dept 10 are also deleted! DELETE FROM Managers WHERE ManagerID = 5;-- Sets NULL: Employees who reported to Manager 5 now have ManagerID = NULL DELETE FROM Customers WHERE CustomerID = 100;-- Blocked! RESTRICT prevents deletion if orders exist-- Error: update or delete violates foreign key constraintON DELETE CASCADE is powerful but dangerous. A single delete can trigger a chain reaction removing thousands of records. Use CASCADE only when child records truly have no meaning without the parent (order items without orders, comments without posts). For important data, prefer RESTRICT and handle deletions explicitly in application logic.
Domain integrity ensures that values in a column fall within the acceptable set of values for that column's domain. This includes:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869
-- Domain Integrity: Comprehensive Example CREATE TABLE Products ( ProductID INT PRIMARY KEY, -- NOT NULL: Required field Name VARCHAR(200) NOT NULL, -- Data type constraint: SKU must be exactly 8 characters SKU CHAR(8) NOT NULL UNIQUE, -- Range constraint: Price must be positive Price DECIMAL(10,2) NOT NULL CHECK (Price > 0), -- Range constraint: Discount percentage bounded DiscountPercent DECIMAL(5,2) DEFAULT 0 CHECK (DiscountPercent >= 0 AND DiscountPercent <= 50), -- Enumeration constraint: Status from defined set Status VARCHAR(20) NOT NULL DEFAULT 'Draft' CHECK (Status IN ('Draft', 'Active', 'Discontinued', 'OutOfStock')), -- Range constraint: Stock cannot be negative StockQuantity INT NOT NULL DEFAULT 0 CHECK (StockQuantity >= 0), -- Format constraint: Weight must be positive if provided WeightKg DECIMAL(8,3) CHECK (WeightKg IS NULL OR WeightKg > 0), -- Date constraint: Created date cannot be in future CreatedAt TIMESTAMP DEFAULT CURRENT_TIMESTAMP CHECK (CreatedAt <= CURRENT_TIMESTAMP)); CREATE TABLE Employees ( EmpID INT PRIMARY KEY, -- Email format using pattern matching (PostgreSQL) Email VARCHAR(255) NOT NULL UNIQUE CHECK (Email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'), -- Phone format constraint Phone VARCHAR(20) CHECK (Phone ~ '^\+?[0-9]{10,15}$'), -- Hire date cannot be in future HireDate DATE NOT NULL CHECK (HireDate <= CURRENT_DATE), -- Birth date must be reasonable for employment BirthDate DATE CHECK (BirthDate >= '1940-01-01' AND BirthDate <= CURRENT_DATE - INTERVAL '16 years'), -- Salary range by business policy Salary DECIMAL(12,2) CHECK (Salary IS NULL OR (Salary >= 30000 AND Salary <= 500000))); -- DBMS enforces domain integrity:-- This FAILS: Invalid email formatINSERT INTO Employees (EmpID, Email, HireDate) VALUES (1, 'not-an-email', '2024-01-15');-- Error: new row violates check constraint "employees_email_check" -- This FAILS: Price must be positiveINSERT INTO Products (ProductID, Name, SKU, Price) VALUES (1, 'Widget', 'ABC12345', -5.99);-- Error: new row violates check constraint "products_price_check" -- This FAILS: Status not in valid setINSERT INTO Products (ProductID, Name, SKU, Price, Status) VALUES (2, 'Gadget', 'DEF67890', 29.99, 'Invalid');-- Error: new row violates check constraint "products_status_check"PostgreSQL supports CREATE DOMAIN to define reusable types with constraints. Define 'email_address', 'phone_number', or 'percentage' domains once, then use them across tables. Changes to the domain definition automatically apply everywhere, ensuring consistent validation.
Semantic integrity (also called business integrity) ensures data satisfies complex business rules that span multiple columns, tables, or require procedural logic. These rules reflect real-world constraints that can't be expressed with simple CHECK constraints.
Examples of semantic constraints:
| Mechanism | Scope | Use Case | Complexity |
|---|---|---|---|
| CHECK constraint | Single row | EndDate > StartDate | Low |
| TRIGGER | Before/after INSERT/UPDATE/DELETE | Cross-table validation, audit logging | Medium |
| Stored Procedure | Encapsulated operations | Complex business transactions | Medium-High |
| Application Layer | Full flexibility | Rules requiring external data | Highest |
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899
-- Semantic Integrity: Cross-Column Constraints -- Simple cross-column CHECK constraintCREATE TABLE Projects ( ProjectID INT PRIMARY KEY, Name VARCHAR(200) NOT NULL, StartDate DATE NOT NULL, EndDate DATE, Budget DECIMAL(15,2) NOT NULL, SpentAmount DECIMAL(15,2) DEFAULT 0, -- Cross-column constraint: EndDate after StartDate CHECK (EndDate IS NULL OR EndDate > StartDate), -- Business rule: Cannot overspend budget CHECK (SpentAmount <= Budget)); -- Self-referential constraint: Employee cannot manage themselvesCREATE TABLE Employees ( EmpID INT PRIMARY KEY, Name VARCHAR(100) NOT NULL, ManagerID INT REFERENCES Employees(EmpID), -- Cannot be your own manager CHECK (ManagerID IS NULL OR ManagerID != EmpID)); -- TRIGGER for complex validation (PostgreSQL)CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT NOT NULL, TotalAmount DECIMAL(12,2), DiscountAmount DECIMAL(12,2) DEFAULT 0); CREATE TABLE CustomerTiers ( CustomerID INT PRIMARY KEY, TierName VARCHAR(50), MaxDiscountPercent DECIMAL(5,2)); -- Trigger to enforce tier-based discount limitsCREATE OR REPLACE FUNCTION validate_order_discount()RETURNS TRIGGER AS $$DECLARE max_discount DECIMAL(12,2); customer_max_percent DECIMAL(5,2);BEGIN -- Get customer's maximum allowed discount percentage SELECT MaxDiscountPercent INTO customer_max_percent FROM CustomerTiers WHERE CustomerID = NEW.CustomerID; -- Calculate maximum allowed discount amount max_discount := NEW.TotalAmount * (COALESCE(customer_max_percent, 10) / 100); -- Validate IF NEW.DiscountAmount > max_discount THEN RAISE EXCEPTION 'Discount % exceeds customer tier limit of %', NEW.DiscountAmount, max_discount; END IF; RETURN NEW;END;$$ LANGUAGE plpgsql; CREATE TRIGGER check_discount_before_insert BEFORE INSERT OR UPDATE ON Orders FOR EACH ROW EXECUTE FUNCTION validate_order_discount(); -- TRIGGER for inventory validationCREATE TABLE Inventory ( ProductID INT PRIMARY KEY, QuantityOnHand INT NOT NULL CHECK (QuantityOnHand >= 0)); CREATE OR REPLACE FUNCTION validate_order_inventory()RETURNS TRIGGER AS $$DECLARE available INT;BEGIN SELECT QuantityOnHand INTO available FROM Inventory WHERE ProductID = NEW.ProductID; IF available < NEW.Quantity THEN RAISE EXCEPTION 'Insufficient inventory. Requested: %, Available: %', NEW.Quantity, available; END IF; -- Decrement inventory UPDATE Inventory SET QuantityOnHand = QuantityOnHand - NEW.Quantity WHERE ProductID = NEW.ProductID; RETURN NEW;END;$$ LANGUAGE plpgsql;There's ongoing debate about where business rules belong. Database constraints are guaranteed regardless of access path but can be harder to test and deploy. Application rules are more flexible but can be bypassed. Best practice: Enforce core data integrity in the database; implement complex business logic in the application but with database constraints as a safety net.
Beyond individual constraints, DBMS ensures transactional integrity through ACID properties. These guarantees ensure that groups of operations either complete entirely or have no effect, maintaining database consistency even during failures.
ACID Properties:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546
-- ACID in Action: Bank Transfer Example -- Account balance tableCREATE TABLE Accounts ( AccountID INT PRIMARY KEY, CustomerName VARCHAR(100) NOT NULL, Balance DECIMAL(15,2) NOT NULL CHECK (Balance >= 0) -- No overdraft); INSERT INTO Accounts VALUES (1001, 'Alice', 5000.00);INSERT INTO Accounts VALUES (1002, 'Bob', 3000.00); -- ATOMIC Transaction: Transfer $1000 from Alice to BobBEGIN TRANSACTION; -- Step 1: Debit Alice's accountUPDATE Accounts SET Balance = Balance - 1000 WHERE AccountID = 1001; -- Step 2: Credit Bob's accountUPDATE Accounts SET Balance = Balance + 1000 WHERE AccountID = 1002; -- ATOMICITY: If we crash here, both updates are rolled back-- CONSISTENCY: CHECK constraint ensures no negative balance-- ISOLATION: Other transactions see either old or new balances, never partial COMMIT;-- DURABILITY: After commit, these changes survive any crash -- What happens if Alice only has $500?BEGIN TRANSACTION; UPDATE Accounts SET Balance = Balance - 1000 WHERE AccountID = 1001;-- This statement FAILS due to CHECK constraint (Balance >= 0)-- Transaction is rolled back automatically-- Alice's balance remains $5000 (now $4000 after successful transfer) -- ATOMICITY ensures Bob never receives money that Alice couldn't send -- Isolation Levels affect integrity:SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;-- Strongest isolation: Transactions appear to run one after another-- Prevents all anomalies but has performance cost SET TRANSACTION ISOLATION LEVEL READ COMMITTED;-- Default in PostgreSQL: Each query sees committed data-- Good balance of isolation and performanceWithout ACID guarantees, applications would need to handle recovery manually: tracking partial failures, implementing rollback logic, managing concurrent access. DBMS provides these guarantees transparently, allowing developers to focus on business logic while the system handles reliability.
In file-based systems, data integrity was primarily an application responsibility. Each program validated its own input, maintained its own consistency checks, and hoped other programs did the same. This approach had fundamental problems:
The transition from file-based to DBMS systems in the 1970s-80s was largely driven by integrity failures. As systems scaled and became mission-critical, the ad-hoc validation approach became untenable. DBMS integrity mechanisms were a direct response to real operational disasters.
Data integrity is not optional—it's the foundation of trustworthy data systems. DBMS provides comprehensive mechanisms to define and enforce integrity at multiple levels. Let's consolidate the key concepts:
What's Next:
We've covered how DBMS maintains data quality for single-user access. But real systems serve many users simultaneously. How does DBMS handle multiple users reading and writing the same data at the same time without corruption? The next page explores Concurrent Access—the mechanisms that enable safe simultaneous database access.
You now understand how DBMS enforces data integrity through constraints at multiple levels—from basic data types to complex business rules. This systematic approach to validation ensures data quality regardless of how the database is accessed.