By the mid-1980s, the term "relational database" had become a marketing buzzword. Vendors slapped the label on products that were only superficially relational—databases that might use tables for display but lacked the mathematical foundations that gave the relational model its power.
E.F. Codd, the inventor of the relational model, watched this corruption of his work with growing concern. In 1985, he published a remarkable article in Computerworld that would become one of the most influential pieces in database history: "Is Your DBMS Really Relational?"
In this article and its sequel "Does Your DBMS Run By the Rules?", Codd articulated twelve rules (actually thirteen, numbered 0-12) that a database management system must satisfy to be called fully relational. These rules weren't arbitrary requirements—they were precise translations of the relational model's mathematical foundations into practical implementation criteria.
More than requirements, Codd's rules became a benchmark for evaluating database systems and a roadmap for database development. Let's examine each rule in detail.
By the end of this page, you will understand each of Codd's 12 rules (plus Rule 0), the theoretical rationale behind each rule, how real database systems comply (or fail to comply) with these rules, and why these rules remain relevant benchmarks today.
For any system that is advertised as, or claimed to be, a relational database management system, that system must be able to manage databases entirely through its relational capabilities.
Interpretation
Rule 0 is the meta-rule that establishes the context for all other rules. It states that a system claiming to be relational must use relational features—not non-relational workarounds—to manage data.
This might seem obvious, but consider systems that require proprietary utilities for bulk loading, external tools for statistics or maintenance, or non-relational interfaces for administration.
If any essential database operation requires stepping outside the relational paradigm, the system isn't fully relational.
Why This Matters
Rule 0 ensures that the benefits of the relational model—data independence, declarative querying, optimization—apply to ALL database operations, not just simple ones. A system that's relational "except for" certain cases forces users back into the complexity the relational model was designed to eliminate.
System A:
- Data queries: SQL ✓
- Data modification: SQL ✓
- Schema changes: SQL ✓
- Backup/recovery: SQL commands ✓
- Permission management: SQL (GRANT/REVOKE) ✓
- Metadata access: SQL (information_schema) ✓

System A Verdict: COMPLIANT with Rule 0
All database management uses relational capabilities.
System B:
- Data queries: SQL ✓
- Bulk loading: Proprietary binary format ✗
- Statistics: Must run external utility ✗
System B Verdict: PARTIAL compliance only

All information in a relational database is represented explicitly at the logical level and in exactly one way—by values in tables.
Interpretation
This is perhaps the most fundamental rule. It establishes that all information, user data and metadata alike, lives in tables as explicit values: no hidden pointers, no physical orderings, no out-of-band structures carrying meaning.
The Single Representation Principle
Pre-relational databases used different representations for different data: pointers for relationships, record ordering for sequence, and separate data dictionaries for metadata.
The relational model unifies everything: tables all the way down.
Implications
```sql
-- The information rule in action: metadata as tables
-- Query the system catalog just like any other table

-- List all tables in PostgreSQL
SELECT table_name, table_type
FROM information_schema.tables
WHERE table_schema = 'public';

-- List all columns with their types
SELECT table_name, column_name, data_type, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public';

-- List all foreign key relationships
SELECT tc.table_name AS referencing_table,
       kcu.column_name AS referencing_column,
       ccu.table_name AS referenced_table,
       ccu.column_name AS referenced_column
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
  ON tc.constraint_name = ccu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY';

-- All the above use standard SQL against standard tables!
```

The system catalog (information_schema, pg_catalog, etc.) IS a relational database about your database. This self-describing property enables tools like schema browsers, ORM introspection, and automatic documentation generators—all using the same SQL skills you use for data.
Each and every datum (atomic value) in a relational database is guaranteed to be logically accessible by resorting to a combination of table name, primary key value, and column name.
Interpretation
This rule guarantees addressability. Every piece of data can be located precisely through three coordinates: the table name, the primary key value that identifies the row, and the column name.
No Navigation Required
In hierarchical or network databases, accessing data might require navigating parent-child pointers, following a predefined access path, and knowing the physical structure of the database.
Rule 2 eliminates this. You specify WHAT you want, not HOW to get there.
Primary Keys are Mandatory
Rule 2 implicitly requires that every table has a primary key. Without a reliable way to identify individual rows, you cannot guarantee access to specific values.
This is why database design always emphasizes key selection—it's not just good practice, it's fundamental to relational access.
To access Alice Chen's salary from the Employee table:
Coordinates:
- Table: Employee
- Primary Key: emp_id = 1001
- Column: salary
SQL Query:
SELECT salary FROM Employee WHERE emp_id = 1001;

Result: 95000.00
This access is:
✓ Guaranteed to work (if the key exists)
✓ Independent of physical storage
✓ Same syntax regardless of table size
✓ No navigation path needed
Compare to hierarchical navigation:
FIND Department WHERE name = 'Engineering'
THEN FIND Employee UNDER Department WHERE emp_id = 1001
THEN GET salary
The relational approach is simpler AND more powerful.

SQL allows tables without primary keys, which technically violates Rule 2. While the DBMS permits this, it creates ambiguity—if duplicate rows exist, you cannot uniquely identify values. Best practice (and Rule 2 compliance) requires defining a primary key for every table.
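The note above can be verified directly. A minimal sketch using Python's built-in sqlite3 module (table and column names are illustrative) shows why a missing primary key breaks guaranteed access:

```python
import sqlite3

# Illustrative sketch (SQLite): without a primary key, duplicate rows are
# indistinguishable and Rule 2's guaranteed access fails.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE no_pk (name TEXT, salary REAL)")  # no PRIMARY KEY
conn.execute("INSERT INTO no_pk VALUES ('Alice', 95000), ('Alice', 95000)")

# Any predicate matches BOTH duplicates; there is no way to address just one:
n_dup = conn.execute(
    "UPDATE no_pk SET salary = 99000 WHERE name = 'Alice'").rowcount
print(n_dup)  # 2

# With a primary key, table + key + column uniquely addresses one value:
conn.execute("CREATE TABLE with_pk "
             "(emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO with_pk VALUES (1001, 'Alice', 95000),"
             " (1002, 'Alice', 95000)")
n_keyed = conn.execute(
    "UPDATE with_pk SET salary = 99000 WHERE emp_id = 1001").rowcount
print(n_keyed)  # 1
```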
NULL values (distinct from the empty character string or a string of blank characters and distinct from zero or any other number) are supported in fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type.
Interpretation
Rule 3 addresses a critical real-world problem: sometimes data is unknown or doesn't apply. Codd recognized that a principled solution was essential.
NULL is NOT: zero, an empty string, a string of blanks, or any other ordinary value of the column's type.

NULL IS: a distinct marker, independent of data type, for information that is missing (unknown) or inapplicable.
Systematic Treatment
NULL must be handled consistently across all operations: comparisons involving NULL yield UNKNOWN, arithmetic with NULL yields NULL, aggregates skip NULLs, and sorting and grouping treat NULLs by defined rules.
This consistent treatment is what "systematic" means—NULL behavior is defined, not arbitrary.
| Expression | Result | Explanation |
|---|---|---|
| NULL = NULL | UNKNOWN | We don't know if two unknowns are equal |
| NULL <> NULL | UNKNOWN | We don't know if two unknowns differ |
| 5 > NULL | UNKNOWN | We can't compare known to unknown |
| NULL AND TRUE | UNKNOWN | Unknown AND anything that isn't FALSE = Unknown |
| NULL AND FALSE | FALSE | Regardless of unknown, FALSE wins |
| NULL OR TRUE | TRUE | Regardless of unknown, TRUE wins |
| NULL OR FALSE | UNKNOWN | Result depends on the unknown value |
| NOT NULL | UNKNOWN | Negation of unknown is unknown |
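The truth table above can be checked against a real engine. A small sketch using Python's sqlite3 module (any SQL engine behaves the same way; UNKNOWN surfaces as SQL NULL, which Python reads as None):

```python
import sqlite3

# Sketch: verify three-valued logic by evaluating scalar SQL expressions.
conn = sqlite3.connect(":memory:")

def evaluate(expr):
    """Evaluate a scalar SQL expression; None means UNKNOWN."""
    return conn.execute("SELECT " + expr).fetchone()[0]

results = {
    "NULL = NULL": evaluate("NULL = NULL"),  # None (UNKNOWN)
    "5 > NULL": evaluate("5 > NULL"),        # None (UNKNOWN)
    "NULL AND 0": evaluate("NULL AND 0"),    # 0    (FALSE wins)
    "NULL OR 1": evaluate("NULL OR 1"),      # 1    (TRUE wins)
    "NOT NULL": evaluate("NOT NULL"),        # None (UNKNOWN)
}
print(results)
```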
```sql
-- Proper NULL handling in SQL

-- Testing for NULL (= doesn't work!)
SELECT * FROM Employee WHERE manager_id IS NULL;  -- Correct
SELECT * FROM Employee WHERE manager_id = NULL;   -- WRONG! Returns nothing

-- COALESCE: provide default for NULL
SELECT name, COALESCE(manager_id, 0) AS mgr FROM Employee;

-- NULLIF: return NULL if values match
SELECT NULLIF(bonus, 0) AS actual_bonus  -- Treat 0 bonus as "no bonus"
FROM Employee;

-- NULL in aggregates
SELECT COUNT(*) AS total_rows,      -- Counts all rows including NULL
       COUNT(bonus) AS with_bonus,  -- Counts non-NULL bonus values
       SUM(bonus) AS total_bonus,   -- Sums non-NULL values
       AVG(bonus) AS avg_bonus      -- Average of non-NULL values only
FROM Employee;

-- CASE with NULL
SELECT name,
       CASE
         WHEN bonus IS NULL THEN 'No bonus assigned'
         WHEN bonus = 0 THEN 'Zero bonus'
         ELSE 'Has bonus: ' || bonus::text
       END AS bonus_status
FROM Employee;
```

NULL introduces three-valued logic, which complicates reasoning. Many subtle bugs arise from unexpected NULL behavior: WHERE x <> 5 doesn't return rows where x IS NULL. Join conditions with NULL never match. Aggregate functions behave differently. Understanding NULL semantics is crucial for correct SQL.
The database description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.
Interpretation
Rule 4 extends Rule 1 specifically for metadata. The database catalog (schema information) must be: stored as tables, queryable with the same relational language used for ordinary data, and accessible to authorized users.
The System Catalog
Modern databases implement this through system catalogs:
- PostgreSQL: pg_catalog schema, information_schema views
- MySQL: information_schema, mysql database
- SQL Server: sys schema, INFORMATION_SCHEMA views
- Oracle: DBA_*, ALL_*, USER_* views

The SQL standard defines INFORMATION_SCHEMA with standard table structures for portability.
```sql
-- Standard INFORMATION_SCHEMA queries (SQL standard, cross-database)

-- List all user tables
SELECT table_name, table_type
FROM information_schema.tables
WHERE table_schema NOT IN ('pg_catalog', 'information_schema');

-- Find all columns with their constraints
SELECT t.table_name, c.column_name, c.data_type,
       c.character_maximum_length, c.is_nullable, c.column_default
FROM information_schema.tables t
JOIN information_schema.columns c
  ON t.table_name = c.table_name
WHERE t.table_schema = 'public'
ORDER BY t.table_name, c.ordinal_position;

-- Find all primary keys
SELECT tc.table_name,
       string_agg(kcu.column_name, ', ') AS pk_columns
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
  ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'PRIMARY KEY'
GROUP BY tc.table_name;

-- Check which tables reference a given table
SELECT tc.table_name AS referencing_table,
       ccu.table_name AS referenced_table
FROM information_schema.table_constraints tc
JOIN information_schema.constraint_column_usage ccu
  ON tc.constraint_name = ccu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
  AND ccu.table_name = 'employee';  -- find refs to 'employee'
```

Rule 4 compliance enables: schema comparison tools, automatic documentation generators, ORM schema introspection, migration script generators, and dependency analysis. All these tools work because metadata IS data, queryable with standard SQL.
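The same catalog-as-data principle appears even in embedded engines. A minimal sketch using Python's sqlite3 module: SQLite exposes its catalog as the sqlite_master table rather than information_schema, but it is queried with ordinary SQL just the same (object names here are illustrative):

```python
import sqlite3

# Sketch: the catalog itself is an ordinary queryable table in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE VIEW names AS SELECT name FROM employee")

# Standard SQL against the catalog, exactly like any user table:
rows = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(rows)  # [('table', 'employee'), ('view', 'names')]
```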
A relational system may support several languages and various modes of terminal use. However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all of the following items: data definition, view definition, data manipulation, integrity constraints, authorization, and transaction boundaries.
Interpretation
Rule 5 requires a unified language capable of all database operations. While multiple interfaces may exist (graphical tools, APIs, embedded SQL), at least one complete language must exist.
SQL Fulfills This Rule
SQL is the comprehensive data sublanguage that modern databases use. A single text-based language covers data definition (DDL), view definition, data manipulation (DML), integrity constraints, authorization (DCL), and transaction control (TCL):
| Category | Purpose | Key Statements |
|---|---|---|
| DDL | Define structure | CREATE, ALTER, DROP, TRUNCATE |
| DML | Manipulate data | SELECT, INSERT, UPDATE, DELETE, MERGE |
| DCL | Control access | GRANT, REVOKE |
| TCL | Control transactions | BEGIN, COMMIT, ROLLBACK, SAVEPOINT |
| Views | Define virtual tables | CREATE VIEW, CREATE MATERIALIZED VIEW |
| Constraints | Enforce integrity | PRIMARY KEY, FOREIGN KEY, CHECK |
Character String Representation
Notably, Codd specified that statements must be expressible as character strings. This ensures that statements can be stored, logged, generated by programs, transmitted over networks, and embedded in host languages.
This is why SQL is text-based rather than a graphical or binary protocol.
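This text-based property is easy to see from a host language. A hedged sketch using Python's sqlite3 module (SQLite has no GRANT/REVOKE, so authorization is omitted; all names are illustrative): every category of operation flows through one call as a plain string.

```python
import sqlite3

# Sketch: each statement category is just a character string fed through
# the same text interface (Rule 5).
conn = sqlite3.connect(":memory:")
statements = [
    # DDL with integrity constraints
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL, salary REAL CHECK (salary > 0))",
    # View definition
    "CREATE VIEW high_paid AS SELECT name FROM employee WHERE salary > 90000",
    # Data manipulation
    "INSERT INTO employee VALUES (1001, 'Alice', 95000)",
    "UPDATE employee SET salary = 96000 WHERE emp_id = 1001",
]
for sql in statements:  # plain strings: storable, loggable, generatable
    conn.execute(sql)

high_paid = conn.execute("SELECT name FROM high_paid").fetchall()
print(high_paid)  # [('Alice',)]
```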
While SQL is the standard comprehensive sublanguage, Rule 5 allows additional languages too. Many databases support JSON query languages (PostgreSQL's jsonpath), graph queries (Cypher via extensions), or specialized analytical extensions (window functions). These supplement but don't replace SQL's comprehensive role.
All views that are theoretically updatable are also updatable by the system.
Interpretation
Views provide logical data independence by presenting data differently than base tables. But if views are read-only, they're limited. Rule 6 requires that any view which CAN be updated (mathematically) SHOULD be updatable.
When is a View Updatable?
A view is potentially updatable when updates can be unambiguously translated to base table updates.

UPDATABLE views typically: draw from a single base table, include that table's primary key, and avoid DISTINCT, GROUP BY, and aggregate functions.

NON-UPDATABLE views typically: join multiple tables, aggregate rows (GROUP BY, COUNT, AVG), apply DISTINCT, or combine queries with set operations such as UNION.
```sql
-- UPDATABLE VIEW
-- Simple projection of base table
CREATE VIEW active_employees AS
SELECT emp_id, name, department, salary
FROM employee
WHERE is_active = true;

-- This is updatable:
UPDATE active_employees
SET salary = 50000
WHERE emp_id = 1001;
-- Translates to:
-- UPDATE employee
-- SET salary = 50000
-- WHERE emp_id = 1001;

-- This INSERT works too:
INSERT INTO active_employees (emp_id, name, department, salary)
VALUES (1005, 'Eve', 'Sales', 60000);
-- (is_active defaults or uses CHECK OPTION)

-- NON-UPDATABLE VIEW
-- Aggregation makes updates ambiguous
CREATE VIEW dept_stats AS
SELECT department,
       COUNT(*) AS emp_count,
       AVG(salary) AS avg_salary
FROM employee
GROUP BY department;

-- This CANNOT be updated:
UPDATE dept_stats
SET avg_salary = 80000
WHERE department = 'Engineering';
-- ERROR: How do we change individual
-- salaries to achieve avg of 80000?

-- Similarly ambiguous:
INSERT INTO dept_stats
VALUES ('Marketing', 5, 70000);
-- ERROR: What are the 5 employees?
```

No database fully complies with Rule 6. The mathematical definition of 'theoretically updatable' is complex, and many edge cases exist. Modern databases support updatable views for simple cases and allow INSTEAD OF triggers for complex cases. This remains an area where theory exceeds practice.
The capability of handling a base relation or a derived relation (view) as a single operand applies not only to the retrieval of data but also to the insertion, update, and deletion of data.
Interpretation
Rule 7 requires set-at-a-time processing: the ability to operate on multiple rows as a single logical operation, not just row-by-row processing.
Set vs Procedural Processing
Pre-relational databases often required:
OPEN CURSOR
FOR EACH ROW:
READ ROW
MODIFY VALUES
WRITE ROW
CLOSE CURSOR
Relational databases allow:
UPDATE Employee SET salary = salary * 1.10 WHERE department = 'Engineering';
The entire set of matching rows is updated as ONE operation.
Benefits of Set Operations

Set operations are declarative (you state the result, not the procedure), atomic (the whole change succeeds or fails together), and optimizer-friendly, which typically makes them far faster than row-by-row loops.
```sql
-- Rule 7: Set-at-a-time operations

-- Insert multiple rows in one statement
INSERT INTO employee (emp_id, name, department, salary)
VALUES (1001, 'Alice', 'Engineering', 95000),
       (1002, 'Bob', 'Marketing', 78000),
       (1003, 'Carol', 'Engineering', 102000);

-- Update all matching rows at once
UPDATE employee
SET salary = salary * 1.10
WHERE department = 'Engineering'
  AND last_review_date < '2023-01-01';
-- All matching employees get raises in ONE operation

-- Delete all matching rows at once
DELETE FROM employee
WHERE termination_date < CURRENT_DATE - INTERVAL '7 years';
-- All terminated employees beyond retention period deleted

-- Insert from query result (set-based)
INSERT INTO employee_archive
SELECT * FROM employee WHERE is_archived = true;
-- Could move thousands of rows in one operation

-- Complex set update
UPDATE products p
SET price = price * 0.9
WHERE p.category_id IN (
    SELECT c.id FROM categories c WHERE c.name = 'Clearance'
);
-- Discount all clearance products at once
```

A common anti-pattern is using cursors to loop through rows when a set-based approach would work. While cursors are sometimes necessary (complex procedural logic), prefer set operations when possible. They're not just more elegant—they're typically much faster because the optimizer can work with the entire operation.
Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.
Interpretation
Physical data independence means applications don't break when the physical storage layer changes. You can add or drop indexes, move tables to different storage media, enable compression, partition tables, or reorganize file layouts, all WITHOUT modifying application code.
The Abstraction Boundary
Physical data independence is achieved by maintaining a clean boundary:
Application ← SQL → Logical Schema ← Mapping → Physical Storage
Applications see only the logical view (tables, columns, constraints). The database handles the mapping to physical storage internally.
| Physical Change | Purpose | App Impact |
|---|---|---|
| Add B-tree index | Speed up lookups | None (faster queries only) |
| Move table to SSD | Improve I/O performance | None (transparent) |
| Enable compression | Reduce storage | None (handled internally) |
| Partition a table | Manageability, performance | None (partitioning is transparent) |
| Change from heap to clustered | Optimize range scans | None (storage detail) |
| Add read replicas | Scale reads | None (or minimal config change) |
| Upgrade RAID configuration | Redundancy | None (OS/hardware level) |
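The table above can be demonstrated concretely. A sketch using Python's sqlite3 module (schema and index names are illustrative): the same query text returns the same result before and after an index is added, while EXPLAIN QUERY PLAN shows the access path changing underneath.

```python
import sqlite3

# Sketch of Rule 8: a physical change (adding an index) alters the access
# path but not the query text or its result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY,"
             " customer_id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i % 100) for i in range(1000)])

query = "SELECT COUNT(*) FROM orders WHERE customer_id = 42"
before = conn.execute(query).fetchone()[0]
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# Physical change only; no application SQL is touched:
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

after = conn.execute(query).fetchone()[0]
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(before, after)              # 10 10 (identical results)
print(plan_before != plan_after)  # True (scan became an index lookup)
```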
Application Query (unchanged):
SELECT * FROM orders WHERE customer_id = 12345;
Physical Storage Evolution:
Year 1: Single disk, no index
→ Full table scan, 500ms
Year 2: Added B-tree index on customer_id
→ Index lookup, 5ms
Year 3: Table partitioned by date, on SSD
→ Partition pruning + index, 2ms
Year 4: Hot data in memory, cold on disk
  → In-memory lookup, 0.5ms

Result: Application code never changed!
The exact same SELECT statement became 1000x faster
through physical optimizations alone.
Physical data independence enabled this evolution
without any application modifications.

When applications use index hints (FORCE INDEX, USE INDEX), they're violating physical data independence—the application now depends on specific indexes existing. Use hints sparingly and document them; they create coupling between application and physical schema.
Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.
Interpretation
Logical data independence is about protecting applications from schema changes that don't remove information. If you reorganize tables in a way that preserves all the data and relationships, applications should continue working.
"Information-Preserving" Changes
These changes reorganize without losing information: splitting a table into several tables that join back to the original, merging tables, adding columns, or renaming structures behind a compatibility view.
Achieving Logical Independence with Views
Views are the primary mechanism for logical data independence:
```sql
-- BEFORE: Original table structure
CREATE TABLE person (
    id INT PRIMARY KEY,
    name VARCHAR(100),
    home_address VARCHAR(200),
    work_address VARCHAR(200),
    home_phone VARCHAR(20),
    work_phone VARCHAR(20)
);

-- Applications use: SELECT name, home_address FROM person;

-- AFTER: Normalized redesign (Rule 9 scenario)
-- Split addresses and phones into separate tables

CREATE TABLE person_v2 (
    id INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE address (
    id INT PRIMARY KEY,
    person_id INT REFERENCES person_v2(id),
    address_type VARCHAR(10),  -- 'home' or 'work'
    address VARCHAR(200)
);

CREATE TABLE phone (
    id INT PRIMARY KEY,
    person_id INT REFERENCES person_v2(id),
    phone_type VARCHAR(10),
    phone_number VARCHAR(20)
);

-- COMPATIBILITY VIEW: Old apps keep working!
CREATE VIEW person AS
SELECT p.id,
       p.name,
       ha.address AS home_address,
       wa.address AS work_address,
       hp.phone_number AS home_phone,
       wp.phone_number AS work_phone
FROM person_v2 p
LEFT JOIN address ha ON p.id = ha.person_id AND ha.address_type = 'home'
LEFT JOIN address wa ON p.id = wa.person_id AND wa.address_type = 'work'
LEFT JOIN phone hp ON p.id = hp.person_id AND hp.phone_type = 'home'
LEFT JOIN phone wp ON p.id = wp.person_id AND wp.phone_type = 'work';

-- Original query STILL WORKS:
SELECT name, home_address FROM person;  -- unchanged!
```

When making breaking schema changes: (1) Create new structure, (2) Create view with old name pointing to new structure, (3) Migrate data, (4) Gradually update applications to use new structure directly, (5) Eventually remove compatibility view. This allows incremental migration without downtime.
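The compatibility-view pattern runs unchanged in miniature. A sketch using Python's sqlite3 module, simplified to addresses only (names are illustrative): the legacy query keeps working after the schema is reorganized.

```python
import sqlite3

# Sketch of logical data independence via a compatibility view (SQLite).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person_v2 (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE address (person_id INTEGER REFERENCES person_v2(id),
                      address_type TEXT, address TEXT);
INSERT INTO person_v2 VALUES (1, 'Alice');
INSERT INTO address VALUES (1, 'home', '12 Oak St'), (1, 'work', '99 Main St');

-- Compatibility view carries the OLD table name and shape:
CREATE VIEW person AS
SELECT p.id, p.name,
       ha.address AS home_address,
       wa.address AS work_address
FROM person_v2 p
LEFT JOIN address ha ON p.id = ha.person_id AND ha.address_type = 'home'
LEFT JOIN address wa ON p.id = wa.person_id AND wa.address_type = 'work';
""")

# The legacy query is unchanged even though the schema was reorganized:
legacy = conn.execute("SELECT name, home_address FROM person").fetchall()
print(legacy)  # [('Alice', '12 Oak St')]
```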
Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.
Interpretation
Rule 10 mandates that data integrity rules be: definable in the relational data sublanguage (SQL), stored in the system catalog, and independent of application programs.
Why Database-Level Constraints Matter
If constraints exist only in applications: every application must reimplement them, any direct database access bypasses them, and inconsistencies creep in across code paths.

Database-enforced constraints are: centralized (defined once), applied to every access path, and queryable from the catalog.
```sql
-- Integrity constraints defined in the database (Rule 10 compliant)

CREATE TABLE employee (
    emp_id INT PRIMARY KEY,                        -- Entity integrity
    email VARCHAR(100) UNIQUE NOT NULL,            -- Not null + unique
    first_name VARCHAR(50) NOT NULL,               -- Required field
    last_name VARCHAR(50) NOT NULL,
    hire_date DATE NOT NULL DEFAULT CURRENT_DATE,  -- Default value
    salary DECIMAL(10,2) CHECK (salary > 0),       -- Domain constraint
    manager_id INT REFERENCES employee(emp_id),    -- Self-referential FK
    department_id INT NOT NULL REFERENCES department(id)  -- Referential integrity
);
-- Note: a rule such as "a manager must earn more than their reports" needs
-- a subquery, which most DBMSs do not allow in CHECK constraints;
-- enforce rules like that with a trigger, as below.

-- Complex constraints via triggers when CHECK is insufficient
CREATE OR REPLACE FUNCTION enforce_department_budget()
RETURNS TRIGGER AS $$
BEGIN
    IF (SELECT SUM(salary) FROM employee
        WHERE department_id = NEW.department_id) >
       (SELECT budget FROM department WHERE id = NEW.department_id)
    THEN
        RAISE EXCEPTION 'Salary would exceed department budget';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER check_budget
    BEFORE INSERT OR UPDATE ON employee
    FOR EACH ROW EXECUTE FUNCTION enforce_department_budget();

-- Query constraints from catalog
SELECT constraint_name, constraint_type, table_name
FROM information_schema.table_constraints
WHERE table_name = 'employee';
```

In practice, many constraints ARE enforced in application code—for performance, complexity, or cross-system reasons. While not ideal per Rule 10, be pragmatic: use database constraints for fundamental integrity (keys, referential integrity, simple checks) and application logic for complex business rules that require procedural logic or external system access.
The end user must not be able to see that the data is distributed over various locations. Users should always get the impression that the data is located at one site only.
Interpretation
In distributed databases, data may be spread across multiple physical locations (nodes, data centers, regions). Rule 11 requires that this distribution be transparent to users and applications.
Location Transparency
Applications should access distributed data exactly as if it were local:
```sql
-- This query should work the same whether 'orders' is:
-- - Local table
-- - On another server in the same data center
-- - Partitioned across continents
-- - Replicated globally
SELECT * FROM orders WHERE customer_id = 12345;
```
The DBMS handles: locating the data, routing queries to the right nodes, executing distributed joins, choosing replicas, and coordinating distributed transactions.
Types of Distribution Transparency
| Transparency Level | What's Hidden | Example Capability |
|---|---|---|
| Location | Physical node addresses | Query works regardless of data location |
| Fragmentation | How tables are partitioned | Query spanning partitions looks like one table |
| Replication | Multiple data copies | Reads served from any replica transparently |
| Latency | Network delays | Query optimizer considers network costs |
| Failure | Node outages | Automatic failover without app changes |
Modern Distributed Databases
Rule 11 was written in 1985 when distributed databases were rare. Today, distribution is common:
Modern systems (CockroachDB, Spanner, Aurora) provide strong distribution transparency. Others (some NoSQL systems) expose distribution for performance tuning, partially violating Rule 11 but sometimes necessarily.
Perfect distribution transparency conflicts with the CAP theorem (you can't have perfect Consistency, Availability, and Partition tolerance simultaneously). Real distributed databases make trade-offs. Highly transparent systems may sacrifice some availability or consistency during network partitions. Understanding these trade-offs is crucial for distributed system design.
If a relational system has a low-level (single-record-at-a-time) language, that low-level language cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher-level relational language (multiple-records-at-a-time).
Interpretation
Rule 12 is about security and integrity: low-level interfaces cannot bypass high-level rules. Even if you have access to row-level APIs, internal functions, or direct storage access, you can't use these to: bypass declared integrity constraints, violate referential integrity, or circumvent authorization rules.
Why This Matters
Databases often provide multiple access methods: standard SQL, stored procedures, bulk-loading utilities, replication streams, and low-level administrative interfaces.
If low-level methods could bypass constraints, the integrity guarantees of the high-level SQL would be meaningless. A constraint is only as strong as the weakest access path.
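This "weakest access path" point can be demonstrated with any engine that enforces catalog-defined constraints. A sketch using Python's sqlite3 module (schema is illustrative): the same CHECK constraint fires on every statement path.

```python
import sqlite3

# Sketch of Rule 12: a catalog-defined CHECK rejects bad data on every
# statement path; nothing short of dropping the constraint gets around it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY,"
             " salary REAL CHECK (salary > 0))")
conn.execute("INSERT INTO employee VALUES (1001, 50000)")

attempts = [
    "INSERT INTO employee VALUES (1002, -1)",              # direct insert
    "UPDATE employee SET salary = 0 WHERE emp_id = 1001",  # update path
    "INSERT INTO employee SELECT emp_id + 10, -salary FROM employee",  # set-based
]
rejected = 0
for sql in attempts:
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        rejected += 1  # CHECK fired regardless of the access path
print(rejected)  # 3
```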
Real-World Trade-offs
In practice, some Rule 12 violations are intentional, for example bulk loading with constraints temporarily disabled, replication streams that skip re-validation, or emergency recovery procedures.

However, these should be: explicitly invoked, restricted to privileged roles, documented, and followed by re-validation of the affected data.
The ultimate Rule 12 violation is editing database files directly (outside the DBMS). This bypasses all constraints, transactions, and integrity mechanisms. Never do this on a production system. Even for recovery, use DBMS-provided tools that maintain structural integrity. Direct file manipulation can corrupt data beyond recovery.
Codd articulated these rules in 1985—nearly 40 years ago. How do they hold up today?
Still Fully Relevant
Rules 0, 1, 2, 4, 5, 7, 8, 9, 10 remain as valid as ever. The core principles of data as values in tables, guaranteed access through keys, a self-describing catalog, one comprehensive language, set-at-a-time operations, data independence, and catalog-stored integrity:
These are timeless and define quality database design.
Evolved in Practice
Rule 3 (NULLs) remains challenging; three-valued logic continues to cause bugs. Rule 6 (Updatable views) is still only partially supported. Rules 11 and 12 have become more complex with distributed systems and multiple access methods.
| Rule | PostgreSQL | MySQL | SQL Server | Oracle |
|---|---|---|---|---|
| 0: Foundation | ✓ | Partial | ✓ | ✓ |
| 1: Information | ✓ | ✓ | ✓ | ✓ |
| 2: Guaranteed Access | ✓ | ✓ | ✓ | ✓ |
| 3: NULL Handling | ✓ | ✓ | ✓ | ✓ |
| 4: Online Catalog | ✓ | ✓ | ✓ | ✓ |
| 5: Comprehensive Language | ✓ | ✓ | ✓ | ✓ |
| 6: View Updating | Partial | Partial | Partial | Partial |
| 7: Set Operations | ✓ | ✓ | ✓ | ✓ |
| 8: Physical Independence | ✓ | ✓ | ✓ | ✓ |
| 9: Logical Independence | ✓ (via views) | ✓ | ✓ | ✓ |
| 10: Integrity Independence | ✓ | Partial | ✓ | ✓ |
| 11: Distribution Independence | ✓ (with extensions) | Partial | ✓ (with Availability Groups) | ✓ (RAC) |
| 12: Non-subversion | Mostly | Partial | Mostly | Mostly |
When evaluating databases (especially newer systems or NoSQL alternatives), Codd's rules provide a useful framework. Where does the system diverge from pure relational principles? Is that divergence intentional for valid reasons (performance, flexibility) or a limitation? Understanding the trade-offs helps make informed technology choices.
Codd's 12 Rules (plus Rule 0) represent the definitive criteria for what constitutes a truly relational database management system: all information as values in tables, guaranteed access through keys, systematic NULL handling, a self-describing catalog, one comprehensive language, updatable views where theory permits, set-at-a-time operations, physical and logical data independence, catalog-stored integrity, distribution transparency, and non-subversion.
What's Next:
Having explored Codd's formal rules, we'll now examine why the relational model achieved dominance in the database industry. The next page explores the factors that enabled the relational model to triumph over hierarchical and network alternatives, and what this means for modern database choices.
You now understand Codd's 12 Rules—the definitive criteria that distinguish truly relational databases from pretenders. These rules translate mathematical theory into practical requirements, providing a framework for evaluating databases and understanding what relational compliance means. Next, we'll explore why the relational model won the database wars.