Loading content...
Every database table has two aspects: its structure (what columns exist, what types they have, what constraints apply) and its content (the actual rows of data). The structure is defined by the relation schema, while the content at any moment constitutes the relation instance.
Understanding this separation is fundamental. The schema is like an architectural blueprint—it defines what's possible. The instance is the actual building—what exists right now. Architects can discuss blueprints without ever constructing a building; similarly, we can reason about schemas without any data.
This page explores relation schemas in depth: their formal definition, their components, how they constrain instances, and how they evolve over the lifetime of a database.
By the end of this page, you will: (1) Formally define and notate relation schemas, (2) Understand all components of a schema including attributes, domains, and constraints, (3) Distinguish between intension (schema) and extension (instance), (4) Recognize how schemas evolve through DDL operations, and (5) Apply schema concepts to practical database design.
A relation schema provides the complete structural specification for a relation. It defines what the relation can contain without specifying what it currently does contain.
Formal Definition:
A relation schema R is a tuple:
R = (R_name, {A₁, A₂, ..., Aₙ}, dom, C)
Where:
The degree of the schema is n, the number of attributes.
Common Notation Styles:
Different textbooks and systems use different notations for schemas:
1. Simple Notation (Most Common):
Employee(emp_id, emp_name, department, salary, hire_date)
2. With Domain Annotations:
Employee(emp_id: INTEGER, emp_name: VARCHAR(100),
department: VARCHAR(50), salary: DECIMAL(12,2),
hire_date: DATE)
3. With Key Underlining (Textbook Convention):
Employee(emp_id, emp_name, department, salary, hire_date)
^^^^^^
Primary Key (underlined)
4. With Constraint Annotations:
Employee(
emp_id: INTEGER [PK, NOT NULL],
emp_name: VARCHAR(100) [NOT NULL],
department: VARCHAR(50) [NOT NULL, FK→Departments.dept_name],
salary: DECIMAL(12,2) [NOT NULL, CHECK > 0],
hire_date: DATE [NOT NULL, CHECK ≤ CURRENT_DATE]
)
When you see R(A₁, A₂, ..., Aₙ), read it as: 'Relation R has attributes A₁ through Aₙ.' The schema tells you what questions you can ask of the data. If Employee has a 'salary' attribute, you can query salaries. If it doesn't have a 'performance_rating' attribute, you cannot—that information isn't modeled.
Let's examine each component of a relation schema comprehensively.
Attribute Characteristics:
Each attribute in a schema has several properties:
| Property | Description | Example |
|---|---|---|
| Name | Unique identifier within the relation | emp_id, salary |
| Domain | Set of allowable values | Positive integers, Valid emails |
| Data Type | Physical representation | INTEGER, VARCHAR(100) |
| Nullability | Whether NULL is allowed | NOT NULL, NULLABLE |
| Default | Value when not specified | DEFAULT 0, DEFAULT CURRENT_DATE |
| Computed | Whether value is derived | AS (quantity * unit_price) |
Order relation for an e-commerce systemOrder(
order_id: INTEGER [PK, NOT NULL, AUTO_INCREMENT],
customer_id: INTEGER [NOT NULL, FK→Customers.customer_id],
order_date: TIMESTAMP [NOT NULL, DEFAULT CURRENT_TIMESTAMP],
status: ENUM('pending','processing','shipped','delivered','cancelled')
[NOT NULL, DEFAULT 'pending'],
total_amount: DECIMAL(12,2) [NOT NULL, CHECK ≥ 0],
shipping_address_id: INTEGER [FK→Addresses.address_id]
)This schema specifies six attributes with complete domain specifications, nullability, defaults, and relationships. It defines what an Order is structurally without containing any actual orders.
The distinction between intension and extension is fundamental in database theory, borrowed from logic and philosophy.
Intension (Schema): The intension of a relation is its schema—the definition of its structure. It answers: 'What can this relation contain?' The intension is relatively permanent; it changes only through explicit schema modifications (ALTER TABLE).
Extension (Instance): The extension of a relation is its current instance—the set of tuples it contains at a given moment. It answers: 'What does this relation contain now?' The extension changes frequently through data operations (INSERT, UPDATE, DELETE).
The Relationship:
The intension defines a space of possible extensions. At any moment, the actual extension must be one of these valid possibilities. Think of it as:
Analogy: Form vs. Content
Consider a paper form for recording employee information:
You can design a form without filling it out (schema without data). You cannot fill out a form that doesn't exist (data without schema).
Practical Implications:
Schema design happens early: You must define the intension before inserting data. Planning the schema is a design activity.
Schema changes are expensive: Because the intension defines legal extensions, changing the schema may require transforming all existing data.
Instance changes are routine: The extension changes constantly—that's normal database operation.
Constraints are intensional: They're part of the schema, defining which extensions are valid.
When someone says 'the Employee table,' they might mean the schema (structure) or the instance (data). Context clarifies. 'Show me the Employee table' typically means the extension (data). 'Design the Employee table' means the intension (schema). Being precise about this distinction prevents miscommunication.
Constraints are rules embedded in the schema that restrict valid extensions. They are the schema's mechanism for encoding business rules, ensuring data quality, and maintaining consistency.
Types of Schema Constraints:
| Constraint Type | What It Enforces | SQL Syntax | Example |
|---|---|---|---|
| NOT NULL | Attribute must have a value | column_name type NOT NULL | emp_name VARCHAR(100) NOT NULL |
| UNIQUE | No duplicate values in column(s) | UNIQUE (columns) | UNIQUE (email) |
| PRIMARY KEY | Unique and not null identifier | PRIMARY KEY (columns) | PRIMARY KEY (emp_id) |
| FOREIGN KEY | Value must exist in referenced table | FOREIGN KEY (col) REFERENCES table(col) | FOREIGN KEY (dept_id) REFERENCES departments(dept_id) |
| CHECK | Custom condition must be true | CHECK (condition) | CHECK (salary > 0) |
| DEFAULT | Value when none specified | DEFAULT value | status VARCHAR(20) DEFAULT 'active' |
Constraint Categories:
1. Domain Constraints: Restrict individual attribute values to valid domain members.
salary DECIMAL(12,2) CHECK (salary >= 0 AND salary < 10000000),
email VARCHAR(255) CHECK (email LIKE '%@%.%'),
status CHAR(1) CHECK (status IN ('A', 'I', 'P', 'T'))
2. Key Constraints: Ensure tuple uniqueness and identifiability.
PRIMARY KEY (order_id),
UNIQUE (customer_id, order_date), -- Composite uniqueness
UNIQUE (social_security_number) -- Candidate key
3. Referential Integrity Constraints: Ensure that references between relations are valid.
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
ON DELETE CASCADE
ON UPDATE CASCADE
4. Semantic Constraints: Capture business rules that span multiple attributes.
CHECK (end_date > start_date),
CHECK (quantity > 0 OR status = 'cancelled'),
CHECK (discount_percent BETWEEN 0 AND 100)
Encode as many business rules as possible as constraints. Constraints are declarative, self-documenting, always enforced, and checked automatically by the DBMS. Application code can have bugs and be bypassed; constraints cannot. Every constraint is a bug that can never happen.
While we've focused on individual relation schemas, a database typically contains multiple relations. The database schema is the collection of all relation schemas plus inter-relation constraints.
Definition (Database Schema):
A database schema S consists of:
Diagrammatic Representation:
Database schemas are often visualized using Entity-Relationship Diagrams (ERDs) or relational schema diagrams:
┌────────────────┐ ┌────────────────┐
│ Departments │ │ Employees │
├────────────────┤ ├────────────────┤
│ *dept_id [PK] │──┐ │ *emp_id [PK] │
│ dept_name │ │ │ emp_name │
│ manager_id │──┼──┐ │ salary │
│ budget │ │ │ │ dept_id [FK] │──┐
└────────────────┘ │ │ │ hire_date │ │
│ │ └────────────────┘ │
│ │ │
│ └─────────────────────┘
│
│ ┌────────────────┐
│ │ Projects │
│ ├────────────────┤
│ │ *project_id [PK]│
└────▶│ dept_id [FK]│
│ project_name │
│ budget │
└────────────────┘
Components of a Complete Database Schema:
| Component | Description | Example |
|---|---|---|
| Relation schemas | Individual table definitions | Employee, Department, Project |
| Foreign key constraints | Cross-table references | Employee.dept_id → Department.dept_id |
| Views | Virtual relations derived from base relations | EmployeeSummary, DepartmentBudgets |
| Indexes | Query optimization structures | idx_employee_name, idx_order_date |
| Sequences | Auto-incrementing counters | emp_id_seq, order_number_seq |
| Domains | Custom data types | EmailAddress, PhoneNumber |
| Triggers | Automatic actions on data changes | audit_employee_changes |
| Stored Procedures | Reusable SQL logic | calculate_bonus(), transfer_employee() |
1234567891011121314151617181920212223242526272829303132333435363738
-- Complete database schema example-- for a minimal HR system -- Custom domain (PostgreSQL)CREATE DOMAIN positive_money AS DECIMAL(12,2) CHECK (VALUE > 0); -- Department relation schemaCREATE TABLE departments ( dept_id SERIAL PRIMARY KEY, dept_name VARCHAR(100) NOT NULL UNIQUE, budget positive_money, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP); -- Employee relation schemaCREATE TABLE employees ( emp_id SERIAL PRIMARY KEY, emp_name VARCHAR(100) NOT NULL, email VARCHAR(255) UNIQUE, dept_id INTEGER REFERENCES departments(dept_id) ON DELETE SET NULL ON UPDATE CASCADE, salary positive_money NOT NULL, hire_date DATE NOT NULL DEFAULT CURRENT_DATE, manager_id INTEGER REFERENCES employees(emp_id), CONSTRAINT valid_hire_date CHECK (hire_date <= CURRENT_DATE)); -- View (virtual relation)CREATE VIEW employee_summary AS SELECT e.emp_name, d.dept_name, e.salary FROM employees e LEFT JOIN departments d ON e.dept_id = d.dept_id; -- Index (physical optimization)CREATE INDEX idx_employee_dept ON employees(dept_id);CREATE INDEX idx_employee_manager ON employees(manager_id);A well-designed database schema serves as living documentation. By reading the schema, you can understand what entities exist, how they relate, what rules govern data, and what operations are supported. Treat schema design as an exercise in precision and clarity.
Real databases aren't static—they evolve as requirements change. Schema evolution refers to modifications to the database structure over time. Managing schema evolution is one of the most challenging aspects of database maintenance.
Types of Schema Changes:
| Operation | SQL Command | Impact | Difficulty |
|---|---|---|---|
| Add column | ALTER TABLE ADD COLUMN | Low risk if NULLable or has default | Easy |
| Remove column | ALTER TABLE DROP COLUMN | Data loss, breaks dependent queries | Moderate |
| Rename column | ALTER TABLE RENAME COLUMN | Breaks applications using old name | Moderate |
| Change data type | ALTER TABLE ALTER COLUMN TYPE | May require data conversion | Hard |
| Add constraint | ALTER TABLE ADD CONSTRAINT | May fail if existing data violates | Moderate |
| Remove constraint | ALTER TABLE DROP CONSTRAINT | Allows previously invalid data | Easy |
| Add table | CREATE TABLE | No impact on existing data | Easy |
| Remove table | DROP TABLE | Data loss, breaks references | Hard |
| Rename table | ALTER TABLE RENAME TO | Breaks applications using old name | Moderate |
Schema Evolution Challenges:
1. Backward Compatibility: Applications written against the old schema may break. Solutions include:
2. Data Migration: Existing data may not conform to new constraints. Strategies:
3. Downtime: Some schema changes lock tables. Approaches:
4. Rollback: What if the schema change causes problems?
1234567891011121314151617
-- Example: Safe column addition with default -- Step 1: Add nullable columnALTER TABLE employees ADD COLUMN department_date DATE; -- Step 2: Backfill existing rowsUPDATE employees SET department_date = hire_date WHERE department_date IS NULL; -- Step 3: Now safe to add constraintALTER TABLE employees ALTER COLUMN department_date SET NOT NULL; -- Step 4: Add check constraintALTER TABLE employees ADD CONSTRAINT chk_dept_date CHECK (department_date >= hire_date);Always: (1) Test migrations on a copy of production data, (2) Have a rollback plan, (3) Communicate with application teams before changes, (4) Consider the impact on running queries and transactions, (5) Version your schema migrations (tools like Flyway, Liquibase, Alembic help).
Well-designed schemas are easier to query, maintain, and evolve. Here are principles distilled from decades of database practice:
Common Schema Anti-Patterns:
| Anti-Pattern | Problem | Better Approach |
|---|---|---|
| God Table | One table with 100+ columns | Decompose into focused, normalized tables |
| Stringly Typed | Encoding structure in strings | Use proper data types and normalization |
| Polymorphic Junk Drawer | Generic 'key-value' tables | Define explicit schemas for each entity type |
| Implicit Relationships | Relationships encoded in application | Explicit foreign keys |
| Fear of NULLs | Using magic values (-1, 'N/A') | NULL with proper handling |
| No Keys | Tables without primary keys | Always define a primary key |
| Temporal Chaos | No consideration for time | Explicit timestamp columns, history tables |
Before finalizing a schema, verify: ✓ Every table has a primary key, ✓ All relationships have foreign keys, ✓ Column types are appropriate, ✓ NOT NULL is specified where required, ✓ Constraints encode business rules, ✓ Names are clear and consistent, ✓ Indexes support expected queries.
The relation schema is the foundational specification that gives structure, meaning, and constraints to data. Let's consolidate our understanding:
What's Next:
With relation schemas thoroughly understood, we'll examine relation instances—the actual data that populates schemas at any given moment. The interplay between schema and instance is the dynamic core of database operation.
You now understand relation schemas as formal structures that define the possibilities within which data exists. This knowledge underlies all database design work—you can specify what data means, how it's constrained, and how it can evolve. Next, we'll explore how instances bring schemas to life with actual data.