Relations And Tables - Learning Module

Loading content...

0/241

Relation Schema: Defining Structure

The Blueprint of Relations

Every database table has two aspects: its structure (what columns exist, what types they have, what constraints apply) and its content (the actual rows of data). The structure is defined by the relation schema, while the content at any moment constitutes the relation instance.

Understanding this separation is fundamental. The schema is like an architectural blueprint—it defines what's possible. The instance is the actual building—what exists right now. Architects can discuss blueprints without ever constructing a building; similarly, we can reason about schemas without any data.

This page explores relation schemas in depth: their formal definition, their components, how they constrain instances, and how they evolve over the lifetime of a database.

Learning Objectives

By the end of this page, you will: (1) Formally define and notate relation schemas, (2) Understand all components of a schema including attributes, domains, and constraints, (3) Distinguish between intension (schema) and extension (instance), (4) Recognize how schemas evolve through DDL operations, and (5) Apply schema concepts to practical database design.

Formal Definition of Relation Schema

A relation schema provides the complete structural specification for a relation. It defines what the relation can contain without specifying what it currently does contain.

Formal Definition:

A relation schema R is a tuple:

R = (R_name, {A₁, A₂, ..., Aₙ}, dom, C)

Where:

R_name is the unique name of the relation
{A₁, A₂, ..., Aₙ} is a set of n attribute names
dom: Aᵢ → Dᵢ is a function mapping each attribute to its domain
C is a set of constraints that any valid instance must satisfy

The degree of the schema is n, the number of attributes.

Common Notation Styles:

Different textbooks and systems use different notations for schemas:

1. Simple Notation (Most Common):

Employee(emp_id, emp_name, department, salary, hire_date)

2. With Domain Annotations:

Employee(emp_id: INTEGER, emp_name: VARCHAR(100), 
         department: VARCHAR(50), salary: DECIMAL(12,2), 
         hire_date: DATE)

3. With Key Underlining (Textbook Convention):

Employee(emp_id, emp_name, department, salary, hire_date)
         ^^^^^^
         Primary Key (underlined)

4. With Constraint Annotations:

Employee(
    emp_id: INTEGER [PK, NOT NULL],
    emp_name: VARCHAR(100) [NOT NULL],
    department: VARCHAR(50) [NOT NULL, FK→Departments.dept_name],
    salary: DECIMAL(12,2) [NOT NULL, CHECK > 0],
    hire_date: DATE [NOT NULL, CHECK ≤ CURRENT_DATE]
)

Reading Schema Notation

When you see R(A₁, A₂, ..., Aₙ), read it as: 'Relation R has attributes A₁ through Aₙ.' The schema tells you what questions you can ask of the data. If Employee has a 'salary' attribute, you can query salaries. If it doesn't have a 'performance_rating' attribute, you cannot—that information isn't modeled.

Schema Components in Detail

Let's examine each component of a relation schema comprehensively.

Schema Components

•Relation Name (R_name): A unique identifier for the relation within the database. Naming conventions vary: singular (Employee) vs plural (Employees), snake_case (employee_info) vs PascalCase (EmployeeInfo). Consistency matters more than which convention you choose.
•Attribute Set ({A₁, ..., Aₙ}): The set of named attributes. Each attribute represents a property of the entities or relationships modeled. Names should be descriptive, unambiguous, and follow a consistent convention. Common patterns: 'entity_property' (emp_salary), 'property' (salary), or 'property_qualifier' (start_date, end_date).
•Domain Function (dom): Maps each attribute to its domain—the set of legal values. In SQL, domains are approximated by data types plus constraints. The domain encapsulates both the physical representation (INTEGER, VARCHAR) and the semantic meaning (positive numbers only, valid email format).
•Constraint Set (C): Rules that any valid instance must satisfy. Includes key constraints, NOT NULL constraints, CHECK constraints, and referential integrity constraints. Constraints encode business rules directly in the schema.

Attribute Characteristics:

Each attribute in a schema has several properties:

Property	Description	Example
Name	Unique identifier within the relation	emp_id, salary
Domain	Set of allowable values	Positive integers, Valid emails
Data Type	Physical representation	INTEGER, VARCHAR(100)
Nullability	Whether NULL is allowed	NOT NULL, NULLABLE
Default	Value when not specified	DEFAULT 0, DEFAULT CURRENT_DATE
Computed	Whether value is derived	AS (quantity * unit_price)

Complete Schema SpecificationA fully specified schema for an Order relation:

Input

Order relation for an e-commerce system

Output

Order(
    order_id: INTEGER [PK, NOT NULL, AUTO_INCREMENT],
    customer_id: INTEGER [NOT NULL, FK→Customers.customer_id],
    order_date: TIMESTAMP [NOT NULL, DEFAULT CURRENT_TIMESTAMP],
    status: ENUM('pending','processing','shipped','delivered','cancelled') 
            [NOT NULL, DEFAULT 'pending'],
    total_amount: DECIMAL(12,2) [NOT NULL, CHECK ≥ 0],
    shipping_address_id: INTEGER [FK→Addresses.address_id]
)

Explanation

This schema specifies six attributes with complete domain specifications, nullability, defaults, and relationships. It defines what an Order is structurally without containing any actual orders.

Intension vs. Extension

The distinction between intension and extension is fundamental in database theory, borrowed from logic and philosophy.

Intension (Schema): The intension of a relation is its schema—the definition of its structure. It answers: 'What can this relation contain?' The intension is relatively permanent; it changes only through explicit schema modifications (ALTER TABLE).

Extension (Instance): The extension of a relation is its current instance—the set of tuples it contains at a given moment. It answers: 'What does this relation contain now?' The extension changes frequently through data operations (INSERT, UPDATE, DELETE).

Intension (Schema)

•Defines structure, not content
•Specifies allowable states
•Time-invariant (mostly)
•Changed by DDL (ALTER, CREATE)
•Like a class definition in OOP
•Stored in system catalog
•Independent of any data

Extension (Instance)

•Contains actual data
•One of many allowable states
•Time-varying (constantly)
•Changed by DML (INSERT, UPDATE, DELETE)
•Like object instances in OOP
•Stored in data files
•Must conform to schema

The Relationship:

The intension defines a space of possible extensions. At any moment, the actual extension must be one of these valid possibilities. Think of it as:

Intension defines the rules of the game
Extension is the current state of play

Analogy: Form vs. Content

Consider a paper form for recording employee information:

The form design (what fields exist, what's required) is the intension
Each filled-out form is an extension

You can design a form without filling it out (schema without data). You cannot fill out a form that doesn't exist (data without schema).

Practical Implications:

Schema design happens early: You must define the intension before inserting data. Planning the schema is a design activity.
Schema changes are expensive: Because the intension defines legal extensions, changing the schema may require transforming all existing data.
Instance changes are routine: The extension changes constantly—that's normal database operation.
Constraints are intensional: They're part of the schema, defining which extensions are valid.

Why This Matters

When someone says 'the Employee table,' they might mean the schema (structure) or the instance (data). Context clarifies. 'Show me the Employee table' typically means the extension (data). 'Design the Employee table' means the intension (schema). Being precise about this distinction prevents miscommunication.

Constraints as Schema Components

Constraints are rules embedded in the schema that restrict valid extensions. They are the schema's mechanism for encoding business rules, ensuring data quality, and maintaining consistency.

Types of Schema Constraints:

Constraint Types in Relation Schemas
Constraint Type	What It Enforces	SQL Syntax	Example
NOT NULL	Attribute must have a value	column_name type NOT NULL	emp_name VARCHAR(100) NOT NULL
UNIQUE	No duplicate values in column(s)	UNIQUE (columns)	UNIQUE (email)
PRIMARY KEY	Unique and not null identifier	PRIMARY KEY (columns)	PRIMARY KEY (emp_id)
FOREIGN KEY	Value must exist in referenced table	FOREIGN KEY (col) REFERENCES table(col)	FOREIGN KEY (dept_id) REFERENCES departments(dept_id)
CHECK	Custom condition must be true	CHECK (condition)	CHECK (salary > 0)
DEFAULT	Value when none specified	DEFAULT value	status VARCHAR(20) DEFAULT 'active'

Constraint Categories:

1. Domain Constraints: Restrict individual attribute values to valid domain members.

salary DECIMAL(12,2) CHECK (salary >= 0 AND salary < 10000000),
email VARCHAR(255) CHECK (email LIKE '%@%.%'),
status CHAR(1) CHECK (status IN ('A', 'I', 'P', 'T'))

2. Key Constraints: Ensure tuple uniqueness and identifiability.

PRIMARY KEY (order_id),
UNIQUE (customer_id, order_date),  -- Composite uniqueness
UNIQUE (social_security_number)     -- Candidate key

3. Referential Integrity Constraints: Ensure that references between relations are valid.

FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
    ON DELETE CASCADE
    ON UPDATE CASCADE

4. Semantic Constraints: Capture business rules that span multiple attributes.

CHECK (end_date > start_date),
CHECK (quantity > 0 OR status = 'cancelled'),
CHECK (discount_percent BETWEEN 0 AND 100)

Constraint Design Philosophy

Encode as many business rules as possible as constraints. Constraints are declarative, self-documenting, always enforced, and checked automatically by the DBMS. Application code can have bugs and be bypassed; constraints cannot. Every constraint is a bug that can never happen.

Database Schema: The Complete Picture

While we've focused on individual relation schemas, a database typically contains multiple relations. The database schema is the collection of all relation schemas plus inter-relation constraints.

Definition (Database Schema):

A database schema S consists of:

A set of relation schemas: {R₁, R₂, ..., Rₘ}
A set of inter-relation constraints (primarily foreign keys)
Additional global constraints and triggers

Diagrammatic Representation:

Database schemas are often visualized using Entity-Relationship Diagrams (ERDs) or relational schema diagrams:

┌────────────────┐       ┌────────────────┐
│  Departments   │       │   Employees    │
├────────────────┤       ├────────────────┤
│ *dept_id  [PK] │──┐    │ *emp_id   [PK] │
│  dept_name     │  │    │  emp_name      │
│  manager_id    │──┼──┐ │  salary        │
│  budget        │  │  │ │  dept_id  [FK] │──┐
└────────────────┘  │  │ │  hire_date     │  │
                    │  │ └────────────────┘  │
                    │  │                     │
                    │  └─────────────────────┘
                    │         
                    │     ┌────────────────┐
                    │     │   Projects     │
                    │     ├────────────────┤
                    │     │ *project_id [PK]│
                    └────▶│  dept_id   [FK]│
                          │  project_name  │
                          │  budget        │
                          └────────────────┘

Components of a Complete Database Schema:

Component	Description	Example
Relation schemas	Individual table definitions	Employee, Department, Project
Foreign key constraints	Cross-table references	Employee.dept_id → Department.dept_id
Views	Virtual relations derived from base relations	EmployeeSummary, DepartmentBudgets
Indexes	Query optimization structures	idx_employee_name, idx_order_date
Sequences	Auto-incrementing counters	emp_id_seq, order_number_seq
Domains	Custom data types	EmailAddress, PhoneNumber
Triggers	Automatic actions on data changes	audit_employee_changes
Stored Procedures	Reusable SQL logic	calculate_bonus(), transfer_employee()

complete_schema.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
-- Complete database schema example
-- for a minimal HR system
 
-- Custom domain (PostgreSQL)
CREATE DOMAIN positive_money AS DECIMAL(12,2)
    CHECK (VALUE > 0);
 
-- Department relation schema
CREATE TABLE departments (
    dept_id SERIAL PRIMARY KEY,
    dept_name VARCHAR(100) NOT NULL UNIQUE,
    budget positive_money,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
 
-- Employee relation schema
CREATE TABLE employees (
    emp_id SERIAL PRIMARY KEY,
    emp_name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE,
    dept_id INTEGER REFERENCES departments(dept_id)
        ON DELETE SET NULL
        ON UPDATE CASCADE,
    salary positive_money NOT NULL,
    hire_date DATE NOT NULL DEFAULT CURRENT_DATE,
    manager_id INTEGER REFERENCES employees(emp_id),
    CONSTRAINT valid_hire_date CHECK (hire_date <= CURRENT_DATE)
);
 
-- View (virtual relation)
CREATE VIEW employee_summary AS
    SELECT e.emp_name, d.dept_name, e.salary
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.dept_id;
 
-- Index (physical optimization)
CREATE INDEX idx_employee_dept ON employees(dept_id);
CREATE INDEX idx_employee_manager ON employees(manager_id);

Schema as Documentation

A well-designed database schema serves as living documentation. By reading the schema, you can understand what entities exist, how they relate, what rules govern data, and what operations are supported. Treat schema design as an exercise in precision and clarity.

Schema Evolution

Real databases aren't static—they evolve as requirements change. Schema evolution refers to modifications to the database structure over time. Managing schema evolution is one of the most challenging aspects of database maintenance.

Types of Schema Changes:

Schema Evolution Operations
Operation	SQL Command	Impact	Difficulty
Add column	ALTER TABLE ADD COLUMN	Low risk if NULLable or has default	Easy
Remove column	ALTER TABLE DROP COLUMN	Data loss, breaks dependent queries	Moderate
Rename column	ALTER TABLE RENAME COLUMN	Breaks applications using old name	Moderate
Change data type	ALTER TABLE ALTER COLUMN TYPE	May require data conversion	Hard
Add constraint	ALTER TABLE ADD CONSTRAINT	May fail if existing data violates	Moderate
Remove constraint	ALTER TABLE DROP CONSTRAINT	Allows previously invalid data	Easy
Add table	CREATE TABLE	No impact on existing data	Easy
Remove table	DROP TABLE	Data loss, breaks references	Hard
Rename table	ALTER TABLE RENAME TO	Breaks applications using old name	Moderate

Schema Evolution Challenges:

1. Backward Compatibility: Applications written against the old schema may break. Solutions include:

Maintaining views with old column names
Gradual deprecation with warnings
Versioned APIs

2. Data Migration: Existing data may not conform to new constraints. Strategies:

Clean data before adding constraints
Add column with default, then populate, then add NOT NULL
Use triggers to transform data on access

3. Downtime: Some schema changes lock tables. Approaches:

Blue-green deployments with schema migration
Online DDL operations (if DBMS supports)
Scheduled maintenance windows

4. Rollback: What if the schema change causes problems?

Keep previous migration scripts
Use transactional DDL (PostgreSQL)
Maintain schema version history

schema_migration.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
-- Example: Safe column addition with default
 
-- Step 1: Add nullable column
ALTER TABLE employees ADD COLUMN department_date DATE;
 
-- Step 2: Backfill existing rows
UPDATE employees SET department_date = hire_date 
WHERE department_date IS NULL;
 
-- Step 3: Now safe to add constraint
ALTER TABLE employees 
    ALTER COLUMN department_date SET NOT NULL;
 
-- Step 4: Add check constraint
ALTER TABLE employees 
    ADD CONSTRAINT chk_dept_date 
    CHECK (department_date >= hire_date);

Schema Change Best Practices

Always: (1) Test migrations on a copy of production data, (2) Have a rollback plan, (3) Communicate with application teams before changes, (4) Consider the impact on running queries and transactions, (5) Version your schema migrations (tools like Flyway, Liquibase, Alembic help).

Schema Design Best Practices

Well-designed schemas are easier to query, maintain, and evolve. Here are principles distilled from decades of database practice:

Schema Design Principles

•Name Thoughtfully: Choose names that describe semantics, not implementation. 'customer_id' over 'id123', 'order_total' over 'amount'. Names should be self-documenting.
•Be Consistent: Follow conventions uniformly. If you use 'created_at' in one table, don't use 'date_created' in another. Consistency reduces cognitive load.
•Avoid Abbreviations: 'quantity' not 'qty', 'description' not 'desc'. Storage is cheap; clarity is expensive to retrofit.
•Use Appropriate Types: Don't store dates as strings, numbers as varchar, or booleans as integers. Proper types enable operations and prevent invalid data.
•Define All Constraints: Every business rule that can be expressed as a constraint should be. Schemas should make illegal states unrepresentable.
•Document Intent: Use COMMENT statements, maintain external documentation, or use descriptive names. Future maintainers (including you) will thank you.
•Design for Change: Anticipate that requirements will evolve. Prefer extensible patterns over rigid ones.
•Normalize Until It Hurts, Then Denormalize Strategically: Start with normalized schemas to eliminate redundancy, then selectively denormalize for performance where measurements justify it.

Common Schema Anti-Patterns:

Anti-Pattern	Problem	Better Approach
God Table	One table with 100+ columns	Decompose into focused, normalized tables
Stringly Typed	Encoding structure in strings	Use proper data types and normalization
Polymorphic Junk Drawer	Generic 'key-value' tables	Define explicit schemas for each entity type
Implicit Relationships	Relationships encoded in application	Explicit foreign keys
Fear of NULLs	Using magic values (-1, 'N/A')	NULL with proper handling
No Keys	Tables without primary keys	Always define a primary key
Temporal Chaos	No consideration for time	Explicit timestamp columns, history tables

Schema Review Checklist

Before finalizing a schema, verify: ✓ Every table has a primary key, ✓ All relationships have foreign keys, ✓ Column types are appropriate, ✓ NOT NULL is specified where required, ✓ Constraints encode business rules, ✓ Names are clear and consistent, ✓ Indexes support expected queries.

Summary: The Power of Schema

The relation schema is the foundational specification that gives structure, meaning, and constraints to data. Let's consolidate our understanding:

Key Concepts Mastered

•Schema Definition: A relation schema formally specifies the name, attributes, domains, and constraints of a relation—its complete structural blueprint.
•Schema Components: Include the relation name, attribute set with domains, and constraint set. Each component serves a specific purpose in defining valid data.
•Intension vs Extension: The schema (intension) defines what's possible; the instance (extension) captures what's true now. This separation enables reasoning about structure independently of content.
•Constraints as Guards: Constraints are the schema's mechanism for encoding business rules and ensuring data quality. They make illegal states unrepresentable.
•Database Schema: The complete schema includes all relation schemas plus inter-relation constraints, views, and other database objects.
•Schema Evolution: Schemas change over time. Managing this evolution requires careful planning, migration strategies, and backward compatibility consideration.
•Best Practices: Good schema design emphasizes clarity, consistency, appropriate typing, comprehensive constraints, and anticipation of change.

What's Next:

With relation schemas thoroughly understood, we'll examine relation instances—the actual data that populates schemas at any given moment. The interplay between schema and instance is the dynamic core of database operation.

Schema Mastery Achieved

You now understand relation schemas as formal structures that define the possibilities within which data exists. This knowledge underlies all database design work—you can specify what data means, how it's constrained, and how it can evolve. Next, we'll explore how instances bring schemas to life with actual data.