Loading learning content...
A relational schema is the formal, precise specification of a database's logical structure. It defines what relations (tables) exist, what attributes (columns) each contains, what domains constrain each attribute, and what integrity constraints govern the data. If the database is a building, the relational schema is its architectural blueprint—comprehensive, unambiguous, and authoritative.
Understanding relational schemas is essential for multiple reasons:
This page provides an exhaustive treatment of relational schemas: their formal mathematical foundations, practical notation systems, diagrammatic representations, and documentation standards used across industry and academia.
By the end of this page, you will be able to read and write formal relational schema notation, understand the mathematical foundations underlying schema definitions, create and interpret schema diagrams, and apply industry-standard documentation practices to logical database designs.
The relational model, introduced by Edgar F. Codd in 1970, has precise mathematical foundations. Understanding these foundations clarifies why schemas are designed as they are and enables rigorous reasoning about database correctness.
Mathematical Foundations:
A domain D is a set of atomic values. Each domain has a name, a data type, and potentially additional constraints (format, range, enumeration).
Examples:
An attribute A is the name given to a role played by a domain D in a relation. Multiple attributes may use the same domain.
A relation schema R(A₁, A₂, ..., Aₙ) is a relation name R and an ordered list of attributes, where each attribute Aᵢ has an associated domain dom(Aᵢ).
A relational database schema S = {R₁, R₂, ..., Rₘ, IC} is a set of relation schemas Rᵢ together with a set of integrity constraints IC.
| Component | Mathematical Definition | Practical Interpretation |
|---|---|---|
| Relation Name | R ∈ Names | Table identifier (e.g., 'Employee') |
| Attribute Set | {A₁, A₂, ..., Aₙ} | Column names (e.g., 'EmployeeID', 'Name') |
| Attribute Order | Tuple (A₁, A₂, ..., Aₙ) | Column sequence (matters in some contexts) |
| Domain Assignment | dom: A → D | Data type binding (A → INTEGER, etc.) |
| Degree | n = |{A₁, A₂, ..., Aₙ}| | Number of columns |
| Relation Schema | R(A₁:D₁, ..., Aₙ:Dₙ) | Complete table structure definition |
The schema defines structure (intension); the instance contains actual data (extension). A schema is like a class definition in OOP, while an instance is like the objects created from that class. Schema changes are migrations; instance changes are transactions.
Relation Instance:
A relation instance (or relation state) r(R) is a set of tuples {t₁, t₂, ..., tₘ} where each tuple t is an ordered list of values t = <v₁, v₂, ..., vₙ> such that each value vᵢ is an element of dom(Aᵢ) or is NULL:
The cardinality of a relation is the number of tuples |r(R)|, which changes as data is inserted, updated, or deleted.
Key insight: While the schema is (relatively) static, defining the rules of the database, instances are dynamic, changing with every transaction. Good schema design anticipates pattern of instance changes and optimizes for common operations.
Text-based schema notation provides a precise, portable way to document relational schemas in contexts where diagrams aren't practical—papers, emails, whiteboards, code comments, and design documents.
Standard Academic Notation:
The widely-accepted notation for relation schemas follows these conventions:
RELATION_NAME(attribute1, attribute2, ..., attributeN)
Conventions for marking constraints:
Employee(<u>EmployeeID</u>, Name, Salary)Employee(ID, Name, DeptID*) where * denotes FKSince underlining and italics aren't always available, alternative notations exist:
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677
=====================================================RELATIONAL SCHEMA NOTATION EXAMPLES===================================================== --------------------------------------------------NOTATION STYLE 1: Inline Markers--------------------------------------------------Employee(EmployeeID [PK], FirstName, LastName, Email [UK], DepartmentID [FK → Department.DepartmentID])Department(DepartmentID [PK], DepartmentName [UK], Budget)Project(ProjectID [PK], ProjectName, StartDate, EndDate)EmployeeProject(EmployeeID [PK,FK], ProjectID [PK,FK], HoursPerWeek, Role) Legend: PK = Primary Key, FK = Foreign Key, UK = Unique Key --------------------------------------------------NOTATION STYLE 2: Separate Constraint Listing--------------------------------------------------Employee(EmployeeID, FirstName, LastName, Email, Salary, HireDate, DepartmentID) Constraints: PK: Employee(EmployeeID) UK: Employee(Email) FK: Employee(DepartmentID) → Department(DepartmentID) NOT NULL: FirstName, LastName, Email, HireDate, DepartmentID CHECK: Salary >= 0 --------------------------------------------------NOTATION STYLE 3: Textual Description (Formal)--------------------------------------------------Schema: COMPANY_DB R1 = EMPLOYEE(ID: INTEGER, NAME: VARCHAR(100), DEPT_ID: INTEGER, SALARY: DECIMAL(10,2)) PRIMARY KEY: {ID} FOREIGN KEY: {DEPT_ID} REFERENCES DEPARTMENT R2 = DEPARTMENT(ID: INTEGER, NAME: VARCHAR(100), BUDGET: DECIMAL(15,2)) PRIMARY KEY: {ID} UNIQUE: {NAME} R3 = WORKS_ON(EMP_ID: INTEGER, PROJ_ID: INTEGER, HOURS: DECIMAL(4,1)) PRIMARY KEY: {EMP_ID, PROJ_ID} FOREIGN KEY: {EMP_ID} REFERENCES EMPLOYEE FOREIGN KEY: {PROJ_ID} REFERENCES PROJECT --------------------------------------------------NOTATION STYLE 4: Markdown-Friendly--------------------------------------------------## Employee| Column | Type | Constraints ||--------------|---------------|----------------------|| **EmployeeID** | INT | PK || FirstName | VARCHAR(50) | NOT NULL || LastName | VARCHAR(50) | NOT NULL || Email | VARCHAR(100) | UNIQUE, NOT NULL || DepartmentID | INT | FK → Department(ID) | --------------------------------------------------NOTATION STYLE 5: CompSci Textbook (Elmasri/Navathe)--------------------------------------------------EMPLOYEE EmployeeID INTEGER NOT NULL FirstName VARCHAR(50) NOT NULL LastName VARCHAR(50) NOT NULL Email VARCHAR(100) NOT NULL, UNIQUE Salary DECIMAL(10,2) DepartmentID INTEGER NOT NULL PRIMARY KEY (EmployeeID) FOREIGN KEY (DepartmentID) REFERENCES DEPARTMENT(DepartmentID) ON DELETE SET NULL ON UPDATE CASCADEChoose one notation style and use it consistently throughout a project. Document your conventions in a schema style guide. Inconsistent notation leads to misinterpretation, especially in large teams or during knowledge transfer.
While text notation is precise and portable, visual diagrams communicate schema structure more intuitively. Several diagram types are used in practice:
1. Relational Schema Diagram
The simplest visual representation shows relations as labeled rectangles containing attribute names:
This format is common in textbooks and informal design discussions.
2. Entity-Relationship Diagram (ER Diagram)
While technically a conceptual model, ER diagrams are often adapted to show logical structure:
3. UML Class Diagram (Database Profile)
UML class diagrams can represent relational schemas using the database profile:
Diagram Best Practices:
Popular tools for schema diagrams include: DbSchema, MySQL Workbench, pgModeler (PostgreSQL), Lucidchart, draw.io, and Microsoft Visio. Many provide both forward engineering (diagram → DDL) and reverse engineering (database → diagram) capabilities.
A critical aspect of schema specification is domain definition—constraining each attribute to a specific set of valid values. SQL provides built-in data types, but effective schema design requires understanding their characteristics and limitations.
Numeric Types:
Numeric types vary in precision, range, and storage requirements:
Character Types:
Temporal Types:
| Use Case | Recommended Type | Rationale |
|---|---|---|
| Surrogate primary key | INT or BIGINT AUTO_INCREMENT | Compact, fast comparisons |
| Monetary amounts | DECIMAL(19,4) | Exact precision, avoids rounding errors |
| Percentages | DECIMAL(5,2) | Range 0.00 to 100.00 with precision |
| Person names | VARCHAR(100) | Variable length, most names < 100 chars |
| Email addresses | VARCHAR(254) | RFC 5321 maximum length |
| Country codes | CHAR(2) or CHAR(3) | Fixed ISO standard length |
| UUIDs | CHAR(36) or UUID | Standard format, universal uniqueness |
| Boolean flags | BOOLEAN or TINYINT(1) | True/false or 0/1 |
| Timestamps | TIMESTAMP WITH TIME ZONE | Unambiguous time representation |
| Large text content | TEXT | No length constraint, separate storage |
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859
-- ==================================================-- DOMAIN DEFINITIONS IN SQL-- ================================================== -- Standard SQL supports CREATE DOMAIN (not all DBs implement)-- PostgreSQL example: -- Define semantic domainsCREATE DOMAIN EmailAddress AS VARCHAR(254) CHECK (VALUE ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'); CREATE DOMAIN PositiveMoney AS DECIMAL(19,4) CHECK (VALUE >= 0); CREATE DOMAIN PhoneNumber AS VARCHAR(20) CHECK (VALUE ~ '^\+?[0-9\-\s\(\)]+$'); CREATE DOMAIN Percentage AS DECIMAL(5,2) CHECK (VALUE >= 0 AND VALUE <= 100); -- Using domains in table definitionCREATE TABLE Customer ( CustomerID SERIAL PRIMARY KEY, Email EmailAddress NOT NULL UNIQUE, Phone PhoneNumber, LoyaltyPoints INTEGER DEFAULT 0 CHECK (LoyaltyPoints >= 0), DiscountRate Percentage DEFAULT 0.00); -- For databases without CREATE DOMAIN, use CHECK constraintsCREATE TABLE Product ( ProductID INT PRIMARY KEY, ProductName VARCHAR(200) NOT NULL, SKU CHAR(12) NOT NULL UNIQUE, -- Inline domain constraints Price DECIMAL(10,2) NOT NULL CHECK (Price >= 0), Weight DECIMAL(8,3) CHECK (Weight > 0), StockLevel INT NOT NULL DEFAULT 0 CHECK (StockLevel >= 0), -- Enumerated domain as CHECK Category VARCHAR(50) NOT NULL CHECK (Category IN ('Electronics', 'Clothing', 'Food', 'Furniture', 'Other'))); -- Enum type (PostgreSQL)CREATE TYPE OrderStatus AS ENUM ( 'pending', 'confirmed', 'processing', 'shipped', 'delivered', 'cancelled'); CREATE TABLE Orders ( OrderID SERIAL PRIMARY KEY, CustomerID INT NOT NULL REFERENCES Customer(CustomerID), Status OrderStatus NOT NULL DEFAULT 'pending', OrderDate TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, TotalAmount PositiveMoney NOT NULL);Separate data storage from data display. Store phone numbers without formatting (just digits), dates in UTC, and currencies without symbols. Apply formatting in the application layer. This ensures data integrity and enables consistent searching/sorting.
Keys are the cornerstone of relational schema integrity. A complete schema specification must identify all key types and their roles.
Key Hierarchy:
Super Key: Any set of attributes that uniquely identifies a tuple. A relation may have many super keys.
Candidate Key: A minimal super key—no attribute can be removed while maintaining uniqueness. A relation may have multiple candidate keys.
Primary Key: The candidate key chosen as the main identifier. Every relation must have exactly one primary key. Values cannot be NULL.
Alternate Key: Candidate keys not chosen as primary key. Often implemented as UNIQUE constraints.
Foreign Key: An attribute (or set) that references a candidate key in another (or the same) relation. Enforces referential integrity.
Composite Key: A key consisting of multiple attributes (any of the above types can be composite).
Surrogate Key: A system-generated key (often auto-increment integer) with no business meaning, used for technical efficiency.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899
-- ==================================================-- COMPREHENSIVE KEY SPECIFICATION-- ================================================== -- EXAMPLE 1: Multiple Candidate Keys-- --------------------------------------------------- A book can be uniquely identified by ISBN or by -- (Title, Author, Edition) combination CREATE TABLE Book ( BookID SERIAL, -- Surrogate key ISBN CHAR(13) NOT NULL, -- Candidate key 1 Title VARCHAR(500) NOT NULL, -- \ Author VARCHAR(200) NOT NULL, -- | Candidate key 2 Edition INT DEFAULT 1, -- / Publisher VARCHAR(200), PublishYear INT, -- Primary Key (chosen surrogate for efficiency) PRIMARY KEY (BookID), -- Alternate Keys (other candidate keys) UNIQUE (ISBN), UNIQUE (Title, Author, Edition)); -- EXAMPLE 2: Composite Primary Key-- -------------------------------------------------CREATE TABLE Enrollment ( StudentID INT NOT NULL, CourseID INT NOT NULL, Semester CHAR(6) NOT NULL, -- e.g., '2024SP' Grade CHAR(2), EnrollmentDate DATE NOT NULL, -- Composite Primary Key PRIMARY KEY (StudentID, CourseID, Semester), -- Foreign Keys FOREIGN KEY (StudentID) REFERENCES Student(StudentID) ON DELETE CASCADE, FOREIGN KEY (CourseID) REFERENCES Course(CourseID) ON DELETE RESTRICT); -- EXAMPLE 3: Self-Referencing Foreign Key-- -------------------------------------------------CREATE TABLE Employee ( EmployeeID SERIAL PRIMARY KEY, Name VARCHAR(100) NOT NULL, ManagerID INT, -- Nullable (CEO has no manager) -- Self-referencing FK FOREIGN KEY (ManagerID) REFERENCES Employee(EmployeeID) ON DELETE SET NULL); -- EXAMPLE 4: Multiple Foreign Keys to Same Table-- -------------------------------------------------CREATE TABLE Flight ( FlightID SERIAL PRIMARY KEY, FlightNumber VARCHAR(10) NOT NULL, DepartureCity INT NOT NULL, ArrivalCity INT NOT NULL, DepartureTime TIMESTAMP NOT NULL, ArrivalTime TIMESTAMP NOT NULL, -- Both FKs reference same table (different roles) FOREIGN KEY (DepartureCity) REFERENCES City(CityID), FOREIGN KEY (ArrivalCity) REFERENCES City(CityID), -- Constraint: can't fly to same city CHECK (DepartureCity != ArrivalCity)); -- EXAMPLE 5: Identifying Relationship (Weak Entity)-- --------------------------------------------------- Room is uniquely identified only within a Building CREATE TABLE Building ( BuildingCode CHAR(3) PRIMARY KEY, BuildingName VARCHAR(100) NOT NULL, Address VARCHAR(200)); CREATE TABLE Room ( BuildingCode CHAR(3) NOT NULL, RoomNumber VARCHAR(10) NOT NULL, Capacity INT CHECK (Capacity > 0), RoomType VARCHAR(50), -- Composite PK includes parent's PK PRIMARY KEY (BuildingCode, RoomNumber), -- Identifying relationship FOREIGN KEY (BuildingCode) REFERENCES Building(BuildingCode) ON DELETE CASCADE -- If building deleted, rooms deleted ON UPDATE CASCADE -- If building code changes, propagate);Production-quality schemas require comprehensive documentation beyond the DDL statements themselves. Documentation serves future maintainers, auditors, and developers integrating with the database.
Essential Documentation Components:
Data Dictionary: A complete catalog of all tables, columns, their meanings, and valid values
Relationship Documentation: Explanation of why relationships exist and their business meaning
Constraint Rationale: Why each constraint exists and what business rule it enforces
Historical Context: Why certain design decisions were made, alternatives considered
Usage Examples: Sample queries for common access patterns
Change Log: History of schema modifications with rationale
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788899091
-- ==================================================-- DOCUMENTED SCHEMA EXAMPLE-- ================================================== /*===================================================TABLE: Customer===================================================Purpose: Stores information about individuals or organizations that purchase products or services. Business Owner: Sales DepartmentData Steward: CRM Team Lead Source Systems: - WebShop: Online registrations - SalesForce: Imported leads - Legacy CRM: Historical migration Privacy Classification: PII (Personally Identifiable)Retention Policy: 7 years after last activity===================================================*/CREATE TABLE Customer ( -- Primary identifier, system-generated -- Format: Sequential integer, no gaps guaranteed -- Source: Auto-generated on insert CustomerID SERIAL PRIMARY KEY, -- Legal name for individuals, company name for B2B -- Source: User registration or sales input -- Note: May contain international characters (UTF-8) CustomerName VARCHAR(200) NOT NULL, -- Primary contact email -- Validation: Standard email regex at app layer -- Used for: Account recovery, marketing (with consent) Email VARCHAR(254) NOT NULL UNIQUE, -- Customer segment classification -- Values: -- 'individual' - B2C personal accounts -- 'business' - B2B company accounts -- 'enterprise' - Large accounts with special terms -- Set by: Sales team based on contract value CustomerType VARCHAR(20) NOT NULL DEFAULT 'individual' CHECK (CustomerType IN ('individual', 'business', 'enterprise')), -- Accumulated loyalty points -- Calculation: 1 point per $10 spent -- Expiration: Points expire 24 months after earning -- Updated by: Daily batch process LoyaltyPoints INT NOT NULL DEFAULT 0 CHECK (LoyaltyPoints >= 0), -- Account creation timestamp -- Timezone: Stored in UTC CreatedAt TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, -- Soft delete flag -- When true: Account deactivated but data retained -- Retention: Required for 7-year audit trail IsActive BOOLEAN NOT NULL DEFAULT TRUE); -- Index for email lookups (login, duplicate check)CREATE INDEX idx_customer_email ON Customer(Email); -- Index for active customer queriesCREATE INDEX idx_customer_active ON Customer(IsActive) WHERE IsActive = TRUE; /*---------------------------------------------------RELATIONSHIP: Customer -> Address (1:N)---------------------------------------------------Business Rule: A customer may have multiple addresses (shipping, billing, etc.). At least one primary address is required for order fulfillment. Integrity Rules: - Deleting a customer cascades to addresses - At most one primary address per customer per type---------------------------------------------------*/ -- External data dictionary entry (often in separate doc)COMMENT ON TABLE Customer IS 'Core entity for customer relationship management';COMMENT ON COLUMN Customer.CustomerID IS 'Unique identifier (PK, auto-increment)';COMMENT ON COLUMN Customer.Email IS 'Primary contact email (UK, login credential)';COMMENT ON COLUMN Customer.LoyaltyPoints IS 'Accumulated rewards points, batch-updated daily';The best documentation lives with the code. Use SQL COMMENT statements, embed rationale in DDL files, and keep documentation in version control alongside schema definitions. Stale external documents are worse than no documentation.
Schemas evolve over time. Business requirements change, performance issues emerge, and integration needs expand. Managing schema evolution requires disciplined versioning practices.
Migration Patterns:
Additive Changes: Add new tables, columns, indexes. Generally safe and backward-compatible.
Destructive Changes: Remove or rename elements. Requires careful coordination with applications.
Transformative Changes: Split/merge tables, change data types. Often requires data migration.
Version Control for Schemas:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273747576777879808182838485868788
-- ==================================================-- MIGRATION VERSIONING EXAMPLES-- ================================================== -- Migration: V001__Create_Customer_Table.sql-- Date: 2024-01-15-- Author: Jane Developer-- Description: Initial customer table for CRM module CREATE TABLE Customer ( CustomerID SERIAL PRIMARY KEY, Email VARCHAR(254) NOT NULL UNIQUE, CreatedAt TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP); -- ================================================== -- Migration: V002__Add_Customer_Name.sql-- Date: 2024-02-01-- Author: John Engineer-- Description: Add name field per JIRA-1234 ALTER TABLE Customer ADD COLUMN CustomerName VARCHAR(200); -- Backfill existing recordsUPDATE Customer SET CustomerName = 'Unknown' WHERE CustomerName IS NULL; -- Make NOT NULL after backfillALTER TABLE Customer ALTER COLUMN CustomerName SET NOT NULL; -- ================================================== -- Migration: V003__Add_CustomerType_Enum.sql-- Date: 2024-03-15-- Author: Jane Developer-- Description: Customer segmentation feature -- Add new column with defaultALTER TABLE Customer ADD COLUMN CustomerType VARCHAR(20) NOT NULL DEFAULT 'individual'; -- Add constraintALTER TABLE Customer ADD CONSTRAINT chk_customer_type CHECK (CustomerType IN ('individual', 'business', 'enterprise')); -- ================================================== -- Rollback: V003__Add_CustomerType_Enum_ROLLBACK.sql-- ALWAYS test rollbacks before deployment ALTER TABLE Customer DROP CONSTRAINT chk_customer_type; ALTER TABLE Customer DROP COLUMN CustomerType; -- ================================================== -- Migration: V010__Split_Name_Into_FirstLast.sql-- Date: 2024-06-01-- Author: John Engineer-- Description: Enable proper name sorting/searching-- WARNING: Destructive change - thorough testing required -- Step 1: Add new columnsALTER TABLE Customer ADD COLUMN FirstName VARCHAR(100), ADD COLUMN LastName VARCHAR(100); -- Step 2: Migrate data (simplified; real migration handles edge cases)UPDATE Customer SET FirstName = SPLIT_PART(CustomerName, ' ', 1), LastName = SUBSTRING(CustomerName FROM POSITION(' ' IN CustomerName) + 1); -- Step 3: Set NOT NULL after migrationALTER TABLE Customer ALTER COLUMN FirstName SET NOT NULL, ALTER COLUMN LastName SET NOT NULL; -- Step 4: Remove old column (AFTER apps updated)-- NOTE: This is often a separate migration after verification-- ALTER TABLE Customer DROP COLUMN CustomerName;Relational schemas are the definitive specifications of database structure. Mastery of schema notation, documentation, and evolution practices is essential for professional database work.
What Comes Next:
With schema representation mastered, we turn to normalization—the systematic process of transforming schemas to minimize redundancy and dependency issues. Normalization ensures that our carefully designed schemas don't suffer from update anomalies and data inconsistencies that plague poorly structured databases.
You now command the vocabulary and techniques for specifying, documenting, and evolving relational schemas. These skills translate directly into professional database design work, where schema quality determines system reliability and maintainability.