Logical Design: From Concepts to Schemas

Level: Intermediate

Duration: 90 mins

Topic: Logical Design

Relational Schema

The Blueprint of Relational Databases

A relational schema is the formal, precise specification of a database's logical structure. It defines what relations (tables) exist, what attributes (columns) each contains, what domains constrain each attribute, and what integrity constraints govern the data. If the database is a building, the relational schema is its architectural blueprint—comprehensive, unambiguous, and authoritative.

Understanding relational schemas is essential for multiple reasons:

Communication: Schemas provide a common language between database designers, application developers, DBAs, and stakeholders
Validation: A well-specified schema serves as the first line of defense against data quality issues
Documentation: Schemas document the data model for onboarding, audits, and maintenance
Evolution: Understanding schema notation enables controlled evolution through migrations and versioning

This page provides an exhaustive treatment of relational schemas: their formal mathematical foundations, practical notation systems, diagrammatic representations, and documentation standards used across industry and academia.

Learning Objectives

By the end of this page, you will be able to read and write formal relational schema notation, understand the mathematical foundations underlying schema definitions, create and interpret schema diagrams, and apply industry-standard documentation practices to logical database designs.

Formal Definition of Relational Schema

The relational model, introduced by Edgar F. Codd in 1970, has precise mathematical foundations. Understanding these foundations clarifies why schemas are designed as they are and enables rigorous reasoning about database correctness.

Mathematical Foundations:

A domain D is a set of atomic values. Each domain has a name, a data type, and potentially additional constraints (format, range, enumeration).

Examples:

D_EmployeeID = {positive integers}
D_Name = {strings of length 1-100, non-empty}
D_Salary = {decimal numbers ≥ 0}
D_Department = {'Engineering', 'Sales', 'HR', 'Finance', ...}

An attribute A is the name given to a role played by a domain D in a relation. Multiple attributes may use the same domain.

A relation schema R(A₁, A₂, ..., Aₙ) is a relation name R and an ordered list of attributes, where each attribute Aᵢ has an associated domain dom(Aᵢ).

A relational database schema S is a set of relation schemas {R₁, R₂, ..., Rₘ} together with a set of integrity constraints IC, often written S = ({R₁, R₂, ..., Rₘ}, IC).
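The definitions above can be mirrored in a few lines of Python. This is a minimal sketch, not a standard library or textbook API: the `RelationSchema` class and the `d_*` predicate functions are hypothetical names, and each domain is modeled simply as a membership test over atomic values.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Any, Callable, Dict, Tuple

# A domain is modeled as a membership predicate over atomic values.
Domain = Callable[[Any], bool]


def d_employee_id(v: Any) -> bool:
    """D_EmployeeID = {positive integers}."""
    return isinstance(v, int) and not isinstance(v, bool) and v > 0


def d_name(v: Any) -> bool:
    """D_Name = {strings of length 1-100}."""
    return isinstance(v, str) and 1 <= len(v) <= 100


def d_salary(v: Any) -> bool:
    """D_Salary = {decimal numbers >= 0}."""
    return isinstance(v, Decimal) and v >= 0


@dataclass(frozen=True)
class RelationSchema:
    """R(A1, ..., An) plus a domain assignment dom(Ai) (hypothetical model)."""
    name: str
    attributes: Tuple[str, ...]   # ordered attribute list A1..An
    dom: Dict[str, Domain]        # domain assignment dom: A -> D

    def __post_init__(self) -> None:
        # Attribute names must be distinct within one relation schema.
        if len(set(self.attributes)) != len(self.attributes):
            raise ValueError("attribute names must be distinct")
        missing = [a for a in self.attributes if a not in self.dom]
        if missing:
            raise ValueError(f"no domain assigned to: {missing}")

    @property
    def degree(self) -> int:
        """Degree n = number of attributes."""
        return len(self.attributes)


employee = RelationSchema(
    name="Employee",
    attributes=("EmployeeID", "Name", "Salary"),
    dom={"EmployeeID": d_employee_id, "Name": d_name, "Salary": d_salary},
)

print(employee.degree)                            # 3
print(employee.dom["Salary"](Decimal("5200.00")))  # True
print(employee.dom["EmployeeID"](0))              # False
```

Two attributes may share one predicate (for example, `ManagerID` could reuse `d_employee_id`), which corresponds to several attributes drawing on the same domain.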

Components of a Relation Schema
Component	Mathematical Definition	Practical Interpretation
Relation Name	R ∈ Names	Table identifier (e.g., 'Employee')
Attribute Set	{A₁, A₂, ..., Aₙ}	Column names (e.g., 'EmployeeID', 'Name')
Attribute Order	Tuple (A₁, A₂, ..., Aₙ)	Column sequence (matters in some contexts)
Domain Assignment	dom: A → D	Data type binding (A → INTEGER, etc.)
Degree	n = |{A₁, A₂, ..., Aₙ}|	Number of columns
Relation Schema	R(A₁:D₁, ..., Aₙ:Dₙ)	Complete table structure definition

Schema vs. Instance

The schema defines structure (intension); the instance contains actual data (extension). A schema is like a class definition in OOP, while an instance is like the objects created from that class. Schema changes are migrations; instance changes are transactions.

Relation Instance:

A relation instance (or relation state) r(R) is a set of tuples {t₁, t₂, ..., tₘ} where each tuple t is an ordered list of values t = <v₁, v₂, ..., vₙ> such that each value vᵢ is an element of dom(Aᵢ) or is NULL:

vᵢ ∈ dom(Aᵢ) ∪ {NULL}

The cardinality of a relation is the number of tuples |r(R)|, which changes as tuples are inserted or deleted (an update changes values but not the count).
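As a rough illustration of the instance definition, the sketch below checks the condition vᵢ ∈ dom(Aᵢ) ∪ {NULL} for each tuple and stores valid tuples in a Python set, so duplicates collapse just as they do in a mathematical relation. The attribute list, the lambda domains, and the sample rows are made up for the example; `None` stands in for NULL.

```python
from typing import Any, Callable, Dict, Sequence, Set, Tuple

# Hypothetical schema: Employee(EmployeeID, Name, DeptID)
ATTRS = ("EmployeeID", "Name", "DeptID")
DOM: Dict[str, Callable[[Any], bool]] = {
    "EmployeeID": lambda v: isinstance(v, int) and v > 0,
    "Name": lambda v: isinstance(v, str) and 1 <= len(v) <= 100,
    "DeptID": lambda v: isinstance(v, int) and v > 0,
}


def is_valid_tuple(t: Sequence[Any]) -> bool:
    """True if t has n values and each v_i is in dom(A_i) or is NULL (None)."""
    if len(t) != len(ATTRS):
        return False
    return all(v is None or DOM[a](v) for a, v in zip(ATTRS, t))


# A relation instance r(R) is a set of tuples, so a repeated tuple counts once.
r: Set[Tuple[Any, ...]] = set()
for row in [(1, "Ada", 10), (2, "Grace", None), (1, "Ada", 10)]:
    if is_valid_tuple(row):
        r.add(row)

cardinality = len(r)
print(cardinality)                    # 2
print(is_valid_tuple((0, "Bob", 3)))  # False: 0 is outside dom(EmployeeID)
```

The check allows NULL in every position because that is all the instance definition says; rules such as NOT NULL or key constraints belong to the integrity constraints IC and would be verified separately.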

Key insight: While the schema is (relatively) static and defines the rules of the database, instances are dynamic and change with every transaction. Good schema design anticipates the patterns of instance change and optimizes for common operations.

Text-Based Schema Notation

Text-based schema notation provides a precise, portable way to document relational schemas in contexts where diagrams aren't practical—papers, emails, whiteboards, code comments, and design documents.

Standard Academic Notation:

The widely-accepted notation for relation schemas follows these conventions:

RELATION_NAME(attribute1, attribute2, ..., attributeN)

Conventions for marking constraints:

Primary Key: Underline the attribute(s) or list as PK. Example: Employee(EmployeeID, Name, Salary), with EmployeeID underlined
Foreign Key: Use italic, arrow notation, or FK label. Example: Employee(ID, Name, DeptID*) where * denotes FK
Composite Key: Underline multiple attributes
Alternate Key: Double underline or note separately
NOT NULL: Superscript or bold

Since underlining and italics aren't always available, alternative notations exist:

schema_notation_examples.txt
=====================================================
RELATIONAL SCHEMA NOTATION EXAMPLES
=====================================================
 
--------------------------------------------------
NOTATION STYLE 1: Inline Markers
--------------------------------------------------
Employee(EmployeeID [PK], FirstName, LastName, 
         Email [UK], DepartmentID [FK → Department.DepartmentID])
Department(DepartmentID [PK], DepartmentName [UK], Budget)
Project(ProjectID [PK], ProjectName, StartDate, EndDate)
EmployeeProject(EmployeeID [PK,FK], ProjectID [PK,FK], 
                HoursPerWeek, Role)
 
Legend: PK = Primary Key, FK = Foreign Key, UK = Unique Key
 
--------------------------------------------------
NOTATION STYLE 2: Separate Constraint Listing
--------------------------------------------------
Employee(EmployeeID, FirstName, LastName, Email, 
         Salary, HireDate, DepartmentID)
 
Constraints:
  PK: Employee(EmployeeID)
  UK: Employee(Email)
  FK: Employee(DepartmentID) → Department(DepartmentID)
  NOT NULL: FirstName, LastName, Email, HireDate, DepartmentID
  CHECK: Salary >= 0
 
--------------------------------------------------
NOTATION STYLE 3: Textual Description (Formal)
--------------------------------------------------
Schema: COMPANY_DB
 
R1 = EMPLOYEE(ID: INTEGER, NAME: VARCHAR(100), 
              DEPT_ID: INTEGER, SALARY: DECIMAL(10,2))
    PRIMARY KEY: {ID}
    FOREIGN KEY: {DEPT_ID} REFERENCES DEPARTMENT
 
R2 = DEPARTMENT(ID: INTEGER, NAME: VARCHAR(100), 
                BUDGET: DECIMAL(15,2))
    PRIMARY KEY: {ID}
    UNIQUE: {NAME}
 
R3 = WORKS_ON(EMP_ID: INTEGER, PROJ_ID: INTEGER, 
              HOURS: DECIMAL(4,1))
    PRIMARY KEY: {EMP_ID, PROJ_ID}
    FOREIGN KEY: {EMP_ID} REFERENCES EMPLOYEE
    FOREIGN KEY: {PROJ_ID} REFERENCES PROJECT
 
--------------------------------------------------
NOTATION STYLE 4: Markdown-Friendly
--------------------------------------------------
## Employee
| Column       | Type          | Constraints          |
|--------------|---------------|----------------------|
| **EmployeeID** | INT         | PK                   |
| FirstName    | VARCHAR(50)   | NOT NULL             |
| LastName     | VARCHAR(50)   | NOT NULL             |
| Email        | VARCHAR(100)  | UNIQUE, NOT NULL     |
| DepartmentID | INT           | FK → Department(ID)  |
 
--------------------------------------------------
NOTATION STYLE 5: CompSci Textbook (Elmasri/Navathe)
--------------------------------------------------
EMPLOYEE
  EmployeeID        INTEGER         NOT NULL
  FirstName         VARCHAR(50)     NOT NULL
  LastName          VARCHAR(50)     NOT NULL
  Email             VARCHAR(100)    NOT NULL, UNIQUE
  Salary            DECIMAL(10,2)   
  DepartmentID      INTEGER         NOT NULL
 
  PRIMARY KEY (EmployeeID)
  FOREIGN KEY (DepartmentID) REFERENCES DEPARTMENT(DepartmentID)
    ON DELETE RESTRICT
    ON UPDATE CASCADE

Notation Consistency

Choose one notation style and use it consistently throughout a project. Document your conventions in a schema style guide. Inconsistent notation leads to misinterpretation, especially in large teams or during knowledge transfer.
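One way to keep notation consistent is to generate it from a single structured definition instead of typing it by hand. The sketch below renders Style 1 (inline markers) from a small dictionary; the `SCHEMA` layout and the `render_inline` helper are assumptions made for illustration, not part of any tool.

```python
from typing import Dict, List, Optional, Tuple

# Each attribute: (name, markers, fk_target).
# fk_target is "Table.Column" for foreign keys, otherwise None.
Attribute = Tuple[str, List[str], Optional[str]]

SCHEMA: Dict[str, List[Attribute]] = {
    "Employee": [
        ("EmployeeID", ["PK"], None),
        ("FirstName", [], None),
        ("LastName", [], None),
        ("Email", ["UK"], None),
        ("DepartmentID", ["FK"], "Department.DepartmentID"),
    ],
    "Department": [
        ("DepartmentID", ["PK"], None),
        ("DepartmentName", ["UK"], None),
        ("Budget", [], None),
    ],
}


def render_inline(relation: str, attrs: List[Attribute]) -> str:
    """Render one relation as Name(attr [markers], ...)."""
    parts = []
    for name, markers, fk_target in attrs:
        labels = []
        for marker in markers:
            if marker == "FK" and fk_target:
                labels.append(f"FK → {fk_target}")
            else:
                labels.append(marker)
        parts.append(f"{name} [{','.join(labels)}]" if labels else name)
    return f"{relation}({', '.join(parts)})"


for relation, attrs in SCHEMA.items():
    print(render_inline(relation, attrs))
```

Because every rendering comes from the same dictionary, a second renderer for another style can be added without the two drifting apart.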

Schema Diagrams and Visual Representation

While text notation is precise and portable, visual diagrams communicate schema structure more intuitively. Several diagram types are used in practice:

1. Relational Schema Diagram

The simplest visual representation shows relations as labeled rectangles containing attribute names:

Primary key attributes are underlined or marked with a key icon
Foreign keys are connected to their referenced primary keys with arrows
Data types may be shown alongside attribute names

This format is common in textbooks and informal design discussions.

2. Entity-Relationship Diagram (ER Diagram)

While technically a conceptual model, ER diagrams are often adapted to show logical structure:

Chen notation: entities as rectangles, relationships as diamonds, attributes as ovals
Crow's foot notation: entities as rectangles with relationship lines showing cardinality
IE (Information Engineering) notation: similar to crow's foot with minor variations

3. UML Class Diagram (Database Profile)

UML class diagrams can represent relational schemas using the database profile:

Classes represent tables
Attributes represent columns with types
Associations represent foreign key relationships
Stereotypes (<<PK>>, <<FK>>) mark key columns

Diagram Best Practices:

Consistency: Use the same notation throughout the organization
Clarity: Don't overcrowd diagrams; break large schemas into subject areas
Hierarchy: Show the most important tables prominently
Color coding: Use colors to indicate subject areas or table types
Versioning: Include version numbers and dates on diagrams
Legend: Always include a notation legend for readers unfamiliar with conventions

Tool Selection

Popular tools for schema diagrams include: DbSchema, MySQL Workbench, pgModeler (PostgreSQL), Lucidchart, draw.io, and Microsoft Visio. Many provide both forward engineering (diagram → DDL) and reverse engineering (database → diagram) capabilities.
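Reverse engineering boils down to reading the database catalog and turning foreign keys into diagram edges. The following sketch does that for SQLite, chosen only because it ships with Python; the two tables are hypothetical, and the `PRAGMA` calls are SQLite-specific (other systems expose similar metadata through their own catalog views).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (
    DepartmentID   INTEGER PRIMARY KEY,
    DepartmentName TEXT NOT NULL UNIQUE
);
CREATE TABLE Employee (
    EmployeeID   INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    DepartmentID INTEGER NOT NULL
        REFERENCES Department(DepartmentID)
);
""")


def foreign_key_edges(conn: sqlite3.Connection) -> list:
    """Return 'Child.col -> Parent.col' strings for every declared FK."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    edges = []
    for table in tables:
        # Row layout: id, seq, referenced table, from column, to column, ...
        for _id, _seq, ref_table, from_col, to_col, *_ in conn.execute(
                f'PRAGMA foreign_key_list("{table}")'):
            edges.append(f"{table}.{from_col} -> {ref_table}.{to_col}")
    return edges


edges = foreign_key_edges(conn)
print(edges)  # ['Employee.DepartmentID -> Department.DepartmentID']
```

A diagramming tool would feed edges like these, together with the column lists, into its layout engine.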

Data Types and Domain Specifications

A critical aspect of schema specification is domain definition—constraining each attribute to a specific set of valid values. SQL provides built-in data types, but effective schema design requires understanding their characteristics and limitations.

Numeric Types:

Numeric types vary in precision, range, and storage requirements:

INTEGER/INT: Whole numbers, typically 4 bytes, range ≈ ±2 billion
SMALLINT: 2 bytes, range ≈ ±32,000
BIGINT: 8 bytes, for very large values
DECIMAL(p,s)/NUMERIC(p,s): Exact decimal with precision p and scale s
FLOAT/REAL/DOUBLE: Approximate floating-point (avoid for financial data)
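The warning about approximate types is easy to reproduce outside a database. This small Python sketch contrasts binary floating point with exact decimal arithmetic; Python's `Decimal` is used here as an analogue of DECIMAL(p,s), and the choice of `ROUND_HALF_UP` is an assumption, since rounding rules differ between systems and business contexts.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_total = 0.1 + 0.2
# Decimal values built from strings stay exact, like DECIMAL(p,s) columns.
decimal_total = Decimal("0.10") + Decimal("0.20")

print(float_total == 0.3)                # False
print(decimal_total == Decimal("0.30"))  # True

# Rounding to a fixed scale of 2, comparable to storing into DECIMAL(10,2).
stored = Decimal("19.995").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(stored)                            # 20.00
```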

Character Types:

CHAR(n): Fixed-length, padded with spaces
VARCHAR(n): Variable-length up to n characters
TEXT/CLOB: Large text without length limit (database-specific)

Temporal Types:

DATE: Year, month, day
TIME: Hour, minute, second, optional timezone
TIMESTAMP/DATETIME: Combined date and time
INTERVAL: Duration between timestamps

Data Type Selection Guidelines
Use Case	Recommended Type	Rationale
Surrogate primary key	INT or BIGINT AUTO_INCREMENT	Compact, fast comparisons
Monetary amounts	DECIMAL(19,4)	Exact precision, avoids rounding errors
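Percentages	DECIMAL(5,2) with CHECK (value BETWEEN 0 AND 100)	Two decimal places; the type alone allows up to 999.99, so the CHECK enforces the 0.00 to 100.00 range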
Person names	VARCHAR(100)	Variable length, most names < 100 chars
Email addresses	VARCHAR(254)	RFC 5321 maximum length
Country codes	CHAR(2) or CHAR(3)	Fixed ISO standard length
UUIDs	CHAR(36) or UUID	Standard format, universal uniqueness
Boolean flags	BOOLEAN or TINYINT(1)	True/false or 0/1
Timestamps	TIMESTAMP WITH TIME ZONE	Unambiguous time representation
Large text content	TEXT	No length constraint, separate storage

domain_definitions.sql
-- ==================================================
-- DOMAIN DEFINITIONS IN SQL
-- ==================================================
 
-- Standard SQL supports CREATE DOMAIN (not all DBs implement)
-- PostgreSQL example:
 
-- Define semantic domains
CREATE DOMAIN EmailAddress AS VARCHAR(254)
    CHECK (VALUE ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');
 
CREATE DOMAIN PositiveMoney AS DECIMAL(19,4)
    CHECK (VALUE >= 0);
 
CREATE DOMAIN PhoneNumber AS VARCHAR(20)
    CHECK (VALUE ~ '^\+?[0-9\-\s\(\)]+$');
 
CREATE DOMAIN Percentage AS DECIMAL(5,2)
    CHECK (VALUE >= 0 AND VALUE <= 100);
 
-- Using domains in table definition
CREATE TABLE Customer (
    CustomerID      SERIAL          PRIMARY KEY,
    Email           EmailAddress    NOT NULL UNIQUE,
    Phone           PhoneNumber,
    LoyaltyPoints   INTEGER         DEFAULT 0 CHECK (LoyaltyPoints >= 0),
    DiscountRate    Percentage      DEFAULT 0.00
);
 
-- For databases without CREATE DOMAIN, use CHECK constraints
CREATE TABLE Product (
    ProductID       INT             PRIMARY KEY,
    ProductName     VARCHAR(200)    NOT NULL,
    SKU             CHAR(12)        NOT NULL UNIQUE,
    
    -- Inline domain constraints
    Price           DECIMAL(10,2)   NOT NULL CHECK (Price >= 0),
    Weight          DECIMAL(8,3)    CHECK (Weight > 0),
    StockLevel      INT             NOT NULL DEFAULT 0 CHECK (StockLevel >= 0),
    
    -- Enumerated domain as CHECK
    Category        VARCHAR(50)     NOT NULL 
                    CHECK (Category IN ('Electronics', 'Clothing', 
                                        'Food', 'Furniture', 'Other'))
);
 
-- Enum type (PostgreSQL)
CREATE TYPE OrderStatus AS ENUM (
    'pending', 'confirmed', 'processing', 
    'shipped', 'delivered', 'cancelled'
);
 
CREATE TABLE Orders (
    OrderID         SERIAL          PRIMARY KEY,
    CustomerID      INT             NOT NULL REFERENCES Customer(CustomerID),
    Status          OrderStatus     NOT NULL DEFAULT 'pending',
    OrderDate       TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    TotalAmount     PositiveMoney   NOT NULL
);
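To see domain constraints rejecting bad data, the sketch below recreates a cut-down Product table in an in-memory SQLite database and attempts a few inserts. SQLite is an assumption made for portability: it has no CREATE DOMAIN, so the inline CHECK approach is used, and the `try_insert` helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Product (
    ProductID   INTEGER PRIMARY KEY,
    ProductName TEXT    NOT NULL,
    StockLevel  INTEGER NOT NULL DEFAULT 0 CHECK (StockLevel >= 0),
    Category    TEXT    NOT NULL
                CHECK (Category IN ('Electronics', 'Clothing',
                                    'Food', 'Furniture', 'Other'))
)
""")


def try_insert(name, stock, category) -> bool:
    """Return True if the row satisfies every column constraint."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Product (ProductName, StockLevel, Category) "
                "VALUES (?, ?, ?)",
                (name, stock, category),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("Desk lamp", 5, "Furniture"))   # True
print(try_insert("Desk lamp", -1, "Furniture"))  # False: StockLevel < 0
print(try_insert("Desk lamp", 5, "Toys"))        # False: not in the enumeration
print(try_insert(None, 5, "Food"))               # False: NOT NULL violated
```

The application never has to re-implement these rules; any client that writes to the table is held to the same domain.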

Storage vs. Display

Separate data storage from data display. Store phone numbers in one normalized form (for example, digits only with an optional leading +), timestamps in UTC, and monetary amounts without currency symbols. Apply formatting in the application layer. This keeps stored values comparable and makes searching and sorting consistent.
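A minimal sketch of that separation follows. The function names, the 10-digit display rule, and the storage form (digits with an optional leading +) are all assumptions for illustration; a production system would more likely rely on a dedicated phone-number library.

```python
import re
from datetime import datetime, timedelta, timezone


def normalize_phone(raw: str) -> str:
    """Storage form: digits only, keeping one leading '+' if present."""
    raw = raw.strip()
    digits = re.sub(r"\D", "", raw)
    return "+" + digits if raw.startswith("+") else digits


def format_us_phone(stored: str) -> str:
    """Display-only formatting for 10-digit numbers; others shown as stored."""
    if len(stored) == 10 and stored.isdigit():
        return f"({stored[:3]}) {stored[3:6]}-{stored[6:]}"
    return stored


def to_utc(local: datetime) -> datetime:
    """Convert a time-zone-aware datetime to UTC before storing it."""
    if local.tzinfo is None:
        raise ValueError("naive datetime: the time zone is unknown")
    return local.astimezone(timezone.utc)


print(normalize_phone("(555) 123-4567"))   # 5551234567
print(normalize_phone("+1 555-123-4567"))  # +15551234567
print(format_us_phone("5551234567"))       # (555) 123-4567

local_time = datetime(2024, 1, 15, 9, 0, tzinfo=timezone(timedelta(hours=-5)))
print(to_utc(local_time).isoformat())      # 2024-01-15T14:00:00+00:00
```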

Complete Key Specification

Keys are the cornerstone of relational schema integrity. A complete schema specification must identify all key types and their roles.

Key Hierarchy:

Super Key: Any set of attributes that uniquely identifies a tuple. A relation may have many super keys.
Candidate Key: A minimal super key—no attribute can be removed while maintaining uniqueness. A relation may have multiple candidate keys.
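Primary Key: The candidate key chosen as the main identifier. A relation has at most one primary key, and its values cannot be NULL. The relational model expects every relation to have one; most SQL DBMSs allow tables without a declared primary key, but declaring one is strongly recommended.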
Alternate Key: Candidate keys not chosen as primary key. Often implemented as UNIQUE constraints.
Foreign Key: An attribute (or set) that references a candidate key in another (or the same) relation. Enforces referential integrity.
Composite Key: A key consisting of multiple attributes (any of the above types can be composite).
Surrogate Key: A system-generated key (often auto-increment integer) with no business meaning, used for technical efficiency.
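The super key and candidate key definitions can be explored on sample data. The sketch below brute-forces attribute subsets in order of size and keeps the minimal ones that are unique; the `books` rows (including the ISBN values) are invented, and the helper names are hypothetical. Keys are properties of the schema, so a data sample can only rule a key out, never prove it.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Sequence

Row = Dict[str, object]


def is_superkey(rows: Sequence[Row], attrs: Sequence[str]) -> bool:
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows: Sequence[Row],
                   all_attrs: Sequence[str]) -> List[FrozenSet[str]]:
    """Minimal super keys of this sample, found smallest-first."""
    found: List[FrozenSet[str]] = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(all_attrs, size):
            attrs = frozenset(combo)
            if any(key <= attrs for key in found):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(rows, combo):
                found.append(attrs)
    return found


books = [
    {"BookID": 1, "ISBN": "9780000000001", "Title": "SQL Basics", "Edition": 1},
    {"BookID": 2, "ISBN": "9780000000002", "Title": "SQL Basics", "Edition": 2},
    {"BookID": 3, "ISBN": "9780000000003", "Title": "Data Models", "Edition": 1},
]

keys = candidate_keys(books, ["BookID", "ISBN", "Title", "Edition"])
for key in keys:
    print(sorted(key))
```

For this sample the minimal keys are {BookID}, {ISBN}, and {Title, Edition}; every larger unique set, such as {BookID, Title}, is a super key but not a candidate key.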

key_specification_examples.sql
-- ==================================================
-- COMPREHENSIVE KEY SPECIFICATION
-- ==================================================
 
-- EXAMPLE 1: Multiple Candidate Keys
-- -------------------------------------------------
-- A book can be uniquely identified by ISBN or by 
-- (Title, Author, Edition) combination
 
CREATE TABLE Book (
    BookID          SERIAL,                     -- Surrogate key
    ISBN            CHAR(13)        NOT NULL,   -- Candidate key 1
    Title           VARCHAR(500)    NOT NULL,   -- \
    Author          VARCHAR(200)    NOT NULL,   -- | Candidate key 2
    Edition         INT             NOT NULL DEFAULT 1,  -- /
    Publisher       VARCHAR(200),
    PublishYear     INT,
    
    -- Primary Key (chosen surrogate for efficiency)
    PRIMARY KEY (BookID),
    
    -- Alternate Keys (other candidate keys)
    UNIQUE (ISBN),
    UNIQUE (Title, Author, Edition)
);
 
-- EXAMPLE 2: Composite Primary Key
-- -------------------------------------------------
CREATE TABLE Enrollment (
    StudentID       INT             NOT NULL,
    CourseID        INT             NOT NULL,
    Semester        CHAR(6)         NOT NULL,  -- e.g., '2024SP'
    Grade           CHAR(2),
    EnrollmentDate  DATE            NOT NULL,
    
    -- Composite Primary Key
    PRIMARY KEY (StudentID, CourseID, Semester),
    
    -- Foreign Keys
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID)
        ON DELETE CASCADE,
    FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
        ON DELETE RESTRICT
);
 
-- EXAMPLE 3: Self-Referencing Foreign Key
-- -------------------------------------------------
CREATE TABLE Employee (
    EmployeeID      SERIAL          PRIMARY KEY,
    Name            VARCHAR(100)    NOT NULL,
    ManagerID       INT,            -- Nullable (CEO has no manager)
    
    -- Self-referencing FK
    FOREIGN KEY (ManagerID) REFERENCES Employee(EmployeeID)
        ON DELETE SET NULL
);
 
-- EXAMPLE 4: Multiple Foreign Keys to Same Table
-- -------------------------------------------------
CREATE TABLE Flight (
    FlightID        SERIAL          PRIMARY KEY,
    FlightNumber    VARCHAR(10)     NOT NULL,
    DepartureCity   INT             NOT NULL,
    ArrivalCity     INT             NOT NULL,
    DepartureTime   TIMESTAMP       NOT NULL,
    ArrivalTime     TIMESTAMP       NOT NULL,
    
    -- Both FKs reference same table (different roles)
    FOREIGN KEY (DepartureCity) REFERENCES City(CityID),
    FOREIGN KEY (ArrivalCity) REFERENCES City(CityID),
    
    -- Constraint: can't fly to same city
    CHECK (DepartureCity <> ArrivalCity)
);
 
-- EXAMPLE 5: Identifying Relationship (Weak Entity)
-- -------------------------------------------------
-- Room is uniquely identified only within a Building
 
CREATE TABLE Building (
    BuildingCode    CHAR(3)         PRIMARY KEY,
    BuildingName    VARCHAR(100)    NOT NULL,
    Address         VARCHAR(200)
);
 
CREATE TABLE Room (
    BuildingCode    CHAR(3)         NOT NULL,
    RoomNumber      VARCHAR(10)     NOT NULL,
    Capacity        INT             CHECK (Capacity > 0),
    RoomType        VARCHAR(50),
    
    -- Composite PK includes parent's PK
    PRIMARY KEY (BuildingCode, RoomNumber),
    
    -- Identifying relationship
    FOREIGN KEY (BuildingCode) REFERENCES Building(BuildingCode)
        ON DELETE CASCADE   -- If building deleted, rooms deleted
        ON UPDATE CASCADE   -- If building code changes, propagate
);
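The identifying relationship in Example 5 can be exercised end to end. This sketch rebuilds Building and Room in an in-memory SQLite database (an assumption made so the example runs anywhere Python does) and shows both the cascade and the rejection of an orphan row. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Building (
    BuildingCode TEXT PRIMARY KEY,
    BuildingName TEXT NOT NULL
);
CREATE TABLE Room (
    BuildingCode TEXT NOT NULL,
    RoomNumber   TEXT NOT NULL,
    Capacity     INTEGER CHECK (Capacity > 0),
    PRIMARY KEY (BuildingCode, RoomNumber),
    FOREIGN KEY (BuildingCode) REFERENCES Building(BuildingCode)
        ON DELETE CASCADE ON UPDATE CASCADE
);
INSERT INTO Building VALUES ('SCI', 'Science Hall'), ('LIB', 'Library');
INSERT INTO Room VALUES ('SCI', '101', 40), ('SCI', '102', 25),
                        ('LIB', '101', 12);
""")


def room_count(code: str) -> int:
    """Number of rooms recorded for one building."""
    return conn.execute(
        "SELECT COUNT(*) FROM Room WHERE BuildingCode = ?", (code,)
    ).fetchone()[0]


before = room_count("SCI")
with conn:
    conn.execute("DELETE FROM Building WHERE BuildingCode = 'SCI'")
after = room_count("SCI")
print(before, after)  # 2 0

# A room cannot exist without its building.
try:
    with conn:
        conn.execute("INSERT INTO Room VALUES ('GYM', '1', 100)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```

Room number '101' appears in two buildings without conflict, which is exactly what the composite primary key (BuildingCode, RoomNumber) permits.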

Key Selection Decision Framework

• Stability: Choose keys whose values won't change. Surrogate keys excel here.
• Simplicity: Prefer single-column keys over composites when possible; they keep joins simpler.
• Meaningfulness: Natural keys aid debugging because their values are recognizable; surrogate keys carry no business meaning.
• Size: Smaller keys (integers) mean smaller indexes and faster joins.
• Universality: If data may integrate with other systems, consider industry-standard identifiers (ISBN, SSN, ISIN).

Schema Documentation Standards

Production-quality schemas require comprehensive documentation beyond the DDL statements themselves. Documentation serves future maintainers, auditors, and developers integrating with the database.

Essential Documentation Components:

Data Dictionary: A complete catalog of all tables, columns, their meanings, and valid values
Relationship Documentation: Explanation of why relationships exist and their business meaning
Constraint Rationale: Why each constraint exists and what business rule it enforces
Historical Context: Why certain design decisions were made, alternatives considered
Usage Examples: Sample queries for common access patterns
Change Log: History of schema modifications with rationale

schema_documentation_example.sql
-- ==================================================
-- DOCUMENTED SCHEMA EXAMPLE
-- ==================================================
 
/*
===================================================
TABLE: Customer
===================================================
Purpose: 
    Stores information about individuals or 
    organizations that purchase products or services.
    
Business Owner: Sales Department
Data Steward: CRM Team Lead
 
Source Systems:
    - WebShop: Online registrations
    - SalesForce: Imported leads
    - Legacy CRM: Historical migration
 
Privacy Classification: PII (Personally Identifiable)
Retention Policy: 7 years after last activity
===================================================
*/
CREATE TABLE Customer (
    -- Primary identifier, system-generated
    -- Format: Sequential integer; gaps can occur (sequences are not gap-free)
    -- Source: Auto-generated on insert
    CustomerID      SERIAL          PRIMARY KEY,
    
    -- Legal name for individuals, company name for B2B
    -- Source: User registration or sales input
    -- Note: May contain international characters (UTF-8)
    CustomerName    VARCHAR(200)    NOT NULL,
    
    -- Primary contact email
    -- Validation: Standard email regex at app layer
    -- Used for: Account recovery, marketing (with consent)
    Email           VARCHAR(254)    NOT NULL UNIQUE,
    
    -- Customer segment classification
    -- Values:
    --   'individual' - B2C personal accounts
    --   'business' - B2B company accounts
    --   'enterprise' - Large accounts with special terms
    -- Set by: Sales team based on contract value
    CustomerType    VARCHAR(20)     NOT NULL DEFAULT 'individual'
                    CHECK (CustomerType IN ('individual', 'business', 'enterprise')),
    
    -- Accumulated loyalty points
    -- Calculation: 1 point per $10 spent
    -- Expiration: Points expire 24 months after earning
    -- Updated by: Daily batch process
    LoyaltyPoints   INT             NOT NULL DEFAULT 0 CHECK (LoyaltyPoints >= 0),
    
    -- Account creation timestamp
    -- Timezone: Stored in UTC
    CreatedAt       TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    
    -- Soft delete flag
    -- When true: Account deactivated but data retained
    -- Retention: Required for 7-year audit trail
    IsActive        BOOLEAN         NOT NULL DEFAULT TRUE
);
 
-- Email lookups (login, duplicate check) are already served by the index
-- that the UNIQUE constraint on Email creates; no separate index is needed.
 
-- Index for active customer queries
CREATE INDEX idx_customer_active ON Customer(IsActive) WHERE IsActive = TRUE;
 
/*
---------------------------------------------------
RELATIONSHIP: Customer -> Address (1:N)
---------------------------------------------------
Business Rule:
    A customer may have multiple addresses (shipping,
    billing, etc.). At least one primary address is 
    required for order fulfillment.
    
Integrity Rules:
    - Deleting a customer cascades to addresses
    - At most one primary address per customer per type
---------------------------------------------------
*/
 
-- Data dictionary entries stored in the database catalog (PostgreSQL COMMENT ON)
COMMENT ON TABLE Customer IS 'Core entity for customer relationship management';
COMMENT ON COLUMN Customer.CustomerID IS 'Unique identifier (PK, auto-increment)';
COMMENT ON COLUMN Customer.Email IS 'Primary contact email (UK, login credential)';
COMMENT ON COLUMN Customer.LoyaltyPoints IS 'Accumulated rewards points, batch-updated daily';

Living Documentation

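The best documentation lives with the code. Use COMMENT ON statements where the DBMS supports them (PostgreSQL and Oracle do; MySQL uses COMMENT clauses in DDL instead), embed rationale in DDL files, and keep documentation in version control alongside schema definitions. Stale external documents can be worse than no documentation, because readers trust them.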
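One way to keep documentation from going stale is to generate the data dictionary from the live catalog and flag columns that lack a description. The sketch below does this against SQLite using `PRAGMA table_info`; the table, the `DESCRIPTIONS` mapping, and the Markdown-style output are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Customer (
    CustomerID    INTEGER PRIMARY KEY,
    Email         TEXT    NOT NULL UNIQUE,
    LoyaltyPoints INTEGER NOT NULL DEFAULT 0,
    Nickname      TEXT
)
""")

# Hypothetical descriptions, kept in version control next to the DDL.
DESCRIPTIONS = {
    "CustomerID": "Unique identifier (PK)",
    "Email": "Primary contact email",
    "LoyaltyPoints": "Accumulated rewards points",
}


def data_dictionary(conn, table, descriptions):
    """Return (table lines, undocumented column names) for one table."""
    lines = ["| Column | Type | Nullable | PK | Description |",
             "|---|---|---|---|---|"]
    undocumented = []
    # Row layout: cid, name, type, notnull, dflt_value, pk
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'):
        nullable = "no" if (notnull or pk) else "yes"
        is_pk = "yes" if pk else "no"
        text = descriptions.get(name)
        if text is None:
            undocumented.append(name)
            text = "(undocumented)"
        lines.append(f"| {name} | {col_type} | {nullable} | {is_pk} | {text} |")
    return lines, undocumented


lines, undocumented = data_dictionary(conn, "Customer", DESCRIPTIONS)
print("\n".join(lines))
print("Missing descriptions:", undocumented)  # ['Nickname']
```

Running a check like this in continuous integration turns a missing description into a failed build rather than a silent gap.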

Schema Versioning and Evolution

Schemas evolve over time. Business requirements change, performance issues emerge, and integration needs expand. Managing schema evolution requires disciplined versioning practices.

Migration Patterns:

Additive Changes: Add new tables, columns, indexes. Generally safe and backward-compatible.
Destructive Changes: Remove or rename elements. Requires careful coordination with applications.
Transformative Changes: Split/merge tables, change data types. Often requires data migration.

Version Control for Schemas:

Store DDL scripts in version control (Git)
Use migration tools (Flyway, Liquibase, Alembic, Prisma)
Never modify production schemas directly
Follow a consistent migration script naming convention
Maintain both 'up' and 'down' migrations for reversibility
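The core mechanism behind migration tools is small: an ordered list of versioned scripts plus a table recording which versions have been applied. The sketch below is a toy runner over SQLite, not how Flyway, Liquibase, Alembic, or Prisma are implemented; the `MIGRATIONS` list, the `schema_version` table, and the function names are hypothetical.

```python
import sqlite3

# (version, description, up SQL, down SQL) - hypothetical migrations
MIGRATIONS = [
    (1, "create customer",
     "CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, "
     "Email TEXT NOT NULL UNIQUE)",
     "DROP TABLE Customer"),
    (2, "create address",
     "CREATE TABLE Address (AddressID INTEGER PRIMARY KEY, "
     "CustomerID INTEGER NOT NULL REFERENCES Customer(CustomerID), "
     "Line1 TEXT NOT NULL)",
     "DROP TABLE Address"),
]


def current_version(conn: sqlite3.Connection) -> int:
    """Highest applied version, or 0 for a fresh database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER PRIMARY KEY)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] or 0


def migrate(conn: sqlite3.Connection, target: int) -> int:
    """Apply 'up' scripts in order, or 'down' scripts in reverse, to reach target."""
    version = current_version(conn)
    if target > version:
        for v, _desc, up, _down in MIGRATIONS:
            if version < v <= target:
                with conn:
                    conn.execute(up)
                    conn.execute(
                        "INSERT INTO schema_version (version) VALUES (?)", (v,))
    else:
        for v, _desc, _up, down in reversed(MIGRATIONS):
            if target < v <= version:
                with conn:
                    conn.execute(down)
                    conn.execute(
                        "DELETE FROM schema_version WHERE version = ?", (v,))
    return current_version(conn)


def table_names(conn: sqlite3.Connection) -> list:
    """User tables, excluding the bookkeeping table."""
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' "
        "AND name <> 'schema_version' ORDER BY name")]


conn = sqlite3.connect(":memory:")
after_up = migrate(conn, 2)
tables_up = table_names(conn)
after_down = migrate(conn, 1)
tables_down = table_names(conn)
print(after_up, tables_up)      # 2 ['Address', 'Customer']
print(after_down, tables_down)  # 1 ['Customer']
```

Real tools add checksums, locking, and transactional DDL where the database supports it; the sketch only shows why ordered versions and paired up/down scripts make a schema's state reproducible.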

migration_examples.sql
-- ==================================================
-- MIGRATION VERSIONING EXAMPLES
-- ==================================================
 
-- Migration: V001__Create_Customer_Table.sql
-- Date: 2024-01-15
-- Author: Jane Developer
-- Description: Initial customer table for CRM module
 
CREATE TABLE Customer (
    CustomerID      SERIAL          PRIMARY KEY,
    Email           VARCHAR(254)    NOT NULL UNIQUE,
    CreatedAt       TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP
);
 
-- ==================================================
 
-- Migration: V002__Add_Customer_Name.sql
-- Date: 2024-02-01
-- Author: John Engineer
-- Description: Add name field per JIRA-1234
 
ALTER TABLE Customer 
    ADD COLUMN CustomerName VARCHAR(200);
 
-- Backfill existing records
UPDATE Customer 
    SET CustomerName = 'Unknown' 
    WHERE CustomerName IS NULL;
 
-- Make NOT NULL after backfill
ALTER TABLE Customer 
    ALTER COLUMN CustomerName SET NOT NULL;
 
-- ==================================================
 
-- Migration: V003__Add_CustomerType_Enum.sql
-- Date: 2024-03-15
-- Author: Jane Developer
-- Description: Customer segmentation feature
 
-- Add new column with default
ALTER TABLE Customer 
    ADD COLUMN CustomerType VARCHAR(20) NOT NULL DEFAULT 'individual';
 
-- Add constraint
ALTER TABLE Customer 
    ADD CONSTRAINT chk_customer_type 
    CHECK (CustomerType IN ('individual', 'business', 'enterprise'));
 
-- ==================================================
 
-- Rollback: V003__Add_CustomerType_Enum_ROLLBACK.sql
-- ALWAYS test rollbacks before deployment
 
ALTER TABLE Customer 
    DROP CONSTRAINT chk_customer_type;
 
ALTER TABLE Customer 
    DROP COLUMN CustomerType;
 
-- ==================================================
 
-- Migration: V010__Split_Name_Into_FirstLast.sql
-- Date: 2024-06-01
-- Author: John Engineer
-- Description: Enable proper name sorting/searching
-- WARNING: Transformative change (destructive once Step 4 drops the old column); thorough testing required
 
-- Step 1: Add new columns
ALTER TABLE Customer 
    ADD COLUMN FirstName VARCHAR(100),
    ADD COLUMN LastName VARCHAR(100);
 
-- Step 2: Migrate data (simplified; a real migration must handle edge cases,
-- e.g. a single-word name ends up in both FirstName and LastName here)
UPDATE Customer 
SET 
    FirstName = SPLIT_PART(CustomerName, ' ', 1),
    LastName = SUBSTRING(CustomerName FROM POSITION(' ' IN CustomerName) + 1);
 
-- Step 3: Set NOT NULL after migration
ALTER TABLE Customer 
    ALTER COLUMN FirstName SET NOT NULL,
    ALTER COLUMN LastName SET NOT NULL;
 
-- Step 4: Remove old column (AFTER apps updated)
-- NOTE: This is often a separate migration after verification
-- ALTER TABLE Customer DROP COLUMN CustomerName;
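The name split in Step 2 has an edge case worth spelling out: when CustomerName contains no space, POSITION returns 0, so the SUBSTRING starts at character 1 and LastName receives the whole name as well. A transformation like this is often prototyped and tested outside the database first. The sketch below is one possible rule set, with `split_name` as a hypothetical helper; giving single-word names an empty last name is an assumption that the business would have to confirm.

```python
from typing import Tuple


def split_name(full_name: str) -> Tuple[str, str]:
    """Split on the first run of whitespace.

    Single-word names get an empty last name instead of a duplicated value.
    """
    parts = full_name.strip().split(None, 1)
    if not parts:
        return ("", "")
    if len(parts) == 1:
        return (parts[0], "")
    return (parts[0], parts[1])


print(split_name("Ada Lovelace"))         # ('Ada', 'Lovelace')
print(split_name("Madonna"))              # ('Madonna', '')
print(split_name("  Jean  Luc Picard "))  # ('Jean', 'Luc Picard')
```

Running such a function over an export of real names before the migration shows how many rows fall into each edge case, and whether the NOT NULL constraints in Step 3 are appropriate.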

Schema Evolution Best Practices

• Immutable migrations: Never modify a migration after it's been applied to any environment
• Incremental changes: Prefer many small migrations over large transformational ones
• Test rollbacks: Every migration should have a tested rollback procedure
• Backward compatibility: Keep old columns during transition periods; remove after app updates
• Data validation: Include data validation queries in migrations to verify correctness
• Performance consideration: Plan migrations for low-traffic periods; use online DDL when available

Summary and Key Takeaways

Relational schemas are the definitive specifications of database structure. Mastery of schema notation, documentation, and evolution practices is essential for professional database work.

Core Schema Concepts

• Formal definition: Schemas combine relation names, attribute sets, domain assignments, and integrity constraints into complete database specifications.
• Notation systems: Multiple text-based notation styles exist; choose one and apply consistently throughout projects.
• Visual representation: Schema diagrams communicate structure intuitively; use appropriate notation (ER, UML, or relational diagrams).
• Data types: Careful domain selection ensures data integrity; prefer precise types (DECIMAL for money, VARCHAR with limits).
• Key hierarchy: Understand superkeys, candidate keys, primary keys, alternate keys, and foreign keys, and when each applies.
• Documentation: Schema documentation is essential; embed it in DDL files and version control systems.
• Evolution: Schemas change over time; use migration tools and version control for disciplined evolution.

What Comes Next:

With schema representation mastered, we turn to normalization—the systematic process of transforming schemas to minimize redundancy and dependency issues. Normalization ensures that our carefully designed schemas don't suffer from update anomalies and data inconsistencies that plague poorly structured databases.

Page Complete

You now command the vocabulary and techniques for specifying, documenting, and evolving relational schemas. These skills translate directly into professional database design work, where schema quality determines system reliability and maintainability.

2 / 5

Loading learning content...

Database Management SystemsLogical Design

Logical Design: From Concepts to Schemas

LevelIntermediate

Duration90 mins

TopicLogical Design

2 / 5

Relational Schema

The Blueprint of Relational Databases

Understanding relational schemas is essential for multiple reasons:

Communication: Schemas provide a common language between database designers, application developers, DBAs, and stakeholders
Validation: A well-specified schema serves as the first line of defense against data quality issues
Documentation: Schemas document the data model for onboarding, audits, and maintenance
Evolution: Understanding schema notation enables controlled evolution through migrations and versioning

Learning Objectives

Formal Definition of Relational Schema

Mathematical Foundations:

A domain D is a set of atomic values. Each domain has a name, a data type, and potentially additional constraints (format, range, enumeration).

Examples:

D_EmployeeID = {positive integers}
D_Name = {strings of length 1-100, non-empty}
D_Salary = {decimal numbers ≥ 0}
D_Department = {'Engineering', 'Sales', 'HR', 'Finance', ...}

An attribute A is the name given to a role played by a domain D in a relation. Multiple attributes may use the same domain.

A relation schema R(A₁, A₂, ..., Aₙ) is a relation name R and an ordered list of attributes, where each attribute Aᵢ has an associated domain dom(Aᵢ).

A relational database schema S = {R₁, R₂, ..., Rₘ, IC} is a set of relation schemas Rᵢ together with a set of integrity constraints IC.
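
As a worked illustration of these definitions, the example domains above can be assembled into one relation schema. The relation name EMPLOYEE and the single constraint in IC are assumptions chosen for this sketch, not part of the formal definition:

```latex
\begin{align*}
R &= \mathrm{EMPLOYEE}(\mathit{EmployeeID},\ \mathit{Name},\ \mathit{Salary},\ \mathit{Department}) \\
\mathrm{dom}(\mathit{EmployeeID}) &= D_{\mathit{EmployeeID}} \\
\mathrm{dom}(\mathit{Name}) &= D_{\mathit{Name}} \\
\mathrm{dom}(\mathit{Salary}) &= D_{\mathit{Salary}} \\
\mathrm{dom}(\mathit{Department}) &= D_{\mathit{Department}} \\
n &= 4 \quad \text{(the degree of } R\text{)} \\
S &= \{\mathrm{EMPLOYEE},\ IC\}, \qquad
IC = \{\mathit{EmployeeID} \text{ is the primary key of } \mathrm{EMPLOYEE}\}
\end{align*}
```

Each attribute names the role a domain plays in the relation, so two attributes (for example a hypothetical ManagerID next to EmployeeID) could share the domain D_EmployeeID while remaining distinct attributes.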

Components of a Relation Schema
Component	Mathematical Definition	Practical Interpretation
Relation Name	R ∈ Names	Table identifier (e.g., 'Employee')
Attribute Set	{A₁, A₂, ..., Aₙ}	Column names (e.g., 'EmployeeID', 'Name')
Attribute Order	Tuple (A₁, A₂, ..., Aₙ)	Column sequence (matters in some contexts)
Domain Assignment	dom: A → D	Data type binding (A → INTEGER, etc.)
Degree	n = |{A₁, A₂, ..., Aₙ}|	Number of columns
Relation Schema	R(A₁:D₁, ..., Aₙ:Dₙ)	Complete table structure definition

Schema vs. Instance

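Relation Instance:

A relation instance r(R) of a schema R(A₁, A₂, ..., Aₙ) is a finite set of tuples t = (v₁, v₂, ..., vₙ), where each value vᵢ is drawn from the domain of the corresponding attribute or is NULL:

vᵢ ∈ dom(Aᵢ) ∪ {NULL}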

The cardinality of a relation is the number of tuples |r(R)|, which changes as data is inserted, updated, or deleted.
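
The difference between degree (a property of the schema) and cardinality (a property of the current instance) can be observed directly. The following is a minimal sketch in PostgreSQL syntax; the table name EmployeeDemo and its rows are hypothetical and exist only for this illustration:

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE EmployeeDemo (
    EmployeeID  INTEGER         PRIMARY KEY,
    Name        VARCHAR(100)    NOT NULL,
    Salary      DECIMAL(10,2)   CHECK (Salary >= 0)
);

INSERT INTO EmployeeDemo (EmployeeID, Name, Salary) VALUES
    (1, 'Ada',   5200.00),
    (2, 'Grace', 6100.00),
    (3, 'Edgar', NULL);        -- NULL is allowed: Salary has no NOT NULL constraint

-- Degree: read from the catalog, independent of the stored rows.
SELECT COUNT(*) AS degree                      -- 3
FROM information_schema.columns
WHERE table_schema = current_schema()
  AND table_name   = 'employeedemo';           -- unquoted names are folded to lower case

-- Cardinality: the number of tuples in the current instance.
SELECT COUNT(*) AS cardinality FROM EmployeeDemo;   -- 3

DELETE FROM EmployeeDemo WHERE EmployeeID = 3;

-- The cardinality has changed; the degree has not.
SELECT COUNT(*) AS cardinality FROM EmployeeDemo;   -- 2
```

Only DDL statements such as ALTER TABLE change the degree, whereas every INSERT or DELETE changes the cardinality.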

Text-Based Schema Notation

Standard Academic Notation:

The widely-accepted notation for relation schemas follows these conventions:

RELATION_NAME(attribute1, attribute2, ..., attributeN)

Conventions for marking constraints:

Primary Key: Underline the attribute(s) or list as PK. Example: Employee(<u>EmployeeID</u>, Name, Salary)
Foreign Key: Use italic, arrow notation, or FK label. Example: Employee(ID, Name, DeptID*) where * denotes FK
Composite Key: Underline multiple attributes
Alternate Key: Double underline or note separately
NOT NULL: Superscript or bold

Since underlining and italics aren't always available, alternative notations exist:

schema_notation_examples.txt
=====================================================
RELATIONAL SCHEMA NOTATION EXAMPLES
=====================================================
 
--------------------------------------------------
NOTATION STYLE 1: Inline Markers
--------------------------------------------------
Employee(EmployeeID [PK], FirstName, LastName, 
         Email [UK], DepartmentID [FK → Department.DepartmentID])
Department(DepartmentID [PK], DepartmentName [UK], Budget)
Project(ProjectID [PK], ProjectName, StartDate, EndDate)
EmployeeProject(EmployeeID [PK,FK], ProjectID [PK,FK], 
                HoursPerWeek, Role)
 
Legend: PK = Primary Key, FK = Foreign Key, UK = Unique Key
 
--------------------------------------------------
NOTATION STYLE 2: Separate Constraint Listing
--------------------------------------------------
Employee(EmployeeID, FirstName, LastName, Email, 
         Salary, HireDate, DepartmentID)
 
Constraints:
  PK: Employee(EmployeeID)
  UK: Employee(Email)
  FK: Employee(DepartmentID) → Department(DepartmentID)
  NOT NULL: FirstName, LastName, Email, HireDate, DepartmentID
  CHECK: Salary >= 0
 
--------------------------------------------------
NOTATION STYLE 3: Textual Description (Formal)
--------------------------------------------------
Schema: COMPANY_DB
 
R1 = EMPLOYEE(ID: INTEGER, NAME: VARCHAR(100), 
              DEPT_ID: INTEGER, SALARY: DECIMAL(10,2))
    PRIMARY KEY: {ID}
    FOREIGN KEY: {DEPT_ID} REFERENCES DEPARTMENT
 
R2 = DEPARTMENT(ID: INTEGER, NAME: VARCHAR(100), 
                BUDGET: DECIMAL(15,2))
    PRIMARY KEY: {ID}
    UNIQUE: {NAME}
 
R3 = WORKS_ON(EMP_ID: INTEGER, PROJ_ID: INTEGER, 
              HOURS: DECIMAL(4,1))
    PRIMARY KEY: {EMP_ID, PROJ_ID}
    FOREIGN KEY: {EMP_ID} REFERENCES EMPLOYEE
    FOREIGN KEY: {PROJ_ID} REFERENCES PROJECT
 
--------------------------------------------------
NOTATION STYLE 4: Markdown-Friendly
--------------------------------------------------
## Employee
| Column         | Type          | Constraints                    |
|----------------|---------------|--------------------------------|
| **EmployeeID** | INT           | PK                             |
| FirstName      | VARCHAR(50)   | NOT NULL                       |
| LastName       | VARCHAR(50)   | NOT NULL                       |
| Email          | VARCHAR(100)  | UNIQUE, NOT NULL               |
| DepartmentID   | INT           | FK → Department(DepartmentID)  |
 
--------------------------------------------------
NOTATION STYLE 5: CompSci Textbook (Elmasri/Navathe)
--------------------------------------------------
EMPLOYEE
  EmployeeID        INTEGER         NOT NULL
  FirstName         VARCHAR(50)     NOT NULL
  LastName          VARCHAR(50)     NOT NULL
  Email             VARCHAR(100)    NOT NULL, UNIQUE
  Salary            DECIMAL(10,2)   
  DepartmentID      INTEGER         NOT NULL
 
  PRIMARY KEY (EmployeeID)
  FOREIGN KEY (DepartmentID) REFERENCES DEPARTMENT(DepartmentID)
    ON DELETE RESTRICT
    ON UPDATE CASCADE
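
To show how text notation maps onto executable DDL, the COMPANY_DB schema from Style 3 could be written roughly as follows. This is a sketch in PostgreSQL syntax, not a canonical translation. The notation references PROJECT without specifying it, so the PROJECT table here is an assumption reduced to a placeholder key column:

```sql
-- DDL sketch of the COMPANY_DB schema from Style 3.
CREATE TABLE DEPARTMENT (
    ID      INTEGER         PRIMARY KEY,
    NAME    VARCHAR(100)    UNIQUE,
    BUDGET  DECIMAL(15,2)
);

CREATE TABLE EMPLOYEE (
    ID      INTEGER         PRIMARY KEY,
    NAME    VARCHAR(100),
    DEPT_ID INTEGER         REFERENCES DEPARTMENT (ID),
    SALARY  DECIMAL(10,2)
);

-- ASSUMPTION: PROJECT is referenced by WORKS_ON but not specified in the
-- notation, so only a placeholder key column is declared here.
CREATE TABLE PROJECT (
    ID      INTEGER         PRIMARY KEY
);

CREATE TABLE WORKS_ON (
    EMP_ID  INTEGER         REFERENCES EMPLOYEE (ID),
    PROJ_ID INTEGER         REFERENCES PROJECT (ID),
    HOURS   DECIMAL(4,1),
    PRIMARY KEY (EMP_ID, PROJ_ID)
);
```

The tables are created in dependency order: a referenced relation must exist before any relation that holds a foreign key to it.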


Schema Diagrams and Visual Representation

While text notation is precise and portable, visual diagrams communicate schema structure more intuitively. Several diagram types are used in practice:

1. Relational Schema Diagram

The simplest visual representation shows relations as labeled rectangles containing attribute names:

Primary key attributes are underlined or marked with a key icon
Foreign keys are connected to their referenced primary keys with arrows
Data types may be shown alongside attribute names

This format is common in textbooks and informal design discussions.

2. Entity-Relationship Diagram (ER Diagram)

While technically a conceptual model, ER diagrams are often adapted to show logical structure:

Chen notation: entities as rectangles, relationships as diamonds, attributes as ovals
Crow's foot notation: entities as rectangles with relationship lines showing cardinality
IE (Information Engineering) notation: similar to crow's foot with minor variations

3. UML Class Diagram (Database Profile)

UML class diagrams can represent relational schemas using the database profile:

Classes represent tables
Attributes represent columns with types
Associations represent foreign key relationships
Stereotypes (<<PK>>, <<FK>>) mark key columns


Diagram Best Practices:

Consistency: Use the same notation throughout the organization
Clarity: Don't overcrowd diagrams; break large schemas into subject areas
Hierarchy: Show the most important tables prominently
Color coding: Use colors to indicate subject areas or table types
Versioning: Include version numbers and dates on diagrams
Legend: Always include a notation legend for readers unfamiliar with conventions
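
The arrows in a relational schema diagram correspond to foreign keys recorded in the system catalog, so a diagram can be checked against the live schema. As a rough sketch, assuming a PostgreSQL-style information_schema, the edges could be listed like this (the column aliases are illustrative):

```sql
-- One row per foreign key column: child table/column -> parent table/column.
SELECT
    tc.table_name    AS child_table,
    kcu.column_name  AS child_column,
    ccu.table_name   AS parent_table,
    ccu.column_name  AS parent_column,
    tc.constraint_name
FROM information_schema.table_constraints AS tc
JOIN information_schema.key_column_usage AS kcu
  ON  kcu.constraint_schema = tc.constraint_schema
  AND kcu.constraint_name   = tc.constraint_name
  AND kcu.table_name        = tc.table_name
JOIN information_schema.constraint_column_usage AS ccu
  ON  ccu.constraint_schema = tc.constraint_schema
  AND ccu.constraint_name   = tc.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY'
  AND tc.table_schema    = current_schema()
ORDER BY child_table, child_column;
```

For single-column foreign keys each row is one arrow. For composite foreign keys this simple join pairs every child column with every parent column, so the result would need to be grouped by constraint_name before drawing.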


Data Types and Domain Specifications

Numeric Types:

Numeric types vary in precision, range, and storage requirements:

INTEGER/INT: Whole numbers, typically 4 bytes, range ≈ ±2 billion
SMALLINT: 2 bytes, range ≈ ±32,000
BIGINT: 8 bytes, for very large values
DECIMAL(p,s)/NUMERIC(p,s): Exact decimal with precision p and scale s
FLOAT/REAL/DOUBLE: Approximate floating-point (avoid for financial data)
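
The warning about floating-point types for financial data can be demonstrated with a two-line comparison. This sketch uses standard CAST syntax and was written with PostgreSQL in mind; the column aliases are illustrative:

```sql
-- Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly,
-- so the sum differs from 0.3 in the last bits.
SELECT CAST(0.1 AS DOUBLE PRECISION) + CAST(0.2 AS DOUBLE PRECISION)
       = CAST(0.3 AS DOUBLE PRECISION)   AS float_sum_is_exact;     -- false

-- DECIMAL/NUMERIC stores decimal digits exactly.
SELECT CAST(0.1 AS DECIMAL(10,2)) + CAST(0.2 AS DECIMAL(10,2))
       = CAST(0.3 AS DECIMAL(10,2))      AS decimal_sum_is_exact;   -- true
```

The same rounding behaviour accumulates across sums of many amounts, which is why exact decimal types are preferred for money.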

Character Types:

CHAR(n): Fixed-length, padded with spaces
VARCHAR(n): Variable-length up to n characters
TEXT/CLOB: Large text with no declared length limit (type name and maximum size are database-specific)

Temporal Types:

DATE: Year, month, day
TIME: Hour, minute, second, optional timezone
TIMESTAMP/DATETIME: Combined date and time
INTERVAL: Duration between timestamps

Data Type Selection Guidelines
Use Case	Recommended Type	Rationale
Surrogate primary key	INT or BIGINT AUTO_INCREMENT	Compact, fast comparisons
Monetary amounts	DECIMAL(19,4)	Exact precision, avoids rounding errors
Percentages	DECIMAL(5,2)	Holds 0.00 to 100.00; the type itself allows up to 999.99, so enforce the range with a CHECK
Person names	VARCHAR(100)	Variable length, most names < 100 chars
Email addresses	VARCHAR(254)	RFC 5321 maximum length
Country codes	CHAR(2) or CHAR(3)	Fixed ISO standard length
UUIDs	CHAR(36) or UUID	Standard format, universal uniqueness
Boolean flags	BOOLEAN or TINYINT(1)	True/false or 0/1
Timestamps	TIMESTAMP WITH TIME ZONE	Unambiguous time representation
Large text content	TEXT	No declared length limit; many systems store large values out of line

domain_definitions.sql
-- ==================================================
-- DOMAIN DEFINITIONS IN SQL
-- ==================================================
 
-- Standard SQL supports CREATE DOMAIN (not all DBs implement)
-- PostgreSQL example:
 
-- Define semantic domains
CREATE DOMAIN EmailAddress AS VARCHAR(254)
    CHECK (VALUE ~ '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$');
 
CREATE DOMAIN PositiveMoney AS DECIMAL(19,4)
    CHECK (VALUE >= 0);
 
CREATE DOMAIN PhoneNumber AS VARCHAR(20)
    CHECK (VALUE ~ '^\+?[0-9\-\s\(\)]+$');
 
CREATE DOMAIN Percentage AS DECIMAL(5,2)
    CHECK (VALUE >= 0 AND VALUE <= 100);
 
-- Using domains in table definition
CREATE TABLE Customer (
    CustomerID      SERIAL          PRIMARY KEY,
    Email           EmailAddress    NOT NULL UNIQUE,
    Phone           PhoneNumber,
    LoyaltyPoints   INTEGER         DEFAULT 0 CHECK (LoyaltyPoints >= 0),
    DiscountRate    Percentage      DEFAULT 0.00
);
 
-- For databases without CREATE DOMAIN, use CHECK constraints
CREATE TABLE Product (
    ProductID       INT             PRIMARY KEY,
    ProductName     VARCHAR(200)    NOT NULL,
    SKU             CHAR(12)        NOT NULL UNIQUE,
    
    -- Inline domain constraints
    Price           DECIMAL(10,2)   NOT NULL CHECK (Price >= 0),
    Weight          DECIMAL(8,3)    CHECK (Weight > 0),
    StockLevel      INT             NOT NULL DEFAULT 0 CHECK (StockLevel >= 0),
    
    -- Enumerated domain as CHECK
    Category        VARCHAR(50)     NOT NULL 
                    CHECK (Category IN ('Electronics', 'Clothing', 
                                        'Food', 'Furniture', 'Other'))
);
 
-- Enum type (PostgreSQL)
CREATE TYPE OrderStatus AS ENUM (
    'pending', 'confirmed', 'processing', 
    'shipped', 'delivered', 'cancelled'
);
 
CREATE TABLE Orders (
    OrderID         SERIAL          PRIMARY KEY,
    CustomerID      INT             NOT NULL REFERENCES Customer(CustomerID),
    Status          OrderStatus     NOT NULL DEFAULT 'pending',
    OrderDate       TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    TotalAmount     PositiveMoney   NOT NULL
);
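
A short usage sketch shows what a domain buys in practice: values outside the domain are rejected no matter which table or application submits them. The names DemoPercentage and DiscountDemo are hypothetical and kept separate from the tables above so that the example runs on its own in PostgreSQL:

```sql
-- Hypothetical names used only for this sketch.
CREATE DOMAIN DemoPercentage AS DECIMAL(5,2)
    CHECK (VALUE >= 0 AND VALUE <= 100);

CREATE TABLE DiscountDemo (
    DiscountID  SERIAL          PRIMARY KEY,
    Rate        DemoPercentage  NOT NULL
);

-- Accepted: the value lies inside the domain.
INSERT INTO DiscountDemo (Rate) VALUES (12.50);

-- Rejected: 150.00 fits DECIMAL(5,2) but violates the domain's CHECK.
-- The DO block catches the error so the script keeps running.
DO $$
BEGIN
    INSERT INTO DiscountDemo (Rate) VALUES (150.00);
    RAISE NOTICE 'unexpected: out-of-range rate was accepted';
EXCEPTION
    WHEN check_violation THEN
        RAISE NOTICE 'rejected: 150.00 is outside DemoPercentage';
END
$$;

SELECT COUNT(*) AS stored_rows FROM DiscountDemo;   -- 1
```

Because the rule lives in the domain, every column declared with that domain inherits it without repeating the CHECK.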


Complete Key Specification

Keys are the cornerstone of relational schema integrity. A complete schema specification must identify all key types and their roles.

Key Hierarchy:

Super Key: Any set of attributes that uniquely identifies a tuple. A relation may have many super keys.
Candidate Key: A minimal super key—no attribute can be removed while maintaining uniqueness. A relation may have multiple candidate keys.
Primary Key: The candidate key chosen as the main identifier. Every relation must have exactly one primary key. Values cannot be NULL.
Alternate Key: Candidate keys not chosen as primary key. Often implemented as UNIQUE constraints.
Foreign Key: An attribute (or set) that references a candidate key in another (or the same) relation. Enforces referential integrity.
Composite Key: A key consisting of multiple attributes (any of the above types can be composite).
Surrogate Key: A system-generated key (often auto-increment integer) with no business meaning, used for technical efficiency.
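
Whether a set of attributes can serve as a key is a statement about every valid instance, but a duplicate check on existing data is a useful first test when analysing candidate keys. The sketch below uses a hypothetical BookDemo table with placeholder values (the ISBNs are made up for illustration):

```sql
-- Hypothetical sample data for illustration.
CREATE TABLE BookDemo (
    ISBN     CHAR(13)      NOT NULL,
    Title    VARCHAR(500)  NOT NULL,
    Author   VARCHAR(200)  NOT NULL,
    Edition  INT           NOT NULL
);

INSERT INTO BookDemo (ISBN, Title, Author, Edition) VALUES
    ('9780000000001', 'Database Basics', 'A. Writer', 1),
    ('9780000000002', 'Database Basics', 'A. Writer', 2),
    ('9780000000003', 'Query Tuning',    'B. Author', 1);

-- (Title, Author) is not unique in this instance: one group has 2 tuples,
-- so it cannot be a super key.
SELECT Title, Author, COUNT(*) AS tuples
FROM BookDemo
GROUP BY Title, Author
HAVING COUNT(*) > 1;

-- (Title, Author, Edition) has no duplicates here: the query returns no rows.
SELECT Title, Author, Edition, COUNT(*) AS tuples
FROM BookDemo
GROUP BY Title, Author, Edition
HAVING COUNT(*) > 1;
```

A duplicate disproves a proposed key, but the absence of duplicates only shows that the current data is consistent with it. Declaring the key remains a design decision based on the business rules.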

key_specification_examples.sql
-- ==================================================
-- COMPREHENSIVE KEY SPECIFICATION
-- ==================================================
 
-- EXAMPLE 1: Multiple Candidate Keys
-- -------------------------------------------------
-- A book can be uniquely identified by ISBN or by 
-- (Title, Author, Edition) combination
 
CREATE TABLE Book (
    BookID          SERIAL,                     -- Surrogate key
    ISBN            CHAR(13)        NOT NULL,   -- Candidate key 1
    Title           VARCHAR(500)    NOT NULL,   -- \
    Author          VARCHAR(200)    NOT NULL,   -- | Candidate key 2
    Edition         INT             NOT NULL DEFAULT 1,  -- / (NOT NULL so UNIQUE is enforced)
    Publisher       VARCHAR(200),
    PublishYear     INT,
    
    -- Primary Key (chosen surrogate for efficiency)
    PRIMARY KEY (BookID),
    
    -- Alternate Keys (other candidate keys)
    UNIQUE (ISBN),
    UNIQUE (Title, Author, Edition)
);
 
-- EXAMPLE 2: Composite Primary Key
-- -------------------------------------------------
CREATE TABLE Enrollment (
    StudentID       INT             NOT NULL,
    CourseID        INT             NOT NULL,
    Semester        CHAR(6)         NOT NULL,  -- e.g., '2024SP'
    Grade           CHAR(2),
    EnrollmentDate  DATE            NOT NULL,
    
    -- Composite Primary Key
    PRIMARY KEY (StudentID, CourseID, Semester),
    
    -- Foreign Keys
    FOREIGN KEY (StudentID) REFERENCES Student(StudentID)
        ON DELETE CASCADE,
    FOREIGN KEY (CourseID) REFERENCES Course(CourseID)
        ON DELETE RESTRICT
);
 
-- EXAMPLE 3: Self-Referencing Foreign Key
-- -------------------------------------------------
CREATE TABLE Employee (
    EmployeeID      SERIAL          PRIMARY KEY,
    Name            VARCHAR(100)    NOT NULL,
    ManagerID       INT,            -- Nullable (CEO has no manager)
    
    -- Self-referencing FK
    FOREIGN KEY (ManagerID) REFERENCES Employee(EmployeeID)
        ON DELETE SET NULL
);
 
-- EXAMPLE 4: Multiple Foreign Keys to Same Table
-- -------------------------------------------------
CREATE TABLE Flight (
    FlightID        SERIAL          PRIMARY KEY,
    FlightNumber    VARCHAR(10)     NOT NULL,
    DepartureCity   INT             NOT NULL,
    ArrivalCity     INT             NOT NULL,
    DepartureTime   TIMESTAMP       NOT NULL,
    ArrivalTime     TIMESTAMP       NOT NULL,
    
    -- Both FKs reference same table (different roles)
    FOREIGN KEY (DepartureCity) REFERENCES City(CityID),
    FOREIGN KEY (ArrivalCity) REFERENCES City(CityID),
    
    -- Constraint: can't fly to same city
    CHECK (DepartureCity != ArrivalCity)
);
 
-- EXAMPLE 5: Identifying Relationship (Weak Entity)
-- -------------------------------------------------
-- Room is uniquely identified only within a Building
 
CREATE TABLE Building (
    BuildingCode    CHAR(3)         PRIMARY KEY,
    BuildingName    VARCHAR(100)    NOT NULL,
    Address         VARCHAR(200)
);
 
CREATE TABLE Room (
    BuildingCode    CHAR(3)         NOT NULL,
    RoomNumber      VARCHAR(10)     NOT NULL,
    Capacity        INT             CHECK (Capacity > 0),
    RoomType        VARCHAR(50),
    
    -- Composite PK includes parent's PK
    PRIMARY KEY (BuildingCode, RoomNumber),
    
    -- Identifying relationship
    FOREIGN KEY (BuildingCode) REFERENCES Building(BuildingCode)
        ON DELETE CASCADE   -- If building deleted, rooms deleted
        ON UPDATE CASCADE   -- If building code changes, propagate
);
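
The effect of the referential actions in Example 5 can be observed with a few rows. This is a self-contained sketch using hypothetical BuildingDemo and RoomDemo tables that mirror the Building/Room pattern with fewer columns:

```sql
-- Hypothetical demo tables mirroring the Building/Room pattern.
CREATE TABLE BuildingDemo (
    BuildingCode  CHAR(3)  PRIMARY KEY
);

CREATE TABLE RoomDemo (
    BuildingCode  CHAR(3)      NOT NULL,
    RoomNumber    VARCHAR(10)  NOT NULL,
    PRIMARY KEY (BuildingCode, RoomNumber),
    FOREIGN KEY (BuildingCode) REFERENCES BuildingDemo (BuildingCode)
        ON DELETE CASCADE
        ON UPDATE CASCADE
);

INSERT INTO BuildingDemo (BuildingCode) VALUES ('SCI'), ('LIB');

-- Room number '101' repeats; only the pair (BuildingCode, RoomNumber) is unique.
INSERT INTO RoomDemo (BuildingCode, RoomNumber) VALUES
    ('SCI', '101'), ('SCI', '102'), ('LIB', '101');

-- ON UPDATE CASCADE: renaming the building rewrites the child rows.
UPDATE BuildingDemo SET BuildingCode = 'SC2' WHERE BuildingCode = 'SCI';
SELECT COUNT(*) AS rooms_in_sc2 FROM RoomDemo WHERE BuildingCode = 'SC2';  -- 2

-- ON DELETE CASCADE: removing the building removes its rooms.
DELETE FROM BuildingDemo WHERE BuildingCode = 'SC2';
SELECT COUNT(*) AS rooms_left FROM RoomDemo;                               -- 1
```

Cascading deletes suit weak entities like rooms, which have no meaning without their parent. For independent entities, RESTRICT is usually the safer default.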

Key Selection Decision Framework

•Stability: Choose keys whose values won't change. Surrogate keys excel here.
•Simplicity: Prefer single-column keys over composites when possible for joins.
•Meaningfulness: Natural keys are human-readable and aid debugging; surrogate keys carry no business meaning, which insulates them from changes in business rules.
•Size: Smaller keys (integers) mean smaller indexes and faster joins.
•Universality: If data may integrate with other systems, consider industry-standard identifiers (ISBN, SSN, ISIN).

Schema Documentation Standards

Production-quality schemas require comprehensive documentation beyond the DDL statements themselves. Documentation serves future maintainers, auditors, and developers integrating with the database.

Essential Documentation Components:

Data Dictionary: A complete catalog of all tables, columns, their meanings, and valid values
Relationship Documentation: Explanation of why relationships exist and their business meaning
Constraint Rationale: Why each constraint exists and what business rule it enforces
Historical Context: Why certain design decisions were made, alternatives considered
Usage Examples: Sample queries for common access patterns
Change Log: History of schema modifications with rationale

schema_documentation_example.sql
-- ==================================================
-- DOCUMENTED SCHEMA EXAMPLE
-- ==================================================
 
/*
===================================================
TABLE: Customer
===================================================
Purpose: 
    Stores information about individuals or 
    organizations that purchase products or services.
    
Business Owner: Sales Department
Data Steward: CRM Team Lead
 
Source Systems:
    - WebShop: Online registrations
    - SalesForce: Imported leads
    - Legacy CRM: Historical migration
 
Privacy Classification: PII (Personally Identifiable)
Retention Policy: 7 years after last activity
===================================================
*/
CREATE TABLE Customer (
    -- Primary identifier, system-generated
    -- Format: Sequential integer; gaps are possible (sequences are not gapless)
    -- Source: Auto-generated on insert
    CustomerID      SERIAL          PRIMARY KEY,
    
    -- Legal name for individuals, company name for B2B
    -- Source: User registration or sales input
    -- Note: May contain international characters (UTF-8)
    CustomerName    VARCHAR(200)    NOT NULL,
    
    -- Primary contact email
    -- Validation: Standard email regex at app layer
    -- Used for: Account recovery, marketing (with consent)
    Email           VARCHAR(254)    NOT NULL UNIQUE,
    
    -- Customer segment classification
    -- Values:
    --   'individual' - B2C personal accounts
    --   'business' - B2B company accounts
    --   'enterprise' - Large accounts with special terms
    -- Set by: Sales team based on contract value
    CustomerType    VARCHAR(20)     NOT NULL DEFAULT 'individual'
                    CHECK (CustomerType IN ('individual', 'business', 'enterprise')),
    
    -- Accumulated loyalty points
    -- Calculation: 1 point per $10 spent
    -- Expiration: Points expire 24 months after earning
    -- Updated by: Daily batch process
    LoyaltyPoints   INT             NOT NULL DEFAULT 0 CHECK (LoyaltyPoints >= 0),
    
    -- Account creation timestamp
    -- Timezone: Stored in UTC
    CreatedAt       TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    
    -- Soft delete flag
    -- When true: Account deactivated but data retained
    -- Retention: Required for 7-year audit trail
    IsActive        BOOLEAN         NOT NULL DEFAULT TRUE
);
 
-- Email lookups (login, duplicate check) are served by the index that
-- the UNIQUE constraint on Email already creates; no extra index is needed.
 
-- Index for active customer queries
CREATE INDEX idx_customer_active ON Customer(IsActive) WHERE IsActive = TRUE;
 
/*
---------------------------------------------------
RELATIONSHIP: Customer -> Address (1:N)
---------------------------------------------------
Business Rule:
    A customer may have multiple addresses (shipping,
    billing, etc.). At least one primary address is 
    required for order fulfillment.
    
Integrity Rules:
    - Deleting a customer cascades to addresses
    - At most one primary address per customer per type
---------------------------------------------------
*/
 
-- External data dictionary entry (often in separate doc)
COMMENT ON TABLE Customer IS 'Core entity for customer relationship management';
COMMENT ON COLUMN Customer.CustomerID IS 'Unique identifier (PK, auto-increment)';
COMMENT ON COLUMN Customer.Email IS 'Primary contact email (UK, login credential)';
COMMENT ON COLUMN Customer.LoyaltyPoints IS 'Accumulated rewards points, batch-updated daily';
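
Comments attached with COMMENT ON are stored in the catalog, so a basic data dictionary can be generated from the database itself rather than maintained by hand. The query below is a sketch that assumes PostgreSQL (col_description is a PostgreSQL function) and limits the output to the current schema:

```sql
-- One row per column, with the stored comment if one exists.
SELECT
    c.table_name,
    c.column_name,
    c.data_type,
    c.is_nullable,
    c.column_default,
    col_description(
        format('%I.%I', c.table_schema, c.table_name)::regclass,
        c.ordinal_position::int
    ) AS description
FROM information_schema.columns AS c
WHERE c.table_schema = current_schema()
ORDER BY c.table_name, c.ordinal_position;
```

Columns without a comment appear with a NULL description, which makes the query a convenient way to find undocumented parts of the schema.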


Schema Versioning and Evolution

Schemas evolve over time. Business requirements change, performance issues emerge, and integration needs expand. Managing schema evolution requires disciplined versioning practices.

Migration Patterns:

Additive Changes: Add new tables, columns, indexes. Generally safe and backward-compatible.
Destructive Changes: Remove or rename elements. Requires careful coordination with applications.
Transformative Changes: Split/merge tables, change data types. Often requires data migration.

Version Control for Schemas:

Store DDL scripts in version control (Git)
Use migration tools (Flyway, Liquibase, Alembic, Prisma)
Never modify production schemas directly
Follow a consistent migration script naming convention
Maintain both 'up' and 'down' migrations for reversibility
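
Migration tools such as Flyway and Liquibase record which scripts have been applied in a history table of their own. To illustrate the idea only, a hand-rolled version might look like the sketch below; the table name schema_migration_history and its columns are hypothetical and do not reproduce any tool's actual layout:

```sql
-- Hypothetical bookkeeping table; real migration tools manage their own.
CREATE TABLE schema_migration_history (
    version      VARCHAR(20)   PRIMARY KEY,            -- zero-padded, e.g. 'V002'
    description  VARCHAR(200)  NOT NULL,
    script_name  VARCHAR(200)  NOT NULL UNIQUE,
    applied_by   VARCHAR(100)  NOT NULL DEFAULT CURRENT_USER,
    applied_at   TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Each migration script records itself as its last step.
INSERT INTO schema_migration_history (version, description, script_name) VALUES
    ('V001', 'Create customer table', 'V001__Create_Customer_Table.sql'),
    ('V002', 'Add customer name',     'V002__Add_Customer_Name.sql');

-- Which version is this database at?
SELECT version, script_name, applied_at
FROM schema_migration_history
ORDER BY version DESC
LIMIT 1;
```

Zero-padded version labels keep alphabetical and chronological order identical, which is one reason a consistent naming convention matters.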

migration_examples.sql
-- ==================================================
-- MIGRATION VERSIONING EXAMPLES
-- ==================================================
 
-- Migration: V001__Create_Customer_Table.sql
-- Date: 2024-01-15
-- Author: Jane Developer
-- Description: Initial customer table for CRM module
 
CREATE TABLE Customer (
    CustomerID      SERIAL          PRIMARY KEY,
    Email           VARCHAR(254)    NOT NULL UNIQUE,
    CreatedAt       TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP
);
 
-- ==================================================
 
-- Migration: V002__Add_Customer_Name.sql
-- Date: 2024-02-01
-- Author: John Engineer
-- Description: Add name field per JIRA-1234
 
ALTER TABLE Customer 
    ADD COLUMN CustomerName VARCHAR(200);
 
-- Backfill existing records
UPDATE Customer 
    SET CustomerName = 'Unknown' 
    WHERE CustomerName IS NULL;
 
-- Make NOT NULL after backfill
ALTER TABLE Customer 
    ALTER COLUMN CustomerName SET NOT NULL;
 
-- ==================================================
 
-- Migration: V003__Add_CustomerType_Enum.sql
-- Date: 2024-03-15
-- Author: Jane Developer
-- Description: Customer segmentation feature
 
-- Add new column with default
ALTER TABLE Customer 
    ADD COLUMN CustomerType VARCHAR(20) NOT NULL DEFAULT 'individual';
 
-- Add constraint
ALTER TABLE Customer 
    ADD CONSTRAINT chk_customer_type 
    CHECK (CustomerType IN ('individual', 'business', 'enterprise'));
 
-- ==================================================
 
-- Rollback: V003__Add_CustomerType_Enum_ROLLBACK.sql
-- ALWAYS test rollbacks before deployment
 
ALTER TABLE Customer 
    DROP CONSTRAINT chk_customer_type;
 
ALTER TABLE Customer 
    DROP COLUMN CustomerType;
 
-- ==================================================
 
-- Migration: V010__Split_Name_Into_FirstLast.sql
-- Date: 2024-06-01
-- Author: John Engineer
-- Description: Enable proper name sorting/searching
-- WARNING: Transformative change ending in a destructive step - thorough testing required
 
-- Step 1: Add new columns
ALTER TABLE Customer 
    ADD COLUMN FirstName VARCHAR(100),
    ADD COLUMN LastName VARCHAR(100);
 
-- Step 2: Migrate data (simplified; real migration handles edge cases)
-- Note: a name with no space (e.g. the 'Unknown' backfill) lands in both columns
UPDATE Customer 
SET 
    FirstName = SPLIT_PART(CustomerName, ' ', 1),
    LastName = SUBSTRING(CustomerName FROM POSITION(' ' IN CustomerName) + 1);
 
-- Step 3: Set NOT NULL after migration
ALTER TABLE Customer 
    ALTER COLUMN FirstName SET NOT NULL,
    ALTER COLUMN LastName SET NOT NULL;
 
-- Step 4: Remove old column (AFTER apps updated)
-- NOTE: This is often a separate migration after verification
-- ALTER TABLE Customer DROP COLUMN CustomerName;

Schema Evolution Best Practices

•Immutable migrations: Never modify a migration after it's been applied to any environment
•Incremental changes: Prefer many small migrations over large transformational ones
•Test rollbacks: Every migration should have a tested rollback procedure
•Backward compatibility: Keep old columns during transition periods; remove after app updates
•Data validation: Include data validation queries in migrations to verify correctness
•Performance consideration: Plan migrations for low-traffic periods; use online DDL when available
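
The data validation practice can be built into the migration script itself. The sketch below assumes PostgreSQL, where DDL is transactional, and uses a hypothetical CustomerSplitDemo table standing in for a table whose name column has just been split:

```sql
-- Hypothetical stand-in for a table after a name split.
CREATE TABLE CustomerSplitDemo (
    CustomerID    INT           PRIMARY KEY,
    CustomerName  VARCHAR(200)  NOT NULL,
    FirstName     VARCHAR(100),
    LastName      VARCHAR(100)
);

INSERT INTO CustomerSplitDemo VALUES
    (1, 'Ada Lovelace', 'Ada',   'Lovelace'),
    (2, 'Grace Hopper', 'Grace', 'Hopper');

BEGIN;

-- Validation: raise an error if any row was left unmigrated.
DO $$
DECLARE
    bad_rows BIGINT;
BEGIN
    SELECT COUNT(*) INTO bad_rows
    FROM CustomerSplitDemo
    WHERE FirstName IS NULL OR LastName IS NULL
       OR FirstName = ''    OR LastName = '';

    IF bad_rows > 0 THEN
        RAISE EXCEPTION 'name split left % invalid row(s)', bad_rows;
    END IF;
END
$$;

-- If the check raised, the transaction is aborted and this statement is
-- rejected; the COMMIT below then rolls everything back.
ALTER TABLE CustomerSplitDemo
    ALTER COLUMN FirstName SET NOT NULL,
    ALTER COLUMN LastName  SET NOT NULL;

COMMIT;
```

Running the validation inside the same transaction as the schema change means a failed check leaves the schema exactly as it was before the migration started.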

Summary and Key Takeaways

Relational schemas are the definitive specifications of database structure. Mastery of schema notation, documentation, and evolution practices is essential for professional database work.

Core Schema Concepts

•Formal definition: Schemas combine relation names, attribute sets, domain assignments, and integrity constraints into complete database specifications.
•Notation systems: Multiple text-based notation styles exist; choose one and apply consistently throughout projects.
•Visual representation: Schema diagrams communicate structure intuitively; use appropriate notation (ER, UML, or relational diagrams).
•Data types: Careful domain selection ensures data integrity; prefer precise types (DECIMAL for money, VARCHAR with limits).
•Key hierarchy: Understand superkeys, candidate keys, primary keys, alternate keys, and foreign keys—and when each applies.
•Documentation: Schema documentation is essential; embed it in DDL files and version control systems.
•Evolution: Schemas change over time; use migration tools and version control for disciplined evolution.
