Database Management SystemsRelational Model Notation

Relational Model Notation

LevelIntermediate

Duration90 mins

TopicRelational Model Notation

1 / 5

Relation Schema Notation: The Language of Database Structure

The Critical Importance of Precise Notation

In mathematics, physics, and engineering, notation is not merely cosmetic—it is foundational. The symbols and conventions we use to represent ideas fundamentally shape how we think about, communicate, and manipulate those ideas. The history of mathematics itself demonstrates that revolutionary advances often follow from revolutionary notations: Leibniz's differential notation transformed calculus, vector notation enabled modern physics, and chemical formulas revolutionized chemistry.

Database engineering is no different. The relational model notation provides a precise language for describing database structures—a language that transcends implementation details, vendor-specific syntax, and natural language ambiguity. A database professional who masters this notation gains the ability to:

Communicate unambiguously with colleagues, regardless of their preferred DBMS
Document designs precisely in a form that remains valid across decades and technology shifts
Reason formally about database properties, transformations, and optimizations
Bridge theory and practice by connecting academic literature to practical implementation

This page provides a comprehensive exploration of relation schema notation—the conventions and symbols used to describe the structural blueprint of relational databases. We begin with the mathematical foundations and progress through practical applications, ensuring you command both the theoretical precision and practical utility of this essential notation.

Learning Objectives

After studying this page, you will be able to:

• Explain the distinction between relation schema and relation instance • Write formal mathematical notation for relation schemas using set-theoretic conventions • Apply textual schema notation in practical database documentation • Interpret and construct attribute-domain specifications • Use degree notation and understand schema comparison • Connect formal notation to practical SQL implementations

Schema vs Instance: The Fundamental Distinction

Before diving into notation specifics, we must establish a foundational distinction that pervades all database theory: the difference between schema (also called intension) and instance (also called extension).

The Type-Value Analogy

In programming, we distinguish between a type (which defines the structure and constraints of values) and values (which are specific instances of that type). For example, Integer is a type, while 42 and -17 are specific integer values.

The relational model makes an analogous distinction:

A relation schema is the structural description: the relation name, attribute names, and their associated domains (data types). It is the template or blueprint that defines what valid data looks like.
A relation instance (or relation state) is a specific set of tuples that conform to the schema at a particular point in time. It is the actual data currently stored.

Formal Definition

Relation Schema: A relation schema R(A₁, A₂, ..., Aₙ) is a named tuple consisting of:

A relation name R
An ordered list of attribute names (A₁, A₂, ..., Aₙ)
Associated domains dom(Aᵢ) for each attribute

Relation Instance: A relation instance r of schema R is a subset of the Cartesian product of the attribute domains:

r(R) ⊆ dom(A₁) × dom(A₂) × ... × dom(Aₙ)

Each element of this set is a tuple (row), and no two tuples can be identical (since a relation is a set, and sets contain no duplicates).

Schema vs Instance: Comprehensive Comparison
Aspect	Relation Schema (Intension)	Relation Instance (Extension)
Nature	Structural description	Actual data
Stability	Rarely changes	Frequently changes
Analogies	Class definition, Type, Blueprint	Object instance, Value, Building
Defined by	Database designer	Data insertion/modification
Contains	Attribute names, domains, constraints	Tuples (rows) of actual values
Time dependence	Static during operation	Dynamic, changes over time
Mathematical term	Intension	Extension
Size	Fixed (determined by degree)	Variable (cardinality changes)

The Meta-Level Distinction

Schema notation operates at a meta-level—it describes the description, not the data itself. When you see R(A, B, C), you're looking at a statement about what kind of data can exist, not any particular data. This distinction is crucial: schema notation is about structure, not content.

Why This Distinction Matters for Notation

Different notations serve different purposes:

Schema notation focuses on structure: attribute names, types, and relationships between schemas (e.g., foreign key references)
Instance notation focuses on content: specific tuples, sample data, and query results

In practice, we use schema notation far more frequently, as it captures the stable, definitional aspects of the database. Instance notation is typically reserved for examples, test cases, and query results.

The remainder of this page focuses primarily on schema notation, though we will return to instance notation in the next page.

Mathematical Set-Theoretic Notation

The relational model is rooted in set theory and first-order predicate logic—mathematical foundations that provide rigor and precision. The formal notation reflects these origins and enables mathematical reasoning about database properties.

Basic Schema Notation

The most fundamental schema notation uses the following form:

R(A₁: D₁, A₂: D₂, ..., Aₙ: Dₙ)

Where:

R is the relation name (by convention, capitalized or in small caps)
Aᵢ is the name of the i-th attribute
Dᵢ is the domain (data type) of attribute Aᵢ
n is the degree (number of attributes) of the relation

Simplified Notation

When domains are implicit or defined elsewhere, we often use the simplified form:

R(A₁, A₂, ..., Aₙ) or simply R(A, B, C)

This abbreviated form is extremely common in database literature and assumes domains are either understood from context or specified in a separate domain dictionary.

Set Notation for Relations

Formally, a relation r on schema R(A₁, A₂, ..., Aₙ) is a finite subset of the Cartesian product:

r ⊆ dom(A₁) × dom(A₂) × ... × dom(Aₙ)

Each element t ∈ r is a tuple, which can be viewed as a function:

t: {A₁, A₂, ..., Aₙ} → ⋃ᵢ dom(Aᵢ)

such that t(Aᵢ) ∈ dom(Aᵢ) for each attribute Aᵢ.

This functional view of tuples underlies the notation t[Aᵢ] or t.Aᵢ for accessing the value of attribute Aᵢ in tuple t.

formal_schema_examples.txt

Formal Notation

// ================================================
// FORMAL RELATION SCHEMA NOTATION EXAMPLES
// ================================================
 
// Basic Schema Notation with Explicit Domains
// Syntax: R(A₁: D₁, A₂: D₂, ..., Aₙ: Dₙ)
 
EMPLOYEE(
    emp_id: INTEGER,
    first_name: VARCHAR(50),
    last_name: VARCHAR(50),
    birth_date: DATE,
    salary: DECIMAL(10,2),
    dept_id: INTEGER
)
 
DEPARTMENT(
    dept_id: INTEGER,
    dept_name: VARCHAR(100),
    location: VARCHAR(100),
    budget: DECIMAL(12,2)
)
 
// Simplified Schema Notation (domains implicit)
// Syntax: R(A₁, A₂, ..., Aₙ)
 
EMPLOYEE(emp_id, first_name, last_name, birth_date, salary, dept_id)
DEPARTMENT(dept_id, dept_name, location, budget)
 
// Set-Theoretic Definition of a Relation
// r(R) ⊆ dom(A₁) × dom(A₂) × ... × dom(Aₙ)
 
r(EMPLOYEE) ⊆ INTEGER × VARCHAR(50) × VARCHAR(50) × DATE × DECIMAL(10,2) × INTEGER
 
// Tuple Notation
// t = (v₁, v₂, ..., vₙ) where vᵢ ∈ dom(Aᵢ)
 
t₁ = (101, 'Alice', 'Johnson', '1985-03-15', 75000.00, 10)
t₂ = (102, 'Bob', 'Smith', '1990-07-22', 62000.00, 20)
 
// Attribute Access Notation
// t[Aᵢ] or t.Aᵢ denotes the value of attribute Aᵢ in tuple t
 
t₁[first_name] = 'Alice'
t₁.salary = 75000.00
t₁[emp_id] = 101
 
// Multiple Attribute Access (Projection on Tuple)
// t[{A₁, A₂}] returns subtuple with only specified attributes
 
t₁[{first_name, last_name}] = ('Alice', 'Johnson')

Attribute Set Notation

For a relation schema R, we often need to refer to its set of attributes. Common notations include:

attr(R) or Attr(R) — The set of all attributes in schema R
|R| or deg(R) — The degree (number of attributes) of R
R.A — Qualified attribute name (relation.attribute)

Union and Difference of Schemas

When discussing multiple relations, we sometimes need to describe combined or differential schemas:

R ∪ S — Union of attribute sets from R and S
R ∩ S — Common attributes between R and S
R - S — Attributes in R but not in S

These operations on schemas (rather than instances) are fundamental to understanding join compatibility and set operations in relational algebra.

Subscript and Superscript Conventions

Rᵢ — The i-th schema in a database (R₁, R₂, R₃...)
R' or R'' — Modified or derived versions of schema R
r(R) or r(R) — Instance r of schema R (lowercase for instance, uppercase for schema)
|r| — Cardinality (number of tuples) in instance r

Key Symbols in Mathematical Schema Notation

•R, S, T — Relation names (uppercase or small caps by convention)
•A, B, C — Individual attribute names (uppercase letters)
•X, Y, Z — Sets of attributes (typically for functional dependencies)
•r, s, t — Relation instances (lowercase)
•t, u, v — Individual tuples (lowercase)
•dom(A) — Domain of attribute A
•deg(R) — Degree of relation R (number of attributes)
•|r| — Cardinality of relation instance r (number of tuples)
•t[A] or t.A — Value of attribute A in tuple t
•⊆, ∈, ×, ∪, ∩ — Standard set operations applied to relations

Reading Academic Literature

When reading database research papers or advanced textbooks, you'll encounter this mathematical notation extensively. Familiarity with it enables you to access the rich body of database theory, including seminal works by Codd, Armstrong, Beeri, and others. The notation is remarkably consistent across decades of literature.

Textual Schema Notation

While mathematical notation provides rigor, textual schema notation provides readability and practical applicability. This notation style is commonly used in:

Database design documents
System specification documents
Academic assignments and examinations
Quick communication between developers
Entity-Relationship to Relational mappings

Standard Textual Format

The most widely used textual notation follows this pattern:

RELATION_NAME(attribute₁, attribute₂, ..., attributeₙ)

With the following conventions:

Primary Key Indication:

Underline the primary key attribute(s): RELATION_NAME(pk_attr, attr₂, attr₃)
Or use bold: RELATION_NAME(pk_attr, attr₂, attr₃)
Or mark explicitly: RELATION_NAME(pk_attr, attr₂, attr₃)

Foreign Key Indication:

Use arrow notation: attr₂ → REFERENCED_TABLE(ref_attr)
Or use italic: RELATION_NAME(pk, fk_attr, other)
Or mark explicitly: fk_attr (FK)

Optional/Nullable Attributes:

Use question mark: attr?
Or use parentheses: (attr)
Or mark explicitly: attr (nullable)

textual_notation_examples.txt

Textual Notation

// ================================================
// TEXTUAL SCHEMA NOTATION STYLES
// ================================================
 
// Style 1: Underline for PK (common in academic contexts)
// ------------------------------------------------
STUDENT(student_id̲, first_name, last_name, email, enrollment_date, major_id)
         ̲̲̲̲̲̲̲̲̲̲̲        
COURSE(course_id̲, course_name, credits, department)
 
ENROLLMENT(student_id̲, course_id̲, semester̲, grade)
           ̲̲̲̲̲̲̲̲̲̲         ̲̲̲̲̲̲̲̲̲̲   ̲̲̲̲̲̲̲̲
 
// Style 2: Asterisk for PK, (FK) for Foreign Keys
// ------------------------------------------------
STUDENT(*student_id, first_name, last_name, email, enrollment_date, major_id(FK))
MAJOR(*major_id, major_name, department_id(FK))
DEPARTMENT(*department_id, dept_name, building, budget)
 
// Style 3: Explicit PK/FK Markers
// ------------------------------------------------
EMPLOYEE(
    emp_id          PK,
    first_name,
    last_name,
    hire_date,
    salary,
    manager_id      FK → EMPLOYEE(emp_id),    -- Self-referencing
    department_id   FK → DEPARTMENT(dept_id)
)
 
// Style 4: Compact with Domain Types
// ------------------------------------------------
PRODUCT(
    product_id: INT [PK],
    name: VARCHAR(100),
    description: TEXT [NULL],
    price: DECIMAL(10,2),
    category_id: INT [FK],
    stock_quantity: INT [DEFAULT 0]
)
 
// Style 5: Full Specification with Constraints
// ------------------------------------------------
ORDER(
    order_id        INT             PRIMARY KEY,
    customer_id     INT             FOREIGN KEY → CUSTOMER(customer_id),
    order_date      TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status          ENUM            CHECK ('pending','processing','shipped','delivered'),
    total_amount    DECIMAL(12,2)   CHECK (total_amount >= 0),
    shipping_addr   VARCHAR(500)    NULLABLE
)
 
// Style 6: Database Schema Overview Document
// ================================================
// E-COMMERCE DATABASE SCHEMA
// Version: 2.3
// Last Updated: 2024-01-15
// ================================================
 
// Core Entities
// -------------
CUSTOMER(*customer_id, email, password_hash, first_name, last_name, 
         created_at, last_login)
 
PRODUCT(*product_id, sku, name, description?, price, weight?, 
        category_id(FK), vendor_id(FK), is_active)
 
// Relationship Tables
// -------------------
ORDER(*order_id, customer_id(FK), order_date, status, total_amount)
ORDER_ITEM(*order_id(FK,PK), *line_number(PK), product_id(FK), 
           quantity, unit_price, discount)
 
// Reference Tables
// ----------------
CATEGORY(*category_id, name, parent_category_id?(FK → CATEGORY))
VENDOR(*vendor_id, company_name, contact_email, rating)

Composite Key Notation

When a primary key consists of multiple attributes (composite key), all component attributes must be indicated:

ENROLLMENT(student_id, course_id, semester, grade)
ORDER_ITEM(order_id, line_number, product_id, quantity, price)

Referential Integrity Notation

Foreign key relationships can be expressed with varying levels of detail:

Minimal: Just mark as FK

ORDER(order_id, customer_id(FK), order_date)

With Reference: Show target table

ORDER(order_id, customer_id → CUSTOMER, order_date)

With Column: Show target column

ORDER(order_id, customer_id → CUSTOMER(customer_id), order_date)

With Actions: Show referential actions

ORDER(order_id, customer_id → CUSTOMER(customer_id) ON DELETE RESTRICT ON UPDATE CASCADE, order_date)

Notation Flexibility

There is no single 'correct' textual notation—different organizations, textbooks, and tools use variations. The key is consistency within a project. Document your notation conventions at the beginning of any design document and apply them uniformly throughout.

Good Notation Practices

•Use consistent conventions throughout
•Clearly distinguish PK from other attributes
•Indicate all foreign key relationships
•Note nullable/optional attributes
•Include domain types when precision matters
•Document composite keys completely

Notation Anti-Patterns

•Mixing different notation styles
•Omitting primary key indicators
•Failing to show foreign key references
•Inconsistent attribute ordering
•Using ambiguous abbreviations
•Leaving composite keys partially marked

Degree and Cardinality Notation

Two fundamental measures characterize any relation: its degree (number of attributes) and its cardinality (number of tuples). Precise notation for these concepts is essential for formal analysis.

Degree Notation

Degree (also called arity) refers to the number of attributes in a relation schema.

Notation conventions:

deg(R) or |R| — Degree of relation R
n-ary — A relation with n attributes (binary = 2, ternary = 3, quaternary = 4, etc.)
attr(R) = {A₁, A₂, ..., Aₙ} — The set of attributes, with n = deg(R)

Cardinality Notation

Cardinality refers to the number of tuples in a relation instance.

Notation conventions:

|r| or |r(R)| — Cardinality of relation instance r
card(r) — Alternative notation for cardinality
#r — Sometimes used in literature

Important Properties

Degree is schema-level: It is determined by the schema and remains constant
Cardinality is instance-level: It changes as tuples are inserted and deleted
Degree ≥ 1: A relation must have at least one attribute (otherwise it's undefined or a marker)
Cardinality ≥ 0: A relation may have zero tuples (empty relation)
Maximum cardinality: Bounded by the product of domain cardinalities (usually extremely large)

Degree and Cardinality: Formal Properties
Property	Degree (Schema)	Cardinality (Instance)
Definition	Number of attributes in schema	Number of tuples in instance
Notation	deg(R), \|R\|, n-ary	\|r\|, card(r), #r
Stability	Fixed (part of schema definition)	Variable (changes with operations)
Minimum value	1 (at least one attribute)	0 (empty relation allowed)
Maximum value	No theoretical limit (practical: thousands)	∏dom(Aᵢ)\| (theoretical); limited by storage
Determined by	Database designer	Data operations (INSERT, DELETE)
Time dependence	Static	Dynamic

degree_cardinality_notation.txt

Mathematical

// ================================================
// DEGREE AND CARDINALITY NOTATION
// ================================================
 
// Schema Definition
EMPLOYEE(emp_id, name, dept_id, salary, hire_date)
 
// Degree (number of attributes)
deg(EMPLOYEE) = 5
|EMPLOYEE| = 5
EMPLOYEE is a 5-ary (quinary) relation
 
// Attribute Set Notation
attr(EMPLOYEE) = {emp_id, name, dept_id, salary, hire_date}
 
// Instance with Specific Cardinality
Let r(EMPLOYEE) be an instance with these tuples:
r = {
    (101, 'Alice', 10, 75000, '2020-01-15'),
    (102, 'Bob', 20, 62000, '2019-07-22'),
    (103, 'Carol', 10, 81000, '2018-03-10')
}
 
// Cardinality (number of tuples)
|r(EMPLOYEE)| = 3
card(r) = 3
 
// After INSERT operation
INSERT INTO r: (104, 'David', 30, 55000, '2023-09-01')
|r(EMPLOYEE)| = 4  // Cardinality increased by 1
 
// After DELETE operation  
DELETE FROM r WHERE emp_id = 102
|r(EMPLOYEE)| = 3  // Cardinality decreased by 1
 
// Degree remains constant throughout
deg(EMPLOYEE) = 5  // Always 5, regardless of operations on instances
 
// Empty Relation (valid state)
Let s(EMPLOYEE) = ∅  (empty set)
|s| = 0              // Zero cardinality is valid
deg(EMPLOYEE) = 5    // Degree still defined by schema
 
// Domain Cardinality Relationship
// Maximum possible cardinality = |dom(A₁)| × |dom(A₂)| × ... × |dom(Aₙ)|
// For EMPLOYEE with typical domains:
// |INTEGER| × |VARCHAR| × |INTEGER| × |DECIMAL| × |DATE|
// This is astronomically large, so practical cardinality is always much less

Degree vs Cardinality in Practice

In practice, degree is determined during schema design and rarely changes (adding/removing columns requires schema modification). Cardinality changes constantly through normal operations. When estimating query performance, cardinality is the critical factor—the query optimizer uses cardinality estimates to choose execution strategies.

Domain Specification Notation

Domains are the semantic data types associated with attributes—they define the set of valid values an attribute can hold. Domain specification notation provides precise ways to describe these constraints.

Domain Definition Notation

Domains can be specified at various levels of abstraction:

Basic Type Notation:

dom(salary) = DECIMAL(10,2)
dom(name) = VARCHAR(100)
dom(birth_date) = DATE

Semantic Domain Notation:

dom(email) = EmailAddress  // Custom semantic type
dom(phone) = PhoneNumber
dom(currency) = MonetaryAmount

Constrained Domain Notation:

dom(age) = INTEGER WHERE age >= 0 AND age <= 150
dom(rating) = DECIMAL(2,1) WHERE rating >= 0.0 AND rating <= 5.0
dom(status) = {'active', 'inactive', 'pending', 'suspended'}

Domain Dictionary Notation

Large databases maintain a domain dictionary (or domain catalog) that centralizes domain definitions. This enables reuse and consistency across the schema.

domain_dictionary.txt

Domain Dictionary

// ================================================
// DOMAIN DICTIONARY NOTATION
// ================================================
 
// DOMAIN DEFINITIONS CATALOG
// ==========================
 
// Identity Domains
DOMAIN PersonId     = BIGINT UNSIGNED NOT NULL
DOMAIN ProductId    = BIGINT UNSIGNED NOT NULL  
DOMAIN OrderId      = BIGINT UNSIGNED NOT NULL
DOMAIN UUID         = CHAR(36) PATTERN '^[0-9a-f]{8}-[0-9a-f]{4}-...'
 
// String Domains
DOMAIN PersonName   = VARCHAR(100) NOT NULL
DOMAIN Email        = VARCHAR(255) PATTERN '^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$'
DOMAIN PhoneNumber  = VARCHAR(20) PATTERN '^\+?[0-9\-\s]+$'
DOMAIN PostalCode   = VARCHAR(20)
DOMAIN URL          = VARCHAR(2048) PATTERN '^https?://.*'
 
// Numeric Domains
DOMAIN MonetaryAmount = DECIMAL(14,2) CHECK (value >= 0)
DOMAIN Percentage     = DECIMAL(5,2) CHECK (value >= 0 AND value <= 100)
DOMAIN Quantity       = INTEGER CHECK (value >= 0)
DOMAIN Rating         = DECIMAL(2,1) CHECK (value >= 0 AND value <= 5)
 
// Temporal Domains
DOMAIN Timestamp      = TIMESTAMP WITH TIME ZONE
DOMAIN FutureDate     = DATE CHECK (value > CURRENT_DATE)
DOMAIN BirthDate      = DATE CHECK (value <= CURRENT_DATE AND value >= '1900-01-01')
 
// Enumeration Domains
DOMAIN OrderStatus    = ENUM('pending', 'confirmed', 'processing', 
                             'shipped', 'delivered', 'cancelled', 'refunded')
DOMAIN PaymentMethod  = ENUM('credit_card', 'debit_card', 'bank_transfer', 
                             'paypal', 'crypto', 'cash')
DOMAIN UserRole       = ENUM('admin', 'manager', 'user', 'guest')
 
// Composite/Structured Domains (for advanced systems)
DOMAIN Address = STRUCT(
    street: VARCHAR(200),
    city: VARCHAR(100),
    state: VARCHAR(100),
    postal_code: PostalCode,
    country: CountryCode
)
 
// ================================================
// SCHEMA USING DOMAIN REFERENCES
// ================================================
 
CUSTOMER(
    customer_id:    PersonId        [PK],
    email:          Email           [UNIQUE],
    name:           PersonName,
    phone:          PhoneNumber     [NULL],
    created_at:     Timestamp
)
 
PRODUCT(
    product_id:     ProductId       [PK],
    name:           VARCHAR(200),
    price:          MonetaryAmount,
    rating_avg:     Rating          [NULL],
    stock:          Quantity
)
 
ORDER(
    order_id:       OrderId         [PK],
    customer_id:    PersonId        [FK → CUSTOMER],
    status:         OrderStatus,
    total:          MonetaryAmount,
    placed_at:      Timestamp
)

Domain Compatibility

The concept of domain compatibility (or type compatibility) is crucial for operations that combine values from different columns:

Union compatibility: Two relations can be unioned only if they have the same degree AND corresponding attributes have compatible domains
Join compatibility: Attributes compared in a join condition must have compatible domains
Assignment compatibility: A value can only be assigned to an attribute if it belongs to that attribute's domain

Notation:

A ≈ᵈ B  — Attributes A and B have compatible domains
R ≈ᵤ S  — Relations R and S are union-compatible
v ∈ dom(A) — Value v is in the domain of attribute A

Benefits of Formal Domain Specification

•Semantic Consistency — Ensures the same real-world concept is represented identically across all tables
•Constraint Reuse — Define validation rules once, apply everywhere the domain is used
•Documentation — Domain dictionary serves as authoritative reference for developers
•Change Management — Modify domain definition once to update all usages
•Type Safety — Prevents invalid operations between incompatible types
•Query Optimization — Optimizer can use domain knowledge for better plans

SQL and Domains

Standard SQL includes CREATE DOMAIN syntax, but support varies by DBMS. PostgreSQL fully supports domains, while MySQL and SQL Server use CHECK constraints on columns. Regardless of DBMS support, maintaining a logical domain dictionary—even if implemented as naming conventions and comments—significantly improves schema quality.

Schema Comparison and Equivalence

Database professionals frequently need to compare schemas—whether for migration, integration, normalization verification, or quality assessment. Formal notation for schema comparison enables precise analysis.

Attribute Set Operations

Given relation schemas R and S, we can perform set operations on their attribute sets:

Union: attr(R) ∪ attr(S) — All attributes from both schemas Intersection: attr(R) ∩ attr(S) — Common attributes Difference: attr(R) - attr(S) — Attributes in R but not S Symmetric Difference: attr(R) △ attr(S) — Attributes in exactly one schema

Schema Equivalence

Two schemas are equivalent if they can represent exactly the same information. This is stronger than having the same attributes:

Structural equivalence: Same attributes with same domains

R ≡ₛ S ⟺ attr(R) = attr(S) ∧ ∀A ∈ attr(R): dom_R(A) = dom_S(A)

Semantic equivalence: Can be transformed into each other without information loss

R ≡ₛₑₘ S ⟺ there exists lossless decomposition/composition between R and S

schema_comparison.txt

Schema Comparison

// ================================================
// SCHEMA COMPARISON NOTATION
// ================================================
 
// Define two schemas
R = EMPLOYEE(emp_id, name, dept_id, salary, hire_date)
S = CONTRACTOR(contractor_id, name, agency_id, hourly_rate, start_date)
 
// Attribute sets
attr(R) = {emp_id, name, dept_id, salary, hire_date}
attr(S) = {contractor_id, name, agency_id, hourly_rate, start_date}
 
// Set operations on schemas
attr(R) ∪ attr(S) = {emp_id, name, dept_id, salary, hire_date, 
                     contractor_id, agency_id, hourly_rate, start_date}
 
attr(R) ∩ attr(S) = {name}  // Common attributes
 
attr(R) - attr(S) = {emp_id, dept_id, salary, hire_date}
 
attr(S) - attr(R) = {contractor_id, agency_id, hourly_rate, start_date}
 
// Degree comparison
deg(R) = 5
deg(S) = 5
deg(R) = deg(S)  // Same degree, but not equivalent schemas
 
// Union compatibility check
// For R ≈ᵤ S (union compatible):
// 1. deg(R) = deg(S)  ✓ (both have 5 attributes)
// 2. Corresponding domains must match
//    R.attr₁ ↔ S.attr₁: INTEGER ↔ INTEGER  ✓
//    R.attr₂ ↔ S.attr₂: VARCHAR ↔ VARCHAR  ✓
//    R.attr₃ ↔ S.attr₃: INTEGER ↔ INTEGER  ✓
//    R.attr₄ ↔ S.attr₄: DECIMAL ↔ DECIMAL  ✓
//    R.attr₅ ↔ S.attr₅: DATE ↔ DATE        ✓
// Result: R and S are union-compatible (syntactically)
 
// ================================================
// SCHEMA EVOLUTION NOTATION
// ================================================
 
// Original schema version
V1: CUSTOMER(customer_id, name, address, phone)
 
// Schema modifications - notation for changes
V2: CUSTOMER(customer_id, first_name, last_name, email, address, phone)
    Changes: 
    - RENAME: name → (first_name, last_name)  // Decomposition
    - ADD: email
    
// Delta notation
Δ(V1 → V2) = {
    DROP: {name},
    ADD: {first_name, last_name, email},
    MODIFY: ∅,
    RENAME: ∅
}
 
// Schema diff (what changed)
attr(V2) - attr(V1) = {first_name, last_name, email}
attr(V1) - attr(V2) = {name}
deg(V2) - deg(V1) = 6 - 4 = +2

Schema Relationship Types
Relationship	Notation	Meaning
Identity	R = S	Exactly the same schema (same name, attributes, domains)
Equivalent	R ≡ S	Same attributes and domains (possibly different names)
Subset	R ⊂ S	Every attribute in R is also in S
Union-compatible	R ≈ᵤ S	Same degree with domain-compatible attributes
Join-compatible on X	R ⋈ₓ S	Common attributes X have compatible domains
Lossless decomposition	R = R₁ ⊕ R₂	R can be decomposed into R₁ and R₂ without loss

Practical Schema Comparison

In practice, schema comparison often uses specialized tools (like SchemaSpy, Liquibase, Flyway) that generate diff reports. However, understanding the formal notation helps you articulate precisely what changed and verify that schema migrations are semantically correct.

Connecting Notation to SQL Implementation

The formal and textual schema notations we've explored are abstract—they describe structure without specifying implementation. However, most relational databases are implemented using SQL. Understanding how notation maps to SQL syntax bridges theory and practice.

Notation to SQL Mapping

The abstract schema notation translates directly to SQL DDL (Data Definition Language):

Notation Element	SQL Equivalent
Relation name R	CREATE TABLE R
Attribute A: D	column_name datatype
Primary Key PK	PRIMARY KEY constraint
Foreign Key FK → S	FOREIGN KEY REFERENCES S
Domain constraint	CHECK constraint
NOT NULL	NOT NULL constraint
Default value	DEFAULT clause

notation_to_sql.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
-- ================================================
-- SCHEMA NOTATION TO SQL TRANSLATION
-- ================================================
 
-- Textual Notation:
-- DEPARTMENT(*dept_id: INT, name: VARCHAR(100), budget: DECIMAL(12,2), 
--            location: VARCHAR(200)??, manager_id: INT? → EMPLOYEE)
 
-- SQL Implementation:
CREATE TABLE department (
    dept_id     INTEGER         NOT NULL,
    name        VARCHAR(100)    NOT NULL,
    budget      DECIMAL(12,2)   NOT NULL,
    location    VARCHAR(200),   -- Nullable (? in notation)
    manager_id  INTEGER,        -- Nullable FK
    
    CONSTRAINT pk_department PRIMARY KEY (dept_id),
    CONSTRAINT fk_dept_manager FOREIGN KEY (manager_id) 
        REFERENCES employee(emp_id) ON DELETE SET NULL
);
 
-- ================================================
-- Textual Notation:
-- EMPLOYEE(*emp_id: INT, first_name: VARCHAR(50), last_name: VARCHAR(50),
--          email: Email, hire_date: DATE, salary: DECIMAL(10,2) CHECK >=0,
--          dept_id: INT → DEPARTMENT)
 
-- SQL Implementation:
CREATE TABLE employee (
    emp_id      INTEGER         NOT NULL,
    first_name  VARCHAR(50)     NOT NULL,
    last_name   VARCHAR(50)     NOT NULL,
    email       VARCHAR(255)    NOT NULL,
    hire_date   DATE            NOT NULL DEFAULT CURRENT_DATE,
    salary      DECIMAL(10,2)   NOT NULL,
    dept_id     INTEGER         NOT NULL,
    
    CONSTRAINT pk_employee PRIMARY KEY (emp_id),
    CONSTRAINT uq_employee_email UNIQUE (email),
    CONSTRAINT chk_salary CHECK (salary >= 0),
    CONSTRAINT chk_email CHECK (email ~ '^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$'),
    CONSTRAINT fk_employee_dept FOREIGN KEY (dept_id) 
        REFERENCES department(dept_id) ON DELETE RESTRICT ON UPDATE CASCADE
);
 
-- ================================================
-- Textual Notation (Composite Key):
-- ENROLLMENT(*student_id → STUDENT, *course_id → COURSE, *semester: VARCHAR(20), 
--            grade: CHAR(2)?, enrollment_date: DATE)
 
-- SQL Implementation:
CREATE TABLE enrollment (
    student_id      INTEGER         NOT NULL,
    course_id       INTEGER         NOT NULL,
    semester        VARCHAR(20)     NOT NULL,
    grade           CHAR(2),        -- Nullable (grade assigned later)
    enrollment_date DATE            NOT NULL DEFAULT CURRENT_DATE,
    
    -- Composite primary key
    CONSTRAINT pk_enrollment PRIMARY KEY (student_id, course_id, semester),
    
    -- Foreign keys
    CONSTRAINT fk_enrollment_student FOREIGN KEY (student_id)
        REFERENCES student(student_id) ON DELETE CASCADE,
    CONSTRAINT fk_enrollment_course FOREIGN KEY (course_id)
        REFERENCES course(course_id) ON DELETE RESTRICT,
        
    -- Domain constraint on grade
    CONSTRAINT chk_grade CHECK (grade IN ('A+', 'A', 'A-', 'B+', 'B', 'B-',
                                          'C+', 'C', 'C-', 'D+', 'D', 'F', 'W'))
);
 
-- ================================================
-- MATHEMATICAL NOTATION TO SQL
-- ================================================
 
-- Mathematical: R(A₁: D₁, A₂: D₂) with constraint A₁ ∈ dom(A₁), A₂ ∈ dom(A₂)
-- SQL: CREATE TABLE with column definitions and CHECK constraints
 
-- Mathematical: t[A] denotes value of attribute A in tuple t
-- SQL: SELECT A FROM R (retrieves values of column A from rows)
 
-- Mathematical: |r| = cardinality
-- SQL: SELECT COUNT(*) FROM R
 
-- Mathematical: deg(R) = n
-- SQL: SELECT COUNT(*) FROM information_schema.columns WHERE table_name = 'R'

Notation vs Implementation

Schema notation is logically independent of any specific DBMS. The same notation can be implemented in PostgreSQL, MySQL, Oracle, SQL Server, or any relational system. This abstraction is valuable: design documents using proper notation remain valid even if you migrate to a different database platform.

Summary: Mastering Schema Notation

Relation schema notation provides the vocabulary for precise communication about database structure. From formal mathematical notation used in research to practical textual notation used in design documents, mastering these conventions elevates your ability to design, document, and discuss databases professionally.

Key Takeaways

•Schema vs Instance is fundamental — Schema (intension) describes structure; instance (extension) contains actual data. Schema notation describes blueprints, not contents.
•Mathematical notation provides rigor — R(A₁, A₂, ..., Aₙ) with dom(Aᵢ) specifications enables formal reasoning and connects to database theory literature.
•Textual notation provides practicality — Conventions for marking primary keys, foreign keys, and nullable attributes enable clear documentation.
•Degree and cardinality are distinct — Degree (attribute count) is fixed by schema; cardinality (tuple count) varies with data operations.
•Domains define valid values — Domain specification goes beyond SQL types to capture semantic constraints and enable consistency.
•Schema comparison has formal basis — Set operations on attribute sets, equivalence relations, and union compatibility have precise definitions.
•Notation maps to SQL implementation — Abstract notation translates directly to DDL, bridging theory and practice.

What's next:

The next page explores Instance Notation—the conventions for representing actual data within relations. While schema notation describes structure, instance notation describes content. Together, they provide a complete vocabulary for relational database representation.

Page Complete

You now understand the essential notation for representing relation schemas—from formal mathematical conventions to practical textual formats. This knowledge enables precise communication about database structure and forms the foundation for advanced topics in relational database design and theory.

1 / 5

Loading learning content...

Database Management SystemsRelational Model Notation

Relational Model Notation

LevelIntermediate

Duration90 mins

TopicRelational Model Notation

1 / 5

Relation Schema Notation: The Language of Database Structure

The Critical Importance of Precise Notation

Communicate unambiguously with colleagues, regardless of their preferred DBMS
Document designs precisely in a form that remains valid across decades and technology shifts
Reason formally about database properties, transformations, and optimizations
Bridge theory and practice by connecting academic literature to practical implementation

Learning Objectives

After studying this page, you will be able to:

Schema vs Instance: The Fundamental Distinction

The Type-Value Analogy

The relational model makes an analogous distinction:

A relation schema is the structural description: the relation name, attribute names, and their associated domains (data types). It is the template or blueprint that defines what valid data looks like.
A relation instance (or relation state) is a specific set of tuples that conform to the schema at a particular point in time. It is the actual data currently stored.

Formal Definition

Relation Schema: A relation schema R(A₁, A₂, ..., Aₙ) is a named tuple consisting of:

A relation name R
An ordered list of attribute names (A₁, A₂, ..., Aₙ)
Associated domains dom(Aᵢ) for each attribute

Relation Instance: A relation instance r of schema R is a subset of the Cartesian product of the attribute domains:

r(R) ⊆ dom(A₁) × dom(A₂) × ... × dom(Aₙ)

Each element of this set is a tuple (row), and no two tuples can be identical (since a relation is a set, and sets contain no duplicates).

Schema vs Instance: Comprehensive Comparison
Aspect	Relation Schema (Intension)	Relation Instance (Extension)
Nature	Structural description	Actual data
Stability	Rarely changes	Frequently changes
Analogies	Class definition, Type, Blueprint	Object instance, Value, Building
Defined by	Database designer	Data insertion/modification
Contains	Attribute names, domains, constraints	Tuples (rows) of actual values
Time dependence	Static during operation	Dynamic, changes over time
Mathematical term	Intension	Extension
Size	Fixed (determined by degree)	Variable (cardinality changes)

The Meta-Level Distinction

Why This Distinction Matters for Notation

Different notations serve different purposes:

Schema notation focuses on structure: attribute names, types, and relationships between schemas (e.g., foreign key references)
Instance notation focuses on content: specific tuples, sample data, and query results

The remainder of this page focuses primarily on schema notation, though we will return to instance notation in the next page.

Mathematical Set-Theoretic Notation

Basic Schema Notation

The most fundamental schema notation uses the following form:

R(A₁: D₁, A₂: D₂, ..., Aₙ: Dₙ)

Where:

R is the relation name (by convention, capitalized or in small caps)
Aᵢ is the name of the i-th attribute
Dᵢ is the domain (data type) of attribute Aᵢ
n is the degree (number of attributes) of the relation

Simplified Notation

When domains are implicit or defined elsewhere, we often use the simplified form:

R(A₁, A₂, ..., Aₙ) or simply R(A, B, C)

This abbreviated form is extremely common in database literature and assumes domains are either understood from context or specified in a separate domain dictionary.

Set Notation for Relations

Formally, a relation r on schema R(A₁, A₂, ..., Aₙ) is a finite subset of the Cartesian product:

r ⊆ dom(A₁) × dom(A₂) × ... × dom(Aₙ)

Each element t ∈ r is a tuple, which can be viewed as a function:

t: {A₁, A₂, ..., Aₙ} → ⋃ᵢ dom(Aᵢ)

such that t(Aᵢ) ∈ dom(Aᵢ) for each attribute Aᵢ.

This functional view of tuples underlies the notation t[Aᵢ] or t.Aᵢ for accessing the value of attribute Aᵢ in tuple t.

formal_schema_examples.txt

Formal Notation

// ================================================
// FORMAL RELATION SCHEMA NOTATION EXAMPLES
// ================================================
 
// Basic Schema Notation with Explicit Domains
// Syntax: R(A₁: D₁, A₂: D₂, ..., Aₙ: Dₙ)
 
EMPLOYEE(
    emp_id: INTEGER,
    first_name: VARCHAR(50),
    last_name: VARCHAR(50),
    birth_date: DATE,
    salary: DECIMAL(10,2),
    dept_id: INTEGER
)
 
DEPARTMENT(
    dept_id: INTEGER,
    dept_name: VARCHAR(100),
    location: VARCHAR(100),
    budget: DECIMAL(12,2)
)
 
// Simplified Schema Notation (domains implicit)
// Syntax: R(A₁, A₂, ..., Aₙ)
 
EMPLOYEE(emp_id, first_name, last_name, birth_date, salary, dept_id)
DEPARTMENT(dept_id, dept_name, location, budget)
 
// Set-Theoretic Definition of a Relation
// r(R) ⊆ dom(A₁) × dom(A₂) × ... × dom(Aₙ)
 
r(EMPLOYEE) ⊆ INTEGER × VARCHAR(50) × VARCHAR(50) × DATE × DECIMAL(10,2) × INTEGER
 
// Tuple Notation
// t = (v₁, v₂, ..., vₙ) where vᵢ ∈ dom(Aᵢ)
 
t₁ = (101, 'Alice', 'Johnson', '1985-03-15', 75000.00, 10)
t₂ = (102, 'Bob', 'Smith', '1990-07-22', 62000.00, 20)
 
// Attribute Access Notation
// t[Aᵢ] or t.Aᵢ denotes the value of attribute Aᵢ in tuple t
 
t₁[first_name] = 'Alice'
t₁.salary = 75000.00
t₁[emp_id] = 101
 
// Multiple Attribute Access (Projection on Tuple)
// t[{A₁, A₂}] returns subtuple with only specified attributes
 
t₁[{first_name, last_name}] = ('Alice', 'Johnson')

Attribute Set Notation

For a relation schema R, we often need to refer to its set of attributes. Common notations include:

attr(R) or Attr(R) — The set of all attributes in schema R
|R| or deg(R) — The degree (number of attributes) of R
R.A — Qualified attribute name (relation.attribute)

Union and Difference of Schemas

When discussing multiple relations, we sometimes need to describe combined or differential schemas:

R ∪ S — Union of attribute sets from R and S
R ∩ S — Common attributes between R and S
R - S — Attributes in R but not in S

These operations on schemas (rather than instances) are fundamental to understanding join compatibility and set operations in relational algebra.

Subscript and Superscript Conventions

Rᵢ — The i-th schema in a database (R₁, R₂, R₃...)
R' or R'' — Modified or derived versions of schema R
r(R) or r(R) — Instance r of schema R (lowercase for instance, uppercase for schema)
|r| — Cardinality (number of tuples) in instance r

Key Symbols in Mathematical Schema Notation

•R, S, T — Relation names (uppercase or small caps by convention)
•A, B, C — Individual attribute names (uppercase letters)
•X, Y, Z — Sets of attributes (typically for functional dependencies)
•r, s, t — Relation instances (lowercase)
•t, u, v — Individual tuples (lowercase)
•dom(A) — Domain of attribute A
•deg(R) — Degree of relation R (number of attributes)
•|r| — Cardinality of relation instance r (number of tuples)
•t[A] or t.A — Value of attribute A in tuple t
•⊆, ∈, ×, ∪, ∩ — Standard set operations applied to relations

Reading Academic Literature

Textual Schema Notation

While mathematical notation provides rigor, textual schema notation provides readability and practical applicability. This notation style is commonly used in:

Database design documents
System specification documents
Academic assignments and examinations
Quick communication between developers
Entity-Relationship to Relational mappings

Standard Textual Format

The most widely used textual notation follows this pattern:

RELATION_NAME(attribute₁, attribute₂, ..., attributeₙ)

With the following conventions:

Primary Key Indication:

Underline the primary key attribute(s): RELATION_NAME(pk_attr, attr₂, attr₃)
Or use bold: RELATION_NAME(pk_attr, attr₂, attr₃)
Or mark explicitly: RELATION_NAME(pk_attr, attr₂, attr₃)

Foreign Key Indication:

Use arrow notation: attr₂ → REFERENCED_TABLE(ref_attr)
Or use italic: RELATION_NAME(pk, fk_attr, other)
Or mark explicitly: fk_attr (FK)

Optional/Nullable Attributes:

Use question mark: attr?
Or use parentheses: (attr)
Or mark explicitly: attr (nullable)

textual_notation_examples.txt

Textual Notation

// ================================================
// TEXTUAL SCHEMA NOTATION STYLES
// ================================================
 
// Style 1: Underline for PK (common in academic contexts)
// ------------------------------------------------
STUDENT(student_id̲, first_name, last_name, email, enrollment_date, major_id)
         ̲̲̲̲̲̲̲̲̲̲̲        
COURSE(course_id̲, course_name, credits, department)
 
ENROLLMENT(student_id̲, course_id̲, semester̲, grade)
           ̲̲̲̲̲̲̲̲̲̲         ̲̲̲̲̲̲̲̲̲̲   ̲̲̲̲̲̲̲̲
 
// Style 2: Asterisk for PK, (FK) for Foreign Keys
// ------------------------------------------------
STUDENT(*student_id, first_name, last_name, email, enrollment_date, major_id(FK))
MAJOR(*major_id, major_name, department_id(FK))
DEPARTMENT(*department_id, dept_name, building, budget)
 
// Style 3: Explicit PK/FK Markers
// ------------------------------------------------
EMPLOYEE(
    emp_id          PK,
    first_name,
    last_name,
    hire_date,
    salary,
    manager_id      FK → EMPLOYEE(emp_id),    -- Self-referencing
    department_id   FK → DEPARTMENT(dept_id)
)
 
// Style 4: Compact with Domain Types
// ------------------------------------------------
PRODUCT(
    product_id: INT [PK],
    name: VARCHAR(100),
    description: TEXT [NULL],
    price: DECIMAL(10,2),
    category_id: INT [FK],
    stock_quantity: INT [DEFAULT 0]
)
 
// Style 5: Full Specification with Constraints
// ------------------------------------------------
ORDER(
    order_id        INT             PRIMARY KEY,
    customer_id     INT             FOREIGN KEY → CUSTOMER(customer_id),
    order_date      TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,
    status          ENUM            CHECK ('pending','processing','shipped','delivered'),
    total_amount    DECIMAL(12,2)   CHECK (total_amount >= 0),
    shipping_addr   VARCHAR(500)    NULLABLE
)
 
// Style 6: Database Schema Overview Document
// ================================================
// E-COMMERCE DATABASE SCHEMA
// Version: 2.3
// Last Updated: 2024-01-15
// ================================================
 
// Core Entities
// -------------
CUSTOMER(*customer_id, email, password_hash, first_name, last_name, 
         created_at, last_login)
 
PRODUCT(*product_id, sku, name, description?, price, weight?, 
        category_id(FK), vendor_id(FK), is_active)
 
// Relationship Tables
// -------------------
ORDER(*order_id, customer_id(FK), order_date, status, total_amount)
ORDER_ITEM(*order_id(FK,PK), *line_number(PK), product_id(FK), 
           quantity, unit_price, discount)
 
// Reference Tables
// ----------------
CATEGORY(*category_id, name, parent_category_id?(FK → CATEGORY))
VENDOR(*vendor_id, company_name, contact_email, rating)

Composite Key Notation

When a primary key consists of multiple attributes (composite key), all component attributes must be indicated:

ENROLLMENT(student_id, course_id, semester, grade)
ORDER_ITEM(order_id, line_number, product_id, quantity, price)

Referential Integrity Notation

Foreign key relationships can be expressed with varying levels of detail:

Minimal: Just mark as FK

ORDER(order_id, customer_id(FK), order_date)

With Reference: Show target table

ORDER(order_id, customer_id → CUSTOMER, order_date)

With Column: Show target column

ORDER(order_id, customer_id → CUSTOMER(customer_id), order_date)

With Actions: Show referential actions

ORDER(order_id, customer_id → CUSTOMER(customer_id) ON DELETE RESTRICT ON UPDATE CASCADE, order_date)

Notation Flexibility

Good Notation Practices

•Use consistent conventions throughout
•Clearly distinguish PK from other attributes
•Indicate all foreign key relationships
•Note nullable/optional attributes
•Include domain types when precision matters
•Document composite keys completely

Notation Anti-Patterns

•Mixing different notation styles
•Omitting primary key indicators
•Failing to show foreign key references
•Inconsistent attribute ordering
•Using ambiguous abbreviations
•Leaving composite keys partially marked

Degree and Cardinality Notation

Degree Notation

Degree (also called arity) refers to the number of attributes in a relation schema.

Notation conventions:

deg(R) or |R| — Degree of relation R
n-ary — A relation with n attributes (binary = 2, ternary = 3, quaternary = 4, etc.)
attr(R) = {A₁, A₂, ..., Aₙ} — The set of attributes, with n = deg(R)

Cardinality Notation

Cardinality refers to the number of tuples in a relation instance.

Notation conventions:

|r| or |r(R)| — Cardinality of relation instance r
card(r) — Alternative notation for cardinality
#r — Sometimes used in literature

Important Properties

Degree is schema-level: It is determined by the schema and remains constant
Cardinality is instance-level: It changes as tuples are inserted and deleted
Degree ≥ 1: A relation must have at least one attribute (otherwise it's undefined or a marker)
Cardinality ≥ 0: A relation may have zero tuples (empty relation)
Maximum cardinality: Bounded by the product of domain cardinalities (usually extremely large)

Degree and Cardinality: Formal Properties
Property	Degree (Schema)	Cardinality (Instance)
Definition	Number of attributes in schema	Number of tuples in instance
Notation	deg(R), \|R\|, n-ary	\|r\|, card(r), #r
Stability	Fixed (part of schema definition)	Variable (changes with operations)
Minimum value	1 (at least one attribute)	0 (empty relation allowed)
Maximum value	No theoretical limit (practical: thousands)	∏dom(Aᵢ)\| (theoretical); limited by storage
Determined by	Database designer	Data operations (INSERT, DELETE)
Time dependence	Static	Dynamic

degree_cardinality_notation.txt

Mathematical

// ================================================
// DEGREE AND CARDINALITY NOTATION
// ================================================
 
// Schema Definition
EMPLOYEE(emp_id, name, dept_id, salary, hire_date)
 
// Degree (number of attributes)
deg(EMPLOYEE) = 5
|EMPLOYEE| = 5
EMPLOYEE is a 5-ary (quinary) relation
 
// Attribute Set Notation
attr(EMPLOYEE) = {emp_id, name, dept_id, salary, hire_date}
 
// Instance with Specific Cardinality
Let r(EMPLOYEE) be an instance with these tuples:
r = {
    (101, 'Alice', 10, 75000, '2020-01-15'),
    (102, 'Bob', 20, 62000, '2019-07-22'),
    (103, 'Carol', 10, 81000, '2018-03-10')
}
 
// Cardinality (number of tuples)
|r(EMPLOYEE)| = 3
card(r) = 3
 
// After INSERT operation
INSERT INTO r: (104, 'David', 30, 55000, '2023-09-01')
|r(EMPLOYEE)| = 4  // Cardinality increased by 1
 
// After DELETE operation  
DELETE FROM r WHERE emp_id = 102
|r(EMPLOYEE)| = 3  // Cardinality decreased by 1
 
// Degree remains constant throughout
deg(EMPLOYEE) = 5  // Always 5, regardless of operations on instances
 
// Empty Relation (valid state)
Let s(EMPLOYEE) = ∅  (empty set)
|s| = 0              // Zero cardinality is valid
deg(EMPLOYEE) = 5    // Degree still defined by schema
 
// Domain Cardinality Relationship
// Maximum possible cardinality = |dom(A₁)| × |dom(A₂)| × ... × |dom(Aₙ)|
// For EMPLOYEE with typical domains:
// |INTEGER| × |VARCHAR| × |INTEGER| × |DECIMAL| × |DATE|
// This is astronomically large, so practical cardinality is always much less

Degree vs Cardinality in Practice

Domain Specification Notation

Domain Definition Notation

Domains can be specified at various levels of abstraction:

Basic Type Notation:

dom(salary) = DECIMAL(10,2)
dom(name) = VARCHAR(100)
dom(birth_date) = DATE

Semantic Domain Notation:

dom(email) = EmailAddress  // Custom semantic type
dom(phone) = PhoneNumber
dom(currency) = MonetaryAmount

Constrained Domain Notation:

dom(age) = INTEGER WHERE age >= 0 AND age <= 150
dom(rating) = DECIMAL(2,1) WHERE rating >= 0.0 AND rating <= 5.0
dom(status) = {'active', 'inactive', 'pending', 'suspended'}

Domain Dictionary Notation

Large databases maintain a domain dictionary (or domain catalog) that centralizes domain definitions. This enables reuse and consistency across the schema.

domain_dictionary.txt

Domain Dictionary

// ================================================
// DOMAIN DICTIONARY NOTATION
// ================================================
 
// DOMAIN DEFINITIONS CATALOG
// ==========================
 
// Identity Domains
DOMAIN PersonId     = BIGINT UNSIGNED NOT NULL
DOMAIN ProductId    = BIGINT UNSIGNED NOT NULL  
DOMAIN OrderId      = BIGINT UNSIGNED NOT NULL
DOMAIN UUID         = CHAR(36) PATTERN '^[0-9a-f]{8}-[0-9a-f]{4}-...'
 
// String Domains
DOMAIN PersonName   = VARCHAR(100) NOT NULL
DOMAIN Email        = VARCHAR(255) PATTERN '^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$'
DOMAIN PhoneNumber  = VARCHAR(20) PATTERN '^\+?[0-9\-\s]+$'
DOMAIN PostalCode   = VARCHAR(20)
DOMAIN URL          = VARCHAR(2048) PATTERN '^https?://.*'
 
// Numeric Domains
DOMAIN MonetaryAmount = DECIMAL(14,2) CHECK (value >= 0)
DOMAIN Percentage     = DECIMAL(5,2) CHECK (value >= 0 AND value <= 100)
DOMAIN Quantity       = INTEGER CHECK (value >= 0)
DOMAIN Rating         = DECIMAL(2,1) CHECK (value >= 0 AND value <= 5)
 
// Temporal Domains
DOMAIN Timestamp      = TIMESTAMP WITH TIME ZONE
DOMAIN FutureDate     = DATE CHECK (value > CURRENT_DATE)
DOMAIN BirthDate      = DATE CHECK (value <= CURRENT_DATE AND value >= '1900-01-01')
 
// Enumeration Domains
DOMAIN OrderStatus    = ENUM('pending', 'confirmed', 'processing', 
                             'shipped', 'delivered', 'cancelled', 'refunded')
DOMAIN PaymentMethod  = ENUM('credit_card', 'debit_card', 'bank_transfer', 
                             'paypal', 'crypto', 'cash')
DOMAIN UserRole       = ENUM('admin', 'manager', 'user', 'guest')
 
// Composite/Structured Domains (for advanced systems)
DOMAIN Address = STRUCT(
    street: VARCHAR(200),
    city: VARCHAR(100),
    state: VARCHAR(100),
    postal_code: PostalCode,
    country: CountryCode
)
 
// ================================================
// SCHEMA USING DOMAIN REFERENCES
// ================================================
 
CUSTOMER(
    customer_id:    PersonId        [PK],
    email:          Email           [UNIQUE],
    name:           PersonName,
    phone:          PhoneNumber     [NULL],
    created_at:     Timestamp
)
 
PRODUCT(
    product_id:     ProductId       [PK],
    name:           VARCHAR(200),
    price:          MonetaryAmount,
    rating_avg:     Rating          [NULL],
    stock:          Quantity
)
 
ORDER(
    order_id:       OrderId         [PK],
    customer_id:    PersonId        [FK → CUSTOMER],
    status:         OrderStatus,
    total:          MonetaryAmount,
    placed_at:      Timestamp
)

Domain Compatibility

The concept of domain compatibility (or type compatibility) is crucial for operations that combine values from different columns:

Union compatibility: Two relations can be unioned only if they have the same degree AND corresponding attributes have compatible domains
Join compatibility: Attributes compared in a join condition must have compatible domains
Assignment compatibility: A value can only be assigned to an attribute if it belongs to that attribute's domain

Notation:

A ≈ᵈ B  — Attributes A and B have compatible domains
R ≈ᵤ S  — Relations R and S are union-compatible
v ∈ dom(A) — Value v is in the domain of attribute A

Benefits of Formal Domain Specification

•Semantic Consistency — Ensures the same real-world concept is represented identically across all tables
•Constraint Reuse — Define validation rules once, apply everywhere the domain is used
•Documentation — Domain dictionary serves as authoritative reference for developers
•Change Management — Modify domain definition once to update all usages
•Type Safety — Prevents invalid operations between incompatible types
•Query Optimization — Optimizer can use domain knowledge for better plans

SQL and Domains

Schema Comparison and Equivalence

Attribute Set Operations

Given relation schemas R and S, we can perform set operations on their attribute sets:

Schema Equivalence

Two schemas are equivalent if they can represent exactly the same information. This is stronger than having the same attributes:

Structural equivalence: Same attributes with same domains

R ≡ₛ S ⟺ attr(R) = attr(S) ∧ ∀A ∈ attr(R): dom_R(A) = dom_S(A)

Semantic equivalence: Can be transformed into each other without information loss

R ≡ₛₑₘ S ⟺ there exists lossless decomposition/composition between R and S

schema_comparison.txt

Schema Comparison

// ================================================
// SCHEMA COMPARISON NOTATION
// ================================================
 
// Define two schemas
R = EMPLOYEE(emp_id, name, dept_id, salary, hire_date)
S = CONTRACTOR(contractor_id, name, agency_id, hourly_rate, start_date)
 
// Attribute sets
attr(R) = {emp_id, name, dept_id, salary, hire_date}
attr(S) = {contractor_id, name, agency_id, hourly_rate, start_date}
 
// Set operations on schemas
attr(R) ∪ attr(S) = {emp_id, name, dept_id, salary, hire_date, 
                     contractor_id, agency_id, hourly_rate, start_date}
 
attr(R) ∩ attr(S) = {name}  // Common attributes
 
attr(R) - attr(S) = {emp_id, dept_id, salary, hire_date}
 
attr(S) - attr(R) = {contractor_id, agency_id, hourly_rate, start_date}
 
// Degree comparison
deg(R) = 5
deg(S) = 5
deg(R) = deg(S)  // Same degree, but not equivalent schemas
 
// Union compatibility check
// For R ≈ᵤ S (union compatible):
// 1. deg(R) = deg(S)  ✓ (both have 5 attributes)
// 2. Corresponding domains must match
//    R.attr₁ ↔ S.attr₁: INTEGER ↔ INTEGER  ✓
//    R.attr₂ ↔ S.attr₂: VARCHAR ↔ VARCHAR  ✓
//    R.attr₃ ↔ S.attr₃: INTEGER ↔ INTEGER  ✓
//    R.attr₄ ↔ S.attr₄: DECIMAL ↔ DECIMAL  ✓
//    R.attr₅ ↔ S.attr₅: DATE ↔ DATE        ✓
// Result: R and S are union-compatible (syntactically)
 
// ================================================
// SCHEMA EVOLUTION NOTATION
// ================================================
 
// Original schema version
V1: CUSTOMER(customer_id, name, address, phone)
 
// Schema modifications - notation for changes
V2: CUSTOMER(customer_id, first_name, last_name, email, address, phone)
    Changes: 
    - RENAME: name → (first_name, last_name)  // Decomposition
    - ADD: email
    
// Delta notation
Δ(V1 → V2) = {
    DROP: {name},
    ADD: {first_name, last_name, email},
    MODIFY: ∅,
    RENAME: ∅
}
 
// Schema diff (what changed)
attr(V2) - attr(V1) = {first_name, last_name, email}
attr(V1) - attr(V2) = {name}
deg(V2) - deg(V1) = 6 - 4 = +2

Schema Relationship Types
Relationship	Notation	Meaning
Identity	R = S	Exactly the same schema (same name, attributes, domains)
Equivalent	R ≡ S	Same attributes and domains (possibly different names)
Subset	R ⊂ S	Every attribute in R is also in S
Union-compatible	R ≈ᵤ S	Same degree with domain-compatible attributes
Join-compatible on X	R ⋈ₓ S	Common attributes X have compatible domains
Lossless decomposition	R = R₁ ⊕ R₂	R can be decomposed into R₁ and R₂ without loss

Practical Schema Comparison

Connecting Notation to SQL Implementation

Notation to SQL Mapping

The abstract schema notation translates directly to SQL DDL (Data Definition Language):

Notation Element	SQL Equivalent
Relation name R	CREATE TABLE R
Attribute A: D	column_name datatype
Primary Key PK	PRIMARY KEY constraint
Foreign Key FK → S	FOREIGN KEY REFERENCES S
Domain constraint	CHECK constraint
NOT NULL	NOT NULL constraint
Default value	DEFAULT clause

notation_to_sql.sql
SQL
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
-- ================================================
-- SCHEMA NOTATION TO SQL TRANSLATION
-- ================================================
 
-- Textual Notation:
-- DEPARTMENT(*dept_id: INT, name: VARCHAR(100), budget: DECIMAL(12,2), 
--            location: VARCHAR(200)??, manager_id: INT? → EMPLOYEE)
 
-- SQL Implementation:
CREATE TABLE department (
    dept_id     INTEGER         NOT NULL,
    name        VARCHAR(100)    NOT NULL,
    budget      DECIMAL(12,2)   NOT NULL,
    location    VARCHAR(200),   -- Nullable (? in notation)
    manager_id  INTEGER,        -- Nullable FK
    
    CONSTRAINT pk_department PRIMARY KEY (dept_id),
    CONSTRAINT fk_dept_manager FOREIGN KEY (manager_id) 
        REFERENCES employee(emp_id) ON DELETE SET NULL
);
 
-- ================================================
-- Textual Notation:
-- EMPLOYEE(*emp_id: INT, first_name: VARCHAR(50), last_name: VARCHAR(50),
--          email: Email, hire_date: DATE, salary: DECIMAL(10,2) CHECK >=0,
--          dept_id: INT → DEPARTMENT)
 
-- SQL Implementation:
CREATE TABLE employee (
    emp_id      INTEGER         NOT NULL,
    first_name  VARCHAR(50)     NOT NULL,
    last_name   VARCHAR(50)     NOT NULL,
    email       VARCHAR(255)    NOT NULL,
    hire_date   DATE            NOT NULL DEFAULT CURRENT_DATE,
    salary      DECIMAL(10,2)   NOT NULL,
    dept_id     INTEGER         NOT NULL,
    
    CONSTRAINT pk_employee PRIMARY KEY (emp_id),
    CONSTRAINT uq_employee_email UNIQUE (email),
    CONSTRAINT chk_salary CHECK (salary >= 0),
    CONSTRAINT chk_email CHECK (email ~ '^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$'),
    CONSTRAINT fk_employee_dept FOREIGN KEY (dept_id) 
        REFERENCES department(dept_id) ON DELETE RESTRICT ON UPDATE CASCADE
);
 
-- ================================================
-- Textual Notation (Composite Key):
-- ENROLLMENT(*student_id → STUDENT, *course_id → COURSE, *semester: VARCHAR(20), 
--            grade: CHAR(2)?, enrollment_date: DATE)
 
-- SQL Implementation:
CREATE TABLE enrollment (
    student_id      INTEGER         NOT NULL,
    course_id       INTEGER         NOT NULL,
    semester        VARCHAR(20)     NOT NULL,
    grade           CHAR(2),        -- Nullable (grade assigned later)
    enrollment_date DATE            NOT NULL DEFAULT CURRENT_DATE,
    
    -- Composite primary key
    CONSTRAINT pk_enrollment PRIMARY KEY (student_id, course_id, semester),
    
    -- Foreign keys
    CONSTRAINT fk_enrollment_student FOREIGN KEY (student_id)
        REFERENCES student(student_id) ON DELETE CASCADE,
    CONSTRAINT fk_enrollment_course FOREIGN KEY (course_id)
        REFERENCES course(course_id) ON DELETE RESTRICT,
        
    -- Domain constraint on grade
    CONSTRAINT chk_grade CHECK (grade IN ('A+', 'A', 'A-', 'B+', 'B', 'B-',
                                          'C+', 'C', 'C-', 'D+', 'D', 'F', 'W'))
);
 
-- ================================================
-- MATHEMATICAL NOTATION TO SQL
-- ================================================
 
-- Mathematical: R(A₁: D₁, A₂: D₂) with constraint A₁ ∈ dom(A₁), A₂ ∈ dom(A₂)
-- SQL: CREATE TABLE with column definitions and CHECK constraints
 
-- Mathematical: t[A] denotes value of attribute A in tuple t
-- SQL: SELECT A FROM R (retrieves values of column A from rows)
 
-- Mathematical: |r| = cardinality
-- SQL: SELECT COUNT(*) FROM R
 
-- Mathematical: deg(R) = n
-- SQL: SELECT COUNT(*) FROM information_schema.columns WHERE table_name = 'R'

Notation vs Implementation

Summary: Mastering Schema Notation

Key Takeaways

•Schema vs Instance is fundamental — Schema (intension) describes structure; instance (extension) contains actual data. Schema notation describes blueprints, not contents.
•Mathematical notation provides rigor — R(A₁, A₂, ..., Aₙ) with dom(Aᵢ) specifications enables formal reasoning and connects to database theory literature.
•Textual notation provides practicality — Conventions for marking primary keys, foreign keys, and nullable attributes enable clear documentation.
•Degree and cardinality are distinct — Degree (attribute count) is fixed by schema; cardinality (tuple count) varies with data operations.
•Domains define valid values — Domain specification goes beyond SQL types to capture semantic constraints and enable consistency.
•Schema comparison has formal basis — Set operations on attribute sets, equivalence relations, and union compatibility have precise definitions.
•Notation maps to SQL implementation — Abstract notation translates directly to DDL, bridging theory and practice.

What's next:

Page Complete

1 / 5