Database Management SystemsData Models

The Relational Model

LevelBeginner

Duration75 mins

TopicData Models

1 / 5

Table-Based Structure

The Power of Tables

In 1970, a mathematician at IBM named E.F. Codd published a paper that would transform the entire computing industry. Titled "A Relational Model of Data for Large Shared Data Banks," this groundbreaking work proposed something radical: instead of navigating complex pointer structures or hierarchical trees, data could be organized into simple, intuitive tables.

This deceptively simple idea—that data could be represented as rows and columns, like a spreadsheet—became the foundation of virtually every major database system in existence. From Oracle to PostgreSQL, MySQL to SQL Server, the relational model's table-based structure has dominated database technology for over five decades.

But what makes tables so powerful? Why did this representation triumph over hierarchical and network models? And what are the precise semantics behind what appears to be a simple grid of data?

What You Will Learn

By the end of this page, you will understand the formal structure of relational tables, the distinction between relations, tuples, and attributes, the critical properties that distinguish relations from ordinary tables, and why this structure provides both mathematical rigor and practical flexibility.

From Tables to Relations

While we casually refer to database tables, the formal term in the relational model is relation. This isn't mere academic pedantry—the choice of terminology reflects deep mathematical foundations that give the relational model its power.

What is a Relation?

In mathematics, a relation is a subset of the Cartesian product of one or more sets. In database terms, this translates to:

A relation is a set of tuples
Each tuple represents a single record (row)
Each tuple contains values for a fixed set of attributes (columns)
The domain of an attribute defines all possible values it can hold

This formal definition has profound implications for how data behaves in a relational system.

Relational Terminology vs Informal Terms
Formal Term	Informal Term	Definition
Relation	Table	A named, two-dimensional structure containing related data
Tuple	Row / Record	A single entry in the relation, representing one entity instance
Attribute	Column / Field	A named property with a defined domain of values
Domain	Data Type	The set of all permissible values for an attribute
Cardinality	Row Count	The number of tuples in a relation
Degree / Arity	Column Count	The number of attributes in a relation

Why Terminology Matters

Understanding formal relational terminology isn't just academic—it's practical. When you read database research, evaluate RDBMS products, or debug complex query behavior, precise terminology helps you reason about what's happening. A 'table' might allow duplicates; a 'relation' mathematically cannot. These distinctions have real consequences.

Anatomy of a Relation

Let's dissect the structure of a relation in detail. Consider a relation storing employee information:

Employee(emp_id: INTEGER, name: VARCHAR(100), department: VARCHAR(50), salary: DECIMAL(10,2), hire_date: DATE)

Here, Employee is the relation name (or relation schema). The items in parentheses define the attributes and their domains.

The Relation Schema

A relation schema, denoted R(A₁, A₂, ..., Aₙ), consists of:

The relation name R
A set of attributes {A₁, A₂, ..., Aₙ}
A mapping from each attribute to its domain

Example Relation: EmployeesA concrete instance of the Employee relation with sample data

Input

Relation Schema:
Employee(emp_id: INTEGER, name: VARCHAR(100), department: VARCHAR(50), salary: DECIMAL(10,2))

Relation Instance (set of tuples):
┌─────────┬──────────────┬─────────────┬───────────┐
│ emp_id  │ name         │ department  │ salary    │
├─────────┼──────────────┼─────────────┼───────────┤
│ 1001    │ Alice Chen   │ Engineering │ 95000.00  │
│ 1002    │ Bob Kumar    │ Marketing   │ 78000.00  │
│ 1003    │ Carol Smith  │ Engineering │ 102000.00 │
│ 1004    │ David Liu    │ Finance     │ 88000.00  │
└─────────┴──────────────┴─────────────┴───────────┘

Output

Cardinality: 4 (four tuples)
Degree: 4 (four attributes)
Domain of emp_id: Set of all integers
Domain of salary: Decimal numbers with 10 digits, 2 after decimal

Key Components Explained

1. Attributes (Columns)

Each attribute represents a single, atomic piece of information about the entities the relation describes. In our example:

emp_id uniquely identifies each employee
name stores the employee's full name
department categorizes the employee's organizational unit
salary records their compensation

2. Domains

Every attribute is associated with a domain—the complete set of values it can take. Domains can be:

Simple: Built-in types like INTEGER, VARCHAR, DATE
User-defined: Custom types like email addresses or SSNs with specific formats
Semantic: Beyond the data type, domains can encode meaning (e.g., salary must be positive)

3. Tuples (Rows)

Each tuple is an ordered list of values, one for each attribute. A tuple in our example:

(1001, 'Alice Chen', 'Engineering', 95000.00)

This tuple represents a single employee with all their associated attribute values.

4. Relation Instance

The complete set of tuples at any given moment is called the relation instance or relation state. This is the actual data—what changes as employees are hired, promoted, or leave. The schema, by contrast, defines the structure and typically changes rarely.

Fundamental Properties of Relations

Relations are not arbitrary tables. They obey specific mathematical properties that distinguish them from spreadsheets or CSV files. These properties aren't constraints imposed for convenience—they're fundamental to enabling the powerful operations the relational model supports.

Essential Properties of Relations

•No Duplicate Tuples — A relation is a set of tuples. By definition, sets contain no duplicates. Each tuple must be unique. This is why every relation must have a key—a way to distinguish each tuple from all others.
•Tuple Order is Irrelevant — The tuples in a relation have no inherent order. {row1, row2, row3} is the same relation as {row3, row1, row2}. Any ordering you see in query results is imposed by ORDER BY, not intrinsic to the data.
•Attribute Order is Irrelevant — Similarly, attribute positions don't matter. We identify attributes by name, not position. Employee(id, name) represents the same schema as Employee(name, id).
•Atomic Attribute Values — Each cell in a relation contains a single, indivisible value. No lists, no nested records, no repeating groups. This property—called First Normal Form (1NF)—is fundamental to relational theory.
•Uniform Row Structure — Every tuple in a relation has exactly the same set of attributes. There are no 'optional columns' or rows with extra fields. The schema is fixed.

SQL Tables vs Mathematical Relations

Real SQL databases often violate pure relational theory. SQL tables can have duplicate rows (unless constrained), results have order, and later standards added ARRAY and JSON types that violate atomicity. This doesn't invalidate relational theory—it means SQL is a practical approximation of the mathematical ideal, with trade-offs for performance and usability.

Why These Properties Matter

Each property enables something important:

No Duplicates → Integrity Duplicate elimination ensures data integrity. If the same employee appears twice with different salaries, which is correct? Duplicates create ambiguity. By preventing them mathematically, the relational model forces clean, unambiguous data.

Unordered Tuples → Query Optimization Because tuple order is undefined, the database engine is free to store and retrieve data in whatever order is most efficient. It can use indexes, parallelization, and caching without worrying about preserving artificial ordering.

Unordered Attributes → Schema Evolution Referring to attributes by name rather than position means adding a new column doesn't break existing queries. You can evolve your schema without rewriting your application.

Atomicity → Simple, Predictable Operations When every value is atomic, operations like comparison, sorting, and indexing are straightforward. There's no need for special logic to handle nested structures or varying lengths.

Attributes and Domains in Depth

The concepts of attributes and domains deserve deeper exploration, as they form the foundation of relational integrity and query semantics.

Attribute Semantics

An attribute isn't just a column name—it's a semantic statement about the data. Consider these two relations:

Employee(id, name, manager_id)
Department(id, name, budget)

Both have id and name attributes, but they mean different things. Employee.id is an employee identifier; Department.id is a department identifier. Even if both are integers, they represent different domains semantically.

Domain Semantics Beyond Data TypesIllustrating how domains carry meaning beyond their underlying data types

Input

Consider three INTEGER attributes:
  - employee_id: Identifies employees (range: 1000-9999)
  - department_code: Identifies departments (range: 100-999)  
  - age: Employee's age in years (range: 18-100)

All three have the same data type (INTEGER) but different domains.

Output

Semantic implications:
  - Comparing employee_id = department_code is valid syntax but likely a bug
  - Comparing employee_id = age is almost certainly a bug
  - A true RDBMS could warn or prevent such comparisons

Domain enforcement catches logical errors that type checking misses.

Domain Types

1. Simple Domains Built-in data types provided by the DBMS:

INTEGER, BIGINT, SMALLINT
DECIMAL(p, s), FLOAT, REAL
CHAR(n), VARCHAR(n), TEXT
DATE, TIME, TIMESTAMP
BOOLEAN

2. Constrained Domains Simple domains with additional restrictions:

CREATE DOMAIN positive_salary AS DECIMAL(10,2)
    CHECK (VALUE > 0 AND VALUE < 10000000);

CREATE DOMAIN email_address AS VARCHAR(255)
    CHECK (VALUE ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$');

3. Enumerated Domains A fixed set of allowed values:

CREATE TYPE employment_status AS ENUM ('ACTIVE', 'INACTIVE', 'TERMINATED', 'ON_LEAVE');

4. Composite Domains (Extended Relational) Some modern systems support structured types:

CREATE TYPE address AS (
    street VARCHAR(100),
    city VARCHAR(50),
    postal_code VARCHAR(20),
    country CHAR(2)
);

Note: Pure relational theory requires atomic values; composite types extend the model.

The Power of Domain Constraints

Well-defined domains act as the first line of defense against bad data. When a domain constraint rejects 'salary = -50000' or 'email = not_an_email', you catch the error at the database boundary—before it corrupts your data or propagates through your application.

Tuples and Tuple Semantics

A tuple is more than a row of values—it's a proposition about the real world. Understanding this perspective transforms how you think about data modeling.

Tuples as Facts

Consider this tuple from our Employee relation:

(1001, 'Alice Chen', 'Engineering', 95000.00)

This tuple asserts a fact: "There exists an employee with ID 1001, named Alice Chen, who works in Engineering and earns $95,000."

This interpretation has profound implications:

Adding a tuple = asserting a new fact about the world
Deleting a tuple = retracting a previously asserted fact
Updating a tuple = correcting or updating an existing fact
A missing tuple = absence of that fact (not that we don't know—that the fact doesn't hold)

Tuple Properties

•Each tuple has a value for every attribute
•Values must come from the attribute's domain
•NULL represents missing or unknown information
•Tuples are identified by their attribute values, not position
•Two tuples with identical values are the same tuple (no duplicates)

Common Misconceptions

•Tuples are NOT ordered within a relation
•Tuple identity is NOT based on row number or insertion order
•Tuples are NOT arrays—attribute access is by name
•NULL is NOT the same as empty string or zero
•A tuple is NOT mutable—updates create new tuple states

The Closed World Assumption

Relational databases operate under the Closed World Assumption (CWA): if a fact is not represented by a tuple in the database, that fact is assumed to be false.

This contrasts with the Open World Assumption used in some knowledge representation systems, where a missing fact simply means we don't know.

Example under CWA:

Our Employee relation has no tuple for employee ID 9999
Conclusion: Employee 9999 does not exist (not "we don't know if they exist")

This assumption is what allows us to answer questions like "How many employees are in Engineering?" with a definitive count. If we couldn't assume completeness, every count would be "at least n."

Practical Implications:

Data entry is critical—missing data means missing facts
Deletions must be intentional—you're declaring something no longer true
NULLs must be used carefully—they break the simple true/false proposition model

The NULL Controversy

NULL values introduce three-valued logic (true, false, unknown) into what would otherwise be a clean two-valued system. Codd himself later proposed more nuanced NULL types (missing but applicable, inapplicable, etc.). Most modern systems use a single NULL, which creates known semantic ambiguities. Understanding this is crucial for writing correct queries.

Relation Schema vs Relation Instance

A critical distinction in the relational model is between schema (structure) and instance (data). This separation is fundamental to data independence and schema evolution.

Relation Schema (Intension)

The schema defines the structure:

Relation name
Attribute names and their domains
Integrity constraints
Keys and relationships

Schemas are defined at design time and typically change infrequently. They describe what could be stored.

Relation Instance (Extension)

The instance is the actual data:

The specific tuples currently in the relation
Values that satisfy the schema's constraints
The state of the data at a particular moment

Instances change constantly as data is inserted, updated, and deleted. They describe what is stored.

Schema vs Instance IllustratedShowing the distinction between structure and data

Input

SCHEMA (defined once at design time):
─────────────────────────────────────
Employee(
    emp_id: INTEGER PRIMARY KEY,
    name: VARCHAR(100) NOT NULL,
    department: VARCHAR(50),
    salary: DECIMAL(10,2) CHECK(salary > 0)
)

CONSTRAINTS:
  - emp_id uniquely identifies each employee
  - name cannot be null
  - salary must be positive

Output

INSTANCE (changes with each transaction):
─────────────────────────────────────────
At time T1:
  {(1001, 'Alice', 'Eng', 95000),
   (1002, 'Bob', 'Mkt', 78000)}

At time T2 (after Bob's raise):
  {(1001, 'Alice', 'Eng', 95000),
   (1002, 'Bob', 'Mkt', 85000)}

At time T3 (Carol hired):
  {(1001, 'Alice', 'Eng', 95000),
   (1002, 'Bob', 'Mkt', 85000),
   (1003, 'Carol', 'Eng', 102000)}

Why This Distinction Matters

1. Data Independence Separating schema from instance enables logical data independence. Applications can be written against the schema; the actual data can change without affecting application logic.

2. Constraint Enforcement Constraints defined in the schema are enforced on every instance. You define rules once; they apply to all data, past and future.

3. Query Generality Queries are written against schemas, not specific data. SELECT * FROM Employee WHERE salary > 80000 works regardless of how many employees exist or what their specific salaries are.

4. Schema Evolution Schemas can be modified (add column, change type) while preserving existing instance data, often with automatic migration.

Schema vs Instance Characteristics
Aspect	Schema (Intension)	Instance (Extension)
What it describes	Structure and rules	Actual data values
Change frequency	Rarely (design decisions)	Constantly (transactions)
Defined by	Database designer / DBA	Users, applications, data loads
Stored in	Data dictionary / catalog	Data files / tablespaces
Affects	All future data	Current state only
Query target	DDL statements	DML statements

Visual Representation of Relations

While relations are mathematical objects, we commonly visualize them as tables. Understanding the visual conventions helps when reading documentation, designing schemas, and communicating with colleagues.

Standard Table Visualization

relation-visual.txt
┌────────────────────────────────────────────────────────────────┐
│                     Relation: Employee                          │
├──────────┬───────────────┬──────────────┬──────────────────────┤
│ emp_id   │ name          │ department   │ salary               │
│ (INTEGER)│ (VARCHAR)     │ (VARCHAR)    │ (DECIMAL)            │
├══════════╪═══════════════╪══════════════╪══════════════════════┤
│ 1001     │ Alice Chen    │ Engineering  │ 95000.00             │  ← Tuple 1
│ 1002     │ Bob Kumar     │ Marketing    │ 78000.00             │  ← Tuple 2
│ 1003     │ Carol Smith   │ Engineering  │ 102000.00            │  ← Tuple 3
│ 1004     │ David Liu     │ Finance      │ 88000.00             │  ← Tuple 4
└──────────┴───────────────┴──────────────┴──────────────────────┘
     ↑            ↑              ↑              ↑
  Attribute 1  Attribute 2   Attribute 3    Attribute 4
 
Degree (arity) = 4 attributes
Cardinality = 4 tuples

Notation Conventions

1. Schema Notation

Relation_Name(attribute₁: domain₁, attribute₂: domain₂, ...)

Example: Employee(emp_id: INT, name: VARCHAR(100), salary: DECIMAL(10,2))

2. Instance Notation

Relation_Name = {tuple₁, tuple₂, ...}

Example: Employee = {(1001, 'Alice', 95000), (1002, 'Bob', 78000)}

3. Key Notation Primary key attributes are often underlined or marked:

Employee(emp_id, name, salary)  — emp_id underlined as primary key

4. Foreign Key Notation Relationships between relations shown with arrows or notation:

Employee(emp_id, name, dept_id → Department)

Reading ER Diagrams

In ER (Entity-Relationship) diagrams, relations are typically shown as rectangles. Attributes appear as ovals connected to their relation. Primary keys are underlined. This visual language is standard in database design documentation.

Practical Implications for Database Design

Understanding the table-based structure has direct implications for how you design and use databases. Here are the key practical takeaways:

Design Principles from Table Structure

•Always define a primary key — Since relations cannot have duplicates, every table must have a way to uniquely identify each row. Whether natural keys or surrogate keys, this isn't optional.
•Keep attributes atomic — Resist the temptation to store comma-separated lists or JSON blobs (unless you have good reasons). Atomic values enable indexing, querying, and constraint enforcement.
•Define meaningful domains — Don't just use generic VARCHAR for everything. Use appropriate types (DATE for dates, DECIMAL for money) and add CHECK constraints where applicable.
•Treat NULL with care — Every nullable column introduces three-valued logic complexity. Only allow NULL where genuinely unknown/inapplicable data is possible.
•Design for schema stability — Choose attribute names thoughtfully; they become part of your public interface. Changing column names later can break applications.

Good Table Design

Employee(emp_id INT PK, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, email VARCHAR(255) UNIQUE, hire_date DATE NOT NULL, salary DECIMAL(10,2) CHECK(salary > 0))

✓ Clear primary key ✓ Atomic names (first/last separate) ✓ Appropriate types ✓ Meaningful constraints

Poor Table Design

Employee(id, name, info, data, other)

✗ No key defined ✗ Vague column names ✗ 'info' likely non-atomic (JSON?) ✗ No types or constraints ✗ No clarity on what 'other' means

Summary: The Power of Table-Based Structure

We've covered the foundational structure that makes the relational model possible. Let's consolidate the key concepts:

Key Takeaways

•Relations are mathematical sets — Not arbitrary tables. The formal definition enables powerful query operations and guarantees consistency.
•The core components are relations, tuples, and attributes — Understanding these precisely is essential for effective database work.
•Key properties distinguish relations — No duplicates, unordered tuples, atomic values. These enable optimization and integrity.
•Domains carry semantic meaning — Beyond data types, domains define what values mean and what operations make sense.
•Tuples are propositions about reality — They assert facts; the closed world assumption makes queries meaningful.
•Schema vs instance separation enables flexibility — Structure is stable; data is dynamic. This separation is fundamental.

What's Next:

Now that we understand the structural basis of the relational model, we'll explore its mathematical foundations. The next page examines the set theory and predicate logic that give the relational model its formal power—the theory that enables query optimization, normalization, and constraint verification.

Page Complete

You now understand the table-based structure that defines the relational model. This seemingly simple representation—tables with rows and columns—rests on rigorous mathematical foundations that enable everything from simple queries to complex data integrity enforcement. Next, we'll explore those mathematical underpinnings.

1 / 5

Loading learning content...

Database Management SystemsData Models

The Relational Model

LevelBeginner

Duration75 mins

TopicData Models

1 / 5

Table-Based Structure

The Power of Tables

But what makes tables so powerful? Why did this representation triumph over hierarchical and network models? And what are the precise semantics behind what appears to be a simple grid of data?

What You Will Learn

From Tables to Relations

What is a Relation?

In mathematics, a relation is a subset of the Cartesian product of one or more sets. In database terms, this translates to:

A relation is a set of tuples
Each tuple represents a single record (row)
Each tuple contains values for a fixed set of attributes (columns)
The domain of an attribute defines all possible values it can hold

This formal definition has profound implications for how data behaves in a relational system.

Relational Terminology vs Informal Terms
Formal Term	Informal Term	Definition
Relation	Table	A named, two-dimensional structure containing related data
Tuple	Row / Record	A single entry in the relation, representing one entity instance
Attribute	Column / Field	A named property with a defined domain of values
Domain	Data Type	The set of all permissible values for an attribute
Cardinality	Row Count	The number of tuples in a relation
Degree / Arity	Column Count	The number of attributes in a relation

Why Terminology Matters

Anatomy of a Relation

Let's dissect the structure of a relation in detail. Consider a relation storing employee information:

Employee(emp_id: INTEGER, name: VARCHAR(100), department: VARCHAR(50), salary: DECIMAL(10,2), hire_date: DATE)

Here, Employee is the relation name (or relation schema). The items in parentheses define the attributes and their domains.

The Relation Schema

A relation schema, denoted R(A₁, A₂, ..., Aₙ), consists of:

The relation name R
A set of attributes {A₁, A₂, ..., Aₙ}
A mapping from each attribute to its domain

Example Relation: EmployeesA concrete instance of the Employee relation with sample data

Input

Relation Schema:
Employee(emp_id: INTEGER, name: VARCHAR(100), department: VARCHAR(50), salary: DECIMAL(10,2))

Relation Instance (set of tuples):
┌─────────┬──────────────┬─────────────┬───────────┐
│ emp_id  │ name         │ department  │ salary    │
├─────────┼──────────────┼─────────────┼───────────┤
│ 1001    │ Alice Chen   │ Engineering │ 95000.00  │
│ 1002    │ Bob Kumar    │ Marketing   │ 78000.00  │
│ 1003    │ Carol Smith  │ Engineering │ 102000.00 │
│ 1004    │ David Liu    │ Finance     │ 88000.00  │
└─────────┴──────────────┴─────────────┴───────────┘

Output

Cardinality: 4 (four tuples)
Degree: 4 (four attributes)
Domain of emp_id: Set of all integers
Domain of salary: Decimal numbers with 10 digits, 2 after decimal

Key Components Explained

1. Attributes (Columns)

Each attribute represents a single, atomic piece of information about the entities the relation describes. In our example:

emp_id uniquely identifies each employee
name stores the employee's full name
department categorizes the employee's organizational unit
salary records their compensation

2. Domains

Every attribute is associated with a domain—the complete set of values it can take. Domains can be:

Simple: Built-in types like INTEGER, VARCHAR, DATE
User-defined: Custom types like email addresses or SSNs with specific formats
Semantic: Beyond the data type, domains can encode meaning (e.g., salary must be positive)

3. Tuples (Rows)

Each tuple is an ordered list of values, one for each attribute. A tuple in our example:

(1001, 'Alice Chen', 'Engineering', 95000.00)

This tuple represents a single employee with all their associated attribute values.

4. Relation Instance

Fundamental Properties of Relations

Essential Properties of Relations

•No Duplicate Tuples — A relation is a set of tuples. By definition, sets contain no duplicates. Each tuple must be unique. This is why every relation must have a key—a way to distinguish each tuple from all others.
•Tuple Order is Irrelevant — The tuples in a relation have no inherent order. {row1, row2, row3} is the same relation as {row3, row1, row2}. Any ordering you see in query results is imposed by ORDER BY, not intrinsic to the data.
•Attribute Order is Irrelevant — Similarly, attribute positions don't matter. We identify attributes by name, not position. Employee(id, name) represents the same schema as Employee(name, id).
•Atomic Attribute Values — Each cell in a relation contains a single, indivisible value. No lists, no nested records, no repeating groups. This property—called First Normal Form (1NF)—is fundamental to relational theory.
•Uniform Row Structure — Every tuple in a relation has exactly the same set of attributes. There are no 'optional columns' or rows with extra fields. The schema is fixed.

SQL Tables vs Mathematical Relations

Why These Properties Matter

Each property enables something important:

Attributes and Domains in Depth

The concepts of attributes and domains deserve deeper exploration, as they form the foundation of relational integrity and query semantics.

Attribute Semantics

An attribute isn't just a column name—it's a semantic statement about the data. Consider these two relations:

Employee(id, name, manager_id)
Department(id, name, budget)

Domain Semantics Beyond Data TypesIllustrating how domains carry meaning beyond their underlying data types

Input

Consider three INTEGER attributes:
  - employee_id: Identifies employees (range: 1000-9999)
  - department_code: Identifies departments (range: 100-999)  
  - age: Employee's age in years (range: 18-100)

All three have the same data type (INTEGER) but different domains.

Output

Semantic implications:
  - Comparing employee_id = department_code is valid syntax but likely a bug
  - Comparing employee_id = age is almost certainly a bug
  - A true RDBMS could warn or prevent such comparisons

Domain enforcement catches logical errors that type checking misses.

Domain Types

1. Simple Domains Built-in data types provided by the DBMS:

INTEGER, BIGINT, SMALLINT
DECIMAL(p, s), FLOAT, REAL
CHAR(n), VARCHAR(n), TEXT
DATE, TIME, TIMESTAMP
BOOLEAN

2. Constrained Domains Simple domains with additional restrictions:

CREATE DOMAIN positive_salary AS DECIMAL(10,2)
    CHECK (VALUE > 0 AND VALUE < 10000000);

CREATE DOMAIN email_address AS VARCHAR(255)
    CHECK (VALUE ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$');

3. Enumerated Domains A fixed set of allowed values:

CREATE TYPE employment_status AS ENUM ('ACTIVE', 'INACTIVE', 'TERMINATED', 'ON_LEAVE');

4. Composite Domains (Extended Relational) Some modern systems support structured types:

CREATE TYPE address AS (
    street VARCHAR(100),
    city VARCHAR(50),
    postal_code VARCHAR(20),
    country CHAR(2)
);

Note: Pure relational theory requires atomic values; composite types extend the model.

The Power of Domain Constraints

Tuples and Tuple Semantics

A tuple is more than a row of values—it's a proposition about the real world. Understanding this perspective transforms how you think about data modeling.

Tuples as Facts

Consider this tuple from our Employee relation:

(1001, 'Alice Chen', 'Engineering', 95000.00)

This tuple asserts a fact: "There exists an employee with ID 1001, named Alice Chen, who works in Engineering and earns $95,000."

This interpretation has profound implications:

Adding a tuple = asserting a new fact about the world
Deleting a tuple = retracting a previously asserted fact
Updating a tuple = correcting or updating an existing fact
A missing tuple = absence of that fact (not that we don't know—that the fact doesn't hold)

Tuple Properties

•Each tuple has a value for every attribute
•Values must come from the attribute's domain
•NULL represents missing or unknown information
•Tuples are identified by their attribute values, not position
•Two tuples with identical values are the same tuple (no duplicates)

Common Misconceptions

•Tuples are NOT ordered within a relation
•Tuple identity is NOT based on row number or insertion order
•Tuples are NOT arrays—attribute access is by name
•NULL is NOT the same as empty string or zero
•A tuple is NOT mutable—updates create new tuple states

The Closed World Assumption

Relational databases operate under the Closed World Assumption (CWA): if a fact is not represented by a tuple in the database, that fact is assumed to be false.

This contrasts with the Open World Assumption used in some knowledge representation systems, where a missing fact simply means we don't know.

Example under CWA:

Our Employee relation has no tuple for employee ID 9999
Conclusion: Employee 9999 does not exist (not "we don't know if they exist")

This assumption is what allows us to answer questions like "How many employees are in Engineering?" with a definitive count. If we couldn't assume completeness, every count would be "at least n."

Practical Implications:

Data entry is critical—missing data means missing facts
Deletions must be intentional—you're declaring something no longer true
NULLs must be used carefully—they break the simple true/false proposition model

The NULL Controversy

Relation Schema vs Relation Instance

A critical distinction in the relational model is between schema (structure) and instance (data). This separation is fundamental to data independence and schema evolution.

Relation Schema (Intension)

The schema defines the structure:

Relation name
Attribute names and their domains
Integrity constraints
Keys and relationships

Schemas are defined at design time and typically change infrequently. They describe what could be stored.

Relation Instance (Extension)

The instance is the actual data:

The specific tuples currently in the relation
Values that satisfy the schema's constraints
The state of the data at a particular moment

Instances change constantly as data is inserted, updated, and deleted. They describe what is stored.

Schema vs Instance IllustratedShowing the distinction between structure and data

Input

SCHEMA (defined once at design time):
─────────────────────────────────────
Employee(
    emp_id: INTEGER PRIMARY KEY,
    name: VARCHAR(100) NOT NULL,
    department: VARCHAR(50),
    salary: DECIMAL(10,2) CHECK(salary > 0)
)

CONSTRAINTS:
  - emp_id uniquely identifies each employee
  - name cannot be null
  - salary must be positive

Output

INSTANCE (changes with each transaction):
─────────────────────────────────────────
At time T1:
  {(1001, 'Alice', 'Eng', 95000),
   (1002, 'Bob', 'Mkt', 78000)}

At time T2 (after Bob's raise):
  {(1001, 'Alice', 'Eng', 95000),
   (1002, 'Bob', 'Mkt', 85000)}

At time T3 (Carol hired):
  {(1001, 'Alice', 'Eng', 95000),
   (1002, 'Bob', 'Mkt', 85000),
   (1003, 'Carol', 'Eng', 102000)}

Why This Distinction Matters

2. Constraint Enforcement Constraints defined in the schema are enforced on every instance. You define rules once; they apply to all data, past and future.

4. Schema Evolution Schemas can be modified (add column, change type) while preserving existing instance data, often with automatic migration.

Schema vs Instance Characteristics
Aspect	Schema (Intension)	Instance (Extension)
What it describes	Structure and rules	Actual data values
Change frequency	Rarely (design decisions)	Constantly (transactions)
Defined by	Database designer / DBA	Users, applications, data loads
Stored in	Data dictionary / catalog	Data files / tablespaces
Affects	All future data	Current state only
Query target	DDL statements	DML statements

Visual Representation of Relations

Standard Table Visualization

relation-visual.txt
┌────────────────────────────────────────────────────────────────┐
│                     Relation: Employee                          │
├──────────┬───────────────┬──────────────┬──────────────────────┤
│ emp_id   │ name          │ department   │ salary               │
│ (INTEGER)│ (VARCHAR)     │ (VARCHAR)    │ (DECIMAL)            │
├══════════╪═══════════════╪══════════════╪══════════════════════┤
│ 1001     │ Alice Chen    │ Engineering  │ 95000.00             │  ← Tuple 1
│ 1002     │ Bob Kumar     │ Marketing    │ 78000.00             │  ← Tuple 2
│ 1003     │ Carol Smith   │ Engineering  │ 102000.00            │  ← Tuple 3
│ 1004     │ David Liu     │ Finance      │ 88000.00             │  ← Tuple 4
└──────────┴───────────────┴──────────────┴──────────────────────┘
     ↑            ↑              ↑              ↑
  Attribute 1  Attribute 2   Attribute 3    Attribute 4
 
Degree (arity) = 4 attributes
Cardinality = 4 tuples

Notation Conventions

1. Schema Notation

Relation_Name(attribute₁: domain₁, attribute₂: domain₂, ...)

Example: Employee(emp_id: INT, name: VARCHAR(100), salary: DECIMAL(10,2))

2. Instance Notation

Relation_Name = {tuple₁, tuple₂, ...}

Example: Employee = {(1001, 'Alice', 95000), (1002, 'Bob', 78000)}

3. Key Notation Primary key attributes are often underlined or marked:

Employee(emp_id, name, salary)  — emp_id underlined as primary key

4. Foreign Key Notation Relationships between relations shown with arrows or notation:

Employee(emp_id, name, dept_id → Department)

Reading ER Diagrams

Practical Implications for Database Design

Understanding the table-based structure has direct implications for how you design and use databases. Here are the key practical takeaways:

Design Principles from Table Structure

•Always define a primary key — Since relations cannot have duplicates, every table must have a way to uniquely identify each row. Whether natural keys or surrogate keys, this isn't optional.
•Keep attributes atomic — Resist the temptation to store comma-separated lists or JSON blobs (unless you have good reasons). Atomic values enable indexing, querying, and constraint enforcement.
•Define meaningful domains — Don't just use generic VARCHAR for everything. Use appropriate types (DATE for dates, DECIMAL for money) and add CHECK constraints where applicable.
•Treat NULL with care — Every nullable column introduces three-valued logic complexity. Only allow NULL where genuinely unknown/inapplicable data is possible.
•Design for schema stability — Choose attribute names thoughtfully; they become part of your public interface. Changing column names later can break applications.

Good Table Design

Employee(emp_id INT PK, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, email VARCHAR(255) UNIQUE, hire_date DATE NOT NULL, salary DECIMAL(10,2) CHECK(salary > 0))

✓ Clear primary key ✓ Atomic names (first/last separate) ✓ Appropriate types ✓ Meaningful constraints

Poor Table Design

Employee(id, name, info, data, other)

✗ No key defined ✗ Vague column names ✗ 'info' likely non-atomic (JSON?) ✗ No types or constraints ✗ No clarity on what 'other' means

Summary: The Power of Table-Based Structure

We've covered the foundational structure that makes the relational model possible. Let's consolidate the key concepts:

Key Takeaways

•Relations are mathematical sets — Not arbitrary tables. The formal definition enables powerful query operations and guarantees consistency.
•The core components are relations, tuples, and attributes — Understanding these precisely is essential for effective database work.
•Key properties distinguish relations — No duplicates, unordered tuples, atomic values. These enable optimization and integrity.
•Domains carry semantic meaning — Beyond data types, domains define what values mean and what operations make sense.
•Tuples are propositions about reality — They assert facts; the closed world assumption makes queries meaningful.
•Schema vs instance separation enables flexibility — Structure is stable; data is dynamic. This separation is fundamental.

What's Next:

Page Complete

1 / 5