Loading learning content...
In 1970, a mathematician at IBM named E.F. Codd published a paper that would transform the entire computing industry. Titled "A Relational Model of Data for Large Shared Data Banks," this groundbreaking work proposed something radical: instead of navigating complex pointer structures or hierarchical trees, data could be organized into simple, intuitive tables.
This deceptively simple idea—that data could be represented as rows and columns, like a spreadsheet—became the foundation of virtually every major database system in existence. From Oracle to PostgreSQL, MySQL to SQL Server, the relational model's table-based structure has dominated database technology for over five decades.
But what makes tables so powerful? Why did this representation triumph over hierarchical and network models? And what are the precise semantics behind what appears to be a simple grid of data?
By the end of this page, you will understand the formal structure of relational tables, the distinction between relations, tuples, and attributes, the critical properties that distinguish relations from ordinary tables, and why this structure provides both mathematical rigor and practical flexibility.
While we casually refer to database tables, the formal term in the relational model is relation. This isn't mere academic pedantry—the choice of terminology reflects deep mathematical foundations that give the relational model its power.
What is a Relation?
In mathematics, a relation is a subset of the Cartesian product of one or more sets. In database terms, this translates to:
This formal definition has profound implications for how data behaves in a relational system.
| Formal Term | Informal Term | Definition |
|---|---|---|
| Relation | Table | A named, two-dimensional structure containing related data |
| Tuple | Row / Record | A single entry in the relation, representing one entity instance |
| Attribute | Column / Field | A named property with a defined domain of values |
| Domain | Data Type | The set of all permissible values for an attribute |
| Cardinality | Row Count | The number of tuples in a relation |
| Degree / Arity | Column Count | The number of attributes in a relation |
Understanding formal relational terminology isn't just academic—it's practical. When you read database research, evaluate RDBMS products, or debug complex query behavior, precise terminology helps you reason about what's happening. A 'table' might allow duplicates; a 'relation' mathematically cannot. These distinctions have real consequences.
Let's dissect the structure of a relation in detail. Consider a relation storing employee information:
Employee(emp_id: INTEGER, name: VARCHAR(100), department: VARCHAR(50), salary: DECIMAL(10,2), hire_date: DATE)
Here, Employee is the relation name (or relation schema). The items in parentheses define the attributes and their domains.
The Relation Schema
A relation schema, denoted R(A₁, A₂, ..., Aₙ), consists of:
Relation Schema:
Employee(emp_id: INTEGER, name: VARCHAR(100), department: VARCHAR(50), salary: DECIMAL(10,2))
Relation Instance (set of tuples):
┌─────────┬──────────────┬─────────────┬───────────┐
│ emp_id │ name │ department │ salary │
├─────────┼──────────────┼─────────────┼───────────┤
│ 1001 │ Alice Chen │ Engineering │ 95000.00 │
│ 1002 │ Bob Kumar │ Marketing │ 78000.00 │
│ 1003 │ Carol Smith │ Engineering │ 102000.00 │
│ 1004 │ David Liu │ Finance │ 88000.00 │
└─────────┴──────────────┴─────────────┴───────────┘Cardinality: 4 (four tuples)
Degree: 4 (four attributes)
Domain of emp_id: Set of all integers
Domain of salary: Decimal numbers with 10 digits, 2 after decimalKey Components Explained
1. Attributes (Columns)
Each attribute represents a single, atomic piece of information about the entities the relation describes. In our example:
emp_id uniquely identifies each employeename stores the employee's full namedepartment categorizes the employee's organizational unitsalary records their compensation2. Domains
Every attribute is associated with a domain—the complete set of values it can take. Domains can be:
3. Tuples (Rows)
Each tuple is an ordered list of values, one for each attribute. A tuple in our example:
(1001, 'Alice Chen', 'Engineering', 95000.00)
This tuple represents a single employee with all their associated attribute values.
4. Relation Instance
The complete set of tuples at any given moment is called the relation instance or relation state. This is the actual data—what changes as employees are hired, promoted, or leave. The schema, by contrast, defines the structure and typically changes rarely.
Relations are not arbitrary tables. They obey specific mathematical properties that distinguish them from spreadsheets or CSV files. These properties aren't constraints imposed for convenience—they're fundamental to enabling the powerful operations the relational model supports.
Real SQL databases often violate pure relational theory. SQL tables can have duplicate rows (unless constrained), results have order, and later standards added ARRAY and JSON types that violate atomicity. This doesn't invalidate relational theory—it means SQL is a practical approximation of the mathematical ideal, with trade-offs for performance and usability.
Why These Properties Matter
Each property enables something important:
No Duplicates → Integrity Duplicate elimination ensures data integrity. If the same employee appears twice with different salaries, which is correct? Duplicates create ambiguity. By preventing them mathematically, the relational model forces clean, unambiguous data.
Unordered Tuples → Query Optimization Because tuple order is undefined, the database engine is free to store and retrieve data in whatever order is most efficient. It can use indexes, parallelization, and caching without worrying about preserving artificial ordering.
Unordered Attributes → Schema Evolution Referring to attributes by name rather than position means adding a new column doesn't break existing queries. You can evolve your schema without rewriting your application.
Atomicity → Simple, Predictable Operations When every value is atomic, operations like comparison, sorting, and indexing are straightforward. There's no need for special logic to handle nested structures or varying lengths.
The concepts of attributes and domains deserve deeper exploration, as they form the foundation of relational integrity and query semantics.
Attribute Semantics
An attribute isn't just a column name—it's a semantic statement about the data. Consider these two relations:
Employee(id, name, manager_id)
Department(id, name, budget)
Both have id and name attributes, but they mean different things. Employee.id is an employee identifier; Department.id is a department identifier. Even if both are integers, they represent different domains semantically.
Consider three INTEGER attributes:
- employee_id: Identifies employees (range: 1000-9999)
- department_code: Identifies departments (range: 100-999)
- age: Employee's age in years (range: 18-100)
All three have the same data type (INTEGER) but different domains.Semantic implications:
- Comparing employee_id = department_code is valid syntax but likely a bug
- Comparing employee_id = age is almost certainly a bug
- A true RDBMS could warn or prevent such comparisons
Domain enforcement catches logical errors that type checking misses.Domain Types
1. Simple Domains Built-in data types provided by the DBMS:
2. Constrained Domains Simple domains with additional restrictions:
CREATE DOMAIN positive_salary AS DECIMAL(10,2)
CHECK (VALUE > 0 AND VALUE < 10000000);
CREATE DOMAIN email_address AS VARCHAR(255)
CHECK (VALUE ~ '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$');
3. Enumerated Domains A fixed set of allowed values:
CREATE TYPE employment_status AS ENUM ('ACTIVE', 'INACTIVE', 'TERMINATED', 'ON_LEAVE');
4. Composite Domains (Extended Relational) Some modern systems support structured types:
CREATE TYPE address AS (
street VARCHAR(100),
city VARCHAR(50),
postal_code VARCHAR(20),
country CHAR(2)
);
Note: Pure relational theory requires atomic values; composite types extend the model.
Well-defined domains act as the first line of defense against bad data. When a domain constraint rejects 'salary = -50000' or 'email = not_an_email', you catch the error at the database boundary—before it corrupts your data or propagates through your application.
A tuple is more than a row of values—it's a proposition about the real world. Understanding this perspective transforms how you think about data modeling.
Tuples as Facts
Consider this tuple from our Employee relation:
(1001, 'Alice Chen', 'Engineering', 95000.00)
This tuple asserts a fact: "There exists an employee with ID 1001, named Alice Chen, who works in Engineering and earns $95,000."
This interpretation has profound implications:
The Closed World Assumption
Relational databases operate under the Closed World Assumption (CWA): if a fact is not represented by a tuple in the database, that fact is assumed to be false.
This contrasts with the Open World Assumption used in some knowledge representation systems, where a missing fact simply means we don't know.
Example under CWA:
This assumption is what allows us to answer questions like "How many employees are in Engineering?" with a definitive count. If we couldn't assume completeness, every count would be "at least n."
Practical Implications:
NULL values introduce three-valued logic (true, false, unknown) into what would otherwise be a clean two-valued system. Codd himself later proposed more nuanced NULL types (missing but applicable, inapplicable, etc.). Most modern systems use a single NULL, which creates known semantic ambiguities. Understanding this is crucial for writing correct queries.
A critical distinction in the relational model is between schema (structure) and instance (data). This separation is fundamental to data independence and schema evolution.
Relation Schema (Intension)
The schema defines the structure:
Schemas are defined at design time and typically change infrequently. They describe what could be stored.
Relation Instance (Extension)
The instance is the actual data:
Instances change constantly as data is inserted, updated, and deleted. They describe what is stored.
SCHEMA (defined once at design time):
─────────────────────────────────────
Employee(
emp_id: INTEGER PRIMARY KEY,
name: VARCHAR(100) NOT NULL,
department: VARCHAR(50),
salary: DECIMAL(10,2) CHECK(salary > 0)
)
CONSTRAINTS:
- emp_id uniquely identifies each employee
- name cannot be null
- salary must be positiveINSTANCE (changes with each transaction):
─────────────────────────────────────────
At time T1:
{(1001, 'Alice', 'Eng', 95000),
(1002, 'Bob', 'Mkt', 78000)}
At time T2 (after Bob's raise):
{(1001, 'Alice', 'Eng', 95000),
(1002, 'Bob', 'Mkt', 85000)}
At time T3 (Carol hired):
{(1001, 'Alice', 'Eng', 95000),
(1002, 'Bob', 'Mkt', 85000),
(1003, 'Carol', 'Eng', 102000)}Why This Distinction Matters
1. Data Independence Separating schema from instance enables logical data independence. Applications can be written against the schema; the actual data can change without affecting application logic.
2. Constraint Enforcement Constraints defined in the schema are enforced on every instance. You define rules once; they apply to all data, past and future.
3. Query Generality
Queries are written against schemas, not specific data. SELECT * FROM Employee WHERE salary > 80000 works regardless of how many employees exist or what their specific salaries are.
4. Schema Evolution Schemas can be modified (add column, change type) while preserving existing instance data, often with automatic migration.
| Aspect | Schema (Intension) | Instance (Extension) |
|---|---|---|
| What it describes | Structure and rules | Actual data values |
| Change frequency | Rarely (design decisions) | Constantly (transactions) |
| Defined by | Database designer / DBA | Users, applications, data loads |
| Stored in | Data dictionary / catalog | Data files / tablespaces |
| Affects | All future data | Current state only |
| Query target | DDL statements | DML statements |
While relations are mathematical objects, we commonly visualize them as tables. Understanding the visual conventions helps when reading documentation, designing schemas, and communicating with colleagues.
Standard Table Visualization
┌────────────────────────────────────────────────────────────────┐│ Relation: Employee │├──────────┬───────────────┬──────────────┬──────────────────────┤│ emp_id │ name │ department │ salary ││ (INTEGER)│ (VARCHAR) │ (VARCHAR) │ (DECIMAL) │├══════════╪═══════════════╪══════════════╪══════════════════════┤│ 1001 │ Alice Chen │ Engineering │ 95000.00 │ ← Tuple 1│ 1002 │ Bob Kumar │ Marketing │ 78000.00 │ ← Tuple 2│ 1003 │ Carol Smith │ Engineering │ 102000.00 │ ← Tuple 3│ 1004 │ David Liu │ Finance │ 88000.00 │ ← Tuple 4└──────────┴───────────────┴──────────────┴──────────────────────┘ ↑ ↑ ↑ ↑ Attribute 1 Attribute 2 Attribute 3 Attribute 4 Degree (arity) = 4 attributesCardinality = 4 tuplesNotation Conventions
1. Schema Notation
Relation_Name(attribute₁: domain₁, attribute₂: domain₂, ...)
Example: Employee(emp_id: INT, name: VARCHAR(100), salary: DECIMAL(10,2))
2. Instance Notation
Relation_Name = {tuple₁, tuple₂, ...}
Example: Employee = {(1001, 'Alice', 95000), (1002, 'Bob', 78000)}
3. Key Notation Primary key attributes are often underlined or marked:
Employee(emp_id, name, salary) — emp_id underlined as primary key
4. Foreign Key Notation Relationships between relations shown with arrows or notation:
Employee(emp_id, name, dept_id → Department)
In ER (Entity-Relationship) diagrams, relations are typically shown as rectangles. Attributes appear as ovals connected to their relation. Primary keys are underlined. This visual language is standard in database design documentation.
Understanding the table-based structure has direct implications for how you design and use databases. Here are the key practical takeaways:
Employee(emp_id INT PK, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL, email VARCHAR(255) UNIQUE, hire_date DATE NOT NULL, salary DECIMAL(10,2) CHECK(salary > 0))
✓ Clear primary key ✓ Atomic names (first/last separate) ✓ Appropriate types ✓ Meaningful constraints
Employee(id, name, info, data, other)
✗ No key defined ✗ Vague column names ✗ 'info' likely non-atomic (JSON?) ✗ No types or constraints ✗ No clarity on what 'other' means
We've covered the foundational structure that makes the relational model possible. Let's consolidate the key concepts:
What's Next:
Now that we understand the structural basis of the relational model, we'll explore its mathematical foundations. The next page examines the set theory and predicate logic that give the relational model its formal power—the theory that enables query optimization, normalization, and constraint verification.
You now understand the table-based structure that defines the relational model. This seemingly simple representation—tables with rows and columns—rests on rigorous mathematical foundations that enable everything from simple queries to complex data integrity enforcement. Next, we'll explore those mathematical underpinnings.