Database Management SystemsRelations and Tables

Relations and Tables: The Foundation of Relational Databases

LevelBeginner

Duration60 mins

TopicRelations and Tables

1 / 5

Relation Definition: The Formal Foundation

The Heart of the Relational Model

In 1970, Edgar F. Codd, a mathematician and computer scientist at IBM, published a paper that would fundamentally transform how humanity stores and retrieves information. Titled "A Relational Model of Data for Large Shared Data Banks", this paper introduced a deceptively simple yet profoundly powerful concept: the relation.

Before Codd's work, database systems were dominated by hierarchical and network models—complex, inflexible structures that required programmers to navigate intricate pointer chains and understand physical storage layouts. Codd's insight was revolutionary: separate the logical representation of data from its physical storage. The vehicle for this separation was the mathematical concept of a relation.

Understanding what a relation truly is—not just superficially as a table, but with mathematical precision—is essential for mastering relational databases. This precision is what gives the relational model its power: the ability to reason about data with the rigor of mathematics.

Learning Objectives

By the end of this page, you will: (1) Understand the formal, mathematical definition of a relation, (2) Distinguish between relations and informal notions of tables, (3) Appreciate why mathematical precision matters for database theory, (4) Recognize the key properties that define relations, and (5) Connect abstract definitions to practical database concepts.

What is a Relation?

At its core, a relation is a mathematical construct borrowed from set theory. To understand it properly, we must first establish the foundational concepts upon which it rests.

Definition (Informal): A relation is a two-dimensional structure composed of rows and columns, where each row represents a unique entity or fact, and each column represents a specific attribute of those entities.

Definition (Formal): Given domains D₁, D₂, ..., Dₙ (not necessarily distinct), a relation r is a subset of the Cartesian product D₁ × D₂ × ... × Dₙ. That is:

$$r \subseteq D_1 \times D_2 \times \cdots \times D_n$$

This formal definition may seem abstract, but it captures something profound: a relation is any subset of all possible combinations of values from its constituent domains. This means relations are fundamentally about sets of facts that we choose to assert as true.

Key Insight: Relations as Sets of Facts

Think of a relation as a collection of assertions. Each row (tuple) asserts that a particular combination of values exists together in the real world. The relation r(Employee, Department, Salary) containing (John, Engineering, 85000) asserts the fact: 'John works in Engineering and earns 85000.' The power of the relational model comes from treating these facts as mathematical sets that can be manipulated with algebraic precision.

Breaking Down the Definition:

Let's unpack this definition piece by piece:

Domains (D₁, D₂, ..., Dₙ): Each domain is a set of atomic (indivisible) values. For example, a domain for 'Age' might be the set of positive integers {1, 2, 3, ...}, while a domain for 'Name' might be the set of all character strings up to 100 characters.
Cartesian Product (×): The Cartesian product D₁ × D₂ × ... × Dₙ is the set of all possible n-tuples where the first element comes from D₁, the second from D₂, and so on. This represents every conceivable combination.
Subset (⊆): A relation is a subset of this Cartesian product. This is crucial—we don't include every possible combination, only those that represent true facts in our domain.
n-tuples: Each element in the relation is an ordered sequence of n values, called a tuple. If n = 3, we have 3-tuples (or triples).

An Illustrative Example

Let's make this concrete with a carefully constructed example that demonstrates every aspect of the definition.

Scenario: We want to model student enrollment information for a university course.

Step 1: Define the Domains

Domain Definitions for Student Enrollment
Domain Name	Symbol	Definition	Example Values
Student ID	D₁	The set of 6-digit positive integers	100001, 100002, 234567
Student Name	D₂	Strings of 1-50 alphabetic characters	Alice, Bob, Carol
Grade	D₃	The set {A, B, C, D, F, I, W}	A, B, C, F
Credits	D₄	The set {1, 2, 3, 4}	3, 4

Step 2: Understand the Cartesian Product

The Cartesian product D₁ × D₂ × D₃ × D₄ contains every possible 4-tuple. Some examples:

(100001, "Alice", A, 3) — Could be a valid enrollment
(100001, "Alice", Z, 3) — Invalid: Z is not in domain D₃
(100002, "Bob", B, 5) — Invalid: 5 is not in domain D₄
(999999, "Zephyr", F, 4) — Valid structure, but is this fact true?

The Cartesian product is enormous—potentially infinite if any domain is infinite. For a simplified example with only 100 students, 5 grades, and 4 credit options, we'd have 100 × ∞ × 5 × 4 possible combinations. This is why the subset aspect is crucial.

Step 3: Define the Relation

The relation Enrollment is the subset of this Cartesian product containing only the combinations that actually exist in our university's records:

Relation: Enrollment
StudentID	StudentName	Grade	Credits
100001	Alice Chen	A	4
100002	Bob Smith	B	3
100003	Carol Davis	A	4
100004	David Kim	C	3
100005	Emma Wilson	B	4

Key Observations:

The relation is finite even though some domains may be infinite. While the domain of StudentName theoretically contains infinitely many strings, our relation only contains 5 tuples.
Each tuple represents an asserted fact. The tuple (100001, "Alice Chen", A, 4) asserts: "Student 100001, named Alice Chen, received grade A in a 4-credit course."
Absence has meaning. If the tuple (100006, "Frank Jones", B, 3) is not in the relation, we assert that this fact is not true (or at least not recorded)—this is the Closed World Assumption.
All valid combinations from domains are structurally possible. Whether (100001, "Bob Smith", F, 4) is in our relation depends on what's true in the real world, not on structural constraints (though other constraints may apply).

The Closed World Assumption (CWA)

In relational databases, we operate under the Closed World Assumption: if a fact is not explicitly present in the database, it is assumed to be false. This is a pragmatic choice—we can only query what we know. It contrasts with the Open World Assumption used in knowledge representation, where absence means 'unknown' rather than 'false.'

Essential Properties of Relations

Relations in the relational model possess several fundamental properties that distinguish them from arbitrary data structures. Understanding these properties is essential for database design and query formulation.

Fundamental Properties of Relations

•No Duplicate Tuples: Since a relation is a mathematical set, no two tuples can be identical. Each tuple must be unique. This property is enforced through primary keys and underlies the ability to identify specific facts.
•Tuple Order is Insignificant: The tuples in a relation have no inherent ordering. The set {t₁, t₂, t₃} is the same relation as {t₃, t₁, t₂}. While query results may be displayed in a particular order, this is a presentation choice, not a property of the relation.
•Attribute Order is Insignificant: Formally, attributes (columns) have no inherent left-to-right ordering. Attributes are identified by name, not position. The relation with attributes (Name, Age) is mathematically equivalent to (Age, Name).
•Attribute Values are Atomic: Each cell in a relation contains exactly one value from its attribute's domain—no lists, no sets, no nested structures. This is the First Normal Form (1NF) requirement, and it ensures that every value is directly accessible without decomposition.
•Attribute Names are Distinct: Within a relation, all attribute names must be unique. You cannot have two columns both named 'Name' in the same relation.

Theory vs. Implementation

In the mathematical definition, tuple and attribute ordering don't matter. However, practical SQL databases do maintain a physical column order, and some queries depend on column positions. Similarly, while relations mathematically cannot have duplicates, SQL tables can contain duplicate rows unless explicitly constrained. The theoretical model provides the ideal; implementations approximate it.

Why These Properties Matter:

No Duplicates → Meaningful Identity: Because each tuple is unique, we can always identify and refer to specific facts. This enables updates, deletions, and references without ambiguity.
Order Independence → Location Transparency: Since order doesn't matter, the database system can store and retrieve tuples in whatever physical order is most efficient. This enables query optimization.
Atomic Values → Query Simplicity: With atomic values, every query operates on simple, single values. You never need to 'look inside' a cell—enabling straightforward comparison, aggregation, and joining.
Named Attributes → Self-Describing Structures: Because attributes are named, not positional, relations are self-documenting. Queries can reference 'Employee.Salary' rather than 'the third column.'

Relation vs. Table: A Critical Distinction

The terms 'relation' and 'table' are often used interchangeably, but they represent fundamentally different concepts. A relation is a mathematical abstraction; a table is a concrete representation or implementation.

Understanding this distinction prevents conceptual errors that lead to poor database design.

Relation (Mathematical)

•Abstract mathematical concept
•Exists in the realm of theory
•No duplicate tuples (by definition)
•No ordering of tuples or attributes
•Attribute values must be atomic
•Defines what can be true, not how it's stored
•Closed under relational operations

Table (Implementation)

•Concrete data structure in a DBMS
•Exists as files on disk
•May allow duplicate rows (unless constrained)
•Has physical row and column order
•May support non-atomic types (arrays, JSON)
•Includes storage details, indexes, etc.
•Subject to SQL semantics, not pure algebra

SQL and the Relational Model:

SQL, the standard language for relational databases, is based on the relational model but departs from it in significant ways:

Aspect	Pure Relational Model	SQL Implementation
Duplicates	Forbidden	Allowed (use DISTINCT to remove)
NULL values	Not in original model	Fully supported
Ordering	Undefined	Explicit via ORDER BY
Attribute position	Irrelevant	SELECT * returns fixed order
Bag vs. Set semantics	Set (unique tuples)	Bag (multiset, allows duplicates)

These differences aren't flaws—they're pragmatic choices. Real-world data sometimes has duplicates, missing values (NULLs) are common, and users often want ordered results. But understanding the pure relational model helps you appreciate the theoretical foundation and recognize when SQL behavior might be surprising.

Practical Implication

When designing databases, think in terms of relations (mathematical sets of facts), then implement as tables. This leads to cleaner designs because you focus on what information you're modeling, not how it will be stored. The implementation details can be adjusted; the conceptual model should be precise.

Terminology: Precision Matters

The relational model uses precise terminology borrowed from mathematics. Unfortunately, different communities use different terms for the same concepts, leading to confusion. Here's a comprehensive mapping:

Relational Terminology Across Contexts
Formal Relational Model	Informal/File System	SQL	Description
Relation	Table, File	Table	The entire two-dimensional structure
Tuple	Row, Record	Row	One horizontal entry in the relation
Attribute	Column, Field	Column	One vertical component of the relation
Domain	Data type, Pool of values	Data type + constraints	Set of allowable values for an attribute
Degree (arity)	Number of columns	Column count	How many attributes the relation has
Cardinality	Number of rows	Row count	How many tuples the relation contains
Relation schema	Table structure	CREATE TABLE definition	The relation's name and attribute definitions
Relation instance	Table contents	Current table state	The set of tuples at a given time

Why Terminology Matters:

Using precise terminology isn't pedantry—it enables clear communication and prevents errors:

Tuple vs. Row: A tuple is an ordered set of values. A row in SQL may have a row ID, may be ordered, and may contain NULLs. Conflating them leads to confusion about identity and equality.
Relation vs. Table: Saying 'relation' signals you're thinking mathematically about data as sets of facts. Saying 'table' might imply mutable state, physical storage, or SQL-specific behavior.
Attribute vs. Column: Attributes are named components of tuples. Columns in SQL have positions and specific data types with implementation-defined behaviors.
Domain vs. Data Type: A domain is a semantic concept (e.g., 'EmployeeID' as a meaningful identifier with constraints), while a data type is a syntactic one (e.g., INTEGER). A domain may be implemented as a data type, but carries additional meaning.

Notational Convention

Throughout this curriculum, we will use formal terminology (relation, tuple, attribute) when discussing theory and informal terminology (table, row, column) when discussing SQL implementation. Being comfortable with both is essential for reading academic literature and writing production code.

Degree and Cardinality

Two fundamental metrics characterize every relation:

Degree (Arity): The degree of a relation is the number of attributes (columns) it contains. This is a property of the relation schema and remains constant throughout the relation's existence (unless altered by a schema change).

A relation with 1 attribute is unary
A relation with 2 attributes is binary
A relation with 3 attributes is ternary
A relation with n attributes is n-ary

Cardinality: The cardinality of a relation is the number of tuples (rows) it contains. This is a property of a specific relation instance and changes as data is inserted, updated, or deleted.

The distinction is fundamental: degree is structural (part of the schema), while cardinality is dynamic (changes with content).

Degree vs. Cardinality ExampleConsider a relation storing department information:

Input

Department(DeptID, DeptName, Manager, Budget, Location)

Output

Degree: 5 (five attributes)
Cardinality: Changes based on how many departments exist

Explanation

If this relation currently contains 12 departments, its cardinality is 12. Adding a new department increases cardinality to 13. Adding a new attribute (like 'Founded_Year') would increase degree to 6—but this requires schema modification.

Why This Distinction Matters for Performance:

Database performance is dramatically affected by both degree and cardinality:

Metric	Increase Impact	Performance Consideration
Degree	More data per tuple	Wider rows consume more memory; fewer rows fit in cache; index choices become more complex
Cardinality	More tuples total	Larger tables require more storage; scans take longer; index selectivity becomes crucial

A relation with degree 50 (50 columns) and cardinality 1,000 behaves very differently from one with degree 3 and cardinality 10,000,000.

Database designers must consider both: narrow tables (low degree) can often be stored more efficiently, while proper indexing and partitioning strategies depend heavily on anticipated cardinality.

Historical and Theoretical Significance

The relation is not merely one data structure among many—it represents a paradigm shift in how we conceptualize data. To fully appreciate its significance, we must understand what came before.

Pre-Relational Systems:

Before 1970, database systems were dominated by:

Hierarchical Model (IMS, 1966): Data organized as trees. Parent-child relationships hard-coded. Queries required navigating pointer chains. Changing structure meant rewriting programs.
Network Model (CODASYL, 1969): Data organized as graphs. More flexible than hierarchical, but still required explicit navigation. Programs were tightly coupled to physical data structures.

Both models had severe limitations:

Physical dependence: Programs knew about data layout, file organization, and access paths
Limited query capability: Finding data required programmers to write navigation code
Inflexibility: Schema changes broke applications

Codd's Revolution

Codd's insight was to separate the 'what' from the 'how.' Users should declare what data they want, not how to get it. This separation is provided by relations—abstract, logical structures that can be manipulated mathematically without knowing anything about physical storage.

Why Relations Won:

The relation succeeded because it provides:

Data Independence: Programs reference relations (logical structures), not files (physical structures). The DBMS can change physical organization without breaking applications.
Non-Procedural Queries: Users can ask 'Give me all employees in Engineering earning over $100K' without specifying how to search, which indexes to use, or in what order to access data.
Mathematical Foundation: Relations support a complete algebra of operations (select, project, join, etc.) with well-defined semantics. This enables query optimization, formal verification, and principled reasoning.
Simplicity: A relation is conceptually simple—just a table of facts. Users don't need to understand trees, graphs, or pointers.
Extensibility: New data can be added as new relations without modifying existing structures. New queries can be written without new programs.

These advantages were so compelling that relational databases became dominant by the 1990s and remain so today, despite the rise of NoSQL alternatives for specific use cases.

Key Historical Milestones

•1970: Codd publishes the foundational paper at IBM Research
•1974: IBM's System R project begins implementing Codd's ideas
•1974: UC Berkeley's INGRES project starts (later becomes PostgreSQL ancestor)
•1979: Oracle releases first commercial SQL database
•1983: IBM releases DB2
•1986: SQL becomes ANSI standard (SQL-86)
•1985: Codd publishes his 12 Rules for evaluating relational systems
•1981: Codd receives Turing Award for his contributions

Summary and Key Takeaways

We have established the foundational concept upon which all relational database theory rests: the relation. Let's consolidate what we've learned:

Core Concepts Mastered

•A relation is a subset of a Cartesian product of domains—a mathematical set representing which combinations of values are asserted as true facts.
•Relations have essential properties: no duplicate tuples, no inherent ordering (of tuples or attributes), and atomic attribute values.
•Relations differ from tables: relations are abstract mathematical concepts; tables are concrete SQL implementations that approximate (but don't perfectly match) the relational model.
•Precise terminology matters: tuple/row, attribute/column, relation/table, domain/data type—using terms correctly enables clear communication and prevents conceptual errors.
•Degree and cardinality characterize relations: degree (number of attributes) is structural; cardinality (number of tuples) is dynamic.
•Relations revolutionized data management by providing data independence, non-procedural querying, and a mathematical foundation for reasoning about data.

What's Next:

With the definition of a relation firmly established, we're ready to explore its mathematical foundations in greater depth. The next page examines the set-theoretic and algebraic structures that give relations their power—enabling us to manipulate data with the precision of mathematics and the efficiency of optimized algorithms.

Foundation Established

You now understand what a relation truly is—not just informally as a table, but as a rigorous mathematical structure. This precision will serve you throughout your study of databases, enabling you to reason about schemas, queries, and optimizations with clarity and confidence.

1 / 5

Loading learning content...

Database Management SystemsRelations and Tables

Relations and Tables: The Foundation of Relational Databases

LevelBeginner

Duration60 mins

TopicRelations and Tables

1 / 5

Relation Definition: The Formal Foundation

The Heart of the Relational Model

Learning Objectives

What is a Relation?

At its core, a relation is a mathematical construct borrowed from set theory. To understand it properly, we must first establish the foundational concepts upon which it rests.

Definition (Formal): Given domains D₁, D₂, ..., Dₙ (not necessarily distinct), a relation r is a subset of the Cartesian product D₁ × D₂ × ... × Dₙ. That is:

$$r \subseteq D_1 \times D_2 \times \cdots \times D_n$$

Key Insight: Relations as Sets of Facts

Breaking Down the Definition:

Let's unpack this definition piece by piece:

Domains (D₁, D₂, ..., Dₙ): Each domain is a set of atomic (indivisible) values. For example, a domain for 'Age' might be the set of positive integers {1, 2, 3, ...}, while a domain for 'Name' might be the set of all character strings up to 100 characters.
Cartesian Product (×): The Cartesian product D₁ × D₂ × ... × Dₙ is the set of all possible n-tuples where the first element comes from D₁, the second from D₂, and so on. This represents every conceivable combination.
Subset (⊆): A relation is a subset of this Cartesian product. This is crucial—we don't include every possible combination, only those that represent true facts in our domain.
n-tuples: Each element in the relation is an ordered sequence of n values, called a tuple. If n = 3, we have 3-tuples (or triples).

An Illustrative Example

Let's make this concrete with a carefully constructed example that demonstrates every aspect of the definition.

Scenario: We want to model student enrollment information for a university course.

Step 1: Define the Domains

Domain Definitions for Student Enrollment
Domain Name	Symbol	Definition	Example Values
Student ID	D₁	The set of 6-digit positive integers	100001, 100002, 234567
Student Name	D₂	Strings of 1-50 alphabetic characters	Alice, Bob, Carol
Grade	D₃	The set {A, B, C, D, F, I, W}	A, B, C, F
Credits	D₄	The set {1, 2, 3, 4}	3, 4

Step 2: Understand the Cartesian Product

The Cartesian product D₁ × D₂ × D₃ × D₄ contains every possible 4-tuple. Some examples:

(100001, "Alice", A, 3) — Could be a valid enrollment
(100001, "Alice", Z, 3) — Invalid: Z is not in domain D₃
(100002, "Bob", B, 5) — Invalid: 5 is not in domain D₄
(999999, "Zephyr", F, 4) — Valid structure, but is this fact true?

Step 3: Define the Relation

The relation Enrollment is the subset of this Cartesian product containing only the combinations that actually exist in our university's records:

Relation: Enrollment
StudentID	StudentName	Grade	Credits
100001	Alice Chen	A	4
100002	Bob Smith	B	3
100003	Carol Davis	A	4
100004	David Kim	C	3
100005	Emma Wilson	B	4

Key Observations:

The relation is finite even though some domains may be infinite. While the domain of StudentName theoretically contains infinitely many strings, our relation only contains 5 tuples.
Each tuple represents an asserted fact. The tuple (100001, "Alice Chen", A, 4) asserts: "Student 100001, named Alice Chen, received grade A in a 4-credit course."
Absence has meaning. If the tuple (100006, "Frank Jones", B, 3) is not in the relation, we assert that this fact is not true (or at least not recorded)—this is the Closed World Assumption.
All valid combinations from domains are structurally possible. Whether (100001, "Bob Smith", F, 4) is in our relation depends on what's true in the real world, not on structural constraints (though other constraints may apply).

The Closed World Assumption (CWA)

Essential Properties of Relations

Fundamental Properties of Relations

•No Duplicate Tuples: Since a relation is a mathematical set, no two tuples can be identical. Each tuple must be unique. This property is enforced through primary keys and underlies the ability to identify specific facts.
•Tuple Order is Insignificant: The tuples in a relation have no inherent ordering. The set {t₁, t₂, t₃} is the same relation as {t₃, t₁, t₂}. While query results may be displayed in a particular order, this is a presentation choice, not a property of the relation.
•Attribute Order is Insignificant: Formally, attributes (columns) have no inherent left-to-right ordering. Attributes are identified by name, not position. The relation with attributes (Name, Age) is mathematically equivalent to (Age, Name).
•Attribute Values are Atomic: Each cell in a relation contains exactly one value from its attribute's domain—no lists, no sets, no nested structures. This is the First Normal Form (1NF) requirement, and it ensures that every value is directly accessible without decomposition.
•Attribute Names are Distinct: Within a relation, all attribute names must be unique. You cannot have two columns both named 'Name' in the same relation.

Theory vs. Implementation

Why These Properties Matter:

No Duplicates → Meaningful Identity: Because each tuple is unique, we can always identify and refer to specific facts. This enables updates, deletions, and references without ambiguity.
Order Independence → Location Transparency: Since order doesn't matter, the database system can store and retrieve tuples in whatever physical order is most efficient. This enables query optimization.
Atomic Values → Query Simplicity: With atomic values, every query operates on simple, single values. You never need to 'look inside' a cell—enabling straightforward comparison, aggregation, and joining.
Named Attributes → Self-Describing Structures: Because attributes are named, not positional, relations are self-documenting. Queries can reference 'Employee.Salary' rather than 'the third column.'

Relation vs. Table: A Critical Distinction

Understanding this distinction prevents conceptual errors that lead to poor database design.

Relation (Mathematical)

•Abstract mathematical concept
•Exists in the realm of theory
•No duplicate tuples (by definition)
•No ordering of tuples or attributes
•Attribute values must be atomic
•Defines what can be true, not how it's stored
•Closed under relational operations

Table (Implementation)

•Concrete data structure in a DBMS
•Exists as files on disk
•May allow duplicate rows (unless constrained)
•Has physical row and column order
•May support non-atomic types (arrays, JSON)
•Includes storage details, indexes, etc.
•Subject to SQL semantics, not pure algebra

SQL and the Relational Model:

SQL, the standard language for relational databases, is based on the relational model but departs from it in significant ways:

Aspect	Pure Relational Model	SQL Implementation
Duplicates	Forbidden	Allowed (use DISTINCT to remove)
NULL values	Not in original model	Fully supported
Ordering	Undefined	Explicit via ORDER BY
Attribute position	Irrelevant	SELECT * returns fixed order
Bag vs. Set semantics	Set (unique tuples)	Bag (multiset, allows duplicates)

Practical Implication

Terminology: Precision Matters

Relational Terminology Across Contexts
Formal Relational Model	Informal/File System	SQL	Description
Relation	Table, File	Table	The entire two-dimensional structure
Tuple	Row, Record	Row	One horizontal entry in the relation
Attribute	Column, Field	Column	One vertical component of the relation
Domain	Data type, Pool of values	Data type + constraints	Set of allowable values for an attribute
Degree (arity)	Number of columns	Column count	How many attributes the relation has
Cardinality	Number of rows	Row count	How many tuples the relation contains
Relation schema	Table structure	CREATE TABLE definition	The relation's name and attribute definitions
Relation instance	Table contents	Current table state	The set of tuples at a given time

Why Terminology Matters:

Using precise terminology isn't pedantry—it enables clear communication and prevents errors:

Tuple vs. Row: A tuple is an ordered set of values. A row in SQL may have a row ID, may be ordered, and may contain NULLs. Conflating them leads to confusion about identity and equality.
Relation vs. Table: Saying 'relation' signals you're thinking mathematically about data as sets of facts. Saying 'table' might imply mutable state, physical storage, or SQL-specific behavior.
Attribute vs. Column: Attributes are named components of tuples. Columns in SQL have positions and specific data types with implementation-defined behaviors.
Domain vs. Data Type: A domain is a semantic concept (e.g., 'EmployeeID' as a meaningful identifier with constraints), while a data type is a syntactic one (e.g., INTEGER). A domain may be implemented as a data type, but carries additional meaning.

Notational Convention

Degree and Cardinality

Two fundamental metrics characterize every relation:

A relation with 1 attribute is unary
A relation with 2 attributes is binary
A relation with 3 attributes is ternary
A relation with n attributes is n-ary

Cardinality: The cardinality of a relation is the number of tuples (rows) it contains. This is a property of a specific relation instance and changes as data is inserted, updated, or deleted.

The distinction is fundamental: degree is structural (part of the schema), while cardinality is dynamic (changes with content).

Degree vs. Cardinality ExampleConsider a relation storing department information:

Input

Department(DeptID, DeptName, Manager, Budget, Location)

Output

Degree: 5 (five attributes)
Cardinality: Changes based on how many departments exist

Explanation

Why This Distinction Matters for Performance:

Database performance is dramatically affected by both degree and cardinality:

Metric	Increase Impact	Performance Consideration
Degree	More data per tuple	Wider rows consume more memory; fewer rows fit in cache; index choices become more complex
Cardinality	More tuples total	Larger tables require more storage; scans take longer; index selectivity becomes crucial

A relation with degree 50 (50 columns) and cardinality 1,000 behaves very differently from one with degree 3 and cardinality 10,000,000.

Database designers must consider both: narrow tables (low degree) can often be stored more efficiently, while proper indexing and partitioning strategies depend heavily on anticipated cardinality.

Historical and Theoretical Significance

The relation is not merely one data structure among many—it represents a paradigm shift in how we conceptualize data. To fully appreciate its significance, we must understand what came before.

Pre-Relational Systems:

Before 1970, database systems were dominated by:

Hierarchical Model (IMS, 1966): Data organized as trees. Parent-child relationships hard-coded. Queries required navigating pointer chains. Changing structure meant rewriting programs.
Network Model (CODASYL, 1969): Data organized as graphs. More flexible than hierarchical, but still required explicit navigation. Programs were tightly coupled to physical data structures.

Both models had severe limitations:

Physical dependence: Programs knew about data layout, file organization, and access paths
Limited query capability: Finding data required programmers to write navigation code
Inflexibility: Schema changes broke applications

Codd's Revolution

Why Relations Won:

The relation succeeded because it provides:

Data Independence: Programs reference relations (logical structures), not files (physical structures). The DBMS can change physical organization without breaking applications.
Non-Procedural Queries: Users can ask 'Give me all employees in Engineering earning over $100K' without specifying how to search, which indexes to use, or in what order to access data.
Mathematical Foundation: Relations support a complete algebra of operations (select, project, join, etc.) with well-defined semantics. This enables query optimization, formal verification, and principled reasoning.
Simplicity: A relation is conceptually simple—just a table of facts. Users don't need to understand trees, graphs, or pointers.
Extensibility: New data can be added as new relations without modifying existing structures. New queries can be written without new programs.

These advantages were so compelling that relational databases became dominant by the 1990s and remain so today, despite the rise of NoSQL alternatives for specific use cases.

Key Historical Milestones

•1970: Codd publishes the foundational paper at IBM Research
•1974: IBM's System R project begins implementing Codd's ideas
•1974: UC Berkeley's INGRES project starts (later becomes PostgreSQL ancestor)
•1979: Oracle releases first commercial SQL database
•1983: IBM releases DB2
•1986: SQL becomes ANSI standard (SQL-86)
•1985: Codd publishes his 12 Rules for evaluating relational systems
•1981: Codd receives Turing Award for his contributions

Summary and Key Takeaways

We have established the foundational concept upon which all relational database theory rests: the relation. Let's consolidate what we've learned:

Core Concepts Mastered

•A relation is a subset of a Cartesian product of domains—a mathematical set representing which combinations of values are asserted as true facts.
•Relations have essential properties: no duplicate tuples, no inherent ordering (of tuples or attributes), and atomic attribute values.
•Relations differ from tables: relations are abstract mathematical concepts; tables are concrete SQL implementations that approximate (but don't perfectly match) the relational model.
•Precise terminology matters: tuple/row, attribute/column, relation/table, domain/data type—using terms correctly enables clear communication and prevents conceptual errors.
•Degree and cardinality characterize relations: degree (number of attributes) is structural; cardinality (number of tuples) is dynamic.
•Relations revolutionized data management by providing data independence, non-procedural querying, and a mathematical foundation for reasoning about data.

What's Next:

Foundation Established

1 / 5