Every relation exists in two dimensions: horizontally across its attributes and vertically down its tuples. These dimensions have precise names and profound implications for database design, query performance, and system capacity.
The degree of a relation refers to its width—the number of attributes it contains. The cardinality refers to its depth—the number of tuples it holds. Together, these measures characterize the structural properties of any relation.
Understanding degree and cardinality is essential for:
- Designing schemas with appropriate width and anticipating growth
- Analyzing query performance and optimizer behavior
- Planning storage capacity and system sizing
In this page, we explore these fundamental concepts with mathematical precision and practical insight, establishing the vocabulary and understanding that database professionals rely upon.
By the end of this page, you will understand the formal definitions of degree and cardinality, how these properties affect database operations, the relationship between relation dimensions and storage, and how these concepts apply to query analysis and optimization.
Definition (Degree of a Relation):
The degree (also called arity) of a relation R is the number of attributes in its schema. If R has schema R(A₁, A₂, ..., Aₙ), then the degree of R is n.
Notation:
If R has n attributes, we write deg(R) = n and call R an n-ary relation.
Examples:
```sql
-- Unary relation (degree 1)
Departments(department_name)
-- degree = 1
-- Each tuple has one attribute: the department name

-- Binary relation (degree 2)
Manages(employee_id, manager_id)
-- degree = 2
-- Each tuple has two attributes: who manages whom

-- Ternary relation (degree 3)
Supplies(supplier_id, part_id, quantity)
-- degree = 3
-- Each tuple links supplier, part, and quantity

-- Higher-arity relation
Employee(id, name, department, salary, hire_date, manager_id, email, phone)
-- degree = 8
-- Each tuple has eight attributes

-- Degree is a property of the SCHEMA, not the instance
-- It doesn't change when you add/remove tuples
```

Properties of Degree:
Degree is fixed by the schema. Once defined, a relation's degree doesn't change with data—it changes only with schema evolution (ALTER TABLE ADD COLUMN).
Degree ≥ 0. A relation with zero attributes is called a nullary relation; it can have at most one tuple (the empty tuple). This is a degenerate case rarely encountered, so in practice every useful relation has degree ≥ 1.
Degree affects storage. More attributes generally mean more bytes per tuple, though compression and null handling complicate this.
Degree affects query complexity. Joining on more attributes, selecting more columns, and processing wider rows all increase query costs.
| Degree | Term | Example |
|---|---|---|
| 1 | Unary | ValidCodes(code) |
| 2 | Binary | EmployeeDept(emp_id, dept_id) |
| 3 | Ternary | Supplies(supplier, part, qty) |
| 4 | Quaternary | Shipment(supplier, part, warehouse, date) |
| 5+ | n-ary | General relations with many attributes |
The projection operation produces a relation with (usually) lower degree. If Employee has degree 8, then π_{name, salary}(Employee) has degree 2. This is how we select a 'subset' of attributes from a wider relation.
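In SQL, projection corresponds to the SELECT list. A minimal sketch against the Employee schema shown earlier (note that SQL keeps duplicate rows unless DISTINCT is specified, whereas relational π removes them):

```sql
-- Degree drops from 8 to 2; cardinality is unchanged
-- unless DISTINCT collapses duplicate rows
SELECT name, salary
FROM Employee;

-- True relational projection: duplicates removed, so cardinality may shrink too
SELECT DISTINCT department
FROM Employee;
```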
Definition (Cardinality of a Relation):
The cardinality of a relation R is the number of tuples it contains. Since a relation is a set, cardinality equals the set's size.
Notation:
|R| denotes the cardinality of R, borrowing the set-theoretic notation for the size of a set.
Examples:
```sql
-- Small table
Departments = {(Engineering), (Marketing), (Sales), (HR)}
|Departments| = 4

-- Medium table
Employees with 1000 employees
|Employees| = 1000

-- Large table
WebLogs with 10 billion entries
|WebLogs| = 10,000,000,000

-- Empty table (valid relation)
NewHires = {} (no tuples yet)
|NewHires| = 0

-- Cardinality changes with DML operations:
-- INSERT increases cardinality by 1 (usually)
-- DELETE decreases cardinality
-- UPDATE doesn't change cardinality (replaces tuples)
```

Properties of Cardinality:
Cardinality varies with data. Unlike degree, cardinality changes as data is added or removed. Schema doesn't constrain cardinality (except through application logic).
Cardinality ≥ 0. An empty relation (with no tuples) is perfectly valid. It represents no facts matching the predicate.
Cardinality directly affects performance. Query time often scales with cardinality—scanning 1 million rows takes longer than scanning 1 thousand.
Maximum cardinality is bounded by domains. The maximum possible cardinality is the product of domain sizes: |dom(A₁)| × |dom(A₂)| × ... × |dom(Aₙ)|. For infinite domains (like integers or strings), this is unbounded.
Cardinality statistics drive optimization. Query optimizers maintain cardinality estimates for tables and use them to choose efficient execution plans.
A full table scan is O(cardinality). A join of two tables can produce up to cardinality(A) × cardinality(B) tuples. Index lookups reduce this dramatically. Modern query optimizers spend significant effort estimating cardinalities because join ordering and access method decisions depend on knowing how many tuples flow through each operation.
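Most systems expose these estimates directly. As an illustration in PostgreSQL syntax (the tables and join column are assumptions from the earlier examples), each node of an EXPLAIN plan reports an estimated row count, and the catalog keeps a per-table estimate:

```sql
-- Per-node cardinality estimates for a join (PostgreSQL syntax; hypothetical tables)
EXPLAIN
SELECT e.name, d.department_name
FROM Employee e
JOIN Departments d ON e.department = d.department_name;

-- Catalog-level row-count estimate maintained by the statistics collector
SELECT relname, reltuples::BIGINT AS estimated_rows
FROM pg_class
WHERE relname = 'employee';
```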
When we represent relations as tables, degree and cardinality correspond directly to the table's visual dimensions:
- Degree = the number of columns (the table's width)
- Cardinality = the number of rows (the table's height)
This intuitive mapping makes it easy to think about relation structure visually.
Dimensional Analysis:
We can think of a relation as a matrix-like structure with:
- n columns, one per attribute (the degree)
- m rows, one per tuple (the cardinality)
- n × m cells in total

For the Employee relation above (degree 8), an instance holding 1,000 tuples is an 8 × 1,000 structure containing 8,000 values.
Column Operations:
Operations that change degree:
- Projection (π) removes attributes
- Cartesian product and joins combine attribute sets
- Schema evolution (ALTER TABLE ADD/DROP COLUMN) adds or removes columns

Row Operations:
Operations that change cardinality:
- Selection (σ) filters tuples
- Set operations (∪, ∩, −) combine or remove tuples
- DML statements (INSERT, DELETE) add or remove rows
| Operation | Effect on Degree | Effect on Cardinality |
|---|---|---|
| Selection σ | Unchanged | Decreased or unchanged |
| Projection π | Decreased or unchanged | Decreased or unchanged (duplicates removed) |
| Cartesian Product × | Sum of degrees | Product of cardinalities |
| Natural Join ⋈ | Sum minus shared attributes | 0 to product of cardinalities |
| Union ∪ | Unchanged (must match) | Sum or less (no duplicates) |
| Intersection ∩ | Unchanged (must match) | Min of cardinalities or less |
| Difference − | Unchanged (must match) | Cardinality of first or less |
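The product and union rows of this table are easy to verify directly. A small sketch, assuming two hypothetical relations r and s with identical schemas, |r| = 100 and |s| = 50:

```sql
-- Cartesian product: cardinality is exactly multiplicative
SELECT COUNT(*) FROM r CROSS JOIN s;   -- exactly 100 × 50 = 5,000

-- Set union: duplicates eliminated, so the count lands between
-- max(|r|, |s|) = 100 and |r| + |s| = 150
SELECT COUNT(*) FROM (
  SELECT * FROM r
  UNION
  SELECT * FROM s
) AS u;
```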
Query optimization is fundamentally about choosing the most efficient execution plan. Cardinality estimation—predicting how many tuples flow through each operation—is the cornerstone of this process.
Why Estimation Matters:
Consider joining three tables: A, B, and C. The optimizer can evaluate (A ⋈ B) ⋈ C or A ⋈ (B ⋈ C); both produce the same result, but the intermediate results can differ enormously in size.
The costs can differ by orders of magnitude depending on which intermediate results are smallest. If |A ⋈ B| = 1000 and |B ⋈ C| = 10,000,000, we want to compute A ⋈ B first.
How Databases Estimate Cardinality:
- Base-table statistics: |Employees| = 50,000 is a known fact, kept up to date in the system catalog.
- Predicate selectivity: a condition such as salary > 100000 might select 10% of rows.

Selection Cardinality Formula:
For a selection σ_{condition}(R), the estimated cardinality is:
|σ_{condition}(R)| ≈ |R| × selectivity(condition)
Where selectivity is the fraction of tuples satisfying the condition:
- For an equality predicate A = v, a common default estimate is 1 / NDV(A), one over the number of distinct values of A.
- For range predicates such as salary > 100000, histograms of the column's value distribution provide the estimate.
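To see the formula in action, compare the optimizer's estimate with the true count. A sketch in PostgreSQL syntax, assuming the hypothetical Employees table with |Employees| = 50,000:

```sql
-- If the predicate's selectivity is estimated at 10%, the plan should show
-- roughly 50,000 × 0.10 = 5,000 in its rows= field
EXPLAIN
SELECT * FROM Employees WHERE salary > 100000;

-- The actual cardinality, for comparison against the estimate
SELECT COUNT(*) FROM Employees WHERE salary > 100000;
```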
Join Cardinality Formula:
For a join R ⋈ S with join attribute having NDV(R) and NDV(S) distinct values:
|R ⋈ S| ≈ (|R| × |S|) / max(NDV(R.attr), NDV(S.attr))
This assumes uniform distribution and referential integrity. Real queries may deviate significantly.
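The same estimate can be computed by hand from the data. A back-of-envelope sketch in PostgreSQL syntax, assuming the Orders and Customers tables used in the debugging example later on this page:

```sql
-- |R ⋈ S| ≈ (|R| × |S|) / max(NDV(R.attr), NDV(S.attr)), computed directly
SELECT
  (SELECT COUNT(*) FROM Orders)::NUMERIC
    * (SELECT COUNT(*) FROM Customers)
    / GREATEST(
        (SELECT COUNT(DISTINCT customer_id) FROM Orders),
        (SELECT COUNT(DISTINCT customer_id) FROM Customers)
      ) AS estimated_join_rows;
```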
Cardinality estimation is notoriously error-prone. Correlated columns, non-uniform distributions, complex predicates, and stale statistics lead to estimates that are orders of magnitude wrong. This causes poor plan choices—an ongoing challenge in database research.
Degree and cardinality directly impact storage requirements. Understanding this relationship helps in capacity planning and schema design.
Storage Size Formula (Simplified):
Storage ≈ cardinality × average_tuple_size
≈ cardinality × Σ(attribute_sizes)
≈ |R| × Σᵢ(size(Aᵢ))
For instance, a relation with cardinality 1,000,000 and an average tuple size of 100 bytes occupies roughly 100 MB, before accounting for indexes, page headers, and fragmentation.
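Real systems report actual sizes, which you can compare against the formula. A PostgreSQL-specific sketch (the employee table name is an assumption):

```sql
-- Estimated rows, raw heap size, and total size including indexes
SELECT
  reltuples::BIGINT                                   AS estimated_rows,
  pg_size_pretty(pg_relation_size('employee'))        AS heap_size,
  pg_size_pretty(pg_total_relation_size('employee'))  AS total_with_indexes
FROM pg_class
WHERE relname = 'employee';
```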
| Characteristic | Storage Impact | Example |
|---|---|---|
| High degree, low cardinality | Many columns, few rows—wide but short | Configuration table: 50 settings columns, 1 row |
| Low degree, high cardinality | Few columns, many rows—narrow but tall | Log table: 3 columns (timestamp, level, message), billions of rows |
| High degree, high cardinality | Wide and tall—significant storage | Analytics fact table: 100 columns, billions of rows |
| Low degree, low cardinality | Small—minimal storage concern | Lookup table: 2 columns (code, description), 50 rows |
Degree Considerations for Schema Design:

Wide tables (high degree):
- More bytes per tuple, so fewer tuples fit per page and scans incur more I/O
- Risk of row overflow when tuples exceed the page size
- Candidates for vertical partitioning when columns are rarely read together

Narrow tables (low degree):
- Compact tuples, so many fit per page and scans are efficient
- Often require joins to reassemble complete entities

Cardinality Considerations:

Large cardinality (millions+):
- Indexing, partitioning, and archiving strategies become essential
- Full table scans grow expensive and must be planned for explicitly

Small cardinality (hundreds/thousands):
- Often fits entirely in memory or cache
- Full scans are cheap; indexes may add overhead without benefit
Databases read data in pages (typically 4KB-16KB). If tuple size exceeds page size, the database must store tuples across multiple pages (overflow), which degrades performance. Very wide tables may benefit from vertical partitioning—splitting into multiple narrower tables with the same key.
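A minimal sketch of such a vertical partition, splitting the hypothetical 8-attribute Employee table from earlier into a frequently-read core and a rarely-read extension sharing the same key:

```sql
-- Hot columns: narrow rows, many tuples per page
CREATE TABLE employee_core (
  id          INT PRIMARY KEY,
  name        VARCHAR(100),
  department  VARCHAR(50),
  manager_id  INT
);

-- Cold columns: fetched only when needed, joined on the shared key
CREATE TABLE employee_extended (
  id         INT PRIMARY KEY REFERENCES employee_core(id),
  salary     NUMERIC(12,2),
  hire_date  DATE,
  email      VARCHAR(255),
  phone      VARCHAR(50)
);
```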
When analyzing relational operations, understanding the bounds on result cardinality helps validate query logic and estimate performance.
Minimum Cardinality:
The smallest possible size of an operation's result.

Maximum Cardinality:
The largest possible size of an operation's result.

For common operations, both bounds are summarized below:
| Operation | Minimum | Maximum | Notes |
|---|---|---|---|
| σ_condition(R) | 0 | |R| | All or none may pass filter |
| π_attrs(R) | 1 (0 if R is empty) | |R| | Duplicates may collapse tuples |
| R ∪ S | max(|R|, |S|) | |R| + |S| | Duplicates eliminated in set union |
| R ∩ S | 0 | min(|R|, |S|) | Intersection can't exceed smaller set |
| R − S | 0 | |R| | Difference can't exceed left operand |
| R × S | |R| × |S| | |R| × |S| | Exact: Cartesian product is multiplicative |
| R ⋈ S | 0 | |R| × |S| | Natural join ranges from empty to Cartesian |
Practical Application:
Knowing cardinality bounds helps:
Debugging Queries: If a join returns way more rows than expected, it might be performing a Cartesian product (missing join condition). If |R ⋈ S| ≈ |R| × |S|, the join condition is probably ineffective.
Validating Results: If selection returns 0 rows but you expected some, either the data doesn't match or the predicate is wrong.
Capacity Planning: Maximum cardinality gives worst-case storage needs. A report aggregating from a 1B-row table should not produce 1B rows (if it might, there's a design issue).
Query Optimization: Pushing down selections (reducing cardinality early) exploits the fact that smaller cardinality means faster subsequent operations.
```sql
-- Example: Debugging an unexpectedly large join result

-- Expected: ~10,000 orders with their customers
-- Actual result: 100,000,000 rows (suspiciously close to 10,000 × 10,000)
SELECT *
FROM Orders, Customers;  -- BUG: Missing join condition!

-- Fix: Add the join condition
SELECT *
FROM Orders o
JOIN Customers c ON o.customer_id = c.customer_id;
-- Now returns ~10,000 rows as expected

-- Checking cardinality of intermediate results
EXPLAIN ANALYZE
SELECT c.name, COUNT(o.order_id)
FROM Customers c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
WHERE c.country = 'USA'
GROUP BY c.name;
-- Shows estimated vs actual cardinality at each step
```

Normalization—the process of organizing relations to reduce redundancy—has direct effects on degree and cardinality. Understanding this relationship illuminates the tradeoffs in database design.
Effect of Normalization on Degree:
When you normalize a relation (decompose it into smaller relations), each resulting relation has a lower degree than the original, while the number of relations grows.
Example: Decomposing an Unnormalized Relation:
Before Normalization:
EmployeeProject(
emp_id,
emp_name,
emp_department,
project_id,
project_name,
project_budget,
hours_worked
)
Degree: 7, Cardinality: 10,000 (employee and project details repeated on every assignment row)
After Normalization:
Employee(emp_id, emp_name, dept)
Degree: 3, Cardinality: 500
Project(project_id, name, budget)
Degree: 3, Cardinality: 50
Assignment(emp_id, proj_id, hours)
Degree: 3, Cardinality: 10,000
Cardinality Changes with Normalization:
Normalization typically:
- Splits one large relation into several smaller ones, so total row count across tables may grow (500 + 50 + 10,000 above, versus 10,000 before)
- Reduces total data volume, because repeated employee and project details are stored exactly once
- Leaves the associative table (Assignment here) carrying most of the rows
The Tradeoff:
| Approach | Degree | Cardinality | Pros | Cons |
|---|---|---|---|---|
| Denormalized (one wide table) | High | Medium | Fast reads, no joins | Redundancy, update anomalies |
| Normalized (many narrow tables) | Low | Varies | No redundancy, clean updates | Requires joins for queries |
Rule of Thumb:
Normalize by default, especially for write-heavy transactional workloads; denormalize selectively and deliberately for read-heavy analytical workloads where join cost dominates.
The degree of joined relations affects result width and memory consumption. Joining Employee (degree 8) with Department (degree 5) produces a result of degree 13 minus the number of shared attributes, at most 12 for a natural join with one common column. Wide results consume more memory for sorting, hashing, and network transfer.
We have thoroughly explored degree and cardinality—the fundamental dimensions that characterize every relation. Let's consolidate the essential understanding:
- Degree (arity) counts attributes; it is fixed by the schema and changes only through schema evolution.
- Cardinality counts tuples; it varies with every INSERT and DELETE and can be any value ≥ 0.
- Relational operations transform both dimensions in predictable ways, with well-defined bounds.
- Cardinality estimates drive query optimization; degree and cardinality together determine storage footprint.
What's Next:
Having understood the structure and dimensions of tuples and relations, we'll now explore tuple operations—the fundamental operations that create, retrieve, update, and delete tuples. This completes our understanding of tuples as both data structures and units of manipulation.
You now understand degree and cardinality—the two dimensions that characterize every relation. This knowledge powers capacity planning, query optimization analysis, and schema design decisions. In the final page of this module, we explore tuple operations.