Every relation exists in two dimensions: horizontally across its attributes and vertically down its tuples. These dimensions have precise names and profound implications for database design, query performance, and system capacity.
The degree of a relation refers to its width—the number of attributes it contains. The cardinality refers to its depth—the number of tuples it holds. Together, these measures characterize the structural properties of any relation.
Understanding degree and cardinality is essential for:
- Designing schemas with appropriate width and anticipating growth
- Analyzing query performance and optimizer behavior
- Planning storage capacity and system sizing
In this page, we explore these fundamental concepts with mathematical precision and practical insight, establishing the vocabulary and understanding that database professionals rely upon.
By the end of this page, you will understand the formal definitions of degree and cardinality, how these properties affect database operations, the relationship between relation dimensions and storage, and how these concepts apply to query analysis and optimization.
Definition (Degree of a Relation):
The degree (also called arity) of a relation R is the number of attributes in its schema. If R has schema R(A₁, A₂, ..., Aₙ), then the degree of R is n.
Notation:
If R has n attributes, we write deg(R) = n and call R an n-ary relation.
Examples:
```sql
-- Unary relation (degree 1)
Departments(department_name)
-- degree = 1
-- Each tuple has one attribute: the department name

-- Binary relation (degree 2)
Manages(employee_id, manager_id)
-- degree = 2
-- Each tuple has two attributes: who manages whom

-- Ternary relation (degree 3)
Supplies(supplier_id, part_id, quantity)
-- degree = 3
-- Each tuple links supplier, part, and quantity

-- Higher-arity relation
Employee(id, name, department, salary, hire_date, manager_id, email, phone)
-- degree = 8
-- Each tuple has eight attributes

-- Degree is a property of the SCHEMA, not the instance
-- It doesn't change when you add/remove tuples
```

Properties of Degree:
Degree is fixed by the schema. Once defined, a relation's degree doesn't change with data—it changes only with schema evolution (ALTER TABLE ADD COLUMN).
Degree ≥ 0. A relation with zero attributes is called a nullary relation; it can have at most one tuple (the empty tuple). This is a degenerate case rarely encountered, so in practice every useful relation has degree ≥ 1.
Degree affects storage. More attributes generally mean more bytes per tuple, though compression and null handling complicate this.
Degree affects query complexity. Joining on more attributes, selecting more columns, and processing wider rows all increase query costs.
| Degree | Term | Example |
|---|---|---|
| 1 | Unary | ValidCodes(code) |
| 2 | Binary | EmployeeDept(emp_id, dept_id) |
| 3 | Ternary | Supplies(supplier, part, qty) |
| 4 | Quaternary | Shipment(supplier, part, warehouse, date) |
| 5+ | n-ary | General relations with many attributes |
The projection operation produces a relation with (usually) lower degree. If Employee has degree 8, then π_{name, salary}(Employee) has degree 2. This is how we select a 'subset' of attributes from a wider relation.
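In SQL, projection corresponds to the SELECT list. A minimal sketch against the Employee schema shown earlier (note that SQL keeps duplicate rows unless DISTINCT is specified, whereas relational π removes them):

```sql
-- Degree drops from 8 to 2; cardinality is unchanged
-- unless DISTINCT collapses duplicate rows
SELECT name, salary
FROM Employee;

-- True relational projection: duplicates removed, so cardinality may shrink too
SELECT DISTINCT department
FROM Employee;
```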
Definition (Cardinality of a Relation):
The cardinality of a relation R is the number of tuples it contains. Since a relation is a set, cardinality equals the set's size.
Notation:
|R| denotes the cardinality of R, borrowing the set-theoretic notation for the size of a set.
Examples:
```sql
-- Small table
Departments = {(Engineering), (Marketing), (Sales), (HR)}
|Departments| = 4

-- Medium table
Employees with 1000 employees
|Employees| = 1000

-- Large table
WebLogs with 10 billion entries
|WebLogs| = 10,000,000,000

-- Empty table (valid relation)
NewHires = {} (no tuples yet)
|NewHires| = 0

-- Cardinality changes with DML operations:
-- INSERT increases cardinality by 1 (usually)
-- DELETE decreases cardinality
-- UPDATE doesn't change cardinality (replaces tuples)
```

Properties of Cardinality:
Cardinality varies with data. Unlike degree, cardinality changes as data is added or removed. Schema doesn't constrain cardinality (except through application logic).
Cardinality ≥ 0. An empty relation (with no tuples) is perfectly valid. It represents no facts matching the predicate.
Cardinality directly affects performance. Query time often scales with cardinality—scanning 1 million rows takes longer than scanning 1 thousand.
Maximum cardinality is bounded by domains. The maximum possible cardinality is the product of domain sizes: |dom(A₁)| × |dom(A₂)| × ... × |dom(Aₙ)|. For infinite domains (like integers or strings), this is unbounded.
Cardinality statistics drive optimization. Query optimizers maintain cardinality estimates for tables and use them to choose efficient execution plans.
A full table scan is O(cardinality). A join of two tables can produce up to cardinality(A) × cardinality(B) tuples. Index lookups reduce this dramatically. Modern query optimizers spend significant effort estimating cardinalities because join ordering and access method decisions depend on knowing how many tuples flow through each operation.
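Most systems expose these estimates directly. As an illustration in PostgreSQL syntax (the tables and join column are assumptions from the earlier examples), each node of an EXPLAIN plan reports an estimated row count, and the catalog keeps a per-table estimate:

```sql
-- Per-node cardinality estimates for a join (PostgreSQL syntax; hypothetical tables)
EXPLAIN
SELECT e.name, d.department_name
FROM Employee e
JOIN Departments d ON e.department = d.department_name;

-- Catalog-level row-count estimate maintained by the statistics collector
SELECT relname, reltuples::BIGINT AS estimated_rows
FROM pg_class
WHERE relname = 'employee';
```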
When we represent relations as tables, degree and cardinality correspond directly to the table's visual dimensions:
- Degree = the number of columns (the table's width)
- Cardinality = the number of rows (the table's height)
This intuitive mapping makes it easy to think about relation structure visually.
Dimensional Analysis:
We can think of a relation as a matrix-like structure with:
- n columns, one per attribute (the degree)
- m rows, one per tuple (the cardinality)
- n × m cells in total

For the Employee relation above (degree 8), an instance holding 1,000 tuples is an 8 × 1,000 structure containing 8,000 values.
Column Operations:
Operations that change degree:
- Projection (π) removes attributes
- Cartesian product and joins combine attribute sets
- Schema evolution (ALTER TABLE ADD/DROP COLUMN) adds or removes columns

Row Operations:
Operations that change cardinality:
- Selection (σ) filters tuples
- Set operations (∪, ∩, −) combine or remove tuples
- DML statements (INSERT, DELETE) add or remove rows
| Operation | Effect on Degree | Effect on Cardinality |
|---|---|---|
| Selection σ | Unchanged | Decreased or unchanged |
| Projection π | Decreased or unchanged | Decreased or unchanged (duplicates removed) |
| Cartesian Product × | Sum of degrees | Product of cardinalities |
| Natural Join ⋈ | Sum minus shared attributes | 0 to product of cardinalities |
| Union ∪ | Unchanged (must match) | Sum or less (no duplicates) |
| Intersection ∩ | Unchanged (must match) | Min of cardinalities or less |
| Difference − | Unchanged (must match) | Cardinality of first or less |
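The product and union rows of this table are easy to verify directly. A small sketch, assuming two hypothetical relations r and s with identical schemas, |r| = 100 and |s| = 50:

```sql
-- Cartesian product: cardinality is exactly multiplicative
SELECT COUNT(*) FROM r CROSS JOIN s;   -- exactly 100 × 50 = 5,000

-- Set union: duplicates eliminated, so the count lands between
-- max(|r|, |s|) = 100 and |r| + |s| = 150
SELECT COUNT(*) FROM (
  SELECT * FROM r
  UNION
  SELECT * FROM s
) AS u;
```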
Query optimization is fundamentally about choosing the most efficient execution plan. Cardinality estimation—predicting how many tuples flow through each operation—is the cornerstone of this process.
Why Estimation Matters:
Consider joining three tables: A, B, and C. The optimizer can evaluate (A ⋈ B) ⋈ C or A ⋈ (B ⋈ C); both produce the same result, but the intermediate results can differ enormously in size.
The costs can differ by orders of magnitude depending on which intermediate results are smallest. If |A ⋈ B| = 1000 and |B ⋈ C| = 10,000,000, we want to compute A ⋈ B first.
How Databases Estimate Cardinality:
- Base-table statistics: |Employees| = 50,000 is a known fact, kept up to date in the system catalog.
- Predicate selectivity: a condition such as salary > 100000 might select 10% of rows.

Selection Cardinality Formula:
For a selection σ_{condition}(R), the estimated cardinality is:
|σ_{condition}(R)| ≈ |R| × selectivity(condition)
Where selectivity is the fraction of tuples satisfying the condition:
- For an equality predicate A = v, a common default estimate is 1 / NDV(A), one over the number of distinct values of A.
- For range predicates such as salary > 100000, histograms of the column's value distribution provide the estimate.
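To see the formula in action, compare the optimizer's estimate with the true count. A sketch in PostgreSQL syntax, assuming the hypothetical Employees table with |Employees| = 50,000:

```sql
-- If the predicate's selectivity is estimated at 10%, the plan should show
-- roughly 50,000 × 0.10 = 5,000 in its rows= field
EXPLAIN
SELECT * FROM Employees WHERE salary > 100000;

-- The actual cardinality, for comparison against the estimate
SELECT COUNT(*) FROM Employees WHERE salary > 100000;
```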
Join Cardinality Formula:
For a join R ⋈ S with join attribute having NDV(R) and NDV(S) distinct values:
|R ⋈ S| ≈ (|R| × |S|) / max(NDV(R.attr), NDV(S.attr))
This assumes uniform distribution and referential integrity. Real queries may deviate significantly.
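The same estimate can be computed by hand from the data. A back-of-envelope sketch in PostgreSQL syntax, assuming the Orders and Customers tables used in the debugging example later on this page:

```sql
-- |R ⋈ S| ≈ (|R| × |S|) / max(NDV(R.attr), NDV(S.attr)), computed directly
SELECT
  (SELECT COUNT(*) FROM Orders)::NUMERIC
    * (SELECT COUNT(*) FROM Customers)
    / GREATEST(
        (SELECT COUNT(DISTINCT customer_id) FROM Orders),
        (SELECT COUNT(DISTINCT customer_id) FROM Customers)
      ) AS estimated_join_rows;
```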
Cardinality estimation is notoriously error-prone. Correlated columns, non-uniform distributions, complex predicates, and stale statistics lead to estimates that are orders of magnitude wrong. This causes poor plan choices—an ongoing challenge in database research.
Degree and cardinality directly impact storage requirements. Understanding this relationship helps in capacity planning and schema design.
Storage Size Formula (Simplified):
Storage ≈ cardinality × average_tuple_size
≈ cardinality × Σ(attribute_sizes)
≈ |R| × Σᵢ(size(Aᵢ))
For instance, a relation with cardinality 1,000,000 and an average tuple size of 100 bytes occupies roughly 100 MB, before accounting for indexes, page headers, and fragmentation.
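Real systems report actual sizes, which you can compare against the formula. A PostgreSQL-specific sketch (the employee table name is an assumption):

```sql
-- Estimated rows, raw heap size, and total size including indexes
SELECT
  reltuples::BIGINT                                   AS estimated_rows,
  pg_size_pretty(pg_relation_size('employee'))        AS heap_size,
  pg_size_pretty(pg_total_relation_size('employee'))  AS total_with_indexes
FROM pg_class
WHERE relname = 'employee';
```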
| Characteristic | Storage Impact | Example |
|---|---|---|
| High degree, low cardinality | Many columns, few rows—wide but short | Configuration table: 50 settings columns, 1 row |
| Low degree, high cardinality | Few columns, many rows—narrow but tall | Log table: 3 columns (timestamp, level, message), billions of rows |
| High degree, high cardinality | Wide and tall—significant storage | Analytics fact table: 100 columns, billions of rows |
| Low degree, low cardinality | Small—minimal storage concern | Lookup table: 2 columns (code, description), 50 rows |
Degree Considerations for Schema Design:

Wide tables (high degree):
- More bytes per tuple, so fewer tuples fit per page and scans incur more I/O
- Risk of row overflow when tuples exceed the page size
- Candidates for vertical partitioning when columns are rarely read together

Narrow tables (low degree):
- Compact tuples, so many fit per page and scans are efficient
- Often require joins to reassemble complete entities

Cardinality Considerations:

Large cardinality (millions+):
- Indexing, partitioning, and archiving strategies become essential
- Full table scans grow expensive and must be planned for explicitly

Small cardinality (hundreds/thousands):
- Often fits entirely in memory or cache
- Full scans are cheap; indexes may add overhead without benefit
Databases read data in pages (typically 4KB-16KB). If tuple size exceeds page size, the database must store tuples across multiple pages (overflow), which degrades performance. Very wide tables may benefit from vertical partitioning—splitting into multiple narrower tables with the same key.
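A minimal sketch of such a vertical partition, splitting the hypothetical 8-attribute Employee table from earlier into a frequently-read core and a rarely-read extension sharing the same key:

```sql
-- Hot columns: narrow rows, many tuples per page
CREATE TABLE employee_core (
  id          INT PRIMARY KEY,
  name        VARCHAR(100),
  department  VARCHAR(50),
  manager_id  INT
);

-- Cold columns: fetched only when needed, joined on the shared key
CREATE TABLE employee_extended (
  id         INT PRIMARY KEY REFERENCES employee_core(id),
  salary     NUMERIC(12,2),
  hire_date  DATE,
  email      VARCHAR(255),
  phone      VARCHAR(50)
);
```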
When analyzing relational operations, understanding the bounds on result cardinality helps validate query logic and estimate performance.
Minimum Cardinality:
The smallest possible size of an operation's result.

Maximum Cardinality:
The largest possible size of an operation's result.

For common operations, both bounds are summarized below:
| Operation | Minimum | Maximum | Notes |
|---|---|---|---|
| σ_condition(R) | 0 | |R| | All or none may pass filter |
| π_attrs(R) | 1 (0 if R is empty) | |R| | Duplicates may collapse tuples |
| R ∪ S | max(|R|, |S|) | |R| + |S| | Duplicates eliminated in set union |
| R ∩ S | 0 | min(|R|, |S|) | Intersection can't exceed smaller set |
| R − S | 0 | |R| | Difference can't exceed left operand |
| R × S | |R| × |S| | |R| × |S| | Exact: Cartesian product is multiplicative |
| R ⋈ S | 0 | |R| × |S| | Natural join ranges from empty to Cartesian |
Practical Application:
Knowing cardinality bounds helps:
Debugging Queries: If a join returns way more rows than expected, it might be performing a Cartesian product (missing join condition). If |R ⋈ S| ≈ |R| × |S|, the join condition is probably ineffective.
Validating Results: If selection returns 0 rows but you expected some, either the data doesn't match or the predicate is wrong.
Capacity Planning: Maximum cardinality gives worst-case storage needs. A report aggregating from a 1B-row table should not produce 1B rows (if it might, there's a design issue).
Query Optimization: Pushing down selections (reducing cardinality early) exploits the fact that smaller cardinality means faster subsequent operations.
```sql
-- Example: Debugging an unexpectedly large join result

-- Expected: ~10,000 orders with their customers
-- Actual result: 100,000,000 rows (suspiciously close to 10,000 × 10,000)
SELECT *
FROM Orders, Customers;  -- BUG: Missing join condition!

-- Fix: Add the join condition
SELECT *
FROM Orders o
JOIN Customers c ON o.customer_id = c.customer_id;
-- Now returns ~10,000 rows as expected

-- Checking cardinality of intermediate results
EXPLAIN ANALYZE
SELECT c.name, COUNT(o.order_id)
FROM Customers c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
WHERE c.country = 'USA'
GROUP BY c.name;
-- Shows estimated vs actual cardinality at each step
```

Normalization—the process of organizing relations to reduce redundancy—has direct effects on degree and cardinality. Understanding this relationship illuminates the tradeoffs in database design.
Effect of Normalization on Degree:
When you normalize a relation (decompose it into smaller relations), each resulting relation has a lower degree than the original, while the number of relations grows.
Example: Decomposing an Unnormalized Relation:
Before Normalization:
EmployeeProject(
emp_id,
emp_name,
emp_department,
project_id,
project_name,
project_budget,
hours_worked
)
Degree: 7, Cardinality: 10,000 (employee and project details repeated on every assignment row)
After Normalization:
Employee(emp_id, emp_name, dept)
Degree: 3, Cardinality: 500
Project(project_id, name, budget)
Degree: 3, Cardinality: 50
Assignment(emp_id, proj_id, hours)
Degree: 3, Cardinality: 10,000
Cardinality Changes with Normalization:
Normalization typically:
- Splits one large relation into several smaller ones, so total row count across tables may grow (500 + 50 + 10,000 above, versus 10,000 before)
- Reduces total data volume, because repeated employee and project details are stored exactly once
- Leaves the associative table (Assignment here) carrying most of the rows
The Tradeoff:
| Approach | Degree | Cardinality | Pros | Cons |
|---|---|---|---|---|
| Denormalized (one wide table) | High | Medium | Fast reads, no joins | Redundancy, update anomalies |
| Normalized (many narrow tables) | Low | Varies | No redundancy, clean updates | Requires joins for queries |
Rule of Thumb:
Normalize by default, especially for write-heavy transactional workloads; denormalize selectively and deliberately for read-heavy analytical workloads where join cost dominates.
The degree of joined relations affects result width and memory consumption. Joining Employee (degree 8) with Department (degree 5) produces a result of degree 13 minus the number of shared attributes, at most 12 for a natural join with one common column. Wide results consume more memory for sorting, hashing, and network transfer.
We have thoroughly explored degree and cardinality—the fundamental dimensions that characterize every relation. Let's consolidate the essential understanding:
- Degree (arity) counts attributes; it is fixed by the schema and changes only through schema evolution.
- Cardinality counts tuples; it varies with every INSERT and DELETE and can be any value ≥ 0.
- Relational operations transform both dimensions in predictable ways, with well-defined bounds.
- Cardinality estimates drive query optimization; degree and cardinality together determine storage footprint.
What's Next:
Having understood the structure and dimensions of tuples and relations, we'll now explore tuple operations—the fundamental operations that create, retrieve, update, and delete tuples. This completes our understanding of tuples as both data structures and units of manipulation.
You now understand degree and cardinality—the two dimensions that characterize every relation. This knowledge powers capacity planning, query optimization analysis, and schema design decisions. In the final page of this module, we explore tuple operations.