Database Management SystemsTuples and Rows

Tuples and Rows: The Atomic Units of Relational Data

LevelBeginner

Duration60 mins

TopicTuples and Rows

3 / 5

Tuple Uniqueness

The Uniqueness Principle

In the relational model, a fundamental principle governs the structure of relations: no two tuples in a relation can be identical. This isn't a limitation or a design choice—it's a logical necessity that flows directly from the mathematical foundation of the model.

Since relations are defined as sets of tuples, and sets by definition cannot contain duplicate elements, the uniqueness of tuples is axiomatic. Just as the mathematical set {1, 2, 3, 3} is identical to {1, 2, 3}, a relation cannot logically contain the same tuple twice.

But tuple uniqueness is more than a mathematical technicality. It has profound implications for data integrity, query semantics, and the entire mechanism of keys and constraints. In this page, we explore tuple uniqueness from multiple perspectives, understanding both its theoretical basis and its practical ramifications.

What You Will Learn

By the end of this page, you will understand why tuples must be unique, how this requirement relates to the set-theoretic definition of relations, what makes two tuples 'the same,' how keys enforce uniqueness, and the implications of uniqueness for data modeling and SQL databases.

The Set-Theoretic Foundation

To understand why tuples must be unique, we must revisit the mathematical definition of a relation.

Definition Reminder:

A relation R over a schema with attributes A₁, A₂, ..., Aₙ and corresponding domains D₁, D₂, ..., Dₙ is a subset of the Cartesian product D₁ × D₂ × ... × Dₙ.

The key word is subset. A subset is a set, and sets have a defining property: each element appears at most once. The question "How many times does element x appear in set S?" has only two answers: zero or one.

Formal Statement:

For any relation R, if tuples t₁ and t₂ are both in R, and t₁ = t₂ (they are identical), then they are the same tuple, not two copies of a tuple. The relation contains one instance of this tuple.

This is not a rule we impose—it's a consequence of what "relation" means mathematically.

Sets vs. Multisets (Bags)
Property	Set (Relation)	Multiset (Bag)
Duplicate elements	Not allowed	Allowed
Element count	0 or 1 per element	0, 1, 2, 3, ... per element
{1, 1, 2, 2, 3}	= {1, 2, 3}	≠ {1, 2, 3} (different multiplicities)
Membership question	Is x in S?	How many times is x in S?
Used in	Relational model (pure)	SQL query results (often)

SQL Tables vs. Relations

SQL tables are NOT pure relations—they can contain duplicate rows (unless constrained by UNIQUE or PRIMARY KEY). SQL query results are often multisets unless DISTINCT is specified. This deviation from relational theory is a practical concession, but it introduces complications that pure relations avoid.

What Makes Two Tuples Equal?

Understanding tuple uniqueness requires a precise definition of tuple equality. When are two tuples considered "the same"?

Definition (Tuple Equality):

Two tuples t₁ and t₂ over the same schema R = {A₁, A₂, ..., Aₙ} are equal (t₁ = t₂) if and only if for every attribute Aᵢ in R:

t₁[Aᵢ] = t₂[Aᵢ]

In other words, two tuples are equal if they have the same value for every attribute. If even one attribute differs, the tuples are distinct.

Examples:

tuple_equality.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
Schema: Employee(ID, Name, Department, Salary)
 
-- These tuples are EQUAL (same values for all attributes):
t1 = {ID: 101, Name: 'Alice', Department: 'Engineering', Salary: 95000}
t2 = {ID: 101, Name: 'Alice', Department: 'Engineering', Salary: 95000}
Result: t1 = t2 ✓
 
-- These tuples are NOT EQUAL (Name differs):
t3 = {ID: 101, Name: 'Alice', Department: 'Engineering', Salary: 95000}
t4 = {ID: 101, Name: 'Alicia', Department: 'Engineering', Salary: 95000}
Result: t3 ≠ t4 ✗
 
-- These tuples are NOT EQUAL (Salary differs):
t5 = {ID: 102, Name: 'Bob', Department: 'Marketing', Salary: 78000}
t6 = {ID: 102, Name: 'Bob', Department: 'Marketing', Salary: 78001}
Result: t5 ≠ t6 ✗
 
-- Order of attributes doesn't affect equality:
t7 = {ID: 103, Name: 'Carol', Salary: 90000, Department: 'Sales'}
t8 = {Department: 'Sales', ID: 103, Salary: 90000, Name: 'Carol'}
Result: t7 = t8 ✓ (same values for same attributes)

The NULL Complication:

Tuple equality becomes subtle when NULL values are involved. In standard three-valued logic:

NULL = NULL evaluates to UNKNOWN, not TRUE
This means two tuples with NULL in the same attribute might not be considered "equal" by simple comparison

However, for the purpose of duplicate detection in SQL:

Most databases treat NULLs as duplicates of each other in GROUP BY and DISTINCT
But two rows with NULLs in a UNIQUE column are typically both allowed (NULL ≠ NULL for constraint checking in standard SQL)

This is one of the well-known inconsistencies in SQL's handling of NULL.

NULL and Uniqueness

Standard SQL allows multiple NULLs in a column with a UNIQUE constraint because NULL represents 'unknown,' and two unknown values aren't necessarily the same. However, this varies by database: some (like SQL Server) allow only one NULL per unique column by default. Always check your database's specific behavior.

The Semantic Case for Uniqueness

Beyond the mathematical definition, there's a compelling semantic argument for tuple uniqueness: tuples represent facts, and stating the same fact twice is meaningless.

Tuples as Propositions:

Recall that each tuple in a relation asserts a proposition. Consider the Enrolled relation:

Enrolled(StudentID, CourseID, Semester)

The tuple {StudentID: 'S001', CourseID: 'CS101', Semester: 'Fall2024'} asserts:

"Student S001 is enrolled in course CS101 during Fall2024."

Why No Duplicates?

If this tuple appeared twice, what would the second occurrence mean?

It can't mean "enrolled twice"—a student is either enrolled or not
It adds no new information
It's logically redundant

In propositional logic, asserting P and P is the same as asserting P. Redundancy adds no meaning.

Why Duplicate Facts Are Problematic

•Wasted Storage: Redundant data consumes space without adding information.
•Inconsistent Counts: Queries like 'How many students are enrolled?' would return incorrect results.
•Update Anomalies: Changing an employee's salary means updating multiple identical rows—miss one and the data is inconsistent.
•Delete Anomalies: Deleting 'one occurrence' of a fact is semantically meaningless if the fact should be true or false, not counted.
•Logical Confusion: The database would represent something (multiplicity of facts) that has no real-world meaning.

When Multiplicity Matters:

Sometimes, multiplicity IS meaningful. If a customer can place the same order twice (buying the same item on different occasions), that's two distinct facts—but they should differ in some attribute (order date, order number, transaction ID).

Consider:

-- WRONG: No way to distinguish two purchases of the same item
Purchases(CustomerID, ProductID)

-- RIGHT: Each purchase is a distinct fact
Purchases(PurchaseID, CustomerID, ProductID, PurchaseDate)

By adding attributes that distinguish occurrences (like PurchaseID or PurchaseDate), each purchase becomes a unique tuple representing a unique fact.

Designing for Uniqueness

When designing relations, ask: 'What distinguishes one occurrence from another?' If seemingly identical tuples could occur, you're missing an attribute that captures the distinction. Adding that attribute makes each tuple unique and each fact distinct.

Keys: The Mechanism for Enforcing Uniqueness

In practice, databases enforce tuple uniqueness through keys—sets of attributes that uniquely identify tuples. Let's examine the key hierarchy and how each level relates to uniqueness.

Superkeys:

A superkey is any set of attributes that uniquely identifies tuples. In an Employees relation, the following might all be superkeys:

{ID} — assuming IDs are unique
{ID, Name} — still unique (ID alone suffices)
{ID, Name, Department} — still unique
{Email} — assuming emails are unique
{SSN} — assuming social security numbers are unique

A superkey may contain 'extra' attributes that aren't needed for uniqueness.

Candidate Keys:

A candidate key is a minimal superkey—no proper subset is also a superkey. If {ID} uniquely identifies tuples, and you can't remove any attribute (there's only one), then {ID} is a candidate key. If {FirstName, LastName, BirthDate} is a superkey, it's a candidate key only if no proper subset (like {FirstName, LastName}) also uniquely identifies tuples.

Primary Key:

The primary key is the candidate key chosen as the official identifier for tuples. By convention, primary keys:

Are declared explicitly in the schema
Cannot contain NULL values
Are stable (rarely changed)
Are often simple (few columns)

Converting Mermaid diagram...

How Keys Enforce Uniqueness:

The uniqueness constraint on a key means: no two distinct tuples can have the same key values. Since keys uniquely identify tuples, this directly enforces tuple uniqueness.

If {ID} is the primary key of Employees:

Tuple with ID=101 can exist at most once
Attempting to insert another tuple with ID=101 violates the key constraint
The tuples may differ in other attributes, but they cannot share the same ID

This is how databases practically ensure that relations remain sets of unique tuples.

key_constraints.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- Creating a table with primary key
CREATE TABLE Employees (
    employee_id INT PRIMARY KEY,  -- Uniqueness enforced
    name VARCHAR(100) NOT NULL,
    department VARCHAR(50),
    salary DECIMAL(10, 2)
);
 
-- Attempting to insert duplicate key fails
INSERT INTO Employees VALUES (101, 'Alice', 'Engineering', 95000);
INSERT INTO Employees VALUES (101, 'Bob', 'Marketing', 78000);
-- ERROR: Duplicate entry '101' for key 'PRIMARY'
 
-- Unique constraints on other columns
CREATE TABLE Employees (
    employee_id INT PRIMARY KEY,
    email VARCHAR(255) UNIQUE,     -- Alternative key
    ssn CHAR(11) UNIQUE,           -- Another alternative key
    name VARCHAR(100) NOT NULL
);
 
-- Composite primary key
CREATE TABLE Enrollments (
    student_id INT,
    course_id INT,
    semester VARCHAR(20),
    grade CHAR(2),
    PRIMARY KEY (student_id, course_id, semester)
);
-- A student can enroll in the same course in different semesters
-- but not twice in the same semester

SQL's Departure from Pure Relations

While the relational model prohibits duplicate tuples, SQL as implemented in real databases has a more complex relationship with uniqueness. Understanding this departure is essential for practical database work.

SQL Tables Can Have Duplicates:

Unless explicitly constrained, SQL tables can contain duplicate rows:

CREATE TABLE Messages (
    sender VARCHAR(100),
    content TEXT,
    sent_at TIMESTAMP
);  -- No primary key, no unique constraint

INSERT INTO Messages VALUES ('Alice', 'Hello', '2024-01-15 10:00:00');
INSERT INTO Messages VALUES ('Alice', 'Hello', '2024-01-15 10:00:00');
-- Both inserts succeed—duplicate rows exist

This is a practical concession for performance (checking uniqueness has overhead) and flexibility, but it deviates from relational theory.

Query Results Are Often Not Sets:

SQL queries by default return multisets (bags), not sets:

sql_duplicates.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- This query may return duplicate rows
SELECT department FROM Employees;
-- Returns: Engineering, Marketing, Engineering, Sales, Engineering, ...
 
-- To get unique values (a true set), use DISTINCT
SELECT DISTINCT department FROM Employees;
-- Returns: Engineering, Marketing, Sales, ...
 
-- Aggregate queries naturally produce distinct groups
SELECT department, COUNT(*) FROM Employees GROUP BY department;
-- Each department appears once
 
-- UNION vs UNION ALL
SELECT department FROM Employees WHERE salary > 80000
UNION
SELECT department FROM Employees WHERE name LIKE 'A%';
-- UNION eliminates duplicates (set union)
 
SELECT department FROM Employees WHERE salary > 80000
UNION ALL
SELECT department FROM Employees WHERE name LIKE 'A%';
-- UNION ALL preserves duplicates (multiset union)

Why SQL Allows This:

Performance: Eliminating duplicates requires sorting or hashing, which is expensive. Allowing duplicates by default is faster when duplicates don't matter.
Practical Needs: Sometimes you genuinely want to count occurrences. If three employees are in Engineering, you might want the department to appear three times in a projection.
Historical Compatibility: Early SQL implementations predated strict adherence to relational theory.

Best Practices:

Always define PRIMARY KEY or UNIQUE constraints on tables that represent relations
Use DISTINCT when duplicate-free results are needed
Be aware that aggregate queries inherently produce unique rows per group
Prefer UNION over UNION ALL when you want set semantics

Duplicates Can Cause Subtle Bugs

Unexpected duplicates in query results can cause incorrect counts, sums, and joins. A common bug: joining tables that multiply rows unexpectedly (many-to-many join with missing join condition), then aggregating without DISTINCT. Always verify row counts during query development.

Tuple Identification and Addressing

Because tuples in a relation are unique, we can reliably identify any tuple using its key values. This identification enables precise data manipulation.

Declarative Identification:

In the relational model, tuples are identified by their values, not by physical location:

-- Identify the employee with ID 101
SELECT * FROM Employees WHERE employee_id = 101;

-- Update that specific employee
UPDATE Employees SET salary = 100000 WHERE employee_id = 101;

-- Delete that specific employee
DELETE FROM Employees WHERE employee_id = 101;

The key value (101) uniquely identifies the tuple. We don't say "delete the 5th row" or "update the record at position 42." We say "the employee with ID 101."

Benefits of Value-Based Identification

•Location Independence: Tuples can move physically without changing their identity. Indexes, storage reorganization, and replication don't affect how we reference data.
•Meaningful References: Foreign keys reference real-world identifiers ('employee_id'), not arbitrary positions. Relationships are semantically clear.
•Query Stability: The query 'employee with ID 101' always means the same thing, regardless of table size, physical layout, or execution plan.
•Distributed Systems: Data can be sharded, partitioned, or replicated. A key identifies a tuple no matter which node stores it.

Physical Identifiers (Implementation Detail):

Some databases expose physical row identifiers:

Oracle ROWID: A pseudo-column containing physical row location (file, block, row-within-block)
PostgreSQL ctid: A tuple identifier (page, tuple-within-page)
SQL Server %%physloc%%: Physical locator

These are implementation details, NOT part of the relational model. They can change after updates, vacuums, or reorganization. Never use physical identifiers as primary keys or in application logic:

-- BAD: Using physical ID (may change)
SELECT * FROM Employees WHERE ctid = '(0,1)';

-- GOOD: Using logical key (stable)
SELECT * FROM Employees WHERE employee_id = 101;

Surrogate Keys

When no natural key exists (or natural keys are too complex or volatile), databases use surrogate keys—system-generated identifiers with no business meaning. Auto-increment integers, UUIDs, and sequences serve this purpose. They guarantee uniqueness without depending on business attributes.

Uniqueness and Data Integrity

Tuple uniqueness is intimately connected to data integrity—the accuracy, consistency, and reliability of data in a database. Let's explore how uniqueness constraints protect data quality.

Entity Integrity:

The Entity Integrity Rule states that no attribute of a primary key can be NULL. Combined with uniqueness, this ensures:

Every tuple is identifiable (has a complete key)
No two tuples share the same identity
References to tuples are unambiguous

Preventing Update Anomalies:

When data is duplicated, updates must touch all copies to remain consistent. With uniqueness enforced:

Without Uniqueness

•Same employee appears multiple times
•Salary update changes one row, not others
•Different rows claim different salaries for same person
•Data becomes inconsistent
•No way to know which value is 'correct'

With Uniqueness

•Each employee exists exactly once
•Salary update affects exactly one row
•Only one salary value exists per employee
•Data remains consistent by construction
•The single value is definitionally correct

Referential Integrity:

Foreign keys reference primary keys. For this to work correctly, the referenced tuple must be unique and identifiable:

CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    customer_id INT REFERENCES Customers(customer_id),
    order_date DATE
);

If customer_id weren't unique in Customers, the foreign key reference would be ambiguous—which customer does order 12345 belong to?

Business Rule Enforcement:

Many business rules are naturally expressed as uniqueness constraints:

Each employee has one SSN (UNIQUE on ssn)
Each product has one SKU (UNIQUE on sku)
At most one manager per department (UNIQUE on department if storing manager there)
Email addresses are unique across users (UNIQUE on email)

uniqueness_constraints.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Multiple uniqueness constraints for data integrity
CREATE TABLE Users (
    user_id INT PRIMARY KEY,           -- Unique system identifier
    username VARCHAR(50) UNIQUE,       -- Unique login name
    email VARCHAR(255) UNIQUE,         -- Unique email
    phone VARCHAR(20) UNIQUE,          -- Unique phone number
    created_at TIMESTAMP DEFAULT NOW()
);
 
-- Composite uniqueness: one vote per user per poll
CREATE TABLE Votes (
    vote_id INT PRIMARY KEY,
    user_id INT REFERENCES Users(user_id),
    poll_id INT REFERENCES Polls(poll_id),
    choice VARCHAR(100),
    voted_at TIMESTAMP,
    UNIQUE(user_id, poll_id)  -- User can vote only once per poll
);
 
-- Partial uniqueness (PostgreSQL): unique email among active users
CREATE UNIQUE INDEX unique_active_email 
ON Users(email) 
WHERE status = 'active';

Summary: The Uniqueness Imperative

We have thoroughly explored tuple uniqueness—from its mathematical foundations to its practical applications. Let's consolidate the essential understanding:

Key Takeaways

•Relations are sets, so tuples must be unique — This follows directly from set theory: sets cannot contain duplicate elements.
•Two tuples are equal if all attribute values match — Every attribute must have the same value for tuples to be considered identical.
•Duplicate facts are semantically meaningless — A proposition is true once; stating it twice adds no information.
•Keys are the mechanism for enforcing uniqueness — Primary keys and unique constraints ensure no duplicates exist.
•SQL deviates from pure relations — Tables can have duplicates unless constrained; query results are multisets by default.
•Value-based identification is fundamental — Tuples are identified by their key values, not by physical position.
•Uniqueness enables data integrity — It prevents update anomalies, enables foreign keys, and enforces business rules.

What's Next:

Having explored uniqueness at the tuple level, we'll now zoom out to examine degree and cardinality—the dimensions of relations. How many attributes does a relation have? How many tuples? What are the implications of these measures? This perspective completes our understanding of relation structure.

Page Complete

You now possess a rigorous understanding of tuple uniqueness in the relational model. This principle underlies key constraints, data integrity, and the very definition of what a relation is. In the next page, we explore relation dimensions through degree and cardinality.

3 / 5

Loading learning content...

Database Management SystemsTuples and Rows

Tuples and Rows: The Atomic Units of Relational Data

LevelBeginner

Duration60 mins

TopicTuples and Rows

3 / 5

Tuple Uniqueness

The Uniqueness Principle

What You Will Learn

The Set-Theoretic Foundation

To understand why tuples must be unique, we must revisit the mathematical definition of a relation.

Definition Reminder:

A relation R over a schema with attributes A₁, A₂, ..., Aₙ and corresponding domains D₁, D₂, ..., Dₙ is a subset of the Cartesian product D₁ × D₂ × ... × Dₙ.

Formal Statement:

This is not a rule we impose—it's a consequence of what "relation" means mathematically.

Sets vs. Multisets (Bags)
Property	Set (Relation)	Multiset (Bag)
Duplicate elements	Not allowed	Allowed
Element count	0 or 1 per element	0, 1, 2, 3, ... per element
{1, 1, 2, 2, 3}	= {1, 2, 3}	≠ {1, 2, 3} (different multiplicities)
Membership question	Is x in S?	How many times is x in S?
Used in	Relational model (pure)	SQL query results (often)

SQL Tables vs. Relations

What Makes Two Tuples Equal?

Understanding tuple uniqueness requires a precise definition of tuple equality. When are two tuples considered "the same"?

Definition (Tuple Equality):

Two tuples t₁ and t₂ over the same schema R = {A₁, A₂, ..., Aₙ} are equal (t₁ = t₂) if and only if for every attribute Aᵢ in R:

t₁[Aᵢ] = t₂[Aᵢ]

In other words, two tuples are equal if they have the same value for every attribute. If even one attribute differs, the tuples are distinct.

Examples:

tuple_equality.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
Schema: Employee(ID, Name, Department, Salary)
 
-- These tuples are EQUAL (same values for all attributes):
t1 = {ID: 101, Name: 'Alice', Department: 'Engineering', Salary: 95000}
t2 = {ID: 101, Name: 'Alice', Department: 'Engineering', Salary: 95000}
Result: t1 = t2 ✓
 
-- These tuples are NOT EQUAL (Name differs):
t3 = {ID: 101, Name: 'Alice', Department: 'Engineering', Salary: 95000}
t4 = {ID: 101, Name: 'Alicia', Department: 'Engineering', Salary: 95000}
Result: t3 ≠ t4 ✗
 
-- These tuples are NOT EQUAL (Salary differs):
t5 = {ID: 102, Name: 'Bob', Department: 'Marketing', Salary: 78000}
t6 = {ID: 102, Name: 'Bob', Department: 'Marketing', Salary: 78001}
Result: t5 ≠ t6 ✗
 
-- Order of attributes doesn't affect equality:
t7 = {ID: 103, Name: 'Carol', Salary: 90000, Department: 'Sales'}
t8 = {Department: 'Sales', ID: 103, Salary: 90000, Name: 'Carol'}
Result: t7 = t8 ✓ (same values for same attributes)

The NULL Complication:

Tuple equality becomes subtle when NULL values are involved. In standard three-valued logic:

NULL = NULL evaluates to UNKNOWN, not TRUE
This means two tuples with NULL in the same attribute might not be considered "equal" by simple comparison

However, for the purpose of duplicate detection in SQL:

Most databases treat NULLs as duplicates of each other in GROUP BY and DISTINCT
But two rows with NULLs in a UNIQUE column are typically both allowed (NULL ≠ NULL for constraint checking in standard SQL)

This is one of the well-known inconsistencies in SQL's handling of NULL.

NULL and Uniqueness

The Semantic Case for Uniqueness

Beyond the mathematical definition, there's a compelling semantic argument for tuple uniqueness: tuples represent facts, and stating the same fact twice is meaningless.

Tuples as Propositions:

Recall that each tuple in a relation asserts a proposition. Consider the Enrolled relation:

Enrolled(StudentID, CourseID, Semester)

The tuple {StudentID: 'S001', CourseID: 'CS101', Semester: 'Fall2024'} asserts:

"Student S001 is enrolled in course CS101 during Fall2024."

Why No Duplicates?

If this tuple appeared twice, what would the second occurrence mean?

It can't mean "enrolled twice"—a student is either enrolled or not
It adds no new information
It's logically redundant

In propositional logic, asserting P and P is the same as asserting P. Redundancy adds no meaning.

Why Duplicate Facts Are Problematic

•Wasted Storage: Redundant data consumes space without adding information.
•Inconsistent Counts: Queries like 'How many students are enrolled?' would return incorrect results.
•Update Anomalies: Changing an employee's salary means updating multiple identical rows—miss one and the data is inconsistent.
•Delete Anomalies: Deleting 'one occurrence' of a fact is semantically meaningless if the fact should be true or false, not counted.
•Logical Confusion: The database would represent something (multiplicity of facts) that has no real-world meaning.

When Multiplicity Matters:

Consider:

-- WRONG: No way to distinguish two purchases of the same item
Purchases(CustomerID, ProductID)

-- RIGHT: Each purchase is a distinct fact
Purchases(PurchaseID, CustomerID, ProductID, PurchaseDate)

By adding attributes that distinguish occurrences (like PurchaseID or PurchaseDate), each purchase becomes a unique tuple representing a unique fact.

Designing for Uniqueness

Keys: The Mechanism for Enforcing Uniqueness

In practice, databases enforce tuple uniqueness through keys—sets of attributes that uniquely identify tuples. Let's examine the key hierarchy and how each level relates to uniqueness.

Superkeys:

A superkey is any set of attributes that uniquely identifies tuples. In an Employees relation, the following might all be superkeys:

{ID} — assuming IDs are unique
{ID, Name} — still unique (ID alone suffices)
{ID, Name, Department} — still unique
{Email} — assuming emails are unique
{SSN} — assuming social security numbers are unique

A superkey may contain 'extra' attributes that aren't needed for uniqueness.

Candidate Keys:

Primary Key:

The primary key is the candidate key chosen as the official identifier for tuples. By convention, primary keys:

Are declared explicitly in the schema
Cannot contain NULL values
Are stable (rarely changed)
Are often simple (few columns)

Converting Mermaid diagram...

How Keys Enforce Uniqueness:

The uniqueness constraint on a key means: no two distinct tuples can have the same key values. Since keys uniquely identify tuples, this directly enforces tuple uniqueness.

If {ID} is the primary key of Employees:

Tuple with ID=101 can exist at most once
Attempting to insert another tuple with ID=101 violates the key constraint
The tuples may differ in other attributes, but they cannot share the same ID

This is how databases practically ensure that relations remain sets of unique tuples.

key_constraints.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
-- Creating a table with primary key
CREATE TABLE Employees (
    employee_id INT PRIMARY KEY,  -- Uniqueness enforced
    name VARCHAR(100) NOT NULL,
    department VARCHAR(50),
    salary DECIMAL(10, 2)
);
 
-- Attempting to insert duplicate key fails
INSERT INTO Employees VALUES (101, 'Alice', 'Engineering', 95000);
INSERT INTO Employees VALUES (101, 'Bob', 'Marketing', 78000);
-- ERROR: Duplicate entry '101' for key 'PRIMARY'
 
-- Unique constraints on other columns
CREATE TABLE Employees (
    employee_id INT PRIMARY KEY,
    email VARCHAR(255) UNIQUE,     -- Alternative key
    ssn CHAR(11) UNIQUE,           -- Another alternative key
    name VARCHAR(100) NOT NULL
);
 
-- Composite primary key
CREATE TABLE Enrollments (
    student_id INT,
    course_id INT,
    semester VARCHAR(20),
    grade CHAR(2),
    PRIMARY KEY (student_id, course_id, semester)
);
-- A student can enroll in the same course in different semesters
-- but not twice in the same semester

SQL's Departure from Pure Relations

SQL Tables Can Have Duplicates:

Unless explicitly constrained, SQL tables can contain duplicate rows:

CREATE TABLE Messages (
    sender VARCHAR(100),
    content TEXT,
    sent_at TIMESTAMP
);  -- No primary key, no unique constraint

INSERT INTO Messages VALUES ('Alice', 'Hello', '2024-01-15 10:00:00');
INSERT INTO Messages VALUES ('Alice', 'Hello', '2024-01-15 10:00:00');
-- Both inserts succeed—duplicate rows exist

This is a practical concession for performance (checking uniqueness has overhead) and flexibility, but it deviates from relational theory.

Query Results Are Often Not Sets:

SQL queries by default return multisets (bags), not sets:

sql_duplicates.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
-- This query may return duplicate rows
SELECT department FROM Employees;
-- Returns: Engineering, Marketing, Engineering, Sales, Engineering, ...
 
-- To get unique values (a true set), use DISTINCT
SELECT DISTINCT department FROM Employees;
-- Returns: Engineering, Marketing, Sales, ...
 
-- Aggregate queries naturally produce distinct groups
SELECT department, COUNT(*) FROM Employees GROUP BY department;
-- Each department appears once
 
-- UNION vs UNION ALL
SELECT department FROM Employees WHERE salary > 80000
UNION
SELECT department FROM Employees WHERE name LIKE 'A%';
-- UNION eliminates duplicates (set union)
 
SELECT department FROM Employees WHERE salary > 80000
UNION ALL
SELECT department FROM Employees WHERE name LIKE 'A%';
-- UNION ALL preserves duplicates (multiset union)

Why SQL Allows This:

Performance: Eliminating duplicates requires sorting or hashing, which is expensive. Allowing duplicates by default is faster when duplicates don't matter.
Practical Needs: Sometimes you genuinely want to count occurrences. If three employees are in Engineering, you might want the department to appear three times in a projection.
Historical Compatibility: Early SQL implementations predated strict adherence to relational theory.

Best Practices:

Always define PRIMARY KEY or UNIQUE constraints on tables that represent relations
Use DISTINCT when duplicate-free results are needed
Be aware that aggregate queries inherently produce unique rows per group
Prefer UNION over UNION ALL when you want set semantics

Duplicates Can Cause Subtle Bugs

Tuple Identification and Addressing

Because tuples in a relation are unique, we can reliably identify any tuple using its key values. This identification enables precise data manipulation.

Declarative Identification:

In the relational model, tuples are identified by their values, not by physical location:

-- Identify the employee with ID 101
SELECT * FROM Employees WHERE employee_id = 101;

-- Update that specific employee
UPDATE Employees SET salary = 100000 WHERE employee_id = 101;

-- Delete that specific employee
DELETE FROM Employees WHERE employee_id = 101;

The key value (101) uniquely identifies the tuple. We don't say "delete the 5th row" or "update the record at position 42." We say "the employee with ID 101."

Benefits of Value-Based Identification

•Location Independence: Tuples can move physically without changing their identity. Indexes, storage reorganization, and replication don't affect how we reference data.
•Meaningful References: Foreign keys reference real-world identifiers ('employee_id'), not arbitrary positions. Relationships are semantically clear.
•Query Stability: The query 'employee with ID 101' always means the same thing, regardless of table size, physical layout, or execution plan.
•Distributed Systems: Data can be sharded, partitioned, or replicated. A key identifies a tuple no matter which node stores it.

Physical Identifiers (Implementation Detail):

Some databases expose physical row identifiers:

Oracle ROWID: A pseudo-column containing physical row location (file, block, row-within-block)
PostgreSQL ctid: A tuple identifier (page, tuple-within-page)
SQL Server %%physloc%%: Physical locator

These are implementation details, NOT part of the relational model. They can change after updates, vacuums, or reorganization. Never use physical identifiers as primary keys or in application logic:

-- BAD: Using physical ID (may change)
SELECT * FROM Employees WHERE ctid = '(0,1)';

-- GOOD: Using logical key (stable)
SELECT * FROM Employees WHERE employee_id = 101;

Surrogate Keys

Uniqueness and Data Integrity

Tuple uniqueness is intimately connected to data integrity—the accuracy, consistency, and reliability of data in a database. Let's explore how uniqueness constraints protect data quality.

Entity Integrity:

The Entity Integrity Rule states that no attribute of a primary key can be NULL. Combined with uniqueness, this ensures:

Every tuple is identifiable (has a complete key)
No two tuples share the same identity
References to tuples are unambiguous

Preventing Update Anomalies:

When data is duplicated, updates must touch all copies to remain consistent. With uniqueness enforced:

Without Uniqueness

•Same employee appears multiple times
•Salary update changes one row, not others
•Different rows claim different salaries for same person
•Data becomes inconsistent
•No way to know which value is 'correct'

With Uniqueness

•Each employee exists exactly once
•Salary update affects exactly one row
•Only one salary value exists per employee
•Data remains consistent by construction
•The single value is definitionally correct

Referential Integrity:

Foreign keys reference primary keys. For this to work correctly, the referenced tuple must be unique and identifiable:

CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    customer_id INT REFERENCES Customers(customer_id),
    order_date DATE
);

If customer_id weren't unique in Customers, the foreign key reference would be ambiguous—which customer does order 12345 belong to?

Business Rule Enforcement:

Many business rules are naturally expressed as uniqueness constraints:

Each employee has one SSN (UNIQUE on ssn)
Each product has one SKU (UNIQUE on sku)
At most one manager per department (UNIQUE on department if storing manager there)
Email addresses are unique across users (UNIQUE on email)

uniqueness_constraints.sql
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
-- Multiple uniqueness constraints for data integrity
CREATE TABLE Users (
    user_id INT PRIMARY KEY,           -- Unique system identifier
    username VARCHAR(50) UNIQUE,       -- Unique login name
    email VARCHAR(255) UNIQUE,         -- Unique email
    phone VARCHAR(20) UNIQUE,          -- Unique phone number
    created_at TIMESTAMP DEFAULT NOW()
);
 
-- Composite uniqueness: one vote per user per poll
CREATE TABLE Votes (
    vote_id INT PRIMARY KEY,
    user_id INT REFERENCES Users(user_id),
    poll_id INT REFERENCES Polls(poll_id),
    choice VARCHAR(100),
    voted_at TIMESTAMP,
    UNIQUE(user_id, poll_id)  -- User can vote only once per poll
);
 
-- Partial uniqueness (PostgreSQL): unique email among active users
CREATE UNIQUE INDEX unique_active_email 
ON Users(email) 
WHERE status = 'active';

Summary: The Uniqueness Imperative

We have thoroughly explored tuple uniqueness—from its mathematical foundations to its practical applications. Let's consolidate the essential understanding:

Key Takeaways

•Relations are sets, so tuples must be unique — This follows directly from set theory: sets cannot contain duplicate elements.
•Two tuples are equal if all attribute values match — Every attribute must have the same value for tuples to be considered identical.
•Duplicate facts are semantically meaningless — A proposition is true once; stating it twice adds no information.
•Keys are the mechanism for enforcing uniqueness — Primary keys and unique constraints ensure no duplicates exist.
•SQL deviates from pure relations — Tables can have duplicates unless constrained; query results are multisets by default.
•Value-based identification is fundamental — Tuples are identified by their key values, not by physical position.
•Uniqueness enables data integrity — It prevents update anomalies, enables foreign keys, and enforces business rules.

What's Next:

Page Complete

3 / 5