Loading learning content...
Imagine you're architecting a global e-commerce platform serving 200 million customers across 50 countries. Your Orders table contains 5 billion rows and grows by 10 million daily. A single database server, regardless of its specifications, cannot efficiently store, process, or serve this data. The solution isn't merely scaling up hardware—it's strategically fragmenting your data across multiple nodes.
Horizontal fragmentation (also known as horizontal partitioning or sharding) is the technique of dividing a table's rows into disjoint subsets, distributing each subset to different database nodes while preserving the table schema. Each fragment contains complete rows but only a subset of the table's total population.
This page explores horizontal fragmentation in rigorous depth—from its theoretical foundations to its practical implementation challenges, equipping you with the knowledge to design fragmentation strategies that scale to billions of records while maintaining query performance and data integrity.
By the end of this page, you will understand: (1) The formal definition and mathematical foundations of horizontal fragmentation, (2) The completeness, disjointness, and reconstruction properties that valid fragments must satisfy, (3) Primary and derived horizontal fragmentation strategies, (4) Selection predicates and fragmentation schemas, (5) Allocation strategies and their impact on query performance, and (6) Real-world implementation patterns and their trade-offs.
Before diving into implementation details, we must establish the formal mathematical foundation. Understanding these principles rigorously distinguishes ad-hoc partitioning from principled fragmentation design.
Definition: Given a relation R, a horizontal fragmentation of R produces fragments R₁, R₂, ..., Rₙ such that:
Mathematically:
Violating completeness means data loss—some tuples become inaccessible. Violating disjointness wastes storage and creates update anomalies where the same logical row must be modified in multiple locations. Violating reconstruction means you cannot reassemble the original table for reporting or migration. Production systems must verify these properties continuously.
Selection Predicates:
Horizontal fragments are defined using selection predicates—Boolean expressions that determine fragment membership. Each fragment Rᵢ is defined as:
Rᵢ = σ(pᵢ)(R)
Where σ denotes the selection operation and pᵢ is the predicate for fragment i. For valid fragmentation, the predicates must be:
Example: Geographic Fragmentation
Consider a Customers table with a country attribute. A geographic fragmentation might define:
The fourth predicate is crucial—it catches all countries not explicitly assigned, ensuring completeness.
| Property Violated | Condition | Consequence | Detection Method |
|---|---|---|---|
| Completeness | Some tuples match no predicate | Data inaccessible, query results incomplete | COUNT() on union ≠ COUNT() on original |
| Disjointness | Some tuples match multiple predicates | Storage waste, update anomalies, delete failures | SUM of fragment counts > original count |
| Reconstruction | Union cannot recreate original | Schema mismatch, data corruption | Schema comparison, integrity constraint checks |
Primary horizontal fragmentation partitions a relation based on predicates defined on the relation's own attributes. This is the most straightforward form of horizontal fragmentation, where the fragmentation criteria are intrinsic to the data being fragmented.
The Design Process:
Designing primary horizontal fragmentation requires analyzing:
Simple Predicates:
A simple predicate is an atomic Boolean condition of the form:
attribute θ value
Where θ ∈ {=, ≠, <, ≤, >, ≥}.
Examples of simple predicates:
order_date >= '2024-01-01'status = 'ACTIVE'amount > 10000Minterm Predicates:
Given a set of simple predicates P = {p₁, p₂, ..., pₘ}, a minterm predicate is a conjunction of these predicates where each predicate appears in either its natural form (pᵢ) or negated form (¬pᵢ).
For m simple predicates, there are 2ᵐ possible minterms, though many may be contradictory (always false) and can be eliminated.
Example:
Given simple predicates:
The four minterms are:
1234567891011121314151617181920212223242526272829303132333435363738394041424344
-- Original Orders table schemaCREATE TABLE Orders ( order_id BIGINT PRIMARY KEY, customer_id BIGINT NOT NULL, order_date DATE NOT NULL, region VARCHAR(50) NOT NULL, amount DECIMAL(15, 2) NOT NULL, status VARCHAR(20) NOT NULL); -- Define fragmentation predicates based on region-- Fragment 1: North AmericaCREATE TABLE Orders_NorthAmerica ASSELECT * FROM OrdersWHERE region IN ('United States', 'Canada', 'Mexico'); -- Fragment 2: Europe CREATE TABLE Orders_Europe ASSELECT * FROM OrdersWHERE region IN ('United Kingdom', 'Germany', 'France', 'Italy', 'Spain', 'Netherlands'); -- Fragment 3: Asia PacificCREATE TABLE Orders_AsiaPacific ASSELECT * FROM OrdersWHERE region IN ('Japan', 'China', 'India', 'Australia', 'Singapore', 'South Korea'); -- Fragment 4: Rest of World (catch-all)CREATE TABLE Orders_Other ASSELECT * FROM OrdersWHERE region NOT IN ('United States', 'Canada', 'Mexico', 'United Kingdom', 'Germany', 'France', 'Italy', 'Spain', 'Netherlands', 'Japan', 'China', 'India', 'Australia', 'Singapore', 'South Korea'); -- Verify completeness: count should match originalSELECT (SELECT COUNT(*) FROM Orders_NorthAmerica) + (SELECT COUNT(*) FROM Orders_Europe) + (SELECT COUNT(*) FROM Orders_AsiaPacific) + (SELECT COUNT(*) FROM Orders_Other) AS fragment_total, (SELECT COUNT(*) FROM Orders) AS original_total;The fragmentation attribute should ideally be: (1) Frequently used in WHERE clauses to enable fragment elimination, (2) Relatively stable to avoid frequent tuple migrations, (3) Well-distributed to prevent skewed fragment sizes, and (4) Meaningful for physical co-location when data locality matters. In our example, 'region' allows queries by geography to access only relevant fragments while supporting data sovereignty requirements.
Derived horizontal fragmentation partitions a relation based on the fragmentation of another (related) relation. This strategy is essential when tables have referential relationships and frequently join together.
Motivation:
Consider a parent-child relationship between Customers and Orders. If Customers is fragmented by region, queries joining these tables would require cross-fragment joins if Orders fragments don't align. Derived fragmentation solves this by fragmenting the child table (Orders) using a semi-join with the parent fragments.
Formal Definition:
Let R₁, R₂, ..., Rₙ be the fragments of parent relation R. For child relation S with foreign key referencing R, the derived fragments of S are:
Sᵢ = S ⋉ Rᵢ
Where ⋉ denotes the semi-join operation. Each Sᵢ contains tuples from S that reference tuples in Rᵢ.
Link Relation:
The relation whose fragmentation determines the derived fragmentation is called the link relation or owner relation. The attribute used for the join is the link attribute.
Properties of Derived Fragmentation:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748
-- Parent relation: Customers fragmented by region-- Customers_NorthAmerica, Customers_Europe, Customers_AsiaPacific -- Child relation: OrdersCREATE TABLE Orders ( order_id BIGINT PRIMARY KEY, customer_id BIGINT NOT NULL REFERENCES Customers(customer_id), order_date DATE NOT NULL, amount DECIMAL(15, 2) NOT NULL, status VARCHAR(20) NOT NULL); -- Derived fragmentation: Orders fragments align with Customers fragments-- This uses a semi-join with the parent fragment -- Fragment 1: Orders for North American customersCREATE TABLE Orders_NorthAmerica ASSELECT o.*FROM Orders oWHERE EXISTS ( SELECT 1 FROM Customers_NorthAmerica c WHERE c.customer_id = o.customer_id); -- Fragment 2: Orders for European customersCREATE TABLE Orders_Europe ASSELECT o.*FROM Orders oWHERE EXISTS ( SELECT 1 FROM Customers_Europe c WHERE c.customer_id = o.customer_id); -- Fragment 3: Orders for Asia Pacific customersCREATE TABLE Orders_AsiaPacific ASSELECT o.*FROM Orders oWHERE EXISTS ( SELECT 1 FROM Customers_AsiaPacific c WHERE c.customer_id = o.customer_id); -- Now, local joins are possible without cross-site communication-- Each site can locally execute:SELECT c.customer_name, o.order_date, o.amountFROM Customers_NorthAmerica cJOIN Orders_NorthAmerica o ON c.customer_id = o.customer_idWHERE o.amount > 1000;Creating fragments is only half the problem—you must also allocate them to physical database nodes. The allocation decision profoundly impacts query performance, availability, and operational complexity.
Allocation Problem:
Given:
Find an allocation A: fragments → sites that minimizes:
Subject to:
Allocation Strategies:
Non-replicated Allocation: Each fragment resides at exactly one site
Replicated Allocation: Fragments may be duplicated across sites
Partially Replicated Allocation: Strategic replication based on access patterns
| Strategy | Read Performance | Write Complexity | Storage Cost | Availability |
|---|---|---|---|---|
| Non-replicated | May require remote access | Simple—single copy | Minimal | Low—SPOF per fragment |
| Fully Replicated | Always local | High—update all copies | Maximal | High—survives any failure |
| Partial Replication | Usually local | Moderate—depends on degree | Moderate | Configurable per fragment |
Allocation Heuristics:
Optimal allocation is NP-hard for non-trivial workloads. Practical systems employ heuristics:
Query-Based Allocation:
Capacity-Based Allocation:
Locality-Based Allocation:
Example Allocation Decision:
For regional order fragments:
Orders_NorthAmerica → US-East datacenter (primary), US-West (replica)Orders_Europe → EU-Frankfurt datacenter (primary), EU-London (replica)Orders_AsiaPacific → Singapore datacenter (primary), Tokyo (replica)This allocation:
Production systems must continuously monitor fragment sizes and access patterns. As data grows unevenly, fragments may require splitting, merging, or migration. Modern distributed databases like CockroachDB and TiDB perform automatic rebalancing, moving range fragments between nodes to maintain even distribution without manual intervention.
Translating theoretical horizontal fragmentation into production systems requires addressing several practical concerns.
Range-Based vs. Hash-Based Partitioning:
The fragmentation predicate implementation typically follows one of two patterns:
Range-Based Partitioning:
date >= '2024-01-01' AND date < '2024-02-01'Hash-Based Partitioning:
hash(customer_id) MOD n = iList-Based Partitioning:
region IN ('US', 'Canada')1234567891011121314151617181920212223242526272829303132333435363738394041424344
-- PostgreSQL Declarative Partitioning -- Range partitioning by date (time-series data)CREATE TABLE events ( event_id BIGINT NOT NULL, event_time TIMESTAMP NOT NULL, event_type VARCHAR(50) NOT NULL, payload JSONB) PARTITION BY RANGE (event_time); -- Create partitions for each monthCREATE TABLE events_2024_01 PARTITION OF events FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');CREATE TABLE events_2024_02 PARTITION OF events FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');-- Continue for additional months... -- Hash partitioning for even distributionCREATE TABLE users ( user_id BIGINT NOT NULL, email VARCHAR(255) NOT NULL, created_at TIMESTAMP DEFAULT NOW()) PARTITION BY HASH (user_id); -- Create 8 hash partitionsCREATE TABLE users_0 PARTITION OF users FOR VALUES WITH (MODULUS 8, REMAINDER 0);CREATE TABLE users_1 PARTITION OF users FOR VALUES WITH (MODULUS 8, REMAINDER 1);-- Continue for remainders 2-7... -- List partitioning by regionCREATE TABLE orders ( order_id BIGINT NOT NULL, region VARCHAR(20) NOT NULL, amount DECIMAL(15, 2)) PARTITION BY LIST (region); CREATE TABLE orders_americas PARTITION OF orders FOR VALUES IN ('US', 'CA', 'MX', 'BR');CREATE TABLE orders_emea PARTITION OF orders FOR VALUES IN ('UK', 'DE', 'FR', 'IT');CREATE TABLE orders_apac PARTITION OF orders FOR VALUES IN ('JP', 'CN', 'IN', 'AU');When relations are horizontally fragmented, the query processor must determine which fragments to access and how to combine results. This process, called fragment elimination or partition pruning, is critical for performance.
Fragment Elimination:
The query optimizer analyzes WHERE clause predicates against fragment definitions. If a predicate contradicts a fragment's definition, that fragment can be skipped entirely.
Example:
Given fragments:
Orders_2024_Q1: order_date FROM '2024-01-01' TO '2024-04-01'Orders_2024_Q2: order_date FROM '2024-04-01' TO '2024-07-01'Orders_2024_Q3: order_date FROM '2024-07-01' TO '2024-10-01'For query:
SELECT * FROM Orders WHERE order_date = '2024-05-15'
Only Orders_2024_Q2 needs to be accessed—the other fragments are eliminated.
Localization Program:
The distributed query processor transforms global queries into fragment-specific subqueries through a localization program:
Query Rewriting:
A global query on Orders is rewritten as:
-- Original
SELECT * FROM Orders WHERE region = 'US' AND amount > 1000;
-- After localization (fragments by region)
SELECT * FROM Orders_NorthAmerica WHERE region = 'US' AND amount > 1000;
-- Other fragment queries eliminated entirely
12345678910111213141516171819202122232425262728293031323334353637383940414243
-- Demonstrating partition pruning with EXPLAIN -- Time-partitioned tableCREATE TABLE events ( id BIGSERIAL, occurred_at TIMESTAMP NOT NULL, event_type VARCHAR(50), data JSONB) PARTITION BY RANGE (occurred_at); CREATE TABLE events_2024_01 PARTITION OF events FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');CREATE TABLE events_2024_02 PARTITION OF events FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');CREATE TABLE events_2024_03 PARTITION OF events FOR VALUES FROM ('2024-03-01') TO ('2024-04-01'); -- Query targeting specific date rangeEXPLAIN (ANALYZE, COSTS OFF)SELECT * FROM eventsWHERE occurred_at >= '2024-02-10' AND occurred_at < '2024-02-20'; -- QUERY PLAN OUTPUT:-- Append-- -> Seq Scan on events_2024_02 events_1-- Filter: (occurred_at >= '2024-02-10' AND occurred_at < '2024-02-20')-- (Partitions events_2024_01 and events_2024_03 are PRUNED) -- Query without partition filter - scans ALL partitionsEXPLAIN (ANALYZE, COSTS OFF)SELECT * FROM eventsWHERE event_type = 'LOGIN'; -- QUERY PLAN OUTPUT:-- Append-- -> Seq Scan on events_2024_01 events_1-- Filter: (event_type = 'LOGIN')-- -> Seq Scan on events_2024_02 events_2-- Filter: (event_type = 'LOGIN')-- -> Seq Scan on events_2024_03 events_3-- Filter: (event_type = 'LOGIN')-- (No pruning - all partitions scanned)To maximize fragment elimination: (1) Include partition key in query predicates, (2) Use equality or range conditions—functions on partition keys prevent pruning, (3) Avoid OR conditions spanning partitions when possible, (4) Create composite partition keys for multi-dimensional access patterns, and (5) Monitor slow query logs for queries scanning all partitions unexpectedly.
Horizontal fragmentation is the foundational technique for scaling relational data beyond single-node limits. Let's consolidate the key concepts:
What's Next:
Horizontal fragmentation divides by rows—but some scenarios require dividing by columns. The next page explores Vertical Fragmentation, where tables are split by attribute groups to optimize different access patterns and enable more granular data distribution.
You now understand horizontal fragmentation at a deep, principled level—from its formal foundations through practical implementation patterns. This knowledge is essential for designing distributed database schemas that scale while maintaining data integrity and query performance.