When a single database can no longer handle your data, you must divide it. But how you divide it determines everything—query performance, operational complexity, and even what operations are possible. Horizontal partitioning is the foundational technique that makes sharding possible.
In horizontal partitioning, we divide a table's rows across multiple physical locations. Each partition (or shard) contains a subset of rows, but every row has the same columns. Think of it like dividing a phone book into volumes A-L and M-Z—each volume has the same structure (name, number, address), but different entries.
By the end of this page, you will understand how horizontal partitioning works, how it differs from vertical partitioning, the key terminology used in sharding discussions, and the fundamental principles that guide partition design. This conceptual foundation is essential for understanding specific strategies like range-based and hash-based sharding.
Before diving deep into horizontal partitioning, let's clearly distinguish it from its counterpart, vertical partitioning. Understanding both helps you choose the right approach—and sometimes you need both.
Horizontal Partitioning (Sharding)
In horizontal partitioning, we split a table by rows. Each partition contains a subset of the data records, but all columns. If you have a users table with 100 million rows:
Each partition has the same schema: id, name, email, created_at, etc.
Vertical Partitioning
In vertical partitioning, we split a table by columns. Frequently accessed columns stay together, while rarely accessed or large columns move to separate tables:
- users(id, name, email, created_at) — hot data, frequently queried
- user_profiles(user_id, bio, full_address, preferences) — cold data, rarely needed
- user_avatars(user_id, avatar_blob) — large binary data, separate storage

Vertical partitioning reduces row size, improving cache efficiency and query performance for common operations.
| Aspect | Horizontal (Sharding) | Vertical Partitioning |
|---|---|---|
| Divides by | Rows | Columns |
| Each partition contains | All columns, subset of rows | All rows, subset of columns |
| Primary benefit | Scales storage & write throughput | Reduces row size, separates hot/cold data |
| Complexity added | Cross-partition queries | Joins to reconstruct full entity |
| Common use case | Large tables with billions of rows | Wide tables with rarely-used columns |
| Scaling dimension | Data volume and write load | I/O efficiency and cache utilization |
In practice, large systems often use both. A table might be vertically partitioned to separate hot and cold columns, then horizontally partitioned to distribute rows across shards. Instagram, for example, vertically partitions user metadata from media blobs, then horizontally shards both based on user_id.
To understand horizontal partitioning deeply, let's examine its core components and how they work together:
1. The Partition Key (Shard Key)
The partition key is the column (or columns) used to determine which partition a row belongs to. This is the most critical decision in any sharding design. Common partition keys include:
- user_id — user-centric applications
- tenant_id — multi-tenant SaaS
- timestamp — time-series data
- region — geographically distributed data

The partition key appears in almost every query, allowing the system to route requests to the correct partition.
2. The Partition Function
The partition function maps partition key values to specific partitions. There are three main approaches: range-based (contiguous ranges of key values map to partitions), hash-based (a hash of the key determines the partition), and directory-based (an explicit lookup table records where each key lives). Each has tradeoffs we'll explore in subsequent pages.
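To make the three approaches concrete, here is a minimal TypeScript sketch. The partition count, range boundaries, and directory entries are illustrative assumptions, and the hash is a toy for demonstration, not a production-grade function:

```typescript
// Sketch: three common partition functions mapping a key to a partition index.
const PARTITIONS = 4;

// 1. Range-based: contiguous key ranges map to partitions (like phone book volumes)
function rangePartition(key: string): number {
  const upperBounds = ["g", "n", "t"]; // exclusive upper bounds for partitions 0..2
  for (let i = 0; i < upperBounds.length; i++) {
    if (key.toLowerCase() < upperBounds[i]) return i;
  }
  return upperBounds.length; // last partition catches everything else
}

// 2. Hash-based: hash the key, then take modulo the partition count
function hashPartition(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (Math.imul(31, hash) + key.charCodeAt(i)) | 0;
  }
  return Math.abs(hash) % PARTITIONS;
}

// 3. Directory-based: an explicit lookup table decides placement
const directory = new Map<string, number>([
  ["tenant_a", 0],
  ["tenant_b", 2],
]);
function directoryPartition(key: string): number {
  const p = directory.get(key);
  if (p === undefined) throw new Error(`no mapping for ${key}`);
  return p;
}

console.log(rangePartition("alice")); // → 0 ("alice" sorts before "g")
console.log(directoryPartition("tenant_b")); // → 2
```

Range keeps related keys adjacent (good for scans), hash spreads load evenly (good for balance), and a directory allows arbitrary placement at the cost of an extra lookup.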
3. The Partitions (Shards)
Each partition is typically a separate database instance or table. In a sharded architecture, each shard might be:

- A separate physical database server
- A separate database instance on shared hardware
- A separate table within a single database (as in PostgreSQL's native partitioning)
```sql
-- Example: Horizontal partitioning in PostgreSQL (native partitioning)

-- Create the parent table with partition key
CREATE TABLE orders (
    id BIGSERIAL,
    customer_id BIGINT NOT NULL,
    order_date DATE NOT NULL,
    total_amount DECIMAL(10, 2),
    status VARCHAR(50),
    PRIMARY KEY (id, customer_id)  -- Partition key must be in primary key
) PARTITION BY HASH (customer_id);

-- Create individual partitions
CREATE TABLE orders_p0 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p2 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p3 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Queries automatically route to correct partition
SELECT * FROM orders WHERE customer_id = 12345;
-- PostgreSQL hashes 12345 and prunes to exactly one partition

-- Queries without partition key must scan all partitions
SELECT * FROM orders WHERE status = 'pending';
-- Scans orders_p0, p1, p2, p3 - much slower!
```

Notice that the partition key (customer_id) must be part of the primary key. This is a common requirement in partitioned databases because enforcing uniqueness across partitions without the partition key would require checking all partitions—defeating the purpose of partitioning.
Understanding how queries interact with partitions is essential for designing effective sharding strategies. There are three categories of queries in a partitioned system:
1. Single-Partition Queries (The Goal)
These queries include the partition key and target exactly one partition:
```sql
SELECT * FROM orders WHERE customer_id = 12345 AND status = 'shipped';
```
The database routes this directly to the correct partition. Performance is identical to a non-partitioned table—often better because each partition is smaller.
2. Scatter-Gather Queries (The Penalty)
These queries don't include the partition key and must query all partitions:
```sql
SELECT COUNT(*) FROM orders WHERE status = 'pending';
```
The database must:

- Send the query to every partition
- Wait for all partitions to respond
- Merge the partial results into a final answer
With 100 partitions, this is 100x more queries than a single-partition lookup.
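The fan-out above can be sketched in TypeScript. Here `queryPartition` is a hypothetical stand-in for a real per-shard query, returning fake partial counts so the control flow is runnable:

```typescript
// Sketch: scatter-gather aggregation over N partitions.

// Stand-in for a real per-shard COUNT query; returns a fake partial result.
async function queryPartition(partition: number, filter: string): Promise<number> {
  return partition + 1; // pretend shard p holds p+1 matching rows
}

async function scatterGatherCount(partitions: number, filter: string): Promise<number> {
  // 1. Scatter: fan the query out to every partition in parallel
  const partials = await Promise.all(
    Array.from({ length: partitions }, (_, p) => queryPartition(p, filter))
  );
  // 2. Gather: merge the partial results (for COUNT, just sum them)
  return partials.reduce((sum, c) => sum + c, 0);
}

// With 4 partitions and fake counts 1 + 2 + 3 + 4:
scatterGatherCount(4, "status = 'pending'").then((total) => {
  console.log(total); // → 10
});
```

Even with parallel fan-out, latency is set by the slowest partition, and total work still grows with the partition count.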
3. Cross-Partition Joins (The Nightmare)
These queries join data across partition boundaries:
```sql
SELECT c.name, SUM(o.total_amount)
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.order_date > '2024-01-01'
GROUP BY c.name;
```
If customers and orders are partitioned differently (or customers isn't partitioned), this becomes extremely expensive. Data must be shuffled between nodes, potentially moving gigabytes of data.
| Query Type | Partition Key Present? | Partitions Touched | Performance | Design Goal |
|---|---|---|---|---|
| Single-Partition | Yes | 1 | Excellent | 80%+ of queries |
| Scatter-Gather | No | All | Slow (O(N) partitions) | <20% of queries |
| Cross-Partition Join | Varies | Multiple | Very Slow | Avoid entirely |
Partition Pruning
Modern databases can sometimes prune partitions even without the exact partition key. For example, with range partitioning on order_date:
```sql
SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';
```
The database knows which partitions contain January 2024 data and only queries those. This is called partition pruning or partition elimination.
Good partition design maximizes pruning opportunities. If most queries filter by date, range partitioning by date enables effective pruning. If most queries filter by customer, hash partitioning by customer ensures single-partition queries.
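A minimal TypeScript sketch of how a planner might prune range partitions. The partition names and monthly date bounds are illustrative assumptions:

```typescript
// Sketch: partition pruning for a BETWEEN filter over date-range partitions.

interface RangePartition {
  name: string;
  lo: string; // inclusive lower bound (ISO date, so string comparison works)
  hi: string; // exclusive upper bound
}

const partitions: RangePartition[] = [
  { name: "orders_2023_12", lo: "2023-12-01", hi: "2024-01-01" },
  { name: "orders_2024_01", lo: "2024-01-01", hi: "2024-02-01" },
  { name: "orders_2024_02", lo: "2024-02-01", hi: "2024-03-01" },
];

// Keep only partitions whose range overlaps the query's [from, to] window.
function prune(parts: RangePartition[], from: string, to: string): string[] {
  return parts.filter((p) => p.lo <= to && p.hi > from).map((p) => p.name);
}

// A January 2024 query touches only the January partition:
console.log(prune(partitions, "2024-01-01", "2024-01-31"));
// → ["orders_2024_01"]
```

The same overlap test generalizes: any filter the planner can turn into key bounds lets it skip partitions entirely.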
Before choosing a partition strategy, analyze your query patterns. Run EXPLAIN on your most common queries. Count how often each column appears in WHERE clauses. The column(s) that appear in 80%+ of queries are your partition key candidates.
Effective horizontal partitioning requires even data distribution across partitions. Uneven distribution—called data skew—undermines the benefits of sharding.
Why Balance Matters
Consider a system with 10 shards, each designed to handle 10% of load:
Balanced distribution: each shard serves roughly 10% of queries and holds roughly 10% of the data, so adding shards scales capacity almost linearly.

Skewed distribution: one hot shard serves 40% of the load while the others sit partly idle; the hot shard saturates long before the cluster's nominal capacity is reached.
The system is only as strong as its most loaded shard.
For example, if you partition by country and 40% of your users are in the US, the US shard is 4x larger than average.

As a rule of thumb, if your largest shard is more than 2x the size of your average shard, you have a skew problem worth addressing. If it's more than 10x, you've essentially un-sharded your database—that one shard is your bottleneck, and the others might as well not exist.
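The 2x/10x rule of thumb is easy to automate. This sketch uses illustrative shard sizes matching the US-shard example (one of ten shards holding 40% of the data):

```typescript
// Sketch: measuring data skew as (largest shard) / (average shard).

function skewRatio(shardSizes: number[]): number {
  const avg = shardSizes.reduce((a, b) => a + b, 0) / shardSizes.length;
  return Math.max(...shardSizes) / avg;
}

function skewVerdict(shardSizes: number[]): string {
  const ratio = skewRatio(shardSizes);
  if (ratio > 10) return "critical: one shard dominates";
  if (ratio > 2) return "skewed: worth rebalancing";
  return "balanced";
}

// Ten shards, one holding 40% of the rows (illustrative row counts in millions):
const sizes = [40, 7, 7, 7, 7, 7, 7, 6, 6, 6];
console.log(skewRatio(sizes)); // → 4 (largest shard is 4x the average)
console.log(skewVerdict(sizes)); // → "skewed: worth rebalancing"
```

Running a check like this against shard row counts or storage metrics is a cheap early warning before the hot shard becomes a production bottleneck.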
One of the most important (and often underestimated) decisions is partition granularity—how many partitions to create. Too few and you'll need to split them soon. Too many and you'll waste resources on overhead.
The Tradeoffs:
Few, Large Partitions (e.g., 4-16): low operational overhead and simple routing, but each partition grows large, and splitting one later means migrating a great deal of data.

Many, Small Partitions (e.g., 1000+): fine-grained rebalancing and small migration units, but more connections, metadata, and per-partition overhead to manage.
The Virtual Shard Pattern
A common solution is to over-partition logically while under-partitioning physically:

- Create many virtual shards (for example, 1024 logical partitions)
- Map those virtual shards onto a small number of physical shards (for example, 16 servers)
- Rebalance by remapping virtual shards to new physical homes, rather than re-hashing data
This gives you fine-grained rebalancing without the overhead of 1024 physical database connections.
```typescript
// Virtual shard pattern: 1024 virtual shards on 16 physical shards

interface ShardConfig {
  physicalShardCount: number;
  virtualShardCount: number;
  virtualToPhysicalMap: Map<number, number>;
}

class ShardRouter {
  private config: ShardConfig;

  constructor(config: ShardConfig) {
    this.config = config;
  }

  /**
   * Routes a partition key to a physical shard
   * 1. Hash the key to get virtual shard (0-1023)
   * 2. Map virtual shard to physical shard (0-15)
   */
  getPhysicalShard(partitionKey: string): number {
    const virtualShard = this.getVirtualShard(partitionKey);
    return this.config.virtualToPhysicalMap.get(virtualShard) ?? 0;
  }

  private getVirtualShard(key: string): number {
    // Consistent hash to virtual shard
    const hash = this.hashFunction(key);
    return hash % this.config.virtualShardCount;
  }

  private hashFunction(key: string): number {
    // Simple hash for illustration (use MurmurHash3 in production)
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = (Math.imul(31, hash) + key.charCodeAt(i)) | 0;
    }
    return Math.abs(hash);
  }

  /**
   * Rebalancing: move virtual shards between physical shards
   * This is a metadata change, not a data migration
   */
  rebalance(virtualShard: number, newPhysicalShard: number): void {
    // Update routing (coordination required during migration)
    this.config.virtualToPhysicalMap.set(virtualShard, newPhysicalShard);
    // Actual data migration happens separately
  }
}

// Example: Initial configuration (evenly distributed)
const config: ShardConfig = {
  physicalShardCount: 16,
  virtualShardCount: 1024,
  virtualToPhysicalMap: new Map(
    Array.from({ length: 1024 }, (_, i) => [i, i % 16] as [number, number])
  ),
};

const router = new ShardRouter(config);

// Route a customer's queries
const shard = router.getPhysicalShard("customer_12345");
console.log(`Customer 12345 routed to physical shard ${shard}`);
```

A common heuristic is to create at least 10x more virtual partitions than your expected maximum physical shards.
If you expect to scale to 100 physical shards over your system's lifetime, create 1024 virtual partitions. This gives you flexibility for decades of growth without ever needing to re-partition.
One of the most powerful concepts in horizontal partitioning is colocation—ensuring that data that is frequently accessed together lives on the same partition.
Why Colocation Matters
Consider an e-commerce system with these tables:
- customers — customer profiles
- orders — customer orders
- order_items — items in each order

If a single request needs to fetch a customer's profile, their recent orders, and order items, you want all that data on the same shard. Otherwise, one user request becomes multiple cross-shard queries.
Partition by the Same Key
The simplest colocation strategy is partitioning related tables by the same key:
```sql
-- All partitioned by customer_id
CREATE TABLE customers (...) PARTITION BY HASH (customer_id);
CREATE TABLE orders (...) PARTITION BY HASH (customer_id);
CREATE TABLE order_items (...) PARTITION BY HASH (customer_id);
```
Now, all of customer 12345's data lives on the same shard. Joins are local:
```sql
-- This is a single-shard operation!
SELECT c.name, o.id, oi.product_name
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE c.id = 12345;
```
| Strategy | How It Works | Best For | Limitation |
|---|---|---|---|
| Same Partition Key | All tables partitioned by same column | Entity-centric access (user, tenant) | Cross-entity queries still distributed |
| Composite Keys | Partition by higher-level entity | Hierarchical data (org > team > user) | Deep hierarchies complicate routing |
| Reference Tables | Replicate small tables to all shards | Lookup tables, config data | Must be small; updates hit all shards |
| Shard-Local Caching | Cache foreign shard data locally | Frequently accessed cross-shard data | Cache invalidation complexity |
Reference Tables: A Special Case
Some tables are small and frequently joined with partitioned data:

- Lookup tables such as countries or currencies
- Product categories and other taxonomy data
- Configuration and feature-flag data
These can be replicated to every shard as reference tables. They're read-only (or rarely updated) and enable local joins:
```sql
-- countries is replicated to every shard
SELECT o.*, c.country_name
FROM orders o
JOIN countries c ON o.country_code = c.code
WHERE o.customer_id = 12345;
```
The join happens locally because countries exists on every shard. Updates to reference tables must propagate to all shards, so keep them small and static.
Reference tables should be small (<10,000 rows typically) and rarely updated. If you find yourself wanting to replicate a 1 million row table, you've likely made a design mistake. Either that table should be partitioned, or your shard key choice needs reconsideration.
With horizontal partitioning, operations fall into two categories: local operations that touch a single partition, and global operations that span all partitions. Understanding this distinction is key to designing performant sharded systems.
Local Operations (Fast)
- SELECT * FROM users WHERE id = 123

Global Operations (Slow)

- SELECT COUNT(*) FROM users
- SELECT * FROM users WHERE email = 'a@b.com'

The Global Secondary Index Problem
One of the most challenging aspects of sharding is handling lookups by non-partition-key columns. Imagine your users are partitioned by user_id, but you need to look up users by email for login:
```sql
-- user_id is partition key, but we're querying by email
SELECT * FROM users WHERE email = 'alice@example.com';
```
This query must check all partitions because we don't know which partition has this email. Solutions include:

- Scatter-gather across all partitions (simple, but slow)
- A separate mapping table from email to user_id
- A global secondary index maintained alongside the base data
- Denormalization: a second copy of the table partitioned by email
For login (email → user lookup), the most common production pattern is a small, separate 'email-to-user-id' mapping table. This table is either unsharded (if small enough), sharded by email hash, or stored in a fast cache like Redis. Once you have the user_id, all subsequent queries route to the correct user shard.
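A minimal TypeScript sketch of that login flow, assuming a hypothetical in-memory mapping store and simple modulo-based user sharding (a real system would use a durable table or Redis for the mapping):

```typescript
// Sketch: email-to-user-id lookup, then single-shard routing by user_id.

// Stand-in for an unsharded mapping table or Redis cache.
const emailToUserId = new Map<string, number>([
  ["alice@example.com", 12345],
]);

const USER_SHARDS = 16;

// Users are partitioned by user_id (toy modulo routing for illustration).
function userShard(userId: number): number {
  return userId % USER_SHARDS;
}

// Login flow: one mapping lookup resolves email to user_id; after that,
// every query for this user routes directly to a single shard.
function shardForLogin(email: string): number {
  const userId = emailToUserId.get(email);
  if (userId === undefined) throw new Error(`unknown email: ${email}`);
  return userShard(userId);
}

console.log(shardForLogin("alice@example.com")); // → 9 (12345 % 16)
```

The mapping lookup adds one hop to login, but it converts what would be a scatter-gather over all user shards into two single-target reads.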
We've covered the foundational concepts of horizontal partitioning. Let's consolidate the key insights:

- Horizontal partitioning splits a table by rows; every partition shares the same schema
- The partition key determines routing, so choose one that appears in the vast majority of queries
- Single-partition queries are the goal; scatter-gather queries and cross-partition joins are the penalty
- Watch for data skew: the system is only as strong as its most loaded shard
- Colocation and reference tables keep joins local; virtual shards keep rebalancing cheap
What's Next:
Now that you understand the fundamentals of horizontal partitioning, we'll explore specific partitioning strategies. The next page covers Range-Based Sharding—partitioning data by ranges of key values. This is one of the most intuitive strategies but comes with unique challenges around hotspots and rebalancing.
You now understand the core concepts of horizontal partitioning: how it works, how queries route to partitions, the importance of balance and colocation, and the distinction between local and global operations. Next, we'll apply these concepts to specific sharding strategies, starting with range-based sharding.