When a single database can no longer handle your data, you must divide it. But how you divide it determines everything—query performance, operational complexity, and even what operations are possible. Horizontal partitioning is the foundational technique that makes sharding possible.
In horizontal partitioning, we divide a table's rows across multiple physical locations. Each partition (or shard) contains a subset of rows, but every row has the same columns. Think of it like dividing a phone book into volumes A-L and M-Z—each volume has the same structure (name, number, address), but different entries.
By the end of this page, you will understand how horizontal partitioning works, how it differs from vertical partitioning, the key terminology used in sharding discussions, and the fundamental principles that guide partition design. This conceptual foundation is essential for understanding specific strategies like range-based and hash-based sharding.
Before diving deep into horizontal partitioning, let's clearly distinguish it from its counterpart, vertical partitioning. Understanding both helps you choose the right approach—and sometimes you need both.
Horizontal Partitioning (Sharding)
In horizontal partitioning, we split a table by rows. Each partition contains a subset of the data records, but all columns. If you have a users table with 100 million rows:
Each partition has the same schema: id, name, email, created_at, etc.
Vertical Partitioning
In vertical partitioning, we split a table by columns. Frequently accessed columns stay together, while rarely accessed or large columns move to separate tables:
- users(id, name, email, created_at) — hot data, frequently queried
- user_profiles(user_id, bio, full_address, preferences) — cold data, rarely needed
- user_avatars(user_id, avatar_blob) — large binary data, separate storage

Vertical partitioning reduces row size, improving cache efficiency and query performance for common operations.
| Aspect | Horizontal (Sharding) | Vertical Partitioning |
|---|---|---|
| Divides by | Rows | Columns |
| Each partition contains | All columns, subset of rows | All rows, subset of columns |
| Primary benefit | Scales storage & write throughput | Reduces row size, separates hot/cold data |
| Complexity added | Cross-partition queries | Joins to reconstruct full entity |
| Common use case | Large tables with billions of rows | Wide tables with rarely-used columns |
| Scaling dimension | Data volume and write load | I/O efficiency and cache utilization |
In practice, large systems often use both. A table might be vertically partitioned to separate hot and cold columns, then horizontally partitioned to distribute rows across shards. Instagram, for example, vertically partitions user metadata from media blobs, then horizontally shards both based on user_id.
To understand horizontal partitioning deeply, let's examine its core components and how they work together:
1. The Partition Key (Shard Key)
The partition key is the column (or columns) used to determine which partition a row belongs to. This is the most critical decision in any sharding design. Common partition keys include:
- user_id — user-centric applications
- tenant_id — multi-tenant SaaS
- timestamp — time-series data
- region — geographically distributed data

The partition key appears in almost every query, allowing the system to route requests to the correct partition.
2. The Partition Function
The partition function maps partition key values to specific partitions. There are three main approaches: range-based (contiguous ranges of key values map to partitions), hash-based (a hash of the key determines the partition), and directory-based (an explicit lookup table records where each key lives). Each has tradeoffs we'll explore in subsequent pages.
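To make the three approaches concrete, here is a minimal TypeScript sketch. The partition count, range boundaries, and directory entries are illustrative assumptions, and the hash is a toy for demonstration, not a production-grade function:

```typescript
// Sketch: three common partition functions mapping a key to a partition index.
const PARTITIONS = 4;

// 1. Range-based: contiguous key ranges map to partitions (like phone book volumes)
function rangePartition(key: string): number {
  const upperBounds = ["g", "n", "t"]; // exclusive upper bounds for partitions 0..2
  for (let i = 0; i < upperBounds.length; i++) {
    if (key.toLowerCase() < upperBounds[i]) return i;
  }
  return upperBounds.length; // last partition catches everything else
}

// 2. Hash-based: hash the key, then take modulo the partition count
function hashPartition(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (Math.imul(31, hash) + key.charCodeAt(i)) | 0;
  }
  return Math.abs(hash) % PARTITIONS;
}

// 3. Directory-based: an explicit lookup table decides placement
const directory = new Map<string, number>([
  ["tenant_a", 0],
  ["tenant_b", 2],
]);
function directoryPartition(key: string): number {
  const p = directory.get(key);
  if (p === undefined) throw new Error(`no mapping for ${key}`);
  return p;
}

console.log(rangePartition("alice")); // → 0 ("alice" sorts before "g")
console.log(directoryPartition("tenant_b")); // → 2
```

Range keeps related keys adjacent (good for scans), hash spreads load evenly (good for balance), and a directory allows arbitrary placement at the cost of an extra lookup.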
3. The Partitions (Shards)
Each partition is typically a separate database instance or table. In a sharded architecture, each shard might be:

- A separate physical database server
- A separate database instance on shared hardware
- A separate table within a single database (as in PostgreSQL's native partitioning)
```sql
-- Example: Horizontal partitioning in PostgreSQL (native partitioning)

-- Create the parent table with partition key
CREATE TABLE orders (
    id BIGSERIAL,
    customer_id BIGINT NOT NULL,
    order_date DATE NOT NULL,
    total_amount DECIMAL(10, 2),
    status VARCHAR(50),
    PRIMARY KEY (id, customer_id)  -- Partition key must be in primary key
) PARTITION BY HASH (customer_id);

-- Create individual partitions
CREATE TABLE orders_p0 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p1 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p2 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p3 PARTITION OF orders
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);

-- Queries automatically route to correct partition
SELECT * FROM orders WHERE customer_id = 12345;
-- PostgreSQL hashes 12345 and prunes to exactly one partition

-- Queries without partition key must scan all partitions
SELECT * FROM orders WHERE status = 'pending';
-- Scans orders_p0, p1, p2, p3 - much slower!
```

Notice that the partition key (customer_id) must be part of the primary key. This is a common requirement in partitioned databases because enforcing uniqueness across partitions without the partition key would require checking all partitions—defeating the purpose of partitioning.
Understanding how queries interact with partitions is essential for designing effective sharding strategies. There are three categories of queries in a partitioned system:
1. Single-Partition Queries (The Goal)
These queries include the partition key and target exactly one partition:
```sql
SELECT * FROM orders WHERE customer_id = 12345 AND status = 'shipped';
```
The database routes this directly to the correct partition. Performance is identical to a non-partitioned table—often better because each partition is smaller.
2. Scatter-Gather Queries (The Penalty)
These queries don't include the partition key and must query all partitions:
```sql
SELECT COUNT(*) FROM orders WHERE status = 'pending';
```
The database must:

- Send the query to every partition
- Wait for all partitions to respond
- Merge the partial results into a final answer
With 100 partitions, this is 100x more queries than a single-partition lookup.
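The fan-out above can be sketched in TypeScript. Here `queryPartition` is a hypothetical stand-in for a real per-shard query, returning fake partial counts so the control flow is runnable:

```typescript
// Sketch: scatter-gather aggregation over N partitions.

// Stand-in for a real per-shard COUNT query; returns a fake partial result.
async function queryPartition(partition: number, filter: string): Promise<number> {
  return partition + 1; // pretend shard p holds p+1 matching rows
}

async function scatterGatherCount(partitions: number, filter: string): Promise<number> {
  // 1. Scatter: fan the query out to every partition in parallel
  const partials = await Promise.all(
    Array.from({ length: partitions }, (_, p) => queryPartition(p, filter))
  );
  // 2. Gather: merge the partial results (for COUNT, just sum them)
  return partials.reduce((sum, c) => sum + c, 0);
}

// With 4 partitions and fake counts 1 + 2 + 3 + 4:
scatterGatherCount(4, "status = 'pending'").then((total) => {
  console.log(total); // → 10
});
```

Even with parallel fan-out, latency is set by the slowest partition, and total work still grows with the partition count.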
3. Cross-Partition Joins (The Nightmare)
These queries join data across partition boundaries:
```sql
SELECT c.name, SUM(o.total_amount)
FROM customers c
JOIN orders o ON c.id = o.customer_id
WHERE o.order_date > '2024-01-01'
GROUP BY c.name;
```
If customers and orders are partitioned differently (or customers isn't partitioned), this becomes extremely expensive. Data must be shuffled between nodes, potentially moving gigabytes of data.
| Query Type | Partition Key Present? | Partitions Touched | Performance | Design Goal |
|---|---|---|---|---|
| Single-Partition | Yes | 1 | Excellent | 80%+ of queries |
| Scatter-Gather | No | All | Slow (O(N) partitions) | <20% of queries |
| Cross-Partition Join | Varies | Multiple | Very Slow | Avoid entirely |
Partition Pruning
Modern databases can sometimes prune partitions even without the exact partition key. For example, with range partitioning on order_date:
```sql
SELECT * FROM orders WHERE order_date BETWEEN '2024-01-01' AND '2024-01-31';
```
The database knows which partitions contain January 2024 data and only queries those. This is called partition pruning or partition elimination.
Good partition design maximizes pruning opportunities. If most queries filter by date, range partitioning by date enables effective pruning. If most queries filter by customer, hash partitioning by customer ensures single-partition queries.
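A minimal TypeScript sketch of how a planner might prune range partitions. The partition names and monthly date bounds are illustrative assumptions:

```typescript
// Sketch: partition pruning for a BETWEEN filter over date-range partitions.

interface RangePartition {
  name: string;
  lo: string; // inclusive lower bound (ISO date, so string comparison works)
  hi: string; // exclusive upper bound
}

const partitions: RangePartition[] = [
  { name: "orders_2023_12", lo: "2023-12-01", hi: "2024-01-01" },
  { name: "orders_2024_01", lo: "2024-01-01", hi: "2024-02-01" },
  { name: "orders_2024_02", lo: "2024-02-01", hi: "2024-03-01" },
];

// Keep only partitions whose range overlaps the query's [from, to] window.
function prune(parts: RangePartition[], from: string, to: string): string[] {
  return parts.filter((p) => p.lo <= to && p.hi > from).map((p) => p.name);
}

// A January 2024 query touches only the January partition:
console.log(prune(partitions, "2024-01-01", "2024-01-31"));
// → ["orders_2024_01"]
```

The same overlap test generalizes: any filter the planner can turn into key bounds lets it skip partitions entirely.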
Before choosing a partition strategy, analyze your query patterns. Run EXPLAIN on your most common queries. Count how often each column appears in WHERE clauses. The column(s) that appear in 80%+ of queries are your partition key candidates.
Effective horizontal partitioning requires even data distribution across partitions. Uneven distribution—called data skew—undermines the benefits of sharding.
Why Balance Matters
Consider a system with 10 shards, each designed to handle 10% of load:
Balanced distribution: each shard serves roughly 10% of queries and holds roughly 10% of the data, so adding shards scales capacity almost linearly.

Skewed distribution: one hot shard serves 40% of the load while the others sit partly idle; the hot shard saturates long before the cluster's nominal capacity is reached.
The system is only as strong as its most loaded shard.
For example, if you partition by country and 40% of your users are in the US, the US shard is 4x larger than average.

As a rule of thumb, if your largest shard is more than 2x the size of your average shard, you have a skew problem worth addressing. If it's more than 10x, you've essentially un-sharded your database—that one shard is your bottleneck, and the others might as well not exist.
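The 2x/10x rule of thumb is easy to automate. This sketch uses illustrative shard sizes matching the US-shard example (one of ten shards holding 40% of the data):

```typescript
// Sketch: measuring data skew as (largest shard) / (average shard).

function skewRatio(shardSizes: number[]): number {
  const avg = shardSizes.reduce((a, b) => a + b, 0) / shardSizes.length;
  return Math.max(...shardSizes) / avg;
}

function skewVerdict(shardSizes: number[]): string {
  const ratio = skewRatio(shardSizes);
  if (ratio > 10) return "critical: one shard dominates";
  if (ratio > 2) return "skewed: worth rebalancing";
  return "balanced";
}

// Ten shards, one holding 40% of the rows (illustrative row counts in millions):
const sizes = [40, 7, 7, 7, 7, 7, 7, 6, 6, 6];
console.log(skewRatio(sizes)); // → 4 (largest shard is 4x the average)
console.log(skewVerdict(sizes)); // → "skewed: worth rebalancing"
```

Running a check like this against shard row counts or storage metrics is a cheap early warning before the hot shard becomes a production bottleneck.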
One of the most important (and often underestimated) decisions is partition granularity—how many partitions to create. Too few and you'll need to split them soon. Too many and you'll waste resources on overhead.
The Tradeoffs:
Few, Large Partitions (e.g., 4-16): low operational overhead and simple routing, but each partition grows large, and splitting one later means migrating a great deal of data.

Many, Small Partitions (e.g., 1000+): fine-grained rebalancing and small migration units, but more connections, metadata, and per-partition overhead to manage.
The Virtual Shard Pattern
A common solution is to over-partition logically while under-partitioning physically:

- Create many virtual shards (for example, 1024 logical partitions)
- Map those virtual shards onto a small number of physical shards (for example, 16 servers)
- Rebalance by remapping virtual shards to new physical homes, rather than re-hashing data
This gives you fine-grained rebalancing without the overhead of 1024 physical database connections.
```typescript
// Virtual shard pattern: 1024 virtual shards on 16 physical shards

interface ShardConfig {
  physicalShardCount: number;
  virtualShardCount: number;
  virtualToPhysicalMap: Map<number, number>;
}

class ShardRouter {
  private config: ShardConfig;

  constructor(config: ShardConfig) {
    this.config = config;
  }

  /**
   * Routes a partition key to a physical shard
   * 1. Hash the key to get virtual shard (0-1023)
   * 2. Map virtual shard to physical shard (0-15)
   */
  getPhysicalShard(partitionKey: string): number {
    const virtualShard = this.getVirtualShard(partitionKey);
    return this.config.virtualToPhysicalMap.get(virtualShard) ?? 0;
  }

  private getVirtualShard(key: string): number {
    // Consistent hash to virtual shard
    const hash = this.hashFunction(key);
    return hash % this.config.virtualShardCount;
  }

  private hashFunction(key: string): number {
    // Simple hash for illustration (use MurmurHash3 in production)
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = (Math.imul(31, hash) + key.charCodeAt(i)) | 0;
    }
    return Math.abs(hash);
  }

  /**
   * Rebalancing: move virtual shards between physical shards
   * This is a metadata change, not a data migration
   */
  rebalance(virtualShard: number, newPhysicalShard: number): void {
    // Update routing (coordination required during migration)
    this.config.virtualToPhysicalMap.set(virtualShard, newPhysicalShard);
    // Actual data migration happens separately
  }
}

// Example: Initial configuration (evenly distributed)
const config: ShardConfig = {
  physicalShardCount: 16,
  virtualShardCount: 1024,
  virtualToPhysicalMap: new Map(
    Array.from({ length: 1024 }, (_, i) => [i, i % 16] as [number, number])
  ),
};

const router = new ShardRouter(config);

// Route a customer's queries
const shard = router.getPhysicalShard("customer_12345");
console.log(`Customer 12345 routed to physical shard ${shard}`);
```

A common heuristic is to create at least 10x more virtual partitions than your expected maximum physical shards.
If you expect to scale to 100 physical shards over your system's lifetime, create 1024 virtual partitions. This gives you flexibility for decades of growth without ever needing to re-partition.
One of the most powerful concepts in horizontal partitioning is colocation—ensuring that data that is frequently accessed together lives on the same partition.
Why Colocation Matters
Consider an e-commerce system with these tables:
- customers — customer profiles
- orders — customer orders
- order_items — items in each order

If a single request needs to fetch a customer's profile, their recent orders, and order items, you want all that data on the same shard. Otherwise, one user request becomes multiple cross-shard queries.
Partition by the Same Key
The simplest colocation strategy is partitioning related tables by the same key:
```sql
-- All partitioned by customer_id
CREATE TABLE customers (...) PARTITION BY HASH (customer_id);
CREATE TABLE orders (...) PARTITION BY HASH (customer_id);
CREATE TABLE order_items (...) PARTITION BY HASH (customer_id);
```
Now, all of customer 12345's data lives on the same shard. Joins are local:
```sql
-- This is a single-shard operation!
SELECT c.name, o.id, oi.product_name
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE c.id = 12345;
```
| Strategy | How It Works | Best For | Limitation |
|---|---|---|---|
| Same Partition Key | All tables partitioned by same column | Entity-centric access (user, tenant) | Cross-entity queries still distributed |
| Composite Keys | Partition by higher-level entity | Hierarchical data (org > team > user) | Deep hierarchies complicate routing |
| Reference Tables | Replicate small tables to all shards | Lookup tables, config data | Must be small; updates hit all shards |
| Shard-Local Caching | Cache foreign shard data locally | Frequently accessed cross-shard data | Cache invalidation complexity |
Reference Tables: A Special Case
Some tables are small and frequently joined with partitioned data:

- Lookup tables such as countries or currencies
- Product categories and other taxonomy data
- Configuration and feature-flag data
These can be replicated to every shard as reference tables. They're read-only (or rarely updated) and enable local joins:
```sql
-- countries is replicated to every shard
SELECT o.*, c.country_name
FROM orders o
JOIN countries c ON o.country_code = c.code
WHERE o.customer_id = 12345;
```
The join happens locally because countries exists on every shard. Updates to reference tables must propagate to all shards, so keep them small and static.
Reference tables should be small (<10,000 rows typically) and rarely updated. If you find yourself wanting to replicate a 1 million row table, you've likely made a design mistake. Either that table should be partitioned, or your shard key choice needs reconsideration.
With horizontal partitioning, operations fall into two categories: local operations that touch a single partition, and global operations that span all partitions. Understanding this distinction is key to designing performant sharded systems.
Local Operations (Fast)
- SELECT * FROM users WHERE id = 123

Global Operations (Slow)

- SELECT COUNT(*) FROM users
- SELECT * FROM users WHERE email = 'a@b.com'

The Global Secondary Index Problem
One of the most challenging aspects of sharding is handling lookups by non-partition-key columns. Imagine your users are partitioned by user_id, but you need to look up users by email for login:
```sql
-- user_id is partition key, but we're querying by email
SELECT * FROM users WHERE email = 'alice@example.com';
```
This query must check all partitions because we don't know which partition has this email. Solutions include:

- Scatter-gather across all partitions (simple, but slow)
- A separate mapping table from email to user_id
- A global secondary index maintained alongside the base data
- Denormalization: a second copy of the table partitioned by email
For login (email → user lookup), the most common production pattern is a small, separate 'email-to-user-id' mapping table. This table is either unsharded (if small enough), sharded by email hash, or stored in a fast cache like Redis. Once you have the user_id, all subsequent queries route to the correct user shard.
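A minimal TypeScript sketch of that login flow, assuming a hypothetical in-memory mapping store and simple modulo-based user sharding (a real system would use a durable table or Redis for the mapping):

```typescript
// Sketch: email-to-user-id lookup, then single-shard routing by user_id.

// Stand-in for an unsharded mapping table or Redis cache.
const emailToUserId = new Map<string, number>([
  ["alice@example.com", 12345],
]);

const USER_SHARDS = 16;

// Users are partitioned by user_id (toy modulo routing for illustration).
function userShard(userId: number): number {
  return userId % USER_SHARDS;
}

// Login flow: one mapping lookup resolves email to user_id; after that,
// every query for this user routes directly to a single shard.
function shardForLogin(email: string): number {
  const userId = emailToUserId.get(email);
  if (userId === undefined) throw new Error(`unknown email: ${email}`);
  return userShard(userId);
}

console.log(shardForLogin("alice@example.com")); // → 9 (12345 % 16)
```

The mapping lookup adds one hop to login, but it converts what would be a scatter-gather over all user shards into two single-target reads.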
We've covered the foundational concepts of horizontal partitioning. Let's consolidate the key insights:

- Horizontal partitioning splits a table by rows; every partition shares the same schema
- The partition key determines routing, so choose one that appears in the vast majority of queries
- Single-partition queries are the goal; scatter-gather queries and cross-partition joins are the penalty
- Watch for data skew: the system is only as strong as its most loaded shard
- Colocation and reference tables keep joins local; virtual shards keep rebalancing cheap
What's Next:
Now that you understand the fundamentals of horizontal partitioning, we'll explore specific partitioning strategies. The next page covers Range-Based Sharding—partitioning data by ranges of key values. This is one of the most intuitive strategies but comes with unique challenges around hotspots and rebalancing.
You now understand the core concepts of horizontal partitioning: how it works, how queries route to partitions, the importance of balance and colocation, and the distinction between local and global operations. Next, we'll apply these concepts to specific sharding strategies, starting with range-based sharding.