Range-based sharding is perhaps the most intuitive partitioning strategy. Just as a library organizes books by call number ranges (A-D on shelf 1, E-H on shelf 2), range-based sharding divides data into contiguous ranges of key values. Each shard handles a specific range, making queries within that range extremely efficient.
This strategy is particularly powerful for time-series data, sequential identifiers, and any dataset where range queries are common. When you query "all orders from January 2024," the system knows exactly which shard to hit—no scatter-gather required.
By the end of this page, you will understand how range-based sharding works, when it excels, its inherent challenges (especially hotspots), and practical strategies for implementing range partitioning in production systems. You'll also learn when to choose range-based sharding over alternatives like hash partitioning.
In range-based sharding, the key space is divided into contiguous, non-overlapping ranges. Each range is assigned to a shard. When a query arrives, the system determines which range contains the target key and routes the query to the corresponding shard.
The Basic Mechanism:
Define Range Boundaries — Split the key space into ranges
Route Queries — Given a key, find its range
Execute and Return — Each shard processes its portion
The beauty of range sharding is that range queries are efficient. If you need all records from January 2024 and dates are your shard key, you know exactly which shard(s) to query.
```typescript
interface RangeBoundary {
  start: number | string | Date;
  end: number | string | Date;
  shardId: string;
}

class RangeShardRouter {
  private boundaries: RangeBoundary[];

  constructor(boundaries: RangeBoundary[]) {
    // Boundaries must be sorted and non-overlapping
    this.boundaries = boundaries.sort((a, b) =>
      this.compare(a.start, b.start)
    );
  }

  /**
   * Route a single key to its shard.
   * Uses binary search for O(log n) routing with many shards.
   */
  routeKey(key: number | string | Date): string {
    // Binary search for the correct range
    let left = 0;
    let right = this.boundaries.length - 1;

    while (left <= right) {
      const mid = Math.floor((left + right) / 2);
      const range = this.boundaries[mid];

      if (this.compare(key, range.start) >= 0 &&
          this.compare(key, range.end) <= 0) {
        return range.shardId;
      }

      if (this.compare(key, range.start) < 0) {
        right = mid - 1;
      } else {
        left = mid + 1;
      }
    }

    throw new Error(`Key ${key} not found in any range`);
  }

  /**
   * Route a range query to all shards it touches.
   * Returns shards in order from start to end.
   */
  routeRange(startKey: number | string | Date, endKey: number | string | Date): string[] {
    const shards: string[] = [];

    for (const range of this.boundaries) {
      // Check if this range overlaps with the query range
      const rangeStartsBeforeQueryEnds = this.compare(range.start, endKey) <= 0;
      const rangeEndsAfterQueryStarts = this.compare(range.end, startKey) >= 0;

      if (rangeStartsBeforeQueryEnds && rangeEndsAfterQueryStarts) {
        shards.push(range.shardId);
      }
    }

    return shards;
  }

  private compare(a: number | string | Date, b: number | string | Date): number {
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
  }
}

// Example: Date-based range sharding for time-series data
const router = new RangeShardRouter([
  { start: new Date('2023-01-01'), end: new Date('2023-03-31'), shardId: 'shard-2023-q1' },
  { start: new Date('2023-04-01'), end: new Date('2023-06-30'), shardId: 'shard-2023-q2' },
  { start: new Date('2023-07-01'), end: new Date('2023-09-30'), shardId: 'shard-2023-q3' },
  { start: new Date('2023-10-01'), end: new Date('2023-12-31'), shardId: 'shard-2023-q4' },
  { start: new Date('2024-01-01'), end: new Date('2024-03-31'), shardId: 'shard-2024-q1' },
]);

// Single key lookup
const shard = router.routeKey(new Date('2023-08-15'));
console.log(`August 2023 data is on ${shard}`); // shard-2023-q3

// Range query
const shardsForQuery = router.routeRange(
  new Date('2023-11-01'),
  new Date('2024-02-28')
);
console.log(`Query spans: ${shardsForQuery}`); // ['shard-2023-q4', 'shard-2024-q1']
```

Range boundaries are typically stored in a metadata service (like ZooKeeper or etcd) or a configuration database. The routing layer caches this metadata locally for fast lookups. When ranges change (due to splits or rebalancing), the metadata is updated and clients refresh their cache.
Range-based sharding isn't always the right choice, but when your data and access patterns align with it, the benefits are substantial.
Ideal Use Cases:
| Use Case | Shard Key | Range Strategy | Efficiency Gain |
|---|---|---|---|
| Application Logs | timestamp | Monthly or weekly ranges | Date queries hit single shard |
| Financial Transactions | transaction_date | Daily or weekly ranges | Settlement queries localized |
| Audit History | created_at | Quarterly ranges | Compliance queries efficient |
| Order History | order_id (sequential) | ID ranges | Batch processing optimized |
| Sensor Telemetry | measurement_time | Hourly or daily ranges | Aggregation queries efficient |
The Archival Advantage:
One of the most powerful benefits of range sharding by time is simplified data lifecycle management. Consider a logging system with 90-day retention:
Without range sharding:
```sql
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '90 days';
-- Touches all shards, generates massive tombstones/dead tuples
-- Requires VACUUM, causes I/O spikes, blocks other operations
```
With range sharding by month:
```sql
DROP TABLE logs_2023_10;
-- Instant, no tombstones, no cleanup, no impact on other shards
```
This is why time-series databases (TimescaleDB, InfluxDB) universally use range partitioning by time.
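As a sketch of how this plays out operationally (the `logs_YYYY_MM` shard names follow the example above; the helper name is illustrative, not from any particular system), a retention job only needs to identify the monthly shards whose entire range falls before the cutoff:

```typescript
// Sketch of a retention sweep over monthly range shards.
// A shard is droppable only if its *entire* month is older than the cutoff.
function expiredShards(existing: string[], now: Date, retentionDays: number): string[] {
  const cutoff = new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);

  return existing.filter((name) => {
    const [, y, m] = name.match(/^logs_(\d{4})_(\d{2})$/) ?? [];
    if (!y) return false;
    // Date.UTC months are 0-based, so passing the 1-based month gives the
    // first day of the *following* month: the shard's exclusive upper bound.
    const upperBound = new Date(Date.UTC(Number(y), Number(m), 1));
    return upperBound <= cutoff;
  });
}

const toDrop = expiredShards(
  ['logs_2023_08', 'logs_2023_09', 'logs_2023_10', 'logs_2023_11'],
  new Date(Date.UTC(2024, 0, 15)), // "now" = 2024-01-15, cutoff = 2023-10-17
  90
);
console.log(toDrop); // ['logs_2023_08', 'logs_2023_09']
```

Each name returned can then be dropped with a single `DROP TABLE`, turning retention into a constant-time metadata operation per shard.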
If your primary access pattern is 'give me data for time period X,' range sharding by time is almost certainly the right choice. This includes logs, metrics, events, analytics, and any data where time is the natural query dimension.
Despite its elegance, range-based sharding has a critical weakness: hotspots. When writes are concentrated in a narrow range, one shard receives disproportionate load while others sit idle.
The Sequential Write Problem:
Consider these common scenarios:
Auto-increment IDs: every new row receives the highest ID issued so far, so all inserts land on the single shard that owns the top of the key space.
Time-based Keys: every new record carries the current timestamp, so all writes concentrate on the shard covering "now" while historical shards sit idle.
This creates severe problems: the hot shard saturates while the others sit idle, cluster throughput is capped at the capacity of a single shard, and adding more shards doesn't help because new writes still target one range.
Visualizing the Hotspot:
```
Writes/second by shard:

  Shard A (old IDs):  ██ 200/s
  Shard B (current):  ██████████████████ 18,000/s
  Shard C (future):   ░ 0/s

Capacity per shard:  20,000/s
Total cluster:       60,000/s
Actual throughput:   18,000/s
Efficiency:          30% 😱
```
You have 3 shards but can only use the capacity of 1.
Balanced Distribution:
```
Writes/second by shard:

  Shard A:  ██████ 18,500/s
  Shard B:  ██████ 19,000/s
  Shard C:  ██████ 18,500/s

Capacity per shard:  20,000/s
Total cluster:       60,000/s
Actual throughput:   56,000/s
Efficiency:          93% ✅
```
With even distribution, you utilize nearly all capacity.
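The arithmetic behind these efficiency figures fits in a few lines (a simple model for illustration, not from any particular system):

```typescript
// Cluster efficiency = actual throughput / aggregate capacity.
// A hotspot caps "actual" at roughly one shard's capacity regardless
// of how many shards the cluster has.
function clusterEfficiency(writesPerShard: number[], capacityPerShard: number): number {
  const actual = writesPerShard.reduce((a, b) => a + b, 0);
  const totalCapacity = capacityPerShard * writesPerShard.length;
  return actual / totalCapacity;
}

// Hotspotted: one shard does nearly all the work
console.log(clusterEfficiency([200, 18_000, 0], 20_000));          // ≈ 0.30

// Balanced: all shards share the load
console.log(clusterEfficiency([18_500, 19_000, 18_500], 20_000));  // ≈ 0.93
```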
Never use auto-incrementing IDs as range shard keys unless you specifically want 'current' data on one shard. This pattern works for time-series with read-heavy current data, but fails catastrophically for write-heavy workloads with sequential keys.
While hotspots are inherent to range sharding with sequential keys, several strategies can mitigate or eliminate them:
The core idea is to prefix each key with a small computed salt, producing composite keys like `0_2024-01-15`, `1_2024-01-15`, etc., so that writes to "current" keys fan out across multiple sub-ranges:

```typescript
/**
 * Salted Range Sharding
 * Combines range partitioning with salt prefixes to eliminate hotspots
 */

const SALT_COUNT = 10; // Keys spread across 10 sub-ranges

interface SaltedKey {
  salt: number;
  originalKey: string;
  composite: string;
}

function createSaltedKey(originalKey: string): SaltedKey {
  // Hash the original key to get a consistent salt
  const hash = simpleHash(originalKey);
  const salt = hash % SALT_COUNT;

  return {
    salt,
    originalKey,
    composite: `${salt}_${originalKey}`
  };
}

function simpleHash(str: string): number {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = Math.imul(31, hash) + str.charCodeAt(i) | 0;
  }
  return Math.abs(hash);
}

// Time-series example: logs with timestamp keys
function createLogEntry(timestamp: Date, logData: any) {
  // Original key would be: "2024-01-15T10:30:00Z"
  const originalKey = timestamp.toISOString();
  const salted = createSaltedKey(originalKey);

  // Store as: "3_2024-01-15T10:30:00Z" (where 3 is the salt)
  return {
    shardKey: salted.composite,
    data: logData,
    originalTimestamp: originalKey
  };
}

// Querying: must scatter across all salts
function queryTimeRange(start: Date, end: Date): string[] {
  const shardQueries: string[] = [];

  // Generate a query for each salt
  for (let salt = 0; salt < SALT_COUNT; salt++) {
    shardQueries.push(`
      SELECT * FROM logs
      WHERE shard_key BETWEEN '${salt}_${start.toISOString()}'
                          AND '${salt}_${end.toISOString()}'
    `);
  }

  return shardQueries; // Execute in parallel, merge results
}

// Example usage
const entry = createLogEntry(new Date(), { message: "User login" });
console.log(`Entry shard key: ${entry.shardKey}`);
// e.g. "7_2024-01-15T10:30:00.000Z"

// Instead of ALL writes going to today's shard,
// they're distributed across 10 sub-ranges (salt 0-9 for today)
```

The Salting Tradeoff:
Salting spreads writes across N sub-ranges, effectively eliminating the hotspot. However, it comes with costs: every range query must fan out to all N salted sub-ranges (scatter-gather), results from those sub-ranges must be merged and re-sorted by the client, and ordered iteration over the full key space is no longer free.
Salting is ideal when write distribution is more important than range query efficiency. For pure time-series analytics where reads dominate, unsalted ranges may be better.
Use a salt count that doesn't exceed your shard count. With 4 shards, salt with 4 or 8 values. With 16 shards, salt with 16 or 32 values. This ensures even distribution while avoiding excessive scatter-gather operations.
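The reason a multiple works cleanly can be shown in a short sketch (the assignment scheme here is illustrative): each salt prefix defines a contiguous sub-range of composite keys, so contiguous groups of salts can be assigned to shards as ordinary ranges, and a salt count that is a multiple of the shard count divides evenly.

```typescript
// Sketch: why SALT_COUNT should be a multiple of the shard count.
const SALT_COUNT = 8;
const SHARD_COUNT = 4;

function shardForSalt(salt: number): string {
  const saltsPerShard = SALT_COUNT / SHARD_COUNT; // 2 sub-ranges per shard
  return `shard-${Math.floor(salt / saltsPerShard)}`;
}

// Verify the load divides evenly: each shard owns the same number of salts
const perShard = new Map<string, number>();
for (let salt = 0; salt < SALT_COUNT; salt++) {
  const shard = shardForSalt(salt);
  perShard.set(shard, (perShard.get(shard) ?? 0) + 1);
}
console.log([...perShard.entries()]);
// [['shard-0', 2], ['shard-1', 2], ['shard-2', 2], ['shard-3', 2]]
```

With a salt count that is *not* a multiple of the shard count, some shards would own more sub-ranges than others, reintroducing imbalance in a milder form.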
Unlike hash-based sharding where adding nodes requires rehashing, range sharding has a unique advantage: you can split ranges without moving all data. This makes rebalancing more surgical—but it still requires careful orchestration.
When to Split a Range: a range becomes a split candidate when it exceeds a size threshold, when its write or read load approaches a single shard's capacity, or when its latency degrades relative to its peers.
The Splitting Process:
```typescript
interface RangeSplit {
  originalShard: string;
  splitPoint: string | number | Date;
  newShard: string;
}

class RangeSplitOrchestrator {
  /**
   * Orchestrates splitting a range shard at a given point.
   * This is a simplified view - production requires extensive error handling.
   */
  async splitRange(split: RangeSplit): Promise<void> {
    const { originalShard, splitPoint, newShard } = split;

    console.log(`Starting split of ${originalShard} at ${splitPoint}`);

    // Phase 1: Preparation
    await this.createNewShardInstance(newShard);

    // Phase 2: Data Copy (online, while original serves traffic)
    // Uses streaming replication or bulk copy
    await this.copyDataAboveSplitPoint(originalShard, splitPoint, newShard);

    // Phase 3: Catch-up (copy changes made during Phase 2)
    await this.enableReplicationForNewWrites(originalShard, splitPoint, newShard);
    await this.waitForReplicationCatchup(originalShard, newShard);

    // Phase 4: Atomic Cutover
    // This is the critical moment - must be fast
    await this.pauseWrites(originalShard); // < 1 second pause
    await this.finalReplicationSync(originalShard, newShard);
    await this.updateRoutingMetadata(originalShard, splitPoint, newShard);
    await this.resumeWrites([originalShard, newShard]);

    // Phase 5: Cleanup
    await this.deleteMovedDataFromOriginal(originalShard, splitPoint);
    await this.verifyDataIntegrity(originalShard, newShard);

    console.log(`Split complete: ${originalShard} -> ${originalShard}, ${newShard}`);
  }

  async findOptimalSplitPoint(shardId: string): Promise<string> {
    // Options for choosing a split point:
    // 1. Median key - ensures equal key count
    //    SELECT key FROM table ORDER BY key LIMIT 1 OFFSET (SELECT COUNT(*)/2)
    // 2. Size-based - split by data size
    //    Use index statistics to find the key at 50% of data
    // 3. Write-rate based - split to balance writes
    //    Analyze recent write patterns, split where writes divide evenly
    // The median key is usually the default choice
    return this.getMedianKey(shardId);
  }

  private async getMedianKey(shardId: string): Promise<string> {
    // Implementation: query for the median key
    return "median_key_placeholder";
  }

  // ... stub implementations of the remaining phases
  private async createNewShardInstance(shardId: string): Promise<void> { }
  private async copyDataAboveSplitPoint(from: string, point: any, to: string): Promise<void> { }
  private async enableReplicationForNewWrites(from: string, point: any, to: string): Promise<void> { }
  private async waitForReplicationCatchup(from: string, to: string): Promise<void> { }
  private async pauseWrites(shardId: string): Promise<void> { }
  private async finalReplicationSync(from: string, to: string): Promise<void> { }
  private async updateRoutingMetadata(original: string, point: any, newShard: string): Promise<void> { }
  private async resumeWrites(shards: string[]): Promise<void> { }
  private async deleteMovedDataFromOriginal(shardId: string, point: any): Promise<void> { }
  private async verifyDataIntegrity(shard1: string, shard2: string): Promise<void> { }
}
```

The most critical part of range splitting is the cutover window—the brief period when writes are paused to ensure consistency. Production systems aim for < 1 second cutover. Longer pauses cause client timeouts and queue buildup. Some systems use dual-write approaches (writing to both shards during transition) to eliminate the pause entirely.
Let's look at how major systems implement range sharding:
Google Bigtable / HBase
Bigtable pioneered range sharding at scale. Data is sorted by row key and automatically split into 'tablets' (Bigtable) or 'regions' (HBase):
Key design insight: Row keys are strings, so you can construct compound keys like user_123#order_456 where # enforces ordering. All data for a user is colocated and sorted.
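The compound-key idea can be sketched in a few lines (the `user_…#order_…` format follows the example above; zero-padding is an added assumption so that lexicographic order matches numeric order):

```typescript
// Build a compound row key so all of a user's orders sort together.
function orderRowKey(userId: number, orderId: number): string {
  // Zero-pad numeric parts: "order_7" would otherwise sort after "order_456"
  const u = String(userId).padStart(10, '0');
  const o = String(orderId).padStart(10, '0');
  return `user_${u}#order_${o}`;
}

// A prefix scan bounded by the user's prefix fetches that user's orders
function userScanRange(userId: number): { start: string; end: string } {
  const prefix = `user_${String(userId).padStart(10, '0')}#`;
  return { start: prefix, end: prefix + '\xff' };
}

const keys = [orderRowKey(123, 456), orderRowKey(123, 7), orderRowKey(124, 1)].sort();
console.log(keys);
// User 123's orders are contiguous, with order 7 sorting before order 456
```

Because the system splits the sorted key space into contiguous tablets/regions, this layout keeps each user's data colocated and makes per-user scans single-shard operations (unless a single user grows large enough to span a split).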
Apache Cassandra (with ByteOrderedPartitioner)
While Cassandra defaults to hash partitioning, it supports range partitioning via ByteOrderedPartitioner:
Most Cassandra deployments use Murmur3Partitioner (hash) instead.
PostgreSQL Native Partitioning
PostgreSQL 10+ supports declarative range partitioning:
```sql
-- PostgreSQL declarative range partitioning by date

-- Create partitioned table
CREATE TABLE events (
    id BIGSERIAL,
    event_type VARCHAR(100),
    user_id BIGINT,
    event_data JSONB,
    created_at TIMESTAMP NOT NULL,
    PRIMARY KEY (id, created_at)  -- Partition key must be part of the PK
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

CREATE TABLE events_2024_03 PARTITION OF events
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

-- Create indexes on the parent (propagated to partitions in PG 11+)
CREATE INDEX idx_events_user_id ON events (user_id);
CREATE INDEX idx_events_type ON events (event_type);

-- Query with partition pruning
EXPLAIN ANALYZE
SELECT * FROM events
WHERE created_at >= '2024-02-01' AND created_at < '2024-03-01';
-- Only scans the events_2024_02 partition!

-- Drop old data instantly
DROP TABLE events_2024_01;  -- Instant, no cleanup needed

-- Automate partition creation with pg_partman or cron;
-- e.g. pg_partman's create_parent() pre-creates future partitions
```

| System | Range Unit | Auto-Split | Best For |
|---|---|---|---|
| Bigtable/HBase | Tablets (regions) | Yes, by size | Wide-column time-series |
| PostgreSQL | Table partitions | Manual/pg_partman | OLTP with time filters |
| CockroachDB | Ranges (64MB) | Yes, automatic | Distributed relational |
| TiDB | Regions (96MB) | Yes, by size/keys | MySQL-compatible distributed |
| TimescaleDB | Chunks (time) | Yes, by time interval | Time-series analytics |
Modern distributed databases (CockroachDB, TiDB, Spanner) handle range splitting automatically. When a range exceeds size threshold, the system splits it transparently. This dramatically reduces operational burden compared to manual range management in traditional databases.
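A generic sketch of the per-range check such systems run (the thresholds follow the table above; the interface and limits are illustrative, not any system's actual internals):

```typescript
// Simplified auto-split decision: split when a range grows past a size
// threshold or becomes a write hotspot.
interface RangeStats {
  rangeId: string;
  sizeBytes: number;
  writesPerSec: number;
}

const SIZE_THRESHOLD = 64 * 1024 * 1024; // e.g. CockroachDB-style 64MB ranges
const WRITE_THRESHOLD = 10_000;          // illustrative load-based trigger

function needsSplit(stats: RangeStats): boolean {
  return stats.sizeBytes > SIZE_THRESHOLD || stats.writesPerSec > WRITE_THRESHOLD;
}

console.log(needsSplit({ rangeId: 'r1', sizeBytes: 80 * 1024 * 1024, writesPerSec: 500 })); // true
console.log(needsSplit({ rangeId: 'r2', sizeBytes: 10 * 1024 * 1024, writesPerSec: 200 })); // false
```

When the check fires, the system picks a split point (typically the median key) and runs a split procedure like the orchestration shown earlier, without operator involvement.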
Range and hash sharding are the two primary strategies. Choosing correctly is one of the most important sharding decisions. Here's a comprehensive framework:
| Factor | Favor Range | Favor Hash |
|---|---|---|
| Primary query pattern | Range queries (time periods, ID ranges) | Point lookups (get by ID) |
| Write pattern | Distributed across key space | Sequential/concentrated writes |
| Data lifecycle | Clear aging/archival needs | No time-based retention |
| Hotspot risk | Keys distributed naturally | Sequential keys (timestamps, IDs) |
| Ordered iteration | Required | Not required |
| Rebalancing | Split ranges (surgical) | Rehash (global data movement) |
Many systems use hybrid approaches. For example, Cassandra uses compound partition keys where the first part is hashed (for distribution) and subsequent parts are range-sorted (for efficient range queries within a partition). This gives you the best of both worlds for many use cases.
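A rough sketch of that hybrid scheme (the hash function and key format are illustrative, not Cassandra's actual implementation):

```typescript
// Hybrid key: a hash of the partition key picks the shard (even distribution),
// while the clustering key keeps rows range-sorted within the partition.
function hybridKey(partitionKey: string, clusteringKey: string, shardCount: number):
    { shard: string; sortKey: string } {
  let hash = 0;
  for (let i = 0; i < partitionKey.length; i++) {
    hash = (Math.imul(31, hash) + partitionKey.charCodeAt(i)) | 0;
  }
  const shard = `shard-${Math.abs(hash) % shardCount}`;

  // Within the shard, rows for this partition sort by clustering key
  return { shard, sortKey: `${partitionKey}#${clusteringKey}` };
}

// All of user-42's events land on one shard, time-ordered within it:
const a = hybridKey('user-42', '2024-01-15T10:00:00Z', 8);
const b = hybridKey('user-42', '2024-01-15T11:00:00Z', 8);
console.log(a.shard === b.shard);   // true — same partition, same shard
console.log(a.sortKey < b.sortKey); // true — time-ordered within the partition
```

This is why the hybrid works: point lookups and per-partition range scans each hit one shard, while writes for different partitions spread evenly; only cross-partition range queries still require scatter-gather.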
We've covered range-based sharding comprehensively. Let's consolidate the key insights:
- Range sharding divides the key space into contiguous, non-overlapping ranges, making range queries and ordered iteration efficient.
- It shines for time-series and sequential data, and partition-per-period layouts make retention as cheap as dropping a table.
- Sequential keys (auto-increment IDs, timestamps) create write hotspots that cap throughput at a single shard's capacity.
- Salting eliminates hotspots, at the cost of scatter-gather reads across the salted sub-ranges.
- Ranges can be split surgically, without the global data movement that rehashing requires.
What's Next:
Now that you understand range-based sharding, we'll explore hash-based sharding—the complementary strategy that prioritizes even distribution over range query efficiency. Hash sharding solves the hotspot problem inherently but introduces its own tradeoffs around range queries and rebalancing.
You now understand range-based sharding: how it works, when to use it, how to mitigate hotspots, and how it compares to hash sharding. Next, we'll explore hash-based sharding and understand why it's often the default choice for user-centric applications.