Range-based sharding is perhaps the most intuitive partitioning strategy. Just as a library organizes books by call number ranges (A-D on shelf 1, E-H on shelf 2), range-based sharding divides data into contiguous ranges of key values. Each shard handles a specific range, making queries within that range extremely efficient.
This strategy is particularly powerful for time-series data, sequential identifiers, and any dataset where range queries are common. When you query "all orders from January 2024," the system knows exactly which shard to hit—no scatter-gather required.
By the end of this page, you will understand how range-based sharding works, when it excels, its inherent challenges (especially hotspots), and practical strategies for implementing range partitioning in production systems. You'll also learn when to choose range-based sharding over alternatives like hash partitioning.
In range-based sharding, the key space is divided into contiguous, non-overlapping ranges. Each range is assigned to a shard. When a query arrives, the system determines which range contains the target key and routes the query to the corresponding shard.
The Basic Mechanism:
Define Range Boundaries — Split the key space into ranges
Route Queries — Given a key, find its range
Execute and Return — Each shard processes its portion
The beauty of range sharding is that range queries are efficient. If you need all records from January 2024 and dates are your shard key, you know exactly which shard(s) to query.
```typescript
interface RangeBoundary {
  start: number | string | Date;
  end: number | string | Date;
  shardId: string;
}

class RangeShardRouter {
  private boundaries: RangeBoundary[];

  constructor(boundaries: RangeBoundary[]) {
    // Boundaries must be sorted and non-overlapping
    this.boundaries = boundaries.sort((a, b) =>
      this.compare(a.start, b.start)
    );
  }

  /**
   * Route a single key to its shard.
   * Uses binary search for O(log n) routing with many shards.
   */
  routeKey(key: number | string | Date): string {
    // Binary search for the correct range
    let left = 0;
    let right = this.boundaries.length - 1;

    while (left <= right) {
      const mid = Math.floor((left + right) / 2);
      const range = this.boundaries[mid];

      if (this.compare(key, range.start) >= 0 &&
          this.compare(key, range.end) <= 0) {
        return range.shardId;
      }

      if (this.compare(key, range.start) < 0) {
        right = mid - 1;
      } else {
        left = mid + 1;
      }
    }

    throw new Error(`Key ${key} not found in any range`);
  }

  /**
   * Route a range query to all shards it touches.
   * Returns shards in order from start to end.
   */
  routeRange(startKey: number | string | Date, endKey: number | string | Date): string[] {
    const shards: string[] = [];

    for (const range of this.boundaries) {
      // Check if this range overlaps with the query range
      const rangeStartsBeforeQueryEnds = this.compare(range.start, endKey) <= 0;
      const rangeEndsAfterQueryStarts = this.compare(range.end, startKey) >= 0;

      if (rangeStartsBeforeQueryEnds && rangeEndsAfterQueryStarts) {
        shards.push(range.shardId);
      }
    }

    return shards;
  }

  private compare(a: number | string | Date, b: number | string | Date): number {
    if (a < b) return -1;
    if (a > b) return 1;
    return 0;
  }
}

// Example: Date-based range sharding for time-series data
const router = new RangeShardRouter([
  { start: new Date('2023-01-01'), end: new Date('2023-03-31'), shardId: 'shard-2023-q1' },
  { start: new Date('2023-04-01'), end: new Date('2023-06-30'), shardId: 'shard-2023-q2' },
  { start: new Date('2023-07-01'), end: new Date('2023-09-30'), shardId: 'shard-2023-q3' },
  { start: new Date('2023-10-01'), end: new Date('2023-12-31'), shardId: 'shard-2023-q4' },
  { start: new Date('2024-01-01'), end: new Date('2024-03-31'), shardId: 'shard-2024-q1' },
]);

// Single key lookup
const shard = router.routeKey(new Date('2023-08-15'));
console.log(`August 2023 data is on ${shard}`); // shard-2023-q3

// Range query
const shardsForQuery = router.routeRange(
  new Date('2023-11-01'),
  new Date('2024-02-28')
);
console.log(`Query spans: ${shardsForQuery}`); // ['shard-2023-q4', 'shard-2024-q1']
```

Range boundaries are typically stored in a metadata service (like ZooKeeper or etcd) or a configuration database. The routing layer caches this metadata locally for fast lookups. When ranges change (due to splits or rebalancing), the metadata is updated and clients refresh their cache.
Range-based sharding isn't always the right choice, but when your data and access patterns align with it, the benefits are substantial.
Ideal Use Cases:
| Use Case | Shard Key | Range Strategy | Efficiency Gain |
|---|---|---|---|
| Application Logs | timestamp | Monthly or weekly ranges | Date queries hit single shard |
| Financial Transactions | transaction_date | Daily or weekly ranges | Settlement queries localized |
| Audit History | created_at | Quarterly ranges | Compliance queries efficient |
| Order History | order_id (sequential) | ID ranges | Batch processing optimized |
| Sensor Telemetry | measurement_time | Hourly or daily ranges | Aggregation queries efficient |
The Archival Advantage:
One of the most powerful benefits of range sharding by time is simplified data lifecycle management. Consider a logging system with 90-day retention:
Without range sharding:
```sql
DELETE FROM logs WHERE created_at < NOW() - INTERVAL '90 days';
-- Touches all shards, generates massive tombstones/dead tuples
-- Requires VACUUM, causes I/O spikes, blocks other operations
```
With range sharding by month:
```sql
DROP TABLE logs_2023_10;
-- Instant, no tombstones, no cleanup, no impact on other shards
```
This is why time-series databases (TimescaleDB, InfluxDB) universally use range partitioning by time.
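As a sketch of how this plays out operationally (the `logs_YYYY_MM` shard names follow the example above; the helper name is illustrative, not from any particular system), a retention job only needs to identify the monthly shards whose entire range falls before the cutoff:

```typescript
// Sketch of a retention sweep over monthly range shards.
// A shard is droppable only if its *entire* month is older than the cutoff.
function expiredShards(existing: string[], now: Date, retentionDays: number): string[] {
  const cutoff = new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);

  return existing.filter((name) => {
    const [, y, m] = name.match(/^logs_(\d{4})_(\d{2})$/) ?? [];
    if (!y) return false;
    // Date.UTC months are 0-based, so passing the 1-based month gives the
    // first day of the *following* month: the shard's exclusive upper bound.
    const upperBound = new Date(Date.UTC(Number(y), Number(m), 1));
    return upperBound <= cutoff;
  });
}

const toDrop = expiredShards(
  ['logs_2023_08', 'logs_2023_09', 'logs_2023_10', 'logs_2023_11'],
  new Date(Date.UTC(2024, 0, 15)), // "now" = 2024-01-15, cutoff = 2023-10-17
  90
);
console.log(toDrop); // ['logs_2023_08', 'logs_2023_09']
```

Each name returned can then be dropped with a single `DROP TABLE`, turning retention into a constant-time metadata operation per shard.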
If your primary access pattern is 'give me data for time period X,' range sharding by time is almost certainly the right choice. This includes logs, metrics, events, analytics, and any data where time is the natural query dimension.
Despite its elegance, range-based sharding has a critical weakness: hotspots. When writes are concentrated in a narrow range, one shard receives disproportionate load while others sit idle.
The Sequential Write Problem:
Consider these common scenarios:
Auto-increment IDs: every new row receives the highest ID issued so far, so all inserts land on the single shard that owns the top of the key space.
Time-based Keys: every new record carries the current timestamp, so all writes concentrate on the shard covering "now" while historical shards sit idle.
This creates severe problems: the hot shard saturates while the others sit idle, cluster throughput is capped at the capacity of a single shard, and adding more shards doesn't help because new writes still target one range.
Visualizing the Hotspot:
```
Writes/second by shard:

  Shard A (old IDs):  ██ 200/s
  Shard B (current):  ██████████████████ 18,000/s
  Shard C (future):   ░ 0/s

Capacity per shard:  20,000/s
Total cluster:       60,000/s
Actual throughput:   18,000/s
Efficiency:          30% 😱
```
You have 3 shards but can only use the capacity of 1.
Balanced Distribution:
```
Writes/second by shard:

  Shard A:  ██████ 18,500/s
  Shard B:  ██████ 19,000/s
  Shard C:  ██████ 18,500/s

Capacity per shard:  20,000/s
Total cluster:       60,000/s
Actual throughput:   56,000/s
Efficiency:          93% ✅
```
With even distribution, you utilize nearly all capacity.
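The arithmetic behind these efficiency figures fits in a few lines (a simple model for illustration, not from any particular system):

```typescript
// Cluster efficiency = actual throughput / aggregate capacity.
// A hotspot caps "actual" at roughly one shard's capacity regardless
// of how many shards the cluster has.
function clusterEfficiency(writesPerShard: number[], capacityPerShard: number): number {
  const actual = writesPerShard.reduce((a, b) => a + b, 0);
  const totalCapacity = capacityPerShard * writesPerShard.length;
  return actual / totalCapacity;
}

// Hotspotted: one shard does nearly all the work
console.log(clusterEfficiency([200, 18_000, 0], 20_000));          // ≈ 0.30

// Balanced: all shards share the load
console.log(clusterEfficiency([18_500, 19_000, 18_500], 20_000));  // ≈ 0.93
```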
Never use auto-incrementing IDs as range shard keys unless you specifically want 'current' data on one shard. This pattern works for time-series with read-heavy current data, but fails catastrophically for write-heavy workloads with sequential keys.
While hotspots are inherent to range sharding with sequential keys, several strategies can mitigate or eliminate them:
The core idea is to prefix each key with a small computed salt, producing composite keys like `0_2024-01-15`, `1_2024-01-15`, etc., so that writes to "current" keys fan out across multiple sub-ranges:

```typescript
/**
 * Salted Range Sharding
 * Combines range partitioning with salt prefixes to eliminate hotspots
 */

const SALT_COUNT = 10; // Keys spread across 10 sub-ranges

interface SaltedKey {
  salt: number;
  originalKey: string;
  composite: string;
}

function createSaltedKey(originalKey: string): SaltedKey {
  // Hash the original key to get a consistent salt
  const hash = simpleHash(originalKey);
  const salt = hash % SALT_COUNT;

  return {
    salt,
    originalKey,
    composite: `${salt}_${originalKey}`
  };
}

function simpleHash(str: string): number {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    hash = Math.imul(31, hash) + str.charCodeAt(i) | 0;
  }
  return Math.abs(hash);
}

// Time-series example: logs with timestamp keys
function createLogEntry(timestamp: Date, logData: any) {
  // Original key would be: "2024-01-15T10:30:00Z"
  const originalKey = timestamp.toISOString();
  const salted = createSaltedKey(originalKey);

  // Store as: "3_2024-01-15T10:30:00Z" (where 3 is the salt)
  return {
    shardKey: salted.composite,
    data: logData,
    originalTimestamp: originalKey
  };
}

// Querying: must scatter across all salts
function queryTimeRange(start: Date, end: Date): string[] {
  const shardQueries: string[] = [];

  // Generate a query for each salt
  for (let salt = 0; salt < SALT_COUNT; salt++) {
    shardQueries.push(`
      SELECT * FROM logs
      WHERE shard_key BETWEEN '${salt}_${start.toISOString()}'
                          AND '${salt}_${end.toISOString()}'
    `);
  }

  return shardQueries; // Execute in parallel, merge results
}

// Example usage
const entry = createLogEntry(new Date(), { message: "User login" });
console.log(`Entry shard key: ${entry.shardKey}`);
// e.g. "7_2024-01-15T10:30:00.000Z"

// Instead of ALL writes going to today's shard,
// they're distributed across 10 sub-ranges (salt 0-9 for today)
```

The Salting Tradeoff:
Salting spreads writes across N sub-ranges, effectively eliminating the hotspot. However, it comes with costs: every range query must fan out to all N salted sub-ranges (scatter-gather), results from those sub-ranges must be merged and re-sorted by the client, and ordered iteration over the full key space is no longer free.
Salting is ideal when write distribution is more important than range query efficiency. For pure time-series analytics where reads dominate, unsalted ranges may be better.
Use a salt count that doesn't exceed your shard count. With 4 shards, salt with 4 or 8 values. With 16 shards, salt with 16 or 32 values. This ensures even distribution while avoiding excessive scatter-gather operations.
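The reason a multiple works cleanly can be shown in a short sketch (the assignment scheme here is illustrative): each salt prefix defines a contiguous sub-range of composite keys, so contiguous groups of salts can be assigned to shards as ordinary ranges, and a salt count that is a multiple of the shard count divides evenly.

```typescript
// Sketch: why SALT_COUNT should be a multiple of the shard count.
const SALT_COUNT = 8;
const SHARD_COUNT = 4;

function shardForSalt(salt: number): string {
  const saltsPerShard = SALT_COUNT / SHARD_COUNT; // 2 sub-ranges per shard
  return `shard-${Math.floor(salt / saltsPerShard)}`;
}

// Verify the load divides evenly: each shard owns the same number of salts
const perShard = new Map<string, number>();
for (let salt = 0; salt < SALT_COUNT; salt++) {
  const shard = shardForSalt(salt);
  perShard.set(shard, (perShard.get(shard) ?? 0) + 1);
}
console.log([...perShard.entries()]);
// [['shard-0', 2], ['shard-1', 2], ['shard-2', 2], ['shard-3', 2]]
```

With a salt count that is *not* a multiple of the shard count, some shards would own more sub-ranges than others, reintroducing imbalance in a milder form.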
Unlike hash-based sharding where adding nodes requires rehashing, range sharding has a unique advantage: you can split ranges without moving all data. This makes rebalancing more surgical—but it still requires careful orchestration.
When to Split a Range: a range becomes a split candidate when it exceeds a size threshold, when its write or read load approaches a single shard's capacity, or when its latency degrades relative to its peers.
The Splitting Process:
```typescript
interface RangeSplit {
  originalShard: string;
  splitPoint: string | number | Date;
  newShard: string;
}

class RangeSplitOrchestrator {
  /**
   * Orchestrates splitting a range shard at a given point.
   * This is a simplified view - production requires extensive error handling.
   */
  async splitRange(split: RangeSplit): Promise<void> {
    const { originalShard, splitPoint, newShard } = split;

    console.log(`Starting split of ${originalShard} at ${splitPoint}`);

    // Phase 1: Preparation
    await this.createNewShardInstance(newShard);

    // Phase 2: Data Copy (online, while original serves traffic)
    // Uses streaming replication or bulk copy
    await this.copyDataAboveSplitPoint(originalShard, splitPoint, newShard);

    // Phase 3: Catch-up (copy changes made during Phase 2)
    await this.enableReplicationForNewWrites(originalShard, splitPoint, newShard);
    await this.waitForReplicationCatchup(originalShard, newShard);

    // Phase 4: Atomic Cutover
    // This is the critical moment - must be fast
    await this.pauseWrites(originalShard); // < 1 second pause
    await this.finalReplicationSync(originalShard, newShard);
    await this.updateRoutingMetadata(originalShard, splitPoint, newShard);
    await this.resumeWrites([originalShard, newShard]);

    // Phase 5: Cleanup
    await this.deleteMovedDataFromOriginal(originalShard, splitPoint);
    await this.verifyDataIntegrity(originalShard, newShard);

    console.log(`Split complete: ${originalShard} -> ${originalShard}, ${newShard}`);
  }

  async findOptimalSplitPoint(shardId: string): Promise<string> {
    // Options for choosing a split point:
    // 1. Median key - ensures equal key count
    //    SELECT key FROM table ORDER BY key LIMIT 1 OFFSET (SELECT COUNT(*)/2)
    // 2. Size-based - split by data size
    //    Use index statistics to find the key at 50% of data
    // 3. Write-rate based - split to balance writes
    //    Analyze recent write patterns, split where writes divide evenly
    // The median key is usually the default choice
    return this.getMedianKey(shardId);
  }

  private async getMedianKey(shardId: string): Promise<string> {
    // Implementation: query for the median key
    return "median_key_placeholder";
  }

  // ... stub implementations of the remaining phases
  private async createNewShardInstance(shardId: string): Promise<void> { }
  private async copyDataAboveSplitPoint(from: string, point: any, to: string): Promise<void> { }
  private async enableReplicationForNewWrites(from: string, point: any, to: string): Promise<void> { }
  private async waitForReplicationCatchup(from: string, to: string): Promise<void> { }
  private async pauseWrites(shardId: string): Promise<void> { }
  private async finalReplicationSync(from: string, to: string): Promise<void> { }
  private async updateRoutingMetadata(original: string, point: any, newShard: string): Promise<void> { }
  private async resumeWrites(shards: string[]): Promise<void> { }
  private async deleteMovedDataFromOriginal(shardId: string, point: any): Promise<void> { }
  private async verifyDataIntegrity(shard1: string, shard2: string): Promise<void> { }
}
```

The most critical part of range splitting is the cutover window—the brief period when writes are paused to ensure consistency. Production systems aim for < 1 second cutover. Longer pauses cause client timeouts and queue buildup. Some systems use dual-write approaches (writing to both shards during transition) to eliminate the pause entirely.
Let's look at how major systems implement range sharding:
Google Bigtable / HBase
Bigtable pioneered range sharding at scale. Data is sorted by row key and automatically split into 'tablets' (Bigtable) or 'regions' (HBase):
Key design insight: Row keys are strings, so you can construct compound keys like user_123#order_456 where # enforces ordering. All data for a user is colocated and sorted.
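The compound-key idea can be sketched in a few lines (the `user_…#order_…` format follows the example above; zero-padding is an added assumption so that lexicographic order matches numeric order):

```typescript
// Build a compound row key so all of a user's orders sort together.
function orderRowKey(userId: number, orderId: number): string {
  // Zero-pad numeric parts: "order_7" would otherwise sort after "order_456"
  const u = String(userId).padStart(10, '0');
  const o = String(orderId).padStart(10, '0');
  return `user_${u}#order_${o}`;
}

// A prefix scan bounded by the user's prefix fetches that user's orders
function userScanRange(userId: number): { start: string; end: string } {
  const prefix = `user_${String(userId).padStart(10, '0')}#`;
  return { start: prefix, end: prefix + '\xff' };
}

const keys = [orderRowKey(123, 456), orderRowKey(123, 7), orderRowKey(124, 1)].sort();
console.log(keys);
// User 123's orders are contiguous, with order 7 sorting before order 456
```

Because the system splits the sorted key space into contiguous tablets/regions, this layout keeps each user's data colocated and makes per-user scans single-shard operations (unless a single user grows large enough to span a split).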
Apache Cassandra (with ByteOrderedPartitioner)
While Cassandra defaults to hash partitioning, it supports range partitioning via ByteOrderedPartitioner:
Most Cassandra deployments use Murmur3Partitioner (hash) instead.
PostgreSQL Native Partitioning
PostgreSQL 10+ supports declarative range partitioning:
```sql
-- PostgreSQL declarative range partitioning by date

-- Create partitioned table
CREATE TABLE events (
    id BIGSERIAL,
    event_type VARCHAR(100),
    user_id BIGINT,
    event_data JSONB,
    created_at TIMESTAMP NOT NULL,
    PRIMARY KEY (id, created_at)  -- Partition key must be part of the PK
) PARTITION BY RANGE (created_at);

-- Create monthly partitions
CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

CREATE TABLE events_2024_02 PARTITION OF events
    FOR VALUES FROM ('2024-02-01') TO ('2024-03-01');

CREATE TABLE events_2024_03 PARTITION OF events
    FOR VALUES FROM ('2024-03-01') TO ('2024-04-01');

-- Create indexes on the parent (propagated to partitions in PG 11+)
CREATE INDEX idx_events_user_id ON events (user_id);
CREATE INDEX idx_events_type ON events (event_type);

-- Query with partition pruning
EXPLAIN ANALYZE
SELECT * FROM events
WHERE created_at >= '2024-02-01' AND created_at < '2024-03-01';
-- Only scans the events_2024_02 partition!

-- Drop old data instantly
DROP TABLE events_2024_01;  -- Instant, no cleanup needed

-- Automate partition creation with pg_partman or cron;
-- e.g. pg_partman's create_parent() pre-creates future partitions
```

| System | Range Unit | Auto-Split | Best For |
|---|---|---|---|
| Bigtable/HBase | Tablets (regions) | Yes, by size | Wide-column time-series |
| PostgreSQL | Table partitions | Manual/pg_partman | OLTP with time filters |
| CockroachDB | Ranges (64MB) | Yes, automatic | Distributed relational |
| TiDB | Regions (96MB) | Yes, by size/keys | MySQL-compatible distributed |
| TimescaleDB | Chunks (time) | Yes, by time interval | Time-series analytics |
Modern distributed databases (CockroachDB, TiDB, Spanner) handle range splitting automatically. When a range exceeds size threshold, the system splits it transparently. This dramatically reduces operational burden compared to manual range management in traditional databases.
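A generic sketch of the per-range check such systems run (the thresholds follow the table above; the interface and limits are illustrative, not any system's actual internals):

```typescript
// Simplified auto-split decision: split when a range grows past a size
// threshold or becomes a write hotspot.
interface RangeStats {
  rangeId: string;
  sizeBytes: number;
  writesPerSec: number;
}

const SIZE_THRESHOLD = 64 * 1024 * 1024; // e.g. CockroachDB-style 64MB ranges
const WRITE_THRESHOLD = 10_000;          // illustrative load-based trigger

function needsSplit(stats: RangeStats): boolean {
  return stats.sizeBytes > SIZE_THRESHOLD || stats.writesPerSec > WRITE_THRESHOLD;
}

console.log(needsSplit({ rangeId: 'r1', sizeBytes: 80 * 1024 * 1024, writesPerSec: 500 })); // true
console.log(needsSplit({ rangeId: 'r2', sizeBytes: 10 * 1024 * 1024, writesPerSec: 200 })); // false
```

When the check fires, the system picks a split point (typically the median key) and runs a split procedure like the orchestration shown earlier, without operator involvement.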
Range and hash sharding are the two primary strategies. Choosing correctly is one of the most important sharding decisions. Here's a comprehensive framework:
| Factor | Favor Range | Favor Hash |
|---|---|---|
| Primary query pattern | Range queries (time periods, ID ranges) | Point lookups (get by ID) |
| Write pattern | Distributed across key space | Sequential/concentrated writes |
| Data lifecycle | Clear aging/archival needs | No time-based retention |
| Hotspot risk | Keys distributed naturally | Sequential keys (timestamps, IDs) |
| Ordered iteration | Required | Not required |
| Rebalancing | Split ranges (surgical) | Rehash (global data movement) |
Many systems use hybrid approaches. For example, Cassandra uses compound partition keys where the first part is hashed (for distribution) and subsequent parts are range-sorted (for efficient range queries within a partition). This gives you the best of both worlds for many use cases.
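A rough sketch of that hybrid scheme (the hash function and key format are illustrative, not Cassandra's actual implementation):

```typescript
// Hybrid key: a hash of the partition key picks the shard (even distribution),
// while the clustering key keeps rows range-sorted within the partition.
function hybridKey(partitionKey: string, clusteringKey: string, shardCount: number):
    { shard: string; sortKey: string } {
  let hash = 0;
  for (let i = 0; i < partitionKey.length; i++) {
    hash = (Math.imul(31, hash) + partitionKey.charCodeAt(i)) | 0;
  }
  const shard = `shard-${Math.abs(hash) % shardCount}`;

  // Within the shard, rows for this partition sort by clustering key
  return { shard, sortKey: `${partitionKey}#${clusteringKey}` };
}

// All of user-42's events land on one shard, time-ordered within it:
const a = hybridKey('user-42', '2024-01-15T10:00:00Z', 8);
const b = hybridKey('user-42', '2024-01-15T11:00:00Z', 8);
console.log(a.shard === b.shard);   // true — same partition, same shard
console.log(a.sortKey < b.sortKey); // true — time-ordered within the partition
```

This is why the hybrid works: point lookups and per-partition range scans each hit one shard, while writes for different partitions spread evenly; only cross-partition range queries still require scatter-gather.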
We've covered range-based sharding comprehensively. Let's consolidate the key insights:
- Range sharding divides the key space into contiguous, non-overlapping ranges, making range queries and ordered iteration efficient.
- It shines for time-series and sequential data, and partition-per-period layouts make retention as cheap as dropping a table.
- Sequential keys (auto-increment IDs, timestamps) create write hotspots that cap throughput at a single shard's capacity.
- Salting eliminates hotspots, at the cost of scatter-gather reads across the salted sub-ranges.
- Ranges can be split surgically, without the global data movement that rehashing requires.
What's Next:
Now that you understand range-based sharding, we'll explore hash-based sharding—the complementary strategy that prioritizes even distribution over range query efficiency. Hash sharding solves the hotspot problem inherently but introduces its own tradeoffs around range queries and rebalancing.
You now understand range-based sharding: how it works, when to use it, how to mitigate hotspots, and how it compares to hash sharding. Next, we'll explore hash-based sharding and understand why it's often the default choice for user-centric applications.