If range-based sharding is a librarian organizing books by call number, hash-based sharding is a perfectly randomized card dealer—every card goes to a player based purely on a mathematical function, with no regard for the card's value. The result is near-perfect distribution, regardless of the input pattern.
Hash-based sharding uses a hash function to transform any key into a number, then maps that number to a shard. Whether your keys are sequential IDs, random UUIDs, or timestamps, the hash function scrambles them into a uniform distribution across shards. This solves the hotspot problem inherent in range sharding—but at a cost.
By the end of this page, you will understand how hash-based sharding works, why consistent hashing revolutionized distributed systems, how virtual nodes improve balance, and the tradeoffs you accept when choosing hash over range sharding. This knowledge is essential for designing user-centric, write-heavy sharded systems.
Hash-based sharding uses a deterministic hash function to map keys to shards. The same key always produces the same hash value, and thus always routes to the same shard.
The Basic Mechanism:
1. Hash the Key — Apply a hash function to the partition key: `hash("user_12345") → 2847593`
2. Map to Shard — Use modulo or consistent hashing: `2847593 % 4 → Shard 1`
3. Route Query — Direct the query to the calculated shard
Properties of Good Hash Functions:
For sharding, you need hash functions with specific properties: deterministic output (the same key always hashes the same way), uniform distribution across the output range, high speed, and good avalanche behavior (a small change in the key flips many output bits).
Popular choices include MurmurHash3, xxHash, and CityHash. Avoid cryptographic hashes like SHA-256—they're designed for security, not speed, and sharding doesn't need resistance to deliberate collision attacks.
```typescript
/**
 * Basic Hash Sharding Implementation
 *
 * Uses a modulo operation to map a hash to a shard.
 * Simple, but has rebalancing problems (covered later).
 */
class SimpleHashRouter {
  private shardCount: number;

  constructor(shardCount: number) {
    this.shardCount = shardCount;
  }

  /**
   * Route a key to a shard using simple modulo
   */
  getShard(key: string): number {
    const hash = this.murmurHash3(key);
    return hash % this.shardCount;
  }

  /**
   * MurmurHash3-style mix - fast and well-distributed.
   * This is a simplified JS adaptation, not the full algorithm.
   */
  private murmurHash3(key: string): number {
    let hash = 0x12345678; // seed
    for (let i = 0; i < key.length; i++) {
      const char = key.charCodeAt(i);
      hash ^= char;
      hash = Math.imul(hash, 0x5bd1e995);
      hash ^= hash >>> 15;
    }
    // >>> 0 reinterprets the value as an unsigned 32-bit integer,
    // so the result is always non-negative
    return hash >>> 0;
  }
}

// Example: 4 shards
const router = new SimpleHashRouter(4);

// The same key always routes to the same shard
console.log(router.getShard("user_12345")); // e.g. shard 2 - stable across calls
console.log(router.getShard("user_12346")); // likely a different shard
console.log(router.getShard("user_12347")); // likely a different shard

// Demonstrate distribution with 1000 keys
const distribution = [0, 0, 0, 0];
for (let i = 0; i < 1000; i++) {
  const shard = router.getShard(`user_${i}`);
  distribution[shard]++;
}
console.log("Distribution across 4 shards:", distribution);
// e.g. [247, 251, 248, 254] - nearly uniform
```

Simple modulo hashing has a critical flaw: when you change the shard count, almost every key remaps to a different shard. Going from 4 to 5 shards moves ~80% of your data! This is why production systems use consistent hashing instead, which we'll cover next.
Simple modulo hashing (hash(key) % N) works well until you need to change N—the number of shards. When you add or remove shards, the modulo calculation changes, and keys map to different shards.
The Math of Modulo Remapping:
With 4 shards:
```
hash("user_A") = 100 → 100 % 4 = 0 (Shard 0)
hash("user_B") = 101 → 101 % 4 = 1 (Shard 1)
hash("user_C") = 102 → 102 % 4 = 2 (Shard 2)
```

Add a 5th shard:

```
hash("user_A") = 100 → 100 % 5 = 0 (Shard 0) ✓ Same
hash("user_B") = 101 → 101 % 5 = 1 (Shard 1) ✓ Same
hash("user_C") = 102 → 102 % 5 = 2 (Shard 2) ✓ Same
```

But what about:

```
hash("user_X") = 103 → 103 % 4 = 3 → 103 % 5 = 3 ✓ Same
hash("user_Y") = 104 → 104 % 4 = 0 → 104 % 5 = 4 ✗ Moved!
hash("user_Z") = 107 → 107 % 4 = 3 → 107 % 5 = 2 ✗ Moved!
```

Roughly N/(N+1) of keys must move when growing from N to N+1 shards. Going from 4 to 5 shards moves ~80% of data. Going from 100 to 101 shards still moves ~99% of data!
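The remapping math can be checked empirically. The sketch below uses a simplified murmur-style mix (the same kind shown earlier on this page) and counts how many of 100,000 keys change shards when the modulo base changes:

```typescript
// Measure the fraction of keys that remap when the shard count changes.
// Uses a simplified murmur-style mix; any well-distributed hash behaves the same.
function mixHash(key: string): number {
  let h = 0x12345678; // seed
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x5bd1e995);
    h ^= h >>> 15;
  }
  return h >>> 0; // unsigned 32-bit result
}

// Fraction of keys whose shard assignment changes under h % N
function remappedFraction(numKeys: number, oldShards: number, newShards: number): number {
  let moved = 0;
  for (let i = 0; i < numKeys; i++) {
    const h = mixHash(`user_${i}`);
    if (h % oldShards !== h % newShards) moved++;
  }
  return moved / numKeys;
}

console.log("4 → 5 shards:", remappedFraction(100_000, 4, 5));         // ~0.80
console.log("100 → 101 shards:", remappedFraction(100_000, 100, 101)); // ~0.99
```

A key keeps its shard only when its hash has the same remainder mod both counts, which for 4 → 5 shards happens for just 4 of every 20 hash values: an 80% remap.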
| Shard Change | Keys Moved | Data to Transfer (100TB) | Migration Time (1GB/s) |
|---|---|---|---|
| 4 → 5 shards | ~80% | ~80 TB | ~22 hours |
| 10 → 11 shards | ~91% | ~91 TB | ~25 hours |
| 100 → 101 shards | ~99% | ~99 TB | ~27 hours |
| 100 → 110 shards | ~91% | ~91 TB | ~25 hours |
Why This Is Catastrophic: during the migration window, the vast majority of your dataset is in flight. Caches go cold, queries must check two possible locations (or mis-route entirely), and your network and disks saturate for hours.
This is why simple modulo hashing is only used when the shard count is truly fixed for the life of the system, or when the data is ephemeral (for a cache, a mass remap just means a temporary spike in misses).
For persistent, growing data, you need consistent hashing.
One workaround: always use a multiple of your shard count. Start with 4 shards, then scale to 8, then 16. When doubling, exactly 50% of keys move—each shard splits in half. This is simpler than consistent hashing but limits your scaling options to powers of 2.
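The doubling claim can be verified in a few lines, and the arithmetic shows why the migration is clean: with `h % N`, going to 2N shards moves a key exactly when one extra hash bit is set, and a moved key always lands on shard `old + N`. A sketch, using the same simplified murmur-style mix as elsewhere on this page:

```typescript
// Doubling demo: with h % N, going from N to 2N shards moves a key iff
// one extra bit of its hash is set - each shard splits cleanly in half.
function hashKey(key: string): number {
  let h = 0x12345678; // seed for the murmur-style mix
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x5bd1e995);
    h ^= h >>> 15;
  }
  return h >>> 0;
}

const total = 100_000;
let moved = 0;
for (let i = 0; i < total; i++) {
  const h = hashKey(`user_${i}`);
  const before = h % 4;
  const after = h % 8;
  if (before !== after) {
    moved++;
    // A moved key always lands on shard (before + 4): each old shard's
    // data splits between itself and exactly one new shard.
    if (after !== before + 4) throw new Error("unexpected target shard");
  }
}
console.log(`Moved ${(100 * moved / total).toFixed(1)}% of keys (theory: 50%)`);
```

Because each old shard's keys split between exactly two destinations, the migration is a clean per-shard copy rather than an all-to-all shuffle.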
Consistent hashing, invented by Karger et al. in 1997, solves the resharding problem elegantly. Instead of hash % N, it uses a conceptual hash ring where both keys and nodes are placed. When nodes are added or removed, only the keys closest to that node need to move.
The Hash Ring Concept: picture the hash output space (say, 0 to 2^32 − 1) bent into a circle. Both node identifiers and keys are hashed onto this ring, and each key belongs to the first node found moving clockwise from the key's position (wrapping around at the top).
Adding or Removing Nodes:
When you add a node, only keys between the new node and its predecessor need to move—the new node "takes over" part of its successor's responsibility. When you remove a node, only its keys move to the next node clockwise.
With N nodes, adding or removing one node moves only 1/N of keys on average!
```typescript
/**
 * Consistent Hashing Implementation
 *
 * Uses a sorted ring of node positions.
 * Keys route to the first node clockwise from their hash position.
 */
class ConsistentHashRing<T> {
  private ring: Map<number, T> = new Map();
  private sortedPositions: number[] = [];

  /**
   * Add a node to the ring
   */
  addNode(id: string, node: T): void {
    const position = this.hash(id);
    this.ring.set(position, node);
    this.sortedPositions.push(position);
    this.sortedPositions.sort((a, b) => a - b);
  }

  /**
   * Remove a node from the ring
   */
  removeNode(id: string): void {
    const position = this.hash(id);
    this.ring.delete(position);
    this.sortedPositions = this.sortedPositions.filter(p => p !== position);
  }

  /**
   * Get the node responsible for a key.
   * Walks clockwise to find the first node.
   */
  getNode(key: string): T | undefined {
    if (this.ring.size === 0) return undefined;

    const keyPosition = this.hash(key);

    // Binary search for the first position >= keyPosition
    let left = 0;
    let right = this.sortedPositions.length;
    while (left < right) {
      const mid = Math.floor((left + right) / 2);
      if (this.sortedPositions[mid] < keyPosition) {
        left = mid + 1;
      } else {
        right = mid;
      }
    }

    // Wrap around if we've gone past the end
    const nodePosition = this.sortedPositions[left % this.sortedPositions.length];
    return this.ring.get(nodePosition);
  }

  private hash(key: string): number {
    // Simple hash for demonstration
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = (Math.imul(31, hash) + key.charCodeAt(i)) | 0;
    }
    return Math.abs(hash);
  }
}

// Example usage
const ring = new ConsistentHashRing<string>();

ring.addNode("shard-1", "Shard 1 (192.168.1.1)");
ring.addNode("shard-2", "Shard 2 (192.168.1.2)");
ring.addNode("shard-3", "Shard 3 (192.168.1.3)");

console.log(ring.getNode("user_12345")); // e.g. Shard 2
console.log(ring.getNode("user_67890")); // e.g. Shard 1

// Add a new shard - only ~1/4 of keys move!
ring.addNode("shard-4", "Shard 4 (192.168.1.4)");

console.log(ring.getNode("user_12345")); // might be unchanged, or moved to Shard 4
console.log(ring.getNode("user_67890")); // most likely unchanged
```

| Shard Change | Keys Moved | Data to Transfer (100TB) | Migration Time (1GB/s) |
|---|---|---|---|
| 4 → 5 shards | ~20% | ~20 TB | ~5.5 hours |
| 10 → 11 shards | ~9% | ~9 TB | ~2.5 hours |
| 100 → 101 shards | ~1% | ~1 TB | ~17 minutes |
| 100 → 110 shards | ~9% | ~9 TB | ~2.5 hours |
Compare the tables: with modulo hashing, adding one shard to 100 moves 99% of data. With consistent hashing, it moves only 1%! This is why consistent hashing is foundational to distributed systems like DynamoDB, Cassandra, and memcached.
Basic consistent hashing has a problem: with few nodes, distribution can be very uneven. If you have 4 nodes and they happen to hash close to each other on the ring, one node might handle 40% of keys while another handles 10%.
The Virtual Node Solution:
Instead of placing each physical node once on the ring, place it multiple times with different identifiers. A node with 256 virtual nodes is placed at 256 positions, creating a much more even distribution.
Benefits of Virtual Nodes:

- Much smoother key distribution, even with only a handful of physical nodes
- When a node joins or leaves, the load change is spread across many other nodes instead of hitting a single clockwise neighbor
- Heterogeneous hardware is easy to support: give bigger machines more virtual nodes
```typescript
/**
 * Consistent Hashing with Virtual Nodes
 *
 * Each physical node is represented by multiple virtual nodes.
 * This provides much better load distribution.
 */
interface PhysicalNode {
  id: string;
  address: string;
  weight: number; // Determines the number of vnodes (for heterogeneous clusters)
}

class ConsistentHashRingWithVNodes {
  private ring: Map<number, PhysicalNode> = new Map();
  private sortedPositions: number[] = [];
  private baseVnodeCount: number;

  constructor(baseVnodeCount: number = 150) {
    this.baseVnodeCount = baseVnodeCount;
  }

  /**
   * Add a physical node with virtual nodes based on weight
   */
  addNode(node: PhysicalNode): void {
    const vnodeCount = Math.floor(this.baseVnodeCount * node.weight);
    for (let i = 0; i < vnodeCount; i++) {
      // Create a unique identifier for each virtual node
      const vnodeId = `${node.id}#vnode${i}`;
      const position = this.hash(vnodeId);
      this.ring.set(position, node);
    }
    // Rebuild sorted positions
    this.sortedPositions = Array.from(this.ring.keys()).sort((a, b) => a - b);
  }

  /**
   * Remove a physical node (removes all its vnodes)
   */
  removeNode(nodeId: string): void {
    const positionsToRemove: number[] = [];
    for (const [position, node] of this.ring) {
      if (node.id === nodeId) {
        positionsToRemove.push(position);
      }
    }
    for (const position of positionsToRemove) {
      this.ring.delete(position);
    }
    this.sortedPositions = Array.from(this.ring.keys()).sort((a, b) => a - b);
  }

  /**
   * Get the physical node responsible for a key
   */
  getNode(key: string): PhysicalNode | undefined {
    if (this.ring.size === 0) return undefined;

    const keyPosition = this.hash(key);

    // Binary search for the first position >= keyPosition
    let left = 0;
    let right = this.sortedPositions.length;
    while (left < right) {
      const mid = Math.floor((left + right) / 2);
      if (this.sortedPositions[mid] < keyPosition) {
        left = mid + 1;
      } else {
        right = mid;
      }
    }

    const nodePosition = this.sortedPositions[left % this.sortedPositions.length];
    return this.ring.get(nodePosition);
  }

  /**
   * Get distribution statistics (vnodes per physical node)
   */
  getDistribution(): Map<string, number> {
    const distribution = new Map<string, number>();
    for (const node of this.ring.values()) {
      distribution.set(node.id, (distribution.get(node.id) || 0) + 1);
    }
    return distribution;
  }

  private hash(key: string): number {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = (Math.imul(31, hash) + key.charCodeAt(i)) | 0;
    }
    return Math.abs(hash);
  }
}

// Example: Heterogeneous cluster
const vnodeRing = new ConsistentHashRingWithVNodes(150);

// Large servers get more vnodes (weight 2.0)
vnodeRing.addNode({ id: "node-1", address: "10.0.0.1", weight: 2.0 });
vnodeRing.addNode({ id: "node-2", address: "10.0.0.2", weight: 2.0 });

// Smaller servers get the base vnode count (weight 1.0)
vnodeRing.addNode({ id: "node-3", address: "10.0.0.3", weight: 1.0 });
vnodeRing.addNode({ id: "node-4", address: "10.0.0.4", weight: 1.0 });

console.log("VNode distribution:", vnodeRing.getDistribution());
// node-1: 300, node-2: 300, node-3: 150, node-4: 150
// Large servers handle 2x the keys!

// Test key distribution
const keyDistribution = new Map<string, number>();
for (let i = 0; i < 10000; i++) {
  const node = vnodeRing.getNode(`key_${i}`);
  if (node) {
    keyDistribution.set(node.id, (keyDistribution.get(node.id) || 0) + 1);
  }
}
console.log("Key distribution:", keyDistribution);
// Approximately node-1: 3333, node-2: 3333, node-3: 1667, node-4: 1667
```

The number of vnodes per node is a tradeoff: more vnodes means better distribution but more memory for the ring mapping. Cassandra historically defaulted to 256 vnodes per node (`num_tokens`; newer versions default to 16). For most systems, 100-256 vnodes per physical node provides excellent distribution with manageable memory overhead.
Hash-based sharding's greatest strength—uniform distribution—is also its greatest weakness for certain access patterns. By design, hash functions scramble the key space, destroying any natural ordering.
The Impact on Range Queries:
With range sharding, keys 1000-2000 are adjacent on one shard:
```sql
SELECT * FROM orders WHERE order_id BETWEEN 1000 AND 2000;
-- Hits exactly one shard
```
With hash sharding, those same keys are scattered:
```
hash(1000) % 4 = 2 → Shard 2
hash(1001) % 4 = 0 → Shard 0
hash(1002) % 4 = 3 → Shard 3
hash(1003) % 4 = 1 → Shard 1
```

```sql
SELECT * FROM orders WHERE order_id BETWEEN 1000 AND 2000;
-- Hits ALL shards! Scatter-gather required.
```
| Query Type | Example | Performance | Strategy |
|---|---|---|---|
| Point lookup | WHERE user_id = 123 | Excellent (single shard) | Hash by user_id ✓ |
| Equality on shard key | WHERE tenant_id = 'acme' | Excellent (single shard) | Standard hash routing ✓ |
| Range on shard key | WHERE user_id BETWEEN 100 AND 200 | Poor (all shards) | Scatter-gather required ✗ |
| Range on non-shard-key | WHERE created_at > '2024-01-01' | Poor (all shards) | Secondary index or scan ✗ |
| Aggregation | SELECT COUNT(*) FROM users | Poor (all shards) | Scatter-gather + merge ✗ |
| ORDER BY shard key | ORDER BY user_id LIMIT 10 | Poor (all shards) | Sort across shards ✗ |
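For the "Poor (all shards)" rows above, the execution pattern is scatter-gather: fan the predicate out to every shard, then merge the partial results at the router. A toy sketch, where in-memory arrays stand in for real shards (a production system would issue parallel network queries instead):

```typescript
// Scatter-gather sketch for a cross-shard range count (illustrative only).
type OrderRow = { orderId: number; amount: number };

// Four "shards"; hash sharding has scattered adjacent order IDs among them
const shards: OrderRow[][] = [
  [{ orderId: 1001, amount: 50 }, { orderId: 1005, amount: 20 }],
  [{ orderId: 1002, amount: 30 }],
  [{ orderId: 1003, amount: 70 }, { orderId: 1007, amount: 10 }],
  [{ orderId: 1004, amount: 40 }],
];

function countOrdersInRange(lo: number, hi: number): number {
  // Scatter: every shard evaluates the predicate locally
  const partials = shards.map(rows =>
    rows.filter(r => r.orderId >= lo && r.orderId <= hi).length
  );
  // Gather: merge the partial results at the router
  return partials.reduce((a, b) => a + b, 0);
}

console.log("count:", countOrdersInRange(1000, 1004)); // count: 4
```

Note that every shard does work even when it contributes nothing to the result, which is why latency tracks the slowest shard and why this pattern should stay off the hot path.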
Mitigating Range Query Costs:
Design Around Point Lookups — Structure your application to primarily query by the shard key. If you need user data, always query by user_id.
Composite Keys — Hash on a grouping key, range within groups. Cassandra does this: hash the partition key, sort by clustering columns within that partition.
Secondary Indexes — Maintain global secondary indexes for non-shard-key queries. This adds write overhead but enables efficient lookups.
Specialized Read Replicas — Replicate data to a different system optimized for range queries (Elasticsearch, time-series DB, data warehouse).
Accept Scatter-Gather — For rare queries, scatter-gather across shards is acceptable. Just don't make it your primary access pattern.
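The composite-key mitigation can be sketched in memory. This is an illustrative model of the Cassandra-style layout, not a real client API (the `Row` and `CompositeStore` names are hypothetical): hash the partition key to pick a shard, keep rows inside each partition sorted by a clustering column, and range queries on that column then touch exactly one shard.

```typescript
// Illustrative sketch: hash on a grouping key (partition key), range within it.
interface Row { userId: string; createdAt: number; payload: string; }

class CompositeStore {
  private shards: Map<string, Row[]>[];

  constructor(shardCount: number) {
    this.shards = Array.from({ length: shardCount }, () => new Map());
  }

  // Partition key (userId) chooses the shard via a simple hash
  private shardFor(userId: string): Map<string, Row[]> {
    let h = 0;
    for (let i = 0; i < userId.length; i++) {
      h = (Math.imul(31, h) + userId.charCodeAt(i)) | 0;
    }
    return this.shards[(h >>> 0) % this.shards.length];
  }

  insert(row: Row): void {
    const shard = this.shardFor(row.userId);
    const rows = shard.get(row.userId) ?? [];
    rows.push(row);
    rows.sort((a, b) => a.createdAt - b.createdAt); // clustering order
    shard.set(row.userId, rows);
  }

  // Range query on the clustering column hits exactly one shard
  rangeForUser(userId: string, from: number, to: number): Row[] {
    const rows = this.shardFor(userId).get(userId) ?? [];
    return rows.filter(r => r.createdAt >= from && r.createdAt <= to);
  }
}

const store = new CompositeStore(4);
store.insert({ userId: "u1", createdAt: 3, payload: "c" });
store.insert({ userId: "u1", createdAt: 1, payload: "a" });
store.insert({ userId: "u1", createdAt: 2, payload: "b" });
store.insert({ userId: "u2", createdAt: 1, payload: "x" });

console.log(store.rangeForUser("u1", 1, 2)); // rows "a" and "b", in time order
```

The tradeoff is that ranges only work within one partition; a time range across all users would still be scatter-gather.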
If your primary access pattern requires range queries on the shard key, hash sharding is the wrong choice. Don't try to work around it with expensive scatter-gather operations. Either use range sharding, or restructure your data model so range queries target secondary columns within a hash-partitioned entity.
In 2014, Google published "A Fast, Minimal Memory, Consistent Hash Algorithm" describing Jump Consistent Hash. It achieves the same minimal-movement property as ring-based consistent hashing but with O(1) memory and O(ln n) computation.
How Jump Consistent Hash Works:
The algorithm uses a mathematical progression that 'jumps' through bucket assignments. For any bucket count n, it produces a bucket assignment that only changes for ~1/n of keys when moving to n+1 buckets.
Advantages over Ring-Based:

- O(1) memory: no ring structure or sorted position list to store
- Near-perfect distribution without needing virtual nodes
- Very fast: a handful of arithmetic operations per lookup

Limitations:

- Buckets are numbered 0 to n−1, so you can only add or remove buckets at the end of the range; arbitrary removal of a mid-range node (e.g. after a failure) is not supported directly
- No built-in support for weighted or heterogeneous nodes
- Bucket numbers must still be mapped to actual servers by a separate table
```typescript
/**
 * Jump Consistent Hash
 *
 * Google's O(1) memory, O(log n) time consistent hash.
 * Reference: "A Fast, Minimal Memory, Consistent Hash Algorithm" (2014)
 */
function jumpConsistentHash(key: bigint, numBuckets: number): number {
  let b = -1n;
  let j = 0n;
  while (j < BigInt(numBuckets)) {
    b = j;
    // 64-bit linear congruential step
    key = (key * 2862933555777941757n + 1n) & 0xFFFFFFFFFFFFFFFFn;
    j = BigInt(Math.floor(
      (Number(b) + 1) * Number(1n << 31n) / Number((key >> 33n) + 1n)
    ));
  }
  return Number(b);
}

// Helper to convert a string key to a 64-bit bigint
function stringToKey(str: string): bigint {
  let hash = 0n;
  for (let i = 0; i < str.length; i++) {
    hash = (hash * 31n) + BigInt(str.charCodeAt(i));
    hash = hash & 0xFFFFFFFFFFFFFFFFn; // keep within 64 bits
  }
  return hash;
}

// Example usage
const key = stringToKey("user_12345");

console.log("4 buckets:", jumpConsistentHash(key, 4)); // returns 0-3
console.log("5 buckets:", jumpConsistentHash(key, 5)); // same bucket, or moved to 4
console.log("6 buckets:", jumpConsistentHash(key, 6)); // same bucket, or moved to 5

// Verify minimal movement
function measureMovement(numKeys: number, oldBuckets: number, newBuckets: number): number {
  let moved = 0;
  for (let i = 0; i < numKeys; i++) {
    const k = stringToKey(`key_${i}`);
    if (jumpConsistentHash(k, oldBuckets) !== jumpConsistentHash(k, newBuckets)) {
      moved++;
    }
  }
  return moved / numKeys;
}

console.log("Movement 10→11:", (measureMovement(10000, 10, 11) * 100).toFixed(1) + "%");
// ~9% (optimal is 9.09% = 1/11)

console.log("Movement 100→101:", (measureMovement(10000, 100, 101) * 100).toFixed(1) + "%");
// ~1% (optimal is 0.99% = 1/101)

// Distribution test
function testDistribution(numKeys: number, numBuckets: number): number[] {
  const distribution = new Array(numBuckets).fill(0);
  for (let i = 0; i < numKeys; i++) {
    const bucket = jumpConsistentHash(stringToKey(`key_${i}`), numBuckets);
    distribution[bucket]++;
  }
  return distribution;
}

console.log("Distribution (4 buckets, 10000 keys):", testDistribution(10000, 4));
// approximately [2500, 2500, 2500, 2500] - very even
```

Jump consistent hash is ideal for statically growing clusters where you only add nodes, never remove them. It's perfect for cache clusters, read replicas, and append-only storage. For environments with frequent node failures requiring removal, ring-based consistent hashing with virtual nodes remains more flexible.
Let's examine how major distributed systems implement hash-based sharding:
Amazon DynamoDB
DynamoDB uses consistent hashing with virtual nodes: the partition key is hashed to determine which internal partition stores an item, partitions are split automatically as data and throughput grow, and adaptive capacity shifts throughput toward hot partitions.
Apache Cassandra
Cassandra uses Murmur3Partitioner by default: the partition key is hashed with MurmurHash3 into a 64-bit token, each node owns multiple token ranges via virtual nodes, and within a partition rows are kept sorted by clustering columns, combining hash routing with local range scans.
Redis Cluster
Redis Cluster uses hash slots: every key maps to one of 16384 fixed slots via CRC16, and each node owns a set of slot ranges. Hash tags let you pin related keys to the same slot: keys like {user_123}_profile and {user_123}_settings hash only the tag content, so they colocate.
```typescript
/**
 * Redis Cluster Hash Slot Routing
 *
 * Redis uses 16384 fixed hash slots distributed across nodes.
 * CRC16 hash with hash tag support for key colocation.
 */
const HASH_SLOT_COUNT = 16384;

interface RedisNode {
  id: string;
  address: string;
  slots: [number, number][]; // Array of [start, end] slot ranges
}

function crc16(data: string): number {
  // Simplified CRC16 (the actual implementation uses CRC16-CCITT)
  let crc = 0;
  for (let i = 0; i < data.length; i++) {
    crc = (crc ^ data.charCodeAt(i)) & 0xFFFF;
    for (let j = 0; j < 8; j++) {
      if (crc & 1) {
        crc = (crc >> 1) ^ 0xA001;
      } else {
        crc = crc >> 1;
      }
    }
  }
  return crc;
}

function getHashSlot(key: string): number {
  // Check for a hash tag: {tag}key
  // Only the content inside {} is hashed
  const tagStart = key.indexOf('{');
  const tagEnd = key.indexOf('}', tagStart + 1);

  if (tagStart !== -1 && tagEnd !== -1 && tagEnd > tagStart + 1) {
    // Hash only the tag content
    const tag = key.substring(tagStart + 1, tagEnd);
    return crc16(tag) % HASH_SLOT_COUNT;
  }

  // No tag, hash the entire key
  return crc16(key) % HASH_SLOT_COUNT;
}

class RedisCluster {
  private nodes: RedisNode[] = [];

  addNode(node: RedisNode): void {
    this.nodes.push(node);
  }

  getNode(key: string): RedisNode | undefined {
    const slot = getHashSlot(key);
    for (const node of this.nodes) {
      for (const [start, end] of node.slots) {
        if (slot >= start && slot <= end) {
          return node;
        }
      }
    }
    return undefined;
  }
}

// Example: 3-node cluster with slot distribution
const cluster = new RedisCluster();

cluster.addNode({
  id: "node-1",
  address: "10.0.0.1:6379",
  slots: [[0, 5460]] // Slots 0-5460
});

cluster.addNode({
  id: "node-2",
  address: "10.0.0.2:6379",
  slots: [[5461, 10922]] // Slots 5461-10922
});

cluster.addNode({
  id: "node-3",
  address: "10.0.0.3:6379",
  slots: [[10923, 16383]] // Slots 10923-16383
});

// Regular keys - distributed based on the full key's hash
console.log("user_123:", getHashSlot("user_123"));
console.log("user_456:", getHashSlot("user_456"));

// Hash tags - these colocate on the same node!
console.log("{user_123}_profile:", getHashSlot("{user_123}_profile"));
console.log("{user_123}_settings:", getHashSlot("{user_123}_settings"));
// Both hash only "user_123", so same slot!

// Verification
const profileNode = cluster.getNode("{user_123}_profile");
const settingsNode = cluster.getNode("{user_123}_settings");
console.log("Same node?", profileNode?.id === settingsNode?.id); // true
```

Redis hash tags ({...}) are a clever solution for colocation in hash-sharded systems. By embedding a common tag in related keys, you guarantee they hash to the same slot. This enables multi-key operations like MGET and Lua scripts across related data without cross-node coordination.
We've covered hash-based sharding comprehensively; the summary below consolidates the key insights.
What's Next:
Now that you understand both range and hash sharding, we'll explore directory-based sharding—a flexible approach that uses an explicit lookup table to map keys to shards. This provides maximum flexibility at the cost of an additional hop and central point of coordination.
You now understand hash-based sharding: simple modulo, consistent hashing, virtual nodes, jump consistent hashing, and the range query tradeoff. You have the knowledge to design user-centric sharded systems with even distribution and minimal data movement during scaling. Next, we'll explore directory-based sharding for maximum flexibility.