When your data outgrows a single database—when vertical scaling is exhausted, read replicas can't keep up with writes, and functional partitioning doesn't help because a single table has billions of rows—you arrive at the final frontier of SQL database scaling: sharding.
Sharding horizontally partitions data across multiple database instances. Instead of keeping all users in one database, you distribute them: users 1-1,000,000 on shard 1, users 1,000,001-2,000,000 on shard 2, and so on. Each shard is a complete, independent database containing a subset of the total data.
Application-level sharding means the application—not the database—is responsible for routing queries to the correct shard. This gives you maximum control but requires significant engineering investment.
Sharding is the most powerful scaling strategy but also the most complex. Before implementing sharding, exhaust every other option: query optimization, indexing, read replicas, caching, and functional partitioning. Many teams shard prematurely and suffer years of unnecessary complexity.
By the end of this page, you will understand the mechanics of application-level sharding, how to choose and implement shard keys, routing strategies (hash-based, range-based, directory-based), handling cross-shard queries and transactions, and the operational challenges of resharding.
A sharded database system consists of several components working together:
Shards: Independent database instances, each holding a partition of the data. Shards may have their own replicas for read scaling and high availability.
Shard Key: The column (or columns) used to determine which shard a row belongs to. For a users table, this might be user_id. All rows with the same shard key value reside on the same shard.
Shard Map/Directory: A mapping from shard key values (or ranges) to shard locations. This might be a configuration file, a separate metadata database, or an in-memory structure.
Router/Proxy: Logic that intercepts queries, extracts the shard key, consults the shard map, and routes the query to the appropriate shard. In application-level sharding, this logic lives in your application code.
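To make these components concrete, here is a minimal sketch of a shard map and router in application code. The shard hostnames and the simple string hash are illustrative assumptions, not any particular library's API; a fuller implementation appears later in this page.

```typescript
// Minimal sketch: shard map + hash-mod router (illustrative names and URLs).
// A real implementation adds connection pooling, health checks, and caching.

interface ShardEntry {
  id: number;                // shard identifier
  connectionString: string;  // where this shard lives
}

// Shard map / directory: shard ID → location
const shardMap: ShardEntry[] = [
  { id: 0, connectionString: 'postgresql://shard0.db:5432/app' },
  { id: 1, connectionString: 'postgresql://shard1.db:5432/app' },
  { id: 2, connectionString: 'postgresql://shard2.db:5432/app' },
  { id: 3, connectionString: 'postgresql://shard3.db:5432/app' },
];

// Router: shard key → shard entry (hash-mod routing for illustration)
function routeToShard(shardKey: string): ShardEntry {
  let hash = 0;
  for (const ch of shardKey) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple string hash
  }
  return shardMap[hash % shardMap.length];
}

console.log(routeToShard('user-12345')); // e.g. { id: 2, connectionString: '...' }
```

The diagram below shows how the same pieces fit together inside the application.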
Application-Level Sharding Architecture
═══════════════════════════════════════════════════════════════

┌─────────────────────────────────────────────────────────────┐
│ APPLICATION                                                  │
│                                                              │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ SHARD ROUTER                                        │    │
│  │                                                     │    │
│  │ 1. Extract shard key from query/context             │    │
│  │ 2. Compute shard ID: hash(user_id) % num_shards     │    │
│  │ 3. Look up shard connection from shard map          │    │
│  │ 4. Execute query on target shard                    │    │
│  │                                                     │    │
│  │  ┌─────────────────────────────────────────────┐    │    │
│  │  │ SHARD MAP / DIRECTORY                       │    │    │
│  │  │                                             │    │    │
│  │  │ Shard 0: postgresql://shard0.db:5432        │    │    │
│  │  │ Shard 1: postgresql://shard1.db:5432        │    │    │
│  │  │ Shard 2: postgresql://shard2.db:5432        │    │    │
│  │  │ Shard 3: postgresql://shard3.db:5432        │    │    │
│  │  └─────────────────────────────────────────────┘    │    │
│  └─────────────────────────────────────────────────────┘    │
└───────────────────────┬───────────────┬───────────────┬─────┘
                        │               │               │
              ┌─────────▼─────────┐     │     ┌─────────▼─────────┐
              │ SHARD 0           │     │     │ SHARD 3           │
              │ (users 0-999K)    │     │     │ (users 3M+)       │
              │ ┌──────────────┐  │     │     │ ┌──────────────┐  │
              │ │ users        │  │     │     │ │ users        │  │
              │ │ orders       │  │     │     │ │ orders       │  │
              │ │ profiles     │  │     │     │ │ profiles     │  │
              │ └──────────────┘  │     │     │ └──────────────┘  │
              │ + replicas        │     │     │ + replicas        │
              └───────────────────┘     │     └───────────────────┘
                                        │
                              ┌─────────▼─────────┐
                              │  SHARDS 1 & 2     │
                              │  (similar setup)  │
                              └───────────────────┘

Not every table needs sharding. In a typical e-commerce system:
Sharded tables (large, tied to shard key):

- users — Sharded by user_id
- orders — Sharded by user_id (colocated with users)
- user_preferences — Sharded by user_id

Global tables (small, read-mostly, needed everywhere):

- countries — Replicated to every shard
- product_categories — Replicated to every shard
- config_settings — Replicated or stored centrally

Centralized tables (not sharded, stored separately):

- products — In a separate product database, not sharded by user
- audit_logs — Centralized for compliance

The decision of what to shard depends on your access patterns and data relationships.
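One way to make this decision explicit is a small per-table policy map in the application, so routing code can tell at a glance how each table is handled. This is a sketch of one possible convention; the table names mirror the example above.

```typescript
// Sketch: explicit per-table sharding policy (names mirror the example above).
type TablePolicy =
  | { kind: 'sharded'; shardKey: string }       // partitioned by shard key
  | { kind: 'global' }                          // replicated to every shard
  | { kind: 'centralized'; database: string };  // lives in one separate database

const tablePolicies: Record<string, TablePolicy> = {
  users:              { kind: 'sharded', shardKey: 'user_id' },
  orders:             { kind: 'sharded', shardKey: 'user_id' },
  user_preferences:   { kind: 'sharded', shardKey: 'user_id' },
  countries:          { kind: 'global' },
  product_categories: { kind: 'global' },
  config_settings:    { kind: 'global' },
  products:           { kind: 'centralized', database: 'product_db' },
  audit_logs:         { kind: 'centralized', database: 'audit_db' },
};

// Routing code can then refuse to scatter-gather a table that should never be sharded.
console.log(tablePolicies['orders']); // { kind: 'sharded', shardKey: 'user_id' }
```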
The choice of shard key is the most critical decision in sharding. A poor shard key creates hot spots, makes common queries inefficient, and is extremely expensive to change later.
High Cardinality: The key should have many distinct values. Sharding by country (200 values) creates large, uneven shards. Sharding by user_id (millions of values) distributes data evenly.
Even Distribution: Values should distribute evenly across the key space. If 50% of users are in shard 0, that shard becomes a bottleneck.
Query Alignment: Most queries should include the shard key. If you shard by user_id but frequently query by email, every email lookup is a scatter-gather across all shards.
Colocation Support: Related data should share a shard key. User orders should be on the same shard as the user, so queries joining users and orders don't cross shards.
| Candidate Key | Cardinality | Distribution | Query Alignment | Verdict |
|---|---|---|---|---|
| user_id | High (millions) | Even (if using hash) | Good for user-centric apps | ✅ Excellent choice |
| tenant_id | Medium | May be uneven (large tenants) | Good for multi-tenant SaaS | ⚠️ Works if tenants balanced |
| created_date | High | Even for stored data, but new writes concentrate on the latest range | Poor (queries rarely filter by date alone) | ❌ Bad—causes hot spots on current date |
| country_code | Low (<200) | Very uneven | Moderate | ❌ Bad—shards too uneven |
| order_id | High | Even | Poor (usually query by user, not order) | ❌ Bad—separates user's orders |
| (user_id, tenant_id) | Very high | Even | Good for multi-tenant | ✅ Composite key can work |
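For the composite-key row, the usual technique is to derive a single routing string from both columns and hash that. A minimal sketch, assuming an MD5-based hash like the routers shown later; the delimiter and field order are arbitrary conventions that must be fixed once and never changed, because changing them re-routes every existing row.

```typescript
import { createHash } from 'crypto';

// Sketch: hashing a composite shard key (tenant_id + user_id).
function compositeShardId(tenantId: string, userId: string, numShards: number): number {
  const routingKey = `${tenantId}:${userId}`;              // fixed delimiter and order
  const hash = createHash('md5').update(routingKey).digest();
  return hash.readUInt32BE(0) % numShards;
}

console.log(compositeShardId('tenant-42', 'user-12345', 8)); // 0..7
```

Hashing the full composite spreads a large tenant across shards; hashing only tenant_id would instead colocate each tenant on one shard, trading even distribution for colocation. Before committing to any candidate key, analyze its real distribution and query patterns, as in the SQL below.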
-- Analyze potential shard key distribution

-- 1. Cardinality analysis
SELECT
  COUNT(DISTINCT user_id) AS distinct_users,
  COUNT(DISTINCT tenant_id) AS distinct_tenants,
  COUNT(DISTINCT DATE(created_at)) AS distinct_dates,
  COUNT(*) AS total_rows
FROM orders;

-- 2. Distribution analysis for user_id (simulating hash distribution)
-- Check if hash-based sharding would be even
WITH shard_distribution AS (
  SELECT
    -- Simulate 8-shard distribution using modulo of hash
    abs(hashtext(user_id::text)) % 8 AS simulated_shard,
    COUNT(*) AS row_count
  FROM orders
  GROUP BY abs(hashtext(user_id::text)) % 8
)
SELECT
  simulated_shard,
  row_count,
  round(100.0 * row_count / SUM(row_count) OVER (), 2) AS percentage
FROM shard_distribution
ORDER BY simulated_shard;

-- 3. Hot spot detection: Top users by row count
-- If few users dominate, sharding by user_id still creates hot shards
SELECT
  user_id,
  COUNT(*) AS order_count,
  round(100.0 * COUNT(*) / (SELECT COUNT(*) FROM orders), 2) AS pct_of_total
FROM orders
GROUP BY user_id
ORDER BY order_count DESC
LIMIT 20;

-- 4. Query pattern analysis: What columns appear in WHERE clauses?
-- This requires pg_stat_statements or application-level query logging
SELECT
  query,
  calls,
  total_time,
  -- Look for patterns in WHERE clauses
  CASE
    WHEN query ILIKE '%WHERE%user_id%' THEN 'user_id'
    WHEN query ILIKE '%WHERE%tenant_id%' THEN 'tenant_id'
    WHEN query ILIKE '%WHERE%order_id%' THEN 'order_id'
    ELSE 'other/none'
  END AS likely_shard_key
FROM pg_stat_statements
WHERE query ILIKE '%orders%'
ORDER BY calls DESC
LIMIT 50;

Choose a shard key that appears in 90%+ of your queries. If most queries include user_id, shard by user_id. If most include tenant_id, shard by tenant_id. The goal is single-shard queries wherever possible.
Given a shard key, how do you map key values to shards? Three primary strategies exist, each with distinct trade-offs.
Compute a hash of the shard key and use modulo to select a shard:
shard_id = hash(shard_key) % number_of_shards
Pros:

- Even data distribution: a good hash spreads keys uniformly across shards
- Simple to implement and reason about; no per-key metadata to maintain
- Sequential keys (auto-increment IDs, timestamps) do not create write hot spots

Cons:

- Range queries on the shard key become scatter-gather, since adjacent keys land on different shards
- Changing the number of shards invalidates the mapping; nearly every key must be rehashed and moved
Divide the key space into ranges:
Shards:
Shard 0: user_id 1 to 1,000,000
Shard 1: user_id 1,000,001 to 2,000,000
Shard 2: user_id 2,000,001 to 3,000,000
Pros:

- Range queries on the shard key hit a single shard (or a few adjacent ones)
- Resharding can split one busy range without touching the others
- New shards can simply take over new key ranges

Cons:

- Distribution depends on the data: monotonically increasing keys send all new writes to the newest range, creating a hot spot
- Ranges grow unevenly over time and need rebalancing
- The application must maintain and consult range boundaries
Maintain an explicit mapping from shard key to shard:
Directory Table:
user_id 12345 → Shard 2
user_id 67890 → Shard 5
Pros:

- Complete control over placement; individual keys or tenants can be moved between shards
- Resharding is a metadata update plus a targeted data copy, not a global rehash
- Supports special cases, such as pinning an unusually large tenant to a dedicated shard

Cons:

- The directory is extra infrastructure that must be highly available, cached, and kept consistent with the data
- Every routing decision requires a lookup (or a cache hit) before the real query can run
- Highest operational complexity of the three strategies
| Strategy | Distribution | Resharding | Range Queries | Complexity |
|---|---|---|---|---|
| Hash-Based | Excellent | Hard (full rehash) | Poor (scatter-gather) | Low |
| Range-Based | Variable | Moderate (split ranges) | Good (single shard) | Medium |
| Directory-Based | Complete control | Easy (update mapping) | Depends on mapping | High |
| Consistent Hashing | Good | Minimized movement | Poor | Medium |
/**
 * Shard routing implementations
 */

import { createHash } from 'crypto';
import { Pool } from 'pg';

interface ShardConfig {
  id: number;
  connectionString: string;
  pool: Pool;
}

// Minimal row types used by the repository below
interface User {
  id: string;
  email: string;
  name: string;
  created_at: string;
}

interface CreateUserInput {
  id: string;
  email: string;
  name: string;
}

// Abstract router interface
// (directory-based routing requires a database lookup, so resolution may be async)
interface ShardRouter {
  getShardForKey(key: string): ShardConfig | Promise<ShardConfig>;
  getAllShards(): ShardConfig[];
}

// Hash-based router
class HashBasedRouter implements ShardRouter {
  constructor(private shards: ShardConfig[]) {}

  getShardForKey(key: string): ShardConfig {
    // Hash using MD5 (fast, good distribution)
    const hash = createHash('md5')
      .update(key)
      .digest();

    // Use first 4 bytes as unsigned integer
    const hashValue = hash.readUInt32BE(0);
    const shardIndex = hashValue % this.shards.length;

    return this.shards[shardIndex];
  }

  getAllShards(): ShardConfig[] {
    return this.shards;
  }
}

// Range-based router
class RangeBasedRouter implements ShardRouter {
  private ranges: Array<{ maxKey: number; shard: ShardConfig }>;

  constructor(ranges: Array<{ maxKey: number; shard: ShardConfig }>) {
    // Ranges must be sorted by maxKey
    this.ranges = ranges.sort((a, b) => a.maxKey - b.maxKey);
  }

  getShardForKey(key: string): ShardConfig {
    const keyValue = parseInt(key, 10);

    for (const range of this.ranges) {
      if (keyValue <= range.maxKey) {
        return range.shard;
      }
    }

    // Key exceeds all ranges—use last shard
    return this.ranges[this.ranges.length - 1].shard;
  }

  getAllShards(): ShardConfig[] {
    return this.ranges.map(r => r.shard);
  }
}

// Directory-based router
class DirectoryBasedRouter implements ShardRouter {
  private defaultShard: ShardConfig;

  constructor(
    private shards: ShardConfig[],
    private directoryDb: Pool,
    defaultShardId: number
  ) {
    this.defaultShard = shards.find(s => s.id === defaultShardId)!;
  }

  async getShardForKey(key: string): Promise<ShardConfig> {
    // Look up in directory
    const result = await this.directoryDb.query(
      'SELECT shard_id FROM shard_directory WHERE shard_key = $1',
      [key]
    );

    if (result.rows.length > 0) {
      const shardId = result.rows[0].shard_id;
      return this.shards.find(s => s.id === shardId)!;
    }

    // Key not in directory—assign to default shard
    // (or compute based on hash and insert into directory)
    return this.defaultShard;
  }

  getAllShards(): ShardConfig[] {
    return this.shards;
  }
}

// Usage example: ShardedUserRepository
class ShardedUserRepository {
  constructor(private router: ShardRouter) {}

  async getUser(userId: string): Promise<User | null> {
    const shard = await this.router.getShardForKey(userId);

    const result = await shard.pool.query(
      'SELECT * FROM users WHERE id = $1',
      [userId]
    );

    return result.rows[0] || null;
  }

  async createUser(user: CreateUserInput): Promise<User> {
    const shard = await this.router.getShardForKey(user.id);

    const result = await shard.pool.query(
      'INSERT INTO users (id, email, name) VALUES ($1, $2, $3) RETURNING *',
      [user.id, user.email, user.name]
    );

    return result.rows[0];
  }

  // Scatter-gather: Query all shards (expensive!)
  async searchUsers(query: string): Promise<User[]> {
    const allShards = this.router.getAllShards();

    const results = await Promise.all(
      allShards.map(shard =>
        shard.pool.query(
          'SELECT * FROM users WHERE name ILIKE $1 LIMIT 100',
          [`%${query}%`]
        )
      )
    );

    // Merge results from all shards
    return results.flatMap(r => r.rows);
  }
}

Simple hash-based sharding has a critical flaw: adding or removing a shard requires rehashing nearly all keys. Consistent hashing minimizes this data movement.
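You can see the scale of that flaw with a quick simulation: count how many keys change shards when hash-mod routing goes from N to N+1 shards. A minimal sketch, assuming the same MD5-based hash as the router above; the key count is arbitrary.

```typescript
import { createHash } from 'crypto';

// Sketch: fraction of keys that move when the shard count changes under hash % N.
const hashOf = (key: string): number =>
  createHash('md5').update(key).digest().readUInt32BE(0);

function movedFraction(numKeys: number, from: number, to: number): number {
  let moved = 0;
  for (let i = 0; i < numKeys; i++) {
    const h = hashOf(`user-${i}`);
    if (h % from !== h % to) moved++; // key lands on a different shard after resize
  }
  return moved / numKeys;
}

// Going from 4 to 5 shards with hash-mod remaps roughly 80% of keys,
// even though only ~20% of the data logically belongs on the new shard.
console.log(movedFraction(100_000, 4, 5).toFixed(2));
```

Consistent hashing, described next, reduces this to roughly the 1/N share of keys that actually belongs on the new shard.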
Imagine the hash space as a ring (0 to 2³² − 1, wrapping back around to 0). Both shard identifiers and data keys are hashed onto this ring: each key belongs to the first shard found moving clockwise from the key's position.
Minimal Redistribution: Adding a new shard only affects keys between the new shard and its clockwise neighbor. On average, only 1/N of keys move (where N is the number of shards after the addition).
Virtual Nodes: To improve distribution, each physical shard is represented by multiple "virtual nodes" on the ring. A shard with 100 virtual nodes appears at 100 ring positions, smoothing out distribution.
/**
 * Consistent hashing implementation with virtual nodes
 */

import { createHash } from 'crypto';

interface ConsistentHashNode {
  nodeId: string;
  data: any;
}

class ConsistentHashRing<T> {
  private ring: Map<number, { nodeId: string; data: T }> = new Map();
  private sortedHashes: number[] = [];
  private virtualNodeCount: number;

  constructor(virtualNodeCount = 150) {
    this.virtualNodeCount = virtualNodeCount;
  }

  private hash(key: string): number {
    const hash = createHash('md5').update(key).digest();
    return hash.readUInt32BE(0);
  }

  addNode(nodeId: string, data: T): void {
    // Add virtual nodes for this physical node
    for (let i = 0; i < this.virtualNodeCount; i++) {
      const virtualKey = `${nodeId}:${i}`;
      const hashValue = this.hash(virtualKey);
      this.ring.set(hashValue, { nodeId, data });
    }

    // Rebuild sorted hash list
    this.sortedHashes = Array.from(this.ring.keys()).sort((a, b) => a - b);
  }

  removeNode(nodeId: string): void {
    // Remove all virtual nodes for this physical node
    for (let i = 0; i < this.virtualNodeCount; i++) {
      const virtualKey = `${nodeId}:${i}`;
      const hashValue = this.hash(virtualKey);
      this.ring.delete(hashValue);
    }

    this.sortedHashes = Array.from(this.ring.keys()).sort((a, b) => a - b);
  }

  getNode(key: string): { nodeId: string; data: T } | null {
    if (this.sortedHashes.length === 0) {
      return null;
    }

    const keyHash = this.hash(key);

    // Binary search for first hash >= keyHash
    let low = 0;
    let high = this.sortedHashes.length;

    while (low < high) {
      const mid = Math.floor((low + high) / 2);
      if (this.sortedHashes[mid] < keyHash) {
        low = mid + 1;
      } else {
        high = mid;
      }
    }

    // Wrap around if necessary
    const selectedHash = low < this.sortedHashes.length
      ? this.sortedHashes[low]
      : this.sortedHashes[0];

    return this.ring.get(selectedHash)!;
  }

  // Get nodes affected by adding a new node
  getAffectedRange(newNodeId: string): { from: number; to: number }[] {
    const ranges: { from: number; to: number }[] = [];

    for (let i = 0; i < this.virtualNodeCount; i++) {
      const virtualKey = `${newNodeId}:${i}`;
      const newHash = this.hash(virtualKey);

      // Find the next node clockwise (which will lose keys to new node)
      const nextNodeHash = this.sortedHashes.find(h => h > newHash)
        || this.sortedHashes[0];

      // Find the previous node counter-clockwise (range start)
      const prevIndex = this.sortedHashes.findIndex(h => h >= newHash) - 1;
      const prevNodeHash = prevIndex >= 0
        ? this.sortedHashes[prevIndex]
        : this.sortedHashes[this.sortedHashes.length - 1];

      ranges.push({ from: prevNodeHash, to: newHash });
    }

    return ranges;
  }
}

// Usage: Shard routing with consistent hashing
interface ShardInfo {
  host: string;
  port: number;
}

const shardRing = new ConsistentHashRing<ShardInfo>(200);

// Add shards
shardRing.addNode('shard-1', { host: 'shard1.db', port: 5432 });
shardRing.addNode('shard-2', { host: 'shard2.db', port: 5432 });
shardRing.addNode('shard-3', { host: 'shard3.db', port: 5432 });

// Route a key
const userId = 'user-12345';
const shard = shardRing.getNode(userId);
console.log(`User ${userId} → ${shard?.nodeId}`);

// Adding a new shard moves only ~25% of keys (1/4 with 4 shards)
shardRing.addNode('shard-4', { host: 'shard4.db', port: 5432 });

More virtual nodes = better distribution but more memory for the ring structure and slower lookups. 100-200 virtual nodes per physical node is a common balance. Production systems like Cassandra use similar numbers.
Some operations inherently span multiple shards. These are the most challenging aspects of sharded systems.
When a query doesn't include the shard key, you must query all shards and merge results:
-- Find all users with email ending in '@company.com'
-- Shard key is user_id, not email
-- Must query all shards!
SELECT * FROM users WHERE email LIKE '%@company.com';
Performance implications:

- Latency is gated by the slowest shard; the query finishes only when the last shard responds
- Every scatter-gather puts load on all shards, so these queries get more expensive as the shard count grows
- Results must be merged, re-sorted, and re-limited in application code
- A single slow or unavailable shard degrades or fails the whole query
Some data is needed on every shard but doesn't fit the shard key model:
Strategies:

- Replicate small reference tables (countries, categories, configuration) to every shard and propagate changes with a background job (sketched below)
- Keep them in a central database and cache them aggressively in the application
- Denormalize the few fields you actually need into the sharded rows themselves
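As a concrete example of the replicate-everywhere strategy, here is a minimal sketch that fans a reference-table update out to every shard using the router from the earlier examples. The table and column names are assumptions, and a production version would be idempotent and run from a background job rather than inline.

```typescript
// Sketch: keeping a small reference table (countries) identical on every shard.
// Assumes the ShardRouter type from the routing example above and a unique
// constraint on countries(code).
async function upsertCountryEverywhere(
  router: ShardRouter,
  code: string,
  name: string
): Promise<void> {
  await Promise.all(
    router.getAllShards().map(shard =>
      shard.pool.query(
        `INSERT INTO countries (code, name)
         VALUES ($1, $2)
         ON CONFLICT (code) DO UPDATE SET name = EXCLUDED.name`,
        [code, name]
      )
    )
  );
}
```

Reference data changes rarely, so the write amplification is acceptable; reads stay local to whichever shard the query is already running on.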
/**
 * Cross-shard query patterns
 */

interface ShardedQuery<T> {
  // Execute on specific shard
  executeSingle(shardKey: string): Promise<T>;

  // Execute on all shards and merge
  executeScatterGather(): Promise<T[]>;
}

class ShardedQueryExecutor {
  constructor(
    private router: ShardRouter,
    private timeout: number = 5000
  ) {}

  // Simple scatter-gather
  async scatterGather<T>(
    buildQuery: (shard: ShardConfig) => Promise<T[]>
  ): Promise<T[]> {
    const shards = this.router.getAllShards();

    const results = await Promise.all(
      shards.map(shard =>
        Promise.race([
          buildQuery(shard),
          this.timeoutPromise<T[]>(this.timeout, [])
        ])
      )
    );

    return results.flat();
  }

  // Scatter-gather with ORDER BY and LIMIT
  async scatterGatherTopN<T>(
    buildQuery: (shard: ShardConfig, limit: number) => Promise<T[]>,
    compareFn: (a: T, b: T) => number,
    limit: number
  ): Promise<T[]> {
    const shards = this.router.getAllShards();

    // Request 'limit' from each shard
    const results = await Promise.all(
      shards.map(shard => buildQuery(shard, limit))
    );

    // Merge and re-sort
    const merged = results.flat();
    merged.sort(compareFn);

    return merged.slice(0, limit);
  }

  // Scatter-gather aggregation
  async scatterGatherAggregate<TAgg>(
    buildAggQuery: (shard: ShardConfig) => Promise<TAgg>,
    combineAggregates: (aggs: TAgg[]) => TAgg
  ): Promise<TAgg> {
    const shards = this.router.getAllShards();

    const partialResults = await Promise.all(
      shards.map(shard => buildAggQuery(shard))
    );

    return combineAggregates(partialResults);
  }

  private timeoutPromise<T>(ms: number, fallback: T): Promise<T> {
    return new Promise(resolve =>
      setTimeout(() => resolve(fallback), ms)
    );
  }
}

// Example: Search users across all shards
async function searchUsers(
  executor: ShardedQueryExecutor,
  query: string,
  limit: number
): Promise<User[]> {
  return executor.scatterGatherTopN<User>(
    async (shard, shardLimit) => {
      const result = await shard.pool.query(
        `SELECT * FROM users
         WHERE name ILIKE $1
         ORDER BY created_at DESC
         LIMIT $2`,
        [`%${query}%`, shardLimit]
      );
      return result.rows;
    },
    // Sort by created_at descending
    (a, b) => new Date(b.created_at).getTime() - new Date(a.created_at).getTime(),
    limit
  );
}

// Example: Count all users
async function countAllUsers(
  executor: ShardedQueryExecutor
): Promise<number> {
  return executor.scatterGatherAggregate<number>(
    async (shard) => {
      const result = await shard.pool.query(
        'SELECT COUNT(*) as count FROM users'
      );
      return parseInt(result.rows[0].count, 10);
    },
    (counts) => counts.reduce((sum, c) => sum + c, 0)
  );
}

You cannot efficiently JOIN tables across shards. If you need to join users with orders and they're on different shards, you must fetch from each shard separately and join in application code. Design your sharding so related data is colocated on the same shard.
ACID transactions don't naturally span shards. When business logic requires atomic operations across shards, you have limited options:
The classic distributed transaction protocol:

1. Prepare phase: a coordinator asks every participating shard to prepare; each shard durably stages its changes, holds its locks, and votes yes or no.
2. Commit phase: if every shard voted yes, the coordinator instructs them all to commit; if any voted no or timed out, it instructs them all to abort.

Problems with 2PC:

- Blocking: if the coordinator fails between phases, participants hold locks until it recovers
- Latency: every transaction pays multiple network round trips to every involved shard
- Availability: the transaction is only as available as the least available participant plus the coordinator
- Practicality: coordinating 2PC across independently operated database instances is complex, and many application stacks don't support it well
As discussed in functional partitioning, sagas are the preferred approach for cross-shard operations: break the operation into a sequence of local, single-shard transactions, and if a later step fails, undo the completed steps with compensating transactions. There is no distributed commit and no global lock; consistency is eventual rather than atomic.
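Here is a hedged sketch of a two-step saga across shards, reusing the ShardRouter/ShardConfig types from the earlier routing example. The accounts table and the debit/credit/refund helpers are hypothetical placeholders; a production saga would also persist its progress so it can resume or compensate after a crash.

```typescript
// Sketch: a saga across two shards with a compensating action.
// debit/credit/refund are illustrative single-shard steps, not a real API.

async function debit(shard: ShardConfig, userId: string, amount: number): Promise<void> {
  await shard.pool.query(
    'UPDATE accounts SET balance = balance - $2 WHERE user_id = $1',
    [userId, amount]
  );
}

async function credit(shard: ShardConfig, userId: string, amount: number): Promise<void> {
  await shard.pool.query(
    'UPDATE accounts SET balance = balance + $2 WHERE user_id = $1',
    [userId, amount]
  );
}

const refund = credit; // the compensating action simply reverses the debit

async function transferCredits(
  router: ShardRouter,
  fromUserId: string,
  toUserId: string,
  amount: number
): Promise<void> {
  const fromShard = await router.getShardForKey(fromUserId);
  const toShard = await router.getShardForKey(toUserId);

  await debit(fromShard, fromUserId, amount);     // Step 1: local txn on source shard
  try {
    await credit(toShard, toUserId, amount);      // Step 2: local txn on destination shard
  } catch (err) {
    await refund(fromShard, fromUserId, amount);  // Compensate step 1; no distributed rollback
    throw err;
  }
}
```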
The best strategy is avoiding cross-shard transactions entirely:
Colocate related data: If user and orders share a shard key (user_id), an order placement transaction is single-shard.
Denormalize: Store necessary data locally instead of referencing another shard.
Accept eventual consistency: For some operations, strict atomicity isn't required. Transfer money between users? Process as two events with reconciliation.
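To make the colocation strategy concrete: because orders share the user's shard key, placing an order touches exactly one shard and can use an ordinary local transaction. A minimal sketch; the order_count column and table layout are assumptions carried over from the earlier examples.

```typescript
// Sketch: colocated data keeps the common path single-shard and fully ACID.
// Orders are sharded by user_id, so the user row and the order row live together.
async function placeOrder(
  router: ShardRouter,
  userId: string,
  orderId: string,
  totalCents: number
): Promise<void> {
  const shard = await router.getShardForKey(userId);
  const client = await shard.pool.connect();
  try {
    await client.query('BEGIN');
    await client.query(
      'INSERT INTO orders (id, user_id, total_cents) VALUES ($1, $2, $3)',
      [orderId, userId, totalCents]
    );
    await client.query(
      'UPDATE users SET order_count = order_count + 1 WHERE id = $1',
      [userId]
    );
    await client.query('COMMIT'); // ordinary local transaction; no 2PC, no saga needed
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
```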
Resharding—adding, removing, or rebalancing shards—is one of the most operationally complex procedures in database management. Poor resharding can cause extended downtime or data loss.
Online resharding (preferred when possible): migrate data while continuing to serve traffic, typically by replicating to the new shard, dual-writing, and shifting reads and then writes over gradually. The full procedure appears below.

Offline resharding (simpler but causes downtime): stop writes, copy the affected data to its new shard(s), update the shard map, and resume traffic. Viable only when the data can be copied within your downtime budget.
With consistent hashing, adding a shard moves only a fraction of keys. With 4 shards becoming 5, only ~20% of keys move. With simple hash mod, all keys need reconsideration.
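You can verify that claim empirically with the ConsistentHashRing class from earlier: route a sample of keys, add a fifth shard, and count how many keys change owners. The key count and shard names here are arbitrary.

```typescript
// Sketch: measuring key movement when a 5th shard joins a consistent-hash ring.
// Uses the ConsistentHashRing class from the consistent hashing example above.
function measureMovement(numKeys: number): number {
  const ring = new ConsistentHashRing<{ host: string }>(200);
  ['shard-1', 'shard-2', 'shard-3', 'shard-4'].forEach((id, i) =>
    ring.addNode(id, { host: `shard${i + 1}.db` })
  );

  // Record the current owner of each key
  const before = new Map<string, string>();
  for (let i = 0; i < numKeys; i++) {
    before.set(`user-${i}`, ring.getNode(`user-${i}`)!.nodeId);
  }

  // Add a fifth shard and count keys whose owner changed
  ring.addNode('shard-5', { host: 'shard5.db' });
  let moved = 0;
  for (let i = 0; i < numKeys; i++) {
    if (ring.getNode(`user-${i}`)!.nodeId !== before.get(`user-${i}`)) moved++;
  }
  return moved / numKeys; // expect roughly 0.2 (about 1/5 of keys)
}

console.log(measureMovement(50_000).toFixed(2));
```

The checklist below covers the operational side of actually moving that data.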
# Online Resharding Procedure

## Prerequisites
- [ ] New shard database provisioned
- [ ] Replication configured from source shard(s)
- [ ] Monitoring for new shard in place
- [ ] Rollback procedure documented

## Phase 1: Replication Setup (No Impact)
1. Create tables on new shard with same schema
2. Set up logical replication from source shard(s)
   - PostgreSQL: Use pglogical or Debezium
   - MySQL: Set up row-based replication
3. Wait for replication to catch up (lag < 1s)
4. Monitor for errors

## Phase 2: Dual-Write (Write Amplification)
1. Update application to write to BOTH source and destination
2. Validate data consistency between shards
3. Monitor write latency (should not double)

## Phase 3: Read Migration (Gradual)
1. Update routing config: 5% reads to new shard
2. Monitor error rates and latency
3. Increase to 25%, 50%, 75%, 100%
4. If issues, roll back to 0% and investigate

## Phase 4: Write Migration (Critical)
1. Enable write routing to new shard
2. Disable writes to source shard for migrated keys
3. Verify no writes going to old location

## Phase 5: Cleanup
1. Stop replication from source
2. Remove migrated data from source shard
3. Update shard map to remove old references
4. Run VACUUM on source shard to reclaim space

## Rollback Procedure
At any phase:
1. Revert routing config to 100% source
2. Stop dual-writes (keep source as truth)
3. Discard new shard data
4. Investigate failure

Plan for resharding to take days or weeks for large datasets. Rushing causes data loss or extended outages. Test the procedure on staging with production-like data volumes before attempting in production.
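Phase 2 (dual-write) is the part that usually lives in application code. Here is a hedged sketch of a write wrapper that mirrors writes to the destination shard while the source remains the source of truth; isDualWriteEnabled stands in for whatever feature flag or migration-state lookup you use, and destination failures are logged for reconciliation rather than surfaced to users.

```typescript
// Sketch: dual-write wrapper for Phase 2 of the resharding procedure.
// The source shard stays authoritative; destination failures are recorded for
// the consistency-validation step instead of failing the user's request.
async function dualWrite(
  source: ShardConfig,
  destination: ShardConfig,
  sql: string,
  params: unknown[],
  isDualWriteEnabled: (shardKey: string) => boolean, // hypothetical feature flag
  shardKey: string
): Promise<void> {
  // 1. Write to the source shard first; this is the write that must succeed.
  await source.pool.query(sql, params);

  // 2. Mirror to the destination shard if this key is in the migration window.
  if (isDualWriteEnabled(shardKey)) {
    try {
      await destination.pool.query(sql, params);
    } catch (err) {
      // Do not fail the request; log for later reconciliation.
      console.error(`dual-write mirror failed for key ${shardKey}`, err);
    }
  }
}
```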
Let's consolidate the key insights from our exploration of application-level sharding:

- Shard only after exhausting simpler options: query optimization, indexing, caching, read replicas, and functional partitioning
- The shard key is the most important decision: it should have high cardinality, distribute evenly, appear in the vast majority of queries, and be shared by data that must be colocated
- Routing strategies trade off differently: hash-based gives even distribution but painful resharding, range-based supports range queries but risks hot spots, directory-based gives full control at the cost of extra infrastructure
- Consistent hashing with virtual nodes limits data movement when shards are added or removed
- Cross-shard queries and transactions are expensive; design so the common path is single-shard, and use sagas where multi-shard operations are unavoidable
- Resharding is a multi-phase, high-risk operational procedure: plan it, rehearse it on staging, and keep a rollback path at every step
What's Next:
For organizations that find application-level sharding too complex to build and maintain, there's an alternative: NewSQL databases. These systems provide the horizontal scalability of sharding with the familiar interface and ACID guarantees of traditional SQL databases. We'll explore when and how to consider these alternatives.
You now understand the mechanics, strategies, and trade-offs of application-level sharding. This is the most powerful—and most complex—SQL database scaling strategy. Use it when other options are exhausted and the benefits justify the significant engineering investment.