If sharding is the foundation of database scaling, the shard key is the cornerstone of that foundation. Every other sharding decision—strategy (range, hash, directory), number of shards, rebalancing approach—flows from this single choice.
The shard key determines how evenly data spreads across shards, which queries can be answered by a single shard versus scatter-gather, and how much data must move if a key value ever changes.
Choose well, and your system scales gracefully for years. Choose poorly, and you'll either suffer constant performance issues or face an expensive re-sharding migration.
By the end of this page, you will understand the principles of shard key selection, recognize common patterns and anti-patterns, analyze access patterns to identify optimal keys, and apply a decision framework to choose shard keys for real-world scenarios. This is perhaps the most important content in this entire module.
A shard key is the column (or combination of columns) used to determine which shard stores each row. Every row's shard is determined by applying the sharding function to its shard key value.
What Makes a Good Shard Key:
The ideal shard key satisfies multiple, sometimes conflicting, properties. Balancing these tradeoffs is the art of shard key selection.
1. High Cardinality
The shard key should have many distinct values to allow even distribution. A boolean column (true/false) is terrible—data splits into at most 2 shards. A UUID or auto-increment ID has essentially unlimited cardinality.
2. Even Distribution
Distinct values should occur with relatively equal frequency. A country column might have 200 values, but if 60% of users are in the US, you have severe skew.
3. Query Alignment
The shard key should appear in the WHERE clause of your most common queries. If 80% of queries filter by user_id, shard by user_id. If queries filter by created_at, consider time-based sharding.
4. Stability
The shard key value for a given entity should rarely change. If users frequently update their region, sharding by region means constantly moving data between shards.
| Property | What to Look For | Warning Signs | Impact if Missing |
|---|---|---|---|
| Cardinality | Millions+ of distinct values | < 100 distinct values | Cannot scale beyond N shards where N = cardinality |
| Distribution | Zipf < 1.5, no single key dominates | One value has >5% of data | Hot shards, wasted capacity |
| Query Alignment | Key in 80%+ of WHERE clauses | Most queries filter by other columns | Scatter-gather for common queries |
| Stability | Value set once, rarely changes | Frequent updates to key column | Data migration churn |
Treat the shard key as immutable after initial assignment. While you can technically update a shard key value, doing so requires moving the entire entity to a new shard—an expensive, error-prone operation. Design your data model so shard key values are permanent.
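The mechanics these properties feed into can be sketched in a few lines. This is a minimal hash-mod router (key names and shard count are illustrative), showing why stability matters: placement is a pure function of the shard key.

```typescript
// Minimal sketch: a row's shard is a pure function of its shard key value.
function hashKey(key: string): number {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (Math.imul(31, h) + key.charCodeAt(i)) | 0;
  }
  return Math.abs(h);
}

function getShard(shardKey: string, shardCount: number): number {
  return hashKey(shardKey) % shardCount;
}

// The same key always maps to the same shard...
console.log(getShard("user-42", 16) === getShard("user-42", 16)); // true
// ...so changing a row's shard key value means physically moving the row.
```

Because placement is deterministic, updating a shard key value is equivalent to deleting the row from one shard and re-inserting it on another, which is exactly the churn the stability property warns against.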
Certain shard key patterns appear repeatedly across different domains because they align with common access patterns. Understanding these patterns helps you recognize which applies to your situation.
Pattern 1: Tenant/Organization ID
For multi-tenant SaaS, shard by tenant_id (or org_id, account_id, workspace_id). Each tenant's data is co-located, so tenant-scoped queries stay single-shard.
Examples: Salesforce, Slack, Shopify, Atlassian
Pattern 2: User ID
For user-centric applications, shard by user_id. All of a user's data routes to one shard, matching the per-user access pattern of most requests.
Examples: Facebook, Twitter, LinkedIn
Pattern 3: Timestamp
For time-series and logging systems, shard by time. Queries over a time window touch only the shards covering that window, and old partitions can be archived or dropped cheaply.
Examples: DataDog, Splunk, TimescaleDB
Pattern 4: Geographic Region
For geo-distributed applications, shard by region. Data stays close to the users who access it, reducing latency and simplifying data-residency compliance.
Examples: Uber, Doordash (for ride/delivery data)
Ask: 'What entity defines the boundary of most operations?' If the answer is 'tenant' (in SaaS), shard by tenant. If 'user' (in social), shard by user. If 'time period' (in analytics), shard by time. The shard key should match the natural isolation boundary of your domain.
Learning from failures is as important as learning from successes. These anti-patterns have caused production outages, expensive re-sharding projects, and abandoned systems.
- Low cardinality: status (active/inactive), country (200 values), or type (10 values) limits shard count forever.
- Mutable keys: last_login or a mutable status requires moving data whenever the value changes.
- Misaligned lookups: if users log in by email but you shard by user_id, you need an email→user_id lookup before every query.
- Skewed values: category_id where millions of items share the same category creates massive skew.
- Wrong composite leading column: a composite key (tenant_id, user_id) where queries only filter by user_id requires scatter-gather.

Case Study: Sharding by Status
A team sharded an orders table by order_status:
Problems:
- order_status has only a handful of distinct values, capping the shard count at that handful
- Orders change status constantly (pending, shipped, delivered), so every transition moved rows between shards
- Nearly all historical orders end in the terminal status, concentrating data on one shard
Result: Complete re-sharding to customer_id after 6 months.
Case Study: Sharding by Email Domain
A team sharded users by email domain for 'fraud detection':
Problems:
- A few consumer domains (gmail.com and similar) account for most users, producing extreme skew
- The skew worsens with scale: shards holding popular domains overload while the rest sit nearly idle
Result: Catastrophic failure at scale.
Even with good shard key choices (user_id), distribution can become skewed. One viral user (celebrity) can generate 1000x the data of normal users, overwhelming their shard. Solutions include: dedicated shards for celebrities, sub-sharding by (user_id, bucket), or rate limiting per-user data volume.
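The sub-sharding idea can be sketched as follows. This is an illustrative salting scheme; the bucket count per hot user and the function names are assumptions, not a standard API.

```typescript
// Sketch: spread one hot user's data across several buckets so a single
// celebrity's writes land on multiple shards instead of one.
const BUCKETS_PER_HOT_USER = 8; // assumed; tune to observed hot-key volume

function hashStr(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (Math.imul(31, h) + s.charCodeAt(i)) | 0;
  }
  return Math.abs(h);
}

// Writes pick a bucket at random; each bucket hashes to its own shard.
function writeShard(userId: string, shardCount: number, isHot: boolean): number {
  const bucket = isHot ? Math.floor(Math.random() * BUCKETS_PER_HOT_USER) : 0;
  return hashStr(`${userId}:${bucket}`) % shardCount;
}

// Reads must fan out over every bucket the user might have written to.
function readShards(userId: string, shardCount: number, isHot: boolean): number[] {
  const buckets = isHot ? BUCKETS_PER_HOT_USER : 1;
  const shards = new Set<number>();
  for (let b = 0; b < buckets; b++) {
    shards.add(hashStr(`${userId}:${b}`) % shardCount);
  }
  return Array.from(shards);
}
```

The tradeoff is visible in the read path: normal users stay single-shard, while reads for a hot user become a small scatter-gather across that user's buckets.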
Sometimes a single column doesn't meet all shard key requirements. Composite shard keys combine multiple columns, providing finer-grained distribution and enabling efficient queries that filter by multiple dimensions.
When to Use Composite Keys:
1. Single column has low cardinality but the combination is high: region (10 values) + user_id (millions) = high cardinality
2. Queries filter by multiple dimensions: e.g., tenant_id AND date
3. Need hierarchical organization: org_id / team_id / user_id hierarchy

Cassandra's Compound Key Model:
Cassandra provides an elegant compound key model that many systems emulate:
PRIMARY KEY ((partition_key), clustering_columns)
Example: PRIMARY KEY ((tenant_id), created_at, event_id)
- Partition key: tenant_id determines which node stores the partition
- Clustering columns: rows within a partition are ordered by created_at, then event_id
- Range queries on created_at are efficient within a tenant
```typescript
/**
 * Composite Shard Key Implementation
 *
 * Uses multiple columns to determine shard placement
 */
interface CompositeKey {
  primary: string;    // Required, determines shard
  secondary?: string; // Optional, refines distribution
}

class CompositeShardRouter {
  private shardCount: number;
  private useSecondary: boolean;

  constructor(shardCount: number, useSecondary: boolean = false) {
    this.shardCount = shardCount;
    this.useSecondary = useSecondary;
  }

  /**
   * Compute shard from composite key
   */
  getShard(key: CompositeKey): number {
    let hashInput: string;
    if (this.useSecondary && key.secondary) {
      // Hash both components
      hashInput = `${key.primary}:${key.secondary}`;
    } else {
      // Hash only primary
      hashInput = key.primary;
    }
    return this.hash(hashInput) % this.shardCount;
  }

  /**
   * Check if a query can be routed to a single shard
   */
  canRouteSingleShard(query: { primary?: string; secondary?: string }): boolean {
    // Must have primary key
    if (!query.primary) return false;
    // If using secondary for sharding, must have secondary
    if (this.useSecondary && !query.secondary) return false;
    return true;
  }

  private hash(input: string): number {
    let hash = 0;
    for (let i = 0; i < input.length; i++) {
      hash = (Math.imul(31, hash) + input.charCodeAt(i)) | 0;
    }
    return Math.abs(hash);
  }
}

// Example: IoT sensor data
// Composite key: (device_id, date)
const router = new CompositeShardRouter(16, true);

// Query for specific device on specific date -> single shard
const shard1 = router.getShard({ primary: "device-abc123", secondary: "2024-01-15" });
console.log(`device-abc123 on 2024-01-15: Shard ${shard1}`);

// Same device, different date -> potentially different shard
const shard2 = router.getShard({ primary: "device-abc123", secondary: "2024-01-16" });
console.log(`device-abc123 on 2024-01-16: Shard ${shard2}`);

// Query routing analysis
console.log("Can route single shard:");
console.log("  device + date:", router.canRouteSingleShard({ primary: "d1", secondary: "2024-01-15" })); // true
console.log("  device only:", router.canRouteSingleShard({ primary: "d1" })); // false (scatter-gather)
console.log("  date only:", router.canRouteSingleShard({ secondary: "2024-01-15" })); // false (scatter-gather)
```

With composite keys, queries providing only part of the key require scatter-gather. If your composite key is (tenant_id, user_id), querying just by user_id requires checking all shards. Design your composite key so the most common query filter is the leading component.
The best shard key emerges from analyzing how your application actually accesses data. This is not a theoretical exercise—it requires examining real (or projected) query patterns.
Access Pattern Analysis Framework:
Step 1: Enumerate All Query Types
List every query your application runs. For each, note: its frequency (QPS), the columns it filters on, whether it sits on the user-facing critical path, and whether it is a read or a write.
Step 2: Identify Filter Columns
For each query, which columns appear in WHERE clauses? Tally the frequency:
| Column | Queries Using It | Total QPS | Critical Path |
|---|---|---|---|
| user_id | 15/20 queries | 10,000 | Yes |
| tenant_id | 18/20 queries | 12,000 | Yes |
| created_at | 8/20 queries | 3,000 | No |
| status | 5/20 queries | 500 | No |
Step 3: Score Candidate Shard Keys
```typescript
/**
 * Shard Key Candidate Scoring
 *
 * Analyzes access patterns to recommend optimal shard key
 */
interface Query {
  name: string;
  qps: number;             // Queries per second
  filters: string[];       // Columns in WHERE clause
  isCriticalPath: boolean; // Part of user-facing latency
  isWrite: boolean;        // Write operation
}

interface ShardKeyScore {
  column: string;
  singleShardQps: number;        // QPS that hits single shard
  scatterGatherQps: number;      // QPS that needs all shards
  criticalPathAlignment: number; // % of critical queries aligned
  writeAlignment: number;        // % of writes aligned
  overallScore: number;
}

function analyzeShardKeyCandidates(
  queries: Query[],
  candidates: string[]
): ShardKeyScore[] {
  return candidates.map(column => {
    let singleShardQps = 0;
    let scatterGatherQps = 0;
    let criticalAligned = 0;
    let criticalTotal = 0;
    let writeAligned = 0;
    let writeTotal = 0;

    for (const query of queries) {
      const aligned = query.filters.includes(column);
      if (aligned) {
        singleShardQps += query.qps;
      } else {
        scatterGatherQps += query.qps;
      }
      if (query.isCriticalPath) {
        criticalTotal += query.qps;
        if (aligned) criticalAligned += query.qps;
      }
      if (query.isWrite) {
        writeTotal += query.qps;
        if (aligned) writeAligned += query.qps;
      }
    }

    const totalQps = singleShardQps + scatterGatherQps;

    // Score: weighted combination of factors
    const overallScore =
      (singleShardQps / totalQps) * 40 +                               // 40% weight on single-shard ratio
      (criticalTotal > 0 ? criticalAligned / criticalTotal : 0) * 35 + // 35% on critical path
      (writeTotal > 0 ? writeAligned / writeTotal : 0) * 25;           // 25% on write alignment

    return {
      column,
      singleShardQps,
      scatterGatherQps,
      criticalPathAlignment: criticalTotal > 0 ? criticalAligned / criticalTotal : 0,
      writeAlignment: writeTotal > 0 ? writeAligned / writeTotal : 0,
      overallScore,
    };
  });
}

// Example: E-commerce application queries
const queries: Query[] = [
  { name: "Get user profile", qps: 5000, filters: ["user_id"], isCriticalPath: true, isWrite: false },
  { name: "Get user orders", qps: 3000, filters: ["user_id"], isCriticalPath: true, isWrite: false },
  { name: "Create order", qps: 500, filters: ["user_id"], isCriticalPath: true, isWrite: true },
  { name: "Get order by ID", qps: 1000, filters: ["order_id"], isCriticalPath: true, isWrite: false },
  { name: "Search products", qps: 2000, filters: ["category_id"], isCriticalPath: true, isWrite: false },
  { name: "Admin: orders by status", qps: 10, filters: ["status"], isCriticalPath: false, isWrite: false },
  { name: "Analytics: daily orders", qps: 5, filters: ["created_at"], isCriticalPath: false, isWrite: false },
];

const candidates = ["user_id", "order_id", "category_id", "status"];
const scores = analyzeShardKeyCandidates(queries, candidates);

console.log("Shard Key Candidate Analysis:");
console.log("============================");
scores
  .sort((a, b) => b.overallScore - a.overallScore)
  .forEach(score => {
    console.log(`${score.column}:
  Single-shard QPS: ${score.singleShardQps}
  Scatter-gather QPS: ${score.scatterGatherQps}
  Critical path alignment: ${(score.criticalPathAlignment * 100).toFixed(1)}%
  Write alignment: ${(score.writeAlignment * 100).toFixed(1)}%
  Overall Score: ${score.overallScore.toFixed(1)}`);
  });

// Expected output: user_id scores highest
```

The best input for access pattern analysis is production query logs. Use slow query logs, application APM data, or database query statistics. Real data beats assumptions every time. If building a new system, analyze the access patterns of similar systems or create realistic projections based on user flows.
No matter how well you choose your shard key, some operations will inevitably cross shard boundaries. The key is minimizing their frequency and handling them gracefully when they occur.
Types of Cross-Shard Operations, and a Strategy for Each:
| Operation Type | Strategy | Tradeoff | Example |
|---|---|---|---|
| Scatter-gather query | Parallel query all shards, merge results | Latency = slowest shard | Search across all users |
| Cross-shard join | Query each side, join in application | Memory + network overhead | User → Orders (different shards) |
| Distributed transaction | 2PC or Saga pattern | Performance + complexity | Transfer between accounts |
| Global aggregation | Pre-compute in background job | Staleness | Total user count |
| Global secondary index | Maintain separate index service | Write overhead | Email → User lookup |
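The scatter-gather row in the table above can be sketched concretely. This is an illustrative skeleton: `queryShard` stands in for whatever per-shard client your system actually uses.

```typescript
// Sketch of scatter-gather: query every shard in parallel, then merge.
// Total latency is bounded by the slowest shard, as the table notes.
type Row = { id: string; score: number };

// Stand-in for a real per-shard query client (assumption for illustration).
async function queryShard(shardId: number, limit: number): Promise<Row[]> {
  return [{ id: `shard${shardId}-row`, score: Math.random() }];
}

async function scatterGather(shardCount: number, limit: number): Promise<Row[]> {
  // Fan out: one query per shard, all in flight at once.
  const perShard = await Promise.all(
    Array.from({ length: shardCount }, (_, s) => queryShard(s, limit))
  );
  // Merge: flatten, re-sort globally, re-apply the limit.
  return perShard.flat().sort((a, b) => b.score - a.score).slice(0, limit);
}
```

Note that each shard is asked for the full `limit` (not `limit / shardCount`), because any single shard might hold all of the global top results; the merge step then discards the excess.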
The Global Secondary Index Pattern:
For lookups by non-shard-key columns, maintain a separate index that maps alternative keys to shard keys:
Email Index: (email → user_id)
Username Index: (username → user_id)
The index can be a lookup table sharded by the alternative key (e.g., by hash of email), entries in a key-value store, or a dedicated index service; any form works as long as it supports fast exact-match lookups.
Login Flow Example:
1. Look up email_index[email] → user_id
2. Fetch the user: SELECT * FROM users WHERE id = user_id

Two lookups, but both are single-shard.
Enforcing global uniqueness (unique email across all shards) without a global index requires querying all shards on every insert—a scalability nightmare. Instead, use a global index for uniqueness enforcement: attempt to insert into the index first, fail if exists, then insert into the shard.
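The index-first insert described above might look like the following sketch. The in-memory `Map` stands in for the index service, and `indexPutIfAbsent` is a hypothetical helper assumed to be atomic in a real deployment.

```typescript
// Sketch: enforce a globally unique email by claiming it in the global
// index before inserting the user row into its shard.
const emailIndex = new Map<string, string>(); // stands in for the index service

// Hypothetical helper: atomically claim the email if nobody owns it yet.
function indexPutIfAbsent(email: string, userId: string): boolean {
  if (emailIndex.has(email)) return false;
  emailIndex.set(email, userId);
  return true;
}

function createUser(email: string, userId: string): boolean {
  // Step 1: claim the email in the global index; fail fast on duplicates.
  if (!indexPutIfAbsent(email, userId)) {
    return false; // email already taken, and no shard was touched
  }
  // Step 2: insert the user row into its shard (stubbed out here).
  // If this step fails, the index entry must be rolled back or reaped
  // by a cleanup job, since the two writes are not in one transaction.
  return true;
}

console.log(createUser("a@example.com", "u1")); // true
console.log(createUser("a@example.com", "u2")); // false (duplicate)
```

The ordering matters: claiming the index entry first means a duplicate is rejected with a single lookup, and a crash between the two steps leaves only a dangling index entry to clean up, never a user row with an unenforced email.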
Let's integrate everything into a practical decision framework. Follow these steps to choose your shard key.
```markdown
# Shard Key Decision Document

## System: [Your System Name]
## Table: [Table Being Sharded]
## Date: [Decision Date]

## 1. Candidate Shard Keys

| Candidate  | Cardinality | Distribution | In Common Queries | Stability |
|------------|-------------|--------------|-------------------|-----------|
| user_id    | 100M+       | Even         | 85% of queries    | Immutable |
| tenant_id  | 10K         | Skewed (10%) | 90% of queries    | Immutable |
| created_at | Continuous  | Even         | 30% of queries    | Immutable |

## 2. Access Pattern Analysis

Top 5 queries by QPS:
1. Get user profile (user_id) - 5000 QPS, critical path
2. Get user orders (user_id) - 3000 QPS, critical path
3. Search products (category_id) - 2000 QPS, critical path
4. Create order (user_id) - 500 QPS, critical path, write
5. Get order by ID (order_id) - 1000 QPS, critical path

## 3. Shard Key Scores

| Candidate | Single-Shard % | Critical Path % | Write % | Score |
|-----------|----------------|-----------------|---------|-------|
| user_id   | 75%            | 85%             | 100%    | 82    |
| tenant_id | 80%            | 90%             | 100%    | 87    |
| order_id  | 10%            | 10%             | 0%      | 8     |

## 4. Selected Shard Key: tenant_id

### Rationale:
- Highest critical path alignment (90%)
- All writes routed to single shard
- Multi-tenant isolation is business requirement
- Large tenants can get dedicated shards

### Accepted Tradeoffs:
- Product search requires scatter-gather (acceptable: uses Elasticsearch)
- Cross-tenant analytics require scatter-gather (acceptable: batch job)

### Cross-Shard Operations:
- Global user lookup: Email index service
- Analytics: Nightly aggregation job
- Admin dashboards: Scatter-gather (low volume)

## 5. Implementation Notes
- Use hash(tenant_id) % 64 for initial sharding
- Directory override for enterprise tenants
- Review distribution quarterly

## 6. Approved By: [Names]
```

A formal decision document prevents future debates, helps onboard new team members, and preserves the reasoning when you need to evaluate changes. When someone asks 'Why did we shard by tenant_id?', point them to the document.
We've covered the critical topic of shard key selection. Let's consolidate the key insights:
- A good shard key has high cardinality, even distribution, query alignment, and stability
- Common patterns: tenant ID (SaaS), user ID (social), timestamp (time-series), region (geo)
- Avoid low-cardinality, mutable, or skewed keys; re-sharding later is expensive
- Composite keys raise cardinality and serve multi-dimension queries, but partial-key queries fall back to scatter-gather
- Score candidates against real access patterns; production query logs beat assumptions
- Minimize cross-shard operations, and handle the rest with scatter-gather, global secondary indexes, and pre-computation
- Record the choice and its tradeoffs in a shard key decision document
Module Complete:
Congratulations! You've completed the Sharding (Partitioning) module. You now understand:
- Why and when to shard a database
- The main sharding strategies: range, hash, and directory-based
- How to select a shard key and avoid the common anti-patterns
- How to handle cross-shard queries, joins, transactions, and global indexes
- How to plan shard counts and approach rebalancing
With this knowledge, you can design and implement sharded database architectures that scale to millions of users and petabytes of data.
You now possess comprehensive knowledge of database sharding—from fundamental concepts to advanced implementation patterns. Apply this knowledge thoughtfully: shard when necessary, choose your shard key deliberately, and always design for the operations that matter most to your users.