Every DynamoDB implementation rises or falls on a single decision: partition key design.
This isn't hyperbole. I've witnessed systems handling 100,000 requests per second with rock-solid sub-5ms latency—and I've seen systems throttle at 1,000 requests per second while burning money on unused capacity. The difference? Not hardware, not configuration, not code optimization. The difference was partition key selection.
The partition key is DynamoDB's fundamental distribution mechanism. It determines which physical partition stores your data, how load is distributed across the cluster, and ultimately whether your system can scale. Choose wisely, and DynamoDB scales linearly with your traffic. Choose poorly, and you create hot partitions that become bottlenecks no amount of provisioned capacity can fix.
By the end of this page, you will understand how partition keys determine data distribution, the mathematics of partition capacity limits, strategies for designing high-cardinality keys, techniques for handling low-cardinality data, and advanced patterns like write sharding that unlock unlimited scale.
Before we can design effective partition keys, we must understand what partitions are and how they work.
What is a Partition?
A partition is the fundamental unit of data storage and throughput in DynamoDB. Physically, a partition is a slice of an SSD managed by a storage node, but conceptually, it's simpler to think of it as a container with hard limits:

- 10 GB of storage
- 3,000 RCU of read throughput
- 1,000 WCU of write throughput
When you create a table and start writing data, DynamoDB allocates partitions based on your throughput settings (for provisioned mode) or automatically (for on-demand mode). As data grows or throughput needs increase, DynamoDB automatically splits partitions.
```
┌──────────────────────────────────────────────────────────────────────┐
│ How Partition Keys Map to Partitions                                 │
└──────────────────────────────────────────────────────────────────────┘

Partition Key Value ───► Hash Function ───► Partition Hash (Internal)
                                                    │
                                                    ▼
              ┌──────────────────────────────────────────┐
              │ Hash Range: 0 to 2^128                   │
              │                                          │
 Partition 1  │ ████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░   │ (0 - X)
 Partition 2  │ ░░░░░░░░████████░░░░░░░░░░░░░░░░░░░░░░   │ (X - Y)
 Partition 3  │ ░░░░░░░░░░░░░░░░████████░░░░░░░░░░░░░░   │ (Y - Z)
 Partition N  │ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░████████ │ (Z - Max)
              │                                          │
              └──────────────────────────────────────────┘

Example: userId = "U-12345"
  Hash("U-12345") = 0x7A3F...  (falls in Partition 2's range)
  All items with userId="U-12345" stored in Partition 2

Example: userId = "U-67890"
  Hash("U-67890") = 0x1B2C...  (falls in Partition 1's range)
  All items with userId="U-67890" stored in Partition 1
```

Key Insight: Partition Key = Partition Affinity
All items with the same partition key value are stored in the same partition. This has profound implications:

- Every read and write for a given key value is served by a single partition.
- A single key's throughput is capped by that partition's limits, no matter how much capacity the table has.
- All items sharing a partition key live together, so one very popular key concentrates both storage and traffic.
This is why partition key design is so critical. The hash function distributes partition key values across partitions uniformly, but if your application doesn't distribute traffic across partition key values uniformly, you'll have hot partitions.
DynamoDB allocates throughput per partition, not per table. If you provision 10,000 WCU but all traffic goes to one partition, you effectively have only 1,000 WCU available. The other 9,000 WCU sit unused on other partitions. This is why throwing more capacity at a hot partition problem doesn't work—you need to fix the partition key design.
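To make the distribution mechanics concrete, here is a small simulation—a sketch using FNV-1a as a stand-in hash, since DynamoDB's internal hash function is not public—that buckets partition key values into partitions and compares a high-cardinality key against a single hot key value:

```typescript
// Sketch: FNV-1a over UTF-16 code units as an illustrative stand-in
// for DynamoDB's internal (non-public) partition hash.
function bucketFor(key: string, partitions: number): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    h ^= key.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0) % partitions;
}

// Count how a stream of partition key values lands across partitions.
function partitionLoad(keys: string[], partitions: number): number[] {
  const load = new Array(partitions).fill(0);
  for (const k of keys) load[bucketFor(k, partitions)]++;
  return load;
}

// High cardinality: 10,000 distinct userIds over 10 partitions.
const uniform = partitionLoad(
  Array.from({ length: 10_000 }, (_, i) => `U-${i}`),
  10
);

// Low cardinality: every request uses the same "active" status value.
const skewed = partitionLoad(new Array(10_000).fill("active"), 10);

console.log(Math.max(...uniform)); // hottest partition carries ~1/10 of traffic
console.log(Math.max(...skewed));  // 10000 — every request on one partition
```

The exact counts depend on the stand-in hash, but the shape of the result is the point: the hash spreads distinct key values evenly, yet it cannot spread traffic that all targets one value.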
Designing effective partition keys requires understanding the properties that lead to uniform data and traffic distribution. Let's examine these characteristics systematically.
| Candidate | Cardinality | Distribution | Verdict |
|---|---|---|---|
| userId (UUID) | Very High (billions) | Usually uniform | ✅ Excellent |
| orderId (UUID) | Very High | Uniform by design | ✅ Excellent |
| deviceId (IoT) | High (millions) | Depends on device activity | ✅ Good |
| customerId | Medium-High | May have power users | ⚠️ Watch for hot keys |
| date (YYYY-MM-DD) | Low (365/year) | Today's date is always hot | ❌ Poor |
| status (enum) | Very Low (5-10) | Concentrated on 'active' | ❌ Very Poor |
| country | Low (~200) | Concentrated on populous countries | ❌ Very Poor |
| constant value | 1 | All traffic to one partition | ❌ Catastrophic |
The Cardinality Principle
The minimum number of partitions your table can effectively use is bounded by the cardinality of your partition key. If your partition key has only 100 distinct values, you can never have more than 100 partitions actively receiving traffic—even if DynamoDB creates more for storage.
Calculation example:
```
Table: User Sessions
Partition Key: status (5 values: active, expired, pending, suspended, deleted)

Even with 1 million sessions:
- Maximum effective partitions for load distribution: 5
- If 80% of sessions are 'active': that partition handles 80% of all traffic
- Maximum usable write throughput: ~5 × 1,000 WCU = 5,000 WCU
- But 80% of traffic hits one partition: effective limit = 1,000 / 0.8 = 1,250 WCU
```
This is why status fields, booleans, and enums should never be partition keys.
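The calculation above generalizes to a single helper: usable table-wide write throughput is the per-partition limit divided by the hottest partition's traffic share. A sketch, using the 1,000 WCU per-partition figure from this page:

```typescript
// Per-partition write limit (WCU) that bounds any single partition key value.
const PARTITION_WCU_LIMIT = 1_000;

// Usable table-wide write throughput before the hottest partition throttles.
// hottestShare: fraction of total traffic hitting the busiest partition (0..1].
function effectiveWriteThroughput(hottestShare: number): number {
  return PARTITION_WCU_LIMIT / hottestShare;
}

// The "status" example: 80% of sessions are 'active'.
console.log(effectiveWriteThroughput(0.8)); // 1250

// Perfectly uniform traffic over 10 partitions (each carries a 0.1 share).
console.log(effectiveWriteThroughput(0.1)); // 10000
```

The lever is the denominator: halving the hottest key's share doubles usable throughput, which is exactly what high-cardinality key design achieves.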
Learning from common mistakes is often more instructive than studying ideal designs. Let's examine partition key anti-patterns that cause real-world failures.
Typical remedies for these anti-patterns:

- For time-series data, move the date into the sort key, or use a composite partition key (e.g., `deviceId#2024-06-15`)
- For multi-tenant tables, use `tenantId#entityId` or shard large tenants
```typescript
// ❌ ANTI-PATTERN: Date as Partition Key
// All writes for today go to one partition
const badDesign = {
  tableName: "SensorReadings",
  keySchema: [
    { attributeName: "date", keyType: "HASH" },      // Only 365 values/year!
    { attributeName: "readingId", keyType: "RANGE" }
  ]
};

// One day = all traffic to one partition = throttled at 1,000 WCU
// Query: "Get all readings for date" → Full table scan of partition

// ✅ CORRECT: High-cardinality partition key with date in sort key
const goodDesign = {
  tableName: "SensorReadings",
  keySchema: [
    { attributeName: "sensorId", keyType: "HASH" },  // Millions of sensors = millions of partitions
    { attributeName: "timestamp", keyType: "RANGE" }
  ]
};

// Traffic distributed across all sensors
// Query: "Get readings for sensor X between dates" → Efficient range query

// ✅ ALTERNATIVE: Composite partition key for high-write scenarios
const compositeDesign = {
  tableName: "SensorReadings",
  // Partition key is a composite: sensorId#YYYY-MM
  // Ensures no single partition grows beyond 10 GB
  keySchema: [
    { attributeName: "pk", keyType: "HASH" },        // "SENSOR-001#2024-06"
    { attributeName: "timestamp", keyType: "RANGE" }
  ]
};
```

One of the most powerful techniques in DynamoDB design is the composite primary key: a partition key combined with a sort key. This pattern enables rich querying while maintaining excellent distribution.
Anatomy of a Composite Key
Primary Key = Partition Key (PK) + Sort Key (SK)
The Power of Sort Keys
Sort keys enable:
- Range queries: `SK BETWEEN '2024-01-01' AND '2024-06-30'`
- Prefix queries: `SK begins_with 'ORDER#'`
- Multi-entity modeling under one partition key (e.g., `USER#123`, `ORDER#456`, `PROFILE#789`)
```typescript
// ============================================
// EXAMPLE 1: E-Commerce Order History
// ============================================
// Access Patterns:
// 1. Get all orders for a customer
// 2. Get orders for a customer in a date range
// 3. Get a specific order by orderId

const orderTable = {
  // PK: customerId (high cardinality - millions of customers)
  // SK: orderTimestamp#orderId (enables time-based queries)
  items: [
    {
      PK: "CUST#C-12345",
      SK: "ORDER#2024-06-15T10:30:00Z#ORD-789",
      orderId: "ORD-789",
      total: 149.99,
      status: "SHIPPED"
    },
    {
      PK: "CUST#C-12345",
      SK: "ORDER#2024-07-22T14:15:00Z#ORD-891",
      orderId: "ORD-891",
      total: 89.50,
      status: "DELIVERED"
    }
  ]
};

// Query: Get all orders for customer
// PK = "CUST#C-12345"

// Query: Get orders for customer in June 2024
// PK = "CUST#C-12345" AND SK BETWEEN "ORDER#2024-06-01" AND "ORDER#2024-06-30"

// Query: Get most recent orders
// PK = "CUST#C-12345" AND ScanIndexForward = false LIMIT 10

// ============================================
// EXAMPLE 2: Single-Table Design (Multi-Entity)
// ============================================
// One table stores Users, Orders, and Products using SK prefixes

const singleTable = {
  items: [
    // User Profile
    {
      PK: "USER#U-12345",
      SK: "PROFILE",
      name: "Alice Smith",
      email: "alice@example.com",
      createdAt: "2023-01-15"
    },
    // User's Orders (multiple items, same PK)
    {
      PK: "USER#U-12345",
      SK: "ORDER#2024-06-15#ORD-789",
      orderId: "ORD-789",
      total: 149.99
    },
    {
      PK: "USER#U-12345",
      SK: "ORDER#2024-07-22#ORD-891",
      orderId: "ORD-891",
      total: 89.50
    },
    // User's Payment Methods
    {
      PK: "USER#U-12345",
      SK: "PAYMENT#PM-001",
      type: "VISA",
      last4: "4242"
    }
  ]
};

// Query: Get user profile
// PK = "USER#U-12345" AND SK = "PROFILE"

// Query: Get all user's orders
// PK = "USER#U-12345" AND SK begins_with "ORDER#"

// Query: Get user with everything (profile, orders, payments)
// PK = "USER#U-12345" (returns all items for this user)
```

DynamoDB experts often use a single-table design where multiple entity types share one table, differentiated by sort key prefixes. This reduces the number of tables to manage, enables transactions across entity types, and can reduce costs by consolidating indexes. However, it requires careful planning and is more complex to understand than multi-table designs.
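A single-table query like "all of a user's orders" is expressed as a key condition on PK plus an SK prefix. Here is a sketch of the request parameters in DynamoDB's DocumentClient shape—`begins_with` and `KeyConditionExpression` are real DynamoDB query syntax, while the table name and helper are illustrative:

```typescript
// Build query parameters for "get all orders for a user" in a
// single-table layout. "AppTable" is a hypothetical table name.
function userOrdersQuery(userId: string) {
  return {
    TableName: "AppTable",
    // Equality on the partition key, prefix match on the sort key.
    KeyConditionExpression: "PK = :pk AND begins_with(SK, :sk)",
    ExpressionAttributeValues: {
      ":pk": `USER#${userId}`,
      ":sk": "ORDER#",
    },
  };
}

console.log(userOrdersQuery("U-12345"));
// { TableName: 'AppTable',
//   KeyConditionExpression: 'PK = :pk AND begins_with(SK, :sk)',
//   ExpressionAttributeValues: { ':pk': 'USER#U-12345', ':sk': 'ORDER#' } }
```

Swapping the `:sk` value to `"PAYMENT#"` or dropping the `begins_with` clause entirely gives the other access patterns without touching the table design.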
Sometimes your access pattern inherently creates hot partitions. A global counter, a trending topic, or a viral piece of content naturally concentrates traffic. Write sharding is the technique for handling these scenarios.
The Problem: Unavoidable Concentration
Consider a view counter for a viral video:
- Video `VID-12345` goes viral, generating 50,000 view-count writes per second at peak

With a simple partition key of videoId, all 50,000 writes/second hit the same partition—far exceeding the 1,000 WCU limit.
The Solution: Shard the Partition Key
Instead of a single partition key value, we create multiple by appending a random suffix:
Original: videoId = "VID-12345"
Sharded: videoId = "VID-12345#shard-0" through "VID-12345#shard-9"
Now writes are distributed across 10 partitions, giving us ~10,000 WCU of write throughput.
```typescript
// ============================================
// WRITE SHARDING FOR HOT KEYS
// ============================================

const SHARD_COUNT = 10; // Number of shards (tune based on throughput needs)

// Write: Increment view count (distributed across shards)
async function incrementViewCount(videoId: string): Promise<void> {
  // Random shard selection distributes writes evenly
  const shardId = Math.floor(Math.random() * SHARD_COUNT);
  const shardedKey = `${videoId}#shard-${shardId}`;

  await dynamodb.updateItem({
    TableName: "VideoStats",
    Key: {
      PK: { S: shardedKey },
      SK: { S: "VIEW_COUNT" }
    },
    UpdateExpression: "ADD #count :inc",
    ExpressionAttributeNames: { "#count": "count" },
    ExpressionAttributeValues: { ":inc": { N: "1" } }
  });
}

// Read: Get total view count (aggregate across all shards)
async function getTotalViewCount(videoId: string): Promise<number> {
  const shardKeys = Array.from(
    { length: SHARD_COUNT },
    (_, i) => ({
      PK: { S: `${videoId}#shard-${i}` },
      SK: { S: "VIEW_COUNT" }
    })
  );

  const results = await dynamodb.batchGetItem({
    RequestItems: {
      VideoStats: { Keys: shardKeys }
    }
  });

  // Sum counts from all shards
  return results.Responses?.VideoStats?.reduce(
    (sum, item) => sum + parseInt(item.count?.N || "0", 10),
    0
  ) || 0;
}

// ============================================
// ALTERNATIVE: Calculated Shard (Deterministic)
// ============================================
// Use when you need predictable shard placement (e.g., for caching)

function getShardForRequest(videoId: string, requestId: string): string {
  // Hash requestId to get deterministic shard
  const hash = simpleHash(requestId);
  const shardId = hash % SHARD_COUNT;
  return `${videoId}#shard-${shardId}`;
}

function simpleHash(str: string): number {
  let hash = 0;
  for (let i = 0; i < str.length; i++) {
    const char = str.charCodeAt(i);
    hash = ((hash << 5) - hash) + char;
    hash = hash & hash; // Convert to 32-bit integer
  }
  return Math.abs(hash);
}
```

| Aspect | Without Sharding | With Sharding |
|---|---|---|
| Write Throughput | ≤1,000 WCU per key | N × 1,000 WCU (N shards) |
| Read Complexity | Single GetItem | BatchGetItem + aggregation |
| Read Cost | 1 RCU | N RCU (one per shard) |
| Code Complexity | Simple | Moderate (shard logic required) |
| Eventual Consistency | N/A | Aggregated reads may be slightly stale |
| Use Cases | Normal traffic items | Counters, trending items, viral content |
Write sharding adds complexity and read overhead. Use it only when you have proven hot keys that exceed partition limits. Signs you need sharding: throttling on specific keys despite high table capacity, CloudWatch showing uneven partition utilization, or known viral/trending access patterns.
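If you do shard, the shard count has to be sized. A minimal sketch, assuming the ~1,000 WCU per-partition limit used throughout this page; the headroom factor compensating for the unevenness of random shard selection is an illustrative choice, not an AWS recommendation:

```typescript
// Per-partition write limit used throughout this page.
const WCU_PER_PARTITION = 1_000;

// Pick a shard count for a hot key: enough shards that the peak write
// rate, spread across them, stays under the per-partition limit, with
// headroom because random shard selection is never perfectly even.
function shardCountFor(peakWcu: number, headroom = 1.5): number {
  return Math.max(1, Math.ceil((peakWcu * headroom) / WCU_PER_PARTITION));
}

// The viral-video example: 50,000 writes/second at peak.
console.log(shardCountFor(50_000)); // 75

// A key comfortably under the limit needs no sharding at all.
console.log(shardCountFor(500)); // 1
```

Remember that every shard you add is one more key in the aggregating read, so size for realistic peaks rather than worst-case fantasies.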
While good partition key design is essential, DynamoDB provides a safety mechanism called Adaptive Capacity that helps mitigate hot partition issues. Understanding this feature helps you design more resilient tables.
How Adaptive Capacity Works
Traditionally, DynamoDB distributed provisioned capacity evenly across partitions. If you provisioned 10,000 WCU and had 10 partitions, each partition got 1,000 WCU. If one partition needed 2,000 WCU while others used only 500, the hot partition would throttle.
Adaptive Capacity changes this:

- Throughput is no longer divided rigidly: a partition receiving disproportionate traffic can consume unused capacity from the rest of the table.
- The boost is applied automatically by the service, with no configuration required.
- DynamoDB can also isolate a frequently accessed item by splitting the hot partition.
Limits of Adaptive Capacity
Adaptive Capacity is a safety net, not a design strategy. It helps smooth out unexpected traffic variations but cannot fix fundamentally poor partition key choices. Always design for uniform distribution first, then rely on Adaptive Capacity as insurance against real-world imperfection.
Let's synthesize everything into a practical framework for selecting partition keys. Follow this decision process when designing new tables.
| Use Case | Recommended Partition Key | Sort Key | Reasoning |
|---|---|---|---|
| User Profiles | userId | None (or PROFILE) | One profile per user, high cardinality |
| User's Orders | userId | orderDate#orderId | Query orders by user, sort by time |
| IoT Sensor Data | sensorId | timestamp | Each sensor writes independently |
| Game Leaderboard | leaderboardId#shard-N | score | Write sharding for hot leaderboards |
| Multi-tenant SaaS | tenantId#entityType#entityId | None or timestamp | Prevent large tenant hot spots |
| Session Storage | sessionId | None | Random IDs distribute perfectly |
| Product Catalog | productId | None or category#name | Products accessed independently |
| Chat Messages | conversationId | timestamp | Messages grouped by conversation |
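A few rows of the table above can be expressed as small key-builder helpers. The `#` delimiter and the function names are illustrative conventions, not a DynamoDB API:

```typescript
// User's Orders row: PK = userId, SK = orderDate#orderId
function orderKey(userId: string, orderDate: string, orderId: string) {
  return { PK: userId, SK: `${orderDate}#${orderId}` };
}

// Game Leaderboard row: PK = leaderboardId#shard-N (write sharding)
function leaderboardKey(leaderboardId: string, shardCount: number) {
  const shard = Math.floor(Math.random() * shardCount);
  return { PK: `${leaderboardId}#shard-${shard}` };
}

// Multi-tenant SaaS row: PK = tenantId#entityType#entityId
function tenantKey(tenantId: string, entityType: string, entityId: string) {
  return { PK: `${tenantId}#${entityType}#${entityId}` };
}

console.log(orderKey("C-12345", "2024-06-15", "ORD-789"));
// { PK: 'C-12345', SK: '2024-06-15#ORD-789' }
```

Centralizing key construction like this keeps the delimiter convention in one place, which matters once sort-key prefixes are relied on for `begins_with` queries.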
Partition key design is the foundation of DynamoDB success. Let's consolidate the essential principles:

- Choose high-cardinality partition keys (user IDs, device IDs, UUIDs) that receive roughly uniform traffic.
- Never use low-cardinality attributes—status fields, dates, countries, booleans—as partition keys.
- Use composite keys (partition key + sort key) to get range and prefix queries without sacrificing distribution.
- Apply write sharding only for proven hot keys that exceed the per-partition write limit.
- Treat Adaptive Capacity as insurance against imperfection, not as a substitute for good key design.
What's Next
With partition keys mastered, we turn to Global Secondary Indexes (GSIs)—the mechanism that enables querying DynamoDB by attributes other than the primary key. GSIs unlock access patterns that the base table cannot support, but they come with their own design considerations, cost implications, and partition key challenges. Understanding GSIs is essential for building flexible, query-rich DynamoDB applications.
You now understand partition key design—the most critical factor in DynamoDB success. You can identify good and bad partition key candidates, use composite keys for rich querying, implement write sharding for hot keys, and apply a systematic framework for key selection. This knowledge prevents the throttling disasters that plague poorly designed DynamoDB tables.