Amazon DynamoDB: AWS's Fully Managed NoSQL Database

Level: Advanced

Duration: 90 mins

Topic: Amazon DynamoDB

Global Secondary Indexes

Querying Beyond the Primary Key

You've designed the perfect partition key for your DynamoDB table. Your e-commerce orders table uses customerId as the partition key and orderTimestamp as the sort key. Queries for a customer's orders are blazing fast.

Then product management asks: "Can we find all orders for a specific product? We need to track which products are selling best."

Your heart sinks. The table is optimized for customer-centric queries. Product-centric queries would require a full table scan—touching every partition, reading every item, and filtering in application code. For a table with 100 million orders, this is a non-starter.

This is exactly the problem Global Secondary Indexes (GSIs) solve. GSIs allow you to create alternative views of your data with different partition and sort keys, enabling query patterns that the base table cannot efficiently support.

What You Will Learn

By the end of this page, you will understand what GSIs are and how they differ from Local Secondary Indexes, how to design GSIs for diverse access patterns, the cost and consistency implications of GSIs, sparse indexes and projection strategies for optimization, and common GSI design patterns used in production systems.

What Are Global Secondary Indexes?

A Global Secondary Index (GSI) is an index with a partition key and optional sort key that can be different from those on the base table. "Global" means the index spans all partitions of the base table—it's not constrained to a single partition.

GSIs are essentially separate tables managed automatically by DynamoDB:

•They have their own partition key and sort key (completely independent of the base table)
•They have their own throughput capacity (RCUs/WCUs, either provisioned or on-demand)
•They store a copy of the data (projected attributes from the base table)
•They are updated asynchronously when the base table changes
•They can be created or deleted at any time, even on tables with data

GSI vs Base Table Comparison
Characteristic	Base Table	Global Secondary Index
Partition Key	Fixed at table creation	Any attribute from base table items
Sort Key	Fixed at table creation (optional)	Any attribute from base table items (optional)
Capacity	Table's provisioned/on-demand capacity	Separate provisioned/on-demand capacity
Consistency	Supports strong and eventual	Eventual consistency only
Item Size Limit	400 KB	Projected attributes within 400 KB
Write Path	Direct writes	Asynchronous replication from base table
Creation	At table creation only	Any time (with eventual population)

GSI Architecture Visualization
┌────────────────────────────────────────────────────────────────────┐
│                    Base Table: Orders                               │
│    Partition Key: customerId    Sort Key: orderTimestamp           │
├────────────────────────────────────────────────────────────────────┤
│  customerId │ orderTimestamp  │ orderId │ productId │ status      │
├─────────────┼─────────────────┼─────────┼───────────┼─────────────┤
│  CUST-001   │ 2024-06-15T10:00│ ORD-100 │ PROD-500  │ delivered   │
│  CUST-001   │ 2024-06-16T14:30│ ORD-101 │ PROD-200  │ shipped     │
│  CUST-002   │ 2024-06-15T09:00│ ORD-102 │ PROD-500  │ delivered   │
│  CUST-003   │ 2024-06-17T11:15│ ORD-103 │ PROD-300  │ pending     │
└────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┴───────────────┐
                    │  Automatic Async Replication  │
                    └───────────────┬───────────────┘
                                    ▼
┌────────────────────────────────────────────────────────────────────┐
│              GSI: ProductOrders-Index                               │
│    Partition Key: productId    Sort Key: orderTimestamp            │
├────────────────────────────────────────────────────────────────────┤
│  productId  │ orderTimestamp  │ orderId │ customerId │ (INCLUDE)   │
├─────────────┼─────────────────┼─────────┼────────────┼─────────────┤
│  PROD-200   │ 2024-06-16T14:30│ ORD-101 │ CUST-001   │             │
│  PROD-300   │ 2024-06-17T11:15│ ORD-103 │ CUST-003   │             │
│  PROD-500   │ 2024-06-15T09:00│ ORD-102 │ CUST-002   │             │
│  PROD-500   │ 2024-06-15T10:00│ ORD-100 │ CUST-001   │             │
└────────────────────────────────────────────────────────────────────┘
 
Query: "Get all orders for PROD-500"
→ Queries GSI with PK = "PROD-500"
→ Returns ORD-102 and ORD-100, sorted by orderTimestamp (no table scan!)
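
To make the diagram concrete, here is a minimal in-memory sketch of what the index view amounts to: the same items, re-keyed by productId and sorted by orderTimestamp within each key. The names buildProductIndex, OrderItem, and IndexEntry are made up for illustration; DynamoDB builds and maintains the real index itself, asynchronously.

```typescript
// Illustrative in-memory model only; DynamoDB maintains the real index itself.
interface OrderItem {
    customerId: string;
    orderTimestamp: string;
    orderId: string;
    productId?: string; // items without the index key are simply not indexed
}

interface IndexEntry {
    productId: string;
    orderTimestamp: string;
    orderId: string;
    customerId: string;
}

type ProductIndex = { [productId: string]: IndexEntry[] };

// Re-key every item by productId, then sort each group by orderTimestamp.
function buildProductIndex(items: OrderItem[]): ProductIndex {
    const index: ProductIndex = {};
    for (const item of items) {
        const productId = item.productId;
        if (productId === undefined) continue; // no index key, no index entry
        if (!index[productId]) index[productId] = [];
        index[productId].push({
            productId,
            orderTimestamp: item.orderTimestamp,
            orderId: item.orderId,
            customerId: item.customerId,
        });
    }
    for (const key of Object.keys(index)) {
        index[key].sort((a, b) =>
            a.orderTimestamp < b.orderTimestamp ? -1 : a.orderTimestamp > b.orderTimestamp ? 1 : 0
        );
    }
    return index;
}

const baseItems: OrderItem[] = [
    { customerId: "CUST-001", orderTimestamp: "2024-06-15T10:00", orderId: "ORD-100", productId: "PROD-500" },
    { customerId: "CUST-001", orderTimestamp: "2024-06-16T14:30", orderId: "ORD-101", productId: "PROD-200" },
    { customerId: "CUST-002", orderTimestamp: "2024-06-15T09:00", orderId: "ORD-102", productId: "PROD-500" },
    { customerId: "CUST-003", orderTimestamp: "2024-06-17T11:15", orderId: "ORD-103", productId: "PROD-300" },
];

const productIndex = buildProductIndex(baseItems);
// ORD-102 (09:00) sorts before ORD-100 (10:00) under PROD-500
const prod500OrderIds = productIndex["PROD-500"].map((entry) => entry.orderId);
console.log(prod500OrderIds);
```

A lookup by productId touches only one group of entries, which is why the index query avoids the full table scan.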

GSIs vs LSIs

DynamoDB also offers Local Secondary Indexes (LSIs), which share the base table's partition key but allow an alternative sort key. LSIs share capacity with the base table, support strong consistency, and must be created at table creation time. GSIs are more flexible and are used far more frequently in practice. This page focuses on GSIs.

Creating and Configuring GSIs

Creating a GSI requires several key decisions: the key schema, which attributes to project, and capacity settings. Let's examine each in detail.

GSI Creation Example (AWS SDK)
import { DynamoDBClient, CreateTableCommand, UpdateTableCommand } from "@aws-sdk/client-dynamodb";
 
const dynamodb = new DynamoDBClient({ region: "us-east-1" });
 
// Create table with GSI at creation time
const createTableWithGSI = async () => {
    await dynamodb.send(new CreateTableCommand({
        TableName: "Orders",
        
        // Base table key schema
        KeySchema: [
            { AttributeName: "customerId", KeyType: "HASH" },
            { AttributeName: "orderTimestamp", KeyType: "RANGE" }
        ],
        
        // Define all attributes used in keys (base + GSI)
        AttributeDefinitions: [
            { AttributeName: "customerId", AttributeType: "S" },
            { AttributeName: "orderTimestamp", AttributeType: "S" },
            { AttributeName: "productId", AttributeType: "S" },
            { AttributeName: "status", AttributeType: "S" }
        ],
        
        // GSI definitions
        GlobalSecondaryIndexes: [
            {
                IndexName: "ProductOrders-Index",
                KeySchema: [
                    { AttributeName: "productId", KeyType: "HASH" },
                    { AttributeName: "orderTimestamp", KeyType: "RANGE" }
                ],
                // What to copy to the GSI
                Projection: {
                    ProjectionType: "INCLUDE",
                    NonKeyAttributes: ["orderId", "total", "customerId"]
                },
                // GSI has its own capacity
                ProvisionedThroughput: {
                    ReadCapacityUnits: 100,
                    WriteCapacityUnits: 50
                }
            },
            {
                IndexName: "StatusOrders-Index",
                KeySchema: [
                    { AttributeName: "status", KeyType: "HASH" },
                    { AttributeName: "orderTimestamp", KeyType: "RANGE" }
                ],
                Projection: {
                    ProjectionType: "KEYS_ONLY"  // Index keys + base table keys only
                },
                ProvisionedThroughput: {
                    ReadCapacityUnits: 50,
                    WriteCapacityUnits: 25
                }
            }
        ],
        
        BillingMode: "PROVISIONED",
        ProvisionedThroughput: {
            ReadCapacityUnits: 500,
            WriteCapacityUnits: 200
        }
    }));
};
 
// Add GSI to existing table (backfill happens automatically)
const addGSIToExistingTable = async () => {
    await dynamodb.send(new UpdateTableCommand({
        TableName: "Orders",
        GlobalSecondaryIndexUpdates: [
            {
                Create: {
                    IndexName: "DateOrders-Index",
                    KeySchema: [
                        { AttributeName: "orderDate", KeyType: "HASH" },
                        { AttributeName: "orderId", KeyType: "RANGE" }
                    ],
                    Projection: { ProjectionType: "ALL" },
                    ProvisionedThroughput: {
                        ReadCapacityUnits: 100,
                        WriteCapacityUnits: 50
                    }
                }
            }
        ],
        // Every key attribute of the new index must be defined here
        AttributeDefinitions: [
            { AttributeName: "orderDate", AttributeType: "S" },
            { AttributeName: "orderId", AttributeType: "S" }
        ]
    }));
};

Projection Types Explained

Projection determines which attributes from the base table are copied to the GSI:

Projection Type	Contents	Storage Cost	Query Flexibility
KEYS_ONLY	Only base table keys + GSI keys	Lowest	Must fetch from base table for other attributes
INCLUDE	Specified attributes + all keys	Medium	Good balance of cost and flexibility
ALL	All attributes from base table	Highest	No fetches needed, full item available

Projection Strategy:

•Start by identifying which attributes your GSI queries will need
•If queries always need the full item → Consider ALL (but watch storage costs)
•If queries need only a few attributes → Use INCLUDE with just those
•If you only need to find item keys → Use KEYS_ONLY and fetch from base table
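
The three projection types can be sketched as a small function that decides which attributes of an item end up in the index. This is only a model for reasoning about projections: projectItem and the Projection type are hypothetical names, not part of the AWS SDK.

```typescript
// Hypothetical model of GSI projection; not an AWS SDK API.
type Projection =
    | { type: "KEYS_ONLY" }
    | { type: "INCLUDE"; nonKeyAttributes: string[] }
    | { type: "ALL" };

type Item = { [attribute: string]: unknown };

function projectItem(
    item: Item,
    tableKeys: string[],
    indexKeys: string[],
    projection: Projection
): Item {
    if (projection.type === "ALL") return { ...item };

    // Base table keys and index keys are always present in the index.
    const wanted = tableKeys.concat(indexKeys);
    if (projection.type === "INCLUDE") {
        wanted.push(...projection.nonKeyAttributes);
    }

    const projected: Item = {};
    for (const name of wanted) {
        if (name in item) projected[name] = item[name];
    }
    return projected;
}

const order: Item = {
    customerId: "CUST-001",
    orderTimestamp: "2024-06-15T10:00:00Z",
    productId: "PROD-500",
    orderId: "ORD-100",
    total: 149.99,
    status: "pending",
};
const tableKeys = ["customerId", "orderTimestamp"];
const indexKeys = ["productId", "orderTimestamp"];

const keysOnly = projectItem(order, tableKeys, indexKeys, { type: "KEYS_ONLY" });
const included = projectItem(order, tableKeys, indexKeys, {
    type: "INCLUDE",
    nonKeyAttributes: ["orderId", "total"],
});
const everything = projectItem(order, tableKeys, indexKeys, { type: "ALL" });
```

Here keysOnly holds three attributes, included holds five, and only everything carries status, so a query that needs status from the first two must go back to the base table.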

GSI Costs and Capacity Considerations

GSIs are powerful but not free. Understanding their cost model is essential for building cost-effective DynamoDB applications.

GSI Cost Components

•Storage Cost — GSI stores a copy of projected attributes. More projections = higher storage bills. With ALL projection, you're storing the full table twice.
•Write Cost — Every base table write that affects GSI keys or projected attributes triggers a GSI write. One base table write can cause writes to multiple GSIs.
•GSI-specific Capacity — GSI reads/writes consume the GSI's capacity, not the base table's. Under-provisioning GSIs causes throttling even if the base table has excess capacity.
•Backfill Cost — When creating a GSI on an existing table, DynamoDB reads every item to populate the index. The resulting index writes consume the new index's write capacity and are billed, so large backfills are not free.

GSI Write Amplification Example
// Scenario: Orders table with 3 GSIs
// Base table keys: customerId (PK), orderTimestamp (SK)
 
const tableConfig = {
    baseTable: {
        pk: "customerId",
        sk: "orderTimestamp"
    },
    gsis: [
        { name: "ProductOrders-Index", pk: "productId", projected: ["total"] },
        { name: "StatusOrders-Index", pk: "status", projected: ["customerId"] },
        { name: "DateOrders-Index", pk: "orderDate", projected: "ALL" }
    ]
};
 
// When you write ONE item to the base table:
const newOrder = {
    customerId: "CUST-001",       // Base PK
    orderTimestamp: "2024-06-15", // Base SK
    productId: "PROD-500",        // GSI 1 PK
    status: "pending",            // GSI 2 PK
    orderDate: "2024-06-15",      // GSI 3 PK
    total: 149.99,
    items: [/* line items elided */],  // Projected only into DateOrders-Index (ALL)
};
 
// Cost breakdown for this single write:
// 1. Base table write:     1 WCU  (assuming <1 KB item)
// 2. ProductOrders-Index:  1 WCU  (pk + base keys + total)
// 3. StatusOrders-Index:   1 WCU  (pk + base keys + customerId)
// 4. DateOrders-Index:     1 WCU  (all attributes)
// ─────────────────────────────────
// TOTAL:                   4 WCU for ONE logical write
 
// If your application writes 10,000 items/second:
// Base table needs:     10,000 WCU
// GSI 1 needs:          10,000 WCU
// GSI 2 needs:          10,000 WCU
// GSI 3 needs:          10,000 WCU
// TOTAL capacity:       40,000 WCU (4x base table writes!)

Write Amplification Trap

Every GSI adds to your write costs. If each write touches every index, a table with 5 GSIs has ~6x the write cost of a table with no GSIs, and changing an indexed key attribute costs two index writes (delete the old entry, put the new one). This is one of the most common sources of unexpected DynamoDB bills. Carefully consider: Do you really need this GSI? Can you serve this query from an existing GSI? Can you use batch reads from the base table instead?
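
The arithmetic above can be written as a rough estimator. This is a sketch under stated assumptions: it covers a standard (non-transactional) put of a brand-new item, rounds each write up to 1 KB units, and does not model updates that change an index key. The names estimatePutCost and IndexWrite are made up for this example.

```typescript
interface IndexWrite {
    indexName: string;
    // Size in bytes of the item as projected into this index, or null when the
    // item lacks the index key attributes and is therefore not written to it.
    projectedBytes: number | null;
}

const WCU_BYTES = 1024; // one standard write unit covers up to 1 KB

function wcuFor(bytes: number): number {
    return Math.max(1, Math.ceil(bytes / WCU_BYTES));
}

// Total write units for one new item: base table write plus one write per index it lands in.
function estimatePutCost(itemBytes: number, indexes: IndexWrite[]): number {
    let total = wcuFor(itemBytes);
    for (const index of indexes) {
        if (index.projectedBytes !== null) total += wcuFor(index.projectedBytes);
    }
    return total;
}

// Small item that appears in all three indexes: 1 + 1 + 1 + 1
const smallOrderCost = estimatePutCost(600, [
    { indexName: "ProductOrders-Index", projectedBytes: 120 },
    { indexName: "StatusOrders-Index", projectedBytes: 110 },
    { indexName: "DateOrders-Index", projectedBytes: 600 },
]);

// 2.5 KB item: the ALL projection pays the full item size again: 3 + 1 + 1 + 3
const largeOrderCost = estimatePutCost(2500, [
    { indexName: "ProductOrders-Index", projectedBytes: 120 },
    { indexName: "StatusOrders-Index", projectedBytes: 110 },
    { indexName: "DateOrders-Index", projectedBytes: 2500 },
]);

// Item missing the key of one index: that index costs nothing
const sparseOrderCost = estimatePutCost(600, [
    { indexName: "ProductOrders-Index", projectedBytes: null },
    { indexName: "StatusOrders-Index", projectedBytes: 110 },
    { indexName: "DateOrders-Index", projectedBytes: 600 },
]);
```

The large-item case shows why narrow projections matter: the two small projections still cost 1 WCU each while the ALL projection triples.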

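GSI Throttling and Back-Pressure

A critical fact that surprises many developers: an under-provisioned GSI can throttle writes to the base table.

For a write to succeed, the base table and every GSI that the write touches must have enough write capacity. If a GSI can't keep up with base table writes:

•The GSI's write capacity is exhausted
•DynamoDB throttles the base table writes that would update that index
•Throttling persists even if the base table itself has spare capacity
•Writes recover once the GSI has enough write capacity (raise its provisioned WCU, enable auto scaling, or use on-demand)

GSI reads are independent: they consume only the index's read capacity. Provision each GSI's write capacity at least as high as the base table write rate for the items it indexes, and monitor WriteThrottleEvents and ConsumedWriteCapacityUnits per index to catch problems before they cause application issues.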

Sparse Indexes: Efficiency Through Exclusion

One of the most powerful (and underutilized) GSI features is the sparse index pattern. A GSI only includes items from the base table that have the GSI's key attributes. Items without those attributes are simply not indexed.

This behavior is automatic—DynamoDB doesn't index what doesn't exist. Savvy designers exploit this to create efficient, focused indexes.

Sparse Index Pattern Example
// Scenario: Orders table where we need to find orders needing attention
// Most orders are 'completed' or 'shipped' - only ~5% need attention
 
// ❌ BAD APPROACH: GSI on 'status' field
// Every order is in the GSI, but we only query for a few statuses
// Result: Huge index, most of it never queried
 
// ✅ GOOD APPROACH: Sparse index on 'needsAttention' attribute
// Only orders requiring attention have this attribute
 
const orderExamples = [
    // Standard completed order - NO 'needsAttention' attribute
    {
        customerId: "CUST-001",
        orderTimestamp: "2024-06-15T10:00:00Z",
        status: "completed",
        total: 149.99
        // Note: no 'needsAttention' attribute
    },
    
    // Order with payment issue - HAS 'needsAttention' attribute
    {
        customerId: "CUST-002",
        orderTimestamp: "2024-06-16T14:30:00Z",
        status: "payment_failed",
        total: 89.50,
        needsAttention: "PAYMENT_ISSUE#2024-06-16T14:30:00Z",  // GSI PK
        issueDetails: "Card declined - insufficient funds"
    },
    
    // Order with shipping problem - HAS 'needsAttention' attribute
    {
        customerId: "CUST-003",
        orderTimestamp: "2024-06-17T11:15:00Z",
        status: "shipping_exception",
        total: 200.00,
        needsAttention: "SHIPPING_ISSUE#2024-06-17T11:15:00Z",  // GSI PK
        issueDetails: "Address undeliverable"
    }
];
 
// GSI: NeedsAttention-Index
// PK: needsAttention
// SK: (none needed, or orderTimestamp for sorting)
 
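// Result:
// - GSI contains only ~5% of items (those with issues)
// - Storage cost: ~5% of a full index
// - "Get all orders needing attention": Scan this small index. Each PK value
//   embeds a timestamp, so a single Query cannot match them all. To Query by
//   issue type instead, use the bare type as PK and orderTimestamp as SK.
// - When issue is resolved: remove 'needsAttention' attribute
//   → Item automatically removed from GSI!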

Sparse Index Use Cases

•Error/Exception Tracking — Index only items with 'error' attribute; most items don't have errors
•Featured/Promoted Items — Index only items with 'isFeatured' attribute; <1% of catalog featured
•Pending Approvals — Index only items awaiting approval; processed items lose the attribute
•Unread Messages — Index only unread messages; remove from index when read
•TTL-based Cleanup — Index items with 'cleanupScheduled' attribute for batch processing

Sparse Index Design Rule

If you're creating a GSI to find a minority of items (items with errors, featured items, pending reviews), make it sparse. Instead of indexing a status attribute that exists on all items, add a special attribute only to the items you want indexed. The storage and write cost savings can be dramatic.
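
Maintaining a sparse index comes down to two updates: set the marker attribute when an item needs indexing, remove it when it no longer does. The sketch below only builds the request parameters, shaped like the input of UpdateCommand from @aws-sdk/lib-dynamodb; the helper names flagForAttention and clearAttention are hypothetical, and the attribute names follow the example above.

```typescript
interface OrderKey {
    customerId: string;
    orderTimestamp: string;
}

interface UpdateParams {
    TableName: string;
    Key: OrderKey;
    UpdateExpression: string;
    ExpressionAttributeValues?: { [placeholder: string]: string };
}

// Adding the attribute makes the item appear in NeedsAttention-Index.
function flagForAttention(key: OrderKey, issueType: string, details: string): UpdateParams {
    return {
        TableName: "Orders",
        Key: key,
        UpdateExpression: "SET needsAttention = :flag, issueDetails = :details",
        ExpressionAttributeValues: {
            ":flag": `${issueType}#${key.orderTimestamp}`,
            ":details": details,
        },
    };
}

// Removing the attribute drops the item from the index again.
function clearAttention(key: OrderKey): UpdateParams {
    return {
        TableName: "Orders",
        Key: key,
        UpdateExpression: "REMOVE needsAttention, issueDetails",
    };
}

const orderKey: OrderKey = { customerId: "CUST-002", orderTimestamp: "2024-06-16T14:30:00Z" };
const flagParams = flagForAttention(orderKey, "PAYMENT_ISSUE", "Card declined - insufficient funds");
const clearParams = clearAttention(orderKey);
```

In an application these objects would be passed to something like docClient.send(new UpdateCommand(flagParams)); the index itself never needs to be written to directly.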

GSI Partition Key Design

Many developers think carefully about base table partition keys but then choose GSI keys carelessly. GSIs have the same partition limits as base tables (3,000 RCU, 1,000 WCU per partition). All the partition key design principles apply equally to GSIs.

Common GSI Partition Key Mistakes

•Status as GSI partition key — Most items have 'active' status; GSI partition for 'active' becomes hot
•Date as GSI partition key — Today's date receives all writes; violates cardinality principles
•Country/Region as GSI partition key — Traffic concentrates on populous regions
•Category as GSI partition key — Popular categories dominate traffic

GSI Key Design: Before and After
// ❌ BAD: Status GSI with low-cardinality partition key
const badGSI = {
    IndexName: "StatusIndex",
    KeySchema: [
        { AttributeName: "status", KeyType: "HASH" },  // 5 distinct values!
        { AttributeName: "createdAt", KeyType: "RANGE" }
    ]
};
// Problem: 90% of orders are "completed" → hot partition on "completed"
// Query: "Find all pending orders" → Works fine (few items)
// Query: "Find all completed orders" → Hits one overloaded partition
 
 
// ✅ GOOD: Composite GSI key with date sharding
const goodGSI = {
    IndexName: "StatusDateIndex",
    KeySchema: [
        // Composite key: status#date spreads status across many partitions
        { AttributeName: "statusDate", KeyType: "HASH" },  // "completed#2024-06-15"
        { AttributeName: "createdAt", KeyType: "RANGE" }
    ]
};
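// Benefits:
// - Each day's completed orders land under a separate partition key
// - Cardinality = statuses × days (365× more key values per year)
// - Query: "completed orders on 2024-06-15" → Efficient single-key query
// Caveat: today's writes for a busy status still share one key value;
// combine with a shard suffix if that key runs hot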
 
 
// ✅ ALTERNATIVE: Sharded status GSI for aggregation queries
const shardedGSI = {
    IndexName: "StatusShardIndex",
    KeySchema: [
        { AttributeName: "statusShard", KeyType: "HASH" },  // "completed#shard-7"
        { AttributeName: "createdAt", KeyType: "RANGE" }
    ]
};
 
// Write logic: Assign random shard
function getStatusShard(status: string, shardCount = 10): string {
    const shard = Math.floor(Math.random() * shardCount);
    return `${status}#shard-${shard}`;
}
 
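// Read logic: Query all shards and merge
// (queryGSI is a hypothetical helper returning the orders under one index key)
async function getOrdersByStatus(status: string, shardCount = 10): Promise<Order[]> {
    const promises = Array.from({ length: shardCount }, (_, i) =>
        queryGSI("StatusShardIndex", `${status}#shard-${i}`)
    );
    const results = await Promise.all(promises);
    // createdAt is assumed to be an ISO-8601 string, so compare lexicographically
    return results.flat().sort((a, b) => b.createdAt.localeCompare(a.createdAt));
}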

GSI Keys Don't Need to Match Base Table Attributes

GSI keys can be attributes that don't exist on all items (sparse indexes), computed/composite values (status#date), or completely synthetic (hash-based shards). This flexibility is powerful—use it to create GSIs with excellent distribution characteristics regardless of your base table's natural attributes.
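
As a sketch of the computed and synthetic keys mentioned above, the helpers below derive a status#date key and a hash-based shard key from existing attributes. Everything here is an assumption for illustration: the function names, the use of orderId as the hash input, and the simple djb2-style hash (any stable hash would do).

```typescript
// Small deterministic string hash (djb2 style), kept within 32 bits.
function hashString(value: string): number {
    let hash = 5381;
    for (let i = 0; i < value.length; i++) {
        hash = (hash * 33 + value.charCodeAt(i)) % 4294967296;
    }
    return hash;
}

// Composite key: status plus the calendar day of an ISO-8601 timestamp.
function statusDateKey(status: string, createdAt: string): string {
    return `${status}#${createdAt.slice(0, 10)}`;
}

// Synthetic key: the shard is derived from orderId, so the same order always
// maps to the same shard and nothing extra needs to be remembered.
function statusShardKey(status: string, orderId: string, shardCount = 10): string {
    return `${status}#shard-${hashString(orderId) % shardCount}`;
}

// Readers fan out over every shard key for a status.
function allShardKeys(status: string, shardCount = 10): string[] {
    const keys: string[] = [];
    for (let shard = 0; shard < shardCount; shard++) {
        keys.push(`${status}#shard-${shard}`);
    }
    return keys;
}

const dayKey = statusDateKey("completed", "2024-06-15T10:00:00Z");
const shardKey = statusShardKey("completed", "ORD-100");
const fanOutKeys = allShardKeys("completed");
```

Compared with a random shard, a hash-derived shard lets a later update or delete recompute the key from the item itself; the trade-off is that distribution is only as even as the hash input is varied.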

GSI Query Patterns and Best Practices

Querying GSIs follows the same patterns as querying base tables, with a few important distinctions. Let's examine common query patterns and their implementations.

GSI Query Examples
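import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand, BatchGetCommand } from "@aws-sdk/lib-dynamodb";
 
const dynamoDBClient = new DynamoDBClient({ region: "us-east-1" });
const docClient = DynamoDBDocumentClient.from(dynamoDBClient);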
 
// ============================================
// Pattern 1: Simple GSI Query
// ============================================
// Find all orders for a product
async function getOrdersByProduct(productId: string): Promise<Order[]> {
    const result = await docClient.send(new QueryCommand({
        TableName: "Orders",
        IndexName: "ProductOrders-Index",  // Specify the GSI
        KeyConditionExpression: "productId = :pid",
        ExpressionAttributeValues: {
            ":pid": productId
        }
    }));
    return result.Items as Order[];
}
 
 
// ============================================
// Pattern 2: GSI with Sort Key Range
// ============================================
// Find orders for a product in a date range
async function getProductOrdersInRange(
    productId: string,
    startDate: string,
    endDate: string
): Promise<Order[]> {
    const result = await docClient.send(new QueryCommand({
        TableName: "Orders",
        IndexName: "ProductOrders-Index",
        KeyConditionExpression: 
            "productId = :pid AND orderTimestamp BETWEEN :start AND :end",
        ExpressionAttributeValues: {
            ":pid": productId,
            ":start": startDate,
            ":end": endDate
        },
        ScanIndexForward: false  // Newest first
    }));
    return result.Items as Order[];
}
 
 
// ============================================
// Pattern 3: GSI Query with Filter
// ============================================
// Find pending orders for a product (filter on non-key attribute).
// 'status' must be projected into the index, or the filter matches nothing.
async function getPendingProductOrders(productId: string): Promise<Order[]> {
    const result = await docClient.send(new QueryCommand({
        TableName: "Orders",
        IndexName: "ProductOrders-Index",
        KeyConditionExpression: "productId = :pid",
        FilterExpression: "#status = :status",  // Post-query filter
        ExpressionAttributeNames: {
            "#status": "status"  // 'status' is reserved word
        },
        ExpressionAttributeValues: {
            ":pid": productId,
            ":status": "pending"
        }
    }));
    return result.Items as Order[];
}
// ⚠️ NOTE: FilterExpression filters AFTER reading items from GSI
// You're charged for all items read, not just those returned
// For frequent filtered queries, consider a more specific GSI
 
 
// ============================================
// Pattern 4: Fetch Full Item from Base Table
// ============================================
// When GSI uses KEYS_ONLY or INCLUDE projection, get full item
async function getFullOrderDetails(productId: string): Promise<Order[]> {
    // Step 1: Query GSI to get keys (first page only; follow
    // LastEvaluatedKey to read further pages)
    const gsiResult = await docClient.send(new QueryCommand({
        TableName: "Orders",
        IndexName: "ProductOrders-Index",
        KeyConditionExpression: "productId = :pid",
        ExpressionAttributeValues: {
            ":pid": productId
        }
    }));
    
    if (!gsiResult.Items?.length) return [];
    
    const keys = gsiResult.Items.map(item => ({
        customerId: item.customerId,      // Base table PK
        orderTimestamp: item.orderTimestamp  // Base table SK
    }));
    
    // Step 2: BatchGet from base table for full items.
    // A single request accepts at most 100 keys, so fetch in chunks.
    const orders: Order[] = [];
    for (let i = 0; i < keys.length; i += 100) {
        const batchResult = await docClient.send(new BatchGetCommand({
            RequestItems: {
                Orders: { Keys: keys.slice(i, i + 100) }
            }
        }));
        orders.push(...((batchResult.Responses?.Orders ?? []) as Order[]));
        // Production code should also retry batchResult.UnprocessedKeys
    }
    
    return orders;
}
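
The query examples above read a single response. A Query returns at most 1 MB of data per call and signals further results through LastEvaluatedKey, which is passed back as ExclusiveStartKey on the next call. The loop below is a minimal sketch of that pattern with the DynamoDB call abstracted behind a callback; queryAllPages, Page, and the fake fetcher are hypothetical names used so the example runs without AWS access.

```typescript
interface Page<T, K> {
    items: T[];
    lastEvaluatedKey?: K; // absent on the final page
}

// Keep fetching until a page comes back without a continuation key.
async function queryAllPages<T, K>(
    fetchPage: (exclusiveStartKey: K | undefined) => Promise<Page<T, K>>
): Promise<T[]> {
    const all: T[] = [];
    let startKey: K | undefined = undefined;
    do {
        const page: Page<T, K> = await fetchPage(startKey);
        all.push(...page.items);
        startKey = page.lastEvaluatedKey;
    } while (startKey !== undefined);
    return all;
}

// Stand-in for the index query: two pages, the first pointing at the second.
const fakePages: Page<string, number>[] = [
    { items: ["ORD-100", "ORD-102"], lastEvaluatedKey: 1 },
    { items: ["ORD-107"] },
];
const fakeFetch = async (start: number | undefined): Promise<Page<string, number>> =>
    fakePages[start ?? 0];

const allOrderIds = queryAllPages(fakeFetch);
allOrderIds.then((ids) => console.log(ids));
```

With the real client, fetchPage would wrap docClient.send(new QueryCommand({ ..., ExclusiveStartKey })) and map Items and LastEvaluatedKey into the Page shape.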

Filter Expressions Don't Reduce Costs

A common misconception: FilterExpression does NOT reduce the items read from the index—it only filters what's returned to your application. You're charged for all items matching the key conditions, even if 99% are filtered out. If you find yourself filtering heavily, you probably need a better GSI design.
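
A back-of-the-envelope calculation shows the size of the effect. The sketch assumes an eventually consistent Query, which is billed on the total bytes read, rounded up to 4 KB units at half a read unit each; queryReadUnits is a made-up helper and ignores per-request minimums.

```typescript
const RCU_BYTES = 4096; // one read unit covers up to 4 KB

// Eventually consistent Query cost for a given number of items read.
function queryReadUnits(itemsRead: number, avgItemBytes: number): number {
    const units = Math.ceil((itemsRead * avgItemBytes) / RCU_BYTES);
    return units / 2;
}

// Key condition matches 1,000 items of ~500 bytes; the filter keeps 10 of them.
// Cost is driven by the 1,000 items read, not the 10 returned.
const filteredQueryCost = queryReadUnits(1000, 500);

// A more specific index whose key condition matches only those 10 items.
const targetedQueryCost = queryReadUnits(10, 500);

console.log(filteredQueryCost, targetedQueryCost);
```

Under these assumptions the filtered query costs 61.5 read units against 1 for the targeted index, for the same ten results.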

GSI Limits and Constraints

Understanding GSI limits helps avoid surprises during development and scaling. Here are the constraints you must design around:

GSI Limits Summary
Limit	Value	Notes
GSIs per table	20 (soft limit)	Can request increase via AWS Support
Projected attribute size	Item ≤ 400 KB after projection	Larger items cause write failure
GSI partition throughput	3,000 RCU / 1,000 WCU per partition	Same as base table partitions
Strong consistency	Not available	GSI queries are eventually consistent only
Transactions	Cannot read from GSI in transaction	Only base table reads in transactions
Backfill time	Hours to days for large tables	Depends on table size and capacity
Key attribute types	String, Number, Binary only	Complex types cannot be keys

Working Within GSI Constraints

•20 GSI limit — Design multi-purpose GSIs using composite keys and overloaded attributes. One GSI can serve multiple access patterns.
•No strong consistency — If you need strongly consistent reads on indexed attributes, consider keeping a denormalized copy in the base table.
•No transactional GSI reads — For transaction-critical flows, ensure base table key structure supports the required queries.
•Eventual consistency lag — For time-sensitive queries, build application logic that handles slightly stale GSI data.
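
To illustrate the first point, here is a sketch of an overloaded index: one GSI with generic key attributes whose meaning depends on the item type. The attribute names GSI1PK and GSI1SK, the index name GSI1, and the review and customer item types are all hypothetical, chosen only to show the pattern.

```typescript
type Entity =
    | { kind: "order"; productId: string; orderTimestamp: string }
    | { kind: "review"; productId: string; createdAt: string }
    | { kind: "customer"; email: string };

interface Gsi1Keys {
    GSI1PK: string;
    GSI1SK: string;
}

// Each item type writes different values into the same two index attributes.
function gsi1KeysFor(entity: Entity): Gsi1Keys {
    switch (entity.kind) {
        case "order":
            return { GSI1PK: `PRODUCT#${entity.productId}`, GSI1SK: `ORDER#${entity.orderTimestamp}` };
        case "review":
            return { GSI1PK: `PRODUCT#${entity.productId}`, GSI1SK: `REVIEW#${entity.createdAt}` };
        case "customer":
            return { GSI1PK: `EMAIL#${entity.email}`, GSI1SK: "CUSTOMER" };
    }
}

// Query parameters for "orders of a product": the sort key prefix selects the item type.
function productOrdersQuery(productId: string) {
    return {
        TableName: "Orders",
        IndexName: "GSI1",
        KeyConditionExpression: "GSI1PK = :pk AND begins_with(GSI1SK, :prefix)",
        ExpressionAttributeValues: { ":pk": `PRODUCT#${productId}`, ":prefix": "ORDER#" },
    };
}

const orderKeys = gsi1KeysFor({ kind: "order", productId: "PROD-500", orderTimestamp: "2024-06-15T10:00:00Z" });
const reviewKeys = gsi1KeysFor({ kind: "review", productId: "PROD-500", createdAt: "2024-06-20T08:00:00Z" });
const customerKeys = gsi1KeysFor({ kind: "customer", email: "a@example.com" });
const ordersQuery = productOrdersQuery("PROD-500");
```

One index then answers three access patterns (orders by product, reviews by product, customer by email) at the cost of less readable attribute names.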

GSI Backfill on Creation

When you create a GSI on an existing table with data, DynamoDB performs a background backfill. The table remains fully operational, but the GSI is in CREATING status until complete. For large tables (billions of items), this can take days. Plan GSI additions during low-traffic periods and monitor the creation progress via CloudWatch.

Summary: Mastering Global Secondary Indexes

Global Secondary Indexes are essential for building flexible DynamoDB applications. Let's consolidate the key insights:

Key Takeaways

•GSIs are separate tables — They have independent partition keys, sort keys, capacity, and storage. DynamoDB manages replication automatically.
•Eventual consistency only — GSI reads cannot be strongly consistent. Design applications to handle slight propagation delays.
•Write amplification is real — Every GSI multiplies write costs. Five GSIs = ~6x write costs. Justify every GSI.
•Projection impacts storage and query — Use KEYS_ONLY for minimal storage, INCLUDE for selected attributes, ALL for full flexibility at higher cost.
•Sparse indexes save money — Index only items needing indexing by conditionally adding the GSI key attribute.
•GSI keys need the same care as base table keys — High cardinality, uniform distribution, and partition limit awareness all apply.
•20 GSI limit requires planning — Design GSIs to serve multiple access patterns using composite keys and attribute overloading.

What's Next

With partition keys and GSIs covered, we turn to one of DynamoDB's most important trade-offs: Eventual vs Strong Consistency. Understanding when to use each consistency level—and what happens when you choose wrong—is crucial for building systems that behave correctly while maintaining the performance DynamoDB is known for.

Page Complete

You now understand Global Secondary Indexes—how they work, their costs and constraints, and patterns for effective use. You can design GSIs for diverse access patterns, use sparse indexes for efficiency, and avoid the write amplification and hot partition traps that catch many DynamoDB users.

3 / 6

Loading learning content...

System Design (HLD)Amazon DynamoDB

Amazon DynamoDB: AWS's Fully Managed NoSQL Database

LevelAdvanced

Duration90 mins

TopicAmazon DynamoDB

3 / 6

Global Secondary Indexes

Querying Beyond the Primary Key

Then product management asks: "Can we find all orders for a specific product? We need to track which products are selling best."

What You Will Learn

What Are Global Secondary Indexes?

GSIs are essentially separate tables managed automatically by DynamoDB:

They have their own partition key and sort key (completely independent of the base table)
They have their own throughput capacity (RCUs/WCUs, either provisioned or on-demand)
They store a copy of the data (projected attributes from the base table)
They are updated asynchronously when the base table changes
They can be created or deleted at any time, even on tables with data

GSI vs Base Table Comparison
Characteristic	Base Table	Global Secondary Index
Partition Key	Fixed at table creation	Any attribute from base table items
Sort Key	Fixed at table creation (optional)	Any attribute from base table items (optional)
Capacity	Table's provisioned/on-demand capacity	Separate provisioned/on-demand capacity
Consistency	Supports strong and eventual	Eventual consistency only
Item Size Limit	400 KB	Projected attributes within 400 KB
Write Path	Direct writes	Asynchronous replication from base table
Creation	At table creation only	Any time (with eventual population)

GSI Architecture Visualization
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
┌────────────────────────────────────────────────────────────────────┐
│                    Base Table: Orders                               │
│    Partition Key: customerId    Sort Key: orderTimestamp           │
├────────────────────────────────────────────────────────────────────┤
│  customerId │ orderTimestamp  │ orderId │ productId │ status      │
├─────────────┼─────────────────┼─────────┼───────────┼─────────────┤
│  CUST-001   │ 2024-06-15T10:00│ ORD-100 │ PROD-500  │ delivered   │
│  CUST-001   │ 2024-06-16T14:30│ ORD-101 │ PROD-200  │ shipped     │
│  CUST-002   │ 2024-06-15T09:00│ ORD-102 │ PROD-500  │ delivered   │
│  CUST-003   │ 2024-06-17T11:15│ ORD-103 │ PROD-300  │ pending     │
└────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┴───────────────┐
                    │  Automatic Async Replication  │
                    └───────────────┬───────────────┘
                                    ▼
┌────────────────────────────────────────────────────────────────────┐
│              GSI: ProductOrders-Index                               │
│    Partition Key: productId    Sort Key: orderTimestamp            │
├────────────────────────────────────────────────────────────────────┤
│  productId  │ orderTimestamp  │ orderId │ customerId │ (keys only) │
├─────────────┼─────────────────┼─────────┼────────────┼─────────────┤
│  PROD-200   │ 2024-06-16T14:30│ ORD-101 │ CUST-001   │             │
│  PROD-300   │ 2024-06-17T11:15│ ORD-103 │ CUST-003   │             │
│  PROD-500   │ 2024-06-15T09:00│ ORD-102 │ CUST-002   │             │
│  PROD-500   │ 2024-06-15T10:00│ ORD-100 │ CUST-001   │             │
└────────────────────────────────────────────────────────────────────┘
 
Query: "Get all orders for PROD-500"
→ Queries GSI with PK = "PROD-500"
→ Returns ORD-100 and ORD-102 instantly (no table scan!)

GSIs vs LSIs

Creating and Configuring GSIs

Creating a GSI requires several key decisions: the key schema, which attributes to project, and capacity settings. Let's examine each in detail.

GSI Creation Example (AWS SDK)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
import { DynamoDB, CreateTableCommand } from "@aws-sdk/client-dynamodb";
 
const dynamodb = new DynamoDB({ region: "us-east-1" });
 
// Create table with GSI at creation time
const createTableWithGSI = async () => {
    await dynamodb.send(new CreateTableCommand({
        TableName: "Orders",
        
        // Base table key schema
        KeySchema: [
            { AttributeName: "customerId", KeyType: "HASH" },
            { AttributeName: "orderTimestamp", KeyType: "RANGE" }
        ],
        
        // Define all attributes used in keys (base + GSI)
        AttributeDefinitions: [
            { AttributeName: "customerId", AttributeType: "S" },
            { AttributeName: "orderTimestamp", AttributeType: "S" },
            { AttributeName: "productId", AttributeType: "S" },
            { AttributeName: "status", AttributeType: "S" }
        ],
        
        // GSI definitions
        GlobalSecondaryIndexes: [
            {
                IndexName: "ProductOrders-Index",
                KeySchema: [
                    { AttributeName: "productId", KeyType: "HASH" },
                    { AttributeName: "orderTimestamp", KeyType: "RANGE" }
                ],
                // What to copy to the GSI
                Projection: {
                    ProjectionType: "INCLUDE",
                    NonKeyAttributes: ["orderId", "total", "customerId"]
                },
                // GSI has its own capacity
                ProvisionedThroughput: {
                    ReadCapacityUnits: 100,
                    WriteCapacityUnits: 50
                }
            },
            {
                IndexName: "StatusOrders-Index",
                KeySchema: [
                    { AttributeName: "status", KeyType: "HASH" },
                    { AttributeName: "orderTimestamp", KeyType: "RANGE" }
                ],
                Projection: {
                    ProjectionType: "KEYS_ONLY"  // Only index keys and base table keys projected
                },
                ProvisionedThroughput: {
                    ReadCapacityUnits: 50,
                    WriteCapacityUnits: 25
                }
            }
        ],
        
        BillingMode: "PROVISIONED",
        ProvisionedThroughput: {
            ReadCapacityUnits: 500,
            WriteCapacityUnits: 200
        }
    }));
};
 
// Add GSI to existing table (backfill happens automatically)
const addGSIToExistingTable = async () => {
    await dynamodb.send(new UpdateTableCommand({
        TableName: "Orders",
        GlobalSecondaryIndexUpdates: [
            {
                Create: {
                    IndexName: "DateOrders-Index",
                    KeySchema: [
                        { AttributeName: "orderDate", KeyType: "HASH" },
                        { AttributeName: "orderId", KeyType: "RANGE" }
                    ],
                    Projection: { ProjectionType: "ALL" },
                    ProvisionedThroughput: {
                        ReadCapacityUnits: 100,
                        WriteCapacityUnits: 50
                    }
                }
            }
        ],
        // Every key attribute of the new index must be defined here
        AttributeDefinitions: [
            { AttributeName: "orderDate", AttributeType: "S" },
            { AttributeName: "orderId", AttributeType: "S" }
        ]
    }));
};
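
A GSI added to an existing table is not usable until its backfill completes. The sketch below shows one way to check readiness from a DescribeTable response. It assumes the response's per-index IndexStatus and Backfilling fields; the isIndexReady helper and the local interfaces are hypothetical names introduced for illustration, not part of the SDK.

```typescript
// Minimal local types mirroring the parts of a DescribeTable response used here.
interface IndexDescription {
    IndexName?: string;
    IndexStatus?: string;   // "CREATING" | "UPDATING" | "DELETING" | "ACTIVE"
    Backfilling?: boolean;
}

interface TableDescription {
    GlobalSecondaryIndexes?: IndexDescription[];
}

// Hypothetical helper: true only when the named GSI exists, is ACTIVE,
// and is not still backfilling.
function isIndexReady(table: TableDescription, indexName: string): boolean {
    const index = (table.GlobalSecondaryIndexes ?? []).find(
        (gsi) => gsi.IndexName === indexName
    );
    return index !== undefined
        && index.IndexStatus === "ACTIVE"
        && index.Backfilling !== true;
}
```

In practice you would call DescribeTable periodically and pass its Table field to a check like this before routing queries to the new index.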

Projection Types Explained

Projection determines which attributes from the base table are copied to the GSI:

Projection Type	Contents	Storage Cost	Query Flexibility
KEYS_ONLY	Only base table keys + GSI keys	Lowest	Must fetch from base table for other attributes
INCLUDE	Specified attributes + all keys	Medium	Good balance of cost and flexibility
ALL	All attributes from base table	Highest	No fetches needed, full item available

Projection Strategy:

Start by identifying which attributes your GSI queries will need
If queries always need the full item → Consider ALL (but watch storage costs)
If queries need only a few attributes → Use INCLUDE with just those
If you only need to find item keys → Use KEYS_ONLY and fetch from base table
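
The decision steps above can be sketched as a small function. This is only an illustration of the reasoning, not an AWS API: chooseProjection and its parameters are hypothetical, and the returned objects follow the shape of the Projection field used in the creation example.

```typescript
type Projection =
    | { ProjectionType: "KEYS_ONLY" }
    | { ProjectionType: "ALL" }
    | { ProjectionType: "INCLUDE"; NonKeyAttributes: string[] };

// Hypothetical helper. `needed` lists the attributes the GSI queries read;
// `keyAttributes` are the base table and GSI key attributes, which are
// always projected and therefore never listed in NonKeyAttributes.
function chooseProjection(
    needed: string[],
    keyAttributes: string[],
    needsFullItem = false
): Projection {
    if (needsFullItem) return { ProjectionType: "ALL" };
    // Drop duplicates and anything that is already a key attribute.
    const nonKey = needed.filter(
        (attr, i) => needed.indexOf(attr) === i && keyAttributes.indexOf(attr) === -1
    );
    if (nonKey.length === 0) return { ProjectionType: "KEYS_ONLY" };
    return { ProjectionType: "INCLUDE", NonKeyAttributes: nonKey };
}
```

For example, a query that reads orderId, total, and customerId through ProductOrders-Index only needs orderId and total listed, because customerId is already a base table key.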

GSI Costs and Capacity Considerations

GSIs are powerful but not free. Understanding their cost model is essential for building cost-effective DynamoDB applications.

GSI Cost Components

•Storage Cost — GSI stores a copy of projected attributes. More projections = higher storage bills. With ALL projection, you're storing the full table twice.
•Write Cost — Every base table write that affects GSI keys or projected attributes triggers a GSI write. One base table write can cause writes to multiple GSIs.
•GSI-specific Capacity — GSI reads/writes consume the GSI's capacity, not the base table's. Under-provisioning GSIs causes throttling even if the base table has excess capacity.
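•Backfill Cost — When creating a GSI on an existing table, DynamoDB reads every item to populate the index. The backfill reads do not draw on the base table's read capacity, but the resulting index writes consume the GSI's write capacity, so an under-provisioned index backfills slowly.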

GSI Write Amplification Example
// Scenario: Orders table with 3 GSIs
// Base table keys: customerId (PK), orderTimestamp (SK)
 
const tableConfig = {
    baseTable: {
        pk: "customerId",
        sk: "orderTimestamp"
    },
    gsis: [
        { name: "ProductOrders-Index", pk: "productId", projected: ["total"] },
        { name: "StatusOrders-Index", pk: "status", projected: ["customerId"] },
        { name: "DateOrders-Index", pk: "orderDate", projected: "ALL" }
    ]
};
 
// When you write ONE item to the base table:
const newOrder = {
    customerId: "CUST-001",       // Base PK
    orderTimestamp: "2024-06-15", // Base SK
    productId: "PROD-500",        // GSI 1 PK
    status: "pending",            // GSI 2 PK
    orderDate: "2024-06-15",      // GSI 3 PK
    total: 149.99,
    items: [ /* ... */ ],         // Copied only to DateOrders-Index (ALL projection)
};
 
// Cost breakdown for this single write:
// 1. Base table write:     1 WCU  (assuming <1 KB item)
// 2. ProductOrders-Index:  1 WCU  (pk + base keys + total)
// 3. StatusOrders-Index:   1 WCU  (pk + base keys + customerId)
// 4. DateOrders-Index:     1 WCU  (all attributes)
// ─────────────────────────────────
// TOTAL:                   4 WCU for ONE logical write
 
// If your application writes 10,000 items/second:
// Base table needs:     10,000 WCU
// GSI 1 needs:          10,000 WCU
// GSI 2 needs:          10,000 WCU
// GSI 3 needs:          10,000 WCU
// TOTAL capacity:       40,000 WCU (4x base table writes!)
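
The same arithmetic can be written as a rough estimator. It is a sketch under simplifying assumptions: every write is a new item that lands in every GSI, one WCU covers up to 1 KB, and the function and field names are hypothetical. Updates that change a GSI key value cost more than this model shows, because the old index entry must be deleted and a new one written.

```typescript
interface GsiSpec {
    name: string;
    projectedItemBytes: number; // size of the item as stored in this index
}

// One WCU covers a standard write of up to 1 KB; larger items round up.
function wcuForBytes(bytes: number): number {
    return Math.max(1, Math.ceil(bytes / 1024));
}

// Estimates write capacity for inserts that propagate to every listed GSI.
function estimateWriteCapacity(
    baseItemBytes: number,
    gsis: GsiSpec[],
    writesPerSecond: number
): { wcuPerWrite: number; totalWcu: number } {
    const gsiWcu = gsis.reduce((sum, g) => sum + wcuForBytes(g.projectedItemBytes), 0);
    const wcuPerWrite = wcuForBytes(baseItemBytes) + gsiWcu;
    return { wcuPerWrite, totalWcu: wcuPerWrite * writesPerSecond };
}
```

With an item under 1 KB and three GSIs at 10,000 writes per second, this reproduces the 4 WCU per write and 40,000 WCU total from the breakdown above.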

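Write Amplification Trap

Every GSI adds its own write for each base table write that touches its keys or projected attributes, so write capacity and cost grow with the number of indexes. Justify every GSI before adding it.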

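GSI Throttling and Back-Pressure

A critical fact that surprises many developers: an under-provisioned GSI can throttle writes to the base table.

GSI updates are applied asynchronously, but a base table write only succeeds when the table and every GSI it affects have enough write capacity. If a GSI can't keep up with base table writes:

DynamoDB throttles the base table write, even if the base table itself has spare capacity
The application sees throttling errors and must retry with backoff
Throttling continues until the GSI's write capacity is raised or the write rate drops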

Sparse Indexes: Efficiency Through Exclusion

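A sparse index is a GSI whose key attribute is present on only some items. DynamoDB writes an item to a GSI only if the item contains the index's key attributes; items without them are left out. This behavior is automatic: DynamoDB doesn't index what doesn't exist. Savvy designers exploit this to create efficient, focused indexes.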

Sparse Index Pattern Example
// Scenario: Orders table where we need to find orders needing attention
// Most orders are 'completed' or 'shipped' - only ~5% need attention
 
// ❌ BAD APPROACH: GSI on 'status' field
// Every order is in the GSI, but we only query for a few statuses
// Result: Huge index, most of it never queried
 
// ✅ GOOD APPROACH: Sparse index on 'needsAttention' attribute
// Only orders requiring attention have this attribute
 
const orderExamples = [
    // Standard completed order - NO 'needsAttention' attribute
    {
        customerId: "CUST-001",
        orderTimestamp: "2024-06-15T10:00:00Z",
        status: "completed",
        total: 149.99
        // Note: no 'needsAttention' attribute
    },
    
    // Order with payment issue - HAS 'needsAttention' attribute
    {
        customerId: "CUST-002",
        orderTimestamp: "2024-06-16T14:30:00Z",
        status: "payment_failed",
        total: 89.50,
        needsAttention: "PAYMENT_ISSUE#2024-06-16T14:30:00Z",  // GSI PK
        issueDetails: "Card declined - insufficient funds"
    },
    
    // Order with shipping problem - HAS 'needsAttention' attribute
    {
        customerId: "CUST-003",
        orderTimestamp: "2024-06-17T11:15:00Z",
        status: "shipping_exception",
        total: 200.00,
        needsAttention: "SHIPPING_ISSUE#2024-06-17T11:15:00Z",  // GSI PK
        issueDetails: "Address undeliverable"
    }
];
 
// GSI: NeedsAttention-Index
// PK: needsAttention
// SK: (none needed, or orderTimestamp for sorting)
 
// Result:
// - GSI contains only ~5% of items (those with issues)
// - Storage cost: ~5% of full index
// - A Scan of this small index returns all orders needing attention cheaply
//   (each needsAttention value is distinct, so one Query cannot fetch them all)
// - When issue is resolved: remove 'needsAttention' attribute
//   → Item automatically removed from GSI!
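
Adding and removing the sparse key is an ordinary item update. The sketch below builds the update inputs for both directions, in the shape accepted by UpdateCommand from @aws-sdk/lib-dynamodb. The helper names flagForAttention and clearAttention are hypothetical, and the table and attribute names follow the example above.

```typescript
interface OrderKey {
    customerId: string;
    orderTimestamp: string;
}

// Setting needsAttention makes the item appear in the sparse index.
function flagForAttention(key: OrderKey, issueType: string, details: string) {
    return {
        TableName: "Orders",
        Key: key,
        UpdateExpression: "SET needsAttention = :na, issueDetails = :d",
        ExpressionAttributeValues: {
            ":na": `${issueType}#${key.orderTimestamp}`,
            ":d": details
        }
    };
}

// Removing needsAttention makes DynamoDB drop the item from the sparse index.
function clearAttention(key: OrderKey) {
    return {
        TableName: "Orders",
        Key: key,
        UpdateExpression: "REMOVE needsAttention, issueDetails"
    };
}
```

Each returned object would be passed to new UpdateCommand(...) and sent with the document client; no separate call against the index is needed.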

Sparse Index Use Cases

•Error/Exception Tracking — Index only items with 'error' attribute; most items don't have errors
•Featured/Promoted Items — Index only items with 'isFeatured' attribute; <1% of catalog featured
•Pending Approvals — Index only items awaiting approval; processed items lose the attribute
•Unread Messages — Index only unread messages; remove from index when read
•TTL-based Cleanup — Index items with 'cleanupScheduled' attribute for batch processing

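Sparse Index Design Rule

Write the GSI key attribute only on items that should appear in the index, and remove it as soon as they no longer qualify.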

GSI Partition Key Design

Common GSI Partition Key Mistakes

•Status as GSI partition key — Most items have 'active' status; GSI partition for 'active' becomes hot
•Date as GSI partition key — Today's date receives all writes; violates cardinality principles
•Country/Region as GSI partition key — Traffic concentrates on populous regions
•Category as GSI partition key — Popular categories dominate traffic

GSI Key Design: Before and After
// ❌ BAD: Status GSI with low-cardinality partition key
const badGSI = {
    IndexName: "StatusIndex",
    KeySchema: [
        { AttributeName: "status", KeyType: "HASH" },  // 5 distinct values!
        { AttributeName: "createdAt", KeyType: "RANGE" }
    ]
};
// Problem: 90% of orders are "completed" → hot partition on "completed"
// Query: "Find all pending orders" → Works fine (few items)
// Query: "Find all completed orders" → Hits one overloaded partition
 
 
// ✅ GOOD: Composite GSI key with date sharding
const goodGSI = {
    IndexName: "StatusDateIndex",
    KeySchema: [
        // Composite key: status#date spreads status across many partitions
        { AttributeName: "statusDate", KeyType: "HASH" },  // "completed#2024-06-15"
        { AttributeName: "createdAt", KeyType: "RANGE" }
    ]
};
// Benefits:
// - Each day's completed orders in separate partition
// - Cardinality = statuses × days (365× improvement per year)
// - Query: "completed orders on 2024-06-15" → Efficient single partition
 
 
// ✅ ALTERNATIVE: Sharded status GSI for aggregation queries
const shardedGSI = {
    IndexName: "StatusShardIndex",
    KeySchema: [
        { AttributeName: "statusShard", KeyType: "HASH" },  // "completed#shard-7"
        { AttributeName: "createdAt", KeyType: "RANGE" }
    ]
};
 
// Write logic: Assign random shard
function getStatusShard(status: string, shardCount = 10): string {
    const shard = Math.floor(Math.random() * shardCount);
    return `${status}#shard-${shard}`;
}
 
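// Read logic: Query all shards and merge
// (queryGSI and Order are assumed to be defined elsewhere; queryGSI runs a
// Query against the named index for one partition key value)
async function getOrdersByStatus(status: string, shardCount = 10): Promise<Order[]> {
    const promises = Array.from({ length: shardCount }, (_, i) =>
        queryGSI("StatusShardIndex", `${status}#shard-${i}`)
    );
    const results = await Promise.all(promises);
    // Newest first; works for ISO-8601 strings and numeric timestamps
    return results.flat().sort((a, b) =>
        a.createdAt < b.createdAt ? 1 : a.createdAt > b.createdAt ? -1 : 0
    );
}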
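
The composite status#date key has a trade-off on the read side: a query spanning several days needs one Query per day. A minimal sketch of generating those partition key values follows, assuming dates are YYYY-MM-DD strings interpreted as UTC; statusDateKeys is a hypothetical helper name.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns one "status#YYYY-MM-DD" partition key per day, inclusive of both ends.
// Dates are treated as UTC so daylight saving changes cannot skip or repeat a day.
function statusDateKeys(status: string, startDate: string, endDate: string): string[] {
    const keys: string[] = [];
    const end = new Date(`${endDate}T00:00:00Z`).getTime();
    for (let t = new Date(`${startDate}T00:00:00Z`).getTime(); t <= end; t += DAY_MS) {
        keys.push(`${status}#${new Date(t).toISOString().slice(0, 10)}`);
    }
    return keys;
}
```

Each returned value can be queried in parallel against the composite-key index and the results merged, in the same way as the sharded read above.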

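GSI Keys Don't Need to Match Base Table Attributes

A GSI can be keyed on an attribute that exists only to serve the index. In the examples above, statusDate and statusShard are computed by the application at write time and stored on the item alongside the natural status and createdAt attributes.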

GSI Query Patterns and Best Practices

Querying GSIs follows the same patterns as querying base tables, with a few important distinctions. Let's examine common query patterns and their implementations.

GSI Query Examples
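import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import {
    DynamoDBDocumentClient,
    QueryCommand,
    BatchGetCommand
} from "@aws-sdk/lib-dynamodb";
 
// Order is the application's item type, assumed to be defined elsewhere
const dynamoDBClient = new DynamoDBClient({ region: "us-east-1" });
const docClient = DynamoDBDocumentClient.from(dynamoDBClient);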
 
// ============================================
// Pattern 1: Simple GSI Query
// ============================================
// Find all orders for a product
async function getOrdersByProduct(productId: string): Promise<Order[]> {
    const result = await docClient.send(new QueryCommand({
        TableName: "Orders",
        IndexName: "ProductOrders-Index",  // Specify the GSI
        KeyConditionExpression: "productId = :pid",
        ExpressionAttributeValues: {
            ":pid": productId
        }
    }));
    return result.Items as Order[];
}
 
 
// ============================================
// Pattern 2: GSI with Sort Key Range
// ============================================
// Find orders for a product in a date range
async function getProductOrdersInRange(
    productId: string,
    startDate: string,
    endDate: string
): Promise<Order[]> {
    const result = await docClient.send(new QueryCommand({
        TableName: "Orders",
        IndexName: "ProductOrders-Index",
        KeyConditionExpression: 
            "productId = :pid AND orderTimestamp BETWEEN :start AND :end",
        ExpressionAttributeValues: {
            ":pid": productId,
            ":start": startDate,
            ":end": endDate
        },
        ScanIndexForward: false  // Newest first
    }));
    return result.Items as Order[];
}
 
 
// ============================================
// Pattern 3: GSI Query with Filter
// ============================================
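// Find pending orders for a product (filter on non-key attribute)
// The filter only sees attributes projected into the index, so 'status'
// must be added to this GSI's NonKeyAttributes for the filter to match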
async function getPendingProductOrders(productId: string): Promise<Order[]> {
    const result = await docClient.send(new QueryCommand({
        TableName: "Orders",
        IndexName: "ProductOrders-Index",
        KeyConditionExpression: "productId = :pid",
        FilterExpression: "#status = :status",  // Post-query filter
        ExpressionAttributeNames: {
            "#status": "status"  // 'status' is reserved word
        },
        ExpressionAttributeValues: {
            ":pid": productId,
            ":status": "pending"
        }
    }));
    return result.Items as Order[];
}
// ⚠️ NOTE: FilterExpression filters AFTER reading items from GSI
// You're charged for all items read, not just those returned
// For frequent filtered queries, consider a more specific GSI
 
 
// ============================================
// Pattern 4: Fetch Full Item from Base Table
// ============================================
// When GSI uses KEYS_ONLY or INCLUDE projection, get full item
async function getFullOrderDetails(productId: string): Promise<Order[]> {
    // Step 1: Query GSI to get keys
    const gsiResult = await docClient.send(new QueryCommand({
        TableName: "Orders",
        IndexName: "ProductOrders-Index",
        KeyConditionExpression: "productId = :pid",
        ExpressionAttributeValues: {
            ":pid": productId
        }
    }));
    
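    // Step 2: BatchGetItem from base table for full items
    // (at most 100 keys per request, so the keys are sent in chunks)
    if (!gsiResult.Items?.length) return [];
    
    const keys = gsiResult.Items.map(item => ({
        customerId: item.customerId,      // Base table PK
        orderTimestamp: item.orderTimestamp  // Base table SK
    }));
    
    const orders: Order[] = [];
    for (let i = 0; i < keys.length; i += 100) {
        const batchResult = await docClient.send(new BatchGetCommand({
            RequestItems: {
                Orders: { Keys: keys.slice(i, i + 100) }
            }
        }));
        orders.push(...((batchResult.Responses?.Orders ?? []) as Order[]));
        // Production code should also retry batchResult.UnprocessedKeys
    }
    
    return orders;
}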
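
The patterns above read only the first page of results. A Query returns at most 1 MB of data per call and sets LastEvaluatedKey when more remains. The loop below is a sketch of following those pages. It takes the page fetch as a callback so the logic stays independent of the SDK; queryAllPages and fetchPage are hypothetical names.

```typescript
interface Page<T, K> {
    Items?: T[];
    LastEvaluatedKey?: K;
}

// `fetchPage` is a hypothetical callback that runs one Query, passing the
// given key as ExclusiveStartKey, and resolves to the response.
async function queryAllPages<T, K>(
    fetchPage: (exclusiveStartKey: K | undefined) => Promise<Page<T, K>>
): Promise<T[]> {
    const items: T[] = [];
    let startKey: K | undefined = undefined;
    do {
        const page: Page<T, K> = await fetchPage(startKey);
        items.push(...(page.Items ?? []));
        // An absent LastEvaluatedKey means the final page has been read.
        startKey = page.LastEvaluatedKey;
    } while (startKey !== undefined);
    return items;
}
```

With the document client, the callback would send a QueryCommand whose ExclusiveStartKey is the argument it receives. Reading every page can be expensive for popular partition keys, so a Limit or an early exit is worth considering.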

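Filter Expressions Don't Reduce Costs

A FilterExpression is applied after items are read from the GSI, so you are charged for every item the key condition matches, not just the items returned. For frequent filtered queries, consider a more specific GSI.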

GSI Limits and Constraints

Understanding GSI limits helps avoid surprises during development and scaling. Here are the constraints you must design around:

GSI Limits Summary
Limit	Value	Notes
GSIs per table	20 (soft limit)	Can request increase via AWS Support
Projected attribute size	Item ≤ 400 KB after projection	Larger items cause write failure
GSI partition throughput	3,000 RCU / 1,000 WCU per partition	Same as base table partitions
Strong consistency	Not available	GSI queries are eventually consistent only
Transactions	Cannot read from GSI in transaction	Only base table reads in transactions
Backfill time	Hours to days for large tables	Depends on table size and capacity
Key attribute types	String, Number, Binary only	Complex types cannot be keys

Working Within GSI Constraints

•20 GSI limit — Design multi-purpose GSIs using composite keys and overloaded attributes. One GSI can serve multiple access patterns.
•No strong consistency — If you need strongly consistent reads on indexed attributes, consider keeping a denormalized copy in the base table.
•No transactional GSI reads — For transaction-critical flows, ensure base table key structure supports the required queries.
•Eventual consistency lag — For time-sensitive queries, build application logic that handles slightly stale GSI data.
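
One way to stay within the GSI limit is to overload a single index with generic key attributes that different item types fill in differently. The sketch below is illustrative only: the GSI1PK and GSI1SK attribute names, the key prefixes, and the review item type are all hypothetical and do not appear in the Orders design above.

```typescript
// Two hypothetical item types that share one overloaded GSI.
type Entity =
    | { kind: "order"; productId: string; orderTimestamp: string }
    | { kind: "review"; productId: string; rating: number; reviewId: string };

// Computes the generic GSI key attributes to store on each item.
// Both types share the partition key "PRODUCT#<id>"; the sort key prefix
// separates them, so begins_with(GSI1SK, "ORDER#") selects only orders.
function gsi1Keys(entity: Entity): { GSI1PK: string; GSI1SK: string } {
    switch (entity.kind) {
        case "order":
            return {
                GSI1PK: `PRODUCT#${entity.productId}`,
                GSI1SK: `ORDER#${entity.orderTimestamp}`
            };
        case "review":
            return {
                GSI1PK: `PRODUCT#${entity.productId}`,
                GSI1SK: `REVIEW#${entity.rating}#${entity.reviewId}`
            };
    }
}
```

The application writes these two attributes with every item, and one index then answers both "orders for a product" and "reviews for a product" through different sort key conditions.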

GSI Backfill on Creation

Summary: Mastering Global Secondary Indexes

Global Secondary Indexes are essential for building flexible DynamoDB applications. Let's consolidate the key insights:

Key Takeaways

•GSIs are separate tables — They have independent partition keys, sort keys, capacity, and storage. DynamoDB manages replication automatically.
•Eventual consistency only — GSI reads cannot be strongly consistent. Design applications to handle slight propagation delays.
•Write amplification is real — Every GSI multiplies write costs. Five GSIs = ~6x write costs. Justify every GSI.
•Projection impacts storage and query — Use KEYS_ONLY for minimal storage, INCLUDE for selected attributes, ALL for full flexibility at higher cost.
•Sparse indexes save money — Index only items needing indexing by conditionally adding the GSI key attribute.
•GSI keys need the same care as base table keys — High cardinality, uniform distribution, and partition limit awareness all apply.
•20 GSI limit requires planning — Design GSIs to serve multiple access patterns using composite keys and attribute overloading.
