Redis and Memcached are excellent for caching and real-time features, but what happens when your key-value workload exceeds what fits in memory? When you need durability guarantees? When you need to serve billions of requests per day across multiple geographic regions?
This is where distributed key-value stores enter the picture. Unlike in-memory caches, these systems are designed for persistence, horizontal scalability, and high availability at scales that would overwhelm traditional databases.
Two of the most influential distributed key-value stores are:

- **Amazon DynamoDB** — AWS's fully managed NoSQL database service
- **Riak** — an open-source, masterless key-value store
Both trace their lineage to Amazon's groundbreaking 2007 Dynamo paper, which introduced concepts like consistent hashing, vector clocks, and quorum-based replication that now underpin many distributed systems.
By the end of this page, you will understand DynamoDB's architecture, partition key design, capacity modes, and global tables. You'll also learn Riak's masterless architecture, conflict resolution approaches, and when each system is appropriate. You'll grasp how the Dynamo paper's ideas manifest in production systems.
In 2007, Amazon published the Dynamo paper, describing the key-value store powering Amazon.com's shopping cart and other critical services. The paper introduced and combined several distributed systems techniques that have become foundational:
Core Dynamo concepts:

- **Consistent hashing** — keys are placed on a hash ring so nodes can join or leave with minimal data movement
- **Vector clocks** — track causality between versions so concurrent updates can be detected rather than silently overwritten
- **Quorum replication (N, R, W)** — tunable read and write quorums trade consistency against availability per operation
- **Hinted handoff** — neighboring nodes accept writes on behalf of a failed node and deliver them when it recovers
- **Merkle trees** — let replicas detect divergence efficiently during anti-entropy repair
The availability-first philosophy:
Dynamo was designed for always-on services where unavailability directly costs money (shopping carts). Its philosophy:
"It is better to occasionally return stale or conflicting data than to block the customer's request."
This led to eventual consistency as the default, with techniques to detect and resolve conflicts rather than prevent them.
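Vector clocks make conflict detection concrete: each replica tracks a per-node counter, and two versions conflict exactly when neither clock dominates the other. A minimal sketch (illustrative only, not Dynamo's actual implementation):

```python
def compare(vc_a, vc_b):
    """Compare two vector clocks (dicts of node -> counter).

    Returns 'equal', 'a<=b', 'b<=a', or 'concurrent' (a conflict
    the application must resolve).
    """
    nodes = set(vc_a) | set(vc_b)
    a_le_b = all(vc_a.get(n, 0) <= vc_b.get(n, 0) for n in nodes)
    b_le_a = all(vc_b.get(n, 0) <= vc_a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "a<=b"
    if b_le_a:
        return "b<=a"
    return "concurrent"
```

Two writes accepted by different nodes without seeing each other produce `concurrent` clocks, which is exactly the case Dynamo surfaces to the application for resolution.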
Systems influenced by Dynamo:

- **Riak** — the closest open-source realization of the paper's design
- **Apache Cassandra** — combines Dynamo's distribution model with Bigtable's data model
- **Project Voldemort** — LinkedIn's Dynamo-inspired store
- **Amazon DynamoDB** — the managed AWS service that inherited the name
Amazon DynamoDB is named after the Dynamo paper but has evolved significantly. Modern DynamoDB uses different internal architecture, offers strong consistency options, and provides managed scaling. Think of it as 'inspired by Dynamo' rather than 'implementation of Dynamo'.
DynamoDB is AWS's flagship NoSQL database service. It's fully managed—no servers to provision, no software to install, no replication to configure. You create a table, and AWS handles the rest.
Core concepts:

- **Tables** — collections of items with no fixed schema beyond the primary key
- **Items** — individual records, up to 400KB each
- **Attributes** — name-value pairs within an item
- **Primary key** — either a partition key alone, or a partition key plus a sort key
Partition key (hash key):
DynamoDB uses the partition key to determine which partition stores an item:
Partition = hash(partition_key) mod number_of_partitions
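This mapping can be sketched in a few lines (the partition count is hypothetical; DynamoDB manages partitioning internally and does not expose its hash function):

```python
import hashlib

def partition_for(partition_key: str, num_partitions: int) -> int:
    """Map a partition key to a partition index via hashing.

    Deterministic: the same key always lands on the same partition,
    which is why one hot key can overload a single partition.
    """
    digest = hashlib.md5(partition_key.encode()).hexdigest()
    return int(digest, 16) % num_partitions
```

Because the mapping is deterministic, every request for the same key hits the same partition, which motivates the hot-partition discussion below.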
Good partition key design is the most important factor in DynamoDB performance. Keys should:

- Have high cardinality (many distinct values)
- Spread reads and writes evenly across those values
- Match your primary access patterns
Sort key (range key):
Optional second component of a composite primary key. Items with the same partition key are stored together, sorted by sort key:
Table: OrderItems
Partition Key: order_id
Sort Key: item_id
# All items for order stored together, sorted by item_id
# Enables queries like: "Get all items for order 123"
| Use Case | Partition Key | Sort Key | Access Pattern |
|---|---|---|---|
| User profiles | user_id | (none) | Get profile by user ID |
| Orders by user | user_id | order_date#order_id | Get all orders for user, sorted by date |
| Chat messages | conversation_id | timestamp#message_id | Get messages in conversation, ordered |
| IoT sensor data | sensor_id | timestamp | Get readings for sensor in time range |
| Session store | session_token | (none) | Get session by token |
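The `order_date#order_id` pattern in the table above works because DynamoDB sorts items lexicographically by sort key; putting the date first makes lexicographic order equal chronological order. A small sketch (hypothetical key format):

```python
def order_sort_key(order_date: str, order_id: str) -> str:
    """Build a composite sort key. ISO dates sort correctly as strings,
    so date-first keys yield chronological ordering for free."""
    return f"{order_date}#{order_id}"

# Items within one partition are returned in sort-key order,
# which here means oldest order first.
keys = sorted([
    order_sort_key("2024-03-01", "o-7"),
    order_sort_key("2024-01-15", "o-2"),
    order_sort_key("2024-02-20", "o-5"),
])
```

The same trick powers the chat-messages and IoT rows above: any value that sorts correctly as a string can anchor a time-ordered sort key.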
Partitions and scaling:
DynamoDB automatically partitions data based on:

- **Data volume** — each partition stores roughly 10GB
- **Throughput** — each partition supports up to about 3,000 RCU and 1,000 WCU
As data grows, DynamoDB adds partitions. But here's the critical insight: throughput is divided among partitions. If you provision 1000 WCU with 10 partitions, each partition gets ~100 WCU.
The hot partition problem:
If access concentrates on one partition key value (a 'hot key'), that single partition becomes a bottleneck while others sit idle. Classic examples:

- A celebrity account in a social app (millions of reads against one user_id)
- Using the current date as a partition key (all of today's writes hit one partition)
- A flash-sale product that everyone views at once
DynamoDB throttles at the partition level, not the table level. A table with 10,000 WCU can still be throttled if one partition receives >1000 WCU. Design partition keys for even distribution: add random suffixes, use composite keys, or shard hot keys.
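The random-suffix technique can be sketched as follows (shard count and key format are illustrative choices, not DynamoDB requirements):

```python
import random

NUM_SHARDS = 10  # chosen to spread a hot key across 10 partitions

def sharded_key(hot_key: str) -> str:
    """Write path: append a random suffix so writes for one logical key
    spread across NUM_SHARDS physical partition key values."""
    return f"{hot_key}#{random.randrange(NUM_SHARDS)}"

def all_shard_keys(hot_key: str) -> list[str]:
    """Read path: a full read must fan out to every shard and merge results."""
    return [f"{hot_key}#{i}" for i in range(NUM_SHARDS)]
```

The trade-off is explicit: writes scale roughly linearly with the shard count, but reads of the whole logical key now require one query per shard.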
Understanding DynamoDB's capacity models and secondary indexes is essential for cost optimization and query flexibility.
Capacity modes:

- **Provisioned** — you specify RCU/WCU up front; cheaper for steady, predictable traffic, with optional auto scaling
- **On-demand** — pay per request with no capacity planning; suited to spiky or unpredictable workloads
Capacity unit costs:
Read Capacity Unit (RCU):
- 1 strongly consistent read per second of an item up to 4KB
- 2 eventually consistent reads per second of items up to 4KB
Write Capacity Unit (WCU):
- 1 write per second of an item up to 1KB
Larger items consume more capacity. A 10KB write uses 10 WCUs.
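The rounding rules can be captured in a short calculator (a sketch of the billing arithmetic, not an official AWS utility; DynamoDB aggregates consumption at the request level):

```python
import math

def wcus(item_kb: float) -> int:
    """Writes are billed in 1KB units, rounded up."""
    return math.ceil(item_kb)

def rcus(item_kb: float, strongly_consistent: bool = True) -> float:
    """Reads are billed in 4KB units, rounded up.

    Eventually consistent reads cost half a unit per 4KB chunk.
    """
    units = math.ceil(item_kb / 4)
    return units if strongly_consistent else units / 2
```

So a 10KB strongly consistent read consumes 3 RCU (three 4KB chunks), and the same read done eventually consistently consumes 1.5 RCU.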
Secondary indexes:
By default, items can only be accessed by their primary key (partition key alone, or partition key plus sort key). Secondary indexes remove this limitation:
Global Secondary Index (GSI):
Base table: Users
PK: user_id
GSI: EmailIndex
PK: email
# Now you can query: "Find user by email"
Local Secondary Index (LSI):
Base table: Orders
PK: user_id, SK: order_date
LSI: StatusIndex
PK: user_id, SK: status
# Query: "Find pending orders for user" (same user, different sort)
| Aspect | Global Secondary Index | Local Secondary Index |
|---|---|---|
| Partition Key | Any attribute | Same as base table |
| Sort Key | Any attribute | Different from base table |
| When Created | Any time | Table creation only |
| Capacity | Separate from base table | Shared with base table |
| Consistency | Eventually consistent only | Strong or eventual |
| Size Limit | None | 10GB per partition key |
A powerful pattern: use generic attribute names (pk, sk, data) and overload their meaning based on item type. One GSI can support multiple access patterns by storing different entity types with the same key structure. This technique enables complex queries with minimal indexes.
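The overloading pattern can be illustrated with plain dictionaries (the `USER#`/`ORDER#` prefixes and the in-memory `query` helper are hypothetical stand-ins for a real table and the DynamoDB Query API):

```python
# One "table" holding two entity types under generic pk/sk attributes.
items = [
    {"pk": "USER#1", "sk": "PROFILE",                "data": "alice"},
    {"pk": "USER#1", "sk": "ORDER#2024-01-05#o-900", "data": "order 900"},
    {"pk": "USER#1", "sk": "ORDER#2024-02-10#o-901", "data": "order 901"},
]

def query(items, pk, sk_prefix=""):
    """Mimic a DynamoDB Query: exact match on pk plus begins_with on sk,
    results returned in sort-key order."""
    return sorted(
        (i for i in items if i["pk"] == pk and i["sk"].startswith(sk_prefix)),
        key=lambda i: i["sk"],
    )

profile = query(items, "USER#1", "PROFILE")
orders = query(items, "USER#1", "ORDER#")
```

One key structure thus answers both "get the profile" and "get all orders, oldest first" without extra indexes.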
DynamoDB offers configurable consistency models and multi-region replication—essential features for global applications.
Consistency options:
Eventually Consistent Reads (default): may return stale data shortly after a write (replicas typically converge within a second) and cost half as much.
Strongly Consistent Reads: always reflect every write that completed before the read, at full RCU cost and slightly higher latency.
# Eventually consistent (default, cheaper)
response = table.get_item(Key={'user_id': '123'})
# Strongly consistent (guaranteed fresh)
response = table.get_item(
Key={'user_id': '123'},
ConsistentRead=True
)
Transactions:
DynamoDB supports ACID transactions across multiple items and tables:
# Atomic transfer between accounts
client.transact_write_items(
TransactItems=[
{
'Update': {
'TableName': 'Accounts',
'Key': {'account_id': 'A'},
'UpdateExpression': 'SET balance = balance - :amt',
'ExpressionAttributeValues': {':amt': 100}
}
},
{
'Update': {
'TableName': 'Accounts',
'Key': {'account_id': 'B'},
'UpdateExpression': 'SET balance = balance + :amt',
'ExpressionAttributeValues': {':amt': 100}
}
}
]
)
Transactions cost 2x regular operations but provide serializability across up to 100 items.
Global Tables:
For multi-region deployments, DynamoDB Global Tables provide active-active replication:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ us-east-1 │◄───►│ eu-west-1 │◄───►│ ap-south-1 │
│ Replica │ │ Replica │ │ Replica │
└─────────────┘ └─────────────┘ └─────────────┘
▲ ▲ ▲
│ │ │
Users in Users in Users in
Americas Europe Asia
Global Tables use timestamp-based last-writer-wins. If users in different regions update the same item simultaneously, one update is silently lost. Design for conflict-free operations (like INCR) or use conditional writes when concurrent updates to the same item are possible.
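Last-writer-wins is easy to state but its failure mode is worth seeing concretely. A minimal sketch (illustrative timestamps; Global Tables resolve conflicts internally, not via user code):

```python
def lww_merge(version_a: dict, version_b: dict) -> dict:
    """Last-writer-wins: keep the version with the newer timestamp.
    Whatever the losing version changed is silently discarded."""
    return version_a if version_a["ts"] >= version_b["ts"] else version_b

# Two regions update the same item at nearly the same moment.
us_east = {"ts": 1700000001, "cart": ["book"]}
eu_west = {"ts": 1700000002, "cart": ["lamp"]}

winner = lww_merge(us_east, eu_west)  # the 'book' update is lost entirely
```

Note that the merge keeps one whole version rather than combining them, which is why the text above recommends conflict-free operations or conditional writes for items that multiple regions may touch.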
Riak is an open-source distributed key-value store that closely follows the Dynamo paper's architecture. It's designed for high availability, fault tolerance, and operational simplicity—with no single point of failure.
Riak's core principles:

- **Masterless** — every node can serve any request; there is no coordinator or primary
- **No single point of failure** — nodes are symmetric and data is replicated across the ring
- **Eventually consistent by default** — with quorums tunable per operation
- **Operational simplicity** — adding or removing a node rebalances the ring automatically
Ring-based architecture:
Riak distributes data using consistent hashing with a ring structure:
Ring (2^160 positions)
┌───┐
┌────┤ 0 ├────┐
┌───┘ └───┘ └───┐
Node 3 Node 1
│ │
┌──┴──┐ ┌──┴──┐
│ │ │ │
... ... ... ...
│ │ │ │
└──┬──┘ └──┬──┘
Node 2 Node 4
└───┐ ┌───┐ ┌───┘
└────┤Max├────┘
└───┘
Each node owns a portion of the ring. Data is replicated to N consecutive nodes (default N=3).
Tunable consistency:
Riak allows configuring quorum parameters per operation:

- **N** — number of replicas stored for each key (default 3)
- **R** — number of replicas that must respond for a read to succeed
- **W** — number of replicas that must acknowledge a write before it succeeds
Common configurations:
| Configuration | N | R | W | Characteristic |
|---|---|---|---|---|
| Strong consistency | 3 | 2 | 2 | W + R > N: guaranteed to see latest |
| High read availability | 3 | 1 | 3 | Reads always succeed if any replica up |
| High write availability | 3 | 3 | 1 | Writes always succeed if any replica up |
| Balanced (default) | 3 | quorum | quorum | quorum = floor(N/2) + 1 = 2 |
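The table's "W + R > N" rule can be checked mechanically. A small sketch of the quorum arithmetic:

```python
import math

def guarantees_latest_read(n: int, r: int, w: int) -> bool:
    """True when every read quorum must overlap every write quorum,
    so at least one responding replica has seen the latest write."""
    return r + w > n

def default_quorum(n: int) -> int:
    """Riak's 'quorum' value: a majority of the N replicas."""
    return math.floor(n / 2) + 1
```

With N=3, the balanced default of R=W=2 satisfies 2+2 > 3, while the high-availability settings (R=1 or W=1) deliberately give up the overlap guarantee for liveness.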
Data types:
Riak supports conflict-free replicated data types (CRDTs) that automatically merge concurrent updates:
% Build an increment operation locally; no read-modify-write round trip needed.
Counter0 = riakc_counter:new(),
Counter1 = riakc_counter:increment(100, Counter0),
UpdateOp = riakc_counter:to_op(Counter1),
riakc_pb_socket:update_type(Pid, Bucket, Key, UpdateOp).
CRDTs are powerful for high-concurrency scenarios where traditional locks would be too expensive.
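The reason counters merge automatically is visible in the simplest CRDT, the grow-only counter: each node tracks its own local count, and merging takes per-node maxima. A sketch (illustrative; Riak's counters are PN-counters, which pair two of these):

```python
def g_counter_merge(a: dict, b: dict) -> dict:
    """Merge two grow-only counters (dicts of node -> local count).
    Taking the per-node maximum is commutative, associative, and
    idempotent, so replicas converge in any merge order."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

def g_counter_value(counter: dict) -> int:
    """The counter's value is the sum of every node's local count."""
    return sum(counter.values())

# Two replicas incremented independently during a partition...
replica_1 = {"n1": 5, "n2": 1}
replica_2 = {"n1": 3, "n2": 4}
merged = g_counter_merge(replica_1, replica_2)
```

Because merge order never matters, no coordination or locking is needed at write time, which is exactly the property the paragraph above describes.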
When conflicts occur with regular values (not CRDTs), Riak stores all conflicting versions as 'siblings'. Your application must resolve conflicts at read time. For simple cases, use last-write-wins (allow_mult=false). For critical data, implement application-level merge logic.
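Application-level merge logic depends entirely on the data's semantics. For a shopping cart, the Dynamo paper's own example, set union is a natural choice; a hedged sketch:

```python
def resolve_cart_siblings(siblings: list[list[str]]) -> list[str]:
    """Merge conflicting shopping-cart versions by set union.

    Union never loses an added item, though it can resurrect an item
    one sibling had deleted -- the classic trade-off of this strategy.
    """
    merged: set[str] = set()
    for cart in siblings:
        merged |= set(cart)
    return sorted(merged)

# Two siblings produced by concurrent writes in different regions.
siblings = [["book", "lamp"], ["book", "pen"]]
cart = resolve_cart_siblings(siblings)
```

For data without an obvious merge rule, last-write-wins (as the text notes) is the simpler, lossier fallback.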
Operating Riak requires understanding its consistency model and failure behavior.
Read repair:
When a read returns data from multiple replicas, Riak compares versions. If replicas have diverged:

- The causally latest version (per its vector clock) is returned to the client
- Stale replicas are updated with that version in the background, so frequently read keys heal themselves
Anti-entropy:
Background processes (active anti-entropy) continuously compare data across replicas using Merkle trees:

- Each node maintains a hash tree over its portion of the keyspace
- Nodes exchange tree roots; only branches whose hashes differ are walked further
- Mismatched keys are repaired without scanning the full dataset, so rarely read keys heal too
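The core idea, comparing hashes of key groups so only divergent groups need a key-by-key walk, can be sketched with a flat, single-level version of the tree (real Merkle trees recurse; bucket count here is arbitrary):

```python
import hashlib

def _bucket(key: str, buckets: int) -> int:
    """Assign a key to a bucket deterministically."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % buckets

def bucket_digests(data: dict, buckets: int = 4) -> list[str]:
    """One digest per bucket of keys: the leaves of a flattened hash tree."""
    accs = [hashlib.sha256() for _ in range(buckets)]
    for key in sorted(data):
        accs[_bucket(key, buckets)].update(f"{key}={data[key]}".encode())
    return [a.hexdigest() for a in accs]

def divergent_keys(replica_a: dict, replica_b: dict, buckets: int = 4) -> set:
    """Compare digests bucket by bucket; only walk keys in buckets that differ."""
    da, db = bucket_digests(replica_a, buckets), bucket_digests(replica_b, buckets)
    diffs = set()
    for i, (ha, hb) in enumerate(zip(da, db)):
        if ha != hb:  # cheap check rules out identical buckets entirely
            candidates = {k for k in set(replica_a) | set(replica_b)
                          if _bucket(k, buckets) == i}
            diffs |= {k for k in candidates
                      if replica_a.get(k) != replica_b.get(k)}
    return diffs
```

With a real multi-level tree, the same pruning applies at every level, so two mostly identical replicas exchange only a logarithmic number of hashes.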
Handling node failures:

- **Sloppy quorum** — if a replica is down, the next healthy node on the ring temporarily accepts its reads and writes
- **Hinted handoff** — those writes carry a 'hint' naming the intended owner; when the failed node returns, the data is handed back
- The result: operations keep succeeding through failures, at the cost of temporary inconsistency
Bucket types and properties:
Riak organizes data into buckets with configurable properties:
riak-admin bucket-type create session_store
'{"props": {
"n_val": 3,
"allow_mult": false,
"last_write_wins": true,
"backend": "leveldb_backend"
}}'
riak-admin bucket-type activate session_store
When to use Riak:

- You need extreme write availability and can tolerate eventual consistency
- You run your own infrastructure and don't need a managed service
- Masterless, multi-datacenter operation matters to you
- Your data fits opaque values or CRDTs rather than rich queries
Riak was developed by Basho Technologies, which ceased operations in 2017. The project is now community-maintained. While existing deployments continue, new projects typically choose DynamoDB (if on AWS), Cassandra (for high write throughput), or newer databases. Riak remains valuable for understanding Dynamo-style systems.
With multiple key-value stores available, choosing the right one depends on your specific requirements.
| Aspect | Redis | Memcached | DynamoDB | Riak |
|---|---|---|---|---|
| Primary Use | Cache + data structures | Pure caching | Durable KV/Document | Durable KV (HA) |
| Persistence | Optional | None | Always | Always |
| Data Model | Rich types | Strings only | Items + attributes | Opaque values + CRDTs |
| Consistency | Strong (single) | N/A | Eventual or strong | Tunable (eventual default) |
| Scaling | Manual (Cluster) | Client-side | Automatic | Manual (add nodes) |
| Operations | Self-managed | Self-managed | Fully managed | Self-managed |
| Multi-Region | Manual | No | Global Tables | Yes (MDC) |
| Latency | <1ms | <1ms | ~5-10ms | ~5-20ms |
| Cost Model | Infrastructure | Infrastructure | Pay per request/capacity | Infrastructure |
Decision framework:
Need persistence?
│
┌────────────┴────────────┐
No Yes
│ │
Memcached/Redis Need strong consistency?
│ │
Need data structures? ┌───────┴───────┐
│ No Yes
┌────────┴────────┐ │ │
No Yes DynamoDB PostgreSQL
│ │ (eventual) or Spanner
Memcached Redis │
Need managed service?
│
┌───────┴───────┐
No Yes
│ │
Riak/Cassandra DynamoDB
(strong)
Most production systems use multiple key-value stores. A typical setup: Redis for caching and real-time features, DynamoDB for durable session storage, and the primary relational database for complex queries. Each tool for its strengths.
We've explored DynamoDB and Riak—two approaches to distributed key-value storage that solve different problems. Let's consolidate the key concepts:

- The 2007 Dynamo paper's techniques (consistent hashing, vector clocks, quorum replication, hinted handoff, Merkle trees) underpin both systems
- DynamoDB: fully managed; partition key design dominates performance; provisioned vs. on-demand capacity; GSIs and LSIs; transactions; Global Tables with last-writer-wins
- Riak: masterless ring; tunable N/R/W quorums; CRDTs and sibling resolution; community-maintained since Basho's closure
What's next:
With our exploration of key-value stores complete—from Redis and Memcached to DynamoDB and Riak—we'll conclude this module with a comprehensive look at use cases and trade-offs. You'll learn how to choose the right key-value store for specific scenarios and understand the fundamental trade-offs in key-value database design.
You now understand DynamoDB's managed, scalable architecture and Riak's masterless, eventually consistent design. This knowledge enables you to evaluate distributed key-value stores for persistence-heavy workloads and geo-distributed applications.