Design a distributed key-value store similar to Amazon DynamoDB, Apache Cassandra, or Riak. The system should support put, get, and delete operations with tunable consistency, automatic data partitioning via consistent hashing, and fault-tolerant replication.
| Metric | Value |
|---|---|
| Data stored | 100 TB across the cluster |
| Key size | ≤ 256 bytes |
| Value size | ≤ 1 MB (typical: 1–10 KB) |
| Read QPS | 500,000 per second |
| Write QPS | 100,000 per second |
| Number of nodes | 100–1,000 |
| Replication factor (N) | 3 |
| Read/write latency (p99) | < 10ms (within a single DC) |
| Virtual nodes per physical node | 150 (for a balanced hash ring) |
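The "virtual nodes" row deserves a closer look: each physical node claims many points on the hash ring, which evens out key distribution and spreads a failed node's load across the whole cluster. A minimal sketch of such a ring, using MD5 for stable placement (the class name `HashRing` and the `node#i` vnode naming are illustrative, not from any particular system):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Map a string to a point on the ring; MD5 is stable across processes.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Consistent-hash ring with virtual nodes (vnodes)."""

    def __init__(self, vnodes: int = 150):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) points

    def add_node(self, node: str) -> None:
        # Each physical node owns `vnodes` points on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        # Only this node's vnodes disappear; other keys keep their owners.
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        # The first vnode clockwise from the key's hash owns the key.
        h = _hash(key)
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```

Note that when a node is removed, only the keys it owned move; with a naive `hash(key) % num_nodes` scheme, nearly every key would be remapped.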
- `Put(key, value)`: insert or update a key-value pair
- `Get(key)`: retrieve the value associated with a key
- `Delete(key)`: remove a key-value pair
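On a single node, this API reduces to a thin wrapper over a map. A minimal in-memory sketch (the `KVNode` class is hypothetical; a production store would persist to a log-structured engine and replicate deletes as tombstones rather than dropping keys):

```python
class KVNode:
    """Single-node store backing Put/Get/Delete; in-memory only."""

    def __init__(self):
        self._data = {}  # key -> value

    def put(self, key: str, value: bytes) -> None:
        # Put is an upsert: it inserts a new key or overwrites an existing one.
        self._data[key] = value

    def get(self, key: str):
        # Returns None for missing keys rather than raising.
        return self._data.get(key)

    def delete(self, key: str) -> None:
        # A distributed store would write a tombstone here so the delete
        # propagates to replicas; this sketch simply drops the key.
        self._data.pop(key, None)
```

The interesting parts of the design are not these three methods but where each key lives (partitioning) and how many copies exist (replication), covered below.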
- Data is automatically partitioned (sharded) across multiple nodes for horizontal scalability
- Data is replicated across multiple nodes for fault tolerance; the system remains available when individual nodes fail
- Consistency is tunable: clients can choose between strong consistency and eventual consistency per request
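Tunable consistency is usually expressed with quorums: of N replicas, a write must be acknowledged by W of them and a read must consult R. When R + W > N, every read quorum overlaps every write quorum in at least one replica, so a read is guaranteed to see the latest acknowledged write. A small sketch of that rule and a quorum-read reconciliation step (the helper names and the `(value, version)` reply shape are assumptions for illustration):

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    # Strong consistency requires read and write quorums to overlap
    # in at least one replica: R + W > N.
    return r + w > n

def quorum_read(replies):
    """Reconcile R replica replies of (value, version): keep the freshest."""
    # With overlapping quorums, at least one reply carries the latest version.
    return max(replies, key=lambda reply: reply[1])
```

With N=3, a client wanting strong consistency might use R=2, W=2; a latency-sensitive client can drop to R=1, W=1 and accept eventual consistency.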
- Support TTL (time-to-live) for automatic key expiry
- Detect and resolve conflicting writes using vector clocks or last-writer-wins (LWW)
- Support range queries on ordered keys, or secondary indexes
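Vector clocks deserve a quick illustration, since they are what lets the system tell a stale write apart from a genuinely concurrent one. Each replica tags a value with a per-node counter map; one version supersedes another only if it has seen every event the other has. A minimal sketch (the class and method names are illustrative):

```python
class VectorClock:
    """Per-key vector clock: node id -> logical event counter."""

    def __init__(self, counts=None):
        self.counts = dict(counts or {})

    def increment(self, node: str) -> None:
        # Called by the coordinating node on each write it accepts.
        self.counts[node] = self.counts.get(node, 0) + 1

    def descends_from(self, other: "VectorClock") -> bool:
        # True if self has seen every event recorded in other,
        # i.e. self supersedes other and other can be discarded.
        return all(self.counts.get(n, 0) >= c for n, c in other.counts.items())

    def concurrent_with(self, other: "VectorClock") -> bool:
        # Neither clock descends from the other: a genuine conflict,
        # surfaced to the client (or resolved by LWW) rather than hidden.
        return not self.descends_from(other) and not other.descends_from(self)

    def merge(self, other: "VectorClock") -> "VectorClock":
        # Element-wise max; applied after conflicting siblings are reconciled.
        nodes = set(self.counts) | set(other.counts)
        return VectorClock({n: max(self.counts.get(n, 0),
                                   other.counts.get(n, 0)) for n in nodes})
```

LWW is simpler (compare timestamps, keep the newest) but silently drops one of two concurrent writes; vector clocks preserve both siblings at the cost of client-side conflict resolution.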
Non-functional requirements define the system qualities critical to your users. Frame them as "The system should be able to..." statements; they will guide your deep dives later.

- Consider CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
- Frame NFRs for this specific system: "p99 read latency under 10 ms" is far more valuable than just "low latency".
- Attach concrete numbers ("p99 response time < 500ms", "99.9% availability", "10M DAU"); they drive architectural decisions.
- Choose the 3-5 most critical NFRs. Every system should be "scalable", but what makes THIS system's scaling uniquely challenging?