Design a distributed NoSQL database like Apache Cassandra or Amazon DynamoDB: a masterless, peer-to-peer, wide-column / key-value store. The system partitions data across a cluster using consistent hashing with virtual nodes, replicates each partition to N nodes with rack- and data-centre-aware placement, and offers tunable consistency (ONE/QUORUM/ALL per operation). Storage uses an LSM-tree engine (commit log → memtable → SSTables with compaction), and replicas are kept consistent via hinted handoff, read repair, and Merkle-tree anti-entropy repair, all while supporting multi-data-centre active-active replication.
| Metric | Value |
|---|---|
| Cluster size | 100–10,000 nodes |
| Total storage | 1+ PB |
| Writes per second (per node) | 50,000–100,000 |
| Reads per second (per node) | 10,000–50,000 |
| Write latency (p99) | < 5ms |
| Read latency (p99) | < 10ms |
| Replication factor | 3 (typical) |
| Virtual nodes per physical node | 256 |
| Data centres | 2–5 (active-active) |
| Partitions (token ranges) | Millions |
| Compaction throughput | 50–100 MB/s per node |
Key-value and wide-column data model: store data as rows identified by a partition key; each row contains a set of columns (wide-column); support compound primary keys (partition key + clustering key for sorted data within a partition); schema-flexible — columns can vary per row
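As a rough illustration of this data model, here is a minimal Python sketch; names such as `WideRow`, `upsert`, and `slice` are invented for the example and not any real driver API. It shows a partition holding many rows keyed by a clustering key, kept sorted within the partition, with columns free to vary per row.

```python
from dataclasses import dataclass, field
from typing import Any

# Minimal sketch of the wide-column model: one partition, many clustered rows.
# All names here are illustrative, not a real client API.

@dataclass
class WideRow:
    partition_key: str                        # decides which nodes own the row
    rows: dict = field(default_factory=dict)  # clustering key -> {column: value}

    def upsert(self, clustering_key: Any, columns: dict) -> None:
        """Columns can vary per clustering key (schema-flexible)."""
        self.rows.setdefault(clustering_key, {}).update(columns)

    def slice(self, start: Any, end: Any) -> list:
        """Clustering keys are kept sorted within a partition."""
        return [(k, self.rows[k]) for k in sorted(self.rows) if start <= k <= end]

# Example: sensor readings keyed by (sensor_id, timestamp).
partition = WideRow(partition_key="sensor-42")
partition.upsert(1700000000, {"temp": 21.5})
partition.upsert(1700000060, {"temp": 21.7, "humidity": 0.43})  # extra column is fine
print(partition.slice(1700000000, 1700000060))
```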
Data partitioning: automatically distribute data across a cluster of nodes using consistent hashing; each node responsible for a range of the hash ring; data evenly distributed to avoid hotspots; support adding/removing nodes with minimal data movement
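A minimal sketch of a hash ring with virtual nodes follows, using MD5-derived tokens purely for illustration (Cassandra's default partitioner is Murmur3-based): each physical node owns many vnode tokens, a key belongs to the first vnode clockwise from its token, and removing a node moves only that node's ranges.

```python
import bisect
import hashlib

# Sketch of consistent hashing with virtual nodes. Token function and vnode
# count are illustrative; real systems use e.g. Murmur3 and 256 vnodes/node.

def token(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, vnodes_per_node: int = 8):
        self.vnodes_per_node = vnodes_per_node
        self.tokens: list[int] = []          # sorted vnode tokens
        self.owner: dict[int, str] = {}      # token -> physical node

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes_per_node):
            t = token(f"{node}#vnode{i}")
            bisect.insort(self.tokens, t)
            self.owner[t] = node

    def remove_node(self, node: str) -> None:
        # Only this node's vnode ranges move; other nodes keep their data.
        self.tokens = [t for t in self.tokens if self.owner[t] != node]
        self.owner = {t: n for t, n in self.owner.items() if n != node}

    def node_for(self, key: str) -> str:
        # First vnode clockwise from the key's token owns it (wraps around).
        i = bisect.bisect_right(self.tokens, token(key)) % len(self.tokens)
        return self.owner[self.tokens[i]]

ring = Ring()
for n in ("node-a", "node-b", "node-c"):
    ring.add_node(n)
print(ring.node_for("user:1234"))
```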
Replication: each piece of data replicated to N nodes (configurable replication factor, typically 3); replicas placed across different racks/data centres for fault tolerance; data remains available even when some replicas are down
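Building on the ring sketch above, here is one hedged way rack-aware placement could work: walk the ring clockwise from the key's token, skip nodes on racks already used, and fall back to the skipped nodes if there are fewer racks than replicas needed. The `rack_of` map and RF of 3 are assumptions for the example, and the logic is a simplification of strategies like Cassandra's NetworkTopologyStrategy.

```python
import bisect

# Simplified rack-aware replica selection, reusing Ring/token from the
# consistent-hashing sketch above.

def replicas_for(ring, key, rack_of, replication_factor=3):
    start = bisect.bisect_right(ring.tokens, token(key)) % len(ring.tokens)
    chosen, racks_used, skipped = [], set(), []
    for step in range(len(ring.tokens)):
        node = ring.owner[ring.tokens[(start + step) % len(ring.tokens)]]
        if node in chosen or node in skipped:
            continue
        if rack_of[node] in racks_used:
            skipped.append(node)              # hold back same-rack nodes for now
            continue
        chosen.append(node)
        racks_used.add(rack_of[node])
        if len(chosen) == replication_factor:
            return chosen
    # Fewer racks than replicas: fall back to nodes we skipped earlier.
    return (chosen + skipped)[:replication_factor]

rack_of = {"node-a": "rack1", "node-b": "rack1", "node-c": "rack2"}
print(replicas_for(ring, "user:1234", rack_of))
```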
Tunable consistency: support configurable consistency levels per operation — ONE (fastest, single replica), QUORUM (majority of replicas), ALL (all replicas); client chooses consistency-availability trade-off per read/write; strong consistency achievable with QUORUM read + QUORUM write (R + W > N)
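The quorum arithmetic is small enough to show directly. This sketch computes the acknowledgements each level needs and the R + W > N overlap check that makes QUORUM reads observe QUORUM writes; function names are illustrative.

```python
# Tunable consistency: acks required per level, and the R + W > N check
# that guarantees the read set overlaps the write set.

def required_acks(level: str, replication_factor: int) -> int:
    return {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }[level]

def is_strongly_consistent(read_level, write_level, replication_factor=3):
    r = required_acks(read_level, replication_factor)
    w = required_acks(write_level, replication_factor)
    return r + w > replication_factor   # read and write replica sets must overlap

print(required_acks("QUORUM", 3))                   # 2
print(is_strongly_consistent("QUORUM", "QUORUM"))   # True
print(is_strongly_consistent("ONE", "QUORUM"))      # False (1 + 2 = 3, not > 3)
```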
High availability writes: writes always succeed as long as at least one replica is reachable (with consistency level ONE); no single point of failure — all nodes are equal (masterless/peer-to-peer); no single coordinator or leader required
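One way to picture the availability guarantee, as a sketch only: any node can coordinate a write, fan it out to the replicas, and report success as soon as the requested number of acknowledgements arrive. The `send` callback and node names are stand-ins for real replica RPCs.

```python
# Sketch of a coordinator handling a write in a masterless cluster.

def coordinate_write(replicas, acks_needed, send):
    """send(replica) returns True if the replica acked, False if unreachable."""
    acked, missed = [], []
    for replica in replicas:
        (acked if send(replica) else missed).append(replica)
    if len(acked) >= acks_needed:
        return {"status": "ok", "acked": acked, "missed": missed}  # missed replicas get hints
    return {"status": "unavailable", "acked": acked}

up = {"node-a": False, "node-b": False, "node-c": True}   # two replicas down
result = coordinate_write(["node-a", "node-b", "node-c"],
                          acks_needed=1,                  # consistency level ONE
                          send=lambda r: up[r])
print(result["status"])   # "ok": the write succeeds with a single reachable replica
```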
Read and write path: writes → commit log (WAL) + memtable (in-memory sorted structure) → periodically flushed to SSTables (Sorted String Tables) on disk; reads → check memtable → check Bloom filters on SSTables → merge results from SSTables; LSM-tree storage engine
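A toy sketch of that LSM path, under heavy simplification: writes append to a commit log and update a memtable, the memtable flushes to immutable sorted SSTable segments, and reads check the memtable and then SSTables newest-to-oldest. A plain `set` of keys stands in for a real (probabilistic) Bloom filter, and flush/compaction thresholds are invented.

```python
# Toy LSM-tree write/read path. The "bloom" field is just a key set standing
# in for a real Bloom filter; sizes and thresholds are illustrative.

class LSMStore:
    def __init__(self, memtable_limit: int = 2):
        self.commit_log: list[tuple[str, str]] = []   # WAL for crash recovery
        self.memtable: dict[str, str] = {}            # in-memory sorted-on-flush buffer
        self.sstables: list[dict] = []                # immutable segments, newest last
        self.memtable_limit = memtable_limit

    def put(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))          # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        keys = sorted(self.memtable)
        self.sstables.append({
            "keys": keys,                             # sorted, enabling range scans
            "rows": dict(self.memtable),
            "bloom": set(keys),                       # lets reads skip files cheaply
        })
        self.memtable.clear()
        self.commit_log.clear()                       # flushed data is now on "disk"

    def get(self, key: str):
        if key in self.memtable:                      # newest data wins
            return self.memtable[key]
        for sstable in reversed(self.sstables):       # newest SSTable first
            if key in sstable["bloom"]:
                return sstable["rows"].get(key)
        return None

store = LSMStore()
store.put("a", "1"); store.put("b", "2"); store.put("a", "3")
print(store.get("a"))   # "3" from the memtable, shadowing the flushed value
```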
Conflict resolution: concurrent writes to the same key on different replicas → resolve using last-write-wins (LWW) with timestamps; alternative: vector clocks for causal ordering; client-side conflict resolution for application-specific merge logic
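Last-write-wins reconciliation reduces to a comparison on write timestamps. The sketch below shows that merge rule; the `Versioned` type and tie-breaking on value (so replicas converge deterministically) are assumptions for the example.

```python
from dataclasses import dataclass

# Last-write-wins reconciliation: highest timestamp wins; ties broken by value
# so every replica picks the same winner.

@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp_micros: int     # writer-supplied, typically the coordinator clock

def resolve_lww(replica_values):
    return max(replica_values, key=lambda v: (v.timestamp_micros, v.value))

a = Versioned("blue", 1700000000_000001)
b = Versioned("green", 1700000000_000007)   # later write
print(resolve_lww([a, b]).value)            # "green"
```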
Anti-entropy and repair: detect and repair inconsistencies between replicas — read repair (compare replica responses on read, fix stale replicas), Merkle tree-based anti-entropy repair (compare hash trees to find divergent data ranges), hinted handoff (temporarily store writes for down replicas)
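To make the Merkle-tree part concrete, here is a toy comparison: each replica hashes its key ranges into leaf buckets and builds a small binary hash tree; if the roots match, the replicas are in sync, otherwise only the differing buckets need to be streamed and repaired. Bucketing and tree shape are heavily simplified, and read repair / hinted handoff are not shown.

```python
import hashlib

# Toy Merkle-tree anti-entropy comparison between two replicas.

def leaf_hashes(data: dict, num_leaves: int = 4) -> list[str]:
    # num_leaves should be a power of two so levels pair up cleanly.
    buckets = [[] for _ in range(num_leaves)]
    for key in sorted(data):
        bucket = int(hashlib.md5(key.encode()).hexdigest(), 16) % num_leaves
        buckets[bucket].append(f"{key}={data[key]}")
    return [hashlib.sha256("|".join(b).encode()).hexdigest() for b in buckets]

def build_tree(leaves: list[str]) -> list[list[str]]:
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([hashlib.sha256((prev[i] + prev[i + 1]).encode()).hexdigest()
                       for i in range(0, len(prev), 2)])
    return levels                          # levels[-1][0] is the root hash

def divergent_buckets(tree_a, tree_b) -> list[int]:
    if tree_a[-1] == tree_b[-1]:           # roots match: nothing to repair
        return []
    return [i for i, (x, y) in enumerate(zip(tree_a[0], tree_b[0])) if x != y]

replica_a = {"k1": "v1", "k2": "v2", "k3": "v3"}
replica_b = {"k1": "v1", "k2": "STALE", "k3": "v3"}
ta, tb = build_tree(leaf_hashes(replica_a)), build_tree(leaf_hashes(replica_b))
print(divergent_buckets(ta, tb))           # only the bucket holding k2 differs
```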
Multi-data-centre replication: replicate data across geographically distributed data centres; each data centre has its own set of replicas; cross-DC replication is asynchronous (to avoid cross-DC latency on writes); local reads/writes within each DC for low latency
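A small sketch of per-DC replication settings and a LOCAL_QUORUM-style write plan, assuming illustrative DC names (`dc-eu`, `dc-us`) and RF 3 in each: the client blocks only on local-DC acknowledgements, while remote DCs receive the write asynchronously.

```python
# Per-data-centre replication factors and a LOCAL_QUORUM-style write plan.
# DC names and factors are illustrative.

replication = {"dc-eu": 3, "dc-us": 3}          # replicas per data centre

def local_quorum_acks(local_dc: str) -> int:
    return replication[local_dc] // 2 + 1

def plan_write(local_dc: str) -> dict:
    return {
        "wait_for": local_quorum_acks(local_dc),                        # blocks the client
        "async_forward": [dc for dc in replication if dc != local_dc],  # no cross-DC wait
    }

print(plan_write("dc-eu"))   # {'wait_for': 2, 'async_forward': ['dc-us']}
```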
Secondary indexes and queries: support secondary indexes on non-primary-key columns for flexible querying; materialised views for denormalised query patterns; lightweight transactions (compare-and-set) using Paxos for conditional updates
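Lightweight transactions reduce to compare-and-set semantics at the API level. The sketch below shows only that contract ("apply the update if the current value matches the expected one"); a thread lock stands in for the Paxos round a real cluster would run across replicas, and all names are invented for the example.

```python
import threading

# Compare-and-set contract behind lightweight transactions. The lock is a
# single-process stand-in for a Paxos consensus round across replicas.

class CASRegister:
    def __init__(self):
        self._values: dict[str, str] = {}
        self._lock = threading.Lock()

    def compare_and_set(self, key: str, expected, new_value: str) -> bool:
        with self._lock:
            if self._values.get(key) != expected:
                return False                # not applied; caller re-reads and retries
            self._values[key] = new_value
            return True

reg = CASRegister()
print(reg.compare_and_set("user:1", None, "alice"))     # True: "insert if not exists"
print(reg.compare_and_set("user:1", None, "bob"))       # False: key already taken
print(reg.compare_and_set("user:1", "alice", "alicia")) # True: expected value matched
```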
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system: 'P99 read latency under 10ms at QUORUM' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 write latency < 5ms', '99.99% availability per data centre', '1+ PB across thousands of nodes'. These drive architectural decisions.
Choose the 3-5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?