Design Redis

Design an in-memory data structure store like Redis that serves as a database, cache, message broker, and streaming engine. The system stores all data in RAM for sub-millisecond latency, supports rich data structures (strings, lists, sets, sorted sets, hashes, streams), provides persistence via RDB snapshots and AOF logs, master-replica replication with automatic failover, and horizontal scaling via hash-slot-based clustering.

Scale Estimates

Metric	Value
Operations per second (single instance)	100K–1M
Data stored per instance	Up to 256 GB (RAM-constrained)
Client connections per instance	10,000+ (epoll multiplexed)
Average operation latency	< 1ms (sub-millisecond)
Cluster size (Redis Cluster)	Up to 1,000 nodes
Hash slots	16,384
Replication lag	< 1ms (async, same DC)
RDB snapshot time (10 GB dataset)	~10 seconds (background fork)
AOF rewrite time (10 GB)	~30 seconds (background fork)
Key count per instance	Hundreds of millions
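
As a quick sanity check on the table, the snapshot and rewrite rows imply a rough background write throughput, and the slot count implies how many slots each master owns for a given cluster size. The sketch below only does arithmetic on the figures above; the function name `slots_per_master` and the example cluster sizes are illustrative assumptions, and the table values are themselves approximate.

```python
# Back-of-envelope arithmetic using only figures from the Scale Estimates table.
DATASET_GB = 10
RDB_SNAPSHOT_SECONDS = 10   # "~10 seconds" in the table, so treat results as rough
AOF_REWRITE_SECONDS = 30    # "~30 seconds" in the table
HASH_SLOTS = 16_384

# Implied sustained write rate of the background child process.
rdb_gb_per_s = DATASET_GB / RDB_SNAPSHOT_SECONDS
aof_gb_per_s = DATASET_GB / AOF_REWRITE_SECONDS


def slots_per_master(masters: int) -> tuple[int, int]:
    """Return (slots each master gets, leftover slots) for an even split.

    Hypothetical helper: real clusters may assign slots unevenly.
    """
    return divmod(HASH_SLOTS, masters)


if __name__ == "__main__":
    print(f"RDB snapshot: ~{rdb_gb_per_s:.2f} GB/s")
    print(f"AOF rewrite:  ~{aof_gb_per_s:.2f} GB/s")
    print("3 masters:   ", slots_per_master(3))
    print("1000 masters:", slots_per_master(1000))
```

With three masters each shard owns about 5,461 slots; at the 1,000-node ceiling, if every node were a master, each would own only about 16.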

Non-Functional Requirements

Latency: Sub-millisecond average; p99 < 1ms; achieved via in-memory storage and a single-threaded event loop (no lock contention or thread context switches on the command path)
Persistence trade-offs: RDB → periodic snapshots (compact, fast restart, data loss between snapshots); AOF → append every write (minimal data loss, larger files, slower restart); hybrid recommended
Replication: Asynchronous master→replica; read scaling; Sentinel for automatic failover (quorum-based, < 30s detection + promotion)
Clustering: Data sharded across masters via 16,384 hash slots; CRC16 hashing; live resharding; automatic failover per shard; gossip protocol for cluster state
Memory efficiency: Compact encodings for small collections (ziplist, replaced by listpack in Redis 7.0; intset for small all-integer sets); approximate LRU/LFU eviction; jemalloc allocator; configurable maxmemory with an eviction policy
Simplicity: Single-threaded command execution makes atomicity easy to reason about: each command runs to completion without interleaving; MULTI/EXEC queues commands and runs them as one uninterrupted batch; Lua scripts cover more complex atomic operations
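
The clustering requirement above (CRC16 hashing over 16,384 slots) can be made concrete with a small sketch. Redis Cluster uses the CRC16 XMODEM variant and takes the result modulo 16,384; the hash-tag rule (hash only the text between the first `{` and the next `}` when it is non-empty) is not mentioned above and is included here from the Redis Cluster specification. The function names are illustrative, not a real client API.

```python
HASH_SLOTS = 16_384


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16, XMODEM variant: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 hash slots.

    If the key contains a non-empty {hash tag}, only the tag is hashed,
    so related keys can be forced into the same slot.
    """
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end != -1 and end > start + 1:  # ignore empty "{}" tags
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % HASH_SLOTS


if __name__ == "__main__":
    print(key_slot("foo"))                    # 12182
    print(key_slot("{user1000}.following"))   # same slot as the next key
    print(key_slot("{user1000}.followers"))
```

A client or proxy would look the slot up in its slot-to-node table to pick the master to contact. Keys sharing a hash tag land in the same slot, which is what makes multi-key commands possible inside a cluster.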


Approach Guide

Non-Functional Requirements (~3 min)
Core Entities (~2 min)
API Design (~3 min)
High-Level Design (~5 min)

Follow-up Deep Dives (questions an interviewer might ask)

1. How does Redis achieve sub-millisecond latency? Why is a single-threaded architecture effective?
2. How do RDB snapshots and AOF persistence work? What are the trade-offs?
3. How does Redis replication work? How does it handle failover?
4. How does Redis Cluster distribute data across multiple nodes?
5. How do Redis data structures work internally? What encoding optimisations exist?
6. How do key expiry and eviction work?
7. How would you use Redis as a distributed cache in a larger system architecture?
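
For the eviction question, the approximate LRU policy named under Memory efficiency can be illustrated with a toy model: rather than maintaining an exact recency ordering, sample a few keys and evict the least recently used among them. This is a simplified sketch, not Redis's implementation (Redis also keeps a pool of eviction candidates and uses a compact per-object clock); the class name, the logical tick counter, and limiting by key count instead of bytes are all assumptions made for brevity. The default sample size of 5 mirrors the default of Redis's `maxmemory-samples` setting.

```python
import random


class ApproxLRUCache:
    """Toy cache that evicts by sampling, in the spirit of approximate LRU."""

    def __init__(self, max_keys: int, sample_size: int = 5, seed=None):
        self.max_keys = max_keys
        self.sample_size = sample_size
        self._rng = random.Random(seed)
        self._data = {}   # key -> (value, last_access_tick)
        self._clock = 0   # logical clock; stands in for a real timestamp

    def _tick(self) -> int:
        self._clock += 1
        return self._clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        self._data[key] = (entry[0], self._tick())  # refresh recency on read
        return entry[0]

    def set(self, key, value) -> None:
        if key not in self._data and len(self._data) >= self.max_keys:
            self._evict_one()
        self._data[key] = (value, self._tick())

    def _evict_one(self) -> None:
        # Sample a handful of keys and drop the one idle the longest.
        k = min(self.sample_size, len(self._data))
        candidates = self._rng.sample(list(self._data), k)
        victim = min(candidates, key=lambda c: self._data[c][1])
        del self._data[victim]

    def __len__(self) -> int:
        return len(self._data)
```

A larger sample size moves the behaviour closer to exact LRU at the cost of more work per eviction; when the sample covers every key, the victim is always the true least recently used entry.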
