Loading system design...
Design an in-memory data structure store like Redis that serves as a database, cache, message broker, and streaming engine. The system stores all data in RAM for sub-millisecond latency, supports rich data structures (strings, lists, sets, sorted sets, hashes, streams), provides persistence via RDB snapshots and AOF logs, master-replica replication with automatic failover, and horizontal scaling via hash-slot-based clustering.
| Metric | Value |
|---|---|
| Operations per second (single instance) | 100K–1M |
| Data stored per instance | Up to 256 GB (RAM-constrained) |
| Client connections per instance | 10,000+ (epoll multiplexed) |
| Average operation latency | < 1ms (sub-millisecond) |
| Cluster size (Redis Cluster) | Up to 1,000 nodes |
| Hash slots | 16,384 |
| Replication lag | < 1ms (async, same DC) |
| RDB snapshot time (10 GB dataset) | ~10 seconds (background fork) |
| AOF rewrite time (10 GB) | ~30 seconds (background fork) |
| Key count per instance | Hundreds of millions |
Key-value operations: support GET, SET, DEL with O(1) average time complexity; keys are strings; values can be strings, integers, or binary-safe byte arrays; support TTL (expiry) per key
Rich data structures: support Strings, Lists (linked list — LPUSH/RPUSH/LPOP/RPOP), Sets (SADD/SMEMBERS/SINTER), Sorted Sets (ZADD/ZRANGE/ZRANGEBYSCORE), Hashes (HSET/HGET/HGETALL), and Streams
In-memory storage: all data stored in RAM for sub-millisecond latency; provide mechanisms to persist data to disk for durability (RDB snapshots and AOF append-only file)
Persistence: RDB (point-in-time snapshots at configurable intervals) and AOF (append every write command to a log file); hybrid mode — RDB for fast restart + AOF for minimal data loss
Replication: master-replica architecture; replicas maintain an exact copy of master's data; asynchronous replication; support read scaling (read from replicas) and failover
Pub/Sub: publishers send messages to channels; subscribers receive messages from channels they've subscribed to; fire-and-forget (no persistence of pub/sub messages)
Transactions: MULTI/EXEC — batch multiple commands and execute them atomically (all or nothing); WATCH for optimistic locking (CAS — check-and-set); no rollback (commands execute sequentially within transaction)
Lua scripting: execute Lua scripts atomically on the server side; scripts can read/write multiple keys; provides conditional logic that single commands can't express; scripts are atomic (no other command runs concurrently)
Clustering: Redis Cluster — distribute data across multiple master nodes using hash slots (16,384 slots); automatic slot rebalancing; client-side routing with MOVED/ASK redirects
Eviction policies: when memory limit reached, evict keys according to configurable policy — LRU, LFU, TTL-based, random, or noeviction (return errors); approximate LRU for memory efficiency
Non-functional requirements define the system qualities critical to your users. Frame them as 'The system should be able to...' statements. These will guide your deep dives later.
Think about CAP theorem trade-offs, scalability limits, latency targets, durability guarantees, security requirements, fault tolerance, and compliance needs.
Frame NFRs for this specific system. 'Low latency search under 100ms' is far more valuable than just 'low latency'.
Add concrete numbers: 'P99 response time < 500ms', '99.9% availability', '10M DAU'. This drives architectural decisions.
Choose the 3-5 most critical NFRs. Every system should be 'scalable', but what makes THIS system's scaling uniquely challenging?