Every architectural choice involves trade-offs. Key-value stores achieve their remarkable simplicity and performance by deliberately omitting features that traditional databases provide. Understanding these limitations is essential for making sound decisions about when to use—and when to avoid—key-value stores.
The limitations aren't bugs or missing features; they're fundamental consequences of the key-value data model. Attempting to work around them often results in complex, fragile systems that would be better served by a different database type.
By the end of this page, you will understand the fundamental limitations of key-value stores, recognize warning signs that a key-value store is the wrong choice, and know how to work within these constraints or choose alternatives.
The most fundamental limitation is that key-value stores provide no way to query data by anything other than the exact key. There's no SQL, no query optimizer, no way to say "find all users where status = 'active'".
What this means in practice:
```python
import json

# What you CAN do:
user = kv_store.get("user:123")  # Exact key lookup

# What you CANNOT do:
# SQL equivalent: SELECT * FROM users WHERE email = 'alice@example.com'
# Key-value: No built-in way to do this!

# Workaround: Build and maintain secondary indexes manually
class UserStore:
    def __init__(self, kv):
        self.kv = kv

    def create_user(self, user: dict):
        user_id = user["id"]
        # Primary storage
        self.kv.set(f"user:{user_id}", json.dumps(user))
        # Manual secondary index
        self.kv.set(f"user:email:{user['email']}", user_id)

    def get_by_email(self, email: str) -> dict:
        # Two lookups required
        user_id = self.kv.get(f"user:email:{email}")
        if not user_id:
            return None
        return json.loads(self.kv.get(f"user:{user_id}"))

    def update_email(self, user_id: str, new_email: str):
        user = json.loads(self.kv.get(f"user:{user_id}"))
        old_email = user["email"]
        # Must update both the primary data AND the index
        user["email"] = new_email
        self.kv.set(f"user:{user_id}", json.dumps(user))
        self.kv.delete(f"user:email:{old_email}")
        self.kv.set(f"user:email:{new_email}", user_id)
        # If any step fails, data is inconsistent!
```

Every secondary index you build is a maintenance burden. You must update indexes on every write, handle partial failures, and ensure consistency. The more indexes you need, the more attractive a document database or relational database becomes.
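The partial-failure danger is easy to demonstrate. The sketch below simulates a crash between the primary write and the index write, using a dict-backed stand-in for the store (the `DictKV` class and the sample keys are assumptions for illustration, not a real client):

```python
import json

class DictKV:
    """Dict-backed stand-in for a key-value client (illustrative only)."""
    def __init__(self):
        self._data = {}
    def set(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)
    def delete(self, key):
        self._data.pop(key, None)

kv = DictKV()
kv.set("user:123", json.dumps({"id": "123", "email": "a@old.com"}))
kv.set("user:email:a@old.com", "123")

# Simulate a partial failure: primary record updated, index writes never run
user = json.loads(kv.get("user:123"))
user["email"] = "a@new.com"
kv.set("user:123", json.dumps(user))   # step 1 succeeds
# -- process crashes here: delete old index / set new index never happen --

# The index now disagrees with the primary record:
print(kv.get("user:email:a@new.com"))  # None — lookup by new email fails
print(kv.get("user:email:a@old.com"))  # 123  — stale index entry remains
```

Without multi-key transactions, closing this window requires either a store-side atomic construct (such as a Lua script in Redis) or a background repair job that scans for dangling index entries.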
No support for relational operations. Key-value stores have no concept of relationships between entities. There's no foreign key, no JOIN operation, no referential integrity.
The consequences:
```python
import json

# SQL: One query with JOIN
# SELECT u.name, o.id, o.total
# FROM users u JOIN orders o ON u.id = o.user_id
# WHERE u.id = 123

# Key-value: Multiple queries required
def get_user_with_orders(user_id: str):
    # Query 1: Get the user
    user = json.loads(kv.get(f"user:{user_id}"))
    # Query 2: Get the list of order IDs
    order_ids = json.loads(kv.get(f"user:{user_id}:orders") or "[]")
    # Query 3+: Get each order (or batch them with MGET)
    orders = []
    for order_id in order_ids:
        orders.append(json.loads(kv.get(f"order:{order_id}")))
    return {"user": user, "orders": orders}
    # Minimum 3 round-trips vs 1 SQL query!

# Alternative: Denormalize (duplicate user info inside each order)
# Trades storage and consistency for fewer queries
```

Denormalization is acceptable when: (1) the duplicated data rarely changes, (2) eventual consistency is acceptable, and (3) read performance matters more than storage efficiency. For frequently changing relational data, use a relational database.
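The per-order loop is where latency piles up; most key-value clients offer a multi-get that fetches a batch of keys in one round-trip, like Redis MGET. A runnable sketch against a dict-backed stand-in (the `DictKV` class and the sample data are assumptions for illustration):

```python
import json

class DictKV:
    """Dict-backed stand-in for a key-value client (illustrative only)."""
    def __init__(self):
        self._data = {}
    def set(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)
    def mget(self, keys):
        # One round-trip for many keys, mirroring Redis MGET semantics
        return [self._data.get(k) for k in keys]

kv = DictKV()
kv.set("user:123", json.dumps({"id": "123", "name": "Alice"}))
kv.set("user:123:orders", json.dumps(["o1", "o2"]))
kv.set("order:o1", json.dumps({"id": "o1", "total": 10}))
kv.set("order:o2", json.dumps({"id": "o2", "total": 25}))

def get_user_with_orders(user_id):
    user = json.loads(kv.get(f"user:{user_id}"))                      # round-trip 1
    order_ids = json.loads(kv.get(f"user:{user_id}:orders") or "[]")  # round-trip 2
    raw = kv.mget([f"order:{oid}" for oid in order_ids])              # round-trip 3
    return {"user": user, "orders": [json.loads(r) for r in raw]}

result = get_user_with_orders("123")
print(len(result["orders"]))  # 2 — all orders fetched in 3 round-trips total
```

Batching caps the round-trips at three regardless of order count, but it is still three dependent round-trips where SQL needs one.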
No COUNT, SUM, AVG, GROUP BY. Key-value stores cannot compute aggregations across data. Every aggregate value must be pre-computed and maintained manually.
| Operation | SQL | Key-Value Approach |
|---|---|---|
| Count users | SELECT COUNT(*) FROM users | Maintain stats:users:count, increment on create, decrement on delete |
| Sum revenue | SELECT SUM(amount) FROM orders | Maintain stats:revenue:total, increment on each order |
| Group by status | SELECT status, COUNT(*) GROUP BY status | Maintain separate counters: stats:users:status:active, stats:users:status:inactive |
| Average order | SELECT AVG(amount) FROM orders | Maintain sum AND count, compute ratio on read |
Problems with pre-computed aggregates: counters drift whenever a write path fails partway or skips an update; concurrent increments race unless the store provides an atomic increment; and a question you didn't anticipate (say, revenue by region) cannot be answered retroactively without scanning every key.
If you need ad-hoc analytics, reporting, or business intelligence, key-value stores are fundamentally wrong. Use a relational database, data warehouse, or specialized analytics database.
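The counter approach from the table above can be made concrete with a dict-backed stand-in. Key names like `stats:orders:count` follow the table; the `DictKV` class and its `incrby` method (modeled on Redis INCRBY, which is atomic server-side) are assumptions for illustration:

```python
class DictKV:
    """Dict-backed stand-in for a key-value client (illustrative only)."""
    def __init__(self):
        self._data = {}
    def incrby(self, key, amount=1):
        # Single-key increment; the real Redis INCRBY is atomic
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]
    def get(self, key):
        return self._data.get(key, 0)

kv = DictKV()

def record_order(amount):
    # Every aggregate must be maintained on the write path, forever
    kv.incrby("stats:orders:count", 1)
    kv.incrby("stats:revenue:total", amount)

for amount in (10, 20, 30):
    record_order(amount)

# Average = maintained sum / maintained count, computed at read time
count = kv.get("stats:orders:count")
total = kv.get("stats:revenue:total")
print(total / count)  # 20.0
```

Note what's missing: if `record_order` ever crashes between the two increments, count and total silently diverge, and there is no query that can recompute either one.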
In-memory stores are bounded by RAM. Redis and Memcached keep all data in memory. Your dataset size is limited by available RAM, which is orders of magnitude more expensive than disk storage.
| Storage Type | Cost per GB/month | Latency | Relative Cost |
|---|---|---|---|
| Redis Cloud | $30-100 | ~100μs | 100x |
| SSD Cloud Storage | $0.10-0.30 | ~1ms | 1x |
| HDD Cloud Storage | $0.02-0.05 | ~10ms | 0.1x |
| Object Storage (S3) | $0.02 | ~50ms | 0.1x |
The practical implication is that every byte counts, so memory-conscious habits are mandatory:
```python
# Memory-conscious practices

# 1. Always set a TTL on ephemeral data
redis.setex("cache:query:abc", 3600, result)  # Expires in 1 hour

# 2. Use appropriate data structures
# A hash with small values uses less memory than individual keys
redis.hset("user:123", mapping={"name": "Alice", "email": "a@b.com"})
# vs
redis.set("user:123:name", "Alice")
redis.set("user:123:email", "a@b.com")

# 3. Compress large values before storing
import zlib
compressed = zlib.compress(large_json.encode())
redis.set("large:data", compressed)

# 4. Use memory-efficient data types
# - Small hashes/sets/lists use ziplist encoding (dense)
# - Configure hash-max-ziplist-entries and similar settings

# 5. Monitor memory usage
info = redis.info("memory")
used_memory = info["used_memory"]
maxmemory = info["maxmemory"]
usage_percent = (used_memory / maxmemory) * 100
```

When Redis reaches maxmemory with an eviction policy like 'allkeys-lru', it silently deletes keys. This is fine for cache workloads but disastrous if you're using Redis as a primary database. Always monitor memory and plan for growth.
No ACID transactions across keys. While single-key operations are atomic, multi-key operations lack true transactional guarantees. Redis MULTI/EXEC provides atomicity but not isolation or rollback.
```python
# Redis transaction limitation: No rollback on command errors

pipe = redis.pipeline()
pipe.multi()
pipe.set("key1", "value1")   # Will succeed
pipe.incr("key2")            # Will FAIL if key2 is not a number
pipe.set("key3", "value3")   # Still executes!
results = pipe.execute(raise_on_error=False)
# Results: [True, ResponseError, True]
# key1 and key3 are set, key2 failed
# No automatic rollback!

# Workaround: Use Lua for true atomic operations
lua_script = """
local current = redis.call('GET', KEYS[1])
if current then
    local new_value = tonumber(current) + 1
    redis.call('SET', KEYS[1], new_value)
    redis.call('SET', KEYS[2], new_value)
    return new_value
else
    return nil
end
"""
# The entire script executes atomically, with the ability to abort
```

Replication is asynchronous by default. In distributed key-value stores, writes may be acknowledged before replicating to all nodes. This means replicas can serve stale reads, and acknowledged writes can be lost if the primary fails before replicating them:
| System | Default Consistency | Strong Consistency Option |
|---|---|---|
| Redis Replication | Async (eventual) | WAIT command (blocks until replicated) |
| Redis Cluster | Async (eventual) | None built-in |
| DynamoDB | Eventually consistent | Strongly consistent reads (2x cost) |
| Cassandra | Tunable | ALL/QUORUM write + read |
If a Redis primary fails before replicating recent writes to replicas, those writes are lost permanently. For truly critical data, either use synchronous replication (WAIT) at the cost of latency, or use a different database with stronger durability guarantees.
Simple interface, complex operations. While the API is simple, running key-value stores in production introduces operational challenges.
Consider managed services (AWS ElastiCache, Redis Cloud, Azure Cache) to offload operational complexity. They handle replication, failover, patching, and monitoring, letting you focus on application logic.
The bottom line:
Key-value stores are specialized tools, not universal solutions. They excel at lookup-by-key patterns with simple data models. When you need complex queries, relationships, aggregations, or strong consistency, traditional relational databases remain the better choice.
The best architectures often use both: a relational database as the source of truth, with key-value stores for caching, sessions, and real-time features. This polyglot persistence approach leverages each tool's strengths.
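The usual way to combine the two is the cache-aside pattern: read from the key-value store first, fall back to the relational database on a miss, and populate the cache on the way back. A minimal runnable sketch in which `cache`, `db`, and `db_reads` are dict-backed stand-ins (assumptions for illustration, not real clients):

```python
# Stand-ins for a Redis client and a relational database (illustrative only)
cache = {}
db = {"user:123": {"id": "123", "name": "Alice"}}
db_reads = 0

def get_user(user_id):
    global db_reads
    key = f"user:{user_id}"
    if key in cache:            # 1. Try the key-value store first
        return cache[key]
    db_reads += 1               # 2. Miss: read the source of truth
    user = db.get(key)
    if user is not None:
        cache[key] = user       # 3. Populate the cache (with a TTL in real Redis)
    return user

get_user("123")   # miss: hits the database
get_user("123")   # hit: served from the cache
print(db_reads)   # 1
```

The relational database stays authoritative, so cache loss is a performance event rather than a data-loss event; in production you would also set a TTL and invalidate or update the cached entry on writes.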
Congratulations! You've completed the Key-Value Stores module. You now understand the fundamental concepts, data modeling, Redis as a canonical example, ideal use cases, and honest limitations. You're equipped to make informed decisions about when key-value stores are the right tool for your architecture.