Before Redis became ubiquitous, before key-value stores were a recognized database category, there was Memcached. Created by Brad Fitzpatrick in 2003 for LiveJournal, Memcached was the first widely-adopted distributed memory caching system, and it fundamentally changed how we build scalable web applications.
Memcached's design philosophy is radical simplicity: store key-value pairs in memory, expire them after a timeout, and do nothing else. No persistence, no data structures, no clustering—just blazingly fast, distributed caching. This constraint-driven design made Memcached extraordinarily reliable and performant at its core use case.
While Redis has overtaken Memcached in popularity due to its richer features, Memcached remains a compelling choice for pure caching workloads. Companies like Facebook, Twitter, YouTube, and Wikipedia still run massive Memcached deployments handling millions of requests per second. Understanding Memcached helps you appreciate when simplicity wins over capability.
By the end of this page, you will understand Memcached's architecture, its deliberate limitations, how it handles distribution, and when it's the right choice over Redis. You'll grasp the principle that drove its design: do one thing exceptionally well.
Memcached's architecture is designed for one purpose: serving cached data as fast as possible. Every design decision optimizes for this goal.
Core characteristics: entirely in-memory with no persistence, opaque string values only (up to 1MB per item), TTL-based expiration, LRU eviction per slab class, and multi-threaded request handling.
Threading model:
Memcached uses a multi-threaded, event-driven architecture:
┌─────────────────────────────────────────────────────────────────┐
│ Memcached Server │
├─────────────────────────────────────────────────────────────────┤
│ Main Thread (listener) │
│ ├── Accepts new connections │
│ └── Distributes to worker threads │
├─────────────────────────────────────────────────────────────────┤
│ Worker Threads (configurable, default 4) │
│ ├── Each thread handles subset of connections │
│ ├── Event-driven (libevent): handles many connections/thread │
│ └── Lock-free where possible, fine-grained locks otherwise │
├─────────────────────────────────────────────────────────────────┤
│ Memory: Slab Allocator │
│ ├── Fixed-size chunks avoid fragmentation │
│ └── LRU per slab class │
└─────────────────────────────────────────────────────────────────┘
Why multi-threaded matters:
Memcached can utilize multiple CPU cores, which Redis (single-threaded command processing) cannot. For extremely high-throughput workloads on multi-core machines, Memcached can achieve higher total throughput per instance.
However, this comes with trade-offs:
| Aspect | Memcached | Redis |
|---|---|---|
| Threading | Multi-threaded | Single-threaded (commands) |
| CPU Utilization | Uses all cores | Uses one core per instance |
| Latency Consistency | Variable (lock contention) | Very consistent |
| Data Types | Strings only | Rich data structures |
| Persistence | None | RDB, AOF, Hybrid |
| Clustering | Client-side only | Server-side cluster |
| Memory Overhead | Lower (~50 bytes/key) | Higher (~70+ bytes/key) |
Memcached's multi-threading shines when you have large multi-core servers and want to minimize instance count. A single 32-core Memcached can outperform multiple Redis instances for simple get/set workloads. But if you need data structures, persistence, or atomic operations beyond simple CAS, Redis is necessary.
One of Memcached's most important internal mechanisms is its slab allocator—a memory management system designed to eliminate fragmentation that plagued earlier caching systems.
The fragmentation problem:
In a naïve cache, you allocate memory for each item when stored and free it when evicted. With items of varying sizes constantly being added and removed, memory becomes fragmented—free space exists but in pieces too small to use. Eventually, the cache can't store new items even with plenty of 'free' memory.
Slab allocation solution:
Memcached pre-allocates memory into fixed-size chunks organized into 'slab classes':
Slab Class 1: 96-byte chunks (for items up to 96 bytes)
Slab Class 2: 120-byte chunks (for items up to 120 bytes)
Slab Class 3: 152-byte chunks (for items up to 152 bytes)
...
Slab Class N: 1MB chunks (for items up to 1MB)
Each slab class handles items of similar size:
| Class | Chunk Size | Items per Page | Waste (worst case) |
|---|---|---|---|
| 1 | 96 bytes | 10,922 | Up to 19 bytes/item |
| 2 | 120 bytes | 8,738 | Up to 24 bytes/item |
| 3 | 152 bytes | 6,898 | Up to 32 bytes/item |
| 4 | 192 bytes | 5,461 | Up to 40 bytes/item |
| ... | ... | ... | ... |
| 42 | 1,048,576 bytes | 1 | Up to 210KB/item |
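The class sizes above follow a simple geometric rule. Here is a rough sketch of that sizing logic in Python, assuming a 96-byte minimum chunk, a 1.25 growth factor, and 8-byte alignment (values chosen to match the table); this is an illustration, not Memcached's actual allocator code:

```python
def slab_chunk_sizes(min_chunk=96, growth=1.25, page=1024 * 1024, align=8):
    """Each slab class is the previous one times the growth factor,
    rounded up to the alignment, until one chunk fills a whole 1MB page."""
    sizes = []
    size = min_chunk
    while size < page:
        sizes.append(size)
        size = int(size * growth)
        if size % align:
            size += align - size % align  # round up to alignment
    sizes.append(page)  # final class: one 1MB chunk per page
    return sizes

sizes = slab_chunk_sizes()
print(sizes[:4])  # → [96, 120, 152, 192]
```

Lowering the growth factor (memcached's `-f` flag) produces more classes with less per-item waste, at the cost of more memory fragmented across classes.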
Internal fragmentation vs external fragmentation: slab allocation trades external fragmentation (unusable gaps scattered between freed items) for internal fragmentation (bounded wasted space inside each chunk). This trade-off is deliberate: predictable, bounded waste is better than unpredictable fragmentation that grows over time.
Tuning slab allocation:
# Adjust growth factor (smaller = more classes, less waste, more memory overhead)
memcached -f 1.05 # 5% growth between classes
# Set minimum chunk size
memcached -n 48 # Minimum 48 bytes for key+value+flags
# Enable slab reassignment (move pages between classes)
memcached -o slab_reassign
# View slab statistics (protocol commands, e.g. over telnet)
stats slabs
stats items
The slab calcification problem:
With fixed slab class assignments, a class that was initially popular may become underutilized while others need more space. Memcached 1.4.11+ introduced slab rebalancing:
# Automatic reassignment
memcached -o slab_reassign,slab_automove=1
This moves pages from underutilized slab classes to those under pressure.
Run 'stats slabs' regularly to identify inefficient slab usage. If one class has many free chunks while another is evicting, enable slab automove. If most items waste significant space, consider adjusting the growth factor for your workload's item size distribution.
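The `stats slabs` output is line-oriented text (lines like `STAT 1:chunk_size 96`), so it is easy to post-process. A minimal sketch of a parser that groups the counters by slab class, useful for spotting the imbalance described above (the sample output is illustrative, not from a real server):

```python
def parse_stats_slabs(text):
    """Turn 'STAT <class>:<stat> <value>' lines into {class: {stat: value}}.
    Lines without a class prefix (e.g. 'STAT active_slabs 2') are skipped."""
    classes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[0] != "STAT" or ":" not in parts[1]:
            continue
        cls, stat = parts[1].split(":", 1)
        classes.setdefault(int(cls), {})[stat] = int(parts[2])
    return classes

sample = """STAT 1:chunk_size 96
STAT 1:free_chunks 10000
STAT 5:chunk_size 240
STAT 5:free_chunks 3
STAT active_slabs 2
"""
slabs = parse_stats_slabs(sample)
print(slabs[1]["free_chunks"])  # → 10000
```

In this sample, class 1 has many free chunks while class 5 has almost none — exactly the pattern where slab automove helps.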
Unlike Redis Cluster (where servers coordinate), Memcached servers are completely independent. Distribution is entirely the client's responsibility—the client decides which server holds each key.
How it works:
Client wants to store key "user:123:profile"
↓
Client hashes key: hash("user:123:profile") = 2847593
↓
Client maps hash to server: 2847593 % 3 = 1 → server 1
↓
Client sends SET directly to server 1
Simple modulo hashing:
servers = ["cache1:11211", "cache2:11211", "cache3:11211"]
server_index = hash(key) % len(servers)
target_server = servers[server_index]
The problem with modulo:
When adding or removing servers, nearly all keys remap to different servers:
# 3 servers: hash(key) % 3 = 1 → server 1
# Add server: hash(key) % 4 = 2 → server 2 (different!)
After adding one server, ~75% of keys now hash to different servers. This causes a cache stampede—massive cache miss rate triggering overwhelming database load.
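The ~75% figure is easy to verify directly: over any run of hash values, a key keeps its server only when `h % 3 == h % 4`, which happens for just 3 of every 12 values. A two-line check:

```python
# Count hash values whose server index changes when growing from
# 3 to 4 servers under modulo placement (120000 is a multiple of 12,
# so the fraction comes out exact).
moved = sum(1 for h in range(120000) if h % 3 != h % 4)
print(f"{moved / 1200:.1f}% of keys remap")  # → 75.0% of keys remap
```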
Consistent hashing:
Consistent hashing solves the rehashing problem. Servers and keys are both mapped onto a circular hash ring:
                 0°
            ┌────┴────┐
         S3 │         │ S1
            │         │
    270° ───┤         ├─── 90°
            │         │
            └────┬────┘
                S2
               180°
Key lookup: hash(key) → position on ring → walk clockwise to first server
Why consistent hashing helps:
When a server is added or removed, only keys that were (or would be) on that server are affected—roughly 1/N of keys, not (N-1)/N:
# Adding server S4 between S1 and S2:
# Only keys between S4 and S1 remap (from S1 to S4)
# All other keys unchanged
| Scenario | Modulo Hashing | Consistent Hashing |
|---|---|---|
| Add 1 server to 3 | ~75% keys remap | ~25% keys remap |
| Add 1 server to 10 | ~90% keys remap | ~9% keys remap |
| Remove 1 server from 10 | ~90% keys remap | ~10% keys remap |
| Replace 1 server in 10 (same slot) | ~10% keys remap (only the replaced server's) | ~10% keys remap |
Virtual nodes (vnodes):
Consistent hashing can create imbalanced load if servers have uneven positions on the ring. Virtual nodes solve this by placing each physical server at multiple positions:
Server 1 → vnode1-1, vnode1-2, vnode1-3, ... vnode1-150
Server 2 → vnode2-1, vnode2-2, vnode2-3, ... vnode2-150
Server 3 → vnode3-1, vnode3-2, vnode3-3, ... vnode3-150
With many vnodes per server, the statistical distribution becomes even, and load balances naturally.
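The ring-with-vnodes idea fits in a short sketch. This is a toy illustration of the concept — real clients use the ketama algorithm, which differs in hash choice and vnode placement — but it demonstrates the key property: adding a fourth server remaps only about a quarter of keys:

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, servers, vnodes=150):
        # Place each server at `vnodes` positions on the ring.
        self.ring = sorted(
            (self._hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes)
        )
        self._positions = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def get_server(self, key):
        # Walk clockwise: first vnode at or past the key's position,
        # wrapping around to the start of the ring.
        i = bisect.bisect(self._positions, self._hash(key)) % len(self.ring)
        return self.ring[i][1]

three = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
four = HashRing(["cache1:11211", "cache2:11211",
                 "cache3:11211", "cache4:11211"])
keys = [f"user:{i}:profile" for i in range(1000)]
moved = sum(1 for k in keys if three.get_server(k) != four.get_server(k))
print(f"{moved / 10:.1f}% of keys remapped")  # roughly 25%, vs ~75% with modulo
```

Because adding a server only steals arcs of the ring, keys either stay put or move to the new server — never between existing servers.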
Client library support:
Most Memcached clients support consistent hashing:
# Python pylibmc
import pylibmc
mc = pylibmc.Client(
["cache1:11211", "cache2:11211", "cache3:11211"],
behaviors={"ketama": True} # Consistent hashing (ketama algorithm)
)
All clients must use the same server list and same hashing algorithm. If one client uses modulo and another uses consistent hashing, or if server lists differ, clients will read/write to wrong servers, causing cache misses and stale data. Ensure consistent client configuration.
Memcached's protocol is intentionally simple—a text-based format that's human-readable and easy to debug. While a binary protocol exists for efficiency, the text protocol remains widely used for its simplicity.
Core commands:
| Command | Syntax | Description |
|---|---|---|
| set | set <key> <flags> <exptime> <bytes>\r\n<data>\r\n | Store unconditionally |
| add | add <key> <flags> <exptime> <bytes>\r\n<data>\r\n | Store only if key doesn't exist |
| replace | replace <key> <flags> <exptime> <bytes>\r\n<data>\r\n | Store only if key exists |
| append | append <key> <flags> <exptime> <bytes>\r\n<data>\r\n | Append to existing value |
| prepend | prepend <key> <flags> <exptime> <bytes>\r\n<data>\r\n | Prepend to existing value |
| get | get <key> [<key> ...]\r\n | Retrieve one or more keys |
| gets | gets <key> [<key> ...]\r\n | Get with CAS token |
| delete | delete <key>\r\n | Remove key |
| incr/decr | incr <key> <value>\r\n | Atomic increment/decrement |
| cas | cas <key> <flags> <exptime> <bytes> <cas>\r\n<data>\r\n | Compare-and-swap |
Example session:
$ telnet localhost 11211
Connected to localhost.
# Store a value
set user:123:name 0 300 5
Alice
STORED
# Retrieve the value
get user:123:name
VALUE user:123:name 0 5
Alice
END
# Increment a counter
set pageviews:home 0 0 1
0
STORED
incr pageviews:home 1
1
incr pageviews:home 1
2
# Compare-and-swap for safe updates
gets user:123:balance
VALUE user:123:balance 0 3 12345
100
END
cas user:123:balance 0 300 3 12345
200
STORED
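The framing rules from the session above are easy to reproduce in code. A small sketch that builds the wire bytes for a `set` (protocol framing only, no network) and shows why `<bytes>` must exactly match the payload length:

```python
def frame_set(key, value, flags=0, exptime=0):
    """Build the raw bytes a client sends for a text-protocol 'set'."""
    data = value.encode()
    # The declared byte count tells the server exactly how much data
    # follows; a mismatch produces a CLIENT_ERROR.
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode()
    return header + data + b"\r\n"

print(frame_set("user:123:name", "Alice", exptime=300))
# → b'set user:123:name 0 300 5\r\nAlice\r\n'
```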
Command parameters: <flags> is an opaque value stored alongside the item (clients typically use it to record the serialization format); <exptime> is the TTL in seconds (0 means never expire; values above 30 days are interpreted as absolute Unix timestamps); <bytes> is the length of the data block that follows; <cas> is the token returned by gets.
Multi-get efficiency:
Memcached's multi-get is crucial for performance. Instead of:
get user:1:profile
get user:2:profile
get user:3:profile
Use:
get user:1:profile user:2:profile user:3:profile
This reduces round trips from 3 to 1—a massive latency improvement.
The binary protocol avoids text parsing and supports quiet operations (no response on success), which can improve throughput. Note, however, that recent Memcached releases deprecate the binary protocol in favor of the newer "meta" text commands, so check which protocol your client library speaks. The plain text protocol remains great for debugging and learning.
Memcached's CAS (Compare-And-Swap) mechanism enables safe concurrent updates without distributed locks. This is essential for use cases like counters, balances, and any read-modify-write pattern.
The lost update problem:
Without CAS, concurrent updates can overwrite each other:
Time 0: balance = 100
Time 1: Client A: GET balance → 100
Time 2: Client B: GET balance → 100
Time 3: Client A: SET balance = 100 + 50 = 150 ✓
Time 4: Client B: SET balance = 100 - 30 = 70 ✗ (overwrites A's update!)
Time 5: balance = 70 (should be 120)
Client B's update was based on stale data and overwrote Client A's deposit.
CAS solution:
Every value has a CAS token (unique identifier that changes on each update):
Time 0: balance = 100, CAS = 5
Time 1: Client A: GETS balance → 100, CAS = 5
Time 2: Client B: GETS balance → 100, CAS = 5
Time 3: Client A: CAS balance = 150 (if CAS = 5) → SUCCESS, CAS = 6
Time 4: Client B: CAS balance = 70 (if CAS = 5) → FAIL (CAS mismatch!)
Time 5: Client B: GETS balance → 150, CAS = 6 (retry with fresh data)
Time 6: Client B: CAS balance = 120 (if CAS = 6) → SUCCESS, CAS = 7
Time 7: balance = 120 ✓ (both updates applied correctly)
Implementation pattern (the gets → modify → cas retry loop):
def update_balance(key, delta):
    MAX_RETRIES = 10
    for attempt in range(MAX_RETRIES):
        # Step 1: Get current value and CAS token
        result = mc.gets(key)
        if result is None:
            # Key doesn't exist, create it
            if mc.add(key, delta):
                return delta
            continue  # Lost a creation race, retry
        value, cas_token = result
        # Step 2: Compute the new value locally
        new_value = value + delta
        # Step 3: Attempt CAS; succeeds only if no one updated in between
        if mc.cas(key, new_value, cas_token):
            return new_value  # Success!
        # CAS failed, another client updated - retry
    raise Exception("CAS failed after max retries")
CAS response codes:
| Response | Meaning | Action |
|---|---|---|
| STORED | CAS succeeded, value updated | Done, proceed normally |
| EXISTS | CAS token mismatch (concurrent update) | Retry from GETS |
| NOT_FOUND | Key doesn't exist | Use ADD instead, or handle missing key |
Under high contention, CAS can retry many times. Set a maximum retry count and handle failure gracefully. If a key has many concurrent updaters, consider sharding the counter (counter:shard:1, counter:shard:2) and aggregating reads.
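The counter-sharding idea mentioned above can be sketched in a few lines. A plain dict stands in for the memcached client here (real code would use atomic incr on each shard key); the point is the key layout, not the storage:

```python
import random

NUM_SHARDS = 8  # more shards = less contention per key, more keys to read

def incr_sharded(cache, name, delta=1):
    # Writers pick a random shard, spreading contention across keys.
    shard = f"{name}:shard:{random.randrange(NUM_SHARDS)}"
    cache[shard] = cache.get(shard, 0) + delta

def read_sharded(cache, name):
    # Readers aggregate all shards (one multi-get in real memcached).
    return sum(cache.get(f"{name}:shard:{i}", 0) for i in range(NUM_SHARDS))

cache = {}
for _ in range(1000):
    incr_sharded(cache, "pageviews")
print(read_sharded(cache, "pageviews"))  # → 1000
```

The trade-off: writes touch one lightly-contended key, while reads must fetch all shards — a good deal when writes vastly outnumber reads.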
Running Memcached at scale requires understanding its operational characteristics and failure modes.
Deployment topologies:
Configuration best practices:
# Sensible starting point: 8GB memory, max 10K connections,
# 4 worker threads, default port, listen on all interfaces,
# slab rebalancing with automatic page moves
memcached \
  -m 8192 \
  -c 10000 \
  -t 4 \
  -p 11211 \
  -l 0.0.0.0 \
  -o slab_reassign,slab_automove=1
Key metrics to monitor:
| Metric | Source | Warning Threshold | Meaning |
|---|---|---|---|
| Hit ratio | get_hits / (get_hits + get_misses) | <90% | Cache effectiveness |
| Evictions | evictions counter | >0 and rising | Memory pressure |
| Current connections | curr_connections | Near -c limit | Connection saturation |
| Bytes read/written | bytes_read/written | Near NIC capacity | Network saturation |
| cmd_get/cmd_set | Per operation counters | Baseline deviation | Traffic patterns |
| cmd_flush | flush_all counter | >0 | Manual cache flush (investigate) |
Failure handling:
Memcached servers will fail. Your application must handle:
- Connection failures and timeouts: treat them as cache misses and fall through to the database, with short timeouts so a dead node doesn't stall requests.
- Node removal: with consistent hashing, roughly 1/N of keys go cold; expect a temporary spike in database load while they repopulate.
- Cold caches after restart: Memcached has no persistence, so a restarted node comes back empty.
Cache stampede protection:
import time

def get_with_stampede_protection(key, ttl, compute_fn):
    value = cache.get(key)
    if value is not None:
        return value
    # Try to acquire a short-lived lock; add is atomic, so only one
    # process wins and recomputes the value
    lock_key = f"lock:{key}"
    if cache.add(lock_key, "1", time=10):  # 10 second lock
        try:
            value = compute_fn()  # Only one process computes
            cache.set(key, value, time=ttl)
            return value
        finally:
            cache.delete(lock_key)
    else:
        # Another process is computing; wait briefly and retry
        time.sleep(0.1)
        return get_with_stampede_protection(key, ttl, compute_fn)
Design your application to work without cache, just slower. If Memcached is down, queries should still succeed via the database. Never make cache a hard dependency. Set aggressive timeouts and circuit breakers to fail fast when cache is unavailable.
The Memcached vs Redis debate is one of the most common in system design. While Redis has broader capabilities, Memcached remains a valid choice for specific use cases.
Choose Memcached when your workload is pure ephemeral caching (simple get/set with TTLs, nothing else), you want to saturate large multi-core machines with a single instance, or you want the lowest per-key memory overhead and the smallest operational surface area. The table below summarizes common use cases:
| Use Case | Recommendation | Rationale |
|---|---|---|
| Database query cache | Either | Both handle simple caching well |
| Session storage | Redis | Persistence prevents session loss on restart |
| Leaderboards | Redis | Sorted sets are purpose-built for this |
| Rate limiting | Redis | Atomic INCR with expiration |
| Distributed locks | Redis (Redisson/Redlock) | Battle-tested distributed lock implementations |
| Message queues | Redis | Lists with BLPOP, or Streams |
| Real-time analytics | Redis | HyperLogLog, bitmaps, streams |
| Purely ephemeral cache | Memcached | Simpler, lower overhead |
| Multi-core single instance | Memcached | Multi-threaded architecture |
The modern reality:
For most new projects, Redis is the default choice. Its richer feature set covers more use cases with one technology, reducing operational complexity. The performance difference for typical workloads is negligible.
However, if you need only simple get/set caching at very high throughput, have large multi-core machines to saturate, and can tolerate losing all cached data on restart, then Memcached remains a solid, battle-tested choice.
It's not either/or. Many organizations run both: Memcached for simple caching layers (database result caching) and Redis for rich features (sessions, leaderboards, messaging). Use the right tool for each job.
We've explored Memcached's architecture, internals, and operational considerations. The key concepts:
- Radical simplicity: in-memory key-value pairs with TTLs, and nothing else.
- Slab allocation trades bounded internal fragmentation for freedom from external fragmentation.
- Distribution is entirely client-side; consistent hashing with virtual nodes limits remapping to roughly 1/N of keys on topology changes.
- CAS enables safe read-modify-write without distributed locks.
- Operationally: monitor hit ratio and evictions, treat node failure as a cache miss, and never make the cache a hard dependency.
What's next:
With Redis and Memcached covered, we'll explore DynamoDB and Riak—two distributed key-value stores designed for different scales and consistency models. DynamoDB brings managed, serverless key-value at AWS scale, while Riak demonstrates Dynamo-style eventual consistency for on-premises deployments.
You now understand Memcached's architecture, slab allocation, distribution model, and operational characteristics. This knowledge enables you to evaluate when Memcached's simplicity is the right choice and how to operate it reliably in production.