While Redis has captured much of the distributed caching mindshare, Memcached remains a foundational technology that continues to power some of the world's highest-traffic systems. Originally developed by Brad Fitzpatrick for LiveJournal in 2003, Memcached pioneered the concept of a distributed in-memory key-value cache that could be deployed across multiple machines to aggregate memory into a massive, unified caching layer.
Memcached's philosophy is radical simplicity. Where Redis offers a rich ecosystem of data structures, persistence options, and advanced features, Memcached focuses on doing one thing exceptionally well: storing and retrieving key-value pairs with minimal latency and maximum throughput. This laser focus has made Memcached the choice for scenarios where raw performance trumps feature richness.
Today, Memcached serves caching layers at Facebook, Wikipedia, YouTube, Twitter, and countless other high-scale systems. Facebook famously operates one of the world's largest Memcached deployments, with thousands of servers handling billions of requests per second. For systems where every microsecond matters and the use case fits Memcached's strengths, it remains an unbeatable option.
By the end of this page, you will understand Memcached's architecture and design philosophy, master its multi-threaded model and memory management via the slab allocator, comprehend distributed caching through consistent hashing, and recognize the scenarios where Memcached outperforms more feature-rich alternatives.
Memcached's architecture reflects its singular purpose: maximize throughput and minimize latency for simple key-value operations. Every design decision prioritizes performance for the common case.
Unlike Redis's single-threaded command processing, Memcached uses a multi-threaded architecture from the ground up. This allows Memcached to utilize all available CPU cores for both network I/O and command processing.
Worker Thread Pool:
Memcached maintains a pool of worker threads, each capable of independently processing client requests. The number of threads is configurable via the -t flag (default: 4). On modern multi-core servers, increasing thread count can dramatically improve throughput:
memcached -t 8 -m 1024 # 8 threads, 1GB memory
Connection Distribution:
New connections are distributed across worker threads round-robin by a dispatcher thread. Each worker then handles all operations for its assigned connections, minimizing lock contention.
| Aspect | Memcached | Redis |
|---|---|---|
| Core model | Multi-threaded from design | Single-threaded command processing |
| CPU utilization | Utilizes all cores naturally | Requires Cluster for multi-core |
| Context switching | Present, but minimized | None in command path |
| Lock contention | Careful design to minimize | No locks needed |
| Single-instance throughput | Higher on multi-core | Limited by single core |
| Operation atomicity | Per-key locking | Natural from single-threading |
Memcached uses libevent for scalable, non-blocking I/O. Each worker thread runs its own event loop, handling thousands of connections efficiently. This epoll/kqueue-based approach means connection count doesn't linearly impact CPU usage.
Connection States:
Each connection cycles through a small set of states (waiting for a command, reading the request, processing it, writing the response) under the control of its thread's event loop.
The event-driven model ensures that waiting for network I/O doesn't block other operations—a crucial property for high-concurrency scenarios.
```bash
#!/bin/bash
# Production Memcached startup configuration
#   -d            Run as daemon
#   -m 4096       4GB memory limit
#   -p 11211      Port
#   -u memcached  Run as the memcached user
#   -l 0.0.0.0    Listen on all interfaces
#   -t 8          8 worker threads
#   -c 10240      Max connections
#   -b 1024       Connection backlog
#   -R 200        Max requests per event
#   -o ...        Enable slab rebalancing
#   -v            Verbose logging

memcached -d -m 4096 -p 11211 -u memcached -l 0.0.0.0 \
  -t 8 -c 10240 -b 1024 -R 200 \
  -o slab_reassign,slab_automove -v

# Key tuning parameters:
# -t: Worker threads (usually cores - 1 to cores * 2)
# -c: Max connections (plan for your connection pool sizes)
# -m: Memory limit (leave headroom for slab overhead)
# -R: Requests per event (higher = better throughput, worse fairness)
```

Start with threads equal to CPU cores. If CPU utilization is low but throughput isn't meeting needs, increase thread count. If you see high context switching in monitoring, reduce threads. Modern best practice: 1.5 to 2x core count, then tune based on metrics.
One of Memcached's most important innovations is its slab allocator—a memory management system designed to eliminate fragmentation and enable O(1) allocation. Understanding the slab allocator is essential for effective Memcached operation.
Naive memory allocators (malloc/free) suffer from fragmentation over time. As items of varying sizes are created and deleted, memory becomes fragmented—plenty of total free space, but no contiguous block large enough for new allocations. This leads to increasing allocation times and wasted memory.
Memcached solves fragmentation by pre-allocating memory into slab classes, each optimized for a range of item sizes:
When an item is stored, Memcached finds the smallest slab class that fits the item and allocates a chunk from that class. All chunks in a class are the same size, eliminating fragmentation.
By default, Memcached creates slab classes using a growth factor (default: 1.25). Each successive class is 25% larger than the previous, for example 96-byte chunks, then 120, then 152, and so on.
The Trade-off:
Larger growth factors mean fewer slab classes but more internal fragmentation (a 97-byte item wastes 23 bytes in the 120-byte class). Smaller growth factors reduce waste but increase the number of classes.
memcached -f 1.125 -m 4096 # Finer-grained classes (12.5% growth)
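To get a feel for how the growth factor shapes the class ladder, here is a small illustrative Python sketch. It is not Memcached's actual allocator code; it ignores item header overhead and the byte alignment the real server applies to chunk sizes.

```python
def slab_classes(base_size=96, growth_factor=1.25, max_item_size=1024 * 1024):
    """Generate approximate slab chunk sizes, smallest to largest."""
    sizes = []
    size = float(base_size)
    while size <= max_item_size:
        sizes.append(int(size))
        size *= growth_factor
    return sizes

def pick_class(item_size, classes):
    """Smallest class whose chunk fits the item (None if too large)."""
    for chunk in classes:
        if item_size <= chunk:
            return chunk
    return None

classes = slab_classes()
print(classes[:6])        # [96, 120, 150, 187, 234, 292] with the default 1.25 factor
chunk = pick_class(97, classes)
print(chunk, chunk - 97)  # lands in the 120-byte class -> 23 bytes of internal fragmentation
```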
```bash
# Get slab statistics
echo "stats slabs" | nc localhost 11211

STAT 1:chunk_size 96
STAT 1:chunks_per_page 10922
STAT 1:total_pages 2
STAT 1:total_chunks 21844
STAT 1:used_chunks 18532
STAT 1:free_chunks 3312
STAT 1:mem_requested 1665420

STAT 2:chunk_size 120
STAT 2:chunks_per_page 8738
STAT 2:total_pages 1
STAT 2:total_chunks 8738
STAT 2:used_chunks 8200
STAT 2:free_chunks 538
STAT 2:mem_requested 902480

# Key metrics to monitor:
# - used_chunks / total_chunks: Utilization per class
# - mem_requested vs chunk_size * used_chunks: Internal fragmentation
# - Uneven distribution: Some classes may be starving for pages
```

A critical operational issue with Memcached's slab allocator is slab calcification. Once a slab page is assigned to a class, it stays with that class forever (unless slab rebalancing is enabled). If your access patterns change—perhaps early in the day you cache many small items, and later you cache larger ones—the slab distribution becomes mismatched with actual needs.
Symptoms:
- evictions stat increasing rapidly for specific slab classes while others sit on free memory

Solutions:

- Slab reassignment (-o slab_reassign): Allows manually moving pages between classes
- Slab automove (-o slab_automove): Automatically rebalances pages based on eviction rates

Modern Memcached deployments should always enable slab rebalancing: -o slab_reassign,slab_automove. This allows Memcached to dynamically move memory between slab classes based on demand. Without it, workload changes can lead to severe cache inefficiency.
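When slab_reassign is enabled, pages can also be moved by hand using the text protocol's admin commands (`slabs reassign` and `slabs automove`). A minimal sketch over a raw socket, with illustrative class IDs and host:

```python
import socket

def mc_admin(cmd, host="localhost", port=11211):
    """Send one raw admin command over the memcached text protocol."""
    with socket.create_connection((host, port)) as s:
        s.sendall((cmd + "\r\n").encode())
        return s.recv(4096).decode().strip()

# Move one page from slab class 5 to class 12 (requires -o slab_reassign)
print(mc_admin("slabs reassign 5 12"))
# Turn the automatic page mover on (0 disables it)
print(mc_admin("slabs automove 1"))
```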
Memcached's data model is intentionally minimal: keys map to opaque byte sequences (values). There are no rich data types, no secondary indexes, no query capabilities. This simplicity enables extreme performance optimization.
Keys are plain, case-sensitive strings (up to 250 bytes, with no spaces or control characters): User:123 ≠ user:123.

Memcached provides a small, focused command set optimized for the common cache access patterns:
| Command | Description | Atomicity |
|---|---|---|
| GET key | Retrieve value for key | Atomic read |
| SET key flags exptime bytes | Store value unconditionally | Atomic write |
| ADD key flags exptime bytes | Store only if key doesn't exist | Atomic (check + write) |
| REPLACE key flags exptime bytes | Store only if key exists | Atomic (check + write) |
| DELETE key | Remove key from cache | Atomic delete |
| INCR key value | Increment numeric value | Atomic increment |
| DECR key value | Decrement numeric value | Atomic decrement |
| APPEND key bytes | Append data to existing value | Atomic append |
| PREPEND key bytes | Prepend data to existing value | Atomic prepend |
| CAS key flags exptime bytes cas_unique | Compare-and-swap (optimistic lock) | Atomic conditional update |
```bash
# Connect to Memcached
telnet localhost 11211

# Basic SET/GET
set user:123 0 3600 25
{"name":"Alice","age":30}
STORED

get user:123
VALUE user:123 0 25
{"name":"Alice","age":30}
END

# Atomic counter
set page:views 0 0 1
0
STORED

incr page:views 1
1
incr page:views 1
2
incr page:views 100
102

# Conditional SET (ADD - only if not exists)
add user:123 0 3600 3
new
NOT_STORED                    # Key already exists

add user:456 0 3600 5
value
STORED                        # Key didn't exist

# Compare-and-Swap (CAS) for optimistic locking
gets user:123                 # Get with CAS token
VALUE user:123 0 25 12345     # 12345 is the CAS token
{"name":"Alice","age":30}
END

cas user:123 0 3600 25 12345
{"name":"Alice","age":31}
STORED                        # CAS token matched

cas user:123 0 3600 25 12345
{"name":"Alice","age":32}
EXISTS                        # CAS token changed (someone else modified)
```

One of Memcached's most important optimizations is multi-get—retrieving multiple keys in a single request. This dramatically reduces network round trips when fetching related data:
get user:1 user:2 user:3 user:4 user:5
VALUE user:1 0 28
{...}
VALUE user:2 0 32
{...}
VALUE user:4 0 25
{...}
END
Note that user:3 and user:5 weren't returned (cache misses). The client must handle partial results and fetch missing data from the source.
Silent vs Verbose Gets:
A plain get simply omits missing keys from its response. gets behaves the same way, but additionally returns a CAS token with each value it does find.
Single-key gets are a common anti-pattern. If you need data for 10 users, don't make 10 GET calls—use multi-get. Each network round trip costs 0.1-1ms; multi-get retrieves all data in a single round trip. This optimization alone can 10x your effective throughput.
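As a hedged sketch of that pattern with pylibmc, the helper below issues one get_multi, detects the misses, and backfills them from the source. The load_users_from_db helper and key format are illustrative, not from the original.

```python
import pylibmc

client = pylibmc.Client(["localhost:11211"])

def load_users_from_db(user_ids):
    """Placeholder for the real data source (database, service, etc.)."""
    return {uid: {"id": uid, "name": f"user-{uid}"} for uid in user_ids}

def get_users(user_ids, ttl=3600):
    keys = {f"user:{uid}": uid for uid in user_ids}
    # One round trip for all keys; misses are simply absent from the result
    cached = client.get_multi(list(keys))

    found = {keys[k]: v for k, v in cached.items()}
    missing = [uid for uid in user_ids if uid not in found]

    if missing:
        # Fetch only the misses from the source, then repopulate the cache
        fresh = load_users_from_db(missing)
        client.set_multi({f"user:{uid}": data for uid, data in fresh.items()}, time=ttl)
        found.update(fresh)

    return found
```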
Memcached uses a combination of explicit expiration and LRU eviction to manage limited memory. Understanding these mechanisms is crucial for capacity planning and hit-rate optimization.
Every item in Memcached has an expiration time, set at write time: 0 means never expire, values up to 30 days (2,592,000 seconds) are treated as a relative TTL in seconds, and anything larger is interpreted as an absolute Unix timestamp.
Lazy Expiration:
Memcached doesn't actively scan for expired items. Instead, items are lazily expired when accessed. An expired item occupies memory until it is next accessed (and discarded), the LRU crawler reclaims it, or its chunk is reused for a new item.
```bash
# Cache for 1 hour (3600 seconds)
set user:session:abc 0 3600 42
{"user_id":123,"login_time":"2024-01-15T10:00:00Z"}

# Cache for 24 hours (86400 seconds)
set rendered:homepage 0 86400 15000
<html>...</html>

# Never expire (but can still be evicted under memory pressure)
set config:feature_flags 0 0 200
{"dark_mode":true,"beta_features":false}

# Unix timestamp expiration (January 16, 2024 00:00:00 UTC)
set event:sale_banner 0 1705363200 100
{"message":"Sale ends soon!"}

# Very short TTL for rate limiting
set ratelimit:user:123 0 60 1
1
# Check: exists within 60 seconds, then automatically expires
```

When Memcached needs memory for a new item and the slab class has no free chunks, it evicts the Least Recently Used (LRU) item from that class. This happens automatically; no configuration is required.
Important Distinction: Per-Slab-Class LRU
Each slab class maintains its own LRU list. An eviction in the 120-byte class only considers items in that class—not items in other classes. This can lead to unintuitive behavior: items in a rarely written class may survive for days while a hot, memory-starved class evicts items that are only minutes old.
Eviction Metrics:
echo "stats" | nc localhost 11211 | grep evictions
STAT evictions 1523942
High eviction counts indicate memory pressure. Solutions:

- Increase total memory (-m flag)
- Add servers to the pool so the working set spreads across more memory
- Shorten TTLs for low-value or rarely read data
- Enable slab rebalancing so pages follow demand

Modern Memcached includes an LRU crawler that proactively reclaims expired items in the background. This prevents the situation where expired items consume memory until accessed.
memcached -o lru_crawler,lru_maintainer
LRU Maintainer:
With -o lru_maintainer, a background thread keeps the LRU lists organized (in modern versions splitting them into hot, warm, and cold segments) instead of doing that bookkeeping on the request path. This addition significantly improves memory efficiency for workloads with many TTL expirations.
Unlike database systems where losing data is catastrophic, evictions are a normal part of caching. The goal isn't zero evictions—it's maintaining a high cache hit rate despite evictions. Monitor your hit_rate: if it stays above 90-95%, evictions are working as intended, making room for more valuable data.
Memcached itself has no built-in clustering or distribution mechanism. Each Memcached instance operates independently with no awareness of other instances. Distribution is entirely client-side—the client library decides which server holds which key.
The simplest distribution approach hashes the key and uses modulo to pick a server:
server_index = hash(key) % num_servers
The Problem:
When you add or remove a server, almost every key maps to a different server:
With 3 servers: hash(key) % 3 = 1 → Server B
After adding a 4th: hash(key) % 4 = 3 → Server D

This causes a mass cache invalidation: roughly 75% of keys move when going from 3 to 4 servers, and the fraction grows with pool size, so most of the cache must be refetched from the database, potentially overwhelming your backend.
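A quick way to see the damage is to count how many keys in a sample map to a different server index when the pool grows from 3 to 4. An illustrative sketch (the hash choice and key format are arbitrary):

```python
import hashlib

def bucket(key: str, num_servers: int) -> int:
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_servers

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if bucket(k, 3) != bucket(k, 4))
print(f"{moved / len(keys):.1%} of keys changed servers")  # typically ~75%
```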
Consistent hashing solves this by minimizing key remapping when the server pool changes. The algorithm: hash every server onto a fixed ring of hash values (for example, 0 to 2^32 - 1), hash each key onto the same ring, and assign the key to the first server encountered moving clockwise from the key's position. When a server joins or leaves, only the keys in the ring segments adjacent to it change owners.
Result: Adding or removing a server remaps only ~1/N of keys, not ~100%.
Simple consistent hashing can result in uneven distribution—some servers might get significantly more keys than others. Virtual nodes solve this by placing multiple points per server on the ring:
With 150+ virtual nodes per server, key distribution becomes nearly uniform. Modern client libraries handle virtual node placement automatically.
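The sketch below shows one common way to build such a ring with virtual nodes. It is a simplified illustration, not any particular client library's implementation; the MD5 hash and 150 points per server are arbitrary choices.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, vnodes=150):
        self._points = []  # sorted list of (hash, server) pairs
        for server in servers:
            for i in range(vnodes):
                self._points.append((self._hash(f"{server}#{i}"), server))
        self._points.sort()
        self._keys = [h for h, _ in self._points]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get_server(self, key: str) -> str:
        # Walk clockwise: first ring point at or after the key's hash (wrap to 0)
        idx = bisect.bisect_left(self._keys, self._hash(key)) % len(self._keys)
        return self._points[idx][1]

ring = ConsistentHashRing(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
print(ring.get_server("user:12345"))
```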
```python
import pylibmc

# Client-side consistent hashing with pylibmc
client = pylibmc.Client(
    ["192.168.1.101:11211", "192.168.1.102:11211", "192.168.1.103:11211"],
    behaviors={
        "ketama": True,           # Enable consistent hashing (libketama)
        "ketama_weighted": True,  # Support weighted servers
        "remove_failed": 1,       # Remove failed servers from ring
        "retry_timeout": 2,       # Retry failed servers after 2 seconds
        "dead_timeout": 30,       # Consider server dead after 30 seconds
    },
)

# Usage is transparent - client handles distribution
client.set("user:12345", {"name": "Alice", "email": "alice@example.com"}, time=3600)
user_data = client.get("user:12345")

# Multi-get across servers (client batches by server)
keys = ["user:1", "user:2", "user:3", "user:4", "user:5"]
results = client.get_multi(keys)
# Results may come from different servers - client handles routing

# Server weights for unequal distribution (more memory = more keys)
# configured via: "192.168.1.101:11211:2" (weight 2)
```

Different client libraries implement consistent hashing differently. When scaling a Memcached cluster, ensure all application instances use the same client library version and configuration—otherwise, the same key might route to different servers from different clients, causing inconsistency and elevated miss rates.
Deploying Memcached in production requires attention to monitoring, capacity planning, and failure handling. Unlike databases, Memcached has no persistence—a restart means starting with an empty cache.
Memcached exposes statistics via the stats command. Critical metrics include:
| Metric | Description | Alert Threshold |
|---|---|---|
| get_hits / (get_hits + get_misses) | Cache hit rate | < 80-90% (depends on workload) |
| evictions | Items evicted for new data | Rapid increase (relative to baseline) |
| curr_connections | Active client connections | Approaching max_connections |
| bytes / limit_maxbytes | Memory utilization | 95% (eviction pressure) |
| cmd_get / uptime | Gets per second | Anomalies from baseline |
| cmd_set / uptime | Sets per second | Unexpected spikes |
| bytes_read, bytes_written | Network throughput | Approaching network capacity |
```bash
#!/bin/bash
# Memcached monitoring script

MEMCACHED_HOST="localhost"
MEMCACHED_PORT="11211"

# Get all stats
stats=$(echo "stats" | nc -q1 $MEMCACHED_HOST $MEMCACHED_PORT)

# Parse key metrics
get_hits=$(echo "$stats" | grep "STAT get_hits" | awk '{print $3}')
get_misses=$(echo "$stats" | grep "STAT get_misses" | awk '{print $3}')
evictions=$(echo "$stats" | grep "STAT evictions" | awk '{print $3}')
bytes=$(echo "$stats" | grep "STAT bytes " | awk '{print $3}')
limit=$(echo "$stats" | grep "STAT limit_maxbytes" | awk '{print $3}')

# Calculate hit rate
total_gets=$((get_hits + get_misses))
if [ $total_gets -gt 0 ]; then
    hit_rate=$(echo "scale=4; $get_hits / $total_gets * 100" | bc)
else
    hit_rate="N/A"
fi

# Calculate memory usage
mem_usage=$(echo "scale=2; $bytes / $limit * 100" | bc)

echo "Cache Hit Rate: ${hit_rate}%"
echo "Memory Usage: ${mem_usage}%"
echo "Evictions: $evictions"

# Alert on low hit rate
if (( $(echo "$hit_rate < 85" | bc -l) )); then
    echo "WARNING: Cache hit rate below 85%"
fi
```

Each Memcached connection consumes server resources. Without pooling, high-traffic applications can exhaust connections or create excessive connection churn.
Best Practices:
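As one illustration of pooling (the pool size, hosts, and handler framing are assumptions, not from the original), pylibmc's ClientPool lets a multi-threaded application reuse a fixed set of persistent connections instead of reconnecting on every request:

```python
import pylibmc

# One shared pool per process, created at startup
master = pylibmc.Client(
    ["192.168.1.101:11211", "192.168.1.102:11211"],
    behaviors={"ketama": True, "tcp_nodelay": True},
)
pool = pylibmc.ClientPool(master, 8)  # 8 persistent connections to reuse

def handle_request(user_id):
    # Borrow a connection for the duration of the request, then return it
    with pool.reserve() as mc:
        return mc.get(f"user:{user_id}")
```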
When Memcached restarts or a new server is added, it has an empty cache. Suddenly, all requests miss and hit the database, potentially causing an overload cascade.
Mitigation Strategies:
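One common mitigation is to warm a new or restarted node with the hottest keys before it takes production traffic. A hedged sketch, where the key list, data loader, batch size, and TTL are all illustrative:

```python
import pylibmc

def warm_cache(client, hot_keys, load_from_db, ttl=3600):
    """Pre-populate a fresh Memcached node with the most frequently read data."""
    for start in range(0, len(hot_keys), 100):
        batch = hot_keys[start:start + 100]
        rows = load_from_db(batch)          # one bulk query per batch of keys
        client.set_multi(rows, time=ttl)

# Usage sketch: warm the new node before adding it to the client-side ring
# new_node = pylibmc.Client(["192.168.1.104:11211"])
# warm_cache(new_node, top_keys_from_analytics, load_users_bulk)
```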
Memcached has historically been deployed on trusted internal networks with minimal security. For modern deployments:
The Amplification Attack:
Open Memcached servers have been used for DDoS amplification attacks. A small spoofed UDP request can generate a massive response. Always:

- Disable UDP (-U 0) or bind UDP to localhost
- Keep port 11211 off the public internet (firewall rules, private subnets)

Memcached was designed for trusted networks. Exposing it to the internet without authentication allows anyone to read your cached data, insert malicious content, or use your server for DDoS attacks. Use firewalls, VPCs, and private networking to protect Memcached instances.
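As a minimal sketch of a locked-down startup (the private address is illustrative), bind only to an internal interface and disable UDP entirely:

memcached -d -m 2048 -u memcached -l 10.0.1.15 -p 11211 -U 0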
Memcached excels in specific scenarios. Understanding when to use (and when not to use) Memcached is essential for effective system design.
```markdown
# Memcached Best Practices

## Key Naming
- Use hierarchical namespaces: `user:123:profile`, `product:456:inventory`
- Include version for invalidation: `user:123:profile:v2`
- Keep keys short (< 100 bytes) to reduce memory overhead

## Value Serialization
- Use efficient formats: MessagePack, Protocol Buffers, or compressed JSON
- Compress large values client-side (gzip threshold ~1KB)
- Include version/schema in serialized format for evolution

## TTL Strategy
- Match TTL to data freshness requirements
- Use shorter TTLs for frequently-changing data
- Longer TTLs (hours/days) for stable reference data
- Never use 0 (never expire) for user-specific data

## Error Handling
- Treat cache as optional—always have fallback to source
- Handle timeouts gracefully (don't retry indefinitely)
- Log cache misses for hit-rate monitoring

## Capacity Planning
- Memory: Active dataset + 20% for slab overhead + growth
- Connections: peak_concurrency * apps * (1.5 safety margin)
- Network: Calculate based on avg_value_size * requests_per_second
```

The key takeaway: Memcached's simplicity is its superpower. By focusing on doing one thing exceptionally well, it achieves performance that more feature-rich systems struggle to match.
What's Next:
Now that we understand both Redis and Memcached individually, the next page provides a detailed Redis vs Memcached comparison—helping you make informed decisions about which technology best fits your specific use case.
You now have a comprehensive understanding of Memcached's architecture, memory management, distributed deployment, and operational considerations. This knowledge enables you to deploy and operate Memcached effectively in high-performance caching scenarios.