While Redis has captured much of the distributed caching mindshare, Memcached remains a foundational technology that continues to power some of the world's highest-traffic systems. Originally developed by Brad Fitzpatrick for LiveJournal in 2003, Memcached pioneered the concept of a distributed in-memory key-value cache that could be deployed across multiple machines to aggregate memory into a massive, unified caching layer.
Memcached's philosophy is radical simplicity. Where Redis offers a rich ecosystem of data structures, persistence options, and advanced features, Memcached focuses on doing one thing exceptionally well: storing and retrieving key-value pairs with minimal latency and maximum throughput. This laser focus has made Memcached the choice for scenarios where raw performance trumps feature richness.
Today, Memcached serves caching layers at Facebook, Wikipedia, YouTube, Twitter, and countless other high-scale systems. Facebook famously operates one of the world's largest Memcached deployments, with thousands of servers handling billions of requests per second. For systems where every microsecond matters and the use case fits Memcached's strengths, it remains an unbeatable option.
By the end of this page, you will understand Memcached's architecture and design philosophy, master its multi-threaded model and memory management via the slab allocator, comprehend distributed caching through consistent hashing, and recognize the scenarios where Memcached outperforms more feature-rich alternatives.
Memcached's architecture reflects its singular purpose: maximize throughput and minimize latency for simple key-value operations. Every design decision prioritizes performance for the common case.
Unlike Redis's single-threaded command processing, Memcached uses a multi-threaded architecture from the ground up. This allows Memcached to utilize all available CPU cores for both network I/O and command processing.
Worker Thread Pool:
Memcached maintains a pool of worker threads, each capable of independently processing client requests. The number of threads is configurable via the -t flag (default: 4). On modern multi-core servers, increasing thread count can dramatically improve throughput:
memcached -t 8 -m 1024 # 8 threads, 1GB memory
Connection Distribution:
New connections are distributed across worker threads round-robin by a dispatcher thread. Each worker then handles all operations for its assigned connections, minimizing lock contention.
| Aspect | Memcached | Redis |
|---|---|---|
| Core model | Multi-threaded from design | Single-threaded command processing |
| CPU utilization | Utilizes all cores naturally | Requires Cluster for multi-core |
| Context switching | Present, but minimized | None in command path |
| Lock contention | Careful design to minimize | No locks needed |
| Single-instance throughput | Higher on multi-core | Limited by single core |
| Operation atomicity | Per-key locking | Natural from single-threading |
Memcached uses libevent for scalable, non-blocking I/O. Each worker thread runs its own event loop, handling thousands of connections efficiently. This epoll/kqueue-based approach means connection count doesn't linearly impact CPU usage.
Connection States:
Each connection cycles through a small set of states (waiting for a command, reading the request, processing it, writing the response) under the control of its thread's event loop.
The event-driven model ensures that waiting for network I/O doesn't block other operations—a crucial property for high-concurrency scenarios.
```bash
#!/bin/bash
# Production Memcached startup configuration
#   -d            Run as daemon
#   -m 4096       4GB memory limit
#   -p 11211      Port
#   -u memcached  Run as the memcached user
#   -l 0.0.0.0    Listen on all interfaces
#   -t 8          8 worker threads
#   -c 10240      Max connections
#   -b 1024       Connection backlog
#   -R 200        Max requests per event
#   -o ...        Enable slab rebalancing
#   -v            Verbose logging

memcached -d -m 4096 -p 11211 -u memcached -l 0.0.0.0 \
  -t 8 -c 10240 -b 1024 -R 200 \
  -o slab_reassign,slab_automove -v

# Key tuning parameters:
# -t: Worker threads (usually cores - 1 to cores * 2)
# -c: Max connections (plan for your connection pool sizes)
# -m: Memory limit (leave headroom for slab overhead)
# -R: Requests per event (higher = better throughput, worse fairness)
```

Start with threads equal to CPU cores. If CPU utilization is low but throughput isn't meeting needs, increase thread count. If you see high context switching in monitoring, reduce threads. Modern best practice: 1.5 to 2x core count, then tune based on metrics.
One of Memcached's most important innovations is its slab allocator—a memory management system designed to eliminate fragmentation and enable O(1) allocation. Understanding the slab allocator is essential for effective Memcached operation.
Naive memory allocators (malloc/free) suffer from fragmentation over time. As items of varying sizes are created and deleted, memory becomes fragmented—plenty of total free space, but no contiguous block large enough for new allocations. This leads to increasing allocation times and wasted memory.
Memcached solves fragmentation by pre-allocating memory into slab classes, each optimized for a range of item sizes:
When an item is stored, Memcached finds the smallest slab class that fits the item and allocates a chunk from that class. All chunks in a class are the same size, eliminating fragmentation.
By default, Memcached creates slab classes using a growth factor (default: 1.25). Each successive class is 25% larger than the previous, for example 96-byte chunks, then 120, then 152, and so on.
The Trade-off:
Larger growth factors mean fewer slab classes but more internal fragmentation (a 97-byte item wastes 23 bytes in the 120-byte class). Smaller growth factors reduce waste but increase the number of classes.
memcached -f 1.125 -m 4096 # Finer-grained classes (12.5% growth)
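To get a feel for how the growth factor shapes the class ladder, here is a small illustrative Python sketch. It is not Memcached's actual allocator code; it ignores item header overhead and the byte alignment the real server applies to chunk sizes.

```python
def slab_classes(base_size=96, growth_factor=1.25, max_item_size=1024 * 1024):
    """Generate approximate slab chunk sizes, smallest to largest."""
    sizes = []
    size = float(base_size)
    while size <= max_item_size:
        sizes.append(int(size))
        size *= growth_factor
    return sizes

def pick_class(item_size, classes):
    """Smallest class whose chunk fits the item (None if too large)."""
    for chunk in classes:
        if item_size <= chunk:
            return chunk
    return None

classes = slab_classes()
print(classes[:6])        # [96, 120, 150, 187, 234, 292] with the default 1.25 factor
chunk = pick_class(97, classes)
print(chunk, chunk - 97)  # lands in the 120-byte class -> 23 bytes of internal fragmentation
```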
```bash
# Get slab statistics
echo "stats slabs" | nc localhost 11211

STAT 1:chunk_size 96
STAT 1:chunks_per_page 10922
STAT 1:total_pages 2
STAT 1:total_chunks 21844
STAT 1:used_chunks 18532
STAT 1:free_chunks 3312
STAT 1:mem_requested 1665420

STAT 2:chunk_size 120
STAT 2:chunks_per_page 8738
STAT 2:total_pages 1
STAT 2:total_chunks 8738
STAT 2:used_chunks 8200
STAT 2:free_chunks 538
STAT 2:mem_requested 902480

# Key metrics to monitor:
# - used_chunks / total_chunks: Utilization per class
# - mem_requested vs chunk_size * used_chunks: Internal fragmentation
# - Uneven distribution: Some classes may be starving for pages
```

A critical operational issue with Memcached's slab allocator is slab calcification. Once a slab page is assigned to a class, it stays with that class forever (unless slab rebalancing is enabled). If your access patterns change—perhaps early in the day you cache many small items, and later you cache larger ones—the slab distribution becomes mismatched with actual needs.
Symptoms:
- evictions stat increasing rapidly for specific slab classes while others sit on free memory

Solutions:

- Slab reassignment (-o slab_reassign): Allows manually moving pages between classes
- Slab automove (-o slab_automove): Automatically rebalances pages based on eviction rates

Modern Memcached deployments should always enable slab rebalancing: -o slab_reassign,slab_automove. This allows Memcached to dynamically move memory between slab classes based on demand. Without it, workload changes can lead to severe cache inefficiency.
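When slab_reassign is enabled, pages can also be moved by hand using the text protocol's admin commands (`slabs reassign` and `slabs automove`). A minimal sketch over a raw socket, with illustrative class IDs and host:

```python
import socket

def mc_admin(cmd, host="localhost", port=11211):
    """Send one raw admin command over the memcached text protocol."""
    with socket.create_connection((host, port)) as s:
        s.sendall((cmd + "\r\n").encode())
        return s.recv(4096).decode().strip()

# Move one page from slab class 5 to class 12 (requires -o slab_reassign)
print(mc_admin("slabs reassign 5 12"))
# Turn the automatic page mover on (0 disables it)
print(mc_admin("slabs automove 1"))
```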
Memcached's data model is intentionally minimal: keys map to opaque byte sequences (values). There are no rich data types, no secondary indexes, no query capabilities. This simplicity enables extreme performance optimization.
Keys are plain, case-sensitive strings (up to 250 bytes, with no spaces or control characters): User:123 ≠ user:123.

Memcached provides a small, focused command set optimized for the common cache access patterns:
| Command | Description | Atomicity |
|---|---|---|
| GET key | Retrieve value for key | Atomic read |
| SET key flags exptime bytes | Store value unconditionally | Atomic write |
| ADD key flags exptime bytes | Store only if key doesn't exist | Atomic (check + write) |
| REPLACE key flags exptime bytes | Store only if key exists | Atomic (check + write) |
| DELETE key | Remove key from cache | Atomic delete |
| INCR key value | Increment numeric value | Atomic increment |
| DECR key value | Decrement numeric value | Atomic decrement |
| APPEND key bytes | Append data to existing value | Atomic append |
| PREPEND key bytes | Prepend data to existing value | Atomic prepend |
| CAS key flags exptime bytes cas_unique | Compare-and-swap (optimistic lock) | Atomic conditional update |
```bash
# Connect to Memcached
telnet localhost 11211

# Basic SET/GET
set user:123 0 3600 25
{"name":"Alice","age":30}
STORED

get user:123
VALUE user:123 0 25
{"name":"Alice","age":30}
END

# Atomic counter
set page:views 0 0 1
0
STORED

incr page:views 1
1
incr page:views 1
2
incr page:views 100
102

# Conditional SET (ADD - only if not exists)
add user:123 0 3600 3
new
NOT_STORED                    # Key already exists

add user:456 0 3600 5
value
STORED                        # Key didn't exist

# Compare-and-Swap (CAS) for optimistic locking
gets user:123                 # Get with CAS token
VALUE user:123 0 25 12345     # 12345 is the CAS token
{"name":"Alice","age":30}
END

cas user:123 0 3600 25 12345
{"name":"Alice","age":31}
STORED                        # CAS token matched

cas user:123 0 3600 25 12345
{"name":"Alice","age":32}
EXISTS                        # CAS token changed (someone else modified)
```

One of Memcached's most important optimizations is multi-get—retrieving multiple keys in a single request. This dramatically reduces network round trips when fetching related data:
get user:1 user:2 user:3 user:4 user:5
VALUE user:1 0 28
{...}
VALUE user:2 0 32
{...}
VALUE user:4 0 25
{...}
END
Note that user:3 and user:5 weren't returned (cache misses). The client must handle partial results and fetch missing data from the source.
Silent vs Verbose Gets:
A plain get simply omits missing keys from its response. gets behaves the same way, but additionally returns a CAS token with each value it does find.
Single-key gets are a common anti-pattern. If you need data for 10 users, don't make 10 GET calls—use multi-get. Each network round trip costs 0.1-1ms; multi-get retrieves all data in a single round trip. This optimization alone can 10x your effective throughput.
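As a hedged sketch of that pattern with pylibmc, the helper below issues one get_multi, detects the misses, and backfills them from the source. The load_users_from_db helper and key format are illustrative, not from the original.

```python
import pylibmc

client = pylibmc.Client(["localhost:11211"])

def load_users_from_db(user_ids):
    """Placeholder for the real data source (database, service, etc.)."""
    return {uid: {"id": uid, "name": f"user-{uid}"} for uid in user_ids}

def get_users(user_ids, ttl=3600):
    keys = {f"user:{uid}": uid for uid in user_ids}
    # One round trip for all keys; misses are simply absent from the result
    cached = client.get_multi(list(keys))

    found = {keys[k]: v for k, v in cached.items()}
    missing = [uid for uid in user_ids if uid not in found]

    if missing:
        # Fetch only the misses from the source, then repopulate the cache
        fresh = load_users_from_db(missing)
        client.set_multi({f"user:{uid}": data for uid, data in fresh.items()}, time=ttl)
        found.update(fresh)

    return found
```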
Memcached uses a combination of explicit expiration and LRU eviction to manage limited memory. Understanding these mechanisms is crucial for capacity planning and hit-rate optimization.
Every item in Memcached has an expiration time, set at write time: 0 means never expire, values up to 30 days (2,592,000 seconds) are treated as a relative TTL in seconds, and anything larger is interpreted as an absolute Unix timestamp.
Lazy Expiration:
Memcached doesn't actively scan for expired items. Instead, items are lazily expired when accessed. An expired item occupies memory until it is next accessed (and discarded), the LRU crawler reclaims it, or its chunk is reused for a new item.
```bash
# Cache for 1 hour (3600 seconds)
set user:session:abc 0 3600 42
{"user_id":123,"login_time":"2024-01-15T10:00:00Z"}

# Cache for 24 hours (86400 seconds)
set rendered:homepage 0 86400 15000
<html>...</html>

# Never expire (but can still be evicted under memory pressure)
set config:feature_flags 0 0 200
{"dark_mode":true,"beta_features":false}

# Unix timestamp expiration (January 16, 2024 00:00:00 UTC)
set event:sale_banner 0 1705363200 100
{"message":"Sale ends soon!"}

# Very short TTL for rate limiting
set ratelimit:user:123 0 60 1
1
# Check: exists within 60 seconds, then automatically expires
```

When Memcached needs memory for a new item and the slab class has no free chunks, it evicts the Least Recently Used (LRU) item from that class. This happens automatically; no configuration is required.
Important Distinction: Per-Slab-Class LRU
Each slab class maintains its own LRU list. An eviction in the 120-byte class only considers items in that class—not items in other classes. This can lead to unintuitive behavior: items in a rarely written class may survive for days while a hot, memory-starved class evicts items that are only minutes old.
Eviction Metrics:
echo "stats" | nc localhost 11211 | grep evictions
STAT evictions 1523942
High eviction counts indicate memory pressure. Solutions:

- Increase total memory (-m flag)
- Add servers to the pool so the working set spreads across more memory
- Shorten TTLs for low-value or rarely read data
- Enable slab rebalancing so pages follow demand

Modern Memcached includes an LRU crawler that proactively reclaims expired items in the background. This prevents the situation where expired items consume memory until accessed.
memcached -o lru_crawler,lru_maintainer
LRU Maintainer:
With -o lru_maintainer, a background thread keeps the LRU lists organized (in modern versions splitting them into hot, warm, and cold segments) instead of doing that bookkeeping on the request path. This addition significantly improves memory efficiency for workloads with many TTL expirations.
Unlike database systems where losing data is catastrophic, evictions are a normal part of caching. The goal isn't zero evictions—it's maintaining a high cache hit rate despite evictions. Monitor your hit_rate: if it stays above 90-95%, evictions are working as intended, making room for more valuable data.
Memcached itself has no built-in clustering or distribution mechanism. Each Memcached instance operates independently with no awareness of other instances. Distribution is entirely client-side—the client library decides which server holds which key.
The simplest distribution approach hashes the key and uses modulo to pick a server:
server_index = hash(key) % num_servers
The Problem:
When you add or remove a server, almost every key maps to a different server:
With 3 servers: hash(key) % 3 = 1 → Server B
After adding a 4th: hash(key) % 4 = 3 → Server D

This causes a mass cache invalidation: roughly 75% of keys move when going from 3 to 4 servers, and the fraction grows with pool size, so most of the cache must be refetched from the database, potentially overwhelming your backend.
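A quick way to see the damage is to count how many keys in a sample map to a different server index when the pool grows from 3 to 4. An illustrative sketch (the hash choice and key format are arbitrary):

```python
import hashlib

def bucket(key: str, num_servers: int) -> int:
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_servers

keys = [f"user:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if bucket(k, 3) != bucket(k, 4))
print(f"{moved / len(keys):.1%} of keys changed servers")  # typically ~75%
```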
Consistent hashing solves this by minimizing key remapping when the server pool changes. The algorithm: hash every server onto a fixed ring of hash values (for example, 0 to 2^32 - 1), hash each key onto the same ring, and assign the key to the first server encountered moving clockwise from the key's position. When a server joins or leaves, only the keys in the ring segments adjacent to it change owners.
Result: Adding or removing a server remaps only ~1/N of keys, not ~100%.
Simple consistent hashing can result in uneven distribution—some servers might get significantly more keys than others. Virtual nodes solve this by placing multiple points per server on the ring:
With 150+ virtual nodes per server, key distribution becomes nearly uniform. Modern client libraries handle virtual node placement automatically.
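The sketch below shows one common way to build such a ring with virtual nodes. It is a simplified illustration, not any particular client library's implementation; the MD5 hash and 150 points per server are arbitrary choices.

```python
import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, servers, vnodes=150):
        self._points = []  # sorted list of (hash, server) pairs
        for server in servers:
            for i in range(vnodes):
                self._points.append((self._hash(f"{server}#{i}"), server))
        self._points.sort()
        self._keys = [h for h, _ in self._points]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get_server(self, key: str) -> str:
        # Walk clockwise: first ring point at or after the key's hash (wrap to 0)
        idx = bisect.bisect_left(self._keys, self._hash(key)) % len(self._keys)
        return self._points[idx][1]

ring = ConsistentHashRing(["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"])
print(ring.get_server("user:12345"))
```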
```python
import pylibmc

# Client-side consistent hashing with pylibmc
client = pylibmc.Client(
    ["192.168.1.101:11211", "192.168.1.102:11211", "192.168.1.103:11211"],
    behaviors={
        "ketama": True,           # Enable consistent hashing (libketama)
        "ketama_weighted": True,  # Support weighted servers
        "remove_failed": 1,       # Remove failed servers from ring
        "retry_timeout": 2,       # Retry failed servers after 2 seconds
        "dead_timeout": 30,       # Consider server dead after 30 seconds
    },
)

# Usage is transparent - client handles distribution
client.set("user:12345", {"name": "Alice", "email": "alice@example.com"}, time=3600)
user_data = client.get("user:12345")

# Multi-get across servers (client batches by server)
keys = ["user:1", "user:2", "user:3", "user:4", "user:5"]
results = client.get_multi(keys)
# Results may come from different servers - client handles routing

# Server weights for unequal distribution (more memory = more keys)
# configured via: "192.168.1.101:11211:2" (weight 2)
```

Different client libraries implement consistent hashing differently. When scaling a Memcached cluster, ensure all application instances use the same client library version and configuration—otherwise, the same key might route to different servers from different clients, causing inconsistency and elevated miss rates.
Deploying Memcached in production requires attention to monitoring, capacity planning, and failure handling. Unlike databases, Memcached has no persistence—a restart means starting with an empty cache.
Memcached exposes statistics via the stats command. Critical metrics include:
| Metric | Description | Alert Threshold |
|---|---|---|
| get_hits / (get_hits + get_misses) | Cache hit rate | < 80-90% (depends on workload) |
| evictions | Items evicted for new data | Rapid increase (relative to baseline) |
| curr_connections | Active client connections | Approaching max_connections |
| bytes / limit_maxbytes | Memory utilization | 95% (eviction pressure) |
| cmd_get / uptime | Gets per second | Anomalies from baseline |
| cmd_set / uptime | Sets per second | Unexpected spikes |
| bytes_read, bytes_written | Network throughput | Approaching network capacity |
```bash
#!/bin/bash
# Memcached monitoring script

MEMCACHED_HOST="localhost"
MEMCACHED_PORT="11211"

# Get all stats
stats=$(echo "stats" | nc -q1 $MEMCACHED_HOST $MEMCACHED_PORT)

# Parse key metrics
get_hits=$(echo "$stats" | grep "STAT get_hits" | awk '{print $3}')
get_misses=$(echo "$stats" | grep "STAT get_misses" | awk '{print $3}')
evictions=$(echo "$stats" | grep "STAT evictions" | awk '{print $3}')
bytes=$(echo "$stats" | grep "STAT bytes " | awk '{print $3}')
limit=$(echo "$stats" | grep "STAT limit_maxbytes" | awk '{print $3}')

# Calculate hit rate
total_gets=$((get_hits + get_misses))
if [ $total_gets -gt 0 ]; then
    hit_rate=$(echo "scale=4; $get_hits / $total_gets * 100" | bc)
else
    hit_rate="N/A"
fi

# Calculate memory usage
mem_usage=$(echo "scale=2; $bytes / $limit * 100" | bc)

echo "Cache Hit Rate: ${hit_rate}%"
echo "Memory Usage: ${mem_usage}%"
echo "Evictions: $evictions"

# Alert on low hit rate
if (( $(echo "$hit_rate < 85" | bc -l) )); then
    echo "WARNING: Cache hit rate below 85%"
fi
```

Each Memcached connection consumes server resources. Without pooling, high-traffic applications can exhaust connections or create excessive connection churn.
Best Practices:
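As one illustration of pooling (the pool size, hosts, and handler framing are assumptions, not from the original), pylibmc's ClientPool lets a multi-threaded application reuse a fixed set of persistent connections instead of reconnecting on every request:

```python
import pylibmc

# One shared pool per process, created at startup
master = pylibmc.Client(
    ["192.168.1.101:11211", "192.168.1.102:11211"],
    behaviors={"ketama": True, "tcp_nodelay": True},
)
pool = pylibmc.ClientPool(master, 8)  # 8 persistent connections to reuse

def handle_request(user_id):
    # Borrow a connection for the duration of the request, then return it
    with pool.reserve() as mc:
        return mc.get(f"user:{user_id}")
```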
When Memcached restarts or a new server is added, it has an empty cache. Suddenly, all requests miss and hit the database, potentially causing an overload cascade.
Mitigation Strategies:
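One common mitigation is to warm a new or restarted node with the hottest keys before it takes production traffic. A hedged sketch, where the key list, data loader, batch size, and TTL are all illustrative:

```python
import pylibmc

def warm_cache(client, hot_keys, load_from_db, ttl=3600):
    """Pre-populate a fresh Memcached node with the most frequently read data."""
    for start in range(0, len(hot_keys), 100):
        batch = hot_keys[start:start + 100]
        rows = load_from_db(batch)          # one bulk query per batch of keys
        client.set_multi(rows, time=ttl)

# Usage sketch: warm the new node before adding it to the client-side ring
# new_node = pylibmc.Client(["192.168.1.104:11211"])
# warm_cache(new_node, top_keys_from_analytics, load_users_bulk)
```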
Memcached has historically been deployed on trusted internal networks with minimal security. For modern deployments:
The Amplification Attack:
Open Memcached servers have been used for DDoS amplification attacks. A small spoofed UDP request can generate a massive response. Always:

- Disable UDP (-U 0) or bind UDP to localhost
- Keep port 11211 off the public internet (firewall rules, private subnets)

Memcached was designed for trusted networks. Exposing it to the internet without authentication allows anyone to read your cached data, insert malicious content, or use your server for DDoS attacks. Use firewalls, VPCs, and private networking to protect Memcached instances.
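As a minimal sketch of a locked-down startup (the private address is illustrative), bind only to an internal interface and disable UDP entirely:

memcached -d -m 2048 -u memcached -l 10.0.1.15 -p 11211 -U 0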
Memcached excels in specific scenarios. Understanding when to use (and when not to use) Memcached is essential for effective system design.
```markdown
# Memcached Best Practices

## Key Naming
- Use hierarchical namespaces: `user:123:profile`, `product:456:inventory`
- Include version for invalidation: `user:123:profile:v2`
- Keep keys short (< 100 bytes) to reduce memory overhead

## Value Serialization
- Use efficient formats: MessagePack, Protocol Buffers, or compressed JSON
- Compress large values client-side (gzip threshold ~1KB)
- Include version/schema in serialized format for evolution

## TTL Strategy
- Match TTL to data freshness requirements
- Use shorter TTLs for frequently-changing data
- Longer TTLs (hours/days) for stable reference data
- Never use 0 (never expire) for user-specific data

## Error Handling
- Treat cache as optional—always have fallback to source
- Handle timeouts gracefully (don't retry indefinitely)
- Log cache misses for hit-rate monitoring

## Capacity Planning
- Memory: Active dataset + 20% for slab overhead + growth
- Connections: peak_concurrency * apps * (1.5 safety margin)
- Network: Calculate based on avg_value_size * requests_per_second
```

The key takeaway: Memcached's simplicity is its superpower. By focusing on doing one thing exceptionally well, it achieves performance that more feature-rich systems struggle to match.
What's Next:
Now that we understand both Redis and Memcached individually, the next page provides a detailed Redis vs Memcached comparison—helping you make informed decisions about which technology best fits your specific use case.
You now have a comprehensive understanding of Memcached's architecture, memory management, distributed deployment, and operational considerations. This knowledge enables you to deploy and operate Memcached effectively in high-performance caching scenarios.