A global CDN with 300 edge locations serving 50 million requests per second faces a fundamental challenge: how do you store the right content at each location to maximize cache hits while minimizing staleness?
The answer to this question determines whether a CDN achieves 95% cache efficiency or 50%—the difference between extraordinary user experience and mediocre performance. Caching is where the physical infrastructure we've studied becomes intelligent, making real-time decisions about what to store, where to store it, and when to expire it.
Caching is deceptively complex. On the surface, it's simple: store a copy of content closer to users. In practice, it involves intricate decisions about cache keys, TTLs, invalidation strategies, consistency guarantees, and storage hierarchies—each choice affecting performance, freshness, and cost.
This page covers: the fundamental principles of HTTP caching and how CDNs extend them; cache hierarchies from memory to SSD to origin shield tiers; cache key design and the art of maximizing hit rates; cache invalidation strategies from TTL-based to instant purging; advanced caching patterns including stale-while-revalidate and request coalescing; and the tradeoffs between consistency, performance, and efficiency that define caching strategy.
CDN caching is built upon the HTTP caching model defined in RFC 9111 (which obsoletes RFC 7234). Understanding these fundamentals is essential for effective CDN configuration and troubleshooting.
The cacheability decision:
HTTP defines when a response can be cached and for how long through response headers:
    # Highly cacheable static asset (aggressive caching)
    HTTP/1.1 200 OK
    Cache-Control: public, max-age=31536000, immutable
    Content-Type: application/javascript
    ETag: "abc123"
    # Cached for 1 year; 'immutable' tells browsers not to revalidate

    # Dynamic content with short cache (balance freshness/performance)
    HTTP/1.1 200 OK
    Cache-Control: public, max-age=60, s-maxage=300
    Content-Type: application/json
    Vary: Accept-Encoding
    # Browsers cache 60s; CDN caches 300s (s-maxage overrides max-age for shared caches)

    # Private user-specific content (do not cache on CDN)
    HTTP/1.1 200 OK
    Cache-Control: private, no-cache, no-store
    Set-Cookie: session=xyz
    # 'private' keeps shared caches such as CDNs from storing it;
    # 'no-store' also tells the browser not to keep a copy

    # Stale content allowed during revalidation
    HTTP/1.1 200 OK
    Cache-Control: public, max-age=600, stale-while-revalidate=3600
    # Serve stale content for up to 1 hour while revalidating in background

Cache-Control directive reference:
| Directive | Target | Effect | CDN Behavior |
|---|---|---|---|
| public | Response | Can be cached by any cache | CDN will cache the response |
| private | Response | Only browser can cache | CDN will NOT cache the response |
| no-cache | Response | Cache but revalidate before use | CDN caches but checks origin on every request |
| no-store | Response | Do not cache at all | CDN never stores the response |
| max-age=N | Response | Cache for N seconds | CDN sets TTL to N seconds |
| s-maxage=N | Response | Shared cache TTL (overrides max-age) | CDN uses this instead of max-age |
| immutable | Response | Content never changes | CDN never revalidates until TTL expires |
| must-revalidate | Response | Never use stale content | CDN returns error if unable to revalidate |
| stale-while-revalidate=N | Response | Serve stale for N seconds during refresh | CDN serves stale content while refreshing |
| stale-if-error=N | Response | Serve stale if origin errors for N seconds | CDN serves stale when origin is unavailable |
The s-maxage directive applies only to shared caches, which makes it the natural control for CDN caching. Use it to set longer CDN TTLs while keeping shorter browser TTLs. For example, Cache-Control: public, max-age=60, s-maxage=3600 means browsers cache for 1 minute while the CDN caches for 1 hour. This enables aggressive CDN caching while browsers re-check with the CDN every minute, so a purged or refreshed CDN copy reaches users quickly.
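To make the browser/CDN split concrete, here is a minimal Python sketch of how a cache might derive its TTL from a Cache-Control header. The function names `parse_cache_control` and `effective_ttl` are hypothetical, and the logic is deliberately simplified (it ignores Expires, Age, and heuristic freshness); real caches implement many more rules.

```python
def parse_cache_control(header: str) -> dict:
    """Parse a Cache-Control header into {directive: value or True}."""
    directives = {}
    for part in header.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name] = value.strip('"') if value else True
    return directives


def _seconds(directives: dict, name: str) -> int:
    """Return a directive's numeric value, or 0 if absent or malformed."""
    value = directives.get(name)
    return int(value) if isinstance(value, str) and value.isdigit() else 0


def effective_ttl(header: str, shared_cache: bool) -> int:
    """Seconds a response may be served without contacting the origin.

    0 means 'do not serve from cache without checking first'.
    """
    d = parse_cache_control(header)
    if "no-store" in d or "no-cache" in d:
        return 0
    if shared_cache:
        if "private" in d:
            return 0  # shared caches (CDNs) must not store private responses
        if "s-maxage" in d:
            return _seconds(d, "s-maxage")  # overrides max-age for shared caches
    return _seconds(d, "max-age")


header = "public, max-age=60, s-maxage=3600"
print(effective_ttl(header, shared_cache=True))   # CDN view
print(effective_ttl(header, shared_cache=False))  # browser view
```

With the example header, the CDN view yields the s-maxage value and the browser view yields the max-age value.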
Validation-based caching:
When cache content expires (TTL reached), the cache can validate whether the content has changed rather than fetching a full copy:
    Client → CDN: GET /resource
        If-None-Match: "abc123"    # ETag from previous response
        If-Modified-Since: Wed, 15 Jan 2025 10:00:00 GMT

    CDN → Origin: GET /resource
        If-None-Match: "abc123"
        If-Modified-Since: Wed, 15 Jan 2025 10:00:00 GMT

    Origin → CDN: HTTP/1.1 304 Not Modified
        ETag: "abc123"
        Cache-Control: public, max-age=3600

    CDN → Client: HTTP/1.1 304 Not Modified
        ETag: "abc123"
        Cache-Control: public, max-age=3600
Key insight: The 304 response has no body—just headers. For large resources, conditional validation saves significant bandwidth while ensuring freshness.
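The origin side of this exchange can be sketched in a few lines of Python. This is an illustrative simplification: `make_etag` and `origin_respond` are hypothetical names, the ETag is derived from a content hash (one common choice, not the only one), and only a single exact-match If-None-Match value is handled.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the response body (assumed scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def origin_respond(body: bytes, request_headers: dict) -> tuple:
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=3600"}
    if request_headers.get("If-None-Match") == etag:
        # Validator matches: send headers only, no body.
        return 304, headers, b""
    return 200, headers, body


# First fetch: full response.
status, headers, payload = origin_respond(b"hello", {})
# Revalidation with the stored ETag: 304 and an empty body.
status_again, _, payload_again = origin_respond(
    b"hello", {"If-None-Match": headers["ETag"]}
)
print(status, len(payload), status_again, len(payload_again))
```

When the content changes, the hash changes, the validator no longer matches, and the origin falls back to a full 200 response.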
Enterprise CDNs employ multi-tiered cache hierarchies that balance access speed, storage capacity, and origin offload. Understanding this hierarchy is essential for optimizing cache efficiency.
The typical four-tier cache hierarchy:
Tier 1: Memory Cache (RAM)
The fastest tier stores the most frequently accessed content in server RAM:
Tier 2: Local Disk Cache (NVMe SSD)
The primary persistent cache tier stores the working set:
Tier 3: Origin Shield
A regional cache tier that aggregates requests from multiple edge servers:
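Without origin shield: If 10 edge servers each have 80% CHR, the origin receives the 20% of requests that miss, and because each edge fills its cache independently, the same object can be fetched from the origin up to 10 times. With origin shield: All 10 edges send their misses to the shield's shared cache, so each object needs to be fetched from the origin only once. For popular content the combined edge-plus-shield hit ratio can reach roughly 98-99%, so origin sees only ~1-2% of total traffic. This multiplicative effect makes origin shields essential at scale.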
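The effect of adding a shield tier can be checked with a little arithmetic. The sketch below assumes a simple model in which hit ratios at each tier are independent; the function name `origin_fraction` and the 95% shield hit ratio are illustrative assumptions, not measured values.

```python
def origin_fraction(edge_hit_ratio: float, shield_hit_ratio: float = 0.0) -> float:
    """Fraction of all client requests that reach the origin.

    A request reaches the origin only if it misses at the edge AND
    (when a shield exists) misses at the shield.
    """
    edge_miss = 1.0 - edge_hit_ratio
    shield_miss = 1.0 - shield_hit_ratio
    return edge_miss * shield_miss


no_shield = origin_fraction(0.80)           # edges only
with_shield = origin_fraction(0.80, 0.95)   # assumed 95% shield hit ratio
print(f"no shield:   {no_shield:.1%} of requests reach origin")
print(f"with shield: {with_shield:.1%} of requests reach origin")
```

Under these assumptions the miss rates multiply, which is why a shield that absorbs most edge misses cuts origin traffic by more than an order of magnitude.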
Cache tier selection logic:
When a request arrives, the edge server checks tiers in order:
    1. Check RAM cache → HIT?  Serve immediately (fastest path)
                       → MISS? Check next tier
    2. Check SSD cache → HIT?  Serve and promote to RAM
                       → MISS? Check shield tier
    3. Check Shield    → HIT?  Serve and cache locally
                       → MISS? Fetch from origin
    4. Fetch Origin    → Cache at shield + local SSD + RAM (if hot)
                       → Serve to client
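The lookup order above can be sketched as a small Python model. Everything here is a simplification under stated assumptions: the `LRU` and `EdgeServer` classes are hypothetical, each tier is an in-memory LRU, and an object is treated as "hot" (promoted to RAM) only once it is hit on SSD.

```python
from collections import OrderedDict


class LRU:
    """Tiny least-recently-used cache standing in for one tier."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


class EdgeServer:
    def __init__(self, shield: LRU, fetch_origin, ram_size=2, ssd_size=100):
        self.ram = LRU(ram_size)
        self.ssd = LRU(ssd_size)
        self.shield = shield            # shared by several edges
        self.fetch_origin = fetch_origin

    def get(self, key):
        """Return (value, tier_that_served_it)."""
        value = self.ram.get(key)
        if value is not None:
            return value, "RAM"
        value = self.ssd.get(key)
        if value is not None:
            self.ram.put(key, value)    # promote on SSD hit
            return value, "SSD"
        value = self.shield.get(key)
        if value is not None:
            self.ssd.put(key, value)    # cache locally
            return value, "SHIELD"
        value = self.fetch_origin(key)
        self.shield.put(key, value)
        self.ssd.put(key, value)
        return value, "ORIGIN"


shield = LRU(1000)
edge_a = EdgeServer(shield, lambda key: "body:" + key)
edge_b = EdgeServer(shield, lambda key: "body:" + key)
print([edge_a.get("/x")[1] for _ in range(3)], edge_b.get("/x")[1])
```

Because both edges share one shield, the second edge's first request is answered by the shield rather than the origin.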
Promotion and demotion:
The cache key is the unique identifier used to store and retrieve cached content. Cache key design directly determines hit rate—poor key design can cause cache fragmentation that devastates performance.
Default cache key components:
By default, most CDNs construct cache keys from:
Scheme + Host + Path + Query String
https://example.com/images/logo.png?v=123
The cache key problem:
This default key can cause issues when variations don't indicate different content:
- ?utm_source=google vs ?utm_source=facebook: same content, different keys!
- ?a=1&b=2 vs ?b=2&a=1: same content, different keys!
- ?_=1705350000 (cache-busting timestamps)

Result: Multiple cache entries for identical content → lower CHR → higher origin load.
- Sort query parameters: ?b=2&a=1 becomes ?a=1&b=2. Prevents duplicate entries from parameter reordering.
- Allowlist query parameters: if only version affects content, key on ?version=X and ignore all others.
- Key on selected request headers: Accept-Encoding (different compressions), Accept-Language (localized content).
- Key on selected cookies: country=US for geo-personalization.
- Normalize device type: key on a User-Agent classification, not the raw header.
    # Example: Cloudflare Page Rules cache key optimization
    # (illustrative pseudo-config; exact syntax and option names vary by CDN)
    # Remove marketing parameters from cache key
    cache_key:
      scheme: include
      host: include
      path: include
      query_string:
        include: ["version", "id", "format"]   # Only these affect caching
        exclude: ["utm_*", "fbclid", "gclid"]  # Ignored in key

    # Alternative: Akamai Property Manager approach
    behavior:
      cacheKeyQueryString:
        behavior: IGNORE_ALL_PRESERVE   # Ignore query, pass to origin

    # With header-based variants
    cache_key:
      additional_headers:
        - Accept-Encoding   # Different key for gzip vs brotli
        - Accept-Language   # Different key per language
      cookie_keys:
        - country           # Geo-personalization variant

The HTTP Vary header tells caches to key on specified request headers. Vary: User-Agent creates a separate cache entry for EVERY unique User-Agent, potentially millions of variants for a single URL. Never use Vary: User-Agent; instead, normalize to device classes (mobile, tablet, desktop) and use custom headers.
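As a rough illustration of cache key normalization (dropping tracking parameters, sorting the rest, and collapsing User-Agent into a device class), here is a minimal Python sketch. The names `IGNORED_PARAMS`, `device_class`, and `cache_key` are hypothetical, and the User-Agent heuristic is intentionally crude; production CDNs use maintained device-detection rules.

```python
from fnmatch import fnmatch
from urllib.parse import parse_qsl, urlencode, urlsplit

# Query parameters assumed not to affect the response body.
IGNORED_PARAMS = ["utm_*", "fbclid", "gclid", "_"]


def device_class(user_agent: str) -> str:
    """Collapse a raw User-Agent into one of three coarse classes."""
    ua = user_agent.lower()
    if "ipad" in ua or "tablet" in ua:
        return "tablet"
    if "mobi" in ua or "iphone" in ua or "android" in ua:
        return "mobile"
    return "desktop"


def cache_key(url: str, user_agent: str = "") -> str:
    """Build a normalized cache key from a URL and a User-Agent."""
    parts = urlsplit(url)
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not any(fnmatch(name, pattern) for pattern in IGNORED_PARAMS)
    ]
    query = urlencode(sorted(params))  # stable parameter order
    key = f"{parts.scheme}://{parts.netloc.lower()}{parts.path}"
    if query:
        key += "?" + query
    return f"{key}|device={device_class(user_agent)}"


print(cache_key("https://example.com/p?b=2&a=1&utm_source=google"))
print(cache_key("https://example.com/p?a=1&b=2&utm_source=facebook"))
```

Both example URLs map to the same key, so they share one cache entry instead of fragmenting into two.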
Cache key best practices:
Audit your cache keys: Use CDN analytics to identify URLs with unexpectedly low CHR; investigate cache key fragmentation.
Minimize key cardinality: Every unique cache key is a separate entry. Lower cardinality = higher hit rate.
Separate static and dynamic: Static assets should have simple keys (just URL). Dynamic content may need additional attributes.
Test cache key changes: Incorrect key configuration can cause users to receive wrong content. Test thoroughly in staging.
Document your strategy: Cache key configuration is operational knowledge that must be maintained.
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
Cache invalidation—ensuring cached content is removed or refreshed when the authoritative source changes—is notoriously challenging. CDNs offer multiple invalidation strategies, each with tradeoffs between immediacy, cost, and operational complexity.
Time-To-Live (TTL) Expiration
The simplest and most common invalidation mechanism: content expires after a configured duration.
How it works:
Cache-Control: max-age=N or s-maxage=N
Advantages:
Disadvantages:
When to use:
| Content Type | Recommended TTL | Rationale |
|---|---|---|
| Versioned static assets | 1 year (31536000s) | URL changes on update; safe to cache forever |
| Unversioned static assets | 1 week (604800s) | Manual invalidation if emergency update |
| API responses | 5-60 seconds | Balance freshness and performance |
| HTML pages | 60-300 seconds | Short enough for content updates |
| Real-time data | 0 (no-cache) | Always validate with origin |
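One way to apply a TTL table like this is to centralize it in a small policy map at the origin. The sketch below is an assumption-laden illustration: the category names, the `cache_control_for` helper, and the specific values chosen within the recommended ranges (30 seconds for APIs, 300 seconds for HTML) are all hypothetical.

```python
# TTLs in seconds, one per content category from the table above.
TTL_POLICY = {
    "versioned_static": 31536000,   # 1 year
    "unversioned_static": 604800,   # 1 week
    "api": 30,                      # picked from the 5-60 second range
    "html": 300,                    # picked from the 60-300 second range
    "realtime": 0,                  # always validate with origin
}


def cache_control_for(category: str) -> str:
    """Return a Cache-Control header value for a content category."""
    ttl = TTL_POLICY[category]
    if ttl == 0:
        return "no-cache"
    header = f"public, max-age={ttl}"
    if category == "versioned_static":
        # The URL changes on every update, so the body never does.
        header += ", immutable"
    return header


for name in TTL_POLICY:
    print(f"{name:20s} {cache_control_for(name)}")
```

Keeping the policy in one place makes it easier to audit and to change TTLs per content type without touching individual handlers.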
Cache tags (also called Surrogate Keys at Fastly) allow logical grouping of cached content. Tag a product page, its images, and related API responses with 'product-123'. One purge invalidates all related content across all URLs. This is dramatically more efficient than purging individual URLs and ensures consistency across related content.
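The idea behind tag-based purging can be sketched with an in-memory index from tag to URLs. This is a toy model under stated assumptions: `TaggedCache` and its methods are hypothetical, and a real CDN would propagate the purge to every edge rather than clear a single dictionary.

```python
from collections import defaultdict


class TaggedCache:
    """Cache whose entries can be purged by logical tag."""

    def __init__(self):
        self.entries = {}                      # url -> body
        self.urls_by_tag = defaultdict(set)    # tag -> {url, ...}

    def put(self, url: str, body: str, tags=()):
        self.entries[url] = body
        for tag in tags:
            self.urls_by_tag[tag].add(url)

    def get(self, url: str):
        return self.entries.get(url)

    def purge_tag(self, tag: str) -> int:
        """Remove every entry carrying the tag; return how many URLs it covered."""
        urls = self.urls_by_tag.pop(tag, set())
        for url in urls:
            self.entries.pop(url, None)
        return len(urls)


cache = TaggedCache()
cache.put("/products/123", "<html>...</html>", tags=["product-123"])
cache.put("/img/123.jpg", "jpeg-bytes", tags=["product-123", "images"])
cache.put("/api/products/123", "{...}", tags=["product-123"])
cache.put("/products/456", "<html>...</html>", tags=["product-456"])
print(cache.purge_tag("product-123"), cache.get("/products/456") is not None)
```

A single purge call removes the page, image, and API response for product 123 while leaving unrelated entries cached.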
Beyond basic TTL-based caching, production CDNs implement sophisticated patterns to handle edge cases, optimize efficiency, and maintain consistency under challenging conditions.
Request coalescing deep dive:
Request coalescing (also called request collapsing or request deduplication) is essential for handling viral content and flash crowds:
┌──────────────────────────────────────────────────────────┐
│ Edge Server Timeline │
├──────────────────────────────────────────────────────────┤
│ │
│ t=0ms: Request A arrives for /viral-video.mp4 (MISS) │
│ → Origin request initiated │
│ │
│ t=5ms: Request B arrives for /viral-video.mp4 │
│ → Sees pending request, JOINS WAIT GROUP │
│ │
│ t=10ms: Request C arrives for /viral-video.mp4 │
│ → Joins same wait group │
│ │
│ t=100ms: Origin response received │
│ → Content cached │
│ → All waiting requests (A, B, C) satisfied │
│ │
│ Result: 3 client requests, 1 origin request │
│ Without coalescing: 3 client requests, 3 origin requests │
└──────────────────────────────────────────────────────────┘
For viral content with 10,000 simultaneous requests, coalescing reduces origin load from 10,000 requests to 1. This is why CDNs can handle flash crowds that would devastate any origin server.
Request coalescing has failure modes. If the initial origin request is slow (e.g., 30 seconds), all coalesced requests wait 30 seconds. CDNs implement coalescing timeouts—if origin doesn't respond within threshold (e.g., 10 seconds), coalescing breaks and multiple origin requests are allowed. This prevents one slow request from blocking many users.
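A minimal thread-based sketch of coalescing with a wait timeout is shown below. It is illustrative only: `CoalescingFetcher` is a hypothetical class, it does not store the result in a cache afterwards, and real edge servers typically implement this inside an event loop rather than with threads.

```python
import threading


class CoalescingFetcher:
    """Collapse concurrent fetches for the same key into one origin request."""

    def __init__(self, fetch_origin, wait_timeout: float = 10.0):
        self.fetch_origin = fetch_origin
        self.wait_timeout = wait_timeout
        self.lock = threading.Lock()
        self.in_flight = {}  # key -> (done_event, result_holder)

    def get(self, key):
        with self.lock:
            pending = self.in_flight.get(key)
            leader = pending is None
            if leader:
                pending = (threading.Event(), {})
                self.in_flight[key] = pending
        done, result = pending

        if leader:
            try:
                result["value"] = self.fetch_origin(key)
            finally:
                with self.lock:
                    self.in_flight.pop(key, None)
                done.set()  # wake followers even if the fetch failed
            return result["value"]

        # Follower: wait for the leader, but only up to the timeout.
        if done.wait(self.wait_timeout) and "value" in result:
            return result["value"]
        # Leader was too slow or failed: break coalescing and fetch directly.
        return self.fetch_origin(key)
```

The timeout is the safeguard described above: followers stop waiting after `wait_timeout` seconds and go to the origin themselves, trading extra origin load for bounded client latency.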
While CDNs excel at static content, increasingly they cache dynamic content—personalized pages, API responses, and real-time data. This requires careful strategy to balance freshness, personalization, and efficiency.
The dynamic content challenge:
Dynamic content varies by user, time, or context. A product page might include:
| Strategy | Technique | Cache Key Impact | Use Case |
|---|---|---|---|
| Response Fragmentation | Separate cacheable/non-cacheable parts | Multiple keys, lower cardinality | Pages mixing static + personalized |
| Cookie-less Domain | Serve static assets from different domain | No cookie in key; maximum sharing | Images, CSS, JS files |
| Vary Header Control | Specify which headers create variants | Controlled fragmentation | Language, device, format variants |
| URL Versioning | Include version in URL path, not query string | Clean keys; long TTL | Static assets with updates |
| Micro-caching | Cache for 1-5 seconds | High hit rate, low staleness | High-traffic dynamic APIs |
| Edge Computing | Generate content at edge | Compute at edge; personalized | SSR, A/B testing, personalization |
Pattern: HTTP Cache Fragments with Edge Composition
Modern CDNs can assemble pages from independently cached fragments:
┌─────────────────────────────────────────────────────────────┐
│ Full Page │
├─────────────────────────────────────────────────────────────┤
│ ┌────────────────────────────────────────────────────┐ │
│ │ Header Fragment (shared, TTL: 1 hour) │ │
│ └────────────────────────────────────────────────────┘ │
│ ┌──────────────────────┐ ┌──────────────────────────┐ │
│ │ User Menu Fragment │ │ Search Fragment (shared) │ │
│ │ (private, no cache) │ │ (TTL: 1 minute) │ │
│ └──────────────────────┘ └──────────────────────────┘ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Content Fragment (shared, TTL: 5 minutes) │ │
│ └────────────────────────────────────────────────────┘ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Footer Fragment (shared, TTL: 1 hour) │ │
│ └────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Even 'dynamic' pages are usually 90%+ identical across users. The product description, layout, navigation, and images are the same—only shopping cart and recommendations differ. By fragmenting pages and caching the common 90%, CDNs deliver near-static performance for dynamic content. This insight drives modern JAMstack and edge computing architectures.
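Fragment composition can be sketched as a page assembled from independently cached pieces, using the TTLs from the diagram above. The `FragmentCache` class, the `compose_page` function, and the renderer callables are hypothetical stand-ins for whatever edge-side include or edge compute mechanism a given CDN provides.

```python
import time


class FragmentCache:
    """Cache rendered fragments with per-fragment TTLs."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.store = {}  # name -> (expires_at, html)

    def get_or_render(self, name: str, ttl: float, render):
        now = self.clock()
        hit = self.store.get(name)
        if hit is not None and hit[0] > now:
            return hit[1]
        html = render()
        self.store[name] = (now + ttl, html)
        return html


def compose_page(cache: FragmentCache, user: str, renderers: dict) -> str:
    """Assemble a page from shared (cached) and private (uncached) fragments."""
    parts = [
        cache.get_or_render("header", 3600, renderers["header"]),
        renderers["user_menu"](user),  # private: rendered per request
        cache.get_or_render("search", 60, renderers["search"]),
        cache.get_or_render("content", 300, renderers["content"]),
        cache.get_or_render("footer", 3600, renderers["footer"]),
    ]
    return "\n".join(parts)


renderers = {
    "header": lambda: "<header>Shop</header>",
    "user_menu": lambda user: f"<nav>Hello, {user}</nav>",
    "search": lambda: "<form>search</form>",
    "content": lambda: "<main>Widget</main>",
    "footer": lambda: "<footer>2025</footer>",
}
cache = FragmentCache()
print(compose_page(cache, "alice", renderers))
```

Only the user menu is rendered on every request; the other fragments are rendered once and reused until their TTL expires.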
With content cached across 200+ global locations, maintaining consistency is a significant challenge. Different users may see different content versions depending on which edge server they hit and the local cache state.
The consistency challenge:
Consider this timeline:
    t=0:    Price is $10.00; cached globally with TTL=300s
    t=60s:  Price updated to $12.00 at origin
    t=61s:  User A (Tokyo edge) sees $10.00 (cached)
    t=62s:  User B (NYC edge) sees $12.00 (edge had cache miss)
    t=180s: User A still sees $10.00; User B sees $12.00
    Result: Same product, different prices for up to 4 minutes (until the Tokyo entry expires at t=300s)
Consistency models for CDN caching:
| Model | Guarantee | Implementation | Tradeoff |
|---|---|---|---|
| Eventual Consistency | All edges will eventually have same content | TTL-based expiration; no active sync | Lowest cost; acceptable staleness window |
| Bounded Staleness | Content no older than X seconds | Short TTL + stale-while-revalidate | Predictable maximum staleness |
| Instant Consistency | All edges updated simultaneously | Purge on update + refill or edge compute | Highest cost; complex implementation |
| Strong Consistency | No stale content ever served | Cache-through with validation | High latency; defeats CDN benefits |
Implementing bounded staleness:
For most applications, bounded staleness provides acceptable consistency at reasonable cost:
- Set s-maxage to maximum acceptable staleness (e.g., 60 seconds)
- Add stale-while-revalidate for background updates

The version field pattern:
For content that must be consistent across API responses, include a version field:
    {
      "product_id": 123,
      "name": "Widget",
      "price": 12.00,
      "cache_version": "2025-01-17T10:30:00Z",
      "_meta": {
        "cached_at": "2025-01-17T10:32:15Z",
        "edge_location": "tokyo-01"
      }
    }
Clients can compare cache_version across responses and detect inconsistencies. If critical actions require consistency, clients can request with Cache-Control: no-cache to bypass the CDN, provided the CDN is configured to honor request directives (many ignore them by default).
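On the client side, the comparison might look like the following sketch. The helper name `find_version_conflicts` is hypothetical, and it assumes each response is already parsed into a dict with the `product_id` and `cache_version` fields shown above.

```python
def find_version_conflicts(responses: list) -> dict:
    """Return {product_id: [versions]} for products seen with more than one version."""
    versions = {}
    for response in responses:
        versions.setdefault(response["product_id"], set()).add(
            response["cache_version"]
        )
    return {
        product_id: sorted(seen)
        for product_id, seen in versions.items()
        if len(seen) > 1
    }


responses = [
    {"product_id": 123, "price": 10.00, "cache_version": "2025-01-17T10:25:00Z"},
    {"product_id": 123, "price": 12.00, "cache_version": "2025-01-17T10:30:00Z"},
    {"product_id": 456, "price": 5.00, "cache_version": "2025-01-17T09:00:00Z"},
]
print(find_version_conflicts(responses))
```

A non-empty result tells the client that at least one response came from a stale cache entry, which is the cue to refetch before a critical action such as checkout.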
CDNs face the CAP theorem: during network partitions between edge and origin, choose Availability (serve potentially stale cached content) or Consistency (return errors when unable to validate). Most CDNs choose availability—serving stale content is better than serving errors. Configure stale-if-error to control this tradeoff explicitly.
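The availability-versus-consistency choice can be expressed as a small decision function over an entry's age and the stale-while-revalidate and stale-if-error windows. This is a simplified sketch: `serve_decision` and its return labels are hypothetical, and real caches also account for must-revalidate, request directives, and revalidation already in progress.

```python
def serve_decision(age: int, max_age: int, swr: int = 0, sie: int = 0,
                   origin_healthy: bool = True) -> str:
    """Decide how to answer a request for a cached entry.

    age      seconds since the entry was stored
    max_age  freshness lifetime (max-age or s-maxage)
    swr      stale-while-revalidate window in seconds
    sie      stale-if-error window in seconds
    """
    if age < max_age:
        return "fresh"                      # serve from cache
    staleness = age - max_age
    if origin_healthy:
        if staleness <= swr:
            return "stale-while-revalidate" # serve stale, refresh in background
        return "revalidate"                 # wait for origin before serving
    if staleness <= sie:
        return "stale-if-error"             # availability over consistency
    return "error"                          # consistency over availability


print(serve_decision(30, 60))
print(serve_decision(90, 60, swr=60))
print(serve_decision(200, 60, sie=86400, origin_healthy=False))
print(serve_decision(200, 60, origin_healthy=False))
```

Setting `sie` to zero makes the function return an error as soon as the origin is unreachable and the entry is stale, which is the consistency-first end of the tradeoff.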
Effective caching requires continuous measurement and optimization. Key metrics reveal caching efficiency and guide configuration improvements.
| Metric | Formula | Target | Interpretation |
|---|---|---|---|
| Cache Hit Ratio (CHR) | Hits ÷ (Hits + Misses) | >90% static, >60% overall | Primary efficiency measure |
| Byte Hit Ratio | Bytes from cache ÷ Total bytes | >85% | Bandwidth-weighted efficiency |
| Origin Offload | 1 - (Origin requests ÷ Total requests) | >90% | Origin protection effectiveness |
| Cache Efficiency | Unique objects ÷ Total objects | Lower is better | Cache fragmentation indicator |
| Miss Latency | Avg latency on cache miss | <500ms | Origin performance impact |
| Hit Latency | Avg latency on cache hit | <50ms | Edge serving performance |
| Stale Ratio | Stale serves ÷ Total serves | Depends on SWR config | Freshness vs. performance tradeoff |
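The ratio formulas in the table translate directly into code. The sketch below assumes the raw counters are already available (for example from CDN logs); the `CacheStats` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int
    misses: int
    bytes_from_cache: int
    bytes_total: int
    origin_requests: int
    stale_serves: int = 0

    @property
    def total_requests(self) -> int:
        return self.hits + self.misses

    def hit_ratio(self) -> float:
        """Hits / (Hits + Misses)."""
        return self.hits / self.total_requests

    def byte_hit_ratio(self) -> float:
        """Bytes from cache / Total bytes."""
        return self.bytes_from_cache / self.bytes_total

    def origin_offload(self) -> float:
        """1 - (Origin requests / Total requests)."""
        return 1 - self.origin_requests / self.total_requests

    def stale_ratio(self) -> float:
        """Stale serves / Total serves."""
        return self.stale_serves / self.total_requests


stats = CacheStats(hits=900, misses=100, bytes_from_cache=850,
                   bytes_total=1000, origin_requests=80, stale_serves=50)
print(f"CHR {stats.hit_ratio():.1%}, byte hit {stats.byte_hit_ratio():.1%}, "
      f"offload {stats.origin_offload():.1%}, stale {stats.stale_ratio():.1%}")
```

Origin requests can be lower than misses when a shield tier or request coalescing absorbs some of them, which is why origin offload is tracked separately from CHR.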
Diagnosing cache problems:
Problem: Low CHR (50-70%)
Problem: High origin load despite good CHR
Problem: Stale content complaints
    # Check cache status from CDN response headers
    curl -sI https://example.com/page | grep -iE '^(cf-cache-status|x-cache|age):'
    # cf-cache-status: HIT          ← Cloudflare cache hit
    # x-cache: Hit from cloudfront  ← AWS CloudFront hit
    # age: 3600                     ← In cache for 1 hour

    # Common cache status values:
    # HIT     - Served from cache
    # MISS    - Fetched from origin (now cached)
    # EXPIRED - Cache expired, revalidated
    # BYPASS  - Cache bypassed (no-cache request)
    # DYNAMIC - Content marked as uncacheable

    # Query CDN analytics API for cache metrics
    curl -s -X GET "https://api.cloudflare.com/client/v4/zones/{zone}/analytics/dashboard" \
      -H "Authorization: Bearer {token}" \
      | jq '.result.totals.requests.cached / .result.totals.requests.all * 100'
    # Example output: 92.4 (cache hit ratio percentage)

Cache optimization is iterative: Measure current CHR → Identify lowest-CHR URLs → Investigate cache key/TTL issues → Implement fixes → Measure impact → Repeat. Target improvements should be specific and measurable: 'Increase CHR from 85% to 92% for product images by removing utm parameters from cache key.'
Content caching transforms CDN infrastructure from distributed servers into an intelligent content delivery system. Effective caching strategy determines whether your CDN achieves its performance and cost potential.
What's next:
With caching fundamentals mastered, we turn to the CDN industry landscape: CDN Providers. The next page examines major commercial CDN offerings, their architectures, pricing models, and the criteria for selecting the right CDN for your specific requirements.
You now understand the complete caching lifecycle in CDNs—from HTTP headers through multi-tier cache hierarchies to sophisticated invalidation strategies. This knowledge enables you to configure, optimize, and troubleshoot CDN caching for any content type and scale. Cache effectively, and you unlock the full potential of global content delivery.