Content Delivery Networks (CDN)

Level: Intermediate

Duration: 90 mins

Topic: Content Delivery Networks

Content Caching: The Intelligence Behind Efficient Delivery

The Art and Science of Caching

A global CDN with 300 edge locations serving 50 million requests per second faces a fundamental challenge: how do you store the right content at each location to maximize cache hits while minimizing staleness?

The answer to this question determines whether a CDN achieves 95% cache efficiency or 50%—the difference between extraordinary user experience and mediocre performance. Caching is where the physical infrastructure we've studied becomes intelligent, making real-time decisions about what to store, where to store it, and when to expire it.

Caching is deceptively complex. On the surface, it's simple: store a copy of content closer to users. In practice, it involves intricate decisions about cache keys, TTLs, invalidation strategies, consistency guarantees, and storage hierarchies—each choice affecting performance, freshness, and cost.

What You Will Master

This page covers: the fundamental principles of HTTP caching and how CDNs extend them; cache hierarchies from memory to SSD to origin shield tiers; cache key design and the art of maximizing hit rates; cache invalidation strategies from TTL-based to instant purging; advanced caching patterns including stale-while-revalidate and request coalescing; and the tradeoffs between consistency, performance, and efficiency that define caching strategy.

HTTP Caching Fundamentals

CDN caching is built upon the HTTP caching model defined in RFC 9111, which obsoleted RFC 7234 in 2022. Understanding these fundamentals is essential for effective CDN configuration and troubleshooting.

The cacheability decision:

HTTP defines when a response can be cached and for how long through response headers:

Cache-Control — The primary mechanism for cache directives
Expires — Legacy absolute expiration time (superseded by Cache-Control)
ETag — Entity tag for validation-based caching
Last-Modified — Timestamp for validation-based caching

cache_headers_examples.http
# Highly cacheable static asset (aggressive caching)
HTTP/1.1 200 OK
Cache-Control: public, max-age=31536000, immutable
Content-Type: application/javascript
ETag: "abc123"
# Cached for 1 year; 'immutable' tells browsers not to revalidate
 
# Dynamic content with short cache (balance freshness/performance)
HTTP/1.1 200 OK
Cache-Control: public, max-age=60, s-maxage=300
Content-Type: application/json
Vary: Accept-Encoding
# Browsers cache 60s; CDN caches 300s (s-maxage overrides for shared caches)
 
# Private user-specific content (do not cache on CDN)
HTTP/1.1 200 OK
Cache-Control: private, no-cache, no-store
Set-Cookie: session=xyz
# 'private' keeps the response out of shared caches such as the CDN; 'no-store' also forbids the browser from storing it
 
# Stale content allowed during revalidation
HTTP/1.1 200 OK
Cache-Control: public, max-age=600, stale-while-revalidate=3600
# Fresh for 10 minutes; after that, may be served stale for up to 1 more hour while revalidating in background

Cache-Control directive reference:

Cache-Control Directives and Their Effects
Directive	Target	Effect	CDN Behavior
`public`	Response	Can be cached by any cache	CDN will cache the response
`private`	Response	Only browser can cache	CDN will NOT cache the response
`no-cache`	Response	Cache but revalidate before use	CDN caches but checks origin on every request
`no-store`	Response	Do not cache at all	CDN never stores the response
`max-age=N`	Response	Cache for N seconds	CDN sets TTL to N seconds
`s-maxage=N`	Response	Shared cache TTL (overrides max-age)	CDN uses this instead of max-age
`immutable`	Response	Content never changes	CDN never revalidates until TTL expires
`must-revalidate`	Response	Never use stale content	CDN returns error if unable to revalidate
`stale-while-revalidate=N`	Response	Serve stale for N seconds during refresh	CDN serves stale content while refreshing
`stale-if-error=N`	Response	Serve stale if origin errors for N seconds	CDN serves stale when origin is unavailable

The s-maxage Override

The s-maxage directive applies only to shared caches such as CDNs and proxies, where it overrides max-age; browsers ignore it. Use it to set longer CDN TTLs while keeping shorter browser TTLs. For example, Cache-Control: public, max-age=60, s-maxage=3600 means browsers cache for 1 minute while the CDN caches for 1 hour. This enables aggressive CDN caching while browsers re-check with the CDN every minute, so a purge at the CDN reaches users within about a minute.
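The precedence between max-age and s-maxage can be made concrete with a small parser. This is a minimal sketch, not a complete RFC-compliant implementation: the function names are hypothetical, the header is assumed to be well formed, and heuristic freshness (no explicit lifetime) is ignored.

```python
def parse_cache_control(header: str) -> dict:
    """Split a Cache-Control value into {directive: value or None}."""
    directives = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"') or None
    return directives


def shared_cache_ttl(header: str) -> int:
    """Seconds a shared cache (CDN) may reuse the response without revalidating."""
    d = parse_cache_control(header)
    if "private" in d or "no-store" in d or "no-cache" in d:
        return 0
    for name in ("s-maxage", "max-age"):  # s-maxage wins for shared caches
        if d.get(name) is not None:
            return max(0, int(d[name]))
    return 0  # no explicit lifetime; heuristic freshness is out of scope here


def browser_ttl(header: str) -> int:
    """Seconds a browser cache may reuse the response; s-maxage is ignored."""
    d = parse_cache_control(header)
    if "no-store" in d or "no-cache" in d or d.get("max-age") is None:
        return 0
    return max(0, int(d["max-age"]))


header = "public, max-age=60, s-maxage=3600"
print(shared_cache_ttl(header), browser_ttl(header))  # 3600 60
```

Real CDNs layer their own overrides (edge TTL settings, default TTLs) on top of these headers, so treat the result as what the origin asked for rather than what a given provider will do.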

Validation-based caching:

When cache content expires (TTL reached), the cache can validate whether the content has changed rather than fetching a full copy:

Client → CDN: GET /resource
               If-None-Match: "abc123"     # ETag from previous response
               If-Modified-Since: Wed, 15 Jan 2025 10:00:00 GMT

CDN → Origin: GET /resource
              If-None-Match: "abc123"
              If-Modified-Since: Wed, 15 Jan 2025 10:00:00 GMT

Origin → CDN: HTTP/1.1 304 Not Modified
              ETag: "abc123"
              Cache-Control: public, max-age=3600

CDN → Client: HTTP/1.1 304 Not Modified
              ETag: "abc123"
              Cache-Control: public, max-age=3600

Key insight: The 304 response has no body—just headers. For large resources, conditional validation saves significant bandwidth while ensuring freshness.
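The origin's side of this exchange can be sketched as follows. It is an illustration under stated assumptions: `make_etag` and `origin_respond` are hypothetical names, the ETag is derived from a content hash, and weak validators (`W/"..."`), `*`, and If-Modified-Since are not handled.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive a strong ETag from the content (one common approach, assumed here)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def origin_respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, dict, bytes]:
    """Return (status, headers, payload) for a possibly conditional GET."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "public, max-age=3600"}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates:
            return 304, headers, b""  # unchanged: headers only, no body
    return 200, headers, body  # first fetch, or content changed


status, headers, payload = origin_respond(b"v1 of the resource", None)
status2, _, payload2 = origin_respond(b"v1 of the resource", headers["ETag"])
print(status, status2, len(payload2))  # 200 304 0
```

The bandwidth saving comes from the empty payload on the 304 path; the cache simply resets the freshness lifetime of the copy it already holds.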

Cache Hierarchy Design

Enterprise CDNs employ multi-tiered cache hierarchies that balance access speed, storage capacity, and origin offload. Understanding this hierarchy is essential for optimizing cache efficiency.

The typical four-tier cache hierarchy:

[Diagram: request path through the tiers: RAM cache → SSD cache → Origin Shield → Origin]

Tier 1: Memory Cache (RAM)

The fastest tier stores the most frequently accessed content in server RAM:

Capacity: 100-512GB per server (subset of total content)
Access time: <1 microsecond
Hit rate: 30-50% of requests (for popular content distributions)
Eviction policy: LRU (Least Recently Used) or LFU (Least Frequently Used)
Persistence: Lost on server restart

Tier 2: Local Disk Cache (NVMe SSD)

The primary persistent cache tier stores the working set:

Capacity: 30-100TB per server (larger content catalog)
Access time: 50-200 microseconds (random read)
Hit rate: 40-60% of remaining requests
Eviction policy: LRU with popularity scoring
Persistence: Survives server restart; enables instant cache warming

Tier 3: Origin Shield

A regional cache tier that aggregates requests from multiple edge servers:

Purpose: Protect origin from thundering herd; improve cache efficiency
Behavior: Edge servers fetch from shield on cache miss (not directly from origin)
Location: 3-10 global shield locations (regional coverage)
Benefit: Edges with 80% CHR each, backed by a shield that absorbs 90% of their misses, combine to 98% origin offload

The Origin Shield Calculation

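Without origin shield: If 10 edge servers each have 80% CHR, the origin receives the remaining 20% of total traffic, and because each edge fills its cache independently, the same object can be fetched from origin up to 10 times. With origin shield: All 10 edges send their misses to the shield's shared cache. If the shield answers 90% of those misses, the origin sees only 20% × 10% = 2% of total traffic, which is 98% offload; for popular content the shield's hit rate is typically higher still. This multiplicative effect makes origin shields essential at scale.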
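The arithmetic behind the 98% offload figure can be checked with a few lines. The sketch assumes the shield answers 90% of edge misses, a hypothetical value chosen so the numbers line up; `origin_fraction` is an illustrative helper, not a CDN API.

```python
def origin_fraction(edge_chr: float, shield_chr: float = 0.0) -> float:
    """Fraction of client requests that reach the origin.

    A request reaches the origin only if it misses at the edge AND at the shield.
    """
    return (1.0 - edge_chr) * (1.0 - shield_chr)


no_shield = origin_fraction(0.80)
with_shield = origin_fraction(0.80, shield_chr=0.90)
print(f"origin traffic without shield: {no_shield:.0%}")   # 20%
print(f"origin traffic with shield:    {with_shield:.0%}")  # 2%
print(f"origin offload with shield:    {1 - with_shield:.0%}")  # 98%
```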

Cache tier selection logic:

When a request arrives, the edge server checks tiers in order:

1. Check RAM cache → HIT? Serve immediately (fastest path)
                   → MISS? Check next tier

2. Check SSD cache  → HIT? Serve and promote to RAM
                   → MISS? Check shield tier

3. Check Shield    → HIT? Serve and cache locally
                   → MISS? Fetch from origin

4. Fetch Origin    → Cache at shield + local SSD + RAM (if hot)
                   → Serve to client
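The lookup order above can be sketched with small in-memory LRU tiers. Everything here is illustrative: `LRUTier` and `EdgeServer` are hypothetical classes, capacities are counted in objects rather than bytes, the origin is a plain dict, and "hot" is approximated by promoting an object to RAM on its second local access.

```python
from collections import OrderedDict


class LRUTier:
    """A fixed-capacity cache tier with least-recently-used eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used


class EdgeServer:
    def __init__(self, shield: LRUTier, origin: dict):
        self.ram = LRUTier(capacity=2)
        self.ssd = LRUTier(capacity=4)
        self.shield = shield  # shared by several edge servers
        self.origin = origin

    def fetch(self, key):
        """Return (body, tier that served the request)."""
        body = self.ram.get(key)
        if body is not None:
            return body, "ram"
        body = self.ssd.get(key)
        if body is not None:
            self.ram.put(key, body)  # promote SSD hit to RAM
            return body, "ssd"
        body = self.shield.get(key)
        if body is not None:
            self.ssd.put(key, body)  # cache locally
            return body, "shield"
        body = self.origin[key]
        self.shield.put(key, body)
        self.ssd.put(key, body)
        return body, "origin"


origin = {"/logo.png": b"PNG"}
shield = LRUTier(capacity=8)
tokyo, osaka = EdgeServer(shield, origin), EdgeServer(shield, origin)
print([tokyo.fetch("/logo.png")[1] for _ in range(3)])  # ['origin', 'ssd', 'ram']
print(osaka.fetch("/logo.png")[1])  # shield
```

The second edge never touches the origin for an object the first edge already pulled through the shield, which is the offload effect described in the shield section.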

Promotion and demotion:

Content accessed frequently is promoted up tiers (SSD → RAM)
Content not accessed recently is demoted down tiers or evicted
Each tier runs independent eviction based on its constraints
Hot content exists in all tiers simultaneously; cold content only on disk

Cache Key Design

The cache key is the unique identifier used to store and retrieve cached content. Cache key design directly determines hit rate—poor key design can cause cache fragmentation that devastates performance.

Default cache key components:

By default, most CDNs construct cache keys from:

Scheme + Host + Path + Query String
https://example.com/images/logo.png?v=123

The cache key problem:

This default key can cause issues when variations don't indicate different content:

?utm_source=google vs ?utm_source=facebook — Same content, different keys!
Query parameter order: ?a=1&b=2 vs ?b=2&a=1 — Same content, different keys!
Tracking parameters: ?_=1705350000 (cache-busting timestamps)

Result: Multiple cache entries for identical content → lower CHR → higher origin load.

Cache Key Optimization Strategies

•Query string stripping — Remove tracking parameters (utm_*, fbclid, gclid) from cache key while passing to origin. Same content cached once regardless of marketing attribution.
•Query string sorting — Normalize parameter order: ?b=2&a=1 becomes ?a=1&b=2. Prevents duplicate entries from parameter reordering.
•Query string whitelisting — Only include known-relevant parameters in cache key. If only version affects content, key on ?version=X and ignore all others.
•Header-based keying — Include specific headers in cache key. Common: Accept-Encoding (different compressions), Accept-Language (localized content).
•Cookie-based keying — Include specific cookies in cache key (use sparingly—cookies are high-cardinality). Example: country=US for geo-personalization.
•Device-based keying — Include device type (mobile/desktop) in cache key for responsive content. Use User-Agent classification, not raw header.
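Stripping, sorting, and whitelisting can be combined in one normalization step. The following is a minimal sketch rather than any vendor's algorithm: `cache_key`, `TRACKING_PATTERNS`, and the `allow` parameter are hypothetical names, and the pattern list is only an example.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qsl, urlencode, urlsplit

# Example tracking parameters to drop; "_" is the cache-busting timestamp parameter.
TRACKING_PATTERNS = ("utm_*", "fbclid", "gclid", "_")


def cache_key(url: str, allow=None) -> str:
    """Build a normalized cache key from a URL.

    allow: optional set of parameter names; if given, only these are kept (whitelisting).
           Otherwise parameters matching TRACKING_PATTERNS are stripped.
    """
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    if allow is not None:
        params = [(k, v) for k, v in params if k in allow]
    else:
        params = [
            (k, v) for k, v in params
            if not any(fnmatchcase(k, pattern) for pattern in TRACKING_PATTERNS)
        ]
    params.sort()  # ?b=2&a=1 and ?a=1&b=2 map to the same key
    base = f"{parts.scheme}://{parts.netloc.lower()}{parts.path}"
    query = urlencode(params)
    return f"{base}?{query}" if query else base


print(cache_key("https://example.com/p?utm_source=google&b=2&a=1"))
print(cache_key("https://example.com/p?a=1&b=2&utm_source=facebook"))
# both print https://example.com/p?a=1&b=2
```

The normalized key is used only for cache lookup; the original query string can still be forwarded to the origin so analytics keep working.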

cache_key_config.txt
# Example: Cloudflare-style cache key optimization (illustrative pseudo-config, not literal vendor syntax)
# Keep only content-affecting parameters in the cache key
cache_key:
  scheme: include
  host: include
  path: include
  query_string:
    include: ["version", "id", "format"]    # Only these affect caching
    exclude: ["utm_*", "fbclid", "gclid"]   # Ignored in key
  
# Alternative: Akamai Property Manager approach (illustrative; check the current behavior reference)
behavior:
  cacheKeyQueryString:
    behavior: IGNORE_ALL            # Drop the query string from the cache key; still passed to origin
 
# With header-based variants
cache_key:
  additional_headers:
    - Accept-Encoding              # Different key for gzip vs brotli
    - Accept-Language             # Different key per language
  cookie_keys:
    - country                     # Geo-personalization variant

The Vary Header Trap

The HTTP Vary header tells caches to key on specified request headers. Vary: User-Agent creates a separate cache entry for EVERY unique User-Agent—potentially millions of variants for a single URL. Never use Vary: User-Agent; instead, normalize to device classes (mobile, tablet, desktop) and use custom headers.
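Normalizing to device classes might look like the sketch below. The substring rules are deliberately crude and are an assumption for illustration; production systems usually rely on the CDN's built-in device detection. `device_class` and `variant_key` are hypothetical names.

```python
def device_class(user_agent: str) -> str:
    """Collapse a raw User-Agent string into one of three classes."""
    ua = user_agent.lower()
    # Android tablets typically omit the "Mobile" token that Android phones include.
    if "ipad" in ua or "tablet" in ua or ("android" in ua and "mobi" not in ua):
        return "tablet"
    if "mobi" in ua or "iphone" in ua:
        return "mobile"
    return "desktop"


def variant_key(url: str, user_agent: str) -> str:
    """Cache key with at most three variants per URL instead of one per User-Agent."""
    return f"{url}|device={device_class(user_agent)}"


iphone = "Mozilla/5.0 (iPhone; CPU iPhone OS 17_0 like Mac OS X) AppleWebKit/605.1.15 Mobile/15E148 Safari/604.1"
pixel = "Mozilla/5.0 (Linux; Android 14; Pixel 8) AppleWebKit/537.36 Chrome/120.0 Mobile Safari/537.36"
print(variant_key("https://example.com/", iphone) == variant_key("https://example.com/", pixel))  # True
```

Two different phones now share one cache entry, where `Vary: User-Agent` would have stored two.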

Cache key best practices:

Audit your cache keys: Use CDN analytics to identify URLs with unexpectedly low CHR; investigate cache key fragmentation.
Minimize key cardinality: Every unique cache key is a separate entry. Lower cardinality = higher hit rate.
Separate static and dynamic: Static assets should have simple keys (just URL). Dynamic content may need additional attributes.
Test cache key changes: Incorrect key configuration can cause users to receive wrong content. Test thoroughly in staging.
Document your strategy: Cache key configuration is operational knowledge that must be maintained.

Cache Invalidation Strategies

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

Cache invalidation—ensuring cached content is removed or refreshed when the authoritative source changes—is notoriously challenging. CDNs offer multiple invalidation strategies, each with tradeoffs between immediacy, cost, and operational complexity.

Time-To-Live (TTL) Expiration

The simplest and most common invalidation mechanism: content expires after a configured duration.

How it works:

Origin includes Cache-Control: max-age=N or s-maxage=N
CDN caches content and starts the TTL countdown
When TTL expires, cached content is marked stale
Next request either revalidates (conditional request) or fetches full content

Advantages:

Zero operational overhead (automatic)
Predictable behavior
No API calls or purge operations required

Disadvantages:

Content can be stale for up to TTL duration
Shorter TTLs increase origin load
Cannot handle urgent content corrections

When to use:

Content with predictable update schedules
Long-lived static assets (JS, CSS, images)
Content where slight staleness is acceptable

Content Type	Recommended TTL	Rationale
Versioned static assets	1 year (31536000s)	URL changes on update; safe to cache forever
Unversioned static assets	1 week (604800s)	Manual invalidation if emergency update
API responses	5-60 seconds	Balance freshness and performance
HTML pages	60-300 seconds	Short enough for content updates
Real-time data	0 (no-cache)	Always validate with origin

Cache Tags: The Power Tool

Cache tags (also called Surrogate Keys at Fastly) allow logical grouping of cached content. Tag a product page, its images, and related API responses with 'product-123'. One purge invalidates all related content across all URLs. This is dramatically more efficient than purging individual URLs and ensures consistency across related content.
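Conceptually, tag-based purging needs a reverse index from tag to cache keys. The sketch below shows that idea on a single in-memory cache; `TaggedCache` is a hypothetical class, and a real CDN would distribute both the index and the purge across all edge locations.

```python
from collections import defaultdict


class TaggedCache:
    def __init__(self):
        self.store = {}                      # cache key -> cached body
        self.tags_of = {}                    # cache key -> set of tags
        self.keys_by_tag = defaultdict(set)  # tag -> cache keys carrying it

    def put(self, key, body, tags=()):
        self._forget(key)  # drop stale tag links if the key is being replaced
        self.store[key] = body
        self.tags_of[key] = set(tags)
        for tag in tags:
            self.keys_by_tag[tag].add(key)

    def get(self, key):
        return self.store.get(key)

    def purge_tag(self, tag) -> int:
        """Remove every entry carrying the tag; return how many were purged."""
        keys = list(self.keys_by_tag.get(tag, ()))
        for key in keys:
            self._forget(key)
        return len(keys)

    def _forget(self, key):
        self.store.pop(key, None)
        for tag in self.tags_of.pop(key, ()):
            self.keys_by_tag[tag].discard(key)


cache = TaggedCache()
cache.put("/products/123", b"<html>", tags=["product-123"])
cache.put("/img/123.jpg", b"JPEG", tags=["product-123", "images"])
cache.put("/api/products/123", b"{}", tags=["product-123"])
cache.put("/home", b"<html>", tags=["home"])
print(cache.purge_tag("product-123"))  # 3
```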

Advanced Caching Patterns

Beyond basic TTL-based caching, production CDNs implement sophisticated patterns to handle edge cases, optimize efficiency, and maintain consistency under challenging conditions.

•Request Coalescing (Request Collapsing) — When multiple concurrent requests arrive for the same uncached content, only ONE request is sent to origin. Other requests wait for the first response. Prevents origin overload from flash crowds. Critical for live events and viral content.
•Negative Caching — Cache 404 and other error responses for short durations. Prevents repeated origin requests for non-existent content. Typical TTL: 30-300 seconds. Protects origin from attacks probing for non-existent resources.
•Grace Mode / Stale-If-Error — If origin is unavailable, serve stale cached content rather than error. Maintains user experience during origin outages. Stale content is better than no content for most use cases.
•Micro-Caching — Cache dynamic content for very short durations (1-5 seconds). Even 1-second caching collapses simultaneous requests. Effective for high-traffic pages where perfect freshness isn't required.
•Edge-Side Includes (ESI) — Composite pages assembled at the edge from cached fragments. Different fragments can have different TTLs. Enables mixing personalized and shared content efficiently.
•Predictive Pre-Fetching — Analyze request patterns to predict next content requests. Pre-fetch content to edge caches before users request it. Reduces cache miss latency for predictable navigation.

Request coalescing deep dive:

Request coalescing (also called request collapsing or request deduplication) is essential for handling viral content and flash crowds:

┌──────────────────────────────────────────────────────────┐
│                   Edge Server Timeline                    │
├──────────────────────────────────────────────────────────┤
│                                                          │
│ t=0ms:   Request A arrives for /viral-video.mp4 (MISS)  │
│          → Origin request initiated                      │
│                                                          │
│ t=5ms:   Request B arrives for /viral-video.mp4         │
│          → Sees pending request, JOINS WAIT GROUP        │
│                                                          │
│ t=10ms:  Request C arrives for /viral-video.mp4         │
│          → Joins same wait group                         │
│                                                          │
│ t=100ms: Origin response received                        │
│          → Content cached                                │
│          → All waiting requests (A, B, C) satisfied      │
│                                                          │
│ Result: 3 client requests, 1 origin request              │
│ Without coalescing: 3 client requests, 3 origin requests │
└──────────────────────────────────────────────────────────┘

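For viral content with 10,000 simultaneous requests arriving at one cache node, coalescing reduces that node's origin load from 10,000 requests to 1; with an origin shield in front of the origin, the same collapse happens again across nodes. This is why CDNs can handle flash crowds that would devastate any origin server.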

Coalescing Timeout Considerations

Request coalescing has failure modes. If the initial origin request is slow (e.g., 30 seconds), all coalesced requests wait 30 seconds. CDNs implement coalescing timeouts—if origin doesn't respond within threshold (e.g., 10 seconds), coalescing breaks and multiple origin requests are allowed. This prevents one slow request from blocking many users.
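Coalescing with a wait timeout can be sketched with asyncio. This is an illustration under assumptions, not how any particular CDN implements it: `CoalescingFetcher`, `fetch_origin`, and `wait_timeout` are hypothetical names, and the response is not stored in a cache afterwards.

```python
import asyncio


class CoalescingFetcher:
    def __init__(self, fetch_origin, wait_timeout: float = 10.0):
        self.fetch_origin = fetch_origin  # async callable: key -> body
        self.wait_timeout = wait_timeout  # how long followers wait for the leader
        self.in_flight = {}               # key -> Task of the leader's origin fetch
        self.origin_calls = 0

    async def _fetch(self, key):
        self.origin_calls += 1
        return await self.fetch_origin(key)

    def _done(self, key, task):
        if self.in_flight.get(key) is task:
            del self.in_flight[key]

    async def get(self, key):
        task = self.in_flight.get(key)
        if task is None:
            # First request for this key becomes the leader.
            task = asyncio.ensure_future(self._fetch(key))
            self.in_flight[key] = task
            task.add_done_callback(lambda t: self._done(key, t))
            return await task
        try:
            # Followers join the wait group; shield() keeps the leader's fetch
            # alive if this follower gives up waiting.
            return await asyncio.wait_for(asyncio.shield(task), self.wait_timeout)
        except asyncio.TimeoutError:
            return await self._fetch(key)  # timeout: go to origin independently


async def slow_origin(key):
    await asyncio.sleep(0.05)  # stands in for a 100 ms origin round trip
    return f"body of {key}"


async def demo():
    fetcher = CoalescingFetcher(slow_origin)
    bodies = await asyncio.gather(*(fetcher.get("/viral-video.mp4") for _ in range(3)))
    return len(bodies), fetcher.origin_calls


print(asyncio.run(demo()))  # (3, 1): three client requests, one origin request
```

Lowering `wait_timeout` below the origin's response time makes followers break out of the group and fetch on their own, which trades origin load for bounded client latency.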

Caching Dynamic Content

While CDNs excel at static content, increasingly they cache dynamic content—personalized pages, API responses, and real-time data. This requires careful strategy to balance freshness, personalization, and efficiency.

The dynamic content challenge:

Dynamic content varies by user, time, or context. A product page might include:

Static content: Product description, images (cache forever)
Semi-dynamic: Price, availability (cache 60 seconds)
User-specific: Cart status, recommendations (don't cache on CDN)

Dynamic Content Caching Strategies
Strategy	Technique	Cache Key Impact	Use Case
Response Fragmentation	Separate cacheable/non-cacheable parts	Multiple keys, lower cardinality	Pages mixing static + personalized
Cookie-less Domain	Serve static assets from different domain	No cookie in key; maximum sharing	Images, CSS, JS files
Vary Header Control	Specify which headers create variants	Controlled fragmentation	Language, device, format variants
URL Versioning	Put the version in the path or filename, not the query string	Clean keys; long TTL	Static assets with updates
Micro-caching	Cache for 1-5 seconds	High hit rate, low staleness	High-traffic dynamic APIs
Edge Computing	Generate content at edge	Compute at edge; personalized	SSR, A/B testing, personalization

Pattern: HTTP Cache Fragments with Edge Composition

Modern CDNs can assemble pages from independently cached fragments:

┌─────────────────────────────────────────────────────────────┐
│                         Full Page                            │
├─────────────────────────────────────────────────────────────┤
│  ┌────────────────────────────────────────────────────┐     │
│  │     Header Fragment (shared, TTL: 1 hour)          │     │
│  └────────────────────────────────────────────────────┘     │
│  ┌──────────────────────┐  ┌──────────────────────────┐    │
│  │ User Menu Fragment   │  │ Search Fragment (shared) │    │
│  │ (private, no cache)  │  │ (TTL: 1 minute)          │    │
│  └──────────────────────┘  └──────────────────────────┘    │
│  ┌────────────────────────────────────────────────────┐     │
│  │     Content Fragment (shared, TTL: 5 minutes)      │     │
│  └────────────────────────────────────────────────────┘     │
│  ┌────────────────────────────────────────────────────┐     │
│  │     Footer Fragment (shared, TTL: 1 hour)          │     │
│  └────────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────┘

Most fragments cached and shared across all users
User Menu fetched client-side (JavaScript) or excluded from cache
Page delivered fast; small personalized parts lazy-loaded

The 90% Cacheable Insight

Even 'dynamic' pages are usually 90%+ identical across users. The product description, layout, navigation, and images are the same—only shopping cart and recommendations differ. By fragmenting pages and caching the common 90%, CDNs deliver near-static performance for dynamic content. This insight drives modern JAMstack and edge computing architectures.

Cache Consistency and Coherence

With content cached across 200+ global locations, maintaining consistency is a significant challenge. Different users may see different content versions depending on which edge server they hit and the local cache state.

The consistency challenge:

Consider this timeline:

t=0:    Price is $10.00; cached at the edges with TTL=300s
t=60s:  Price updated to $12.00 at origin
t=61s:  User A (Tokyo edge) sees $10.00 (cached)
t=62s:  User B (NYC edge) sees $12.00 (entry had been evicted; refetched from origin)
t=180s: User A still sees $10.00; User B sees $12.00

Result: Same product, different prices for up to 4 minutes (until Tokyo's copy expires at t=300s)

Consistency models for CDN caching:

CDN Consistency Models
Model	Guarantee	Implementation	Tradeoff
Eventual Consistency	All edges will eventually have same content	TTL-based expiration; no active sync	Lowest cost; acceptable staleness window
Bounded Staleness	Content no older than X seconds	Short TTL + stale-while-revalidate	Predictable maximum staleness
Instant Consistency	All edges updated simultaneously	Purge on update + refill or edge compute	Highest cost; complex implementation
Strong Consistency	No stale content ever served	Cache-through with validation	High latency; defeats CDN benefits

Implementing bounded staleness:

For most applications, bounded staleness provides acceptable consistency at reasonable cost:

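Choose s-maxage and stale-while-revalidate so that their sum equals the maximum acceptable staleness (e.g., s-maxage=45 plus stale-while-revalidate=15 for a 60-second bound)
Use stale-while-revalidate for background updates
For critical updates, trigger instant purge via API
Result: Content is at most about 60 seconds stale, and fresher whenever it is explicitly purged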
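One way to reason about the staleness bound is to classify a cached object by its age. The sketch assumes the worst case is roughly the freshness lifetime plus the stale-while-revalidate window (a stale copy can be served right up to the end of that window); the function names and numbers are illustrative.

```python
def cache_state(age: int, s_maxage: int, swr: int = 0) -> str:
    """Classify a cached object by age in seconds.

    fresh                  -> serve from cache, no origin contact
    stale-while-revalidate -> serve the stale copy now, refresh in the background
    expired                -> must fetch or revalidate before serving
    """
    if age < s_maxage:
        return "fresh"
    if age < s_maxage + swr:
        return "stale-while-revalidate"
    return "expired"


def max_staleness(s_maxage: int, swr: int = 0) -> int:
    """Upper bound in seconds on the age of any response served from cache."""
    return s_maxage + swr


for age in (30, 50, 60):
    print(age, cache_state(age, s_maxage=45, swr=15))
# 30 fresh
# 50 stale-while-revalidate
# 60 expired
print(max_staleness(45, 15))  # 60
```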

The version field pattern:

For content that must be consistent across API responses, include a version field:

{
  "product_id": 123,
  "name": "Widget",
  "price": 12.00,
  "cache_version": "2025-01-17T10:30:00Z",
  "_meta": {
    "cached_at": "2025-01-17T10:32:15Z",
    "edge_location": "tokyo-01"
  }
}

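Clients can compare cache_version across responses and detect inconsistencies. If critical actions require consistency, clients can send Cache-Control: no-cache on the request to force revalidation, provided the CDN is configured to honor request directives (many ignore them by default); otherwise, route such calls to an uncached endpoint.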
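A client-side check over responses shaped like the JSON above might look like this. It is a sketch: `find_stale_edges` is a hypothetical helper, and it assumes every response carries `cache_version` as an ISO 8601 UTC timestamp and `_meta.edge_location`.

```python
from datetime import datetime


def parse_version(value: str) -> datetime:
    """Parse an ISO 8601 timestamp such as 2025-01-17T10:30:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_stale_edges(responses: list) -> list:
    """Return edge locations whose cache_version is older than the newest one seen."""
    newest = max(parse_version(r["cache_version"]) for r in responses)
    return [
        r["_meta"]["edge_location"]
        for r in responses
        if parse_version(r["cache_version"]) < newest
    ]


responses = [
    {"price": 10.00, "cache_version": "2025-01-17T10:25:00Z",
     "_meta": {"edge_location": "tokyo-01"}},
    {"price": 12.00, "cache_version": "2025-01-17T10:30:00Z",
     "_meta": {"edge_location": "nyc-03"}},
]
print(find_stale_edges(responses))  # ['tokyo-01']
```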

CAP Theorem Applies to CDNs

CDNs face the CAP theorem: during network partitions between edge and origin, choose Availability (serve potentially stale cached content) or Consistency (return errors when unable to validate). Most CDNs choose availability—serving stale content is better than serving errors. Configure stale-if-error to control this tradeoff explicitly.

Measuring Cache Performance

Effective caching requires continuous measurement and optimization. Key metrics reveal caching efficiency and guide configuration improvements.

Essential Cache Performance Metrics
Metric	Formula	Target	Interpretation
Cache Hit Ratio (CHR)	Hits ÷ (Hits + Misses)	>90% static, >60% overall	Primary efficiency measure
Byte Hit Ratio	Bytes from cache ÷ Total bytes	>85%	Bandwidth-weighted efficiency
Origin Offload	1 - (Origin requests ÷ Total requests)	>90%	Origin protection effectiveness
Cache Efficiency	Unique objects ÷ Total objects	Lower is better	Cache fragmentation indicator
Miss Latency	Avg latency on cache miss	<500ms	Origin performance impact
Hit Latency	Avg latency on cache hit	<50ms	Edge serving performance
Stale Ratio	Stale serves ÷ Total serves	Depends on SWR config	Freshness vs. performance tradeoff
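The first three formulas in the table translate directly into code. This is a small sketch with made-up counter values; `CacheStats` and its field names are hypothetical and would map onto whatever your CDN's logs or analytics expose.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    hits: int
    misses: int
    bytes_from_cache: int
    bytes_total: int
    origin_requests: int

    @property
    def hit_ratio(self) -> float:
        """Hits / (Hits + Misses)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

    @property
    def byte_hit_ratio(self) -> float:
        """Bytes from cache / Total bytes."""
        return self.bytes_from_cache / self.bytes_total if self.bytes_total else 0.0

    @property
    def origin_offload(self) -> float:
        """1 - (Origin requests / Total requests)."""
        total = self.hits + self.misses
        return 1.0 - self.origin_requests / total if total else 0.0


stats = CacheStats(hits=900, misses=100, bytes_from_cache=850,
                   bytes_total=1000, origin_requests=40)
print(f"CHR {stats.hit_ratio:.0%}, byte hit ratio {stats.byte_hit_ratio:.0%}, "
      f"origin offload {stats.origin_offload:.0%}")
# CHR 90%, byte hit ratio 85%, origin offload 96%
```

Origin offload can exceed CHR, as it does here, when a shield tier answers some of the edge misses.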

Diagnosing cache problems:

Problem: Low CHR (50-70%)

Check: Cache key fragmentation (too many variants?)
Check: TTLs too short (short TTLs = more misses)
Check: No-cache/no-store headers incorrectly set
Check: Query string parameters creating unnecessary variants

Problem: High origin load despite good CHR

Check: Large objects causing bandwidth despite few requests
Check: Request coalescing not enabled
Check: Purge rate too high (content invalidated before natural expiry)

Problem: Stale content complaints

Check: TTL appropriate for content update frequency
Check: Purge automation functioning
Check: Stale-while-revalidate misconfigured

analyze_cache_performance.sh
# Check cache status from CDN response headers (-i: header name case varies by server and HTTP version)
curl -sI https://example.com/page | grep -iE '^(cf-cache-status|x-cache|age):'
# cf-cache-status: HIT        ← Cloudflare cache hit
# x-cache: Hit from cloudfront ← AWS CloudFront hit
# age: 3600                    ← Seconds the object has been in cache (1 hour)
 
# Common cache status values:
# HIT        - Served from cache
# MISS       - Fetched from origin (now cached)
# EXPIRED    - Cached copy was stale; refreshed from origin
# BYPASS     - Cache bypassed (no-cache request)
# DYNAMIC    - Content marked as uncacheable
 
# Query CDN analytics API for cache metrics (replace {zone} and {token} with real values)
curl -s -X GET "https://api.cloudflare.com/client/v4/zones/{zone}/analytics/dashboard" \
  -H "Authorization: Bearer {token}" \
  | jq '.result.totals.requests.cached / .result.totals.requests.all * 100'
# Example output: 92.4 (cache hit ratio percentage)

The CHR Optimization Loop

Cache optimization is iterative: Measure current CHR → Identify lowest-CHR URLs → Investigate cache key/TTL issues → Implement fixes → Measure impact → Repeat. Target improvements should be specific and measurable: 'Increase CHR from 85% to 92% for product images by removing utm parameters from cache key.'

Summary: Caching Excellence

Content caching transforms CDN infrastructure from distributed servers into an intelligent content delivery system. Effective caching strategy determines whether your CDN achieves its performance and cost potential.

Key Takeaways

•HTTP caching headers are foundational — Master Cache-Control directives, especially s-maxage, stale-while-revalidate, and immutable for CDN-specific optimization.
•Cache hierarchies multiply efficiency — RAM → SSD → Shield → Origin tiers progressively catch cache misses, with origin shields essential for origin protection.
•Cache key design determines hit rates — Minimize key cardinality by stripping tracking parameters, normalizing query strings, and carefully controlling Vary headers.
•Multiple invalidation strategies exist — TTL-based for simplicity, instant purging for control, SWR for latency, event-driven for automation—choose based on freshness requirements.
•Advanced patterns handle edge cases — Request coalescing, negative caching, and micro-caching solve specific problems at scale.
•Dynamic content can be cached — Fragment pages, use edge computing, and implement micro-caching to extend CDN benefits beyond static content.
•Consistency is an explicit choice — Understand eventual vs. bounded staleness vs. instant consistency and implement according to business requirements.
•Measurement drives optimization — Track CHR, byte hit ratio, and origin offload to identify and resolve caching inefficiencies.

What's next:

With caching fundamentals mastered, we turn to the CDN industry landscape: CDN Providers. The next page examines major commercial CDN offerings, their architectures, pricing models, and the criteria for selecting the right CDN for your specific requirements.

You now understand the complete caching lifecycle in CDNs—from HTTP headers through multi-tier cache hierarchies to sophisticated invalidation strategies. This knowledge enables you to configure, optimize, and troubleshoot CDN caching for any content type and scale. Cache effectively, and you unlock the full potential of global content delivery.

3 / 5

Loading learning content...

Computer NetworksContent Delivery Networks

Content Delivery Networks (CDN)

LevelIntermediate

Duration90 mins

TopicContent Delivery Networks

3 / 5

Content Caching: The Intelligence Behind Efficient Delivery

The Art and Science of Caching

What You Will Master

HTTP Caching Fundamentals

CDN caching is built upon the HTTP caching model defined in RFC 7234. Understanding these fundamentals is essential for effective CDN configuration and troubleshooting.

The cacheability decision:

HTTP defines when a response can be cached and for how long through response headers:

Cache-Control — The primary mechanism for cache directives
Expires — Legacy absolute expiration time (superseded by Cache-Control)
ETag — Entity tag for validation-based caching
Last-Modified — Timestamp for validation-based caching

cache_headers_examples.http
HTTP Headers
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
# Highly cacheable static asset (aggressive caching)
HTTP/1.1 200 OK
Cache-Control: public, max-age=31536000, immutable
Content-Type: application/javascript
ETag: "abc123"
# Cached for 1 year; 'immutable' tells browsers not to revalidate
 
# Dynamic content with short cache (balance freshness/performance)
HTTP/1.1 200 OK
Cache-Control: public, max-age=60, s-maxage=300
Content-Type: application/json
Vary: Accept-Encoding
# Browsers cache 60s; CDN caches 300s (s-maxage overrides for shared caches)
 
# Private user-specific content (do not cache on CDN)
HTTP/1.1 200 OK
Cache-Control: private, no-cache, no-store
Set-Cookie: session=xyz
# 'private' prevents CDN caching; only browser can cache
 
# Stale content allowed during revalidation
HTTP/1.1 200 OK
Cache-Control: public, max-age=600, stale-while-revalidate=3600
# Serve stale content for up to 1 hour while revalidating in background

Cache-Control directive reference:

Cache-Control Directives and Their Effects
Directive	Target	Effect	CDN Behavior
`public`	Response	Can be cached by any cache	CDN will cache the response
`private`	Response	Only browser can cache	CDN will NOT cache the response
`no-cache`	Response	Cache but revalidate before use	CDN caches but checks origin on every request
`no-store`	Response	Do not cache at all	CDN never stores the response
`max-age=N`	Response	Cache for N seconds	CDN sets TTL to N seconds
`s-maxage=N`	Response	Shared cache TTL (overrides max-age)	CDN uses this instead of max-age
`immutable`	Response	Content never changes	CDN never revalidates until TTL expires
`must-revalidate`	Response	Never use stale content	CDN returns error if unable to revalidate
`stale-while-revalidate=N`	Response	Serve stale for N seconds during refresh	CDN serves stale content while refreshing
`stale-if-error=N`	Response	Serve stale if origin errors for N seconds	CDN serves stale when origin is unavailable

The s-maxage Override

Validation-based caching:

When cache content expires (TTL reached), the cache can validate whether the content has changed rather than fetching a full copy:

Client → CDN: GET /resource
               If-None-Match: "abc123"     # ETag from previous response
               If-Modified-Since: Tue, 15 Jan 2025 10:00:00 GMT

CDN → Origin: GET /resource
              If-None-Match: "abc123"
              If-Modified-Since: Tue, 15 Jan 2025 10:00:00 GMT

Origin → CDN: HTTP/1.1 304 Not Modified
              ETag: "abc123"
              Cache-Control: public, max-age=3600

CDN → Client: HTTP/1.1 304 Not Modified
              ETag: "abc123"
              Cache-Control: public, max-age=3600

Key insight: The 304 response has no body—just headers. For large resources, conditional validation saves significant bandwidth while ensuring freshness.

Cache Hierarchy Design

Enterprise CDNs employ multi-tiered cache hierarchies that balance access speed, storage capacity, and origin offload. Understanding this hierarchy is essential for optimizing cache efficiency.

The typical four-tier cache hierarchy:

Converting Mermaid diagram...

Tier 1: Memory Cache (RAM)

The fastest tier stores the most frequently accessed content in server RAM:

Capacity: 100-512GB per server (subset of total content)
Access time: <1 microsecond
Hit rate: 30-50% of requests (for popular content distributions)
Eviction policy: LRU (Least Recently Used) or LFU (Least Frequently Used)
Persistence: Lost on server restart

Tier 2: Local Disk Cache (NVMe SSD)

The primary persistent cache tier stores the working set:

Capacity: 30-100TB per server (larger content catalog)
Access time: 50-200 microseconds (random read)
Hit rate: 40-60% of remaining requests
Eviction policy: LRU with popularity scoring
Persistence: Survives server restart; enables instant cache warming

Tier 3: Origin Shield

A regional cache tier that aggregates requests from multiple edge servers:

Purpose: Protect origin from thundering herd; improve cache efficiency
Behavior: Edge servers fetch from shield on cache miss (not directly from origin)
Location: 3-10 global shield locations (regional coverage)
Benefit: If each of 10 edges has an 80% CHR and the shield absorbs 90% of the resulting misses, only 2% of requests reach the origin (98% origin offload)
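The offload figure above follows from multiplying miss rates. The sketch below assumes the shield serves 90% of edge misses from its own cache; that 90% is an assumption chosen here because it reproduces the 98% figure, and `origin_offload` is a hypothetical helper name.

```python
def origin_offload(edge_chr: float, shield_chr: float) -> float:
    """Fraction of client requests that never reach the origin."""
    edge_miss = 1.0 - edge_chr                      # requests forwarded to the shield
    origin_fraction = edge_miss * (1.0 - shield_chr)  # shield misses go to origin
    return 1.0 - origin_fraction


# 80% edge CHR, assumed 90% shield CHR
print(f"{origin_offload(0.80, 0.90):.0%}")
```

Without a shield (`shield_chr = 0.0`) the same function gives 80%, which is why the shield tier matters most for origin protection.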

The Origin Shield Calculation

Cache tier selection logic:

When a request arrives, the edge server checks tiers in order:

1. Check RAM cache → HIT? Serve immediately (fastest path)
                   → MISS? Check next tier

2. Check SSD cache  → HIT? Serve and promote to RAM
                   → MISS? Check shield tier

3. Check Shield    → HIT? Serve and cache locally
                   → MISS? Fetch from origin

4. Fetch Origin    → Cache at shield + local SSD + RAM (if hot)
                   → Serve to client

Promotion and demotion:

Content accessed frequently is promoted up tiers (SSD → RAM)
Content not accessed recently is demoted down tiers or evicted
Each tier runs independent eviction based on its constraints
Hot content exists in all tiers simultaneously; cold content only on disk
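The lookup order and promotion rules above can be sketched in Python as follows. This is a simplified model, not a production design: capacity is counted in entries rather than bytes, the shield is modeled as a local object rather than a remote server, and `LruTier`, `lookup`, and `fetch_origin` are hypothetical names.

```python
from collections import OrderedDict


class LruTier:
    """A fixed-capacity tier with least-recently-used eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        while len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used


def lookup(key, ram, ssd, shield, fetch_origin):
    """Check tiers in order; return (value, tier_that_served_it)."""
    value = ram.get(key)
    if value is not None:
        return value, "ram"
    value = ssd.get(key)
    if value is not None:
        ram.put(key, value)                  # promote SSD hit to RAM
        return value, "ssd"
    value = shield.get(key)
    if value is not None:
        ssd.put(key, value)                  # cache locally after shield hit
        return value, "shield"
    value = fetch_origin(key)
    shield.put(key, value)
    ssd.put(key, value)                      # RAM is filled later, once the object proves hot
    return value, "origin"
```

Each tier evicts independently, so an object pushed out of RAM can still be served from SSD on the next request.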

Cache Key Design

Default cache key components:

By default, most CDNs construct cache keys from:

Scheme + Host + Path + Query String
https://example.com/images/logo.png?v=123

The cache key problem:

This default key can cause issues when variations don't indicate different content:

?utm_source=google vs ?utm_source=facebook — Same content, different keys!
Query parameter order: ?a=1&b=2 vs ?b=2&a=1 — Same content, different keys!
Tracking parameters: ?_=1705350000 (cache-busting timestamps)

Result: Multiple cache entries for identical content → lower CHR → higher origin load.

Cache Key Optimization Strategies

•Query string stripping — Remove tracking parameters (utm_*, fbclid, gclid) from cache key while passing to origin. Same content cached once regardless of marketing attribution.
•Query string sorting — Normalize parameter order: ?b=2&a=1 becomes ?a=1&b=2. Prevents duplicate entries from parameter reordering.
•Query string whitelisting — Only include known-relevant parameters in cache key. If only version affects content, key on ?version=X and ignore all others.
•Header-based keying — Include specific headers in cache key. Common: Accept-Encoding (different compressions), Accept-Language (localized content).
•Cookie-based keying — Include specific cookies in cache key (use sparingly—cookies are high-cardinality). Example: country=US for geo-personalization.
•Device-based keying — Include device type (mobile/desktop) in cache key for responsive content. Use User-Agent classification, not raw header.
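As an illustration of query string stripping and sorting, here is a minimal Python sketch built on the standard library. The function name `cache_key` and the `TRACKING_PATTERNS` list are assumptions for this example; a real CDN applies equivalent rules in its own configuration language.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qsl, urlencode, urlsplit

# Parameters assumed not to affect the response body
TRACKING_PATTERNS = ("utm_*", "fbclid", "gclid", "_")


def cache_key(url: str) -> str:
    """Build a normalized cache key: scheme + host + path + filtered, sorted query."""
    parts = urlsplit(url)
    params = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not any(fnmatchcase(name, pattern) for pattern in TRACKING_PATTERNS)
    ]
    query = urlencode(sorted(params))        # sorting makes parameter order irrelevant
    key = f"{parts.scheme}://{parts.netloc.lower()}{parts.path}"
    return f"{key}?{query}" if query else key


print(cache_key("https://example.com/p?b=2&a=1&utm_source=google"))
print(cache_key("https://example.com/p?a=1&b=2&utm_source=facebook"))
```

Both calls produce the same key, so the two URLs share one cache entry. The original query string would still be forwarded to the origin unchanged; only the key is normalized.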

cache_key_config.txt
CDN Config
# Example: Cloudflare-style cache key optimization (illustrative pseudo-config, not literal vendor syntax)
# Remove marketing parameters from cache key
cache_key:
  scheme: include
  host: include
  path: include
  query_string:
    include: ["version", "id", "format"]    # Only these affect caching
    exclude: ["utm_*", "fbclid", "gclid"]   # Ignored in key
  
# Alternative: Akamai Property Manager approach (illustrative; verify behavior and option names against current Akamai documentation)
behavior:
  cacheKeyQueryString:
    behavior: IGNORE_ALL_PRESERVE   # Ignore query, pass to origin
 
# With header-based variants
cache_key:
  additional_headers:
    - Accept-Encoding              # Different key for gzip vs brotli
    - Accept-Language             # Different key per language
  cookie_keys:
    - country                     # Geo-personalization variant

The Vary Header Trap

Cache key best practices:

Audit your cache keys: Use CDN analytics to identify URLs with unexpectedly low CHR; investigate cache key fragmentation.
Minimize key cardinality: Every unique cache key is a separate entry. Lower cardinality = higher hit rate.
Separate static and dynamic: Static assets should have simple keys (just URL). Dynamic content may need additional attributes.
Test cache key changes: Incorrect key configuration can cause users to receive wrong content. Test thoroughly in staging.
Document your strategy: Cache key configuration is operational knowledge that must be maintained.

Cache Invalidation Strategies

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton

Time-To-Live (TTL) Expiration

The simplest and most common invalidation mechanism: content expires after a configured duration.

How it works:

Origin includes Cache-Control: max-age=N or s-maxage=N
CDN caches content and starts the TTL countdown
When TTL expires, cached content is marked stale
Next request either revalidates (conditional request) or fetches full content
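The TTL steps above can be sketched as a small Python freshness check. It assumes a shared cache that prefers `s-maxage` over `max-age` and treats `no-store`, `no-cache`, and `private` as "not fresh"; heuristic freshness for responses without these directives is deliberately left out. `shared_cache_ttl` and `is_fresh` are hypothetical helper names.

```python
def shared_cache_ttl(cache_control: str) -> int:
    """Return the freshness lifetime in seconds for a shared cache (0 = always validate)."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value.strip('"')
    if any(d in directives for d in ("no-store", "no-cache", "private")):
        return 0
    for name in ("s-maxage", "max-age"):     # s-maxage wins for shared caches
        if directives.get(name, "").isdigit():
            return int(directives[name])
    return 0                                 # no explicit lifetime in this sketch


def is_fresh(age_seconds: int, cache_control: str) -> bool:
    """A response is fresh while its age is below the freshness lifetime."""
    return age_seconds < shared_cache_ttl(cache_control)


print(shared_cache_ttl("public, max-age=60, s-maxage=3600"))
print(is_fresh(59, "max-age=60"), is_fresh(60, "max-age=60"))
```

Once `is_fresh` returns false, the cache would send a conditional request or fetch the full object, as described in the last step.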

Advantages:

Zero operational overhead (automatic)
Predictable behavior
No API calls or purge operations required

Disadvantages:

Content can be stale for up to TTL duration
Shorter TTLs increase origin load
Cannot handle urgent content corrections

When to use:

Content with predictable update schedules
Long-lived static assets (JS, CSS, images)
Content where slight staleness is acceptable

Content Type	Recommended TTL	Rationale
Versioned static assets	1 year (31536000s)	URL changes on update; safe to cache forever
Unversioned static assets	1 week (604800s)	Manual invalidation if emergency update
API responses	5-60 seconds	Balance freshness and performance
HTML pages	60-300 seconds	Short enough for content updates
Real-time data	0 (no-cache)	Always validate with origin

Cache Tags: The Power Tool

Advanced Caching Patterns

Beyond basic TTL-based caching, production CDNs implement sophisticated patterns to handle edge cases, optimize efficiency, and maintain consistency under challenging conditions.

•Request Coalescing (Request Collapsing) — When multiple concurrent requests arrive for the same uncached content, only ONE request is sent to origin. Other requests wait for the first response. Prevents origin overload from flash crowds. Critical for live events and viral content.
•Negative Caching — Cache 404 and other error responses for short durations. Prevents repeated origin requests for non-existent content. Typical TTL: 30-300 seconds. Protects origin from attacks probing for non-existent resources.
•Grace Mode / Stale-If-Error — If origin is unavailable, serve stale cached content rather than error. Maintains user experience during origin outages. Stale content is better than no content for most use cases.
•Micro-Caching — Cache dynamic content for very short durations (1-5 seconds). Even 1-second caching collapses simultaneous requests. Effective for high-traffic pages where perfect freshness isn't required.
•Edge-Side Includes (ESI) — Composite pages assembled at the edge from cached fragments. Different fragments can have different TTLs. Enables mixing personalized and shared content efficiently.
•Predictive Pre-Fetching — Analyze request patterns to predict next content requests. Pre-fetch content to edge caches before users request it. Reduces cache miss latency for predictable navigation.

Request coalescing deep dive:

Request coalescing (also called request collapsing or request deduplication) is essential for handling viral content and flash crowds:

┌──────────────────────────────────────────────────────────┐
│                   Edge Server Timeline                    │
├──────────────────────────────────────────────────────────┤
│                                                          │
│ t=0ms:   Request A arrives for /viral-video.mp4 (MISS)  │
│          → Origin request initiated                      │
│                                                          │
│ t=5ms:   Request B arrives for /viral-video.mp4         │
│          → Sees pending request, JOINS WAIT GROUP        │
│                                                          │
│ t=10ms:  Request C arrives for /viral-video.mp4         │
│          → Joins same wait group                         │
│                                                          │
│ t=100ms: Origin response received                        │
│          → Content cached                                │
│          → All waiting requests (A, B, C) satisfied      │
│                                                          │
│ Result: 3 client requests, 1 origin request              │
│ Without coalescing: 3 client requests, 3 origin requests │
└──────────────────────────────────────────────────────────┘

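For viral content with 10,000 simultaneous requests at one cache node, coalescing reduces that node's origin load from 10,000 requests to 1. Combined with an origin shield, which coalesces the misses arriving from many edges, this is why CDNs can handle flash crowds that would overwhelm most origin servers.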
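The timeline above can be sketched with Python's `asyncio`: the first miss starts one origin fetch, and later requests for the same key await that same task. This is a single-process illustration without timeouts or error caching; `CoalescingCache`, `slow_origin`, and `demo` are hypothetical names.

```python
import asyncio


class CoalescingCache:
    """Cache that collapses concurrent misses for the same key into one origin fetch."""

    def __init__(self, fetch_origin):
        self._fetch_origin = fetch_origin
        self._cache = {}
        self._in_flight = {}                 # key -> pending asyncio.Task

    async def get(self, key):
        if key in self._cache:
            return self._cache[key]
        task = self._in_flight.get(key)
        if task is None:                     # first miss: start the origin request
            task = asyncio.create_task(self._fill(key))
            self._in_flight[key] = task
        return await task                    # later misses join the wait group

    async def _fill(self, key):
        try:
            value = await self._fetch_origin(key)
            self._cache[key] = value
            return value
        finally:
            self._in_flight.pop(key, None)


async def demo():
    origin_calls = 0

    async def slow_origin(key):
        nonlocal origin_calls
        origin_calls += 1
        await asyncio.sleep(0.05)            # simulated origin latency
        return f"body of {key}"

    cache = CoalescingCache(slow_origin)
    results = await asyncio.gather(*(cache.get("/viral-video.mp4") for _ in range(3)))
    return results, origin_calls
```

Running `asyncio.run(demo())` issues three concurrent requests and reports a single origin call, matching requests A, B, and C in the timeline.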

Coalescing Timeout Considerations

Caching Dynamic Content

The dynamic content challenge:

Dynamic content varies by user, time, or context. A product page might include:

Static content: Product description, images (cache forever)
Semi-dynamic: Price, availability (cache 60 seconds)
User-specific: Cart status, recommendations (don't cache on CDN)

Dynamic Content Caching Strategies
Strategy	Technique	Cache Key Impact	Use Case
Response Fragmentation	Separate cacheable/non-cacheable parts	Multiple keys, lower cardinality	Pages mixing static + personalized
Cookie-less Domain	Serve static assets from different domain	No cookie in key; maximum sharing	Images, CSS, JS files
Vary Header Control	Specify which headers create variants	Controlled fragmentation	Language, device, format variants
URL Path Versioning	Include version in URL path, not query string	Clean keys; long TTL	Static assets with updates
Micro-caching	Cache for 1-5 seconds	High hit rate, low staleness	High-traffic dynamic APIs
Edge Computing	Generate content at edge	Compute at edge; personalized	SSR, A/B testing, personalization

Pattern: HTTP Cache Fragments with Edge Composition

Modern CDNs can assemble pages from independently cached fragments:

┌─────────────────────────────────────────────────────────────┐
│                         Full Page                            │
├─────────────────────────────────────────────────────────────┤
│  ┌────────────────────────────────────────────────────┐     │
│  │     Header Fragment (shared, TTL: 1 hour)          │     │
│  └────────────────────────────────────────────────────┘     │
│  ┌──────────────────────┐  ┌──────────────────────────┐    │
│  │ User Menu Fragment   │  │ Search Fragment (shared) │    │
│  │ (private, no cache)  │  │ (TTL: 1 minute)          │    │
│  └──────────────────────┘  └──────────────────────────┘    │
│  ┌────────────────────────────────────────────────────┐     │
│  │     Content Fragment (shared, TTL: 5 minutes)      │     │
│  └────────────────────────────────────────────────────┘     │
│  ┌────────────────────────────────────────────────────┐     │
│  │     Footer Fragment (shared, TTL: 1 hour)          │     │
│  └────────────────────────────────────────────────────┘     │
└─────────────────────────────────────────────────────────────┘

Most fragments cached and shared across all users
User Menu fetched client-side (JavaScript) or excluded from cache
Page delivered fast; small personalized parts lazy-loaded
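A rough Python sketch of edge composition with per-fragment TTLs, using the fragment names and TTLs from the diagram. It assumes shared fragments do not depend on the user and that a TTL of 0 means "render per request"; `FragmentCache`, `render_page`, and the renderer callbacks are hypothetical names.

```python
import time

# (fragment name, TTL in seconds); TTL 0 marks the private, uncached fragment
FRAGMENTS = [("header", 3600), ("user_menu", 0), ("search", 60),
             ("content", 300), ("footer", 3600)]


class FragmentCache:
    """Stores rendered fragments with independent expiry times."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}                   # name -> (expires_at, html)

    def get_or_render(self, name, ttl_seconds, render):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[0] > now:
            return entry[1]                  # fragment still fresh
        html = render()
        self._entries[name] = (now + ttl_seconds, html)
        return html


def render_page(cache, render_shared, render_user_menu, user):
    """Assemble a page from cached shared fragments plus one per-user fragment."""
    parts = []
    for name, ttl in FRAGMENTS:
        if ttl == 0:
            parts.append(render_user_menu(user))    # never cached
        else:
            parts.append(cache.get_or_render(name, ttl, lambda: render_shared(name)))
    return "\n".join(parts)
```

Rendering pages for two different users calls `render_shared` only four times in total; only the user menu is regenerated per request.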

The 90% Cacheable Insight

Cache Consistency and Coherence

The consistency challenge:

Consider this timeline:

t=0:    Price is $10.00; cached globally with TTL=300s
t=60s:  Price updated to $12.00 at origin
t=61s:  User A (Tokyo edge) sees $10.00 (cached)
t=62s:  User B (NYC edge) sees $12.00 (edge had cache miss)
t=180s: User A still sees $10.00; User B sees $12.00

Result: Same product, different prices for up to 4 minutes (from the update at t=60s until the Tokyo copy expires at t=300s)

Consistency models for CDN caching:

CDN Consistency Models
Model	Guarantee	Implementation	Tradeoff
Eventual Consistency	All edges will eventually have same content	TTL-based expiration; no active sync	Lowest cost; acceptable staleness window
Bounded Staleness	Content no older than X seconds	Short TTL + stale-while-revalidate	Predictable maximum staleness
Instant Consistency	All edges updated simultaneously	Purge on update + refill or edge compute	Highest cost; complex implementation
Strong Consistency	No stale content ever served	Cache-through with validation	High latency; defeats CDN benefits

Implementing bounded staleness:

For most applications, bounded staleness provides acceptable consistency at reasonable cost:

Set s-maxage to maximum acceptable staleness (e.g., 60 seconds)
Use stale-while-revalidate for background updates
For critical updates, trigger instant purge via API
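Result: Content is served fresh for at most 60 seconds; with stale-while-revalidate, the worst-case staleness is s-maxage plus the stale-while-revalidate window, unless the content is explicitly purged sooner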
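To make the staleness bound concrete, the following Python sketch classifies a cached object by age. It assumes the cache serves stale content during the `stale-while-revalidate` window while refreshing in the background, so the worst-case staleness a client can see is `s_maxage + swr` seconds. The 30-second window and the function names are assumptions for illustration.

```python
def cache_control(s_maxage: int, swr: int) -> str:
    """Build the response header for a bounded-staleness policy."""
    return f"public, s-maxage={s_maxage}, stale-while-revalidate={swr}"


def classify(age: int, s_maxage: int, swr: int) -> str:
    """Decide how a shared cache may use an object of the given age (seconds)."""
    if age < s_maxage:
        return "fresh"                       # serve from cache
    if age < s_maxage + swr:
        return "stale-while-revalidate"      # serve stale, refresh in background
    return "expired"                         # must fetch or revalidate first


print(cache_control(60, 30))
for age in (30, 75, 90):
    print(age, classify(age, 60, 30))
```

With `s-maxage=60` and an assumed 30-second window, an object may be served up to 90 seconds after it was fetched, so the window should be sized with that total in mind.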

The version field pattern:

For content that must be consistent across API responses, include a version field:

{
  "product_id": 123,
  "name": "Widget",
  "price": 12.00,
  "cache_version": "2025-01-17T10:30:00Z",
  "_meta": {
    "cached_at": "2025-01-17T10:32:15Z",
    "edge_location": "tokyo-01"
  }
}

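Clients can compare cache_version across responses and detect inconsistencies. If a critical action requires consistency, the client can send Cache-Control: no-cache to request revalidation, provided the CDN is configured to honor request directives (many ignore them by default to protect the origin).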
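A client-side check for this pattern might look like the Python sketch below. It assumes `cache_version` is an ISO 8601 UTC timestamp as in the JSON example; the `nyc-01` edge name and the `find_stale_edges` helper are hypothetical.

```python
from datetime import datetime


def parse_version(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def find_stale_edges(responses):
    """Return edge locations whose cache_version is older than the newest one seen."""
    newest = max(parse_version(r["cache_version"]) for r in responses)
    return [
        r["_meta"]["edge_location"]
        for r in responses
        if parse_version(r["cache_version"]) < newest
    ]


responses = [
    {"price": 10.00, "cache_version": "2025-01-17T10:25:00Z",
     "_meta": {"edge_location": "tokyo-01"}},
    {"price": 12.00, "cache_version": "2025-01-17T10:30:00Z",
     "_meta": {"edge_location": "nyc-01"}},   # hypothetical second edge
]
print(find_stale_edges(responses))
```

A non-empty result tells the client that at least one response came from an older version, which is the trigger for retrying the critical request with revalidation.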

CAP Theorem Applies to CDNs

Measuring Cache Performance

Effective caching requires continuous measurement and optimization. Key metrics reveal caching efficiency and guide configuration improvements.

Essential Cache Performance Metrics
Metric	Formula	Target	Interpretation
Cache Hit Ratio (CHR)	Hits ÷ (Hits + Misses)	>90% static, >60% overall	Primary efficiency measure
Byte Hit Ratio	Bytes from cache ÷ Total bytes	>85%	Bandwidth-weighted efficiency
Origin Offload	1 - (Origin requests ÷ Total requests)	>90%	Origin protection effectiveness
Cache Efficiency	Unique objects ÷ Total objects	Lower is better	Cache fragmentation indicator
Miss Latency	Avg latency on cache miss	<500ms	Origin performance impact
Hit Latency	Avg latency on cache hit	<50ms	Edge serving performance
Stale Ratio	Stale serves ÷ Total serves	Depends on SWR config	Freshness vs. performance tradeoff
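The first three formulas in the table can be computed from raw counters as in this Python sketch. The counter names and sample numbers are made up for illustration; real values would come from CDN logs or an analytics API.

```python
from dataclasses import dataclass


@dataclass
class CacheCounters:
    hits: int
    misses: int
    bytes_from_cache: int
    bytes_total: int
    origin_requests: int

    @property
    def total_requests(self) -> int:
        return self.hits + self.misses

    @property
    def hit_ratio(self) -> float:
        """Hits / (Hits + Misses)."""
        return self.hits / self.total_requests if self.total_requests else 0.0

    @property
    def byte_hit_ratio(self) -> float:
        """Bytes served from cache / total bytes served."""
        return self.bytes_from_cache / self.bytes_total if self.bytes_total else 0.0

    @property
    def origin_offload(self) -> float:
        """1 - (origin requests / total requests)."""
        if not self.total_requests:
            return 0.0
        return 1.0 - self.origin_requests / self.total_requests


# Made-up sample counters
sample = CacheCounters(hits=920, misses=80, bytes_from_cache=850,
                       bytes_total=1000, origin_requests=60)
print(f"CHR={sample.hit_ratio:.1%}  byte hit={sample.byte_hit_ratio:.1%}  "
      f"offload={sample.origin_offload:.1%}")
```

Origin requests can be lower than edge misses when a shield or request coalescing absorbs some of them, which is why origin offload is tracked separately from CHR.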

Diagnosing cache problems:

Problem: Low CHR (50-70%)

Check: Cache key fragmentation (too many variants?)
Check: TTLs too aggressive (short TTLs = more misses)
Check: No-cache/no-store headers incorrectly set
Check: Query string parameters creating unnecessary variants

Problem: High origin load despite good CHR

Check: Large objects causing bandwidth despite few requests
Check: Request coalescing not enabled
Check: Purge rate too high (content invalidated before natural expiry)

Problem: Stale content complaints

Check: TTL appropriate for content update frequency
Check: Purge automation functioning
Check: Stale-while-revalidate misconfigured

analyze_cache_performance.sh
CLI Analysis
# Check cache status from CDN response headers
curl -sI https://example.com/page | grep -iE '^(cf-cache-status|x-cache|age):'
# cf-cache-status: HIT        ← Cloudflare cache hit
# x-cache: Hit from cloudfront ← AWS CloudFront hit  
# age: 3600                    ← Cached for 1 hour
 
# Common cache status values:
# HIT        - Served from cache
# MISS       - Fetched from origin (now cached)
# EXPIRED    - Cache expired, revalidated
# BYPASS     - Cache bypassed (no-cache request)
# DYNAMIC    - Content marked as uncacheable
 
# Query CDN analytics API for cache metrics
# {zone} and {token} are placeholders for your zone ID and API token
curl -s -X GET "https://api.cloudflare.com/client/v4/zones/{zone}/analytics/dashboard" \
  -H "Authorization: Bearer {token}" \
  | jq '.result.totals.requests.cached / .result.totals.requests.all * 100'
# Example output: 92.4 (cache hit ratio percentage)

The CHR Optimization Loop

Summary: Caching Excellence

Key Takeaways

•HTTP caching headers are foundational — Master Cache-Control directives, especially s-maxage, stale-while-revalidate, and immutable for CDN-specific optimization.
•Cache hierarchies multiply efficiency — RAM → SSD → Shield → Origin tiers progressively catch cache misses, with origin shields essential for origin protection.
•Cache key design determines hit rates — Minimize key cardinality by stripping tracking parameters, normalizing query strings, and carefully controlling Vary headers.
•Multiple invalidation strategies exist — TTL-based for simplicity, instant purging for control, SWR for latency, event-driven for automation—choose based on freshness requirements.
•Advanced patterns handle edge cases — Request coalescing, negative caching, and micro-caching solve specific problems at scale.
•Dynamic content can be cached — Fragment pages, use edge computing, and implement micro-caching to extend CDN benefits beyond static content.
•Consistency is an explicit choice — Understand eventual vs. bounded staleness vs. instant consistency and implement according to business requirements.
•Measurement drives optimization — Track CHR, byte hit ratio, and origin offload to identify and resolve caching inefficiencies.
