While browser caching eliminates network requests for returning visitors, it does nothing for first-time visitors or cache misses. When users must fetch content from your servers, physical distance becomes the enemy. Light travels at approximately 300,000 kilometers per second, the absolute speed limit of the universe. A request from Tokyo to a server in Virginia must travel roughly 11,000 kilometers each way, adding more than 70 milliseconds of round-trip time for light alone; signals in optical fiber move at only about two-thirds of that speed, and routing delays, processing time, and network congestion add more on top.
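The physics here is easy to sanity-check. The sketch below (illustrative figures, not measurements) computes the round-trip floor for the Tokyo-to-Virginia example:

```javascript
// Back-of-the-envelope propagation delay for a Tokyo -> Virginia request.
// Distances and speeds are rough illustrative figures, not measurements.
const C_VACUUM_KM_S = 300000;                 // speed of light in a vacuum
const C_FIBER_KM_S = C_VACUUM_KM_S * 2 / 3;   // light in fiber travels ~2/3 c

function roundTripMs(distanceKm, speedKmS) {
  return (2 * distanceKm / speedKmS) * 1000;
}

const tokyoToVirginiaKm = 11000;
console.log(roundTripMs(tokyoToVirginiaKm, C_VACUUM_KM_S).toFixed(0)); // ~73 ms: the hard physical floor
console.log(roundTripMs(tokyoToVirginiaKm, C_FIBER_KM_S).toFixed(0));  // ~110 ms in fiber, before routing and processing
```

No server-side optimization can get under this floor; only moving the content closer can.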
Of course, we cannot change physics. But what we can do is change geography—by placing copies of content closer to users. This is the fundamental insight behind Content Delivery Networks (CDNs), and it remains one of the most impactful performance optimizations available to any web-scale system.
This page covers CDN architecture and how edge caching works, cache key design and the Vary header, TTL strategies and freshness management, cache invalidation patterns, origin shield configuration, and production-grade CDN configuration. By the end, you'll be able to design and implement CDN caching strategies that deliver sub-100ms response times globally.
A Content Delivery Network consists of a globally distributed network of proxy servers strategically positioned to serve content from locations geographically close to end users. Understanding CDN architecture is essential for effective caching strategy design.
Core CDN Components:
- Edge PoPs (Points of Presence): Cache servers deployed in many cities worldwide, terminating user connections close to where requests originate
- Origin shield: An optional regional cache tier that sits between the edges and your origin (covered in detail later on this page)
- Origin server: Your own infrastructure, the source of truth the CDN pulls from on cache misses
The Request Flow Through a CDN:
```
User (Tokyo) → DNS Resolution → Edge PoP (Tokyo) → Cache Hit?
                                                       |
                                      ┌────────────────┴────────────────┐
                                 YES  │                                 │  NO
                                      ↓                                 ↓
                              Serve from Cache             Origin Shield (Singapore)?
                                 (< 20ms)                               |
                                                         ┌──────────────┴──────────────┐
                                              Cache Hit  │                             │  Cache Miss
                                                         ↓                             ↓
                                                Serve from Shield           Fetch from Origin (US)
                                                    (+ 50ms)               + Cache at Shield + Edge
                                                                                  (+ 150ms)
```
Latency Impact by Cache Hit Location:
| Cache Location | Typical Latency | Notes |
|---|---|---|
| Browser cache | < 5ms | No network at all |
| Edge PoP (same city) | 5-30ms | Single network hop |
| Edge PoP (same region) | 20-50ms | Few network hops |
| Origin shield | 50-100ms | Regional backbone |
| Origin server | 100-400ms+ | Cross-continental |
The number of edge locations directly impacts user experience. Cloudflare operates 300+ PoPs, AWS CloudFront 450+, and Akamai 4,000+. More edge locations mean more users are "close" to a cache, reducing latency. When evaluating CDNs, consider their PoP density in your key markets.
The cache key determines what constitutes a unique cacheable object. By default, CDNs use the full request URL as the cache key. However, real-world applications often need to serve different content for the same URL based on request characteristics—compression capabilities, language preferences, device type, or authentication state.
The Default Cache Key:
```
https://example.com/api/products?category=electronics&sort=price
└──────────────────────────────────────────────────────────────┘
                     Default Cache Key
```
This works well for truly static content but breaks down when:
- The same URL must serve compressed and uncompressed variants (Accept-Encoding)
- Content is localized per language (Accept-Language)
- Responses differ by device type (mobile, tablet, desktop)
- Authenticated and anonymous users see different content at the same URL
The Vary Header:
The HTTP Vary header instructs caches to treat requests with different values for the specified headers as different cache entries. It effectively adds request header values to the cache key.
```
HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: public, max-age=3600
Vary: Accept-Encoding, Accept-Language
Content-Encoding: gzip

<compressed content>
```
With `Vary: Accept-Encoding, Accept-Language`, the effective cache key becomes:

```
https://example.com/page.html + Accept-Encoding=gzip + Accept-Language=en-US
```

The same URL with `Accept-Language: fr-FR` would be a separate cache entry.
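A sketch of the mechanics: a shared cache can be modeled as deriving a secondary key from the request headers named in Vary. The function below is purely illustrative, not any CDN's actual key format:

```javascript
// Illustrative model of how a shared cache folds Vary'd headers into
// the cache key. Real caches store this as a secondary key per URL.
function effectiveCacheKey(url, varyHeader, requestHeaders) {
  const varied = varyHeader
    .split(',')
    .map(h => h.trim().toLowerCase())
    .sort() // header order in Vary must not produce distinct keys
    .map(h => `${h}=${requestHeaders[h] ?? ''}`);
  return [url, ...varied].join(' + ');
}

const key = effectiveCacheKey(
  'https://example.com/page.html',
  'Accept-Encoding, Accept-Language',
  { 'accept-encoding': 'gzip', 'accept-language': 'en-US' }
);
console.log(key);
// https://example.com/page.html + accept-encoding=gzip + accept-language=en-US
```

Note how a missing header still contributes an (empty) component: a request without Accept-Language is its own variant.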
| Vary Header | Use Case | Cache Impact | Considerations |
|---|---|---|---|
| Accept-Encoding | Serve compressed content to capable browsers | Low (2-3 variants) | Standard practice; always include |
| Accept-Language | Serve localized content | Medium (number of languages) | Consider URL-based i18n instead |
| Cookie | Any cookie variation | Very High (can explode) | Avoid if possible; breaks caching |
| Authorization | Authenticated content | Extremely High | Usually means no-cache instead |
| User-Agent | Device-specific content | Catastrophic (thousands of variants) | Never use; use device detection logic |
| Accept | Content negotiation (HTML vs JSON) | Low (few content types) | Consider separate URLs instead |
```javascript
// Cloudflare Worker - Custom cache key
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event));
});

async function handleRequest(event) {
  const request = event.request;
  const url = new URL(request.url);

  // Create custom cache key
  const cacheKeyComponents = [
    url.origin + url.pathname,  // Base URL without query params
    getSortedQueryString(url),  // Normalized query string
    getDeviceType(request),     // Device classification
    getGeoRegion(request),      // Geographic region
  ];
  const cacheKey = cacheKeyComponents.join('::');

  // Check cache with custom key
  const cache = caches.default;
  const cacheRequest = new Request(cacheKey, { cf: request.cf });
  let response = await cache.match(cacheRequest);

  if (!response) {
    // Cache miss - fetch from origin
    response = await fetch(request);

    // Only cache successful responses
    if (response.ok) {
      // Clone response for caching
      let responseToCache = response.clone();

      // Add Cache-Control if origin didn't
      if (!response.headers.get('Cache-Control')) {
        const headers = new Headers(response.headers);
        headers.set('Cache-Control', 'public, max-age=3600');
        responseToCache = new Response(responseToCache.body, {
          status: responseToCache.status,
          headers,
        });
      }

      event.waitUntil(cache.put(cacheRequest, responseToCache));
    }
  }

  return response;
}

function getSortedQueryString(url) {
  const params = [...url.searchParams.entries()];
  // Ignore tracking parameters
  const filtered = params.filter(([key]) =>
    !['utm_source', 'utm_medium', 'ref', 'fbclid'].includes(key)
  );
  // Sort for consistent cache keys
  filtered.sort((a, b) => a[0].localeCompare(b[0]));
  return filtered.map(([k, v]) => `${k}=${v}`).join('&');
}

function getDeviceType(request) {
  const ua = request.headers.get('User-Agent') || '';
  if (/Mobile|Android|iPhone/i.test(ua)) return 'mobile';
  if (/Tablet|iPad/i.test(ua)) return 'tablet';
  return 'desktop';
}

function getGeoRegion(request) {
  // Cloudflare provides geo data
  const country = request.cf?.country || 'XX';
  const regionMap = {
    US: 'americas', CA: 'americas', MX: 'americas', BR: 'americas',
    GB: 'europe', DE: 'europe', FR: 'europe', IT: 'europe',
    JP: 'apac', CN: 'apac', IN: 'apac', AU: 'apac',
  };
  return regionMap[country] || 'other';
}
```

Including Cookie in the Vary header effectively makes each user's session a unique cache entry, destroying shared caching benefits. If you need to personalize content for authenticated users, prefer private caching (browser only) or edge-side includes (ESI), where the base page is cached and personalized fragments are injected.
Time-To-Live (TTL) determines how long content remains fresh in the CDN cache before requiring revalidation or refresh from the origin. TTL strategy is a fundamental tension between freshness (showing current content) and efficiency (avoiding origin requests).
The TTL Spectrum:
```
Very Short TTL (seconds)       Medium TTL (minutes-hours)       Very Long TTL (days-years)
│                              │                                │
│ Fresh but expensive          │ Balanced approach              │ Efficient but stale
│ High origin load             │ Reasonable freshness           │ Requires cache busting
│ Good for dynamic content     │ Good for semi-static content   │ Good for immutable assets
▼                              ▼                                ▼
```
| Content Type | Recommended TTL | Rationale | s-maxage vs max-age |
|---|---|---|---|
| Versioned static assets (hash) | 1 year (31536000s) | URL changes with content; truly immutable | Same for both |
| Unversioned images/media | 1 day - 1 week | Changes infrequently; moderate staleness OK | CDN longer than browser |
| API responses (public data) | 1-5 minutes | Freshness matters; balance with efficiency | CDN same or longer |
| HTML pages | 0 (must-revalidate) | Always get fresh asset references | n/a—use no-cache |
| User-specific data | 0 or private only | Cannot serve cached to different users | Browser only, no CDN |
| Real-time data (stock prices) | 0 + no-cache | Staleness unacceptable | n/a—always revalidate |
Differentiating Browser and CDN TTLs:
You can set different cache durations for browsers (max-age) and CDNs (s-maxage). This enables powerful patterns:
```
# CDN caches for 1 hour, browsers cache for 5 minutes
Cache-Control: public, max-age=300, s-maxage=3600
```
Why differentiate? You can purge the CDN instantly when content changes, but you cannot purge users' browsers. A long s-maxage keeps CDN hit rates high (since a purge can always correct it), while a short max-age bounds how stale any individual browser's copy can get.
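To make the split concrete, here is a minimal Cache-Control parser sketch showing which TTL each cache tier would apply; real caches handle many more directives than this:

```javascript
// Which TTL applies depends on who is caching: shared caches (CDNs)
// prefer s-maxage, browsers use max-age. Simplified sketch.
function applicableTtl(cacheControl, isSharedCache) {
  const directives = {};
  for (const part of cacheControl.split(',')) {
    const [name, value] = part.trim().split('=');
    directives[name.toLowerCase()] = value !== undefined ? Number(value) : true;
  }
  if (isSharedCache && directives['s-maxage'] !== undefined) {
    return directives['s-maxage']; // shared caches honor s-maxage first
  }
  return directives['max-age'] ?? 0;
}

const header = 'public, max-age=300, s-maxage=3600';
console.log(applicableTtl(header, true));  // CDN tier: 3600
console.log(applicableTtl(header, false)); // browser: 300
```

With no s-maxage present, both tiers fall back to max-age, which is why the directive is purely additive.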
Stale Content Directives:
Modern HTTP caching supports serving stale content in specific situations:
```
# Stale-While-Revalidate: Serve stale immediately, fetch fresh in background
Cache-Control: public, max-age=60, stale-while-revalidate=300

# Timeline:
# 0-60 seconds:   Fresh - serve from cache
# 60-360 seconds: Stale but acceptable - serve stale, fetch fresh in background
# 360+ seconds:   Must revalidate before serving

# Stale-If-Error: Serve stale if origin is unreachable
Cache-Control: public, max-age=300, stale-if-error=86400

# Timeline:
# 0-300 seconds:  Fresh - serve from cache
# 300+ seconds:   Stale - if origin returns 5xx or times out, serve stale
# After 86700 seconds (300 + 86400): Don't serve stale even on errors

# Combined pattern for resilient API responses
Cache-Control: public, max-age=60, stale-while-revalidate=300, stale-if-error=86400

# This provides:
# - 60 seconds of fresh cache
# - Immediate response with background refresh for 5 more minutes
# - Graceful degradation to stale content during outages for up to 24 hours
```

Some CDNs (Fastly, Varnish-based) call stale-while-revalidate behavior "grace mode." It significantly improves perceived performance: users never wait for origin responses if any cached version is available. The background refresh ensures eventual consistency.
When content changes before TTL expiration, you need mechanisms to remove or update cached content. CDN cache invalidation is notoriously complex—Phil Karlton famously noted that "There are only two hard things in Computer Science: cache invalidation and naming things."
Invalidation Strategies:
```javascript
// Cloudflare API - Purge specific URLs
async function purgeUrls(urls) {
  const response = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/purge_cache`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ files: urls }),
    }
  );
  return response.json();
}

// Purge by prefix (wildcard)
async function purgePrefix(prefix) {
  // Not all CDNs support this; check your provider
  const response = await fetch(
    `https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/purge_cache`,
    {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${API_TOKEN}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ prefixes: [prefix] }),
    }
  );
  return response.json();
}

// Purge by cache tag (surrogate key)
// Requires adding Surrogate-Key header to responses
async function purgeByTag(tags) {
  // Fastly API example
  const requests = tags.map(tag =>
    fetch(`https://api.fastly.com/service/${SERVICE_ID}/purge/${tag}`, {
      method: 'POST',
      headers: { 'Fastly-Key': API_KEY },
    })
  );
  return Promise.all(requests);
}

// Usage in deployment pipeline
async function deployAndInvalidate() {
  // 1. Deploy new version of assets
  await deployAssets();

  // 2. Purge old cached content
  await purgeByTag(['static-assets', 'v1.2.3']);

  // 3. Warm the cache for critical paths
  await warmCache([
    'https://example.com/',
    'https://example.com/products',
    'https://example.com/api/featured',
  ]);

  console.log('Deployment complete, caches invalidated and warmed');
}
```

Cache Tags / Surrogate Keys:
The most powerful invalidation pattern uses cache tags (Fastly calls them "surrogate keys"). Instead of tracking individual URLs, you tag responses with logical identifiers, then purge by tag.
```
GET /products/12345

HTTP/1.1 200 OK
Cache-Control: public, max-age=86400
Surrogate-Key: product-12345 category-electronics products-all

{"id": 12345, "name": "Laptop", ...}
```
When product 12345 is updated, purge product-12345. When category structure changes, purge category-electronics. When all products need refresh, purge products-all.
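Conceptually, tag-based purging is a reverse index from tag to cache entries. A minimal in-memory sketch of the idea (not how any CDN stores this internally):

```javascript
// In-memory sketch of tag-based invalidation: a reverse index maps each
// surrogate key (tag) to the cache entries that carry it.
class TaggedCache {
  constructor() {
    this.entries = new Map();   // cacheKey -> { body, tags }
    this.tagIndex = new Map();  // tag -> Set of cacheKeys
  }

  put(cacheKey, body, tags) {
    this.entries.set(cacheKey, { body, tags });
    for (const tag of tags) {
      if (!this.tagIndex.has(tag)) this.tagIndex.set(tag, new Set());
      this.tagIndex.get(tag).add(cacheKey);
    }
  }

  purgeTag(tag) {
    // One purge call invalidates every entry carrying the tag
    for (const cacheKey of this.tagIndex.get(tag) ?? []) {
      this.entries.delete(cacheKey);
    }
    this.tagIndex.delete(tag);
  }

  get(cacheKey) {
    return this.entries.get(cacheKey)?.body;
  }
}

const cache = new TaggedCache();
cache.put('/products/12345', '{"id":12345}', ['product-12345', 'products-all']);
cache.put('/products', '[catalog listing]', ['products-all']);
cache.purgeTag('product-12345'); // product page gone, listing untouched
console.log(cache.get('/products/12345')); // undefined
console.log(cache.get('/products'));       // [catalog listing]
```

The origin's only job is emitting accurate tags on each response; the CDN maintains the index.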
Benefits of Tag-based Invalidation:
| Approach | Invalidate when product updated | Implementation Complexity |
|---|---|---|
| URL-based purge | Must know all URLs referencing the product | High—track all URL variations |
| Wildcard purge | Broad purge, over-invalidates | Low but inefficient |
| Cache tags | Purge single tag | Medium—add tags to responses |
Purging popular content from all edge locations simultaneously can cause a thundering herd—all users suddenly hitting origin at once. Mitigate with: origin shield (single point for refetch), staggered purge, request coalescing at edge, or soft purge with stale-while-revalidate.
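Request coalescing, one of the mitigations above, fits in a few lines: concurrent misses for the same key share a single in-flight origin fetch. A simplified illustration, not production code:

```javascript
// Request coalescing sketch: concurrent misses for the same key join one
// in-flight origin fetch instead of stampeding the origin.
const inFlight = new Map();

async function coalescedFetch(key, fetchFromOrigin) {
  if (inFlight.has(key)) return inFlight.get(key); // join the existing fetch
  const promise = fetchFromOrigin(key).finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}

// Demo: 100 concurrent requests for a just-purged hot page, one origin hit
let originHits = 0;
const origin = async key => { originHits++; return `body-for-${key}`; };

Promise.all(
  Array.from({ length: 100 }, () => coalescedFetch('/hot-page', origin))
).then(results => {
  console.log(originHits);  // 1
  console.log(results[0]);  // body-for-/hot-page
});
```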
An origin shield is an intermediate caching layer between edge PoPs and your origin server. It acts as a regional cache that serves multiple edge locations, collapsing many edge-to-origin requests into fewer shield-to-origin requests.
Without Origin Shield:
```
Edge (Tokyo)  ──┐
Edge (Seoul)  ──┼─── 3 separate requests ──→ Origin (US)
Edge (Sydney) ──┘

On cache miss, each edge fetches independently
```
With Origin Shield:
```
Edge (Tokyo)  ──┐                          ┌──→ Origin (US)
Edge (Seoul)  ──┼─→ Shield (Singapore) ────┤
Edge (Sydney) ──┘   (regional cache)       └──→ (only if shield misses)

On cache miss, only the shield fetches from origin
```
| Strategy | Place Shield Near | Use Case |
|---|---|---|
| Near Origin | Same region as origin server | Maximum origin protection; simpler architecture |
| Near Users | Region with most traffic | Faster shield hits for majority of users |
| Multiple Shields | One per major region | Global traffic distribution; regional resilience |
| Hierarchical | Multiple tiers of caching | Extreme scale; complex but optimal |
```yaml
# AWS CloudFront Distribution with Origin Shield
AWSTemplateFormatVersion: '2010-09-09'
Description: CloudFront distribution with Origin Shield configured

Resources:
  CDNDistribution:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Origins:
          - Id: MyOrigin
            DomainName: origin.example.com
            CustomOriginConfig:
              HTTPPort: 80
              HTTPSPort: 443
              OriginProtocolPolicy: https-only
              OriginSSLProtocols:
                - TLSv1.2
            # Origin Shield Configuration
            OriginShield:
              Enabled: true
              # Choose region closest to your origin
              OriginShieldRegion: us-east-1
            # Origin timeout and retry settings
            ConnectionAttempts: 3
            ConnectionTimeout: 10
        DefaultCacheBehavior:
          TargetOriginId: MyOrigin
          ViewerProtocolPolicy: redirect-to-https
          # Cache settings
          CachePolicyId: !Ref CachePolicy
          # Enable compression
          Compress: true
        PriceClass: PriceClass_All
        Enabled: true

  CachePolicy:
    Type: AWS::CloudFront::CachePolicy
    Properties:
      CachePolicyConfig:
        Name: OptimizedCaching
        DefaultTTL: 86400    # 1 day
        MaxTTL: 31536000     # 1 year
        MinTTL: 0
        ParametersInCacheKeyAndForwardedToOrigin:
          CookiesConfig:
            CookieBehavior: none
          HeadersConfig:
            HeaderBehavior: whitelist
            Headers:
              # Accept-Encoding is handled by the Enable flags below and
              # must not be listed here
              - Accept-Language
          QueryStringsConfig:
            QueryStringBehavior: all
          EnableAcceptEncodingBrotli: true
          EnableAcceptEncodingGzip: true
```

Origin shields typically implement request coalescing: when multiple concurrent requests arrive for the same uncached content, only one request goes to origin while others wait. This is transparent to your application but dramatically reduces origin load during cache warming or after purges.
While CDNs are traditionally associated with static content, modern CDNs provide significant benefits for dynamic API responses and personalized content. Understanding how to leverage CDNs for dynamic content expands their utility considerably.
Micro-Caching for Dynamic Content:
Even content that changes frequently can benefit from very short TTLs (1-10 seconds). This "micro-caching" pattern absorbs traffic spikes and collapses origin load while keeping staleness tightly bounded.
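The arithmetic behind micro-caching is worth making explicit. Assuming ideal caching, the origin sees at most one request per TTL window per cache key, no matter the incoming traffic:

```javascript
// Origin load under micro-caching: at most one miss per TTL window per
// distinct cache key (idealized; ignores multi-PoP fan-out).
function originLoad(incomingRps, ttlSeconds, distinctKeys = 1) {
  const originRps = Math.min(incomingRps, distinctKeys / ttlSeconds);
  const reduction = 1 - originRps / incomingRps;
  return { originRps, reductionPct: (reduction * 100).toFixed(1) };
}

// 1,000 req/s against one hot endpoint, cached for just 1 second:
console.log(originLoad(1000, 1)); // { originRps: 1, reductionPct: '99.9' }
```

Even a 1-second TTL turns a 1,000 req/s spike into roughly one origin request per second.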
```
# API endpoint: Product catalog (changes hourly, allows staleness)
Cache-Control: public, max-age=60, stale-while-revalidate=300
# CDN serves fresh for 1 minute, then stale while refreshing for 5 more minutes

# API endpoint: Stock prices (changes constantly, minimal staleness)
Cache-Control: public, max-age=1, stale-while-revalidate=5
# Only 1-second freshness, but still absorbs traffic spikes

# API endpoint: Trending content (social proof, some staleness OK)
Cache-Control: public, max-age=30, stale-while-revalidate=60
# Updates every 30 seconds; users see fresh-ish content

# API endpoint: User-specific but not sensitive (recommendations)
Cache-Control: private, max-age=300
# Browser caches 5 minutes; CDN doesn't cache (private)

# Approach: Geographic variations of "public" content
# Cache different versions by region
Vary: Accept-Encoding
# Plus CDN-level configuration to vary by geographic region
```

Edge Compute for Dynamic Content:
Modern CDNs offer edge compute capabilities (Cloudflare Workers, AWS Lambda@Edge, Fastly Compute@Edge) that enable dynamic content generation at the edge:
| Use Case | Edge Compute Pattern | Benefit |
|---|---|---|
| A/B testing | Assign user to variant at edge | No origin latency for variant assignment |
| Personalization | Inject user context into cached content | Combine caching with customization |
| Bot detection | Analyze request patterns at edge | Protect origin from bad traffic |
| API aggregation | Combine multiple cached responses | Reduce client round trips |
| Authentication validation | Validate JWT at edge | Block unauthorized before origin |
```javascript
// Edge worker: Personalized content with cached base
addEventListener('fetch', event => {
  event.respondWith(handleRequest(event));
});

async function handleRequest(event) {
  const request = event.request;
  const url = new URL(request.url);

  // Example: Homepage personalization
  if (url.pathname === '/') {
    return personalizeHomepage(event);
  }

  // Pass through to origin
  return fetch(request);
}

async function personalizeHomepage(event) {
  const request = event.request;

  // Fetch cached base HTML (generic version)
  const cache = caches.default;
  const baseUrl = new URL(request.url);
  baseUrl.searchParams.set('base', 'true');

  let baseHtml = await cache.match(baseUrl);
  if (!baseHtml) {
    baseHtml = await fetch(baseUrl.toString());
    event.waitUntil(cache.put(baseUrl, baseHtml.clone()));
  }

  // Get personalization data (from cookie or API)
  const userData = await getUserData(request);

  // Transform cached base with personalization
  const personalizedHtml = await personalizeHtml(baseHtml, userData);

  return new Response(personalizedHtml.body, {
    headers: {
      'Content-Type': 'text/html',
      'Cache-Control': 'private, max-age=0', // Don't cache personalized version
    },
  });
}

async function getUserData(request) {
  const cookie = request.headers.get('Cookie');
  const userId = extractUserIdFromCookie(cookie);

  if (!userId) {
    return { isAnonymous: true, segments: [] };
  }

  // Fetch user segments (cached at edge)
  const segmentsUrl = `https://api.example.com/users/${userId}/segments`;
  const segmentsResponse = await fetch(segmentsUrl, {
    cf: { cacheTtl: 300 }, // Cache user segments for 5 minutes
  });
  return segmentsResponse.json();
}

async function personalizeHtml(response, userData) {
  const html = await response.text();

  // Use HTMLRewriter for streaming transformation
  return new HTMLRewriter()
    .on('[data-personalize="greeting"]', {
      element(element) {
        element.setInnerContent(
          userData.isAnonymous
            ? 'Welcome!'
            : `Welcome back, ${userData.name}!`
        );
      },
    })
    .on('[data-segment]', {
      element(element) {
        const targetSegment = element.getAttribute('data-segment');
        if (!userData.segments.includes(targetSegment)) {
          element.remove();
        }
      },
    })
    .transform(new Response(html));
}
```

Edge-Side Includes (ESI) is a markup language for assembling pages from cached fragments. The CDN caches the base page and dynamic fragments separately, then assembles them at the edge. This pattern works well for pages with mostly static content and small personalized sections (user name, cart count).
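To illustrate the ESI assembly model, here is a toy stand-in for the CDN's ESI processor. The `assembleEsi` helper and fragment paths are illustrative; the tag syntax follows ESI 1.0:

```javascript
// Minimal sketch of edge-side assembly: the base page is cached with
// <esi:include> placeholders, and the edge splices in separately cached
// (or per-user) fragments. Toy implementation, not a real ESI processor.
function assembleEsi(template, fragments) {
  return template.replace(
    /<esi:include src="([^"]+)"\s*\/>/g,
    (_, src) => fragments[src] ?? ''
  );
}

const cachedBase =
  '<header>Shop <esi:include src="/fragments/cart-badge"/></header>';
const page = assembleEsi(cachedBase, {
  '/fragments/cart-badge': '<span>3 items</span>', // fetched per user, short TTL
});
console.log(page); // <header>Shop <span>3 items</span></header>
```

The base page stays highly cacheable; only the small fragments bypass or shorten the cache.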
Effective CDN caching requires continuous monitoring to identify opportunities, detect issues, and optimize performance. Key metrics reveal how well your caching strategy is working.
Critical CDN Metrics:
| Metric | What It Measures | Target Range | Action if Out of Range |
|---|---|---|---|
| Cache Hit Ratio | % of requests served from cache | > 90% for static, > 50% for all | Review TTLs, Vary headers, cache key config |
| Origin Request Rate | Requests reaching origin per second | Stable, not tracking traffic spikes | Investigate cache misses, enable shield |
| Edge Response Time | Time from edge receive to edge send | < 50ms for cache hits | Check edge compute, response size |
| Origin Response Time | Time for origin to respond | < 500ms p95 | Optimize origin, check network path |
| Error Rate | % of 4xx/5xx responses | < 1% | Debug errors, check origin health |
| Bandwidth Saved | Data served from cache vs origin | > 80% | Review asset caching, compression |
| TTL Distribution | Age of cached content when served | Varied by content type | Adjust TTLs if all very young/old |
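The first and last-but-one rows of the table can be computed directly from access logs. A sketch with illustrative field names (real CDN log schemas differ by vendor):

```javascript
// Compute cache hit ratio and bandwidth saved from simplified access-log
// entries. Field names (cacheStatus, bytes) are illustrative.
function cacheStats(logEntries) {
  let hits = 0, hitBytes = 0, totalBytes = 0;
  for (const { cacheStatus, bytes } of logEntries) {
    totalBytes += bytes;
    if (cacheStatus === 'HIT') { hits++; hitBytes += bytes; }
  }
  return {
    hitRatio: hits / logEntries.length,           // requests served from cache
    bandwidthSavedPct: (hitBytes / totalBytes) * 100, // bytes not fetched from origin
  };
}

const stats = cacheStats([
  { cacheStatus: 'HIT', bytes: 500 },
  { cacheStatus: 'HIT', bytes: 300 },
  { cacheStatus: 'MISS', bytes: 200 },
  { cacheStatus: 'HIT', bytes: 1000 },
]);
console.log(stats.hitRatio);          // 0.75
console.log(stats.bandwidthSavedPct); // 90
```

Note the two metrics can diverge: a cache that hits mostly on small objects has a high request hit ratio but saves little bandwidth.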
Debugging Cache Behavior:
CDNs typically add response headers that reveal caching behavior:
```
# Cloudflare response headers
CF-Cache-Status: HIT      # Served from Cloudflare cache
CF-Cache-Status: MISS     # Fetched from origin
CF-Cache-Status: BYPASS   # Caching not applicable
CF-Cache-Status: DYNAMIC  # Not cached (dynamic content)
CF-Cache-Status: EXPIRED  # Was cached but TTL expired
Age: 3600                 # Content has been cached for 1 hour

# AWS CloudFront
X-Cache: Hit from cloudfront
X-Cache: Miss from cloudfront
X-Amz-Cf-Pop: IAD53-C1    # Edge location that served request

# Fastly
X-Served-By: cache-sin18032-SIN  # Singapore PoP
X-Cache: HIT, HIT                # Hit at edge AND shield
X-Cache-Hits: 42                 # Number of hits for this object
Age: 7200                        # Cached for 2 hours
```
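When monitoring across vendors, it helps to normalize these headers into one status value. A sketch based on the header formats above; the mapping rules are simplifying assumptions, so verify against your CDN's documentation:

```javascript
// Normalize hit/miss signals across CDN vendors' debug headers so one
// monitoring pipeline can consume them. Expects lowercase header keys.
function normalizedCacheStatus(headers) {
  const h = name => headers[name.toLowerCase()];

  const cf = h('CF-Cache-Status'); // Cloudflare
  if (cf) return cf === 'HIT' ? 'HIT' : cf === 'MISS' ? 'MISS' : 'OTHER';

  const xCache = h('X-Cache');     // CloudFront / Fastly
  if (xCache) return /\bhit\b/i.test(xCache) ? 'HIT' : 'MISS';

  return 'UNKNOWN';
}

console.log(normalizedCacheStatus({ 'cf-cache-status': 'HIT' }));          // HIT
console.log(normalizedCacheStatus({ 'x-cache': 'Miss from cloudfront' })); // MISS
console.log(normalizedCacheStatus({ 'x-cache': 'HIT, HIT' }));             // HIT
```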
```javascript
// CDN Analytics Query - Example using CloudWatch for CloudFront
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();

const sum = values => values.reduce((a, b) => a + b, 0);
const average = values => (values.length ? sum(values) / values.length : 0);

async function getCDNMetrics(distributionId, hours = 24) {
  const endTime = new Date();
  const startTime = new Date(endTime - hours * 60 * 60 * 1000);

  const metrics = [
    { name: 'Requests', stat: 'Sum' },
    { name: 'BytesDownloaded', stat: 'Sum' },
    { name: 'CacheHitRate', stat: 'Average' },
    { name: 'OriginLatency', stat: 'Average' },
    { name: '4xxErrorRate', stat: 'Average' },
    { name: '5xxErrorRate', stat: 'Average' },
  ];

  const params = {
    StartTime: startTime,
    EndTime: endTime,
    MetricDataQueries: metrics.map((m, i) => ({
      Id: `m${i}`,
      MetricStat: {
        Metric: {
          Namespace: 'AWS/CloudFront',
          MetricName: m.name,
          Dimensions: [
            { Name: 'DistributionId', Value: distributionId },
            { Name: 'Region', Value: 'Global' },
          ],
        },
        Period: 3600, // 1 hour granularity
        Stat: m.stat,
      },
    })),
  };

  const data = await cloudwatch.getMetricData(params).promise();

  // Calculate summary
  return {
    totalRequests: sum(data.MetricDataResults[0].Values),
    bytesServed: sum(data.MetricDataResults[1].Values),
    avgCacheHitRate: average(data.MetricDataResults[2].Values),
    avgOriginLatency: average(data.MetricDataResults[3].Values),
    errorRate:
      average(data.MetricDataResults[4].Values) +
      average(data.MetricDataResults[5].Values),
  };
}

// Set up alerting for cache issues
async function checkCacheHealth(distributionId) {
  const metrics = await getCDNMetrics(distributionId, 1);
  const alerts = [];

  if (metrics.avgCacheHitRate < 0.7) {
    alerts.push({
      severity: 'warning',
      message: `Cache hit rate low: ${(metrics.avgCacheHitRate * 100).toFixed(1)}%`,
      action: 'Review TTL configurations and Vary headers',
    });
  }

  if (metrics.avgOriginLatency > 500) {
    alerts.push({
      severity: 'critical',
      message: `Origin latency high: ${metrics.avgOriginLatency.toFixed(0)}ms`,
      action: 'Check origin server health and network path',
    });
  }

  if (metrics.errorRate > 0.01) {
    alerts.push({
      severity: 'critical',
      message: `Error rate elevated: ${(metrics.errorRate * 100).toFixed(2)}%`,
      action: 'Investigate origin errors in logs',
    });
  }

  return alerts;
}
```

CDN metrics show what's happening at the edge, but real user monitoring (RUM) shows what users actually experience. Combine CDN metrics with client-side performance data (Core Web Vitals, Navigation Timing API) for a complete picture. A 99% cache hit rate doesn't help if the cached content is the wrong content.
CDN caching is essential for delivering fast, reliable experiences to global users. By placing content closer to users physically, CDNs eliminate the latency that no amount of server optimization can overcome. Effective CDN usage requires understanding cache keys, TTL strategies, invalidation patterns, and monitoring.
What's Next:
With browser and CDN caching covered, we move closer to the application. The next page explores Application-Level Caching—where your code explicitly manages cached data in memory, using techniques like memoization, in-process caches, and integration with caching libraries. This layer offers the most control but requires careful design to maintain consistency.
You now understand CDN caching comprehensively—from architecture and cache key design to TTL strategies, invalidation patterns, origin shield, and edge compute. You can design CDN caching strategies that deliver sub-100ms global response times while maintaining content freshness and operational control.