Every time a user requests a resource through a CDN—whether it's an image, a JavaScript bundle, an API response, or a video stream—the CDN must answer a deceptively simple question: "Have I seen this exact request before?"
The mechanism that answers this question is the cache key. Despite being invisible to most developers, cache keys are the single most critical concept in CDN caching. A poorly constructed cache key can render an expensive CDN infrastructure nearly useless, while a well-designed cache key strategy can transform application performance and dramatically reduce origin server load.
Understanding cache keys isn't just about knowing what they are—it's about understanding how they determine cache behavior, how subtle variations can cause cache misses, and how to design caching strategies that maximize efficiency at global scale.
By the end of this page, you will understand how cache keys are constructed, why default cache key behavior often leads to poor cache efficiency, how to customize cache keys for different scenarios, and the advanced techniques that elite engineers use to achieve 95%+ cache hit ratios.
A cache key is a unique identifier generated by a CDN to determine whether a cached response exists for a given request. When a request arrives at a CDN edge server, the CDN computes a cache key from the request properties and uses it to look up a corresponding cached response in its storage.
The fundamental operation:
cache_storage.get(cache_key)

The cache key is essentially the input to a hash lookup: it reduces a complex HTTP request to a string that can be used as a dictionary key. The challenge lies in deciding which parts of the request should contribute to this key.
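The lookup described above can be sketched as a tiny in-memory edge cache. This is an illustration only: `cacheStorage`, `computeCacheKey`, `fetchFromOrigin`, and `handle` are hypothetical names, and the key here is simply host plus path, which is an assumption rather than any CDN's actual construction.

```javascript
// Minimal in-memory model of an edge cache lookup (illustrative only).
const cacheStorage = new Map();
let originFetches = 0;

// Stand-in for a real origin request; counts how often the origin is hit.
function fetchFromOrigin(request) {
  originFetches += 1;
  return `body for ${request.path}`;
}

// Assumption: the key is host + path. Real CDNs include more components.
function computeCacheKey(request) {
  return request.host + request.path;
}

function handle(request) {
  const cacheKey = computeCacheKey(request);
  if (cacheStorage.has(cacheKey)) {
    return { body: cacheStorage.get(cacheKey), cache: 'HIT' };
  }
  const body = fetchFromOrigin(request);
  cacheStorage.set(cacheKey, body);
  return { body, cache: 'MISS' };
}

const first = handle({ host: 'cdn.example.com', path: '/images/hero.png' });
const second = handle({ host: 'cdn.example.com', path: '/images/hero.png' });
console.log(first.cache, second.cache, originFetches);
```

The second request never reaches the origin because both requests reduce to the same key. Everything that follows is about controlling what goes into that key.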
```
REQUEST:
GET /images/hero.png HTTP/1.1
Host: cdn.example.com
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
Cookie: session=abc123; preferences=dark-mode
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64)...

DEFAULT CACHE KEY (Cloudflare-style):
cdn.example.com/images/hero.png

WITH QUERY STRING:
cdn.example.com/images/hero.png?version=2.1

WITH DEVICE TYPE (custom):
cdn.example.com/images/hero.png::desktop

WITH GEOGRAPHIC VARIANT (custom):
cdn.example.com/images/hero.png::US::desktop
```

The cache key must be specific enough to differentiate responses that should be different (don't serve mobile images to desktop users), yet general enough to maximize cache sharing (don't create separate cache entries for every unique cookie value). Finding this balance is the art of cache key design.
Different CDN providers use different default cache key constructions, but most share common foundational components. Understanding the default behavior is essential because everything you do in cache key customization builds on or modifies these defaults.
| Component | Cloudflare | AWS CloudFront | Akamai | Fastly |
|---|---|---|---|---|
| Hostname | ✓ Included | ✓ Included | ✓ Included | ✓ Included |
| URI Path | ✓ Included | ✓ Included | ✓ Included | ✓ Included |
| Query String | ✓ Included (configurable) | ✓ Configurable | ✓ Configurable | ✓ Included (configurable) |
| HTTP Method | ✓ Implied (GET only cached by default) | ✓ Configurable | ✓ Configurable | ✓ Included |
| Protocol (HTTP/HTTPS) | ✗ Unified by default | ✗ Unified by default | ✓ Configurable | ✗ Unified by default |
| Headers | ✗ Not included by default | ✓ Via Cache Policy | ✓ Configurable | ✓ Via Vary header |
| Cookies | ✗ Not included by default | ✓ Via Cache Policy | ✓ Configurable | ✗ Not included by default |
The Minimal Default Cache Key:
Most CDNs default to something resembling:
cache_key = hash(hostname + path + sorted_query_string)
This simple construction handles the majority of static asset caching needs. A request for https://cdn.example.com/assets/app.js?v=1.2.3 produces a different cache key than https://cdn.example.com/assets/app.js?v=1.2.4, ensuring that version updates don't serve stale cached content.
Why This Works for Static Assets:
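Static assets like images, scripts, and stylesheets are typically identical for every user, do not depend on cookies or request headers, and change only when their URL (path or version parameter) changes.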
For these resources, the minimal cache key achieves maximum cache efficiency.
Some CDNs treat /page?a=1&b=2 and /page?b=2&a=1 as different cache keys. This subtle behavior can halve your cache hit ratio for APIs with dynamically constructed query strings. Always check if your CDN normalizes query string order, and if not, normalize it in your application or edge logic.
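A minimal sketch of query string normalization, assuming a key built from hostname, path, and sorted query parameters. `buildDefaultCacheKey` is a hypothetical helper, not a CDN API, and real CDNs typically hash the resulting string rather than use it directly.

```javascript
// Sketch of a default-style cache key: hostname + path + sorted query string.
// buildDefaultCacheKey is a hypothetical helper, not part of any CDN SDK.
function buildDefaultCacheKey(rawUrl) {
  const url = new URL(rawUrl);
  const params = new URLSearchParams(url.search);
  params.sort(); // sorts by parameter name, keeping relative order of duplicates
  const query = params.toString();
  // Assumption: the scheme is not part of the key.
  return url.hostname + url.pathname + (query ? '?' + query : '');
}

const keyA = buildDefaultCacheKey('https://cdn.example.com/page?a=1&b=2');
const keyB = buildDefaultCacheKey('https://cdn.example.com/page?b=2&a=1');
const keyV1 = buildDefaultCacheKey('https://cdn.example.com/assets/app.js?v=1.2.3');
const keyV2 = buildDefaultCacheKey('https://cdn.example.com/assets/app.js?v=1.2.4');

console.log(keyA === keyB);   // reordered parameters share one key
console.log(keyV1 === keyV2); // different versions stay separate
```

Sorting makes parameter order irrelevant while leaving parameter values significant, so versioned assets still get distinct entries.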
The elegance of default cache keys masks serious problems that emerge in real-world scenarios. Understanding these failure modes is critical for architects designing CDN strategies for complex applications.
- Tracking parameters (?utm_source=twitter&utm_campaign=fall_2024) create unique cache keys for identical content, causing cache fragmentation across thousands of variations.
- Session identifiers in URLs (?session=abc123) make every request unique, effectively bypassing the cache entirely.
- Cache-busting parameters (?rand=0.123456789) ensure fresh content but annihilate cache efficiency: every single request is a guaranteed cache miss.
- A/B test parameters (?variant=B) create separate cache entries for each test variant, multiplying storage and reducing hit ratio.
- Locale selectors (?lang=en_US) placed in query strings when they belong in the path or should be handled via content negotiation.
```
SINGLE PIECE OF CONTENT: /blog/article-123.html

BUT THESE ALL HAVE DIFFERENT CACHE KEYS:
/blog/article-123.html
/blog/article-123.html?utm_source=twitter
/blog/article-123.html?utm_source=facebook
/blog/article-123.html?utm_source=linkedin
/blog/article-123.html?utm_source=email&utm_campaign=newsletter
/blog/article-123.html?utm_source=twitter&utm_medium=social
/blog/article-123.html?ref=homepage
/blog/article-123.html?fbclid=IwAR3abc123...

RESULT: 8+ cache entries for IDENTICAL content
        Each entry consumes edge storage
        Each first access is a cache miss (hits origin)
        Real cache hit ratio: potentially <12% for viral content
```

The Cost of Cache Fragmentation:
Consider a blog post that goes viral. Without cache key optimization:
- Visitors arriving from Twitter (?utm_source=twitter)
- Visitors arriving from Facebook (?utm_source=facebook)
- Visitors arriving from email (?utm_source=email)
- Visitors opening the URL directly, with no tracking parameters

With default cache keys, your origin server receives at least 4 requests (one cache miss per unique UTM variation), and the CDN stores 4 copies of identical content. It gets worse: Twitter and Facebook often add their own tracking parameters, creating potentially dozens of unique cache keys.
With proper cache key stripping, your origin receives 1 request, and all 250,000 visitors receive the cached response.
Cache fragmentation doesn't just reduce efficiency—it amplifies origin load during traffic spikes. When content goes viral, the moment you need caching most is precisely when fragmented cache keys fail you. Each unique query string variation triggers a fresh origin request, potentially overwhelming your infrastructure during peak demand.
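To make the effect concrete, here is a hedged sketch that collapses the fragmented URLs from the example above into a single key by keeping only allowlisted parameters. `ALLOWED_PARAMS` and `strippedCacheKey` are hypothetical names, and the allowlist contents are an assumption for this illustration.

```javascript
// Parameters assumed to change the response; everything else is dropped.
const ALLOWED_PARAMS = ['version', 'format'];

// strippedCacheKey is a hypothetical helper for illustration.
function strippedCacheKey(pathAndQuery) {
  // The base URL is a placeholder so relative paths can be parsed.
  const url = new URL(pathAndQuery, 'https://example.com');
  const kept = new URLSearchParams();
  for (const name of ALLOWED_PARAMS) {
    if (url.searchParams.has(name)) {
      kept.set(name, url.searchParams.get(name));
    }
  }
  kept.sort();
  const query = kept.toString();
  return url.pathname + (query ? '?' + query : '');
}

const requests = [
  '/blog/article-123.html',
  '/blog/article-123.html?utm_source=twitter',
  '/blog/article-123.html?utm_source=facebook',
  '/blog/article-123.html?utm_source=linkedin',
  '/blog/article-123.html?utm_source=email&utm_campaign=newsletter',
  '/blog/article-123.html?utm_source=twitter&utm_medium=social',
  '/blog/article-123.html?ref=homepage',
  '/blog/article-123.html?fbclid=IwAR3abc123...',
];

const rawKeys = new Set(requests);
const strippedKeys = new Set(requests.map(strippedCacheKey));
console.log(rawKeys.size, strippedKeys.size);
```

Each distinct key costs one origin fetch on first access, so the size of each set is a rough proxy for origin requests per edge location before and after stripping.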
Modern CDNs provide sophisticated mechanisms to customize cache key generation. Mastering these techniques is essential for achieving optimal cache efficiency while maintaining correct behavior for your specific use cases.
- Query parameter filtering: keep only the parameters that affect the response. For example, include version, api_key; exclude utm_*, fbclid, session.
- Query string sorting: normalize parameter order so that /path?a=1&b=2 and /path?b=2&a=1 produce identical cache keys.
- Header-based variation: add selected request properties (Accept-Language, Accept-Encoding, device type) to serve different cached versions to different clients.
```javascript
// Cloudflare Worker: Custom Cache Key Construction
addEventListener('fetch', event => {
  // Pass the whole event so the handler can call event.waitUntil()
  event.respondWith(handleRequest(event));
});

async function handleRequest(event) {
  const request = event.request;

  // cache.put() only accepts GET requests, so pass other methods through
  if (request.method !== 'GET') {
    return fetch(request);
  }

  const url = new URL(request.url);

  // Build custom cache key
  const cacheKeyUrl = new URL(request.url);

  // 1. Strip analytics parameters (keep only relevant ones)
  const relevantParams = ['version', 'format', 'quality'];
  const newParams = new URLSearchParams();

  for (const param of relevantParams) {
    if (url.searchParams.has(param)) {
      newParams.set(param, url.searchParams.get(param));
    }
  }

  // Sort parameters for consistency
  newParams.sort();
  cacheKeyUrl.search = newParams.toString();

  // 2. Add device class to cache key (but not full User-Agent)
  const deviceClass = detectDeviceClass(request.headers.get('User-Agent'));
  cacheKeyUrl.searchParams.set('_device', deviceClass);

  // 3. Add country for geo-specific content
  const country = request.headers.get('CF-IPCountry') || 'XX';
  cacheKeyUrl.searchParams.set('_country', country);

  // Create cache key from modified URL
  const cacheKey = new Request(cacheKeyUrl.toString(), {
    method: request.method,
    headers: request.headers,
  });

  // Check cache first
  const cache = caches.default;
  let response = await cache.match(cacheKey);

  if (!response) {
    // Cache miss - forward to origin
    response = await fetch(request);

    // Clone and cache the response without delaying the reply
    const responseClone = response.clone();
    event.waitUntil(cache.put(cacheKey, responseClone));
  }

  return response;
}

function detectDeviceClass(userAgent) {
  if (!userAgent) return 'unknown';
  // Check tablets first: iPad User-Agent strings also contain "Mobile"
  if (/Tablet|iPad/i.test(userAgent)) return 'tablet';
  if (/Mobile|Android|iPhone/i.test(userAgent)) return 'mobile';
  return 'desktop';
}
```

When filtering query parameters, prefer an allowlist (include only known relevant parameters) over a denylist (exclude known irrelevant parameters). New tracking parameters appear constantly, and a denylist will miss them. An allowlist ensures only explicitly approved parameters affect caching.
The HTTP Vary header is the standard mechanism for instructing caches (including CDNs) to include specific request headers in the cache key. Understanding Vary is crucial because it's often the primary way origins communicate caching requirements to CDNs.
```
# Response 1: Cache variants by Accept-Encoding
HTTP/1.1 200 OK
Content-Type: application/javascript
Content-Encoding: gzip
Vary: Accept-Encoding
Cache-Control: public, max-age=31536000

# Cache keys created:
# - /app.js::Accept-Encoding:gzip
# - /app.js::Accept-Encoding:br
# - /app.js::Accept-Encoding:identity

# Response 2: Cache variants by Accept-Language
HTTP/1.1 200 OK
Content-Type: text/html
Vary: Accept-Language
Cache-Control: public, max-age=3600

# Cache keys created:
# - /page::Accept-Language:en-US
# - /page::Accept-Language:es-ES
# - /page::Accept-Language:fr-FR
# ... potentially hundreds of variants

# Response 3: Multiple Vary headers (DANGEROUS)
HTTP/1.1 200 OK
Content-Type: text/html
Vary: Accept-Encoding, Accept-Language, User-Agent
Cache-Control: public, max-age=3600

# Cache keys explode combinatorially:
# - /page::gzip::en-US::Chrome/120
# - /page::gzip::en-US::Firefox/121
# - /page::br::en-US::Chrome/120
# ... thousands of variants per page
```

Never use Vary: User-Agent in its raw form. With thousands of unique User-Agent strings, you effectively disable caching. Instead, use edge logic to normalize User-Agent into a small set of device classes (mobile, tablet, desktop) and Vary on that custom header.
Best Practices for Vary Headers:
Use Vary: Accept-Encoding — This is almost always safe and enables serving pre-compressed assets (gzip, Brotli) efficiently.
Limit Vary Dimensions — Each header in Vary multiplies cache variants. Three Vary headers with 5 possible values each creates 125 cache entries per URL.
Normalize Before Varying — Instead of Vary: Accept-Language with 500+ possible values, normalize to supported languages at the edge and Vary on a custom X-Normalized-Language header with 10-20 values.
Avoid Vary: Cookie — This effectively disables caching for authenticated users. Use cache key extraction for specific cookies instead.
Test Vary Behavior — Different CDNs interpret Vary differently. Some ignore unknown headers, others explode cache entries. Always test.
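The multiplication effect mentioned above is simple arithmetic: the upper bound on variants per URL is the product of the number of distinct values each varied header can take. A small sketch, where `maxVariants` is a hypothetical helper name:

```javascript
// Upper bound on cache variants per URL: product of distinct values per header.
function maxVariants(valuesPerHeader) {
  return valuesPerHeader.reduce((total, count) => total * count, 1);
}

const threeHeadersOfFive = maxVariants([5, 5, 5]); // three Vary headers, 5 values each
const encodingOnly = maxVariants([3]);             // e.g. gzip, br, identity
console.log(threeHeadersOfFive, encodingOnly);
```

This is a worst case. Actual storage depends on which combinations clients really send, but the bound shows why each extra Vary dimension is expensive.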
| Vary Configuration | Cache Variants per URL | Recommendation |
|---|---|---|
| Vary: Accept-Encoding | 3-5 (gzip, br, identity, deflate, none) | ✓ Recommended |
| Vary: Accept-Language (raw) | 100-500+ (all browser languages) | ✗ Avoid—normalize first |
| Vary: Accept-Language (normalized) | 5-20 (supported languages only) | ✓ Acceptable with normalization |
| Vary: User-Agent | 10,000+ (every unique UA string) | ✗ Never use |
| Vary: X-Device-Type (custom) | 3-5 (mobile, tablet, desktop) | ✓ Recommended pattern |
| Vary: Cookie | Infinite (every unique cookie value) | ✗ Effectively disables cache |
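The "normalize before varying" pattern can be sketched as a function that maps a raw Accept-Language header onto a small supported set. The supported languages, the default, and the `normalizeLanguage` name are all assumptions for illustration. The result would typically be written to a custom request header such as X-Normalized-Language before the cache lookup.

```javascript
// Assumed set of languages the origin can actually serve.
const SUPPORTED_LANGUAGES = ['en', 'es', 'fr'];
const DEFAULT_LANGUAGE = 'en';

// normalizeLanguage is a hypothetical helper: it picks the highest-q
// supported primary language, falling back to the default.
function normalizeLanguage(acceptLanguage) {
  if (!acceptLanguage) return DEFAULT_LANGUAGE;

  const ranked = acceptLanguage
    .split(',')
    .map((part) => {
      const [tag, ...params] = part.trim().split(';');
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith('q='));
      const q = qParam ? parseFloat(qParam.slice(2)) : 1;
      return {
        lang: tag.trim().toLowerCase().split('-')[0], // en-US -> en
        q: Number.isNaN(q) ? 0 : q,
      };
    })
    .filter((entry) => entry.q > 0)
    .sort((a, b) => b.q - a.q); // highest preference first

  const match = ranked.find((entry) => SUPPORTED_LANGUAGES.includes(entry.lang));
  return match ? match.lang : DEFAULT_LANGUAGE;
}

console.log(normalizeLanguage('en-US,en;q=0.9'));
console.log(normalizeLanguage('de,fr;q=0.7,en;q=0.3'));
```

With three supported languages, the number of language variants per URL is capped at three regardless of how many distinct Accept-Language strings browsers send.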
Cache key design isn't just about efficiency—it has critical security implications. Cache key collisions occur when different requests produce the same cache key, potentially serving the wrong content to users. In the worst cases, this can lead to severe security vulnerabilities.
- Web cache deception: an attacker tricks a victim into requesting a URL such as /account/settings.css. If the CDN caches based on extension and the origin returns the user's settings page (ignoring the fake extension), the attacker can retrieve cached personal data.
- Cache poisoning via unkeyed headers: if the origin's response depends on a header that is not part of the cache key (e.g., X-Forwarded-Host), attackers can inject that header and poison the cache for all users.
- Parameter pollution: duplicate parameters (?param=good&param=evil) where the cache key uses one value and the origin uses another can cause cached content to differ from expected.
```
ATTACK SCENARIO: Web Cache Deception

1. ATTACKER sends link to victim:
   https://bank.com/account/profile.css

2. VICTIM (authenticated) clicks link
   - Origin returns: User's profile page (HTML with PII)
   - CDN sees: .css extension
   - CDN behavior: Cache this response (CSS files are cacheable)

3. CDN CACHE STATE:
   Key: bank.com/account/profile.css
   Value: Victim's personal profile HTML (cached for all!)

4. ATTACKER requests same URL (unauthenticated):
   https://bank.com/account/profile.css
   - CDN returns: Victim's personal profile (cache hit!)

RESULT: Attacker obtains victim's PII

MITIGATION:
- Ensure Cache-Control headers from origin are respected
- Don't cache based on file extension alone
- Use Cache-Control: private for user-specific responses
- Implement proper cache key segregation for authenticated content
```

Any response containing user-specific data must include Cache-Control: private or Cache-Control: no-store. Public CDN caches should never store content that varies by authentication state unless proper cache key segregation (including a session indicator) is implemented.
Defensive Cache Key Design Principles:
Include Authentication Indicators — For authenticated routes, include a flag in the cache key (not the session token itself, but a simple authenticated/unauthenticated indicator).
Validate Response Cacheability — Even if the CDN would cache, ensure origin sends appropriate Cache-Control headers.
Audit Unkeyed Inputs — Any request property that affects output but isn't in the cache key is a potential poisoning vector. Audit regularly.
Use Cache-Control: private Defensively — When in doubt, mark user-specific routes as private. Performance loss is preferable to security breach.
Implement Cache Segmentation — Use separate cache zones or behaviors for public vs. authenticated content.
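Several of these principles can be combined into a cacheability check that runs before a response is stored. The sketch below is a heuristic under stated assumptions, not a complete defense: `shouldEdgeCache` and `EXPECTED_TYPES` are hypothetical names, headers are assumed to be a plain object with lowercase keys, and the extension-to-type table is deliberately tiny.

```javascript
// Assumed mapping from file extension to a substring of the expected Content-Type.
const EXPECTED_TYPES = {
  '.css': 'text/css',
  '.js': 'javascript',
  '.png': 'image/png',
};

// shouldEdgeCache is a hypothetical helper: it refuses to cache responses
// that are marked private, set cookies, or do not match their extension.
function shouldEdgeCache({ path, status, headers }) {
  const cacheControl = (headers['cache-control'] || '').toLowerCase();
  const contentType = (headers['content-type'] || '').toLowerCase();

  if (status !== 200) return false;
  if (/\b(private|no-store)\b/.test(cacheControl)) return false;
  if (headers['set-cookie']) return false;

  // Web cache deception guard: a ".css" URL that returns HTML is suspicious.
  const ext = Object.keys(EXPECTED_TYPES).find((e) => path.toLowerCase().endsWith(e));
  if (ext && !contentType.includes(EXPECTED_TYPES[ext])) return false;

  return true;
}

const deceptive = shouldEdgeCache({
  path: '/account/profile.css',
  status: 200,
  headers: { 'content-type': 'text/html; charset=utf-8', 'cache-control': 'public, max-age=3600' },
});
const legitimate = shouldEdgeCache({
  path: '/assets/app.css',
  status: 200,
  headers: { 'content-type': 'text/css', 'cache-control': 'public, max-age=31536000' },
});
console.log(deceptive, legitimate);
```

The content-type comparison addresses the profile.css scenario directly: the origin's HTML response does not match what a .css URL should return, so it is never stored.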
Beyond basic customization, sophisticated architectures employ advanced cache key strategies to handle complex requirements while maintaining high cache efficiency.
Such strategies produce composite keys like /page::desktop::us::auth or /page::variant-b.

```javascript
// Lambda@Edge function for advanced cache key construction
'use strict';

exports.handler = async (event, context) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;

  // Extract and normalize key components
  const components = {
    path: request.uri,

    // Extract allowed query parameters only
    queryParams: extractAllowedParams(request.querystring, [
      'version', 'locale', 'format', 'page', 'limit'
    ]),

    // Normalize device type from User-Agent
    device: normalizeDevice(headers['user-agent']?.[0]?.value),

    // Extract country (from CloudFront geo header)
    country: headers['cloudfront-viewer-country']?.[0]?.value || 'XX',

    // Extract A/B test variant from cookie
    variant: extractCookieValue(
      headers['cookie']?.[0]?.value,
      'ab_variant'
    ) || 'control',

    // Auth state (NOT the session token)
    isAuthenticated: !!extractCookieValue(
      headers['cookie']?.[0]?.value,
      'session_token'
    ),
  };

  // Construct hierarchical cache key
  const cacheKey = buildCacheKey(components);

  // Set custom header that CloudFront will use for cache key
  // (the header must be included in the distribution's cache policy)
  request.headers['x-cache-key'] = [{
    key: 'X-Cache-Key',
    value: cacheKey
  }];

  return request;
};

function extractAllowedParams(querystring, allowlist) {
  if (!querystring) return '';

  const params = new URLSearchParams(querystring);
  const filtered = new URLSearchParams();

  allowlist.forEach(key => {
    if (params.has(key)) {
      filtered.set(key, params.get(key));
    }
  });

  // Sort for consistency
  filtered.sort();
  return filtered.toString();
}

function normalizeDevice(userAgent) {
  if (!userAgent) return 'desktop';

  const ua = userAgent.toLowerCase();
  // Check tablets first: iPad User-Agent strings also contain "mobile"
  if (/tablet|ipad/.test(ua)) return 'tablet';
  if (/mobile|android|iphone|ipod/.test(ua)) return 'mobile';
  return 'desktop';
}

function extractCookieValue(cookieHeader, name) {
  if (!cookieHeader) return null;

  // Anchor on the start of a cookie pair so "x_ab_variant" does not match "ab_variant"
  const match = cookieHeader.match(new RegExp(`(?:^|;\\s*)${name}=([^;]+)`));
  return match ? match[1] : null;
}

function buildCacheKey(components) {
  // Hierarchical structure for partial sharing
  return [
    components.path,
    components.queryParams,
    components.device,
    components.country,
    components.variant,
    components.isAuthenticated ? 'auth' : 'anon'
  ].filter(Boolean).join('::');
}
```

Add a response header that echoes the computed cache key (e.g., X-Cache-Key-Debug). This is invaluable for debugging cache misses. In production, gate this behind a debug cookie or internal IP ranges to avoid exposing cache structure to attackers.
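A minimal sketch of gating the debug header behind a cookie. The cookie name cache_debug, the `withDebugHeader` helper, and the plain-object header representation are all assumptions for illustration. A real deployment would adapt this to its edge runtime's response API and would likely prefer a signed or hard-to-guess cookie value.

```javascript
// withDebugHeader is a hypothetical helper: it returns response headers with
// X-Cache-Key-Debug added only when the request carries cache_debug=1.
function withDebugHeader(responseHeaders, cacheKey, requestCookieHeader) {
  const debugEnabled = /(?:^|;\s*)cache_debug=1(?:;|$)/.test(requestCookieHeader || '');
  if (!debugEnabled) return responseHeaders;
  return { ...responseHeaders, 'x-cache-key-debug': cacheKey };
}

const baseHeaders = { 'content-type': 'text/html' };
const debugOn = withDebugHeader(baseHeaders, '/page::desktop::US::control::anon', 'session=abc; cache_debug=1');
const debugOff = withDebugHeader(baseHeaders, '/page::desktop::US::control::anon', 'session=abc');
console.log(debugOn['x-cache-key-debug'], debugOff['x-cache-key-debug']);
```

Ordinary visitors never see the header, so the cache key structure is only visible to whoever holds the debug cookie.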
Cache keys are the invisible foundation of CDN efficiency. Understanding them transforms how you approach content delivery architecture.
Vary headers extend the cache key, so use them sparingly: a raw Vary: User-Agent destroys cache efficiency.

What's Next:
Now that we understand how cache keys determine which cached response to return, we'll explore how long those cached responses remain valid. The next page covers TTL Configuration—the time-based rules that govern cache freshness and the strategies for balancing performance with content accuracy.
You now understand cache keys at an expert level—from basic construction to advanced customization to security implications. This knowledge is foundational for all CDN optimization and directly impacts cache hit ratios, origin load, and system reliability.