When your CEO publishes a correction to a press release, how long until every user worldwide sees the update? When you patch a security vulnerability, how long are users potentially exposed to the vulnerable version? When prices change during a flash sale, how long until all customers see accurate pricing?
These questions probe invalidation latency—the time elapsed between initiating cache invalidation and achieving global cache freshness. Most engineering teams obsess over cache hit ratios and origin response times but completely neglect this critical metric. Yet for content-sensitive applications, invalidation latency can be the difference between a minor correction and a major incident.
Understanding, measuring, and optimizing invalidation latency requires deep knowledge of CDN architecture, network physics, and distributed systems behavior.
By the end of this page, you will understand the components of invalidation latency, how to measure it accurately, provider-specific propagation characteristics, optimization strategies, and how to set realistic SLAs and build monitoring for content freshness.
Invalidation latency is not a single delay but a composition of multiple sequential and parallel delays. Understanding each component enables targeted optimization.
```
TOTAL INVALIDATION LATENCY BREAKDOWN
═══════════════════════════════════════════════════════════════════════════

 Content        Application    API          Control       Edge           Cache
 Update         Processing     Latency      Plane         Propagation    Verified
   │                │             │             │              │              │
   T0               T1            T2            T3             T4             T5
   ├────────────────┼─────────────┼─────────────┼──────────────┼──────────────┤
   │   App Logic    │   Network   │   Queuing   │   Broadcast  │  Edge Proc.  │
   │  (10-100ms)    │  (10-50ms)  │ (10-500ms)  │  (50-500ms)  │   (1-50ms)   │
   ▼                ▼             ▼             ▼              ▼              ▼
 Origin DB      Webhook/       CDN API      CDN Control    Edge Caches    User Gets
 Updated        Event Fires    Receives     Plane Proc.    Invalidated    Fresh Content

COMPONENT BREAKDOWN:
───────────────────────────────────────────────────────────────────────────
T0 → T1: APPLICATION PROCESSING (10-100ms typical)
  • Time for application to detect content change
  • Database write latency
  • Webhook/event trigger latency
  • Application code execution time

T1 → T2: API LATENCY (10-50ms typical)
  • Network round-trip to CDN API endpoint
  • TLS handshake (if new connection)
  • DNS resolution (if not cached)
  • HTTP request/response overhead

T2 → T3: CONTROL PLANE PROCESSING (10-500ms typical)
  • CDN API authentication/authorization
  • Request validation and parsing
  • Purge queue insertion
  • Cache key resolution (especially for wildcards/tags)
  • Priority queue processing

T3 → T4: EDGE PROPAGATION (50-500ms typical, varies by provider)
  • Control plane broadcasts to all POPs
  • Geographic distribution delays
  • Internal CDN network latency
  • Potential retries for failed deliveries

T4 → T5: EDGE PROCESSING (1-50ms typical)
  • Edge node receives invalidation instruction
  • Cache lookup and entry deletion/marking
  • Confirmation sent back to control plane

TOTAL: 100ms - 1500ms (p50: ~300ms, p95: ~1000ms for most CDNs)
```

The Weakest Link Problem:
Your invalidation is only as fast as the slowest component. A CDN with sub-second edge propagation provides no benefit if your application takes 5 seconds to emit purge requests. Conversely, instantaneous application response means nothing if the CDN queues purges for 30 seconds during peak load.
Optimizing invalidation latency requires analyzing and improving each component in sequence, starting with the largest contributors.
| Component | Common Bottleneck | Typical Delay | Mitigation |
|---|---|---|---|
| Application | Synchronous purge in request handler | 100-500ms added | Async event-driven purging |
| API Latency | High-latency API endpoints | 50-200ms | Use regional API endpoints, connection pooling |
| Control Plane | Wildcard purge resolution | 100-1000ms | Use tag-based or exact URL purges |
| Edge Propagation | Global POP count (200+ locations) | 200-500ms | Accept or use regional purging |
| Edge Processing | High cache entry count | 10-100ms | Rarely the bottleneck |
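Before optimizing any single row in this table, attribute the total latency: timestamp each stage of a real purge and see which component dominates. The sketch below is a minimal illustration of that attribution; the field names and stage boundaries are illustrative assumptions, not tied to any particular CDN client.

```typescript
// Illustrative per-purge timing record; stages mirror the T0-T5 breakdown above.
interface PurgeTimings {
  contentUpdatedAt: number;   // T0: origin data changed
  purgeEmittedAt: number;     // T1: purge request left the application
  apiAcknowledgedAt: number;  // T2: CDN API accepted the request
  edgeVerifiedAt: number;     // T5: fresh content observed at an edge probe
}

// Attribute total latency to coarse components and name the dominant one.
function attributeLatency(t: PurgeTimings) {
  const components = {
    application: t.purgeEmittedAt - t.contentUpdatedAt,
    api: t.apiAcknowledgedAt - t.purgeEmittedAt,
    cdn: t.edgeVerifiedAt - t.apiAcknowledgedAt, // control plane + propagation + edge
  };
  const dominant = Object.entries(components).sort((a, b) => b[1] - a[1])[0][0];
  return { ...components, totalMs: t.edgeVerifiedAt - t.contentUpdatedAt, dominant };
}
```

A 400ms delay in the application stage dwarfs a 50ms improvement in edge propagation, so fix the dominant component first.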
Light in optical fiber travels at roughly 200,000 km/s (about two-thirds of c), so a signal needs on the order of 100ms to cross half the Earth's circumference. A CDN with POPs in both Tokyo and New York therefore has a physical floor of roughly 55ms for one-way propagation between them, and over 100ms for a confirmed round trip, before any queuing or processing is added. No amount of optimization can break the speed of light—design around geographic realities.
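A quick back-of-the-envelope check makes that floor concrete. This assumes a signal speed in fiber of roughly 200,000 km/s and approximate great-circle distances; real fiber routes are longer.

```typescript
// One-way propagation floor over an ideal fiber path of the given length.
// Assumes ~200,000 km/s signal speed in fiber (about two-thirds of c).
function fiberFloorMs(distanceKm: number): number {
  const kmPerMs = 200; // 200,000 km/s = 200 km per millisecond
  return distanceKm / kmPerMs;
}

console.log(fiberFloorMs(20_000)); // half of Earth's circumference: ~100ms one way
console.log(fiberFloorMs(10_900)); // Tokyo-New York great circle: ~55ms one way
```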
CDN providers vary dramatically in their invalidation latency characteristics. These differences stem from architectural choices, network infrastructure, and engineering priorities.
| Provider | Claimed Latency | Measured p50 | Measured p95 | Notes |
|---|---|---|---|---|
| Fastly | <150ms globally | ~200ms | ~500ms | Industry-leading; purpose-built for fast purge |
| Cloudflare | <30 seconds | ~2-5 seconds | ~15 seconds | Improving; enterprise plans faster |
| AWS CloudFront | <60 seconds (typically) | ~10-30 seconds | ~60 seconds | Invalidation, not purge; batch delay |
| Akamai Fast Purge | <5 seconds | ~2-5 seconds | ~8 seconds | Significant improvement over legacy |
| Bunny CDN | <1 second | ~500ms | ~2 seconds | Smaller network, faster propagation |
Why Such Variation?
The architectural differences that create this latency variation are significant:
Fastly's Approach:

- Purge is a first-class primitive: purge messages are broadcast peer-to-peer across the network (Fastly has described a gossip-style protocol) rather than queued through a central batch system.
- A relatively small number of large POPs keeps the fan-out required for global propagation low.

CloudFront's Approach:

- Invalidations are submitted as asynchronous batch jobs that the control plane distributes to hundreds of edge locations.
- The API acknowledges acceptance immediately; the invalidation only transitions to completed after propagation finishes, which is why measured latency is tens of seconds.

Implications:

- If sub-second or few-second freshness is a hard requirement, provider architecture matters more than any amount of application-side tuning.
- On slower-propagating providers, pair invalidation with short TTLs or versioned URLs for time-critical content rather than relying on purge speed.
CDN marketing materials often cite best-case latencies. Always measure actual propagation latency from your application with your content patterns under your traffic conditions. Provider benchmarks may not reflect your real-world experience.
You cannot optimize what you don't measure. Establishing accurate invalidation latency measurement requires multi-point sampling, time synchronization, and statistical rigor.
```typescript
/**
 * Multi-Region Invalidation Latency Measurement System
 *
 * Measures end-to-end invalidation latency from content change
 * to verified cache freshness across global edge locations.
 */

interface InvalidationLatencyMeasurement {
  measurementId: string;
  contentUrl: string;
  invalidationStartedAt: number; // Unix timestamp ms
  apiAcknowledgedAt: number;
  apiReportedCompleteAt: number;
  regionalVerifications: RegionalVerification[];
  aggregateMetrics: AggregateMetrics;
}

interface RegionalVerification {
  region: string;
  edgeLocation: string;
  firstFreshResponseAt: number;
  latencyFromStartMs: number;
  verificationAttempts: number;
}

interface AggregateMetrics {
  p50LatencyMs: number;
  p90LatencyMs: number;
  p95LatencyMs: number;
  p99LatencyMs: number;
  maxLatencyMs: number;
  slowestRegion: string;
  fastestRegion: string;
}

class InvalidationLatencyProbe {
  // CDNClient, metrics, and sleep() are assumed to be provided by the surrounding application.
  constructor(private cdnClient: CDNClient) {}

  private verificationRegions = [
    { region: 'us-east-1', endpoint: 'https://probe-us-east.example.com' },
    { region: 'us-west-2', endpoint: 'https://probe-us-west.example.com' },
    { region: 'eu-west-1', endpoint: 'https://probe-eu-west.example.com' },
    { region: 'eu-central-1', endpoint: 'https://probe-eu-central.example.com' },
    { region: 'ap-southeast-1', endpoint: 'https://probe-ap-southeast.example.com' },
    { region: 'ap-northeast-1', endpoint: 'https://probe-ap-northeast.example.com' },
    { region: 'sa-east-1', endpoint: 'https://probe-sa-east.example.com' },
  ];

  async measureInvalidationLatency(
    contentUrl: string,
    expectedContentHash: string,
    maxWaitMs: number = 60000
  ): Promise<InvalidationLatencyMeasurement> {
    const measurementId = crypto.randomUUID();
    const startTime = Date.now();

    // Step 1: Trigger invalidation
    console.log(`[Measurement ${measurementId}] Starting invalidation`);
    const purgeResult = await this.cdnClient.purge({ urls: [contentUrl] });
    const apiAcknowledgedAt = Date.now();

    // Step 2: Wait for API-reported completion
    await this.cdnClient.waitForCompletion(purgeResult.purgeId);
    const apiReportedCompleteAt = Date.now();

    // Step 3: Verify from all regions in parallel
    const regionalVerifications = await Promise.all(
      this.verificationRegions.map(region =>
        this.verifyFromRegion(region, contentUrl, expectedContentHash, startTime, maxWaitMs)
      )
    );

    // Step 4: Calculate aggregate metrics
    const latencies = regionalVerifications.map(v => v.latencyFromStartMs);
    latencies.sort((a, b) => a - b);

    const aggregateMetrics: AggregateMetrics = {
      p50LatencyMs: this.percentile(latencies, 50),
      p90LatencyMs: this.percentile(latencies, 90),
      p95LatencyMs: this.percentile(latencies, 95),
      p99LatencyMs: this.percentile(latencies, 99),
      maxLatencyMs: Math.max(...latencies),
      slowestRegion: regionalVerifications.reduce((a, b) =>
        a.latencyFromStartMs > b.latencyFromStartMs ? a : b
      ).region,
      fastestRegion: regionalVerifications.reduce((a, b) =>
        a.latencyFromStartMs < b.latencyFromStartMs ? a : b
      ).region,
    };

    const measurement: InvalidationLatencyMeasurement = {
      measurementId,
      contentUrl,
      invalidationStartedAt: startTime,
      apiAcknowledgedAt,
      apiReportedCompleteAt,
      regionalVerifications,
      aggregateMetrics,
    };

    // Record to metrics system
    this.recordMetrics(measurement);

    return measurement;
  }

  private async verifyFromRegion(
    region: { region: string; endpoint: string },
    contentUrl: string,
    expectedContentHash: string,
    startTime: number,
    maxWaitMs: number
  ): Promise<RegionalVerification> {
    let attempts = 0;
    const pollIntervalMs = 100;

    while (Date.now() - startTime < maxWaitMs) {
      attempts++;

      try {
        // Request content via regional probe
        const response = await fetch(`${region.endpoint}/probe`, {
          method: 'POST',
          headers: { 'Content-Type': 'application/json' },
          body: JSON.stringify({
            targetUrl: contentUrl,
            expectedHash: expectedContentHash,
            bypassCache: false // We want to hit CDN cache
          })
        });

        const result = await response.json();

        if (result.contentHash === expectedContentHash) {
          // Fresh content verified!
          return {
            region: region.region,
            edgeLocation: result.edgeLocation,
            firstFreshResponseAt: Date.now(),
            latencyFromStartMs: Date.now() - startTime,
            verificationAttempts: attempts
          };
        }
      } catch (error) {
        console.warn(`Probe failed for ${region.region}: ${error.message}`);
      }

      await sleep(pollIntervalMs);
    }

    // Timeout - invalidation incomplete
    throw new Error(
      `Region ${region.region} did not receive fresh content within ${maxWaitMs}ms`
    );
  }

  private percentile(sortedArray: number[], p: number): number {
    const index = Math.ceil((p / 100) * sortedArray.length) - 1;
    return sortedArray[Math.max(0, index)];
  }

  private recordMetrics(measurement: InvalidationLatencyMeasurement): void {
    metrics.histogram('cdn.invalidation.latency_ms',
      measurement.aggregateMetrics.p50LatencyMs, { percentile: 'p50' });
    metrics.histogram('cdn.invalidation.latency_ms',
      measurement.aggregateMetrics.p95LatencyMs, { percentile: 'p95' });

    for (const regional of measurement.regionalVerifications) {
      metrics.histogram('cdn.invalidation.regional_latency_ms',
        regional.latencyFromStartMs, { region: regional.region });
    }
  }
}
```

Synthetic probes (controlled tests) measure CDN behavior. Real User Monitoring (RUM) measures actual user experience. Use both: synthetic for SLA verification and debugging; RUM for true user impact assessment.
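On the RUM side, even a simple beacon that records which content version real users actually received closes the loop. The sketch below is a minimal browser-side example; the `X-Content-Version` header and `/rum/freshness` endpoint are hypothetical names, not part of any standard.

```typescript
// Browser-side sketch: report the content version a real user was served.
// 'X-Content-Version' and '/rum/freshness' are placeholder names for this example.
async function reportServedVersion(url: string): Promise<void> {
  const response = await fetch(url);
  const servedVersion = response.headers.get('X-Content-Version') ?? 'unknown';

  navigator.sendBeacon('/rum/freshness', JSON.stringify({
    url,
    servedVersion,
    observedAt: Date.now(),
  }));
}
```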
The fastest CDN propagation is meaningless if your application takes seconds to initiate invalidation. Application-side optimization often provides the most significant latency improvements.
```typescript
/**
 * Low-Latency Event-Driven Purge Pipeline
 *
 * Decouples content updates from purge execution for
 * minimum end-to-end invalidation latency.
 */

// Step 1: Content Change Emits Event (in request handler)
async function updateProduct(productId: string, data: ProductUpdate) {
  const startTime = Date.now();

  // Update database
  const product = await db.updateProduct(productId, data);

  // Emit invalidation event immediately - non-blocking
  await eventQueue.emit('content.updated', {
    entityType: 'product',
    entityId: productId,
    version: product.version,
    updatedAt: Date.now(),
    priority: determinePriority(data),
  });

  // Return to user immediately
  const responseTime = Date.now() - startTime;
  metrics.histogram('api.product_update.latency_ms', responseTime);

  return product; // User sees ~50ms response, not ~500ms
}

// Step 2: Dedicated Purge Worker (separate process/container)
class PurgeEventWorker {
  constructor(
    private cdnClient: CDNClient,
    private urlResolver: ContentUrlResolver,
  ) {}

  async processEvent(event: ContentUpdateEvent): Promise<void> {
    const processingStart = Date.now();

    // Resolve event to cache URLs as fast as possible
    const urls = await this.urlResolver.resolveUrls(event);

    // Issue purge immediately
    const purgeId = await this.cdnClient.purge({
      urls,
      type: event.priority === 'critical' ? 'hard' : 'soft',
    });

    // Track timing
    const apiLatency = Date.now() - processingStart;
    metrics.histogram('purge.api_latency_ms', apiLatency, { priority: event.priority });

    // Log event-to-purge latency
    const eventToApiLatency = Date.now() - event.updatedAt;
    metrics.histogram('purge.event_to_api_latency_ms', eventToApiLatency);

    console.log(`Purge initiated: ${purgeId} (event-to-API: ${eventToApiLatency}ms)`);
  }
}

// Step 3: Optimize URL Resolution (often the hidden bottleneck)
class OptimizedUrlResolver implements ContentUrlResolver {
  // Pre-computed URL templates for common patterns
  private urlTemplates = new Map<string, (id: string) => string[]>([
    ['product', (id) => [
      `/products/${id}`,
      `/api/v1/products/${id}`,
      `/api/v2/products/${id}`,
    ]],
    ['category', (id) => [
      `/categories/${id}`,
      `/api/v1/categories/${id}`,
    ]],
  ]);

  // Cached relationship lookups
  private relationshipCache = new LRUCache<string, string[]>({
    max: 10000,
    ttl: 60000, // 1 minute
  });

  async resolveUrls(event: ContentUpdateEvent): Promise<string[]> {
    const urls: string[] = [];

    // Fast path: Template-based resolution (no DB query)
    const template = this.urlTemplates.get(event.entityType);
    if (template) {
      urls.push(...template(event.entityId).map(p => CDN_BASE + p));
    }

    // Only query relationships if not in cache
    const cacheKey = `${event.entityType}:${event.entityId}`;
    let relationships = this.relationshipCache.get(cacheKey);
    if (!relationships) {
      relationships = await this.queryRelationships(event);
      this.relationshipCache.set(cacheKey, relationships);
    }
    urls.push(...relationships);

    return urls;
  }
}

// Optimization metrics to track
interface PurgeLatencyBreakdown {
  eventEmitToQueueMs: number;     // target: <10ms
  queueToWorkerPickupMs: number;  // target: <50ms
  urlResolutionMs: number;        // target: <20ms
  cdnApiRoundtripMs: number;      // target: <100ms
  totalEventToApiMs: number;      // target: <200ms
}
```

Message queue latency directly impacts invalidation latency. Use low-latency queuing systems (Redis Streams, Kafka with optimized config) rather than high-latency options (SQS standard mode). Every millisecond counts when optimizing event-to-purge time.
Global invalidation takes longer than regional invalidation. For latency-sensitive scenarios, geographic scoping can dramatically reduce time to freshness.
Regional Invalidation Patterns:
1. Primary Region First
Invalidate the region with most traffic first, then propagate globally:
```
T+0ms:   Invalidate US-EAST (primary hub)
T+100ms: Users in US-EAST see fresh content
T+100ms: Global invalidation issued
T+500ms: All regions see fresh content
```
Benefit: 80% of users (in primary region) see fresh content in 100ms rather than waiting 500ms for global propagation.
2. User-Region Affinity
Invalidate the region relevant to the content/user:
```
Event:  EU customer updates their profile
Action: Invalidate EU regions first (EU-WEST, EU-CENTRAL)
        Then invalidate globally (async)
```
Benefit: The user who triggered the change sees immediate freshness; other regions follow.
3. Content-Region Binding
Some content is inherently regional:
```
Event:  US prices updated (EU prices unchanged)
Action: Only invalidate US regions
        EU content remains unchanged
```
Benefit: Reduced purge scope = faster propagation + lower CDN quota usage.
```typescript
interface RegionalPurgeConfig {
  primaryRegions: string[];    // Invalidate immediately
  secondaryRegions: string[];  // Invalidate after primary
  globalFallback: boolean;     // Also issue global purge
}

class RegionalPurgeStrategy {
  private regionConfigs: Map<string, RegionalPurgeConfig> = new Map([
    ['us', {
      primaryRegions: ['us-east', 'us-west'],
      secondaryRegions: ['eu-west', 'ap-northeast'],
      globalFallback: true
    }],
    ['eu', {
      primaryRegions: ['eu-west', 'eu-central'],
      secondaryRegions: ['us-east', 'ap-southeast'],
      globalFallback: true
    }],
    ['global', {
      primaryRegions: [],  // Skip regional, go straight to global
      secondaryRegions: [],
      globalFallback: true
    }]
  ]);

  async purgeWithRegionalPriority(
    urls: string[],
    targetGeo: string = 'global'
  ): Promise<RegionalPurgeResult> {
    const config = this.regionConfigs.get(targetGeo) ?? this.regionConfigs.get('global')!;

    const result: RegionalPurgeResult = {
      primaryPurgeLatencyMs: 0,
      secondaryPurgeLatencyMs: 0,
      globalPurgeLatencyMs: 0,
    };

    // Phase 1: Primary regions (synchronous, wait for completion)
    if (config.primaryRegions.length > 0) {
      const primaryStart = Date.now();
      await this.cdnClient.purge({
        urls,
        regions: config.primaryRegions,
        waitForCompletion: true // Important: wait for primary
      });
      result.primaryPurgeLatencyMs = Date.now() - primaryStart;
      console.log(`Primary regions invalidated in ${result.primaryPurgeLatencyMs}ms`);
    }

    // Phase 2: Secondary regions (async, don't block)
    if (config.secondaryRegions.length > 0) {
      const secondaryStart = Date.now();
      // Fire and continue (don't await)
      this.cdnClient.purge({
        urls,
        regions: config.secondaryRegions,
        waitForCompletion: false
      }).then(() => {
        result.secondaryPurgeLatencyMs = Date.now() - secondaryStart;
        console.log(`Secondary regions invalidated in ${result.secondaryPurgeLatencyMs}ms`);
      });
    }

    // Phase 3: Global fallback (async, comprehensive cleanup)
    if (config.globalFallback) {
      const globalStart = Date.now();
      // Delay slightly to reduce thundering herd on origin
      setTimeout(async () => {
        await this.cdnClient.purge({
          urls,
          regions: null, // null = global
          waitForCompletion: false
        });
        result.globalPurgeLatencyMs = Date.now() - globalStart;
      }, 1000); // 1 second delay
    }

    return result;
  }
}

// Usage: User in US updates their profile
async function handleProfileUpdate(userId: string, userGeo: string) {
  await db.updateProfile(userId, profileData);

  const urls = [`/users/${userId}`, `/api/users/${userId}`];

  // Invalidate user's region first for immediate feedback
  const strategy = new RegionalPurgeStrategy();
  await strategy.purgeWithRegionalPriority(urls, userGeo);

  // User sees fresh content in ~100ms (their region)
  // Rest of world sees fresh content in ~500ms (global)
}
```

Regional-priority invalidation creates consistency windows where different regions see different content versions. This is acceptable for most user-generated content but may not be for pricing, inventory, or compliance-sensitive data. Choose regional strategies based on content type.
Defining and monitoring invalidation latency SLAs requires understanding business requirements, technical constraints, and provider capabilities.
| Use Case | Latency Target (p95) | Rationale |
|---|---|---|
| Security patches | <5 seconds | Minimize vulnerability exposure window |
| Price updates | <30 seconds | Legal/accuracy requirements; customer trust |
| Breaking news | <2 minutes | Competitive advantage; journalistic integrity |
| Product content | <5 minutes | Good UX but not time-critical |
| Blog posts | <15 minutes | Low urgency; TTL-based often sufficient |
```typescript
/**
 * Invalidation SLA Monitoring System
 *
 * Tracks invalidation latency against defined SLAs,
 * alerts on violations, and generates compliance reports.
 */

interface InvalidationSLA {
  name: string;
  contentTypes: string[];
  targetP95LatencyMs: number;
  targetP99LatencyMs: number;
  alertThresholdMs: number;
}

const SLAS: InvalidationSLA[] = [
  {
    name: 'security-critical',
    contentTypes: ['security-patch', 'malware-removal'],
    targetP95LatencyMs: 5000,
    targetP99LatencyMs: 10000,
    alertThresholdMs: 8000,
  },
  {
    name: 'pricing',
    contentTypes: ['product-price', 'sale-announcement'],
    targetP95LatencyMs: 30000,
    targetP99LatencyMs: 60000,
    alertThresholdMs: 45000,
  },
  {
    name: 'content-update',
    contentTypes: ['article', 'product-description', 'image'],
    targetP95LatencyMs: 300000,  // 5 minutes
    targetP99LatencyMs: 600000,  // 10 minutes
    alertThresholdMs: 480000,    // 8 minutes
  },
];

class InvalidationSLAMonitor {
  private latencyHistograms: Map<string, Histogram> = new Map();

  recordInvalidationLatency(
    contentType: string,
    latencyMs: number,
    metadata: Record<string, string>
  ): void {
    // Find applicable SLA
    const sla = this.findSLA(contentType);

    // Record to histogram
    let histogram = this.latencyHistograms.get(sla.name);
    if (!histogram) {
      histogram = new Histogram();
      this.latencyHistograms.set(sla.name, histogram);
    }
    histogram.record(latencyMs);

    // Emit metric for dashboards
    metrics.histogram('invalidation.latency_ms', latencyMs, {
      sla: sla.name,
      content_type: contentType,
      ...metadata
    });

    // Check alert threshold
    if (latencyMs > sla.alertThresholdMs) {
      this.triggerAlert(sla, latencyMs, contentType, metadata);
    }
  }

  // Map a content type to its SLA, falling back to the least strict tier.
  private findSLA(contentType: string): InvalidationSLA {
    return SLAS.find(s => s.contentTypes.includes(contentType)) ?? SLAS[SLAS.length - 1];
  }

  private triggerAlert(
    sla: InvalidationSLA,
    latencyMs: number,
    contentType: string,
    metadata: Record<string, string>
  ): void {
    alerting.alert({
      severity: sla.name === 'security-critical' ? 'critical' : 'warning',
      title: `Invalidation SLA Breach: ${sla.name}`,
      message: `Content type '${contentType}' took ${latencyMs}ms to invalidate (threshold: ${sla.alertThresholdMs}ms)`,
      metadata: {
        sla_name: sla.name,
        latency_ms: latencyMs.toString(),
        threshold_ms: sla.alertThresholdMs.toString(),
        ...metadata
      },
      runbook: 'https://wiki.example.com/runbooks/invalidation-sla-breach'
    });
  }

  generateComplianceReport(periodDays: number = 30): SLAComplianceReport {
    const report: SLAComplianceReport = {
      period: { days: periodDays, endDate: new Date() },
      slaCompliance: [],
    };

    for (const sla of SLAS) {
      const histogram = this.latencyHistograms.get(sla.name);
      if (!histogram) continue;

      const p95 = histogram.percentile(95);
      const p99 = histogram.percentile(99);

      report.slaCompliance.push({
        slaName: sla.name,
        sampleCount: histogram.count,
        actualP95Ms: p95,
        actualP99Ms: p99,
        targetP95Ms: sla.targetP95LatencyMs,
        targetP99Ms: sla.targetP99LatencyMs,
        p95Compliant: p95 <= sla.targetP95LatencyMs,
        p99Compliant: p99 <= sla.targetP99LatencyMs,
        breachCount: histogram.countAbove(sla.alertThresholdMs),
      });
    }

    return report;
  }
}

// Grafana dashboard query examples
const GRAFANA_QUERIES = {
  // Latency percentiles over time
  latencyP95: 'histogram_quantile(0.95, rate(invalidation_latency_ms_bucket[5m]))',
  latencyP99: 'histogram_quantile(0.99, rate(invalidation_latency_ms_bucket[5m]))',

  // SLA compliance rate (fraction of invalidations completing within 30s)
  slaCompliance: 'sum(rate(invalidation_latency_ms_bucket{le="30000"}[1h])) / sum(rate(invalidation_latency_ms_count[1h]))',

  // Breach count
  breachCount: 'sum(increase(invalidation_sla_breach_total[24h])) by (sla)'
};
```

If you define a 5-second SLA for security content, you need a CDN that can achieve sub-5-second propagation (e.g., Fastly). If you are on CloudFront with 30-60 second invalidation, your SLA cannot be 5 seconds. Match SLAs to provider capabilities.
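One way to enforce that match is to validate proposed SLA targets against the provider latency you actually measured (for example, from the synthetic probe above) before adopting them. The sketch below reuses the `InvalidationSLA` interface from the monitoring code; the 20% headroom factor is an arbitrary illustration.

```typescript
// Hypothetical feasibility check: flag SLA targets the measured provider latency cannot meet.
function checkSlaFeasibility(sla: InvalidationSLA, measuredProviderP95Ms: number): string[] {
  const issues: string[] = [];
  // Require ~20% headroom: measured p95 should beat the target comfortably, not barely.
  if (measuredProviderP95Ms > sla.targetP95LatencyMs * 0.8) {
    issues.push(
      `SLA '${sla.name}' targets p95 ${sla.targetP95LatencyMs}ms but measured provider p95 is ` +
      `~${measuredProviderP95Ms}ms; relax the target or change strategy (shorter TTLs, versioned URLs).`
    );
  }
  return issues;
}
```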
CDN purge APIs can fail, rate limit, or experience elevated latency. Robust systems need fallback strategies for when invalidation doesn't work as expected.
Common fallback strategies include:

- Retrying the purge with exponential backoff when the CDN API errors or times out.
- Temporarily reducing TTLs at the origin so stale entries age out quickly on their own.
- Versioned URLs such as `/content?v={timestamp}`: bypasses CDN cache for critical updates.

The sketch below wires these fallbacks together behind a circuit breaker.
```typescript
class ResilientInvalidationService {
  constructor(
    private primaryCdn: CDNClient,
    private fallbackStrategies: FallbackStrategy[],
    private circuitBreaker: CircuitBreaker,
  ) {}

  async invalidate(urls: string[], options: InvalidationOptions): Promise<void> {
    // Check if circuit breaker is open (CDN API is failing)
    if (this.circuitBreaker.isOpen()) {
      console.warn('CDN purge circuit open - using fallback strategies');
      await this.executeFallbacks(urls, options);
      return;
    }

    try {
      // Attempt primary CDN purge
      const result = await this.primaryCdn.purge({ urls, ...options });

      // Verify completion
      const verified = await this.verifyPurge(result.purgeId, urls);

      if (!verified) {
        console.warn('Purge verification failed - executing fallbacks');
        this.circuitBreaker.recordFailure();
        await this.executeFallbacks(urls, options);
      } else {
        this.circuitBreaker.recordSuccess();
      }
    } catch (error) {
      console.error('CDN purge failed:', error);
      this.circuitBreaker.recordFailure();
      await this.executeFallbacks(urls, options);
    }
  }

  private async executeFallbacks(
    urls: string[],
    options: InvalidationOptions
  ): Promise<void> {
    for (const strategy of this.fallbackStrategies) {
      try {
        console.log(`Executing fallback: ${strategy.name}`);
        await strategy.execute(urls, options);

        if (strategy.stopOnSuccess) {
          return; // This fallback is sufficient
        }
      } catch (error) {
        console.error(`Fallback ${strategy.name} failed:`, error);
        // Continue to next fallback
      }
    }

    // All fallbacks failed - alert on-call
    alerting.critical('All invalidation fallbacks failed', { urls });
  }
}

// Example fallback strategies
const fallbackStrategies: FallbackStrategy[] = [
  // Strategy 1: Retry purge with exponential backoff
  {
    name: 'retry-with-backoff',
    async execute(urls, options) {
      await retryWithBackoff(() => cdn.purge({ urls }), {
        maxAttempts: 3,
        initialDelayMs: 1000,
        maxDelayMs: 10000,
      });
    },
    stopOnSuccess: true,
  },

  // Strategy 2: Update origin to return shorter TTL
  {
    name: 'reduce-ttl-at-origin',
    async execute(urls, options) {
      // Tell origin to reduce TTL for these paths
      await originConfig.setTempTTL(urls, {
        ttlSeconds: 60,       // 1 minute instead of 1 hour
        durationMinutes: 30,  // For the next 30 minutes
      });
    },
    stopOnSuccess: false, // Continue with other strategies
  },

  // Strategy 3: Client-side cache busting hint
  {
    name: 'client-cache-bust-hint',
    async execute(urls, options) {
      // Update a version number clients can check
      await kvStore.set('content-version', Date.now());
      // Clients poll this and add ?v= parameter when changed
    },
    stopOnSuccess: false,
  },
];
```

Purge is a best-effort operation. Always design TTLs to provide acceptable staleness even if purge completely fails. For security-critical content, consider caching with very short TTLs (or no cache) rather than relying on purge reliability.
Invalidation latency—the time from content change to global cache freshness—is a critical but often neglected metric. Understanding its components and optimizing each stage enables reliable, fast content updates at CDN scale.
Module Complete:
You have now completed the Cache Invalidation at Scale module. These concepts form the complete toolkit for managing CDN cache freshness at global scale.
You have mastered cache invalidation at scale—the strategies, patterns, and operational practices required to maintain content freshness across globally distributed CDN infrastructure.