We've established that caching offers compelling benefits—dramatic performance improvements and significant resource savings. We've also explored the trade-offs—consistency challenges, operational complexity, and subtle failure modes. Now comes the practical question: When should you actually cache?
This isn't a question with a simple yes/no answer. The decision to cache—and how to cache—depends on the specific characteristics of your data, access patterns, consistency requirements, and system constraints. Experienced engineers develop an intuition for caching opportunities, but that intuition is built on systematic analysis.
This page provides a rigorous framework for identifying caching opportunities, recognizing anti-patterns that suggest caching isn't appropriate, and making data-driven decisions about caching strategies.
By the end of this page, you will be able to systematically evaluate caching opportunities, identify the signals that indicate when caching is and isn't appropriate, select the right caching layer for different scenarios, and build a mental model for caching decisions that will serve you throughout your career.
Certain patterns in your system strongly suggest that caching would be beneficial. Learning to recognize these signals is the first step in developing caching intuition.
Signal 1: Repeated Identical Queries
The clearest caching opportunity exists when the same query is executed repeatedly:
```sql
-- If your database logs show this query 10,000 times per hour:
SELECT * FROM products WHERE id = 12345;

-- That's 10,000 identical database round-trips for data that
-- probably hasn't changed between requests.
```
This pattern appears wherever many requests ask for the same rows: product detail pages, user profile lookups, configuration reads, and other hot paths.
Signal 2: Database Under Pressure
Your monitoring is telling you something: climbing query latency, sustained high CPU on the database, connection pool exhaustion, growing replica lag. These symptoms often indicate that read traffic is overwhelming your database. Caching offloads read traffic, giving the database breathing room.
Enable slow query logging and analyze your database's query patterns. Tools like pg_stat_statements (PostgreSQL) or Performance Schema (MySQL) reveal your most frequent and expensive queries. The intersection of 'frequent' and 'expensive' is your caching priority list.
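As a sketch of that intersection, the snippet below ranks queries by total database time consumed (calls multiplied by mean latency), the way you might process rows exported from pg_stat_statements. The sample data is illustrative, not from a real system.

```python
# Rank queries by total database time: the intersection of
# "frequent" and "expensive" floats to the top.
sample_stats = [
    # (query pattern, calls per hour, mean execution time in ms)
    ("SELECT * FROM products WHERE id = ?", 10_000, 45.0),
    ("SELECT * FROM orders WHERE user_id = ?", 500, 120.0),
    ("SELECT now()", 50_000, 0.1),
]

def caching_priority(stats):
    """Order queries by total time consumed (calls x mean latency)."""
    return sorted(stats, key=lambda row: row[1] * row[2], reverse=True)

for query, calls, mean_ms in caching_priority(sample_stats):
    print(f"{calls * mean_ms / 1000:>8.1f}s/hour  {query}")
```

Note that the cheap-but-frequent `SELECT now()` ranks last despite having the most calls: neither frequency nor cost alone makes a good caching candidate.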
Equally important is recognizing when caching is the wrong solution. Caching isn't a universal performance fix—in some situations, it adds complexity without benefit or actively causes harm.
Anti-Pattern 1: Low Read-to-Write Ratio
When data changes as often as (or more often than) it's read, caching provides no benefit:
```
Sensor reading updated every 100ms, read every 200ms
→ Cache would be invalidated before most reads
→ Cache just adds a layer without helping
```
The break-even point is roughly 2:1 reads per write. Below that, caching overhead may exceed its benefit.
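A back-of-envelope model makes the break-even intuition concrete. The cost figures below are illustrative assumptions, and the model assumes every write invalidates the entry, so the first read after each write is a miss.

```python
# Rough net time saved by caching, under assumed (illustrative) costs.
def net_benefit_ms(reads, writes, origin_ms=20.0, cache_ms=1.0, invalidate_ms=1.0):
    """Net milliseconds saved, assuming each write invalidates the entry
    so the next read is a cold miss."""
    misses = min(writes + 1, reads)           # one cold miss per invalidation
    hits = reads - misses
    saved = hits * (origin_ms - cache_ms)     # hits skip the origin entirely
    overhead = writes * invalidate_ms + misses * cache_ms
    return saved - overhead

print(net_benefit_ms(reads=1000, writes=2))   # high read ratio: large positive
print(net_benefit_ms(reads=10, writes=10))    # ~1:1 ratio: negative
```

With a 500:1 ratio nearly every read is a hit and the savings dominate; at 1:1 every read is a cold miss and the cache is pure overhead.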
Anti-Pattern 2: Caching to Hide Architectural Problems
Caching should optimize already-reasonable systems, not mask fundamental issues such as missing indexes, N+1 query patterns, or queries that scan entire tables.
Caching in these cases creates technical debt. The underlying problem remains, and you've added cache complexity on top.
The Right Order: profile first, fix indexes and query structure, right-size the database, and only then cache what remains genuinely expensive.
Caching data that shouldn't be cached creates ongoing maintenance burden: stale data bugs, cache invalidation complexity, debugging difficulty, and infrastructure costs. A poorly considered cache often costs more than the performance gain is worth.
| Scenario | Cache? | Rationale |
|---|---|---|
| Product catalog, 1000 reads/write | ✓ Yes | High read ratio, tolerates staleness |
| User's own dashboard view | ⚠️ Carefully | Personalized, but repeat visits likely |
| Real-time stock ticker | ✗ No | Changes constantly, staleness unacceptable |
| Feature flag configuration | ✓ Yes | Read millions of times, changes rarely |
| Current account balance | ✗ No | Financial accuracy required |
| Rendered HTML for marketing pages | ✓ Yes | Expensive to generate, changes rarely |
| Live chat messages | ✗ No | Real-time requirement, per-user data |
| Search results for common queries | ✓ Yes | Expensive, repeated, tolerates staleness |
Caching can be applied at multiple layers of your architecture. Each layer has different characteristics suitable for different use cases.
The Caching Hierarchy:
| Layer | Latency | Capacity | Sharing | Best For |
|---|---|---|---|---|
| Browser/Client | 0ms | Limited | Single user | Static assets, user preferences |
| CDN/Edge | 5-50ms | Large | Regional | Static files, public HTML, API responses |
| Reverse Proxy | 1-5ms | Moderate | All users | Full page caching, API gateway |
| Application (in-process) | <1ms | Small | Single instance | Hot data, computed values, sessions |
| Distributed Cache | 1-5ms | Large | All instances | Session data, computed results, database query results |
| Database Query Cache | 0.1-1ms | Moderate | All connections | Repeated identical queries |
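The layers in the table above are often combined: a read checks the fastest tier first and falls through to slower ones. The sketch below simulates a tiered lookup with plain dicts; the names and the dict-backed "distributed" tier are illustrative stand-ins for an in-process cache, Redis/Memcached, and the origin database.

```python
local_cache = {}                 # per-instance, sub-millisecond
distributed_cache = {}           # stands in for Redis/Memcached
origin_db = {"product:1": {"name": "Widget", "price": 9.99}}

def get(key):
    if key in local_cache:                       # tier 1: in-process
        return local_cache[key], "local"
    if key in distributed_cache:                 # tier 2: shared across instances
        value = distributed_cache[key]
        local_cache[key] = value                 # promote to the faster tier
        return value, "distributed"
    value = origin_db.get(key)                   # tier 3: origin
    if value is not None:                        # populate both cache tiers
        distributed_cache[key] = value
        local_cache[key] = value
    return value, "origin"

print(get("product:1"))   # first call falls through to the origin
print(get("product:1"))   # second call is served from the local tier
```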
Layer Selection Guidelines:

CDN/Edge Caching: best for public content that is identical for every user, such as static assets, marketing pages, and popular API responses with generous TTLs.

Application-Level In-Process Cache: best for small, extremely hot data where even a network hop to a shared cache is too slow; be aware that each instance holds its own copy, which can diverge.

Distributed Cache (Redis/Memcached): best for data that must be shared across instances, such as sessions, computed results, and database query results, at the cost of a network round-trip per access.
"""Framework for selecting appropriate caching layer based on data characteristics."""from dataclasses import dataclassfrom enum import Enumfrom typing import Optional class CachingLayer(Enum): BROWSER = "Browser/Client Cache" CDN = "CDN/Edge Cache" REVERSE_PROXY = "Reverse Proxy Cache" APPLICATION = "Application In-Process Cache" DISTRIBUTED = "Distributed Cache (Redis/Memcached)" DATABASE = "Database Query Cache" NONE = "No Caching Recommended" @dataclassclass DataCharacteristics: """Characteristics of data to help determine caching strategy.""" reads_per_write: float # Ratio of reads to writes staleness_tolerance_seconds: float # Acceptable staleness window is_user_specific: bool # Whether data is personalized is_public: bool # Whether data is the same for all users size_bytes: int # Typical size of cached item access_frequency_per_minute: float # How often this data is accessed computation_cost_ms: float # Time to generate/fetch this data is_security_sensitive: bool # Whether stale data could cause security issues requires_strong_consistency: bool # Whether data must be current def recommend_caching_layer(data: DataCharacteristics) -> tuple[CachingLayer, str]: """ Recommend appropriate caching layer based on data characteristics. Returns recommended layer and explanation. 
""" # Check for caching anti-patterns first if data.requires_strong_consistency: return (CachingLayer.NONE, "Strong consistency requirement rules out caching") if data.is_security_sensitive and data.staleness_tolerance_seconds < 1: return (CachingLayer.NONE, "Security-sensitive data with near-zero staleness tolerance") if data.reads_per_write < 2: return (CachingLayer.NONE, f"Read/write ratio of {data.reads_per_write:.1f} too low for caching benefit") if data.access_frequency_per_minute < 0.1: return (CachingLayer.NONE, "Access frequency too low - cache would constantly cold-start") # Determine best layer based on characteristics # CDN layer for public, static-ish, large-scale content if (data.is_public and not data.is_user_specific and data.staleness_tolerance_seconds >= 60 and data.access_frequency_per_minute > 100): return (CachingLayer.CDN, "Public content with high access frequency - ideal for CDN edge caching") # Browser cache for static assets if (data.is_public and data.staleness_tolerance_seconds >= 3600 and # 1 hour+ data.size_bytes > 10000): # Larger assets worth caching return (CachingLayer.BROWSER, "Static content with long staleness tolerance - ideal for browser caching") # Application cache for very hot, small data if (data.size_bytes < 10000 and data.access_frequency_per_minute > 1000 and data.computation_cost_ms < 10): return (CachingLayer.APPLICATION, "Hot, small data with very high frequency - in-process cache for lowest latency") # Distributed cache for shared, computed, or session data if (data.computation_cost_ms > 50 or data.reads_per_write > 100 or (data.is_user_specific and data.staleness_tolerance_seconds >= 60)): return (CachingLayer.DISTRIBUTED, "Computed/shared data with good read ratio - distributed cache for sharing across instances") # Database query cache as fallback for moderate cases if data.reads_per_write > 10 and data.staleness_tolerance_seconds >= 5: return (CachingLayer.DATABASE, "Moderate caching benefit - database query 
cache as lightweight option") return (CachingLayer.DISTRIBUTED, "General purpose caching - distributed cache provides good balance") # Example evaluationsexamples = [ ("Product Catalog", DataCharacteristics( reads_per_write=500, staleness_tolerance_seconds=300, is_user_specific=False, is_public=True, size_bytes=5000, access_frequency_per_minute=10000, computation_cost_ms=50, is_security_sensitive=False, requires_strong_consistency=False )), ("User Session", DataCharacteristics( reads_per_write=100, staleness_tolerance_seconds=30, is_user_specific=True, is_public=False, size_bytes=2000, access_frequency_per_minute=100, computation_cost_ms=10, is_security_sensitive=True, requires_strong_consistency=False )), ("Account Balance", DataCharacteristics( reads_per_write=10, staleness_tolerance_seconds=0, is_user_specific=True, is_public=False, size_bytes=100, access_frequency_per_minute=50, computation_cost_ms=5, is_security_sensitive=True, requires_strong_consistency=True )), ("Feature Flags", DataCharacteristics( reads_per_write=100000, staleness_tolerance_seconds=60, is_user_specific=False, is_public=False, size_bytes=500, access_frequency_per_minute=50000, computation_cost_ms=2, is_security_sensitive=False, requires_strong_consistency=False )),] print("Caching Layer Recommendations")print("=" * 60) for name, characteristics in examples: layer, explanation = recommend_caching_layer(characteristics) print(f"{name}:") print(f" Recommended: {layer.value}") print(f" Rationale: {explanation}")Intuition is valuable, but data is better. Before implementing caching, gather metrics that inform and justify the decision. After implementation, continue measuring to validate assumptions.
Pre-Caching Analysis:
Before implementing a cache, collect:
- Access Pattern Data: which queries run most often, their read/write ratios, and which keys are hottest.
- Latency Breakdown: where request time is actually spent, so you know what a cache hit would save.
- Data Volatility: how often the underlying data changes, which bounds your TTL and invalidation strategy.
"""Analyzer for identifying caching opportunities from query logs.This simulates the analysis you'd do on actual production data."""from dataclasses import dataclassfrom collections import defaultdictfrom datetime import datetime, timedeltafrom typing import Dict, List, Tupleimport math @dataclassclass QueryLogEntry: """Represents a single query from the log.""" query_hash: str # Normalized query hash query_pattern: str # Human-readable query pattern execution_time_ms: float result_size_bytes: int timestamp: datetime was_write: bool def analyze_caching_opportunities( query_logs: List[QueryLogEntry], analysis_window: timedelta) -> Dict: """ Analyze query logs to identify caching opportunities. Returns prioritized list of queries/patterns that would benefit from caching. """ # Aggregate metrics per query pattern stats = defaultdict(lambda: { "read_count": 0, "write_count": 0, "total_time_ms": 0, "total_bytes": 0, "timestamps": [], "latencies": [] }) for entry in query_logs: s = stats[entry.query_pattern] if entry.was_write: s["write_count"] += 1 else: s["read_count"] += 1 s["total_time_ms"] += entry.execution_time_ms s["total_bytes"] += entry.result_size_bytes s["latencies"].append(entry.execution_time_ms) s["timestamps"].append(entry.timestamp) # Calculate caching metrics for each pattern opportunities = [] for pattern, s in stats.items(): if s["read_count"] == 0: continue read_count = s["read_count"] write_count = max(s["write_count"], 1) # Avoid division by zero # Key metrics read_write_ratio = read_count / write_count avg_latency_ms = s["total_time_ms"] / read_count avg_size_bytes = s["total_bytes"] / read_count # Burstiness: Are reads clustered or spread out? 
if len(s["timestamps"]) > 1: intervals = [ (s["timestamps"][i] - s["timestamps"][i-1]).total_seconds() for i in range(1, len(s["timestamps"])) ] avg_interval = sum(intervals) / len(intervals) burstiness = 1.0 / avg_interval if avg_interval > 0 else float('inf') else: burstiness = 0 # Calculate cache value score # Higher is better for caching cache_value_score = ( math.log10(max(read_write_ratio, 1)) * 2 + # High R/W ratio helps math.log10(max(avg_latency_ms, 1)) * 3 + # High latency helps math.log10(max(read_count, 1)) * 1 + # High frequency helps -math.log10(max(avg_size_bytes / 1000, 1)) * 0.5 # Large size hurts ) # Estimate potential savings # Assume 90% hit rate, 1ms cache latency potential_latency_saved = avg_latency_ms * 0.9 * read_count opportunities.append({ "pattern": pattern, "read_count": read_count, "read_write_ratio": round(read_write_ratio, 1), "avg_latency_ms": round(avg_latency_ms, 2), "avg_size_kb": round(avg_size_bytes / 1024, 2), "cache_value_score": round(cache_value_score, 2), "potential_time_saved_sec": round(potential_latency_saved / 1000, 2), "recommendation": get_recommendation(read_write_ratio, avg_latency_ms, read_count) }) # Sort by cache value score opportunities.sort(key=lambda x: x["cache_value_score"], reverse=True) return { "analysis_window": str(analysis_window), "total_queries_analyzed": len(query_logs), "unique_patterns": len(opportunities), "opportunities": opportunities[:10], # Top 10 "summary": generate_summary(opportunities) } def get_recommendation(rw_ratio: float, latency: float, count: int) -> str: """Generate caching recommendation based on metrics.""" if rw_ratio < 2: return "Not recommended - insufficient read/write ratio" if latency < 5 and count < 100: return "Low priority - minimal latency savings" if rw_ratio > 100 and latency > 50: return "STRONG - High ratio, high latency, excellent candidate" if rw_ratio > 10: return "Recommended - Good ratio for caching" return "Consider - May benefit from caching" def 
generate_summary(opportunities: List[Dict]) -> Dict: """Generate summary statistics.""" if not opportunities: return {"message": "No caching opportunities identified"} strong = sum(1 for o in opportunities if "STRONG" in o["recommendation"]) recommended = sum(1 for o in opportunities if "Recommended" in o["recommendation"]) total_savings = sum(o["potential_time_saved_sec"] for o in opportunities[:10]) return { "strong_candidates": strong, "recommended_candidates": recommended, "potential_time_saved_top10_sec": round(total_savings, 2), "top_pattern": opportunities[0]["pattern"] if opportunities else None } # Simulated analysissample_log = [ QueryLogEntry("h1", "SELECT * FROM products WHERE id = ?", 45, 5000, datetime.now() - timedelta(hours=i), False) for i in range(1000) # 1000 product reads] + [ QueryLogEntry("h2", "UPDATE products SET price = ? WHERE id = ?", 10, 0, datetime.now() - timedelta(hours=i*100), True) for i in range(2) # 2 product updates] + [ QueryLogEntry("h3", "SELECT * FROM orders WHERE user_id = ?", 120, 15000, datetime.now() - timedelta(minutes=i), False) for i in range(500)] + [ QueryLogEntry("h4", "INSERT INTO orders ...", 20, 0, datetime.now() - timedelta(minutes=i*2), True) for i in range(200)] result = analyze_caching_opportunities(sample_log, timedelta(hours=24)) print("Caching Opportunity Analysis")print("=" * 60)print(f"Queries Analyzed: {result['total_queries_analyzed']}")print(f"Unique Patterns: {result['unique_patterns']}")print("Top Opportunities:")for i, opp in enumerate(result['opportunities'], 1): print(f"{i}. {opp['pattern'][:50]}") print(f" Reads: {opp['read_count']} | R/W Ratio: {opp['read_write_ratio']}") print(f" Avg Latency: {opp['avg_latency_ms']}ms | Size: {opp['avg_size_kb']}KB") print(f" Cache Score: {opp['cache_value_score']} | {opp['recommendation']}")Always establish performance baselines before adding caching. Without a baseline, you can't prove caching helped—or identify when it's causing problems. 
Measure latency distributions, throughput, and error rates. Compare these after cache implementation.
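A minimal sketch of the latency side of that baseline: compute percentiles from a list of per-request timings so before/after comparisons are concrete. The sample latencies are made up for illustration, and the nearest-rank method is one of several valid percentile definitions.

```python
def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

# Illustrative per-request latencies; note the long tail at 90ms and 240ms.
latencies_ms = [12, 15, 14, 90, 13, 16, 240, 14, 15, 13]

for p in (50, 95, 99):
    print(f"p{p}: {percentile(latencies_ms, p)}ms")
```

Percentiles matter more than averages here: a cache often improves the median only modestly while dramatically shrinking the tail, and only a distribution captures that.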
Once you've decided to cache, you must choose how to cache. Different caching strategies suit different data characteristics and consistency requirements.
Strategy Overview:
| Strategy | Consistency | Write Latency | Best For |
|---|---|---|---|
| Cache-Aside | Eventual | Same as no cache | General purpose; read-heavy workloads |
| Read-Through | Eventual | Same as no cache | Hiding cache complexity from app |
| Write-Through | Strong | Higher (two writes) | Strong consistency requirements |
| Write-Behind | Eventual | Lower (cache only) | Write-heavy with eventual consistency OK |
| Refresh-Ahead | Eventual | Same as no cache | Predictable access patterns; avoiding cache misses |
Cache-Aside (Lazy Loading):
The most common pattern. The application manages the cache explicitly: on a read, check the cache first; on a miss, fetch from the origin and populate the cache; on a write, update the origin and invalidate the cached entry.
Best for: Most read-heavy workloads. Simple to implement. Handles cache failures gracefully (requests just hit origin).
Write-Through:
Writes go to both cache and origin synchronously: the application writes the origin first, then updates the cache, so the cache always reflects the latest committed data.
Best for: Strong consistency requirements. Cache always has latest data. Higher write latency is acceptable.
Write-Behind (Write-Back):
Writes go to cache immediately and are asynchronously flushed to origin: the application acknowledges the write as soon as the cache accepts it, and a background process persists pending writes to the origin.
Best for: Write-heavy workloads where eventual consistency is acceptable. Risk: data loss if cache fails before flush.
Write-behind/write-back caching can lose data if the cache fails before asynchronously persisting to the origin. Only use this for data you can afford to lose, or pair it with durability mechanisms (Redis AOF, clustered replication).
```typescript
/**
 * Implementation examples of common caching strategies.
 * These illustrate the patterns; production code would include
 * error handling, metrics, and more sophisticated features.
 */

interface Cache<T> {
  get(key: string): Promise<T | null>;
  set(key: string, value: T, ttlMs?: number): Promise<void>;
  delete(key: string): Promise<void>;
}

interface DataStore<T> {
  read(key: string): Promise<T | null>;
  write(key: string, value: T): Promise<void>;
}

/**
 * Cache-Aside (Lazy Loading)
 * Application explicitly manages cache reads and writes.
 */
class CacheAsideRepository<T> {
  constructor(
    private cache: Cache<T>,
    private store: DataStore<T>,
    private ttlMs: number = 300000 // 5 minutes default
  ) {}

  async read(key: string): Promise<T | null> {
    // 1. Check cache first
    const cached = await this.cache.get(key);
    if (cached !== null) {
      return cached; // Cache hit
    }

    // 2. Cache miss - fetch from origin
    const value = await this.store.read(key);

    // 3. Populate cache (only if value exists)
    if (value !== null) {
      await this.cache.set(key, value, this.ttlMs);
    }

    return value;
  }

  async write(key: string, value: T): Promise<void> {
    // Write to origin
    await this.store.write(key, value);
    // Invalidate cache (or update it)
    await this.cache.delete(key);
  }
}

/**
 * Write-Through
 * Writes go to both cache and origin synchronously.
 */
class WriteThroughRepository<T> {
  constructor(
    private cache: Cache<T>,
    private store: DataStore<T>,
    private ttlMs: number = 300000
  ) {}

  async read(key: string): Promise<T | null> {
    // Same as cache-aside for reads
    const cached = await this.cache.get(key);
    if (cached !== null) {
      return cached;
    }
    const value = await this.store.read(key);
    if (value !== null) {
      await this.cache.set(key, value, this.ttlMs);
    }
    return value;
  }

  async write(key: string, value: T): Promise<void> {
    // Write to BOTH cache and origin.
    // Origin first (if this fails, we don't pollute cache)
    await this.store.write(key, value);
    // Then cache
    await this.cache.set(key, value, this.ttlMs);
  }
}

/**
 * Write-Behind (Write-Back)
 * Writes go to cache immediately, async flush to origin.
 * WARNING: Risk of data loss if cache fails before flush.
 */
class WriteBehindRepository<T> {
  private pendingWrites: Map<string, { value: T; timestamp: number }> = new Map();
  private flushIntervalMs: number = 1000;

  constructor(
    private cache: Cache<T>,
    private store: DataStore<T>,
    private ttlMs: number = 300000
  ) {
    // Start background flusher
    this.startFlusher();
  }

  async read(key: string): Promise<T | null> {
    // Check pending writes first (most recent data)
    const pending = this.pendingWrites.get(key);
    if (pending) {
      return pending.value;
    }
    // Then cache
    const cached = await this.cache.get(key);
    if (cached !== null) {
      return cached;
    }
    // Finally origin
    const value = await this.store.read(key);
    if (value !== null) {
      await this.cache.set(key, value, this.ttlMs);
    }
    return value;
  }

  async write(key: string, value: T): Promise<void> {
    // Write to cache immediately
    await this.cache.set(key, value, this.ttlMs);
    // Queue for async flush to origin
    this.pendingWrites.set(key, { value, timestamp: Date.now() });
    // Returns immediately - origin write is async
  }

  private startFlusher(): void {
    setInterval(async () => {
      const toFlush = new Map(this.pendingWrites);
      this.pendingWrites.clear();
      for (const [key, { value }] of toFlush) {
        try {
          await this.store.write(key, value);
        } catch (error) {
          // Re-queue failed writes (with backoff in production)
          console.error(`Failed to flush ${key}, re-queuing`);
          this.pendingWrites.set(key, { value, timestamp: Date.now() });
        }
      }
    }, this.flushIntervalMs);
  }
}

/**
 * Refresh-Ahead
 * Proactively refresh cache entries before they expire.
 */
class RefreshAheadRepository<T> {
  private refreshThreshold: number = 0.8; // Refresh at 80% of TTL

  constructor(
    private cache: Cache<T>,
    private store: DataStore<T>,
    private ttlMs: number = 300000
  ) {}

  async read(key: string): Promise<T | null> {
    const cached = await this.cache.get(key);
    if (cached !== null) {
      // Check if we should refresh ahead
      // (In real implementation, track cache entry age)
      this.maybeRefreshAsync(key);
      return cached;
    }

    // Cache miss
    const value = await this.store.read(key);
    if (value !== null) {
      await this.cache.set(key, value, this.ttlMs);
    }
    return value;
  }

  private async maybeRefreshAsync(key: string): Promise<void> {
    // In production, track entry age and refresh if near expiry.
    // This is a simplified illustration.
    // Don't await - fire and forget
    this.store.read(key).then(async (value) => {
      if (value !== null) {
        await this.cache.set(key, value, this.ttlMs);
      }
    }).catch(() => {
      // Ignore refresh failures - cache still has value
    });
  }

  async write(key: string, value: T): Promise<void> {
    await this.store.write(key, value);
    await this.cache.delete(key);
  }
}

// Usage example
console.log(`Cache Strategy Selection Guide:
  Cache-Aside:   Default choice for most read-heavy workloads
  Write-Through: When you need strong consistency
  Write-Behind:  Write-heavy with acceptable data loss risk
  Refresh-Ahead: Predictable hot data, minimize cache misses`);
```

The best caching implementations start simple and evolve based on real-world data. Don't try to design the perfect caching system upfront—you'll almost certainly be wrong about access patterns and requirements.
The Iterative Caching Process: measure and identify the single highest-value target, implement the simplest strategy that fits (usually cache-aside with a conservative TTL), validate against production metrics such as hit rate and latency, then tune or expand based on what the data shows.

Common Iteration Patterns: start with short TTLs and lengthen them as confidence grows; start with coarse-grained keys and split them when invalidation becomes painful; promote data to a faster layer only when measurements justify it.

Each step builds on actual production experience, not theoretical design.
Put caching behind feature flags. This allows gradual rollout (1% → 10% → 50% → 100%), instant rollback if issues arise, and A/B comparison of cached vs uncached performance. This dramatically reduces risk when introducing caching.
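A percentage rollout like that is typically implemented by hashing a stable identifier into a bucket, so the same user stays in the same cohort as the percentage grows. The sketch below is illustrative; the function name and identifiers are hypothetical.

```python
import hashlib

def use_cache_for(user_id: str, rollout_percent: int) -> bool:
    """Deterministically assign a user to the cached cohort.

    Hashing gives a stable bucket in 0-99, so raising rollout_percent
    from 10 to 50 keeps the original 10% in the cohort and adds more.
    """
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent

print(use_cache_for("user-42", 0))      # False: nobody cached at 0%
print(use_cache_for("user-42", 100))    # True: everybody cached at 100%
```

Because assignment is deterministic, the same request path can also be compared A/B style: users below the threshold take the cache, the rest take the origin, and metrics are split by cohort.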
Knowing When You're Done:
At some point, additional caching yields diminishing returns: hit rates plateau, the origin is no longer the bottleneck, and each new cache saves less latency than it adds in complexity.
At this point, shift focus from adding caches to optimizing existing ones: better eviction policies, smarter invalidation, more granular TTLs.
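The diminishing returns follow directly from the effective-latency formula: expected latency is the hit rate times the cache latency plus the miss rate times the origin latency. The figures below are illustrative assumptions.

```python
def effective_latency_ms(hit_rate, cache_ms=1.0, origin_ms=100.0):
    """Expected per-request latency given a cache hit rate."""
    return hit_rate * cache_ms + (1 - hit_rate) * origin_ms

for hr in (0.0, 0.50, 0.90, 0.95, 0.99):
    print(f"hit rate {hr:.0%}: {effective_latency_ms(hr):.1f}ms")
```

Going from 0% to 50% hit rate roughly halves latency, while going from 90% to 95% saves only a few milliseconds, which is why tuning eviction and invalidation on existing caches eventually beats adding new ones.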
This page has equipped you with a systematic approach to caching decisions. The goal isn't to cache everything—it's to cache wisely, where the benefits clearly outweigh the costs.
Module Complete:
This concludes Module 1: Why Caching Matters. You now have a comprehensive understanding of the benefits caching offers, the trade-offs it imposes, and how to decide when and where to apply it.
In the next module, we'll dive into Cache Patterns—the specific implementation patterns (cache-aside, read-through, write-through, write-behind) that form the building blocks of caching systems. You'll learn how to implement each pattern and when each is appropriate.
You've completed Module 1: Why Caching Matters. You can now systematically evaluate caching opportunities, understand the benefits and trade-offs, and make informed decisions about when and how to cache. The next module will cover specific cache patterns and their implementations.