We've established that TLB hits are fast and misses are slow. But how do we quantify "fast" and "slow" into a single, actionable metric? How do we compare two TLB designs, evaluate the benefit of adding L2 TLB entries, or justify the overhead of huge pages?
The answer is Effective Access Time (EAT)—the weighted average time to access memory, accounting for all possible hit/miss scenarios in the translation and caching hierarchy. EAT is the memory performance metric that matters: it tells you how fast memory actually is in your system, averaged over all accesses.
Mastering EAT calculation is essential for systems architects designing memory hierarchies, OS developers tuning TLB-related parameters, and performance engineers diagnosing memory bottlenecks. This page develops the EAT framework rigorously and shows how to apply it to real-world analysis.
By the end of this page, you will understand how to calculate Effective Access Time for single-level and multi-level TLBs, incorporate cache hierarchy effects, account for page walk costs accurately, and use EAT analysis to evaluate design alternatives and optimization opportunities.
At its core, Effective Access Time is a probability-weighted sum of access times for each possible outcome in the memory hierarchy.
The fundamental principle:
EAT = Σ (Probability of Outcome × Time for that Outcome)
For a simple single-level TLB system:
There are two outcomes: a TLB hit (the translation is cached, so we proceed directly to memory) and a TLB miss (a page table walk must complete before the memory access).
Let:
- α = TLB hit rate (e.g., 0.99 for 99%)
- t_tlb = TLB lookup time (typically 0-1 clock cycle)
- t_mem = Memory access time (e.g., 100ns)
- t_walk = Page table walk time (varies by levels and cache state)

Basic EAT with single-level TLB:
EAT = (α) × (t_tlb + t_mem) + (1 - α) × (t_tlb + t_walk + t_mem)
Simplifying:
EAT = t_tlb + t_mem + (1 - α) × t_walk
This elegant form shows that EAT equals the baseline access time (t_tlb + t_mem) plus the miss penalty (t_walk) weighted by miss rate (1 - α).
This basic formula assumes the TLB lookup happens before (or in parallel with) cache/memory access. Modern VIPT caches allow parallel lookup, so t_tlb is often effectively zero for hits—it's overlapped with cache access. We'll refine the model for this.
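As a quick sanity check, the two algebraic forms can be compared numerically. This is a minimal sketch with illustrative values (99% hit rate, overlapped TLB lookup, 100ns memory access, 160ns page walk):

```python
def eat_basic(alpha, t_tlb, t_mem, t_walk):
    """Weighted-sum form of EAT for a single-level TLB."""
    return alpha * (t_tlb + t_mem) + (1 - alpha) * (t_tlb + t_walk + t_mem)

def eat_simplified(alpha, t_tlb, t_mem, t_walk):
    """Equivalent simplified form: baseline plus miss-weighted walk cost."""
    return t_tlb + t_mem + (1 - alpha) * t_walk

# Illustrative values: 99% hit rate, overlapped TLB lookup (0ns),
# 100ns memory access, 160ns page walk.
print(round(eat_basic(0.99, 0, 100, 160), 1))       # 101.6
print(round(eat_simplified(0.99, 0, 100, 160), 1))  # 101.6
```

Both forms give identical results; the simplified form just makes the "baseline plus weighted miss penalty" structure explicit.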
Worked Example: Basic EAT Calculation
Given:
- TLB hit rate α = 99%
- TLB lookup time: ~0ns (overlapped with cache access)
- Memory access time: 100ns
- Page table walk time: 160ns

For TLB hit: time = 100ns
For TLB miss: time = 100 + 160 = 260ns
EAT = (0.99 × 100) + (0.01 × 260)
= 99 + 2.6
= 101.6ns
Interpretation:
The average memory access takes 101.6ns—just 1.6% slower than the theoretical minimum of 100ns. The TLB effectively hides the translation overhead for 99 out of 100 accesses.
Modern processors have hierarchical TLBs (L1 and L2), adding complexity to EAT calculations. The L2 TLB is checked only on L1 TLB miss, and page walk occurs only on L2 miss.
Three-outcome model for hierarchical TLBs:
Let:
- α₁ = L1 TLB hit rate
- α₂ = L2 TLB hit rate (on L1 miss)
- t_L1 = L1 TLB lookup time
- t_L2 = L2 TLB lookup time
- t_walk = Page walk time
- t_mem = Memory access time

EAT with two-level TLB:
EAT = α₁ × (t_L1 + t_mem) // L1 hit
+ (1 - α₁) × α₂ × (t_L1 + t_L2 + t_mem) // L1 miss, L2 hit
+ (1 - α₁) × (1 - α₂) × (t_L1 + t_L2 + t_walk + t_mem) // Both miss
Worked Example: Two-Level TLB
Typical modern desktop CPU:
- L1 TLB: α₁ = 95% hit rate, 0.3ns lookup
- L2 TLB: α₂ = 90% hit rate (of L1 misses), 2.7ns lookup
- Page walk time: 80ns
- Memory access time: 80ns

Let's compute step by step:
L1 hit path (95% of accesses):
Time = 0.3 + 80 = 80.3ns
Weighted contribution: 0.95 × 80.3 = 76.29ns
L1 miss, L2 hit (5% × 90% = 4.5% of accesses):
Time = 0.3 + 2.7 + 80 = 83ns
Weighted contribution: 0.045 × 83 = 3.74ns
Both miss (5% × 10% = 0.5% of accesses):
Time = 0.3 + 2.7 + 80 + 80 = 163ns
Weighted contribution: 0.005 × 163 = 0.82ns
Total EAT:
EAT = 76.29 + 3.74 + 0.82 = 80.85ns
Analysis:
Despite 5% L1 TLB miss rate, the L2 TLB catches 90% of those misses. Only 0.5% of accesses incur the full page walk penalty. The two-level design is highly effective.
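The three-outcome computation above can be sketched as a small function. Computed without intermediate rounding, the result is 80.835ns, consistent with the ~80.85ns obtained with rounded intermediates:

```python
def eat_two_level(a1, a2, t_l1, t_l2, t_walk, t_mem):
    """EAT for a two-level TLB: L1 hit, L1 miss/L2 hit, both miss."""
    return (a1 * (t_l1 + t_mem)
            + (1 - a1) * a2 * (t_l1 + t_l2 + t_mem)
            + (1 - a1) * (1 - a2) * (t_l1 + t_l2 + t_walk + t_mem))

# Values from the worked example above (illustrative):
print(round(eat_two_level(0.95, 0.90, 0.3, 2.7, 80, 80), 3))  # 80.835
```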
The L2 TLB acts as a "safety net" for L1 misses. A larger L2 TLB (1000+ entries) can have high hit rates even with modest L1 size. This tiered approach is more efficient than building one massive fast TLB.
The page walk penalty isn't a fixed constant—it depends heavily on cache state. A realistic EAT model must account for where page table entries reside in the memory hierarchy.
Page walk cache scenarios:
Each page table level access can hit in:
- The L1 data cache (~1.3ns)
- The L2 cache (~4ns)
- The L3 cache (~13ns)
- Or miss all the way to main memory (~70ns)
For a 4-level page table walk, if all levels hit in L3:
Page walk time = 4 × 13ns = 52ns
If all levels miss to memory:
Page walk time = 4 × 70ns = 280ns
The difference is dramatic: 52ns vs 280ns!
| Scenario | Per-Level Latency | 4-Level Walk | Typical Probability |
|---|---|---|---|
| All hit L1 cache | ~1.3ns | ~5ns | < 1% |
| All hit L2 cache | ~4ns | ~16ns | ~5% |
| All hit L3 cache | ~13ns | ~52ns | ~60% |
| Mix of L3 and memory | ~30ns avg | ~120ns | ~30% |
| All miss to memory | ~70ns | ~280ns | ~5% |
Refined EAT with cache-aware page walk:
Let:
- t_walk_L3 = Page walk with L3 hits (fast)
- t_walk_mem = Page walk with memory misses (slow)
- β = Probability that the page walk hits in cache

t_walk_effective = β × t_walk_L3 + (1 - β) × t_walk_mem
Worked Example:
Assuming 70% of TLB-miss page walks are served from L3 (52ns) and the remaining 30% take a slower path averaging 150ns (some levels still hit cache, so short of the 280ns worst case):
t_walk_effective = (0.7 × 52) + (0.3 × 150) = 36.4 + 45 = 81.4ns
This is much more realistic than assuming either extreme.
The page walk cache effect:
Modern CPUs cache upper-level page table entries separately from data:
- Dedicated paging-structure caches hold recently used upper-level entries (e.g., PML4/PDPT/PD entries on x86-64)
- A hit in these caches skips the upper levels of the walk, leaving only the final PTE fetch
With effective page walk caches, most 4-level walks reduce to 1-2 memory accesses:
Effective walk = 0.95 × (1 × 13ns) + 0.05 × (4 × 13ns) = 12.35 + 2.6 = ~15ns
Page walk caches make the "4 sequential accesses" fear much less severe in practice.
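A rough sketch of this effect, assuming a paging-structure-cache hit leaves a single 13ns final-level access while a miss requires all four levels:

```python
def pwc_effective_walk(p_pwc_hit, t_access, levels=4):
    """Expected walk cost with a page walk cache: a hit is assumed to
    leave one final-level access; a miss requires all `levels` accesses."""
    return p_pwc_hit * t_access + (1 - p_pwc_hit) * levels * t_access

print(round(pwc_effective_walk(0.95, 13), 2))  # ~15ns
```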
Real memory access traverses both address translation (TLB) and data caching (L1/L2/L3) hierarchies. A complete EAT model must account for both.
The combined model:
For each TLB outcome, the data can hit or miss at each cache level:
Total EAT = Σ (TLB outcome probability ×
(TLB latency +
Σ (Cache outcome probability × Cache latency)))
Simplification using effective data access time:
Define:
- EAT_data = Effective time to get data once the physical address is known

With cache hierarchy:
EAT_data = p_L1 × t_L1 + p_L2 × t_L2 + p_L3 × t_L3 + p_mem × t_mem
Where p_L1 + p_L2 + p_L3 + p_mem = 1
Combined TLB + Cache EAT:
EAT = α × (t_tlb + EAT_data) + (1 - α) × (t_tlb + t_walk + EAT_data)
Or with two-level TLB:
EAT = α₁ × (t_L1_TLB + EAT_data) +
(1-α₁) × α₂ × (t_L1_TLB + t_L2_TLB + EAT_data) +
(1-α₁) × (1-α₂) × (t_L1_TLB + t_L2_TLB + t_walk + EAT_data)
Comprehensive Example: Modern x86-64 System
System parameters:
- L1 data cache: 1.5ns, 95% hit rate
- L2 cache: +5ns, 80% hit rate (of L1 misses)
- L3 cache: +13ns, 90% hit rate (of L2 misses)
- Main memory: +67ns
Step 1: Calculate EAT_data (cache hierarchy)
L1 hit: 95% → 1.5ns
L1 miss, L2 hit: 5% × 80% = 4% → 1.5 + 5 = 6.5ns
L1/L2 miss, L3 hit: 5% × 20% × 90% = 0.9% → 1.5 + 5 + 13 = 19.5ns
All miss: 5% × 20% × 10% = 0.1% → 1.5 + 5 + 13 + 67 = 86.5ns
EAT_data = 0.95(1.5) + 0.04(6.5) + 0.009(19.5) + 0.001(86.5)
= 1.425 + 0.26 + 0.176 + 0.087
= 1.95ns
Step 2: Calculate TLB-augmented EAT
TLB parameters:
- L1 TLB: 95% hit rate, lookup overlapped (~0ns)
- L2 TLB: 90% hit rate (of L1 misses), 3ns lookup
- Page walk: 40ns
L1 TLB hit: 95% → 0 + 1.95 = 1.95ns
L2 TLB hit: 5% × 90% = 4.5% → 3 + 1.95 = 4.95ns
Both miss: 5% × 10% = 0.5% → 3 + 40 + 1.95 = 44.95ns
EAT = 0.95(1.95) + 0.045(4.95) + 0.005(44.95)
= 1.85 + 0.22 + 0.22
≈ 2.30ns
Result: Average memory access takes ~2.3ns on this well-tuned system!
Despite memory latency of 67ns and potential page walk of 40ns, the effective access time is just 2.3ns. The multi-level caching and TLB hierarchies successfully hide latency for the vast majority of accesses.
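The two-step calculation can be sketched end to end. Parameters mirror the example above (illustrative values; the L1 TLB lookup is assumed fully overlapped):

```python
def eat_data(cache_levels, t_mem):
    """Effective data access time for a probed-in-order cache hierarchy.
    cache_levels: list of (hit_rate, added_latency_ns); latencies accumulate
    as each level misses, and t_mem is added on a full miss."""
    eat, p_reach, t_cum = 0.0, 1.0, 0.0
    for hit_rate, latency in cache_levels:
        t_cum += latency
        eat += p_reach * hit_rate * t_cum
        p_reach *= 1 - hit_rate
    return eat + p_reach * (t_cum + t_mem)

def eat_total(a1, a2, t_l2_tlb, t_walk, t_data):
    """Two-level TLB in front of the data hierarchy; L1 TLB lookup
    assumed fully overlapped (0ns), as in the example above."""
    return (a1 * t_data
            + (1 - a1) * a2 * (t_l2_tlb + t_data)
            + (1 - a1) * (1 - a2) * (t_l2_tlb + t_walk + t_data))

# Step 1: cache hierarchy (L1: 95%/1.5ns, L2: 80%/+5ns, L3: 90%/+13ns, mem +67ns)
t_data = eat_data([(0.95, 1.5), (0.80, 5.0), (0.90, 13.0)], 67.0)
print(round(t_data, 2))                                     # ~1.95ns
# Step 2: TLB hierarchy (L1 TLB 95%, L2 TLB 90%/3ns, 40ns walk)
print(round(eat_total(0.95, 0.90, 3.0, 40.0, t_data), 2))   # ~2.30ns
```

Carrying full precision through both steps gives ≈2.30ns, matching the hand calculation above.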
Understanding which parameters most affect EAT helps prioritize optimization efforts. Let's analyze sensitivities.
The sensitivity framework:
For each parameter, we ask: if this parameter improves by X%, how much does EAT improve?
Sensitivity = ∂EAT / ∂parameter
Key insight: miss rates matter more than hit times
At first glance this seems backwards: hits are the common case, so shouldn't hit time dominate? Let's dig deeper.
The answer depends on the current miss penalty. If L2/L3/memory latency is high, improving the hit rate has outsized impact. If the miss penalty is low (fast L2/L3), hit rate matters less.
TLB sensitivity analysis:
Using our earlier model (99% TLB hit rate, 160ns miss penalty):
Current EAT: 101.6ns
Scenario A: Improve hit rate to 99.5%
EAT = 0.995(100) + 0.005(260) = 99.5 + 1.3 = 100.8ns
Improvement: 0.8ns (0.8%)
Scenario B: Reduce miss penalty from 160ns to 100ns
EAT = 0.99(100) + 0.01(200) = 99 + 2 = 101ns
Improvement: 0.6ns (0.6%)
At 99% hit rate, they're comparable. But at 95% hit rate:
Original: 0.95(100) + 0.05(260) = 95 + 13 = 108ns
Scenario A: Hit rate to 96%
EAT = 0.96(100) + 0.04(260) = 96 + 10.4 = 106.4ns
Improvement: 1.6ns (1.5%)
Scenario B: Miss penalty 160ns → 100ns
EAT = 0.95(100) + 0.05(200) = 95 + 10 = 105ns
Improvement: 3ns (2.8%)
With more misses, reducing miss penalty becomes more valuable!
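The six scenarios above can be reproduced with a one-line model (TLB lookup assumed overlapped, as in the earlier example):

```python
def eat(alpha, t_mem=100, t_walk=160):
    """Simplified EAT; TLB lookup assumed overlapped (t_tlb ~ 0)."""
    return alpha * t_mem + (1 - alpha) * (t_walk + t_mem)

# At 99% hit rate: baseline, better hit rate (A), cheaper walk (B)
print(round(eat(0.99), 1), round(eat(0.995), 1), round(eat(0.99, t_walk=100), 1))
# prints: 101.6 100.8 101.0
# At 95% hit rate: baseline, A, B
print(round(eat(0.95), 1), round(eat(0.96), 1), round(eat(0.95, t_walk=100), 1))
# prints: 108.0 106.4 105.0
```

Sweeping `alpha` and `t_walk` this way makes it easy to see where each optimization pays off.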
| Parameter | At High Hit Rate (>99%) | At Moderate Hit Rate (~95%) | Priority |
|---|---|---|---|
| TLB hit rate | Medium impact | High impact | Critical if below 98% |
| Page walk latency | Low impact (rare) | Medium impact | Focus if many misses |
| L1 cache hit time | High impact (frequent) | High impact | Always matters |
| L1 cache hit rate | Low impact (small misses) | Medium impact | Matters if > 10% miss |
| Memory latency | Low impact (rare) | Medium-High impact | Scales with total miss rate |
This sensitivity analysis parallels Amdahl's Law: optimizing the rare case (misses) has limited benefit, while optimizing the common case (hits) always helps—until hits are so cheap that miss penalty dominates even at low miss rates.
EAT analysis guides major architectural decisions. Let's examine several.
Decision 1: Should we add more L2 TLB entries?
Current: 1024 L2 TLB entries, 85% L2 hit rate
Proposed: 2048 entries, estimated 92% L2 hit rate (based on simulation)
Cost: slightly higher L2 TLB latency (8 cycles → 10 cycles, +0.7ns)
Analysis (assuming 95% L1 TLB hit, 50ns page walk):
Current:
// L1 miss, L2 hit: 5% × 85% = 4.25%
// Both miss: 5% × 15% = 0.75%
TLB_overhead = 0.0425(3ns) + 0.0075(3ns + 50ns)
= 0.13 + 0.40 = 0.53ns
Proposed:
// L1 miss, L2 hit: 5% × 92% = 4.6%
// Both miss: 5% × 8% = 0.4%
TLB_overhead = 0.046(3.7ns) + 0.004(3.7ns + 50ns)
= 0.17 + 0.21 = 0.38ns
Result: ~0.14ns lower average translation overhead (0.53ns → 0.38ns). Worth it if the transistor budget allows!
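A sketch of this comparison, measuring the average per-access overhead attributable to L1 TLB misses (note that without intermediate rounding the values are 0.525ns and 0.385ns):

```python
def tlb_overhead(l1_miss_rate, l2_hit_rate, t_l2, t_walk):
    """Average translation overhead per access from L1 TLB misses."""
    l2_hit = l1_miss_rate * l2_hit_rate * t_l2
    l2_miss = l1_miss_rate * (1 - l2_hit_rate) * (t_l2 + t_walk)
    return l2_hit + l2_miss

current  = tlb_overhead(0.05, 0.85, 3.0, 50.0)   # 1024-entry L2 TLB
proposed = tlb_overhead(0.05, 0.92, 3.7, 50.0)   # 2048-entry, slightly slower
print(round(current, 3), round(proposed, 3))      # prints: 0.525 0.385
```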
Decision 2: 4KB pages vs 2MB huge pages
Scenario: Database with 50GB working set
With 4KB pages: the TLB covers only a few megabytes of the 50GB working set, so TLB misses are frequent and page walks common.
With 2MB huge pages: TLB reach grows 512×, covering gigabytes of the working set, and the miss rate drops dramatically.
Result: 7ns improvement per access! For billions of accesses, this is transformative.
Decision 3: Hardware vs software page walk
Software page walk adds ~50 cycles of trap handling overhead.
At 99% TLB hit rate:
Software overhead = 0.01 × (50 cycles × 0.3ns) = 0.01 × 15ns = 0.15ns
At 95% TLB hit rate:
Software overhead = 0.05 × 15ns = 0.75ns
For high-miss-rate workloads, hardware page walk provides meaningful benefit.
EAT analysis transforms "should we do X?" into a quantitative comparison. Estimate the hit rates and latencies for each option, calculate EAT, and the numbers guide the decision. This is how CPU architects actually make tradeoffs.
Let's work through several EAT calculations to build fluency with the technique.
Example 1: Single-level TLB, simple cache
Given:
- TLB hit rate: 98%, TLB lookup: 2ns
- Memory access: 100ns
- 3-level page table, each level a full memory access (no walk caching)
Hit time = TLB + memory = 2 + 100 = 102ns
Miss time = TLB + page walk + memory = 2 + 3(100) + 100 = 402ns
EAT = 0.98(102) + 0.02(402)
= 99.96 + 8.04
= 108ns
Example 2: Two-level TLB with cache hierarchy
Given:
- L1 TLB: 95% hit rate, lookup overlapped (~0ns)
- L2 TLB: 90% hit rate (of L1 misses), 5ns lookup
- Page walk: 60ns
- Effective data access time (EAT_data): 10ns
L1 TLB hit: 0.95 × (0 + 10) = 9.5ns
L2 TLB hit: 0.05 × 0.90 × (5 + 10) = 0.045 × 15 = 0.675ns
Both miss: 0.05 × 0.10 × (5 + 60 + 10) = 0.005 × 75 = 0.375ns
EAT = 9.5 + 0.675 + 0.375 = 10.55ns
Example 3: Impact of huge pages
Given (4KB pages):
- L1 TLB hit rate: only 6% (random access across a huge working set)
- L2 TLB: 90% hit rate (of L1 misses), 5ns lookup
- Page walk: 50ns
- EAT_data: 10ns
EAT_4KB = 0.06(10) + 0.94×0.90(5+10) + 0.94×0.10(5+50+10)
= 0.6 + 12.69 + 6.11
= 19.4ns
With 2MB huge pages (TLB reach now covers the hot working set, so hit rate ≈ 100%):
EAT_2MB = 1.0(10) = 10ns
Result: Huge pages nearly halve effective access time!
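All three practice examples can be checked with a pair of helper functions (a sketch using the stated parameters):

```python
def eat_single(alpha, t_tlb, t_mem, walk_levels):
    """Example 1: single-level TLB; each page-table level is a full
    memory access (no walk caching)."""
    hit = t_tlb + t_mem
    miss = t_tlb + walk_levels * t_mem + t_mem
    return alpha * hit + (1 - alpha) * miss

def eat_hier(a1, a2, t_l2_tlb, t_walk, t_data):
    """Examples 2-3: two-level TLB (L1 lookup overlapped) in front of an
    effective data access time."""
    return (a1 * t_data
            + (1 - a1) * a2 * (t_l2_tlb + t_data)
            + (1 - a1) * (1 - a2) * (t_l2_tlb + t_walk + t_data))

print(round(eat_single(0.98, 2, 100, 3), 1))      # Example 1: 108.0
print(round(eat_hier(0.95, 0.90, 5, 60, 10), 2))  # Example 2: 10.55
print(round(eat_hier(0.06, 0.90, 5, 50, 10), 1))  # Example 3: 19.4
```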
These calculations use simplified models. Real-world performance depends on access patterns, cache replacement behavior, and many other factors. EAT analysis provides directional guidance, not exact predictions. Always validate with benchmarks.
We've developed a comprehensive framework for calculating and analyzing Effective Access Time. Let's consolidate the key insights:
- EAT is a probability-weighted sum over all hit/miss outcomes; in simplified form, baseline access time plus miss rate × miss penalty
- Multi-level TLBs dramatically shrink the fraction of accesses that pay a full page walk
- Page walk cost is not a constant: it depends on where page table entries sit in the cache hierarchy, and page walk caches shorten most walks
- Sensitivity depends on operating point: at high hit rates, hit time dominates; as miss rates grow, miss penalty dominates
- EAT turns design questions (larger L2 TLB, huge pages, hardware walkers) into quantitative comparisons
What's next:
We've computed how long accesses take. But what happens when the TLB is full and a new translation must be installed? The next page examines TLB Replacement—the policies and mechanisms for evicting entries when capacity is exceeded.
You now have the mathematical framework to quantify memory access performance in any paging system. EAT analysis is the foundation for performance engineering in memory-intensive systems—use it to diagnose bottlenecks and evaluate optimizations.