We've established that TLB hits are fast and misses are slow. But how do we quantify "fast" and "slow" into a single, actionable metric? How do we compare two TLB designs, evaluate the benefit of adding L2 TLB entries, or justify the overhead of huge pages?
The answer is Effective Access Time (EAT)—the weighted average time to access memory, accounting for all possible hit/miss scenarios in the translation and caching hierarchy. EAT is the memory performance metric that matters: it tells you how fast memory actually is in your system, averaged over all accesses.
Mastering EAT calculation is essential for systems architects designing memory hierarchies, OS developers tuning TLB-related parameters, and performance engineers diagnosing memory bottlenecks. This page develops the EAT framework rigorously and shows how to apply it to real-world analysis.
By the end of this page, you will understand how to calculate Effective Access Time for single-level and multi-level TLBs, incorporate cache hierarchy effects, account for page walk costs accurately, and use EAT analysis to evaluate design alternatives and optimization opportunities.
At its core, Effective Access Time is a probability-weighted sum of access times for each possible outcome in the memory hierarchy.
The fundamental principle:
EAT = Σ (Probability of Outcome × Time for that Outcome)
For a simple single-level TLB system:
There are two outcomes: a TLB hit (the translation is cached, so we proceed directly to memory) and a TLB miss (a page table walk must complete before the memory access).
Let:
- α = TLB hit rate (e.g., 0.99 for 99%)
- t_tlb = TLB lookup time (typically 0-1 clock cycle)
- t_mem = Memory access time (e.g., 100ns)
- t_walk = Page table walk time (varies by levels and cache state)

Basic EAT with single-level TLB:
EAT = (α) × (t_tlb + t_mem) + (1 - α) × (t_tlb + t_walk + t_mem)
Simplifying:
EAT = t_tlb + t_mem + (1 - α) × t_walk
This elegant form shows that EAT equals the baseline access time (t_tlb + t_mem) plus the miss penalty (t_walk) weighted by miss rate (1 - α).
This basic formula assumes the TLB lookup happens before (or in parallel with) cache/memory access. Modern VIPT caches allow parallel lookup, so t_tlb is often effectively zero for hits—it's overlapped with cache access. We'll refine the model for this.
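As a quick sanity check, the two algebraic forms can be compared numerically. This is a minimal sketch with illustrative values (99% hit rate, overlapped TLB lookup, 100ns memory access, 160ns page walk):

```python
def eat_basic(alpha, t_tlb, t_mem, t_walk):
    """Weighted-sum form of EAT for a single-level TLB."""
    return alpha * (t_tlb + t_mem) + (1 - alpha) * (t_tlb + t_walk + t_mem)

def eat_simplified(alpha, t_tlb, t_mem, t_walk):
    """Equivalent simplified form: baseline plus miss-weighted walk cost."""
    return t_tlb + t_mem + (1 - alpha) * t_walk

# Illustrative values: 99% hit rate, overlapped TLB lookup (0ns),
# 100ns memory access, 160ns page walk.
print(round(eat_basic(0.99, 0, 100, 160), 1))       # 101.6
print(round(eat_simplified(0.99, 0, 100, 160), 1))  # 101.6
```

Both forms give identical results; the simplified form just makes the "baseline plus weighted miss penalty" structure explicit.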
Worked Example: Basic EAT Calculation
Given:
- TLB hit rate α = 99%
- TLB lookup time: ~0ns (overlapped with cache access)
- Memory access time: 100ns
- Page table walk time: 160ns

For TLB hit: time = 100ns
For TLB miss: time = 100 + 160 = 260ns
EAT = (0.99 × 100) + (0.01 × 260)
= 99 + 2.6
= 101.6ns
Interpretation:
The average memory access takes 101.6ns—just 1.6% slower than the theoretical minimum of 100ns. The TLB effectively hides the translation overhead for 99 out of 100 accesses.
Modern processors have hierarchical TLBs (L1 and L2), adding complexity to EAT calculations. The L2 TLB is checked only on L1 TLB miss, and page walk occurs only on L2 miss.
Three-outcome model for hierarchical TLBs:
Let:
- α₁ = L1 TLB hit rate
- α₂ = L2 TLB hit rate (on L1 miss)
- t_L1 = L1 TLB lookup time
- t_L2 = L2 TLB lookup time
- t_walk = Page walk time
- t_mem = Memory access time

EAT with two-level TLB:
EAT = α₁ × (t_L1 + t_mem) // L1 hit
+ (1 - α₁) × α₂ × (t_L1 + t_L2 + t_mem) // L1 miss, L2 hit
+ (1 - α₁) × (1 - α₂) × (t_L1 + t_L2 + t_walk + t_mem) // Both miss
Worked Example: Two-Level TLB
Typical modern desktop CPU:
- L1 TLB: α₁ = 95% hit rate, 0.3ns lookup
- L2 TLB: α₂ = 90% hit rate (of L1 misses), 2.7ns lookup
- Page walk time: 80ns
- Memory access time: 80ns

Let's compute step by step:
L1 hit path (95% of accesses):
Time = 0.3 + 80 = 80.3ns
Weighted contribution: 0.95 × 80.3 = 76.29ns
L1 miss, L2 hit (5% × 90% = 4.5% of accesses):
Time = 0.3 + 2.7 + 80 = 83ns
Weighted contribution: 0.045 × 83 = 3.74ns
Both miss (5% × 10% = 0.5% of accesses):
Time = 0.3 + 2.7 + 80 + 80 = 163ns
Weighted contribution: 0.005 × 163 = 0.82ns
Total EAT:
EAT = 76.29 + 3.74 + 0.82 = 80.85ns
Analysis:
Despite 5% L1 TLB miss rate, the L2 TLB catches 90% of those misses. Only 0.5% of accesses incur the full page walk penalty. The two-level design is highly effective.
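The three-outcome computation above can be sketched as a small function. Computed without intermediate rounding, the result is 80.835ns, consistent with the ~80.85ns obtained with rounded intermediates:

```python
def eat_two_level(a1, a2, t_l1, t_l2, t_walk, t_mem):
    """EAT for a two-level TLB: L1 hit, L1 miss/L2 hit, both miss."""
    return (a1 * (t_l1 + t_mem)
            + (1 - a1) * a2 * (t_l1 + t_l2 + t_mem)
            + (1 - a1) * (1 - a2) * (t_l1 + t_l2 + t_walk + t_mem))

# Values from the worked example above (illustrative):
print(round(eat_two_level(0.95, 0.90, 0.3, 2.7, 80, 80), 3))  # 80.835
```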
The L2 TLB acts as a "safety net" for L1 misses. A larger L2 TLB (1000+ entries) can have high hit rates even with modest L1 size. This tiered approach is more efficient than building one massive fast TLB.
The page walk penalty isn't a fixed constant—it depends heavily on cache state. A realistic EAT model must account for where page table entries reside in the memory hierarchy.
Page walk cache scenarios:
Each page table level access can hit in:
- The L1 data cache (~1.3ns)
- The L2 cache (~4ns)
- The L3 cache (~13ns)
- Or miss all the way to main memory (~70ns)
For a 4-level page table walk, if all levels hit in L3:
Page walk time = 4 × 13ns = 52ns
If all levels miss to memory:
Page walk time = 4 × 70ns = 280ns
The difference is dramatic: 52ns vs 280ns!
| Scenario | Per-Level Latency | 4-Level Walk | Typical Probability |
|---|---|---|---|
| All hit L1 cache | ~1.3ns | ~5ns | < 1% |
| All hit L2 cache | ~4ns | ~16ns | ~5% |
| All hit L3 cache | ~13ns | ~52ns | ~60% |
| Mix of L3 and memory | ~30ns avg | ~120ns | ~30% |
| All miss to memory | ~70ns | ~280ns | ~5% |
Refined EAT with cache-aware page walk:
Let:
- t_walk_L3 = Page walk with L3 hits (fast)
- t_walk_mem = Page walk with memory misses (slow)
- β = Probability that the page walk hits in cache

t_walk_effective = β × t_walk_L3 + (1 - β) × t_walk_mem
Worked Example:
Assuming 70% of TLB-miss page walks are served from L3 (52ns) and the remaining 30% take a slower path averaging 150ns (some levels still hit cache, so short of the 280ns worst case):
t_walk_effective = (0.7 × 52) + (0.3 × 150) = 36.4 + 45 = 81.4ns
This is much more realistic than assuming either extreme.
The page walk cache effect:
Modern CPUs cache upper-level page table entries separately from data:
- Dedicated paging-structure caches hold recently used upper-level entries (e.g., PML4/PDPT/PD entries on x86-64)
- A hit in these caches skips the upper levels of the walk, leaving only the final PTE fetch
With effective page walk caches, most 4-level walks reduce to 1-2 memory accesses:
Effective walk = 0.95 × (1 × 13ns) + 0.05 × (4 × 13ns) = 12.35 + 2.6 = ~15ns
Page walk caches make the "4 sequential accesses" fear much less severe in practice.
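A rough sketch of this effect, assuming a paging-structure-cache hit leaves a single 13ns final-level access while a miss requires all four levels:

```python
def pwc_effective_walk(p_pwc_hit, t_access, levels=4):
    """Expected walk cost with a page walk cache: a hit is assumed to
    leave one final-level access; a miss requires all `levels` accesses."""
    return p_pwc_hit * t_access + (1 - p_pwc_hit) * levels * t_access

print(round(pwc_effective_walk(0.95, 13), 2))  # ~15ns
```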
Real memory access traverses both address translation (TLB) and data caching (L1/L2/L3) hierarchies. A complete EAT model must account for both.
The combined model:
For each TLB outcome, the data can hit or miss at each cache level:
Total EAT = Σ (TLB outcome probability ×
(TLB latency +
Σ (Cache outcome probability × Cache latency)))
Simplification using effective data access time:
Define:
- EAT_data = Effective time to get data once the physical address is known

With cache hierarchy:
EAT_data = p_L1 × t_L1 + p_L2 × t_L2 + p_L3 × t_L3 + p_mem × t_mem
Where p_L1 + p_L2 + p_L3 + p_mem = 1
Combined TLB + Cache EAT:
EAT = α × (t_tlb + EAT_data) + (1 - α) × (t_tlb + t_walk + EAT_data)
Or with two-level TLB:
EAT = α₁ × (t_L1_TLB + EAT_data) +
(1-α₁) × α₂ × (t_L1_TLB + t_L2_TLB + EAT_data) +
(1-α₁) × (1-α₂) × (t_L1_TLB + t_L2_TLB + t_walk + EAT_data)
Comprehensive Example: Modern x86-64 System
System parameters:
- L1 data cache: 1.5ns, 95% hit rate
- L2 cache: +5ns, 80% hit rate (of L1 misses)
- L3 cache: +13ns, 90% hit rate (of L2 misses)
- Main memory: +67ns
Step 1: Calculate EAT_data (cache hierarchy)
L1 hit: 95% → 1.5ns
L1 miss, L2 hit: 5% × 80% = 4% → 1.5 + 5 = 6.5ns
L1/L2 miss, L3 hit: 5% × 20% × 90% = 0.9% → 1.5 + 5 + 13 = 19.5ns
All miss: 5% × 20% × 10% = 0.1% → 1.5 + 5 + 13 + 67 = 86.5ns
EAT_data = 0.95(1.5) + 0.04(6.5) + 0.009(19.5) + 0.001(86.5)
= 1.425 + 0.26 + 0.176 + 0.087
= 1.95ns
Step 2: Calculate TLB-augmented EAT
TLB parameters:
- L1 TLB: 95% hit rate, lookup overlapped (~0ns)
- L2 TLB: 90% hit rate (of L1 misses), 3ns lookup
- Page walk: 40ns
L1 TLB hit: 95% → 0 + 1.95 = 1.95ns
L2 TLB hit: 5% × 90% = 4.5% → 3 + 1.95 = 4.95ns
Both miss: 5% × 10% = 0.5% → 3 + 40 + 1.95 = 44.95ns
EAT = 0.95(1.95) + 0.045(4.95) + 0.005(44.95)
= 1.85 + 0.22 + 0.22
≈ 2.30ns
Result: Average memory access takes ~2.3ns on this well-tuned system!
Despite memory latency of 67ns and potential page walk of 40ns, the effective access time is just 2.3ns. The multi-level caching and TLB hierarchies successfully hide latency for the vast majority of accesses.
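The two-step calculation can be sketched end to end. Parameters mirror the example above (illustrative values; the L1 TLB lookup is assumed fully overlapped):

```python
def eat_data(cache_levels, t_mem):
    """Effective data access time for a probed-in-order cache hierarchy.
    cache_levels: list of (hit_rate, added_latency_ns); latencies accumulate
    as each level misses, and t_mem is added on a full miss."""
    eat, p_reach, t_cum = 0.0, 1.0, 0.0
    for hit_rate, latency in cache_levels:
        t_cum += latency
        eat += p_reach * hit_rate * t_cum
        p_reach *= 1 - hit_rate
    return eat + p_reach * (t_cum + t_mem)

def eat_total(a1, a2, t_l2_tlb, t_walk, t_data):
    """Two-level TLB in front of the data hierarchy; L1 TLB lookup
    assumed fully overlapped (0ns), as in the example above."""
    return (a1 * t_data
            + (1 - a1) * a2 * (t_l2_tlb + t_data)
            + (1 - a1) * (1 - a2) * (t_l2_tlb + t_walk + t_data))

# Step 1: cache hierarchy (L1: 95%/1.5ns, L2: 80%/+5ns, L3: 90%/+13ns, mem +67ns)
t_data = eat_data([(0.95, 1.5), (0.80, 5.0), (0.90, 13.0)], 67.0)
print(round(t_data, 2))                                     # ~1.95ns
# Step 2: TLB hierarchy (L1 TLB 95%, L2 TLB 90%/3ns, 40ns walk)
print(round(eat_total(0.95, 0.90, 3.0, 40.0, t_data), 2))   # ~2.30ns
```

Carrying full precision through both steps gives ≈2.30ns, matching the hand calculation above.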
Understanding which parameters most affect EAT helps prioritize optimization efforts. Let's analyze sensitivities.
The sensitivity framework:
For each parameter, we ask: if this parameter improves by X%, how much does EAT improve?
Sensitivity = ∂EAT / ∂parameter
Key insight: miss rates matter more than hit times
At first glance this seems backwards: hits are the common case, so shouldn't hit time dominate? Let's dig deeper.
The answer depends on the current miss penalty. If L2/L3/memory latency is high, improving the hit rate has outsized impact. If the miss penalty is low (fast L2/L3), hit rate matters less.
TLB sensitivity analysis:
Using our earlier model (99% TLB hit rate, 160ns miss penalty):
Current EAT: 101.6ns
Scenario A: Improve hit rate to 99.5%
EAT = 0.995(100) + 0.005(260) = 99.5 + 1.3 = 100.8ns
Improvement: 0.8ns (0.8%)
Scenario B: Reduce miss penalty from 160ns to 100ns
EAT = 0.99(100) + 0.01(200) = 99 + 2 = 101ns
Improvement: 0.6ns (0.6%)
At 99% hit rate, they're comparable. But at 95% hit rate:
Original: 0.95(100) + 0.05(260) = 95 + 13 = 108ns
Scenario A: Hit rate to 96%
EAT = 0.96(100) + 0.04(260) = 96 + 10.4 = 106.4ns
Improvement: 1.6ns (1.5%)
Scenario B: Miss penalty 160ns → 100ns
EAT = 0.95(100) + 0.05(200) = 95 + 10 = 105ns
Improvement: 3ns (2.8%)
With more misses, reducing miss penalty becomes more valuable!
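The six scenarios above can be reproduced with a one-line model (TLB lookup assumed overlapped, as in the earlier example):

```python
def eat(alpha, t_mem=100, t_walk=160):
    """Simplified EAT; TLB lookup assumed overlapped (t_tlb ~ 0)."""
    return alpha * t_mem + (1 - alpha) * (t_walk + t_mem)

# At 99% hit rate: baseline, better hit rate (A), cheaper walk (B)
print(round(eat(0.99), 1), round(eat(0.995), 1), round(eat(0.99, t_walk=100), 1))
# prints: 101.6 100.8 101.0
# At 95% hit rate: baseline, A, B
print(round(eat(0.95), 1), round(eat(0.96), 1), round(eat(0.95, t_walk=100), 1))
# prints: 108.0 106.4 105.0
```

Sweeping `alpha` and `t_walk` this way makes it easy to see where each optimization pays off.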
| Parameter | At High Hit Rate (>99%) | At Moderate Hit Rate (~95%) | Priority |
|---|---|---|---|
| TLB hit rate | Medium impact | High impact | Critical if below 98% |
| Page walk latency | Low impact (rare) | Medium impact | Focus if many misses |
| L1 cache hit time | High impact (frequent) | High impact | Always matters |
| L1 cache hit rate | Low impact (small misses) | Medium impact | Matters if > 10% miss |
| Memory latency | Low impact (rare) | Medium-High impact | Scales with total miss rate |
This sensitivity analysis parallels Amdahl's Law: optimizing the rare case (misses) has limited benefit, while optimizing the common case (hits) always helps—until hits are so cheap that miss penalty dominates even at low miss rates.
EAT analysis guides major architectural decisions. Let's examine several.
Decision 1: Should we add more L2 TLB entries?
Current: 1024 L2 TLB entries, 85% L2 hit rate
Proposed: 2048 entries, estimated 92% L2 hit rate (based on simulation)
Cost: slightly higher L2 TLB latency (8 cycles → 10 cycles, +0.7ns)
Analysis (assuming 95% L1 TLB hit, 50ns page walk):
Current:
// L1 miss, L2 hit: 5% × 85% = 4.25%
// Both miss: 5% × 15% = 0.75%
TLB_overhead = 0.0425(3ns) + 0.0075(3ns + 50ns)
= 0.13 + 0.40 = 0.53ns
Proposed:
// L1 miss, L2 hit: 5% × 92% = 4.6%
// Both miss: 5% × 8% = 0.4%
TLB_overhead = 0.046(3.7ns) + 0.004(3.7ns + 50ns)
= 0.17 + 0.21 = 0.38ns
Result: ~0.14ns lower average translation overhead (0.53ns → 0.38ns). Worth it if the transistor budget allows!
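A sketch of this comparison, measuring the average per-access overhead attributable to L1 TLB misses (note that without intermediate rounding the values are 0.525ns and 0.385ns):

```python
def tlb_overhead(l1_miss_rate, l2_hit_rate, t_l2, t_walk):
    """Average translation overhead per access from L1 TLB misses."""
    l2_hit = l1_miss_rate * l2_hit_rate * t_l2
    l2_miss = l1_miss_rate * (1 - l2_hit_rate) * (t_l2 + t_walk)
    return l2_hit + l2_miss

current  = tlb_overhead(0.05, 0.85, 3.0, 50.0)   # 1024-entry L2 TLB
proposed = tlb_overhead(0.05, 0.92, 3.7, 50.0)   # 2048-entry, slightly slower
print(round(current, 3), round(proposed, 3))      # prints: 0.525 0.385
```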
Decision 2: 4KB pages vs 2MB huge pages
Scenario: Database with 50GB working set
With 4KB pages: the TLB covers only a few megabytes of the 50GB working set, so TLB misses are frequent and page walks common.
With 2MB huge pages: TLB reach grows 512×, covering gigabytes of the working set, and the miss rate drops dramatically.
Result: 7ns improvement per access! For billions of accesses, this is transformative.
Decision 3: Hardware vs software page walk
Software page walk adds ~50 cycles of trap handling overhead.
At 99% TLB hit rate:
Software overhead = 0.01 × (50 cycles × 0.3ns) = 0.01 × 15ns = 0.15ns
At 95% TLB hit rate:
Software overhead = 0.05 × 15ns = 0.75ns
For high-miss-rate workloads, hardware page walk provides meaningful benefit.
EAT analysis transforms "should we do X?" into a quantitative comparison. Estimate the hit rates and latencies for each option, calculate EAT, and the numbers guide the decision. This is how CPU architects actually make tradeoffs.
Let's work through several EAT calculations to build fluency with the technique.
Example 1: Single-level TLB, simple cache
Given:
- TLB hit rate: 98%, TLB lookup: 2ns
- Memory access: 100ns
- 3-level page table, each level a full memory access (no walk caching)
Hit time = TLB + memory = 2 + 100 = 102ns
Miss time = TLB + page walk + memory = 2 + 3(100) + 100 = 402ns
EAT = 0.98(102) + 0.02(402)
= 99.96 + 8.04
= 108ns
Example 2: Two-level TLB with cache hierarchy
Given:
- L1 TLB: 95% hit rate, lookup overlapped (~0ns)
- L2 TLB: 90% hit rate (of L1 misses), 5ns lookup
- Page walk: 60ns
- Effective data access time (EAT_data): 10ns
L1 TLB hit: 0.95 × (0 + 10) = 9.5ns
L2 TLB hit: 0.05 × 0.90 × (5 + 10) = 0.045 × 15 = 0.675ns
Both miss: 0.05 × 0.10 × (5 + 60 + 10) = 0.005 × 75 = 0.375ns
EAT = 9.5 + 0.675 + 0.375 = 10.55ns
Example 3: Impact of huge pages
Given (4KB pages):
- L1 TLB hit rate: only 6% (random access across a huge working set)
- L2 TLB: 90% hit rate (of L1 misses), 5ns lookup
- Page walk: 50ns
- EAT_data: 10ns
EAT_4KB = 0.06(10) + 0.94×0.90(5+10) + 0.94×0.10(5+50+10)
= 0.6 + 12.69 + 6.11
= 19.4ns
With 2MB huge pages (TLB reach now covers the hot working set, so hit rate ≈ 100%):
EAT_2MB = 1.0(10) = 10ns
Result: Huge pages nearly halve effective access time!
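All three practice examples can be checked with a pair of helper functions (a sketch using the stated parameters):

```python
def eat_single(alpha, t_tlb, t_mem, walk_levels):
    """Example 1: single-level TLB; each page-table level is a full
    memory access (no walk caching)."""
    hit = t_tlb + t_mem
    miss = t_tlb + walk_levels * t_mem + t_mem
    return alpha * hit + (1 - alpha) * miss

def eat_hier(a1, a2, t_l2_tlb, t_walk, t_data):
    """Examples 2-3: two-level TLB (L1 lookup overlapped) in front of an
    effective data access time."""
    return (a1 * t_data
            + (1 - a1) * a2 * (t_l2_tlb + t_data)
            + (1 - a1) * (1 - a2) * (t_l2_tlb + t_walk + t_data))

print(round(eat_single(0.98, 2, 100, 3), 1))      # Example 1: 108.0
print(round(eat_hier(0.95, 0.90, 5, 60, 10), 2))  # Example 2: 10.55
print(round(eat_hier(0.06, 0.90, 5, 50, 10), 1))  # Example 3: 19.4
```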
These calculations use simplified models. Real-world performance depends on access patterns, cache replacement behavior, and many other factors. EAT analysis provides directional guidance, not exact predictions. Always validate with benchmarks.
We've developed a comprehensive framework for calculating and analyzing Effective Access Time. Let's consolidate the key insights:
- EAT is a probability-weighted sum over all hit/miss outcomes; in simplified form, baseline access time plus miss rate × miss penalty
- Multi-level TLBs dramatically shrink the fraction of accesses that pay a full page walk
- Page walk cost is not a constant: it depends on where page table entries sit in the cache hierarchy, and page walk caches shorten most walks
- Sensitivity depends on operating point: at high hit rates, hit time dominates; as miss rates grow, miss penalty dominates
- EAT turns design questions (larger L2 TLB, huge pages, hardware walkers) into quantitative comparisons
What's next:
We've computed how long accesses take. But what happens when the TLB is full and a new translation must be installed? The next page examines TLB Replacement—the policies and mechanisms for evicting entries when capacity is exceeded.
You now have the mathematical framework to quantify memory access performance in any paging system. EAT analysis is the foundation for performance engineering in memory-intensive systems—use it to diagnose bottlenecks and evaluate optimizations.