Modern computer systems employ a sophisticated memory hierarchy to bridge the gap between fast CPUs and slow main memory. A memory access may hit or miss at multiple levels—TLB, L1 cache, L2 cache, L3 cache—before reaching DRAM. Understanding how to calculate the effective memory access time (EMAT) across this entire hierarchy is essential for accurate performance analysis and optimization.
This page synthesizes everything we've learned about TLB and cache performance into a unified framework for complete memory system analysis.
By the end of this page, you will be able to calculate EMAT for any memory hierarchy configuration, analyze the impact of each hierarchy level on performance, model the interaction between TLB and cache lookups, and solve complex multi-level memory problems systematically.
A modern memory system has multiple layers, each with different characteristics:
Typical Memory Hierarchy:
┌─────────────────────────────┐
│ CPU Registers │ ← ~0.25 ns, ~1 KB
├─────────────────────────────┤
│ L1 Data Cache │ ← ~1 ns, 32-64 KB
│ L1 Instruction Cache │
├─────────────────────────────┤
│ L2 Cache │ ← ~4-10 ns, 256 KB - 1 MB
├─────────────────────────────┤
│ L3 Cache │ ← ~10-40 ns, 4-64 MB
├─────────────────────────────┤
│ Main Memory (DRAM) │ ← ~50-100 ns, 8-256 GB
├─────────────────────────────┤
│ Storage (SSD/HDD/NVMe) │ ← ~0.1-10 ms, 1-16 TB
└─────────────────────────────┘
Parallel to this is the TLB hierarchy:
┌─────────────────────────────┐
│ L1 DTLB (Data TLB) │ ← ~0.5 ns, 64-96 entries
│ L1 ITLB (Instruction) │
├─────────────────────────────┤
│ L2 STLB (Shared TLB) │ ← ~3-7 ns, 512-2048 entries
├─────────────────────────────┤
│ Page Table Walk │ ← Variable, 4 memory accesses
└─────────────────────────────┘
| Level | Access Time | Size | Hit Rate (typical) |
|---|---|---|---|
| L1 DTLB | 0.5 ns | 64 entries | 95-99% |
| L2 STLB | 3-5 ns | 1024 entries | 99%+ of L1 misses |
| L1 Data Cache | 1 ns (4 cycles) | 32-64 KB | 85-95% |
| L2 Cache | 4-10 ns | 256 KB - 1 MB | 80-95% of L1 misses |
| L3 Cache | 10-40 ns | 4-64 MB | 70-90% of L2 misses |
| DRAM | 50-100 ns | 8-256 GB | ~100% (if not paged out) |
| Page Fault | 5-15 ms | Disk/SSD | 0.001-0.1% |
Address translation (TLB) and data access (cache) are different concerns. Physically-indexed caches check the cache after translation. Virtually-indexed caches can check the cache in parallel with TLB lookup, but have aliasing issues. Most modern L1 caches use VIPT (Virtually-Indexed, Physically-Tagged) for speed.
The fundamental EMAT formula for a single cache level:
EMAT = (Hit Rate × Hit Time) + (Miss Rate × Miss Time)
= h × T_hit + (1 - h) × T_miss
Where:
- h = cache hit rate
- T_hit = time for a cache hit
- T_miss = time for a cache miss (includes going to the next level)

Recursive Expansion:
For multiple cache levels, T_miss of one level becomes the access to the next:
EMAT_L1 = h₁ × T_L1 + (1 - h₁) × (T_L1 + EMAT_L2)
EMAT_L2 = h₂ × T_L2 + (1 - h₂) × (T_L2 + EMAT_L3)
... and so on
Simplified Formula:
For a system with L1, L2, L3 caches and memory:
EMAT = T_L1 + (1-h₁) × T_L2 + (1-h₁)(1-h₂) × T_L3 + (1-h₁)(1-h₂)(1-h₃) × T_mem
This formula adds the latency of each level weighted by the probability of reaching that level.
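The additive and recursive forms give identical results; this minimal sketch (illustrative values) computes EMAT both ways and confirms they agree:

```python
def emat_recursive(levels, t_mem):
    """Backward pass: each level's miss time is the EMAT of the level below."""
    t = t_mem
    for t_lvl, h in reversed(levels):
        t = t_lvl + (1 - h) * t
    return t

def emat_additive(levels, t_mem):
    """Each level's latency weighted by the probability of reaching it."""
    total, reach = 0.0, 1.0
    for t_lvl, h in levels:
        total += reach * t_lvl
        reach *= (1 - h)
    return total + reach * t_mem

cfg = [(1, 0.90), (5, 0.95), (20, 0.80)]  # (access time ns, hit rate) per level
assert abs(emat_recursive(cfg, 100) - emat_additive(cfg, 100)) < 1e-9
print(round(emat_recursive(cfg, 100), 3))  # 1.7
```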
Example: Three-Level Cache Hierarchy
Given:
- L1 cache: 1 ns access, 90% hit rate
- L2 cache: 5 ns access, 95% hit rate (of L1 misses)
- L3 cache: 20 ns access, 80% hit rate (of L2 misses)
- Main memory: 100 ns
Calculation (Working Outward):
Step 1: L3 to Memory
EMAT_from_L3 = T_L3 + (1 - h₃) × T_mem
= 20 + 0.2 × 100
= 20 + 20 = 40 ns
Step 2: L2 to L3
EMAT_from_L2 = T_L2 + (1 - h₂) × EMAT_from_L3
= 5 + 0.05 × 40
= 5 + 2 = 7 ns
Step 3: L1 to L2
EMAT = T_L1 + (1 - h₁) × EMAT_from_L2
= 1 + 0.1 × 7
= 1 + 0.7 = 1.7 ns
Analysis: even though DRAM takes 100 ns, the effective access time is only 1.7 ns, roughly 59× faster than going to memory every time, because each level's latency is weighted by the shrinking probability of ever reaching it.
```python
def calculate_emat(levels: list, memory_time: float) -> dict:
    """
    Calculate EMAT for a cache hierarchy.

    Args:
        levels: List of (access_time, hit_rate) tuples, ordered L1 to Ln
        memory_time: Main memory access time in ns

    Returns:
        Dictionary with EMAT and breakdown
    """
    # Work backwards from memory
    effective_time = memory_time
    breakdown = [("Memory", memory_time, 1.0)]

    for i, (access_time, hit_rate) in enumerate(reversed(levels)):
        level_name = f"L{len(levels) - i}"
        miss_rate = 1 - hit_rate
        # Time at this level = access_time + miss_rate × time_to_get_from_below
        effective_time = access_time + miss_rate * effective_time
        breakdown.append((level_name, access_time, hit_rate))

    return {
        'emat_ns': round(effective_time, 3),
        'breakdown': list(reversed(breakdown)),
        'speedup_vs_memory': round(memory_time / effective_time, 1)
    }

# Example: L1, L2, L3 with memory
result = calculate_emat(
    levels=[
        (1, 0.90),   # L1: 1 ns, 90% hit
        (5, 0.95),   # L2: 5 ns, 95% hit of L1 misses
        (20, 0.80),  # L3: 20 ns, 80% hit of L2 misses
    ],
    memory_time=100,
)
print(f"EMAT: {result['emat_ns']} ns")
print(f"Speedup vs memory only: {result['speedup_vs_memory']}x")
```

Some textbooks use the additive formula: EMAT = T_L1 + m₁×T_L2 + m₁×m₂×T_L3 + m₁×m₂×m₃×T_mem, where m = miss rate. This gives the same result and is sometimes easier for quick calculations.
Virtual memory adds address translation before every memory access. The complete EMAT must include TLB behavior.
Access Paths:
Path 1: TLB Hit (probability = p_tlb)
Time = T_tlb + EMAT_cache
Path 2: TLB Miss (probability = 1 - p_tlb)
Time = T_tlb + Page_Walk_Time + EMAT_cache
Complete Formula:
EMAT_with_TLB = p_tlb × (T_tlb + EMAT_cache) +
(1 - p_tlb) × (T_tlb + T_walk + EMAT_cache)
= T_tlb + EMAT_cache + (1 - p_tlb) × T_walk
Page Walk Time:
For N-level page tables:
T_walk = N × EMAT_cache (each level requires a memory read)
Note: Page table entries may be cached, so we use EMAT_cache, not raw memory time.
Complete Example with TLB:
Given:
- TLB: 2 ns access, 98% hit rate
- L1 cache: 1 ns access, 90% hit rate
- L2 cache: 10 ns access, 85% hit rate (of L1 misses)
- Main memory: 100 ns
- 4-level page table
Step 1: Calculate EMAT_cache (no TLB)
From L2 misses:
EMAT_from_L2 = 10 + 0.15 × 100 = 10 + 15 = 25 ns
From L1:
EMAT_cache = 1 + 0.10 × 25 = 1 + 2.5 = 3.5 ns
Step 2: Calculate Page Walk Time
Each level requires accessing cache/memory:
T_walk = 4 × EMAT_cache = 4 × 3.5 = 14 ns
Step 3: Complete EMAT with TLB
EMAT = T_tlb + EMAT_cache + (1 - p_tlb) × T_walk
= 2 + 3.5 + 0.02 × 14
= 2 + 3.5 + 0.28
= 5.78 ns
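The same arithmetic in code form (all values taken from this example):

```python
t_tlb, p_tlb = 2, 0.98                     # TLB: 2 ns, 98% hit
emat_cache = 1 + 0.10 * (10 + 0.15 * 100)  # L1 -> L2 -> memory
t_walk = 4 * emat_cache                    # 4-level page table, cached reads
emat = t_tlb + emat_cache + (1 - p_tlb) * t_walk
print(round(emat, 2))  # 5.78
```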
Impact Analysis:
| TLB Hit Rate | TLB Miss Penalty | Total EMAT | vs 100% Hit |
|---|---|---|---|
| 100% | 0 ns | 5.5 ns | baseline |
| 99% | 0.14 ns | 5.64 ns | +2.5% |
| 98% | 0.28 ns | 5.78 ns | +5.1% |
| 95% | 0.70 ns | 6.20 ns | +12.7% |
| 90% | 1.40 ns | 6.90 ns | +25.5% |
| 80% | 2.80 ns | 8.30 ns | +50.9% |
In reality, page table entries get cached. Upper-level entries (PML4, PDPT) are frequently accessed and almost always hit in cache. A more accurate model uses weighted access times for each page table level based on their cache hit rates.
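That weighted model can be sketched as follows; the per-level cache hit rates here are hypothetical, chosen only to illustrate how little the well-cached upper levels add to walk time:

```python
def weighted_walk_time(pte_cache_hit_rates, t_cache_hit, t_mem):
    """Page walk time when each page-table level's entry may hit in cache.

    pte_cache_hit_rates: hypothetical cache hit probability for each
    level's PTE read, ordered from the top level (e.g. PML4) down.
    """
    return sum(h * t_cache_hit + (1 - h) * t_mem
               for h in pte_cache_hit_rates)

# Upper levels are almost always cached; leaf PTEs miss far more often.
walk = weighted_walk_time([0.999, 0.99, 0.95, 0.70], t_cache_hit=1, t_mem=100)
naive = 4 * 100  # all four reads from DRAM
print(round(walk, 2), "ns vs", naive, "ns naive")
```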
Modern processors have multiple TLB levels, similar to the cache hierarchy.
Two-Level TLB Formula:
Path 1: L1 TLB Hit (p₁)
Time = T_L1_TLB + EMAT_cache
Path 2: L1 TLB Miss, L2 TLB Hit ((1-p₁) × p₂)
Time = T_L1_TLB + T_L2_TLB + EMAT_cache
Path 3: Both TLB Miss ((1-p₁) × (1-p₂))
Time = T_L1_TLB + T_L2_TLB + T_walk + EMAT_cache
Complete Formula:
EMAT = p₁ × (T_L1_TLB + EMAT_cache)
+ (1-p₁) × p₂ × (T_L1_TLB + T_L2_TLB + EMAT_cache)
+ (1-p₁) × (1-p₂) × (T_L1_TLB + T_L2_TLB + T_walk + EMAT_cache)
Simplified:
EMAT = T_L1_TLB + EMAT_cache
+ (1-p₁) × T_L2_TLB
+ (1-p₁) × (1-p₂) × T_walk
Complete System Example:
Given:
- L1 TLB: 0.5 ns, 95% hit rate
- L2 TLB: 4 ns, 99% hit rate (of L1 TLB misses)
- L1 cache: 1 ns, 92% hit rate
- L2 cache: 4 ns, 88% hit rate (of L1 misses)
- L3 cache: 15 ns, 85% hit rate (of L2 misses)
- Main memory: 80 ns
- 4-level page table
Step 1: EMAT_cache
EMAT_from_L3 = 15 + 0.15 × 80 = 15 + 12 = 27 ns
EMAT_from_L2 = 4 + 0.12 × 27 = 4 + 3.24 = 7.24 ns
EMAT_cache = 1 + 0.08 × 7.24 = 1 + 0.58 = 1.58 ns
Step 2: Page Walk Time
T_walk = 4 × EMAT_cache = 4 × 1.58 = 6.32 ns
Step 3: Full EMAT
p_L1_miss = 0.05
p_L2_miss_given_L1_miss = 0.01
p_both_miss = 0.05 × 0.01 = 0.0005
EMAT = T_L1_TLB + EMAT_cache + (1-p₁) × T_L2_TLB + (1-p₁)(1-p₂) × T_walk
= 0.5 + 1.58 + 0.05 × 4 + 0.0005 × 6.32
= 0.5 + 1.58 + 0.2 + 0.00316
= 2.28 ns
```python
def complete_emat(
    tlb_config: list,    # [(time, hit_rate), ...] ordered L1 TLB to Ln TLB
    cache_config: list,  # [(time, hit_rate), ...] ordered L1 to Ln
    memory_time: float,
    pt_levels: int,
) -> dict:
    """
    Calculate complete EMAT with TLB and cache hierarchy.
    """
    # Step 1: Calculate cache EMAT (working backwards from memory)
    cache_emat = memory_time
    for cache_time, cache_hit_rate in reversed(cache_config):
        cache_emat = cache_time + (1 - cache_hit_rate) * cache_emat

    # Step 2: Page walk time (each level is one cached memory read)
    page_walk_time = pt_levels * cache_emat

    # Step 3: TLB contribution. The L1 TLB is always probed; each deeper
    # TLB level is probed only when all shallower levels have missed.
    final_emat = cache_emat
    miss_prob = 1.0
    for tlb_time, tlb_hit_rate in tlb_config:
        final_emat += miss_prob * tlb_time
        miss_prob *= (1 - tlb_hit_rate)

    # Step 4: Full page walk when every TLB level misses
    final_emat += miss_prob * page_walk_time

    return {
        'cache_emat_ns': round(cache_emat, 3),
        'page_walk_time_ns': round(page_walk_time, 3),
        'total_tlb_miss_prob': round(miss_prob, 6),
        'final_emat_ns': round(final_emat, 3),
    }

# Example configuration
result = complete_emat(
    tlb_config=[(0.5, 0.95), (4, 0.99)],              # L1 DTLB, L2 STLB
    cache_config=[(1, 0.92), (4, 0.88), (15, 0.85)],  # L1, L2, L3
    memory_time=80,
    pt_levels=4,
)
for k, v in result.items():
    print(f"{k}: {v}")
```

Understanding cache miss categories helps predict and optimize cache behavior.
The Three C's of Cache Misses:
1. Compulsory Misses (Cold Misses): the very first access to a block always misses, regardless of cache size or organization.
2. Capacity Misses: the working set exceeds the cache size, so blocks are evicted and later re-fetched even in a fully-associative cache.
3. Conflict Misses (Collision Misses): too many blocks map to the same set and evict each other, even though the cache as a whole has room.
Calculating Miss Contributions:
Total Miss Rate = Compulsory + Capacity + Conflict
Compulsory Rate = Unique_Blocks_Accessed / Total_Accesses
For fully-associative cache:
Capacity Rate = Total_Miss_Rate - Compulsory_Rate
Conflict Rate = 0
For set-associative cache:
Capacity Rate = FA_Miss_Rate - Compulsory_Rate
Conflict Rate = Set_Assoc_Miss_Rate - FA_Miss_Rate
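These relations can be packaged into a small helper (a sketch; the min() guard handles the case where the working set fits and every fully-associative miss is a first access):

```python
def miss_breakdown(unique_blocks, total_accesses, fa_miss_rate, sa_miss_rate):
    """Split a set-associative miss rate into the three C's.

    Reference point is a same-size fully-associative (FA) cache:
    conflict = SA - FA, capacity = FA - compulsory.
    """
    # If the FA miss rate is below the cold-miss bound, every FA miss
    # is a first access, so effective compulsory rate = FA miss rate.
    compulsory = min(unique_blocks / total_accesses, fa_miss_rate)
    capacity = max(fa_miss_rate - compulsory, 0.0)
    conflict = max(sa_miss_rate - fa_miss_rate, 0.0)
    return {'compulsory': compulsory, 'capacity': capacity, 'conflict': conflict}

# Illustrative: 1,000 unique blocks over 10,000 accesses,
# FA miss rate 8%, direct-mapped miss rate 15%
print(miss_breakdown(1000, 10000, fa_miss_rate=0.08, sa_miss_rate=0.15))
```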
Example: Miss Category Analysis
Given:
- 10,000 total accesses touching 1,000 unique blocks
- Miss rates: direct-mapped 15%, 2-way 12%, 4-way 10%, 8-way 9%, fully-associative (same size) 8%
- Hit time: 1 ns; miss penalty: 10 ns
Analysis:
Compulsory Rate = 1000 / 10000 = 10%
Capacity Rate = FA_Miss - Compulsory = 8% - 10% = -2% (capped at 0)
Note: This means the working set fits in cache!
Actual interpretation:
If FA miss rate < compulsory rate, misses are purely compulsory
(some blocks accessed only once)
FA Miss Rate = 8% means:
- Compulsory: 8% (all misses are first accesses)
- Capacity: 0% (everything fits)
Conflict Rate for Direct-Mapped = 15% - 8% = 7%
Conflict Rate for 4-way = 10% - 8% = 2%
Optimization Insight:
| Associativity | Miss Rate | EMAT | Improvement |
|---|---|---|---|
| Direct-Mapped | 15% | 1 + 0.15×10 = 2.5 ns | baseline |
| 2-way | 12% | 1 + 0.12×10 = 2.2 ns | 12% |
| 4-way | 10% | 1 + 0.10×10 = 2.0 ns | 20% |
| 8-way | 9% | 1 + 0.09×10 = 1.9 ns | 24% |
| Fully-Assoc | 8% | 1 + 0.08×10 = 1.8 ns | 28% |
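The table above can be regenerated with a short sweep (miss rates and the 10 ns miss penalty come from this example):

```python
def emat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

baseline = emat(1, 0.15, 10)  # direct-mapped
for name, mr in [("Direct-Mapped", 0.15), ("2-way", 0.12), ("4-way", 0.10),
                 ("8-way", 0.09), ("Fully-Assoc", 0.08)]:
    e = emat(1, mr, 10)
    print(f"{name}: {e:.1f} ns ({(baseline - e) / baseline:.0%} improvement)")
```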
In multiprocessor systems, a fourth 'C' exists: Coherence misses. These occur when another processor modifies a shared cache line, causing invalidation. Important for shared memory parallel programs.
Write operations have different timing characteristics than reads, affecting AMAT.
Write Policies:
Write-Through: Every write goes to cache AND memory
Write Time = max(Cache_Time, Memory_Write_Time) // if write buffer
Write Time = Cache_Time + Memory_Time // if no write buffer
Write-Back: Writes only go to cache; dirty blocks written on eviction
Write Hit Time = Cache_Time
Write Miss Time = Cache_Time + P(dirty eviction) × Memory_Time
Miss Handling:
Write-Allocate: On write miss, fetch block into cache, then write
Write Miss Time = Fetch_Time + Cache_Write_Time
Write-No-Allocate: On write miss, write directly to memory
Write Miss Time = Memory_Write_Time
Common combinations:
- Write-back + write-allocate (typical for modern caches)
- Write-through + write-no-allocate
AMAT with Mixed Read/Write Traffic:
AMAT = Read_Fraction × AMAT_read + Write_Fraction × AMAT_write
Example: Write-Back Cache
Given:
- Cache: 1 ns access, write-back with write-allocate
- Read miss rate: 5%; write miss rate: 8%
- Memory: 100 ns
- 40% of evicted blocks are dirty
- Traffic mix: 70% reads, 30% writes
Read AMAT:
AMAT_read = 1 + 0.05 × 100 = 6 ns
Write AMAT (Write-Back):
On write hit: 1 ns
On write miss: allocate (fetch) + potential dirty writeback
Write Miss Time = 100 + 0.40 × 100 = 140 ns (fetch + 40% writeback)
AMAT_write = 1 + 0.08 × 140 = 1 + 11.2 = 12.2 ns
Combined AMAT:
AMAT = 0.70 × 6 + 0.30 × 12.2 = 4.2 + 3.66 = 7.86 ns
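A sketch of this mixed-traffic calculation (all values from this example; write-back with write-allocate assumed):

```python
def mixed_amat(hit_time, read_miss, write_miss, mem_time, dirty_frac, read_frac):
    """AMAT over mixed reads/writes for a write-back, write-allocate cache."""
    amat_read = hit_time + read_miss * mem_time
    # Write miss: fetch the block, plus a writeback when the victim is dirty
    write_miss_penalty = mem_time + dirty_frac * mem_time
    amat_write = hit_time + write_miss * write_miss_penalty
    return read_frac * amat_read + (1 - read_frac) * amat_write

print(round(mixed_amat(1, 0.05, 0.08, 100, dirty_frac=0.40, read_frac=0.70), 2))
# prints 7.86
```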
Write buffers absorb write latency by queuing writes and letting the CPU continue. If the write buffer doesn't fill up, write-through performance approaches write-back. Modern CPUs have sophisticated store buffers and memory ordering logic.
Let's bring everything together with a complete system analysis.
Problem: Full Memory System Characterization
System Configuration:
- CPU: 3 GHz (0.333 ns/cycle), CPI_base = 1.0, 35% of instructions access memory
- L1 TLB: 1 cycle, 97% hit rate
- L2 TLB: 8 cycles, 99.5% hit rate (of L1 TLB misses)
- L1 cache: 4 cycles, 94% hit rate
- L2 cache: 15 cycles, 90% hit rate (of L1 misses)
- L3 cache: 50 cycles, 80% hit rate (of L2 misses)
- Main memory: 300 cycles
- 4-level page table
Step 1: Convert cycles to ns
1 cycle = 1/3 GHz = 0.333 ns
L1 TLB: 1 × 0.333 = 0.333 ns
L2 TLB: 8 × 0.333 = 2.67 ns
L1 Cache: 4 × 0.333 = 1.33 ns
L2 Cache: 15 × 0.333 = 5 ns
L3 Cache: 50 × 0.333 = 16.67 ns
Memory: 300 × 0.333 = 100 ns
Step 2: Calculate Cache EMAT
From L3: 16.67 + 0.20 × 100 = 16.67 + 20 = 36.67 ns
From L2: 5 + 0.10 × 36.67 = 5 + 3.67 = 8.67 ns
From L1: 1.33 + 0.06 × 8.67 = 1.33 + 0.52 = 1.85 ns
EMAT_cache = 1.85 ns
Step 3: Page Walk Time
T_walk = 4 × EMAT_cache = 4 × 1.85 = 7.4 ns
Step 4: TLB Contribution
p_L1_miss = 0.03
p_both_miss = 0.03 × 0.005 = 0.00015
EMAT_with_TLB = 0.333 + 1.85 + 0.03 × 2.67 + 0.00015 × 7.4
= 0.333 + 1.85 + 0.08 + 0.001
= 2.26 ns
Step 5: Impact on CPI
Memory stall cycles per memory instruction:
Stall_cycles = (EMAT_with_TLB - T_L1) / cycle_time
= (2.26 - 0.333) / 0.333
= 5.8 cycles
Actual CPI:
CPI = CPI_base + Memory_Fraction × Stall_cycles
= 1.0 + 0.35 × 5.8
= 1.0 + 2.03
= 3.03 cycles
Step 6: Performance Analysis
IPC_ideal = 1.0
IPC_actual = 1/3.03 = 0.33
Performance loss to memory = (3.03 - 1.0) / 3.03 = 67%
Interpretation: memory access accounts for 67% of execution time. This is why cache hit rates, TLB reach, and memory latency dominate real-world performance. The table below quantifies the effect of individual improvements:
| Change | New EMAT | New CPI | Speedup |
|---|---|---|---|
| Baseline | 2.26 ns | 3.03 | 1.00× |
| L1 hit rate → 97% | 1.97 ns | 2.74 | 1.11× |
| L2 hit rate → 95% | 2.10 ns | 2.86 | 1.06× |
| Memory → 200 cycles | 1.93 ns | 2.70 | 1.12× |
| TLB hit → 99% | 2.21 ns | 2.98 | 1.02× |
| All improvements | 1.55 ns | 2.28 | 1.33× |
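The baseline row can be reproduced directly from Step 5 (a minimal sketch; per the worked example, any access time beyond one base cycle counts as stall):

```python
def cpi_with_memory(cpi_base, mem_fraction, emat_ns, base_ns, cycle_ns):
    """Add memory stall cycles (access time beyond base_ns) to base CPI."""
    stall_cycles = (emat_ns - base_ns) / cycle_ns
    return cpi_base + mem_fraction * stall_cycles

cpi = cpi_with_memory(1.0, 0.35, emat_ns=2.26, base_ns=0.333, cycle_ns=0.333)
print(round(cpi, 2), round(1 / cpi, 2))  # CPI ~3.03, IPC ~0.33
```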
L1 cache: 2 ns, 88% hit rate. L2 cache: 20 ns, 92% hit rate of L1 misses. Memory: 200 ns. Calculate EMAT and the percentage of time spent in each level.
EMAT Calculation:
From L2: 20 + 0.08 × 200 = 20 + 16 = 36 ns
From L1: 2 + 0.12 × 36 = 2 + 4.32 = 6.32 ns
EMAT = 6.32 ns
Time Distribution:
L1 time: 2 ns (always accessed)
L2 time: 0.12 × 20 = 2.4 ns (12% of accesses)
Memory time: 0.12 × 0.08 × 200 = 1.92 ns (0.96% of accesses)
Percentage:
- L1: 2/6.32 = 31.6%
- L2: 2.4/6.32 = 38.0%
- Memory: 1.92/6.32 = 30.4%
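The per-level contributions can be checked in code (values from this exercise):

```python
t_l1, m1 = 2, 0.12       # L1: 2 ns, 12% miss rate
t_l2, m2 = 20, 0.08      # L2: 20 ns, 8% miss rate (of L1 misses)
t_mem = 200

contrib = {
    'L1': t_l1,                 # paid on every access
    'L2': m1 * t_l2,            # paid only on L1 misses
    'Memory': m1 * m2 * t_mem,  # paid only when both levels miss
}
emat = sum(contrib.values())
for level, t in contrib.items():
    print(f"{level}: {t:.2f} ns ({t / emat:.1%})")
```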
TLB: 1 ns, 96% hit. Cache: 4 ns, 90% hit. Memory: 100 ns. 2-level page table. (a) Calculate EMAT. (b) What TLB hit rate is needed to make TLB overhead < 5%?
(a) EMAT Calculation:
Cache EMAT = 4 + 0.1 × 100 = 14 ns
Page walk = 2 × 14 = 28 ns
EMAT = 1 + 14 + 0.04 × 28 = 1 + 14 + 1.12 = 16.12 ns
(b) TLB Overhead Target:
Without TLB miss: 1 + 14 = 15 ns
5% overhead: 15 × 1.05 = 15.75 ns
Max TLB miss contribution: 15.75 - 15 = 0.75 ns
(1 - p_hit) × 28 ≤ 0.75
(1 - p_hit) ≤ 0.75/28 = 0.027
p_hit ≥ 97.3%
Answer: TLB hit rate must be at least 97.3% for overhead < 5%.
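Part (b) is a one-line rearrangement of the overhead inequality; as a sketch:

```python
t_tlb, cache_emat, t_walk = 1, 14, 28  # from part (a)
budget = 0.05 * (t_tlb + cache_emat)   # 5% of the no-TLB-miss time
min_hit_rate = 1 - budget / t_walk     # solve (1 - p) * t_walk <= budget
print(f"{min_hit_rate:.1%}")  # 97.3%
```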
A CPU at 2 GHz has CPI_base = 1.2 and 40% memory instructions. L1: 4 cycles, 92% hit. Memory: 200 cycles. Calculate actual CPI and IPC.
Memory Stall Calculation:
Miss penalty = 200 - 4 = 196 cycles
Miss rate = 8%
Avg memory stall per access = 0.08 × 196 = 15.68 cycles
Stalls per instruction = 0.40 × 15.68 = 6.27 cycles
Actual CPI:
CPI = CPI_base + memory_stalls
= 1.2 + 6.27
= 7.47 cycles
IPC:
IPC = 1/CPI = 1/7.47 = 0.134
Interpretation: Memory stalls cause CPI to increase 6.2× from ideal. Only 0.134 instructions complete per cycle vs. 0.83 ideal.
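The CPI arithmetic, sketched in code with this exercise's values:

```python
cpi_base, mem_instr_frac = 1.2, 0.40
l1_cycles, mem_cycles, miss_rate = 4, 200, 0.08

miss_penalty = mem_cycles - l1_cycles               # 196 cycles
stalls = mem_instr_frac * miss_rate * miss_penalty  # stall cycles per instruction
cpi = cpi_base + stalls
print(round(cpi, 2), round(1 / cpi, 3))  # 7.47 0.134
```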
Essential Formulas:
EMAT_cache = T_L1 + (1-h₁) × T_L2 + (1-h₁)(1-h₂) × T_L3 + ... + T_mem
EMAT_with_TLB = T_TLB + EMAT_cache + (1-p_tlb) × T_walk
T_walk = Page_Table_Levels × EMAT_cache
CPI = CPI_base + Memory_Fraction × Stalls_per_Memory_Access
Module Complete!
You have now mastered the five core numerical problem types in operating systems.
These skills form the quantitative foundation for OS performance analysis and are essential for technical interviews.
Congratulations! You have completed the Numerical Problems module. You now have comprehensive knowledge of the quantitative aspects of operating systems, from process scheduling metrics to complete memory system analysis. Practice these calculations regularly to build speed and accuracy.