Virtual memory's elegance comes with a cost: every memory access requires an address translation. Without optimization, this translation would require multiple memory accesses (one per page table level), making virtual memory prohibitively slow. The Translation Lookaside Buffer (TLB) is the hardware solution that makes virtual memory practical.
The TLB is a small, fast cache of recent address translations. When a virtual address translation is found in the TLB (a TLB hit), translation is nearly instantaneous. When it's not found (a TLB miss), the system must perform an expensive page table walk.
Understanding TLB hit ratio calculations is essential because the hit ratio, more than any other single parameter, determines whether virtual memory costs a few percent of performance or several multiples of it.
By the end of this page, you will be able to calculate Effective Access Time (EAT) for any TLB configuration, analyze how TLB parameters affect performance, understand the relationship between working set size and TLB hit ratio, and solve complex multi-level TLB problems.
Before calculating TLB performance, we must understand TLB architecture and operation.
TLB Structure:
A TLB entry contains a virtual page number (the tag), the corresponding physical frame number, permission bits (read/write/execute), a valid bit, and, on many architectures, an address-space identifier (ASID).
TLB Lookup Process:
On every memory access, the hardware compares the virtual page number against the TLB tags (in parallel for fully associative designs). A match (hit) yields the physical frame number immediately; a mismatch (miss) triggers the miss-handling mechanism described below.
| System | TLB Size | Associativity | TLB Access Time | Memory Time |
|---|---|---|---|---|
| x86-64 (L1 DTLB) | 64-96 entries | 4-way/fully | 1-3 cycles | 100+ cycles |
| x86-64 (L2 TLB) | 512-2048 entries | 4-8 way | 6-8 cycles | 100+ cycles |
| ARM Cortex-A | 32-128 entries | Fully assoc | 1 cycle | 50-100 cycles |
| RISC-V | 16-64 entries | Fully assoc | 1 cycle | Variable |
TLB Miss Handling:
When a TLB miss occurs, one of two mechanisms handles it:
Hardware-Managed TLB (x86, ARM): a hardware page table walker traverses the page tables automatically and refills the TLB with no OS involvement; misses cost tens to hundreds of cycles.
Software-Managed TLB (MIPS, SPARC, some RISC-V): the miss raises an exception, and an OS trap handler walks the page tables and installs the translation; this is more flexible but slower.
Page Fault (page not in memory): if the walk finds the page not present, a page fault is raised and the OS must bring the page into memory (typically from disk) before the translation can be installed.
The TLB caches complete virtual-to-physical translations. Some CPUs also cache intermediate page table entries (like PDEs, PDPEs) separately. This means a TLB miss might still be fast if upper-level page table entries are cached in the data cache or special paging caches.
The fundamental TLB performance metric is Effective Access Time (EAT), which represents the average time to access memory considering both TLB hits and misses.
Basic EAT Formula:
EAT = (Hit Rate × Hit Time) + (Miss Rate × Miss Time)
Where:
Hit Rate = TLB hit probability (α)
Miss Rate = 1 - Hit Rate = (1 - α)
Hit Time = TLB access time + Memory access time
Miss Time = TLB access time + Page table walk time + Memory access time
Simplified Formula (assuming TLB accessed in parallel with memory or very fast):
EAT = α × m + (1 - α) × (m + p)
= α × m + (1 - α) × m + (1 - α) × p
= m + (1 - α) × p
Where:
m = Memory access time
p = Page table walk penalty (additional memory accesses × m)
α = TLB hit ratio
Example 1: Single-Level Page Table
Given: memory access time = 100 ns, TLB lookup time = 10 ns, TLB hit ratio = 98%, single-level page table (one memory access per walk).
TLB Hit Path:
Time = TLB lookup + Memory access
= 10 + 100 = 110 ns
TLB Miss Path:
Time = TLB lookup + Page table access + Memory access
= 10 + 100 + 100 = 210 ns
EAT Calculation:
EAT = (0.98 × 110) + (0.02 × 210)
= 107.8 + 4.2
= 112 ns
Performance Impact:
Without TLB (always walk): 200 ns
With TLB: 112 ns
Speedup: 200/112 = 1.79× (44% reduction in access time)
Example 2: Four-Level Page Table (x86-64)
Given: memory access time = 100 ns, TLB lookup time = 5 ns, TLB hit ratio = 99%, four-level page table (four memory accesses per walk).
TLB Hit Path:
Time = 5 + 100 = 105 ns
TLB Miss Path:
Time = 5 + (4 × 100) + 100 = 505 ns
EAT Calculation:
EAT = (0.99 × 105) + (0.01 × 505)
= 103.95 + 5.05
= 109 ns
Key Insight: Even with the four-level page table penalty (a 505 ns miss path), the high TLB hit ratio keeps EAT reasonable. If the hit ratio dropped to 90%:
EAT = (0.90 × 105) + (0.10 × 505)
= 94.5 + 50.5
= 145 ns (33% slower than with a 99% hit ratio)
```python
def calculate_eat(
    memory_time_ns: float,
    tlb_time_ns: float,
    tlb_hit_ratio: float,
    page_table_levels: int
) -> dict:
    """
    Calculate Effective Access Time for TLB-based memory system.

    Args:
        memory_time_ns: Time for single memory access
        tlb_time_ns: TLB lookup time
        tlb_hit_ratio: Probability of TLB hit (0 to 1)
        page_table_levels: Number of page table levels to walk on miss

    Returns:
        Dictionary with timing analysis
    """
    # Calculate hit and miss times
    hit_time = tlb_time_ns + memory_time_ns
    miss_time = tlb_time_ns + (page_table_levels * memory_time_ns) + memory_time_ns

    # Calculate EAT
    eat = (tlb_hit_ratio * hit_time) + ((1 - tlb_hit_ratio) * miss_time)

    # Calculate without TLB for comparison
    no_tlb_time = (page_table_levels + 1) * memory_time_ns

    # Calculate slowdown factors
    overhead = eat - memory_time_ns
    overhead_percent = (overhead / memory_time_ns) * 100
    speedup = no_tlb_time / eat

    return {
        'hit_time_ns': hit_time,
        'miss_time_ns': miss_time,
        'eat_ns': round(eat, 2),
        'no_tlb_time_ns': no_tlb_time,
        'speedup_from_tlb': round(speedup, 2),
        'overhead_ns': round(overhead, 2),
        'overhead_percent': round(overhead_percent, 1)
    }

# Example: 4-level page table with varying hit ratios
print("TLB Hit Ratio Analysis (4-level, m=100ns):")
print("-" * 60)
for hit_ratio in [0.90, 0.95, 0.98, 0.99, 0.999]:
    result = calculate_eat(
        memory_time_ns=100,
        tlb_time_ns=5,
        tlb_hit_ratio=hit_ratio,
        page_table_levels=4
    )
    print(f"Hit ratio {hit_ratio*100:5.1f}%: EAT = {result['eat_ns']:6.1f}ns, "
          f"Overhead = {result['overhead_percent']:4.1f}%")
```

Notice how performance degrades rapidly below 99% hit ratio for systems with deep page table hierarchies. This is why TLB design focuses on maximizing hit ratios through size, associativity, and reach (via huge pages).
TLB Reach is the total amount of memory that can be accessed without TLB misses when the TLB is fully utilized.
TLB Reach Formula:
TLB Reach = Number of TLB Entries × Page Size
Example Calculation:
For a 64-entry TLB with 4 KB pages: TLB Reach = 64 × 4 KB = 256 KB.
Huge Pages Impact:
Huge pages dramatically increase TLB reach, which is why they're essential for applications with large working sets (databases, scientific computing, VMs).
| TLB Entries | 4 KB Pages | 2 MB Pages | 1 GB Pages |
|---|---|---|---|
| 64 | 256 KB | 128 MB | 64 GB |
| 128 | 512 KB | 256 MB | 128 GB |
| 512 | 2 MB | 1 GB | 512 GB |
| 2048 | 8 MB | 4 GB | 2 TB |
Working Set and Hit Ratio Relationship:
The TLB hit ratio depends on the relationship between working set size and TLB reach:
Case 1: Working Set < TLB Reach. Hit ratio approaches 100% after warmup; the only misses are compulsory.
Case 2: Working Set ≈ TLB Reach. Hit ratio becomes sensitive to access pattern, associativity, and replacement policy; small growth in the working set causes sharp degradation.
Case 3: Working Set >> TLB Reach. The TLB thrashes; with random access the hit ratio falls toward TLB_Reach / Working_Set.
Estimating Hit Ratio from Working Set:
Approximate hit ratio (random access) = min(1.0, TLB_Reach / Working_Set_Size)
This is a rough lower bound; actual patterns often achieve higher ratios due to locality.
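The sketch below (plain Python with illustrative parameter values) computes reach and the random-access hit-ratio estimate for a 64-entry TLB at two page sizes:

```python
KB, MB = 1024, 1024**2

def tlb_reach(entries: int, page_size: int) -> int:
    """TLB Reach = Number of TLB Entries × Page Size (bytes)."""
    return entries * page_size

def est_hit_ratio_random(reach: int, working_set: int) -> float:
    """Rough lower-bound hit ratio for a random access pattern."""
    return min(1.0, reach / working_set)

working_set = 4 * MB
for page_size in (4 * KB, 2 * MB):
    reach = tlb_reach(64, page_size)
    ratio = est_hit_ratio_random(reach, working_set)
    print(f"page {page_size // KB:>4} KB: reach = {reach / MB:7.2f} MB, "
          f"est. hit ratio (4 MB working set) = {ratio:.1%}")
```

With 4 KB pages the estimate is about 6%, while 2 MB pages push it to 100%; Problem 2 below works through the same numbers by hand.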
Modern CPUs have multiple TLB levels (L1 DTLB, L1 ITLB, L2 STLB). The L1 TLBs are small and fast; the L2 TLB is larger and slower. EAT calculations become more complex with multi-level TLBs, requiring separate hit/miss paths for each level.
Multi-Level TLB EAT Example:
Given: L1 TLB lookup = 2 ns with a 95% hit ratio, L2 TLB lookup = 8 ns with a 99% hit ratio on L1 misses, memory access = 100 ns, four-level page table.
Paths:
L1 TLB Hit (95%):
Time = L1_TLB + Memory = 2 + 100 = 102 ns
L1 Miss, L2 Hit (5% × 99% = 4.95%):
Time = L1_TLB + L2_TLB + Memory = 2 + 8 + 100 = 110 ns
Both Miss (5% × 1% = 0.05%):
Time = L1_TLB + L2_TLB + (4 × Memory) + Memory
= 2 + 8 + 400 + 100 = 510 ns
EAT:
EAT = (0.95 × 102) + (0.0495 × 110) + (0.0005 × 510)
= 96.9 + 5.445 + 0.255
= 102.6 ns
The two-level TLB reduces EAT below the 109 ns achieved by the single-level design in Example 2.
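To make the multi-path arithmetic reusable, here is a minimal sketch (the function name and parameters are mine, not from any library) that generalizes the calculation:

```python
def multilevel_eat(l1_hit: float, l1_time: float,
                   l2_hit: float, l2_time: float,
                   mem_time: float, pt_levels: int) -> float:
    """EAT for a two-level TLB; l2_hit is conditional on an L1 miss."""
    l1_path = l1_time + mem_time                                     # L1 hit
    l2_path = l1_time + l2_time + mem_time                           # L1 miss, L2 hit
    walk_path = l1_time + l2_time + pt_levels * mem_time + mem_time  # both miss
    p1 = l1_hit
    p2 = (1 - l1_hit) * l2_hit
    p3 = (1 - l1_hit) * (1 - l2_hit)
    return p1 * l1_path + p2 * l2_path + p3 * walk_path

# Reproduces the worked example above: 102.6 ns
print(multilevel_eat(0.95, 2, 0.99, 8, mem_time=100, pt_levels=4))
```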
Understanding what causes TLB misses helps predict and optimize hit ratios.
Types of TLB Misses:
1. Compulsory Misses (Cold Misses): the first access to each page must miss, since its translation has never been loaded; unavoidable without prefetching.
2. Capacity Misses: the working set spans more pages than the TLB has entries, so useful translations are evicted before they are reused.
3. Conflict Misses (Set-Associative TLBs): pages that map to the same set evict one another even while other sets have free entries.
4. Coherency Misses (TLB Shootdowns): entries invalidated because another core changed the underlying mapping, forcing a flush of the affected translations.
Miss Rate Estimation Techniques:
Analytical Estimation:
For a program accessing N unique pages with a TLB of size T entries:
Compulsory miss rate = N / Total_Accesses
Capacity miss rate (uniform random access) ≈ max(0, (N - T) / N)
Approximate total miss rate = Compulsory + Capacity (assuming high associativity)
Example: a program makes 1,000,000 accesses touching 10,000 unique pages (~100 accesses per page) with a 128-entry TLB:
Compulsory miss rate = 10,000 / 1,000,000 = 1%
If random access pattern:
Capacity miss rate ≈ (10,000 - 128) / 10,000 = 98.7%
(Effectively, TLB provides minimal benefit)
If sequential with temporal locality (each page's ~100 accesses occur consecutively):
Each page misses once and is then reused, so capacity misses are negligible
Total miss rate ≈ compulsory = 10,000 / 1,000,000 = 1%
The same working set can have vastly different TLB miss rates depending on access pattern. Random access through 10,000 pages thrashes the TLB. Sequential access through the same 10,000 pages achieves near-optimal hit rate. This is why databases and scientific applications carefully optimize memory access patterns.
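You can verify this contrast with a tiny simulator. The sketch below (illustrative, assuming a fully associative TLB with LRU replacement) replays the 10,000-page example under both access patterns:

```python
from collections import OrderedDict
import random

def simulate_tlb(trace, entries: int) -> float:
    """Replay a page-number trace through a fully associative LRU TLB."""
    tlb = OrderedDict()          # page -> None, ordered by recency
    misses = 0
    for page in trace:
        if page in tlb:
            tlb.move_to_end(page)            # refresh LRU position
        else:
            misses += 1
            if len(tlb) >= entries:
                tlb.popitem(last=False)      # evict least recently used
            tlb[page] = None
    return misses / len(trace)

PAGES, TLB_SIZE, PER_PAGE = 10_000, 128, 100
random.seed(0)
sequential = [p for p in range(PAGES) for _ in range(PER_PAGE)]         # locality
scattered = [random.randrange(PAGES) for _ in range(PAGES * PER_PAGE)]  # none

print(f"Sequential miss rate: {simulate_tlb(sequential, TLB_SIZE):.2%}")  # ~1%
print(f"Random miss rate:     {simulate_tlb(scattered, TLB_SIZE):.2%}")   # ~98.7%
```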
Locality and TLB Performance:
Temporal Locality: repeated accesses to the same page within a short window keep its translation resident, so one miss is amortized over many hits.
Spatial Locality: consecutive addresses fall within the same page, so a single TLB entry covers hundreds or thousands of sequential accesses.
Stride Access Pattern:
When accessing memory with stride S:
If S < Page_Size:
Multiple accesses per TLB entry
Effective miss rate = 1 / (Page_Size / S)
If S >= Page_Size:
One access per TLB entry
Miss rate depends entirely on TLB size vs working set
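A short sketch of the stride relationship (hypothetical helper; assumes a long streaming pattern whose footprint exceeds TLB reach):

```python
def stride_miss_rate(stride: int, page_size: int) -> float:
    """Approximate per-access TLB miss rate for a long streaming stride."""
    if stride >= page_size:
        return 1.0                 # every access lands on a new page
    return stride / page_size      # = 1 / (page_size / stride)

for stride in (8, 64, 1024, 4096, 8192):
    print(f"stride {stride:>5} B, 4 KB pages: "
          f"miss rate ≈ {stride_miss_rate(stride, 4096):.3%}")
```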
Example: Matrix Column Access
Accessing a column of a row-major matrix: consecutive accesses are separated by the full row width, so if a row spans at least one page (e.g., a row of 1,024 doubles is 8 KB), every element lands on a different page.
With 4 KB pages and a 128-entry TLB: a column longer than 128 elements cycles through more pages than the TLB can hold, so essentially every access misses.
With 2 MB huge pages: each page holds many complete rows (256 rows of 8 KB each), so 256 consecutive column elements share one TLB entry and the miss rate drops by more than two orders of magnitude.
TLB hit ratio isn't just about TLB size—it's also affected by TLB invalidation events.
Context Switch TLB Behavior:
When switching between processes, the virtual-to-physical mapping changes, so stale translations from the outgoing process must not be used. Two strategies exist:
Without ASID (older systems): the entire TLB is flushed on every context switch, and the incoming process re-faults its working set from scratch.
With ASID (modern systems): each entry is tagged with an address-space identifier, so entries from multiple processes can coexist and survive switches.
ASID Impact Calculation:
Given: Process A's working set covers 128 pages, Process B's covers 64 pages, the TLB holds 256 entries, and the system switches between them roughly every 1,000 accesses.
Without ASID:
Every switch flushes the TLB, so the incoming process pays 128 or 64 compulsory misses to rewarm it
Miss overhead per 1,000 accesses: ~10% (averaging ~96 misses per switch)
With ASID:
Both working sets fit in TLB: 128 + 64 = 192 < 256
After initial warmup: ~0% switch-related misses
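A sketch of the comparison (the helper and parameter names are mine):

```python
def switch_overhead(ws_a: int, ws_b: int, tlb_entries: int,
                    accesses_per_slice: int):
    """Extra TLB misses per access caused by context switches."""
    avg_refill = (ws_a + ws_b) / 2            # compulsory misses per switch
    without_asid = avg_refill / accesses_per_slice
    # With ASID, warm entries survive the switch if both sets fit at once
    with_asid = 0.0 if ws_a + ws_b <= tlb_entries else without_asid
    return without_asid, with_asid

no_asid, asid = switch_overhead(128, 64, 256, 1_000)
print(f"Extra misses/access: {no_asid:.1%} without ASID, {asid:.1%} with ASID")
```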
TLB Shootdowns in Multiprocessor Systems:
When page mappings change (munmap, mprotect, page migration), all CPUs must invalidate the affected TLB entries.
Shootdown Process: the initiating CPU updates the page table, sends inter-processor interrupts (IPIs) to every CPU that may hold the stale entry, each target invalidates the entry locally, and the initiator waits for all acknowledgments before proceeding.
Shootdown Cost:
Shootdown latency = IPI_latency + Handler_time + Acknowledgment_latency
Typically: 5-50 microseconds per shootdown
Impact on Performance: frequent mapping changes serialize cores on shootdown IPIs, and the cost scales with the number of entries to invalidate, as the following example shows.
Example: Memory-Mapped File Updates
Database unmaps and remaps 1 GB file region frequently
With 4 KB pages: 262,144 pages to invalidate
With 2 MB pages: 512 pages to invalidate
Shootdown entries per operation:
4 KB: 262,144 entries (may require batching)
2 MB: 512 entries (single shootdown possible)
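A two-line sketch of the entry arithmetic above:

```python
KB, MB, GB = 1024, 1024**2, 1024**3

def shootdown_entries(region: int, page_size: int) -> int:
    """TLB entries to invalidate when remapping a region."""
    return region // page_size

for page_size, label in ((4 * KB, "4 KB"), (2 * MB, "2 MB")):
    print(f"{label} pages: {shootdown_entries(GB, page_size):>7,} "
          f"entries per 1 GB remap")
```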
Modern x86-64 processors with PCID (similar to ASID) can avoid flushing TLB entries on context switch and can even do lazy TLB shootdowns. The kernel tracks which entries need invalidation and batches the work, significantly improving multicore scalability.
Real systems have additional complexity that affects EAT calculations.
Including Page Faults:
Page faults are rare but extremely expensive:
EAT = (Hit_Rate × Hit_Time) +
(TLB_Miss_Rate × In_Memory × Miss_Time) +
(TLB_Miss_Rate × Not_In_Memory × Page_Fault_Time)
Example with Page Faults:
Given: TLB lookup = 5 ns, memory access = 100 ns, four-level page table, 98% TLB hit ratio; 0.1% of TLB misses are page faults, each taking 10 ms (10,000,000 ns) to service.
Calculation:
TLB Hit: 5 + 100 = 105 ns (98%)
TLB Miss, No Page Fault: 5 + 400 + 100 = 505 ns (2% × 99.9% = 1.998%)
TLB Miss, Page Fault: 5 + 10,000,000 = 10,000,005 ns (2% × 0.1% = 0.002%)
EAT = (0.98 × 105) + (0.01998 × 505) + (0.00002 × 10,000,005)
= 102.9 + 10.09 + 200
= 312.99 ns
Key Insight: Even at a 0.002% occurrence rate, page faults nearly triple the EAT (313 ns versus 113 ns with no faults)! This is why page fault minimization is critical for performance.
Hierarchical Caching Effects:
Modern systems have multiple cache levels. Page table walks often hit data caches:
Given: a four-level walk, 100 ns memory, and typical cache latencies of ~1 ns (L1), ~10 ns (L2), and ~40 ns (L3). The upper page table levels are shared across many translations, so their entries are usually cache-resident.
Page Table Entry Cache Behavior:
Page table entries are just memory—they get cached like any data.
Effective Page Walk Cost:
With cold caches: 4 × 100 = 400 ns
With warm caches: 1 + 1 + 10 + 40 = 52 ns
Typical (mixed): 1 + 1 + 10 + 60 = 72 ns
This significantly reduces the TLB miss penalty in practice.
```python
def advanced_eat(
    tlb_hit_ratio: float,
    tlb_time: float,
    mem_time: float,
    pt_levels: int,
    page_fault_rate: float = 0.0,
    page_fault_time: float = 0.0,
    pt_cache_hit_ratio: float = 0.0,
    cache_time: float = 0.0
) -> dict:
    """
    Calculate EAT with page faults and cache effects.

    Args:
        tlb_hit_ratio: TLB hit probability
        tlb_time: TLB lookup time (ns)
        mem_time: Memory access time (ns)
        pt_levels: Page table levels
        page_fault_rate: P(page fault | TLB miss)
        page_fault_time: Page fault handling time (ns)
        pt_cache_hit_ratio: P(PT entries in cache)
        cache_time: Average cache access time for PT (ns)
    """
    # TLB hit time
    hit_time = tlb_time + mem_time

    # Page table walk time (considering cache)
    if pt_cache_hit_ratio > 0:
        pt_walk_time = pt_levels * (
            pt_cache_hit_ratio * cache_time
            + (1 - pt_cache_hit_ratio) * mem_time
        )
    else:
        pt_walk_time = pt_levels * mem_time

    # TLB miss, page in memory
    miss_in_mem_time = tlb_time + pt_walk_time + mem_time

    # TLB miss, page fault
    miss_page_fault_time = tlb_time + page_fault_time

    # Calculate probabilities
    p_hit = tlb_hit_ratio
    p_miss_in_mem = (1 - tlb_hit_ratio) * (1 - page_fault_rate)
    p_miss_fault = (1 - tlb_hit_ratio) * page_fault_rate

    # EAT
    eat = (p_hit * hit_time
           + p_miss_in_mem * miss_in_mem_time
           + p_miss_fault * miss_page_fault_time)

    return {
        'eat_ns': round(eat, 2),
        'hit_time_ns': hit_time,
        'miss_in_mem_time_ns': round(miss_in_mem_time, 2),
        'miss_fault_time_ns': miss_page_fault_time,
        'p_hit': p_hit,
        'p_miss_in_mem': round(p_miss_in_mem, 6),
        'p_miss_fault': round(p_miss_fault, 6)
    }

# Example with page faults and cache effects
result = advanced_eat(
    tlb_hit_ratio=0.98,
    tlb_time=5,
    mem_time=100,
    pt_levels=4,
    page_fault_rate=0.001,
    page_fault_time=10_000_000,  # 10ms
    pt_cache_hit_ratio=0.75,
    cache_time=20
)

for k, v in result.items():
    print(f"{k}: {v}")
```

Work through these problems to solidify your understanding of TLB performance calculations.
Problem 1: A system has 100 ns memory access time, 10 ns TLB access time, a two-level page table, and a 95% TLB hit ratio. Calculate the EAT. How much faster would it be with a 99% hit ratio?
With 95% hit ratio:
TLB Hit: 10 + 100 = 110 ns (95%)
TLB Miss: 10 + (2 × 100) + 100 = 310 ns (5%)
EAT = 0.95 × 110 + 0.05 × 310
= 104.5 + 15.5
= 120 ns
With 99% hit ratio:
EAT = 0.99 × 110 + 0.01 × 310
= 108.9 + 3.1
= 112 ns
Speedup = 120/112 = 1.07× (7% faster)
Alternatively: EAT = 110 + miss_rate × 200 ns, so cutting the miss rate from 5% to 1% saves 0.04 × 200 = 8 ns per access.
Problem 2: A program has a 4 MB working set. The TLB has 64 entries. (a) With 4 KB pages, what is the TLB reach? (b) What hit ratio can be expected (assume random access)? (c) What page size would achieve a 95%+ hit ratio?
(a) TLB Reach with 4 KB pages:
Reach = 64 × 4 KB = 256 KB
(b) Expected hit ratio:
Working set = 4 MB = 4096 KB
Pages in working set = 4096 / 4 = 1024 pages
TLB can hold: 64 pages
Approximate hit ratio (random) = 64 / 1024 = 6.25%
(Very poor performance due to TLB thrashing)
(c) Page size for 95% hit ratio:
Need: TLB Reach ≥ Working Set × 0.95
64 × Page_Size ≥ 4 MB × 0.95 = 3.8 MB
Page_Size ≥ 3.8 MB / 64 ≈ 60.8 KB
Nearest standard size: 64 KB or 2 MB huge pages
With 2 MB pages:
Reach = 64 × 2 MB = 128 MB >> 4 MB
Expected hit ratio ≈ 100% (after warmup)
Problem 3: A system has: L1 TLB (32 entries, 1 cycle, 90% hit), L2 TLB (256 entries, 4 cycles, 98% hit of L1 misses), memory (200 cycles), 4-level page table. CPU clock: 2 GHz. Calculate the EAT in nanoseconds.
Convert cycles to ns:
1 cycle at 2 GHz = 0.5 ns
L1 TLB: 1 × 0.5 = 0.5 ns
L2 TLB: 4 × 0.5 = 2 ns
Memory: 200 × 0.5 = 100 ns
Calculate paths:
L1 Hit: 0.5 + 100 = 100.5 ns (90%)
L1 Miss, L2 Hit: 0.5 + 2 + 100 = 102.5 ns
(10% × 98% = 9.8%)
Both Miss: 0.5 + 2 + (4 × 100) + 100 = 502.5 ns
(10% × 2% = 0.2%)
EAT:
EAT = 0.90 × 100.5 + 0.098 × 102.5 + 0.002 × 502.5
= 90.45 + 10.045 + 1.005
= 101.5 ns
Problem 4: Continuing from Problem 3, if 0.01% of TLB misses result in page faults taking 5 ms, what is the new EAT?
New path breakdown:
L1 Hit: 100.5 ns (90%)
L1 Miss, L2 Hit, No Fault: 102.5 ns (9.8% × 99.99% = 9.799%)
L1 Miss, L2 Hit, Fault: 5,000,000 ns (9.8% × 0.01% = 0.00098%)
Both Miss, No Fault: 502.5 ns (0.2% × 99.99% = 0.19998%)
Both Miss, Fault: 5,000,000 ns (0.2% × 0.01% = 0.00002%)
EAT:
EAT = 0.90 × 100.5
+ 0.09799 × 102.5
+ 0.0000098 × 5,000,000
+ 0.0019998 × 502.5
+ 0.0000002 × 5,000,000
= 90.45 + 10.04 + 49 + 1.00 + 1
= 151.49 ns
Impact: Page faults at 0.01% rate increased EAT from 101.5 to 151.5 ns—a 49% increase from 0.001% of accesses!
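To check answers like these quickly, a small helper that treats EAT as a probability-weighted sum over paths reproduces Problems 3 and 4 (path probabilities and latencies taken from the solutions above):

```python
def eat(paths):
    """EAT as a probability-weighted sum over (probability, latency_ns) paths."""
    return sum(p * t for p, t in paths)

# Problem 3: two-level TLB, no page faults
p3 = eat([(0.90, 100.5), (0.098, 102.5), (0.002, 502.5)])
print(f"Problem 3 EAT: {p3:.2f} ns")     # ≈ 101.50 ns

# Problem 4: split each miss path by the 0.01% fault rate
p4 = eat([(0.90, 100.5),
          (0.098 * 0.9999, 102.5), (0.098 * 0.0001, 5_000_000),
          (0.002 * 0.9999, 502.5), (0.002 * 0.0001, 5_000_000)])
print(f"Problem 4 EAT: {p4:.2f} ns")     # ≈ 151.50 ns
```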
Essential Formulas:
EAT = (TLB_Hit_Rate × Hit_Time) + (TLB_Miss_Rate × Miss_Time)
TLB Reach = TLB_Entries × Page_Size
Estimated Hit Ratio (random) ≈ min(1.0, TLB_Reach / Working_Set)
Page Walk Time = Page_Table_Levels × Memory_Access_Time
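As a closing sketch, the summary formulas chain together: reach bounds the estimated hit ratio, which feeds the EAT. A minimal composition, assuming random access and the simple miss model above (the function name is mine):

```python
def estimate_eat(tlb_entries: int, page_size: int, working_set: int,
                 tlb_time: float, mem_time: float, pt_levels: int) -> float:
    """Chain the summary formulas: reach -> estimated hit ratio -> EAT."""
    reach = tlb_entries * page_size                   # TLB Reach
    hit_ratio = min(1.0, reach / working_set)         # random-access estimate
    walk = pt_levels * mem_time                       # Page Walk Time
    hit_time = tlb_time + mem_time
    miss_time = tlb_time + walk + mem_time
    return hit_ratio * hit_time + (1 - hit_ratio) * miss_time

# Problem 2's scenario: 64 entries, 4 KB pages, 4 MB working set
print(f"{estimate_eat(64, 4096, 4 * 1024**2, 10, 100, 2):.1f} ns")  # 297.5 ns
```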
What's Next:
We've analyzed TLB performance. Next, we'll examine disk scheduling seek time calculations, applying similar quantitative analysis to storage I/O.
You now have comprehensive knowledge of TLB hit ratio calculations, including basic EAT, multi-level TLBs, TLB reach analysis, and the impact of page faults and caching. Practice with various configurations to build speed and accuracy.