Virtual memory is one of the most elegant abstractions in computer science. It provides each process with the illusion of having its own private, contiguous address space, enables memory protection, and allows execution of programs larger than physical memory. But this elegant abstraction comes with a devastating performance problem: every single memory access now requires two memory accesses—one to translate the virtual address via the page table, and another to access the actual data.
Consider the implications. If memory access takes 100 nanoseconds, and every program memory reference now requires two accesses, you've just doubled your effective memory latency. For a CPU executing billions of instructions per second, many of which touch memory, this overhead would make virtual memory completely impractical. Modern systems would slow to a crawl.
This is where the Translation Lookaside Buffer (TLB) enters—a specialized hardware cache that transforms virtual memory from a theoretical curiosity into a practical reality. The TLB is not merely an optimization; it is the critical enabling technology that makes paging viable in high-performance systems.
By the end of this page, you will understand why the TLB exists, the fundamental problem it solves, how it eliminates the double-access penalty of paging, and why it is considered one of the most important caches in all of computer architecture—arguably more critical than the data cache itself.
To truly appreciate the TLB, we must first deeply understand the problem it solves. Let's trace what happens when a CPU executes a simple instruction like MOV EAX, [0x12345678] (load the value at virtual address 0x12345678 into register EAX) in a paging-enabled system without a TLB.
Step-by-step memory access sequence (no TLB, single-level page table):
1. The CPU issues the virtual address 0x12345678.
2. The MMU splits it into a virtual page number (the upper bits) and a page offset (the low 12 bits for 4KB pages).
3. The MMU reads the page table entry for that virtual page number from the page table in main memory (memory access #1).
4. The physical frame number from that entry is combined with the page offset to form the physical address.
5. The CPU finally reads the data at that physical address into EAX (memory access #2).
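To make the sequence concrete, here is a minimal runnable sketch of single-level translation in plain C. The toy page table, its size, and the load_byte helper are illustrative assumptions rather than a description of any real MMU; the point is simply that one load costs two reads of memory.

```c
#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT 12                     /* 4KB pages                        */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NUM_PAGES  16                     /* tiny toy address space           */

/* Toy single-level page table: one entry per virtual page, holding a
 * physical frame number. A real 32-bit table would have ~1M entries.        */
static uint64_t page_table[NUM_PAGES];
static uint8_t  physical_memory[NUM_PAGES * PAGE_SIZE];

static uint8_t load_byte(uint64_t vaddr)
{
    uint64_t vpn    = vaddr >> PAGE_SHIFT;          /* virtual page number    */
    uint64_t offset = vaddr & (PAGE_SIZE - 1);      /* offset within the page */

    uint64_t pfn   = page_table[vpn];               /* memory access #1:
                                                       fetch the translation  */
    uint64_t paddr = (pfn << PAGE_SHIFT) | offset;  /* form physical address  */

    return physical_memory[paddr];                  /* memory access #2:
                                                       fetch the actual data  */
}

int main(void)
{
    page_table[3] = 7;                              /* map virtual page 3 to
                                                       physical frame 7       */
    physical_memory[7 * PAGE_SIZE + 0x42] = 0xAB;   /* plant a value          */

    printf("0x%02X\n", load_byte((3u << PAGE_SHIFT) | 0x42));  /* prints 0xAB */
    return 0;
}
```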
The arithmetic is damning. If main memory access latency is 100ns, every memory operation now costs 200ns. But it gets worse—much worse—when we consider multi-level page tables.
The multi-level nightmare:
Modern 64-bit systems don't use single-level page tables because the page table itself would be enormous. A 48-bit virtual address space with 4KB pages would require 2^36 page table entries—at 8 bytes per entry, that's 512GB just for the page table! Instead, systems use hierarchical page tables with 4 or 5 levels.
| Page Table Levels | Memory Accesses for Translation | Total for One Data Access | Slowdown Factor |
|---|---|---|---|
| 1-level (32-bit simple) | 1 | 2 | 2× |
| 2-level (32-bit x86) | 2 | 3 | 3× |
| 3-level (older 64-bit) | 3 | 4 | 4× |
| 4-level (x86-64 current) | 4 | 5 | 5× |
| 5-level (x86-64 LA57) | 5 | 6 | 6× |
In a 4-level page table system (standard x86-64), every memory access would require 5 memory accesses without the TLB. At 100ns per access, a single memory reference takes 500ns. A modern CPU can execute 5-10 instructions per nanosecond. Without a TLB, memory-bound programs would run roughly 5× slower—and that assumes the page table entries themselves never miss in the CPU caches!
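To see where those 5 accesses come from, here is a small runnable sketch of a 4-level walk. The 9-bits-per-level split mirrors x86-64's layout, but the entry format (a bare pointer or frame number, with no present or permission bits) and the toy setup in main are simplifications assumed for illustration.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define LEVELS      4
#define IDX_BITS    9                  /* 512 entries per level (x86-64 style) */
#define ENTRIES    (1u << IDX_BITS)
#define PAGE_SHIFT 12                  /* 4KB pages                            */

/* Simplified model: an entry is just the next-level table (or, at the last
 * level, a physical frame number). Real PTEs also carry present, permission,
 * accessed, and dirty bits.                                                   */
typedef uint64_t pte_t;

static pte_t *new_table(void) { return calloc(ENTRIES, sizeof(pte_t)); }

/* Walk the 4-level table. Each dereference models one dependent memory
 * access: accesses #1..#4, before the data access (#5) even starts.          */
static uint64_t translate(pte_t *root, uint64_t vaddr)
{
    pte_t *table = root;
    for (int level = LEVELS - 1; level >= 1; level--) {
        uint64_t idx = (vaddr >> (PAGE_SHIFT + level * IDX_BITS))
                       & (ENTRIES - 1);
        table = (pte_t *)table[idx];                          /* accesses #1..#3 */
    }
    uint64_t pfn = table[(vaddr >> PAGE_SHIFT) & (ENTRIES - 1)];   /* access #4 */
    return (pfn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
}

int main(void)
{
    /* Build a toy mapping for one virtual address -> physical frame 42.      */
    uint64_t va = 0x7f0012345678ULL;
    pte_t *l4 = new_table(), *l3 = new_table(), *l2 = new_table(),
          *l1 = new_table();
    l4[(va >> 39) & 511] = (pte_t)l3;
    l3[(va >> 30) & 511] = (pte_t)l2;
    l2[(va >> 21) & 511] = (pte_t)l1;
    l1[(va >> 12) & 511] = 42;

    printf("paddr = 0x%llx\n",
           (unsigned long long)translate(l4, va));   /* 42<<12 | page offset */
    return 0;
}
```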
The Translation Lookaside Buffer is a specialized hardware cache that stores recent virtual-to-physical address translations. Instead of walking the page table for every memory access, the system first checks the TLB. If the translation is found (a TLB hit), no page table access is needed—the physical address is immediately available.
The key insight that makes TLBs work: locality of reference.
Programs don't access memory randomly. They exhibit strong temporal and spatial locality:
- Temporal locality: an address accessed now is likely to be accessed again soon (loop counters, stack frames, hot code paths).
- Spatial locality: addresses near a recently accessed one are likely to be accessed soon (sequential array traversal, straight-line instruction fetch).
Because of locality, a small TLB (typically 64-1024 entries) can capture the working set of most programs. A single TLB entry covers an entire page (4KB or more), so even 64 entries can cover 256KB of active memory—often sufficient for tight loops.
Quantifying the TLB benefit:
Typical TLB hit rates in well-behaved programs exceed 99%. Let's calculate the effective memory access time:
Effective Access Time = (TLB Hit Rate × Hit Time) + (TLB Miss Rate × Miss Time)
Assume:
- Main memory access time: 100ns
- TLB lookup time: 1ns
- TLB hit time: 1ns lookup + 100ns data access = 101ns
- TLB miss time: ~500ns (a 4-level page walk at 4 × 100ns plus the 100ns data access; the 1ns lookup is negligible)
With 99% hit rate:
EAT = (0.99 × 101) + (0.01 × 500) = 99.99 + 5 = ~105ns
Without TLB: 500ns
With TLB (99% hit rate): 105ns
Speedup: 4.76×
With 99.9% hit rate:
EAT = (0.999 × 101) + (0.001 × 500) = 100.9 + 0.5 = ~101.4ns
Near-perfect elimination of translation overhead!
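A quick sanity check of the arithmetic, in C, using the latencies assumed above (101ns hit, 500ns miss); the numbers are modeling assumptions, not measurements.

```c
#include <stdio.h>

/* Effective access time for a given TLB hit rate, using the latencies
 * assumed in the text: 101ns on a hit, 500ns on a miss (4-level walk). */
static double effective_access_time(double hit_rate,
                                    double hit_ns, double miss_ns)
{
    return hit_rate * hit_ns + (1.0 - hit_rate) * miss_ns;
}

int main(void)
{
    printf("99%%   hit rate: %.1f ns\n",
           effective_access_time(0.99,  101.0, 500.0));   /* ~105.0 ns */
    printf("99.9%% hit rate: %.1f ns\n",
           effective_access_time(0.999, 101.0, 500.0));   /* ~101.4 ns */
    return 0;
}
```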
The TLB is not just any cache—it's a highly specialized, carefully designed piece of hardware optimized for the unique requirements of address translation. Understanding its architecture reveals why it's so effective.
What a TLB entry contains:
| Field | Size (typical) | Purpose |
|---|---|---|
| Virtual Page Number (VPN) | 20-52 bits | The virtual address portion used for matching |
| Physical Frame Number (PFN) | 20-52 bits | The translated physical frame number |
| Valid bit | 1 bit | Entry contains valid translation |
| Protection bits | 2-3 bits | Read/Write/Execute permissions |
| Reference bit | 1 bit | Page was accessed (for replacement) |
| Dirty bit | 1 bit | Page was modified |
| Global bit | 1 bit | Entry shared across address spaces (kernel) |
| ASID | 8-16 bits | Address Space ID (avoids flush on context switch) |
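As a rough mental model, the fields above could be packed into a C struct like the hypothetical one below; the bit widths are one plausible choice from the ranges in the table, not any particular processor's layout.

```c
#include <stdint.h>

/* One TLB entry, sized for a 48-bit virtual / 52-bit physical address space.
 * Widths are illustrative; real hardware packs these fields as it pleases.   */
struct tlb_entry {
    uint64_t vpn        : 36;  /* virtual page number (48-bit VA, 4KB pages)  */
    uint64_t pfn        : 40;  /* physical frame number (52-bit PA)           */
    uint64_t asid       : 12;  /* address-space ID, avoids flush on switch    */
    uint64_t valid      : 1;   /* entry holds a usable translation            */
    uint64_t read       : 1;   /* protection bits                             */
    uint64_t write      : 1;
    uint64_t execute    : 1;
    uint64_t referenced : 1;   /* page was accessed (replacement hint)        */
    uint64_t dirty      : 1;   /* page was modified                           */
    uint64_t global     : 1;   /* shared across address spaces (kernel pages) */
};
```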
Key architectural properties of TLBs:
1. Fully Associative or Set-Associative Design
Unlike CPU data caches that are typically 4-8 way set-associative, many TLBs are fully associative—any entry can hold any translation. This is feasible because TLBs are small (64-1024 entries), and full associativity minimizes conflict misses. Every slot is equally eligible for any page-to-frame mapping.
2. Content-Addressable Memory (CAM)
TLBs are implemented using content-addressable memory, also called associative memory. Unlike regular memory where you provide an address and get data, in CAM you provide data (the virtual page number) and get back whether it exists and where. All entries are searched in parallel in a single cycle.
3. Parallel Lookup
The TLB is queried in parallel with other pipeline stages. Modern CPUs speculatively begin the TLB lookup as soon as the virtual address is generated, even before confirming the memory operation will proceed. This hides TLB latency entirely.
4. Split TLB Design
Most processors have separate TLBs for instructions (ITLB) and data (DTLB). This allows simultaneous lookup for the next instruction fetch and the current data access. Each is smaller but the combined capacity serves both needs.
The TLB is one of the most transistor-dense structures on a CPU die. A fully-associative TLB with 64 entries requires comparing the virtual page number against all 64 entries simultaneously—meaning 64 parallel comparators. This massive hardware investment underscores how critical fast translation is to overall system performance.
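A software model of the associative search described in point 2: the loop below checks entries one by one, whereas the hardware compares the VPN against all 64 entries at once with those parallel comparators. The tlb array, its size, and the ASID/global handling are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 64

/* Minimal entry for this model (see the fuller struct sketched earlier). */
struct tlb_entry {
    uint64_t vpn;
    uint64_t pfn;
    uint16_t asid;
    bool     valid;
    bool     global;
};

static struct tlb_entry tlb[TLB_ENTRIES];

/* Look up a virtual page number for the current address space.
 * Real hardware compares the VPN against all 64 entries in a single cycle;
 * this sequential loop only models the result of that associative search.  */
static bool tlb_lookup(uint64_t vpn, uint16_t asid, uint64_t *pfn_out)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid &&
            tlb[i].vpn == vpn &&
            (tlb[i].global || tlb[i].asid == asid)) {
            *pfn_out = tlb[i].pfn;      /* TLB hit: translation available   */
            return true;
        }
    }
    return false;                       /* TLB miss: fall back to a walk    */
}

int main(void)
{
    tlb[0] = (struct tlb_entry){ .vpn = 0x12345, .pfn = 0x6789A,
                                 .asid = 1, .valid = true, .global = false };
    uint64_t pfn;
    return tlb_lookup(0x12345, 1, &pfn) && pfn == 0x6789A ? 0 : 1;
}
```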
To fully appreciate the TLB's role, we must understand where it sits in the memory hierarchy and how it interacts with other caches.
The critical question: Virtual or Physical Cache?
CPU data caches can be addressed using either virtual addresses (virtually-indexed, virtually-tagged—VIVT) or physical addresses (physically-indexed, physically-tagged—PIPT). This decision profoundly affects the TLB's role:
Physically-indexed, physically-tagged caches (PIPT, common in L2/L3):
- The physical address must be known before the lookup can begin, so translation sits squarely on the critical path: consult the TLB first, then index the cache.

Virtually-indexed, physically-tagged (VIPT, common in L1):
- The cache is indexed with virtual address bits while the TLB translates in parallel; the physical tag supplied by the TLB is compared only at the end, keeping translation off the critical path.
Modern systems use VIPT L1 caches specifically to hide TLB latency.
Here's the elegant trick. For a 32KB L1 cache with 64-byte lines and 8-way associativity:
- 32KB / 64B = 512 lines, and 512 lines / 8 ways = 64 sets
- 6 bits of line offset + 6 bits of set index = the low 12 bits of the address

With 4KB pages:
- The page offset is exactly those low 12 bits, and those bits are identical in the virtual and physical address.
Since the cache index bits are within the page offset, the cache can be indexed using the virtual address while the TLB lookup proceeds in parallel. By the time the cache line is fetched, the TLB has provided the physical tag for comparison. This is called VIPT with virtual indexing inside the page offset—a design that gives the speed of virtual caching with the correctness of physical tags.
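A sketch of the bit arithmetic, with the cache geometry above hard-coded as assumptions; because the index bits fall entirely inside the page offset, the virtual and physical addresses of the same location always produce the same set index.

```c
#include <assert.h>
#include <stdint.h>

#define LINE_BITS   6    /* 64-byte lines                                  */
#define SET_BITS    6    /* 32KB / 64B lines / 8 ways = 64 sets            */
#define PAGE_SHIFT 12    /* 4KB pages: offset is the low 12 bits           */

/* Set index for a VIPT L1 cache with the geometry above. */
static uint64_t l1_set_index(uint64_t addr)
{
    return (addr >> LINE_BITS) & ((1u << SET_BITS) - 1);
}

int main(void)
{
    uint64_t vaddr = 0x12345678;      /* example virtual address            */
    uint64_t paddr = 0xABCDE678;      /* same low 12 bits, different page   */

    /* LINE_BITS + SET_BITS = PAGE_SHIFT, so the index bits lie entirely
     * below the page boundary and translation cannot change them.         */
    assert(l1_set_index(vaddr) == l1_set_index(paddr));
    return 0;
}
```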
The TLB and L1 cache are designed together as a unit. Cache geometry (size, associativity, line size) is chosen specifically to allow virtual indexing with physical tagging, minimizing the impact of translation on the critical path. This co-design is why you can't change cache parameters without considering TLB implications.
A provocative but defensible claim: the TLB is more important than the L1 data cache. Here's why:
1. TLB miss penalty is multiplicative, not additive
When you miss in the L1 data cache, you go to L2 (~10 cycles), then L3 (~40 cycles), then memory (~200 cycles). The miss penalty is the time to fetch one piece of data.
When you miss in the TLB, you perform a page table walk, which requires multiple sequential memory accesses. Each of those accesses can itself miss in the cache hierarchy! A TLB miss can trigger 4-16 memory accesses in the worst case.
2. TLB coverage affects cache effectiveness
The TLB determines which pages are quickly accessible. If your working set exceeds TLB coverage, you're not just paying TLB miss penalties—you're also likely thrashing the data cache, since the pages you're accessing keep changing.
3. No spatial prefetching helps TLB
Data caches benefit from spatial prefetching—accessing A likely means you'll access A+64 soon. But page translations don't work this way. Pages are scattered in physical memory. Knowing the translation for page N tells you nothing about page N+1's translation.
4. Software can't easily mitigate TLB misses
Programmers can restructure data for cache efficiency (blocking, tiling, cache-oblivious algorithms). But improving TLB behavior is much harder—you're fighting the OS memory allocator and page table structure. About the only knob is using larger pages.
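On Linux, that knob is reachable from user space; here is a minimal sketch, assuming a system with reserved hugetlbfs pages (MAP_HUGETLB) or transparent huge pages enabled (MADV_HUGEPAGE). Error handling is pared down to the essentials.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/mman.h>

#define LEN (64UL * 1024 * 1024)   /* 64MB working set */

int main(void)
{
    /* Explicit huge pages: requires hugetlbfs pages reserved by the admin
     * (e.g. via /proc/sys/vm/nr_hugepages); fails with ENOMEM otherwise.   */
    void *buf = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);

    if (buf == MAP_FAILED) {
        /* Fall back to normal pages and merely hint that the kernel may
         * back this range with transparent huge pages.                     */
        buf = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED)
            return EXIT_FAILURE;
        madvise(buf, LEN, MADV_HUGEPAGE);
    }

    /* ... use buf: one 2MB huge page needs a single TLB entry where 4KB
     * pages would need 512 ...                                             */
    munmap(buf, LEN);
    return EXIT_SUCCESS;
}
```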
| Characteristic | L1 Data Cache | TLB |
|---|---|---|
| Typical capacity | 512-1024 lines of 64B each | 64-1024 entries |
| Coverage per entry | 64 bytes | 4KB (or 2MB/1GB) |
| Total coverage | 32KB-64KB | 256KB-4MB (4KB pages) |
| Miss penalty | ~10-200 cycles | ~50-500 cycles (page walk) |
| Can prefetch? | Yes (hardware + software) | Limited (huge pages only) |
| Backed by a larger level? | Yes (L2/L3 back L1) | Partially (L2 TLB); a full miss still requires a page walk |
| Miss triggers further misses? | No | Yes (each page walk level can miss) |
TLB misses are often invisible in standard profiling. They show up as high memory latency or cache misses (since page walks access memory). Many "memory-bound" performance issues are actually TLB-bound. Tools like perf's dTLB-load-misses counter are essential for diagnosis.
Understanding the TLB's history illuminates both its importance and design evolution.
1960s: The Birth of Virtual Memory Without TLBs
When virtual memory was invented (Manchester Atlas, 1962), computers were slow enough that the double-access penalty was tolerable. Memory access was already many cycles, and adding another didn't catastrophically change performance. Page tables were small (16-bit addresses) and lived in fast core memory.
1970s: The TLB Emerges
As processors sped up and address spaces grew, the translation overhead became unacceptable. The IBM System/370 (1970) introduced what it called an "address translation buffer"—the first true TLB. With 8-128 entries, it cached recent translations and achieved hit rates above 95%.
1980s: TLBs Become Essential
The VAX-11/780 (1977) and later processors standardized TLBs as a required component. The 32-bit address space era made page tables larger, and increased CPU speeds made translation latency more painful. A single-level page table for a 32-bit address space with 4KB pages has 1 million entries!
1990s-2000s: Multi-Level TLBs
As processors gained more transistors and memory latency (in cycles) increased, TLB hierarchies emerged. The L1 TLB became smaller and faster (4-64 entries, 1 cycle), backed by a larger L2 TLB (512-1024 entries, ~10 cycles). This mirrors the data cache hierarchy.
2010s-Present: Hardware Page Walkers
Modern CPUs include dedicated page walk engines that traverse page tables in hardware without interrupting the CPU. They cache intermediate page table entries (paging structure caches) and can walk pages speculatively. AMD and Intel implement aggressive page walk caches that can reduce 4-level walks to 1-2 accesses.
Every processor generation dedicates more silicon to address translation: larger TLBs, more TLB levels, faster page walkers, bigger page walk caches. This trend reflects virtual memory's centrality to modern computing—translation must be fast, and there's no substitute for hardware optimization.
Let's examine how leading processors implement TLBs, revealing the engineering behind world-class performance.
| Processor | L1 ITLB | L1 DTLB | L2 TLB | Notable Features |
|---|---|---|---|---|
| Intel Alder Lake (2021) | 128 entries (4KB/2MB) | 64 entries (4KB), 32 entries (2MB) | 2048 entries shared | Dedicated 2MB/1GB TLBs |
| AMD Zen 4 (2022) | 64 entries (L1) | 72 entries (4KB/2MB) | 3072 entries L2 | Per-core L2 TLB, fast page walker |
| Apple M2 (2022) | 192 entries | 160 entries | ~2000 shared | Aggressive page walk cache |
| ARM Cortex-X3 | 48 entries fully assoc | 48 entries fully assoc | 2048 unified | ASID support, big/little pages |
Key observations from modern TLB designs:
1. Separate TLBs for Different Page Sizes
Modern CPUs have dedicated TLBs for 4KB, 2MB, and 1GB pages. Large pages provide enormous TLB coverage (a single 2MB entry covers what 512 4KB entries would), but require contiguous physical memory and OS support.
2. PCID/ASID: Avoiding TLB Flushes
Traditionally, context switching between processes required flushing the TLB entirely—every entry belonged to the old process's address space. Modern CPUs tag TLB entries with an Address Space Identifier (ASID or PCID on x86). Different processes can coexist in the TLB simultaneously, and context switches don't require flushes.
3. Global Pages
Kernel memory is mapped identically in all processes. TLB entries for kernel pages are marked "global" and survive context switches even without ASID. This is critical since kernel code runs frequently.
4. Speculative Page Walks
Modern out-of-order CPUs begin page table walks speculatively when they predict a TLB miss may occur. By the time the memory access is confirmed, the translation might already be complete.
5. Page Walk Caches
Beyond TLBs, processors cache intermediate page table entries. When walking a 4-level page table, the upper 3 levels (PML4, PDPT, PD) change rarely for a given process. Caching these intermediate entries reduces most page walks from 4 memory accesses to 1.
We've established the fundamental case for the Translation Lookaside Buffer. Let's consolidate the key insights:
- Paging without a TLB turns every memory reference into 2-6 memory accesses, depending on page table depth.
- Locality of reference lets a small cache of translations (64-1024 entries) achieve hit rates above 99%, shrinking effective access time from ~500ns to ~105ns in our example.
- The TLB is a specialized, content-addressable cache, typically split into ITLB and DTLB and backed by a larger L2 TLB.
- L1 caches are co-designed with the TLB (VIPT) so that translation stays off the critical path.
- A TLB miss is costlier than a data cache miss because the page walk itself makes multiple memory accesses, each of which can miss.
- Modern hardware fights translation overhead with huge pages, ASIDs/PCIDs, global entries, speculative walks, and page walk caches.
What's next:
Having established why the TLB exists, we'll next examine how it works—specifically, the associative memory technology that enables parallel lookup of all entries in a single cycle. This is the hardware magic that makes the TLB fast enough to be queried on every memory access.
You now understand the fundamental purpose of the TLB: it transforms virtual memory from an elegant but impractical abstraction into a high-performance reality. Without the TLB, modern computing as we know it—with protected, isolated address spaces for every process—would not exist.