Modern computer systems employ a sophisticated memory hierarchy to bridge the gap between fast CPUs and slow main memory. A memory access may hit or miss at multiple levels—TLB, L1 cache, L2 cache, L3 cache—before reaching DRAM. Understanding how to calculate the effective memory access time (EMAT) across this entire hierarchy is essential for accurate performance analysis and optimization.
This page synthesizes everything we've learned about TLB and cache performance into a unified framework for complete memory system analysis.
By the end of this page, you will be able to calculate EMAT for any memory hierarchy configuration, analyze the impact of each hierarchy level on performance, model the interaction between TLB and cache lookups, and solve complex multi-level memory problems systematically.
A modern memory system has multiple layers, each with different characteristics:
Typical Memory Hierarchy:
┌─────────────────────────────┐
│ CPU Registers │ ← ~0.25 ns, ~1 KB
├─────────────────────────────┤
│ L1 Data Cache │ ← ~1 ns, 32-64 KB
│ L1 Instruction Cache │
├─────────────────────────────┤
│ L2 Cache │ ← ~4-10 ns, 256 KB - 1 MB
├─────────────────────────────┤
│ L3 Cache │ ← ~10-40 ns, 4-64 MB
├─────────────────────────────┤
│ Main Memory (DRAM) │ ← ~50-100 ns, 8-256 GB
├─────────────────────────────┤
│ Storage (SSD/HDD/NVMe) │ ← ~0.1-10 ms, 1-16 TB
└─────────────────────────────┘
Parallel to this is the TLB hierarchy:
┌─────────────────────────────┐
│ L1 DTLB (Data TLB) │ ← ~0.5 ns, 64-96 entries
│ L1 ITLB (Instruction) │
├─────────────────────────────┤
│ L2 STLB (Shared TLB) │ ← ~3-7 ns, 512-2048 entries
├─────────────────────────────┤
│ Page Table Walk │ ← Variable, 4 memory accesses
└─────────────────────────────┘
| Level | Access Time | Size | Hit Rate (typical) |
|---|---|---|---|
| L1 DTLB | 0.5 ns | 64 entries | 95-99% |
| L2 STLB | 3-5 ns | 1024 entries | 99%+ of L1 misses |
| L1 Data Cache | 1 ns (4 cycles) | 32-64 KB | 85-95% |
| L2 Cache | 4-10 ns | 256 KB - 1 MB | 80-95% of L1 misses |
| L3 Cache | 10-40 ns | 4-64 MB | 70-90% of L2 misses |
| DRAM | 50-100 ns | 8-256 GB | ~100% (if not paged out) |
| Page Fault | 5-15 ms | Disk/SSD | 0.001-0.1% |
Address translation (TLB) and data access (cache) are different concerns. Physically-indexed caches check the cache after translation. Virtually-indexed caches can check the cache in parallel with TLB lookup, but have aliasing issues. Most modern L1 caches use VIPT (Virtually-Indexed, Physically-Tagged) for speed.
The fundamental EMAT formula for a single cache level:
EMAT = (Hit Rate × Hit Time) + (Miss Rate × Miss Time)
= h × T_hit + (1 - h) × T_miss
Where:
- h = cache hit rate
- T_hit = time for a cache hit
- T_miss = time for a cache miss (includes going to the next level)

Recursive Expansion:
For multiple cache levels, T_miss of one level becomes the access to the next:
EMAT_L1 = h₁ × T_L1 + (1 - h₁) × (T_L1 + EMAT_L2)
EMAT_L2 = h₂ × T_L2 + (1 - h₂) × (T_L2 + EMAT_L3)
... and so on
Simplified Formula:
For a system with L1, L2, L3 caches and memory:
EMAT = T_L1 + (1-h₁) × T_L2 + (1-h₁)(1-h₂) × T_L3 + (1-h₁)(1-h₂)(1-h₃) × T_mem
This formula adds the latency of each level weighted by the probability of reaching that level.
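The additive and recursive forms give identical results; this minimal sketch (illustrative values) computes EMAT both ways and confirms they agree:

```python
def emat_recursive(levels, t_mem):
    """Backward pass: each level's miss time is the EMAT of the level below."""
    t = t_mem
    for t_lvl, h in reversed(levels):
        t = t_lvl + (1 - h) * t
    return t

def emat_additive(levels, t_mem):
    """Each level's latency weighted by the probability of reaching it."""
    total, reach = 0.0, 1.0
    for t_lvl, h in levels:
        total += reach * t_lvl
        reach *= (1 - h)
    return total + reach * t_mem

cfg = [(1, 0.90), (5, 0.95), (20, 0.80)]  # (access time ns, hit rate) per level
assert abs(emat_recursive(cfg, 100) - emat_additive(cfg, 100)) < 1e-9
print(round(emat_recursive(cfg, 100), 3))  # 1.7
```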
Example: Three-Level Cache Hierarchy
Given:
- L1 cache: 1 ns access, 90% hit rate
- L2 cache: 5 ns access, 95% hit rate (of L1 misses)
- L3 cache: 20 ns access, 80% hit rate (of L2 misses)
- Main memory: 100 ns
Calculation (Working Outward):
Step 1: L3 to Memory
EMAT_from_L3 = T_L3 + (1 - h₃) × T_mem
= 20 + 0.2 × 100
= 20 + 20 = 40 ns
Step 2: L2 to L3
EMAT_from_L2 = T_L2 + (1 - h₂) × EMAT_from_L3
= 5 + 0.05 × 40
= 5 + 2 = 7 ns
Step 3: L1 to L2
EMAT = T_L1 + (1 - h₁) × EMAT_from_L2
= 1 + 0.1 × 7
= 1 + 0.7 = 1.7 ns
Analysis: even though DRAM takes 100 ns, the effective access time is only 1.7 ns, roughly 59× faster than going to memory every time, because each level's latency is weighted by the shrinking probability of ever reaching it.
```python
def calculate_emat(levels: list, memory_time: float) -> dict:
    """
    Calculate EMAT for a cache hierarchy.

    Args:
        levels: List of (access_time, hit_rate) tuples, ordered L1 to Ln
        memory_time: Main memory access time in ns

    Returns:
        Dictionary with EMAT and breakdown
    """
    # Work backwards from memory
    effective_time = memory_time
    breakdown = [("Memory", memory_time, 1.0)]

    for i, (access_time, hit_rate) in enumerate(reversed(levels)):
        level_name = f"L{len(levels) - i}"
        miss_rate = 1 - hit_rate
        # Time at this level = access_time + miss_rate × time_to_get_from_below
        effective_time = access_time + miss_rate * effective_time
        breakdown.append((level_name, access_time, hit_rate))

    return {
        'emat_ns': round(effective_time, 3),
        'breakdown': list(reversed(breakdown)),
        'speedup_vs_memory': round(memory_time / effective_time, 1)
    }

# Example: L1, L2, L3 with memory
result = calculate_emat(
    levels=[
        (1, 0.90),   # L1: 1 ns, 90% hit
        (5, 0.95),   # L2: 5 ns, 95% hit of L1 misses
        (20, 0.80),  # L3: 20 ns, 80% hit of L2 misses
    ],
    memory_time=100,
)
print(f"EMAT: {result['emat_ns']} ns")
print(f"Speedup vs memory only: {result['speedup_vs_memory']}x")
```

Some textbooks use the additive formula: EMAT = T_L1 + m₁×T_L2 + m₁×m₂×T_L3 + m₁×m₂×m₃×T_mem, where m = miss rate. This gives the same result and is sometimes easier for quick calculations.
Virtual memory adds address translation before every memory access. The complete EMAT must include TLB behavior.
Access Paths:
Path 1: TLB Hit (probability = p_tlb)
Time = T_tlb + EMAT_cache
Path 2: TLB Miss (probability = 1 - p_tlb)
Time = T_tlb + Page_Walk_Time + EMAT_cache
Complete Formula:
EMAT_with_TLB = p_tlb × (T_tlb + EMAT_cache) +
(1 - p_tlb) × (T_tlb + T_walk + EMAT_cache)
= T_tlb + EMAT_cache + (1 - p_tlb) × T_walk
Page Walk Time:
For N-level page tables:
T_walk = N × EMAT_cache (each level requires a memory read)
Note: Page table entries may be cached, so we use EMAT_cache, not raw memory time.
Complete Example with TLB:
Given:
- TLB: 2 ns access, 98% hit rate
- L1 cache: 1 ns access, 90% hit rate
- L2 cache: 10 ns access, 85% hit rate (of L1 misses)
- Main memory: 100 ns
- 4-level page table
Step 1: Calculate EMAT_cache (no TLB)
From L2 misses:
EMAT_from_L2 = 10 + 0.15 × 100 = 10 + 15 = 25 ns
From L1:
EMAT_cache = 1 + 0.10 × 25 = 1 + 2.5 = 3.5 ns
Step 2: Calculate Page Walk Time
Each level requires accessing cache/memory:
T_walk = 4 × EMAT_cache = 4 × 3.5 = 14 ns
Step 3: Complete EMAT with TLB
EMAT = T_tlb + EMAT_cache + (1 - p_tlb) × T_walk
= 2 + 3.5 + 0.02 × 14
= 2 + 3.5 + 0.28
= 5.78 ns
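The same arithmetic in code form (all values taken from this example):

```python
t_tlb, p_tlb = 2, 0.98                     # TLB: 2 ns, 98% hit
emat_cache = 1 + 0.10 * (10 + 0.15 * 100)  # L1 -> L2 -> memory
t_walk = 4 * emat_cache                    # 4-level page table, cached reads
emat = t_tlb + emat_cache + (1 - p_tlb) * t_walk
print(round(emat, 2))  # 5.78
```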
Impact Analysis:
| TLB Hit Rate | TLB Miss Penalty | Total EMAT | vs 100% Hit |
|---|---|---|---|
| 100% | 0 ns | 5.5 ns | baseline |
| 99% | 0.14 ns | 5.64 ns | +2.5% |
| 98% | 0.28 ns | 5.78 ns | +5.1% |
| 95% | 0.70 ns | 6.20 ns | +12.7% |
| 90% | 1.40 ns | 6.90 ns | +25.5% |
| 80% | 2.80 ns | 8.30 ns | +50.9% |
In reality, page table entries get cached. Upper-level entries (PML4, PDPT) are frequently accessed and almost always hit in cache. A more accurate model uses weighted access times for each page table level based on their cache hit rates.
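That weighted model can be sketched as follows; the per-level cache hit rates here are hypothetical, chosen only to illustrate how little the well-cached upper levels add to walk time:

```python
def weighted_walk_time(pte_cache_hit_rates, t_cache_hit, t_mem):
    """Page walk time when each page-table level's entry may hit in cache.

    pte_cache_hit_rates: hypothetical cache hit probability for each
    level's PTE read, ordered from the top level (e.g. PML4) down.
    """
    return sum(h * t_cache_hit + (1 - h) * t_mem
               for h in pte_cache_hit_rates)

# Upper levels are almost always cached; leaf PTEs miss far more often.
walk = weighted_walk_time([0.999, 0.99, 0.95, 0.70], t_cache_hit=1, t_mem=100)
naive = 4 * 100  # all four reads from DRAM
print(round(walk, 2), "ns vs", naive, "ns naive")
```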
Modern processors have multiple TLB levels, similar to the cache hierarchy.
Two-Level TLB Formula:
Path 1: L1 TLB Hit (p₁)
Time = T_L1_TLB + EMAT_cache
Path 2: L1 TLB Miss, L2 TLB Hit ((1-p₁) × p₂)
Time = T_L1_TLB + T_L2_TLB + EMAT_cache
Path 3: Both TLB Miss ((1-p₁) × (1-p₂))
Time = T_L1_TLB + T_L2_TLB + T_walk + EMAT_cache
Complete Formula:
EMAT = p₁ × (T_L1_TLB + EMAT_cache)
+ (1-p₁) × p₂ × (T_L1_TLB + T_L2_TLB + EMAT_cache)
+ (1-p₁) × (1-p₂) × (T_L1_TLB + T_L2_TLB + T_walk + EMAT_cache)
Simplified:
EMAT = T_L1_TLB + EMAT_cache
+ (1-p₁) × T_L2_TLB
+ (1-p₁) × (1-p₂) × T_walk
Complete System Example:
Given:
- L1 TLB: 0.5 ns, 95% hit rate
- L2 TLB: 4 ns, 99% hit rate (of L1 TLB misses)
- L1 cache: 1 ns, 92% hit rate
- L2 cache: 4 ns, 88% hit rate (of L1 misses)
- L3 cache: 15 ns, 85% hit rate (of L2 misses)
- Main memory: 80 ns
- 4-level page table
Step 1: EMAT_cache
EMAT_from_L3 = 15 + 0.15 × 80 = 15 + 12 = 27 ns
EMAT_from_L2 = 4 + 0.12 × 27 = 4 + 3.24 = 7.24 ns
EMAT_cache = 1 + 0.08 × 7.24 = 1 + 0.58 = 1.58 ns
Step 2: Page Walk Time
T_walk = 4 × EMAT_cache = 4 × 1.58 = 6.32 ns
Step 3: Full EMAT
p_L1_miss = 0.05
p_L2_miss_given_L1_miss = 0.01
p_both_miss = 0.05 × 0.01 = 0.0005
EMAT = T_L1_TLB + EMAT_cache + (1-p₁) × T_L2_TLB + (1-p₁)(1-p₂) × T_walk
= 0.5 + 1.58 + 0.05 × 4 + 0.0005 × 6.32
= 0.5 + 1.58 + 0.2 + 0.00316
= 2.28 ns
```python
def complete_emat(
    tlb_config: list,    # [(time, hit_rate), ...] ordered L1 TLB to Ln TLB
    cache_config: list,  # [(time, hit_rate), ...] ordered L1 to Ln
    memory_time: float,
    pt_levels: int,
) -> dict:
    """
    Calculate complete EMAT with TLB and cache hierarchy.
    """
    # Step 1: Calculate cache EMAT (working backwards from memory)
    cache_emat = memory_time
    for cache_time, cache_hit_rate in reversed(cache_config):
        cache_emat = cache_time + (1 - cache_hit_rate) * cache_emat

    # Step 2: Page walk time (each level is one cached memory read)
    page_walk_time = pt_levels * cache_emat

    # Step 3: TLB contribution. The L1 TLB is always probed; each deeper
    # TLB level is probed only when all shallower levels have missed.
    final_emat = cache_emat
    miss_prob = 1.0
    for tlb_time, tlb_hit_rate in tlb_config:
        final_emat += miss_prob * tlb_time
        miss_prob *= (1 - tlb_hit_rate)

    # Step 4: Full page walk when every TLB level misses
    final_emat += miss_prob * page_walk_time

    return {
        'cache_emat_ns': round(cache_emat, 3),
        'page_walk_time_ns': round(page_walk_time, 3),
        'total_tlb_miss_prob': round(miss_prob, 6),
        'final_emat_ns': round(final_emat, 3),
    }

# Example configuration
result = complete_emat(
    tlb_config=[(0.5, 0.95), (4, 0.99)],              # L1 DTLB, L2 STLB
    cache_config=[(1, 0.92), (4, 0.88), (15, 0.85)],  # L1, L2, L3
    memory_time=80,
    pt_levels=4,
)
for k, v in result.items():
    print(f"{k}: {v}")
```

Understanding cache miss categories helps predict and optimize cache behavior.
The Three C's of Cache Misses:
1. Compulsory Misses (Cold Misses): the very first access to a block always misses, regardless of cache size or organization.
2. Capacity Misses: the working set exceeds the cache size, so blocks are evicted and later re-fetched even in a fully-associative cache.
3. Conflict Misses (Collision Misses): too many blocks map to the same set and evict each other, even though the cache as a whole has room.
Calculating Miss Contributions:
Total Miss Rate = Compulsory + Capacity + Conflict
Compulsory Rate = Unique_Blocks_Accessed / Total_Accesses
For fully-associative cache:
Capacity Rate = Total_Miss_Rate - Compulsory_Rate
Conflict Rate = 0
For set-associative cache:
Capacity Rate = FA_Miss_Rate - Compulsory_Rate
Conflict Rate = Set_Assoc_Miss_Rate - FA_Miss_Rate
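These relations can be packaged into a small helper (a sketch; the min() guard handles the case where the working set fits and every fully-associative miss is a first access):

```python
def miss_breakdown(unique_blocks, total_accesses, fa_miss_rate, sa_miss_rate):
    """Split a set-associative miss rate into the three C's.

    Reference point is a same-size fully-associative (FA) cache:
    conflict = SA - FA, capacity = FA - compulsory.
    """
    # If the FA miss rate is below the cold-miss bound, every FA miss
    # is a first access, so effective compulsory rate = FA miss rate.
    compulsory = min(unique_blocks / total_accesses, fa_miss_rate)
    capacity = max(fa_miss_rate - compulsory, 0.0)
    conflict = max(sa_miss_rate - fa_miss_rate, 0.0)
    return {'compulsory': compulsory, 'capacity': capacity, 'conflict': conflict}

# Illustrative: 1,000 unique blocks over 10,000 accesses,
# FA miss rate 8%, direct-mapped miss rate 15%
print(miss_breakdown(1000, 10000, fa_miss_rate=0.08, sa_miss_rate=0.15))
```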
Example: Miss Category Analysis
Given:
- 10,000 total accesses touching 1,000 unique blocks
- Miss rates: direct-mapped 15%, 2-way 12%, 4-way 10%, 8-way 9%, fully-associative (same size) 8%
- Hit time: 1 ns; miss penalty: 10 ns
Analysis:
Compulsory Rate = 1000 / 10000 = 10%
Capacity Rate = FA_Miss - Compulsory = 8% - 10% = -2% (capped at 0)
Note: This means the working set fits in cache!
Actual interpretation:
If FA miss rate < compulsory rate, misses are purely compulsory
(some blocks accessed only once)
FA Miss Rate = 8% means:
- Compulsory: 8% (all misses are first accesses)
- Capacity: 0% (everything fits)
Conflict Rate for Direct-Mapped = 15% - 8% = 7%
Conflict Rate for 4-way = 10% - 8% = 2%
Optimization Insight:
| Associativity | Miss Rate | EMAT | Improvement |
|---|---|---|---|
| Direct-Mapped | 15% | 1 + 0.15×10 = 2.5 ns | baseline |
| 2-way | 12% | 1 + 0.12×10 = 2.2 ns | 12% |
| 4-way | 10% | 1 + 0.10×10 = 2.0 ns | 20% |
| 8-way | 9% | 1 + 0.09×10 = 1.9 ns | 24% |
| Fully-Assoc | 8% | 1 + 0.08×10 = 1.8 ns | 28% |
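The table above can be regenerated with a short sweep (miss rates and the 10 ns miss penalty come from this example):

```python
def emat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

baseline = emat(1, 0.15, 10)  # direct-mapped
for name, mr in [("Direct-Mapped", 0.15), ("2-way", 0.12), ("4-way", 0.10),
                 ("8-way", 0.09), ("Fully-Assoc", 0.08)]:
    e = emat(1, mr, 10)
    print(f"{name}: {e:.1f} ns ({(baseline - e) / baseline:.0%} improvement)")
```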
In multiprocessor systems, a fourth 'C' exists: Coherence misses. These occur when another processor modifies a shared cache line, causing invalidation. Important for shared memory parallel programs.
Write operations have different timing characteristics than reads, affecting AMAT.
Write Policies:
Write-Through: Every write goes to cache AND memory
Write Time = max(Cache_Time, Memory_Write_Time) // if write buffer
Write Time = Cache_Time + Memory_Time // if no write buffer
Write-Back: Writes only go to cache; dirty blocks written on eviction
Write Hit Time = Cache_Time
Write Miss Time = Cache_Time + P(dirty eviction) × Memory_Time
Miss Handling:
Write-Allocate: On write miss, fetch block into cache, then write
Write Miss Time = Fetch_Time + Cache_Write_Time
Write-No-Allocate: On write miss, write directly to memory
Write Miss Time = Memory_Write_Time
Common combinations:
- Write-back + write-allocate (typical for modern caches)
- Write-through + write-no-allocate
AMAT with Mixed Read/Write Traffic:
AMAT = Read_Fraction × AMAT_read + Write_Fraction × AMAT_write
Example: Write-Back Cache
Given:
- Cache: 1 ns access, write-back with write-allocate
- Read miss rate: 5%; write miss rate: 8%
- Memory: 100 ns
- 40% of evicted blocks are dirty
- Traffic mix: 70% reads, 30% writes
Read AMAT:
AMAT_read = 1 + 0.05 × 100 = 6 ns
Write AMAT (Write-Back):
On write hit: 1 ns
On write miss: allocate (fetch) + potential dirty writeback
Write Miss Time = 100 + 0.40 × 100 = 140 ns (fetch + 40% writeback)
AMAT_write = 1 + 0.08 × 140 = 1 + 11.2 = 12.2 ns
Combined AMAT:
AMAT = 0.70 × 6 + 0.30 × 12.2 = 4.2 + 3.66 = 7.86 ns
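A sketch of this mixed-traffic calculation (all values from this example; write-back with write-allocate assumed):

```python
def mixed_amat(hit_time, read_miss, write_miss, mem_time, dirty_frac, read_frac):
    """AMAT over mixed reads/writes for a write-back, write-allocate cache."""
    amat_read = hit_time + read_miss * mem_time
    # Write miss: fetch the block, plus a writeback when the victim is dirty
    write_miss_penalty = mem_time + dirty_frac * mem_time
    amat_write = hit_time + write_miss * write_miss_penalty
    return read_frac * amat_read + (1 - read_frac) * amat_write

print(round(mixed_amat(1, 0.05, 0.08, 100, dirty_frac=0.40, read_frac=0.70), 2))
# prints 7.86
```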
Write buffers absorb write latency by queuing writes and letting the CPU continue. If the write buffer doesn't fill up, write-through performance approaches write-back. Modern CPUs have sophisticated store buffers and memory ordering logic.
Let's bring everything together with a complete system analysis.
Problem: Full Memory System Characterization
System Configuration:
- CPU: 3 GHz (0.333 ns/cycle), CPI_base = 1.0, 35% of instructions access memory
- L1 TLB: 1 cycle, 97% hit rate
- L2 TLB: 8 cycles, 99.5% hit rate (of L1 TLB misses)
- L1 cache: 4 cycles, 94% hit rate
- L2 cache: 15 cycles, 90% hit rate (of L1 misses)
- L3 cache: 50 cycles, 80% hit rate (of L2 misses)
- Main memory: 300 cycles
- 4-level page table
Step 1: Convert cycles to ns
1 cycle = 1/3 GHz = 0.333 ns
L1 TLB: 1 × 0.333 = 0.333 ns
L2 TLB: 8 × 0.333 = 2.67 ns
L1 Cache: 4 × 0.333 = 1.33 ns
L2 Cache: 15 × 0.333 = 5 ns
L3 Cache: 50 × 0.333 = 16.67 ns
Memory: 300 × 0.333 = 100 ns
Step 2: Calculate Cache EMAT
From L3: 16.67 + 0.20 × 100 = 16.67 + 20 = 36.67 ns
From L2: 5 + 0.10 × 36.67 = 5 + 3.67 = 8.67 ns
From L1: 1.33 + 0.06 × 8.67 = 1.33 + 0.52 = 1.85 ns
EMAT_cache = 1.85 ns
Step 3: Page Walk Time
T_walk = 4 × EMAT_cache = 4 × 1.85 = 7.4 ns
Step 4: TLB Contribution
p_L1_miss = 0.03
p_both_miss = 0.03 × 0.005 = 0.00015
EMAT_with_TLB = 0.333 + 1.85 + 0.03 × 2.67 + 0.00015 × 7.4
= 0.333 + 1.85 + 0.08 + 0.001
= 2.26 ns
Step 5: Impact on CPI
Memory stall cycles per memory instruction:
Stall_cycles = (EMAT_with_TLB - T_L1) / cycle_time
= (2.26 - 0.333) / 0.333
= 5.8 cycles
Actual CPI:
CPI = CPI_base + Memory_Fraction × Stall_cycles
= 1.0 + 0.35 × 5.8
= 1.0 + 2.03
= 3.03 cycles
Step 6: Performance Analysis
IPC_ideal = 1.0
IPC_actual = 1/3.03 = 0.33
Performance loss to memory = (3.03 - 1.0) / 3.03 = 67%
Interpretation: memory access accounts for 67% of execution time. This is why cache hit rates, TLB reach, and memory latency dominate real-world performance. The table below quantifies the effect of individual improvements:
| Change | New EMAT | New CPI | Speedup |
|---|---|---|---|
| Baseline | 2.26 ns | 3.03 | 1.00× |
| L1 hit rate → 97% | 1.97 ns | 2.74 | 1.11× |
| L2 hit rate → 95% | 2.10 ns | 2.86 | 1.06× |
| Memory → 200 cycles | 1.93 ns | 2.70 | 1.12× |
| TLB hit → 99% | 2.21 ns | 2.98 | 1.02× |
| All improvements | 1.55 ns | 2.28 | 1.33× |
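The baseline row can be reproduced directly from Step 5 (a minimal sketch; per the worked example, any access time beyond one base cycle counts as stall):

```python
def cpi_with_memory(cpi_base, mem_fraction, emat_ns, base_ns, cycle_ns):
    """Add memory stall cycles (access time beyond base_ns) to base CPI."""
    stall_cycles = (emat_ns - base_ns) / cycle_ns
    return cpi_base + mem_fraction * stall_cycles

cpi = cpi_with_memory(1.0, 0.35, emat_ns=2.26, base_ns=0.333, cycle_ns=0.333)
print(round(cpi, 2), round(1 / cpi, 2))  # CPI ~3.03, IPC ~0.33
```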
L1 cache: 2 ns, 88% hit rate. L2 cache: 20 ns, 92% hit rate of L1 misses. Memory: 200 ns. Calculate EMAT and the percentage of time spent in each level.
EMAT Calculation:
From L2: 20 + 0.08 × 200 = 20 + 16 = 36 ns
From L1: 2 + 0.12 × 36 = 2 + 4.32 = 6.32 ns
EMAT = 6.32 ns
Time Distribution:
L1 time: 2 ns (always accessed)
L2 time: 0.12 × 20 = 2.4 ns (12% of accesses)
Memory time: 0.12 × 0.08 × 200 = 1.92 ns (0.96% of accesses)
Percentage:
- L1: 2/6.32 = 31.6%
- L2: 2.4/6.32 = 38.0%
- Memory: 1.92/6.32 = 30.4%
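The per-level contributions can be checked in code (values from this exercise):

```python
t_l1, m1 = 2, 0.12       # L1: 2 ns, 12% miss rate
t_l2, m2 = 20, 0.08      # L2: 20 ns, 8% miss rate (of L1 misses)
t_mem = 200

contrib = {
    'L1': t_l1,                 # paid on every access
    'L2': m1 * t_l2,            # paid only on L1 misses
    'Memory': m1 * m2 * t_mem,  # paid only when both levels miss
}
emat = sum(contrib.values())
for level, t in contrib.items():
    print(f"{level}: {t:.2f} ns ({t / emat:.1%})")
```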
TLB: 1 ns, 96% hit. Cache: 4 ns, 90% hit. Memory: 100 ns. 2-level page table. (a) Calculate EMAT. (b) What TLB hit rate is needed to make TLB overhead < 5%?
(a) EMAT Calculation:
Cache EMAT = 4 + 0.1 × 100 = 14 ns
Page walk = 2 × 14 = 28 ns
EMAT = 1 + 14 + 0.04 × 28 = 1 + 14 + 1.12 = 16.12 ns
(b) TLB Overhead Target:
Without TLB miss: 1 + 14 = 15 ns
5% overhead: 15 × 1.05 = 15.75 ns
Max TLB miss contribution: 15.75 - 15 = 0.75 ns
(1 - p_hit) × 28 ≤ 0.75
(1 - p_hit) ≤ 0.75/28 = 0.027
p_hit ≥ 97.3%
Answer: TLB hit rate must be at least 97.3% for overhead < 5%.
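Part (b) is a one-line rearrangement of the overhead inequality; as a sketch:

```python
t_tlb, cache_emat, t_walk = 1, 14, 28  # from part (a)
budget = 0.05 * (t_tlb + cache_emat)   # 5% of the no-TLB-miss time
min_hit_rate = 1 - budget / t_walk     # solve (1 - p) * t_walk <= budget
print(f"{min_hit_rate:.1%}")  # 97.3%
```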
A CPU at 2 GHz has CPI_base = 1.2 and 40% memory instructions. L1: 4 cycles, 92% hit. Memory: 200 cycles. Calculate actual CPI and IPC.
Memory Stall Calculation:
Miss penalty = 200 - 4 = 196 cycles
Miss rate = 8%
Avg memory stall per access = 0.08 × 196 = 15.68 cycles
Stalls per instruction = 0.40 × 15.68 = 6.27 cycles
Actual CPI:
CPI = CPI_base + memory_stalls
= 1.2 + 6.27
= 7.47 cycles
IPC:
IPC = 1/CPI = 1/7.47 = 0.134
Interpretation: Memory stalls cause CPI to increase 6.2× from ideal. Only 0.134 instructions complete per cycle vs. 0.83 ideal.
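The CPI arithmetic, sketched in code with this exercise's values:

```python
cpi_base, mem_instr_frac = 1.2, 0.40
l1_cycles, mem_cycles, miss_rate = 4, 200, 0.08

miss_penalty = mem_cycles - l1_cycles               # 196 cycles
stalls = mem_instr_frac * miss_rate * miss_penalty  # stall cycles per instruction
cpi = cpi_base + stalls
print(round(cpi, 2), round(1 / cpi, 3))  # 7.47 0.134
```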
Essential Formulas:
EMAT_cache = T_L1 + (1-h₁) × T_L2 + (1-h₁)(1-h₂) × T_L3 + ... + T_mem
EMAT_with_TLB = T_TLB + EMAT_cache + (1-p_tlb) × T_walk
T_walk = Page_Table_Levels × EMAT_cache
CPI = CPI_base + Memory_Fraction × Stalls_per_Memory_Access
Module Complete!
You have now mastered the five core numerical problem types in operating systems.
These skills form the quantitative foundation for OS performance analysis and are essential for technical interviews.
Congratulations! You have completed the Numerical Problems module. You now have comprehensive knowledge of the quantitative aspects of operating systems, from process scheduling metrics to complete memory system analysis. Practice these calculations regularly to build speed and accuracy.