Working Set Model - Learning Module

Loading content...

0/240

WSS Tracking

Measuring the Immeasurable

Knowing the precise working set size would allow perfect memory allocation—no more, no less than needed. But here's the challenge: modern systems execute billions of memory references per second. Recording every reference to compute the exact working set would consume more CPU cycles than the original computation. We need clever approximations that provide useful information without overwhelming the system.

This is the WSS tracking problem: how do we estimate the working set size of each process accurately enough to make good allocation decisions, while keeping overhead low enough to be practical? The solutions involve a combination of hardware support, statistical sampling, and algorithmic cleverness.

What You Will Learn

By the end of this page, you will understand the fundamental WSS tracking algorithms, their accuracy-overhead trade-offs, hardware support for tracking, modern kernel implementations, and practical considerations for WSS monitoring. You'll learn how to reason about tracking overhead and choose appropriate mechanisms for different system requirements.

The WSS Tracking Challenge

To understand why WSS tracking is challenging, let's examine what we'd need for exact tracking versus what's practically achievable.

Exact Tracking Requirements:

To compute W(t, Δ) exactly, we would need to:

Record every memory reference (billions per second)
Store the timestamp of each reference (massive storage)
Query the last Δ references to find distinct pages (expensive query)
Do this continuously for every process

The overhead would dwarf the actual computation.

Practical Tracking Goals:

Instead of exact values, we seek an estimate of WSS that:

Is sufficiently accurate for memory allocation decisions
Has low, amortized overhead (fraction of a percent of CPU)
Can be computed without storing complete reference history
Updates automatically as the working set evolves

WSS Tracking Mechanism Comparison
Mechanism	Accuracy	CPU Overhead	Storage Overhead	Practical
Exact trace	100%	10-100x slowdown	Gigabytes/second	❌ Analysis only
Per-reference counter	95%+	High (trap overhead)	Counter per page	❌ Too slow
Reference bit polling	80-90%	Low (periodic)	Bits per page	✓ Most common
Hardware counters	90%+	Very low	Minimal	✓ If available
Statistical sampling	85%+	Very low	Sample buffer	✓ Modern approach

Good Enough Is Good Enough

For memory allocation decisions, we don't need exact WSS values. What matters is: (1) detecting when WSS increases significantly (need more frames), (2) detecting when WSS decreases (can reclaim frames), and (3) comparing relative WSS between processes. Approximate tracking serves these purposes well.

Reference Bit Based Tracking

The most common WSS tracking mechanism leverages the hardware reference bit (also called accessed bit) provided by CPU page table hardware.

How Reference Bits Work:

Each page table entry contains a reference bit (R bit)
Hardware automatically sets R=1 when the page is accessed
Software can clear R=0 at any time
Hardware never clears the bit—only software does

This hardware support is universal in modern CPUs and provides the foundation for practical WSS tracking.

Basic Reference Bit Tracking Algorithm:

Periodically (on timer interrupt): Scan all page table entries
For each entry: Check if R=1, then clear R=0
Count: Pages with R=1 were accessed since last scan
Working Set: Pages accessed in recent N scans

reference_bit_tracking.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
/*
 * Reference Bit Based WSS Tracking
 * 
 * Uses hardware reference bits to track which pages are actively used.
 * This is the foundation of most practical WSS tracking.
 */
 
#include <stdint.h>
#include <stdbool.h>
 
#define PAGE_TABLE_SIZE 65536  // Example: 256MB address space with 4KB pages
#define HISTORY_INTERVALS 8    // Track access over 8 timer intervals
 
typedef struct {
    uint64_t physical_frame : 52;  // Physical frame number
    uint64_t reserved : 3;
    uint64_t accessed : 1;         // Hardware reference bit (R bit)
    uint64_t dirty : 1;            // Hardware modified bit (M bit)
    uint64_t present : 1;          // Page present in memory
    uint64_t writable : 1;
    uint64_t user : 1;
    // ... other bits
} PageTableEntry;
 
typedef struct {
    PageTableEntry page_table[PAGE_TABLE_SIZE];
    uint8_t access_history[PAGE_TABLE_SIZE];  // Per-page access history
    int current_wss;
    uint64_t scan_count;
} ProcessMemory;
 
/*
 * Timer interrupt handler for WSS tracking.
 * Called every timer tick (e.g., every 10ms).
 */
void wss_timer_tick(ProcessMemory *proc) {
    int accessed_this_interval = 0;
    int in_working_set = 0;
    
    for (int page = 0; page < PAGE_TABLE_SIZE; page++) {
        PageTableEntry *pte = &proc->page_table[page];
        
        // Only track present pages
        if (!pte->present) {
            continue;
        }
        
        // Shift history right (age all entries)
        proc->access_history[page] >>= 1;
        
        // Check and record reference bit
        if (pte->accessed) {
            proc->access_history[page] |= 0x80;  // Set high bit
            pte->accessed = 0;  // Clear for next interval
            accessed_this_interval++;
        }
        
        // Page is in working set if any history bit is set
        if (proc->access_history[page] != 0) {
            in_working_set++;
        }
    }
    
    proc->current_wss = in_working_set;
    proc->scan_count++;
    
    // Could trigger allocation adjustment here if WSS changed significantly
    // log_wss_change(proc->current_wss, accessed_this_interval);
}
 
/*
 * Get current WSS estimate.
 */
int get_wss(ProcessMemory *proc) {
    return proc->current_wss;
}
 
/*
 * Check if specific page is in working set.
 */
bool is_in_working_set(ProcessMemory *proc, int page) {
    if (page < 0 || page >= PAGE_TABLE_SIZE) return false;
    return proc->access_history[page] != 0;
}
 
/*
 * Get detailed age for a page (lower = more recent).
 * Useful for replacement decisions.
 */
int get_page_recency(ProcessMemory *proc, int page) {
    if (page < 0 || page >= PAGE_TABLE_SIZE) return HISTORY_INTERVALS + 1;
    
    uint8_t history = proc->access_history[page];
    if (history == 0) return HISTORY_INTERVALS + 1;  // Not in WS
    
    // Count leading zeros to find most recent access
    int age = 0;
    uint8_t mask = 0x80;
    while ((history & mask) == 0 && age < HISTORY_INTERVALS) {
        mask >>= 1;
        age++;
    }
    return age;
}
 
/*
 * Compute WSS with effective smaller window.
 * Allows dynamic window adjustment without re-tracking.
 */
int get_wss_with_window(ProcessMemory *proc, int intervals) {
    if (intervals > HISTORY_INTERVALS) intervals = HISTORY_INTERVALS;
    
    // Create mask for desired interval count
    uint8_t mask = 0;
    for (int i = 0; i < intervals; i++) {
        mask |= (0x80 >> i);
    }
    
    int wss = 0;
    for (int page = 0; page < PAGE_TABLE_SIZE; page++) {
        if (!proc->page_table[page].present) continue;
        if ((proc->access_history[page] & mask) != 0) {
            wss++;
        }
    }
    return wss;
}
 
/*
 * Compute WSS breakdown by recency.
 * Returns how many pages are at each age level.
 */
void get_wss_age_distribution(ProcessMemory *proc, int *distribution) {
    // distribution[i] = count of pages with age i
    for (int i = 0; i <= HISTORY_INTERVALS; i++) {
        distribution[i] = 0;
    }
    
    for (int page = 0; page < PAGE_TABLE_SIZE; page++) {
        if (!proc->page_table[page].present) continue;
        int age = get_page_recency(proc, page);
        distribution[age]++;
    }
}

The Scanning Overhead

Scanning all page table entries on every timer tick can be expensive for processes with large address spaces. Real implementations use optimizations: scan only resident pages (skip PTEs for swapped pages), use hierarchical page tables to skip empty regions, batch updates across multiple ticks, or use hardware performance counters instead of PTE scanning.

Multi-Level and Hierarchical Tracking

Real systems often use multi-level tracking schemes that provide different granularities of information about page usage.

The Two-Level Approach:

Many kernels maintain two categories of pages:

Active pages: Recently accessed, should stay resident
Inactive pages: Not recently accessed, candidates for eviction

This binary classification is simpler than tracking exact ages but provides the essential information for allocation decisions.

Linux's Active/Inactive Lists:

Linux maintains linked lists rather than strict working sets:

Active LRU list: Pages recently accessed
Inactive LRU list: Pages not recently accessed
Promotion: Inactive → Active when page is re-accessed
Demotion: Active → Inactive when page ages out

This enables O(1) list operations while approximating working set behavior.

Active List Properties

•Contains: Recently accessed pages
•Entry: From inactive list when accessed
•Exit: To inactive list on LRU age-out
•Eviction: Rarely from active list
•Size: Typically 50-75% of resident pages

Inactive List Properties

•Contains: Pages that may be evicted
•Entry: From active list on age-out
•Exit: Eviction or promotion to active
•Eviction: Primary eviction source
•Size: Typically 25-50% of resident pages

multi_level_tracking.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
# Multi-level WSS tracking with Active/Inactive lists
 
from collections import deque
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional
 
class PageState(Enum):
    NOT_PRESENT = 0
    INACTIVE = 1
    ACTIVE = 2
 
@dataclass
class Page:
    page_num: int
    state: PageState
    access_bit: bool = False
    modified: bool = False
    
class ActiveInactiveTracker:
    """
    Two-list working set tracking inspired by Linux's approach.
    
    Provides approximate WSS through active/inactive classification
    rather than exact history tracking.
    """
    
    def __init__(self, max_active_ratio=0.75):
        # LRU lists (deques for O(1) operations at both ends)
        self.active_list = deque()    # Recently accessed
        self.inactive_list = deque()  # Candidates for eviction
        
        # Quick lookup
        self.pages: Dict[int, Page] = {}
        
        # Configuration
        self.max_active_ratio = max_active_ratio
        
        # Statistics
        self.promotions = 0
        self.demotions = 0
        
    def access_page(self, page_num: int) -> bool:
        """
        Record a page access. Returns True if page was already resident.
        """
        if page_num in self.pages:
            page = self.pages[page_num]
            page.access_bit = True
            
            if page.state == PageState.INACTIVE:
                # Promote to active on second access
                self._promote_to_active(page)
                
            return True
        else:
            # Page fault - not tracked here, but would load page
            return False
    
    def add_page(self, page_num: int, to_active: bool = False):
        """Add a new page (e.g., after loading from disk)."""
        page = Page(page_num=page_num, state=PageState.NOT_PRESENT)
        self.pages[page_num] = page
        
        if to_active:
            self._add_to_active(page)
        else:
            # New pages typically go to inactive first
            self._add_to_inactive(page)
    
    def _add_to_inactive(self, page: Page):
        """Add page to head of inactive list."""
        page.state = PageState.INACTIVE
        self.inactive_list.appendleft(page.page_num)
        
    def _add_to_active(self, page: Page):
        """Add page to head of active list."""
        page.state = PageState.ACTIVE
        self.active_list.appendleft(page.page_num)
        self._balance_lists()
        
    def _promote_to_active(self, page: Page):
        """Move page from inactive to active list."""
        if page.state != PageState.INACTIVE:
            return
            
        # Remove from inactive
        try:
            self.inactive_list.remove(page.page_num)
        except ValueError:
            pass
            
        # Add to active
        page.state = PageState.ACTIVE
        self.active_list.appendleft(page.page_num)
        page.access_bit = False
        self.promotions += 1
        
        self._balance_lists()
        
    def _demote_to_inactive(self, page: Page):
        """Move page from active to inactive list."""
        if page.state != PageState.ACTIVE:
            return
            
        # Remove from active
        try:
            self.active_list.remove(page.page_num)
        except ValueError:
            pass
            
        # Add to inactive
        page.state = PageState.INACTIVE
        self.inactive_list.appendleft(page.page_num)
        page.access_bit = False
        self.demotions += 1
        
    def _balance_lists(self):
        """
        Maintain target active/inactive ratio.
        
        If active list is too large, demote oldest active pages.
        """
        total = len(self.active_list) + len(self.inactive_list)
        target_active = int(total * self.max_active_ratio)
        
        while len(self.active_list) > target_active and self.active_list:
            # Demote oldest active page (at tail)
            oldest = self.active_list.pop()
            if oldest in self.pages:
                page = self.pages[oldest]
                page.state = PageState.INACTIVE
                self.inactive_list.appendleft(oldest)
                self.demotions += 1
    
    def scan_for_aging(self):
        """
        Periodic scan to age pages (like timer interrupt).
        
        Active pages without recent access are demoted.
        Inactive pages without recent access are eviction candidates.
        """
        # Process active list
        to_demote = []
        for page_num in self.active_list:
            if page_num in self.pages:
                page = self.pages[page_num]
                if page.access_bit:
                    page.access_bit = False  # Give another chance
                else:
                    to_demote.append(page_num)
        
        # Demote pages that weren't accessed
        for page_num in to_demote[:len(to_demote)//4]:  # Demote 25% at a time
            if page_num in self.pages:
                self._demote_to_inactive(self.pages[page_num])
    
    def select_eviction_candidate(self) -> Optional[int]:
        """
        Select a page for eviction from inactive list tail.
        """
        while self.inactive_list:
            candidate = self.inactive_list.pop()
            
            if candidate not in self.pages:
                continue
                
            page = self.pages[candidate]
            
            # Give second chance if accessed
            if page.access_bit:
                page.access_bit = False
                # Move to front of inactive (second chance)
                self.inactive_list.appendleft(candidate)
                continue
            
            return candidate
        
        # Inactive list exhausted - take from active list
        if self.active_list:
            return self.active_list.pop()
            
        return None
    
    def get_working_set_size(self) -> int:
        """
        Estimate WSS as size of active list.
        
        This is an approximation: the "actively used" pages.
        """
        return len(self.active_list)
    
    def get_total_resident(self) -> int:
        """Get total resident pages (active + inactive)."""
        return len(self.active_list) + len(self.inactive_list)
    
    def get_statistics(self):
        """Get tracking statistics."""
        return {
            'active_size': len(self.active_list),
            'inactive_size': len(self.inactive_list),
            'total_resident': self.get_total_resident(),
            'wss_estimate': self.get_working_set_size(),
            'promotions': self.promotions,
            'demotions': self.demotions,
            'active_ratio': len(self.active_list) / max(1, self.get_total_resident())
        }
 
 
# Demonstration
def demonstrate_active_inactive():
    """Show how active/inactive tracking works."""
    print("Active/Inactive List WSS Tracking")
    print("=" * 60)
    
    tracker = ActiveInactiveTracker(max_active_ratio=0.5)
    
    # Simulate loading pages
    for i in range(20):
        tracker.add_page(i)
    
    print("Initial state (all pages inactive):")
    print(f"  {tracker.get_statistics()}")
    
    # Simulate access pattern: hot set of 5 pages
    print("
Simulating accesses to pages 0-4 (hot set)...")
    for _ in range(10):
        for page in range(5):
            tracker.access_page(page)
        tracker.scan_for_aging()
    
    print(f"After hot set accesses:")
    print(f"  {tracker.get_statistics()}")
    
    # Access some cold pages
    print("
Accessing some 'cold' pages 15-19...")
    for page in range(15, 20):
        tracker.access_page(page)
    tracker.scan_for_aging()
    
    print(f"After cold page accesses:")
    print(f"  {tracker.get_statistics()}")
    
    # Select eviction candidates
    print("
Eviction candidates (from inactive list):")
    for i in range(3):
        victim = tracker.select_eviction_candidate()
        print(f"  Eviction {i+1}: page {victim}")
 
if __name__ == "__main__":
    demonstrate_active_inactive()

Hardware-Assisted Tracking

Modern processors provide various hardware features that can assist with WSS tracking, reducing software overhead while improving accuracy.

Hardware Performance Counters:

Most modern CPUs include Performance Monitoring Units (PMUs) that can count cache and TLB events:

TLB misses: Indicate access to pages not recently used (TLB approximates working set)
Page walks: Count address translations, correlate with working set size
Cache misses by level: L1/L2/L3 misses provide different granularity information

These counters operate in hardware with zero software overhead during normal execution.

Intel Processor Trace / PEBS:

Advanced processors offer precise event-based sampling (PEBS) or processor trace:

Sample memory accesses at configurable rates
Record virtual addresses accessed
Minimal overhead (< 5%) for reasonable sample rates
Can reconstruct working set from samples

Hardware Features for WSS Tracking
Feature	Architecture	Information Provided	Overhead
Reference bit	All modern	Binary: accessed or not	None (automatic)
TLB miss counter	Most modern	Working set ≫ TLB size	None (read counter)
PEBS (Intel)	Intel Core+	Sampled memory addresses	1-5% typical
IBS (AMD)	AMD Fam10h+	Sampled instruction/data	1-5% typical
ARM SPE	ARMv8.2+	Statistical profiling	1-5% typical
Intel PT	Broadwell+	Full trace (debugger)	5-15% typical

TLB as Working Set Proxy

The TLB (Translation Lookaside Buffer) naturally caches page table entries for recently accessed pages—essentially acting as a hardware-maintained approximation of the working set. If a process's working set fits in the TLB, TLB misses are rare. High TLB miss rates indicate the working set exceeds TLB capacity, suggesting more memory allocation might help.

hardware_counters.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
/*
 * Hardware Performance Counter Based WSS Estimation
 * 
 * Uses CPU performance counters to estimate working set
 * characteristics with minimal overhead.
 */
 
#include <stdint.h>
#include <stdbool.h>
 
/* Linux perf_event interface (simplified) */
struct perf_event_attr {
    uint32_t type;
    uint64_t config;
    // ... other fields
};
 
/* Counter IDs for WSS-relevant events */
#define PERF_TYPE_HARDWARE 0
 
/* Simulated hardware counter events */
typedef enum {
    EVENT_DTLB_LOAD_MISSES,
    EVENT_DTLB_STORE_MISSES, 
    EVENT_PAGE_WALKS,
    EVENT_L3_MISSES,
    EVENT_MEMORY_REFERENCES
} HWCounterEvent;
 
typedef struct {
    uint64_t dtlb_misses;
    uint64_t page_walks;
    uint64_t l3_misses;
    uint64_t mem_refs;
    uint64_t sample_time;
} HWCounterSnapshot;
 
typedef struct {
    HWCounterSnapshot current;
    HWCounterSnapshot previous;
    
    /* Derived WSS estimates */
    double estimated_wss_pages;
    double miss_rate;
    double working_set_ratio;  /* WSS / TLB size estimate */
} WSSEstimator;
 
/*
 * Initialize hardware counter based WSS estimator.
 */
void init_wss_estimator(WSSEstimator *est) {
    est->current = (HWCounterSnapshot){0};
    est->previous = (HWCounterSnapshot){0};
    est->estimated_wss_pages = 0;
    est->miss_rate = 0;
    est->working_set_ratio = 0;
}
 
/*
 * Sample hardware counters (called periodically).
 */
void sample_hw_counters(WSSEstimator *est) {
    /* In real implementation, would read MSRs or use perf_event */
    /* Here we show the interface concept */
    
    est->previous = est->current;
    
    /* Read counters (pseudo-code) */
    // est->current.dtlb_misses = rdpmc(DTLB_MISS_COUNTER);
    // est->current.page_walks = rdpmc(PAGE_WALK_COUNTER);
    // etc.
    
    /* Compute derived metrics */
    uint64_t dtlb_delta = est->current.dtlb_misses - est->previous.dtlb_misses;
    uint64_t refs_delta = est->current.mem_refs - est->previous.mem_refs;
    
    if (refs_delta > 0) {
        est->miss_rate = (double)dtlb_delta / refs_delta;
    }
}
 
/*
 * Estimate WSS from TLB behavior.
 * 
 * Key insight: TLB miss rate correlates with WSS / TLB_SIZE ratio.
 * - Low miss rate → WSS fits in TLB
 * - High miss rate → WSS >> TLB size
 */
double estimate_wss_from_tlb(WSSEstimator *est, int tlb_entries) {
    /*
     * Simple model: If miss rate is M and TLB has T entries,
     * then roughly WSS ≈ T / (1 - M) for M < 1.
     * 
     * This is a rough approximation; real models are more complex.
     */
    if (est->miss_rate < 0.01) {
        /* Very low miss rate: WSS fits in TLB */
        return tlb_entries * 0.5;  /* Conservative estimate */
    } else if (est->miss_rate > 0.9) {
        /* Very high miss rate: WSS much larger than TLB */
        return tlb_entries * 10;  /* Working set is large */
    } else {
        /* Interpolate based on miss rate */
        return tlb_entries / (1.0 - est->miss_rate);
    }
}
 
/*
 * Detect working set size changes from counter deltas.
 */
typedef enum {
    WSS_STABLE,
    WSS_GROWING,
    WSS_SHRINKING
} WSSChangeStatus;
 
WSSChangeStatus detect_wss_change(WSSEstimator *est) {
    /* Previous miss rate estimate */
    static double prev_miss_rate = 0;
    
    double change = est->miss_rate - prev_miss_rate;
    prev_miss_rate = est->miss_rate;
    
    if (change > 0.05) {
        /* Miss rate increasing → WSS likely growing */
        return WSS_GROWING;
    } else if (change < -0.05) {
        /* Miss rate decreasing → WSS likely shrinking */
        return WSS_SHRINKING;
    }
    return WSS_STABLE;
}
 
/*
 * Recommend allocation change based on hardware counters.
 */
int recommend_allocation_change(WSSEstimator *est) {
    if (est->miss_rate > 0.1) {
        /* High miss rate: could benefit from more memory */
        return +10;  /* Suggest adding pages */
    } else if (est->miss_rate < 0.01) {
        /* Very low miss rate: might have excess allocation */
        return -5;   /* Suggest reclaiming pages */
    }
    return 0;  /* Current allocation seems appropriate */
}
 
/*
 * Example usage in kernel memory manager.
 */
void memory_manager_tick(void) {
    static WSSEstimator per_process_est[MAX_PROCESSES];
    
    for (int pid = 0; pid < active_process_count; pid++) {
        WSSEstimator *est = &per_process_est[pid];
        
        /* Sample this process's counters */
        // switch_to_process_context(pid);
        sample_hw_counters(est);
        
        /* Check for WSS changes */
        WSSChangeStatus status = detect_wss_change(est);
        
        if (status == WSS_GROWING) {
            /* Consider allocating more frames */
            int delta = recommend_allocation_change(est);
            // request_additional_frames(pid, delta);
        } else if (status == WSS_SHRINKING) {
            /* Consider reclaiming frames */
            int delta = recommend_allocation_change(est);
            // reclaim_frames(pid, -delta);
        }
    }
}

Statistical Sampling Approaches

Rather than tracking every access or scanning all pages, statistical sampling provides accurate WSS estimates with very low overhead.

The Sampling Insight:

If pages are sampled randomly and independently:

The fraction of samples hitting a particular page reflects its access frequency
The number of distinct pages in a sample reflects the working set size
Confidence intervals can quantify estimate accuracy

Random Sampling Schemes:

1. Time-Based Sampling:

Sample at random time intervals
Record which page was being accessed (using PC or data address)
Reconstruct working set from sample history

2. Address Space Sampling:

Sample a random subset of pages
Track reference bits for sampled pages only
Extrapolate to full address space

3. Hardware-Triggered Sampling (PEBS/IBS):

Configure hardware to sample every Nth memory access
Record full address of sampled accesses
Very low overhead, high accuracy

statistical_sampling.py
Python
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
# Statistical Sampling for WSS Estimation
 
import random
import math
from collections import Counter
from typing import List, Set, Tuple
 
class SamplingWSSEstimator:
    """
    Estimate WSS through statistical sampling.
    
    Instead of tracking all pages, we sample memory accesses
    and estimate WSS from the sample distribution.
    """
    
    def __init__(self, sample_rate: float = 0.001, window_size: int = 10000):
        """
        Args:
            sample_rate: Fraction of accesses to sample (e.g., 0.001 = 0.1%)
            window_size: Number of samples to retain for estimation
        """
        self.sample_rate = sample_rate
        self.window_size = window_size
        self.samples: List[int] = []  # Recent page samples
        self.total_accesses = 0
        
    def record_access(self, page: int):
        """
        Record a memory access (may or may not be sampled).
        """
        self.total_accesses += 1
        
        # Probabilistically sample this access
        if random.random() < self.sample_rate:
            self.samples.append(page)
            
            # Maintain window size
            if len(self.samples) > self.window_size:
                self.samples.pop(0)
    
    def estimate_wss(self) -> Tuple[float, float]:
        """
        Estimate WSS from samples.
        
        Returns (estimate, confidence_interval_95).
        
        Uses the species richness estimation approach:
        Given n samples with D distinct values, estimate the
        true number of distinct items in population.
        """
        if len(self.samples) < 100:
            return (0, float('inf'))  # Not enough samples
        
        n = len(self.samples)
        D = len(set(self.samples))  # Distinct pages in sample
        
        # Chao1 estimator for species richness
        # Good when there are many rare species (infrequently accessed pages)
        page_counts = Counter(self.samples)
        f1 = sum(1 for c in page_counts.values() if c == 1)  # Singletons
        f2 = sum(1 for c in page_counts.values() if c == 2)  # Doubletons
        
        if f2 == 0:
            f2 = 1  # Avoid division by zero
            
        # Chao1 estimate
        chao1_estimate = D + (f1 * (f1 - 1)) / (2 * (f2 + 1))
        
        # Confidence interval (approximate)
        variance = f2 * (0.5 * (f1/f2)**2 + (f1/f2)**3 + 0.25 * (f1/f2)**4)
        ci_95 = 1.96 * math.sqrt(variance) if variance > 0 else D * 0.1
        
        return (chao1_estimate, ci_95)
    
    def estimate_page_frequency(self, page: int) -> float:
        """Estimate access frequency of a specific page."""
        if not self.samples:
            return 0.0
        count = self.samples.count(page)
        return count / len(self.samples)
    
    def get_hot_pages(self, top_k: int = 10) -> List[Tuple[int, float]]:
        """Return the top-K most frequently accessed pages."""
        page_counts = Counter(self.samples)
        total = len(self.samples)
        return [
            (page, count / total)
            for page, count in page_counts.most_common(top_k)
        ]
 
 
class AdaptiveSamplingEstimator:
    """
    Adaptive sampling that adjusts rate based on observed variance.
    
    Key insight: If WSS is stable, we can reduce sampling rate.
    If WSS is changing, we need higher sample rate for accuracy.
    """
    
    def __init__(self, initial_rate: float = 0.001):
        self.base_estimator = SamplingWSSEstimator(sample_rate=initial_rate)
        self.wss_history: List[float] = []
        self.sample_rate = initial_rate
        self.min_rate = 0.0001
        self.max_rate = 0.01
        
    def record_access(self, page: int):
        """Record access with adaptive sampling."""
        # Use current (potentially adjusted) sample rate
        self.base_estimator.sample_rate = self.sample_rate
        self.base_estimator.record_access(page)
        
        # Periodically adjust sample rate
        if self.base_estimator.total_accesses % 10000 == 0:
            self._adjust_sampling_rate()
    
    def _adjust_sampling_rate(self):
        """Adjust sampling rate based on WSS stability."""
        estimate, ci = self.base_estimator.estimate_wss()
        self.wss_history.append(estimate)
        
        if len(self.wss_history) < 5:
            return
            
        # Check variance of recent WSS estimates
        recent = self.wss_history[-5:]
        mean = sum(recent) / len(recent)
        variance = sum((x - mean)**2 for x in recent) / len(recent)
        cv = math.sqrt(variance) / mean if mean > 0 else 0
        
        if cv > 0.2:  # High variance: WSS is changing
            # Increase sample rate for better tracking
            self.sample_rate = min(self.sample_rate * 1.5, self.max_rate)
        elif cv < 0.05:  # Low variance: WSS is stable
            # Decrease sample rate to save overhead
            self.sample_rate = max(self.sample_rate * 0.75, self.min_rate)
    
    def estimate_wss(self) -> Tuple[float, float]:
        return self.base_estimator.estimate_wss()
 
 
def demonstrate_sampling():
    """Demonstrate sampling-based WSS estimation."""
    print("Sampling-Based WSS Estimation")
    print("=" * 60)
    
    # Create estimator with 0.1% sample rate
    estimator = SamplingWSSEstimator(sample_rate=0.001, window_size=10000)
    
    # Simulate workload: 50 hot pages (80% of accesses) + 200 cold pages
    hot_pages = list(range(50))
    cold_pages = list(range(50, 250))
    
    print("Simulating workload:")
    print("  Hot pages: 0-49 (80% of accesses)")
    print("  Cold pages: 50-249 (20% of accesses)")
    print("  True WSS: 250 pages")
    print()
    
    # Generate accesses
    for i in range(1000000):
        if random.random() < 0.8:
            page = random.choice(hot_pages)
        else:
            page = random.choice(cold_pages)
        estimator.record_access(page)
    
    # Get estimate
    estimate, ci = estimator.estimate_wss()
    
    print(f"After {estimator.total_accesses:,} accesses:")
    print(f"  Samples collected: {len(estimator.samples):,}")
    print(f"  WSS estimate: {estimate:.0f} pages")
    print(f"  95% CI: ±{ci:.0f} pages")
    print(f"  True WSS: 250 pages")
    print(f"  Error: {abs(estimate - 250) / 250 * 100:.1f}%")
    print()
    
    # Show hot pages
    print("Estimated hottest pages:")
    for page, freq in estimator.get_hot_pages(5):
        print(f"  Page {page}: {freq*100:.1f}% of sampled accesses")
 
if __name__ == "__main__":
    demonstrate_sampling()

Species Richness Connection

The problem of estimating the number of distinct pages from a sample is mathematically equivalent to the ecological problem of estimating species richness from a sample of individuals. Techniques like the Chao1 estimator, developed for ecology, apply directly to WSS estimation. This is an example of cross-domain insight enabling better systems design.

Real-World Kernel Implementations

Modern operating systems combine multiple tracking techniques to balance accuracy and overhead. Let's examine how real kernels approach WSS tracking.

Linux Kernel Approach:

Linux doesn't explicitly track 'working set' but uses related mechanisms:

Active/Inactive LRU Lists: Pages are classified by recency
Referenced Bit Aging: Periodic scanning clears reference bits
Memory Pressure Detection: High scan rates indicate working set > memory
Per-Process RSS Tracking: Resident Set Size tracks allocated frames
Cgroup Memory Controller: Can limit and track memory per group

Windows Working Set Manager:

Windows has explicit working set tracking:

Working Set Lists: Per-process list of allocated frames
WS Minimum/Maximum: Guaranteed minimum, enforced maximum
Trimming: When memory pressure high, trim working sets
Balance Set Manager: System-wide coordination

WSS Tracking in Major Operating Systems
Aspect	Linux	Windows	macOS/Darwin
Tracking unit	Pages in LRU lists	Working set pages	Resident pages
Reference tracking	Reference bit + aging	Reference bit + aging	Reference bit + pressure
Per-process info	RSS, /proc/pid/smaps	WS min/max, peak	Resident/virtual
Trimming trigger	kswapd watermarks	Balance set manager	Jetsam / memory pressure
Application visibility	/proc, cgroups	QueryWorkingSet API	Activity Monitor

Linux /proc/[pid]/smaps Example:

Linux exposes detailed per-process memory information through /proc:

/proc/1234/smaps shows:
Size:           4096 kB     # Virtual size of mapping
Rss:            1024 kB     # Currently resident pages
Pss:             512 kB     # Proportional share (for shared pages)
Referenced:      768 kB     # Pages with reference bit set
Anonymous:      1024 kB     # Anonymous (non-file) pages
...

The 'Referenced' field approximates the working set: pages accessed since the kernel last cleared reference bits.

Practical Monitoring Tools:

top / htop: Shows RES/RSS (resident set size)
ps aux: VSZ (virtual size), RSS (resident size)
vmstat: System-wide page fault rates
perf: Hardware performance counters for TLB behavior
valgrind --tool=massif: Heap profiling for application tuning

WSS Tools for Application Developers

For application developers wanting to understand their working set: (1) Use perf stat to see TLB miss rates, (2) Monitor RSS growth over time, (3) Use memory profilers to identify hot allocations, (4) Test with different memory limits to find the 'knee' where performance degrades. Understanding your application's WSS helps you set appropriate memory limits and predict behavior under memory pressure.

Summary and Looking Ahead

Tracking the working set size is essential for informed memory allocation, but exact tracking is prohibitively expensive. Practical systems use approximations—reference bit aging, active/inactive lists, hardware counters, and statistical sampling—to balance accuracy and overhead.

Key Concepts Mastered

•The tracking challenge: Exact WSS tracking is impractical; we need approximations
•Reference bit tracking: Hardware reference bits enable low-overhead recency tracking
•Multi-level classification: Active/inactive lists provide efficient page categorization
•Hardware counters: TLB and cache miss counters provide indirect WSS signals
•Statistical sampling: Sample-based estimation achieves accuracy with minimal overhead
•Real implementations: Linux, Windows, and other OSes combine multiple techniques

Page Complete

You now understand how operating systems track working set size in practice, balancing the need for accurate information against the overhead of gathering it. In the final page, we'll see how tracked WSS information is used to make allocation decisions.

Next Up: Allocation Based on WSS

With WSS tracking in place, we can finally close the loop: using working set information to make dynamic frame allocation decisions. We'll explore how the working set model translates into practical allocation policies, including integration with thrashing prevention and load control.