Loading content...
Knowing the precise working set size would allow perfect memory allocation—no more, no less than needed. But here's the challenge: modern systems execute billions of memory references per second. Recording every reference to compute the exact working set would consume more CPU cycles than the original computation. We need clever approximations that provide useful information without overwhelming the system.
This is the WSS tracking problem: how do we estimate the working set size of each process accurately enough to make good allocation decisions, while keeping overhead low enough to be practical? The solutions involve a combination of hardware support, statistical sampling, and algorithmic cleverness.
By the end of this page, you will understand the fundamental WSS tracking algorithms, their accuracy-overhead trade-offs, hardware support for tracking, modern kernel implementations, and practical considerations for WSS monitoring. You'll learn how to reason about tracking overhead and choose appropriate mechanisms for different system requirements.
To understand why WSS tracking is challenging, let's examine what we'd need for exact tracking versus what's practically achievable.
Exact Tracking Requirements:
To compute W(t, Δ) exactly, we would need to:
The overhead would dwarf the actual computation.
Practical Tracking Goals:
Instead of exact values, we seek an estimate of WSS that:
| Mechanism | Accuracy | CPU Overhead | Storage Overhead | Practical |
|---|---|---|---|---|
| Exact trace | 100% | 10-100x slowdown | Gigabytes/second | ❌ Analysis only |
| Per-reference counter | 95%+ | High (trap overhead) | Counter per page | ❌ Too slow |
| Reference bit polling | 80-90% | Low (periodic) | Bits per page | ✓ Most common |
| Hardware counters | 90%+ | Very low | Minimal | ✓ If available |
| Statistical sampling | 85%+ | Very low | Sample buffer | ✓ Modern approach |
For memory allocation decisions, we don't need exact WSS values. What matters is: (1) detecting when WSS increases significantly (need more frames), (2) detecting when WSS decreases (can reclaim frames), and (3) comparing relative WSS between processes. Approximate tracking serves these purposes well.
The most common WSS tracking mechanism leverages the hardware reference bit (also called accessed bit) provided by CPU page table hardware.
How Reference Bits Work:
This hardware support is universal in modern CPUs and provides the foundation for practical WSS tracking.
Basic Reference Bit Tracking Algorithm:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144
/* * Reference Bit Based WSS Tracking * * Uses hardware reference bits to track which pages are actively used. * This is the foundation of most practical WSS tracking. */ #include <stdint.h>#include <stdbool.h> #define PAGE_TABLE_SIZE 65536 // Example: 256MB address space with 4KB pages#define HISTORY_INTERVALS 8 // Track access over 8 timer intervals typedef struct { uint64_t physical_frame : 52; // Physical frame number uint64_t reserved : 3; uint64_t accessed : 1; // Hardware reference bit (R bit) uint64_t dirty : 1; // Hardware modified bit (M bit) uint64_t present : 1; // Page present in memory uint64_t writable : 1; uint64_t user : 1; // ... other bits} PageTableEntry; typedef struct { PageTableEntry page_table[PAGE_TABLE_SIZE]; uint8_t access_history[PAGE_TABLE_SIZE]; // Per-page access history int current_wss; uint64_t scan_count;} ProcessMemory; /* * Timer interrupt handler for WSS tracking. * Called every timer tick (e.g., every 10ms). */void wss_timer_tick(ProcessMemory *proc) { int accessed_this_interval = 0; int in_working_set = 0; for (int page = 0; page < PAGE_TABLE_SIZE; page++) { PageTableEntry *pte = &proc->page_table[page]; // Only track present pages if (!pte->present) { continue; } // Shift history right (age all entries) proc->access_history[page] >>= 1; // Check and record reference bit if (pte->accessed) { proc->access_history[page] |= 0x80; // Set high bit pte->accessed = 0; // Clear for next interval accessed_this_interval++; } // Page is in working set if any history bit is set if (proc->access_history[page] != 0) { in_working_set++; } } proc->current_wss = in_working_set; proc->scan_count++; // Could trigger allocation adjustment here if WSS changed significantly // log_wss_change(proc->current_wss, accessed_this_interval);} /* * Get current WSS estimate. */int get_wss(ProcessMemory *proc) { return proc->current_wss;} /* * Check if specific page is in working set. */bool is_in_working_set(ProcessMemory *proc, int page) { if (page < 0 || page >= PAGE_TABLE_SIZE) return false; return proc->access_history[page] != 0;} /* * Get detailed age for a page (lower = more recent). * Useful for replacement decisions. */int get_page_recency(ProcessMemory *proc, int page) { if (page < 0 || page >= PAGE_TABLE_SIZE) return HISTORY_INTERVALS + 1; uint8_t history = proc->access_history[page]; if (history == 0) return HISTORY_INTERVALS + 1; // Not in WS // Count leading zeros to find most recent access int age = 0; uint8_t mask = 0x80; while ((history & mask) == 0 && age < HISTORY_INTERVALS) { mask >>= 1; age++; } return age;} /* * Compute WSS with effective smaller window. * Allows dynamic window adjustment without re-tracking. */int get_wss_with_window(ProcessMemory *proc, int intervals) { if (intervals > HISTORY_INTERVALS) intervals = HISTORY_INTERVALS; // Create mask for desired interval count uint8_t mask = 0; for (int i = 0; i < intervals; i++) { mask |= (0x80 >> i); } int wss = 0; for (int page = 0; page < PAGE_TABLE_SIZE; page++) { if (!proc->page_table[page].present) continue; if ((proc->access_history[page] & mask) != 0) { wss++; } } return wss;} /* * Compute WSS breakdown by recency. * Returns how many pages are at each age level. */void get_wss_age_distribution(ProcessMemory *proc, int *distribution) { // distribution[i] = count of pages with age i for (int i = 0; i <= HISTORY_INTERVALS; i++) { distribution[i] = 0; } for (int page = 0; page < PAGE_TABLE_SIZE; page++) { if (!proc->page_table[page].present) continue; int age = get_page_recency(proc, page); distribution[age]++; }}Scanning all page table entries on every timer tick can be expensive for processes with large address spaces. Real implementations use optimizations: scan only resident pages (skip PTEs for swapped pages), use hierarchical page tables to skip empty regions, batch updates across multiple ticks, or use hardware performance counters instead of PTE scanning.
Real systems often use multi-level tracking schemes that provide different granularities of information about page usage.
The Two-Level Approach:
Many kernels maintain two categories of pages:
This binary classification is simpler than tracking exact ages but provides the essential information for allocation decisions.
Linux's Active/Inactive Lists:
Linux maintains linked lists rather than strict working sets:
This enables O(1) list operations while approximating working set behavior.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254
# Multi-level WSS tracking with Active/Inactive lists from collections import dequefrom dataclasses import dataclassfrom enum import Enumfrom typing import Dict, Optional class PageState(Enum): NOT_PRESENT = 0 INACTIVE = 1 ACTIVE = 2 @dataclassclass Page: page_num: int state: PageState access_bit: bool = False modified: bool = False class ActiveInactiveTracker: """ Two-list working set tracking inspired by Linux's approach. Provides approximate WSS through active/inactive classification rather than exact history tracking. """ def __init__(self, max_active_ratio=0.75): # LRU lists (deques for O(1) operations at both ends) self.active_list = deque() # Recently accessed self.inactive_list = deque() # Candidates for eviction # Quick lookup self.pages: Dict[int, Page] = {} # Configuration self.max_active_ratio = max_active_ratio # Statistics self.promotions = 0 self.demotions = 0 def access_page(self, page_num: int) -> bool: """ Record a page access. Returns True if page was already resident. """ if page_num in self.pages: page = self.pages[page_num] page.access_bit = True if page.state == PageState.INACTIVE: # Promote to active on second access self._promote_to_active(page) return True else: # Page fault - not tracked here, but would load page return False def add_page(self, page_num: int, to_active: bool = False): """Add a new page (e.g., after loading from disk).""" page = Page(page_num=page_num, state=PageState.NOT_PRESENT) self.pages[page_num] = page if to_active: self._add_to_active(page) else: # New pages typically go to inactive first self._add_to_inactive(page) def _add_to_inactive(self, page: Page): """Add page to head of inactive list.""" page.state = PageState.INACTIVE self.inactive_list.appendleft(page.page_num) def _add_to_active(self, page: Page): """Add page to head of active list.""" page.state = PageState.ACTIVE self.active_list.appendleft(page.page_num) self._balance_lists() def _promote_to_active(self, page: Page): """Move page from inactive to active list.""" if page.state != PageState.INACTIVE: return # Remove from inactive try: self.inactive_list.remove(page.page_num) except ValueError: pass # Add to active page.state = PageState.ACTIVE self.active_list.appendleft(page.page_num) page.access_bit = False self.promotions += 1 self._balance_lists() def _demote_to_inactive(self, page: Page): """Move page from active to inactive list.""" if page.state != PageState.ACTIVE: return # Remove from active try: self.active_list.remove(page.page_num) except ValueError: pass # Add to inactive page.state = PageState.INACTIVE self.inactive_list.appendleft(page.page_num) page.access_bit = False self.demotions += 1 def _balance_lists(self): """ Maintain target active/inactive ratio. If active list is too large, demote oldest active pages. """ total = len(self.active_list) + len(self.inactive_list) target_active = int(total * self.max_active_ratio) while len(self.active_list) > target_active and self.active_list: # Demote oldest active page (at tail) oldest = self.active_list.pop() if oldest in self.pages: page = self.pages[oldest] page.state = PageState.INACTIVE self.inactive_list.appendleft(oldest) self.demotions += 1 def scan_for_aging(self): """ Periodic scan to age pages (like timer interrupt). Active pages without recent access are demoted. Inactive pages without recent access are eviction candidates. """ # Process active list to_demote = [] for page_num in self.active_list: if page_num in self.pages: page = self.pages[page_num] if page.access_bit: page.access_bit = False # Give another chance else: to_demote.append(page_num) # Demote pages that weren't accessed for page_num in to_demote[:len(to_demote)//4]: # Demote 25% at a time if page_num in self.pages: self._demote_to_inactive(self.pages[page_num]) def select_eviction_candidate(self) -> Optional[int]: """ Select a page for eviction from inactive list tail. """ while self.inactive_list: candidate = self.inactive_list.pop() if candidate not in self.pages: continue page = self.pages[candidate] # Give second chance if accessed if page.access_bit: page.access_bit = False # Move to front of inactive (second chance) self.inactive_list.appendleft(candidate) continue return candidate # Inactive list exhausted - take from active list if self.active_list: return self.active_list.pop() return None def get_working_set_size(self) -> int: """ Estimate WSS as size of active list. This is an approximation: the "actively used" pages. """ return len(self.active_list) def get_total_resident(self) -> int: """Get total resident pages (active + inactive).""" return len(self.active_list) + len(self.inactive_list) def get_statistics(self): """Get tracking statistics.""" return { 'active_size': len(self.active_list), 'inactive_size': len(self.inactive_list), 'total_resident': self.get_total_resident(), 'wss_estimate': self.get_working_set_size(), 'promotions': self.promotions, 'demotions': self.demotions, 'active_ratio': len(self.active_list) / max(1, self.get_total_resident()) } # Demonstrationdef demonstrate_active_inactive(): """Show how active/inactive tracking works.""" print("Active/Inactive List WSS Tracking") print("=" * 60) tracker = ActiveInactiveTracker(max_active_ratio=0.5) # Simulate loading pages for i in range(20): tracker.add_page(i) print("Initial state (all pages inactive):") print(f" {tracker.get_statistics()}") # Simulate access pattern: hot set of 5 pages print("Simulating accesses to pages 0-4 (hot set)...") for _ in range(10): for page in range(5): tracker.access_page(page) tracker.scan_for_aging() print(f"After hot set accesses:") print(f" {tracker.get_statistics()}") # Access some cold pages print("Accessing some 'cold' pages 15-19...") for page in range(15, 20): tracker.access_page(page) tracker.scan_for_aging() print(f"After cold page accesses:") print(f" {tracker.get_statistics()}") # Select eviction candidates print("Eviction candidates (from inactive list):") for i in range(3): victim = tracker.select_eviction_candidate() print(f" Eviction {i+1}: page {victim}") if __name__ == "__main__": demonstrate_active_inactive()Modern processors provide various hardware features that can assist with WSS tracking, reducing software overhead while improving accuracy.
Hardware Performance Counters:
Most modern CPUs include Performance Monitoring Units (PMUs) that can count cache and TLB events:
These counters operate in hardware with zero software overhead during normal execution.
Intel Processor Trace / PEBS:
Advanced processors offer precise event-based sampling (PEBS) or processor trace:
| Feature | Architecture | Information Provided | Overhead |
|---|---|---|---|
| Reference bit | All modern | Binary: accessed or not | None (automatic) |
| TLB miss counter | Most modern | Working set ≫ TLB size | None (read counter) |
| PEBS (Intel) | Intel Core+ | Sampled memory addresses | 1-5% typical |
| IBS (AMD) | AMD Fam10h+ | Sampled instruction/data | 1-5% typical |
| ARM SPE | ARMv8.2+ | Statistical profiling | 1-5% typical |
| Intel PT | Broadwell+ | Full trace (debugger) | 5-15% typical |
The TLB (Translation Lookaside Buffer) naturally caches page table entries for recently accessed pages—essentially acting as a hardware-maintained approximation of the working set. If a process's working set fits in the TLB, TLB misses are rare. High TLB miss rates indicate the working set exceeds TLB capacity, suggesting more memory allocation might help.
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174
/* * Hardware Performance Counter Based WSS Estimation * * Uses CPU performance counters to estimate working set * characteristics with minimal overhead. */ #include <stdint.h>#include <stdbool.h> /* Linux perf_event interface (simplified) */struct perf_event_attr { uint32_t type; uint64_t config; // ... other fields}; /* Counter IDs for WSS-relevant events */#define PERF_TYPE_HARDWARE 0 /* Simulated hardware counter events */typedef enum { EVENT_DTLB_LOAD_MISSES, EVENT_DTLB_STORE_MISSES, EVENT_PAGE_WALKS, EVENT_L3_MISSES, EVENT_MEMORY_REFERENCES} HWCounterEvent; typedef struct { uint64_t dtlb_misses; uint64_t page_walks; uint64_t l3_misses; uint64_t mem_refs; uint64_t sample_time;} HWCounterSnapshot; typedef struct { HWCounterSnapshot current; HWCounterSnapshot previous; /* Derived WSS estimates */ double estimated_wss_pages; double miss_rate; double working_set_ratio; /* WSS / TLB size estimate */} WSSEstimator; /* * Initialize hardware counter based WSS estimator. */void init_wss_estimator(WSSEstimator *est) { est->current = (HWCounterSnapshot){0}; est->previous = (HWCounterSnapshot){0}; est->estimated_wss_pages = 0; est->miss_rate = 0; est->working_set_ratio = 0;} /* * Sample hardware counters (called periodically). */void sample_hw_counters(WSSEstimator *est) { /* In real implementation, would read MSRs or use perf_event */ /* Here we show the interface concept */ est->previous = est->current; /* Read counters (pseudo-code) */ // est->current.dtlb_misses = rdpmc(DTLB_MISS_COUNTER); // est->current.page_walks = rdpmc(PAGE_WALK_COUNTER); // etc. /* Compute derived metrics */ uint64_t dtlb_delta = est->current.dtlb_misses - est->previous.dtlb_misses; uint64_t refs_delta = est->current.mem_refs - est->previous.mem_refs; if (refs_delta > 0) { est->miss_rate = (double)dtlb_delta / refs_delta; }} /* * Estimate WSS from TLB behavior. * * Key insight: TLB miss rate correlates with WSS / TLB_SIZE ratio. * - Low miss rate → WSS fits in TLB * - High miss rate → WSS >> TLB size */double estimate_wss_from_tlb(WSSEstimator *est, int tlb_entries) { /* * Simple model: If miss rate is M and TLB has T entries, * then roughly WSS ≈ T / (1 - M) for M < 1. * * This is a rough approximation; real models are more complex. */ if (est->miss_rate < 0.01) { /* Very low miss rate: WSS fits in TLB */ return tlb_entries * 0.5; /* Conservative estimate */ } else if (est->miss_rate > 0.9) { /* Very high miss rate: WSS much larger than TLB */ return tlb_entries * 10; /* Working set is large */ } else { /* Interpolate based on miss rate */ return tlb_entries / (1.0 - est->miss_rate); }} /* * Detect working set size changes from counter deltas. */typedef enum { WSS_STABLE, WSS_GROWING, WSS_SHRINKING} WSSChangeStatus; WSSChangeStatus detect_wss_change(WSSEstimator *est) { /* Previous miss rate estimate */ static double prev_miss_rate = 0; double change = est->miss_rate - prev_miss_rate; prev_miss_rate = est->miss_rate; if (change > 0.05) { /* Miss rate increasing → WSS likely growing */ return WSS_GROWING; } else if (change < -0.05) { /* Miss rate decreasing → WSS likely shrinking */ return WSS_SHRINKING; } return WSS_STABLE;} /* * Recommend allocation change based on hardware counters. */int recommend_allocation_change(WSSEstimator *est) { if (est->miss_rate > 0.1) { /* High miss rate: could benefit from more memory */ return +10; /* Suggest adding pages */ } else if (est->miss_rate < 0.01) { /* Very low miss rate: might have excess allocation */ return -5; /* Suggest reclaiming pages */ } return 0; /* Current allocation seems appropriate */} /* * Example usage in kernel memory manager. */void memory_manager_tick(void) { static WSSEstimator per_process_est[MAX_PROCESSES]; for (int pid = 0; pid < active_process_count; pid++) { WSSEstimator *est = &per_process_est[pid]; /* Sample this process's counters */ // switch_to_process_context(pid); sample_hw_counters(est); /* Check for WSS changes */ WSSChangeStatus status = detect_wss_change(est); if (status == WSS_GROWING) { /* Consider allocating more frames */ int delta = recommend_allocation_change(est); // request_additional_frames(pid, delta); } else if (status == WSS_SHRINKING) { /* Consider reclaiming frames */ int delta = recommend_allocation_change(est); // reclaim_frames(pid, -delta); } }}Rather than tracking every access or scanning all pages, statistical sampling provides accurate WSS estimates with very low overhead.
The Sampling Insight:
If pages are sampled randomly and independently:
Random Sampling Schemes:
1. Time-Based Sampling:
2. Address Space Sampling:
3. Hardware-Triggered Sampling (PEBS/IBS):
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185
# Statistical Sampling for WSS Estimation import randomimport mathfrom collections import Counterfrom typing import List, Set, Tuple class SamplingWSSEstimator: """ Estimate WSS through statistical sampling. Instead of tracking all pages, we sample memory accesses and estimate WSS from the sample distribution. """ def __init__(self, sample_rate: float = 0.001, window_size: int = 10000): """ Args: sample_rate: Fraction of accesses to sample (e.g., 0.001 = 0.1%) window_size: Number of samples to retain for estimation """ self.sample_rate = sample_rate self.window_size = window_size self.samples: List[int] = [] # Recent page samples self.total_accesses = 0 def record_access(self, page: int): """ Record a memory access (may or may not be sampled). """ self.total_accesses += 1 # Probabilistically sample this access if random.random() < self.sample_rate: self.samples.append(page) # Maintain window size if len(self.samples) > self.window_size: self.samples.pop(0) def estimate_wss(self) -> Tuple[float, float]: """ Estimate WSS from samples. Returns (estimate, confidence_interval_95). Uses the species richness estimation approach: Given n samples with D distinct values, estimate the true number of distinct items in population. """ if len(self.samples) < 100: return (0, float('inf')) # Not enough samples n = len(self.samples) D = len(set(self.samples)) # Distinct pages in sample # Chao1 estimator for species richness # Good when there are many rare species (infrequently accessed pages) page_counts = Counter(self.samples) f1 = sum(1 for c in page_counts.values() if c == 1) # Singletons f2 = sum(1 for c in page_counts.values() if c == 2) # Doubletons if f2 == 0: f2 = 1 # Avoid division by zero # Chao1 estimate chao1_estimate = D + (f1 * (f1 - 1)) / (2 * (f2 + 1)) # Confidence interval (approximate) variance = f2 * (0.5 * (f1/f2)**2 + (f1/f2)**3 + 0.25 * (f1/f2)**4) ci_95 = 1.96 * math.sqrt(variance) if variance > 0 else D * 0.1 return (chao1_estimate, ci_95) def estimate_page_frequency(self, page: int) -> float: """Estimate access frequency of a specific page.""" if not self.samples: return 0.0 count = self.samples.count(page) return count / len(self.samples) def get_hot_pages(self, top_k: int = 10) -> List[Tuple[int, float]]: """Return the top-K most frequently accessed pages.""" page_counts = Counter(self.samples) total = len(self.samples) return [ (page, count / total) for page, count in page_counts.most_common(top_k) ] class AdaptiveSamplingEstimator: """ Adaptive sampling that adjusts rate based on observed variance. Key insight: If WSS is stable, we can reduce sampling rate. If WSS is changing, we need higher sample rate for accuracy. """ def __init__(self, initial_rate: float = 0.001): self.base_estimator = SamplingWSSEstimator(sample_rate=initial_rate) self.wss_history: List[float] = [] self.sample_rate = initial_rate self.min_rate = 0.0001 self.max_rate = 0.01 def record_access(self, page: int): """Record access with adaptive sampling.""" # Use current (potentially adjusted) sample rate self.base_estimator.sample_rate = self.sample_rate self.base_estimator.record_access(page) # Periodically adjust sample rate if self.base_estimator.total_accesses % 10000 == 0: self._adjust_sampling_rate() def _adjust_sampling_rate(self): """Adjust sampling rate based on WSS stability.""" estimate, ci = self.base_estimator.estimate_wss() self.wss_history.append(estimate) if len(self.wss_history) < 5: return # Check variance of recent WSS estimates recent = self.wss_history[-5:] mean = sum(recent) / len(recent) variance = sum((x - mean)**2 for x in recent) / len(recent) cv = math.sqrt(variance) / mean if mean > 0 else 0 if cv > 0.2: # High variance: WSS is changing # Increase sample rate for better tracking self.sample_rate = min(self.sample_rate * 1.5, self.max_rate) elif cv < 0.05: # Low variance: WSS is stable # Decrease sample rate to save overhead self.sample_rate = max(self.sample_rate * 0.75, self.min_rate) def estimate_wss(self) -> Tuple[float, float]: return self.base_estimator.estimate_wss() def demonstrate_sampling(): """Demonstrate sampling-based WSS estimation.""" print("Sampling-Based WSS Estimation") print("=" * 60) # Create estimator with 0.1% sample rate estimator = SamplingWSSEstimator(sample_rate=0.001, window_size=10000) # Simulate workload: 50 hot pages (80% of accesses) + 200 cold pages hot_pages = list(range(50)) cold_pages = list(range(50, 250)) print("Simulating workload:") print(" Hot pages: 0-49 (80% of accesses)") print(" Cold pages: 50-249 (20% of accesses)") print(" True WSS: 250 pages") print() # Generate accesses for i in range(1000000): if random.random() < 0.8: page = random.choice(hot_pages) else: page = random.choice(cold_pages) estimator.record_access(page) # Get estimate estimate, ci = estimator.estimate_wss() print(f"After {estimator.total_accesses:,} accesses:") print(f" Samples collected: {len(estimator.samples):,}") print(f" WSS estimate: {estimate:.0f} pages") print(f" 95% CI: ±{ci:.0f} pages") print(f" True WSS: 250 pages") print(f" Error: {abs(estimate - 250) / 250 * 100:.1f}%") print() # Show hot pages print("Estimated hottest pages:") for page, freq in estimator.get_hot_pages(5): print(f" Page {page}: {freq*100:.1f}% of sampled accesses") if __name__ == "__main__": demonstrate_sampling()The problem of estimating the number of distinct pages from a sample is mathematically equivalent to the ecological problem of estimating species richness from a sample of individuals. Techniques like the Chao1 estimator, developed for ecology, apply directly to WSS estimation. This is an example of cross-domain insight enabling better systems design.
Modern operating systems combine multiple tracking techniques to balance accuracy and overhead. Let's examine how real kernels approach WSS tracking.
Linux Kernel Approach:
Linux doesn't explicitly track 'working set' but uses related mechanisms:
Windows Working Set Manager:
Windows has explicit working set tracking:
| Aspect | Linux | Windows | macOS/Darwin |
|---|---|---|---|
| Tracking unit | Pages in LRU lists | Working set pages | Resident pages |
| Reference tracking | Reference bit + aging | Reference bit + aging | Reference bit + pressure |
| Per-process info | RSS, /proc/pid/smaps | WS min/max, peak | Resident/virtual |
| Trimming trigger | kswapd watermarks | Balance set manager | Jetsam / memory pressure |
| Application visibility | /proc, cgroups | QueryWorkingSet API | Activity Monitor |
Linux /proc/[pid]/smaps Example:
Linux exposes detailed per-process memory information through /proc:
/proc/1234/smaps shows:
Size: 4096 kB # Virtual size of mapping
Rss: 1024 kB # Currently resident pages
Pss: 512 kB # Proportional share (for shared pages)
Referenced: 768 kB # Pages with reference bit set
Anonymous: 1024 kB # Anonymous (non-file) pages
...
The 'Referenced' field approximates the working set: pages accessed since the kernel last cleared reference bits.
Practical Monitoring Tools:
top / htop: Shows RES/RSS (resident set size)ps aux: VSZ (virtual size), RSS (resident size)vmstat: System-wide page fault ratesperf: Hardware performance counters for TLB behaviorvalgrind --tool=massif: Heap profiling for application tuningFor application developers wanting to understand their working set: (1) Use perf stat to see TLB miss rates, (2) Monitor RSS growth over time, (3) Use memory profilers to identify hot allocations, (4) Test with different memory limits to find the 'knee' where performance degrades. Understanding your application's WSS helps you set appropriate memory limits and predict behavior under memory pressure.
Tracking the working set size is essential for informed memory allocation, but exact tracking is prohibitively expensive. Practical systems use approximations—reference bit aging, active/inactive lists, hardware counters, and statistical sampling—to balance accuracy and overhead.
You now understand how operating systems track working set size in practice, balancing the need for accurate information against the overhead of gathering it. In the final page, we'll see how tracked WSS information is used to make allocation decisions.
Next Up: Allocation Based on WSS
With WSS tracking in place, we can finally close the loop: using working set information to make dynamic frame allocation decisions. We'll explore how the working set model translates into practical allocation policies, including integration with thrashing prevention and load control.