In 1968, Peter Denning introduced a concept that would fundamentally transform how operating systems manage memory: the working set model. Before this breakthrough, memory management was largely reactive—systems would respond to page faults as they occurred, often leading to catastrophic performance degradation known as thrashing.
The working set model provided something revolutionary: a predictive framework for understanding which pages a process truly needs at any given moment. Rather than treating all pages equally or making arbitrary eviction decisions, the working set approach recognizes a fundamental truth about program behavior—locality of reference.
This page explores the working set model in exhaustive detail, from its theoretical foundations to its mathematical formalization, establishing the groundwork for understanding modern memory management policies.
By the end of this page, you will understand: (1) The historical context and motivation for the working set model, (2) The principle of locality and its three manifestations, (3) The formal mathematical definition of a working set, (4) How working sets capture program behavior, and (5) Why the working set model became foundational to memory management theory.
To appreciate the significance of the working set model, we must first understand the severe challenges that plagued early multiprogramming systems.
The Early Virtual Memory Dilemma:
When virtual memory systems were first developed in the 1960s, designers faced a fundamental question: How many frames should each process receive? The candidate answers, summarized in the table below, seemed deceptively simple.
Each of these approaches harbored critical flaws that would manifest catastrophically under load.
| Strategy | Approach | Critical Flaw | Failure Mode |
|---|---|---|---|
| Equal Allocation | Divide frames equally among processes | Ignores actual memory needs | Small processes waste frames; large processes starve |
| Free Competition | Global replacement, no per-process limits | No isolation between processes | One process can steal all frames from others |
| Size-Proportional | Allocate based on virtual address space size | Confuses address space with actual usage | Sparse address spaces get excessive allocation |
| Fixed Minimum | Ensure each process has minimum frames | Static allocation ignores phase behavior | Cannot adapt to changing memory demands |
The Thrashing Catastrophe:
The most devastating failure mode was thrashing, a condition in which the system spends more time handling page faults than executing useful work. Consider this progression:
1. A process holds too few frames for the pages it is actively using, so it faults frequently.
2. Each fault forces a page-out and a page-in on the slow paging device, and the process stalls while it waits.
3. With many processes waiting on the paging device, CPU utilization drops.
4. The scheduler, seeing an idle CPU, raises the degree of multiprogramming, and the newly admitted processes take frames away from those already struggling.
5. Fault rates climb further, and throughput collapses.
The terrifying aspect of thrashing is its positive feedback loop—once it starts, it accelerates. More page faults lead to more context switches, which leads to more diverse page references, which leads to more page faults.
Traditional wisdom suggested that increasing the degree of multiprogramming (more processes) would increase CPU utilization. But beyond a certain point, the opposite occurs—utilization plummets. Before the working set model, system administrators had no principled way to determine this tipping point.
What Was Missing:
The fundamental gap in early approaches was predictive understanding of process memory behavior. Systems needed to answer: How many frames does a given process actually need right now? Which of its pages matter at this moment? And how many processes can safely run at once before the system tips into thrashing?
Peter Denning's working set model provided elegant answers to all these questions by formalizing the observation that programs exhibit locality of reference.
The working set model rests on a profound empirical observation about program behavior: programs do not reference their entire address space uniformly. Instead, at any given time, they focus their memory accesses on a small subset of their pages. This phenomenon is called the principle of locality, and it manifests in three distinct forms.
Why Locality Exists:
Locality is not accidental; it emerges from fundamental properties of how programs are written and laid out in memory:
Program Structure: Instructions execute largely in sequence, and loops run the same small group of instructions over and over, so code references cluster on a few pages at a time.
Algorithm Behavior: Most algorithms work on one data structure, or one region of it, for an extended stretch before moving on, as when traversing an array or processing records one at a time.
Memory Layout: Related data is placed together; stack frames, array elements, and the fields of a structure occupy adjacent addresses, so touching one item tends to touch its neighbors on the same page.
Empirical studies consistently show that programs spend approximately 90% of their execution time accessing only 10% of their code and data. This extreme concentration makes intelligent memory management not just possible, but highly effective. If programs truly used all their pages uniformly, virtual memory would be far less practical.
Locality and Program Phases:
Programs don't maintain constant locality—they move through phases, each with its own working set of pages:
| Phase | Example | Pages Referenced |
|---|---|---|
| Initialization | Program startup, library loading | Loader, initialization routines, global data |
| Input Processing | Reading configuration, parsing input | I/O libraries, parser code, input buffers |
| Computation | Core algorithm execution | Algorithm code, working data structures |
| Output Generation | Producing results | Output routines, formatting code, output buffers |
| Cleanup | Resource deallocation, exit | Cleanup routines, finalization code |
During each phase, the program's memory requirements may differ dramatically. The working set model captures this phase behavior by defining the working set relative to recent history, automatically adapting as the program transitions between phases.
With the principle of locality established, we can now formalize the working set concept. Denning's definition is elegant and precise:
Definition: The working set of a process at virtual time t with window size Δ, denoted W(t, Δ), is the set of pages referenced during the time interval from t - Δ to t.
Let's unpack each component of this definition carefully.
Mathematical Formalization:
Let r₁, r₂, r₃, ... be the sequence of page references made by a process, where rᵢ is the page referenced at virtual time i.
Then the working set is defined as:
W(t, Δ) = { rᵢ | t - Δ < i ≤ t }
And the working set size is:
|W(t, Δ)| = number of distinct pages in W(t, Δ)
Consider a page reference string: 1, 2, 3, 2, 1, 4, 5, 4, 3, 2
At t = 10 with Δ = 5, we look at the references at positions 6 through 10, which are: 4, 5, 4, 3, 2.
W(10, 5) = {4, 5, 3, 2} = {2, 3, 4, 5} |W(10, 5)| = 4 pages
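To make the definition concrete, here is a minimal Python sketch (the helper name `working_set` is illustrative, not from Denning's paper) that computes W(t, Δ) directly from the set-builder definition above; run on this reference string it reproduces W(10, 5) = {2, 3, 4, 5}.

```python
def working_set(refs, t, delta):
    """Return W(t, delta): the distinct pages referenced in the
    window t - delta < i <= t, where refs[i - 1] is the page
    referenced at virtual time i (1-indexed)."""
    return {refs[i - 1] for i in range(max(1, t - delta + 1), t + 1)}

refs = [1, 2, 3, 2, 1, 4, 5, 4, 3, 2]
ws = working_set(refs, t=10, delta=5)
print(sorted(ws), len(ws))   # [2, 3, 4, 5] 4
```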
Properties of Working Sets:
Working sets exhibit several important mathematical properties:
1. Monotonicity with Δ: If Δ₁ ≤ Δ₂, then W(t, Δ₁) ⊆ W(t, Δ₂)
A larger window always includes at least the pages in a smaller window, so the working set (and its size) never shrinks as Δ grows; widening the window can only add pages.
2. Bounds on Working Set Size: 1 ≤ |W(t, Δ)| ≤ min(Δ, N)
Where N is the total number of pages in the process's address space. The working set always contains at least one page (the most recently referenced) and at most Δ pages (if every reference in the window is to a different page) or N pages (if the entire address space is covered).
3. Inclusion Property: If page P ∈ W(t, Δ), then either P was referenced at time t itself, or P was already a member of W(t-1, Δ). A page can enter the working set only by being referenced, and it leaves only after going unreferenced for Δ time units.
4. Temporal Coherence: For most programs, W(t, Δ) ≈ W(t+1, Δ) — the working set changes slowly over time. Dramatic changes occur only at phase transitions.
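These properties can be checked mechanically on any trace. Here is a small sketch, reusing the same illustrative `working_set` helper (restated so the snippet stands alone), that asserts monotonicity and the size bounds over a sample reference string:

```python
def working_set(refs, t, delta):
    # W(t, delta) = { refs[i - 1] : t - delta < i <= t }
    return {refs[i - 1] for i in range(max(1, t - delta + 1), t + 1)}

refs = [1, 2, 3, 2, 1, 4, 1, 2, 5, 1, 2, 3, 4, 5]
N = len(set(refs))                     # distinct pages that appear in the trace

for t in range(1, len(refs) + 1):
    for d1 in range(1, 9):
        # Bounds: 1 <= |W(t, d1)| <= min(d1, N)
        size = len(working_set(refs, t, d1))
        assert 1 <= size <= min(d1, N)
        for d2 in range(d1, 9):
            # Monotonicity: a wider window never loses pages.
            assert working_set(refs, t, d1) <= working_set(refs, t, d2)

print("monotonicity and size bounds hold on this trace")
```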
To build intuition for how working sets behave, let's trace through a detailed example. Consider a program with the following page reference string:
Reference String: 1, 2, 3, 2, 1, 4, 1, 2, 5, 1, 2, 3, 4, 5
We'll compute the working set at each time step with Δ = 4 (a window of 4 references).
| Time (t) | Reference | Window References | W(t, 4) | |W(t, 4)| |
|---|---|---|---|---|
| 1 | 1 | 1 | {1} | 1 |
| 2 | 2 | 1, 2 | {1, 2} | 2 |
| 3 | 3 | 1, 2, 3 | {1, 2, 3} | 3 |
| 4 | 2 | 1, 2, 3, 2 | {1, 2, 3} | 3 |
| 5 | 1 | 2, 3, 2, 1 | {1, 2, 3} | 3 |
| 6 | 4 | 3, 2, 1, 4 | {1, 2, 3, 4} | 4 |
| 7 | 1 | 2, 1, 4, 1 | {1, 2, 4} | 3 |
| 8 | 2 | 1, 4, 1, 2 | {1, 2, 4} | 3 |
| 9 | 5 | 4, 1, 2, 5 | {1, 2, 4, 5} | 4 |
| 10 | 1 | 1, 2, 5, 1 | {1, 2, 5} | 3 |
| 11 | 2 | 2, 5, 1, 2 | {1, 2, 5} | 3 |
| 12 | 3 | 5, 1, 2, 3 | {1, 2, 3, 5} | 4 |
| 13 | 4 | 1, 2, 3, 4 | {1, 2, 3, 4} | 4 |
| 14 | 5 | 2, 3, 4, 5 | {2, 3, 4, 5} | 4 |
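The table above can be regenerated mechanically; this short sketch (same illustrative helper as before) prints one row per reference:

```python
def working_set(refs, t, delta):
    return {refs[i - 1] for i in range(max(1, t - delta + 1), t + 1)}

refs = [1, 2, 3, 2, 1, 4, 1, 2, 5, 1, 2, 3, 4, 5]
DELTA = 4

print(f"{'t':>3} {'ref':>4}  {'W(t, 4)':<15} |W|")
for t, page in enumerate(refs, start=1):
    ws = working_set(refs, t, DELTA)
    print(f"{t:>3} {page:>4}  {str(sorted(ws)):<15} {len(ws)}")
```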
Observations from the Trace:
1. Working Set Stability: Notice how the working set remains relatively stable. From t=4 to t=5, despite different references, the working set stays {1, 2, 3}. This stability is what makes working sets useful—they change slowly, tracking genuine phase transitions rather than individual references.
2. Page Entry and Exit: Page 3 exits the working set at t=7 (its last reference, at t=3, falls outside the window) but re-enters at t=12 when referenced again. The working set naturally captures which pages are 'currently important.'
3. Size Fluctuation: The working set size fluctuates between 3 and 4 pages. If we allocate exactly |W(t, Δ)| frames to this process at each moment, we'd need between 3 and 4 frames at any time, fewer than the 5 distinct pages the program touches over its whole run.
4. Effect of Window Size: With a larger Δ (say, 6 instead of 4), working sets would be larger and change more slowly. With a smaller Δ (say, 2), they'd be smaller but potentially miss important pages, causing page faults.
Think of the working set as a 'sliding window' over the process's recent memory behavior. Pages enter when referenced and stay until they haven't been accessed for Δ references. It's like a 'recently used' list with automatic expiration.
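One way to realize this 'recently used list with automatic expiration' is to remember only each page's last reference time: a page belongs to the working set while fewer than Δ references have elapsed since it was last touched. A minimal sketch follows; the class name and interface are illustrative assumptions, not a standard API.

```python
class WorkingSetTracker:
    """Incrementally tracks W(t, delta) by remembering, for each page,
    the virtual time of its most recent reference."""

    def __init__(self, delta):
        self.delta = delta
        self.t = 0            # current virtual time (number of references so far)
        self.last_ref = {}    # page -> time of its most recent reference

    def reference(self, page):
        self.t += 1
        self.last_ref[page] = self.t

    def working_set(self):
        # Page p is in W(t, delta) iff its last reference i satisfies t - i < delta.
        return {p for p, i in self.last_ref.items() if self.t - i < self.delta}

tracker = WorkingSetTracker(delta=4)
for page in [1, 2, 3, 2, 1, 4, 1, 2, 5, 1, 2, 3, 4, 5]:
    tracker.reference(page)

print(sorted(tracker.working_set()))   # [2, 3, 4, 5], matching the table at t = 14
```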
The working set model leads to a fundamental principle for memory management, often called the Working Set Principle or Denning's Principle:
A process should be allocated frames at least equal to its working set size. If the sum of all processes' working set sizes exceeds total available frames, then some processes must be suspended (not all can run efficiently simultaneously).
This principle has profound implications for operating system design.
The Working Set Balance Equation:
Let WSS_i denote the working set size of process i at a given moment, and let M denote total available frames. The system is in balance when:
∑ WSS_i ≤ M (for all currently running processes)
When this inequality is satisfied, each process can hold its entire working set in memory, page faults are minimal (only when transitioning between phases), and CPU utilization is high.
When the inequality is violated:
∑ WSS_i > M (system is overcommitted)
Some processes cannot hold their working sets, leading to sustained page faulting, falling CPU utilization, and ultimately thrashing.
The OS must respond by suspending one or more processes (moving them entirely out of memory) until balance is restored.
When WSS sum exceeds M, the OS faces a critical decision: which process(es) to suspend? This decision involves balancing multiple factors—process priority, how long each has run, how close each is to completion, and the size of each working set (suspending one large-WSS process may be more effective than suspending multiple small-WSS processes).
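Here is a sketch of the load-control decision the balance equation implies; the process names and the 'suspend the largest working set first' heuristic are illustrative assumptions, not a policy prescribed by the model.

```python
def overcommitted(wss, total_frames):
    """True when the combined working-set sizes of the running processes
    exceed the frames available (sum of WSS_i > M)."""
    return sum(wss.values()) > total_frames

def choose_suspensions(wss, total_frames):
    """Illustrative greedy heuristic: suspend the largest working sets first
    until the processes that remain fit in total_frames."""
    suspended, remaining = [], dict(wss)
    while remaining and sum(remaining.values()) > total_frames:
        victim = max(remaining, key=remaining.get)   # process with the largest WSS
        suspended.append(victim)
        del remaining[victim]
    return suspended

wss = {"editor": 20, "compiler": 120, "browser": 90}   # WSS_i in frames (made up)
M = 180                                                # total frames available
if overcommitted(wss, M):
    print("suspend:", choose_suspensions(wss, M))      # suspend: ['compiler']
```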
The window size Δ is the critical parameter of the working set model. Its selection represents a fundamental tradeoff between responsiveness and stability.
Effects of Different Δ Values:
If Δ is too small, the window fails to cover the process's current locality: pages still in active use drop out of the working set, the allocation is understated, and the process faults on pages it is about to need again. If Δ is too large, the window spans past localities: pages from a finished phase are still counted as needed, the allocation is overstated, and frames that could serve other processes sit idle.
The Ideal Δ:
The ideal window size is one that is just large enough to span the process's current locality (its current phase) while excluding pages left over from earlier phases.
Practical Considerations:
In practice, Δ is typically measured in memory references made by the process, or in CPU time the process has consumed, rather than in wall-clock time, and it is approximated (for example, by sampling hardware reference bits at periodic timer interrupts) rather than maintained exactly on every reference. Concrete values differ widely between systems and workloads, which is part of why tuning Δ is hard.
There is no universally 'correct' Δ. The optimal value depends on program behavior, workload mix, and system goals. This inherent difficulty in selecting Δ motivated alternative approaches like Page Fault Frequency (PFF), which we'll explore in a later page—PFF achieves similar goals without requiring explicit Δ selection.
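The tradeoff can be made concrete by measuring, for several Δ values on the same trace, the average working-set size and how often a reference touches a page that was not already in the previous working set (a rough proxy for page faults under working-set allocation). A sketch using this page's earlier example trace:

```python
def working_set(refs, t, delta):
    return {refs[i - 1] for i in range(max(1, t - delta + 1), t + 1)}

refs = [1, 2, 3, 2, 1, 4, 1, 2, 5, 1, 2, 3, 4, 5]

for delta in (2, 4, 6, 8):
    sizes = [len(working_set(refs, t, delta)) for t in range(1, len(refs) + 1)]
    # Count references to pages not already in the previous working set:
    # a rough proxy for faults if the allocation tracked W(t, delta) exactly.
    misses = sum(1 for t, page in enumerate(refs, start=1)
                 if page not in working_set(refs, t - 1, delta))
    avg = sum(sizes) / len(sizes)
    print(f"delta={delta}: average |W| = {avg:.2f}, misses = {misses}")
```

Larger Δ values reduce misses on this trace but raise the average allocation, which is exactly the responsiveness-versus-stability tradeoff described above.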
Students often confuse the working set model with the LRU (Least Recently Used) page replacement algorithm. While both leverage the principle of locality, they serve fundamentally different purposes and operate at different levels.
Key Distinctions:
| Aspect | Working Set Model | LRU Algorithm |
|---|---|---|
| Purpose | Determines how many frames a process needs | Determines which page to evict when a frame is needed |
| Level | Memory allocation policy | Page replacement policy |
| Scope | Per-process (allocation decision) | Global or local (eviction decision) |
| Output | A set of pages (the working set) | A single page (the victim) |
| Parameter | Window size Δ (time-based) | None inherent (though approximations have parameters) |
| Trigger | Continuous monitoring | Activated on page fault |
| Goal | Prevent thrashing through proper allocation | Minimize page faults through smart replacement |
Complementary, Not Competing:
Working set and LRU work at different levels of the memory management hierarchy: the working set model answers the allocation question (how many frames should this process hold, and how many processes can run at once?), while LRU answers the replacement question (which resident page should give up its frame right now?).
A well-designed system uses both: working-set-style allocation to size each process's frame pool and to decide when to suspend processes, and an LRU-style replacement policy to choose victims within that pool when a fault occurs.
The Relationship:
Interestingly, if you use working set-based allocation and need to evict a page, the page that will leave the working set next (under LRU ordering) is exactly the least recently used page. This means that evicting the LRU page is the same as evicting the page closest to aging out of the working set on its own, the page the model already regards as least essential.
The working set model can be seen as a way to dynamically adjust allocation so that LRU replacement rarely needs to evict pages that will be used again soon.
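A small sketch of that relationship: among the pages currently in the working set, the page with the oldest last reference is simultaneously the LRU victim and the first page that will age out of the window as it slides forward (helper names here are illustrative).

```python
def last_references(refs, t):
    """Map each page referenced by virtual time t to its most recent reference time."""
    last = {}
    for i, page in enumerate(refs[:t], start=1):
        last[page] = i
    return last

refs = [1, 2, 3, 2, 1, 4, 1, 2, 5, 1, 2, 3, 4, 5]
t, delta = 9, 4

last = last_references(refs, t)
in_ws = {p: i for p, i in last.items() if t - i < delta}   # pages in W(t, delta)

# The page with the oldest last reference is both the LRU victim among resident
# pages and the first page that will age out of the window as it slides forward.
victim = min(in_ws, key=in_ws.get)
print(sorted(in_ws), "next to leave / LRU:", victim)   # [1, 2, 4, 5] ... 4
```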
Both working set and LRU are manifestations of the same insight: recent history predicts future behavior. Working set uses this to allocate memory proactively; LRU uses it to replace pages reactively. The best systems combine proactive allocation with reactive replacement.
We have established the theoretical foundation of the working set model, one of the most influential concepts in operating systems memory management. Let's consolidate the key insights:
1. Programs exhibit locality of reference: at any moment they concentrate their accesses on a small fraction of their pages, and that focus shifts as they move through phases.
2. The working set W(t, Δ) is the set of distinct pages referenced in the window (t - Δ, t]; it is a measurable, automatically adapting estimate of what a process needs right now.
3. The working set principle: allocate each process at least |W(t, Δ)| frames, and keep ∑ WSS_i ≤ M by suspending processes when the system is overcommitted. This is what prevents thrashing.
4. The window size Δ is the model's critical parameter: too small and the working set misses active pages; too large and it over-allocates memory.
5. The working set model is an allocation policy, while LRU is a replacement policy; both exploit the same insight that recent history predicts near-future behavior.
What's Next:
In the next page, we'll explore the working set window in greater depth—examining how the window parameter affects system behavior, how to track working sets in practice, and the implementation challenges that led to approximate approaches.
You now understand the conceptual and mathematical foundations of the working set model. This understanding is essential for grasping how modern operating systems prevent thrashing and allocate memory dynamically. The working set concept remains central to memory management theory and practice, influencing designs from Linux's page reclamation to database buffer management.