Consider a highway during rush hour. Individual drivers can optimize their routes and speeds, but if too many cars enter the highway, congestion becomes inevitable no matter how well each driver performs. The solution isn't better individual driving—it's controlling how many cars enter.
Load control applies this same principle to memory management. No matter how sophisticated the page replacement algorithm or how clever the working set approximation, if the total memory demand of active processes exceeds available physical memory, thrashing will occur. Load control addresses this fundamental constraint by managing the multiprogramming level—deciding which processes are allowed to compete for memory at any given time.
By the end of this page, you will understand the concept of multiprogramming level and its relationship to thrashing, admission control mechanisms that prevent overcommitment, medium-term scheduling strategies for balancing load, the criteria for selecting processes to deactivate, and how modern systems implement load control in various forms.
The multiprogramming level (also called the degree of multiprogramming) is the number of processes actively competing for system resources. This seemingly simple metric has profound implications for system performance.
The Fundamental Tension:
Too few active processes leaves the CPU idle during I/O waits; too many triggers thrashing. The challenge is that the optimal level is dynamic—it depends on the current processes' memory requirements, which change over time.
| Multiprogramming Level | CPU Utilization | Memory Pressure | Page Fault Rate | Throughput |
|---|---|---|---|---|
| Very Low (1-2) | Low (idle during I/O) | None | Minimal | Poor |
| Low (3-5) | Moderate | Low | Low | Moderate |
| Optimal | High (80-95%) | Moderate | Acceptable | Maximum |
| High | Declining | High | Elevated | Declining |
| Excessive | Very Low (paging overhead) | Severe | Extreme | Near Zero |
The Thrashing Threshold:
There exists a critical multiprogramming level beyond which performance degrades catastrophically. This threshold is not fixed: it depends on the working set sizes of the active processes and the physical memory available, both of which change as the workload evolves.
The Key Insight:
The goal of load control is to keep the multiprogramming level at or below the threshold that triggers thrashing. It's better to run N processes efficiently than N+K processes in a thrashing state.
Performance degradation when crossing the thrashing threshold is not gradual—it's a cliff. Adding one too many processes can transform a system running at 90% CPU utilization into one running at 10%. Load control is about staying safely away from this cliff.
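The cliff can be illustrated with a toy model (all constants hypothetical): each process waits on I/O a fixed fraction of its time, so without memory pressure, CPU utilization for n processes is 1 − p^n; once the combined working sets exceed physical memory, an ad hoc paging penalty takes over.

```c
// Toy model of the thrashing cliff. WSS_FRAMES, IO_WAIT, and the penalty
// shape are illustrative assumptions, not measurements from a real system.
#include <stdint.h>

#define TOTAL_FRAMES 1000
#define WSS_FRAMES   150     // working set per process (assumed)
#define IO_WAIT      0.6     // fraction of time each process waits on I/O

static double pow_n(double p, int n) {
    double r = 1.0;
    for (int i = 0; i < n; i++) r *= p;
    return r;
}

// CPU utilization ignoring memory: 1 - p^n
double utilization_no_paging(int n) {
    return 1.0 - pow_n(IO_WAIT, n);
}

// Utilization with a paging penalty once demand exceeds physical memory
double utilization_with_paging(int n) {
    double base = utilization_no_paging(n);
    long demand = (long)n * WSS_FRAMES;
    if (demand <= TOTAL_FRAMES)
        return base;
    // Overcommitment drives fault-service overhead up steeply (ad hoc shape)
    double over = (double)demand / TOTAL_FRAMES;
    return base / (1.0 + 50.0 * (over - 1.0) * (over - 1.0) * over);
}
```

With these constants, utilization climbs above 95% at six processes (demand 900 of 1000 frames), stays above 85% at seven, and collapses below 30% at eight: a single process past the threshold does most of the damage.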
Admission control is the proactive component of load control. Rather than waiting for thrashing to occur and then reacting, admission control prevents overcommitment before it happens by gating entry to the set of active processes.
Admission Control Decision:
When a new process requests activation (becoming runnable), the admission controller evaluates:
Can this process be accommodated without causing total working set demand to exceed available memory?
If yes, the process is admitted. If no, the process waits.
```c
// Admission Control Implementation
// Prevents overcommitment by gating process activation

#include <stdbool.h>

typedef struct {
    pid_t pid;
    uint32_t estimated_wss;     // Estimated working set size
    uint32_t priority;
    bool is_active;             // In active memory competition
    uint32_t time_waiting;      // Time spent waiting for admission
} ProcessAdmissionState;

typedef struct {
    uint32_t total_frames;
    uint32_t reserved_frames;   // Reserved for OS, buffers, etc.
    uint32_t available_frames;  // For user processes
    uint32_t committed_frames;  // Sum of active processes' WSS estimates
    uint32_t headroom_percent;  // Safety margin (e.g., 10%)
} SystemAdmissionState;

// Calculate available capacity for new processes
uint32_t calculate_available_capacity(SystemAdmissionState* state) {
    // Available = Total - Reserved - Committed - Headroom
    uint32_t headroom = (state->total_frames * state->headroom_percent) / 100;
    uint32_t usable = state->total_frames - state->reserved_frames - headroom;
    if (usable <= state->committed_frames) {
        return 0;  // No capacity available
    }
    return usable - state->committed_frames;
}

// Main admission decision
typedef enum {
    ADMIT_GRANTED,          // Process can be admitted
    ADMIT_DEFERRED,         // Wait until capacity available
    ADMIT_DENIED_PRIORITY,  // Lower priority must wait for higher
    ADMIT_DENIED_SIZE       // WSS too large for system
} AdmissionResult;

AdmissionResult evaluate_admission(ProcessAdmissionState* candidate) {
    SystemAdmissionState* system = get_system_state();
    uint32_t available = calculate_available_capacity(system);

    // Check if process could ever fit
    if (candidate->estimated_wss > system->available_frames) {
        return ADMIT_DENIED_SIZE;  // Process can never fit
    }

    // Check if capacity available now
    if (available >= candidate->estimated_wss) {
        return ADMIT_GRANTED;
    }

    // Check if preemption possible for high-priority process
    if (candidate->priority >= PRIORITY_HIGH) {
        uint32_t reclaimable = estimate_reclaimable_from_low_priority();
        if (available + reclaimable >= candidate->estimated_wss) {
            // Can make room by deactivating lower priority
            return evaluate_preemption(candidate);
        }
    }

    return ADMIT_DEFERRED;
}

// Admit a process after approval
void admit_process(ProcessAdmissionState* proc) {
    SystemAdmissionState* system = get_system_state();
    proc->is_active = true;
    system->committed_frames += proc->estimated_wss;

    // Move process to ready queue
    move_to_ready_queue(proc->pid);
    log_admission(proc);
}

// Handle waiting processes when capacity becomes available
void check_deferred_admissions(void) {
    // Sort waiting processes by priority, then by wait time
    ProcessAdmissionState* waiting = get_waiting_processes();
    sort_by_priority_then_wait_time(waiting);

    for (int i = 0; i < waiting_count; i++) {
        AdmissionResult result = evaluate_admission(&waiting[i]);
        if (result == ADMIT_GRANTED) {
            admit_process(&waiting[i]);
        } else {
            // No more capacity for remaining processes
            break;
        }
    }
}
```

Estimating Working Set Size for New Processes:
Admission control requires an estimate of a process's working set size before it runs. Common techniques include reusing measurements from the process's previous executions, accepting hints declared by the application or administrator, and starting from a conservative default that is refined once the process begins executing.
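The refine-as-it-runs approach can be sketched with an exponentially weighted moving average; the names, default, and weight below are illustrative assumptions:

```c
// Sketch: refining a working-set-size estimate as a process runs.
// Starts from a conservative default, replaces it with the first observed
// sample, then blends later samples in with an EWMA. Constants hypothetical.
#include <stdint.h>

#define DEFAULT_WSS_FRAMES 256   // conservative initial guess
#define EWMA_WEIGHT_NEW    25    // percent weight given to the newest sample

typedef struct {
    uint32_t wss_estimate;   // current estimate, in frames
    int      has_history;    // have we observed this process yet?
} WssEstimator;

void wss_init(WssEstimator* e) {
    e->wss_estimate = DEFAULT_WSS_FRAMES;
    e->has_history = 0;
}

// Called each sampling interval with the frames the process actually touched
void wss_observe(WssEstimator* e, uint32_t observed_frames) {
    if (!e->has_history) {
        // First real sample replaces the default outright
        e->wss_estimate = observed_frames;
        e->has_history = 1;
        return;
    }
    // estimate = (1 - w) * estimate + w * observed, in integer percent
    e->wss_estimate = (e->wss_estimate * (100 - EWMA_WEIGHT_NEW)
                       + observed_frames * EWMA_WEIGHT_NEW) / 100;
}
```

The EWMA smooths transient spikes while still tracking genuine phase changes in the process's locality.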
Priority-Based Admission:
Not all processes are equal. Priority-based admission ensures that high-priority processes obtain memory first: as the evaluate_admission logic above shows, a high-priority arrival can trigger preemption of lower-priority active processes to make room.
Admission control doesn't prevent process creation—processes can exist but remain inactive (swapped out, not in ready queue). The control is over which created processes are allowed to actively compete for execution time and memory. This allows the system to maintain a pool of processes without overcommitting resources.
While admission control gates new processes, medium-term scheduling (MTS) manages the multiprogramming level among already-admitted processes. MTS makes decisions about swapping—which processes should be temporarily moved out of active memory competition.
The Three Levels of Scheduling:
| Level | Also Called | Time Scale | Decision |
|---|---|---|---|
| Long-term | Job scheduler | Minutes-hours | Which jobs to admit |
| Medium-term | Swapper | Seconds-minutes | Which processes to swap |
| Short-term | CPU scheduler | Milliseconds | Which ready process to run |
Medium-term scheduling bridges long-term job admission and short-term CPU allocation, adjusting the active process set in response to system conditions.
Swap-Out Decisions:
MTS decreases the multiprogramming level by selecting processes to swap out (deactivate). Criteria include process priority, whether the process is currently blocked, the size of its resident memory footprint, how long it has been idle, and whether it is interactive.
Swap-In Decisions:
MTS increases the multiprogramming level by selecting processes to swap in (reactivate). Considerations include priority, how long the process has waited (starvation prevention), whether an event it was waiting for has completed, and whether it is interactive.
```c
// Medium-Term Scheduler (Swapper)
// Manages the set of processes actively in memory competition

typedef enum {
    SWAP_STATE_ACTIVE,        // In memory, competing for CPU
    SWAP_STATE_SWAPPING_OUT,  // Being written to swap
    SWAP_STATE_SWAPPED,       // Entirely in swap, not competing
    SWAP_STATE_SWAPPING_IN    // Being read from swap
} SwapState;

typedef struct {
    pid_t pid;
    SwapState swap_state;
    uint32_t priority;
    uint32_t memory_resident;  // Frames currently in memory
    uint32_t wss_estimate;
    uint64_t last_run_time;
    uint64_t swap_out_time;    // When swapped out (for starvation prevention)
    bool is_blocked;           // Waiting for event
    bool is_interactive;       // Needs quick response
} MediumTermProcState;

// Call periodically or when memory pressure detected
void medium_term_schedule(void) {
    SystemMemoryState mem_state = get_memory_state();

    if (mem_state.free_frames < SWAP_OUT_THRESHOLD) {
        // Memory pressure - need to swap out
        select_and_swap_out();
    }

    if (mem_state.free_frames > SWAP_IN_THRESHOLD) {
        // Capacity available - can swap in
        select_and_swap_in();
    }

    // Check for starvation of swapped processes
    check_starvation();
}

// Select process to swap out
MediumTermProcState* select_swap_out_victim(void) {
    MediumTermProcState* best_victim = NULL;
    int best_score = -1;

    for (int i = 0; i < process_count; i++) {
        MediumTermProcState* proc = &processes[i];

        if (proc->swap_state != SWAP_STATE_ACTIVE) continue;
        if (proc->is_interactive && system_load < HIGH) continue;

        int score = calculate_swapout_score(proc);
        if (score > best_score) {
            best_score = score;
            best_victim = proc;
        }
    }
    return best_victim;
}

// Score for swap-out (higher = more likely to swap out)
int calculate_swapout_score(MediumTermProcState* proc) {
    int score = 0;

    // Lower priority increases swap-out likelihood
    score += (MAX_PRIORITY - proc->priority) * 100;

    // Blocked processes are good candidates
    if (proc->is_blocked) {
        score += 300;
    }

    // Larger memory footprint means more benefit from swapping
    score += proc->memory_resident / 10;

    // Longer idle time increases likelihood
    uint64_t idle_time = get_current_time() - proc->last_run_time;
    score += min(idle_time / 1000, 200);  // Cap contribution

    // Interactive processes protected
    if (proc->is_interactive) {
        score -= 500;
    }
    return score;
}

// Select process to swap in
MediumTermProcState* select_swap_in_candidate(uint32_t available_frames) {
    MediumTermProcState* best_candidate = NULL;
    int best_score = -1;

    for (int i = 0; i < process_count; i++) {
        MediumTermProcState* proc = &processes[i];

        if (proc->swap_state != SWAP_STATE_SWAPPED) continue;
        if (proc->wss_estimate > available_frames) continue;  // Won't fit

        int score = calculate_swapin_score(proc);
        if (score > best_score) {
            best_score = score;
            best_candidate = proc;
        }
    }
    return best_candidate;
}

// Score for swap-in (higher = more likely to swap in)
int calculate_swapin_score(MediumTermProcState* proc) {
    int score = 0;

    // Higher priority increases swap-in likelihood
    score += proc->priority * 100;

    // Longer wait time increases priority (starvation prevention)
    uint64_t wait_time = get_current_time() - proc->swap_out_time;
    score += min(wait_time / 500, 400);  // Strong contribution

    // Event completion makes swap-in urgent
    if (event_completed_for_process(proc->pid)) {
        score += 500;
    }

    // Interactive processes need quick return
    if (proc->is_interactive) {
        score += 300;
    }
    return score;
}

// Prevent indefinite swapping
void check_starvation(void) {
    for (int i = 0; i < process_count; i++) {
        MediumTermProcState* proc = &processes[i];
        if (proc->swap_state == SWAP_STATE_SWAPPED) {
            uint64_t wait_time = get_current_time() - proc->swap_out_time;
            if (wait_time > MAX_SWAP_WAIT_TIME) {
                // Force swap-in to prevent starvation
                // May require swapping out another process
                force_swap_in(proc);
            }
        }
    }
}
```

Medium-term scheduling must be careful not to thrash the swap device itself. Swapping processes in and out too frequently causes high I/O load and gains nothing. Hysteresis in thresholds (different triggers for swap-in vs. swap-out) and minimum residence times help prevent swap thrashing.
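A minimal sketch of such hysteresis, with hypothetical watermarks and residence time:

```c
// Hysteresis for the swapper: swap-out triggers at a low free-frame
// watermark, swap-in only resumes above a distinctly higher one, and a
// freshly admitted process must stay resident a minimum time before it is
// eligible for eviction again. All thresholds are illustrative.
#include <stdint.h>
#include <stdbool.h>

#define SWAP_OUT_WATERMARK  100   // free frames: below this, shed load
#define SWAP_IN_WATERMARK   300   // free frames: above this, re-admit
#define MIN_RESIDENCE_TICKS 500   // minimum time resident before eviction

typedef struct {
    uint64_t last_swap_in_tick;   // when the process was last brought in
} SwapHistory;

// The gap between the two watermarks is a dead band: in it, neither
// swap-out nor swap-in fires, so the system cannot oscillate at one line.
bool should_swap_out(uint32_t free_frames) {
    return free_frames < SWAP_OUT_WATERMARK;
}

bool may_swap_in(uint32_t free_frames) {
    return free_frames > SWAP_IN_WATERMARK;
}

// A freshly admitted process is protected from immediate re-eviction
bool eligible_for_swap_out(const SwapHistory* h, uint64_t now) {
    return now - h->last_swap_in_tick >= MIN_RESIDENCE_TICKS;
}
```

Between 100 and 300 free frames neither predicate fires, so a process swapped in at one threshold is not immediately swapped back out at the other.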
When load control determines that the multiprogramming level must decrease, the critical question becomes: which processes should be deactivated? This decision has significant fairness and performance implications.
Deactivation Methods:
| Criterion | Rationale | Advantage | Disadvantage |
|---|---|---|---|
| Lowest Priority | Protect important work | Clear policy, predictable | Low-priority starvation |
| Largest Memory User | Maximum memory freed | Efficient resource recovery | Penalizes legitimate big processes |
| Most Recent Arrival | Seniority protection | Fairness to older processes | Hurts new processes unfairly |
| Longest Blocked | Already not running | Minimal immediate impact | May not free enough memory |
| Poorest Locality | Most likely thrashing | Removes problem process | Requires tracking, may be temporary |
| Least CPU Used | Least productive recently | Protect active processes | Punishes I/O-bound unfairly |
Composite Scoring:
Production systems typically use weighted combinations of criteria:
```
deactivation_score = w1 × priority_factor +
                     w2 × memory_factor +
                     w3 × blocked_duration +
                     w4 × locality_quality +
                     w5 × arrival_time
```
Weights are tuned to match system goals, such as protecting interactive responsiveness versus maximizing batch throughput.
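A sketch of how such a composite score might be computed; the factor definitions, types, and weight scales are assumptions for illustration, not a production policy:

```c
// Composite deactivation score: higher score = better eviction candidate.
// Field names and weighting scheme are hypothetical.
#include <stdint.h>

typedef struct {
    uint32_t priority;          // higher value = more important
    uint32_t resident_frames;   // memory factor
    uint32_t blocked_ticks;     // time spent blocked on an event
    uint32_t fault_rate;        // recent faults/sec, proxy for poor locality
    uint64_t arrival_tick;      // when the process was admitted
} DeactivationInputs;

typedef struct { long w1, w2, w3, w4, w5; } Weights;

long deactivation_score(const DeactivationInputs* p, const Weights* w,
                        uint32_t max_priority, uint64_t now) {
    long score = 0;
    score += w->w1 * (long)(max_priority - p->priority); // low priority first
    score += w->w2 * (long)p->resident_frames;           // frees more memory
    score += w->w3 * (long)p->blocked_ticks;             // already not running
    score += w->w4 * (long)p->fault_rate;                // poor locality
    // Seniority protection: the older the process (larger age), the lower
    // its score, so recent arrivals are deactivated first
    score -= w->w5 * (long)(now - p->arrival_tick);
    return score;
}
```

With this shape, tuning reduces to choosing w1 through w5; setting a weight to zero disables that criterion entirely.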
Protecting System-Critical Processes:
Some processes should never be deactivated: kernel threads, the init process, and processes holding resources that other processes depend on.
Care must be taken when deactivating processes based on priority. If a high-priority process is waiting for a low-priority process (e.g., for a lock or message), deactivating the low-priority process can delay the high-priority process indefinitely. The scheduler must consider dependencies, not just isolated priorities.
Load control must be triggered at the right time—early enough to prevent thrashing but not so aggressively as to unnecessarily limit concurrency. Several metrics help detect impending or actual overload.
Primary Indicators:
Useful signals include free memory, the system-wide page fault rate, the page-out and scan rates, CPU utilization, I/O wait, the number of runnable processes, and the swap device's queue depth.
```c
// Overload Detection System
// Multi-metric detection for robust overload identification

typedef struct {
    uint32_t free_frames;
    uint32_t total_frames;
    float system_page_fault_rate;  // Faults per second, system-wide
    float page_out_rate;           // Pages written to swap per second
    float scan_rate;               // Pages scanned per second
    float cpu_utilization;
    float io_wait_percent;
    uint32_t runnable_processes;
    uint32_t swap_io_queue_depth;
} SystemLoadMetrics;

typedef enum {
    LOAD_LIGHT,       // Can consider admitting more
    LOAD_MODERATE,    // System operating well
    LOAD_HEAVY,       // Near capacity, be careful
    LOAD_OVERLOADED,  // Thrashing imminent or occurring
    LOAD_CRITICAL     // Severe thrashing, emergency action
} LoadLevel;

// Evaluate current load level
LoadLevel evaluate_load(SystemLoadMetrics* metrics) {
    int pressure_indicators = 0;
    int critical_indicators = 0;

    // Check free memory
    float free_percent = (float)metrics->free_frames / metrics->total_frames * 100;
    if (free_percent < 5) {
        critical_indicators++;
    } else if (free_percent < 10) {
        pressure_indicators++;
    }

    // Check fault rate
    if (metrics->system_page_fault_rate > CRITICAL_FAULT_RATE) {
        critical_indicators++;
    } else if (metrics->system_page_fault_rate > HIGH_FAULT_RATE) {
        pressure_indicators++;
    }

    // Check for CPU utilization paradox (thrashing signature)
    if (metrics->cpu_utilization < 50 &&
        metrics->runnable_processes > 5 &&
        metrics->io_wait_percent > 30) {
        // Classic thrashing: many processes want CPU but CPU is idle (waiting for I/O)
        critical_indicators++;
    }

    // Check page-out rate
    if (metrics->page_out_rate > CRITICAL_PAGEOUT_RATE) {
        pressure_indicators++;
    }

    // Check scan rate (desperation scanning)
    if (metrics->scan_rate > DESPERATE_SCAN_RATE) {
        pressure_indicators++;
    }

    // Composite decision
    if (critical_indicators >= 2) {
        return LOAD_CRITICAL;
    } else if (critical_indicators >= 1 || pressure_indicators >= 3) {
        return LOAD_OVERLOADED;
    } else if (pressure_indicators >= 2) {
        return LOAD_HEAVY;
    } else if (pressure_indicators >= 1) {
        return LOAD_MODERATE;
    } else {
        return LOAD_LIGHT;
    }
}

// Take action based on load level
void respond_to_load_level(LoadLevel level) {
    switch (level) {
        case LOAD_CRITICAL:
            // Immediate action: suspend multiple processes
            emergency_load_reduction();
            disable_new_admissions();
            break;
        case LOAD_OVERLOADED:
            // Strong action: suspend one process, restrict admissions
            suspend_lowest_priority_process();
            restrict_admissions();
            break;
        case LOAD_HEAVY:
            // Moderate action: reclaim frames aggressively, careful admissions
            aggressive_frame_reclamation();
            careful_admissions();
            break;
        case LOAD_MODERATE:
            // Normal operation
            normal_admissions();
            break;
        case LOAD_LIGHT:
            // Consider bringing in swapped processes
            consider_resumptions();
            permissive_admissions();
            break;
    }
}

// Trend detection for proactive intervention
typedef struct {
    float fault_rate_history[10];
    int history_index;
    float trend_slope;
} TrendDetector;

void update_trend(TrendDetector* td, float current_rate) {
    td->fault_rate_history[td->history_index] = current_rate;
    td->history_index = (td->history_index + 1) % 10;

    // Simple linear regression for trend
    td->trend_slope = calculate_slope(td->fault_rate_history, 10);

    // Proactive intervention if trend is strongly upward
    if (td->trend_slope > WORRYING_TREND_THRESHOLD) {
        // Fault rate rising fast - intervene before overload
        preemptive_load_reduction();
    }
}
```

Monitoring trends is often more valuable than monitoring absolute values. A system at 90% memory utilization with stable metrics is fine; one at 70% with rapidly rising fault rates is about to crash. Derivative (rate of change) signals enable proactive intervention before crisis.
When overload is detected, the system must shed load—reduce the number of active processes until the remaining processes can execute effectively. Several strategies exist, ranging from gentle to aggressive.
Progressive Load Shedding:
The most effective approach escalates through increasingly aggressive levels: restrict new admissions, reclaim pages aggressively, suspend a single process, suspend multiple processes, suspend all non-critical processes, and finally terminate a process.
The OOM Killer: Last Resort:
When all other measures fail, the system must terminate processes to survive. Linux's OOM (out-of-memory) killer is the classic example.
OOM Score Calculation (Linux):
oom_score = memory_percentage × adjustment_factors
Where:
memory_percentage = process RSS / total RAM

OOM Killer Selection:
The OOM killer is brutal but necessary—it's better to lose one process than to have all processes unusable indefinitely.
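The formula above can be sketched as follows; the 0-1000 scaling and clamping mirror the spirit of Linux's oom_score_adj interface, but this is a simplification, not the kernel's exact heuristic:

```c
// Simplified OOM score: memory share scaled to 0..1000, shifted by a
// per-process adjustment in the style of oom_score_adj (-1000 .. +1000).
#include <stdint.h>

long oom_score(uint64_t rss_pages, uint64_t total_pages, int score_adj) {
    // memory_percentage expressed on a 0..1000 scale
    long points = (long)((rss_pages * 1000) / total_pages);
    points += score_adj;          // adjustment factor
    if (points < 0) points = 0;   // clamp: never negative
    return points;
}
```

An adjustment of -1000 drives the score to zero regardless of memory use, which is why such processes are effectively unkillable in the selection loop below.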
```c
// Progressive Load Shedding Implementation

typedef enum {
    SHED_RESTRICT,      // Stop new admissions
    SHED_RECLAIM,       // Aggressive page reclamation
    SHED_SUSPEND_ONE,   // Suspend single process
    SHED_SUSPEND_BULK,  // Suspend multiple processes
    SHED_EMERGENCY,     // Suspend all non-critical
    SHED_TERMINATE      // OOM kill
} SheddingLevel;

SheddingLevel current_shedding_level = SHED_RESTRICT;

// Execute load shedding at current level
void execute_shedding(SheddingLevel level) {
    switch (level) {
        case SHED_RESTRICT:
            set_admission_policy(ADMISSION_RESTRICTED);
            break;
        case SHED_RECLAIM:
            set_admission_policy(ADMISSION_BLOCKED);
            trigger_aggressive_reclaim();
            break;
        case SHED_SUSPEND_ONE:
            if (suspend_best_candidate()) {
                break;
            }
            // Fall through if no candidate found
        case SHED_SUSPEND_BULK:
            suspend_multiple_processes(calculate_needed_suspensions());
            break;
        case SHED_EMERGENCY:
            suspend_all_non_critical();
            break;
        case SHED_TERMINATE:
            invoke_oom_killer();
            break;
    }
}

// Escalate if current level insufficient
void maybe_escalate(void) {
    if (!check_memory_stable() &&
        time_since_shedding() > STABILIZATION_WAIT) {
        if (current_shedding_level < SHED_TERMINATE) {
            current_shedding_level++;
            execute_shedding(current_shedding_level);
        }
    }
}

// De-escalate when pressure relieved
void maybe_deescalate(void) {
    if (check_memory_comfortable() &&
        time_since_stable() > DEESCALATION_DELAY) {
        if (current_shedding_level > SHED_RESTRICT) {
            current_shedding_level--;
        }
        if (current_shedding_level == SHED_RESTRICT &&
            check_memory_abundant()) {
            set_admission_policy(ADMISSION_NORMAL);
            resume_suspended_processes();
        }
    }
}

// OOM killer implementation
void invoke_oom_killer(void) {
    int best_score = -1;
    Process* victim = NULL;

    for (int i = 0; i < process_count; i++) {
        Process* p = &all_processes[i];

        // Never kill kernel threads
        if (p->is_kernel) continue;
        // Never kill init (pid 1)
        if (p->pid == 1) continue;
        // Skip processes with oom_score_adj = -1000 (unkillable)
        if (p->oom_score_adj == -1000) continue;

        int score = calculate_oom_score(p);
        if (score > best_score) {
            best_score = score;
            victim = p;
        }
    }

    if (victim != NULL) {
        log_oom_kill(victim);
        send_signal(victim->pid, SIGKILL);
    }
}
```

Triggering the OOM killer represents a failure of earlier load control mechanisms. A well-tuned system should rarely or never reach this point. If OOM kills are frequent, the solution is better admission control, more memory, or fewer processes—not just better OOM victim selection.
Modern systems implement load control through various mechanisms, often without explicit 'swapper' or 'medium-term scheduler' components.
The table below summarizes how Linux, Windows, and macOS realize these mechanisms:
| Feature | Linux | Windows | macOS |
|---|---|---|---|
| Primary mechanism | kswapd + direct reclaim | Working set manager | Compressed memory |
| Process suspension | Rare (per-page focus) | Explicit (swap out) | App termination |
| Admission control | cgroups (optional) | Job objects | Automatic management |
| Application cooperation | None required | None required | Memory pressure notifications |
| Last resort | OOM killer | Close applications prompt | Terminate low-priority apps |
Modern systems have evolved beyond explicit medium-term scheduling to more nuanced approaches. Compression reduces swap I/O, app termination with restoration provides user transparency, and memory limits provide workload isolation. The core principle remains: control the number of active competitors for memory to prevent thrashing.
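As one concrete modern signal: Linux exposes pressure stall information (PSI) in /proc/pressure/memory, whose "some avg10" field reports the share of the last 10 seconds in which at least one task stalled on memory. The file path and line format are real; the parser below works on a sample line so the example is self-contained, and the 10% threshold is purely illustrative.

```c
// Parse the "some" line of /proc/pressure/memory, e.g.
//   some avg10=0.00 avg60=0.00 avg300=0.00 total=0
#include <stdio.h>

// Returns avg10 (percent of the last 10s with some task stalled on memory),
// or -1.0 on parse failure.
double parse_psi_avg10(const char* line) {
    double avg10;
    if (sscanf(line, "some avg10=%lf", &avg10) == 1)
        return avg10;
    return -1.0;
}

// Hypothetical policy hook: treat sustained >10% stall time as overload
int memory_pressure_high(const char* line) {
    return parse_psi_avg10(line) > 10.0;
}
```

A load controller can poll this value (or use PSI's trigger interface) instead of inferring pressure indirectly from fault-rate counters.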
Load control represents the highest-level approach to thrashing prevention—controlling not just how much memory each process gets, but how many processes actively compete. Admission control gates entry to the active set, medium-term scheduling adjusts that set among admitted processes, overload detection decides when to intervene, and progressive load shedding, ending at the OOM killer, restores stability when prevention fails.
Looking Ahead:
Load control adjusts which processes compete for memory. But sometimes even with intelligent load control, specific processes need to be temporarily removed from competition entirely. The next page explores Swapping Processes—the mechanics and policies of moving entire processes between memory and secondary storage.
You now understand load control as a fundamental thrashing prevention strategy. The key insight is that controlling the number of active processes—not just their individual allocations—is essential for system stability. Better to run fewer processes efficiently than many processes in a thrashing state.