In a well-designed organization, employees don't remain at the same level forever. High performers get promoted; underperformers may be reassigned. But the rules governing these transitions—the criteria, timing, and speed of movement—profoundly affect both individual careers and organizational effectiveness.
Multi-level scheduling systems face identical challenges. Queue movement policies determine how processes transition between priority levels. Should a process be demoted after one bad quantum or after sustained CPU-bound behavior? Should promotion happen instantly upon I/O completion or gradually over time? Should all processes be boosted simultaneously or individually?
These policies, while often overlooked in introductory treatments, are the heart of practical MLFQ implementation. The difference between a responsive system and a sluggish one often comes down to the details of queue movement—how aggressively the scheduler demotes, how generously it promotes, and how it handles edge cases like behavior phase changes.
By the end of this page, you will understand: demotion policies and their variants, promotion policies including aging and boosting, hybrid movement strategies, the interaction between movement policies and system performance, real-world implementations, and how to design queue movement policies for specific workloads.
Demotion policies determine when and how a process moves to a lower-priority queue. The choice of demotion strategy significantly affects both interactive responsiveness and CPU-bound throughput.
Policy 1: Single-Quantum Demotion
The simplest approach: demote after a single full quantum usage.
if (used_full_quantum) {
priority--;
}
Characteristics:
- Fastest identification of CPU-bound processes; trivial to implement
- Unfair to bursty processes: one long quantum is enough to trigger demotion
- Easily gamed: yielding just before the quantum expires avoids demotion entirely
Policy 2: Threshold-Based Demotion
Demote only after multiple consecutive full-quantum usages.
if (used_full_quantum) {
consecutive_full_quanta++;
if (consecutive_full_quanta >= THRESHOLD) {
priority--;
consecutive_full_quanta = 0;
}
} else {
consecutive_full_quanta = 0; // Reset on early yield
}
Characteristics:
- Tolerates occasional full quanta before demoting, so bursty processes are treated more fairly
- Moderate demotion speed with simple implementation
- Still gameable: a single early yield resets the counter
Policy 3: Time Allotment Demotion (Cumulative Accounting)
Track cumulative CPU time at each priority level; demote when allotment exceeded.
time_at_priority += time_used;
if (time_at_priority >= allotment[priority]) {
priority--;
time_at_priority = 0;
}
Characteristics:
- Cumulative accounting: fragmented CPU usage still counts toward the allotment
- High fairness and strong gaming resistance at moderate implementation cost
- The approach most production MLFQ designs adopt
Policy 4: Weighted Demotion
Weight CPU usage based on context—giving processes partial credit for partial quantum usage.
demotion_score += (time_used / quantum) * weight;
if (demotion_score >= 1.0) {
priority--;
demotion_score -= 1.0;
}
Characteristics:
- Continuous scoring gives partial credit for partial quantum usage
- Highest fairness and gaming resistance of the four policies
- Most complex to implement; demotion speed depends on the chosen weight
| Policy | Speed | Fairness | Gaming Resistance | Implementation |
|---|---|---|---|---|
| Single-Quantum | Fastest | Low | Low | Trivial |
| Threshold-Based | Moderate | Medium | Medium | Simple |
| Time Allotment | Moderate | High | High | Moderate |
| Weighted | Slowest/Continuous | Highest | Highest | Complex |
Time allotment demotion is the de facto standard in production systems. It combines fair treatment (cumulative accounting), gaming resistance (can't cheat with fragmented usage), and reasonable implementation complexity. Most OS scheduling papers and implementations use this approach.
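The gaming-resistance difference is easy to see in a small sketch (the quantum and allotment values here are assumed for illustration): a process that always yields just before its quantum expires escapes single-quantum demotion, but cumulative accounting still catches it.

```python
QUANTUM = 10    # ms, illustrative
ALLOTMENT = 20  # ms total allowed at this level (2x quantum)

def single_quantum_demotes(bursts):
    """Demote only if some single burst consumed the full quantum."""
    return any(b >= QUANTUM for b in bursts)

def allotment_demotes(bursts):
    """Demote once cumulative CPU time at this level reaches the allotment."""
    total = 0
    for b in bursts:
        total += b
        if total >= ALLOTMENT:
            return True
    return False

# A "gaming" process yields after 9ms, just under the 10ms quantum
gaming = [9, 9, 9, 9, 9]

print(single_quantum_demotes(gaming))  # False - never caught
print(allotment_demotes(gaming))       # True  - cumulative usage exceeds 20ms
```

The fragmented bursts sum past the allotment by the third yield, so the cumulative policy demotes where the per-quantum policy never does.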
Promotion policies determine when and how processes move to higher-priority queues. Unlike demotion (which follows CPU usage), promotion typically serves multiple goals: starvation prevention, behavior adaptation, and interactive responsiveness.
Policy 1: Periodic Global Boost
All processes are simultaneously promoted to highest priority at fixed intervals.
if (current_time % BOOST_PERIOD == 0) {
for (all processes) {
process.priority = 0;
process.time_at_priority = 0;
}
}
Characteristics:
- Excellent starvation prevention: every process reaches the top periodically
- Untargeted: CPU-bound processes are boosted right along with starved ones
- Causes periodic congestion in the top queue, temporarily degrading interactive response
Policy 2: Aging-Based Promotion
Promote processes individually after they've waited a certain time without CPU access.
for (each process in lower queues) {
if (current_time - last_run_time >= AGE_THRESHOLD) {
process.priority = max(process.priority - 1, 0);
process.last_run_time = current_time;
}
}
Characteristics:
- Targets only processes that have actually been waiting
- Excellent starvation prevention with high stability
- Responsiveness is time-delayed: relief arrives only after the age threshold elapses
Policy 3: I/O Completion Boost
Promote processes when they complete I/O operations.
void on_io_completion(Process* p) {
if (io_was_interactive(p->last_io)) {
p->priority = 0; // Boost to top
} else {
p->priority = max(p->priority - 1, 0); // Partial boost
}
}
Characteristics:
- Immediate responsiveness for I/O-completing processes
- Limited starvation prevention: purely CPU-bound processes never benefit
- Medium stability; can be gamed by issuing artificial I/O
Policy 4: Graduated Promotion
Promote processes gradually—one level at a time—rather than jumping to top.
void promote(Process* p) {
if (p->priority > 0) {
p->priority--; // Move up one level
p->time_at_priority = 0;
}
}
Characteristics:
- One level at a time keeps the priority hierarchy very stable
- Good starvation prevention, but slow to restore responsiveness after a phase change
- Avoids sudden priority jumps that could disrupt running interactive work
Policy 5: Hybrid Promotion (Graduated with I/O Boost)
Combine gradual aging with instant boost for I/O events.
// Aging: gradual promotion over time
if (wait_time >= AGE_THRESHOLD) {
priority = max(priority - 1, 0); // One level up
}
// I/O boost: instant promotion for interactive I/O
if (io_completion && is_user_interaction(io_type)) {
priority = 0; // Jump to top
}
Characteristics:
- Context-dependent: gradual aging for waiting processes, instant boost for interactive I/O
- Excellent starvation prevention with high stability
- Mirrors the combinations production schedulers actually deploy
| Policy | Targeting | Starvation Prevention | Responsiveness | Stability |
|---|---|---|---|---|
| Global Boost | None (all processes) | Excellent | Periodic only | Low (congestion) |
| Aging | Waiting processes | Excellent | Time-delayed | High |
| I/O Boost | I/O-completing processes | Limited | Immediate | Medium |
| Graduated | Individual processes | Good | Slow | Very High |
| Hybrid | Context-dependent | Excellent | Context-dependent | High |
Production systems typically combine multiple promotion mechanisms. Windows uses I/O boost for responsiveness plus quantum end boost recovery. BSD uses aging. Linux CFS achieves promotion effects through virtual runtime decay. The right combination depends on workload characteristics.
Beyond which policy to use, the speed and granularity of queue movement significantly impact system behavior.
Movement Step Size:
- Single-step movement (±1 level): stable and predictable; misclassification is cheap to correct, but adaptation to phase changes is slow
- Multi-step movement (jump multiple levels): adapts quickly to phase changes but can overshoot, causing priority oscillation
- Continuous priority (no discrete levels): finest granularity, at the cost of sorted data structures instead of simple per-level queues
```python
from dataclasses import dataclass

@dataclass
class MovementConfig:
    """Configuration for queue movement policies."""
    # Demotion parameters
    demote_allotment_factor: float = 2.0  # Allotment = factor * quantum
    quantum_multiplier: float = 2.0       # Quantum doubles each level
    # Promotion parameters
    global_boost_period: int = 1000       # ms between global boosts
    aging_threshold: int = 500            # ms wait before aging promotion
    io_boost_levels: int = 2              # Levels to boost on I/O completion
    # Movement granularity
    demotion_step: int = 1                # Levels to demote (usually 1)
    promotion_step: int = 1               # Levels to promote (aging)
    io_boost_to_top: bool = True          # I/O completion jumps to top?

def simulate_movement_behavior(config: MovementConfig):
    """Demonstrate different movement configurations."""
    print("=" * 60)
    print("Queue Movement Policy Analysis")
    print("=" * 60)

    # Scenario 1: CPU-bound process demotion
    print("\n1. CPU-bound process demotion:")
    priority = 0  # Start at top
    time_at_level = 0
    current_time = 0
    for quantum_num in range(10):
        quantum_at_level = 8 * (config.quantum_multiplier ** priority)
        allotment = quantum_at_level * config.demote_allotment_factor
        # Process uses full quantum
        time_at_level += quantum_at_level
        current_time += quantum_at_level
        print(f"  Quantum {quantum_num+1}: Level {priority}, "
              f"used {quantum_at_level}ms, accumulated {time_at_level}ms")
        if time_at_level >= allotment:
            priority = min(priority + config.demotion_step, 3)  # 4 levels
            time_at_level = 0
            print(f"    -> Demoted to level {priority}")

    # Scenario 2: Interactive process with I/O
    print("\n2. Interactive process (I/O every 5ms):")
    priority = 0
    time_at_level = 0
    for burst in range(5):
        cpu_time = 5  # Short CPU burst
        time_at_level += cpu_time
        if time_at_level >= 8 * config.demote_allotment_factor:
            priority = min(priority + 1, 3)
            time_at_level = 0
            demote_msg = " -> Demoted"
        else:
            demote_msg = " -> Stayed"
        # I/O completion boost
        if config.io_boost_to_top:
            priority = 0
            time_at_level = 0
            boost_msg = ", Boosted to top on I/O"
        else:
            priority = max(priority - config.io_boost_levels, 0)
            boost_msg = f", Partial boost to {priority}"
        print(f"  Burst {burst+1}: {cpu_time}ms CPU, I/O{demote_msg}{boost_msg}")

    # Scenario 3: Aging behavior
    print("\n3. Low-priority process aging:")
    priority = 3  # Start at bottom
    wait_time = 0
    while priority > 0:
        # Simulate waiting
        wait_time += 100  # 100ms wait increments
        if wait_time >= config.aging_threshold:
            priority = max(priority - config.promotion_step, 0)
            wait_time = 0
            print(f"  After {config.aging_threshold}ms wait: "
                  f"Promoted to level {priority}")
        else:
            print(f"  Waited {wait_time}ms, still at level {priority}")
    print("  Reached top priority after aging")

# Example configurations
if __name__ == "__main__":
    # Conservative config
    conservative = MovementConfig(
        demote_allotment_factor=3.0,  # Slower demotion
        aging_threshold=1000,         # Slow aging
        io_boost_to_top=False,        # Gradual I/O boost
        io_boost_levels=1,
    )
    # Aggressive config
    aggressive = MovementConfig(
        demote_allotment_factor=1.0,  # Fast demotion
        aging_threshold=200,          # Fast aging
        io_boost_to_top=True,         # Jump to top on I/O
    )
    print("\n>>> CONSERVATIVE CONFIGURATION <<<")
    simulate_movement_behavior(conservative)
    print("\n>>> AGGRESSIVE CONFIGURATION <<<")
    simulate_movement_behavior(aggressive)
```

Movement Speed Tradeoffs:
| Speed | Demotion Effect | Promotion Effect | Best For |
|---|---|---|---|
| Fast | Quick CPU-bound identification | Quick starvation relief | Interactive-dominant |
| Slow | Tolerant of bursty behavior | Stable priority hierarchy | Server workloads |
| Asymmetric | Fast demotion, slow promotion | Conservative, stable | Mixed workloads |
| Symmetric | Balanced both directions | Dynamic, potentially unstable | Varied workloads |
The asymmetric approach (fast demotion, slow promotion) is common in production because:
- Fast demotion quickly identifies CPU-bound processes before they damage interactive latency
- Slow promotion prevents oscillation: a process must demonstrate sustained waiting or I/O behavior before regaining priority
- Together they produce the conservative, stable hierarchy that mixed workloads favor
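For illustration, an asymmetric policy can be sketched as two rules with deliberately mismatched thresholds (all constants here are assumed):

```python
DEMOTE_ALLOTMENT_MS = 10  # fast: one exhausted allotment triggers demotion
AGE_THRESHOLD_MS = 500    # slow: a long wait is required for promotion
NUM_LEVELS = 4            # priority 0 is highest

def on_cpu_use(priority, used_ms):
    """Fast demotion: a single exhausted allotment moves the process down."""
    if used_ms >= DEMOTE_ALLOTMENT_MS:
        return min(priority + 1, NUM_LEVELS - 1)
    return priority

def on_wait(priority, waited_ms):
    """Slow promotion: only sustained waiting moves the process up one level."""
    if waited_ms >= AGE_THRESHOLD_MS:
        return max(priority - 1, 0)
    return priority

print(on_cpu_use(0, used_ms=10))  # 1 - demoted immediately
print(on_wait(1, waited_ms=100))  # 1 - not yet
print(on_wait(1, waited_ms=500))  # 0 - promoted after sustained waiting
```

A CPU burn costs a level immediately, while climbing back requires 500ms of demonstrated waiting per level.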
Real-world schedulers must handle numerous edge cases that simple MLFQ rules don't address:
Edge Case 1: Process Fork/Clone
When a process creates a child, what priority should the child have?
Options:
- Inherit the parent's current priority
- Start at the highest priority (treat the child as a new arrival)
- Start at the lowest priority (pessimistic)
Best practice: Inherit priority but reset allotment timer. This prevents fork bombs from gaming the system while maintaining priority inheritance for legitimate child processes.
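A minimal sketch of that best practice, with illustrative field names:

```python
from dataclasses import dataclass, replace

@dataclass
class Proc:
    priority: int          # 0 = highest
    time_at_priority: int  # ms consumed at the current level

def on_fork(parent: Proc) -> Proc:
    """Child inherits the parent's priority (so a fork bomb cannot regain
    top priority by forking) but starts its own allotment timer."""
    return replace(parent, time_at_priority=0)

parent = Proc(priority=2, time_at_priority=15)
child = on_fork(parent)
print(child.priority, child.time_at_priority)  # 2 0
```

The parent's accounting is untouched; only the child begins with a fresh timer.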
Edge Case 2: Priority Inversion
A low-priority process holds a lock needed by a high-priority process.
Solutions:
- Priority inheritance: the lock holder temporarily runs at the waiter's priority
- Priority ceiling: the lock itself carries a priority that any holder assumes
- Avoidance: keep critical sections short and prefer lock-free structures where possible
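Priority inheritance, the most widely deployed of these solutions, can be sketched as follows (illustrative names; 0 is the highest priority):

```python
class Lock:
    def __init__(self):
        self.holder = None

class Proc:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority       # 0 = highest
        self.base_priority = priority  # restored when the lock is released

def block_on(lock, waiter):
    """If a higher-priority process blocks on the lock, the holder
    temporarily inherits that priority so it can finish and release."""
    holder = lock.holder
    if holder and waiter.priority < holder.priority:
        holder.priority = waiter.priority  # inherit the waiter's priority

def release(lock):
    lock.holder.priority = lock.holder.base_priority  # restore base priority
    lock.holder = None

lock = Lock()
low = Proc("low", priority=3)
high = Proc("high", priority=0)
lock.holder = low
block_on(lock, high)
print(low.priority)  # 0 - boosted while holding the lock
release(lock)
print(low.priority)  # 3 - restored after release
```

The boost lasts only while the lock is held, so the inversion window is bounded by the critical section length.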
Edge Case 3: Sleeping/Resumed Processes
A process sleeps for extended time (user suspends laptop, process waits for rare event).
Options:
- Preserve the pre-sleep priority exactly
- Reset to highest priority (treat the waker as a new arrival)
- Decay the stored priority gradually with sleep duration
Best practice: Preserve priority for short sleeps; treat as new arrival for very long sleeps (> boost period).
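That rule can be sketched as a wake-up handler (the boost-period value is assumed):

```python
BOOST_PERIOD_MS = 1000  # illustrative: matches a 1s global boost period

def on_wakeup(priority, time_at_priority, slept_ms):
    """Preserve state for short sleeps; treat very long sleepers as new
    arrivals, since their old priority no longer reflects their behavior."""
    if slept_ms > BOOST_PERIOD_MS:
        return 0, 0  # reset to top priority with a fresh allotment timer
    return priority, time_at_priority

print(on_wakeup(2, 12, slept_ms=300))   # (2, 12) - short sleep, preserved
print(on_wakeup(2, 12, slept_ms=5000))  # (0, 0)  - long sleep, reset
```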
| Edge Case | Challenge | Common Solution |
|---|---|---|
| Process fork | Child priority unclear | Inherit priority, reset allotment |
| Priority inversion | Low-priority blocks high-priority | Priority inheritance |
| Long sleep | Stale priority after wake | Reset if sleep > boost period |
| Exec (new program) | Old behavior irrelevant | Reset to highest priority |
| Thread creation | New thread vs existing process | Inherit process priority |
| Nice value change | User requests priority change | Map to appropriate queue level |
Edge Case 4: Scheduler Tick Drift
Timer interrupts aren't perfectly accurate. A process may use slightly more or less than its quantum.
Solutions:
- Account actual elapsed time using high-resolution timestamps rather than counting ticks
- Tolerate small overruns, charging the excess against the process's next quantum
Edge Case 5: Real-Time Process Interference
Real-time processes may starve MLFQ-managed processes entirely.
Solutions:
- Reserve a minimum CPU share for non-real-time processes (Linux does this via sched_rt_runtime_us)
- Throttle real-time processes that exceed their time budget
Production schedulers devote a significant share of their code to handling edge cases. The 'simple' MLFQ rules are just 20% of a real scheduler; the remaining 80% is edge case handling, integration with other subsystems, and platform-specific optimizations. Always consider edge cases when designing scheduling policies.
Let's examine how major operating systems implement queue movement:
Linux CFS: Virtual Runtime Movement
Linux CFS doesn't use discrete queues but achieves equivalent effects through virtual runtime:
// Virtual runtime increases as process runs
vruntime += delta_exec * (NICE_0_WEIGHT / weight);
// Process with lowest vruntime is selected
// This naturally sorts by CPU consumption
current = pick_next_entity(cfs_rq);
Movement equivalents:
- Demotion: vruntime grows as a process consumes CPU, pushing it later in the run order
- Promotion: a waking sleeper is placed near min_vruntime, so it runs soon without starving others
- sched_latency ensures all runnable processes run within the target period

Key parameters:
- sched_latency_ns: Target latency for runnable processes (6ms default)
- sched_min_granularity_ns: Minimum runtime before preemption (0.75ms)

Windows: Dynamic Priority with Boost/Decay
Windows uses explicit priority levels (0-31) with dynamic adjustment:
// Priority boost on I/O completion
if (io_completed) {
boost = get_io_boost(io_type); // 1-15 depending on device
current_priority = base_priority + boost;
}
// Priority decay each quantum
if (used_full_quantum) {
current_priority = max(current_priority - 1, base_priority);
}
Movement characteristics:
- Demotion: priority decays one level per full quantum, never below the base priority
- Promotion: event-driven boosts on I/O completion, UI input, foreground focus, and lock release
FreeBSD ULE Scheduler:
ULE uses interactivity scoring to guide movement:
// Interactivity score based on sleep vs. run ratio
interactivity = SCHED_INTERACT_MAX -
(run_time / (sleep_time + run_time)) * SCHED_INTERACT_MAX;
// High interactivity → high priority
priority = base_priority - interactivity_boost(interactivity);
Movement characteristics:
- No discrete movement events: priority is recomputed continuously from the sleep/run ratio
- Processes that sleep frequently score as interactive and receive higher priority
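Plugging illustrative numbers into the simplified score above (with SCHED_INTERACT_MAX scaled to 100 here; ULE's actual code is more elaborate) shows how sleep-heavy processes score as interactive:

```python
SCHED_INTERACT_MAX = 100  # illustrative scale

def interactivity(run_time, sleep_time):
    """Simplified sleep/run-ratio score: more sleep -> higher score."""
    return SCHED_INTERACT_MAX - (run_time / (sleep_time + run_time)) * SCHED_INTERACT_MAX

print(interactivity(run_time=5, sleep_time=95))   # ~95 - mostly sleeping, interactive
print(interactivity(run_time=90, sleep_time=10))  # ~10 - CPU-bound
```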
| System | Demotion Mechanism | Promotion Mechanism | Boost Events |
|---|---|---|---|
| Linux CFS | vruntime increase | Sleep reset, min_vruntime | None (implicit) |
| Windows | Quantum decay | I/O completion boost | UI, I/O, foreground, lock |
| FreeBSD ULE | Run time accumulation | Sleep time accumulation | None (continuous) |
| Solaris TS | Dispatch table levels | Time slice + sleep | Wakeup boost |
| macOS GCD | QoS maintenance | QoS inheritance | User interaction |
Despite different implementations, all production schedulers share core principles: penalize CPU consumption, reward I/O behavior, ensure bounded starvation, and adapt to phase changes. The specific mechanism (discrete queues, vruntime, interactivity scores) varies, but the goals are universal.
When designing queue movement policies for custom systems (embedded, real-time, specialized servers), consider these guidelines:
Step 1: Define Workload Characteristics
Before designing policies, understand your workload:
- What fraction of processes are interactive versus CPU-bound?
- What worst-case response latency is acceptable?
- How tolerant is the workload of temporarily starved background work?
| Workload Type | Demotion | Promotion | Boost Period |
|---|---|---|---|
| Interactive (desktop) | Moderate (2-3× quantum) | I/O boost to top | 1s |
| Server (mixed) | Aggressive (1× quantum) | Aging + I/O boost | 500ms |
| Batch (HPC) | Slow (5× quantum) | Aging only | 10s |
| Real-time (embedded) | Per-deadline miss | Deadline-based | N/A |
| Container/VM | Conservative | Share-based | Per-cgroup |
Step 2: Set Demotion Parameters
allotment[i] = base_quantum * (quantum_multiplier ^ i) * demotion_factor
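With the desktop-style numbers used elsewhere on this page (8ms base quantum, 2x multiplier, demotion factor 2.0), the formula gives per-level allotments:

```python
base_quantum = 8          # ms
quantum_multiplier = 2.0  # quantum doubles each level
demotion_factor = 2.0     # allotment = 2x the level's quantum

# allotment[i] = base_quantum * multiplier^i * factor, for 4 levels
allotment = [base_quantum * quantum_multiplier**i * demotion_factor
             for i in range(4)]
print(allotment)  # [16.0, 32.0, 64.0, 128.0] ms for levels 0..3
```

Lower levels get longer allotments, matching the longer quanta they already use.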
Step 3: Set Promotion Parameters
aging_threshold = boost_period / num_queues // Ensures progression
io_boost_levels = interactive_dominance ? ALL : 1-2
Step 4: Tune Boost Period
boost_period = max_acceptable_latency / (1 + starvation_tolerance)
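With assumed inputs for max_acceptable_latency and starvation_tolerance (both values illustrative), Steps 3 and 4 compute directly:

```python
max_acceptable_latency = 2000  # ms: worst tolerable wait for any process (assumed)
starvation_tolerance = 1.0     # dimensionless tuning knob (assumed)
num_queues = 4

# Step 4: boost period bounds worst-case starvation
boost_period = max_acceptable_latency / (1 + starvation_tolerance)
# Step 3: aging threshold ensures a bottom-queue process can climb
# through every level within one boost period
aging_threshold = boost_period / num_queues

print(boost_period, aging_threshold)  # 1000.0 250.0
```

These particular inputs land on a 1s boost period with 250ms aging, squarely in the desktop range discussed on this page.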
Step 5: Handle Edge Cases
Explicitly decide policy for:
- Fork and exec behavior
- Priority inversion handling
- Long sleeps and wake-ups
- Thread creation
- Nice value / user-requested priority changes
```c
/* Template for custom MLFQ movement policy */

#include <stdint.h>
#include <stdbool.h>
#include <math.h>

/* Process is assumed to be defined by the host kernel or simulator,
 * with at least 'priority' and 'time_at_priority_ns' fields. */

/* Configuration - adjust for your workload */
typedef struct {
    /* Queue structure */
    uint32_t num_queues;
    uint64_t base_quantum_ns;
    double quantum_multiplier;
    /* Demotion policy */
    double allotment_factor;   /* Allotment = quantum * factor */
    bool use_cumulative_time;  /* Track cumulative vs per-quantum */
    /* Promotion policy */
    uint64_t boost_period_ns;
    uint64_t aging_threshold_ns;
    bool io_boost_to_top;
    uint32_t io_boost_levels;
    /* Edge cases */
    bool inherit_priority_on_fork;
    bool reset_on_long_sleep;
    uint64_t long_sleep_threshold_ns;
} MLFQPolicy;

/* Example: Interactive desktop configuration */
static const MLFQPolicy DESKTOP_POLICY = {
    .num_queues = 4,
    .base_quantum_ns = 8 * 1000000,               /* 8ms */
    .quantum_multiplier = 2.0,
    .allotment_factor = 2.0,
    .use_cumulative_time = true,
    .boost_period_ns = 1000 * 1000000,            /* 1 second */
    .aging_threshold_ns = 250 * 1000000,          /* 250ms */
    .io_boost_to_top = true,
    .io_boost_levels = 4,                         /* Jump to top */
    .inherit_priority_on_fork = true,
    .reset_on_long_sleep = true,
    .long_sleep_threshold_ns = 2000 * 1000000,    /* 2 seconds */
};

/* Example: Server workload configuration */
static const MLFQPolicy SERVER_POLICY = {
    .num_queues = 3,
    .base_quantum_ns = 20 * 1000000,     /* 20ms */
    .quantum_multiplier = 2.0,
    .allotment_factor = 1.5,             /* Faster demotion */
    .use_cumulative_time = true,
    .boost_period_ns = 500 * 1000000,    /* 500ms - prevent starvation */
    .aging_threshold_ns = 100 * 1000000, /* 100ms */
    .io_boost_to_top = false,
    .io_boost_levels = 1,                /* Modest boost */
    .inherit_priority_on_fork = true,
    .reset_on_long_sleep = false,        /* Preserve state */
    .long_sleep_threshold_ns = 0,
};

/* Apply movement after process runs for 'runtime_ns' */
void apply_movement(Process* p, uint64_t runtime_ns, MLFQPolicy* policy) {
    p->time_at_priority_ns += runtime_ns;
    uint64_t quantum = policy->base_quantum_ns *
                       pow(policy->quantum_multiplier, p->priority);
    uint64_t allotment = quantum * policy->allotment_factor;
    if (p->time_at_priority_ns >= allotment) {
        /* Demote */
        if (p->priority < policy->num_queues - 1) {
            p->priority++;
            p->time_at_priority_ns = 0;
        }
    }
}

/* Apply I/O completion boost */
void apply_io_boost(Process* p, MLFQPolicy* policy) {
    if (policy->io_boost_to_top) {
        p->priority = 0;
    } else {
        p->priority = (p->priority > policy->io_boost_levels)
                          ? p->priority - policy->io_boost_levels
                          : 0;
    }
    p->time_at_priority_ns = 0;
}
```

Movement policies directly impact key performance metrics. Understanding these relationships helps in policy tuning.
Impact on Response Time:
Impact on Throughput:
Impact on Fairness:
| Parameter Change | Response Time | Throughput | Fairness |
|---|---|---|---|
| ↓ Allotment factor | Better ↑ | Slightly worse ↓ | Neutral |
| ↓ Boost period | Better ↑ | Worse ↓ | Better ↑ |
| ↑ Quantum multiplier | Worse ↓ | Better ↑ | Neutral |
| Enable I/O boost | Much better ↑↑ | Neutral | Better ↑ |
| ↑ Queue count | Better ↑ | Neutral | Neutral |
| Enable aging | Neutral | Neutral | Much better ↑↑ |
Benchmark Example:
Consider tuning the boost period on a mixed workload (50% interactive, 50% batch):
| Boost Period | Avg Response (interactive) | Batch Throughput | Observations |
|---|---|---|---|
| 100ms | 12ms | 85% of baseline | Excellent response, frequent disruption |
| 500ms | 25ms | 95% of baseline | Good balance |
| 1000ms | 45ms | 98% of baseline | Slight response degradation |
| 5000ms | 150ms | 100% baseline | Poor interactive response |
| No boost | > 1000ms (starvation) | 100%+ | Unusable for interactive |
Observation: The 500ms-1000ms range typically provides the best balance. Shorter periods disrupt batch work; longer periods harm interactive response.
These relationships are workload-dependent. A 500ms boost period optimal for one workload may be wrong for another. Always profile actual workload behavior before and after policy changes. Use scheduler tracing (ftrace, perf sched) to understand real impact.
We have comprehensively explored queue movement policies—the detailed algorithms governing how processes navigate between priority levels in multi-level scheduling systems.
Module Complete:
With this page, we have completed our comprehensive exploration of Round Robin and Multi-Level Queue Scheduling. You now understand:
- Round Robin mechanics and time quantum tradeoffs
- Multi-level queue organization and scheduling between levels
- MLFQ feedback rules: demotion, promotion, aging, and boosting
- How production schedulers (Linux, Windows, FreeBSD) realize these policies
- How to design and tune queue movement policies for specific workloads
This knowledge forms the foundation for understanding CPU scheduling in any operating system—from embedded RTOS to cloud hypervisors.
You have mastered Round Robin and Multi-Level Queue scheduling—from basic Round Robin mechanics through sophisticated MLFQ movement policies. You now possess the conceptual framework to understand, analyze, and tune CPU scheduling in any modern operating system.