CFS excels at providing proportionally fair CPU access across competing workloads. But some applications have requirements that fairness cannot satisfy:
For these workloads, Linux provides real-time scheduling policies that operate outside the CFS framework. Real-time tasks receive absolute priority over normal tasks—they preempt immediately and run until they voluntarily yield or block.
This page explores Linux's real-time capabilities: the SCHED_FIFO and SCHED_RR policies for static priority real-time, the newer SCHED_DEADLINE for deadline-based scheduling, and the trade-offs and configurations that make real-time work on a general-purpose operating system.
By the end of this page, you will understand: (1) The distinction between hard and soft real-time, (2) SCHED_FIFO and SCHED_RR policy semantics, (3) Real-time priority levels and their relationship to CFS, (4) The SCHED_DEADLINE policy and EDF scheduling, and (5) Configuration, safety mechanisms, and practical considerations.
In a real-time system, correctness depends not only on the computational results but also on when those results are produced. Understanding the taxonomy of real-time systems is essential before diving into Linux's implementation.
Hard vs. Soft Real-Time
Hard real-time systems have absolute deadlines. Missing a deadline constitutes system failure, potentially with catastrophic consequences:
Soft real-time systems have deadlines where occasional misses are tolerable, typically degrading quality rather than causing failure:
Standard Linux is designed for soft real-time. While it provides real-time scheduling policies, it cannot guarantee microsecond-level deadline bounds because of non-preemptible kernel sections, interrupt handling latency, and driver behavior. Hard real-time requires a specialized system: a kernel built with the PREEMPT_RT patches, a co-kernel such as Xenomai, or a dedicated RTOS.
Priority-Based vs. Deadline-Based Scheduling
Linux offers two paradigms for real-time scheduling:
Static Priority (e.g., Rate-Monotonic): Tasks are assigned fixed priority levels, for instance by the rate-monotonic rule that gives shorter-period tasks higher priority. Higher-priority tasks always preempt lower-priority ones. The programmer/administrator determines priorities at design time.
Dynamic Priority (Deadline-Based): Tasks specify their timing requirements (period, deadline, execution time). The scheduler dynamically orders tasks by deadline urgency.
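To make the static-priority paradigm concrete, here is a minimal sketch of rate-monotonic priority assignment: tasks with shorter periods receive higher fixed priorities. The `rt_task` struct, the function names, and the choice to count down from a caller-supplied top priority are assumptions made for this illustration; they are not part of any Linux API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical task descriptor used only in this sketch. */
struct rt_task {
    const char *name;
    unsigned long period_us;  /* activation period */
    int priority;             /* filled in by assign_rate_monotonic() */
};

/* qsort comparator: shorter period sorts first. */
static int by_period(const void *a, const void *b) {
    const struct rt_task *ta = a, *tb = b;
    if (ta->period_us < tb->period_us) return -1;
    if (ta->period_us > tb->period_us) return 1;
    return 0;
}

/*
 * Sort tasks by period and hand out descending priorities starting at
 * top_prio. Tasks with equal periods share a priority level.
 * Returns 0 on success, or -1 if the range [1, top_prio] is too small.
 */
int assign_rate_monotonic(struct rt_task *tasks, size_t n, int top_prio) {
    if (n == 0) return 0;
    qsort(tasks, n, sizeof(*tasks), by_period);
    int prio = top_prio;
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && tasks[i].period_us != tasks[i - 1].period_us) prio--;
        if (prio < 1) return -1;
        tasks[i].priority = prio;
    }
    return 0;
}
```

Calling `assign_rate_monotonic(tasks, n, 99)` on tasks with periods of 100 ms, 1 ms, and 5 ms would leave the 1 ms task at priority 99, the 5 ms task at 98, and the 100 ms task at 97. The resulting values could then be passed as `sched_priority` when creating SCHED_FIFO threads.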
The Scheduling Class Hierarchy
Linux organizes schedulers into classes with strict priority:
STOP class (highest) → kernel threads (migration, watchdog)
↓
DL class → SCHED_DEADLINE tasks
↓
RT class → SCHED_FIFO, SCHED_RR tasks (priorities 1-99)
↓
FAIR class → SCHED_NORMAL (CFS) tasks
↓
IDLE class (lowest) → the per-CPU idle task (SCHED_IDLE policy tasks are handled by the FAIR class at very low weight)
A higher class always preempts lower classes. Within each class, the class-specific algorithm determines scheduling.
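The strict ordering between classes can be modeled in a few lines. The sketch below is a simplified model, not the kernel's code: the enum, the `nr_runnable` array, and the function names are invented for illustration. It walks the classes from highest to lowest and returns the first one with runnable work.

```c
#include <stddef.h>

/* Scheduling classes in descending priority order, mirroring the diagram. */
enum sched_class_id {
    CLASS_STOP, CLASS_DL, CLASS_RT, CLASS_FAIR, CLASS_IDLE, CLASS_COUNT
};

static const char *const class_names[CLASS_COUNT] = {
    "STOP", "DL", "RT", "FAIR", "IDLE"
};

/*
 * Return the highest-priority class that has a runnable task.
 * nr_runnable[i] is the number of runnable tasks in class i.
 * Falls back to CLASS_IDLE when nothing else is runnable.
 */
enum sched_class_id pick_class(const unsigned int nr_runnable[CLASS_COUNT]) {
    for (int c = CLASS_STOP; c < CLASS_IDLE; c++) {
        if (nr_runnable[c] > 0) return (enum sched_class_id)c;
    }
    return CLASS_IDLE;
}

/* Human-readable name for a class, for logging. */
const char *class_name(enum sched_class_id id) {
    return class_names[id];
}
```

With one runnable RT task and fifty runnable FAIR tasks, `pick_class` returns `CLASS_RT`: the number of waiting CFS tasks never matters as long as a higher class has work.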
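SCHED_FIFO is the simplest real-time policy. A SCHED_FIFO task runs until it blocks (for example on I/O or a lock), voluntarily yields with sched_yield(), or is preempted by a higher-priority task.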
Key Characteristics:
When to Use SCHED_FIFO:
```c
/*
 * SCHED_FIFO Real-Time Task Example
 *
 * This example creates a high-priority real-time task
 * that processes audio buffers with minimal latency.
 */

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

#define AUDIO_PRIORITY 80 /* High RT priority (1-99 scale) */

/*
 * Placeholder so the example compiles. Real code would:
 *   1. Wait for an audio buffer to become ready (blocking)
 *   2. Process the audio samples
 *   3. Output the processed samples
 * The sleep stands in for the blocking wait in step 1.
 */
static void process_audio_buffer(void) {
    usleep(5000);
}

void *audio_processing_thread(void *arg) {
    (void)arg;

    printf("Audio thread running with SCHED_FIFO priority %d\n",
           AUDIO_PRIORITY);

    /* Simulate periodic audio processing */
    while (1) {
        process_audio_buffer();

        /*
         * Key behavior: This thread will NOT be preempted by
         * ANY CFS (normal) task, regardless of their nice level.
         * Only higher-priority RT tasks can preempt.
         */
    }

    return NULL;
}

int main(void) {
    pthread_t audio_thread;
    pthread_attr_t attr;
    struct sched_param param;
    int ret;

    /*
     * Lock memory to prevent page faults during RT operation.
     * Page faults cause unpredictable latency spikes.
     */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1) {
        perror("mlockall failed (need CAP_IPC_LOCK)");
    }

    /* Initialize thread attributes */
    pthread_attr_init(&attr);

    /* Set SCHED_FIFO policy */
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);

    /* Set priority (1-99, higher is higher priority) */
    param.sched_priority = AUDIO_PRIORITY;
    pthread_attr_setschedparam(&attr, &param);

    /* Ensure policy/priority are used (vs inheriting from parent) */
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);

    /* Create the real-time thread */
    ret = pthread_create(&audio_thread, &attr,
                         audio_processing_thread, NULL);
    pthread_attr_destroy(&attr);
    if (ret != 0) {
        fprintf(stderr, "pthread_create failed: %s\n", strerror(ret));
        fprintf(stderr, "(Need CAP_SYS_NICE or root for RT scheduling)\n");
        return 1;
    }

    /* Main thread continues with normal scheduling... */
    pthread_join(audio_thread, NULL);
    return 0;
}

/*
 * Running this requires privileges:
 *
 * Option 1: Run as root
 *   sudo ./audio_rt
 *
 * Option 2: Set capabilities
 *   sudo setcap cap_sys_nice,cap_ipc_lock+ep ./audio_rt
 *
 * Option 3: Configure rtprio limit in /etc/security/limits.conf
 *   @audio - rtprio 99
 */
```

SCHED_FIFO Behavior Nuances
Preemption Rules:
Same-Priority Ordering:
Yield Behavior:
A SCHED_FIFO task that loops without blocking will completely starve all CFS tasks, including critical system processes like SSH. The system becomes unresponsive; only higher-priority RT tasks or a reboot can recover. Always ensure RT tasks have bounded execution and block appropriately. Linux provides sched_rt_runtime_us as a safety throttle.
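One common way to keep an RT task bounded is to structure it as a periodic loop that sleeps until an absolute wake-up time after each unit of work. The sketch below assumes a caller-supplied `work` callback and a fixed period; the helper names `timespec_add_ns` and `run_periodic` are made up for this example. The sleep is the blocking point that gives lower-priority tasks a chance to run.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance *t by ns nanoseconds (ns >= 0), keeping tv_nsec in [0, 1e9). */
void timespec_add_ns(struct timespec *t, long ns) {
    t->tv_sec  += ns / NSEC_PER_SEC;
    t->tv_nsec += ns % NSEC_PER_SEC;
    if (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec  += 1;
    }
}

/*
 * Call work() once per period for `cycles` iterations. Between
 * iterations the thread blocks in clock_nanosleep() until an absolute
 * deadline, so timing errors do not accumulate from cycle to cycle.
 * Returns 0 on success, -1 if the clock cannot be read, or the
 * clock_nanosleep() error number.
 */
int run_periodic(void (*work)(void), long period_ns, int cycles) {
    struct timespec next;
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0) return -1;

    for (int i = 0; i < cycles; i++) {
        work();
        timespec_add_ns(&next, period_ns);

        int err;
        do {  /* retry if a signal interrupts the sleep */
            err = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (err == EINTR);
        if (err != 0) return err;
    }
    return 0;
}
```

A thread running under SCHED_FIFO could call `run_periodic(process_one_buffer, 5000000L, n)` for a 5 ms cycle, where `process_one_buffer` is whatever bounded unit of work the application defines. If the work ever overruns its period, the sleep returns immediately, so a real application would also want to detect and handle overruns.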
SCHED_RR is identical to SCHED_FIFO with one crucial addition: time slicing among equal-priority tasks. When multiple SCHED_RR tasks share the same priority level, they round-robin using a configurable time quantum.
Key Differences from SCHED_FIFO:
| Aspect | SCHED_FIFO | SCHED_RR |
|---|---|---|
| Time slice | None | Yes (default 100ms) |
| Equal-priority behavior | Run until yield/block | Round-robin |
| Preemption by higher | Immediate | Immediate |
| Use case | Single RT task per priority | Multiple RT tasks at same priority |
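The difference in equal-priority behavior is easy to see in a toy simulation. The sketch below models CPU-bound tasks at one priority level sharing a single CPU; it is an idealized model (no context-switch cost, no blocking), and the function name and `MAX_TASKS` limit are assumptions for the example. A quantum of 0 stands in for SCHED_FIFO, where each task runs to completion in queue order.

```c
#include <stddef.h>

#define MAX_TASKS 16

/*
 * Simulate n equal-priority, CPU-bound tasks sharing one CPU.
 * Each task needs work_ms of CPU time.
 *   quantum_ms > 0  models SCHED_RR with that time slice
 *   quantum_ms == 0 models SCHED_FIFO (run to completion in order)
 * finish_ms[i] receives the time at which task i completes.
 * Returns 0 on success, -1 on invalid arguments.
 */
int simulate_equal_prio(size_t n, unsigned work_ms, unsigned quantum_ms,
                        unsigned finish_ms[]) {
    unsigned remaining[MAX_TASKS];
    unsigned now = 0;
    size_t left = n;

    if (n == 0 || n > MAX_TASKS || work_ms == 0) return -1;
    for (size_t i = 0; i < n; i++) remaining[i] = work_ms;

    while (left > 0) {
        /* One pass over the run queue; finished tasks are skipped. */
        for (size_t i = 0; i < n; i++) {
            if (remaining[i] == 0) continue;
            unsigned slice = remaining[i];
            if (quantum_ms > 0 && quantum_ms < slice) slice = quantum_ms;
            now += slice;
            remaining[i] -= slice;
            if (remaining[i] == 0) {
                finish_ms[i] = now;
                left--;
            }
        }
    }
    return 0;
}
```

For three tasks needing 250 ms each, the FIFO model finishes them at 250, 500, and 750 ms, while a 100 ms quantum finishes them at 650, 700, and 750 ms. Total throughput is identical; what changes is that under round-robin every task makes progress early, at the cost of the first task finishing later.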
When to Use SCHED_RR:
```c
/*
 * SCHED_RR Real-Time Task Example
 *
 * Creates multiple worker threads that share CPU time
 * fairly within the real-time class.
 */

#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define RT_PRIORITY 50
#define NUM_WORKERS 4

void *worker_thread(void *arg) {
    int id = *(int *)arg;
    int iterations = 0;

    printf("Worker %d started with SCHED_RR priority %d\n",
           id, RT_PRIORITY);

    while (iterations < 10) {
        /* Simulate computational work */
        volatile long i;
        for (i = 0; i < 100000000; i++) {
            /* Busy work */
        }

        printf("Worker %d completed iteration %d\n", id, ++iterations);

        /*
         * With SCHED_RR:
         * - Each worker runs for the time quantum (default 100ms)
         * - Then is preempted for the next equal-priority worker
         * - All workers make progress concurrently
         *
         * With SCHED_FIFO (same priority):
         * - First worker would run all 10 iterations
         * - Then second worker, etc.
         * - No interleaving without explicit yields
         */
    }

    return NULL;
}

int main(void) {
    pthread_t threads[NUM_WORKERS];
    pthread_attr_t attr;
    struct sched_param param;
    int worker_ids[NUM_WORKERS];
    int created = 0;

    pthread_attr_init(&attr);
    pthread_attr_setschedpolicy(&attr, SCHED_RR);
    param.sched_priority = RT_PRIORITY;
    pthread_attr_setschedparam(&attr, &param);
    pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);

    /* Create multiple RR workers at the same priority */
    for (int i = 0; i < NUM_WORKERS; i++) {
        worker_ids[i] = i;
        int ret = pthread_create(&threads[i], &attr,
                                 worker_thread, &worker_ids[i]);
        if (ret != 0) {
            /* Typically EPERM when RT privileges are missing */
            fprintf(stderr, "pthread_create failed: %s\n", strerror(ret));
            break;
        }
        created++;
    }
    pthread_attr_destroy(&attr);

    /*
     * Query the RR time quantum for informational purposes.
     * pid 0 means the calling thread; the value is only meaningful
     * as an RR quantum if that thread itself runs under SCHED_RR.
     */
    struct timespec ts;
    if (sched_rr_get_interval(0, &ts) == 0) {
        printf("RR time quantum: %ld.%09ld seconds\n",
               (long)ts.tv_sec, ts.tv_nsec);
    }

    /* Join only the threads that were actually created */
    for (int i = 0; i < created; i++) {
        pthread_join(threads[i], NULL);
    }

    return 0;
}

/*
 * The patterns below assume the workers compete for a single CPU.
 * On a multi-core machine they may simply run in parallel.
 *
 * Output pattern with SCHED_RR (interleaved):
 *   Worker 0 completed iteration 1
 *   Worker 1 completed iteration 1
 *   Worker 2 completed iteration 1
 *   Worker 3 completed iteration 1
 *   Worker 0 completed iteration 2
 *   ... (interleaved progress)
 *
 * Output pattern with SCHED_FIFO (sequential):
 *   Worker 0 completed iteration 1
 *   Worker 0 completed iteration 2
 *   ... Worker 0 all 10 ...
 *   Worker 1 completed iteration 1
 *   ... Worker 1 all 10 ...
 *   (etc.)
 */
```

Use SCHED_FIFO when you have one task per priority level or need explicit control over ordering. Use SCHED_RR when multiple tasks at the same priority need to share CPU time fairly. In practice, careful priority assignment often means SCHED_FIFO is sufficient; RR is a convenience for grouping RT tasks of equal importance.
SCHED_DEADLINE, introduced in Linux 3.14 (2014), implements the Earliest Deadline First (EDF) algorithm combined with the Constant Bandwidth Server (CBS) algorithm. This is the most sophisticated real-time policy in Linux.
Key Concepts:
Runtime (WCET): Maximum CPU time the task needs per period
Deadline: Time by which the runtime must be consumed
Period: How often the task repeats
The scheduler always runs the task with the earliest absolute deadline. This is dynamic—deadlines change as time passes and tasks complete their periods.
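The selection rule itself is small enough to sketch directly. The code below is an illustrative model rather than the kernel's implementation: the `dl_task` struct and the function names are assumptions, ties are broken by lowest index, and timestamp wraparound is ignored. It shows the basic parameter ordering (runtime ≤ deadline ≤ period), the earliest-deadline pick, and how a task's absolute deadline moves forward by one period when a job completes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a runnable deadline task for this sketch. */
struct dl_task {
    uint64_t runtime_ns;       /* CPU budget per period */
    uint64_t deadline_ns;      /* relative deadline */
    uint64_t period_ns;        /* activation period */
    uint64_t abs_deadline_ns;  /* absolute deadline of the current job */
};

/* Basic sanity rule: 0 < runtime <= deadline <= period. */
int dl_params_valid(const struct dl_task *t) {
    return t->runtime_ns > 0 &&
           t->runtime_ns <= t->deadline_ns &&
           t->deadline_ns <= t->period_ns;
}

/*
 * Earliest Deadline First: return the index of the task whose current
 * job has the earliest absolute deadline, or -1 if the set is empty.
 */
int edf_pick(const struct dl_task *tasks, size_t n) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (best < 0 ||
            tasks[i].abs_deadline_ns < tasks[(size_t)best].abs_deadline_ns) {
            best = (int)i;
        }
    }
    return best;
}

/* A finished job's successor is due one period later. */
void dl_next_job(struct dl_task *t) {
    t->abs_deadline_ns += t->period_ns;
}
```

Consider three tasks with periods of 100 ms, 20 ms, and 50 ms, all released at time zero with deadlines equal to their periods. The 20 ms task is picked first and again after its first job completes (its new deadline, 40 ms, is still the earliest). Only after its second job, when its deadline moves to 60 ms, does the 50 ms task take over. No task has a fixed rank; the ordering follows the deadlines.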
Why EDF Over Static Priority?
```c
/*
 * SCHED_DEADLINE Task Example
 *
 * Configure a task to receive guaranteed CPU bandwidth
 * with deadline-based scheduling.
 */

#define _GNU_SOURCE
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <linux/sched.h>
#include <sys/syscall.h>
#include <sys/types.h>

/*
 * SCHED_DEADLINE parameters:
 *
 * This task needs 10ms of computation every 100ms
 * Deadline is also 100ms (must finish before next period)
 */
#define SCHED_DEADLINE_RUNTIME  (10 * 1000000)   /* 10ms in ns */
#define SCHED_DEADLINE_DEADLINE (100 * 1000000)  /* 100ms in ns */
#define SCHED_DEADLINE_PERIOD   (100 * 1000000)  /* 100ms in ns */

/*
 * sched_attr structure for SCHED_DEADLINE.
 * Newer C library headers may already declare struct sched_attr and
 * sched_setattr(); if yours do, drop these local definitions.
 */
struct sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;
    uint64_t sched_deadline;
    uint64_t sched_period;
};

static int sched_setattr(pid_t pid, const struct sched_attr *attr,
                         unsigned int flags) {
    return syscall(__NR_sched_setattr, pid, attr, flags);
}

/* Placeholder for the task's real per-period computation. */
static void do_periodic_work(void) {
}

int main(void) {
    struct sched_attr attr;
    int ret;

    /* Initialize sched_attr */
    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.sched_policy = SCHED_DEADLINE;
    attr.sched_runtime = SCHED_DEADLINE_RUNTIME;
    attr.sched_deadline = SCHED_DEADLINE_DEADLINE;
    attr.sched_period = SCHED_DEADLINE_PERIOD;

    /*
     * Set the scheduling parameters
     * Kernel will reject if the new task would make the system
     * unschedulable (admission control)
     */
    ret = sched_setattr(0, &attr, 0);
    if (ret < 0) {
        perror("sched_setattr failed");
        fprintf(stderr,
                "Possible causes:\n"
                "  - Not running as root\n"
                "  - Utilization would exceed system capacity\n"
                "  - Invalid parameters (need runtime <= deadline <= period)\n");
        return 1;
    }

    printf("Running with SCHED_DEADLINE:\n"
           "  Runtime:  %llu ms\n"
           "  Deadline: %llu ms\n"
           "  Period:   %llu ms\n",
           (unsigned long long)attr.sched_runtime / 1000000,
           (unsigned long long)attr.sched_deadline / 1000000,
           (unsigned long long)attr.sched_period / 1000000);

    /*
     * Main work loop - periodic task pattern
     */
    while (1) {
        /* Do computation (should complete within runtime budget) */
        do_periodic_work();

        /*
         * sched_yield() has special meaning for SCHED_DEADLINE:
         * - Signals end of current period's computation
         * - Task blocks until next period begins
         * - Runtime budget resets for next period
         */
        sched_yield();
    }

    return 0;
}

/*
 * SCHED_DEADLINE admission control:
 *
 * Before accepting a new DEADLINE task, kernel checks:
 *
 *   Σ (runtime_i / period_i) ≤ total_available_bandwidth
 *
 * Default total bandwidth: 95% of each CPU (5% reserved for non-RT)
 * Configurable via: /proc/sys/kernel/sched_rt_runtime_us
 *
 * This prevents oversubscription that would cause missed deadlines.
 */
```

| Aspect | SCHED_FIFO | SCHED_RR | SCHED_DEADLINE |
|---|---|---|---|
| Priority model | Static (1-99) | Static (1-99) | Dynamic (deadline-based) |
| Time slicing | None | Yes (100ms default) | Implicit by runtime budget |
| Admission control | None | None | Yes (utilization check) |
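| Utilization limit | ~69% guaranteed schedulable (rate-monotonic bound) | ~69% guaranteed schedulable (rate-monotonic bound) | Up to ~100% (EDF is optimal on a single CPU) |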
| Configuration complexity | Low (assign priority) | Low (assign priority) | Medium (R, D, P parameters) |
| Class priority | Below DEADLINE | Below DEADLINE | Highest RT class |
| Best for | Known-priority tasks | Equal-importance tasks | Periodic, bounded workloads |
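The admission-control and utilization rows can be illustrated with a small utilization check. This is a single-CPU sketch under stated assumptions: the `dl_reservation` struct and function names are hypothetical, utilization is tracked in parts per million with integer arithmetic, and the 950000 ppm limit in the usage note corresponds to the default 95% bandwidth. The kernel's actual bookkeeping differs in its details.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bandwidth reservation: runtime_ns out of every period_ns. */
struct dl_reservation {
    uint64_t runtime_ns;
    uint64_t period_ns;
};

/* Utilization in parts per million (rounded down); 0 if period is 0. */
uint64_t utilization_ppm(const struct dl_reservation *r) {
    if (r->period_ns == 0) return 0;
    return r->runtime_ns * 1000000ULL / r->period_ns;
}

/*
 * Admission test: accept `candidate` only if the summed utilization of
 * all existing reservations plus the candidate stays within limit_ppm.
 * Returns 1 to admit, 0 to reject.
 */
int admit(const struct dl_reservation *existing, size_t n,
          const struct dl_reservation *candidate, uint64_t limit_ppm) {
    uint64_t total;

    if (candidate->period_ns == 0 ||
        candidate->runtime_ns > candidate->period_ns) {
        return 0;  /* malformed request */
    }
    total = utilization_ppm(candidate);
    for (size_t i = 0; i < n; i++) total += utilization_ppm(&existing[i]);
    return total <= limit_ppm;
}
```

With existing reservations of 10 ms and 40 ms per 100 ms (50% in total) and a limit of 950000 ppm, a request for 30 ms per 100 ms is admitted (80% total), while a request for 50 ms per 100 ms is rejected (100% total). Static-priority policies have no such gate, which is why the ~69% figure in the table is a design guideline the developer must check by hand.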
SCHED_DEADLINE tasks always preempt SCHED_FIFO and SCHED_RR tasks, regardless of their static priorities, because the DL class ranks above the RT class. The rationale is that a DEADLINE task with an imminent deadline needs the CPU immediately, whatever static priorities other tasks hold.
Real-time scheduling can easily render a system unresponsive if misconfigured. Linux provides several safety mechanisms and configuration points.
RT Throttling: The Safety Net
By default, Linux reserves CPU time for non-real-time tasks. Even if a runaway RT task consumes its full budget, CFS tasks still get some CPU.
/proc/sys/kernel/sched_rt_period_us = 1000000 (1 second)
/proc/sys/kernel/sched_rt_runtime_us = 950000 (950ms)
This means: In any 1-second period, RT tasks can only use 950ms total. The remaining 50ms (5%) is guaranteed to non-RT tasks.
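The arithmetic behind the throttle is simple enough to capture in two helper functions. The function names below are invented for this sketch; the inputs correspond to the values of `sched_rt_period_us` and `sched_rt_runtime_us`, with -1 meaning throttling is disabled.

```c
/*
 * Microseconds per period guaranteed to non-RT tasks.
 * runtime_us == -1 means throttling is disabled (nothing is reserved).
 * Returns -1 for invalid input.
 */
long long non_rt_reserve_us(long long period_us, long long runtime_us) {
    if (period_us <= 0 || runtime_us < -1 || runtime_us > period_us) {
        return -1;
    }
    if (runtime_us == -1) return 0;
    return period_us - runtime_us;
}

/*
 * Share of each period available to RT tasks, in whole percent
 * (rounded down). Returns 100 when throttling is disabled and -1
 * for invalid input.
 */
int rt_share_percent(long long period_us, long long runtime_us) {
    if (period_us <= 0 || runtime_us < -1 || runtime_us > period_us) {
        return -1;
    }
    if (runtime_us == -1) return 100;
    return (int)(runtime_us * 100 / period_us);
}
```

For the defaults, `non_rt_reserve_us(1000000, 950000)` gives 50000 µs and `rt_share_percent(1000000, 950000)` gives 95. A monitoring tool could feed these functions with values read from the two `/proc/sys/kernel` files.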
To disable throttling (dangerous in production):
echo -1 > /proc/sys/kernel/sched_rt_runtime_us
```bash
#!/bin/bash
# Real-Time Scheduling Configuration for Linux

# ========================================
# View current RT scheduling parameters
# ========================================

echo "=== RT Throttling Configuration ==="
echo "Period (us): $(cat /proc/sys/kernel/sched_rt_period_us)"
echo "Runtime (us): $(cat /proc/sys/kernel/sched_rt_runtime_us)"
echo "RT utilization: $(($(cat /proc/sys/kernel/sched_rt_runtime_us) * 100 / $(cat /proc/sys/kernel/sched_rt_period_us)))%"

echo ""
echo "=== SCHED_RR Time Quantum ==="
echo "RR timeslice (ms): $(cat /proc/sys/kernel/sched_rr_timeslice_ms)"

# ========================================
# Recommended production configuration
# ========================================

# Allow RT tasks to use 95% of CPU (default)
echo 950000 > /proc/sys/kernel/sched_rt_runtime_us

# Set RR time slice to 50ms (more responsive than default 100ms)
echo 50 > /proc/sys/kernel/sched_rr_timeslice_ms

# ========================================
# Per-user RT limits (/etc/security/limits.conf)
# ========================================

# Allow 'audio' group to use RT priorities up to 99
# @audio - rtprio 99

# Allow 'audio' group to lock memory
# @audio - memlock unlimited

# Allow specific user to use RT
# alice - rtprio 50
# alice - memlock 512000

# ========================================
# Capabilities for non-root RT scheduling
# ========================================

# Grant RT scheduling capability to a binary
# sudo setcap cap_sys_nice+ep /path/to/binary

# Check capabilities on a binary
# getcap /path/to/binary

# ========================================
# Monitoring RT task behavior
# ========================================

# View RT tasks with their priorities
ps -eo pid,class,rtprio,ni,comm --sort=-rtprio | head -20

# Real-time statistics from /proc/sched_debug
cat /proc/sched_debug | grep -A5 "cfs_rq\|rt_rq"

# Trace scheduling events (requires root, uses ftrace)
echo 1 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable
cat /sys/kernel/debug/tracing/trace_pipe

# ========================================
# Check for priority inversion issues
# ========================================

# Show tasks waiting on mutexes held by RT tasks
# (This would require specific debugging tools like lockdep)
```

For applications requiring sub-millisecond latency guarantees, consider the PREEMPT_RT patchset (being mainlined into Linux). It makes more kernel code preemptible, converts most spinlocks into sleeping locks, and provides significantly lower interrupt-to-process latency, often under 100μs versus 500μs or more on standard kernels.
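An application that depends on PREEMPT_RT may want to check at startup whether it is actually running on such a kernel. The sketch below uses a heuristic, not an official interface: it looks for the marker "PREEMPT_RT" (or "PREEMPT RT", seen with some older patch sets) in the kernel version string reported by `uname()`. The function names are assumptions for this example, and a kernel built with an unusual version string could defeat the check.

```c
#include <string.h>
#include <sys/utsname.h>

/*
 * Heuristic: does this kernel version string advertise PREEMPT_RT?
 * Returns 1 if a known marker is present, 0 otherwise.
 */
int version_is_preempt_rt(const char *version) {
    return version != NULL &&
           (strstr(version, "PREEMPT_RT") != NULL ||
            strstr(version, "PREEMPT RT") != NULL);
}

/*
 * Apply the heuristic to the running kernel.
 * Returns 1 if it looks like PREEMPT_RT, 0 if not, -1 if uname() fails.
 */
int running_kernel_is_preempt_rt(void) {
    struct utsname u;
    if (uname(&u) != 0) return -1;
    return version_is_preempt_rt(u.version);
}
```

A latency-critical program could log a warning, or refuse to start, when `running_kernel_is_preempt_rt()` returns 0.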
Real-time scheduling in Linux enables diverse applications that require predictable timing. Here are common use cases and their scheduling configurations.
Professional Audio (JACK/PipeWire)
Digital audio workstations need to process audio samples without glitches:
Industrial Automation (EtherCAT, CAN)
Real-time Ethernet and CAN bus protocols need precise packet timing:
```bash
#!/bin/bash
# Professional Audio Real-Time Setup
# Configures a Linux system for low-latency audio production

# 1. Add user to audio group for RT privileges
sudo usermod -aG audio "$USER"

# 2. Configure limits for audio group (/etc/security/limits.d/audio.conf)
cat << 'EOF' | sudo tee /etc/security/limits.d/audio.conf
@audio - rtprio 95
@audio - memlock unlimited
@audio - nice -19
EOF

# 3. Configure CPU governor for consistent timing
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# 4. Disable CPU frequency scaling (optional, for minimal jitter)
#    by raising each CPU's minimum frequency to its maximum
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq; do
    cat "${cpu/min/max}" | sudo tee "$cpu"
done

# 5. Isolate a CPU core for audio (kernel boot parameter)
# Add to GRUB_CMDLINE_LINUX in /etc/default/grub:
#   isolcpus=3 nohz_full=3 rcu_nocbs=3

# 6. Set RT scheduling for the audio server
# JACK automatically uses RT when properly configured
# PipeWire: copy /usr/share/pipewire/pipewire.conf to ~/.config/pipewire/
#   modify to enable RT with nice/rt settings

# 7. Verify configuration
echo ""
echo "Current user groups:"
groups

echo ""
echo "RT limits for current user:"
ulimit -r  # Max RT priority

# After relogin and starting JACK:
# jack_lsp -l  # Check JACK latency
# Should show latency in frames, e.g., 256 frames @ 48kHz = 5.3ms
```

Linux's real-time capabilities enable running soft real-time workloads on commodity hardware without dedicated RTOSes. Combined with low costs and rich ecosystems, this makes Linux the dominant platform for audio production, broadcast, industrial automation, and telecommunications: applications that once required specialized (and expensive) operating systems.
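The latency figure quoted at the end of the audio setup script (256 frames at 48 kHz is about 5.3 ms) follows from dividing the buffer size by the sample rate. A small helper, with a name chosen just for this example, makes the conversion explicit and shows how tight the scheduling budget for an audio thread is.

```c
/*
 * Time covered by one audio buffer, in microseconds (rounded down).
 * frames:  buffer size in sample frames
 * rate_hz: sample rate in frames per second
 * Returns 0 if rate_hz is 0.
 */
unsigned long buffer_latency_us(unsigned long frames, unsigned long rate_hz) {
    if (rate_hz == 0) return 0;
    return (unsigned long)((unsigned long long)frames * 1000000ULL / rate_hz);
}
```

`buffer_latency_us(256, 48000)` yields 5333 µs. The audio thread must wake, process, and deliver each buffer within that window every cycle, which is why such threads are typically run under a real-time policy with locked memory.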
Linux's real-time scheduling policies provide essential capabilities for latency-sensitive applications, complementing CFS's fairness-oriented approach with strict priority-based scheduling.
What's Next
The final page explores nice values in depth—the user-facing mechanism for influencing CFS scheduling without real-time privileges. We'll see how nice values map to weights, their practical impact on CPU allocation, and guidelines for effective use.
You now understand Linux's real-time scheduling policies—SCHED_FIFO for strict priority ordering, SCHED_RR for time-sliced RT scheduling, and SCHED_DEADLINE for deadline-based optimal scheduling. This knowledge enables building latency-sensitive applications while understanding their system-wide implications.