Standard Linux scheduling—the Completely Fair Scheduler (CFS)—is designed around a principle of fairness: every process gets its proportional share of CPU time based on weight. This philosophy is exactly wrong for real-time systems.
In real-time computing, we don't want fairness—we want priority. A high-priority task monitoring a nuclear reactor coolant level must run immediately, regardless of how much CPU time it has already consumed or how many other tasks are waiting. The low-priority task updating a log file can wait indefinitely.
Linux provides three scheduling policies specifically designed for real-time requirements: SCHED_FIFO, SCHED_RR, and SCHED_DEADLINE. Understanding when and how to use each is essential for building deterministic systems.
By the end of this page, you will understand: (1) How SCHED_FIFO implements strict priority scheduling; (2) How SCHED_RR adds time-slicing for equal-priority tasks; (3) How SCHED_DEADLINE implements Earliest Deadline First; (4) When to choose each policy; (5) How to configure and monitor RT scheduling; and (6) Common pitfalls and best practices.
Linux implements a modular scheduler architecture with multiple scheduling classes, each implementing a different scheduling algorithm. The scheduler consults these classes in priority order, running tasks from the highest-priority class that has runnable tasks.
| Scheduling Class | Policies | Priority Range | Use Case |
|---|---|---|---|
| Stop Class | (internal) | N/A | Kernel-internal task migration, CPU hotplug |
| Deadline Class | SCHED_DEADLINE | N/A (EDF) | Tasks with explicit timing requirements |
| RT Class | SCHED_FIFO, SCHED_RR | 1-99 (99 highest) | Real-time tasks requiring priority scheduling |
| Fair Class (CFS) | SCHED_OTHER, SCHED_BATCH, SCHED_IDLE | Nice -20 to +19 | Normal tasks, weighted fair sharing |
| Idle Class | SCHED_IDLE | N/A | Background tasks, run only when nothing else |
Critical Architectural Point:
The scheduler always selects from the highest-priority class with runnable tasks. This means:
```
Scheduler Decision Flow (pick_next_task):

  Stop class has a runnable task?      → yes: run the stop task
  Deadline class has a runnable task?  → yes: run the earliest-deadline task
  RT class has a runnable task?        → yes: run the highest-priority RT task
  Otherwise                            → run the CFS-selected task (Fair class)
  Idle class                           → runs only when nothing else is runnable
```

This strict priority ordering means RT tasks can completely starve normal tasks. A continuously running SCHED_FIFO task at priority 1 will prevent ALL SCHED_OTHER tasks from running, including your shell. Linux has RT throttling to prevent complete system lockup, but careful design is essential.
SCHED_FIFO implements the simplest real-time scheduling policy: strict priority with no time-slicing. A SCHED_FIFO task runs until it voluntarily yields, blocks, or is preempted by a higher-priority task.
SCHED_FIFO Algorithm:
```
SCHED_FIFO Scheduling Rules:

1. Pick the highest-priority runnable task.
2. Run that task until one of:
   a. Task blocks (I/O, mutex, sleep)
   b. Task yields (sched_yield())
   c. Task terminates
   d. A higher-priority task becomes runnable
3. When multiple tasks have the same priority:
   - Run them in FIFO order
   - A preempted task stays at the front of its queue
   - A waking/yielding task goes to the back of its queue
4. When a task blocks and later wakes:
   - It goes to the back of its priority queue
```

```
Example timeline (priorities: A=90, B=80, C=80, D=70):

Time:   0    5    10   15   20   25   30   35   40
A(90):  ████          ████████                        (preempts any lower task)
B(80):       ████              ████████
C(80):                                  ████          (FIFO order after B)
D(70):                                       ███████

Events: A runs, then blocks (B runs); A wakes and preempts B; A blocks
again; B resumes and blocks; C runs (same priority, queued behind B);
finally D gets the CPU.
```

Using SCHED_FIFO:
```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Placeholder for the application's periodic work. */
static void do_periodic_control_work(void) { /* application-specific */ }

/**
 * Configure current thread for SCHED_FIFO real-time scheduling
 *
 * @param priority RT priority (1-99, higher = more important)
 * @return 0 on success, -1 on failure
 */
int configure_sched_fifo(int priority) {
    struct sched_param param;

    /* Validate priority range */
    int min_prio = sched_get_priority_min(SCHED_FIFO);
    int max_prio = sched_get_priority_max(SCHED_FIFO);
    if (priority < min_prio || priority > max_prio) {
        fprintf(stderr, "Priority %d out of range [%d, %d]\n",
                priority, min_prio, max_prio);
        return -1;
    }

    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;

    /* Set scheduler policy and priority */
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0) {
        perror("sched_setscheduler failed");
        fprintf(stderr, "Note: Requires CAP_SYS_NICE or root\n");
        return -1;
    }

    printf("Configured SCHED_FIFO with priority %d\n", priority);
    return 0;
}

/**
 * Best practices for SCHED_FIFO tasks
 */
void rt_task_best_practices(void) {
    /* 1. Lock all memory to prevent page faults */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall failed");
    }

    /* 2. Pre-fault stack: touch stack pages before RT section */
    volatile char stack_prefault[8192];
    memset((void*)stack_prefault, 0, sizeof(stack_prefault));

    /* 3. Avoid dynamic memory allocation in RT path:
     *    pre-allocate all buffers before entering the RT loop */

    /* 4. Avoid blocking system calls without timeouts:
     *    use poll/select with timeouts, not blocking read() */
}

/**
 * Example: Periodic RT task using SCHED_FIFO
 */
void* periodic_rt_task(void* arg) {
    int period_us = *(int*)arg;
    struct timespec next_wake;

    /* Get current time */
    clock_gettime(CLOCK_MONOTONIC, &next_wake);

    while (1) {
        /* Calculate next wake time */
        next_wake.tv_nsec += period_us * 1000;
        while (next_wake.tv_nsec >= 1000000000) {
            next_wake.tv_nsec -= 1000000000;
            next_wake.tv_sec++;
        }

        /* Do RT work */
        do_periodic_control_work();

        /* Sleep until next period (precise timing) */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next_wake, NULL);
    }

    return NULL;
}

int main(int argc, char *argv[]) {
    /* Configure RT scheduling */
    if (configure_sched_fifo(80) != 0) {
        return 1;
    }

    rt_task_best_practices();

    int period_us = 1000;  /* 1ms period */
    periodic_rt_task(&period_us);
    return 0;
}
```

Command-line Configuration:
```bash
# Set SCHED_FIFO for a running process
sudo chrt -f -p 80 <pid>

# Run a new command with SCHED_FIFO
sudo chrt -f 80 ./my_rt_application

# View current scheduling policy of a process
chrt -p <pid>
# Output: pid 1234's current scheduling policy: SCHED_FIFO
#         pid 1234's current scheduling priority: 80

# Show valid priority ranges
chrt -m
# Output:
# SCHED_OTHER min/max priority : 0/0
# SCHED_FIFO min/max priority  : 1/99
# SCHED_RR min/max priority    : 1/99

# Note: nice/renice CANNOT set SCHED_FIFO. They only affect
# SCHED_OTHER and are completely separate from RT priorities!
```

SCHED_FIFO is ideal for event-driven tasks that do work and then block: interrupt handlers, I/O processing, and tasks waiting on condition variables. It's also appropriate when you have only one task per priority level, eliminating the need for time-slicing.
SCHED_RR extends SCHED_FIFO with time-slicing among equal-priority tasks. Tasks still follow strict priority rules, but when multiple SCHED_RR tasks share the same priority, they take turns via round-robin scheduling.
SCHED_RR Algorithm:
```
SCHED_RR Scheduling Rules:

1. Same as SCHED_FIFO, PLUS:
2. Each task has a time quantum (typically 100ms).
3. When the quantum expires:
   a. The task goes to the end of its priority queue
   b. The quantum is reset for its next run
   c. The next same-priority task runs
4. The quantum does NOT tick while:
   - the task is blocked
   - a higher-priority task is running
```

```
Example: three SCHED_RR tasks at priority 50 (quantum = 100ms)

Time(ms): 0    100  200  300  400  500
Task A:   ████           ████           ████
Task B:        ████           ████
Task C:             ████           ████

All tasks make progress; none is starved.
(With SCHED_FIFO, Task A would run forever: no time-slicing!)
```

Configuring SCHED_RR Time Quantum:
```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <time.h>

/**
 * Query and display the SCHED_RR time quantum
 *
 * Note: The quantum is system-wide, not per-process. On Linux 3.9+
 * it can be tuned via the kernel.sched_rr_timeslice_ms sysctl.
 */
void show_rr_quantum(void) {
    struct timespec quantum;

    /* Get the RR time slice for PID 0 (current process) */
    if (sched_rr_get_interval(0, &quantum) == 0) {
        printf("SCHED_RR time quantum: %ld.%09ld seconds\n",
               (long)quantum.tv_sec, quantum.tv_nsec);
        printf(" = %.1f milliseconds\n",
               quantum.tv_sec * 1000.0 + quantum.tv_nsec / 1000000.0);
    } else {
        perror("sched_rr_get_interval");
    }
}

/**
 * Configure SCHED_RR scheduling
 */
int configure_sched_rr(int priority) {
    struct sched_param param = { .sched_priority = priority };

    if (sched_setscheduler(0, SCHED_RR, &param) != 0) {
        perror("sched_setscheduler SCHED_RR failed");
        return -1;
    }

    printf("Configured SCHED_RR with priority %d\n", priority);
    show_rr_quantum();
    return 0;
}
```

```bash
# Run with SCHED_RR (round-robin)
sudo chrt -r 50 ./my_application

# Set a running process to SCHED_RR
sudo chrt -r -p 50 <pid>

# Query the scheduling policy of a process
sudo cat /proc/<pid>/sched | grep policy

# Configure the system-wide RR quantum (Linux 3.9+)
sudo sysctl kernel.sched_rr_timeslice_ms=50
```

In well-designed RT systems, each task should have a unique priority based on its timing requirements. If you find yourself needing SCHED_RR because multiple tasks share priorities, consider whether your priority assignment is correct. SCHED_RR is often a fallback for imprecise priority design rather than an intentional choice.
SCHED_DEADLINE implements Earliest Deadline First (EDF) scheduling, considered theoretically optimal for periodic real-time tasks. Instead of fixed priorities, tasks specify their timing requirements directly: period, deadline, and execution budget.
```
SCHED_DEADLINE Task Model:

          ◄──────────── Period ────────────►◄──────────── Period ────────────►
          │ Job N                           │ Job N+1
          │←─ Runtime ─►                    │←─ Runtime ─►
          │═════════════                    │═════════════
          │                 ↑ Deadline      │                 ↑ Deadline
          │←────────────────►               │←────────────────►

Example task:
  Period   = 10ms  (task activates every 10ms)
  Deadline = 8ms   (each job must complete within 8ms of activation)
  Runtime  = 2ms   (each job needs at most 2ms of CPU)

The scheduler ensures the task gets 2ms of CPU within each 8ms window.
```

EDF Scheduling Algorithm:
At any scheduling decision, the kernel runs the task with the earliest absolute deadline. On a single processor this is provably optimal: if any schedule can meet all deadlines, EDF will.
Admission Control:
Unlike SCHED_FIFO/RR, SCHED_DEADLINE performs admission control. When you try to add a deadline task, the kernel checks if the new task's requirements, combined with existing deadline tasks, can be satisfied. If total utilization exceeds capacity, the request is rejected.
```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <errno.h>

/*
 * Note: SCHED_DEADLINE requires direct syscall usage, as glibc does
 * not wrap sched_setattr()/sched_getattr() on all systems.
 */

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

struct sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;

    /* SCHED_OTHER/BATCH/IDLE */
    int32_t  sched_nice;

    /* SCHED_FIFO/RR */
    uint32_t sched_priority;

    /* SCHED_DEADLINE (all in nanoseconds) */
    uint64_t sched_runtime;
    uint64_t sched_deadline;
    uint64_t sched_period;
};

#define SCHED_FLAG_RESET_ON_FORK 0x01

/* Placeholder for the application's per-frame work. */
static void process_video_frame(void) { /* application-specific */ }

/* Syscall wrappers */
static int sched_setattr(pid_t pid, const struct sched_attr *attr,
                         unsigned int flags) {
    return syscall(SYS_sched_setattr, pid, attr, flags);
}

static int sched_getattr(pid_t pid, struct sched_attr *attr,
                         unsigned int size, unsigned int flags) {
    return syscall(SYS_sched_getattr, pid, attr, size, flags);
}

/**
 * Configure SCHED_DEADLINE for a periodic task
 *
 * @param runtime_ns  Maximum CPU time per period (nanoseconds)
 * @param deadline_ns Relative deadline from period start
 * @param period_ns   Period of the task (activation interval)
 */
int configure_sched_deadline(uint64_t runtime_ns,
                             uint64_t deadline_ns,
                             uint64_t period_ns) {
    struct sched_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.sched_policy = SCHED_DEADLINE;
    attr.sched_runtime = runtime_ns;
    attr.sched_deadline = deadline_ns;
    attr.sched_period = period_ns;

    if (sched_setattr(0, &attr, 0) != 0) {
        perror("sched_setattr SCHED_DEADLINE failed");
        if (errno == EBUSY) {
            fprintf(stderr, "Admission control failed: insufficient CPU.\n"
                            "Try reducing runtime or increasing period.\n");
        }
        return -1;
    }

    printf("Configured SCHED_DEADLINE:\n");
    printf("  Runtime:  %llu ns (%.2f ms)\n",
           (unsigned long long)runtime_ns, runtime_ns / 1e6);
    printf("  Deadline: %llu ns (%.2f ms)\n",
           (unsigned long long)deadline_ns, deadline_ns / 1e6);
    printf("  Period:   %llu ns (%.2f ms)\n",
           (unsigned long long)period_ns, period_ns / 1e6);
    printf("  Utilization: %.1f%%\n", 100.0 * runtime_ns / period_ns);
    return 0;
}

/**
 * Example: Video frame processing with SCHED_DEADLINE
 *
 * Requirements: Process one frame every 16.67ms (60 FPS)
 *   - Each frame takes up to 8ms to process
 *   - Deadline: must complete before the next frame
 */
int main(void) {
    /* 60 FPS video processing */
    uint64_t period_ns   = 16666667;  /* 16.67ms */
    uint64_t deadline_ns = 16666667;  /* Same as period */
    uint64_t runtime_ns  = 8000000;   /* 8ms max (48% utilization) */

    /*
     * Admission check happens here: the kernel verifies
     *   sum(runtime_i / period_i) <= CPU capacity
     *
     * Our utilization: 8/16.67 = 48%, leaving 52% for other
     * deadline tasks.
     */
    if (configure_sched_deadline(runtime_ns, deadline_ns, period_ns) != 0) {
        return 1;
    }

    /* Main RT loop */
    while (1) {
        /* Process frame - kernel guarantees we finish in time */
        process_video_frame();

        /* Done with this job: for SCHED_DEADLINE, sched_yield()
         * blocks the task until its next period starts. The kernel
         * also enforces that we never run more than runtime_ns
         * per period. */
        sched_yield();
    }

    return 0;
}
```

Advantages of SCHED_DEADLINE:

- Provably optimal (EDF): total utilization can approach 100%, versus roughly 69% for rate-monotonic fixed priorities
- Admission control rejects infeasible task sets up front instead of missing deadlines at runtime
- No manual priority assignment: timing requirements are expressed directly
- Per-task bandwidth isolation: a job that overruns its runtime is throttled rather than stealing CPU from other deadline tasks
SCHED_DEADLINE is powerful but has constraints: tasks must be periodic or sporadic (no arbitrary arrival); parameters must be known a priori; it's less flexible for aperiodic work; and it requires more careful WCET analysis to set runtime correctly.
Choosing the right scheduling policy depends on your task characteristics, system requirements, and complexity tolerance. Here's a comprehensive comparison:
| Aspect | SCHED_FIFO | SCHED_RR | SCHED_DEADLINE |
|---|---|---|---|
| Task Model | Priority-based | Priority-based with time-slice | Periodic/sporadic with parameters |
| Priorities | 1-99 fixed | 1-99 fixed + quantum | Implicit (earliest deadline) |
| Preemption | By higher priority only | By higher priority + quantum expiry | By earlier deadline only |
| Admission Control | None | None | Yes - rejects if infeasible |
| Max Utilization | ~69% (RMS) | ~69% (RMS) | 100% theoretical |
| Priority Inversion | Possible | Possible | Not applicable |
| Complexity | Low | Low | Medium-High |
| WCET Required | No (informal) | No (informal) | Yes (for runtime parameter) |
| Aperiodic Tasks | Easy | Easy | Requires bandwidth server |
| Use Case | Simple RT, legacy | Multiple equal-priority | Optimal multimedia, control |
Decision Framework:
```
How to Choose an RT Scheduling Policy:

1. Do you have explicit timing requirements (period, deadline)?
   ├── YES → Consider SCHED_DEADLINE
   └── NO  → Use SCHED_FIFO or SCHED_RR

2. Are tasks periodic/sporadic with known parameters?
   ├── YES → SCHED_DEADLINE provides optimal scheduling
   └── NO  → Use priority-based (FIFO/RR)

3. Will you have multiple RT tasks at the same priority?
   ├── YES → SCHED_RR prevents starvation among them
   └── NO  → SCHED_FIFO is simpler

4. Is your system well-characterized with proper WCET analysis?
   ├── YES → SCHED_DEADLINE gives guaranteed admission control
   └── NO  → SCHED_FIFO/RR with conservative priorities

5. Maximum CPU utilization requirement:
   ├── Need > 69%       → SCHED_DEADLINE (can approach 100%)
   └── < 69% sufficient → Any policy works
```

Common Patterns:

| Pattern | Recommendation |
|---|---|
| Simple control loop | SCHED_FIFO, single priority |
| Multiple control loops | SCHED_FIFO, careful priorities |
| Video/audio processing | SCHED_DEADLINE (periodic) |
| Legacy RTOS port | SCHED_RR (familiar model) |
| Event-driven I/O handler | SCHED_FIFO (blocks often) |
| Soft real-time + best effort | Mix: RT for critical tasks, CFS for the rest |

For most embedded and control applications, start with SCHED_FIFO and well-designed priorities. SCHED_DEADLINE shines for multimedia (fixed frame rates) and when you need formal guarantees. SCHED_RR is mainly useful when migrating from other RTOSes that used round-robin.
Real-time tasks can completely monopolize the CPU, preventing critical system services from running. Linux implements RT throttling to prevent runaway RT tasks from hanging the system.
RT Throttling Mechanism:
By default, Linux reserves a portion of CPU time for non-RT tasks. RT tasks are throttled if they exceed their allocated bandwidth:
```bash
# View current RT throttling settings
cat /proc/sys/kernel/sched_rt_period_us   # Period (default: 1000000 = 1s)
cat /proc/sys/kernel/sched_rt_runtime_us  # RT budget (default: 950000 = 0.95s)

# Default: RT tasks get 95% of CPU per 1-second period
# Non-RT tasks are guaranteed 5% of CPU (50ms per second)

# Example 1: Disable RT throttling (DANGEROUS!)
echo -1 > /proc/sys/kernel/sched_rt_runtime_us
# Now RT tasks can use 100% CPU - a buggy RT task will hang the system!

# Example 2: More conservative throttling
echo 800000 > /proc/sys/kernel/sched_rt_runtime_us
# RT tasks get 80% of CPU, non-RT guaranteed 20%

# Example 3: Shorter period for finer granularity
echo 100000 > /proc/sys/kernel/sched_rt_period_us
echo 95000 > /proc/sys/kernel/sched_rt_runtime_us
# Still 95% RT, but enforced every 100ms instead of every 1s
# Prevents RT bursts longer than 100ms
```

When Throttling Occurs:
```
RT Throttling Timeline (default settings: 950ms of every 1000ms):

Without throttling:
  RT task:  ████████████████████████████████████████████████
  Non-RT:   (never runs - starved; system becomes unresponsive)

With throttling:
  Period:   ◄──────────── 1000ms ────────────►◄──── 1000ms ────►
  RT task:  █████████████████████████                █████████...
            └──── 950ms ────┘ throttled!
  Non-RT:                             ████ (guaranteed 50ms)

  You can still SSH in and kill the runaway process!
```

For production real-time systems, carefully consider RT throttling settings. Disabling throttling (echo -1) improves RT performance but means a buggy RT task WILL hang your system. Use hardware watchdogs and test thoroughly before disabling throttling.
Monitoring RT Scheduling:
```bash
# View all RT processes and their priorities
ps -eo pid,cls,rtprio,pri,comm --sort=-rtprio | head -20
# cls: TS=SCHED_OTHER, FF=SCHED_FIFO, RR=SCHED_RR, DLN=SCHED_DEADLINE

# Detailed scheduling info for a process
cat /proc/<pid>/sched

# For SCHED_DEADLINE tasks, view parameters:
# (requires reading sched_attr via syscall or specialized tools)

# Monitor scheduling live with trace-cmd
sudo trace-cmd record -e sched:sched_switch -e sched:sched_wakeup
sudo trace-cmd report | head -100

# Use cyclictest to measure RT latency (from the rt-tests package)
sudo cyclictest -p 90 -t -m -n
#   -p 90: priority 90
#   -t:    one measurement thread per CPU
#   -m:    mlockall
#   -n:    use clock_nanosleep

# Output shows min/avg/max latency
```

Effective use of RT scheduling requires careful system design and adherence to proven practices:
A well-designed RT application:

1. Initializes resources
2. Locks memory
3. Pre-faults the stack
4. Sets RT scheduling
5. Enters a deterministic loop
6. Uses precise timing for sleep/wake
7. Never allocates in the loop

Follow this pattern and most latency problems vanish.
Linux provides three real-time scheduling policies, each suited to different requirements: SCHED_FIFO for strict priority scheduling of event-driven tasks, SCHED_RR when equal-priority tasks must share the CPU, and SCHED_DEADLINE when timing requirements can be stated explicitly and verified by admission control.
What's Next:
With RT scheduling policies understood, we'll next explore latency reduction techniques—the system-level optimizations that minimize scheduling and execution jitter, ensuring that RT tasks actually achieve their timing requirements.
You now understand Linux's real-time scheduling policies and can select the appropriate policy for your application requirements. This knowledge enables you to design and implement deterministic real-time systems on Linux platforms.