Real-Time Linux

Level: Advanced

Duration: 90 mins

Topic: Real-Time Linux

PREEMPT_RT Patch

Transforming Linux for Real-Time

Linux was never designed to be a real-time operating system. Born as a general-purpose Unix clone, its kernel architecture prioritized throughput, fairness, and scalability over determinism and bounded latency. Yet today, Linux powers industrial robots, medical devices, autonomous vehicles, and telecommunications infrastructure—systems where missing a deadline can mean catastrophic failure.

This transformation from a general-purpose system to a real-time platform represents one of the most ambitious kernel engineering efforts in computing history: the PREEMPT_RT patch.

For over two decades, this patchset has systematically reengineered the Linux kernel's fundamental assumptions about scheduling, locking, and interrupt handling. What began as an experimental project has evolved into production infrastructure that runs mission-critical systems worldwide, and with Linux 6.12 (released in November 2024) PREEMPT_RT can be enabled in the mainline kernel on x86, arm64, and RISC-V without out-of-tree patches.

What You Will Learn

By the end of this page, you will understand: (1) Why standard Linux fails real-time requirements; (2) The architectural philosophy behind PREEMPT_RT; (3) Key kernel modifications that enable determinism; (4) The technical mechanisms that reduce worst-case latency; (5) How PREEMPT_RT compares to dedicated RTOSes; and (6) Practical considerations for deployment and configuration.

The Real-Time Problem in Standard Linux

To appreciate what PREEMPT_RT accomplishes, we must first understand why standard Linux fundamentally violates real-time requirements. The problem isn't performance—Linux can be extremely fast. The problem is predictability.

Sources of Unpredictability:

Standard Linux exhibits unpredictable latencies from multiple kernel subsystems, each capable of delaying high-priority task execution for arbitrary periods.

Sources of Latency in Standard Linux
Latency Source	Mechanism	Worst-Case Duration	Real-Time Impact
Interrupt Handlers	Hardirq handlers run with interrupts disabled, blocking all other processing	Hundreds of microseconds to milliseconds	Unbounded delay for high-priority tasks
Spinlocks	Lock holders cannot be preempted; critical sections run to completion	Milliseconds under contention	Priority inversion, deadline misses
RCU Grace Periods	Memory reclamation requires synchronization across all CPUs	Tens to hundreds of milliseconds	Unpredictable memory pressure effects
Softirqs/Tasklets	Deferred interrupt work runs at high priority, non-preemptible	Milliseconds during I/O bursts	Network/storage activity blocks RT tasks
Kernel Preemption Points	Preemption only at explicit points in non-PREEMPT kernels	Entire system call duration	Syscall latency becomes RT latency
Memory Allocation	Page reclaim, compaction, and slab allocation can block	Seconds under memory pressure	Catastrophic deadline misses

The Fundamental Tension:

Linux kernel developers have historically optimized for throughput and average-case performance. From this perspective, running an interrupt handler to completion before returning to user space is efficient—it minimizes context switch overhead and cache pollution.

But for real-time systems, worst-case latency is the only metric that matters. A system that completes 99.99% of operations in 10 microseconds but occasionally takes 100 milliseconds is worthless for controlling a robotic arm that requires 1-millisecond response guarantees.

The Long-Tail Problem

Real-time failures occur in the tail of the latency distribution: the rare worst-case scenarios that happen once per hour, per day, or per week. Testing and optimization of standard Linux concentrate on average-case behavior, so these tail events, which are exactly what makes real-time systems fail, largely go unexamined.

Quantifying the Problem:

In standard Linux (without PREEMPT_RT), measured worst-case latencies under load can reach:

Interrupt latency: 100μs - 10ms
Scheduling latency: 500μs - 50ms
Wake-up latency: 1ms - 100ms

For a real-time system requiring 100μs guarantees, these worst-case figures overshoot the latency budget by up to three orders of magnitude. Even microsecond-level jitter can accumulate to cause deadline misses in tightly coupled control systems.
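Figures like these are usually obtained by sleeping until an absolute deadline and recording how late each wake-up is, then keeping the maximum rather than the average. The following is a minimal user-space sketch of that idea, not a replacement for a dedicated measurement tool; the function names and the choice of CLOCK_MONOTONIC are assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to nanoseconds. */
int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Sleep until successive absolute deadlines and record how late each
 * wake-up is. Returns the worst observed lateness in nanoseconds,
 * or -1 if a clock call fails or the sleep is interrupted.
 */
int64_t worst_wakeup_latency_ns(int iterations, int64_t period_ns)
{
    struct timespec next, now;
    int64_t worst = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        /* Advance the absolute deadline by one period. */
        int64_t deadline = ts_to_ns(&next) + period_ns;
        next.tv_sec = deadline / NSEC_PER_SEC;
        next.tv_nsec = deadline % NSEC_PER_SEC;

        if (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) != 0)
            return -1;
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        /* Lateness = actual wake-up time minus requested deadline. */
        int64_t late = ts_to_ns(&now) - deadline;
        if (late > worst)
            worst = late;
    }
    return worst;
}
```

Calling worst_wakeup_latency_ns(100000, 1000000) samples a 1 ms period for roughly 100 seconds. Because only the maximum matters, short runs on an idle machine say little; the tail shows up under load and over long runs.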

PREEMPT_RT Philosophy and History

The PREEMPT_RT patch emerged from a revolutionary insight: instead of building a real-time system alongside Linux (the dual-kernel approach), transform Linux itself into a real-time kernel. This philosophy has profound implications for system design, maintenance, and the broader Linux ecosystem.

Core Philosophical Principles

•Maximize Preemptibility — Make almost every kernel code path preemptible, so high-priority tasks can interrupt low-priority work at nearly any point
•Convert Spinlocks to Mutexes — Replace non-preemptible spinlocks with sleeping mutex implementations that allow priority inheritance
•Thread Interrupt Handlers — Move interrupt processing from hardirq context into kernel threads that can be scheduled and preempted
•Preserve Linux Semantics — Maintain compatibility with existing device drivers, filesystems, and user-space applications
•Upstream Integration — Work toward merging all changes into mainline Linux rather than maintaining a permanent fork

Historical Development:

The PREEMPT_RT project began around 2004-2005, led by kernel developers including Ingo Molnár, Thomas Gleixner, and Steven Rostedt. The project built upon earlier preemption work and introduced increasingly aggressive kernel modifications.

PREEMPT_RT Historical Milestones
Period	Development	Significance
2004-2005	Initial PREEMPT_RT patchset	Proved concept of fully preemptible Linux kernel
2006-2010	Threaded interrupt handlers	Fundamental architecture for interrupt preemption
2010-2015	Raw spinlock separation	Clean API distinguishing RT-safe and non-RT-safe locks
2015-2020	Linux Foundation Real-Time Linux project established to fund upstreaming	Coordinated mainline integration; CONFIG_PREEMPT_RT symbol added to mainline (5.3)
2020-2024	Accelerated upstreaming	Printk, locking, and timer subsystem changes merged; PREEMPT_RT becomes selectable in mainline (6.12)

Mainline Integration Achievement

Linux 6.12 was the first mainline release in which PREEMPT_RT can be selected without out-of-tree patches (on x86, arm64, and RISC-V). The last major blocker was the printk rework; with its core merged, the out-of-tree queue has shrunk to a small set of remaining changes. This represents a two-decade engineering effort reaching its goal.

Why Not a Dedicated RTOS?

The PREEMPT_RT approach offers compelling advantages over using a separate real-time operating system:

Driver Ecosystem — Access to thousands of Linux device drivers without porting
Development Tools — Use standard Linux debugging, profiling, and development environments
Application Compatibility — Run existing Linux applications alongside RT tasks
Community Support — Leverage the massive Linux development community
Hardware Support — Automatic support for new hardware through Linux's continuous development

Linux Preemption Models

The Linux kernel supports multiple preemption models, each representing a different tradeoff between throughput and latency. Understanding these models is essential for configuring real-time systems.

Linux Kernel Preemption Configurations
Config Option	Preemption Model	Kernel Behavior	Use Case
PREEMPT_NONE	No Preemption	Kernel code runs to completion; preemption only on return to user space	Servers, throughput-focused workloads
PREEMPT_VOLUNTARY	Voluntary Preemption	Explicit preemption points scattered through kernel; checks at might_sleep() calls	Desktop systems, general-purpose computing
PREEMPT	Full Preemption (Standard)	Kernel code preemptible except when holding spinlocks or in interrupt context	Low-latency desktops, soft real-time
PREEMPT_RT	Full Real-Time Preemption	Nearly all kernel code preemptible; spinlocks converted to mutexes; threaded interrupts	Hard real-time systems, industrial control

The Preemption Hierarchy:

Each preemption model builds upon the previous, adding more preemption points and reducing worst-case latency at the cost of increased overhead and complexity.

Preemption Progression

Conceptual Model

PREEMPT_NONE:
┌─────────────────────────────────────────────────────────────────┐
│ User Space │ Syscall/Interrupt → Kernel → Return │ User Space  │
└─────────────────────────────────────────────────────────────────┘
                    ↑ Preemption only here
 
PREEMPT_VOLUNTARY:
┌─────────────────────────────────────────────────────────────────┐
│ User │ Kernel code ──●──●──●──●── Kernel code │ User           │
└─────────────────────────────────────────────────────────────────┘
                      ↑ Preemption at explicit check points (●)
 
PREEMPT:
┌─────────────────────────────────────────────────────────────────┐
│ User │ ══════│ spinlock │══════│ spinlock │══════ │ User       │
└─────────────────────────────────────────────────────────────────┘
         ↑ Preemptible    ↑ Not     ↑ Preemptible
 
PREEMPT_RT:
┌─────────────────────────────────────────────────────────────────┐
│ User │ ═══════════════════════════════════════════════ │ User  │
└─────────────────────────────────────────────────────────────────┘
         ↑ Almost everything preemptible (sleeping locks)

Latency Implications:

The difference between preemption models becomes dramatic under load:

Measured Worst-Case Latencies (Representative Values)
Preemption Model	Idle System	Moderate Load	Heavy I/O Load	Memory Pressure
PREEMPT_NONE	10-50 μs	100 μs - 5 ms	10-100 ms	100 ms - 1 s
PREEMPT_VOLUNTARY	10-30 μs	50 μs - 1 ms	5-50 ms	50-500 ms
PREEMPT	10-20 μs	30-200 μs	1-10 ms	10-100 ms
PREEMPT_RT	5-15 μs	15-50 μs	20-100 μs	50-200 μs

The Real-Time Difference

Notice that PREEMPT_RT doesn't just improve average latency—it fundamentally changes worst-case behavior. Under heavy I/O, standard PREEMPT shows 1-10ms worst-case while PREEMPT_RT maintains 20-100μs. This bounded behavior is what makes real-time systems reliable.
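Before interpreting any latency numbers, it helps to confirm which preemption model the running kernel was built with. The sketch below checks two indicators that PREEMPT_RT kernels are known to expose: the "PREEMPT_RT" tag in the version string reported by uname, and the /sys/kernel/realtime file. The function names are illustrative, and older patched kernels may print the tag differently, which is why both checks are combined.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* True if a kernel version string (as in `uname -v`) advertises PREEMPT_RT. */
bool version_has_preempt_rt(const char *version)
{
    return version != NULL && strstr(version, "PREEMPT_RT") != NULL;
}

/* True if the running kernel reports itself as a PREEMPT_RT kernel. */
bool running_on_preempt_rt(void)
{
    struct utsname u;
    FILE *f;
    int flag = 0;

    if (uname(&u) == 0 && version_has_preempt_rt(u.version))
        return true;

    /* RT kernels expose /sys/kernel/realtime containing "1". */
    f = fopen("/sys/kernel/realtime", "r");
    if (f == NULL)
        return false;
    if (fscanf(f, "%d", &flag) != 1)
        flag = 0;
    fclose(f);
    return flag == 1;
}
```

An application can call running_on_preempt_rt() at startup and refuse to run, or log a warning, when it returns false.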

Threaded Interrupt Handlers

One of PREEMPT_RT's most significant architectural changes is the conversion of interrupt handlers from hardirq context to kernel threads. This transformation is fundamental to achieving bounded latency.

Traditional Hardirq Model

•Interrupt handler runs immediately upon hardware interrupt
•All other interrupts on the local CPU are blocked during handler execution
•No preemption possible—handler runs to completion
•Can execute for arbitrary duration
•Highest priority by virtue of running in interrupt context
•Device drivers determine kernel latency characteristics

Threaded IRQ Model

•Minimal hardirq stub acknowledges interrupt and wakes thread
•Main handler runs in kernel thread context
•Thread can be preempted by higher-priority threads
•Priority can be configured via standard scheduling APIs
•Interrupt threads compete for the CPU by priority (SCHED_FIFO, priority 50 by default)
•System administrator controls latency through thread priorities

Implementation Architecture:

When a device driver requests a threaded interrupt, the kernel creates the following structure:

Threaded IRQ Flow

Conceptual

Hardware Interrupt Occurs
         │
         ▼
┌────────────────────────────────────────────────────┐
│  Hardirq Stub (Primary Handler)                    │
│  - Acknowledge interrupt to hardware               │
│  - Check if this interrupt needs handling          │
│  - Return IRQ_WAKE_THREAD to schedule thread       │
│  Duration: < 1 microsecond typically               │
└────────────────────────────────────────────────────┘
         │
         ▼ (Thread wakeup)
┌────────────────────────────────────────────────────┐
│  Scheduler Runs                                    │
│  - Threaded handler competes with other threads   │
│  - Priority inheritance if waiting on RT mutex    │
│  - System admin can set irq thread priorities     │
└────────────────────────────────────────────────────┘
         │
         ▼
┌────────────────────────────────────────────────────┐
│  Threaded Handler (Secondary Handler)              │
│  - Full interrupt processing                       │
│  - Can sleep, acquire mutexes                     │
│  - CAN BE PREEMPTED by higher priority threads   │
│  Duration: Whatever the handler needs             │
└────────────────────────────────────────────────────┘

threaded_irq_example.c
C
/**
 * Example: Requesting a threaded interrupt handler
 * 
 * The kernel creates an "irq/N-driver_name" thread for this handler.
 * This thread can be observed with 'ps' and its priority adjusted.
 */
 
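#include <linux/interrupt.h>
#include <linux/mutex.h>
#include <linux/pci.h>

/*
 * Hypothetical per-device state. The device_*() and other helper
 * functions called below are placeholders for driver-specific code.
 */
struct my_device {
    struct mutex data_lock;
    u64 captured_timestamp;
    /* ... register mappings, buffers, etc. ... */
};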
 
/* Primary handler: runs in hardirq context, must be fast */
static irqreturn_t my_device_hardirq(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    
    /* Quick check: is this interrupt for us? */
    if (!device_interrupt_pending(dev))
        return IRQ_NONE;  /* Not our interrupt */
    
    /* Acknowledge interrupt to hardware (stop it from re-firing) */
    device_ack_interrupt(dev);
    
    /* Store any volatile state that must be captured immediately */
    dev->captured_timestamp = read_hardware_timestamp(dev);
    
    /* Request threaded handler execution */
    return IRQ_WAKE_THREAD;
}
 
/* Threaded handler: runs in process context, can do heavy work */
static irqreturn_t my_device_thread(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    
    /* 
     * This code runs in a kernel thread.
     * It CAN:
     *   - Sleep
     *   - Acquire mutexes (with priority inheritance on PREEMPT_RT)
     *   - Be preempted by higher-priority threads
     *   - Take as long as necessary
     * 
     * It CANNOT:
     *   - Assume it runs immediately after the interrupt
     *   - Assume no other code ran between hardirq and here
     */
    
    mutex_lock(&dev->data_lock);  /* Safe! Will sleep if contended */
    
    process_received_data(dev);
    wake_up_waiting_userspace(dev);
    prepare_next_dma_transfer(dev);
    
    mutex_unlock(&dev->data_lock);
    
    return IRQ_HANDLED;
}
 
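/* Registration */
static int my_device_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    struct my_device *dev;
    int ret;

    dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
    if (!dev)
        return -ENOMEM;
    mutex_init(&dev->data_lock);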
    
    ret = request_threaded_irq(
        pdev->irq,
        my_device_hardirq,     /* Primary: hardirq context */
        my_device_thread,      /* Secondary: thread context */
        IRQF_SHARED,
        "my_device",
        dev
    );
    
    if (ret) {
        dev_err(&pdev->dev, "Failed to request threaded IRQ\n");
        return ret;
    }
    
    /*
     * After this, you can see the thread:
     *   $ ps aux | grep irq
     *   root ... [irq/24-my_device]
     * 
     * And adjust its priority:
     *   # chrt -f -p 90 <pid>
     */
    
    return 0;
}

Forced Threading on PREEMPT_RT

On PREEMPT_RT kernels, most interrupt handlers are automatically force-threaded even if the driver didn't request it. Only handlers marked with IRQF_NO_THREAD (for critical low-level functions like timer interrupts) retain hardirq execution. This ensures system-wide determinism regardless of driver quality.
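Because each interrupt handler is an ordinary kernel thread, its priority can be changed from user space with the standard scheduling system call, which is what chrt does. A minimal sketch follows; set_fifo_priority is a name invented for this example, and finding the thread's PID (for instance by looking for its irq/N-name entry in the process list) is left to the caller.

```c
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Give the thread `pid` (0 = calling thread) the SCHED_FIFO policy at
 * `priority`. Returns 0 on success or a negative errno value.
 */
int set_fifo_priority(pid_t pid, int priority)
{
    struct sched_param sp = { .sched_priority = priority };
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    /* Reject out-of-range priorities before asking the kernel. */
    if (priority < lo || priority > hi)
        return -EINVAL;
    if (sched_setscheduler(pid, SCHED_FIFO, &sp) != 0)
        return -errno;   /* typically EPERM without CAP_SYS_NICE */
    return 0;
}
```

set_fifo_priority(irq_thread_pid, 90) has the same effect as running chrt -f -p 90 on that PID. Choosing the value is a system design decision: an interrupt thread should sit above the tasks that depend on it and below anything that must never wait for it.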

Sleeping Spinlocks and Priority Inheritance

The most counterintuitive aspect of PREEMPT_RT is its treatment of spinlocks. In standard Linux, spinlocks are busy-wait locks that disable preemption—holding a spinlock means you cannot be preempted. PREEMPT_RT transforms most spinlocks into sleeping mutexes with priority inheritance.

The Spinlock Problem:

Consider a typical spinlock usage pattern in a device driver:

spinlock_problem.c
C
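/*
 * Standard spinlock usage - problematic for real-time.
 * Assumes a driver-specific struct my_device with a spinlock_t
 * member named "lock".
 */
#include <linux/spinlock.h>

void driver_operation(struct my_device *dev)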
{
    unsigned long flags;
    
    spin_lock_irqsave(&dev->lock, flags);
    /*
     * PROBLEM: While holding this lock:
     *   1. Preemption is disabled
     *   2. All interrupts are disabled (on this CPU)
     *   3. Any higher-priority task wanting to run must wait
     *   4. Duration is unbounded (depends on work done here)
     * 
     * If this critical section takes 1ms, we add 1ms to the
     * worst-case latency of EVERY real-time task in the system!
     */
    perform_lengthy_device_operation(dev);
    spin_unlock_irqrestore(&dev->lock, flags);
}

PREEMPT_RT Solution: Sleeping Locks

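PREEMPT_RT redefines spinlock_t to be a sleeping lock (an rtmutex internally). When code calls spin_lock(), it may actually sleep if the lock is contended, and critically, the lock holder can be preempted. Likewise, spin_lock_irqsave() no longer disables hardware interrupts on PREEMPT_RT; the interrupt handlers it used to exclude now run as threads that block on the same lock instead.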

Lock Behavior Comparison

Conceptual

Standard Linux Spinlock:
┌─────────────────────────────────────────────────────────────────┐
│ Thread A: [───── spin_lock ─────────────── spin_unlock ────]   │
│                 ↑ preemption disabled                          │
│ Thread B: [...BLOCKED...BLOCKED...BLOCKED...]                 │
│           ↑ Cannot run even if higher priority                │
└─────────────────────────────────────────────────────────────────┘
 
PREEMPT_RT Sleeping Spinlock:
┌─────────────────────────────────────────────────────────────────┐
│ Thread A: [── lock ──┐                      ┌── unlock ──]     │
│                      │ preempted!           │                   │
│ Thread B:            └─[HIGH PRIORITY RUNS]─┘                  │
│                                                                 │
│ Note: Thread A continues after B completes. Priority           │
│ inheritance ensures A completes quickly to release lock.       │
└─────────────────────────────────────────────────────────────────┘

Priority Inheritance:

When a high-priority thread blocks waiting for a lock held by a low-priority thread, PREEMPT_RT implements priority inheritance: the lock holder temporarily inherits the waiter's priority. This prevents classic priority inversion scenarios.

Priority Inheritance Mechanism

Conceptual

Priority Inversion WITHOUT Inheritance:
┌─────────────────────────────────────────────────────────────────┐
│ High (Pri=90):     [BLOCKED on lock ─────────────────────────]│
│ Medium (Pri=50):   ════════════════════════════════════════    │
│ Low (Pri=10):      [holds lock...preempted by Medium...]      │
│                                                                 │
│ Problem: High waits for Low, but Medium runs instead of Low!  │
│          High's latency = Medium's entire execution time      │
└─────────────────────────────────────────────────────────────────┘
 
Priority Inversion WITH Inheritance (PREEMPT_RT):
┌─────────────────────────────────────────────────────────────────┐
│ High (Pri=90):     [wait]──────────[RUNS]                      │
│ Medium (Pri=50):           [blocked behind Low-at-90]          │
│ Low (Pri=10→90):   [runs at 90, releases lock]                 │
│                    ↑ Inherits High's priority                  │
│                                                                 │
│ Result: High's latency = only Low's critical section time     │
└─────────────────────────────────────────────────────────────────┘
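The same protocol is available to user-space threads through POSIX mutex attributes, which Linux implements on top of PI-aware futexes. The sketch below shows one way to create such a mutex; init_pi_mutex is a helper name chosen for this example.

```c
#define _GNU_SOURCE
#include <pthread.h>

/*
 * Initialise `m` as a priority-inheritance mutex.
 * Returns 0 on success or a pthread error number.
 */
int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t attr;
    int err;

    err = pthread_mutexattr_init(&attr);
    if (err != 0)
        return err;

    /* Ask for priority inheritance instead of the default protocol. */
    err = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (err == 0)
        err = pthread_mutex_init(m, &attr);

    pthread_mutexattr_destroy(&attr);
    return err;
}
```

A mutex created this way is locked and unlocked with the usual pthread_mutex_lock() and pthread_mutex_unlock() calls. Default pthread mutexes do not use priority inheritance, so real-time threads that share data with lower-priority threads need to opt in explicitly.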

Raw Spinlocks:

Some kernel code genuinely requires non-sleeping spinlocks—typically low-level code that manages the sleeping infrastructure itself, or code that runs before the scheduler is available. PREEMPT_RT provides raw_spinlock_t for these cases:

raw_spinlock_usage.c
C
#include <linux/spinlock.h>
 
/* Regular spinlock: becomes sleeping mutex on PREEMPT_RT */
static DEFINE_SPINLOCK(normal_lock);
 
/* Raw spinlock: always a true spinlock, even on PREEMPT_RT */
static DEFINE_RAW_SPINLOCK(raw_lock);
 
void regular_path(void)
{
    spin_lock(&normal_lock);
    /* On PREEMPT_RT: May sleep! Lock holder can be preempted. */
    /* Use for most normal device driver critical sections. */
    do_normal_work();
    spin_unlock(&normal_lock);
}
 
void scheduler_critical_path(void)
{
    unsigned long flags;
    
    raw_spin_lock_irqsave(&raw_lock, flags);
    /*
     * TRUE busy-wait spinlock. Preemption disabled.
     * 
     * Use ONLY for:
     *   - Scheduler internals
     *   - Interrupt controller manipulation
     *   - Timer hardware programming
     *   - Debugging/tracing infrastructure
     * 
     * Keep critical sections EXTREMELY short (< 1μs ideal).
     */
    manipulate_scheduler_structures();
    raw_spin_unlock_irqrestore(&raw_lock, flags);
}

Raw Spinlock Discipline

Every raw_spinlock in the kernel is a potential latency source. PREEMPT_RT developers carefully audit raw spinlock usage, striving to minimize both the number of raw spinlocks and the duration of their critical sections. Adding raw spinlocks to new code requires strong justification.

Additional PREEMPT_RT Kernel Modifications

Beyond threaded interrupts and sleeping spinlocks, PREEMPT_RT includes numerous other modifications that collectively achieve deterministic behavior.

Key PREEMPT_RT Modifications

•High-Resolution Timers — Kernel timers with nanosecond resolution instead of jiffy-based timing (1-10 ms, depending on CONFIG_HZ). Essential for fine-grained scheduling.
•Threaded Softirqs — Softirq processing (network RX, timer callbacks, etc.) moved to kernel threads that can be prioritized and preempted.
•RCU Modifications — Preemptible RCU that allows readers to be preempted during read-side critical sections.
•PI-Aware Futexes — User-space futex operations support priority inheritance for RT mutexes in user space.
•Preemptible printk — Console output doesn't block high-priority tasks; printk work deferred to dedicated thread.
•Migration Disable — API to disable task migration without disabling preemption (preserving RT guarantees).
•Lazy Preemption — Defers preemption among ordinary (non-real-time) tasks to recover throughput, while real-time tasks still preempt immediately.
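Whether high-resolution timers are actually active can be checked from user space by asking for the clock resolution. With them enabled, the kernel typically reports 1 ns; without them, the result is one jiffy. This is a small sketch with an illustrative function name.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <time.h>

/* Resolution of CLOCK_MONOTONIC in nanoseconds, or -1 on error. */
int64_t monotonic_resolution_ns(void)
{
    struct timespec res;

    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;
    return (int64_t)res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

A result in the millisecond range suggests that periodic tasks will be woken on jiffy boundaries, which rules out sub-millisecond periods regardless of the preemption model.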

Softirq Threading:

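In standard Linux, softirqs run immediately after hardware interrupts with interrupts enabled but preemption disabled, and the ksoftirqd threads only pick up the overflow when softirq work keeps arriving. This can cause significant latency if multiple softirqs are pending. PREEMPT_RT runs all softirq work in preemptible thread context, either in the thread that raised it (for example a threaded interrupt handler) or in the per-CPU ksoftirqd threads: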

Softirq Handling Comparison

Conceptual

Standard Linux Softirq Processing:
┌─────────────────────────────────────────────────────────────────┐
│ [Hardirq] → [Softirq: NET_RX + TIMER + SCHED + ...] → [User]  │
│             └──────── Cannot be preempted ────────┘            │
│             Potentially milliseconds of non-preemptible work  │
└─────────────────────────────────────────────────────────────────┘
 
PREEMPT_RT Threaded Softirqs:
┌─────────────────────────────────────────────────────────────────┐
│ [Hardirq] → [Wake ksoftirqd] → [Return immediately]           │
│                   ↓                                             │
│             [ksoftirqd/N thread runs when scheduled]           │
│             [Can be preempted by higher priority threads]      │
│                                                                 │
│ $ ps aux | grep ksoftirqd                                      │
│ root ... [ksoftirqd/0]    # CPU 0 softirq thread              │
│ root ... [ksoftirqd/1]    # CPU 1 softirq thread              │
└─────────────────────────────────────────────────────────────────┘

Preemptible RCU:

Read-Copy-Update (RCU) is a fundamental Linux synchronization mechanism. In non-preemptible kernel configurations, RCU readers run to completion without preemption. Preemptible kernels, including PREEMPT_RT, use a preemptible RCU variant (CONFIG_PREEMPT_RCU) in which readers can be preempted mid-critical-section:

preemptible_rcu.c
C
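/*
 * RCU Reader: non-preemptible RCU vs preemptible RCU (PREEMPT_RT).
 * struct my_data, process() and process_potentially_long_operation()
 * are placeholders.
 */
#include <linux/rcupdate.h>

static struct my_data __rcu *global_ptr;

/* Non-preemptible RCU: Reader cannot be preempted */
void standard_rcu_reader(void)
{
    struct my_data *ptr;

    rcu_read_lock();  /* Disables preemption */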
    
    ptr = rcu_dereference(global_ptr);
    /* CANNOT be preempted here */
    process(ptr);
    
    rcu_read_unlock(); /* Re-enables preemption */
}
 
/* PREEMPT_RT: Reader CAN be preempted */
void preempt_rt_rcu_reader(void)
{
    struct my_data *ptr;

    rcu_read_lock();  /* Does NOT disable preemption */
    
    ptr = rcu_dereference(global_ptr);
    /*
     * CAN be preempted here!
     * 
     * The RCU machinery tracks that we're in a critical section
     * and ensures grace periods account for preempted readers.
     * 
     * This prevents a long-running RCU reader from blocking
     * high-priority real-time tasks.
     */
    process_potentially_long_operation(ptr);
    
    rcu_read_unlock();
}

Overhead Trade-off

All these modifications add overhead. Threaded interrupts require thread context switches. Sleeping spinlocks have mutex acquisition costs. Preemptible RCU has more complex tracking. PREEMPT_RT trades some average-case performance for dramatically improved worst-case performance—exactly the trade-off real-time systems require.

Building and Configuring PREEMPT_RT

Deploying PREEMPT_RT requires careful kernel configuration and system tuning. This section covers the practical aspects of building and configuring an RT kernel.

Obtaining PREEMPT_RT:

Since Linux 6.12, PREEMPT_RT can be enabled in mainline kernels on x86, arm64, and RISC-V. For older kernels, or for architectures not yet enabled in mainline, use the out-of-tree patchset:

Obtaining PREEMPT_RT Patches
Bash
# For mainline kernels (6.12+): PREEMPT_RT is a config option
# No patches needed on x86, arm64, and RISC-V
 
# For kernels requiring patches:
# Visit: https://wiki.linuxfoundation.org/realtime/start
# Download matching patch version for your kernel
 
# Example: Applying patches to kernel 5.15
wget https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.15/patch-5.15.XX-rtXX.patch.gz
gunzip patch-5.15.XX-rtXX.patch.gz
 
# Apply to kernel source
cd /path/to/linux-5.15
patch -p1 < ../patch-5.15.XX-rtXX.patch

Essential Kernel Configuration:

RT Kernel Configuration

Kconfig

# Essential PREEMPT_RT Configuration Options
 
# Core preemption model - select PREEMPT_RT
CONFIG_PREEMPT_RT=y          # Full real-time preemption
 
# Timer configuration
CONFIG_HIGH_RES_TIMERS=y     # High-resolution timer support (essential)
CONFIG_NO_HZ_FULL=y          # Tickless operation for RT tasks (optional)
 
# Scheduler features
CONFIG_RT_GROUP_SCHED=y      # RT task group scheduling (optional)
 
# Disable problematic features for RT
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y  # Avoid frequency scaling
CONFIG_CPU_IDLE=n            # Or configure very short idle states only
 
# Debugging (disable in production for lower latency)
CONFIG_DEBUG_PREEMPT=n       # Disable for production
CONFIG_PROVE_LOCKING=n       # Disable for production
CONFIG_LOCKDEP=n             # Disable for production
 
# Tracing (keep for latency analysis, disable for absolute minimum)
CONFIG_FTRACE=y              # Function tracing
CONFIG_IRQSOFF_TRACER=y      # Track IRQs-off latency
CONFIG_PREEMPTIRQ_EVENTS=y   # Preemption/IRQ tracing
CONFIG_SCHED_TRACER=y        # Scheduler tracing
 
# Memory configuration
CONFIG_TRANSPARENT_HUGEPAGE=n   # Avoid THP overhead (recommended)
CONFIG_COMPACTION=n             # Consider disabling compaction

Runtime Configuration:

After booting the RT kernel, additional runtime configuration optimizes real-time behavior:

Runtime RT Configuration
Bash
#!/bin/bash
# Runtime configuration for PREEMPT_RT system
 
# 1. Verify RT kernel is running (the version string contains PREEMPT_RT)
uname -v | grep -q PREEMPT_RT || echo "WARNING: Not running RT kernel!"
 
# 2. Set CPU frequency to maximum (avoid frequency scaling latency)
for gov in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$gov"
done
 
# 3. Disable real-time throttling (allow RT tasks to use 100% CPU)
# WARNING: A runaway RT task can hang the system!
echo -1 > /proc/sys/kernel/sched_rt_runtime_us
 
# 4. Isolate CPUs for RT tasks (kernel boot parameter is better)
# In /etc/default/grub, add: isolcpus=2,3 nohz_full=2,3
 
# 5. Configure IRQ affinities - move non-RT IRQs off RT CPUs
# Move all IRQs to CPU 0,1 (leaving 2,3 for RT tasks)
for irq in /proc/irq/*/smp_affinity; do
    echo 3 > "$irq" 2>/dev/null  # hex bitmask 0x3 = CPUs 0 and 1
done
 
# 6. Set RT thread priorities
# Example: Set network IRQ thread to priority 90
pgrep -f "irq/.*eth" | xargs -r -I{} chrt -f -p 90 {}
 
# 7. Lock memory for RT application
# Application should use mlockall(MCL_CURRENT | MCL_FUTURE)
 
# 8. Verify configuration
echo "=== RT Configuration Summary ==="
echo "Kernel: $(uname -r)"
echo "RT Runtime: $(cat /proc/sys/kernel/sched_rt_runtime_us)"
echo "Governor: $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)"
echo "Isolated CPUs: $(cat /sys/devices/system/cpu/isolated)"
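The script mixes two notations for sets of CPUs: list form (isolcpus=2,3, and the contents of /sys/devices/system/cpu/isolated) and hexadecimal bitmask form (the value 3 written to smp_affinity). The sketch below converts the former into the latter to make the relationship concrete. The function name and the 64-CPU limit are simplifications made for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Convert a kernel CPU list such as "0-1" or "2-3,6" into a bitmask
 * (bit N set = CPU N). Handles CPUs 0..63 only.
 * Returns 0 on success, -1 on malformed or out-of-range input.
 */
int cpu_list_to_mask(const char *list, uint64_t *mask)
{
    const char *p = list;

    *mask = 0;
    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        long last = first;

        if (end == p)
            return -1;              /* no digits where a CPU number belongs */
        if (*end == '-') {          /* range "first-last" */
            p = end + 1;
            last = strtol(p, &end, 10);
            if (end == p)
                return -1;
        }
        if (first < 0 || last < first || last > 63)
            return -1;
        for (long cpu = first; cpu <= last; cpu++)
            *mask |= UINT64_C(1) << cpu;

        if (*end == ',')
            end++;                  /* another entry follows */
        else if (*end != '\0' && *end != '\n')
            return -1;
        p = end;
    }
    return 0;
}
```

For the list "0-1" the resulting mask is 0x3, the value the script writes to smp_affinity, while the isolated CPUs "2,3" correspond to 0xc.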

Safety Warning: RT Throttling

Setting sched_rt_runtime_us to -1 disables RT throttling, allowing RT tasks to consume 100% CPU indefinitely. A buggy RT task can completely lock up the system. Only disable throttling on fully tested production systems with hardware watchdogs.
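Step 7 of the script leaves memory locking to the application. A common pattern is to lock all pages and touch the stack once during initialisation, so that no page fault occurs inside the time-critical loop. The following is a sketch under assumptions: the helper names are invented here, and the 64 KiB stack budget is an arbitrary illustrative figure that should be replaced by the application's real worst case.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define PREFAULT_STACK_BYTES (64 * 1024)   /* assumed worst-case stack use */

/* Lock current and future pages in RAM. Returns 0 or a negative errno. */
int lock_all_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -errno;   /* e.g. ENOMEM or EPERM if RLIMIT_MEMLOCK is too low */
    return 0;
}

/*
 * Touch one byte per page of a stack buffer so the pages are mapped
 * before the time-critical loop starts. Returns the number of pages touched.
 */
size_t prefault_stack(void)
{
    volatile unsigned char buf[PREFAULT_STACK_BYTES];
    long page = sysconf(_SC_PAGESIZE);
    size_t touched = 0;

    if (page <= 0)
        page = 4096;     /* fallback assumption */
    for (size_t off = 0; off < sizeof(buf); off += (size_t)page) {
        buf[off] = 0;
        touched++;
    }
    return touched;
}
```

Both functions are meant to be called once, early in the real-time thread, before it switches to its real-time scheduling policy. Heap memory needed later should likewise be allocated and touched up front, since MCL_FUTURE locks new pages but does not prevent the fault taken when they are first mapped.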

Summary: PREEMPT_RT Patch

The PREEMPT_RT patch represents a fundamental reimagining of the Linux kernel for real-time applications. Let's consolidate the key concepts:

Key Takeaways

•Standard Linux Has Unbounded Latency — Spinlocks, hardirqs, and non-preemptible sections create worst-case latencies measured in milliseconds to seconds.
•PREEMPT_RT Makes Nearly Everything Preemptible — Through sleeping spinlocks, threaded interrupts, and preemptible RCU, high-priority tasks can preempt almost any kernel activity.
•Spinlocks Become Sleeping Mutexes — Regular spinlock_t converts to sleeping locks with priority inheritance; raw_spinlock_t remains for truly critical sections.
•Interrupt Handlers Run as Threads — Device drivers' interrupt processing moves to schedulable kernel threads that obey priority and can be preempted.
•Priority Inheritance Prevents Inversion — When high-priority tasks block on locks, the lock holder temporarily inherits the waiter's priority.
•Mainline Integration Has Landed — PREEMPT_RT code was merged piece by piece over many releases, and since Linux 6.12 it is a standard kernel configuration option on x86, arm64, and RISC-V.
•Configuration Requires Care — Achieving optimal RT performance requires kernel config awareness, runtime tuning, and understanding of the trade-offs involved.

What's Next:

With the foundational understanding of PREEMPT_RT architecture, we'll next explore the specific real-time schedulers available in Linux—SCHED_FIFO, SCHED_RR, and SCHED_DEADLINE—and how to effectively use them for different real-time requirements.

Page Complete

You now understand the fundamental architecture and mechanisms of the PREEMPT_RT patch—the key technology that transforms Linux into a real-time operating system. This knowledge is essential for developing and deploying real-time applications on Linux platforms.

1 / 5

Loading learning content...

Operating SystemsReal-Time Linux

Real-Time Linux

LevelAdvanced

Duration90 mins

TopicReal-Time Linux

1 / 5

PREEMPT_RT Patch

Transforming Linux for Real-Time

This transformation from a general-purpose system to a real-time platform represents one of the most ambitious kernel engineering efforts in computing history: the PREEMPT_RT patch.

What You Will Learn

The Real-Time Problem in Standard Linux

The Unpredictability Sources:

Standard Linux exhibits unpredictable latencies from multiple kernel subsystems, each capable of delaying high-priority task execution for arbitrary periods.

Sources of Latency in Standard Linux
Latency Source	Mechanism	Worst-Case Duration	Real-Time Impact
Interrupt Handlers	Hardirq handlers run with interrupts disabled, blocking all other processing	Hundreds of microseconds to milliseconds	Unbounded delay for high-priority tasks
Spinlocks	Lock holders cannot be preempted; critical sections run to completion	Milliseconds under contention	Priority inversion, deadline misses
RCU Grace Periods	Memory reclamation requires synchronization across all CPUs	Tens to hundreds of milliseconds	Unpredictable memory pressure effects
Softirqs/Tasklets	Deferred interrupt work runs at high priority, non-preemptible	Milliseconds during I/O bursts	Network/storage activity blocks RT tasks
Kernel Preemption Points	Preemption only at explicit points in non-PREEMPT kernels	Entire system call duration	Syscall latency becomes RT latency
Memory Allocation	Page reclaim, compaction, and slab allocation can block	Seconds under memory pressure	Catastrophic deadline misses

The Fundamental Tension:

The Long-Tail Problem

Quantifying the Problem:

In standard Linux (without PREEMPT_RT), measured worst-case latencies under load can reach:

Interrupt latency: 100μs - 10ms
Scheduling latency: 500μs - 50ms
Wake-up latency: 1ms - 100ms
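
To make these figures concrete, wake-up latency can be estimated from user space by sleeping until an absolute deadline and recording how late the thread actually resumes. The sketch below is a simplified version of the approach used by tools such as cyclictest. The function name `measure_max_wakeup_latency_ns` and its parameters are illustrative choices, not part of any library.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000LL

/* Convert a timespec to a flat nanosecond count. */
static int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
}

/*
 * Sleep until a series of absolute deadlines spaced interval_ns apart
 * and return the largest observed lateness in nanoseconds.
 * Returns -1 on invalid arguments or if a clock call fails.
 */
int64_t measure_max_wakeup_latency_ns(int iterations, long interval_ns)
{
    struct timespec next, now;
    int64_t max_ns = 0;

    if (iterations <= 0 || interval_ns <= 0)
        return -1;
    if (clock_gettime(CLOCK_MONOTONIC, &next) != 0)
        return -1;

    for (int i = 0; i < iterations; i++) {
        int rc;
        int64_t late;

        /* Advance the absolute deadline by one period. */
        next.tv_nsec += interval_ns;
        while (next.tv_nsec >= NSEC_PER_SEC) {
            next.tv_nsec -= NSEC_PER_SEC;
            next.tv_sec++;
        }

        /* Absolute sleeps avoid accumulating drift; retry if interrupted. */
        do {
            rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        } while (rc == EINTR);
        if (rc != 0)
            return -1;

        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;

        /* Lateness = actual wake-up time minus requested deadline. */
        late = ts_to_ns(&now) - ts_to_ns(&next);
        if (late > max_ns)
            max_ns = late;
    }
    return max_ns;
}
```

For example, `measure_max_wakeup_latency_ns(10000, 1000000)` samples a 1 ms period 10,000 times. The result is only a lower bound on the true worst case: it is meaningful when the calling thread runs under a real-time policy and the system is loaded, since an idle machine rarely exposes the long tail.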

PREEMPT_RT Philosophy and History

Core Philosophical Principles

•Maximize Preemptibility — Make almost every kernel code path preemptible, so high-priority tasks can interrupt low-priority work at nearly any point
•Convert Spinlocks to Mutexes — Replace non-preemptible spinlocks with sleeping mutex implementations that allow priority inheritance
•Thread Interrupt Handlers — Move interrupt processing from hardirq context into kernel threads that can be scheduled and preempted
•Preserve Linux Semantics — Maintain compatibility with existing device drivers, filesystems, and user-space applications
•Upstream Integration — Work toward merging all changes into mainline Linux rather than maintaining a permanent fork

Historical Development:

PREEMPT_RT Historical Milestones
Period	Development	Significance
2004-2005	Initial PREEMPT_RT patchset	Proved concept of fully preemptible Linux kernel
2006-2010	Threaded interrupt handlers	Fundamental architecture for interrupt preemption
2010-2015	Raw spinlock separation	Clean API distinguishing RT-safe and non-RT-safe locks
2015-2020	Coordinated upstreaming effort	Linux Foundation Real-Time Linux project backs the work; CONFIG_PREEMPT_RT symbol added to mainline (Linux 5.3)
2020-2024	Accelerated upstreaming	Printk, locking, and timer subsystem changes merged; PREEMPT_RT becomes selectable in mainline (Linux 6.12)

Mainline Integration Achievement

Why Not a Dedicated RTOS?

The PREEMPT_RT approach offers compelling advantages over using a separate real-time operating system:

Driver Ecosystem — Access to thousands of Linux device drivers without porting
Development Tools — Use standard Linux debugging, profiling, and development environments
Application Compatibility — Run existing Linux applications alongside RT tasks
Community Support — Leverage the massive Linux development community
Hardware Support — Automatic support for new hardware through Linux's continuous development

Linux Preemption Models

The Linux kernel supports multiple preemption models, each representing a different tradeoff between throughput and latency. Understanding these models is essential for configuring real-time systems.

Linux Kernel Preemption Configurations
Config Option	Preemption Model	Kernel Behavior	Use Case
PREEMPT_NONE	No Preemption	Kernel code runs to completion; preemption only on return to user space	Servers, throughput-focused workloads
PREEMPT_VOLUNTARY	Voluntary Preemption	Explicit preemption points scattered through kernel; checks at might_sleep() calls	Desktop systems, general-purpose computing
PREEMPT	Full Preemption (Standard)	Kernel code preemptible except when holding spinlocks or in interrupt context	Low-latency desktops, soft real-time
PREEMPT_RT	Full Real-Time Preemption	Nearly all kernel code preemptible; spinlocks converted to mutexes; threaded interrupts	Hard real-time systems, industrial control

The Preemption Hierarchy:

Each preemption model builds upon the previous, adding more preemption points and reducing worst-case latency at the cost of increased overhead and complexity.

Preemption Progression

Conceptual Model

PREEMPT_NONE:
┌─────────────────────────────────────────────────────────────────┐
│ User Space │ Syscall/Interrupt → Kernel → Return │ User Space  │
└─────────────────────────────────────────────────────────────────┘
                    ↑ Preemption only here
 
PREEMPT_VOLUNTARY:
┌─────────────────────────────────────────────────────────────────┐
│ User │ Kernel code ──●──●──●──●── Kernel code │ User           │
└─────────────────────────────────────────────────────────────────┘
                      ↑ Preemption at explicit check points (●)
 
PREEMPT:
┌─────────────────────────────────────────────────────────────────┐
│ User │ ══════│ spinlock │══════│ spinlock │══════ │ User       │
└─────────────────────────────────────────────────────────────────┘
         ↑ Preemptible    ↑ Not     ↑ Preemptible
 
PREEMPT_RT:
┌─────────────────────────────────────────────────────────────────┐
│ User │ ═══════════════════════════════════════════════ │ User  │
└─────────────────────────────────────────────────────────────────┘
         ↑ Almost everything preemptible (sleeping locks)
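
Which of these models a running kernel uses can usually be read from its version string: the kernel build appends `PREEMPT`, `PREEMPT_DYNAMIC`, or `PREEMPT_RT` to the string reported by `uname -v`, while PREEMPT_NONE and PREEMPT_VOLUNTARY builds carry no flag. The following sketch classifies such a string. The enum and function names are illustrative, and the flag convention is an assumption that holds for typical mainline and RT-patched builds but may differ on vendor kernels.

```c
#include <stddef.h>
#include <string.h>
#include <sys/utsname.h>

enum preempt_model {
    PREEMPT_MODEL_UNFLAGGED, /* PREEMPT_NONE or PREEMPT_VOLUNTARY (not distinguishable here) */
    PREEMPT_MODEL_DYNAMIC,   /* model chosen at boot time */
    PREEMPT_MODEL_FULL,      /* CONFIG_PREEMPT */
    PREEMPT_MODEL_RT         /* CONFIG_PREEMPT_RT */
};

/* Classify a kernel version string such as the output of `uname -v`. */
enum preempt_model classify_preempt_model(const char *version)
{
    if (version == NULL)
        return PREEMPT_MODEL_UNFLAGGED;

    /* Check the longest flags first: "PREEMPT" is a prefix of the others. */
    if (strstr(version, "PREEMPT_RT") != NULL)
        return PREEMPT_MODEL_RT;
    if (strstr(version, "PREEMPT_DYNAMIC") != NULL)
        return PREEMPT_MODEL_DYNAMIC;
    if (strstr(version, "PREEMPT") != NULL)
        return PREEMPT_MODEL_FULL;
    return PREEMPT_MODEL_UNFLAGGED;
}

/* Classify the kernel this process is running on. */
enum preempt_model running_preempt_model(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return PREEMPT_MODEL_UNFLAGGED;
    return classify_preempt_model(u.version);
}
```

A real-time application might call `running_preempt_model()` at startup and refuse to run, or at least log a warning, when the result is not `PREEMPT_MODEL_RT`.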

Latency Implications:

The difference between preemption models becomes dramatic under load:

Worst-Case Latencies by Preemption Model (Representative Values)
Preemption Model	Idle System	Moderate Load	Heavy I/O Load	Memory Pressure
PREEMPT_NONE	10-50 μs	100 μs - 5 ms	10-100 ms	100 ms - 1 s
PREEMPT_VOLUNTARY	10-30 μs	50 μs - 1 ms	5-50 ms	50-500 ms
PREEMPT	10-20 μs	30-200 μs	1-10 ms	10-100 ms
PREEMPT_RT	5-15 μs	15-50 μs	20-100 μs	50-200 μs

The Real-Time Difference

Threaded Interrupt Handlers

Traditional Hardirq Model

•Interrupt handler runs immediately upon hardware interrupt
•All other interrupts blocked during handler execution
•No preemption possible—handler runs to completion
•Can execute for arbitrary duration
•Highest priority by virtue of running in interrupt context
•Device drivers determine kernel latency characteristics

Threaded IRQ Model

•Minimal hardirq stub acknowledges interrupt and wakes thread
•Main handler runs in kernel thread context
•Thread can be preempted by higher-priority threads
•Priority can be configured via standard scheduling APIs
•Multiple interrupt threads compete for the CPU according to their scheduling priority
•System administrator controls latency through thread priorities

Implementation Architecture:

When a device driver requests a threaded interrupt, the kernel creates the following structure:

Threaded IRQ Flow

Conceptual

Hardware Interrupt Occurs
         │
         ▼
┌────────────────────────────────────────────────────┐
│  Hardirq Stub (Primary Handler)                    │
│  - Acknowledge interrupt to hardware               │
│  - Check if this interrupt needs handling          │
│  - Return IRQ_WAKE_THREAD to schedule thread       │
│  Duration: < 1 microsecond typically               │
└────────────────────────────────────────────────────┘
         │
         ▼ (Thread wakeup)
┌────────────────────────────────────────────────────┐
│  Scheduler Runs                                    │
│  - Threaded handler competes with other threads   │
│  - Priority inheritance if waiting on RT mutex    │
│  - System admin can set irq thread priorities     │
└────────────────────────────────────────────────────┘
         │
         ▼
┌────────────────────────────────────────────────────┐
│  Threaded Handler (Secondary Handler)              │
│  - Full interrupt processing                       │
│  - Can sleep, acquire mutexes                     │
│  - CAN BE PREEMPTED by higher priority threads   │
│  Duration: Whatever the handler needs             │
└────────────────────────────────────────────────────┘

threaded_irq_example.c
C
/**
 * Example: Requesting a threaded interrupt handler
 * 
 * The kernel creates an "irq/N-driver_name" thread for this handler.
 * This thread can be observed with 'ps' and its priority adjusted.
 */
 
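#include <linux/interrupt.h>
#include <linux/mutex.h>
#include <linux/pci.h>
 
/* Hypothetical per-device state assumed by the handlers below */
struct my_device {
    struct mutex data_lock;      /* protects received data */
    u64 captured_timestamp;      /* latched in hardirq context */
};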
 
/* Primary handler: runs in hardirq context, must be fast */
static irqreturn_t my_device_hardirq(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    
    /* Quick check: is this interrupt for us? */
    if (!device_interrupt_pending(dev))
        return IRQ_NONE;  /* Not our interrupt */
    
    /* Acknowledge interrupt to hardware (stop it from re-firing) */
    device_ack_interrupt(dev);
    
    /* Store any volatile state that must be captured immediately */
    dev->captured_timestamp = read_hardware_timestamp(dev);
    
    /* Request threaded handler execution */
    return IRQ_WAKE_THREAD;
}
 
/* Threaded handler: runs in process context, can do heavy work */
static irqreturn_t my_device_thread(int irq, void *dev_id)
{
    struct my_device *dev = dev_id;
    
    /* 
     * This code runs in a kernel thread.
     * It CAN:
     *   - Sleep
     *   - Acquire mutexes (with priority inheritance on PREEMPT_RT)
     *   - Be preempted by higher-priority threads
     *   - Take as long as necessary
     * 
     * It CANNOT:
     *   - Assume it runs immediately after the interrupt
     *   - Assume no other code ran between hardirq and here
     */
    
    mutex_lock(&dev->data_lock);  /* Safe! Will sleep if contended */
    
    process_received_data(dev);
    wake_up_waiting_userspace(dev);
    prepare_next_dma_transfer(dev);
    
    mutex_unlock(&dev->data_lock);
    
    return IRQ_HANDLED;
}
 
/* Registration */
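static int my_device_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
    struct my_device *dev;
    int ret;
    
    /* Device-managed allocation: released automatically on detach */
    dev = devm_kzalloc(&pdev->dev, sizeof(*dev), GFP_KERNEL);
    if (!dev)
        return -ENOMEM;
    mutex_init(&dev->data_lock);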
    
    ret = request_threaded_irq(
        pdev->irq,
        my_device_hardirq,     /* Primary: hardirq context */
        my_device_thread,      /* Secondary: thread context */
        IRQF_SHARED,
        "my_device",
        dev
    );
    
    if (ret) {
        dev_err(&pdev->dev, "Failed to request threaded IRQ\n");
        return ret;
    }
    
    /*
     * After this, you can see the thread:
     *   $ ps aux | grep irq
     *   root ... [irq/24-my_device]
     * 
     * And adjust its priority:
     *   # chrt -f -p 90 <pid>
     */
    
    return 0;
}
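
The `chrt` command in the comment above has a programmatic equivalent, which is useful when a supervisor process tunes IRQ thread priorities at startup. Threaded IRQ handlers start as SCHED_FIFO tasks at priority 50, so they are adjusted relative to that value. The following user-space sketch wraps `sched_setscheduler()`; the helper name `set_fifo_priority` is an illustrative assumption, and the call needs root or CAP_SYS_NICE to succeed on another task.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/*
 * Switch the task identified by pid (0 = calling thread) to SCHED_FIFO
 * at the given priority. Returns 0 on success or a negative errno value.
 */
int set_fifo_priority(pid_t pid, int priority)
{
    struct sched_param sp = { .sched_priority = priority };
    int lo = sched_get_priority_min(SCHED_FIFO);
    int hi = sched_get_priority_max(SCHED_FIFO);

    /* Reject values outside the range the kernel accepts (1..99 on Linux). */
    if (lo < 0 || hi < 0 || priority < lo || priority > hi)
        return -EINVAL;

    if (sched_setscheduler(pid, SCHED_FIFO, &sp) != 0)
        return -errno;  /* typically -EPERM without sufficient privileges */

    return 0;
}
```

A common convention is to place the IRQ threads that feed a control loop above the loop's own priority, and to leave unrelated IRQ threads at their default so they cannot delay it.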

Forced Threading on PREEMPT_RT

Sleeping Spinlocks and Priority Inheritance

The Spinlock Problem:

Consider a typical spinlock usage pattern in a device driver:

spinlock_problem.c
C
/*
 * Standard spinlock usage - problematic for real-time
 */
void driver_operation(struct my_device *dev)  /* hypothetical driver struct with a spinlock_t 'lock' */
{
    unsigned long flags;
    
    spin_lock_irqsave(&dev->lock, flags);
    /*
     * PROBLEM: While holding this lock:
     *   1. Preemption is disabled
     *   2. All interrupts are disabled (on this CPU)
     *   3. Any higher-priority task wanting to run must wait
     *   4. Duration is unbounded (depends on work done here)
     * 
     * If this critical section takes 1ms, we add 1ms to the
     * worst-case latency of EVERY real-time task in the system!
     */
    perform_lengthy_device_operation(dev);
    spin_unlock_irqrestore(&dev->lock, flags);
}

PREEMPT_RT Solution: Sleeping Locks

Lock Behavior Comparison

Conceptual

Standard Linux Spinlock:
┌─────────────────────────────────────────────────────────────────┐
│ Thread A: [───── spin_lock ─────────────── spin_unlock ────]   │
│                 ↑ preemption disabled                          │
│ Thread B: [...BLOCKED...BLOCKED...BLOCKED...]                 │
│           ↑ Cannot run even if higher priority                │
└─────────────────────────────────────────────────────────────────┘
 
PREEMPT_RT Sleeping Spinlock:
┌─────────────────────────────────────────────────────────────────┐
│ Thread A: [── lock ──┐                      ┌── unlock ──]     │
│                      │ preempted!           │                   │
│ Thread B:            └─[HIGH PRIORITY RUNS]─┘                  │
│                                                                 │
│ Note: Thread A continues after B completes. Priority           │
│ inheritance ensures A completes quickly to release lock.       │
└─────────────────────────────────────────────────────────────────┘

Priority Inheritance:

Priority Inheritance Mechanism

Conceptual

Priority Inversion WITHOUT Inheritance:
┌─────────────────────────────────────────────────────────────────┐
│ High (Pri=90):     [BLOCKED on lock ─────────────────────────]│
│ Medium (Pri=50):   ════════════════════════════════════════    │
│ Low (Pri=10):      [holds lock...preempted by Medium...]      │
│                                                                 │
│ Problem: High waits for Low, but Medium runs instead of Low!  │
│          High's latency = Medium's entire execution time      │
└─────────────────────────────────────────────────────────────────┘
 
Priority Inversion WITH Inheritance (PREEMPT_RT):
┌─────────────────────────────────────────────────────────────────┐
│ High (Pri=90):     [wait]──────────[RUNS]                      │
│ Medium (Pri=50):           [blocked behind Low-at-90]          │
│ Low (Pri=10→90):   [runs at 90, releases lock]                 │
│                    ↑ Inherits High's priority                  │
│                                                                 │
│ Result: High's latency = only Low's critical section time     │
└─────────────────────────────────────────────────────────────────┘
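
The same protocol is available to user-space code: POSIX mutexes created with the `PTHREAD_PRIO_INHERIT` protocol are backed on Linux by PI futexes, so a low-priority thread holding the mutex is boosted while a higher-priority thread waits for it. A minimal sketch, where the helper name `init_pi_mutex` is an illustrative assumption:

```c
#define _GNU_SOURCE
#include <pthread.h>

/*
 * Initialise *mutex with the priority-inheritance protocol.
 * Returns 0 on success or a positive error number, as pthread calls do.
 */
int init_pi_mutex(pthread_mutex_t *mutex)
{
    pthread_mutexattr_t attr;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;

    /* Request priority inheritance instead of the default PTHREAD_PRIO_NONE. */
    rc = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(mutex, &attr);

    /* The attribute object is no longer needed once the mutex exists. */
    pthread_mutexattr_destroy(&attr);
    return rc;
}
```

Mutexes shared between real-time and non-real-time threads are the main candidates for this treatment; a default mutex in that position can reproduce the inversion scenario shown in the first diagram.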

Raw Spinlocks:

raw_spinlock_usage.c
C
#include <linux/spinlock.h>
 
/* Regular spinlock: becomes sleeping mutex on PREEMPT_RT */
static DEFINE_SPINLOCK(normal_lock);
 
/* Raw spinlock: always a true spinlock, even on PREEMPT_RT */
static DEFINE_RAW_SPINLOCK(raw_lock);
 
void regular_path(void)
{
    spin_lock(&normal_lock);
    /* On PREEMPT_RT: May sleep! Lock holder can be preempted. */
    /* Use for most normal device driver critical sections. */
    do_normal_work();
    spin_unlock(&normal_lock);
}
 
void scheduler_critical_path(void)
{
    unsigned long flags;
    
    raw_spin_lock_irqsave(&raw_lock, flags);
    /*
     * TRUE busy-wait spinlock. Preemption disabled.
     * 
     * Use ONLY for:
     *   - Scheduler internals
     *   - Interrupt controller manipulation
     *   - Timer hardware programming
     *   - Debugging/tracing infrastructure
     * 
     * Keep critical sections EXTREMELY short (< 1μs ideal).
     */
    manipulate_scheduler_structures();
    raw_spin_unlock_irqrestore(&raw_lock, flags);
}

Raw Spinlock Discipline

Additional PREEMPT_RT Kernel Modifications

Beyond threaded interrupts and sleeping spinlocks, PREEMPT_RT includes numerous other modifications that collectively achieve deterministic behavior.

Key PREEMPT_RT Modifications

•High-Resolution Timers — Kernel timers with nanosecond resolution instead of jiffy-based (typically 1-4ms) timing. Essential for fine-grained scheduling.
•Threaded Softirqs — Softirq processing (network RX, timer callbacks, etc.) moved to kernel threads that can be prioritized and preempted.
•RCU Modifications — Preemptible RCU that allows readers to be preempted during read-side critical sections.
•PI-Aware Futexes — Futex operations such as FUTEX_LOCK_PI extend priority inheritance to user-space locks, including pthread mutexes that use the PTHREAD_PRIO_INHERIT protocol.
•Preemptible printk — Console output doesn't block high-priority tasks; printk work deferred to dedicated thread.
•Migration Disable — API to disable task migration without disabling preemption (preserving RT guarantees).
•Lazy Preemption — Defers preemption between ordinary (SCHED_OTHER) tasks to preserve throughput, while real-time tasks still preempt immediately.
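
Whether high-resolution timers are active can be checked from user space by querying the resolution of the monotonic clock. The sketch below assumes the usual Linux behavior, where the reported resolution is typically 1 ns with high-resolution timers and one tick period otherwise; the helper name `monotonic_resolution_ns` is illustrative.

```c
#define _GNU_SOURCE
#include <time.h>

/*
 * Return the resolution of CLOCK_MONOTONIC in nanoseconds,
 * or -1 if the clock cannot be queried.
 */
long long monotonic_resolution_ns(void)
{
    struct timespec res;

    if (clock_getres(CLOCK_MONOTONIC, &res) != 0)
        return -1;

    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}
```

A value in the millisecond range suggests that timers are still tick-driven, in which case periodic real-time tasks cannot be scheduled more finely than the tick regardless of the preemption model.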

Softirq Threading:

Softirq Handling Comparison

Conceptual

Standard Linux Softirq Processing:
┌─────────────────────────────────────────────────────────────────┐
│ [Hardirq] → [Softirq: NET_RX + TIMER + SCHED + ...] → [User]  │
│             └──────── Cannot be preempted ────────┘            │
│             Potentially milliseconds of non-preemptible work  │
└─────────────────────────────────────────────────────────────────┘
 
PREEMPT_RT Threaded Softirqs:
┌─────────────────────────────────────────────────────────────────┐
│ [Hardirq] → [Wake ksoftirqd] → [Return immediately]           │
│                   ↓                                             │
│             [ksoftirqd/N thread runs when scheduled]           │
│             [Can be preempted by higher priority threads]      │
│                                                                 │
│ $ ps aux | grep ksoftirqd                                      │
│ root ... [ksoftirqd/0]    # CPU 0 softirq thread              │
│ root ... [ksoftirqd/1]    # CPU 1 softirq thread              │
└─────────────────────────────────────────────────────────────────┘

Preemptible RCU:

preemptible_rcu.c
C
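/*
 * RCU reader: non-preemptible RCU vs preemptible RCU
 *
 * Preemptible RCU (CONFIG_PREEMPT_RCU) is used whenever the kernel
 * itself is preemptible, i.e. with PREEMPT as well as PREEMPT_RT.
 */
 
#include <linux/rcupdate.h>
 
struct my_data;                           /* hypothetical payload type */
static struct my_data __rcu *global_ptr;  /* hypothetical RCU-protected pointer */
 
/* PREEMPT_NONE / PREEMPT_VOLUNTARY: reader cannot be preempted */
void standard_rcu_reader(void)
{
    struct my_data *ptr;
    
    rcu_read_lock();  /* Reader is non-preemptible until unlock */
    
    ptr = rcu_dereference(global_ptr);
    /* CANNOT be preempted here */
    process(ptr);
    
    rcu_read_unlock(); /* Reader may be preempted again */
}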
 
/* Preemptible RCU (PREEMPT and PREEMPT_RT): reader CAN be preempted */
void preempt_rt_rcu_reader(void)
{
    struct my_data *ptr;
    
    rcu_read_lock();  /* Does NOT disable preemption */
    
    ptr = rcu_dereference(global_ptr);
    /*
     * CAN be preempted here!
     * 
     * The RCU machinery tracks that we're in a critical section
     * and ensures grace periods account for preempted readers.
     * 
     * This prevents a long-running RCU reader from blocking
     * high-priority real-time tasks.
     */
    process_potentially_long_operation(ptr);
    
    rcu_read_unlock();
}

Overhead Trade-off

Building and Configuring PREEMPT_RT

Deploying PREEMPT_RT requires careful kernel configuration and system tuning. This section covers the practical aspects of building and configuring an RT kernel.

Obtaining PREEMPT_RT:

Since Linux 6.12, PREEMPT_RT can be selected directly in mainline kernels on supported architectures. For older kernels, or for RT changes that have not yet been merged, use the out-of-tree patchset:

Obtaining PREEMPT_RT Patches
Bash
# For mainline kernels (6.12+): PREEMPT_RT is a config option
# on x86-64, arm64, and RISC-V; no patches are needed there
 
# For kernels requiring patches:
# Visit: https://wiki.linuxfoundation.org/realtime/start
# Download matching patch version for your kernel
 
# Example: Applying patches to kernel 5.15
wget https://cdn.kernel.org/pub/linux/kernel/projects/rt/5.15/patch-5.15.XX-rtXX.patch.gz
gunzip patch-5.15.XX-rtXX.patch.gz
 
# Apply to kernel source
cd /path/to/linux-5.15
patch -p1 < ../patch-5.15.XX-rtXX.patch

Essential Kernel Configuration:

RT Kernel Configuration

Kconfig

# Essential PREEMPT_RT Configuration Options
 
# Core preemption model - select PREEMPT_RT
CONFIG_PREEMPT_RT=y          # Full real-time preemption
 
# Timer configuration
CONFIG_HIGH_RES_TIMERS=y     # High-resolution timer support (essential)
CONFIG_NO_HZ_FULL=y          # Tickless operation for RT tasks (optional)
 
# Scheduler features
CONFIG_RT_GROUP_SCHED=y      # RT task group scheduling (optional)
 
# Disable problematic features for RT
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y  # Avoid frequency scaling
CONFIG_CPU_IDLE=n            # Or configure very short idle states only
 
# Debugging (disable in production for lower latency)
CONFIG_DEBUG_PREEMPT=n       # Disable for production
CONFIG_PROVE_LOCKING=n       # Disable for production
CONFIG_LOCKDEP=n             # Disable for production
 
# Tracing (keep for latency analysis, disable for absolute minimum)
CONFIG_FTRACE=y              # Function tracing
CONFIG_IRQSOFF_TRACER=y      # Track IRQs-off latency
CONFIG_PREEMPTIRQ_EVENTS=y   # Preemption/IRQ tracing
CONFIG_SCHED_TRACER=y        # Scheduler tracing
 
# Memory configuration
CONFIG_TRANSPARENT_HUGEPAGE=n   # Avoid THP overhead (recommended)
CONFIG_COMPACTION=n             # Consider disabling compaction

Runtime Configuration:

After booting the RT kernel, additional runtime configuration optimizes real-time behavior:

Runtime RT Configuration
Bash
#!/bin/bash
# Runtime configuration for PREEMPT_RT system
 
# 1. Verify RT kernel is running (RT builds report PREEMPT_RT in the version string)
uname -v | grep -q PREEMPT_RT || echo "WARNING: Not running RT kernel!"
 
# 2. Set CPU frequency governor to performance (avoid frequency scaling latency)
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$cpu"
done
 
# 3. Disable real-time throttling (allow RT tasks to use 100% CPU)
# WARNING: A runaway RT task can hang the system!
echo -1 > /proc/sys/kernel/sched_rt_runtime_us
 
# 4. Isolate CPUs for RT tasks (kernel boot parameter is better)
# In /etc/default/grub, add: isolcpus=2,3 nohz_full=2,3
 
# 5. Configure IRQ affinities - move non-RT IRQs off RT CPUs
# Move all IRQs to CPU 0,1 (leaving 2,3 for RT tasks)
for irq in /proc/irq/*/smp_affinity; do
    # Hex mask 3 = CPUs 0 and 1; some IRQs cannot be moved, so errors are ignored
    echo 3 > "$irq" 2>/dev/null
done
 
# 6. Set RT thread priorities
# Example: Set network IRQ thread to priority 90
pgrep -f "irq/.*eth" | xargs -I{} chrt -f -p 90 {}
 
# 7. Lock memory for RT application
# Application should use mlockall(MCL_CURRENT | MCL_FUTURE)
 
# 8. Verify configuration
echo "=== RT Configuration Summary ==="
echo "Kernel: $(uname -r)"
echo "RT Runtime: $(cat /proc/sys/kernel/sched_rt_runtime_us)"
echo "Governor (cpu0): $(cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor)"
echo "Isolated CPUs: $(cat /sys/devices/system/cpu/isolated)"
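
Step 7 of the script is the application's responsibility rather than the system's. A common startup pattern is to lock all current and future pages into RAM and then touch a block of stack so that no page fault occurs later inside the time-critical loop. The following is a minimal sketch of that pattern; the function names and the 64 KiB prefault size are illustrative assumptions, and `mlockall()` may fail unless the process has CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define PREFAULT_STACK_BYTES (64 * 1024)  /* assumed worst-case stack use */

/*
 * Write to a stack buffer at page-sized strides so the pages backing it
 * are faulted in now. Returns the number of writes performed.
 */
size_t prefault_stack(void)
{
    volatile unsigned char buf[PREFAULT_STACK_BYTES];
    long page = sysconf(_SC_PAGESIZE);
    size_t touched = 0;

    if (page <= 0)
        page = 4096;  /* fall back to a common page size */

    for (size_t i = 0; i < sizeof(buf); i += (size_t)page) {
        buf[i] = 0;   /* volatile write cannot be optimised away */
        touched++;
    }
    return touched;
}

/*
 * Lock current and future mappings into RAM, then prefault the stack.
 * Returns 0 on success or a negative errno value if locking fails.
 */
int rt_memory_setup(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -errno;

    prefault_stack();
    return 0;
}
```

`rt_memory_setup()` is meant to run once, before the thread switches to a real-time policy. Heap memory used by the real-time path should likewise be allocated and touched up front, since allocating inside the loop reintroduces the memory-allocation latencies described earlier.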

Safety Warning: RT Throttling
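
Step 3 of the script above disables RT throttling by writing -1 to `sched_rt_runtime_us`. By default the kernel lets real-time tasks consume `sched_rt_runtime_us` out of every `sched_rt_period_us` microseconds (950000 of 1000000, or 95%), which leaves a small reserve for ordinary tasks so that an administrator can still recover from a runaway RT task. The sketch below computes that budget from the two sysctl values; the function names are illustrative assumptions.

```c
#include <stdio.h>

/*
 * Fraction of each period that RT tasks may consume.
 * A negative runtime means throttling is disabled (1.0).
 * Returns -1.0 if the period is not positive.
 */
double rt_budget_fraction(long runtime_us, long period_us)
{
    if (period_us <= 0)
        return -1.0;
    if (runtime_us < 0)
        return 1.0;
    if (runtime_us > period_us)
        runtime_us = period_us;
    return (double)runtime_us / (double)period_us;
}

/* Read one integer from a file; returns 0 on success, -1 on failure. */
int read_long_from_file(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = (fscanf(f, "%ld", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/* Budget configured on the running system, or -1.0 if it cannot be read. */
double current_rt_budget_fraction(void)
{
    long runtime, period;

    if (read_long_from_file("/proc/sys/kernel/sched_rt_runtime_us", &runtime) != 0 ||
        read_long_from_file("/proc/sys/kernel/sched_rt_period_us", &period) != 0)
        return -1.0;
    return rt_budget_fraction(runtime, period);
}
```

A result of 1.0 means nothing stops a spinning SCHED_FIFO task from monopolizing its CPU, so systems configured that way generally need another safeguard, such as a watchdog or a higher-priority recovery shell.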

Summary: PREEMPT_RT Patch

The PREEMPT_RT patch represents a fundamental reimagining of the Linux kernel for real-time applications. Let's consolidate the key concepts:

Key Takeaways

•Standard Linux Has Unbounded Latency — Spinlocks, hardirqs, and non-preemptible sections create worst-case latencies measured in milliseconds to seconds.
•PREEMPT_RT Makes Nearly Everything Preemptible — Through sleeping spinlocks, threaded interrupts, and preemptible RCU, high-priority tasks can preempt almost any kernel activity.
•Spinlocks Become Sleeping Mutexes — Regular spinlock_t converts to sleeping locks with priority inheritance; raw_spinlock_t remains for truly critical sections.
•Interrupt Handlers Run as Threads — Device drivers' interrupt processing moves to schedulable kernel threads that obey priority and can be preempted.
•Priority Inheritance Prevents Inversion — When high-priority tasks block on locks, the lock holder temporarily inherits the waiter's priority.
•Mainline Integration Is Progressing — Most PREEMPT_RT code is now in mainline Linux; it's becoming a standard kernel configuration option.
•Configuration Requires Care — Achieving optimal RT performance requires kernel config awareness, runtime tuning, and understanding of the trade-offs involved.
