In the earliest days of operating systems, extending kernel functionality meant one thing: recompiling the entire kernel. This was not merely inconvenient—it was operationally catastrophic. Adding support for a new network card driver required shutting down production systems, rebuilding the kernel, and rebooting. In enterprise environments, this meant scheduled downtime, service disruptions, and the ever-present risk that the new kernel wouldn't boot correctly.
Dynamic loading fundamentally transformed this paradigm. It introduced the revolutionary concept that executable code could be inserted into a running kernel—safely, efficiently, and without requiring a reboot. This capability seems almost magical when you first encounter it: the kernel is running, executing critical system code, and yet we can add new code to it while it continues to operate.
Understanding dynamic loading is essential for any serious operating systems engineer. It reveals the sophisticated mechanisms that enable modern kernels to be both stable (by keeping the core minimal and well-tested) and extensible (by supporting thousands of hardware devices and features through loadable modules).
By the end of this page, you will understand the fundamental concepts of dynamic loading, including the distinction between compile-time and runtime linking, the mechanisms that enable code to be inserted into a running kernel, the role of symbol tables and relocation, and the architectural patterns that make kernel extensibility both possible and safe.
To understand why dynamic loading matters, we must first understand what it replaced: static linking. In a statically linked kernel, all code that will ever execute in kernel space must be determined at compile time and linked into a single, monolithic executable.
The compile-time commitment:
Static linking requires the kernel developer to anticipate every piece of hardware and every feature that any system running this kernel might need. The linker resolves all symbol references—every function call, every global variable access—at compile time, producing a single executable with fixed addresses.
This approach has one significant advantage: performance. Every function call resolves directly to a fixed address. There's no indirection, no symbol lookup at runtime, no relocation overhead. The resulting code is as fast as possible.
But the disadvantages are severe:

- Every driver and feature must be anticipated and compiled in, bloating the kernel with code most systems never execute
- Supporting new hardware requires rebuilding the kernel and rebooting
- Hardware vendors cannot ship drivers independently of the kernel itself
- Every change, however small, risks destabilizing the entire kernel image
The Windows 3.1 case study:
Early versions of Microsoft Windows exemplified the static linking problem. Hardware vendors had to work with Microsoft to include their drivers in the Windows distribution. Installing a new piece of hardware often meant obtaining floppy disks with driver files that the operating system would copy and recognize only after a reboot—and sometimes only after reinstallation.
The UNIX evolution:
Traditional UNIX systems took a different approach. The kernel source was available, and system administrators could rebuild a custom kernel including only the drivers needed for their specific hardware. While this was more efficient than Windows's approach, it required expertise that most users lacked and still demanded downtime for every kernel update.
Static linking involves several phases: compilation (source to object files), symbol resolution (matching function calls to definitions), relocation (adjusting addresses for final placement), and output generation (creating the executable). Dynamic loading shifts some of these phases from compile time to runtime.
Dynamic loading is the ability to load executable code into a running program's address space after that program has started execution. When applied to operating system kernels, it enables inserting new kernel code—device drivers, filesystems, network protocols, and more—into the running kernel without rebooting.
This capability rests on three fundamental mechanisms:
- A relocatable code format, so module code can execute wherever the kernel chooses to place it
- A symbol table, which maps names (such as pci_register_driver) to memory addresses; the symbol table enables the loader to resolve references between the new code and the existing kernel
- Relocation processing, which patches the module's addresses once its load location is known

The loading workflow:
When the operating system dynamically loads a kernel module, a precise sequence of operations occurs:

1. Read the module's object file and parse its sections
2. Allocate kernel memory and copy the module's code and data into it
3. Resolve the module's undefined symbols against the kernel symbol table
4. Apply relocations so all addresses point to their final locations
5. Call the module's initialization entry point
Modern modules are often compiled as position-independent code, which uses relative addressing instead of absolute addresses. PIC reduces the number of relocations required, as the code can execute correctly regardless of where it's loaded. This is especially important for shared libraries in user space, but has implications for kernel modules as well.
Memory allocation for modules:
Kernel modules require special memory allocation. Unlike user-space programs that receive virtual address space from the kernel, modules need memory in the kernel's own address space. This memory must be:

- Located within the kernel's portion of the address space, close enough to the core kernel that 32-bit relative calls and jumps can reach it
- Mapped with the correct permissions for each section (executable for code, writable for data)
- Permanently resident: kernel memory is never paged out
Symbol resolution is the heart of dynamic loading. A kernel module is not a standalone program—it's a fragment of code designed to integrate with the kernel. It calls kernel functions, accesses kernel data structures, and registers itself with kernel subsystems. All these interactions are mediated through symbols.
What is a symbol?
A symbol is a named entity in compiled code—a function, a global variable, or a code label. The compiler generates symbol table entries for each symbol, recording:
- The symbol's name (e.g., printk, kmalloc, current)
- Its value (an address or section offset)
- Its size, for data objects
- Its type (function, data object, or section)
- Its binding (local, global, or weak)
// Conceptual representation of an ELF symbol table entry
typedef struct {
    uint32_t st_name;   // Offset into string table for symbol name
    uint32_t st_value;  // Symbol value (address or offset)
    uint32_t st_size;   // Size of the symbol (for data objects)
    uint8_t  st_info;   // Type and binding attributes
    uint8_t  st_other;  // Visibility and other attributes
    uint16_t st_shndx;  // Section index where symbol is defined
} Elf32_Sym;

// Example symbol types
#define STT_NOTYPE  0   // Symbol type is unspecified
#define STT_OBJECT  1   // Symbol is a data object
#define STT_FUNC    2   // Symbol is a function
#define STT_SECTION 3   // Symbol is associated with a section

// Example symbol bindings
#define STB_LOCAL  0    // Local symbol (not visible outside object file)
#define STB_GLOBAL 1    // Global symbol (visible to all object files)
#define STB_WEAK   2    // Weak symbol (can be overridden)

The kernel symbol table:
For modules to reference kernel functions, the kernel must maintain a symbol table of exported symbols. Not all kernel symbols are exported—only those explicitly marked for module use. In Linux, this is done with EXPORT_SYMBOL() and EXPORT_SYMBOL_GPL() macros:
void printk(const char *fmt, ...);
EXPORT_SYMBOL(printk); // Available to all modules
void internal_kernel_function(void);
// Not exported — modules cannot call this
Symbol lookup during loading:
When a module is loaded, the loader examines its undefined symbols—references to external entities not defined within the module. For each undefined symbol, the loader searches the kernel symbol table. If a match is found, the symbol's address is recorded for use during relocation; if no match exists, loading fails (Linux reports an "Unknown symbol" error).
The kernel symbol table can contain thousands of symbols. Symbol name collisions can cause incorrect linking. Modern kernels use symbol namespacing (prefixes) and version information to ensure modules link against the correct symbols. Linux uses module versioning (CONFIG_MODVERSIONS) to detect ABI incompatibilities.
Module-to-module dependencies:
Modules can export symbols for use by other modules, creating a dependency graph. For example:
- usbcore exports USB infrastructure symbols
- usb_storage depends on usbcore symbols and exports storage-related symbols
- Higher-level drivers depend on usb_storage symbols

The module loader must process dependencies in the correct order. If module A depends on module B, module B must be loaded first. This is typically resolved through dependency metadata computed ahead of time: on Linux, depmod records each module's dependencies in modules.dep, and modprobe loads prerequisites automatically.
Relocation is the process of adjusting addresses within code to account for where the code is actually loaded in memory. When a compiler generates an object file, it doesn't know where the code will eventually reside. It uses placeholder addresses or offsets that must be fixed up by the linker (for static linking) or the loader (for dynamic loading).
Why relocation is necessary:
Consider a simple function call:
void my_driver_init(void) {
    printk("Driver initialized\n");
}
The compiled code will contain a CALL instruction targeting printk. But at compile time, the address of printk is unknown—it depends on the kernel binary and where the kernel is loaded. The compiler emits a relocation entry recording:
- The offset within the module where the patch must be applied
- The relocation type, which specifies how to compute the final value
- The symbol being referenced (printk)

Common x86-64 relocation types:

| Type | Description | Calculation | Usage |
|---|---|---|---|
| R_X86_64_64 | Absolute 64-bit address | S + A | Data pointers, function pointers |
| R_X86_64_PC32 | 32-bit PC-relative | S + A - P | Function calls, control flow |
| R_X86_64_PLT32 | PLT entry reference | L + A - P | External function calls |
| R_X86_64_GOTPCREL | GOT entry, PC-relative | G + GOT + A - P | Global variable access |
| R_X86_64_32S | Signed 32-bit absolute | S + A | 32-bit signed addresses |
Where:

- S = the value (address) of the symbol
- A = the addend stored in the relocation entry
- P = the address of the location being patched (the "place")
- L = the address of the symbol's PLT entry
- G = the offset of the symbol's GOT entry
- GOT = the address of the Global Offset Table
Relocation processing:
The module loader processes relocations in this sequence:
1. Locate the module's relocation sections (.rela.text, .rela.data, etc.)
2. For each relocation entry, look up the referenced symbol
3. Compute the value according to the relocation type
4. Patch the computed value into the target location
// Simplified relocation processing
void apply_relocations(Module *mod, RelocationSection *rela_sec) {
    for (int i = 0; i < rela_sec->num_entries; i++) {
        RelocationEntry *rel = &rela_sec->entries[i];

        // Get the symbol being referenced
        Symbol *sym = lookup_symbol(mod, rel->symbol_index);

        // Get the address where we need to patch
        void *patch_location = mod->load_address + rel->offset;

        // Calculate the new value based on relocation type
        uint64_t symbol_value = resolve_symbol(sym);
        uint64_t addend = rel->addend;
        uint64_t place = (uint64_t)patch_location;

        switch (rel->type) {
        case R_X86_64_64:
            // Absolute 64-bit: S + A
            *(uint64_t*)patch_location = symbol_value + addend;
            break;
        case R_X86_64_PC32:
            // PC-relative 32-bit: S + A - P
            *(int32_t*)patch_location =
                (int32_t)(symbol_value + addend - place);
            break;
        // ... handle other relocation types
        }
    }
}

Object formats support two relocation styles: REL (implicit addend stored at the patch location) and RELA (explicit addend in the relocation entry). Modern x86-64 ELF uses RELA exclusively, as explicit addends are clearer and avoid the need to read-modify-write during relocation.
Once a module is loaded into memory and all relocations are applied, it's essentially dormant code—present but not active. The initialization entry point brings the module to life, allowing it to register with kernel subsystems, allocate resources, and announce its presence.
The initialization contract:
Kernel modules follow a well-defined contract:
- An init function: called exactly once when the module loads; it returns 0 on success or a negative error code, in which case loading is aborted
- An exit function: called exactly once when the module unloads; it must release every resource the init function acquired
#include <linux/module.h>
#include <linux/init.h>

// Module initialization function
static int __init my_module_init(void)
{
    int ret;

    printk(KERN_INFO "My module: initializing...\n");

    // Allocate resources
    ret = allocate_device_memory();
    if (ret < 0) {
        printk(KERN_ERR "My module: memory allocation failed\n");
        return ret;  // Return error, module loading fails
    }

    // Register with a subsystem
    ret = register_character_device();
    if (ret < 0) {
        printk(KERN_ERR "My module: device registration failed\n");
        free_device_memory();  // Clean up previous allocation
        return ret;
    }

    printk(KERN_INFO "My module: initialized successfully\n");
    return 0;  // Success
}

// Module cleanup function
static void __exit my_module_exit(void)
{
    printk(KERN_INFO "My module: cleaning up...\n");

    unregister_character_device();
    free_device_memory();

    printk(KERN_INFO "My module: cleanup complete\n");
}

// Register the init and exit functions
module_init(my_module_init);
module_exit(my_module_exit);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Kernel Developer");
MODULE_DESCRIPTION("Example loadable kernel module");

Finding the entry point:
The loader needs to know which function to call for initialization. This is recorded in the module's metadata:
- Special sections: .init.text for initialization code, .exit.text for cleanup code
- A conventional entry-point symbol (init_module in Linux)

The __init and __exit markers:
Linux uses GCC attributes to optimize module code:
#define __init __attribute__((section(".init.text")))
#define __exit __attribute__((section(".exit.text")))
Code marked __init is placed in a special section that can be discarded after initialization—the function will never be called again, so its memory can be reclaimed. Similarly, __exit code is discarded entirely if the module is compiled into the kernel (since built-in code can never be "unloaded").
Not all module setup happens in the init function. Device drivers often register themselves and then wait for hardware events. The init function sets up the registration; actual device initialization happens when hardware is detected (probe function) or when user space opens the device.
The ability to unload a module is as important as loading it. Modules consume kernel memory, and in embedded systems, memory is precious. More critically, unloading enables driver upgrades—replacing an older driver with a newer version without rebooting.
The unloading challenge:
Unloading is far more complex than loading. A loaded module has integrated itself into the kernel:

- Its functions may be registered as callbacks with kernel subsystems
- Other modules may depend on its exported symbols
- User processes may hold open file descriptors backed by its devices
- Its code may be executing on another CPU at this very moment
Removing the module while any of these conditions exist would cause system crashes. The unloading process must verify that removal is safe.
The cleanup function:
Every well-designed module provides a cleanup function that reverses everything the initialization function did:
static void __exit my_module_exit(void)
{
// Unregister from subsystems in reverse order
unregister_character_device();
// Free allocated resources
free_device_memory();
// Cancel any pending work
cancel_delayed_work_sync(&my_work);
    printk(KERN_INFO "Module unloaded\n");
}
The unloading sequence:

1. Verify that the module's reference count is zero (no remaining users or dependent modules)
2. Call the module's exit function so it can unregister from subsystems and release resources
3. Remove the module's exported symbols from the kernel symbol table
4. Free the module's memory
Linux supports 'forced' unloading with rmmod -f, which bypasses safety checks. This is dangerous—it can crash the system if the module is in use. Forced unloading exists only for development and testing scenarios where a buggy module must be removed even if safety checks fail.
Kernel modules must coexist with the kernel and other modules in the kernel address space. Understanding how this memory is organized is crucial for both performance and security.
Address space considerations:
In a 64-bit Linux kernel, the virtual address space is typically split:
User space: 0x0000_0000_0000_0000 — 0x0000_7FFF_FFFF_FFFF
(128 TB of user-accessible memory)
Kernel space: 0xFFFF_8000_0000_0000 — 0xFFFF_FFFF_FFFF_FFFF
(128 TB of kernel-accessible memory)
Within kernel space, modules are loaded into a specific region. On x86-64 Linux, this is typically around 0xFFFF_FFFF_C000_0000 (the "module area").
| Section | Purpose | Permissions | Notes |
|---|---|---|---|
| .text | Executable code | Read + Execute | Variable |
| .rodata | Read-only data (strings, constants) | Read only | Variable |
| .data | Initialized writable data | Read + Write | Variable |
| .bss | Zero-initialized data | Read + Write | Minimal |
| .init.text | Initialization code (freed after init) | Read + Execute | Freed |
| .exit.text | Cleanup code | Read + Execute | Freed for built-ins |
| .symtab | Symbol table | Read only | Debug only |
Memory protection and W^X:
Modern kernels enforce W^X (Write XOR Execute)—memory that is writable cannot be executable, and vice versa. This is a critical security measure:

- Module .text and .init.text are mapped read + execute, never writable
- Module .data and .bss are mapped read + write, never executable
This prevents attackers who corrupt data from injecting executable shellcode.
Module memory allocation:
The kernel provides special allocation functions for module memory:
// Allocate memory for a module
void *module_alloc(unsigned long size);
// Free module memory
void module_memfree(void *ptr);
These differ from regular kmalloc because they allocate from the module region with appropriate permissions, and they may use different page sizes or mapping strategies optimized for code.
Kernel Address Space Layout Randomization (KASLR) randomizes where modules are loaded. This makes exploits that depend on knowing module addresses more difficult. The module loader adds randomization to the base address within the module region.
Dynamic loading is the architectural foundation that enables modern operating systems to be both lean and extensible. We've explored the mechanisms that make this possible:

- Symbol tables and symbol resolution, which let new code reference the running kernel
- Relocation, which fixes up addresses once a load location is chosen
- The initialization and cleanup contract that brings modules to life and retires them safely
- Module memory layout and protections (W^X, KASLR) within the kernel address space
What's next:
Now that we understand the foundations of dynamic loading, we'll examine the concrete format used by Linux: the kernel object file format (.ko). We'll see how ELF structures are extended with Linux-specific sections and how the modprobe, insmod, and rmmod tools interact with the kernel loader.
You now understand the fundamental mechanisms of dynamic loading—symbol resolution, relocation, initialization, and memory management. This knowledge forms the basis for understanding how real operating systems implement loadable kernel modules.