The Linux kernel constantly creates and destroys data structures—task_struct for processes, inode for files, sk_buff for network packets, dentry for directory entries. These allocations happen millions of times per second on busy systems. A naive malloc/free approach would be catastrophically slow.
Enter the slab allocator—a sophisticated object caching system designed by Jeff Bonwick for Solaris and adapted for Linux. The slab allocator recognizes that most kernel allocations are for objects of known, fixed sizes. Rather than allocating and freeing individual objects from a general-purpose heap, it maintains caches of pre-initialized objects that can be quickly dispensed and returned.
This page examines the slab allocator's design, its three Linux implementations (SLAB, SLUB, SLOB), and the techniques that make it extraordinarily efficient.
By the end of this page, you will understand: (1) why traditional allocators are insufficient for kernel needs, (2) the slab cache architecture and object lifecycle, (3) differences between SLAB, SLUB, and SLOB implementations, (4) how to create and use kernel caches, (5) per-CPU caching and NUMA awareness, and (6) debugging and monitoring slab allocator behavior.
To appreciate the slab allocator, we must first understand why general-purpose allocators like malloc are unsuitable for kernel workloads.
The Challenges:
1. Internal Fragmentation
General-purpose allocators maintain free lists of various sizes. When you request 64 bytes and only a 128-byte block is available, half the block is wasted. Over time, this internal fragmentation consumes significant memory.
2. Allocation Overhead
Each allocation requires searching free lists, potentially splitting blocks, and maintaining metadata. For small, frequent allocations, this overhead dominates the actual object size.
3. Cache Line Pollution
General allocators don't consider CPU cache behavior. Objects of the same type may be scattered across memory, causing poor cache utilization and more L1/L2 misses.
4. Initialization Cost
Many kernel objects require complex initialization (zeroing, setting up pointers, allocating sub-objects). If an object is freed and immediately reallocated, reinitializing it is wasteful—the previous state might be largely reusable.
5. Locking Contention
A single global allocator lock becomes a severe bottleneck on multi-core systems. Every allocation and free operation contends for this lock.
Most kernel allocations are for a small number of well-known object types: inodes, dentries, task_structs, socket buffers, etc. If we maintain separate caches for each object type, we can eliminate fragmentation (objects are always the right size), reduce initialization overhead (objects can retain partial state), and optimize cache behavior (similar objects stored together).
| Aspect | Traditional Allocator | Slab Allocator |
|---|---|---|
| Fragmentation | Variable block sizes cause waste | Fixed-size objects eliminate internal fragmentation |
| Allocation Speed | Free list search, potential splitting | O(1) from pre-allocated cache |
| Initialization | Full re-initialization on each allocation | Constructor/destructor can preserve state |
| Cache Behavior | Objects scattered randomly | Objects grouped by type, cache-aligned |
| Locking | Global lock contention | Per-CPU and per-cache structures |
| Metadata | Per-allocation headers | Per-slab metadata, minimal per-object |
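To make the contrast concrete, here is a minimal sketch of the two call sites side by side (struct foo and foo_cache are hypothetical names used only for illustration):

```c
#include <linux/slab.h>
#include <linux/list.h>

struct foo {
	int id;
	struct list_head link;
};

static struct kmem_cache *foo_cache;	/* created once with kmem_cache_create() */

static struct foo *foo_get_dedicated(void)
{
	/* Dedicated cache: exact object size, stored alongside its peers,
	 * optionally pre-constructed. */
	return kmem_cache_alloc(foo_cache, GFP_KERNEL);
}

static struct foo *foo_get_generic(void)
{
	/* General-purpose path: the request is rounded up to the nearest
	 * kmalloc size class and served from an anonymous cache. */
	return kmalloc(sizeof(struct foo), GFP_KERNEL);
}
```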
The slab allocator organizes memory in a three-level hierarchy: caches, slabs, and objects.
Cache (kmem_cache)
A cache is a pool of objects of a single type. Each cache is created for a specific data structure—one cache for inodes, another for dentries, another for task_structs. The cache maintains the object size and alignment, an optional constructor, per-CPU freelists for fast allocation, and per-NUMA-node lists of the slabs that back it.
Slab
A slab is a contiguous memory region (typically one or more pages) containing multiple objects. Slabs are categorized as full (every object allocated), partial (a mix of allocated and free objects), or empty (all objects free, making the slab a candidate for return to the page allocator).
Object
The unit of allocation. Objects within a slab are pre-sized, properly aligned, and optionally pre-initialized. Getting a free object is simply returning a pointer from the free list.
```c
/* Core slab allocator structures (simplified) */

/* A cache manages objects of one type */
struct kmem_cache {
	/* Per-CPU data for fast path allocation */
	struct kmem_cache_cpu __percpu *cpu_slab;

	/* Object properties */
	unsigned int object_size;	/* Size of each object */
	unsigned int size;		/* Size with padding/alignment */
	unsigned int align;		/* Alignment requirement */
	unsigned int offset;		/* Free pointer offset within object */

	/* Slab properties */
	unsigned int oo;		/* Objects per slab (encoded) */
	unsigned int min;		/* Minimum partial slabs to keep */
	gfp_t allocflags;		/* Allocation flags */

	/* Constructor/destructor */
	void (*ctor)(void *);		/* Object constructor */

	/* Slab lists */
	struct kmem_cache_node *node[MAX_NUMNODES];	/* Per-NUMA-node lists */

	/* Identification */
	const char *name;		/* Cache name (e.g., "task_struct") */
	struct list_head list;		/* All caches linked list */

	/* Debug and statistics */
	int refcount;			/* Reference count */
	unsigned long flags;		/* SLAB_* flags */
	/* ... more fields ... */
};

/* Per-NUMA-node slab lists */
struct kmem_cache_node {
	spinlock_t list_lock;		/* Protects partial list */
	unsigned long nr_partial;	/* Number of partial slabs */
	struct list_head partial;	/* Partial slab list */

	/* Full list only if debugging enabled */
#ifdef CONFIG_SLUB_DEBUG
	unsigned long nr_slabs;
	struct list_head full;
#endif
};

/* Per-CPU slab data (SLUB) */
struct kmem_cache_cpu {
	void **freelist;		/* Pointer to next free object */
	unsigned long tid;		/* Transaction ID for lock-free ops */
	struct page *page;		/* Current "active" slab page */
	struct page *partial;		/* Per-CPU partial slab list */
};

/* Slab page metadata (stored in struct page) */
/* In Linux, slab metadata is stored in the page struct itself */
struct page {
	/* ... other fields ... */

	/* When used as slab page: */
	struct kmem_cache *slab_cache;	/* Owning cache */
	void *freelist;			/* First free object in slab */
	unsigned int inuse;		/* Objects in use count */
	unsigned int objects;		/* Total objects in slab */
	/* ... */
};
```

Understanding the allocation and free paths is crucial for appreciating slab allocator efficiency.
Allocation Path (Fast Path):
The ideal allocation path—serving the vast majority of requests—requires no locking: the allocator reads the head of the current CPU's freelist, advances the freelist to the next free object with a single compare-and-exchange, and returns the pointer.
This entire operation is essentially a pointer manipulation—extremely fast.
Allocation Path (Slow Path):
When the per-CPU cache is empty, the allocator falls back through progressively more expensive steps: it first tries the per-CPU partial slab list, then takes the per-node list lock and pulls a slab from that node's partial list, and finally asks the page allocator for a brand-new slab.
```c
/* Allocation: kmem_cache_alloc() (SLUB implementation, simplified) */

void *kmem_cache_alloc(struct kmem_cache *s, gfp_t gfpflags)
{
	void *object;

	/* Fast path: try per-CPU freelist */
	object = __slab_alloc_fast(s);
	if (likely(object))
		return object;

	/* Slow path: need to refill or allocate new slab */
	return __slab_alloc_slow(s, gfpflags);
}

/* Fast path - lockless, CPU-local */
static inline void *__slab_alloc_fast(struct kmem_cache *s)
{
	struct kmem_cache_cpu *c;
	void *object;
	unsigned long tid;

	/* Disable preemption to ensure we stay on same CPU */
	c = this_cpu_ptr(s->cpu_slab);

	/* Try to allocate from current freelist atomically */
	do {
		tid = c->tid;
		object = c->freelist;
		if (unlikely(!object))
			return NULL;	/* Need slow path */
	} while (!this_cpu_cmpxchg_double(
			s->cpu_slab->freelist, s->cpu_slab->tid,
			object, tid,
			get_freepointer(s, object), next_tid(tid)));

	return object;
}

/* Slow path - may need locks, page allocation */
static void *__slab_alloc_slow(struct kmem_cache *s, gfp_t gfpflags)
{
	struct page *page;
	void *object;

	/* Try to refill from per-CPU partial list */
	page = this_cpu_read(s->cpu_slab->partial);
	if (page) {
		/* Activate this partial slab */
		object = acquire_slab(s, page);
		if (object)
			return object;
	}

	/* Try per-node partial list (requires lock) */
	page = get_partial_node(s, get_node(s, numa_node_id()));
	if (page) {
		object = get_freelist(s, page);
		return object;
	}

	/* Must allocate new slab */
	page = allocate_slab(s, gfpflags);
	if (!page)
		return NULL;	/* Out of memory */

	/* Initialize new slab's freelist */
	page->freelist = setup_objects(s, page);

	/* Get first object */
	object = page->freelist;
	page->freelist = get_freepointer(s, object);

	return object;
}

/* Freeing: kmem_cache_free() */
void kmem_cache_free(struct kmem_cache *s, void *object)
{
	struct kmem_cache_cpu *c;
	struct page *page;

	/* Find the slab page containing this object */
	page = virt_to_head_page(object);

	/* Fast path: if object belongs to current CPU's slab */
	c = this_cpu_ptr(s->cpu_slab);
	if (likely(page == c->page)) {
		/* Just push onto freelist - lockless */
		set_freepointer(s, object, c->freelist);
		c->freelist = object;
		return;
	}

	/* Slow path: object from different CPU or node */
	__slab_free_slow(s, page, object);
}

/* Slow path for cross-CPU or cross-node free */
static void __slab_free_slow(struct kmem_cache *s, struct page *page,
			     void *object)
{
	/* This requires locks and may move slabs between lists */
	struct kmem_cache_node *n = get_node(s, page_to_nid(page));
	bool was_full = (page->freelist == NULL);	/* no free objects before this free */

	spin_lock(&n->list_lock);

	/* Add object to page's freelist */
	set_freepointer(s, object, page->freelist);
	page->freelist = object;
	page->inuse--;

	if (page->inuse == 0) {
		/* Slab now empty - consider freeing */
		if (n->nr_partial >= s->min) {	/* min_partial in mainline SLUB */
			/* We have enough partials, free this slab */
			discard_slab(s, page);
		}
	} else if (was_full) {
		/* Slab was full, now partial - add to partial list */
		add_partial(n, page);
	}

	spin_unlock(&n->list_lock);
}
```

When an object is free, the slab allocator stores the pointer to the next free object within the object's own memory. This eliminates the need for separate free list structures—the objects themselves form the list. The 'offset' field in kmem_cache specifies where within the object this pointer is stored.
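A minimal sketch of how that embedded pointer is read and written (simplified; real SLUB additionally obfuscates the stored value when CONFIG_SLAB_FREELIST_HARDENED is enabled):

```c
/* Read the "next free object" pointer stored at s->offset inside a free object. */
static inline void *get_freepointer(struct kmem_cache *s, void *object)
{
	return *(void **)((char *)object + s->offset);
}

/* Store the "next free object" pointer inside an object being freed. */
static inline void set_freepointer(struct kmem_cache *s, void *object, void *fp)
{
	*(void **)((char *)object + s->offset) = fp;
}
```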
Linux has evolved three slab allocator implementations, each optimized for different scenarios. Understanding their differences is important for kernel configuration and performance tuning.
| Feature | SLAB | SLUB (Default) | SLOB |
|---|---|---|---|
| Status | Removed (2024) | Active (default) | Removed (2023) |
| Complexity | High | Medium | Low |
| Memory Overhead | High | Low | Minimal |
| Per-CPU Caching | Yes (array-based) | Yes (freelist) | No |
| Slab Metadata | Separate struct | In struct page | Minimal |
| NUMA Awareness | Yes | Yes | No |
| Debug Support | CONFIG option | Runtime option | Limited |
| Best For | Many caches | General use | Tiny systems |
Unless you're maintaining legacy systems or working on extremely resource-constrained embedded devices, SLUB is the implementation you'll encounter. Modern kernel development and debugging tools are built around SLUB, and it offers the best balance of performance, memory efficiency, and debuggability.
Kernel developers create dedicated caches for frequently-allocated data structures. Here's how to create, use, and destroy slab caches:
```c
/* Example: Creating and using a slab cache in a kernel module */

#include <linux/slab.h>
#include <linux/module.h>

/* Our example data structure */
struct my_object {
	unsigned long id;
	char name[64];
	struct list_head list;
	spinlock_t lock;
	/* ... */
};

/* Global cache pointer */
static struct kmem_cache *my_cache;

/* Constructor - called when new objects are created */
static void my_object_ctor(void *ptr)
{
	struct my_object *obj = ptr;

	/* Initialize fields that can be reused */
	spin_lock_init(&obj->lock);
	INIT_LIST_HEAD(&obj->list);
	/* Note: id and name set by caller, not constructor */
}

/* Module initialization */
static int __init my_module_init(void)
{
	/* Create the cache */
	my_cache = kmem_cache_create(
		"my_object_cache",		/* Cache name (shows in /proc/slabinfo) */
		sizeof(struct my_object),	/* Object size */
		0,				/* Alignment (0 = default) */
		SLAB_HWCACHE_ALIGN |		/* Align to cache line */
		SLAB_PANIC,			/* Panic if creation fails */
		my_object_ctor			/* Constructor (or NULL) */
	);
	if (!my_cache)
		return -ENOMEM;

	pr_info("my_object_cache created\n");
	return 0;
}

/* Allocate an object */
struct my_object *my_object_alloc(gfp_t gfp_flags)
{
	struct my_object *obj;
	static unsigned long next_id = 1;	/* not SMP-safe; for illustration only */

	/* kmem_cache_alloc returns pre-initialized object */
	obj = kmem_cache_alloc(my_cache, gfp_flags);
	if (!obj)
		return NULL;

	/* Complete initialization */
	obj->id = next_id++;
	obj->name[0] = '\0';

	return obj;
}

/* Alternative: Allocate and zero (ignores constructor) */
struct my_object *my_object_alloc_zero(gfp_t gfp_flags)
{
	return kmem_cache_zalloc(my_cache, gfp_flags);
}

/* Free an object */
void my_object_free(struct my_object *obj)
{
	if (!obj)
		return;

	/* Clean up any resources held by object */
	/* Note: constructor-initialized fields can be left as-is */

	kmem_cache_free(my_cache, obj);
}

/* Module cleanup */
static void __exit my_module_exit(void)
{
	/* Destroy the cache - all objects must be freed first! */
	if (my_cache) {
		kmem_cache_destroy(my_cache);
		pr_info("my_object_cache destroyed\n");
	}
}

module_init(my_module_init);
module_exit(my_module_exit);

/* === Common Slab API Functions === */

/* Create cache with full control */
struct kmem_cache *kmem_cache_create(
	const char *name,		/* Cache name for debugging */
	unsigned int size,		/* Object size */
	unsigned int align,		/* Minimum alignment */
	slab_flags_t flags,		/* SLAB_* flags */
	void (*ctor)(void *)		/* Constructor */
);

/* Create cache whose objects contain a region that may be copied to/from user space */
struct kmem_cache *kmem_cache_create_usercopy(
	const char *name,
	unsigned int size,
	unsigned int align,
	slab_flags_t flags,
	unsigned int useroffset,	/* Offset of user-copyable region */
	unsigned int usersize,		/* Size of user-copyable region */
	void (*ctor)(void *)
);

/* Allocate from cache */
void *kmem_cache_alloc(struct kmem_cache *cache, gfp_t flags);
void *kmem_cache_zalloc(struct kmem_cache *cache, gfp_t flags);

/* Free to cache */
void kmem_cache_free(struct kmem_cache *cache, void *object);

/* Bulk operations (more efficient than loop) */
int kmem_cache_alloc_bulk(struct kmem_cache *cache, gfp_t flags,
			  size_t size, void **p);
void kmem_cache_free_bulk(struct kmem_cache *cache, size_t size, void **p);

/* Destroy cache */
void kmem_cache_destroy(struct kmem_cache *cache);
```

While dedicated caches are preferred for known object types, the kernel also needs general-purpose allocation for variable-sized buffers. The kmalloc family of functions provides this, built on top of size-specific slab caches.
How kmalloc Works:
The kernel maintains a set of anonymous caches for power-of-two sizes (32, 64, 128, 256, ... 4096, 8192, etc.), plus a few in-between sizes such as 96 and 192 bytes. When you call kmalloc(100, GFP_KERNEL), the allocator rounds the request up to the nearest size class (128 bytes in this case), allocates an object from the kmalloc-128 cache, and returns the pointer.
This gives you the speed benefits of slab allocation with malloc-like flexibility.
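A minimal sketch of the rounding idea, assuming only power-of-two classes (the real kernel uses kmalloc_index(), which also knows about the 96- and 192-byte classes):

```c
#include <stddef.h>

/*
 * Standalone illustration (not kernel code): round a request up to the
 * next power-of-two size class, the way kmalloc(100) lands in kmalloc-128.
 */
static size_t kmalloc_size_class(size_t size)
{
	size_t class = 8;	/* smallest class; the minimum varies by config */

	while (class < size)
		class <<= 1;	/* 100 -> 128, 300 -> 512, ... */
	return class;
}
```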
```c
/* kmalloc family of functions */

#include <linux/slab.h>

/* Basic allocation */
void *ptr = kmalloc(size, GFP_KERNEL);

/* Zero-initialized allocation */
void *zptr = kzalloc(size, GFP_KERNEL);

/* Array allocation (checks for overflow) */
void *arr = kmalloc_array(n, element_size, GFP_KERNEL);
void *zarr = kcalloc(n, element_size, GFP_KERNEL);	/* zeroed */

/* String duplication */
char *str = kstrdup("hello", GFP_KERNEL);
char *nstr = kstrndup(src, max_len, GFP_KERNEL);

/* Memory duplication */
void *copy = kmemdup(src, size, GFP_KERNEL);

/* Resize allocation */
void *new = krealloc(ptr, new_size, GFP_KERNEL);

/* Free */
kfree(ptr);

/* Size query - what size did we actually get? */
size_t actual = ksize(ptr);

/* === GFP (Get Free Pages) Flags === */

GFP_KERNEL	/* Normal kernel allocation - may sleep */
GFP_ATOMIC	/* Cannot sleep - use in interrupt context */
GFP_NOWAIT	/* Don't wait for memory to become available */
GFP_NOFS	/* Don't call into filesystem (avoid recursion) */
GFP_NOIO	/* Don't call into I/O layer */
GFP_DMA		/* Require DMA-capable memory */
GFP_DMA32	/* Require 32-bit DMA-capable memory */
GFP_HIGHUSER	/* User-space allocation from high memory */

/* === Viewing kmalloc Caches === */
/*
 * $ cat /proc/slabinfo | grep kmalloc
 * kmalloc-8192      1024    1024   8192    4  8 ...
 * kmalloc-4096      2048    2048   4096    8  8 ...
 * kmalloc-2048      4096    4096   2048   16  8 ...
 * kmalloc-1024      8192    8192   1024   32  8 ...
 * kmalloc-512      16384   16384    512   32  4 ...
 * kmalloc-256      32768   32768    256   32  2 ...
 * kmalloc-192      21333   21333    192   21  1 ...
 * kmalloc-128      65536   65536    128   32  1 ...
 * kmalloc-96       42666   42666     96   42  1 ...
 * kmalloc-64       65536   65536     64   64  1 ...
 * kmalloc-32      131072  131072     32  128  1 ...
 */

/* For allocations beyond the kmalloc maximum, use vmalloc or kvmalloc */
void *big = vmalloc(large_size);	/* Virtually contiguous */
void *any = kvmalloc(size, GFP_KERNEL);	/* Try kmalloc, fall back to vmalloc */
```

kmalloc has size limits. The maximum varies by architecture but is typically 4-8 MB. For larger allocations, use vmalloc(), which provides virtually contiguous (but physically scattered) memory, or kvmalloc(), which tries kmalloc first and falls back to vmalloc. Note that vmalloc memory cannot be used for DMA.
On multi-core and NUMA systems, the slab allocator employs several optimizations to minimize contention and maximize memory locality.
Per-CPU Freelists:
Each CPU maintains its own freelist for each cache. This eliminates lock contention for the common case: an allocation pops the head of the local CPU's freelist, and freeing a locally owned object pushes it straight back, with no locks taken and no cache lines bouncing between CPUs.
Objects tend to stay on the CPU that allocated them, improving cache locality.
NUMA Awareness:
On NUMA (Non-Uniform Memory Access) systems, memory access latency depends on which node the memory is attached to. The slab allocator is NUMA-aware: each cache keeps a kmem_cache_node with its own partial list for every node, new slabs are allocated from the requesting CPU's local node by default, and callers can target a specific node explicitly.
```c
/* NUMA-aware slab operations */

#include <linux/slab.h>
#include <linux/numa.h>

/* Allocate from specific NUMA node */
void *kmem_cache_alloc_node(struct kmem_cache *cache, gfp_t flags, int node);
void *kmalloc_node(size_t size, gfp_t flags, int node);

/* Example: Allocate on node where current task prefers to run */
int preferred_node = numa_node_id();
void *obj = kmem_cache_alloc_node(cache, GFP_KERNEL, preferred_node);

/* Example: Allocate on same node as another object */
int obj_node = page_to_nid(virt_to_page(existing_obj));
void *new_obj = kmalloc_node(size, GFP_KERNEL, obj_node);

/* View NUMA slab distribution */
/*
 * $ cat /sys/kernel/slab/kmalloc-256/numa_stat
 * N0=12345 N1=10234 N2=11234 N3=9876
 */

/* Slab spreads across nodes by default for some caches */
/* SLAB_MEM_SPREAD flag enables intentional distribution */
cache = kmem_cache_create("spread_cache", sizeof(struct my_obj),
			  0, SLAB_MEM_SPREAD, NULL);

/* === Per-CPU Statistics === */
/*
 * $ cat /sys/kernel/slab/task_struct/cpu_slabs
 * 0 : slabs=3 objects=24 partial=1
 * 1 : slabs=2 objects=16 partial=1
 * ...
 */
```

When writing drivers that allocate memory heavily on specific CPUs (e.g., network drivers with receive queues per CPU), using kmalloc_node() or kmem_cache_alloc_node() with the appropriate NUMA node can significantly reduce memory latency. Profile with tools like 'numastat' to identify cross-node memory access patterns.
The slab allocator provides rich debugging and monitoring capabilities through /proc, /sys, and kernel options.
```bash
# === /proc/slabinfo - Traditional slab statistics ===
cat /proc/slabinfo
# slabinfo - version: 2.1
# # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab>
# task_struct              1234       1280      1728          18            8
# inode_cache             45678      46080       608          26            4
# dentry                  89012      89856       192          21            1

# Parse with slabtop for real-time view
sudo slabtop
# Active / Total Objects (% used) : 1234567 / 1345678 (91.7%)
# Active / Total Slabs (% used)   : 12345 / 13456 (91.7%)
# Active / Total Caches (% used)  : 150 / 200 (75.0%)
# Active / Total Size (% used)    : 512.00M / 560.00M (91.4%)

# === /sys/kernel/slab/ - Detailed per-cache information ===
ls /sys/kernel/slab/task_struct/
# aliases align alloc_calls cache_dma cpu_partial cpu_slabs
# ctor destroy_by_rcu free_calls hwcache_align min_partial
# object_size objects objects_partial objs_per_slab order
# partial poison red_zone sanity_checks shrink slabs
# slabs_cpu store_user total_objects trace validate

# View object size
cat /sys/kernel/slab/task_struct/object_size
# 1728

# View allocation call sites
cat /sys/kernel/slab/task_struct/alloc_calls
# 1234 copy_process+0x1a3/0x1d80 ...

# View free call sites
cat /sys/kernel/slab/task_struct/free_calls
# 1100 free_task+0x85/0xa0 ...

# === SLUB Debug Options ===
# Boot with: slub_debug=PUZ (see below)
# P = Poisoning: fill with pattern on free
# U = User tracking: store allocation/free call sites
# Z = Red zoning: detect buffer overflows
# F = Sanity checks on free
# T = Trace allocations

# Enable debugging for specific cache
echo 1 > /sys/kernel/slab/my_cache/poison
echo 1 > /sys/kernel/slab/my_cache/red_zone
echo 1 > /sys/kernel/slab/my_cache/trace

# Force validation of a cache (slow!)
echo 1 > /sys/kernel/slab/my_cache/validate

# === Common Issues and Detection ===

# Use-after-free detection (with poisoning):
# Object filled with 0x6b on free
# Access pattern 0x6b6b6b6b indicates use-after-free

# Buffer overflow detection (with red zones):
# Guard bytes placed around object
# Corruption of guards indicates buffer overflow

# Memory leak detection:
# Compare alloc_calls vs free_calls
# Growing object count without matching frees

# === Memory Pressure and Shrinking ===
# Force slab shrinking (release empty slabs)
for f in /sys/kernel/slab/*/shrink; do echo 1 > "$f"; done

# Or use sysctl
sysctl vm.drop_caches=2   # Free reclaimable slab objects
```

| Option | Description | Overhead |
|---|---|---|
| P (Poison) | Fill freed objects with 0x6b pattern | Low |
| U (User tracking) | Record allocation/free call sites | Medium |
| Z (Red zone) | Add guard bytes around objects | Medium |
| F (Sanity) | Validate objects on free | Low |
| T (Trace) | Trace all allocations to kernel log | High |
| A (Failslab) | Random allocation failures for testing | Low |
SLUB debugging options can significantly impact performance. Poisoning and red zones are suitable for development systems. User tracking and tracing should be used sparingly. Never enable all options on production systems unless diagnosing a specific issue. Boot with 'slub_debug=-' to disable all debugging.
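Debug features can also be enabled for a single cache at creation time rather than kernel-wide; a minimal sketch using the corresponding SLAB_* flags (the cache and structure names are hypothetical, and CONFIG_SLUB_DEBUG must be built in):

```c
#include <linux/slab.h>
#include <linux/init.h>

struct my_debug_obj {
	int state;
	char payload[120];
};

static struct kmem_cache *my_debug_cache;

static int __init my_debug_cache_init(void)
{
	/* Poison freed objects, add red zones around each object, and
	 * record allocation/free call sites for this cache only. */
	my_debug_cache = kmem_cache_create("my_debug_cache",
					   sizeof(struct my_debug_obj), 0,
					   SLAB_POISON | SLAB_RED_ZONE |
					   SLAB_STORE_USER,
					   NULL);
	return my_debug_cache ? 0 : -ENOMEM;
}
```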
The slab allocator is one of the most carefully engineered subsystems in the Linux kernel. Let's consolidate what we've learned:

- Kernel workloads allocate the same fixed-size objects constantly, so per-type caches beat a general-purpose heap on fragmentation, speed, and cache locality.
- A cache is backed by slabs (one or more pages) whose objects may be full, partial, or empty, with the free list threaded through the free objects themselves.
- The SLUB fast path is a lockless per-CPU freelist operation; the slow path falls back to per-CPU and per-node partial lists, then the page allocator.
- SLUB is the implementation in current kernels, and kmalloc is built on size-class caches on top of it.
- /proc/slabinfo, slabtop, /sys/kernel/slab/, and slub_debug provide monitoring and debugging.
What's Next:
The slab allocator works with pages obtained from the page allocator, but not all memory is created equal. The next page explores Linux memory zones—how the kernel partitions physical memory into regions with different characteristics and constraints.
You now have a comprehensive understanding of the Linux slab allocator—its architecture, implementations, APIs, and optimization techniques. This knowledge is essential for kernel development, driver writing, and performance analysis.