System calls represent the most critical trust boundary in any operating system. On one side is user space—untrusted, potentially malicious, definitely buggy code that can send any garbage it wants. On the other side is the kernel—the most privileged code in the system, with access to all memory, all processes, all hardware.
Every argument to every syscall is a potential attack vector. A malicious user could:

- Pass a pointer into kernel memory, tricking the kernel into reading or overwriting its own data
- Pass an unmapped or non-canonical address to crash the kernel
- Pass a huge size or count that wraps bounds arithmetic
- Race another thread to change user memory between the kernel's check and its use
Parameter validation is the kernel's immune system. Get it wrong, and you have a privilege escalation vulnerability. This page explores how the kernel validates every syscall argument to maintain system security.
By the end of this page, you will understand how the kernel validates syscall parameters—from address space verification through safe memory copy primitives to TOCTTOU (time-of-check-to-time-of-use) attack prevention. You'll know why direct pointer dereference is forbidden and how the kernel safely moves data across the user/kernel boundary.
User-space code can invoke any syscall with any arguments. The kernel must assume every argument is malicious until proven otherwise.
Why can't we just dereference user pointers?
Consider a naive implementation of sys_read():
```c
/* VULNERABLE: Never do this! */
ssize_t vulnerable_sys_read(int fd, char *buf, size_t count)
{
    struct file *f = get_file(fd);
    char kernel_buffer[1024];

    /* Read data from file into kernel buffer */
    ssize_t n = read_file(f, kernel_buffer, count);

    /* VULNERABLE: Directly copy to user pointer! */
    memcpy(buf, kernel_buffer, n);
    return n;
}

/* Attack 1: buf points to kernel memory
 *
 * User calls: read(fd, 0xffffffff81000000, 100);
 * (0xffffffff81000000 is in kernel space!)
 *
 * The memcpy writes to kernel memory,
 * potentially overwriting kernel code/data.
 * Result: Kernel crash or arbitrary code execution.
 */

/* Attack 2: buf is not mapped
 *
 * User calls: read(fd, 0xdeadbeef, 100);
 * (0xdeadbeef is unmapped)
 *
 * The memcpy causes a page fault in kernel mode.
 * At best: kernel crash (DoS)
 * At worst: exploitable panic handler
 */

/* Attack 3: Integer overflow
 *
 * User calls: read(fd, buf, 0xffffffffffffffff);
 * (count = SIZE_MAX)
 *
 * count + offset in the kernel may overflow,
 * and bounds checks may be bypassed.
 * Result: Out-of-bounds read/write
 */
```

The kernel CANNOT safely dereference any pointer provided by user space. The pointer could point to kernel memory, unmapped memory, or memory-mapped hardware registers—or change value between check and use. All access must go through validated copy functions.
The categories of validation:
Parameter validation breaks down into several categories, each preventing a different attack class:

- Address-range validation — is the pointer even in user space? (access_ok())
- Safe copying — moving data across the boundary without faulting the kernel (copy_from_user(), copy_to_user(), get_user(), put_user())
- String handling — bounded, fault-safe copies of null-terminated data (strncpy_from_user())
- Race prevention — defeating TOCTTOU and double-fetch attacks
- Integer-overflow checking — ensuring size arithmetic cannot wrap
The first line of defense is checking whether a pointer is even in the range of possible user addresses. On x86-64 Linux, user space occupies only the bottom 128 TB of the 64-bit address space (see the layout table below).
The access_ok() macro performs this fundamental check:
```c
/* Linux kernel: arch/x86/include/asm/uaccess.h */

/* Verify that a user pointer is in the valid user range */
#define access_ok(addr, size) \
    likely(!__range_not_ok(addr, size, TASK_SIZE_MAX))

/* The actual range check */
static inline bool __range_not_ok(unsigned long addr, unsigned long size,
                                  unsigned long limit)
{
    /*
     * If addr + size overflows, this test will catch it
     * because the result would wrap around below addr.
     *
     * Also catches addr + size > limit (outside user space).
     */
    return unlikely(size > limit) || unlikely(addr > limit - size);
}

/* TASK_SIZE_MAX: The upper bound of user space
 * On x86-64: 0x00007ffffffff000 (128 TB minus guard pages)
 */

/* Example usage in a syscall handler */
ssize_t ksys_read(int fd, char __user *buf, size_t count)
{
    /* First: verify buf is in user space */
    if (!access_ok(buf, count))
        return -EFAULT; /* Bad address */

    /* Now we know buf points into user space, but we still can't
     * dereference it directly. We must use copy_to_user().
     */
    /* ... */
}
```

access_ok() only checks that the address is in the user address range. It does NOT verify that the memory is mapped, that the process has permission to access it, or that the pages are currently resident. Those checks happen during the actual copy.
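To see why the single comparison `addr > limit - size` also rejects wrapping ranges, it helps to plug in concrete numbers. Below is a user-space re-creation of the same check—ordinary C, not kernel code—with the expected outputs in comments:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TASK_SIZE_MAX 0x00007ffffffff000UL /* x86-64 user-space limit */

/* User-space re-implementation of the kernel's range check */
static bool range_not_ok(uint64_t addr, uint64_t size, uint64_t limit)
{
    return size > limit || addr > limit - size;
}

int main(void)
{
    /* Ordinary buffer well inside user space: accepted */
    printf("%d\n", range_not_ok(0x400000, 4096, TASK_SIZE_MAX));       /* 0 */

    /* Kernel address: rejected because addr > limit - size */
    printf("%d\n", range_not_ok(0xffffffff81000000UL, 100,
                                TASK_SIZE_MAX));                       /* 1 */

    /* A size so large that addr + size would wrap past 2^64 is
     * rejected by the size > limit clause alone - the overflowing
     * sum is never computed. */
    printf("%d\n", range_not_ok(0x00007fffffffe000UL,
                                UINT64_MAX - 0x1000, TASK_SIZE_MAX));  /* 1 */
    return 0;
}
```

Note how the check never adds addr and size directly: together, the two comparisons cover every out-of-range combination without risking the very overflow they guard against.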
Historical note: set_fs() and the dangers it posed
Older Linux kernels had a mechanism called set_fs() that could temporarily raise the user/kernel address limit, letting kernel code treat kernel pointers as "user" pointers and effectively bypassing access_ok(). This was removed in Linux 5.10 because:

- A forgotten or mismatched restore left the entire kernel address space open to subsequent "user" accesses—a recurring source of serious vulnerabilities
- It made every access_ok() check depend on hidden per-thread state, which was hard to audit and easy to get wrong
- Removing it allowed the user-copy paths to be simplified and more aggressively hardened
| Address Range | Size | Purpose | Pointer Check |
|---|---|---|---|
| 0x0000000000000000 - 0x00007FFFFFFFFFFF | 128 TB | User space | access_ok() passes |
| 0x0000800000000000 - 0xFFFF7FFFFFFFFFFF | ~16 EB | Non-canonical hole | CPU fault on use |
| 0xFFFF800000000000 - 0xFFFFBFFFFFFFFFFF | 64 TB | Kernel direct mapping | access_ok() fails |
| 0xFFFFFFFF80000000 - 0xFFFFFFFFFFFFFFFF | 2 GB | Kernel text/modules | access_ok() fails |
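You can observe this from user space: handing a syscall an address from the kernel half of the table doesn't crash anything—the call just fails with EFAULT. A minimal demonstration, assuming Linux on x86-64 and any readable file (here /etc/hostname):

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/etc/hostname", O_RDONLY);
    if (fd < 0)
        return 1;

    /* Kernel-space address: access_ok() fails, read() returns EFAULT */
    ssize_t n = read(fd, (void *)0xffffffff81000000UL, 100);
    printf("kernel addr:   ret=%zd errno=%s\n", n, strerror(errno));

    /* Unmapped but valid-range user address: access_ok() passes, the
     * page fault is caught during the copy, and we still get EFAULT */
    n = read(fd, (void *)0x1000UL, 100);
    printf("unmapped addr: ret=%zd errno=%s\n", n, strerror(errno));

    close(fd);
    return 0;
}
```

Both calls return -1 with errno set to EFAULT; the kernel absorbed the bad pointers without so much as a warning in dmesg.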
Once we've verified the address range, we need to actually move data between kernel and user space. The kernel provides specialized functions that handle all the complexity:
The copy_*_user() family:
- copy_from_user(to, from, n) — Copy n bytes from user space to the kernel
- copy_to_user(to, from, n) — Copy n bytes from the kernel to user space
- get_user(x, ptr) — Read a simple value (1, 2, 4, or 8 bytes) from user space
- put_user(x, ptr) — Write a simple value to user space
- strncpy_from_user() — Copy a null-terminated string from user space
```c
/* Safe memory copy functions - conceptual implementation */

/* Copy from user space to kernel space */
unsigned long copy_from_user(void *to, const void __user *from,
                             unsigned long n)
{
    /* First: validate the source address range */
    if (!access_ok(from, n)) {
        /* Clear destination (security: don't leak old kernel data) */
        memset(to, 0, n);
        return n; /* Return number of bytes NOT copied */
    }

    /* Attempt the copy, handling page faults */
    return raw_copy_from_user(to, from, n);
}

/* Copy from kernel space to user space */
unsigned long copy_to_user(void __user *to, const void *from,
                           unsigned long n)
{
    /* Validate the destination range */
    if (!access_ok(to, n))
        return n; /* Return number of bytes NOT copied */

    return raw_copy_to_user(to, from, n);
}

/* Return value convention:
 * Returns 0 on success (all bytes copied)
 * Returns >0 on failure (number of bytes NOT copied)
 */

/* Example usage */
ssize_t sys_read_impl(struct file *f, char __user *buf, size_t count)
{
    char *kbuf;
    ssize_t ret;

    /* Allocate kernel buffer */
    kbuf = kmalloc(count, GFP_KERNEL);
    if (!kbuf)
        return -ENOMEM;

    /* Read data into kernel buffer (conceptual helper; the real
     * vfs_read() takes a user pointer) */
    ret = vfs_read(f, kbuf, count);
    if (ret <= 0)
        goto out;

    /* Copy to user space SAFELY */
    if (copy_to_user(buf, kbuf, ret)) {
        /* copy_to_user failed (returned non-zero) */
        ret = -EFAULT;
    }

out:
    kfree(kbuf);
    return ret;
}
```

How page faults are handled:
The raw_copy_*_user() functions may encounter page faults—the user memory might not be currently resident. The kernel handles this with a clever mechanism:
```c
/* The kernel uses exception tables to handle faults gracefully */

/* Assembly for raw_copy_from_user (simplified concept) */
/*
 * 1: rep movsb          // Copy bytes
 *    xor %eax, %eax     // Return 0 (success)
 *    ret
 * 2: mov %ecx, %eax     // Return remaining count (failure)
 *    ret
 *
 * .section __ex_table
 *   .quad 1b, 2b        // If fault at 1, jump to 2
 */

/* When a page fault occurs during the copy:
 *
 * 1. CPU generates a page fault exception
 * 2. do_page_fault() is called
 * 3. Kernel checks if the fault came from an "expected" location
 * 4. Looks up RIP in the exception table (__ex_table)
 * 5. If found: modify RIP to jump to the fixup handler
 * 6. If not found: kernel oops (bug in kernel code)
 * 7. The fixup handler returns an error to the caller
 */

/* The exception table entry */
struct exception_table_entry {
    int insn;    /* Relative address of faulting instruction */
    int fixup;   /* Relative address of fixup code */
    int handler; /* Handler type (optional) */
};

/* During page fault handling */
bool fixup_exception(struct pt_regs *regs, int trap)
{
    const struct exception_table_entry *e;

    /* Search for this RIP in the exception table */
    e = search_exception_tables(regs->ip);
    if (e) {
        /* Found! Redirect RIP to the fixup code (the stored offset
         * is relative to the address of the field itself). */
        regs->ip = (unsigned long)&e->fixup + e->fixup;
        return true; /* Fault handled */
    }
    return false; /* No fixup - kernel bug! */
}
```

The exception table approach means that copy_*_user() can never cause a kernel crash, even if the user provides a completely invalid pointer. The fault is caught, the copy aborts, and the caller receives an error. This is why we MUST use these functions instead of direct pointer access.
For accessing single values (1, 2, 4, or 8 bytes), the kernel provides optimized get_user() and put_user() macros that are faster than copy_*_user() for small transfers:
```c
/* Optimized accessors for simple types */

/* get_user: Read a single value from user space
 *
 * x:   kernel variable to receive the value
 * ptr: user-space pointer
 * Returns: 0 on success, -EFAULT on error
 */
#define get_user(x, ptr)                            \
({                                                  \
    int __ret;                                      \
    __typeof__(*(ptr)) __val;                       \
    /* Verify address is in user space */           \
    if (!access_ok(ptr, sizeof(*(ptr)))) {          \
        __ret = -EFAULT;                            \
    } else {                                        \
        __ret = __get_user(__val, ptr);             \
        (x) = __val;                                \
    }                                               \
    __ret;                                          \
})

/* put_user: Write a single value to user space */
#define put_user(x, ptr)                            \
({                                                  \
    int __ret;                                      \
    if (!access_ok(ptr, sizeof(*(ptr)))) {          \
        __ret = -EFAULT;                            \
    } else {                                        \
        __ret = __put_user(x, ptr);                 \
    }                                               \
    __ret;                                          \
})

/* Example usage */
SYSCALL_DEFINE2(getrlimit, unsigned int, resource,
                struct rlimit __user *, rlim)
{
    struct rlimit value;
    int retval;

    retval = do_prlimit(current, resource, NULL, &value);
    if (retval)
        return retval;

    /* Write the result to user space */
    if (put_user(value.rlim_cur, &rlim->rlim_cur) ||
        put_user(value.rlim_max, &rlim->rlim_max))
        return -EFAULT;
    return 0;
}

/* __get_user (unsafe version) - used after access_ok() */
/* These are faster but the caller MUST verify the address first */
#define __get_user(x, ptr) ({                           \
    __typeof__(*(ptr)) __gu_val;                        \
    int __gu_err = 0;                                   \
    /* Inline assembly with exception handler */        \
    switch (sizeof(*(ptr))) {                           \
    case 1: __get_user_asm(__gu_val, ptr, "b"); break;  \
    case 2: __get_user_asm(__gu_val, ptr, "w"); break;  \
    case 4: __get_user_asm(__gu_val, ptr, "l"); break;  \
    case 8: __get_user_asm(__gu_val, ptr, "q"); break;  \
    }                                                   \
    (x) = __gu_val;                                     \
    __gu_err;                                           \
})
```

The double-underscore versions (__get_user, __put_user) skip the access_ok() check for performance. They should ONLY be used when the caller has already verified the address. Misuse leads to security vulnerabilities. Prefer the safe versions (get_user, put_user) unless you're absolutely certain access_ok() was already called.
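The intended use of the unsafe variants is batched access: pay for access_ok() once, then run the cheap accessor in a loop. A sketch under that assumption—fill_from_user() is a made-up helper name, and note that the size multiplication is overflow-checked before the range check:

```c
/* Copy an array of ints from user space, checking the range once */
static int fill_from_user(int *dst, const int __user *src, size_t n)
{
    size_t i;

    /* Guard the multiplication before using it in the range check */
    if (n > SIZE_MAX / sizeof(*src))
        return -EINVAL;

    /* Single range check for the entire array */
    if (!access_ok(src, n * sizeof(*src)))
        return -EFAULT;

    for (i = 0; i < n; i++) {
        /* Safe: the range was validated above. __get_user still
         * handles faults on unmapped pages via the exception table. */
        if (__get_user(dst[i], &src[i]))
            return -EFAULT;
    }
    return 0;
}
```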
| Function | Direction | Size | Returns |
|---|---|---|---|
| get_user(x, ptr) | User → Kernel | 1/2/4/8 bytes | 0 or -EFAULT |
| put_user(x, ptr) | Kernel → User | 1/2/4/8 bytes | 0 or -EFAULT |
| copy_from_user(to, from, n) | User → Kernel | n bytes | Bytes not copied |
| copy_to_user(to, from, n) | Kernel → User | n bytes | Bytes not copied |
| strncpy_from_user(to, from, n) | User → Kernel | String | Length or -EFAULT |
| strnlen_user(s, n) | Measure length | String | Length or 0 |
| clear_user(to, n) | Clear user memory | n bytes | Bytes not cleared |
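Putting several of these functions together: a typical handler copies a fixed-size request struct in, validates and works on the kernel copy only, then copies the result back. The sketch below follows that pattern; struct demo_req, DEMO_VALID_FLAGS, and compute_value() are invented names for illustration:

```c
#include <linux/types.h>
#include <linux/uaccess.h>
#include <linux/errno.h>

/* Hypothetical request layout and flag mask, for illustration only */
#define DEMO_VALID_FLAGS 0x3

struct demo_req {
    u32 flags;
    u64 value;
};

static u64 compute_value(u32 flags)
{
    return flags * 2ULL; /* stand-in for real work */
}

static long demo_ioctl_get(struct demo_req __user *ureq)
{
    struct demo_req req;

    /* One copy_from_user() for the whole struct: every field is now
     * a kernel copy that user space cannot change under us */
    if (copy_from_user(&req, ureq, sizeof(req)))
        return -EFAULT;

    /* Validate the kernel copy, never the user memory */
    if (req.flags & ~DEMO_VALID_FLAGS)
        return -EINVAL;

    req.value = compute_value(req.flags);

    /* Copy the result back; any failure becomes -EFAULT */
    if (copy_to_user(ureq, &req, sizeof(req)))
        return -EFAULT;
    return 0;
}
```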
Strings from user space are particularly dangerous. They can be:

- Missing a null terminator entirely, running off into unmapped memory
- Arbitrarily long, overflowing any fixed-size kernel buffer
- Changed by another thread while the kernel is reading them
- Terminated right at a page boundary, with the next page unmapped
The kernel provides strncpy_from_user() for safe string copy:
```c
/* Safe string copy from user space */

/**
 * strncpy_from_user - Copy a string from user space
 * @dst:   Destination kernel buffer
 * @src:   Source user-space pointer
 * @count: Maximum bytes to copy (including null terminator)
 *
 * Returns:
 *  - Length of string (not including null) on success
 *  - -EFAULT on access error
 *  - If the string is longer than count, copies count-1 chars + null
 */
long strncpy_from_user(char *dst, const char __user *src, long count)
{
    long res;

    if (!access_ok(src, 1)) /* Check at least the first byte */
        return -EFAULT;

    /* Call the architecture-specific safe version */
    res = do_strncpy_from_user(dst, src, count);

    if (res >= count) {
        /* String was truncated - add null terminator */
        dst[count - 1] = '\0';
        return count - 1;
    }
    return res;
}

/* Example: Getting a pathname from user space */
char *getname(const char __user *filename)
{
    char *kname;
    int len;

    /* PATH_MAX is typically 4096 */
    kname = kmalloc(PATH_MAX, GFP_KERNEL);
    if (!kname)
        return ERR_PTR(-ENOMEM);

    /* Safely copy the path */
    len = strncpy_from_user(kname, filename, PATH_MAX);
    if (len < 0) {
        kfree(kname);
        return ERR_PTR(len); /* -EFAULT */
    }
    if (len == PATH_MAX - 1) {
        /* Possibly truncated - path too long */
        kfree(kname);
        return ERR_PTR(-ENAMETOOLONG);
    }
    return kname;
}

/* strnlen_user: Get the length of a user string without copying.
 * Returns the length including the null terminator, or 0 on error.
 *
 * Useful for pre-allocation:
 *   len = strnlen_user(user_str, MAX_LEN);
 *   if (len == 0) return -EFAULT;
 *   if (len > MAX_LEN) return -ENAMETOOLONG;
 *   buf = kmalloc(len, GFP_KERNEL);
 *   strncpy_from_user(buf, user_str, len);
 */
long strnlen_user(const char __user *str, long count);
```

The kernel's getname() function (and getname_flags()) handles all the complexity of copying paths from user space, including length limits, error handling, and audit logging. Most syscalls dealing with filenames use this shared infrastructure rather than calling strncpy_from_user() directly.
TOCTTOU (Time-of-Check-to-Time-of-Use) vulnerabilities occur when the system checks a condition, then later acts on it—but the condition changes between check and use.
Classic TOCTTOU attack:
access("/tmp/foo", W_OK) to check write permissionopen("/tmp/foo", O_WRONLY)123456789101112131415161718192021222324252627282930313233343536373839
```c
/* TOCTTOU-vulnerable pattern (in USER space) */
/* This is broken by design, not a kernel bug */

if (access(filename, W_OK) == 0) {
    /* Window of vulnerability between check and use:
     * the attacker can change the filesystem here! */
    int fd = open(filename, O_WRONLY); /* Might open a different file */
    write(fd, data, len);
}

/* TOCTTOU in KERNEL space - validation must be atomic with use */

/* VULNERABLE kernel code (conceptual) */
ssize_t vulnerable_read(char __user *buf, size_t count)
{
    /* Check that buf is valid */
    if (!access_ok(buf, count))
        return -EFAULT;

    /* Window: another thread can unmap buf here! */

    /* Use buf - might fault now! */
    for (size_t i = 0; i < count; i++)
        *buf++ = data[i]; /* CRASH! */
    return count;
}

/* CORRECT approach: use the fault-safe copy functions */
ssize_t correct_read(char __user *buf, size_t count)
{
    /* copy_to_user combines the check with the access: if a page
     * becomes invalid during the copy, the exception table catches
     * the fault and an error is returned instead of crashing.
     */
    if (copy_to_user(buf, kernel_data, count))
        return -EFAULT;
    return count;
}
```

How the kernel prevents TOCTTOU in syscalls:
```c
/* Double-fetch vulnerability example */

struct user_cmd {
    size_t size;
    char data[];
};

/* VULNERABLE: Double fetch from user memory */
int vuln_handler(struct user_cmd __user *ucmd)
{
    size_t size;
    char *kbuf;

    /* First fetch: get size */
    if (get_user(size, &ucmd->size))
        return -EFAULT;

    /* Allocate based on size */
    if (size > MAX_SIZE)
        return -EINVAL;
    kbuf = kmalloc(size, GFP_KERNEL);
    if (!kbuf)
        return -ENOMEM;

    /* VULNERABLE: Second fetch of size! */
    /* Attacker changes ucmd->size between fetches */
    if (copy_from_user(kbuf, ucmd->data, ucmd->size)) /* WRONG! */
        return -EFAULT;

    /* If the attacker raised size to a huge value: buffer overflow! */
    return 0;
}

/* CORRECT: Copy once, use the copy */
int safe_handler(struct user_cmd __user *ucmd)
{
    size_t size;
    char *kbuf;

    /* Fetch size exactly once */
    if (get_user(size, &ucmd->size))
        return -EFAULT;

    if (size > MAX_SIZE)
        return -EINVAL;
    kbuf = kmalloc(size, GFP_KERNEL);
    if (!kbuf)
        return -ENOMEM;

    /* Use the KERNEL copy of size */
    if (copy_from_user(kbuf, ucmd->data, size)) /* Correct! */
        return -EFAULT;

    /* size is in kernel memory - the attacker can't change it */
    return 0;
}
```

Any value read from user space must be copied to kernel memory ONCE, and only the kernel copy used thereafter. Reading user memory multiple times creates race conditions that attackers can exploit.
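Another way to honor the copy-once rule is to avoid per-field get_user() calls entirely: pull the whole fixed-size header across with one copy_from_user(), then read only the kernel copy. A sketch reusing struct user_cmd and MAX_SIZE from the example above (safe_handler_v2 is a hypothetical name):

```c
/* Copy the fixed-size header in one operation, then trust only
 * the kernel copy - there is no second fetch to race against */
int safe_handler_v2(struct user_cmd __user *ucmd)
{
    struct user_cmd hdr;
    char *kbuf;

    if (copy_from_user(&hdr, ucmd, sizeof(hdr)))
        return -EFAULT;

    if (hdr.size > MAX_SIZE)
        return -EINVAL;

    kbuf = kmalloc(hdr.size, GFP_KERNEL);
    if (!kbuf)
        return -ENOMEM;

    /* hdr.size lives in kernel memory; user space can't change it */
    if (copy_from_user(kbuf, ucmd->data, hdr.size)) {
        kfree(kbuf);
        return -EFAULT;
    }

    /* ... use kbuf ... */
    kfree(kbuf);
    return 0;
}
```

This pattern scales better as structs grow: one validated snapshot of the header, instead of one racy fetch per field.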
Size and count parameters can cause integer overflows that bypass bounds checks. The kernel uses careful arithmetic and dedicated overflow-safe functions:
Classic overflow attack:
```c
/* Integer overflow protection mechanisms */

/* VULNERABLE: Simple addition overflows */
ssize_t vuln_copy(void __user *buf, size_t offset, size_t count)
{
    /* This check is bypassed by overflow! */
    if (offset + count > buffer_size) /* WRONG */
        return -EINVAL;

    /* Attacker: offset = SIZE_MAX - 10, count = 100
     * offset + count wraps around to 89
     * 89 < buffer_size - the check passes!
     * But the actual access is at offset SIZE_MAX - 10
     */
}

/* CORRECT: Check for overflow explicitly */
ssize_t safe_copy(void __user *buf, size_t offset, size_t count)
{
    /* Check for addition overflow */
    if (count > SIZE_MAX - offset) /* Can't overflow */
        return -EINVAL;

    /* Now safe to add */
    if (offset + count > buffer_size)
        return -EINVAL;

    /* Proceed... */
}

/* Using compiler builtins (modern approach) */
ssize_t modern_copy(void __user *buf, size_t offset, size_t count)
{
    size_t end;

    /* __builtin_add_overflow returns true on overflow */
    if (__builtin_add_overflow(offset, count, &end))
        return -EOVERFLOW;

    if (end > buffer_size)
        return -EINVAL;

    /* Safe to proceed */
}

/* Kernel helper macros */
#include <linux/overflow.h>

/* check_add_overflow(a, b, d)  - true if a+b overflows, stores in *d */
/* check_mul_overflow(a, b, d)  - true if a*b overflows */
/* array_size(a, b)             - returns a*b, or SIZE_MAX on overflow */
/* struct_size(ptr, member, n)  - size of struct with n array elements */

/* Example: Allocating an array with overflow protection */
struct my_struct {
    int header;
    char data[];
};

struct my_struct *alloc_struct(size_t num_elements)
{
    struct my_struct *p;
    size_t size;

    /* Safe calculation of total size */
    size = struct_size(p, data, num_elements);
    if (size == SIZE_MAX)
        return NULL; /* Overflow detected */

    return kmalloc(size, GFP_KERNEL);
}
```

When allocating structures with flexible array members, always use struct_size(). It correctly calculates the size including the array and returns SIZE_MAX if the multiplication overflows. This single function prevents an entire class of vulnerabilities.
| Function | Operation | Overflow Behavior |
|---|---|---|
check_add_overflow(a, b, &d) | d = a + b | Returns true if overflow |
check_sub_overflow(a, b, &d) | d = a - b | Returns true if underflow |
check_mul_overflow(a, b, &d) | d = a * b | Returns true if overflow |
array_size(n, size) | n * size | Returns SIZE_MAX on overflow |
array3_size(a, b, c) | a * b * c | Returns SIZE_MAX on overflow |
struct_size(p, member, n) | sizeof(p) + nsizeof(member[0]) | Returns SIZE_MAX on overflow |
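The wraparound from the vulnerable example above is easy to reproduce in ordinary user-space C, using the same __builtin_add_overflow() that the kernel's check_add_overflow() wraps:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    size_t buffer_size = 4096;
    size_t offset = SIZE_MAX - 10; /* attacker-controlled */
    size_t count  = 100;           /* attacker-controlled */
    size_t end;

    /* Naive check: offset + count wraps around to 89, so it "passes" */
    printf("offset + count = %zu\n", offset + count);
    printf("naive check passes: %d\n", offset + count <= buffer_size);

    /* Overflow-aware check: the wrap is detected before any compare */
    if (__builtin_add_overflow(offset, count, &end))
        printf("overflow detected, request rejected\n");
    else
        printf("end = %zu, in bounds: %d\n", end, end <= buffer_size);
    return 0;
}
```

On a 64-bit system this prints offset + count = 89 and "naive check passes: 1", then rejects the same request once the addition is overflow-checked—exactly the difference between vuln_copy() and modern_copy() above.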
We've explored the critical security barrier between user space and the kernel—the parameter validation layer that protects the system from malicious or buggy input. Let's consolidate the key concepts:

- access_ok() verifies only that an address lies in the user range—nothing about mapping or permissions
- All data crosses the boundary through copy_from_user()/copy_to_user() or get_user()/put_user(), which survive page faults via the exception table
- Strings need bounded, fault-safe helpers such as strncpy_from_user()
- Every user-supplied value is fetched into kernel memory exactly once; only the kernel copy is trusted afterward
- Size arithmetic goes through overflow-checked helpers like check_add_overflow() and struct_size()
What's next:
Even with perfect parameter validation, syscalls can fail. The kernel must communicate errors to user space in a consistent, informative way. The final page in this module explores Error Handling—how the kernel signals errors, how errno propagates through wrappers, and how to debug syscall failures.
You now understand the parameter validation layer—the kernel's immune system against malicious input. This knowledge is essential for kernel development, security auditing, and understanding CVE reports about syscall vulnerabilities. Next, we'll examine how errors flow from kernel to user space.