System calls fail. Files don't exist. Permissions are denied. Resources run out. Networks disconnect. Hardware fails. A robust operating system must communicate these failures clearly and consistently to applications.
The error handling mechanism spans the entire syscall path:
This seemingly simple flow hides considerable complexity. What do the error codes mean? How are they chosen? How do POSIX-standard programs distinguish error types? What happens when multiple errors occur? This page answers these questions and equips you to debug the most cryptic syscall failures.
By the end of this page, you will understand the complete error handling flow—from kernel error code selection through glibc's errno mechanism to application-level error checking. You'll know how to interpret strace output, debug mysterious EFAULTs, and write robust error-handling code.
The Linux kernel uses a simple but effective convention for signaling errors:
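The rule: a syscall handler returns a non-negative value on success and a negated error code (-errno, in the range -4095 to -1) on failure.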
This convention allows the entire result and error to be communicated through a single register, avoiding the overhead of separate success/error channels.
```c
/* How kernel syscall handlers return errors (simplified) */
#include <linux/errno.h>   /* Error code definitions */

/* Example handler showing error returns */
SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
{
    struct filename *name;
    int fd;

    /* Copy filename from user space */
    name = getname(filename);
    if (IS_ERR(name)) {
        /* getname() failed - return its error */
        return PTR_ERR(name);   /* e.g., -EFAULT, -ENAMETOOLONG */
    }

    /* Look up the file */
    fd = do_sys_open(AT_FDCWD, name, flags, mode);
    putname(name);

    /* fd is either:
     *  - non-negative (valid file descriptor; 0 is valid), or
     *  - negative (error code like -ENOENT, -EACCES)
     */
    return fd;
}

/* The kernel's error pointer convention */
/* Some functions return pointers that might be errors */
#define MAX_ERRNO 4095
#define IS_ERR_VALUE(x) ((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)
#define IS_ERR(ptr)     IS_ERR_VALUE((unsigned long)(ptr))
#define PTR_ERR(ptr)    ((long)(ptr))            /* Extract error from pointer */
#define ERR_PTR(error)  ((void *)((long)(error))) /* Convert error to pointer */

/* Example: function returning pointer or error (pseudocode) */
struct file *filp_open(const char *filename, int flags, umode_t mode)
{
    struct file *f;

    f = do_filp_open(AT_FDCWD, getname_kernel(filename), ...);
    if (!f)
        return ERR_PTR(-ENOMEM);
    if (error_occurred)
        return ERR_PTR(-ENOENT);   /* -2 stored as pointer */
    return f;                      /* Valid pointer */
}

/* Caller usage */
struct file *f = filp_open("/etc/passwd", O_RDONLY, 0);
if (IS_ERR(f)) {
    int err = PTR_ERR(f);          /* Extract -ENOENT = -2 */
    printk("open failed: %d\n", err);
    return err;
}
```

The kernel uses ERR_PTR/IS_ERR for functions that return pointers. Since pointers in the range [-4095, -1] cannot be valid addresses (this range is unmapped on Linux), error codes can be embedded in the return value of pointer-returning functions. This trick unifies the error handling convention across integer and pointer returns.
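The error-pointer convention can be tried outside the kernel. The following is a minimal user-space sketch, not kernel code: `err_ptr`, `ptr_err`, `is_err`, and the `lookup` function with its `table` are hypothetical names introduced here for illustration, and the sketch assumes a platform where the top 4095 addresses are never valid object addresses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* User-space re-creation of the kernel macros, for illustration only. */
#define MAX_ERRNO 4095

/* Encode a negative error code (-4095..-1) as a pointer value. */
static inline void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* Recover the negative error code from an encoded pointer. */
static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if the pointer lies in the top 4095 values of the address space. */
static inline int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical lookup: returns a valid pointer or an encoded error. */
static int table[4] = { 10, 20, 30, 40 };

static int *lookup(int index)
{
    if (index < 0 || index >= 4)
        return err_ptr(-ENOENT);
    return &table[index];
}
```

A caller would test `is_err(p)` before dereferencing and use `ptr_err(p)` to get the negative code, mirroring the IS_ERR/PTR_ERR usage shown above.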
Why negative values?
The choice to use negative values for errors is both practical and elegant:
Linux defines over 130 distinct error codes, each describing a specific failure category. These are defined in <errno.h> headers and organized by subsystem:
Base errors (1-34): Core POSIX errors
These are the fundamental errors that appear in virtually every UNIX-like system:
| errno | Value | Meaning |
|---|---|---|
| EPERM | 1 | Operation not permitted - lacks privilege/capability |
| ENOENT | 2 | No such file or directory - path doesn't exist |
| ESRCH | 3 | No such process - PID not found |
| EINTR | 4 | Interrupted system call - signal received |
| EIO | 5 | I/O error - hardware or driver failure |
| ENXIO | 6 | No such device or address |
| E2BIG | 7 | Argument list too long (exec) |
| ENOEXEC | 8 | Exec format error - not a valid executable |
| EBADF | 9 | Bad file descriptor - fd not open |
| ECHILD | 10 | No child processes |
| EAGAIN/EWOULDBLOCK | 11 | Resource temporarily unavailable |
| ENOMEM | 12 | Out of memory - allocation failed |
| EACCES | 13 | Permission denied - file access check failed |
| EFAULT | 14 | Bad address - pointer outside address space |
| EBUSY | 16 | Device or resource busy |
| EEXIST | 17 | File exists - tried to create existing file |
| ENOTDIR | 20 | Not a directory - path component isn't dir |
| EISDIR | 21 | Is a directory - tried to write to dir |
| EINVAL | 22 | Invalid argument - parameter out of range |
| ENFILE | 23 | File table overflow (system limit) |
| EMFILE | 24 | Too many open files (process limit) |
| ENOTTY | 25 | Inappropriate ioctl for device |
| EFBIG | 27 | File too large |
| ENOSPC | 28 | No space left on device |
| ESPIPE | 29 | Illegal seek (on pipe/socket) |
| EROFS | 30 | Read-only file system |
| EPIPE | 32 | Broken pipe - reader closed |
| EDOM | 33 | Math argument out of domain |
| ERANGE | 34 | Math result not representable |
Extended errors (35-133): additional POSIX, network, and Linux-specific errors
| errno | Value | Category | Meaning |
|---|---|---|---|
| EDEADLK | 35 | Locking | Resource deadlock would occur |
| ENOLCK | 37 | Locking | No record locks available |
| ENOSYS | 38 | Syscall | Function not implemented |
| ENOTEMPTY | 39 | Filesystem | Directory not empty |
| ELOOP | 40 | Filesystem | Too many symbolic links |
| ENODATA | 61 | Streams | No data available |
| ETIME | 62 | Timeout | Timer expired |
| ENONET | 64 | Network | Machine not on network |
| EPROTO | 71 | Protocol | Protocol error |
| EOVERFLOW | 75 | Numeric | Value too large for type |
| ENOTSOCK | 88 | Socket | Not a socket |
| EPROTOTYPE | 91 | Socket | Wrong protocol type |
| ECONNREFUSED | 111 | Network | Connection refused |
| EHOSTUNREACH | 113 | Network | No route to host |
| EALREADY | 114 | Network | Operation already in progress |
| EINPROGRESS | 115 | Network | Operation now in progress |
While errno values are numeric, strerror(errno) returns a human-readable description. Always log both: 'open failed: ENOENT (2): No such file or directory'. The number helps with cross-reference, the message helps human readers.
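A small helper can produce that log format. This is a sketch under stated assumptions: `errno_name` and `format_error` are hypothetical helpers introduced here, and the name table covers only a handful of codes.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: symbolic names for a few common codes. */
static const char *errno_name(int err)
{
    switch (err) {
    case ENOENT: return "ENOENT";
    case EACCES: return "EACCES";
    case EINTR:  return "EINTR";
    case EBADF:  return "EBADF";
    default:     return "E?";   /* Not in this small table */
    }
}

/* Format "<what> failed: NAME (number): description" into out.
 * Returns the snprintf() result (length the full message needs). */
static int format_error(char *out, size_t size, const char *what, int err)
{
    assert(out != NULL && size > 0);
    return snprintf(out, size, "%s failed: %s (%d): %s",
                    what, errno_name(err), err, strerror(err));
}
```

Passing a saved errno value rather than reading errno inside the helper keeps the message accurate even if an earlier logging call changed errno.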
The kernel returns errors as negative values in a register. The C library wrapper translates this to the POSIX-standard convention: on failure the wrapper returns -1 and stores the positive error code in errno.
This two-part convention allows programs to distinguish "returned -1 successfully" (for syscalls where -1 is valid) from "failed with error".
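The translation step can be sketched as a plain function. This is an illustration of the convention, not glibc's actual source; `translate_raw_result` is a hypothetical name, and the `raw` argument stands in for the value the kernel left in the return register.

```c
#include <assert.h>
#include <errno.h>

#define MAX_ERRNO 4095

/* Hypothetical helper mirroring what a libc wrapper does with the
 * raw register value returned by the kernel. */
static long translate_raw_result(long raw)
{
    /* Values in [-4095, -1] are errors; as unsigned they are the
     * largest 4095 values, so one comparison covers the range. */
    if ((unsigned long)raw >= (unsigned long)-MAX_ERRNO) {
        errno = (int)-raw;   /* e.g. -(-2) = 2 = ENOENT */
        return -1;
    }
    return raw;              /* Success: errno is left untouched */
}
```

Note that the success path does not modify errno, which is why errno is only meaningful after a call has reported failure.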
```c
/* glibc's errno handling (conceptual) */

/* errno is thread-local storage. Conceptually:
 *
 *     extern __thread int errno;
 *
 * In practice it is accessed through a function that returns the
 * calling thread's slot: */
extern int *__errno_location(void);
#define errno (*__errno_location())

/* Wrapper translation (syscall3 is a placeholder for the raw trap) */
ssize_t read(int fd, void *buf, size_t count)
{
    long ret;

    /* Make the syscall */
    ret = syscall3(__NR_read, fd, (long)buf, count);

    /* Check for error: range [-4095, -1] */
    if ((unsigned long)ret >= -4095UL) {
        /* Convert to C convention */
        errno = -ret;   /* Negate: -(-2) = 2 = ENOENT */
        return -1;      /* Always return -1 on error */
    }

    /* Success - return the actual value; errno is left unchanged */
    return ret;
}

/* Why errno must be thread-local (since POSIX.1c) */
/*
 * Thread A calls read()  → fails → sets errno = ENOENT
 * Thread B calls write() → fails → would overwrite A's errno!
 *
 * Solution: each thread has its own errno storage
 *
 *     __thread int errno;   // Thread-local in modern C
 *
 * or
 *
 *     int *__errno_location(void) {
 *         return &current_thread->errno;
 *     }
 */

/* Correct error checking pattern */
void example_correct(void)
{
    int fd = open("/etc/passwd", O_RDONLY);
    if (fd == -1) {
        /* Check errno IMMEDIATELY - before any other calls! */
        int saved_errno = errno;

        /* Now safe to make other calls (logging, etc.) */
        perror("open failed");   /* Uses errno internally */

        /* Or handle specific errors */
        switch (saved_errno) {
        case ENOENT:
            fprintf(stderr, "File not found\n");
            break;
        case EACCES:
            fprintf(stderr, "Permission denied\n");
            break;
        default:
            fprintf(stderr, "Error: %s\n", strerror(saved_errno));
        }
    }
}
```

errno can be overwritten by ANY subsequent system call or library function. Successful calls never reset it to zero, so it is meaningful only right after a call has reported failure. If you need to check errno, save it to a local variable IMMEDIATELY after the failing call returns. Even printf() can modify errno. The pattern `if (ret == -1) { int err = errno; ... }` is essential.
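The overwrite hazard is easy to reproduce. In this sketch, `capture_errno` is a hypothetical helper introduced for illustration; it uses `close(-1)` as a stand-in for any unrelated call that happens to fail between the original failure and the errno check.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns the errno captured right after open() fails (0 if open
 * succeeds). *later receives the value errno holds after an
 * unrelated failing call has run. */
static int capture_errno(const char *path, int *later)
{
    assert(later != NULL);

    int fd = open(path, O_RDONLY);
    if (fd != -1) {
        close(fd);
        *later = 0;
        return 0;
    }

    int saved = errno;   /* Capture immediately */
    close(-1);           /* Unrelated call that fails with EBADF */
    *later = errno;      /* No longer describes the open() failure */
    return saved;
}
```

For a path that does not exist, the returned value is ENOENT while `*later` holds EBADF, which is the value a late errno check would have reported.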
What happens if errno is not checked?
Many programmers ignore return values (and therefore errno). This leads to:
While error codes are standardized, different syscalls return different subsets of errors with different meanings. Understanding syscall-specific error semantics is essential for robust programming.
open() errors:
| errno | Meaning in Context of open() |
|---|---|
| EACCES | Permission denied (file exists but can't be accessed) |
| ENOENT | File doesn't exist (and O_CREAT not specified) |
| EEXIST | O_CREAT \| O_EXCL specified, but file exists |
| EISDIR | Tried to open directory for writing |
| EMFILE | Process file descriptor limit reached |
| ENFILE | System-wide file descriptor limit reached |
| ENOMEM | Kernel couldn't allocate inode/dentry structures |
| ENOSPC | O_CREAT specified, filesystem full |
| ELOOP | Too many symbolic link levels |
| ENAMETOOLONG | Path exceeds PATH_MAX |
| ENOTDIR | A component of path is not a directory |
| EROFS | Tried to open a file on a read-only filesystem for writing |
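A few of these cases can be observed directly. The sketch below assumes a Linux system; `try_open` is a hypothetical helper that reports which errno value open() produced.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 0 if open() succeeds, or the errno value it reported. */
static int try_open(const char *path, int flags, mode_t mode)
{
    assert(path != NULL);

    int fd = open(path, flags, mode);
    if (fd == -1)
        return errno;   /* Read errno before any other call */
    close(fd);
    return 0;
}
```

For example, `try_open("/", O_WRONLY, 0)` reports EISDIR because the root directory cannot be opened for writing, and opening a path that does not exist without O_CREAT reports ENOENT.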
read() errors:
| errno | Meaning in Context of read() |
|---|---|
| EBADF | fd is not valid or not open for reading |
| EFAULT | buf points outside accessible address space |
| EINTR | Read was interrupted by signal before data |
| EINVAL | fd refers to object unsuitable for reading |
| EIO | I/O error (hardware failure) |
| EISDIR | fd refers to a directory |
| EAGAIN/EWOULDBLOCK | Non-blocking fd, no data available |
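Two of the rows above, EBADF and EAGAIN, can be reproduced with a pipe. This is a sketch assuming Linux behavior; `read_errno` and `make_nonblocking_pipe` are hypothetical helpers introduced here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Tries to read one byte. Returns 0 on success or EOF, otherwise
 * the errno value reported by read(). */
static int read_errno(int fd)
{
    char byte;
    ssize_t n = read(fd, &byte, 1);
    return n == -1 ? errno : 0;
}

/* Creates a pipe whose read end (fds[0]) is non-blocking.
 * Returns 0 on success, -1 with errno set on failure. */
static int make_nonblocking_pipe(int fds[2])
{
    assert(fds != NULL);

    if (pipe(fds) == -1)
        return -1;

    int flags = fcntl(fds[0], F_GETFL);
    if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        int saved = errno;   /* close() below may overwrite errno */
        close(fds[0]);
        close(fds[1]);
        errno = saved;
        return -1;
    }
    return 0;
}
```

Reading the empty non-blocking read end reports EAGAIN, while reading the write end reports EBADF because that descriptor is not open for reading.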
```c
/* Robust error handling patterns */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Pattern 1: Retry on EINTR */
ssize_t safe_read(int fd, void *buf, size_t count)
{
    ssize_t ret;
    do {
        ret = read(fd, buf, count);
    } while (ret == -1 && errno == EINTR);
    return ret;
}

/* Pattern 2: Handle expected vs unexpected errors */
int open_with_fallback(const char *path, int flags)
{
    int fd = open(path, flags);
    if (fd == -1) {
        int err = errno;   /* Save before fprintf() can change it */
        switch (err) {
        case ENOENT:
            /* Expected - file might not exist */
            return -1;     /* Caller handles this */
        case EACCES:
        case EPERM:
            /* Expected - permission issue */
            fprintf(stderr, "Permission denied: %s\n", path);
            return -1;
        case EMFILE:
        case ENFILE:
            /* Resource exhaustion - maybe recoverable */
            fprintf(stderr, "Too many open files\n");
            /* Could try to close some files and retry */
            return -1;
        default:
            /* Unexpected error - serious problem */
            fprintf(stderr, "Unexpected error opening %s: %s\n",
                    path, strerror(err));
            abort();       /* Or handle more gracefully */
        }
    }
    return fd;
}

/* Pattern 3: Write with proper short-write handling */
ssize_t full_write(int fd, const void *buf, size_t count)
{
    size_t written = 0;
    while (written < count) {
        ssize_t ret = write(fd, (const char *)buf + written, count - written);
        if (ret == -1) {
            if (errno == EINTR)
                continue;  /* Interrupted, retry */
            return -1;     /* Real error */
        }
        if (ret == 0) {
            errno = EIO;   /* Unexpected zero write */
            return -1;
        }
        written += (size_t)ret;
    }
    return (ssize_t)written;
}
```

Many syscalls can be interrupted by signals, returning EINTR. Robust code should either retry (SA_RESTART helps but isn't universal) or properly handle the interruption. Never assume a syscall can't fail with EINTR.
When syscalls fail mysteriously, strace is your diagnostic microscope. It intercepts every syscall a process makes, showing arguments and return values:
Basic usage:
```shell
# Trace all syscalls of a command
$ strace ls /nonexistent
execve("/usr/bin/ls", ["ls", "/nonexistent"], ...) = 0
...many syscalls...
openat(AT_FDCWD, "/nonexistent", O_RDONLY|O_NOCTTY|O_DIRECTORY) = -1 ENOENT (No such file or directory)
write(2, "ls: cannot access '/nonexistent'"..., 44) = 44
exit_group(2) = ?

# Filter to specific syscall types
$ strace -e openat,read,write cat /etc/passwd
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 131072) = 1234
write(1, "root:x:0:0:root:/root:/bin/bash\n"..., 1234) = 1234
read(3, "", 131072) = 0
+++ exited with 0 +++

# Show timing information
$ strace -T ls /tmp
openat(AT_FDCWD, "/tmp", ...) = 3 <0.000024>
getdents64(3, ..., 32768) = 456 <0.000089>
                                 ^^^^^^^^ Time in syscall (seconds)

# Attach to a running process
$ strace -p 12345
Process 12345 attached
read(0, "hello\n", 1024) = 6
write(1, "hello\n", 6) = 6
^CProcess 12345 detached

# Save output to file
$ strace -o trace.log myprogram

# Show relative timestamps
$ strace -r ls
     0.000000 execve("/usr/bin/ls", ...) = 0
     0.000541 brk(NULL) = 0x...
     0.000012 access("/etc/ld.so.preload", R_OK) = -1 ENOENT
     ^^^^^^^^ Time since last syscall

# Follow forked children
$ strace -f ./parent_process
```

Reading strace output:
Each line follows the format:
syscall(arguments...) = return_value[ERRNO (description)]
For errors:
```shell
# Interpreting strace output

# Successful syscalls show the return value:
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
                                            ^ File descriptor 3 returned

read(3, "root:x:0:0:root...", 4096) = 1234
                                      ^ 1234 bytes read

# Failed syscalls show -1, errno, and description:
openat(AT_FDCWD, "/nonexistent", O_RDONLY) = -1 ENOENT (No such file or directory)
                                             ^^ ^^^^^^
                                         Return Errno name

access("/etc/shadow", R_OK) = -1 EACCES (Permission denied)
write(5, "data", 4) = -1 EBADF (Bad file descriptor)
mmap(NULL, 1000000000, ...) = -1 ENOMEM (Cannot allocate memory)

# Common patterns to look for:
# 1. File not found early in startup → config file missing
# 2. Permission denied on /etc/shadow → needs root
# 3. EMFILE or ENFILE → fd leak in application
# 4. EFAULT → bug: passing invalid pointer
# 5. EINVAL → bug: passing invalid argument value
# 6. Many EINTRs → signal handling issues
```

strace has significant overhead (it uses ptrace and stops the process on each syscall). For production debugging, consider perf trace or BPF-based tools (bpftrace, bcc), which are much faster. strace is excellent for development and debugging but not for production profiling.
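When scanning a long trace log, it can help to extract the result field programmatically. The following is a rough sketch that assumes the `syscall(arguments...) = return_value ERRNO (description)` layout described above and decimal return values; `struct strace_result` and `parse_strace_result` are hypothetical names, not part of strace.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct strace_result {
    long value;          /* Number after " = " */
    char errno_name[32]; /* e.g. "ENOENT", or "" if none was printed */
};

/* Parses the "= value [ERRNO (description)]" tail of one strace line.
 * Returns 0 on success, -1 if no numeric result is found
 * (for example "exit_group(2) = ?"). */
static int parse_strace_result(const char *line, struct strace_result *out)
{
    assert(line != NULL && out != NULL);

    /* Use the last " = " so an equals sign inside a quoted
     * argument does not confuse the parser. */
    const char *eq = NULL;
    for (const char *p = strstr(line, " = "); p; p = strstr(p + 1, " = "))
        eq = p;
    if (!eq)
        return -1;

    out->errno_name[0] = '\0';
    int fields = sscanf(eq + 3, "%ld %31[A-Z0-9]",
                        &out->value, out->errno_name);
    return fields >= 1 ? 0 : -1;
}
```

Hexadecimal results such as mmap addresses are not handled by this sketch; a fuller parser would need to treat them separately.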
Syscall errors visible to user space are just the tip of the iceberg. The kernel has additional error reporting mechanisms for kernel developers and system administrators:
printk() and dmesg:
Kernel code uses printk() to log messages that appear in the kernel ring buffer (viewable via dmesg):
```c
/* Kernel error logging levels */

/* Log levels (include/linux/kern_levels.h) */
#define KERN_EMERG   "0"   /* System is unusable */
#define KERN_ALERT   "1"   /* Action must be taken immediately */
#define KERN_CRIT    "2"   /* Critical conditions */
#define KERN_ERR     "3"   /* Error conditions */
#define KERN_WARNING "4"   /* Warning conditions */
#define KERN_NOTICE  "5"   /* Normal but significant */
#define KERN_INFO    "6"   /* Informational */
#define KERN_DEBUG   "7"   /* Debug-level messages */

/* Usage in kernel code */
void some_driver_function(void)
{
    if (hardware_error()) {
        pr_err("Hardware failure detected on device %s\n", dev_name);
        /* pr_err = printk(KERN_ERR ...) */
    }

    if (suspicious_condition()) {
        pr_warn("Unusual state detected: flags=%x\n", flags);
    }

    pr_debug("Normal operation: processed %d items\n", count);
    /* Only appears if dynamic debug enabled for this file */
}
```

```shell
# Viewing kernel logs
$ dmesg | tail
[ 1234.567890] EXT4-fs (sda1): mounted filesystem with ordered data mode
[ 1234.678901] nvidia-uvm: Loaded the UVM driver
[ 1235.789012] usb 1-2: new high-speed USB device

$ dmesg --level=err,warn   # Only errors and warnings
[ 9876.543210] ata1.00: failed command: WRITE FPDMA QUEUED
[ 9876.654321] Buffer I/O error on dev sda1, sector 12345

# Real-time monitoring
$ dmesg -w   # Watch mode (like tail -f)
```

Kernel panics and oops:
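When the kernel hits an error it cannot handle normally (dereferencing NULL, stack corruption, etc.), it reports an "oops" and usually kills the offending task. If it cannot continue safely, it escalates to a "kernel panic" and halts the system: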
```shell
# Example kernel oops output

BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: my_driver_read+0x42/0x100 [my_module]
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 2 PID: 1234 Comm: userprogram Tainted: G OE 5.4.0-42-generic
RIP: 0010:my_driver_read+0x42/0x100 [my_module]
     ^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^
     CS   Function + offset

RSP: 0018:ffffc900012abc58 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff888012345678 RCX: 0000000000000000
^^^ NULL pointer was in RAX

Call Trace:
 vfs_read+0x91/0x140
 ksys_read+0x59/0xd0
 do_syscall_64+0x5b/0x1a0
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

# This call trace shows:
# 1. User called read() syscall
# 2. Kernel's do_syscall_64 handled it
# 3. Dispatched to ksys_read
# 4. Called vfs_read
# 5. Called my_driver_read
# 6. my_driver_read dereferenced NULL at offset 0x8
```

If a syscall returns EFAULT and you're certain the user pointer is valid, the kernel code might have a bug (dereferencing a user pointer directly instead of using copy_*_user()). Check dmesg for related messages. If reproducible, this may be a kernel bug worth reporting.
Robust error handling separates production-quality code from fragile prototypes. Here are the essential patterns:
Save errno with `int err = errno;` right after failure.

```c
/* Template for robust syscall error handling */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUFFER_SIZE 4096   /* Assumed size; pick what fits your data */

/* Application-specific processing, defined elsewhere */
int do_processing(const void *buf, ssize_t len);

/* Error wrapper that logs the failure and preserves errno */
#define SYSCALL_FAIL(call, ...) do {                         \
    int _saved_errno = errno;                                \
    fprintf(stderr, "%s:%d: " call " failed: %s (%d)\n",     \
            __FILE__, __LINE__, ##__VA_ARGS__,               \
            strerror(_saved_errno), _saved_errno);           \
    errno = _saved_errno;                                    \
} while (0)

/* Robust file processing template */
int process_file(const char *path)
{
    int fd = -1;
    void *buf = NULL;
    int ret = -1;

    /* Open with error handling */
    fd = open(path, O_RDONLY);
    if (fd == -1) {
        SYSCALL_FAIL("open(%s)", path);
        goto cleanup;   /* ret = -1 */
    }

    /* Allocate with error handling */
    buf = malloc(BUFFER_SIZE);
    if (!buf) {
        fprintf(stderr, "malloc failed\n");
        goto cleanup;
    }

    /* Read with EINTR handling */
    ssize_t n;
    do {
        n = read(fd, buf, BUFFER_SIZE);
    } while (n == -1 && errno == EINTR);

    if (n == -1) {
        SYSCALL_FAIL("read(fd=%d)", fd);
        goto cleanup;
    }

    /* Process data... */
    ret = do_processing(buf, n);

cleanup:
    /* Always clean up, check errors even here */
    free(buf);          /* free(NULL) is a no-op */
    if (fd != -1) {
        if (close(fd) == -1) {
            SYSCALL_FAIL("close(fd=%d)", fd);
            /* Still return previous ret - close error is secondary */
        }
    }
    return ret;
}

/* Error classification helper */
typedef enum {
    ERR_NONE,
    ERR_NOTFOUND,
    ERR_PERMISSION,
    ERR_RESOURCE,
    ERR_PERMANENT,
    ERR_TEMPORARY
} error_class_t;

error_class_t classify_error(int err)
{
    switch (err) {
    case 0:
        return ERR_NONE;
    case ENOENT:
        return ERR_NOTFOUND;
    case EACCES:
    case EPERM:
        return ERR_PERMISSION;
    case EMFILE:
    case ENFILE:
    case ENOMEM:
        return ERR_RESOURCE;
    case EAGAIN:
    case EINTR:
        return ERR_TEMPORARY;
    default:
        return ERR_PERMANENT;
    }
}
```

Classifying errors into categories (temporary, permanent, resource-related, etc.) helps you write consistent handling logic. Temporary errors might warrant retry; resource errors might need cleanup and retry; permanent errors should fail immediately.
We've traced the complete error handling path—from kernel detection through glibc translation to application handling. Let's consolidate the key concepts:
Module Complete:
You have now completed the System Call Implementation module. You understand the complete lifecycle of a system call:
This foundational knowledge enables you to debug system-level issues, write robust systems code, understand security vulnerabilities, and even contribute to kernel development.
Congratulations! You now have a deep understanding of system call implementation. You can trace a syscall from user C code through glibc wrappers, CPU mode transitions, kernel dispatch, and back—including all the error paths along the way. This knowledge is foundational for systems programming, kernel debugging, and security analysis.