Every programmer begins their journey with the traditional I/O model: open a file, read bytes into a buffer, process them, write bytes back, and close. This model is conceptually clean, universally understood, and taught in every introductory programming course. But beneath this simplicity lies a profound inefficiency that becomes painfully apparent at scale.
Consider what happens during a traditional read() call:
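The application calls read(fd, buffer, n), which traps into the kernel and triggers a context switch to kernel mode. If the requested data is not already in the page cache, the kernel reads it from disk into a kernel buffer. The kernel then copies the bytes from that kernel buffer into the user-space buffer and switches back to user mode.

This double-copy architecture, from disk to kernel buffer and then from kernel buffer to user buffer, was the accepted paradigm for decades. But what if we could eliminate that second copy entirely? What if your application could access file contents directly through memory operations, as if the file were simply an array in your address space?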
By the end of this page, you will have a comprehensive understanding of the mmap() system call. That covers its complete signature, every parameter's purpose, all flags and their interactions, return value semantics, error conditions, and the kernel mechanisms that make memory mapping possible. You'll understand not just how to use mmap(), but why each design decision was made.
The mmap() system call is the POSIX interface for creating memory mappings. Its signature reveals the rich flexibility of the mechanism:
```c
#include <sys/mman.h>

void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
```
At first glance, this six-parameter signature appears daunting. But each parameter addresses a specific aspect of the mapping contract between your application and the operating system. Let's dissect them systematically.
| Parameter | Type | Purpose | Common Values |
|---|---|---|---|
| addr | void * | Hint for where to place the mapping in virtual address space | NULL (let kernel choose), or specific address |
| length | size_t | Size of the mapping in bytes | File size, page-multiple sizes |
| prot | int | Memory protection flags (access permissions) | PROT_READ, PROT_WRITE, PROT_EXEC |
| flags | int | Mapping behavior flags | MAP_SHARED, MAP_PRIVATE, MAP_ANONYMOUS |
| fd | int | File descriptor of the file to map | Valid fd, or -1 for anonymous mappings |
| offset | off_t | Offset within the file where mapping begins | 0, or page-aligned offset |
Return Value Semantics:
On success, mmap() returns a pointer to the mapped region—the starting virtual address of the new mapping. On failure, it returns MAP_FAILED (which is (void *) -1) and sets errno to indicate the specific error.
Critical Implementation Detail: Unlike malloc() and other pointer-returning functions, mmap() does not signal failure with NULL. It returns MAP_FAILED. Always compare the result against MAP_FAILED, never against NULL. NULL is not used as the error value because address 0 could, in principle, be a valid mapping address (although modern kernels typically refuse to map page 0 for security reasons).
```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <filename>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    // Step 1: Open the file
    int fd = open(argv[1], O_RDONLY);
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    // Step 2: Get file size using fstat
    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        perror("fstat");
        close(fd);
        exit(EXIT_FAILURE);
    }

    // mmap() rejects a zero length with EINVAL, so handle empty files up front
    if (sb.st_size == 0) {
        close(fd);
        return 0;
    }

    // Step 3: Memory-map the file
    void *mapped = mmap(
        NULL,        // addr: let kernel choose the address
        sb.st_size,  // length: map entire file
        PROT_READ,   // prot: read-only access
        MAP_PRIVATE, // flags: changes won't affect file
        fd,          // fd: file descriptor to map
        0            // offset: start from beginning of file
    );
    if (mapped == MAP_FAILED) {
        perror("mmap");
        close(fd);
        exit(EXIT_FAILURE);
    }

    // Step 4: File descriptor can be closed after mapping
    // The mapping remains valid
    close(fd);

    // Step 5: Access file contents through memory
    // Write first 100 bytes (or less) to stdout
    write(STDOUT_FILENO, mapped, sb.st_size < 100 ? sb.st_size : 100);

    // Step 6: Unmap when done
    if (munmap(mapped, sb.st_size) == -1) {
        perror("munmap");
        exit(EXIT_FAILURE);
    }

    return 0;
}
```

The first parameter, addr, specifies where in your process's virtual address space you want the mapping to appear. This parameter's behavior depends critically on whether you include MAP_FIXED in the flags.
When addr is NULL:
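The kernel has complete freedom to choose an appropriate location. This is the most common and recommended usage. The kernel's memory manager will find a suitable hole in your virtual address space. It takes into account the regions already mapped, the process's address-space layout, and address-space layout randomization (ASLR), which makes the chosen address vary between runs.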
When addr is non-NULL without MAP_FIXED:
The address is treated as a hint. The kernel attempts to create the mapping at the specified address. If that location is unavailable (already mapped, or otherwise unsuitable), it chooses a different address. Your code must always use the returned pointer and never assume the hint was honored.
When addr is non-NULL with MAP_FIXED:
This is a hard requirement. The kernel must place the mapping exactly at the specified address, or fail entirely. If an existing mapping overlaps with the requested region, that existing mapping is silently unmapped—replaced by the new mapping. This behavior is intentional but dangerous.
Using MAP_FIXED with an arbitrary address can corrupt your process's address space. If you accidentally specify an address that overlaps your stack, heap, or code segment, mmap() will silently replace those critical regions with your new mapping, causing immediate and mysterious crashes. Use MAP_FIXED only when you have precise knowledge of your address space layout—typically in specialized scenarios like implementing custom allocators or runtime loaders.
Modern Alternative: MAP_FIXED_NOREPLACE (Linux 4.17+):
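Recognizing the dangers of MAP_FIXED, Linux introduced MAP_FIXED_NOREPLACE. This flag requests exact address placement but fails with EEXIST rather than replacing an existing mapping. Kernels older than 4.17 do not recognize the flag and treat addr as an ordinary hint, so also check that the returned address is the one you asked for: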
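```c
#define _GNU_SOURCE   // exposes MAP_FIXED_NOREPLACE on glibc
#include <sys/mman.h>
#include <errno.h>
#include <stddef.h>

// Safe fixed-address mapping: fails instead of replacing an existing mapping
void *map_exact(void *desired_addr, size_t length) {
    void *ptr = mmap(desired_addr, length,
                     PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                     -1, 0);
    if (ptr == MAP_FAILED) {
        if (errno == EEXIST) {
            // Address already mapped - handle gracefully
        }
        return NULL;
    }
    if (ptr != desired_addr) {
        // Pre-4.17 kernel ignored the flag and treated addr as a hint
        munmap(ptr, length);
        errno = EEXIST;   // report it like an occupied address
        return NULL;
    }
    return ptr;
}
```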
Page Alignment Requirements:
With MAP_FIXED or MAP_FIXED_NOREPLACE, addr must be page-aligned. Otherwise mmap() fails with EINVAL. When addr is only a hint, Linux instead picks a nearby page boundary. An unaligned hint is therefore not an error, but it cannot be honored exactly. For explicit control, calculate page-aligned addresses yourself:
```c
#include <stdint.h>  // uintptr_t
#include <unistd.h>  // sysconf()
long page_size = sysconf(_SC_PAGESIZE); // 4096 on most systems
void *aligned_addr = (void *)((uintptr_t)desired_addr & ~(uintptr_t)(page_size - 1));
```
The length parameter specifies the size of the mapping in bytes. The relationship between length and page boundaries is subtle but critical for correct usage.
Automatic Page Rounding:
Internally, the kernel rounds the length up to the nearest page boundary. If you request a 100-byte mapping, you'll actually get at least one full page (typically 4096 bytes). The tail of that page is part of the mapping, and accessing it will not fault. However, it holds none of your data: for a file mapping, bytes past the end of the file read as zero. A munmap() with the same length releases the whole page. Portable code should treat only the first length bytes as valid.
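The rounding rule can be made concrete with two small helpers. This is a sketch that assumes the page size is a power of two, which is true on mainstream platforms. The names round_up_to_page() and is_page_aligned() are illustrative, not part of any API.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Round a byte count up to a whole number of pages, which is what the kernel
 * does with mmap()'s length. Assumes the page size is a power of two. */
size_t round_up_to_page(size_t length) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (length + page - 1) & ~(page - 1);
}

/* Return 1 if p lies exactly on a page boundary, as every mmap() result does. */
int is_page_aligned(const void *p) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return ((uintptr_t)p & (page - 1)) == 0;
}
```

With 4096-byte pages, round_up_to_page(100) is 4096 and round_up_to_page(4097) is 8192. A 100-byte request therefore occupies one full page of address space.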
Relationship to File Size:
When mapping files, the interplay between length and file size creates several scenarios:
| Scenario | Behavior | Implications |
|---|---|---|
| length == file_size | Perfect fit—entire file mapped | Optimal case, common usage pattern |
| length < file_size | Partial mapping—only specified bytes mapped | Valid for processing file sections |
| length > file_size | Mapped region extends beyond EOF | Bytes past EOF in the page containing EOF read as zero; touching a page that lies entirely beyond EOF causes SIGBUS |
| length = 0 | Mapping fails with EINVAL | Zero-length mappings are prohibited |
The SIGBUS Trap:
One of the most confusing aspects of memory-mapped files is the SIGBUS signal. When you access a page of the mapping that lies entirely beyond the end of the underlying file, the kernel delivers SIGBUS (Bus Error), not SIGSEGV (Segmentation Fault).
Consider this dangerous pattern:
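```c
int fd = open("small_file.txt", O_RDONLY);    // File is 100 bytes
long page_size = sysconf(_SC_PAGESIZE);       // Typically 4096
char *map = mmap(NULL, 2 * page_size, PROT_READ, MAP_PRIVATE, fd, 0);

// This works fine: byte 0 is inside the file
char first_byte = map[0];

// This also works and reads 0: byte 1000 is past EOF but in the
// same page as EOF, which the kernel zero-fills
char byte_1000 = map[1000];

// This causes SIGBUS, not SIGSEGV: the second page lies entirely beyond EOF
char next_page = map[page_size];
```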
Why SIGBUS Instead of SIGSEGV?
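The distinction is architecturally significant. SIGSEGV means the address is not part of any mapping, or the access violates the mapping's protection (for example, writing to a PROT_READ page). SIGBUS means the address is validly mapped, but the kernel cannot supply data for that page from the backing object.

When a whole page lies beyond the end of the file, there is no file data to back it. The page belongs to the mapping, but nothing can be loaded into it, so the access raises SIGBUS.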
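To observe the difference directly, the sketch below installs a temporary SIGBUS handler and uses sigsetjmp() and siglongjmp() to recover from the faulting read. Jumping out of a signal handler this way is a testing trick, not something production code should rely on. read_raises_sigbus() is a hypothetical helper name.

```c
#define _DEFAULT_SOURCE   /* enable POSIX and BSD declarations under strict -std modes */
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf bus_env;

/* SIGBUS handler: jump back to the sigsetjmp() point in read_raises_sigbus(). */
static void on_sigbus(int sig) {
    (void)sig;
    siglongjmp(bus_env, 1);
}

/* Read one byte at p. Returns 1 if the read raised SIGBUS, 0 if it succeeded
 * (the byte is stored in *out), or -1 if the handler could not be installed.
 * The previous SIGBUS disposition is restored before returning. */
int read_raises_sigbus(const volatile char *p, char *out) {
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGBUS, &sa, &old) == -1)
        return -1;

    int faulted = 0;
    if (sigsetjmp(bus_env, 1) == 0)   /* 1: restore the signal mask on jump */
        *out = *p;                    /* may fault */
    else
        faulted = 1;                  /* reached via siglongjmp() */

    sigaction(SIGBUS, &old, NULL);
    return faulted;
}
```

For a 100-byte file mapped with a length of two pages, you would expect reads at offsets 0 and 1000 to succeed, with the latter returning zero, and a read at offset page_size to report SIGBUS.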
Always use fstat() to get the exact file size, and use that size (or less) as your length parameter. Never map beyond the file's current size unless you've explicitly extended the file (using ftruncate() or write operations) beforehand.
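Following that advice, one way to create a file of a known size and fill it through a mapping is to extend it with ftruncate() before calling mmap(). This is a sketch. write_file_via_mmap() is a hypothetical helper, and it assumes the whole file fits comfortably in the address space.

```c
#define _DEFAULT_SOURCE   /* enable POSIX and BSD declarations under strict -std modes */
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create or overwrite `path` with exactly `size` bytes copied from `src`,
 * writing through a shared mapping. Returns 0 on success, -1 on error. */
int write_file_via_mmap(const char *path, const void *src, size_t size) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644); /* O_RDWR: MAP_SHARED + PROT_WRITE */
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t)size) == -1) {  /* extend first so no page lies past EOF */
        close(fd);
        return -1;
    }
    if (size == 0) {                         /* mmap() rejects length 0 */
        close(fd);
        return 0;
    }
    void *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                               /* the mapping keeps the file referenced */
    if (map == MAP_FAILED)
        return -1;
    memcpy(map, src, size);                  /* ordinary stores update the file's pages */
    int rc = msync(map, size, MS_SYNC);      /* wait until the data reaches storage */
    munmap(map, size);
    return rc;
}
```

Because ftruncate() runs before mmap(), every mapped page is backed by file data, so no access within size bytes can raise SIGBUS.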
```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

void *safe_mmap_file(const char *path, size_t *out_size) {
    int fd = open(path, O_RDONLY);
    if (fd == -1) return NULL;

    struct stat sb;
    if (fstat(fd, &sb) == -1) {
        close(fd);
        return NULL;
    }

    // Handle empty files specially
    if (sb.st_size == 0) {
        close(fd);
        *out_size = 0;
        return NULL; // Can't map empty files
    }

    // Map exactly the file size - no more, no less
    void *map = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); // Safe to close after mapping

    if (map == MAP_FAILED) return NULL;

    *out_size = sb.st_size;
    return map;
}

// Usage:
// size_t size;
// void *data = safe_mmap_file("config.txt", &size);
// // Now 'size' contains exact number of accessible bytes
```

The prot parameter specifies the memory protection for the mapped region. These flags are combined using bitwise OR and determine what operations the CPU will permit on the mapped pages.
| Flag | Value (typical) | Meaning | CPU Enforcement |
|---|---|---|---|
| PROT_NONE | 0x0 | Pages cannot be accessed at all | Any access triggers SIGSEGV |
| PROT_READ | 0x1 | Pages can be read | Read operations permitted |
| PROT_WRITE | 0x2 | Pages can be written | Write operations permitted |
| PROT_EXEC | 0x4 | Pages can contain executable code | CPU can fetch instructions from these pages |
Common Combinations:
```c
PROT_READ                            // Read-only mapping (data files)
PROT_READ | PROT_WRITE               // Read-write mapping (modifiable data)
PROT_READ | PROT_EXEC                // Executable code (shared libraries)
PROT_READ | PROT_WRITE | PROT_EXEC   // JIT buffers; violates W^X, refused on some systems
PROT_NONE                            // Guard pages, reserved address space
```
Interaction with File Permissions:
The protection flags must be compatible with how the file was opened:
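- Every file mapping requires the file to be open for reading (O_RDONLY or O_RDWR). An O_WRONLY descriptor cannot be mapped, even with PROT_WRITE alone.
- PROT_WRITE with MAP_SHARED requires O_RDWR (writes go to the file).
- PROT_WRITE with MAP_PRIVATE only requires O_RDONLY (writes are private, copy-on-write).
- PROT_EXEC may have additional restrictions (see W^X below). On Linux, requesting it for a file on a filesystem mounted noexec fails with EPERM.

If you request protections incompatible with the file descriptor's mode, mmap() fails with EACCES.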
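These rules are easy to check empirically. The sketch below attempts a one-page mapping with a given prot and flags and reports the resulting errno. probe_mapping() is a hypothetical helper. On an O_RDONLY descriptor, you would expect MAP_SHARED with PROT_WRITE to fail with EACCES, while MAP_PRIVATE with PROT_WRITE succeeds.

```c
#define _DEFAULT_SOURCE   /* enable POSIX and BSD declarations under strict -std modes */
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Try a one-page mapping of fd with the given prot/flags.
 * Returns 0 on success (the mapping is released immediately), else errno. */
int probe_mapping(int fd, int prot, int flags) {
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, len, prot, flags, fd, 0);
    if (p == MAP_FAILED)
        return errno;
    munmap(p, len);
    return 0;
}
```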
Protection is enforced at page granularity by the CPU's Memory Management Unit (MMU). You cannot have different protections for different bytes within the same page. The page table entry contains protection bits that the MMU checks on every memory access. Violations trigger processor exceptions that the kernel converts to signals.
PROT_EXEC and W^X (Write XOR Execute) Policy:
Modern security practices enforce W^X: memory should be either writable OR executable, but never both simultaneously. This prevents code injection attacks where an attacker writes malicious code to a buffer and then executes it.
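Enforcement varies by platform. OpenBSD, for example, refuses mappings that are writable and executable at the same time by default. On Linux, security policies such as SELinux can deny them.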
For JIT Compilers:
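Just-In-Time compilers work around W^X in three steps. They map a buffer read-write, emit machine code into it, and then switch it to read-execute with mprotect() before calling into it. This way the pages are never writable and executable at the same time: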
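```c
// JIT compilation pattern (generate_machine_code is a placeholder for your code emitter)
void *code_buffer = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (code_buffer == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }

generate_machine_code(code_buffer);   // Write code bytes while the buffer is writable
// Required on architectures with separate instruction caches (e.g. ARM); harmless on x86
__builtin___clear_cache((char *)code_buffer, (char *)code_buffer + size);
if (mprotect(code_buffer, size, PROT_READ | PROT_EXEC) == -1) {   // Executable, no longer writable
    perror("mprotect"); exit(EXIT_FAILURE);
}

typedef int (*func_t)(void);
func_t jit_function = (func_t)code_buffer;
int result = jit_function();   // Execute generated code
```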
PROT_NONE: Guard Pages and Address Reservation:
Although pages mapped with PROT_NONE cannot be accessed at all, they have important uses:
Guard Pages: Protect against buffer overflows and stack overflow. Place PROT_NONE pages at the boundaries of buffers or around the stack. Any overflow that touches the guard page immediately triggers SIGSEGV, catching the bug early.
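```c
// Create a buffer with guard pages (usable size rounded up to whole pages,
// so an overflow past the last usable byte lands in the guard page)
size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
size_t usable = (buffer_size + page_size - 1) & ~(page_size - 1);
size_t protected_size = usable + 2 * page_size;
void *region = mmap(NULL, protected_size,
                    PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (region == MAP_FAILED) { perror("mmap"); exit(EXIT_FAILURE); }

// Make middle portion accessible; the first and last pages stay PROT_NONE
void *buffer = (char *)region + page_size;
if (mprotect(buffer, usable, PROT_READ | PROT_WRITE) == -1) {
    perror("mprotect"); exit(EXIT_FAILURE);
}
// Now 'buffer' is guarded on both ends
```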
Address Reservation: Reserve a large contiguous virtual address region without consuming physical memory. Later, use mprotect() to make portions accessible as needed. This technique is used by custom allocators and garbage collectors to ensure memory growth doesn't require copying data to new locations.
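A minimal sketch of the reserve-then-commit pattern, assuming anonymous private memory. reserve_region() and commit_region() are hypothetical names. The reservation is mapped PROT_NONE, and mprotect() opens up a growing prefix of it as needed.

```c
#define _DEFAULT_SOURCE   /* enable POSIX and BSD declarations under strict -std modes */
#include <stddef.h>
#include <sys/mman.h>

/* Reserve `reserve_bytes` of contiguous address space with no access rights.
 * Nothing is readable or writable yet. Returns NULL on failure. */
void *reserve_region(size_t reserve_bytes) {
    void *base = mmap(NULL, reserve_bytes, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

/* Make the first `commit_bytes` of a reservation readable and writable.
 * The usable region grows in place, so existing pointers stay valid. */
int commit_region(void *base, size_t commit_bytes) {
    return mprotect(base, commit_bytes, PROT_READ | PROT_WRITE);
}
```

Growing from two committed pages to eight leaves data already written at the same addresses, which is the property allocators and garbage collectors rely on.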
The flags parameter controls the fundamental characteristics of the mapping—whether changes are visible to other processes, whether the mapping backs a file or anonymous memory, and various behavioral modifiers. This is the most complex parameter, with flags organized into categories:
Mandatory Flags (exactly one required):
| Flag | Description | Write Behavior | Visibility |
|---|---|---|---|
| MAP_SHARED | Changes shared with other mappings and written to file | Writes update the file (eventually) | Other processes see changes |
| MAP_PRIVATE | Copy-on-write private mapping | Writes create private copies | Changes invisible to others |
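You must specify exactly one of MAP_SHARED or MAP_PRIVATE. If you omit both, Linux fails the call with EINVAL. Do not OR them together either. On Linux 4.15 and later, MAP_SHARED | MAP_PRIVATE has the same bit pattern as MAP_SHARED_VALIDATE, which behaves like MAP_SHARED while rejecting unknown flags. As a result, the mistake may go unnoticed rather than producing an error.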
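The difference between the two flags shows up as soon as you write. This sketch maps the start of a file with either flag, writes one byte through the mapping, and then reads the file back with pread(). The helper name is hypothetical, and it assumes fd is open O_RDWR on a file at least one byte long. With MAP_SHARED, the file reflects the write at once through the page cache. With MAP_PRIVATE, it does not.

```c
#define _DEFAULT_SOURCE   /* enable POSIX and BSD declarations under strict -std modes */
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `value` to byte 0 through a MAP_SHARED (use_shared = 1) or
 * MAP_PRIVATE (use_shared = 0) mapping, then return byte 0 of the file
 * itself as read by pread(). Returns -1 on error. */
int byte_in_file_after_write(int fd, int use_shared, char value) {
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    int flags = use_shared ? MAP_SHARED : MAP_PRIVATE;
    char *map = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (map == MAP_FAILED)
        return -1;
    map[0] = value;                          /* write through the mapping */
    unsigned char on_disk;
    ssize_t n = pread(fd, &on_disk, 1, 0);   /* read the file at offset 0 */
    munmap(map, len);
    return n == 1 ? on_disk : -1;
}
```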
Optional Modifier Flags:
| Flag | Description | Typical Use Case |
|---|---|---|
| MAP_ANONYMOUS | Mapping not backed by any file | Allocating memory, shared memory without files |
| MAP_FIXED | Interpret addr exactly, not as hint | Replacing existing mappings, custom loaders |
| MAP_FIXED_NOREPLACE | Like MAP_FIXED but fail if address in use | Safe fixed placement (Linux 4.17+) |
| MAP_NORESERVE | Don't reserve swap space upfront | Sparse mappings, overcommit-aware allocation |
| MAP_POPULATE | Pre-fault all pages immediately | Avoid page faults on first access |
| MAP_LOCKED | Lock pages in RAM (like mlock) | Real-time applications, cryptographic keys |
| MAP_HUGETLB | Use huge pages for this mapping | Large working sets, TLB efficiency |
| MAP_STACK | Hint that this is stack memory | Stack allocation (used by threading libraries) |
MAP_ANONYMOUS Deep Dive:
Anonymous mappings don't correspond to any file. Instead, they're backed by the system's swap space (if any) and are initialized to zero. This is actually how most modern memory allocators (malloc implementations) obtain memory from the operating system for large allocations.
```c
// Anonymous private mapping - like malloc() for large blocks
void *heap_block = mmap(NULL, 1024 * 1024,
                        PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS,
                        -1,  // fd must be -1 (or ignored, depending on implementation)
                        0    // offset must be 0 (or ignored)
);
```
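Anonymous mmap has several advantages over malloc for large allocations. The memory is page-aligned and guaranteed to be zero-filled. Physical pages are allocated lazily on first touch. And munmap() returns the memory to the operating system immediately, instead of leaving it in the allocator's free lists.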
When you know you'll access the entire mapping immediately, use MAP_POPULATE (Linux-specific). For a file mapping, this causes the kernel to read all file pages into memory during the mmap() call itself, rather than faulting them in one-by-one on first access. For an anonymous mapping, it allocates the pages up front. The mmap() call takes longer, but subsequent accesses become predictable, with no page faults interrupting your computation.
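Those properties make anonymous mmap() attractive as the backing for large allocations. The sketch below is a hypothetical big_alloc()/big_free() pair, not how any particular malloc is implemented. A small header stores the mapping length so the block can be unmapped later, and MAP_POPULATE is used only where the platform defines it.

```c
#define _DEFAULT_SOURCE   /* enable POSIX and BSD declarations under strict -std modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Header placed in front of each block. The max_align_t member keeps the
 * user pointer suitably aligned for any type. */
typedef union {
    size_t map_len;
    max_align_t align;
} BlockHeader;

void *big_alloc(size_t size, int prefault) {
    if (size > SIZE_MAX - sizeof(BlockHeader))
        return NULL;                            /* size + header would overflow */
    size_t total = size + sizeof(BlockHeader);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_POPULATE
    if (prefault)
        flags |= MAP_POPULATE;                  /* Linux: allocate pages up front */
#else
    (void)prefault;                             /* no prefaulting on this platform */
#endif
    BlockHeader *h = mmap(NULL, total, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (h == MAP_FAILED)
        return NULL;
    h->map_len = total;                         /* remember the length for munmap() */
    return h + 1;                               /* zero-filled memory after the header */
}

void big_free(void *ptr) {
    if (ptr == NULL)
        return;
    BlockHeader *h = (BlockHeader *)ptr - 1;
    munmap(h, h->map_len);                      /* pages go straight back to the kernel */
}
```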
The fd (file descriptor) and offset parameters together specify which file and which portion of that file to map.
The File Descriptor:
For file-backed mappings, fd must be an open file descriptor with appropriate access rights:
```c
int rw_fd = open("data.bin", O_RDWR);    // Required for MAP_SHARED with PROT_WRITE
int ro_fd = open("data.bin", O_RDONLY);  // Sufficient for MAP_PRIVATE even with PROT_WRITE
```
File Descriptor Lifecycle:
A commonly misunderstood point: the file descriptor can be closed immediately after mmap() returns. The mapping maintains its own reference to the underlying file (technically, to the struct file in kernel space). This means:
```c
int fd = open("file.txt", O_RDONLY);
void *map = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);  // Perfectly safe! Mapping remains valid.

// Continue using 'map' for as long as needed.
// File data is not NUL-terminated, so write exactly 'size' bytes
// rather than treating the mapping as a C string.
fwrite(map, 1, size, stdout);

munmap(map, size);  // Clean up when done
```
This behavior enables cleaner code—you don't need to track file descriptors for the lifetime of mappings.
For Anonymous Mappings:
When using MAP_ANONYMOUS, the fd parameter should be -1. Some older Unix systems require fd to be -1; others ignore it. For maximum portability, always pass -1.
The offset Parameter:
The offset specifies where within the file the mapping should begin. Two points matter:
Must be page-aligned: The offset must be a multiple of the system page size, as reported by sysconf(_SC_PAGESIZE). Non-page-aligned offsets cause mmap() to fail with EINVAL.
Defines the window, not the file position: The mapping covers the file bytes starting at offset and extending for length bytes. It does not move the descriptor's file position used by read(), write(), and lseek().
```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

// Map a section of a file, starting at a given offset
// Returns NULL on failure
void *map_file_section(const char *path, off_t offset, size_t length) {
    long page_size = sysconf(_SC_PAGESIZE);

    // Calculate page-aligned offset
    off_t aligned_offset = offset & ~(page_size - 1);
    size_t offset_diff = offset - aligned_offset;

    // Adjust length to include bytes before the requested offset
    size_t adjusted_length = length + offset_diff;

    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        perror("open");
        return NULL;
    }

    void *map = mmap(NULL, adjusted_length, PROT_READ, MAP_PRIVATE,
                     fd, aligned_offset);
    close(fd);

    if (map == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    // Return pointer to the actual requested offset
    // Caller must remember to unmap starting at (ptr - offset_diff)
    // and using adjusted_length
    return (char *)map + offset_diff;
}

/*
 * Usage note: This function demonstrates the page-alignment
 * complexity. A production version would need to return
 * additional metadata for proper cleanup.
 *
 * Better approach: Create a struct containing:
 *  - Pointer to user-requested offset
 *  - Pointer to actual mapping start
 *  - Actual mapping length
 */
```

Mapping Beyond File End:
If offset + length exceeds the file size, the outcome depends on how far past the end of file (EOF) you access. What counts is the file size at the time of the access, so a file truncated after mapping can turn previously valid pages into SIGBUS pages:
Within the page containing EOF: The kernel zero-fills the bytes between EOF and the page boundary. Reading them is safe and returns zero. Writes are permitted, but those bytes are never written back to the file, whether the mapping is MAP_PRIVATE or MAP_SHARED.
Pages entirely beyond EOF: Accessing these addresses triggers SIGBUS.
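The usage note at the end of map_file_section() suggests returning a struct that records both the user-visible pointer and the real mapping. Below is a sketch of that approach. FileSection, map_section(), and unmap_section() are hypothetical names, and the sketch assumes a non-negative offset and a section that lies within the file.

```c
#define _DEFAULT_SOURCE   /* enable POSIX and BSD declarations under strict -std modes */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record of a section mapping: what the caller uses and what
 * munmap() needs. */
typedef struct {
    char  *data;       /* points at the byte the caller asked for */
    void  *map_start;  /* page-aligned address returned by mmap() */
    size_t map_len;    /* length actually mapped */
} FileSection;

/* Map `length` bytes of `path` starting at any `offset` >= 0.
 * Returns 0 on success, -1 on failure. */
int map_section(const char *path, off_t offset, size_t length, FileSection *out) {
    off_t page = (off_t)sysconf(_SC_PAGESIZE);
    off_t aligned = offset - (offset % page);      /* round down to a page boundary */
    size_t slack = (size_t)(offset - aligned);     /* bytes mapped before `offset` */

    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    void *map = mmap(NULL, length + slack, PROT_READ, MAP_PRIVATE, fd, aligned);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    out->map_start = map;
    out->map_len = length + slack;
    out->data = (char *)map + slack;
    return 0;
}

/* Release the whole mapping using the values mmap() actually saw. */
void unmap_section(FileSection *s) {
    munmap(s->map_start, s->map_len);
}
```

Keeping map_start and map_len next to data removes the caller's burden of recomputing the aligned start, which was the cleanup problem the original function left open.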
Windows Comparison:
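On Windows, the equivalent is CreateFileMapping() followed by MapViewOfFile(). The concepts are similar, but Windows splits the work into two steps: creating a file-mapping object, then mapping a view of it. The parameter semantics also differ. For example, a view's file offset must be a multiple of the system allocation granularity (typically 64 KB) rather than the page size. Cross-platform code must abstract these differences.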
When mmap() fails, it returns MAP_FAILED and sets errno to indicate the specific error. Understanding these errors is essential for robust code:
| errno | Cause | Resolution |
|---|---|---|
| EACCES | fd not open for reading, or protection flags incompatible with its open mode (e.g. MAP_SHARED + PROT_WRITE on an O_RDONLY fd) | Open file with correct mode (O_RDWR for MAP_SHARED writes) |
| EAGAIN | File is locked, or too much memory locked | Release locks, reduce mlock usage, increase RLIMIT_MEMLOCK |
| EBADF | fd is not a valid open file descriptor (and MAP_ANONYMOUS was not set) | Check open() succeeded, file not already closed |
| EINVAL | Invalid arguments (length=0, offset not aligned, etc.) | Validate all parameters before calling |
| ENFILE | System-wide limit on open files reached | System administration issue - reduce open files |
| ENODEV | File system doesn't support mmap | Use traditional read/write instead |
| ENOMEM | Not enough memory, or address space limit reached | Reduce mapping size, free other mappings |
| EOVERFLOW | On 32-bit systems with 64-bit off_t: the page counts of offset and length together overflow unsigned long | Map a smaller window, or build as a 64-bit program |
| EPERM | Operation not permitted (PROT_EXEC on a noexec filesystem, blocked by a file seal, MAP_FIXED below vm.mmap_min_addr, etc.) | Check mount options and seals, avoid low fixed addresses |
```c
#include <sys/mman.h>
#include <sys/types.h>
#include <errno.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>   // sysconf()

typedef struct {
    void *address;          // Mapped address (NULL on failure)
    size_t length;          // Actual mapped length
    int error_code;         // errno value on failure
    const char *error_msg;  // Human-readable error
} MmapResult;

MmapResult safe_mmap(size_t length, int prot, int flags, int fd, off_t offset) {
    MmapResult result = {0};

    // Pre-validate obvious errors
    if (length == 0) {
        result.error_code = EINVAL;
        result.error_msg = "Length cannot be zero";
        return result;
    }

    // Check page alignment of offset
    long page_size = sysconf(_SC_PAGESIZE);
    if (offset % page_size != 0) {
        result.error_code = EINVAL;
        result.error_msg = "Offset must be page-aligned";
        return result;
    }

    void *map = mmap(NULL, length, prot, flags, fd, offset);

    if (map == MAP_FAILED) {
        result.error_code = errno;   // Save errno before any other call can change it

        // Provide detailed error messages
        switch (result.error_code) {
            case EACCES:
                result.error_msg = "Protection incompatible with file access mode";
                break;
            case ENOMEM:
                result.error_msg = "Insufficient memory or address space";
                break;
            case ENODEV:
                result.error_msg = "Filesystem does not support memory mapping";
                break;
            default:
                result.error_msg = strerror(result.error_code);
        }
        return result;
    }

    result.address = map;
    result.length = length;
    result.error_code = 0;
    result.error_msg = NULL;
    return result;
}

// Usage example:
// MmapResult r = safe_mmap(file_size, PROT_READ, MAP_PRIVATE, fd, 0);
// if (r.address == NULL) {
//     fprintf(stderr, "mmap failed: %s\n", r.error_msg);
// }
```

Understanding what happens inside the kernel during an mmap() call illuminates why certain behaviors exist and helps predict performance characteristics.
The Virtual Memory Area (VMA):
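When mmap() succeeds, the kernel creates a Virtual Memory Area (VMA) structure that describes the mapping. In Linux, this is struct vm_area_struct. Each VMA records:
- the start and end addresses of the range (vm_start, vm_end);
- its permissions and behavior flags (vm_flags, vm_page_prot);
- for file mappings, the backing file and page offset (vm_file, vm_pgoff);
- a table of operations (vm_ops) that the kernel calls to handle faults in the range.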
What mmap() Does NOT Do:
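Critically, mmap() typically does not allocate physical memory, read any file data from disk, or fill in page-table entries. Flags such as MAP_POPULATE are the exception.
mmap() merely creates the VMA, a promise that "if you access addresses in this range, the kernel knows what to do." The heavy lifting is deferred.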
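On Linux, each VMA appears as one line of /proc/self/maps, so you can watch mmap() and munmap() add and remove them. This is a Linux-only sketch. address_has_vma() is a hypothetical helper that scans the file for a range containing a given address.

```c
#define _DEFAULT_SOURCE   /* enable POSIX and BSD declarations under strict -std modes */
#include <stdint.h>
#include <stdio.h>

/* Linux-only: return 1 if `addr` lies inside a VMA listed in /proc/self/maps,
 * 0 if it does not, or -1 if the file cannot be opened. Each line begins with
 * "start-end" in hexadecimal, followed by permissions, offset, and so on. */
int address_has_vma(const void *addr) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;
    uintptr_t target = (uintptr_t)addr;
    unsigned long start, end;
    char line[8192];                  /* room for a line with a PATH_MAX-length path */
    int found = 0;
    while (fgets(line, sizeof line, maps) != NULL) {
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
            target >= start && target < end) {
            found = 1;
            break;
        }
    }
    fclose(maps);
    return found;
}
```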
Page Fault Handling:
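The deferred nature means that actual I/O happens during page faults. When the process first touches an address in the mapping, the MMU finds no valid page-table entry and raises a page fault. The kernel then:
1. Locates the VMA covering the faulting address and checks the access against its protections, delivering SIGSEGV if the access is not allowed.
2. For a file-backed page, looks the page up in the page cache and reads it from disk if it is not already there. For an anonymous page, it allocates a zero-filled page.
3. Installs the page-table entry and restarts the faulting instruction.

This demand-paging approach has three consequences. mmap() returns quickly regardless of the mapping's size. Pages that are never touched cost neither I/O nor RAM. And the first access to each page pays the cost of a fault.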
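Demand paging can be observed with mincore(), which reports which pages of a range are resident in RAM. This sketch assumes Linux, where mincore() takes an unsigned char vector (other systems declare it with char). resident_pages() is a hypothetical helper.

```c
#define _DEFAULT_SOURCE   /* enable POSIX and BSD declarations under strict -std modes */
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many of the `npages` pages starting at page-aligned `addr` are
 * resident in RAM. The low bit of each mincore() entry is set for a resident
 * page. Returns -1 on error. */
long resident_pages(void *addr, size_t npages) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *vec = malloc(npages);
    if (vec == NULL)
        return -1;
    if (mincore(addr, npages * page, vec) == -1) {
        free(vec);
        return -1;
    }
    long count = 0;
    for (size_t i = 0; i < npages; i++)
        count += vec[i] & 1;
    free(vec);
    return count;
}
```

For a fresh eight-page anonymous mapping, you would expect a count of 0. After writing one byte into each of two different pages, you would expect 2, assuming transparent huge pages do not enlarge the faults.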
Page Cache Sharing:
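The page cache is the key to mmap efficiency. When multiple processes map the same file, their mappings all point at the same physical pages in the page cache, so the file's data occupies RAM only once. With MAP_SHARED, a write by one process is immediately visible to the others. With MAP_PRIVATE, the pages stay shared until a process writes to one. At that point, copy-on-write gives that process its own private copy of the page.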
This is why shared libraries (.so/.dll files) are so efficient—the code pages exist once in physical memory, mapped into potentially thousands of processes.
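Within a single process, you can see the same effect by mapping one file twice with MAP_SHARED. The two views occupy different virtual addresses but resolve to the same page-cache page, so a write through one is immediately readable through the other. The sketch assumes fd is open O_RDWR on a non-empty file. shared_view_sees_write() is a hypothetical helper, and the two views stand in for two processes.

```c
#define _DEFAULT_SOURCE   /* enable POSIX and BSD declarations under strict -std modes */
#include <sys/mman.h>
#include <unistd.h>

/* Map the first page of the file twice with MAP_SHARED, write `value` through
 * the first view, and return byte 0 as seen through the second view.
 * Returns -1 on error. */
int shared_view_sees_write(int fd, char value) {
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *b = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    int result = -1;
    if (a != MAP_FAILED && b != MAP_FAILED) {
        a[0] = value;                  /* write through view A */
        result = (unsigned char)b[0];  /* view B maps the same page-cache page */
    }
    if (a != MAP_FAILED) munmap(a, len);
    if (b != MAP_FAILED) munmap(b, len);
    return result;
}
```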
The mmap() system call is fundamentally lightweight—it manipulates kernel data structures representing the mapping but defers all actual work (RAM allocation, disk I/O, page table modification) until the mapped addresses are accessed. This lazy approach is the cornerstone of efficient memory management in modern operating systems.
Every mmap() should eventually be matched with a munmap() to release the mapping. The signature is straightforward:
```c
#include <sys/mman.h>

int munmap(void *addr, size_t length);
```
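Parameters:
addr must be page-aligned and is normally the pointer that mmap() returned. length need not be a page multiple. Every page that contains part of the range from addr to addr + length is unmapped.
Return Value:
munmap() returns 0 on success. On failure it returns -1 and sets errno (for example, EINVAL for an unaligned addr).
Important Considerations:
- After munmap(), any access to the unmapped range raises SIGSEGV.
- Unmapping part of a mapping is allowed and splits it. It is also not an error if the range contains no mapped pages.
- munmap() does not guarantee that modifications to a MAP_SHARED mapping have reached the disk. Call msync() first when durability matters.
- Closing the file descriptor does not unmap anything, while process exit unmaps everything automatically.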
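Partial unmapping is worth seeing once. This sketch maps three anonymous pages and unmaps only the middle one. map_with_hole() is a hypothetical helper. The kernel splits the original mapping in two, and the outer pages remain usable.

```c
#define _DEFAULT_SOURCE   /* enable POSIX and BSD declarations under strict -std modes */
#include <sys/mman.h>
#include <unistd.h>

/* Map three pages, then unmap only the middle one, leaving two separate
 * mappings. Returns the base address (the caller later unmaps pages 0 and 2),
 * or NULL on error. */
char *map_with_hole(void) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *base = mmap(NULL, 3 * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (munmap(base + page, page) == -1) {   /* addr must be page-aligned */
        munmap(base, 3 * page);
        return NULL;
    }
    return base;
}
```

Touching base[page] afterwards would raise SIGSEGV, while base[0] and base[2 * page] still work. Calling munmap() on the hole again succeeds, because unmapping a range with no mapped pages is not an error.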
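```c
#include <sys/mman.h>
#include <stdio.h>

void cleanup_mapping(void *map, size_t length, int is_shared_writeable) {
    if (map == NULL || map == MAP_FAILED) {
        return; // Nothing to clean up
    }

    if (is_shared_writeable) {
        // Ensure changes are persisted to disk
        if (msync(map, length, MS_SYNC) == -1) {
            perror("msync failed - data may not be persisted");
            // Continue with munmap anyway
        }
    }

    if (munmap(map, length) == -1) {
        perror("munmap failed");
        // On failure, memory may leak until process exit
        // There's no good recovery option
    }
}

// Note: For mapping sizes retrieved from struct stat,
// length should be the exact value used in mmap().
// Common bug: using different length values for mmap/munmap.
```

We've examined the mmap() system call, the fundamental interface for memory-mapped files and anonymous memory allocation. The key points:
- Check the result against MAP_FAILED, never NULL.
- Let the kernel choose addr unless you truly need MAP_FIXED, and prefer MAP_FIXED_NOREPLACE when you do.
- Size file mappings with fstat(), and remember that whole pages past the end of the file raise SIGBUS.
- Pick MAP_SHARED or MAP_PRIVATE deliberately, and open the file with a compatible mode.
- Keep offset page-aligned.
- Pair every mmap() with a munmap() of the same length, calling msync() first when shared writes must be durable.

Underneath, mmap() only creates a VMA. RAM and I/O are committed lazily through page faults.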
What's Next:
With the mmap() interface mastered, we'll explore how memory-mapped files transform file access patterns. The next page examines the "File as Memory" paradigm—how treating files as byte arrays eliminates the read/write system call overhead and enables elegant solutions to complex problems.
You now have an expert-level understanding of the mmap() system call interface. You know what each parameter does, how they interact, what errors can occur, and what happens inside the kernel. This foundation is essential for the deeper exploration of memory-mapped I/O patterns that follows.