Every programmer learns file I/O through the lens of streams and buffers. You open a file, read chunks into application memory, process them, and write results back. This mental model—of files as external entities that must be explicitly fetched and stored—is so deeply ingrained that questioning it seems almost absurd.
But what if we simply... didn't have to do that?
Memory-mapped files offer a radical alternative: treat the file as if it were already in memory. No read() calls. No write() calls. No explicit buffering. You receive a pointer, and from that moment on, the file is just a byte array you can index, iterate, or process with any pointer-based algorithm.
// Traditional I/O: Many system calls, explicit buffer management
char buffer[4096];
while ((n = read(fd, buffer, sizeof(buffer))) > 0) {
process(buffer, n);
}
// Memory-mapped: Zero system calls for data access
char *file_data = mmap(NULL, file_size, PROT_READ, MAP_PRIVATE, fd, 0);
process(file_data, file_size); // Just use it like any array
This isn't syntactic sugar—it's a fundamental change in how your application interacts with the storage subsystem. And understanding it deeply transforms how you architect data-intensive applications.
This page explores the 'file as memory' paradigm comprehensively—how it changes your application architecture, why it can be dramatically faster than traditional I/O, what limitations exist, and when this approach is (and isn't) appropriate. You'll gain the deep understanding needed to make informed decisions about file access patterns in your systems.
To appreciate the elegance of memory-mapped files, we must first understand the overhead inherent in traditional file I/O. When you call read() on a modern operating system, a remarkable amount of machinery activates:
The System Call Journey:
User-to-Kernel Transition: Your read() triggers a system call instruction (syscall on x86-64, svc on ARM). The CPU switches from user mode to kernel mode, saving register state and switching to a kernel stack. This transition itself costs hundreds of CPU cycles.
File Descriptor Lookup: The kernel locates the struct file associated with your file descriptor in the process's file descriptor table.
Permission Verification: The kernel verifies your process has read permission on this file.
Page Cache Check: The kernel checks if the requested file data is already in the page cache (also called buffer cache). This involves translating the file offset to a page cache index and performing hash table lookups.
Cache Miss Handling: If the data isn't cached, the kernel must issue a request to the storage device and, for blocking I/O, put your process to sleep until the data arrives:
The First Copy: Data arrives from the storage device into the kernel's page cache.
The Second Copy: The kernel copies data from the page cache to your user-space buffer. This is the infamous "copy" overhead.
Return to User Space: The kernel returns execution to your application, with another mode transition.
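The per-call cost is easy to observe. Below is a minimal timing sketch (Linux/POSIX assumed; `time_reads` is a helper invented for this demo) that compares many tiny read() calls against a few page-sized ones moving the same total amount of data from /dev/zero:

```c
#define _POSIX_C_SOURCE 199309L
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

static double elapsed_ms(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

/* Time how long `calls` read() calls of `chunk` bytes take, in ms */
double time_reads(const char *path, int calls, size_t chunk) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1.0;
    char buf[4096];
    if (chunk > sizeof(buf)) chunk = sizeof(buf);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < calls; i++)
        read(fd, buf, chunk);   /* one user/kernel round trip each */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    return elapsed_ms(t0, t1);
}
```

Comparing `time_reads("/dev/zero", 100000, 1)` with `time_reads("/dev/zero", 25, 4096)` transfers 100KB either way, but the 1-byte loop is typically orders of magnitude slower: the difference is almost entirely system call entry/exit overhead.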
| Operation | Approximate Cost | Frequency per I/O |
|---|---|---|
| System call entry/exit | ~200-500 cycles | Per read()/write() call |
| File descriptor lookup | ~50-100 cycles | Per call |
| Page cache lookup | ~100-300 cycles | Per page accessed |
| Memory copy (per 4KB page) | ~1000-2000 cycles | Per page of data |
| Context switch (if blocking) | ~5,000-20,000 cycles | On cache miss |
| Disk I/O (HDD) | ~10,000,000 cycles | On cache miss |
| Disk I/O (SSD) | ~100,000-500,000 cycles | On cache miss |
The Copy Problem at Scale:
Consider a program that processes a 10GB file using 4KB read() calls: that is roughly 2.6 million system calls, each paying the entry/exit and lookup costs above, plus 10GB of data copied from kernel space to user space.
Even with larger buffer sizes (reducing the number of system calls), the copy overhead remains: every byte must transit from kernel space to user space, consuming memory bandwidth and CPU cycles.
The Fundamental Inefficiency:
Notice what's happening: file data gets loaded into memory twice. It exists both in the kernel's page cache AND in your application's buffer. For large files, this means double the physical memory footprint for the same data, memory bandwidth spent on the copies, and CPU caches polluted with duplicate content.
When you read() a file, the data already exists in kernel memory (the page cache) before it's copied to your buffer. With memory-mapped files, you access that same page cache directly—eliminating the copy entirely. You're not just avoiding system calls; you're avoiding unnecessary data movement at the hardware level.
Memory mapping inverts the traditional file access model. Instead of moving data from files to your program, you project files into your program's address space. The conceptual shift is profound:
Traditional Model: File → (system call) → Kernel Buffer → (copy) → User Buffer → Process
Memory-Mapped Model: File → Kernel Page Cache ←→ Process Address Space (same physical pages!)
After mmap(), your process's page table contains entries that point to the same physical pages used by the kernel's page cache. There's no intermediate copy because there's no intermediate buffer—you're directly accessing the page cache through your virtual address space.
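This page-sharing is directly observable. The sketch below (the file path is invented for the demo) maps the same file twice with MAP_SHARED; because both virtual ranges resolve to the same physical page-cache pages, a store through one view is immediately visible through the other with no system call in between:

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the same file twice and check that a write through one view is
 * visible through the other. Returns 1 if the views share pages. */
int views_share_pages(const char *path) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    if (ftruncate(fd, 4096) == -1) { close(fd); return -1; }

    /* Two independent MAP_SHARED views of the same file */
    char *a = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *b = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (a == MAP_FAILED || b == MAP_FAILED) return -1;

    /* Different virtual addresses, same physical page-cache pages */
    strcpy(a, "hello");
    int shared = (a != b) && (strcmp(b, "hello") == 0);

    munmap(a, 4096);
    munmap(b, 4096);
    return shared;
}
```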
The Virtual Memory Magic:
This unification is possible because of virtual memory's flexibility. Virtual addresses don't care what physical pages they point to. The kernel can map anonymous pages (your heap and stack), file-backed pages (executables, shared libraries, and mmap()ed files), and shared memory segments into the same address space.
All these appear as different portions of your flat virtual address space. Your code doesn't know (or care) whether a particular address holds heap data or a memory-mapped file—it's all just memory.
What Happens When You Access a Mapped Address:
char *mapped_file = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
char first_byte = mapped_file[0]; // What actually happens here?
When mapped_file[0] executes, the MMU finds no valid page table entry for that address and raises a page fault. The kernel sees that the faulting address lies inside a valid mapping, locates the corresponding page of the file in the page cache (reading it from disk if necessary), points your page table entry at that physical page, and restarts the faulting instruction. Your load then completes as an ordinary memory read.
Subsequent accesses to the same page are purely hardware operations with zero kernel involvement. This is why memory-mapped I/O can be dramatically faster for random access patterns.
Memory-mapped files achieve genuine zero-copy I/O. The data in the page cache is the exact same data your application accesses—not a copy, but the original. This eliminates memory bandwidth waste and keeps CPU caches efficient (no duplicate data polluting cache lines).
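This fault-then-free behavior can be observed with Linux's mincore(), which reports whether each page of a mapping is resident in memory. A Linux-specific sketch (the function name is invented for this demo):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Touch the first page of a mapped file, then ask the kernel whether
 * that page is now resident. Returns 1 if resident, -1 on error. */
int page_resident_after_touch(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    long pg = sysconf(_SC_PAGESIZE);
    char *map = mmap(NULL, (size_t)pg, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (map == MAP_FAILED) return -1;

    volatile char c = map[0];   /* first access: page fault loads the page */
    (void)c;

    unsigned char vec = 0;
    if (mincore(map, (size_t)pg, &vec) == -1) {
        munmap(map, (size_t)pg);
        return -1;
    }
    int resident = vec & 1;     /* low bit set: page is in the page cache */
    munmap(map, (size_t)pg);
    return resident;
}
```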
Once a file is mapped, all the powerful tools of memory manipulation become available for file processing:
Pointer Arithmetic:
struct Header *header = (struct Header *)mapped_file;
struct Record *records = (struct Record *)(mapped_file + header->record_offset);
for (int i = 0; i < header->num_records; i++) {
process_record(&records[i]);
}
Standard Library Functions:
// Search for a byte sequence in the file
char *found = memmem(mapped_file, file_size, pattern, pattern_len);
// Compare portions of two files
int diff = memcmp(mapped_file1, mapped_file2, compare_size);
// Copy file contents to another buffer (if needed)
memcpy(destination, mapped_file + offset, length);
Data Structure Access:
// Binary search in a sorted file
struct Entry *entries = (struct Entry *)mapped_file;
int n_entries = file_size / sizeof(struct Entry);
struct Entry *target = bsearch(&key, entries, n_entries,
sizeof(struct Entry), compare_entries);
#define _GNU_SOURCE   /* memmem() is a GNU extension */
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>

// Example 1: Treating a file as a struct
typedef struct {
    uint32_t magic;
    uint32_t version;
    uint64_t entry_count;
    uint64_t data_offset;
} FileHeader;

typedef struct {
    uint64_t id;
    char name[56];
    double value;
} DataEntry;

void process_structured_file(const char *path) {
    int fd = open(path, O_RDONLY);
    struct stat sb;
    fstat(fd, &sb);

    void *map = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);

    // Direct structure access - no parsing needed!
    FileHeader *header = (FileHeader *)map;
    if (header->magic != 0xDEADBEEF) {
        printf("Invalid file format\n");
        munmap(map, sb.st_size);
        return;
    }

    // Navigate to data section using pointer arithmetic
    DataEntry *entries = (DataEntry *)((char *)map + header->data_offset);

    // Iterate entries - feels like iterating an array
    for (uint64_t i = 0; i < header->entry_count; i++) {
        printf("Entry %lu: %s = %f\n",
               entries[i].id, entries[i].name, entries[i].value);
    }

    munmap(map, sb.st_size);
}

// Example 2: Index-based random access
typedef struct {
    uint64_t offset;
    uint32_t length;
} IndexEntry;

char *get_record_by_index(void *data_map, IndexEntry *index, int record_num) {
    // Direct access to any record via index - O(1) regardless of file size
    return (char *)data_map + index[record_num].offset;
}

// Example 3: Memory operations on file content
int count_occurrences(void *map, size_t size, const char *pattern) {
    size_t pattern_len = strlen(pattern);
    int count = 0;
    char *pos = map;
    char *end = (char *)map + size - pattern_len;

    while (pos <= end) {
        pos = memmem(pos, end - pos + pattern_len, pattern, pattern_len);
        if (pos == NULL) break;
        count++;
        pos++;
    }
    return count;
}
The Power of Direct Access:
This unified interface eliminates entire categories of code: read loops, buffer allocation and sizing, partial-read handling, and offset bookkeeping all disappear.
A complex file format that might require hundreds of lines of parsing code with read() can often be reduced to casting pointers with mmap().
Memory-mapped I/O isn't universally faster than read()—understanding when it excels is crucial for making informed decisions.
Access Pattern Analysis:
| Access Pattern | mmap() Performance | read() Performance | Winner |
|---|---|---|---|
| Random access to large file | Excellent—direct single-page faults | Poor—each access is a system call | mmap() |
| Sequential read, process once | Good—but may fault per-page | Good—readahead helps significantly | Roughly equal |
| Re-reading same data multiple times | Excellent—pages stay warm in cache | Requires explicit caching | mmap() |
| Very large file, touch small portion | Excellent—only load needed pages | Wasteful if read beyond needs | mmap() |
| Streaming data (copy to socket) | Overhead from page faults | sendfile() bypasses user space entirely | sendfile() |
| Write-heavy, durability critical | Requires msync() management | fsync() after write is clearer | read()/write() |
Why Random Access Favors mmap():
Consider a database-style access pattern: reading record #1000, then #42, then #999,000, scattered throughout a large file.
With read(): each record costs an lseek() plus a read(), two kernel round trips and a kernel-to-user copy per record, even when the data is already in the page cache.
With mmap(): each record access is pointer arithmetic plus a memory load. The first touch of a page costs one page fault; every later access to that page is pure hardware, with no kernel involvement at all.
Quantifying the Difference:
Benchmark scenario: Random access to 4-byte integers in a 1GB file, 1 million accesses:
read() with lseek(): ~45 seconds (dominated by system calls)
mmap() with indexing: ~2 seconds (page faults only on first access)
That's a 22x speedup—not from clever optimization, but from a fundamental change in the access model.
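A sketch of that benchmark's core (sizes, counts, and function names here are illustrative): random 4-byte reads via pread() versus via a mapped pointer. Seeding rand() identically before each run makes the two access sequences, and hence the sums, identical:

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Random access through pread(): one system call per access */
long long sum_random_pread(int fd, size_t n_ints, int accesses) {
    long long sum = 0;
    int32_t v;
    for (int i = 0; i < accesses; i++) {
        off_t off = (off_t)(rand() % n_ints) * sizeof(int32_t);
        pread(fd, &v, sizeof v, off);
        sum += v;
    }
    return sum;
}

/* Random access through a mapping: plain loads, faults only once per page */
long long sum_random_mmap(const int32_t *data, size_t n_ints, int accesses) {
    long long sum = 0;
    for (int i = 0; i < accesses; i++)
        sum += data[rand() % n_ints];
    return sum;
}
```

Wrapping each function in a timer (as in the earlier read() sketch) reproduces the shape of the result above: the pread() loop's cost scales with syscall count, the mmap() loop's with the number of distinct pages touched.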
For purely sequential access, the kernel's read-ahead mechanism helps read() performance significantly. When the kernel detects sequential access, it prefetches data before you request it, hiding I/O latency. mmap() also triggers read-ahead, but the patterns may be less predictable. For truly sequential processing, both methods can achieve similar throughput—but mmap() still avoids the copy overhead.
Memory Efficiency:
With traditional I/O, processing a 100GB file requires either a buffer large enough to hold the data (impossible on most machines) or explicit chunking logic that reads, processes, and discards one window at a time.
With mmap(), you can map the entire 100GB file even if you have only 16GB of RAM: the mapping consumes virtual address space, not physical memory, and the kernel pages data in and out as your access pattern demands.
This enables algorithms that conceptually need to "see" entire datasets to work on datasets larger than RAM, without explicit chunking or streaming logic.
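A runnable sketch of this idea (the path and size are illustrative): create a sparse file with ftruncate(), map a gigabyte of it, and touch only three scattered bytes. Only the three touched pages ever become resident, and holes in a sparse file read as zero bytes:

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file of `logical_size` bytes (created sparse, so no disk space
 * is consumed) and read three widely scattered bytes. Only the three
 * touched pages become resident; the rest is just address space. */
int sample_huge_file(const char *path, off_t logical_size) {
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) return -1;
    if (ftruncate(fd, logical_size) == -1) { close(fd); return -1; }

    char *map = mmap(NULL, (size_t)logical_size, PROT_READ,
                     MAP_PRIVATE, fd, 0);
    close(fd);
    if (map == MAP_FAILED) return -1;

    /* Three page faults total, regardless of the mapping's size */
    int sum = map[0] + map[logical_size / 2] + map[logical_size - 1];
    munmap(map, (size_t)logical_size);
    return sum;     /* sparse holes read as zeros */
}
```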
Let's examine equivalent operations in both models to highlight practical differences:
Scenario 1: Reading and Processing a Configuration File
// Traditional approach
int fd = open("config.dat", O_RDONLY);
struct stat sb;
fstat(fd, &sb);

// Allocate buffer
char *buffer = malloc(sb.st_size + 1);
if (!buffer) { /* error */ }

// Read entire file
ssize_t bytes = 0;
while (bytes < sb.st_size) {
    ssize_t r = read(fd, buffer + bytes, sb.st_size - bytes);
    if (r <= 0) break;
    bytes += r;
}
close(fd);
buffer[bytes] = '\0';

// Process
process_config(buffer, bytes);

// Cleanup
free(buffer);
// Memory-mapped approach
int fd = open("config.dat", O_RDONLY);
struct stat sb;
fstat(fd, &sb);

// Map file
char *mapped = mmap(NULL, sb.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);

if (mapped == MAP_FAILED) { /* error */ }

// Process - file is already accessible
process_config(mapped, sb.st_size);

// Cleanup
munmap(mapped, sb.st_size);

// Note: No malloc, no read loop,
// no buffer boundary handling
Scenario 2: Modifying a Binary File In-Place
// Modify record at offset
void update_record_traditional(int fd, off_t offset, Record *new_data) {
    // Seek to position
    if (lseek(fd, offset, SEEK_SET) == -1)
        return;

    // Write new data
    if (write(fd, new_data, sizeof(Record)) != sizeof(Record)) {
        // Partial write handling...
    }

    // Ensure durability
    fsync(fd);
}

// For multiple updates: many seeks + writes
// Modify record via memory
void update_record_mmap(void *map, off_t offset, Record *new_data) {
    // Direct memory write
    Record *target = (Record *)((char *)map + offset);
    *target = *new_data;

    // Sync if durability needed. msync() requires a page-aligned
    // address, so round target down to its page boundary first.
    long page = sysconf(_SC_PAGESIZE);
    char *start = (char *)((uintptr_t)target & ~(uintptr_t)(page - 1));
    msync(start, (char *)(target + 1) - start, MS_SYNC);
}

// For multiple updates: just assign to
// different offsets - no system calls
// until you msync()
Scenario 3: Searching for a Pattern
// Search for pattern in file
off_t find_pattern_traditional(int fd, const char *pattern,
                               size_t pattern_len) {
    char buffer[8192];
    char overlap[256];  // For cross-boundary matches
    size_t overlap_len = 0;
    off_t position = 0;

    while (1) {
        ssize_t n = read(fd, buffer, sizeof(buffer));
        if (n <= 0) return -1;

        // Search in overlap + new data
        // (Complex boundary handling)
        // ...

        position += n;
    }
}
// Search for pattern in file
off_t find_pattern_mmap(void *map, size_t file_size,
                        const char *pattern, size_t pattern_len) {
    // Use standard memory search
    void *found = memmem(map, file_size, pattern, pattern_len);
    if (found == NULL)
        return -1;
    return (char *)found - (char *)map;
}

// That's it. No buffers, no boundaries,
// no complex state management.
Memory-mapped I/O isn't just for reading—it enables intuitive file modification through standard memory operations. When you write to a MAP_SHARED mapping, your changes eventually propagate to the underlying file.
How Write Propagation Works:
Store: Your ordinary store instruction (*ptr = value) modifies the shared page in RAM; no system call occurs.
Dirty Tracking: The CPU sets the dirty bit in the page table entry, so the kernel knows the page now differs from the on-disk copy.
Writeback: The kernel's writeback machinery later flushes dirty page-cache pages to the underlying file.
The Writeback Timing Challenge:
Unlike write() which blocks until data reaches kernel buffers (and optionally disk with fsync), mmap() writes are decoupled from filesystem operations:
// When does this actually hit the disk?
*mapped_ptr = new_value; // Only modifies RAM (page cache)
// Answer: "Eventually" — when:
// 1. Kernel writeback timer fires (usually 5-30 seconds)
// 2. System runs low on memory
// 3. You explicitly call msync()
// 4. You call munmap()
// 5. Process exits
For crash-safe applications, never assume mmap() writes reached disk. A power failure or kernel crash before writeback will lose your changes. Use msync(addr, len, MS_SYNC) to force writes to disk, similar to fsync() after write().
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdint.h>

typedef struct {
    uint64_t transaction_id;
    char data[4088];
} Record;

// Safely update a record with durability guarantee
int update_record_durably(void *map, size_t map_size,
                          int record_index, Record *new_record) {
    Record *records = (Record *)map;
    Record *target = &records[record_index];

    // Validate bounds
    if ((char *)(target + 1) > (char *)map + map_size) {
        return -1;  // Out of bounds
    }

    // Step 1: Write to memory
    *target = *new_record;

    // Step 2: Force to disk synchronously
    // (sizeof(Record) is 4096, so each record starts page-aligned,
    // satisfying msync()'s alignment requirement)
    // MS_SYNC: Wait for write to complete
    // MS_ASYNC: Schedule write but don't wait
    // MS_INVALIDATE: Invalidate other mappings (rarely needed)
    if (msync(target, sizeof(Record), MS_SYNC) == -1) {
        perror("msync failed");
        // Note: Data might still be in page cache
        // It may still reach disk eventually
        return -1;
    }

    return 0;  // Record durably stored
}

// Batch multiple updates, then sync once for efficiency
int update_records_batch(void *map, size_t map_size,
                         int *indices, Record *new_records, int count) {
    Record *records = (Record *)map;

    // Step 1: All writes to memory (fast)
    for (int i = 0; i < count; i++) {
        records[indices[i]] = new_records[i];
    }

    // Step 2: Single sync for entire mapped region (slower, but once)
    if (msync(map, map_size, MS_SYNC) == -1) {
        perror("msync failed");
        return -1;
    }

    return 0;
}
msync() Flags Explained:
MS_SYNC: Block until the data has reached the device; this is the durability guarantee.
MS_ASYNC: Schedule the writeback and return immediately.
MS_INVALIDATE: Invalidate other mappings of the same region so they see the freshly written data (rarely needed).
Extending Files Through mmap():
You cannot extend a file's size by writing beyond its current end through mmap(). Accessing pages beyond the end of the file triggers SIGBUS (writes within the final partial page succeed but are never written back to the file). To grow a file:
// Growing a memory-mapped file
int fd = open("data.bin", O_RDWR);
// Extend the file first
if (ftruncate(fd, new_size) == -1) {
perror("ftruncate");
// handle error
}
// Now remap with larger size
// Note: Must munmap() old mapping first, or use mremap() on Linux
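A portable helper combining those steps might look like this (`grow_mapping` is a name invented for this sketch; Linux's mremap() could avoid the unmap/map pair but isn't portable):

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Grow the file with ftruncate(), drop the stale mapping, and map the
 * new size. The caller must treat old_map as invalid afterwards. */
void *grow_mapping(int fd, void *old_map, size_t old_size, size_t new_size) {
    if (ftruncate(fd, (off_t)new_size) == -1)
        return MAP_FAILED;  /* the file must grow before its pages exist */
    if (old_map != NULL)
        munmap(old_map, old_size);  /* old pointer is now invalid */
    return mmap(NULL, new_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}
```

Note that any pointers derived from the old mapping are dangling after the call; callers typically store offsets rather than raw pointers for exactly this reason.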
Memory-mapped I/O isn't universally superior. Understanding its limitations helps you make appropriate choices:
Pitfall 1: Error Handling Complexity
With read(), I/O errors return immediately via the return value:
if (read(fd, buf, size) == -1) {
// Handle error - we know immediately
}
With mmap(), I/O errors manifest as signals when you access the memory:
char *data = mmap(...);
// mmap() succeeded, but...
char c = data[0]; // This might SIGBUS if disk read fails!
Handling errors requires setting up signal handlers—substantially more complex than checking return values.
Pitfall 2: The Truncation Race
Consider this dangerous scenario: process A maps a file, then another process truncates it. When process A next touches a page beyond the new end of file, the access raises SIGBUS and, unhandled, kills the process.
Unlike read(), which would return 0 or an error for a truncated file, mmap() causes a crash. Solutions: coordinate with file locks so a mapped file is never truncated, re-fstat() and remap when the file may have changed, or install a SIGBUS handler that recovers from faulting accesses.
Pitfall 3: Sequential Streaming Performance
For pure sequential reads of massive files, mmap() may not outperform optimized read() loops:
// This is often faster for sequential reads:
while ((n = read(fd, buf, BIG_BUFFER)) > 0) {
send(socket_fd, buf, n, 0); // Stream to network
}
// Better still - zero-copy to socket:
sendfile(socket_fd, fd, NULL, file_size);
The sendfile() system call moves data directly between file and socket inside the kernel—here even mmap() can't compete, because sending mapped data still requires the process to touch it from user space, while sendfile() never leaves kernel space.
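Since Linux 2.6.33, sendfile()'s destination may be a regular file as well as a socket, which makes the kernel-side copy easy to demonstrate without networking (paths and the function name here are illustrative):

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy src to dst using sendfile(). Returns bytes copied, -1 on error. */
long copy_file_kernel_side(const char *src, const char *dst) {
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;
    struct stat sb;
    fstat(in, &sb);

    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (out < 0) { close(in); return -1; }

    /* Data moves page cache -> page cache; it never enters user space */
    long copied = (long)sendfile(out, in, NULL, (size_t)sb.st_size);
    close(in);
    close(out);
    return copied;
}
```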
On 32-bit architectures, user-space typically has ~3GB of virtual address space. Mapping multiple large files simultaneously can exhaust this quickly. Each mapping consumes virtual address space even if you never touch most pages. On 64-bit systems, this is rarely a concern—you have 128TB or more of virtual space.
Use this decision framework to choose between memory mapping and traditional I/O:
| Use Case | Recommended Approach | Rationale |
|---|---|---|
| Database files | mmap() | Random access, re-reads, structured data |
| Log file tailing | read() | Sequential, may need nonblocking |
| Configuration files | mmap() or read() | Usually small, either works |
| Image/video editing | mmap() | Random access, sparse edits |
| HTTP file serving | sendfile() | Zero-copy to socket |
| Search/grep | mmap() | Simpler algorithms, may re-scan |
| Archive extraction | read()+write() | Sequential decompression |
We've explored the paradigm of treating files as memory—how memory-mapped files fundamentally transform file access patterns and enable elegant, efficient solutions to data processing challenges.
What's Next:
With the file-as-memory paradigm understood, we'll examine how the kernel's lazy loading mechanism makes mmap() efficient even for enormous files. The next page explores lazy loading in depth—how pages are faulted in on demand, what triggers I/O, and how to optimize access patterns for your working set.
You now understand how memory-mapped files transform file access—allowing you to treat persistent storage as ordinary memory. This unified interface simplifies code, eliminates copying overhead, and enables powerful data processing patterns. Next, we'll explore the lazy loading mechanism that makes this efficient.