Before random access memory, before magnetic disks with movable heads, before any notion of 'jumping' to a specific location in a file—there was sequential access. It is the most primitive, most intuitive, and remarkably, still the most common method of interacting with file data in modern computing systems.
Sequential access embodies a simple principle: read or write data in a linear, ordered sequence from beginning to end. Like reading a book page by page, or listening to a tape recording from start to finish, sequential access processes information in precisely the order it was stored.
This simplicity is not a limitation—it is a profound strength. Sequential access aligns perfectly with physics. Magnetic tape can only move in one direction at a time. Hard disk platters spin continuously, making sequential reads vastly more efficient than scattered random reads. Even modern SSDs, with no moving parts, still benefit from sequential access patterns due to the way flash memory pages and blocks are organized.
Understanding sequential access is foundational to understanding all file access methods, because every other method is essentially an optimization for specific access patterns that sequential access cannot efficiently serve.
By the end of this page, you will deeply understand the sequential access model—its conceptual underpinnings, the file pointer mechanism, implementation details, performance characteristics on different storage media, real-world use cases, and why this 'primitive' access pattern remains central to modern operating systems and applications.
At its core, sequential access treats a file as a linear stream of bytes or records. Imagine a very long tape with data written from one end to the other. To access any piece of data, you must 'play' the tape from wherever you currently are, moving forward (and sometimes backward) through the data stream.
The defining characteristic of sequential access is the file pointer (also called the file position indicator or current position). This pointer marks your current location within the file. Every read operation retrieves data starting from this pointer and automatically advances the pointer by the number of bytes read. Every write operation writes data starting at the pointer and advances it similarly.
Key properties of the sequential access model:
Reads and writes proceed in order — each operation starts where the previous one ended.
The file pointer advances automatically — by exactly the number of bytes read or written.
Access is forward-moving by default — data is consumed in the order it was stored.
Repositioning is the exception, not the rule — the lseek() system call allows rewinding or repositioning when needed.

The tape metaphor:
The tape metaphor is historically accurate—sequential access was literally how tape drives worked. Magnetic tape could only read data as it passed over the read/write head, and the tape could only move in one direction at meaningful speed (rewinding was slow and avoided when possible).
Even though modern storage devices (disks, SSDs) can access arbitrary locations, the sequential model persists: it is the simplest interface to program against, it matches how most workloads actually consume data, and it delivers the best throughput on every storage medium.
Sequential access predates the concept of 'files' entirely. Early computer systems read punched cards or paper tape—inherently sequential media. When magnetic tape was introduced in the 1950s, the sequential model was preserved. Even the original Unix file system was designed with sequential access as the primary paradigm, and this influence persists in POSIX standards today.
The file pointer is the central abstraction enabling sequential access. Understanding how it works—both conceptually and in terms of OS implementation—is crucial for mastering file I/O.
What exactly is a file pointer?
A file pointer is a non-negative integer offset representing the byte position within a file where the next read or write will occur. When a file is first opened (without special flags), the file pointer is set to 0, indicating the beginning of the file.
The file pointer has the following behaviors:
| Operation | Effect on File Pointer | Notes |
|---|---|---|
| open() | Set to 0 (or EOF with O_APPEND) | Initial position determined by flags |
| read(n) | Advances by n bytes | Returns bytes read; may be < n at EOF |
| write(n) | Advances by n bytes | May extend file if at or past EOF |
| lseek(offset, whence) | Set to calculated position | Allows random repositioning |
| close() | Pointer discarded | Each open() creates a new pointer |
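The behaviors in the table can be observed directly. The sketch below (the file path and contents are illustrative) reads a few bytes and then queries the pointer with `lseek(fd, 0, SEEK_CUR)`, which reports the current offset without moving it:

```c
#include <fcntl.h>
#include <unistd.h>

/* Create a small demo file (illustrative path and content). */
int make_demo_file(const char *path, const char *data, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    write(fd, data, len);
    close(fd);
    return 0;
}

/* Open `path`, read up to n bytes, and return the resulting file
 * pointer position. lseek(fd, 0, SEEK_CUR) queries the current
 * offset without changing it. */
long position_after_read(const char *path, size_t n) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    char buf[256];
    if (n > sizeof(buf)) n = sizeof(buf);
    read(fd, buf, n);                         /* advances pointer by bytes read */
    long pos = (long)lseek(fd, 0, SEEK_CUR);  /* query current position */
    close(fd);
    return pos;
}
```

Reading 5 bytes of a 23-byte file leaves the pointer at 5; requesting more bytes than the file holds is a short read that stops at EOF, leaving the pointer at the file size.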
Implementation in the Operating System:
When you open a file in Unix/Linux, the OS creates several data structures:
File Descriptor Table (per-process) — Maps file descriptor integers (0, 1, 2, ...) to entries in the system-wide open file table.
Open File Table (system-wide) — Contains one entry per open file instance. This entry stores the current file pointer (offset), the file status flags (e.g., O_APPEND), a reference count, and a pointer to the file's vnode/inode.
Vnode/Inode Table — Contains file metadata: size, permissions, block locations, etc.
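The relationships between these three layers can be sketched with simplified structs (illustrative only; real kernels use far richer structures and different names). The demo function models what fork() does: both processes' descriptor slots reference the same open-file entry, so an offset change made through one is visible through the other:

```c
#include <stddef.h>

/* Illustrative sketch only - not the real kernel definitions. */
struct demo_inode     { long size; /* permissions, block map, ... */ };
struct demo_open_file { long offset; int flags; int refcount;
                        struct demo_inode *ino; };

#define DEMO_MAX_FD 16
struct demo_proc { struct demo_open_file *fd_table[DEMO_MAX_FD]; };

/* Model fork(): both fd tables point at the SAME open-file entry,
 * so the "child" moving the offset is visible to the "parent".
 * Returns the offset as the parent observes it. */
long demo_shared_pointer(void) {
    struct demo_inode ino = { 1000 };
    struct demo_open_file of = { 0, 0, 1, &ino };

    struct demo_proc parent = {0}, child = {0};
    parent.fd_table[3] = &of;   /* open() in the parent */
    child.fd_table[3]  = &of;   /* fork(): slot copied, entry shared */
    of.refcount++;

    child.fd_table[3]->offset += 10;   /* child reads 10 bytes */
    return parent.fd_table[3]->offset; /* parent sees 10, not 0 */
}
```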
The crucial insight is that the file pointer lives in the open file table entry, not in the file descriptor table or the inode. This has profound implications:
```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

int main() {
    // Open a file - creates one open file table entry
    int fd = open("data.txt", O_RDONLY);
    char buffer[10];

    // Read first 10 bytes - file pointer at position 10
    read(fd, buffer, 10);
    printf("Parent read: %.*s\n", 10, buffer);

    // Fork creates a child with SAME open file table entry
    pid_t pid = fork();

    if (pid == 0) {
        // Child process
        // Shares the file pointer with parent!
        read(fd, buffer, 10);  // Reads bytes 10-19
        printf("Child read: %.*s\n", 10, buffer);
        return 0;
    }

    wait(NULL);  // Wait for child

    // Parent continues - file pointer moved by child!
    read(fd, buffer, 10);  // Reads bytes 20-29
    printf("Parent read after child: %.*s\n", 10, buffer);

    close(fd);
    return 0;
}
```

When you fork() a process, the child inherits file descriptors that point to the same open file table entries as the parent. This means they share the same file pointer! Reads/writes in one process affect the other's position. This is a common source of bugs and race conditions in systems programming.
Contrast: Independent file pointers with separate open() calls:
If two processes both call open() on the same file, they get separate open file table entries with independent file pointers. Changes to one pointer do not affect the other.
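A small sketch makes this concrete (the demo path is illustrative): open the same file twice, read through only the first descriptor, and verify the second descriptor's pointer has not moved.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open the same file twice and read 5 bytes through the first
 * descriptor only. Returns 1 if the second descriptor's pointer
 * is still at 0 (i.e., the two pointers are independent). */
int pointers_are_independent(const char *path) {
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);
    if (fd1 < 0 || fd2 < 0) return -1;

    char buf[8];
    read(fd1, buf, 5);                       /* moves only fd1's pointer */
    int ok = lseek(fd1, 0, SEEK_CUR) == 5 &&
             lseek(fd2, 0, SEEK_CUR) == 0;   /* fd2's pointer untouched */

    close(fd1);
    close(fd2);
    return ok;
}
```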
This distinction is fundamental to understanding multi-process file I/O and is exploited by various system designs (e.g., append-only logs, where O_APPEND makes every write land atomically at end-of-file regardless of each writer's own pointer).
Let's examine how sequential read and write operations work at the system call level, understanding exactly what happens when bytes flow between your program and persistent storage.
The read() System Call:
ssize_t read(int fd, void *buf, size_t count);
When you call read(fd, buffer, n), the following sequence occurs:
Validate parameters — Check that fd is valid, buffer is a valid user-space pointer, count is reasonable.
Locate open file table entry — Follow fd through the process's file descriptor table to the system open file table entry.
Check current offset — Retrieve the current file pointer from the open file table entry.
Determine bytes to read — Compare requested count vs. bytes remaining (file_size - current_offset). Use the smaller value.
Buffer cache lookup — Check if the required file blocks are already in the system buffer cache.
Disk I/O if needed — If blocks are not cached, issue disk read operations to bring them into memory.
Copy to user space — Copy the requested bytes from kernel buffer cache to the user's buffer.
Advance file pointer — Add the number of bytes actually read to the current offset.
Return bytes read — Return the count of bytes successfully copied (may be less than requested).
```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>

/**
 * Demonstrates sequential reading with proper error handling.
 * Shows how to handle partial reads and EOF detection.
 */
void sequential_read_demo(const char *filename) {
    int fd = open(filename, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return;
    }

    char buffer[4096];
    ssize_t bytes_read;
    off_t total_bytes = 0;

    // Classic sequential read loop
    while ((bytes_read = read(fd, buffer, sizeof(buffer))) > 0) {
        // Process bytes_read bytes of data
        // File pointer automatically advances
        total_bytes += bytes_read;

        // Note: bytes_read may be less than sizeof(buffer)
        // This is normal - not an error (short read)
        // Causes: approaching EOF, signal interruption, etc.
    }

    if (bytes_read < 0) {
        // Actual error occurred
        perror("read");
    } else {
        // bytes_read == 0 means EOF
        printf("Read %lld bytes total (EOF reached)\n",
               (long long)total_bytes);
    }

    close(fd);
}
```

The write() System Call:
ssize_t write(int fd, const void *buf, size_t count);
Write follows a similar pattern but with additional considerations:
Validate parameters and permissions — Check fd is writable, buffer is readable, count is valid.
Locate open file table entry and current offset — Same as read.
Check O_APPEND flag — If set, atomically move offset to EOF before writing.
Allocate blocks if extending file — If writing past current file end, new blocks must be allocated.
Copy from user space to buffer cache — Data moves from user buffer to kernel buffers.
Mark buffers dirty — Buffers are marked for eventual write-back to disk.
Advance file pointer — Add bytes written to current offset.
Return bytes written — Usually equals count unless error or disk full.
Eventual write-back — Dirty buffers are written to disk asynchronously (or synchronously with O_SYNC).
Short reads (returning fewer bytes than requested) are normal and expected—always check the return value and loop if necessary. Short writes are more concerning; on regular files they typically indicate disk full conditions or quota exceeded. Always check write return values and handle ENOSPC (no space left on device) appropriately.
Sequential access achieves the best possible performance on virtually all storage media. Understanding why requires examining how different storage technologies respond to access patterns.
Hard Disk Drives (HDDs):
Magnetic hard drives consist of spinning platters and a read/write head that physically moves across the platter surface. Accessing data involves three time components: seek time (moving the head to the correct track), rotational latency (waiting for the target sector to rotate under the head), and transfer time (reading the bits as they pass beneath the head).
For sequential access, seek time and rotational latency are paid once at the beginning. Subsequent reads are pure transfer time as contiguous sectors pass under the head. For random access, seek and rotational latency are paid for every access.
The numbers are stark:
| Access Pattern | Throughput | Latency per 4KB | Time for 1GB |
|---|---|---|---|
| Sequential Read | 150-250 MB/s | ~0.02ms | ~5 seconds |
| Random Read | 0.5-2 MB/s | ~10ms | ~15 minutes |
| Ratio | 100-200x faster | 500x lower | 180x faster |
Solid State Drives (SSDs):
SSDs have no moving parts—electrons, not mechanical arms, locate data. This dramatically reduces the sequential vs. random performance gap, but it still exists:
Page-level granularity — SSDs read and write in pages (typically 4-16KB). Sequential access aligns with page boundaries.
Channel parallelism — SSDs contain multiple flash chips. Sequential access can saturate all channels; random small reads may bottleneck on fewer channels.
Read-ahead optimization — SSD controllers detect sequential patterns and prefetch subsequent data.
Write amplification — Random small writes cause more internal housekeeping (garbage collection, wear leveling) than sequential writes.
SSD performance comparison:
| Access Pattern | Throughput | IOPS | Latency |
|---|---|---|---|
| Sequential Read | 3,000-7,000 MB/s | N/A (streaming) | ~0.02ms effective |
| Random Read (4KB) | 50-200 MB/s | 100K-1M IOPS | ~0.02-0.1ms |
| Sequential Write | 2,000-5,000 MB/s | N/A (streaming) | ~0.02ms effective |
| Random Write (4KB) | 100-500 MB/s | 100K-500K IOPS | ~0.02-0.1ms |
Operating System Optimizations for Sequential Access:
Operating systems heavily optimize for sequential access patterns:
Read-ahead (Prefetching) — When the OS detects sequential access, it preemptively reads subsequent blocks into the buffer cache before you request them. This effectively hides disk latency.
Large contiguous allocation — File systems attempt to allocate sequential files in contiguous disk blocks to maximize sequential performance.
Write coalescing — Small sequential writes are buffered and written as larger sequential chunks.
Delayed allocation — Some file systems (ext4, XFS) delay block allocation until write-back, allowing better contiguous placement.
I/O scheduling — Disk schedulers like CFQ and mq-deadline merge and reorder requests to maximize sequential runs.
The result: sequential access often approaches the theoretical maximum throughput of the storage device, while random access is limited by latency and seek overhead.
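Applications can reinforce these heuristics explicitly. On Linux and most POSIX systems, posix_fadvise() lets a program declare its access pattern up front; POSIX_FADV_SEQUENTIAL typically enables more aggressive read-ahead. It is purely a hint (the kernel may ignore it, and it never changes I/O semantics), and it is not available on every platform. A minimal sketch:

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>

/* Open a file for a single front-to-back pass, hinting the kernel
 * to use aggressive read-ahead. Offset 0 with len 0 means the hint
 * covers the whole file. */
int open_for_streaming(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    return fd;
}
```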
A classic systems programming heuristic: sequential disk access can be 1000x faster than random access on HDDs. While SSDs narrow this gap dramatically, sequential access still wins—often by 10-50x for throughput-limited workloads. Always prefer sequential access when designing file formats and I/O patterns.
Buffering is crucial to achieving good performance with sequential access. It occurs at multiple layers, and understanding each layer helps you write efficient I/O code.
Layer 1: Application Buffers
At the application level, you control buffer size when calling read()/write(). The choice matters:
Too small (byte-at-a-time) — Each byte triggers a system call. System call overhead (~100ns-1μs) dominates, reducing throughput by 100-1000x.
Too large — Memory pressure increases; diminishing returns beyond buffer cache size.
Optimal (4KB-1MB) — Matches file system block size or buffer cache policies; amortizes system call overhead effectively.
The math:
```c
/*
 * Impact of buffer size on sequential read performance
 * Reading 100MB file with different buffer sizes
 *
 * Typical results on modern system:
 *
 * Buffer Size | System Calls | Overhead    | Effective Rate
 * ------------|--------------|-------------|---------------
 * 1 byte      | 104,857,600  | ~100 seconds| ~1 MB/s
 * 64 bytes    | 1,638,400    | ~1.6 seconds| ~60 MB/s
 * 4 KB        | 25,600       | ~0.025 sec  | ~4 GB/s
 * 64 KB       | 1,600        | ~0.002 sec  | ~50 GB/s
 * 1 MB        | 100          | ~0.0001 sec | ~1 TB/s (cache hits)
 *
 * Note: Beyond ~64KB, benefits plateau as we're limited by
 * disk throughput, not system call overhead.
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>

void benchmark_buffer_size(const char *filename, size_t bufsize) {
    char *buffer = malloc(bufsize);
    int fd = open(filename, O_RDONLY);
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);

    ssize_t bytes;
    size_t total = 0;
    while ((bytes = read(fd, buffer, bufsize)) > 0) {
        total += bytes;
    }

    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec) +
                     (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("Buffer %8zu: %.3f seconds, %.2f MB/s\n",
           bufsize, elapsed, (total / 1e6) / elapsed);

    close(fd);
    free(buffer);
}
```

Layer 2: C Library Buffering (stdio)
Functions like fread(), fwrite(), fgets() use an intermediate buffer (typically 4KB-8KB) in user space:
```c
// This is actually efficient:
while ((c = fgetc(file)) != EOF) {
    process(c);
}
// fgetc() reads from an internal buffer, not the kernel.
// Only refills the buffer with read() every 4KB-8KB.
```
The stdio library provides buffering modes that you can control:
_IOFBF — Full buffering (default for regular files)
_IOLBF — Line buffering (default for terminals)
_IONBF — No buffering

Layer 3: Kernel Buffer Cache
The operating system maintains a buffer cache (page cache in Linux) that holds recently accessed file blocks in RAM. Benefits: repeated reads are served at memory speed, read-ahead data is staged here before the application requests it, and writes can be acknowledged before the data reaches the disk.
Layer 4: Disk Controller Cache
Modern disk controllers have 8-256MB of DRAM cache used for read-ahead buffering, write caching, and reordering queued commands.
For most sequential I/O, use buffer sizes of 64KB to 256KB. Smaller buffers waste CPU on system calls; larger buffers rarely improve throughput beyond disk limits. For memory-constrained environments, 4KB-16KB is a reasonable minimum. Match buffer sizes to file system block sizes when possible.
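Putting this guidance together, a sequential file copy with a 64KB buffer might look like the sketch below (paths are supplied by the caller; as a simplification, a short write is treated as an error rather than retried):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Copy src to dst sequentially using a 64KB buffer.
 * Returns total bytes copied, or -1 on error.
 * Simplification: a short write is treated as an error. */
long copy_file(const char *src, const char *dst) {
    enum { COPY_BUF = 64 * 1024 };
    char *buf = malloc(COPY_BUF);
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    long total = -1;

    if (buf && in >= 0 && out >= 0) {
        total = 0;
        ssize_t n;
        while ((n = read(in, buf, COPY_BUF)) > 0) {
            if (write(out, buf, (size_t)n) != n) { total = -1; break; }
            total += n;
        }
        if (n < 0) total = -1;  /* read error */
    }

    if (in >= 0) close(in);
    if (out >= 0) close(out);
    free(buf);
    return total;
}
```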
Sequential access is not merely a historical artifact—it remains the dominant access pattern for numerous critical workloads. Understanding these use cases illuminates why operating systems and storage systems optimize so heavily for sequential I/O.
1. Log Files and Event Streams
Application logs, system logs, transaction logs, and event streams are inherently sequential: records are appended at the end as events occur, and consumers read them front to back (or tail the newest entries).
Examples: syslog, web server access logs, database transaction logs (WAL), Kafka message streams.
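An append-only log writer is nearly a one-liner with O_APPEND: every write() atomically positions at end-of-file first, so concurrent writers cannot clobber each other's records. A sketch (path and message are illustrative):

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append one message to a log file. O_APPEND moves the file
 * pointer to EOF atomically before each write, even when several
 * processes write to the same log. */
int log_line(const char *path, const char *msg) {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return -1;
    ssize_t n = write(fd, msg, strlen(msg));
    close(fd);
    return n < 0 ? -1 : 0;
}
```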
2. Media Files (Audio, Video, Images)
Media consumption is fundamentally sequential: players decode audio samples and video frames in presentation order, streaming the file from start to finish.
Exceptions exist (seeking in video), but dominant access is sequential streaming.
3. Backup and Archival
Backup operations involve reading each source file front to back and writing the archive as one long sequential stream.
4. Batch Data Processing
Data pipelines often process files sequentially: read the input front to back, transform each record, and append the results to an output file (ETL jobs, log aggregation, MapReduce-style processing).
5. File Transfer and Replication

Copying, uploading, or replicating a file reads the source and writes the destination strictly in order.
| Workload | Read Pattern | Write Pattern | Sequentiality |
|---|---|---|---|
| Log files | Full scan or tail | Append-only | 100% |
| Video playback | Linear streaming | N/A | 95%+ (rare seeks) |
| File backup | Full file copy | Sequential write | 100% |
| Database bulk load | Sequential read | Append writes | 100% |
| Compiler output | N/A | Sequential write | 100% |
| Document editing | Full load on open | Full save on close | 95%+ |
When designing file formats or I/O-heavy applications, ask: 'Can this workload be sequential?' Restructuring random access patterns into sequential patterns (e.g., sorting before processing, using append-only logs) often yields order-of-magnitude performance improvements.
While sequential access is often optimal, certain workloads fundamentally require other access patterns. Recognizing these cases is crucial for selecting the right approach.
Workloads Where Sequential Access Is Inadequate:

Database point queries — fetching one record by key from a large table requires jumping directly to it.
Key-value stores and indexes — lookups touch small, scattered regions of large files.
Virtual memory and memory-mapped files — pages are faulted in unpredictable order.
In-place updates — modifying a record in the middle of a file without rewriting everything around it.
The Cost of Inappropriate Sequential Access:
Using sequential access where random access is needed leads to severe inefficiency:
Full table scans — Database without indexes sequentially scans entire tables for single-record queries. O(n) instead of O(log n).
Linear search — Finding an element in an unsorted file is O(n). With proper indexing and random access, it's O(log n) or O(1).
Read amplification — To read 1 byte at position 1,000,000, sequential access must read all preceding 999,999 bytes.
Update inefficiency — Modifying one record in a sequential file may require rewriting the entire file.
Example: The Database Index Problem
```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Comparison: Finding record by ID in a 1GB file
 * File contains 10 million 100-byte records
 */

// Illustrative types: a 100-byte record, plus an opaque in-memory
// index (index_t and index_lookup are assumed helpers, not shown).
typedef struct { int id; char payload[96]; } record_t;
typedef struct index_t index_t;
off_t index_lookup(index_t *index, int target_id);

// Sequential scan approach
// Must read on average half the file to find a record
// Average: 500MB read = ~3 seconds on HDD, ~0.2s on SSD
record_t find_sequential(int fd, int target_id) {
    record_t record;

    // Read every record until we find the target
    while (read(fd, &record, sizeof(record)) == sizeof(record)) {
        if (record.id == target_id) {
            return record;  // Found it!
        }
    }

    // Record not found - traversed entire file
    return (record_t){0};
}

// Random access with index approach
// Index tells us exact byte offset of target record
// Read exactly one 100-byte record
// Time: ~10ms on HDD, ~0.1ms on SSD
record_t find_indexed(int fd, int target_id, index_t *index) {
    // Look up record offset in in-memory index (hash table or tree)
    off_t offset = index_lookup(index, target_id);  // O(1) or O(log n)

    // Seek directly to record location
    lseek(fd, offset, SEEK_SET);

    // Read only the single record we need
    record_t record;
    read(fd, &record, sizeof(record));
    return record;
}

/*
 * Performance comparison for single lookup:
 * - Sequential: 3000ms (HDD) / 200ms (SSD)
 * - Indexed: 10ms (HDD) / 0.1ms (SSD)
 *
 * That's 300x to 2000x faster!
 * The difference compounds with query volume.
 */
```

The key insight is not that sequential access is 'better' or 'worse' but that it suits specific workloads. Logs and streaming media are naturally sequential. Databases require indexed random access. Choosing the wrong access pattern can degrade performance by orders of magnitude.
We've conducted a deep examination of sequential access—the most fundamental file access pattern. Let's consolidate the critical concepts:

The model — a file is a linear stream; reads and writes proceed in the order data was stored.
The file pointer — lives in the system-wide open file table; it advances automatically, is shared across fork(), and is independent across separate open() calls.
Performance — sequential access approaches a device's maximum throughput on HDDs and SSDs alike, helped by OS read-ahead, write coalescing, and contiguous allocation.
Buffering — 64KB-256KB application buffers amortize system call overhead; stdio, the kernel page cache, and controller caches add further layers.
Use cases — logs, media streaming, backups, and batch pipelines are naturally sequential; point lookups and in-place updates call for random access instead.
What's next:
Sequential access excels when you process files from start to finish, but what if you need to jump directly to a specific record without reading everything before it? The next page explores Direct (Random) Access—how to read and write at arbitrary positions within a file, the system calls that enable it, and the performance implications of breaking the sequential pattern.
You now have a comprehensive understanding of sequential file access—the conceptual model, implementation mechanics, performance characteristics, and real-world applications. This foundation prepares you to appreciate why and when other access methods are necessary, starting with direct random access.