Before random access memory, before magnetic disks with movable heads, before any notion of 'jumping' to a specific location in a file—there was sequential access. It is the most primitive, most intuitive, and remarkably, still the most common method of interacting with file data in modern computing systems.
Sequential access embodies a simple principle: read or write data in a linear, ordered sequence from beginning to end. Like reading a book page by page, or listening to a tape recording from start to finish, sequential access processes information in precisely the order it was stored.
This simplicity is not a limitation—it is a profound strength. Sequential access aligns perfectly with physics. Magnetic tape can only move in one direction at a time. Hard disk platters spin continuously, making sequential reads vastly more efficient than scattered random reads. Even modern SSDs, with no moving parts, still benefit from sequential access patterns due to the way flash memory pages and blocks are organized.
Understanding sequential access is foundational to understanding all file access methods, because every other method is essentially an optimization for specific access patterns that sequential access cannot efficiently serve.
By the end of this page, you will deeply understand the sequential access model—its conceptual underpinnings, the file pointer mechanism, implementation details, performance characteristics on different storage media, real-world use cases, and why this 'primitive' access pattern remains central to modern operating systems and applications.
At its core, sequential access treats a file as a linear stream of bytes or records. Imagine a very long tape with data written from one end to the other. To access any piece of data, you must 'play' the tape from wherever you currently are, moving forward (and sometimes backward) through the data stream.
The defining characteristic of sequential access is the file pointer (also called the file position indicator or current position). This pointer marks your current location within the file. Every read operation retrieves data starting from this pointer and automatically advances the pointer by the number of bytes read. Every write operation writes data starting at the pointer and advances it similarly.
Key properties of the sequential access model:
Reads and writes proceed in order — each operation starts where the previous one ended.
The file pointer advances automatically — by exactly the number of bytes read or written.
Access is forward-moving by default — data is consumed in the order it was stored.
Repositioning is the exception, not the rule — the lseek() system call allows rewinding or repositioning when needed.

The tape metaphor:
The tape metaphor is historically accurate—sequential access was literally how tape drives worked. Magnetic tape could only read data as it passed over the read/write head, and the tape could only move in one direction at meaningful speed (rewinding was slow and avoided when possible).
Even though modern storage devices (disks, SSDs) can access arbitrary locations, the sequential model persists: it is the simplest interface to program against, it matches how most workloads actually consume data, and it delivers the best throughput on every storage medium.
Sequential access predates the concept of 'files' entirely. Early computer systems read punched cards or paper tape—inherently sequential media. When magnetic tape was introduced in the 1950s, the sequential model was preserved. Even the original Unix file system was designed with sequential access as the primary paradigm, and this influence persists in POSIX standards today.
The file pointer is the central abstraction enabling sequential access. Understanding how it works—both conceptually and in terms of OS implementation—is crucial for mastering file I/O.
What exactly is a file pointer?
A file pointer is a non-negative integer offset representing the byte position within a file where the next read or write will occur. When a file is first opened (without special flags), the file pointer is set to 0, indicating the beginning of the file.
The file pointer has the following behaviors:
| Operation | Effect on File Pointer | Notes |
|---|---|---|
| open() | Set to 0 (or EOF with O_APPEND) | Initial position determined by flags |
| read(n) | Advances by n bytes | Returns bytes read; may be < n at EOF |
| write(n) | Advances by n bytes | May extend file if at or past EOF |
| lseek(offset, whence) | Set to calculated position | Allows random repositioning |
| close() | Pointer discarded | Each open() creates a new pointer |
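The behaviors in the table can be observed directly. The sketch below (the file path and contents are illustrative) reads a few bytes and then queries the pointer with `lseek(fd, 0, SEEK_CUR)`, which reports the current offset without moving it:

```c
#include <fcntl.h>
#include <unistd.h>

/* Create a small demo file (illustrative path and content). */
int make_demo_file(const char *path, const char *data, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    write(fd, data, len);
    close(fd);
    return 0;
}

/* Open `path`, read up to n bytes, and return the resulting file
 * pointer position. lseek(fd, 0, SEEK_CUR) queries the current
 * offset without changing it. */
long position_after_read(const char *path, size_t n) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    char buf[256];
    if (n > sizeof(buf)) n = sizeof(buf);
    read(fd, buf, n);                         /* advances pointer by bytes read */
    long pos = (long)lseek(fd, 0, SEEK_CUR);  /* query current position */
    close(fd);
    return pos;
}
```

Reading 5 bytes of a 23-byte file leaves the pointer at 5; requesting more bytes than the file holds is a short read that stops at EOF, leaving the pointer at the file size.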
Implementation in the Operating System:
When you open a file in Unix/Linux, the OS creates several data structures:
File Descriptor Table (per-process) — Maps file descriptor integers (0, 1, 2, ...) to entries in the system-wide open file table.
Open File Table (system-wide) — Contains one entry per open file instance. This entry stores the current file pointer (offset), the file status flags (e.g., O_APPEND), a reference count, and a pointer to the file's vnode/inode.
Vnode/Inode Table — Contains file metadata: size, permissions, block locations, etc.
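The relationships between these three layers can be sketched with simplified structs (illustrative only; real kernels use far richer structures and different names). The demo function models what fork() does: both processes' descriptor slots reference the same open-file entry, so an offset change made through one is visible through the other:

```c
#include <stddef.h>

/* Illustrative sketch only - not the real kernel definitions. */
struct demo_inode     { long size; /* permissions, block map, ... */ };
struct demo_open_file { long offset; int flags; int refcount;
                        struct demo_inode *ino; };

#define DEMO_MAX_FD 16
struct demo_proc { struct demo_open_file *fd_table[DEMO_MAX_FD]; };

/* Model fork(): both fd tables point at the SAME open-file entry,
 * so the "child" moving the offset is visible to the "parent".
 * Returns the offset as the parent observes it. */
long demo_shared_pointer(void) {
    struct demo_inode ino = { 1000 };
    struct demo_open_file of = { 0, 0, 1, &ino };

    struct demo_proc parent = {0}, child = {0};
    parent.fd_table[3] = &of;   /* open() in the parent */
    child.fd_table[3]  = &of;   /* fork(): slot copied, entry shared */
    of.refcount++;

    child.fd_table[3]->offset += 10;   /* child reads 10 bytes */
    return parent.fd_table[3]->offset; /* parent sees 10, not 0 */
}
```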
The crucial insight is that the file pointer lives in the open file table entry, not in the file descriptor table or the inode. This has profound implications:
```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>

int main() {
    // Open a file - creates one open file table entry
    int fd = open("data.txt", O_RDONLY);
    char buffer[10];

    // Read first 10 bytes - file pointer at position 10
    read(fd, buffer, 10);
    printf("Parent read: %.*s\n", 10, buffer);

    // Fork creates a child with SAME open file table entry
    pid_t pid = fork();

    if (pid == 0) {
        // Child process
        // Shares the file pointer with parent!
        read(fd, buffer, 10);  // Reads bytes 10-19
        printf("Child read: %.*s\n", 10, buffer);
        return 0;
    }

    wait(NULL);  // Wait for child

    // Parent continues - file pointer moved by child!
    read(fd, buffer, 10);  // Reads bytes 20-29
    printf("Parent read after child: %.*s\n", 10, buffer);

    close(fd);
    return 0;
}
```

When you fork() a process, the child inherits file descriptors that point to the same open file table entries as the parent. This means they share the same file pointer! Reads/writes in one process affect the other's position. This is a common source of bugs and race conditions in systems programming.
Contrast: Independent file pointers with separate open() calls:
If two processes both call open() on the same file, they get separate open file table entries with independent file pointers. Changes to one pointer do not affect the other.
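A small sketch makes this concrete (the demo path is illustrative): open the same file twice, read through only the first descriptor, and verify the second descriptor's pointer has not moved.

```c
#include <fcntl.h>
#include <unistd.h>

/* Open the same file twice and read 5 bytes through the first
 * descriptor only. Returns 1 if the second descriptor's pointer
 * is still at 0 (i.e., the two pointers are independent). */
int pointers_are_independent(const char *path) {
    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);
    if (fd1 < 0 || fd2 < 0) return -1;

    char buf[8];
    read(fd1, buf, 5);                       /* moves only fd1's pointer */
    int ok = lseek(fd1, 0, SEEK_CUR) == 5 &&
             lseek(fd2, 0, SEEK_CUR) == 0;   /* fd2's pointer untouched */

    close(fd1);
    close(fd2);
    return ok;
}
```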
This distinction is fundamental to understanding multi-process file I/O and is exploited by various system designs (e.g., append-only logs, where O_APPEND makes every write land atomically at end-of-file regardless of each writer's own pointer).
Let's examine how sequential read and write operations work at the system call level, understanding exactly what happens when bytes flow between your program and persistent storage.
The read() System Call:
ssize_t read(int fd, void *buf, size_t count);
When you call read(fd, buffer, n), the following sequence occurs:
Validate parameters — Check that fd is valid, buffer is a valid user-space pointer, count is reasonable.
Locate open file table entry — Follow fd through the process's file descriptor table to the system open file table entry.
Check current offset — Retrieve the current file pointer from the open file table entry.
Determine bytes to read — Compare requested count vs. bytes remaining (file_size - current_offset). Use the smaller value.
Buffer cache lookup — Check if the required file blocks are already in the system buffer cache.
Disk I/O if needed — If blocks are not cached, issue disk read operations to bring them into memory.
Copy to user space — Copy the requested bytes from kernel buffer cache to the user's buffer.
Advance file pointer — Add the number of bytes actually read to the current offset.
Return bytes read — Return the count of bytes successfully copied (may be less than requested).
```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>

/**
 * Demonstrates sequential reading with proper error handling.
 * Shows how to handle partial reads and EOF detection.
 */
void sequential_read_demo(const char *filename) {
    int fd = open(filename, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return;
    }

    char buffer[4096];
    ssize_t bytes_read;
    off_t total_bytes = 0;

    // Classic sequential read loop
    while ((bytes_read = read(fd, buffer, sizeof(buffer))) > 0) {
        // Process bytes_read bytes of data
        // File pointer automatically advances
        total_bytes += bytes_read;

        // Note: bytes_read may be less than sizeof(buffer)
        // This is normal - not an error (short read)
        // Causes: approaching EOF, signal interruption, etc.
    }

    if (bytes_read < 0) {
        // Actual error occurred
        perror("read");
    } else {
        // bytes_read == 0 means EOF
        printf("Read %lld bytes total (EOF reached)\n",
               (long long)total_bytes);
    }

    close(fd);
}
```

The write() System Call:
ssize_t write(int fd, const void *buf, size_t count);
Write follows a similar pattern but with additional considerations:
Validate parameters and permissions — Check fd is writable, buffer is readable, count is valid.
Locate open file table entry and current offset — Same as read.
Check O_APPEND flag — If set, atomically move offset to EOF before writing.
Allocate blocks if extending file — If writing past current file end, new blocks must be allocated.
Copy from user space to buffer cache — Data moves from user buffer to kernel buffers.
Mark buffers dirty — Buffers are marked for eventual write-back to disk.
Advance file pointer — Add bytes written to current offset.
Return bytes written — Usually equals count unless error or disk full.
Eventual write-back — Dirty buffers are written to disk asynchronously (or synchronously with O_SYNC).
Short reads (returning fewer bytes than requested) are normal and expected—always check the return value and loop if necessary. Short writes are more concerning; on regular files they typically indicate disk full conditions or quota exceeded. Always check write return values and handle ENOSPC (no space left on device) appropriately.
Sequential access achieves the best possible performance on virtually all storage media. Understanding why requires examining how different storage technologies respond to access patterns.
Hard Disk Drives (HDDs):
Magnetic hard drives consist of spinning platters and a read/write head that physically moves across the platter surface. Accessing data involves three time components: seek time (moving the head to the correct track), rotational latency (waiting for the target sector to rotate under the head), and transfer time (reading the bits as they pass beneath the head).
For sequential access, seek time and rotational latency are paid once at the beginning. Subsequent reads are pure transfer time as contiguous sectors pass under the head. For random access, seek and rotational latency are paid for every access.
The numbers are stark:
| Access Pattern | Throughput | Latency per 4KB | Time for 1GB |
|---|---|---|---|
| Sequential Read | 150-250 MB/s | ~0.02ms | ~5 seconds |
| Random Read | 0.5-2 MB/s | ~10ms | ~15 minutes |
| Ratio | 100-200x faster | 500x lower | 180x faster |
Solid State Drives (SSDs):
SSDs have no moving parts—electrons, not mechanical arms, locate data. This dramatically reduces the sequential vs. random performance gap, but it still exists:
Page-level granularity — SSDs read and write in pages (typically 4-16KB). Sequential access aligns with page boundaries.
Channel parallelism — SSDs contain multiple flash chips. Sequential access can saturate all channels; random small reads may bottleneck on fewer channels.
Read-ahead optimization — SSD controllers detect sequential patterns and prefetch subsequent data.
Write amplification — Random small writes cause more internal housekeeping (garbage collection, wear leveling) than sequential writes.
SSD performance comparison:
| Access Pattern | Throughput | IOPS | Latency |
|---|---|---|---|
| Sequential Read | 3,000-7,000 MB/s | N/A (streaming) | ~0.02ms effective |
| Random Read (4KB) | 50-200 MB/s | 100K-1M IOPS | ~0.02-0.1ms |
| Sequential Write | 2,000-5,000 MB/s | N/A (streaming) | ~0.02ms effective |
| Random Write (4KB) | 100-500 MB/s | 100K-500K IOPS | ~0.02-0.1ms |
Operating System Optimizations for Sequential Access:
Operating systems heavily optimize for sequential access patterns:
Read-ahead (Prefetching) — When the OS detects sequential access, it preemptively reads subsequent blocks into the buffer cache before you request them. This effectively hides disk latency.
Large contiguous allocation — File systems attempt to allocate sequential files in contiguous disk blocks to maximize sequential performance.
Write coalescing — Small sequential writes are buffered and written as larger sequential chunks.
Delayed allocation — Some file systems (ext4, XFS) delay block allocation until write-back, allowing better contiguous placement.
I/O scheduling — Disk schedulers like CFQ and mq-deadline merge and reorder requests to maximize sequential runs.
The result: sequential access often approaches the theoretical maximum throughput of the storage device, while random access is limited by latency and seek overhead.
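Applications can reinforce these heuristics explicitly. On Linux and most POSIX systems, posix_fadvise() lets a program declare its access pattern up front; POSIX_FADV_SEQUENTIAL typically enables more aggressive read-ahead. It is purely a hint (the kernel may ignore it, and it never changes I/O semantics), and it is not available on every platform. A minimal sketch:

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>

/* Open a file for a single front-to-back pass, hinting the kernel
 * to use aggressive read-ahead. Offset 0 with len 0 means the hint
 * covers the whole file. */
int open_for_streaming(const char *path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    return fd;
}
```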
A classic systems programming heuristic: sequential disk access can be 1000x faster than random access on HDDs. While SSDs narrow this gap dramatically, sequential access still wins—often by 10-50x for throughput-limited workloads. Always prefer sequential access when designing file formats and I/O patterns.
Buffering is crucial to achieving good performance with sequential access. It occurs at multiple layers, and understanding each layer helps you write efficient I/O code.
Layer 1: Application Buffers
At the application level, you control buffer size when calling read()/write(). The choice matters:
Too small (byte-at-a-time) — Each byte triggers a system call. System call overhead (~100ns-1μs) dominates, reducing throughput by 100-1000x.
Too large — Memory pressure increases; diminishing returns beyond buffer cache size.
Optimal (4KB-1MB) — Matches file system block size or buffer cache policies; amortizes system call overhead effectively.
The math:
```c
/*
 * Impact of buffer size on sequential read performance
 * Reading 100MB file with different buffer sizes
 *
 * Typical results on modern system:
 *
 * Buffer Size | System Calls | Overhead    | Effective Rate
 * ------------|--------------|-------------|---------------
 * 1 byte      | 104,857,600  | ~100 seconds| ~1 MB/s
 * 64 bytes    | 1,638,400    | ~1.6 seconds| ~60 MB/s
 * 4 KB        | 25,600       | ~0.025 sec  | ~4 GB/s
 * 64 KB       | 1,600        | ~0.002 sec  | ~50 GB/s
 * 1 MB        | 100          | ~0.0001 sec | ~1 TB/s (cache hits)
 *
 * Note: Beyond ~64KB, benefits plateau as we're limited by
 * disk throughput, not system call overhead.
 */

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <time.h>

void benchmark_buffer_size(const char *filename, size_t bufsize) {
    char *buffer = malloc(bufsize);
    int fd = open(filename, O_RDONLY);
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);

    ssize_t bytes;
    size_t total = 0;
    while ((bytes = read(fd, buffer, bufsize)) > 0) {
        total += bytes;
    }

    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed = (end.tv_sec - start.tv_sec) +
                     (end.tv_nsec - start.tv_nsec) / 1e9;
    printf("Buffer %8zu: %.3f seconds, %.2f MB/s\n",
           bufsize, elapsed, (total / 1e6) / elapsed);

    close(fd);
    free(buffer);
}
```

Layer 2: C Library Buffering (stdio)
Functions like fread(), fwrite(), fgets() use an intermediate buffer (typically 4KB-8KB) in user space:
```c
// This is actually efficient:
while ((c = fgetc(file)) != EOF) {
    process(c);
}
// fgetc() reads from an internal buffer, not the kernel.
// Only refills the buffer with read() every 4KB-8KB.
```
The stdio library provides buffering modes that you can control:
_IOFBF — Full buffering (default for regular files)
_IOLBF — Line buffering (default for terminals)
_IONBF — No buffering

Layer 3: Kernel Buffer Cache
The operating system maintains a buffer cache (page cache in Linux) that holds recently accessed file blocks in RAM. Benefits: repeated reads are served at memory speed, read-ahead data is staged here before the application requests it, and writes can be acknowledged before the data reaches the disk.
Layer 4: Disk Controller Cache
Modern disk controllers have 8-256MB of DRAM cache used for read-ahead buffering, write caching, and reordering queued commands.
For most sequential I/O, use buffer sizes of 64KB to 256KB. Smaller buffers waste CPU on system calls; larger buffers rarely improve throughput beyond disk limits. For memory-constrained environments, 4KB-16KB is a reasonable minimum. Match buffer sizes to file system block sizes when possible.
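Putting this guidance together, a sequential file copy with a 64KB buffer might look like the sketch below (paths are supplied by the caller; as a simplification, a short write is treated as an error rather than retried):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Copy src to dst sequentially using a 64KB buffer.
 * Returns total bytes copied, or -1 on error.
 * Simplification: a short write is treated as an error. */
long copy_file(const char *src, const char *dst) {
    enum { COPY_BUF = 64 * 1024 };
    char *buf = malloc(COPY_BUF);
    int in = open(src, O_RDONLY);
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    long total = -1;

    if (buf && in >= 0 && out >= 0) {
        total = 0;
        ssize_t n;
        while ((n = read(in, buf, COPY_BUF)) > 0) {
            if (write(out, buf, (size_t)n) != n) { total = -1; break; }
            total += n;
        }
        if (n < 0) total = -1;  /* read error */
    }

    if (in >= 0) close(in);
    if (out >= 0) close(out);
    free(buf);
    return total;
}
```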
Sequential access is not merely a historical artifact—it remains the dominant access pattern for numerous critical workloads. Understanding these use cases illuminates why operating systems and storage systems optimize so heavily for sequential I/O.
1. Log Files and Event Streams
Application logs, system logs, transaction logs, and event streams are inherently sequential: records are appended at the end as events occur, and consumers read them front to back (or tail the newest entries).
Examples: syslog, web server access logs, database transaction logs (WAL), Kafka message streams.
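An append-only log writer is nearly a one-liner with O_APPEND: every write() atomically positions at end-of-file first, so concurrent writers cannot clobber each other's records. A sketch (path and message are illustrative):

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Append one message to a log file. O_APPEND moves the file
 * pointer to EOF atomically before each write, even when several
 * processes write to the same log. */
int log_line(const char *path, const char *msg) {
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return -1;
    ssize_t n = write(fd, msg, strlen(msg));
    close(fd);
    return n < 0 ? -1 : 0;
}
```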
2. Media Files (Audio, Video, Images)
Media consumption is fundamentally sequential: players decode audio samples and video frames in presentation order, streaming the file from start to finish.
Exceptions exist (seeking in video), but dominant access is sequential streaming.
3. Backup and Archival
Backup operations involve reading each source file front to back and writing the archive as one long sequential stream.
4. Batch Data Processing
Data pipelines often process files sequentially: read the input front to back, transform each record, and append the results to an output file (ETL jobs, log aggregation, MapReduce-style processing).
5. File Transfer and Replication

Copying, uploading, or replicating a file reads the source and writes the destination strictly in order.
| Workload | Read Pattern | Write Pattern | Sequentiality |
|---|---|---|---|
| Log files | Full scan or tail | Append-only | 100% |
| Video playback | Linear streaming | N/A | 95%+ (rare seeks) |
| File backup | Full file copy | Sequential write | 100% |
| Database bulk load | Sequential read | Append writes | 100% |
| Compiler output | N/A | Sequential write | 100% |
| Document editing | Full load on open | Full save on close | 95%+ |
When designing file formats or I/O-heavy applications, ask: 'Can this workload be sequential?' Restructuring random access patterns into sequential patterns (e.g., sorting before processing, using append-only logs) often yields order-of-magnitude performance improvements.
While sequential access is often optimal, certain workloads fundamentally require other access patterns. Recognizing these cases is crucial for selecting the right approach.
Workloads Where Sequential Access Is Inadequate:

Database point queries — fetching one record by key from a large table requires jumping directly to it.
Key-value stores and indexes — lookups touch small, scattered regions of large files.
Virtual memory and memory-mapped files — pages are faulted in unpredictable order.
In-place updates — modifying a record in the middle of a file without rewriting everything around it.
The Cost of Inappropriate Sequential Access:
Using sequential access where random access is needed leads to severe inefficiency:
Full table scans — Database without indexes sequentially scans entire tables for single-record queries. O(n) instead of O(log n).
Linear search — Finding an element in an unsorted file is O(n). With proper indexing and random access, it's O(log n) or O(1).
Read amplification — To read 1 byte at position 1,000,000, sequential access must read all preceding 999,999 bytes.
Update inefficiency — Modifying one record in a sequential file may require rewriting the entire file.
Example: The Database Index Problem
```c
#include <sys/types.h>
#include <unistd.h>

/*
 * Comparison: Finding record by ID in a 1GB file
 * File contains 10 million 100-byte records
 */

// Illustrative types: a 100-byte record, plus an opaque in-memory
// index (index_t and index_lookup are assumed helpers, not shown).
typedef struct { int id; char payload[96]; } record_t;
typedef struct index_t index_t;
off_t index_lookup(index_t *index, int target_id);

// Sequential scan approach
// Must read on average half the file to find a record
// Average: 500MB read = ~3 seconds on HDD, ~0.2s on SSD
record_t find_sequential(int fd, int target_id) {
    record_t record;

    // Read every record until we find the target
    while (read(fd, &record, sizeof(record)) == sizeof(record)) {
        if (record.id == target_id) {
            return record;  // Found it!
        }
    }

    // Record not found - traversed entire file
    return (record_t){0};
}

// Random access with index approach
// Index tells us exact byte offset of target record
// Read exactly one 100-byte record
// Time: ~10ms on HDD, ~0.1ms on SSD
record_t find_indexed(int fd, int target_id, index_t *index) {
    // Look up record offset in in-memory index (hash table or tree)
    off_t offset = index_lookup(index, target_id);  // O(1) or O(log n)

    // Seek directly to record location
    lseek(fd, offset, SEEK_SET);

    // Read only the single record we need
    record_t record;
    read(fd, &record, sizeof(record));
    return record;
}

/*
 * Performance comparison for single lookup:
 * - Sequential: 3000ms (HDD) / 200ms (SSD)
 * - Indexed: 10ms (HDD) / 0.1ms (SSD)
 *
 * That's 300x to 2000x faster!
 * The difference compounds with query volume.
 */
```

The key insight is not that sequential access is 'better' or 'worse' but that it suits specific workloads. Logs and streaming media are naturally sequential. Databases require indexed random access. Choosing the wrong access pattern can degrade performance by orders of magnitude.
We've conducted a deep examination of sequential access—the most fundamental file access pattern. Let's consolidate the critical concepts:

The model — a file is a linear stream; reads and writes proceed in the order data was stored.
The file pointer — lives in the system-wide open file table; it advances automatically, is shared across fork(), and is independent across separate open() calls.
Performance — sequential access approaches a device's maximum throughput on HDDs and SSDs alike, helped by OS read-ahead, write coalescing, and contiguous allocation.
Buffering — 64KB-256KB application buffers amortize system call overhead; stdio, the kernel page cache, and controller caches add further layers.
Use cases — logs, media streaming, backups, and batch pipelines are naturally sequential; point lookups and in-place updates call for random access instead.
What's next:
Sequential access excels when you process files from start to finish, but what if you need to jump directly to a specific record without reading everything before it? The next page explores Direct (Random) Access—how to read and write at arbitrary positions within a file, the system calls that enable it, and the performance implications of breaking the sequential pattern.
You now have a comprehensive understanding of sequential file access—the conceptual model, implementation mechanics, performance characteristics, and real-world applications. This foundation prepares you to appreciate why and when other access methods are necessary, starting with direct random access.