Sequential access treats a file like a river—you can only go with the flow, moving steadily from source to sea. But what if you need to teleport upstream? What if you must read byte 1,000,000 without first reading bytes 0 through 999,999?
Direct access (also called random access) answers this need. It provides the ability to position the file pointer at any arbitrary offset and perform read/write operations there—instantly, without traversing intervening data.
The term 'random' doesn't imply randomness in the probabilistic sense. It means access at any position, in any order, as opposed to the strictly linear pattern of sequential access. Think of it like the difference between a cassette tape (sequential: must fast-forward/rewind through all intermediate content) and a CD (random: the laser can jump directly to any track).
Direct access is the foundation of all indexed data structures: databases, file system metadata, search indexes, and countless applications where looking up specific records by key is the dominant operation. Without it, modern computing as we know it—where billions of database queries execute per second—would be impossible.
By the end of this page, you will master direct file access—understanding the lseek() system call in depth, how random positioning interacts with read/write operations, the performance costs of breaking sequential patterns, and the critical use cases where random access provides orders-of-magnitude improvements over sequential scanning.
Direct access reframes how we think about files. Instead of a stream to be consumed, a file becomes an array of bytes that can be addressed by index. Just as you can access array[500] without first touching array[0] through array[499], direct access lets you read file[500] directly.
Key characteristics of direct access:

Arbitrary positioning — the file pointer can be moved to any byte offset, forward or backward, in any order.
Position-independent cost — reaching offset K does not require reading the K bytes before it.
Explicit offsets — the program, not the file's linear structure, decides where the next read or write happens.
The array metaphor:
Think of a file as a very large byte array persisted to disk:
File: [B₀][B₁][B₂][B₃] ... [B₁₀₀₀₀₀₀] ... [Bₙ₋₁]
       ↑                        ↑
   offset 0             offset 1,000,000
With direct access, you can position at offset 1,000,000 and read that byte immediately, without touching any of the bytes before it.
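To make the array metaphor concrete, here is a minimal sketch (the function name and test file are illustrative, not from a real API) that fetches a single byte at an arbitrary offset without reading anything before it:

```c
#include <fcntl.h>
#include <unistd.h>

/* Read the single byte at `offset`, touching nothing before it.
 * Returns the byte value (0-255), or -1 on error / past EOF. */
int read_byte_at(const char *path, off_t offset) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    unsigned char b;
    /* pread positions and reads in one call -- conceptually file[offset] */
    ssize_t n = pread(fd, &b, 1, offset);
    close(fd);

    return (n == 1) ? b : -1;
}
```

Conceptually this is `file[offset]`: one open, one positioned read, with cost independent of how large the file is.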
Historical Context:
Direct access became practical with magnetic disk drives in the 1960s. Unlike tape (strictly sequential), disks had movable read/write heads that could position over any track. While seeking wasn't free (it required mechanical head movement), it was vastly faster than reading all intervening data.
Modern SSDs take this further—with no mechanical parts, positioning is essentially instantaneous. The byte-addressable abstraction that seemed like a convenient fiction on HDDs becomes physically accurate on solid-state media.
The file abstraction presents a clean byte-array interface regardless of underlying storage. Whether the file lives on spinning rust (HDD), flash chips (SSD), network storage (NFS), or RAM-based filesystem (tmpfs), the lseek/read/write interface remains identical. This virtualization is a core operating system contribution.
The lseek() system call is the gateway to direct access. It repositions the file offset (file pointer) for an open file descriptor, determining where the next read or write will occur.
The signature:
#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence);
Parameters in depth:
fd — An open file descriptor. Must refer to a seekable file (regular files and block devices are seekable; pipes, FIFOs, sockets, and terminal devices are not).
offset — A signed integer specifying the offset. Its interpretation depends on whence.
whence — The reference point for the offset. Three standard values:
SEEK_SET — Offset from the beginning of the file. New position = offset.
SEEK_CUR — Offset from the current position. New position = current + offset.
SEEK_END — Offset from the end of the file. New position = file_size + offset.

Return value: The new file offset measured from the beginning, or -1 on error (with errno set).
| Call | Current Pos | File Size | New Position |
|---|---|---|---|
| lseek(fd, 0, SEEK_SET) | any | any | 0 (file start) |
| lseek(fd, 100, SEEK_SET) | any | any | 100 |
| lseek(fd, 50, SEEK_CUR) | 100 | any | 150 |
| lseek(fd, -30, SEEK_CUR) | 100 | any | 70 |
| lseek(fd, 0, SEEK_END) | any | 1000 | 1000 (EOF) |
| lseek(fd, -100, SEEK_END) | any | 1000 | 900 |
| lseek(fd, 100, SEEK_END) | any | 1000 | 1100 (past EOF) |
```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/stat.h>

/*
 * Comprehensive lseek() demonstration
 */
void demonstrate_lseek(const char *filename) {
    int fd = open(filename, O_RDWR);
    if (fd < 0) {
        perror("open");
        return;
    }

    // Get file size
    struct stat st;
    fstat(fd, &st);
    printf("File size: %lld bytes\n", (long long)st.st_size);

    // SEEK_SET: Absolute positioning
    off_t pos = lseek(fd, 1000, SEEK_SET);
    printf("After SEEK_SET to 1000: position = %lld\n", (long long)pos);

    // SEEK_CUR: Relative positioning
    pos = lseek(fd, 500, SEEK_CUR);    // Move forward 500
    printf("After SEEK_CUR +500: position = %lld\n", (long long)pos);

    pos = lseek(fd, -200, SEEK_CUR);   // Move backward 200
    printf("After SEEK_CUR -200: position = %lld\n", (long long)pos);

    // SEEK_END: Position relative to file end
    pos = lseek(fd, 0, SEEK_END);      // Go to EOF
    printf("After SEEK_END +0: position = %lld (EOF)\n", (long long)pos);

    pos = lseek(fd, -100, SEEK_END);   // 100 bytes before EOF
    printf("After SEEK_END -100: position = %lld\n", (long long)pos);

    // Get current position without moving (common idiom)
    pos = lseek(fd, 0, SEEK_CUR);
    printf("Current position (via SEEK_CUR +0): %lld\n", (long long)pos);

    // Rewind to beginning
    lseek(fd, 0, SEEK_SET);
    printf("Rewound to start\n");

    close(fd);
}

/*
 * Reading a specific record using direct access
 * Assumes fixed-size records of sizeof(Record) bytes each
 */
typedef struct {
    int id;
    char name[48];
    double value;
    char padding[40];  // Pads the record (with alignment, sizeof(Record) == 104 on typical LP64)
} Record;

Record read_record_by_number(int fd, int record_num) {
    Record record;

    // Calculate byte offset: record_num * record_size
    off_t offset = (off_t)record_num * sizeof(Record);

    // Seek directly to that record
    lseek(fd, offset, SEEK_SET);

    // Read the single record
    read(fd, &record, sizeof(Record));

    return record;
}
```

Calling lseek() on pipes, FIFOs, sockets, or terminals returns -1 with errno set to ESPIPE. These are stream-oriented and fundamentally cannot support random positioning. Always verify that lseek() succeeds when working with diverse file types.
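That check can be packaged as a small probe — a sketch using the side-effect-free zero-offset SEEK_CUR idiom (the function name is mine, not a standard API):

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Probe whether a file descriptor supports repositioning.
 * lseek(fd, 0, SEEK_CUR) moves nothing: on a seekable file it returns
 * the current offset; on a pipe/FIFO/socket it fails with ESPIPE.
 * Returns 1 if seekable, 0 otherwise. */
int fd_is_seekable(int fd) {
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}
```

A library that accepts arbitrary descriptors can run this probe once at open time and fall back to a buffering strategy for streams.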
One of the most interesting and often misunderstood aspects of lseek() is that you can seek past the end of a file. The position is not bounded by the file's current size.
What happens when you seek past EOF and write?
When you seek to a position beyond the current file size and then write data, the file is extended. But here's the crucial insight: the gap between the old EOF and the new write position doesn't necessarily consume disk space.
This creates what's called a sparse file or a file with holes. The file system records that those bytes exist (they read as zeros) but doesn't allocate actual disk blocks for them.
Example scenario:
```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/*
 * Creating a sparse file demonstration
 */
void create_sparse_file(const char *filename) {
    // Create a new file
    int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    // Write some data at the beginning
    const char *start_data = "START DATA";
    write(fd, start_data, strlen(start_data));
    printf("Wrote %zu bytes at start\n", strlen(start_data));

    // Seek way past the current file size (1 GB ahead!)
    off_t target = 1024 * 1024 * 1024;  // 1 GB
    lseek(fd, target, SEEK_SET);

    // Write some data at the 1 GB position
    const char *end_data = "END DATA";
    write(fd, end_data, strlen(end_data));
    printf("Wrote %zu bytes at offset %lld\n",
           strlen(end_data), (long long)target);

    close(fd);

    // Check file size vs. disk usage
    struct stat st;
    stat(filename, &st);
    printf("\nFile analysis:\n");
    printf("  Logical size (st_size):    %lld bytes (~1 GB)\n",
           (long long)st.st_size);
    printf("  Actual blocks (st_blocks): %lld\n", (long long)st.st_blocks);
    printf("  Disk usage: %lld bytes (~%lld KB)\n",
           (long long)st.st_blocks * 512,
           (long long)st.st_blocks * 512 / 1024);
    printf("\nThe 'hole' (1 GB gap) uses no disk space!\n");
}

/*
 * Typical output:
 *
 * Wrote 10 bytes at start
 * Wrote 8 bytes at offset 1073741824
 *
 * File analysis:
 *   Logical size (st_size):    1073741832 bytes (~1 GB)
 *   Actual blocks (st_blocks): 16
 *   Disk usage: 8192 bytes (~8 KB)
 *
 * The 'hole' (1 GB gap) uses no disk space!
 */
```

Use cases for sparse files:
Virtual machine disk images — A 100GB virtual disk can be created instantly; only blocks actually written consume space.
Database pre-allocation — Reserve logical space for growth without consuming physical storage immediately.
Log files with timestamps — Seek to byte offset based on timestamp for time-based access.
Core dumps — Large regions of unmapped memory appear as holes in the dump file.
Torrent downloads — File is created at full size immediately; pieces are filled in as downloaded.
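The "created at full size immediately" trick doesn't even require a seek-and-write: `ftruncate()` extends a file's logical size directly, and on hole-supporting file systems the entire extension is a hole. A sketch (function name and sizes are illustrative):

```c
#include <fcntl.h>
#include <unistd.h>

/* Create `path` with logical size `size` without writing any data.
 * On file systems that support holes, this consumes almost no blocks;
 * every byte reads back as zero until written.
 * Returns 0 on success, -1 on error. */
int create_at_full_size(const char *path, off_t size) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;

    int rc = ftruncate(fd, size);  /* extend: the new region is one big hole */
    close(fd);
    return rc;
}
```

This is how a downloader can reserve a multi-gigabyte file instantly and then fill in pieces with pwrite() as they arrive.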
Reading holes:
When you read from a hole (a region that was never written), the file system returns zeros. This is transparent to the application—it appears as if the file contains zeros in those positions.
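A sketch verifying that behavior: write one byte far past EOF, then scan the untouched gap and confirm it reads as zeros (the helper name is mine):

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Returns 1 if the `len` bytes starting at `offset` in `path` are all
 * zero, 0 if any byte is nonzero, -1 on I/O error. */
int region_is_zero(const char *path, off_t offset, size_t len) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    char buf[256];
    char zeros[256] = {0};
    int result = 1;

    while (len > 0) {
        size_t chunk = len < sizeof(buf) ? len : sizeof(buf);
        // pread reads the hole region; the kernel materializes zeros
        if (pread(fd, buf, chunk, offset) != (ssize_t)chunk) {
            result = -1;
            break;
        }
        if (memcmp(buf, zeros, chunk) != 0) {
            result = 0;
            break;
        }
        offset += chunk;
        len -= chunk;
    }
    close(fd);
    return result;
}
```

The application never sees a "hole" as a distinct thing; it just receives zero bytes, exactly as if they had been written.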
Caution with holes:
cp and tar may expand holes — Copying a sparse file with naive tools can create a dense file that consumes the full logical size. Use 'cp --sparse=always' or 'tar --sparse'.
Not all file systems support holes — FAT32, for example, does not. Holes become real zeros.
Disk quotas count logical size — Some quota systems count the full logical size, not actual blocks.
Fragmentation — Sparse files with scattered writes can become severely fragmented.
A common pattern in direct-access code is:
lseek(fd, offset, SEEK_SET);
read(fd, buffer, count);
This works, but it has a problem: it's not atomic. In a multi-threaded application where multiple threads share a file descriptor, another thread could seek between your lseek and read, corrupting both operations.
The pread() and pwrite() system calls solve this by combining the seek and I/O into a single atomic operation:
#include <unistd.h>
ssize_t pread(int fd, void *buf, size_t count, off_t offset);
ssize_t pwrite(int fd, const void *buf, size_t count, off_t offset);
Key differences from lseek + read/write:
| Aspect | lseek + read | pread |
|---|---|---|
| Atomicity | Two separate syscalls; race-prone | Single atomic operation |
| File pointer | Modified by both operations | Not modified at all |
| System call count | 2 | 1 |
| Thread safety | Requires external locking | Inherently thread-safe per-call |
| Use case | Sequential with occasional seeks | True random access patterns |
```c
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <pthread.h>
#include <string.h>

/*
 * Thread-safe random access using pread/pwrite
 * Multiple threads can safely access different offsets simultaneously
 */

#define RECORD_SIZE 100
#define NUM_RECORDS 10000

typedef struct {
    int fd;
    int record_num;
    char data[RECORD_SIZE];
} ThreadArgs;

// Thread-safe record read using pread
void* read_record_thread(void *arg) {
    ThreadArgs *args = (ThreadArgs*)arg;
    off_t offset = (off_t)args->record_num * RECORD_SIZE;

    // pread is atomic and doesn't modify the shared file pointer
    ssize_t bytes = pread(args->fd, args->data, RECORD_SIZE, offset);
    if (bytes != RECORD_SIZE) {
        fprintf(stderr, "Short read for record %d\n", args->record_num);
    }
    return NULL;
}

// Thread-safe record write using pwrite
void* write_record_thread(void *arg) {
    ThreadArgs *args = (ThreadArgs*)arg;
    off_t offset = (off_t)args->record_num * RECORD_SIZE;

    // pwrite is atomic and doesn't modify the shared file pointer
    ssize_t bytes = pwrite(args->fd, args->data, RECORD_SIZE, offset);
    if (bytes != RECORD_SIZE) {
        fprintf(stderr, "Short write for record %d\n", args->record_num);
    }
    return NULL;
}

/*
 * Demonstration: Multiple threads reading different records concurrently
 */
void concurrent_random_access(const char *filename) {
    int fd = open(filename, O_RDONLY);
    if (fd < 0) return;

    pthread_t threads[10];
    ThreadArgs args[10];

    // Launch 10 threads reading 10 different records simultaneously
    for (int i = 0; i < 10; i++) {
        args[i].fd = fd;
        args[i].record_num = i * 1000;  // Records 0, 1000, 2000, ...
        pthread_create(&threads[i], NULL, read_record_thread, &args[i]);
    }

    // Wait for all threads
    for (int i = 0; i < 10; i++) {
        pthread_join(threads[i], NULL);
        printf("Thread %d read record %d\n", i, args[i].record_num);
    }

    close(fd);
}
```

Use pread() and pwrite() for: (1) Multi-threaded access to shared file descriptors, (2) Database-style random record access, (3) Any situation where you need to read/write at a specific offset without affecting or being affected by the file pointer. They're the gold standard for modern random-access file I/O.
While direct access provides immense flexibility, it comes with performance trade-offs that every systems programmer must understand. The costs vary dramatically based on storage technology.
Hard Disk Drives (HDDs):
On spinning disks, random access incurs physical costs that sequential access avoids:
Seek time — Moving the actuator arm to a different track takes 3-15ms on average.
Rotational latency — Waiting for the target sector to rotate under the head adds another 2-8ms on average.
No read-ahead benefit — The OS cannot prefetch data since it doesn't know where you'll seek next.
For random 4KB reads on an HDD, throughput is on the order of 100-200 IOPS — well under 1 MB/s of useful data. Compare sequential 4KB reads, which stream at 100-200 MB/s once the head is positioned. The ratio: random access on HDD is 100-500x slower than sequential.
| Storage Type | Random 4KB Read | Sequential Read | Random Penalty |
|---|---|---|---|
| HDD (7200 RPM) | ~100 IOPS (0.4 MB/s) | ~150 MB/s | ~375x |
| HDD (15K RPM) | ~200 IOPS (0.8 MB/s) | ~200 MB/s | ~250x |
| SATA SSD | ~50K IOPS (200 MB/s) | ~500 MB/s | ~2.5x |
| NVMe SSD | ~500K IOPS (2 GB/s) | ~5 GB/s | ~2.5x |
| Intel Optane | ~500K IOPS (2 GB/s) | ~2.5 GB/s | ~1.25x |
| RAM (tmpfs) | ~10M+ IOPS | ~10 GB/s | ~1x |
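Numbers like these come from micro-benchmarks along the following lines — a hedged sketch (block count and iteration counts are illustrative, and a real benchmark must also defeat the page cache, e.g. via O_DIRECT or by dropping caches between runs):

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define BLOCK 4096

/* Read `count` 4KB blocks front-to-back; returns total bytes read. */
long read_sequential(int fd, int count) {
    char buf[BLOCK];
    long total = 0;
    lseek(fd, 0, SEEK_SET);           // start at the beginning
    for (int i = 0; i < count; i++)
        total += read(fd, buf, BLOCK);
    return total;
}

/* Read `count` 4KB blocks at pseudo-random offsets within the first
 * `nblocks` blocks of the file; returns total bytes read. */
long read_random(int fd, int count, int nblocks) {
    char buf[BLOCK];
    long total = 0;
    for (int i = 0; i < count; i++) {
        off_t offset = (off_t)(rand() % nblocks) * BLOCK;
        total += pread(fd, buf, BLOCK, offset);  // seek+read in one call
    }
    return total;
}
```

Wrap each function in clock_gettime() timing to reproduce the table's ratios. On a fully cached file both paths look equally fast — itself an instructive result, since the penalty lives in the storage device, not the syscall interface.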
Solid State Drives (SSDs):
SSDs dramatically improve random access performance because they have no mechanical parts.
The random/sequential gap narrows significantly but doesn't disappear:
Page granularity — SSDs read in pages (4-16KB). Small random reads still incur per-page overhead.
Controller queuing — Deep command queues help random workloads; shallow queues favor sequential.
Internal organization — Sequential writes align with erase block boundaries; random writes cause write amplification.
Practical implications:
SSDs have transformed random access from 'avoid at all costs' to 'use judiciously'. Workloads that were impossible on HDD (e.g., serving millions of small random reads per second) are routine on SSD. This shift underpins modern databases, caching systems, and cloud storage.
Direct access isn't just an alternative to sequential access—for certain workloads, it's the only viable approach. Let's examine the canonical use cases:
1. Database Systems
Databases are the poster child for random access.
A single SQL query like SELECT * FROM users WHERE id = 42 might require 3-5 random reads (index levels + data page).
2. File System Metadata
File systems themselves rely heavily on random access — locating inodes, directory entries, and free-space maps all means jumping to specific on-disk offsets.
3. Memory-Mapped File Editing
Document editors and IDEs use memory mapping (covered later), which inherently provides random access — an edit touches only the bytes in the modified region, not the whole file.
4. Virtual Machine Disk Images
VM disk images emulate block devices: the guest OS issues reads and writes at arbitrary block addresses, which the host translates into random access within the image file.
5. Game Asset Loading
Modern games bundle assets in archive files; loading a specific texture or model means seeking directly to its recorded offset within the archive.
| Application | Read Pattern | Write Pattern | Random % |
|---|---|---|---|
| OLTP Database | Index lookups | Row updates, log appends | 80-95% |
| File System | Metadata ops | Block alloc, journal | 60-80% |
| Text Editor | Scroll, search | Local modifications | 70-90% |
| VM Disk Image | Guest I/O | Guest I/O | 50-90% |
| Game Engine | Asset loading | Save states | 40-70% |
| Web Server | Small file reads | Logging (sequential) | 20-40% |
Real applications typically exhibit hybrid access patterns. A database has random reads for queries but sequential writes for WAL. A web server serves mostly small random files but writes logs sequentially. Understanding your workload's access pattern distribution is key to optimization.
A powerful application of direct access is record-based file organization, where a file contains a sequence of fixed-size records that can be accessed by record number.
The model:
File: [Record 0][Record 1][Record 2]...[Record N-1]
| | |
0 100 200 ... (byte offsets)
If each record is 100 bytes, accessing record K means seeking to offset K * 100.
Complete implementation:
```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <string.h>

/*
 * Fixed-size record file implementation
 * Provides O(1) access to any record by number
 */

typedef struct {
    int id;
    char name[64];
    double balance;
    int flags;
    char reserved[8];
} Record;  // sizeof(Record) == 96 with typical alignment; stored padded to 100

#define RECORD_SIZE 100
_Static_assert(sizeof(Record) <= RECORD_SIZE, "Record too large");

typedef struct {
    int fd;
    int record_count;
} RecordFile;

// Open or create a record file
RecordFile* record_file_open(const char *filename, int create) {
    RecordFile *rf = malloc(sizeof(RecordFile));
    int flags = O_RDWR;
    if (create) flags |= O_CREAT | O_TRUNC;

    rf->fd = open(filename, flags, 0644);
    if (rf->fd < 0) {
        free(rf);
        return NULL;
    }

    // Determine record count from file size
    off_t size = lseek(rf->fd, 0, SEEK_END);
    rf->record_count = size / RECORD_SIZE;
    return rf;
}

// Read a record by number (0-indexed)
int record_read(RecordFile *rf, int record_num, Record *out) {
    if (record_num < 0 || record_num >= rf->record_count) {
        return -1;  // Out of bounds
    }

    char buffer[RECORD_SIZE];
    off_t offset = (off_t)record_num * RECORD_SIZE;

    // Use pread for thread safety
    ssize_t bytes = pread(rf->fd, buffer, RECORD_SIZE, offset);
    if (bytes != RECORD_SIZE) {
        return -1;  // Read error
    }

    memcpy(out, buffer, sizeof(Record));
    return 0;
}

// Write a record by number (0-indexed)
int record_write(RecordFile *rf, int record_num, const Record *record) {
    char buffer[RECORD_SIZE];
    memset(buffer, 0, RECORD_SIZE);
    memcpy(buffer, record, sizeof(Record));

    off_t offset = (off_t)record_num * RECORD_SIZE;

    // Use pwrite for thread safety
    ssize_t bytes = pwrite(rf->fd, buffer, RECORD_SIZE, offset);
    if (bytes != RECORD_SIZE) {
        return -1;  // Write error
    }

    // Update record count if we extended the file
    if (record_num >= rf->record_count) {
        rf->record_count = record_num + 1;
    }
    return 0;
}

// Append a new record, return its record number
int record_append(RecordFile *rf, const Record *record) {
    int new_num = rf->record_count;
    if (record_write(rf, new_num, record) != 0) {
        return -1;
    }
    return new_num;
}

// Delete a record (mark as deleted; actual deletion is complex)
int record_delete(RecordFile *rf, int record_num) {
    Record empty = {0};
    empty.id = -1;  // Convention: id=-1 means deleted
    return record_write(rf, record_num, &empty);
}

// Usage example
void demo_record_file() {
    RecordFile *rf = record_file_open("accounts.dat", 1);

    // Add some records
    Record r1 = {.id = 1001, .name = "Alice",   .balance = 5000.00};
    Record r2 = {.id = 1002, .name = "Bob",     .balance = 3200.50};
    Record r3 = {.id = 1003, .name = "Charlie", .balance = 8100.75};

    int n1 = record_append(rf, &r1);
    int n2 = record_append(rf, &r2);
    int n3 = record_append(rf, &r3);
    printf("Created records at positions: %d, %d, %d\n", n1, n2, n3);

    // Random access: read record 1 (Bob)
    Record lookup;
    record_read(rf, 1, &lookup);
    printf("Record 1: %s, balance: %.2f\n", lookup.name, lookup.balance);

    // Update record 1
    lookup.balance -= 500.00;
    record_write(rf, 1, &lookup);

    close(rf->fd);
    free(rf);
}
```

Always pad records to fixed sizes that align with disk block sizes (ideally powers of 2: 64, 128, 256, 512 bytes). This prevents records from spanning block boundaries, which doubles the I/O cost and complicates atomic updates.
We've thoroughly explored direct (random) file access—the ability to read and write at arbitrary file positions without processing intervening data. Let's consolidate the critical points:
What's next:
Direct access gives us the ability to seek anywhere, but what if we need to find records by key rather than by position? What if records are variable-size? The next page explores Indexed Access—how to build and use indexes that map logical keys to physical file locations, enabling efficient lookup without knowing byte offsets in advance.
You now command a deep understanding of direct file access—the lseek/pread/pwrite system calls, sparse files, performance implications across storage media, and critical use cases. This foundation prepares you for understanding how indexes build efficient key-based lookup on top of random access primitives.