If files are the nouns of persistent storage, then file operations are the verbs—the actions that breathe life into the static concept of a file. Every interaction between a program and a file—whether reading configuration at startup, writing logs during execution, or saving documents on user request—occurs through a well-defined set of operations.
These operations form a contract between the operating system and applications. Understand this contract, and you understand how all file I/O works, from the simplest script to the most complex database engine. Violate this contract, and you face data corruption, race conditions, and mysterious failures.
This page explores every major file operation in depth—not just what they do, but how they work, why they're designed as they are, and what pitfalls await the unwary programmer.
By the end of this page, you will understand the complete lifecycle of file access, from opening to closing. You'll master the semantics of read and write operations, understand the critical role of file pointers and seeking, and learn about atomic operations, locking, and the subtle differences between operating systems.
Every file access follows a fundamental pattern—a lifecycle that applies regardless of programming language, operating system, or file system:
The Basic Pattern:
1. OPEN → Establish connection to file, get file descriptor
2. READ → Transfer data from file to memory
   WRITE → Transfer data from memory to file
   SEEK → Change position within file
   (repeat as needed)
3. CLOSE → Release file descriptor, flush buffers
This pattern is so fundamental that it appears in every programming interface:
| Language | Open | Read | Write | Close |
|---|---|---|---|---|
| C | fopen() | fread() | fwrite() | fclose() |
| Python | open() | file.read() | file.write() | file.close() |
| Java | new FileInputStream() | read() | write() | close() |
| Go | os.Open() | Read() | Write() | Close() |
| Rust | File::open() | read() | write() | drop |
Failing to close files causes resource leaks (file descriptors are limited), may leave data unwritten (buffers not flushed), and can prevent other processes from accessing the file. Modern languages use 'with' statements (Python), try-with-resources (Java), or RAII (C++, Rust) to ensure proper cleanup.
Opening a file establishes a connection between a process and a file, returning a file descriptor (Unix) or handle (Windows) that represents this connection.
What Happens During Open:
1. The kernel resolves the path, walking each directory component (following symbolic links unless O_NOFOLLOW is set)
2. Access permissions are checked against the requested mode
3. An entry is created in the kernel's open file table, recording the current offset and the open flags
4. The lowest-numbered free file descriptor is allocated and returned to the process
Open Flags (Unix System Call):
int fd = open("/path/to/file", flags, mode);
| Flag | Meaning |
|---|---|
| O_RDONLY | Open for reading only |
| O_WRONLY | Open for writing only |
| O_RDWR | Open for reading and writing |
| O_CREAT | Create file if it doesn't exist |
| O_EXCL | Fail if file exists (with O_CREAT) |
| O_TRUNC | Truncate file to zero length |
| O_APPEND | Write always appends to end |
| O_SYNC | Synchronous writes (wait for disk) |
| O_DIRECT | Bypass kernel buffer cache |
| O_NOFOLLOW | Don't follow symbolic links |
Common Open Patterns:
// Read existing file
int fd = open("data.txt", O_RDONLY);
// Create new file (fail if exists)
int fd = open("new.txt",
O_WRONLY | O_CREAT | O_EXCL, 0644);
// Overwrite file (create if needed)
int fd = open("output.txt",
O_WRONLY | O_CREAT | O_TRUNC, 0644);
// Append to log file
int fd = open("app.log",
O_WRONLY | O_CREAT | O_APPEND, 0644);
When a process forks, the child inherits copies of the parent's file descriptors. Both parent and child can read/write the same files, sharing the same file offset. This is how shell redirection works: the shell opens files, then the child process uses the inherited descriptors for stdin/stdout/stderr.
Reading transfers data from a file into a process's memory buffer. The operation specifies the file descriptor, a buffer to receive data, and the maximum number of bytes to read.
Basic Read System Call (Unix):
ssize_t bytes_read = read(fd, buffer, count);
Return Values:
- Positive n: number of bytes actually read (may be less than count)
- 0: end of file reached
- -1: error occurred; errno is set (e.g., EBADF, EINTR, EIO)
Critical Insight: Short Reads
A common mistake is assuming read() returns exactly the requested number of bytes. In reality, reads can be "short" for many reasons:
- End of file is reached before count bytes are available
- A signal interrupts the call, possibly after a partial transfer
- A pipe, socket, or terminal delivers only the data currently available
- The kernel simply returns less than requested, which POSIX permits
Correct Pattern for Reading N Bytes:
// WRONG: Assumes full read
char buffer[1000];
read(fd, buffer, 1000); // May read less!
// CORRECT: Loop until all bytes read
char buffer[1000];
size_t total = 0;
while (total < 1000) {
ssize_t n = read(fd, buffer + total, 1000 - total);
if (n <= 0) break; // EOF or error
total += n;
}
| Function | Description | Use Case |
|---|---|---|
| read() | Basic read from current position | General file reading |
| pread() | Read at specific offset (atomic) | Multithreaded file access |
| readv() | Scatter read into multiple buffers | Protocol parsing, efficiency |
| preadv() | Scatter read at offset | Combination of above |
| fread() | Buffered read (C stdio) | Text processing, convenience |
| aio_read() | Asynchronous read | High-performance I/O |
Each successful read advances the file's current position (offset) by the number of bytes read. The next read continues from this new position. This is why sequential reads progress through the file naturally. Use lseek() or pread() to read from other positions.
Writing transfers data from a process's memory buffer to a file. Like read, write specifies a file descriptor, a buffer containing data, and the number of bytes to write.
Basic Write System Call (Unix):
ssize_t bytes_written = write(fd, buffer, count);
Return Values:
- Positive n: number of bytes actually written (may be less than count)
- -1: error occurred; errno is set (e.g., ENOSPC, EINTR, EBADF)
Short Writes Are Real:
Like reads, writes can be short:
- The disk fills up: write() may transfer some bytes before failing with ENOSPC
- A signal interrupts the call after a partial transfer
- A pipe or socket buffer has room for only part of the data

Correct Pattern for Writing N Bytes:
ssize_t write_all(int fd, const void *buf, size_t count) {
const char *ptr = buf;
size_t remaining = count;
while (remaining > 0) {
ssize_t n = write(fd, ptr, remaining);
if (n < 0) {
if (errno == EINTR) continue; // Retry on interrupt
return -1; // Real error
}
ptr += n;
remaining -= n;
}
return count;
}
Durability Timeline:
- write() returns → Data in kernel buffer
- fsync() returns → Data on physical disk
- close() returns → File descriptor released

Sync Mechanisms:
- O_SYNC: Each write waits for disk
- O_DSYNC: Data sync (not all metadata)
- fsync(fd): Flush specific file
- fdatasync(fd): Flush data only
- sync(): Flush entire system

For critical data (databases, financial transactions), you MUST call fsync() after write() to ensure data reaches permanent storage. Without fsync(), data remains in volatile kernel buffers. A power failure or crash can lose any 'written' data that wasn't synced.
Seeking changes the file's current position—the offset where the next read or write will occur. This enables random access to file contents.
The lseek System Call:
off_t new_position = lseek(fd, offset, whence);
Parameters:
- fd: File descriptor
- offset: Number of bytes to move (can be negative)
- whence: Reference point for the offset

Whence Values:
| Value | Meaning | New Position |
|---|---|---|
| SEEK_SET | From file start | offset |
| SEEK_CUR | From current position | current + offset |
| SEEK_END | From file end | size + offset |
Common Seek Patterns:
// Go to beginning of file
lseek(fd, 0, SEEK_SET);
// Go to end of file
lseek(fd, 0, SEEK_END);
// Get current position (seek by 0 from current)
off_t pos = lseek(fd, 0, SEEK_CUR);
// Get file size (seek to end, note position)
off_t size = lseek(fd, 0, SEEK_END);
lseek(fd, 0, SEEK_SET); // Return to start
// Skip 100 bytes forward
lseek(fd, 100, SEEK_CUR);
// Go back 50 bytes
lseek(fd, -50, SEEK_CUR);
// Read byte 1000 specifically
lseek(fd, 1000, SEEK_SET);
read(fd, &byte, 1);
Seeking Past End of File:
You can seek past the end of a file. If you then write, the gap is filled with zeros—creating a sparse file (or "hole"):
int fd = open("sparse.dat", O_WRONLY | O_CREAT, 0644);
lseek(fd, 1000000, SEEK_SET); // Seek to 1 MB offset
write(fd, "X", 1); // Write 1 byte
close(fd);
// File has logical size ~1 MB but uses minimal disk space
Pipes, sockets, and certain device files are not seekable. lseek() returns -1 with errno set to ESPIPE for these. Such files support only sequential access. The pread()/pwrite() functions also fail on non-seekable files.
Closing a file releases the file descriptor and associated resources. It's the final step in the file operation lifecycle.
What Happens During Close:
1. The file descriptor is removed from the process's descriptor table
2. The reference count on the underlying open file description is decremented; when it reaches zero, the description (including its offset) is freed
3. Any POSIX record locks the process holds on the file are released
4. If the file was already unlinked and this was the last open reference, its storage is finally reclaimed
Note that close() does not force data to disk—only fsync() provides that guarantee.
The close System Call:
int result = close(fd);
// Returns 0 on success, -1 on error
Close Can Fail—But Usually Doesn't:
Most programmers assume close() always succeeds. While close failures are rare, they can indicate serious problems:
- Deferred write errors surfacing at close time (common on NFS), meaning earlier writes were silently lost
- EBADF: the descriptor was invalid—often a symptom of a double-close bug elsewhere
- EINTR: the call was interrupted by a signal; whether the descriptor is still open is unspecified by POSIX
Robust Close Pattern:
if (close(fd) != 0) {
perror("close failed");
// For critical files, this indicates data may be lost!
// Log the error and possibly alert the user
}
Closing an already-closed file descriptor is a bug. The descriptor may have been reused for a different file, so you might close the wrong file! Set fd = -1 after close, and check before closing. This is a common source of subtle, hard-to-debug errors.
Language-Level Cleanup:
# Python: with statement ensures close
with open('file.txt', 'r') as f:
data = f.read()
# f is automatically closed here, even if exception occurs
// Java: try-with-resources
try (FileInputStream fis = new FileInputStream("file.txt")) {
// use fis
} // fis is automatically closed
// Rust: RAII - File closes when dropped
{
let f = File::open("file.txt")?;
// use f
} // f is automatically closed when it goes out of scope
Truncation changes a file's size, either shrinking it (discarding data) or extending it (adding zeros).
Truncation System Calls:
// Truncate by path
int result = truncate("/path/to/file", new_size);
// Truncate by file descriptor
int result = ftruncate(fd, new_size);
Effects:
| Situation | Effect |
|---|---|
| new_size < current_size | Data beyond new_size is discarded |
| new_size > current_size | File extended with zeros (sparse) |
| new_size == current_size | No change to data or size |
Common Truncation Patterns:
// Clear a file completely (like > file)
ftruncate(fd, 0);
// Preallocate space for a file
ftruncate(fd, 1000000); // 1 MB
// Chop last 100 bytes from file
off_t size = lseek(fd, 0, SEEK_END);
ftruncate(fd, size - 100);
Using O_TRUNC on Open:
// Atomic open-and-truncate
int fd = open("file.txt", O_WRONLY | O_TRUNC);
// File is now empty, ready for new content
A common pattern for log files is to truncate them to zero rather than delete and recreate. This preserves file descriptors held by logging processes—they continue writing to the same file, now empty, rather than to a deleted file that no longer appears in the directory.
Creating Files:
Files are typically created via open() with the O_CREAT flag, or via creat() (an older interface):
// Modern creation
int fd = open("newfile.txt", O_WRONLY | O_CREAT, 0644);
// With O_EXCL for atomic creation
int fd = open("newfile.txt", O_WRONLY | O_CREAT | O_EXCL, 0644);
if (fd < 0 && errno == EEXIST) {
// File already exists
}
// Legacy creat() - equivalent to open(path, O_CREAT|O_WRONLY|O_TRUNC, mode)
int fd = creat("newfile.txt", 0644);
The mode Argument (Permissions):
When creating files, the mode specifies initial permissions. The actual permissions are modified by the process's umask:
actual_permissions = mode & ~umask
With a common umask of 022, mode 0666 produces permissions 0644 (rw-r--r--).
Deleting Files (unlink):
int result = unlink("/path/to/file");
What unlink Actually Does:
1. Removes the directory entry—the name-to-inode mapping
2. Decrements the inode's link count
3. Frees the inode and its data blocks only when the link count reaches zero and no process still holds the file open
The Delayed Deletion Effect:
If a process has a file open when it's unlinked:
int fd = open("tempfile", O_RDWR | O_CREAT, 0600);
unlink("tempfile"); // Name gone, but file lives
write(fd, "secret", 6); // Still works!
// File exists only through fd
close(fd); // NOW the file truly disappears
The unlink-after-open pattern is used for secure temporary files. The file has no name in the filesystem, so other processes cannot access it. This prevents race conditions and accidental exposure of temporary data.
Renaming changes a file's name without moving its data. Moving within the same filesystem is also just a rename. Moving across filesystems requires copying data.
The rename System Call:
int result = rename("/path/old", "/path/new");
Semantics:
- rename() is atomic: at every instant, the destination name refers to either the old file or the new file—never neither
- If the destination already exists, it is replaced atomically
- Renaming across filesystems fails with an EXDEV error

Atomic Updates with Rename:
Rename's atomicity is crucial for safe file updates:
// WRONG: Modify in place (crash leaves corrupt file)
fd = open("config.json", O_RDWR);
write_new_content(fd);
close(fd);
// CORRECT: Write-then-rename (atomic update)
fd = open("config.json.tmp", O_WRONLY | O_CREAT, 0644);
write_new_content(fd);
fsync(fd); // Ensure data on disk
close(fd);
rename("config.json.tmp", "config.json"); // Atomic swap!
// Power failure at any point leaves valid config
| Scenario | Behavior |
|---|---|
| Source doesn't exist | Error: ENOENT |
| Destination exists (file) | Old destination unlinked, source takes its place |
| Destination exists (directory) | Source must also be directory; dir must be empty |
| Cross-filesystem | Error: EXDEV (use mv which copies) |
| Directory to non-directory | Error: ENOTDIR or EISDIR |
| File open by another process | Rename succeeds; process still has access via fd |
Linux's renameat2() provides additional flags: RENAME_EXCHANGE atomically swaps two files, and RENAME_NOREPLACE fails if the destination exists (unlike implicit overwrites). These enable even more sophisticated atomic operations.
When multiple processes access the same file, file locking prevents conflicts.
Types of Locks:
| Lock Type | Also Called | Allows Concurrent |
|---|---|---|
| Shared Lock | Read lock | Multiple readers, no writers |
| Exclusive Lock | Write lock | One writer, no readers |
Locking Mechanisms:
- flock(): simple advisory locks on whole files (BSD heritage)
- fcntl() with F_SETLK/F_SETLKW: POSIX advisory locks on byte ranges
- lockf(): a thin wrapper around fcntl() locking
- Open file description locks (Linux-specific F_OFD_SETLK): byte-range locks tied to the open file description rather than the process
flock() Example:
#include <sys/file.h>
int fd = open("shared.dat", O_RDWR);
// Acquire exclusive lock (blocks if unavailable)
if (flock(fd, LOCK_EX) == 0) {
// Critical section - only one process here
write(fd, data, len);
flock(fd, LOCK_UN); // Release lock
}
// Non-blocking attempt
if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
if (errno == EWOULDBLOCK) {
printf("File is locked by another process\n");
}
}
fcntl() for Byte-Range Locks:
struct flock fl = {
.l_type = F_WRLCK, // Exclusive lock
.l_whence = SEEK_SET,
.l_start = 0, // Start at byte 0
.l_len = 100, // Lock 100 bytes
};
fcntl(fd, F_SETLKW, &fl); // Wait for lock
// ... work with bytes 0-99 ...
fl.l_type = F_UNLCK;
fcntl(fd, F_SETLK, &fl); // Release
On most Unix systems, file locks are ADVISORY—they only work if all processes cooperate by requesting locks. A rogue process can still read/write without locking. Mandatory locking (enabled via special permissions) exists but is deprecated and should be avoided.
Memory mapping maps a file directly into a process's address space, allowing file access through memory operations rather than read/write calls.
The mmap System Call:
void *addr = mmap(
NULL, // Let kernel choose address
length, // Size to map
PROT_READ | PROT_WRITE, // Permissions
MAP_SHARED, // Sharing mode
fd, // File descriptor
offset // Offset in file
);
if (addr == MAP_FAILED) {
perror("mmap");
}
// Access file as memory
char *data = (char *)addr;
printf("First byte: %c\n", data[0]);
data[100] = 'X'; // Write to file!
// Cleanup
munmap(addr, length);
Many databases (SQLite, LMDB, some configurations of PostgreSQL) use mmap for data files. This lets the OS handle caching efficiently—the database's cache IS the OS page cache, avoiding double-buffering. However, careful use of msync() is needed for durability.
We've explored the complete landscape of file operations—from basic open/read/write/close to advanced topics like locking and memory mapping. Let's consolidate:
- Every file access follows the open → read/write/seek → close lifecycle
- Reads and writes can be short; robust code loops until all bytes are transferred
- write() alone does not guarantee durability—fsync() does
- rename() provides atomic replacement, the foundation of safe file updates
- Unix file locks are advisory; all participants must cooperate
- mmap() turns file I/O into memory access, with msync() for durability
What's next:
Now that we understand file operations, we'll explore file types—the different kinds of files that exist beyond regular files. Directories, symbolic links, device files, pipes, and sockets all use the file abstraction but serve very different purposes.
You now have comprehensive knowledge of file operations—the actions that bring files to life. This foundation enables you to write correct, robust file-handling code, understand performance implications, and debug I/O issues effectively.