If files are the nouns of persistent storage, then file operations are the verbs—the actions that breathe life into the static concept of a file. Every interaction between a program and a file—whether reading configuration at startup, writing logs during execution, or saving documents on user request—occurs through a well-defined set of operations.
These operations form a contract between the operating system and applications. Understand this contract, and you understand how all file I/O works, from the simplest script to the most complex database engine. Violate this contract, and you face data corruption, race conditions, and mysterious failures.
This page explores every major file operation in depth—not just what they do, but how they work, why they're designed as they are, and what pitfalls await the unwary programmer.
By the end of this page, you will understand the complete lifecycle of file access, from opening to closing. You'll master the semantics of read and write operations, understand the critical role of file pointers and seeking, and learn about atomic operations, locking, and the subtle differences between operating systems.
Every file access follows a fundamental pattern—a lifecycle that applies regardless of programming language, operating system, or file system:
The Basic Pattern:
1. OPEN → Establish connection to file, get file descriptor
2. READ → Transfer data from file to memory
   WRITE → Transfer data from memory to file
   SEEK → Change position within file
   (repeat as needed)
3. CLOSE → Release file descriptor, flush buffers
This pattern is so fundamental that it appears in every programming interface:
| Language | Open | Read | Write | Close |
|---|---|---|---|---|
| C | fopen() | fread() | fwrite() | fclose() |
| Python | open() | file.read() | file.write() | file.close() |
| Java | new FileInputStream() | read() | write() | close() |
| Go | os.Open() | Read() | Write() | Close() |
| Rust | File::open() | read() | write() | drop |
Failing to close files causes resource leaks (file descriptors are limited), may leave data unwritten (buffers not flushed), and can prevent other processes from accessing the file. Modern languages use 'with' statements (Python), try-with-resources (Java), or RAII (C++, Rust) to ensure proper cleanup.
Opening a file establishes a connection between a process and a file, returning a file descriptor (Unix) or handle (Windows) that represents this connection.
What Happens During Open:
1. The kernel resolves the path, walking each directory component (following symbolic links unless O_NOFOLLOW is set)
2. Access permissions are checked against the requested mode
3. An entry is created in the kernel's open file table, recording the current offset and the open flags
4. The lowest-numbered free file descriptor is allocated and returned to the process
Open Flags (Unix System Call):
int fd = open("/path/to/file", flags, mode);
| Flag | Meaning |
|---|---|
| O_RDONLY | Open for reading only |
| O_WRONLY | Open for writing only |
| O_RDWR | Open for reading and writing |
| O_CREAT | Create file if it doesn't exist |
| O_EXCL | Fail if file exists (with O_CREAT) |
| O_TRUNC | Truncate file to zero length |
| O_APPEND | Write always appends to end |
| O_SYNC | Synchronous writes (wait for disk) |
| O_DIRECT | Bypass kernel buffer cache |
| O_NOFOLLOW | Don't follow symbolic links |
Common Open Patterns:
// Read existing file
int fd = open("data.txt", O_RDONLY);
// Create new file (fail if exists)
int fd = open("new.txt",
O_WRONLY | O_CREAT | O_EXCL, 0644);
// Overwrite file (create if needed)
int fd = open("output.txt",
O_WRONLY | O_CREAT | O_TRUNC, 0644);
// Append to log file
int fd = open("app.log",
O_WRONLY | O_CREAT | O_APPEND, 0644);
When a process forks, the child inherits copies of the parent's file descriptors. Both parent and child can read/write the same files, sharing the same file offset. This is how shell redirection works: the shell opens files, then the child process uses the inherited descriptors for stdin/stdout/stderr.
Reading transfers data from a file into a process's memory buffer. The operation specifies the file descriptor, a buffer to receive data, and the maximum number of bytes to read.
Basic Read System Call (Unix):
ssize_t bytes_read = read(fd, buffer, count);
Return Values:
- Positive n: number of bytes actually read (may be less than count)
- 0: end of file reached
- -1: error occurred; errno is set (e.g., EBADF, EINTR, EIO)
Critical Insight: Short Reads
A common mistake is assuming read() returns exactly the requested number of bytes. In reality, reads can be "short" for many reasons:
- End of file is reached before count bytes are available
- A signal interrupts the call, possibly after a partial transfer
- A pipe, socket, or terminal delivers only the data currently available
- The kernel simply returns less than requested, which POSIX permits
Correct Pattern for Reading N Bytes:
// WRONG: Assumes full read
char buffer[1000];
read(fd, buffer, 1000); // May read less!
// CORRECT: Loop until all bytes read
char buffer[1000];
size_t total = 0;
while (total < 1000) {
ssize_t n = read(fd, buffer + total, 1000 - total);
if (n <= 0) break; // EOF or error
total += n;
}
| Function | Description | Use Case |
|---|---|---|
| read() | Basic read from current position | General file reading |
| pread() | Read at specific offset (atomic) | Multithreaded file access |
| readv() | Scatter read into multiple buffers | Protocol parsing, efficiency |
| preadv() | Scatter read at offset | Combination of above |
| fread() | Buffered read (C stdio) | Text processing, convenience |
| aio_read() | Asynchronous read | High-performance I/O |
Each successful read advances the file's current position (offset) by the number of bytes read. The next read continues from this new position. This is why sequential reads progress through the file naturally. Use lseek() or pread() to read from other positions.
Writing transfers data from a process's memory buffer to a file. Like read, write specifies a file descriptor, a buffer containing data, and the number of bytes to write.
Basic Write System Call (Unix):
ssize_t bytes_written = write(fd, buffer, count);
Return Values:
- Positive n: number of bytes actually written (may be less than count)
- -1: error occurred; errno is set (e.g., ENOSPC, EINTR, EBADF)
Short Writes Are Real:
Like reads, writes can be short:
- The disk fills up: write() may transfer some bytes before failing with ENOSPC
- A signal interrupts the call after a partial transfer
- A pipe or socket buffer has room for only part of the data

Correct Pattern for Writing N Bytes:
ssize_t write_all(int fd, const void *buf, size_t count) {
const char *ptr = buf;
size_t remaining = count;
while (remaining > 0) {
ssize_t n = write(fd, ptr, remaining);
if (n < 0) {
if (errno == EINTR) continue; // Retry on interrupt
return -1; // Real error
}
ptr += n;
remaining -= n;
}
return count;
}
Durability Timeline:
- write() returns → Data in kernel buffer
- fsync() returns → Data on physical disk
- close() returns → File descriptor released

Sync Mechanisms:
- O_SYNC: Each write waits for disk
- O_DSYNC: Data sync (not all metadata)
- fsync(fd): Flush specific file
- fdatasync(fd): Flush data only
- sync(): Flush entire system

For critical data (databases, financial transactions), you MUST call fsync() after write() to ensure data reaches permanent storage. Without fsync(), data remains in volatile kernel buffers. A power failure or crash can lose any 'written' data that wasn't synced.
Seeking changes the file's current position—the offset where the next read or write will occur. This enables random access to file contents.
The lseek System Call:
off_t new_position = lseek(fd, offset, whence);
Parameters:
- fd: File descriptor
- offset: Number of bytes to move (can be negative)
- whence: Reference point for the offset

Whence Values:
| Value | Meaning | New Position |
|---|---|---|
| SEEK_SET | From file start | offset |
| SEEK_CUR | From current position | current + offset |
| SEEK_END | From file end | size + offset |
Common Seek Patterns:
// Go to beginning of file
lseek(fd, 0, SEEK_SET);
// Go to end of file
lseek(fd, 0, SEEK_END);
// Get current position (seek by 0 from current)
off_t pos = lseek(fd, 0, SEEK_CUR);
// Get file size (seek to end, note position)
off_t size = lseek(fd, 0, SEEK_END);
lseek(fd, 0, SEEK_SET); // Return to start
// Skip 100 bytes forward
lseek(fd, 100, SEEK_CUR);
// Go back 50 bytes
lseek(fd, -50, SEEK_CUR);
// Read byte 1000 specifically
lseek(fd, 1000, SEEK_SET);
read(fd, &byte, 1);
Seeking Past End of File:
You can seek past the end of a file. If you then write, the gap is filled with zeros—creating a sparse file (or "hole"):
int fd = open("sparse.dat", O_WRONLY | O_CREAT, 0644);
lseek(fd, 1000000, SEEK_SET); // Seek to 1 MB offset
write(fd, "X", 1); // Write 1 byte
close(fd);
// File has logical size ~1 MB but uses minimal disk space
Pipes, sockets, and certain device files are not seekable. lseek() returns -1 with errno set to ESPIPE for these. Such files support only sequential access. The pread()/pwrite() functions also fail on non-seekable files.
Closing a file releases the file descriptor and associated resources. It's the final step in the file operation lifecycle.
What Happens During Close:
1. The file descriptor is removed from the process's descriptor table
2. The reference count on the underlying open file description is decremented; when it reaches zero, the description (including its offset) is freed
3. Any POSIX record locks the process holds on the file are released
4. If the file was already unlinked and this was the last open reference, its storage is finally reclaimed
Note that close() does not force data to disk—only fsync() provides that guarantee.
The close System Call:
int result = close(fd);
// Returns 0 on success, -1 on error
Close Can Fail—But Usually Doesn't:
Most programmers assume close() always succeeds. While close failures are rare, they can indicate serious problems:
- Deferred write errors surfacing at close time (common on NFS), meaning earlier writes were silently lost
- EBADF: the descriptor was invalid—often a symptom of a double-close bug elsewhere
- EINTR: the call was interrupted by a signal; whether the descriptor is still open is unspecified by POSIX
Robust Close Pattern:
if (close(fd) != 0) {
perror("close failed");
// For critical files, this indicates data may be lost!
// Log the error and possibly alert the user
}
Closing an already-closed file descriptor is a bug. The descriptor may have been reused for a different file, so you might close the wrong file! Set fd = -1 after close, and check before closing. This is a common source of subtle, hard-to-debug errors.
Language-Level Cleanup:
# Python: with statement ensures close
with open('file.txt', 'r') as f:
data = f.read()
# f is automatically closed here, even if exception occurs
// Java: try-with-resources
try (FileInputStream fis = new FileInputStream("file.txt")) {
// use fis
} // fis is automatically closed
// Rust: RAII - File closes when dropped
{
let f = File::open("file.txt")?;
// use f
} // f is automatically closed when it goes out of scope
Truncation changes a file's size, either shrinking it (discarding data) or extending it (adding zeros).
Truncation System Calls:
// Truncate by path
int result = truncate("/path/to/file", new_size);
// Truncate by file descriptor
int result = ftruncate(fd, new_size);
Effects:
| Situation | Effect |
|---|---|
| new_size < current_size | Data beyond new_size is discarded |
| new_size > current_size | File extended with zeros (sparse) |
| new_size == current_size | No change to data or size |
Common Truncation Patterns:
// Clear a file completely (like > file)
ftruncate(fd, 0);
// Preallocate space for a file
ftruncate(fd, 1000000); // 1 MB
// Chop last 100 bytes from file
off_t size = lseek(fd, 0, SEEK_END);
ftruncate(fd, size - 100);
Using O_TRUNC on Open:
// Atomic open-and-truncate
int fd = open("file.txt", O_WRONLY | O_TRUNC);
// File is now empty, ready for new content
A common pattern for log files is to truncate them to zero rather than delete and recreate. This preserves file descriptors held by logging processes—they continue writing to the same file, now empty, rather than to a deleted file that no longer appears in the directory.
Creating Files:
Files are typically created via open() with the O_CREAT flag, or via creat() (an older interface):
// Modern creation
int fd = open("newfile.txt", O_WRONLY | O_CREAT, 0644);
// With O_EXCL for atomic creation
int fd = open("newfile.txt", O_WRONLY | O_CREAT | O_EXCL, 0644);
if (fd < 0 && errno == EEXIST) {
// File already exists
}
// Legacy creat() - equivalent to open(path, O_CREAT|O_WRONLY|O_TRUNC, mode)
int fd = creat("newfile.txt", 0644);
The mode Argument (Permissions):
When creating files, the mode specifies initial permissions. The actual permissions are modified by the process's umask:
actual_permissions = mode & ~umask
With a common umask of 022, mode 0666 produces permissions 0644 (rw-r--r--).
Deleting Files (unlink):
int result = unlink("/path/to/file");
What unlink Actually Does:
1. Removes the directory entry—the name-to-inode mapping
2. Decrements the inode's link count
3. Frees the inode and its data blocks only when the link count reaches zero and no process still holds the file open
The Delayed Deletion Effect:
If a process has a file open when it's unlinked:
int fd = open("tempfile", O_RDWR | O_CREAT, 0600);
unlink("tempfile"); // Name gone, but file lives
write(fd, "secret", 6); // Still works!
// File exists only through fd
close(fd); // NOW the file truly disappears
The unlink-after-open pattern is used for secure temporary files. The file has no name in the filesystem, so other processes cannot access it. This prevents race conditions and accidental exposure of temporary data.
Renaming changes a file's name without moving its data. Moving within the same filesystem is also just a rename. Moving across filesystems requires copying data.
The rename System Call:
int result = rename("/path/old", "/path/new");
Semantics:
- rename() is atomic: at every instant, the destination name refers to either the old file or the new file—never neither
- If the destination already exists, it is replaced atomically
- Renaming across filesystems fails with an EXDEV error

Atomic Updates with Rename:
Rename's atomicity is crucial for safe file updates:
// WRONG: Modify in place (crash leaves corrupt file)
fd = open("config.json", O_RDWR);
write_new_content(fd);
close(fd);
// CORRECT: Write-then-rename (atomic update)
fd = open("config.json.tmp", O_WRONLY | O_CREAT, 0644);
write_new_content(fd);
fsync(fd); // Ensure data on disk
close(fd);
rename("config.json.tmp", "config.json"); // Atomic swap!
// Power failure at any point leaves valid config
| Scenario | Behavior |
|---|---|
| Source doesn't exist | Error: ENOENT |
| Destination exists (file) | Old destination unlinked, source takes its place |
| Destination exists (directory) | Source must also be directory; dir must be empty |
| Cross-filesystem | Error: EXDEV (use mv which copies) |
| Directory to non-directory | Error: ENOTDIR or EISDIR |
| File open by another process | Rename succeeds; process still has access via fd |
Linux's renameat2() provides additional flags: RENAME_EXCHANGE atomically swaps two files, and RENAME_NOREPLACE fails if the destination exists (unlike implicit overwrites). These enable even more sophisticated atomic operations.
When multiple processes access the same file, file locking prevents conflicts.
Types of Locks:
| Lock Type | Also Called | Allows Concurrent |
|---|---|---|
| Shared Lock | Read lock | Multiple readers, no writers |
| Exclusive Lock | Write lock | One writer, no readers |
Locking Mechanisms:
- flock(): simple advisory locks on whole files (BSD heritage)
- fcntl() with F_SETLK/F_SETLKW: POSIX advisory locks on byte ranges
- lockf(): a thin wrapper around fcntl() locking
- Open file description locks (Linux-specific F_OFD_SETLK): byte-range locks tied to the open file description rather than the process
flock() Example:
#include <sys/file.h>
int fd = open("shared.dat", O_RDWR);
// Acquire exclusive lock (blocks if unavailable)
if (flock(fd, LOCK_EX) == 0) {
// Critical section - only one process here
write(fd, data, len);
flock(fd, LOCK_UN); // Release lock
}
// Non-blocking attempt
if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
if (errno == EWOULDBLOCK) {
printf("File is locked by another process\n");
}
}
fcntl() for Byte-Range Locks:
struct flock fl = {
.l_type = F_WRLCK, // Exclusive lock
.l_whence = SEEK_SET,
.l_start = 0, // Start at byte 0
.l_len = 100, // Lock 100 bytes
};
fcntl(fd, F_SETLKW, &fl); // Wait for lock
// ... work with bytes 0-99 ...
fl.l_type = F_UNLCK;
fcntl(fd, F_SETLK, &fl); // Release
On most Unix systems, file locks are ADVISORY—they only work if all processes cooperate by requesting locks. A rogue process can still read/write without locking. Mandatory locking (enabled via special permissions) exists but is deprecated and should be avoided.
Memory mapping maps a file directly into a process's address space, allowing file access through memory operations rather than read/write calls.
The mmap System Call:
void *addr = mmap(
NULL, // Let kernel choose address
length, // Size to map
PROT_READ | PROT_WRITE, // Permissions
MAP_SHARED, // Sharing mode
fd, // File descriptor
offset // Offset in file
);
if (addr == MAP_FAILED) {
perror("mmap");
}
// Access file as memory
char *data = (char *)addr;
printf("First byte: %c\n", data[0]);
data[100] = 'X'; // Write to file!
// Cleanup
munmap(addr, length);
Many databases (SQLite, LMDB, some configurations of PostgreSQL) use mmap for data files. This lets the OS handle caching efficiently—the database's cache IS the OS page cache, avoiding double-buffering. However, careful use of msync() is needed for durability.
We've explored the complete landscape of file operations—from basic open/read/write/close to advanced topics like locking and memory mapping. Let's consolidate:
- Every file access follows the open → read/write/seek → close lifecycle
- Reads and writes can be short; robust code loops until all bytes are transferred
- write() alone does not guarantee durability—fsync() does
- rename() provides atomic replacement, the foundation of safe file updates
- Unix file locks are advisory; all participants must cooperate
- mmap() turns file I/O into memory access, with msync() for durability
What's next:
Now that we understand file operations, we'll explore file types—the different kinds of files that exist beyond regular files. Directories, symbolic links, device files, pipes, and sockets all use the file abstraction but serve very different purposes.
You now have comprehensive knowledge of file operations—the actions that bring files to life. This foundation enables you to write correct, robust file-handling code, understand performance implications, and debug I/O issues effectively.