The power of the Virtual File System lies not in complexity, but in standardization. VFS defines a common interface—a precise contract specifying exactly what operations a file system must support and how applications invoke those operations. This contract is the reason you can use the same ls command to list files from an ext4 partition, an NFS network share, a FAT USB drive, or even the /proc pseudo-filesystem.
In this page, we'll dissect the VFS common interface from both perspectives: the system call API that applications use, and the operations structures that file systems implement. Understanding this duality is essential for systems programming—whether you're writing applications, debugging file system issues, or implementing new file systems.
By the end of this page, you will understand the complete VFS system call interface, how system calls map to file system operations, the key operation structures (file_operations, inode_operations, super_operations), and how this design achieves file system independence.
Applications interact with files through a well-defined set of POSIX system calls. These system calls are architecture-independent interfaces that trigger kernel execution. The VFS layer receives these calls and dispatches them to the appropriate file system.
The Core File System Calls:
POSIX defines dozens of file-related system calls. The most fundamental ones form a small, coherent API that every developer should understand:
| System Call | Signature (Simplified) | Purpose |
|---|---|---|
| open() | int open(path, flags, mode) | Open or create a file; returns file descriptor |
| close() | int close(fd) | Close file descriptor; release resources |
| read() | ssize_t read(fd, buf, count) | Read data from file into buffer |
| write() | ssize_t write(fd, buf, count) | Write data from buffer to file |
| lseek() | off_t lseek(fd, offset, whence) | Reposition file offset for read/write |
| stat() | int stat(path, statbuf) | Get file metadata (size, permissions, times) |
| fstat() | int fstat(fd, statbuf) | Get metadata by file descriptor |
| mkdir() | int mkdir(path, mode) | Create a directory |
| rmdir() | int rmdir(path) | Remove an empty directory |
| unlink() | int unlink(path) | Remove a file (decrease link count) |
| rename() | int rename(oldpath, newpath) | Rename or move a file/directory |
| link() | int link(oldpath, newpath) | Create a hard link |
| symlink() | int symlink(target, linkpath) | Create a symbolic link |
| readlink() | ssize_t readlink(path, buf, size) | Read symbolic link target |
| chmod() | int chmod(path, mode) | Change file permissions |
| chown() | int chown(path, owner, group) | Change file ownership |
| truncate() | int truncate(path, length) | Truncate file to specified length |
| opendir() | (library) DIR* opendir(path) | Open directory for reading (uses low-level getdents) |
| readdir() | (library) struct dirent* readdir(dir) | Read directory entry |
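As a minimal sketch of how the core calls in the table fit together, the following C function writes a string to a scratch file, seeks back to the start, reads the data again, and checks the size with fstat(). The helper name `roundtrip` and the scratch path supplied by the caller are illustrative assumptions, not part of any API.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Writes msg to a new file at path, reads it back into out, and returns
 * the size reported by fstat(), or -1 on any failure.
 * The scratch file is removed before returning. */
static long roundtrip(const char *path, const char *msg, char *out, size_t outsz)
{
    struct stat st;
    size_t len;
    ssize_t n;
    long size = -1;
    int fd;

    assert(path != NULL && msg != NULL && out != NULL);
    len = strlen(msg);
    if (len + 1 > outsz)
        return -1;

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    if (write(fd, msg, len) != (ssize_t)len)   /* offset is now len */
        goto out;
    if (lseek(fd, 0, SEEK_SET) != 0)           /* rewind to the start */
        goto out;
    n = read(fd, out, len);
    if (n != (ssize_t)len)
        goto out;
    out[n] = '\0';
    if (fstat(fd, &st) != 0)                   /* metadata via the descriptor */
        goto out;
    size = (long)st.st_size;
out:
    close(fd);
    unlink(path);                              /* drop the only link */
    return size;
}
```

Calling `roundtrip("/tmp/vfs_demo_roundtrip.txt", "hello vfs", buf, sizeof buf)` on a writable /tmp is expected to return 9 and leave "hello vfs" in `buf`, regardless of which file system backs /tmp.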
The File Descriptor Model:
A central concept in the VFS interface is the file descriptor (fd)—a small non-negative integer that represents an open file. File descriptors provide:
Indirection: Applications don't hold direct kernel pointers; they hold integers that the kernel maps to internal structures.
Security: Applications can't forge file descriptors to access arbitrary files; the descriptor must have been returned by a successful open().
State Encapsulation: The file descriptor tracks the current read/write position, access mode, and file system context.
Inheritance: File descriptors are inherited across fork(), enabling parent-child file sharing.
Standard File Descriptors:
0: stdin (standard input)
1: stdout (standard output)
2: stderr (standard error)
These are automatically assigned during program startup, pointing to the terminal by default.
Each process has a file descriptor table that maps integers to 'struct file' pointers. The 'struct file' contains the current offset, access mode, and a pointer to the file's inode. When you call read(5, buf, 100), the kernel looks up fd 5 in your process's table, finds the struct file, and uses it to call the appropriate file system's read operation.
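One way to observe that the offset lives in the struct file rather than in the descriptor number is to compare dup() with a second open(). The sketch below assumes a writable scratch path chosen by the caller; the function name is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if a dup()'d descriptor sees the offset advanced through the
 * original while an independently opened descriptor does not, 0 if that
 * is not observed, and -1 on error. The scratch file is removed. */
static int offsets_follow_open_file(const char *path)
{
    int a, b = -1, c = -1;
    int result = -1;
    off_t off_b, off_c;

    assert(path != NULL);
    a = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (a < 0)
        return -1;

    b = dup(a);                 /* second fd, same open file object */
    c = open(path, O_RDONLY);   /* separate open file object */
    if (b < 0 || c < 0)
        goto out;

    if (write(a, "abcdef", 6) != 6)
        goto out;

    off_b = lseek(b, 0, SEEK_CUR);  /* shares the offset with a */
    off_c = lseek(c, 0, SEEK_CUR);  /* has its own offset */
    result = (off_b == 6 && off_c == 0);
out:
    close(a);
    if (b >= 0)
        close(b);
    if (c >= 0)
        close(c);
    unlink(path);
    return result;
}
```

The two descriptors `a` and `b` are different integers in the process's table, but both entries point at one open file, so a write through `a` moves the position seen through `b`.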
When an application makes a system call, the kernel's VFS layer receives it, performs validation, and then dispatches to the appropriate file system driver. This dispatch happens through function pointers stored in VFS data structures.
The Dispatch Flow:
Key Steps in Detail:
Library Call: Application calls read(fd, buffer, 1024) from libc.
System Call Transition: Library executes syscall instruction (x86-64) or svc (ARM), transitioning from user mode to kernel mode. Control passes to the kernel's system call entry point.
VFS Entry: The kernel's sys_read() function looks up the struct file from the process's fd table, checks the access mode recorded at open() time, and locates the f_op->read or f_op->read_iter function pointer.
Dispatch: VFS calls file->f_op->read_iter(kiocb, iter), which invokes the file system's implementation (e.g., ext4_file_read_iter).
File System Operation: The file system translates the byte range to disk blocks, reads from the block cache or issues I/O, and returns data.
Return Path: Data flows back up, user buffer is populated, and read() returns the byte count.
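The steps above can be modeled in user space. The following is a toy sketch, not kernel code: every `toy_`, `memfs_`, and `zerofs_` name is invented for illustration, and the real sys_read() path does considerably more (locking, reference counting, user-memory checks).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define TOY_MAX_FDS   8
#define TOY_MODE_READ 0x1u

struct toy_file;

/* Stand-in for struct file_operations: one function pointer per operation. */
struct toy_file_operations {
    ssize_t (*read)(struct toy_file *f, char *buf, size_t count);
};

/* Stand-in for struct file: per-open state plus the dispatch table. */
struct toy_file {
    const struct toy_file_operations *f_op;
    unsigned int f_mode;
    size_t f_pos;
    const char *data;   /* backing bytes for the "memfs" example */
    size_t size;
};

struct toy_fd_table {
    struct toy_file *fd[TOY_MAX_FDS];
};

/* "memfs": serves bytes from an in-memory buffer and advances f_pos. */
static ssize_t memfs_read(struct toy_file *f, char *buf, size_t count)
{
    size_t left = f->size - f->f_pos;

    if (count > left)
        count = left;
    memcpy(buf, f->data + f->f_pos, count);
    f->f_pos += count;
    return (ssize_t)count;
}

/* "zerofs": fills the buffer with zero bytes, loosely like /dev/zero. */
static ssize_t zerofs_read(struct toy_file *f, char *buf, size_t count)
{
    (void)f;
    memset(buf, 0, count);
    return (ssize_t)count;
}

static const struct toy_file_operations memfs_fops  = { .read = memfs_read };
static const struct toy_file_operations zerofs_fops = { .read = zerofs_read };

/* Toy read entry point: fd lookup, mode check, then indirect call. */
static ssize_t toy_sys_read(struct toy_fd_table *t, int fd, char *buf, size_t count)
{
    struct toy_file *f;

    assert(t != NULL && buf != NULL);
    if (fd < 0 || fd >= TOY_MAX_FDS || t->fd[fd] == NULL)
        return -EBADF;                    /* no such open file */
    f = t->fd[fd];
    if (!(f->f_mode & TOY_MODE_READ))
        return -EBADF;                    /* not opened for reading */
    if (f->f_op == NULL || f->f_op->read == NULL)
        return -EINVAL;                   /* operation not provided */
    return f->f_op->read(f, buf, count);  /* dispatch to the "file system" */
}
```

The caller of `toy_sys_read` never names a file system; which `read` runs is decided entirely by the `f_op` pointer stored when the file object was set up.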
The key to dispatch is the operations structure. Every open file has a 'struct file_operations' pointer (f_op). Every inode has a 'struct inode_operations' pointer (i_op). Every superblock has 'struct super_operations' (s_op). Different file systems provide different implementations of these structures, enabling polymorphic behavior.
The struct file_operations is the primary interface for file I/O operations. When you open a file, the VFS assigns the file system's file_operations to the struct file. All subsequent read, write, seek, and control operations dispatch through this structure.
Linux file_operations (Simplified):
struct file_operations {
    struct module *owner;

    /* Seek operations */
    loff_t (*llseek)(struct file *, loff_t, int);

    /* Legacy read/write - older interface */
    ssize_t (*read)(struct file *, char __user *, size_t, loff_t *);
    ssize_t (*write)(struct file *, const char __user *, size_t, loff_t *);

    /* Modern vectored I/O - preferred interface */
    ssize_t (*read_iter)(struct kiocb *, struct iov_iter *);
    ssize_t (*write_iter)(struct kiocb *, struct iov_iter *);

    /* Directory iteration */
    int (*iterate_shared)(struct file *, struct dir_context *);

    /* Poll for readiness */
    __poll_t (*poll)(struct file *, struct poll_table_struct *);

    /* Device control */
    long (*unlocked_ioctl)(struct file *, unsigned int, unsigned long);
    long (*compat_ioctl)(struct file *, unsigned int, unsigned long);

    /* Memory mapping */
    int (*mmap)(struct file *, struct vm_area_struct *);

    /* Open and release */
    int (*open)(struct inode *, struct file *);
    int (*release)(struct inode *, struct file *);
    int (*flush)(struct file *, fl_owner_t);

    /* Sync operations */
    int (*fsync)(struct file *, loff_t, loff_t, int datasync);

    /* Asynchronous notification (SIGIO) */
    int (*fasync)(int, struct file *, int);

    /* File locking */
    int (*lock)(struct file *, int, struct file_lock *);
    int (*flock)(struct file *, int, struct file_lock *);

    /* Splice operations for zero-copy I/O */
    ssize_t (*splice_write)(struct pipe_inode_info *, struct file *,
                            loff_t *, size_t, unsigned int);
    ssize_t (*splice_read)(struct file *, loff_t *,
                           struct pipe_inode_info *, size_t, unsigned int);

    /* Hole punching, preallocation */
    long (*fallocate)(struct file *, int, loff_t, loff_t);

    /* ... additional operations ... */
};

Key Operations Explained:
| Operation | Purpose | When Called |
|---|---|---|
| read_iter | Read data from file | read(), readv(), pread() system calls |
| write_iter | Write data to file | write(), writev(), pwrite() system calls |
| llseek | Change file offset | lseek() system call |
| iterate_shared | List directory contents | getdents() (used by readdir) |
| mmap | Map file into memory | mmap() system call |
| fsync | Flush file to disk | fsync(), fdatasync() system calls |
| poll | Check I/O readiness | poll(), select(), epoll_wait() |
| open | Per-file-object initialization | open() system call (after inode lookup) |
| release | Clean up file object | close() (when last reference dropped) |
File System Implementations:
Different file systems provide their own implementations. Here's how ext4 defines its file_operations for regular files:
const struct file_operations ext4_file_operations = {
    .llseek       = ext4_llseek,
    .read_iter    = ext4_file_read_iter,
    .write_iter   = ext4_file_write_iter,
    .mmap         = ext4_file_mmap,
    .open         = ext4_file_open,
    .release      = ext4_release_file,
    .fsync        = ext4_sync_file,
    .splice_read  = generic_file_splice_read,
    .splice_write = iter_file_splice_write,
    .fallocate    = ext4_fallocate,
};

/* For directories, ext4 uses different operations */
const struct file_operations ext4_dir_operations = {
    .llseek         = ext4_dir_llseek,
    .read           = generic_read_dir,   /* read() on a directory fails with EISDIR */
    .iterate_shared = ext4_readdir,       /* Entries are listed via iteration instead */
    .release        = ext4_release_dir,
    .fsync          = ext4_sync_file,
};

Notice 'generic_file_splice_read' and 'generic_read_dir'. The VFS provides generic implementations for common operations. File systems can use these when the default behavior is correct, implementing custom functions only where their specific format requires it. This greatly reduces code duplication.
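The effect of wiring `.read` to generic_read_dir is visible from user space. The sketch below assumes a Linux system, where generic_read_dir returns -EISDIR; the helper name is hypothetical, and other operating systems may behave differently.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens dirpath and attempts a plain read() on it.
 * Returns the errno value produced by read(), 0 if read() unexpectedly
 * succeeded, or -1 if the directory could not be opened. */
static int read_errno_on_directory(const char *dirpath)
{
    char buf[16];
    ssize_t n;
    int err;
    int fd;

    assert(dirpath != NULL);
    fd = open(dirpath, O_RDONLY);   /* opening a directory read-only is allowed */
    if (fd < 0)
        return -1;

    errno = 0;
    n = read(fd, buf, sizeof buf);  /* dispatches to the directory's .read */
    err = (n < 0) ? errno : 0;      /* capture before close() can change errno */
    close(fd);
    return err;
}
```

On Linux, `read_errno_on_directory(".")` is expected to return EISDIR; directory contents are reached through getdents() and iterate_shared instead.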
While file_operations handles I/O on open files, inode_operations handles namespace operations—creating, deleting, linking, and looking up files within directories. These operations work on inodes and dentries, modifying the file system's structure.
Linux inode_operations (Simplified):
struct inode_operations {
    /* Lookup a name within a directory */
    struct dentry * (*lookup)(struct inode *, struct dentry *, unsigned int flags);

    /* Follow a symbolic link */
    const char * (*get_link)(struct dentry *, struct inode *, struct delayed_call *);

    /* Check permissions */
    int (*permission)(struct user_namespace *, struct inode *, int);

    /* POSIX access control lists */
    struct posix_acl * (*get_acl)(struct inode *, int, bool);
    int (*set_acl)(struct user_namespace *, struct inode *, struct posix_acl *, int);

    /* Read symbolic link target (deprecated, use get_link) */
    int (*readlink)(struct dentry *, char __user *, int);

    /* Create a regular file */
    int (*create)(struct user_namespace *, struct inode *, struct dentry *, umode_t, bool);

    /* Create a hard link */
    int (*link)(struct dentry *, struct inode *, struct dentry *);

    /* Remove a hard link (delete file) */
    int (*unlink)(struct inode *, struct dentry *);

    /* Create a symbolic link */
    int (*symlink)(struct user_namespace *, struct inode *, struct dentry *, const char *);

    /* Create a directory */
    int (*mkdir)(struct user_namespace *, struct inode *, struct dentry *, umode_t);

    /* Remove a directory */
    int (*rmdir)(struct inode *, struct dentry *);

    /* Create a device node */
    int (*mknod)(struct user_namespace *, struct inode *, struct dentry *, umode_t, dev_t);

    /* Rename a file or directory */
    int (*rename)(struct user_namespace *, struct inode *, struct dentry *,
                  struct inode *, struct dentry *, unsigned int);

    /* Set file attributes (permissions, times) */
    int (*setattr)(struct user_namespace *, struct dentry *, struct iattr *);

    /* Get file attributes */
    int (*getattr)(struct user_namespace *, const struct path *, struct kstat *,
                   u32, unsigned int);

    /* Report extent mapping (FIEMAP) */
    int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64, u64);

    /* ... additional operations ... */
};

Key Operations Explained:
| Operation | Purpose | System Call(s) |
|---|---|---|
| lookup | Find inode for a name in directory | Path resolution (internal) |
| create | Create new regular file | open() with O_CREAT |
| mkdir | Create new directory | mkdir() |
| rmdir | Remove empty directory | rmdir() |
| unlink | Remove file (decrement link count) | unlink(), remove() |
| link | Create hard link | link() |
| symlink | Create symbolic link | symlink() |
| rename | Move/rename file or directory | rename(), renameat() |
| setattr | Change permissions, owner, times | chmod(), chown(), utime() |
| getattr | Get file metadata | stat(), fstat(), lstat() |
| get_link | Read symlink target | readlink(), path resolution |
| permission | Check access permissions | Every file operation |
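The link and unlink rows can be observed through st_nlink. This is a small sketch assuming a scratch directory on a file system that supports hard links; the function name and both paths are illustrative.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Records st_nlink after creating path, after link(path, alias), and
 * after unlink(path). Returns 0 on success, -1 on failure.
 * Both names are removed before returning. */
static int link_count_demo(const char *path, const char *alias, long counts[3])
{
    struct stat st;
    int rc = -1;
    int fd;

    assert(path != NULL && alias != NULL && counts != NULL);
    unlink(path);                       /* start from a clean state */
    unlink(alias);

    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    if (stat(path, &st) != 0)
        goto out;
    counts[0] = (long)st.st_nlink;      /* one name refers to the inode */

    if (link(path, alias) != 0)         /* second name, same inode */
        goto out;
    if (stat(path, &st) != 0)
        goto out;
    counts[1] = (long)st.st_nlink;

    if (unlink(path) != 0)              /* remove the first name only */
        goto out;
    if (stat(alias, &st) != 0)          /* inode still reachable via alias */
        goto out;
    counts[2] = (long)st.st_nlink;
    rc = 0;
out:
    unlink(path);
    unlink(alias);
    return rc;
}
```

On a typical local file system the recorded counts are expected to be 1, 2, and 1: unlink() removes a name and decrements the count, and the inode itself is only freed once the count reaches zero and no process holds it open.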
The lookup Operation:
The lookup operation is particularly important. During pathname resolution, VFS calls lookup on each directory to find its children. For example, resolving /home/alice/file.txt:
root_inode->i_op->lookup(root_inode, "home", ...) → returns home's dentry/inode
home_inode->i_op->lookup(home_inode, "alice", ...) → returns alice's dentry/inode
alice_inode->i_op->lookup(alice_inode, "file.txt", ...) → returns file's dentry/inode
Don't confuse inode operations with system calls. The mkdir() operation in inode_operations is called by VFS to implement the mkdir() system call, but it occurs AFTER the kernel has resolved the parent directory, checked permissions, and determined the file doesn't already exist. File systems implement the actual disk modification.
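The component-by-component lookup walk for /home/alice/file.txt can be sketched with a toy in-memory tree. All `toy_` names are invented for illustration; real path resolution also handles the dentry cache, ".", "..", symbolic links, mount points, and permission checks, none of which appear here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy directory tree node: a name plus child and sibling links. */
struct toy_node {
    const char *name;
    struct toy_node *children;  /* first entry if this is a directory */
    struct toy_node *sibling;   /* next entry in the same directory */
};

/* Analogue of i_op->lookup: find one name inside one directory. */
static struct toy_node *toy_lookup(struct toy_node *dir, const char *name, size_t len)
{
    struct toy_node *c;

    for (c = dir->children; c != NULL; c = c->sibling)
        if (strlen(c->name) == len && memcmp(c->name, name, len) == 0)
            return c;
    return NULL;
}

/* Resolves an absolute path by calling toy_lookup once per component.
 * Returns NULL as soon as a component is missing. */
static struct toy_node *toy_resolve(struct toy_node *root, const char *path)
{
    struct toy_node *cur = root;

    assert(root != NULL && path != NULL);
    while (cur != NULL && *path != '\0') {
        size_t len;

        while (*path == '/')            /* skip separators */
            path++;
        len = strcspn(path, "/");       /* length of the next component */
        if (len == 0)
            break;                      /* end of path or trailing slash */
        cur = toy_lookup(cur, path, len);
        path += len;
    }
    return cur;
}
```

Each iteration asks only the current directory about one name, which is why a single path can cross several file systems: the directory at each step supplies its own lookup implementation.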
The super_operations structure handles file system-wide operations: remounting, unmount cleanup, syncing, and managing inodes at the file system level. Each mounted file system has a superblock, and that superblock contains s_op pointing to super_operations.
struct super_operations {
    /* Allocate a new inode for this file system */
    struct inode *(*alloc_inode)(struct super_block *sb);

    /* Free the in-memory inode once it is no longer in use */
    void (*destroy_inode)(struct inode *);
    void (*free_inode)(struct inode *);

    /* Called when inode marked dirty */
    void (*dirty_inode)(struct inode *, int flags);

    /* Write inode to disk */
    int (*write_inode)(struct inode *, struct writeback_control *);

    /* Called when the last reference is dropped; decides whether to evict */
    int (*drop_inode)(struct inode *);

    /* Evict inode from memory; delete on-disk data if link count = 0 */
    void (*evict_inode)(struct inode *);

    /* Called during unmount */
    void (*put_super)(struct super_block *);

    /* Sync file system metadata to disk */
    int (*sync_fs)(struct super_block *, int wait);

    /* Freeze file system for snapshot */
    int (*freeze_super)(struct super_block *);
    int (*freeze_fs)(struct super_block *);
    int (*thaw_super)(struct super_block *);
    int (*unfreeze_fs)(struct super_block *);

    /* Get file system statistics */
    int (*statfs)(struct dentry *, struct kstatfs *);

    /* Remount with different options */
    int (*remount_fs)(struct super_block *, int *, char *);

    /* Called at the start of a forced unmount */
    void (*umount_begin)(struct super_block *);

    /* Quota operations */
    ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
    ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);

    /* ... additional operations ... */
};

Key Operations Explained:
| Operation | Purpose | When Called |
|---|---|---|
| alloc_inode | Allocate inode structure | When VFS needs a new inode |
| destroy_inode | Free inode structure | When inode removed from cache |
| write_inode | Persist inode to disk | Writeback, sync operations |
| evict_inode | Evict inode; delete on-disk data if link count is 0 | When the inode is removed from memory |
| put_super | Clean up superblock | During unmount |
| sync_fs | Sync all metadata | sync command, periodic sync |
| statfs | Get FS statistics | df command, statfs() syscall |
| remount_fs | Change mount options | mount -o remount |
| freeze_fs/unfreeze_fs | Freeze and thaw for backup | LVM snapshots, online backup |
The Inode Lifecycle:
Creation: VFS calls alloc_inode to get memory, then inode_operations->create to initialize.
Modification: When the inode changes, dirty_inode is called; later, write_inode persists changes.
Eviction: When the inode is no longer referenced, evict_inode (or drop_inode) is called.
Deletion: If the link count is zero, evict_inode removes on-disk data.
The statfs operation populates the kstatfs structure with file system statistics: total blocks, free blocks, available blocks, total inodes, free inodes, block size, etc. This is what powers the 'df' command. Each file system calculates these from its on-disk structures.
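From user space, the same statistics are reachable through the POSIX statvfs() library call, which on Linux is built on the statfs family of system calls. The sketch below uses a hypothetical helper name, and the byte totals it computes are a rough approximation of what df reports.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/statvfs.h>

/* Reports total capacity and space available to unprivileged users, in
 * bytes, for the file system containing path. Returns 0 on success and
 * -1 if statvfs() fails. */
static int fs_usage(const char *path, unsigned long long *total,
                    unsigned long long *avail)
{
    struct statvfs sv;

    assert(path != NULL && total != NULL && avail != NULL);
    if (statvfs(path, &sv) != 0)
        return -1;

    /* f_blocks and f_bavail are counted in units of f_frsize bytes */
    *total = (unsigned long long)sv.f_blocks * sv.f_frsize;
    *avail = (unsigned long long)sv.f_bavail * sv.f_frsize;
    return 0;
}
```

The caller passes any path on the file system of interest, for example `fs_usage("/", &total, &avail)`; the kernel routes the request to whichever statfs implementation the superblock for that path provides.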
Beyond the big three (file_operations, inode_operations, super_operations), VFS defines additional operations structures for specialized purposes.
dentry_operations:
The dentry_operations structure customizes directory entry behavior. Most file systems don't need custom dentry operations, but some (especially network file systems) do.
struct dentry_operations {
/* Revalidate cached dentry */
int (*d_revalidate)(struct dentry *,
unsigned int);
/* Custom hash function */
int (*d_hash)(const struct dentry *,
struct qstr *);
/* Custom comparison */
int (*d_compare)(const struct dentry *,
unsigned int, const char *,
const struct qstr *);
/* Called on last dput() */
int (*d_delete)(const struct dentry *);
/* Initialize new dentry */
int (*d_init)(struct dentry *);
/* Release resources */
void (*d_release)(struct dentry *);
/* ... */
};
Important: d_revalidate is crucial for NFS. Since remote files can change on the server, NFS must check if a cached dentry is still valid before using it.
address_space_operations:
The address_space_operations structure manages page cache I/O. It defines how pages are read from and written to the backing store.
struct address_space_operations {
/* Write a dirty page to disk */
int (*writepage)(struct page *,
struct writeback_control *);
/* Read a page from disk */
int (*readpage)(struct file *,
struct page *);
/* Read multiple pages */
void (*readahead)(struct readahead_control *);
/* Prepare for write */
int (*write_begin)(struct file *,
struct address_space *,
loff_t, unsigned,
struct page **, void **);
/* Finish write */
int (*write_end)(struct file *,
struct address_space *,
loff_t, unsigned, unsigned,
struct page *, void *);
/* Direct I/O */
ssize_t (*direct_IO)(struct kiocb *,
struct iov_iter *);
/* ... */
};
Important: These operations integrate file systems with the page cache for efficient buffered I/O.
The VFS interface is modular: file_operations for I/O, inode_operations for namespace, super_operations for the file system, dentry_operations for name caching, and address_space_operations for page cache. Each structure can use generic implementations or custom ones. This modularity allows file systems to customize exactly what they need.
To make these concepts concrete, let's conceptualize what a minimal file system implementation looks like. This pseudo-code shows the core structures a file system must provide:
/*
 * myfs - A conceptual minimal file system
 * This shows the structures a file system provides to VFS
 */

#include <linux/fs.h>
#include <linux/module.h>

/* Super operations - file system-wide functions */
static const struct super_operations myfs_super_ops = {
    .alloc_inode   = myfs_alloc_inode,
    .destroy_inode = myfs_destroy_inode,
    .write_inode   = myfs_write_inode,
    .evict_inode   = myfs_evict_inode,
    .put_super     = myfs_put_super,
    .statfs        = myfs_statfs,
};

/* Inode operations for directories */
static const struct inode_operations myfs_dir_inode_ops = {
    .lookup = myfs_lookup,   /* Find child in directory */
    .create = myfs_create,   /* Create regular file */
    .mkdir  = myfs_mkdir,    /* Create subdirectory */
    .rmdir  = myfs_rmdir,    /* Remove directory */
    .unlink = myfs_unlink,   /* Remove file */
    .rename = myfs_rename,   /* Move/rename */
};

/* Inode operations for regular files */
static const struct inode_operations myfs_file_inode_ops = {
    .setattr = myfs_setattr, /* chmod, chown, etc. */
    .getattr = myfs_getattr, /* stat() */
};

/* File operations for regular files */
static const struct file_operations myfs_file_ops = {
    .llseek     = generic_file_llseek,
    .read_iter  = generic_file_read_iter,  /* Use generic page cache */
    .write_iter = generic_file_write_iter,
    .mmap       = generic_file_mmap,
    .fsync      = myfs_fsync,
    .open       = generic_file_open,
};

/* File operations for directories */
static const struct file_operations myfs_dir_ops = {
    .llseek         = generic_file_llseek,
    .read           = generic_read_dir,
    .iterate_shared = myfs_iterate,        /* List directory contents */
};

/* Address space operations - page cache integration */
static const struct address_space_operations myfs_aops = {
    .readpage    = myfs_readpage,
    .writepage   = myfs_writepage,
    .write_begin = myfs_write_begin,
    .write_end   = myfs_write_end,
};

/* Called when new inode is created, assigns operations */
void myfs_set_inode_ops(struct inode *inode)
{
    if (S_ISREG(inode->i_mode)) {
        inode->i_op = &myfs_file_inode_ops;
        inode->i_fop = &myfs_file_ops;
        inode->i_mapping->a_ops = &myfs_aops;
    } else if (S_ISDIR(inode->i_mode)) {
        inode->i_op = &myfs_dir_inode_ops;
        inode->i_fop = &myfs_dir_ops;
    } else if (S_ISLNK(inode->i_mode)) {
        inode->i_op = &myfs_symlink_inode_ops;  /* definition not shown */
    }
}

/* Fill superblock during mount */
static int myfs_fill_super(struct super_block *sb, void *data, int silent)
{
    struct inode *root;

    sb->s_op = &myfs_super_ops;
    sb->s_magic = MYFS_MAGIC;

    /* Read root inode from disk (assumes myfs_iget returns ERR_PTR on failure) */
    root = myfs_iget(sb, MYFS_ROOT_INO);
    if (IS_ERR(root))
        return PTR_ERR(root);

    /* Set up root dentry; d_make_root drops the inode if allocation fails */
    sb->s_root = d_make_root(root);
    if (!sb->s_root)
        return -ENOMEM;

    return 0;
}

Key Observations:
Multiple Operation Structures: Even this minimal example has 6 operation structures: super_ops, dir_inode_ops, file_inode_ops, file_ops, dir_ops, and aops.
Generic Helpers: Notice extensive use of generic_* functions. The file system doesn't reimplement basic file I/O—it uses VFS-provided page cache integration.
Type-Specific Operations: Different inode types (file, directory, symlink) get different operations. myfs_set_inode_ops assigns appropriate pointers based on file type.
Mount-Time Setup: myfs_fill_super connects the file system to VFS during mount, setting s_op and creating the root dentry.
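The type test that myfs_set_inode_ops performs uses the same S_ISREG, S_ISDIR, and S_ISLNK macros that user-space code applies to st_mode. As a rough user-space analogue, the sketch below maps a path to the kind of operation table it would receive; the enum and function names are hypothetical stand-ins, not kernel identifiers.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical labels standing in for the operation tables in the example. */
enum toy_ops_kind {
    TOY_OPS_FILE,      /* would get the regular-file tables */
    TOY_OPS_DIR,       /* would get the directory tables */
    TOY_OPS_SYMLINK,   /* would get the symlink inode operations */
    TOY_OPS_NONE       /* other type, or the path could not be examined */
};

/* Mirrors the branch structure of myfs_set_inode_ops. */
static enum toy_ops_kind toy_ops_for_mode(mode_t mode)
{
    if (S_ISREG(mode))
        return TOY_OPS_FILE;
    if (S_ISDIR(mode))
        return TOY_OPS_DIR;
    if (S_ISLNK(mode))
        return TOY_OPS_SYMLINK;
    return TOY_OPS_NONE;
}

/* Classifies an existing path without following a final symbolic link. */
static enum toy_ops_kind toy_ops_for_path(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (lstat(path, &st) != 0)
        return TOY_OPS_NONE;
    return toy_ops_for_mode(st.st_mode);
}
```

lstat() is used rather than stat() so that a symbolic link is classified as a link instead of as its target, matching the way each inode carries its own type.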
This minimal example omits thousands of lines of code for journaling, extent management, block allocation, error handling, recovery, concurrency control, etc. Real file systems such as ext4 or XFS run to tens of thousands of lines of code. The VFS interface is the contract they fulfill.
We've explored the VFS common interface in detail. Let's consolidate the key concepts:
What's Next:
With the common interface understood, we'll explore file system mounting—how file systems are discovered, initialized, attached to the namespace, and how mount points work to create the unified directory hierarchy.
You now understand the VFS common interface: the system calls applications use, the operation structures file systems implement, and how VFS dispatches between them. This knowledge is essential for systems programming, debugging file system issues, and implementing custom file systems.