File System Layers

Level: Intermediate

Duration: 90 mins

Topic: File System Layers

Logical File System

The Invisible Abstraction Machine

When you make a simple call like open('/home/user/document.txt', 'r'), you invoke one of the most sophisticated abstraction layers in computer science. Behind this seemingly trivial operation lies an intricate machinery that translates your high-level request through multiple layers of software, each adding critical functionality while hiding complexity from the layer above.

The logical file system sits at the apex of this layered architecture. It's the layer that developers and users interact with directly—the layer that transforms the chaos of spinning platters, flash cells, and electrical signals into the elegant abstraction we call a file.

Understanding this layer isn't merely academic. It's the key to writing efficient file-handling code, debugging mysterious I/O failures, and architecting systems that perform well under load. Design decisions for storage-heavy services regularly depend on how the logical file system processes requests.

What You Will Learn

By the end of this page, you will understand the complete role of the logical file system layer: how it manages metadata, enforces protection, provides directory services, validates operations, and interfaces with lower layers. You'll gain the mental model needed to reason about file system behavior in production systems.

The Layered File System Architecture

Before diving deep into the logical file system, we must understand where it sits within the broader file system architecture. Modern operating systems implement file systems as a layered stack, where each layer provides abstraction and services to the layer above while consuming services from the layer below.

This layered design follows fundamental software engineering principles: separation of concerns, information hiding, and modularity. Each layer has a well-defined interface and responsibility, making the system easier to understand, maintain, and extend.

The Five-Layer File System Stack
Layer	Name	Primary Responsibility	Key Operations
5 (Top)	Logical File System	Metadata management & protection	File validation, permission checks, FCB management
4	File Organization Module	Logical-to-physical block mapping	Block allocation, free space management
3	Basic File System	Block-level operations	Read/write physical blocks, buffer management
2	I/O Control	Device command translation	Device drivers, interrupt handling
1 (Bottom)	Device Drivers	Hardware communication	Direct hardware register access, DMA setup

The critical insight: Each layer only knows about the abstractions provided by the layer below. The logical file system doesn't know whether data is stored on an SSD, a spinning disk, or a network-attached storage device. It operates purely in terms of logical blocks and file metadata. This separation is what allows the same file system implementation to work transparently across vastly different storage technologies.

Why Layering Matters

The layered architecture enables independent evolution. When new storage technologies emerge (SSD replacing HDD, NVMe replacing SATA), only the lower layers need modification. The logical file system and all application code continue to work unchanged. This is why code written decades ago against the Unix file API (open(), read(), write(), close()) needs no storage-specific changes to run on modern NVMe drives.

What Is the Logical File System?

The logical file system is the topmost layer of the file system stack. It's the layer that user programs interact with through system calls like open(), read(), write(), close(), stat(), and chmod(). Its primary job is to manage all aspects of files that don't involve physical storage—essentially, everything about a file except the actual bytes on disk.

Think of the logical file system as the interface and policy layer. It defines what operations are possible, validates that requested operations are legal, manages the metadata that describes files, and orchestrates requests to lower layers. It's the layer that makes files behave like files—with names, permissions, sizes, timestamps, and organizational structure.

Core Responsibilities of the Logical File System

•Metadata Management — Maintains file control blocks (FCBs/inodes) containing file attributes: size, permissions, timestamps, owner, and pointers to data blocks.
•Directory Management — Implements hierarchical directory structures and translates human-readable paths into internal file identifiers.
•Protection Enforcement — Validates access permissions before allowing any operation, enforcing the security model (Unix permissions, ACLs, capabilities).
•Symbolic Name Resolution — Converts symbolic file names into internal unique identifiers used by lower layers.
•Open File Table Management — Tracks which files are currently open, by whom, and in what mode, maintaining per-process and system-wide file tables.
•Operation Validation — Verifies that requested operations are semantically valid (e.g., you can't read() a directory as if it were a regular file, and you can't lseek() on a pipe).

An important distinction: The logical file system deals with structure and policy, not mechanism. It decides what should happen and whether it's allowed to happen, then delegates the how to lower layers. When you call write(fd, buffer, 1000), the logical file system verifies you have write permission, looks up the file's FCB, and determines which logical blocks need modification. But it doesn't actually issue I/O commands—that's left to lower layers.

File Control Blocks (FCB/Inodes): The Heart of Metadata

Every file in a file system is represented by a File Control Block (FCB). In Unix-like systems, this structure is called an inode (index node). In Windows NTFS, it's a Master File Table (MFT) entry. Regardless of the specific implementation, the concept is universal: there must be a data structure that stores all metadata about a file.

The FCB is the logical file system's fundamental unit of management. When you refer to a file by name, the logical file system ultimately resolves that name to an FCB. All subsequent operations on that file consult and update this structure.

Contents of a Typical File Control Block (Unix inode)
Field	Size (bytes)	Description
File Type	2	Regular file, directory, symbolic link, device file, socket, named pipe
Permissions	2	Read/write/execute for owner, group, others (9 bits) plus setuid/setgid/sticky (3 bits), 12 bits in total
Link Count	2-4	Number of hard links pointing to this inode
Owner UID	4	User ID of the file owner
Owner GID	4	Group ID of the file group
File Size	8	Size of file data in bytes (64-bit for large file support)
Access Time (atime)	8-16	Last time file data was read
Modification Time (mtime)	8-16	Last time file data was modified
Change Time (ctime)	8-16	Last time inode metadata was changed
Block Pointers	60+	Direct, indirect, double-indirect, triple-indirect block pointers
Flags	4	Immutable, append-only, no-dump, synchronous I/O, etc.
Extended Attribute Pointer	4-8	Pointer to extended attributes (ACLs, SELinux labels, user attributes)

Critical observation: In Unix-like file systems, the inode does NOT contain the file's name. This is a deliberate design decision. Names are stored in directories, which are just special files containing (name, inode number) pairs. This separation enables:

Hard links — Multiple names can point to the same inode
Efficient renaming — Renaming only changes the directory entry, not the file itself
Atomic operations — The inode is the authoritative source of truth, reducing race conditions

When the link count drops to zero (no names reference the inode) AND no process has the file open, only then is the file truly deleted.
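
The link-count behavior can be observed from user space. The following is a minimal sketch, assuming a POSIX system whose temporary directory supports hard links; the file names a.txt and b.txt are made up for illustration.

```python
import os
import tempfile

def link_demo():
    """Create a file, add a hard link, remove the first name, report what stat() sees."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "a.txt")    # hypothetical names
        second = os.path.join(d, "b.txt")
        with open(first, "w") as f:
            f.write("hello")
        os.link(first, second)              # second directory entry, same inode
        st1, st2 = os.stat(first), os.stat(second)
        same_inode = st1.st_ino == st2.st_ino
        links_before = st1.st_nlink
        os.unlink(first)                    # removes one name, not the data
        links_after = os.stat(second).st_nlink
        with open(second) as f:
            data = f.read()
        return same_inode, links_before, links_after, data

result = link_demo()
print(result)
```

Both names report the same inode number, the link count goes from 2 to 1 when the first name is removed, and the data is still readable through the remaining name.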

Inode Exhaustion

A file system that allocates a fixed number of inodes at creation time (ext4, for example) can run out of inodes before running out of disk space. This happens when you have many small files. File creation then fails with 'No space left on device' (ENOSPC) even though df reports gigabytes free; df -i shows that no inodes remain. This is a common production failure mode for mail servers, build systems, and container registries.
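
Inode headroom can be checked programmatically as well. This is a small sketch, assuming a POSIX system where os.statvfs is available; the helper name inode_usage is an illustration, not part of any library. Some file systems allocate inodes dynamically and may report 0 for the total.

```python
import os

def inode_usage(path="/"):
    """Return (total_inodes, free_inodes) for the file system holding `path`."""
    vfs = os.statvfs(path)      # POSIX only
    return vfs.f_files, vfs.f_ffree

total, free = inode_usage("/")
print(f"inodes: {total} total, {free} free")
```

Monitoring free inodes next to free bytes is one way to catch this failure mode before writes start failing.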

Directory Services and Path Resolution

The logical file system provides directory services—the ability to organize files into a hierarchical namespace and resolve human-readable paths to internal identifiers. This seemingly simple functionality involves surprisingly complex machinery.

What is a directory? In most Unix-like file systems, a directory is simply a special file whose contents are a list of (name, inode number) pairs. The logical file system interprets this data specially, providing operations like opendir(), readdir(), and closedir().

Path Resolution Algorithm

When you access /home/user/docs/report.txt, the logical file system executes a path resolution algorithm:

1. Start at root directory (inode 2 on ext2/3/4; the number is file-system specific)
2. Read root directory contents
3. Search for entry named 'home' → get inode number
4. Read inode for 'home', verify it's a directory
5. Read 'home' directory contents
6. Search for entry named 'user' → get inode number
7. Read inode for 'user', verify it's a directory
8. Read 'user' directory contents
9. Search for entry named 'docs' → get inode number
10. Read inode for 'docs', verify it's a directory
11. Read 'docs' directory contents
12. Search for entry named 'report.txt' → get inode number
13. Read inode for 'report.txt', verify permissions
14. Return file handle

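For this 4-component path, resolution with cold caches involves at least 9 block reads: an inode and a directory data block for each of the 4 directories traversed (/, home, user, docs), plus the inode of report.txt. For deeply nested paths, the cost is even higher.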
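
The walk above can be sketched against a toy in-memory model. The inode numbers, the INODES table, and the resolve function are all hypothetical; permission checks, symbolic links, and mount points are left out. The counter charges one read per inode and one per directory body, as if nothing were cached.

```python
# Toy model: inode number -> inode record. All numbers are made up.
INODES = {
    2:  {"type": "dir", "entries": {"home": 11}},
    11: {"type": "dir", "entries": {"user": 12}},
    12: {"type": "dir", "entries": {"docs": 13}},
    13: {"type": "dir", "entries": {"report.txt": 14}},
    14: {"type": "file"},
}
ROOT_INO = 2

def resolve(path):
    """Walk an absolute path; return (inode_number, simulated_block_reads)."""
    ino = ROOT_INO
    reads = 1                                   # read the root inode
    for name in [p for p in path.split("/") if p]:
        inode = INODES[ino]
        if inode["type"] != "dir":
            raise NotADirectoryError(name)
        reads += 1                              # read the directory contents
        if name not in inode["entries"]:
            raise FileNotFoundError(name)
        ino = inode["entries"][name]
        reads += 1                              # read the inode the entry names
    return ino, reads

print(resolve("/home/user/docs/report.txt"))    # (14, 9)
```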

The Directory Name Lookup Cache (DNLC/dcache)

Because path resolution is expensive, modern kernels maintain a directory entry cache (the dcache in Linux; DNLC in Solaris). This cache stores recently resolved (parent directory, name) → inode mappings and typically has a very high hit rate, because programs tend to reuse the same paths. Without this cache, every file access could require multiple disk reads. The dcache is one of the most critical caches in the entire operating system.
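
The caching idea can be illustrated with a toy lookup cache. This is only a sketch of the concept: DentryCache, DIRS, and the inode numbers are hypothetical, and a real dcache also handles eviction, invalidation on rename or unlink, and negative entries.

```python
class DentryCache:
    """Toy (parent inode, name) -> child inode cache with hit/miss counters."""

    def __init__(self, slow_lookup):
        self._slow_lookup = slow_lookup     # called only on a miss
        self._cache = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, parent_ino, name):
        key = (parent_ino, name)
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        child = self._slow_lookup(parent_ino, name)
        self._cache[key] = child
        return child

# Hypothetical directory contents standing in for on-disk data.
DIRS = {2: {"home": 11}, 11: {"user": 12}}
cache = DentryCache(lambda parent, name: DIRS[parent][name])
for _ in range(3):
    cache.lookup(2, "home")
    cache.lookup(11, "user")
print(cache.hits, cache.misses)             # 4 2
```

Only the first pass touches the slow path; every later lookup of the same components is served from memory.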

Directory Implementation Strategies

The logical file system must efficiently implement directories despite varying workloads—from directories with 3 files to directories with 3 million files. Several strategies exist:

Directory Implementation Comparison
Strategy	Lookup Time	Insert Time	Advantages	Disadvantages
Linear List	O(n)	O(1)*	Simple, works for small directories	Degrades badly with many entries
Hash Table	O(1) average	O(1) average	Fast for known names	No ordering, collision handling needed
B-Tree	O(log n)	O(log n)	Ordered, scales to millions of entries	More complex, some overhead
Linear list + HTree (ext4)	O(n) while small → O(log n)	O(n) while small → O(log n)	Cheap for small directories, scalable for large ones	Complexity, compatibility concerns

*Linear list insertion is O(1) only if we allow duplicates; with duplicate checking it becomes O(n).

Modern file systems like ext4 use a hybrid approach: small directories use a linear list (fast for typical use), but large directories automatically convert to an HTree (hashed B-tree) structure. This adaptive strategy optimizes for common cases while scaling to extreme workloads.
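
The adaptive idea can be sketched as follows. This is not the ext4 on-disk format: a Python dict stands in for the HTree index, and the class name and threshold are hypothetical.

```python
class AdaptiveDirectory:
    """Linear list of (name, inode) pairs that switches to a hash index when large."""

    def __init__(self, threshold=4):
        self.threshold = threshold          # hypothetical cut-over point
        self.entries = []                   # small case: linear list
        self.index = None                   # large case: name -> inode

    def lookup(self, name):
        if self.index is not None:
            return self.index.get(name)     # O(1) on average
        for entry_name, ino in self.entries:    # O(n) scan
            if entry_name == name:
                return ino
        return None

    def insert(self, name, ino):
        if self.lookup(name) is not None:   # duplicate check
            raise FileExistsError(name)
        if self.index is not None:
            self.index[name] = ino
            return
        self.entries.append((name, ino))
        if len(self.entries) > self.threshold:
            self.index = dict(self.entries) # one-time conversion
            self.entries = []

d = AdaptiveDirectory(threshold=4)
for i in range(6):
    d.insert(f"file{i}", 100 + i)
print(d.index is not None, d.lookup("file5"))   # True 105
```

The conversion happens once, on the insert that pushes the directory past the threshold, so small directories never pay for the index.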

Protection and Access Control

The logical file system is the enforcement point for all file access control. Before any operation proceeds, this layer validates that the requesting process has appropriate permissions. This is not optional—it's the fundamental security barrier protecting user data.

Access control happens at multiple levels:

Every operation — Read, write, execute, delete, rename, attribute change
Every path component — You need execute permission on every directory in a path
Every transition — Following symbolic links, crossing mount points

A single failed permission check aborts the entire operation with EACCES (permission denied) or EPERM (operation not permitted).

Unix Permission Model (DAC)

The traditional Unix model is Discretionary Access Control (DAC). The file owner decides who can access the file. Permissions are encoded in a 12-bit field:

   Special    Owner      Group      Others
   ─────────  ─────────  ─────────  ─────────
   suid sgid  r  w  x    r  w  x    r  w  x
   sticky                                     
   ─────────  ─────────  ─────────  ─────────
     3 bits    3 bits     3 bits     3 bits

Permission bits meaning:

r (4): Read file contents / list directory
w (2): Modify file contents / add/remove directory entries
x (1): Execute file / traverse directory

Special bits:

setuid (4000): Execute as file owner (not caller)
setgid (2000): Execute as file group / inherit group in directory
sticky (1000): In a directory, only a file's owner, the directory's owner, or root can delete or rename entries (used for /tmp)
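
These bit values can be decoded with Python's standard stat module. The sketch below takes raw st_mode values (file type plus the 12 permission bits); the describe helper is an illustration only.

```python
import stat

def describe(mode):
    """Return the ls-style string and the special bits set in a st_mode value."""
    special = []
    if mode & stat.S_ISUID:
        special.append("setuid")
    if mode & stat.S_ISGID:
        special.append("setgid")
    if mode & stat.S_ISVTX:
        special.append("sticky")
    return stat.filemode(mode), special

print(describe(0o100644))   # regular file: ('-rw-r--r--', [])
print(describe(0o104755))   # setuid executable: ('-rwsr-xr-x', ['setuid'])
print(describe(0o041777))   # sticky directory: ('drwxrwxrwt', ['sticky'])
```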

The Directory Execute Bit Confusion

The 'execute' bit on directories is actually the 'search' or 'traverse' permission. Without it, you cannot access ANY file in that directory or its subdirectories—even if you know the exact path. This is a common source of confusion and misconfiguration.

Access Control Lists (ACLs)

Basic Unix permissions only allow specifying access for owner, one group, and everyone else. This is insufficient for complex environments. ACLs (Access Control Lists) extend the model, allowing fine-grained control:

# Example: Grant specific user access without changing group
setfacl -m u:alice:rw /data/project/report.doc

# Example: Deny specific group access
setfacl -m g:contractors:--- /data/confidential/

# Example: Set default ACL for new files
setfacl -d -m g:developers:rwx /data/project/

ACLs are stored as extended attributes attached to the inode. When a file has an ACL, the permission check walks its entries in a fixed order (owner, named users, owning group and named groups, others) and uses the first class that matches; a mask entry caps what named users and groups can be granted.

Permission Check Algorithm

•If the process is privileged (root, or holds CAP_DAC_OVERRIDE on capability-aware systems), read and write checks pass immediately; executing a file still requires at least one x bit to be set
•If the process's effective UID matches the file owner UID → use owner permissions
•If an ACL exists and has a named-user entry for the effective UID → use that entry, limited by the ACL mask
•If the file's GID, or a named-group ACL entry, matches one of the process's groups → use group permissions, limited by the ACL mask when an ACL exists
•Otherwise → use 'others' permissions
•The first matching class decides: an owner denied by the owner bits is not rescued by more generous group or others bits
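
The mode-bit part of this check can be sketched in a few lines. This is a simplified illustration that ignores ACLs and capabilities and treats only UID 0 as privileged; may_access and its parameters are hypothetical names, and the root rule shown applies to executing regular files.

```python
def may_access(mode, file_uid, file_gid, uid, groups, want):
    """Classic mode-bit check. `want` is a mask of 4 (r), 2 (w), 1 (x)."""
    if uid == 0:
        # Root bypasses read/write checks; execute needs some x bit set.
        if want & 1 and not (mode & 0o111):
            return False
        return True
    if uid == file_uid:
        granted = (mode >> 6) & 7       # owner class
    elif file_gid in groups:
        granted = (mode >> 3) & 7       # group class
    else:
        granted = mode & 7              # others class
    return (granted & want) == want

# File owned by uid 1000, gid 100, mode rw-r-----
print(may_access(0o640, 1000, 100, 1000, {100}, 4 | 2))   # owner read+write: True
print(may_access(0o640, 1000, 100, 1001, {100}, 2))       # group member write: False
print(may_access(0o077, 1000, 100, 1000, {100}, 4))       # owner class matches first: False
```

The last call shows that the first matching class is final: the owner is refused even though the group and others bits would allow the read.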

Open File Tables: Managing Active Files

When a process opens a file, the operating system doesn't just return an inode—it creates a sophisticated tracking structure. The logical file system maintains multiple levels of file tables to efficiently manage open files while supporting features like file sharing and independent file positions.

The Three-Level Table Architecture

    Process A            Process B           System-wide
   ────────────        ────────────        ─────────────────
   ┌──────────┐        ┌──────────┐        ┌───────────────┐
   │ FD Table │        │ FD Table │        │ Open File     │
   ├──────────┤        ├──────────┤        │ Table         │
   │ 0  stdin │        │ 0  stdin │        ├───────────────┤
   │ 1  stdout│        │ 1  stdout│        │ Entry 1       │
   │ 2  stderr│        │ 2  stderr│        │  - offset: 0  │
   │ 3 ─────────┐      │ 3 ───────────┐    │  - mode: r    │
   │ 4 ─────────┼──────│ 4 ───────────┼────│  - inode ptr  │
   └──────────┘ │      └──────────┘   │    ├───────────────┤
                │                     │    │ Entry 2       │
                │                     └────│  - offset: 500│
                │                          │  - mode: rw   │
                └──────────────────────────│  - inode ptr  │
                                           └───────────────┘

Level 1: Per-Process File Descriptor Table

Each process has its own array of file descriptors (small integers: 0, 1, 2, ...)
Each entry points to an entry in the system-wide open file table
Inherited across fork(), allowing parent-child file sharing
close() removes the entry from this table and drops one reference on the open file table entry

Level 2: System-Wide Open File Table

One entry per open() call (not per descriptor)
Contains: current file offset, access mode (r/w), flags, pointer to inode
Multiple FDs (same or different processes) can point to same entry → shared offset
Reference counted; entry freed when all FDs closed

Level 3: In-Memory Inode Table (vnode/inode cache)

One entry per open file, regardless of how many times it's open
Cached copy of on-disk inode, plus runtime state (locks, pages in memory)
Reference counted; evicted from cache when no longer needed

Why Three Levels?

This design elegantly separates concerns: FD tables provide per-process namespacing, the open file table manages sharing semantics (crucial for pipes, shared logs, databases), and the inode cache provides data consistency and performance. Without these layers, basic operations like fork() with shared file handles would be impossible to implement correctly.

Practical Implications

Scenario 1: Two independent opens of the same file

// Process A
int fd = open("/data/log.txt", O_RDONLY);

// Process B (separate process)
int fd = open("/data/log.txt", O_RDONLY);

Each process gets its own FD
Each creates a separate open file table entry
Both entries point to the same inode
File offsets are independent — each reader tracks position separately

Scenario 2: Inherited file descriptor after fork()

int fd = open("/data/log.txt", O_RDWR);
if (fork() == 0) {
    write(fd, "child", 5);  // Child writes
} else {
    write(fd, "parent", 6); // Parent writes
}

Parent and child share the SAME open file table entry
File offset is shared — writes interleave at kernel-tracked position
Critical for: shell pipelines, log files, coordinated I/O
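
Both scenarios can be reproduced inside a single process. In this sketch os.dup() stands in for fork(): like an inherited descriptor, a duplicated descriptor refers to the same open file table entry. The file name is made up, and a POSIX system is assumed.

```python
import os
import tempfile

def offset_demo():
    """Compare a duplicated descriptor with an independent open of the same file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "log.txt")       # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"0123456789")

        fd1 = os.open(path, os.O_RDONLY)        # first open file table entry
        fd2 = os.dup(fd1)                       # second FD, same entry
        fd3 = os.open(path, os.O_RDONLY)        # separate entry, same inode
        try:
            os.read(fd1, 4)                     # advances the shared offset
            shared = os.lseek(fd2, 0, os.SEEK_CUR)
            independent = os.lseek(fd3, 0, os.SEEK_CUR)
        finally:
            for fd in (fd1, fd2, fd3):
                os.close(fd)
        return shared, independent

shared, independent = offset_demo()
print(shared, independent)                      # 4 0
```

Reading through fd1 moves the position seen by fd2, while fd3, which came from its own open(), still sits at offset 0.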

Operation Validation and Error Handling

The logical file system is the first line of defense against invalid operations. Before passing any request to lower layers, it performs comprehensive validation. This validation is critical for system stability—allowing an invalid request to reach the disk driver could cause data corruption or system crashes.

Validations Performed by the Logical File System

•File descriptor validity — Is the FD a valid index in the process's FD table?
•Operation/mode compatibility — Can't write to read-only FD, can't read from write-only FD
•File type appropriateness — Can't read() a directory (use readdir()), can't seek() on a pipe
•Argument sanity — Buffer pointers accessible, count values reasonable, offsets valid
•Resource limits — Process file descriptor limit, file size limits, quota
•Lock conflicts — Advisory/mandatory locks that block the operation
•Mount point state — Is the file system mounted read-only?
•File system specific constraints — Maximum file name length, allowed characters, path depth

Error Codes and Their Meanings

When validation fails, the logical file system returns specific error codes via errno. Understanding these codes is essential for proper error handling:

Common File System Error Codes
Error Code	Value (Linux)	Cause	Recovery Strategy
EACCES	13	File mode or ACL denies the requested access, or search permission is missing on a path component	Check permissions along the whole path, request elevation
EPERM	1	Operation requires privilege or ownership the caller lacks (e.g., chown by a non-owner)	Run with the needed privilege or change the approach
ENOENT	2	File or path component doesn't exist	Verify path, create file if appropriate
EEXIST	17	File exists (for O_CREAT with O_EXCL)	Use different name, or open existing
EISDIR	21	Tried to open a directory for writing, or to read() it as a file	Use directory APIs (opendir(), readdir())
ENOTDIR	20	Path component not a directory	Fix path, resolve symlinks
EMFILE	24	Process FD limit reached	Close unused FDs, increase ulimit
ENFILE	23	System open file limit reached	System-wide issue; contact admin
ENOSPC	28	No space (or no free inodes) left on device	Free space, extend volume, compress data
EROFS	30	Read-only file system	Remount read-write if appropriate
ELOOP	40	Too many symbolic links encountered during path resolution	Check for symlink loops
ENAMETOOLONG	36	Path or filename too long	Use shorter names, different location

Silent Failures in Error Handling

Production systems frequently fail to check system call return values. A failed write() that returns -1 looks like successful I/O to code that ignores the return value. Always check returns and errno—data loss often starts with an unchecked error.
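
In Python a failed system call surfaces as an OSError rather than a -1 return, but the errno value it carries is the same one listed in the table. The sketch below, with a hypothetical helper try_open and a path assumed not to exist, shows one way to turn that value into its symbolic name for handling or logging.

```python
import errno
import os

def try_open(path, flags=os.O_RDONLY):
    """Open `path`; return (fd, None) on success or (None, errno_name) on failure."""
    try:
        return os.open(path, flags), None
    except OSError as exc:
        # errno.errorcode maps the number to its symbolic name, e.g. 2 -> 'ENOENT'
        return None, errno.errorcode.get(exc.errno, str(exc.errno))

fd, err = try_open("/no/such/dir/file.txt")     # path assumed to be missing
if fd is not None:
    os.close(fd)
print(err)
```

Matching on the symbolic name (or on the constants in the errno module) is more portable than matching on the numeric value, which can differ between operating systems.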

Interface to Lower Layers

After validation succeeds, the logical file system must interface with the file organization module (the next layer down). This interface is designed around logical blocks—fixed-size units of storage independent of the underlying hardware.

The logical file system communicates requests in an abstract form:

"Read block 1000 of file inode 12345"
"Write this data to block 2000 of file inode 12345"
"Allocate 5 new blocks for file inode 12345"

It does NOT specify:

Which physical sectors on disk contain these blocks
Whether the blocks are contiguous or fragmented
What the actual disk geometry is

This abstraction is crucial. The same logical file system code can work with contiguous allocation, linked allocation, indexed allocation, or extent-based allocation—as long as the file organization module presents the same interface.

The Conversion Process

   Application Request              Logical FS Action           To Lower Layer
   ──────────────────────           ─────────────────           ──────────────
   read(fd, buf, 4096)              1. Validate fd, buf, count
                                    2. Get inode from FD
                                    3. Check read permission
                                    4. Calculate: offset 0 → logical block 0
                                                  ────────────────────────────►
                                                  Request: read inode X, block 0

   write(fd, buf, 8192)             1. Validate fd, buf, count
   (offset at 4096)                 2. Get inode from FD
                                    3. Check write permission
                                    4. Calculate: bytes 4096-8191  → block 1
                                                  bytes 8192-12287 → block 2
                                                  ────────────────────────────►
                                                  Request: allocate block if needed
                                                  Request: write inode X, blocks 1-2
                                    5. Update inode: size, mtime

Key transformations:

Byte offset → Logical block number: block = floor(offset / block_size)
Byte count → Block count: Handles partial blocks at start and end
Permissions → Allowed operations: Read/write/execute checks
File descriptor → Inode: Via the open file table chain
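
The first two transformations can be written out as a short sketch. The 4096-byte block size is an assumption chosen to match the request sizes in the diagram, and blocks_for_range is a hypothetical helper, not a kernel interface.

```python
BLOCK_SIZE = 4096   # assumed block size

def blocks_for_range(offset, count, block_size=BLOCK_SIZE):
    """Return [(logical_block, start_in_block, length), ...] covering a byte range."""
    pieces = []
    end = offset + count
    while offset < end:
        block = offset // block_size            # byte offset -> logical block
        start = offset % block_size             # position inside that block
        length = min(block_size - start, end - offset)
        pieces.append((block, start, length))
        offset += length
    return pieces

print(blocks_for_range(0, 4096))        # [(0, 0, 4096)]
print(blocks_for_range(4096, 8192))     # [(1, 0, 4096), (2, 0, 4096)]
print(blocks_for_range(1000, 5000))     # [(0, 1000, 3096), (1, 0, 1904)]
```

The last call shows the partial-block case: an unaligned request touches the tail of block 0 and the head of block 1, so a write there needs the existing block contents before it can modify them.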

The VFS Abstraction

In real operating systems, the logical file system often operates through a Virtual File System (VFS) layer. The VFS provides a common interface that all file systems implement, allowing applications to use the same system calls regardless of whether they're accessing ext4, XFS, NFS, or a FUSE-based file system. We'll explore VFS in depth later in this chapter.

Summary: The Logical File System Foundation

We've explored the logical file system—the top layer of the file system stack and the primary interface for applications and users. Let's consolidate the key concepts:

Key Takeaways

•The logical file system is the topmost layer — It handles all user-visible aspects: names, permissions, metadata, and directory structure.
•File Control Blocks (FCBs/inodes) are the fundamental unit — Every file has one, containing all metadata except the file's name.
•Names live in directories, not files — This separation enables hard links, efficient renaming, and atomic operations.
•Path resolution is expensive — Each component requires directory lookup; the dcache makes this bearable.
•Protection is enforced at this layer — Every operation passes through permission checks before reaching storage.
•Open file tables manage runtime state — Three levels (per-process FD, system-wide open file, inode cache) enable sharing semantics.
•Abstraction hides lower-layer complexity — The logical file system works with logical blocks, unaware of physical storage details.
•Proper error handling is critical — The logical file system returns specific error codes that applications must handle.

What's Next:

The logical file system converts file operations into logical block requests. But how does the system know which physical blocks on disk hold these logical blocks? That's the job of the file organization module—our next topic. We'll explore how different allocation strategies (contiguous, linked, indexed) map logical blocks to physical storage.

1 / 5

Loading learning content...

Operating SystemsFile System Layers

File System Layers

LevelIntermediate

Duration90 mins

TopicFile System Layers

1 / 5

Logical File System

The Invisible Abstraction Machine

What You Will Learn

The Layered File System Architecture

The Five-Layer File System Stack
Layer	Name	Primary Responsibility	Key Operations
5 (Top)	Logical File System	Metadata management & protection	File validation, permission checks, FCB management
4	File Organization Module	Logical-to-physical block mapping	Block allocation, free space management
3	Basic File System	Block-level operations	Read/write physical blocks, buffer management
2	I/O Control	Device command translation	Device drivers, interrupt handling
1 (Bottom)	Device Drivers	Hardware communication	Direct hardware register access, DMA setup

Why Layering Matters

What Is the Logical File System?

Core Responsibilities of the Logical File System

•Metadata Management — Maintains file control blocks (FCBs/inodes) containing file attributes: size, permissions, timestamps, owner, and pointers to data blocks.
•Directory Management — Implements hierarchical directory structures and translates human-readable paths into internal file identifiers.
•Protection Enforcement — Validates access permissions before allowing any operation, enforcing the security model (Unix permissions, ACLs, capabilities).
•Symbolic Name Resolution — Converts symbolic file names into internal unique identifiers used by lower layers.
•Open File Table Management — Tracks which files are currently open, by whom, and in what mode, maintaining per-process and system-wide file tables.
•Operation Validation — Verifies that requested operations are semantically valid (e.g., you can't read from a directory, you can't seek past EOF in certain modes).

File Control Blocks (FCB/Inodes): The Heart of Metadata

Contents of a Typical File Control Block (Unix inode)
Field	Size (bytes)	Description
File Type	2	Regular file, directory, symbolic link, device file, socket, named pipe
Permissions	2	Read/write/execute for owner, group, others (12 bits + setuid/setgid/sticky)
Link Count	2-4	Number of hard links pointing to this inode
Owner UID	4	User ID of the file owner
Owner GID	4	Group ID of the file group
File Size	8	Size of file data in bytes (64-bit for large file support)
Access Time (atime)	8-16	Last time file data was read
Modification Time (mtime)	8-16	Last time file data was modified
Change Time (ctime)	8-16	Last time inode metadata was changed
Block Pointers	60+	Direct, indirect, double-indirect, triple-indirect block pointers
Flags	4	Immutable, append-only, no-dump, synchronous I/O, etc.
Extended Attribute Pointer	4-8	Pointer to extended attributes (ACLs, SELinux labels, user attributes)

Hard links — Multiple names can point to the same inode
Efficient renaming — Renaming only changes the directory entry, not the file itself
Atomic operations — The inode is the authoritative source of truth, reducing race conditions

When the link count drops to zero (no names reference the inode) AND no process has the file open, only then is the file truly deleted.

Inode Exhaustion

Directory Services and Path Resolution

Path Resolution Algorithm

When you access /home/user/docs/report.txt, the logical file system executes a path resolution algorithm:

1. Start at root directory (inode 2 on most Unix systems)
2. Read root directory contents
3. Search for entry named 'home' → get inode number
4. Read inode for 'home', verify it's a directory
5. Read 'home' directory contents
6. Search for entry named 'user' → get inode number
7. Read inode for 'user', verify it's a directory
8. Read 'user' directory contents
9. Search for entry named 'docs' → get inode number
10. Read inode for 'docs', verify it's a directory
11. Read 'docs' directory contents
12. Search for entry named 'report.txt' → get inode number
13. Read inode for 'report.txt', verify permissions
14. Return file handle

For a 4-component path, this involves reading at least 8 blocks (4 directory inodes + 4 directory contents). For deeply nested paths, the cost is even higher.

The Directory Name Lookup Cache (DNLC/dcache)

Directory Implementation Strategies

The logical file system must efficiently implement directories despite varying workloads—from directories with 3 files to directories with 3 million files. Several strategies exist:

Directory Implementation Comparison
Strategy	Lookup Time	Insert Time	Advantages	Disadvantages
Linear List	O(n)	O(1)*	Simple, works for small directories	Degrades badly with many entries
Hash Table	O(1) average	O(1) average	Fast for known names	No ordering, collision handling needed
B-Tree	O(log n)	O(log n)	Ordered, scales to millions of entries	More complex, some overhead
Hash + HTree (ext4)	O(1) → O(log n)	O(1) → O(log n)	Best of both: fast + scalable	Complexity, compatibility concerns

*Linear list insertion is O(1) only if we allow duplicates; with duplicate checking it becomes O(n).

Protection and Access Control

Access control happens at multiple levels:

Every operation — Read, write, execute, delete, rename, attribute change
Every path component — You need execute permission on every directory in a path
Every transition — Following symbolic links, crossing mount points

A single failed permission check aborts the entire operation with EACCES (permission denied) or EPERM (operation not permitted).

Unix Permission Model (DAC)

The traditional Unix model is Discretionary Access Control (DAC). The file owner decides who can access the file. Permissions are encoded in a 12-bit field:

   Special    Owner      Group      Others
   ─────────  ─────────  ─────────  ─────────
   suid sgid  r  w  x    r  w  x    r  w  x
   sticky                                     
   ─────────  ─────────  ─────────  ─────────
     3 bits    3 bits     3 bits     3 bits

Permission bits meaning:

r (4): Read file contents / list directory
w (2): Modify file contents / add/remove directory entries
x (1): Execute file / traverse directory

Special bits:

setuid (4000): Execute as file owner (not caller)
setgid (2000): Execute as file group / inherit group in directory
sticky (1000): Only owner can delete files in directory (used for /tmp)

The Directory Execute Bit Confusion

Access Control Lists (ACLs)

# Example: Grant specific user access without changing group
setfacl -m u:alice:rw /data/project/report.doc

# Example: Deny specific group access
setfacl -m g:contractors:--- /data/confidential/

# Example: Set default ACL for new files
setfacl -d -m g:developers:rwx /data/project/

Permission Check Algorithm

•If process is running as root (UID 0), most checks pass immediately (except execute without any x bit)
•Compare process's effective UID with file owner UID → use owner permissions if match
•Search process's group list for file's GID → use group permissions if match
•If ACLs exist, search ACL entries for matching user/group → use ACL permissions
•Fall back to 'others' permissions
•For capabilities-aware systems, check if process has required capability (e.g., CAP_DAC_OVERRIDE)

Open File Tables: Managing Active Files

The Three-Level Table Architecture

    Process A            Process B           System-wide
   ────────────        ────────────        ─────────────────
   ┌──────────┐        ┌──────────┐        ┌───────────────┐
   │ FD Table │        │ FD Table │        │ Open File     │
   ├──────────┤        ├──────────┤        │ Table         │
   │ 0  stdin │        │ 0  stdin │        ├───────────────┤
   │ 1  stdout│        │ 1  stdout│        │ Entry 1       │
   │ 2  stderr│        │ 2  stderr│        │  - offset: 0  │
   │ 3 ─────────┐      │ 3 ───────────┐    │  - mode: r    │
   │ 4 ─────────┼──────│ 4 ───────────┼────│  - inode ptr  │
   └──────────┘ │      └──────────┘   │    ├───────────────┤
                │                     │    │ Entry 2       │
                │                     └────│  - offset: 500│
                │                          │  - mode: rw   │
                └──────────────────────────│  - inode ptr  │
                                           └───────────────┘

Level 1: Per-Process File Descriptor Table

Each process has its own array of file descriptors (small integers: 0, 1, 2, ...)
Each entry points to an entry in the system-wide open file table
Inherited across fork(), allowing parent-child file sharing
close() removes the entry from this table only; the open file table entry survives while other descriptors still reference it

Level 2: System-Wide Open File Table

One entry per unique file opening (not per descriptor)
Contains: current file offset, access mode (r/w), flags, pointer to inode
Multiple FDs (same or different processes) can point to same entry → shared offset
Reference counted; entry freed when all FDs closed

Level 3: In-Memory Inode Table (vnode/inode cache)

One entry per open file, regardless of how many times it's open
Cached copy of on-disk inode, plus runtime state (locks, pages in memory)
Reference counted; evicted from cache when no longer needed
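
To make the relationships between the three levels concrete, here is a toy in-memory model. It is only a sketch: the class names (Inode, OpenFile, Process) and the simplified reference counting are assumptions for illustration and do not mirror any particular kernel's data structures.

```python
from dataclasses import dataclass


@dataclass
class Inode:                 # Level 3: one per file
    number: int
    refcount: int = 0


@dataclass
class OpenFile:              # Level 2: one per open() call
    inode: Inode
    mode: str
    offset: int = 0
    refcount: int = 0


class Process:               # Level 1: per-process FD table
    def __init__(self):
        self.fds = {}        # fd number -> OpenFile

    def _lowest_free_fd(self):
        fd = 0
        while fd in self.fds:
            fd += 1
        return fd

    def open(self, inode, mode="r"):
        inode.refcount += 1
        fd = self._lowest_free_fd()
        self.fds[fd] = OpenFile(inode, mode, refcount=1)
        return fd

    def fork(self):
        child = Process()
        for fd, entry in self.fds.items():
            entry.refcount += 1          # child shares the same entry
            child.fds[fd] = entry
        return child

    def close(self, fd):
        entry = self.fds.pop(fd)
        entry.refcount -= 1
        if entry.refcount == 0:          # last descriptor gone
            entry.inode.refcount -= 1
```

In this model, two open() calls on the same Inode yield two OpenFile entries with separate offsets, while fork() copies the FD table and leaves both processes pointing at the same OpenFile.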

Why Three Levels?

Practical Implications

Scenario 1: Two independent opens of the same file

// Process A
int fd = open("/data/log.txt", O_RDONLY);

// Process B (separate process)
int fd = open("/data/log.txt", O_RDONLY);

Each process gets its own FD
Each creates a separate open file table entry
Both entries point to the same inode
File offsets are independent — each reader tracks position separately
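
The independence of the two offsets can be observed directly. The sketch below performs both opens from a single process for brevity (each open() call still creates its own open file table entry); the function name and the temporary file are illustrative assumptions, and a Unix-like system is assumed.

```python
import os
import tempfile


def independent_offsets():
    """Open one file twice and report both offsets after reading via one FD."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()

        fd_a = os.open(tmp.name, os.O_RDONLY)  # first open file table entry
        fd_b = os.open(tmp.name, os.O_RDONLY)  # second, separate entry
        try:
            os.read(fd_a, 4)                   # advances only fd_a's offset
            return (os.lseek(fd_a, 0, os.SEEK_CUR),
                    os.lseek(fd_b, 0, os.SEEK_CUR))
        finally:
            os.close(fd_a)
            os.close(fd_b)
```

Reading four bytes through the first descriptor leaves the second descriptor's offset at zero.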

Scenario 2: Inherited file descriptor after fork()

int fd = open("/data/log.txt", O_RDWR);
if (fork() == 0) {
    write(fd, "child", 5);  // Child writes
} else {
    write(fd, "parent", 6); // Parent writes
}

Parent and child share the SAME open file table entry
File offset is shared — writes interleave at kernel-tracked position
Critical for: shell pipelines, log files, coordinated I/O
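
The shared offset can be checked with a small experiment: let the child read a few bytes, then ask the parent where the offset is. This is a sketch for Unix-like systems only (os.fork is unavailable on Windows), and the function name is hypothetical.

```python
import os
import tempfile


def offset_after_child_read(nbytes=4):
    """Fork, let the child read nbytes, and return the parent's offset."""
    with tempfile.TemporaryFile() as tmp:
        tmp.write(b"0123456789")
        tmp.flush()
        fd = tmp.fileno()
        os.lseek(fd, 0, os.SEEK_SET)   # rewind before forking

        pid = os.fork()
        if pid == 0:
            # Child: read through the inherited descriptor, then exit
            # immediately without running Python-level cleanup.
            try:
                os.read(fd, nbytes)
            finally:
                os._exit(0)

        os.waitpid(pid, 0)             # wait for the child to finish
        # Parent never read anything, yet the offset has moved.
        return os.lseek(fd, 0, os.SEEK_CUR)
```

The parent observes the offset the child left behind because both descriptors refer to one open file table entry.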

Operation Validation and Error Handling

Validations Performed by the Logical File System

•File descriptor validity — Is the FD a valid index in the process's FD table?
•Operation/mode compatibility — Can't write to read-only FD, can't read from write-only FD
•File type appropriateness — Can't read() a directory (use readdir()), can't lseek() on a pipe
•Argument sanity — Buffer pointers accessible, count values reasonable, offsets valid
•Resource limits — Process file descriptor limit, file size limits, quota
•Lock conflicts — Mandatory locks that block the operation (advisory locks only bind processes that check for them)
•Mount point state — Is the file system mounted read-only?
•File system specific constraints — Maximum file name length, allowed characters, path depth
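
A few of these validations are easy to trigger on purpose. The sketch below provokes three of them and reports the symbolic errno the kernel returns; the helper names are hypothetical, and the results described are what Linux reports (other systems may differ in detail).

```python
import errno
import os
import tempfile


def errno_name(func, *args):
    """Run func(*args); return the symbolic errno name, or 'OK' on success."""
    try:
        func(*args)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return "OK"


def probe_validations():
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "sample.txt")
        with open(path, "wb") as f:
            f.write(b"data")

        # Operation/mode compatibility: write through a read-only FD.
        fd = os.open(path, os.O_RDONLY)
        try:
            results["write_on_readonly_fd"] = errno_name(os.write, fd, b"x")
        finally:
            os.close(fd)

        # File type appropriateness: read() on a directory descriptor.
        dir_fd = os.open(tmpdir, os.O_RDONLY)
        try:
            results["read_on_directory"] = errno_name(os.read, dir_fd, 1)
        finally:
            os.close(dir_fd)

    # File type appropriateness: seeking on a pipe.
    r, w = os.pipe()
    try:
        results["lseek_on_pipe"] = errno_name(os.lseek, r, 0, os.SEEK_SET)
    finally:
        os.close(r)
        os.close(w)
    return results
```

On Linux the three probes report EBADF, EISDIR, and ESPIPE respectively, none of which ever reaches the storage layers.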

Error Codes and Their Meanings

When validation fails, the logical file system returns specific error codes via errno. Understanding these codes is essential for proper error handling:

Common File System Error Codes
Error Code     Value   Cause                                        Recovery Strategy
EACCES         13      Permission denied by mode bits or ACL        Check permissions, request elevation
EPERM          1       Operation requires privilege caller lacks    Usually not recoverable; redesign approach
ENOENT         2       File or path component doesn't exist         Verify path, create file if appropriate
EEXIST         17      File exists (for O_EXCL)                     Use different name, or open existing
EISDIR         21      Tried to write to a directory                Use directory APIs (mkdir, rmdir, readdir)
ENOTDIR        20      Path component not a directory               Fix path, resolve symlinks
EMFILE         24      Process FD limit reached                     Close unused FDs, increase ulimit
ENFILE         23      System open file limit reached               System-wide issue; contact admin
ENOSPC         28      No space left on device                      Free space, extend volume, compress data
EROFS          30      Read-only file system                        Remount read-write if appropriate
ELOOP          40      Too many symbolic links in path              Check for symlink loops
ENAMETOOLONG   36      Path or filename too long                    Use shorter names, different location

The numeric values shown are the Linux ones; some (such as ELOOP and ENAMETOOLONG) differ on other systems, so code should compare against the symbolic names.
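
In application code, these errors are best matched by symbolic name rather than number. The sketch below triggers four of the open-time errors from the table in a temporary directory; try_open and open_error_demo are hypothetical helper names, and the results described assume Linux.

```python
import errno
import os
import tempfile


def try_open(path, flags):
    """Attempt os.open(); return 'OK' or the symbolic errno name."""
    try:
        fd = os.open(path, flags, 0o600)
    except OSError as exc:
        return errno.errorcode[exc.errno]
    os.close(fd)
    return "OK"


def open_error_demo():
    with tempfile.TemporaryDirectory() as d:
        existing = os.path.join(d, "exists.txt")
        with open(existing, "w"):
            pass  # create an empty file

        return {
            # Path does not exist.
            "missing": try_open(os.path.join(d, "missing.txt"), os.O_RDONLY),
            # Exclusive create of a file that is already there.
            "exclusive_create": try_open(
                existing, os.O_CREAT | os.O_EXCL | os.O_WRONLY),
            # A regular file used as a directory component.
            "file_as_directory": try_open(
                os.path.join(existing, "child"), os.O_RDONLY),
            # Opening a directory for writing.
            "directory_for_write": try_open(d, os.O_WRONLY),
        }
```

On Linux the four cases report ENOENT, EEXIST, ENOTDIR, and EISDIR. Python also maps several of these to OSError subclasses (FileNotFoundError, FileExistsError, NotADirectoryError, IsADirectoryError), which can be caught directly.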

Silent Failures in Error Handling

Interface to Lower Layers

The logical file system communicates requests in an abstract form:

"Read block 1000 of file inode 12345"
"Write this data to block 2000 of file inode 12345"
"Allocate 5 new blocks for file inode 12345"

It does NOT specify:

Which physical sectors on disk contain these blocks
Whether the blocks are contiguous or fragmented
What the actual disk geometry is

The Conversion Process

   Application Request              Logical FS Action           To Lower Layer
   ──────────────────────           ─────────────────           ──────────────
   read(fd, buf, 4096)              1. Validate fd, buf, count
                                    2. Get inode from FD
                                    3. Check read permission
                                    4. Calculate: offset 0 → logical block 0
                                                  ────────────────────────────►
                                                  Request: read inode X, block 0

   write(fd, buf, 8192)             1. Validate fd, buf, count
   (offset at 4096)                 2. Get inode from FD
                                    3. Check write permission
                                    4. Calculate: offset 4096 → block 1
                                                  offset 8192 → block 2 (full block)
                                                  ────────────────────────────►
                                                  Request: allocate block if needed
                                                  Request: write inode X, blocks 1-2
                                    5. Update inode: size, mtime

Key transformations:

Byte offset → Logical block number: block = offset / block_size (integer division); offset % block_size gives the position within that block
Byte count → Block count: Handles partial blocks at start and end
Permissions → Allowed operations: Read/write/execute checks
File descriptor → Inode: Via the open file table chain
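
The first two transformations amount to a little integer arithmetic. The sketch below is one way to write it, assuming a 4096-byte block size as in the diagram; the function name blocks_for_request and its return shape are illustrative choices.

```python
def blocks_for_request(offset, count, block_size=4096):
    """Map a byte range to logical blocks.

    Returns (first_block, last_block, head_skip, tail_used) where head_skip
    is the number of bytes skipped at the start of the first block and
    tail_used is the number of bytes used in the last block.
    """
    if offset < 0 or count <= 0:
        raise ValueError("offset must be >= 0 and count must be > 0")

    last_byte = offset + count - 1
    first_block = offset // block_size
    last_block = last_byte // block_size
    head_skip = offset % block_size
    tail_used = last_byte % block_size + 1
    return first_block, last_block, head_skip, tail_used
```

For example, blocks_for_request(4096, 8192) covers logical blocks 1 through 2, while blocks_for_request(100, 5000) touches blocks 0 and 1 with partial blocks at both ends.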

The VFS Abstraction

Summary: The Logical File System Foundation

We've explored the logical file system—the top layer of the file system stack and the primary interface for applications and users. Let's consolidate the key concepts:

Key Takeaways

•The logical file system is the topmost layer — It handles all user-visible aspects: names, permissions, metadata, and directory structure.
•File Control Blocks (FCBs/inodes) are the fundamental unit — Every file has one, containing all metadata except the file's name.
•Names live in directories, not files — This separation enables hard links, efficient renaming, and atomic operations.
•Path resolution is expensive — Each component requires directory lookup; the dcache makes this bearable.
•Protection is enforced at this layer — Every operation passes through permission checks before reaching storage.
•Open file tables manage runtime state — Three levels (per-process FD, system-wide open file, inode cache) enable sharing semantics.
•Abstraction hides lower-layer complexity — The logical file system works with logical blocks, unaware of physical storage details.
•Proper error handling is critical — The logical file system returns specific error codes that applications must handle.
