Operating SystemsFile System Implementation

Directory Implementation

LevelIntermediate

Duration120 mins

TopicFile System Implementation

5 / 5

Name Lookup

From Path to File: The Lookup Journey

When you type cat /home/alice/documents/report.pdf, the shell must transform that human-readable path into something the file system can use—specifically, the inode number (or equivalent) of the file. This transformation, called pathname resolution or name lookup, is one of the most frequently executed operations in any operating system.

Consider the magnitude: every open(), stat(), access(), exec(), and hundreds of other system calls begin with name lookup. A single compilation of a moderately complex program might invoke name lookup tens of thousands of times. The efficiency of this operation directly impacts system responsiveness.

This page provides comprehensive coverage of name lookup: the step-by-step algorithm that traverses directory hierarchies, caching mechanisms that avoid repeated disk access, handling of symbolic links and mount points, permission checking at each step, and optimizations that make modern file systems fast.

What You Will Learn

By the end of this page, you will understand: (1) The complete pathname resolution algorithm, (2) How the directory entry cache (dcache) accelerates lookups, (3) Symbolic link handling and loop detection, (4) Mount point traversal mechanics, (5) Permission checking during path traversal, and (6) Case sensitivity and normalization considerations.

The Pathname Resolution Algorithm

Pathname resolution is fundamentally iterative: starting from a known directory, look up each component of the path in sequence until reaching the final target.

The Basic Algorithm:

Pathname Resolution Algorithm
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
// Simplified pathname resolution
// Returns inode of target file, or error
 
struct inode* resolve_path(const char* path, int flags) {
    struct inode* current;
    const char* component;
    int component_len;
    
    // Step 1: Determine starting point
    if (path[0] == '/') {
        // Absolute path: start from root
        current = get_root_inode();
        path++;  // Skip leading '/'
    } else {
        // Relative path: start from current working directory
        current = get_cwd_inode();
    }
    
    // Step 2: Process each path component
    while ((component_len = next_component(path, &component)) > 0) {
        // Step 2a: Verify current is a directory
        if (!S_ISDIR(current->i_mode)) {
            return ERR_PTR(-ENOTDIR);
        }
        
        // Step 2b: Check execute permission (search permission for dirs)
        if (!may_exec(current)) {
            return ERR_PTR(-EACCES);
        }
        
        // Step 2c: Look up component in current directory
        struct inode* next = dir_lookup(current, component, component_len);
        if (IS_ERR(next)) {
            return next;  // Component not found
        }
        
        // Step 2d: Handle symbolic links
        if (S_ISLNK(next->i_mode) && should_follow_link(flags, path)) {
            next = follow_symlink(next, flags);
            if (IS_ERR(next)) {
                return next;
            }
        }
        
        // Step 2e: Handle mount points
        if (is_mounted(next)) {
            next = get_mount_root(next);
        }
        
        // Move to next component
        iput(current);  // Release reference to old inode
        current = next;
        path = skip_slashes(component + component_len);
    }
    
    return current;
}
 
// Helper: Extract next path component
int next_component(const char* path, const char** out) {
    *out = path;
    int len = 0;
    while (path[len] && path[len] != '/') {
        len++;
    }
    return len;
}

Concrete Example:

Resolving /home/alice/report.pdf:

Resolution Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Path: /home/alice/report.pdf
 
Step 1: Start at root directory (inode 2)
        Path: "home/alice/report.pdf"
 
Step 2: Look up "home" in root directory
        → Scan root directory entries
        → Find entry: name="home", inode=100
        → Current = inode 100
        Path: "alice/report.pdf"
 
Step 3: Look up "alice" in /home (inode 100)
        → Check execute permission on inode 100 ✓
        → Scan /home directory entries
        → Find entry: name="alice", inode=500
        → Current = inode 500
        Path: "report.pdf"
 
Step 4: Look up "report.pdf" in /home/alice (inode 500)
        → Check execute permission on inode 500 ✓
        → Scan /home/alice directory entries
        → Find entry: name="report.pdf", inode=1234
        → No more components
        
Result: inode 1234
 
Statistics:
  - Directory lookups: 3
  - Permission checks: 3
  - Disk reads (cold cache): 3 directories + 3 inodes = 6 I/O ops
  - With hot dcache: 0 I/O ops (all in memory)

Why Execute Permission?

Directories require 'execute' permission (x bit) for traversal, not 'read'. Read permission lets you list contents; execute lets you traverse through. You can have 'x' without 'r'—allowing access to files whose names you know, but not listing.

The Directory Entry Cache (dcache)

Without caching, every path resolution would require disk I/O. The directory entry cache (dcache) stores recent (parent directory, filename) → inode mappings in memory.

dcache Architecture:

Linux dcache Structure
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
// Linux dentry structure (simplified)
struct dentry {
    // Identity
    struct qstr d_name;          // Filename with precomputed hash
    struct dentry *d_parent;     // Parent dentry
    struct inode *d_inode;       // Associated inode (NULL if negative)
    
    // Cache management
    struct hlist_node d_hash;    // Hash table linkage
    struct list_head d_lru;      // LRU list for reclaim
    struct list_head d_child;    // Child list for parent
    struct list_head d_subdirs;  // List of subdirectories
    
    // Reference counting
    atomic_t d_count;            // Reference count
    unsigned int d_flags;        // DCACHE_* flags
    
    // Operations
    const struct dentry_operations *d_op;
};
 
// qstr: Quick string with precomputed hash
struct qstr {
    unsigned int hash;           // Hash for fast comparison
    unsigned int len;            // String length
    const unsigned char *name;   // The actual name
};
 
// Lookup in dcache: O(1) average
struct dentry* d_lookup(struct dentry *parent, struct qstr *name) {
    // Compute hash bucket
    unsigned int hash = full_name_hash(parent, name->name, name->len);
    
    // Search in hash table
    struct hlist_head *head = d_hash(hash);
    struct dentry *dentry;
    
    hlist_for_each_entry(dentry, head, d_hash) {
        if (dentry->d_parent == parent &&
            dentry->d_name.hash == name->hash &&
            dentry->d_name.len == name->len &&
            memcmp(dentry->d_name.name, name->name, name->len) == 0) {
            // Found! Increment reference and return
            dget(dentry);
            return dentry;
        }
    }
    
    return NULL;  // Not in cache
}

Negative Entries:

The dcache also caches failed lookups (negative entries). If you access a nonexistent file, the dcache remembers "this name doesn't exist"—preventing repeated disk scans:

Negative Caching
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// Negative dentry: d_inode is NULL
struct dentry* lookup_result;
 
if (found_in_directory) {
    // Positive entry: inode exists
    lookup_result->d_inode = found_inode;
} else {
    // Negative entry: file doesn't exist
    lookup_result->d_inode = NULL;  
    // Cache this negative result!
}
 
// Next lookup for same name:
dentry = d_lookup(parent, name);
if (dentry != NULL) {
    if (dentry->d_inode == NULL) {
        // Negative cache hit: "definitely doesn't exist"
        return -ENOENT;  // No disk I/O needed!
    }
    // Positive cache hit: return the inode
}
 
// Why negative caching matters:
// - Linkers probe for many nonexistent paths
// - Shell PATH search tries many directories
// - Build systems stat() thousands of potential files
// Without negative caching, each probe hits disk

dcache Performance Impact
Scenario	Without dcache	With dcache	Speedup
Same file opened 1000 times	1000 × disk lookups	1 disk + 999 memory	~1000x
walk /usr/lib/* (1000 files)	1000 directory reads	1 directory read	~1000x
Stat nonexistent file repeatedly	Scan directory each time	Immediate ENOENT	infinite
Compile project (50K files)	Hours of I/O	Seconds	~100x

Cache Coherency

The dcache must stay synchronized with on-disk directories. When files are created/deleted, dentries are added/removed. Network file systems (NFS) face extra challenges—remote changes may not be immediately visible. This is why NFS has 'attribute caching timeout' settings.

Symbolic Link Resolution

Symbolic links add complexity to path resolution. When the kernel encounters a symlink, it must read the link's target and continue resolution from there—potentially recursively.

Symlink Resolution Algorithm:

Symlink Handling
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
// Symlink resolution with loop detection
#define MAX_SYMLINK_DEPTH 40  // Linux limit
 
struct inode* follow_symlink(
    struct inode* link,
    int flags,
    int* depth           // Current recursion depth
) {
    char target[PATH_MAX];
    int len;
    
    // Step 1: Check recursion depth (prevent infinite loops)
    if (*depth >= MAX_SYMLINK_DEPTH) {
        return ERR_PTR(-ELOOP);  // Too many levels of symbolic links
    }
    (*depth)++;
    
    // Step 2: Read symlink target
    len = link->i_op->readlink(link, target, PATH_MAX);
    if (len < 0) {
        return ERR_PTR(len);
    }
    target[len] = '\0';
    
    // Step 3: Resolve the target path
    // If absolute, start from root
    // If relative, start from symlink's parent directory
    if (target[0] == '/') {
        return resolve_path_internal(target, flags, depth);
    } else {
        struct dentry *parent = link->i_dentry->d_parent;
        return resolve_path_from(parent, target, flags, depth);
    }
}
 
// Example symlink chains:
// /usr/lib -> /usr/lib64        (simple redirect)
// /var/run -> /run              (backward compat)
// a -> b -> c -> a              (LOOP! Returns -ELOOP)
 
// Special handling for different operations:
// - open(O_NOFOLLOW): Don't follow final symlink
// - lstat(): Return symlink info, don't follow
// - unlink() on symlink: Remove the link, not target

Symlink Complexity Examples:

Symlink Resolution Examples
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
Example 1: Simple symlink
──────────────────────────
Path: /lib/libc.so.6
/lib is symlink -> /usr/lib
 
Resolution:
  1. "/" -> inode 2 (root)
  2. "lib" in / -> inode 50, is symlink!
     Read symlink: "/usr/lib"
     Restart from root with "/usr/lib"
  3. "/" -> inode 2
  4. "usr" in / -> inode 10
  5. "lib" in /usr -> inode 20
  6. Continue: "libc.so.6" in /usr/lib -> inode 789
 
Total lookups: 6 (more than path components due to symlink)
 
 
Example 2: Relative symlink
───────────────────────────
/home/alice/latest -> reports/2024/final.pdf
 
Resolution:
  1-3. Resolve to /home/alice, find "latest" is symlink
  4. Read symlink: "reports/2024/final.pdf" (relative!)
  5. Resolve from /home/alice: "reports" -> inode 100
  6. "2024" in reports -> inode 101
  7. "final.pdf" in 2024 -> inode 500
 
 
Example 3: Symlink loop
───────────────────────
/tmp/a -> b
/tmp/b -> c  
/tmp/c -> a
 
Path: /tmp/a
 
Resolution:
  1. "/" -> inode 2
  2. "tmp" -> inode 5
  3. "a" -> symlink! Read: "b"
  4. "b" in /tmp -> symlink! Read: "c", depth=2
  5. "c" in /tmp -> symlink! Read: "a", depth=3
  6. "a" in /tmp -> symlink! Read: "b", depth=4
  ... continues until depth=40 ...
  41. Return -ELOOP

TOCTOU Vulnerabilities

Symlinks create Time-Of-Check-Time-Of-Use (TOCTOU) security risks. Between checking 'is this file safe?' and 'open this file', an attacker might change a symlink to point elsewhere. This is why security-critical code uses O_NOFOLLOW and fstat() on open file descriptors.

Mount Point Traversal

When a filesystem is mounted on a directory, that directory becomes a gateway to a different filesystem. Path resolution must detect mount points and switch to the mounted filesystem's root.

Mount Architecture:

Mount Point Handling
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
// Simplified mount structure
struct mount {
    struct vfsmount mnt;         // VFS mount info
    struct dentry *mnt_root;     // Root of mounted filesystem
    struct dentry *mnt_mountpoint; // Where it's mounted
    struct mount *mnt_parent;    // Parent mount
    // ...
};
 
// During path resolution:
struct inode* lookup_with_mounts(
    struct dentry *parent,
    struct qstr *name
) {
    struct dentry *dentry = d_lookup(parent, name);
    if (!dentry) {
        // Not in cache, read from disk
        dentry = real_lookup(parent, name);
    }
    
    // Check if this dentry is a mount point
    while (d_mountpoint(dentry)) {
        struct mount *mnt = lookup_mnt(dentry);
        if (!mnt) break;
        
        // Switch to mounted filesystem's root
        dentry = mnt->mnt_root;
        // Continue checking: mounts can stack!
    }
    
    return dentry->d_inode;
}
 
// Mount stacking example:
// /mnt/usb   <- ext4 USB drive mounted here
// /mnt/usb/encrypted <- LUKS container from USB mounted here
// /mnt/usb/encrypted/secret <- another mount
// Path resolution traverses all mount boundaries transparently

Mount Namespace Isolation:

Modern Linux supports mount namespaces—different processes can see different mount hierarchies:

Mount Namespaces
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Mount Namespace Example (containers):
═══════════════════════════════════════════════════════════════════
 
Host namespace:
  /
  ├── home/
  ├── var/
  └── mnt/
      └── docker/
          └── overlay/  <- Container layered filesystem
 
Container namespace (isolated view):
  /                      <- Actually docker/overlay/merged
  ├── home/              <- Only container's /home
  ├── var/
  └── etc/
 
Same path "/home" resolves to different directories
depending on which namespace the process is in!
 
Path resolution must consider:
1. Which mount namespace is the process in?
2. What mounts exist in that namespace?
3. Which filesystems can this process access?

Mount Traversal Considerations

•Crossing Upward — When resolving '..' at mount root, result is mount point in parent filesystem, not parent directory in mounted filesystem.
•Bind Mounts — Same filesystem mounted at multiple locations. Resolution must track which mount we entered through.
•Automount — Some mount points trigger mounting when accessed. NFS 'autofs' does this for network shares.
•Lazy Unmount — Filesystem may be 'unmounted' while still in use. Active processes keep references.

Chroot Jails

chroot() changes a process's view of '/'. Path resolution starts from the chroot directory instead of true root. But escapes are possible if the process can access file descriptors or mount points outside the jail. Modern containers use mount namespaces for stronger isolation.

Lookup Optimizations

Path resolution is so frequent that even small optimizations have outsized impact. Modern kernels employ numerous techniques.

RCU-Based dcache Lookup:

RCU dcache Optimization
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
// Linux uses RCU (Read-Copy-Update) for lock-free dcache reads
// Most lookups require NO LOCKS at all!
 
struct dentry* d_lookup_rcu(struct dentry *parent, struct qstr *name) {
    struct dentry *dentry;
    unsigned seq;
    
    rcu_read_lock();  // Lightweight: no actual locking
    
retry:
    seq = read_seqcount_begin(&parent->d_seq);
    
    // Walk hash chain without locks
    dentry = __d_lookup_rcu(parent, name);
    
    if (dentry) {
        if (read_seqcount_retry(&parent->d_seq, seq)) {
            goto retry;  // Parent changed, retry
        }
    }
    
    rcu_read_unlock();
    return dentry;
}
 
// Performance impact:
// - Typical lookup: ~50-100 CPU cycles
// - No cache line bouncing between CPUs
// - Scales linearly with cores

Short-Circuit Optimizations:

Path Resolution Shortcuts
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// Special cases handled without full resolution
 
// Case 1: Empty path or just slashes
if (path[0] == '\0' || (path[0] == '/' && path[1] == '\0')) {
    return get_cwd();  // No resolution needed
}
 
// Case 2: "." (current directory)
if (path[0] == '.' && (path[1] == '\0' || path[1] == '/')) {
    // Skip this component entirely
}
 
// Case 3: ".." (parent directory) - cached in dentry
dentry = current->d_parent;  // O(1), no lookup
 
// Case 4: AT_EMPTY_PATH with fd
// openat(fd, "", AT_EMPTY_PATH) operates on fd itself
// No path resolution at all
 
// Case 5: Cached working directory
// cwd dentry kept in task_struct
// No resolution of /home/user/project each time
 
// Case 6: O_PATH opens
// open("/some/path", O_PATH) just returns fd
// Actual file content not accessed
// Used for fstatat(), fchmod() etc.

Hash Optimization:

Name Hash Optimization
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// Linux uses a specialized hash for path components
// full_name_hash() includes parent's hash for uniqueness
 
unsigned int full_name_hash(
    const struct dentry *parent,
    const unsigned char *name,
    unsigned int len
) {
    unsigned int hash = parent->d_name.hash;
    
    // Rolling hash: very fast, good distribution
    while (len--) {
        hash = partial_name_hash(*name++, hash);
    }
    
    return end_name_hash(hash);
}
 
// partial_name_hash: single byte addition
static inline unsigned int partial_name_hash(
    unsigned long c,
    unsigned long prevhash
) {
    // Multiply and add: excellent avalanche properties
    return (prevhash + (c << 4) + (c >> 4)) * 11;
}
 
// Benefits:
// 1. Incremental: can compute while parsing path
// 2. Parent-aware: "foo" in different dirs has different hash
// 3. Fast: few instructions per character
// 4. Good distribution: minimizes hash collisions

Lookup Performance Factors
Factor	Impact	Optimization
dcache hit rate	Major (10-100x)	Large cache, smart eviction
Lock contention	Moderate (2-5x on many cores)	RCU, per-directory locks
Hash quality	Minor (1.1-1.5x)	Good hash function, sizing
Symlink depth	Can be major (40x max)	Limit depth, cache resolutions
Mount complexity	Usually minor	Mount cache, lazy unmount

Path Lookup Profiling

Use 'perf' to analyze lookup performance: 'perf stat -e dentry:d_lookup' counts cache hits. High dcache hit rates (>90%) indicate healthy caching. Low rates may indicate memory pressure or unusual access patterns.

Case Sensitivity and Normalization

Different file systems handle case differently, which affects path resolution significantly.

Case Handling Modes:

Case Sensitivity by File System
File System	Case Sensitive	Case Preserving	Notes
ext4 (Linux)	Yes	Yes	'File.txt' ≠ 'file.txt'
NTFS (Windows)	No*	Yes	Registry controls sensitivity
APFS (macOS)	No*	Yes	Per-volume option
FAT32	No	Limited (8.3 uppercase)	LFN preserves case
HFS+	No	Yes	Mac default

Case-Insensitive Lookup
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// Case-insensitive comparison (NTFS style)
int compare_names_ci(const char *a, const char *b, size_t len) {
    for (size_t i = 0; i < len; i++) {
        // Convert to uppercase for comparison
        char ca = toupper((unsigned char)a[i]);
        char cb = toupper((unsigned char)b[i]);
        if (ca != cb) return ca - cb;
    }
    return 0;
}
 
// But it's more complex with Unicode:
// - "straße" should equal "STRASSE" (German sharp S)
// - Turkish: "i" has two uppercase forms: "I" and "İ"
// - Unicode has multiple normalization forms
 
// Windows solution: Upcase table in $UpCase file
// Maps every Unicode character to uppercase equivalent
// Lookup uses: upcase_table[char1] == upcase_table[char2]
 
// macOS HFS+ uses Unicode decomposition:
// "café" stored as "cafe\u0301" (e + combining accent)
// Comparisons normalize before matching
 
// Implications for dcache:
// - Case-sensitive: exact hash of name
// - Case-insensitive: hash of uppercased/normalized name
// - Need separate caches for each mode!

Cross-Platform Dangers

Git repositories created on case-insensitive systems (Mac, Windows) may contain 'file.txt' and 'File.txt' as the same file. On Linux, these become two files. This causes subtle bugs when checking out on different platforms. Use 'git config core.ignorecase' to control behavior.

Summary: Name Lookup

Name lookup is the gateway between human-readable paths and file system internals. Its efficiency determines overall system responsiveness.

Key Takeaways

•Algorithm — Iteratively resolve each path component, checking permissions and handling special cases at each step.
•dcache — Caches (parent, name) → inode mappings. Negative entries cache failed lookups. Achieves near-100% hit rates for common workloads.
•Symbolic Links — Kernel follows links transparently, with depth limits to prevent infinite loops. O_NOFOLLOW disables following.
•Mount Points — Resolution seamlessly crosses filesystem boundaries. Mount namespaces provide isolation.
•Optimizations — RCU enables lock-free reads. Short-circuits handle common cases. Hash functions tuned for path components.
•Case Sensitivity — Varies by filesystem. Affects comparison, hashing, and caching. Unicode normalization adds complexity.

Module Complete:

You've now completed the comprehensive coverage of directory implementation—from the data structures (linear lists, hash tables, B-trees) through the entries they contain, to the lookup algorithms that traverse them. This knowledge provides the foundation for understanding how file systems organize and access files efficiently.

Module Complete

Congratulations! You now understand directory implementation comprehensively: linear lists, hash tables, B-trees, directory entry structures, and name lookup algorithms. This knowledge explains how file systems scale from tiny directories to those containing millions of files, and why operations like 'ls' and 'open' can be so fast.

5 / 5

Loading learning content...

Operating SystemsFile System Implementation

Directory Implementation

LevelIntermediate

Duration120 mins

TopicFile System Implementation

5 / 5

Name Lookup

From Path to File: The Lookup Journey

What You Will Learn

The Pathname Resolution Algorithm

Pathname resolution is fundamentally iterative: starting from a known directory, look up each component of the path in sequence until reaching the final target.

The Basic Algorithm:

Pathname Resolution Algorithm
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
// Simplified pathname resolution
// Returns inode of target file, or error
 
struct inode* resolve_path(const char* path, int flags) {
    struct inode* current;
    const char* component;
    int component_len;
    
    // Step 1: Determine starting point
    if (path[0] == '/') {
        // Absolute path: start from root
        current = get_root_inode();
        path++;  // Skip leading '/'
    } else {
        // Relative path: start from current working directory
        current = get_cwd_inode();
    }
    
    // Step 2: Process each path component
    while ((component_len = next_component(path, &component)) > 0) {
        // Step 2a: Verify current is a directory
        if (!S_ISDIR(current->i_mode)) {
            return ERR_PTR(-ENOTDIR);
        }
        
        // Step 2b: Check execute permission (search permission for dirs)
        if (!may_exec(current)) {
            return ERR_PTR(-EACCES);
        }
        
        // Step 2c: Look up component in current directory
        struct inode* next = dir_lookup(current, component, component_len);
        if (IS_ERR(next)) {
            return next;  // Component not found
        }
        
        // Step 2d: Handle symbolic links
        if (S_ISLNK(next->i_mode) && should_follow_link(flags, path)) {
            next = follow_symlink(next, flags);
            if (IS_ERR(next)) {
                return next;
            }
        }
        
        // Step 2e: Handle mount points
        if (is_mounted(next)) {
            next = get_mount_root(next);
        }
        
        // Move to next component
        iput(current);  // Release reference to old inode
        current = next;
        path = skip_slashes(component + component_len);
    }
    
    return current;
}
 
// Helper: Extract next path component
int next_component(const char* path, const char** out) {
    *out = path;
    int len = 0;
    while (path[len] && path[len] != '/') {
        len++;
    }
    return len;
}

Concrete Example:

Resolving /home/alice/report.pdf:

Resolution Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
Path: /home/alice/report.pdf
 
Step 1: Start at root directory (inode 2)
        Path: "home/alice/report.pdf"
 
Step 2: Look up "home" in root directory
        → Scan root directory entries
        → Find entry: name="home", inode=100
        → Current = inode 100
        Path: "alice/report.pdf"
 
Step 3: Look up "alice" in /home (inode 100)
        → Check execute permission on inode 100 ✓
        → Scan /home directory entries
        → Find entry: name="alice", inode=500
        → Current = inode 500
        Path: "report.pdf"
 
Step 4: Look up "report.pdf" in /home/alice (inode 500)
        → Check execute permission on inode 500 ✓
        → Scan /home/alice directory entries
        → Find entry: name="report.pdf", inode=1234
        → No more components
        
Result: inode 1234
 
Statistics:
  - Directory lookups: 3
  - Permission checks: 3
  - Disk reads (cold cache): 3 directories + 3 inodes = 6 I/O ops
  - With hot dcache: 0 I/O ops (all in memory)

Why Execute Permission?

The Directory Entry Cache (dcache)

Without caching, every path resolution would require disk I/O. The directory entry cache (dcache) stores recent (parent directory, filename) → inode mappings in memory.

dcache Architecture:

Linux dcache Structure
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
// Linux dentry structure (simplified)
struct dentry {
    // Identity
    struct qstr d_name;          // Filename with precomputed hash
    struct dentry *d_parent;     // Parent dentry
    struct inode *d_inode;       // Associated inode (NULL if negative)
    
    // Cache management
    struct hlist_node d_hash;    // Hash table linkage
    struct list_head d_lru;      // LRU list for reclaim
    struct list_head d_child;    // Child list for parent
    struct list_head d_subdirs;  // List of subdirectories
    
    // Reference counting
    atomic_t d_count;            // Reference count
    unsigned int d_flags;        // DCACHE_* flags
    
    // Operations
    const struct dentry_operations *d_op;
};
 
// qstr: Quick string with precomputed hash
struct qstr {
    unsigned int hash;           // Hash for fast comparison
    unsigned int len;            // String length
    const unsigned char *name;   // The actual name
};
 
// Lookup in dcache: O(1) average
struct dentry* d_lookup(struct dentry *parent, struct qstr *name) {
    // Compute hash bucket
    unsigned int hash = full_name_hash(parent, name->name, name->len);
    
    // Search in hash table
    struct hlist_head *head = d_hash(hash);
    struct dentry *dentry;
    
    hlist_for_each_entry(dentry, head, d_hash) {
        if (dentry->d_parent == parent &&
            dentry->d_name.hash == name->hash &&
            dentry->d_name.len == name->len &&
            memcmp(dentry->d_name.name, name->name, name->len) == 0) {
            // Found! Increment reference and return
            dget(dentry);
            return dentry;
        }
    }
    
    return NULL;  // Not in cache
}

Negative Entries:

The dcache also caches failed lookups (negative entries). If you access a nonexistent file, the dcache remembers "this name doesn't exist"—preventing repeated disk scans:

Negative Caching
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// Negative dentry: d_inode is NULL
struct dentry* lookup_result;
 
if (found_in_directory) {
    // Positive entry: inode exists
    lookup_result->d_inode = found_inode;
} else {
    // Negative entry: file doesn't exist
    lookup_result->d_inode = NULL;  
    // Cache this negative result!
}
 
// Next lookup for same name:
dentry = d_lookup(parent, name);
if (dentry != NULL) {
    if (dentry->d_inode == NULL) {
        // Negative cache hit: "definitely doesn't exist"
        return -ENOENT;  // No disk I/O needed!
    }
    // Positive cache hit: return the inode
}
 
// Why negative caching matters:
// - Linkers probe for many nonexistent paths
// - Shell PATH search tries many directories
// - Build systems stat() thousands of potential files
// Without negative caching, each probe hits disk

dcache Performance Impact
Scenario	Without dcache	With dcache	Speedup
Same file opened 1000 times	1000 × disk lookups	1 disk + 999 memory	~1000x
walk /usr/lib/* (1000 files)	1000 directory reads	1 directory read	~1000x
Stat nonexistent file repeatedly	Scan directory each time	Immediate ENOENT	infinite
Compile project (50K files)	Hours of I/O	Seconds	~100x

Cache Coherency

Symbolic Link Resolution

Symbolic links add complexity to path resolution. When the kernel encounters a symlink, it must read the link's target and continue resolution from there—potentially recursively.

Symlink Resolution Algorithm:

Symlink Handling
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
// Symlink resolution with loop detection
#define MAX_SYMLINK_DEPTH 40  // Linux limit
 
struct inode* follow_symlink(
    struct inode* link,
    int flags,
    int* depth           // Current recursion depth
) {
    char target[PATH_MAX];
    int len;
    
    // Step 1: Check recursion depth (prevent infinite loops)
    if (*depth >= MAX_SYMLINK_DEPTH) {
        return ERR_PTR(-ELOOP);  // Too many levels of symbolic links
    }
    (*depth)++;
    
    // Step 2: Read symlink target
    len = link->i_op->readlink(link, target, PATH_MAX);
    if (len < 0) {
        return ERR_PTR(len);
    }
    target[len] = '\0';
    
    // Step 3: Resolve the target path
    // If absolute, start from root
    // If relative, start from symlink's parent directory
    if (target[0] == '/') {
        return resolve_path_internal(target, flags, depth);
    } else {
        struct dentry *parent = link->i_dentry->d_parent;
        return resolve_path_from(parent, target, flags, depth);
    }
}
 
// Example symlink chains:
// /usr/lib -> /usr/lib64        (simple redirect)
// /var/run -> /run              (backward compat)
// a -> b -> c -> a              (LOOP! Returns -ELOOP)
 
// Special handling for different operations:
// - open(O_NOFOLLOW): Don't follow final symlink
// - lstat(): Return symlink info, don't follow
// - unlink() on symlink: Remove the link, not target

Symlink Complexity Examples:

Symlink Resolution Examples
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
Example 1: Simple symlink
──────────────────────────
Path: /lib/libc.so.6
/lib is symlink -> /usr/lib
 
Resolution:
  1. "/" -> inode 2 (root)
  2. "lib" in / -> inode 50, is symlink!
     Read symlink: "/usr/lib"
     Restart from root with "/usr/lib"
  3. "/" -> inode 2
  4. "usr" in / -> inode 10
  5. "lib" in /usr -> inode 20
  6. Continue: "libc.so.6" in /usr/lib -> inode 789
 
Total lookups: 6 (more than path components due to symlink)
 
 
Example 2: Relative symlink
───────────────────────────
/home/alice/latest -> reports/2024/final.pdf
 
Resolution:
  1-3. Resolve to /home/alice, find "latest" is symlink
  4. Read symlink: "reports/2024/final.pdf" (relative!)
  5. Resolve from /home/alice: "reports" -> inode 100
  6. "2024" in reports -> inode 101
  7. "final.pdf" in 2024 -> inode 500
 
 
Example 3: Symlink loop
───────────────────────
/tmp/a -> b
/tmp/b -> c  
/tmp/c -> a
 
Path: /tmp/a
 
Resolution:
  1. "/" -> inode 2
  2. "tmp" -> inode 5
  3. "a" -> symlink! Read: "b"
  4. "b" in /tmp -> symlink! Read: "c", depth=2
  5. "c" in /tmp -> symlink! Read: "a", depth=3
  6. "a" in /tmp -> symlink! Read: "b", depth=4
  ... continues until depth=40 ...
  41. Return -ELOOP

TOCTOU Vulnerabilities

Mount Point Traversal

When a filesystem is mounted on a directory, that directory becomes a gateway to a different filesystem. Path resolution must detect mount points and switch to the mounted filesystem's root.

Mount Architecture:

Mount Point Handling
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
// Simplified mount structure
struct mount {
    struct vfsmount mnt;         // VFS mount info
    struct dentry *mnt_root;     // Root of mounted filesystem
    struct dentry *mnt_mountpoint; // Where it's mounted
    struct mount *mnt_parent;    // Parent mount
    // ...
};
 
// During path resolution:
struct inode* lookup_with_mounts(
    struct dentry *parent,
    struct qstr *name
) {
    struct dentry *dentry = d_lookup(parent, name);
    if (!dentry) {
        // Not in cache, read from disk
        dentry = real_lookup(parent, name);
    }
    
    // Check if this dentry is a mount point
    while (d_mountpoint(dentry)) {
        struct mount *mnt = lookup_mnt(dentry);
        if (!mnt) break;
        
        // Switch to mounted filesystem's root
        dentry = mnt->mnt_root;
        // Continue checking: mounts can stack!
    }
    
    return dentry->d_inode;
}
 
// Mount stacking example:
// /mnt/usb   <- ext4 USB drive mounted here
// /mnt/usb/encrypted <- LUKS container from USB mounted here
// /mnt/usb/encrypted/secret <- another mount
// Path resolution traverses all mount boundaries transparently

Mount Namespace Isolation:

Modern Linux supports mount namespaces—different processes can see different mount hierarchies:

Mount Namespaces
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Mount Namespace Example (containers):
═══════════════════════════════════════════════════════════════════
 
Host namespace:
  /
  ├── home/
  ├── var/
  └── mnt/
      └── docker/
          └── overlay/  <- Container layered filesystem
 
Container namespace (isolated view):
  /                      <- Actually docker/overlay/merged
  ├── home/              <- Only container's /home
  ├── var/
  └── etc/
 
Same path "/home" resolves to different directories
depending on which namespace the process is in!
 
Path resolution must consider:
1. Which mount namespace is the process in?
2. What mounts exist in that namespace?
3. Which filesystems can this process access?

Mount Traversal Considerations

•Crossing Upward — When resolving '..' at mount root, result is mount point in parent filesystem, not parent directory in mounted filesystem.
•Bind Mounts — Same filesystem mounted at multiple locations. Resolution must track which mount we entered through.
•Automount — Some mount points trigger mounting when accessed. NFS 'autofs' does this for network shares.
•Lazy Unmount — Filesystem may be 'unmounted' while still in use. Active processes keep references.

Chroot Jails

Lookup Optimizations

Path resolution is so frequent that even small optimizations have outsized impact. Modern kernels employ numerous techniques.

RCU-Based dcache Lookup:

RCU dcache Optimization
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
// Linux uses RCU (Read-Copy-Update) for lock-free dcache reads
// Most lookups require NO LOCKS at all!
 
struct dentry* d_lookup_rcu(struct dentry *parent, struct qstr *name) {
    struct dentry *dentry;
    unsigned seq;
    
    rcu_read_lock();  // Lightweight: no actual locking
    
retry:
    seq = read_seqcount_begin(&parent->d_seq);
    
    // Walk hash chain without locks
    dentry = __d_lookup_rcu(parent, name);
    
    if (dentry) {
        if (read_seqcount_retry(&parent->d_seq, seq)) {
            goto retry;  // Parent changed, retry
        }
    }
    
    rcu_read_unlock();
    return dentry;
}
 
// Performance impact:
// - Typical lookup: ~50-100 CPU cycles
// - No cache line bouncing between CPUs
// - Scales linearly with cores

Short-Circuit Optimizations:

Path Resolution Shortcuts
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// Special cases handled without full resolution
 
// Case 1: Empty path or just slashes
if (path[0] == '\0' || (path[0] == '/' && path[1] == '\0')) {
    return get_cwd();  // No resolution needed
}
 
// Case 2: "." (current directory)
if (path[0] == '.' && (path[1] == '\0' || path[1] == '/')) {
    // Skip this component entirely
}
 
// Case 3: ".." (parent directory) - cached in dentry
dentry = current->d_parent;  // O(1), no lookup
 
// Case 4: AT_EMPTY_PATH with fd
// openat(fd, "", AT_EMPTY_PATH) operates on fd itself
// No path resolution at all
 
// Case 5: Cached working directory
// cwd dentry kept in task_struct
// No resolution of /home/user/project each time
 
// Case 6: O_PATH opens
// open("/some/path", O_PATH) just returns fd
// Actual file content not accessed
// Used for fstatat(), fchmod() etc.

Hash Optimization:

Name Hash Optimization
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// Linux uses a specialized hash for path components
// full_name_hash() includes parent's hash for uniqueness
 
unsigned int full_name_hash(
    const struct dentry *parent,
    const unsigned char *name,
    unsigned int len
) {
    unsigned int hash = parent->d_name.hash;
    
    // Rolling hash: very fast, good distribution
    while (len--) {
        hash = partial_name_hash(*name++, hash);
    }
    
    return end_name_hash(hash);
}
 
// partial_name_hash: single byte addition
static inline unsigned int partial_name_hash(
    unsigned long c,
    unsigned long prevhash
) {
    // Multiply and add: excellent avalanche properties
    return (prevhash + (c << 4) + (c >> 4)) * 11;
}
 
// Benefits:
// 1. Incremental: can compute while parsing path
// 2. Parent-aware: "foo" in different dirs has different hash
// 3. Fast: few instructions per character
// 4. Good distribution: minimizes hash collisions

Lookup Performance Factors
Factor	Impact	Optimization
dcache hit rate	Major (10-100x)	Large cache, smart eviction
Lock contention	Moderate (2-5x on many cores)	RCU, per-directory locks
Hash quality	Minor (1.1-1.5x)	Good hash function, sizing
Symlink depth	Can be major (40x max)	Limit depth, cache resolutions
Mount complexity	Usually minor	Mount cache, lazy unmount

Path Lookup Profiling

Case Sensitivity and Normalization

Different file systems handle case differently, which affects path resolution significantly.

Case Handling Modes:

Case Sensitivity by File System
File System	Case Sensitive	Case Preserving	Notes
ext4 (Linux)	Yes	Yes	'File.txt' ≠ 'file.txt'
NTFS (Windows)	No*	Yes	Registry controls sensitivity
APFS (macOS)	No*	Yes	Per-volume option
FAT32	No	Limited (8.3 uppercase)	LFN preserves case
HFS+	No	Yes	Mac default

Case-Insensitive Lookup
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// Case-insensitive comparison (NTFS style)
int compare_names_ci(const char *a, const char *b, size_t len) {
    for (size_t i = 0; i < len; i++) {
        // Convert to uppercase for comparison
        char ca = toupper((unsigned char)a[i]);
        char cb = toupper((unsigned char)b[i]);
        if (ca != cb) return ca - cb;
    }
    return 0;
}
 
// But it's more complex with Unicode:
// - "straße" should equal "STRASSE" (German sharp S)
// - Turkish: "i" has two uppercase forms: "I" and "İ"
// - Unicode has multiple normalization forms
 
// Windows solution: Upcase table in $UpCase file
// Maps every Unicode character to uppercase equivalent
// Lookup uses: upcase_table[char1] == upcase_table[char2]
 
// macOS HFS+ uses Unicode decomposition:
// "café" stored as "cafe\u0301" (e + combining accent)
// Comparisons normalize before matching
 
// Implications for dcache:
// - Case-sensitive: exact hash of name
// - Case-insensitive: hash of uppercased/normalized name
// - Need separate caches for each mode!

Cross-Platform Dangers

Summary: Name Lookup

Name lookup is the gateway between human-readable paths and file system internals. Its efficiency determines overall system responsiveness.

Key Takeaways

•Algorithm — Iteratively resolve each path component, checking permissions and handling special cases at each step.
•dcache — Caches (parent, name) → inode mappings. Negative entries cache failed lookups. Achieves near-100% hit rates for common workloads.
•Symbolic Links — Kernel follows links transparently, with depth limits to prevent infinite loops. O_NOFOLLOW disables following.
•Mount Points — Resolution seamlessly crosses filesystem boundaries. Mount namespaces provide isolation.
•Optimizations — RCU enables lock-free reads. Short-circuits handle common cases. Hash functions tuned for path components.
•Case Sensitivity — Varies by filesystem. Affects comparison, hashing, and caching. Unicode normalization adds complexity.

Module Complete:

Module Complete

5 / 5