Directory Operations - Learning Module

List Directory

Reading the Contents of Directories

The ls command is perhaps the most frequently used command in Unix-like systems. Every time you list a directory's contents, you're invoking a sophisticated mechanism that reads directory entries, potentially retrieves file metadata, and presents the results in a human-readable format.

But beneath the simplicity of ls lies a rich set of system interfaces and design decisions. Why can't you read() a directory like a regular file? How does the kernel prevent partial reads of directory entries? What happens when a directory has millions of files? Understanding directory listing reveals fundamental truths about the contract between user space and the kernel.

What You Will Learn

By the end of this page, you will understand the complete mechanics of directory listing—from the POSIX readdir() interface through the underlying getdents() system call, directory stream management, the dirent structure, and cross-platform considerations. You'll also learn about performance implications and how to efficiently enumerate large directories.

Why read() Doesn't Work on Directories

In Unix-like systems, directories are files. This might suggest you could simply open() a directory and read() its contents as raw bytes. However, this approach is explicitly forbidden by modern POSIX systems, and the reasons are fundamental to file system integrity.

Historical Context:

In early Unix (through Version 7), you could read directories as raw bytes. The directory format was simple and fixed:

struct direct {
    ino_t d_ino;      /* 2 bytes: inode number */
    char  d_name[14]; /* 14 bytes: file name */
};

Each entry was exactly 16 bytes. Programs could read directories byte-by-byte and parse entries directly. But this approach caused serious problems:

Problems with Raw Directory Reading

•Format Lock-In — User programs embedded knowledge of directory structure, making it impossible to change formats without breaking applications
•Portability Nightmare — Different file systems (ext2, XFS, etc.) use different directory formats; programs couldn't work across file systems
•Partial Read Danger — If read() returned partial entries, user programs could parse garbage as valid entries
•Security Concerns — Raw access could expose deleted entry remnants or internal file system metadata
•No Room for Growth — The 14-character filename limit was embedded in every program that read directories

The Modern Solution:

Starting with 4.2BSD, and later System V, Unix systems moved to dedicated directory-reading interfaces. The kernel provides an abstraction layer that hides the on-disk format:

•User space gets a standard struct dirent regardless of file system
•The kernel translates from on-disk format to the standard structure
•Reads are guaranteed to return complete entries, never partial ones
•New file systems can use any format without breaking applications

why_read_fails.c
C
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>   /* strerror() */
#include <errno.h>
 
/**
 * Demonstrating that read() fails on directories
 */
void demonstrate_read_failure(void) {
    int fd = open("/tmp", O_RDONLY);
    if (fd == -1) {
        perror("open");
        return;
    }
    
    char buffer[1024];
    ssize_t n = read(fd, buffer, sizeof(buffer));
    
    if (n == -1) {
        // On Linux: EISDIR (Is a directory)
        // Other systems may report a different error
        printf("read() on directory failed: %s (errno=%d)\n",
               strerror(errno), errno);
    }
    
    close(fd);
}
 
/**
 * The correct way: use opendir()/readdir() or getdents()
 */
#include <dirent.h>
 
void demonstrate_readdir(void) {
    DIR *dir = opendir("/tmp");
    if (dir == NULL) {
        perror("opendir");
        return;
    }
    
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        printf("Found: %s (inode: %lu)\n",
               entry->d_name, (unsigned long)entry->d_ino);
    }
    
    closedir(dir);
}

EISDIR Error

When you call read() on a directory file descriptor on Linux, the call fails with EISDIR (Is a directory). POSIX specifies EISDIR for implementations that do not allow directories to be read this way, so the exact behavior is up to the implementation, and a few systems have historically returned raw directory data instead. Portable code should treat read() on a directory as unsupported.

The POSIX Directory Reading Interface

POSIX defines a standard interface for reading directories that abstracts away file system specifics. The key components are:

opendir() — Open a directory and get a directory stream
readdir() — Read the next entry from the stream
closedir() — Close the directory stream
rewinddir() — Reset stream to beginning
seekdir() / telldir() — Position within the stream

The Core Interface:

posix_readdir.c
C (POSIX)
#include <dirent.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>      /* AT_SYMLINK_NOFOLLOW */
#include <sys/stat.h>
 
/**
 * Basic directory listing using opendir/readdir/closedir
 */
int list_directory(const char *path) {
    DIR *dir;
    struct dirent *entry;
    
    // Open the directory stream
    dir = opendir(path);
    if (dir == NULL) {
        perror("opendir");
        return -1;
    }
    
    // Read entries one by one
    // Note: readdir() returns NULL at end-of-directory OR on error
    // To distinguish, reset errno before calling
    errno = 0;
    while ((entry = readdir(dir)) != NULL) {
        printf("%s\n", entry->d_name);
        errno = 0;  // Reset for next iteration
    }
    
    // Check if we ended due to error
    if (errno != 0) {
        perror("readdir");
        closedir(dir);
        return -1;
    }
    
    closedir(dir);
    return 0;
}
 
/**
 * The dirent structure (POSIX guaranteed fields)
 *
 * struct dirent {
 *     ino_t  d_ino;       // inode number
 *     char   d_name[];    // filename (null-terminated)
 * };
 *
 * Linux extends this with additional fields (not portable):
 *   off_t  d_off;         // offset to next entry
 *   unsigned char d_type; // file type (DT_REG, DT_DIR, etc.)
 *   unsigned short d_reclen; // length of this record
 */
 
/**
 * More detailed listing using Linux d_type extension
 */
void list_directory_with_types(const char *path) {
    DIR *dir = opendir(path);
    if (!dir) {
        perror("opendir");
        return;
    }
    
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        const char *type;
        
        // d_type is a Linux extension, may not be filled on all filesystems
        switch (entry->d_type) {
            case DT_REG:  type = "file";      break;
            case DT_DIR:  type = "directory"; break;
            case DT_LNK:  type = "symlink";   break;
            case DT_CHR:  type = "char dev";  break;
            case DT_BLK:  type = "block dev"; break;
            case DT_FIFO: type = "fifo";      break;
            case DT_SOCK: type = "socket";    break;
            case DT_UNKNOWN:
            default:      type = "unknown";   break;
        }
        
        printf("[%s] %s (inode %lu)\n",
               type, entry->d_name, (unsigned long)entry->d_ino);
    }
    
    closedir(dir);
}
 
/**
 * Handling the d_type caveat
 *
 * Some filesystems (notably older NFS, some XFS configurations) 
 * don't support d_type and return DT_UNKNOWN. Always fall back
 * to stat() when d_type is DT_UNKNOWN.
 */
int is_directory(int dirfd, const char *name, unsigned char d_type) {
    if (d_type != DT_UNKNOWN) {
        return d_type == DT_DIR;
    }
    
    // Fallback: use stat
    struct stat st;
    if (fstatat(dirfd, name, &st, AT_SYMLINK_NOFOLLOW) == -1) {
        return 0;  // Assume not a directory on error
    }
    
    return S_ISDIR(st.st_mode);
}

d_type Values (Linux Extension)
Value	Meaning	Description
DT_REG	Regular file	Normal data file
DT_DIR	Directory	Subdirectory
DT_LNK	Symbolic link	Soft link to another file
DT_CHR	Character device	Character special file
DT_BLK	Block device	Block special file
DT_FIFO	Named pipe (FIFO)	Inter-process communication
DT_SOCK	Socket	Unix domain socket
DT_UNKNOWN	Unknown	Type not determined; use stat()
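
The fallback described for DT_UNKNOWN can be generalized beyond the directory check. The sketch below is one possible way to do it, assuming a glibc-style <dirent.h> that provides the IFTODT() macro (a BSD extension, not part of POSIX). The helper names resolve_type() and count_subdirs() are hypothetical and chosen only for illustration.

```c
#define _DEFAULT_SOURCE   /* assumption: exposes DT_* and IFTODT on glibc/musl */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: return a DT_* constant for an entry that was read
 * from the directory open as dfd. fstatat() is only called when the
 * file system did not fill in d_type. Returns DT_UNKNOWN if that fails. */
unsigned char resolve_type(int dfd, const struct dirent *entry) {
    assert(entry != NULL);
    if (entry->d_type != DT_UNKNOWN) {
        return entry->d_type;
    }
    struct stat st;
    if (fstatat(dfd, entry->d_name, &st, AT_SYMLINK_NOFOLLOW) == -1) {
        return DT_UNKNOWN;
    }
    /* IFTODT converts the S_IFMT bits of st_mode into a DT_* value */
    return (unsigned char)IFTODT(st.st_mode);
}

/* Hypothetical helper: count subdirectories of path, excluding "." and "..".
 * Returns -1 if the directory cannot be opened. */
int count_subdirs(const char *path) {
    DIR *dir = opendir(path);
    if (dir == NULL) {
        return -1;
    }
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0) {
            continue;
        }
        if (resolve_type(dirfd(dir), entry) == DT_DIR) {
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

Because AT_SYMLINK_NOFOLLOW is used, a symbolic link to a directory resolves to DT_LNK, which matches what d_type itself would report.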

The readdir() Return Value Trap

readdir() returns NULL both at end-of-directory AND on error. The only way to distinguish is to set errno to 0 before calling and check it afterward. If errno is non-zero after NULL is returned, an error occurred. This is a common source of bugs—many programs don't check for readdir errors at all.
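
One way to make the errno check harder to forget is to wrap readdir() so that the three outcomes are distinct return values. This is a minimal sketch of that idea, not a standard interface; next_entry() and count_entries() are hypothetical names.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical wrapper around readdir():
 *    1  an entry was read and stored in *out
 *    0  end of directory
 *   -1  error, with errno set by readdir() */
int next_entry(DIR *dir, struct dirent **out) {
    assert(dir != NULL && out != NULL);
    errno = 0;                 /* so a later non-zero value means failure */
    *out = readdir(dir);
    if (*out != NULL) {
        return 1;
    }
    return errno == 0 ? 0 : -1;
}

/* Count all entries in path (including "." and ".." if the file system
 * reports them). Returns -1 on any error, with errno preserved. */
long count_entries(const char *path) {
    DIR *dir = opendir(path);
    if (dir == NULL) {
        return -1;
    }
    long n = 0;
    struct dirent *entry;
    int rc;
    while ((rc = next_entry(dir, &entry)) == 1) {
        n++;
    }
    int saved = errno;         /* closedir() may overwrite errno */
    closedir(dir);
    errno = saved;
    return rc == 0 ? n : -1;
}
```

The caller's loop condition now only continues on a real entry, and the final value of rc says whether the scan ended normally.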

The Underlying getdents() System Call

While readdir() is a C library function, the actual kernel interface is the getdents() system call ("get directory entries"). Understanding this lower level reveals how directory reading really works.

The getdents() Interface:

getdents_syscall.c
C (Linux)
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
 
/**
 * Linux directory entry structure (returned by getdents64)
 */
struct linux_dirent64 {
    unsigned long long d_ino;     /* 64-bit inode number */
    long long          d_off;     /* Offset to next entry */
    unsigned short     d_reclen;  /* Length of this entry */
    unsigned char      d_type;    /* File type */
    char               d_name[];  /* Filename (null-terminated) */
};
 
/**
 * Using getdents64() directly (bypassing libc readdir)
 *
 * This is what the C library's readdir() does internally,
 * but with more control over buffer size.
 *
 * The advantage of using getdents directly:
 * - Can read multiple entries at once (more efficient)
 * - Control over buffer size
 * - Access to d_off for seeking
 */
int list_directory_getdents(const char *path) {
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1) {
        perror("open");
        return -1;
    }
    
    // Buffer for directory entries
    // Larger buffer = fewer system calls = better performance
    char buffer[8192];
    long nread;  // syscall() returns long
    
    while ((nread = syscall(SYS_getdents64, fd, buffer, sizeof(buffer))) > 0) {
        long offset = 0;
        
        while (offset < nread) {
            struct linux_dirent64 *entry = 
                (struct linux_dirent64 *)(buffer + offset);
            
            printf("inode=%llu, reclen=%u, type=%u, name=%s\n",
                   entry->d_ino,
                   entry->d_reclen,
                   entry->d_type,
                   entry->d_name);
            
            // Move to next entry
            offset += entry->d_reclen;
        }
    }
    
    if (nread == -1) {
        perror("getdents64");
        close(fd);
        return -1;
    }
    
    close(fd);
    return 0;
}
 
/**
 * Why libc readdir() uses an internal buffer
 *
 * Each getdents() call crosses into the kernel, which is relatively
 * expensive, so libc reads many entries at once and returns them
 * one at a time:
 *
 * User calls readdir()
 *   If buffer empty or exhausted:
 *     -> Call getdents() to refill buffer
 *   Return next entry from buffer
 */

How readdir() and getdents() Interact:

The C library maintains internal state in the DIR structure:

dir_stream_internals.c
Pseudocode
/**
 * Simplified DIR structure (actual implementation varies)
 */
typedef struct {
    int fd;              /* File descriptor from open() */
    char *buf;           /* Buffer for getdents() results */
    size_t buf_size;     /* Size of buffer */
    size_t buf_offset;   /* Current position in buffer */
    size_t buf_end;      /* End of valid data in buffer */
    struct dirent entry; /* Current entry (returned to user) */
} DIR;
 
/**
 * Simplified opendir() implementation
 */
DIR *opendir(const char *path) {
    int fd = open(path, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd == -1) {
        return NULL;
    }
    
    DIR *dir = malloc(sizeof(DIR));
    if (!dir) {
        close(fd);
        return NULL;
    }
    
    dir->fd = fd;
    dir->buf_size = 32768;  /* Typical: 32KB buffer */
    dir->buf = malloc(dir->buf_size);
    dir->buf_offset = 0;
    dir->buf_end = 0;
    
    if (!dir->buf) {
        close(fd);
        free(dir);
        return NULL;
    }
    
    return dir;
}
 
/**
 * Simplified readdir() implementation
 */
struct dirent *readdir(DIR *dir) {
    /* Check if we need to refill buffer */
    if (dir->buf_offset >= dir->buf_end) {
        /* Call getdents to get more entries */
        ssize_t n = syscall(SYS_getdents64, dir->fd, 
                           dir->buf, dir->buf_size);
        if (n <= 0) {
            return NULL;  /* End of directory or error */
        }
        dir->buf_end = n;
        dir->buf_offset = 0;
    }
    
    /* Extract current entry from buffer */
    struct linux_dirent64 *linux_entry = 
        (void *)(dir->buf + dir->buf_offset);
    
    /* Copy to standard dirent structure */
    dir->entry.d_ino = linux_entry->d_ino;
    strcpy(dir->entry.d_name, linux_entry->d_name);
    /* ... copy other fields ... */
    
    /* Advance for next call */
    dir->buf_offset += linux_entry->d_reclen;
    
    return &dir->entry;
}

Performance Insight

The buffering strategy means readdir() makes far fewer system calls than you might expect. For a directory with 1000 entries, there might be only 2-3 getdents() calls total, not 1000 calls. Each getdents() returns multiple entries up to the buffer size.
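
The effect of buffer size on the number of system calls can be measured directly. The following Linux-only sketch counts getdents64 calls for a given buffer size; scan_with_buffer() and struct scan_stats are hypothetical names, and the record layout is the linux_dirent64 structure shown earlier.

```c
#define _GNU_SOURCE          /* assumption: needed for syscall() on glibc */
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64 (same as in the listing above) */
struct linux_dirent64 {
    unsigned long long d_ino;
    long long          d_off;
    unsigned short     d_reclen;
    unsigned char      d_type;
    char               d_name[];
};

struct scan_stats {
    long syscalls;   /* getdents64 calls made, including the final one */
    long entries;    /* directory entries seen */
};

/* Hypothetical helper: enumerate path with a buffer of buf_size bytes and
 * record how many system calls that took. Returns 0 on success, -1 on error. */
int scan_with_buffer(const char *path, size_t buf_size, struct scan_stats *out) {
    out->syscalls = 0;
    out->entries = 0;

    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1) {
        return -1;
    }
    char *buf = malloc(buf_size);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    long n;
    for (;;) {
        n = syscall(SYS_getdents64, fd, buf, buf_size);
        out->syscalls++;
        if (n <= 0) {
            break;           /* 0 = end of directory, -1 = error */
        }
        for (long off = 0; off < n; ) {
            unsigned short reclen;
            /* memcpy avoids assumptions about record alignment */
            memcpy(&reclen,
                   buf + off + offsetof(struct linux_dirent64, d_reclen),
                   sizeof reclen);
            out->entries++;
            off += reclen;
        }
    }

    free(buf);
    close(fd);
    return n == 0 ? 0 : -1;
}
```

Calling it twice on the same directory, once with a small buffer such as 512 bytes and once with 32768 bytes, should report the same entry count but a different number of system calls. The buffer must be large enough to hold at least one record, otherwise the kernel rejects the call.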

Directory Entry Order and Stability

A common misconception is that directory entries are returned in alphabetical order or in the order files were created. Neither is guaranteed.

The Reality of Entry Order:

What POSIX Says About Order

•No ordering guarantee — POSIX explicitly does not specify the order of entries
•Implementation-defined — Each file system returns entries in its internal order
•. and .. have no special position — They appear somewhere, but not necessarily first
•Order may change — Adding or removing files can reorder existing entries
•Same directory, different runs — Order may differ between scans of the same directory

File System Specific Ordering:

File System	Order Behavior
ext4 (htree)	Hash-based; appears random
XFS	B+ tree; somewhat sorted by hash
FAT	Creation order (mostly)
NTFS	Alphabetical in B+ tree index
tmpfs	Insertion order (list-based)

Implications for Applications:

sorted_listing.c
C
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
 
/**
 * WRONG: Assuming entries come out sorted
 *
 * This is a common bug - don't do this!
 */
void wrong_assumption(const char *path) {
    DIR *dir = opendir(path);
    if (!dir) return;
    struct dirent *entry;
    
    while ((entry = readdir(dir)) != NULL) {
        // BUG: assuming entries are sorted, processing
        // as if first entry is "smallest"
        printf("%s\n", entry->d_name);
    }
    }
    closedir(dir);
}
 
/**
 * CORRECT: Sort entries after reading them all
 */
 
// Comparison function for qsort
int compare_dirents(const void *a, const void *b) {
    const struct dirent **da = (const struct dirent **)a;
    const struct dirent **db = (const struct dirent **)b;
    return strcmp((*da)->d_name, (*db)->d_name);
}
 
void list_sorted(const char *path) {
    DIR *dir = opendir(path);
    if (!dir) return;
    
    // First pass: count entries
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        count++;
    }
    
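    // Allocate array (at least one slot so malloc(0) is never requested)
    struct dirent **entries = malloc((count ? count : 1) * sizeof(struct dirent *));
    if (!entries) {
        closedir(dir);
        return;
    }
    
    // Second pass: read entries. The directory may have changed since the
    // first pass, so never store more than we counted.
    rewinddir(dir);
    int i = 0;
    while (i < count && (entry = readdir(dir)) != NULL) {
        // Must copy: the pointer returned by readdir() may be overwritten
        // by the next readdir() call on the same stream
        entries[i] = malloc(sizeof(struct dirent));
        if (!entries[i]) break;
        memcpy(entries[i], entry, sizeof(struct dirent));
        i++;
    }
    closedir(dir);
    count = i;  // Sort and print only the entries actually stored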
    
    // Sort
    qsort(entries, count, sizeof(struct dirent *), compare_dirents);
    
    // Print sorted
    for (i = 0; i < count; i++) {
        printf("%s\n", entries[i]->d_name);
        free(entries[i]);
    }
    free(entries);
}
 
/**
 * BETTER: Use scandir() which handles allocation and sorting
 *
 * scandir() reads all entries into an allocated array and
 * can apply a filter and/or sort function.
 */
#include <dirent.h>
 
void list_sorted_scandir(const char *path) {
    struct dirent **namelist;
    int n;
    
    // alphasort is a standard comparison function
    // All entries pass (no filter) and are sorted alphabetically
    n = scandir(path, &namelist, NULL, alphasort);
    if (n == -1) {
        perror("scandir");
        return;
    }
    
    for (int i = 0; i < n; i++) {
        printf("%s\n", namelist[i]->d_name);
        free(namelist[i]);
    }
    free(namelist);
}
 
/**
 * Custom filter example: only show regular files
 */
int file_filter(const struct dirent *entry) {
    // Skip . and ..
    if (entry->d_name[0] == '.' && 
        (entry->d_name[1] == '\0' || 
         (entry->d_name[1] == '.' && entry->d_name[2] == '\0'))) {
        return 0;
    }
    
    // Only include regular files (requires d_type support)
    return entry->d_type == DT_REG;
}
 
void list_files_only(const char *path) {
    struct dirent **namelist;
    int n = scandir(path, &namelist, file_filter, alphasort);
    
    if (n == -1) {
        perror("scandir");
        return;
    }
    
    printf("Regular files in %s:\n", path);
    for (int i = 0; i < n; i++) {
        printf("  %s\n", namelist[i]->d_name);
        free(namelist[i]);
    }
    free(namelist);
}

Why ls Shows Sorted Output

When you run ls, you see sorted output not because the file system returns sorted entries, but because ls reads all entries first, then sorts them. This is why ls on huge directories is slow—it must read everything before printing anything. Use ls -f to skip sorting and see entries in file system order.

Directory Stream Positioning

POSIX provides functions to save and restore positions within a directory stream. However, these functions come with significant caveats and are rarely needed in practice.

The seekdir/telldir Interface:

directory_seeking.c
C
#include <dirent.h>
#include <stdio.h>
#include <string.h>   /* strcmp() */
 
/**
 * rewinddir() - Reset to beginning of directory
 *
 * This is the most reliable positioning function.
 * It's guaranteed to work correctly.
 */
void demonstrate_rewinddir(const char *path) {
    DIR *dir = opendir(path);
    if (!dir) return;
    
    // First scan: count entries
    int count = 0;
    while (readdir(dir) != NULL) {
        count++;
    }
    printf("Found %d entries\n", count);
    
    // Reset to beginning
    rewinddir(dir);
    
    // Second scan: print entries
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        printf("%s\n", entry->d_name);
    }
    
    closedir(dir);
}
 
/**
 * telldir()/seekdir() - Save and restore position
 *
 * WARNING: These functions have portability and reliability issues!
 * The position is only valid for the lifetime of the DIR stream.
 * Directory modifications may invalidate saved positions.
 */
void demonstrate_telldir_seekdir(const char *path) {
    DIR *dir = opendir(path);
    if (!dir) return;
    struct dirent *entry;
    long saved_pos = -1;
    
    // Find a specific entry and save the position *before* it.
    // Calling telldir() after readdir() would instead give the
    // position of the entry that follows the target.
    for (;;) {
        long pos = telldir(dir);
        if ((entry = readdir(dir)) == NULL) break;
        if (strcmp(entry->d_name, "target_file.txt") == 0) {
            saved_pos = pos;
            printf("Found target at position %ld\n", saved_pos);
            break;
        }
    }
    
    // Continue reading...
    while ((entry = readdir(dir)) != NULL) {
        printf("After target: %s\n", entry->d_name);
    }
    
    // Seek back to saved position; the next readdir() returns the target
    if (saved_pos != -1) {
        seekdir(dir, saved_pos);
        entry = readdir(dir);
        if (entry) {
            printf("Back to: %s\n", entry->d_name);
        }
    }
    
    closedir(dir);
}
 
/**
 * The Problems with telldir/seekdir:
 *
 * 1. Position format is implementation-defined (may be offset,
 *    may be cookie, may be something else entirely)
 * 
 * 2. Positions are only valid for the same DIR stream
 *    - Can't save position, close, reopen, and seek
 *
 * 3. Directory modifications may invalidate positions
 *    - Adding/removing entries can shift positions
 *    - Many implementations don't handle this well
 *
 * 4. Some filesystems (especially FUSE, network FS) have trouble
 *    implementing seekdir reliably
 *
 * RECOMMENDATION: Avoid telldir/seekdir unless absolutely necessary.
 * Instead, reread the directory or build an in-memory index.
 */

seekdir() Pitfalls

The value returned by telldir() is opaque—don't try to manipulate it. On some systems it's a byte offset, on others it's a cookie that only the kernel understands. Using telldir/seekdir across directory modifications can lead to skipped or duplicated entries. Modern applications should avoid these functions.
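
As an alternative to saved stream positions, a program can snapshot the names once and look them up in memory. The sketch below shows one way to build such an index, assuming names fit in memory; struct dir_index and the dir_index_* functions are hypothetical and not part of any standard library.

```c
#define _POSIX_C_SOURCE 200809L   /* for strdup() */
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory snapshot of a directory's entry names, sorted */
struct dir_index {
    char **names;
    size_t count;
};

static int cmp_names(const void *a, const void *b) {
    return strcmp(*(char *const *)a, *(char *const *)b);
}

void dir_index_free(struct dir_index *idx) {
    for (size_t i = 0; i < idx->count; i++) {
        free(idx->names[i]);
    }
    free(idx->names);
    idx->names = NULL;
    idx->count = 0;
}

/* Read every entry of path into idx, then sort. Returns 0 or -1. */
int dir_index_build(const char *path, struct dir_index *idx) {
    assert(idx != NULL);
    idx->names = NULL;
    idx->count = 0;

    DIR *dir = opendir(path);
    if (dir == NULL) {
        return -1;
    }
    size_t cap = 0;
    struct dirent *entry;
    errno = 0;
    while ((entry = readdir(dir)) != NULL) {
        if (idx->count == cap) {
            size_t new_cap = cap ? cap * 2 : 16;
            char **grown = realloc(idx->names, new_cap * sizeof *grown);
            if (grown == NULL) goto fail;
            idx->names = grown;
            cap = new_cap;
        }
        char *copy = strdup(entry->d_name);   /* entry may be overwritten */
        if (copy == NULL) goto fail;
        idx->names[idx->count++] = copy;
        errno = 0;                            /* reset before next readdir() */
    }
    if (errno != 0) goto fail;                /* NULL was an error, not EOF */
    closedir(dir);
    if (idx->count > 0) {
        qsort(idx->names, idx->count, sizeof *idx->names, cmp_names);
    }
    return 0;

fail:
    closedir(dir);
    dir_index_free(idx);
    return -1;
}

/* Binary search in the sorted snapshot. Returns 1 if name is present. */
int dir_index_contains(const struct dir_index *idx, const char *name) {
    if (idx->count == 0) {
        return 0;
    }
    char *key = (char *)name;   /* bsearch passes a pointer to this pointer */
    return bsearch(&key, idx->names, idx->count,
                   sizeof *idx->names, cmp_names) != NULL;
}
```

The snapshot reflects the directory at the time of the scan only. If the directory changes, the index has to be rebuilt, which is the same trade-off as rereading the directory but without relying on seekdir() semantics.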

Performance Considerations for Large Directories

Directory listing performance becomes critical when dealing with directories containing thousands or millions of entries. Understanding the performance characteristics helps you make informed design decisions.

Factors Affecting Performance:

Performance Factors for Directory Listing
Factor	Impact	Mitigation
Directory size	Larger directories take longer to enumerate	Use subdirectories to limit entries per directory
File system type	Hash-based (ext4 htree) vs linear (old ext2) matters	Choose appropriate FS for workload
Buffer size	Larger buffers = fewer syscalls	Use getdents() directly for control
stat() calls	Getting metadata is expensive	Use d_type when available; batch stat calls
Sorting requirement	Sorting requires reading all entries first	Accept unsorted order when possible
Network file systems	Each readdir may be a network round-trip	Increase buffer size; cache results
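
The "use subdirectories" mitigation in the table is often implemented by deriving a subdirectory from a hash of the file name. The sketch below illustrates the idea with a 32-bit FNV-1a hash and two levels of 256 buckets each. The function name sharded_path(), the two-level layout, and the choice of hash are assumptions made for illustration, not something the text above prescribes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 32-bit FNV-1a hash over len bytes */
static uint32_t fnv1a(const char *data, size_t len) {
    uint32_t h = 2166136261u;          /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)data[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}

/* Hypothetical helper: write "<root>/<xx>/<yy>/<name>" into out, where xx
 * and yy are two hex bytes taken from the hash of name. This spreads files
 * over up to 65536 leaf directories. Returns 0 on success, -1 if the
 * result does not fit in out_size bytes. */
int sharded_path(const char *root, const char *name,
                 char *out, size_t out_size) {
    assert(root != NULL && name != NULL && out != NULL);
    uint32_t h = fnv1a(name, strlen(name));
    int n = snprintf(out, out_size, "%s/%02x/%02x/%s",
                     root,
                     (unsigned)(h & 0xffu),
                     (unsigned)((h >> 8) & 0xffu),
                     name);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

The same name always maps to the same subdirectory, so lookups do not need to scan. The caller is still responsible for creating the intermediate directories before creating the file.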

Optimizing Directory Listing:

optimized_listing.c
C
#include <dirent.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
 
/**
 * Optimization 1: Use d_type to avoid stat() calls
 *
 * Many operations only need to know if an entry is a file or directory.
 * Using d_type avoids the expensive stat() syscall per entry.
 */
void list_directories_fast(const char *path) {
    DIR *dir = opendir(path);
    if (!dir) return;
    
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        // Skip . and ..
        if (entry->d_name[0] == '.' && 
            (entry->d_name[1] == '\0' || 
             (entry->d_name[1] == '.' && entry->d_name[2] == '\0'))) {
            continue;
        }
        
        // Use d_type if available - much faster than stat()
        if (entry->d_type == DT_DIR) {
            printf("[DIR] %s\n", entry->d_name);
        } else if (entry->d_type == DT_UNKNOWN) {
            // Fallback for filesystems without d_type support.
            // AT_SYMLINK_NOFOLLOW matches d_type, which reports a symlink
            // as DT_LNK rather than the type of its target.
            struct stat st;
            if (fstatat(dirfd(dir), entry->d_name, &st,
                        AT_SYMLINK_NOFOLLOW) == 0 && S_ISDIR(st.st_mode)) {
                printf("[DIR] %s\n", entry->d_name);
            }
        }
    }
    
    closedir(dir);
}
 
/**
 * Optimization 2: Use openat/fstatat pattern
 *
 * Opening the directory once and using *at() functions
 * avoids repeated path concatenation and parsing.
 */
void list_with_sizes(const char *path) {
    DIR *dir = opendir(path);
    if (!dir) return;
    
    // Get the fd for the directory
    int dfd = dirfd(dir);
    
    struct dirent *entry;
    struct stat st;
    
    while ((entry = readdir(dir)) != NULL) {
        if (entry->d_name[0] == '.' && 
            (entry->d_name[1] == '\0' || 
             (entry->d_name[1] == '.' && entry->d_name[2] == '\0'))) {
            continue;
        }
        
        // fstatat is more efficient than building path + stat()
        if (fstatat(dfd, entry->d_name, &st, AT_SYMLINK_NOFOLLOW) == 0) {
            printf("%10ld %s\n", (long)st.st_size, entry->d_name);
        }
    }
    
    closedir(dir);
}
 
/**
 * Optimization 3: Streaming vs collecting
 *
 * For operations that don't need all entries at once:
 * - Process entries as they're read
 * - Don't allocate memory for the full list
 * - Start outputting before finishing enumeration
 */
void count_files_streaming(const char *path) {
    DIR *dir = opendir(path);
    if (!dir) return;
    
    int file_count = 0;
    int dir_count = 0;
    
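    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        // Skip . and .. explicitly rather than subtracting 2 afterwards;
        // they may be reported as DT_UNKNOWN on some filesystems
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0) {
            continue;
        }
        switch (entry->d_type) {
            case DT_REG: file_count++; break;
            case DT_DIR: dir_count++; break;
            default: break;  // Includes DT_UNKNOWN, which needs a stat() fallback
        }
    }
    
    closedir(dir);
    
    // Only need counters, not the entire list
    printf("Files: %d, Directories: %d\n", file_count, dir_count);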
}
 
/**
 * Optimization 4: Parallel processing for large directories
 *
 * For directories with thousands of entries where each
 * entry requires expensive processing (like checksumming):
 * - Read entries in batches
 * - Dispatch batches to worker threads
 * - Overlap I/O with processing
 */

The stat() Bottleneck

A common performance killer is calling stat() on every entry in a large directory. Each stat() is a separate system call and may also require disk I/O when the inode is not already cached. On a directory with 10,000 files, that's 10,000 additional system calls. Use d_type when possible, batch stat calls, or redesign to avoid needing per-file metadata during enumeration.

Windows Directory Enumeration

Windows uses a different approach to directory listing—the FindFirstFile/FindNextFile pattern. This provides file metadata along with the filename in a single structure.

Windows Directory Enumeration:

windows_listing.c
C (Windows)
#include <windows.h>
#include <stdio.h>
 
/**
 * FindFirstFileW / FindNextFileW pattern
 *
 * Key differences from POSIX:
 * - Uses a search pattern (supports wildcards)
 * - Returns full metadata, not just name/inode
 * - No separate stat() needed
 * - Handle-based, not stream-based
 */
void list_directory_windows(const wchar_t *path) {
    WIN32_FIND_DATAW findData;
    wchar_t searchPath[MAX_PATH];
    
    // Build search pattern: "path\*"
    swprintf(searchPath, MAX_PATH, L"%s\\*", path);
    
    // Start enumeration
    HANDLE hFind = FindFirstFileW(searchPath, &findData);
    if (hFind == INVALID_HANDLE_VALUE) {
        DWORD error = GetLastError();
        if (error == ERROR_FILE_NOT_FOUND) {
            wprintf(L"Directory is empty: %s\n", path);
        } else if (error == ERROR_PATH_NOT_FOUND) {
            wprintf(L"Directory not found: %s\n", path);
        } else {
            wprintf(L"FindFirstFile failed: %lu\n", error);
        }
        return;
    }
    
    do {
        // Skip . and ..
        if (wcscmp(findData.cFileName, L".") == 0 ||
            wcscmp(findData.cFileName, L"..") == 0) {
            continue;
        }
        
        wchar_t type[16];
        if (findData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) {
            wcscpy(type, L"[DIR]");
        } else {
            wcscpy(type, L"[FILE]");
        }
        
        // WIN32_FIND_DATA includes size, timestamps, attributes
        ULARGE_INTEGER fileSize;
        fileSize.LowPart = findData.nFileSizeLow;
        fileSize.HighPart = findData.nFileSizeHigh;
        
        wprintf(L"%-8s %12llu %s\n",
                type, fileSize.QuadPart, findData.cFileName);
        
    } while (FindNextFileW(hFind, &findData));
    
    // Check if we ended due to no more files or error
    DWORD error = GetLastError();
    if (error != ERROR_NO_MORE_FILES) {
        wprintf(L"FindNextFile failed: %lu\n", error);
    }
    
    FindClose(hFind);
}
 
/**
 * WIN32_FIND_DATAW structure includes:
 *   - dwFileAttributes (file/dir/system/hidden/etc.)
 *   - ftCreationTime, ftLastAccessTime, ftLastWriteTime
 *   - nFileSizeHigh, nFileSizeLow (64-bit file size)
 *   - cFileName (name up to MAX_PATH)
 *   - cAlternateFileName (8.3 short name)
 *
 * This is more efficient than POSIX readdir() + stat()
 * because all metadata comes in one operation.
 */
 
/**
 * Pattern matching support
 *
 * Unlike POSIX opendir/readdir, Windows FindFirstFile
 * supports glob patterns: *.txt, data_*.csv, etc.
 */
void list_cpp_files(const wchar_t *path) {
    WIN32_FIND_DATAW findData;
    wchar_t searchPath[MAX_PATH];
    
    // Search for *.cpp files only
    swprintf(searchPath, MAX_PATH, L"%s\\*.cpp", path);
    
    HANDLE hFind = FindFirstFileW(searchPath, &findData);
    if (hFind == INVALID_HANDLE_VALUE) {
        return;
    }
    
    wprintf(L"C++ source files:\n");
    do {
        wprintf(L"  %s\n", findData.cFileName);
    } while (FindNextFileW(hFind, &findData));
    
    FindClose(hFind);
}
 
/**
 * FindFirstFileExW for more control
 *
 * Provides options for:
 * - Case sensitivity
 * - Large fetch (better performance)
 * - Filtering (directories only, etc.)
 */
void list_directories_only(const wchar_t *path) {
    WIN32_FIND_DATAW findData;
    wchar_t searchPath[MAX_PATH];
    swprintf(searchPath, MAX_PATH, L"%s\\*", path);
    
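    // FindExInfoBasic: don't retrieve 8.3 names (faster)
    // FindExSearchLimitToDirectories: advisory hint only; file systems
    // that do not support directory filtering silently ignore it
    HANDLE hFind = FindFirstFileExW(
        searchPath,
        FindExInfoBasic,             // Less info = faster
        &findData,
        FindExSearchLimitToDirectories,  // Dirs only (advisory)
        NULL,
        FIND_FIRST_EX_LARGE_FETCH   // Optimized buffering
    );
    
    if (hFind == INVALID_HANDLE_VALUE) {
        return;
    }
    
    do {
        // The filter is advisory, so check the attribute ourselves
        if (!(findData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)) {
            continue;
        }
        // Skip only . and .. (not every name that starts with a dot)
        if (wcscmp(findData.cFileName, L".") == 0 ||
            wcscmp(findData.cFileName, L"..") == 0) {
            continue;
        }
        wprintf(L"[DIR] %s\n", findData.cFileName);
    } while (FindNextFileW(hFind, &findData));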
    
    FindClose(hFind);
}

POSIX vs Windows Directory Listing
Feature	POSIX	Windows
Basic enumeration	opendir/readdir/closedir	FindFirstFile/FindNextFile/FindClose
Returns metadata	Only name and inode	Full metadata (size, dates, attributes)
Stat() needed	Yes, for metadata	No, metadata included
Pattern matching	No, enumerate all then filter	Yes, built into FindFirstFile
Directory-only filter	No, check d_type or stat	Advisory hint only (FindExSearchLimitToDirectories); still check attributes
Long path support	Yes (path limits very high)	Requires special handling for >MAX_PATH

Summary: Mastering Directory Listing

Directory listing is a fundamental operation with significant complexity beneath its simple interface. The abstraction layer between user space and the kernel protects applications from file system specifics while enabling portable, efficient directory enumeration.

Key Takeaways

•You cannot read() directories — The kernel enforces use of readdir/getdents for portability and safety
•readdir() uses buffered getdents() — Multiple entries are fetched per system call for efficiency
•Entry order is undefined — Don't assume alphabetical or creation-time ordering
•d_type is a valuable optimization — Avoids stat() calls when available, but requires fallback
•scandir() simplifies sorting/filtering — Use it for sorted or filtered listings
•Avoid telldir/seekdir — These have reliability issues; prefer rewinddir or re-enumeration
•Performance requires attention — Large directories benefit from d_type, batching, and streaming approaches

What's Next:

Now that we can list directory contents, we'll explore how to search within directories efficiently. The next page covers directory searching—finding specific files by name, pattern matching, recursive traversal, and the powerful nftw() file tree walking function.

Page Complete

You now understand the complete mechanics of directory listing—from the user-space readdir() interface through the getdents() system call, entry ordering behavior, performance optimizations, and cross-platform considerations. This knowledge is essential for building efficient file management applications and understanding how tools like ls and file managers work.