Loading learning content...
Modern file systems contain millions of files organized in deep hierarchies. Finding a specific file—by name, by content, by modification time, or by any other criterion—is a fundamental task that operating systems must support efficiently.
The humble find command represents one of the most powerful tools in the Unix arsenal, capable of locating files based on complex criteria across vast directory trees. But how does it work? What system facilities enable recursive traversal? How can you implement similar functionality in your own programs?
This page explores the mechanisms behind file system searching, from basic pattern matching to sophisticated tree-walking algorithms, revealing the kernel-level support and user-space libraries that make efficient searching possible.
By the end of this page, you will understand comprehensive file system searching—from simple filename matching to recursive directory traversal with the nftw() function, glob pattern expansion, and implementing find-like functionality. You'll also learn about performance considerations and modern alternatives like locate databases.
At its most basic, searching for a file means enumerating directory entries and comparing names. However, even this simple operation has nuances around case sensitivity, pattern matching, and character encoding.
Direct Name Lookup:
Searching for an exact filename in a single directory is straightforward—iterate through entries until found:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106
#include <dirent.h>#include <string.h>#include <stdio.h>#include <sys/stat.h> /** * Search for an exact filename in a directory * * Returns: * 1 if found * 0 if not found * -1 on error */int search_exact(const char *dir_path, const char *filename) { DIR *dir = opendir(dir_path); if (!dir) { perror("opendir"); return -1; } struct dirent *entry; while ((entry = readdir(dir)) != NULL) { if (strcmp(entry->d_name, filename) == 0) { closedir(dir); return 1; // Found } } closedir(dir); return 0; // Not found} /** * Case-insensitive search * * Note: On case-sensitive file systems (most Unix), files named * "README" and "readme" are different files. This function finds * any case variation. */int search_case_insensitive(const char *dir_path, const char *filename) { DIR *dir = opendir(dir_path); if (!dir) return -1; struct dirent *entry; while ((entry = readdir(dir)) != NULL) { if (strcasecmp(entry->d_name, filename) == 0) { printf("Found (case-insensitive): %s", entry->d_name); closedir(dir); return 1; } } closedir(dir); return 0;} /** * Search with prefix matching * * Finds all files starting with a given prefix. * Example: prefix="report_" matches report_2024.txt, report_final.pdf, etc. */int search_prefix(const char *dir_path, const char *prefix) { DIR *dir = opendir(dir_path); if (!dir) return -1; size_t prefix_len = strlen(prefix); int count = 0; struct dirent *entry; while ((entry = readdir(dir)) != NULL) { if (strncmp(entry->d_name, prefix, prefix_len) == 0) { printf("Match: %s", entry->d_name); count++; } } closedir(dir); return count;} /** * Search with substring matching * * Finds all files containing a substring anywhere in the name. */int search_substring(const char *dir_path, const char *substring) { DIR *dir = opendir(dir_path); if (!dir) return -1; int count = 0; struct dirent *entry; while ((entry = readdir(dir)) != NULL) { if (strstr(entry->d_name, substring) != NULL) { printf("Contains '%s': %s", substring, entry->d_name); count++; } } closedir(dir); return count;}Unix file systems (ext4, XFS, etc.) are typically case-sensitive: 'File.txt' and 'file.txt' are different files. macOS APFS is case-insensitive by default but case-preserving. NTFS on Windows is case-insensitive but case-preserving. When writing cross-platform search code, always consider case handling explicitly.
Shell glob patterns (*.txt, data_??.csv, [a-z]*) are ubiquitous in Unix. The C library provides functions to expand these patterns programmatically.
The glob() Function:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110
#include <glob.h>#include <stdio.h>#include <string.h> /** * glob() - Find pathnames matching a pattern * * Pattern characters: * * - Matches any string (including empty) * ? - Matches any single character * [abc] - Matches any character in the set * [a-z] - Matches any character in the range * [!ab] - Matches any character NOT in the set */void find_matching_files(const char *pattern) { glob_t glob_result; // GLOB_TILDE: Expand ~ to home directory // GLOB_MARK: Append / to directory names int ret = glob(pattern, GLOB_TILDE | GLOB_MARK, NULL, &glob_result); switch (ret) { case 0: printf("Found %zu matches for '%s':", glob_result.gl_pathc, pattern); for (size_t i = 0; i < glob_result.gl_pathc; i++) { printf(" %s", glob_result.gl_pathv[i]); } break; case GLOB_NOMATCH: printf("No matches for pattern: %s", pattern); break; case GLOB_NOSPACE: printf("Out of memory during glob"); break; case GLOB_ABORTED: printf("Read error during glob"); break; } globfree(&glob_result);} /** * Combining multiple patterns with GLOB_APPEND */void find_multiple_patterns(void) { glob_t glob_result; int flags = GLOB_TILDE; // First pattern glob("*.c", flags, NULL, &glob_result); // Additional patterns - note GLOB_APPEND glob("*.h", flags | GLOB_APPEND, NULL, &glob_result); glob("Makefile", flags | GLOB_APPEND, NULL, &glob_result); printf("Found %zu source files:", glob_result.gl_pathc); for (size_t i = 0; i < glob_result.gl_pathc; i++) { printf(" %s", glob_result.gl_pathv[i]); } globfree(&glob_result);} /** * Using fnmatch() for pattern matching without expansion * * fnmatch() just tests if a string matches a pattern. * Useful when you already have a list of filenames. */#include <fnmatch.h> void filter_names_by_pattern(const char *dir_path, const char *pattern) { DIR *dir = opendir(dir_path); if (!dir) return; struct dirent *entry; while ((entry = readdir(dir)) != NULL) { // FNM_PERIOD: Leading period must be explicitly matched // FNM_CASEFOLD: Case-insensitive (GNU extension) if (fnmatch(pattern, entry->d_name, FNM_PERIOD) == 0) { printf("Match: %s", entry->d_name); } } closedir(dir);} /** * Example patterns and what they match: * * "*.txt" - report.txt, README.txt (not .hidden.txt) * ".*" - .bashrc, .gitignore (hidden files) * "data_??.csv" - data_01.csv, data_AB.csv (exactly 2 chars) * "[Rr]eadme*" - README, Readme.md, readme.txt * "*.[ch]" - main.c, util.h (C source files) * "[0-9]*" - 01_intro.md, 99_conclusion.txt */| Pattern | Meaning | Example Matches |
|---|---|---|
* | Any string (0 or more chars) | *.log matches app.log, .log |
? | Any single character | file?.txt matches file1.txt, fileA.txt |
[abc] | Any char in set | [abc].txt matches a.txt, b.txt, c.txt |
[a-z] | Any char in range | [a-z].txt matches a.txt through z.txt |
[!abc] | Any char NOT in set | [!0-9]* matches non-numeric starts |
** | Recursive (requires GLOB_STAR) | src/**/*.c matches C files in all subdirs |
By default, glob patterns do NOT match hidden files (those starting with .). The pattern * matches file.txt but not .hidden. Use .* to explicitly match hidden files, or pass GLOB_PERIOD flag (GNU extension) to match leading periods with *.
Searching an entire directory tree requires recursively descending into subdirectories. There are multiple approaches, each with different tradeoffs.
Manual Recursive Implementation:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113
#include <dirent.h>#include <sys/stat.h>#include <string.h>#include <stdio.h>#include <limits.h>#include <fnmatch.h> /** * Recursive search for files matching a pattern * * This is a simplified implementation of 'find . -name pattern' */int search_recursive(const char *dir_path, const char *pattern) { DIR *dir = opendir(dir_path); if (!dir) { perror(dir_path); return -1; } struct dirent *entry; char path[PATH_MAX]; struct stat statbuf; int count = 0; while ((entry = readdir(dir)) != NULL) { // Skip . and .. if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) { continue; } // Build full path snprintf(path, sizeof(path), "%s/%s", dir_path, entry->d_name); // Get file type (use lstat to not follow symlinks) if (lstat(path, &statbuf) == -1) { perror(path); continue; } if (S_ISDIR(statbuf.st_mode)) { // Recurse into subdirectory count += search_recursive(path, pattern); } else if (S_ISREG(statbuf.st_mode)) { // Check if filename matches pattern if (fnmatch(pattern, entry->d_name, 0) == 0) { printf("%s", path); count++; } } } closedir(dir); return count;} /** * Improved version using directory file descriptor * * Benefits: * - More secure (resistant to symlink attacks) * - Slightly faster (fewer path concatenations) * - Works correctly even if cwd changes */#include <fcntl.h>#include <unistd.h> int search_recursive_safe(int parent_fd, const char *dirname, const char *pattern, const char *prefix) { int dir_fd = openat(parent_fd, dirname, O_RDONLY | O_DIRECTORY); if (dir_fd == -1) { return -1; } DIR *dir = fdopendir(dir_fd); if (!dir) { close(dir_fd); return -1; } struct dirent *entry; struct stat statbuf; int count = 0; char new_prefix[PATH_MAX]; while ((entry = readdir(dir)) != NULL) { if (entry->d_name[0] == '.' && (entry->d_name[1] == '\0' || (entry->d_name[1] == '.' && entry->d_name[2] == '\0'))) { continue; } // Use fstatat instead of building path and calling lstat if (fstatat(dir_fd, entry->d_name, &statbuf, AT_SYMLINK_NOFOLLOW) == -1) { continue; } snprintf(new_prefix, sizeof(new_prefix), "%s/%s", prefix, entry->d_name); if (S_ISDIR(statbuf.st_mode)) { count += search_recursive_safe(dir_fd, entry->d_name, pattern, new_prefix); } else if (fnmatch(pattern, entry->d_name, 0) == 0) { printf("%s", new_prefix); count++; } } closedir(dir); // Also closes dir_fd return count;}Always skip . and .. entries to avoid infinite recursion. Also consider handling symbolic links carefully—following symlinks to directories could create cycles. Use lstat() instead of stat() to detect symlinks, and consider tracking visited inodes to detect cycles.
POSIX provides nftw() (new file tree walk) as a standardized way to traverse directory trees. It handles the complexity of recursive traversal and provides callbacks for processing each file.
The nftw() Function:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134
#define _XOPEN_SOURCE 500#include <ftw.h>#include <stdio.h>#include <string.h>#include <fnmatch.h> /** * nftw() - New file tree walk * * Synopsis: * int nftw(const char *dirpath, * int (*fn)(const char *fpath, const struct stat *sb, * int typeflag, struct FTW *ftwbuf), * int nopenfd, * int flags); * * Parameters: * dirpath - Starting directory * fn - Callback function for each file * nopenfd - Max open file descriptors to use * flags - Control traversal behavior */ /* Callback function - called for every file and directory */int print_entry(const char *fpath, const struct stat *sb, int typeflag, struct FTW *ftwbuf) { const char *type; switch (typeflag) { case FTW_F: type = "file "; break; case FTW_D: type = "directory"; break; case FTW_DNR: type = "dir(unreadable)"; break; case FTW_NS: type = "stat failed"; break; case FTW_SL: type = "symlink "; break; case FTW_DP: type = "dir(post)"; break; default: type = "unknown "; break; } // ftwbuf->level is the depth in the tree // ftwbuf->base is the offset of basename in fpath printf("%-15s %4d %s", type, ftwbuf->level, fpath); return 0; // Continue traversal} void walk_directory(const char *path) { // FTW_PHYS: Don't follow symlinks (use lstat) // FTW_MOUNT: Stay on same file system int flags = FTW_PHYS; // 20 = max file descriptors to use (affects memory/speed tradeoff) if (nftw(path, print_entry, 20, flags) == -1) { perror("nftw"); }} /** * Practical example: Find files by pattern */static const char *search_pattern = NULL; int find_matching(const char *fpath, const struct stat *sb, int typeflag, struct FTW *ftwbuf) { // Only match regular files if (typeflag != FTW_F) { return 0; // Continue } // Get just the filename (not the full path) const char *filename = fpath + ftwbuf->base; if (fnmatch(search_pattern, filename, 0) == 0) { printf("%s", fpath); } return 0; // Continue} void find_files(const char *dir, const char *pattern) { search_pattern = pattern; nftw(dir, find_matching, 20, FTW_PHYS);} /** * Example: Calculate total directory size */static unsigned long long total_size = 0; int sum_size(const char *fpath, const struct stat *sb, int typeflag, struct FTW *ftwbuf) { if (typeflag == FTW_F) { total_size += sb->st_size; } return 0;} unsigned long long calc_dir_size(const char *path) { total_size = 0; nftw(path, sum_size, 20, FTW_PHYS | FTW_MOUNT); return total_size;} /** * Example: Delete directory tree (rm -rf) * * FTW_DEPTH flag processes children before parents, * which is required for deletion. */int delete_entry(const char *fpath, const struct stat *sb, int typeflag, struct FTW *ftwbuf) { int result; if (typeflag == FTW_DP) { // Directory, post-order: children already deleted result = rmdir(fpath); } else { // File, symlink, etc. result = unlink(fpath); } if (result == -1) { perror(fpath); } return 0; // Continue even on error} int remove_tree(const char *path) { // FTW_DEPTH: Process directory AFTER its contents (post-order) // FTW_PHYS: Don't follow symlinks return nftw(path, delete_entry, 20, FTW_DEPTH | FTW_PHYS);}| Flag | Purpose | When to Use |
|---|---|---|
FTW_PHYS | Don't follow symlinks | Almost always—prevents cycles and security issues |
FTW_MOUNT | Stay on same file system | For operations like du that shouldn't cross mounts |
FTW_DEPTH | Post-order traversal | For deletion—process children before parents |
FTW_CHDIR | Change to each directory | Rarely needed; can improve performance |
| Value | Meaning |
|---|---|
FTW_F | Regular file |
FTW_D | Directory (pre-order visit) |
FTW_DP | Directory (post-order visit, only with FTW_DEPTH) |
FTW_SL | Symbolic link |
FTW_SLN | Symbolic link pointing to nonexistent file |
FTW_DNR | Directory that couldn't be read (permission denied) |
FTW_NS | File for which stat() failed |
Return a non-zero value from the callback function to stop nftw() immediately. The callback's return value becomes nftw()'s return value. This is useful for "find first" operations or when an error requires aborting the traversal.
The find command supports numerous criteria for file matching: name, type, size, modification time, permissions, and more. Here's how to implement similar functionality:
Multi-Criteria File Search:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136
#define _XOPEN_SOURCE 500#include <ftw.h>#include <stdio.h>#include <string.h>#include <fnmatch.h>#include <time.h>#include <sys/types.h> /** * Search criteria structure * * Mirrors common 'find' options */typedef struct { const char *name_pattern; // -name pattern int type; // -type (f, d, l) off_t min_size; // -size +Xk off_t max_size; // -size -Xk time_t newer_than; // -mtime -X (modified within X days) time_t older_than; // -mtime +X mode_t perm_mask; // -perm mode uid_t owner_uid; // -user uid gid_t owner_gid; // -group gid int max_depth; // -maxdepth int min_depth; // -mindepth} find_criteria_t; static find_criteria_t criteria; /** * Check if a file matches all specified criteria */int matches_criteria(const char *fpath, const struct stat *sb, int typeflag, struct FTW *ftwbuf) { const char *filename = fpath + ftwbuf->base; // Check depth constraints if (criteria.max_depth >= 0 && ftwbuf->level > criteria.max_depth) { // Skip this directory entirely if it's a directory if (typeflag == FTW_D) { return FTW_SKIP_SUBTREE; // GNU extension } return 0; } if (criteria.min_depth >= 0 && ftwbuf->level < criteria.min_depth) { return 0; // Don't print, but continue walking } // Check file type (-type f/d/l) if (criteria.type != 0) { switch (criteria.type) { case 'f': if (typeflag != FTW_F) return 0; break; case 'd': if (typeflag != FTW_D && typeflag != FTW_DP) return 0; break; case 'l': if (typeflag != FTW_SL) return 0; break; } } // Check name pattern (-name) if (criteria.name_pattern) { if (fnmatch(criteria.name_pattern, filename, 0) != 0) { return 0; } } // Check size constraints (-size) if (criteria.min_size > 0 && sb->st_size < criteria.min_size) { return 0; } if (criteria.max_size > 0 && sb->st_size > criteria.max_size) { return 0; } // Check modification time (-mtime) if (criteria.newer_than > 0 && sb->st_mtime < criteria.newer_than) { return 0; } if (criteria.older_than > 0 && sb->st_mtime > criteria.older_than) { return 0; } // Check owner (-user) if (criteria.owner_uid != (uid_t)-1 && sb->st_uid != criteria.owner_uid) { return 0; } // Check group (-group) if (criteria.owner_gid != (gid_t)-1 && sb->st_gid != criteria.owner_gid) { return 0; } // Check permissions (-perm) if (criteria.perm_mask != 0 && (sb->st_mode & criteria.perm_mask) != criteria.perm_mask) { return 0; } // All criteria matched - print the path printf("%s", fpath); return 0;} /** * Example usage: Find large log files modified in last 7 days */void find_recent_large_logs(const char *start_path) { memset(&criteria, 0, sizeof(criteria)); criteria.name_pattern = "*.log"; // -name "*.log" criteria.type = 'f'; // -type f criteria.min_size = 1024 * 1024; // -size +1M criteria.newer_than = time(NULL) - 7*24*60*60; // -mtime -7 criteria.max_depth = 5; // -maxdepth 5 criteria.owner_uid = (uid_t)-1; // No owner filter criteria.owner_gid = (gid_t)-1; // No group filter nftw(start_path, matches_criteria, 20, FTW_PHYS);} /** * Example usage: Find executables */void find_executables(const char *start_path) { memset(&criteria, 0, sizeof(criteria)); criteria.type = 'f'; criteria.perm_mask = S_IXUSR; // -perm /u+x nftw(start_path, matches_criteria, 20, FTW_PHYS);}The actual GNU find command is much more sophisticated, supporting expression trees with -and, -or, -not operators, actions like -exec and -delete, and additional tests like -newer (compare against a file) and -regex (regular expression matching). The example above shows the core matching logic.
Recursive directory traversal is slow for large file systems. The locate command takes a different approach: pre-building a database of all file paths, then searching the database instead of the file system.
How locate Works:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384
/** * Conceptual implementation of a locate-like database * * This is simplified to illustrate the concept. * Real implementations use compression and incremental updates. */ #include <stdio.h>#include <string.h>#include <stdlib.h> #define DATABASE_FILE "/tmp/my_locate.db" /** * Build the database by walking the file system * (Run periodically via cron) */int build_database_callback(const char *fpath, const struct stat *sb, int typeflag, struct FTW *ftwbuf) { static FILE *db = NULL; if (!db) { db = fopen(DATABASE_FILE, "w"); if (!db) return -1; } // Simple format: one path per line fprintf(db, "%s", fpath); return 0;} void build_database(const char *root) { nftw(root, build_database_callback, 20, FTW_PHYS | FTW_MOUNT); // Close is handled at end of traversal} /** * Search the database (much faster than walking) */int search_database(const char *pattern) { FILE *db = fopen(DATABASE_FILE, "r"); if (!db) { fprintf(stderr, "Database not found. Run updatedb first."); return -1; } char line[PATH_MAX]; int count = 0; while (fgets(line, sizeof(line), db)) { // Remove trailing newline line[strcspn(line, "")] = 0; // Simple substring search (locate default) // or use fnmatch for pattern matching if (strstr(line, pattern)) { printf("%s", line); count++; } } fclose(db); return count;} /** * Comparison: find vs locate performance * * find /usr -name "*.h" * - Walks entire /usr hierarchy * - Reads directory entries, makes stat() calls * - O(n) where n = total files under /usr * - Time: seconds to minutes * * locate "*.h" * - Reads pre-built database file * - Simple string search through compressed data * - O(m) where m = database size (compact) * - Time: milliseconds */Windows provides both immediate (like find) and indexed (like locate) search capabilities through different APIs.
Recursive Search in Windows:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869
#include <windows.h>#include <stdio.h> /** * Recursive directory search using FindFirstFile/FindNextFile * * Windows FindFirstFile includes pattern matching built-in, * but still requires manual recursion for subdirectories. */void search_windows_recursive(const wchar_t *path, const wchar_t *pattern) { wchar_t searchPath[MAX_PATH]; wchar_t subPath[MAX_PATH]; WIN32_FIND_DATAW findData; // Search for all entries to find subdirectories swprintf(searchPath, MAX_PATH, L"%s\\*", path); HANDLE hFind = FindFirstFileW(searchPath, &findData); if (hFind == INVALID_HANDLE_VALUE) { return; } do { // Skip . and .. if (wcscmp(findData.cFileName, L".") == 0 || wcscmp(findData.cFileName, L"..") == 0) { continue; } swprintf(subPath, MAX_PATH, L"%s\\%s", path, findData.cFileName); if (findData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) { // Recurse into subdirectory search_windows_recursive(subPath, pattern); } else { // Check if file matches pattern (simple substring for demo) // For proper glob matching, use PathMatchSpec if (wcsstr(findData.cFileName, pattern) != NULL) { wprintf(L"%s", subPath); } } } while (FindNextFileW(hFind, &findData)); FindClose(hFind);} /** * Using PathMatchSpec for glob-style matching */#include <shlwapi.h>#pragma comment(lib, "shlwapi.lib") BOOL matches_pattern(const wchar_t *filename, const wchar_t *pattern) { // PathMatchSpec supports *, ?, [abc] patterns return PathMatchSpecW(filename, pattern);} /** * Windows Search Index Query * * Windows Indexing Service / Windows Search indexes file content * and metadata for fast searching. This is the Windows equivalent * of locate + full-text search. * * Query using ISearchQueryHelper COM interface or simpler: * - Windows Search SQL syntax * - Everything search tool (third-party, very fast) */On Windows, the third-party 'Everything' tool provides near-instant file searching by reading the NTFS Master File Table directly. It's dramatically faster than the built-in Windows Search and is widely used by power users and developers.
File system searching spans a spectrum from simple directory enumeration to sophisticated indexed queries. Understanding the available tools and their tradeoffs enables you to choose the right approach for each use case.
What's Next:
With search operations mastered, we'll complete our exploration of directory operations by examining renaming and moving files. The next page covers the rename() system call, its atomicity guarantees, cross-device move limitations, and the implementation details that make file restructuring safe and efficient.
You now understand the complete mechanics of file system searching—from basic pattern matching through recursive traversal with nftw(), implementing find-like functionality, and understanding when to use indexed databases like locate. This knowledge enables you to build efficient file management tools and understand how search utilities work under the hood.