Storage devices present data as vast arrays of blocks—billions of 512-byte or 4KB sectors with numeric addresses. Without higher-level organization, managing data would be nearly impossible. How would you find a document among trillions of bytes? How would you share files between users while protecting sensitive data? How would applications store configuration without conflicting with each other?
The file system solves these problems by imposing structure on raw storage. It provides the abstraction of files (named, typed data containers) organized into directories (hierarchical namespaces), with permissions controlling access. These file system manipulation services are among the most frequently used OS capabilities—virtually every application reads and writes files.
By the end of this page, you will understand how file systems organize data, the operations available for file and directory manipulation, how permissions and access control work, the role of metadata and file attributes, and how modern file systems handle advanced features like links, journaling, and atomic operations. This knowledge is essential for effective systems programming.
A file is the fundamental unit of persistent storage—a named collection of related information stored on secondary storage. To the application, a file is a logical sequence of bytes; the file system handles the physical layout on disk.
File attributes (metadata):
Beyond content, files carry extensive metadata:
# Examining file metadata on Linux
$ stat /etc/passwd
  File: /etc/passwd
  Size: 2834        Blocks: 8          IO Block: 4096   regular file
Device: 259,2       Inode: 1048594     Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2025-01-15 10:30:00.000000000 +0000
Modify: 2025-01-10 08:15:30.123456789 +0000
Change: 2025-01-10 08:15:30.123456789 +0000
 Birth: 2024-06-01 00:00:00.000000000 +0000

# Breakdown:
# - Size: 2834 bytes of content
# - Blocks: 8 (512-byte blocks allocated = 4KB)
# - IO Block: Optimal I/O transfer size
# - Inode: Unique identifier within filesystem
# - Links: Hard link count
# - Access mode: 0644 = rw-r--r--
# - Timestamps:
#   - atime (Access): Last read
#   - mtime (Modify): Last content change
#   - ctime (Change): Last metadata (inode) change
#   - btime (Birth): Creation time (not all filesystems)

# JSON-format metadata for scripts
$ stat --format='{"name":"%n","size":%s,"uid":%u,"mode":"%a"}' /etc/passwd
{"name":"/etc/passwd","size":2834,"uid":0,"mode":"644"}

# Extended attributes (xattrs) - additional metadata
# (-m - matches every namespace; by default only user.* attributes are dumped)
$ getfattr -d -m - /path/to/file
# file: /path/to/file
user.description="Project documentation"
security.selinux="system_u:object_r:user_home_t:s0"
File types:
Modern file systems distinguish several file types, each with different semantics:
$ ls -la /dev /home/user /var/run
crw-rw-rw- 1 root root 1, 3 Jan 15 10:00 /dev/null # c = character device
brw-rw---- 1 root disk 8, 0 Jan 15 10:00 /dev/sda # b = block device
prw-r--r-- 1 user user 0 Jan 15 10:00 /tmp/myfifo # p = named pipe (FIFO)
srwxrwxrwx 1 user user 0 Jan 15 10:00 /var/run/app.sock # s = socket
lrwxrwxrwx 1 root root 11 Jan 15 10:00 /bin -> /usr/bin # l = symbolic link
drwxr-xr-x 2 user user 4096 Jan 15 10:00 /home/user/docs # d = directory
-rw-r--r-- 1 user user 1234 Jan 15 10:00 /home/user/file # - = regular file
The first character indicates type:
-  Regular file (data content)
d  Directory (contains other files)
l  Symbolic link (pointer to another path)
c  Character device (byte-stream I/O)
b  Block device (block-based I/O)
p  Named pipe (FIFO, IPC)
s  Socket (network-style IPC)
File extensions (.txt, .pdf, .exe) are naming conventions, not enforced types. The OS doesn't care about extensions: you can rename 'doc.pdf' to 'doc.exe' and the content is unchanged.
Magic numbers are byte sequences at file start that identify content types. For example, PDFs start with '%PDF-', PNGs with '\x89PNG'. The file command uses these to detect actual file types regardless of extension.
Directories organize files into hierarchical namespaces. A directory is itself a file—one that contains a list of (name, inode) pairs mapping names to file locations.
Hierarchical structure:
Modern file systems use a tree structure rooted at / (Unix) or drive letters (Windows):
/                           C:\
├── bin/                    ├── Windows/
├── home/                   ├── Program Files/
│   ├── alice/              ├── Users/
│   │   ├── documents/      │   ├── Alice/
│   │   │   └── report.pdf  │   │   ├── Documents/
│   │   └── .bashrc         │   │   │   └── report.pdf
│   └── bob/                │   │   └── Desktop/
├── etc/                    │   └── Bob/
├── var/                    └── Temp/
└── tmp/
Special directory entries:
. (dot): Refers to the current directory
.. (dot-dot): Refers to the parent directory (in the root directory, . and .. refer to the same directory)
Path resolution:
When you access a file by path, the OS must resolve the path to an actual file location. This involves traversing the directory tree:
Absolute path /home/alice/documents/report.pdf:
1. Start at root directory (/)
2. Look up "home" → get inode for /home directory
3. Read /home directory, look up "alice" → get inode
4. Read /home/alice, look up "documents" → get inode
5. Read /home/alice/documents, look up "report.pdf" → get inode
6. Inode contains file's disk block locations
7. Access file content from those blocks
Relative path ../bob/file.txt (from /home/alice):
1. Start at current directory (/home/alice)
2. Look up ".." → get inode for /home
3. Read /home, look up "bob" → get inode
4. Read /home/bob, look up "file.txt" → get inode
5. Access file
#include <dirent.h>
#include <limits.h>   /* PATH_MAX */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/**
 * Demonstrates directory manipulation operations
 */

/* List directory contents */
void list_directory(const char *path) {
    DIR *dir = opendir(path);
    if (!dir) {
        perror("opendir");
        return;
    }

    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* entry->d_type: DT_REG, DT_DIR, DT_LNK, etc.
         * (some file systems report DT_UNKNOWN instead) */
        char type;
        switch (entry->d_type) {
            case DT_REG: type = 'f'; break;  /* Regular file */
            case DT_DIR: type = 'd'; break;  /* Directory */
            case DT_LNK: type = 'l'; break;  /* Symbolic link */
            default:     type = '?'; break;
        }
        printf("[%c] %s (inode: %lu)\n", type, entry->d_name,
               (unsigned long)entry->d_ino);
    }

    closedir(dir);
}

/* Create directory */
int create_directory(const char *path) {
    /* 0755 = rwxr-xr-x permissions */
    if (mkdir(path, 0755) != 0) {
        perror("mkdir");
        return -1;
    }
    return 0;
}

/* Remove empty directory */
int remove_directory(const char *path) {
    if (rmdir(path) != 0) {
        perror("rmdir");
        return -1;
    }
    return 0;
}

/* Change current working directory */
int change_directory(const char *path) {
    if (chdir(path) != 0) {
        perror("chdir");
        return -1;
    }
    return 0;
}

/* Get current working directory */
void print_cwd(void) {
    char cwd[PATH_MAX];
    if (getcwd(cwd, sizeof(cwd)) != NULL) {
        printf("Current directory: %s\n", cwd);
    }
}

int main(void) {
    /* Working directory operations */
    print_cwd();
    change_directory("/tmp");
    print_cwd();

    /* Create and list directory */
    create_directory("/tmp/test_dir");
    list_directory("/tmp/test_dir");

    /* Cleanup */
    remove_directory("/tmp/test_dir");

    return 0;
}
| Operation | Unix API | Windows API | Purpose |
|---|---|---|---|
| Create directory | mkdir() | CreateDirectory() | Create new directory |
| Remove directory | rmdir() | RemoveDirectory() | Remove empty directory |
| Open directory | opendir() | FindFirstFile() | Begin reading directory |
| Read entry | readdir() | FindNextFile() | Get next directory entry |
| Close directory | closedir() | FindClose() | Finish reading directory |
| Change directory | chdir() | SetCurrentDirectory() | Change working directory |
| Get current dir | getcwd() | GetCurrentDirectory() | Get working directory |
Each path component requires a directory read—potentially a disk access. Deep paths incur more overhead. The OS caches directory contents (dentry cache in Linux) to accelerate repeated lookups. This is why accessing '/a/b/c/d/e/f/file' isn't dramatically slower than '/file' in practice.
File operations can be categorized by whether they affect content, metadata, or namespace.
Content operations:
These operations read from or write to file content:
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/**
 * Comprehensive file operations demonstration
 */

int main(void) {
    const char *filename = "/tmp/demo_file.txt";
    char buffer[256];

    /* ===== CREATE AND WRITE ===== */

    /* Open flags:
     *   O_CREAT  - Create if doesn't exist
     *   O_WRONLY - Write-only access
     *   O_TRUNC  - Truncate if exists
     *   O_APPEND - Append mode (writes always at end)
     *   O_EXCL   - Fail if file exists (with O_CREAT)
     *   O_SYNC   - Synchronous writes (durability)
     */
    int fd = open(filename, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open for write");
        return 1;
    }

    /* Write data */
    const char *text = "Hello, File System!\n";
    ssize_t written = write(fd, text, strlen(text));
    if (written < 0) {
        perror("write");
        close(fd);
        return 1;
    }
    printf("Wrote %zd bytes\n", written);

    /* Seek to position (SEEK_SET=absolute, SEEK_CUR=relative, SEEK_END=from end) */
    off_t pos = lseek(fd, 0, SEEK_END);
    printf("File position after write: %ld\n", (long)pos);

    /* Write more data */
    const char *more = "Second line.\n";
    if (write(fd, more, strlen(more)) < 0) {
        perror("write");
    }

    close(fd);

    /* ===== READ ===== */

    fd = open(filename, O_RDONLY);
    if (fd < 0) {
        perror("open for read");
        return 1;
    }

    /* Read entire file */
    ssize_t bytes_read;
    while ((bytes_read = read(fd, buffer, sizeof(buffer) - 1)) > 0) {
        buffer[bytes_read] = '\0';
        printf("Read: %s", buffer);
    }

    /* Seek to beginning and read again */
    lseek(fd, 0, SEEK_SET);
    bytes_read = read(fd, buffer, 5);  /* Read first 5 bytes */
    if (bytes_read >= 0) {             /* read() returns -1 on error */
        buffer[bytes_read] = '\0';
        printf("First 5 bytes: '%s'\n", buffer);
    }

    close(fd);

    /* ===== METADATA OPERATIONS ===== */

    /* Get file information */
    struct stat st;
    if (stat(filename, &st) == 0) {
        printf("Size: %ld bytes\n", (long)st.st_size);
        printf("Inode: %lu\n", (unsigned long)st.st_ino);
        printf("Mode: %o\n", (unsigned)(st.st_mode & 0777));
        printf("Links: %lu\n", (unsigned long)st.st_nlink);
    }

    /* Change permissions */
    chmod(filename, 0600);  /* rw------- */

    /* Change ownership (requires root) */
    // chown(filename, new_uid, new_gid);

    /* Truncate to specific size */
    truncate(filename, 10);  /* Keep only first 10 bytes */

    /* ===== NAMESPACE OPERATIONS ===== */

    /* Rename file */
    rename(filename, "/tmp/renamed_file.txt");

    /* Create hard link */
    link("/tmp/renamed_file.txt", "/tmp/hardlink.txt");

    /* Create symbolic link */
    symlink("/tmp/renamed_file.txt", "/tmp/symlink.txt");

    /* Delete file (unlink removes directory entry) */
    unlink("/tmp/symlink.txt");
    unlink("/tmp/hardlink.txt");
    unlink("/tmp/renamed_file.txt");

    return 0;
}
The file descriptor:
When you open a file, the OS returns a file descriptor—a small integer that references an open file. Each process maintains a file descriptor table:
Process File Descriptor Table:
┌─────┬────────────────────────────────────────────────────┐
│ FD │ Points to (in kernel) │
├─────┼────────────────────────────────────────────────────┤
│ 0 │ stdin → terminal input │
│ 1 │ stdout → terminal output │
│ 2 │ stderr → terminal output │
│ 3 │ Open file → /home/user/data.txt (pos: 1024) │
│ 4 │ Socket → TCP connection to 10.0.0.1:80 │
│ 5 │ Pipe → write end of pipe to child process │
└─────┴────────────────────────────────────────────────────┘
Kernel File Table Entry (per fd):
- Reference to underlying inode/vnode
- Current file position (offset)
- Open mode (read/write/append)
- File status flags
File descriptors 0, 1, 2 are standard input, output, and error—conventionally already open when a process starts. New opens return the lowest available number.
mmap() maps a file directly into process memory. Instead of read()/write(), you access file content as memory addresses. Benefits: zero-copy I/O, demand paging (only accessed pages loaded), shared memory between processes. Used by databases, executables (code pages), and high-performance applications.
Operating systems enforce access control to prevent unauthorized file access. The traditional Unix permission model and modern access control lists (ACLs) provide overlapping but distinct capabilities.
Unix permission model:
Every file has three permission sets: owner (user), group, and others. Each set has three bits: read (r), write (w), execute (x).
-rwxr-xr-- 1 alice developers 4096 Jan 15 10:00 script.sh
│└┬┘└┬┘└┬┘
│ │ │ └── Others: read only (r--) = 4
│ │ └───── Group: read + execute (r-x) = 5
│ └──────── Owner: full access (rwx) = 7
└────────── File type (- = regular file)
Octal representation: 754
Permission values:
r = 4 (read)
w = 2 (write)
x = 1 (execute)
For directories:
r = list contents
w = create/delete files
x = traverse (cd into, access files within)
# Permission manipulation examples

# View permissions
$ ls -la file.txt
-rw-r--r-- 1 alice staff 1234 Jan 15 10:00 file.txt

# Change permissions (chmod)
$ chmod 755 script.sh      # rwxr-xr-x (octal)
$ chmod u+x script.sh      # Add execute for user (symbolic)
$ chmod g-w file.txt       # Remove write from group
$ chmod o=r file.txt       # Set others to read-only
$ chmod a+r file.txt       # Add read for all (a = all)
$ chmod -R 755 dir/        # Recursive

# Change ownership (chown): changing the owner requires root
$ sudo chown bob:staff file.txt      # Change user and group
$ sudo chown :developers file.txt    # Change group only
$ sudo chown -R alice:alice dir/     # Recursive

# Special permissions
$ chmod 4755 program       # setuid: runs as file owner
$ chmod 2755 dir           # setgid: new files inherit group
$ chmod 1777 /tmp          # sticky bit: only a file's owner (or root) can delete it

# View numeric permissions
$ stat -c "%a %n" file.txt
644 file.txt

# Default permissions (umask)
$ umask                    # Show current mask
022
$ umask 027                # Set mask (files: 640, dirs: 750)

# File creation with umask:
#   Default mode & ~umask = actual mode
#   666 & ~022 = 644 (for files)
#   777 & ~022 = 755 (for directories)
Access Control Lists (ACLs):
Traditional Unix permissions are limited—you can only grant access to owner, one group, and everyone else. ACLs extend this with fine-grained, per-user/per-group permissions:
# View ACL
$ getfacl file.txt
# file: file.txt
# owner: alice
# group: staff
user::rw-
user:bob:r-- # Specific user permission
group::r--
group:developers:rw- # Specific group permission
mask::rw-
other::---
# Set ACL
$ setfacl -m u:bob:r file.txt # Add user ACL
$ setfacl -m g:developers:rw file.txt # Add group ACL
$ setfacl -x u:bob file.txt # Remove user ACL
$ setfacl -b file.txt # Remove all ACLs
# Default ACLs (inherited by new files in directory)
$ setfacl -d -m g:developers:rw dir/
Windows permission model:
Windows uses a more complex ACL system by default, with granular permissions like "Read Attributes," "Write Extended Attributes," "Delete Child," etc., plus inheritance rules for directories.
| Bit | On Files | On Directories | Representation |
|---|---|---|---|
| setuid (4000) | Execute as file owner | No effect | -rwsr-xr-x |
| setgid (2000) | Execute as file group | New files inherit group | -rwxr-sr-x |
| sticky (1000) | No effect (historically swap) | Only owner can delete files | drwxrwxrwt |
setuid programs run with the file owner's privileges regardless of who executes them. When owned by root, they're a significant security risk—any vulnerability allows privilege escalation. Examples: /usr/bin/passwd (needs to modify /etc/shadow), sudo, su. Modern systems minimize setuid binaries and prefer capabilities for fine-grained privilege.
Unix file systems support two types of links that provide indirection—allowing multiple names to reference the same data.
Hard links:
A hard link is an additional directory entry pointing to the same inode. The file's data has multiple names equally valid—there's no "original" and "link."
$ echo "content" > file1.txt
$ ln file1.txt file2.txt # Create hard link
$ ls -li
12345 -rw-r--r-- 2 user user 8 Jan 15 10:00 file1.txt
12345 -rw-r--r-- 2 user user 8 Jan 15 10:00 file2.txt
# Both names share inode 12345; link count = 2
$ rm file1.txt # Remove one name
$ cat file2.txt # Data still accessible!
content
$ ls -li file2.txt
12345 -rw-r--r-- 1 user user 8 Jan 15 10:00 file2.txt
# Link count is now 1
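Hard link characteristics:
- Same inode as every other name for the file
- Cannot cross file system boundaries (inode numbers are local to one file system)
- Generally cannot point to directories (this prevents cycles in the tree)
- The data is freed only when the link count reaches zero and no process still has the file open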
Symbolic links (symlinks):
A symbolic link is a special file containing a path to another file. It's an indirect reference resolved at access time.
$ ln -s /path/to/target linkname # Create symlink
$ ls -l linkname
lrwxrwxrwx 1 user user 15 Jan 15 10:00 linkname -> /path/to/target
# Symlink characteristics:
# - Different inode from target
# - Contains path string, not data
# - Can cross filesystems
# - Can link to directories
# - Can become "dangling" if target deleted
$ rm /path/to/target
$ cat linkname
cat: linkname: No such file or directory # Dangling symlink!
$ ls -l linkname # Link still exists
lrwxrwxrwx 1 user user 15 Jan 15 10:00 linkname -> /path/to/target
Comparison:
Hard Link                          Symbolic Link
┌────────────────────┐             ┌────────────────────┐
│ Directory Entry    │             │ Directory Entry    │
│ name: "file1.txt"  │             │ name: "linkname"   │
│ inode: 12345       │             │ inode: 67890       │
└────────┬───────────┘             └────────┬───────────┘
         │                                  │
         │                                  ▼
         │                         ┌────────────────────┐
         │                         │ Inode 67890        │
         │                         │ type: symlink      │
         │                         │ data: "/path/to/   │
         │                         │        target"     │
         ▼                         └────────┬───────────┘
┌────────────────────┐                      │
│ Inode 12345        │◄─────────────────────┘ (resolved at access)
│ type: regular      │
│ size: 1000         │
│ blocks: [...]      │
└────────────────────┘
Many system calls follow symlinks automatically (open, stat). Some have 'l' variants that don't (lstat, lchown). readlink() reads the symlink content itself. realpath() resolves all symlinks to get the canonical absolute path.
Different file system types implement these abstractions in various ways, optimized for different use cases. Understanding file system organization helps explain performance characteristics and limitations.
On-disk structure (simplified ext4 example):
┌────────────────────────────────────────────────────────────────────────┐
│ Disk Layout │
├─────────┬─────────┬─────────┬──────────────────────────────────────────┤
│ Boot │ Super- │ Block │ Block Groups (repeated) │
│ Block │ block │ Group │ │
│ │ │ Desc. │ ┌────────────────────────────────────────┐│
│ │ │ │ │ Group 0 │ Group 1 │ Group 2 │ ... ││
│ 1 KB │ 1 KB │ n KB │ └────────────────────────────────────────┘│
└─────────┴─────────┴─────────┴──────────────────────────────────────────┘
Block Group Structure:
┌──────────────────────────────────────────────────────────────┐
│ Data Block │ Inode │ Inode │ Data Blocks │
│ Bitmap │ Bitmap │ Table │ (file contents) │
│ │ │ │ │
│ 1 bit per │ 1 bit per│ inodes │ Actual file data │
│ data block │ inode │ │ │
└──────────────────────────────────────────────────────────────┘
Inode Structure:
┌────────────────────────────────────────────────────────────┐
│ Type & Permissions │ Owner UID/GID │
├───────────────────────────┼────────────────────────────────┤
│ Size │ Timestamps (atime/mtime/ctime) │
├───────────────────────────┼────────────────────────────────┤
│ Link count │ Flags │
├───────────────────────────┴────────────────────────────────┤
│ Direct block pointers (12) → data blocks │
│ Indirect pointer (1) → block of pointers │
│ Double indirect pointer (1) → block of ind. ptrs │
│ Triple indirect pointer (1) → block of dbl. ptrs │
└────────────────────────────────────────────────────────────┘
| File System | OS | Max File Size | Features |
|---|---|---|---|
| ext4 | Linux | 16 TB | Journaling, extents, large FS support |
| XFS | Linux | 8 EB | High performance, large files, scalable |
| Btrfs | Linux | 16 EB | Copy-on-write, snapshots, checksums |
| NTFS | Windows | 16 EB | Journaling, ACLs, compression, encryption |
| APFS | macOS | 8 EB | Copy-on-write, snapshots, encryption |
| ZFS | Illumos/BSD | 16 EB | Checksums, RAID, snapshots, compression |
| FAT32 | Cross-platform | 4 GB | Simple, wide compatibility, no journaling |
| exFAT | Cross-platform | 16 EB | Flash-optimized, large files, no journaling |
Journaling and crash recovery:
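A key concern for file systems is crash recovery: if power fails during a write, the file system should remain consistent. Journaling file systems maintain a log of changes:
1. Write a description of the pending changes to the journal
2. Mark the journal transaction as committed
3. Apply the changes to their final on-disk locations
4. Mark the transaction as complete so its journal space can be reused
If crash occurs:
- Committed transactions that were not fully applied are replayed from the journal
- Incomplete (uncommitted) transactions are discarded
Journal modes (ext3/ext4 data= mount options):
- journal: both metadata and file data are logged (safest, slowest)
- ordered: only metadata is logged, but data blocks are written before the metadata commits (the ext4 default)
- writeback: only metadata is logged, with no ordering guarantee for data (fastest, weakest guarantees)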
Modern file systems like Btrfs, ZFS, and APFS use copy-on-write: modified blocks are written to new locations rather than overwriting live data. Benefits: snapshots are nearly instant (they just preserve old block pointers), updates are atomic, and, when combined with block checksums as in Btrfs and ZFS, corruption is detectable. Trade-offs: fragmentation over time and write amplification.
We've explored how operating systems organize and manage persistent data through file system services.
What's next:
With file systems covered, we'll explore the final category of OS services in this module: Communication Services. This includes inter-process communication (IPC), networking, and the mechanisms that allow processes and systems to exchange information.
You now understand how operating systems provide file system manipulation services. From file concepts through directory structures, operations, permissions, links, and file system organization—these services form the foundation of persistent data management in computing.