On the previous page, we established that an inode is the kernel's representation of a file: a fixed-size data structure containing everything the operating system needs to know about a file except its name. But what exactly is "everything"?
When you run stat on a file, you see a wealth of information:
```
$ stat /etc/passwd
File: /etc/passwd
Size: 2584 Blocks: 8 IO Block: 4096 regular file
Device: 801h/2049d Inode: 131074 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2024-01-15 10:30:22.123456789 +0000
Modify: 2024-01-10 14:22:33.987654321 +0000
Change: 2024-01-10 14:22:33.987654321 +0000
Birth: 2023-06-01 09:00:00.000000000 +0000
```
Every piece of information displayed here—size, block count, device, inode number, link count, permissions, ownership, and all four timestamps—comes directly from the inode. Understanding these fields gives you insight into how the kernel manages files and how you can leverage this knowledge for debugging, forensics, and system optimization.
By the end of this page, you will understand: every metadata field stored in an inode; how the kernel uses each field for file operations; the three (or four) Unix timestamps and their subtle differences; how file permissions and ownership are encoded; why some inode fields exist for performance optimization; and how different filesystems extend the basic inode structure.
Let's examine the actual inode structure as defined in the Linux kernel. While implementations vary between filesystems, the core fields remain consistent. Here is a conceptual representation based on the ext4 inode:
```c
/*
 * Conceptual inode structure (simplified from ext4_inode)
 * Actual sizes and layouts vary by filesystem
 */
struct inode {
    /* === File Type and Permissions === */
    __u16 mode;           /* File type (4 bits) + permissions (12 bits) */

    /* === Ownership === */
    __u16 uid;            /* Owner user ID (lower 16 bits) */
    __u16 gid;            /* Owner group ID (lower 16 bits) */

    /* === Size Information === */
    __u32 size_lo;        /* File size in bytes (lower 32 bits) */
    __u32 size_hi;        /* File size in bytes (upper 32 bits) - for large files */

    /* === Link Count === */
    __u16 links_count;    /* Number of hard links to this inode */

    /* === Block Information === */
    __u32 blocks_lo;      /* Number of 512-byte blocks allocated (lower 32 bits) */
    __u16 blocks_hi;      /* Upper 16 bits of block count */

    /* === Timestamps (seconds since epoch) === */
    __u32 atime;          /* Last access time */
    __u32 mtime;          /* Last modification time */
    __u32 ctime;          /* Last inode change time */
    __u32 crtime;         /* Creation time (ext4 extension) */

    /* === Nanosecond Timestamp Extensions === */
    __u32 atime_extra;    /* Nanoseconds + epoch extension for atime */
    __u32 mtime_extra;    /* Nanoseconds + epoch extension for mtime */
    __u32 ctime_extra;    /* Nanoseconds + epoch extension for ctime */
    __u32 crtime_extra;   /* Nanoseconds + epoch extension for crtime */

    /* === File Flags === */
    __u32 flags;          /* File attributes (immutable, append-only, etc.) */

    /* === Block Pointers === */
    __u32 block[15];      /* Direct/indirect block pointers OR extent tree root */

    /* === Generation Number === */
    __u32 generation;     /* File version (NFS uses this) */

    /* === Extended Attributes === */
    __u32 file_acl_lo;    /* Block containing extended attributes */
    __u16 file_acl_hi;    /* Upper 16 bits of EA block */

    /* === Fragment Info (largely obsolete) === */
    __u8  frag_num;       /* Fragment number */
    __u8  frag_size;      /* Fragment size */

    /* === Extended UID/GID === */
    __u16 uid_hi;         /* Upper 16 bits of owner UID */
    __u16 gid_hi;         /* Upper 16 bits of owner GID */

    /* === Checksum (one 32-bit value split into two halves) === */
    __u16 checksum_hi;    /* Upper 16 bits of inode checksum */
    __u16 checksum_lo;    /* Lower 16 bits of inode checksum */

    /* === Extra/Reserved === */
    __u16 extra_isize;    /* Size of extra inode fields */
    __u32 reserved[...];  /* Reserved for future use */
};
/* Total size: 128 bytes (classic) or 256 bytes (ext4 extended) */
```
This structure packs an enormous amount of information into a small, fixed-size space. The Version 7 Unix inode was 64 bytes; ext2/ext3 used 128 bytes; ext4 defaults to 256 bytes to accommodate extended timestamps and additional features.
Let's examine each category of fields in detail.
The mode field is a 16-bit value that encodes both the file type and permissions. This elegant packing demonstrates the Unix philosophy of doing more with less.
File Type (upper 4 bits):
The file type tells the kernel how to handle this inode. Unix supports seven file types:
| Type | Octal Value | Symbolic | Description |
|---|---|---|---|
| Regular file | 0100000 | S_IFREG (-) | Ordinary file containing data |
| Directory | 0040000 | S_IFDIR (d) | Contains directory entries |
| Symbolic link | 0120000 | S_IFLNK (l) | Pointer to another path |
| Block device | 0060000 | S_IFBLK (b) | Block-oriented device (disk) |
| Character device | 0020000 | S_IFCHR (c) | Character-oriented device (terminal) |
| FIFO (named pipe) | 0010000 | S_IFIFO (p) | First-in-first-out pipe |
| Socket | 0140000 | S_IFSOCK (s) | Unix domain socket for IPC |
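The following is a minimal C sketch of how a program decodes the type bits. It assumes a POSIX system with `<sys/stat.h>`. The code masks the mode with S_IFMT and compares the result against the constants in the table. The helper name `file_type_char` is illustrative, not a standard function.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/stat.h>

/* Map the file-type bits of an st_mode value to the letter ls -l shows.
 * S_IFMT keeps only the upper 4 type bits; permission bits are ignored. */
char file_type_char(mode_t mode)
{
    switch (mode & S_IFMT) {
    case S_IFREG:  return '-';
    case S_IFDIR:  return 'd';
    case S_IFLNK:  return 'l';
    case S_IFBLK:  return 'b';
    case S_IFCHR:  return 'c';
    case S_IFIFO:  return 'p';
    case S_IFSOCK: return 's';
    default:       return '?';   /* unknown or corrupt type field */
    }
}
```

In practice you would pass `st.st_mode` from stat(2). The S_ISREG()/S_ISDIR() family of macros performs the same masking.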
Permission Bits (lower 12 bits):
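The lower 12 bits hold three special bits (SetUID, SetGID, sticky) and nine permission bits: read, write, and execute for the owner, the group, and others. They are traditionally displayed in octal (base 8), one digit per 3-bit group:
Example: -rw-r--r-- = 0644 octal
Breakdown:
- 6 = 110 binary = read + write for owner
- 4 = 100 binary = read for group
- 4 = 100 binary = read for others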
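Converting octal to rwx letters takes one bit test per position. This sketch handles only the nine permission bits and ignores the three special bits. The helper name `rwx_string` is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Render the nine rwx bits (owner, group, other) as e.g. "rw-r--r--".
 * Each octal digit is 3 bits: 4 = read, 2 = write, 1 = execute.
 * out must have room for at least 10 bytes. */
void rwx_string(mode_t mode, char out[10])
{
    static const char letters[] = "rwx";
    memset(out, '-', 9);
    for (int i = 0; i < 9; i++) {
        /* Bit 8 is owner-read (0400); bit 0 is other-execute (0001). */
        mode_t bit = (mode_t)1 << (8 - i);
        if (mode & bit)
            out[i] = letters[i % 3];
    }
    out[9] = '\0';
}
```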
Special Permission Bits:
| Bit | Name | Effect on Files | Effect on Directories |
|---|---|---|---|
| SetUID (4000) | Set User ID | Execute as file owner | No standard effect |
| SetGID (2000) | Set Group ID | Execute as file group | New files inherit dir's group |
| Sticky (1000) | Restricted Delete | No standard effect | Only the file's owner, the directory's owner, or root can delete or rename entries |
The SetUID bit is how passwd can modify /etc/shadow (owned by root) when run by regular users. The executable runs with root's permissions. This is powerful but dangerous—a vulnerable SetUID program becomes a privilege escalation vector. Security audits always examine SetUID binaries closely.
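`ls -l` folds the special bits into the execute slots, and the sketch below follows that convention:

- **SetUID / SetGID:** shown as `s` when execute is also set, `S` when it is not.
- **Sticky:** shown as `t` when execute is also set, `T` when it is not.

`mode_string` is a hypothetical helper. For brevity, only regular files, directories, and symlinks get a type letter here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/stat.h>

/* Build a 10-character ls -l style string such as "-rwsr-xr-x".
 * Lowercase s/t: the execute bit is also set; uppercase S/T: it is not. */
void mode_string(mode_t mode, char out[11])
{
    static const mode_t bits[9] = {S_IRUSR, S_IWUSR, S_IXUSR,
                                   S_IRGRP, S_IWGRP, S_IXGRP,
                                   S_IROTH, S_IWOTH, S_IXOTH};
    static const char letters[] = "rwxrwxrwx";

    /* Other file types omitted for brevity. */
    out[0] = S_ISDIR(mode) ? 'd' : S_ISLNK(mode) ? 'l' : '-';
    for (int i = 0; i < 9; i++)
        out[i + 1] = (mode & bits[i]) ? letters[i] : '-';

    /* Special bits take over the owner, group, and other execute slots. */
    if (mode & S_ISUID) out[3] = (mode & S_IXUSR) ? 's' : 'S';
    if (mode & S_ISGID) out[6] = (mode & S_IXGRP) ? 's' : 'S';
    if (mode & S_ISVTX) out[9] = (mode & S_IXOTH) ? 't' : 'T';
    out[10] = '\0';
}
```

A SetUID root binary such as passwd typically renders as -rwsr-xr-x (04755). Audits look for exactly this pattern, for example with find / -perm -4000.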
Every inode stores the user ID (UID) and group ID (GID) of its owner. These numeric IDs—not usernames—are what the kernel uses for access control.
Historical Evolution:
Original Unix used 16-bit UIDs/GIDs, supporting 65,536 users—plenty for 1970s timesharing systems. Modern systems use 32-bit IDs (4+ billion possibilities), essential for large organizations and container systems that assign unique IDs per container.
Ext4 stores UIDs/GIDs as split fields:
- uid_lo (16 bits) + uid_hi (16 bits) = 32-bit UID
- gid_lo (16 bits) + gid_hi (16 bits) = 32-bit GID

This split maintains backwards compatibility with older filesystems while enabling larger ID spaces.
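Reassembling an ID from the two halves takes one shift and one OR. This is a sketch only. `join_id` and `split_id` are hypothetical helpers, and the lo/hi naming follows the conceptual struct rather than the exact ext4 field names.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the on-disk 16-bit halves into a 32-bit UID or GID. */
static inline uint32_t join_id(uint16_t lo, uint16_t hi)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Split a 32-bit ID back into the two 16-bit halves stored on disk. */
static inline void split_id(uint32_t id, uint16_t *lo, uint16_t *hi)
{
    *lo = (uint16_t)(id & 0xFFFF);
    *hi = (uint16_t)(id >> 16);
}
```

An ID below 65,536 has a zero high half. This is why 16-bit tools reading only the low field kept working for ordinary accounts.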
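The Access Decision Algorithm:
```
When a process (with eUID, eGID) accesses a file:
1. If eUID == 0 (root):
   → Grant read/write access (root bypasses these checks;
     executing a regular file still requires at least one x bit)
2. If eUID == file's UID:
   → Check owner permission bits
3. If eGID == file's GID:
   → Check group permission bits
4. If the process's supplementary groups include the file's GID:
   → Check group permission bits
5. Otherwise:
   → Check 'other' permission bits
```
The kernel evaluates these classes in order and stops at the first match. Only the matching class's bits decide the outcome. A file with mode 0077 therefore denies access to its own owner, even though group members and others have full access.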
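The decision procedure fits in a few lines of C. This is a simplified sketch with the following assumptions:

- It ignores ACLs.
- It ignores Linux capabilities beyond treating UID 0 as root.
- It ignores the filesystem UID/GID that the kernel actually compares.
- `may_access` and the `WANT_*` constants are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Requested access, mirroring the r/w/x bits of one permission triplet. */
enum { WANT_READ = 4, WANT_WRITE = 2, WANT_EXEC = 1 };

/* Model of the classic check; groups/ngroups are the supplementary groups.
 * Every file is treated as a regular file (directory search is not modeled). */
bool may_access(uid_t euid, gid_t egid, const gid_t *groups, size_t ngroups,
                uid_t file_uid, gid_t file_gid, mode_t mode, int want)
{
    if (euid == 0) {
        /* Root skips read/write checks; exec still needs some x bit. */
        return !(want & WANT_EXEC) || (mode & 0111);
    }

    int granted;
    if (euid == file_uid) {
        granted = (mode >> 6) & 7;              /* owner triplet only */
    } else {
        bool in_group = (egid == file_gid);
        for (size_t i = 0; !in_group && i < ngroups; i++)
            in_group = (groups[i] == file_gid);
        granted = in_group ? (mode >> 3) & 7    /* group triplet */
                           : mode & 7;          /* other triplet */
    }
    /* The first matching class decides; later classes are never consulted. */
    return (granted & want) == want;
}
```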
Common UID Values (typical conventions; exact ranges vary by distribution):
| UID | User | Purpose |
|---|---|---|
| 0 | root | Superuser, bypasses permissions |
| 1-99 | System | Reserved for system accounts |
| 100-999 | Services | System services (web, mail) |
| 1000+ | Users | Regular user accounts |
| 65534 | nobody | Unprivileged fallback |
Ownership and Security:
Ownership determines who can:
- Change the file's permissions (chmod requires ownership or root)
- Change the file's owner (chown requires root on most systems)

When using containers with user namespaces, UIDs inside the container map to different UIDs outside. A file owned by UID 1000 inside might appear as UID 100000 on the host filesystem. Understanding this mapping is essential for debugging container permission issues.
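The translation is a range lookup. Linux exposes the mapping in /proc/<pid>/uid_map as lines of the form `inside-start outside-start count`.

The host UID you get depends on the mapping. This sketch assumes the common single-range mapping 0 100000 65536:

- Under 0 100000 65536, container UID 1000 lands on host UID 101000.
- Under 1000 100000 1, the same UID would land on 100000.

`parse_map_line` and `to_host_uid` are hypothetical helpers.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>

/* One uid_map line: inside-start, outside-start, count. */
struct id_range {
    uint32_t inside;    /* first UID as seen inside the namespace */
    uint32_t outside;   /* corresponding host UID (what the inode stores) */
    uint32_t count;     /* number of consecutive IDs mapped */
};

/* Parse a line such as "0 100000 65536"; returns false if malformed. */
bool parse_map_line(const char *line, struct id_range *r)
{
    return sscanf(line, "%" SCNu32 " %" SCNu32 " %" SCNu32,
                  &r->inside, &r->outside, &r->count) == 3;
}

/* Translate a container UID into the host UID written to disk.
 * Returns false if the UID falls outside the mapped range. */
bool to_host_uid(const struct id_range *r, uint32_t inside, uint32_t *host)
{
    if (inside < r->inside || inside - r->inside >= r->count)
        return false;
    *host = r->outside + (inside - r->inside);
    return true;
}
```

Going the other way, host owners with no mapping show up inside the container as the overflow ID, usually 65534 (nobody).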
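Two inode fields track the amount of space a file occupies:
- Size (size_lo/size_hi): the logical length of the file in bytes, as reported by ls -l
- Blocks (blocks_lo/blocks_hi): the space actually allocated on disk, counted in 512-byte units, as reported by du

These numbers are often different, and understanding why reveals important filesystem behavior.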
| Scenario | Logical Size | Allocated Space | Explanation |
|---|---|---|---|
| Normal file (1000 bytes) | 1000 bytes | 8 blocks (4096 bytes) | Filesystem allocates whole 4 KB blocks; 3096 bytes "wasted" as slack space |
| Sparse file with hole | 1 GB | 8 blocks (4096 bytes) | File has 1 GB logical size but most is empty; only written regions use blocks |
| File with an external xattr block (ext4) | 1000 bytes | 16 blocks (8192 bytes) | One 4 KB block for data plus one 4 KB block holding extended attributes |
| Compressed file (ZFS) | 10 MB | 2 MB | Transparent compression means less physical storage than logical size |
Sparse Files: A Powerful Optimization
Unix filesystems support sparse files—files with "holes" that contain logical zeros but consume no disk space. This is achieved by not allocating blocks for regions never written:
```
# Create a 1GB sparse file (almost instant, uses minimal space)
$ truncate -s 1G sparse_file.img
# Check apparent size vs actual size
$ ls -lh sparse_file.img
-rw-r--r-- 1 user user 1.0G Jan 15 10:00 sparse_file.img
$ du -h sparse_file.img
0 sparse_file.img # Actually uses 0 blocks!
# Write to the middle of the file
$ echo "data" | dd of=sparse_file.img bs=1 seek=536870912 conv=notrunc
$ du -h sparse_file.img
4.0K sparse_file.img # Now uses one block for the written data
```
When you read from a hole (unallocated region), the filesystem returns zeros without any actual I/O. Virtual machine disk images commonly use sparse files—a 100GB virtual disk might only consume 10GB of actual storage.
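The same experiment can be run from C:

- ftruncate() extends the file without allocating blocks.
- pwrite() then fills in one region.

This sketch assumes a filesystem that supports holes, such as ext4, XFS, or tmpfs. `make_sparse` and `report` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a file of `size` bytes that is one big hole, then write `len`
 * bytes at `offset` (like the truncate + dd example). 0 on success. */
int make_sparse(const char *path, off_t size, off_t offset,
                const void *data, size_t len)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    /* Extending with ftruncate allocates no data blocks. */
    if (ftruncate(fd, size) != 0 ||
        pwrite(fd, data, len, offset) != (ssize_t)len) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Compare logical size (st_size) with allocated space (st_blocks * 512). */
void report(const char *path)
{
    struct stat st;
    if (stat(path, &st) == 0)
        printf("%s: size=%lld bytes, allocated=%lld bytes\n", path,
               (long long)st.st_size, (long long)st.st_blocks * 512);
}
```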
The inode's block count is historically measured in 512-byte sectors, not filesystem blocks. If your filesystem uses 4096-byte blocks, a file using 2 blocks would have blocks=16 (because 2 × 4096 / 512 = 16). This legacy from disk sector sizes can be confusing—always verify the units when interpreting this field.
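Once the 512-byte unit is fixed, converting between st_blocks and filesystem blocks is simple arithmetic. This sketch assumes Linux semantics; POSIX itself leaves the unit of st_blocks unspecified. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* On Linux, st_blocks counts 512-byte units whatever the filesystem's
 * block size (and independent of st_blksize, the preferred I/O size). */
#define STAT_BLOCK_UNIT 512

/* Bytes actually allocated, given an st_blocks value. */
int64_t allocated_bytes(int64_t blocks512)
{
    return blocks512 * STAT_BLOCK_UNIT;
}

/* Convert an st_blocks value into filesystem blocks of fs_block_size
 * bytes, rounding up in case of a partial block. */
int64_t fs_blocks(int64_t blocks512, int64_t fs_block_size)
{
    assert(fs_block_size > 0);
    return (allocated_bytes(blocks512) + fs_block_size - 1) / fs_block_size;
}
```

For the example in the text, fs_blocks(16, 4096) yields 2.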
The link count (st_nlink in the stat structure, shown as Links in stat output) records the number of directory entries pointing to this inode. This simple counter is the foundation of Unix's reference-counting approach to file deletion.
How Link Count Works:
| Action | Effect on Link Count |
|---|---|
| Create new file | Set to 1 |
| Create hard link to file | Increment by 1 |
| Unlink (delete) a name | Decrement by 1 |
| Rename file (same FS) | No change |
| Move file (same FS) | No change |
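The table can be replayed with link(), rename(), and unlink(), reading st_nlink after each step. This sketch targets a typical local filesystem; the comments give the counts expected there. `link_count` and `link_demo` are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return st_nlink for path, or -1 on error. */
long link_count(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? (long)st.st_nlink : -1;
}

/* Inside dir: create, link, rename the link, unlink it.
 * Prints the link count after each step; returns 0 on success. */
int link_demo(const char *dir)
{
    char a[4096], b[4096], c[4096];
    snprintf(a, sizeof a, "%s/file.txt", dir);
    snprintf(b, sizeof b, "%s/hardlink.txt", dir);
    snprintf(c, sizeof c, "%s/renamed.txt", dir);

    int fd = open(a, O_CREAT | O_EXCL | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    close(fd);
    printf("create: %ld\n", link_count(a));   /* new file: 1 */

    if (link(a, b) != 0) return -1;
    printf("link:   %ld\n", link_count(a));   /* two names: 2 */

    if (rename(b, c) != 0) return -1;
    printf("rename: %ld\n", link_count(a));   /* rename: still 2 */

    if (unlink(c) != 0) return -1;
    printf("unlink: %ld\n", link_count(a));   /* back to 1 */
    return 0;
}
```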
Directory Link Counts Are Special:
For directories, the link count includes:
- The directory's entry in its parent (1)
- Its own . entry within the directory (1)
- Each subdirectory's .. entry (1 per subdirectory)

```
$ mkdir test_dir
$ stat test_dir | grep Links
Links: 2 # Parent's entry + own "." entry
$ mkdir test_dir/sub1
$ stat test_dir | grep Links
Links: 3 # Now sub1's ".." adds another
$ mkdir test_dir/sub2
$ stat test_dir | grep Links
Links: 4 # sub2's ".." adds yet another
```
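This means that on traditional filesystems such as ext4, a directory's link count equals 2 + (number of immediate subdirectories). This can be useful for quickly estimating directory structure without reading contents. Not every filesystem follows this convention: Btrfs, for example, always reports a link count of 1 for directories.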
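Here is a hedged sketch of the 2 + subdirectories estimate. `subdir_estimate` is a hypothetical helper. It refuses to guess when the link count is below 2, which is what a Btrfs-style count of 1 looks like.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Estimate immediate subdirectories as nlink - 2.
 * Returns -1 on error, for non-directories, or when nlink < 2
 * (filesystems that do not follow the convention). */
long subdir_estimate(const char *dir)
{
    struct stat st;
    if (stat(dir, &st) != 0 || !S_ISDIR(st.st_mode) || st.st_nlink < 2)
        return -1;
    return (long)st.st_nlink - 2;
}
```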
When Link Count Reaches Zero:
When unlink decrements the link count to zero:
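- If no process has the file open: the kernel frees the inode and releases its data blocks.
- If a process has the file open: the name is gone, but the inode and its data survive as an orphan until the last open file descriptor is closed; only then are they freed.

This orphan handling explains why you can delete files in use: the deletion is deferred.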
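Deferred deletion is easy to observe. Unlink a file while holding it open, then keep using the descriptor. This is a sketch under POSIX assumptions. `orphan_demo` is a hypothetical helper that returns the bytes read back through the now-nameless inode.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create path, write msg, unlink it, then read msg back through the
 * still-open descriptor. Returns bytes read, or -1 if any step fails
 * (including st_nlink not being 0 after the unlink). */
ssize_t orphan_demo(const char *path, const char *msg, char *buf, size_t bufsz)
{
    size_t len = strlen(msg);
    int fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    struct stat st;
    ssize_t n = -1;
    if (write(fd, msg, len) == (ssize_t)len &&
        unlink(path) == 0 &&                 /* last name removed: nlink 1 -> 0 */
        access(path, F_OK) != 0 &&           /* path lookups now fail */
        fstat(fd, &st) == 0 && st.st_nlink == 0)
        n = pread(fd, buf, bufsz, 0);        /* data still reachable via fd */

    close(fd);   /* last reference dropped: inode and blocks can be freed */
    return n;
}
```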
Link Count Edge Cases:
```
# Can't delete a directory with contents
$ rmdir non_empty_dir
rmdir: failed to remove 'non_empty_dir': Directory not empty
# rmdir refuses any directory containing entries other than
# "." and "..", even one holding only regular files
# (link count 2); remove the contents first, or use rm -r

# Hard links share link count
$ ln file.txt hardlink.txt
$ stat file.txt | grep Links
Links: 2
$ stat hardlink.txt | grep Links
Links: 2 # Same inode = same count
```
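A regular file with a link count greater than 1 has hard links. Finding all names for an inode requires searching the filesystem: find <mount_point> -xdev -inum <inode_number>. The -xdev option keeps find on one filesystem, because inode numbers are unique only within a filesystem. This is expensive because inode→name is not indexed; only name→inode lookups are fast.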
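There is no reverse index. A program that wants every name for an inode must therefore walk the tree and compare each entry's (st_dev, st_ino) pair, which is essentially what find -inum does.

This sketch uses POSIX nftw(). `find_names` is a hypothetical helper, and FTW_MOUNT plays the role of find's -xdev.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw callbacks take no user pointer, so the target lives in statics. */
static dev_t target_dev;
static ino_t target_ino;
static int matches;

static int visit(const char *path, const struct stat *st, int type,
                 struct FTW *ftw)
{
    (void)ftw;
    if (type == FTW_NS)          /* stat failed: st contents are undefined */
        return 0;
    /* Inode numbers are only unique per filesystem: compare st_dev too. */
    if (st->st_dev == target_dev && st->st_ino == target_ino) {
        printf("%s\n", path);
        matches++;
    }
    return 0;                    /* keep walking */
}

/* Print every path under root that names the same inode as `file`.
 * FTW_PHYS: don't follow symlinks; FTW_MOUNT: stay on one filesystem.
 * Returns the number of names found, or -1 on error. */
int find_names(const char *root, const char *file)
{
    struct stat st;
    if (stat(file, &st) != 0)
        return -1;
    target_dev = st.st_dev;
    target_ino = st.st_ino;
    matches = 0;
    if (nftw(root, visit, 16, FTW_PHYS | FTW_MOUNT) != 0)
        return -1;
    return matches;
}
```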
Unix inodes maintain multiple timestamps, each tracking a different aspect of file activity. Understanding these timestamps is essential for system administration, forensics, backup strategies, and debugging.
| Timestamp | Abbreviation | Updated When | Common Use |
|---|---|---|---|
| Access Time | atime | File content is read | Auditing, LRU cache decisions |
| Modification Time | mtime | File content is modified | Build systems, backup tools |
| Change Time | ctime | Inode metadata is modified | Security auditing, integrity checks |
| Birth/Creation Time | crtime/btime | File is first created | Forensics (not in all FS) |
Understanding the Subtle Differences:
```
$ touch testfile          # Creates file: all timestamps set to now
$ cat testfile            # Reads content: atime updated (subject to mount options)
                          # mtime unchanged, ctime unchanged
$ echo "data" > testfile  # Modifies content: mtime updated, ctime updated
                          # (a data write updates ctime along with mtime)
                          # atime unchanged (writing is not a read)
$ chmod 755 testfile      # Changes permissions: ctime updated
                          # mtime unchanged (content didn't change)
                          # atime unchanged
$ chown user testfile     # Changes owner (requires root): ctime updated
                          # mtime unchanged
$ mv testfile newname     # Renames: ctime updated on file (ext4; POSIX leaves this unspecified)
                          # mtime unchanged on file
                          # Parent directory mtime updated
```
The touch command can set atime and mtime to arbitrary values, but ctime cannot be set by user programs. Only the kernel updates it, and any timestamp change made with touch itself sets ctime to the current time. This makes ctime valuable for forensics: even if an attacker backdates mtime to hide changes, ctime still records when the file was last changed. (Root can work around this, for example by changing the system clock or editing the raw filesystem, but that leaves other traces.)
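From C, touch's behavior corresponds to utimensat(). The caller supplies atime and mtime, and there is no slot for ctime. This sketch backdates a file and shows that ctime still reflects the moment of the change. `backdate` is a hypothetical helper.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Set atime and mtime to `when` (seconds since the epoch), as touch -d
 * does, then stat the file into *after. The kernel sets ctime to the
 * current time as a side effect. Returns 0 on success, -1 on error. */
int backdate(const char *path, time_t when, struct stat *after)
{
    struct timespec times[2] = {
        { .tv_sec = when, .tv_nsec = 0 },   /* times[0]: atime */
        { .tv_sec = when, .tv_nsec = 0 },   /* times[1]: mtime */
    };
    if (utimensat(AT_FDCWD, path, times, 0) != 0)
        return -1;
    return stat(path, after);
}
```

After backdating to one day past the epoch, st_mtime reads 86400 while st_ctime shows the present.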
The atime Controversy:
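Updating atime on every file read seems useful, but it causes problems:
- Every read turns into a write, because the updated inode must eventually be written back to disk
- Read-heavy workloads (web servers, builds, backup scans of many files) generate extra I/O
- On flash storage, the extra writes add wear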
Modern systems offer mount options to mitigate this:
| Option | Behavior | Use Case |
|---|---|---|
| strictatime | Always update atime on read | Full POSIX compliance, auditing |
| relatime (default) | Update only if the previous atime is not newer than mtime or ctime, or is at least 24 hours old | Balance: preserves useful info, reduces writes |
| noatime | Never update atime | Maximum performance (SSDs, read-heavy) |
| lazytime | Keep timestamp updates (atime, mtime, ctime) in memory; write them to disk later (with other inode updates, on sync, on eviction, or after 24 hours) | Performance with eventual persistence |
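The relatime rule in the table can be written as a small predicate. This is a simplified model using whole seconds, whereas the kernel compares full timestamps. `relatime_should_update` is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Should a read update atime under relatime? True if the file changed
 * since the recorded read (mtime or ctime not older than atime), or if
 * the recorded atime is at least 24 hours old. */
bool relatime_should_update(time_t atime, time_t mtime, time_t ctim, time_t now)
{
    if (mtime >= atime || ctim >= atime)
        return true;                       /* modified since last read */
    return now - atime >= 24 * 60 * 60;    /* refresh at most once a day */
}
```

The daily refresh keeps tools that ask "has this been read recently?" roughly working, while a busy file costs at most one atime write per day.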
Timestamp Precision:
Original Unix timestamps had 1-second precision (32-bit seconds since 1970). Modern filesystems store additional fields for nanosecond precision:
```
$ stat --format='%y' testfile
2024-01-15 10:30:22.123456789 +0000
                    ^^^^^^^^^ nanoseconds
```
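Ext4's *_extra fields provide:
- 30 bits of nanoseconds, for sub-second precision
- 2 epoch bits that extend the 32-bit seconds field to 34 bits

This extends the timestamp range to the year 2446 with nanosecond precision.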
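Decoding a (seconds, extra) pair is a sign extension plus some bit arithmetic. The sketch follows the layout described in the ext4 documentation: the low 2 bits hold the epoch and the upper 30 bits hold nanoseconds. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)   /* low 2 bits of extra */

/* Decoded timestamp: seconds since 1970 (34-bit range) + nanoseconds. */
struct ts { int64_t sec; uint32_t nsec; };

/* base is a signed 32-bit seconds count; the epoch bits add multiples
 * of 2^32 seconds; the upper 30 bits of extra are nanoseconds. */
struct ts decode_ext4_time(uint32_t base, uint32_t extra)
{
    struct ts t;
    t.sec = (int64_t)(int32_t)base + ((int64_t)(extra & EPOCH_MASK) << 32);
    t.nsec = extra >> EPOCH_BITS;
    return t;
}

/* Inverse: split seconds (within the 34-bit range) and nanoseconds
 * (< 2^30) into the on-disk base and extra words. */
void encode_ext4_time(int64_t sec, uint32_t nsec, uint32_t *base, uint32_t *extra)
{
    *base = (uint32_t)sec;   /* low 32 bits, reinterpreted as signed on decode */
    *extra = (uint32_t)(((sec - (int32_t)(uint32_t)sec) >> 32) & EPOCH_MASK)
             | (nsec << EPOCH_BITS);
}
```

The largest encodable value uses base 0x7FFFFFFF with epoch bits 3. That is 15,032,385,535 seconds after 1970, which falls in the year 2446.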
Beyond basic permissions, the inode contains a flags field providing additional file attributes. These are filesystem-specific extensions to the basic Unix permission model.
Common ext4/ext2 File Flags:
| Flag | Letter | Effect | Use Case |
|---|---|---|---|
| Immutable | i | Cannot be modified, deleted, renamed, or linked | Protecting critical configs from accidental changes |
| Append-only | a | Can only append data; cannot modify existing content | Log files that should never be truncated |
| No Dump | d | Skipped by dump backup utility | Excluding cache/temp from backups |
| No Atime | A | Don't update atime on access | High-read files on SSDs |
| Sync | S | Changes are written to disk synchronously | Critical data requiring immediate persistence |
| Secure Delete | s | Requests that blocks be overwritten on deletion; not honored by ext2/ext3/ext4 | Do not rely on it for sensitive data |
| Compression | c | Compress file data transparently; not honored by ext2/ext3/ext4 (supported by Btrfs) | Large compressible files (if FS supports) |
| Extent Format | e | Uses extents instead of block mapping | Automatically set in ext4 (informational) |
Working with File Flags:
```
# View current flags
$ lsattr important.conf
----i--------e-- important.conf
# i = immutable, e = extents

# Make file immutable (requires root)
$ sudo chattr +i important.conf

# Now even root cannot modify it directly
$ sudo rm important.conf
rm: cannot remove 'important.conf': Operation not permitted
# (sudo echo ... >> file would not test this: the redirection
#  runs in the unprivileged shell, so use tee under sudo)
$ echo "new line" | sudo tee -a important.conf > /dev/null
tee: important.conf: Operation not permitted

# Must remove immutable flag first
$ sudo chattr -i important.conf
$ sudo rm important.conf # Now works
```
The immutable flag is particularly valuable for protecting files from rootkits—even if an attacker gains root access, they must know to check for and remove this flag before modifying protected files.
Setting the append-only flag on log files helps keep attackers from erasing their tracks. No process can truncate or rewrite existing entries. Even root must first clear the flag, which requires the CAP_LINUX_IMMUTABLE capability. Combined with remote syslog, this creates defense-in-depth for audit trails.
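lsattr reads these flags with the FS_IOC_GETFLAGS ioctl, documented in ioctl_iflags(2). This is a Linux-only sketch. `get_inode_flags` and `print_flags` are hypothetical helpers, and the ioctl fails on filesystems that do not implement it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Read the inode flags lsattr shows. Returns 0 and fills *flags, or -1
 * with errno set (e.g. ENOTTY where the ioctl is unsupported).
 * ioctl_iflags(2) documents an int argument for this request. */
int get_inode_flags(const char *path, int *flags)
{
    int fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd < 0)
        return -1;
    int rc = ioctl(fd, FS_IOC_GETFLAGS, flags);
    int saved = errno;           /* keep the ioctl's errno across close() */
    close(fd);
    errno = saved;
    return rc == 0 ? 0 : -1;
}

void print_flags(const char *path)
{
    int flags;
    if (get_inode_flags(path, &flags) != 0) {
        perror(path);
        return;
    }
    printf("%s:%s%s%s\n", path,
           (flags & FS_IMMUTABLE_FL) ? " immutable" : "",
           (flags & FS_APPEND_FL)    ? " append-only" : "",
           (flags & FS_EXTENT_FL)    ? " extents" : "");
}
```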
Two less-visible inode fields serve important roles in distributed systems and metadata extension:
Generation Number:
The generation number is a random value assigned when an inode is allocated. Its purpose becomes clear in network filesystems like NFS:
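- An NFS file handle identifies a file by its inode number together with its generation number.
- If the file is deleted and its inode number is reused for a new file, the new inode receives a different generation number.
- A client still holding the old handle no longer matches, so the server rejects it with ESTALE ("Stale file handle").

Without the generation number, a stale handle could silently read or write an unrelated file; with it, you get a clear error instead.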
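The server-side check reduces to comparing two numbers. This sketch uses hypothetical structs. Real NFS file handles are opaque, filesystem-specific byte strings, so treat this as a model of the idea rather than of any real server.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified handle: (inode number, generation). */
struct file_handle_sketch { uint64_t ino; uint32_t generation; };

/* Hypothetical in-memory record of the inode at a given number. */
struct inode_sketch { uint64_t ino; uint32_t generation; int in_use; };

/* Returns 0 if the handle still names the same file, or -ESTALE if the
 * inode was freed or reused for another file (generation differs). */
int check_handle(const struct file_handle_sketch *fh,
                 const struct inode_sketch *cur)
{
    if (!cur->in_use || cur->ino != fh->ino ||
        cur->generation != fh->generation)
        return -ESTALE;
    return 0;
}
```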
Extended Attributes (xattrs):
The core inode is fixed-size, but applications need to store arbitrary metadata. Extended attributes provide a key-value store attached to files:
```
# Set an extended attribute
$ setfattr -n user.description -v "Important document" file.txt
# List extended attributes
$ getfattr -d file.txt
# file: file.txt
user.description="Important document"
# Used by many systems:
$ getfattr -d -m - /path/to/file
security.selinux="system_u:object_r:user_home_t:s0"
user.com.apple.quarantine="..."
system.posix_acl_access="..."
```
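The same round trip works through the Linux xattr system calls, setxattr(2) and getxattr(2) in `<sys/xattr.h>`. This is a sketch. `xattr_roundtrip` is a hypothetical helper, and the filesystem must allow user.* attributes for it to succeed.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

/* Attach a user.* attribute and read it back, like setfattr/getfattr.
 * Returns the value length read, or -1 with errno set (ENOTSUP if the
 * filesystem or mount does not allow user xattrs). */
ssize_t xattr_roundtrip(const char *path, const char *name,
                        const char *value, char *buf, size_t bufsz)
{
    /* flags = 0: create the attribute, or replace it if it exists. */
    if (setxattr(path, name, value, strlen(value), 0) != 0)
        return -1;
    /* Values are raw bytes; buf is not NUL-terminated by getxattr. */
    return getxattr(path, name, buf, bufsz);
}
```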
Common xattr namespaces:
| Namespace | Purpose | Access Control |
|---|---|---|
| user.* | Application-defined metadata | Ordinary file permissions (read to get, write to set) |
| trusted.* | Trusted program metadata | Root only (CAP_SYS_ADMIN) |
| security.* | Security modules (SELinux, AppArmor) | Security module controls |
| system.* | System-level attributes (ACLs) | Varies by attribute |
Small xattrs may be stored directly in the inode, in the space left beyond the core fields (inline). Larger ones go in a dedicated block pointed to by the file_acl field. With ext4's ea_inode feature, very large values can be stored in separate inodes. This tiered storage keeps common cases fast while supporting larger metadata.
While the conceptual content of inodes is consistent across Unix filesystems, implementations vary significantly in structure and size:
| Filesystem | Default Inode Size | Notable Features | Year |
|---|---|---|---|
| Version 7 Unix FS | 64 bytes | 13 block pointers (10 direct, 3 indirect), no ACLs | 1979 |
| ext2/ext3 | 128 bytes | 15 block pointers, basic xattrs | 1993/2001 |
| ext4 | 256 bytes | Nanosecond timestamps, inline xattrs, birth time | 2008 |
| XFS | 256-2048 bytes | Inode size chosen at mkfs, dynamic inode allocation, 64-bit inode numbers | 1994 |
| Btrfs | Variable | Copy-on-write, part of B-tree node | 2009 |
| ZFS | Variable | Object-based, not traditional inode | 2005 |
| APFS | Variable | Clone-aware, encryption metadata | 2017 |
Configuring Inode Size at Format Time:
```
# Default ext4 inode size (256 bytes)
$ mkfs.ext4 /dev/sda1
# Specify inode size explicitly
$ mkfs.ext4 -I 512 /dev/sda1 # 512-byte inodes
# Specify inode ratio (bytes of space per inode)
$ mkfs.ext4 -i 4096 /dev/sda1 # One inode per 4KB (many small files)
$ mkfs.ext4 -i 65536 /dev/sda1 # One inode per 64KB (large files)
# View inode configuration
$ tune2fs -l /dev/sda1 | grep -i inode
Inode count: 65536
Inode size: 256
Inodes per group: 8192
```
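Tradeoffs:
- Larger inodes leave room for inline xattrs and extended fields, but the inode tables take more disk space.
- More inodes (a smaller bytes-per-inode ratio) allow more files but reserve more space for inode tables. With too few, the filesystem reports "No space left on device" once inodes run out, even while free blocks remain.

On ext4, inode size and count are set at filesystem creation and generally cannot be changed without reformatting (XFS, by contrast, allocates inodes dynamically). This makes initial planning crucial. Monitor inode usage with df -i and plan capacity for your expected workload.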
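df -i gets its numbers from statvfs(): f_files is the total inode count and f_ffree the number still free. As a rough capacity check, the ratios in the mkfs examples give a 100 GiB filesystem the following inode counts:

- With -i 4096: 100 × 2^30 / 4096 = 26,214,400 inodes.
- With -i 65536: 1,638,400 inodes.

The sketch below reports the used percentage. `inode_usage` is a hypothetical helper. Some filesystems, such as Btrfs, report f_files as 0 because they have no fixed inode table.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/statvfs.h>

/* Print inode usage for the filesystem holding path, like df -i.
 * Returns percent used, or -1.0 on error or when the filesystem
 * reports no fixed inode count (f_files == 0). */
double inode_usage(const char *path)
{
    struct statvfs sv;
    if (statvfs(path, &sv) != 0 || sv.f_files == 0)
        return -1.0;
    unsigned long long used = sv.f_files - sv.f_ffree;
    printf("%s: %llu of %llu inodes used\n", path, used,
           (unsigned long long)sv.f_files);
    return 100.0 * (double)used / (double)sv.f_files;
}
```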
We've examined every field an inode contains: file type and permissions, ownership, size and block counts, link count, timestamps, flags, the generation number, and extended attributes.
What's next:
Now that we understand the metadata fields in an inode, we'll focus on the most important part: the block pointers. The next page explores direct blocks—how inodes store pointers to the first several data blocks of a file, enabling O(1) access to the beginning of any file.
You now have a complete understanding of inode contents—the metadata that describes every aspect of a file's properties, permissions, and timestamps. This knowledge is essential for system administration, debugging, and understanding filesystem behavior. Next, we'll dive into how inodes point to actual file data.