Unix Inode Structure - Learning Module


inode Contents

Inside the inode: A File's Complete Biography

In the previous page, we established that an inode is the kernel's representation of a file—a fixed-size data structure containing everything the operating system needs to know about a file except its name. But what exactly is "everything"?

When you run stat on a file, you see a wealth of information:

$ stat /etc/passwd
  File: /etc/passwd
  Size: 2584          Blocks: 8          IO Block: 4096   regular file
Device: 801h/2049d    Inode: 131074      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2024-01-15 10:30:22.123456789 +0000
Modify: 2024-01-10 14:22:33.987654321 +0000
Change: 2024-01-10 14:22:33.987654321 +0000
 Birth: 2023-06-01 09:00:00.000000000 +0000

Almost every piece of information displayed here (size, block count, inode number, link count, permissions, ownership, and all four timestamps) comes directly from the inode. The device number identifies the filesystem that holds the inode, and the I/O block size is a filesystem-level hint. Understanding these fields gives you insight into how the kernel manages files and how you can use this knowledge for debugging, forensics, and system optimization.

What You Will Learn

By the end of this page, you will understand: every metadata field stored in an inode; how the kernel uses each field for file operations; the three (or four) Unix timestamps and their subtle differences; how file permissions and ownership are encoded; why some inode fields exist for performance optimization; and how different filesystems extend the basic inode structure.

The inode Data Structure

Let's examine the actual inode structure as defined in the Linux kernel. While implementations vary between filesystems, the core fields remain consistent. Here is a conceptual representation based on the ext4 inode:

conceptual_inode_structure.c
/*
 * Conceptual inode structure (simplified from ext4_inode)
 * Actual sizes and layouts vary by filesystem
 */
struct inode {
    /* === File Type and Permissions === */
    __u16 mode;           /* File type (4 bits) + permissions (12 bits) */
    
    /* === Ownership === */
    __u16 uid;            /* Owner user ID (lower 16 bits) */
    __u16 gid;            /* Owner group ID (lower 16 bits) */
    
    /* === Size Information === */
    __u32 size_lo;        /* File size in bytes (lower 32 bits) */
    __u32 size_hi;        /* File size in bytes (upper 32 bits) - for large files */
    
    /* === Link Count === */
    __u16 links_count;    /* Number of hard links to this inode */
    
    /* === Block Information === */
    __u32 blocks_lo;      /* Number of 512-byte blocks allocated (lower 32 bits) */
    __u16 blocks_hi;      /* Upper 16 bits of block count */
    
    /* === Timestamps (seconds since epoch) === */
    __u32 atime;          /* Last access time */
    __u32 mtime;          /* Last modification time */
    __u32 ctime;          /* Last inode change time */
    __u32 crtime;         /* Creation time (ext4 extension) */
    
    /* === Nanosecond Timestamp Extensions === */
    __u32 atime_extra;    /* Nanoseconds + epoch extension for atime */
    __u32 mtime_extra;    /* Nanoseconds + epoch extension for mtime */
    __u32 ctime_extra;    /* Nanoseconds + epoch extension for ctime */
    __u32 crtime_extra;   /* Nanoseconds + epoch extension for crtime */
    
    /* === File Flags === */
    __u32 flags;          /* File attributes (immutable, append-only, etc.) */
    
    /* === Block Pointers === */
    __u32 block[15];      /* Direct/indirect block pointers OR extent tree root */
    
    /* === Generation Number === */
    __u32 generation;     /* File version (NFS uses this) */
    
    /* === Extended Attributes === */
    __u32 file_acl_lo;    /* Block containing extended attributes */
    __u16 file_acl_hi;    /* Upper 16 bits of EA block */
    
    /* === Fragment Info (largely obsolete) === */
    __u8  frag_num;       /* Fragment number */
    __u8  frag_size;      /* Fragment size */
    
    /* === Extended UID/GID === */
    __u16 uid_hi;         /* Upper 16 bits of owner UID */
    __u16 gid_hi;         /* Upper 16 bits of owner GID */
    
    /* === Checksum === */
    __u16 checksum_hi;    /* Upper 16 bits of inode checksum */
    __u32 checksum_lo;    /* Lower 32 bits of inode checksum */
    
    /* === Extra/Reserved === */
    __u16 extra_isize;    /* Size of extra inode fields */
    __u32 reserved[...];  /* Reserved for future use */
};
 
/* Total size: 128 bytes (classic) or 256 bytes (ext4 extended) */

This structure packs an enormous amount of information into a small, fixed-size space. The original Unix inode was 64 bytes; ext2/ext3 used 128 bytes; ext4 defaults to 256 bytes to accommodate larger timestamps and additional features.

Let's examine each category of fields in detail.

File Type and Mode

The mode field is a 16-bit value that encodes both the file type and permissions. This elegant packing demonstrates the Unix philosophy of doing more with less.


File Type (upper 4 bits):

The file type tells the kernel how to handle this inode. Unix supports seven file types:

Unix File Types
Type	Octal Value	Symbolic	Description
Regular file	0100000	`S_IFREG` (-)	Ordinary file containing data
Directory	0040000	`S_IFDIR` (d)	Contains directory entries
Symbolic link	0120000	`S_IFLNK` (l)	Pointer to another path
Block device	0060000	`S_IFBLK` (b)	Block-oriented device (disk)
Character device	0020000	`S_IFCHR` (c)	Character-oriented device (terminal)
FIFO (named pipe)	0010000	`S_IFIFO` (p)	First-in-first-out pipe
Socket	0140000	`S_IFSOCK` (s)	Unix domain socket for IPC

Permission Bits (lower 12 bits):

The permission bits control access. They are traditionally displayed in octal (base 8):

Example: -rw-r--r-- = 0644 octal

Breakdown:
  6 = 110 binary = read + write for owner
  4 = 100 binary = read for group
  4 = 100 binary = read for others

Special Permission Bits:

Bit	Name	Effect on Files	Effect on Directories
SetUID (4000)	Set User ID	Execute as file owner	No standard effect
SetGID (2000)	Set Group ID	Execute as file group	New files inherit dir's group
Sticky (1000)	Restricted Delete	No standard effect	Only the file's owner, the directory's owner, or root can delete or rename files
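To make the bit layout concrete, here is a minimal sketch that turns a mode word into the familiar ls-style string. The function name mode_to_string is hypothetical; the S_IF* and S_IS* constants are the standard ones from <sys/stat.h>.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Render an inode mode word in ls(1) style, e.g. 0100644 -> "-rw-r--r--".
 * out must have room for 11 bytes (10 characters plus the terminator). */
static void mode_to_string(mode_t mode, char out[11])
{
    assert(out != NULL);

    /* Upper bits: file type, selected by the S_IFMT mask. */
    switch (mode & S_IFMT) {
    case S_IFREG:  out[0] = '-'; break;
    case S_IFDIR:  out[0] = 'd'; break;
    case S_IFLNK:  out[0] = 'l'; break;
    case S_IFBLK:  out[0] = 'b'; break;
    case S_IFCHR:  out[0] = 'c'; break;
    case S_IFIFO:  out[0] = 'p'; break;
    case S_IFSOCK: out[0] = 's'; break;
    default:       out[0] = '?'; break;
    }

    /* Lower nine bits: rwx for owner, group, others. */
    static const char rwx[] = "rwxrwxrwx";
    for (int i = 0; i < 9; i++)
        out[1 + i] = (mode & (0400 >> i)) ? rwx[i] : '-';

    /* Special bits are drawn over the execute positions. */
    if (mode & S_ISUID) out[3] = (mode & S_IXUSR) ? 's' : 'S';
    if (mode & S_ISGID) out[6] = (mode & S_IXGRP) ? 's' : 'S';
    if (mode & S_ISVTX) out[9] = (mode & S_IXOTH) ? 't' : 'T';

    out[10] = '\0';
}
```

Passing the st_mode field returned by stat(2) to this function should reproduce the first column of ls -l for that file.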

SetUID: Power and Danger

The SetUID bit is how passwd can modify /etc/shadow (owned by root) when run by regular users. The executable runs with root's permissions. This is powerful but dangerous—a vulnerable SetUID program becomes a privilege escalation vector. Security audits always examine SetUID binaries closely.

Ownership: UID and GID

Every inode stores the user ID (UID) and group ID (GID) of its owner. These numeric IDs—not usernames—are what the kernel uses for access control.

Historical Evolution:

Original Unix used 16-bit UIDs/GIDs, supporting 65,536 users—plenty for 1970s timesharing systems. Modern systems use 32-bit IDs (4+ billion possibilities), essential for large organizations and container systems that assign unique IDs per container.

Ext4 stores UIDs/GIDs as split fields:

uid_lo (16 bits) + uid_hi (16 bits) = 32-bit UID
gid_lo (16 bits) + gid_hi (16 bits) = 32-bit GID

This split maintains backwards compatibility with older filesystems while enabling larger ID spaces.
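The split can be sketched with two small helpers. The names join_id and split_id are hypothetical and only illustrate the bit arithmetic; they are not taken from the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reassemble a 32-bit ID from the two 16-bit halves stored in the inode. */
static uint32_t join_id(uint16_t lo, uint16_t hi)
{
    return ((uint32_t)hi << 16) | lo;
}

/* Split a 32-bit ID into the halves that would be written to disk. */
static void split_id(uint32_t id, uint16_t *lo, uint16_t *hi)
{
    assert(lo != NULL && hi != NULL);
    *lo = (uint16_t)(id & 0xFFFF);
    *hi = (uint16_t)(id >> 16);
}
```

An old implementation that only reads the low half still sees the correct value for any ID below 65,536, which is where the backwards compatibility comes from.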

The Access Decision Algorithm:

When process (with eUID, eGID) accesses file:

1. If eUID == 0 (root):
   → Grant access (root bypasses read/write
     checks; executing a file still needs
     at least one execute bit set)

2. If eUID == file's UID:
   → Check owner permission bits

3. If eGID == file's GID:
   → Check group permission bits

4. If process supplementary groups 
   include file's GID:
   → Check group permission bits

5. Otherwise:
   → Check 'other' permission bits

The kernel evaluates permissions in this order, stopping at the first match.
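The ordering above can be sketched as a small function. This is a simplified illustration that follows the five steps as listed: it ignores ACLs, capabilities, and security modules, and the struct creds type and the WANT_* constants are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Requested access, using the same bit values as one rwx triplet. */
enum { WANT_READ = 4, WANT_WRITE = 2, WANT_EXEC = 1 };

/* Hypothetical container for the process credentials used in the check. */
struct creds {
    uid_t euid;
    gid_t egid;
    const gid_t *groups;   /* supplementary groups, may be NULL if ngroups == 0 */
    size_t ngroups;
};

/* True if gid is the effective group or one of the supplementary groups. */
static bool in_groups(const struct creds *c, gid_t gid)
{
    if (c->egid == gid)
        return true;
    for (size_t i = 0; i < c->ngroups; i++)
        if (c->groups[i] == gid)
            return true;
    return false;
}

/* Classic owner/group/other check, stopping at the first matching class. */
static bool may_access(const struct creds *c, mode_t mode,
                       uid_t file_uid, gid_t file_gid, unsigned want)
{
    assert(c != NULL);
    unsigned bits;

    if (c->euid == 0)
        return true;                  /* step 1: root bypasses the check */

    if (c->euid == file_uid)
        bits = (mode >> 6) & 7;       /* step 2: owner triplet */
    else if (in_groups(c, file_gid))
        bits = (mode >> 3) & 7;       /* steps 3-4: group triplet */
    else
        bits = mode & 7;              /* step 5: other triplet */

    /* Only the first matching class is consulted, even if a later
     * class would have granted more. */
    return (bits & want) == want;
}
```

One consequence of stopping at the first match: with mode 0604, a member of the file's group is denied read access even though "others" may read.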

Common UID Values:

UID	User	Purpose
0	root	Superuser, bypasses permissions
1-99	System	Reserved for system accounts
100-999	Services	System services (web, mail)
1000+	Users	Regular user accounts
65534	nobody	Unprivileged fallback

Ownership and Security:

Ownership determines who can:

Modify the file (requires write permission)
Change permissions (chmod requires ownership or root)
Change ownership (chown requires root on most systems)

UID/GID Mapping in Containers

When using containers with user namespaces, UIDs inside the container map to different UIDs outside. A file owned by UID 1000 inside might appear as UID 100000 on the host filesystem. Understanding this mapping is essential for debugging container permission issues.

File Size and Block Count

Two inode fields track the amount of space a file occupies:

size: The logical size of the file in bytes—how much data you can read
blocks: The physical space allocated on disk—how much storage is consumed

These numbers are often different, and understanding why reveals important filesystem behavior.

Size vs Blocks: Why They Differ
Scenario	Logical Size	Physical Blocks	Explanation
Normal file (1000 bytes)	1000 bytes	8 blocks (4096 bytes)	Filesystem allocates whole blocks; 3096 bytes "wasted" as slack space
Sparse file with hole	1 GB	8 blocks	File has 1GB logical size but most is empty; only written regions use blocks
File with large extended attributes	1000 bytes	24 blocks	Extended attributes stored in additional blocks beyond file data
Compressed file (ZFS)	10 MB	2 MB	Transparent compression means less physical storage than logical size

Sparse Files: A Powerful Optimization

Unix filesystems support sparse files—files with "holes" that contain logical zeros but consume no disk space. This is achieved by not allocating blocks for regions never written:

# Create a 1GB sparse file (almost instant, uses minimal space)
$ truncate -s 1G sparse_file.img

# Check apparent size vs actual size
$ ls -lh sparse_file.img
-rw-r--r-- 1 user user 1.0G Jan 15 10:00 sparse_file.img

$ du -h sparse_file.img
0       sparse_file.img    # Actually uses 0 blocks!

# Write to the middle of the file
$ echo "data" | dd of=sparse_file.img bs=1 seek=536870912 conv=notrunc

$ du -h sparse_file.img
4.0K    sparse_file.img    # Now uses one block for the written data

When you read from a hole (unallocated region), the filesystem returns zeros without any actual I/O. Virtual machine disk images commonly use sparse files—a 100GB virtual disk might only consume 10GB of actual storage.

The Block Count Field Uses 512-Byte Units

The inode's block count is historically measured in 512-byte sectors, not filesystem blocks. If your filesystem uses 4096-byte blocks, a file using 2 blocks would have blocks=16 (because 2 × 4096 / 512 = 16). This legacy from disk sector sizes can be confusing—always verify the units when interpreting this field.
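The unit conversion is easy to get wrong, so here is a small sketch of the arithmetic. The helper names are hypothetical; the 512-byte unit matches the st_blocks field of struct stat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The block count is expressed in 512-byte units. */
#define SECTOR_SIZE 512u

/* Convert a number of filesystem blocks into 512-byte units. */
static uint64_t fs_blocks_to_sectors(uint64_t fs_blocks, uint32_t fs_block_size)
{
    assert(fs_block_size != 0 && fs_block_size % SECTOR_SIZE == 0);
    return fs_blocks * (fs_block_size / SECTOR_SIZE);
}

/* Bytes actually allocated for a given block count. */
static uint64_t allocated_bytes(uint64_t sectors)
{
    return sectors * SECTOR_SIZE;
}

/* Heuristic: less space allocated than the logical size suggests holes
 * (or transparent compression). */
static bool looks_sparse(uint64_t size, uint64_t sectors)
{
    return allocated_bytes(sectors) < size;
}
```

With stat(2), you would pass st_size and st_blocks to looks_sparse; the 1 GB sparse image from the earlier example would qualify, while the 1000-byte file with 8 blocks would not.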

Link Count

The link count (the st_nlink field of struct stat, shown as Links in stat output) records the number of directory entries pointing to this inode. This simple counter is the foundation of Unix's reference-counting approach to file deletion.

How Link Count Works:

Action	Effect on Link Count
Create new file	Set to 1
Create hard link to file	Increment by 1
Unlink (delete) a name	Decrement by 1
Rename file (same FS)	No change
Move file (same FS)	No change
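The table can be checked from a program. The following sketch creates a temporary file, adds a hard link, renames it, and removes it, recording st_nlink after each step. The function name and the /tmp paths are hypothetical, and it assumes /tmp lives on a filesystem that supports hard links.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Records the link count after: create, link, rename, unlink.
 * Returns 0 on success, -1 on any failure. */
static int trace_link_count(nlink_t counts[4])
{
    assert(counts != NULL);
    char path[] = "/tmp/nlink_demo_XXXXXX";
    char alias[sizeof path + 8], moved[sizeof path + 8];
    struct stat st;
    int rc = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    snprintf(alias, sizeof alias, "%s.link", path);
    snprintf(moved, sizeof moved, "%s.moved", path);

    if (fstat(fd, &st) != 0) goto out;
    counts[0] = st.st_nlink;                 /* new file */

    if (link(path, alias) != 0 || fstat(fd, &st) != 0) goto out;
    counts[1] = st.st_nlink;                 /* second name added */

    if (rename(alias, moved) != 0 || fstat(fd, &st) != 0) goto out;
    counts[2] = st.st_nlink;                 /* rename on the same filesystem */

    if (unlink(moved) != 0 || fstat(fd, &st) != 0) goto out;
    counts[3] = st.st_nlink;                 /* second name removed */
    rc = 0;
out:
    /* Best-effort cleanup; names that no longer exist are ignored. */
    unlink(alias);
    unlink(moved);
    unlink(path);
    close(fd);
    return rc;
}
```

On a typical local filesystem the recorded values follow the table: 1, 2, 2, 1.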

Directory Link Counts Are Special:

For directories, the link count includes:

The entry in the parent directory (1)
The . entry within the directory (1)
Each subdirectory's .. entry (1 per subdirectory)

$ mkdir test_dir
$ stat test_dir | grep Links
Links: 2                     # Parent's entry + own "." entry

$ mkdir test_dir/sub1
$ stat test_dir | grep Links  
Links: 3                     # Now sub1's ".." adds another

$ mkdir test_dir/sub2
$ stat test_dir | grep Links
Links: 4                     # sub2's ".." adds yet another

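This means: On traditional filesystems such as ext4, a directory's link count equals 2 + (number of immediate subdirectories). This can be useful for quickly estimating directory structure without reading contents. Some filesystems do not follow the convention (Btrfs, for example, reports a directory link count of 1), so treat it as a hint rather than a guarantee.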

When Link Count Reaches Zero:

When unlink decrements the link count to zero:

If no process has the file open:
- Inode is freed immediately
- Data blocks are marked free
- File is permanently deleted
If a process has the file open:
- Inode is marked "orphan"
- File remains accessible to the process
- Deletion completes when last handle closes

This orphan handling explains why you can delete files in use—the deletion is deferred.
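The deferred deletion can be observed directly. This sketch unlinks a temporary file while holding it open, then checks that the descriptor still works and that fstat reports a link count of zero. The function name and path are hypothetical, and the zero count assumes a local filesystem; network filesystems such as NFS may behave differently.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Unlink the only name of an open temp file, then write and read through
 * the descriptor. Stores the link count seen after the unlink.
 * Returns 0 on success, -1 on any failure. */
static int unlink_while_open(nlink_t *nlink_after)
{
    assert(nlink_after != NULL);
    char path[] = "/tmp/orphan_demo_XXXXXX";
    const char msg[] = "still here";
    char buf[sizeof msg] = { 0 };
    struct stat st;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int rc = -1;
    if (unlink(path) == 0 &&                              /* last name is gone */
        write(fd, msg, sizeof msg) == (ssize_t)sizeof msg &&
        fstat(fd, &st) == 0 &&
        pread(fd, buf, sizeof buf, 0) == (ssize_t)sizeof buf &&
        memcmp(buf, msg, sizeof msg) == 0) {              /* data still readable */
        *nlink_after = st.st_nlink;
        rc = 0;
    }
    close(fd);   /* last reference dropped: the inode can now be freed */
    return rc;
}
```

This is the same pattern many programs use for scratch files that should disappear automatically when the process exits.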

Link Count Edge Cases:

# Can't delete directory with contents
$ rmdir non_empty_dir
rmdir: failed to remove 'non_empty_dir': Directory not empty

# rmdir checks the directory's contents,
# not its link count: any remaining entry
# (file or subdirectory) blocks removal

# Hard links share link count
$ ln file.txt hardlink.txt
$ stat file.txt | grep Links
Links: 2
$ stat hardlink.txt | grep Links
Links: 2  # Same inode = same count

Finding Link Count Anomalies

A regular file with link count > 1 has hard links. Finding all names for an inode requires searching the entire filesystem: find /mount/point -xdev -inum <inode_number> (inode numbers are unique only within one filesystem, hence -xdev). This is expensive because inode→name is not indexed; only name→inode lookups are fast.

The Three (or Four) Timestamps

Unix inodes maintain multiple timestamps, each tracking a different aspect of file activity. Understanding these timestamps is essential for system administration, forensics, backup strategies, and debugging.

Unix File Timestamps
Timestamp	Abbreviation	Updated When	Common Use
Access Time	`atime`	File content is read	Auditing, LRU cache decisions
Modification Time	`mtime`	File content is modified	Build systems, backup tools
Change Time	`ctime`	Inode metadata is modified	Security auditing, integrity checks
Birth/Creation Time	`crtime`/`btime`	File is first created	Forensics (not in all FS)

Understanding the Subtle Differences:

$ touch testfile           # Creates file: all timestamps set to now

$ cat testfile             # Reads content: atime updated (subject to mount options)
                           # mtime unchanged, ctime unchanged

$ echo "data" > testfile   # Modifies content: mtime updated, ctime updated
                           # (ctime changes because the write updates the inode)
                           # atime unchanged (a write is not an access)

$ chmod 755 testfile       # Changes permissions: ctime updated
                           # mtime unchanged (content didn't change)
                           # atime unchanged

$ chown user testfile      # Changes owner: ctime updated
                           # mtime unchanged

$ mv testfile newname      # Renames: ctime updated on file
                           # mtime unchanged on file
                           # Parent directory mtime updated

You Cannot Manually Set ctime

The touch command can set atime and mtime to arbitrary values, but ctime cannot be set through any standard system call; only the kernel updates it. This makes ctime valuable for forensics: even if an attacker resets mtime to hide changes, doing so updates ctime, which then records when the inode was last altered. (Root can work around this by changing the system clock or modifying the raw filesystem, but that leaves other traces.)
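A short sketch shows the asymmetry. It backdates atime and mtime of a temporary file with futimens(2), for which POSIX offers no ctime counterpart, and reads the timestamps back. The function name and path are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Set atime and mtime of a fresh temp file to `fake`, then fstat it.
 * Returns 0 on success, -1 on any failure. */
static int backdate_and_stat(time_t fake, struct stat *st)
{
    assert(st != NULL);
    char path[] = "/tmp/ctime_demo_XXXXXX";

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);   /* the open descriptor keeps the file alive */

    struct timespec times[2] = {
        { .tv_sec = fake, .tv_nsec = 0 },   /* new atime */
        { .tv_sec = fake, .tv_nsec = 0 },   /* new mtime */
    };
    /* There is no third slot: ctime is set by the kernel to "now". */
    int rc = (futimens(fd, times) == 0 && fstat(fd, st) == 0) ? 0 : -1;
    close(fd);
    return rc;
}
```

After the call, st_atim and st_mtim carry the fake time, while st_ctim reflects the moment futimens ran, which is exactly the trace a forensic examiner looks for.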

The atime Controversy:

Updating atime on every file read seems useful, but it causes problems:

Performance Impact: Reading files (even from cache) triggers inode writes
SSD Wear: Flash storage has limited write cycles
Journaling Overhead: atime updates must be journaled for consistency

Modern systems offer mount options to mitigate this:

Option	Behavior	Use Case
`strictatime`	Always update atime	Full POSIX compliance, auditing
`relatime` (default)	Update if atime is not newer than mtime or ctime, or is more than 24h old	Balance: preserves useful info, reduces writes
`noatime`	Never update atime	Maximum performance (SSDs, read-heavy)
`lazytime`	Update timestamps in memory, batch writes to disk	Performance with eventual persistence

Timestamp Precision:

Original Unix timestamps had 1-second precision (32-bit seconds since 1970). Modern filesystems store additional fields for nanosecond precision:

$ stat --format='%y' testfile
2024-01-15 10:30:22.123456789 +0000
                   ^^^^^^^^^ nanoseconds

Ext4's *_extra fields provide:

30 bits of nanoseconds (0-999,999,999)
2 bits extending the seconds epoch past 2038

This extends the timestamp range to the year 2446 with nanosecond precision: the seconds value grows to 34 bits, counted from the start of the signed 32-bit range in 1901.
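The packing of an *_extra field can be sketched as follows, assuming the layout described above (low 2 bits extend the epoch, upper 30 bits hold nanoseconds). The type and function names are hypothetical, and the cast to int32_t relies on the two's-complement wraparound that mainstream compilers provide.

```c
#include <assert.h>
#include <stdint.h>

#define EPOCH_BITS 2
#define EPOCH_MASK ((1u << EPOCH_BITS) - 1)

struct decoded_time {
    int64_t  sec;    /* seconds since 1970, may be negative */
    uint32_t nsec;   /* 0..999,999,999 */
};

/* Combine the signed 32-bit base field with its *_extra companion. */
static struct decoded_time decode_time(uint32_t base, uint32_t extra)
{
    struct decoded_time t;
    t.sec  = (int64_t)(int32_t)base + ((int64_t)(extra & EPOCH_MASK) << 32);
    t.nsec = extra >> EPOCH_BITS;
    return t;
}

/* Build the *_extra value for a given time; the base field is the low
 * 32 bits of sec. */
static uint32_t encode_extra(int64_t sec, uint32_t nsec)
{
    assert(nsec < 1000000000u);
    uint32_t epoch = (uint32_t)((sec - (int32_t)sec) >> 32) & EPOCH_MASK;
    return (nsec << EPOCH_BITS) | epoch;
}
```

The largest representable value, base 0x7FFFFFFF with both epoch bits set, is 15,032,385,535 seconds after 1970, which falls in the year 2446.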

File Flags and Attributes

Beyond basic permissions, the inode contains a flags field providing additional file attributes. These are filesystem-specific extensions to the basic Unix permission model.

Common ext4/ext2 File Flags:

ext4 File Flags (Viewable with lsattr, Modifiable with chattr)
Flag	Letter	Effect	Use Case
Immutable	i	Cannot be modified, deleted, renamed, or linked	Protecting critical configs from accidental changes
Append-only	a	Can only append data; cannot modify existing content	Log files that should never be truncated
No Dump	d	Skipped by dump backup utility	Excluding cache/temp from backups
No Atime	A	Don't update atime on access	High-read files on SSDs
Sync	S	Synchronous writes (no caching)	Critical data requiring immediate persistence
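Secure Delete	s	Overwrite blocks on deletion (flag is accepted but not honored by ext2/ext3/ext4)	Sensitive data (though unreliable on SSDs)
Compression	c	Compress file data transparently (not honored by ext4; used by filesystems such as Btrfs)	Large compressible files (if FS supports)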
Extent Format	e	Uses extents instead of block mapping	Automatically set in ext4 (informational)

Working with File Flags:

# Make file immutable (requires root)
$ sudo chattr +i important.conf

# View current flags
$ lsattr important.conf
----i--------e-- important.conf
# i = immutable, e = extents

# Now even root cannot modify it directly
$ sudo rm important.conf
rm: cannot remove 'important.conf': Operation not permitted

# Use tee so the write itself runs as root
# (with "sudo echo ... >> file" the shell does the redirect unprivileged)
$ echo "new line" | sudo tee -a important.conf
tee: important.conf: Operation not permitted

# Must remove immutable flag first
$ sudo chattr -i important.conf
$ sudo rm important.conf  # Now works

The immutable flag is particularly valuable for protecting files from rootkits—even if an attacker gains root access, they must know to check for and remove this flag before modifying protected files.

Append-Only for Secure Logging

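Setting the append-only flag on log files makes it harder for attackers to erase their tracks. While the flag is set, even root can only append, not delete entries; clearing it requires a separate chattr -a, which is itself a detectable action. Combined with remote syslog, this creates defense-in-depth for audit trails.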

Generation Number and Extended Attributes

Two less-visible inode fields serve important roles in distributed systems and metadata extension:

Generation Number:

The generation number is a random value assigned when an inode is allocated. Its purpose becomes clear in network filesystems like NFS:

NFS file handles include (inode number, generation)
If inode 500 is deleted and later a new file gets inode 500...
Without generation: old NFS handles point to wrong file!
With generation: new file has different generation, old handles are invalidated

This prevents a stale file handle from silently referring to the wrong file and corrupting data; instead, the client gets a clear "stale file handle" (ESTALE) error.
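The check amounts to comparing two fields. The sketch below uses deliberately simplified, hypothetical structures; real NFS file handles carry more information, and real servers perform the check inside the filesystem's export code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified file handle as held by a client. */
struct handle_sketch {
    uint64_t ino;
    uint32_t generation;
};

/* Hypothetical view of the inode currently occupying that slot. */
struct inode_sketch {
    uint64_t ino;
    uint32_t generation;
};

/* A handle is current only if both the inode number and the generation
 * match; otherwise the server should answer with ESTALE. */
static bool handle_is_current(const struct handle_sketch *fh,
                              const struct inode_sketch *inode)
{
    assert(fh != NULL && inode != NULL);
    return fh->ino == inode->ino && fh->generation == inode->generation;
}
```

If inode 500 is freed and reallocated with a new generation, an old handle still naming inode 500 fails this comparison instead of reaching the new file.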

Extended Attributes (xattrs):

The core inode is fixed-size, but applications need to store arbitrary metadata. Extended attributes provide a key-value store attached to files:

# Set an extended attribute
$ setfattr -n user.description -v "Important document" file.txt

# List extended attributes
$ getfattr -d file.txt
# file: file.txt
user.description="Important document"

# Used by many systems:
$ getfattr -d -m - /path/to/file
security.selinux="system_u:object_r:user_home_t:s0"
user.com.apple.quarantine="..."
system.posix_acl_access="..."

Common xattr namespaces:

Namespace	Purpose	Access Control
`user.*`	Application-defined metadata	File permission bits (read/write access to the file)
`trusted.*`	Trusted program metadata	Root only
`security.*`	Security modules (SELinux, AppArmor)	Security module controls
`system.*`	System-level attributes (ACLs)	Varies by attribute

Where xattrs Are Stored

Small xattrs may be stored directly in the inode's reserved space (inline). Larger xattrs are stored in a dedicated block pointed to by the file_acl field. Very large xattrs may require multiple blocks. This tiered storage keeps common cases fast while supporting arbitrary metadata sizes.

Inode Size and Filesystem Variations

While the conceptual content of inodes is consistent across Unix filesystems, implementations vary significantly in structure and size:

Inode Implementations Across Filesystems
Filesystem	Default Inode Size	Notable Features	Year
Original Unix FS	64 bytes	13 block pointers, no ACLs	1971
ext2/ext3	128 bytes	15 block pointers, basic xattrs	1993/2001
ext4	256 bytes	Nanosecond timestamps, inline xattrs, birth time	2008
XFS	256-2048 bytes	Dynamic inode size, 64-bit inode numbers	1994
Btrfs	Variable	Copy-on-write, part of B-tree node	2009
ZFS	Variable	Object-based, not traditional inode	2005
APFS	Variable	Clone-aware, encryption metadata	2017

Configuring Inode Size at Format Time:

# Default ext4 inode size (256 bytes)
$ mkfs.ext4 /dev/sda1

# Specify inode size explicitly
$ mkfs.ext4 -I 512 /dev/sda1    # 512-byte inodes

# Specify inode ratio (bytes of space per inode)
$ mkfs.ext4 -i 4096 /dev/sda1   # One inode per 4KB (many small files)
$ mkfs.ext4 -i 65536 /dev/sda1  # One inode per 64KB (large files)

# View inode configuration
$ tune2fs -l /dev/sda1 | grep -i inode
Inode count:              65536
Inode size:               256
Inodes per group:         8192

Tradeoffs:

Larger inodes: More inline xattr space, more features, but the inode tables consume more disk space
Smaller inodes: Less space spent on inode tables, but limited feature support
More inodes (lower ratio): Handles many small files, wastes space if files are large

Inode Configuration Cannot Be Changed Later

On ext2/ext3/ext4, the inode size and the number of inodes per block group are set at filesystem creation, so you cannot add inodes to an existing filesystem without growing or reformatting it. This makes initial planning crucial. Monitor inode usage with df -i and plan capacity for your expected workload.
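The numbers behind df -i are available to programs through statvfs(3). A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/statvfs.h>

/* Fetch total and free inode counts for the filesystem containing path.
 * Returns 0 on success, -1 on error. */
static int inode_usage(const char *path,
                       unsigned long long *total,
                       unsigned long long *free_count)
{
    assert(path != NULL && total != NULL && free_count != NULL);
    struct statvfs vfs;

    if (statvfs(path, &vfs) != 0)
        return -1;
    *total = vfs.f_files;        /* inodes on the filesystem */
    *free_count = vfs.f_ffree;   /* inodes still unallocated */
    return 0;
}

/* Percentage of inodes in use, or -1.0 when the filesystem does not
 * report a fixed inode total (some allocate inodes dynamically). */
static double inode_used_percent(unsigned long long total,
                                 unsigned long long free_count)
{
    if (total == 0 || free_count > total)
        return -1.0;
    return 100.0 * (double)(total - free_count) / (double)total;
}
```

A monitoring script could call these periodically and alert when the percentage approaches 100, since a filesystem with free blocks but no free inodes still refuses to create files.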

Summary: The Complete Inode Picture

We've examined every aspect of what an inode contains. Let's consolidate our understanding:

Key Takeaways

• Mode field encodes type and permissions: 16 bits pack file type, special bits (SetUID/SetGID/sticky), and rwx permissions for owner/group/others.
• UID/GID determine ownership: 32-bit values (stored as split 16-bit fields) identify the owner and group for access control decisions.
• Size and blocks track logical vs physical: Size is what you can read; blocks is disk space consumed. Sparse files show why these differ.
• Link count enables reference counting: It tracks directory entries pointing to this inode, so the inode is freed only when no references remain.
• Three timestamps serve different needs: atime (access), mtime (modification), and ctime (inode change) each serve distinct purposes; ctime cannot be set through the normal API. Some filesystems add a fourth, the birth time.
• File flags extend permissions: Immutable, append-only, and other flags provide fine-grained control beyond basic Unix permissions.
• Extended attributes store arbitrary metadata: Key-value pairs expand what can be stored about a file without changing the core inode structure.
• Block pointers locate the data: This is the most crucial inode content, and we'll explore it in the next pages.

What's next:

Now that we understand the metadata fields in an inode, we'll focus on the most important part: the block pointers. The next page explores direct blocks—how inodes store pointers to the first several data blocks of a file, enabling O(1) access to the beginning of any file.

You now have a complete understanding of inode contents—the metadata that describes every aspect of a file's properties, permissions, and timestamps. This knowledge is essential for system administration, debugging, and understanding filesystem behavior. Next, we'll dive into how inodes point to actual file data.