Imagine having the same document accessible from three different folders—your home directory, your projects folder, and a shared team directory—without consuming triple the disk space. This isn't a copy operation, and it's not a shortcut in the traditional sense. Each location provides genuine, first-class access to the same underlying data, and any modification through one path is instantly visible through all others.
This powerful capability is provided by hard links, one of the most fundamental yet often misunderstood features of Unix-like file systems. Understanding hard links requires a paradigm shift in how we conceptualize files themselves—moving from the intuitive notion of 'files stored in folders' to the deeper reality of inodes, directory entries, and reference counting.
By the end of this page, you will understand the true nature of hard links—how they work at the inode level, why they behave differently from copies, their inherent limitations, and their practical applications in system administration and software development.
To understand hard links, we must first deconstruct our everyday notion of a 'file.' When users interact with files, they typically think:
A file is something stored in a folder, with a name, content, and properties.
This mental model conflates three distinct entities that file systems carefully separate:

- The **name** — a directory entry that maps a human-readable string to an inode number
- The **inode** — the on-disk structure that holds the file's metadata and pointers to its data blocks
- The **data blocks** — the regions on disk that store the actual content
In Unix-like file systems, the inode holds both the data and metadata, while directory entries (names) are merely references to inodes. This separation is the foundation of hard links.
A filename is not the file itself—it's a pointer to the file. Just as multiple variables in a program can reference the same object, multiple filenames can reference the same inode. Each such reference is a hard link.
The inode architecture:
An inode (index node) is a data structure on disk that stores:

- File type and permissions
- Owner and group IDs
- File size in bytes
- Timestamps (access, modification, inode change)
- The hard link count
- Pointers to the data blocks holding the content
Notice what the inode does not contain: the filename. The filename lives in a directory entry that maps the name string to the inode number.
| Layer | What It Contains | Where It Lives | Uniqueness |
|---|---|---|---|
| Inode | Metadata + block pointers | Inode table on disk | Unique per file system |
| Data blocks | Actual file content | Data region on disk | Shared via inode |
| Directory entry | Name → inode mapping | Parent directory | Can have multiple per inode |
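Because names are just mappings to inode numbers, you can enumerate every directory entry that references a given inode. A small sketch, assuming GNU `stat` and `find`; the filenames are illustrative:

```shell
# List every name referencing one inode, within one filesystem
tmp=$(mktemp -d)
echo "hello" > "$tmp/a"
ln "$tmp/a" "$tmp/b"                # second name, same inode

ino=$(stat -c %i "$tmp/a")          # the shared inode number
find "$tmp" -xdev -inum "$ino"      # prints both names: $tmp/a and $tmp/b

rm -r "$tmp"
```

The `-xdev` flag keeps the search on one filesystem, which matters because inode numbers are only unique per filesystem.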
A hard link is a directory entry that points directly to an inode. When you create a file, the file system:

1. Allocates a new inode and records the file's metadata in it
2. Allocates data blocks for the content and records their locations in the inode
3. Adds a directory entry in the parent directory mapping the name to the inode number, with the link count set to 1

When you create a hard link, the file system:

1. Looks up the inode number of the existing name
2. Adds a new directory entry mapping the new name to that same inode number
3. Increments the inode's link count
No data is copied. No new inode is created. The only change is an additional directory entry and an incremented counter.
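The "no data is copied" claim is easy to verify with `du`, which counts each inode once per run. A quick sketch:

```shell
# Show that a hard link adds no data blocks, only a directory entry
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/big" bs=1M count=10 status=none

du -sh "$tmp"               # about 10M
ln "$tmp/big" "$tmp/big2"
du -sh "$tmp"               # still about 10M: only a directory entry was added
ls -i "$tmp"                # both names display the same inode number

rm -r "$tmp"
```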
```shell
# Create an original file
echo "Important data that exists exactly once on disk" > original.txt

# Verify the inode number and link count
ls -li original.txt
# Output: 1234567 -rw-r--r-- 1 user group 48 Jan 16 10:00 original.txt
#         ^^^^^^^            ^
#         inode number       link count = 1

# Create a hard link using the 'ln' command (no -s flag)
ln original.txt linked.txt

# Verify both names point to the same inode
ls -li original.txt linked.txt
# Output:
# 1234567 -rw-r--r-- 2 user group 48 Jan 16 10:00 original.txt
# 1234567 -rw-r--r-- 2 user group 48 Jan 16 10:00 linked.txt
# ^^^^^^^            ^
# same inode         link count = 2

# Both names have identical inode numbers
# Both names show a link count of 2
# This is the same file accessed through two different names
```

System call mechanics:
At the kernel level, creating a hard link involves the link() system call:
int link(const char *oldpath, const char *newpath);
The kernel implementation:

1. Resolves both paths and locates the source inode
2. Verifies that both paths are on the same file system, that the source is not a directory, and that the caller may modify the destination directory
3. Writes the new directory entry and increments the inode's link count
The link count increment and directory entry creation are atomic. Either both succeed or neither does. This atomicity is crucial for file system consistency—otherwise, an inode could be orphaned (link count says 1 but no directory entries exist) or leaked (link count says 0 but a directory entry still points to it).
Hard links exhibit several distinctive characteristics that stem from their nature as multiple directory entries pointing to the same inode. Understanding these characteristics is essential for effective use.
- **Same inode number** — every hard link to a file shows the same inode number in `ls -i` output. This is the definitive test for hard links.
- **Equal status** — no name is the "original"; each directory entry is a first-class reference to the inode.
- **Shared metadata** — permissions, ownership, size, and timestamps live in the inode, so they appear identical through every name.
- **Link count semantics** — the count reflects every directory entry referencing the inode; for directories this includes the entries the filesystem itself maintains (`.` and `..`).
```shell
# Demonstrating equality: there is no "original"
echo "Original content" > file1.txt
ln file1.txt file2.txt
ln file1.txt file3.txt

# All three names are equal references
stat file1.txt file2.txt file3.txt
# All show: Links: 3, Inode: 1234567 (same number)

# Delete the "original" - file2 and file3 are unaffected
rm file1.txt
cat file2.txt
# Output: Original content

# Modify through file2 - file3 sees the change instantly
echo "Modified content" > file2.txt
cat file3.txt
# Output: Modified content

# Check link count dropped to 2
stat file2.txt
# Links: 2
```

The equality principle in practice:
This equality has profound implications. When you 'delete' a file, you're actually:

1. Removing one directory entry (the name)
2. Decrementing the inode's link count
3. Freeing the inode and its data blocks only if the count reaches zero and no process holds the file open
This means:

- `rm` removes a name, not necessarily the data
- The data survives as long as at least one hard link (or open file descriptor) remains
- Disk space is reclaimed only when the last link is removed and the last descriptor is closed
When a process opens a file, the kernel increments an internal reference count (distinct from the on-disk link count). Even if all hard links are deleted while a file is open, the inode and data persist until the process closes the file descriptor. This is how Unix handles log rotation: rename old log, create new log, and the old log is cleaned up when logging processes restart.
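You can watch this behavior from the shell by holding a file descriptor open across an `rm`. A minimal sketch using bash's redirection syntax:

```shell
# Demonstrate that an open descriptor keeps an unlinked inode alive
tmp=$(mktemp -d)
echo "still alive" > "$tmp/log"

exec 3< "$tmp/log"    # open a read descriptor on the inode
rm "$tmp/log"         # last directory entry gone; on-disk link count is now 0

cat <&3               # prints: still alive — the kernel kept the inode
exec 3<&-             # closing the descriptor lets the kernel free the inode

rm -r "$tmp"
```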
Understanding the on-disk structures involved in hard links illuminates why they behave as they do. Let's trace through the data structures in a typical ext4 file system.
Directory entry structure:
In ext4, a directory is itself a file containing a sequence of directory entries. Each entry contains:
struct ext4_dir_entry_2 {
__le32 inode; /* Inode number (4 bytes) */
__le16 rec_len; /* Directory entry length (2 bytes) */
__u8 name_len; /* Name length (1 byte) */
__u8 file_type; /* File type (1 byte) */
char name[]; /* File name (variable, up to 255 bytes) */
};
When you create a hard link, the file system adds a new directory entry with:

- `inode` set to the existing file's inode number
- `name` and `name_len` set to the new link's name
- `file_type` copied from the source inode's type
- `rec_len` sized to hold the entry
Inode structure (simplified):
struct ext4_inode {
__le16 i_mode; /* File type and permissions */
__le16 i_uid; /* Owner user ID (low 16 bits) */
__le32 i_size_lo; /* File size in bytes */
__le32 i_atime; /* Access time */
__le32 i_ctime; /* Inode change time */
__le32 i_mtime; /* Modification time */
__le32 i_dtime; /* Deletion time */
__le16 i_gid; /* Group ID (low 16 bits) */
__le16 i_links_count; /* Hard link count */
__le32 i_blocks_lo; /* Block count */
__le32 i_flags; /* File flags */
/* ... block pointers and extended attributes ... */
};
The i_links_count field is a 16-bit unsigned integer, meaning a single inode can have up to 65,535 hard links in ext4 (though practical limits may be lower).
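The effective limit varies by filesystem and can be queried at runtime through `pathconf(3)`, exposed by the standard `getconf` utility:

```shell
# Maximum hard links per inode on the filesystem holding the current directory
getconf LINK_MAX .
# ext4 typically reports 65000, somewhat below the 16-bit field's ceiling
```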
| Step | Data Structure Modified | Change Made |
|---|---|---|
| 1. Resolve paths | VFS dentry cache | Locate parent directory inode |
| 2. Validate | Source inode, filesystem superblock | Check same FS, not directory, permissions |
| 3. Create entry | Destination directory data blocks | Insert new entry with source inode number |
| 4. Increment count | Source inode | i_links_count++ |
| 5. Update timestamps | Source inode, destination directory inode | Update ctime and mtime respectively |
| 6. Journal commit | Journal area | Commit all changes atomically |
Modern file systems like ext4 use journaling to ensure link operations are atomic. If a crash occurs mid-operation, the journal replay either completes the operation or rolls it back, preventing inconsistent link counts.
Hard links come with several fundamental restrictions that arise from their design. Understanding these limitations helps you choose when hard links are appropriate.
- **Same file system only** — an inode number is meaningful only within its own file system; inode 1234567 on /dev/sda1 is meaningless on /dev/sda2. Attempting to create a cross-filesystem hard link fails with EXDEV: Invalid cross-device link.
- **No hard links to directories** — the `.` and `..` entries are special cases created by the filesystem itself. User-created cycles would break utilities that traverse directories recursively (find, rm -r, du) and could cause infinite loops.
- **The target must exist** — a hard link is a reference to an existing inode, so linking to a non-existent name fails.
```shell
# Attempting cross-filesystem hard link
echo "test" > /home/user/file.txt
ln /home/user/file.txt /mnt/usb/file_link.txt
# Error: ln: failed to create hard link '/mnt/usb/file_link.txt': Invalid cross-device link

# Attempting to hard link a directory
mkdir my_directory
ln my_directory my_dir_link
# Error: ln: my_directory: hard link not allowed for directory

# Attempting to link a non-existent file
ln does_not_exist.txt new_link.txt
# Error: ln: failed to access 'does_not_exist.txt': No such file or directory

# These restrictions are fundamental, not configuration issues
```

Why directories cannot have hard links:
The prohibition on directory hard links prevents several dangerous scenarios:
Infinite traversal loops — A hard link from /a/b/c to /a would create a cycle that find, ls -R, du, and rm -r would never escape.
Ambiguous parent references — If a directory has multiple parents via hard links, what does .. resolve to? The answer is undefined and breaks fundamental navigation assumptions.
File system damage — Many file system repair tools (fsck) assume the directory structure is a tree. Cycles violate this invariant and could cause data loss during repair.
Inconsistent semantics — Deleting a directory should free all its contents, but what if another hard link still references that directory from elsewhere?
The . and .. entries are created automatically by the filesystem and are carefully managed to maintain tree structure invariants.
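When a directory does need to be reachable from several places, the sanctioned tool is a symbolic link — a separate inode whose content is a path, so it may point at directories and cross filesystems. A short sketch with illustrative paths:

```shell
# Symlinks provide the directory aliasing that hard links forbid
tmp=$(mktemp -d)
mkdir -p "$tmp/proj/data"

ln -s "$tmp/proj/data" "$tmp/data_link"   # -s creates a symlink, not a hard link
ls -ld "$tmp/data_link"                   # shows: data_link -> .../proj/data

touch "$tmp/data_link/file.txt"           # writes land in proj/data
ls "$tmp/proj/data"                       # prints: file.txt

rm -r "$tmp"
```

The trade-off: a symlink breaks if its target is renamed or removed, whereas hard links remain valid as long as the inode lives.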
Despite their limitations, hard links are valuable in several real-world scenarios. Their key advantages are space efficiency and perfect synchronization—changes through one link are immediately visible through all others.
| Use Case | How Hard Links Help | Example |
|---|---|---|
| Incremental backups | Unchanged files are hard linked to previous backup, saving space | rsync --link-dest, Time Machine |
| Build systems | Share object files between build directories without copying | Bazel, distributed builds |
| Package management | Multiple packages referencing the same library file | dpkg, RPM deduplication |
| Safe file replacement | Create new version, then atomically rename over old version | Configuration updates |
| Multi-location access | Same file accessible from multiple directory contexts | Shared data directories |
Deep dive: Incremental backups with hard links
The most significant application of hard links is in incremental backup systems. Consider backing up a 100 GB home directory daily for a year: with full copies, 365 snapshots would consume roughly 36.5 TB.

If only 1 GB changes daily on average, hard link-based backups use about 100 GB for the first full snapshot plus roughly 1 GB per day thereafter — on the order of 460 GB for the entire year, nearly two orders of magnitude less.
The rsync --link-dest option implements this:
# First full backup
rsync -av /home/user/ /backup/2024-01-01/
# Subsequent incremental backups
rsync -av --link-dest=/backup/2024-01-01/ \
    /home/user/ /backup/2024-01-02/
Unchanged files are hard linked to the previous day's backup. Each backup directory appears complete (you can ls and see all files), but unchanged files don't consume additional space. Apple's Time Machine uses this exact technique.
A server with 1 TB of data and 2% daily churn, backed up daily for 90 days, would need 90 TB with full copies but only ~3 TB with hard link-based incrementals. This makes frequent backups practical where they would otherwise be prohibitively expensive.
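The arithmetic behind that estimate fits in a few lines of shell (sizes in GB, integer math):

```shell
# Back-of-envelope for the 90-day, 1 TB, 2% daily churn scenario
days=90
full=$((days * 1024))                # 90 full copies of a 1 TB data set
churn=$((1024 * 2 / 100))            # 2% daily churn = ~20 GB/day
incr=$((1024 + (days - 1) * churn))  # one full copy + daily deltas

echo "full copies: ${full} GB"       # prints: full copies: 92160 GB (~90 TB)
echo "incrementals: ${incr} GB"      # prints: incrementals: 2804 GB (~2.7 TB)
```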
```shell
# Safe atomic file replacement pattern
# Used for configuration files, databases, etc.

# Original config file
cat /etc/myapp/config.json
# { "version": 1, "setting": "old" }

# Create new version in the same directory
cat > /etc/myapp/config.json.new << 'EOF'
{ "version": 2, "setting": "new" }
EOF

# Atomically replace old with new
# mv on the same filesystem is atomic
mv /etc/myapp/config.json.new /etc/myapp/config.json

# If the system crashes between write and mv:
# - The old config is intact
# - The .new file may be incomplete but is ignored
# This ensures config is always in a valid state

# Hard links enable an even safer pattern with rollback:
ln /etc/myapp/config.json /etc/myapp/config.json.bak
# Now we have a backup that doesn't consume extra space
# (a later atomic mv installs a new inode under config.json,
#  so config.json.bak keeps referencing the old version)
```

We've explored hard links from their conceptual foundation to their practical applications. Let's consolidate the key insights:

- A filename is a directory entry; the file itself is an inode plus data blocks
- A hard link is simply another directory entry for the same inode — no data is copied
- All hard links are equal; deleting one name only decrements the link count
- Data is freed only when the link count reaches zero and no process holds the file open
- Hard links cannot cross file systems or point to directories
- They power space-efficient incremental backups and atomic file replacement
What's next:
Now that we understand how hard links work through inode references, we'll examine the link count in greater detail. Understanding link count behavior is crucial for predicting when files are actually freed, and for seeing how utilities like rm and unlink affect an inode's nlink count.
You now understand hard links at both conceptual and implementation levels. You can explain why hard links behave as they do, what restrictions apply, and when to use them effectively. Next, we'll explore the critical role of link count in file lifecycle management.