Loading learning content...
Consider a shared library file used by dozens of programs: /usr/lib/libc.so.6. In a pure tree, this file could only exist in one location. But what if you need the same library accessible from /lib64/libc.so.6 for compatibility? Copy the file? That wastes space and creates synchronization nightmares.
The solution is acyclic-graph directories—an extension of tree directories that allows a file (or directory) to have multiple parents. A single file can appear at multiple paths, enabling sharing without duplication. The 'acyclic' constraint ensures we don't create infinite loops when traversing the structure.
This structure is what enables hard links and symbolic links in UNIX systems. Understanding acyclic-graph directories reveals how file systems balance the flexibility of sharing with the need for consistent operations like deletion and traversal.
By completing this page, you will understand how acyclic-graph directories extend trees for sharing, master hard links and symbolic links implementation, analyze reference counting for deletion, understand the critical importance of the acyclic constraint, and recognize the consistency challenges in graph-structured directories.
Trees have a fundamental limitation: each node has exactly one parent. This means:
An acyclic-graph directory relaxes the single-parent constraint while preserving essential properties:
Tree Property (strict): Acyclic Graph Property:
Each node has ONE parent → Each node can have MULTIPLE parents
One path to each node → Multiple paths to some nodes
No sharing possible → Sharing via links
The Critical Constraint: Acyclicity
Why "acyclic"? Consider what happens if we allowed cycles:
Cycle Example (FORBIDDEN):
/home/alice/link → /home
Path resolution for /home/alice/link/alice/link/alice/link/...
→ Never terminates! Infinite loop!
Cycles would break path resolution, directory traversal, and recursive operations like rm -r. The acyclic constraint ensures:
Formal Definition:
An acyclic-graph directory is a Directed Acyclic Graph (DAG):
G = (V, E)
where:
V = nodes (files and directories)
E = directed edges (parent → child)
∄ cycle: v₁ → v₂ → ... → vₙ → v₁
Key Observations from the Diagram:
Shared Directory: The shared/ directory has TWO parents (alice and bob). Both users see it in their home.
Shared File: libc.so.6 is reachable via /lib/libc.so.6 AND /usr/lib/libc.so.6. One physical file, two paths.
No Cycles: Following edges always moves 'downward'—you can never return to an ancestor by following child links.
Types of Sharing:
| Mechanism | What It Links | Creates | Example |
|---|---|---|---|
| Hard link | File to additional directory entry | Same inode, multiple names | ln file link |
| Symbolic link | Path reference | New inode pointing to path string | ln -s /path/to/target link |
| Directory hard link | (Usually prohibited) | Most systems disallow |
We'll explore each mechanism in depth in the following sections.
Most UNIX systems prohibit hard links to directories (except for the implicit . and .. entries). Reason: It's too easy to create cycles accidentally. If alice/linkdir → /home creates a cycle, the file system is corrupted. Only root can sometimes create directory hard links, and it's generally avoided.
A hard link is a directory entry that points directly to a file's inode—the same inode another entry points to. The file has multiple names, but exists once on disk.
How Hard Links Work:
Initial state after creating 'original.txt':
/home/alice/original.txt → inode #12345 → [data blocks]
(link count: 1)
After: ln /home/alice/original.txt /home/bob/copy.txt
/home/alice/original.txt → inode #12345 → [data blocks]
/home/bob/copy.txt →↗ (link count: 2)
Both names point to the SAME inode. No data copied.
Key Properties:
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194
/* * Hard Links Demonstration * Shows how hard links work at the system call level. */ #include <stdio.h>#include <stdlib.h>#include <string.h>#include <unistd.h>#include <sys/stat.h>#include <fcntl.h>#include <errno.h> /* * Display file information including inode and link count */void show_file_info(const char *path) { struct stat st; if (stat(path, &st) == -1) { printf("%s: %s", path, strerror(errno)); return; } printf("%-30s inode: %-10lu links: %-2lu size: %lu bytes", path, (unsigned long)st.st_ino, (unsigned long)st.st_nlink, (unsigned long)st.st_size);} /* * Create a hard link with explanation */void create_hard_link(const char *target, const char *link_name) { printf("Creating hard link: %s → %s", link_name, target); if (link(target, link_name) == -1) { printf(" ERROR: %s", strerror(errno)); return; } printf(" Success!");} /* * Demonstrate hard link properties */void demonstrate_hard_links(void) { printf("=== Hard Links Demonstration === "); const char *original = "/tmp/original.txt"; const char *link1 = "/tmp/hardlink1.txt"; const char *link2 = "/tmp/hardlink2.txt"; /* Create original file */ printf("Step 1: Create original file"); int fd = open(original, O_CREAT | O_WRONLY | O_TRUNC, 0644); if (fd >= 0) { write(fd, "Hello, World!", 14); close(fd); } show_file_info(original); /* Create first hard link */ printf("Step 2: Create first hard link"); create_hard_link(original, link1); show_file_info(original); show_file_info(link1); /* Create second hard link */ printf("Step 3: Create second hard link"); create_hard_link(original, link2); show_file_info(original); show_file_info(link1); show_file_info(link2); printf(">>> Notice: All three have the SAME inode number!"); printf(">>> Link count is now 3."); /* Modify through one link, visible through all */ printf("Step 4: Modify through link1, read through link2"); fd = open(link1, O_WRONLY | O_APPEND); if (fd >= 0) { write(fd, "Modified via link1", 19); close(fd); } printf("Written to: %s", link1); /* Read through different link */ char buffer[100]; fd = open(link2, O_RDONLY); if (fd >= 0) { ssize_t n = read(fd, buffer, sizeof(buffer) - 1); if (n > 0) { buffer[n] = '\0'; printf("Read from %s:%s", link2, buffer); } close(fd); } printf(">>> Changes through any link affect the same file!"); /* Delete original - other links still work */ printf("Step 5: Delete 'original' file"); unlink(original); printf("Deleted: %s", original); show_file_info(original); /* Will show error */ show_file_info(link1); show_file_info(link2); printf(">>> 'Original' is gone, but file still exists!"); printf(">>> Link count is now 2. Data is NOT deleted."); /* File data deleted only when last link removed */ printf("Step 6: Delete remaining links"); unlink(link1); printf("Deleted: %s (link count now 1)", link1); show_file_info(link2); unlink(link2); printf("Deleted: %s (link count now 0 - FILE DELETED)", link2);} /* * Demonstrate hard link restrictions */void demonstrate_restrictions(void) { printf(" === Hard Link Restrictions === "); /* Cannot link to directory (usually) */ printf("Attempting to hard link a directory:"); if (link("/tmp", "/tmp/link_to_tmp") == -1) { printf(" Failed: %s", strerror(errno)); printf(" (Most systems prohibit directory hard links)"); } /* Cannot cross filesystem boundaries */ printf("Cross-filesystem hard link would fail:"); printf(" ln /home/file /mnt/usb/link"); printf(" → ERROR: Invalid cross-device link"); printf(" (Hard links require same inode table)");} int main() { demonstrate_hard_links(); demonstrate_restrictions(); return 0;}Reference Counting and Deletion:
Hard links use reference counting to manage file lifetime:
Initial: link_count = 1
After ln: link_count = 2
After open(): (open_count++, but inode exists)
After unlink #1: link_count = 1 (still exists)
After unlink #2: link_count = 0
IF open_count = 0 → DELETE file
ELSE → DELETE when last close()
The unlink Semantic:
The unlink() system call:
This allows:
A common pattern for temporary files: create the file, open it, immediately unlink it. The file is now 'anonymous'—no path leads to it. When your process exits (or closes the file), the kernel automatically reclaims the space. No cleanup code needed!
A symbolic link (symlink) is a special file containing a text string interpreted as a path. Unlike hard links, symlinks are indirection—they point to a path, not an inode.
How Symbolic Links Work:
Creating symlink: ln -s /usr/lib/libc.so.6 /lib/libc.so
/lib/libc.so → inode #99999 (symlink)
content: \"/usr/lib/libc.so.6\"
/usr/lib/libc.so.6 → inode #12345 (regular file)
[actual data]
Opening /lib/libc.so:
1. Read inode #99999
2. See it's a symlink
3. Read symlink content: \"/usr/lib/libc.so.6\"
4. Resolve that path
5. Open inode #12345
Key Differences from Hard Links:
| Property | Hard Link | Symbolic Link |
|---|---|---|
| Points to | Inode directly | Path string |
| Own inode? | No (shares target's) | Yes (separate inode) |
| Cross-filesystem | No | Yes |
| Link to directories | Usually prohibited | Allowed |
| Target deleted | Link still works | Becomes 'dangling' |
| Space overhead | Just directory entry | Inode + path string |
| Resolution | Direct | Requires path lookup |
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245
/* * Symbolic Links Demonstration * Shows symlink creation, resolution, and edge cases. */ #include <stdio.h>#include <stdlib.h>#include <string.h>#include <unistd.h>#include <sys/stat.h>#include <fcntl.h>#include <errno.h>#include <limits.h> /* * Display detailed info for file or symlink */void show_link_info(const char *path) { struct stat st; char target[PATH_MAX]; ssize_t len; /* lstat: get info about link itself, not target */ if (lstat(path, &st) == -1) { printf("%s: %s", path, strerror(errno)); return; } printf("%-30s ", path); if (S_ISLNK(st.st_mode)) { /* Read symlink target */ len = readlink(path, target, sizeof(target) - 1); if (len != -1) { target[len] = '\0'; printf("[SYMLINK] → %s", target); /* Check if target exists */ if (access(path, F_OK) == -1) { printf(" (DANGLING!)"); } } } else if (S_ISREG(st.st_mode)) { printf("[FILE] inode: %lu, links: %lu", (unsigned long)st.st_ino, (unsigned long)st.st_nlink); } else if (S_ISDIR(st.st_mode)) { printf("[DIR]"); } printf("");} /* * Demonstrate symlink creation and resolution */void demonstrate_symlinks(void) { printf("=== Symbolic Links Demonstration === "); const char *target = "/tmp/target_file.txt"; const char *symlink_path = "/tmp/my_symlink"; const char *relative_link = "/tmp/relative_link"; /* Create target file */ printf("Step 1: Create target file"); int fd = open(target, O_CREAT | O_WRONLY, 0644); if (fd >= 0) { write(fd, "Original content", 17); close(fd); } show_link_info(target); /* Create absolute symlink */ printf("Step 2: Create absolute symlink"); if (symlink(target, symlink_path) == 0) { printf("Created: %s → %s", symlink_path, target); } show_link_info(symlink_path); /* Create relative symlink */ printf("Step 3: Create relative symlink"); if (symlink("target_file.txt", relative_link) == 0) { printf("Created: %s → target_file.txt (relative)", relative_link); } show_link_info(relative_link); /* Access through symlink */ printf("Step 4: Access file through symlink"); char buffer[100]; fd = open(symlink_path, O_RDONLY); if (fd >= 0) { ssize_t n = read(fd, buffer, sizeof(buffer) - 1); if (n > 0) { buffer[n] = '\0'; printf("Read via symlink: %s", buffer); } close(fd); } /* stat vs lstat */ printf("Step 5: stat() vs lstat() difference"); struct stat st; stat(symlink_path, &st); /* Follows symlink */ printf("stat(%s): inode %lu (TARGET'S inode)", symlink_path, (unsigned long)st.st_ino); lstat(symlink_path, &st); /* Link itself */ printf("lstat(%s): inode %lu (SYMLINK'S inode)", symlink_path, (unsigned long)st.st_ino); /* Delete target - dangling link */ printf("Step 6: Delete target - symlink becomes dangling"); unlink(target); show_link_info(target); show_link_info(symlink_path); printf(">>> Symlink still exists but points to nothing!"); printf(">>> Opening it will fail with ENOENT."); fd = open(symlink_path, O_RDONLY); if (fd == -1) { printf("open(%s): %s", symlink_path, strerror(errno)); } /* Cleanup */ unlink(symlink_path); unlink(relative_link);} /* * Demonstrate symlink to directory */void demonstrate_directory_symlinks(void) { printf(" === Directory Symlinks === "); /* Create directory structure */ mkdir("/tmp/real_dir", 0755); creat("/tmp/real_dir/file.txt", 0644); /* Create symlink to directory */ printf("Creating symlink to directory:"); if (symlink("/tmp/real_dir", "/tmp/dir_link") == 0) { printf(" /tmp/dir_link → /tmp/real_dir"); } /* Access file through directory symlink */ printf("Accessing file via directory symlink:"); show_link_info("/tmp/dir_link/file.txt"); /* cd into symlink */ printf("The path /tmp/dir_link/file.txt works!"); printf("The symlink transparently redirects to /tmp/real_dir/"); /* Cleanup */ unlink("/tmp/real_dir/file.txt"); rmdir("/tmp/real_dir"); unlink("/tmp/dir_link");} /* * Demonstrate symlink loops (avoided by kernel) */void demonstrate_loop_protection(void) { printf(" === Symlink Loop Protection === "); /* Create circular symlinks */ symlink("/tmp/link_b", "/tmp/link_a"); symlink("/tmp/link_a", "/tmp/link_b"); printf("Created circular symlinks:"); printf(" /tmp/link_a → /tmp/link_b"); printf(" /tmp/link_b → /tmp/link_a "); /* Try to open - kernel detects loop */ printf("Attempting to open /tmp/link_a:"); int fd = open("/tmp/link_a", O_RDONLY); if (fd == -1) { printf(" Failed: %s", strerror(errno)); printf(" (ELOOP: Too many symbolic links)"); } printf(">>> Kernel counted symlink traversals and stopped!"); printf(">>> Typical limit: 40 symlink traversals (SYMLOOP_MAX)."); /* Cleanup */ unlink("/tmp/link_a"); unlink("/tmp/link_b");} int main() { demonstrate_symlinks(); demonstrate_directory_symlinks(); demonstrate_loop_protection(); return 0;}Absolute vs Relative Symlinks:
| Type | Example | When Target Moves |
|---|---|---|
| Absolute | /lib/libc → /usr/lib/libc.so.6 | Breaks if moved to different tree |
| Relative | lib/libc → ../usr/lib/libc.so.6 | Works if relative positions preserved |
Relative symlinks are often preferred for software packages, allowing entire directory trees to be relocated.
The Dangling Link Problem:
Unlike hard links (which prevent target deletion), symlinks can become 'dangling'—pointing to non-existent targets:
$ ln -s /tmp/important important_link
$ rm /tmp/important
$ cat important_link
cat: important_link: No such file or directory
This is by design—symlinks are 'weak references' that don't prevent deletion.
Symlinks can be security risks. A malicious user might create /tmp/symlink → /etc/passwd, tricking privileged programs into modifying sensitive files. The O_NOFOLLOW flag and careful path validation are essential for security-sensitive code.
The 'acyclic' part of acyclic-graph directories is crucial. Cycles break fundamental file system operations.
Why Cycles Are Dangerous:
1. Infinite Path Resolution
If: /a/b/link → /a
Resolving: /a/b/link/b/link/b/link/...
→ Never terminates!
→ System hangs or crashes
2. Recursive Operations
rm -r /a (with cycle /a/b/link → /a)
→ rm deletes /a/b/link → tries to delete /a → /a/b/link → /a ...
→ Infinite loop or inconsistent state
3. Traversal/Find Operations
find /a -name '*.txt'
→ With cycles, same directories revisited infinitely
How Systems Prevent Cycles:
| Mechanism | Applied To | Effect |
|---|---|---|
| Prohibit directory hard links | Hard links | Cannot create cycles directly |
| Symlink loop detection | Path resolution | Kernel counts traversals, errors at limit |
| Mount point checks | Mounts | Prevent mounting tree inside itself |
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184
/* * Cycle Detection and Prevention * Demonstrates why cycles are dangerous and how systems prevent them. */ #include <stdio.h>#include <stdlib.h>#include <string.h>#include <unistd.h>#include <errno.h>#include <sys/stat.h>#include <limits.h> /* * Attempt to create directory hard link (usually fails) * This is the primary cycle prevention mechanism. */void try_directory_hardlink(void) { printf("=== Attempting Directory Hard Link === "); mkdir("/tmp/cycle_test", 0755); printf("Trying: link(/tmp/cycle_test, /tmp/cycle_test/self)"); if (link("/tmp/cycle_test", "/tmp/cycle_test/self") == -1) { printf("Failed: %s ", strerror(errno)); printf(">>> EPERM: Most systems prohibit directory hard links!"); printf(">>> This is PRIMARY cycle prevention."); } else { printf("Success (unusual - check your system!)"); unlink("/tmp/cycle_test/self"); } rmdir("/tmp/cycle_test");} /* * Demonstrate symlink loop detection (ELOOP) */void demonstrate_symloop(void) { printf(" === Symlink Loop Detection === "); mkdir("/tmp/loop_test", 0755); /* Create chain of symlinks */ printf("Creating chain: a → b → c → d → e → a"); symlink("/tmp/loop_test/b", "/tmp/loop_test/a"); symlink("/tmp/loop_test/c", "/tmp/loop_test/b"); symlink("/tmp/loop_test/d", "/tmp/loop_test/c"); symlink("/tmp/loop_test/e", "/tmp/loop_test/d"); symlink("/tmp/loop_test/a", "/tmp/loop_test/e"); /* Completes cycle */ /* Try to access */ printf("Attempting to open /tmp/loop_test/a:"); int fd = open("/tmp/loop_test/a/file", O_RDONLY); if (fd == -1 && errno == ELOOP) { printf(" Error: ELOOP (too many symbolic links) "); printf(">>> Kernel detected the cycle!"); printf(">>> Resolution stopped after ~40 symlink traversals."); } /* Show how find handles cycles */ printf("The 'find' command would detect similar cycles via inode tracking."); /* Cleanup */ unlink("/tmp/loop_test/a"); unlink("/tmp/loop_test/b"); unlink("/tmp/loop_test/c"); unlink("/tmp/loop_test/d"); unlink("/tmp/loop_test/e"); rmdir("/tmp/loop_test");} /* * Simulate safe recursive traversal with cycle detection */#define MAX_VISITED 100 typedef struct { dev_t dev; ino_t ino;} InodeRef; typedef struct { InodeRef visited[MAX_VISITED]; int count;} VisitedSet; int already_visited(VisitedSet *set, dev_t dev, ino_t ino) { for (int i = 0; i < set->count; i++) { if (set->visited[i].dev == dev && set->visited[i].ino == ino) { return 1; /* Already visited! Cycle detected */ } } return 0;} void mark_visited(VisitedSet *set, dev_t dev, ino_t ino) { if (set->count < MAX_VISITED) { set->visited[set->count].dev = dev; set->visited[set->count].ino = ino; set->count++; }} /* * Safe directory traversal with cycle detection */void safe_traverse(const char *path, int depth, VisitedSet *visited) { struct stat st; if (lstat(path, &st) == -1) return; if (!S_ISDIR(st.st_mode)) return; /* Check for cycle */ if (already_visited(visited, st.st_dev, st.st_ino)) { printf("%*s[CYCLE DETECTED: already visited inode %lu]", depth * 2, "", (unsigned long)st.st_ino); return; } mark_visited(visited, st.st_dev, st.st_ino); /* Print current directory */ for (int i = 0; i < depth; i++) printf(" "); printf("📂 %s (inode %lu)", path, (unsigned long)st.st_ino); /* Traverse contents... (simplified) */} void demonstrate_safe_traversal(void) { printf(" === Safe Recursive Traversal === "); printf("To safely traverse directories with potential cycles: "); printf("1. Track visited (dev, inode) pairs"); printf("2. Before descending, check if already visited"); printf("3. If visited, skip (cycle!) otherwise add to set "); printf("This is how 'find', 'du', and 'tar' handle cycles."); printf("The -xdev flag prevents crossing filesystems (different dev_t).");} int main() { try_directory_hardlink(); demonstrate_symloop(); demonstrate_safe_traversal(); return 0;}The Linux Symlink Limit (SYMLOOP_MAX):
Linux limits symlink traversals during path resolution:
Total symlinks in path: max 40 (typical)
Consecutive symlinks: varies by kernel version
Exceeding limit → ELOOP error
This prevents denial-of-service and infinite loops while allowing legitimate symlink chains.
POSIX Requirements:
POSIX specifies:
SYMLOOP_MAX: Minimum 8 (Linux typically uses 40)Every directory has '.' (self) and '..' (parent) entries—these ARE directory hard links! How does this not create cycles? The key insight: '.' and '..' are special cases managed by the kernel, not user-creatable hard links. The kernel ensures these never create actual cycles in the graph structure.
Acyclic-graph directories introduce consistency challenges not present in pure trees. When a file has multiple paths, coordinating operations becomes complex.
Challenge 1: Multiple Name Problem
With hard links, a file has multiple names
/home/alice/data.csv
/home/bob/data.csv
/archive/old_data.csv
All three paths refer to the same file. Questions arise:
ls -la report?Challenge 2: Directory Parent Ambiguity (if allowed)
If directory hard links were allowed:
/home/shared/ has parents: /home/alice/, /home/bob/
What is the result of:
cd /home/shared
cd .. ← Goes to alice/ or bob/?
pwd ← Shows which path?
Most systems answer by:
Challenge 3: Disk Space Accounting
Withhard-linked files:
/home/alice/big.dat (hard linked)
/home/bob/big.dat
File size: 1 GB
Disk usage:
Total space used: 1 GB (only one copy exists)
alice quota used: ???
bob quota used: ???
Different systems handle this differently:
| Challenge | Tree Behavior | Graph Behavior | Typical Solution |
|---|---|---|---|
| File deletion | One unlink removes file | Must track link count | Reference counting |
| Ownership | Owner clear from path | Multiple 'locations' | inode stores owner |
| Quota accounting | Simple directory based | Shared files complex | Charge inode owner |
| Traversal | Each node visited once | Might revisit shared nodes | Track visited inodes |
| Backup/copy | Simple tree walk | Avoid duplicating shared | Detect shared inodes |
Challenge 4: Backup and Archive Integrity
When backing up a directory tree with hard links:
$ tar czf backup.tar.gz /home
Should the archive:
Most tools detect hard links by tracking inodes and store data once with multiple directory entries.
Challenge 5: Symbolic Link Resolution Timing
Symlinks are resolved at access time, not creation time:
$ ln -s config.prod config
$ cat config # Reads config.prod
$ rm config.prod
$ ln -s config.dev config.prod # Replace target
$ cat config # Now reads config.dev's content!
This is powerful (late binding) but can confuse programs expecting stable targets.
Time-Of-Check-To-Time-Of-Use: A symlink can change between when you check it and when you use it. Evil attackers exploit this: 'if target is safe file, open target' → attacker swaps symlink after check → program opens /etc/passwd. Use O_NOFOLLOW and careful coding!
We've explored acyclic-graph directories—the extension of tree structures that enables file sharing through links. This powerful capability comes with complexity that file systems must carefully manage.
You now understand how acyclic-graph directories enable file sharing while maintaining the essential properties needed for correct file system operation. Hard links and symbolic links are fundamental UNIX concepts you'll encounter constantly. Next, we'll explore general graph directories—what happens when cycles are allowed, and how some advanced systems handle this.
What's Next:
In the final page of this module, we'll explore general graph directories—structures that allow or must handle cycles. We'll examine garbage collection approaches, historical systems that experimented with cycles, and why acyclic remains the dominant design despite theoretical expressiveness of general graphs.