What if we removed the acyclic constraint entirely? What if directories could point to their ancestors, creating true cycles? This isn't merely a theoretical question—some systems have explored general graph directories, and understanding why they remain rare illuminates fundamental trade-offs in file system design.
A general graph directory allows arbitrary connections between nodes, including cycles. This provides maximum flexibility but introduces significant complexity: deletion becomes nuanced, traversal requires cycle detection, and the entire system must implement garbage collection rather than simple reference counting.
In this concluding page of the module, we explore the general graph directory model: its theoretical expressiveness, the garbage collection problem, historical systems that attempted it, reasons for the dominance of acyclic designs, and how these concepts influence modern distributed and version control systems.
By completing this page, you will understand why general graphs are more expressive than DAGs, analyze why reference counting fails with cycles, master garbage collection approaches for graph directories, examine historical and modern systems using graph concepts, and synthesize understanding of the directory structure evolution.
A general graph directory removes the acyclic constraint, allowing arbitrary directed edges between nodes. This creates a more general structure than the DAG (Directed Acyclic Graph) of standard file systems.
Formal Definition:
G = (V, E)
where:
V = nodes (files and directories)
E = directed edges (links between nodes)
Cycles ALLOWED: ∃ paths v₁ → v₂ → ... → vₙ → v₁
Comparison of Directory Models:
| Model | Edges | Multiple Parents | Cycles | Reachability |
|---|---|---|---|---|
| Tree | n-1 | No | No | Unique path to each node |
| DAG (Acyclic) | ≥n-1 | Yes | No | Multiple paths, finite |
| General Graph | Any | Yes | Yes | May have infinite paths |
What Cycles Enable:
Example Cyclic Structure:
/shared/
├── project_a/
│ ├── lib → /shared/common/
│ └── code/
├── project_b/
│ ├── lib → /shared/common/
│ └── code/
└── common/
├── utils/
└── projects → /shared/ ← CYCLE!
(common links back to shared)
Observations from the Diagram:
Multiple Paths: utils/ is reachable via /shared/common/utils AND /shared/project_a/lib/utils (since lib links to common)
Cycles Present: /shared → common → projects → /shared forms a cycle.
Infinite Paths: The path /shared/common/projects/common/projects/... continues forever!
Shared Access: common/ is reachable from project_a, project_b, and /shared itself.
Theoretical Expressiveness:
General graphs can model relationships that DAGs cannot, such as mutual references between directories and structures that effectively contain their own ancestors.
While acyclic structures dominate production file systems, general graph concepts appear in version control (Git's commit history is a DAG in which merge commits have multiple parents), distributed file systems, and object storage systems. Understanding graph directories prepares you for these advanced systems.
Acyclic directories use reference counting for deletion: each inode tracks how many directory entries point to it. When the count reaches zero, the file is deleted. This elegant approach fails completely with cycles.
Why Reference Counting Fails with Cycles:
Consider a simple cycle:
dir_a (link_count: 2) ← referenced by root and dir_b
└── link_to_b → dir_b
dir_b (link_count: 1) ← referenced by dir_a
└── link_to_a → dir_a
Now, remove dir_a from the root directory:
Before: root → dir_a
After: root ⊗ (dir_a removed from root)
dir_a.link_count: 2 → 1 (still > 0!)
dir_b.link_count: 1 (unchanged)
Both directories still exist because they reference each other!
But they're unreachable from root!
This is the cyclic garbage problem—nodes with non-zero reference counts that can never be reached from the root.
| Scenario | DAG Behavior | Cyclic Graph Behavior |
|---|---|---|
| Delete last link to file | link_count=0, file deleted | link_count=0, file deleted |
| Delete link, others exist | link_count--, file remains | link_count--, file remains |
| Cycle with no external refs | Impossible (no cycles) | All nodes have count>0, LEAKED! |
| Storage reclamation | Automatic via count=0 | Requires garbage collection |
```c
/*
 * Demonstration of Reference Counting Failure with Cycles
 * Shows why garbage collection is needed for general graph directories.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>

#define MAX_LINKS 8
#define MAX_NAME 32

/* Simulated file/directory node */
typedef struct Node {
    char name[MAX_NAME];
    int link_count;                 /* Reference count */
    struct Node *links[MAX_LINKS];  /* Outgoing links (children) */
    int num_links;
    bool marked;                    /* For garbage collection */
} Node;

/* All nodes in the system */
#define MAX_NODES 100
Node *all_nodes[MAX_NODES];
int node_count = 0;

/* Root of file system */
Node *root = NULL;

/* Create a new node */
Node* create_node(const char *name) {
    Node *n = malloc(sizeof(Node));
    strncpy(n->name, name, MAX_NAME - 1);
    n->name[MAX_NAME - 1] = '\0';   /* strncpy does not guarantee termination */
    n->link_count = 0;
    n->num_links = 0;
    n->marked = false;
    all_nodes[node_count++] = n;
    return n;
}

/* Add a link from parent to child */
void add_link(Node *parent, Node *child) {
    if (parent->num_links >= MAX_LINKS) return;
    parent->links[parent->num_links++] = child;
    child->link_count++;            /* Increment reference count */
    printf("Added link: %s → %s (link_count now %d)\n",
           parent->name, child->name, child->link_count);
}

/* Remove a link from parent to child */
void remove_link(Node *parent, Node *child) {
    for (int i = 0; i < parent->num_links; i++) {
        if (parent->links[i] == child) {
            /* Shift remaining links */
            for (int j = i; j < parent->num_links - 1; j++) {
                parent->links[j] = parent->links[j + 1];
            }
            parent->num_links--;
            child->link_count--;
            printf("Removed link: %s → %s (link_count now %d)\n",
                   parent->name, child->name, child->link_count);
            /* Traditional ref counting: delete if count = 0 */
            if (child->link_count == 0) {
                printf("  → %s would be deleted (ref count = 0)\n",
                       child->name);
            }
            return;
        }
    }
}

/* Print current state */
void print_state(void) {
    printf("--- Current State ---\n");
    for (int i = 0; i < node_count; i++) {
        printf("  %s: link_count=%d, links_out=%d\n",
               all_nodes[i]->name,
               all_nodes[i]->link_count,
               all_nodes[i]->num_links);
    }
    printf("---------------------\n");
}

/* Demonstrate the cyclic garbage problem */
void demonstrate_cyclic_garbage(void) {
    printf("=== Cyclic Garbage Demonstration ===\n");

    /* Create structure */
    root = create_node("root");
    Node *dir_a = create_node("dir_a");
    Node *dir_b = create_node("dir_b");

    /* Build initial tree */
    printf("\nStep 1: Build structure\n");
    add_link(root, dir_a);   /* root → dir_a */
    add_link(root, dir_b);   /* root → dir_b */
    add_link(dir_a, dir_b);  /* dir_a → dir_b (sharing) */
    add_link(dir_b, dir_a);  /* dir_b → dir_a (CYCLE!) */
    print_state();
    printf("Note: dir_a and dir_b form a cycle.\n");

    /* Remove from root */
    printf("\nStep 2: Remove dir_a from root\n");
    remove_link(root, dir_a);
    print_state();
    printf("dir_a still has link_count=1 (from dir_b)\n");
    printf("dir_a is NOT deleted; dir_b still references it\n");

    printf("\nStep 3: Remove dir_b from root\n");
    remove_link(root, dir_b);
    print_state();
    printf("PROBLEM: Both nodes have link_count > 0\n");
    printf("         But neither is reachable from root!\n");
    printf("         This is CYCLIC GARBAGE.\n");
    printf("         Reference counting CANNOT reclaim this memory.\n");
}

/* Mark phase: mark all reachable nodes from root */
void mark_reachable(Node *node) {
    if (node == NULL || node->marked) return;
    node->marked = true;
    for (int i = 0; i < node->num_links; i++) {
        mark_reachable(node->links[i]);
    }
}

/* Sweep phase: report all unmarked nodes as garbage */
void sweep_unreachable(void) {
    printf("\nGarbage Collection Results:\n");
    for (int i = 0; i < node_count; i++) {
        if (!all_nodes[i]->marked) {
            printf("  GARBAGE: %s (unreachable, will be deleted)\n",
                   all_nodes[i]->name);
        } else {
            printf("  LIVE: %s (reachable from root)\n",
                   all_nodes[i]->name);
        }
    }
}

/* Demonstrate the garbage collection solution */
void demonstrate_garbage_collection(void) {
    printf("\n=== Garbage Collection Solution ===\n");
    printf("Running Mark-and-Sweep from root:\n");

    /* Reset marks */
    for (int i = 0; i < node_count; i++) {
        all_nodes[i]->marked = false;
    }

    /* Mark phase */
    printf("Mark phase: traversing from root...\n");
    mark_reachable(root);

    /* Sweep phase */
    sweep_unreachable();
    printf("\nGarbage collection correctly identified the cycle!\n");
}

int main(void) {
    demonstrate_cyclic_garbage();
    demonstrate_garbage_collection();
    return 0;
}
```

Cyclic garbage in a file system means disk space is permanently lost—files exist on disk, consuming space, but can never be accessed or freed through normal operations. This is a catastrophic failure mode, which is why production file systems enforce acyclicity.
To handle cycles, general graph directories require garbage collection (GC)—algorithms that determine which nodes are reachable and reclaim unreachable ones.
1. Mark-and-Sweep
The classic garbage collection algorithm:
Mark Phase:
Start from root
Recursively mark all reachable nodes
Sweep Phase:
Scan all nodes
Delete unmarked nodes (garbage)
Reset marks for next collection
Complexity: O(|V| + |E|) - visits all nodes and edges
Challenges for File Systems:
2. Tri-Color Marking (Incremental GC)
For systems that can't stop for full GC:
Colors:
White: unvisited (potentially garbage)
Gray: visited but children not processed
Black: visited and all children processed
Algorithm:
1. All nodes start white
2. Root becomes gray
3. Pick gray node, process children (gray them), make current black
4. Repeat until no gray nodes
5. White nodes are garbage
Can be interrupted and resumed!
3. Backup Pointer Chain (for specific cycle patterns)
If cycles have predictable structure:
For each cycle-capable link, maintain a "back-pointer"
When checking deletion, follow back-pointers to detect cycles
Cycles can be explicitly broken when detected
Limitation: Only works for specific, anticipated cycle patterns
| Approach | Stop-the-World? | Complexity | Space Overhead | File System Suitability |
|---|---|---|---|---|
| Mark-and-Sweep | Yes | O(V+E) | O(V) for marks | Poor (long pauses) |
| Tri-Color | No (incremental) | O(V+E) amortized | O(V) for colors | Better (no pauses) |
| Reference Counting + Cycle Detection | Periodic | O(V+E) for detection | Minimal | Common hybrid |
| Epoch-based | No | O(modified) | Per-epoch tracking | Good for versioning |
```c
/*
 * Mark-and-Sweep Garbage Collection for Graph Directories
 * Demonstrates how GC would work in a general graph file system.
 */
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CHILDREN 10
#define MAX_NODES 1000

/* Node states for tri-color marking */
typedef enum {
    WHITE,  /* Unvisited - potentially garbage */
    GRAY,   /* Visited but children not processed */
    BLACK   /* Fully processed - definitely reachable */
} NodeColor;

typedef struct FSNode {
    int id;
    char name[32];
    struct FSNode *children[MAX_CHILDREN];
    int child_count;
    NodeColor color;
    bool deleted;
} FSNode;

/* Global node table (simulating disk) */
FSNode *node_table[MAX_NODES];
int next_id = 0;

/* Root of file system */
FSNode *fs_root = NULL;

/* Create a node */
FSNode* create_fs_node(const char *name) {
    FSNode *node = malloc(sizeof(FSNode));
    node->id = next_id;
    strncpy(node->name, name, 31);
    node->name[31] = '\0';          /* ensure termination */
    node->child_count = 0;
    node->color = WHITE;
    node->deleted = false;
    node_table[next_id++] = node;
    return node;
}

/* Link nodes */
void link_nodes(FSNode *parent, FSNode *child) {
    if (parent->child_count < MAX_CHILDREN) {
        parent->children[parent->child_count++] = child;
    }
}

/* Phase 1: Reset all nodes to white */
void gc_reset(void) {
    for (int i = 0; i < next_id; i++) {
        if (node_table[i] && !node_table[i]->deleted) {
            node_table[i]->color = WHITE;
        }
    }
}

/*
 * Phase 2: Mark - traverse from root, coloring reachable nodes.
 * Uses an iterative approach with an explicit gray set.
 */
void gc_mark(void) {
    FSNode *gray_set[MAX_NODES];
    int gray_count = 0;

    if (!fs_root || fs_root->deleted) return;

    /* Start with root */
    fs_root->color = GRAY;
    gray_set[gray_count++] = fs_root;

    while (gray_count > 0) {
        /* Pop a gray node */
        FSNode *current = gray_set[--gray_count];
        printf("  Visiting: %s (id=%d)\n", current->name, current->id);

        /* Process children */
        for (int i = 0; i < current->child_count; i++) {
            FSNode *child = current->children[i];
            if (child && !child->deleted && child->color == WHITE) {
                child->color = GRAY;
                gray_set[gray_count++] = child;
            }
        }

        /* Current is now fully processed */
        current->color = BLACK;
    }
}

/* Phase 3: Sweep - reclaim white (unreachable) nodes */
int gc_sweep(void) {
    int collected = 0;
    for (int i = 0; i < next_id; i++) {
        FSNode *node = node_table[i];
        if (node && !node->deleted && node->color == WHITE) {
            printf("  Collecting: %s (id=%d) - GARBAGE\n",
                   node->name, node->id);
            node->deleted = true;   /* Mark as deleted */
            collected++;
        }
    }
    return collected;
}

/* Full garbage collection cycle */
int garbage_collect(void) {
    printf("===== Running Garbage Collection =====\n");
    printf("Phase 1: Reset colors\n");
    gc_reset();
    printf("Phase 2: Mark reachable nodes\n");
    gc_mark();
    printf("Phase 3: Sweep unreachable nodes\n");
    int collected = gc_sweep();
    printf("===== GC Complete: %d nodes collected =====\n\n", collected);
    return collected;
}

/* Print file system state */
void print_filesystem(void) {
    printf("File System State:\n");
    printf("------------------\n");
    for (int i = 0; i < next_id; i++) {
        FSNode *n = node_table[i];
        if (n && !n->deleted) {
            printf("  [%d] %s: %d children\n", n->id, n->name, n->child_count);
        }
    }
    printf("\n");
}

/* Demonstrate GC with cycles */
void demonstrate_gc(void) {
    printf("=== General Graph Directory with Garbage Collection ===\n\n");

    /* Create file system structure */
    fs_root = create_fs_node("root");
    FSNode *home = create_fs_node("home");
    FSNode *alice = create_fs_node("alice");
    FSNode *bob = create_fs_node("bob");
    FSNode *shared = create_fs_node("shared");  /* Will become garbage */
    FSNode *cycle1 = create_fs_node("cycle1");  /* Part of cycle */
    FSNode *cycle2 = create_fs_node("cycle2");  /* Part of cycle */

    /* Build tree portion */
    link_nodes(fs_root, home);
    link_nodes(home, alice);
    link_nodes(home, bob);

    /* Build shared structure (reachable) */
    link_nodes(alice, shared);
    link_nodes(bob, shared);

    /* Build cycle not connected to root */
    link_nodes(cycle1, cycle2);
    link_nodes(cycle2, cycle1);  /* Creates cycle! */

    printf("Initial structure:\n");
    printf("  root → home → alice → shared\n");
    printf("              ↘ bob → shared\n");
    printf("\n");
    printf("  cycle1 ⟷ cycle2 (disconnected cycle - GARBAGE!)\n\n");
    print_filesystem();

    /* First GC - should collect the cycle */
    printf("Running GC to find the disconnected cycle...\n");
    garbage_collect();
    print_filesystem();

    /* Now simulate removing shared from the tree */
    printf("Simulating removal: alice no longer links to shared\n");
    printf("                    bob no longer links to shared\n");
    alice->child_count = 0;  /* Remove alice → shared */
    bob->child_count = 0;    /* Remove bob → shared */

    printf("Now 'shared' is isolated (no incoming links)...\n");
    garbage_collect();
    print_filesystem();
}

int main(void) {
    demonstrate_gc();
    return 0;
}
```

While traditional file systems avoid GC, modern storage systems like log-structured file systems, SSDs, and distributed storage all use garbage collection for space reclamation. The principles learned here apply directly to these systems.
While strict general graph directories are rare in production file systems, the concepts appear in various historical and modern systems.
Historical Systems:
1. Multics (1965-2000)
Multics allowed symbolic links that could create apparent cycles. It handled this by treating links as stored pathnames resolved at lookup time rather than as structural graph edges, so the underlying storage hierarchy remained a tree.
2. TENEX/TOPS-20 (1969-1990s)
PDP-10 systems allowed more flexible linking. The OS tracked links and performed periodic cleanup operations.
3. Research Systems
Academic research also explored general graph file systems, typically pairing them with garbage collection, but the prototypes remained experimental and never displaced acyclic designs in production.
Modern Systems with Graph Characteristics:
| System | Graph Feature | Cycle Handling | Use Case |
|---|---|---|---|
| Git | Commit DAG (merges) | DAG enforced (acyclic) | Version control |
| ZFS Snapshots | Dataset relationships | Controlled hierarchy | Copy-on-write storage |
| Btrfs Subvolumes | Multiple mount points | Namespace isolation | Flexible storage |
| Docker Layers | Layer inheritance | DAG of layers | Container images |
| IPFS | Content-addressed DAG | Hash-based (acyclic) | Distributed storage |
| Google File System | Chunk replication | Metadata separation | Distributed storage |
Git as a DAG Example:
Git's commit history is a DAG, not a tree:
A ← B ← D
 ↖     ↙
   C
Commit D has two parents (B and C) from a merge. This is a multi-parent relationship (DAG), but there are no cycles—you can never arrive at a commit by following its parent chain.
Git uses garbage collection to clean up unreachable commits:
$ git gc # Runs mark-and-sweep style collection
Object Storage Systems:
Amazon S3, Google Cloud Storage, and similar systems use flat namespaces (essentially single-level with prefixes) because flat key spaces shard cleanly across many servers, avoid cross-directory consistency operations such as atomic renames, and reduce every lookup to a single key resolution.
This is a return to simplicity—choosing flat structure for distributed scale.
Modern cloud storage often resembles single-level directories! S3 buckets contain objects with key names like 'folder/subfolder/file.txt'—but those '/' separators are just characters in a flat namespace. We've gone from single-level → two-level → tree → DAG → back to flat for massive scale.
Given the expressiveness of general graphs, why do acyclic structures dominate production file systems? The answer involves engineering trade-offs, user expectations, and fundamental complexity management.
1. Simplicity of Deletion
DAG/Tree: unlink() → decrement count → maybe delete
Graph: unlink() → decrement count → ???
(can't delete until GC runs)
(GC is expensive and complex)
Reference counting is vastly simpler and more predictable than garbage collection.
2. Predictable Traversal
find /home -name '*.txt'
DAG: Visits each file once (track visited inodes)
Graph: Must detect cycles, may visit nodes multiple times on different paths
Every file system utility would need cycle detection, complicating implementation and slowing operation.
3. User Mental Model
Users understand hierarchies intuitively: folders contain things, and going 'up' always moves toward the root.
Cycles break this mental model: following 'down' links can return you to where you started, and a directory can effectively contain one of its own ancestors.
4. Implementation Complexity
| Operation | Tree/DAG Complexity | General Graph Complexity |
|---|---|---|
| Create file | O(path length) | O(path length) |
| Delete file | O(1) after path resolution | O(1) + periodic O(V+E) GC |
| List directory | O(entries in dir) | O(entries in dir) |
| Recursive delete | O(subtree size) | O(subtree) + cycle detection |
| Free space calculation | Simple: sum free blocks | Complex: GC may reclaim more |
| Backup | Tree walk | Complex: shared node handling |
| fsck (repair) | Tree consistency check | Full graph analysis required |
5. Recovery and Reliability
File system recovery after crashes depends on consistent structures:
Tree: If parent pointer is consistent, subtree is recoverable
Graph: Broken links might be cycles or legitimate sharing
Recovery tool can't easily distinguish
The fsck tool assumes tree/DAG structure. Graph structures would require fundamentally different recovery approaches.
6. Quotas and Accounting
With cycles, accounting questions become complex: who is charged for the space in a cycle shared between users, and how do you total a user's usage when their directories can contain each other?
The Engineering Verdict:
The marginal expressiveness gained from allowing cycles doesn't justify the complexity cost. DAGs provide enough flexibility (hard links, symlinks) for real use cases while maintaining manageable complexity.
This is a recurring pattern in systems design: simpler abstractions with slightly less expressiveness often win over theoretically more powerful but complex alternatives. Trees and DAGs hit a sweet spot for file system organization.
We've completed our journey through directory structure evolution—from the simplest single-level directories to the theoretical extreme of general graphs. Let's synthesize the key insights from this entire module.
The Evolution Pattern:
Single-Level → Two-Level → Tree → DAG

Single-Level: all files jumbled together
Two-Level: user isolation solves inter-user naming
Tree: arbitrary hierarchy solves intra-user organization
DAG: sharing via links
Each step solved specific limitations of the previous model while adding just enough complexity to be worthwhile.
The General Graph Lesson:
DAG → General Graph
│
Adds: True cycles possible
Costs: GC required, complexity explodes
Benefit: Marginal additional expressiveness
Verdict: Not worth it for file systems
This teaches a broader engineering lesson: more expressive isn't always better. The 'best' design is often one that provides just enough capability for real needs while minimizing complexity.
You now have a comprehensive understanding of directory structure design—from historical evolution to modern implementations to theoretical limits. This knowledge applies whenever you work with file systems, design storage hierarchies, or architect systems requiring hierarchical organization. You can now appreciate why your file manager looks the way it does and why 'mkdir -p' works while 'cyclic directory links' don't exist in your daily computing.