Operating SystemsProcess Creation & Termination

Process Creation

LevelIntermediate

Duration60 mins

TopicProcess Creation & Termination

3 / 5

Resource Sharing Options

The Resource Sharing Spectrum

When an operating system creates a child process, one of the most consequential design decisions is how resources are shared between parent and child. This choice sits at the intersection of performance, isolation, security, and programming model—and different operating systems make fundamentally different tradeoffs.

At one extreme lies complete sharing: parent and child access the same memory, file descriptors, and identity. At the other extreme lies complete isolation: child receives entirely independent copies of everything. Most practical systems occupy a nuanced middle ground, sharing some resources while copying or isolating others.

Understanding these options is essential for writing correct concurrent programs, optimizing performance, debugging memory corruption, and designing secure systems.

What You Will Learn

By the end of this page, you will understand the complete taxonomy of resource sharing options: when to share, when to copy, and when to isolate. You'll learn the specifics of memory sharing (including copy-on-write), file descriptor semantics, and the clone() system call that enables fine-grained control over resource inheritance.

The Three Sharing Models

Operating systems offer three fundamental approaches to resource sharing between parent and child processes:

Model 1: Share Everything

Parent and child access the same resources. Changes made by one are immediately visible to the other.

Characteristics:

Minimal creation overhead (no copying)
Shared state enables easy communication
Shared state creates synchronization challenges
One process can corrupt another's data

Example: Threads share the same address space and file descriptor table.

Model 2: Copy Everything

Child receives an independent copy of all resources. Parent and child are completely isolated after creation.

Characteristics:

Maximum isolation and safety
Higher creation overhead (copying takes time)
Changes in one process invisible to other
Acts as snapshot of parent state at fork time

Example: Traditional fork() semantics—child gets copy of memory.

Model 3: Selective Sharing

Some resources are shared, others are copied. Provides fine-grained control over isolation boundaries.

Characteristics:

Flexible—choose per-resource
Complexity in understanding what's shared
Enables optimizations (share read-only code, copy mutable state)
Foundation for modern containers and sandboxing

Example: clone() with specific flags controls sharing.

Comparing Resource Sharing Models
Aspect	Share Everything	Copy Everything	Selective Sharing
Creation Cost	Very Low	High (mitigated by COW)	Variable
Memory Isolation	None	Complete	Configurable
Communication Ease	Trivial (shared memory)	Requires IPC	Depends on choices
Synchronization Need	Critical	None needed	For shared resources
Fault Isolation	None (crash affects all)	Complete	Configurable
Security Boundary	Weak	Strong	Configurable
Typical Use Case	Threads	Processes (fork)	Containers, specialized apps

Threads vs Processes: The Core Distinction

The difference between threads and processes is primarily one of resource sharing. Threads share address space, file descriptors, and signal handlers within a process. Separate processes have independent address spaces. The clone() system call blurs this line by allowing any combination.

Memory Sharing and Copy-on-Write

Memory is the most significant resource to consider when creating processes. The choice between sharing and copying has profound implications for performance, correctness, and isolation.

The Copy-on-Write Optimization

Naively copying all memory at fork() time would be prohibitively expensive—a process with 1GB of memory would require copying 1GB of data. Modern operating systems use Copy-on-Write (COW) to defer this cost.

How Copy-on-Write Works:

At fork(): Parent and child share the same physical memory pages
Page Table Setup: Both processes' page tables point to the same physical frames
Mark Read-Only: All shared pages are marked read-only in both processes
Trap on Write: When either process tries to write, a page fault occurs
Copy and Remap: Kernel copies the page, gives the writer its own copy, marks it writable
Continue Execution: Only the actually-modified pages get copied

Converting Mermaid diagram...

Performance Implications

Cache-Efficient Fork:

fork() itself is very fast—mostly page table manipulation
No actual memory copying until writes occur
Many forked processes exec() immediately, never modifying memory
Pages that are never written are never copied

When COW Isn't Enough:

Memory-intensive children that modify most of their memory eventually copy everything anyway
COW page faults add latency to first writes
Kernel memory for tracking COW pages adds overhead
Very large processes may experience significant copy costs even with COW

cow_demonstration.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <string.h>
 
#define PAGE_SIZE 4096
#define NUM_PAGES 1000
#define ARRAY_SIZE (PAGE_SIZE * NUM_PAGES)
 
/**
 * Demonstrates Copy-on-Write behavior
 * 
 * Key observations:
 * 1. Fork is fast even with large memory allocation
 * 2. Reading shared memory doesn't trigger copies
 * 3. Writing triggers per-page copies
 */
 
int main() {
    // Allocate large memory region
    char* large_array = malloc(ARRAY_SIZE);
    if (!large_array) {
        perror("malloc");
        return 1;
    }
    
    // Initialize with pattern
    printf("Parent: Initializing %d bytes...\n", ARRAY_SIZE);
    memset(large_array, 'P', ARRAY_SIZE);
    
    printf("Parent: Memory initialized. Forking...\n");
    
    // Time the fork
    pid_t pid = fork();
    
    if (pid == 0) {
        // Child process
        
        // Reading doesn't copy - pages remain shared
        printf("Child: Reading from shared memory (no copy)...\n");
        int first_char = large_array[0];
        int last_char = large_array[ARRAY_SIZE - 1];
        printf("Child: First='%c', Last='%c'\n", first_char, last_char);
        
        // Writing triggers COW for specific pages
        printf("Child: Writing to first page (triggers COW copy)...\n");
        large_array[0] = 'C';  // Triggers COW for page 0
        
        printf("Child: Writing to page 500 (triggers COW copy)...\n");
        large_array[500 * PAGE_SIZE] = 'C';  // Triggers COW for page 500
        
        // Verify our writes
        printf("Child: After write, first='%c' (expected 'C')\n", large_array[0]);
        
        // Check that unmodified pages still have parent's data
        printf("Child: Unmodified page 100, char='%c' (expected 'P')\n", 
               large_array[100 * PAGE_SIZE]);
        
        _exit(0);
    }
    
    // Parent continues
    wait(NULL);
    
    // Verify parent's data unchanged despite child's writes
    printf("Parent: After child exit, first='%c' (expected 'P')\n", large_array[0]);
    printf("Parent: Isolation confirmed - child's writes didn't affect us\n");
    
    free(large_array);
    return 0;
}

Memory Overcommit and OOM

COW means the kernel can 'promise' more memory than physically exists, betting that not all COW pages will be written. If this bet fails (many processes write heavily), the system runs out of memory (OOM). Linux's OOM killer then terminates processes to reclaim memory. This is the 'overcommit' policy—configurable via /proc/sys/vm/overcommit_memory.

Memory Regions and Sharing Semantics

Different memory regions have different sharing behaviors:

Region	Typical Sharing Behavior	Notes
Text (Code) Segment	Shared (read-only)	Never needs COW—identical, immutable code
Initialized Data	COW	Variables with initial values
BSS (Uninitialized)	COW	Zero-initialized global/static variables
Heap	COW	Dynamically allocated memory
Stack	COW (per-thread)	Each thread/process gets own stack
Shared Memory (mmap SHARED)	Truly Shared	MAP_SHARED regions bypass COW
Mapped Files (PRIVATE)	COW	MAP_PRIVATE copies on write

File Descriptor Sharing

File descriptors present a unique sharing scenario: the descriptor table is copied, but the underlying file table entries are shared. This creates both opportunities and hazards.

The Three-Level File System Abstraction

To understand file descriptor sharing, you must understand the three levels of abstraction:

File Descriptor Table (per-process)
- Array of file descriptor entries
- Index = fd number (0, 1, 2, ...)
- Each entry points to a file table entry
- Copied at fork() (with flags_inherit behavior)
File Table (system-wide)
- Kernel maintains one table for all open files
- Entries contain: file offset, mode (read/write), reference count
- Shared between parent and child after fork
- Entry deleted when reference count reaches 0
Inode Table (system-wide)
- One entry per actual file on disk
- Contains file metadata, data block pointers
- Referenced by file table entries

Converting Mermaid diagram...

Implications of Shared File Table Entries

Shared File Offset:

The most important consequence: parent and child share the file offset.

int fd = open("data.txt", O_RDWR);
lseek(fd, 0, SEEK_SET);  // Offset = 0

pid_t pid = fork();

if (pid == 0) {
    // Child
    char buf[100];
    read(fd, buf, 100);  // Reads bytes 0-99, advances offset to 100
    _exit(0);
}

wait(NULL);
// Parent's offset is now 100! Child's read advanced it.
char buf[100];
read(fd, buf, 100);  // Reads bytes 100-199

This behavior is intentional—it enables concurrent log writing without explicit coordination:

// Both parent and child can write to log
// Writes are atomic (up to PIPE_BUF for pipes, larger for regular files)
// Shared offset ensures sequential log entries without gaps

fd_sharing_demo.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <string.h>
 
/**
 * Demonstrates file descriptor sharing semantics
 * 
 * Key points:
 * 1. FD table is copied (parent and child have own tables)
 * 2. Both tables point to SAME file table entry
 * 3. File offset is shared and advances for both
 * 4. Closing FD in child doesn't affect parent's FD
 */
 
int main() {
    // Create test file
    int fd = open("shared_test.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    
    const char* initial = "0123456789ABCDEF";  // 16 bytes
    write(fd, initial, 16);
    lseek(fd, 0, SEEK_SET);  // Reset to start
    
    printf("Parent: File position before fork: %ld\n", 
           (long)lseek(fd, 0, SEEK_CUR));
    
    pid_t pid = fork();
    
    if (pid == 0) {
        // Child
        printf("Child: Initial file position: %ld\n",
               (long)lseek(fd, 0, SEEK_CUR));
        
        // Read 4 bytes - advances SHARED offset
        char buf[5] = {0};
        read(fd, buf, 4);
        printf("Child: Read '%s', position now: %ld\n", 
               buf, (long)lseek(fd, 0, SEEK_CUR));
        
        // Close FD in child - parent's FD still works
        close(fd);
        printf("Child: Closed fd, exiting\n");
        _exit(0);
    }
    
    wait(NULL);
    
    // Parent continues - offset was advanced by child
    printf("Parent: File position after child: %ld\n",
           (long)lseek(fd, 0, SEEK_CUR));
    
    char buf[5] = {0};
    read(fd, buf, 4);  // Reads "4567" not "0123"
    printf("Parent: Read '%s' (expected '4567')\n", buf);
    
    // Our FD still works despite child closing theirs
    printf("Parent: FD still valid, position: %ld\n",
           (long)lseek(fd, 0, SEEK_CUR));
    
    close(fd);
    unlink("shared_test.txt");
    return 0;
}

Close-on-Exec (FD_CLOEXEC)

Sometimes you DON'T want file descriptors to be inherited by children, especially when exec() is called. The close-on-exec flag causes a file descriptor to be automatically closed when the process exec()s a new program.

// Method 1: Set flag after open
int fd = open("secret.key", O_RDONLY);
fcntl(fd, F_SETFD, FD_CLOEXEC);

// Method 2: Use O_CLOEXEC at open time (preferred, atomic)
int fd = open("secret.key", O_RDONLY | O_CLOEXEC);

// Method 3: Set non-inheritable (same effect)
// After exec(), this FD will not exist in the new program

Security Implication: Sensitive file descriptors (passwords, crypto keys, privileged sockets) should always use close-on-exec to prevent leaking to exec'd programs.

Modern Practice: O_CLOEXEC Everywhere

Modern security-conscious code sets O_CLOEXEC by default on all file descriptors. Only explicitly clear it when you specifically want inheritance. This prevents accidentally leaking sensitive resources to exec'd child processes.

The clone() System Call: Fine-Grained Control

While fork() provides all-or-nothing semantics (copy everything, except what COW optimizes), Linux's clone() system call enables precise control over what is shared and what is copied.

clone() vs fork()

Conceptually:

fork() = clone() with no sharing flags
Thread creation = clone() with maximum sharing flags

clone() Flags

The clone() system call accepts flags that individually control each resource:

clone() Sharing Flags (Selected)
Flag	Effect When SET	Effect When UNSET
CLONE_VM	Share virtual memory (address space)	Copy address space (COW)
CLONE_FS	Share filesystem info (cwd, root, umask)	Copy filesystem info
CLONE_FILES	Share file descriptor table	Copy file descriptor table
CLONE_SIGHAND	Share signal handlers	Copy signal handlers
CLONE_THREAD	Place in same thread group	Create new process (new TGID)
CLONE_PARENT	New process is sibling, not child	Normal parent-child
CLONE_NEWPID	Create new PID namespace	Inherit PID namespace
CLONE_NEWNS	Create new mount namespace	Inherit mount namespace
CLONE_NEWNET	Create new network namespace	Inherit network namespace
CLONE_NEWUSER	Create new user namespace	Inherit user namespace

clone_examples.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <sys/wait.h>
#include <sys/mman.h>
 
#define STACK_SIZE (1024 * 1024)
 
// Shared variable (only meaningful with CLONE_VM)
int shared_variable = 0;
 
int child_function(void* arg) {
    printf("Child: Entered child function\n");
    printf("Child: PID=%d, PPID=%d\n", getpid(), getppid());
    
    // Modify shared variable
    shared_variable = 42;
    printf("Child: Set shared_variable to %d\n", shared_variable);
    
    return 0;
}
 
/**
 * Demonstrate clone() with different sharing combinations
 */
int main() {
    // Allocate stack for child (required for clone)
    void* child_stack = mmap(NULL, STACK_SIZE,
                              PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK,
                              -1, 0);
    if (child_stack == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    
    // Stack grows downward - pass top of stack
    void* stack_top = child_stack + STACK_SIZE;
    
    printf("Parent: Before clone, shared_variable = %d\n", shared_variable);
    
    // Example 1: Share virtual memory (like threads)
    printf("\n=== Clone with CLONE_VM (shared memory) ===\n");
    
    pid_t pid = clone(child_function,
                      stack_top,
                      CLONE_VM | SIGCHLD,  // Share VM, send SIGCHLD on exit
                      NULL);
    
    if (pid == -1) {
        perror("clone");
        return 1;
    }
    
    printf("Parent: Created child with PID %d\n", pid);
    
    // Wait for child
    waitpid(pid, NULL, 0);
    
    // With CLONE_VM, child's modification is visible to parent
    printf("Parent: After child exit, shared_variable = %d\n", shared_variable);
    
    if (shared_variable == 42) {
        printf("Parent: Memory was SHARED (expected with CLONE_VM)\n");
    } else {
        printf("Parent: Memory was COPIED\n");
    }
    
    // Cleanup
    munmap(child_stack, STACK_SIZE);
    
    return 0;
}

Thread Creation via clone()

POSIX threads (pthreads) are implemented using clone() with these typical flags:

clone(thread_function,
      child_stack,
      CLONE_VM |           // Share address space
      CLONE_FS |           // Share FS info
      CLONE_FILES |        // Share file descriptors
      CLONE_SIGHAND |      // Share signal handlers
      CLONE_THREAD |       // Same thread group (TGID)
      CLONE_SETTLS |       // Set thread-local storage
      CLONE_PARENT_SETTID | // Write TID to parent memory
      CLONE_CHILD_CLEARTID, // Clear TID on exit (for futex)
      arg);

This creates a new thread that shares almost everything with its creator—the defining characteristic of threads versus processes.

Namespace Flags for Containerization

The CLONE_NEW* flags create new namespaces, which is the foundation of Linux containers:

clone(init_function,
      stack_top,
      CLONE_NEWPID |  // New PID namespace (child sees itself as PID 1)
      CLONE_NEWNS |   // New mount namespace (isolated filesystem view)
      CLONE_NEWNET |  // New network namespace (private network stack)
      CLONE_NEWUTS |  // New UTS namespace (private hostname)
      CLONE_NEWIPC |  // New IPC namespace (private message queues)
      CLONE_NEWUSER | // New user namespace (private UID mappings)
      SIGCHLD,
      NULL);

Docker, Kubernetes, and LXC all use these mechanisms.

clone3() - The Modern Interface

Linux 5.3 introduced clone3(), which uses a struct for arguments instead of positional parameters. This allows adding new features without API breakage. New code should prefer clone3() when available, though glibc may still use clone() internally.

Resource Sharing in Practice

Let's examine how different applications and patterns use resource sharing strategically.

Pattern 1: Worker Process Pool

Scenario: Web server wants multiple worker processes to serve requests.

Sharing Strategy:

Fork workers from master after initialization (share read-only config, code)
Each worker has independent memory (for request handling)
Share listening socket FD via inheritance
Workers compete for connections via accept()

// Master opens socket, then forks
int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
bind(listen_fd, ...);
listen(listen_fd, 128);

for (int i = 0; i < num_workers; i++) {
    if (fork() == 0) {
        // Worker process
        while (1) {
            int conn = accept(listen_fd, ...);  // Inherited FD
            handle_request(conn);
            close(conn);
        }
    }
}

Examples: Apache prefork MPM, Nginx workers, Gunicorn

Pattern 2: Secure Sandbox (Minimal Sharing)

Scenario: Run untrusted code with minimal access to parent's resources.

Sharing Strategy:

Use namespaces to isolate (CLONE_NEWPID, CLONE_NEWNS, etc.)
Don't inherit any file descriptors (close all before exec)
Drop privileges before exec
Use seccomp to limit system calls

// Highly isolated child
clone(sandbox_function,
      stack_top,
      CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET | CLONE_NEWUSER | SIGCHLD,
      NULL);

// In sandbox_function:
// - Close all FDs
// - Drop capabilities
// - Apply seccomp filter
// - exec untrusted binary

Examples: Chrome sandbox, Firejail, systemd-nspawn

Pattern 3: Parallel Computation (Maximum Sharing)

Scenario: Parallel algorithm with shared data structures.

Sharing Strategy:

Use threads (CLONE_VM shares memory)
Coordinate with mutexes/atomics
Each thread has private stack, TLS

pthread_t threads[NUM_THREADS];

// Shared data
int results[NUM_THREADS];
int shared_counter = 0;
pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

for (int i = 0; i < NUM_THREADS; i++) {
    pthread_create(&threads[i], NULL, worker, &i);
}

// Workers share address space, can access shared_counter with synchronization

Examples: OpenMP parallel regions, Java parallel streams, Rayon in Rust

When to Share

•Communication bandwidth is critical
•Data structures too large to copy
•Tasks naturally cooperate on shared state
•Creation/destruction overhead must be minimal
•Workloads are short-lived

When to Isolate

•Fault isolation is critical (crashes)
•Security boundaries required
•Tasks have little shared state
•Tasks may be hostile or untrusted
•Debugging requires separation

The Hybrid Approach

Many real systems use hybrid approaches: fork() for process isolation, then mmap(MAP_SHARED) for specific shared regions. This gives per-process protection for most memory while enabling efficient communication through designated shared areas.

Platform Differences

Different operating systems offer different resource sharing mechanisms, reflecting different design philosophies.

Windows: CreateProcess and Handle Inheritance

Windows does not have fork()—it uses CreateProcess() which always loads a new executable. Resource sharing is controlled differently:

Resource Sharing: Linux vs Windows
Resource	Linux (fork/clone)	Windows (CreateProcess)
Memory	COW copy (fork) or shared (clone+VM)	Always separate; use shared memory explicitly
File Handles	All inherited by default	Only handles marked inheritable are inherited
Threads	clone() with CLONE_THREAD	CreateThread() within process
Environment	Inherited by default	Explicitly passed or inherited
Directory	Inherited	Explicitly specified or inherited
Security Context	Inherited UID/GID	Inherited token or specified

windows_inheritance.c
C (Windows)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
#include <windows.h>
#include <stdio.h>
 
/**
 * Windows handle inheritance example
 * 
 * Handles must be explicitly marked inheritable
 */
int main() {
    // Create security attributes for inheritable handle
    SECURITY_ATTRIBUTES sa;
    sa.nLength = sizeof(sa);
    sa.bInheritHandle = TRUE;    // Key: mark as inheritable
    sa.lpSecurityDescriptor = NULL;
    
    // Create pipe with inheritable handles
    HANDLE read_pipe, write_pipe;
    CreatePipe(&read_pipe, &write_pipe, &sa, 0);
    
    // Prepare child process
    STARTUPINFOA si = {0};
    PROCESS_INFORMATION pi = {0};
    si.cb = sizeof(si);
    
    // Pass handle value to child via environment or command line
    char cmd[256];
    sprintf(cmd, "child.exe %p", write_pipe);  // Pass handle as argument
    
    // Create child with handle inheritance ENABLED
    BOOL success = CreateProcessA(
        NULL,           // App name
        cmd,            // Command line with handle
        NULL,           // Process security
        NULL,           // Thread security
        TRUE,           // INHERIT HANDLES (crucial)
        0,              // Flags
        NULL,           // Environment
        NULL,           // Directory
        &si,
        &pi
    );
    
    if (!success) {
        printf("CreateProcess failed: %d\n", GetLastError());
        return 1;
    }
    
    printf("Parent: Created child, PID=%d\n", pi.dwProcessId);
    printf("Parent: Child inherited write_pipe handle\n");
    
    // Close our copy of write end
    CloseHandle(write_pipe);
    
    // Read from child
    char buf[100];
    DWORD bytesRead;
    ReadFile(read_pipe, buf, sizeof(buf), &bytesRead, NULL);
    buf[bytesRead] = '\0';
    printf("Parent read from child: %s\n", buf);
    
    WaitForSingleObject(pi.hProcess, INFINITE);
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    CloseHandle(read_pipe);
    
    return 0;
}

macOS/BSD: fork() + posix_spawn()

macOS and BSD support fork() but encourage posix_spawn() for new code:

posix_spawn() atomically creates process and execs new program
More explicit about what is inherited via file_actions and attrp
Better performance for fork-then-exec pattern
Recommended by Apple for new development

#include <spawn.h>

extern char **environ;

posix_spawn_file_actions_t actions;
posix_spawn_file_actions_init(&actions);

// Close fd 3 in child
posix_spawn_file_actions_addclose(&actions, 3);

// Redirect stdin from file
posix_spawn_file_actions_addopen(&actions, 0, "input.txt", O_RDONLY, 0);

pid_t pid;
posix_spawn(&pid, "/bin/ls", &actions, NULL, 
            (char*[]){"ls", "-la", NULL}, environ);

Summary: Resource Sharing Options

Resource sharing is one of the most consequential decisions in process creation, affecting performance, isolation, security, and programming model. Modern operating systems provide flexible mechanisms to make these tradeoffs explicit.

Key Takeaways

•Three Models — Share everything (threads), copy everything (processes), or selective sharing (clone with flags)
•Copy-on-Write Optimizes fork() — Physical memory isn't copied until written, making fork surprisingly efficient
•File Descriptors Share Offset — Both parent and child share the same file table entry, including file position
•clone() Enables Fine Control — Linux's clone() allows per-resource sharing decisions via flags
•Namespaces Provide Isolation — CLONE_NEW* flags create isolated namespaces for containerization
•Close-on-Exec for Security — Use O_CLOEXEC to prevent sensitive FDs from leaking to exec'd programs
•Platform Differences Matter — Windows uses explicit inheritance; no equivalent to fork()

What's Next:

With resource sharing options understood, the next page examines execution options—the choices a parent has for how it and its child execute after creation: concurrently, sequentially, or in some combined pattern. These decisions affect program structure, responsiveness, and resource utilization.

Page Complete

You now understand the complete spectrum of resource sharing options available during process creation. From COW memory to shared file descriptors to namespace isolation, these mechanisms enable everything from high-performance threading to secure containerization.

3 / 5

Loading learning content...

Operating SystemsProcess Creation & Termination

Process Creation

LevelIntermediate

Duration60 mins

TopicProcess Creation & Termination

3 / 5

Resource Sharing Options

The Resource Sharing Spectrum

Understanding these options is essential for writing correct concurrent programs, optimizing performance, debugging memory corruption, and designing secure systems.

What You Will Learn

The Three Sharing Models

Operating systems offer three fundamental approaches to resource sharing between parent and child processes:

Model 1: Share Everything

Parent and child access the same resources. Changes made by one are immediately visible to the other.

Characteristics:

Minimal creation overhead (no copying)
Shared state enables easy communication
Shared state creates synchronization challenges
One process can corrupt another's data

Example: Threads share the same address space and file descriptor table.

Model 2: Copy Everything

Child receives an independent copy of all resources. Parent and child are completely isolated after creation.

Characteristics:

Maximum isolation and safety
Higher creation overhead (copying takes time)
Changes in one process invisible to other
Acts as snapshot of parent state at fork time

Example: Traditional fork() semantics—child gets copy of memory.

Model 3: Selective Sharing

Some resources are shared, others are copied. Provides fine-grained control over isolation boundaries.

Characteristics:

Flexible—choose per-resource
Complexity in understanding what's shared
Enables optimizations (share read-only code, copy mutable state)
Foundation for modern containers and sandboxing

Example: clone() with specific flags controls sharing.

Comparing Resource Sharing Models
Aspect	Share Everything	Copy Everything	Selective Sharing
Creation Cost	Very Low	High (mitigated by COW)	Variable
Memory Isolation	None	Complete	Configurable
Communication Ease	Trivial (shared memory)	Requires IPC	Depends on choices
Synchronization Need	Critical	None needed	For shared resources
Fault Isolation	None (crash affects all)	Complete	Configurable
Security Boundary	Weak	Strong	Configurable
Typical Use Case	Threads	Processes (fork)	Containers, specialized apps

Threads vs Processes: The Core Distinction

Memory Sharing and Copy-on-Write

Memory is the most significant resource to consider when creating processes. The choice between sharing and copying has profound implications for performance, correctness, and isolation.

The Copy-on-Write Optimization

How Copy-on-Write Works:

At fork(): Parent and child share the same physical memory pages
Page Table Setup: Both processes' page tables point to the same physical frames
Mark Read-Only: All shared pages are marked read-only in both processes
Trap on Write: When either process tries to write, a page fault occurs
Copy and Remap: Kernel copies the page, gives the writer its own copy, marks it writable
Continue Execution: Only the actually-modified pages get copied

Converting Mermaid diagram...

Performance Implications

Cache-Efficient Fork:

fork() itself is very fast—mostly page table manipulation
No actual memory copying until writes occur
Many forked processes exec() immediately, never modifying memory
Pages that are never written are never copied

When COW Isn't Enough:

Memory-intensive children that modify most of their memory eventually copy everything anyway
COW page faults add latency to first writes
Kernel memory for tracking COW pages adds overhead
Very large processes may experience significant copy costs even with COW

cow_demonstration.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <string.h>
 
#define PAGE_SIZE 4096
#define NUM_PAGES 1000
#define ARRAY_SIZE (PAGE_SIZE * NUM_PAGES)
 
/**
 * Demonstrates Copy-on-Write behavior
 * 
 * Key observations:
 * 1. Fork is fast even with large memory allocation
 * 2. Reading shared memory doesn't trigger copies
 * 3. Writing triggers per-page copies
 */
 
int main() {
    // Allocate large memory region
    char* large_array = malloc(ARRAY_SIZE);
    if (!large_array) {
        perror("malloc");
        return 1;
    }
    
    // Initialize with pattern
    printf("Parent: Initializing %d bytes...\n", ARRAY_SIZE);
    memset(large_array, 'P', ARRAY_SIZE);
    
    printf("Parent: Memory initialized. Forking...\n");
    
    // Time the fork
    pid_t pid = fork();
    
    if (pid == 0) {
        // Child process
        
        // Reading doesn't copy - pages remain shared
        printf("Child: Reading from shared memory (no copy)...\n");
        int first_char = large_array[0];
        int last_char = large_array[ARRAY_SIZE - 1];
        printf("Child: First='%c', Last='%c'\n", first_char, last_char);
        
        // Writing triggers COW for specific pages
        printf("Child: Writing to first page (triggers COW copy)...\n");
        large_array[0] = 'C';  // Triggers COW for page 0
        
        printf("Child: Writing to page 500 (triggers COW copy)...\n");
        large_array[500 * PAGE_SIZE] = 'C';  // Triggers COW for page 500
        
        // Verify our writes
        printf("Child: After write, first='%c' (expected 'C')\n", large_array[0]);
        
        // Check that unmodified pages still have parent's data
        printf("Child: Unmodified page 100, char='%c' (expected 'P')\n", 
               large_array[100 * PAGE_SIZE]);
        
        _exit(0);
    }
    
    // Parent continues
    wait(NULL);
    
    // Verify parent's data unchanged despite child's writes
    printf("Parent: After child exit, first='%c' (expected 'P')\n", large_array[0]);
    printf("Parent: Isolation confirmed - child's writes didn't affect us\n");
    
    free(large_array);
    return 0;
}

Memory Overcommit and OOM

Memory Regions and Sharing Semantics

Different memory regions have different sharing behaviors:

Region	Typical Sharing Behavior	Notes
Text (Code) Segment	Shared (read-only)	Never needs COW—identical, immutable code
Initialized Data	COW	Variables with initial values
BSS (Uninitialized)	COW	Zero-initialized global/static variables
Heap	COW	Dynamically allocated memory
Stack	COW (per-thread)	Each thread/process gets own stack
Shared Memory (mmap SHARED)	Truly Shared	MAP_SHARED regions bypass COW
Mapped Files (PRIVATE)	COW	MAP_PRIVATE copies on write

File Descriptor Sharing

File descriptors present a unique sharing scenario: the descriptor table is copied, but the underlying file table entries are shared. This creates both opportunities and hazards.

The Three-Level File System Abstraction

To understand file descriptor sharing, you must understand the three levels of abstraction:

File Descriptor Table (per-process)
- Array of file descriptor entries
- Index = fd number (0, 1, 2, ...)
- Each entry points to a file table entry
- Copied at fork() (with flags_inherit behavior)
File Table (system-wide)
- Kernel maintains one table for all open files
- Entries contain: file offset, mode (read/write), reference count
- Shared between parent and child after fork
- Entry deleted when reference count reaches 0
Inode Table (system-wide)
- One entry per actual file on disk
- Contains file metadata, data block pointers
- Referenced by file table entries

Converting Mermaid diagram...

Implications of Shared File Table Entries

Shared File Offset:

The most important consequence: parent and child share the file offset.

int fd = open("data.txt", O_RDWR);
lseek(fd, 0, SEEK_SET);  // Offset = 0

pid_t pid = fork();

if (pid == 0) {
    // Child
    char buf[100];
    read(fd, buf, 100);  // Reads bytes 0-99, advances offset to 100
    _exit(0);
}

wait(NULL);
// Parent's offset is now 100! Child's read advanced it.
char buf[100];
read(fd, buf, 100);  // Reads bytes 100-199

This behavior is intentional—it enables concurrent log writing without explicit coordination:

// Both parent and child can write to log
// Writes are atomic (up to PIPE_BUF for pipes, larger for regular files)
// Shared offset ensures sequential log entries without gaps

fd_sharing_demo.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/wait.h>
#include <string.h>
 
/**
 * Demonstrates file descriptor sharing semantics
 * 
 * Key points:
 * 1. FD table is copied (parent and child have own tables)
 * 2. Both tables point to SAME file table entry
 * 3. File offset is shared and advances for both
 * 4. Closing FD in child doesn't affect parent's FD
 */
 
int main() {
    // Create test file
    int fd = open("shared_test.txt", O_RDWR | O_CREAT | O_TRUNC, 0644);
    
    const char* initial = "0123456789ABCDEF";  // 16 bytes
    write(fd, initial, 16);
    lseek(fd, 0, SEEK_SET);  // Reset to start
    
    printf("Parent: File position before fork: %ld\n", 
           (long)lseek(fd, 0, SEEK_CUR));
    
    pid_t pid = fork();
    
    if (pid == 0) {
        // Child
        printf("Child: Initial file position: %ld\n",
               (long)lseek(fd, 0, SEEK_CUR));
        
        // Read 4 bytes - advances SHARED offset
        char buf[5] = {0};
        read(fd, buf, 4);
        printf("Child: Read '%s', position now: %ld\n", 
               buf, (long)lseek(fd, 0, SEEK_CUR));
        
        // Close FD in child - parent's FD still works
        close(fd);
        printf("Child: Closed fd, exiting\n");
        _exit(0);
    }
    
    wait(NULL);
    
    // Parent continues - offset was advanced by child
    printf("Parent: File position after child: %ld\n",
           (long)lseek(fd, 0, SEEK_CUR));
    
    char buf[5] = {0};
    read(fd, buf, 4);  // Reads "4567" not "0123"
    printf("Parent: Read '%s' (expected '4567')\n", buf);
    
    // Our FD still works despite child closing theirs
    printf("Parent: FD still valid, position: %ld\n",
           (long)lseek(fd, 0, SEEK_CUR));
    
    close(fd);
    unlink("shared_test.txt");
    return 0;
}

Close-on-Exec (FD_CLOEXEC)

// Method 1: Set flag after open
int fd = open("secret.key", O_RDONLY);
fcntl(fd, F_SETFD, FD_CLOEXEC);

// Method 2: Use O_CLOEXEC at open time (preferred, atomic)
int fd = open("secret.key", O_RDONLY | O_CLOEXEC);

// Method 3: Set non-inheritable (same effect)
// After exec(), this FD will not exist in the new program

Security Implication: Sensitive file descriptors (passwords, crypto keys, privileged sockets) should always use close-on-exec to prevent leaking to exec'd programs.

Modern Practice: O_CLOEXEC Everywhere

The clone() System Call: Fine-Grained Control

While fork() provides all-or-nothing semantics (copy everything, except what COW optimizes), Linux's clone() system call enables precise control over what is shared and what is copied.

clone() vs fork()

Conceptually:

fork() = clone() with no sharing flags
Thread creation = clone() with maximum sharing flags

clone() Flags

The clone() system call accepts flags that individually control each resource:

clone() Sharing Flags (Selected)
Flag	Effect When SET	Effect When UNSET
CLONE_VM	Share virtual memory (address space)	Copy address space (COW)
CLONE_FS	Share filesystem info (cwd, root, umask)	Copy filesystem info
CLONE_FILES	Share file descriptor table	Copy file descriptor table
CLONE_SIGHAND	Share signal handlers	Copy signal handlers
CLONE_THREAD	Place in same thread group	Create new process (new TGID)
CLONE_PARENT	New process is sibling, not child	Normal parent-child
CLONE_NEWPID	Create new PID namespace	Inherit PID namespace
CLONE_NEWNS	Create new mount namespace	Inherit mount namespace
CLONE_NEWNET	Create new network namespace	Inherit network namespace
CLONE_NEWUSER	Create new user namespace	Inherit user namespace

clone_examples.c
C
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sched.h>
#include <sys/wait.h>
#include <sys/mman.h>
 
#define STACK_SIZE (1024 * 1024)
 
// Shared variable (only meaningful with CLONE_VM)
int shared_variable = 0;
 
int child_function(void* arg) {
    printf("Child: Entered child function\n");
    printf("Child: PID=%d, PPID=%d\n", getpid(), getppid());
    
    // Modify shared variable
    shared_variable = 42;
    printf("Child: Set shared_variable to %d\n", shared_variable);
    
    return 0;
}
 
/**
 * Demonstrate clone() with different sharing combinations
 */
int main() {
    // Allocate stack for child (required for clone)
    void* child_stack = mmap(NULL, STACK_SIZE,
                              PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK,
                              -1, 0);
    if (child_stack == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    
    // Stack grows downward - pass top of stack
    void* stack_top = child_stack + STACK_SIZE;
    
    printf("Parent: Before clone, shared_variable = %d\n", shared_variable);
    
    // Example 1: Share virtual memory (like threads)
    printf("\n=== Clone with CLONE_VM (shared memory) ===\n");
    
    pid_t pid = clone(child_function,
                      stack_top,
                      CLONE_VM | SIGCHLD,  // Share VM, send SIGCHLD on exit
                      NULL);
    
    if (pid == -1) {
        perror("clone");
        return 1;
    }
    
    printf("Parent: Created child with PID %d\n", pid);
    
    // Wait for child
    waitpid(pid, NULL, 0);
    
    // With CLONE_VM, child's modification is visible to parent
    printf("Parent: After child exit, shared_variable = %d\n", shared_variable);
    
    if (shared_variable == 42) {
        printf("Parent: Memory was SHARED (expected with CLONE_VM)\n");
    } else {
        printf("Parent: Memory was COPIED\n");
    }
    
    // Cleanup
    munmap(child_stack, STACK_SIZE);
    
    return 0;
}

Thread Creation via clone()

POSIX threads (pthreads) are implemented using clone() with these typical flags:

clone(thread_function,
      child_stack,
      CLONE_VM |           // Share address space
      CLONE_FS |           // Share FS info
      CLONE_FILES |        // Share file descriptors
      CLONE_SIGHAND |      // Share signal handlers
      CLONE_THREAD |       // Same thread group (TGID)
      CLONE_SETTLS |       // Set thread-local storage
      CLONE_PARENT_SETTID | // Write TID to parent memory
      CLONE_CHILD_CLEARTID, // Clear TID on exit (for futex)
      arg);

This creates a new thread that shares almost everything with its creator—the defining characteristic of threads versus processes.

Namespace Flags for Containerization

The CLONE_NEW* flags create new namespaces, which is the foundation of Linux containers:

clone(init_function,
      stack_top,
      CLONE_NEWPID |  // New PID namespace (child sees itself as PID 1)
      CLONE_NEWNS |   // New mount namespace (isolated filesystem view)
      CLONE_NEWNET |  // New network namespace (private network stack)
      CLONE_NEWUTS |  // New UTS namespace (private hostname)
      CLONE_NEWIPC |  // New IPC namespace (private message queues)
      CLONE_NEWUSER | // New user namespace (private UID mappings)
      SIGCHLD,
      NULL);

Docker, Kubernetes, and LXC all use these mechanisms.

clone3() - The Modern Interface

Resource Sharing in Practice

Let's examine how different applications and patterns use resource sharing strategically.

Pattern 1: Worker Process Pool

Scenario: Web server wants multiple worker processes to serve requests.

Sharing Strategy:

Fork workers from master after initialization (share read-only config, code)
Each worker has independent memory (for request handling)
Share listening socket FD via inheritance
Workers compete for connections via accept()

// Master opens socket, then forks
int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
bind(listen_fd, ...);
listen(listen_fd, 128);

for (int i = 0; i < num_workers; i++) {
    if (fork() == 0) {
        // Worker process
        while (1) {
            int conn = accept(listen_fd, ...);  // Inherited FD
            handle_request(conn);
            close(conn);
        }
    }
}

Examples: Apache prefork MPM, Nginx workers, Gunicorn

Pattern 2: Secure Sandbox (Minimal Sharing)

Scenario: Run untrusted code with minimal access to parent's resources.

Sharing Strategy:

Use namespaces to isolate (CLONE_NEWPID, CLONE_NEWNS, etc.)
Don't inherit any file descriptors (close all before exec)
Drop privileges before exec
Use seccomp to limit system calls

// Highly isolated child
clone(sandbox_function,
      stack_top,
      CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET | CLONE_NEWUSER | SIGCHLD,
      NULL);

// In sandbox_function:
// - Close all FDs
// - Drop capabilities
// - Apply seccomp filter
// - exec untrusted binary

Examples: Chrome sandbox, Firejail, systemd-nspawn

Pattern 3: Parallel Computation (Maximum Sharing)

Scenario: Parallel algorithm with shared data structures.

Sharing Strategy:

Use threads (CLONE_VM shares memory)
Coordinate with mutexes/atomics
Each thread has private stack, TLS

pthread_t threads[NUM_THREADS];

// Shared data
int results[NUM_THREADS];
int shared_counter = 0;
pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

for (int i = 0; i < NUM_THREADS; i++) {
    pthread_create(&threads[i], NULL, worker, &i);
}

// Workers share address space, can access shared_counter with synchronization

Examples: OpenMP parallel regions, Java parallel streams, Rayon in Rust

When to Share

•Communication bandwidth is critical
•Data structures too large to copy
•Tasks naturally cooperate on shared state
•Creation/destruction overhead must be minimal
•Workloads are short-lived

When to Isolate

•Fault isolation is critical (crashes)
•Security boundaries required
•Tasks have little shared state
•Tasks may be hostile or untrusted
•Debugging requires separation

The Hybrid Approach

Platform Differences

Different operating systems offer different resource sharing mechanisms, reflecting different design philosophies.

Windows: CreateProcess and Handle Inheritance

Windows does not have fork()—it uses CreateProcess() which always loads a new executable. Resource sharing is controlled differently:

Resource Sharing: Linux vs Windows
Resource	Linux (fork/clone)	Windows (CreateProcess)
Memory	COW copy (fork) or shared (clone+VM)	Always separate; use shared memory explicitly
File Handles	All inherited by default	Only handles marked inheritable are inherited
Threads	clone() with CLONE_THREAD	CreateThread() within process
Environment	Inherited by default	Explicitly passed or inherited
Directory	Inherited	Explicitly specified or inherited
Security Context	Inherited UID/GID	Inherited token or specified

windows_inheritance.c
C (Windows)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
#include <windows.h>
#include <stdio.h>
 
/**
 * Windows handle inheritance example
 * 
 * Handles must be explicitly marked inheritable
 */
int main() {
    // Create security attributes for inheritable handle
    SECURITY_ATTRIBUTES sa;
    sa.nLength = sizeof(sa);
    sa.bInheritHandle = TRUE;    // Key: mark as inheritable
    sa.lpSecurityDescriptor = NULL;
    
    // Create pipe with inheritable handles
    HANDLE read_pipe, write_pipe;
    CreatePipe(&read_pipe, &write_pipe, &sa, 0);
    
    // Prepare child process
    STARTUPINFOA si = {0};
    PROCESS_INFORMATION pi = {0};
    si.cb = sizeof(si);
    
    // Pass handle value to child via environment or command line
    char cmd[256];
    sprintf(cmd, "child.exe %p", write_pipe);  // Pass handle as argument
    
    // Create child with handle inheritance ENABLED
    BOOL success = CreateProcessA(
        NULL,           // App name
        cmd,            // Command line with handle
        NULL,           // Process security
        NULL,           // Thread security
        TRUE,           // INHERIT HANDLES (crucial)
        0,              // Flags
        NULL,           // Environment
        NULL,           // Directory
        &si,
        &pi
    );
    
    if (!success) {
        printf("CreateProcess failed: %d\n", GetLastError());
        return 1;
    }
    
    printf("Parent: Created child, PID=%d\n", pi.dwProcessId);
    printf("Parent: Child inherited write_pipe handle\n");
    
    // Close our copy of write end
    CloseHandle(write_pipe);
    
    // Read from child
    char buf[100];
    DWORD bytesRead;
    ReadFile(read_pipe, buf, sizeof(buf), &bytesRead, NULL);
    buf[bytesRead] = '\0';
    printf("Parent read from child: %s\n", buf);
    
    WaitForSingleObject(pi.hProcess, INFINITE);
    CloseHandle(pi.hProcess);
    CloseHandle(pi.hThread);
    CloseHandle(read_pipe);
    
    return 0;
}

macOS/BSD: fork() + posix_spawn()

macOS and BSD support fork() but encourage posix_spawn() for new code:

posix_spawn() atomically creates process and execs new program
More explicit about what is inherited via file_actions and attrp
Better performance for fork-then-exec pattern
Recommended by Apple for new development

#include <spawn.h>

extern char **environ;

posix_spawn_file_actions_t actions;
posix_spawn_file_actions_init(&actions);

// Close fd 3 in child
posix_spawn_file_actions_addclose(&actions, 3);

// Redirect stdin from file
posix_spawn_file_actions_addopen(&actions, 0, "input.txt", O_RDONLY, 0);

pid_t pid;
posix_spawn(&pid, "/bin/ls", &actions, NULL, 
            (char*[]){"ls", "-la", NULL}, environ);

Summary: Resource Sharing Options

Key Takeaways

•Three Models — Share everything (threads), copy everything (processes), or selective sharing (clone with flags)
•Copy-on-Write Optimizes fork() — Physical memory isn't copied until written, making fork surprisingly efficient
•File Descriptors Share Offset — Both parent and child share the same file table entry, including file position
•clone() Enables Fine Control — Linux's clone() allows per-resource sharing decisions via flags
•Namespaces Provide Isolation — CLONE_NEW* flags create isolated namespaces for containerization
•Close-on-Exec for Security — Use O_CLOEXEC to prevent sensitive FDs from leaking to exec'd programs
•Platform Differences Matter — Windows uses explicit inheritance; no equivalent to fork()

What's Next:

Page Complete

3 / 5