Every process you see running on a Linux system—from the init process (PID 1) to thousands of containerized microservices—was created through a single kernel code path. Whether triggered by fork(), vfork(), clone(), or pthread_create(), all process creation flows through the kernel's kernel_clone() function (historically known as do_fork()).
Understanding this mechanism reveals how Linux achieves efficient process creation through copy-on-write semantics, how threads share resources with their parent, and why fork() is surprisingly fast despite seemingly copying an entire address space.
By the end of this page, you will understand the complete process creation flow from system call to runnable task, the role of clone flags in resource sharing, copy-on-write optimization, and how the kernel allocates and initializes a new task_struct.
Linux provides several system calls for process creation, each with different semantics but sharing a common implementation:
| System Call | Purpose | Key Characteristics |
|---|---|---|
| fork() | Create child process | Full copy of parent (COW optimized), child gets new PID |
| vfork() | Create child for exec | Parent blocked until child exits/execs, shares address space |
| clone() | Flexible creation | Fine-grained control via flags, used for threads |
| clone3() | Modern clone | Extensible struct-based arguments, additional features |
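Before looking at the kernel side, a small userspace sketch can make the "common implementation" idea concrete: a raw clone() call with only an exit signal and no CLONE_* flags behaves like fork(). This is an illustration, not production code; it assumes the x86-64 raw-syscall argument order (which differs on some architectures) and bypasses glibc's fork() bookkeeping.

```c
/* Illustration only: fork() expressed as a raw clone() system call.
 * Assumes the x86-64 argument order (flags, stack, ptid, ctid, tls). */
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* No sharing flags, exit_signal = SIGCHLD, stack = 0 means
     * "reuse the caller's stack" - the kernel COW-copies it. */
    long pid = syscall(SYS_clone, (unsigned long)SIGCHLD, 0UL, NULL, NULL, 0UL);

    if (pid == 0) {
        printf("child:  created by raw clone(), pid %d\n", getpid());
        _exit(0);
    }
    printf("parent: raw clone() returned %ld\n", pid);
    waitpid((pid_t)pid, NULL, 0);
    return 0;
}
```

The kernel-side wrappers below show why this works: every entry point just fills in a kernel_clone_args structure and hands it to kernel_clone().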
```c
/* All process creation funnels through kernel_clone() */

/* fork() - Traditional process creation */
SYSCALL_DEFINE0(fork)
{
    struct kernel_clone_args args = {
        .exit_signal = SIGCHLD,   /* Signal parent on exit */
    };

    return kernel_clone(&args);
}

/* vfork() - Optimized for the fork+exec pattern */
SYSCALL_DEFINE0(vfork)
{
    struct kernel_clone_args args = {
        .flags       = CLONE_VFORK | CLONE_VM,
        .exit_signal = SIGCHLD,
    };

    return kernel_clone(&args);
}

/* clone() - Full control via flags */
SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
                int __user *, parent_tidptr,
                int __user *, child_tidptr,
                unsigned long, tls)
{
    struct kernel_clone_args args = {
        .flags       = (lower_32_bits(clone_flags) & ~CSIGNAL),
        .exit_signal = (lower_32_bits(clone_flags) & CSIGNAL),
        .stack       = newsp,
        .parent_tid  = parent_tidptr,
        .child_tid   = child_tidptr,
        .tls         = tls,
    };

    return kernel_clone(&args);
}
```

The power of Linux process creation lies in clone flags. Each flag controls whether a specific resource is shared with the parent or copied for the child. This is how Linux implements both processes (mostly copied) and threads (mostly shared).
| Flag | When Set: SHARE | When Clear: COPY |
|---|---|---|
| CLONE_VM | Share memory space (mm_struct) | Copy address space (COW) |
| CLONE_FS | Share filesystem info (pwd, root) | Copy filesystem context |
| CLONE_FILES | Share file descriptor table | Copy open file descriptors |
| CLONE_SIGHAND | Share signal handlers | Copy signal handlers |
| CLONE_THREAD | Same thread group (TGID) | New thread group |
| CLONE_PARENT | Share parent with caller | Caller becomes parent |
| CLONE_NEWPID | New PID namespace | Inherit PID namespace |
| CLONE_NEWNS | New mount namespace | Inherit mount namespace |
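The share-vs-copy semantics in the table can be observed from userspace with the glibc clone() wrapper. The sketch below is illustrative only: it assumes descriptor 3 is the lowest free fd and uses an arbitrary 1 MiB child stack.

```c
/* Sketch: CLONE_FILES shares the fd table; omitting it copies the table. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static int child_fn(void *arg)
{
    /* Lands in whichever fd table the clone flags selected. */
    int fd = open("/dev/null", O_RDONLY);
    printf("child: opened fd %d\n", fd);
    return 0;
}

static void run(int flags, const char *label)
{
    char *stack = malloc(STACK_SIZE);            /* child stack, grows down */
    pid_t pid = clone(child_fn, stack + STACK_SIZE, flags | SIGCHLD, NULL);

    waitpid(pid, NULL, 0);

    /* If the table was shared, the child's fd 3 is still open here. */
    int shared = fcntl(3, F_GETFD) != -1;
    printf("%s: parent sees fd 3 %s\n", label,
           shared ? "open (table shared)" : "closed (table copied)");
    if (shared)
        close(3);
    free(stack);
}

int main(void)
{
    run(0,           "fork-like clone  ");
    run(CLONE_FILES, "CLONE_FILES clone");
    return 0;
}
```

The first run behaves like fork(): the child's open() lives and dies in its private copy of the table. The second run shows the thread-style sharing that the pthread flag set below relies on.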
```c
/* How pthread_create() creates threads (via clone) */
#define CLONE_THREAD_FLAGS (CLONE_VM | CLONE_FS | CLONE_FILES |      \
                            CLONE_SIGHAND | CLONE_THREAD |           \
                            CLONE_SYSVSEM | CLONE_SETTLS |           \
                            CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID)

/*
 * Thread creation shares nearly everything:
 * - CLONE_VM: Same address space (critical for threads!)
 * - CLONE_FS: Same pwd, root directory
 * - CLONE_FILES: Same file descriptor table
 * - CLONE_SIGHAND: Same signal handlers
 * - CLONE_THREAD: Same thread group ID (getpid() returns the same value)
 *
 * fork() sets NONE of these flags, so the child gets copies of everything.
 */
```

Container runtimes use the CLONE_NEW* flags to create isolated namespaces. CLONE_NEWPID gives a container its own PID namespace, where the container's init is PID 1. CLONE_NEWNS provides an isolated mount table. CLONE_NEWNET creates a separate network stack. All from the same kernel_clone() path!
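Here is a minimal sketch of that container case, again using the glibc clone() wrapper. The 1 MiB stack size is arbitrary, and creating a PID namespace requires CAP_SYS_ADMIN, so run it as root.

```c
/* Sketch: CLONE_NEWPID makes the child PID 1 in its own namespace. */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static int child_fn(void *arg)
{
    /* Inside the new namespace this prints 1, like a container's init. */
    printf("child:  getpid() in new PID namespace = %d\n", getpid());
    return 0;
}

int main(void)
{
    char *stack = malloc(STACK_SIZE);
    pid_t pid = clone(child_fn, stack + STACK_SIZE,
                      CLONE_NEWPID | SIGCHLD, NULL);
    if (pid == -1) {
        perror("clone(CLONE_NEWPID)");   /* usually EPERM without root */
        return 1;
    }

    /* In the parent's namespace the same task has an ordinary PID. */
    printf("parent: child's PID in my namespace   = %d\n", pid);
    waitpid(pid, NULL, 0);
    free(stack);
    return 0;
}
```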
The kernel_clone() function (previously do_fork()) orchestrates the entire creation process. Let's trace the critical path:
```c
pid_t kernel_clone(struct kernel_clone_args *args)
{
    struct completion vfork;
    struct task_struct *p;
    struct pid *pid;
    int trace = 0;
    pid_t nr;

    /* Step 1: Copy the process (the heavy lifting) */
    p = copy_process(NULL, trace, NUMA_NO_NODE, args);
    if (IS_ERR(p))
        return PTR_ERR(p);

    /* Step 2: Get the PID (in the appropriate namespace) */
    pid = get_task_pid(p, PIDTYPE_PID);
    nr = pid_vnr(pid);                   /* Virtual PID number */

    /* Step 3: Handle vfork - the parent must wait for the child */
    if (args->flags & CLONE_VFORK) {
        /* Parent will block until the child calls exec() or exit() */
        p->vfork_done = &vfork;
        init_completion(&vfork);
    }

    /* Step 4: Wake up the new task */
    wake_up_new_task(p);

    /* Step 5: If vfork, wait for the child to exec or exit */
    if (args->flags & CLONE_VFORK)
        wait_for_vfork_done(p, &vfork);

    /* Return the child's PID to the parent */
    return nr;
}
```

kernel_clone() returns the child's PID to the parent. But how does the child return 0 from fork()? The answer is in copy_thread(): it sets up the child's registers so that when the child is first scheduled, it returns 0 from the syscall. Parent and child execute the same return path but see different values!
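A tiny userspace program makes both return values visible; nothing here is kernel-specific, it is just the classic fork pattern.

```c
/* The "one call, two returns" behaviour set up by copy_thread(). */
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();              /* one call, two returns */

    if (pid == 0) {
        /* Child: its registers were set up to return 0 from the syscall. */
        printf("child:  fork() returned 0, my PID is %d\n", getpid());
        return 0;
    }

    /* Parent: kernel_clone() handed back the child's (virtual) PID. */
    printf("parent: fork() returned %d\n", pid);
    waitpid(pid, NULL, 0);
    return 0;
}
```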
The copy_process() function is where the actual work happens. It allocates and initializes every component of the new task:
```c
static struct task_struct *copy_process(struct pid *pid, int trace, int node,
                                        struct kernel_clone_args *args)
{
    struct task_struct *p;
    int retval;
    unsigned long clone_flags = args->flags;

    /* Validate flag combinations */
    retval = -EINVAL;
    if ((clone_flags & CLONE_THREAD) && !(clone_flags & CLONE_SIGHAND))
        goto bad_fork;                   /* Threads must share signals */

    /* Allocate task_struct and kernel stack */
    p = dup_task_struct(current, node);
    if (!p)
        goto bad_fork;

    /* Initialize scheduler data */
    retval = sched_fork(clone_flags, p);

    /* Copy/share each subsystem based on flags */
    retval = copy_files(clone_flags, p);       /* File descriptors */
    retval = copy_fs(clone_flags, p);          /* Filesystem context */
    retval = copy_sighand(clone_flags, p);     /* Signal handlers */
    retval = copy_signal(clone_flags, p);      /* Signal state */
    retval = copy_mm(clone_flags, p);          /* Memory (COW!) */
    retval = copy_namespaces(clone_flags, p);  /* Namespaces */
    retval = copy_thread(p, args);             /* Registers, stack */

    /* Allocate the PID */
    pid = alloc_pid(p->nsproxy->pid_ns_for_children, ...);
    p->pid = pid_nr(pid);

    /* Set up parent-child relationships */
    p->real_parent = current;                  /* or as flags dictate */
    list_add_tail(&p->sibling, &p->real_parent->children);

    return p;
}
```

A naive implementation of fork() would copy the entire address space—potentially gigabytes of memory. Linux uses Copy-on-Write (COW) to make fork() nearly instantaneous regardless of process size.
```c
/* copy_mm() - Memory descriptor handling */
static int copy_mm(unsigned long clone_flags, struct task_struct *tsk)
{
    struct mm_struct *mm, *oldmm;

    oldmm = current->mm;
    if (!oldmm)
        return 0;                        /* Kernel thread, no mm */

    /* CLONE_VM: Share the mm_struct (threads) */
    if (clone_flags & CLONE_VM) {
        mmget(oldmm);                    /* Increment reference count */
        tsk->mm = oldmm;
        tsk->active_mm = oldmm;
        return 0;
    }

    /* Process fork: Create a COW copy */
    mm = dup_mm(tsk, current->mm);
    tsk->mm = mm;
    tsk->active_mm = mm;
    return 0;
}

/* dup_mm() creates the COW mappings.
 *
 * COW works by:
 * 1. Parent and child share the SAME physical pages
 * 2. All writable pages are marked READ-ONLY in both
 * 3. On the first write, a page fault occurs
 * 4. The fault handler copies the page and gives the writer a private copy
 * 5. Only actually-modified pages ever get copied
 *
 * Result: fork() copies page TABLES (small), not pages (large).
 * Most pages are never modified, so they are never copied.
 */
```

COW makes fork() fast but can cause unexpected latency later. A process forked from a 10GB parent might run quickly for a while, then hit severe page fault storms when it starts modifying data. Redis's background save (BGSAVE) is a famous example—forking is instant, but as the parent modifies keys, COW faults cause latency spikes.
| Operation | Cost | Notes |
|---|---|---|
| Allocate task_struct | O(1) | Slab allocation, very fast |
| Copy page tables | O(n) | n = number of page table pages, not memory pages |
| Allocate PID | O(1) | IDR allocation |
| First write to shared page | O(1) + page copy | COW fault, deferred cost |
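The deferred cost in the last row is easy to observe from userspace. The sketch below is illustrative (the 256 MiB buffer size is arbitrary): fork() returns almost immediately even though the parent owns a large heap, while the child's first pass of writes pays for the page copies.

```c
/* Sketch: fork() is fast; the COW copies are paid for on first write. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/wait.h>
#include <unistd.h>

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
    size_t size = 256UL << 20;           /* 256 MiB, arbitrary for the demo */
    char *buf = malloc(size);
    memset(buf, 1, size);                /* populate the pages in the parent */

    double t0 = now_sec();
    pid_t pid = fork();                  /* copies page tables, not pages */

    if (pid == 0) {
        printf("child:  fork() visible after  %.4f s\n", now_sec() - t0);
        double t1 = now_sec();
        memset(buf, 2, size);            /* every write faults in a COW copy */
        printf("child:  touching all pages    %.4f s\n", now_sec() - t1);
        _exit(0);
    }
    printf("parent: fork() returned after  %.4f s\n", now_sec() - t0);
    waitpid(pid, NULL, 0);
    free(buf);
    return 0;
}
```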
The dup_task_struct() function handles the critical job of allocating memory for the new task and its kernel stack:
```c
static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
{
    struct task_struct *tsk;
    unsigned long *stack;
    int err;

    /* Allocate the task_struct from the task slab cache */
    tsk = alloc_task_struct_node(node);
    if (!tsk)
        return NULL;

    /* Allocate the kernel stack (typically 2 or 4 pages) */
    stack = alloc_thread_stack_node(tsk, node);
    if (!stack)
        goto free_tsk;

    /* Copy the parent's task_struct as a starting point */
    err = arch_dup_task_struct(tsk, orig);

    /* Set the new stack pointer */
    tsk->stack = stack;

    /* Reset fields that must be fresh */
    tsk->stack_canary = get_random_canary();
    refcount_set(&tsk->usage, 2);        /* 1 for the thread, 1 for the return value */

    /* Clear statistics */
    tsk->utime = tsk->stime = 0;

    /* Set up thread_info in the stack */
    setup_thread_stack(tsk, orig);

    return tsk;
}

/*
 * Why slab allocation?
 * - task_struct is ~7KB, awkward for the page allocator
 * - Slab caches keep free task_structs ready
 * - SLAB_TYPESAFE_BY_RCU allows RCU-safe task lookup
 * - Allocation is essentially O(1)
 */
```

Notice the stack_canary initialization. This per-task random value backs the kernel's stack protector: protected functions copy it into their stack frame and verify it on return. If a buffer overflow corrupts the frame, the canary no longer matches and the kernel detects the corruption before the attacker can hijack control flow.
After copy_process() creates the new task, it's not yet runnable. The wake_up_new_task() function inserts it into the scheduler:
```c
void wake_up_new_task(struct task_struct *p)
{
    struct rq *rq;
    unsigned long flags;

    /* Mark the task as runnable */
    p->state = TASK_RUNNING;

    /* Select a CPU for the new task (the scheduler decides) */
    __set_task_cpu(p, select_task_rq(p, ...));

    /* Lock the runqueue and enqueue */
    rq = task_rq_lock(p, &flags);

    /* Add to the runqueue via the task's scheduling class */
    activate_task(rq, p, ENQUEUE_NOCLOCK);

    /* Check whether the new task should preempt the current one */
    check_preempt_curr(rq, p, WF_FORK);

    task_rq_unlock(rq, p, &flags);
}

/*
 * Key decisions here:
 * 1. Which CPU should run the new task?
 *    - Usually the parent's CPU, for cache affinity
 *    - But it may move if the parent's CPU is overloaded
 *
 * 2. Should the new task preempt the parent immediately?
 *    - CFS often runs the child first (fork pattern: if (fork()) wait)
 *    - This avoids COW faults when the child execs immediately
 */
```

You now understand how Linux creates processes—from system call through kernel_clone() to a running task. Next, we'll explore the scheduling classes that determine how these tasks compete for CPU time.