From a process's perspective, it has exclusive access to a vast, contiguous memory space—typically spanning 128 TB or more of virtual addresses on a 64-bit system. It can allocate memory, map files, and execute code as if no other process existed. This illusion of isolation and abundance is created by the operating system's memory management system.
But this illusion requires meticulous bookkeeping. The kernel must track:

- Which regions of the virtual address space are valid, and with what permissions
- Where each virtual page actually lives: a physical frame, a file on disk, or swap
- How much memory the process is consuming, for accounting and monitoring
- What resource limits apply, and whether an allocation would exceed them
All of this information lives in the PCB—specifically, in the memory management information structures that describe the process's address space.
By the end of this page, you will understand how the PCB represents a process's memory: the memory descriptor structure, virtual memory areas (VMAs), page table pointers, memory statistics, and resource limits. You'll see how fork() creates shared mappings, how mmap() adds regions, and how the kernel enforces memory protection.
The memory descriptor (called mm_struct in Linux, the VAD tree rooted in the EPROCESS structure on Windows, or vm_map in macOS) is the top-level structure that describes a process's entire virtual address space. It's referenced from the PCB and contains or points to everything the kernel needs for memory management.
What the Memory Descriptor Contains:
```c
// Linux Memory Descriptor (simplified from include/linux/mm_types.h)

struct mm_struct {
    // Virtual Memory Areas
    struct vm_area_struct *mmap;        // List of VMAs
    struct rb_root mm_rb;               // Red-black tree of VMAs for fast lookup
    struct vm_area_struct *mmap_cache;  // Last accessed VMA (cache)

    // Address Space Layout
    unsigned long mmap_base;            // Base address for mmap() allocations
    unsigned long task_size;            // Size of user address space
    unsigned long highest_vm_end;       // Highest VMA end address

    // Page Tables
    pgd_t *pgd;                         // Pointer to top-level page table (PGD/PML4)

    // Reference Counting
    atomic_t mm_users;                  // Number of users (thread count)
    atomic_t mm_count;                  // Number of references (includes kernel)

    // Memory Statistics
    unsigned long total_vm;             // Total pages mapped (virtual)
    unsigned long locked_vm;            // Pages locked in memory
    unsigned long pinned_vm;            // Pages pinned (can't be swapped)
    unsigned long data_vm;              // Data + stack pages
    unsigned long exec_vm;              // Executable pages
    unsigned long stack_vm;             // Stack pages

    // Limits
    unsigned long def_stack_guard_gap;  // Gap between stack and mmap

    // Code and Data Boundaries
    unsigned long start_code, end_code; // Text segment
    unsigned long start_data, end_data; // Data segment
    unsigned long start_brk, brk;       // Heap (start and current end)
    unsigned long start_stack;          // Stack start
    unsigned long arg_start, arg_end;   // Command line arguments
    unsigned long env_start, env_end;   // Environment variables

    // Synchronization
    struct rw_semaphore mmap_sem;       // Protects VMA list/tree
    spinlock_t page_table_lock;         // Protects page tables

    // Architecture-Specific
    mm_context_t context;               // CPU-specific context (ASID, etc.)

    // ... many more fields in actual kernel
};

// In task_struct (PCB):
struct task_struct {
    // ... other fields
    struct mm_struct *mm;        // Memory descriptor (user address space)
    struct mm_struct *active_mm; // Currently active address space
    // Kernel threads have mm = NULL but need active_mm for context switch
    // User processes: mm == active_mm
    // ... other fields
};
```

Threads within a process share the same mm_struct—they have the same address space. The mm_users count tracks how many threads reference this mm_struct. When a thread is created with CLONE_VM, it doesn't get a new mm_struct; it shares the parent's. This is why threads see each other's memory changes instantly.
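You can see this sharing from user space. Here's a minimal sketch (not kernel code) using POSIX threads: both threads run inside one mm_struct, so a plain store by one thread is visible to the other with no copying or message passing.

```c
// Sketch: two POSIX threads share one mm_struct, so a store by one
// thread is immediately visible to the other with no copying or IPC.
// Build with: gcc -pthread demo.c
#include <pthread.h>
#include <stdio.h>

static int shared_value = 0;  // one copy, in the shared data segment

static void *writer(void *arg) {
    shared_value = 42;        // same address space: no message passing
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, writer, NULL);
    pthread_join(t, NULL);
    printf("main thread sees shared_value = %d\n", shared_value);
    return 0;
}
```

Contrast this with fork(), covered below, where the child gets its own mm_struct and a write like this would stay private.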
A process's address space isn't uniformly mapped—it consists of discrete regions with different properties. Each region is represented by a Virtual Memory Area (VMA) structure.
What a VMA Represents:
A VMA describes a contiguous region of virtual addresses with uniform characteristics:
```c
// Linux VMA Structure (simplified from include/linux/mm_types.h)

struct vm_area_struct {
    // Address Range
    unsigned long vm_start;    // Start virtual address (inclusive)
    unsigned long vm_end;      // End virtual address (exclusive)

    // Linkage
    struct vm_area_struct *vm_next, *vm_prev; // Sorted list by address
    struct rb_node vm_rb;      // Red-black tree node (for fast lookup)

    // Owning mm_struct
    struct mm_struct *vm_mm;   // Address space this VMA belongs to

    // Protection and Flags
    pgprot_t vm_page_prot;     // Page table protection bits
    unsigned long vm_flags;    // VMA flags (see below)

    // Backing Store
    struct file *vm_file;      // File being mapped (NULL for anon)
    unsigned long vm_pgoff;    // Offset in file (in pages)

    // Operations
    const struct vm_operations_struct *vm_ops; // VMA-specific handlers

    // Anonymous Memory
    struct anon_vma *anon_vma; // For copy-on-write handling

    // ... more fields for special cases
};

// VMA Flags (vm_flags) - from include/linux/mm.h
#define VM_READ       0x00000001  // Readable
#define VM_WRITE      0x00000002  // Writable
#define VM_EXEC       0x00000004  // Executable
#define VM_SHARED     0x00000008  // Shared (vs. private/COW)
#define VM_GROWSDOWN  0x00000100  // Stack: can grow toward lower addr
#define VM_GROWSUP    0x00000200  // Can grow toward higher addr
#define VM_DENYWRITE  0x00000800  // Deny write to file
#define VM_LOCKED     0x00002000  // Locked in memory (mlock)
#define VM_IO         0x00004000  // Memory-mapped I/O
#define VM_DONTCOPY   0x00020000  // Don't copy on fork
#define VM_DONTEXPAND 0x00040000  // Cannot expand (mremap)
#define VM_HUGETLB    0x00400000  // Huge TLB pages

// Example VMA for text (code) segment:
//   vm_start = 0x400000
//   vm_end   = 0x401000
//   vm_flags = VM_READ | VM_EXEC | VM_DENYWRITE
//   vm_file  = /path/to/executable
//   vm_pgoff = 0 (starts at beginning of file)
```

Viewing VMAs: /proc/[pid]/maps
Linux exposes VMAs through the /proc filesystem. Each line represents one VMA:
```
$ cat /proc/self/maps
# Address range             Perms Offset   Dev   Inode  Pathname
00400000-00452000           r-xp  00000000 08:01 123456 /usr/bin/cat
00651000-00652000           r--p  00051000 08:01 123456 /usr/bin/cat
00652000-00653000           rw-p  00052000 08:01 123456 /usr/bin/cat
00e54000-00e75000           rw-p  00000000 00:00 0      [heap]
7f6c88000000-7f6c88021000   rw-p  00000000 00:00 0
7f6c8c000000-7f6c8c1c0000   r-xp  00000000 08:01 789012 /lib/x86_64-linux-gnu/libc.so.6
...
7ffc8ba00000-7ffc8ba21000   rw-p  00000000 00:00 0      [stack]
7ffc8bb0c000-7ffc8bb10000   r--p  00000000 00:00 0      [vvar]
7ffc8bb10000-7ffc8bb11000   r-xp  00000000 00:00 0      [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

# Columns explained:
# 1. Address range (start-end in hex)
# 2. Permissions: r=read, w=write, x=execute, p=private, s=shared
# 3. File offset (for file-backed mappings)
# 4. Device (major:minor)
# 5. Inode
# 6. Pathname (or [heap], [stack], [vdso], empty for anon mmap)
```

| VMA Type | Permissions | Backing | Purpose |
|---|---|---|---|
| Text (Code) | r-xp | Executable file | Program instructions (read-only, executable) |
| Data | rw-p | Executable file | Initialized global/static variables |
| BSS | rw-p | Anonymous | Uninitialized global/static variables |
| Heap | rw-p | Anonymous | Dynamic allocation (malloc/new) |
| Stack | rw-p | Anonymous | Function call stack |
| Shared Library Code | r-xp | Library file | Shared code (read-only, shared) |
| Shared Library Data | rw-p | Library file | Per-process library data |
| mmap (file) | varies | File | Memory-mapped file |
| mmap (anon) | varies | Anonymous | Anonymous memory allocation |
| vDSO | r-xp | Kernel | Fast system call interface |
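The table above maps directly onto what mmap() produces. The following sketch (assuming a Linux system with /proc mounted) creates a four-page anonymous mapping, then dumps /proc/self/maps so you can find the new anonymous rw-p VMA containing the printed address:

```c
// Sketch: create an anonymous mapping and observe the resulting VMA.
// Assumes Linux with /proc mounted.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void) {
    size_t len = 4 * 4096;  // four pages
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    printf("new VMA starts at %p\n", p);

    // Ask the kernel for its view of our address space; the address
    // above falls inside one of the anonymous "rw-p ... 00:00 0" lines.
    char cmd[64];
    snprintf(cmd, sizeof(cmd), "cat /proc/%d/maps", (int)getpid());
    system(cmd);

    munmap(p, len);
    return 0;
}
```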
The page tables translate virtual addresses to physical addresses. Each process has its own page table hierarchy, and the pointer to the top-level table is stored in the memory descriptor (and loaded into a CPU register like CR3 during context switches).
Multi-Level Page Tables (x86-64):
On x86-64 with 4-level paging, a 48-bit virtual address is translated using:

- Bits 47-39: index into the PML4 (Page Map Level 4), pointed to by CR3
- Bits 38-30: index into the PDPT (Page Directory Pointer Table)
- Bits 29-21: index into the PD (Page Directory)
- Bits 20-12: index into the PT (Page Table)
- Bits 11-0: byte offset within the 4 KB page
```c
// x86-64 Page Table Entry Format

/*
 * A 64-bit Page Table Entry (PTE):
 *
 * Bit  63:    NX (No Execute) - if 1, page is not executable
 * Bits 62-52: Available for OS use
 * Bits 51-12: Physical frame number (40 bits -> 52-bit physical address)
 * Bits 11-9:  Available for OS use
 * Bit  8:     G (Global) - TLB not flushed on CR3 change
 * Bit  7:     PAT (Page Attribute Table)
 * Bit  6:     D (Dirty) - page has been written
 * Bit  5:     A (Accessed) - page has been read or written
 * Bit  4:     PCD (Page Cache Disable)
 * Bit  3:     PWT (Page Write Through)
 * Bit  2:     U/S (User/Supervisor) - if 1, accessible from user mode
 * Bit  1:     R/W (Read/Write) - if 1, page is writable
 * Bit  0:     P (Present) - if 0, page fault on access
 */

typedef unsigned long pte_t;

// Macros to extract/check PTE fields
#define PTE_PRESENT    (1UL << 0)
#define PTE_RW         (1UL << 1)
#define PTE_USER       (1UL << 2)
#define PTE_ACCESSED   (1UL << 5)
#define PTE_DIRTY      (1UL << 6)
#define PTE_NX         (1UL << 63)
#define PTE_FRAME_MASK 0x000FFFFFFFFFF000UL

static inline bool pte_present(pte_t pte) {
    return pte & PTE_PRESENT;
}

static inline unsigned long pte_pfn(pte_t pte) {
    return (pte & PTE_FRAME_MASK) >> 12;
}

static inline bool pte_write(pte_t pte) {
    return pte & PTE_RW;
}

// The kernel uses these to check permissions:
// - If PTE_PRESENT is 0: Page fault (page not in memory)
// - If PTE_RW is 0 and write attempted: Protection fault
// - If PTE_USER is 0 and accessed from user mode: Protection fault
// - If PTE_NX is 1 and instruction fetch: Protection fault
```

A full page table hierarchy for a 48-bit address space could require gigabytes of memory. In practice, most entries are not present—the hierarchy is sparse and only populated on demand (demand paging). Huge pages (2 MB or 1 GB) reduce table overhead by using fewer levels.
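To make the four-level walk concrete, here's a small standalone sketch (the example address is arbitrary) that slices a 48-bit virtual address into the four 9-bit table indices and the 12-bit page offset, exactly the decomposition the MMU performs at each level:

```c
// Sketch: slice a 48-bit x86-64 virtual address into the four 9-bit
// table indices and the 12-bit page offset used during translation.
#include <stdio.h>

int main(void) {
    unsigned long va = 0x00007f6c8c1b0a48UL;      // arbitrary user address

    unsigned long offset = va & 0xFFFUL;          // bits 11:0
    unsigned long pt     = (va >> 12) & 0x1FFUL;  // bits 20:12 -> PT index
    unsigned long pd     = (va >> 21) & 0x1FFUL;  // bits 29:21 -> PD index
    unsigned long pdpt   = (va >> 30) & 0x1FFUL;  // bits 38:30 -> PDPT index
    unsigned long pml4   = (va >> 39) & 0x1FFUL;  // bits 47:39 -> PML4 index

    printf("PML4=%lu PDPT=%lu PD=%lu PT=%lu offset=0x%lx\n",
           pml4, pdpt, pd, pt, offset);
    return 0;
}
```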
The memory descriptor tracks various statistics about memory usage. These are used for resource accounting, limits enforcement, and monitoring tools like top and ps.
| Statistic | Field | Meaning | Where Visible |
|---|---|---|---|
| Virtual Size | total_vm | Total pages in address space (mapped, not necessarily resident) | VSZ/VIRT in ps/top |
| Resident Set Size | rss | Pages actually in physical memory now | RSS/RES in ps/top |
| Shared Memory | shared_vm | Pages shared with other processes | SHR in top |
| Locked Memory | locked_vm | Pages locked (not swappable) | mlock() accounting |
| Code Size | exec_vm | Executable pages (text) | CODE in some tools |
| Data Size | data_vm | Data + stack pages | DATA in some tools |
| Stack Size | stack_vm | Stack pages | Stack limit tracking |
| Swap Usage | (external) | Pages on swap device | SWAP in top |
```
# Viewing process memory statistics

# Using ps
ps -o pid,vsz,rss,sz,command -p $$
#   PID   VSZ  RSS   SZ COMMAND
# 12345 25000 5000 6250 bash

# Using /proc
cat /proc/$$/status | grep -E "^(Vm|Rss)"
# VmPeak:   26000 kB   # Peak virtual memory size
# VmSize:   25000 kB   # Current virtual memory size
# VmLck:        0 kB   # Locked memory
# VmPin:        0 kB   # Pinned memory
# VmHWM:     5500 kB   # Peak resident set size
# VmRSS:     5000 kB   # Resident set size
# RssAnon:   2000 kB   # Anonymous RSS
# RssFile:   3000 kB   # File-backed RSS
# RssShmem:     0 kB   # Shared memory RSS
# VmData:    1500 kB   # Data segment size
# VmStk:      136 kB   # Stack size
# VmExe:      900 kB   # Text (code) size
# VmLib:     2000 kB   # Shared library size
# VmPTE:       64 kB   # Page table size
# VmSwap:       0 kB   # Swap usage

# Memory maps summary
cat /proc/$$/smaps_rollup
# Shows aggregated memory info without per-VMA detail

# Detailed per-VMA memory info
cat /proc/$$/smaps | head -30
# Shows RSS, PSS, Shared/Private pages per VMA
```

VSZ (Virtual Size) includes all mapped memory—even pages never accessed or swapped out. RSS (Resident Set Size) is the actual physical memory currently used. A process might have 1 GB VSZ but only 50 MB RSS if most pages aren't touched. For capacity planning, RSS (or PSS, which divides shared pages proportionally among the processes using them) is more meaningful.
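The VSZ-versus-RSS distinction is easy to demonstrate with the standard getrusage() call (on Linux, ru_maxrss is reported in kilobytes). In this sketch, a large malloc() barely moves peak RSS because the pages are mapped but untouched; writing to them is what makes them resident:

```c
// Sketch: peak RSS responds to touching pages, not to mapping them.
// On Linux, ru_maxrss is reported in kilobytes.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

static long peak_rss_kb(void) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_maxrss;
}

int main(void) {
    size_t len = 64 * 1024 * 1024;  // 64 MB
    printf("start:          %ld kB\n", peak_rss_kb());

    char *buf = malloc(len);        // grows VSZ; pages not yet resident
    if (!buf) return 1;
    printf("after malloc:   %ld kB\n", peak_rss_kb());

    memset(buf, 1, len);            // faults every page in: RSS grows
    printf("after touching: %ld kB\n", peak_rss_kb());

    free(buf);
    return 0;
}
```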
Process creation and program execution fundamentally involve memory management. Understanding how fork() and exec() manipulate the memory descriptor is crucial.
fork(): Creating a Copy of the Address Space
When a process calls fork(), the child gets a "copy" of the parent's address space. But physical copying would be slow and wasteful (many pages are never modified). Instead, the kernel uses Copy-on-Write (COW):
- The child gets its own mm_struct and page tables; the mm_struct is a new structure, but its VMAs initially point to the same physical pages as the parent's
- Writable private pages are marked read-only in both parent and child, and each shared page's reference count is incremented
- When either process writes to such a page, the CPU faults, and the kernel copies the page so the writer gets its own private copy

This makes fork() fast—only metadata is copied, not actual memory pages.
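Before looking at the kernel side, here's a user-space sketch of COW in action: the child's write to a shared global triggers a page fault, the kernel copies the page, and the parent's value is untouched.

```c
// Sketch: copy-on-write after fork(). Parent and child initially share
// this page; the child's store forces the kernel to copy it.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

int value = 100;  // sits in a private, writable data page

int main(void) {
    pid_t pid = fork();
    if (pid < 0) { perror("fork"); return 1; }

    if (pid == 0) {          // child
        value = 200;         // write fault -> kernel copies the page
        printf("child  sees %d\n", value);
        exit(0);
    }

    waitpid(pid, NULL, 0);
    printf("parent sees %d\n", value);  // still 100: pages diverged
    return 0;
}
```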
```c
// Simplified fork() memory handling

struct mm_struct *dup_mm(struct task_struct *parent) {
    struct mm_struct *mm;

    // Allocate new memory descriptor
    mm = allocate_mm();
    if (!mm)
        return NULL;

    // Copy basic mm_struct fields
    memcpy(mm, parent->mm, sizeof(*mm));

    // Allocate new page table root (PGD/PML4)
    mm->pgd = pgd_alloc(mm);
    if (!mm->pgd) {
        free_mm(mm);
        return NULL;
    }

    // Set reference counts
    atomic_set(&mm->mm_users, 1);
    atomic_set(&mm->mm_count, 1);

    // Initialize synchronization
    init_rwsem(&mm->mmap_sem);

    // Copy all VMAs with COW semantics
    if (dup_mmap(mm, parent->mm) < 0) {
        free_pgtables(mm);
        free_mm(mm);
        return NULL;
    }

    return mm;
}

int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm) {
    struct vm_area_struct *mpnt, *tmp;

    // Iterate through parent's VMAs
    for (mpnt = oldmm->mmap; mpnt; mpnt = mpnt->vm_next) {
        // Allocate new VMA structure
        tmp = vm_area_dup(mpnt);
        if (!tmp)
            return -ENOMEM;

        // Link into new mm
        tmp->vm_mm = mm;
        insert_vm_struct(mm, tmp);

        // Copy page table entries with COW marking
        copy_page_range(mm, oldmm, tmp);
    }
    return 0;
}

int copy_page_range(...) {
    // For each PTE in the range:
    // 1. If writable and not shared:
    //    - Clear write bit in parent's PTE
    //    - Copy PTE to child's page table
    //    - Increment page reference count
    //    - Both PTEs now read-only (COW)
    // 2. If read-only or shared:
    //    - Just copy PTE (share the page)
}
```

exec(): Replacing the Address Space
When a process calls exec(), its entire address space is replaced:

- All existing VMAs are unmapped and their page tables torn down
- New VMAs are created for the new program's text, data, and BSS segments, backed by the executable file
- A fresh stack is set up containing the new argv and environment
- The saved instruction pointer is set to the new program's entry point
The mm_struct is either reused (after clearing) or replaced entirely.
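The replacement is easy to observe from user space. In this sketch (using /bin/echo as the new program), the code after execv() runs only if the call fails, because on success the text segment containing it has already been unmapped:

```c
// Sketch: execv() replaces every VMA of this process with /bin/echo's.
#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("pid %d: about to replace my address space\n", (int)getpid());

    char *argv[] = { "echo", "same pid, brand new address space", NULL };
    execv("/bin/echo", argv);

    perror("execv");  // reached only if exec failed
    return 1;
}
```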
The operating system enforces resource limits to prevent any process from consuming excessive memory. These limits are stored with the process and checked during memory allocation operations.
| Resource | Constant | Default | Purpose |
|---|---|---|---|
| Virtual Memory | RLIMIT_AS | Unlimited | Maximum address space size |
| Locked Memory | RLIMIT_MEMLOCK | 64 KB | Max memory lockable via mlock() |
| Stack Size | RLIMIT_STACK | 8 MB | Maximum stack segment size |
| Data Segment | RLIMIT_DATA | Unlimited | Maximum data segment size |
| Core Dump | RLIMIT_CORE | 0 or Unlimited | Maximum core file size |
| Resident Set | RLIMIT_RSS | Unlimited | Max resident set size (advisory) |
```
# Viewing and modifying resource limits

# Show all limits for current shell
ulimit -a
# -t: cpu time (seconds)         unlimited
# -f: file size (blocks)         unlimited
# -d: data seg size (kbytes)     unlimited
# -s: stack size (kbytes)        8192
# -c: core file size (blocks)    0
# -m: resident set size (kbytes) unlimited
# -u: processes                  63636
# -n: file descriptors           1024
# -l: locked memory (kbytes)     64
# -v: virtual memory (kbytes)    unlimited

# Show specific limit (stack size)
ulimit -s
# 8192

# Set stack limit (soft limit)
ulimit -s 16384

# The kernel tracks limits in struct rlimit:
# struct rlimit {
#     rlim_t rlim_cur;  // Soft limit (current)
#     rlim_t rlim_max;  // Hard limit (ceiling)
# };

# View limits via /proc
cat /proc/self/limits | head -10
# Limit                Soft Limit  Hard Limit  Units
# Max cpu time         unlimited   unlimited   seconds
# Max file size        unlimited   unlimited   bytes
# Max data size        unlimited   unlimited   bytes
# Max stack size       8388608     unlimited   bytes
# Max core file size   0           unlimited   bytes
# Max resident set     unlimited   unlimited   bytes
# Max processes        63636       63636       processes
# Max open files       1024        1048576     files
# Max locked memory    65536       65536       bytes
# Max address space    unlimited   unlimited   bytes
```

When a process exceeds a memory limit: mmap() fails with ENOMEM, malloc() returns NULL (after the underlying mmap/brk fails), or the process receives SIGSEGV (on stack overflow). Container orchestrators like Kubernetes use cgroups for more sophisticated memory limits with OOM-killer integration.
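This failure mode can be reproduced directly with the getrlimit()/setrlimit() API. The following sketch (the 256 MB cap and 1 GB request are arbitrary values) lowers the soft RLIMIT_AS limit, then shows a large mmap() failing with ENOMEM:

```c
// Sketch: lower the soft RLIMIT_AS cap, then watch a large mmap()
// fail with ENOMEM.
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>

int main(void) {
    struct rlimit rl;
    getrlimit(RLIMIT_AS, &rl);
    printf("RLIMIT_AS soft=%llu hard=%llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);

    rl.rlim_cur = 256UL * 1024 * 1024;  // soft limit: 256 MB
    if (setrlimit(RLIMIT_AS, &rl) != 0) { perror("setrlimit"); return 1; }

    void *p = mmap(NULL, 1UL << 30, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);  // 1 GB request
    if (p == MAP_FAILED)
        printf("1 GB mmap failed as expected: %s\n", strerror(errno));
    else
        munmap(p, 1UL << 30);
    return 0;
}
```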
Different operating systems structure memory management information differently, though the concepts are similar.
Windows Memory Management Structures:

- Each process's EPROCESS block points to a tree of Virtual Address Descriptors (VADs), the Windows analogue of Linux's VMAs
- VADs are kept in a balanced tree keyed by virtual page number for fast lookup, as the structure below shows
- Per-process working sets play the role that the resident set plays on Linux
Key Differences:

- Windows keeps VADs only in a self-balancing tree; Linux maintains both a sorted list and a red-black tree of VMAs
- Windows distinguishes reserved address space from committed memory (MEM_RESERVE vs. MEM_COMMIT); Linux mappings are committed lazily through demand paging
- User code inspects regions with VirtualQueryEx() rather than by reading /proc/[pid]/maps
```c
// Windows VAD structure (conceptual)

typedef struct _MMVAD {
    union {
        LONG_PTR Balance : 2;
        struct _MMVAD *Parent;
    };
    struct _MMVAD *LeftChild;
    struct _MMVAD *RightChild;

    ULONG_PTR StartingVpn;     // Starting virtual page number
    ULONG_PTR EndingVpn;       // Ending virtual page number

    union {
        ULONG LongFlags;
        MMVAD_FLAGS VadFlags;  // Protection, state, type
    };

    // For file-backed VADs
    PCONTROL_AREA ControlArea;
    PFILE_OBJECT FileObject;

    // ... more fields
} MMVAD, *PMMVAD;

// Windows API for memory inspection
MEMORY_BASIC_INFORMATION mbi;
VirtualQueryEx(hProcess, address, &mbi, sizeof(mbi));
// Returns: BaseAddress, AllocationBase, AllocationProtect,
//          RegionSize, State (MEM_COMMIT/RESERVE/FREE),
//          Protect, Type (MEM_IMAGE/MAPPED/PRIVATE)
```

We've explored how the PCB tracks each process's virtual address space. From the memory descriptor to VMAs to page tables, these structures enable the illusion of isolated, abundant memory for every process. Let's consolidate the key insights:

- The memory descriptor (mm_struct) is the top-level view of a process's address space, referenced from the PCB
- VMAs describe contiguous regions with uniform permissions and backing
- Page tables, rooted at the pointer stored in the descriptor, translate virtual addresses to physical ones
- Statistics and rlimits stored with the process enable accounting and enforcement
- fork() duplicates the descriptor with copy-on-write; exec() replaces it outright
Module Complete:
This concludes our deep dive into the Process Control Block. We've examined:

- Process identification and credentials
- Process state and the transitions of the process lifecycle
- CPU context saved and restored at every context switch
- Memory management: the memory descriptor, VMAs, page tables, statistics, and limits
The PCB is the kernel's fundamental representation of a process. Every scheduling decision, every resource allocation, every context switch—all rely on the information stored in this critical data structure. Understanding the PCB provides a foundation for understanding all of process management.
You now have a comprehensive understanding of the Process Control Block—the kernel data structure that gives each process its identity. From identification to state, from CPU context to memory mapping, you understand how operating systems represent and manage the fundamental unit of execution.