Understanding Logical vs Physical Addresses

Level: Intermediate

Duration: 75 mins

Topic: Logical vs Physical Addresses

Logical Address Space

The Illusion Every Program Believes

Every program running on your computer operates under a grand illusion: it believes it has exclusive access to a vast, contiguous expanse of memory, starting from address zero and extending to the limits of architectural possibility. A 64-bit program perceives a theoretical address space of 16 exabytes, roughly a billion times the 16 GB of RAM in a typical desktop machine. This illusion is essential, not accidental. It's one of the most successful abstractions in computing history: the logical address space.

This abstraction liberates programmers from the complexities of physical memory management—from knowing exactly where data resides in hardware, from coordinating with other programs for memory access, from dealing with fragmented or non-contiguous physical storage. Understanding logical address space is fundamental to grasping how operating systems provide isolation, security, and the seamless execution of multiple programs simultaneously.

What You Will Learn

By the end of this page, you will understand the precise definition and properties of logical address space, how it differs fundamentally from physical memory, why this abstraction was invented, and how it enables the multiprogramming capabilities we take for granted in modern systems. You'll develop the conceptual foundation necessary for understanding address translation, memory protection, and virtual memory.

Definition and Formal Semantics

A logical address (also called a virtual address) is an address generated by the CPU during program execution. The collection of all logical addresses that a program can generate constitutes its logical address space.

More formally:

The logical address space of a process is the set of all addresses that the process can reference during execution, as perceived by the CPU when executing instructions within that process's context.

This definition carries several important implications that we must unpack carefully.

Key Properties of Logical Address Space

•CPU-Generated: Logical addresses are produced by the CPU as part of normal instruction execution. When a program references a variable x, the compiled code generates a logical address to access x's memory location.
•Process-Private: Each process has its own logical address space, completely isolated from other processes. Address 0x1000 in Process A refers to a different physical location (or no location at all) than address 0x1000 in Process B.
•Contiguous Appearance: From the process's perspective, its address space appears contiguous—a seamless range from the lowest to the highest address—regardless of how physical memory is actually organized.
•Size Independence from Physical Memory: The logical address space size is determined by the CPU architecture (e.g., 32 bits or 64 bits), not by the amount of physical memory installed. A 32-bit process has a 4 GB logical address space even on a machine with 8 GB of RAM.
•Abstract, Not Physical: Logical addresses have no direct correspondence to hardware memory chips. They exist in a conceptual space that must be translated to physical locations for actual memory access.

Terminology: Logical vs Virtual

The terms 'logical address' and 'virtual address' are often used interchangeably, though some texts distinguish them: 'logical' refers to the address from the CPU's perspective, while 'virtual' emphasizes the illusion of a larger-than-physical address space. In modern systems with virtual memory, the distinction is minimal. We'll use both terms to match different literature you may encounter.

Mathematical Representation:

For a system with n-bit logical addresses, the logical address space L is defined as:

L = {0, 1, 2, ..., 2ⁿ - 1}

For a 32-bit system: L = {0, 1, ..., 4,294,967,295} (4 GB of addressable locations)
For a 64-bit system: L = {0, 1, ..., 18,446,744,073,709,551,615} (16 EB theoretical maximum)

In practice, the usable logical address space is often smaller due to architectural constraints, reserved regions, and address space layout policies—but the conceptual size is defined by the address width.

The Historical Context: Why Logical Addresses Were Invented

The concept of logical addressing emerged from real engineering problems in early computing. Understanding this history illuminates why the abstraction takes its current form.

The Early Days: Absolute Addressing

In the earliest computers (1940s-1950s), programmers used absolute addresses—physical memory locations hardcoded into programs. If your program stored a variable at address 1000, that meant physical memory location 1000 on the hardware.

This approach had severe limitations:

Problems with Absolute Addressing

•Single Program Execution: Only one program could run at a time. Running a second program required either dedicating different memory regions to each program (wasteful) or manually rewriting addresses when loading different programs.
•No Relocation: Programs could only run at the specific addresses they were written for. Moving a program in memory required rewriting every address reference—an error-prone manual process.
•No Protection: Any program could access any memory location, including the operating system's memory or another program's data. A bug in one program could corrupt the entire system.
•Hardware Dependency: Programs were tied to specific hardware configurations. Adding more memory or changing memory layout required program modifications.

The Birth of Relocation: Base Registers

The first step toward logical addressing came with relocatable code and the base register (circa 1960s). The idea was simple but powerful:

1. Write programs as if they start at address 0
2. At load time, the OS places the program's starting physical address in a hardware base register
3. At run time, the hardware adds the base register to every memory reference
4. The program can now run at any physical location by changing the base register

This introduced a simple form of address translation:

Physical Address = Logical Address + Base Register

With this mechanism, two programs could coexist in memory by having different base values. In the layout below, Program A is loaded at physical address 0x10000 and Program B at 0x50000. Each program thinks it starts at address 0.

base_register_example.txt
Memory Layout with Base Register Relocation:
 
Physical Memory:
┌──────────────────────────────────────────────┐
│ 0x00000: Operating System                    │
├──────────────────────────────────────────────┤
│ 0x10000: Program A (Base = 0x10000)          │
│        Logical Addr 0x000 → Physical 0x10000 │
│        Logical Addr 0x100 → Physical 0x10100 │
├──────────────────────────────────────────────┤
│ 0x50000: Program B (Base = 0x50000)          │
│        Logical Addr 0x000 → Physical 0x50000 │
│        Logical Addr 0x100 → Physical 0x50100 │
├──────────────────────────────────────────────┤
│ 0x90000: Free Memory                         │
└──────────────────────────────────────────────┘
 
Both programs are written as if they start at address 0.
The hardware adds the base value during every memory access.
 
Context Switch:
- When switching from Program A to Program B,
- The OS updates the base register from 0x10000 to 0x50000
- All subsequent memory accesses are automatically relocated

The Evolution Continues:

Base register relocation was a crucial first step, but it still had limitations. The entire logical address space had to map to a contiguous physical region. As systems grew more complex, the need for more flexible mapping led to:

Segmentation (1960s-70s): Multiple base/limit pairs for different program regions
Paging (1960s-present): Fixed-size chunks with arbitrary mapping
Virtual Memory (1960s-present): Logical spaces larger than physical memory

Each evolution maintained the core principle: the logical address space as an abstraction layer between programs and physical memory.

The Genius of Abstraction

The shift from absolute to logical addressing exemplifies a key software engineering principle: solve problems through abstraction. Rather than making programs smarter about physical memory, make physical memory invisible to programs. This separation of concerns—program logic versus memory management—enabled decades of independent evolution in both areas.

Structure of the Logical Address Space

While a process's logical address space appears as a uniform range of addresses, it is actually organized into distinct regions with different purposes, access permissions, and lifetime characteristics. Understanding this structure is essential for systems programming and security analysis.

Logical Address Space Segments
Segment	Contents	Permissions	Growth Direction	Lifetime
Text (Code)	Compiled machine instructions	Read + Execute	Fixed size	Process lifetime
Data	Initialized global and static variables	Read + Write	Fixed size	Process lifetime
BSS	Uninitialized global/static variables (zeroed)	Read + Write	Fixed size	Process lifetime
Heap	Dynamically allocated memory (malloc/new)	Read + Write	Grows upward	Until freed or process exit
Stack	Function call frames, local variables	Read + Write	Grows downward	Until function returns
Kernel	OS code and data (protected)	Kernel only	N/A	System lifetime

The Stack-Heap Arrangement:

Notice that the stack grows downward (toward lower addresses) while the heap grows upward. This classic arrangement maximizes flexibility: the two dynamic regions can expand into the same gap, with collision occurring only when the combined allocation exceeds available address space.

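In a 64-bit system, the gap between stack and heap is astronomically large: with 48-bit virtual addressing, tens of terabytes of unmapped addresses separate them. Stack-heap collision is essentially impossible. A runaway stack hits its OS-imposed size limit (8 MB by default on many Linux systems) long before it could reach the heap, and the user portion of the address space is itself typically capped at something like 128 TB.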

Address Space Layout Randomization (ASLR):

Modern operating systems randomize the positions of segments within the logical address space on each program execution. This security measure makes it difficult for attackers to predict where code or data will be located, thwarting many exploitation techniques.

address_space_exploration.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

// Global variables - Data/BSS segments
int initialized_global = 42;       // Data segment
int uninitialized_global;          // BSS segment

void print_addresses(void) {
    // Local variable - Stack segment
    int stack_var = 100;

    // Dynamic allocation - Heap segment
    int *heap_ptr = malloc(sizeof(int));
    if (heap_ptr == NULL) {
        perror("malloc failed");
        return;
    }
    *heap_ptr = 200;

    printf("=== Logical Address Space Exploration ===\n\n");

    // Converting a function pointer to void* is not strict ISO C,
    // but it works on POSIX systems
    printf("Text Segment (Code):\n");
    printf("  Function print_addresses: %p\n\n", (void*)print_addresses);

    printf("Data Segment (Initialized Globals):\n");
    printf("  initialized_global:       %p\n\n", (void*)&initialized_global);

    printf("BSS Segment (Uninitialized Globals):\n");
    printf("  uninitialized_global:     %p\n\n", (void*)&uninitialized_global);

    printf("Heap Segment (Dynamic Allocation):\n");
    printf("  heap_ptr value:           %p\n\n", (void*)heap_ptr);

    printf("Stack Segment (Local Variables):\n");
    printf("  stack_var:                %p\n\n", (void*)&stack_var);

    // Demonstrate relative positions. Compare as integers: relational
    // comparison of pointers to unrelated objects is undefined in ISO C.
    printf("=== Address Comparison ===\n");
    printf("Stack is at higher addresses than Heap: %s\n",
           (uintptr_t)&stack_var > (uintptr_t)heap_ptr ? "Yes" : "No");
    printf("Heap is at higher addresses than BSS:   %s\n",
           (uintptr_t)heap_ptr > (uintptr_t)&uninitialized_global ? "Yes" : "No");
    printf("BSS is at higher addresses than Data:   %s\n",
           (uintptr_t)&uninitialized_global > (uintptr_t)&initialized_global ? "Yes" : "No");

    free(heap_ptr);
}

/*
 * Sample Output (addresses vary due to ASLR):
 *
 * === Logical Address Space Exploration ===
 *
 * Text Segment (Code):
 *   Function print_addresses: 0x55d3a2c00189
 *
 * Data Segment (Initialized Globals):
 *   initialized_global:       0x55d3a2e03010
 *
 * BSS Segment (Uninitialized Globals):
 *   uninitialized_global:     0x55d3a2e03014
 *
 * Heap Segment (Dynamic Allocation):
 *   heap_ptr value:           0x55d3a3a052a0
 *
 * Stack Segment (Local Variables):
 *   stack_var:                0x7ffeba4c01dc
 *
 * Notice: Stack addresses start with 0x7ff... (high addresses)
 *         Other segments start with 0x55... (lower addresses)
 */

int main(void) {
    print_addresses();
    return 0;
}

The Unmapped Regions

Not every address in the logical address space is valid. Attempting to access unmapped regions (gaps between segments, addresses beyond allocated ranges, or the null pointer at address 0) triggers a hardware exception, which the OS reports as a segmentation fault on Unix or an access violation on Windows. Unless the process has installed a handler, the OS terminates it. This is memory protection in action, enforced through the same translation mechanism that enables logical addressing.

Logical Addresses in Different Contexts

The concept of logical addressing manifests differently across various computing contexts. Understanding these variations reveals the universal importance of address abstraction.

Logical Addressing Across System Types
Context	Address Space Size	Translation Mechanism	Key Characteristics
32-bit Process	4 GB (2³² bytes)	Page tables + MMU	3 GB user / 1 GB kernel split common
64-bit Process	256 TB canonical (48-bit)	4-level page tables + MMU	Vast address space, sparse mapping
Java Virtual Machine	JVM heap size (configurable)	JVM internal + OS translation	Object references, not raw addresses
Web Browser JavaScript	ArrayBuffer size limits	V8/SpiderMonkey engine	Sandboxed, no direct memory access
Embedded Systems	Varies (often no MMU)	Direct or simple base+offset	Physical addressing may be used
GPU Computing	Device memory space	GPU-specific translation	Separate from CPU address space

32-bit vs 64-bit Address Spaces:

The transition from 32-bit to 64-bit computing represents a fundamental expansion in logical address space:

32-bit Systems (4 GB limit): With 2³² possible addresses, the entire logical address space is 4 GB. This became limiting as programs grew—especially considering that the OS often reserves 1-2 GB of this space for kernel mapping.
64-bit Systems (practical limits): While 64 bits theoretically address 16 EB, practical implementations use 48 bits (256 TB) or 57 bits (128 PB) of virtual addressing. This is vastly more than any current physical memory, allowing for sparse address spaces and memory-mapped files exceeding physical RAM.

The key insight: logical address space size is an architectural decision, independent of physical memory.

address_space_sizes.c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    printf("Pointer size on this system: %zu bytes (%zu bits)\n\n",
           sizeof(void*), sizeof(void*) * 8);

    printf("=== Theoretical Logical Address Space Sizes ===\n\n");

    // 32-bit system
    uint64_t addr_space_32 = 1ULL << 32;  // 2^32
    printf("32-bit address space:\n");
    printf("  Addresses: %" PRIu64 "\n", addr_space_32);
    printf("  Size: %.2f GB\n\n", addr_space_32 / (1024.0 * 1024.0 * 1024.0));

    // 48-bit (common 64-bit implementation)
    uint64_t addr_space_48 = 1ULL << 48;  // 2^48
    printf("48-bit address space (common x86-64):\n");
    printf("  Addresses: %" PRIu64 "\n", addr_space_48);
    printf("  Size: %.2f TB\n\n", addr_space_48 / (1024.0 * 1024.0 * 1024.0 * 1024.0));

    // 57-bit (Intel 5-level paging)
    uint64_t addr_space_57 = 1ULL << 57;  // 2^57
    printf("57-bit address space (5-level paging):\n");
    printf("  Addresses: %" PRIu64 "\n", addr_space_57);
    printf("  Size: %.2f PB\n\n",
           addr_space_57 / (1024.0 * 1024.0 * 1024.0 * 1024.0 * 1024.0));

    printf("Note: Full 64-bit would be 16 EB, but no current system implements this.\n");
    printf("The unused high bits are sign-extended, creating 'canonical' addresses.\n");

    return 0;
}

/*
 * Output on a 64-bit system:
 *
 * Pointer size on this system: 8 bytes (64 bits)
 *
 * === Theoretical Logical Address Space Sizes ===
 *
 * 32-bit address space:
 *   Addresses: 4294967296
 *   Size: 4.00 GB
 *
 * 48-bit address space (common x86-64):
 *   Addresses: 281474976710656
 *   Size: 256.00 TB
 *
 * 57-bit address space (5-level paging):
 *   Addresses: 144115188075855872
 *   Size: 128.00 PB
 *
 * Note: Full 64-bit would be 16 EB, but no current system implements this.
 * The unused high bits are sign-extended, creating 'canonical' addresses.
 */

Processes Without Hardware Translation

Some embedded systems and real-time operating systems run without an MMU. In these cases, 'logical addresses' may directly correspond to physical addresses, or simple software-based translation is used. The abstraction benefits are reduced, but simplicity and deterministic timing can be gained. This represents a tradeoff, not an invalidation of the logical address concept.

The Benefits of Logical Address Spaces

The logical address space abstraction provides numerous benefits that are foundational to modern computing. Each benefit addresses a specific problem that would otherwise require complex, error-prone solutions from application programmers.

Programmer Benefits

•Simple Mental Model: Programs assume contiguous memory starting from a known address. No need to coordinate with other programs or know physical layout.
•Consistent Environment: The same program binary runs regardless of physical memory configuration. A program compiled today runs on hardware with different RAM sizes.
•Pointer Arithmetic Works: Logical addresses form a numeric sequence, enabling pointer arithmetic, array indexing, and address calculations to work predictably.
•Debugging Simplicity: Addresses in debug output are consistent across runs (modulo ASLR), making it easier to reproduce and diagnose issues.

System Benefits

•Process Isolation: Each process has its own address space. One process cannot directly access another's memory, providing security and stability.
•Multiprogramming: Multiple programs run concurrently, each believing it has all of memory. Physical memory is divided without program knowledge.
•Memory Protection: The OS can enforce read/write/execute permissions on memory regions, preventing code injection and data corruption.
•Virtual Memory: Logical spaces can exceed physical memory. Pages are loaded on demand from disk, enabling programs larger than RAM to run.

The Sharing Paradox:

Interestingly, logical address spaces also enable controlled sharing—the opposite of isolation. By mapping multiple logical addresses (potentially in different processes) to the same physical memory, the OS enables:

Shared Libraries: Code for libc, graphics libraries, etc., exists once in physical memory but appears in every process's logical space.
Inter-Process Communication: Shared memory regions allow processes to exchange data efficiently without kernel involvement for each transfer.
Copy-on-Write (COW): After fork(), parent and child share pages until one writes, saving memory and time.

These capabilities would be extremely difficult to implement safely with physical addressing.

Resource Efficiency:

The logical address abstraction dramatically improves resource utilization:

Efficiency Gains Through Logical Addressing

•Sparse Address Spaces: A process can allocate a huge virtual array but only use part of it. Physical memory is allocated only for accessed pages—not for the entire virtual allocation.
•Demand Paging: Program code and data are loaded only when accessed. A large application may have much of its code never loaded into memory during typical use.
•Memory Mapping: Files can be mapped into the address space, letting the OS handle caching and disk I/O. Rarely-used pages stay on disk.
•Deduplication: Identical pages (common in containers or VMs) can share physical memory while maintaining separate logical appearances.

The Ultimate Portability Layer

Consider that the same ELF or PE binary can run on machines with 4 GB, 16 GB, or 256 GB of RAM, with different physical memory layouts, different amounts in use by other processes—all without recompilation. This extreme portability is a direct result of the logical address abstraction. The program's view of memory is invariant; only the mapping changes.

How Logical Addresses Are Generated

Understanding the origin of logical addresses reveals how the abstraction is maintained from source code through compilation to execution.

The Journey of an Address:

Source Code: Programmer references variables, functions, arrays by name
Compilation: Names become symbolic references in object files
Linking: Symbols are resolved to logical addresses within the address space
Loading: Relocations adjust addresses based on where the program loads
Execution: CPU generates logical addresses based on PC and operands

address_generation_stages.c
/* 
 * Address Generation Example: Following 'x' through the compilation pipeline
 */
 
// SOURCE CODE LEVEL
// Variable 'x' is a name - no address yet
int x = 42;
 
int main() {
    int y = x + 1;  // Reference to 'x'
    return y;
}
 
/*
 * AFTER COMPILATION (Assembly/Object Code):
 * 
 *     .data
 * x:                          ; 'x' is now a symbol
 *     .long   42
 * 
 *     .text
 * main:
 *     movl    x(%rip), %eax   ; Reference to symbol 'x' (relocatable)
 *     addl    $1, %eax
 *     ret
 * 
 * At this stage, 'x' is a symbol that will be resolved to
 * an actual address during linking.
 */
 
/*
 * AFTER LINKING (Executable):
 * 
 * Symbol table shows x is at logical address 0x404020:
 * 
 *     .data
 *     .org 0x404020
 *     .long   42
 * 
 *     .text
 *     .org 0x401000
 * main:
 *     movl    0x301a(%rip), %eax    ; RIP-relative: 0x401006 + 0x301a = 0x404020
 *     addl    $1, %eax              ; (0x401006 is the address of this instruction)
 *     ret
 * 
 * The linker assigned address 0x404020 to 'x' within the
 * process's logical address space.
 */
 
/*
 * DURING EXECUTION:
 * 
 * 1. CPU fetches instruction from logical address 0x401000
 * 2. Instruction encodes access to logical address 0x404020
 * 3. CPU presents 0x404020 to the MMU for translation
 * 4. MMU translates to a physical address (e.g., 0x1F3A7020; with 4 KB
 *    pages the low 12 bits, 0x020, are the page offset and carry over)
 * 5. Physical memory is accessed; value 42 is retrieved
 * 6. All of this happens transparently to the program
 */

Position-Independent Code (PIC):

Modern shared libraries use Position-Independent Code, which avoids hardcoded absolute addresses. Instead, addresses are computed relative to the current instruction pointer (RIP-relative on x86-64). This allows the same code to be loaded at different logical addresses in different processes—essential for ASLR and shared library efficiency.

Dynamic Address Computation:

Many logical addresses aren't determined until runtime:

•Stack Addresses: Local variables get addresses based on the current stack pointer, determined by the call sequence at runtime.
•Heap Addresses: malloc() returns addresses chosen by the memory allocator at runtime.
•mmap() Regions: The OS assigns addresses for memory-mapped files and regions.
•Computed Addresses: Array indexing (e.g., arr[i]) computes addresses from base + offset at runtime.

The Compiler's Promise

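For objects whose location is known at build time (functions, globals, and the layout of each stack frame), the compiler and linker emit addresses that fall within regions the loader will map. This is a contract of sorts: the toolchain generates legal addresses for well-defined programs, and the OS ensures those addresses map to actual memory. Addresses computed at run time carry no such guarantee. A bad index or stale pointer can name an unmapped address, and the OS signals an error when it is used.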

Logical Address Space Manipulation

The logical address space is not static—it can be expanded, contracted, and reorganized during program execution. Understanding how programs and operating systems manipulate logical address spaces is crucial for systems programming.

Address Space Manipulation Operations
Operation	Mechanism	Example API	Use Case
Expand Heap	brk/sbrk adjustment	sbrk(increment) / malloc()	Dynamic memory allocation
Memory Map	Create new mapping	mmap()	File I/O, shared memory, large allocations
Unmap Memory	Remove mapping	munmap()	Free large allocations, unload libraries
Protect Memory	Change permissions	mprotect()	JIT compilation, guard pages
Shared Memory	Map same pages	shm_open() + mmap()	Inter-process communication
Stack Expansion	Automatic on access	OS trap handler	Deep recursion, large local arrays

address_space_manipulation.c
#define _DEFAULT_SOURCE   // exposes MAP_ANONYMOUS under strict -std= modes on glibc
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>

void demonstrate_mmap(void) {
    printf("=== Address Space Manipulation with mmap ===\n\n");

    // Allocate a new region in our logical address space
    size_t size = 4096 * 1000;  // ~4 MB

    void *region = mmap(
        NULL,                   // Let OS choose the address
        size,                   // Size of mapping
        PROT_READ | PROT_WRITE, // Readable and writable
        MAP_PRIVATE | MAP_ANONYMOUS,  // Private, not file-backed
        -1,                     // No file descriptor
        0                       // No offset
    );

    if (region == MAP_FAILED) {
        perror("mmap failed");
        return;
    }

    printf("mmap allocated region at: %p\n", region);
    printf("Region size: %zu bytes\n", size);

    // The region is now part of our logical address space
    // But physical memory is allocated lazily (on first access)

    // Write to the first page of the region
    memset(region, 0xAB, 4096);  // Now a physical page backs the first 4 KB
    printf("Wrote to first page\n");

    // Change protection to read-only
    if (mprotect(region, size, PROT_READ) == 0) {
        printf("Changed region to read-only\n");
    }

    // Attempting to write now would cause SIGSEGV:
    // memset(region, 0, 4096);  // CRASH!

    // Clean up - remove from logical address space
    if (munmap(region, size) == 0) {
        printf("Unmapped region - address space shrunk\n");
    }

    // region is now an invalid address - accessing it would crash
}

void show_address_space_info(void) {
    printf("\n=== Current Address Space Info ===\n\n");

    // On Linux, we can read /proc/self/maps
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps) {
        char line[256];
        printf("%-20s %-5s %-8s %s\n",
               "Address Range", "Perm", "Offset", "Mapping");
        printf("%-20s %-5s %-8s %s\n",
               "-------------", "----", "------", "-------");

        int count = 0;
        while (fgets(line, sizeof(line), maps) && count < 15) {
            // Print the raw line: range, permissions, offset, device, inode, path
            printf("%s", line);
            count++;
        }
        if (count == 15) {
            printf("... (truncated)\n");
        }
        fclose(maps);
    } else {
        perror("fopen /proc/self/maps");
    }
}

int main(void) {
    demonstrate_mmap();
    show_address_space_info();
    return 0;
}

/*
 * Sample Output:
 *
 * === Address Space Manipulation with mmap ===
 *
 * mmap allocated region at: 0x7f5a3c000000
 * Region size: 4096000 bytes
 * Wrote to first page
 * Changed region to read-only
 * Unmapped region - address space shrunk
 *
 * === Current Address Space Info ===
 *
 * Address Range        Perm  Offset   Mapping
 * -------------        ----  ------   -------
 * 55e8a5c00000-55e8a5c01000 r--p 00000000 /home/user/a.out
 * 55e8a5c01000-55e8a5c02000 r-xp 00001000 /home/user/a.out
 * ...
 */

Address Space Limits

While logical address spaces can be vast, operating systems impose practical limits. On Linux, check /proc/sys/vm/overcommit_memory and ulimit -v. Excessive address space allocation (even without physical memory use) can be denied, and the kernel's virtual memory accounting may limit total mappings across all processes.

Summary: The Logical Address Space

We've established a comprehensive understanding of logical address space—the foundational abstraction that enables modern multiprogramming, memory protection, and virtual memory. Let's consolidate the key insights:

Key Takeaways

•Logical addresses are CPU-generated abstractions that represent memory locations from a process's perspective, independent of physical memory organization.
•Each process has its own logical address space, providing isolation by default with controlled sharing possible through intentional mapping.
•The address space is structured into segments (text, data, BSS, heap, stack) with different purposes, permissions, and lifetime characteristics.
•Historical evolution from absolute addressing through base registers to modern virtual memory systems solved progressively more complex multiprogramming challenges.
•Benefits include: programmer simplicity, process isolation, memory protection, efficient resource usage, portability, and the enabling of virtual memory.
•Logical addresses are generated through compilation, linking, and runtime computation, with the program unaware of eventual physical locations.
•Address spaces are dynamic—they can be expanded, contracted, protected, and remapped during execution through system calls.

What's Next:

Now that we understand logical address space—the abstraction processes use—we must examine its counterpart: physical address space. Physical addresses represent actual hardware locations. Understanding the physical layer is essential for understanding how translation bridges these two worlds and why memory management is necessary at all.

Page Complete

You now understand logical address space as a fundamental operating system abstraction. This knowledge is prerequisite for understanding address translation, the MMU, page tables, and virtual memory—all of which build upon the logical/physical distinction we've established. The journey into memory management continues with exploring the physical side of this duality.

1 / 5

Loading learning content...

Operating SystemsLogical vs Physical Addresses

Understanding Logical vs Physical Addresses

LevelIntermediate

Duration75 mins

TopicLogical vs Physical Addresses

1 / 5

Logical Address Space

The Illusion Every Program Believes

What You Will Learn

Definition and Formal Semantics

More formally:

The logical address space of a process is the set of all addresses that the process can reference during execution, as perceived by the CPU when executing instructions within that process's context.

This definition carries several important implications that we must unpack carefully.

Key Properties of Logical Address Space

•CPU-Generated: Logical addresses are produced by the CPU as part of normal instruction execution. When a program references a variable x, the compiled code generates a logical address to access x's memory location.
•Process-Private: Each process has its own logical address space, completely isolated from other processes. Address 0x1000 in Process A refers to a different physical location (or no location at all) than address 0x1000 in Process B.
•Contiguous Appearance: From the process's perspective, its address space appears contiguous—a seamless range from the lowest to the highest address—regardless of how physical memory is actually organized.
•Size Independence from Physical Memory: The logical address space size is determined by the CPU architecture (e.g., 32 bits or 64 bits), not by the amount of physical memory installed. A 32-bit process has a 4 GB logical address space even on a machine with 8 GB of RAM.
•Abstract, Not Physical: Logical addresses have no direct correspondence to hardware memory chips. They exist in a conceptual space that must be translated to physical locations for actual memory access.

Terminology: Logical vs Virtual

Mathematical Representation:

For a system with n-bit logical addresses, the logical address space L is defined as:

L = {0, 1, 2, ..., 2ⁿ - 1}

For a 32-bit system: L = {0, 1, ..., 4,294,967,295} (4 GB of addressable locations)
For a 64-bit system: L = {0, 1, ..., 18,446,744,073,709,551,615} (16 EB theoretical maximum)

The Historical Context: Why Logical Addresses Were Invented

The concept of logical addressing emerged from real engineering problems in early computing. Understanding this history illuminates why the abstraction takes its current form.

The Early Days: Absolute Addressing

This approach had severe limitations:

Problems with Absolute Addressing

•Single Program Execution: Only one program could run at a time. Running a second program required either dedicating different memory regions to each program (wasteful) or manually rewriting addresses when loading different programs.
•No Relocation: Programs could only run at the specific addresses they were written for. Moving a program in memory required rewriting every address reference—an error-prone manual process.
•No Protection: Any program could access any memory location, including the operating system's memory or another program's data. A bug in one program could corrupt the entire system.
•Hardware Dependency: Programs were tied to specific hardware configurations. Adding more memory or changing memory layout required program modifications.

The Birth of Relocation: Base Registers

The first step toward logical addressing came with relocatable code and the base register (circa 1960s). The idea was simple but powerful:

Write programs as if they start at address 0
When the program is loaded, the OS stores its starting physical address in a hardware base register; the hardware then adds that base to every memory reference at run time
The program can now run at any physical location by changing the base register

This introduced a simple form of address translation:

Physical Address = Logical Address + Base Register

base_register_example.txt
Memory Layout with Base Register Relocation:
 
Physical Memory:
┌──────────────────────────────────────────────┐
│ 0x00000: Operating System                    │
├──────────────────────────────────────────────┤
│ 0x10000: Program A (Base = 0x10000)          │
│          Logical Addr 0 → Physical 0x10000   │
│          Logical Addr 100 → Physical 0x10064 │
├──────────────────────────────────────────────┤
│ 0x50000: Program B (Base = 0x50000)          │
│          Logical Addr 0 → Physical 0x50000   │
│          Logical Addr 100 → Physical 0x50064 │
├──────────────────────────────────────────────┤
│ 0x90000: Free Memory                         │
└──────────────────────────────────────────────┘
 
Both programs are written as if they start at address 0.
The hardware adds the base value during every memory access.
 
Context Switch:
- When switching from Program A to Program B,
- The OS updates the base register from 0x10000 to 0x50000
- All subsequent memory accesses are automatically relocated

The Evolution Continues:

Segmentation (1960s-70s): Multiple base/limit pairs for different program regions
Paging (1960s-present): Fixed-size chunks with arbitrary mapping
Virtual Memory (1960s-present): Logical spaces larger than physical memory

Each evolution maintained the core principle: the logical address space as an abstraction layer between programs and physical memory.

The Genius of Abstraction

Structure of the Logical Address Space

Logical Address Space Segments
Segment	Contents	Permissions	Growth Direction	Lifetime
Text (Code)	Compiled machine instructions	Read + Execute	Fixed size	Process lifetime
Data	Initialized global and static variables	Read + Write	Fixed size	Process lifetime
BSS	Uninitialized global/static variables (zeroed)	Read + Write	Fixed size	Process lifetime
Heap	Dynamically allocated memory (malloc/new)	Read + Write	Grows upward	Until freed or process exit
Stack	Function call frames, local variables	Read + Write	Grows downward	Until function returns
Kernel	OS code and data (protected)	Kernel only	N/A	System lifetime

The Stack-Heap Arrangement:

Address Space Layout Randomization (ASLR):

address_space_exploration.c
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

// Global variables - Data/BSS segments
int initialized_global = 42;       // Data segment
int uninitialized_global;          // BSS segment

void print_addresses(void) {
    // Local variable - Stack segment
    int stack_var = 100;

    // Dynamic allocation - Heap segment
    int *heap_ptr = malloc(sizeof(int));
    if (heap_ptr == NULL) {
        perror("malloc");
        return;
    }
    *heap_ptr = 200;

    printf("=== Logical Address Space Exploration ===\n\n");

    printf("Text Segment (Code):\n");
    // Converting a function pointer to void* is a common extension, not strict ISO C
    printf("  Function print_addresses: %p\n\n", (void*)print_addresses);

    printf("Data Segment (Initialized Globals):\n");
    printf("  initialized_global:       %p\n\n", (void*)&initialized_global);

    printf("BSS Segment (Uninitialized Globals):\n");
    printf("  uninitialized_global:     %p\n\n", (void*)&uninitialized_global);

    printf("Heap Segment (Dynamic Allocation):\n");
    printf("  heap_ptr value:           %p\n\n", (void*)heap_ptr);

    printf("Stack Segment (Local Variables):\n");
    printf("  stack_var:                %p\n\n", (void*)&stack_var);

    // Demonstrate relative positions.
    // Compare as integers: ordering pointers to unrelated objects is undefined in ISO C.
    printf("=== Address Comparison ===\n");
    printf("Stack is at higher addresses than Heap: %s\n",
           (uintptr_t)&stack_var > (uintptr_t)heap_ptr ? "Yes" : "No");
    printf("Heap is at higher addresses than BSS:   %s\n",
           (uintptr_t)heap_ptr > (uintptr_t)&uninitialized_global ? "Yes" : "No");
    printf("BSS is at higher addresses than Data:   %s\n",
           (uintptr_t)&uninitialized_global > (uintptr_t)&initialized_global ? "Yes" : "No");

    free(heap_ptr);
}
 
/*
 * Sample Output (addresses vary due to ASLR):
 * 
 * === Logical Address Space Exploration ===
 * 
 * Text Segment (Code):
 *   Function print_addresses: 0x55d3a2c00189
 * 
 * Data Segment (Initialized Globals):
 *   initialized_global:       0x55d3a2e03010
 * 
 * BSS Segment (Uninitialized Globals):
 *   uninitialized_global:     0x55d3a2e03014
 * 
 * Heap Segment (Dynamic Allocation):
 *   heap_ptr value:           0x55d3a3a052a0
 * 
 * Stack Segment (Local Variables):
 *   stack_var:                0x7ffeba4c01dc
 * 
 * Notice: Stack addresses start with 0x7ff... (high addresses)
 *         Other segments start with 0x55... (lower addresses)
 */
 
int main() {
    print_addresses();
    return 0;
}

The Unmapped Regions

Logical Addresses in Different Contexts

The concept of logical addressing manifests differently across various computing contexts. Understanding these variations reveals the universal importance of address abstraction.

Logical Addressing Across System Types
Context	Address Space Size	Translation Mechanism	Key Characteristics
32-bit Process	4 GB (2³² bytes)	Page tables + MMU	3 GB user / 1 GB kernel split common
64-bit Process	256 TB canonical (48-bit)	4-level page tables + MMU	Vast address space, sparse mapping
Java Virtual Machine	JVM heap size (configurable)	JVM internal + OS translation	Object references, not raw addresses
Web Browser JavaScript	ArrayBuffer size limits	V8/SpiderMonkey engine	Sandboxed, no direct memory access
Embedded Systems	Varies (often no MMU)	Direct or simple base+offset	Physical addressing may be used
GPU Computing	Device memory space	GPU-specific translation	Separate from CPU address space

32-bit vs 64-bit Address Spaces:

The transition from 32-bit to 64-bit computing represents a fundamental expansion in logical address space:

32-bit Systems (4 GB limit): With 2³² possible addresses, the entire logical address space is 4 GB. This became limiting as programs grew—especially considering that the OS often reserves 1-2 GB of this space for kernel mapping.
64-bit Systems (practical limits): While 64 bits theoretically address 16 EB, practical implementations use 48 bits (256 TB) or 57 bits (128 PB) of virtual addressing. This is vastly more than any current physical memory, allowing for sparse address spaces and memory-mapped files exceeding physical RAM.

The key insight: logical address space size is an architectural decision, independent of physical memory.

address_space_sizes.c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    printf("Pointer size on this system: %zu bytes (%zu bits)\n\n",
           sizeof(void*), sizeof(void*) * 8);

    printf("=== Theoretical Logical Address Space Sizes ===\n\n");

    // 32-bit system
    uint64_t addr_space_32 = 1ULL << 32;  // 2^32
    printf("32-bit address space:\n");
    printf("  Addresses: %" PRIu64 "\n", addr_space_32);
    printf("  Size: %.2f GB\n\n", addr_space_32 / (1024.0 * 1024.0 * 1024.0));

    // 48-bit (common 64-bit implementation)
    uint64_t addr_space_48 = 1ULL << 48;  // 2^48
    printf("48-bit address space (common x86-64):\n");
    printf("  Addresses: %" PRIu64 "\n", addr_space_48);
    printf("  Size: %.2f TB\n\n", addr_space_48 / (1024.0 * 1024.0 * 1024.0 * 1024.0));

    // 57-bit (Intel 5-level paging)
    uint64_t addr_space_57 = 1ULL << 57;  // 2^57
    printf("57-bit address space (5-level paging):\n");
    printf("  Addresses: %" PRIu64 "\n", addr_space_57);
    printf("  Size: %.2f PB\n\n",
           addr_space_57 / (1024.0 * 1024.0 * 1024.0 * 1024.0 * 1024.0));

    printf("Note: Full 64-bit would be 16 EB, but no current system implements this.\n");
    printf("The unused high bits are sign-extended, creating 'canonical' addresses.\n");

    return 0;
}
 
/*
 * Output on a 64-bit system:
 * 
 * Pointer size on this system: 8 bytes (64 bits)
 * 
 * === Theoretical Logical Address Space Sizes ===
 * 
 * 32-bit address space:
 *   Addresses: 4294967296
 *   Size: 4.00 GB
 * 
 * 48-bit address space (common x86-64):
 *   Addresses: 281474976710656
 *   Size: 256.00 TB
 * 
 * 57-bit address space (5-level paging):
 *   Addresses: 144115188075855872
 *   Size: 128.00 PB
 */

Processes Without Hardware Translation

The Benefits of Logical Address Spaces

Programmer Benefits

•Simple Mental Model: Programs assume contiguous memory starting from a known address. No need to coordinate with other programs or know physical layout.
•Consistent Environment: The same program binary runs regardless of physical memory configuration. A program compiled today runs on hardware with different RAM sizes.
•Pointer Arithmetic Works: Logical addresses form a numeric sequence, enabling pointer arithmetic, array indexing, and address calculations to work predictably.
•Debugging Simplicity: Addresses in debug output are consistent across runs (modulo ASLR), making it easier to reproduce and diagnose issues.

System Benefits

•Process Isolation: Each process has its own address space. One process cannot directly access another's memory, providing security and stability.
•Multiprogramming: Multiple programs run concurrently, each believing it has all of memory. Physical memory is divided without program knowledge.
•Memory Protection: The OS can enforce read/write/execute permissions on memory regions, preventing code injection and data corruption.
•Virtual Memory: Logical spaces can exceed physical memory. Pages are loaded on demand from disk, enabling programs larger than RAM to run.

The Sharing Paradox:

Shared Libraries: Code for libc, graphics libraries, etc., exists once in physical memory but appears in every process's logical space.
Inter-Process Communication: Shared memory regions allow processes to exchange data efficiently without kernel involvement for each transfer.
Copy-on-Write (COW): After fork(), parent and child share pages until one writes, saving memory and time.

These capabilities would be extremely difficult to implement safely with physical addressing.

Resource Efficiency:

The logical address abstraction dramatically improves resource utilization:

Efficiency Gains Through Logical Addressing

•Sparse Address Spaces: A process can allocate a huge virtual array but only use part of it. Physical memory is allocated only for accessed pages—not for the entire virtual allocation.
•Demand Paging: Program code and data are loaded only when accessed. A large application may have much of its code never loaded into memory during typical use.
•Memory Mapping: Files can be mapped into the address space, letting the OS handle caching and disk I/O. Rarely-used pages stay on disk.
•Deduplication: Identical pages (common in containers or VMs) can share physical memory while maintaining separate logical appearances.

The Ultimate Portability Layer

How Logical Addresses Are Generated

Understanding the origin of logical addresses reveals how the abstraction is maintained from source code through compilation to execution.

The Journey of an Address:

Source Code: Programmer references variables, functions, arrays by name
Compilation: Names become symbolic references in object files
Linking: Symbols are resolved to logical addresses within the address space
Loading: Relocations adjust addresses based on where the program loads
Execution: CPU generates logical addresses based on PC and operands

address_generation_stages.c
/* 
 * Address Generation Example: Following 'x' through the compilation pipeline
 */
 
// SOURCE CODE LEVEL
// Variable 'x' is a name - no address yet
int x = 42;
 
int main() {
    int y = x + 1;  // Reference to 'x'
    return y;
}
 
/*
 * AFTER COMPILATION (Assembly/Object Code):
 * 
 *     .data
 * x:                          ; 'x' is now a symbol
 *     .long   42
 * 
 *     .text
 * main:
 *     movl    x(%rip), %eax   ; Reference to symbol 'x' (relocatable)
 *     addl    $1, %eax
 *     ret
 * 
 * At this stage, 'x' is a symbol that will be resolved to
 * an actual address during linking.
 */
 
/*
 * AFTER LINKING (Executable):
 * 
 * Symbol table shows x is at logical address 0x404020:
 * 
 *     .data
 *     .org 0x404020
 *     .long   42
 * 
 *     .text
 *     .org 0x401000
 * main:
 *     movl    0x301a(%rip), %eax    ; RIP-relative: next instruction at 0x401006,
 *     addl    $1, %eax              ; and 0x401006 + 0x301a = 0x404020 (address of x)
 *     ret
 * 
 * The linker assigned address 0x404020 to 'x' within the
 * process's logical address space.
 */
 
/*
 * DURING EXECUTION:
 * 
 * 1. CPU fetches instruction from logical address 0x401000
 * 2. Instruction encodes access to logical address 0x404020
 * 3. CPU presents 0x404020 to the MMU for translation
 * 4. MMU translates to a physical address (e.g., 0x1F3A7020; with 4 KiB pages the offset 0x020 is preserved)
 * 5. Physical memory is accessed; value 42 is retrieved
 * 6. All of this happens transparently to the program
 */

Position-Independent Code (PIC):

Dynamic Address Computation:

Many logical addresses aren't determined until runtime:

•Stack Addresses: Local variables get addresses based on the current stack pointer, determined by the call sequence at runtime.
•Heap Addresses: malloc() returns addresses chosen by the memory allocator at runtime.
•mmap() Regions: The OS assigns addresses for memory-mapped files and regions.
•Computed Addresses: Array indexing (e.g., arr[i]) computes addresses from base + offset at runtime.

The Compiler's Promise

Logical Address Space Manipulation

Address Space Manipulation Operations
Operation	Mechanism	Example API	Use Case
Expand Heap	brk/sbrk adjustment	sbrk(increment) / malloc()	Dynamic memory allocation
Memory Map	Create new mapping	mmap()	File I/O, shared memory, large allocations
Unmap Memory	Remove mapping	munmap()	Free large allocations, unload libraries
Protect Memory	Change permissions	mprotect()	JIT compilation, guard pages
Shared Memory	Map same pages	shm_open() + mmap()	Inter-process communication
Stack Expansion	Automatic on access	OS trap handler	Deep recursion, large local arrays

address_space_manipulation.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>
#include <string.h>

void demonstrate_mmap(void) {
    printf("=== Address Space Manipulation with mmap ===\n\n");

    // Allocate a new region in our logical address space
    size_t size = 4096 * 1000;  // ~4 MB (1000 pages, assuming 4 KiB pages)

    void *region = mmap(
        NULL,                   // Let OS choose the address
        size,                   // Size of mapping
        PROT_READ | PROT_WRITE, // Readable and writable
        MAP_PRIVATE | MAP_ANONYMOUS,  // Private, not file-backed
        -1,                     // No file descriptor
        0                       // No offset
    );

    if (region == MAP_FAILED) {
        perror("mmap failed");
        return;
    }

    printf("mmap allocated region at: %p\n", region);
    printf("Region size: %zu bytes\n", size);

    // The region is now part of our logical address space
    // But physical memory is allocated lazily (on first access)

    // Write to the region
    memset(region, 0xAB, 4096);  // First touch: a physical page is now allocated
    printf("Wrote to first page\n");

    // Change protection to read-only
    if (mprotect(region, size, PROT_READ) == 0) {
        printf("Changed region to read-only\n");
    } else {
        perror("mprotect failed");
    }

    // Attempting to write now would cause SIGSEGV:
    // memset(region, 0, 4096);  // CRASH!

    // Clean up - remove from logical address space
    if (munmap(region, size) == 0) {
        printf("Unmapped region - address space shrunk\n");
    } else {
        perror("munmap failed");
    }

    // region is now an invalid address - accessing it would crash
}

void show_address_space_info(void) {
    printf("\n=== Current Address Space Info ===\n\n");

    // On Linux, we can read /proc/self/maps
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps) {
        char line[256];
        printf("%-20s %-5s %-8s %s\n",
               "Address Range", "Perm", "Offset", "Mapping");
        printf("%-20s %-5s %-8s %s\n",
               "-------------", "----", "------", "-------");

        int count = 0;
        while (count < 15 && fgets(line, sizeof(line), maps)) {
            // Print the raw line: range, permissions, offset, device, inode, path
            printf("%s", line);
            count++;
        }
        if (count == 15) {
            printf("... (truncated)\n");
        }
        fclose(maps);
    }
}

int main(void) {
    demonstrate_mmap();
    show_address_space_info();
    return 0;
}
 
/*
 * Sample Output:
 * 
 * === Address Space Manipulation with mmap ===
 * 
 * mmap allocated region at: 0x7f5a3c000000
 * Region size: 4096000 bytes
 * Wrote to first page
 * Changed region to read-only
 * Unmapped region - address space shrunk
 * 
 * === Current Address Space Info ===
 * 
 * Address Range        Perm  Offset   Mapping
 * -------------        ----  ------   -------
 * 55e8a5c00000-55e8a5c01000 r--p 00000000 /home/user/a.out
 * 55e8a5c01000-55e8a5c02000 r-xp 00001000 /home/user/a.out
 * ...
 */

Address Space Limits

Summary: The Logical Address Space

Key Takeaways

•Logical addresses are CPU-generated abstractions that represent memory locations from a process's perspective, independent of physical memory organization.
•Each process has its own logical address space, providing isolation by default with controlled sharing possible through intentional mapping.
•The address space is structured into segments (text, data, BSS, heap, stack) with different purposes, permissions, and lifetime characteristics.
•Historical evolution from absolute addressing through base registers to modern virtual memory systems solved progressively more complex multiprogramming challenges.
•Benefits include: programmer simplicity, process isolation, memory protection, efficient resource usage, portability, and the enabling of virtual memory.
•Logical addresses are generated through compilation, linking, and runtime computation, with the program unaware of eventual physical locations.
•Address spaces are dynamic—they can be expanded, contracted, protected, and remapped during execution through system calls.
