Understanding Logical vs Physical Addresses

Level: Intermediate

Duration: 75 mins

Topic: Logical vs Physical Addresses

Base and Limit Registers

The Simplest Solution That Actually Works

Before complex paging systems with multi-level page tables and TLBs, computer architects solved the memory management problem with elegant simplicity: two hardware registers. The base register stored where a program's memory started in physical RAM. The limit register stored how much memory the program could use. Every memory access was translated and bounds-checked using just these two values.

This mechanism—often called base and limit or base and bounds—provided both relocation (programs could load at any address) and protection (programs couldn't access memory outside their region). Understanding base and limit is valuable for several reasons: it illustrates fundamental memory management concepts, it appears in modern processor features like segment registers, and it demonstrates how hardware complexity is driven by real limitations of simpler approaches.

What You Will Learn

By the end of this page, you will understand how base and limit registers provide address translation and protection, the historical context and evolution of this mechanism, its fundamental limitations that led to paging, where base and limit concepts still appear in modern systems, and how to compare different memory management approaches.

The Base and Limit Mechanism

Base and limit registers provide a straightforward way to implement both address translation and memory protection using minimal hardware.

The Registers:

Base Register: Contains the physical address where the process's memory region begins.
Limit Register: Contains the size of the process's memory region (or the highest valid logical address).

Translation Formula:

Physical Address = Base Register + Logical Address

Protection Check:

if (Logical Address >= Limit Register)
    raise Protection Fault

Both operations happen simultaneously in hardware on every memory access.

Example Operation:

Consider a process loaded at physical address 0x100000 with 64 KB of memory:

Base Register = 0x100000
Limit Register = 0x10000 (64 KB = 65,536 bytes)

When the process accesses logical address 0x5000:

Check: Is 0x5000 < 0x10000? Yes, proceed.
Translate: 0x100000 + 0x5000 = 0x105000
Access physical address 0x105000

When the process tries to access logical address 0x20000:

Check: Is 0x20000 < 0x10000? No! Access violation.
Raise protection fault—process attempted to access beyond its region.

This simple check prevents any process from accessing another's memory.

Two Flavors of Limit

Some systems store the limit as the size (check: logical < limit). Others store it as the maximum valid address (check: logical ≤ limit). Still others store the upper bound as base + size (check: physical < upper bound, computed after adding base). The concept is identical; only the comparison details differ.
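The three conventions above can be sketched side by side. This is a minimal illustration rather than any particular machine's logic; the function names are hypothetical and were chosen only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convention 1: the limit register holds the region size in bytes. */
bool in_bounds_size(uint64_t logical, uint64_t size) {
    return logical < size;
}

/* Convention 2: the limit register holds the highest valid logical address. */
bool in_bounds_max(uint64_t logical, uint64_t max_valid) {
    return logical <= max_valid;
}

/* Convention 3: the bound holds base + size and is compared after translation. */
bool in_bounds_upper(uint64_t logical, uint64_t base, uint64_t upper) {
    uint64_t physical = base + logical;
    /* The second test rejects a sum that wrapped around past zero. */
    return physical < upper && physical >= base;
}
```

For the 64 KB region used earlier (base 0x100000), logical address 0xFFFF passes all three checks and 0x10000 fails all three, given a size of 0x10000, a maximum valid address of 0xFFFF, and an upper bound of 0x110000.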

Hardware Implementation

The base and limit mechanism requires simple hardware—far simpler than a full MMU with TLB and page table walker. This simplicity was a major advantage in early computer systems where transistors were expensive.

Hardware Components Required

•Two Registers: The base and limit registers themselves, typically the same width as addresses (32 or 64 bits).
•One Adder: A fast binary adder to compute Base + Logical Address. This is the translation hardware.
•One Comparator: Logic to compare Logical Address against Limit. This is the protection hardware.
•Trap Logic: Circuit to raise an exception if the comparison fails.
•Privilege Check: Ensure only the kernel can modify the base and limit registers.

Critical Path Analysis:

The comparison and addition happen in parallel. The result of the comparison gates whether the addition's result is actually used:

Clock Cycle 0: Logical address available
              Start: LA + Base (addition)
              Start: LA < Limit (comparison)

Clock Cycle 1: Addition complete, comparison complete
              If comparison passes: Physical address valid
              If comparison fails: Raise trap, discard addition result

Because the two operations run in parallel, translation adds only about 1-2 cycles to each memory access, which is small next to the latency of main memory itself.

base_limit_hardware.c
/*
 * Base and Limit Register Logic (Conceptual Hardware)
 * 
 * This simulates what the hardware does on every memory access.
 * In real hardware, this is combinational logic + registers.
 */
 
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>
 
// Hardware registers - privileged, only kernel can modify
typedef struct {
    uint64_t base;   // Physical start address
    uint64_t limit;  // Size of region (in bytes)
} MemoryRegisters;
 
// Per-CPU memory registers (set during context switch)
MemoryRegisters current_regs;
 
// Result of address translation
typedef struct {
    bool valid;             // Translation succeeded?
    uint64_t physical_addr; // If valid, the physical address
} TranslationResult;
 
/*
 * Hardware translation + protection logic
 * 
 * This happens on EVERY memory access—must be fast!
 * In reality, this is a few gates of combinational logic.
 */
TranslationResult translate_base_limit(uint64_t logical_addr) {
    TranslationResult result;
    
    // Protection check: is logical address within bounds?
    // This comparison happens in parallel with the addition
    if (logical_addr >= current_regs.limit) {
        // Access violation! 
        result.valid = false;
        result.physical_addr = 0;
        // Hardware would raise a trap here
        return result;
    }
    
    // Translation: add base to logical address
    // This addition happens in parallel with the comparison
    result.physical_addr = current_regs.base + logical_addr;
    result.valid = true;
    
    return result;
}
 
/*
 * Context switch: load new process's base and limit
 * 
 * This is a privileged operation - kernel only.
 * The kernel maintains base/limit values for each process
 * and loads them when switching contexts.
 */
void context_switch_memory(uint64_t new_base, uint64_t new_limit) {
    // These are privileged writes - user mode cannot do this
    current_regs.base = new_base;
    current_regs.limit = new_limit;
    
    // All subsequent memory accesses by this CPU use these values
}
 
/*
 * Example execution trace:
 */
void example_execution() {
    // Process A loaded at physical 0x100000, size 64 KB
    context_switch_memory(0x100000, 0x10000);
    
    TranslationResult r;
    
    // Access logical 0x1000 (valid)
    r = translate_base_limit(0x1000);
    printf("Logical 0x1000 -> Physical 0x%llX (valid: %d)\n",
           (unsigned long long)r.physical_addr, r.valid);
    // Output: Logical 0x1000 -> Physical 0x101000 (valid: 1)
    
    // Access logical 0xF000 (valid, near end)
    r = translate_base_limit(0xF000);
    printf("Logical 0xF000 -> Physical 0x%llX (valid: %d)\n",
           (unsigned long long)r.physical_addr, r.valid);
    // Output: Logical 0xF000 -> Physical 0x10F000 (valid: 1)
    
    // Access logical 0x20000 (INVALID - beyond limit)
    r = translate_base_limit(0x20000);
    printf("Logical 0x20000 -> (valid: %d) <- Protection Fault!\n",
           r.valid);
    // Output: Logical 0x20000 -> (valid: 0) <- Protection Fault!
    
    // Switch to Process B at physical 0x200000, size 128 KB
    context_switch_memory(0x200000, 0x20000);
    
    // Access logical 0x1000 (same logical address, different physical!)
    r = translate_base_limit(0x1000);
    printf("After switch: Logical 0x1000 -> Physical 0x%llX\n",
           (unsigned long long)r.physical_addr);
    // Output: After switch: Logical 0x1000 -> Physical 0x201000
}
 
/*
 * Key observation: The SAME logical address maps to DIFFERENT
 * physical addresses depending on which process's base is loaded.
 * This is the foundation of process isolation.
 */

Comparison with Paging

Base+limit uses a single register pair and simple arithmetic. A paging MMU uses multi-level page tables, TLBs with dozens to thousands of entries, page table walkers, and complex comparison logic. The hardware cost difference was significant in early computers—paging became dominant only as transistors became abundant.

Capabilities Provided by Base and Limit

Despite its simplicity, the base and limit mechanism provides several important capabilities that were revolutionary when introduced.

Capabilities Provided

•Relocation: Programs are written for logical address 0 but can load at any physical address.
•Protection: A process cannot access memory outside its allocated region.
•Multiprogramming: Multiple programs can coexist in RAM, each with its own base/limit.
•Simple Context Switch: Just save and restore two register values.
•Hardware Simplicity: Minimal gates required—practical for early computers.

Capabilities NOT Provided

•Non-Contiguous Allocation: The entire process must fit in a single contiguous block.
•Fine-Grained Protection: All memory has the same permissions—no read-only code regions.
•Sharing: No clean way for two processes to share physical memory, because each process owns exactly one contiguous region.
•Virtual Memory: Logical space cannot exceed physical space.
•Efficient Memory Use: External fragmentation wastes memory.

Relocation In Practice:

Before base and limit, programs were written for specific memory addresses. If another program occupied that space, you couldn't run both. Program relocation required painful manual address adjustment.

With base and limit:

Compiler generates code using logical addresses starting at 0
OS chooses where in physical memory to load the program
OS sets the base register to that physical address
Hardware automatically adjusts every memory reference

The same binary can run at physical address 0x10000 or 0x500000—the program doesn't know or care.

multiprogramming_base_limit.c
/*
 * Multiprogramming with Base and Limit Registers
 * 
 * Shows how multiple processes coexist in physical memory.
 */
 
#include <stdio.h>
#include <stdint.h>
 
typedef struct {
    const char* name;
    uint64_t base;      // Physical start address
    uint64_t limit;     // Size in bytes
    uint64_t saved_pc;  // Saved program counter (logical)
} Process;
 
// Physical memory layout (simplified)
/*
 * Physical Memory Map:
 * 
 * 0x000000 - 0x0FFFFF : Operating System (1 MB)
 * 0x100000 - 0x13FFFF : Process A (256 KB)
 * 0x140000 - 0x1BFFFF : Process B (512 KB)
 * 0x1C0000 - 0x1FFFFF : Process C (256 KB)
 * 0x200000 - ...      : Free memory
 * 
 * Each process thinks it starts at logical address 0.
 */
 
Process processes[] = {
    {"Process A", 0x100000, 0x40000, 0x1234},   // 256 KB at 0x100000
    {"Process B", 0x140000, 0x80000, 0x5678},   // 512 KB at 0x140000  
    {"Process C", 0x1C0000, 0x40000, 0x9ABC},   // 256 KB at 0x1C0000
};
 
void show_memory_layout() {
    printf("=== Physical Memory Layout ===\n\n");
    printf("OS: 0x000000 - 0x0FFFFF (1 MB)\n");
    
    for (int i = 0; i < 3; i++) {
        Process* p = &processes[i];
        uint64_t end = p->base + p->limit - 1;
        printf("%s: 0x%06llX - 0x%06llX (Base=0x%06llX, Limit=0x%05llX)\n",
               p->name,
               (unsigned long long)p->base,
               (unsigned long long)end,
               (unsigned long long)p->base,
               (unsigned long long)p->limit);
    }
    printf("\n");
}
 
void show_address_translation(Process* p, uint64_t logical) {
    uint64_t physical = p->base + logical;
    int valid = (logical < p->limit);
    
    printf("%s: Logical 0x%05llX -> Physical 0x%06llX (%s)\n",
           p->name,
           (unsigned long long)logical,
           (unsigned long long)physical,
           valid ? "valid" : "FAULT");
}
 
int main() {
    show_memory_layout();
    
    printf("=== Address Translation Examples ===\n\n");
    
    // Same logical address, different processes, different physical addresses
    uint64_t test_addr = 0x1000;
    
    for (int i = 0; i < 3; i++) {
        show_address_translation(&processes[i], test_addr);
    }
    
    printf("\n");
    
    // Show protection: what if Process A tries to access beyond its limit?
    printf("Protection example:\n");
    show_address_translation(&processes[0], 0x50000);  // Beyond Process A's 256 KB
    
    return 0;
}
 
/*
 * Output:
 * 
 * === Physical Memory Layout ===
 * 
 * OS: 0x000000 - 0x0FFFFF (1 MB)
 * Process A: 0x100000 - 0x13FFFF (Base=0x100000, Limit=0x40000)
 * Process B: 0x140000 - 0x1BFFFF (Base=0x140000, Limit=0x80000)
 * Process C: 0x1C0000 - 0x1FFFFF (Base=0x1C0000, Limit=0x40000)
 * 
 * === Address Translation Examples ===
 * 
 * Process A: Logical 0x01000 -> Physical 0x101000 (valid)
 * Process B: Logical 0x01000 -> Physical 0x141000 (valid)
 * Process C: Logical 0x01000 -> Physical 0x1C1000 (valid)
 * 
 * Protection example:
 * Process A: Logical 0x50000 -> Physical 0x150000 (FAULT)
 * 
 * Note: Process A's 0x50000 would translate to 0x150000, which
 * is INSIDE Process B's region! Protection prevents this.
 */

No Intra-Process Protection

With simple base and limit, all of a process's memory has the same permissions. There's no way to have a read-only code segment or a non-executable stack. A buffer overflow can overwrite code, and code can modify itself. These protections require either multiple register pairs (segmentation) or page-level permissions (paging).

The Fragmentation Problem

The Achilles heel of base and limit registers is external fragmentation. Because each process needs a contiguous physical memory region, memory gradually becomes a patchwork of allocated and free regions, with free space scattered in unusable small chunks.

fragmentation_example.txt
External Fragmentation Over Time:
 
INITIAL STATE - Clean memory
┌─────────────────────────────────────────────────────────┐
│                     FREE (1.75 MB)                      │
└─────────────────────────────────────────────────────────┘
 
AFTER LOADING A, B, C (256 KB, 512 KB, and 256 KB respectively)
┌───────────┬─────────────────────────┬───────────┬───────┐
│  A (256K) │       B (512K)          │  C (256K) │ FREE  │
└───────────┴─────────────────────────┴───────────┴───────┘
                                                   (768 KB)
 
PROCESS B EXITS - Hole appears
┌───────────┬─────────────────────────┬───────────┬───────┐
│  A (256K) │       FREE (512K)       │  C (256K) │ FREE  │
└───────────┴─────────────────────────┴───────────┴───────┘
                                                   (768 KB)
Total free: 512K + 768K = 1.25 MB
 
NOW TRYING TO LOAD D (600 KB)...
Problem! D needs 600 KB contiguous, but:
  - First hole: 512 KB (too small)
  - Second hole: 768 KB (big enough, but wastes 168 KB)
  
If we use the second hole:
┌───────────┬─────────────────────────┬───────────┬─────────┬────┐
│  A (256K) │       FREE (512K)       │  C (256K) │ D(600K) │FREE│
└───────────┴─────────────────────────┴───────────┴─────────┴────┘
                                                             (168K)
Total free: 512K + 168K = 680 KB
But both pieces are too small for medium-sized allocations!
 
CONTINUED LOADING/EXITING causes memory to look like:
┌──┬────┬───┬──────┬───┬────┬──┬──────────┬──┬───┐
│A │FREE│ E │ FREE │ C │FREE│ D│   FREE   │G │ F │
└──┴────┴───┴──────┴───┴────┴──┴──────────┴──┴───┘
   50K     128K        64K         200K      
   
Total free: 50K + 128K + 64K + 200K = 442 KB
But LARGEST contiguous free: 200 KB
Cannot load any process > 200 KB despite having 442 KB free!
 
This is EXTERNAL FRAGMENTATION.

The 50% Rule:

Statistical analysis of first-fit allocation (Knuth's 50-percent rule) shows that, given N allocated blocks, roughly another 0.5N blocks' worth of memory is lost to fragmentation. Put differently, about 0.5N out of every 1.5N blocks, or up to one-third of memory, may be unusable.

Mitigation Strategies:

Compaction: Move all processes to one end of memory, consolidating free space. Expensive—requires stopping processes and copying their entire memory.
Best-Fit Allocation: When loading a process, find the smallest hole that fits. Reduces large-hole destruction but creates many tiny unusable holes.
Swapping: Temporarily move processes to disk to create larger contiguous regions. Very expensive (disk is slow).

None of these are satisfactory. Paging eliminates external fragmentation entirely by using fixed-size allocation units.
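To make the hole-selection policies concrete, here is a small sketch of first-fit and best-fit over a list of free holes. It assumes holes are tracked simply as an array of sizes in KB; the function names are hypothetical, and a real allocator would also track addresses and merge neighboring holes.

```c
#include <stddef.h>

/* Index of the first hole large enough for the request, or -1 if none fits. */
int first_fit(const unsigned holes_kb[], size_t n, unsigned request_kb) {
    for (size_t i = 0; i < n; i++) {
        if (holes_kb[i] >= request_kb) return (int)i;
    }
    return -1;
}

/* Index of the smallest hole that still fits the request, or -1 if none fits. */
int best_fit(const unsigned holes_kb[], size_t n, unsigned request_kb) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (holes_kb[i] < request_kb) continue;          /* too small */
        if (best < 0 || holes_kb[i] < holes_kb[best]) best = (int)i;
    }
    return best;
}

/* Sum of all holes: the single block that compaction could recover. */
unsigned total_free_kb(const unsigned holes_kb[], size_t n) {
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++) sum += holes_kb[i];
    return sum;
}
```

With the holes from the last diagram (50, 128, 64, and 200 KB), a 60 KB request lands in the 128 KB hole under first-fit and in the 64 KB hole under best-fit. A 250 KB request fails under both policies even though 442 KB is free in total, which is exactly the case compaction is meant to fix.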

Comparison: Base+Limit vs. Paging
Aspect	Base + Limit	Paging
External Fragmentation	Yes (major problem)	None
Internal Fragmentation	None	Yes (last page)
Hardware Complexity	Low (2 registers + adder)	High (TLB, PTW, tables)
Memory Overhead	2 registers per process	Page tables (1-3% of RAM)
Non-Contiguous Allocation	No	Yes
Sharing	Difficult	Easy (same frame in multiple PTEs)
Virtual Memory	No	Yes

Why Paging Won

External fragmentation is why paging replaced base+limit. As memories grew and more processes ran, fragmentation became intolerable. Paging's fixed-size units ensure any free frame can satisfy any page request—no external fragmentation. The cost is internal fragmentation (wasted space in the last page), but this averages only half a page per allocation—far less than external fragmentation.
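The internal-fragmentation cost can be worked out directly. The sketch below assumes a 4 KB page size, which is common but not universal; the function names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u  /* assumed page size in bytes */

/* Number of whole pages needed to hold `bytes` (rounds up). */
uint64_t pages_needed(uint64_t bytes) {
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Bytes left unused in the last page: the internal fragmentation. */
uint64_t internal_waste(uint64_t bytes) {
    return pages_needed(bytes) * PAGE_SIZE - bytes;
}
```

A 10,000-byte allocation needs 3 pages and wastes 2,288 bytes. The worst case is an allocation one byte past a page boundary (4,097 bytes wastes 4,095), and the best case is an exact multiple of the page size, which wastes nothing. Averaged over arbitrary sizes this comes to about half a page.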

Multiple Base-Limit Pairs: Segmentation

A natural extension of simple base+limit is to use multiple pairs—one for each logical region of a program. This is segmentation, and it addresses some but not all limitations of single-pair base+limit.

The Segmentation Model:

Instead of one (base, limit) pair, the hardware supports several:

Segment	Purpose	Base	Limit	Permissions
0 (Code)	Executable code	0x100000	0x10000	Read + Execute
1 (Data)	Global/static data	0x200000	0x8000	Read + Write
2 (Stack)	Call stack	0x300000	0x4000	Read + Write
3 (Heap)	Dynamic allocation	0x400000	0x20000	Read + Write

Now a logical address has two parts: segment number and offset within segment.

Logical Address = (Segment Number, Offset)
Physical Address = Segments[Number].Base + Offset
if (Offset >= Segments[Number].Limit) → Fault
if (Access violates Segments[Number].Permissions) → Fault
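The translation rules above might be sketched in C as follows. The segment values are copied from the table; the type, flag, and function names are hypothetical and only illustrate the order of the checks.

```c
#include <stdbool.h>
#include <stdint.h>

enum { PERM_R = 1, PERM_W = 2, PERM_X = 4 };

typedef struct {
    uint64_t base;   /* physical start of the segment */
    uint64_t limit;  /* segment size in bytes */
    unsigned perms;  /* bitwise OR of PERM_* flags */
} Segment;

/* Segment table with the values from the table above. */
static const Segment segments[] = {
    {0x100000, 0x10000, PERM_R | PERM_X},  /* 0: code  */
    {0x200000, 0x8000,  PERM_R | PERM_W},  /* 1: data  */
    {0x300000, 0x4000,  PERM_R | PERM_W},  /* 2: stack */
    {0x400000, 0x20000, PERM_R | PERM_W},  /* 3: heap  */
};

/* Returns true and stores the physical address on success.
 * Returns false where the hardware would raise a fault. */
bool seg_translate(unsigned seg, uint64_t offset, unsigned access,
                   uint64_t *physical) {
    if (seg >= sizeof segments / sizeof segments[0]) return false;
    const Segment *s = &segments[seg];
    if (offset >= s->limit) return false;              /* bounds check */
    if ((access & s->perms) != access) return false;   /* permission check */
    *physical = s->base + offset;
    return true;
}
```

Fetching an instruction at (0, 0x1234) yields physical 0x101234, while a write to the same address faults because the code segment lacks write permission. A read at (1, 0x8000) faults on the limit check.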

Segmentation Advantages over Simple Base+Limit

•Per-Segment Protection: Code can be read+execute, data read+write, stack non-executable. This prevents many attacks.
•Logical Memory Organization: Memory layout matches programmer's mental model—separate code, data, stack.
•Easier Sharing: Two processes can share a code segment by pointing their segment entries to the same physical memory.
•Variable Segment Sizes: Each segment can be sized appropriately—large heap, small stack.
•Better Relocation: Segments can be moved independently during compaction.

Why Segmentation Still Has Problems:

Despite these improvements, segmentation inherits the fundamental problem of base+limit: external fragmentation. Each segment needs contiguous memory. If a 50 KB data segment needs to grow and the memory right after it is occupied, the OS must move it to a larger contiguous hole, and that hole may not exist even when 100 KB is free in scattered small chunks.

The solution—combining segmentation with paging—was used in Intel 386+ processors. The segment table provides protection and logical organization; pages handle physical memory allocation without fragmentation.

Intel x86 Segmentation

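Intel x86 processors implement full segmentation in protected mode: segment descriptors arrived with the 80286, and the 386 extended them to 32 bits and added FS and GS. The segment registers are CS (code), DS (data), SS (stack), and ES/FS/GS (extra). Each segment selector points to a segment descriptor with base, limit, and permissions. In modern x86-64 (long mode), segmentation is mostly disabled: the bases of CS, DS, ES, and SS are treated as 0 and limit checks are not performed, so paging handles everything. FS and GS keep a usable base and remain useful for thread-local storage.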

Base and Limit in Modern Systems

While paging dominates memory management, base-and-limit concepts still appear in several modern contexts. Understanding these helps connect historical concepts to current practice.

Base+Limit Concepts in Modern Systems
Context	Mechanism	Purpose	Example
x86 GS/FS Segments	Segment base (no limit check)	Thread-local storage (TLS)	Each thread has different GS base
ARM MMIO Regions	MPU regions with base+limit	Memory Protection Unit	Embedded systems without MMU
Intel SGX	Enclave base + size	Trusted execution environment	Enclave measured memory region
GPU Memory	Base + length for buffers	Buffer object bounds	OpenGL/Vulkan buffer bindings
DMA Descriptors	Source + length	Bounded memory transfer	NIC scatter-gather lists
Array Bounds	Base + count (software)	Buffer overflow prevention	Fat pointers (pointer + length), e.g. Rust slices

Memory Protection Units (MPU):

Many embedded processors (ARM Cortex-M series, some RISC-V) lack full MMUs. Instead, they have MPUs that implement a form of base+limit protection. An MPU defines several regions, each with:

Start address (base)
Size (power of 2, like limit)
Access permissions (read/write/execute)

The MPU checks every access against these regions—base+limit protection without the complexity (or capability) of paging. This is appropriate for embedded systems where:

Memory is fixed at compile time
No virtual memory is needed
Hardware resources are limited
Deterministic timing is required
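The region check an MPU performs can be modeled in software. This is a simplified sketch with hypothetical names: it assumes power-of-two region sizes with size-aligned bases, uses the first matching region, and ignores the overlap-priority rules of real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

enum { MPU_R = 1, MPU_W = 2, MPU_X = 4 };

typedef struct {
    uint32_t base;       /* must be a multiple of the region size */
    uint32_t size_log2;  /* region covers 2^size_log2 bytes (5..31 here) */
    unsigned perms;      /* bitwise OR of MPU_* flags */
} MpuRegion;

/* Power-of-two sizing turns the bounds check into a mask and a compare. */
bool region_contains(const MpuRegion *r, uint32_t addr) {
    uint32_t mask = ~((UINT32_C(1) << r->size_log2) - 1);
    return (addr & mask) == r->base;
}

/* Allowed only if a region contains addr and grants every requested right.
 * An address outside all regions is rejected, as a fault would be raised. */
bool mpu_allows(const MpuRegion *regions, int count,
                uint32_t addr, unsigned access) {
    for (int i = 0; i < count; i++) {
        if (region_contains(&regions[i], addr)) {
            return (access & regions[i].perms) == access;
        }
    }
    return false;
}
```

Because the size is a power of two and the base is aligned to it, no adder or magnitude comparator is needed: masking off the low bits and testing for equality is enough. That is one reason such regions are cheap to implement in hardware.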

thread_local_storage.c
/*
 * Modern Use of Segment Registers: Thread-Local Storage
 * 
 * On x86-64, the FS and GS segment registers are used for TLS.
 * The segment base is set to the TLS block for the current thread.
 * No limit checking is performed—just base addition.
 */
 
#include <stdio.h>
#include <stdint.h>
#include <pthread.h>
 
// GCC's __thread keyword uses the FS segment for TLS on x86-64/Linux
__thread int thread_local_var = 42;
__thread int thread_local_counter = 0;
 
/*
 * How it works:
 * 
 * 1. When a thread is created, the OS allocates a TLS block
 * 2. The kernel sets FS.base to point to this block
 * 3. TLS accesses use FS-relative addressing
 * 
 * In assembly, a TLS access looks like:
 *     mov %fs:thread_local_var@tpoff, %eax
 * 
 * The CPU computes: FS.base + offset
 * Different threads have different FS.base, so they access different memory.
 */
 
void* thread_func(void* arg) {
    int id = (int)(intptr_t)arg;
    
    // Each thread has its own copy of these variables
    thread_local_var = id * 100;
    thread_local_counter = 0;
    
    for (int i = 0; i < 1000; i++) {
        thread_local_counter++;  // No synchronization needed!
    }
    
    printf("Thread %d: var=%d, counter=%d\n",
           id, thread_local_var, thread_local_counter);
    
    return NULL;
}
 
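// Read FS base register (Linux-specific).
// RDFSBASE faults unless the CPU supports FSGSBASE and the kernel has
// enabled it for user mode; arch_prctl(ARCH_GET_FS, &val) is the
// alternative on Linux.
uint64_t read_fs_base(void) {
    uint64_t val;
    __asm__ volatile("rdfsbase %0" : "=r"(val));
    return val;
}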
 
int main() {
    printf("Main thread FS base: 0x%llx\n",
           (unsigned long long)read_fs_base());
    
    pthread_t threads[3];
    
    for (int i = 0; i < 3; i++) {
        pthread_create(&threads[i], NULL, thread_func, (void*)(intptr_t)i);
    }
    
    for (int i = 0; i < 3; i++) {
        pthread_join(threads[i], NULL);
    }
    
    printf("Main thread final: var=%d, counter=%d\n",
           thread_local_var, thread_local_counter);
    
    return 0;
}
 
/*
 * Output:
 * Main thread FS base: 0x7f1234560700
 * Thread 0: var=0, counter=1000
 * Thread 1: var=100, counter=1000
 * Thread 2: var=200, counter=1000
 * Main thread final: var=42, counter=0
 * 
 * Note: Each thread has completely independent variables.
 * The per-thread FS base selects the copy. This is base-register
 * addressing only: threads still share one paged address space.
 */

The Abstraction Lives On

Even though paging dominates, the conceptual model of base+limit—'start here, go this far, check these permissions'—appears throughout computing. From GPU buffer objects to network packet descriptors to compiler safety transforms, bounded memory regions are a universal pattern. Understanding base and limit helps you recognize this pattern everywhere.

Implementation Considerations

For systems that use base and limit (embedded systems, simple protection schemes), several implementation details affect correctness and performance.

Implementation Challenges

•Register Width: Base and limit must be wide enough for the physical address space. With 32-bit registers, a region can start anywhere in the first 4 GB and span at most 4 GB; if installed RAM exceeds that, memory above the boundary cannot be described by a base/limit pair.
•Alignment Requirements: Some implementations require base addresses to be aligned (e.g., to 4 KB boundaries). This simplifies hardware but wastes memory for small allocations.
•Limit Granularity: Is the limit checked per-byte or per-page? Per-byte is precise but requires full comparator. Per-page (limit × 4096) is simpler but less precise.
•Context Switch Cost: Saving and restoring base/limit registers adds to context switch time. With multiple segment registers, this becomes significant.
•Kernel vs. User Registers: Some systems have separate base/limit for kernel mode. The switch between user and kernel modes must update these appropriately.
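The limit-granularity trade-off can be illustrated with a short sketch. It assumes a 4 KB granule; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define GRANULE_SHIFT 12  /* assumed 4 KB granule: 2^12 bytes */

/* Byte granularity: the limit is an exact size in bytes. */
bool check_byte_limit(uint64_t logical, uint64_t limit_bytes) {
    return logical < limit_bytes;
}

/* Page granularity: the limit counts 4 KB units, so only the upper
 * address bits are compared and the comparator can be narrower. */
bool check_page_limit(uint64_t logical, uint64_t limit_pages) {
    return (logical >> GRANULE_SHIFT) < limit_pages;
}

/* Extra bytes reachable when a region of `size_bytes` is rounded up
 * to whole granules. */
uint64_t granularity_slack(uint64_t size_bytes) {
    uint64_t granule = UINT64_C(1) << GRANULE_SHIFT;
    uint64_t pages = (size_bytes + granule - 1) >> GRANULE_SHIFT;
    return (pages << GRANULE_SHIFT) - size_bytes;
}
```

For a 5,000-byte region, the byte-granular check rejects logical address 5000, while a page-granular limit of 2 pages accepts it and only rejects addresses from 8192 upward. The process can therefore reach 3,192 bytes past the end of its data, which is the precision given up in exchange for simpler hardware.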

mpu_configuration.c
/*
 * ARM Cortex-M MPU Configuration
 * 
 * This shows real base+limit style memory protection
 * on modern embedded processors.
 */
 
#include <stdint.h>
 
// MPU Region Numbers (Cortex-M typically has 8-16 regions)
#define MPU_REGION_FLASH    0   // Code in Flash
#define MPU_REGION_SRAM     1   // Main SRAM
#define MPU_REGION_PERIPH   2   // Peripheral registers
#define MPU_REGION_STACK    3   // Thread stack (per-thread)
 
// MPU Register Addresses (ARMv7-M register layout, e.g. Cortex-M3/M4/M7)
#define MPU_TYPE    (*(volatile uint32_t*)0xE000ED90)
#define MPU_CTRL    (*(volatile uint32_t*)0xE000ED94)
#define MPU_RNR     (*(volatile uint32_t*)0xE000ED98)  // Region Number
#define MPU_RBAR    (*(volatile uint32_t*)0xE000ED9C)  // Region Base
#define MPU_RASR    (*(volatile uint32_t*)0xE000EDA0)  // Region Attr/Size
 
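// Region sizes as log2(size in bytes); mpu_configure_region() converts
// these to the RASR SIZE encoding, which is log2(size) - 1
#define MPU_REGION_SIZE_256K   18  // 2^18 = 256K
#define MPU_REGION_SIZE_64K    16  // 2^16 = 64K
#define MPU_REGION_SIZE_4K     12  // 2^12 = 4K
#define MPU_REGION_SIZE_1K     10  // 2^10 = 1K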
 
// Access permissions
#define MPU_AP_NO_ACCESS       (0 << 24)
#define MPU_AP_RW_FULL         (3 << 24)  // Full access
#define MPU_AP_RO              (6 << 24)  // Read-only
#define MPU_XN                 (1 << 28)  // Execute Never
 
/*
 * Configure an MPU region
 * 
 * base_addr: Must be aligned to size
 * size_log2: Size as power of 2 (e.g., 12 for 4KB)
 * permissions: Access permission bits
 */
void mpu_configure_region(uint32_t region_num, 
                          uint32_t base_addr, 
                          uint32_t size_log2,
                          uint32_t permissions) {
    // Select region
    MPU_RNR = region_num;
    
    // Set base address (lower bits must be 0, include region valid bit)
    MPU_RBAR = (base_addr & 0xFFFFFFE0) | (1 << 4) | region_num;
    
    // Set size and permissions
    // Size field encoding: (size_log2 - 1) in bits [5:1]
    // Enable bit in bit [0]
    uint32_t size_field = ((size_log2 - 1) << 1) | 1;
    MPU_RASR = permissions | size_field;
}
 
void setup_task_protection(uint32_t stack_base, uint32_t stack_size_log2) {
    /*
     * For a simple RTOS task:
     * 
     * Region 0: Flash (code) - Read + Execute
     * Region 1: Global SRAM - Read + Write (shared data)
     * Region 2: Peripheral space - Read + Write, No Execute
     * Region 3: Task stack - Read + Write, No Execute
     * 
     * Tasks cannot:
     * - Execute from SRAM or Stack (XN bit)
     * - Access other tasks' stacks (different stack region per task)
     * - Access regions not configured (causes fault)
     */
    
    // Flash: 256KB starting at 0x08000000
    mpu_configure_region(MPU_REGION_FLASH, 
                         0x08000000, 
                         MPU_REGION_SIZE_256K,
                         MPU_AP_RO);  // Read-only, executable
    
    // Global SRAM: 64KB at 0x20000000
    mpu_configure_region(MPU_REGION_SRAM,
                         0x20000000,
                         MPU_REGION_SIZE_64K,
                         MPU_AP_RW_FULL | MPU_XN);  // Read-write, no execute
    
    // Peripherals: at 0x40000000
    mpu_configure_region(MPU_REGION_PERIPH,
                         0x40000000,
                         MPU_REGION_SIZE_256K,
                         MPU_AP_RW_FULL | MPU_XN);
    
    // Task stack: specific to this task
    mpu_configure_region(MPU_REGION_STACK,
                         stack_base,
                         stack_size_log2,
                         MPU_AP_RW_FULL | MPU_XN);
    
    // Enable MPU
    MPU_CTRL = 1;
}
 
/*
 * On context switch between tasks:
 * - Update Region 3 (stack) base address to new task's stack
 * - If tasks have separate heap regions, update those too
 * - Other regions (Flash, global SRAM, peripherals) stay the same
 */

When to Use MPU vs MMU

Use MPU (base+limit style) when: memory layout is static, real-time constraints require predictable timing, power/area budget is limited, or virtual memory isn't needed. Use MMU (paging) when: each process needs its own private address space, demand paging is required, address spaces need to exceed physical memory, or fine-grained page permissions are needed. Many systems support both, allowing choice per application.

Summary: Base and Limit Registers

We've explored base and limit registers—the elegant simplicity that first made multiprogramming and memory protection practical. While largely superseded by paging, these concepts remain relevant.

Key Takeaways

•Base and limit provides simple address translation—adding a base to all addresses and checking against a limit—using minimal hardware.
•The mechanism enables relocation and protection—programs can load anywhere, and they cannot access other programs' memory.
•Hardware implementation is straightforward—two registers, one adder, one comparator, running in parallel.
•External fragmentation is the fatal flaw—contiguous allocation requirements cause memory to become unusably fragmented over time.
•Segmentation extends base+limit to multiple regions—providing per-segment protection but not solving fragmentation.
•Paging solved fragmentation by using fixed-size units, leading to its dominance in modern systems.
•The concepts remain relevant—thread-local storage, MPUs, GPU buffers, and bounds checking all use base+limit ideas.
•Understanding the progression from base+limit to paging illuminates why modern memory management has its current form.

Module Complete:

This concludes Module 3: Logical vs Physical Addresses. You now have a comprehensive understanding of the addressing duality at the heart of memory management:

Logical address space: The abstraction processes see—contiguous, private, unlimited
Physical address space: The reality of hardware—finite, shared, fragmented
Address translation: The bridge between them, performed on every access
The MMU: Hardware that makes translation fast and enforces protection
Base and limit: The simple predecessor that illuminates why paging was needed

This foundation prepares you for deeper topics: page table structures, virtual memory, demand paging, and memory-mapped files.

Module Complete

You've now mastered the fundamental concepts of logical vs physical addressing—from the abstraction that liberates programmers to the hardware that makes it possible. These aren't just historical concepts; they're the foundation of every memory access in every program. With this understanding, you're prepared to explore more advanced memory management topics: how page tables are structured, how virtual memory extends physical limitations, and how the operating system manages the finite resource of physical RAM.

