Address Translation - Learning Module

Segment Number

The First Half of Every Segmented Address

In a segmented memory system, every logical address tells a two-part story. The first part—the segment number—answers a fundamental question: Which logical unit of the program are we accessing? Is it the code segment containing executable instructions? The data segment holding global variables? The stack segment managing function calls? Or perhaps a dynamically allocated heap segment?

The segment number is not merely an index; it's the key that unlocks the metadata describing an entire region of a process's address space. Without correctly extracting and interpreting this component, the memory management unit cannot even begin the translation process. Understanding the segment number is therefore the essential first step in mastering segmented address translation.

What You Will Learn

By the end of this page, you will understand the precise role of the segment number in logical addresses, how it's extracted through bit manipulation, its function as an index into the segment table, the relationship between segment number width and maximum segment count, and architectural variations in segment number encoding across different systems.

Definition and Fundamental Role

A segment number (also called a segment selector or segment identifier) is the portion of a logical address that identifies which segment within a process's address space is being referenced.

More formally:

The segment number is an unsigned integer that serves as an index into the segment table, identifying the specific segment descriptor that contains the base address and limit for the referenced memory region.

This definition encapsulates several critical concepts that we must examine in detail.

Key Properties of Segment Numbers

•Fixed Position: The segment number occupies a specific, architecturally-defined position within the logical address—typically the high-order bits.
•Fixed Width: The number of bits allocated to the segment number is fixed by the architecture, determining the maximum number of segments per process.
•Direct Index: The segment number directly indexes into the segment table without any transformation (unlike page numbers in some paging schemes).
•Per-Process Scope: Segment number 3 in Process A refers to a completely different segment than segment number 3 in Process B.
•Semantic Meaning: Unlike page numbers, segment numbers often carry semantic significance—segment 0 might always be the code segment, segment 1 the data segment, etc.
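
The per-process scope property can be sketched in a few lines of C. This is only an illustration: the `Descriptor` type, the `base_of` helper, and the two tables with their base addresses are made-up names and values, not taken from any real system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified descriptor: only the base address is modeled. */
typedef struct {
    uint32_t base;
} Descriptor;

/* Each process owns its own segment table (illustrative values only). */
const Descriptor process_a_table[4] = {
    {0x00100000}, {0x00200000}, {0x00300000}, {0x00400000}
};
const Descriptor process_b_table[4] = {
    {0x00900000}, {0x00A00000}, {0x00B00000}, {0x00C00000}
};

/* Resolve a segment number against one particular process's table. */
uint32_t base_of(const Descriptor *table, size_t count, uint16_t s) {
    assert(s < count);  /* the sketch assumes a valid segment number */
    return table[s].base;
}
```

Calling `base_of(process_a_table, 4, 3)` and `base_of(process_b_table, 4, 3)` uses the same segment number but yields two different base addresses, because the number is only meaningful relative to the table it indexes.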

The Logical Address Structure:

In a segmented system, a logical address is a tuple (s, d) where:

s = segment number (identifies WHICH segment)
d = offset within segment (identifies WHERE within that segment)

This is fundamentally different from a linear address where the entire address is a single number. The segment number provides a level of indirection that enables the powerful features of segmentation: variable-sized regions, per-segment protection, and logical organization of program components.

segment_number_structure.txt
Logical Address Structure in Segmentation:
 
┌─────────────────────────────────────────────────────────┐
│                    Logical Address                       │
├───────────────────────┬─────────────────────────────────┤
│    Segment Number     │         Offset (d)              │
│         (s)           │    (displacement within seg)    │
├───────────────────────┼─────────────────────────────────┤
│      k bits           │           m bits                │
└───────────────────────┴─────────────────────────────────┘
 
Example: 16-bit logical address with 4-bit segment number
 
   Logical Address: 0x3A5F
   Binary:          0011 1010 0101 1111
                    ├──┘ └────────────┤
                    │         │
                    │         └── Offset: 0xA5F (2655 in decimal)
                    │
                    └── Segment Number: 0x3 (segment 3)
 
This address references byte 2655 within segment 3.

Segment vs Page Numbers

Unlike page numbers which are merely indices into a flat page table, segment numbers identify logically meaningful program units. A programmer consciously assigns data to segments based on its purpose, access patterns, or protection requirements. This semantic organization is a defining characteristic of segmentation that influences how segment numbers are assigned and used.

Bit Extraction Mechanism

Extracting the segment number from a logical address is a fundamental hardware operation that occurs on every memory reference, so it must be correct and add essentially no latency. Understanding the bit manipulation involved is essential for systems programmers and OS developers.

The Extraction Formula:

For a logical address with:

Total width: n bits
Segment number width: k bits (high-order bits)
Offset width: m bits (low-order bits, where m = n - k)

The segment number is extracted using a right shift:

segment_number = logical_address >> m

Alternatively, using bit masking:

segment_number = (logical_address >> m) & ((1 << k) - 1)

The mask ensures we only get the k bits we need, though the right shift alone suffices when the address is unsigned and the shift brings zeros into the high bits.
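
To see when the mask actually matters, consider a 16-bit logical address carried in a wider 32-bit variable whose upper bits are not guaranteed to be zero. The sketch below assumes the 4-bit segment / 12-bit offset layout used in this page's examples; the function names are hypothetical.

```c
#include <stdint.h>

#define OFFSET_BITS  12u
#define SEGMENT_BITS 4u

/* Shift only: correct as long as no bits are set above bit 15. */
uint32_t segment_shift_only(uint32_t value) {
    return value >> OFFSET_BITS;
}

/* Shift and mask: keeps exactly SEGMENT_BITS bits, whatever sits above them. */
uint32_t segment_shift_and_mask(uint32_t value) {
    return (value >> OFFSET_BITS) & ((1u << SEGMENT_BITS) - 1u);
}
```

For the clean address 0x3A5F both functions return 3. For 0xABCD3A5F, where stray bits occupy the upper half of the word, the shift alone returns 0xABCD3, while the masked version still returns 3.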

segment_extraction.c
#include <stdio.h>
#include <stdint.h>
 
/*
 * Segment Number Extraction
 * 
 * This demonstrates how hardware extracts the segment number
 * from a logical address in a segmented memory system.
 */
 
// Configuration for our example segmented architecture
#define LOGICAL_ADDR_BITS    16   // Total logical address width
#define SEGMENT_BITS          4   // Bits for segment number
#define OFFSET_BITS          12   // Bits for offset (16 - 4 = 12)
 
// Derived constants
#define MAX_SEGMENTS         (1 << SEGMENT_BITS)   // 16 segments
#define MAX_SEGMENT_SIZE     (1 << OFFSET_BITS)    // 4096 bytes per segment
 
// Extraction masks
#define OFFSET_MASK          ((1 << OFFSET_BITS) - 1)      // 0x0FFF
#define SEGMENT_MASK         ((1 << SEGMENT_BITS) - 1)     // 0x000F
 
/**
 * Extract segment number from logical address
 * This is what the MMU hardware does on every memory access
 */
uint16_t extract_segment_number(uint16_t logical_address) {
    return (logical_address >> OFFSET_BITS) & SEGMENT_MASK;
}
 
/**
 * Extract offset from logical address
 */
uint16_t extract_offset(uint16_t logical_address) {
    return logical_address & OFFSET_MASK;
}
 
/**
 * Compose a logical address from segment and offset
 * (Inverse operation - useful for understanding)
 */
uint16_t compose_logical_address(uint16_t segment, uint16_t offset) {
    return (segment << OFFSET_BITS) | (offset & OFFSET_MASK);
}
 
void demonstrate_extraction(uint16_t logical_address) {
    uint16_t segment = extract_segment_number(logical_address);
    uint16_t offset = extract_offset(logical_address);
    
    // Print the address in hex and in binary, with a separator
    // between the segment bits and the offset bits
    printf("Logical Address: 0x%04X (binary: ", logical_address);
    for (int i = 15; i >= 0; i--) {
        printf("%d", (logical_address >> i) & 1);
        if (i == OFFSET_BITS) printf(" | ");
    }
    printf(")\n");
    
    printf("  Segment Number: %u (0x%X)\n", segment, segment);
    printf("  Offset:         %u (0x%03X)\n", offset, offset);
    printf("  Interpretation: Byte %u within Segment %u\n\n", offset, segment);
}
 
int main(void) {
    printf("=== Segment Number Extraction Demo ===\n");
    printf("Architecture: %d-bit addresses, %d-bit segment, %d-bit offset\n",
           LOGICAL_ADDR_BITS, SEGMENT_BITS, OFFSET_BITS);
    printf("Maximum segments: %d, Maximum segment size: %d bytes\n\n",
           MAX_SEGMENTS, MAX_SEGMENT_SIZE);
    
    // Test various addresses
    demonstrate_extraction(0x0000);  // Segment 0, offset 0
    demonstrate_extraction(0x1000);  // Segment 1, offset 0
    demonstrate_extraction(0x3A5F);  // Segment 3, offset 2655
    demonstrate_extraction(0xF123);  // Segment 15, offset 291
    demonstrate_extraction(0xFFFF);  // Segment 15, offset 4095
    
    // Verify composition is inverse of extraction
    printf("=== Verification ===\n");
    uint16_t seg = 5, off = 1234;
    uint16_t addr = compose_logical_address(seg, off);
    printf("Composed (seg=%u, off=%u) -> 0x%04X\n", seg, off, addr);
    printf("Extracted back: seg=%u, off=%u\n",
           extract_segment_number(addr), extract_offset(addr));
    
    return 0;
}

Segment Number Extraction Examples (4-bit segment, 12-bit offset)
Logical Address	Binary Representation	Segment #	Offset
0x0000	0000 | 000000000000	0	0
0x1000	0001 | 000000000000	1	0
0x2800	0010 | 100000000000	2	2048
0x3A5F	0011 | 101001011111	3	2655
0xF123	1111 | 000100100011	15	291
0xFFFF	1111 | 111111111111	15	4095

Hardware Implementation

In actual hardware, segment number extraction needs no shift circuitry at all: the high-order address lines are simply wired to the segment table index inputs, while the low-order lines carry the offset. No shifting computation occurs; the bits are just routed to different destinations. This is why the extraction is effectively 'free' in terms of time cost.

Segment Number as Segment Table Index

The primary purpose of the segment number is to serve as an index into the segment table—a per-process data structure maintained by the operating system that describes the physical location and attributes of each segment.

The Indexing Operation:

Given:

Segment Table Base Register (STBR) pointing to the segment table in memory
Segment Table Length Register (STLR) containing the number of valid entries
Each segment table entry is E bytes in size

The address of the segment table entry for segment number s is:

STE_address = STBR + (s × E)

This is a simple array indexing operation. If STBR = 0x80000, each entry is 8 bytes, and s = 3:

STE_address = 0x80000 + (3 × 8) = 0x80000 + 24 = 0x80018

The hardware then reads the segment table entry from physical address 0x80018.
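
The indexing arithmetic above can be written as a one-line C function. This is a minimal sketch: `ste_address` is a hypothetical helper name, and 32-bit physical addresses are an assumption made to match the worked example.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the segment table entry for segment s: STBR + (s * E). */
uint32_t ste_address(uint32_t stbr, uint32_t s, uint32_t entry_size) {
    assert(entry_size > 0);  /* a zero-sized entry would make every index alias */
    return stbr + s * entry_size;
}
```

With the values from the example, `ste_address(0x80000, 3, 8)` evaluates to 0x80018.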

segment_table_lookup.c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
 
/*
 * Segment Table Structure and Lookup
 * 
 * Demonstrates how the segment number indexes into the segment table
 * to retrieve segment metadata for address translation.
 */
 
// Segment Table Entry structure
typedef struct {
    uint32_t base;        // Physical base address of segment
    uint32_t limit;       // Size of segment in bytes
    uint8_t  present;     // Is segment currently in memory?
    uint8_t  protection;  // Access permissions (R/W/X)
    uint8_t  accessed;    // Has segment been accessed?
    uint8_t  modified;    // Has segment been modified?
} SegmentTableEntry;
 
// Protection bit masks
#define PROT_READ    0x04
#define PROT_WRITE   0x02
#define PROT_EXECUTE 0x01
 
// Simulated segment table (typically one per process)
#define MAX_SEGMENTS 16
SegmentTableEntry segment_table[MAX_SEGMENTS];
uint32_t STBR;  // Segment Table Base Register (simulated)
uint32_t STLR;  // Segment Table Length Register
 
/**
 * Initialize segment table with sample segments
 */
void initialize_segment_table() {
    STBR = (uint32_t)(uintptr_t)segment_table;  // Point to our table
    STLR = 5;  // We have 5 valid segments
    
    // Segment 0: Code segment (Read + Execute)
    segment_table[0] = (SegmentTableEntry){
        .base = 0x00100000,
        .limit = 8192,
        .present = 1,
        .protection = PROT_READ | PROT_EXECUTE,
        .accessed = 0,
        .modified = 0
    };
    
    // Segment 1: Data segment (Read + Write)
    segment_table[1] = (SegmentTableEntry){
        .base = 0x00200000,
        .limit = 16384,
        .present = 1,
        .protection = PROT_READ | PROT_WRITE,
        .accessed = 0,
        .modified = 0
    };
    
    // Segment 2: Stack segment (Read + Write)
    segment_table[2] = (SegmentTableEntry){
        .base = 0x00300000,
        .limit = 4096,
        .present = 1,
        .protection = PROT_READ | PROT_WRITE,
        .accessed = 0,
        .modified = 0
    };
    
    // Segment 3: Shared library (Read + Execute)
    segment_table[3] = (SegmentTableEntry){
        .base = 0x00400000,
        .limit = 32768,
        .present = 1,
        .protection = PROT_READ | PROT_EXECUTE,
        .accessed = 0,
        .modified = 0
    };
    
    // Segment 4: Heap segment (Read + Write)
    segment_table[4] = (SegmentTableEntry){
        .base = 0x00500000,
        .limit = 65536,
        .present = 1,
        .protection = PROT_READ | PROT_WRITE,
        .accessed = 0,
        .modified = 0
    };
}
 
/**
 * Look up segment table entry using segment number
 * Returns pointer to entry, or NULL if invalid segment number
 */
SegmentTableEntry* lookup_segment(uint16_t segment_number) {
    // Check if segment number is within valid range
    if (segment_number >= STLR) {
        printf("ERROR: Segment %u is out of range (STLR = %u)\n", segment_number, STLR);
        return NULL;
    }
    
    // Calculate entry address (simulated array indexing)
    // In hardware: STE_addr = STBR + (segment_number * sizeof(STE))
    // Index the array directly: the simulated 32-bit STBR cannot hold a
    // full pointer on 64-bit hosts, so casting it back would be unsafe.
    return &segment_table[segment_number];
}
 
void print_segment_entry(uint16_t seg_num, SegmentTableEntry* entry) {
    if (!entry) return;
    
    printf("Segment %u:\n", seg_num);
    printf("  Base Address:  0x%08X\n", entry->base);
    printf("  Limit:         %u bytes (0x%X)\n", entry->limit, entry->limit);
    printf("  Present:       %s\n", entry->present ? "Yes" : "No");
    printf("  Protection:    %c%c%c\n",
           (entry->protection & PROT_READ) ? 'R' : '-',
           (entry->protection & PROT_WRITE) ? 'W' : '-',
           (entry->protection & PROT_EXECUTE) ? 'X' : '-');
    printf("\n");
}
 
int main(void) {
    printf("=== Segment Table Lookup Demo ===\n\n");
    
    initialize_segment_table();
    
    printf("Segment Table Base Register (STBR): 0x%08X\n", STBR);
    printf("Segment Table Length Register (STLR): %u\n\n", STLR);
    
    // Look up each valid segment
    for (uint16_t i = 0; i < STLR; i++) {
        SegmentTableEntry* entry = lookup_segment(i);
        print_segment_entry(i, entry);
    }
    
    // Attempt to access invalid segment
    printf("Attempting to access segment 10...\n");
    lookup_segment(10);  // Will print error
    
    return 0;
}

Segment Table Length Check

Before indexing, the hardware compares the segment number against the Segment Table Length Register (STLR). If segment_number ≥ STLR, a trap occurs—the program is trying to access a segment that doesn't exist. This provides a first line of defense against invalid memory accesses, before any bounds checking within the segment itself.
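
The order of operations (length check first, indexing second) can be sketched as follows. The `LookupStatus` enum and both function names are hypothetical, and a returned status code stands in for the hardware trap.

```c
#include <stdint.h>

typedef enum { LOOKUP_OK, LOOKUP_SEGMENT_FAULT } LookupStatus;

/* Compare the segment number against STLR before touching the table. */
LookupStatus check_segment_number(uint32_t s, uint32_t stlr) {
    return (s >= stlr) ? LOOKUP_SEGMENT_FAULT : LOOKUP_OK;
}

/* Length check first; only a valid s is turned into an entry address. */
LookupStatus locate_entry(uint32_t stbr, uint32_t stlr, uint32_t entry_size,
                          uint32_t s, uint32_t *ste_addr) {
    if (check_segment_number(s, stlr) != LOOKUP_OK) {
        return LOOKUP_SEGMENT_FAULT;  /* *ste_addr is left untouched */
    }
    *ste_addr = stbr + s * entry_size;
    return LOOKUP_OK;
}
```

With STLR = 5, segment numbers 0 through 4 pass the check, while 5 and 10 both produce `LOOKUP_SEGMENT_FAULT` without any table access being attempted.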

Segment Number Width and Maximum Segments

The number of bits allocated to the segment number directly determines the maximum number of segments a process can have. This is a fundamental architectural decision with far-reaching implications for system design.

The Relationship:

Maximum Segments = 2^k

Where k is the number of bits in the segment number.

The Tradeoff:

With a fixed logical address width n, increasing k (segment bits) decreases m (offset bits):

m = n - k

This creates a fundamental tradeoff:

More segment bits → More segments, but smaller maximum segment size
Fewer segment bits → Fewer segments, but larger maximum segment size

Segment Number Width Tradeoffs (16-bit logical address)
Segment Bits	Offset Bits	Max Segments	Max Segment Size
2	14	4	16 KB
4	12	16	4 KB
6	10	64	1 KB
8	8	256	256 bytes

Segment Number Width Tradeoffs (32-bit logical address)
Segment Bits	Offset Bits	Max Segments	Max Segment Size
8	24	256	16 MB
12	20	4,096	1 MB
16	16	65,536	64 KB
18	14	262,144	16 KB
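
The rows in both tables follow directly from the two formulas above. A minimal sketch that computes them is shown here; the function names are hypothetical, and 64-bit results are used only so the shifts cannot overflow for the widths discussed.

```c
#include <assert.h>
#include <stdint.h>

/* Maximum number of segments for a k-bit segment number: 2^k. */
uint64_t max_segments(unsigned k) {
    assert(k < 64);
    return (uint64_t)1 << k;
}

/* Maximum segment size in bytes for an n-bit address with k segment bits: 2^(n - k). */
uint64_t max_segment_size(unsigned n, unsigned k) {
    assert(k <= n && n - k < 64);
    return (uint64_t)1 << (n - k);
}
```

For example, `max_segments(4)` is 16 and `max_segment_size(16, 4)` is 4096 bytes (4 KB), matching the second row of the 16-bit table; `max_segment_size(32, 18)` is 16384 bytes (16 KB), matching the last row of the 32-bit table.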

Architectural Considerations:

The choice of segment number width depends on the intended use case:

Few large segments: Suitable for traditional segmentation where each segment represents a major program component (code, data, stack). 4-8 segments may suffice.
Many small segments: Suitable for fine-grained protection or object-based systems where each data structure could be its own segment. Thousands of segments may be needed.
Hybrid approaches: Some architectures support variable segment counts through hierarchical segment tables or per-segment limits on offset width.

Historical Segment Number Widths

•Intel 8086 (1978): 4 segment registers, effectively 2-bit segment selection from registers (CS, DS, SS, ES), but 16-bit segment values
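•Intel 80286 (1982): 16-bit selector, of which 14 bits identify the segment (13-bit index + 1-bit table indicator; the remaining 2 bits hold the requested privilege level), supporting 8,192 global + 8,192 local segments
•Intel 80386+ (1985+): Same selector structure, but with 32-bit base addresses and a 20-bit limit that can be scaled in 4 KB units, enabling segments of up to 4 GB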
•Multics (1965): 18-bit segment numbers supporting 262,144 segments per process
•IBM System/38 (1979): Large segment number space for capability-based addressing
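
As a concrete instance of segment number encoding, an x86 protected-mode selector is a 16-bit value laid out as a 13-bit table index (bits 15..3), a table indicator (bit 2, 0 = global table, 1 = local table), and a 2-bit requested privilege level (bits 1..0). The sketch below decodes those fields; the `Selector` struct and `decode_selector` function are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Decoded fields of a 16-bit x86 protected-mode segment selector. */
typedef struct {
    uint16_t index;            /* bits 15..3: descriptor table index */
    uint8_t  table_indicator;  /* bit 2: 0 = global table, 1 = local table */
    uint8_t  rpl;              /* bits 1..0: requested privilege level */
} Selector;

Selector decode_selector(uint16_t raw) {
    Selector sel;
    sel.index = (uint16_t)(raw >> 3);
    sel.table_indicator = (uint8_t)((raw >> 2) & 0x1u);
    sel.rpl = (uint8_t)(raw & 0x3u);
    return sel;
}
```

Decoding 0x0023 gives index 4 in the global table with privilege level 3. The 13-bit index is what limits each table to 8,192 entries.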

Design Philosophy

The choice of segment number width reflects the system's philosophy. Multics used many segments because it treated each file and data structure as a separate segment. Intel x86 exposed only a handful of segment registers in hardware but allowed software to manage many more segments through descriptor tables (the GDT and LDT). Modern systems often flatten segmentation, using minimal segments and relying on paging for fine-grained memory management.

Explicit vs Implicit Segment Selection

Not all segmented architectures include the segment number directly in every memory address. Two distinct approaches exist: explicit segmentation where the segment is part of the address, and implicit segmentation where the segment is determined by context.

Explicit Segmentation

•Segment number is part of every logical address
•Address format: (segment, offset) tuple
•Compiler generates complete two-part addresses
•Example: Multics, many academic systems
•Maximum flexibility in segment selection
•Larger addresses required

Implicit Segmentation

•Segment determined by instruction type or register
•Address contains only offset
•Hardware infers segment from context
•Example: Intel x86 segment registers
•Smaller instruction encoding
•Limited segment flexibility per instruction

Intel x86 Implicit Segmentation:

The Intel x86 architecture uses implicit segmentation through segment registers. Instead of encoding the segment in each address, the processor maintains segment registers (CS, DS, SS, ES, FS, GS) that are implicitly used based on the type of memory access:

CS (Code Segment): Used for instruction fetches
DS (Data Segment): Used for data accesses (default)
SS (Stack Segment): Used for stack operations (PUSH, POP, and SP/BP-based accesses)
ES, FS, GS (Extra Segments): Available for additional data, often used for thread-local storage (FS/GS)

implicit_segmentation.asm
; Intel x86 Implicit Segmentation Example
;
; The segment used depends on the instruction and registers involved,
; NOT on explicit segment specification in the address
 
section .text
global _start
 
_start:
    ; Instruction fetch - implicitly uses CS (Code Segment)
    ; The CPU fetches this instruction from CS:EIP
    mov eax, 42
    
    ; Data access - implicitly uses DS (Data Segment)
    ; Effective address: DS:0x402000
    mov ebx, [0x402000]
    
    ; Stack access - implicitly uses SS (Stack Segment)
    ; PUSH first decrements ESP, then stores the value at SS:ESP
    push eax              ; Writes to SS:ESP after the decrement
    
    ; Stack-relative access - implicitly uses SS
    ; EBP-relative accesses default to SS
    mov ecx, [ebp-4]      ; Actually SS:[EBP-4]
    
    ; Explicit segment override - programmer forces different segment
    ; This accesses ES:0x402000 instead of DS:0x402000
    mov edx, [es:0x402000]
    
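    ; FS/GS often used for thread-local storage
    ; On Linux x86-64, FS points to thread control block
    ; (64-bit mode only; kept as a comment because the rest of this
    ; listing is 32-bit code and RAX does not exist there)
    ; mov rax, [fs:0x28]  ; Access thread-local canary value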
 
; Key insight: The segment is NOT encoded in the address 0x402000
; It's determined by the instruction context or explicit override

x86 Default Segment Selection Rules
Reference Type	Default Segment	Can Override?
Instruction fetch	CS	No
Stack operations (PUSH/POP)	SS	No
String destination (ES:DI)	ES	No
BP or SP as base register	SS	Yes
Other data references	DS	Yes
String source (DS:SI)	DS	Yes
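
The rules in this table can be modeled as a small lookup in C. This is a simplified sketch, not a description of the actual decoder logic: the enum and function names are hypothetical, and an override on a non-overridable reference is simply ignored here.

```c
#include <stdbool.h>

typedef enum { SEG_CS, SEG_DS, SEG_SS, SEG_ES, SEG_FS, SEG_GS } Segment;

typedef enum {
    REF_INSTRUCTION_FETCH,
    REF_STACK_OP,       /* PUSH/POP */
    REF_STRING_DEST,    /* ES:DI */
    REF_BP_SP_BASE,     /* BP or SP used as base register */
    REF_OTHER_DATA,
    REF_STRING_SOURCE   /* DS:SI */
} RefType;

/* Default segment for each reference type, as listed in the table. */
Segment default_segment(RefType ref) {
    switch (ref) {
        case REF_INSTRUCTION_FETCH: return SEG_CS;
        case REF_STACK_OP:
        case REF_BP_SP_BASE:        return SEG_SS;
        case REF_STRING_DEST:       return SEG_ES;
        case REF_OTHER_DATA:
        case REF_STRING_SOURCE:     return SEG_DS;
    }
    return SEG_DS;  /* not reached for valid RefType values */
}

/* Which reference types accept a segment override, per the table. */
bool can_override(RefType ref) {
    return ref == REF_BP_SP_BASE || ref == REF_OTHER_DATA ||
           ref == REF_STRING_SOURCE;
}

/* Segment actually used: the override if present and allowed, else the default. */
Segment effective_segment(RefType ref, bool has_override, Segment override_segment) {
    if (has_override && can_override(ref)) {
        return override_segment;
    }
    return default_segment(ref);
}
```

For instance, `effective_segment(REF_OTHER_DATA, true, SEG_ES)` returns `SEG_ES`, mirroring the `[es:0x402000]` access in the assembly listing, while `effective_segment(REF_STACK_OP, true, SEG_ES)` still returns `SEG_SS`.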

Modern x86-64 Simplification

In 64-bit mode (x86-64), segmentation is largely flattened. CS, DS, ES, and SS all have base = 0 and are effectively ignored for address calculation. Only FS and GS retain non-zero bases, typically used for thread-local storage. The segment number still exists in the architecture but has minimal impact on address translation, with paging handling virtually all memory management.

Summary: The Segment Number's Role

The segment number is the essential first component of address translation in segmented memory systems. It transforms a logical address from an abstract reference into a concrete path to physical memory.

Key Takeaways

•Segment number identifies which segment within the process's address space is being accessed, answering 'which logical unit?'
•Extraction is a simple bit operation — right-shifting the logical address by the offset width yields the segment number
•The segment number indexes the segment table — it's multiplied by entry size and added to STBR to locate the segment descriptor
•Width determines capacity — k bits allow 2^k segments, but reduce maximum segment size by consuming address bits
•Segmentation can be explicit or implicit — some architectures encode segments in addresses, others use segment registers with context-based selection

What's Next:

With the segment number extracted and the segment table entry located, the next step in address translation focuses on the offset within segment. The offset specifies exactly which byte within the identified segment is being accessed. Combined with the segment's base address, the offset will help form the final physical address—but first, it must pass bounds checking against the segment's limit.

Page Complete

You now understand the segment number—the first half of every segmented address. You can extract it from logical addresses, use it to index segment tables, and appreciate the tradeoffs in segment number width. Next, we'll explore the offset component and how it pinpoints the exact memory location within a segment.