If the page number tells you which page to access, the offset tells you where within that page. This second component of a logical address identifies the precise byte position—from 0 to the page size minus one—and carries a remarkable property: the offset never changes during translation.
Think of it this way: if a page is a 4096-byte container, and your data is at byte 1000 within your virtual page, that data will be at byte 1000 within the corresponding physical frame. Pages and frames are identical in size precisely so that offsets can pass through unchanged. This invariance is not coincidental—it's a deliberate design choice that dramatically simplifies translation hardware.
By the end of this page, you will understand how to extract the offset from any logical address, why offsets remain invariant during translation, the relationship between page size and offset bit width, and how the offset complements the page number to provide complete addressing. You'll master the bit manipulation and understand its hardware implementation.
The offset (also called displacement or page offset) represents the position of a byte within a page. It answers the question: "Once we've identified the correct page (or frame), how many bytes into that page is the target location?"
Formal Definition:
The offset d is the portion of a logical address that specifies the byte position within a page, ranging from 0 to (page_size - 1).
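For a system with page size P = 2^n:
- The offset occupies the low-order n bits of the logical address.
- Valid offsets range from 0 to 2^n - 1.
- The offset equals the logical address modulo P.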
The Critical Insight:
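Because pages and frames are identically sized, the offset has the same meaning in both virtual and physical address spaces:
- In the virtual address, it is the byte position within the page.
- In the physical address, it is the same byte position within the frame.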
This invariance is fundamental. Only the page number changes (becoming a frame number); the offset passes through the translation unit unmodified.
If pages and frames had different sizes, translation would require rescaling the offset—a complex operation. By mandating equal sizes, the system guarantees that any valid offset in a page is equally valid in a frame. This design choice, made in the earliest paged systems, remains universal today.
Extracting the offset is the complementary operation to extracting the page number. While page number extraction uses a right shift, offset extraction uses a bitwise AND mask.
Formula:
Offset = Logical Address AND (Page Size - 1)
where Page Size = 2^n
Alternatively, expressed as a modulo operation:
Offset = Logical Address MOD Page Size
Both formulations are equivalent when the page size is a power of two, but the AND operation is far more efficient in hardware.
Why (Page Size - 1) Creates a Perfect Mask:
For any power of two, subtracting 1 creates a binary number with all lower bits set:
Page Size = 4096 = 2^12 = 0001 0000 0000 0000
Page Size - 1 = 4095 = 0000 1111 1111 1111
ANDing with this mask zeros out all bits above the offset position while preserving all offset bits intact.
Step-by-Step Example:
Extract offset from address 0x00012345 with 4KB pages:
Address: 0x00012345 = 0000 0000 0000 0001 0010 0011 0100 0101
Mask: 0x00000FFF = 0000 0000 0000 0000 0000 1111 1111 1111
──────────────────────────────────────────────────────
Offset: 0x00000345 = 0000 0000 0000 0000 0000 0011 0100 0101
The offset is 0x345 = 837 bytes.
```c
#include <stdio.h>
#include <stdint.h>

/*
 * Offset Extraction Demonstration
 *
 * This program shows how the MMU extracts the offset from
 * a logical address using bitwise AND with a mask.
 */

// Page sizes and their corresponding masks
#define PAGE_4KB 4096ULL
#define MASK_4KB (PAGE_4KB - 1)   // 0x0000000000000FFF

#define PAGE_2MB (2ULL << 20)     // 2,097,152
#define MASK_2MB (PAGE_2MB - 1)   // 0x00000000001FFFFF

#define PAGE_1GB (1ULL << 30)     // 1,073,741,824
#define MASK_1GB (PAGE_1GB - 1)   // 0x000000003FFFFFFF

// Horizontal rule and banner borders used in the output
#define RULE    "═══════════════════════════════════════════════════════════════"
#define BOX_TOP "╔══════════════════════════════════════════════════════════════╗"
#define BOX_BOT "╚══════════════════════════════════════════════════════════════╝"

/**
 * Extract offset using bitwise AND (preferred hardware method)
 */
uint64_t extract_offset_and(uint64_t addr, uint64_t mask) {
    return addr & mask;
}

/**
 * Extract offset using modulo (mathematically equivalent)
 */
uint64_t extract_offset_mod(uint64_t addr, uint64_t page_size) {
    return addr % page_size;
}

/**
 * Extract offset using subtraction (alternative view)
 * offset = addr - (page_number * page_size)
 */
uint64_t extract_offset_sub(uint64_t addr, int offset_bits) {
    uint64_t page_number = addr >> offset_bits;
    uint64_t page_start = page_number << offset_bits;
    return addr - page_start;
}

void demonstrate_extraction(uint64_t addr, const char* label) {
    printf("\n" RULE "\n");
    printf("Address: %s (0x%016llX)\n", label, (unsigned long long)addr);
    printf(RULE "\n");

    // 4KB pages
    uint64_t off_and = extract_offset_and(addr, MASK_4KB);
    uint64_t off_mod = extract_offset_mod(addr, PAGE_4KB);
    uint64_t off_sub = extract_offset_sub(addr, 12);

    printf("\n4KB Pages (12 offset bits):\n");
    printf("  Method 1 (AND mask): offset = 0x%03llX = %4llu bytes\n",
           (unsigned long long)off_and, (unsigned long long)off_and);
    printf("  Method 2 (modulo):   offset = 0x%03llX = %4llu bytes\n",
           (unsigned long long)off_mod, (unsigned long long)off_mod);
    printf("  Method 3 (subtract): offset = 0x%03llX = %4llu bytes\n",
           (unsigned long long)off_sub, (unsigned long long)off_sub);
    printf("  All methods match: %s\n",
           (off_and == off_mod && off_mod == off_sub) ? "✓ YES" : "✗ NO");

    // 2MB pages
    uint64_t off_2mb = extract_offset_and(addr, MASK_2MB);
    printf("\n2MB Pages (21 offset bits):\n");
    printf("  Offset = 0x%06llX = %7llu bytes\n",
           (unsigned long long)off_2mb, (unsigned long long)off_2mb);

    // Show the bitwise operation
    printf("\nBitwise AND visualization (4KB):\n");
    printf("  Address: 0x%08llX\n", (unsigned long long)(addr & 0xFFFFFFFF));
    printf("  Mask:    0x%08llX (Page Size - 1)\n",
           (unsigned long long)(MASK_4KB & 0xFFFFFFFF));
    printf("           ─────────────────\n");
    printf("  Offset:  0x%08llX\n", (unsigned long long)(off_and & 0xFFFFFFFF));
}

void show_mask_generation(void) {
    printf("\n" BOX_TOP "\n");
    printf("║              HOW THE OFFSET MASK IS GENERATED                ║\n");
    printf(BOX_BOT "\n\n");

    printf("Page Size in Binary:                 Page Size - 1 (MASK):\n\n");
    printf("4 KB = 0x00001000                    0x00000FFF\n");
    printf("     = ...0001 0000 0000 0000        ...0000 1111 1111 1111\n\n");
    printf("2 MB = 0x00200000                    0x001FFFFF\n");
    printf("     = ...0010 0000 0000...          ...0001 1111 1111...\n\n");
    printf("1 GB = 0x40000000                    0x3FFFFFFF\n");
    printf("     = 0100 0000 0000...             0011 1111 1111...\n\n");

    printf("Pattern: 2^n always has exactly one bit set.\n");
    printf("         2^n - 1 has all bits below that position set.\n");
    printf("         This creates a perfect mask for the lower n bits.\n");
}

int main(void) {
    printf(BOX_TOP "\n");
    printf("║              OFFSET EXTRACTION DEMONSTRATION                 ║\n");
    printf(BOX_BOT "\n");

    show_mask_generation();

    // Test various addresses
    demonstrate_extraction(0x00012345, "Example address");
    demonstrate_extraction(0x00001000, "Start of page 1 (offset = 0)");
    demonstrate_extraction(0x00001FFF, "End of page 1 (max offset)");
    demonstrate_extraction(0xDEADBEEF, "Classic pattern");
    demonstrate_extraction(0x00000000, "Zero address");

    printf("\n" RULE "\n");
    printf("KEY INSIGHT: The offset is simply the 'remainder' when you\n");
    printf("             divide the address into page-sized chunks.\n");
    printf("             AND mask is O(1) in hardware; division would be slow.\n");
    printf(RULE "\n");

    return 0;
}
```

In actual hardware, offset extraction is even simpler than an AND operation: it is just wiring. The lower 12 bits (for 4KB pages) are routed directly to the offset portion of the physical address bus. No logic gates and no computation are involved, only electrical connections. This is why the offset itself adds no delay to translation; the only cost is the page-to-frame lookup.
A fundamental property of paging is that offsets are preserved during translation. The offset in the virtual address equals the offset in the physical address. This invariance has profound implications for system design.
Why Offset Invariance Matters:
If p and q are pointers within the same page with q = p + 100, the physical addresses maintain the same relationship: phys_q = phys_p + 100.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/*
 * Demonstrating Offset Invariance
 *
 * This simulation shows that the offset remains unchanged
 * during virtual-to-physical address translation.
 */

#define PAGE_SIZE   4096
#define OFFSET_BITS 12
#define OFFSET_MASK 0xFFF

// Borders used in the output
#define RULE    "═══════════════════════════════════════════════════════════════"
#define BOX_TOP "┌──────────────────────────────────────────────────────────────┐"
#define BOX_MID "├──────────────────────────────────────────────────────────────┤"
#define BOX_BOT "└──────────────────────────────────────────────────────────────┘"

// Simulated page table (page -> frame mapping)
uint32_t page_to_frame[] = {
    [0]  = 0x100,  // Page 0  -> Frame 0x100
    [1]  = 0x055,  // Page 1  -> Frame 0x055
    [18] = 0x5A2,  // Page 18 -> Frame 0x5A2
    [42] = 0x999,  // Page 42 -> Frame 0x999
};

typedef struct {
    uint64_t virtual_addr;
    uint64_t physical_addr;
    uint32_t page_number;
    uint32_t frame_number;
    uint32_t virtual_offset;
    uint32_t physical_offset;
    bool     offset_preserved;
} TranslationResult;

TranslationResult translate(uint64_t vaddr) {
    TranslationResult result;

    result.virtual_addr = vaddr;
    result.page_number = vaddr >> OFFSET_BITS;
    result.virtual_offset = vaddr & OFFSET_MASK;

    // Look up frame number (simplified - no bounds checking)
    result.frame_number = page_to_frame[result.page_number];

    // Form physical address
    result.physical_addr =
        ((uint64_t)result.frame_number << OFFSET_BITS) | result.virtual_offset;

    // Extract offset from physical address (for verification)
    result.physical_offset = result.physical_addr & OFFSET_MASK;

    // Check invariance
    result.offset_preserved = (result.virtual_offset == result.physical_offset);

    return result;
}

void print_translation(TranslationResult r) {
    printf("\n" BOX_TOP "\n");
    printf("│ Virtual Address:  0x%08llX                                 │\n",
           (unsigned long long)r.virtual_addr);
    printf(BOX_MID "\n");
    printf("│ Decomposition:                                               │\n");
    printf("│   Page Number = %u (0x%X)\n", r.page_number, r.page_number);
    printf("│   Offset      = %u (0x%X)\n", r.virtual_offset, r.virtual_offset);
    printf(BOX_MID "\n");
    printf("│ Translation:                                                 │\n");
    printf("│   Page %u → Frame 0x%X\n", r.page_number, r.frame_number);
    printf(BOX_MID "\n");
    printf("│ Physical Address: 0x%08llX                                 │\n",
           (unsigned long long)r.physical_addr);
    printf(BOX_MID "\n");
    printf("│ Offset Verification:                                         │\n");
    printf("│   Virtual Offset:   0x%03X (%4u bytes)\n",
           r.virtual_offset, r.virtual_offset);
    printf("│   Physical Offset:  0x%03X (%4u bytes)\n",
           r.physical_offset, r.physical_offset);
    printf("│   Offset Preserved: %s\n", r.offset_preserved ? "✓ YES" : "✗ NO");
    printf(BOX_BOT "\n");
}

void demonstrate_pointer_arithmetic(void) {
    printf("\n" RULE "\n");
    printf("              POINTER ARITHMETIC PRESERVATION\n");
    printf(RULE "\n\n");

    // Two addresses in the same page, 100 bytes apart
    uint64_t vaddr_p = 0x0002A500;  // Page 42, offset 0x500
    uint64_t vaddr_q = 0x0002A564;  // Page 42, offset 0x564 (100 bytes later)

    TranslationResult r_p = translate(vaddr_p);
    TranslationResult r_q = translate(vaddr_q);

    printf("Virtual:  p = 0x%08llX, q = 0x%08llX\n",
           (unsigned long long)vaddr_p, (unsigned long long)vaddr_q);
    printf("          q - p = %llu bytes (in virtual space)\n",
           (unsigned long long)(vaddr_q - vaddr_p));
    printf("\n");
    printf("Physical: p → 0x%08llX, q → 0x%08llX\n",
           (unsigned long long)r_p.physical_addr,
           (unsigned long long)r_q.physical_addr);
    printf("          q - p = %llu bytes (in physical space)\n",
           (unsigned long long)(r_q.physical_addr - r_p.physical_addr));
    printf("\n");
    printf("Relationship preserved: %s\n",
           (vaddr_q - vaddr_p) == (r_q.physical_addr - r_p.physical_addr)
               ? "✓ YES" : "✗ NO");
    printf("\n");
    printf("KEY INSIGHT: Within a page, all address relationships are\n");
    printf("             preserved because only the high bits change.\n");
}

int main(void) {
    printf("╔══════════════════════════════════════════════════════════════╗\n");
    printf("║              OFFSET INVARIANCE DEMONSTRATION                 ║\n");
    printf("╚══════════════════════════════════════════════════════════════╝\n");

    // Demonstrate for several addresses
    print_translation(translate(0x00000500));  // Page 0, offset 0x500
    print_translation(translate(0x00001234));  // Page 1, offset 0x234
    print_translation(translate(0x00012345));  // Page 18, offset 0x345

    // Show pointer arithmetic preservation
    demonstrate_pointer_arithmetic();

    return 0;
}
```

While arithmetic within a page is always preserved, arithmetic across page boundaries is NOT guaranteed to maintain physical address relationships. Pages 5 and 6 in virtual memory might map to frames 1000 and 2 in physical memory (non-contiguous). Programs should never assume that virtually contiguous pages are physically contiguous; this freedom is exactly what lets the operating system allocate frames non-contiguously.
Memory alignment—the requirement that data be placed at addresses that are multiples of specific values—is preserved through translation because alignment is determined by the offset, which remains invariant.
Common Alignment Requirements:
| Data Type | Typical Alignment | Offset Requirement |
|---|---|---|
| char | 1 byte | Any offset |
| short | 2 bytes | Offset divisible by 2 |
| int/float | 4 bytes | Offset divisible by 4 |
| long/double | 8 bytes | Offset divisible by 8 |
| SIMD (128-bit) | 16 bytes | Offset divisible by 16 |
| Cache line | 64 bytes | Offset divisible by 64 |
Alignment Through the Offset Lens:
Since a 4KB page is divisible by all common alignment boundaries (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096), an aligned virtual address will always translate to an aligned physical address.
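Consider an 8-byte aligned variable at virtual address 0x00012348:
- Its offset is 0x348 = 840, and 840 is divisible by 8, so the virtual address is 8-byte aligned.
- If its page (page 18) maps to frame 0x5A2, the physical address is 0x005A2348.
- The physical offset is still 0x348, and the frame base is a multiple of 4096, so the physical address is also 8-byte aligned.

This invariance ensures that data correctly aligned in virtual memory is correctly aligned in physical memory, with no extra checks at translation time.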
```c
#include <stdio.h>
#include <stdint.h>

/*
 * Demonstration: Alignment Preservation Through Translation
 *
 * Shows that alignment properties are determined by offset
 * and are preserved during address translation.
 */

#define PAGE_SIZE 4096

// Horizontal rule used in the output
#define RULE "═══════════════════════════════════════════════════════════════"

typedef struct {
    int alignment;
    const char* type;
} AlignmentInfo;

AlignmentInfo alignments[] = {
    {1,  "byte"},
    {2,  "short/16-bit"},
    {4,  "int/32-bit"},
    {8,  "long/64-bit"},
    {16, "SSE/128-bit"},
    {32, "AVX/256-bit"},
    {64, "cache line"},
};

const int num_alignments = sizeof(alignments) / sizeof(alignments[0]);

void check_alignment(uint64_t vaddr, uint32_t frame_number) {
    uint32_t offset = vaddr & (PAGE_SIZE - 1);
    uint64_t paddr = ((uint64_t)frame_number << 12) | offset;

    printf("\nVirtual Address:  0x%08llX (offset = 0x%03X = %u)\n",
           (unsigned long long)vaddr, offset, offset);
    printf("Physical Address: 0x%08llX (frame 0x%X → same offset)\n\n",
           (unsigned long long)paddr, frame_number);

    printf("%-20s | %-15s | %-15s\n",
           "Alignment Type", "Virtual Align", "Physical Align");
    printf("%-20s-+-%-15s-+-%-15s\n",
           "--------------------", "---------------", "---------------");

    for (int i = 0; i < num_alignments; i++) {
        int align = alignments[i].alignment;
        int v_aligned = (vaddr % align == 0);
        int p_aligned = (paddr % align == 0);

        printf("%-20s | %s%-13s | %s%-13s\n",
               alignments[i].type,
               v_aligned ? "✓ " : "✗ ",
               v_aligned ? "aligned" : "NOT aligned",
               p_aligned ? "✓ " : "✗ ",
               p_aligned ? "aligned" : "NOT aligned");
    }
}

void explain_page_alignment(void) {
    printf("\n" RULE "\n");
    printf("       WHY PAGE SIZE GUARANTEES ALIGNMENT PRESERVATION\n");
    printf(RULE "\n\n");

    printf("4KB (4096) is divisible by all common alignment values:\n\n");

    int common_aligns[] = {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096};
    int n = sizeof(common_aligns) / sizeof(common_aligns[0]);

    for (int i = 0; i < n; i++) {
        int a = common_aligns[i];
        printf("  4096 ÷ %4d = %4d (exact)   ", a, 4096 / a);
        printf("Offset %% %d → Physical %% %d\n", a, a);
    }

    printf("\nSince page boundaries are perfectly aligned to all these values,\n");
    printf("and the offset is the position WITHIN a page, any alignment check\n");
    printf("on the virtual offset produces the same result for the physical offset.\n");
}

int main(void) {
    printf("╔══════════════════════════════════════════════════════════════╗\n");
    printf("║            ALIGNMENT PRESERVATION DEMONSTRATION              ║\n");
    printf("╚══════════════════════════════════════════════════════════════╝\n");

    explain_page_alignment();

    printf("\n" RULE "\n");
    printf("                 ALIGNMENT CHECK EXAMPLES\n");
    printf(RULE "\n");

    // Well-aligned address (64-byte aligned for cache line)
    check_alignment(0x00012340, 0x5A2);  // Offset 0x340 = 832 (div by 64)

    // 8-byte aligned but not 16-byte aligned
    check_alignment(0x00012348, 0x5A2);  // Offset 0x348 = 840 (div by 8, not 16)

    // Misaligned address
    check_alignment(0x00012345, 0x5A2);  // Offset 0x345 = 837 (odd)

    printf("\n" RULE "\n");
    printf("OBSERVATION: Alignment properties are IDENTICAL for virtual and\n");
    printf("             physical addresses because they depend only on offset.\n");
    printf(RULE "\n");

    return 0;
}
```

Many modern CPUs access memory in 64-byte cache lines. The offset's low 6 bits (0-63) determine where data falls within a cache line. Since these bits are preserved, the position of data within a cache line is predictable from the virtual address alone, which is a major convenience for performance optimization.
Modern systems support multiple page sizes simultaneously (4KB, 2MB, 1GB on x86-64). The offset width changes depending on which page size applies to a particular memory region. Larger pages mean more offset bits, fewer page number bits.
| Page Size | Offset Bits | Offset Range | Offset Mask | Addressable per Page |
|---|---|---|---|---|
| 4 KB | 12 | 0 - 4,095 | 0x00000FFF | 4,096 bytes |
| 2 MB | 21 | 0 - 2,097,151 | 0x001FFFFF | 2,097,152 bytes |
| 1 GB | 30 | 0 - 1,073,741,823 | 0x3FFFFFFF | 1,073,741,824 bytes |
Implications of Larger Offsets:
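With 2MB pages, the offset is 21 bits wide, covering over 2 million possible positions within a single page. This means:
- Each translation entry (and each TLB entry) covers 512 times more memory than with 4KB pages.
- Fewer address bits remain for the page number, so fewer translations are needed to cover the same region.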
Address Decomposition Comparison:
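For address 0x00123456:

| Page Size | Page Number | Offset |
|---|---|---|
| 4 KB | 0x123 (291) | 0x456 (1,110) |
| 2 MB | 0x0 (0) | 0x123456 (1,193,046) |
| 1 GB | 0x0 (0) | 0x123456 (1,193,046) |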
```c
#include <stdio.h>
#include <stdint.h>

/*
 * Offset Extraction for Different Page Sizes
 *
 * Shows how the same address decomposes differently
 * depending on the configured page size.
 */

// Borders used in the output
#define RULE    "═══════════════════════════════════════════════════════════════"
#define BOX_TOP "╔══════════════════════════════════════════════════════════════╗"
#define BOX_MID "╠══════════════════════════════════════════════════════════════╣"
#define BOX_BOT "╚══════════════════════════════════════════════════════════════╝"

void decompose_address(uint64_t addr) {
    printf("\n" BOX_TOP "\n");
    printf("║ Address: 0x%016llX\n", (unsigned long long)addr);
    printf(BOX_MID "\n");

    // 4KB pages (12 offset bits)
    uint64_t page_4kb = addr >> 12;
    uint64_t off_4kb = addr & 0xFFF;
    printf("║\n");
    printf("║ 4KB Pages (12 offset bits):\n");
    printf("║   Page Number: 0x%05llX (%llu)\n",
           (unsigned long long)page_4kb, (unsigned long long)page_4kb);
    printf("║   Offset:      0x%03llX (%llu bytes into page)\n",
           (unsigned long long)off_4kb, (unsigned long long)off_4kb);
    // The original label said "Binary" but the values are printed in decimal
    printf("║   Fields (decimal): [%20llu] | [%12llu]\n",
           (unsigned long long)page_4kb, (unsigned long long)off_4kb);

    // 2MB pages (21 offset bits)
    uint64_t page_2mb = addr >> 21;
    uint64_t off_2mb = addr & 0x1FFFFF;
    printf("║\n");
    printf("║ 2MB Pages (21 offset bits):\n");
    printf("║   Page Number: 0x%03llX (%llu)\n",
           (unsigned long long)page_2mb, (unsigned long long)page_2mb);
    printf("║   Offset:      0x%05llX (%llu bytes into page)\n",
           (unsigned long long)off_2mb, (unsigned long long)off_2mb);
    printf("║   Coverage increase: Each entry covers 512× more memory\n");

    // 1GB pages (30 offset bits)
    uint64_t page_1gb = addr >> 30;
    uint64_t off_1gb = addr & 0x3FFFFFFF;
    printf("║\n");
    printf("║ 1GB Pages (30 offset bits):\n");
    printf("║   Page Number: 0x%01llX (%llu)\n",
           (unsigned long long)page_1gb, (unsigned long long)page_1gb);
    printf("║   Offset:      0x%08llX (%llu bytes into page)\n",
           (unsigned long long)off_1gb, (unsigned long long)off_1gb);
    printf("║   Coverage increase: Each entry covers 262,144× more memory\n");
    printf("║\n");
    printf(BOX_BOT "\n");
}

void show_tlb_coverage(void) {
    printf("\n" RULE "\n");
    printf("                  TLB COVERAGE COMPARISON\n");
    printf(RULE "\n\n");

    int typical_tlb_entries = 64;  // Simplified; real TLBs vary

    printf("With %d TLB entries:\n\n", typical_tlb_entries);
    // Floating-point division: 64 × 4 KB is 0.25 MB, which integer
    // division would truncate to 0
    printf("  4KB pages: %d entries × 4 KB = %6d KB = %.2f MB coverage\n",
           typical_tlb_entries, typical_tlb_entries * 4,
           (typical_tlb_entries * 4) / 1024.0);
    printf("  2MB pages: %d entries × 2 MB = %6d MB coverage\n",
           typical_tlb_entries, typical_tlb_entries * 2);
    printf("  1GB pages: %d entries × 1 GB = %6d GB coverage\n",
           typical_tlb_entries, typical_tlb_entries);

    printf("\nLarger pages = more offset bits = more memory per TLB entry\n");
    printf("This is why databases and VMs often use huge pages.\n");
}

int main(void) {
    printf(BOX_TOP "\n");
    printf("║              OFFSET WITH VARIABLE PAGE SIZES                 ║\n");
    printf(BOX_BOT "\n");

    decompose_address(0x0000000000123456ULL);
    decompose_address(0x0000000012345678ULL);
    decompose_address(0x00007FFFF7DD5000ULL);  // Typical library address

    show_tlb_coverage();

    return 0;
}
```

The page table structure indicates which page size applies to each region. On x86-64, a flag in the Page Directory entry (the PS bit) can indicate a 2MB page instead of pointing to a Page Table for 4KB pages. Similarly, a PDPT entry with the PS bit set maps a 1GB page. The MMU hardware adjusts offset extraction based on these flags.
The hardware handling of offsets is elegantly simple. Since the offset passes through translation unchanged, it bypasses the translation logic entirely and is directly concatenated with the translated frame number.
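Key Hardware Properties:
- The offset bits bypass the translation logic entirely.
- Splitting off the offset and recombining it with the frame number is wiring, not computation.
- Only the page number bits take part in the page-to-frame lookup.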
Physical Address Formation:
The final physical address is formed by simple concatenation:
Physical Address = (Frame Number << offset_bits) | Offset
In hardware, this isn't even a shift—the frame number bits are simply placed in the higher bit positions and the offset bits in the lower positions through wiring. The result appears on the physical address bus:
Bit positions: [31 - 12] [11 - 0]
↓ ↓
Physical Address: Frame # | Offset
The offset bypass design has remained unchanged since the earliest paging systems (1960s) because it's optimal. Any scheme that modified the offset would add complexity and latency to every memory access. By keeping pages and frames the same size and passing offsets through unchanged, paging achieves minimal overhead—just the page-to-frame lookup.
The offset is the simpler half of address translation—but its simplicity is a feature, not a limitation. By preserving offsets unchanged, paging achieves efficiency and predictability.
What's Next:
With both page number and offset extraction understood, we can now examine the core of translation: frame lookup. This is where the extracted page number is used to find the corresponding physical frame number in the page table—completing the mapping from virtual to physical address.
You now understand offset extraction—the portion of the logical address that identifies the exact byte within a page. Combined with page number extraction, you have the complete picture of how a logical address is decomposed. Next, we'll see how the page table translates the page number to a frame number.