Loading learning content...
Every memory access in a paged system begins with a simple but profound operation: splitting a logical address into two components. This division—invisible to programmers but fundamental to hardware—separates the where among pages (the page number) from the where within a page (the offset). Understanding page number extraction is understanding the first step the CPU takes every single time your program touches memory.
When your program accesses address 0x00B5F080, the hardware doesn't search through memory looking for this location. Instead, it extracts the page number as an index into a table—a direct lookup that transforms an arbitrary-seeming address into a structured, navigable reference. This extraction happens billions of times per second on modern systems, making it perhaps the most frequently executed operation in computing.
By the end of this page, you will understand exactly how the page number is extracted from a logical address through bit manipulation, why the page size determines the boundary between page number and offset, how different page sizes affect address decomposition, and how to manually extract page numbers from any given address. This knowledge is essential for understanding page table lookups and the complete translation process.
In a paged memory system, every logical (virtual) address is conceptually divided into two parts:
Logical Address = Page Number (p) | Offset (d)
This division is not arbitrary—it is dictated by the page size chosen for the system. The page size determines how many bits are devoted to each portion.
The Mathematical Foundation:
For a system with:
The address splits as:
Example with 32-bit addresses and 4KB pages:
This creates an address space with:
Page sizes are constrained to powers of two (2^n) because this allows the address to be split using simple bit boundaries. If the page size were, say, 3000 bytes, extracting the page number would require expensive division operations on every memory access. With power-of-two sizes, extraction is instantaneous—just bit slicing, which is trivial in hardware.
Extracting the page number from a logical address is a simple bit operation: right-shift the address by the number of offset bits. This discards the offset and leaves only the page number.
Formula:
Page Number = Logical Address >> n
where n = log₂(page size in bytes)
Alternatively, this can be expressed as integer division:
Page Number = Logical Address / Page Size (integer division)
Both operations are equivalent when the page size is a power of two.
| Logical Address (Hex) | Logical Address (Binary, showing split) | Page Number | Verification |
|---|---|---|---|
| 0x00001000 | 0000 0000 0000 0000 0001 | 0000 0000 0000 | 0x00001 = 1 | 1 × 4096 = 4096 = 0x1000 ✓ |
| 0x00012345 | 0000 0000 0000 0001 0010 | 0011 0100 0101 | 0x00012 = 18 | Page 18, offset 0x345 |
| 0x00B5F080 | 0000 0000 1011 0101 1111 | 0000 1000 0000 | 0x00B5F = 2911 | Page 2911, offset 0x080 |
| 0xFFFFFFFF | 1111 1111 1111 1111 1111 | 1111 1111 1111 | 0xFFFFF = 1,048,575 | Last page, last byte |
| 0x00000FFF | 0000 0000 0000 0000 0000 | 1111 1111 1111 | 0x00000 = 0 | Page 0, offset 4095 |
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126
#include <stdio.h>#include <stdint.h> /* * Page Number Extraction Demonstration * * This program shows how the MMU extracts the page number * from a logical address. In hardware, this is a simple * bit shift that happens in nanoseconds. */ // Common page sizes (in bytes and their log2 values)#define PAGE_SIZE_4KB 4096 // 2^12 - most common on x86#define PAGE_SIZE_2MB 2097152 // 2^21 - large pages#define PAGE_SIZE_1GB (1ULL << 30) // 2^30 - huge pages // Number of bits for offset (log2 of page size)#define OFFSET_BITS_4KB 12#define OFFSET_BITS_2MB 21#define OFFSET_BITS_1GB 30 /** * Extract page number from a logical address * * @param logical_addr The full logical address * @param offset_bits Number of bits used for offset (log2 of page size) * @return The page number */uint64_t extract_page_number(uint64_t logical_addr, int offset_bits) { // Simply right-shift to discard the offset bits return logical_addr >> offset_bits;} /** * Extract offset from a logical address * * @param logical_addr The full logical address * @param page_size The size of a page in bytes * @return The offset within the page */uint64_t extract_offset(uint64_t logical_addr, uint64_t page_size) { // Use bitwise AND with (page_size - 1) to mask off page number bits // This works because page_size is always a power of 2 return logical_addr & (page_size - 1);} void demonstrate_extraction(uint64_t addr, const char* addr_name) { printf("\n=== Analyzing Address: %s (0x%016llX) ===\n\n", addr_name, (unsigned long long)addr); // 4KB pages uint64_t page_4kb = extract_page_number(addr, OFFSET_BITS_4KB); uint64_t offset_4kb = extract_offset(addr, PAGE_SIZE_4KB); printf("With 4KB pages (12 offset bits):\n"); printf(" Page Number: 0x%llX (%llu)\n", (unsigned long long)page_4kb, (unsigned long long)page_4kb); printf(" Offset: 0x%llX (%llu)\n", (unsigned long long)offset_4kb, (unsigned long long)offset_4kb); printf(" Verification: %llu * 4096 + %llu = %llu\n", (unsigned long long)page_4kb, (unsigned long long)offset_4kb, (unsigned long long)(page_4kb * PAGE_SIZE_4KB + offset_4kb)); printf(" Match: %s\n\n", (page_4kb * PAGE_SIZE_4KB + offset_4kb == addr) ? "✓ YES" : "✗ NO"); // 2MB pages (large pages) uint64_t page_2mb = extract_page_number(addr, OFFSET_BITS_2MB); uint64_t offset_2mb = extract_offset(addr, PAGE_SIZE_2MB); printf("With 2MB pages (21 offset bits):\n"); printf(" Page Number: 0x%llX (%llu)\n", (unsigned long long)page_2mb, (unsigned long long)page_2mb); printf(" Offset: 0x%llX (%llu)\n", (unsigned long long)offset_2mb, (unsigned long long)offset_2mb); printf(" Match: %s\n\n", (page_2mb * PAGE_SIZE_2MB + offset_2mb == addr) ? "✓ YES" : "✗ NO");} int main() { printf("╔══════════════════════════════════════════════════════════════╗\n"); printf("║ PAGE NUMBER EXTRACTION DEMONSTRATION ║\n"); printf("╚══════════════════════════════════════════════════════════════╝\n"); // Test with various addresses demonstrate_extraction(0x00001000, "Start of page 1"); demonstrate_extraction(0x00012345, "Address in page 18"); demonstrate_extraction(0x00B5F080, "Address in page 2911"); demonstrate_extraction(0xDEADBEEF, "Classic test pattern"); demonstrate_extraction(0x7FFFFFFF, "Top of 32-bit user space"); // Show the bit manipulation explicitly printf("\n=== Bit-Level View of Extraction ===\n\n"); uint64_t addr = 0x00012345; printf("Address: 0x%08llX\n", (unsigned long long)addr); printf("Binary: "); for (int i = 31; i >= 0; i--) { printf("%d", (int)((addr >> i) & 1)); if (i == 12) printf(" | "); // Show the split point else if (i % 4 == 0 && i != 0 && i != 12) printf(" "); } printf("\n"); printf(" |--- Page Number ---| |-- Offset -----|\n"); printf(" 20 bits 12 bits\n"); return 0;} /* * Sample Output: * * ╔══════════════════════════════════════════════════════════════╗ * ║ PAGE NUMBER EXTRACTION DEMONSTRATION ║ * ╚══════════════════════════════════════════════════════════════╝ * * === Analyzing Address: Start of page 1 (0x0000000000001000) === * * With 4KB pages (12 offset bits): * Page Number: 0x1 (1) * Offset: 0x0 (0) * Verification: 1 * 4096 + 0 = 4096 * Match: ✓ YES * * With 2MB pages (21 offset bits): * Page Number: 0x0 (0) * Offset: 0x1000 (4096) * Match: ✓ YES */In actual hardware, page number extraction doesn't require any computation at all—it's achieved through wiring. The upper bits of the address bus are simply routed to the page table lookup circuitry, while the lower bits are held aside to be combined with the frame number later. The 'right shift' is implicit in how the address bits are connected.
The page size fundamentally determines the division between page number and offset. Different page sizes result in dramatically different address structures, with profound implications for the system's behavior.
| Page Size | Offset Bits | Page Number Bits | Max Pages | Max Addressable |
|---|---|---|---|---|
| 1 KB (2¹⁰) | 10 | 22 | 4,194,304 | 4 GB |
| 4 KB (2¹²) | 12 | 20 | 1,048,576 | 4 GB |
| 16 KB (2¹⁴) | 14 | 18 | 262,144 | 4 GB |
| 64 KB (2¹⁶) | 16 | 16 | 65,536 | 4 GB |
| 2 MB (2²¹) | 21 | 11 | 2,048 | 4 GB |
| 1 GB (2³⁰) | 30 | 2 | 4 | 4 GB |
The Tradeoff:
Larger pages mean:
Smaller pages mean:
12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667
#include <stdio.h>#include <stdint.h> /* * Demonstration: How page size affects page number extraction * * The same logical address results in different page numbers * depending on the configured page size. */ typedef struct { const char* name; uint64_t size; int offset_bits;} PageSizeConfig; void analyze_address(uint64_t addr, PageSizeConfig configs[], int num_configs) { printf("\nLogical Address: 0x%016llX\n", (unsigned long long)addr); printf("%-15s | %-20s | %-15s | %s\n", "Page Size", "Page Number", "Offset", "Page Boundary"); printf("%-15s-+-%-20s-+-%-15s-+-%s\n", "---------------", "--------------------", "---------------", "-------------------------"); for (int i = 0; i < num_configs; i++) { uint64_t page_num = addr >> configs[i].offset_bits; uint64_t offset = addr & (configs[i].size - 1); uint64_t page_start = page_num * configs[i].size; printf("%-15s | 0x%-18llX | 0x%-13llX | 0x%llX - 0x%llX\n", configs[i].name, (unsigned long long)page_num, (unsigned long long)offset, (unsigned long long)page_start, (unsigned long long)(page_start + configs[i].size - 1)); }} int main() { PageSizeConfig configs[] = { {"4 KB", 4096ULL, 12}, {"16 KB", 16384ULL, 14}, {"64 KB", 65536ULL, 16}, {"2 MB", 2097152ULL, 21}, {"1 GB", 1073741824ULL, 30}, }; int num_configs = sizeof(configs) / sizeof(configs[0]); printf("═══════════════════════════════════════════════════════════════\n"); printf(" HOW PAGE SIZE AFFECTS PAGE NUMBER EXTRACTION \n"); printf("═══════════════════════════════════════════════════════════════\n"); // Test addresses at interesting points analyze_address(0x00123456, configs, num_configs); analyze_address(0x00400000, configs, num_configs); // 4MB - common program start analyze_address(0x7FFFF000, configs, num_configs); // Near 2GB boundary analyze_address(0xDEADBEEF, configs, num_configs); // Classic test pattern printf("\n═══════════════════════════════════════════════════════════════\n"); printf("KEY INSIGHT: The same address 0x00123456 is:\n"); printf(" • Page 291 with 4KB pages (offset 1110 bytes from page start)\n"); printf(" • Page 18 with 64KB pages (offset 9302 bytes from page start)\n"); printf(" • Page 0 with 2MB pages (offset 1193046 bytes from start)\n"); printf("═══════════════════════════════════════════════════════════════\n"); return 0;}Modern systems often support multiple page sizes simultaneously (e.g., 4KB, 2MB, and 1GB on x86-64). The page table structure must indicate which size applies to each mapping. The page number extraction logic in the MMU adapts based on flags in the page table entries, allowing fine-grained choice per memory region.
The extracted page number serves a critical purpose: it becomes the index into the page table. This index directly identifies which page table entry (PTE) contains the mapping information for the referenced page.
The Lookup Process:
PageTable[p]Page Table Base Register (PTBR):
The CPU maintains a register pointing to the base address of the current process's page table. When a context switch occurs, the OS updates this register to point to the new process's page table. The page number is added to this base to locate the correct entry:
PTE Address = PTBR + (Page Number × Size of PTE)
For a page table with 4-byte entries:
PTE Address = PTBR + (p × 4)
123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140
#include <stdio.h>#include <stdint.h>#include <stdbool.h> /* * Simulation: Page Number as Page Table Index * * This demonstrates how the extracted page number is used * to index into the page table to find the frame mapping. */ #define PAGE_SIZE 4096#define OFFSET_BITS 12#define PAGE_TABLE_SIZE 1024 // Simplified: 1024 entries // Page Table Entry structure (simplified)typedef struct { uint32_t frame_number : 20; // Physical frame number uint32_t present : 1; // Page is in physical memory uint32_t read_write : 1; // 0 = read-only, 1 = read/write uint32_t user_super : 1; // 0 = supervisor, 1 = user accessible uint32_t accessed : 1; // Page has been read uint32_t dirty : 1; // Page has been written uint32_t reserved : 7; // Reserved bits} PageTableEntry; // Simulated page tablePageTableEntry page_table[PAGE_TABLE_SIZE]; // Initialize page table with sample mappingsvoid init_page_table() { // Clear all entries for (int i = 0; i < PAGE_TABLE_SIZE; i++) { page_table[i].present = 0; page_table[i].frame_number = 0; } // Set up some valid mappings // Page 0 -> Frame 100 page_table[0].frame_number = 100; page_table[0].present = 1; page_table[0].read_write = 1; page_table[0].user_super = 1; // Page 1 -> Frame 55 page_table[1].frame_number = 55; page_table[1].present = 1; page_table[1].read_write = 1; page_table[1].user_super = 1; // Page 18 -> Frame 0x5A2 (1442) page_table[18].frame_number = 0x5A2; page_table[18].present = 1; page_table[18].read_write = 1; page_table[18].user_super = 1; // Page 42 -> Not present (page fault example) page_table[42].present = 0;} // Extract page number from logical addressuint32_t get_page_number(uint64_t logical_addr) { return (uint32_t)(logical_addr >> OFFSET_BITS);} // Extract offset from logical addressuint32_t get_offset(uint64_t logical_addr) { return (uint32_t)(logical_addr & (PAGE_SIZE - 1));} // Simulate address translationbool translate_address(uint64_t logical_addr, uint64_t* physical_addr) { uint32_t page_num = get_page_number(logical_addr); uint32_t offset = get_offset(logical_addr); printf("\n┌─ Address Translation ─────────────────────────────────────┐\n"); printf("│ Logical Address: 0x%08llX │\n", (unsigned long long)logical_addr); printf("├────────────────────────────────────────────────────────────┤\n"); printf("│ Step 1: Extract page number │\n"); printf("│ Logical >> 12 = 0x%08X >> 12 = 0x%05X │\n", (uint32_t)logical_addr, page_num); printf("│ Page Number (p) = %u │\n", page_num); printf("├────────────────────────────────────────────────────────────┤\n"); printf("│ Step 2: Index into page table │\n"); printf("│ PageTable[%u] │\n", page_num); if (page_num >= PAGE_TABLE_SIZE) { printf("│ ERROR: Page number exceeds table size! │\n"); printf("└────────────────────────────────────────────────────────────┘\n"); return false; } PageTableEntry* pte = &page_table[page_num]; printf("│ Frame: 0x%05X, Present: %d, R/W: %d │\n", pte->frame_number, pte->present, pte->read_write); if (!pte->present) { printf("├────────────────────────────────────────────────────────────┤\n"); printf("│ RESULT: PAGE FAULT - Page not in physical memory! │\n"); printf("└────────────────────────────────────────────────────────────┘\n"); return false; } // Form physical address *physical_addr = ((uint64_t)pte->frame_number << OFFSET_BITS) | offset; printf("├────────────────────────────────────────────────────────────┤\n"); printf("│ Step 3: Form physical address │\n"); printf("│ Frame (0x%05X) << 12 | Offset (0x%03X) │\n", pte->frame_number, offset); printf("│ = 0x%05X000 | 0x%03X │\n", pte->frame_number, offset); printf("│ = 0x%08llX │\n", (unsigned long long)*physical_addr); printf("├────────────────────────────────────────────────────────────┤\n"); printf("│ RESULT: Physical Address = 0x%08llX │\n", (unsigned long long)*physical_addr); printf("└────────────────────────────────────────────────────────────┘\n"); return true;} int main() { init_page_table(); printf("═══════════════════════════════════════════════════════════════\n"); printf(" PAGE NUMBER AS PAGE TABLE INDEX \n"); printf("═══════════════════════════════════════════════════════════════\n"); uint64_t physical; // Test translations translate_address(0x00000500, &physical); // Page 0 translate_address(0x00001234, &physical); // Page 1 translate_address(0x00012345, &physical); // Page 18 translate_address(0x0002A000, &physical); // Page 42 - will fault return 0;}The beauty of using the page number as a direct index is that it provides O(1) lookup time. Unlike data structures that require searching (linked lists, trees), the page table provides instantaneous access to any entry. Given page number p, the PTE is always at a calculable offset from the page table base. This constant-time translation is critical for performance.
In systems with large address spaces (64-bit), a single-level page table would be impractically large. Instead, the page number itself is further decomposed into multiple indices, each addressing a different level of a hierarchical page table structure.
Example: x86-64 with 4-Level Paging (4KB pages, 48-bit virtual addresses):
The 48-bit virtual address is split as follows:
| Level | Index Bits | Index Range | Entry Size | Table Size | Purpose |
|---|---|---|---|---|---|
| PML4 | 47-39 (9) | 0-511 | 8 bytes | 4 KB | Points to PDPT |
| PDPT | 38-30 (9) | 0-511 | 8 bytes | 4 KB | Points to PD (or 1GB page) |
| PD | 29-21 (9) | 0-511 | 8 bytes | 4 KB | Points to PT (or 2MB page) |
| PT | 20-12 (9) | 0-511 | 8 bytes | 4 KB | Points to 4KB frame |
| Offset | 11-0 (12) | 0-4095 | N/A | N/A | Byte within page |
1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889
#include <stdio.h>#include <stdint.h> /* * Multi-Level Page Number Extraction (x86-64 style) * * Demonstrates how a 48-bit virtual address is decomposed * into multiple indices for hierarchical page table lookup. */ // Bit positions for x86-64 4-level paging#define PML4_SHIFT 39#define PDPT_SHIFT 30#define PD_SHIFT 21#define PT_SHIFT 12 #define INDEX_MASK 0x1FF // 9 bits = 512 entries typedef struct { uint16_t pml4_index; // Index into PML4 (Level 4) uint16_t pdpt_index; // Index into PDPT (Level 3) uint16_t pd_index; // Index into Page Directory (Level 2) uint16_t pt_index; // Index into Page Table (Level 1) uint16_t offset; // Offset within page} DecomposedAddress; DecomposedAddress decompose_virtual_address(uint64_t vaddr) { DecomposedAddress result; // Extract each component by shifting and masking result.pml4_index = (vaddr >> PML4_SHIFT) & INDEX_MASK; result.pdpt_index = (vaddr >> PDPT_SHIFT) & INDEX_MASK; result.pd_index = (vaddr >> PD_SHIFT) & INDEX_MASK; result.pt_index = (vaddr >> PT_SHIFT) & INDEX_MASK; result.offset = vaddr & 0xFFF; // Lower 12 bits return result;} void print_address_decomposition(uint64_t vaddr) { DecomposedAddress d = decompose_virtual_address(vaddr); printf("\n╔══════════════════════════════════════════════════════════════╗\n"); printf("║ Virtual Address: 0x%016llX ║\n", (unsigned long long)vaddr); printf("╠══════════════════════════════════════════════════════════════╣\n"); // Binary representation with groupings printf("║ Binary Layout: ║\n"); printf("║ ┌─────────┬─────────┬─────────┬─────────┬────────────┐ ║\n"); printf("║ │ PML4 │ PDPT │ PD │ PT │ Offset │ ║\n"); printf("║ │ [47:39] │ [38:30] │ [29:21] │ [20:12] │ [11:0] │ ║\n"); printf("║ └─────────┴─────────┴─────────┴─────────┴────────────┘ ║\n"); printf("╠══════════════════════════════════════════════════════════════╣\n"); printf("║ Extracted Indices: ║\n"); printf("║ PML4 Index: %3u (0x%03X) → PML4[%3u] ║\n", d.pml4_index, d.pml4_index, d.pml4_index); printf("║ PDPT Index: %3u (0x%03X) → PDPT[%3u] ║\n", d.pdpt_index, d.pdpt_index, d.pdpt_index); printf("║ PD Index: %3u (0x%03X) → PD[%3u] ║\n", d.pd_index, d.pd_index, d.pd_index); printf("║ PT Index: %3u (0x%03X) → PT[%3u] ║\n", d.pt_index, d.pt_index, d.pt_index); printf("║ Offset: %4u (0x%03X) → Byte %u within page ║\n", d.offset, d.offset, d.offset); printf("╚══════════════════════════════════════════════════════════════╝\n");} int main() { printf("═══════════════════════════════════════════════════════════════\n"); printf(" x86-64 MULTI-LEVEL PAGE NUMBER EXTRACTION \n"); printf("═══════════════════════════════════════════════════════════════\n"); // User space addresses (canonical, positive) print_address_decomposition(0x0000000000001000ULL); // First 4KB page print_address_decomposition(0x00007FFFF7A3D000ULL); // Typical libc address print_address_decomposition(0x0000555555554000ULL); // Typical program text // Kernel space addresses (canonical, negative - top bits set) print_address_decomposition(0xFFFF800000000000ULL); // Start of kernel space print_address_decomposition(0xFFFFFFFF80000000ULL); // High kernel address printf("\n═══════════════════════════════════════════════════════════════\n"); printf("NOTE: Each index (0-511) selects one of 512 entries in its table.\n"); printf("The full translation requires 4 table lookups + offset addition.\n"); printf("═══════════════════════════════════════════════════════════════\n"); return 0;}With 9 bits per index level, each table has 512 entries. Each 8-byte entry means each table is exactly 4KB—the same size as a page. This elegant design means page tables themselves fit neatly into physical frames, simplifying memory management for the page tables themselves.
Understanding page number extraction from the hardware perspective reveals why this operation is so efficient. The Memory Management Unit (MMU) doesn't compute anything—it routes signals.
Hardware Implementation:
When the CPU generates a logical address:
Critical Hardware Properties:
The TLB as Cached Extraction:
The Translation Lookaside Buffer (TLB) caches recent page-to-frame translations. When a logical address is presented:
TLB Hit (common case): The page number directly matches a TLB entry. The frame number is returned immediately (typically 1 clock cycle).
TLB Miss (rare case): The page number doesn't match any TLB entry. The page table walker traverses the page table hierarchy (20-200+ cycles). The result is cached in the TLB.
The TLB effectively caches the result of page table lookups, but the extraction of the page number happens identically in both cases—it's the lookup step that differs.
Memory access is on the critical path of virtually all software. Every instruction fetch, every data load, every store goes through address translation. If page number extraction required even a few clock cycles of computation, it would add nanoseconds to every memory access—slowing down everything. The zero-cost wiring approach ensures translation adds minimal latency.
Page number extraction is the first—and simplest—step in the address translation process, yet its simplicity is what makes paging practical for high-performance systems.
What's Next:
With page number extraction understood, we'll examine the complementary operation: offset extraction. While the page number identifies which page, the offset identifies which byte within that page. Together, they provide complete location information—one for coarse-grained lookup, one for fine-grained addressing.
You now understand how the page number is extracted from a logical address—the fundamental first step in paging address translation. This knowledge forms the foundation for understanding the complete translation mechanism: from logical address to physical memory location.