When page-level memory management proves insufficient to prevent thrashing, operating systems can resort to a more drastic measure: process swapping. Unlike demand paging, which manages individual pages, swapping moves an entire process between main memory and secondary storage (the swap space or swap file).
Swapping is a blunt instrument—it completely removes a process from memory competition, freeing all its frames at once. This provides immediate, substantial relief to memory pressure but comes at significant cost: the swapped process cannot execute until it's brought back, and the I/O required for swapping is considerable. Understanding when and how to use this technique is essential for robust system design.
By the end of this page, you will understand the mechanics of process swapping versus demand paging, swap space organization and management, policies for deciding which processes to swap out, the performance implications and costs of swapping, and how modern systems have evolved beyond traditional swapping.
Swapping and demand paging represent fundamentally different approaches to managing memory overcommitment. Understanding their relationship is crucial for appreciating when each is appropriate.
Historical Context:
Swapping predates virtual memory. Early time-sharing systems like CTSS (1961) used swapping because hardware didn't support page tables. The entire process image was written to drum storage when switching to another process. This was slow but simple—no complex page tables or TLBs needed.
Demand paging emerged with virtual memory hardware in the late 1960s. By managing memory at page granularity, systems could keep hot pages in memory while cold pages resided on disk, achieving better memory utilization.
| Aspect | Process Swapping | Demand Paging |
|---|---|---|
| Unit of Transfer | Entire process image | Individual pages (typically 4KB) |
| Granularity | Coarse—all or nothing | Fine—page by page |
| I/O Cost per Operation | Very high (megabytes) | Low (kilobytes) |
| Memory Freed per Operation | Large (entire process) | Small (one page) |
| Latency to Resume | High (reload entire image) | Low (fault in pages as needed) |
| Suitable For | Severe memory pressure | Normal operation |
| Hardware Requirements | Minimal | Page tables, TLB, valid/invalid bits |
| Implementation Complexity | Low | High |
When Swapping Becomes Necessary:
Swapping typically engages when demand paging alone cannot maintain system stability:
Modern Hybrid Approach:
Contemporary systems use a spectrum approach:
True whole-process swapping is rare in modern systems. Linux, for example, focuses on page-level management and rarely swaps entire processes. Instead, it may swap out all pages of a process incrementally. Windows maintains more explicit working set management but also focuses on page granularity. The term 'swapping' persists but often means aggressive page-out rather than monolithic process transfer.
The effectiveness of swapping depends heavily on how swap space is organized. Efficient swap management minimizes I/O latency and maximizes throughput.
Swap Space Location Options:
Dedicated Swap Partition: A separate disk partition used exclusively for swap
Swap File: A regular file on a filesystem used for swap
Hybrid: Combination of partition and files
```c
// Swap Space Management Implementation
// Demonstrates core swap area organization

#include <stdint.h>
#include <stdbool.h>

#define SWAP_PAGE_SIZE   4096
#define SWAP_CLUSTER_SIZE 8            // Pages per cluster for contiguous allocation
#define INVALID_OFFSET   UINT64_MAX    // Sentinel: no valid device offset
#define INVALID_SLOT     UINT32_MAX    // Sentinel: no valid swap slot

typedef struct {
    uint64_t offset;       // Byte offset in swap device
    uint32_t swap_page;    // Swap slot number
} SwapEntry;

typedef struct {
    char*    device_path;    // e.g., "/dev/sda2" or "/swapfile"
    uint64_t size_bytes;
    uint64_t device_offset;  // Starting byte offset of this area on the device
    uint32_t base_slot;      // Global slot number of this area's first slot
    uint32_t total_pages;
    uint32_t free_pages;
    uint32_t* free_bitmap;   // Bit per swap page: 0=free, 1=used
    int      priority;       // Higher priority = used first
    bool     is_partition;   // vs swap file
    uint32_t next_free_hint; // Optimization: start search here
} SwapArea;

typedef struct {
    SwapArea* areas;
    int       area_count;
    uint64_t  total_swap_pages;
    uint64_t  used_swap_pages;
} SwapManager;

// Helper routines (bitmap manipulation, slot search) assumed defined elsewhere
int  find_free_cluster(SwapArea* area, int cluster_size);
int  find_any_free_slot(SwapArea* area);
void mark_slot_used(SwapArea* area, int slot);
void mark_slot_free(SwapArea* area, int slot);
bool is_slot_used(SwapArea* area, uint32_t slot);
SwapArea* find_area_for_slot(SwapManager* sm, uint32_t swap_page);

// Allocate a swap slot
SwapEntry allocate_swap_slot(SwapManager* sm) {
    SwapEntry result = { .offset = INVALID_OFFSET, .swap_page = INVALID_SLOT };

    // Try areas in priority order
    for (int i = 0; i < sm->area_count; i++) {
        SwapArea* area = &sm->areas[i];
        if (area->free_pages == 0) continue;

        // Try cluster allocation for contiguous swap-out
        int slot = find_free_cluster(area, SWAP_CLUSTER_SIZE);
        if (slot == -1) {
            slot = find_any_free_slot(area);
        }

        if (slot != -1) {
            mark_slot_used(area, slot);
            result.offset = area->device_offset + ((uint64_t)slot * SWAP_PAGE_SIZE);
            result.swap_page = (uint32_t)slot + area->base_slot;
            sm->used_swap_pages++;
            return result;
        }
    }

    // No swap space available
    return result;
}

// Free a swap slot
void free_swap_slot(SwapManager* sm, SwapEntry entry) {
    SwapArea* area = find_area_for_slot(sm, entry.swap_page);
    if (area == NULL) return;

    int local_slot = (int)(entry.swap_page - area->base_slot);
    mark_slot_free(area, local_slot);
    sm->used_swap_pages--;
}

// Find contiguous free slots for cluster allocation
int find_free_cluster(SwapArea* area, int cluster_size) {
    int consecutive = 0;
    int start = -1;

    for (uint32_t i = area->next_free_hint; i < area->total_pages; i++) {
        if (!is_slot_used(area, i)) {
            if (consecutive == 0) start = (int)i;
            consecutive++;
            if (consecutive >= cluster_size) {
                return start;
            }
        } else {
            consecutive = 0;
            start = -1;
        }
    }

    // Wrap-around search from the start of the area.
    // Reset the run: a cluster must not span the end of the area.
    consecutive = 0;
    start = -1;
    for (uint32_t i = 0; i < area->next_free_hint; i++) {
        if (!is_slot_used(area, i)) {
            if (consecutive == 0) start = (int)i;
            consecutive++;
            if (consecutive >= cluster_size) {
                return start;
            }
        } else {
            consecutive = 0;
            start = -1;
        }
    }

    return -1;  // No cluster available
}
```

Swap Space Sizing:
Historical rules suggested swap = 2× RAM, but modern recommendations vary:
| System Type | RAM Size | Recommended Swap | Rationale |
|---|---|---|---|
| Server (no hibernation) | 16GB+ | 0.5-1× RAM | Rarely swaps if well-tuned |
| Workstation | 8-32GB | 1× RAM | Occasional heavy loads |
| With Hibernation | Any | ≥ RAM | Hibernation writes all RAM to swap |
| Embedded (constrained) | <1GB | 0 (no swap) | Flash wear concerns |
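These rules of thumb are easy to encode. The helper below is a rough sketch of the table above; the `SystemProfile` type and the exact multipliers are illustrative assumptions, not the policy of any particular installer or kernel.

```c
#include <stdint.h>
#include <stdbool.h>

typedef enum { PROFILE_SERVER, PROFILE_WORKSTATION, PROFILE_EMBEDDED } SystemProfile;

// Rough swap-sizing heuristic following the table above (illustrative only).
uint64_t recommended_swap_bytes(uint64_t ram_bytes, SystemProfile profile,
                                bool hibernation) {
    if (hibernation)
        return ram_bytes;          // Hibernation needs room for all of RAM
    switch (profile) {
    case PROFILE_EMBEDDED:
        return 0;                  // Avoid flash wear on constrained devices
    case PROFILE_SERVER:
        return ram_bytes / 2;      // ~0.5x RAM: rarely swaps if well tuned
    case PROFILE_WORKSTATION:
    default:
        return ram_bytes;          // ~1x RAM for occasional heavy loads
    }
}
```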
Swap Priority:
When multiple swap areas exist, priority determines usage order:
```console
# Linux example: checking swap configuration
$ swapon --show
NAME       TYPE      SIZE USED PRIO
/dev/sda2  partition 8G   1.2G -2
/swapfile  file      4G   0    -3
```
Swap on SSD is dramatically faster than on HDD—random-access performance is close to sequential, so page-in operations complete much more quickly. However, heavy swapping wears SSDs out faster due to write amplification. For systems that swap heavily, HDD swap may be more economical; for systems that rarely swap, an SSD provides faster recovery when swapping does occur.
When a process is selected for swap-out, the operating system must carefully transfer its state to swap space while maintaining the ability to resume execution later.
Swap-Out Steps:
```c
// Process Swap-Out Implementation
// Complete process transfer from memory to swap

typedef enum {
    PROC_RUNNING,
    PROC_READY,
    PROC_BLOCKED,
    PROC_SWAPPING_OUT,
    PROC_SWAPPED,
    PROC_SWAPPING_IN
} ProcessState;

typedef struct {
    pid_t           pid;
    ProcessState    state;
    PageTableEntry* page_table;
    uint32_t        page_table_size;
    uint32_t        resident_pages;
    uint64_t        swap_start_slot;    // First slot in swap (for contiguous)
    bool            swap_is_contiguous; // Optimization flag
} ProcessSwapState;

// Main swap-out function
int swap_out_process(ProcessSwapState* proc) {
    // Step 1: Mark as swapping
    ProcessState old_state = proc->state;
    proc->state = PROC_SWAPPING_OUT;
    remove_from_run_queue(proc->pid);

    // Step 2: Allocate swap space
    uint32_t pages_to_swap = proc->resident_pages;
    SwapAllocation* alloc = allocate_swap_space(pages_to_swap);
    if (alloc == NULL) {
        // Not enough swap space
        proc->state = old_state;
        add_to_run_queue(proc->pid);
        return -ENOMEM;
    }

    // Step 3: Write pages to swap
    int pages_written = 0;
    int alloc_index = 0;

    for (uint32_t i = 0; i < proc->page_table_size; i++) {
        PageTableEntry* pte = &proc->page_table[i];
        if (!pte->present) continue;    // Already not in memory

        // Handle dirty pages
        if (pte->dirty) {
            sync_page_to_backing_store(pte);
        }

        // Write to swap
        SwapEntry slot = alloc->slots[alloc_index++];
        int result = write_page_to_swap(pte->frame_number, slot);
        if (result < 0) {
            // Swap write failed - rollback
            rollback_swap_out(proc, pages_written);
            return result;
        }

        // Update page table
        pte->present = false;
        pte->swap_slot = slot.swap_page;
        pte->in_swap = true;

        // Free the frame
        free_frame(pte->frame_number);
        pte->frame_number = INVALID_FRAME;

        pages_written++;
    }

    // Step 4: Finalize swap-out
    proc->resident_pages = 0;
    proc->swap_start_slot = alloc->slots[0].swap_page;
    proc->swap_is_contiguous = alloc->is_contiguous;
    proc->state = PROC_SWAPPED;

    // Invalidate TLB entries for this process
    flush_tlb_for_process(proc->pid);

    log_swap_out(proc->pid, pages_written);
    update_swap_statistics(pages_written, SWAP_OUT);

    return pages_written;
}

// Optimized swap-out for contiguous allocation
int swap_out_contiguous(ProcessSwapState* proc, uint64_t swap_offset) {
    // When swap allocation is contiguous, we can use more efficient I/O
    // Gather all resident pages into a contiguous buffer
    char* buffer = allocate_swap_buffer(proc->resident_pages * PAGE_SIZE);
    int offset = 0;

    for (uint32_t i = 0; i < proc->page_table_size; i++) {
        PageTableEntry* pte = &proc->page_table[i];
        if (pte->present) {
            memcpy(buffer + offset,
                   physical_to_virtual(pte->frame_number),
                   PAGE_SIZE);
            offset += PAGE_SIZE;
        }
    }

    // Single large write to swap
    int result = write_to_swap_device(buffer,
                                      proc->resident_pages * PAGE_SIZE,
                                      swap_offset);
    free_swap_buffer(buffer);

    // Update page tables as before...
    return result;
}
```

Swap-out must handle failures gracefully. If the system crashes mid-swap, the process state must be recoverable. Modern systems typically don't guarantee swap atomicity—a crash during swap may lose the process. For critical processes, redundancy at the application level is essential.
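The `rollback_swap_out()` helper invoked on a failed write above is not shown. A minimal sketch of one reasonable strategy, assuming the same `ProcessSwapState` fields and scheduler helpers as the listing above: pages that already reached swap keep their valid slots and are simply demand-faulted back later, so only scheduling state needs to be undone.

```c
// Hypothetical rollback after a failed swap-out write (sketch).
// Pages already written out retain valid swap slots and will be faulted
// back in on demand; we undo only the scheduling-related state here.
void rollback_swap_out(ProcessSwapState* proc, int pages_written) {
    // Frames for the first pages_written pages were already released
    proc->resident_pages -= (uint32_t)pages_written;

    // Return the process to the scheduler; swapped pages reload lazily
    proc->state = PROC_READY;
    add_to_run_queue(proc->pid);

    // A fuller implementation would also release the unused tail of the
    // SwapAllocation and record the partial swap-out in statistics.
}
```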
When a swapped process is selected for resumption, its memory image must be restored from swap space. Two approaches exist: eager swap-in (load everything immediately) and lazy swap-in (load on demand).
Eager Swap-In (Traditional):
Lazy Swap-In (Demand-Based):
```c
// Process Swap-In Implementation
// Both eager and lazy approaches

typedef enum {
    SWAPIN_EAGER,
    SWAPIN_LAZY,
    SWAPIN_HYBRID   // Load working set eagerly, rest lazily
} SwapInStrategy;

// Eager swap-in: load everything before running
int swap_in_eager(ProcessSwapState* proc) {
    proc->state = PROC_SWAPPING_IN;

    // Count pages needed
    uint32_t pages_needed = count_swapped_pages(proc);

    // Allocate all frames upfront
    Frame* frames = allocate_frames(pages_needed);
    if (frames == NULL) {
        return -ENOMEM;   // Not enough memory
    }

    // Read all pages from swap
    int frame_index = 0;
    for (uint32_t i = 0; i < proc->page_table_size; i++) {
        PageTableEntry* pte = &proc->page_table[i];
        if (!pte->in_swap) continue;

        // Read from swap
        int result = read_page_from_swap(pte->swap_slot, frames[frame_index].number);
        if (result < 0) {
            rollback_swap_in(proc, frames, frame_index);
            return result;
        }

        // Update page table
        pte->frame_number = frames[frame_index].number;
        pte->present = true;
        pte->in_swap = false;

        // Free swap slot
        free_swap_slot(pte->swap_slot);
        pte->swap_slot = INVALID_SLOT;

        frame_index++;
    }

    proc->resident_pages = pages_needed;
    proc->state = PROC_READY;
    add_to_run_queue(proc->pid);

    log_swap_in(proc->pid, pages_needed, "eager");
    return pages_needed;
}

// Lazy swap-in: process starts immediately, pages faulted in
int swap_in_lazy(ProcessSwapState* proc) {
    // Just mark as ready - pages will be demand-loaded via page faults
    proc->state = PROC_READY;
    proc->resident_pages = 0;   // Nothing resident yet
    add_to_run_queue(proc->pid);

    log_swap_in(proc->pid, 0, "lazy");
    return 0;
}

// Hybrid: load estimated working set, rest on demand
int swap_in_hybrid(ProcessSwapState* proc) {
    proc->state = PROC_SWAPPING_IN;

    // Estimate working set from previous execution
    uint32_t ws_pages = estimate_working_set(proc->pid);

    // Allocate frames for working set
    Frame* frames = allocate_frames(ws_pages);
    if (frames == NULL) {
        // Fall back to lazy
        return swap_in_lazy(proc);
    }

    // Load working set pages
    uint32_t frame_index = 0;
    for (uint32_t i = 0; i < proc->page_table_size && frame_index < ws_pages; i++) {
        PageTableEntry* pte = &proc->page_table[i];
        if (!pte->in_swap) continue;
        if (!is_in_working_set_estimate(proc->pid, i)) continue;

        // Read from swap
        read_page_from_swap(pte->swap_slot, frames[frame_index].number);

        pte->frame_number = frames[frame_index].number;
        pte->present = true;
        pte->in_swap = false;
        free_swap_slot(pte->swap_slot);

        frame_index++;
    }

    // Remaining pages stay in swap, will be demand-loaded
    proc->resident_pages = frame_index;
    proc->state = PROC_READY;
    add_to_run_queue(proc->pid);

    log_swap_in(proc->pid, frame_index, "hybrid");
    return frame_index;
}

// Page fault handler for swapped pages
void handle_swap_page_fault(ProcessSwapState* proc, uint32_t faulting_page) {
    PageTableEntry* pte = &proc->page_table[faulting_page];

    if (!pte->in_swap) {
        // This is a different type of fault
        handle_other_fault(proc, faulting_page);
        return;
    }

    // Allocate frame
    Frame* frame = allocate_frame();
    if (frame == NULL) {
        // Memory pressure - may need to wait or trigger reclamation
        trigger_frame_reclamation();
        frame = allocate_frame_blocking();
    }

    // Read from swap
    read_page_from_swap(pte->swap_slot, frame->number);

    // Update page table
    pte->frame_number = frame->number;
    pte->present = true;
    pte->in_swap = false;
    free_swap_slot(pte->swap_slot);

    proc->resident_pages++;

    // Resume faulting instruction
}
```

A smart hybrid approach uses prepaging: when a page fault occurs for a swapped page, also load adjacent pages or pages predicted to be needed soon. This reduces initial fault storms while maintaining lazy loading benefits. The prediction can be based on spatial locality (nearby pages) or recorded working set from before swap-out.
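A prepaging fault handler along these lines might look as follows. It is a sketch that reuses the structures and helpers from the listing above; `PREPAGE_WINDOW` and `allocate_frame_if_plentiful()` are illustrative assumptions, not functions from the earlier code.

```c
#define PREPAGE_WINDOW 4   // Neighboring pages pulled in per fault (assumed value)

// Prepaging sketch: on a swap fault, also bring in nearby swapped pages.
void handle_swap_fault_with_prepaging(ProcessSwapState* proc, uint32_t faulting_page) {
    // Service the faulting page exactly as before
    handle_swap_page_fault(proc, faulting_page);

    // Opportunistically pull in spatial neighbors that are still in swap
    uint32_t start = (faulting_page > PREPAGE_WINDOW) ? faulting_page - PREPAGE_WINDOW : 0;
    uint32_t end = faulting_page + PREPAGE_WINDOW;
    if (end >= proc->page_table_size) end = proc->page_table_size - 1;

    for (uint32_t p = start; p <= end; p++) {
        if (p == faulting_page) continue;
        PageTableEntry* pte = &proc->page_table[p];
        if (!pte->in_swap) continue;

        // Only prepage while free frames are plentiful; never add pressure
        Frame* frame = allocate_frame_if_plentiful();   // hypothetical helper
        if (frame == NULL) break;

        read_page_from_swap(pte->swap_slot, frame->number);
        pte->frame_number = frame->number;
        pte->present = true;
        pte->in_swap = false;
        free_swap_slot(pte->swap_slot);
        proc->resident_pages++;
    }
}
```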
Effective swapping requires intelligent policies for multiple decisions: when to swap, whom to swap out, and whom to swap in. These policies significantly impact system fairness and performance.
When to Swap Out:
Triggers for swap-out consideration:
Whom to Swap Out:
Selection criteria balance efficiency, fairness, and system policy:
| Criterion | Factor | Rationale |
|---|---|---|
| Priority | Lower priority → swap first | Protect important work |
| Process State | Blocked → swap first | Already not running |
| Memory Size | Larger → more benefit | Frees more frames |
| Idle Time | Longer idle → swap first | Not actively needed |
| Swap Count | Lower count → swap first | Avoid swap ping-pong |
| Interactive | Interactive → protect | Maintain responsiveness |
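One way to combine these criteria is a per-candidate weighted score, as in the sketch below. The weights, the `ProcInfo` fields, and the idea of a single linear score are illustrative assumptions rather than any specific OS's policy.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    int      priority;        // Higher = more important
    bool     is_blocked;      // Currently blocked on I/O or an event
    bool     is_interactive;  // Owns a foreground/interactive session
    uint32_t resident_pages;  // Frames freed if swapped out
    uint64_t idle_ms;         // Time since last scheduled
    uint32_t swap_count;      // Times already swapped out
} ProcInfo;

// Higher score = better swap-out victim (illustrative weights).
long swap_out_score(const ProcInfo* p) {
    long score = 0;
    score -= (long)p->priority * 100;          // Protect important work
    if (p->is_blocked)     score += 500;       // Already not running
    if (p->is_interactive) score -= 1000;      // Maintain responsiveness
    score += (long)(p->resident_pages / 64);   // Larger processes free more frames
    score += (long)(p->idle_ms / 100);         // Long-idle processes go first
    score -= (long)p->swap_count * 200;        // Avoid swap ping-pong
    return score;
}
```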
When to Swap In:
Triggers for swap-in consideration:
Swap Scheduling:
Swap I/O must be scheduled efficiently:
A poorly tuned system can enter a 'swap storm' in which processes are swapped in and out repeatedly. This happens when: (1) the swap-in threshold is too close to the swap-out threshold, (2) processes are swapped back in before they are ready to run, or (3) not enough memory is freed per swap-out. Hysteresis and minimum residence times prevent this.
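Hysteresis can be as simple as keeping the swap-out and swap-in thresholds well apart and enforcing a minimum residence time between transitions. The sketch below illustrates the idea; the threshold values and the use of a monotonic millisecond clock are assumptions.

```c
#include <stdint.h>
#include <stdbool.h>
#include <time.h>

// Illustrative thresholds: swap out above 95% memory use, swap back in only
// once use has fallen below 70%, and never re-swap within 5 seconds.
#define SWAP_OUT_THRESHOLD_PCT 95
#define SWAP_IN_THRESHOLD_PCT  70
#define MIN_RESIDENCE_MS       5000

// Monotonic clock in milliseconds
static uint64_t now_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000u + (uint64_t)(ts.tv_nsec / 1000000);
}

typedef struct {
    uint64_t last_swap_transition_ms;   // Time of this process's last swap event
} SwapHysteresis;

bool may_swap_out(const SwapHysteresis* h, int mem_used_pct) {
    return mem_used_pct >= SWAP_OUT_THRESHOLD_PCT &&
           now_ms() - h->last_swap_transition_ms >= MIN_RESIDENCE_MS;
}

bool may_swap_in(const SwapHysteresis* h, int mem_used_pct) {
    return mem_used_pct <= SWAP_IN_THRESHOLD_PCT &&
           now_ms() - h->last_swap_transition_ms >= MIN_RESIDENCE_MS;
}
```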
Swapping has significant performance implications that must be understood to use it effectively. The costs can be substantial, but so can the benefits when used appropriately.
Direct Costs:
| Cost Category | Magnitude | Components |
|---|---|---|
| Swap-Out I/O | 10-100+ ms (HDD) / 1-10 ms (SSD) | Seek time + write time for all pages |
| Swap-In I/O | 10-100+ ms (HDD) / 1-10 ms (SSD) | Seek time + read time for all pages |
| Memory Allocation | ~1 ms | Finding/allocating frames for swap-in |
| Page Table Updates | <1 ms | Updating PTEs, flushing TLBs |
| Context Switch Overhead | ~0.1 ms | Saving/restoring process state |
| Cache Effects | Variable (high) | All CPU caches cold after swap-in |
Quantifying Swap Time:
For a process with P pages on HDD swap:
Swap-out time ≈ seek_time + (P × page_size / write_bandwidth)
Swap-in time ≈ seek_time + (P × page_size / read_bandwidth)
Example calculation (see the sketch below, which uses the SSD figures of 0.1 ms seek and 500 MB/s along with assumed HDD figures):
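A minimal sketch, assuming a ~100 MB resident image (25,600 pages of 4 KB) and an HDD at 10 ms seek and 150 MB/s; the drive parameters are representative assumptions, not measurements.

```c
#include <stdio.h>

// Rough swap-time estimate: seek_time + (pages * page_size / bandwidth).
static double swap_time_ms(double seek_ms, long pages, long page_size,
                           double bandwidth_mb_s) {
    double bytes = (double)pages * (double)page_size;
    return seek_ms + bytes / (bandwidth_mb_s * 1e6) * 1000.0;
}

int main(void) {
    long pages = 25600;        // ~100 MB resident image (4 KB pages)
    long page_size = 4096;

    // HDD: ~10 ms seek, ~150 MB/s sequential -> roughly 710 ms one way
    printf("HDD: %.0f ms one way\n", swap_time_ms(10.0, pages, page_size, 150.0));
    // SSD: ~0.1 ms access, ~500 MB/s -> roughly 210 ms one way
    printf("SSD: %.0f ms one way\n", swap_time_ms(0.1, pages, page_size, 500.0));
    return 0;
}
```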
Indirect Costs:
Benefits of Judicious Swapping:
Despite costs, swapping provides value when:
The Trade-off Equation:
If: Cost(swap_out + swap_in) < Cost(thrashing_duration × N_affected_processes)
Then: Swapping is beneficial
Typically, if thrashing would persist for more than a few seconds, swapping becomes worthwhile.
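Expressed as code, the decision rule is a direct comparison of estimated costs; this is a sketch, with costs in milliseconds and the function name chosen for illustration.

```c
#include <stdbool.h>

// Illustrative decision rule from the trade-off equation above.
bool swapping_is_beneficial(double swap_out_ms, double swap_in_ms,
                            double expected_thrash_ms, int affected_processes) {
    double swap_cost   = swap_out_ms + swap_in_ms;
    double thrash_cost = expected_thrash_ms * (double)affected_processes;
    return swap_cost < thrash_cost;
}
```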
Modern systems reduce swap costs through: SSDs (10-50× faster than HDD), compressed swap (reduce I/O volume), zram (compressed swap in RAM), lazy loading (don't load everything immediately), and prepaging (smart batch loading). These make swapping less painful but don't eliminate the fundamental costs.
Contemporary systems have developed alternatives to traditional whole-process swapping, often providing better performance characteristics.
1. Memory Compression (zswap, zram):
Instead of writing pages to disk, compress them in memory:
Compressed Page Size ≈ Original Size ÷ Compression Ratio
Typical ratio ≈ 2:1 to 4:1 for text/data
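For intuition, the arithmetic below works through one case; the 1 GB pool and 3:1 ratio are assumed figures for illustration.

```c
#include <stdio.h>

int main(void) {
    double pool_mb = 1024.0;   // RAM set aside for compressed pages (assumed)
    double ratio   = 3.0;      // Assumed 3:1 compression for typical data

    // A 1 GB pool at 3:1 holds roughly 3 GB worth of uncompressed pages,
    // effectively "adding" about 2 GB of capacity at the cost of CPU time.
    printf("Pages stored: ~%.0f MB, net capacity gain: ~%.0f MB\n",
           pool_mb * ratio, pool_mb * (ratio - 1.0));
    return 0;
}
```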
Benefits:
Limitations:
2. Application Cooperation (Memory Pressure Notifications):
Rather than forcibly swapping, notify applications of memory pressure:
Benefits: Applications know best what they can release; voluntary release is often more effective than forced swap.
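On Linux, one concrete pressure signal is the kernel's pressure stall information (PSI), exposed on kernels with PSI support (4.20+) as /proc/pressure/memory. The sketch below simply reads the "some avg10" figure; the 10% threshold and the printed message are arbitrary illustration, not a standard policy.

```c
#include <stdio.h>

// Read the "some" avg10 figure from Linux PSI: the percentage of time in the
// last 10 s that at least one task stalled on memory. Returns -1.0 if PSI
// is unavailable on this system.
double memory_pressure_avg10(void) {
    FILE* f = fopen("/proc/pressure/memory", "r");
    if (f == NULL) return -1.0;

    double avg10 = -1.0;
    // First line looks like: "some avg10=X avg60=Y avg300=Z total=N"
    if (fscanf(f, "some avg10=%lf", &avg10) != 1) avg10 = -1.0;
    fclose(f);
    return avg10;
}

int main(void) {
    double p = memory_pressure_avg10();
    if (p >= 10.0) {
        // Arbitrary threshold for illustration: ask the application layer
        // to drop caches before the kernel resorts to heavier reclamation.
        printf("Memory pressure high (%.1f%%): release caches\n", p);
    }
    return 0;
}
```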
3. Automatic Application Termination/Restart:
Mobile and modern desktop OSes terminate inactive applications:
This is more drastic than swapping but faster for recovery (app restarts fresh rather than reloading swap image).
4. Tiered Memory (DRAM + Persistent Memory):
Some systems have DRAM and slower persistent memory (Intel Optane):
| Approach | Latency | Capacity | Best For |
|---|---|---|---|
| HDD Swap | 10-100 ms | Unlimited (disk) | Large, inactive processes |
| SSD Swap | 1-10 ms | Limited (SSD wear) | Occasional swap, quick recovery |
| Compressed Memory | ~0.1 ms | Limited (RAM/3) | Compressible data, low pressure |
| App Cooperation | ~1 ms | Varies | Cache-heavy applications |
| App Termination | 0ms (later restart) | Full recovery | Mobile, transient apps |
| Tiered Memory | ~0.1 ms | Large (PMEM) | High-memory servers |
Modern systems use all these techniques in a hierarchy: first try application cooperation (release caches), then compression, then swap to SSD, then swap to HDD, finally terminate applications. Each level is progressively more drastic and costly. Good memory management rarely reaches the drastic levels.
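A caricature of that escalation ladder, as a sketch: all of the helper functions below are stubs standing in for the mechanisms discussed above, not real kernel interfaces.

```c
#include <stdbool.h>

// Placeholder "mechanisms" for the techniques discussed above.
// Each returns true if it freed enough memory (stubbed for illustration).
static bool release_app_caches(unsigned long need)  { (void)need; return false; }
static bool compress_cold_pages(unsigned long need) { (void)need; return false; }
static bool page_out_to_ssd(unsigned long need)     { (void)need; return false; }
static bool page_out_to_hdd(unsigned long need)     { (void)need; return false; }
static void kill_inactive_apps(unsigned long need)  { (void)need; }

// Escalation ladder: each step runs only if the cheaper ones failed.
static void relieve_memory_pressure(unsigned long pages_needed) {
    if (release_app_caches(pages_needed))  return;   // cooperation first
    if (compress_cold_pages(pages_needed)) return;   // zram/zswap
    if (page_out_to_ssd(pages_needed))     return;   // fast swap
    if (page_out_to_hdd(pages_needed))     return;   // slow swap
    kill_inactive_apps(pages_needed);                // last resort
}

int main(void) { relieve_memory_pressure(1024); return 0; }
```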
Process swapping represents the heavyweight approach to memory management—trading substantial I/O cost for substantial memory recovery. Understanding when and how to use it is crucial for system stability under memory pressure. To consolidate: swapping moves whole processes (or all of their pages) rather than individual pages; swap space must be organized and sized deliberately; swap-out and swap-in have distinct mechanics (eager, lazy, hybrid); victim selection and hysteresis policies keep swapping from degenerating into swap storms; and the heavy I/O cost is justified only when it is cheaper than continued thrashing, which is why modern systems prefer compression, application cooperation, and tiered memory where possible.
Looking Ahead:
We've explored working set approaches, PFF monitoring, load control, and process swapping—all software solutions to memory management challenges. The final page examines the ultimate solution: Adding Memory—when and how to address thrashing by increasing physical memory capacity, and the engineering decisions involved.
You now understand process swapping as a tool for thrashing mitigation. While modern systems have developed sophisticated alternatives, swapping remains a fundamental technique in the OS toolkit. The key is knowing when to use it—when the cost of swapping is lower than the cost of continued thrashing.