Memory compaction eliminates external fragmentation, but this benefit comes at a substantial cost. Every byte moved during compaction represents CPU cycles spent, memory bandwidth consumed, and cache lines invalidated. Every process suspended for relocation represents application latency that users experience.
Understanding compaction cost is not merely an academic exercise—it's essential knowledge for system designers who must weigh the benefits of consolidated memory against the real performance penalties incurred. This page provides a rigorous analysis of compaction costs, enabling you to make informed trade-offs in memory management design.
By the end of this page, you will understand how to calculate compaction time precisely, identify all components of compaction overhead, analyze the impact on system performance metrics, and make data-driven decisions about compaction policies based on quantitative cost analysis.
The fundamental cost of compaction is time—the duration during which memory operations consume system resources and affected processes cannot execute. Let's build a precise model of compaction time.
The Basic Model:
For a single memory region of size S bytes being moved:
T_move = S / bandwidth_effective
Where bandwidth_effective is the actual memory copy throughput, which is well below the theoretical peak for the reasons examined below.
Realistic Memory Bandwidth:
Modern systems have theoretical memory bandwidths measured in GB/s, but effective copy throughput is significantly lower:
| System Type | Theoretical Bandwidth | Effective Copy Rate | Overhead Factor |
|---|---|---|---|
| Modern Desktop (DDR5) | ~50 GB/s | ~15-25 GB/s | 2-3x |
| Server-class (Multi-channel) | ~100 GB/s | ~30-50 GB/s | 2-3x |
| Embedded System (DDR3) | ~10 GB/s | ~3-5 GB/s | 2-3x |
| Older System (DDR2) | ~6 GB/s | ~2-3 GB/s | 2-3x |
Why Effective Rate is Lower:
Read-Modify-Write Cycles: Memory copy involves reading from source AND writing to destination—double the bus traffic
Cache Pollution: Large copies may evict useful data from CPU caches, causing secondary slowdowns afterward
Memory Controller Overhead: Arbitration, refreshing, and command processing consume bandwidth
Non-Sequential Access: If source/destination aren't well-aligned, access patterns become suboptimal
The Complete Time Formula:
For compacting N processes with sizes S₁, S₂, ..., Sₙ:
T_total = T_suspend + T_plan + T_copy + T_update + T_resume
Where:
T_suspend = Σ(suspension_overhead_per_process) ≈ N × t_suspend
T_plan = O(N log N) for optimal planning, O(N) for greedy
T_copy = Σ(Sᵢ) / bandwidth_effective [the dominant term]
T_update = O(N) for PCB updates + O(structures) for kernel updates
T_resume = Σ(resumption_overhead_per_process) ≈ N × t_resume
In almost all practical scenarios, T_copy dominates total compaction time. For moving 1GB of memory at 20GB/s effective rate, T_copy = 50ms. Process suspension/resumption typically adds only microseconds to low milliseconds. Focus optimization efforts on minimizing bytes moved.
Concrete Examples:
Example 1: Small System (move 100 MB across 10 processes at 5 GB/s effective)
T_copy = 100 MB / 5 GB/s = 100 MB / 5000 MB/s = 20 ms
T_other ≈ 10 × 0.1 ms (suspend) + 10 × 0.1 ms (resume) = 2 ms
T_total ≈ 22 ms
Example 2: Large Server (move 10 GB across 500 processes at 30 GB/s effective)
T_copy = 10 GB / 30 GB/s = 333 ms
T_other ≈ 500 × 0.1 ms + 500 × 0.1 ms = 100 ms
T_total ≈ 433 ms (nearly half a second!)
Example 3: Real-Time Embedded (move 10 MB across 20 processes at 3 GB/s effective)
T_copy = 10 MB / 3 GB/s = 3.3 ms
T_other ≈ 20 × 0.05 ms + 20 × 0.05 ms = 2 ms
T_total ≈ 5.3 ms
For a real-time system with deadlines, even 5ms can be problematic.
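To make the model concrete, here is a minimal C sketch that evaluates the time formula. The bandwidth and per-process overhead values are illustrative assumptions, not measurements; T_plan and T_update are omitted because they are typically negligible:
#include <stdio.h>

/* Estimate total compaction time from the model above. All parameters
 * (effective bandwidth, per-process overheads) are assumptions. */
double compaction_time_ms(double bytes_to_move, int num_processes,
                          double bandwidth_gbps,
                          double suspend_ms, double resume_ms) {
    double t_copy = bytes_to_move / (bandwidth_gbps * 1e9) * 1000.0; /* dominant term */
    return t_copy + num_processes * (suspend_ms + resume_ms);
}

int main(void) {
    /* Example 2 above: 10 GB, 500 processes, 30 GB/s, 0.1 ms each way */
    printf("Large server: %.0f ms\n",
           compaction_time_ms(10e9, 500, 30.0, 0.1, 0.1)); /* ~433 ms */
    return 0;
}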
Beyond raw time, compaction consumes specific system resources. Understanding these costs enables better resource planning and trade-off analysis.
CPU Cycles:
A memcpy or memmove still occupies the CPU for the duration of the copy, even when the operation is memory-bound:
Modern CPUs can copy ~4-8 bytes per cycle:
For a 3 GHz CPU copying 1 GB:
Cycles needed ≈ 1 GB / 8 bytes-per-cycle ≈ 125 million cycles
             × 2 (read + write)         ≈ 250 million cycles minimum
Time = 250M cycles / 3 GHz ≈ 83 ms (compute-bound estimate)
Actual copy is memory-bound, so ~50 ms at 20 GB/s effective
Memory Bandwidth Consumption:
Compaction is a memory bandwidth hog: every byte moved must be read from the source and written to the destination, saturating the memory bus in both directions while it runs.
Impact on Other Operations:
During compaction, other memory-intensive operations slow dramatically, since they must compete for whatever bandwidth the copy leaves over.
Memory Bandwidth Math:
Compaction bandwidth = 2 × bytes_moved / time
(factor of 2 for read + write)
If moving 1 GB in 50 ms:
Bandwidth used = 2 × 1 GB / 0.05 s = 40 GB/s
On a system with 50 GB/s total bandwidth:
80% of bandwidth consumed by compaction!
During compaction, other system components compete for the remaining 20% of memory bandwidth. This can cause cascading slowdowns far beyond the directly compacted processes. I/O operations, graphics updates, and other memory-intensive tasks all suffer.
Cache Impact:
Compaction devastates CPU caches at every level: the copied data streams through and evicts resident lines from L1 (typically 32-64 KB per core), L2 (typically 256 KB - 1 MB per core), and L3 (typically 8-32 MB shared). The table below summarizes the impact by compaction size.
Post-Compaction Cache Warming Penalty:
After compaction, resumed processes face cold caches:
Post-compaction penalty per process:
= (working_set_size / cache_line_size) × cache_miss_penalty
For a process with 1 MB working set, 64-byte cache lines, 100ns miss penalty:
= (1 MB / 64 B) × 100 ns = 16,384 × 100 ns = 1.64 ms
This penalty is per process and adds to the visible latency users experience.
| Compaction Size | L1 Impact | L2 Impact | L3 Impact | Warming Penalty |
|---|---|---|---|---|
| < 64 KB | Partial flush | Minimal | None | < 0.1 ms |
| 64 KB - 1 MB | Complete flush | Partial flush | Minimal | 0.1 - 1 ms |
| 1 MB - 10 MB | Complete flush | Complete flush | Partial | 1 - 5 ms |
| > 10 MB | Complete flush | Complete flush | Complete flush | 5 - 50 ms |
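The warming-penalty formula above is easy to evaluate directly. This small C sketch uses the same illustrative numbers (1 MB working set, 64-byte lines, 100 ns miss penalty), all of which vary by hardware:
#include <stdio.h>

/* Cache re-warming cost: roughly one miss per cache line of the
 * working set. Parameter values are illustrative assumptions. */
double warming_penalty_ms(double working_set_bytes,
                          double line_bytes, double miss_penalty_ns) {
    double misses = working_set_bytes / line_bytes;
    return misses * miss_penalty_ns / 1e6;  /* ns -> ms */
}

int main(void) {
    printf("%.2f ms per process\n",
           warming_penalty_ms(1 << 20, 64, 100));  /* ~1.64 ms */
    return 0;
}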
The direct costs of compaction—time and bandwidth—are measurable. Indirect costs are more subtle but equally impactful on system behavior.
Cost 1: Process Suspension Latency
Processes suspended for compaction cannot do useful work, which creates visible latency: interactive applications stop responding to input, and background work (builds, batch jobs, services) simply stalls for the duration. The scenarios below illustrate the impact:
Compaction Latency Impact Scenarios:
Scenario 1: Interactive Desktop
- User clicks button → application responds
- Normal response time: 20 ms
- With compaction during click: 20 ms + 50 ms compaction = 70 ms
- User perception: Noticeable lag ("system feels slow")
Scenario 2: Game Engine
- Frame deadline: 16.67 ms (60 FPS target)
- Compaction duration: 30 ms
- Result: Dropped frames, visible stutter
- Multiple processes: Potentially 100+ ms pause
Scenario 3: Trading System
- Acceptable latency: < 1 ms
- Compaction duration: 10 ms
- Result: Trades missed, potential financial loss
Scenario 4: Real-Time Control
- Control loop deadline: 5 ms
- Compaction duration: 8 ms
- Result: Deadline miss → system safety concern
Cost 2: Priority Inversion
When compaction is triggered by a low-priority process's allocation failure, high-priority processes may be suspended to compact memory—classic priority inversion.
Scenario: a low-priority background task requests a large contiguous buffer; the allocation fails, compaction is triggered, and a high-priority task is suspended while its memory is relocated, so the high-priority task ends up waiting on the low-priority one.
Mitigation Strategies: suspend only the processes whose regions are actually being moved, prefer relocating low-priority processes' memory first, and bound the amount of compaction work a low-priority allocation may trigger.
Cost 3: TLB and Virtual Memory Effects
In systems using paging on top of contiguous allocation, or in hybrid schemes, compaction may invalidate Translation Lookaside Buffer (TLB) entries:
TLB Shootdown Cost:
TLB shootdown (inter-processor interrupt) cost:
= IPI_latency + TLB_flush_time + synchronization
≈ 1-10 microseconds per shootdown
For compaction affecting multiple processes on multiple cores:
Total TLB cost = num_cores × num_affected_entries × per_entry_cost
= 8 cores × 100 entries × 2 μs = 1.6 ms
Post-Shootdown Penalty: After TLB invalidation, every memory access incurs TLB miss penalties until the TLB is repopulated:
TLB miss penalty: 10-100 cycles
Memory accesses to re-warm TLB: ~1000 per process
Total TLB rewarming: 1000 × 50 cycles / 3 GHz = 17 μs per process
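The same TLB arithmetic, expressed as a hedged C sketch. Core count, affected-entry count, per-entry cost, and miss penalty are all assumptions that vary widely by hardware:
#include <stdio.h>

/* TLB shootdown cost across cores; all parameters are assumptions. */
double tlb_shootdown_ms(int num_cores, int affected_entries,
                        double per_entry_us) {
    return num_cores * affected_entries * per_entry_us / 1000.0;  /* us -> ms */
}

/* Re-warming after invalidation: misses x penalty, in microseconds. */
double tlb_rewarm_us(double accesses, double miss_cycles, double cpu_ghz) {
    return accesses * miss_cycles / (cpu_ghz * 1e3);  /* cycles -> us */
}

int main(void) {
    printf("shootdown: %.1f ms\n", tlb_shootdown_ms(8, 100, 2.0)); /* 1.6 ms */
    printf("rewarm: %.1f us per process\n",
           tlb_rewarm_us(1000, 50, 3.0));                          /* ~16.7 us */
    return 0;
}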
Cost 4: I/O Pipeline Disruption
I/O operations in progress may be affected even if their memory is pinned:
DMA Channel Effects: transfers into pinned buffers can proceed during compaction, since pinned regions are excluded from relocation, but the suspended owner cannot consume the results, so completions pile up.
Disk I/O Impact:
During compaction:
- Read requests complete but callbacks wait
- Write requests may have data in volatile state
- Disk queue servicing is delayed
Post-compaction:
- Burst of I/O completions
- Potential I/O scheduler confusion
- Temporary throughput spike then normalization
Network I/O Impact: while a receiver is suspended, inbound packets queue in kernel buffers (and are dropped once those fill), and delayed acknowledgments can trigger retransmissions and congestion backoff at the peer.
Compaction's indirect costs create a ripple effect throughout the system. A 100ms compaction can cause visible effects lasting seconds afterward: caches warming, TLBs repopulating, I/O backlogs clearing, and applications catching up on delayed work. These 'aftershock' costs are often larger than the compaction itself.
The decision to compact should be based on a quantitative comparison of costs versus benefits. Let's build a framework for this analysis.
Benefit of Compaction:
The primary benefit is enabling allocations that would otherwise fail:
Benefit = Value_of_allocations_enabled + Avoidance_of_OOM_consequences
Where:
Value_of_allocations_enabled:
= Sum of (allocation_size × value_per_byte × duration_needed)
Avoidance_of_OOM_consequences:
= Probability(OOM_without_compaction) × Cost_of_OOM
= P(OOM) × (process_termination_cost + data_loss_cost + user_experience_cost)
Cost of Compaction:
Cost = Direct_cost + Indirect_cost + Opportunity_cost
Where:
Direct_cost:
= CPU_cycles × cost_per_cycle + Memory_bandwidth × cost_per_byte
Indirect_cost:
= Process_suspension_time × impact_per_unit_time
+ Cache_warming_cost + TLB_shootdown_cost + I/O_disruption_cost
Opportunity_cost:
= Work_not_done_during_compaction
Decision Framework:
if (Expected_Benefit > Expected_Cost) {
perform_compaction();
} else {
defer_compaction() or use_alternative();
}
Practical Heuristics:
Since exact benefit/cost calculations are often impractical in real-time, systems use heuristics:
Heuristic 1: Proportional Response
Compaction aggressiveness ∝ (Fragmentation_severity / Current_system_load)
High fragmentation + Low load → Compact now
High fragmentation + High load → Compact partially or defer
Low fragmentation + Any load → Don't compact
Heuristic 2: Request-Driven Bounds
max_acceptable_cost = k × request_size
Where k is a tunable constant (e.g., 0.1 = compact only if moving ≤ 10% of request size)
Only compact if:
Total_bytes_to_move < max_acceptable_cost
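This bound is a one-line guard in practice. A sketch in C, where k is the tunable assumption from the heuristic above:
#include <stdbool.h>
#include <stddef.h>

/* Request-driven bound: only compact if the bytes we would move stay
 * below k x request_size. k = 0.1 mirrors the example above. */
bool within_compaction_budget(size_t bytes_to_move,
                              size_t request_size, double k) {
    return (double)bytes_to_move < k * (double)request_size;
}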
| Fragmentation Level | System Load | Pending Priority | Recommended Action |
|---|---|---|---|
| Severe (>70%) | Low (<30%) | Any | Full compaction immediately |
| Severe (>70%) | High (>70%) | High | Partial compaction to satisfy request |
| Severe (>70%) | High (>70%) | Low | Defer until load decreases |
| Moderate (40-70%) | Low (<30%) | Any | Background incremental compaction |
| Moderate (40-70%) | High (>70%) | Any | Monitor; compact only on failure |
| Low (<40%) | Any | Any | No compaction needed |
// Compaction cost-benefit analysis implementation
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t bytes_to_move;
    int    num_processes_affected;
    float  current_cpu_load;          // 0.0 - 1.0
    float  current_memory_pressure;   // 0.0 - 1.0
    int    pending_request_priority;  // 0 - 10
    size_t pending_request_size;
} CompactionContext;

typedef struct {
    float direct_cost_ms;
    float indirect_cost_ms;
    float total_cost_ms;
    float benefit_score;
    bool  should_compact;
} CompactionAnalysis;

CompactionAnalysis analyze_compaction(CompactionContext* ctx) {
    CompactionAnalysis result;

    // Estimate direct cost: bytes moved at the effective copy rate
    const float EFFECTIVE_BANDWIDTH_GBPS = 20.0f;
    result.direct_cost_ms =
        ((float)ctx->bytes_to_move / (EFFECTIVE_BANDWIDTH_GBPS * 1e9f)) * 1000.0f;

    // Estimate indirect cost: suspension, cache warming, load contention
    const float SUSPEND_OVERHEAD_MS   = 0.1f;
    const float CACHE_WARMING_BASE_MS = 0.5f;
    result.indirect_cost_ms =
        ctx->num_processes_affected * SUSPEND_OVERHEAD_MS +
        ctx->num_processes_affected * CACHE_WARMING_BASE_MS +
        (ctx->current_cpu_load * result.direct_cost_ms * 0.5f);  // load penalty

    result.total_cost_ms = result.direct_cost_ms + result.indirect_cost_ms;

    // Estimate benefit: urgency, request priority, and request size
    float fragmentation_urgency = ctx->current_memory_pressure;
    float priority_weight = ctx->pending_request_priority / 10.0f;
    float size_weight = (float)ctx->pending_request_size / (1024 * 1024);  // MB

    result.benefit_score =
        fragmentation_urgency * priority_weight * size_weight * 100.0f;

    // Decision with load-aware threshold: demand more benefit under load
    float threshold = 10.0f * (1.0f + ctx->current_cpu_load);
    result.should_compact =
        (result.benefit_score / result.total_cost_ms) > threshold;

    return result;
}
Given compaction's substantial costs, systems employ various strategies to minimize the impact while still achieving fragmentation reduction.
Strategy 1: Move Minimum Necessary
Instead of complete compaction, move only the minimum bytes needed to satisfy the pending request:
Algorithm: Minimal Movement Compaction
1. Identify pending request size R
2. Find two large holes H1 and H2 separated only by process P
3. If |H1| + |H2| ≥ R (allowing for any allocator overhead):
   - Slide process P to one end so that H1 and H2 merge
   - Merged hole size: |H1| + |H2| (P still occupies its |P| bytes, just relocated)
4. Otherwise, repeat with the next candidate pair until the request can be satisfied
This approach may move only a fraction of total memory compared to full compaction.
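A sketch of the idea in C. The hole structure and the helpers find_merge_candidate and slide_process are hypothetical stand-ins for a real allocator's bookkeeping, so this shows the control flow under those assumptions rather than a complete implementation:
#include <stdbool.h>
#include <stddef.h>

typedef struct Hole { size_t size; struct Hole *next; } Hole;
typedef struct Process Process;

/* Hypothetical helpers a real allocator would provide. */
extern bool find_merge_candidate(Hole **h1, Hole **h2, Process **p);
extern void slide_process(Process *p);  /* move P so h1 and h2 coalesce */

/* Move only enough processes to open a hole of request_size bytes. */
bool compact_minimally(size_t request_size) {
    Hole *h1, *h2;
    Process *p;
    while (find_merge_candidate(&h1, &h2, &p)) {
        slide_process(p);           /* bytes moved: |P|, not the whole heap */
        h1->size += h2->size;       /* merged hole: |H1| + |H2| */
        if (h1->size >= request_size)
            return true;            /* request can now be satisfied */
    }
    return false;                   /* not enough adjacent free space */
}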
Strategy 2: Reduce Compaction Frequency
Prevent fragmentation from occurring in the first place:
Better Allocation Strategies: placement policies and segregated free lists that reduce hole-splitting slow the rate at which external fragmentation accumulates, so compaction is needed less often.
Memory Hints:
// Application provides hints about allocation lifetime
void* os_alloc(size_t size, AllocationHint hint) {
    switch (hint) {
        case HINT_SHORT_LIVED:
            return allocate_from_short_pool(size);     // churn confined to one pool
        case HINT_LONG_LIVED:
            return allocate_from_stable_region(size);  // rarely-relocated region
        case HINT_HUGE:
            return allocate_contiguous_region(size);
        default:
            return allocate_from_default_pool(size);   // no hint: hypothetical general pool
    }
}
By segregating allocations by lifetime, short-lived allocation churn doesn't fragment regions used by long-lived allocations.
Strategy 3: Use Hardware Acceleration
Some systems leverage hardware to accelerate memory copies:
DMA Engine for Memory-to-Memory Copy:
Modern chipsets include DMA engines capable of:
- Memory-to-memory copy without CPU involvement
- Scatter-gather operations
- Copy with transformation (encryption, compression)
CPU can initiate copy and continue other work:
initiate_dma_copy(source, dest, size, callback);
// CPU does other work while DMA engine copies
// Callback invoked on completion
Benefits: the CPU stays available for other work, and the copy overlaps with useful execution instead of stalling it.
Limitations: DMA engines have per-transfer setup cost (a poor fit for small moves), offer a limited number of channels, and still consume the same memory bandwidth as a CPU copy; cache coherence between the engine and CPU caches must also be managed.
Strategy 4: Amortize Cost Over Time
Continuous Background Compaction:
Algorithm: Background Compaction Daemon
while (system_running) {
wait_for_idle_period(threshold_ms);
if (fragmentation_metric() > LOW_THRESHOLD) {
// Find one small, easily movable process
Process* p = find_smallest_movable_process();
if (p != NULL) {
// Move incrementally
suspend_briefly(p);
relocate_small_chunk(p);
resume(p);
// Allow other work before next iteration
yield_for_ms(BACKOFF_TIME);
}
}
sleep_until_fragmentation_check();
}
Advantages: the cost is amortized into idle periods, no single large pause is visible, and each step briefly suspends at most one small process.
Disadvantages: fragmentation can accumulate faster than the daemon clears it under sustained load, the daemon consumes background CPU cycles and memory bandwidth, and incremental bookkeeping adds complexity.
Modern systems often avoid compaction entirely by using paging. External fragmentation doesn't exist when memory is allocated in fixed-size pages. The costs of paging (page tables, TLB management) are typically far less than the costs of compaction. This is why paging became the dominant memory management technique.
Memory compaction is a powerful tool for eliminating external fragmentation, but it comes with substantial costs that must be carefully weighed against benefits. Understanding these costs enables informed decision-making about when and how to apply compaction.
What's Next:
Having examined the costs of compaction, we now turn to an alternative approach: the Buddy System Algorithm. Rather than consolidating fragmented memory after the fact, the buddy system uses a clever allocation scheme that makes merging free regions trivial—trading some internal fragmentation for dramatically simpler and faster free-space management.
You now understand the true cost of memory compaction—from direct computational overhead through indirect system-wide impacts to cost minimization strategies. This knowledge is essential for making informed memory management decisions in systems where contiguous allocation is required.