Resource preemption offers a potentially gentler alternative to process termination: instead of killing a process entirely, we forcibly take away resources it currently holds and give them to another process that needs them to proceed.
The appeal is obvious—if we can preempt resources without terminating processes, we preserve more work and cause less disruption. A process whose memory is swapped out can continue later when memory is available. A process whose lock is preempted can retry.
But preemption is not universally applicable. Some resources simply cannot be taken away mid-use without catastrophic consequences. A printer job cannot be preempted halfway through—you'd get garbage output. A file being written cannot have its handle yanked away—you'd corrupt the file.
The central question of preemption is: Can the process safely lose this resource and recover later?
This page explores which resources support preemption, how to implement preemption safely, the relationship between preemption and rollback, and practical applications in real systems.
By the end of this page, you will understand the fundamental requirements for resource preemptability, how operating systems implement preemption for CPU and memory, techniques for preempting locks via transaction rollback, the challenges of implementing safe preemption, and when preemption is preferable to termination.
Not all resources can be preempted. A resource is preemptable if and only if it satisfies certain conditions that allow safe reclamation without corrupting the system state or the process's ability to function.
Formal Requirements for Preemptability:
| Resource | Savable? | Restorable? | No Side Effects? | Preemptable? |
|---|---|---|---|---|
| CPU time | ✅ Context saved | ✅ Context restored | ✅ Computation is internal | ✅ Yes |
| Memory pages | ✅ Swap to disk | ✅ Swap back | ✅ Memory is internal state | ✅ Yes |
| Printer (mid-job) | ❌ Physical paper used | ❌ Can't unprint | ❌ Visible output | ❌ No |
| Network socket (connected) | ❓ State complex | ❌ TCP state lost | ❌ Peer affected | ❌ No |
| File lock | ✅ Lock state known | ✅ Can re-lock later | ⚠️ Data must be rolled back | ⚠️ With rollback |
| Database row lock | ✅ Transaction log | ✅ Via rollback | ✅ Atomicity preserved | ✅ Via transaction abort |
| GPU memory | ✅ Copy to host RAM | ✅ Copy back | ✅ Computation internal | ✅ Yes (expensive) |
| CD-ROM drive (reading) | ✅ Track position | ✅ Seek again | ✅ No physical state change | ✅ Yes |
The Key Insight:
Resources that represent or produce physical artifacts (printouts, network packets sent, hardware state changes) are generally not preemptable. Resources that represent virtual state within the computer (CPU registers, memory contents, lock ownership) are generally preemptable if we can save and restore that state.
The exception is locks: they're preemptable only if we can undo the work done while holding the lock. This ties preemption to rollback and transaction semantics.
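The preemptability conditions above can be condensed into a small predicate. The sketch below is illustrative only (the `ResourceProfile` type and its field names are ours, not a real API): a resource is preemptable when its state can be both saved and restored without external side effects, with rollback-undoable locks as the special case.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourceProfile:
    """Illustrative profile of a resource's preemption-relevant properties."""
    name: str
    state_savable: bool        # can the current state be captured?
    state_restorable: bool     # can the saved state be reinstated later?
    side_effect_free: bool     # no external/physical effects while held?
    undoable_with_rollback: bool = False  # locks: held work can be undone

def is_preemptable(r: ResourceProfile) -> bool:
    """Preemptable if state round-trips cleanly, either because there are
    no external effects or because rollback can undo the held work."""
    round_trips = r.state_savable and r.state_restorable
    return round_trips and (r.side_effect_free or r.undoable_with_rollback)

cpu = ResourceProfile("cpu", True, True, True)
printer = ResourceProfile("printer_mid_job", False, False, False)
row_lock = ResourceProfile("db_row_lock", True, True, False,
                           undoable_with_rollback=True)

assert is_preemptable(cpu)
assert not is_preemptable(printer)
assert is_preemptable(row_lock)   # only via rollback
```

The lock row mirrors the table: its state round-trips, but it is only safe to preempt because rollback erases the side effects on protected data.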
Recall that breaking the 'no preemption' condition is one way to PREVENT deadlock (covered in Module 4). Here we're discussing preemption as a RECOVERY mechanism after deadlock occurs. The concepts are related but the context differs: prevention applies preemption proactively to all requests; recovery applies preemption reactively to specific deadlocked resources.
CPU time is the most commonly preempted resource. The operating system's scheduler routinely preempts running processes to share CPU among all ready processes. Understanding CPU preemption provides a template for thinking about other resource preemption.
How CPU Preemption Works:
```c
/**
 * CPU preemption: the canonical example of safe resource preemption.
 *
 * The CPU can be preempted at virtually any time because:
 * 1. State is fully savable (registers, PC, flags)
 * 2. State is fully restorable (context switch reloads everything)
 * 3. No external side effects (computation is internal)
 * 4. Process tolerance is implicit (preemption is invisible to process)
 */

// Context saved on preemption
struct cpu_context {
    uint64_t rax, rbx, rcx, rdx;   // General registers
    uint64_t rsi, rdi, rbp, rsp;   // Index/pointer registers
    uint64_t r8,  r9,  r10, r11;   // Additional registers
    uint64_t r12, r13, r14, r15;
    uint64_t rip;                  // Instruction pointer
    uint64_t rflags;               // Status flags
    uint64_t cs, ss;               // Segment registers
    struct fpu_state *fpu;         // Floating point state
    // ... additional state as needed
};

/**
 * Timer interrupt triggers preemption.
 * Called by hardware interrupt handler.
 */
void timer_interrupt_handler(struct cpu_context *interrupted_context)
{
    // Step 1: Save current process context
    struct task_struct *current = get_current_task();
    save_context(current, interrupted_context);

    // Step 2: Update scheduler state (accounting, priorities)
    scheduler_tick(current);

    // Step 3: Pick next process to run
    struct task_struct *next = schedule();

    if (next != current) {
        // Step 4: Restore next process context
        switch_to(current, next);
        // Note: switch_to saves remaining context and restores next's
    }

    // Step 5: Return to (possibly different) process
    // Hardware restores context from stack and continues execution
}

/**
 * Key insight: The preempted process never knows it was preempted.
 * From its perspective, execution is continuous.
 *
 * This "invisible preemption" is the gold standard for safe preemption.
 * Other resources are preemptable to the extent they can approximate this.
 */
```

Properties That Make CPU Preemption Safe:
Complete state capture: All CPU state fits in the context structure. Nothing is lost.
Atomic save/restore: Context switch is atomic from the process's perspective.
No visibility to process: The process can't tell it was preempted (except by checking time).
Hardware support: CPU provides interrupt mechanism, register save/restore instructions.
Resumption guarantee: The process WILL continue eventually (scheduler fairness).
Other resources can be preempted safely to the extent they share these properties.
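Python generators offer a compact, if loose, analogy for this save/restore cycle: each `yield` is a preemption point where the interpreter saves the frame (locals and program position), and `next()` restores it transparently. The names below are illustrative, a sketch rather than a real scheduler:

```python
def worker(n):
    """A computation that can be suspended and resumed transparently.
    The generator frame plays the role of the saved CPU context."""
    total = 0
    for i in range(1, n + 1):
        total += i
        yield  # preemption point: locals and position are saved here
    return total

def round_robin(tasks):
    """Tiny 'scheduler': preempt each task after one step, round-robin.
    The preempted task never observes the interruption."""
    results = {}
    while tasks:
        name, gen = tasks.pop(0)
        try:
            next(gen)                   # run until the next preemption point
            tasks.append((name, gen))   # requeue: it will resume later
        except StopIteration as done:   # task finished; capture return value
            results[name] = done.value
    return results

results = round_robin([("a", worker(3)), ("b", worker(5))])
assert results == {"a": 6, "b": 15}
```

As with real CPU preemption, each worker computes the same sum it would have computed uninterrupted; preemption only changed *when* the work ran.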
CPU is a renewable resource—using it now doesn't prevent others from using it later. The scheduler ensures all ready processes eventually get CPU time. Deadlock involves resources that are held exclusively until explicitly released. CPU is inherently preemptable by design, so it never participates in deadlock.
Memory is another preemptable resource. When physical memory is exhausted, the OS can swap out or page out memory from one process to make room for another. The preempted memory is saved to disk and restored when the process needs it again.
```c
/**
 * Memory preemption via paging/swapping.
 *
 * Memory is preemptable because:
 * 1. Contents can be saved to disk (swap space)
 * 2. Contents can be restored on demand (page fault)
 * 3. No external effects (memory is internal state)
 * 4. Process tolerates via blocking on page fault
 */

#include <linux/mm.h>
#include <linux/swap.h>

/**
 * Page reclaim: Preempt memory pages from processes.
 * Called when memory is low.
 */
unsigned long reclaim_pages(struct zone *zone, unsigned long nr_to_reclaim)
{
    struct list_head *page_list;
    unsigned long nr_reclaimed = 0;

    // Get list of candidate pages (LRU)
    page_list = get_lru_pages(zone, nr_to_reclaim);

    for_each_page(page, page_list) {
        // Skip pinned or locked pages
        if (page_is_locked(page) || page_is_pinned(page)) {
            continue;
        }

        // Different handling based on page type
        if (page_is_anonymous(page)) {
            // Anonymous page (heap, stack): write to swap
            if (add_to_swap(page)) {
                // Page now has a swap slot.
                // Remove from process page table.
                unmap_page_from_process(page);
                // Page is now preempted - can be freed
                nr_reclaimed++;
            }
        } else if (page_is_file_backed(page)) {
            // File-backed page: just discard if clean
            if (page_is_dirty(page)) {
                // Write back to file first
                writeback_page(page);
            }
            unmap_page_from_process(page);
            nr_reclaimed++;
        }
    }

    return nr_reclaimed;
}

/**
 * Page fault handler: Restore preempted memory.
 * Called when process accesses swapped-out page.
 */
int handle_page_fault(struct mm_struct *mm, unsigned long address)
{
    struct vm_area_struct *vma;
    pte_t *pte;
    struct page *page;

    // Find the virtual memory region
    vma = find_vma(mm, address);
    if (!vma)
        return -EFAULT;

    // Get page table entry
    pte = get_pte(mm, address);

    if (pte_none(*pte)) {
        // Never allocated - allocate fresh
        return allocate_page(vma, address);
    }

    if (pte_is_swap(pte)) {
        // Page was swapped out - restore it
        swp_entry_t entry = pte_to_swp_entry(*pte);

        // Read page from swap
        page = read_swap_page(entry);
        if (!page)
            return -ENOMEM;

        // Map into process address space
        set_pte(pte, mk_pte(page, vma->vm_page_prot));

        // Free swap slot
        free_swap_slot(entry);

        return 0;  // Page fault handled, process can continue
    }

    // Other fault types...
    return -EFAULT;
}

/**
 * For deadlock recovery, we can preempt memory from a deadlocked process:
 * 1. Swap out its pages to relieve memory pressure
 * 2. The deadlocked process remains blocked (on a lock, not memory)
 * 3. Other processes can now proceed
 *
 * This doesn't break the deadlock directly but may help indirectly
 * if the deadlock involves memory resources.
 */
```

Memory Preemption Properties:
| Property | Implementation |
|---|---|
| State savability | Contents written to swap partition/file |
| State restorability | Page fault triggers read-back from swap |
| Process tolerance | Process blocks on page fault until restore |
| Transparency | Virtual memory abstraction hides preemption |
| Cost | I/O cost for swap write/read |
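A toy model makes the save/restore cycle above concrete. This sketch is ours, not kernel code (real kernels use LRU approximations, asynchronous writeback, and per-page-type policies): "RAM" holds a fixed number of pages, eviction preempts the least-recently-used page into a "swap" dict, and a later access faults it back in.

```python
from collections import OrderedDict

class TinyVM:
    """Toy model of memory preemption. RAM holds at most `capacity` pages;
    the least-recently-used page is swapped out to make room, and a later
    access faults it back in. Names are illustrative only."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.ram = OrderedDict()  # page_id -> contents, in LRU order
        self.swap = {}            # preempted pages live here
        self.faults = 0

    def access(self, page_id: str, contents=None):
        if page_id in self.ram:
            self.ram.move_to_end(page_id)           # mark recently used
        else:
            if page_id in self.swap:                # page was preempted
                self.faults += 1
                contents = self.swap.pop(page_id)   # restore from swap
            self._evict_if_full()
            self.ram[page_id] = contents
        return self.ram[page_id]

    def _evict_if_full(self):
        while len(self.ram) >= self.capacity:
            victim, data = self.ram.popitem(last=False)  # LRU victim
            self.swap[victim] = data                     # save, don't discard

vm = TinyVM(capacity=2)
vm.access("A", "a-data")
vm.access("B", "b-data")
vm.access("C", "c-data")           # RAM full: A is preempted to swap
assert "A" in vm.swap
assert vm.access("A") == "a-data"  # page fault: A restored intact
assert vm.faults == 1
```

Note that the preempted page's contents survive the round trip unchanged, which is exactly the savable/restorable property that makes memory preemptable.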
Memory preemption alone rarely resolves deadlock (since most deadlocks involve locks, not memory). However, if deadlock occurs because all processes are waiting for memory that others hold, the OOM killer (Out-Of-Memory killer) terminates one process—a form of combined preemption and termination.
Locks are the primary resource involved in deadlock, but they're not directly preemptable. We can't just "take away" a lock while another process is using it—the protected data would be in an inconsistent state.
However, we CAN preempt locks if we can undo the work done while holding the lock. This is the essence of transaction rollback: cancel the transaction, undo its changes, release its locks, and let the process retry.
```python
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Any
from enum import Enum
import copy


class OperationType(Enum):
    READ = "read"
    WRITE = "write"
    INSERT = "insert"
    DELETE = "delete"


@dataclass
class Operation:
    """A single operation within a transaction."""
    op_type: OperationType
    resource_id: str
    old_value: Optional[Any] = None
    new_value: Optional[Any] = None


@dataclass
class Transaction:
    """A transaction that can be preempted via rollback."""
    tx_id: int
    operations: List[Operation] = field(default_factory=list)
    held_locks: List[str] = field(default_factory=list)
    savepoints: Dict[str, int] = field(default_factory=dict)  # name -> op index
    state: str = "active"  # active, committed, aborted


class TransactionManager:
    """
    Transaction manager with lock preemption support.

    Preempting a lock means:
    1. Roll back all changes made by the transaction
    2. Release all locks held
    3. Mark transaction for retry
    """

    def __init__(self, data_store: Dict[str, Any], lock_manager):
        self.data_store = data_store
        self.lock_manager = lock_manager
        self.transactions: Dict[int, Transaction] = {}
        self.next_tx_id = 1

    def begin_transaction(self) -> int:
        """Start a new transaction."""
        tx_id = self.next_tx_id
        self.next_tx_id += 1
        self.transactions[tx_id] = Transaction(tx_id=tx_id)
        return tx_id

    def read(self, tx_id: int, resource_id: str) -> Any:
        """Read a value within a transaction."""
        tx = self.transactions[tx_id]

        # Acquire read lock
        self.lock_manager.acquire_read(tx_id, resource_id)
        if resource_id not in tx.held_locks:
            tx.held_locks.append(resource_id)

        # Log the read
        tx.operations.append(Operation(
            op_type=OperationType.READ,
            resource_id=resource_id,
            old_value=self.data_store.get(resource_id)
        ))

        return self.data_store.get(resource_id)

    def write(self, tx_id: int, resource_id: str, value: Any):
        """Write a value within a transaction."""
        tx = self.transactions[tx_id]

        # Acquire write lock (may block -> potential deadlock!)
        self.lock_manager.acquire_write(tx_id, resource_id)
        if resource_id not in tx.held_locks:
            tx.held_locks.append(resource_id)

        # Log the write with old value (for rollback)
        old_value = self.data_store.get(resource_id)
        tx.operations.append(Operation(
            op_type=OperationType.WRITE,
            resource_id=resource_id,
            old_value=copy.deepcopy(old_value),
            new_value=copy.deepcopy(value)
        ))

        # Apply the write
        self.data_store[resource_id] = value

    def preempt_transaction(self, tx_id: int) -> bool:
        """
        Preempt a transaction's locks by rolling it back.

        This is the core of lock preemption for deadlock recovery.

        Steps:
        1. Undo all operations in reverse order
        2. Release all held locks
        3. Mark transaction for retry

        Returns: True if preemption successful
        """
        tx = self.transactions.get(tx_id)
        if not tx or tx.state != "active":
            return False

        print(f"Preempting transaction {tx_id}...")

        # Step 1: Undo operations in reverse order
        for op in reversed(tx.operations):
            self._undo_operation(op)

        # Step 2: Release all locks
        for resource_id in tx.held_locks:
            self.lock_manager.release(tx_id, resource_id)
        tx.held_locks.clear()

        # Step 3: Mark for retry
        tx.state = "aborted"
        tx.operations.clear()

        print(f"Transaction {tx_id} preempted (rolled back)")
        return True

    def _undo_operation(self, op: Operation):
        """Undo a single operation."""
        if op.op_type == OperationType.WRITE:
            # Restore old value
            if op.old_value is None:
                self.data_store.pop(op.resource_id, None)
            else:
                self.data_store[op.resource_id] = op.old_value
            print(f"  Undid write to {op.resource_id}")
        elif op.op_type == OperationType.INSERT:
            # Delete the inserted value
            self.data_store.pop(op.resource_id, None)
            print(f"  Undid insert of {op.resource_id}")
        elif op.op_type == OperationType.DELETE:
            # Re-insert the deleted value
            self.data_store[op.resource_id] = op.old_value
            print(f"  Undid delete of {op.resource_id}")
        # READ operations don't need undo

    def commit(self, tx_id: int):
        """Commit a transaction, making changes permanent."""
        tx = self.transactions[tx_id]

        # Release all locks
        for resource_id in tx.held_locks:
            self.lock_manager.release(tx_id, resource_id)
        tx.held_locks.clear()

        tx.state = "committed"
        print(f"Transaction {tx_id} committed")

    def retry_transaction(self, tx_id: int) -> int:
        """
        Retry a preempted transaction with a new ID.
        Application should re-execute the transaction logic.
        """
        old_tx = self.transactions.get(tx_id)
        if old_tx and old_tx.state == "aborted":
            # Create new transaction for retry
            new_tx_id = self.begin_transaction()
            print(f"Retrying transaction {tx_id} as {new_tx_id}")
            return new_tx_id
        return tx_id


class DeadlockRecoveryViaPreemption:
    """
    Recover from deadlock by preempting locks (rolling back transactions).
    """

    def __init__(self, tx_manager: TransactionManager,
                 detect_deadlock, victim_selector):
        self.tx_manager = tx_manager
        self.detect_deadlock = detect_deadlock
        self.select_victim = victim_selector

    def recover(self, deadlocked_tx_ids: List[int]) -> Dict:
        """Recover from deadlock by preempting victim transactions."""
        result = {
            'preempted': [],
            'retry_ids': {},
            'success': False
        }

        while deadlocked_tx_ids:
            # Select victim transaction
            victim = self.select_victim(deadlocked_tx_ids)

            # Preempt (rollback) the victim
            success = self.tx_manager.preempt_transaction(victim)

            if success:
                result['preempted'].append(victim)

                # Prepare retry
                new_id = self.tx_manager.retry_transaction(victim)
                result['retry_ids'][victim] = new_id

            # Re-check for deadlock
            deadlocked_tx_ids = self.detect_deadlock()

        result['success'] = True
        return result
```

Lock Preemption = Transaction Abort + Retry
The key insight is that we're not "preempting" the lock in isolation—we're aborting the transaction that holds the lock, which releases the lock as a side effect. The transaction is then retried, and hopefully takes a different execution path that avoids deadlock.
Why This Works:
Logging enables undo: Every write is logged with the old value. Rollback replays the log backwards.
Atomicity preserved: Either all changes commit or all are undone. No partial state.
Locks released atomically: When transaction aborts, all its locks release at once.
Retry can succeed: The retry occurs at a different time, with different scheduling, and may acquire locks in a deadlock-free order.
Real database systems use exactly this pattern. PostgreSQL, MySQL, SQL Server all maintain transaction logs. When deadlock is detected, one transaction is aborted (rolled back), its locks are released, and it returns an error to the client. The client can then retry the transaction.
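On the client side, this pattern boils down to "retry on deadlock error". The sketch below is illustrative: `DeadlockAborted` stands in for whatever your driver raises when the database picks your transaction as the victim (for example, PostgreSQL reports SQLSTATE 40P01, MySQL error 1213), and the backoff constants are arbitrary.

```python
import random
import time

class DeadlockAborted(Exception):
    """Stand-in for the error a database raises after rolling back the
    deadlock victim (e.g. SQLSTATE 40P01 in PostgreSQL, 1213 in MySQL)."""

def run_with_retry(txn_fn, max_attempts=5, base_delay=0.01):
    """Re-run a transaction function until it commits or attempts run out.
    Exponential backoff with jitter makes retries interleave differently,
    so the same deadlock is unlikely to recur."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except DeadlockAborted:
            if attempt == max_attempts:
                raise  # give up: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt) * random.random())

# A transaction that is preempted (aborted) twice, then succeeds:
attempts = {"n": 0}

def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DeadlockAborted()
    return "committed"

assert run_with_retry(flaky_txn) == "committed"
assert attempts["n"] == 3
```

The important property is that `txn_fn` re-executes its logic from the top on each attempt; retrying by replaying only part of a rolled-back transaction would defeat the atomicity that made preemption safe.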
Safe preemption requires careful attention to several concerns. Here's a checklist and implementation patterns for production-grade preemption.
```cpp
#include <atomic>
#include <mutex>
#include <vector>
#include <functional>
#include <memory>
#include <string>
#include <exception>

/**
 * Requirements for safe resource preemption:
 *
 * 1. CHECKPOINTING: Ability to save resource state
 * 2. ROLLBACK: Ability to restore from checkpoint
 * 3. CLEANUP: Release partially-held resources
 * 4. NOTIFICATION: Inform process of preemption
 * 5. RESTART: Resume execution safely after preemption
 */

class PreemptableResource {
public:
    virtual ~PreemptableResource() = default;

    // Check if preemption is currently safe
    virtual bool can_preempt() const = 0;

    // Save current state for later restoration
    virtual std::unique_ptr<void*> checkpoint() = 0;

    // Prepare for preemption (cleanup, notify)
    virtual void prepare_preempt() = 0;

    // Execute preemption
    virtual void preempt() = 0;

    // Restore from checkpoint (after process regains resource)
    virtual void restore(std::unique_ptr<void*> checkpoint) = 0;
};

// Thrown at safe points when preemption has been requested.
// (Declared before use by the transaction code below.)
class PreemptionException : public std::exception {
    std::string message_;
public:
    explicit PreemptionException(const std::string& msg) : message_(msg) {}
    const char* what() const noexcept override { return message_.c_str(); }
};

/**
 * PreemptableTransaction: A transaction that can be safely preempted.
 */
class PreemptableTransaction {
public:
    using OperationLog = std::vector<std::function<void()>>;

private:
    std::atomic<bool> active_{true};
    std::atomic<bool> preempt_requested_{false};
    std::mutex log_mutex_;
    OperationLog undo_log_;
    std::vector<std::string> held_resources_;

    // Preemption callback
    std::function<void()> on_preempt_;

public:
    /**
     * Check if we should yield due to preemption request.
     * Call this at "safe points" in the transaction logic.
     */
    bool should_yield() const {
        return preempt_requested_.load(std::memory_order_acquire);
    }

    /**
     * Log an undoable operation.
     * Called before each write operation.
     */
    void log_undo(std::function<void()> undo_func) {
        std::lock_guard<std::mutex> lock(log_mutex_);
        undo_log_.push_back(std::move(undo_func));
    }

    /**
     * Request preemption of this transaction.
     * Sets a flag that the transaction checks at safe points.
     */
    void request_preempt() {
        preempt_requested_.store(true, std::memory_order_release);
    }

    /**
     * Execute rollback and release resources.
     * Called when preemption is actually happening.
     */
    void execute_rollback() {
        std::lock_guard<std::mutex> lock(log_mutex_);

        // Undo in reverse order
        for (auto it = undo_log_.rbegin(); it != undo_log_.rend(); ++it) {
            (*it)();
        }
        undo_log_.clear();

        // Release resources
        for (const auto& resource : held_resources_) {
            release_resource(resource);
        }
        held_resources_.clear();

        active_.store(false, std::memory_order_release);

        // Notify callback
        if (on_preempt_) {
            on_preempt_();
        }
    }

    void acquire_resource(const std::string& resource_id) {
        held_resources_.push_back(resource_id);
    }

    void release_resource(const std::string& resource_id) {
        // Release to lock manager
    }

    void set_preempt_callback(std::function<void()> callback) {
        on_preempt_ = std::move(callback);
    }
};

/**
 * Safe preemption pattern with cooperative yielding.
 *
 * Rather than forcibly preempting, we request preemption and
 * the transaction checks at safe points.
 */
class CooperativePreemption {
public:
    /**
     * Example transaction that cooperates with preemption.
     * (DataStore is assumed to provide read/write on string keys.)
     */
    void example_transaction(PreemptableTransaction& tx, DataStore& store) {
        try {
            // Acquire lock
            tx.acquire_resource("account_123");

            // Read current value
            int balance = store.read("account_123");

            // === SAFE POINT: Check for preemption ===
            if (tx.should_yield()) {
                throw PreemptionException("Preemption requested");
            }

            // Log undo operation BEFORE making change
            tx.log_undo([&store, balance]() {
                store.write("account_123", balance);  // Restore old value
            });

            // Make change
            store.write("account_123", balance - 100);

            // === SAFE POINT ===
            if (tx.should_yield()) {
                throw PreemptionException("Preemption requested");
            }

            // Acquire second lock (potential deadlock point!)
            tx.acquire_resource("account_456");
            int other_balance = store.read("account_456");

            tx.log_undo([&store, other_balance]() {
                store.write("account_456", other_balance);
            });

            store.write("account_456", other_balance + 100);

            // Commit
            // (releases locks, clears undo log)

        } catch (const PreemptionException& e) {
            // Rollback is triggered
            tx.execute_rollback();
            // Transaction will be retried by caller
            throw;
        }
    }
};
```

Cooperative preemption (checking flags at safe points) is much safer than forceful preemption (interrupting mid-operation). For locks protecting mutable state, always prefer cooperative preemption with explicit safe points. Reserve forceful preemption for resources like CPU and memory where the OS provides safe mechanisms.
When recovering from deadlock via preemption, we must choose which resources to preempt. The goal is to break the deadlock with minimal disruption.
```python
from dataclasses import dataclass
from typing import List, Dict, Set
from enum import Enum


class PreemptionCostFactor(Enum):
    """Factors contributing to preemption cost."""
    WORK_LOST = "work_lost"            # Amount of work to redo
    ROLLBACK_COMPLEXITY = "rollback"   # Complexity of undo
    CHAIN_LENGTH = "chain"             # Other resources depending on this
    PRIORITY = "priority"              # Priority of holding process
    NON_PREEMPTABLE = "non_preempt"    # Related non-preemptable resources


@dataclass
class ResourcePreemptionCandidate:
    """A resource being considered for preemption."""
    resource_id: str
    holder_pid: int
    resource_type: str
    preemption_cost: float
    cost_breakdown: Dict[str, float]
    can_preempt: bool
    reason_if_not: str = ""


class PreemptionPlanner:
    """
    Plan which resources to preempt to break deadlock.

    Goal: Find minimum-cost set of resources whose preemption
    breaks all cycles in the wait-for graph.
    """

    def __init__(self, resource_info: Dict[str, dict],
                 process_info: Dict[int, dict],
                 wait_for_graph: Dict[int, Set[int]]):
        self.resources = resource_info
        self.processes = process_info
        self.wait_for = wait_for_graph

    def plan_preemption(self, deadlocked_pids: Set[int]) -> List[ResourcePreemptionCandidate]:
        """
        Determine which resources to preempt to break deadlock.

        Strategy:
        1. Identify all resources held by deadlocked processes
        2. Evaluate preemption cost for each
        3. Find minimum cost set that breaks all cycles
        """
        # Step 1: Gather candidate resources
        candidates = []
        for pid in deadlocked_pids:
            process = self.processes[pid]
            for resource_id in process.get('held_resources', []):
                candidate = self._evaluate_candidate(resource_id, pid)
                if candidate.can_preempt:
                    candidates.append(candidate)

        # Sort by cost (lowest first)
        candidates.sort(key=lambda c: c.preemption_cost)

        # Step 2: Greedily select resources until deadlock broken
        selected = []
        remaining_deadlock = set(deadlocked_pids)

        for candidate in candidates:
            if not remaining_deadlock:
                break  # Deadlock already broken

            # Would preempting this resource help?
            if self._would_break_cycle(candidate, remaining_deadlock):
                selected.append(candidate)
                remaining_deadlock = self._simulate_preemption(
                    candidate, remaining_deadlock
                )

        if remaining_deadlock:
            # Couldn't break deadlock with available preemptions.
            # Must fall back to process termination.
            print(f"Warning: Cannot fully break deadlock via preemption. "
                  f"Remaining: {remaining_deadlock}")

        return selected

    def _evaluate_candidate(self, resource_id: str,
                            holder_pid: int) -> ResourcePreemptionCandidate:
        """Evaluate the cost of preempting a specific resource."""
        resource = self.resources.get(resource_id, {})
        process = self.processes.get(holder_pid, {})

        cost_breakdown = {}

        # Factor 1: Work lost (undo log size)
        undo_log_size = process.get('undo_log_size', 0)
        cost_breakdown['work_lost'] = undo_log_size * 0.1

        # Factor 2: Rollback complexity
        num_operations = process.get('operations_count', 0)
        cost_breakdown['rollback'] = num_operations * 0.5

        # Factor 3: Chain length (resources depending on this)
        chain_length = len(resource.get('dependent_resources', []))
        cost_breakdown['chain'] = chain_length * 2.0

        # Factor 4: Process priority (higher priority = higher cost)
        priority = process.get('priority', 5)
        cost_breakdown['priority'] = priority * 1.0

        # Check if actually preemptable
        can_preempt = resource.get('preemptable', False)

        # Non-preemptable if the holder also holds non-preemptable resources
        non_preempt_held = [
            r for r in process.get('held_resources', [])
            if not self.resources.get(r, {}).get('preemptable', False)
        ]
        if non_preempt_held:
            cost_breakdown['non_preempt'] = float('inf')
            can_preempt = False
            reason = f"Holding non-preemptable: {non_preempt_held}"
        else:
            cost_breakdown['non_preempt'] = 0
            reason = ""

        total_cost = sum(cost_breakdown.values())

        return ResourcePreemptionCandidate(
            resource_id=resource_id,
            holder_pid=holder_pid,
            resource_type=resource.get('type', 'unknown'),
            preemption_cost=total_cost,
            cost_breakdown=cost_breakdown,
            can_preempt=can_preempt,
            reason_if_not=reason
        )

    def _would_break_cycle(self, candidate: ResourcePreemptionCandidate,
                           deadlocked: Set[int]) -> bool:
        """Check if preempting this resource would help break a cycle."""
        # Simplified: holder must be deadlocked and the resource must be
        # blocking some other deadlocked process
        if candidate.holder_pid not in deadlocked:
            return False

        # Check if any deadlocked process is waiting for this resource
        resource_id = candidate.resource_id
        for pid in deadlocked:
            waiting_for = self.processes.get(pid, {}).get('waiting_for', [])
            if resource_id in waiting_for:
                return True
        return False

    def _simulate_preemption(self, candidate: ResourcePreemptionCandidate,
                             deadlocked: Set[int]) -> Set[int]:
        """Simulate preempting and see who's still deadlocked."""
        # A full implementation would re-run cycle detection after the
        # simulated release; here we just unblock processes that were
        # waiting only for this resource.
        result = deadlocked.copy()

        waiting_processes = [
            pid for pid in deadlocked
            if candidate.resource_id in self.processes.get(pid, {}).get('waiting_for', [])
        ]

        for pid in waiting_processes:
            waiting_for = self.processes.get(pid, {}).get('waiting_for', [])
            if len(waiting_for) == 1:
                result.discard(pid)

        return result
```

Finding the optimal set of resources to preempt is related to the minimum cut problem in graph theory: we want to remove the minimum-cost set of edges (resource holdings) that disconnects all cycles. In practice, greedy algorithms work well—select the lowest-cost preemption that breaks at least one cycle, repeat until done.
Both preemption and termination can break deadlock. When should you use each?
| Criterion | Preemption (Rollback) | Termination |
|---|---|---|
| Work lost | Work since checkpoint | All work |
| Recovery time | Rollback duration + retry | Full restart time |
| Complexity | Requires logging, undo mechanism | Simple signal |
| Applicable resources | Only preemptable resources | Any resource |
| Data consistency | Preserved via atomicity | May require manual cleanup |
| User visibility | Request may retry automatically | Error returned, must handle |
| Best for | Databases, transactions | General processes, daemons |
Decision Guidelines:
| Scenario | Recommended Recovery |
|---|---|
| Database transactions | Preemption via rollback |
| Batch jobs (long-running) | Termination if no checkpoint system |
| Interactive processes | Preemption if possible, termination as fallback |
| System services | Termination + automatic restart |
| Resources with external effects | Termination (can't rollback effects) |
| Short transactions | Preemption (little work to redo) |
| Very long transactions | Consider partial preemption (to savepoint) |
Real systems often use hybrid approaches. For example, try preemption first (lower cost), fall back to termination if preemption isn't possible. Some systems implement 'coordinated termination' where the victim process is signaled to checkpoint its state before being killed, enabling faster restart than from scratch.
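A hybrid policy like this can be sketched as a small dispatch loop. The callback names below are illustrative (your system supplies the real rollback and kill mechanisms): try the cheaper preemption first, and terminate only when a victim cannot be preempted.

```python
def recover(victims, try_preempt, terminate):
    """Hybrid recovery sketch: attempt preemption (rollback) for each
    victim first; fall back to termination only when preemption fails,
    e.g. because the victim holds non-preemptable resources.
    `try_preempt(pid) -> bool` and `terminate(pid)` are caller-supplied."""
    outcome = {}
    for pid in victims:
        if try_preempt(pid):
            outcome[pid] = "preempted"   # cheap path: work since start redone
        else:
            terminate(pid)
            outcome[pid] = "terminated"  # expensive path: all work lost
    return outcome

# Example: process 1 is a rollback-capable transaction; process 2 holds
# a non-preemptable resource (say, a printer mid-job).
preemptable = {1}
killed = []
result = recover(
    victims=[1, 2],
    try_preempt=lambda pid: pid in preemptable,
    terminate=killed.append,
)
assert result == {1: "preempted", 2: "terminated"}
assert killed == [2]
```

In a real system the loop would also re-run deadlock detection after each action, stopping as soon as no cycle remains rather than processing every victim unconditionally.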
We've now covered resource preemption as a deadlock recovery mechanism: when it works, how to implement it, and when to prefer it over termination.
Module Complete:
You've now completed the entire Detection and Recovery module. You understand which resources can be preempted and why, how operating systems preempt CPU and memory, how locks are preempted via transaction rollback, how to implement preemption safely, and when preemption is preferable to termination.
This knowledge enables you to design, implement, and operate systems that handle deadlock gracefully—detecting it when it occurs and recovering with minimal disruption.
Congratulations! You've mastered deadlock detection and recovery. You can now implement detection algorithms, choose appropriate recovery strategies, terminate processes safely, and preempt resources when possible. This comprehensive understanding of deadlock handling is essential for building robust concurrent systems.