Resource preemption offers a potentially gentler alternative to process termination: instead of killing a process entirely, we forcibly take away resources it currently holds and give them to another process that needs them to proceed.
The appeal is obvious—if we can preempt resources without terminating processes, we preserve more work and cause less disruption. A process whose memory is swapped out can continue later when memory is available. A process whose lock is preempted can retry.
But preemption is not universally applicable. Some resources simply cannot be taken away mid-use without catastrophic consequences. A printer job cannot be preempted halfway through—you'd get garbage output. A file being written cannot have its handle yanked away—you'd corrupt the file.
The central question of preemption is: Can the process safely lose this resource and recover later?
This page explores which resources support preemption, how to implement preemption safely, the relationship between preemption and rollback, and practical applications in real systems.
By the end of this page, you will understand the fundamental requirements for resource preemptability, how operating systems implement preemption for CPU and memory, techniques for preempting locks via transaction rollback, the challenges of implementing safe preemption, and when preemption is preferable to termination.
Not all resources can be preempted. A resource is preemptable if and only if it satisfies certain conditions that allow safe reclamation without corrupting the system state or the process's ability to function.
Formal Requirements for Preemptability:
| Resource | Savable? | Restorable? | No Side Effects? | Preemptable? |
|---|---|---|---|---|
| CPU time | ✅ Context saved | ✅ Context restored | ✅ Computation is internal | ✅ Yes |
| Memory pages | ✅ Swap to disk | ✅ Swap back | ✅ Memory is internal state | ✅ Yes |
| Printer (mid-job) | ❌ Physical paper used | ❌ Can't unprint | ❌ Visible output | ❌ No |
| Network socket (connected) | ❓ State complex | ❌ TCP state lost | ❌ Peer affected | ❌ No |
| File lock | ✅ Lock state known | ✅ Can re-lock later | ⚠️ Data must be rolled back | ⚠️ With rollback |
| Database row lock | ✅ Transaction log | ✅ Via rollback | ✅ Atomicity preserved | ✅ Via transaction abort |
| GPU memory | ✅ Copy to host RAM | ✅ Copy back | ✅ Computation internal | ✅ Yes (expensive) |
| CD-ROM drive (reading) | ✅ Track position | ✅ Seek again | ✅ No physical state change | ✅ Yes |
The Key Insight:
Resources that represent or produce physical artifacts (printouts, network packets sent, hardware state changes) are generally not preemptable. Resources that represent virtual state within the computer (CPU registers, memory contents, lock ownership) are generally preemptable if we can save and restore that state.
The exception is locks: they're preemptable only if we can undo the work done while holding the lock. This ties preemption to rollback and transaction semantics.
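The preemptability conditions above can be condensed into a small predicate. The sketch below is illustrative only (the `ResourceProfile` type and its field names are ours, not a real API): a resource is preemptable when its state can be both saved and restored without external side effects, with rollback-undoable locks as the special case.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ResourceProfile:
    """Illustrative profile of a resource's preemption-relevant properties."""
    name: str
    state_savable: bool        # can the current state be captured?
    state_restorable: bool     # can the saved state be reinstated later?
    side_effect_free: bool     # no external/physical effects while held?
    undoable_with_rollback: bool = False  # locks: held work can be undone

def is_preemptable(r: ResourceProfile) -> bool:
    """Preemptable if state round-trips cleanly, either because there are
    no external effects or because rollback can undo the held work."""
    round_trips = r.state_savable and r.state_restorable
    return round_trips and (r.side_effect_free or r.undoable_with_rollback)

cpu = ResourceProfile("cpu", True, True, True)
printer = ResourceProfile("printer_mid_job", False, False, False)
row_lock = ResourceProfile("db_row_lock", True, True, False,
                           undoable_with_rollback=True)

assert is_preemptable(cpu)
assert not is_preemptable(printer)
assert is_preemptable(row_lock)   # only via rollback
```

The lock row mirrors the table: its state round-trips, but it is only safe to preempt because rollback erases the side effects on protected data.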
Recall that breaking the 'no preemption' condition is one way to PREVENT deadlock (covered in Module 4). Here we're discussing preemption as a RECOVERY mechanism after deadlock occurs. The concepts are related but the context differs: prevention applies preemption proactively to all requests; recovery applies preemption reactively to specific deadlocked resources.
CPU time is the most commonly preempted resource. The operating system's scheduler routinely preempts running processes to share CPU among all ready processes. Understanding CPU preemption provides a template for thinking about other resource preemption.
How CPU Preemption Works:
```c
/**
 * CPU preemption: the canonical example of safe resource preemption.
 *
 * The CPU can be preempted at virtually any time because:
 * 1. State is fully savable (registers, PC, flags)
 * 2. State is fully restorable (context switch reloads everything)
 * 3. No external side effects (computation is internal)
 * 4. Process tolerance is implicit (preemption is invisible to process)
 */

// Context saved on preemption
struct cpu_context {
    uint64_t rax, rbx, rcx, rdx;   // General registers
    uint64_t rsi, rdi, rbp, rsp;   // Index/pointer registers
    uint64_t r8,  r9,  r10, r11;   // Additional registers
    uint64_t r12, r13, r14, r15;
    uint64_t rip;                  // Instruction pointer
    uint64_t rflags;               // Status flags
    uint64_t cs, ss;               // Segment registers
    struct fpu_state *fpu;         // Floating point state
    // ... additional state as needed
};

/**
 * Timer interrupt triggers preemption.
 * Called by hardware interrupt handler.
 */
void timer_interrupt_handler(struct cpu_context *interrupted_context)
{
    // Step 1: Save current process context
    struct task_struct *current = get_current_task();
    save_context(current, interrupted_context);

    // Step 2: Update scheduler state (accounting, priorities)
    scheduler_tick(current);

    // Step 3: Pick next process to run
    struct task_struct *next = schedule();

    if (next != current) {
        // Step 4: Restore next process context
        switch_to(current, next);
        // Note: switch_to saves remaining context and restores next's
    }

    // Step 5: Return to (possibly different) process
    // Hardware restores context from stack and continues execution
}

/**
 * Key insight: The preempted process never knows it was preempted.
 * From its perspective, execution is continuous.
 *
 * This "invisible preemption" is the gold standard for safe preemption.
 * Other resources are preemptable to the extent they can approximate this.
 */
```

Properties That Make CPU Preemption Safe:
Complete state capture: All CPU state fits in the context structure. Nothing is lost.
Atomic save/restore: Context switch is atomic from the process's perspective.
No visibility to process: The process can't tell it was preempted (except by checking time).
Hardware support: CPU provides interrupt mechanism, register save/restore instructions.
Resumption guarantee: The process WILL continue eventually (scheduler fairness).
Other resources can be preempted safely to the extent they share these properties.
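Python generators offer a compact, if loose, analogy for this save/restore cycle: each `yield` is a preemption point where the interpreter saves the frame (locals and program position), and `next()` restores it transparently. The names below are illustrative, a sketch rather than a real scheduler:

```python
def worker(n):
    """A computation that can be suspended and resumed transparently.
    The generator frame plays the role of the saved CPU context."""
    total = 0
    for i in range(1, n + 1):
        total += i
        yield  # preemption point: locals and position are saved here
    return total

def round_robin(tasks):
    """Tiny 'scheduler': preempt each task after one step, round-robin.
    The preempted task never observes the interruption."""
    results = {}
    while tasks:
        name, gen = tasks.pop(0)
        try:
            next(gen)                   # run until the next preemption point
            tasks.append((name, gen))   # requeue: it will resume later
        except StopIteration as done:   # task finished; capture return value
            results[name] = done.value
    return results

results = round_robin([("a", worker(3)), ("b", worker(5))])
assert results == {"a": 6, "b": 15}
```

As with real CPU preemption, each worker computes the same sum it would have computed uninterrupted; preemption only changed *when* the work ran.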
CPU is a renewable resource—using it now doesn't prevent others from using it later. The scheduler ensures all ready processes eventually get CPU time. Deadlock involves resources that are held exclusively until explicitly released. CPU is inherently preemptable by design, so it never participates in deadlock.
Memory is another preemptable resource. When physical memory is exhausted, the OS can swap out or page out memory from one process to make room for another. The preempted memory is saved to disk and restored when the process needs it again.
```c
/**
 * Memory preemption via paging/swapping.
 *
 * Memory is preemptable because:
 * 1. Contents can be saved to disk (swap space)
 * 2. Contents can be restored on demand (page fault)
 * 3. No external effects (memory is internal state)
 * 4. Process tolerates via blocking on page fault
 */

#include <linux/mm.h>
#include <linux/swap.h>

/**
 * Page reclaim: Preempt memory pages from processes.
 * Called when memory is low.
 */
unsigned long reclaim_pages(struct zone *zone, unsigned long nr_to_reclaim)
{
    struct list_head *page_list;
    unsigned long nr_reclaimed = 0;

    // Get list of candidate pages (LRU)
    page_list = get_lru_pages(zone, nr_to_reclaim);

    for_each_page(page, page_list) {
        // Skip pinned or locked pages
        if (page_is_locked(page) || page_is_pinned(page)) {
            continue;
        }

        // Different handling based on page type
        if (page_is_anonymous(page)) {
            // Anonymous page (heap, stack): write to swap
            if (add_to_swap(page)) {
                // Page now has a swap slot.
                // Remove from process page table.
                unmap_page_from_process(page);
                // Page is now preempted - can be freed
                nr_reclaimed++;
            }
        } else if (page_is_file_backed(page)) {
            // File-backed page: just discard if clean
            if (page_is_dirty(page)) {
                // Write back to file first
                writeback_page(page);
            }
            unmap_page_from_process(page);
            nr_reclaimed++;
        }
    }

    return nr_reclaimed;
}

/**
 * Page fault handler: Restore preempted memory.
 * Called when process accesses swapped-out page.
 */
int handle_page_fault(struct mm_struct *mm, unsigned long address)
{
    struct vm_area_struct *vma;
    pte_t *pte;
    struct page *page;

    // Find the virtual memory region
    vma = find_vma(mm, address);
    if (!vma)
        return -EFAULT;

    // Get page table entry
    pte = get_pte(mm, address);

    if (pte_none(*pte)) {
        // Never allocated - allocate fresh
        return allocate_page(vma, address);
    }

    if (pte_is_swap(pte)) {
        // Page was swapped out - restore it
        swp_entry_t entry = pte_to_swp_entry(*pte);

        // Read page from swap
        page = read_swap_page(entry);
        if (!page)
            return -ENOMEM;

        // Map into process address space
        set_pte(pte, mk_pte(page, vma->vm_page_prot));

        // Free swap slot
        free_swap_slot(entry);

        return 0;  // Page fault handled, process can continue
    }

    // Other fault types...
    return -EFAULT;
}

/**
 * For deadlock recovery, we can preempt memory from a deadlocked process:
 * 1. Swap out its pages to relieve memory pressure
 * 2. The deadlocked process remains blocked (on a lock, not memory)
 * 3. Other processes can now proceed
 *
 * This doesn't break the deadlock directly but may help indirectly
 * if the deadlock involves memory resources.
 */
```

Memory Preemption Properties:
| Property | Implementation |
|---|---|
| State savability | Contents written to swap partition/file |
| State restorability | Page fault triggers read-back from swap |
| Process tolerance | Process blocks on page fault until restore |
| Transparency | Virtual memory abstraction hides preemption |
| Cost | I/O cost for swap write/read |
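A toy model makes the save/restore cycle above concrete. This sketch is ours, not kernel code (real kernels use LRU approximations, asynchronous writeback, and per-page-type policies): "RAM" holds a fixed number of pages, eviction preempts the least-recently-used page into a "swap" dict, and a later access faults it back in.

```python
from collections import OrderedDict

class TinyVM:
    """Toy model of memory preemption. RAM holds at most `capacity` pages;
    the least-recently-used page is swapped out to make room, and a later
    access faults it back in. Names are illustrative only."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.ram = OrderedDict()  # page_id -> contents, in LRU order
        self.swap = {}            # preempted pages live here
        self.faults = 0

    def access(self, page_id: str, contents=None):
        if page_id in self.ram:
            self.ram.move_to_end(page_id)           # mark recently used
        else:
            if page_id in self.swap:                # page was preempted
                self.faults += 1
                contents = self.swap.pop(page_id)   # restore from swap
            self._evict_if_full()
            self.ram[page_id] = contents
        return self.ram[page_id]

    def _evict_if_full(self):
        while len(self.ram) >= self.capacity:
            victim, data = self.ram.popitem(last=False)  # LRU victim
            self.swap[victim] = data                     # save, don't discard

vm = TinyVM(capacity=2)
vm.access("A", "a-data")
vm.access("B", "b-data")
vm.access("C", "c-data")           # RAM full: A is preempted to swap
assert "A" in vm.swap
assert vm.access("A") == "a-data"  # page fault: A restored intact
assert vm.faults == 1
```

Note that the preempted page's contents survive the round trip unchanged, which is exactly the savable/restorable property that makes memory preemptable.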
Memory preemption alone rarely resolves deadlock (since most deadlocks involve locks, not memory). However, if deadlock occurs because all processes are waiting for memory that others hold, the OOM killer (Out-Of-Memory killer) terminates one process—a form of combined preemption and termination.
Locks are the primary resource involved in deadlock, but they're not directly preemptable. We can't just "take away" a lock while another process is using it—the protected data would be in an inconsistent state.
However, we CAN preempt locks if we can undo the work done while holding the lock. This is the essence of transaction rollback: cancel the transaction, undo its changes, release its locks, and let the process retry.
```python
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Any
from enum import Enum
import copy


class OperationType(Enum):
    READ = "read"
    WRITE = "write"
    INSERT = "insert"
    DELETE = "delete"


@dataclass
class Operation:
    """A single operation within a transaction."""
    op_type: OperationType
    resource_id: str
    old_value: Optional[Any] = None
    new_value: Optional[Any] = None


@dataclass
class Transaction:
    """A transaction that can be preempted via rollback."""
    tx_id: int
    operations: List[Operation] = field(default_factory=list)
    held_locks: List[str] = field(default_factory=list)
    savepoints: Dict[str, int] = field(default_factory=dict)  # name -> op index
    state: str = "active"  # active, committed, aborted


class TransactionManager:
    """
    Transaction manager with lock preemption support.

    Preempting a lock means:
    1. Roll back all changes made by the transaction
    2. Release all locks held
    3. Mark transaction for retry
    """

    def __init__(self, data_store: Dict[str, Any], lock_manager):
        self.data_store = data_store
        self.lock_manager = lock_manager
        self.transactions: Dict[int, Transaction] = {}
        self.next_tx_id = 1

    def begin_transaction(self) -> int:
        """Start a new transaction."""
        tx_id = self.next_tx_id
        self.next_tx_id += 1
        self.transactions[tx_id] = Transaction(tx_id=tx_id)
        return tx_id

    def read(self, tx_id: int, resource_id: str) -> Any:
        """Read a value within a transaction."""
        tx = self.transactions[tx_id]

        # Acquire read lock
        self.lock_manager.acquire_read(tx_id, resource_id)
        if resource_id not in tx.held_locks:
            tx.held_locks.append(resource_id)

        # Log the read
        tx.operations.append(Operation(
            op_type=OperationType.READ,
            resource_id=resource_id,
            old_value=self.data_store.get(resource_id)
        ))

        return self.data_store.get(resource_id)

    def write(self, tx_id: int, resource_id: str, value: Any):
        """Write a value within a transaction."""
        tx = self.transactions[tx_id]

        # Acquire write lock (may block -> potential deadlock!)
        self.lock_manager.acquire_write(tx_id, resource_id)
        if resource_id not in tx.held_locks:
            tx.held_locks.append(resource_id)

        # Log the write with old value (for rollback)
        old_value = self.data_store.get(resource_id)
        tx.operations.append(Operation(
            op_type=OperationType.WRITE,
            resource_id=resource_id,
            old_value=copy.deepcopy(old_value),
            new_value=copy.deepcopy(value)
        ))

        # Apply the write
        self.data_store[resource_id] = value

    def preempt_transaction(self, tx_id: int) -> bool:
        """
        Preempt a transaction's locks by rolling it back.

        This is the core of lock preemption for deadlock recovery.

        Steps:
        1. Undo all operations in reverse order
        2. Release all held locks
        3. Mark transaction for retry

        Returns: True if preemption successful
        """
        tx = self.transactions.get(tx_id)
        if not tx or tx.state != "active":
            return False

        print(f"Preempting transaction {tx_id}...")

        # Step 1: Undo operations in reverse order
        for op in reversed(tx.operations):
            self._undo_operation(op)

        # Step 2: Release all locks
        for resource_id in tx.held_locks:
            self.lock_manager.release(tx_id, resource_id)
        tx.held_locks.clear()

        # Step 3: Mark for retry
        tx.state = "aborted"
        tx.operations.clear()

        print(f"Transaction {tx_id} preempted (rolled back)")
        return True

    def _undo_operation(self, op: Operation):
        """Undo a single operation."""
        if op.op_type == OperationType.WRITE:
            # Restore old value
            if op.old_value is None:
                self.data_store.pop(op.resource_id, None)
            else:
                self.data_store[op.resource_id] = op.old_value
            print(f"  Undid write to {op.resource_id}")
        elif op.op_type == OperationType.INSERT:
            # Delete the inserted value
            self.data_store.pop(op.resource_id, None)
            print(f"  Undid insert of {op.resource_id}")
        elif op.op_type == OperationType.DELETE:
            # Re-insert the deleted value
            self.data_store[op.resource_id] = op.old_value
            print(f"  Undid delete of {op.resource_id}")
        # READ operations don't need undo

    def commit(self, tx_id: int):
        """Commit a transaction, making changes permanent."""
        tx = self.transactions[tx_id]

        # Release all locks
        for resource_id in tx.held_locks:
            self.lock_manager.release(tx_id, resource_id)
        tx.held_locks.clear()

        tx.state = "committed"
        print(f"Transaction {tx_id} committed")

    def retry_transaction(self, tx_id: int) -> int:
        """
        Retry a preempted transaction with a new ID.
        Application should re-execute the transaction logic.
        """
        old_tx = self.transactions.get(tx_id)
        if old_tx and old_tx.state == "aborted":
            # Create new transaction for retry
            new_tx_id = self.begin_transaction()
            print(f"Retrying transaction {tx_id} as {new_tx_id}")
            return new_tx_id
        return tx_id


class DeadlockRecoveryViaPreemption:
    """
    Recover from deadlock by preempting locks (rolling back transactions).
    """

    def __init__(self, tx_manager: TransactionManager,
                 detect_deadlock, victim_selector):
        self.tx_manager = tx_manager
        self.detect_deadlock = detect_deadlock
        self.select_victim = victim_selector

    def recover(self, deadlocked_tx_ids: List[int]) -> Dict:
        """Recover from deadlock by preempting victim transactions."""
        result = {
            'preempted': [],
            'retry_ids': {},
            'success': False
        }

        while deadlocked_tx_ids:
            # Select victim transaction
            victim = self.select_victim(deadlocked_tx_ids)

            # Preempt (rollback) the victim
            success = self.tx_manager.preempt_transaction(victim)

            if success:
                result['preempted'].append(victim)

                # Prepare retry
                new_id = self.tx_manager.retry_transaction(victim)
                result['retry_ids'][victim] = new_id

            # Re-check for deadlock
            deadlocked_tx_ids = self.detect_deadlock()

        result['success'] = True
        return result
```

Lock Preemption = Transaction Abort + Retry
The key insight is that we're not "preempting" the lock in isolation—we're aborting the transaction that holds the lock, which releases the lock as a side effect. The transaction is then retried, and hopefully takes a different execution path that avoids deadlock.
Why This Works:
Logging enables undo: Every write is logged with the old value. Rollback replays the log backwards.
Atomicity preserved: Either all changes commit or all are undone. No partial state.
Locks released atomically: When transaction aborts, all its locks release at once.
Retry can succeed: The retry occurs at a different time, with different scheduling, and may acquire locks in a deadlock-free order.
Real database systems use exactly this pattern. PostgreSQL, MySQL, SQL Server all maintain transaction logs. When deadlock is detected, one transaction is aborted (rolled back), its locks are released, and it returns an error to the client. The client can then retry the transaction.
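On the client side, this pattern boils down to "retry on deadlock error". The sketch below is illustrative: `DeadlockAborted` stands in for whatever your driver raises when the database picks your transaction as the victim (for example, PostgreSQL reports SQLSTATE 40P01, MySQL error 1213), and the backoff constants are arbitrary.

```python
import random
import time

class DeadlockAborted(Exception):
    """Stand-in for the error a database raises after rolling back the
    deadlock victim (e.g. SQLSTATE 40P01 in PostgreSQL, 1213 in MySQL)."""

def run_with_retry(txn_fn, max_attempts=5, base_delay=0.01):
    """Re-run a transaction function until it commits or attempts run out.
    Exponential backoff with jitter makes retries interleave differently,
    so the same deadlock is unlikely to recur."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except DeadlockAborted:
            if attempt == max_attempts:
                raise  # give up: surface the error to the caller
            time.sleep(base_delay * (2 ** attempt) * random.random())

# A transaction that is preempted (aborted) twice, then succeeds:
attempts = {"n": 0}

def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise DeadlockAborted()
    return "committed"

assert run_with_retry(flaky_txn) == "committed"
assert attempts["n"] == 3
```

The important property is that `txn_fn` re-executes its logic from the top on each attempt; retrying by replaying only part of a rolled-back transaction would defeat the atomicity that made preemption safe.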
Safe preemption requires careful attention to several concerns. Here's a checklist and implementation patterns for production-grade preemption.
```cpp
#include <atomic>
#include <mutex>
#include <vector>
#include <functional>
#include <memory>
#include <string>
#include <exception>

/**
 * Requirements for safe resource preemption:
 *
 * 1. CHECKPOINTING: Ability to save resource state
 * 2. ROLLBACK: Ability to restore from checkpoint
 * 3. CLEANUP: Release partially-held resources
 * 4. NOTIFICATION: Inform process of preemption
 * 5. RESTART: Resume execution safely after preemption
 */

class PreemptableResource {
public:
    virtual ~PreemptableResource() = default;

    // Check if preemption is currently safe
    virtual bool can_preempt() const = 0;

    // Save current state for later restoration
    virtual std::unique_ptr<void*> checkpoint() = 0;

    // Prepare for preemption (cleanup, notify)
    virtual void prepare_preempt() = 0;

    // Execute preemption
    virtual void preempt() = 0;

    // Restore from checkpoint (after process regains resource)
    virtual void restore(std::unique_ptr<void*> checkpoint) = 0;
};

// Thrown at safe points when preemption has been requested.
// (Declared before use by the transaction code below.)
class PreemptionException : public std::exception {
    std::string message_;
public:
    explicit PreemptionException(const std::string& msg) : message_(msg) {}
    const char* what() const noexcept override { return message_.c_str(); }
};

/**
 * PreemptableTransaction: A transaction that can be safely preempted.
 */
class PreemptableTransaction {
public:
    using OperationLog = std::vector<std::function<void()>>;

private:
    std::atomic<bool> active_{true};
    std::atomic<bool> preempt_requested_{false};
    std::mutex log_mutex_;
    OperationLog undo_log_;
    std::vector<std::string> held_resources_;

    // Preemption callback
    std::function<void()> on_preempt_;

public:
    /**
     * Check if we should yield due to preemption request.
     * Call this at "safe points" in the transaction logic.
     */
    bool should_yield() const {
        return preempt_requested_.load(std::memory_order_acquire);
    }

    /**
     * Log an undoable operation.
     * Called before each write operation.
     */
    void log_undo(std::function<void()> undo_func) {
        std::lock_guard<std::mutex> lock(log_mutex_);
        undo_log_.push_back(std::move(undo_func));
    }

    /**
     * Request preemption of this transaction.
     * Sets a flag that the transaction checks at safe points.
     */
    void request_preempt() {
        preempt_requested_.store(true, std::memory_order_release);
    }

    /**
     * Execute rollback and release resources.
     * Called when preemption is actually happening.
     */
    void execute_rollback() {
        std::lock_guard<std::mutex> lock(log_mutex_);

        // Undo in reverse order
        for (auto it = undo_log_.rbegin(); it != undo_log_.rend(); ++it) {
            (*it)();
        }
        undo_log_.clear();

        // Release resources
        for (const auto& resource : held_resources_) {
            release_resource(resource);
        }
        held_resources_.clear();

        active_.store(false, std::memory_order_release);

        // Notify callback
        if (on_preempt_) {
            on_preempt_();
        }
    }

    void acquire_resource(const std::string& resource_id) {
        held_resources_.push_back(resource_id);
    }

    void release_resource(const std::string& resource_id) {
        // Release to lock manager
    }

    void set_preempt_callback(std::function<void()> callback) {
        on_preempt_ = std::move(callback);
    }
};

/**
 * Safe preemption pattern with cooperative yielding.
 *
 * Rather than forcibly preempting, we request preemption and
 * the transaction checks at safe points.
 */
class CooperativePreemption {
public:
    /**
     * Example transaction that cooperates with preemption.
     * (DataStore is assumed to provide read/write on string keys.)
     */
    void example_transaction(PreemptableTransaction& tx, DataStore& store) {
        try {
            // Acquire lock
            tx.acquire_resource("account_123");

            // Read current value
            int balance = store.read("account_123");

            // === SAFE POINT: Check for preemption ===
            if (tx.should_yield()) {
                throw PreemptionException("Preemption requested");
            }

            // Log undo operation BEFORE making change
            tx.log_undo([&store, balance]() {
                store.write("account_123", balance);  // Restore old value
            });

            // Make change
            store.write("account_123", balance - 100);

            // === SAFE POINT ===
            if (tx.should_yield()) {
                throw PreemptionException("Preemption requested");
            }

            // Acquire second lock (potential deadlock point!)
            tx.acquire_resource("account_456");
            int other_balance = store.read("account_456");

            tx.log_undo([&store, other_balance]() {
                store.write("account_456", other_balance);
            });

            store.write("account_456", other_balance + 100);

            // Commit
            // (releases locks, clears undo log)

        } catch (const PreemptionException& e) {
            // Rollback is triggered
            tx.execute_rollback();
            // Transaction will be retried by caller
            throw;
        }
    }
};
```

Cooperative preemption (checking flags at safe points) is much safer than forceful preemption (interrupting mid-operation). For locks protecting mutable state, always prefer cooperative preemption with explicit safe points. Reserve forceful preemption for resources like CPU and memory where the OS provides safe mechanisms.
When recovering from deadlock via preemption, we must choose which resources to preempt. The goal is to break the deadlock with minimal disruption.
```python
from dataclasses import dataclass
from typing import List, Dict, Set
from enum import Enum


class PreemptionCostFactor(Enum):
    """Factors contributing to preemption cost."""
    WORK_LOST = "work_lost"            # Amount of work to redo
    ROLLBACK_COMPLEXITY = "rollback"   # Complexity of undo
    CHAIN_LENGTH = "chain"             # Other resources depending on this
    PRIORITY = "priority"              # Priority of holding process
    NON_PREEMPTABLE = "non_preempt"    # Related non-preemptable resources


@dataclass
class ResourcePreemptionCandidate:
    """A resource being considered for preemption."""
    resource_id: str
    holder_pid: int
    resource_type: str
    preemption_cost: float
    cost_breakdown: Dict[str, float]
    can_preempt: bool
    reason_if_not: str = ""


class PreemptionPlanner:
    """
    Plan which resources to preempt to break deadlock.

    Goal: Find minimum-cost set of resources whose preemption
    breaks all cycles in the wait-for graph.
    """

    def __init__(self, resource_info: Dict[str, dict],
                 process_info: Dict[int, dict],
                 wait_for_graph: Dict[int, Set[int]]):
        self.resources = resource_info
        self.processes = process_info
        self.wait_for = wait_for_graph

    def plan_preemption(self, deadlocked_pids: Set[int]) -> List[ResourcePreemptionCandidate]:
        """
        Determine which resources to preempt to break deadlock.

        Strategy:
        1. Identify all resources held by deadlocked processes
        2. Evaluate preemption cost for each
        3. Find minimum cost set that breaks all cycles
        """
        # Step 1: Gather candidate resources
        candidates = []
        for pid in deadlocked_pids:
            process = self.processes[pid]
            for resource_id in process.get('held_resources', []):
                candidate = self._evaluate_candidate(resource_id, pid)
                if candidate.can_preempt:
                    candidates.append(candidate)

        # Sort by cost (lowest first)
        candidates.sort(key=lambda c: c.preemption_cost)

        # Step 2: Greedily select resources until deadlock broken
        selected = []
        remaining_deadlock = set(deadlocked_pids)

        for candidate in candidates:
            if not remaining_deadlock:
                break  # Deadlock already broken

            # Would preempting this resource help?
            if self._would_break_cycle(candidate, remaining_deadlock):
                selected.append(candidate)
                remaining_deadlock = self._simulate_preemption(
                    candidate, remaining_deadlock
                )

        if remaining_deadlock:
            # Couldn't break deadlock with available preemptions.
            # Must fall back to process termination.
            print(f"Warning: Cannot fully break deadlock via preemption. "
                  f"Remaining: {remaining_deadlock}")

        return selected

    def _evaluate_candidate(self, resource_id: str,
                            holder_pid: int) -> ResourcePreemptionCandidate:
        """Evaluate the cost of preempting a specific resource."""
        resource = self.resources.get(resource_id, {})
        process = self.processes.get(holder_pid, {})

        cost_breakdown = {}

        # Factor 1: Work lost (undo log size)
        undo_log_size = process.get('undo_log_size', 0)
        cost_breakdown['work_lost'] = undo_log_size * 0.1

        # Factor 2: Rollback complexity
        num_operations = process.get('operations_count', 0)
        cost_breakdown['rollback'] = num_operations * 0.5

        # Factor 3: Chain length (resources depending on this)
        chain_length = len(resource.get('dependent_resources', []))
        cost_breakdown['chain'] = chain_length * 2.0

        # Factor 4: Process priority (higher priority = higher cost)
        priority = process.get('priority', 5)
        cost_breakdown['priority'] = priority * 1.0

        # Check if actually preemptable
        can_preempt = resource.get('preemptable', False)

        # Non-preemptable if the holder also holds non-preemptable resources
        non_preempt_held = [
            r for r in process.get('held_resources', [])
            if not self.resources.get(r, {}).get('preemptable', False)
        ]
        if non_preempt_held:
            cost_breakdown['non_preempt'] = float('inf')
            can_preempt = False
            reason = f"Holding non-preemptable: {non_preempt_held}"
        else:
            cost_breakdown['non_preempt'] = 0
            reason = ""

        total_cost = sum(cost_breakdown.values())

        return ResourcePreemptionCandidate(
            resource_id=resource_id,
            holder_pid=holder_pid,
            resource_type=resource.get('type', 'unknown'),
            preemption_cost=total_cost,
            cost_breakdown=cost_breakdown,
            can_preempt=can_preempt,
            reason_if_not=reason
        )

    def _would_break_cycle(self, candidate: ResourcePreemptionCandidate,
                           deadlocked: Set[int]) -> bool:
        """Check if preempting this resource would help break a cycle."""
        # Simplified: holder must be deadlocked and the resource must be
        # blocking some other deadlocked process
        if candidate.holder_pid not in deadlocked:
            return False

        # Check if any deadlocked process is waiting for this resource
        resource_id = candidate.resource_id
        for pid in deadlocked:
            waiting_for = self.processes.get(pid, {}).get('waiting_for', [])
            if resource_id in waiting_for:
                return True
        return False

    def _simulate_preemption(self, candidate: ResourcePreemptionCandidate,
                             deadlocked: Set[int]) -> Set[int]:
        """Simulate preempting and see who's still deadlocked."""
        # A full implementation would re-run cycle detection after the
        # simulated release; here we just unblock processes that were
        # waiting only for this resource.
        result = deadlocked.copy()

        waiting_processes = [
            pid for pid in deadlocked
            if candidate.resource_id in self.processes.get(pid, {}).get('waiting_for', [])
        ]

        for pid in waiting_processes:
            waiting_for = self.processes.get(pid, {}).get('waiting_for', [])
            if len(waiting_for) == 1:
                result.discard(pid)

        return result
```

Finding the optimal set of resources to preempt is related to the minimum cut problem in graph theory: we want to remove the minimum-cost set of edges (resource holdings) that disconnects all cycles. In practice, greedy algorithms work well—select the lowest-cost preemption that breaks at least one cycle, repeat until done.
Both preemption and termination can break deadlock. When should you use each?
| Criterion | Preemption (Rollback) | Termination |
|---|---|---|
| Work lost | Work since checkpoint | All work |
| Recovery time | Rollback duration + retry | Full restart time |
| Complexity | Requires logging, undo mechanism | Simple signal |
| Applicable resources | Only preemptable resources | Any resource |
| Data consistency | Preserved via atomicity | May require manual cleanup |
| User visibility | Request may retry automatically | Error returned, must handle |
| Best for | Databases, transactions | General processes, daemons |
Decision Guidelines:
| Scenario | Recommended Recovery |
|---|---|
| Database transactions | Preemption via rollback |
| Batch jobs (long-running) | Termination if no checkpoint system |
| Interactive processes | Preemption if possible, termination as fallback |
| System services | Termination + automatic restart |
| Resources with external effects | Termination (can't rollback effects) |
| Short transactions | Preemption (little work to redo) |
| Very long transactions | Consider partial preemption (to savepoint) |
Real systems often use hybrid approaches. For example, try preemption first (lower cost), fall back to termination if preemption isn't possible. Some systems implement 'coordinated termination' where the victim process is signaled to checkpoint its state before being killed, enabling faster restart than from scratch.
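A hybrid policy like this can be sketched as a small dispatch loop. The callback names below are illustrative (your system supplies the real rollback and kill mechanisms): try the cheaper preemption first, and terminate only when a victim cannot be preempted.

```python
def recover(victims, try_preempt, terminate):
    """Hybrid recovery sketch: attempt preemption (rollback) for each
    victim first; fall back to termination only when preemption fails,
    e.g. because the victim holds non-preemptable resources.
    `try_preempt(pid) -> bool` and `terminate(pid)` are caller-supplied."""
    outcome = {}
    for pid in victims:
        if try_preempt(pid):
            outcome[pid] = "preempted"   # cheap path: work since start redone
        else:
            terminate(pid)
            outcome[pid] = "terminated"  # expensive path: all work lost
    return outcome

# Example: process 1 is a rollback-capable transaction; process 2 holds
# a non-preemptable resource (say, a printer mid-job).
preemptable = {1}
killed = []
result = recover(
    victims=[1, 2],
    try_preempt=lambda pid: pid in preemptable,
    terminate=killed.append,
)
assert result == {1: "preempted", 2: "terminated"}
assert killed == [2]
```

In a real system the loop would also re-run deadlock detection after each action, stopping as soon as no cycle remains rather than processing every victim unconditionally.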
We've now covered resource preemption as a deadlock recovery mechanism: when it works, how to implement it, and when to prefer it over termination.
Module Complete:
You've now completed the entire Detection and Recovery module. You understand which resources can be preempted and why, how operating systems preempt CPU and memory, how locks are preempted via transaction rollback, how to implement preemption safely, and when preemption is preferable to termination.
This knowledge enables you to design, implement, and operate systems that handle deadlock gracefully—detecting it when it occurs and recovering with minimal disruption.
Congratulations! You've mastered deadlock detection and recovery. You can now implement detection algorithms, choose appropriate recovery strategies, terminate processes safely, and preempt resources when possible. This comprehensive understanding of deadlock handling is essential for building robust concurrent systems.