Mutual exclusion alone cannot cause deadlock. A process that holds an exclusive resource and needs nothing else will eventually complete and release that resource. Deadlock requires something more: processes must hold resources while simultaneously waiting for additional resources. This second necessary condition—hold and wait—describes the resource accumulation pattern that enables deadlock.
Without hold and wait, no process would ever block while holding resources. Without blocked holders, no chain of waiting could form. Without waiting chains, no cycle could emerge. Hold and wait is the condition that transforms isolated resource holdings into interconnected webs of dependency that can entangle processes in permanent stalemate.
By the end of this page, you will deeply understand the hold-and-wait condition: what it means formally, why it arises naturally in concurrent programming, how it interacts with mutual exclusion to enable deadlock, and critically, how it can be prevented through strategic resource acquisition protocols.
Formal Definition:
A process holding at least one resource is waiting to acquire additional resources that are currently held by other processes.
This definition has several critical components:
1. Current Holdings — The process already possesses at least one resource. It is not starting from a resource-free state.
2. Additional Requests — The process needs more resources beyond what it currently holds. Its computation cannot proceed with only its current allocations.
3. Blocking Wait — The additional resources are currently held by other processes, so the requesting process must wait.
4. Retention During Wait — Crucially, the process does not release its currently held resources while waiting. It holds and waits simultaneously.
Coffman et al. stated this condition as: "A process must be holding resources while waiting to acquire additional resources." The conjunction of holding AND waiting is essential—a process that releases its holdings before waiting, or one that holds without needing more, does not satisfy this condition.
Mathematical Formalization:
Let H(Pᵢ) denote the set of resources held by process Pᵢ, and W(Pᵢ) the set of resources Pᵢ is waiting to acquire.
The hold-and-wait condition is satisfied when:
∃Pᵢ: H(Pᵢ) ≠ ∅ ∧ W(Pᵢ) ≠ ∅
That is, there exists a process that has non-empty holdings AND non-empty wait set. The process simultaneously occupies resources and desires more.
State Characterization:
A process Pᵢ can be in one of four states with respect to hold-and-wait:
| Holds Resources | Wants More | Contributes to Hold-and-Wait? |
|---|---|---|
| No | No | No — idle or resource-free |
| No | Yes | No — pure waiter (will block) |
| Yes | No | No — can proceed or will complete |
| Yes | Yes | Yes — hold and wait! |
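The formalization and the table above can be captured in a few lines of Python (a sketch; the resource names are illustrative):

```python
def in_hold_and_wait(holds, wants):
    """Hold-and-wait: non-empty holdings AND a non-empty wait set."""
    return bool(holds) and bool(wants)

# The four states from the table above
print(in_hold_and_wait(set(), set()))     # idle or resource-free
print(in_hold_and_wait(set(), {"R1"}))    # pure waiter (will block)
print(in_hold_and_wait({"R1"}, set()))    # can proceed or will complete
print(in_hold_and_wait({"R1"}, {"R2"}))   # hold and wait!
```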
Hold and wait is not an aberration or poor programming practice—it arises naturally and often necessarily from the requirements of concurrent computation. Understanding why it occurs reveals why eliminating it requires significant effort.
The Natural Pattern:
Consider a realistic scenario: a process transferring funds between bank accounts.
1. Acquire lock on source account → Process now holds 1 resource
2. Verify sufficient balance → Process still holds 1 resource
3. Acquire lock on destination account → Process wants 2nd resource
If held by another: HOLD AND WAIT
This pattern is not a programming mistake—it's how transactions must work. The source account lock cannot be released before the destination is verified, or the transfer could partially complete. The process must hold the source lock while attempting to acquire the destination lock.
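A minimal Python sketch of this pattern, assuming per-account `threading.Lock` objects (the `Account` class and balances are illustrative):

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def transfer(source, dest, amount):
    with source.lock:                  # Step 1: hold the source lock
        if source.balance < amount:    # Step 2: verify sufficient balance
            return False
        # Step 3: request the destination lock while STILL holding
        # the source lock -- this is the hold-and-wait pattern
        with dest.lock:
            source.balance -= amount
            dest.balance += amount
            return True

a, b = Account(100), Account(50)
transfer(a, b, 30)
print(a.balance, b.balance)  # 70 80
```

If two threads run opposite transfers (`transfer(a, b, ...)` and `transfer(b, a, ...)`) concurrently, each can end up holding its source lock while waiting for the other's, which is exactly the deadlock setup discussed below.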
Hold and wait is often semantically necessary. Banking, inventory management, and database transactions all require holding multiple resources atomically. The goal is not to eliminate hold and wait entirely (often impossible), but to manage it carefully or prevent deadlock through other means.
Hold and wait doesn't just affect the process in that state—it cascades through the system, potentially blocking many other processes and amplifying the deadlock risk.
Cascade Scenario:
Consider five processes (P1 through P5) and three resources (R1 through R3):
Time T0:
P1: Holds R1, requests R2 → R2 held by P2
P2: Holds R2, runs (not waiting)
P3: Runs (no resources)
P4: Runs (no resources)
Time T1: P2 completes and releases R2
P1: Acquires R2 (now holds R1, R2)
P2: Exits
P3: Requests R2 → WAIT (R2 held by P1)
P4: Requests R1 → WAIT (R1 held by P1)
Time T2: P1 requests R3
P1: Holds R1, R2, requests R3 → WAIT (R3 held by P5)
P3: Waiting for R2 (held by P1)
P4: Waiting for R1 (held by P1)
Result: P1 in hold-and-wait has blocked P3 and P4
P1's hold-and-wait state has created a chain: P1 → P5, and both P3 and P4 → P1. If P5 happens to wait for a resource held by P3 or P4, a cycle forms instantly.
Every process in a hold-and-wait state becomes a potential link in a deadlock chain. The more processes holding resources while waiting, the more potential edges exist in the resource allocation graph, and the greater the probability that a cycle will eventually form.
Resource Accumulation:
Processes in hold-and-wait tend to accumulate resources over time:
Each acquisition potentially creates wait states for other processes. The longer a process holds resources, the more it blocks other processes, and the greater the cascade effect. This is why long-running transactions are particularly dangerous for deadlock.
Resource allocation graphs provide a powerful visualization of hold-and-wait relationships. Understanding how to read these graphs is essential for deadlock analysis.
Resource Allocation Graph Notation:
- Processes and resources are drawn as distinct node types; each dot inside a resource node represents one instance of that resource.
- A request edge points from a process to a resource: Pᵢ → Rⱼ means Pᵢ is waiting for an instance of Rⱼ.
- An assignment edge points from a resource to a process: Rⱼ → Pᵢ means an instance of Rⱼ is allocated to Pᵢ.
Identifying Hold and Wait:
A process is in hold-and-wait if and only if its node has both an incoming assignment edge (it holds at least one resource) and an outgoing request edge (it is waiting for at least one more).
```
Resource Allocation Graph Example
═════════════════════════════════

                ┌─────┐
      request   │ R1  │  assignment
   ┌──────────► │  •  │ ───────────┐
   │            └─────┘            ▼
┌──┴──┐                         ┌─────┐
│ P1  │                         │ P2  │
└──▲──┘                         └──┬──┘
   │            ┌─────┐            │
   │ assignment │ R2  │   request  │
   └─────────── │  •  │ ◄──────────┘
                └─────┘

ANALYSIS:
─────────
• P1: Holds R2 (incoming assignment), wants R1 (outgoing request)
  → P1 is in HOLD-AND-WAIT state
• P2: Holds R1 (incoming assignment), wants R2 (outgoing request)
  → P2 is in HOLD-AND-WAIT state
• Both processes are in hold-and-wait
• There is a cycle: P1 → R1 → P2 → R2 → P1
• With single-instance resources, this cycle = DEADLOCK
```
Graph Properties and Hold-and-Wait:
In a resource allocation graph, any cycle must pass through process nodes that each have both an incoming assignment edge and an outgoing request edge — that is, through processes in hold-and-wait.
The presence of hold-and-wait states is therefore a prerequisite for deadlock cycles. Without hold-and-wait, either a waiting process holds nothing (its node has only an outgoing request edge) or a holding process waits for nothing (its node has only incoming assignment edges); in both cases the node cannot lie on a cycle.
Hold-and-wait creates the possibility for request chains to pass through holding processes, enabling cycles.
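To make the cycle condition concrete, here is a small sketch that works on a wait-for graph (an edge P → Q means P waits for a resource Q holds) and checks it for a cycle with depth-first search; the example graph encodes the P1/P2 situation above:

```python
def has_cycle(wait_for):
    """Detect a cycle in a wait-for graph via depth-first search."""
    visiting, done = set(), set()

    def dfs(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: cycle found
        visiting.add(node)
        for nxt in wait_for.get(node, ()):
            if dfs(nxt):
                return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(dfs(p) for p in wait_for)

# P1 holds R2 and waits for R1 (held by P2);
# P2 holds R1 and waits for R2 -> cycle -> deadlock
wait_for = {"P1": ["P2"], "P2": ["P1"]}
print(has_cycle(wait_for))  # True
```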
Hold and wait manifests throughout software systems. Understanding common patterns helps recognize and prevent deadlock risks.
Scenario 1: Database Transactions
-- Transaction T1
BEGIN TRANSACTION;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE; -- Holds row lock on account 1
-- Processing...
SELECT * FROM accounts WHERE id = 2 FOR UPDATE; -- Requests lock on account 2
-- If T2 holds it: HOLD AND WAIT
-- Transaction T2
BEGIN TRANSACTION;
SELECT * FROM accounts WHERE id = 2 FOR UPDATE; -- Holds row lock on account 2
-- Processing...
SELECT * FROM accounts WHERE id = 1 FOR UPDATE; -- Requests lock on account 1
-- If T1 holds it: HOLD AND WAIT
Both transactions are in hold-and-wait. If they proceed concurrently with this ordering, deadlock results.
Scenario 2: Multithreaded Application Locks

```python
import threading
import time

# File locks simulated with Python locks
lock_config = threading.Lock()
lock_data = threading.Lock()

def updater_a():
    """Updates data based on configuration"""
    # Step 1: Acquire config lock to read configuration
    with lock_config:  # HOLDS lock_config
        print("Updater A: Acquired config lock")
        config = read_config()
        time.sleep(0.1)  # Simulate processing
        # Step 2: Need data lock to update data
        # HOLD AND WAIT: Holds lock_config, waits for lock_data
        with lock_data:
            print("Updater A: Acquired data lock")
            update_data(config)

def updater_b():
    """Updates configuration based on data"""
    # Step 1: Acquire data lock to read data
    with lock_data:  # HOLDS lock_data
        print("Updater B: Acquired data lock")
        data = read_data()
        time.sleep(0.1)  # Simulate processing
        # Step 2: Need config lock to update config
        # HOLD AND WAIT: Holds lock_data, waits for lock_config
        with lock_config:
            print("Updater B: Acquired config lock")
            update_config(data)

# If run concurrently, these threads may deadlock:
# Thread A: holds config, waits for data
# Thread B: holds data, waits for config
# DEADLOCK!

thread_a = threading.Thread(target=updater_a)
thread_b = threading.Thread(target=updater_b)

thread_a.start()
thread_b.start()

# These joins may never complete
thread_a.join()
thread_b.join()
```

Scenario 3: Operating System Kernel
The kernel frequently encounters hold-and-wait:
Process: Page fault handler
1. Holds: Address space lock (to prevent concurrent mmap)
2. Needs: Free page from physical memory allocator
3. Allocator locked by memory reclaimer
Reclaimer:
1. Holds: Allocator lock (modifying free page list)
2. Needs: Address space lock (to unmap pages)
→ Potential deadlock if not carefully ordered
Kernel developers must meticulously analyze resource acquisition patterns to prevent such scenarios. Lock ordering rules ("always acquire the address space lock before the allocator lock") are essential in kernel code.
The pattern of acquiring locks while holding other locks—nested or hierarchical lock acquisition—is the quintessential hold-and-wait pattern. Every nested lock acquisition is a potential deadlock setup. Careful lock ordering and design is essential.
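One standard fix for nested acquisition, sketched in Python: impose a global order on locks (here, by `id()` — any fixed total order works) so every thread acquires them in the same sequence. With a consistent order, no cycle of waits can form:

```python
import threading

lock_config = threading.Lock()
lock_data = threading.Lock()

def acquire_in_order(*locks):
    """Acquire locks in a globally consistent order (here: by id)."""
    ordered = sorted(locks, key=id)
    for lock in ordered:
        lock.acquire()
    return ordered

def release_all(held):
    """Release in reverse acquisition order."""
    for lock in reversed(held):
        lock.release()

def updater(name):
    # Both threads acquire the pair in the SAME order,
    # so neither can hold one lock while waiting for the other
    held = acquire_in_order(lock_config, lock_data)
    try:
        print(f"{name}: holds both locks, no deadlock possible")
    finally:
        release_all(held)

threads = [threading.Thread(target=updater, args=(n,)) for n in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that strict lock ordering does not eliminate hold-and-wait itself — a thread still holds the first lock while waiting for the second — but it breaks the circular-wait condition instead.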
Unlike mutual exclusion, hold and wait can often be prevented through careful protocol design. Eliminating this condition is one of the primary strategies for deadlock prevention.
All-at-Once Allocation:

```c
#include <stdio.h>
#include <pthread.h>
#include <stdbool.h>

// Global resource manager
pthread_mutex_t resource_manager_lock = PTHREAD_MUTEX_INITIALIZER;

// Resource availability array
bool resources[MAX_RESOURCES];
int num_resources;

// Request all resources at once
bool request_all_resources(int process_id, int *needed, int count) {
    pthread_mutex_lock(&resource_manager_lock);

    // Check if ALL needed resources are available
    for (int i = 0; i < count; i++) {
        if (!resources[needed[i]]) {
            // At least one resource unavailable - request denied
            pthread_mutex_unlock(&resource_manager_lock);
            return false;  // Process must wait and retry
        }
    }

    // All resources available - allocate them all atomically
    for (int i = 0; i < count; i++) {
        resources[needed[i]] = false;  // Mark as allocated
        printf("Process %d: Acquired resource %d\n", process_id, needed[i]);
    }

    pthread_mutex_unlock(&resource_manager_lock);
    return true;  // Process received all resources, can proceed
}

// Release all resources at once
void release_all_resources(int process_id, int *held, int count) {
    pthread_mutex_lock(&resource_manager_lock);

    for (int i = 0; i < count; i++) {
        resources[held[i]] = true;  // Mark as available
        printf("Process %d: Released resource %d\n", process_id, held[i]);
    }

    pthread_mutex_unlock(&resource_manager_lock);

    // Signal waiting processes to retry their requests
    broadcast_resource_available();
}

// Usage example - no hold-and-wait possible
void process_work(int process_id) {
    int needed[] = {RESOURCE_A, RESOURCE_B, RESOURCE_C};
    int count = 3;

    // Keep trying until all resources obtained
    while (!request_all_resources(process_id, needed, count)) {
        // Wait and retry - no resources held during wait
        wait_for_resources();
    }

    // Now holds all needed resources - do work
    do_critical_work();

    // Release all at once
    release_all_resources(process_id, needed, count);
}
```

Trade-offs of Preventing Hold and Wait:
| Strategy | Advantages | Disadvantages |
|---|---|---|
| All-at-Once | Simple; guarantees no hold-and-wait | May not know resources needed upfront; reduces resource utilization; possible starvation |
| Release Before Request | Flexible; can adapt to runtime needs | Complex rollback; wasted work; possible livelock |
| Two-Phase Protocol | Clear structure; proven safe | Requires planning phases; not always applicable |
| Timeout with Retry | Breaks deadlock; self-healing | Wasted computation; complex retry logic; possible livelock |
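The timeout-with-retry row can be sketched with Python's `Lock.acquire(timeout=...)`: if the second lock cannot be obtained in time, the thread releases everything it holds and retries from scratch, so it never waits while holding:

```python
import threading
import random
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def acquire_both_with_timeout(first, second, timeout=0.05):
    """Take both locks; on timeout, release everything and retry."""
    while True:
        first.acquire()
        if second.acquire(timeout=timeout):
            return  # now holds both locks
        # Could not get the second lock: release the first so we are
        # never waiting while holding (breaks hold-and-wait)
        first.release()
        time.sleep(random.uniform(0, 0.01))  # backoff before retrying

acquire_both_with_timeout(lock_a, lock_b)
print("acquired both")
lock_b.release()
lock_a.release()
```

The randomized backoff matters: without it, two contending threads can repeatedly time out, release, and collide again — the livelock risk noted in the table.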
While preventing hold and wait is theoretically attractive, practical implementation faces significant challenges that explain why it's not universally applied.
The All-at-Once Dilemma:
Consider a database query processor:
SELECT * FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE c.region = 'WEST'
The optimizer determines the execution plan at runtime — which indexes to scan, which join algorithm to use, which pages to touch — so the set of locks and buffers required cannot be enumerated before execution begins.
The actual resources needed depend on data content, statistics, and runtime conditions. Declaring all resources upfront would require massive over-allocation or special query planning.
In practice, systems often accept hold-and-wait but prevent deadlock through other means: lock ordering (breaking circular wait), timeout and retry with intelligent backoff, or deadlock detection and recovery. Hold-and-wait prevention is valuable for specific subsystems with predictable resource needs.
Hold and wait becomes even more challenging in distributed systems, where resources are spread across multiple nodes and network delays complicate coordination.
Distributed Lock Acquisition:
Node A (New York): Node B (London): Node C (Tokyo):
───────────────── ───────────────── ───────────────
Lock L1 Lock L2 Lock L3
Transaction T1:
1. Acquire L1 (local) → Holds L1
2. Request L2 on Node B → Network RTT 70ms
→ Wait if L2 held
3. Request L3 on Node C → Network RTT 100ms
→ Wait if L3 held
During steps 2-3: T1 is in HOLD-AND-WAIT state
Lock L1 is held by T1 for entire duration of network calls
Network latency extends hold-and-wait duration dramatically. A local lock acquisition might take microseconds; a distributed acquisition might take hundreds of milliseconds. Resources are held for much longer, increasing deadlock risk.
Two-Phase Commit and Hold-and-Wait:
The two-phase commit protocol inherently creates extended hold-and-wait:
Phase 1 (Prepare): The coordinator asks every participant to prepare. Each participant acquires and holds all locks the transaction needs, durably records its changes, and votes yes or no — then waits for the coordinator's decision.
Phase 2 (Commit/Abort): The coordinator broadcasts the decision. Only after applying the commit or abort does each participant release its locks.
Between phases, all participants are in hold-and-wait. If the coordinator fails, participants may hold locks indefinitely (the blocking problem of 2PC). This is a fundamental limitation related to hold-and-wait in distributed transactions.
In distributed systems, detecting hold-and-wait cycles is complicated by the lack of global state. No single node knows what resources processes on other nodes hold and wait for. Distributed deadlock detection requires collecting information from multiple nodes, which is expensive and can produce false positives due to message delays.
Modern Approaches:
Distributed databases like Google Spanner and CockroachDB use various techniques:
Wound-Wait and Wait-Die: Older transactions wound (abort) younger ones, or younger transactions die (abort themselves). This prevents cycles by ordering transactions.
Heartbeats and Timeouts: If a distributed lock isn't confirmed within timeout, abort and retry. Breaks hold-and-wait at the cost of work loss.
Optimistic Concurrency Control: Don't acquire locks during read phase. Validate at commit time. Reduces hold-and-wait duration significantly.
Centralized Lock Managers: Use a global lock manager to detect and prevent circular waits directly.
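The wait-die rule can be sketched as a pure decision function over transaction timestamps (older transaction = smaller timestamp; the function name and values are illustrative):

```python
def wait_die(requester_ts, holder_ts):
    """Wait-Die: an older requester may wait; a younger one aborts.

    Timestamps assigned at transaction start impose a total order,
    so waiting chains can never form a cycle.
    """
    if requester_ts < holder_ts:
        return "wait"  # older transaction waits for the younger holder
    return "die"       # younger transaction aborts and restarts later

print(wait_die(5, 9))  # older requester -> "wait"
print(wait_die(9, 5))  # younger requester -> "die"
```

Wound-wait inverts the second case: instead of the younger requester dying, the older requester "wounds" (aborts) the younger holder. Either way, waits only ever point in one direction of the timestamp order, so no cycle is possible.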
We have explored hold and wait—the second necessary condition for deadlock—in comprehensive depth. Let's consolidate the key insights:
- Hold and wait means a process retains resources while blocking on requests for more: ∃Pᵢ: H(Pᵢ) ≠ ∅ ∧ W(Pᵢ) ≠ ∅.
- The condition arises naturally from multi-resource operations such as transfers and transactions; it is often semantically necessary, not a programming mistake.
- Hold-and-wait states cascade: each blocked holder can block further processes, and every such process is a potential link in a deadlock chain.
- In a resource allocation graph, hold-and-wait processes are exactly the nodes with both assignment and request edges — the only nodes a cycle can pass through.
- Prevention strategies (all-at-once allocation, release-before-request, two-phase protocols, timeout with retry) trade resource utilization, complexity, or starvation risk for safety.
- Distributed systems lengthen hold-and-wait durations and complicate detection, motivating techniques such as wound-wait/wait-die and optimistic concurrency control.
What's Next:
Mutual exclusion and hold-and-wait together still cannot guarantee deadlock—if resources could be forcibly taken from processes, the waiting chains could be broken. The next page examines the no preemption condition: why resources are not forcibly reclaimed, how this interacts with the previous conditions, and when preemption is feasible as a deadlock prevention strategy.
You now possess a thorough understanding of hold and wait as a deadlock condition: its definition, natural occurrence, cascade effects, visualization, prevention strategies, practical challenges, and distributed system implications. Next, we examine the no-preemption condition to complete our analysis of how resource retention enables deadlock.