Imagine a narrow bridge that can only support one vehicle at a time. Without traffic control, cars approaching from both directions would collide mid-span, causing catastrophic failure. The solution is obvious: install a traffic light that grants exclusive passage to one direction at a time.
Mutexes (Mutual Exclusion locks) are the traffic lights of concurrent programming. They ensure that when multiple threads need to access a shared resource—a bank account balance, a file, a database connection—only one thread can do so at any given moment. This seemingly simple concept is the foundation upon which all thread-safe systems are built.
But like many foundational concepts, the devil lies in the details. This page explores mutexes and their lock variants with the depth required to use them correctly in production systems.
By completing this page, you will understand the theoretical foundations of mutual exclusion, the anatomy of mutex implementations, different lock types (reentrant, read-write, spin), their performance characteristics, correct usage patterns, and common pitfalls that lead to bugs in production systems.
Before diving into solutions, we must viscerally understand the problem. The core issue is race conditions—scenarios where the behavior of a program depends on the relative timing of events, particularly the order in which threads execute.
Consider a simple counter increment operation:
counter = counter + 1
This single line of code appears atomic, but at the CPU level, it involves three distinct operations:
1. READ the current value of counter from memory into a CPU register
2. ADD 1 to the value in the register
3. WRITE the result from the register back to memory

When two threads execute this operation simultaneously on an initial counter value of 0, here's what can go wrong:
| Step | Thread A | Thread B | Register A | Register B | Memory |
|---|---|---|---|---|---|
| Initial | — | — | ? | ? | 0 |
| 1 | READ counter | — | 0 | ? | 0 |
| 2 | — | READ counter | 0 | 0 | 0 |
| 3 | ADD 1 | — | 1 | 0 | 0 |
| 4 | — | ADD 1 | 1 | 1 | 0 |
| 5 | WRITE counter | — | 1 | 1 | 1 |
| 6 | — | WRITE counter | 1 | 1 | 1 |
Both threads incremented the counter, but the final value is 1 instead of 2. Thread B's update was 'lost' because it read the value before Thread A's write completed. In a banking system, this means money literally disappearing. In inventory systems, it means overselling products. In metrics systems, it means incorrect analytics driving wrong business decisions.
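You can reproduce the lost update directly. The following is a minimal, self-contained sketch (class name and iteration count are illustrative); on multi-core hardware it almost always prints a total well below 200,000:

```java
// Minimal demonstration of the lost-update race (unprotected counter)
public class LostUpdateDemo {
    static int counter = 0; // shared, no synchronization

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter = counter + 1; // read-modify-write, not atomic
            }
        };
        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        // Almost always less than 200000: updates were lost
        System.out.println("Expected 200000, got " + counter);
    }
}
```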
This problem is particularly insidious because:
- It is timing-dependent: the same code can pass thousands of test runs and fail only under production load.
- Observation changes the outcome: adding logging or attaching a debugger shifts thread timing, often making the bug vanish (a 'Heisenbug').
- Failures leave little evidence: a lost update looks like any other write, so the corruption is usually discovered long after its cause.
Mutexes solve this by ensuring atomicity at a higher level than individual CPU instructions. They create protected regions called critical sections where only one thread can execute at a time.
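As a point of comparison, here is the counter made safe by guarding the read-modify-write with a mutex. This is a minimal sketch using Java's ReentrantLock; the class and method names are illustrative:

```java
import java.util.concurrent.locks.ReentrantLock;

// The same counter, now protected by a mutex
public class SafeCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private int counter = 0;

    public void increment() {
        lock.lock();               // acquire: at most one thread proceeds
        try {
            counter = counter + 1; // critical section: now effectively atomic
        } finally {
            lock.unlock();         // release: always, even on exceptions
        }
    }

    public int get() {
        lock.lock();
        try {
            return counter;
        } finally {
            lock.unlock();
        }
    }
}
```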
A mutex is conceptually simple: it has two states (locked/unlocked) and two operations (acquire/release). But understanding how these operations work at a lower level is essential for using mutexes effectively.
The Mutex Contract:
- Mutual exclusion: at most one thread holds the lock at any moment; all others block until it is released.
- Ownership: only the thread that acquired the lock may release it.
- Memory visibility: all writes made while holding the lock become visible to the next thread that acquires it.
The last guarantee—memory visibility—is often overlooked but critically important. Modern CPUs reorder instructions and cache values aggressively. A mutex release includes a memory barrier (or fence) that ensures all writes by the releasing thread are visible to subsequent lock acquirers.
Pseudocode for Mutex Operations:
```java
// Conceptual mutex structure
class Mutex {
    private boolean locked = false;
    private Thread owner = null;
    private Queue<Thread> waitQueue = new Queue<>();

    // Acquire the lock (blocking)
    void lock() {
        while (true) {
            // Atomic check-and-set operation (hardware supported)
            if (atomicCompareAndSwap(locked, false, true)) {
                // Successfully acquired
                owner = currentThread();
                memoryBarrier(); // Ensure visibility
                return;
            }
            // Failed to acquire - add to wait queue and sleep
            // (conceptual: real implementations guard against the race
            // between enqueueing here and the wakeup in unlock())
            waitQueue.enqueue(currentThread());
            parkThread(currentThread()); // Yield CPU
        }
    }

    // Release the lock
    void unlock() {
        if (owner != currentThread()) {
            throw new IllegalMonitorStateException("Not lock owner");
        }
        memoryBarrier(); // Flush all writes
        owner = null;
        locked = false;

        // Wake up a waiting thread
        if (!waitQueue.isEmpty()) {
            Thread next = waitQueue.dequeue();
            unparkThread(next);
        }
    }
}
```

Critical Implementation Details:
1. Atomic Compare-And-Swap (CAS):
The atomicCompareAndSwap operation is the magic that makes everything work. It atomically checks whether a variable holds an expected value and, only if so, updates it to a new value. On x86 this is a single instruction (CMPXCHG); on ARM it is a small load-linked/store-conditional pair (LDREX/STREX). Either way, the check-and-update cannot be interleaved with another thread's access.
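Java exposes CAS through java.util.concurrent.atomic. As an illustration, here is a toy spin lock built directly on compareAndSet. This is a sketch, not a production lock (real spin locks add backoff and fairness):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Toy spin lock built directly on the CAS primitive
class CasSpinLock {
    private final AtomicBoolean locked = new AtomicBoolean(false);

    void lock() {
        // compareAndSet(expected, new) is CAS: it succeeds only if
        // the current value equals 'expected', atomically
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait(); // hint that we are busy-waiting (Java 9+)
        }
    }

    void unlock() {
        locked.set(false); // volatile write doubles as the release barrier
    }
}
```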
2. Wait Queue Management: When a thread fails to acquire the lock, it must either spin (busy-wait) or sleep. Sleeping requires OS kernel involvement to manage the wait queue and wake threads. This is expensive but necessary for long waits.
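The parkThread/unparkThread calls in the pseudocode correspond to real primitives; in Java they map onto LockSupport.park and LockSupport.unpark. A minimal sketch (the sleep exists only to make the ordering visible in a demo):

```java
import java.util.concurrent.locks.LockSupport;

public class ParkDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread waiter = new Thread(() -> {
            System.out.println("waiter: parking (descheduled, no CPU burned)");
            // Real code re-checks its condition: park may return spuriously
            LockSupport.park();
            System.out.println("waiter: woken up");
        });
        waiter.start();
        Thread.sleep(100);           // demo only: give the waiter time to park
        LockSupport.unpark(waiter);  // the 'wake one waiter' step of unlock()
        waiter.join();
    }
}
```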
3. Memory Barriers: The barriers ensure the happens-before relationship: everything that happened before a thread released the lock is visible to the thread that subsequently acquires it. Without this, threads might see stale cached values.
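To make the happens-before guarantee concrete, here is a small sketch (the Publisher class is hypothetical): because both methods use the same lock, a reader that observes ready == true is guaranteed to also observe the payload written before the unlock.

```java
// Publishing data safely via a lock's happens-before edge
class Publisher {
    private final Object lock = new Object();
    private int payload;
    private boolean ready = false;

    void publish(int value) {
        synchronized (lock) {
            payload = value; // write 1
            ready = true;    // write 2
        } // unlock: both writes become visible to the next acquirer
    }

    Integer tryConsume() {
        synchronized (lock) { // lock: sees everything before the prior unlock
            return ready ? payload : null;
        }
    }
}
```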
Mutexes ultimately rely on hardware atomic instructions. Without CPU support for atomic operations, implementing correct synchronization would require disabling interrupts—possible only in kernel mode. User-space locks leverage kernel assistance (futexes on Linux, Critical Sections on Windows) to efficiently coordinate without burning CPU cycles.
Not all locks are created equal. Different concurrency scenarios call for different lock behaviors. Understanding the variations helps you select the right tool for each situation.
The Standard Mutex (Non-Reentrant)
The simplest form of mutex allows exactly one thread to hold it. If the same thread attempts to acquire a lock it already holds, it deadlocks against itself.
Characteristics:
- Exactly one owner at a time; all other threads block on acquire.
- Not reentrant: the owning thread deadlocks if it tries to acquire again (see the example below).
- Minimal state and bookkeeping, which keeps it simple and cheap.
```java
// Java provides ReentrantLock, but you can use synchronized
// (which IS reentrant) or a Semaphore(1) for non-reentrant behavior
import java.util.concurrent.Semaphore;

class NonReentrantLockExample {
    // Semaphore with 1 permit acts as non-reentrant mutex
    private final Semaphore mutex = new Semaphore(1);
    private int sharedCounter = 0;

    public void increment() throws InterruptedException {
        mutex.acquire(); // Block until permit available
        try {
            sharedCounter++; // Protected critical section
        } finally {
            mutex.release(); // Always release in finally
        }
    }

    // WARNING: This would deadlock!
    public void nestedOperation() throws InterruptedException {
        mutex.acquire();
        try {
            // This will block forever - we already hold the permit
            // increment(); // DEADLOCK!
        } finally {
            mutex.release();
        }
    }
}
```
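For contrast, synchronized (and ReentrantLock) track the owning thread and a hold count, so the nested call that deadlocks above becomes legal. A minimal sketch (class name illustrative):

```java
class ReentrantExample {
    private int sharedCounter = 0;

    public synchronized void increment() {
        sharedCounter++;
    }

    // Safe: synchronized is reentrant, so re-acquiring our own
    // monitor just bumps the hold count instead of deadlocking
    public synchronized void nestedOperation() {
        increment(); // no deadlock
    }
}
```

Acquiring locks correctly is only half the battle. How you structure code around locks determines whether your concurrent programs are robust or riddled with subtle bugs.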
```java
// BAD: I/O inside critical section
void updateFromServer() {
    lock.lock();
    try {
        // Network call takes 100ms!
        var data = httpClient.fetch(url);
        cache.put(key, data);
    } finally {
        lock.unlock();
    }
}
```
```java
// GOOD: I/O outside critical section
void updateFromServer() {
    // Network call without lock
    var data = httpClient.fetch(url);
    lock.lock();
    try {
        cache.put(key, data); // Only this
    } finally {
        lock.unlock();
    }
}
```

Moving a 100ms network call outside the lock transforms a system from 10 operations/second (fully serialized behind the lock) to potentially 1,000+ operations/second (parallel I/O with minimal lock contention). Critical section minimization is one of the highest-leverage performance optimizations in concurrent systems.
Even experienced developers make locking mistakes. Understanding common pitfalls helps you write more robust concurrent code from the start.
```java
// DEADLOCK SCENARIO
class BankAccount {
    private final Object lock = new Object();
    private int balance;

    // Thread A: transfer(accountX, accountY, 100)
    // Thread B: transfer(accountY, accountX, 50)
    // If they interleave, both wait forever
    void transferDeadlock(BankAccount from, BankAccount to, int amt) {
        synchronized (from.lock) {     // A locks X, B locks Y
            // Context switch!
            synchronized (to.lock) {   // A waits for Y, B waits for X
                from.balance -= amt;   // Neither reaches here
                to.balance += amt;
            }
        }
    }

    // FIX: Always lock in consistent order (by account ID)
    void transferSafe(BankAccount from, BankAccount to, int amt) {
        // hashCode as a stand-in for a unique account ID
        // (production code needs a tie-breaker for equal hashCodes)
        BankAccount first = from.hashCode() < to.hashCode() ? from : to;
        BankAccount second = first == from ? to : from;
        synchronized (first.lock) {
            synchronized (second.lock) {
                from.balance -= amt;
                to.balance += amt;
            }
        }
    }
}
```
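Consistent ordering is the standard fix. Another common defense, sketched here with ReentrantLock (class and field names illustrative), is tryLock with a timeout, which turns a potential deadlock into a retryable failure:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class TimeoutTransfer {
    private final ReentrantLock lock = new ReentrantLock();
    private int balance;

    // Returns false instead of deadlocking if either lock stays busy too long
    static boolean transfer(TimeoutTransfer from, TimeoutTransfer to, int amt)
            throws InterruptedException {
        if (!from.lock.tryLock(50, TimeUnit.MILLISECONDS)) return false;
        try {
            if (!to.lock.tryLock(50, TimeUnit.MILLISECONDS)) return false;
            try {
                from.balance -= amt;
                to.balance += amt;
                return true;
            } finally {
                to.lock.unlock();
            }
        } finally {
            from.lock.unlock();
        }
    }
}
```

Lock performance depends on contention level, critical section duration, and hardware factors. Understanding these trade-offs helps you choose the right lock type and optimize for your workload.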
| Lock Type | Uncontended Cost | Contended Cost | Best For | Avoid When |
|---|---|---|---|---|
| Basic Mutex | ~25ns | ~1-10μs | General purpose | Very short critical sections |
| Spin Lock | ~5ns | Burns CPU | < 1μs hold time | Single core, long waits |
| Reentrant Lock | ~30ns | ~1-10μs | Nested locking | Performance-critical paths |
| Read-Write Lock | ~35ns | ~1-10μs | 80% reads | Write-heavy workloads |
| Biased Lock | ~1ns | ~25ns first time | Thread-local access | High thread contention |
Uncontended lock acquisition is remarkably cheap—often just a single atomic instruction. The real cost comes from contention: when threads must wait. Under high contention, the choice of lock type matters far less than reducing contention itself through better data structures or finer granularity.
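As one concrete illustration of the table's read-write row, here is a sketch using Java's ReentrantReadWriteLock (class name illustrative): readers proceed in parallel, and only writers serialize.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Read-write lock: many concurrent readers, exclusive writers
class CachedValue {
    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private int value;

    int read() {
        rw.readLock().lock();    // shared: other readers may enter too
        try {
            return value;
        } finally {
            rw.readLock().unlock();
        }
    }

    void write(int v) {
        rw.writeLock().lock();   // exclusive: blocks readers and writers
        try {
            value = v;
        } finally {
            rw.writeLock().unlock();
        }
    }
}
```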
Key Performance Insights:
- Uncontended acquisition costs only nanoseconds, so don't avoid locks out of reflex.
- Contention, not locking itself, is the dominant cost; measure before switching lock types.
- Shrinking critical sections and splitting hot locks into finer-grained ones usually beats any exotic lock choice.
We've explored mutexes and locks from first principles to production patterns. Let's consolidate the key takeaways:
- Race conditions arise because read-modify-write sequences are not atomic; lost updates corrupt state silently.
- A mutex guarantees mutual exclusion, ownership, and memory visibility, built on hardware CAS plus OS-assisted waiting.
- Match the lock variant to the workload: non-reentrant for simplicity, reentrant for nested calls, read-write for read-heavy data, spin locks for very short critical sections.
- Keep critical sections minimal, never perform I/O while holding a lock, always release in finally, and acquire multiple locks in a consistent global order.
What's Next:
Mutexes are powerful but limited to binary (locked/unlocked) states. The next page explores Semaphores—a generalization that allows controlling access to a limited number of resources, enabling patterns like connection pools, rate limiting, and bounded parallelism.
You now understand mutexes and locks at a level sufficient for professional concurrent programming. You can select appropriate lock types, implement them correctly with exception safety, and avoid common pitfalls. Next, we'll explore semaphores—locks that count.