In single-machine programming, mutexes and locks are straightforward: a thread acquires a lock, performs its critical section, and releases the lock. The operating system guarantees that only one thread holds the lock at a time. But what happens when the "threads" are processes running on different machines across a network?
Distributed locking extends the mutual exclusion primitive to distributed systems. It's essential when:

- Only one of many workers should perform a task at a time (a scheduled job, a migration, a cache refresh)
- Multiple processes must not make conflicting updates to the same resource
- A cluster must agree on a single holder of some role, such as a leader
However, distributed locks are surprisingly difficult to implement correctly. Network partitions, process pauses, and clock drift create failure modes that don't exist in single-machine locks. Many systems that claim to provide distributed locks actually provide something weaker—often with catastrophic consequences when assumptions are violated.
By the end of this page, you will understand the requirements for distributed locks, why naive implementations fail, how to build correct locks using consensus, the critical role of fencing tokens, and how production systems like ZooKeeper and Redis approach distributed locking with different safety guarantees.
A distributed lock must provide the same fundamental guarantee as a local lock: mutual exclusion. At any point in time, at most one client should believe it holds the lock.
Formal requirements:

- Safety (mutual exclusion): at most one client holds the lock at any instant.
- Liveness (deadlock freedom): it must always be eventually possible to acquire the lock, even if the current holder crashes.
- Fault tolerance: the lock service continues granting and releasing locks as long as a majority of its nodes are running.
Why are distributed locks harder?
In a local mutex, the kernel mediates access and can atomically transfer lock ownership. In distributed systems:

- Messages can be delayed, reordered, or lost, so a lock grant may arrive long after the server sent it.
- Processes can pause for arbitrary periods (garbage collection, VM migration, swapping) without realizing time has passed.
- Clocks drift, so machines disagree about when a lease has expired.
- Any node, including the lock server itself, can crash at any moment.
These issues mean that even if your lock service is correct, the client's belief about holding the lock might be wrong.
The two purposes of distributed locks:
1. Efficiency: Prevent duplicate work (e.g., sending the same email twice). Occasional safety violations are annoying but not catastrophic.
2. Correctness: Prevent conflicting operations that corrupt data. Safety violations are catastrophic and must never happen.
The level of guarantee you need determines which lock implementation is appropriate. For efficiency, a best-effort lock (like Redis with single-node) might suffice. For correctness, you need consensus-based locks with fencing tokens.
Many lock implementations—including popular ones—provide only efficiency guarantees, not correctness guarantees. Using them for safety-critical coordination can and does lead to data corruption. Always understand exactly what guarantees your lock provides before relying on it.
There are several approaches to implementing distributed locks, each with different trade-offs between simplicity, availability, and safety.
Approach 1: Single-Node Lock Server
The simplest approach: one server manages all locks.
Client → Lock Server: ACQUIRE(resource_id)
Lock Server: If not held, mark as held, return SUCCESS
Client → Lock Server: RELEASE(resource_id)
Advantages: Simple, low latency, strong consistency. Disadvantages: Single point of failure. Server crash means all locks are lost or stuck.
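To make the approach concrete, here is a minimal in-memory sketch of the server-side state; the class and method names are illustrative, and a real deployment would put this behind an RPC interface:

```python
import threading
from typing import Dict


class SingleNodeLockServer:
    """In-memory lock table. All state is lost if this process dies."""

    def __init__(self):
        self._mutex = threading.Lock()      # protects the lock table itself
        self._holders: Dict[str, str] = {}  # resource_id -> client_id

    def acquire(self, resource_id: str, client_id: str) -> bool:
        """Grant the lock iff no one holds it. Atomic under _mutex."""
        with self._mutex:
            if resource_id not in self._holders:
                self._holders[resource_id] = client_id
                return True
            return False

    def release(self, resource_id: str, client_id: str) -> bool:
        """Release only if the caller is the current holder."""
        with self._mutex:
            if self._holders.get(resource_id) == client_id:
                del self._holders[resource_id]
                return True
            return False
```

Note that checking the holder's identity on release prevents one client from releasing another's lock, a detail that recurs in the Redis examples later.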
Approach 2: Replicated Lock Server (Consensus)
Replicate the lock state using a consensus protocol (Raft, Paxos).
Advantages: Fault-tolerant, strong consistency. Disadvantages: Higher latency (quorum writes), more complex.
Approach 3: Probabilistic Locks (Redlock)
Acquire locks on multiple independent servers, consider successful if majority respond.
Advantages: Works with commodity Redis, no single point of failure. Disadvantages: Complex failure modes, timing assumptions, debated safety.
Approach 4: Lease-Based Locks
Locks have a time limit (lease). Lock holder must renew before expiry.
Advantages: Handles holder crashes (lock eventually expires). Disadvantages: Vulnerable to process pauses and clock drift.
| Approach | Fault Tolerance | Consistency | Complexity | Use Case |
|---|---|---|---|---|
| Single Node | None | Strong | Low | Development, non-critical |
| Consensus (ZK, etcd) | High | Strong | Medium | Production, correctness required |
| Redlock | Moderate | Probabilistic | High | Efficiency, debated for safety |
| Lease-Based | Moderate | Time-bounded | Low | Common, requires fencing |
The Lease Pattern:
Almost all practical distributed locks use leases—time-limited lock grants that expire automatically.
ACQUIRE(resource_id, ttl=30s) → lease_id, version
RENEW(resource_id, lease_id) → extended ttl
RELEASE(resource_id, lease_id) → OK
Leases solve the "crashed holder" problem: if a client dies while holding the lock, the lease eventually expires and another client can acquire it.
The danger: A client might experience a pause longer than the TTL. It resumes thinking it still has the lock, but the lease has expired and another client holds it. Without additional protection (fencing tokens), this leads to mutual exclusion violations.
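This failure mode is easy to reproduce in miniature. The toy sketch below (all names hypothetical) simulates a lease table and a client whose pause, standing in for a long GC stall, outlasts its TTL:

```python
import time


class LeaseTable:
    """Toy lease store: resource -> (holder, expiry timestamp)."""

    def __init__(self):
        self.leases = {}

    def acquire(self, resource: str, holder: str, ttl: float) -> bool:
        now = time.time()
        current = self.leases.get(resource)
        if current is None or current[1] < now:  # free or expired
            self.leases[resource] = (holder, now + ttl)
            return True
        return False

    def holder(self, resource: str):
        current = self.leases.get(resource)
        if current and current[1] >= time.time():
            return current[0]
        return None


leases = LeaseTable()
assert leases.acquire("res", "client-A", ttl=1.0)

time.sleep(1.5)  # client A pauses longer than its TTL...
assert leases.acquire("res", "client-B", ttl=30.0)  # ...so B acquires

# Client A resumes, still believing it holds the lock:
print(leases.holder("res"))  # -> client-B; A's writes would now conflict
```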
The most reliable distributed locks are built on top of consensus systems like ZooKeeper or etcd. These systems provide linearizable storage, which is exactly what locks need.
ZooKeeper Lock Recipe:
ZooKeeper's approach uses ephemeral sequential znodes:
1. Create an ephemeral sequential znode under /locks/resource_name (e.g., /locks/resource_name/lock-0000000007).
2. List the children of /locks/resource_name. If your znode has the lowest sequence number, you hold the lock.
3. Otherwise, watch the znode with the next-lower sequence number and wait for it to be deleted.
4. When the predecessor disappears (released, or its session expired), go back to step 2.

This approach scales well because each client only watches one predecessor, avoiding the "herd effect" where all clients wake up when the lock is released.
```python
import time
import threading
from typing import Optional

from kazoo.client import KazooClient


class ZooKeeperDistributedLock:
    """
    Distributed lock using ZooKeeper.

    Uses ephemeral sequential znodes for ordering.
    Automatic release on session expiration.
    """

    def __init__(self, zk_hosts: str, lock_path: str, identifier: str):
        self.zk = KazooClient(hosts=zk_hosts)
        self.lock_path = lock_path
        self.identifier = identifier
        self.lock_node: Optional[str] = None

    def acquire(self, timeout: float = None) -> bool:
        """
        Acquire the lock, optionally with timeout.

        Returns True if lock acquired, False if timeout expired.
        """
        self.zk.ensure_path(self.lock_path)

        # Create our ephemeral sequential node
        node_path = self.zk.create(
            f"{self.lock_path}/lock-",
            value=self.identifier.encode(),
            ephemeral=True,
            sequence=True
        )
        self.lock_node = node_path.split("/")[-1]

        start_time = time.time()

        while True:
            # Get all lock contenders
            children = self.zk.get_children(self.lock_path)
            sorted_children = sorted(children)

            our_index = sorted_children.index(self.lock_node)

            if our_index == 0:
                # We're first - we have the lock
                return True

            # Watch the node right before us
            predecessor = sorted_children[our_index - 1]
            predecessor_path = f"{self.lock_path}/{predecessor}"

            # Set up watch
            event = threading.Event()

            @self.zk.DataWatch(predecessor_path)
            def watch_predecessor(data, stat):
                if stat is None:
                    # Predecessor deleted
                    event.set()
                    return False
                return True

            # Wait for predecessor to be deleted
            remaining_timeout = None
            if timeout is not None:
                elapsed = time.time() - start_time
                remaining_timeout = timeout - elapsed
                if remaining_timeout <= 0:
                    # Timeout expired
                    self._cleanup()
                    return False

            if not event.wait(timeout=remaining_timeout):
                # Timeout expired
                self._cleanup()
                return False

            # Predecessor deleted - check again
            continue

    def release(self):
        """Release the lock."""
        self._cleanup()

    def _cleanup(self):
        """Delete our lock node."""
        if self.lock_node:
            try:
                self.zk.delete(f"{self.lock_path}/{self.lock_node}")
            except Exception:
                pass  # May already be deleted (session expired)
            self.lock_node = None

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.release()


class EtcdDistributedLock:
    """
    Distributed lock using etcd.

    Uses lease-based locking with automatic expiration.
    """

    def __init__(self, etcd_client, lock_name: str, ttl: int = 30):
        self.client = etcd_client
        self.lock_name = lock_name
        self.ttl = ttl
        self.lease = None  # etcd Lease object once acquired
        self.lock_key = f"/locks/{lock_name}"

    def acquire(self, timeout: float = None) -> bool:
        """
        Acquire the lock with a lease.

        The lock is held as long as we renew the lease.
        """
        # Create a lease
        self.lease = self.client.lease(self.ttl)

        # Try to acquire (create key only if not exists)
        try:
            # Transaction: IF key doesn't exist, THEN put with lease
            success, _ = self.client.transaction(
                compare=[
                    self.client.transactions.version(self.lock_key) == 0
                ],
                success=[
                    self.client.transactions.put(
                        self.lock_key, "locked", lease=self.lease
                    )
                ],
                failure=[]
            )

            if success:
                # Start lease refresh in background
                self._start_lease_refresh()
                return True
            else:
                # Lock is held by someone else
                # Could wait for it to be released
                return False

        except Exception:
            self.lease.revoke()
            self.lease = None
            raise

    def _start_lease_refresh(self):
        """Keep the lease alive in background."""
        def refresh_loop():
            while self.lease:
                try:
                    self.lease.refresh()
                    time.sleep(self.ttl / 3)
                except Exception:
                    break

        thread = threading.Thread(target=refresh_loop, daemon=True)
        thread.start()

    def release(self):
        """Release the lock by revoking the lease."""
        if self.lease:
            self.lease.revoke()
            self.lease = None
```

Both ZooKeeper's Curator library and etcd's client libraries provide lock recipes. Use these rather than implementing your own—they handle edge cases that are easy to miss, like session expiration during lock acquisition.
Even with consensus-based locks, the zombie lock holder problem persists. A client might pause after acquiring the lock, resume after its lease expires, and make conflicting updates. The lock service cannot prevent this on its own.
The solution is fencing tokens: monotonically increasing identifiers issued with each lock acquisition. The protected resource checks tokens and rejects operations with stale tokens.
How fencing works:

1. Every successful lock acquisition returns a token that is strictly greater than all previously issued tokens for that resource.
2. The client attaches this token to every operation on the protected resource.
3. The resource records the highest token it has seen and rejects any operation carrying a lower one.
The key insight:
The lock service guarantees that token N is issued after token N-1 has expired or been released. If a zombie holder with token N-1 tries to operate, its token is stale compared to the holder with token N. The resource rejects the zombie's operations.
```python
import threading
from typing import Any, Dict, Optional

from kazoo.client import KazooClient

# LockService, LockNotHeldError, StaleTokenError, and compute_update
# are assumed to be defined elsewhere in the application.


class FencedLock:
    """
    Distributed lock with fencing token support.

    Each lock acquisition returns a monotonically increasing token
    that clients must use when accessing protected resources.
    """

    def __init__(self, lock_service):
        self.lock_service = lock_service
        self.token: Optional[int] = None

    def acquire(self, resource_id: str) -> int:
        """
        Acquire lock and return fencing token.

        The token is guaranteed to be higher than any previously
        issued token for this resource.
        """
        self.token = self.lock_service.acquire(resource_id)
        return self.token

    def get_token(self) -> int:
        """Get current fencing token, or raise if not held."""
        if self.token is None:
            raise LockNotHeldError("Lock not acquired")
        return self.token

    def release(self, resource_id: str):
        """Release the lock."""
        self.lock_service.release(resource_id)
        self.token = None


class FencedStorage:
    """
    Storage that enforces fencing tokens.

    Rejects writes with tokens older than the highest token
    seen for each key.
    """

    def __init__(self):
        self.data: Dict[str, Any] = {}
        self.tokens: Dict[str, int] = {}  # Highest token per key
        self.lock = threading.Lock()

    def write(self, key: str, value: Any, fencing_token: int) -> bool:
        """
        Write value only if fencing token is valid.

        Returns True if write succeeded, False if token was stale.
        """
        with self.lock:
            current_token = self.tokens.get(key, 0)

            if fencing_token < current_token:
                # Stale token - reject
                print(f"Rejected write for {key}: "
                      f"token {fencing_token} < {current_token}")
                return False

            # Valid token - accept
            self.tokens[key] = fencing_token
            self.data[key] = value
            return True

    def read(self, key: str) -> Any:
        """Read value (no fencing needed for reads in some models)."""
        return self.data.get(key)


# Example: Using fenced lock with storage
def safe_update_with_fencing(zk: KazooClient):
    """
    Demonstrates safe updates using fenced locks.

    Even if client A experiences a pause, the fencing token
    ensures its stale writes are rejected.
    """
    lock = FencedLock(ZooKeeperLockService(zk))
    storage = FencedStorage()
    resource_id = "account-123"

    # Acquire lock (returns fencing token)
    token = lock.acquire(resource_id)
    print(f"Acquired lock with token {token}")

    try:
        # Read current value
        current = storage.read(resource_id)

        # Compute new value
        new_value = compute_update(current)

        # Write WITH fencing token
        success = storage.write(resource_id, new_value, token)

        if not success:
            # Our token was stale - another client has the lock
            raise StaleTokenError("Lock was lost during operation")
    finally:
        lock.release(resource_id)


class ZooKeeperLockService:
    """
    Lock service using ZooKeeper that provides fencing tokens.

    The fencing token is the zxid (transaction ID) of the lock
    node creation, which is globally ordered.
    """

    def __init__(self, zk: KazooClient):
        self.zk = zk

    def acquire(self, resource_id: str) -> int:
        """
        Acquire lock and return fencing token.

        Uses ZooKeeper's zxid as fencing token - guaranteed
        to be monotonically increasing.
        """
        lock_path = f"/locks/{resource_id}"

        # Create ephemeral sequential node
        node_path = self.zk.create(
            f"{lock_path}/lock-",
            value=b"",
            ephemeral=True,
            sequence=True
        )

        # Wait until we're first (standard ZK lock protocol)
        self._wait_until_first(lock_path, node_path)

        # Get the stat to retrieve creation zxid
        stat = self.zk.get(node_path)[1]

        # Use creation zxid as fencing token
        return stat.czxid
```

Fencing only works if the protected resource understands and checks tokens. If you're using an off-the-shelf database or service that doesn't support token checking, you cannot simply add fencing on top. This is a fundamental limitation—the protection must be end-to-end.
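That said, any store with atomic conditional updates can enforce the token check itself. A hedged sketch using Python's sqlite3, assuming a hypothetical resources table with a fence_token column and a pre-existing row per resource:

```python
import sqlite3


def fenced_write(conn: sqlite3.Connection, resource_id: str,
                 new_value: str, token: int) -> bool:
    """Apply a write only if our fencing token is not stale.

    Assumes a table: resources(id TEXT PRIMARY KEY, value TEXT,
    fence_token INTEGER) with a row already present for the resource.
    The UPDATE is atomic, so the token comparison and the write
    cannot be interleaved by another client.
    """
    cur = conn.execute(
        "UPDATE resources "
        "SET value = ?, fence_token = ? "
        "WHERE id = ? AND fence_token <= ?",
        (new_value, token, resource_id, token),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 rows updated means our token was stale
```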
Redlock is a distributed lock algorithm proposed by Redis creator Antirez (Salvatore Sanfilippo). It attempts to provide fault-tolerant locking using multiple independent Redis instances without formal consensus.
The Redlock Algorithm:

1. Record the current time in milliseconds.
2. Attempt to acquire the lock on all N independent Redis instances sequentially, using the same key and a unique random value, with a short per-instance timeout.
3. Compute the elapsed time. The lock is considered acquired only if a majority (N/2 + 1) of instances granted it and the elapsed time is less than the lock's TTL.
4. The effective validity time is the TTL minus the elapsed time, minus an allowance for clock drift.
5. If acquisition failed, release the lock on all instances.
The Controversy:
In 2016, Martin Kleppmann (author of Designing Data-Intensive Applications) published an influential critique arguing Redlock is not safe for correctness-critical use cases.
Kleppmann's Arguments:
1. Timing assumptions are violated in practice: Redlock depends on bounded clock drift and process pauses. Real systems experience GC pauses, context switches, and clock jumps that violate these assumptions.
2. No fencing tokens: Redlock doesn't provide monotonically increasing tokens. A client can't distinguish between legitimately holding the lock and being a zombie.
3. Redis persistence limitations: Even with AOF, Redis can lose acknowledged writes on crash, potentially losing lock state.
Antirez's Response:
Antirez defended Redlock, arguing that with proper configuration and clock monitoring, the algorithm is safe for most practical purposes.
The Takeaway:
The debate highlighted a fundamental divide between locks whose safety depends on timing assumptions holding and locks whose safety is guaranteed regardless of timing:
| Aspect | Redlock | ZooKeeper/etcd |
|---|---|---|
| Underlying mechanism | Multi-instance voting | Consensus (Raft/ZAB) |
| Timing assumptions | Required (clock, pause bounds) | Not required for safety |
| Fencing tokens | Not provided | Provided (zxid, revision) |
| Complexity | Client-side logic | Server-side coordination |
| Availability | N/2 + 1 of N instances | N/2 + 1 of N replicas |
| Correctness guarantee | Probabilistic/debated | Strong (if fencing used) |
| Best suited for | Efficiency locks | Correctness locks |
```python
import time
import uuid
from typing import List, Optional

import redis


class Redlock:
    """
    Redlock distributed lock algorithm.

    WARNING: This algorithm's correctness is debated.
    Use for efficiency, not correctness-critical cases.
    """

    def __init__(self, redis_clients: List[redis.Redis]):
        self.clients = redis_clients
        self.quorum = len(redis_clients) // 2 + 1
        self.clock_drift_factor = 0.01  # 1% clock drift

    def acquire(self, resource: str, ttl_ms: int) -> Optional[str]:
        """
        Try to acquire lock on majority of instances.

        Returns lock value if successful, None if failed.
        """
        lock_value = str(uuid.uuid4())
        start_time = time.time() * 1000  # milliseconds

        acquired_count = 0
        for client in self.clients:
            if self._try_lock(client, resource, lock_value, ttl_ms):
                acquired_count += 1

        elapsed = time.time() * 1000 - start_time

        # Validity time = TTL - elapsed - clock drift
        drift = ttl_ms * self.clock_drift_factor
        validity_time = ttl_ms - elapsed - drift

        if acquired_count >= self.quorum and validity_time > 0:
            # Lock acquired!
            return lock_value
        else:
            # Failed - release any partial locks
            self.release(resource, lock_value)
            return None

    def _try_lock(self, client: redis.Redis, resource: str,
                  value: str, ttl_ms: int) -> bool:
        """
        Try to acquire lock on a single Redis instance.

        Uses SET NX PX for atomic lock creation.
        """
        try:
            result = client.set(
                resource,
                value,
                nx=True,   # Only if not exists
                px=ttl_ms  # TTL in milliseconds
            )
            return result is True
        except redis.RedisError:
            return False

    def release(self, resource: str, value: str):
        """
        Release lock from all instances.

        Uses Lua script to only delete if value matches
        (prevent deleting another client's lock).
        """
        release_script = """
        if redis.call('get', KEYS[1]) == ARGV[1] then
            return redis.call('del', KEYS[1])
        else
            return 0
        end
        """
        for client in self.clients:
            try:
                client.eval(release_script, 1, resource, value)
            except redis.RedisError:
                pass  # Best effort release

    def extend(self, resource: str, value: str, ttl_ms: int) -> bool:
        """Extend lock TTL if we still hold it."""
        extend_script = """
        if redis.call('get', KEYS[1]) == ARGV[1] then
            return redis.call('pexpire', KEYS[1], ARGV[2])
        else
            return 0
        end
        """
        extended_count = 0
        for client in self.clients:
            try:
                result = client.eval(extend_script, 1, resource,
                                     value, ttl_ms)
                if result:
                    extended_count += 1
            except redis.RedisError:
                pass

        return extended_count >= self.quorum


# Simpler single-Redis lock (for efficiency-only use cases)
class SimpleRedisLock:
    """
    Single-Redis lock for efficiency use cases.

    Not fault-tolerant, but simpler and often sufficient
    when you just want to avoid duplicate work.
    """

    def __init__(self, redis_client: redis.Redis):
        self.client = redis_client

    def acquire(self, resource: str, ttl_seconds: int = 30) -> Optional[str]:
        """Acquire lock with automatic expiration."""
        lock_id = str(uuid.uuid4())
        if self.client.set(resource, lock_id, nx=True, ex=ttl_seconds):
            return lock_id
        return None

    def release(self, resource: str, lock_id: str) -> bool:
        """Release lock if we still hold it."""
        script = """
        if redis.call('get', KEYS[1]) == ARGV[1] then
            return redis.call('del', KEYS[1])
        end
        return 0
        """
        return self.client.eval(script, 1, resource, lock_id) == 1
```

Use Redlock (or simpler Redis locks) when the consequence of lock violation is duplicate work, not data corruption. For example, scheduled job deduplication, rate limiting, or cache refresh synchronization. For transactions, data integrity, or leader election, use consensus-based systems.
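For example, a worker deduplicating a scheduled job with the SimpleRedisLock above might look like this (job and key names are placeholders):

```python
import redis

client = redis.Redis(host="localhost", port=6379)
lock = SimpleRedisLock(client)

lock_id = lock.acquire("jobs:nightly-report", ttl_seconds=300)
if lock_id is None:
    # Another worker is already running the job; duplicate work is
    # the only thing at stake, so we can simply skip.
    print("Report already being generated elsewhere")
else:
    try:
        generate_nightly_report()  # placeholder for the actual job
    finally:
        lock.release("jobs:nightly-report", lock_id)
```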
Distributed locks add latency, complexity, and failure modes. Before reaching for locks, consider whether lock-free approaches can solve your problem.
1. Optimistic Concurrency Control (OCC)
Instead of locking, read the current version, compute update, and write with a version check. If version changed, retry.
CAS(key, current_version, new_value)
If version matches: update and return success
If version changed: return failure (retry)
When to use: Low contention, conflicts are rare. Retrying is cheap.
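The storage interface CAS needs is small. Here is a minimal in-memory sketch of a versioned store, shaped like the storage object the OptimisticLocking example below assumes (the class name and locals are illustrative):

```python
import threading
from typing import Any, Dict, Tuple


class VersionedStore:
    """In-memory store where every key carries a version number."""

    def __init__(self):
        self._lock = threading.Lock()  # local stand-in for server atomicity
        self._data: Dict[str, Tuple[Any, int]] = {}  # key -> (value, version)

    def get_with_version(self, key: str) -> Tuple[Any, int]:
        with self._lock:
            return self._data.get(key, (None, 0))

    def compare_and_swap(self, key: str, expected_version: int,
                         new_value: Any) -> bool:
        """Write only if the version is unchanged since our read."""
        with self._lock:
            _, version = self._data.get(key, (None, 0))
            if version != expected_version:
                return False  # conflict: someone else wrote first
            self._data[key] = (new_value, version + 1)
            return True
```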
2. Idempotent Operations
Design operations to be safely repeatable. If the same operation runs twice, the result is the same as running once.
```python
# Not idempotent
balance += 100

# Idempotent
if transaction_id not in processed:
    balance += 100
    processed.add(transaction_id)
```
When to use: When you can track operation IDs or the operation is naturally idempotent.
3. Partitioning
Avoid contention by ensuring each piece of data is modified by exactly one writer.
```python
# Instead of: All workers update shared counter
# Use: Each worker has its own counter, aggregate when needed
```
When to use: When data can be partitioned by owner/writer.
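A minimal sketch of the per-writer counter idea (names illustrative): each worker increments only its own slot, so no two writers ever contend:

```python
from collections import defaultdict


class PartitionedCounter:
    """Each worker owns one slot; no two writers touch the same key."""

    def __init__(self):
        self.slots = defaultdict(int)  # worker_id -> local count

    def increment(self, worker_id: str, amount: int = 1):
        # Only worker_id ever writes this slot, so no lock is needed
        self.slots[worker_id] += amount

    def total(self) -> int:
        # Reads aggregate across all slots
        return sum(self.slots.values())


counter = PartitionedCounter()
counter.increment("worker-1")
counter.increment("worker-2", 5)
print(counter.total())  # 6
```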
4. CRDT (Conflict-Free Replicated Data Types)
Use data structures mathematically designed to merge concurrent updates without conflicts.
G-Counter: Each node has its own counter, sum gives total
OR-Set: Tombstones enable concurrent add/remove
When to use: Eventual consistency is acceptable, high availability required.
```python
import random
import time
from typing import Dict, Set

# TooManyConflictsError is assumed to be defined by the application.


class OptimisticLocking:
    """
    Compare-and-swap based updates without locks.

    Works well when contention is low.
    """

    def __init__(self, storage):
        self.storage = storage

    def update(self, key: str, update_fn, max_retries: int = 5):
        """
        Optimistically update a value.

        Reads current version, applies update, writes with
        version check. Retries on conflict.
        """
        for attempt in range(max_retries):
            # Read current value and version
            value, version = self.storage.get_with_version(key)

            # Compute new value
            new_value = update_fn(value)

            # Try to write with version check
            success = self.storage.compare_and_swap(
                key,
                expected_version=version,
                new_value=new_value
            )

            if success:
                return new_value

            # Conflict - someone else updated, retry with jittered backoff
            time.sleep(random.uniform(0, 2 ** attempt * 0.01))

        raise TooManyConflictsError(f"Failed after {max_retries} attempts")


class IdempotentProcessor:
    """
    Process operations exactly once using deduplication.

    No locks needed - duplicate calls are safely ignored.
    """

    def __init__(self, storage):
        self.storage = storage
        self.processed_ids: Set[str] = set()

    def process(self, operation_id: str, operation):
        """
        Process operation exactly once.

        If operation_id was already processed, return cached result.
        """
        # Check if already processed
        if operation_id in self.processed_ids:
            return self.storage.get_result(operation_id)

        # Check persistent storage (for crash recovery)
        cached = self.storage.get_result(operation_id)
        if cached is not None:
            self.processed_ids.add(operation_id)
            return cached

        # Execute operation
        result = operation()

        # Store result atomically
        self.storage.save_result(operation_id, result)
        self.processed_ids.add(operation_id)

        return result


class GCounter:
    """
    Grow-only Counter CRDT.

    Each node maintains its own counter. Total is sum of all.
    Naturally handles concurrent increments without coordination.
    """

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: Dict[str, int] = {}  # node_id -> count

    def increment(self, amount: int = 1):
        """Increment our node's counter."""
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self) -> int:
        """Get total count across all nodes."""
        return sum(self.counts.values())

    def merge(self, other: 'GCounter'):
        """
        Merge with another counter.

        Take max of each node's count - handles duplicate deliveries.
        """
        all_nodes = set(self.counts.keys()) | set(other.counts.keys())
        for node in all_nodes:
            self.counts[node] = max(
                self.counts.get(node, 0),
                other.counts.get(node, 0)
            )
```

Locks are a coordination primitive of last resort. Every lock is a potential bottleneck and failure point. Before adding a distributed lock, ask whether you can design the system to not need one. Partition data, make operations idempotent, use optimistic concurrency, or embrace eventual consistency.
When you do need distributed locks, several patterns help make them production-ready.
1. Lock Timeouts
Always bound acquisition with a timeout so a client fails fast when the lock is unavailable, rather than blocking indefinitely:
```python
if lock.acquire(timeout=5.0):  # Fail fast if lock unavailable
    try:
        ...  # do work
    finally:
        lock.release()
else:
    ...  # Handle lock unavailable
```
2. Lease Renewal
For long operations, renew the lease before expiry:
```python
def long_operation_with_lock():
    lock.acquire(ttl=30)
    renewer = LockRenewer(lock, interval=10)  # Renew every 10s
    renewer.start()
    try:
        do_long_work()
    finally:
        renewer.stop()
        lock.release()
```
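The LockRenewer above is not a standard library class; a minimal sketch, assuming the lock object exposes a renew() method, might look like:

```python
import threading


class LockRenewer:
    """Background thread that renews a lease-based lock periodically.

    Assumes `lock` exposes a renew() method. Stops renewing on error,
    so the lease expires naturally if the lock service is unreachable.
    """

    def __init__(self, lock, interval: float):
        self.lock = lock
        self.interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        # wait() returns False on timeout (keep renewing), True when stopped
        while not self._stop.wait(self.interval):
            try:
                self.lock.renew()
            except Exception:
                break  # stop renewing; the lease will expire on its own

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
```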
3. Lock Scoping
Use fine-grained, specific lock names to minimize contention:
```python
# Bad: Single global lock
lock.acquire("database-updates")

# Better: Per-resource locks
lock.acquire(f"order:{order_id}")
lock.acquire(f"user:{user_id}:profile")
```
4. Deadlock Prevention
If acquiring multiple locks, always acquire in consistent order:
```python
def transfer(from_account, to_account):
    # Always lock lower ID first
    first, second = sorted([from_account, to_account], key=lambda a: a.id)
    with lock(first.id):
        with lock(second.id):
            ...  # Perform transfer
```
```python
import time
from contextlib import contextmanager

# LockHandle, LockServiceError, LockUnavailableError, load_order,
# update_order, update_order_with_token, and log are assumed to be
# defined by the application.


class ProductionDistributedLock:
    """
    Production-ready distributed lock wrapper.

    Includes timeouts, monitoring, and retry logic.
    """

    def __init__(self, lock_service, metrics):
        self.lock_service = lock_service
        self.metrics = metrics

    @contextmanager
    def acquire(self, resource_id: str,
                timeout: float = 5.0,
                ttl: float = 30.0,
                retries: int = 3):
        """
        Acquire lock with production-grade handling.

        Args:
            resource_id: Unique identifier for the resource to lock
            timeout: Max time to wait for lock acquisition
            ttl: Lock time-to-live (auto-release)
            retries: Number of acquisition retries
        """
        start_time = time.time()

        for attempt in range(retries):
            try:
                token = self._try_acquire(resource_id, timeout, ttl)
                if token:
                    # Track acquisition metrics
                    wait_time = time.time() - start_time
                    self.metrics.record_acquisition(resource_id, wait_time)

                    try:
                        yield LockHandle(resource_id, token)
                    finally:
                        # Always release
                        self._release(resource_id, token)
                        hold_time = time.time() - start_time - wait_time
                        self.metrics.record_release(resource_id, hold_time)
                    return
            except LockServiceError:
                if attempt < retries - 1:
                    backoff = min(2 ** attempt * 0.1, 2.0)
                    time.sleep(backoff)
                    continue
                raise

        # All retries failed
        self.metrics.record_acquisition_failure(resource_id)
        raise LockUnavailableError(f"Could not acquire lock for {resource_id}")

    def _try_acquire(self, resource_id: str, timeout: float, ttl: float):
        """Attempt single lock acquisition."""
        return self.lock_service.acquire(
            resource_id,
            timeout_ms=int(timeout * 1000),
            ttl_ms=int(ttl * 1000)
        )

    def _release(self, resource_id: str, token):
        """Release lock, handling errors gracefully."""
        try:
            self.lock_service.release(resource_id, token)
        except Exception as e:
            # Log but don't raise - lock will expire anyway
            self.metrics.record_release_error(resource_id, e)


# Usage example
def process_order(order_id: str, lock: ProductionDistributedLock):
    try:
        with lock.acquire(f"order:{order_id}", timeout=3.0, ttl=60.0) as handle:
            # We hold the lock
            order = load_order(order_id)
            if handle.token:
                # If using fencing tokens
                update_order_with_token(order, handle.token)
            else:
                update_order(order)
    except LockUnavailableError:
        # Lock couldn't be acquired - order is being processed elsewhere
        log.info(f"Order {order_id} already being processed, skipping")
    except Exception as e:
        # Lock was acquired but operation failed
        log.error(f"Failed to process order {order_id}: {e}")
        raise
```

We've covered distributed locking comprehensively, from fundamental challenges to production patterns. Let's consolidate the key insights:
Module Complete:
This concludes our exploration of Distributed Coordination. We've covered leader election, consensus algorithms (both theoretical foundations and practical Paxos/Raft), and distributed locking. These primitives form the backbone of reliable distributed systems—from Kubernetes' etcd to your application's coordination logic.
The key insight threading through all these topics: coordination in distributed systems requires careful reasoning about failures, timing, and invariants. There are no shortcuts. The protocols that work correctly are those that have been rigorously analyzed and tested under adversarial conditions.
You now understand the full spectrum of distributed coordination: from electing leaders to reaching consensus to controlling exclusive access. These primitives—though subtle and challenging—are what enable distributed systems to behave coherently despite the chaos of network partitions, message delays, and node failures.