When a user in Tokyo places an order on a global e-commerce platform, that order is processed in Tokyo. When a user in Frankfurt browses the same catalog, they're served from Frankfurt. When both users' orders affect the same inventory item, somehow the system maintains a coherent view of available stock—despite the 9,000 kilometers separating these data centers.
This is active-active multi-region architecture: a topology where all regions serve production traffic simultaneously, each capable of handling any operation for any user. Unlike active-passive, where a single region handles all traffic and others wait in standby, active-active distributes load globally while maintaining data consistency across the planet.
The benefits are compelling: users everywhere experience low latency, capacity scales beyond single-region limits, and the failure of any region is absorbed seamlessly by survivors. But these benefits come at a steep price in architectural and operational complexity. Active-active is not a pattern to adopt lightly—it's the apex of distributed systems design, demanding sophisticated solutions to problems that don't exist in simpler topologies.
By the end of this page, you will understand the two primary active-active patterns (sharded and replicated), design consistency models appropriate for global systems, implement conflict resolution strategies, and navigate the operational challenges inherent in active-active deployments.
Active-active architectures come in two fundamentally different patterns, each with distinct characteristics, tradeoffs, and appropriate use cases.
Pattern 1: Geographically Sharded Active-Active
In this pattern, each region "owns" a subset of data. Ownership is determined by a partitioning scheme—often by user geography, account ID range, or tenant assignment.
This pattern avoids the hardest problems of active-active (multi-writer conflicts) by ensuring that each data item has a single writer. It's simpler to implement and reason about.
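To make ownership concrete, here is a minimal sketch of how a service layer might route requests in a geo-sharded deployment: every write goes to the user's home region, while reads may stay local when some staleness is acceptable. The region names, UserDirectory lookup, and routing policy are illustrative assumptions, not a prescribed API.

```typescript
// Sketch of geo-sharded routing: exactly one region may write a given user's data.
// Region names, the directory lookup, and the read policy are assumptions.

type Region = 'us-east' | 'eu-west' | 'ap-northeast';

interface UserDirectory {
  // Maps each user to the single region that owns (and writes) their data
  homeRegionOf(userId: string): Promise<Region>;
}

class ShardedRouter {
  constructor(
    private readonly directory: UserDirectory,
    private readonly localRegion: Region
  ) {}

  /** Every write for this user must execute in their home region. */
  async writeRegionFor(userId: string): Promise<Region> {
    return this.directory.homeRegionOf(userId);
  }

  /** Reads can stay local when a slightly stale answer is acceptable. */
  async readRegionFor(userId: string, requireOwnedCopy: boolean): Promise<Region> {
    return requireOwnedCopy ? this.directory.homeRegionOf(userId) : this.localRegion;
  }
}
```

A common refinement is to encode the home region in the user ID itself so routing needs no directory lookup at request time.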
Pattern 2: Fully Replicated Active-Active (Multi-Master)
In this pattern, all data is replicated to all regions, and any region can accept writes for any data.
This pattern provides the ultimate in flexibility and failover (any region can handle any request), but introduces the full complexity of distributed consensus and conflict resolution.
Choosing Between Patterns
The choice between sharded and replicated active-active depends on your data access patterns:
Choose Sharded When: users and their data have a natural home region, cross-region access to the same records is rare, and you want to avoid multi-writer conflicts entirely.
Choose Fully Replicated When: any record may need to be read or written from any region, every region must be able to take over all traffic instantly, and you are prepared to detect and resolve write conflicts.
Most organizations should start with geographically sharded active-active, which delivers the majority of latency and availability benefits without multi-master complexity. Only move to fully replicated when you've proven the need through user behavior data showing significant cross-region access patterns.
In active-active systems, consistency guarantees become a crucial design decision. The CAP theorem imposes real constraints: during network partitions between regions (which happen regularly), you cannot have both perfect consistency and continuous availability. You must choose.
Eventual Consistency
The most common model for active-active systems, eventual consistency guarantees that if no new updates are made to a piece of data, all reads will eventually return the same value. The "eventually" may be milliseconds or seconds, depending on replication lag.
Session Consistency (Read-Your-Writes)
A stronger guarantee: a user always sees their own writes, even if other users temporarily see older data.
This model handles the most jarring user experience issue—"I just submitted that, where is it?"—without requiring global synchronization.
```typescript
/**
 * Session Consistency Implementation
 *
 * Ensures users see their own writes even in an eventually consistent
 * multi-region system. Uses write tokens to route reads appropriately.
 */

interface WriteToken {
  timestamp: number;
  region: string;
  logSequence: string; // Database position marker
}

interface ReadOptions {
  userId: string;
  writeToken?: WriteToken;
}

class SessionConsistentReader {
  private readonly localRegion: string;
  private readonly replicationLagMs: number;

  constructor(localRegion: string, estimatedReplicationLagMs: number = 500) {
    this.localRegion = localRegion;
    this.replicationLagMs = estimatedReplicationLagMs;
  }

  /**
   * Reads data with session consistency.
   * If the user has a recent write token, ensures the read
   * sees at least that write.
   */
  async read<T>(
    table: string,
    key: string,
    options: ReadOptions
  ): Promise<T> {
    const { writeToken } = options;

    // No write token: local read is fine (no consistency requirement)
    if (!writeToken) {
      return this.localRead(table, key);
    }

    // Write token exists: determine if local replica is caught up
    const tokenAge = Date.now() - writeToken.timestamp;

    // If write was in local region, local read is always consistent
    if (writeToken.region === this.localRegion) {
      return this.localRead(table, key);
    }

    // If write was recent (likely not yet replicated), read from write region
    if (tokenAge < this.replicationLagMs * 2) {
      return this.crossRegionRead(table, key, writeToken.region);
    }

    // Write is old enough that replication likely completed.
    // Verify by checking local replica position against token
    const localPosition = await this.getLocalReplicaPosition();
    if (this.positionIsAfter(localPosition, writeToken.logSequence)) {
      // Local replica has caught up
      return this.localRead(table, key);
    } else {
      // Local replica is behind: read from write region
      return this.crossRegionRead(table, key, writeToken.region);
    }
  }

  /**
   * Writes data and returns a token for session consistency.
   */
  async write<T>(
    table: string,
    key: string,
    value: T,
    userId: string
  ): Promise<{ result: T; writeToken: WriteToken }> {
    // Write to local region (which is primary for this user)
    const result = await this.localWrite(table, key, value);

    // Generate write token for session consistency
    const writeToken: WriteToken = {
      timestamp: Date.now(),
      region: this.localRegion,
      logSequence: await this.getCurrentLogSequence()
    };

    // Store token for user's session
    await this.storeUserWriteToken(userId, writeToken);

    return { result, writeToken };
  }

  private async localRead<T>(table: string, key: string): Promise<T> {
    // Read from local region database
    return db.region(this.localRegion).read(table, key);
  }

  private async localWrite<T>(table: string, key: string, value: T): Promise<T> {
    // Write to local region database
    return db.region(this.localRegion).write(table, key, value);
  }

  private async crossRegionRead<T>(
    table: string,
    key: string,
    region: string
  ): Promise<T> {
    // Read from specified region (higher latency)
    return db.region(region).read(table, key);
  }

  private async getLocalReplicaPosition(): Promise<string> {
    // Get current replication position from local database
    const result = await db.region(this.localRegion)
      .query('SELECT pg_last_wal_replay_lsn()');
    return result.rows[0].pg_last_wal_replay_lsn;
  }

  private async getCurrentLogSequence(): Promise<string> {
    // Get current write position
    const result = await db.region(this.localRegion)
      .query('SELECT pg_current_wal_lsn()');
    return result.rows[0].pg_current_wal_lsn;
  }

  private positionIsAfter(current: string, required: string): boolean {
    // Compare log sequence numbers
    return current >= required; // Simplified; real impl uses LSN parsing
  }

  private async storeUserWriteToken(
    userId: string,
    token: WriteToken
  ): Promise<void> {
    // Store in distributed cache with TTL
    await cache.set(
      `write-token:${userId}`,
      token,
      { ttlSeconds: 60 } // Token expires after replication guaranteed complete
    );
  }
}
```

Causal Consistency
A stronger model that preserves causality: if operation A happened before operation B (and B could have depended on A), all observers see A before B. This prevents anomalies like seeing a reply before the original post.
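A common implementation technique is to tag each replicated event with the version vector its writer had observed, and to buffer incoming events until those dependencies are visible locally, as in the sketch below. The event shape and vector handling are simplified assumptions rather than a complete replication protocol.

```typescript
// Sketch of causal delivery: events are applied only after every write their
// producer had observed has been applied locally. Shapes are illustrative.

type VersionVector = Record<string, number>;

interface ReplicatedEvent {
  origin: string;       // region that produced the event
  seq: number;          // per-origin sequence number
  deps: VersionVector;  // what the writer had already seen when it wrote
  apply: () => void;    // apply the event to local state
}

class CausalApplier {
  private seen: VersionVector = {};
  private pending: ReplicatedEvent[] = [];

  private satisfied(deps: VersionVector): boolean {
    // Every write the producer observed must already be applied here
    return Object.entries(deps).every(([region, n]) => (this.seen[region] ?? 0) >= n);
  }

  receive(event: ReplicatedEvent): void {
    this.pending.push(event);
    // Apply everything whose causal dependencies are met; buffer the rest
    let progressed = true;
    while (progressed) {
      progressed = false;
      for (const e of [...this.pending]) {
        if (this.satisfied(e.deps)) {
          e.apply();
          this.seen[e.origin] = Math.max(this.seen[e.origin] ?? 0, e.seq);
          this.pending = this.pending.filter(p => p !== e);
          progressed = true;
        }
      }
    }
  }
}
```

Under this scheme a reply whose dependencies include the original post simply waits in the buffer until the post arrives, which is exactly the anomaly causal consistency is meant to prevent.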
Strong Consistency (Global Serialization)
All operations appear to execute in a single global order, and reads always return the most recent write.
Strong consistency in active-active is possible (Google's Spanner proves this with TrueTime), but the latency cost makes it appropriate only for critical operations like financial transactions.
Hybrid Approaches
Real systems often combine consistency models for different data types:
| Data Type | Recommended Model | Rationale |
|---|---|---|
| Account balances | Strong/CP | Financial correctness required |
| Inventory counts | Strong with fallback | Prevent overselling, degrade gracefully |
| User sessions | Session consistency | Users must see own data |
| User profiles | Eventual | Low conflict rate, not business-critical |
| Social feeds | Causal | Conversation order matters |
| Analytics events | Eventual | Aggregated anyway, order not critical |
| Collaborative docs | Causal + CRDT | Real-time sync with order preservation |
In fully replicated active-active systems, conflicts are inevitable. When two regions accept writes to the same data at approximately the same time, the system must detect this conflict and resolve it deterministically.
Understanding Conflicts
A conflict occurs when two operations modify the same data without awareness of each other. Consider a user profile updated concurrently in two regions: us-east writes a new email address while eu-west writes a new phone number, each before the other's change has replicated.
Without conflict resolution, one update would randomly overwrite the other, potentially losing data.
Conflict Detection and Resolution Mechanisms
Last-Write-Wins (LWW)
The simplest approach: timestamp each write, accept the one with the latest timestamp.
Version Vectors / Vector Clocks
Track a logical clock per region, detecting concurrent modifications:
"""Conflict Resolution Strategies for Multi-Region Active-Active This module implements several conflict resolution approacheswith their tradeoffs and appropriate use cases."""from dataclasses import dataclass, fieldfrom datetime import datetimefrom typing import Optional, Dict, Any, Callablefrom enum import Enumimport json class ConflictStrategy(Enum): LAST_WRITE_WINS = "lww" FIRST_WRITE_WINS = "fww" MERGE = "merge" CUSTOM = "custom" @dataclassclass VersionVector: """ Vector clock for tracking causality across regions. Each region maintains its own logical timestamp. """ clocks: Dict[str, int] = field(default_factory=dict) def increment(self, region: str) -> 'VersionVector': """Increment the clock for a region (on write).""" new_clocks = self.clocks.copy() new_clocks[region] = new_clocks.get(region, 0) + 1 return VersionVector(new_clocks) def merge(self, other: 'VersionVector') -> 'VersionVector': """Merge two vectors (take max of each component).""" regions = set(self.clocks.keys()) | set(other.clocks.keys()) merged = { r: max(self.clocks.get(r, 0), other.clocks.get(r, 0)) for r in regions } return VersionVector(merged) def is_concurrent_with(self, other: 'VersionVector') -> bool: """ Check if two version vectors are concurrent (neither dominates). This indicates a conflict. """ self_newer = False other_newer = False regions = set(self.clocks.keys()) | set(other.clocks.keys()) for region in regions: self_val = self.clocks.get(region, 0) other_val = other.clocks.get(region, 0) if self_val > other_val: self_newer = True if other_val > self_val: other_newer = True return self_newer and other_newer def dominates(self, other: 'VersionVector') -> bool: """Check if self happened after other (self dominates).""" if self.is_concurrent_with(other): return False regions = set(self.clocks.keys()) | set(other.clocks.keys()) return all( self.clocks.get(r, 0) >= other.clocks.get(r, 0) for r in regions ) @dataclassclass VersionedValue: """A value with version tracking for conflict detection.""" value: Any version: VersionVector timestamp: datetime origin_region: str class ConflictResolver: """ Handles conflict resolution for multi-region replication. """ def __init__(self, local_region: str): self.local_region = local_region self.custom_resolvers: Dict[str, Callable] = {} def register_custom_resolver( self, entity_type: str, resolver: Callable[[VersionedValue, VersionedValue], VersionedValue] ): """Register a custom conflict resolver for an entity type.""" self.custom_resolvers[entity_type] = resolver def resolve( self, entity_type: str, local: VersionedValue, remote: VersionedValue, strategy: ConflictStrategy = ConflictStrategy.LAST_WRITE_WINS ) -> VersionedValue: """ Resolve conflict between local and remote versions. Returns the winning value. 
""" # Check if there's actually a conflict if local.version.dominates(remote.version): return local # Local is newer, no conflict if remote.version.dominates(local.version): return remote # Remote is newer, no conflict # Concurrent versions: need resolution if strategy == ConflictStrategy.LAST_WRITE_WINS: return self._resolve_lww(local, remote) elif strategy == ConflictStrategy.FIRST_WRITE_WINS: return self._resolve_fww(local, remote) elif strategy == ConflictStrategy.MERGE: return self._resolve_merge(local, remote) elif strategy == ConflictStrategy.CUSTOM: if entity_type in self.custom_resolvers: return self.custom_resolvers[entity_type](local, remote) raise ValueError(f"No custom resolver for {entity_type}") def _resolve_lww( self, local: VersionedValue, remote: VersionedValue ) -> VersionedValue: """Last-Write-Wins: use timestamp, break ties with region name.""" if local.timestamp > remote.timestamp: winner = local elif remote.timestamp > local.timestamp: winner = remote else: # Tie-breaker: deterministic region ordering winner = local if local.origin_region < remote.origin_region else remote # Merge version vectors for accurate causality tracking return VersionedValue( value=winner.value, version=local.version.merge(remote.version), timestamp=max(local.timestamp, remote.timestamp), origin_region=winner.origin_region ) def _resolve_fww( self, local: VersionedValue, remote: VersionedValue ) -> VersionedValue: """First-Write-Wins: preserve the original value.""" if local.timestamp < remote.timestamp: winner = local elif remote.timestamp < local.timestamp: winner = remote else: winner = local if local.origin_region < remote.origin_region else remote return VersionedValue( value=winner.value, version=local.version.merge(remote.version), timestamp=winner.timestamp, origin_region=winner.origin_region ) def _resolve_merge( self, local: VersionedValue, remote: VersionedValue ) -> VersionedValue: """ Merge strategy: attempt to combine values. Works for additive operations (sets, counters). """ # Handle different merge scenarios if isinstance(local.value, set) and isinstance(remote.value, set): merged_value = local.value | remote.value elif isinstance(local.value, dict) and isinstance(remote.value, dict): # Deep merge for dictionaries merged_value = self._deep_merge(local.value, remote.value) elif isinstance(local.value, (int, float)) and isinstance(remote.value, (int, float)): # For counters, this is tricky - need delta-based approach merged_value = max(local.value, remote.value) else: # Fall back to LWW for non-mergeable types return self._resolve_lww(local, remote) return VersionedValue( value=merged_value, version=local.version.merge(remote.version), timestamp=max(local.timestamp, remote.timestamp), origin_region=self.local_region ) def _deep_merge(self, dict1: dict, dict2: dict) -> dict: """Deep merge two dictionaries, handling nested conflicts.""" result = dict1.copy() for key, value in dict2.items(): if key in result: if isinstance(result[key], dict) and isinstance(value, dict): result[key] = self._deep_merge(result[key], value) else: # Conflict at leaf: take later value (could be configurable) result[key] = value else: result[key] = value return result # CRDT Example: G-Counter (Grow-only counter)@dataclassclass GCounter: """ A grow-only counter CRDT. Can be incremented in any region without coordination. Merge always converges to the correct total. 
""" counts: Dict[str, int] = field(default_factory=dict) def increment(self, region: str, amount: int = 1) -> 'GCounter': """Increment the counter in a specific region.""" new_counts = self.counts.copy() new_counts[region] = new_counts.get(region, 0) + amount return GCounter(new_counts) def value(self) -> int: """Get the total count across all regions.""" return sum(self.counts.values()) def merge(self, other: 'GCounter') -> 'GCounter': """Merge two counters (take max from each region).""" regions = set(self.counts.keys()) | set(other.counts.keys()) merged = { r: max(self.counts.get(r, 0), other.counts.get(r, 0)) for r in regions } return GCounter(merged) # Usage exampleif __name__ == "__main__": # Simulate concurrent counter updates in two regions counter_us = GCounter() counter_eu = GCounter() # US increments 3 times counter_us = counter_us.increment("us-east", 3) # EU increments 5 times (concurrently) counter_eu = counter_eu.increment("eu-west", 5) # After replication, both regions merge counter_us = counter_us.merge(counter_eu) counter_eu = counter_eu.merge(counter_us) print(f"US counter: {counter_us.value()}") # 8 print(f"EU counter: {counter_eu.value()}") # 8 # Both converge to 8 without coordination!Conflict-Free Replicated Data Types (CRDTs)
CRDTs are data structures mathematically designed to merge without conflicts. They guarantee that all replicas converge to the same state, regardless of the order operations are applied.
Common CRDT types: grow-only and increment/decrement counters (G-Counter, PN-Counter), add-only and add/remove sets (G-Set, OR-Set), last-writer-wins registers and maps, and sequence CRDTs used for collaborative text editing.
CRDTs are ideal for data that can be modeled as counters, sets, or maps. They enable strong eventual consistency: replicas may temporarily diverge, but they always converge to the same state.
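As a companion to the G-Counter shown earlier, the sketch below illustrates a last-writer-wins register, one of the simplest CRDTs: merge picks the higher timestamp and uses the region name as a deterministic tie-breaker. The timestamp source and tie-break rule are assumptions for the example.

```typescript
// Sketch of an LWW-Register CRDT. Merge is commutative, associative, and
// idempotent, so replicas converge regardless of merge order.

class LWWRegister<T> {
  constructor(
    public readonly value: T,
    public readonly timestamp: number, // e.g. hybrid logical clock or wall clock
    public readonly region: string     // tie-breaker for equal timestamps
  ) {}

  merge(other: LWWRegister<T>): LWWRegister<T> {
    if (other.timestamp > this.timestamp) return other;
    if (other.timestamp < this.timestamp) return this;
    // Equal timestamps: deterministic tie-break so every region picks the same winner
    return other.region > this.region ? other : this;
  }
}

// Two regions set the value concurrently; after exchanging states, both converge.
const a = new LWWRegister('blue', 1700000000123, 'us-east');
const b = new LWWRegister('green', 1700000000456, 'eu-west');
console.log(a.merge(b).value); // 'green'
console.log(b.merge(a).value); // 'green'
```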
Application-Level Resolution
For complex domain objects, automatic resolution may be insufficient. Application-level resolution surfaces conflicts to users or applies domain-specific logic:
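As an illustration, a shopping-cart resolver might union the line items from both versions and keep the larger quantity per item, rather than letting one region's cart silently overwrite the other. The cart shape, helper names, and the choice to auto-merge (versus flagging the conflict for user review) are assumptions in this sketch.

```typescript
// Sketch of a domain-specific conflict resolver for shopping carts.
// The cart shape and the auto-merge policy are illustrative assumptions.

interface Cart { items: Record<string, number> } // sku -> quantity

interface Conflict<T> {
  local: T;
  remote: T;
  resolved?: T;
  needsUserReview: boolean;
}

function resolveCartConflict(local: Cart, remote: Cart): Conflict<Cart> {
  const merged: Cart = { items: { ...local.items } };
  for (const [sku, qty] of Object.entries(remote.items)) {
    // Keep the larger quantity per line item rather than picking a winner
    merged.items[sku] = Math.max(merged.items[sku] ?? 0, qty);
  }
  // Auto-merge is reasonable for carts; a conflicting payment-method change,
  // by contrast, might set needsUserReview = true and surface both versions.
  return { local, remote, resolved: merged, needsUserReview: false };
}
```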
Every resolved conflict represents potential data loss—one write was preferred over another. Audit conflict resolution: log all resolutions, monitor conflict rates, and alert when they spike. High conflict rates often indicate user experience issues or design problems.
Active-active architectures introduce challenges that rarely appear in single-region or active-passive systems. These must be addressed during design, not discovered in production.
ID Generation
In single-region systems, auto-incrementing database IDs work perfectly. In active-active, concurrent inserts in different regions would produce collisions. Solutions:
UUIDs (v4): Random 128-bit identifiers with negligible collision probability
ULIDs / UUIDs (v7): Time-sorted universally unique identifiers
Snowflake IDs: Twitter's approach with timestamp + machine ID + sequence
Range-Prefixed IDs: Each region gets a prefix range (e.g., US: 1-1B, EU: 1B-2B)
```typescript
/**
 * Distributed ID Generation Strategies for Active-Active
 */

// Strategy 1: ULID - Universally Unique Lexicographically Sortable Identifier
// Format: 01ARZ3NDEKTSV4RRFFQ69G5FAV (26 characters, Crockford Base32)
// First 10 chars = timestamp, Last 16 chars = randomness

import { ulid, decodeTime } from 'ulid';

class ULIDGenerator {
  generate(): string {
    return ulid(); // e.g., "01ARZ3NDEKTSV4RRFFQ69G5FAV"
  }

  getTimestamp(id: string): Date {
    const timestamp = decodeTime(id);
    return new Date(timestamp);
  }
}

// Strategy 2: Snowflake-inspired ID
// 64-bit: [1 bit unused][41 bits timestamp][10 bits region+machine][12 bits sequence]

class SnowflakeGenerator {
  private readonly epoch = 1609459200000n; // Jan 1, 2021
  private readonly regionId: bigint;
  private readonly machineId: bigint;
  private sequence: bigint = 0n;
  private lastTimestamp: bigint = 0n;

  constructor(regionId: number, machineId: number) {
    // 5 bits for region (32 regions), 5 bits for machine (32 per region)
    if (regionId < 0 || regionId > 31) {
      throw new Error('Region ID must be 0-31');
    }
    if (machineId < 0 || machineId > 31) {
      throw new Error('Machine ID must be 0-31');
    }
    this.regionId = BigInt(regionId);
    this.machineId = BigInt(machineId);
  }

  generate(): bigint {
    let timestamp = BigInt(Date.now()) - this.epoch;

    if (timestamp === this.lastTimestamp) {
      // Same millisecond: increment sequence
      this.sequence = (this.sequence + 1n) & 0xFFFn; // 12 bits
      if (this.sequence === 0n) {
        // Sequence exhausted, wait for next millisecond
        while (timestamp <= this.lastTimestamp) {
          timestamp = BigInt(Date.now()) - this.epoch;
        }
      }
    } else {
      this.sequence = 0n;
    }

    this.lastTimestamp = timestamp;

    // Compose ID:
    // timestamp (41 bits) | region (5 bits) | machine (5 bits) | sequence (12 bits)
    return (timestamp << 22n) |
           (this.regionId << 17n) |
           (this.machineId << 12n) |
           this.sequence;
  }

  parse(id: bigint): { timestamp: Date; regionId: number; machineId: number; sequence: number } {
    return {
      timestamp: new Date(Number((id >> 22n) + this.epoch)),
      regionId: Number((id >> 17n) & 0x1Fn),
      machineId: Number((id >> 12n) & 0x1Fn),
      sequence: Number(id & 0xFFFn)
    };
  }
}

// Strategy 3: Region-Prefixed with Local Sequence
class RegionPrefixedGenerator {
  private readonly regionPrefix: string;
  private sequence: number = 0;

  // Region prefixes provide namespace isolation
  private static readonly REGION_PREFIXES: Record<string, string> = {
    'us-east': 'USE',
    'us-west': 'USW',
    'eu-west': 'EUW',
    'ap-northeast': 'APN'
  };

  constructor(region: string) {
    const prefix = RegionPrefixedGenerator.REGION_PREFIXES[region];
    if (!prefix) {
      throw new Error(`Unknown region: ${region}`);
    }
    this.regionPrefix = prefix;
  }

  generate(): string {
    const timestamp = Date.now().toString(36); // Base36 timestamp
    const seq = (this.sequence++).toString(36).padStart(4, '0');
    const random = Math.random().toString(36).substring(2, 6);
    return `${this.regionPrefix}-${timestamp}-${seq}-${random}`;
    // e.g., "USE-lpq5k8z-0001-a7b2"
  }
}
```

Time Synchronization
Many active-active patterns rely on timestamps: last-write-wins, conflict detection, ordering. But clock drift between regions can cause incorrect ordering. Common mitigations include tightly monitored NTP with alerting on drift, hybrid logical clocks that combine physical time with a logical counter, bounded-uncertainty clocks such as Spanner's TrueTime, and avoiding wall-clock ordering altogether in favor of version vectors.
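The sketch below shows the core of a hybrid logical clock: it advances with wall-clock time when possible and falls back to a logical counter otherwise, so causally related events still order correctly across regions. This is a simplified illustration; production HLCs also bound divergence from physical time.

```typescript
// Minimal hybrid logical clock sketch (simplified; drift bounding omitted).

interface HLCTimestamp { physical: number; counter: number }

class HybridLogicalClock {
  private lastPhysical = 0;
  private counter = 0;

  /** Timestamp a local event or outgoing message. */
  now(): HLCTimestamp {
    const wall = Date.now();
    if (wall > this.lastPhysical) {
      this.lastPhysical = wall;
      this.counter = 0;
    } else {
      this.counter += 1; // clock didn't advance: use the logical counter
    }
    return { physical: this.lastPhysical, counter: this.counter };
  }

  /** Merge in a timestamp received from another region. */
  receive(remote: HLCTimestamp): HLCTimestamp {
    const wall = Date.now();
    const maxPhysical = Math.max(wall, this.lastPhysical, remote.physical);
    if (maxPhysical === this.lastPhysical && maxPhysical === remote.physical) {
      this.counter = Math.max(this.counter, remote.counter) + 1;
    } else if (maxPhysical === remote.physical) {
      this.counter = remote.counter + 1;
    } else if (maxPhysical === this.lastPhysical) {
      this.counter += 1;
    } else {
      this.counter = 0; // local wall clock is strictly ahead of everything seen
    }
    this.lastPhysical = maxPhysical;
    return { physical: this.lastPhysical, counter: this.counter };
  }
}
```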
Cross-Region Transactions
Some operations must be atomic across regions—transferring money between accounts homed in different regions, for example. Options include two-phase commit across regions (simple to reason about, but it stalls whenever a participating region is slow or unreachable), saga-style workflows with compensating actions, and redesigning the operation so the affected data is re-homed to a single region first.
Avoid cross-region transactions whenever possible. When unavoidable, accept the latency cost and design for partial failures.
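Where a cross-region operation is unavoidable, a saga with compensating actions is often more resilient than holding a distributed lock: each step is a local transaction in its own region, and a failure triggers an explicit undo. The ledger interface, naming, and error handling below are illustrative assumptions.

```typescript
// Sketch of a saga-style cross-region transfer: debit in the source region,
// credit in the destination region, compensate if the second step fails.

interface RegionalLedger {
  debit(accountId: string, amountCents: number, txId: string): Promise<void>;
  credit(accountId: string, amountCents: number, txId: string): Promise<void>;
  refund(accountId: string, amountCents: number, txId: string): Promise<void>;
}

async function transferAcrossRegions(
  source: RegionalLedger,      // ledger in the sender's home region
  destination: RegionalLedger, // ledger in the recipient's home region
  fromAccount: string,
  toAccount: string,
  amountCents: number,
  txId: string                 // idempotency key so retries are safe
): Promise<'committed' | 'compensated'> {
  await source.debit(fromAccount, amountCents, txId);
  try {
    await destination.credit(toAccount, amountCents, txId);
    return 'committed';
  } catch {
    // Destination region failed or is partitioned: undo the first step.
    // The refund must itself be retried until it succeeds (or escalated).
    await source.refund(fromAccount, amountCents, txId);
    return 'compensated';
  }
}
```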
Hit Rate and Cache Warming
In active-active, cache effectiveness becomes region-specific: each region's cache is warmed only by its own traffic, so hit rates differ between regions, and a user who fails over to another region initially lands on a cold cache.
Strategies: pre-warm caches in a surviving region before deliberately shifting traffic to it, replicate cache invalidation events across regions so stale entries are evicted promptly, and provision each region to absorb failover traffic at a temporarily reduced hit rate.
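One concrete approach, sketched below, is to fan out invalidation events (rather than cached values) to all regions over a global pub/sub channel so every replica evicts stale entries instead of waiting for TTL expiry. The pub/sub and cache interfaces are illustrative assumptions, not a specific product API.

```typescript
// Sketch: cross-region cache invalidation fan-out. Interfaces are assumptions.

interface CacheClient { delete(key: string): Promise<void> }
interface GlobalBus {
  publish(topic: string, message: string): Promise<void>;
  subscribe(topic: string, handler: (message: string) => Promise<void>): void;
}

class ReplicatedCacheInvalidator {
  constructor(
    private readonly bus: GlobalBus,
    private readonly localCache: CacheClient,
    private readonly localRegion: string
  ) {
    // Every region evicts on invalidations originating anywhere else
    this.bus.subscribe('cache-invalidations', async (msg) => {
      const { key, origin } = JSON.parse(msg) as { key: string; origin: string };
      if (origin !== this.localRegion) {
        await this.localCache.delete(key);
      }
    });
  }

  /** Call after a local write: evict locally, then tell other regions. */
  async invalidate(key: string): Promise<void> {
    await this.localCache.delete(key);
    await this.bus.publish(
      'cache-invalidations',
      JSON.stringify({ key, origin: this.localRegion })
    );
  }
}
```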
Active-active bugs are often timing-dependent and difficult to reproduce. Consider two users updating the same document from different regions: 99% of the time it works, because latency separates their writes. But occasionally the updates collide, and your conflict resolution has a subtle bug. These issues can lurk for months before manifesting at a critical moment.
Running active-active in production requires operational practices far beyond single-region or active-passive deployments. The combination of continuous traffic in all regions with cross-region data dependencies creates a uniquely challenging operational environment.
Deployment Strategies
Deploying to active-active systems requires careful orchestration:
Sequential Rolling Deployment
Parallel Deployment with Canary per Region
Feature Flags for Regional Rollout
The Schema Change Problem
Database schema changes in active-active are particularly challenging:
All schema changes must be backward-compatible, as regions run different code versions during deployments.
Cross-Region Observability
Monitoring active-active requires correlating data across regions: distributed traces that follow a request across region boundaries, dashboards that compare per-region latency, error rates, and traffic share side by side, and replication-lag and conflict-rate metrics for every region pair.
Incident Response Complexity
Incidents in active-active systems are harder to diagnose and resolve: a symptom observed in one region may originate from replication lag or conflicting writes in another, and common mitigations such as shifting traffic change load patterns in every remaining region at once.
On-Call Requirements
Active-active typically requires follow-the-sun on-call coverage, runbooks that work regardless of which region is impaired, and responders in every region who are empowered to shift traffic or degrade features without waiting on a central team.
Active-active demands more communication: cross-region standups, global deployment coordination, shared incident review. Budget 20-30% additional engineering overhead for coordination activities alone. This isn't waste—it's necessary investment in system coherence.
We've explored the apex of multi-region architecture: active-active systems that serve traffic from all regions simultaneously. Let's consolidate the key principles: start with geographically sharded active-active and move to full replication only when access patterns demand it, choose consistency models per data type rather than globally, make conflict detection and resolution explicit and audited, design ID generation and time handling for multiple concurrent writers, and budget for the operational overhead that global deployments impose.
What's Next
We've examined the architectural patterns. The next two pages dive into the critical enabling technologies: data replication across regions (the mechanisms that synchronize data) and traffic routing (how requests reach the right region). These technologies make multi-region possible.
You now understand active-active multi-region architecture—its patterns, consistency models, conflict resolution strategies, implementation challenges, and operational requirements. This is the most complex multi-region topology, and mastering it positions you to design systems at global scale.