Multi-Leader Replication

Level: Advanced

Duration: 75 mins

Topic: Multi-Leader Replication

Conflict Resolution Strategies

The Inevitable Challenge of Concurrent Writes

In single-leader systems, the leader serializes all writes—conflicts are impossible because every write sees the previous write's result before executing. Multi-leader replication intentionally breaks this serialization for latency and availability benefits, but now conflicts become possible.

Imagine two users, Alice in Tokyo and Bob in London, simultaneously editing the same document's title:

Alice's edit: "Project Report" → "Q4 Project Report"
Bob's edit: "Project Report" → "Final Project Report"

Both edits are valid. Both users see their change succeed immediately (local leader acknowledgment). But when these writes replicate to each other's leaders, the system confronts an impossible question: Which title should the document have?

This page explores the strategies systems use to answer this question—from simple automatic approaches to sophisticated domain-specific resolution. Understanding these strategies is essential for designing robust multi-leader systems.

What You Will Learn

By the end of this page, you will understand: (1) Conflict avoidance techniques that reduce conflict frequency, (2) Automatic resolution strategies like Last-Write-Wins, (3) How conflicts are surfaced to applications or users for custom resolution, (4) Implementing merge functions for domain-specific conflict handling, and (5) The fundamental trade-offs between resolution approaches.

Strategy 1: Conflict Avoidance

The most effective conflict resolution strategy is avoiding conflicts entirely. While not always possible, careful system design can eliminate most conflicts before they occur.

Principle: Route Related Writes to the Same Leader

If all writes to a given record always flow through the same leader, conflicts for that record cannot occur. This effectively creates 'ownership' of records by leaders.

Implementation approaches:

Conflict Avoidance Techniques

•User-to-leader affinity: All writes from a specific user always route to the same leader. Because that one leader serializes the user's writes, even when they arrive from several devices, conflicts on user-scoped data are eliminated.
•Document-to-leader affinity: Each document (or aggregate root) is assigned a 'home' leader. All writes to that document route there, regardless of user location. Other leaders hold read replicas.
•Geographic partitioning: Data is naturally partitioned by geography. EU users' data writes to EU leader; APAC users' data writes to APAC leader. Cross-region data is rare or read-only from remote locations.
•Time-window partitioning: Current-period data (today's transactions) writes to one leader; historical queries go to replicas. Conflicts are avoided by temporal separation.

User Affinity in Practice:

user-affinity-routing.ts
TypeScript
interface UserRoute {
  userId: string;
  homeLeader: string; // Datacenter/leader ID
  lastSeen: Date;
}
 
class ConflictAvoidanceRouter {
  private userRoutes: Map<string, UserRoute> = new Map();
  private leaders = ['us-east', 'eu-west', 'ap-northeast'];
  
  // Assign user to their nearest leader on first access
  assignHomeLeader(userId: string, clientRegion: string): string {
    const existing = this.userRoutes.get(userId);
    if (existing) {
      return existing.homeLeader;
    }
    
    // Map client region to nearest leader
    const homeLeader = this.nearestLeader(clientRegion);
    this.userRoutes.set(userId, {
      userId,
      homeLeader,
      lastSeen: new Date()
    });
    
    return homeLeader;
  }
  
  // Route write to user's home leader, even if not nearest
  routeWrite(userId: string, clientRegion: string): string {
    const route = this.userRoutes.get(userId);
    if (!route) {
      return this.assignHomeLeader(userId, clientRegion);
    }
    
    // Always route to home leader for writes (conflict avoidance)
    // Accept higher latency for distant users
    return route.homeLeader;
  }
  
  // Reads can go to any leader for lower latency
  routeRead(userId: string, clientRegion: string): string {
    return this.nearestLeader(clientRegion);
  }
  
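  // Reassign home leader (e.g., user relocated)
  migrateUser(userId: string, newHomeLeader: string): void {
    const route = this.userRoutes.get(userId);
    if (route) {
      // NOTE: a production router must first stop accepting this user's writes
      // on the old leader and wait for them to replicate; otherwise writes on
      // the old and new leaders can conflict. This sketch only updates the map.
      route.homeLeader = newHomeLeader;
    }
  }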
  
  private nearestLeader(region: string): string {
    const regionMap: Record<string, string> = {
      'us': 'us-east',
      'eu': 'eu-west',
      'apac': 'ap-northeast'
    };
    return regionMap[region] || 'us-east';
  }
}
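
Document-to-leader affinity can be sketched in a similar way. The example below is a minimal illustration, not a prescribed implementation: the leader list, the `fnv1a` helper, and `homeLeaderForDocument` are hypothetical names, and hashing is only one possible way to assign a home leader (a lookup table, as in the router above, is another).

```typescript
// Hypothetical leader IDs; replace with your own topology.
const LEADERS = ['us-east', 'eu-west', 'ap-northeast'] as const;
type LeaderId = (typeof LEADERS)[number];

// FNV-1a-style 32-bit hash over UTF-16 code units: deterministic and dependency-free.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Every router computes the same home leader for a document without shared state.
function homeLeaderForDocument(documentId: string): LeaderId {
  return LEADERS[fnv1a(documentId) % LEADERS.length];
}
```

One design caveat: with plain modulo hashing, adding or removing a leader reassigns many documents, so a real deployment would likely need a migration step or a more stable assignment scheme.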

Trade-offs of Conflict Avoidance:

Approach	Benefit	Cost
User affinity	Eliminates user-data conflicts	Users far from home leader experience write latency
Document affinity	Eliminates document-level conflicts	Documents locked to region; poor for collaborative editing across regions
Geographic partitioning	Natural fit for location-bound data	Doesn't work for globally shared data

When Avoidance Isn't Possible:

Conflict avoidance works when data has a natural owner or locality. It breaks down for:

Shared resources: Inventory, seat availability, limited quantities
Collaborative editing: Multiple users editing the same document from different regions
Global aggregates: Counters, balances, statistics updated from everywhere

For these cases, we need resolution strategies.

Design for Avoidance First

Before designing conflict resolution, audit your data model for avoidance opportunities. If 90% of conflicts can be avoided through routing, the remaining 10% becomes manageable. Resolution is a fallback for unavoidable conflicts, not the primary strategy.

Strategy 2: Automatic Resolution Approaches

When conflicts cannot be avoided, the system must resolve them. Automatic resolution means the system resolves conflicts without human intervention, applying deterministic rules to select a winner.

The key requirement for any automatic resolution strategy is convergence: all leaders, applying the same strategy to the same conflicts, must reach the same final state. Without convergence, leaders diverge permanently.
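
To make the convergence requirement concrete, the following sketch replays the Alice/Bob title conflict at two leaders that see the writes in opposite orders. The `Write` shape and both rule functions are assumptions made for illustration. A deterministic rule yields the same title at both leaders, while "whichever write arrives last" leaves them permanently different.

```typescript
interface Write {
  value: string;
  timestamp: number;
  leaderId: string;
}

// Deterministic rule: higher timestamp wins, leader ID breaks ties.
function deterministicWinner(a: Write, b: Write): Write {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp ? a : b;
  return a.leaderId > b.leaderId ? a : b;
}

// Non-convergent rule: the most recently arrived write overwrites the current value.
function arrivalOrderWinner(_current: Write, arriving: Write): Write {
  return arriving;
}

// Fold a leader's view of the writes (in arrival order) into one final value.
function applyAll(writes: Write[], rule: (a: Write, b: Write) => Write): Write {
  return writes.reduce((current, next) => rule(current, next));
}

const tokyo: Write = { value: 'Q4 Project Report', timestamp: 100, leaderId: 'ap-northeast' };
const london: Write = { value: 'Final Project Report', timestamp: 100, leaderId: 'eu-west' };

// Each leader applies its local write first and the replicated write second.
const tokyoView = [tokyo, london];
const londonView = [london, tokyo];

const convergedTokyo = applyAll(tokyoView, deterministicWinner).value;
const convergedLondon = applyAll(londonView, deterministicWinner).value;
const divergedTokyo = applyAll(tokyoView, arrivalOrderWinner).value;
const divergedLondon = applyAll(londonView, arrivalOrderWinner).value;

console.log(convergedTokyo === convergedLondon); // same title on both leaders
console.log(divergedTokyo === divergedLondon);   // titles differ between leaders
```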

The Resolution Decision Matrix:

Automatic Resolution Strategies
Strategy	Decision Basis	Guarantees	Data Loss Risk
Last-Write-Wins (LWW)	Timestamp comparison	Convergence guaranteed	High - earlier write always discarded
Highest-ID-Wins	Deterministic ID ordering	Convergence guaranteed	High - lower ID always discarded
Leader Priority	Pre-assigned leader ranking	Convergence guaranteed	Moderate - predictable loss from lower-priority leaders
Field-Level Merge	Per-field comparison	Convergence guaranteed	Lower - only conflicting fields resolve
Union/Merge	Combine values	Convergence guaranteed	None - all data preserved but may be inconsistent

Critical Insight: All Automatic Resolution Involves Trade-offs

Every automatic strategy trades something:

LWW trades accuracy: The 'later' write might be logically incorrect (based on stale data)
Union trades consistency: Combined data might violate business rules
Field-level merge trades simplicity: Complex to implement correctly

There is no universally correct automatic resolution. The right choice depends on your data semantics and tolerance for different failure modes.

The Illusion of 'Just Working'

Automatic resolution can create the illusion that the system handles conflicts seamlessly. In reality, every conflict resolved automatically represents potential data loss or semantic inconsistency. Monitor conflict rates and audit resolution outcomes regularly.

Deep Dive: Last-Write-Wins (LWW)

Last-Write-Wins (LWW) is the most common automatic resolution strategy, used by systems like Apache Cassandra, Amazon DynamoDB global tables, and many custom multi-leader implementations.

The Algorithm:

Each write carries a timestamp (or monotonically increasing sequence number)
When a conflict is detected, compare timestamps
The write with the higher (later) timestamp wins
The losing write is discarded entirely

Implementation:

last-write-wins.ts
TypeScript
interface TimestampedRecord<T> {
  data: T;
  timestamp: number;      // Logical or physical timestamp
  sourceLeader: string;   // Leader that accepted the write
  writeId: string;        // Unique write identifier for tie-breaking
}
 
class LWWResolver<T> {
  /**
   * Resolve conflict between local and incoming records.
   * Returns the winning record.
   */
  resolve(
    local: TimestampedRecord<T>,
    incoming: TimestampedRecord<T>
  ): TimestampedRecord<T> {
    // Primary comparison: timestamp
    if (incoming.timestamp > local.timestamp) {
      return incoming;
    }
    if (local.timestamp > incoming.timestamp) {
      return local;
    }
    
    // Timestamps equal: need deterministic tie-breaker
    // Common approaches:
    // 1. Compare leader IDs lexicographically
    // 2. Compare write IDs
    // 3. Use a pre-defined leader priority
    
    return this.tieBreak(local, incoming);
  }
  
  private tieBreak(
    a: TimestampedRecord<T>,
    b: TimestampedRecord<T>
  ): TimestampedRecord<T> {
    // Lexicographic comparison ensures all leaders make same decision
    if (a.writeId > b.writeId) return a;
    return b;
  }
}
 
// Example: Conflicting updates to user email
const localWrite: TimestampedRecord<{ email: string }> = {
  data: { email: 'alice@new-company.com' },
  timestamp: 1704672000000,
  sourceLeader: 'eu-west',
  writeId: 'write-eu-001'
};
 
const incomingWrite: TimestampedRecord<{ email: string }> = {
  data: { email: 'alice@different-company.com' },
  timestamp: 1704672000500,  // 500ms later
  sourceLeader: 'us-east',
  writeId: 'write-us-001'
};
 
const resolver = new LWWResolver<{ email: string }>();
const winner = resolver.resolve(localWrite, incomingWrite);
// winner.data.email === 'alice@different-company.com' (later timestamp wins)

The Timestamp Problem:

LWW relies on accurate, comparable timestamps. In distributed systems, this is surprisingly difficult:

Physical Clock Skew: Different servers have different clock times, potentially by seconds or more. NTP synchronization helps but cannot guarantee perfect alignment. A write at T=100 on Server A might actually happen after a write at T=105 on Server B due to clock skew.

Consequences:

A genuinely later write might have an earlier timestamp
LWW picks the wrong winner
The 'correct' data is silently lost
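
A small worked example may help here. The sketch below models each write with its true time and the clock offset of the leader that accepted it. These fields are hypothetical and exist only for illustration, since a real system cannot observe true time. The leader whose clock runs 5 seconds fast wins even though its write happened 2 seconds earlier.

```typescript
interface StampedWrite {
  value: string;
  trueTimeMs: number;    // when the write really happened (unobservable in practice)
  clockOffsetMs: number; // how far the accepting leader's clock is from true time
}

// The timestamp a leader records is its own, possibly skewed, clock reading.
function recordedTimestamp(w: StampedWrite): number {
  return w.trueTimeMs + w.clockOffsetMs;
}

// Plain LWW on recorded timestamps.
function lwwWinner(a: StampedWrite, b: StampedWrite): StampedWrite {
  return recordedTimestamp(a) >= recordedTimestamp(b) ? a : b;
}

// Leader A's clock runs 5 seconds fast; leader B's clock is accurate.
const earlierWrite: StampedWrite = { value: 'old address', trueTimeMs: 1000, clockOffsetMs: 5000 };
const laterWrite: StampedWrite = { value: 'new address', trueTimeMs: 3000, clockOffsetMs: 0 };

// Recorded timestamps are 6000 and 3000, so the earlier write wins.
const skewedWinner = lwwWinner(earlierWrite, laterWrite);
console.log(skewedWinner.value);
```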

LWW Failure Scenarios

•Lost update due to clock skew: Alice reads a record, takes 10 seconds to edit, and saves. Bob reads the same record, edits quickly, and saves first. Bob's clock is 15 seconds ahead, so his write carries the later timestamp and wins even though Alice's save actually happened last.
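•Cascading data inconsistency: User updates address, then places order. The address update carries an earlier timestamp than a conflicting write to the same record (for example, because of a processing delay or clock skew), so LWW discards it. Order ships to old address.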
•Silent data loss at scale: With millions of writes per day and 0.1% conflict rate, thousands of writes are discarded daily. Each might represent customer data loss.
•Retroactive corruption: A network-delayed write from yesterday finally arrives with an old timestamp. It might overwrite today's data if timestamps aren't validated for recency.

Mitigating LWW's Weaknesses:

Hybrid Logical Clocks (HLC): Combine physical timestamps with logical counters. Maintains causality order while staying close to wall-clock time.
Server-side timestamping: Accept client writes without timestamps; assign timestamps at the receiving leader. Reduces clock skew to inter-leader differences.
Application-level safety: For critical data, don't use LWW. Route to single leader or use stronger consistency.
Conflict logging: Even with LWW, log all conflicts. Review logs for unexpected patterns or high conflict rates that indicate design issues.
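
The Hybrid Logical Clock idea from the list above can be sketched as follows. This is a minimal illustration of the commonly described HLC update rules, not a production clock: the class name, the `HlcTimestamp` shape, and the injectable `physicalNow` function are assumptions introduced here so the behaviour can be exercised with a fake clock.

```typescript
interface HlcTimestamp {
  wallMs: number;  // largest physical time seen so far
  counter: number; // logical counter that orders events sharing the same wallMs
}

class HybridLogicalClock {
  private last: HlcTimestamp = { wallMs: 0, counter: 0 };
  private readonly physicalNow: () => number;

  constructor(physicalNow: () => number = () => Date.now()) {
    this.physicalNow = physicalNow;
  }

  // Timestamp a local write (or an outgoing replication message).
  now(): HlcTimestamp {
    const pt = this.physicalNow();
    if (pt > this.last.wallMs) {
      this.last = { wallMs: pt, counter: 0 };
    } else {
      // Physical clock has not advanced (or went backwards): bump the counter.
      this.last = { wallMs: this.last.wallMs, counter: this.last.counter + 1 };
    }
    return { ...this.last };
  }

  // Merge a timestamp received from another leader so later local events sort after it.
  receive(remote: HlcTimestamp): HlcTimestamp {
    const pt = this.physicalNow();
    const prev = this.last;
    const wallMs = Math.max(prev.wallMs, remote.wallMs, pt);
    let counter = 0;
    if (wallMs === prev.wallMs && wallMs === remote.wallMs) {
      counter = Math.max(prev.counter, remote.counter) + 1;
    } else if (wallMs === prev.wallMs) {
      counter = prev.counter + 1;
    } else if (wallMs === remote.wallMs) {
      counter = remote.counter + 1;
    }
    this.last = { wallMs, counter };
    return { ...this.last };
  }
}

// Total order on HLC timestamps: wall time first, then counter.
function compareHlc(a: HlcTimestamp, b: HlcTimestamp): number {
  return a.wallMs !== b.wallMs ? a.wallMs - b.wallMs : a.counter - b.counter;
}
```

With this scheme, a leader whose physical clock lags behind a peer still assigns timestamps that sort after anything it has already received from that peer, which is the causality property plain wall-clock LWW lacks.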

When LWW Is Acceptable

LWW is appropriate when: (1) 'Last' genuinely means 'most current truth' (e.g., sensor readings, location updates), (2) Lost updates are tolerable (cache invalidation, session state), (3) Conflicts are rare enough that occasional losses don't impact business. It's inappropriate for financial data, unique constraints, or any data where loss is unacceptable.

Strategy 3: Application-Level Merge Functions

When automatic resolution is too coarse-grained, applications can provide custom merge functions that understand domain semantics. Instead of discarding one write entirely, merge functions combine conflicting writes intelligently.

The Merge Function Contract:

A merge function receives:

The common ancestor (state before either write)
Both conflicting writes

It returns the merged result.

The function must be:

Deterministic: Same inputs always produce same output
Commutative: merge(A, B) = merge(B, A)
Idempotent: merge(X, X) = X
Associative: merge(merge(A, B), C) = merge(A, merge(B, C))
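
These properties can be spot-checked mechanically. The sketch below is a hypothetical test helper (the names `checkMergeLaws`, `setUnion`, and `concatMerge` are introduced here for illustration): it runs a two-argument merge over a handful of sample values and reports which laws fail. It is a sampling check, not a proof.

```typescript
type Merge<T> = (a: T, b: T) => T;

// Structural equality via JSON; adequate for the plain, normalized data used here.
function sameValue<T>(x: T, y: T): boolean {
  return JSON.stringify(x) === JSON.stringify(y);
}

// Returns the names of the laws violated by `merge` on the given samples.
function checkMergeLaws<T>(merge: Merge<T>, samples: T[]): string[] {
  const violations: string[] = [];
  for (const a of samples) {
    if (!sameValue(merge(a, a), a)) violations.push('idempotent');
    for (const b of samples) {
      if (!sameValue(merge(a, b), merge(b, a))) violations.push('commutative');
      for (const c of samples) {
        if (!sameValue(merge(merge(a, b), c), merge(a, merge(b, c)))) {
          violations.push('associative');
        }
      }
    }
  }
  return Array.from(new Set(violations));
}

// A merge that satisfies the contract: sorted set union of tags.
function setUnion(a: string[], b: string[]): string[] {
  return Array.from(new Set([...a, ...b])).sort();
}

// A merge that does not: concatenation depends on order and duplicates values.
function concatMerge(a: string[], b: string[]): string[] {
  return [...a, ...b];
}

// Samples are kept sorted so they are already in setUnion's normal form.
const tagSamples: string[][] = [['a'], ['a', 'b'], ['c'], []];

const unionViolations = checkMergeLaws(setUnion, tagSamples);
const concatViolations = checkMergeLaws(concatMerge, tagSamples);
console.log(unionViolations, concatViolations);
```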

domain-merge-functions.ts
TypeScript
// Example 1: Shopping Cart Merge
// Union of items - never lose an item, quantity is summed
 
interface CartItem {
  productId: string;
  quantity: number;
  addedAt: number;
}
 
interface ShoppingCart {
  userId: string;
  items: CartItem[];
  lastModified: number;
}
 
function mergeShoppingCarts(
  ancestor: ShoppingCart,
  writeA: ShoppingCart,
  writeB: ShoppingCart
): ShoppingCart {
  const mergedItems = new Map<string, CartItem>();
  const ancestorQuantity = new Map<string, number>();
  
  // Start with ancestor items
  for (const item of ancestor.items) {
    mergedItems.set(item.productId, { ...item });
    ancestorQuantity.set(item.productId, item.quantity);
  }
  
  // Apply one write's changes as quantity deltas relative to the ancestor.
  // An item that is simply absent from a write is left untouched, so a
  // removal only takes effect when the write carries quantity <= 0.
  const applyWrite = (write: ShoppingCart): void => {
    for (const item of write.items) {
      const existing = mergedItems.get(item.productId);
      if (!existing) {
        mergedItems.set(item.productId, { ...item });
      } else {
        const delta = item.quantity - (ancestorQuantity.get(item.productId) ?? 0);
        existing.quantity += delta;
        // Keep the earliest addedAt so it does not depend on argument order
        existing.addedAt = Math.min(existing.addedAt, item.addedAt);
      }
    }
  };
  
  applyWrite(writeA);
  applyWrite(writeB);
  
  // Remove items with quantity <= 0
  const finalItems = Array.from(mergedItems.values())
    .filter(item => item.quantity > 0);
  
  return {
    userId: writeA.userId,
    items: finalItems,
    lastModified: Math.max(writeA.lastModified, writeB.lastModified)
  };
}
 
// Example 2: Counter with Increment-Only Semantics
// G-Counter (Grow-only counter) - each leader tracks its own increments
 
interface GCounter {
  counts: Record<string, number>; // leaderId -> count
}
 
function incrementGCounter(counter: GCounter, leaderId: string): GCounter {
  return {
    counts: {
      ...counter.counts,
      [leaderId]: (counter.counts[leaderId] || 0) + 1
    }
  };
}
 
function mergeGCounters(a: GCounter, b: GCounter): GCounter {
  const merged: Record<string, number> = { ...a.counts };
  
  for (const [leaderId, count] of Object.entries(b.counts)) {
    // Take maximum - if B has higher count from a leader, use it
    merged[leaderId] = Math.max(merged[leaderId] || 0, count);
  }
  
  return { counts: merged };
}
 
function getGCounterValue(counter: GCounter): number {
  return Object.values(counter.counts).reduce((sum, c) => sum + c, 0);
}

Domain-Specific Merge Examples:

Merge Functions by Domain
Domain	Conflict Scenario	Merge Strategy
Shopping cart	Items added at different leaders	Union of items; sum quantities per product
User profile	Different fields edited	Field-level merge; take latest per field
Document text	Concurrent edits to text	Operational transformation or CRDT-based merge
Inventory count	Concurrent decrements	Sum decrements; alert if negative (oversold)
Event log	Concurrent event appends	Union; order by timestamp or sequence
Tag/label sets	Tags added at different leaders	Union of tag sets
Like/reaction count	Concurrent increments	G-Counter; each leader tracks own increments
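
As one illustration of the table, the inventory row ("sum decrements; alert if negative") might look like the sketch below. It assumes each leader reports the remaining stock it computed from a shared ancestor value; the function and type names are hypothetical.

```typescript
interface InventoryMergeResult {
  quantity: number;  // merged remaining stock (may be negative)
  oversold: boolean; // true when the combined decrements exceed the stock
}

// Three-way merge: apply both leaders' decrements to the common ancestor.
function mergeInventory(
  ancestorQty: number,
  qtyAtLeaderA: number,
  qtyAtLeaderB: number
): InventoryMergeResult {
  const decrementA = ancestorQty - qtyAtLeaderA;
  const decrementB = ancestorQty - qtyAtLeaderB;
  const quantity = ancestorQty - decrementA - decrementB;
  return { quantity, oversold: quantity < 0 };
}

// 5 in stock; leader A sold 3 (2 left), leader B sold 4 (1 left).
const merged = mergeInventory(5, 2, 1);
console.log(merged); // quantity is 5 - 3 - 4 = -2, so the item is oversold
```

Unlike LWW, neither sale is discarded; the negative result surfaces the business problem (two units oversold) so it can be handled explicitly.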

Field-Level Merge for Complex Objects:

For objects with multiple independent fields, a powerful technique is per-field resolution. Each field has its own timestamp; conflicts are resolved field-by-field rather than document-by-document.

field-level-merge.ts
TypeScript
// Each field carries its own timestamp
interface FieldValue<T> {
  value: T;
  timestamp: number;
  source: string;
}
 
interface UserProfile {
  name: FieldValue<string>;
  email: FieldValue<string>;
  bio: FieldValue<string>;
  avatarUrl: FieldValue<string | null>;
}
 
function mergeUserProfiles(local: UserProfile, incoming: UserProfile): UserProfile {
  return {
    name: resolveField(local.name, incoming.name),
    email: resolveField(local.email, incoming.email),
    bio: resolveField(local.bio, incoming.bio),
    avatarUrl: resolveField(local.avatarUrl, incoming.avatarUrl)
  };
}
 
function resolveField<T>(local: FieldValue<T>, incoming: FieldValue<T>): FieldValue<T> {
  // LWW at field level
  if (incoming.timestamp > local.timestamp) return incoming;
  if (local.timestamp > incoming.timestamp) return local;
  // Tie-breaker: lexicographic source comparison
  return local.source > incoming.source ? local : incoming;
}
 
// Example: Alice (in Tokyo) updates name, Bob (in London) updates bio
// Both see their changes; no data loss despite record-level conflict
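
The Alice/Bob scenario in the comment above can be run end to end. The following self-contained sketch restates a reduced version of the field-level resolver (two fields only, with shortened hypothetical names `Field`, `Profile`, `pickField`, and `mergeProfiles`) and merges the two edits in both directions to show that each leader ends up with the same profile.

```typescript
interface Field<T> {
  value: T;
  timestamp: number;
  source: string;
}

interface Profile {
  name: Field<string>;
  bio: Field<string>;
}

// Field-level LWW with a lexicographic tie-breaker on the source leader.
function pickField<T>(local: Field<T>, incoming: Field<T>): Field<T> {
  if (incoming.timestamp !== local.timestamp) {
    return incoming.timestamp > local.timestamp ? incoming : local;
  }
  return local.source > incoming.source ? local : incoming;
}

function mergeProfiles(local: Profile, incoming: Profile): Profile {
  return {
    name: pickField(local.name, incoming.name),
    bio: pickField(local.bio, incoming.bio)
  };
}

const base: Profile = {
  name: { value: 'Alice', timestamp: 100, source: 'ap-northeast' },
  bio: { value: 'Engineer', timestamp: 100, source: 'ap-northeast' }
};

// Alice (Tokyo) changes the name; Bob (London) changes the bio.
const tokyoCopy: Profile = {
  ...base,
  name: { value: 'Alice Tanaka', timestamp: 200, source: 'ap-northeast' }
};
const londonCopy: Profile = {
  ...base,
  bio: { value: 'Staff engineer', timestamp: 210, source: 'eu-west' }
};

// Each leader merges the other's replicated record into its own.
const atTokyo = mergeProfiles(tokyoCopy, londonCopy);
const atLondon = mergeProfiles(londonCopy, tokyoCopy);
console.log(atTokyo.name.value, '/', atTokyo.bio.value);
```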

Design Data for Mergeability

The ability to merge cleanly is a data model design decision. When designing schemas for multi-leader systems, ask: If two users modify this simultaneously, how should the results combine? Shape your data structures to have natural, sensible merge semantics.

Strategy 4: User-Involved Resolution

For critical or semantically complex conflicts, automatic resolution may be inappropriate. Instead, the system preserves all conflicting versions and surfaces them to users or administrators for manual resolution.

This approach is used when:

The stakes are too high for automatic resolution (financial, legal, medical data)
Domain semantics are too complex for algorithmic merge
Business processes require human judgment
Audit trails demand explicit resolution decisions

User-Involved Resolution Patterns

•Sibling preservation: Store all conflicting versions (siblings). On read, return all siblings. Application/user selects or merges.
•Conflict queue: Conflicts are queued for review. A dashboard shows pending conflicts. Administrators resolve with full context.
•Interactive resolution UI: When a user opens a conflicted document, they see a merge interface (like Git merge conflicts) and explicitly resolve.
•Escalation workflow: Automatic resolution is attempted; failures escalate to humans. Resolution decisions feed back to improve automatic rules.
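
The escalation workflow and conflict queue patterns can be combined in a small in-memory sketch. Everything here is illustrative: `ConflictEscalator`, the `Conflict` shape, and the convention that the automatic merge returns `null` when it cannot decide safely are assumptions, and a real system would persist the queue rather than hold it in memory.

```typescript
interface Conflict<T> {
  key: string;
  versions: T[];
}

// Automatic rule: returns the merged value, or null when it cannot decide safely.
type AutoMerge<T> = (versions: T[]) => T | null;

class ConflictEscalator<T> {
  private pending: Conflict<T>[] = [];
  private readonly autoMerge: AutoMerge<T>;

  constructor(autoMerge: AutoMerge<T>) {
    this.autoMerge = autoMerge;
  }

  // Try automatic resolution first; queue the conflict for humans if it fails.
  handle(conflict: Conflict<T>): T | null {
    const merged = this.autoMerge(conflict.versions);
    if (merged === null) {
      this.pending.push(conflict);
    }
    return merged;
  }

  // What a review dashboard would list.
  pendingConflicts(): Conflict<T>[] {
    return this.pending.slice();
  }

  // Record a human decision and remove the conflict from the queue.
  resolveManually(key: string, chosen: T): T {
    this.pending = this.pending.filter(c => c.key !== key);
    return chosen;
  }
}

// Example rule: auto-resolve only when every version is identical.
const onlyIfIdentical: AutoMerge<string> = versions =>
  versions.length > 0 && versions.every(v => v === versions[0]) ? versions[0] : null;

const escalator = new ConflictEscalator<string>(onlyIfIdentical);
```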

Amazon Dynamo's Sibling Approach:

Amazon's Dynamo (and Riak, which is inspired by it) uses sibling preservation:

Conflicting writes are stored as siblings attached to the same key
On read, all siblings are returned to the client
The client is responsible for merging and writing back the resolved value
The write includes a vector clock that proves it descends from all siblings

This places resolution burden on the application but provides maximum flexibility.
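
How a vector clock "proves" descent can be shown with a short comparison function. This is a generic sketch of vector clock comparison, not Dynamo's or Riak's actual code; the `VectorClock` and `Ordering` types are introduced here for illustration. Two versions are siblings exactly when their clocks are concurrent, and a resolved write supersedes the siblings when its clock is after each of theirs.

```typescript
type VectorClock = Record<string, number>;
type Ordering = 'equal' | 'before' | 'after' | 'concurrent';

// Compare two vector clocks entry by entry; missing entries count as 0.
function compareVectorClocks(a: VectorClock, b: VectorClock): Ordering {
  let aAhead = false;
  let bAhead = false;
  const nodes = Array.from(new Set([...Object.keys(a), ...Object.keys(b)]));
  for (const node of nodes) {
    const countA = a[node] || 0;
    const countB = b[node] || 0;
    if (countA > countB) aAhead = true;
    if (countB > countA) bAhead = true;
  }
  if (aAhead && bAhead) return 'concurrent'; // neither descends from the other: siblings
  if (aAhead) return 'after';                // a descends from b
  if (bAhead) return 'before';               // b descends from a
  return 'equal';
}

// Two leaders accepted writes independently: the versions are siblings.
const siblingA: VectorClock = { 'eu-west': 1 };
const siblingB: VectorClock = { 'us-east': 1 };

// A resolved write merges both clocks and increments the resolver's own entry.
const resolvedClock: VectorClock = { 'eu-west': 1, 'us-east': 1, 'client-7': 1 };

console.log(compareVectorClocks(siblingA, siblingB));      // 'concurrent'
console.log(compareVectorClocks(resolvedClock, siblingA)); // 'after'
```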

sibling-resolution.ts
TypeScript
interface Sibling<T> {
  value: T;
  vectorClock: Record<string, number>;
  source: string;
  timestamp: number;
}
 
interface SiblingSet<T> {
  siblings: Sibling<T>[];
  hasConflict: boolean;
}
 
// Minimal storage contract assumed by this example
interface SiblingStorage<T> {
  getSiblings(key: string): Promise<Sibling<T>[]>;
  put(key: string, sibling: Sibling<T>): Promise<void>;
}
 
class SiblingResolver<T> {
  constructor(
    private readonly storage: SiblingStorage<T>,
    private readonly nodeId: string
  ) {}
  
  // Read returns all siblings if conflict exists
  async read(key: string): Promise<SiblingSet<T>> {
    const siblings = await this.storage.getSiblings(key);
    return {
      siblings,
      hasConflict: siblings.length > 1
    };
  }
  
  // Application must merge and write back
  async resolveConflict(
    key: string,
    resolvedValue: T,
    originalSiblings: Sibling<T>[]
  ): Promise<void> {
    // Merge all sibling vector clocks
    const mergedClock = this.mergeVectorClocks(
      originalSiblings.map(s => s.vectorClock)
    );
    
    // Increment our own counter to indicate we've resolved
    mergedClock[this.nodeId] = (mergedClock[this.nodeId] || 0) + 1;
    
    // Write the resolved value with merged clock
    await this.storage.put(key, {
      value: resolvedValue,
      vectorClock: mergedClock,
      source: this.nodeId,
      timestamp: Date.now()
    });
  }
  
  private mergeVectorClocks(clocks: Record<string, number>[]): Record<string, number> {
    const merged: Record<string, number> = {};
    for (const clock of clocks) {
      for (const [node, count] of Object.entries(clock)) {
        merged[node] = Math.max(merged[node] || 0, count);
      }
    }
    return merged;
  }
}
 
// Application usage (showConflictUI is an application-provided UI hook):
declare function showConflictUI<T>(siblings: Sibling<T>[]): Promise<T>;
 
async function handleDocument<T>(
  resolver: SiblingResolver<T>,
  docId: string
): Promise<T> {
  const result = await resolver.read(docId);
  
  if (result.hasConflict) {
    // Show conflict resolution UI to user
    const userChoice = await showConflictUI(result.siblings);
    await resolver.resolveConflict(docId, userChoice, result.siblings);
    return userChoice;
  }
  return result.siblings[0].value;
}

Sibling Explosion Risk

Without timely resolution, siblings can accumulate. A frequently-conflicted key might grow to hundreds of siblings, causing read amplification and storage bloat. Implement sibling limits and alerting. Consider automatic fallback (e.g., LWW after 1 hour unresolved) for non-critical data.
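
A sibling limit with an LWW fallback might be sketched as below. The limit of 10 siblings and the helper names are assumptions chosen for illustration; the one-hour threshold mirrors the example in the paragraph above. This fallback is lossy by design, so it should only apply to data where LWW is already acceptable.

```typescript
interface StoredSibling<T> {
  value: T;
  timestamp: number; // when this sibling was written (ms)
}

const MAX_SIBLINGS = 10;                  // hypothetical per-key limit
const MAX_UNRESOLVED_MS = 60 * 60 * 1000; // 1 hour

// Returns the siblings to keep. Collapses to the newest sibling when the set
// has grown too large or has stayed unresolved for too long.
function capSiblings<T>(siblings: StoredSibling<T>[], nowMs: number): StoredSibling<T>[] {
  if (siblings.length <= 1) return siblings;

  const oldest = Math.min(...siblings.map(s => s.timestamp));
  const tooMany = siblings.length > MAX_SIBLINGS;
  const tooOld = nowMs - oldest > MAX_UNRESOLVED_MS;
  if (!tooMany && !tooOld) return siblings;

  // LWW fallback: keep only the sibling with the latest timestamp.
  // On equal timestamps the first one encountered is kept; a real system
  // would need a deterministic tie-breaker so all leaders agree.
  const newest = siblings.reduce((best, s) => (s.timestamp > best.timestamp ? s : best));
  return [newest];
}
```

A caller would typically also emit a metric or alert whenever the fallback fires, since each collapse discards conflicting versions.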

Choosing the Right Resolution Strategy

Different data types and use cases call for different resolution strategies. Often, a system uses multiple strategies for different categories of data.

Decision Framework:

Resolution Strategy Selection Guide
Data Characteristic	Recommended Strategy	Rationale
Immutable / append-only	No conflict possible	Once written, never modified; only conflict is creation race
Latest-value semantics (sensor data, location)	Last-Write-Wins	'Latest' is genuinely 'most correct'
Additive / accumulative (counters, sets)	Merge function (CRDT-style)	Natural additive semantics allow lossless merge
Independent fields (user profile)	Field-level LWW	Reduces conflict scope; parallel edits often don't conflict
Complex business logic (orders, workflows)	Application-level merge	Only the application understands correct merge semantics
Critical / irreversible (payments, legal)	User-involved resolution	Stakes too high for automatic decisions
Collaborative content (documents)	OT / CRDT with UI support	Specialized algorithms preserve all edits

Hybrid Approaches in Practice:

Real-world systems rarely use a single strategy. Consider a user profile system:

Multi-Strategy Example: User Profile

•Display name, bio, avatar: Field-level LWW — independent fields, latest edit wins per field
•Email address: User-involved resolution — critical for authentication; must not auto-resolve incorrectly
•Favorite items list: Merge function — union of favorites from both writes
•Notification preferences: Field-level LWW per preference — each toggle is independent
•Account balance: Not replicated via multi-leader — uses single-leader strong consistency
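
One way to wire several strategies together is a per-field strategy table, sketched below. The field names follow the list above, but the `Strategy` type, the table, and `resolveProfileField` are hypothetical, and unknown fields are escalated to review as a conservative default.

```typescript
type Strategy = 'field-lww' | 'union' | 'manual';

// Which strategy applies to which profile field (illustrative mapping).
const PROFILE_STRATEGIES: Partial<Record<string, Strategy>> = {
  displayName: 'field-lww',
  bio: 'field-lww',
  favorites: 'union',
  email: 'manual'
};

interface FieldVersion {
  value: string | string[];
  timestamp: number;
}

type Resolution =
  | { status: 'resolved'; value: string | string[] }
  | { status: 'needs-review'; candidates: FieldVersion[] };

function toList(value: string | string[]): string[] {
  return Array.isArray(value) ? value : [value];
}

function resolveProfileField(field: string, a: FieldVersion, b: FieldVersion): Resolution {
  // Fields without a configured strategy are escalated rather than guessed at.
  const strategy: Strategy = PROFILE_STRATEGIES[field] ?? 'manual';
  switch (strategy) {
    case 'field-lww':
      // Ties keep `a`; a real resolver needs a deterministic tie-breaker.
      return { status: 'resolved', value: (b.timestamp > a.timestamp ? b : a).value };
    case 'union': {
      const items = [...toList(a.value), ...toList(b.value)];
      return { status: 'resolved', value: Array.from(new Set(items)).sort() };
    }
    case 'manual':
      return { status: 'needs-review', candidates: [a, b] };
  }
}
```

Account balance is deliberately absent from the table: as the list notes, it is not replicated through multi-leader at all.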

Categorize Before Designing

Before implementing multi-leader replication, categorize every data type by: (1) Conflict likelihood, (2) Conflict impact if auto-resolved incorrectly, (3) Natural merge semantics if any. This categorization drives strategy selection and may reveal data that shouldn't use multi-leader at all.

Summary: Conflict Resolution Strategies

We've explored the spectrum of conflict resolution strategies, from avoidance through automatic resolution to human-involved processes. Let's consolidate the key insights:

Key Takeaways

•Conflict avoidance through routing is the first line of defense—route related writes to the same leader when possible.
•Last-Write-Wins (LWW) is simple and guarantees convergence but loses data and is susceptible to clock skew issues.
•Application-level merge functions enable domain-aware resolution, preserving more data at the cost of implementation complexity.
•User-involved resolution is appropriate for critical data where automatic resolution risk is unacceptable.
•Field-level resolution reduces conflict scope—parallel edits to different fields of the same record need not conflict.
•Real systems use multiple strategies for different data categories based on conflict impact and natural merge semantics.

What's Next:

We've examined resolution strategies in the abstract. The next page dives deep into Last-Write-Wins—the most commonly used automatic strategy—exploring its implementation details, timestamp mechanisms, and production considerations.

System Design (HLD)Multi-Leader Replication

Multi-Leader Replication

LevelAdvanced

Duration75 mins

TopicMulti-Leader Replication

3 / 5

Conflict Resolution Strategies

The Inevitable Challenge of Concurrent Writes

Imagine two users, Alice in Tokyo and Bob in London, simultaneously editing the same document's title:

Alice's edit: "Project Report" → "Q4 Project Report"
Bob's edit: "Project Report" → "Final Project Report"

What You Will Learn

Strategy 1: Conflict Avoidance

The most effective conflict resolution strategy is avoiding conflicts entirely. While not always possible, careful system design can eliminate most conflicts before they occur.

Principle: Route Related Writes to the Same Leader

If all writes to a given record always flow through the same leader, conflicts for that record cannot occur. This effectively creates 'ownership' of records by leaders.

Implementation approaches:

Conflict Avoidance Techniques

•User-to-leader affinity: All writes from a specific user always route to the same leader. Since a user can only perform one write at a time, conflicts on user-scoped data are eliminated.
•Document-to-leader affinity: Each document (or aggregate root) is assigned a 'home' leader. All writes to that document route there, regardless of user location. Other leaders hold read replicas.
•Geographic partitioning: Data is naturally partitioned by geography. EU users' data writes to EU leader; APAC users' data writes to APAC leader. Cross-region data is rare or read-only from remote locations.
•Time-window partitioning: Current-period data (today's transactions) writes to one leader; historical queries go to replicas. Conflicts are avoided by temporal separation.

User Affinity in Practice:

user-affinity-routing.ts
TypeScript
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
interface UserRoute {
  userId: string;
  homeLeader: string; // Datacenter/leader ID
  lastSeen: Date;
}
 
class ConflictAvoidanceRouter {
  private userRoutes: Map<string, UserRoute> = new Map();
  private leaders = ['us-east', 'eu-west', 'ap-northeast'];
  
  // Assign user to their nearest leader on first access
  assignHomeLeader(userId: string, clientRegion: string): string {
    const existing = this.userRoutes.get(userId);
    if (existing) {
      return existing.homeLeader;
    }
    
    // Map client region to nearest leader
    const homeLeader = this.nearestLeader(clientRegion);
    this.userRoutes.set(userId, {
      userId,
      homeLeader,
      lastSeen: new Date()
    });
    
    return homeLeader;
  }
  
  // Route write to user's home leader, even if not nearest
  routeWrite(userId: string, clientRegion: string): string {
    const route = this.userRoutes.get(userId);
    if (!route) {
      return this.assignHomeLeader(userId, clientRegion);
    }
    
    // Always route to home leader for writes (conflict avoidance)
    // Accept higher latency for distant users
    return route.homeLeader;
  }
  
  // Reads can go to any leader for lower latency
  routeRead(userId: string, clientRegion: string): string {
    return this.nearestLeader(clientRegion);
  }
  
  // Reassign home leader (e.g., user relocated)
  migrateUser(userId: string, newHomeLeader: string): void {
    const route = this.userRoutes.get(userId);
    if (route) {
      // Wait for replication to complete before switching
      route.homeLeader = newHomeLeader;
    }
  }
  
  private nearestLeader(region: string): string {
    const regionMap: Record<string, string> = {
      'us': 'us-east',
      'eu': 'eu-west',
      'apac': 'ap-northeast'
    };
    return regionMap[region] || 'us-east';
  }
}

Trade-offs of Conflict Avoidance:

Approach	Benefit	Cost
User affinity	Eliminates user-data conflicts	Users far from home leader experience write latency
Document affinity	Eliminates document-level conflicts	Documents locked to region; poor for collaborative editing across regions
Geographic partitioning	Natural fit for location-bound data	Doesn't work for globally shared data

When Avoidance Isn't Possible:

Conflict avoidance works when data has a natural owner or locality. It breaks down for:

Shared resources: Inventory, seat availability, limited quantities
Collaborative editing: Multiple users editing the same document from different regions
Global aggregates: Counters, balances, statistics updated from everywhere

For these cases, we need resolution strategies.

Design for Avoidance First

Strategy 2: Automatic Resolution Approaches

The Resolution Decision Matrix:

Automatic Resolution Strategies
Strategy	Decision Basis	Guarantees	Data Loss Risk
Last-Write-Wins (LWW)	Timestamp comparison	Convergence guaranteed	High - earlier write always discarded
Highest-ID-Wins	Deterministic ID ordering	Convergence guaranteed	High - lower ID always discarded
Leader Priority	Pre-assigned leader ranking	Convergence guaranteed	Moderate - predictable loss from lower-priority leaders
Field-Level Merge	Per-field comparison	Convergence guaranteed	Lower - only conflicting fields resolve
Union/Merge	Combine values	Convergence guaranteed	None - all data preserved but may be inconsistent

Critical Insight: All Automatic Resolution Involves Trade-offs

Every automatic strategy trades something:

LWW trades accuracy: The 'later' write might be logically incorrect (based on stale data)
Union trades consistency: Combined data might violate business rules
Field-level merge trades simplicity: Complex to implement correctly

There is no universally correct automatic resolution. The right choice depends on your data semantics and tolerance for different failure modes.

The Illusion of 'Just Working'

Deep Dive: Last-Write-Wins (LWW)

Last-Write-Wins (LWW) is the most common automatic resolution strategy, used by systems like Apache Cassandra, DynamoDB (optionally), and many custom multi-leader implementations.

The Algorithm:

Each write carries a timestamp (or monotonically increasing sequence number)
When a conflict is detected, compare timestamps
The write with the higher (later) timestamp wins
The losing write is discarded entirely

Implementation:

last-write-wins.ts
TypeScript
interface TimestampedRecord<T> {
  data: T;
  timestamp: number;      // Logical or physical timestamp
  sourceLeader: string;   // Leader that accepted the write
  writeId: string;        // Unique write identifier for tie-breaking
}
 
class LWWResolver<T> {
  /**
   * Resolve conflict between local and incoming records.
   * Returns the winning record.
   */
  resolve(
    local: TimestampedRecord<T>,
    incoming: TimestampedRecord<T>
  ): TimestampedRecord<T> {
    // Primary comparison: timestamp
    if (incoming.timestamp > local.timestamp) {
      return incoming;
    }
    if (local.timestamp > incoming.timestamp) {
      return local;
    }
    
    // Timestamps equal: need deterministic tie-breaker
    // Common approaches:
    // 1. Compare leader IDs lexicographically
    // 2. Compare write IDs
    // 3. Use a pre-defined leader priority
    
    return this.tieBreak(local, incoming);
  }
  
  private tieBreak(
    a: TimestampedRecord<T>,
    b: TimestampedRecord<T>
  ): TimestampedRecord<T> {
    // Lexicographic comparison ensures all leaders make same decision
    if (a.writeId > b.writeId) return a;
    return b;
  }
}
 
// Example: Conflicting updates to user email
const localWrite: TimestampedRecord<{ email: string }> = {
  data: { email: 'alice@new-company.com' },
  timestamp: 1704672000000,
  sourceLeader: 'eu-west',
  writeId: 'write-eu-001'
};
 
const incomingWrite: TimestampedRecord<{ email: string }> = {
  data: { email: 'alice@different-company.com' },
  timestamp: 1704672000500,  // 500ms later
  sourceLeader: 'us-east',
  writeId: 'write-us-001'
};
 
const resolver = new LWWResolver<{ email: string }>();
const winner = resolver.resolve(localWrite, incomingWrite);
// winner.data.email === 'alice@different-company.com' (later timestamp wins)

The Timestamp Problem:

LWW relies on accurate, comparable timestamps. In distributed systems, this is surprisingly difficult:

Consequences:

A genuinely later write might have an earlier timestamp
LWW then keeps the wrong write and discards the one that should have won
The 'correct' data is silently lost
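
The consequences above can be reproduced with a few lines of arithmetic. This is a sketch with made-up numbers: `trueTimeMs` stands for the real moment a write happened (which no system can observe directly) and `clockOffsetMs` is an assumed error on the accepting leader's clock.

```typescript
interface StampedWrite {
  value: string;
  trueTimeMs: number;    // when the write really happened (illustrative only)
  clockOffsetMs: number; // assumed error of the accepting leader's clock
}

// The timestamp a leader attaches: true time distorted by its clock error.
function observedTimestamp(w: StampedWrite): number {
  return w.trueTimeMs + w.clockOffsetMs;
}

// Plain LWW on the observed timestamps.
function lwwWinner(a: StampedWrite, b: StampedWrite): StampedWrite {
  return observedTimestamp(a) >= observedTimestamp(b) ? a : b;
}

// The first write lands on a leader whose clock runs 300 ms fast.
const first: StampedWrite = { value: "old address", trueTimeMs: 1000, clockOffsetMs: 300 };
// The second write happens 200 ms later on a leader running 50 ms slow.
const second: StampedWrite = { value: "new address", trueTimeMs: 1200, clockOffsetMs: -50 };

const winner = lwwWinner(first, second);
console.log(winner.value); // "old address": 1300 beats 1150
```

The genuinely later write loses because a 350 ms clock disagreement exceeds the 200 ms gap between the two writes.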

LWW Failure Scenarios

•Lost update due to clock skew: Alice and Bob read the same record. Bob edits quickly and saves; Alice takes 10 seconds and saves after him. The clock on Bob's leader runs 15 seconds ahead, so his write carries the later timestamp and wins even though Alice's save actually happened later.
•Cascading data inconsistency: User updates address, then places order. Address update has earlier timestamp due to processing delay. Order ships to old address.
•Silent data loss at scale: With millions of writes per day and 0.1% conflict rate, thousands of writes are discarded daily. Each might represent customer data loss.
•Retroactive corruption: A network-delayed write from yesterday finally arrives with an old timestamp. It might overwrite today's data if timestamps aren't validated for recency.

Mitigating LWW's Weaknesses:

Hybrid Logical Clocks (HLC): Combine physical timestamps with logical counters. Maintains causality order while staying close to wall-clock time.
Server-side timestamping: Accept client writes without timestamps; assign timestamps at the receiving leader. Reduces clock skew to inter-leader differences.
Application-level safety: For critical data, don't use LWW. Route to single leader or use stronger consistency.
Conflict logging: Even with LWW, log all conflicts. Review logs for unexpected patterns or high conflict rates that indicate design issues.
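
As an illustration of the first mitigation, here is a minimal Hybrid Logical Clock sketch following the commonly described update rules (take the maximum of the known wall-clock values, and bump a counter when the wall-clock component does not advance). The class and method names are assumptions made for this example, and the clock source is injectable so the behavior is deterministic.

```typescript
interface HlcTimestamp {
  wallMs: number;  // physical component
  counter: number; // logical component
}

class HybridLogicalClock {
  private wallMs = 0;
  private counter = 0;
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Call when this leader accepts a local write.
  tick(): HlcTimestamp {
    const physical = this.now();
    if (physical > this.wallMs) {
      this.wallMs = physical;
      this.counter = 0;
    } else {
      this.counter += 1;
    }
    return { wallMs: this.wallMs, counter: this.counter };
  }

  // Call when a replicated write arrives from another leader.
  receive(remote: HlcTimestamp): HlcTimestamp {
    const physical = this.now();
    const prevWall = this.wallMs;
    const maxWall = Math.max(prevWall, remote.wallMs, physical);
    if (maxWall === prevWall && maxWall === remote.wallMs) {
      this.counter = Math.max(this.counter, remote.counter) + 1;
    } else if (maxWall === prevWall) {
      this.counter += 1;
    } else if (maxWall === remote.wallMs) {
      this.counter = remote.counter + 1;
    } else {
      this.counter = 0;
    }
    this.wallMs = maxWall;
    return { wallMs: this.wallMs, counter: this.counter };
  }
}

// Orders by wall-clock component first, then by counter.
function compareHlc(a: HlcTimestamp, b: HlcTimestamp): number {
  return a.wallMs !== b.wallMs ? a.wallMs - b.wallMs : a.counter - b.counter;
}

// Leader B's physical clock lags leader A's by 100 ms.
const leaderA = new HybridLogicalClock(() => 1000);
const leaderB = new HybridLogicalClock(() => 900);
const writeOnA = leaderA.tick();  // { wallMs: 1000, counter: 0 }
leaderB.receive(writeOnA);        // B learns about A's write
const writeOnB = leaderB.tick();  // { wallMs: 1000, counter: 2 }
console.log(compareHlc(writeOnB, writeOnA) > 0); // true
```

B's write is causally after A's, and its timestamp sorts after A's even though B's physical clock is behind. Truly concurrent writes still need a tie-breaker; HLC only guarantees that causally related writes are ordered correctly.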

When LWW Is Acceptable

Strategy 3: Application-Level Merge Functions

The Merge Function Contract:

A merge function receives:

The common ancestor (state before either write)
Both conflicting writes

It returns the merged result.

The function must be:

Deterministic: Same inputs always produce same output
Commutative: merge(A, B) = merge(B, A)
Idempotent: merge(X, X) = X
Associative: merge(merge(A, B), C) = merge(A, merge(B, C))
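
These properties are easiest to state for a two-argument merge. The sketch below spot-checks them for a set-union merge over tags; `mergeTags` and `sameSet` are names invented for this example, and a handful of sample inputs is a sanity check, not a proof.

```typescript
// Two-argument merge over tag sets: plain set union.
function mergeTags(a: ReadonlySet<string>, b: ReadonlySet<string>): Set<string> {
  const out = new Set<string>(a);
  b.forEach(tag => out.add(tag));
  return out;
}

// Set equality helper: same size and every element of a is in b.
function sameSet(a: ReadonlySet<string>, b: ReadonlySet<string>): boolean {
  return a.size === b.size && Array.from(a).every(tag => b.has(tag));
}

const tagsA = new Set(["urgent", "billing"]);
const tagsB = new Set(["billing", "eu"]);
const tagsC = new Set(["archived"]);

const commutative = sameSet(mergeTags(tagsA, tagsB), mergeTags(tagsB, tagsA));
const idempotent = sameSet(mergeTags(tagsA, tagsA), tagsA);
const associative = sameSet(
  mergeTags(mergeTags(tagsA, tagsB), tagsC),
  mergeTags(tagsA, mergeTags(tagsB, tagsC))
);

console.log({ commutative, idempotent, associative }); // all true for set union
```

Checks like these fit well in a unit test suite: a merge function that fails any of them can leave leaders with different final states depending on the order in which replicated writes arrive.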

domain-merge-functions.ts
TypeScript
// Example 1: Shopping Cart Merge
// Union of items - never lose an item, quantity is summed
 
interface CartItem {
  productId: string;
  quantity: number;
  addedAt: number;
}
 
interface ShoppingCart {
  userId: string;
  items: CartItem[];
  lastModified: number;
}
 
function mergeShoppingCarts(
  ancestor: ShoppingCart,
  writeA: ShoppingCart,
  writeB: ShoppingCart
): ShoppingCart {
  const mergedItems = new Map<string, CartItem>();
  
  // Start with ancestor items
  for (const item of ancestor.items) {
    mergedItems.set(item.productId, { ...item });
  }
  
  // Apply each write's quantity change relative to the ancestor.
  // Note: an item that is in the ancestor but missing from a write is kept;
  // detecting removals would need explicit tombstones.
  for (const write of [writeA, writeB]) {
    for (const item of write.items) {
      const existing = mergedItems.get(item.productId);
      if (!existing) {
        mergedItems.set(item.productId, { ...item });
        continue;
      }
      // Quantity changes: add delta from ancestor
      const ancestorItem = ancestor.items.find(i => i.productId === item.productId);
      existing.quantity += item.quantity - (ancestorItem?.quantity ?? 0);
      // Keep the earliest addedAt so this field does not depend on argument order
      existing.addedAt = Math.min(existing.addedAt, item.addedAt);
    }
  }
  
  // Remove items with quantity <= 0
  const finalItems = Array.from(mergedItems.values())
    .filter(item => item.quantity > 0);
  
  return {
    userId: writeA.userId,
    items: finalItems,
    lastModified: Math.max(writeA.lastModified, writeB.lastModified)
  };
}
 
// Example 2: Counter with Increment-Only Semantics
// G-Counter (Grow-only counter) - each leader tracks its own increments
 
interface GCounter {
  counts: Record<string, number>; // leaderId -> count
}
 
function incrementGCounter(counter: GCounter, leaderId: string): GCounter {
  return {
    counts: {
      ...counter.counts,
      [leaderId]: (counter.counts[leaderId] || 0) + 1
    }
  };
}
 
function mergeGCounters(a: GCounter, b: GCounter): GCounter {
  const merged: Record<string, number> = { ...a.counts };
  
  for (const [leaderId, count] of Object.entries(b.counts)) {
    // Take maximum - if B has higher count from a leader, use it
    merged[leaderId] = Math.max(merged[leaderId] || 0, count);
  }
  
  return { counts: merged };
}
 
function getGCounterValue(counter: GCounter): number {
  return Object.values(counter.counts).reduce((sum, c) => sum + c, 0);
}
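
A short usage sketch may help show why the per-leader maximum is safe. It re-declares a compact G-Counter so the snippet stands alone; the leader names are hypothetical.

```typescript
type Counter = Record<string, number>; // leaderId -> increments accepted there

function increment(c: Counter, leaderId: string): Counter {
  return { ...c, [leaderId]: (c[leaderId] ?? 0) + 1 };
}

function mergeCounters(a: Counter, b: Counter): Counter {
  const out: Counter = { ...a };
  for (const id of Object.keys(b)) {
    out[id] = Math.max(out[id] ?? 0, b[id] ?? 0);
  }
  return out;
}

function total(c: Counter): number {
  return Object.keys(c).reduce((sum, id) => sum + (c[id] ?? 0), 0);
}

// Two likes are accepted in Tokyo and one in London, concurrently.
let tokyoState: Counter = {};
let londonState: Counter = {};
tokyoState = increment(increment(tokyoState, "tokyo"), "tokyo");
londonState = increment(londonState, "london");

const mergedAtTokyo = mergeCounters(tokyoState, londonState);
const mergedAtLondon = mergeCounters(londonState, tokyoState);
// Re-delivering London's state a second time changes nothing.
const redelivered = mergeCounters(mergedAtTokyo, londonState);

console.log(total(mergedAtTokyo), total(mergedAtLondon), total(redelivered)); // 3 3 3
```

Each leader only ever raises its own slot, so taking the maximum per slot never drops an increment and never double-counts a replayed message.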

Domain-Specific Merge Examples:

Merge Functions by Domain
Domain	Conflict Scenario	Merge Strategy
Shopping cart	Items added at different leaders	Union of items; sum quantities per product
User profile	Different fields edited	Field-level merge; take latest per field
Document text	Concurrent edits to text	Operational transformation or CRDT-based merge
Inventory count	Concurrent decrements	Sum decrements; alert if negative (oversold)
Event log	Concurrent event appends	Union; order by timestamp or sequence
Tag/label sets	Tags added at different leaders	Union of tag sets
Like/reaction count	Concurrent increments	G-Counter; each leader tracks own increments
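
The inventory row can be sketched as a three-way merge on a single number. The function name and result shape are assumptions for illustration; how an oversold result is handled (backorder, cancel, notify) is a business decision outside this snippet.

```typescript
interface InventoryMergeResult {
  quantity: number;  // merged stock level, may be negative
  oversold: boolean; // true when combined decrements exceed the available stock
}

function mergeInventory(
  ancestorQty: number,
  qtyAtLeaderA: number,
  qtyAtLeaderB: number
): InventoryMergeResult {
  // Each leader's change relative to the common ancestor.
  const deltaA = qtyAtLeaderA - ancestorQty;
  const deltaB = qtyAtLeaderB - ancestorQty;
  // Apply both changes instead of picking one side.
  const quantity = ancestorQty + deltaA + deltaB;
  return { quantity, oversold: quantity < 0 };
}

// 10 in stock; one leader sells 3 (leaving 7), the other sells 4 (leaving 6).
console.log(mergeInventory(10, 7, 6)); // { quantity: 3, oversold: false }

// 5 in stock; the same two sales now oversell by 2.
console.log(mergeInventory(5, 2, 1)); // { quantity: -2, oversold: true }
```

Unlike LWW, neither sale is discarded; the merge surfaces the oversell so the application can react to it.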

Field-Level Merge for Complex Objects:

For objects with multiple independent fields, a powerful technique is per-field resolution. Each field has its own timestamp; conflicts are resolved field-by-field rather than document-by-document.

field-level-merge.ts
TypeScript
// Each field carries its own timestamp
interface FieldValue<T> {
  value: T;
  timestamp: number;
  source: string;
}
 
interface UserProfile {
  name: FieldValue<string>;
  email: FieldValue<string>;
  bio: FieldValue<string>;
  avatarUrl: FieldValue<string | null>;
}
 
function mergeUserProfiles(local: UserProfile, incoming: UserProfile): UserProfile {
  return {
    name: resolveField(local.name, incoming.name),
    email: resolveField(local.email, incoming.email),
    bio: resolveField(local.bio, incoming.bio),
    avatarUrl: resolveField(local.avatarUrl, incoming.avatarUrl)
  };
}
 
function resolveField<T>(local: FieldValue<T>, incoming: FieldValue<T>): FieldValue<T> {
  // LWW at field level
  if (incoming.timestamp > local.timestamp) return incoming;
  if (local.timestamp > incoming.timestamp) return local;
  // Tie-breaker: lexicographic source comparison
  return local.source > incoming.source ? local : incoming;
}
 
// Example: Alice (in Tokyo) updates name, Bob (in London) updates bio
// Both see their changes; no data loss despite record-level conflict
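
The scenario in the closing comment can be traced with concrete values. This standalone sketch uses a reduced two-field profile and invented timestamps; `StampedField`, `MiniProfile`, and `pick` are names assumed for the example.

```typescript
interface StampedField<T> {
  value: T;
  timestamp: number;
  source: string;
}

interface MiniProfile {
  name: StampedField<string>;
  bio: StampedField<string>;
}

// Field-level LWW with a lexicographic tie-breaker on the source leader.
function pick<T>(local: StampedField<T>, incoming: StampedField<T>): StampedField<T> {
  if (incoming.timestamp !== local.timestamp) {
    return incoming.timestamp > local.timestamp ? incoming : local;
  }
  return local.source > incoming.source ? local : incoming;
}

function mergeProfiles(local: MiniProfile, incoming: MiniProfile): MiniProfile {
  return {
    name: pick(local.name, incoming.name),
    bio: pick(local.bio, incoming.bio),
  };
}

const baseProfile: MiniProfile = {
  name: { value: "Alice", timestamp: 100, source: "tokyo" },
  bio: { value: "Engineer", timestamp: 100, source: "tokyo" },
};

// The Tokyo leader accepts a name change at t=200.
const atTokyo: MiniProfile = {
  ...baseProfile,
  name: { value: "Alice K.", timestamp: 200, source: "tokyo" },
};
// The London leader accepts a bio change at t=210.
const atLondon: MiniProfile = {
  ...baseProfile,
  bio: { value: "Staff engineer", timestamp: 210, source: "london" },
};

const viewFromTokyo = mergeProfiles(atTokyo, atLondon);
const viewFromLondon = mergeProfiles(atLondon, atTokyo);
console.log(viewFromTokyo.name.value, "|", viewFromTokyo.bio.value);
console.log(viewFromLondon.name.value, "|", viewFromLondon.bio.value);
```

Both leaders end up with the new name and the new bio. Record-level LWW would have kept only the London write and silently dropped the name change.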

Design Data for Mergeability

Strategy 4: User-Involved Resolution

This approach is used when:

The stakes are too high for automatic resolution (financial, legal, medical data)
Domain semantics are too complex for algorithmic merge
Business processes require human judgment
Audit trails demand explicit resolution decisions

User-Involved Resolution Patterns

•Sibling preservation: Store all conflicting versions (siblings). On read, return all siblings. Application/user selects or merges.
•Conflict queue: Conflicts are queued for review. A dashboard shows pending conflicts. Administrators resolve with full context.
•Interactive resolution UI: When a user opens a conflicted document, they see a merge interface (like Git merge conflicts) and explicitly resolve.
•Escalation workflow: Automatic resolution is attempted; failures escalate to humans. Resolution decisions feed back to improve automatic rules.

Amazon Dynamo's Sibling Approach:

Amazon's Dynamo (and Riak, which is inspired by it) uses sibling preservation:

Conflicting writes are stored as siblings attached to the same key
On read, all siblings are returned to the client
The client is responsible for merging and writing back the resolved value
The write includes a vector clock that proves it descends from all siblings

This places resolution burden on the application but provides maximum flexibility.
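
Whether two versions are siblings, and whether a resolved write "descends from all siblings", both come down to comparing vector clocks. The following is a minimal sketch of that comparison; the function names are assumptions and not taken from Dynamo or Riak.

```typescript
type VectorClock = Record<string, number>;
type ClockOrder = "before" | "after" | "equal" | "concurrent";

function compareClocks(a: VectorClock, b: VectorClock): ClockOrder {
  let aAhead = false;
  let bAhead = false;
  // Visit every node mentioned by either clock; a missing entry counts as 0.
  const nodes = Object.keys(a).concat(Object.keys(b));
  for (const node of nodes) {
    const countA = a[node] ?? 0;
    const countB = b[node] ?? 0;
    if (countA > countB) aAhead = true;
    if (countB > countA) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent"; // neither saw the other: siblings
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

// A resolved write supersedes the siblings only if it is at or after each of them.
function descendsFromAll(resolved: VectorClock, siblings: VectorClock[]): boolean {
  return siblings.every(s => {
    const order = compareClocks(resolved, s);
    return order === "after" || order === "equal";
  });
}

const siblingA: VectorClock = { tokyo: 2, london: 1 };
const siblingB: VectorClock = { tokyo: 1, london: 2 };
const resolvedClock: VectorClock = { tokyo: 3, london: 2 };

console.log(compareClocks(siblingA, siblingB));                    // "concurrent"
console.log(descendsFromAll(resolvedClock, [siblingA, siblingB])); // true
```

A store that receives `resolvedClock` can safely drop both siblings, because the clock shows the writer had seen each of them.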

sibling-resolution.ts
TypeScript
interface Sibling<T> {
  value: T;
  vectorClock: Record<string, number>;
  source: string;
  timestamp: number;
}
 
interface SiblingSet<T> {
  siblings: Sibling<T>[];
  hasConflict: boolean;
}
 
// Minimal storage interface assumed by this example
interface SiblingStorage<T> {
  getSiblings(key: string): Promise<Sibling<T>[]>;
  put(key: string, sibling: Sibling<T>): Promise<void>;
}

class SiblingResolver<T> {
  constructor(
    private readonly storage: SiblingStorage<T>,
    private readonly nodeId: string
  ) {}

  // Read returns all siblings if conflict exists
  async read(key: string): Promise<SiblingSet<T>> {
    const siblings = await this.storage.getSiblings(key);
    return {
      siblings,
      hasConflict: siblings.length > 1
    };
  }

  // Application must merge and write back
  async resolveConflict(
    key: string,
    resolvedValue: T,
    originalSiblings: Sibling<T>[]
  ): Promise<void> {
    // Merge all sibling vector clocks
    const mergedClock = this.mergeVectorClocks(
      originalSiblings.map(s => s.vectorClock)
    );

    // Increment our own counter to indicate we've resolved
    mergedClock[this.nodeId] = (mergedClock[this.nodeId] || 0) + 1;

    // Write the resolved value with merged clock
    await this.storage.put(key, {
      value: resolvedValue,
      vectorClock: mergedClock,
      source: this.nodeId,
      timestamp: Date.now()
    });
  }

  // Element-wise maximum across all sibling clocks
  private mergeVectorClocks(clocks: Record<string, number>[]): Record<string, number> {
    const merged: Record<string, number> = {};
    for (const clock of clocks) {
      for (const [node, count] of Object.entries(clock)) {
        merged[node] = Math.max(merged[node] || 0, count);
      }
    }
    return merged;
  }
}

// Application usage. showConflictUI is supplied by the application (assumed here);
// it presents the siblings and resolves to the value the user chose or merged.
declare function showConflictUI<T>(siblings: Sibling<T>[]): Promise<T>;

async function handleDocument<T>(
  resolver: SiblingResolver<T>,
  docId: string
): Promise<T> {
  const result = await resolver.read(docId);

  if (result.hasConflict) {
    // Show conflict resolution UI to user
    const userChoice = await showConflictUI(result.siblings);
    await resolver.resolveConflict(docId, userChoice, result.siblings);
    return userChoice;
  }
  // No conflict: the single stored version is the current value
  return result.siblings[0].value;
}

Sibling Explosion Risk

Choosing the Right Resolution Strategy

Different data types and use cases call for different resolution strategies. Often, a system uses multiple strategies for different categories of data.

Decision Framework:

Resolution Strategy Selection Guide
Data Characteristic	Recommended Strategy	Rationale
Immutable / append-only	No resolution needed	Once written, never modified; the only possible conflict is a creation race
Latest-value semantics (sensor data, location)	Last-Write-Wins	'Latest' is genuinely 'most correct'
Additive / accumulative (counters, sets)	Merge function (CRDT-style)	Natural additive semantics allow lossless merge
Independent fields (user profile)	Field-level LWW	Reduces conflict scope; parallel edits often don't conflict
Complex business logic (orders, workflows)	Application-level merge	Only the application understands correct merge semantics
Critical / irreversible (payments, legal)	User-involved resolution	Stakes too high for automatic decisions
Collaborative content (documents)	OT / CRDT with UI support	Specialized algorithms preserve all edits

Hybrid Approaches in Practice:

Real-world systems rarely use a single strategy. Consider a user profile system:

Multi-Strategy Example: User Profile

•Display name, bio, avatar: Field-level LWW — independent fields, latest edit wins per field
•Email address: User-involved resolution — critical for authentication; must not auto-resolve incorrectly
•Favorite items list: Merge function — union of favorites from both writes
•Notification preferences: Field-level LWW per preference — each toggle is independent
•Account balance: Not replicated via multi-leader — uses single-leader strong consistency
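
One way to keep such a categorization explicit is a small per-field strategy table that the replication layer consults when a conflict is detected. The field names and the `strategyFor` helper below are hypothetical, and defaulting unknown fields to human review is a design assumption of this sketch rather than a rule from the text.

```typescript
type Strategy = "field-lww" | "user-involved" | "merge-function" | "single-leader";

// Hypothetical field names for the profile described above.
const PROFILE_STRATEGIES: Record<string, Strategy> = {
  displayName: "field-lww",
  bio: "field-lww",
  avatarUrl: "field-lww",
  email: "user-involved",
  favorites: "merge-function",
  notificationPrefs: "field-lww",
  accountBalance: "single-leader",
};

function strategyFor(field: string): Strategy {
  const strategy = PROFILE_STRATEGIES[field];
  if (strategy === undefined) {
    // Assumption: an uncategorized field goes to a human instead of
    // being auto-resolved with possible silent data loss.
    return "user-involved";
  }
  return strategy;
}

console.log(strategyFor("bio"));            // "field-lww"
console.log(strategyFor("email"));          // "user-involved"
console.log(strategyFor("shippingNotes"));  // "user-involved" (not in the table)
```

Keeping the table in one place makes the categorization reviewable: adding a field forces a decision about how its conflicts are handled.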

Categorize Before Designing

Summary: Conflict Resolution Strategies

We've explored the spectrum of conflict resolution strategies, from avoidance through automatic resolution to human-involved processes. Let's consolidate the key insights:

Key Takeaways

•Conflict avoidance through routing is the first line of defense—route related writes to the same leader when possible.
•Last-Write-Wins (LWW) is simple and guarantees convergence but loses data and is susceptible to clock skew issues.
•Application-level merge functions enable domain-aware resolution, preserving more data at the cost of implementation complexity.
•User-involved resolution is appropriate for critical data where automatic resolution risk is unacceptable.
•Field-level resolution reduces conflict scope—parallel edits to different fields of the same record need not conflict.
•Real systems use multiple strategies for different data categories based on conflict impact and natural merge semantics.

What's Next:

You now understand the full spectrum of conflict resolution strategies, from avoidance to automatic resolution to human-involved processes. Next, we'll explore Last-Write-Wins in production detail.