In leader-based replication, conflicts are structurally impossible—the leader serializes all writes, imposing a total order. But in leaderless systems, where any node can accept writes at any time, conflicts are not bugs to be avoided—they're a fundamental design consideration.
Conflicts occur when two operations modify the same data without awareness of each other. The challenge isn't preventing conflicts (that would require coordination, defeating the purpose of leaderless design) but rather detecting conflicts reliably and resolving them correctly.
This page explores the complete lifecycle of conflicts: how they arise, how systems detect them, and the spectrum of resolution strategies—from simple last-writer-wins to sophisticated conflict-free data types.
By the end of this page, you will understand how concurrent modifications create conflicts, how version vectors and vector clocks detect them, the trade-offs between different resolution strategies (LWW, application merge, CRDTs), and how to choose the right approach for your application's requirements.
Conflicts in leaderless systems arise from concurrent modifications—updates that happen "at the same time" without one being aware of the other. Let's explore how these situations occur.
What makes modifications concurrent?
Two modifications are concurrent if neither happened-before the other: in logical-clock terms, write A neither causally precedes write B nor causally follows it, because neither writer had seen the other's update when it made its own.
Key insight: Concurrency isn't necessarily about wall-clock time. Two writes seconds apart can be concurrent if they originated from isolated nodes.
Scenario 1: Multi-device conflict
User on Phone (Node A):
10:00:00 - Open shopping cart (cart = [])
10:00:05 - Add "milk" (cart = ["milk"])
10:00:10 - Write cart to Node A

User on Laptop (Node B):
10:00:00 - Open shopping cart (cart = [])
10:00:03 - Add "bread" (cart = ["bread"])
10:00:06 - Write cart to Node B
Both operations saw cart = [] as the starting point.
Both made changes without knowing about the other.
Conflict: ["milk"] vs ["bread"] — which is the cart?
Scenario 2: Datacenter partition
(Network partition separates the two datacenters)

US Datacenter:
User 1: POST /order/123/status with status = "shipped"

EU Datacenter:
User 2: POST /order/123/status with status = "cancelled"
Both datacenters accept the write locally.
After partition heals: which status is correct?
In leaderless systems, conflicts are not exceptional—they're a normal operational condition. Your system design must account for them, not hope to avoid them. The question is never 'will conflicts occur?' but 'how will we handle them when they do?'
Before resolving conflicts, systems must detect them. Several mechanisms exist, each with different capabilities and overhead.
1. Wall-clock timestamps
Simplest approach: attach a timestamp to each write.
Write 1: value="A", timestamp=1609459200000
Write 2: value="B", timestamp=1609459200500
Compare: 1609459200500 > 1609459200000
Result: B happened after A (or did it?)
Limitations: clocks on different nodes drift and are only loosely synchronized (even with NTP), so timestamp order may not reflect real order; two writes can carry identical timestamps; and a timestamp comparison can never tell you whether two writes were actually concurrent, it simply picks one and silently discards the other.
2. Version counters (per-key version numbers)
Maintain a version number per key, incremented on each write.
Client reads: key=X, value="A", version=5
Client writes: key=X, value="B", expected_version=5
If current_version != 5:
Conflict! Someone else modified between read and write.
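In client code, this read-check-write cycle typically becomes a conditional-write (compare-and-set) loop. The sketch below assumes a hypothetical store client with `get` and `putIfVersion` methods; the names and retry policy are illustrative, not any specific product's API.

```javascript
// Optimistic concurrency with per-key version numbers (hypothetical client API).
async function updateWithVersionCheck(store, key, updateFn, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const { value, version } = await store.get(key);                   // e.g. value="A", version=5
    const newValue = updateFn(value);
    const accepted = await store.putIfVersion(key, newValue, version); // conditional write
    if (accepted) return newValue;
    // The version changed between our read and write: a concurrent modification.
    // Loop to retry against the fresh value, or give up and surface the conflict.
  }
  throw new Error(`Write conflict on ${key}: gave up after ${maxAttempts} attempts`);
}
```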
Limitations: a single per-key counter works well when all writes for a key flow through one place, but in a leaderless system different replicas can each accept "version 6" independently. The counter can tell you that a conflict happened, not which writes were concurrent or how the replicas' histories relate to each other.
3. Vector clocks / version vectors
The gold standard for conflict detection in leaderless systems. Each replica maintains a counter for every replica that has modified the data.
Version vector structure: {replica_id: counter, ...}
Write at Replica A: vc = {A: 1}
Subsequent write at Replica A: vc = {A: 2}
Write at Replica B (knows about A:2): vc = {A: 2, B: 1}
Comparison:
{A: 2, B: 1} > {A: 2} → B knows about A's update
{A: 2, B: 1} || {A: 1, B: 2} → Concurrent! Neither dominates.
| Mechanism | Detects Concurrency? | Overhead | Cross-replica? |
|---|---|---|---|
| Wall-clock timestamps | No (unreliable) | Low (single number) | Yes |
| Version counters | Partial (single origin) | Low (single number) | Limited |
| Vector clocks | Yes (accurate) | Medium (O(replicas)) | Yes |
| Dotted version vectors | Yes (handles resurrection) | Medium-High | Yes |
| Hybrid logical clocks | Partial (causality) | Low | Yes |
Think of a vector clock as a "knowledge vector"—it encodes what each replica knew when making its update. If your knowledge vector dominates another (all counters ≥, at least one greater), you knew about that update. If neither dominates, you were operating concurrently.
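To make the dominance check concrete, here is one way it can be written in code. This is a sketch, not a particular database's implementation; missing entries are treated as zero.

```javascript
// Compare two version vectors such as {A: 2, B: 1} and {A: 1, B: 2}.
// Returns "after", "before", "equal", or "concurrent".
function compareVectorClocks(a, b) {
  const replicas = new Set([...Object.keys(a), ...Object.keys(b)]);
  let aAhead = false; // a has a strictly larger counter for some replica
  let bAhead = false; // b has a strictly larger counter for some replica
  for (const r of replicas) {
    const ca = a[r] || 0;
    const cb = b[r] || 0;
    if (ca > cb) aAhead = true;
    if (cb > ca) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent"; // neither saw the other's update
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

compareVectorClocks({ A: 2, B: 1 }, { A: 2 });       // "after"
compareVectorClocks({ A: 2, B: 1 }, { A: 1, B: 2 }); // "concurrent"
```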
The simplest conflict resolution strategy is Last-Writer-Wins (LWW): attach a timestamp to each write and let the highest timestamp win.
LWW mechanics:
// Two concurrent writes arrive at a replica
Write 1: key=X, value="apple", timestamp=1000
Write 2: key=X, value="banana", timestamp=1001
// Resolution: banana wins (higher timestamp)
final_value = "banana"
// apple is silently discarded
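As a sketch, the entire resolution rule fits in a few lines. Production systems also need a deterministic tie-breaker for equal timestamps (comparing values or replica IDs is one common choice); the tie-break below is an illustrative assumption.

```javascript
// Last-Writer-Wins: keep the write with the highest timestamp.
function lwwResolve(writes) {
  return writes.reduce((winner, w) => {
    if (w.timestamp > winner.timestamp) return w;
    if (w.timestamp === winner.timestamp && w.value > winner.value) return w; // deterministic tie-break
    return winner;
  });
}

lwwResolve([
  { value: "apple",  timestamp: 1000 },
  { value: "banana", timestamp: 1001 },
]); // { value: "banana", timestamp: 1001 }; "apple" is silently discarded
```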
Why LWW is popular: it needs no extra metadata beyond a timestamp, no application-side merge logic, and no coordination; every replica independently converges to the same winner.

When LWW is appropriate:
Despite its dangers, LWW is correct for some use cases: caches and other easily regenerated data, truly immutable or write-once records, and metrics where losing an occasional update is tolerable.

When LWW is dangerous: whenever concurrent updates both carry information the application must keep (counters, shopping carts, collaborative edits), LWW silently discards one of them, and the loss stays invisible until someone notices missing data.
Cassandra's default conflict resolution is LWW at the cell (column) level. This means each column of a row can have a different "winner." While this provides fine-grained resolution, it can create logically inconsistent rows if columns were meant to be updated atomically. Design your data model with this in mind.
Instead of automatically picking a winner, systems can preserve all concurrent versions (called "siblings") and let the application decide how to merge them.
Sibling preservation mechanics:
// Two concurrent writes detected via vector clocks
Write 1: value="apple", vc={A: 1}
Write 2: value="banana", vc={B: 1}
// {A:1} || {B:1} → Concurrent, cannot order
// Store both as siblings:
key=X -> [
{value: "apple", vc: {A: 1}},
{value: "banana", vc: {B: 1}}
]
// On next read, return both to client
GET /key/X -> [
{value: "apple", context: ...},
{value: "banana", context: ...}
]
// Client merges the siblings and writes back (coordinated by replica C, which adds C:1)
PUT /key/X -> value="apple,banana", vc={A: 1, B: 1, C: 1}
The shopping cart example (Dynamo's classic):
Initial cart: []
Client A (phone): add("milk") → cart=["milk"], vc={A:1}
Client B (tablet): add("bread") → cart=["bread"], vc={B:1}
// Conflict: {A:1} || {B:1}
// Siblings stored: [["milk"], ["bread"]]
Client C reads cart:
→ Receives siblings: [["milk"], ["bread"]]
→ Application merges: union(["milk"], ["bread"]) = ["milk", "bread"]
→ Writes merged cart: ["milk", "bread"], vc={A:1, B:1, C:1}
// All replicas converge to ["milk", "bread"]
// No data lost!
Riak supports siblings natively. When allow_mult=true, concurrent writes create siblings that are returned on read. The application is responsible for merging and writing back a resolved value. Riak also supports server-side resolution for data types (Riak Data Types) that can auto-merge.
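In application code, this usually becomes a small fetch-resolve-store helper. The client API below (`get` returning siblings plus an opaque causal context, `put` accepting that context back) is a hypothetical stand-in for a Riak-style interface, not its actual client library.

```javascript
// Read a key, merge any siblings with application logic, and write the result back
// with the causal context so the server can prune the siblings the merge subsumes.
async function readResolved(client, bucket, key, resolve) {
  const { siblings, context } = await client.get(bucket, key);
  if (siblings.length <= 1) return siblings[0]; // no conflict to resolve
  const merged = resolve(siblings);             // e.g. union of cart items
  await client.put(bucket, key, merged, { context });
  return merged;
}
```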
When siblings are preserved, applications must implement merge logic. The right strategy depends on the data type and business requirements.
Common merge strategies:
| Pattern | Logic | Applicable Data Types |
|---|---|---|
| Union | Combine all items from all versions | Sets, shopping carts, tags |
| Max/Min | Take the maximum or minimum value | Counters, high-water marks |
| Timestamps per field | Take newest value for each field | Documents with independent fields |
| Prefer local | Prefer the version from local datacenter | Write-local-read-global patterns |
| Ask user | Present choices to user for manual resolution | Collaborative documents |
| Three-way merge | Use common ancestor to identify changes | Version-controlled content |
Example: Shopping cart merge
function mergeShoppingCarts(siblings) {
  // Siblings, e.g.: [{items: ["milk", "eggs"]}, {items: ["bread", "milk"]}, {items: ["cheese"]}]
  // Strategy: union all items, then remove explicitly deleted (tombstoned) items
  const allItems = new Set();
  const tombstones = new Set();
  for (const sibling of siblings) {
    for (const item of sibling.items) {
      allItems.add(item);
    }
    for (const deleted of sibling.tombstones || []) {
      tombstones.add(deleted);
    }
  }
  // Remove tombstoned items from the final set
  for (const deleted of tombstones) {
    allItems.delete(deleted);
  }
  return Array.from(allItems);
  // Result for the siblings above: ["milk", "eggs", "bread", "cheese"]
}
Example: Counter merge
// Wrong approach: take the max of the sibling totals
merge([{total: 5}, {total: 3}]) → 5  // increments seen only by the losing sibling are lost!

// Right approach: track increments per origin replica and merge element-wise
function mergeCounter(siblings) {
  // Each sibling records increments per replica, e.g.:
  //   Sibling 1: {counts: {A: 3, B: 2}} → total 5
  //   Sibling 2: {counts: {A: 2, C: 1}} → total 3
  const merged = {};
  for (const sibling of siblings) {
    for (const [replica, count] of Object.entries(sibling.counts)) {
      merged[replica] = Math.max(merged[replica] || 0, count);
    }
  }
  // merged: {A: 3, B: 2, C: 1} → total 6
  return Object.values(merged).reduce((a, b) => a + b, 0);
}
Deletions in distributed systems require special handling. If you just remove an item, a concurrent update might 'resurrect' it. Use tombstones (markers indicating deletion) that persist for a configurable period. A tombstone defeats a concurrent add, ensuring the item stays deleted.
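A sketch of what tombstone-based deletion can look like for the cart data used above. Here each tombstone carries a deletion timestamp so it can eventually be expired, a slight extension of the plain tombstone set in the merge example; the retention window is an illustrative constant, not a recommended value.

```javascript
// Record deletions as tombstones instead of silently dropping items,
// so merges can tell "deleted" apart from "never added" and the item stays deleted.
const TOMBSTONE_RETENTION_MS = 7 * 24 * 60 * 60 * 1000; // illustrative grace period

function removeItem(cart, item, now = Date.now()) {
  return {
    items: cart.items.filter((i) => i !== item),
    tombstones: { ...cart.tombstones, [item]: now }, // remember when it was deleted
  };
}

// Once every replica can be assumed to have seen the deletion,
// old tombstones can be garbage-collected to bound storage overhead.
function pruneTombstones(cart, now = Date.now()) {
  const kept = Object.entries(cart.tombstones || {})
    .filter(([, deletedAt]) => now - deletedAt < TOMBSTONE_RETENTION_MS);
  return { ...cart, tombstones: Object.fromEntries(kept) };
}
```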
CRDTs (Conflict-free Replicated Data Types) are data structures designed from the ground up to support concurrent updates and automatic conflict resolution. They guarantee that all replicas will converge to the same state, regardless of the order in which updates are received.
The CRDT guarantee:
Given the same set of operations (in any order), all replicas will arrive at the same final state.
This is achieved through the mathematical properties of the merge function: it is commutative (merge order doesn't matter), associative (grouping doesn't matter), and idempotent (merging the same state twice has no further effect).
Common CRDT types:
G-Counter (Grow-only Counter)
Structure: {replica_id: count, ...}
Increment: increment own replica's count
Value: sum of all counts
Merge: max of each replica's count
Example:
Replica A: {A: 5, B: 3} → value = 8
Replica B: {A: 4, B: 4} → value = 8
Merged: {A: 5, B: 4} → value = 9
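A minimal G-Counter along these lines might look as follows (a sketch; the class and method names are not taken from any particular library):

```javascript
// Grow-only counter: one monotonically increasing slot per replica.
class GCounter {
  constructor(replicaId, counts = {}) {
    this.replicaId = replicaId;
    this.counts = { ...counts };
  }
  increment(by = 1) {
    this.counts[this.replicaId] = (this.counts[this.replicaId] || 0) + by;
  }
  value() {
    return Object.values(this.counts).reduce((a, b) => a + b, 0);
  }
  merge(other) {
    // Element-wise max: commutative, associative, and idempotent.
    for (const [replica, count] of Object.entries(other.counts)) {
      this.counts[replica] = Math.max(this.counts[replica] || 0, count);
    }
  }
}
```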
PN-Counter (Positive-Negative Counter)
Structure: {increments: G-Counter, decrements: G-Counter}
Increment: increment 'increments' counter
Decrement: increment 'decrements' counter
Value: increments.value - decrements.value
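Following the same pattern, a PN-Counter can be composed from two G-Counters (a sketch, reusing the GCounter class above):

```javascript
// Positive-Negative counter: a decrement is just an increment of a second G-Counter.
class PNCounter {
  constructor(replicaId) {
    this.increments = new GCounter(replicaId);
    this.decrements = new GCounter(replicaId);
  }
  increment(by = 1) { this.increments.increment(by); }
  decrement(by = 1) { this.decrements.increment(by); }
  value() { return this.increments.value() - this.decrements.value(); }
  merge(other) {
    this.increments.merge(other.increments);
    this.decrements.merge(other.decrements);
  }
}
```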
G-Set (Grow-only Set)
Add element: add to local set
Merge: union of all sets
Remove: NOT SUPPORTED (grow only)
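The G-Set is the simplest of the set CRDTs; a sketch:

```javascript
// Grow-only set: elements can only be added, so merge is plain set union.
class GSet {
  constructor(elements = []) {
    this.elements = new Set(elements);
  }
  add(element) { this.elements.add(element); }
  has(element) { return this.elements.has(element); }
  merge(other) {
    for (const e of other.elements) this.elements.add(e); // union
  }
}
```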
OR-Set (Observed-Remove Set)
Add element: add with unique tag
Remove element: remove specific tags you've seen
Merge: union, respecting tag-level tombstones
Result: a concurrent add wins over a remove, because the remove can only delete add-tags it has already observed
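An OR-Set can be sketched by tagging every add with a unique identifier and tombstoning only the tags a replica has observed. The tag scheme below (replica ID plus a local sequence number) is an illustrative assumption; real implementations often use UUIDs or version-vector dots.

```javascript
// Observed-Remove set: a remove only affects the add-tags it has actually seen,
// so an add that was concurrent with the remove survives the merge.
class ORSet {
  constructor(replicaId) {
    this.replicaId = replicaId;
    this.seq = 0;
    this.live = new Map();     // element -> Set of live add-tags
    this.removed = new Set();  // tombstoned tags
  }
  add(element) {
    const tag = `${this.replicaId}:${++this.seq}`; // unique tag for this add
    if (!this.live.has(element)) this.live.set(element, new Set());
    this.live.get(element).add(tag);
  }
  remove(element) {
    for (const tag of this.live.get(element) || []) this.removed.add(tag);
    this.live.delete(element);
  }
  has(element) {
    return (this.live.get(element)?.size || 0) > 0;
  }
  merge(other) {
    for (const tag of other.removed) this.removed.add(tag);
    for (const [element, tags] of other.live) {
      const mine = this.live.get(element) || new Set();
      for (const tag of tags) mine.add(tag);
      this.live.set(element, mine);
    }
    // Drop tags tombstoned on either side; delete elements with no live tags left.
    for (const [element, tags] of this.live) {
      for (const tag of tags) if (this.removed.has(tag)) tags.delete(tag);
      if (tags.size === 0) this.live.delete(element);
    }
  }
}
```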
| CRDT Type | Supports | Use Cases |
|---|---|---|
| G-Counter | Increment only | Page views, like counts, metrics |
| PN-Counter | Increment/decrement | Inventory counts, bidirectional counters |
| G-Set | Add only | Seen items, user preferences (add-only) |
| OR-Set | Add/remove | Shopping carts, sets with deletions |
| LWW-Register | Set value | Simple key-value with LWW semantics |
| MV-Register | Set value, siblings | Key-value preserving concurrent values |
| RGA/LSEQ | Insert/delete in sequence | Collaborative text editing |
Riak KV supports CRDTs natively (counters, sets, maps, flags, registers). Redis Enterprise offers Active-Active databases (CRDBs) built on CRDTs. These aren't academic curiosities; they power production systems handling millions of operations per second.
Selecting the right conflict resolution strategy requires understanding your data semantics, consistency requirements, and operational constraints.
| If your data is... | And you need... | Consider... |
|---|---|---|
| Simple scalar values (cache) | Low latency, simplicity | LWW (but accept data loss) |
| Set of items (cart, tags) | No data loss | OR-Set CRDT or sibling merge |
| Counter (views, metrics) | Accurate counts | PN-Counter CRDT |
| Complex document | Field-level conflict handling | Per-field timestamps or siblings |
| Collaborative text | Character-level precision | RGA/LSEQ CRDT or OT |
| Business-critical data | Human review of conflicts | Sibling preservation + UI |
| Immutable events | Deduplication only | LWW with idempotent writes |
Hybrid approaches:
Real systems often combine strategies: LWW for cache-like or write-once data, CRDT counters and sets for metrics and carts, and sibling preservation with application merge (or human review) for business-critical records. Riak's CRDT maps take this further by nesting registers, flags, counters, and sets inside a single mergeable value.
More sophisticated conflict resolution (CRDTs, siblings with merge) requires more implementation effort, storage, and sometimes higher latency. LWW is 'free' but dangerous. Understand the trade-offs and choose appropriately for each data type in your system.
We've explored the complete landscape of conflict handling in leaderless systems: how concurrent modifications create conflicts, how timestamps, version counters, and vector clocks detect them, and how strategies from last-writer-wins to sibling merges to CRDTs resolve them.
Module Complete: Leaderless Replication
You've now completed a comprehensive exploration of leaderless replication, from its core concepts and configuration to the conflict detection and resolution techniques covered on this page.
Leaderless replication powers some of the world's largest systems. With this knowledge, you're equipped to design, deploy, and operate these systems effectively.
Congratulations! You've mastered leaderless replication—from high-level concepts to implementation-level details. You understand when to use this architecture, how to configure it, and how to handle the inevitable conflicts. This knowledge is fundamental for building globally-distributed, highly-available systems.