Imagine it's Black Friday. Traffic to your e-commerce platform has spiked 10x. Two of your five database nodes in the US-East region have failed under load. With a strict quorum of W=3, R=3 (a majority of N=5), every write to data stored in that region now hangs by a thread: only 3 designated replicas remain, and all three must respond successfully for the quorum to be satisfied. The moment one more node times out under the load, writes start failing.
Your customers see error messages. Shopping carts can't save items. Orders can't complete. Revenue bleeds by the minute.
But what if, instead of failing those writes, your system could temporarily "borrow" nodes from another region? What if the write could succeed on 2 local nodes plus 1 node from US-West, still achieving the quorum threshold but using a sloppy set of nodes?
This is the core idea behind sloppy quorums: relaxing the requirement that quorum operations must use the designated replica nodes, instead allowing any available nodes to participate when the preferred nodes are unreachable.
By the end of this page, you will:

- Understand the precise mechanics of sloppy quorums and how they differ from strict quorums
- Master the concept of preference lists and how systems select substitute nodes
- Comprehend the consistency implications and risks of sloppy quorum operations
- Learn about hinted handoff as the mechanism for eventual data reconciliation
- Recognize when sloppy quorums are appropriate versus when strict quorums are essential
To understand sloppy quorums, we must first deeply appreciate the limitation of strict quorums that they address.
Strict Quorum Limitation:
With strict quorums, each key is assigned to a specific set of N replica nodes (determined by consistent hashing or similar mechanisms). When you write with W=3 and N=5, you need 3 of those specific 5 nodes to acknowledge. If 3 of those 5 nodes are down, the write fails—even if you have hundreds of other healthy nodes in your cluster.
This creates a paradox: in a massive cluster, a few localized failures can cause write unavailability, even though most of the cluster is healthy.
The Availability Gap:
Consider a 100-node cluster with N=3. Each key lives on exactly 3 designated nodes, so its availability depends only on those 3, however healthy the other 97 are. The likelihood of losing a key's quorum grows with per-node failure rates and with correlated failures such as rack or datacenter outages:
| Nodes Failed | Failed Nodes | Key Available? | Reason |
|---|---|---|---|
| 0 of 3 | None | Yes | All replicas available |
| 1 of 3 | Node A | Yes | W=2 satisfied by B, C |
| 2 of 3 | Nodes A, B | No | Only C available, W=2 not met |
| 3 of 3 | All | No | No replicas available |
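As a back-of-the-envelope sketch of why this matters, here is the unavailability probability for a single key under independent node failures. The function names are illustrative, and independence is an assumption (real failures are often correlated):

```typescript
// Binomial coefficient: C(n, k)
function choose(n: number, k: number): number {
  let result = 1;
  for (let i = 0; i < k; i++) result = (result * (n - i)) / (i + 1);
  return result;
}

// Probability that a strict quorum write fails for a specific key:
// P(fewer than W of the N designated replicas are up), with each
// replica down independently with probability p.
function keyUnavailableProbability(n: number, w: number, p: number): number {
  let total = 0;
  for (let up = 0; up < w; up++) {
    total += choose(n, up) * Math.pow(1 - p, up) * Math.pow(p, n - up);
  }
  return total;
}

// N=3, W=2: the key is unavailable when 2 or 3 of its replicas are down.
console.log(keyUnavailableProbability(3, 2, 0.01)); // ≈ 0.000298
```

Small per-key, but across millions of keys and correlated rack failures, some keys are effectively always unavailable under strict quorums.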
```typescript
interface Node {
  id: string;
  status: 'healthy' | 'down' | 'slow';
  datacenter: string;
}

interface Key {
  name: string;
  preferredReplicas: string[]; // Node IDs
}

function canWriteStrict(
  key: Key,
  allNodes: Node[],
  writeQuorum: number
): { canWrite: boolean; reason: string } {
  // Strict quorum: only designated replicas count
  const preferredNodes = allNodes.filter(n => key.preferredReplicas.includes(n.id));
  const availablePreferred = preferredNodes.filter(n => n.status === 'healthy');

  if (availablePreferred.length >= writeQuorum) {
    return {
      canWrite: true,
      reason: `${availablePreferred.length}/${writeQuorum} preferred replicas available`
    };
  }

  const unavailablePreferred = preferredNodes.filter(n => n.status !== 'healthy');
  return {
    canWrite: false,
    reason: `Only ${availablePreferred.length}/${writeQuorum} preferred replicas available. ` +
      `Down: ${unavailablePreferred.map(n => n.id).join(', ')}`
  };
}

// Scenario: Black Friday, 2 of 3 preferred nodes down
const nodes: Node[] = [
  { id: 'east-1', status: 'down', datacenter: 'us-east' },
  { id: 'east-2', status: 'down', datacenter: 'us-east' },
  { id: 'east-3', status: 'healthy', datacenter: 'us-east' },
  // 50 more healthy nodes in other datacenters...
  { id: 'west-1', status: 'healthy', datacenter: 'us-west' },
  { id: 'west-2', status: 'healthy', datacenter: 'us-west' },
  // ...
];

const cartKey: Key = {
  name: 'cart:user-123',
  preferredReplicas: ['east-1', 'east-2', 'east-3'],
};

const result = canWriteStrict(cartKey, nodes, 2);
console.log(result);
// { canWrite: false, reason: 'Only 1/2 preferred replicas available. Down: east-1, east-2' }

// The user can't save their cart, even though 50+ nodes are healthy!
```

Strict quorums can make a large, mostly-healthy cluster behave as though it's nearly failed. If the nodes responsible for popular keys fail, those hot keys become unavailable even as the rest of the cluster hums along.
This is why high-traffic systems often adopt sloppy quorums—to maintain write availability during localized failures.
Sloppy quorums relax the strict requirement that operations must use only the designated replica nodes. Instead, when preferred replicas are unavailable, the system "borrows" nearby healthy nodes to maintain the quorum threshold.
The Preference List Concept:
Rather than assigning exactly N nodes to each key, sloppy quorum systems maintain a preference list that extends beyond the primary replicas, ordered from most to least desirable: the N primary replicas first, then successive fallback nodes further along the ring.
When primary replicas are unavailable, the system walks down the preference list until it finds enough healthy nodes to satisfy the quorum.
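The walk itself can be isolated as a small pure function. This is an illustrative sketch of the selection step only (the names and input shapes are assumptions, not any particular system's API):

```typescript
type NodeStatus = 'healthy' | 'down';

// Walk an ordered preference list and pick the first `quorum` healthy nodes.
function selectQuorumNodes(
  preferenceList: string[],            // most preferred first
  status: Record<string, NodeStatus>,  // current health of each node
  quorum: number
): string[] | null {
  const selected: string[] = [];
  for (const nodeId of preferenceList) {
    if (status[nodeId] === 'healthy') selected.push(nodeId);
    if (selected.length === quorum) return selected;
  }
  return null; // not even a sloppy quorum is possible
}

const picked = selectQuorumNodes(
  ['east-1', 'east-2', 'east-3', 'west-1', 'west-2'],
  { 'east-1': 'down', 'east-2': 'down', 'east-3': 'healthy', 'west-1': 'healthy', 'west-2': 'healthy' },
  2
);
console.log(picked); // ['east-3', 'west-1']
```

Note that the primaries are tried first simply because they sit at the front of the list; substitution falls out of the ordering rather than needing special-case logic.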
The Write Process with Sloppy Quorums:
```typescript
interface PreferenceList {
  key: string;
  primaryReplicas: string[];   // N nodes that should own this key
  secondaryReplicas: string[]; // Next N nodes in ring
  tertiaryReplicas: string[];  // Further fallbacks
}

interface HintedWrite {
  key: string;
  value: any;
  timestamp: number;
  intendedNode: string; // Where this should eventually go
}

class SloppyQuorumCoordinator {
  private allNodes: Map<string, Node>;
  private hints: Map<string, HintedWrite[]> = new Map();

  constructor(nodes: Node[]) {
    this.allNodes = new Map(nodes.map(n => [n.id, n]));
  }

  async writeWithSloppyQuorum(
    key: string,
    value: any,
    writeQuorum: number,
    preferenceList: PreferenceList
  ): Promise<{ success: boolean; nodesUsed: string[]; hints: string[] }> {
    const timestamp = Date.now();
    const allCandidates = [
      ...preferenceList.primaryReplicas,
      ...preferenceList.secondaryReplicas,
      ...preferenceList.tertiaryReplicas,
    ];

    const nodesUsed: string[] = [];
    const hintedNodes: string[] = [];

    // Walk the preference list
    for (const nodeId of allCandidates) {
      if (nodesUsed.length >= writeQuorum) break;

      const node = this.allNodes.get(nodeId);
      if (!node || node.status !== 'healthy') continue;

      const isPrimary = preferenceList.primaryReplicas.includes(nodeId);

      if (isPrimary) {
        // Normal write to intended replica
        await this.writeToNode(nodeId, key, value, timestamp);
        nodesUsed.push(nodeId);
      } else {
        // Sloppy write: store with hint for later handoff
        const intendedNode = this.findDownPrimary(preferenceList.primaryReplicas);
        await this.writeHintedToNode(nodeId, key, value, timestamp, intendedNode);
        nodesUsed.push(nodeId);
        hintedNodes.push(nodeId);

        // Record hint for handoff tracking
        this.recordHint(nodeId, { key, value, timestamp, intendedNode });
      }
    }

    const success = nodesUsed.length >= writeQuorum;

    if (!success) {
      console.warn(`Sloppy quorum not achieved: ${nodesUsed.length}/${writeQuorum}`);
    } else if (hintedNodes.length > 0) {
      console.log(`Sloppy quorum achieved with hints: ${hintedNodes.join(', ')}`);
    }

    return { success, nodesUsed, hints: hintedNodes };
  }

  private findDownPrimary(primaries: string[]): string {
    for (const id of primaries) {
      const node = this.allNodes.get(id);
      if (!node || node.status !== 'healthy') {
        return id;
      }
    }
    return primaries[0]; // Shouldn't happen in sloppy scenario
  }

  private async writeToNode(nodeId: string, key: string, value: any, ts: number) {
    // Actual write implementation
    console.log(`Writing to primary ${nodeId}: ${key} = ${value}`);
  }

  private async writeHintedToNode(
    nodeId: string,
    key: string,
    value: any,
    ts: number,
    intendedFor: string
  ) {
    // Write with metadata indicating eventual destination
    console.log(`Writing hinted to ${nodeId} (intended for ${intendedFor}): ${key} = ${value}`);
  }

  private recordHint(storingNode: string, hint: HintedWrite) {
    const existing = this.hints.get(storingNode) || [];
    existing.push(hint);
    this.hints.set(storingNode, existing);
  }
}

// Usage during outage
const coordinator = new SloppyQuorumCoordinator([
  { id: 'east-1', status: 'down', datacenter: 'us-east' },
  { id: 'east-2', status: 'down', datacenter: 'us-east' },
  { id: 'east-3', status: 'healthy', datacenter: 'us-east' },
  { id: 'west-1', status: 'healthy', datacenter: 'us-west' },
  { id: 'west-2', status: 'healthy', datacenter: 'us-west' },
]);

const result = await coordinator.writeWithSloppyQuorum(
  'cart:user-123',
  { items: ['product-A', 'product-B'] },
  2, // writeQuorum
  {
    key: 'cart:user-123',
    primaryReplicas: ['east-1', 'east-2', 'east-3'],
    secondaryReplicas: ['west-1', 'west-2'],
    tertiaryReplicas: [],
  }
);

// Output:
// Writing to primary east-3: cart:user-123 = {...}
// Writing hinted to west-1 (intended for east-1): cart:user-123 = {...}
// Sloppy quorum achieved with hints: west-1
// { success: true, nodesUsed: ['east-3', 'west-1'], hints: ['west-1'] }
```

The term 'sloppy' refers to the relaxed selection of nodes, not the quality of the operation. A sloppy quorum with W=3 still achieves 3 acknowledgments; it's just that those 3 might not all be from the designated replicas.
The write is just as durable in terms of node count—but the consistency guarantees may be affected.
Sloppy quorums fundamentally change the consistency guarantees of the system. Understanding these implications is critical for using sloppy quorums safely.
The Broken Intersection Guarantee:
Recall that strict quorum consistency relies on the intersection of write and read quorums: W + R > N ensures overlap. With sloppy quorums, this guarantee can break:
Suppose a write to key x lands on a substitute node because designated replica C is down. If a read queries the designated replicas before the substitute's hinted handoff to C completes, the read might miss the write entirely.
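The failure mode is just set arithmetic: W + R > N forces an overlap only when both quorums are drawn from the same N designated replicas. Once the write set can include substitutes from a larger pool, the two sets can be disjoint. A minimal sketch (illustrative names):

```typescript
// A read is guaranteed to observe the write only if the node sets intersect.
function setsIntersect(writeSet: string[], readSet: string[]): boolean {
  const writes = new Set(writeSet);
  return readSet.some(n => writes.has(n));
}

// Strict: both sets are 2-of-3 subsets of the designated replicas {A, B, C},
// so any write set and read set must share at least one node.
console.log(setsIntersect(['A', 'B'], ['B', 'C'])); // true

// Sloppy: the write went to {A, D}, where D is a substitute holding a hint
// for down replica C. A strict read of {B, C} misses the write entirely.
console.log(setsIntersect(['A', 'D'], ['B', 'C'])); // false
```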
Scenarios Where Sloppy Quorums Cause Stale Reads:
```text
SCENARIO: Stale Read After Sloppy Quorum Write

Configuration: N=3, W=2, R=2 (normally W+R=4 > N=3, strong)

Timeline:
─────────────────────────────────────────────────────────────────────
t1: Nodes A, B, C are healthy
    Key 'x' exists with value=1 on all three

t2: Node C fails

t3: Client writes x=2 with sloppy quorum
    - Attempts A, B, C
    - C is down, substitute with node D
    - Write succeeds to: {A, D} (hinted for C)
    - Acknowledgments: 2 ✓ (quorum met)
    - State: A has x=2, B has x=1, C down, D has x=2 (hinted for C)

t4: Node C recovers
    - C still has x=1 (old value)
    - D has not yet handed off the hint

t5: Client reads x with strict quorum
    - Queries nodes A, B, C (strict: use designated replicas)
    - Responses: A returns x=2, B returns x=1, C returns x=1
    - Wait for R=2 responses: A(x=2) and B(x=1) arrive first
    - Version resolution: x=2 is newer, return x=2
    - NOTE: This works! A was in both write and read sets

t6: ALTERNATE: Client reads from {B, C} (different timing)
    - B returns x=1, C returns x=1
    - Both show stale value!
    - User sees x=1, not x=2 ← CONSISTENCY VIOLATION

Root Cause:
- Sloppy quorum wrote to A, D (not B)
- Strict read from B, C missed both A and D
- No intersection between write set {A, D} and read set {B, C}
```

Even with W + R > N, sloppy quorums do NOT provide the strict quorum's consistency guarantee. The quorum intersection formula assumes operations use the same node set. Sloppy quorums intentionally violate this assumption. If you enable sloppy quorums, treat your system as eventually consistent, regardless of W and R values.
The mechanism for selecting substitute nodes in sloppy quorums is typically based on the consistent hash ring. Understanding how preference lists are constructed explains why sloppy quorums work and what their limitations are.
Consistent Hash Ring Review:
In a consistent hash ring, both nodes and keys are hashed onto a circular keyspace. A key is owned by the first node encountered moving clockwise from the key's position, and its N replicas are the next N distinct nodes along the ring.
Extended Preference List for Sloppy Quorums:
For sloppy quorums, the preference list extends beyond the primary N replicas:
```typescript
import crypto from 'crypto';

interface RingNode {
  id: string;
  position: bigint; // Position on the ring
  datacenter: string;
  status: 'healthy' | 'down';
}

class ConsistentHashRing {
  private ring: RingNode[] = [];
  private replicationFactor: number;

  constructor(nodes: Omit<RingNode, 'position'>[], rf: number) {
    this.replicationFactor = rf;

    // Hash each node to get its ring position
    this.ring = nodes.map(n => ({
      ...n,
      position: this.hash(n.id),
    }));

    // Sort by position
    this.ring.sort((a, b) => (a.position < b.position ? -1 : 1));
  }

  private hash(key: string): bigint {
    const hash = crypto.createHash('sha1').update(key).digest('hex');
    return BigInt('0x' + hash.slice(0, 16));
  }

  /**
   * Get the preference list for a key
   * @param key - The key to look up
   * @param extendedSize - How many nodes beyond primary to include
   * @returns Ordered list of nodes from most preferred to least
   */
  getPreferenceList(key: string, extendedSize: number = 10): RingNode[] {
    const keyPosition = this.hash(key);

    // Find starting position: first node clockwise from key
    let startIdx = this.ring.findIndex(n => n.position >= keyPosition);
    if (startIdx === -1) startIdx = 0; // Wrap around

    const preferenceList: RingNode[] = [];
    const seenDatacenters = new Set<string>();
    let idx = startIdx;

    // Walk the ring, collecting nodes
    while (preferenceList.length < this.replicationFactor + extendedSize) {
      const node = this.ring[idx];

      // For primary replicas, might want to ensure datacenter diversity
      if (preferenceList.length < this.replicationFactor) {
        // Only add if we haven't seen this datacenter yet (or we've covered all DCs)
        if (!seenDatacenters.has(node.datacenter) ||
            seenDatacenters.size >= this.getDatacenterCount()) {
          preferenceList.push(node);
          seenDatacenters.add(node.datacenter);
        }
      } else {
        // Extended list: add any node
        preferenceList.push(node);
      }

      idx = (idx + 1) % this.ring.length;

      // Prevent infinite loop if ring is too small
      if (idx === startIdx) break;
    }

    return preferenceList;
  }

  /**
   * Get available nodes from the preference list for a quorum operation
   */
  getNodesForSloppyQuorum(
    key: string,
    quorumSize: number
  ): { nodes: RingNode[]; substitutes: RingNode[] } {
    const prefList = this.getPreferenceList(key, quorumSize * 2);
    const primary = prefList.slice(0, this.replicationFactor);
    const extended = prefList.slice(this.replicationFactor);

    const nodes: RingNode[] = [];
    const substitutes: RingNode[] = [];
    let primaryAvailable = 0;

    // First, use healthy primary replicas
    for (const node of primary) {
      if (node.status === 'healthy') {
        nodes.push(node);
        primaryAvailable++;
      }
      if (nodes.length >= quorumSize) break;
    }

    // If not enough, use extended nodes
    for (const node of extended) {
      if (nodes.length >= quorumSize) break;
      if (node.status === 'healthy') {
        nodes.push(node);
        substitutes.push(node);
      }
    }

    return { nodes, substitutes };
  }

  private getDatacenterCount(): number {
    return new Set(this.ring.map(n => n.datacenter)).size;
  }
}

// Example
const ring = new ConsistentHashRing([
  { id: 'node-a', datacenter: 'east', status: 'down' },
  { id: 'node-b', datacenter: 'east', status: 'healthy' },
  { id: 'node-c', datacenter: 'west', status: 'down' },
  { id: 'node-d', datacenter: 'west', status: 'healthy' },
  { id: 'node-e', datacenter: 'central', status: 'healthy' },
], 3);

const prefList = ring.getPreferenceList('user:12345');
console.log('Preference list:', prefList.map(n => n.id));
// e.g., ['node-b', 'node-d', 'node-e', 'node-a', 'node-c']

const { nodes, substitutes } = ring.getNodesForSloppyQuorum('user:12345', 2);
console.log('Nodes for quorum:', nodes.map(n => n.id));
console.log('Substitutes used:', substitutes.map(n => n.id));
```

Production systems often construct preference lists with datacenter awareness. Primary replicas are chosen to span datacenters for fault isolation, while sloppy substitutes might prefer same-datacenter nodes for lower latency. Systems like Riak and Cassandra support configurable policies for preference list construction.
Understanding the precise tradeoffs between sloppy and strict quorums is essential for making informed architectural decisions. Neither is universally better—each serves different requirements.
| Aspect | Strict Quorum | Sloppy Quorum |
|---|---|---|
| Node selection | Only designated N replicas | Designated + substitutes |
| Consistency guarantee | Strong (with W+R>N) | Eventual (always) |
| Write availability | Limited by replica health | Higher, borrows nodes |
| Read consistency | Guaranteed fresh (if W+R>N) | May read stale until handoff |
| Failure behavior | Fails if too few replicas | Succeeds with substitutes |
| Handoff needed | No | Yes, for substitute writes |
| Implementation complexity | Lower | Higher (hints, handoff) |
| Best for | Transactions, coordination | High availability, caching |
```typescript
interface DataProperties {
  requiresStrongConsistency: boolean;
  isIdempotent: boolean;
  toleratesStaleReads: {
    acceptable: boolean;
    maxStalenessSeconds?: number;
  };
  criticalityLevel: 'low' | 'medium' | 'high' | 'critical';
  writeRate: 'low' | 'medium' | 'high';
}

interface QuorumRecommendation {
  useSloppyQuorums: boolean;
  reasoning: string[];
  alternatives: string[];
}

function recommendQuorumType(data: DataProperties): QuorumRecommendation {
  const reasoning: string[] = [];
  const alternatives: string[] = [];

  // Strong consistency requirement is decisive
  if (data.requiresStrongConsistency) {
    reasoning.push('Strong consistency required - sloppy quorums cannot guarantee this');
    return {
      useSloppyQuorums: false,
      reasoning,
      alternatives: [
        'Consider using a consensus-based system (Raft/Paxos) if linearizability needed',
        'Ensure W + R > N for all operations',
      ],
    };
  }

  // Critical data usually shouldn't use sloppy
  if (data.criticalityLevel === 'critical') {
    reasoning.push('Critical data - recommend strict quorums for predictability');
    alternatives.push('If availability is paramount, consider sloppy with short hint TTL');
    return { useSloppyQuorums: false, reasoning, alternatives };
  }

  // High write rate + eventual consistency = good sloppy candidate
  if (data.writeRate === 'high' && data.toleratesStaleReads.acceptable) {
    reasoning.push('High write rate with stale read tolerance - sloppy quorums improve availability');
    return {
      useSloppyQuorums: true,
      reasoning,
      alternatives: ['Ensure handoff completes within acceptable staleness window'],
    };
  }

  // Idempotent writes + stale tolerance = sloppy friendly
  if (data.isIdempotent && data.toleratesStaleReads.acceptable) {
    reasoning.push('Idempotent writes and stale tolerance - sloppy quorums are safe');
    return { useSloppyQuorums: true, reasoning, alternatives: [] };
  }

  // Default: strict is safer
  reasoning.push('No clear signal for sloppy quorums - defaulting to strict for safety');
  return { useSloppyQuorums: false, reasoning, alternatives };
}

// Examples
console.log(recommendQuorumType({
  requiresStrongConsistency: true,
  isIdempotent: false,
  toleratesStaleReads: { acceptable: false },
  criticalityLevel: 'critical',
  writeRate: 'medium',
}));
// { useSloppyQuorums: false, reasoning: ['Strong consistency required...'] }

console.log(recommendQuorumType({
  requiresStrongConsistency: false,
  isIdempotent: true,
  toleratesStaleReads: { acceptable: true, maxStalenessSeconds: 30 },
  criticalityLevel: 'low',
  writeRate: 'high',
}));
// { useSloppyQuorums: true, reasoning: ['High write rate with stale read tolerance...'] }
```

Amazon's Dynamo paper (2007) introduced sloppy quorums to a wide audience and influenced countless distributed databases. Understanding Dynamo's specific implementation provides insight into production-grade sloppy quorum systems.
Dynamo's Design Principles:
Dynamo was built for Amazon's shopping cart and similar services, where writes must never be rejected (the system must be "always writeable"), availability outweighs strict consistency, and latency must stay low even during node and network failures.
These requirements led to sloppy quorums as a core design choice.
Key Dynamo Mechanisms:
```typescript
interface DynamoWriteContext {
  key: string;
  value: any;
  clientContext?: VectorClock; // For conflict resolution
}

interface VectorClock {
  entries: Map<string, number>; // nodeId -> counter
}

interface DynamoWriteResult {
  success: boolean;
  version: VectorClock;
  nodesAcknowledged: string[];
  hintsGenerated: number;
  latencyMs: number;
}

class DynamoCoordinator {
  async put(
    ctx: DynamoWriteContext,
    writeQuorum: number
  ): Promise<DynamoWriteResult> {
    const start = Date.now();

    // Step 1: Get preference list for the key
    const prefList = this.ring.getPreferenceList(ctx.key, writeQuorum * 2);

    // Step 2: Generate new vector clock version
    const newVersion = this.incrementVectorClock(ctx.clientContext, this.nodeId);
    const versionedValue = {
      value: ctx.value,
      version: newVersion,
      timestamp: Date.now(),
    };

    // Step 3: Send writes to top N+k nodes in preference list
    const writePromises = prefList.slice(0, writeQuorum * 2).map(async (node, idx) => {
      const isPrimary = idx < this.replicationFactor;
      try {
        if (node.status !== 'healthy') {
          throw new Error('Node unavailable');
        }
        if (isPrimary) {
          await this.writeToNode(node.id, ctx.key, versionedValue);
        } else {
          // Find which primary we're substituting for
          const intendedPrimary =
            this.findDownPrimary(prefList.slice(0, this.replicationFactor));
          await this.writeHintedToNode(node.id, ctx.key, versionedValue, intendedPrimary);
        }
        return { nodeId: node.id, success: true, isHint: !isPrimary };
      } catch (err) {
        return { nodeId: node.id, success: false, isHint: !isPrimary };
      }
    });

    // Step 4: Wait for W acknowledgments
    const results = await this.waitForQuorum(writePromises, writeQuorum);
    const successful = results.filter(r => r.success);
    const hints = successful.filter(r => r.isHint);

    return {
      success: successful.length >= writeQuorum,
      version: newVersion,
      nodesAcknowledged: successful.map(r => r.nodeId),
      hintsGenerated: hints.length,
      latencyMs: Date.now() - start,
    };
  }

  private incrementVectorClock(existing: VectorClock | undefined, nodeId: string): VectorClock {
    const entries = new Map(existing?.entries || []);
    entries.set(nodeId, (entries.get(nodeId) || 0) + 1);
    return { entries };
  }

  // ... other methods (ring, nodeId, replicationFactor, writeToNode,
  // writeHintedToNode, findDownPrimary, and waitForQuorum are elided here)
}

// Dynamo's always-writable shopping cart use case
async function addToCart(userId: string, productId: string) {
  const key = `cart:${userId}`;

  // Get current cart (may return multiple versions if conflicts exist)
  const { versions } = await dynamo.get(key, { r: 1 }); // Fast read

  // Merge all versions (union of items)
  let cart = new Set<string>();
  for (const v of versions) {
    for (const item of v.value.items) {
      cart.add(item);
    }
  }

  // Add new item
  cart.add(productId);

  // Write back with sloppy quorum (always succeeds)
  const result = await dynamo.put({
    key,
    value: { items: [...cart] },
    clientContext: versions[0]?.version, // For conflict detection
  }, 2); // W=2

  // Even if some nodes are down, write succeeds to healthy nodes + hints
  console.log(`Cart updated. Hints generated: ${result.hintsGenerated}`);
}
```

Dynamo's 2007 paper directly inspired Cassandra, Riak, Voldemort, and DynamoDB. The sloppy quorum pattern it popularized is now standard in high-availability distributed databases. Understanding Dynamo's design helps you understand the entire family of AP (available and partition-tolerant) databases.
Different databases expose sloppy quorum behavior through various configuration parameters. Understanding these settings helps you tune the system for your specific availability and consistency requirements.
| System | Sloppy Quorum Config | Hinted Handoff Config | Notes |
|---|---|---|---|
| Cassandra | Enabled by default | hinted_handoff_enabled, max_hint_window_in_ms | 3 hour default hint window |
| Riak | PR/PW params | handoff_concurrency | PR=0 for sloppy, PR=N for strict |
| DynamoDB | Abstracted | Automatic | No direct control, always uses sloppy |
| Voldemort | prefer.writes config | slop.max.read.bytes | Per-store configuration |
```yaml
# cassandra.yaml - Hinted Handoff Configuration

# Enable/disable hinted handoff globally
hinted_handoff_enabled: true

# Maximum time a hint will be saved
# Hints older than this are discarded
max_hint_window_in_ms: 10800000  # 3 hours (default)

# Throttle for hint delivery bandwidth, in KB per second
hinted_handoff_throttle_in_kb: 1024

# Number of threads used to deliver hints
max_hints_delivery_threads: 2

# Hint compression (reduces storage but adds CPU)
hints_compression:
  - class_name: LZ4Compressor

# Production recommendations:
#
# For high-availability systems (shopping carts, sessions):
#   max_hint_window_in_ms: 86400000  # 24 hours
#   - Keeps hints longer for extended outages
#   - Trade: more disk usage for hints
#
# For lower-latency systems (time-sensitive data):
#   max_hint_window_in_ms: 600000  # 10 minutes
#   - Discard hints quickly if delivery fails
#   - Trade: lose more data on extended outages
#
# For bandwidth-constrained environments:
#   hinted_handoff_throttle_in_kb: 256
#   - Slower hint delivery to avoid network saturation
#   - Trade: longer convergence time after failures
```
```typescript
// Riak uses PR (primary read) and PW (primary write) parameters
// to control strict vs sloppy behavior

interface RiakBucketProps {
  n_val: number;                // Replication factor (N)
  r: number | 'quorum' | 'all'; // Read quorum
  w: number | 'quorum' | 'all'; // Write quorum
  pr: number;                   // PRIMARY read quorum (strict requirement)
  pw: number;                   // PRIMARY write quorum (strict requirement)
  dw: number;                   // Durable write quorum (after disk sync)
}

// Sloppy quorum configuration (default Riak behavior)
const sloppyBucket: RiakBucketProps = {
  n_val: 3,
  r: 'quorum', // R = 2
  w: 'quorum', // W = 2
  pr: 0,       // No primary read requirement (sloppy)
  pw: 0,       // No primary write requirement (sloppy)
  dw: 0,       // No durable write requirement
};

// Strict quorum configuration (when consistency matters)
const strictBucket: RiakBucketProps = {
  n_val: 3,
  r: 'quorum', // R = 2
  w: 'quorum', // W = 2
  pr: 2,       // 2 PRIMARY nodes must respond (strict)
  pw: 2,       // 2 PRIMARY nodes must acknowledge (strict)
  dw: 2,       // Data must be on disk on 2 nodes
};

// Explanation:
// With PR=0, PW=0 (sloppy):
// - Read quorum R=2 can be satisfied by ANY 2 nodes
// - Write quorum W=2 can be satisfied by ANY 2 nodes
// - Substitute nodes can participate
//
// With PR=2, PW=2 (strict):
// - Read quorum R=2 must include 2 PRIMARY replicas
// - Write quorum W=2 must include 2 PRIMARY replicas
// - Substitute nodes do NOT count toward PR/PW
// - Operation fails if not enough primaries available

// Mixed configuration (common in practice)
const mixedBucket: RiakBucketProps = {
  n_val: 3,
  r: 'quorum',
  w: 'quorum',
  pr: 0, // Sloppy reads for availability
  pw: 1, // At least 1 primary must acknowledge writes
  dw: 1, // At least 1 durable write
};
// This ensures writes reach at least one intended node while
// allowing sloppy behavior to satisfy the rest of the quorum
```

In production, monitor:

1. Hint queue size: growing queues indicate prolonged outages or handoff issues
2. Hint delivery rate: ensure hints are being processed faster than they are created
3. Hint TTL expirations: expired hints mean lost data
4. Substitute write percentage: high percentages indicate cluster health issues
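These monitoring heuristics can be sketched in code. The metric names below are illustrative assumptions, not any real system's exported metrics:

```typescript
interface HintMetrics {
  queuedHints: number;          // current hint backlog
  hintsCreatedPerSec: number;   // rate at which new hints are generated
  hintsDeliveredPerSec: number; // rate at which hints are handed off
  hintTtlExpirations: number;   // hints dropped undelivered (lost data)
  substituteWriteRatio: number; // fraction of writes using substitutes (0..1)
}

// Evaluate the hint pipeline against the monitoring rules above.
function assessHintHealth(m: HintMetrics): string[] {
  const warnings: string[] = [];
  if (m.hintsCreatedPerSec > m.hintsDeliveredPerSec) {
    warnings.push('Hint backlog growing: creation outpaces delivery');
  }
  if (m.hintTtlExpirations > 0) {
    warnings.push('Hints expired before delivery: data was lost');
  }
  if (m.substituteWriteRatio > 0.1) {
    warnings.push('Over 10% of writes using substitutes: check cluster health');
  }
  return warnings;
}

console.log(assessHintHealth({
  queuedHints: 5000,
  hintsCreatedPerSec: 120,
  hintsDeliveredPerSec: 80,
  hintTtlExpirations: 3,
  substituteWriteRatio: 0.25,
})); // all three warnings fire
```

The 10% substitute-write threshold is an arbitrary example; tune any such alert to your cluster's baseline.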
Sloppy quorums represent a powerful technique for maximizing availability in distributed systems at the cost of strict consistency guarantees. The essential takeaways: a sloppy quorum still collects W acknowledgments, but from substitute nodes when designated replicas are down; the W + R > N intersection guarantee no longer holds, so the system must be treated as eventually consistent; and hinted handoff is the mechanism that eventually moves data back to its intended replicas.
You now understand sloppy quorums and when to use them. In the next and final page of this module, we'll explore 'Hinted Handoff'—the mechanism that makes sloppy quorums viable by ensuring data eventually reaches its intended replicas, how to configure and monitor handoff processes, and what happens when handoff fails.