Having explored at-most-once delivery and its message loss trade-offs, we now turn to the opposite end of the reliability spectrum: at-least-once delivery.
At-least-once delivery answers a fundamental business requirement: No message should ever be permanently lost. Every message produced must eventually be consumed, even if the system experiences failures along the way. This guarantee is essential for business-critical workflows where message loss translates directly to financial loss, broken processes, or compliance violations.
But this guarantee comes with a price. To ensure no message is lost, the system must be willing to deliver messages more than once under certain failure conditions. This trade-off fundamentally changes how consumers must be designed.
By the end of this page, you will deeply understand at-least-once delivery—its formal guarantees, the mechanisms that enable it, the failure scenarios that cause duplicate delivery, and the critical implications for consumer design. You will understand why at-least-once is the default choice for business-critical messaging and how to design systems that handle it correctly.
At-least-once delivery guarantees that every message will be delivered one or more times—never zero. In the happy path a message arrives exactly once; under failure conditions it may arrive several times, but it is never silently dropped.
The formal definition:
For any message M sent by producer P to consumer C, let D(M) represent the number of times M is delivered to C. At-least-once guarantees: D(M) ≥ 1.
Unlike at-most-once (which bounds the upper limit), at-least-once bounds the lower limit. The message will arrive—the question is whether it arrives once or multiple times.
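The asymmetry between the two guarantees can be made concrete with a small simulation. The sketch below is purely illustrative (no real broker involved): a producer retries a message until it observes an ACK, and we inject a lost ACK for one message to show that delivery counts can exceed one but never reach zero.

```typescript
// Illustrative only: simulate an at-least-once channel where the producer
// retries until it observes an ACK, and ACKs can be lost in transit.
type DeliveryLog = Map<string, number>;

function simulateAtLeastOnce(
  messageIds: string[],
  ackLost: (attempt: number, id: string) => boolean
): DeliveryLog {
  const deliveries: DeliveryLog = new Map();

  for (const id of messageIds) {
    let acked = false;
    let attempt = 0;

    while (!acked) {
      attempt++;
      // The broker receives and persists the message on every attempt...
      deliveries.set(id, (deliveries.get(id) ?? 0) + 1);
      // ...but the ACK back to the producer may be lost, forcing a retry
      acked = !ackLost(attempt, id);
    }
  }

  return deliveries;
}

// The first ACK for m2 is lost, so m2 is delivered twice;
// every message still satisfies D(M) >= 1
const log = simulateAtLeastOnce(
  ['m1', 'm2', 'm3'],
  (attempt, id) => id === 'm2' && attempt === 1
);
```

Here `m2` ends up with a delivery count of 2 even though it was sent once: the retry that guarantees D(M) ≥ 1 is the same retry that produces the duplicate.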
```typescript
// At-least-once delivery: Producer persists and waits for confirmation
class AtLeastOnceProducer {
  private broker: MessageBroker;
  private maxRetries: number = 3;
  private retryDelay: number = 1000;

  constructor(broker: MessageBroker) {
    this.broker = broker;
  }

  /**
   * Send a message with at-least-once semantics.
   *
   * Key characteristics:
   * - Wait for broker acknowledgment
   * - Retry on failure
   * - Message persisted before acknowledgment
   * - Message guaranteed to arrive (eventually)
   * - Message may be duplicated on retry
   */
  async send(topic: string, message: Message): Promise<void> {
    let lastError: Error | null = null;

    for (let attempt = 1; attempt <= this.maxRetries; attempt++) {
      try {
        // Send and wait for broker confirmation
        const ack = await this.broker.publish(topic, message, {
          timeout: 5000,
          requireAck: true, // Wait for broker to persist
        });

        if (ack.success) {
          return; // Success - message guaranteed persisted
        }
      } catch (error) {
        lastError = error as Error;
        console.warn(`Attempt ${attempt} failed: ${lastError.message}`);

        // Wait before retry (with exponential backoff)
        await this.sleep(this.retryDelay * Math.pow(2, attempt - 1));
      }
    }

    // After all retries exhausted, we have options:
    // 1. Throw error (caller must handle)
    // 2. Queue for later retry (dead letter queue)
    // 3. Alert operations team
    throw new Error(`Failed to deliver message after ${this.maxRetries} attempts: ${lastError?.message}`);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
```

The very mechanism that prevents message loss—retries—is also what causes duplicates. If the producer's first attempt succeeded but the acknowledgment was lost (network failure), the producer will retry, delivering the same message again. The consumer will receive it twice.
At-least-once delivery requires several key mechanisms working together across the entire message pipeline.
Producer-Side Mechanics:

- Send the message and wait for an explicit broker acknowledgment before treating it as delivered
- On timeout or error, retry the send (ideally with exponential backoff)
- Only after all retries are exhausted, escalate: throw, route to a dead letter queue, or alert operators
Broker-Side Mechanics:
The message broker plays a critical role in at-least-once delivery:
```typescript
class AtLeastOnceBroker {
  private storage: DurableStorage;
  private activeDeliveries: Map<string, MessageDelivery> = new Map();
  private redeliveryTimeout: number = 30000; // 30 seconds

  /**
   * Receive message from producer with at-least-once guarantee
   */
  async receiveFromProducer(message: Message): Promise<ProducerAck> {
    // 1. FIRST: Persist the message durably
    // This is the critical step - message survives broker crash
    await this.storage.persist(message);

    // 2. ONLY THEN: Acknowledge to producer
    // If broker crashes between persist and ack, producer retries
    // Result: duplicate in storage (consumer must handle)
    return { success: true, messageId: message.id };
  }

  /**
   * Deliver message to consumer with redelivery on timeout
   */
  async deliverToConsumer(message: Message, consumer: Consumer): Promise<void> {
    const deliveryId = `${message.id}-${Date.now()}`;

    // Track this delivery attempt
    this.activeDeliveries.set(deliveryId, {
      message,
      consumer,
      startedAt: Date.now(),
      acknowledged: false,
    });

    // Deliver the message
    await consumer.receive(message);

    // Start redelivery timer
    // If consumer doesn't ACK within timeout, redeliver
    this.scheduleRedeliveryCheck(deliveryId);
  }

  /**
   * Process acknowledgment from consumer
   */
  async acknowledgeFromConsumer(messageId: string): Promise<void> {
    // Find and mark delivery as acknowledged
    for (const [deliveryId, delivery] of this.activeDeliveries) {
      if (delivery.message.id === messageId) {
        delivery.acknowledged = true;

        // Remove from active deliveries
        this.activeDeliveries.delete(deliveryId);

        // Mark as consumed in storage (or delete)
        await this.storage.markConsumed(messageId);
        return;
      }
    }
  }

  /**
   * Check if redelivery is needed
   */
  private scheduleRedeliveryCheck(deliveryId: string): void {
    setTimeout(async () => {
      const delivery = this.activeDeliveries.get(deliveryId);

      if (delivery && !delivery.acknowledged) {
        // Consumer didn't acknowledge in time
        // REDELIVER - this causes the duplicate
        console.warn(`Message ${delivery.message.id} not acknowledged, redelivering`);

        // Remove stale delivery record
        this.activeDeliveries.delete(deliveryId);

        // Redeliver to same or different consumer
        await this.deliverToConsumer(delivery.message, delivery.consumer);
      }
    }, this.redeliveryTimeout);
  }
}
```

Consumer-Side Mechanics:
The consumer's behavior is crucial for at-least-once delivery. The key rule: acknowledge AFTER processing, not before.
```typescript
class AtLeastOnceConsumer {
  private broker: MessageBroker;
  private processingTimeout: number = 25000; // Less than broker's redelivery timeout

  async consumeMessage(): Promise<void> {
    const message = await this.broker.receive();

    try {
      // CRITICAL: Process BEFORE acknowledging
      // This is what guarantees at-least-once delivery
      await this.processMessage(message);

      // ONLY acknowledge after successful processing
      await this.broker.acknowledge(message.id);
    } catch (error) {
      // Processing failed - DO NOT acknowledge
      // Message will be redelivered by broker
      console.error(`Processing failed: ${(error as Error).message}`);

      // Optionally: negative acknowledge to speed up redelivery
      await this.broker.negativeAcknowledge(message.id);

      // The same message will be delivered again
    }
  }

  private async processMessage(message: Message): Promise<void> {
    // Business logic here
    // MUST be idempotent because duplicates are possible!
    await this.orderService.fulfillOrder(message.orderId);
  }
}
```

At-least-once delivery fundamentally requires idempotent consumers. Since the same message may be delivered multiple times, processing that message multiple times must produce the same result as processing it once. We'll cover idempotency patterns in detail in Page 4.
Understanding exactly when and why duplicates occur is essential for designing robust systems. Let's examine each failure scenario in detail.
| Failure Scenario | What Happens | Duplicate Cause | Consumer Impact |
|---|---|---|---|
| Producer ACK lost | Producer sent message, broker received and persisted, ACK packet lost | Producer retries, sends same message again | Message arrives twice from broker |
| Consumer crash after processing | Consumer processed message, crashed before sending ACK | Broker times out, redelivers to same/different consumer | Same message processed twice |
| Consumer ACK lost | Consumer sent ACK, network lost the packet | Broker times out waiting for ACK, redelivers | Same message processed twice |
| Broker failover | Primary broker received message, failed before replicating | Consumer reconnects to new broker, may receive again | Depends on replication lag |
| Network partition healed | Consumer was partitioned, processed old message, reconnects | Out-of-order delivery after partition | Old message reprocessed |
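The middle rows of the table (consumer crash after processing, lost consumer ACK) both reduce to the same broker behavior: an unacknowledged delivery is redelivered after a timeout. Here is a toy simulation of that behavior, with a tick counter standing in for real time; all names are hypothetical and not tied to any particular broker.

```typescript
// Hypothetical sketch: a broker that redelivers any message not acknowledged
// within a timeout. "Time" is a simple tick counter rather than a real clock.
interface Pending { id: string; deliveredAt: number }

class RedeliverySimulator {
  private pending: Pending[] = [];
  deliveryCount = new Map<string, number>();

  constructor(private timeoutTicks: number) {}

  // Deliver a message and start tracking it as unacknowledged
  deliver(id: string, now: number): void {
    this.deliveryCount.set(id, (this.deliveryCount.get(id) ?? 0) + 1);
    this.pending.push({ id, deliveredAt: now });
  }

  // An ACK removes the message from the redelivery watch list
  ack(id: string): void {
    this.pending = this.pending.filter(p => p.id !== id);
  }

  // On each tick, redeliver anything whose ACK window has expired
  tick(now: number): void {
    const expired = this.pending.filter(p => now - p.deliveredAt >= this.timeoutTicks);
    this.pending = this.pending.filter(p => now - p.deliveredAt < this.timeoutTicks);
    for (const p of expired) this.deliver(p.id, now);
  }
}

const sim = new RedeliverySimulator(3);
sim.deliver('m1', 0);
sim.ack('m1');        // ACK arrives: m1 is done, delivered once
sim.deliver('m2', 0); // consumer processes m2 but its ACK is lost
sim.tick(3);          // timeout expires: m2 is redelivered (duplicate)
```

Note that the broker cannot distinguish "consumer crashed before processing" from "consumer processed but the ACK was lost": in both cases it sees a missing ACK and must redeliver, which is why only the consumer can resolve the duplicate.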
Scenario Deep Dive: Consumer Crash After Processing
This is the most common duplicate scenario. Let's trace through exactly what happens:
```typescript
/**
 * Timeline of a consumer crash causing duplicate processing
 *
 * T+0ms:     Broker delivers message M1 to Consumer A
 * T+100ms:   Consumer A receives message M1
 * T+500ms:   Consumer A starts processing M1
 * T+2000ms:  Consumer A completes processing M1
 *            - Order is fulfilled
 *            - Database is updated
 *            - Email is sent
 *            ⚡ CRASH: Consumer A dies before sending ACK
 *
 * T+30000ms: Broker's redelivery timeout expires
 *            Broker notices M1 was never acknowledged
 *            Broker marks M1 for redelivery
 *
 * T+30100ms: Broker delivers M1 to Consumer B (or restarted A)
 *
 * T+32000ms: Consumer B processes M1 again
 *            - Order is fulfilled AGAIN (duplicate!)
 *            - Database is updated AGAIN
 *            - Email is sent AGAIN
 *            Consumer B sends ACK
 *
 * Result: Customer receives order twice, gets charged twice (potentially),
 *         receives two confirmation emails
 *
 * This is why AT-LEAST-ONCE REQUIRES IDEMPOTENT CONSUMERS
 */

// The solution: Make processing idempotent
class IdempotentOrderProcessor {
  private db: Database;

  async processOrder(orderId: string): Promise<void> {
    // Check if already processed
    const existingOrder = await this.db.orders.findUnique({
      where: { orderId, status: 'FULFILLED' }
    });

    if (existingOrder) {
      console.log(`Order ${orderId} already fulfilled, skipping`);
      return; // Idempotent: duplicate processing is a no-op
    }

    // Process the order
    await this.db.orders.update({
      where: { orderId },
      data: { status: 'FULFILLED' }
    });

    await this.fulfillmentService.ship(orderId);
  }
}
```

Quantifying Duplicate Probability
The probability of duplicate delivery varies with system characteristics: network reliability (lost ACKs), consumer stability (crashes before acknowledging), and how acknowledgment timeouts compare to actual processing times.
In well-designed production systems, duplicate rates typically range from 0.01% to 0.5%. While this seems low, at high volumes it becomes significant:
```typescript
/**
 * Calculate expected duplicates at different scales
 */
function calculateDuplicates(
  messagesPerDay: number,
  duplicateRate: number = 0.001 // 0.1%
): void {
  const dailyDuplicates = messagesPerDay * duplicateRate;

  console.log(`At ${messagesPerDay.toLocaleString()} messages/day:`);
  console.log(`  Daily duplicates: ~${Math.round(dailyDuplicates).toLocaleString()}`);
  console.log(`  Monthly duplicates: ~${Math.round(dailyDuplicates * 30).toLocaleString()}`);
}

// Real-world examples
calculateDuplicates(100_000); // Small service
// Daily: ~100, Monthly: ~3,000

calculateDuplicates(10_000_000); // Medium service
// Daily: ~10,000, Monthly: ~300,000

calculateDuplicates(1_000_000_000); // Large platform
// Daily: ~1,000,000, Monthly: ~30,000,000

// At scale, even a 0.1% duplicate rate means MILLIONS of duplicates per month
// Every consumer MUST be designed to handle this
```

Do not treat duplicate messages as bugs to be eliminated. They are the natural consequence of choosing reliability over simplicity. Your system design must embrace duplicates as a normal operating condition, not an exceptional failure mode.
The acknowledgment (ACK) mechanism is the heart of at-least-once delivery. Different acknowledgment patterns offer different trade-offs between reliability, performance, and complexity.
Pattern 1: Single Message Acknowledgment
The consumer acknowledges each message individually after processing. This is the safest and most common pattern.
```typescript
class SingleMessageAckConsumer {
  private broker: MessageBroker;

  async consume(): Promise<void> {
    while (true) {
      const message = await this.broker.receive();

      try {
        await this.process(message);

        // ACK this specific message
        await this.broker.ack(message.deliveryTag);
      } catch (error) {
        // NACK to trigger redelivery
        await this.broker.nack(message.deliveryTag, { requeue: true });
      }
    }
  }
}

// Pros:
// - Fine-grained control over which messages succeed
// - Failed messages are redelivered independently
// - Easy to reason about

// Cons:
// - ACK round-trip for every message (latency overhead)
// - Lower throughput for high-volume scenarios
```

Pattern 2: Batch Acknowledgment
The consumer acknowledges multiple messages at once, typically up to a specific offset or batch boundary.
```typescript
class BatchAckConsumer {
  private broker: MessageBroker;
  private batchSize: number = 100;
  private processedMessages: Message[] = [];

  async consume(): Promise<void> {
    while (true) {
      const message = await this.broker.receive();

      try {
        await this.process(message);
        this.processedMessages.push(message);

        // ACK when batch is complete
        if (this.processedMessages.length >= this.batchSize) {
          // Acknowledge all messages up to the latest offset
          const latestOffset = this.processedMessages[this.processedMessages.length - 1].offset;
          await this.broker.ackUpTo(latestOffset);
          this.processedMessages = []; // Clear batch
        }
      } catch (error) {
        // Batch ACK complexity: if message 50/100 fails,
        // we can only ACK up to message 49
        // Messages 50-99 must be reprocessed
        const lastSuccessful = this.processedMessages[this.processedMessages.length - 1];
        if (lastSuccessful) {
          await this.broker.ackUpTo(lastSuccessful.offset);
        }

        // NACK the failed message
        await this.broker.nack(message.deliveryTag, { requeue: true });
        this.processedMessages = [];
      }
    }
  }
}

// Pros:
// - Higher throughput (fewer ACK round-trips)
// - Better performance for high-volume scenarios

// Cons:
// - Failure in batch affects subsequent messages
// - More complex error handling
// - Larger redelivery units (more duplicates on failure)
```

Pattern 3: Offset-Based Acknowledgment (Kafka-style)
Common in log-based messaging systems, this pattern commits the consumer's read position (offset) rather than individual messages.
```typescript
import { Kafka, EachMessagePayload } from 'kafkajs';

const kafka = new Kafka({ brokers: ['kafka:9092'] });
const consumer = kafka.consumer({ groupId: 'order-processor' });

await consumer.subscribe({ topic: 'orders' });

// Manual offset commit for at-least-once guarantee
await consumer.run({
  autoCommit: false, // CRITICAL: disable auto-commit for at-least-once
  eachMessage: async ({ topic, partition, message }: EachMessagePayload) => {
    try {
      // Process the message
      await processOrder(JSON.parse(message.value!.toString()));

      // Commit offset AFTER successful processing
      await consumer.commitOffsets([{
        topic,
        partition,
        offset: (BigInt(message.offset) + 1n).toString(),
      }]);
    } catch (error) {
      // Don't commit - message will be redelivered on restart
      // Or: seek back to retry immediately
      console.error(`Failed to process message at offset ${message.offset}`);

      // Option: explicit seek for immediate retry
      // consumer.seek({ topic, partition, offset: message.offset });
    }
  },
});

// WARNING: Auto-commit (autoCommit: true) weakens the guarantee!
// Auto-commit happens on a timer, regardless of processing success,
// so offsets can be committed for messages whose processing later fails.
// For at-least-once, you MUST use manual commits
```

Many developers are surprised to learn that relying on Kafka's auto-commit behavior does not give a clean at-least-once guarantee. Auto-commit happens on a timer (every 5 seconds by default), regardless of whether processing succeeded, so an offset can be committed for a message whose processing later fails, degrading to at-most-once for that message. For at-least-once, always disable auto-commit and commit manually after processing.
Pattern 4: Transactional Acknowledgment
For systems requiring exactly-once processing with external side effects, some brokers support transactional acknowledgment patterns.
```typescript
// Kafka Transactions: consume-transform-produce atomically
const producer = kafka.producer({
  transactionalId: 'order-processor-tx',
  idempotent: true,
});

await consumer.run({
  autoCommit: false,
  eachMessage: async ({ topic, partition, message }) => {
    // Begin transaction
    const transaction = await producer.transaction();

    try {
      // Process and produce output
      const result = await processOrder(message.value);

      await transaction.send({
        topic: 'processed-orders',
        messages: [{ value: JSON.stringify(result) }],
      });

      // Commit consumer offset AS PART OF the transaction
      await transaction.sendOffsets({
        consumerGroupId: 'order-processor',
        topics: [{
          topic,
          partitions: [{
            partition,
            offset: (BigInt(message.offset) + 1n).toString(),
          }],
        }],
      });

      // Atomically commit: output message + offset commit
      await transaction.commit();
    } catch (error) {
      // Abort transaction: neither output nor offset is committed
      await transaction.abort();
    }
  },
});

// This provides exactly-once semantics for consume-transform-produce workflows
// But external side effects (APIs, databases) still need additional handling
```

Retry mechanisms are fundamental to at-least-once delivery, but naive retry strategies can cause more problems than they solve. Let's explore the key retry patterns and their trade-offs.
Immediate Retry (Anti-Pattern)
The simplest—and worst—retry strategy is to retry immediately upon failure.
```typescript
// ❌ ANTI-PATTERN: Immediate retry
async function sendWithImmediateRetry(broker: MessageBroker, message: Message): Promise<void> {
  while (true) {
    try {
      await broker.send(message);
      return;
    } catch (error) {
      // Immediately retry - THIS IS DANGEROUS
      console.error(`Send failed, retrying immediately...`);
      // No delay between retries
    }
  }
}

// Problems:
// 1. If failure is due to overload, immediate retries make it worse
// 2. CPU/network saturation from tight retry loop
// 3. Can trigger cascading failures across the system
// 4. No opportunity for transient issues to resolve
```

Exponential Backoff
The standard solution is exponential backoff—each retry waits longer than the previous one.
```typescript
class ExponentialBackoffRetry {
  private broker: MessageBroker;
  private baseDelay: number = 100;   // 100ms initial delay
  private maxDelay: number = 30000;  // 30 second max delay
  private maxRetries: number = 10;

  /**
   * Calculate delay for retry attempt.
   * Delay doubles with each attempt: 100ms, 200ms, 400ms, 800ms...
   */
  private calculateDelay(attempt: number): number {
    const delay = this.baseDelay * Math.pow(2, attempt);
    return Math.min(delay, this.maxDelay);
  }

  async sendWithRetry(message: Message): Promise<void> {
    let lastError: Error | null = null;

    for (let attempt = 0; attempt < this.maxRetries; attempt++) {
      try {
        await this.broker.send(message);
        return; // Success
      } catch (error) {
        lastError = error as Error;

        if (attempt < this.maxRetries - 1) {
          const delay = this.calculateDelay(attempt);
          console.log(`Retry ${attempt + 1} in ${delay}ms...`);
          await this.sleep(delay);
        }
      }
    }

    throw new Error(`Failed after ${this.maxRetries} retries: ${lastError?.message}`);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

// Retry timeline example:
// Attempt 1: immediate
// Attempt 2: wait 100ms
// Attempt 3: wait 200ms
// Attempt 4: wait 400ms
// Attempt 5: wait 800ms
// Attempt 6: wait 1600ms
// Attempt 7: wait 3200ms
// Attempt 8: wait 6400ms
// Attempt 9: wait 12800ms
// Attempt 10: wait 25600ms
// Total max wait: ~50 seconds across all retries
```

Exponential Backoff with Jitter
When many clients retry simultaneously (thundering herd), their retries can synchronize and overwhelm the system. Adding jitter (randomness) desynchronizes retries.
```typescript
class JitteredExponentialBackoff {
  private baseDelay: number = 100;
  private maxDelay: number = 30000;

  /**
   * Full Jitter: Random value between 0 and exponential delay
   * Recommended by AWS and Google for distributed systems
   */
  calculateDelayWithFullJitter(attempt: number): number {
    const exponentialDelay = this.baseDelay * Math.pow(2, attempt);
    const cappedDelay = Math.min(exponentialDelay, this.maxDelay);

    // Full jitter: random between 0 and calculated delay
    return Math.random() * cappedDelay;
  }

  /**
   * Equal Jitter: Half fixed, half random
   * More predictable minimum wait time
   */
  calculateDelayWithEqualJitter(attempt: number): number {
    const exponentialDelay = this.baseDelay * Math.pow(2, attempt);
    const cappedDelay = Math.min(exponentialDelay, this.maxDelay);

    // Equal jitter: half delay + random(0, half delay)
    const halfDelay = cappedDelay / 2;
    return halfDelay + (Math.random() * halfDelay);
  }

  /**
   * Decorrelated Jitter: Each delay based on previous delay
   * AWS recommendation for highly contended resources
   */
  calculateDecorrelatedJitter(previousDelay: number): number {
    const newDelay = this.baseDelay + Math.random() * (previousDelay * 3 - this.baseDelay);
    return Math.min(newDelay, this.maxDelay);
  }
}

// Without jitter (1000 clients failing at T=0):
// T+100ms: 1000 clients retry simultaneously → system overloaded
// T+200ms: 1000 clients retry simultaneously → system still overloaded
// Repeating pattern prevents recovery

// With full jitter (1000 clients failing at T=0):
// T+0-100ms: Clients retry randomly spread across 100ms window
// T+0-200ms: Second attempts spread across 200ms window
// System has time to recover between requests
```

AWS's architecture best practices recommend full jitter for most retry scenarios. Their analysis shows full jitter completes work fastest on average because it maximizes the spread of retry attempts, giving the downstream system the best chance to recover.
Let's examine how major messaging systems implement at-least-once delivery guarantees.
Apache Kafka
Kafka provides at-least-once delivery through producer acknowledgments and manual consumer offset commits.
```typescript
import { Kafka } from 'kafkajs';

const kafka = new Kafka({
  clientId: 'order-service',
  brokers: ['kafka-1:9092', 'kafka-2:9092', 'kafka-3:9092'],
});

// Producer: At-least-once configuration
const producer = kafka.producer({
  // Enable idempotent producer to deduplicate within producer session
  idempotent: true,

  // Retry configuration
  retry: {
    retries: 10,
    initialRetryTime: 100,
    maxRetryTime: 30000,
  },
});

// Producer usage
await producer.send({
  topic: 'orders',
  // acks: -1 (equivalent to 'all') = wait for all in-sync replicas
  // This is the key setting for durability; in kafkajs it is passed per send()
  acks: -1,
  messages: [{
    key: orderId, // Key for partitioning (ordering guarantee)
    value: JSON.stringify(order),
  }],
});
// This call returns ONLY after all replicas have acknowledged

// Consumer: At-least-once configuration
const consumer = kafka.consumer({ groupId: 'order-processor' });

await consumer.run({
  // Manual commits for at-least-once guarantee (a run() option in kafkajs)
  autoCommit: false,
  eachMessage: async ({ topic, partition, message }) => {
    // Process first
    await processOrder(message.value);

    // Then commit
    await consumer.commitOffsets([{
      topic,
      partition,
      offset: (BigInt(message.offset) + 1n).toString(),
    }]);
  },
});
```

RabbitMQ
RabbitMQ provides at-least-once through publisher confirms and consumer acknowledgments.
```typescript
import amqp from 'amqplib';

// Publisher: At-least-once with publisher confirms
const connection = await amqp.connect('amqp://localhost');

// Create a confirm channel so the broker confirms each publish
const channel = await connection.createConfirmChannel();

// Declare durable queue (survives broker restart)
await channel.assertQueue('orders', {
  durable: true, // Queue definition persisted
});

// Publish with confirmation
const message = JSON.stringify(order);
channel.sendToQueue(
  'orders',
  Buffer.from(message),
  {
    persistent: true, // Message persisted to disk
  }
);

// Wait for broker confirm
await channel.waitForConfirms();
// Now we know the message is safely persisted

// Consumer: At-least-once with manual acknowledgment
await channel.consume(
  'orders',
  async (msg) => {
    if (!msg) return;

    try {
      // Process the message
      await processOrder(JSON.parse(msg.content.toString()));

      // Acknowledge AFTER successful processing
      channel.ack(msg);
    } catch (error) {
      // Negative acknowledge - message will be requeued
      channel.nack(msg, false, true); // (msg, allUpTo, requeue)
    }
  },
  {
    noAck: false, // CRITICAL: Enable acknowledgments
  }
);
```

AWS SQS
SQS provides at-least-once delivery through visibility timeouts and explicit deletion.
```typescript
import { SQSClient, ReceiveMessageCommand, DeleteMessageCommand } from '@aws-sdk/client-sqs';

const sqs = new SQSClient({ region: 'us-east-1' });
const queueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789/orders';

async function consumeMessages(): Promise<void> {
  while (true) {
    // Receive messages (they become invisible, not deleted)
    const response = await sqs.send(new ReceiveMessageCommand({
      QueueUrl: queueUrl,
      MaxNumberOfMessages: 10,
      VisibilityTimeout: 300, // 5 minutes to process
      WaitTimeSeconds: 20,    // Long polling
    }));

    for (const message of response.Messages || []) {
      try {
        // Process the message
        await processOrder(JSON.parse(message.Body!));

        // Delete AFTER successful processing
        // This is the "acknowledgment" in SQS
        await sqs.send(new DeleteMessageCommand({
          QueueUrl: queueUrl,
          ReceiptHandle: message.ReceiptHandle!,
        }));
      } catch (error) {
        // Don't delete - message will become visible again
        // after visibility timeout expires
        console.error(`Processing failed: ${(error as Error).message}`);

        // SQS will redeliver after visibility timeout
        // This provides at-least-once delivery
      }
    }
  }
}

// SQS At-Least-Once Mechanics:
// 1. Message is received and becomes invisible
// 2. Consumer has VisibilityTimeout seconds to process and delete
// 3. If deleted: message is gone (successfully processed)
// 4. If not deleted: message becomes visible again (redelivered)
//
// After MaxReceiveCount redeliveries, message goes to Dead Letter Queue
```

Despite different APIs and terminology, all systems implement the same fundamental pattern: receive → process → acknowledge. The acknowledgment is only sent after processing succeeds, ensuring no message is lost. The cost is potential duplicates when acknowledgment fails.
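That shared pattern can be written down broker-agnostically. The sketch below uses a hypothetical Broker interface of my own naming; its acknowledge() would map onto commitOffsets in Kafka, ack in RabbitMQ, or DeleteMessage in SQS. A minimal in-memory broker is included so the loop can be exercised.

```typescript
// Broker-agnostic sketch of the universal at-least-once consumer loop.
// The Broker interface is hypothetical; real systems differ only in naming.
interface BrokerMessage { id: string; body: string }

interface Broker {
  receive(): BrokerMessage | undefined;
  acknowledge(id: string): void;
}

function drainQueue(
  broker: Broker,
  handle: (body: string) => void
): string[] {
  const ackedIds: string[] = [];
  let msg: BrokerMessage | undefined;

  while ((msg = broker.receive()) !== undefined) {
    try {
      handle(msg.body);           // 1. process first
      broker.acknowledge(msg.id); // 2. acknowledge only on success
      ackedIds.push(msg.id);
    } catch {
      // 3. no ACK on failure: a real broker would redeliver later
    }
  }

  return ackedIds;
}

// Minimal in-memory broker for demonstration
class InMemoryBroker implements Broker {
  constructor(private queue: BrokerMessage[]) {}
  receive(): BrokerMessage | undefined { return this.queue.shift(); }
  acknowledge(_id: string): void { /* mark consumed */ }
}

const demoBroker = new InMemoryBroker([
  { id: 'a', body: 'ok' },
  { id: 'b', body: 'fail' },
  { id: 'c', body: 'ok' },
]);

const acked = drainQueue(demoBroker, body => {
  if (body === 'fail') throw new Error('processing error');
});
// acked is ['a', 'c']: 'b' was never acknowledged
```

With the queue above, only 'a' and 'c' are acknowledged; 'b' stays unacknowledged, and a real broker would make it visible again for redelivery.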
At-least-once delivery is the workhorse of reliable distributed messaging, providing the guarantee that no message will ever be lost. Let's consolidate the key insights:

- The formal guarantee is D(M) ≥ 1: a message may arrive more than once, but never zero times
- The universal pattern is receive → process → acknowledge, with the acknowledgment sent only after processing succeeds
- The same mechanisms that prevent loss (retries, redelivery timeouts, persistence before ACK) are exactly what produce duplicates
- Duplicates are a normal operating condition, not a bug, so consumers must be idempotent
- Retries should use exponential backoff with jitter to avoid amplifying failures into thundering herds
You now understand at-least-once delivery—the most common choice for reliable messaging in distributed systems. In the next page, we'll tackle the most challenging guarantee: exactly-once delivery, exploring why it's so difficult to achieve and what 'exactly-once' actually means in practice.