Delivery Guarantees - Learning Module


At-Most-Once Delivery

The Foundational Question of Message Delivery

In distributed systems, one of the most fundamental questions we face is deceptively simple: When you send a message from one service to another, how many times will that message be delivered?

This question may seem trivial—after all, you send a message, and the recipient gets it, right? But in the presence of network failures, process crashes, and the inherent unreliability of distributed systems, the answer becomes far more nuanced. The properties you can guarantee around message delivery define the character of your entire system architecture.

Delivery guarantees fall into three fundamental categories, each representing a distinct trade-off between reliability, complexity, and performance:

At-most-once delivery: Messages may be lost, but will never be delivered more than once
At-least-once delivery: Messages are retried until acknowledged, so they are not lost as long as the system eventually recovers, but they may be delivered multiple times
Exactly-once delivery: Messages will be delivered precisely one time (with significant caveats)

This page focuses exclusively on at-most-once delivery—the simplest, fastest, but also the most 'lossy' of these guarantees.

What You Will Learn

By the end of this page, you will deeply understand the at-most-once delivery semantic—its guarantees, its failure modes, why it exists, when to use it, and how it relates to the broader landscape of distributed systems reliability. You will be able to articulate precisely why at-most-once is the right choice in specific scenarios and why it's dangerously inappropriate in others.

Defining At-Most-Once Semantics

At-most-once delivery provides a guarantee that every message will be delivered either zero or one time—never more. This means:

Best case: The message is successfully transmitted, received, processed, and acknowledged exactly once.
Worst case: The message is lost somewhere in transit, during processing, or before acknowledgment—and the producer has no mechanism to recover it.

This semantic is sometimes called "fire-and-forget" because the producer sends the message and immediately moves on, without waiting for confirmation of successful processing.

at-most-once-producer.ts
// At-most-once delivery: Producer sends without confirmation
class AtMostOnceProducer {
    private broker: MessageBroker;
 
    constructor(broker: MessageBroker) {
        this.broker = broker;
    }
 
    /**
     * Send a message with at-most-once semantics.
     * 
     * Key characteristics:
     * - No acknowledgment awaited
     * - No retry on failure
     * - Message may be lost
     * - Extremely fast
     */
    send(topic: string, message: Message): void {
        // Fire and forget - no waiting for acknowledgment
        this.broker.publish(topic, message);
        
        // Producer immediately continues
        // No way to know if message was received
        // No retry mechanism
    }
}
 
// Usage - the producer doesn't know if this succeeded
producer.send("metrics-topic", {
    type: "cpu_usage",
    value: 45.2,
    timestamp: Date.now()
});
 
// Execution continues immediately regardless of delivery success

The formal definition of at-most-once semantics can be expressed as:

For any message M sent by producer P to consumer C, let D(M) represent the number of times M is delivered to C. At-most-once guarantees: 0 ≤ D(M) ≤ 1.
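
One way to make this definition concrete is to count deliveries per message ID in a delivery log and check the bound. The sketch below is illustrative only: `deliveryCounts` and `satisfiesAtMostOnce` are hypothetical helpers, not part of any messaging library.

```typescript
// D(M) for every message that was sent, including those never delivered.
function deliveryCounts(sentIds: string[], deliveredIds: string[]): Record<string, number> {
    const counts: Record<string, number> = {};
    for (const id of sentIds) counts[id] = 0;
    for (const id of deliveredIds) counts[id] = (counts[id] ?? 0) + 1;
    return counts;
}

// At-most-once holds when 0 <= D(M) <= 1 for every message M.
function satisfiesAtMostOnce(counts: Record<string, number>): boolean {
    return Object.keys(counts).every((id) => counts[id] <= 1);
}

const sent = ["m1", "m2", "m3"];
// m2 was lost: allowed under at-most-once.
const lossy = deliveryCounts(sent, ["m1", "m3"]);
// m1 arrived twice: a violation.
const duplicated = deliveryCounts(sent, ["m1", "m1", "m3"]);
```

Here `satisfiesAtMostOnce(lossy)` holds even though `m2` never arrived, while `satisfiesAtMostOnce(duplicated)` does not: loss is permitted, duplication is not.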

This is the weakest delivery guarantee possible. It makes no promises about successful delivery—only that if delivery happens, it happens at most once.

Why 'At-Most-Once' and Not 'Exactly-Zero-Or-One'?

The naming convention 'at-most-once' emphasizes what the guarantee prevents (multiple deliveries) rather than what it ensures (any delivery at all). This phrasing is deliberate—it signals that the system is optimizing for preventing duplicates, accepting the possibility of message loss as a trade-off.

The Mechanics of At-Most-Once Delivery

To understand why at-most-once delivery exists and when it's appropriate, we need to examine the mechanics of message transmission in distributed systems.

The message lifecycle in a typical messaging system involves several steps:

Production: The producer creates the message
Transmission: The message travels over the network to the broker
Persistence: The broker stores the message (sometimes)
Delivery: The broker transmits the message to the consumer
Processing: The consumer handles the message
Acknowledgment: The consumer confirms successful processing

At-most-once delivery typically works by acknowledging the message before (or instead of) processing it, or by not requiring acknowledgment at all.
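
The effect of acknowledgment ordering can be sketched with a tiny simulation. It assumes a simplified broker that redelivers any message that has not been acknowledged, and a consumer that crashes during processing on chosen attempts. `simulate` and its parameters are hypothetical names for this illustration.

```typescript
type AckTiming = "before-processing" | "after-processing";

interface Outcome {
    timesProcessed: number; // how often the business logic ran to completion
    deliveries: number;     // how often the broker handed the message over
}

// Simulates one message. The consumer crashes during processing on the
// attempts listed in crashOnAttempts (1-based). The broker redelivers while
// the message is unacknowledged, up to maxDeliveries.
function simulate(timing: AckTiming, crashOnAttempts: number[], maxDeliveries = 5): Outcome {
    let acknowledged = false;
    let timesProcessed = 0;
    let deliveries = 0;

    while (!acknowledged && deliveries < maxDeliveries) {
        deliveries++;
        if (timing === "before-processing") acknowledged = true; // ACK first
        const crashed = crashOnAttempts.indexOf(deliveries) !== -1;
        if (crashed) continue; // processing never completes on this attempt
        timesProcessed++;
        if (timing === "after-processing") acknowledged = true; // ACK last
    }
    return { timesProcessed, deliveries };
}

const preAck = simulate("before-processing", [1]);
const postAck = simulate("after-processing", [1]);
```

With a crash on the first attempt, the pre-ACK run ends with one delivery and zero completed processing (the message is lost), while the post-ACK run is delivered a second time and processed once.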

Implementation Pattern 1: Fire-and-Forget (No ACK)

The simplest at-most-once implementation uses UDP-style semantics—no acknowledgment mechanism exists at all. The producer sends the message and the broker (if present) delivers it without tracking whether the consumer received it.

fire-and-forget.ts
// Pattern 1: No acknowledgment at all
class FireAndForgetBroker {
    private consumers: Map<string, Consumer[]> = new Map();
 
    publish(topic: string, message: Message): void {
        const topicConsumers = this.consumers.get(topic) || [];
        
        for (const consumer of topicConsumers) {
            // Send and immediately forget
            // No tracking, no retry, no delivery confirmation
            try {
                consumer.receive(message);
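            } catch (error) {
                // Log and continue - message is lost
                // (catch variables are `unknown` under strict TypeScript settings)
                const reason = error instanceof Error ? error.message : String(error);
                console.warn(`Delivery failed: ${reason}`);
            }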
        }
        
        // Message is discarded after single delivery attempt
    }
}

Implementation Pattern 2: Pre-Processing Acknowledgment

In this pattern, the consumer acknowledges receipt before it begins processing. If the consumer crashes during processing, the message is lost because the broker already considers it delivered.

pre-ack-pattern.ts
// Pattern 2: Acknowledge before processing
class PreAckConsumer {
    // Broker is injected so the field is always initialized
    constructor(private readonly broker: MessageBroker) {}
 
    async consumeMessage(): Promise<void> {
        const message = await this.broker.receive();
        
        // CRITICAL: Acknowledge BEFORE processing
        // If we crash after this, message is lost
        await this.broker.acknowledge(message.id);
        
        // Now process the message
        // If this throws, the message is already acknowledged
        // and will NOT be redelivered
        try {
            await this.processMessage(message);
        } catch (error) {
            // Message processing failed but it's already ACK'd
            // Message is effectively lost - no retry will occur
            const reason = error instanceof Error ? error.message : String(error);
            console.error(`Processing failed, message lost: ${reason}`);
        }
    }
 
    private async processMessage(message: Message): Promise<void> {
        // Business logic here
        // If this crashes, message is gone forever
    }
}

The Crash Window

The critical vulnerability in at-most-once delivery is the crash window—the period between acknowledgment and processing completion. Any failure during this window results in permanent message loss. The larger this window, the higher the probability of data loss.
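
A rough way to reason about window size: if crashes are modeled as a Poisson process with rate λ (an assumption made here purely for illustration, not a claim from this page), the chance that a crash lands inside a window of length w is 1 - e^(-λw). `crashWindowLossProbability` is a hypothetical helper.

```typescript
// P(at least one crash during the window), assuming Poisson-distributed crashes.
function crashWindowLossProbability(crashesPerHour: number, windowMs: number): number {
    const windowHours = windowMs / 3_600_000;
    // -expm1(-x) equals 1 - e^(-x) but stays accurate for very small x
    return -Math.expm1(-crashesPerHour * windowHours);
}

// Same crash rate (one per hour), different processing durations.
const shortWindow = crashWindowLossProbability(1, 10);     // 10 ms of processing
const longWindow = crashWindowLossProbability(1, 10_000);  // 10 s of processing
```

Under this model the loss probability grows almost linearly with the window while the window is short relative to the time between crashes, which is why keeping post-ACK processing brief reduces exposure.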

Failure Modes and Loss Scenarios

Understanding where and how messages can be lost in at-most-once systems is crucial for making informed architectural decisions. Let's examine each point of failure in the message pipeline.

At-Most-Once Failure Points
Failure Point	Failure Scenario	Message Outcome	Recovery Possibility
Producer→Broker Network	Network partition, timeout, packet loss	Message never reaches broker	None - producer doesn't know it failed
Broker Storage	Broker crash before persisting (if not durable)	Message vanishes	None - no persistence means no recovery
Broker→Consumer Network	Network failure after broker sends	Message in flight is lost	None - broker already discarded message
Consumer Pre-ACK Crash	Consumer crashes after ACK but before processing	Message marked delivered but unprocessed	None - broker believes delivery succeeded
Consumer Processing	Exception during business logic after ACK	Business state inconsistent	Manual intervention required
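
Because a message has to survive every stage in the table, per-stage loss probabilities compound. Assuming the stages fail independently (a simplification), the end-to-end survival probability is the product of the per-stage survival probabilities. The stage names and rates below are made-up inputs, not measurements.

```typescript
interface Stage {
    name: string;
    lossProbability: number; // chance a message is lost at this stage
}

// End-to-end loss = 1 - product of per-stage survival probabilities.
function endToEndLoss(stages: Stage[]): number {
    const survival = stages.reduce((p, stage) => p * (1 - stage.lossProbability), 1);
    return 1 - survival;
}

const pipeline: Stage[] = [
    { name: "producer->broker network", lossProbability: 0.0005 },
    { name: "broker storage", lossProbability: 0.0001 },
    { name: "broker->consumer network", lossProbability: 0.0005 },
    { name: "consumer crash after ACK", lossProbability: 0.0002 },
];

const totalLoss = endToEndLoss(pipeline);
```

For small probabilities the total is slightly below the plain sum of the stage rates, so adding hops to a pipeline raises the overall loss rate roughly additively.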

Quantifying Loss Probability

Actual loss rates vary widely between deployments and should be measured rather than assumed. For illustration, this section works with rates between 0.01% and 1%; where a given system falls depends on:

Network reliability between components
Hardware failure rates
Processing complexity and duration
System load and queue depths

This may sound acceptable, but at scale, the numbers become significant:

loss-calculation.ts
/**
 * Calculate expected message loss for at-most-once delivery
 * 
 * Assumes a 0.1% loss rate as an illustrative default; measure your own system
 */
function calculateExpectedLoss(messagesPerDay: number, lossRate: number = 0.001): void {
    const dailyLoss = messagesPerDay * lossRate;
    const monthlyLoss = dailyLoss * 30;
    const yearlyLoss = dailyLoss * 365;
 
    console.log(`At ${messagesPerDay.toLocaleString()} messages/day with ${lossRate * 100}% loss rate:`);
    console.log(`  Daily loss: ${dailyLoss.toLocaleString()} messages`);
    console.log(`  Monthly loss: ${monthlyLoss.toLocaleString()} messages`);
    console.log(`  Yearly loss: ${yearlyLoss.toLocaleString()} messages`);
}
 
// Small system: 100K messages/day
calculateExpectedLoss(100_000);
// Daily loss: 100 messages
// Monthly loss: 3,000 messages
// Yearly loss: 36,500 messages
 
// Medium system: 10M messages/day  
calculateExpectedLoss(10_000_000);
// Daily loss: 10,000 messages
// Monthly loss: 300,000 messages
// Yearly loss: 3,650,000 messages
 
// Large system: 1B messages/day
calculateExpectedLoss(1_000_000_000);
// Daily loss: 1,000,000 messages
// Monthly loss: 30,000,000 messages
// Yearly loss: 365,000,000 messages

Scale Amplifies Everything

A 0.1% loss rate means losing 10,000 messages per day at 10 million messages/day throughput. If those messages represent financial transactions, user data, or critical business events, at-most-once delivery is catastrophically inappropriate. The choice of delivery semantic must always be evaluated against the business impact of message loss.

Why At-Most-Once Exists: Performance and Simplicity

Given the significant risk of data loss, why would anyone use at-most-once delivery? The answer lies in two compelling advantages: performance and simplicity.

Performance Advantages:

At-most-once delivery is dramatically faster than alternatives because it eliminates several expensive operations:

Performance Gains from At-Most-Once

•No persistence overhead: Messages don't need to be written to disk before delivery, eliminating expensive fsync operations
•No acknowledgment round-trips: The producer doesn't wait for confirmation, reducing latency by one network round-trip
•No state tracking: The broker doesn't need to track delivery state, reducing memory and storage requirements
•No retry logic: No complex retry mechanisms, timeout handling, or dead letter queue management
•Simpler consumer logic: Consumers do not need deduplication or idempotency logic, because the semantic rules out redelivery

performance-comparison.ts
// Latency comparison for different delivery semantics
// (all numbers are illustrative placeholders, not benchmark results)
interface LatencyBreakdown {
    networkLatency: number;           // One-way network time
    serializationTime: number;        // Message encoding
    brokerPersistence?: number;       // Disk write time
    ackRoundTrip?: number;            // Acknowledgment latency
    retryOverhead?: number;           // Average retry cost
}
 
// At-Most-Once Delivery
const atMostOnceLatency: LatencyBreakdown = {
    networkLatency: 1,              // 1ms
    serializationTime: 0.1,         // 0.1ms
    // No persistence
    // No acknowledgment
    // No retry
};
// Total: ~1.1ms per message
 
// At-Least-Once Delivery
const atLeastOnceLatency: LatencyBreakdown = {
    networkLatency: 1,              // 1ms
    serializationTime: 0.1,         // 0.1ms
    brokerPersistence: 5,           // 5ms (SSD fsync)
    ackRoundTrip: 2,                // 2ms (network round-trip)
    retryOverhead: 0.3,             // 0.3ms (amortized retry cost)
};
// Total: ~8.4ms per message
 
// At-most-once is approximately 7.6x faster in this example
const speedup = (1 + 0.1 + 5 + 2 + 0.3) / (1 + 0.1);
console.log(`At-most-once is ~${speedup.toFixed(1)}x faster`);

Simplicity Advantages:

Beyond raw performance, at-most-once systems are fundamentally simpler to implement, operate, and reason about:

Implementation Simplicity

•No distributed transaction protocols
•No acknowledgment timeout handling
•No dead letter queue management
•No message deduplication logic
•No retry backoff algorithms

Operational Simplicity

•No disk I/O bottlenecks
•No queue depth monitoring
•No stuck message debugging
•No duplicate detection debugging
•Predictable, consistent latency

The Right Tool for the Right Job

At-most-once isn't a 'bad' delivery guarantee—it's the correct choice when message loss is acceptable and performance is critical. The key is understanding when those conditions apply.

Appropriate Use Cases for At-Most-Once

At-most-once delivery is the correct choice in scenarios where:

The data is inherently lossy or replaceable
The next message makes the previous one obsolete
Statistical sampling is acceptable
Performance is more valuable than completeness

Let's examine each category with concrete examples:

Use Case Category 1: Metrics and Telemetry

•CPU/Memory Metrics: System telemetry sent every second—losing one data point in 1,000 doesn't affect dashboards or alerting
•Request Latency Sampling: Percentile estimates such as P99 stay statistically meaningful when a small, random fraction of samples is dropped; complete data isn't required
•Application Logs (non-audit): Debug logs where occasional gaps don't compromise troubleshooting
•Infrastructure Health Checks: Heartbeat signals where the latest state matters, not complete history

metrics-collection.ts
// At-most-once is perfect for high-frequency metrics
class MetricsCollector {
    private broker: AtMostOnceBroker;
    private sampleRate: number = 0.01; // 1% sampling
 
    /**
     * Collect request latency metric.
     * 
     * Perfect for at-most-once because:
     * 1. We sample anyway (don't need 100% of data)
     * 2. Aggregate statistics tolerant to missing points
     * 3. High volume (millions/sec) - can't afford ACK overhead
     * 4. Next measurement replaces value of lost one
     */
    recordLatency(endpoint: string, latencyMs: number): void {
        // Sample to reduce volume
        if (Math.random() > this.sampleRate) {
            return;
        }
 
        // Fire-and-forget - perfectly appropriate here
        this.broker.send("metrics.latency", {
            endpoint,
            latencyMs,
            timestamp: Date.now(),
        });
        
        // Don't wait, don't retry, continue serving requests
    }
}
 
// This is called on every request - performance is critical
// Losing 0.1% of already-sampled metrics is completely acceptable
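
To see why aggregate statistics tolerate a little loss, the sketch below compares a P99 estimate computed from all samples with one computed after randomly dropping about 0.1% of them. The latencies are synthetic (uniform between 0 and 1000 ms, an assumption chosen for simplicity), and `mulberry32` is a small seeded generator used only to make the run reproducible.

```typescript
// Small deterministic PRNG (mulberry32) so the sketch is reproducible.
function mulberry32(seed: number): () => number {
    let a = seed | 0;
    return () => {
        a = (a + 0x6d2b79f5) | 0;
        let t = a;
        t = Math.imul(t ^ (t >>> 15), t | 1);
        t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
}

// Nearest-rank style percentile on a copy of the input.
function percentile(values: number[], p: number): number {
    const sorted = values.slice().sort((x, y) => x - y);
    const index = Math.min(sorted.length - 1, Math.floor(p * sorted.length));
    return sorted[index];
}

const rand = mulberry32(42);

// Synthetic latencies in milliseconds.
const allSamples: number[] = [];
for (let i = 0; i < 100_000; i++) allSamples.push(rand() * 1000);

// Drop roughly 0.1% of samples at random, as a lossy transport might.
const survivors = allSamples.filter(() => rand() >= 0.001);

const fullP99 = percentile(allSamples, 0.99);
const lossyP99 = percentile(survivors, 0.99);
const relativeError = Math.abs(fullP99 - lossyP99) / fullP99;
```

The two estimates land close together because random loss removes samples from every part of the distribution about equally. This would not hold if loss were correlated with latency, for example if slow requests were more likely to have their metrics dropped.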

Use Case Category 2: Real-Time State Updates

•Live Sports Scores: If one score update is lost, the next update (in seconds) provides the correct current state
•Stock Ticker Prices: Price feeds update so frequently that a missed quote is immediately superseded
•Cursor Position in Collaborative Editing: Other users' cursor positions update constantly; lost updates are invisible
•Online Game Player Positions: Position updates at 60Hz mean lost updates are interpolated or corrected by the next frame

Use Case Category 3: Non-Critical Notifications

•'Typing...' Indicators: Transient UI state that's immediately replaceable
•Presence Updates: Online/offline status that self-corrects with next heartbeat
•Promotional Push Notifications: Nice-to-have marketing messages where perfect delivery isn't required
•Social Media 'Like' Counters: Approximate counts are acceptable; exact counts aren't business-critical
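
Self-correcting presence is usually a matter of a timeout that is longer than the heartbeat interval. The sketch below assumes heartbeats are sent once per second and uses a hypothetical `PresenceTracker`; timestamps are passed in explicitly so the behavior is easy to follow.

```typescript
class PresenceTracker {
    private lastSeen: Record<string, number> = {};

    constructor(private readonly timeoutMs: number) {}

    // Called for every heartbeat that actually arrives.
    heartbeat(userId: string, nowMs: number): void {
        this.lastSeen[userId] = nowMs;
    }

    // A user counts as online if a heartbeat arrived within the timeout.
    isOnline(userId: string, nowMs: number): boolean {
        const seen = this.lastSeen[userId];
        return seen !== undefined && nowMs - seen <= this.timeoutMs;
    }
}

// With 1 s heartbeats and a 3.5 s timeout, two consecutive losses go unnoticed.
const presence = new PresenceTracker(3_500);
presence.heartbeat("alice", 0);
// Heartbeats at 1000 ms and 2000 ms are lost.
const stillOnline = presence.isOnline("alice", 2_500);
presence.heartbeat("alice", 3_000);
const afterRecovery = presence.isOnline("alice", 6_000);
const goneQuiet = presence.isOnline("alice", 7_000);
```

The tracker never asks for a retransmission: a lost heartbeat is covered by the next one, and only sustained silence flips the user to offline.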

The Common Thread

All appropriate at-most-once use cases share a common property: the business value of complete delivery is less than the cost of ensuring it. When you can ask 'Would losing 1 in 1,000 of these messages cause business harm?' and the answer is 'no', at-most-once is likely appropriate.

When At-Most-Once Is Dangerous

Equally important is understanding when at-most-once delivery is dangerous and inappropriate. Using at-most-once for critical messages is a common source of production incidents and data loss.

Never Use At-Most-Once For:

•Financial Transactions: Lost payment messages mean lost revenue, incorrect balances, and regulatory violations
•Order Processing: Lost order messages mean unfulfilled orders and angry customers
•Audit Logs: Lost audit entries mean compliance failures and legal exposure
•User-Generated Content: Lost user submissions mean lost trust and potential liability
•Event Sourcing Streams: Lost events mean corrupted aggregate state that can never be recovered
•Inter-Service Commands: Lost commands mean inconsistent system state across services

dangerous-anti-pattern.ts
// ❌ ANTI-PATTERN: At-most-once for critical business events
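class PaymentProcessor {
    // PaymentGateway is a placeholder type for whatever payment client is in use
    constructor(
        private readonly broker: AtMostOnceBroker,
        private readonly paymentGateway: PaymentGateway,
    ) {}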
 
    // THIS IS DANGEROUS!
    async processPayment(order: Order, payment: Payment): Promise<void> {
        // Charge the customer
        await this.paymentGateway.charge(payment);
        
        // Fire and forget the fulfillment event
        // ❌ If this message is lost:
        //    - Customer is charged
        //    - Order is never fulfilled
        //    - No automatic recovery
        //    - Manual investigation required
        this.broker.send("fulfillment.orders", {
            orderId: order.id,
            customerId: order.customerId,
            items: order.items,
        });
        
        // Customer is charged but may never receive their order!
    }
}
 
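// ✅ CORRECT: Use at-least-once with idempotent consumer
class ReliablePaymentProcessor {
    // PaymentGateway and TransactionalOutbox are placeholder types for this example
    constructor(
        private readonly broker: AtLeastOnceBroker,
        private readonly paymentGateway: PaymentGateway,
        private readonly transactionalOutbox: TransactionalOutbox,
    ) {}
 
    async processPayment(order: Order, payment: Payment): Promise<void> {
        // Use transactional outbox pattern (covered in Module 4)
        await this.transactionalOutbox.executeWithEvent(
            () => this.paymentGateway.charge(payment),
            {
                topic: "fulfillment.orders",
                message: { orderId: order.id, customerId: order.customerId, items: order.items }
            }
        );
        // The event is recorded durably together with the charge and published with
        // retries, so a paid order is not silently dropped. The fulfillment consumer
        // must tolerate duplicates.
    }
}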
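
For readers who have not met the transactional outbox yet, here is a heavily simplified, in-memory sketch of the idea. It is not the implementation from Module 4: `InMemoryOutbox` and `FlakyBroker` are hypothetical. A real outbox stores the event in the same database transaction as the business record.

```typescript
interface OutboxEvent {
    id: string;
    topic: string;
    payload: string;
    published: boolean;
}

// Hypothetical broker whose publish() fails for the first few calls.
class FlakyBroker {
    delivered: string[] = [];

    constructor(private failuresLeft: number) {}

    publish(event: OutboxEvent): boolean {
        if (this.failuresLeft > 0) {
            this.failuresLeft--;
            return false; // simulated network failure: no acknowledgment
        }
        this.delivered.push(event.id);
        return true; // acknowledged
    }
}

class InMemoryOutbox {
    private events: OutboxEvent[] = [];

    // Stands in for "write the business record and the event in one transaction".
    record(id: string, topic: string, payload: string): void {
        this.events.push({ id, topic, payload, published: false });
    }

    // One relay pass: try every unpublished event, mark it only after an ACK.
    // Returns the number of events still pending.
    relay(broker: FlakyBroker): number {
        let pending = 0;
        for (const event of this.events) {
            if (event.published) continue;
            if (broker.publish(event)) event.published = true;
            else pending++;
        }
        return pending;
    }
}

const outbox = new InMemoryOutbox();
outbox.record("order-42", "fulfillment.orders", JSON.stringify({ orderId: 42 }));

const flakyBroker = new FlakyBroker(2);
let failedPasses = 0;
while (outbox.relay(flakyBroker) > 0) failedPasses++;
```

The event survives two failed publish attempts because it stays in the outbox until the broker acknowledges it. If a relay crashed between publishing and marking the event as published, the event would be sent again, which is why this approach is paired with an idempotent consumer.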

The Cost of Getting This Wrong

Real-world incident: A major e-commerce platform used at-most-once delivery for order fulfillment events. During a 3-hour network partition, approximately 15,000 orders were paid but never fulfilled. Manual reconciliation took 2 weeks, cost $500K in support labor, and resulted in significant customer churn. The 'savings' from simpler infrastructure were obliterated by a single incident.

At-Most-Once in Real Messaging Systems

Let's examine how at-most-once semantics are implemented in real-world messaging systems and protocols:

UDP (User Datagram Protocol)

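UDP is the classic fire-and-forget transport. It provides no delivery confirmation, no ordering guarantees, and no retransmission mechanism. Datagrams are sent and forgotten. Strictly speaking, the network can occasionally duplicate a datagram, so UDP gives at-most-once behavior from the sender's side (one send attempt, no retries) rather than a hard guarantee against duplicates.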

udp-example.ts
import * as dgram from 'dgram';
 
// UDP gives fire-and-forget (effectively at-most-once) sending at the transport layer
class UDPMetricsSender {
    private socket: dgram.Socket;
    private metricsHost: string;
    private metricsPort: number;
 
    constructor(host: string, port: number) {
        this.socket = dgram.createSocket('udp4');
        this.metricsHost = host;
        this.metricsPort = port;
    }
 
    /**
     * Send a metric via UDP.
     * 
     * - No connection establishment (faster)
     * - No delivery confirmation (lighter)
     * - Packet may be lost, duplicated, or reordered by network
     * - Perfect for high-frequency, loss-tolerant telemetry
     */
    sendMetric(name: string, value: number, type: 'c' | 'g' | 'ms' = 'g'): void {
        // StatsD line format: <name>:<value>|<type> (c = counter, g = gauge, ms = timer)
        const message = Buffer.from(`${name}:${value}|${type}`);
        
        // Send and forget - no callback for delivery confirmation
        this.socket.send(message, this.metricsPort, this.metricsHost);
        
        // Execution continues immediately
        // We have no idea if the packet arrived
    }
}
 
// StatsD and StatsD-compatible agents commonly accept metrics over UDP
const metrics = new UDPMetricsSender('statsd.internal', 8125);
metrics.sendMetric('api.requests.count', 1, 'c');
metrics.sendMetric('api.latency.p99', 145, 'ms');

Apache Kafka with acks=0

Kafka supports at-most-once delivery when producers are configured with acks=0. This means the producer doesn't wait for any acknowledgment from brokers.

kafka-at-most-once.ts
import { Kafka } from 'kafkajs';
 
const kafka = new Kafka({
    clientId: 'metrics-producer',
    brokers: ['kafka-1:9092', 'kafka-2:9092'],
});
 
// At-most-once producer configuration
const producer = kafka.producer({
    // No point in retries without acks
    retry: {
        retries: 0,  // No retries
    },
});
await producer.connect();
 
// In KafkaJS, acks is an option of send(), not of kafka.producer()
await producer.send({
    topic: 'metrics',
    // Don't wait for any acknowledgments
    // Message may be lost if broker hasn't received it
    acks: 0,  // ← KEY SETTING for at-most-once
    messages: [{ value: JSON.stringify({ type: 'cpu_usage', value: 45.2 }) }],
});
 
// At-most-once on the consumer side: commit the offset BEFORE processing
const consumer = kafka.consumer({ groupId: 'metrics-consumer' });
await consumer.connect();
await consumer.subscribe({ topic: 'metrics' });
 
await consumer.run({
    // Disable auto-commit so the commit point is explicit
    autoCommit: false,
    
    eachMessage: async ({ topic, partition, message }) => {
        // Commit the NEXT offset first
        // If we crash after this, the message is "lost" (won't be reprocessed)
        const nextOffset = (Number(message.offset) + 1).toString();
        await consumer.commitOffsets([{ topic, partition, offset: nextOffset }]);
 
        try {
            await processMetric(message.value);
        } catch (error) {
            // Swallow the error: rethrowing may make KafkaJS retry the handler
            console.error('Processing failed, message dropped', error);
        }
    },
});

MQTT QoS 0

MQTT (Message Queuing Telemetry Transport) explicitly defines three QoS (Quality of Service) levels. QoS 0 is 'at-most-once':

mqtt-qos0.ts
import * as mqtt from 'mqtt';
 
const client = mqtt.connect('mqtt://broker.local');
 
// QoS 0: At-most-once delivery
// - Fastest delivery option
// - No delivery confirmation
// - Message may be lost
// - Perfect for sensor data with high frequency
client.publish(
    'sensors/temperature/room1',
    JSON.stringify({ value: 22.5, unit: 'celsius' }),
    { qos: 0 }  // ← QoS level 0 = at-most-once
);
 
// MQTT QoS levels summary:
// QoS 0: At-most-once  - Fire and forget
// QoS 1: At-least-once - Delivered at least once, may duplicate
// QoS 2: Exactly-once  - Delivered exactly once (4-way handshake)
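
The cost difference between the three levels shows up in the number of control packets exchanged per message. The sketch below lists the MQTT packet types involved at each level; `packetFlow` is a hypothetical helper for illustration and does not talk to a broker.

```typescript
type QoS = 0 | 1 | 2;

// Control packets exchanged between sender and receiver for one message.
const PACKET_FLOWS: Record<QoS, string[]> = {
    0: ["PUBLISH"],                                // no response at all
    1: ["PUBLISH", "PUBACK"],                      // one acknowledgment
    2: ["PUBLISH", "PUBREC", "PUBREL", "PUBCOMP"], // four-part handshake
};

function packetFlow(qos: QoS): string[] {
    return PACKET_FLOWS[qos];
}

const qos0Packets = packetFlow(0).length;
const qos1Packets = packetFlow(1).length;
const qos2Packets = packetFlow(2).length;
```

QoS 0 involves a single packet with no reply, which is what makes it both the cheapest level and the only one where the sender cannot detect loss.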

Protocol Design Intent

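Notice that where a system offers a choice, at-most-once is a named setting you select: acks=0 is not Kafka's default, and MQTT defines QoS 0 as its own level (although many clients default to it). UDP offers nothing stronger, so choosing it is itself the decision. At-most-once is valuable for specific use cases but dangerous when adopted by accident, so always be deliberate when choosing this semantic.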

Summary: At-Most-Once Delivery

At-most-once delivery represents the simplest and fastest delivery semantic, trading reliability for performance. Let's consolidate the key insights:

Key Takeaways

•Definition: At-most-once guarantees 0 or 1 delivery—messages may be lost but never duplicated
•Mechanics: Achieved through fire-and-forget sending or pre-processing acknowledgment
•Performance: Dramatically faster than alternatives due to elimination of persistence, ACKs, and retries
•Failure Modes: Messages can be lost at any point in the pipeline with no recovery mechanism
•Appropriate Use Cases: Metrics, telemetry, real-time state updates, non-critical notifications
•Dangerous Use Cases: Financial transactions, order processing, audit logs, any business-critical events
•Protocol Examples: UDP, Kafka acks=0, MQTT QoS 0

The Decision Framework:

Before choosing at-most-once delivery, answer these questions:

What is the business impact of losing 0.1% of these messages? If the answer is 'significant', don't use at-most-once.
Is the data replaceable by the next message? If yes, at-most-once may be appropriate.
Is this data required for correctness or just for observability? Observability data often tolerates loss.
What is the volume and required latency? High volume + low latency requirements favor at-most-once.
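
The four questions can be encoded as a small checklist function. This is one possible reading of the framework, not a rule from this page: treating questions 1 and 3 as vetoes and questions 2 and 4 as supporting reasons is an assumption of this sketch, and `MessageProfile` and `recommend` are hypothetical names.

```typescript
interface MessageProfile {
    lossCausesBusinessHarm: boolean;  // question 1
    supersededByNextMessage: boolean; // question 2
    requiredForCorrectness: boolean;  // question 3
    highVolumeLowLatency: boolean;    // question 4
}

type Recommendation = "at-most-once is a candidate" | "use a stronger guarantee";

function recommend(profile: MessageProfile): Recommendation {
    // Questions 1 and 3 act as vetoes: any harm from loss rules it out.
    if (profile.lossCausesBusinessHarm || profile.requiredForCorrectness) {
        return "use a stronger guarantee";
    }
    // Questions 2 and 4 supply the positive reasons to accept loss.
    if (profile.supersededByNextMessage || profile.highVolumeLowLatency) {
        return "at-most-once is a candidate";
    }
    // No harm, but also no benefit: there is nothing to gain from accepting loss.
    return "use a stronger guarantee";
}

const cpuMetric = recommend({
    lossCausesBusinessHarm: false,
    supersededByNextMessage: true,
    requiredForCorrectness: false,
    highVolumeLowLatency: true,
});

const paymentEvent = recommend({
    lossCausesBusinessHarm: true,
    supersededByNextMessage: false,
    requiredForCorrectness: true,
    highVolumeLowLatency: false,
});
```

A CPU metric passes every check and comes out as a candidate, while a payment event is vetoed by the first question alone.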

Page Complete

You now understand the at-most-once delivery semantic—its guarantees, implementation mechanics, appropriate use cases, and critical limitations. In the next page, we'll explore at-least-once delivery, which eliminates message loss at the cost of potential duplicates, fundamentally changing the design constraints for consumers.