In distributed messaging, the acknowledgment is everything. It's the handshake that tells the queue: "I've received this message and processed it successfully—you can delete it now."
Without proper acknowledgment patterns, you face two equally dangerous failure modes:

- Message loss: the queue deletes a message before processing completes, so a consumer crash silently drops work.
- Stuck or duplicate processing: messages are never confirmed, so they are redelivered and reprocessed indefinitely.
The acknowledgment mechanism is what transforms a simple data pipe into a reliable work distribution system. In this page, we'll explore acknowledgment patterns, their implications, and how to implement them correctly in production systems.
By the end of this page, you will understand acknowledgment modes and their trade-offs, how to implement reliable consumers, batch acknowledgment for throughput, negative acknowledgment (NACK) patterns, handling acknowledgment failures, and the relationship between acknowledgments and exactly-once processing.
An acknowledgment (ACK) is a signal from the consumer to the queue indicating that a message has been successfully received and processed. Until acknowledged, the queue considers the message "in flight" and may redeliver it.
The contract is simple:

1. The queue delivers a message and marks it as in flight.
2. The consumer processes the message.
3. The consumer acknowledges; only then does the queue delete the message.
This contract ensures that messages are processed at least once—the foundation of reliable messaging.
If the consumer doesn't acknowledge:

- The visibility (or delivery) timeout eventually expires.
- The message becomes visible again and is redelivered, possibly to a different consumer.
The queue never assumes success—silence is interpreted as failure.
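This contract can be sketched with a minimal in-memory queue. This is purely illustrative: `InMemoryQueue`, `expireVisibility`, and the other names here are hypothetical stand-ins, not a real broker API.

```typescript
// Minimal sketch of the at-least-once contract (illustrative, not a real library).
type Message = { id: string; body: string };

class InMemoryQueue {
  private ready: Message[] = [];
  private inFlight = new Map<string, Message>();

  send(msg: Message): void {
    this.ready.push(msg);
  }

  // Receiving moves the message to "in flight"; it is NOT deleted yet.
  receive(): Message | undefined {
    const msg = this.ready.shift();
    if (msg) this.inFlight.set(msg.id, msg);
    return msg;
  }

  // Only an explicit ACK deletes the message for good.
  ack(id: string): void {
    this.inFlight.delete(id);
  }

  // Silence (no ACK) is treated as failure: the message is requeued.
  expireVisibility(id: string): void {
    const msg = this.inFlight.get(id);
    if (msg) {
      this.inFlight.delete(id);
      this.ready.push(msg); // redelivery
    }
  }
}

const q = new InMemoryQueue();
q.send({ id: 'm1', body: 'order-42' });
const first = q.receive();     // in flight, not deleted
q.expireVisibility(first!.id); // consumer went silent -> requeued
const second = q.receive();    // same message, delivered again
q.ack(second!.id);             // only now is it gone for good
```

Note that redelivery is the default outcome: the consumer has to actively opt out of it by acknowledging.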
Some messaging libraries offer 'auto-acknowledge' mode where messages are acknowledged immediately upon receipt, before processing. This defeats the purpose of acknowledgments entirely—if processing fails, the message is already gone. Only use auto-ACK for genuinely disposable messages where loss is acceptable.
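The difference is easy to demonstrate with in-memory stand-ins (`deliverAutoAck` and `deliverManualAck` are hypothetical helpers written for this sketch, not a real client API):

```typescript
// Sketch: why auto-ACK loses messages on processing failure.
type Msg = { id: string };

function deliverAutoAck(queue: Msg[], handler: (m: Msg) => void): void {
  const msg = queue.shift(); // auto-ACK: deleted the moment it is delivered
  if (!msg) return;
  try {
    handler(msg);
  } catch {
    // Processing failed, but the message is already gone -- it is lost.
  }
}

function deliverManualAck(queue: Msg[], handler: (m: Msg) => void): void {
  const msg = queue[0]; // stays on the queue while in flight
  if (!msg) return;
  try {
    handler(msg);
    queue.shift(); // ACK: delete only after successful processing
  } catch {
    // No ACK: the message remains available for redelivery.
  }
}

const failing = (_: Msg) => { throw new Error('processing failed'); };

const q1: Msg[] = [{ id: 'a' }];
deliverAutoAck(q1, failing);
console.log(q1.length); // 0 -- message lost

const q2: Msg[] = [{ id: 'a' }];
deliverManualAck(q2, failing);
console.log(q2.length); // 1 -- message survives for retry
```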
Different messaging systems offer various acknowledgment modes, each with distinct trade-offs between reliability, performance, and complexity.
Definition: The consumer explicitly sends an acknowledgment after processing completes.
How it works:

1. The consumer receives a message; the broker marks it as in flight.
2. The consumer processes the message.
3. The consumer calls ack(); only then does the broker delete the message.

Reliability: Highest—message survives consumer failures until acknowledged.
Complexity: Moderate—must handle ACK in all code paths including error handling.
```typescript
// Manual acknowledgment: The reliable pattern
async function processWithManualAck(channel: Channel): Promise<void> {
  channel.consume('orders', async (msg) => {
    if (!msg) return;

    try {
      const order = JSON.parse(msg.content.toString());

      // Process the order
      await processOrder(order);

      // Only acknowledge after successful processing
      channel.ack(msg);
      console.log(`Order ${order.id} processed and acknowledged`);
    } catch (error) {
      console.error('Processing failed:', error);
      // Reject the message - will be redelivered or dead-lettered
      channel.nack(msg, false, true); // (msg, allUpTo, requeue)
    }
  }, { noAck: false }); // Critical: noAck must be false
}
```

| Mode | Reliability | Throughput | Complexity | Use Case |
|---|---|---|---|---|
| Manual ACK | Highest | Lower | Moderate | Critical business messages |
| Auto ACK | Lowest | Highest | Lowest | Disposable messages only |
| Batch ACK | High | High | Higher | High-volume processing |
A reliable consumer is one that correctly handles all scenarios: successful processing, failures, crashes, and edge cases. Building a reliable consumer requires careful attention to acknowledgment timing and error handling.
```typescript
// A production-ready reliable consumer implementation
class ReliableConsumer<T> {
  private isShuttingDown = false;
  private inFlightCount = 0;

  constructor(
    private queue: MessageQueue,
    private processor: MessageProcessor<T>,
    private config: ConsumerConfig
  ) {
    // Handle graceful shutdown
    process.on('SIGTERM', () => this.shutdown());
    process.on('SIGINT', () => this.shutdown());
  }

  async start(): Promise<void> {
    console.log('Consumer starting...');

    while (!this.isShuttingDown) {
      try {
        const message = await this.queue.receive(this.config.queueName, {
          visibilityTimeout: this.config.visibilityTimeout,
          waitTimeSeconds: 20,
        });

        if (message) {
          this.inFlightCount++;
          this.processMessage(message).finally(() => {
            this.inFlightCount--;
          });
        }
      } catch (error) {
        console.error('Receive error:', error);
        await this.sleep(this.config.errorBackoffMs);
      }
    }
  }

  private async processMessage(message: QueueMessage<T>): Promise<void> {
    const startTime = Date.now();
    let visibilityExtender: NodeJS.Timeout | null = null;

    try {
      // Start extending visibility for long-running tasks
      if (this.config.enableVisibilityExtension) {
        visibilityExtender = setInterval(async () => {
          try {
            await this.queue.extendVisibility(
              message.receiptHandle,
              this.config.visibilityTimeout
            );
          } catch (e) {
            console.error('Visibility extension failed', e);
          }
        }, (this.config.visibilityTimeout * 1000) / 2);
      }

      // === PROCESS THE MESSAGE ===
      await this.processor.process(message.body);

      // === ACKNOWLEDGE ONLY AFTER SUCCESS ===
      await this.queue.delete(message.receiptHandle);

      console.log(`Processed in ${Date.now() - startTime}ms`);
    } catch (error) {
      await this.handleError(message, error);
    } finally {
      if (visibilityExtender) {
        clearInterval(visibilityExtender);
      }
    }
  }

  private async handleError(message: QueueMessage<T>, error: unknown): Promise<void> {
    const retryCount = message.attributes?.ApproximateReceiveCount || 0;
    console.error(`Processing failed (attempt ${retryCount})`, error);

    if (retryCount >= this.config.maxRetries) {
      // Max retries exceeded - let it go to DLQ
      console.error('Max retries exceeded, moving to DLQ');
      // Most queues auto-move to DLQ after maxReceiveCount
      // Optionally delete here if no DLQ configured
    } else {
      // Make message visible again with backoff
      const backoff = Math.min(
        this.config.baseBackoffSeconds * Math.pow(2, retryCount),
        this.config.maxBackoffSeconds
      );
      await this.queue.changeVisibility(
        message.receiptHandle,
        backoff
      );
    }
  }

  private async shutdown(): Promise<void> {
    console.log('Shutdown initiated...');
    this.isShuttingDown = true;

    // Wait for in-flight messages to complete
    const timeout = Date.now() + 30000; // 30 second max wait
    while (this.inFlightCount > 0 && Date.now() < timeout) {
      console.log(`Waiting for ${this.inFlightCount} in-flight messages...`);
      await this.sleep(1000);
    }

    console.log('Consumer shutdown complete');
    process.exit(0);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}
```

Because messages can be redelivered (visibility timeout expiration, network issues, consumer crashes), your processing logic MUST be idempotent. Use unique message IDs to detect duplicates, implement upsert operations instead of inserts, and make state changes based on message content rather than receipt order.
A negative acknowledgment (NACK) is an explicit rejection of a message. Instead of waiting for visibility timeout to expire, the consumer actively tells the queue: "I cannot process this message."
Depending on the system and configuration, NACK can trigger different behaviors:
| System | NACK Option | Behavior |
|---|---|---|
| RabbitMQ | nack(msg, false, true) | Requeue: Message goes back to queue head |
| RabbitMQ | nack(msg, false, false) | Discard/DLQ: Message is rejected permanently |
| RabbitMQ | nack(msg, true, true) | Batch requeue: This and all previous messages |
| AWS SQS | changeVisibility(0) | Immediate redelivery to any consumer |
| AWS SQS | No explicit NACK | Wait for visibility timeout |
| Azure Service Bus | abandon() | Message returns to queue, delivery count increments |
| Azure Service Bus | deadLetter() | Move directly to dead-letter queue |
```typescript
// NACK pattern examples

// RabbitMQ: Reject with requeue
channel.nack(msg, false, true); // (message, allUpTo, requeue)
// Message returns to queue, may be delivered to same or different consumer

// RabbitMQ: Reject without requeue (goes to DLQ if configured)
channel.nack(msg, false, false);
// Message is discarded or moved to dead-letter exchange

// AWS SQS: Immediate retry (set visibility to 0)
await sqs.changeMessageVisibility({
  QueueUrl: queueUrl,
  ReceiptHandle: message.ReceiptHandle,
  VisibilityTimeout: 0 // Immediately visible again
}).promise();

// AWS SQS: Delayed retry (exponential backoff)
const retryCount = parseInt(message.Attributes?.ApproximateReceiveCount || '1');
const delaySeconds = Math.min(30 * Math.pow(2, retryCount), 900); // max 15 min
await sqs.changeMessageVisibility({
  QueueUrl: queueUrl,
  ReceiptHandle: message.ReceiptHandle,
  VisibilityTimeout: delaySeconds
}).promise();

// Azure Service Bus: Explicit dead-letter with reason
await receiver.deadLetterMessage(message, {
  deadLetterReason: 'ProcessingError',
  deadLetterErrorDescription: error.message
});
```

Use explicit NACK when:

- You know processing has failed and want the message retried (or dead-lettered) immediately rather than after the visibility timeout.
- The message is malformed and should go straight to the dead-letter queue.
- You want to apply a custom retry delay, for example via changeVisibility.
Let the visibility timeout handle it when:

- The consumer has crashed and cannot respond at all.
- The system offers no explicit NACK (for example, AWS SQS).
- The default redelivery delay is an acceptable retry interval.
If you NACK with requeue and the message has a fatal error (malformed data, schema violation), the same message will be delivered again, fail again, and be requeued again—infinitely. Always track retry counts and move to dead-letter after max attempts.
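A minimal guard against this loop might look like the following sketch (`decideOnFailure` and the action callbacks are illustrative names written for this example, not a real queue API):

```typescript
// Sketch: break the poison-message loop with a retry-count guard.
interface Delivery {
  id: string;
  receiveCount: number; // how many times this message has been delivered
}

function decideOnFailure(
  msg: Delivery,
  maxRetries: number,
  actions: {
    requeue: (m: Delivery) => void;    // stand-in for NACK-with-requeue
    deadLetter: (m: Delivery) => void; // stand-in for an explicit DLQ move
  }
): 'requeued' | 'dead-lettered' {
  if (msg.receiveCount >= maxRetries) {
    actions.deadLetter(msg); // stop the infinite requeue loop
    return 'dead-lettered';
  }
  actions.requeue(msg); // transient failure: try again
  return 'requeued';
}

const noop = { requeue: (_: Delivery) => {}, deadLetter: (_: Delivery) => {} };
console.log(decideOnFailure({ id: 'm1', receiveCount: 1 }, 3, noop)); // requeued
console.log(decideOnFailure({ id: 'm1', receiveCount: 3 }, 3, noop)); // dead-lettered
```

The key design choice is that the decision depends only on the delivery count the broker already tracks (like SQS's ApproximateReceiveCount), so no extra state store is needed.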
The hardest problem in message processing is ensuring that acknowledgment and side effects happen atomically. Consider this scenario:

1. The consumer receives an order message and writes the order to the database.
2. The consumer crashes (or the ACK call fails) before the queue is notified.
3. The queue redelivers the message, and a second consumer writes the order again.
This is the classic dual-write problem—two systems (database and queue) must be updated consistently, but there's no distributed transaction spanning both.
1. Idempotent Operations
Make your database operations idempotent using message IDs:
```typescript
// Idempotent processing with deduplication
async function processOrderIdempotently(message: OrderMessage): Promise<void> {
  // Use message ID or idempotency key for deduplication
  const idempotencyKey = message.messageId;

  // Atomic upsert - only creates if not exists
  const result = await db.processedMessages.upsert({
    where: { messageId: idempotencyKey },
    create: {
      messageId: idempotencyKey,
      processedAt: new Date(),
      status: 'processing'
    },
    update: {} // No-op if exists
  });

  if (result.status === 'completed') {
    console.log('Message already processed, skipping');
    return; // Duplicate detected
  }

  try {
    // Process the order
    await createOrder(message.payload);

    // Mark as completed
    await db.processedMessages.update({
      where: { messageId: idempotencyKey },
      data: { status: 'completed' }
    });
  } catch (error) {
    await db.processedMessages.update({
      where: { messageId: idempotencyKey },
      data: { status: 'failed' }
    });
    throw error;
  }
}
```

2. Transactional Outbox (Reverse)
Some systems support transactional acknowledgment where the ACK is part of the same transaction as the database write:
```typescript
// Kafka Streams / Kafka Exactly-Once Processing
// ACK is part of the same transaction as state updates

const producer = new Kafka().producer({
  transactionalId: 'my-transactional-producer'
});

await producer.connect();

// All operations in transaction commit together
await producer.transaction(async (txn) => {
  // Process message
  const result = processMessage(message);

  // Write output to downstream topic
  await txn.send({
    topic: 'processed-orders',
    messages: [{ value: result }]
  });

  // Commit consumer offset (acknowledgment)
  await txn.sendOffsets({
    consumerGroupId: 'my-consumer-group',
    topics: [{
      topic: 'orders',
      partitions: [{
        partition: message.partition,
        offset: (parseInt(message.offset) + 1).toString()
      }]
    }]
  });

  // Both happen atomically, or neither happens
});
```

For most systems, true distributed transactions are impractical. The recommended approach is: (1) make all operations idempotent, (2) store message IDs in your database, (3) check for duplicates before processing. This handles the vast majority of cases without complex infrastructure.
What happens when the acknowledgment itself fails? You've processed the message, but the ACK network call times out or returns an error. The queue still thinks the message is in flight.
```typescript
// Robust acknowledgment with retry
async function acknowledgeWithRetry(
  queue: MessageQueue,
  receiptHandle: string,
  maxRetries: number = 3
): Promise<boolean> {
  let lastError: Error | null = null;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      await queue.delete(receiptHandle);
      return true; // Success
    } catch (error) {
      lastError = error as Error;
      console.warn(`ACK attempt ${attempt} failed:`, error);

      // Check if retry makes sense
      if (isReceiptHandleExpired(error)) {
        // Message will be redelivered anyway
        console.warn('Receipt handle expired, message will be redelivered');
        return false;
      }

      if (isMessageNotFound(error)) {
        // Message already deleted (by another consumer or cleanup)
        console.warn('Message already deleted');
        return true; // Treat as success
      }

      // Exponential backoff for transient errors
      if (attempt < maxRetries) {
        await sleep(Math.pow(2, attempt) * 100);
      }
    }
  }

  // All retries failed
  console.error('All ACK attempts failed:', lastError);
  // Message will be redelivered after visibility timeout
  // Consumer's idempotency will prevent duplicate processing
  return false;
}

// Usage in consumer
try {
  await processMessage(message);

  const acked = await acknowledgeWithRetry(queue, message.receiptHandle);
  if (!acked) {
    // Log for monitoring but don't throw
    // Idempotent processing will handle redelivery
    metrics.increment('ack_failures');
  }
} catch (processingError) {
  // Handle processing error...
}
```

In production systems, acknowledgment failures happen perhaps 0.01% of the time—but at scale, that's thousands of occurrences daily. Monitor ACK failure rates and investigate spikes. Consistent failures may indicate queue saturation, network issues, or configuration problems (visibility timeout too short).
For high-throughput systems, acknowledging each message individually creates significant overhead. Batch acknowledgment reduces this overhead but requires careful implementation.
```typescript
// Production batch acknowledgment implementation
class BatchAcknowledger {
  private pendingAcks: Map<string, AckEntry> = new Map();
  private flushTimer: NodeJS.Timeout | null = null;

  constructor(
    private queue: MessageQueue,
    private batchSize: number = 10,
    private flushIntervalMs: number = 1000
  ) {
    this.startFlushTimer();
  }

  addForAcknowledgment(receiptHandle: string, messageId: string): void {
    this.pendingAcks.set(messageId, {
      receiptHandle,
      timestamp: Date.now()
    });

    // Flush if batch is full
    if (this.pendingAcks.size >= this.batchSize) {
      this.flush();
    }
  }

  private startFlushTimer(): void {
    this.flushTimer = setInterval(() => {
      if (this.pendingAcks.size > 0) {
        this.flush();
      }
    }, this.flushIntervalMs);
  }

  private async flush(): Promise<void> {
    if (this.pendingAcks.size === 0) return;

    // Capture current batch
    const batch = Array.from(this.pendingAcks.entries());
    this.pendingAcks.clear();

    try {
      // Batch delete
      const entries = batch.map(([id, entry]) => ({
        Id: id,
        ReceiptHandle: entry.receiptHandle
      }));

      const result = await this.queue.deleteMessageBatch(entries);

      // Handle partial failures
      if (result.Failed && result.Failed.length > 0) {
        console.error('Partial batch ACK failure:', result.Failed);
        // Failed messages will be redelivered after visibility timeout
        metrics.increment('batch_ack_partial_failures', result.Failed.length);
      }

      metrics.increment('messages_acknowledged', result.Successful?.length || 0);
    } catch (error) {
      console.error('Batch ACK failed entirely:', error);
      // All messages will be redelivered
      // They're already processed, so idempotent handling will dedupe
      metrics.increment('batch_ack_failures');
    }
  }

  async shutdown(): Promise<void> {
    if (this.flushTimer) clearInterval(this.flushTimer);
    await this.flush(); // Final flush
  }
}
```

Most cloud queues allow batch operations of up to 10 messages; Kafka allows larger, configurable batches. Start with a batch size equal to your prefetch count, and tune based on observed throughput and failure rates. Larger batches mean higher throughput but more redelivery on failure.
Message acknowledgment is the contract between consumer and queue that enables reliable message processing. Proper acknowledgment handling is what makes distributed messaging actually work.
What's Next:
With acknowledgment patterns mastered, the next page explores Dead Letter Queues—the safety net for messages that can't be processed. We'll cover why they're essential, how to configure them, and patterns for monitoring and recovering from dead-lettered messages.
You now understand message acknowledgment: the handshake that enables reliable distributed processing. Manual ACK with idempotent processing is the gold standard for production systems.