Every distributed system eventually encounters a fundamental mismatch: producers generate work faster than consumers can process it. Traffic spikes arrive suddenly; downstream services have finite capacity; batch jobs create bursty load patterns. Without a buffering mechanism, these mismatches cause cascading failures.
Message queues solve this by decoupling producers from consumers, absorbing traffic spikes, and enabling independent scaling of each side. They transform synchronous, coupled call chains into asynchronous, resilient processing pipelines.
But queues themselves must scale. A single queue node becomes a bottleneck. High-throughput workloads demand partition-based distribution. Consumer groups must coordinate to avoid duplicates. This page explores how to scale message queue infrastructure to handle millions of messages per second.
By the end of this page, you will understand queue partitioning strategies, consumer group patterns, backpressure mechanisms, and how to design queue-based architectures that scale horizontally. You'll be equipped to build asynchronous processing systems that handle extreme throughput.
Queues provide several properties that are essential for building scalable systems:
1. Decoupling (Temporal and Spatial)
```text
WITHOUT QUEUES (synchronous):

  API Gateway ◄──► Order Service ◄──► Payment Service ◄──► Shipping Service

Problems:
  - The request waits for ALL services to complete
  - Shipping slow? The entire chain is slow
  - Shipping down? The entire chain fails
  - Peak traffic → all services must scale simultaneously

WITH QUEUES (asynchronous):

  API Gateway ──► Order Service    (immediate response: "Order placed")
                        │
                        ▼
                   Order Queue
              ┌─────────┼─────────┐
              ▼         ▼         ▼
           Payment   Inventory  Shipping
           Worker      Worker    Worker

Benefits:
  - The API responds immediately (better UX)
  - Services process at their own pace
  - A failure in one service doesn't cascade
  - Each service scales independently
```

2. Load Leveling (Absorbing Spikes)
Queues act as shock absorbers, converting bursty traffic into steady processing:
Without a queue, a spike to 1,000 req/s against a service that can handle 500 req/s means roughly 500 req/s are dropped for the duration of the spike. With a queue in front, the excess accumulates as queue depth instead: the service keeps processing steadily at 500 req/s, and no requests are dropped; they are simply processed a little later, once the spike subsides.

3. Work Distribution (Parallelization)
Queues enable embarrassingly parallel processing across many workers:
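As a toy illustration of that fan-out, the sketch below distributes jobs from one shared queue to three workers. The in-memory array and the `processJob` handler are stand-ins for a real broker and real work; only the pattern matters: each job is consumed by exactly one worker, and adding workers increases throughput.

```typescript
type Job = { id: number };

// Stand-in for a real broker: 20 jobs waiting in one shared queue
const queue: Job[] = Array.from({ length: 20 }, (_, i) => ({ id: i }));

// Hypothetical handler representing real work (API calls, DB writes, ...)
async function processJob(job: Job): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, 100));
}

async function worker(name: string): Promise<void> {
  // Each worker pulls until the queue is empty; a given job is taken by exactly one worker
  let job: Job | undefined;
  while ((job = queue.shift()) !== undefined) {
    await processJob(job);
    console.log(`${name} finished job ${job.id}`);
  }
}

async function main(): Promise<void> {
  // Three workers drain the queue roughly three times faster than one
  await Promise.all([worker('w1'), worker('w2'), worker('w3')]);
}

main().catch(console.error);
```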
Think of the queue as a contract between producers and consumers. The producer's responsibility ends when the message is durably stored in the queue. The consumer's responsibility is to eventually process that message. This separation of concerns is fundamental to scalable architecture.
A single queue has limited throughput—bounded by a single node's network, CPU, and disk I/O. To scale beyond this, queues are partitioned across multiple nodes, similar to database sharding.
Partitioning Fundamentals:
```text
PARTITIONED QUEUE (e.g., a Kafka topic):

  Topic: "orders"
  ┌─────────────┬─────────────┬─────────────┐
  │ Partition 0 │ Partition 1 │ Partition 2 │
  │ msg1        │ msg2        │ msg3        │
  │ msg4        │ msg5        │ msg6        │
  │ msg7        │ msg8        │ msg9        │
  │ ...         │ ...         │ ...         │
  └─────────────┴─────────────┴─────────────┘
  Each partition is an ordered log; partitions are distributed across brokers.
```

Key properties:
- Messages within ONE partition are strictly ordered
- Messages ACROSS partitions have no ordering guarantee
- Each partition can live on a different broker (horizontal scaling)
- The producer decides which partition receives a message via the partition key

Partition Key Selection:
The partition key determines which partition receives a message. Like database shard keys, this choice is critical:
| Strategy | Example Key | Ordering Guarantee | Use Case |
|---|---|---|---|
| User ID | user:12345 | All messages for a user are ordered | User activity, notifications |
| Entity ID | order:67890 | All messages for an order are ordered | Order processing, state machines |
| Tenant ID | tenant:acme | All tenant events are ordered | Multi-tenant SaaS |
| Random/Round-robin | None (random) | No ordering (maximum parallelism) | Independent, stateless processing |
| Geographic | region:us-east | Regional events are ordered | Geo-specific processing |
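Conceptually, the producer turns the key into a partition number with a stable hash, so the same key always lands on the same partition. The toy function below illustrates the idea only; Kafka's default partitioner actually uses a murmur2 hash, not this simplistic one.

```typescript
// Illustrative only: a stable hash of the key, modulo the partition count.
// Same key in → same partition out, which is what preserves per-key ordering.
function pickPartition(key: string, numPartitions: number): number {
  let hash = 0;
  for (const ch of key) {
    hash = (hash * 31 + ch.charCodeAt(0)) | 0; // simple 32-bit rolling hash
  }
  return Math.abs(hash) % numPartitions;
}

pickPartition('order:67890', 12); // every call returns the same partition for this key
pickPartition('order:67891', 12); // a different key may land on a different partition
```

In practice the kafkajs producer below handles this for you: pass a `key` and the default partitioner does the hashing.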
```typescript
import { Kafka, Partitioners } from 'kafkajs';

const kafka = new Kafka({ brokers: ['kafka:9092'] });
const producer = kafka.producer({
  createPartitioner: Partitioners.DefaultPartitioner,
});

// Send order event - partition by order ID.
// All events for the same order go to the same partition → ordered processing.
async function sendOrderEvent(orderId: string, event: OrderEvent) {
  await producer.send({
    topic: 'order-events',
    messages: [{
      key: orderId, // Partition key
      value: JSON.stringify(event),
    }],
  });
}

// Events for order "12345" always go to the same partition
await sendOrderEvent('12345', { type: 'created', data: {...} });
await sendOrderEvent('12345', { type: 'paid', data: {...} });
await sendOrderEvent('12345', { type: 'shipped', data: {...} });
// Consumer sees: created → paid → shipped (guaranteed order)

// Different orders can be processed in parallel
await sendOrderEvent('67890', { type: 'created', data: {...} });
// Order 67890 likely lands in a different partition and is processed concurrently

// For maximum parallelism with no ordering needs:
async function sendAnalyticsEvent(event: AnalyticsEvent) {
  await producer.send({
    topic: 'analytics',
    messages: [{
      key: null, // No key = round-robin partition selection
      value: JSON.stringify(event),
    }],
  });
}
```

Increasing partition count is easy; decreasing is painful (it requires recreating the topic). Start with more partitions than you think you need. A common heuristic: max(expected peak throughput / 10 MB/s, expected consumer count × 2). For Kafka, 12-50 partitions for a high-traffic topic is typical.
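To make that heuristic concrete, here is a small helper. It is only a sketch of the rule of thumb above; the ~10 MB/s-per-partition figure is a rough planning number, not a Kafka guarantee.

```typescript
// Rough capacity planning: enough partitions for peak throughput AND for
// the consumer parallelism you expect to need.
function suggestPartitionCount(peakThroughputMBps: number, expectedConsumers: number): number {
  const byThroughput = Math.ceil(peakThroughputMBps / 10); // ~10 MB/s per partition
  const byConsumers = expectedConsumers * 2;               // headroom to add consumers later
  return Math.max(byThroughput, byConsumers);
}

suggestPartitionCount(100, 12); // => max(10, 24) = 24 partitions
```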
Consumer groups enable multiple consumers to process a partitioned queue in parallel while ensuring each message is processed exactly once (within the group).
How Consumer Groups Work:
```text
CONSUMER GROUP ASSIGNMENT

  Topic: "orders" (6 partitions: P0–P5)
  Consumer group: "order-processors" (4 consumers)

    Consumer 1 → P0, P1
    Consumer 2 → P2
    Consumer 3 → P3
    Consumer 4 → P4, P5

  Each partition is assigned to exactly ONE consumer in the group;
  a consumer may own multiple partitions.

REBALANCING (when consumers join or leave)

  Before (2 consumers):
    Consumer A: P0, P1, P2
    Consumer B: P3, P4, P5

  Consumer C joins → rebalance:
    Consumer A: P0, P1
    Consumer B: P2, P3
    Consumer C: P4, P5

  Consumer B crashes → rebalance:
    Consumer A: P0, P1, P2
    Consumer C: P3, P4, P5
```

Scaling Consumers:
```typescript
import { Kafka } from 'kafkajs';

const kafka = new Kafka({ brokers: ['kafka:9092'] });

// Create a consumer in a group
const consumer = kafka.consumer({
  groupId: 'order-processors', // Consumer group ID
  sessionTimeout: 30000,       // Heartbeat timeout
  rebalanceTimeout: 60000,     // Time allowed for a rebalance
});

await consumer.connect();
await consumer.subscribe({ topic: 'orders', fromBeginning: false });

// Process messages
await consumer.run({
  partitionsConsumedConcurrently: 3, // Process 3 partitions in parallel
  eachMessage: async ({ topic, partition, message }) => {
    const order = JSON.parse(message.value!.toString());
    console.log(`Processing order from partition ${partition}`);

    try {
      await processOrder(order);
      // Commit happens automatically after successful processing
    } catch (error) {
      // Error handling - see next section
      throw error; // Causes a retry
    }
  },
});

// Graceful shutdown
const shutdown = async () => {
  console.log('Shutting down consumer...');
  await consumer.disconnect();
  process.exit(0);
};

process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);

// To scale: simply run more instances of this consumer.
// A Kubernetes deployment with replicas = partition count is ideal.
```

Different consumer groups process the same messages independently. Use this for analytics (a separate group from main processing), audit logging, and multiple downstream services. Each group maintains its own offset/position in the queue.
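As a sketch of that multi-group pattern (reusing the `kafka` client from above, with a hypothetical `recordOrderMetrics` sink), an analytics group can read the same topic without affecting the order processors, because each group keeps its own offsets.

```typescript
// A second, independent consumer group on the same 'orders' topic.
const analyticsConsumer = kafka.consumer({ groupId: 'order-analytics' });

await analyticsConsumer.connect();
// fromBeginning lets a brand-new group replay history for backfilled analytics
await analyticsConsumer.subscribe({ topic: 'orders', fromBeginning: true });

await analyticsConsumer.run({
  eachMessage: async ({ message }) => {
    const order = JSON.parse(message.value!.toString());
    await recordOrderMetrics(order); // hypothetical analytics sink
  },
});
```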
Understanding delivery guarantees is crucial for designing reliable queue-based systems. Different systems offer different guarantees, and the choice affects both reliability and performance.
Delivery Semantics:
| Guarantee | Description | Trade-off | Use Case |
|---|---|---|---|
| At-most-once | Message delivered 0 or 1 times | Fast, may lose messages | Metrics, logs (acceptable loss) |
| At-least-once | Message delivered 1+ times | Reliable, may have duplicates | Most applications (with idempotency) |
| Exactly-once | Message delivered exactly 1 time | Expensive, complex | Financial transactions (usually overkill) |
Why At-Least-Once + Idempotency:
Exactly-once delivery is extremely expensive to implement (requires distributed transactions). In practice, at-least-once delivery with idempotent processing achieves the same outcome more efficiently.
```typescript
// PROBLEM: Message processed twice = incorrect state
// Order payment message processed twice = customer charged twice!

// SOLUTION: Idempotent message processing
interface OrderPaymentMessage {
  messageId: string;      // Unique message identifier
  orderId: string;
  amount: number;
  idempotencyKey: string; // Producer-provided, survives retries
}

async function processPayment(message: OrderPaymentMessage): Promise<void> {
  const { idempotencyKey, orderId, amount } = message;

  // Check if already processed using the idempotency key
  const existing = await db.query(
    'SELECT id FROM processed_payments WHERE idempotency_key = ?',
    [idempotencyKey]
  );

  if (existing.length > 0) {
    console.log(`Payment ${idempotencyKey} already processed, skipping`);
    return; // Idempotent: same result, no side effects
  }

  // Process payment atomically with idempotency check
  await db.transaction(async (tx) => {
    // Double-check within the transaction (handle race conditions)
    const alreadyProcessed = await tx.query(
      'SELECT id FROM processed_payments WHERE idempotency_key = ? FOR UPDATE',
      [idempotencyKey]
    );

    if (alreadyProcessed.length > 0) {
      return; // Another instance just processed it
    }

    // Record processing (before the external call!)
    await tx.insert('processed_payments', {
      idempotency_key: idempotencyKey,
      order_id: orderId,
      processed_at: new Date(),
    });

    // Now safe to charge the payment
    await paymentGateway.charge(orderId, amount);

    // Update order status
    await tx.update('orders', orderId, { status: 'paid' });
  });
}

// Alternative: Use database unique constraints
async function processPaymentSimpler(message: OrderPaymentMessage): Promise<void> {
  try {
    // idempotency_key has a UNIQUE constraint
    await db.insert('processed_payments', {
      idempotency_key: message.idempotencyKey,
      order_id: message.orderId,
      processed_at: new Date(),
    });

    // If the INSERT succeeded, we're the first processor
    await paymentGateway.charge(message.orderId, message.amount);
  } catch (error) {
    if (isDuplicateKeyError(error)) {
      console.log('Already processed, ignoring duplicate');
      return; // Idempotent handling
    }
    throw error; // Real error, retry
  }
}
```

Always record successful processing BEFORE performing side effects. If you charge the payment and then crash before recording, the retry will charge again. If you record first, a crash means the charge didn't happen, and re-processing is correct. The idempotency record is your checkpoint.
When consumers can't keep up with producers, queue depth grows unboundedly. Without flow control, this leads to memory exhaustion, increased latency, and eventually system failure. Backpressure is the mechanism for signaling upstream to slow down.
Backpressure Strategies:
Strategy 1: Bounded queues (block producers)
- When the queue is full, the producer blocks or fails fast.
- Pros: prevents out-of-memory, provides natural flow control
- Cons: producer latency increases, requests may time out
- Use: RabbitMQ with queue length limits

Strategy 2: Rate limiting producers
- A token bucket (e.g., 100 msg/s) caps how fast messages enter the queue.
- Pros: predictable load, protects downstream systems
- Cons: throttles even when capacity is available
- Use: API rate limits, admission control

Strategy 3: Drop or sample under load (load shedding)
- Messages are dropped once queue depth exceeds a threshold.
- Pros: guarantees processing capacity for accepted messages
- Cons: data loss (must be acceptable)
- Use: metrics, non-critical events, sampling

Strategy 4: Adaptive scaling
- When queue depth exceeds a threshold, more consumers are added.
- Pros: handles variable load automatically
- Cons: startup latency, cost, scaling limits
- Use: Kubernetes HPA/KEDA with queue-depth metrics

Implementing Backpressure with KEDA:
KEDA (Kubernetes Event-Driven Autoscaling) scales consumers based on queue depth:
```yaml
# KEDA ScaledObject for Kafka consumer scaling
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
spec:
  scaleTargetRef:
    name: order-processor        # Deployment to scale
  pollingInterval: 15            # Check queue every 15s
  cooldownPeriod: 300            # Wait 5min before scale-down
  minReplicaCount: 2             # Always at least 2 consumers
  maxReplicaCount: 50            # Scale up to 50 consumers
  triggers:
    - type: kafka
      metadata:
        bootstrapServers: kafka:9092
        consumerGroup: order-processors
        topic: orders
        # Scale when consumer lag exceeds threshold
        lagThreshold: "100"           # Messages behind per partition
        # Or scale based on message rate
        activationLagThreshold: "10"  # Start scaling at 10 msgs lag
    # Alternative: Scale based on queue depth (RabbitMQ)
    - type: rabbitmq
      metadata:
        queueName: orders
        host: amqp://rabbitmq
        mode: QueueLength
        value: "100"                  # Scale when > 100 messages queued
---
# HPA for traditional metrics-based scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-processor-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-processor
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: External
      external:
        metric:
          name: kafka_consumergroup_lag
          selector:
            matchLabels:
              topic: orders
              consumergroup: order-processors
        target:
          type: AverageValue
          averageValue: "50"          # Target 50 messages lag per consumer
```

Queue depth (lag) is often a better scaling signal than CPU utilization. CPU might be low while messages pile up waiting for slow external calls. Scale based on message lag to ensure you're keeping up with incoming work, not just keeping CPUs busy.
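KEDA addresses the consumer side; producers can also apply backpressure directly, in the spirit of Strategy 1 above. The sketch below caps the number of in-flight sends with a tiny in-process semaphore. It is an illustration only; a production system would add timeouts, metrics, and load shedding when waits grow too long.

```typescript
// Cap concurrent, unacknowledged sends; callers wait when the cap is reached.
class BoundedSender {
  private available: number;
  private waiters: Array<() => void> = [];

  constructor(maxInFlight: number) {
    this.available = maxInFlight;
  }

  private async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    // No slots free: block this caller until a send completes
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) next();       // hand the freed slot directly to a waiting caller
    else this.available++;
  }

  async send(produce: () => Promise<void>): Promise<void> {
    await this.acquire();
    try {
      await produce();
    } finally {
      this.release();
    }
  }
}

// Usage with the kafkajs producer from earlier (names assumed):
const bounded = new BoundedSender(1000);
await bounded.send(() =>
  producer.send({ topic: 'orders', messages: [{ key: 'order:1', value: '{}' }] })
);
```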
What happens when a message can't be processed? Poison messages—those that repeatedly fail—can block queue processing and create infinite retry loops. Dead Letter Queues (DLQs) provide a systematic solution.
The Poison Message Problem:
```text
WITHOUT A DLQ:

  Producer ──► [msg1][msg2][POISON!] ──► Consumer fails ──► retry ──► fails ──► retry ...

  The poison message is redelivered forever; other messages queue up behind the
  retries and processing grinds to a halt on one bad message.

WITH A DLQ:

  Producer ──► [msg1][msg2][POISON!] ──► Consumer
                               │  (fails N times)
                               ▼
                     ┌─────────────────┐
                     │ Dead Letter     │
                     │ Queue (DLQ)     │
                     │ [POISON!]       │
                     │ (for later      │
                     │  investigation) │
                     └─────────────────┘

  After a bounded number of failed attempts, the poison message is moved to the
  DLQ. Normal messages keep flowing; the bad one is isolated for investigation.
```

Implementing Retry with Backoff:
```typescript
interface MessageWithRetry {
  payload: unknown;
  retryCount: number;
  originalTimestamp: Date;
  lastError?: string;
}

const MAX_RETRIES = 5;
const RETRY_DELAYS = [1000, 5000, 30000, 120000, 600000]; // Exponential backoff

async function processWithRetry(message: MessageWithRetry): Promise<void> {
  try {
    await processMessage(message.payload);
    // Success! Acknowledge and done
  } catch (error) {
    const retryCount = message.retryCount + 1;

    if (retryCount >= MAX_RETRIES) {
      // Exhausted retries → send to DLQ
      await sendToDLQ({
        originalMessage: message.payload,
        retryCount,
        finalError: error.message,
        processingHistory: message,
      });
      console.log(`Message sent to DLQ after ${retryCount} attempts`);
      return; // Don't throw, acknowledge the message
    }

    // Schedule retry with exponential backoff
    const delay = RETRY_DELAYS[retryCount - 1] || RETRY_DELAYS[RETRY_DELAYS.length - 1];

    await sendToRetryQueue({
      ...message,
      retryCount,
      lastError: error.message,
    }, { delay });

    console.log(`Retry ${retryCount}/${MAX_RETRIES} scheduled in ${delay}ms`);
  }
}

// AWS SQS native DLQ configuration
const queueConfig = {
  QueueName: 'orders-queue',
  RedrivePolicy: JSON.stringify({
    deadLetterTargetArn: 'arn:aws:sqs:us-east-1:123456789:orders-dlq',
    maxReceiveCount: 5 // After 5 failed attempts → DLQ
  }),
  VisibilityTimeout: '30', // 30 seconds to process
};

// DLQ monitoring and alerting
async function monitorDLQ(): Promise<void> {
  const dlqDepth = await sqs.getQueueDepth('orders-dlq');

  if (dlqDepth > 0) {
    await alertOps({
      severity: dlqDepth > 100 ? 'critical' : 'warning',
      message: `${dlqDepth} messages in orders-dlq`,
      action: 'Investigate failed order processing',
    });
  }
}

// DLQ reprocessing tool
async function replayFromDLQ(
  dlqName: string,
  targetQueue: string,
  filter?: (msg: unknown) => boolean
): Promise<{ replayed: number; skipped: number }> {
  let replayed = 0, skipped = 0;

  while (true) {
    const messages = await sqs.receiveMessages(dlqName, { max: 10 });
    if (messages.length === 0) break;

    for (const msg of messages) {
      if (filter && !filter(msg.body)) {
        skipped++;
        await sqs.deleteMessage(dlqName, msg.receiptHandle);
        continue;
      }

      // Resend to the main queue with a reset retry count
      await sqs.sendMessage(targetQueue, {
        ...msg.body,
        retryCount: 0,
        replayedFromDLQ: true,
        replayedAt: new Date(),
      });
      await sqs.deleteMessage(dlqName, msg.receiptHandle);
      replayed++;
    }
  }

  return { replayed, skipped };
}
```

DLQ messages require attention. Set up alerts for DLQ depth, investigate root causes, and either fix and replay messages or explicitly delete them with documentation. An unmonitored DLQ is a hidden data loss vector.
Different queue technologies have different scaling characteristics. Choosing the right one depends on your specific requirements.
| Technology | Throughput | Ordering | Best For |
|---|---|---|---|
| Apache Kafka | Millions msg/s | Per-partition | Event streaming, high-throughput |
| Amazon SQS | Unlimited (managed) | FIFO optional | Simple queues, AWS workloads |
| RabbitMQ | ~50K msg/s/node | Per-queue | Complex routing, RPC patterns |
| Amazon Kinesis | 1MB/s per shard | Per-shard | Real-time analytics, AWS streaming |
| Redis Streams | ~100K msg/s | Per-stream | Simple streaming, existing Redis |
| Google Pub/Sub | Unlimited (managed) | Per-key ordering | GCP workloads, global |
Kafka vs SQS: The Common Choice
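The choice usually comes down to the operational model: Kafka gives you partitions, consumer groups, and retention to tune yourself (see the sizing sketch below), while SQS exposes a fully managed per-message API. For contrast with the kafkajs examples above, here is a minimal SQS sketch using @aws-sdk/client-sqs; the queue URL and the `processOrder` handler are hypothetical.

```typescript
import {
  SQSClient,
  SendMessageCommand,
  ReceiveMessageCommand,
  DeleteMessageCommand,
} from '@aws-sdk/client-sqs';

const sqs = new SQSClient({ region: 'us-east-1' });
const queueUrl = 'https://sqs.us-east-1.amazonaws.com/123456789/orders-queue'; // hypothetical

// Producer: one call, no brokers or partitions to manage
async function sendOrder(order: { orderId: string; type: string }): Promise<void> {
  await sqs.send(new SendMessageCommand({
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify(order),
  }));
}

// Consumer: long-poll, process, then delete (at-least-once delivery)
async function pollOnce(): Promise<void> {
  const { Messages } = await sqs.send(new ReceiveMessageCommand({
    QueueUrl: queueUrl,
    MaxNumberOfMessages: 10,
    WaitTimeSeconds: 20, // long polling reduces empty responses
  }));

  for (const msg of Messages ?? []) {
    await processOrder(JSON.parse(msg.Body!)); // hypothetical handler
    await sqs.send(new DeleteMessageCommand({
      QueueUrl: queueUrl,
      ReceiptHandle: msg.ReceiptHandle!,
    }));
  }
}
```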
```yaml
# Kafka Cluster Sizing for High Throughput

# Broker configuration
broker.count: 6                   # Start with 6 brokers for HA
log.retention.hours: 168          # Keep messages 7 days
log.segment.bytes: 1073741824     # 1GB segment files
num.partitions: 12                # Default partitions per topic

# High-throughput topics
topics:
  events:
    partitions: 50                # High parallelism
    replication-factor: 3         # Durability
    min.insync.replicas: 2        # Consistency
  orders:
    partitions: 24                # Match consumer count
    replication-factor: 3
    retention.ms: 604800000       # 7 days

# Producer tuning for throughput
producer.config:
  acks: 1                         # Leader ack only (faster)
  batch.size: 65536               # 64KB batches
  linger.ms: 10                   # Wait 10ms to batch
  compression.type: lz4           # Fast compression
  buffer.memory: 67108864         # 64MB buffer

# Consumer tuning
consumer.config:
  fetch.min.bytes: 50000          # Batch fetches
  fetch.max.wait.ms: 100          # Wait for a batch
  max.poll.records: 500           # Process in batches
  session.timeout.ms: 30000       # Heartbeat timeout
  max.poll.interval.ms: 300000    # 5 min processing time
```

Self-hosting Kafka provides maximum control and cost efficiency at scale, but requires significant operational expertise. Managed services (Confluent Cloud, AWS MSK, Aiven) trade cost for operational simplicity. For most teams, starting with a managed service and migrating when the cost justifies it is pragmatic.
Message queues are essential infrastructure for building scalable, resilient distributed systems: they decouple producers from consumers, absorb traffic spikes, distribute work across parallel consumers, and scale horizontally through partitioning and consumer groups.
Module Complete:
You have now completed the Scaling Strategies module. You understand how to scale systems from multiple angles: read vs write workloads, stateless services, database strategies, caching layers, and queue-based architectures. These patterns form the foundation for building systems that can grow from hundreds to millions of users.
The next module will explore Auto-Scaling—how to implement automatic scaling based on demand, including scaling policies, triggers, and the operational patterns that make elastic scaling work in production.
You now have a comprehensive understanding of scaling strategies for distributed systems. From read/write scaling fundamentals through stateless services, database patterns, caching layers, and queue architectures, you're equipped to design systems that scale horizontally as demand grows.