When you deploy a serverless function, a remarkable amount of engineering takes place behind the scenes. Your code—a few hundred lines written in an afternoon—runs on infrastructure that cost billions to build, automatically scaling from zero to handling millions of requests, then scaling back down. Understanding this execution model separates developers who use serverless from engineers who architect with serverless.
The execution model encompasses everything that happens from the moment an event triggers your function to the moment a response is returned. It determines how fast your functions start, how they scale, what resources they receive, and how they behave under load. Getting this right is the difference between serverless systems that delight users and those that frustrate them with timeouts and cold starts.
This page provides an architectural deep-dive into serverless execution models. You'll understand instance lifecycle management, scaling algorithms, resource allocation strategies, concurrency models, and how to design functions that work harmoniously with the underlying platform infrastructure.
Every serverless function runs within an execution environment—an isolated compute container that hosts your code and runtime. Understanding this lifecycle is fundamental to writing efficient functions.
The Four States of an Execution Environment:
```
                        EXECUTION ENVIRONMENT LIFECYCLE

        ┌─────────────────┐
        │   COLD START    │ ◄──── Event arrives, no warm instances available
        │ (Provisioning)  │
        └────────┬────────┘
                 │   • Download code/image
                 │   • Start runtime (Node, Python, Java...)
                 │   • Execute global scope / initialization
                 │   • Initialize extensions
                 │   [Duration: 100ms - 10s+ depending on factors]
                 ▼
        ┌─────────────────┐
        │     INVOKE      │ ◄──── Handler function executes
        │   (Executing)   │
        └────────┬────────┘
                 │   • Handler receives event + context
                 │   • Code executes
                 │   • Response returned
                 │   [Duration: your code's execution time - billed]
                 ▼
        ┌─────────────────┐
        │     FROZEN      │ ◄──── Waiting for next invocation
        │     (Warm)      │
        └────────┬────────┘
                 │   • Processes suspended (SIGSTOP equivalent)
                 │   • Memory state preserved
                 │   • Global variables retained
                 │   • Network connections may be dropped
                 │   [Duration: platform-dependent, typically 5-60 minutes]
                 │
                 ├──────────────────────────┐
                 ▼                          ▼
        ┌─────────────────┐       ┌─────────────────┐
        │     THAWED      │       │   TERMINATED    │
        │  (Reactivated)  │       │   (Shutdown)    │
        └────────┬────────┘       └─────────────────┘
                 │                  • Shutdown hooks execute
                 │                  • Resources released
                 │                  • Environment destroyed
                 │
                 └──────────► INVOKE (fast path, ~1-50ms)
```
Phase Analysis:
1. Cold Start (Provisioning)
This is the most visible phase because it's where latency variability occurs:
- The platform downloads your code or container image
- The runtime (Node, Python, Java...) starts
- Your global scope / initialization code executes
- Duration ranges from ~100 ms to 10 s or more, depending on runtime, package size, and configuration
2. Invoke (Executing)
The only phase that's always billed:
- The handler receives the event and context objects
- Your code executes and returns a response
- You pay for the duration of this phase
3. Frozen (Warm/Idle)
Between invocations:
- Processes are suspended (a SIGSTOP equivalent), so no CPU is consumed
- Memory state and global variables are preserved
- Network connections may be dropped while frozen
4. Terminated (Shutdown)
When the platform reclaims resources:
- Shutdown hooks execute (where the runtime supports them)
- Resources are released and the environment is destroyed
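To make the phases concrete, here's a minimal handler sketch. The Redis client is illustrative, and SIGTERM-based shutdown hooks are only delivered by some runtimes and configurations:

```typescript
// Cold start: everything in module (global) scope runs once per
// execution environment, before the first invocation.
import { createClient } from 'redis';

const redis = createClient({ url: process.env.REDIS_URL });
const ready = redis.connect(); // connection is reused across warm invocations

// Invoke: the handler runs once per event and is the billed phase.
export async function handler(event: { key: string }) {
  await ready; // ensure cold-start initialization has finished
  return { value: await redis.get(event.key) };
}

// Terminated: some runtimes deliver SIGTERM before shutdown,
// giving a brief window to flush and close cleanly.
process.on('SIGTERM', async () => {
  await redis.quit();
  process.exit(0);
});
```

Everything above the handler runs exactly once per environment, which is why connection setup belongs in global scope.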
Platforms do not guarantee how long execution environments remain warm. AWS Lambda typically keeps instances warm for 5-15 minutes after the last invocation, but this can vary. Never design systems that depend on warm instances existing—they're an optimization, not a guarantee.
Serverless platforms automatically scale your functions based on demand. Understanding these scaling algorithms helps you design systems that scale predictably and cost-effectively.
Reactive Scaling (Event-Driven)
The fundamental scaling model for serverless is reactive: the platform creates execution environments in response to incoming events and removes them as demand subsides, rather than scaling on forecasts or utilization thresholds.
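As an illustration of that loop, here's a toy, in-memory simulation of the one-request-per-instance model; real platforms add burst limits, placement, and instance reaping:

```typescript
// Toy simulation of reactive, one-request-per-instance scaling
// (illustrative only; not how any real control plane is implemented).

interface Instance { id: number; busy: boolean; }

class ToyScaler {
  private instances: Instance[] = [];
  private nextId = 1;

  // Route one incoming event: reuse a warm idle instance if
  // available, otherwise "cold start" a new one.
  handleEvent(): Instance {
    let instance = this.instances.find(i => !i.busy);
    if (!instance) {
      instance = { id: this.nextId++, busy: false };
      this.instances.push(instance); // cold start
      console.log(`cold start: instance ${instance.id}`);
    }
    instance.busy = true;
    return instance;
  }

  release(instance: Instance): void {
    instance.busy = false; // becomes warm/idle
  }

  get count(): number { return this.instances.length; }
}

// 3 concurrent events -> 3 instances; a 4th after a release reuses one.
const scaler = new ToyScaler();
const [a, b, c] = [scaler.handleEvent(), scaler.handleEvent(), scaler.handleEvent()];
scaler.release(a);
scaler.handleEvent(); // warm start, no new instance
console.log(`instances created: ${scaler.count}`); // 3
```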
Concurrency Models:
| Platform | Concurrency Model | Default Concurrency | Scaling Behavior |
|---|---|---|---|
| AWS Lambda | 1 request per instance | 1 | New instance per concurrent request |
| Azure Functions (Consumption) | Multiple per instance | ~16 | Scale based on queue depth/rate |
| Azure Functions (Premium) | Configurable | Up to 32 | Pre-warmed + burst scaling |
| GCF 1st Gen | 1 request per instance | 1 | Similar to Lambda |
| GCF 2nd Gen | Configurable | Up to 1000 | Leverages Cloud Run scaling |
AWS Lambda Scaling Deep Dive:
Lambda's scaling algorithm is sophisticated:
1. Initial Burst:
- Immediate burst of 500-3000 instances (region-dependent)
- No rate limiting for initial traffic spike
2. Sustained Scaling:
- After burst capacity, adds 500 instances per minute
- This throttle prevents runaway scaling
3. Concurrency Limits:
- Account limit: 1000 concurrent executions (can increase)
- Reserved concurrency: Guaranteed capacity per function
- Provisioned concurrency: Pre-initialized instances
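A quick way to sanity-check these limits is Little's law: steady-state concurrency is roughly arrival rate times average duration. A small sketch with made-up traffic numbers:

```typescript
// Estimate steady-state concurrency from traffic (Little's law):
// concurrency ≈ requests/second × average duration (seconds).
// The numbers below are hypothetical.

function requiredConcurrency(rps: number, avgDurationMs: number): number {
  return Math.ceil(rps * (avgDurationMs / 1000));
}

const rps = 2000;          // sustained requests per second
const avgDurationMs = 350; // average handler duration

const needed = requiredConcurrency(rps, avgDurationMs); // 700
console.log(`Steady-state concurrency: ${needed}`);

// Compare against the account limit (default 1000) to decide whether
// to request a limit increase or reserve capacity for this function.
const accountLimit = 1000;
if (needed > accountLimit * 0.8) {
  console.log('Approaching the account concurrency limit: plan headroom.');
}
```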
Scaling Visualization:
```
Traffic Spike Scenario: 0 to 5000 concurrent requests

Concurrent
 Instances
      │
 5000 ┤                              ┌─────────────────
      │                             /
 4000 ┤                            /
      │                           /
 3000 ┤              ┌───────────┘  ◄── Post-burst scaling
      │              │                  (+500/minute)
 2000 ┤              │
      │              │
 1000 ┤─────────────┐│
      │             ││
    0 ├─────────────┴┴─────────┴─────────────────────────
      0    Burst    1min     2min     3min     4min
           │
           └── Initial burst (500-3000 instances immediately)
```
Key Insight: The initial burst handles sudden traffic spikes. Sustained growth requires the 500/min scaling rate. Provision accordingly for expected sustained increases.
Poll-Based Source Scaling (Kinesis, SQS, etc.):
For event source mappings, scaling works differently:
Kinesis/DynamoDB Streams:
- One concurrent batch per shard by default; the parallelization factor can raise this to 10 per shard
- Throughput therefore scales with shard count, not raw traffic
- Records within a shard are processed in order
SQS Queues:
- Lambda polls the queue and scales consumers based on backlog depth
- Scaling starts with a handful of pollers and ramps up over minutes while messages remain
- Batch size and batching window control how many messages each invocation receives
SQS FIFO Queues:
- Concurrency is bounded by the number of active message group IDs, since ordering must be preserved within each group
If your downstream system (database, API) can't handle thousands of concurrent connections, use reserved concurrency or SQS with controlled batch sizes to limit Lambda's scaling. Serverless scales faster than most backends—protect your dependencies.
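As a sketch of that advice (AWS CDK v2; names and limits are illustrative, not prescriptive), reserved concurrency plus a tuned SQS event source cap how hard Lambda can push a database:

```typescript
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import { SqsEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

export class ThrottledConsumerStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const queue = new sqs.Queue(this, 'WorkQueue');

    const writer = new lambda.Function(this, 'DbWriter', {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('dist'),
      // Hard cap: the database never sees more than 50
      // concurrent function instances.
      reservedConcurrentExecutions: 50,
    });

    writer.addEventSource(new SqsEventSource(queue, {
      batchSize: 10,      // smaller batches = smoother downstream load
      maxConcurrency: 50, // align event-source scaling with the cap
    }));
  }
}
```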
Understanding how platforms allocate resources helps you optimize for performance and cost.
AWS Lambda Resource Model:
Memory is the primary configuration dimension:
| Memory Configured | vCPU Equivalent | Network Bandwidth | Cost/GB-second |
|---|---|---|---|
| 128 MB | ~0.07 vCPU | Proportional | $0.0000166667 |
| 512 MB | ~0.29 vCPU | Proportional | $0.0000166667 |
| 1,769 MB | 1 vCPU | ~6 Gbps | $0.0000166667 |
| 3,538 MB | 2 vCPU | ~12 Gbps | $0.0000166667 |
| 10,240 MB | 6 vCPU | ~25 Gbps | $0.0000166667 |
Key insight: CPU scales linearly with memory. At 1,769 MB, you get exactly one vCPU. This is the inflection point for compute-intensive workloads.
Ephemeral Storage:
/tmp: 512 MB default, configurable up to 10,240 MB
```typescript
// Resource usage patterns and optimization

// App-specific helpers assumed to exist elsewhere:
declare function performComplexCalculation(data: any): any;
declare function merge(a: any, b: any): any;
declare function fetchUsers(ids: string[]): Promise<any>;
declare function fetchOrders(ids: string[]): Promise<any>;
declare function fetchInventory(skus: string[]): Promise<any>;
declare function downloadModel(path: string): Promise<void>;
declare function runInference(path: string, input: any): Promise<any>;

// 1. CPU-BOUND WORKLOAD: Increase memory for more CPU
// Image processing, ML inference, compression
export async function cpuIntensiveHandler(event: any) {
  // This benefits from higher memory (more CPU)
  const result = performComplexCalculation(event.data);
  return { result };
}
// Deploy with: --memory 3008 (for ~2 vCPU)

// 2. MEMORY-BOUND WORKLOAD: Match memory to data size
// Large data processing, caching, aggregation
export async function memoryIntensiveHandler(event: any) {
  // Loading large dataset into memory
  const data = event.records; // 500MB of data
  const aggregated = data.reduce((acc: any, record: any) => {
    // Memory-intensive aggregation
    return merge(acc, record);
  }, {});
  return { aggregated };
}
// Deploy with: --memory 1024 or higher based on data size

// 3. I/O-BOUND WORKLOAD: Lower memory often sufficient
// API calls, database queries, file operations
export async function ioIntensiveHandler(event: any) {
  // Mostly waiting on network I/O
  const [users, orders, inventory] = await Promise.all([
    fetchUsers(event.userIds),
    fetchOrders(event.orderIds),
    fetchInventory(event.skus)
  ]);
  return { users, orders, inventory };
}
// Deploy with: --memory 512 (I/O doesn't benefit from more CPU)

// 4. COLD START INTENSIVE: Higher memory = faster startup
// Functions with many dependencies, complex initialization
import { createClient as createRedis } from 'redis';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { S3Client } from '@aws-sdk/client-s3';

// Global initialization - runs during cold start
const redis = createRedis({ url: process.env.REDIS_URL });
const dynamodb = new DynamoDBClient({});
const s3 = new S3Client({});

export async function coldStartSensitiveHandler(event: any) {
  // Function with heavy initialization
}
// Deploy with: --memory 1024+ (faster cold start initialization)

// 5. EPHEMERAL STORAGE USAGE
import fs from 'fs/promises';
import path from 'path';

const CACHE_DIR = '/tmp/model-cache';

export async function modelInferenceHandler(event: any) {
  const modelPath = path.join(CACHE_DIR, 'model.bin');

  // Check if model is cached from previous invocation
  const cached = await fs.stat(modelPath).catch(() => null);

  if (!cached) {
    // Download and cache model (persists across warm invocations)
    console.log('Downloading model to /tmp...');
    await downloadModel(modelPath);
  } else {
    console.log('Using cached model');
  }

  // Use the cached model
  return runInference(modelPath, event.input);
}
// Deploy with: --ephemeral-storage 2048 (for large models)
```
Azure Functions Resource Model:
Azure takes a different approach:
Consumption Plan:
- Fixed memory ceiling per instance (about 1.5 GB)
- Scales to zero when idle; billed per execution (GB-seconds plus invocation count)
Premium Plan:
- Pre-warmed instances eliminate cold starts
- Larger instance sizes and longer execution durations
- Virtual network integration
Google Cloud Functions Resource Model:
2nd generation offers more granular control:
```bash
gcloud functions deploy my-function \
  --gen2 \
  --memory 2Gi \
  --cpu 1 \
  --concurrency 100 \
  --min-instances 1 \
  --max-instances 100
# Note: --cpu is decoupled from memory in 2nd gen!
```
Key GCF difference: CPU and memory are independently configurable in 2nd gen, allowing more precise resource allocation.
The optimal memory/CPU configuration varies dramatically by workload. Use tools like AWS Lambda Power Tuning to empirically test your functions at different memory levels. Often, higher memory reduces total cost by completing faster—don't assume minimum memory is cheapest.
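A back-of-the-envelope version of that claim, using the per-GB-second price from the table above and hypothetical measured durations:

```typescript
// Why "more memory" can be cheaper: cost = GB allocated × seconds billed.
// The durations below are hypothetical power-tuning measurements.

const PRICE_PER_GB_SECOND = 0.0000166667; // x86 Lambda, per the table above

function invocationCost(memoryMb: number, durationMs: number): number {
  return (memoryMb / 1024) * (durationMs / 1000) * PRICE_PER_GB_SECOND;
}

// A CPU-bound function that speeds up roughly linearly with memory:
console.log(invocationCost(512, 4000)); // 512 MB,  4.0 s  -> ~$0.0000333
console.log(invocationCost(2048, 950)); // 2048 MB, 0.95 s -> ~$0.0000317

// The 2 GB configuration is slightly cheaper AND roughly 4x faster.
```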
Multi-tenant serverless platforms must provide strong isolation between customers while maintaining the agility and density that makes serverless economical.
Isolation Layers:
1. Process-Level Isolation (Weakest): separate OS processes that share a kernel; cheap and dense, but the shared kernel is a single attack surface.
2. Container-Level Isolation: namespaces, cgroups, and seccomp filters add stronger boundaries, yet all containers still share the host kernel.
3. MicroVM-Level Isolation (Strongest): hardware virtualization gives each function its own guest kernel, as in AWS Lambda's Firecracker architecture below.
```
AWS Lambda Isolation Architecture (Firecracker)
═══════════════════════════════════════════════════════════════

┌──────────────────────── Physical Host ────────────────────────┐
│                                                                │
│  ┌────── Customer A ──────┐    ┌────── Customer B ──────┐     │
│  │  ┌──────────────┐      │    │  ┌──────────────┐      │     │
│  │  │  Function 1  │      │    │  │  Function 3  │      │     │
│  │  │  (MicroVM)   │      │    │  │  (MicroVM)   │      │     │
│  │  │  ┌───────┐   │      │    │  │  ┌───────┐   │      │     │
│  │  │  │ Code  │   │      │    │  │  │ Code  │   │      │     │
│  │  │  │Runtime│   │      │    │  │  │Runtime│   │      │     │
│  │  │  │Kernel │   │      │    │  │  │Kernel │   │      │     │
│  │  │  └───────┘   │      │    │  │  └───────┘   │      │     │
│  │  └──────────────┘      │    │  └──────────────┘      │     │
│  │                        │    │                        │     │
│  │  ┌──────────────┐      │    │  ┌──────────────┐      │     │
│  │  │  Function 2  │      │    │  │  Function 4  │      │     │
│  │  │  (MicroVM)   │      │    │  │  (MicroVM)   │      │     │
│  │  └──────────────┘      │    │  └──────────────┘      │     │
│  └────────────────────────┘    └────────────────────────┘     │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │                     Firecracker VMM                       │ │
│  │  • Minimal attack surface (~50k lines of code)            │ │
│  │  • Memory-safe (Rust implementation)                      │ │
│  │  • <125ms VM startup time                                 │ │
│  │  • ~5MB memory overhead per VM                            │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
│  ┌──────────────────────────────────────────────────────────┐ │
│  │                    Host Kernel (Linux)                    │ │
│  │  • KVM for hardware virtualization                        │ │
│  │  • cgroups for resource limits                            │ │
│  │  • Jailer for additional sandboxing                       │ │
│  └──────────────────────────────────────────────────────────┘ │
│                                                                │
└────────────────────────────────────────────────────────────────┘

Security Boundaries:
✓ Customer A cannot access Customer B's memory
✓ Each function has isolated network namespace
✓ Filesystem isolation (no shared mounts)
✓ Independent kernel (OS vulnerabilities contained)
```
Defense in Depth:
Platforms implement multiple security layers:
1. IAM and Access Control: each function runs with a least-privilege role scoped to exactly the resources it needs.
2. Network Isolation: functions can run inside private networks (VPCs) with rules restricting inbound and outbound traffic.
3. Data Encryption: environment variables and payloads are encrypted at rest and in transit.
4. Secrets Management: credentials belong in a managed secrets store, not in code or plain environment variables.
5. Audit Logging: invocations and control-plane actions are recorded for compliance and forensics.
Multiple invocations of YOUR function may share the same execution environment (warm starts). This means global variables persist. Be careful not to leak sensitive data between invocations—clear caches of user-specific data, don't store credentials in global variables, and assume your function could process requests from different users sequentially.
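A small sketch of that distinction (helper names are hypothetical): share user-agnostic clients globally, keep user data request-scoped:

```typescript
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';

// SAFE: user-agnostic resources in global scope (clients, config)
const dynamodb = new DynamoDBClient({});

// UNSAFE: per-user data in global scope survives into the next
// user's invocation on a warm instance
let lastProfile: { userId: string; name: string } | undefined;

export async function handler(event: { userId: string }) {
  // Keep request-scoped data in local variables instead
  const profile = await loadProfile(event.userId);

  // Anti-pattern shown for contrast: this value would leak into
  // whatever request this warm instance serves next
  lastProfile = profile;

  return { greeting: `Hello, ${profile.name}` };
}

async function loadProfile(userId: string): Promise<{ userId: string; name: string }> {
  // Stand-in for a real lookup using the shared client above
  return { userId, name: `user-${userId}` };
}
```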
When millions of requests flow into serverless functions, sophisticated routing ensures optimal distribution.
The Request Journey:
```
Request Flow Through AWS Lambda
═══════════════════════════════════════════════════════════════

  Client Request
        │
        ▼
┌──────────────────┐
│   Edge Network   │  • CloudFront PoPs worldwide
│   (Optional)     │  • SSL termination, caching
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│   API Gateway    │  • Request validation
│   (If HTTP)      │  • Rate limiting, throttling
│                  │  • Authentication (Cognito, IAM)
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│ Lambda Frontend  │  • Receives invocation request
│     Service      │  • Authenticates caller
│                  │  • Validates payload
└────────┬─────────┘
         │
         ▼
┌──────────────────┐
│    Counting      │  • Checks concurrency limits
│    Service       │  • Account limit, reserved concurrency
│                  │  • Returns 429 if over limit
└────────┬─────────┘
         │
         ├─────────── Need new instance? ───────────────┐
         │                                              │
         ▼                                              ▼
┌──────────────────┐                          ┌──────────────────┐
│  Warm Instance   │                          │   Cold Start     │
│   Available      │                          │   (Placement     │
│                  │                          │    Service)      │
└────────┬─────────┘                          └────────┬─────────┘
         │                                             │
         │            ┌────────────────────────────────┘
         │            │
         ▼            ▼
┌────────────────────────────────────────────────┐
│               Worker Instance                  │
│  ┌─────────────────────────────────────────┐   │
│  │          Firecracker MicroVM            │   │
│  │  ┌───────────────────────────────────┐  │   │
│  │  │       Your Function Code          │  │   │
│  │  │       • Handler executes          │  │   │
│  │  │       • Response generated        │  │   │
│  │  └───────────────────────────────────┘  │   │
│  └─────────────────────────────────────────┘   │
└────────────────────────────────────────────────┘
         │
         ▼
  Response returned through same path
```
Load Balancing Strategies:
1. Request Distribution: the platform routes each invocation to any available warm instance; you do not control placement.
2. Affinity and Stickiness: there is generally no session affinity; consecutive requests from one client may land on different instances, so never assume stickiness.
3. Traffic Splitting
Some platforms support weighted traffic distribution:
```yaml
# AWS Lambda Alias Traffic Shifting
MyFunctionAlias:
  Type: AWS::Lambda::Alias
  Properties:
    FunctionName: !Ref MyFunction
    FunctionVersion: !GetAtt Version2.Version
    Name: live
    RoutingConfig:
      AdditionalVersionWeights:
        - FunctionVersion: !GetAtt Version1.Version
          FunctionWeight: 0.1  # 10% to old version
```
4. Regional Routing: for multi-region deployments, DNS-level routing (e.g., latency-based or failover records) directs each user to the nearest healthy region.
When Lambda reaches concurrency limits, additional synchronous requests are throttled (HTTP 429), not queued. For asynchronous invocations, there's an internal queue, but it has limits. Design your systems to handle throttling gracefully—implement client-side retries with exponential backoff.
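A generic client-side sketch of that advice, retrying throttled calls with exponential backoff and full jitter (the thresholds are illustrative):

```typescript
// Retry a throttled (HTTP 429) invocation with exponential backoff
// plus full jitter. Tune attempt counts and delays to your SLOs.

async function invokeWithBackoff<T>(
  invoke: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 100
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await invoke();
    } catch (err: any) {
      const throttled =
        err?.statusCode === 429 || err?.name === 'TooManyRequestsException';
      if (!throttled || attempt >= maxAttempts - 1) throw err;

      // Full jitter: sleep a random duration up to the exponential cap
      const cap = baseDelayMs * 2 ** attempt;
      const delay = Math.random() * cap;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```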
Understanding platform retry behavior is critical for building reliable serverless systems. Each invocation type has different semantics.
Retry Behavior by Invocation Type:
| Invocation Type | Automatic Retries | Error Returns To | Your Responsibility |
|---|---|---|---|
| Synchronous (RequestResponse) | None | Caller | Implement retry logic in client |
| Asynchronous (Event) | 2 retries (configurable) | DLQ/Destination | Configure DLQ, handle aged events |
| SQS Event Source | Until message age/retry count | DLQ | Configure visibility timeout, DLQ |
| Kinesis/DynamoDB | Until record expires | Bisect on error | Handle poison records, configure destinations |
```typescript
// Error handling patterns for serverless functions

import {
  DynamoDBClient,
  GetItemCommand,
  PutItemCommand
} from '@aws-sdk/client-dynamodb';

// App-specific helpers assumed to exist elsewhere:
declare function processOrder(body: any): Promise<any>;
declare function processRecord(record: any): Promise<void>;

// 1. IDEMPOTENCY FOR RETRIES
// Essential pattern for handling duplicate events

const dynamodb = new DynamoDBClient({});

interface ProcessingResult {
  success: boolean;
  result?: any;
  alreadyProcessed?: boolean;
}

async function processWithIdempotency(
  eventId: string,
  processFunc: () => Promise<any>
): Promise<ProcessingResult> {
  // Check if already processed
  const existing = await dynamodb.send(new GetItemCommand({
    TableName: 'IdempotencyTable',
    Key: { eventId: { S: eventId } }
  }));

  if (existing.Item) {
    console.log(`Event ${eventId} already processed`);
    return {
      success: true,
      alreadyProcessed: true,
      result: JSON.parse(existing.Item.result.S!)
    };
  }

  // Process the event
  const result = await processFunc();

  // Record completion (with TTL for cleanup)
  await dynamodb.send(new PutItemCommand({
    TableName: 'IdempotencyTable',
    Item: {
      eventId: { S: eventId },
      result: { S: JSON.stringify(result) },
      ttl: { N: String(Math.floor(Date.now() / 1000) + 86400) } // 24 hour TTL
    },
    ConditionExpression: 'attribute_not_exists(eventId)'
  }));

  return { success: true, result };
}

export async function idempotentHandler(event: any) {
  const eventId = event.headers['x-idempotency-key'] ||
    event.requestContext?.requestId ||
    `${event.source}-${event.detail?.id}`;

  return processWithIdempotency(eventId, async () => {
    // Actual processing logic
    return await processOrder(event.body);
  });
}

// 2. POISON MESSAGE HANDLING (for event source mappings)
// Prevent infinite retry loops

export async function sqsHandler(event: { Records: any[] }) {
  const failedRecords: any[] = [];

  for (const record of event.Records) {
    try {
      await processRecord(record);
    } catch (error: any) {
      // Check if this is a retryable error
      if (isPermanentFailure(error)) {
        // Log and skip - let message go to DLQ after max retries
        console.error('Permanent failure, will move to DLQ', {
          messageId: record.messageId,
          error: error.message
        });
        failedRecords.push(record);
      } else {
        // Transient error - throw to retry entire batch
        throw error;
      }
    }
  }

  // Partial batch failure reporting (Lambda feature)
  if (failedRecords.length > 0) {
    return {
      batchItemFailures: failedRecords.map(r => ({
        itemIdentifier: r.messageId
      }))
    };
  }
}

function isPermanentFailure(error: any): boolean {
  // Permanent failures shouldn't be retried
  return error.name === 'ValidationError' ||
    error.name === 'MalformedInputError' ||
    error.statusCode === 400 ||
    error.statusCode === 404;
}

// 3. CIRCUIT BREAKER FOR DEPENDENCIES
// Prevent cascading failures

interface CircuitState {
  failures: number;
  lastFailure: number;
  state: 'closed' | 'open' | 'half-open';
}

const circuits = new Map<string, CircuitState>();

async function callWithCircuitBreaker<T>(
  serviceName: string,
  callFunc: () => Promise<T>,
  options = { threshold: 5, resetMs: 30000 }
): Promise<T> {
  const circuit: CircuitState = circuits.get(serviceName) ??
    { failures: 0, lastFailure: 0, state: 'closed' };

  // Check if circuit should transition from open to half-open
  if (circuit.state === 'open' &&
      Date.now() - circuit.lastFailure > options.resetMs) {
    circuit.state = 'half-open';
  }

  // If circuit is open, fail fast
  if (circuit.state === 'open') {
    throw new Error(`Circuit breaker open for ${serviceName}`);
  }

  try {
    const result = await callFunc();
    // Success - reset circuit
    circuit.failures = 0;
    circuit.state = 'closed';
    circuits.set(serviceName, circuit);
    return result;
  } catch (error) {
    circuit.failures++;
    circuit.lastFailure = Date.now();
    if (circuit.failures >= options.threshold) {
      circuit.state = 'open';
      console.error(`Circuit breaker tripped for ${serviceName}`);
    }
    circuits.set(serviceName, circuit);
    throw error;
  }
}
```
Dead Letter Queues (DLQ):
DLQs capture messages that fail after all retries:
```yaml
# AWS SAM template with DLQ configuration
MyFunction:
  Type: AWS::Serverless::Function
  Properties:
    DeadLetterQueue:
      Type: SQS
      TargetArn: !GetAtt MyDLQ.Arn
    EventInvokeConfig:
      MaximumRetryAttempts: 2
      MaximumEventAgeInSeconds: 600  # 10 minutes
      DestinationConfig:
        OnFailure:
          Type: SQS
          Destination: !GetAtt FailureQueue.Arn
        OnSuccess:
          Type: SNS
          Destination: !Ref SuccessTopic
```
Best Practices:
- Make handlers idempotent so platform retries are harmless
- Configure DLQs or failure destinations for every asynchronous function
- Use partial batch failure reporting for SQS/Kinesis sources instead of failing whole batches
- Monitor DLQ depth and alarm on it; a growing DLQ means failed work is accumulating silently
For asynchronous invocations, Lambda's automatic retries are spaced with backoff and jitter: the first retry happens after roughly 1 minute, the second after roughly 2 minutes. This spacing prevents thundering herd problems when upstream services recover from outages.
Understanding the execution model transforms how you design serverless systems. You're no longer blindly trusting the platform—you're working with it, leveraging its characteristics for optimal performance and reliability.
What's Next:
With execution model fundamentals covered, we'll tackle the most common performance challenge in serverless: cold starts. We'll examine why they occur, how to measure them, and strategies Principal Engineers use to minimize their impact on user experience.
You now understand how serverless functions actually execute—from cold start to termination, from scaling algorithms to resource allocation, from isolation boundaries to error handling. This knowledge enables you to make informed architectural decisions and optimize your serverless systems for performance, cost, and reliability.