File processing is one of the most natural fits for serverless computing. Users upload files—images, videos, documents, archives—and those files need immediate processing: resizing, validation, extraction, transformation, or analysis. The event-driven nature of serverless perfectly matches this pattern: files arrive unpredictably, trigger processing, and results are stored.
Traditional file processing architectures required maintaining worker servers, managing queues, and handling capacity planning for peak loads. A media company might need 100 servers during a product launch but only 5 during normal operations. Serverless eliminates this waste: each file upload triggers exactly the compute needed, scaling from zero to thousands of concurrent processes.
This page provides a comprehensive guide to real-time file processing in serverless environments. We'll cover image manipulation, video processing, document handling, and strategies for files that exceed Lambda's constraints.
By the end of this page, you will understand: (1) How to build image processing pipelines with Lambda, (2) Video transcoding strategies using serverless and managed services, (3) Document processing patterns including PDF and Office files, (4) Strategies for handling files that exceed Lambda limits, (5) Performance optimization for file-heavy workloads, and (6) Security considerations for user-uploaded content.
Image processing is the canonical serverless file processing use case. Users upload images, and the system automatically generates thumbnails, optimizes for web delivery, extracts metadata, and applies content policies.
Common image processing operations include thumbnail generation, responsive size variants, format conversion to WebP or AVIF, metadata extraction, and web optimization. The handler below generates a set of sized variants with Sharp on each S3 upload event:
```typescript
import { S3Handler } from "aws-lambda";
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
import sharp from "sharp";

const s3 = new S3Client({});

interface ImageVariant {
  suffix: string;
  width: number;
  height?: number;
  format: "jpeg" | "webp" | "avif" | "png";
  quality: number;
}

const VARIANTS: ImageVariant[] = [
  { suffix: "thumb", width: 150, height: 150, format: "webp", quality: 80 },
  { suffix: "small", width: 320, format: "webp", quality: 85 },
  { suffix: "medium", width: 800, format: "webp", quality: 85 },
  { suffix: "large", width: 1920, format: "webp", quality: 90 },
  { suffix: "original", width: 4096, format: "webp", quality: 95 }
];

export const handler: S3Handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    // Skip if not in uploads folder or not an image
    if (!key.startsWith("uploads/") || !isImage(key)) {
      console.log(`Skipping: ${key}`);
      continue;
    }

    console.log(`Processing image: s3://${bucket}/${key}`);

    try {
      // Get the original image
      const original = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
      const imageBuffer = Buffer.from(await original.Body!.transformToByteArray());

      // Get image metadata
      const metadata = await sharp(imageBuffer).metadata();
      console.log(`Original: ${metadata.width}x${metadata.height}, ${metadata.format}`);

      // Generate all variants
      const results = await Promise.all(
        VARIANTS.map(variant => generateVariant(imageBuffer, key, variant))
      );

      // Upload all variants
      await Promise.all(
        results.map(result =>
          s3.send(new PutObjectCommand({
            Bucket: process.env.OUTPUT_BUCKET!,
            Key: result.key,
            Body: result.buffer,
            ContentType: `image/${result.format}`,
            CacheControl: "public, max-age=31536000, immutable"
          }))
        )
      );

      console.log(`Generated ${results.length} variants for ${key}`);

      // Store metadata for API access
      await storeImageMetadata(key, metadata, results);
    } catch (error) {
      console.error(`Failed to process ${key}:`, error);
      throw error;
    }
  }
};

async function generateVariant(
  original: Buffer,
  originalKey: string,
  variant: ImageVariant
): Promise<{ key: string; buffer: Buffer; format: string }> {
  let pipeline = sharp(original);

  // Resize with smart cropping for thumbnails
  if (variant.height) {
    pipeline = pipeline.resize(variant.width, variant.height, {
      fit: "cover",
      position: "attention" // Smart crop focusing on interesting areas
    });
  } else {
    pipeline = pipeline.resize(variant.width, undefined, {
      fit: "inside",
      withoutEnlargement: true
    });
  }

  // Convert to target format
  switch (variant.format) {
    case "webp":
      pipeline = pipeline.webp({ quality: variant.quality });
      break;
    case "avif":
      pipeline = pipeline.avif({ quality: variant.quality });
      break;
    case "jpeg":
      pipeline = pipeline.jpeg({ quality: variant.quality, mozjpeg: true });
      break;
  }

  const buffer = await pipeline.toBuffer();
  const baseName = originalKey.replace("uploads/", "").replace(/\.[^.]+$/, "");

  return {
    key: `processed/${baseName}-${variant.suffix}.${variant.format}`,
    buffer,
    format: variant.format
  };
}

function isImage(key: string): boolean {
  const imageExtensions = [".jpg", ".jpeg", ".png", ".gif", ".webp", ".tiff", ".bmp"];
  return imageExtensions.some(ext => key.toLowerCase().endsWith(ext));
}
```

Layer Configuration for Sharp:
The Sharp library requires native binaries. For Lambda, you need a layer with Linux-compatible binaries:
```bash
# Build a sharp layer for Lambda (layers expect a nodejs/node_modules/ layout)
mkdir -p layer/nodejs
npm install --prefix layer/nodejs --platform=linux --arch=x64 sharp
(cd layer && zip -r ../sharp-layer.zip nodejs)
# Or use a pre-built layer from the community
```
Alternatively, use Lambda's container image support to include Sharp in a Docker image.
| Format | Best For | Browser Support | Compression |
|---|---|---|---|
| WebP | General web use | 95%+ browsers | 25-35% smaller than JPEG |
| AVIF | Maximum compression | ~90% browsers | 50% smaller than JPEG |
| JPEG | Photos, fallback | Universal | Baseline standard |
| PNG | Transparency, graphics | Universal | Lossless, larger files |
| SVG | Icons, illustrations | Universal | Infinitely scalable |
For high-volume image serving, consider on-demand processing with Lambda@Edge or CloudFront Functions. Instead of pre-generating all variants, generate them on first request and cache at the CDN. This reduces storage costs and processes only the variants actually requested.
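As a rough illustration of that approach, the handler below (a sketch, not a full implementation; the bucket names, the `/w_320/...` URL scheme, and the caching behavior are assumptions) resizes on first request and stores the result so the CDN can serve repeat requests from cache:

```typescript
import { APIGatewayProxyHandlerV2 } from "aws-lambda";
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
import sharp from "sharp";

const s3 = new S3Client({});

// Hypothetical URL scheme: /w_320/uploads/photo.jpg requests a 320px-wide WebP variant
export const handler: APIGatewayProxyHandlerV2 = async (event) => {
  const match = event.rawPath.match(/^\/w_(\d+)\/(.+)$/);
  if (!match) return { statusCode: 400, body: "Bad variant path" };

  const width = Math.min(parseInt(match[1], 10), 4096); // cap requested size
  const key = match[2];

  const original = await s3.send(new GetObjectCommand({
    Bucket: process.env.SOURCE_BUCKET!, // assumed bucket name
    Key: key
  }));
  const resized = await sharp(Buffer.from(await original.Body!.transformToByteArray()))
    .resize(width, undefined, { fit: "inside", withoutEnlargement: true })
    .webp({ quality: 85 })
    .toBuffer();

  // Persist the variant so later requests (or a CDN origin fallback) can reuse it
  await s3.send(new PutObjectCommand({
    Bucket: process.env.VARIANT_BUCKET!, // assumed bucket name
    Key: `variants/w_${width}/${key}.webp`,
    Body: resized,
    ContentType: "image/webp",
    CacheControl: "public, max-age=31536000, immutable"
  }));

  return {
    statusCode: 200,
    headers: { "Content-Type": "image/webp", "Cache-Control": "public, max-age=31536000, immutable" },
    body: resized.toString("base64"),
    isBase64Encoded: true
  };
};
```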
Video processing presents unique challenges for serverless: files are large, processing is CPU-intensive, and transcoding can take hours for long videos. Multiple strategies exist to handle video in serverless architectures.
AWS MediaConvert (Recommended for Most Cases):
MediaConvert is a fully managed video transcoding service:
```typescript
import { S3Handler } from "aws-lambda";
import { MediaConvertClient, CreateJobCommand, CreateJobRequest } from "@aws-sdk/client-mediaconvert";

const mediaConvert = new MediaConvertClient({
  endpoint: process.env.MEDIACONVERT_ENDPOINT
});

export const handler: S3Handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    if (!isVideo(key)) continue;

    const inputUri = `s3://${bucket}/${key}`;
    const outputPrefix = key.replace("uploads/", "transcoded/").replace(/\.[^.]+$/, "");

    const jobSettings: CreateJobRequest = {
      Role: process.env.MEDIACONVERT_ROLE_ARN!,
      Settings: {
        Inputs: [{
          FileInput: inputUri,
          AudioSelectors: {
            "Audio Selector 1": { DefaultSelection: "DEFAULT" }
          },
          VideoSelector: {}
        }],
        OutputGroups: [
          // HLS for adaptive streaming
          {
            Name: "HLS Group",
            OutputGroupSettings: {
              Type: "HLS_GROUP_SETTINGS",
              HlsGroupSettings: {
                Destination: `s3://${process.env.OUTPUT_BUCKET}/${outputPrefix}/hls/`,
                SegmentLength: 6,
                MinSegmentLength: 2
              }
            },
            Outputs: [
              // 1080p
              {
                VideoDescription: {
                  Width: 1920,
                  Height: 1080,
                  CodecSettings: {
                    Codec: "H_264",
                    H264Settings: {
                      RateControlMode: "QVBR",
                      MaxBitrate: 8000000,
                      QvbrSettings: { QvbrQualityLevel: 8 }
                    }
                  }
                },
                AudioDescriptions: [{
                  CodecSettings: {
                    Codec: "AAC",
                    AacSettings: { Bitrate: 192000, SampleRate: 48000 }
                  }
                }],
                ContainerSettings: { Container: "M3U8" }
              },
              // 720p
              {
                VideoDescription: {
                  Width: 1280,
                  Height: 720,
                  CodecSettings: {
                    Codec: "H_264",
                    H264Settings: {
                      RateControlMode: "QVBR",
                      MaxBitrate: 5000000,
                      QvbrSettings: { QvbrQualityLevel: 7 }
                    }
                  }
                },
                AudioDescriptions: [{
                  CodecSettings: {
                    Codec: "AAC",
                    AacSettings: { Bitrate: 128000, SampleRate: 48000 }
                  }
                }],
                ContainerSettings: { Container: "M3U8" }
              },
              // 480p
              {
                VideoDescription: {
                  Width: 854,
                  Height: 480,
                  CodecSettings: {
                    Codec: "H_264",
                    H264Settings: {
                      RateControlMode: "QVBR",
                      MaxBitrate: 2500000,
                      QvbrSettings: { QvbrQualityLevel: 6 }
                    }
                  }
                },
                AudioDescriptions: [{
                  CodecSettings: {
                    Codec: "AAC",
                    AacSettings: { Bitrate: 96000, SampleRate: 48000 }
                  }
                }],
                ContainerSettings: { Container: "M3U8" }
              }
            ]
          },
          // MP4 for download
          {
            Name: "MP4 Group",
            OutputGroupSettings: {
              Type: "FILE_GROUP_SETTINGS",
              FileGroupSettings: {
                Destination: `s3://${process.env.OUTPUT_BUCKET}/${outputPrefix}/mp4/`
              }
            },
            Outputs: [{
              VideoDescription: {
                Width: 1920,
                Height: 1080,
                CodecSettings: {
                  Codec: "H_264",
                  H264Settings: {
                    RateControlMode: "QVBR",
                    MaxBitrate: 10000000,
                    QvbrSettings: { QvbrQualityLevel: 9 }
                  }
                }
              },
              AudioDescriptions: [{
                CodecSettings: {
                  Codec: "AAC",
                  AacSettings: { Bitrate: 256000, SampleRate: 48000 }
                }
              }],
              ContainerSettings: { Container: "MP4", Mp4Settings: {} }
            }]
          }
        ]
      }
    };

    const result = await mediaConvert.send(new CreateJobCommand(jobSettings));
    console.log(`Created MediaConvert job: ${result.Job?.Id}`);
  }
};

function isVideo(key: string): boolean {
  const videoExtensions = [".mp4", ".mov", ".avi", ".mkv", ".webm", ".m4v"];
  return videoExtensions.some(ext => key.toLowerCase().endsWith(ext));
}
```

Thumbnail Generation from Video:
MediaConvert can also extract thumbnails at specified intervals or using smart frame selection. For simple thumbnail extraction, Lambda with FFmpeg can work for short videos:
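A minimal sketch of that approach, assuming an FFmpeg layer that exposes the binary at /opt/bin/ffmpeg (the path and the 3-second seek point are assumptions, not requirements):

```typescript
import { S3Handler } from "aws-lambda";
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
import { execFileSync } from "child_process";
import { writeFileSync, readFileSync } from "fs";

const s3 = new S3Client({});
const FFMPEG = "/opt/bin/ffmpeg"; // assumed location provided by an FFmpeg layer

export const handler: S3Handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    // Download the (short) video to ephemeral storage
    const video = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    writeFileSync("/tmp/input", Buffer.from(await video.Body!.transformToByteArray()));

    // Grab a single frame 3 seconds in as a 640px-wide JPEG thumbnail
    execFileSync(FFMPEG, [
      "-ss", "3", "-i", "/tmp/input",
      "-frames:v", "1", "-vf", "scale=640:-1",
      "-y", "/tmp/thumb.jpg"
    ]);

    await s3.send(new PutObjectCommand({
      Bucket: process.env.OUTPUT_BUCKET!,
      Key: key.replace("uploads/", "thumbnails/").replace(/\.[^.]+$/, ".jpg"),
      Body: readFileSync("/tmp/thumb.jpg"),
      ContentType: "image/jpeg"
    }));
  }
};
```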
Video processing costs can add up quickly. MediaConvert charges per minute of output. For a 10-minute video with 3 quality variants, you pay for 30 minutes of transcoding. Consider offering fewer variants for lower-tier users, or transcode on-demand for rarely-accessed content.
Document processing encompasses PDF manipulation, Office file conversion, text extraction, and content analysis. Serverless handles these workloads effectively, with various approaches depending on the document type.
PDF Processing:
Common PDF operations include merging and splitting documents, adding watermarks, filling forms, extracting text, and updating metadata. The handler below uses pdf-lib to watermark every page of an uploaded PDF:
```typescript
import { S3Handler } from "aws-lambda";
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
import { PDFDocument, StandardFonts, rgb, degrees } from "pdf-lib";

const s3 = new S3Client({});

/**
 * Add watermark to uploaded PDFs
 */
export const handler: S3Handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    if (!key.endsWith(".pdf")) continue;

    try {
      // Get original PDF
      const response = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
      const pdfBytes = await response.Body!.transformToByteArray();

      // Load PDF
      const pdfDoc = await PDFDocument.load(pdfBytes);
      const helvetica = await pdfDoc.embedFont(StandardFonts.Helvetica);

      // Add watermark to each page
      const pages = pdfDoc.getPages();
      for (const page of pages) {
        const { width, height } = page.getSize();
        page.drawText("CONFIDENTIAL", {
          x: width / 2 - 100,
          y: height / 2,
          size: 50,
          font: helvetica,
          color: rgb(0.75, 0.75, 0.75),
          opacity: 0.3,
          rotate: degrees(45) // pdf-lib's degrees() helper builds the rotation value
        });
      }

      // Add metadata
      pdfDoc.setTitle("Processed Document");
      pdfDoc.setModificationDate(new Date());
      pdfDoc.setProducer("Document Processing System");

      // Save and upload
      const modifiedPdf = await pdfDoc.save();
      const outputKey = key.replace("uploads/", "watermarked/");

      await s3.send(new PutObjectCommand({
        Bucket: process.env.OUTPUT_BUCKET!,
        Key: outputKey,
        Body: modifiedPdf,
        ContentType: "application/pdf"
      }));

      console.log(`Watermarked: ${key} -> ${outputKey}`);
    } catch (error) {
      console.error(`Failed to process ${key}:`, error);
      throw error;
    }
  }
};
```

Text Extraction with Amazon Textract:
For extracting text from documents, including scanned images and complex layouts, Amazon Textract provides AI-powered extraction:
```typescript
import { S3Handler } from "aws-lambda";
import { TextractClient, AnalyzeDocumentCommand } from "@aws-sdk/client-textract";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const textract = new TextractClient({});
const s3 = new S3Client({});

interface ExtractedDocument {
  fileName: string;
  extractedAt: string;
  pageCount: number;
  text: string;
  tables: any[];
  forms: Record<string, string>;
}

export const handler: S3Handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    try {
      // Analyze document with Textract
      const analysis = await textract.send(new AnalyzeDocumentCommand({
        Document: {
          S3Object: { Bucket: bucket, Name: key }
        },
        FeatureTypes: ["TABLES", "FORMS"]
      }));

      // Extract text blocks
      const textBlocks = analysis.Blocks?.filter(b => b.BlockType === "LINE") || [];
      const text = textBlocks.map(b => b.Text).join("\n");

      // Extract tables
      const tables = extractTables(analysis.Blocks || []);

      // Extract form key-value pairs
      const forms = extractForms(analysis.Blocks || []);

      const extracted: ExtractedDocument = {
        fileName: key,
        extractedAt: new Date().toISOString(),
        pageCount: analysis.Blocks?.filter(b => b.BlockType === "PAGE").length || 1,
        text,
        tables,
        forms
      };

      // Store extracted content
      const outputKey = key.replace(/\.[^.]+$/, ".json");
      await s3.send(new PutObjectCommand({
        Bucket: process.env.OUTPUT_BUCKET!,
        Key: `extracted/${outputKey}`,
        Body: JSON.stringify(extracted, null, 2),
        ContentType: "application/json"
      }));

      console.log(`Extracted ${text.length} characters from ${key}`);
    } catch (error) {
      console.error(`Textract failed for ${key}:`, error);
      throw error;
    }
  }
};

function extractForms(blocks: any[]): Record<string, string> {
  const forms: Record<string, string> = {};
  const keyMap = new Map<string, string>();
  const valueMap = new Map<string, string>();

  for (const block of blocks) {
    if (block.BlockType === "KEY_VALUE_SET") {
      const text = getBlockText(block, blocks);
      if (block.EntityTypes?.includes("KEY")) {
        keyMap.set(block.Id, text);
      } else {
        valueMap.set(block.Id, text);
      }
    }
  }

  // Match keys to values
  for (const block of blocks) {
    if (block.BlockType === "KEY_VALUE_SET" && block.EntityTypes?.includes("KEY")) {
      const keyText = keyMap.get(block.Id) || "";
      const valueId = block.Relationships?.find((r: any) => r.Type === "VALUE")?.Ids?.[0];
      const valueText = valueId ? valueMap.get(valueId) || "" : "";
      if (keyText) {
        forms[keyText] = valueText;
      }
    }
  }

  return forms;
}

// Minimal helper: resolve a block's text by following CHILD relationships to WORD blocks
function getBlockText(block: any, blocks: any[]): string {
  const childIds = block.Relationships?.find((r: any) => r.Type === "CHILD")?.Ids || [];
  return childIds
    .map((id: string) => blocks.find(b => b.Id === id))
    .filter((b: any) => b?.BlockType === "WORD")
    .map((b: any) => b.Text)
    .join(" ");
}

// Minimal helper: collect each TABLE block's cells with row/column positions
function extractTables(blocks: any[]): any[] {
  return blocks
    .filter(b => b.BlockType === "TABLE")
    .map(table => {
      const cellIds = table.Relationships?.find((r: any) => r.Type === "CHILD")?.Ids || [];
      return cellIds
        .map((id: string) => blocks.find(b => b.Id === id))
        .filter((b: any) => b?.BlockType === "CELL")
        .map((cell: any) => ({
          row: cell.RowIndex,
          column: cell.ColumnIndex,
          text: getBlockText(cell, blocks)
        }));
    });
}
```

| Service/Library | Use Case | Serverless Compatible | Cost Model |
|---|---|---|---|
| pdf-lib | PDF manipulation (merge, split, watermark) | Yes (pure JS) | Free, open source |
| Amazon Textract | AI text/table/form extraction | Yes | Per page analyzed |
| Amazon Comprehend | NLP (sentiment, entities) | Yes | Per 100 characters |
| LibreOffice (container) | Office → PDF conversion | Yes (via container) | Compute cost only |
| Pandoc (container) | Format conversion | Yes (via container) | Compute cost only |
Tools like LibreOffice or Pandoc require large binaries not suitable for Lambda layers. Use Lambda container images (up to 10GB) to package these tools. Build images with only necessary components to keep cold starts reasonable.
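As a sketch of the container approach, a handler can shell out to LibreOffice's soffice binary to convert Office files to PDF. This assumes an image with LibreOffice installed and on the PATH; the key prefixes and timeout are placeholders:

```typescript
import { S3Handler } from "aws-lambda";
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
import { execFileSync } from "child_process";
import { writeFileSync, readFileSync } from "fs";
import { basename } from "path";

const s3 = new S3Client({});

export const handler: S3Handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    if (!/\.(docx?|xlsx?|pptx?)$/i.test(key)) continue;

    // Download the document to ephemeral storage
    const localIn = `/tmp/${basename(key)}`;
    const doc = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    writeFileSync(localIn, Buffer.from(await doc.Body!.transformToByteArray()));

    // soffice is assumed to be available inside the container image
    execFileSync("soffice", [
      "--headless", "--convert-to", "pdf", "--outdir", "/tmp", localIn
    ], { timeout: 120000 });

    const localOut = localIn.replace(/\.[^.]+$/, ".pdf");
    await s3.send(new PutObjectCommand({
      Bucket: process.env.OUTPUT_BUCKET!,
      Key: key.replace("uploads/", "converted/").replace(/\.[^.]+$/, ".pdf"),
      Body: readFileSync(localOut),
      ContentType: "application/pdf"
    }));
  }
};
```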
Lambda has memory limits (up to 10GB) and ephemeral storage limits (up to 10GB with configuration). For files larger than these limits, or processing that exceeds 15 minutes, alternative strategies are needed.
Strategy 1: Streaming Processing
Process data in streams without loading entire file into memory:
```typescript
import { S3Handler } from "aws-lambda";
import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";
import { Readable, Transform, pipeline } from "stream";
import { promisify } from "util";

const s3 = new S3Client({});
const pipelineAsync = promisify(pipeline);

/**
 * Process large CSV files line-by-line without loading into memory
 */
export const handler: S3Handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    // Get file as stream
    const response = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    const inputStream = response.Body as Readable;

    // Create transform stream; carry a remainder so lines split across chunks stay intact
    let lineCount = 0;
    let remainder = "";
    const transformer = new Transform({
      transform(chunk, _encoding, callback) {
        const lines = (remainder + chunk.toString()).split("\n");
        remainder = lines.pop() ?? ""; // last element may be an incomplete line
        for (const line of lines) {
          if (line.trim()) {
            lineCount++;
            // Transform each line (example: add line number)
            this.push(`${lineCount}:${line}\n`);
          }
        }
        callback();
      },
      flush(callback) {
        if (remainder.trim()) {
          lineCount++;
          this.push(`${lineCount}:${remainder}\n`);
        }
        callback();
      }
    });

    // Collect output (for small results) or stream to S3
    const chunks: Buffer[] = [];
    transformer.on("data", chunk => chunks.push(chunk));

    await pipelineAsync(inputStream, transformer);

    const output = Buffer.concat(chunks);

    // Upload result
    await s3.send(new PutObjectCommand({
      Bucket: process.env.OUTPUT_BUCKET!,
      Key: key.replace("uploads/", "processed/"),
      Body: output
    }));

    console.log(`Processed ${lineCount} lines`);
  }
};
```

Strategy 2: Chunked Processing with S3 Multipart
For operations that can be parallelized, split the file into chunks:
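One way to sketch this, assuming the work on each byte range is independent, is to fan out S3 ranged GET requests; the 64MB chunk size and the per-chunk work shown here are placeholders:

```typescript
import { S3Client, HeadObjectCommand, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});
const CHUNK_SIZE = 64 * 1024 * 1024; // 64MB per chunk (placeholder)

// Process one byte range of the object; real work would replace the byte count
async function processChunk(bucket: string, key: string, start: number, end: number): Promise<number> {
  const part = await s3.send(new GetObjectCommand({
    Bucket: bucket,
    Key: key,
    Range: `bytes=${start}-${end}` // S3 ranged GET
  }));
  const bytes = await part.Body!.transformToByteArray();
  return bytes.length; // placeholder per-chunk result
}

export async function processInChunks(bucket: string, key: string): Promise<number> {
  const head = await s3.send(new HeadObjectCommand({ Bucket: bucket, Key: key }));
  const size = head.ContentLength || 0;

  // Build the list of ranges, then fan out; each range could also become its own
  // Lambda invocation via SQS or a Step Functions Map state
  const ranges: Array<[number, number]> = [];
  for (let start = 0; start < size; start += CHUNK_SIZE) {
    ranges.push([start, Math.min(start + CHUNK_SIZE, size) - 1]);
  }

  const results = await Promise.all(ranges.map(([s, e]) => processChunk(bucket, key, s, e)));
  return results.reduce((sum, n) => sum + n, 0);
}
```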
Strategy 3: AWS Fargate for Very Large Files
For files requiring hours of processing or exceeding Lambda limits:
```typescript
import { S3Handler } from "aws-lambda";
import { ECSClient, RunTaskCommand } from "@aws-sdk/client-ecs";
import { S3Client, HeadObjectCommand } from "@aws-sdk/client-s3";

const ecs = new ECSClient({});
const s3 = new S3Client({});

const SIZE_THRESHOLD = 500 * 1024 * 1024; // 500MB

export const handler: S3Handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));

    // Check file size
    const metadata = await s3.send(new HeadObjectCommand({ Bucket: bucket, Key: key }));
    const fileSize = metadata.ContentLength || 0;

    if (fileSize > SIZE_THRESHOLD) {
      console.log(`Large file (${(fileSize / 1024 / 1024).toFixed(1)}MB), dispatching to Fargate`);

      // Run Fargate task for heavy processing
      await ecs.send(new RunTaskCommand({
        cluster: process.env.ECS_CLUSTER!,
        taskDefinition: process.env.TASK_DEFINITION!,
        launchType: "FARGATE",
        networkConfiguration: {
          awsvpcConfiguration: {
            subnets: process.env.SUBNETS!.split(","),
            assignPublicIp: "ENABLED"
          }
        },
        overrides: {
          containerOverrides: [{
            name: "file-processor",
            environment: [
              { name: "INPUT_BUCKET", value: bucket },
              { name: "INPUT_KEY", value: key },
              { name: "OUTPUT_BUCKET", value: process.env.OUTPUT_BUCKET! }
            ]
          }]
        }
      }));
    } else {
      // Process in Lambda (processFileInLambda is the normal in-process path, not shown here)
      await processFileInLambda(bucket, key);
    }
  }
};
```

| File Size | Strategy | Max Duration | Memory Needed |
|---|---|---|---|
| < 100MB | Lambda, load to memory | 15 min | 2x file size |
| 100MB - 500MB | Lambda, streaming | 15 min | 256MB-1GB |
| 500MB - 5GB | Lambda, ephemeral storage | 15 min | 10GB storage |
| 5GB+ | Fargate | Hours/days | Up to 120GB |
| Parallelizable | Step Functions Map | Unlimited | Variable |
Lambda now supports up to 10GB of ephemeral storage (/tmp). Configure via EphemeralStorage setting in your function configuration. This enables processing of larger files without streaming, though you pay for the additional storage.
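With the AWS CDK, for example, the setting looks roughly like this (the function name, runtime, memory size, and asset path are placeholders):

```typescript
import { Stack, StackProps, Size, Duration } from "aws-cdk-lib";
import * as lambda from "aws-cdk-lib/aws-lambda";
import { Construct } from "constructs";

export class FileProcessingStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Placeholder function; the key line is ephemeralStorageSize
    new lambda.Function(this, "FileProcessor", {
      runtime: lambda.Runtime.NODEJS_18_X,
      handler: "index.handler",
      code: lambda.Code.fromAsset("dist/file-processor"),
      memorySize: 3008,
      timeout: Duration.minutes(15),
      ephemeralStorageSize: Size.gibibytes(10) // raise /tmp from the 512MB default
    });
  }
}
```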
User-uploaded content poses security risks. Files may contain malware, inappropriate content, or attempt to exploit processing vulnerabilities. A robust file processing pipeline includes multiple security layers.
Malware Scanning:
Scan files before processing or serving:
```typescript
import { S3Handler } from "aws-lambda";
import { S3Client, GetObjectCommand, DeleteObjectCommand, CopyObjectCommand } from "@aws-sdk/client-s3";
import { execSync } from "child_process";
import { writeFileSync, unlinkSync, mkdirSync } from "fs";
import { join } from "path";

const s3 = new S3Client({});
const TMP_DIR = "/tmp/scan";

// Ensure ClamAV definitions are updated (via layer or container)
const CLAMSCAN_PATH = "/opt/bin/clamscan";
const VIRUS_DEFINITIONS = "/opt/share/clamav";

export const handler: S3Handler = async (event) => {
  mkdirSync(TMP_DIR, { recursive: true });

  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
    const localPath = join(TMP_DIR, "file-to-scan");

    try {
      // Download file
      const response = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
      const buffer = Buffer.from(await response.Body!.transformToByteArray());
      writeFileSync(localPath, buffer);

      // Scan with ClamAV
      try {
        execSync(`${CLAMSCAN_PATH} --database=${VIRUS_DEFINITIONS} ${localPath}`, {
          timeout: 60000 // 60 second timeout
        });

        // Clean - move to approved bucket
        console.log(`${key}: CLEAN`);
        await s3.send(new CopyObjectCommand({
          Bucket: process.env.APPROVED_BUCKET!,
          Key: key,
          CopySource: `${bucket}/${key}`
        }));
      } catch (scanError: any) {
        if (scanError.status === 1) {
          // Virus found
          console.error(`${key}: INFECTED - ${scanError.stdout}`);

          // Delete infected file
          await s3.send(new DeleteObjectCommand({ Bucket: bucket, Key: key }));

          // Alert security team
          await notifySecurityTeam(key, scanError.stdout);
        } else {
          throw scanError; // Scan error, not infection
        }
      }
    } finally {
      // Clean up
      try { unlinkSync(localPath); } catch {}
    }
  }
};

async function notifySecurityTeam(file: string, details: string) {
  // Send to SNS, Slack, PagerDuty, etc.
  console.error(`SECURITY ALERT: Infected file uploaded: ${file}`);
}
```

Content Moderation:
For user-generated images and videos, automated moderation identifies inappropriate content:
Amazon Rekognition:
```typescript
import { RekognitionClient, DetectModerationLabelsCommand } from "@aws-sdk/client-rekognition";

const rekognition = new RekognitionClient({});

interface ModerationResult {
  isSafe: boolean;
  confidence: number;
  flags: string[];
  details: Array<{ label: string; confidence: number }>;
}

export async function moderateImage(
  bucket: string,
  key: string
): Promise<ModerationResult> {
  const response = await rekognition.send(new DetectModerationLabelsCommand({
    Image: {
      S3Object: { Bucket: bucket, Name: key }
    },
    MinConfidence: 75
  }));

  const labels = response.ModerationLabels || [];

  // Define blocked categories
  const blockedCategories = [
    "Explicit Nudity",
    "Violence",
    "Visually Disturbing",
    "Drugs",
    "Hate Symbols"
  ];

  const flags = labels
    .filter(label =>
      blockedCategories.some(blocked =>
        label.ParentName === blocked || label.Name === blocked
      )
    )
    .map(label => label.Name!);

  const maxConfidence = labels.length > 0
    ? Math.max(...labels.map(l => l.Confidence || 0))
    : 0;

  return {
    isSafe: flags.length === 0,
    confidence: 100 - maxConfidence,
    flags,
    details: labels.map(l => ({ label: l.Name!, confidence: l.Confidence || 0 }))
  };
}
```

Treat every uploaded file as potentially malicious. Validate everything: file type, size, content, metadata. Process uploads in isolated environments. Never execute or include user uploads in your application without thorough validation.
File processing workloads benefit from specific optimizations that differ from typical API workloads.
Memory and CPU Optimization:
Lambda CPU scales linearly with memory. For CPU-intensive operations like image processing or compression, increasing memory can significantly reduce execution time and cost:
| Memory | vCPU Equivalent | Image Resize Duration | Cost per Execution |
|---|---|---|---|
| 512MB | ~0.3 vCPU | 2,500ms | $0.000021 |
| 1024MB | ~0.6 vCPU | 1,300ms | $0.000022 |
| 1769MB | 1 vCPU | 850ms | $0.000024 |
| 3008MB | ~1.7 vCPU | 550ms | $0.000027 |
| 10240MB | 6 vCPU | 180ms | $0.000031 |
Parallel Processing:
When processing multiple outputs (e.g., multiple image sizes), run operations in parallel:
```typescript
// Sequential: Total time = sum of all operations
async function processSequential(image: Buffer, variants: Variant[]) {
  const results = [];
  for (const variant of variants) {
    results.push(await generateVariant(image, variant));
  }
  return results;
}
// If each variant takes 500ms, 5 variants = 2,500ms

// Parallel: Total time = longest operation
async function processParallel(image: Buffer, variants: Variant[]) {
  return Promise.all(
    variants.map(variant => generateVariant(image, variant))
  );
}
// If each variant takes 500ms, 5 variants = ~500ms (if CPU allows)

// Controlled parallelism: Balance CPU and memory
async function processControlled(image: Buffer, variants: Variant[], concurrency: number) {
  const results: any[] = [];
  for (let i = 0; i < variants.length; i += concurrency) {
    const batch = variants.slice(i, i + concurrency);
    const batchResults = await Promise.all(
      batch.map(variant => generateVariant(image, variant))
    );
    results.push(...batchResults);
  }
  return results;
}
// Concurrency of 2: 5 variants = ~1,500ms, lower peak memory
```

Caching and Preprocessing:
Use the open-source AWS Lambda Power Tuning tool to automatically find the optimal memory configuration. It runs your function at different memory settings and graphs performance vs. cost, helping you find the sweet spot for your specific workload.
Well-designed file processing systems follow architectural patterns that handle failures gracefully, provide visibility, and scale effectively.
Pattern 1: Quarantine-Process-Approve
New uploads go to a quarantine bucket, are processed and validated, then moved to an approved bucket if they pass.
Pattern 2: Processing Pipeline with Status Tracking
For complex processing with multiple stages, track status in a database:
```typescript
import { APIGatewayProxyHandlerV2 } from "aws-lambda";
import { DynamoDBClient, UpdateItemCommand, GetItemCommand } from "@aws-sdk/client-dynamodb";
import { marshall, unmarshall } from "@aws-sdk/util-dynamodb";

const dynamodb = new DynamoDBClient({});

interface FileProcessingRecord {
  fileId: string;
  originalKey: string;
  uploadedBy: string;
  uploadedAt: string;
  status: "uploaded" | "scanning" | "moderating" | "processing" | "complete" | "failed";
  stages: {
    malwareScan?: { status: string; completedAt?: string };
    moderation?: { status: string; flags?: string[]; completedAt?: string };
    processing?: { status: string; outputs?: string[]; completedAt?: string };
  };
  error?: { stage: string; message: string; timestamp: string };
  completedAt?: string;
}

// Update status at each stage
async function updateProcessingStatus(
  fileId: string,
  stage: string,
  update: Record<string, any>
): Promise<void> {
  await dynamodb.send(new UpdateItemCommand({
    TableName: "FileProcessing",
    Key: { fileId: { S: fileId } },
    UpdateExpression: `SET stages.#stage = :update, #status = :status, updatedAt = :now`,
    ExpressionAttributeNames: {
      "#stage": stage,
      "#status": "status"
    },
    ExpressionAttributeValues: {
      ":update": { M: marshall(update) },
      ":status": { S: stage },
      ":now": { S: new Date().toISOString() }
    }
  }));
}

// API endpoint to check processing status
export const statusHandler: APIGatewayProxyHandlerV2 = async (event) => {
  const fileId = event.pathParameters?.fileId;

  const result = await dynamodb.send(new GetItemCommand({
    TableName: "FileProcessing",
    Key: { fileId: { S: fileId! } }
  }));

  if (!result.Item) {
    return { statusCode: 404, body: JSON.stringify({ error: "File not found" }) };
  }

  return {
    statusCode: 200,
    body: JSON.stringify(unmarshall(result.Item))
  };
};
```

Pattern 3: Step Functions Orchestration
For complex workflows with branching, retries, and human-in-the-loop approval, Step Functions provide visual orchestration:
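A rough CDK sketch of such a workflow is shown below; the three stage functions and the $.isSafe output field are assumptions standing in for your own scan, moderation, and processing steps:

```typescript
import * as sfn from "aws-cdk-lib/aws-stepfunctions";
import * as tasks from "aws-cdk-lib/aws-stepfunctions-tasks";
import * as lambda from "aws-cdk-lib/aws-lambda";
import { Construct } from "constructs";

// Assumed, pre-existing Lambda functions for each stage
export function buildFileWorkflow(
  scope: Construct,
  scanFn: lambda.IFunction,
  moderateFn: lambda.IFunction,
  processFn: lambda.IFunction
): sfn.StateMachine {
  const scan = new tasks.LambdaInvoke(scope, "MalwareScan", {
    lambdaFunction: scanFn,
    outputPath: "$.Payload"
  });
  const moderate = new tasks.LambdaInvoke(scope, "ContentModeration", {
    lambdaFunction: moderateFn,
    outputPath: "$.Payload"
  });
  const process = new tasks.LambdaInvoke(scope, "GenerateOutputs", {
    lambdaFunction: processFn,
    outputPath: "$.Payload"
  });
  const reject = new sfn.Fail(scope, "Rejected", { cause: "File failed validation" });

  // Branch on the moderation verdict produced by the previous step
  const definition = scan
    .next(moderate)
    .next(new sfn.Choice(scope, "IsSafe?")
      .when(sfn.Condition.booleanEquals("$.isSafe", true), process)
      .otherwise(reject));

  return new sfn.StateMachine(scope, "FileProcessingWorkflow", {
    definitionBody: sfn.DefinitionBody.fromChainable(definition)
  });
}
```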
| Pattern | Complexity | Best For | Visibility |
|---|---|---|---|
| Direct S3 → Lambda | Low | Simple transformations | CloudWatch logs only |
| SQS Buffered | Medium | Bursty uploads, retry needed | Queue metrics, DLQ |
| Status Tracking (DB) | Medium | User-facing status | API queryable |
| Step Functions | High | Multi-stage, approvals | Visual workflow |
For user-uploaded content, provide immediate feedback. Return success after upload (not after processing), then process asynchronously. Provide a status endpoint or use WebSockets to notify when processing completes. Users shouldn't wait for 30-second image processing before seeing confirmation.
Real-time file processing showcases serverless at its best: unpredictable workloads trigger exactly the compute needed, scale automatically, and cost nothing when idle. From image thumbnails to video transcoding to document extraction, these patterns enable sophisticated file processing without managing infrastructure.
Let's consolidate the key takeaways:
- Image pipelines: S3 events plus Sharp generate responsive variants; prefer modern formats like WebP and AVIF, and serve them with long-lived cache headers.
- Video: hand transcoding to MediaConvert rather than Lambda, producing HLS renditions for streaming and an MP4 for download.
- Documents: use pdf-lib for manipulation, Textract for AI-powered extraction, and container images for tools like LibreOffice and Pandoc.
- Large files: stream, chunk, use ephemeral storage, or dispatch to Fargate depending on size and processing duration.
- Security: scan uploads for malware, moderate content, and quarantine files until they pass validation.
- Performance: right-size memory (CPU scales with it), parallelize variant generation, and report status asynchronously rather than making users wait.
Module Complete:
With this page, you've completed the Serverless Patterns module. You've learned five powerful patterns for serverless computing: event-driven processing, API backends, scheduled tasks, data processing pipelines, and real-time file processing. These patterns form the foundation of most serverless architectures, enabling you to build scalable, cost-effective systems for a wide variety of use cases.
Congratulations! You now have comprehensive knowledge of serverless patterns for file processing. Combined with the other patterns in this module—event-driven processing, APIs, scheduling, and data pipelines—you're equipped to architect serverless solutions for almost any use case. The next module will explore the limitations and trade-offs of serverless computing, helping you understand when serverless is (and isn't) the right choice.