When to Use Serverless

Level: Advanced

Duration: 90 mins

Topic: When to Use Serverless

Migration Strategies: Transitioning to Serverless

The Art of Controlled Transformation

Migrating to serverless is rarely a flip-the-switch operation. Production systems with established users, data, and integrations require controlled, incremental transformation that manages risk while steadily delivering benefits. The organizations that succeed at serverless migration treat it as a strategic initiative, not a weekend project.

This page provides a comprehensive framework for planning and executing serverless migrations—from initial assessment through full production cutover. You'll learn how to evaluate migration candidates, choose appropriate migration patterns, manage the transition period, and measure success. These strategies apply whether you're migrating a small service or an entire platform.

What You Will Master

By the end of this page, you will understand how to assess systems for serverless migration suitability, select appropriate migration patterns for different scenarios, plan migration projects with realistic timelines and checkpoints, execute migrations with minimal production risk, measure success and iterate based on outcomes, and handle organizational and cultural aspects of serverless adoption.

Migration Assessment Framework

Before migrating anything, you need systematic assessment. Not every system is a good serverless candidate, and even suitable systems may not be worth the migration effort.

The Migration Suitability Matrix:

Evaluate each potential migration candidate across two dimensions:

Serverless Fit — How well does this workload match serverless characteristics?
Migration Complexity — How difficult would the migration be?

The intersection determines your migration priority:

Migration Priority Matrix
	High Serverless Fit	Medium Serverless Fit	Low Serverless Fit
Low Complexity	Priority 1: Quick wins	Priority 2: Evaluate ROI	Skip: Poor candidate
Medium Complexity	Priority 2: Evaluate ROI	Priority 3: Consider if resources available	Skip: High effort, low reward
High Complexity	Priority 3: Plan carefully	Skip: Complex for unclear benefit	Skip: Wrong direction
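The matrix can also be written as a lookup table, which makes it easy to rank a backlog of candidates. The following is a minimal sketch; the `Candidate` shape and the service names in the usage note are hypothetical, and only the matrix cells come from the table above.

```typescript
// Levels used on both axes of the matrix.
type Level = "low" | "medium" | "high";

// A priority tier (1 is most attractive), or "skip" for poor candidates.
type Priority = 1 | 2 | 3 | "skip";

// Outer key: migration complexity (table rows).
// Inner key: serverless fit (table columns).
const PRIORITY_MATRIX: Record<Level, Record<Level, Priority>> = {
    low:    { high: 1, medium: 2, low: "skip" },
    medium: { high: 2, medium: 3, low: "skip" },
    high:   { high: 3, medium: "skip", low: "skip" },
};

function migrationPriority(fit: Level, complexity: Level): Priority {
    return PRIORITY_MATRIX[complexity][fit];
}

// Hypothetical candidate shape, for illustration only.
interface Candidate {
    name: string;
    fit: Level;
    complexity: Level;
}

// Drops "skip" candidates and orders the rest by priority tier.
function rankCandidates(candidates: Candidate[]): Candidate[] {
    const tier = (c: Candidate): number => {
        const p = migrationPriority(c.fit, c.complexity);
        return p === "skip" ? Number.POSITIVE_INFINITY : p;
    };
    return candidates
        .filter((c) => tier(c) !== Number.POSITIVE_INFINITY)
        .sort((a, b) => tier(a) - tier(b));
}
```

For example, ranking a high-fit/low-complexity `image-resizer`, a high-fit/high-complexity `report-generator`, and a low-fit `trading-engine` would return the first two in that order and drop the third.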

Assessing Serverless Fit:

Use the workload characteristics framework from earlier in this module (traffic variability, execution duration, state requirements, latency tolerance). Score each characteristic and weight by importance.
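A weighted score of this kind can be computed in a few lines. The sketch below assumes a 1 to 5 scale per characteristic (5 meaning a strong serverless fit), and both the weights and the thresholds that map a score to high/medium/low are illustrative assumptions to tune for your organization.

```typescript
// The four workload characteristics, each scored 1 (poor fit) to 5 (strong fit).
interface FitScores {
    trafficVariability: number;
    executionDuration: number;
    stateRequirements: number;
    latencyTolerance: number;
}

// Relative importance of each characteristic. These weights are assumptions.
const DEFAULT_WEIGHTS: FitScores = {
    trafficVariability: 3,
    executionDuration: 3,
    stateRequirements: 2,
    latencyTolerance: 2,
};

// Weighted average of the scores, on the same 1 to 5 scale as the inputs.
function weightedFitScore(scores: FitScores, weights: FitScores = DEFAULT_WEIGHTS): number {
    const keys = Object.keys(weights) as Array<keyof FitScores>;
    let weighted = 0;
    let totalWeight = 0;
    for (const key of keys) {
        weighted += scores[key] * weights[key];
        totalWeight += weights[key];
    }
    return weighted / totalWeight;
}

// Assumed thresholds for bucketing a score into a fit level.
function fitLevel(score: number): "low" | "medium" | "high" {
    if (score >= 4) return "high";
    if (score >= 2.5) return "medium";
    return "low";
}
```

With the default weights, a workload scored 5 for traffic variability, 4 for execution duration, 2 for state, and 3 for latency tolerance averages (15 + 12 + 4 + 6) / 10 = 3.7, which these thresholds classify as medium fit.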

Assessing Migration Complexity:

Evaluate complexity across these dimensions:

Complexity Factors

•Codebase size and structure — Large monoliths with high coupling are more complex to migrate than small, well-factored services
•State management — Systems with complex in-memory state require significant refactoring; stateless HTTP handlers require far less
•Integration dependencies — Systems with many upstream/downstream integrations require more coordination
•Data migration needs — Different database technologies or schema changes add complexity
•Testing infrastructure — Poor test coverage increases migration risk; existing comprehensive tests reduce it
•Team expertise — Teams familiar with cloud and event-driven patterns migrate faster than those learning simultaneously

Start with Quick Wins

Priority 1 candidates (high fit, low complexity) build organizational confidence and expertise. Even if larger systems offer more benefit, starting with quick wins creates momentum, develops skills, and establishes patterns before tackling complex migrations.

Migration Patterns

Different migration scenarios call for different patterns. Choose the pattern that best matches your system's characteristics and risk tolerance.

Pattern 1: Strangler Fig (Incremental Replacement)

Gradually replace monolith functionality with serverless functions, routing traffic to new implementations as they're ready. The legacy system "shrinks" over time until it can be decommissioned.

When to Use Strangler Fig:

Large existing systems that can't be migrated all at once
Need to maintain production availability throughout migration
Want to prove serverless value incrementally
Team is learning serverless while migrating
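The routing layer is the heart of this pattern. Below is a minimal in-memory sketch of the routing decision, assuming routes are identified by path prefix; the class and method names are hypothetical, and in practice the same logic would usually live in an API gateway or load balancer configuration rather than application code.

```typescript
type Target = "legacy" | "serverless";

class StranglerRouter {
    // Path prefixes whose functionality now lives in serverless functions.
    private migratedPrefixes: string[] = [];

    // Call once a route's serverless implementation is ready for traffic.
    markMigrated(prefix: string): void {
        if (this.migratedPrefixes.indexOf(prefix) === -1) {
            this.migratedPrefixes.push(prefix);
        }
    }

    // Send a route back to the legacy system if the new implementation misbehaves.
    revert(prefix: string): void {
        this.migratedPrefixes = this.migratedPrefixes.filter((p) => p !== prefix);
    }

    // Anything not explicitly migrated keeps going to the legacy system.
    resolve(path: string): Target {
        const migrated = this.migratedPrefixes.some(
            (prefix) => path === prefix || path.indexOf(prefix + "/") === 0
        );
        return migrated ? "serverless" : "legacy";
    }

    // True once every route in the given inventory is served by serverless,
    // which is the point at which the legacy system can be decommissioned.
    canDecommission(allPrefixes: string[]): boolean {
        return allPrefixes.every((p) => this.resolve(p) === "serverless");
    }
}
```

Defaulting to legacy for unknown paths keeps the migration safe by construction: a route only moves when someone deliberately marks it as migrated, and `revert` gives a per-route rollback.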

Pattern 2: Parallel Run (Shadow/Compare)

Run both systems simultaneously, comparing outputs or metrics to validate the serverless implementation before cutover.

parallel-run-pattern.ts
TypeScript
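// Parallel Run Pattern Implementation
// Route traffic to both systems, compare results
// The helpers used below (getConfig, invokeLegacy, invokeServerless,
// compareResponses, logDifference, publishMetrics) are assumed to be
// defined elsewhere in the project.

import type { APIGatewayProxyHandler } from 'aws-lambda';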
 
interface ParallelRunConfig {
    legacyUrl: string;
    serverlessUrl: string;
    compareResponses: boolean;
    logDifferences: boolean;
    useServerlessResponse: boolean; // false = legacy is primary
    sampleRate: number; // 0-1, fraction of traffic to parallel run
}
 
export const parallelRunHandler: APIGatewayProxyHandler = async (event) => {
    const config = getConfig();
    
    // Should we parallel run this request?
    const shouldParallel = Math.random() < config.sampleRate;
    
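    if (!shouldParallel) {
        // Just use the primary system. The invoke helpers return a wrapper
        // (statusCode, duration, body, response), so unwrap the API Gateway
        // response exactly as the parallel branch does below.
        const result = config.useServerlessResponse 
            ? await invokeServerless(event) 
            : await invokeLegacy(event);
        return result.response;
    }
    
    // Parallel execution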
    const [legacyResult, serverlessResult] = await Promise.allSettled([
        invokeLegacy(event),
        invokeServerless(event),
    ]);
    
    // Log comparison metrics
    const metrics = {
        requestId: event.requestContext.requestId,
        legacyDuration: legacyResult.status === 'fulfilled' 
            ? legacyResult.value.duration : null,
        serverlessDuration: serverlessResult.status === 'fulfilled' 
            ? serverlessResult.value.duration : null,
        legacyStatus: legacyResult.status === 'fulfilled' 
            ? legacyResult.value.statusCode : 'error',
        serverlessStatus: serverlessResult.status === 'fulfilled' 
            ? serverlessResult.value.statusCode : 'error',
        responsesMatch: false,
    };
    
    // Compare responses if both succeeded
    if (config.compareResponses && 
        legacyResult.status === 'fulfilled' && 
        serverlessResult.status === 'fulfilled') {
        
        metrics.responsesMatch = compareResponses(
            legacyResult.value.body,
            serverlessResult.value.body,
        );
        
        if (!metrics.responsesMatch && config.logDifferences) {
            await logDifference({
                requestId: event.requestContext.requestId,
                request: event,
                legacyResponse: legacyResult.value.body,
                serverlessResponse: serverlessResult.value.body,
            });
        }
    }
    
    await publishMetrics(metrics);
    
    // Return primary system's response
    const primary = config.useServerlessResponse ? serverlessResult : legacyResult;
    if (primary.status === 'fulfilled') {
        return primary.value.response;
    } else {
        // Primary failed, try secondary
        const secondary = config.useServerlessResponse ? legacyResult : serverlessResult;
        if (secondary.status === 'fulfilled') {
            return secondary.value.response;
        }
        throw new Error('Both systems failed');
    }
};

When to Use Parallel Run:

High-risk migrations where correctness is critical
Complex business logic that's difficult to fully test
Need metrics to validate performance characteristics
Building confidence before traffic cutover

Pattern 3: Blue-Green Cutover

Build the complete serverless system alongside the legacy system, then switch traffic all at once (with instant rollback capability).

Blue-Green Migration Steps

•Build serverless version — Complete implementation with all features and integrations
•Ensure data synchronization — Both systems operate on same data or changes are replicated
•Test thoroughly — Integration tests, load tests, chaos tests against serverless version
•Configure routing — DNS or load balancer ready to switch traffic instantly
•Execute cutover — Switch 100% of traffic to serverless at planned time
•Monitor intensively — Watch all metrics during initial hours/days
•Rollback if needed — Switch back to legacy instantly if issues arise
•Decommission legacy — After validation period, shut down old system

Blue-Green Risk Consideration

Blue-Green requires the highest upfront investment and carries the most cutover risk. Use it only when strangler fig isn't practical (e.g., tightly coupled systems that can't be incrementally replaced) and when you have comprehensive test coverage and monitoring.

Planning the Migration

Successful migrations require upfront planning that addresses technical, organizational, and timeline considerations.

Step 1: Define Success Criteria

Before migrating, establish measurable success criteria. Without clear criteria, you can't know if the migration succeeded or when to stop iterating.

Example Migration Success Criteria
Category	Metric	Target	Validation Method
Functional	Feature parity	100% of critical features working	Feature checklist, user acceptance
Performance	P99 latency	≤ legacy P99 (150ms)	Load testing, production monitoring
Reliability	Error rate	< 0.1%	Production monitoring over 7 days
Cost	Monthly spend	≤ 80% of legacy infrastructure cost	AWS billing comparison
Operations	On-call pages	≤ legacy rate	PagerDuty metrics over 30 days
Developer	Deployment frequency	≥ 3x legacy rate	CI/CD metrics
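Criteria like these are easiest to enforce when they are machine-checkable. The sketch below encodes the numeric rows of the example table and evaluates them against measured values; the metric key names are hypothetical, and the non-numeric rows (feature parity, on-call pages relative to legacy) are left out for brevity.

```typescript
type Comparison = "atMost" | "lessThan" | "atLeast";

interface Criterion {
    category: string;
    metric: string;
    comparison: Comparison;
    target: number;
}

// Numeric rows from the example table. Metric keys are hypothetical names.
const CRITERIA: Criterion[] = [
    { category: "Performance", metric: "p99LatencyMs", comparison: "atMost", target: 150 },
    { category: "Reliability", metric: "errorRatePct", comparison: "lessThan", target: 0.1 },
    { category: "Cost", metric: "costPctOfLegacy", comparison: "atMost", target: 80 },
    { category: "Developer", metric: "deployFrequencyRatio", comparison: "atLeast", target: 3 },
];

interface CriterionResult {
    metric: string;
    target: number;
    measured: number | undefined;
    passed: boolean;
}

function evaluateCriteria(
    criteria: Criterion[],
    measured: Record<string, number>
): CriterionResult[] {
    return criteria.map((c) => {
        const value = measured[c.metric];
        let passed = false; // a missing measurement counts as a failure
        if (value !== undefined) {
            if (c.comparison === "atMost") passed = value <= c.target;
            else if (c.comparison === "lessThan") passed = value < c.target;
            else passed = value >= c.target;
        }
        return { metric: c.metric, target: c.target, measured: value, passed };
    });
}

function migrationSucceeded(results: CriterionResult[]): boolean {
    return results.every((r) => r.passed);
}
```

Treating a missing measurement as a failure is a deliberate choice in this sketch: a criterion nobody measured should block the "migration complete" declaration rather than pass silently.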

Step 2: Map Dependencies

Identify all dependencies that affect migration scope and sequencing:

Dependency Mapping Checklist

•Upstream services — What calls the system being migrated? How will routing change?
•Downstream services — What does the system call? Are there connection pattern changes (e.g., database pooling)?
•Shared databases — What other systems access the same database? How will concurrent access work?
•Message queues — What produces messages? What consumes them? How will formats change?
•External integrations — Third-party APIs, webhooks, partner systems that expect specific behavior
•Authentication/authorization — Identity systems, token validation, permission models

Step 3: Design the Target Architecture

Document the serverless architecture in detail before building. Include:

Function boundaries and responsibilities
Event flows and integration patterns
Data storage choices and access patterns
Security model (IAM roles, VPC configuration)
Observability approach (logging, tracing, metrics)
Deployment pipeline design

Step 4: Create a Migration Timeline

Example Migration Timeline (12-Week Project)
Week	Phase	Activities	Milestone
1-2	Foundation	Infrastructure setup, CI/CD pipeline, observability	Can deploy and monitor Lambda
3-4	First Function	Migrate lowest-risk endpoint, integration tests	First production traffic on Lambda
5-7	Core Features	Migrate primary functionality, parallel running	50% traffic on serverless
8-9	Complete Migration	Remaining features, edge cases, load testing	100% traffic capable
10	Cutover	Production switch, intensive monitoring	All production on serverless
11-12	Validation	Stability monitoring, optimization, documentation	Migration declared complete

Build in Buffer Time

Migrations routinely take longer than estimated, so plan for a 50% buffer on timeline estimates (the 12-week example above becomes 18 weeks). Unknown unknowns (surprising legacy behavior, undocumented integrations, performance edge cases) tend to surface mid-project. Teams that plan buffer time handle them gracefully; teams that don't end up scrambling and cutting corners.

Executing the Migration

With planning complete, execution becomes the focus. Follow these practices for controlled, low-risk migration execution.

Practice 1: Start with a Canary

Before any migration work, deploy a trivial canary function to validate your infrastructure:

migration-canary.ts
TypeScript
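// Migration Canary: Validates infrastructure before real migration
// This function does almost nothing but proves everything works
// Assumes `db`, `redis`, and `cloudwatch` are clients initialized elsewhere
// in this module, outside the handler, so warm invocations can reuse them.

import type { APIGatewayProxyHandler } from 'aws-lambda';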
 
export const canaryHandler: APIGatewayProxyHandler = async (event) => {
    const healthChecks = {
        lambda: true,
        timestamp: new Date().toISOString(),
        region: process.env.AWS_REGION,
        functionName: process.env.AWS_LAMBDA_FUNCTION_NAME,
        checks: {} as Record<string, boolean>,
    };
    
    // Test database connectivity
    try {
        await db.query('SELECT 1');
        healthChecks.checks.database = true;
    } catch (error) {
        healthChecks.checks.database = false;
        console.error('Database check failed:', error);
    }
    
    // Test cache connectivity
    try {
        await redis.ping();
        healthChecks.checks.cache = true;
    } catch (error) {
        healthChecks.checks.cache = false;
        console.error('Cache check failed:', error);
    }
    
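    // Test external service (if applicable)
    try {
        // fetch only rejects on network failures, so an HTTP 5xx would
        // otherwise count as healthy; check the response status as well.
        const res = await fetch(process.env.LEGACY_SERVICE_URL + '/health');
        healthChecks.checks.legacyService = res.ok;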
    } catch (error) {
        healthChecks.checks.legacyService = false;
        console.error('Legacy service check failed:', error);
    }
    
    // Emit custom metric
    await cloudwatch.putMetricData({
        Namespace: 'Migration',
        MetricData: [{
            MetricName: 'CanaryHealthy',
            Value: Object.values(healthChecks.checks).every(Boolean) ? 1 : 0,
            Unit: 'Count',
        }],
    });
    
    return {
        statusCode: 200,
        body: JSON.stringify(healthChecks),
    };
};
 
// Deploy this first and validate:
// - Lambda execution works
// - VPC networking is correct (can reach database)
// - IAM permissions are sufficient
// - Logging and metrics flow correctly
// - Deployment pipeline functions end-to-end

Practice 2: Gradual Traffic Shifting

Never switch 100% of traffic immediately. Use gradual traffic shifting to catch issues with minimal blast radius:

Traffic Shifting Schedule

•1% — Internal users or synthetic traffic only. Validate basic functionality works.
•5% — Small portion of real traffic. Monitor for errors, latency anomalies.
•10% — Sufficient traffic to expose edge cases. Watch for data consistency issues.
•25% — Quarter of traffic. Performance patterns become visible. Check downstream impact.
•50% — Half traffic. Cost patterns become measurable. Validate scaling behavior.
•75% — Most traffic. Stress test the migration setup. Validate rollback still works.
•100% — Full cutover. Intensive monitoring for 24-48 hours before declaring success.
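The schedule above can be driven by a simple promotion gate: advance one step only when the current stage looks healthy, otherwise pull traffic back. This is a sketch under stated assumptions: the health fields, the gate thresholds, and the choice to drop to 0% (rather than to the previous step) on a failed gate are all illustrative.

```typescript
// Percentages from the traffic shifting schedule above.
const SCHEDULE: number[] = [1, 5, 10, 25, 50, 75, 100];

// Health observed at the current traffic percentage (hypothetical fields).
interface StageHealth {
    errorRatePct: number;
    p99LatencyMs: number;
}

// Thresholds a stage must meet before traffic is increased (assumed values
// are supplied by the caller, for example from the success criteria).
interface PromotionGate {
    maxErrorRatePct: number;
    maxP99LatencyMs: number;
}

// Returns the next traffic percentage: the next schedule step when healthy,
// or 0 (everything back on legacy) when the gate fails.
function nextTrafficPercentage(
    current: number,
    health: StageHealth,
    gate: PromotionGate
): number {
    const healthy =
        health.errorRatePct < gate.maxErrorRatePct &&
        health.p99LatencyMs <= gate.maxP99LatencyMs;
    if (!healthy) return 0;
    for (const step of SCHEDULE) {
        if (step > current) return step;
    }
    return 100; // already at full cutover
}
```

Calling this on a timer or after each soak period gives a slow, reversible ramp. How long to hold each stage is a separate decision that depends on how much traffic is needed to expose the issues listed for that stage.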

Practice 3: Implement Feature Flags

Use feature flags to control migration behavior independently of deployments:

Migration Feature Flags

•migration.traffic_percentage — What % of traffic routes to serverless (0-100)
•migration.parallel_run — Whether to run both systems and compare (true/false)
•migration.fallback_enabled — Whether to fall back to legacy on serverless errors
•migration.bypass_users — User IDs to always route to legacy (for debugging)
•migration.force_serverless_users — User IDs to always route to serverless (for testing)
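As an illustration of how the routing-related flags might combine, the sketch below resolves a target system per user. It assumes user IDs are available at the routing layer, gives `bypass_users` precedence over `force_serverless_users`, and uses an FNV-1a hash to place each user in a stable bucket; all of these are design assumptions rather than requirements of the flags themselves. The remaining two flags (`parallel_run`, `fallback_enabled`) govern what happens after routing and are omitted here.

```typescript
// The three flags that decide where a request goes.
interface RoutingFlags {
    trafficPercentage: number;      // migration.traffic_percentage (0-100)
    bypassUsers: string[];          // migration.bypass_users
    forceServerlessUsers: string[]; // migration.force_serverless_users
}

// Maps a user ID to a stable bucket in [0, 99] using 32-bit FNV-1a,
// so a given user always lands on the same side for a given percentage.
function bucketFor(userId: string): number {
    let hash = 0x811c9dc5;
    for (let i = 0; i < userId.length; i++) {
        hash ^= userId.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193);
    }
    return (hash >>> 0) % 100;
}

function routeRequest(userId: string, flags: RoutingFlags): "legacy" | "serverless" {
    // Explicit overrides win over percentage-based routing.
    if (flags.bypassUsers.indexOf(userId) !== -1) return "legacy";
    if (flags.forceServerlessUsers.indexOf(userId) !== -1) return "serverless";
    return bucketFor(userId) < flags.trafficPercentage ? "serverless" : "legacy";
}
```

Hashing instead of calling `Math.random()` per request keeps each user's experience consistent, and raising the percentage only ever moves users from legacy to serverless, never back and forth.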

Always Maintain Rollback Capability

Until migration is fully validated (typically 2-4 weeks of stable operation), maintain instant rollback capability. This means keeping legacy infrastructure running, maintaining traffic routing capability, and ensuring data remains synchronized. The cost of running both systems temporarily is far less than the cost of a failed migration with no escape route.

Handling Data Migration

Data migration is often the most complex and risky aspect of serverless transitions. Application code can be rewritten; data is precious and irreplaceable.

Scenario 1: Same Database, Different Access Pattern

The serverless application uses the same database as the legacy system but accesses it differently (e.g., through RDS Proxy instead of direct connections).

Same Database Considerations

•Connection pooling — Configure RDS Proxy or external pooler before migration. Test under load.
•Concurrent access — Both systems access same data during transition. No schema changes that break either.
•Transaction isolation — Ensure serverless transactions don't conflict with legacy transactions
•Read replicas — Consider read replicas for serverless if load might impact legacy
•Monitoring — Watch connection counts, query performance during transition

Scenario 2: Database Technology Change

Migrating from one database to another (e.g., PostgreSQL to DynamoDB) alongside the serverless migration.

Database Migration Strategies
Strategy	Description	Risk Level	When to Use
Big Bang	Migrate all data at once, cutover	High	Small datasets, acceptable downtime window
Dual Write	Write to both DBs, read from new	Medium	Zero downtime required, can tolerate latency
CDC Replication	Change Data Capture keeps DBs in sync	Medium	Large datasets, need consistency validation
Gradual Table Migration	Migrate tables one at a time	Low	Independent tables, incremental approach

dual-write-pattern.ts
TypeScript
// Dual Write Pattern for Zero-Downtime Database Migration
// Assumes Database, Entity, and deepEqual are defined or imported elsewhere.
 
interface DualWriteConfig {
    primaryDb: Database;  // Initially legacy, switches to new
    secondaryDb: Database;  // Initially new, switches to legacy
    writeToBoth: boolean;
    readFromPrimary: boolean;
    validateWrites: boolean;
}
 
class DualWriteRepository {
    constructor(private config: DualWriteConfig) {}
    
    async create(entity: Entity): Promise<Entity> {
        // Always write to primary
        const result = await this.config.primaryDb.insert(entity);
        
        if (this.config.writeToBoth) {
            try {
                // Write to secondary. This call is awaited, so it adds latency
                // to the request; a failure is logged below, not propagated.
                await this.config.secondaryDb.insert(entity);
                
                if (this.config.validateWrites) {
                    // Validate data matches
                    const secondary = await this.config.secondaryDb.findById(result.id);
                    if (!deepEqual(result, secondary)) {
                        await this.logDiscrepancy('create', result, secondary);
                    }
                }
            } catch (error) {
                // Secondary write failed - log but don't fail the request
                await this.logSecondaryFailure('create', entity, error);
            }
        }
        
        return result;
    }
    
    async findById(id: string): Promise<Entity | null> {
        const primary = await this.config.primaryDb.findById(id);
        
        // Optionally validate secondary has same data
        if (this.config.validateWrites && primary) {
            const secondary = await this.config.secondaryDb.findById(id);
            if (!deepEqual(primary, secondary)) {
                await this.logDiscrepancy('read', primary, secondary);
            }
        }
        
        return primary;
    }
    
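    // Migration phases:
    // 1. writeToBoth=true, primaryDb=legacy: new writes land in both DBs.
    //    Rows that existed before dual writing began must be backfilled separately.
    // 2. Validate: compare all data between DBs
    // 3. Switch reads to the new DB (readFromPrimary=false; findById above
    //    does not yet check this flag and would need to)
    // 4. Swap roles: primaryDb=new, secondaryDb=legacy (legacy keeps
    //    receiving writes, which preserves the rollback path)
    // 5. writeToBoth=false (stop writing to legacy)
    // 6. Decommission legacy DB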
}

Data Migration Is Hard to Reverse

Unlike application code, data changes are not easily rolled back. Before any data migration step, ensure you have complete backups, tested restore procedures, and a clear rollback plan. Test the entire migration process—including rollback—on production-like data in a staging environment.
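Validating a migration rehearsal (or the "compare all data between DBs" phase of a dual write) comes down to reconciling two datasets. The sketch below does this in memory for rows keyed by `id`; the `Row` shape and function names are hypothetical, and a real reconciliation over large tables would stream or checksum by key range instead of loading everything at once.

```typescript
interface Row {
    id: string;
    [field: string]: unknown;
}

interface ReconciliationReport {
    missingInTarget: string[];    // present in source only
    unexpectedInTarget: string[]; // present in target only
    mismatched: string[];         // present in both, contents differ
}

// Serializes a row with top-level keys in sorted order so that field order
// does not cause false mismatches. Nested values are serialized as-is.
function fingerprint(row: Row): string {
    const keys = Object.keys(row).sort();
    return JSON.stringify(keys.map((k) => [k, row[k]]));
}

function reconcile(source: Row[], target: Row[]): ReconciliationReport {
    const targetById = new Map<string, string>();
    for (const row of target) {
        targetById.set(row.id, fingerprint(row));
    }

    const report: ReconciliationReport = {
        missingInTarget: [],
        unexpectedInTarget: [],
        mismatched: [],
    };
    const seenInSource = new Set<string>();

    for (const row of source) {
        seenInSource.add(row.id);
        const targetPrint = targetById.get(row.id);
        if (targetPrint === undefined) {
            report.missingInTarget.push(row.id);
        } else if (targetPrint !== fingerprint(row)) {
            report.mismatched.push(row.id);
        }
    }
    for (const row of target) {
        if (!seenInSource.has(row.id)) {
            report.unexpectedInTarget.push(row.id);
        }
    }
    return report;
}
```

An empty report after a staging rehearsal, and again after a rehearsed rollback, is reasonable evidence that both directions of the procedure preserve data.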

Risk Mitigation Strategies

Every migration carries risk. The goal isn't to eliminate risk—it's to understand, quantify, and mitigate it to acceptable levels.

Risk 1: Performance Degradation

Serverless introduces cold starts and different performance characteristics that may not meet existing SLAs.

Performance Risk Mitigation

•Baseline before migration — Measure legacy P50/P95/P99 latency to establish comparison points
•Load test early — Test serverless implementation under realistic load before any traffic migration
•Provision concurrency — For latency-sensitive paths, configure provisioned concurrency
•Optimize cold starts — Minimize package size, use lazy loading, consider compiled languages
•Set rollback triggers — Define automatic rollback if latency exceeds thresholds during migration
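Baselining and rollback triggers both depend on computing latency percentiles from raw samples. The following sketch uses the nearest-rank definition of a percentile and a rollback rule based on a tolerance above the legacy P99; the 10% tolerance in the usage note is an assumption, and production systems would normally read percentiles from their metrics platform instead of computing them by hand.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
    if (samples.length === 0) throw new Error("percentile requires at least one sample");
    if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
    const sorted = samples.slice().sort((a, b) => a - b);
    const rank = Math.ceil((p * sorted.length) / 100);
    const value = sorted[rank - 1];
    if (value === undefined) throw new Error("rank out of range");
    return value;
}

interface LatencyBaseline {
    p50: number;
    p95: number;
    p99: number;
}

// Captures the legacy comparison points before any traffic is migrated.
function baselineFrom(legacySamplesMs: number[]): LatencyBaseline {
    return {
        p50: percentile(legacySamplesMs, 50),
        p95: percentile(legacySamplesMs, 95),
        p99: percentile(legacySamplesMs, 99),
    };
}

// Rollback trigger: fires when the serverless P99 exceeds the legacy P99
// by more than the given tolerance (0.1 means 10%).
function shouldRollback(
    baseline: LatencyBaseline,
    serverlessSamplesMs: number[],
    tolerance: number
): boolean {
    return percentile(serverlessSamplesMs, 99) > baseline.p99 * (1 + tolerance);
}
```

For example, with a legacy P99 of 150 ms and a tolerance of 0.1, the trigger would fire once the serverless P99 rises above 165 ms.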

Risk 2: Feature Regression

The migrated system may not implement every feature correctly, especially edge cases.

Functional Risk Mitigation

•Comprehensive test suite — Invest in automated tests before migration. Cover edge cases, error handling, boundary conditions.
•Parallel run comparison — Run both systems and compare outputs for real traffic
•Production replay — Capture real requests and replay against serverless for comparison
•Gradual rollout — Small traffic percentage finds regressions before they affect many users
•Feature flags for rollback — Ability to disable specific migrated features without full rollback

Risk 3: Cost Overrun

Serverless may cost more than expected for certain workload patterns.

Cost Risk Mitigation

•Model costs before migration — Use the cost comparison framework from earlier pages with realistic traffic estimates
•Set billing alerts — Configure alerts at 50%, 75%, 100% of expected cost during migration
•Monitor during rollout — Track cost per request as traffic shifts, not just total cost
•Optimize before scaling — Right-size memory, use ARM64, optimize duration before 100% traffic
•Define cost ceiling — Establish maximum acceptable cost; rollback if exceeded
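The alert thresholds, per-request tracking, and cost ceiling above reduce to a few small calculations. This is a minimal sketch with hypothetical function names; in practice the alerts themselves would be configured in the cloud provider's billing tools, with logic like this used in dashboards or rollout gates.

```typescript
// Alert points from the list above, as fractions of expected monthly cost.
const ALERT_FRACTIONS: number[] = [0.5, 0.75, 1.0];

// Returns every alert threshold that current spend has reached.
function triggeredAlerts(spendToDate: number, expectedMonthlyCost: number): number[] {
    return ALERT_FRACTIONS.filter((f) => spendToDate >= f * expectedMonthlyCost);
}

// Cost per request stays comparable as traffic shifts, unlike total cost,
// which rises simply because more traffic is being served.
function costPerRequest(totalCost: number, requestCount: number): number {
    if (requestCount <= 0) throw new Error("requestCount must be positive");
    return totalCost / requestCount;
}

// Rollback signal: projected spend is above the agreed maximum.
function exceedsCeiling(projectedMonthlyCost: number, costCeiling: number): boolean {
    return projectedMonthlyCost > costCeiling;
}
```

A spend of 80 against an expected 100 has crossed the 50% and 75% alerts but not the 100% one. Comparing `costPerRequest` at each traffic stage shows whether unit cost is holding steady as volume grows.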

The Risk Register

Maintain a formal risk register documenting identified risks, likelihood, impact, mitigation strategies, and owners. Review weekly during active migration. This forces structured thinking about what could go wrong and how you'll respond.
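A risk register can be as simple as a typed list. The sketch below assumes a 1 to 5 rating for likelihood and impact and ranks risks by their product, which is a common convention but not the only one; the entries and owner names are illustrative placeholders based on the three risks discussed above.

```typescript
type Rating = 1 | 2 | 3 | 4 | 5;

interface RiskEntry {
    id: string;
    description: string;
    likelihood: Rating;
    impact: Rating;
    mitigation: string;
    owner: string;
}

// Assumed scoring convention: likelihood multiplied by impact (1 to 25).
function riskScore(risk: RiskEntry): number {
    return risk.likelihood * risk.impact;
}

// Highest-scoring risks first, for the weekly review agenda.
function reviewOrder(register: RiskEntry[]): RiskEntry[] {
    return register.slice().sort((a, b) => riskScore(b) - riskScore(a));
}

// Illustrative entries; ratings and owners are placeholders.
const exampleRegister: RiskEntry[] = [
    {
        id: "R1",
        description: "Cold starts push P99 latency past the SLA",
        likelihood: 3,
        impact: 4,
        mitigation: "Provisioned concurrency on latency-sensitive paths",
        owner: "platform-team",
    },
    {
        id: "R2",
        description: "Edge-case feature regression in migrated endpoints",
        likelihood: 4,
        impact: 4,
        mitigation: "Parallel run comparison and gradual rollout",
        owner: "service-team",
    },
    {
        id: "R3",
        description: "Monthly cost exceeds the modeled budget",
        likelihood: 2,
        impact: 3,
        mitigation: "Billing alerts and a defined cost ceiling",
        owner: "engineering-manager",
    },
];
```

Sorting by score keeps the weekly review focused: with these placeholder ratings the order is R2 (16), R1 (12), R3 (6).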

Organizational Change Management

Technical migration is only half the challenge. Organizational and cultural changes often determine whether serverless adoption succeeds long-term.

Stakeholder Alignment:

Different stakeholders have different concerns about serverless migration:

Stakeholder Concerns and Responses
Stakeholder	Primary Concern	How to Address
Engineering Leadership	Will this reduce velocity?	Show deployment frequency improvements, demonstrate local dev story
Product Management	Will features be delayed?	Plan migration to minimize feature work interruption, show quick wins first
Operations/SRE	Will this be harder to debug?	Demonstrate observability approach, involve in tooling decisions
Finance	Will costs go up?	Provide cost modeling, commit to budget targets, show cost monitoring plan
Security	Is this as secure?	Review IAM model, demonstrate compliance capabilities, security review of architecture
Individual Engineers	Will my skills become obsolete?	Provide training, emphasize new skills, involve in design decisions

Skill Development:

Teams need explicit support to develop serverless skills:

Skill Development Program

•Dedicated learning time — Allocate 10-20% of time during migration for serverless learning
•Hands-on workshops — Build real serverless applications in guided sessions, not just watch presentations
•Pair programming — Pair experienced members with those still learning during migration work
•External training — AWS certifications, specialized serverless courses from qualified providers
•Internal knowledge sharing — Brown bags, documentation, internal tech talks on lessons learned
•Post-incident learning — Blameless postmortems that spread knowledge about serverless failure modes

Celebrating Progress:

Migrations are marathons, not sprints. Celebrate milestones to maintain momentum:

First function deployed to production
First 1% traffic on serverless
50% traffic milestone
Full cutover completion
Legacy system decommissioned
First month of stable operation

Resistance is Information

When team members resist serverless adoption, listen carefully. Sometimes resistance signals legitimate concerns that should influence migration decisions. Other times it reflects learning anxiety that needs support. Distinguish between 'this won't work' (investigate) and 'I don't know how to do this' (train and support).

Measuring Post-Migration Success

Migration isn't complete when traffic switches—it's complete when the new system demonstrates sustained success against your criteria.

Validation Period:

Establish a formal validation period (typically 2-4 weeks) after full cutover before declaring migration complete:

Validation Period Checklist

•No regressions — Zero customer-impacting functional issues attributable to migration
•Performance stable — Latency metrics within 10% of baseline or target
•Cost on target — Monthly run rate within budget projection
•Operations sustainable — No increase in on-call burden or incidents
•Team confident — Engineers comfortable deploying, debugging, and extending
•Rollback tested — Validated that rollback still works if needed in future

Post-Migration Retrospective:

Conduct a formal retrospective after each significant migration to capture lessons:

Retrospective Question Framework
Category	Questions to Explore
Planning	Was the scope accurate? What surprised us? What did we miss?
Execution	What went smoothly? What was harder than expected? Where did we deviate from plan?
Timeline	How did actual compare to estimate? Where did delays occur? What accelerated us?
Risk	Did identified risks materialize? Were there surprise risks? How did mitigation work?
Team	Did we have the right skills? Where did we need help? What would we learn in advance next time?
Tools	What tooling helped? What was missing? What would we invest in for future migrations?

Continuous Optimization:

Migration is the beginning, not the end. Serverless systems require ongoing optimization:

Post-Migration Optimization

•Memory right-sizing — Use production data to optimize memory configuration for cost/performance balance
•Cold start reduction — Analyze cold start patterns; implement provisioned concurrency where justified
•Cost attribution — Tag functions by team/feature for cost visibility and accountability
•Performance monitoring — Track P99 latency trends; catch degradation before it impacts users
•Dependency updates — Keep runtime versions and dependencies current for security and performance
•Architecture evolution — Refine function boundaries and event flows based on operational experience
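Memory right-sizing is a concrete example of using production data to optimize. The sketch below assumes a billing model in which compute cost is proportional to configured memory multiplied by execution duration (as with AWS Lambda), takes the price per GB-second as a parameter rather than hard-coding it, and ignores per-request charges because they do not vary with memory. The measurements in the usage note are made-up numbers.

```typescript
// Average duration observed for one memory setting (from production data).
interface MemoryMeasurement {
    memoryMb: number;
    avgDurationMs: number;
}

// Compute cost for one million invocations at the given GB-second price.
function costPerMillionInvocations(m: MemoryMeasurement, pricePerGbSecond: number): number {
    const gbSeconds = (m.memoryMb / 1024) * (m.avgDurationMs / 1000);
    return gbSeconds * pricePerGbSecond * 1e6;
}

// Picks the cheapest setting whose average duration fits the latency budget.
// Returns undefined when no measured setting is fast enough.
function pickMemorySetting(
    measurements: MemoryMeasurement[],
    pricePerGbSecond: number,
    maxAvgDurationMs: number
): MemoryMeasurement | undefined {
    let best: MemoryMeasurement | undefined;
    for (const m of measurements) {
        if (m.avgDurationMs > maxAvgDurationMs) continue;
        if (
            best === undefined ||
            costPerMillionInvocations(m, pricePerGbSecond) <
                costPerMillionInvocations(best, pricePerGbSecond)
        ) {
            best = m;
        }
    }
    return best;
}
```

More memory often means more CPU and shorter durations, so the cheapest setting is not always the smallest. With measurements of 512 MB at 400 ms, 1024 MB at 180 ms, and 2048 MB at 100 ms, the 1024 MB setting uses the fewest GB-seconds (0.18 per invocation versus 0.2 for the other two).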

Success Builds Momentum

A successful migration creates organizational capital for future serverless work. Document the success—cost savings, velocity improvements, reliability gains—and share it broadly. This makes the next migration easier to justify and execute.

Summary: Mastering Serverless Migration

We've covered the complete lifecycle of serverless migration—from initial assessment through sustained post-migration success. Let's consolidate the key principles:

Key Takeaways

•Assess systematically — Use the suitability matrix to prioritize migration candidates by fit and complexity
•Choose the right pattern — Strangler Fig for incremental migration, Parallel Run for validation, Blue-Green for atomic cutover
•Define success upfront — Establish measurable criteria for performance, cost, reliability, and operations
•Plan dependencies — Map all integration points and plan data migration strategy
•Execute gradually — Canary deployment, gradual traffic shifting, feature flags for control
•Mitigate risks explicitly — Identify risks, plan mitigation, maintain rollback capability throughout
•Manage organizational change — Address stakeholder concerns, invest in skill development, celebrate progress
•Validate and iterate — Formal validation period, post-migration retrospective, continuous optimization

Module Complete:

You've now completed the "When to Use Serverless" module. You possess comprehensive knowledge for making informed serverless adoption decisions—from initial evaluation through successful migration and ongoing operation. This decision framework, combined with the technical knowledge from earlier modules, equips you to architect serverless and hybrid systems that deliver real business value.

Module Complete

Congratulations! You've mastered the art and science of serverless adoption decisions. You can evaluate workloads, compare costs, assess operational implications, design hybrid architectures, and execute controlled migrations. Apply these frameworks to your next serverless decision—whether that's adopting serverless for a new project, migrating an existing system, or deciding that traditional infrastructure is the right choice.

5 / 5

Loading learning content...

System Design (HLD)When to Use Serverless

When to Use Serverless

LevelAdvanced

Duration90 mins

TopicWhen to Use Serverless

5 / 5

Migration Strategies: Transitioning to Serverless

The Art of Controlled Transformation

What You Will Master

Migration Assessment Framework

Before migrating anything, you need systematic assessment. Not every system is a good serverless candidate, and even suitable systems may not be worth the migration effort.

The Migration Suitability Matrix:

Evaluate each potential migration candidate across two dimensions:

Serverless Fit — How well does this workload match serverless characteristics?
Migration Complexity — How difficult would the migration be?

The intersection determines your migration priority:

Migration Priority Matrix
	High Serverless Fit	Medium Serverless Fit	Low Serverless Fit
Low Complexity	Priority 1: Quick wins	Priority 2: Evaluate ROI	Skip: Poor candidate
Medium Complexity	Priority 2: Evaluate ROI	Priority 3: Consider if resources available	Skip: High effort, low reward
High Complexity	Priority 3: Plan carefully	Skip: Complex for unclear benefit	Skip: Wrong direction

Assessing Serverless Fit:

Assessing Migration Complexity:

Evaluate complexity across these dimensions:

Complexity Factors

•Codebase size and structure — Large monoliths with high coupling are more complex to migrate than small, well-factored services
•State management — Systems with complex in-memory state require significant refactoring versus stateless HTTP handlers
•Integration dependencies — Systems with many upstream/downstream integrations require more coordination
•Data migration needs — Different database technologies or schema changes add complexity
•Testing infrastructure — Poor test coverage increases migration risk; existing comprehensive tests reduce it
•Team expertise — Teams familiar with cloud and event-driven patterns migrate faster than those learning simultaneously

Start with Quick Wins

Migration Patterns

Different migration scenarios call for different patterns. Choose the pattern that best matches your system's characteristics and risk tolerance.

Pattern 1: Strangler Fig (Incremental Replacement)

Gradually replace monolith functionality with serverless functions, routing traffic to new implementations as they're ready. The legacy system "shrinks" over time until it can be decommissioned.

Converting Mermaid diagram...

When to Use Strangler Fig:

Large existing systems that can't be migrated all at once
Need to maintain production availability throughout migration
Want to prove serverless value incrementally
Team is learning serverless while migrating

Pattern 2: Parallel Run (Shadow/Compare)

Run both systems simultaneously, comparing outputs or metrics to validate the serverless implementation before cutover.

parallel-run-pattern.ts
TypeScript
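// Parallel Run Pattern Implementation
// Route traffic to both systems, compare results
// Helpers (getConfig, invokeLegacy, invokeServerless, compareResponses,
// logDifference, publishMetrics) are assumed to be defined elsewhere.
import type { APIGatewayProxyHandler } from 'aws-lambda';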
 
interface ParallelRunConfig {
    legacyUrl: string;
    serverlessUrl: string;
    compareResponses: boolean;
    logDifferences: boolean;
    useServerlessResponse: boolean; // false = legacy is primary
    sampleRate: number; // 0-1, what % of traffic to parallel run
}
 
export const parallelRunHandler: APIGatewayProxyHandler = async (event) => {
    const config = getConfig();
    
    // Should we parallel run this request?
    const shouldParallel = Math.random() < config.sampleRate;
    
    if (!shouldParallel) {
        // Just use the primary system, unwrapping the API Gateway response
        const result = config.useServerlessResponse
            ? await invokeServerless(event)
            : await invokeLegacy(event);
        return result.response;
    }
    
    // Parallel execution (each invoke helper reports its own duration)
    const [legacyResult, serverlessResult] = await Promise.allSettled([
        invokeLegacy(event),
        invokeServerless(event),
    ]);
    
    // Log comparison metrics
    const metrics = {
        requestId: event.requestContext.requestId,
        legacyDuration: legacyResult.status === 'fulfilled' 
            ? legacyResult.value.duration : null,
        serverlessDuration: serverlessResult.status === 'fulfilled' 
            ? serverlessResult.value.duration : null,
        legacyStatus: legacyResult.status === 'fulfilled' 
            ? legacyResult.value.statusCode : 'error',
        serverlessStatus: serverlessResult.status === 'fulfilled' 
            ? serverlessResult.value.statusCode : 'error',
        responsesMatch: false,
    };
    
    // Compare responses if both succeeded
    if (config.compareResponses && 
        legacyResult.status === 'fulfilled' && 
        serverlessResult.status === 'fulfilled') {
        
        metrics.responsesMatch = compareResponses(
            legacyResult.value.body,
            serverlessResult.value.body,
        );
        
        if (!metrics.responsesMatch && config.logDifferences) {
            await logDifference({
                requestId: event.requestContext.requestId,
                request: event,
                legacyResponse: legacyResult.value.body,
                serverlessResponse: serverlessResult.value.body,
            });
        }
    }
    
    await publishMetrics(metrics);
    
    // Return primary system's response
    const primary = config.useServerlessResponse ? serverlessResult : legacyResult;
    if (primary.status === 'fulfilled') {
        return primary.value.response;
    } else {
        // Primary failed, try secondary
        const secondary = config.useServerlessResponse ? legacyResult : serverlessResult;
        if (secondary.status === 'fulfilled') {
            return secondary.value.response;
        }
        throw new Error('Both systems failed');
    }
};

When to Use Parallel Run:

High-risk migrations where correctness is critical
Complex business logic that's difficult to fully test
Need metrics to validate performance characteristics
Building confidence before traffic cutover
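
A parallel run is only as useful as its comparison function: naive string equality reports false mismatches whenever the two systems differ in JSON key order or in fields that legitimately vary per request. The following is one possible sketch of a comparison helper. The set of volatile field names is an assumption and would need to reflect your actual response schema.

```typescript
// Fields expected to differ between systems (hypothetical names).
const VOLATILE_FIELDS = new Set(["timestamp", "requestId", "traceId"]);

// Recursively drop volatile fields and sort object keys so that key order
// does not affect the comparison.
function normalize(value: unknown): unknown {
    if (Array.isArray(value)) {
        return value.map(normalize);
    }
    if (value !== null && typeof value === "object") {
        const source = value as Record<string, unknown>;
        const result: Record<string, unknown> = {};
        for (const key of Object.keys(source).sort()) {
            if (!VOLATILE_FIELDS.has(key)) {
                result[key] = normalize(source[key]);
            }
        }
        return result;
    }
    return value;
}

function compareResponses(legacyBody: string, serverlessBody: string): boolean {
    try {
        const a = JSON.stringify(normalize(JSON.parse(legacyBody)));
        const b = JSON.stringify(normalize(JSON.parse(serverlessBody)));
        return a === b;
    } catch {
        // Non-JSON bodies: fall back to exact string comparison
        return legacyBody === serverlessBody;
    }
}

console.log(compareResponses(
    '{"id":1,"total":9.5,"timestamp":"10:00"}',
    '{"total":9.5,"id":1,"timestamp":"10:01"}',
)); // true
console.log(compareResponses('{"id":1,"total":9.5}', '{"id":1,"total":9.6}')); // false
```

Array order is treated as significant here. If one system returns unordered collections, sort them before comparing.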

Pattern 3: Blue-Green Cutover

Build the complete serverless system alongside the legacy system, then switch traffic all at once (with instant rollback capability).

Blue-Green Migration Steps

•Build serverless version — Complete implementation with all features and integrations
•Ensure data synchronization — Both systems operate on the same data, or changes are replicated between them
•Test thoroughly — Integration tests, load tests, chaos tests against serverless version
•Configure routing — DNS or load balancer ready to switch traffic instantly
•Execute cutover — Switch 100% of traffic to serverless at planned time
•Monitor intensively — Watch all metrics during initial hours/days
•Rollback if needed — Switch back to legacy instantly if issues arise
•Decommission legacy — After validation period, shut down old system

Blue-Green Risk Consideration

Planning the Migration

Successful migrations require upfront planning that addresses technical, organizational, and timeline considerations.

Step 1: Define Success Criteria

Before migrating, establish measurable success criteria. Without clear criteria, you can't know if the migration succeeded or when to stop iterating.

Example Migration Success Criteria
Category	Metric	Target	Validation Method
Functional	Feature parity	100% of critical features working	Feature checklist, user acceptance
Performance	P99 latency	≤ legacy P99 (150ms)	Load testing, production monitoring
Reliability	Error rate	< 0.1%	Production monitoring over 7 days
Cost	Monthly spend	≤ 80% of legacy infrastructure cost	AWS billing comparison
Operations	On-call pages	≤ legacy rate	PagerDuty metrics over 30 days
Developer	Deployment frequency	≥ 3x legacy rate	CI/CD metrics
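
Criteria like these are easiest to enforce when they are written down as executable checks. The sketch below encodes four of the example rows and evaluates a set of measured values against them. The metric keys and the `evaluate` and `failedMetrics` functions are illustrative names, and the thresholds are taken from the example table rather than recommended values.

```typescript
interface Criterion {
    category: string;
    metric: string;
    // Returns true when the measured value meets the target
    passes: (measured: number) => boolean;
}

const LEGACY_P99_MS = 150; // baseline from the example table

const criteria: Criterion[] = [
    { category: "Performance", metric: "p99LatencyMs", passes: (v) => v <= LEGACY_P99_MS },
    { category: "Reliability", metric: "errorRate", passes: (v) => v < 0.001 }, // < 0.1%
    { category: "Cost", metric: "costRatioToLegacy", passes: (v) => v <= 0.8 },
    { category: "Developer", metric: "deployFrequencyRatio", passes: (v) => v >= 3 },
];

function evaluate(measured: Record<string, number>): { metric: string; passed: boolean }[] {
    return criteria.map((c) => {
        const value = measured[c.metric];
        // A missing measurement counts as a failure rather than a silent pass
        return { metric: c.metric, passed: value !== undefined && c.passes(value) };
    });
}

function failedMetrics(measured: Record<string, number>): string[] {
    return evaluate(measured).filter((r) => !r.passed).map((r) => r.metric);
}

// Hypothetical measurements after cutover
console.log(failedMetrics({
    p99LatencyMs: 140,
    errorRate: 0.0005,
    costRatioToLegacy: 0.9,
    deployFrequencyRatio: 4,
})); // [ 'costRatioToLegacy' ]
```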

Step 2: Map Dependencies

Identify all dependencies that affect migration scope and sequencing:

Dependency Mapping Checklist

•Upstream services — What calls the system being migrated? How will routing change?
•Downstream services — What does the system call? Are there connection pattern changes (e.g., database pooling)?
•Shared databases — What other systems access the same database? How will concurrent access work?
•Message queues — What produces messages? What consumes them? How will formats change?
•External integrations — Third-party APIs, webhooks, partner systems that expect specific behavior
•Authentication/authorization — Identity systems, token validation, permission models

Step 3: Design the Target Architecture

Document the serverless architecture in detail before building. Include:

Function boundaries and responsibilities
Event flows and integration patterns
Data storage choices and access patterns
Security model (IAM roles, VPC configuration)
Observability approach (logging, tracing, metrics)
Deployment pipeline design

Step 4: Create a Migration Timeline

Example Migration Timeline (12-Week Project)
Week	Phase	Activities	Milestone
1-2	Foundation	Infrastructure setup, CI/CD pipeline, observability	Can deploy and monitor Lambda
3-4	First Function	Migrate lowest-risk endpoint, integration tests	First production traffic on Lambda
5-7	Core Features	Migrate primary functionality, parallel running	50% traffic on serverless
8-9	Complete Migration	Remaining features, edge cases, load testing	100% traffic capable
10	Cutover	Production switch, intensive monitoring	All production on serverless
11-12	Validation	Stability monitoring, optimization, documentation	Migration declared complete

Build in Buffer Time

Executing the Migration

With planning complete, execution becomes the focus. Follow these practices for controlled, low-risk migration execution.

Practice 1: Start with a Canary

Before any migration work, deploy a trivial canary function to validate your infrastructure:

migration-canary.ts
TypeScript
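// Migration Canary: Validates infrastructure before real migration
// This function does almost nothing but proves everything works
// Assumes module-level clients (db, redis, cloudwatch) are initialized elsewhere
import type { APIGatewayProxyHandler } from 'aws-lambda';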
 
export const canaryHandler: APIGatewayProxyHandler = async (event) => {
    const healthChecks = {
        lambda: true,
        timestamp: new Date().toISOString(),
        region: process.env.AWS_REGION,
        functionName: process.env.AWS_LAMBDA_FUNCTION_NAME,
        checks: {} as Record<string, boolean>,
    };
    
    // Test database connectivity
    try {
        await db.query('SELECT 1');
        healthChecks.checks.database = true;
    } catch (error) {
        healthChecks.checks.database = false;
        console.error('Database check failed:', error);
    }
    
    // Test cache connectivity
    try {
        await redis.ping();
        healthChecks.checks.cache = true;
    } catch (error) {
        healthChecks.checks.cache = false;
        console.error('Cache check failed:', error);
    }
    
    // Test external service (if applicable)
    try {
        // fetch resolves on HTTP error statuses, so check res.ok explicitly
        const res = await fetch(process.env.LEGACY_SERVICE_URL + '/health');
        healthChecks.checks.legacyService = res.ok;
    } catch (error) {
        healthChecks.checks.legacyService = false;
        console.error('Legacy service check failed:', error);
    }
    
    // Emit custom metric
    await cloudwatch.putMetricData({
        Namespace: 'Migration',
        MetricData: [{
            MetricName: 'CanaryHealthy',
            Value: Object.values(healthChecks.checks).every(Boolean) ? 1 : 0,
            Unit: 'Count',
        }],
    });
    
    return {
        statusCode: 200,
        body: JSON.stringify(healthChecks),
    };
};
 
// Deploy this first and validate:
// - Lambda execution works
// - VPC networking is correct (can reach database)
// - IAM permissions are sufficient
// - Logging and metrics flow correctly
// - Deployment pipeline functions end-to-end

Practice 2: Gradual Traffic Shifting

Never switch 100% of traffic immediately. Use gradual traffic shifting to catch issues with minimal blast radius:

Traffic Shifting Schedule

•1% — Internal users or synthetic traffic only. Validate basic functionality works.
•5% — Small portion of real traffic. Monitor for errors, latency anomalies.
•10% — Sufficient traffic to expose edge cases. Watch for data consistency issues.
•25% — Quarter of traffic. Performance patterns become visible. Check downstream impact.
•50% — Half traffic. Cost patterns become measurable. Validate scaling behavior.
•75% — Most traffic. Stress test the migration setup. Validate rollback still works.
•100% — Full cutover. Intensive monitoring for 24-48 hours before declaring success.
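
The schedule above can be driven by a simple rule: advance one stage when the current stage is healthy, and drop back to zero when it is not. A minimal sketch follows. The `StageHealth` and `Thresholds` shapes are assumptions, and real thresholds would come from your own success criteria.

```typescript
const STAGES = [1, 5, 10, 25, 50, 75, 100] as const;

interface StageHealth {
    errorRate: number;    // fraction of failed requests at the current stage
    p99LatencyMs: number;
}

interface Thresholds {
    maxErrorRate: number;
    maxP99LatencyMs: number;
}

// Returns the next traffic percentage: one stage forward when healthy,
// 0 (all traffic back on legacy) when any threshold is breached.
function nextTrafficPercentage(
    current: number,
    health: StageHealth,
    limits: Thresholds,
): number {
    const healthy =
        health.errorRate <= limits.maxErrorRate &&
        health.p99LatencyMs <= limits.maxP99LatencyMs;
    if (!healthy) {
        return 0;
    }
    const next = STAGES.find((stage) => stage > current);
    return next ?? 100; // already at the final stage
}

const limits: Thresholds = { maxErrorRate: 0.001, maxP99LatencyMs: 150 };
console.log(nextTrafficPercentage(5, { errorRate: 0.0002, p99LatencyMs: 120 }, limits));  // 10
console.log(nextTrafficPercentage(25, { errorRate: 0.02, p99LatencyMs: 120 }, limits));   // 0
```

Whether a breach should roll back fully or just one stage is a policy decision; a full rollback is the more conservative choice shown here.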

Practice 3: Implement Feature Flags

Use feature flags to control migration behavior independently of deployments:

Migration Feature Flags

•migration.traffic_percentage — What % of traffic routes to serverless (0-100)
•migration.parallel_run — Whether to run both systems and compare (true/false)
•migration.fallback_enabled — Whether to fall back to legacy on serverless errors
•migration.bypass_users — User IDs to always route to legacy (for debugging)
•migration.force_serverless_users — User IDs to always route to serverless (for testing)
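
As an illustration of how these flags might combine into a routing decision, the sketch below applies the per-user overrides first and then assigns everyone else by a deterministic hash bucket. The `MigrationFlags` shape, the hash choice, and the rule that the bypass list wins over the force list are all assumptions; only three of the flags are modelled.

```typescript
interface MigrationFlags {
    trafficPercentage: number;       // migration.traffic_percentage (0-100)
    bypassUsers: string[];           // migration.bypass_users
    forceServerlessUsers: string[];  // migration.force_serverless_users
}

type Target = "serverless" | "legacy";

// Deterministic FNV-1a-style hash over UTF-16 code units, mapped to 0..99,
// so a given user lands in the same bucket on every request.
function bucketFor(userId: string): number {
    let hash = 0x811c9dc5;
    for (let i = 0; i < userId.length; i++) {
        hash ^= userId.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193);
    }
    return (hash >>> 0) % 100;
}

function routeRequest(userId: string, flags: MigrationFlags): Target {
    // Assumption: the debugging override wins if a user is in both lists
    if (flags.bypassUsers.includes(userId)) return "legacy";
    if (flags.forceServerlessUsers.includes(userId)) return "serverless";
    return bucketFor(userId) < flags.trafficPercentage ? "serverless" : "legacy";
}

const flags: MigrationFlags = {
    trafficPercentage: 0,
    bypassUsers: ["user-debug"],
    forceServerlessUsers: ["user-qa"],
};
console.log(routeRequest("user-qa", flags));    // "serverless" (forced, even at 0%)
console.log(routeRequest("user-other", flags)); // "legacy" (0% of traffic shifted)
```

Because a user's bucket never changes, raising the percentage only ever moves users from legacy to serverless, which avoids flip-flopping individual users between systems during rollout.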

Always Maintain Rollback Capability

Handling Data Migration

Data migration is often the most complex and risky aspect of serverless transitions. Application code can be rewritten; data is precious and irreplaceable.

Scenario 1: Same Database, Different Access Pattern

The serverless application uses the same database as the legacy system but accesses it differently (e.g., through RDS Proxy instead of direct connections).

Same Database Considerations

•Connection pooling — Configure RDS Proxy or external pooler before migration. Test under load.
•Concurrent access — Both systems access the same data during the transition, so avoid schema changes that would break either one.
•Transaction isolation — Ensure serverless transactions don't conflict with legacy transactions
•Read replicas — Consider read replicas for serverless if load might impact legacy
•Monitoring — Watch connection counts, query performance during transition

Scenario 2: Database Technology Change

Migrating from one database to another (e.g., PostgreSQL to DynamoDB) alongside the serverless migration.

Database Migration Strategies
Strategy	Description	Risk Level	When to Use
Big Bang	Migrate all data at once, cutover	High	Small datasets, acceptable downtime window
Dual Write	Write to both DBs, read from new	Medium	Zero downtime required, can tolerate latency
CDC Replication	Change Data Capture keeps DBs in sync	Medium	Large datasets, need consistency validation
Gradual Table Migration	Migrate tables one at a time	Low	Independent tables, incremental approach

dual-write-pattern.ts
TypeScript
// Dual Write Pattern for Zero-Downtime Database Migration
 
interface DualWriteConfig {
    primaryDb: Database;  // Initially legacy, switches to new
    secondaryDb: Database;  // Initially new, switches to legacy
    writeToBoth: boolean;
    readFromPrimary: boolean;
    validateWrites: boolean;
}
 
class DualWriteRepository {
    constructor(private config: DualWriteConfig) {}
    
    async create(entity: Entity): Promise<Entity> {
        // Always write to primary
        const result = await this.config.primaryDb.insert(entity);
        
        if (this.config.writeToBoth) {
            try {
                // Mirror the stored record (including any generated id) to secondary.
                // This is awaited, so it adds latency to the request.
                await this.config.secondaryDb.insert(result);
                
                if (this.config.validateWrites) {
                    // Validate data matches
                    const secondary = await this.config.secondaryDb.findById(result.id);
                    if (!deepEqual(result, secondary)) {
                        await this.logDiscrepancy('create', result, secondary);
                    }
                }
            } catch (error) {
                // Secondary write failed - log but don't fail the request
                await this.logSecondaryFailure('create', entity, error);
            }
        }
        
        return result;
    }
    
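    async findById(id: string): Promise<Entity | null> {
        // Serve reads from whichever database the current phase designates
        const readDb = this.config.readFromPrimary
            ? this.config.primaryDb
            : this.config.secondaryDb;
        const otherDb = this.config.readFromPrimary
            ? this.config.secondaryDb
            : this.config.primaryDb;
        const served = await readDb.findById(id);
        
        // Optionally validate the other database has the same data
        if (this.config.validateWrites && served) {
            const other = await otherDb.findById(id);
            if (!deepEqual(served, other)) {
                await this.logDiscrepancy('read', served, other);
            }
        }
        
        return served;
    }
    
    // logDiscrepancy and logSecondaryFailure are assumed helper methods (not shown).
    //
    // Migration phases:
    // 1. writeToBoth=true, readFromPrimary=true, primaryDb=legacy (build up new DB)
    // 2. Validate: compare all data between DBs
    // 3. readFromPrimary=false (reads are served from the new DB)
    // 4. Swap primaryDb and secondaryDb so the new DB is primary; readFromPrimary=true
    // 5. writeToBoth=false (stop writing to legacy)
    // 6. Decommission legacy DB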
}
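
The validation phase ("compare all data between DBs") needs some way to report which records differ. One possible sketch is shown below, assuming both datasets fit in memory as flat rows keyed by a shared `id`. The `Row` shape and the `compareDatasets` function are hypothetical, and large tables would need batching or checksums per key range instead.

```typescript
interface Row {
    id: string;
    [field: string]: unknown;
}

interface ComparisonReport {
    missingInNew: string[];
    missingInLegacy: string[];
    mismatched: string[];
}

// Serialize a flat row with sorted keys so field order does not matter.
function fingerprint(row: Row): string {
    return JSON.stringify(Object.keys(row).sort().map((key) => [key, row[key]]));
}

function compareDatasets(legacyRows: Row[], newRows: Row[]): ComparisonReport {
    const newById = new Map<string, Row>();
    for (const row of newRows) {
        newById.set(row.id, row);
    }
    const legacyIds = new Set<string>();
    const report: ComparisonReport = { missingInNew: [], missingInLegacy: [], mismatched: [] };

    for (const row of legacyRows) {
        legacyIds.add(row.id);
        const counterpart = newById.get(row.id);
        if (counterpart === undefined) {
            report.missingInNew.push(row.id);
        } else if (fingerprint(row) !== fingerprint(counterpart)) {
            report.mismatched.push(row.id);
        }
    }
    for (const row of newRows) {
        if (!legacyIds.has(row.id)) {
            report.missingInLegacy.push(row.id);
        }
    }
    return report;
}

const report = compareDatasets(
    [{ id: "1", name: "a" }, { id: "2", name: "b" }, { id: "3", name: "c" }],
    [{ id: "1", name: "a" }, { id: "2", name: "B" }, { id: "4", name: "d" }],
);
console.log(report);
// { missingInNew: [ '3' ], missingInLegacy: [ '4' ], mismatched: [ '2' ] }
```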

Data Migration is Not Reversible

Risk Mitigation Strategies

Every migration carries risk. The goal isn't to eliminate risk—it's to understand, quantify, and mitigate it to acceptable levels.

Risk 1: Performance Degradation

Serverless introduces cold starts and different performance characteristics that may not meet existing SLAs.

Performance Risk Mitigation

•Baseline before migration — Measure legacy P50/P95/P99 latency to establish comparison points
•Load test early — Test serverless implementation under realistic load before any traffic migration
•Provision concurrency — For latency-sensitive paths, configure provisioned concurrency
•Optimize cold starts — Minimize package size, use lazy loading, consider compiled languages
•Set rollback triggers — Define automatic rollback if latency exceeds thresholds during migration
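
A rollback trigger can be as simple as comparing the observed P99 against the legacy baseline plus a tolerance. The sketch below uses the nearest-rank percentile method. The 10% default tolerance and the function names are assumptions, and in production this check would normally be an alarm in your monitoring system rather than application code.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
    if (samples.length === 0) {
        throw new Error("percentile requires at least one sample");
    }
    const sorted = [...samples].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
    return sorted[Math.min(rank, sorted.length) - 1] as number;
}

// True when observed P99 latency exceeds the baseline by more than the tolerance.
function shouldRollback(
    latenciesMs: number[],
    baselineP99Ms: number,
    tolerance = 0.1,
): boolean {
    return percentile(latenciesMs, 99) > baselineP99Ms * (1 + tolerance);
}

console.log(shouldRollback([100, 110, 120, 140], 150)); // false
console.log(shouldRollback([100, 120, 130, 500], 150)); // true
```

With only a handful of samples the P99 is simply the maximum, so a trigger like this should be evaluated over a window large enough that a single slow request does not cause a rollback.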

Risk 2: Feature Regression

The migrated system may not implement all features correctly, especially edge cases.

Functional Risk Mitigation

•Comprehensive test suite — Invest in automated tests before migration. Cover edge cases, error handling, boundary conditions.
•Parallel run comparison — Run both systems and compare outputs for real traffic
•Production replay — Capture real requests and replay against serverless for comparison
•Gradual rollout — Small traffic percentage finds regressions before they affect many users
•Feature flags for rollback — Ability to disable specific migrated features without full rollback

Risk 3: Cost Overrun

Serverless may cost more than expected for certain workload patterns.

Cost Risk Mitigation

•Model costs before migration — Use the cost comparison framework from earlier pages with realistic traffic estimates
•Set billing alerts — Configure alerts at 50%, 75%, 100% of expected cost during migration
•Monitor during rollout — Track cost per request as traffic shifts, not just total cost
•Optimize before scaling — Right-size memory, use ARM64, optimize duration before 100% traffic
•Define cost ceiling — Establish maximum acceptable cost; rollback if exceeded
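
To model costs before migration, a rough estimate can be built from request count, average duration, and configured memory, since function compute is typically billed per request plus per GB-second. The sketch below takes prices as inputs; the values in the example are placeholders rather than current list prices, and the model ignores free tiers, data transfer, and other services.

```typescript
interface FunctionPricing {
    perMillionRequestsUsd: number;
    perGbSecondUsd: number;
}

interface Workload {
    requestsPerMonth: number;
    avgDurationMs: number;
    memoryMb: number;
}

function estimateMonthlyCost(w: Workload, pricing: FunctionPricing): number {
    const requestCost = (w.requestsPerMonth / 1_000_000) * pricing.perMillionRequestsUsd;
    // GB-seconds = invocations x duration (s) x memory (GB)
    const gbSeconds = w.requestsPerMonth * (w.avgDurationMs / 1000) * (w.memoryMb / 1024);
    return requestCost + gbSeconds * pricing.perGbSecondUsd;
}

function costPerRequest(w: Workload, pricing: FunctionPricing): number {
    return estimateMonthlyCost(w, pricing) / w.requestsPerMonth;
}

function exceedsCeiling(w: Workload, pricing: FunctionPricing, ceilingUsd: number): boolean {
    return estimateMonthlyCost(w, pricing) > ceilingUsd;
}

// Placeholder prices: substitute your provider's current rates
const placeholderPricing: FunctionPricing = { perMillionRequestsUsd: 0.2, perGbSecondUsd: 0.00002 };
const workload: Workload = { requestsPerMonth: 10_000_000, avgDurationMs: 200, memoryMb: 512 };

console.log(estimateMonthlyCost(workload, placeholderPricing).toFixed(2)); // "22.00"
console.log(exceedsCeiling(workload, placeholderPricing, 20));             // true
```

Tracking `costPerRequest` as traffic shifts makes it easier to spot a workload pattern that scales worse than modelled before it reaches 100% of traffic.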

The Risk Register

Organizational Change Management

Technical migration is only half the challenge. Organizational and cultural changes often determine whether serverless adoption succeeds long-term.

Stakeholder Alignment:

Different stakeholders have different concerns about serverless migration:

Stakeholder Concerns and Responses
Stakeholder	Primary Concern	How to Address
Engineering Leadership	Will this reduce velocity?	Show deployment frequency improvements, demonstrate local dev story
Product Management	Will features be delayed?	Plan migration to minimize feature work interruption, show quick wins first
Operations/SRE	Will this be harder to debug?	Demonstrate observability approach, involve in tooling decisions
Finance	Will costs go up?	Provide cost modeling, commit to budget targets, show cost monitoring plan
Security	Is this as secure?	Review IAM model, demonstrate compliance capabilities, security review of architecture
Individual Engineers	Will my skills become obsolete?	Provide training, emphasize new skills, involve in design decisions

Skill Development:

Teams need explicit support to develop serverless skills:

Skill Development Program

•Dedicated learning time — Allocate 10-20% of time during migration for serverless learning
•Hands-on workshops — Build real serverless applications in guided sessions, not just watch presentations
•Pair programming — Pair experienced team members with those still learning during migration work
•External training — AWS certifications, specialized serverless courses from qualified providers
•Internal knowledge sharing — Brown bags, documentation, internal tech talks on lessons learned
•Post-incident learning — Blameless postmortems that spread knowledge about serverless failure modes

Celebrating Progress:

Migrations are marathons, not sprints. Celebrate milestones to maintain momentum:

First function deployed to production
First 1% traffic on serverless
50% traffic milestone
Full cutover completion
Legacy system decommissioned
First month of stable operation

Resistance is Information

Measuring Post-Migration Success

Migration isn't complete when traffic switches—it's complete when the new system demonstrates sustained success against your criteria.

Validation Period:

Establish a formal validation period (typically 2-4 weeks) after full cutover before declaring migration complete:

Validation Period Checklist

•No regressions — Zero customer-impacting functional issues attributable to migration
•Performance stable — Latency metrics within 10% of baseline or target
•Cost on target — Monthly run rate within budget projection
•Operations sustainable — No increase in on-call burden or incidents
•Team confident — Engineers comfortable deploying, debugging, and extending
•Rollback tested — Validated that rollback still works if needed in future

Post-Migration Retrospective:

Conduct a formal retrospective after each significant migration to capture lessons:

Retrospective Question Framework
Category	Questions to Explore
Planning	Was the scope accurate? What surprised us? What did we miss?
Execution	What went smoothly? What was harder than expected? Where did we deviate from plan?
Timeline	How did actual compare to estimate? Where did delays occur? What accelerated us?
Risk	Did identified risks materialize? Were there surprise risks? How did mitigation work?
Team	Did we have the right skills? Where did we need help? What would we learn in advance next time?
Tools	What tooling helped? What was missing? What would we invest in for future migrations?

Continuous Optimization:

Migration is the beginning, not the end. Serverless systems require ongoing optimization:

Post-Migration Optimization

•Memory right-sizing — Use production data to optimize memory configuration for cost/performance balance
•Cold start reduction — Analyze cold start patterns; implement provisioned concurrency where justified
•Cost attribution — Tag functions by team/feature for cost visibility and accountability
•Performance monitoring — Track P99 latency trends; catch degradation before it impacts users
•Dependency updates — Keep runtime versions and dependencies current for security and performance
•Architecture evolution — Refine function boundaries and event flows based on operational experience
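
Memory right-sizing is a small optimization problem: compute cost scales with memory multiplied by duration, and more memory often shortens duration, so the cheapest setting is not necessarily the smallest. A minimal sketch follows, assuming you have measured average duration at several memory sizes; the trial numbers are invented for illustration.

```typescript
interface MemoryTrial {
    memoryMb: number;
    avgDurationMs: number;
}

// Relative compute cost per invocation: memory (GB) x duration (s)
function gbSeconds(trial: MemoryTrial): number {
    return (trial.memoryMb / 1024) * (trial.avgDurationMs / 1000);
}

// Cheapest configuration whose average duration stays within the latency budget,
// or undefined when no trial meets the budget.
function cheapestWithinLatency(
    trials: MemoryTrial[],
    maxDurationMs: number,
): MemoryTrial | undefined {
    return trials
        .filter((t) => t.avgDurationMs <= maxDurationMs)
        .reduce<MemoryTrial | undefined>(
            (best, t) => (best === undefined || gbSeconds(t) < gbSeconds(best) ? t : best),
            undefined,
        );
}

// Hypothetical measurements
const trials: MemoryTrial[] = [
    { memoryMb: 256, avgDurationMs: 800 },
    { memoryMb: 512, avgDurationMs: 350 },
    { memoryMb: 1024, avgDurationMs: 200 },
    { memoryMb: 2048, avgDurationMs: 150 },
];

console.log(cheapestWithinLatency(trials, 400)?.memoryMb); // 512
console.log(cheapestWithinLatency(trials, 250)?.memoryMb); // 1024
```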

Success Builds Momentum

Summary: Mastering Serverless Migration

We've covered the complete lifecycle of serverless migration—from initial assessment through sustained post-migration success. Let's consolidate the key principles:

Key Takeaways

•Assess systematically — Use the suitability matrix to prioritize migration candidates by fit and complexity
•Choose the right pattern — Strangler Fig for incremental migration, Parallel Run for validation, Blue-Green for atomic cutover
•Define success upfront — Establish measurable criteria for performance, cost, reliability, and operations
•Plan dependencies — Map all integration points and plan data migration strategy
•Execute gradually — Canary deployment, gradual traffic shifting, feature flags for control
•Mitigate risks explicitly — Identify risks, plan mitigation, maintain rollback capability throughout
•Manage organizational change — Address stakeholder concerns, invest in skill development, celebrate progress
•Validate and iterate — Formal validation period, post-migration retrospective, continuous optimization
