
Monolith vs Microservices vs Modular Monolith

Level: Intermediate

Duration: 75 mins

Topic: Monolith vs Microservices vs Modular Monolith


Evolution Path — From Monolith to Microservices

Architecture as a Journey

The most successful technology companies in the world—Amazon, Netflix, Uber, LinkedIn, Twitter—didn't start with microservices. They started with monoliths. They evolved their architectures incrementally, driven by concrete problems rather than theoretical ideals. This evolution is not a sign of initial failure; it's a sign of responsive engineering.

Architecture isn't a destination; it's a journey. The right architecture for a startup finding product-market fit is different from the right architecture for a scale-up handling explosive growth, which is different from the right architecture for an enterprise optimizing operational efficiency.

Understanding how and when to evolve your architecture—and equally importantly, when not to—is among the most valuable skills a senior engineer or architect can develop.

What You Will Learn

By the end of this page, you will understand the natural evolution stages of software architecture, the specific triggers that indicate evolution is needed, proven strategies for incremental migration, common anti-patterns in architectural evolution, and real-world case studies of successful (and unsuccessful) transformations.

The Natural Evolution Stages

Software systems tend to evolve through predictable stages as organizations grow. Understanding these stages helps you anticipate needs and plan transitions.

Architecture Evolution Stages
Stage	Team Size	Traffic Scale	Typical Architecture	Primary Concern
Early Startup	2-5 developers	< 1K DAU	Simple monolith	Ship features fast, find product-market fit
Growing Startup	5-15 developers	1K-100K DAU	Structured monolith	Maintain velocity, avoid Big Ball of Mud
Scale-Up	15-50 developers	100K-1M DAU	Modular monolith	Team independence, scalability planning
Growth Company	50-200 developers	1M-10M DAU	Selective extraction	Extract high-scale components, maintain stability
Enterprise	200+ developers	10M+ DAU	Microservices/Hybrid	Full team autonomy, sophisticated operations

Stage 1: The Simple Monolith (MVP Phase)

At this stage, speed to market is everything. The team is tiny, the domain is still being discovered, and pivots are likely. The architecture should be the simplest thing that works:

Single codebase, single database, single deployment
Minimal ceremony, maximum flexibility
Code structure follows intuition, not rigid patterns
Technical debt is acceptable—you may throw this away

Stage 2: The Structured Monolith (Product-Market Fit)

You've found something that works. Users are growing. Key hires are joining. Now the codebase needs to support multiple developers without chaos:

Clear layered or feature-based structure
Coding standards and code review introduced
Basic testing infrastructure
Intentional technical debt reduction

Stage 3: The Modular Monolith (Scale-Up Phase)

The team is growing. Different product areas have dedicated people. Coordination is becoming a bottleneck:

Explicit module boundaries enforced
Module ownership assigned to teams
Database tables owned by modules
Architecture tests prevent boundary violations
Event-based internal communication where appropriate
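
As a rough illustration of what an architecture test can check, the sketch below flags cross-module imports that are either not permitted or that bypass a module's public API. It assumes the import edges have already been extracted from the source tree by some scanner (not shown); the module names and the dependency policy are hypothetical.

```typescript
// Illustrative only: module names and the dependency policy are assumptions.
interface ImportEdge {
    fromModule: string;    // module that contains the importing file
    toModule: string;      // module that owns the imported file
    viaPublicApi: boolean; // true when the import goes through the target's public entry point
}

// For each module, the list of modules it is allowed to depend on.
type DependencyPolicy = { [moduleName: string]: string[] };

function findBoundaryViolations(edges: ImportEdge[], policy: DependencyPolicy): string[] {
    const violations: string[] = [];
    for (const edge of edges) {
        // Imports inside a single module are never a boundary violation.
        if (edge.fromModule === edge.toModule) continue;

        const permitted = policy[edge.fromModule] || [];
        if (permitted.indexOf(edge.toModule) === -1) {
            violations.push(`${edge.fromModule} -> ${edge.toModule}: dependency not allowed`);
        } else if (!edge.viaPublicApi) {
            violations.push(`${edge.fromModule} -> ${edge.toModule}: bypasses the public API`);
        }
    }
    return violations;
}

const policy: DependencyPolicy = {
    orders: ["catalog", "payments"],
    payments: [],
    catalog: [],
};

const edges: ImportEdge[] = [
    { fromModule: "orders", toModule: "catalog", viaPublicApi: true },   // allowed
    { fromModule: "orders", toModule: "payments", viaPublicApi: false }, // reaches into internals
    { fromModule: "payments", toModule: "orders", viaPublicApi: true },  // not in the policy
];

const violations = findBoundaryViolations(edges, policy);
for (const violation of violations) {
    console.log(violation);
}
```

Running a check like this in CI turns the module boundaries into something the build enforces rather than a convention people have to remember.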

Stage 4: Selective Extraction (Growth Phase)

Specific components face challenges the monolith can't address—extreme scale, different technology needs, isolation requirements:

Extract high-priority modules to services
Maintain monolith for most functionality
Build shared infrastructure (observability, deployment)
Hybrid architecture is the norm

Stage 5: Microservices/SOA (Enterprise Phase)

Full organizational and architectural independence. The platform is mature, teams are numerous, and specialized needs are common:

Multiple independent services with defined contracts
Sophisticated platform infrastructure
Dedicated platform teams
Clear organizational boundaries matching service boundaries

Not Every Company Needs Stage 5

Many successful companies stabilize at Stage 3 or Stage 4. A full microservices architecture is usually justified only for organizations with hundreds of developers and extreme scale requirements. Basecamp and HEY run on monoliths, and Shopify runs primarily on a modular monolith, despite serving millions of users.

Evolution Triggers — When to Evolve

Architectural evolution should be driven by concrete problems, not theoretical desires. Here are the specific signals that indicate evolution is warranted:

Signals to Evolve from Simple to Structured Monolith

•New developers take more than a week to make their first meaningful contribution
•Changes in one area frequently break unrelated functionality
•Developers are afraid to refactor because the blast radius is unknown
•Test coverage is low because the code isn't testable in isolation
•Code duplication is rampant because nobody knows what utilities exist

Signals to Evolve from Structured to Modular Monolith

•Team coordination is becoming a bottleneck—features block on unrelated work
•Different product areas need different release cadences
•The codebase is too large for any one developer to understand
•Merge conflicts are common despite developers working on 'different' features
•Database schema changes affect multiple unrelated components

Signals to Evolve from Modular Monolith to Selective Extraction

•A specific module needs 5-10x more compute resources than others
•A module has different failure tolerance requirements (e.g., payments must be highly available)
•A team needs a different technology stack that's genuinely better for their problem
•The shared database is becoming a scaling bottleneck
•Deployment risk is too high because all features deploy together
•The build/test cycle is so slow it significantly impacts developer productivity

Non-Triggers: When NOT to Evolve

The following are NOT sufficient reasons to evolve architecture: 'Netflix does microservices,' 'Microservices are the modern approach,' 'We want to use Kubernetes,' 'We hired someone who knows microservices,' 'We want to put microservices on our resume.' These lead to complexity without benefit.

The Pain Threshold Principle:

Evolution should happen when the pain of the current state exceeds the cost of transition. If you can articulate specific problems that are materially impacting productivity, reliability, or scalability—and those problems would be solved by the new architecture—then evolution is justified.

The Reversibility Consideration:

Evolutions toward more complexity are hard to reverse. Extracting a service from a monolith is difficult but reversible. Decomposing a monolith into 50 services is extremely hard to reverse. This asymmetry should make you conservative about large-scale evolution.

The Strangler Fig Pattern

The Strangler Fig Pattern (named by Martin Fowler after the tropical strangler fig tree) is the most important technique for incremental architectural evolution. Instead of rewriting from scratch, you gradually route traffic from the old system to new components until the old system can be removed.

How It Works:

1. Identify the Component to Extract — Choose a well-bounded module with clear inputs and outputs.
2. Build the New System Alongside the Old — The new service runs in parallel, but isn't serving production traffic yet.
3. Introduce a Routing Layer — A facade or API gateway that can direct traffic to either the old or new system based on configuration.
4. Migrate Traffic Gradually — Route 1% of traffic to the new system. Monitor. Increase to 10%, 50%, 100%. At each stage, compare behavior.
5. Remove the Old Component — Once 100% of traffic is on the new system and stable, remove the old code.

Phase 1: Old system handles all traffic

    [Client] → [Old Monolith (Component A + B + C)]

Phase 2: Facade introduced, traffic still to old

    [Client] → [Facade] → [Old Monolith (A + B + C)]

Phase 3: Some traffic routed to new service

    [Client] → [Facade] →→→ [New Service A]
                       ╲→→→ [Old Monolith (A + B + C)]  ← (old A still serves the remaining traffic)

Phase 4: All traffic for A goes to the new service, old A removed

    [Client] → [Facade] →→→ [New Service A]
                       ╲→→→ [Old Monolith (B + C)]
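
Gradual traffic migration works best when routing is deterministic: a given user should land on the same system on every request, and raising the percentage should only add users to the new system, never move anyone back. A minimal sketch of one way to do that, using a hash of the user ID to assign a stable bucket. The function names and the choice of FNV-1a are assumptions made for illustration, not something the pattern prescribes.

```typescript
// FNV-1a 32-bit hash: small, dependency-free, and stable across processes.
function fnv1a(input: string): number {
    let hash = 0x811c9dc5;
    for (let i = 0; i < input.length; i++) {
        hash ^= input.charCodeAt(i);
        hash = Math.imul(hash, 0x01000193);
    }
    return hash >>> 0; // coerce to an unsigned 32-bit integer
}

// Map a user to a stable bucket in [0, 100). Including the flag name means
// different rollouts do not always pick the same users first.
function bucketFor(userId: string, flagName: string): number {
    return fnv1a(`${flagName}:${userId}`) % 100;
}

// A user goes to the new system when their bucket is below the rollout percentage.
// Raising the percentage therefore only ever adds users to the new system.
function routeToNewSystem(userId: string, flagName: string, rolloutPercent: number): boolean {
    return bucketFor(userId, flagName) < rolloutPercent;
}

const target = routeToNewSystem("customer-42", "new-order-service", 25) ? "new" : "old";
console.log(`customer-42 is served by the ${target} system`);
```

Keeping users sticky also makes comparisons cleaner, because a single user's session is not split across two implementations.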

Strangler Fig Advantages

•Reversible — At any point, you can route traffic back to the old system. Failed migrations don't cause outages.
•Incremental — You migrate one component at a time, learning and adjusting. No big bang risk.
•De-risked — Each stage is validated in production with real traffic before proceeding.
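•Delivery Continues — The old system keeps running, so features can keep shipping during the transition. Changes to the component being migrated must be applied to both the old and new implementations until cutover.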
•Measurable — You can compare performance, error rates, and behavior between old and new.

Strangler Facade Implementation
TypeScript
// Strangler facade that routes to old or new system
class OrderServiceFacade {
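    // Collaborators are assumed to be injected elsewhere (constructor omitted for brevity).
    private featureFlag: FeatureFlagService;
    private legacyOrderService: LegacyOrderService;
    private newOrderService: NewOrderService;
    private metrics: MetricsClient;  // hypothetical type; used for the fallback counter
    private logger: Logger;          // hypothetical type; used by the shadow comparison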
    
    async createOrder(request: CreateOrderRequest): Promise<Order> {
        // Check if this user/request should use new system
        const useNewService = await this.featureFlag.isEnabled(
            'new-order-service',
            {
                userId: request.customerId,
                percentage: 25  // Currently at 25% rollout
            }
        );
        
        if (useNewService) {
            try {
                const order = await this.newOrderService.createOrder(request);
                
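                // Optional: compare against a read-only simulation on the old system.
                // Deliberately not awaited so the comparison never delays the response;
                // compareShadowResult catches its own errors.
                void this.compareShadowResult(request, order);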
                
                return order;
            } catch (error) {
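                // Fall back to the old system if the new one fails.
                // Caution: if the new service failed after partially completing,
                // this retry can create a duplicate order unless createOrder is idempotent.
                await this.metrics.increment('new-order-service.fallback');
                return this.legacyOrderService.createOrder(request);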
            }
        } else {
            return this.legacyOrderService.createOrder(request);
        }
    }
    
    // Compare results between old and new (during validation phase)
    private async compareShadowResult(
        request: CreateOrderRequest, 
        newResult: Order
    ): Promise<void> {
        try {
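            // Ask the old system to simulate the same request (read-only, no side effects)
            const oldResult = await this.legacyOrderService.simulateOrder(request);
            
            // Log differences for analysis (compareOrders is a helper, not shown here,
            // assumed to return a list of differing fields)
            const differences = this.compareOrders(oldResult, newResult);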
            if (differences.length > 0) {
                await this.logger.warn('Order result mismatch', { 
                    request, 
                    differences 
                });
            }
        } catch (error) {
            // Shadow comparison failures don't affect production
            await this.logger.error('Shadow comparison failed', { error });
        }
    }
}

The Shadow Traffic Pattern

During migration, you can send 'shadow traffic'—the same requests sent to both old and new systems, with only the old system's response used. This lets you compare behavior without any production risk. Log differences for analysis before enabling real traffic routing.
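
A minimal sketch of a shadow-traffic wrapper, under the assumption that the shadow call has no side effects (for writes, the new system would need a dry-run mode or an isolated datastore). The names `createShadowingHandler` and `onMismatch` are made up for this example. The caller always receives the primary (old) system's response; the shadow call runs in the background and its failures are swallowed.

```typescript
interface Mismatch<Req> {
    request: Req;
    primary: string; // serialized response from the old system
    shadow: string;  // serialized response from the new system
}

interface ShadowedResult<Res> {
    response: Res;             // always the primary system's response
    shadowDone: Promise<void>; // settles when the background comparison finishes
}

function createShadowingHandler<Req, Res>(
    primary: (req: Req) => Promise<Res>,
    shadow: (req: Req) => Promise<Res>,
    onMismatch: (mismatch: Mismatch<Req>) => void
): (req: Req) => Promise<ShadowedResult<Res>> {
    return async (req: Req) => {
        const response = await primary(req);

        // Start the shadow call without awaiting it, so it cannot delay the caller.
        const shadowDone = Promise.resolve()
            .then(() => shadow(req))
            .then((shadowResponse) => {
                // JSON comparison is a simplification: it is sensitive to key order.
                const a = JSON.stringify(response);
                const b = JSON.stringify(shadowResponse);
                if (a !== b) onMismatch({ request: req, primary: a, shadow: b });
            })
            .catch(() => {
                // Shadow failures must never affect production traffic.
            });

        return { response, shadowDone };
    };
}
```

In a real handler only `response` would be returned to the client; `shadowDone` is exposed here so tests and background workers can wait for the comparison.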

Data Migration Strategies

The hardest part of service extraction is usually data migration. The new service needs its own database, but splitting data from a shared schema is complex and risky.

Data Migration Approaches

•Shared Database (Transition) — Initially, the new service reads from the same database as the monolith. This simplifies the initial extraction. Later, you migrate to a separate database.
•Database-per-Service (Target) — The new service has its own database from the start. Data is synchronized from the monolith during transition.
•Double-Write Pattern — During transition, writes go to both databases. Once the new database has all historical data and is verified, the old writes stop.
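•Event-Driven Sync — The monolith publishes change events; the new service consumes them to build its own database. Eventually consistent but highly decoupled. (This is event-based replication rather than event sourcing in the strict sense, where the event log itself is the source of truth.)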
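
For the event-based approach, the consumer on the new service's side has to tolerate duplicate and out-of-order deliveries, since most message brokers offer at-least-once delivery. A small in-memory sketch of such a consumer follows. The event shape, and in particular the per-order `version` counter, is an assumption; a real implementation would persist both the rows and the processed event IDs.

```typescript
interface OrderEvent {
    eventId: string; // unique per published event
    orderId: string;
    version: number; // assumed to increase with every change to the order
    status: string;
}

interface OrderRow {
    orderId: string;
    version: number;
    status: string;
}

class OrderProjection {
    private rows = new Map<string, OrderRow>();
    private processedEventIds = new Set<string>();

    // Returns true when the event changed the stored row.
    apply(event: OrderEvent): boolean {
        // Duplicate delivery: applying the same event twice must be a no-op.
        if (this.processedEventIds.has(event.eventId)) return false;
        this.processedEventIds.add(event.eventId);

        // Out-of-order delivery: never overwrite newer state with older state.
        const current = this.rows.get(event.orderId);
        if (current !== undefined && current.version >= event.version) return false;

        this.rows.set(event.orderId, {
            orderId: event.orderId,
            version: event.version,
            status: event.status,
        });
        return true;
    }

    get(orderId: string): OrderRow | undefined {
        return this.rows.get(orderId);
    }
}
```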

The Double-Write Migration Strategy (Detailed):

Phase 1: New Service Reads from Monolith DB
- Deploy new service with read access to monolith schema
- Route traffic to new service for reads
- Monolith still owns writes
Phase 2: Create New Service's Database
- Set up new service's dedicated database
- Backfill historical data from monolith
- Establish replication or CDC from monolith
Phase 3: Double-Write Period
- All writes go to both databases
- New service reads from its own database
- Verify data consistency between databases
Phase 4: Cutover
- Stop writes to monolith tables
- New service is source of truth
- Monolith calls new service for data (if needed)
Phase 5: Cleanup
- Remove replication
- Archive or delete monolith tables
- Remove double-write code
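
The phases above can be summarized as a routing table that states, for each phase, which store is read from and which stores receive writes. The sketch below is one possible encoding; the phase identifiers and the `legacyStillWritten` helper are invented for illustration. Reading Phase 2 as "writes still go only to the monolith while the backfill runs" is an interpretation of the list above.

```typescript
type Store = "legacy" | "new";

type MigrationPhase =
    | "shared-read"   // Phase 1: new service reads the monolith's schema
    | "backfill"      // Phase 2: new database created and backfilled
    | "double-write"  // Phase 3: every write goes to both databases
    | "cutover"       // Phase 4: new database is the source of truth
    | "cleanup";      // Phase 5: legacy tables and double-write code removed

interface DataRouting {
    readFrom: Store;
    writeTo: Store[];
}

const ROUTING: { [P in MigrationPhase]: DataRouting } = {
    "shared-read":  { readFrom: "legacy", writeTo: ["legacy"] },
    "backfill":     { readFrom: "legacy", writeTo: ["legacy"] },
    "double-write": { readFrom: "new",    writeTo: ["legacy", "new"] },
    "cutover":      { readFrom: "new",    writeTo: ["new"] },
    "cleanup":      { readFrom: "new",    writeTo: ["new"] },
};

function routingFor(phase: MigrationPhase): DataRouting {
    return ROUTING[phase];
}

// While the legacy store still receives writes, switching reads back to it is a
// configuration change. After cutover it would mean losing recent data.
function legacyStillWritten(phase: MigrationPhase): boolean {
    return ROUTING[phase].writeTo.indexOf("legacy") !== -1;
}

console.log(routingFor("double-write"));
```

Encoding the phase as configuration rather than as code branches means moving forward (or back, before cutover) does not require a deployment.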

Double-Write Pattern Example
TypeScript
// During migration: write to both databases
class OrderService {
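    // Collaborators are assumed to be injected elsewhere (constructor omitted for brevity).
    private legacyDb: LegacyDatabase;
    private newDb: NewOrderDatabase;
    private migrationConfig: MigrationConfig;
    private logger: Logger;            // hypothetical type; used for legacy write warnings
    private alerting: AlertingClient;  // hypothetical type; used for failure thresholds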
    
    async createOrder(request: CreateOrderRequest): Promise<Order> {
        // Primary write to new database
        const order = await this.newDb.orders.create({
            data: this.transformToNewSchema(request)
        });
        
        // Secondary write to legacy database (for monolith compatibility)
        if (this.migrationConfig.doubleWriteEnabled) {
            try {
                await this.legacyDb.orders.create({
                    data: this.transformToLegacySchema(request, order.id)
                });
            } catch (error) {
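                // Log but don't fail: legacy is no longer the source of truth.
                // The two writes are not atomic, so a failure here leaves the stores
                // out of sync until verifyDataConsistency (below) flags the order.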
                await this.logger.warn('Legacy write failed', { 
                    orderId: order.id, 
                    error 
                });
                
                // Alert if this happens consistently
                await this.alerting.checkThreshold('legacy-write-failures');
            }
        }
        
        return order;
    }
    
    // Consistency verification job (runs periodically)
    async verifyDataConsistency(): Promise<ConsistencyReport> {
        const newOrders = await this.newDb.orders.findMany({
            // Orders created in the last hour
            where: { createdAt: { gte: new Date(Date.now() - 60 * 60 * 1000) } }
        });
        
        const discrepancies: Discrepancy[] = [];
        
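        for (const newOrder of newOrders) {
            // One lookup per order keeps the example simple; batch these for large volumes
            const legacyOrder = await this.legacyDb.orders.findById(newOrder.id);
            
            // A missing legacy row (e.g. after a failed legacy write) is also a discrepancy
            if (!legacyOrder || !this.areEquivalent(newOrder, legacyOrder)) {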
                discrepancies.push({
                    orderId: newOrder.id,
                    newOrder,
                    legacyOrder,
                    differences: this.diff(newOrder, legacyOrder)
                });
            }
        }
        
        return { discrepancies, checkedCount: newOrders.length };
    }
}

Data Migration is the Hard Part

Code migration is comparatively straightforward: the new service reimplements logic that already exists. Data migration is where things break. Foreign key relationships, implicit dependencies, and unexpected queries from other parts of the monolith all surface during data migration. Plan extensively and test thoroughly.

Common Evolution Anti-Patterns

Architectural evolution is fraught with pitfalls. Learning from common failures helps you avoid them.

Evolution Anti-Patterns to Avoid

•Big Bang Rewrite — Attempting to rewrite the entire system at once. This takes longer than estimated, diverges from the running system, and often fails completely. The strangler pattern exists because big bang rewrites fail.
•Extracting Everything at Once — Deciding 'we're going microservices' and decomposing the entire monolith simultaneously. You're now debugging 50 things instead of 1. Extract one service, stabilize, learn, repeat.
•Distributed Monolith Creation — Extracting services without proper data separation or API design. Services that must deploy together, share databases, or have circular dependencies are worse than a monolith.
•Premature Extraction — Extracting services before understanding domain boundaries. You'll draw boundaries wrong, causing high coupling. Get boundaries right in a modular monolith first.
•Ignoring Organizational Readiness — Extracting services without DevOps maturity, observability, or team autonomy. You need operational capability before operational complexity.
•Technology-Driven Migration — Migrating because you want to use a new technology, not because you have problems to solve. Technology serves architecture, not vice versa.
•Underestimating Data Migration — Assuming data migration is a weekend task. It's typically the longest, riskiest phase. Plan months, not days.
•Abandoning the Old System Too Early — Removing the fallback before the new system is proven. Keep the old system running and rollback-ready until you're confident.
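
One symptom of a distributed monolith, circular dependencies between services, can be detected mechanically. The sketch below runs a depth-first search over a service dependency graph and returns the first cycle it finds. The service names are hypothetical, and in practice the graph would have to come from tracing data or service manifests.

```typescript
// Maps each service to the services it calls.
type DependencyGraph = { [service: string]: string[] };

// Returns one dependency cycle (e.g. ["a", "b", "a"]) or null if the graph is acyclic.
function findDependencyCycle(graph: DependencyGraph): string[] | null {
    const state = new Map<string, "visiting" | "done">();
    const path: string[] = [];

    function visit(service: string): string[] | null {
        const seen = state.get(service);
        if (seen === "done") return null;
        if (seen === "visiting") {
            // We came back to a service that is still on the current path: a cycle.
            return path.slice(path.indexOf(service)).concat(service);
        }

        state.set(service, "visiting");
        path.push(service);
        for (const dependency of graph[service] || []) {
            const cycle = visit(dependency);
            if (cycle !== null) return cycle;
        }
        path.pop();
        state.set(service, "done");
        return null;
    }

    for (const service of Object.keys(graph)) {
        const cycle = visit(service);
        if (cycle !== null) return cycle;
    }
    return null;
}

const serviceGraph: DependencyGraph = {
    orders: ["payments"],
    payments: ["notifications"],
    notifications: ["orders"], // closes the loop
    catalog: [],
};

const cycle = findDependencyCycle(serviceGraph);
console.log(cycle === null ? "no cycles" : cycle.join(" -> "));
```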

The 'Second System Effect':

Fred Brooks identified the 'Second System Effect'—when redesigning a system, there's a tendency to overcomplicate. All the features you couldn't fit in v1, all the 'perfect' patterns you've since learned, all get crammed into v2. The result is an over-engineered mess that takes forever to build.

Mitigation: Set strict scope for the extracted service. It should do exactly what the old component did—no more. Enhancements come later, once the new system is stable.

The Most Expensive Anti-Pattern

The big bang rewrite is one of the most common causes of failed multi-year projects. Netscape rewrote its browser from scratch and lost the browser wars: the new codebase took years to ship, and the product never caught up with its competitors. Always prefer incremental evolution.

Real-World Evolution Case Studies

Let's examine how major technology companies evolved their architectures, learning from both successes and challenges.

Amazon: From Monolith to Services (2001-2006)

Amazon's evolution is legendary. In the early 2000s, their monolithic architecture was hitting scaling limits. Jeff Bezos issued the famous 'API Mandate':

All teams will expose their data and functionality through service interfaces
Teams must communicate through these interfaces
All interfaces must be designed to be externalizable
No exceptions

Key Lessons from Amazon:

Evolution was driven by organizational scale (thousands of developers)
The mandate was organizational first, technical second
It took years, not months
Services enabled AWS—internal infrastructure became external products

Netflix: Cloud Migration and Microservices (2008-2016)

Netflix's datacenter failure in 2008 triggered migration to AWS and service decomposition. Their evolution:

2008-2010: Migrated non-critical systems to AWS, learned cloud operations
2010-2012: Decomposed monolith into hundreds of services
2012-2016: Built sophisticated platform tooling (Eureka, Hystrix, Zuul)
Ongoing: Continuous evolution of service mesh and infrastructure

Key Lessons from Netflix:

Crisis (datacenter failure) was the trigger
They built extensive tooling before scaling microservices
Open-sourced tools benefited the industry and hiring
Operational capability was built in parallel with decomposition

Segment: Microservices to Monolith (The Reverse Evolution)

Segment is a famous counter-example. They adopted microservices early, hit operational complexity limits, and re-consolidated into a modular monolith.

Their Experience:

Initial microservices seemed right for a data pipeline company
Operational overhead was enormous for their team size
Services were too granular—high network overhead, complex debugging
Reconsolidation improved velocity and reliability

Key Lessons from Segment:

Microservices have team size minimums
Premature decomposition is costly
It's possible (and sometimes correct) to evolve backward
Modular monolith can serve scale with less complexity

Evolution Case Study Comparison
Company	Starting Point	Trigger	Evolution Direction	Duration	Key Insight
Amazon	Monolith	Organizational scale	Services	5+ years	Organizational mandate preceded technical change
Netflix	Datacenter monolith	Datacenter failure	Cloud microservices	8+ years	Build operational capability first
Segment	Microservices	Operational overhead	Modular monolith	~2 years	Reverse evolution is valid
Spotify	Monolith	Team scaling	Microservices + tribes	5+ years	Organizational model + architecture
Shopify	Monolith	Scale challenges	Modular monolith	Ongoing	Monolith can scale with discipline

The Common Thread

Every successful evolution was incremental, trigger-driven, and accompanied by organizational change. No successful evolution was 'let's do microservices because it's modern.' The architecture served specific, articulated needs.

The Evolution Playbook

A practical playbook for architecting evolution in your organization:

Phase 1: Assessment

•Articulate the Problem — What specific pain are you experiencing? Write it down. Be specific. 'We can't scale' is too vague. 'Our order processing can't handle Black Friday traffic and requires 2-week deployment cycles' is specific.
•Quantify the Cost — How much is this problem costing? Developer time? Lost revenue? Customer impact? You need this to justify the evolution investment.
•Identify Alternatives — Is there a simpler solution? Can you vertically scale? Optimize the hot path? Add caching? Evolution should be the last resort, not the first.
•Assess Organizational Readiness — Do you have DevOps maturity? Observability? Team autonomy? If not, start there—architecture change without operational capability fails.

Phase 2: Preparation

•Establish Boundaries — If not already modular, make the monolith modular first. You can't extract what isn't bounded.
•Build Infrastructure — Set up observability, CI/CD, and deployment capability for services before extracting them.
•Select the First Extraction — Choose a module that is relatively independent, causes significant pain, and has clear inputs/outputs.
•Define Success Criteria — How will you know the extraction succeeded? Latency targets? Error rates? Deployment frequency? Define before you start.
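
Success criteria are easier to hold yourself to when they are written down as data before the work starts. A minimal sketch follows; the metric names and target values are placeholders for illustration, not recommendations.

```typescript
interface Criterion {
    metric: string;
    comparison: "at-most" | "at-least";
    target: number;
}

interface CriterionResult {
    metric: string;
    met: boolean;
    reason: string;
}

function evaluateCriteria(
    criteria: Criterion[],
    measured: { [metric: string]: number | undefined }
): CriterionResult[] {
    return criteria.map((criterion) => {
        const value = measured[criterion.metric];
        // A criterion nobody measured counts as not met, never as a silent pass.
        if (value === undefined) {
            return { metric: criterion.metric, met: false, reason: "not measured" };
        }
        const met = criterion.comparison === "at-most"
            ? value <= criterion.target
            : value >= criterion.target;
        return {
            metric: criterion.metric,
            met,
            reason: `${value} (target: ${criterion.comparison} ${criterion.target})`,
        };
    });
}

// Placeholder targets, agreed before the extraction starts.
const criteria: Criterion[] = [
    { metric: "p99LatencyMs", comparison: "at-most", target: 300 },
    { metric: "errorRatePercent", comparison: "at-most", target: 0.1 },
    { metric: "deploysPerWeek", comparison: "at-least", target: 5 },
];

const results = evaluateCriteria(criteria, { p99LatencyMs: 240, errorRatePercent: 0.3 });
for (const result of results) {
    console.log(`${result.metric}: ${result.met ? "met" : "NOT met"} - ${result.reason}`);
}
```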

Phase 3: Execution

•Build the New Service — Implement the extracted functionality in a new service. Don't add features—just replicate behavior.
•Implement the Facade — Create the routing layer that can direct traffic to old or new. Start with 0% to new.
•Shadow Traffic — Send traffic to both, compare results, but only use old responses. Find discrepancies.
•Gradual Rollout — 1%, 10%, 25%, 50%, 100%. Monitor at each stage. Roll back if issues arise.
•Data Migration — Transfer ownership to the new service's database. Verify consistency.
•Remove Old Code — Once stable, delete the old module from the monolith.
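
The gradual rollout step can be expressed as a small decision function: advance to the next stage while the new service is healthy, and drop back to 0% when it is not. In this sketch the stage list mirrors the percentages above, while the health rule (new error rate within a tolerance of the old one) and the default tolerance are assumptions chosen for illustration.

```typescript
const ROLLOUT_STAGES = [1, 10, 25, 50, 100];

interface StageMetrics {
    newErrorRate: number; // fraction of failed requests on the new service, e.g. 0.01
    oldErrorRate: number; // same measure on the old system over the same window
}

// Returns the rollout percentage to use next.
function nextRolloutPercent(
    current: number,
    metrics: StageMetrics,
    tolerance: number = 0.001
): number {
    // At 0% the new service has no traffic to judge, so start at the first stage.
    if (current === 0) return ROLLOUT_STAGES[0];

    const index = ROLLOUT_STAGES.indexOf(current);
    if (index === -1) throw new Error(`Unknown rollout stage: ${current}`);

    // Roll back completely if the new service is measurably worse than the old one.
    if (metrics.newErrorRate > metrics.oldErrorRate + tolerance) return 0;

    // Otherwise advance one stage, staying at 100 once it is reached.
    return ROLLOUT_STAGES[Math.min(index + 1, ROLLOUT_STAGES.length - 1)];
}

const healthy: StageMetrics = { newErrorRate: 0.010, oldErrorRate: 0.010 };
const degraded: StageMetrics = { newErrorRate: 0.020, oldErrorRate: 0.010 };

console.log(nextRolloutPercent(10, healthy));  // advances to the next stage
console.log(nextRolloutPercent(25, degraded)); // rolls back
```

A real controller would also require a minimum observation window and sample size at each stage before advancing.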

Phase 4: Retrospective

•Measure Against Success Criteria — Did you achieve the goals? If not, why?
•Document Lessons Learned — What would you do differently? What took longer than expected?
•Decide Next Steps — Do you need to extract more? Is the pain resolved? Should you stop?
•Update Organizational Processes — What team, operational, or process changes are needed for the new architecture?

The Stop Question

After each extraction, ask: 'Is our problem solved?' If yes, stop. Many organizations continue extracting past the point of benefit. Hybrid architectures—modular monolith with a few extracted services—are common and appropriate.

Summary: Architecture as Evolution

We've explored how software architectures evolve over time—the natural stages, triggers, strategies, and pitfalls.

Key Takeaways

•Architecture evolves through stages — Simple monolith → Structured monolith → Modular monolith → Selective extraction → Microservices. Not every organization needs every stage.
•Evolve based on triggers, not trends — Specific problems (scaling, team coordination, reliability) justify evolution. 'It's modern' does not.
•The Strangler Fig Pattern is essential — Incremental, reversible migration beats big bang rewrites. Route traffic gradually, and keep the old path available as a fallback until the new one is proven.
•Data migration is the hard part — Double-write patterns, consistency verification, and careful cutover are required. Plan extensively.
•Anti-patterns are common and costly — Big bang rewrites, premature extraction, distributed monoliths, and ignoring organizational readiness cause failures.
•Learn from real-world cases — Amazon, Netflix, Segment, and others offer lessons. Successful evolution is slow, deliberate, and tied to organizational change.
•Use the playbook — Assess, prepare, execute, retrospect. Measure success criteria. Know when to stop.
•Hybrid architectures are valid — Most mature systems are neither pure monolith nor pure microservices. They're hybrids that fit organizational reality.

What's Next:

We've covered the three architectural patterns and how to evolve between them. The final page in this module provides a decision framework—a practical guide for choosing the right architecture for your specific context.

Page Complete

You now understand software architecture as an evolutionary journey, not a one-time decision. You can identify when evolution is warranted, apply proven migration strategies, avoid common pitfalls, and learn from real-world case studies.

4 / 5

Loading learning content...

System Design (HLD)Monolith vs Microservices vs Modular Monolith

Monolith vs Microservices vs Modular Monolith

LevelIntermediate

Duration75 mins

TopicMonolith vs Microservices vs Modular Monolith

4 / 5

Evolution Path — From Monolith to Microservices

Architecture as a Journey

Understanding how and when to evolve your architecture—and equally importantly, when not to—is among the most valuable skills a senior engineer or architect can develop.

What You Will Learn

The Natural Evolution Stages

Software systems tend to evolve through predictable stages as organizations grow. Understanding these stages helps you anticipate needs and plan transitions.

Architecture Evolution Stages
Stage	Team Size	Traffic Scale	Typical Architecture	Primary Concern
Early Startup	2-5 developers	< 1K DAU	Simple monolith	Ship features fast, find product-market fit
Growing Startup	5-15 developers	10K-100K DAU	Structured monolith	Maintain velocity, avoid Big Ball of Mud
Scale-Up	15-50 developers	100K-1M DAU	Modular monolith	Team independence, scalability planning
Growth Company	50-200 developers	1M-10M DAU	Selective extraction	Extract high-scale components, maintain stability
Enterprise	200+ developers	10M+ DAU	Microservices/Hybrid	Full team autonomy, sophisticated operations

Stage 1: The Simple Monolith (MVP Phase)

At this stage, speed to market is everything. The team is tiny, the domain is still being discovered, and pivots are likely. The architecture should be the simplest thing that works:

Single codebase, single database, single deployment
Minimal ceremony, maximum flexibility
Code structure follows intuition, not rigid patterns
Technical debt is acceptable—you may throw this away

Stage 2: The Structured Monolith (Product-Market Fit)

You've found something that works. Users are growing. Key hires are joining. Now the codebase needs to support multiple developers without chaos:

Clear layered or feature-based structure
Coding standards and code review introduced
Basic testing infrastructure
Intentional technical debt reduction

Stage 3: The Modular Monolith (Scale-Up Phase)

The team is growing. Different product areas have dedicated people. Coordination is becoming a bottleneck:

Explicit module boundaries enforced
Module ownership assigned to teams
Database tables owned by modules
Architecture tests prevent boundary violations
Event-based internal communication where appropriate

Stage 4: Selective Extraction (Growth Phase)

Specific components face challenges the monolith can't address—extreme scale, different technology needs, isolation requirements:

Extract high-priority modules to services
Maintain monolith for most functionality
Build shared infrastructure (observability, deployment)
Hybrid architecture is the norm

Stage 5: Microservices/SOA (Enterprise Phase)

Full organizational and architectural independence. The platform is mature, teams are numerous, and specialized needs are common:

Multiple independent services with defined contracts
Sophisticated platform infrastructure
Dedicated platform teams
Clear organizational boundaries matching service boundaries

Not Every Company Needs Stage 5

Evolution Triggers — When to Evolve

Architectural evolution should be driven by concrete problems, not theoretical desires. Here are the specific signals that indicate evolution is warranted:

Signals to Evolve from Simple to Structured Monolith

•New developers take more than a week to make their first meaningful contribution
•Changes in one area frequently break unrelated functionality
•Developers are afraid to refactor because the blast radius is unknown
•Test coverage is low because the code isn't testable in isolation
•Code duplication is rampant because nobody knows what utilities exist

Signals to Evolve from Structured to Modular Monolith

•Team coordination is becoming a bottleneck—features block on unrelated work
•Different product areas need different release cadences
•The codebase is too large for any one developer to understand
•Merge conflicts are common despite developers working on 'different' features
•Database schema changes affect multiple unrelated components

Signals to Evolve from Modular Monolith to Selective Extraction

•A specific module needs 5-10x more compute resources than others
•A module has different failure tolerance requirements (e.g., payments must be highly available)
•A team needs a different technology stack that's genuinely better for their problem
•The shared database is becoming a scaling bottleneck
•Deployment risk is too high because all features deploy together
•The build/test cycle is so slow it significantly impacts developer productivity

Non-Triggers: When NOT to Evolve

The Pain Threshold Principle:

The Reversibility Consideration:

The Strangler Fig Pattern

How It Works:

Identify the Component to Extract — Choose a well-bounded module with clear inputs and outputs.
Build the New System Alongside the Old — The new service runs in parallel, but isn't serving production traffic yet.
Introduce a Routing Layer — A facade or API gateway that can direct traffic to either the old or new system based on configuration.
Migrate Traffic Gradually — Route 1% of traffic to the new system. Monitor. Increase to 10%, 50%, 100%. At each stage, compare behavior.
Remove the Old Component — Once 100% of traffic is on the new system and stable, remove the old code.

Phase 1: Old system handles all traffic

    [Client] → [Old Monolith (Component A + B + C)]

Phase 2: Facade introduced, traffic still to old

    [Client] → [Facade] → [Old Monolith (A + B + C)]

Phase 3: Some traffic routed to new service

    [Client] → [Facade] →→→ [New Service A]
                       ╲→→→ [Old Monolith (A + B + C)]  ← (A still exists but idle)

Phase 4: All traffic to new service, old A removed

    [Client] → [Facade] →→→ [New Service A]
                       ╲→→→ [Old Monolith (B + C)]

Strangler Fig Advantages

•Reversible — At any point, you can route traffic back to the old system. Failed migrations don't cause outages.
•Incremental — You migrate one component at a time, learning and adjusting. No big bang risk.
•De-risked — Each stage is validated in production with real traffic before proceeding.
•Continues Delivery — The old system keeps running. Features can still ship to both systems during transition.
•Measurable — You can compare performance, error rates, and behavior between old and new.
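
To make the "Measurable" point concrete, here is a small sketch that compares the traffic served by the old system with the traffic served by the new one. The type names, the `compareCohorts` function, and the limits are all illustrative assumptions; real thresholds depend on your service-level objectives.

```typescript
interface CohortStats {
  requests: number;
  errors: number;
  p95LatencyMs: number;
}

interface Limits {
  maxErrorRateIncrease: number; // absolute increase, e.g. 0.001 = 0.1 percentage points
  maxLatencyRatio: number;      // new p95 divided by old p95
}

interface Comparison {
  errorRateIncrease: number;
  latencyRatio: number;
  acceptable: boolean;
}

const errorRate = (s: CohortStats): number =>
  s.requests === 0 ? 0 : s.errors / s.requests;

// Compares the new system's cohort against the legacy cohort.
function compareCohorts(legacy: CohortStats, candidate: CohortStats, limits: Limits): Comparison {
  const errorRateIncrease = errorRate(candidate) - errorRate(legacy);
  const latencyRatio =
    legacy.p95LatencyMs === 0 ? Infinity : candidate.p95LatencyMs / legacy.p95LatencyMs;
  const acceptable =
    candidate.requests > 0 && // no data means no verdict in favour of the new system
    errorRateIncrease <= limits.maxErrorRateIncrease &&
    latencyRatio <= limits.maxLatencyRatio;
  return { errorRateIncrease, latencyRatio, acceptable };
}

// Hypothetical numbers for one rollout stage.
const limits: Limits = { maxErrorRateIncrease: 0.001, maxLatencyRatio: 1.2 };
const healthy = compareCohorts(
  { requests: 9000, errors: 18, p95LatencyMs: 200 },
  { requests: 1000, errors: 2, p95LatencyMs: 210 },
  limits,
);
const degraded = compareCohorts(
  { requests: 9000, errors: 18, p95LatencyMs: 200 },
  { requests: 1000, errors: 30, p95LatencyMs: 400 },
  limits,
);
```

In this example `healthy.acceptable` is true and `degraded.acceptable` is false, which is the kind of signal you would check before increasing the rollout percentage.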

Strangler Facade Implementation
TypeScript
// Strangler facade that routes each request to the old or the new system.
// FeatureFlagService, LegacyOrderService, NewOrderService, MetricsClient, and
// Logger stand for application-specific interfaces defined elsewhere.
class OrderServiceFacade {
    constructor(
        private readonly featureFlag: FeatureFlagService,
        private readonly legacyOrderService: LegacyOrderService,
        private readonly newOrderService: NewOrderService,
        private readonly metrics: MetricsClient,
        private readonly logger: Logger,
    ) {}

    async createOrder(request: CreateOrderRequest): Promise<Order> {
        // Check if this user/request should use the new system
        const useNewService = await this.featureFlag.isEnabled(
            'new-order-service',
            {
                userId: request.customerId,
                percentage: 25  // Currently at 25% rollout
            }
        );

        if (!useNewService) {
            return this.legacyOrderService.createOrder(request);
        }

        let order: Order;
        try {
            order = await this.newOrderService.createOrder(request);
        } catch (error) {
            // Fall back to the old system if the new one fails.
            // Caution: if the new service failed after persisting the order,
            // this retry can create a duplicate; an idempotency key shared by
            // both systems avoids that.
            await this.metrics.increment('new-order-service.fallback');
            return this.legacyOrderService.createOrder(request);
        }

        // Optional shadow comparison against the old system. Deliberately not
        // awaited, so it adds no latency; it handles its own errors.
        void this.compareShadowResult(request, order);

        return order;
    }

    // Compare results between old and new (during validation phase)
    private async compareShadowResult(
        request: CreateOrderRequest,
        newResult: Order
    ): Promise<void> {
        try {
            // Ask the old system for a read-only simulation (no order is written)
            const oldResult = await this.legacyOrderService.simulateOrder(request);

            // Log differences for analysis; compareOrders (not shown) returns
            // a list of field-level differences
            const differences = this.compareOrders(oldResult, newResult);
            if (differences.length > 0) {
                await this.logger.warn('Order result mismatch', {
                    request,
                    differences
                });
            }
        } catch (error) {
            // Shadow comparison failures don't affect production
            await this.logger.error('Shadow comparison failed', { error });
        }
    }
}

The Shadow Traffic Pattern
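
As the playbook later in this page puts it, shadow traffic means sending requests to both systems, comparing the results, and using only the old system's responses. The sketch below illustrates that idea; `ShadowRouter` and its members are hypothetical names, and it assumes the shadowed call has no side effects (or writes to an isolated store), since otherwise every request would be applied twice.

```typescript
type Handler<Req, Res> = (req: Req) => Promise<Res>;

interface Mismatch<Req, Res> {
  request: Req;
  primary: Res;
  shadow: Res | undefined; // undefined when the shadow call failed
  error?: string;
}

class ShadowRouter<Req, Res> {
  readonly mismatches: Mismatch<Req, Res>[] = [];
  private inFlight: Promise<void>[] = [];
  private readonly primary: Handler<Req, Res>;
  private readonly shadow: Handler<Req, Res>;
  private readonly equal: (a: Res, b: Res) => boolean;

  constructor(
    primary: Handler<Req, Res>, // old system: its response is returned to the caller
    shadow: Handler<Req, Res>,  // new system: its response is only compared
    equal: (a: Res, b: Res) => boolean,
  ) {
    this.primary = primary;
    this.shadow = shadow;
    this.equal = equal;
  }

  async handle(req: Req): Promise<Res> {
    const primaryResult = await this.primary(req);
    // The comparison runs in the background so it cannot slow down or fail the caller.
    this.inFlight.push(this.compare(req, primaryResult));
    return primaryResult;
  }

  private async compare(req: Req, primaryResult: Res): Promise<void> {
    try {
      const shadowResult = await this.shadow(req);
      if (!this.equal(primaryResult, shadowResult)) {
        this.mismatches.push({ request: req, primary: primaryResult, shadow: shadowResult });
      }
    } catch (err) {
      this.mismatches.push({ request: req, primary: primaryResult, shadow: undefined, error: String(err) });
    }
  }

  // Waits for outstanding comparisons; useful in tests and on shutdown.
  async drain(): Promise<void> {
    await Promise.all(this.inFlight);
    this.inFlight = [];
  }
}
```

A real implementation would bound the list of in-flight comparisons and ship mismatches to a log or metrics system instead of keeping them in memory; the in-memory array here only keeps the sketch self-contained.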

Data Migration Strategies

The hardest part of service extraction is usually data migration. The new service needs its own database, but splitting data from a shared schema is complex and risky.

Data Migration Approaches

•Shared Database (Transition) — Initially, the new service reads from the same database as the monolith. This simplifies the initial extraction. Later, you migrate to a separate database.
•Database-per-Service (Target) — The new service has its own database from the start. Data is synchronized from the monolith during transition.
•Double-Write Pattern — During transition, writes go to both databases. Once the new database has all historical data and is verified, the old writes stop.
•Event-Driven Sync — The monolith publishes change events; the new service consumes them to build its own database. The result is eventually consistent but highly decoupled.
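
The event-based approach in the last bullet can be illustrated with a small consumer that builds the new service's copy of the data. This is a sketch under assumptions: the `OrderEvent` shape, the per-order `version` counter, and the in-memory maps are all hypothetical stand-ins for a real message broker and database. It shows the two properties such a consumer usually needs: tolerating duplicate deliveries and ignoring events that arrive out of order.

```typescript
interface OrderEvent {
  eventId: string; // unique per event, used for de-duplication
  orderId: string;
  version: number; // increases with every change to the same order
  status: string;
}

interface OrderRow {
  status: string;
  version: number;
}

class OrderProjection {
  private readonly rows = new Map<string, OrderRow>();
  private readonly seenEventIds = new Set<string>();

  // Applies an event; returns true if it changed the stored state.
  apply(event: OrderEvent): boolean {
    if (this.seenEventIds.has(event.eventId)) return false; // duplicate delivery
    this.seenEventIds.add(event.eventId);

    const current = this.rows.get(event.orderId);
    if (current !== undefined && event.version <= current.version) {
      return false; // stale event that arrived after a newer one
    }
    this.rows.set(event.orderId, { status: event.status, version: event.version });
    return true;
  }

  get(orderId: string): OrderRow | undefined {
    return this.rows.get(orderId);
  }
}

// Usage: events may arrive twice or out of order.
const projection = new OrderProjection();
projection.apply({ eventId: "e1", orderId: "o1", version: 1, status: "created" });
projection.apply({ eventId: "e1", orderId: "o1", version: 1, status: "created" }); // ignored
projection.apply({ eventId: "e3", orderId: "o1", version: 3, status: "shipped" });
projection.apply({ eventId: "e2", orderId: "o1", version: 2, status: "paid" });    // ignored (stale)
```

After these four calls the projection holds order `o1` at version 3 with status `shipped`, regardless of the duplicate and the late arrival.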

The Double-Write Migration Strategy (Detailed):

Phase 1: New Service Reads from Monolith DB
- Deploy new service with read access to monolith schema
- Route traffic to new service for reads
- Monolith still owns writes
Phase 2: Create New Service's Database
- Set up new service's dedicated database
- Backfill historical data from monolith
- Establish replication or CDC from monolith
Phase 3: Double-Write Period
- All writes go to both databases
- New service reads from its own database
- Verify data consistency between databases
Phase 4: Cutover
- Stop writes to monolith tables
- New service is source of truth
- Monolith calls new service for data (if needed)
Phase 5: Cleanup
- Remove replication
- Archive or delete monolith tables
- Remove double-write code
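
One way to keep these phases explicit in code is a small configuration that states, for each phase, where the new service reads from and which stores receive writes. The sketch below is an illustration only; the phase names, `routingFor`, and `nextPhase` are hypothetical, and the consistency gate is an assumption about how you might guard the cutover.

```typescript
type MigrationPhase =
  | "read-from-monolith" // Phase 1
  | "backfill"           // Phase 2
  | "double-write"       // Phase 3
  | "cutover"            // Phase 4
  | "cleanup";           // Phase 5

type Store = "monolith" | "service";

interface DataRouting {
  readFrom: Store;  // database the new service reads from
  writeTo: Store[]; // databases that receive every write
}

const PHASE_ORDER: readonly MigrationPhase[] = [
  "read-from-monolith", "backfill", "double-write", "cutover", "cleanup",
];

function routingFor(phase: MigrationPhase): DataRouting {
  switch (phase) {
    case "read-from-monolith":
    case "backfill":
      return { readFrom: "monolith", writeTo: ["monolith"] };
    case "double-write":
      return { readFrom: "service", writeTo: ["monolith", "service"] };
    case "cutover":
    case "cleanup":
      return { readFrom: "service", writeTo: ["service"] };
  }
}

// Moves to the next phase, but refuses to leave double-write
// until the two databases have been verified as consistent.
function nextPhase(phase: MigrationPhase, consistencyVerified: boolean): MigrationPhase {
  if (phase === "double-write" && !consistencyVerified) return phase;
  const index = PHASE_ORDER.indexOf(phase);
  return PHASE_ORDER[Math.min(index + 1, PHASE_ORDER.length - 1)] ?? phase;
}
```

Driving reads and writes from a single phase value makes a rollback a configuration change (set the phase back) rather than a code change.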

Double-Write Pattern Example
TypeScript
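// During migration: write to both databases.
// LegacyDatabase, NewOrderDatabase, MigrationConfig, Logger, and AlertingClient
// stand for application-specific types; helpers such as transformToNewSchema,
// transformToLegacySchema, areEquivalent, and diff are not shown.
class OrderService {
    constructor(
        private readonly legacyDb: LegacyDatabase,
        private readonly newDb: NewOrderDatabase,
        private readonly migrationConfig: MigrationConfig,
        private readonly logger: Logger,
        private readonly alerting: AlertingClient,
    ) {}

    async createOrder(request: CreateOrderRequest): Promise<Order> {
        // Primary write to new database
        const order = await this.newDb.orders.create({
            data: this.transformToNewSchema(request)
        });

        // Secondary write to legacy database (for monolith compatibility).
        // The two writes are not atomic, so the verification job below is
        // what catches any drift between the databases.
        if (this.migrationConfig.doubleWriteEnabled) {
            try {
                await this.legacyDb.orders.create({
                    data: this.transformToLegacySchema(request, order.id)
                });
            } catch (error) {
                // Log but don't fail: legacy is no longer the source of truth
                await this.logger.warn('Legacy write failed', {
                    orderId: order.id,
                    error
                });

                // Alert if this happens consistently
                await this.alerting.checkThreshold('legacy-write-failures');
            }
        }

        return order;
    }

    // Consistency verification job (runs periodically)
    async verifyDataConsistency(): Promise<ConsistencyReport> {
        const oneHourAgo = new Date(Date.now() - 60 * 60 * 1000);
        const newOrders = await this.newDb.orders.findMany({
            where: { createdAt: { gte: oneHourAgo } }
        });

        const discrepancies: Discrepancy[] = [];

        // One lookup per order keeps the example simple; batch the lookups
        // if the hourly volume is large.
        for (const newOrder of newOrders) {
            const legacyOrder = await this.legacyDb.orders.findById(newOrder.id);

            // A missing legacy row (failed secondary write) is also a
            // discrepancy, so diff must tolerate a missing legacy order.
            if (!legacyOrder || !this.areEquivalent(newOrder, legacyOrder)) {
                discrepancies.push({
                    orderId: newOrder.id,
                    newOrder,
                    legacyOrder,
                    differences: this.diff(newOrder, legacyOrder)
                });
            }
        }

        return { discrepancies, checkedCount: newOrders.length };
    }
}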

Data Migration is the Hard Part

Common Evolution Anti-Patterns

Architectural evolution is fraught with pitfalls. Learning from common failures helps you avoid them.

Evolution Anti-Patterns to Avoid

•Big Bang Rewrite — Attempting to rewrite the entire system at once. This takes longer than estimated, diverges from the running system, and often fails completely. The strangler pattern exists because big bang rewrites fail.
•Extracting Everything at Once — Deciding 'we're going microservices' and decomposing the entire monolith simultaneously. You're now debugging 50 things instead of 1. Extract one service, stabilize, learn, repeat.
•Distributed Monolith Creation — Extracting services without proper data separation or API design. Services that must deploy together, share databases, or have circular dependencies are worse than a monolith.
•Premature Extraction — Extracting services before understanding domain boundaries. You'll draw boundaries wrong, causing high coupling. Get boundaries right in a modular monolith first.
•Ignoring Organizational Readiness — Extracting services without DevOps maturity, observability, or team autonomy. You need operational capability before operational complexity.
•Technology-Driven Migration — Migrating because you want to use a new technology, not because you have problems to solve. Technology serves architecture, not vice versa.
•Underestimating Data Migration — Assuming data migration is a weekend task. It's typically the longest, riskiest phase. Plan months, not days.
•Abandoning the Old System Too Early — Removing the fallback before the new system is proven. Keep the old system running and rollback-ready until you're confident.
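
One of the distributed-monolith symptoms named above, circular dependencies between services, is easy to check mechanically once you have a map of which service calls which. The following is a minimal sketch using depth-first search; `findCycle` and the example service names are hypothetical, and how you obtain the dependency map (tracing data, service manifests, and so on) is left open.

```typescript
type DependencyGraph = Map<string, string[]>; // service -> services it calls

// Returns one dependency cycle as a path (first and last element equal), or null if none exists.
function findCycle(graph: DependencyGraph): string[] | null {
  const state = new Map<string, "visiting" | "done">();
  const path: string[] = [];

  const visit = (node: string): string[] | null => {
    if (state.get(node) === "done") return null;
    if (state.get(node) === "visiting") {
      // We came back to a node on the current path: that closes a cycle.
      return [...path.slice(path.indexOf(node)), node];
    }
    state.set(node, "visiting");
    path.push(node);
    for (const dependency of graph.get(node) ?? []) {
      const cycle = visit(dependency);
      if (cycle !== null) return cycle;
    }
    path.pop();
    state.set(node, "done");
    return null;
  };

  for (const node of graph.keys()) {
    const cycle = visit(node);
    if (cycle !== null) return cycle;
  }
  return null;
}

// Usage with hypothetical services.
const tangled: DependencyGraph = new Map([
  ["orders", ["payments"]],
  ["payments", ["inventory"]],
  ["inventory", ["orders"]],
]);
const layered: DependencyGraph = new Map([
  ["orders", ["payments", "inventory"]],
  ["payments", ["ledger"]],
  ["inventory", []],
]);
const tangledCycle = findCycle(tangled); // orders -> payments -> inventory -> orders
const layeredCycle = findCycle(layered); // null
```

A check like this can run in CI against a declared dependency list, so that a new cycle is caught when it is introduced rather than during an incident.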

The 'Second System Effect':

Mitigation: Set strict scope for the extracted service. It should do exactly what the old component did—no more. Enhancements come later, once the new system is stable.

The Most Expensive Anti-Pattern

Real-World Evolution Case Studies

Let's examine how major technology companies evolved their architectures, learning from both successes and challenges.

Amazon: From Monolith to Services (2001-2006)

Amazon's evolution is legendary. In the early 2000s, their monolithic architecture was hitting scaling limits. Jeff Bezos issued the famous 'API Mandate':

•All teams will expose their data and functionality through service interfaces
•Teams must communicate through these interfaces
•All interfaces must be designed to be externalizable
•No exceptions

Key Lessons from Amazon:

•Evolution was driven by organizational scale (thousands of developers)
•The mandate was organizational first, technical second
•It took years, not months
•Services enabled AWS: internal infrastructure became external products

Netflix: Cloud Migration and Microservices (2008-2016)

A major outage in Netflix's own datacenter in 2008, caused by database corruption, triggered the migration to AWS and the decomposition into services. Their evolution:

•2008-2010: Migrated non-critical systems to AWS, learned cloud operations
•2010-2012: Decomposed monolith into hundreds of services
•2012-2016: Built sophisticated platform tooling (Eureka, Hystrix, Zuul)
•Ongoing: Continuous evolution of service mesh and infrastructure

Key Lessons from Netflix:

•Crisis (datacenter failure) was the trigger
•They built extensive tooling before scaling microservices
•Open-sourced tools benefited the industry and hiring
•Operational capability was built in parallel with decomposition

Segment: Microservices to Monolith (The Reverse Evolution)

Segment is a famous counter-example. They adopted microservices early, hit operational complexity limits, and re-consolidated into a modular monolith.

Their Experience:

•Initial microservices seemed right for a data pipeline company
•Operational overhead was enormous for their team size
•Services were too granular, with high network overhead and complex debugging
•Reconsolidation improved velocity and reliability

Key Lessons from Segment:

•Microservices have team size minimums
•Premature decomposition is costly
•It's possible (and sometimes correct) to evolve backward
•Modular monolith can serve scale with less complexity

Evolution Case Study Comparison
Company	Starting Point	Trigger	Evolution Direction	Duration	Key Insight
Amazon	Monolith	Organizational scale	Services	5+ years	Organizational mandate preceded technical change
Netflix	Datacenter monolith	Datacenter failure	Cloud microservices	8+ years	Build operational capability first
Segment	Microservices	Operational overhead	Modular monolith	~2 years	Reverse evolution is valid
Spotify	Monolith	Team scaling	Microservices + tribes	5+ years	Organizational model + architecture
Shopify	Monolith	Scale challenges	Modular monolith	Ongoing	Monolith can scale with discipline

The Common Thread

The Evolution Playbook

A practical playbook for evolving the architecture in your organization:

Phase 1: Assessment

•Articulate the Problem — What specific pain are you experiencing? Write it down. Be specific. 'We can't scale' is too vague. 'Our order processing can't handle Black Friday traffic and requires 2-week deployment cycles' is specific.
•Quantify the Cost — How much is this problem costing? Developer time? Lost revenue? Customer impact? You need this to justify the evolution investment.
•Identify Alternatives — Is there a simpler solution? Can you vertically scale? Optimize the hot path? Add caching? Evolution should be the last resort, not the first.
•Assess Organizational Readiness — Do you have DevOps maturity? Observability? Team autonomy? If not, start there—architecture change without operational capability fails.
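
For the "Quantify the Cost" step, even a rough calculation helps. The sketch below estimates what a slow build/test cycle costs per year in developer time. Every input value is a made-up example, and `annualWaitCost` is a hypothetical helper; substitute your own measurements.

```typescript
interface BuildCostInputs {
  developers: number;
  buildsPerDeveloperPerDay: number;
  minutesWaitedPerBuild: number;
  workingDaysPerYear: number;
  loadedCostPerHour: number; // fully loaded cost of one developer hour, in your currency
}

// Hours per year that developers spend waiting on builds.
function annualWaitHours(i: BuildCostInputs): number {
  const minutesPerDay = i.developers * i.buildsPerDeveloperPerDay * i.minutesWaitedPerBuild;
  return (minutesPerDay / 60) * i.workingDaysPerYear;
}

// The same figure expressed as money.
function annualWaitCost(i: BuildCostInputs): number {
  return annualWaitHours(i) * i.loadedCostPerHour;
}

// Hypothetical example: 30 developers, 6 builds a day, 10 minutes of waiting each.
const example: BuildCostInputs = {
  developers: 30,
  buildsPerDeveloperPerDay: 6,
  minutesWaitedPerBuild: 10,
  workingDaysPerYear: 220,
  loadedCostPerHour: 90,
};
const hours = annualWaitHours(example); // 6600 hours per year
const cost = annualWaitCost(example);   // 594000 per year
```

With these example inputs the wait adds up to 30 hours per day across the team, or 6,600 hours per year. A figure like that gives you something to weigh against the estimated cost of the migration itself, and against cheaper alternatives such as build caching.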

Phase 2: Preparation

•Establish Boundaries — If not already modular, make the monolith modular first. You can't extract what isn't bounded.
•Build Infrastructure — Set up observability, CI/CD, and deployment capability for services before extracting them.
•Select the First Extraction — Choose a module that is relatively independent, causes significant pain, and has clear inputs/outputs.
•Define Success Criteria — How will you know the extraction succeeded? Latency targets? Error rates? Deployment frequency? Define before you start.
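
Success criteria are easiest to hold yourself to when they are written down as data before the extraction starts. A minimal sketch, assuming three example criteria taken from the last bullet (latency, error rate, deployment frequency); the field names, the `evaluate` function, and the target values are hypothetical.

```typescript
interface SuccessCriteria {
  maxP99LatencyMs: number;
  maxErrorRate: number;      // fraction of failed requests, e.g. 0.001 = 0.1%
  minDeploysPerWeek: number;
}

interface ObservedMetrics {
  p99LatencyMs: number;
  errorRate: number;
  deploysPerWeek: number;
}

// Returns a list of unmet criteria; an empty list means the extraction met its goals.
function evaluate(criteria: SuccessCriteria, observed: ObservedMetrics): string[] {
  const failures: string[] = [];
  if (observed.p99LatencyMs > criteria.maxP99LatencyMs) {
    failures.push(`p99 latency ${observed.p99LatencyMs}ms exceeds ${criteria.maxP99LatencyMs}ms`);
  }
  if (observed.errorRate > criteria.maxErrorRate) {
    failures.push(`error rate ${observed.errorRate} exceeds ${criteria.maxErrorRate}`);
  }
  if (observed.deploysPerWeek < criteria.minDeploysPerWeek) {
    failures.push(`${observed.deploysPerWeek} deploys/week is below ${criteria.minDeploysPerWeek}`);
  }
  return failures;
}

// Hypothetical targets agreed before the extraction began.
const criteria: SuccessCriteria = { maxP99LatencyMs: 300, maxErrorRate: 0.001, minDeploysPerWeek: 5 };

const met = evaluate(criteria, { p99LatencyMs: 250, errorRate: 0.0004, deploysPerWeek: 8 });
const unmet = evaluate(criteria, { p99LatencyMs: 450, errorRate: 0.0004, deploysPerWeek: 2 });
```

Here `met` is an empty list, while `unmet` reports two failures (latency and deployment frequency). The same structure can be reused in the retrospective to measure the result against what was promised.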

Phase 3: Execution

•Build the New Service — Implement the extracted functionality in a new service. Don't add features—just replicate behavior.
•Implement the Facade — Create the routing layer that can direct traffic to old or new. Start with 0% to new.
•Shadow Traffic — Send traffic to both, compare results, but only use old responses. Find discrepancies.
•Gradual Rollout — 1%, 10%, 25%, 50%, 100%. Monitor at each stage. Roll back if issues arise.
•Data Migration — Transfer ownership to the new service's database. Verify consistency.
•Remove Old Code — Once stable, delete the old module from the monolith.
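
The gradual rollout step can be expressed as a tiny state machine over the percentages listed above. This is a sketch, not a prescription: `nextRolloutPercent` is a hypothetical helper, and the choice to roll all the way back to 0% on any health problem is an assumption (stepping back one stage is an equally valid policy).

```typescript
// Stages from the playbook, starting at 0% as described for the facade.
const ROLLOUT_STAGES: readonly number[] = [0, 1, 10, 25, 50, 100];

// Returns the percentage of traffic the new service should receive next.
function nextRolloutPercent(current: number, healthy: boolean): number {
  if (!healthy) return 0; // roll back: route everything to the old system
  const next = ROLLOUT_STAGES.find((stage) => stage > current);
  return next ?? 100; // already at the final stage
}

// Usage: walk through a rollout in which every health check passes.
const history: number[] = [];
let percent = 0;
while (percent < 100) {
  percent = nextRolloutPercent(percent, true);
  history.push(percent);
}
// history is now [1, 10, 25, 50, 100]
```

In practice the `healthy` flag would come from monitoring at each stage, for example by comparing error rates and latency between the two systems before moving on.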

Phase 4: Retrospective

•Measure Against Success Criteria — Did you achieve the goals? If not, why?
•Document Lessons Learned — What would you do differently? What took longer than expected?
•Decide Next Steps — Do you need to extract more? Is the pain resolved? Should you stop?
•Update Organizational Processes — What team, operational, or process changes are needed for the new architecture?

The Stop Question

Summary: Architecture as Evolution

We've explored how software architectures evolve over time—the natural stages, triggers, strategies, and pitfalls.

Key Takeaways

•Architecture evolves through stages — Simple monolith → Structured monolith → Modular monolith → Selective extraction → Microservices. Not every organization needs every stage.
•Evolve based on triggers, not trends — Specific problems (scaling, team coordination, reliability) justify evolution. 'It's modern' does not.
•The Strangler Fig Pattern is essential — Incremental, reversible migration beats big bang rewrites. Route traffic gradually, and keep the old path available as a fallback until the new one is proven.
•Data migration is the hard part — Double-write patterns, consistency verification, and careful cutover are required. Plan extensively.
•Anti-patterns are common and costly — Big bang rewrites, premature extraction, distributed monoliths, and ignoring organizational readiness cause failures.
•Learn from real-world cases — Amazon, Netflix, Segment, and others offer lessons. Successful evolution is slow, deliberate, and tied to organizational change.
•Use the playbook — Assess, prepare, execute, retrospect. Measure success criteria. Know when to stop.
•Hybrid architectures are valid — Most mature systems are neither pure monolith nor pure microservices. They're hybrids that fit organizational reality.
