Of all the scaling patterns we've explored, microservices decomposition is the most consequential—and the most frequently misapplied. Unlike database optimization or caching, which can be layered onto existing systems, decomposition reshapes the fundamental architecture of your application. It affects how teams work, how code is organized, how systems are deployed, and how failures propagate.
Microservices have become almost synonymous with "modern architecture," leading many organizations to adopt them prematurely. The result: distributed monoliths that combine the complexity of distributed systems with the tight coupling of monoliths. The worst of both worlds.
This page provides a rigorous framework for understanding when decomposition is appropriate, how to identify service boundaries, and the patterns that make microservices work. The goal isn't to advocate for microservices—it's to equip you with the judgment to make the right architectural decision for your context.
By the end of this page, you will understand the genuine benefits and costs of microservices, how to identify appropriate service boundaries using Domain-Driven Design principles, patterns for decomposing existing monoliths, and the organizational and operational requirements for successful microservices adoption.
Before exploring how to do microservices, we must honestly examine whether to do them at all. Microservices are not universally better—they're a trade-off.
The Genuine Benefits:
Independent deployment: Each service can be deployed without coordinating with others. A bug fix in the payment service doesn't require deploying the entire application.
Technology heterogeneity: Different services can use different languages, frameworks, and databases appropriate to their problem domain.
Team autonomy: Small teams can own services end-to-end, making decisions independently without company-wide coordination.
Isolated scaling: A heavily-loaded service can be scaled without scaling the entire system.
Failure isolation: A failure in one service doesn't necessarily crash the entire system (with proper design).
The Real Costs:
Distributed systems complexity: Network calls fail. Latency is variable. Partial failures occur. These require sophistication to handle correctly.
Operational overhead: Each service needs monitoring, alerting, deployment pipelines, and on-call rotations. The overhead scales with service count.
Testing complexity: Integration tests become essential but difficult. Testing all service interactions exhaustively is often impossible.
Debugging difficulty: A request touches multiple services. Tracing issues across services requires distributed tracing infrastructure.
Data consistency: Transactions can't span services. Eventual consistency becomes the norm. Business logic must accommodate this.
The honest assessment: For organizations with fewer than ~50 engineers working on a single product, the overhead of microservices often exceeds the benefits. Start with a monolith. Decompose when you have concrete scaling or organizational problems that microservices solve.
A distributed monolith has the worst of both worlds: services that must be deployed together, that share databases, or that have synchronous dependencies creating tight coupling. You pay the distributed systems tax without gaining the benefits. If decomposition doesn't result in truly independent services, you haven't decomposed—you've fragmented.
The most critical decision in microservices architecture is defining service boundaries. Wrong boundaries lead to chatty services, distributed transactions, and all the pain of distributed systems without the benefits.
Domain-Driven Design (DDD) Approach:
The most robust approach to service boundaries comes from Domain-Driven Design's concept of bounded contexts. A bounded context is a boundary within which a particular model is defined and applicable. Different contexts may have different models for the same real-world concept.
Example: "Customer" means different things in different contexts. To Sales, a customer is a lead with contact history and a pipeline stage; to Shipping, a set of delivery addresses and preferences; to Billing, payment methods and outstanding invoices.
Each context should be a candidate for a service boundary.
Identifying Bounded Contexts:
- Listen for language shifts: when the same word ("order", "account") means different things to different teams, you have crossed a context boundary.
- Follow the domain experts: groups that rarely need to talk to each other usually work in separate contexts.
- Trace the data: entities that change together and are queried together belong in the same context.
The Coupling Litmus Test:
A well-defined service boundary exhibits:
Minimal cross-boundary communication: Services should need to talk to each other infrequently. If every request requires calling 5 other services synchronously, boundaries are wrong.
Eventual consistency acceptance: Business logic within the boundary can use transactions, but cross-boundary consistency is eventually consistent. If business rules require immediate consistency across services, they should be in the same service.
Independent data ownership: Each service owns its data exclusively. No shared databases. Other services access data via the service's API, not its database.
Independent deployment: A change in one service shouldn't require changes in others. Contract changes should be additive and backward-compatible.
When in doubt, make services bigger. It's easier to split a service later than to merge services. A service that's too large is merely a monolith—manageable. Services that are too small create excessive network calls and coordination overhead. The sweet spot typically emerges after you've lived with the initial boundaries for a while.
Decomposing an existing monolith is one of the most challenging engineering projects an organization can undertake. The wrong approach can destabilize production, fragment teams, and deliver no benefits. Several strategies have emerged from hard-won experience:
Strategy 1: Strangler Fig Pattern
Named after fig vines that gradually envelop trees, this pattern incrementally replaces monolith functionality. New features are built as services. Existing features are migrated one by one. The monolith gradually shrinks until nothing remains.
How it works:
1. Put a routing facade (proxy or API gateway) in front of the monolith.
2. Build new functionality as services behind the facade.
3. Migrate existing features one at a time, redirecting their routes to the new services.
4. Delete the corresponding monolith code once traffic has fully shifted.
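The heart of the pattern is the routing facade. A minimal sketch, with hypothetical path prefixes and service names: migrated routes go to new services, and everything else still falls through to the monolith.

```typescript
// Routes that have been migrated so far (hypothetical examples).
const migratedRoutes: Record<string, string> = {
  '/payments': 'payment-service',
  '/invoices': 'billing-service',
};

// Resolve which backend should serve a request path.
function resolveBackend(path: string): string {
  for (const [prefix, service] of Object.entries(migratedRoutes)) {
    if (path.startsWith(prefix)) return service;
  }
  return 'monolith'; // default: untouched functionality stays where it is
}
```

Because the mapping is data, migration is a one-line change per route, and rolling back is equally cheap: delete the entry and traffic returns to the monolith.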
Advantages: Low risk. Can pause or reverse at any point. Delivers value incrementally.
Challenges: Requires discipline. Easy to add new features to both places. Migration can stall indefinitely.
| Approach | Risk Level | Duration | Team Impact | Best For |
|---|---|---|---|---|
| Strangler Fig | Low | Months to years | Gradual transition | Production systems with uptime requirements |
| Branch by Abstraction | Medium | Weeks to months | Minimal disruption | Well-architected monoliths |
| Big Bang Rewrite | Very High | Months | Major disruption | Rarely recommended; last resort |
| Product Line Split | Medium | Varies | Team reorganization | Multi-product companies |
Strategy 2: Branch by Abstraction
Create an abstraction layer within the monolith before extracting functionality.
Steps:
1. Create an abstraction (an interface) over the functionality you want to extract.
2. Point all callers inside the monolith at the abstraction.
3. Build a second implementation backed by the new service.
4. Switch implementations behind a feature flag and verify in production.
5. Remove the old implementation and, eventually, the abstraction itself.
Advantage: Change is incremental and reversible at each step.
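A compressed sketch of those steps, using a hypothetical notification seam (the interface, class names, and flag are illustrative). The flag makes the switch reversible: flipping it back restores the legacy path.

```typescript
// Step 1: the abstraction both implementations satisfy.
interface NotificationSender {
  send(userId: string, message: string): string;
}

// Step 2: wrap the existing monolith code behind the interface.
class LegacyNotifier implements NotificationSender {
  send(userId: string, message: string): string {
    return `legacy:${userId}:${message}`; // stands in for old in-process code
  }
}

// Step 3: the new service-backed implementation.
class ServiceNotifier implements NotificationSender {
  send(userId: string, message: string): string {
    return `service:${userId}:${message}`; // stands in for a remote call
  }
}

// Step 4: a flag selects the implementation; callers never change.
function makeNotifier(useNewService: boolean): NotificationSender {
  return useNewService ? new ServiceNotifier() : new LegacyNotifier();
}
```

In practice the flag would come from a configuration service so the cutover (and any rollback) needs no deployment at all.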
Strategy 3: Database Disentanglement
Often the hardest part of decomposition is separating shared data. Services sharing a database are not independent.
Pattern: Shared Database → Eventual Sync. Start with both the monolith and the new service reading the shared tables; then give the service its own store, kept up to date via change events or replication; finally, cut the old access path once all consumers have moved to the service's API.
The temptation to "rewrite properly" is strong but almost always wrong. Rewrites take longer than estimated, reproduce fewer features than planned, and introduce new bugs while missing edge cases the old system handled. The strangler fig approach preserves working code until replacements are proven. Avoid big bang rewrites at nearly all costs.
How services communicate defines the coupling in your architecture. Synchronous calls create temporal coupling; asynchronous patterns enable independence.
Synchronous Patterns:
REST API: HTTP-based, resource-oriented. Simple, widely understood, good tooling. Each call blocks waiting for response.
gRPC: Binary protocol, strong typing, efficient serialization. Better performance than REST, but more complex tooling and less human-readable.
When synchronous makes sense:
- The caller genuinely needs the answer before it can proceed (authorization checks, price quotes).
- The interaction is a simple query against a fast, reliable downstream.
- The call chain is shallow: one hop, not five.
The synchronous trap: Chains of synchronous calls create latency accumulation and cascading failures. If Service A calls B calls C, a C outage takes down A.
Asynchronous Patterns:
Event-driven: Services publish events; interested services subscribe. No direct coupling between producer and consumer.
Message queues: Work queues for task processing with delivery guarantees and retry handling.
When async makes sense:
- The caller doesn't need the result immediately (fulfillment, notifications, analytics).
- Multiple consumers care about the same event.
- The downstream may be slow or temporarily unavailable, and the work can be retried.
```typescript
// ANTI-PATTERN: Synchronous chain (fragile)
class OrderController {
  async createOrder(order: OrderRequest) {
    // Each call adds latency and failure point
    const user = await this.userService.getUser(order.userId);
    const inventory = await this.inventoryService.checkStock(order.items);
    const payment = await this.paymentService.authorize(user, order.total);
    const shipping = await this.shippingService.calculateRate(order.address);
    // If ANY of these fail, the whole request fails
    return this.orderRepository.create({
      ...order,
      user,
      inventory,
      payment,
      shipping,
    });
  }
}

// BETTER: Event-driven with eventual consistency
class OrderController {
  async createOrder(order: OrderRequest) {
    // Validate only what we own
    const orderEntity = await this.orderRepository.create({
      ...order,
      status: 'PENDING_VALIDATION',
    });

    // Publish event - other services react asynchronously
    await this.eventBus.publish('order.created', {
      orderId: orderEntity.id,
      userId: order.userId,
      items: order.items,
      address: order.address,
    });

    // Return immediately - order will be processed eventually
    return { orderId: orderEntity.id, status: 'PROCESSING' };
  }
}

// Each service handles its concern independently
class InventoryService {
  @Subscribe('order.created')
  async handleOrderCreated(event: OrderCreatedEvent) {
    const reserved = await this.reserveInventory(event.items);
    if (reserved) {
      await this.eventBus.publish('inventory.reserved', {
        orderId: event.orderId,
        items: event.items,
      });
    } else {
      await this.eventBus.publish('inventory.reservation_failed', {
        orderId: event.orderId,
        reason: 'INSUFFICIENT_STOCK',
      });
    }
  }
}

// Saga coordinator handles compensation on failures
class OrderSagaCoordinator {
  @Subscribe('inventory.reservation_failed')
  async handleReservationFailed(event: ReservationFailedEvent) {
    // Update order status
    await this.orderRepository.update(event.orderId, { status: 'FAILED' });
    // Notify user
    await this.notificationService.notifyOrderFailed(event.orderId, event.reason);
  }
}
```

Adopting event-driven communication requires rethinking how applications work. Instead of "call a service and wait," the model becomes "publish an event and let interested parties react." This feels less controlled, but it's more resilient. Embrace eventual consistency. Design for the happy path to be fast; handle edge cases through compensating actions.
Data management is often the hardest aspect of microservices. In a monolith, a single database provides transactions, joins, and referential integrity. These conveniences don't exist across service boundaries.
Core Principle: Each Service Owns Its Data
A service's data is private. Other services access it only via the service's API—never by connecting to its database. This enables:
- Independent schema evolution without cross-team coordination.
- Freedom to choose the storage technology that fits the service.
- Clear ownership: the service's API is the only contract other teams depend on.
The Challenge: No Cross-Service Joins
In a monolith, a single query joins across entities:

```sql
SELECT o.*, u.name, p.title
FROM orders o
JOIN users u ON o.user_id = u.id
JOIN products p ON o.product_id = p.id
```
In microservices, this query requires calling three services and joining in application code: fetch the order from the Order service, the user's name from the User service, and the product title from the Product service, then merge the results. This is the API composition pattern.
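A sketch of that composition, with in-memory stubs standing in for the real service clients (the client shapes and sample data are illustrative; real implementations would call over HTTP or gRPC and need error handling for each hop):

```typescript
interface Order { id: string; userId: string; productId: string }
interface User { id: string; name: string }
interface Product { id: string; title: string }

// Stub clients standing in for remote service calls.
const orderClient = { getOrder: (id: string): Order => ({ id, userId: 'u1', productId: 'p1' }) };
const userClient = { getUser: (id: string): User => ({ id, name: 'Ada' }) };
const productClient = { getProduct: (id: string): Product => ({ id, title: 'Keyboard' }) };

// The SQL join now happens in application code: one call per owning service.
function getOrderDetail(orderId: string) {
  const order = orderClient.getOrder(orderId);
  const user = userClient.getUser(order.userId);
  const product = productClient.getProduct(order.productId);
  return { ...order, userName: user.name, productTitle: product.title };
}
```

Note the cost: three network round trips and three failure modes where the monolith had one query. This latency and coupling is what motivates the CQRS pattern below.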
Pattern Deep Dive: CQRS (Command Query Responsibility Segregation)
CQRS separates the write model (commands) from the read model (queries). In microservices this is powerful: each service keeps its own write store, while read-optimized views are built by subscribing to the events those services publish.
Example: An "Order Detail" read model could subscribe to events from Order, User, and Product services, building a denormalized view. Queries hit this view directly—no joins, no cross-service calls, fast reads.
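A minimal sketch of such a projector (the event shapes and view fields are hypothetical). It consumes events from the Order and User services and maintains a denormalized view that queries read directly:

```typescript
// Hypothetical event shapes published by the Order and User services.
type OrderCreated = { type: 'order.created'; orderId: string; userId: string };
type UserRenamed = { type: 'user.renamed'; userId: string; name: string };
type DomainEvent = OrderCreated | UserRenamed;

interface OrderDetailView { orderId: string; userId: string; userName?: string }

class OrderDetailProjector {
  private views = new Map<string, OrderDetailView>();

  // Called by the event-bus subscription for each incoming event.
  apply(event: DomainEvent): void {
    if (event.type === 'order.created') {
      this.views.set(event.orderId, { orderId: event.orderId, userId: event.userId });
    } else if (event.type === 'user.renamed') {
      // Denormalize: copy the user's name into every affected order view.
      for (const view of this.views.values()) {
        if (view.userId === event.userId) view.userName = event.name;
      }
    }
  }

  // Queries hit the prebuilt view: no joins, no cross-service calls.
  get(orderId: string): OrderDetailView | undefined {
    return this.views.get(orderId);
  }
}
```

A production version would persist the view (e.g., in its own table or document store) and handle event replay, but the shape is the same: events in, denormalized view out.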
Trade-offs:
- The read model lags the write model (eventual consistency).
- Data is duplicated across write stores and projections.
- Projection logic and event versioning add operational complexity.
When to use: Query-heavy applications where the latency and coupling of API composition is unacceptable.
The desire for strong consistency across services leads to distributed transactions, which don't work reliably at scale. Accept that cross-service data will be eventually consistent—often within milliseconds, but not immediately. Design UX and business processes to accommodate brief windows of inconsistency. This is the price of truly independent services.
Microservices dramatically increase operational complexity. What was one application to monitor becomes tens or hundreds. The following capabilities become essential:
Observability:
Distributed Tracing — A single user request might touch 10 services. Without tracing, debugging is impossible. Tools like Jaeger, Zipkin, or AWS X-Ray correlate requests across services.
Centralized Logging — Logs from all services must be aggregated for searching and analysis. ELK stack, Datadog, or similar solutions. Include correlation IDs in every log.
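Correlation IDs are simple to implement but easy to forget. A sketch, assuming a conventional `x-correlation-id` header (the header name varies by organization): reuse the caller's ID if present, mint one otherwise, and stamp it into every structured log line so the aggregator can stitch one user action together across services.

```typescript
import { randomUUID } from 'crypto';

const CORRELATION_HEADER = 'x-correlation-id';

// Reuse the upstream caller's ID, or start a new trace at the edge.
function correlationIdFrom(headers: Record<string, string>): string {
  return headers[CORRELATION_HEADER] ?? randomUUID();
}

// Emit structured JSON logs with the ID on every line.
function logWithCorrelation(correlationId: string, message: string): string {
  const line = JSON.stringify({ correlationId, message, ts: new Date().toISOString() });
  console.log(line);
  return line;
}
```

The discipline that matters is propagation: every outbound call a service makes must forward the same header, or the trail goes cold at that hop.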
Metrics and Dashboards — Each service exposes metrics. Aggregated dashboards show system health. Alert on anomalies.
Service Mesh:
For organizations with many services, a service mesh (Istio, Linkerd) provides:
- Mutual TLS between services without application changes.
- Traffic management: retries, timeouts, and canary routing.
- Uniform telemetry for every service-to-service call.
| Capability | Monolith | Microservices |
|---|---|---|
| Deployment | One pipeline | N pipelines (one per service) |
| Monitoring | Single application metrics | Cross-service aggregation + tracing |
| Debugging | Stack traces in one process | Distributed tracing required |
| Testing | Unit + integration tests | Contract testing + E2E essential |
| Security | Perimeter security sufficient | Service-to-service auth required |
| Configuration | Single config source | Distributed config management |
| Team overhead | One rotation, one backlog | Per-service ownership structure |
Successful microservices organizations invest heavily in internal platforms that abstract away infrastructure complexity. Product teams shouldn't manage Kubernetes configurations; they should deploy via 'git push' with sensible defaults. Without this investment, each team reinvents operations, leading to inconsistency and inefficiency.
Microservices are as much an organizational pattern as a technical one. Conway's Law states that organizations design systems that mirror their communication structures. The inverse is also true: adopting microservices requires organizational change.
Team Topologies:
Stream-aligned teams — Own one or more services end-to-end. Responsible for building, deploying, and operating their services. Cross-functional (devs, QA, sometimes ops).
Platform teams — Provide self-service capabilities that stream-aligned teams consume. CI/CD, Kubernetes platform, observability stack, security tools.
Enabling teams — Help stream-aligned teams adopt new capabilities. Short-term embeddings to transfer knowledge, not permanent ownership.
Complicated subsystem teams — Own technically complex components requiring specialist expertise (ML models, cryptography, video encoding).
The ideal: Small, autonomous teams (3-8 people) owning 1-3 services. Clear ownership. End-to-end responsibility. "You build it, you run it."
Service Ownership Principles:
Single owner: Every service has one owning team. Not two. Not a committee. One team makes decisions and is accountable.
End-to-end responsibility: The owning team builds, tests, deploys, monitors, and responds to incidents. Ownership doesn't end at PR merge.
Clear interfaces: Teams communicate through well-defined APIs and events. Changes to contracts require coordination with consumers.
Autonomous decisions: Teams decide implementation details, technology choices (within guardrails), and deployment timing. Autonomy enables speed.
Cross-team coordination mechanisms:
- Published, versioned API contracts with explicit deprecation policies.
- Architecture guilds or RFC processes for cross-cutting decisions.
- Shared platform standards (logging formats, auth, deployment) so teams don't diverge.
Organizations where teams blame each other for incidents, where deployment requires approval chains, or where decisions are centralized will struggle with microservices. The technical architecture assumes cultural norms: psychological safety, ownership mentality, blameless postmortems, and trust in teams. Address culture before architecture.
Learning from failures is essential. These anti-patterns have derailed countless microservices initiatives:
Anti-Pattern 1: Nano-services
Services too small to be meaningful. A service for each database table. A service for each function.
Symptom: Simple operations require calling 10+ services.
Fix: Merge related services. Bounded contexts, not individual entities.
Anti-Pattern 2: Shared Database
Multiple services connect to the same database, reading and writing directly.
Symptom: Changes to database schema require coordinating multiple teams.
Fix: Extract data into service-owned stores. Coordinate during transition, then enforce ownership.
Anti-Pattern 3: Synchronous Chains
A → B → C → D → E, all synchronous calls. Latency accumulates; any failure breaks the chain.
Symptom: Slow user requests; one service failure cascades.
Fix: Event-driven architecture. Async where possible. Circuit breakers for remaining sync calls.
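For the synchronous calls that remain, a circuit breaker keeps one sick service from dragging down its callers. A minimal sketch (thresholds and the fallback strategy are illustrative; production breakers also add a half-open state that probes for recovery):

```typescript
class CircuitBreaker {
  private failures = 0;

  constructor(private maxFailures = 3) {}

  // The breaker "opens" after too many consecutive failures.
  get isOpen(): boolean {
    return this.failures >= this.maxFailures;
  }

  call<T>(fn: () => T, fallback: T): T {
    if (this.isOpen) return fallback; // fail fast instead of piling on
    try {
      const result = fn();
      this.failures = 0; // a success resets the count
      return result;
    } catch {
      this.failures += 1;
      return fallback; // degrade gracefully on this call too
    }
  }
}
```

In the A → B → C chain above, a breaker in A around its call to B means a C outage degrades A's responses rather than taking A down with it.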
The Decomposition Regret Cycle:
1. Enthusiasm drives aggressive decomposition into many small services.
2. Operational and coordination costs surface as the pain mounts.
3. Teams spend months merging services back into coarser boundaries.
This cycle is common. The lesson: start coarser, refine based on actual pain points, not anticipated ones.
Successful microservices feel like this: Teams deploy multiple times daily without coordination. Incidents are contained to single services. Adding new features is faster than before. Teams feel ownership and autonomy. The system is more resilient to failures. If these don't describe your experience, revisit your boundaries and infrastructure.
Microservices decomposition is the culmination of our scaling playbook—the final pattern, applied when organizational and technical pressures make it necessary. Let's consolidate the key learnings:
The Complete Scaling Playbook:
Over this module, we've covered the comprehensive playbook for scaling systems:
1. Database optimization: getting the most from the data layer before adding complexity.
2. Caching layers: reducing load before adding machines.
3. Queue-based architectures: absorbing spikes and decoupling work.
4. Microservices decomposition: the final pattern, applied when organizational and technical pressures demand it.
These patterns build on each other. Apply them in sequence, addressing the simplest applicable pattern before progressing to more complex ones. Most systems never need all patterns—but understanding all of them equips you to make informed decisions.
Congratulations! You've completed the Scaling Playbook module—a comprehensive guide to scaling systems from startup to enterprise scale. You now have the knowledge to optimize databases, implement caching layers, design queue-based architectures, and thoughtfully consider microservices decomposition. This knowledge is the foundation for building systems that handle any scale.