Extracting functionality from a monolith is like performing surgery on a patient who must remain conscious and active throughout the procedure. You're removing pieces of a living, running system while ensuring it continues to serve millions of requests. One wrong cut severs a critical dependency; one forgotten connection leaves orphaned functionality.
This is the most technically demanding phase of the Strangler Fig Pattern. The routing façade gave you the traffic control capability. Now you must identify what to extract, how to extract it cleanly, and how to maintain correctness during the transition.
The fundamental challenge:
Monoliths, especially long-lived ones, develop hidden dependencies—code paths that cross module boundaries, data relationships that span domains, and implicit contracts that nobody documented. Extraction requires making these invisible connections visible, then surgically severing them while establishing new, explicit interfaces.
By the end of this page, you will understand how to identify extraction candidates, techniques for boundary discovery, strategies for dependency management, patterns for data migration, and methods for validating extraction completeness.
Not all parts of a monolith should be extracted at the same time, and some perhaps never should be. The art of successful migration begins with identifying the right first candidates—functionality that will demonstrate value quickly while minimizing risk.
The Ideal First Extraction:
An ideal first candidate has these characteristics:
| Factor | Low Risk (Prefer) | Medium Risk | High Risk (Avoid Initially) |
|---|---|---|---|
| Dependencies | No shared database tables | Read-only access to shared tables | Write access to shared tables |
| Data Ownership | Owns all its data exclusively | Owns some, references others | Heavily intertwined with other domains |
| Team Expertise | Dedicated team understands it well | Mixed ownership, documentation exists | Nobody remembers how it works |
| Change Frequency | Actively developed, well-tested | Occasional changes, moderate tests | Rarely touched, minimal tests |
| Business Criticality | Important but not critical | Core but has redundancy | Single point of failure, revenue-critical |
| Technical Debt | Clean, modular code | Some debt, manageable | Spaghetti code, unclear boundaries |
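One way to make the table above concrete is a rough scoring rubric. This is a minimal sketch, not a standard: the factor names and point weights are illustrative assumptions, and any real rubric should be calibrated to your organization.

```typescript
// A rough scoring rubric derived from the risk table above.
// Factor names and point weights are illustrative assumptions.
type Risk = 'low' | 'medium' | 'high';

interface CandidateAssessment {
  dependencies: Risk;
  dataOwnership: Risk;
  teamExpertise: Risk;
  changeFrequency: Risk;
  businessCriticality: Risk;
  technicalDebt: Risk;
}

// High risk is weighted more than linearly: one 'high' factor should
// outweigh several 'medium' ratings.
const riskPoints: Record<Risk, number> = { low: 0, medium: 1, high: 3 };

// Lower totals suggest better first extraction candidates.
function extractionRiskScore(a: CandidateAssessment): number {
  return Object.values(a).reduce((sum, r) => sum + riskPoints[r], 0);
}

// Example: a notification service that mostly owns its data.
const notifications: CandidateAssessment = {
  dependencies: 'low',
  dataOwnership: 'low',
  teamExpertise: 'medium',
  changeFrequency: 'low',
  businessCriticality: 'low',
  technicalDebt: 'medium',
};
```

Comparing scores across several candidates gives you a defensible ordering for the first few extractions, even if the absolute numbers are arbitrary.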
Common First Extraction Candidates:
Notification Services: Email, SMS, and push notifications are typically loosely coupled and communicate via events. They often have independent scaling needs during campaigns.
Image/File Processing: Uploading, transforming, storing files. Usually isolated with clear data ownership and benefits from specialized scaling.
Search Indexing: Building and maintaining search indices. Often a write-behind process that can be extracted without affecting read paths initially.
Analytics/Reporting: Event processing and report generation. Frequently read-only from business data and benefits from different technology choices.
Authentication/Authorization: If not already centralized, auth is a clear bounded context with well-defined contracts.
Avoid starting with core transaction processing, complex workflow orchestration, or functionality with heavy database coupling. These are high-risk extractions that should come later after you've built migration expertise.
Start at the edges and work inward. Functionality at the edges of your monolith (user-facing APIs, background jobs, integrations) typically has fewer internal dependencies than core business logic. Each extraction peels back another layer, gradually exposing the core for later extraction.
Before you can extract functionality, you must understand its actual boundaries—not what the documentation says, not what architects intended, but what the code actually does. This requires systematic discovery.
Static Analysis Approach:
Use code analysis tools to map dependencies:
```typescript
interface ModuleDependency {
  source: string;
  target: string;
  type: 'import' | 'function-call' | 'database' | 'api' | 'event';
  weight: number; // Frequency or importance
}

interface BoundaryAnalysis {
  internalDependencies: ModuleDependency[];
  externalDependencies: ModuleDependency[];
  dataAccessPatterns: DataAccessPattern[];
  apiSurface: ApiEndpoint[];
  eventDependencies: EventDependency[];
}

class BoundaryDiscovery {
  /**
   * Analyze a proposed extraction boundary
   */
  async analyzeExtractionBoundary(
    candidateModules: string[],
    codebase: Codebase
  ): Promise<BoundaryAnalysis> {
    const allDependencies = await this.mapDependencies(codebase);
    const candidateSet = new Set(candidateModules);

    const internalDependencies: ModuleDependency[] = [];
    const externalDependencies: ModuleDependency[] = [];

    for (const dep of allDependencies) {
      const sourceInCandidate = candidateSet.has(dep.source);
      const targetInCandidate = candidateSet.has(dep.target);

      if (sourceInCandidate && targetInCandidate) {
        // Both ends inside candidate - internal dependency
        internalDependencies.push(dep);
      } else if (sourceInCandidate || targetInCandidate) {
        // One end outside - this crosses the extraction boundary
        externalDependencies.push(dep);
      }
      // Neither in candidate - not relevant to this extraction
    }

    // Analyze data access patterns
    const dataPatterns = await this.analyzeDataAccess(candidateModules, codebase);

    // Map API surface (what external code calls into candidate)
    const apiSurface = await this.mapApiSurface(candidateModules, allDependencies);

    // Map event dependencies (events produced and consumed)
    const eventDeps = await this.mapEventDependencies(candidateModules, codebase);

    return {
      internalDependencies,
      externalDependencies,
      dataAccessPatterns: dataPatterns,
      apiSurface,
      eventDependencies: eventDeps,
    };
  }

  /**
   * Calculate extraction complexity score
   */
  calculateExtractionComplexity(analysis: BoundaryAnalysis): number {
    let score = 0;

    // Each external dependency adds complexity
    score += analysis.externalDependencies.length * 10;

    // Write access to shared tables is very complex
    for (const pattern of analysis.dataAccessPatterns) {
      if (!pattern.isOwnedByCandidate) {
        score += pattern.hasWriteAccess ? 50 : 20;
      }
    }

    // Large API surface means more contracts to maintain
    score += analysis.apiSurface.length * 5;

    // Event dependencies require careful handling
    score += analysis.eventDependencies.length * 8;

    return score;
  }

  /**
   * Generate extraction plan based on analysis
   */
  generateExtractionPlan(analysis: BoundaryAnalysis): ExtractionPlan {
    return {
      // Dependencies that must become API calls
      apisToCreate: analysis.externalDependencies
        .filter(d => d.type === 'function-call')
        .map(d => ({
          from: d.source,
          to: d.target,
          suggestedEndpoint: this.suggestEndpoint(d),
        })),

      // Tables that need to be migrated or accessed via API
      dataMigrations: analysis.dataAccessPatterns
        .filter(p => p.isOwnedByCandidate)
        .map(p => p.tableName),

      // Tables that need API wrappers
      dataApis: analysis.dataAccessPatterns
        .filter(p => !p.isOwnedByCandidate)
        .map(p => ({
          table: p.tableName,
          operations: p.hasWriteAccess ? ['read', 'write'] : ['read'],
        })),

      // Events that become published/subscribed
      eventContracts: analysis.eventDependencies,
    };
  }

  private async mapDependencies(codebase: Codebase): Promise<ModuleDependency[]> {
    // Implementation: static analysis of imports, calls, etc.
    return [];
  }

  private async analyzeDataAccess(
    modules: string[],
    codebase: Codebase
  ): Promise<DataAccessPattern[]> {
    // Implementation: trace database queries from module code
    return [];
  }

  private async mapApiSurface(
    modules: string[],
    dependencies: ModuleDependency[]
  ): Promise<ApiEndpoint[]> {
    // Implementation: find all entry points into candidate modules
    return [];
  }

  private async mapEventDependencies(
    modules: string[],
    codebase: Codebase
  ): Promise<EventDependency[]> {
    // Implementation: find event publications and subscriptions
    return [];
  }

  private suggestEndpoint(dep: ModuleDependency): string {
    return `/api/internal/${dep.target.toLowerCase()}`;
  }
}

interface DataAccessPattern {
  tableName: string;
  isOwnedByCandidate: boolean;
  hasWriteAccess: boolean;
  accessingModules: string[];
}

interface ApiEndpoint {
  path: string;
  method: string;
  consumers: string[];
}

interface EventDependency {
  eventName: string;
  direction: 'publish' | 'subscribe';
  counterparties: string[];
}

interface ExtractionPlan {
  apisToCreate: { from: string; to: string; suggestedEndpoint: string }[];
  dataMigrations: string[];
  dataApis: { table: string; operations: string[] }[];
  eventContracts: EventDependency[];
}

interface Codebase {
  // Abstract representation of codebase for analysis
}
```

Dynamic Analysis Approach:
Static analysis shows what could happen. Dynamic analysis shows what actually happens:
Dynamic analysis often reveals surprising truths: that 'isolated' module that actually gets called by every checkout request, or the 'deprecated' code path that still handles 5% of traffic.
Static analysis will show dependencies on code that's never actually executed. Before spending effort handling a dependency, verify it's actually used in production. Many 'complex' dependencies turn out to be dead code that can simply be removed.
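A minimal sketch of how dynamic verification might work: wrap the candidate's entry points so production calls are counted, then compare those counts against the statically discovered dependency list. The `BoundaryTracer` class and its labels are illustrative, not a specific tool.

```typescript
// Sketch: count which statically discovered dependencies are actually
// exercised in production. All names here are illustrative.
class BoundaryTracer {
  private counts = new Map<string, number>();

  /** Wrap a function so every call is counted under a label. */
  trace<A extends unknown[], R>(
    label: string,
    fn: (...args: A) => R
  ): (...args: A) => R {
    return (...args: A): R => {
      this.counts.set(label, (this.counts.get(label) ?? 0) + 1);
      return fn(...args);
    };
  }

  /** Dependencies static analysis found but production never exercised. */
  unusedDependencies(staticDeps: string[]): string[] {
    return staticDeps.filter(d => (this.counts.get(d) ?? 0) === 0);
  }
}

// Usage: wrap the real entry point, run traffic, then compare.
const tracer = new BoundaryTracer();
const getPrice = tracer.trace('pricing.getPrice', (sku: string) => sku.length);
getPrice('sku-123');
```

Anything reported by `unusedDependencies` after a representative traffic window is a candidate for deletion rather than migration.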
Once you've discovered the extraction boundary, you must decide how to handle each dependency that crosses it. There are several strategies, each with different tradeoffs.
Decision Framework for Dependency Resolution:
| Question | If Yes → Strategy |
|---|---|
| Is this a simple data lookup? | API with caching |
| Does the caller need immediate confirmation? | Synchronous API |
| Can the caller tolerate eventual consistency? | Event-based decoupling |
| Is this called in a hot path? | Data duplication or co-extraction |
| Is this utility code with no business logic? | Shared library |
| Is this too intertwined to separate? | Co-extraction (temporary) |
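The decision framework above can be read as a first-match rule chain. Here is a minimal sketch of it in code; the `DependencyProfile` field names are illustrative assumptions, not part of the original framework.

```typescript
// The decision table above, expressed as a first-match rule chain.
// Field names are illustrative assumptions.
interface DependencyProfile {
  simpleLookup: boolean;
  needsImmediateConfirmation: boolean;
  toleratesEventualConsistency: boolean;
  inHotPath: boolean;
  pureUtility: boolean;
}

function resolveDependencyStrategy(p: DependencyProfile): string {
  if (p.simpleLookup) return 'API with caching';
  if (p.needsImmediateConfirmation) return 'Synchronous API';
  if (p.toleratesEventualConsistency) return 'Event-based decoupling';
  if (p.inHotPath) return 'Data duplication or co-extraction';
  if (p.pureUtility) return 'Shared library';
  return 'Co-extraction (temporary)';
}
```

Running every boundary-crossing dependency through a function like this forces an explicit, reviewable decision for each one.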
Every dependency crossing an extraction boundary becomes an explicit contract. Document these contracts—they're the foundation of your microservices architecture.
When creating APIs to replace internal dependencies, don't just expose the monolith's internal data model. Create an anti-corruption layer that presents a clean, domain-appropriate interface. This prevents the monolith's legacy design decisions from infecting your new microservices.
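A minimal sketch of what such an anti-corruption layer might look like. The legacy column names (`cust_nm`, `sts_cd`) and the `Customer` model are invented for illustration; the point is that the translation happens at the boundary, so the clean domain model never sees the legacy shape.

```typescript
// Hypothetical legacy shape from the monolith's database.
interface LegacyCustomerRow {
  cust_id: number;
  cust_nm: string;
  sts_cd: 'A' | 'I'; // active / inactive status code
}

// Clean domain model exposed by the new service.
interface Customer {
  id: string;
  name: string;
  active: boolean;
}

/** Anti-corruption layer: translate the legacy shape at the boundary. */
function toCustomer(row: LegacyCustomerRow): Customer {
  return {
    id: String(row.cust_id),
    name: row.cust_nm.trim(),
    active: row.sts_cd === 'A',
  };
}
```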
Data is typically the hardest part of extraction. Code can be duplicated and tested; data must be migrated carefully to maintain correctness, and a migration often can't be easily reversed.
The Data Migration Spectrum:
From least to most invasive:
Read from monolith DB: New service reads directly from monolith's database. Quick to implement but creates tight coupling.
Read via API: New service calls monolith API for data. Reduces coupling but adds latency.
Replicated Read Model: New service maintains its own copy, synchronized via events. Better autonomy, eventual consistency.
Full Data Migration: Data is migrated to new service's database, with monolith updated to call API. Full autonomy, significant effort.
| Strategy | Coupling | Latency | Consistency | Migration Effort |
|---|---|---|---|---|
| Direct DB Read | Very High | Low | Strong | Low |
| Read via API | Medium | Medium | Strong | Medium |
| Replicated Read Model | Low | Low (local) | Eventual | High |
| Full Migration | None | Low (local) | Strong (within service) | Very High |
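As an illustration of the Replicated Read Model strategy, here is a minimal sketch of a local read model kept in sync by domain events. The `ProductPriceChanged` event shape is an assumption for the example.

```typescript
// Hypothetical domain event published by the monolith.
interface ProductPriceChanged {
  type: 'ProductPriceChanged';
  productId: string;
  price: number;
}

class PriceReadModel {
  private prices = new Map<string, number>();

  /** Apply an event from the stream; last write wins per product. */
  apply(event: ProductPriceChanged): void {
    this.prices.set(event.productId, event.price);
  }

  /** Local, low-latency read; may lag the source (eventual consistency). */
  get(productId: string): number | undefined {
    return this.prices.get(productId);
  }
}
```

Reads never leave the service, which gives the low local latency the table promises, at the cost of the replica briefly lagging the authoritative source.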
The Dual-Write Problem:
During migration, you often need both systems to have up-to-date data. This creates the dual-write challenge: if you write to both databases, you risk inconsistency if one write succeeds and the other fails.
Solutions to dual-write:
Single writer with events: Write to one authoritative source, propagate to others via events. Accept eventual consistency.
Change Data Capture (CDC): Use tools like Debezium to capture database changes and propagate them automatically.
Outbox Pattern: Write to primary database with an 'outbox' table for events. A separate process publishes events from the outbox.
Saga Pattern: Use compensating transactions if one write fails. Requires careful error handling.
Eventual migration: Don't dual-write. Migrate data completely before switching traffic. Simpler but requires downtime for the migrated feature.
```typescript
interface OutboxEvent {
  id: string;
  aggregateType: string;
  aggregateId: string;
  eventType: string;
  payload: Record<string, unknown>;
  createdAt: Date;
  published: boolean;
}

class OutboxPublisher {
  private db: Database;
  private eventBus: EventBus;

  constructor(db: Database, eventBus: EventBus) {
    this.db = db;
    this.eventBus = eventBus;
  }

  /**
   * Write entity change with outbox event in single transaction
   */
  async writeWithOutbox<T>(
    entityWrite: () => Promise<T>,
    event: Omit<OutboxEvent, 'id' | 'createdAt' | 'published'>
  ): Promise<T> {
    return this.db.transaction(async (tx) => {
      // Write the entity
      const result = await entityWrite();

      // Write to outbox in same transaction
      await tx.insert('outbox', {
        id: crypto.randomUUID(),
        ...event,
        createdAt: new Date(),
        published: false,
      });

      return result;
    });
  }

  /**
   * Background process: publish unpublished outbox events
   */
  async pollAndPublish(): Promise<number> {
    // Get unpublished events, oldest first
    const events = await this.db.query<OutboxEvent>(
      'SELECT * FROM outbox WHERE published = false ORDER BY createdAt ASC LIMIT 100'
    );

    let published = 0;
    for (const event of events) {
      try {
        // Publish to event bus
        await this.eventBus.publish({
          type: event.eventType,
          aggregateType: event.aggregateType,
          aggregateId: event.aggregateId,
          payload: event.payload,
          timestamp: event.createdAt.toISOString(),
        });

        // Mark as published
        await this.db.query(
          'UPDATE outbox SET published = true WHERE id = $1',
          [event.id]
        );
        published++;
      } catch (error) {
        // Log but don't fail - retry on next poll
        console.error(`Failed to publish event ${event.id}:`, error);
        break; // Preserve ordering by stopping on first failure
      }
    }

    return published;
  }

  /**
   * Cleanup old published events
   */
  async cleanup(olderThanDays: number = 7): Promise<number> {
    const result = await this.db.query(
      'DELETE FROM outbox WHERE published = true AND createdAt < NOW() - INTERVAL $1 DAY',
      [olderThanDays]
    );
    return result.rowCount;
  }
}

// Usage example
class UserService {
  private outbox: OutboxPublisher;
  private userRepo: UserRepository;

  async updateUserEmail(userId: string, newEmail: string): Promise<void> {
    await this.outbox.writeWithOutbox(
      async () => {
        return this.userRepo.updateEmail(userId, newEmail);
      },
      {
        aggregateType: 'User',
        aggregateId: userId,
        eventType: 'UserEmailUpdated',
        payload: { userId, newEmail },
      }
    );
  }
}

interface Database {
  transaction<T>(fn: (tx: any) => Promise<T>): Promise<T>;
  query<T>(sql: string, params?: unknown[]): Promise<T[]>;
}

interface EventBus {
  publish(event: any): Promise<void>;
}

interface UserRepository {
  updateEmail(userId: string, email: string): Promise<void>;
}
```

If you write to two databases without coordination, you will have inconsistency. It's not a matter of if, but when. Either accept eventual consistency with event-based propagation, or use distributed transactions (with their overhead), but never assume independent writes will stay synchronized.
With boundaries discovered, dependencies mapped, and data strategy chosen, here's the step-by-step process for extracting functionality.
Time Allocation:
Based on industry experience across hundreds of extractions, budget most of your time for discovery, dependency untangling, and validation rather than for writing the new service's code.
Implement the extraction behind a feature flag from day one. This gives you an instant rollback mechanism, the ability to test with specific users, and a clear on/off switch for the migration. When traffic migration is complete and stable, the flag becomes permanent (always on) and can eventually be removed.
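A minimal sketch of how such a flag might gate routing, assuming a hypothetical `FlagStore` and percentage-based user bucketing (neither is a prescribed design). Deterministic bucketing keeps each user on one implementation while the rollout percentage ramps up.

```typescript
// Hypothetical flag store; a real one would back onto your flag service.
interface FlagStore {
  rolloutPercent(flag: string): number; // 0-100
}

class ExtractionRouter {
  constructor(private flags: FlagStore) {}

  /** Deterministic per-user bucketing: a user sticks to one implementation. */
  useNewService(flag: string, userId: string): boolean {
    const pct = this.flags.rolloutPercent(flag);
    if (pct <= 0) return false;   // instant rollback: set percent to 0
    if (pct >= 100) return true;  // migration complete: flag always on
    return this.hash(userId) % 100 < pct;
  }

  /** Simple string hash for stable bucketing (illustrative, not crypto). */
  private hash(s: string): number {
    let h = 0;
    for (const c of s) h = (h * 31 + c.charCodeAt(0)) >>> 0;
    return h;
  }
}
```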
Understanding what to avoid is as important as knowing what to do. These anti-patterns have derailed many extraction efforts.
Anti-Pattern: Extracting five services simultaneously 'for efficiency.' You'll spend all your time coordinating cross-service dependencies and have no time to properly validate any of them.
Instead: Extract one service completely, including cleanup, before starting the next. Learn from each extraction and apply its lessons to subsequent ones. Speed comes from expertise, not parallelism.
The 'Distributed Monolith' Red Flags:
Watch for these signs that your extraction is creating a distributed monolith rather than true microservices: services that must be deployed together in lockstep, chatty synchronous call chains where one request fans out into many inter-service calls, services still reading and writing shared database tables, and one service's failure cascading through the rest of the system.
If you see these patterns, step back and reconsider your boundaries. It's better to have a well-designed monolith than a poorly designed distributed system.
How do you know when an extraction is truly complete? You need objective criteria that prevent premature celebrations and ensure thoroughness.
| Category | Criteria | Evidence Required |
|---|---|---|
| Functionality | All features work identically | Shadow comparison shows 0 discrepancies for 7+ days |
| Performance | Latency meets or exceeds baseline | P99 latency within 10% of monolith |
| Reliability | Error rate equivalent or better | Error rate ≤ monolith rate for 7+ days |
| Scale | Can handle full production load | Load test at 2x peak traffic |
| Independence | No shared database tables in production use | Database dependency map shows clean separation |
| Operations | Runbooks documented and tested | On-call has successfully handled an incident |
| Observability | Full visibility into service health | Dashboard shows all golden signals, alerting works |
| Cleanup | Monolith code removed | Dead code deleted, documentation updated |
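The criteria above can be encoded as an explicit checklist so that completion is a computed fact rather than a judgment call. This is a minimal sketch; the field names and threshold encodings are illustrative, with the thresholds mirroring the table.

```typescript
// Exit criteria from the table above, encoded as a checklist.
// Field names are illustrative assumptions.
interface ExtractionEvidence {
  shadowDiscrepancyFreeDays: number;
  p99LatencyRatio: number;        // service P99 / monolith P99
  errorRateRatio: number;         // service error rate / monolith error rate
  loadTestedAtPeakMultiple: number;
  sharedTablesInUse: number;
  incidentHandledByOnCall: boolean;
  monolithCodeRemoved: boolean;
}

function extractionComplete(e: ExtractionEvidence): boolean {
  return (
    e.shadowDiscrepancyFreeDays >= 7 &&  // 0 discrepancies for 7+ days
    e.p99LatencyRatio <= 1.1 &&          // within 10% of monolith
    e.errorRateRatio <= 1.0 &&           // equivalent or better
    e.loadTestedAtPeakMultiple >= 2 &&   // load test at 2x peak
    e.sharedTablesInUse === 0 &&         // clean database separation
    e.incidentHandledByOnCall &&
    e.monolithCodeRemoved
  );
}

const ready: ExtractionEvidence = {
  shadowDiscrepancyFreeDays: 14,
  p99LatencyRatio: 1.05,
  errorRateRatio: 0.9,
  loadTestedAtPeakMultiple: 2,
  sharedTablesInUse: 0,
  incidentHandledByOnCall: true,
  monolithCodeRemoved: true,
};
```

A single failing criterion blocks sign-off, which is exactly the point: it removes the temptation to declare victory early.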
The Definition of Done for Extraction:
Don't declare victory until the extracted service has run in production for at least two weeks without needing to fall back to the monolith. Many edge cases only appear after days of diverse production traffic.
Extracting functionality is the core work of the Strangler Fig Pattern. It requires systematic discovery, careful dependency management, and disciplined execution.
What's Next:
With functionality extracted, the next challenge is managing the transition: Cutover Strategies. We'll explore techniques for switching from old to new implementations, handling the critical moment when traffic moves, and ensuring zero-downtime transitions.
You now understand how to identify extraction candidates, discover boundaries, manage dependencies, handle data migration, and validate completeness. With these skills, you can systematically decompose a monolith into well-designed microservices. Next, we'll learn how to execute the cutover safely.