Monolithic architectures provide tremendous benefits in simplicity, performance, and development velocity. But as applications succeed—as users multiply, features accumulate, codebases expand, and teams grow—certain challenges begin to emerge.
These challenges are not fatal flaws. They are scaling pressures—friction points that grow with success. Understanding them is crucial: it lets you anticipate pain before it becomes severe, and it helps you judge whether architectural change is actually warranted.
This page provides a comprehensive, rigorous examination of monolith challenges. We'll explore scaling limitations, organizational friction, deployment nightmares, and technical debt patterns—painting an honest picture of what happens when monoliths succeed beyond their comfortable limits.
By the end of this page, you will understand the spectrum of challenges that monoliths face at scale. You'll be able to identify which challenges apply to your situation, distinguish between problems inherent to monoliths versus problems of poor architecture, and evaluate whether your pain points justify architectural evolution.
Before diving into specific challenges, we must establish what "scale" means. Scale is not a single dimension—it's a spectrum that spans multiple axes:
Dimensions of Scale
| Dimension | Description | Threshold Indicators |
|---|---|---|
| Traffic Scale | Requests per second, concurrent users | 100K+ RPM, millisecond latency requirements |
| Data Scale | Database size, query complexity | Terabytes of data, complex joins slowing down |
| Codebase Scale | Lines of code, number of modules | 500K+ LOC, compilation taking minutes |
| Team Scale | Number of developers, teams | 50+ engineers, multiple product teams |
| Feature Scale | Number of features, product complexity | Feature interdependencies causing conflicts |
| Operational Scale | Deployment frequency, incident frequency | Daily deploys, high change failure rate |
Challenges Are Dimension-Specific
Different scaling dimensions create different challenges, and they rarely all hit at once.
Understanding which dimensions are scaling helps you focus on the right problems. A startup with 3 engineers and 100 users has no scaling challenges—they have a monolith that works perfectly. A company with 200 engineers and millions of users might have severe challenges on multiple dimensions.
When we discuss "monolith challenges at scale," we're not just talking about traffic. The most common challenges that drive companies away from monoliths are organizational and operational—not pure performance scaling. A monolith can often handle enormous traffic; it's the human systems around it that break first.
One of the first challenges to emerge as monoliths grow is deployment friction. What was once a smooth, quick deploy becomes a ceremony that teams dread.
The Monolith Deployment Anti-Pattern
In a mature monolith, deployments often become a ceremony: a release branch is cut, changes from every team are frozen together, QA runs a long regression pass, and the deploy itself happens in a scheduled window with all teams on standby in case a rollback is needed.
Why This Happens
Monoliths deploy as a single unit. This means:
All changes deploy together: A small bug fix deploys with a major feature. Neither team chose this coupling.
Risk aggregates: With many changes in a release, the probability that at least one of them goes wrong climbs with every change included.
Blame is diffused: When deployment fails, which change caused it? Multiple teams must investigate.
Fear drives infrequency: Teams delay deployments to "batch" changes, reducing perceived risk but actually increasing it.
This creates a negative feedback loop: fear of deployment → less frequent deployments → more changes per deployment → higher risk per deployment → more fear.
Quantifying the Problem
Let's put numbers to this:
| Metric | Healthy Monolith | Challenged Monolith |
|---|---|---|
| Deployment Frequency | Multiple times per day | Weekly or bi-weekly |
| Lead Time (code to prod) | Hours | Days to weeks |
| Build + Test Time | 5-15 minutes | 30-90 minutes |
| Change Failure Rate | &lt;5% | 15%+ |
| Mean Time to Recovery | Minutes | Hours |
| Deployment Window | Any time | Scheduled windows only |
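To make these numbers concrete, here is a small sketch of how a few of these rows might be computed from a log of deployments. The `Deploy` shape and field names are assumptions for illustration, not a standard schema.

```typescript
// Illustrative deployment record; field names are assumptions, not a standard schema.
interface Deploy {
  startedAt: Date;
  causedIncident: boolean;   // did this deploy trigger a production incident?
  recoveredAt?: Date;        // when service was restored, if it did
}

// Rough DORA-style snapshot over a reporting period.
function deploymentHealth(deploys: Deploy[], periodDays: number) {
  const failures = deploys.filter(d => d.causedIncident);
  const recoveryMinutes = failures
    .filter(d => d.recoveredAt)
    .map(d => (d.recoveredAt!.getTime() - d.startedAt.getTime()) / 60_000);

  return {
    deploysPerDay: deploys.length / periodDays,
    changeFailureRate: deploys.length === 0 ? 0 : failures.length / deploys.length,
    meanTimeToRecoveryMinutes: recoveryMinutes.length === 0
      ? 0
      : recoveryMinutes.reduce((a, b) => a + b, 0) / recoveryMinutes.length,
  };
}
```

Tracking a snapshot like this over time shows which column your monolith is drifting toward.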
While microservices can help with independent deployment, many deployment problems stem from inadequate testing, poor CI/CD practices, or insufficient observability—problems that follow you to microservices. Before assuming architecture is the issue, examine your deployment pipeline.
Monoliths scale horizontally—add more instances behind a load balancer—but they scale as a whole. This creates inefficiency when different parts of the application have vastly different resource requirements.
The Scaling Inefficiency Problem
Consider an e-commerce monolith with these load characteristics:
| Module | % of Traffic | Resource Type Needed | Peak Load Pattern |
|---|---|---|---|
| Product Catalog (Read) | 60% | CPU for rendering, memory for cache | During business hours |
| Search | 25% | High CPU (Elasticsearch queries) | Spiky during promotions |
| Checkout | 10% | I/O (database, payment APIs) | Concentrated at purchases |
| Admin/Reporting | 5% | Memory (large queries) | End of day/month |
The Problem: To handle a spike in Search traffic (during a promotion), you must scale the entire application. Every extra instance also carries Checkout and Admin capacity you don't need and won't use.
Even worse, as the table shows, different modules have different resource profiles: the catalog wants memory for caching, search is CPU-heavy, checkout is I/O-bound, and admin reporting needs memory for large queries.
With a monolith, you pick one instance type that's "good enough" for everything—optimal for nothing.
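A small sketch makes the capacity math concrete. The traffic split comes from the table above; the peak multipliers are invented for illustration, and the point is only that the whole fleet must absorb the worst module's spike.

```typescript
// Hypothetical load profile: shares from the table above, peak multipliers invented.
const modules = [
  { name: 'catalog',  share: 0.60, peakMultiplier: 1.5 },
  { name: 'search',   share: 0.25, peakMultiplier: 3.0 }, // promotion spike
  { name: 'checkout', share: 0.10, peakMultiplier: 1.2 },
  { name: 'admin',    share: 0.05, peakMultiplier: 1.0 },
];
const baseInstances = 4;

// Monolith: every instance carries every module, so the whole fleet
// must scale to the worst-case multiplier.
const monolithPeak = baseInstances * Math.max(...modules.map(m => m.peakMultiplier));

// Hypothetical per-module scaling: each slice of capacity grows only
// as much as its own load does.
const independentPeak = baseInstances *
  modules.reduce((sum, m) => sum + m.share * m.peakMultiplier, 0);

console.log({ monolithPeak, independentPeak }); // ≈ { monolithPeak: 12, independentPeak: 7.3 }
```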
The Cost Multiplication
Let's quantify:
```
Monolith Scaling Cost Analysis
==============================

Current Load:
- 10M requests/day, 90% are product catalog or search
- Peak: 3x normal during promotions

Scaling Requirements (Monolith):
- Base: 4 instances (m5.2xlarge) = $1,400/month
- Promotion peak: 12 instances = $4,200/month (3x everything)

Problem:
- Checkout gets 12 instances, but only needs 1-2
- Admin gets 12 instances, but only needs 1
- Over-provisioning: ~60% of compute is wasted

Annual waste: ~$25,000 in unnecessary compute
```
When This Matters
Scaling inefficiency becomes significant when infrastructure spend is a meaningful line item, when load is unevenly distributed across modules, and when peak demand is several multiples of the baseline.
For small applications or uniform load distributions, this inefficiency is negligible. For applications spending millions on infrastructure, it's a compelling driver for decomposition.
Before assuming you need horizontal scaling, explore vertical scaling. Modern servers can have 256+ cores and terabytes of RAM. A single well-optimized server might handle your entire load for years. Vertical scaling is simpler and often cheaper than horizontal, especially for small-to-medium scale.
Perhaps the most compelling driver for leaving monolithic architecture isn't technical—it's organizational. As teams grow, monoliths create coordination overhead that slows everyone down.
Conway's Law in Action
"Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations." — Melvin Conway, 1967
In a monolith with multiple teams, Conway's Law creates friction: the organization is structured as many teams, but the system is a single deployable unit, so team boundaries and code boundaries constantly collide.
The Symptoms of Organizational Friction
Teams step on each other in the shared codebase: merge conflicts linger, cross-team code reviews block releases, ownership of shared modules is ambiguous, and everyone must board the same release train.
The Team Autonomy Problem
In a monolith, teams cannot operate independently. Consider what it takes for Team A (Checkout) to ship a feature: their change must merge into the shared codebase alongside everyone else's work, pass the full regression suite, wait for the next release train, and survive a deployment that bundles unrelated changes from other teams.
Team A's speed is capped by the slowest team. This is organizational coupling—and it grows worse as teams multiply.
| Team Size | Coordination Overhead | Typical Experience |
|---|---|---|
| 1-5 developers | Minimal | Everyone knows everything, informal coordination works |
| 5-15 developers | Moderate | Some process needed, shared ownership manageable |
| 15-50 developers | High | Significant overhead, teams stepping on each other |
| 50-200 developers | Severe | Teams bottlenecked on each other; coordination becomes the dominant cost |
| 200+ developers | Prohibitive | Monolith likely unsustainable without major restructuring |
Amazon's famous "two-pizza team" heuristic (teams small enough to feed with two pizzas, ~6-10 people) applies here. When more than a couple of such teams work on the same monolith, coordination overhead often exceeds productive work. This is when organizational friction becomes the primary driver for decomposition.
Monoliths have a natural tendency to accumulate technical debt in specific patterns. Understanding these patterns helps you recognize them early and address them proactively.
The Big Ball of Mud Trajectory
Without deliberate architectural governance, monoliths tend to degrade into "big balls of mud"—systems with no discernible architecture. This happens gradually:
```typescript
// Year 1: Clean architecture with clear boundaries
// OrderService only uses public interfaces of dependencies
import { UserService } from '@services/user';
import { InventoryService } from '@services/inventory';
import { PaymentService } from '@services/payment';

class OrderService {
  async placeOrder(userId: string, items: OrderItem[]) {
    // Using public API only
    const user = await this.userService.getUser(userId);
    const available = await this.inventoryService.checkAvailability(items);
    const payment = await this.paymentService.charge(user.paymentMethod);
    // ...
  }
}
```
Specific Debt Patterns in Monoliths
| Pattern | Symptom | Impact |
|---|---|---|
| Shared Mutable State | Global variables, singletons accessed everywhere | Changes cause unexpected side effects |
| Leaky Abstractions | Implementation details exposed across modules | Tight coupling, risky changes |
| Circular Dependencies | A→B→C→A dependency cycles | Cannot extract or test modules in isolation |
| God Objects | Classes with 50+ methods, 1000+ LOC | Any change touches critical shared code |
| Copy-Paste Proliferation | Same logic duplicated across modules | Bugs fixed in one place, not others |
| Test Neglect | Critical paths without test coverage | Deployments require manual testing |
| Documentation Rot | Architecture docs outdated or missing | Tribal knowledge, key-person dependencies |
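To make a few of these patterns concrete, here is a hypothetical sketch of where the Year-1 `OrderService` above often ends up several years later without architectural governance. The module paths, helpers, and fields are invented for illustration.

```typescript
// Year 4 (hypothetical): the same service, now exhibiting several patterns from the table.
import { OrderItem } from '@services/order/types';             // same type as the Year-1 example
import { db } from '@shared/db';                               // shared mutable state: one global connection pool
import { UserRecord } from '@services/user/internal/models';   // leaky abstraction: another module's internals
import { applyPromo } from '@services/checkout/pricing';       // checkout imports order code too -> circular dependency

class OrderService {
  async placeOrder(userId: string, items: OrderItem[]) {
    // Bypasses UserService entirely and reads its tables directly
    const user = await db.queryOne<UserRecord>('SELECT * FROM users WHERE id = $1', [userId]);

    // Copy-pasted from InventoryService "because it was faster"
    const available = items.every(item => item.cachedStock > 0);

    // Pricing logic duplicated from checkout and drifting out of sync
    const total = applyPromo(items, user.legacyDiscountFlag);

    // ... payment, email, analytics, and reporting logic all inlined below
  }
}
```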
Moving to microservices doesn't eliminate technical debt—it distributes it. If your monolith is a big ball of mud, your microservices will be distributed balls of mud. Clean your architecture first, then consider decomposition. In fact, a well-structured modular monolith is often better than poorly-structured microservices.
As monoliths grow, their testing and CI/CD pipelines often degrade to the point where they become a primary source of friction.
The Test Suite Slowdown
Monolith test suites tend to grow linearly with codebase size. But they also tend to become slower per test as the codebase grows (more setup, more teardown, more interdependencies):
| Codebase Size | Test Count | Expected Duration | Observed Duration |
|---|---|---|---|
| 10K LOC | 500 tests | 30 seconds | 30 seconds |
| 50K LOC | 2,500 tests | 2.5 minutes | 4 minutes |
| 200K LOC | 10,000 tests | 10 minutes | 25 minutes |
| 500K LOC | 25,000 tests | 25 minutes | 90 minutes |
| 1M LOC | 50,000 tests | 50 minutes | 4+ hours |
The observed duration exceeds the expected duration because per-test overhead grows with the codebase: fixtures and database setup get heavier, teardown touches more state, and interdependencies push more tests into slow integration-style paths.
The Flaky Test Plague
Large monoliths often suffer from flaky tests—tests that pass sometimes and fail sometimes based on non-deterministic factors. Flaky tests create several problems: developers re-run pipelines instead of investigating failures, genuine regressions get dismissed as "probably flaky," and confidence in a green build erodes.
The CI Pipeline Bottleneck
With slow tests, CI pipelines become bottlenecks. The typical degradation: queue times grow, developers batch changes to avoid waiting, merges pile up behind long-running builds, and the feedback loop stretches from minutes to hours.
Before abandoning the monolith, try: (1) Test parallelization with isolated databases per test, (2) Blazing-fast test teardown/rebuild (transactions instead of truncation), (3) Identifying and fixing top 10% slowest tests, (4) Caching compilation and dependencies, (5) Running only tests affected by the changeset (test impact analysis).
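As a sketch of point (2), here is the transactions-instead-of-truncation idea using Node's built-in test runner. The `db` helper is hypothetical and stands in for whichever client or ORM your monolith uses.

```typescript
import { test, beforeEach, afterEach } from 'node:test';
import assert from 'node:assert/strict';
import { db } from './test-db'; // hypothetical helper owning a single test connection

// Open a transaction before each test and roll it back afterwards,
// so no test ever has to truncate or rebuild tables.
beforeEach(async () => {
  await db.query('BEGIN');
});

afterEach(async () => {
  await db.query('ROLLBACK');
});

test('placing an order persists it', async () => {
  await db.query('INSERT INTO orders (user_id) VALUES ($1)', ['user-1']);
  const { rows } = await db.query('SELECT COUNT(*) AS n FROM orders');
  assert.equal(Number(rows[0].n), 1); // visible inside the transaction, gone after rollback
});
```

This pattern assumes each worker owns one connection and runs its tests sequentially; combined with parallel workers that each get an isolated database, it addresses points (1) and (2) together.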
Monoliths typically use a single technology stack—one programming language, one framework, one database. This consistency is a benefit, but it can become a constraint.
The Lock-In Spectrum
| Concern | Example | Impact |
|---|---|---|
| Language Evolution | Python 2 → 3, Java 8 → 17 | Major upgrade requires touching entire codebase |
| Framework Aging | Ruby on Rails 4 → 7, Angular 1 → 12+ | Old patterns throughout, migration is massive |
| Library Dependencies | Security vulnerability in core library | Can't upgrade due to cascading changes |
| Database Limitations | Relational DB for a graph problem | Suboptimal performance, workarounds required |
| Best Tool Selection | ML module needs Python, app is Java | Either suboptimal tool or FFI complexity |
| Hiring Pool | Codebase in declining language | Harder to find developers, higher costs |
The Library Upgrade Problem
In a monolith, all code shares the same dependency versions, which creates conflicts: one team's feature depends on axios@0.21 while another team needs axios@1.0 for a security fix, and only one version can win. In a large monolith, these conflicts multiply. Upgrading a core library means auditing every usage, which is prohibitively expensive. The result: applications run on outdated, sometimes vulnerable, dependencies.
The "New Technology" Problem
When a new problem domain is best served by a different technology (say, a machine-learning feature that wants Python's ecosystem while the application is written in Java), the monolith forces a choice.
In a monolith, you either implement the feature awkwardly in the existing stack, bridge to the new technology through FFI or subprocess glue, or carve out a separate service just for that capability.
None of these are ideal. The monolith constrains your technology choices.
Some platforms enable polyglot development within a monolith—JVM languages (Java, Kotlin, Scala, Clojure) can interop, .NET supports C# and F#, etc. But this is limited. Cross-paradigm choices (Python ML libraries from a Java app) still require service boundaries.
Monoliths have inherent resilience limitations: they are single points of failure. While mitigation strategies exist, certain failure modes are difficult to address.
Failure Mode: Everything Down
In a monolith, when the process crashes, all functionality becomes unavailable: a memory leak in the reporting module takes checkout down with it, and an unhandled exception on an obscure admin endpoint can kill the process serving every user.
Horizontal Scaling Helps, But...
Running multiple instances mitigates some risks: a single crashed instance no longer takes the application offline, and rolling deploys keep the service up during releases.
But certain failures affect all instances: a bad deploy, a database outage, a poison message, or a memory leak shipped in the release will eventually take down every copy of the monolith.
Comparison: Microservices Failure Isolation
Microservices can contain failures: if the recommendation service goes down, checkout keeps working; if reporting saturates its own database, the product catalog is unaffected.
This isn't free—it requires sophisticated resilience engineering (circuit breakers, bulkheads, timeouts, fallbacks). But the capability to isolate failures exists in ways a monolith cannot match.
While monoliths can't fully isolate failures, you can mitigate risks: (1) Circuit breakers on external dependencies, (2) Watchdog processes that restart crashed instances, (3) Rate limiting per-feature to prevent noisy neighbors, (4) Separate instances for critical vs. non-critical workloads, (5) Chaos testing to identify failure modes proactively.
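To illustrate point (1), here is a minimal circuit-breaker sketch. The thresholds, endpoint, and fallback are invented for illustration rather than taken from a specific library.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: BreakerState = 'closed';
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly failureThreshold = 5,    // consecutive failures before opening
    private readonly resetTimeoutMs = 30_000, // how long to stay open before probing again
  ) {}

  async call<T>(fn: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        return fallback(); // fail fast instead of hammering a dependency that is already down
      }
      this.state = 'half-open'; // let a single probe request through
    }
    try {
      const result = await fn();
      this.state = 'closed';
      this.failures = 0;
      return result;
    } catch {
      this.failures += 1;
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = Date.now();
      }
      return fallback();
    }
  }
}

// Usage: wrap a payment-provider call so checkout degrades instead of hanging.
const paymentsBreaker = new CircuitBreaker();

async function chargeCustomer(amountCents: number) {
  return paymentsBreaker.call(
    async () => {
      const res = await fetch('https://payments.example.com/charge', {
        method: 'POST',
        body: JSON.stringify({ amountCents }),
      });
      return res.json();
    },
    () => ({ status: 'queued-for-retry' }), // fallback keeps the request fast and the UI responsive
  );
}
```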
We've explored the challenges that monolithic architectures face at scale: deployment friction, inefficient whole-application scaling, organizational coupling, accumulating technical debt, slowing test and CI pipelines, technology lock-in, and limited failure isolation.
A Nuanced View
None of these challenges are fatal, and most have mitigation strategies within the monolithic paradigm. The question is always: when do the mitigations become more expensive than the alternative?
Many organizations jump to microservices prematurely, before these challenges are actually severe. Others stay with monoliths too long, suffering unnecessary friction. The key is honest assessment: which challenges are you actually experiencing, and is architectural change the right solution?
In the next page, we'll explore when the monolith remains the right choice—despite these challenges—and how to evaluate that decision rigorously.
You now understand the spectrum of challenges that monolithic architectures face at scale. Next, we'll examine when monoliths remain the right choice—the conditions under which these challenges are manageable and the alternatives are worse.