Every system has a bottleneck. Every single one. The only question is whether you've identified it and designed around it—or whether you'll discover it at 3 AM when your pager goes off.
A bottleneck is the component or resource that limits the overall throughput of your system. When load increases, the bottleneck saturates first, causing latency to spike and requests to fail. Understanding where bottlenecks will emerge—and at what load—is fundamental to building systems that scale.
The Theory of Constraints teaches us that improving anything other than the bottleneck provides no system-wide benefit. You can make your application servers 10x faster, but if the database is the bottleneck, throughput doesn't improve. Principal engineers obsess over bottleneck identification because it focuses engineering effort where it actually matters.
By the end of this page, you will understand how to systematically identify bottlenecks in a system design using theoretical models and practical analysis techniques. You'll learn to apply queuing theory, capacity modeling, and critical path analysis to predict where your system will fail under load—and how to address those constraints before deployment.
A bottleneck occurs when a component's capacity is insufficient to process the incoming load. This creates a queue of waiting work, which manifests as increased latency. If the queue grows unbounded, the system eventually fails.
Types of Bottlenecks
Bottlenecks can occur at multiple levels, and understanding their nature is the first step to addressing them:
| Bottleneck Type | Resource Constrained | Symptoms | Examples |
|---|---|---|---|
| CPU-bound | Processing power | High CPU utilization, slow computation-heavy operations | Encryption, compression, complex business logic |
| I/O-bound | Disk or network bandwidth | High I/O wait, slow read/write operations | Database queries, file operations, API calls |
| Memory-bound | RAM capacity | Swapping, OOM errors, GC pressure | Large datasets, caching, in-memory processing |
| Network-bound | Bandwidth or latency | High network utilization, timeouts | Large payloads, chatty protocols, cross-region calls |
| Concurrency-bound | Locks or connection limits | Lock contention, connection pool exhaustion | Database connections, distributed locks, thread pools |
| External dependency-bound | Third-party service limits | Throttling, quota errors | Payment gateways, cloud APIs, SaaS integrations |
The Bottleneck Cascade
Bottlenecks don't exist in isolation—they create cascading effects: when one component saturates, work queues upstream, callers time out and retry, and the retry traffic adds further load to the already-saturated component, spreading degradation outward.
This cascade is why bottleneck analysis must consider the entire system, not just individual components.
The most dangerous bottleneck is the one you don't know about. Systems often have 'shadow' bottlenecks that only emerge at specific load levels or under particular access patterns. A database that handles 1,000 QPS easily might collapse at 1,100 QPS because of a non-obvious index limitation. Always stress-test your assumptions.
Bottleneck analysis isn't guesswork—it's grounded in mathematical theory. Understanding these foundations allows you to predict system behavior before building anything.
Little's Law
Little's Law is perhaps the most important equation in capacity planning:
L = λ × W
Where:
- L = average number of requests in the system (concurrency)
- λ = average arrival rate (requests per second)
- W = average time each request spends in the system (latency, in seconds)
This relationship is profound: if you know any two values, you can calculate the third. More importantly, it reveals the fundamental tradeoff between throughput and latency.
Arrival rate: 500 requests/second. Average latency: 200 ms.

Average concurrent requests: L = 500 × 0.2 = 100 requests.

This means on average, 100 requests are being processed at any moment. If your server can only handle 50 concurrent requests, you have a bottleneck—requests will queue, increasing latency, which increases L, creating a feedback loop until the system fails.
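The arithmetic above can be sketched in a few lines of code; the 50-request concurrency limit is an assumed figure for illustration:

```typescript
// Little's Law: L = λ × W. Given any two values, you can compute the third.
function concurrentRequests(arrivalRateRps: number, latencySeconds: number): number {
  return arrivalRateRps * latencySeconds;
}

// The worked example: 500 req/s at 200 ms average latency
const L = concurrentRequests(500, 0.2); // ≈ 100 concurrent requests

// Hypothetical limit: the server pool supports only 50 concurrent requests
const maxConcurrency = 50;
const isBottleneck = L > maxConcurrency;
console.log(`L = ${L}, bottleneck: ${isBottleneck}`);
```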
The Universal Scalability Law (USL)
Neil Gunther's USL extends Amdahl's Law to model how systems scale with concurrency:
C(N) = N / (1 + σ(N-1) + κN(N-1))
Where:
- C(N) = relative capacity (throughput) at concurrency level N
- N = number of concurrent workers, nodes, or processors
- σ (sigma) = contention coefficient, the fraction of work that is serialized
- κ (kappa) = coherency coefficient, the cost of coordination/crosstalk between workers
The key insight: as you add capacity, coordination overhead eventually dominates. There's a maximum useful concurrency beyond which adding more resources actually decreases throughput.
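A small sketch makes the peak visible. The σ and κ values below are illustrative assumptions, not measurements from any real system:

```typescript
// Universal Scalability Law: C(N) = N / (1 + σ(N−1) + κN(N−1))
function uslCapacity(n: number, sigma: number, kappa: number): number {
  return n / (1 + sigma * (n - 1) + kappa * n * (n - 1));
}

// Scan for the concurrency level where throughput peaks
function peakConcurrency(sigma: number, kappa: number, maxN = 1024): number {
  let best = 1;
  for (let n = 2; n <= maxN; n++) {
    if (uslCapacity(n, sigma, kappa) > uslCapacity(best, sigma, kappa)) best = n;
  }
  return best;
}

// Assumed 5% contention and 0.02% coherency cost: throughput peaks around
// N ≈ 70, and adding nodes beyond the peak *reduces* total throughput
const nPeak = peakConcurrency(0.05, 0.0002);
console.log(`Peak at N = ${nPeak}, C = ${uslCapacity(nPeak, 0.05, 0.0002).toFixed(1)}`);
```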
Queuing Theory Fundamentals
Systems under load behave as queuing systems. The M/M/1 model (random arrivals, a single server) yields the key result: if a component's bare service time is S and its utilization is ρ, the average time a request spends in the system is

W = S / (1 − ρ)

As ρ approaches 100%, latency grows without bound.
This mathematical reality explains why you should never run systems at high utilization—the latency explosion is non-linear and catastrophic.
| Server Utilization | Latency Multiplier | Practical Impact |
|---|---|---|
| 50% | 2× | Stable, plenty of headroom |
| 70% | 3.3× | Acceptable for sustained load |
| 80% | 5× | Warning zone, spikes cause queuing |
| 90% | 10× | Dangerous, any perturbation causes problems |
| 95% | 20× | Critical, system is fragile |
| 99% | 100× | System is effectively failing |
Most distributed systems operate well at 60-70% utilization. This provides sufficient headroom to absorb traffic spikes, handle retries during partial failures, and maintain reasonable latency. Designing for 100% utilization is designing for failure.
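The latency multipliers in the table fall directly out of the 1/(1 − ρ) relationship; a minimal sketch:

```typescript
// M/M/1 latency multiplier: time in system grows as 1 / (1 − ρ)
function latencyMultiplier(utilization: number): number {
  if (utilization >= 1) return Infinity; // the queue grows without bound
  return 1 / (1 - utilization);
}

// Reproduce the utilization table
for (const rho of [0.5, 0.7, 0.8, 0.9, 0.95, 0.99]) {
  console.log(`${(rho * 100).toFixed(0)}% utilization → ${latencyMultiplier(rho).toFixed(1)}×`);
}
```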
Bottleneck analysis during design requires reasoning about system behavior without the benefit of production metrics. The goal is to identify where bottlenecks will form and at what load they'll become critical.
The Capacity Modeling Process
The Load Flow Diagram
A load flow diagram traces request volume through the system, showing how load amplifies or attenuates at each stage. This reveals bottleneck candidates.
In this example, the analysis reveals two bottleneck candidates: the Auth Service (capacity ~12,000 RPS) and the Database (~11,700 RPS based on back-calculation).

Note that the 50,000 RPS load balancer capacity is not the system's capacity—the actual limit is determined by the most constrained component.
When services fan out requests (1 user request generates N downstream calls), the downstream components experience N× the incoming load. This amplification is one of the most common sources of hidden bottlenecks—a microservice architecture with innocent-looking 1:3 fan-outs can quickly overwhelm databases and caches.
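A load flow calculation can be sketched as code. The component names, capacities, and fan-out ratios below are assumed figures for illustration:

```typescript
// Each stage receives fanOut calls per call to the previous stage.
interface Stage {
  name: string;
  fanOut: number;      // calls to this stage per call to the previous stage
  capacityRps: number; // sustainable throughput of this stage
}

// 1 user request → 3 service calls → 2 DB queries each = 6 DB queries total
const chain: Stage[] = [
  { name: 'gateway',  fanOut: 1, capacityRps: 50_000 },
  { name: 'services', fanOut: 3, capacityRps: 30_000 },
  { name: 'database', fanOut: 2, capacityRps: 40_000 },
];

// The sustainable user-request rate is the tightest constraint after
// accounting for cumulative amplification at each stage.
function maxUserRps(stages: Stage[]): number {
  let amplification = 1;
  let limit = Infinity;
  for (const stage of stages) {
    amplification *= stage.fanOut;
    limit = Math.min(limit, stage.capacityRps / amplification);
  }
  return limit;
}

// gateway: 50,000 · services: 30,000/3 = 10,000 · database: 40,000/6 ≈ 6,667
console.log(`Sustainable user RPS: ${maxUserRps(chain).toFixed(0)}`);
```

Note that the database, with the largest raw capacity in the chain, is still the bottleneck once amplification is accounted for.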
While capacity analysis focuses on throughput, critical path analysis focuses on latency. The critical path is the longest sequence of synchronous operations in processing a request. Latency cannot be better than the sum of critical path operations.
Constructing the Critical Path
| Step | Operation | Latency (P50) | Sequential/Parallel | On Critical Path? |
|---|---|---|---|---|
| 1 | API Gateway auth check | 5ms | Sequential | Yes |
| 2 | Fetch user profile | 15ms | Sequential | Yes |
| 3a | Validate inventory (cache) | 3ms | Parallel with 3b | No |
| 3b | Calculate pricing | 8ms | Parallel with 3a | Yes (slower) |
| 4 | Payment gateway authorization | 150ms | Sequential | Yes |
| 5 | Write order to database | 25ms | Sequential | Yes |
| 6 | Publish order event (async) | 2ms | Async, non-blocking | No |
Critical path total: 5 + 15 + 8 + 150 + 25 = 203ms (P50)
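The table's accounting follows three rules—sequential steps add, parallel branches contribute only their slowest member, async steps contribute nothing—which can be sketched as:

```typescript
// Critical path latency model
type Step =
  | { kind: 'seq'; ms: number }           // blocking, sequential
  | { kind: 'par'; branchesMs: number[] } // parallel branches; slowest wins
  | { kind: 'async'; ms: number };        // fire-and-forget, off the path

function criticalPathMs(steps: Step[]): number {
  return steps.reduce((total, step) => {
    switch (step.kind) {
      case 'seq':   return total + step.ms;
      case 'par':   return total + Math.max(...step.branchesMs);
      case 'async': return total; // non-blocking, doesn't add latency
    }
  }, 0);
}

// The order flow from the table above
const orderFlow: Step[] = [
  { kind: 'seq', ms: 5 },              // API gateway auth check
  { kind: 'seq', ms: 15 },             // fetch user profile
  { kind: 'par', branchesMs: [3, 8] }, // inventory (3 ms) ∥ pricing (8 ms)
  { kind: 'seq', ms: 150 },            // payment gateway authorization
  { kind: 'seq', ms: 25 },             // write order to database
  { kind: 'async', ms: 2 },            // publish order event
];

console.log(`Critical path: ${criticalPathMs(orderFlow)} ms`); // 203 ms
```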
Key observations:
- The payment gateway dominates the critical path: 150 ms of 203 ms (~74%). Optimizing anything else yields marginal gains.
- The inventory check (3 ms) runs in parallel with pricing (8 ms), so only the slower branch counts.
- The async event publish adds nothing to user-perceived latency.
Latency Budget Allocation
Principal engineers work backward from latency requirements to allocate budgets to each component:
Target: 300 ms P99 end-to-end. Critical path has 5 synchronous components.

Budget allocation: API Gateway (15 ms), Auth (25 ms), Business Logic (60 ms), Database (50 ms), External API (150 ms).

Note the external API (payment) consumes 50% of the budget. If the payment provider's P99 is 200 ms, the design is infeasible without introducing async processing or caching.
Critical path analysis must account for percentile differences. If your P50 is 200ms but your P99 is 2000ms (due to tail latency in one component), real users experience the P99 during peak traffic. Always design for P99, not P50, and investigate any component with high P99/P50 ratios—they indicate queueing or contention issues.
Hotspots are localized areas of extreme load that can bottleneck a system even when overall capacity appears adequate. They often occur due to non-uniform data access patterns.
Common Hotspot Scenarios
| Hotspot Type | Cause | Symptoms | Detection During Design |
|---|---|---|---|
| Partition hotspot | Skewed partition key distribution | One shard overloaded while others idle | Analyze key distribution statistics |
| Cache hot key | Popular items/users | Single cache node overwhelmed | Identify potential celebrity data patterns |
| Lock contention | Global resources accessed by all requests | High lock wait times | Identify shared mutable state in design |
| Write amplification | Single record updated by many requests | Single row becomes bottleneck | Model write patterns per entity |
| Temporal hotspot | Time-based load spikes | Midnight batch jobs, end-of-month processing | Map business processes to load patterns |
The Hot Key Problem
Consider a social media platform where a celebrity with 100 million followers posts an update. Suddenly, 100 million users request the same post, the same user profile, and the same timeline. No matter how well you've designed for uniform load, this single key can overwhelm any node.
Mitigation Strategies:
- Key splitting/salting: append a suffix to spread one hot key across multiple shards.
- Local (in-process) caching: serve the hottest items from each application node's own memory.
- Replication: store hot data on multiple nodes and load-balance reads across them.
- Request coalescing: collapse concurrent requests for the same key into a single backend call.
```typescript
// Request coalescing to prevent hot key thundering herd
class RequestCoalescer<K, V> {
  // Keyed by the serialized form so arbitrary key types deduplicate correctly
  private inFlight = new Map<string, Promise<V>>();

  constructor(
    private fetcher: (key: K) => Promise<V>,
    private keySerializer: (key: K) => string = String
  ) {}

  async get(key: K): Promise<V> {
    const serializedKey = this.keySerializer(key);

    // If a request for this key is already in flight, piggyback on it
    const existing = this.inFlight.get(serializedKey);
    if (existing) {
      return existing;
    }

    // Start a new request
    const promise = this.fetcher(key).finally(() => {
      // Clean up after completion (success or failure)
      this.inFlight.delete(serializedKey);
    });
    this.inFlight.set(serializedKey, promise);
    return promise;
  }
}

// Usage example: celebrity post fetch
const postCoalescer = new RequestCoalescer<string, Post>(
  async (postId) => {
    // This only executes once, even if 1,000 requests arrive simultaneously
    return await database.query('SELECT * FROM posts WHERE id = ?', [postId]);
  }
);

// All concurrent requests share a single database call
app.get('/posts/:id', async (req, res) => {
  const post = await postCoalescer.get(req.params.id);
  res.json(post);
});
```

During design review, explicitly ask: "What's the hottest key in this system?" For every sharded database, partitioned cache, or distributed lock, identify what data will be accessed most frequently. If you can't answer this question, your capacity model is incomplete.
Bottlenecks don't exist in isolation—they propagate through dependency chains. Understanding these chains reveals how a bottleneck in one component affects the entire system.
The Dependency Matrix
Construct a matrix showing which components depend on which, and the nature of each dependency:
| Component | Depends On | Dependency Type | Failure Impact | Criticality |
|---|---|---|---|---|
| Web Gateway | Auth Service | Sync, required | All requests fail | Critical |
| Web Gateway | Rate Limiter | Sync, degraded-fallback | Accepts all traffic | Important |
| Order Service | Inventory Service | Sync, required | Cannot place orders | Critical |
| Order Service | Pricing Service | Sync, cached-fallback | Uses stale prices | Important |
| Order Service | Notification Service | Async | Emails delayed | Low |
| Payment Service | Payment Gateway (3rd party) | Sync, required | Cannot process payments | Critical |
| Analytics Service | Kafka | Async, buffered | Analytics delayed | Low |
Dependency Depth and Risk
Deep dependency chains create multiplicative failure risk:
Five components in a chain, each with 'three nines,' yields only 99.5% availability—roughly 43 hours of downtime per year.
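The arithmetic behind that figure—each link must succeed, so availabilities multiply:

```typescript
// Composite availability of a synchronous dependency chain
function chainAvailability(perComponent: number, depth: number): number {
  return Math.pow(perComponent, depth);
}

function downtimeHoursPerYear(availability: number): number {
  return (1 - availability) * 365 * 24;
}

// Five components at 99.9% ("three nines") each
const a = chainAvailability(0.999, 5);   // ≈ 0.9950 → 99.5%
const hours = downtimeHoursPerYear(a);   // ≈ 43.7 hours/year
console.log(`Chain availability: ${(a * 100).toFixed(2)}%, ~${hours.toFixed(1)} h/yr down`);
```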
Breaking Dependency Chains
Once bottlenecks are identified, the design must be modified to address them. There are fundamental strategies, each with tradeoffs:
Vertical Scaling (Scale Up)
Strategy: Increase the capacity of the bottleneck component by giving it more resources (CPU, memory, I/O).
When to use:
- The bottleneck is genuinely resource-constrained (CPU, memory, disk I/O) rather than architectural.
- You need a fast fix without changing application code or data models.
- Current hardware is far from the top of the available range.
Limitations:
- Hardware has a ceiling—eventually there is no bigger machine to buy.
- Cost grows super-linearly at the high end of the hardware range.
- A single larger machine remains a single point of failure.
- Upgrades often require downtime or a failover.
Example: Upgrading from a 4-core to 16-core database server
When you resolve a bottleneck, the system's constraint moves to the next-slowest component. This is expected and healthy—you're systematically raising the system's overall capacity. The process continues until you reach 'good enough' capacity or hit fundamental limits (physics, cost, external dependencies).
Bottleneck analysis transforms architectural diagrams into capacity models, revealing where your system will fail under load before you build it. Principal engineers treat this analysis as mandatory, not optional.
What's Next
With bottlenecks identified and addressed, we move to the next dimension of design validation: failure scenario testing. Every component will eventually fail—the question is whether your design degrades gracefully or collapses catastrophically. The next page explores systematic failure analysis and resilience verification.
You now understand how to systematically identify and address bottlenecks in system designs using theoretical foundations and practical analysis techniques. You can apply capacity modeling, critical path analysis, and hotspot detection to predict system behavior under load. Next, we'll examine how to validate your design against failure scenarios.