Microservices architecture represents a fundamental shift in how we design, build, and operate software applications. Rather than constructing a single monolithic unit, microservices decompose applications into a constellation of small, autonomous services that communicate over the network to deliver functionality.
This architectural style emerged from the practical experiences of companies like Netflix, Amazon, and Google—organizations that discovered that traditional monolithic architectures couldn't scale to meet their needs. The transition from in-process method calls to network-based inter-service communication represents one of the most significant paradigm shifts in modern software engineering.
At its core, microservices architecture transforms an application layer problem into a distributed systems problem. Where a monolith's internal communication occurs in nanoseconds with perfect reliability, microservices introduce network latency, partial failures, and eventual consistency as fundamental characteristics of the system.
Understanding microservices from a computer networks perspective is essential because the network becomes the system's connective tissue—the medium through which every interaction occurs. Network design, protocol selection, failure handling, and performance optimization become primary concerns rather than afterthoughts.
By the end of this page, you will understand the defining characteristics of microservices architecture, the network protocols and patterns that enable inter-service communication, the fundamental challenges introduced by distribution, and the infrastructure required to operate microservices at scale. You'll develop the knowledge to reason about the trade-offs between microservices and monolithic designs.
Microservices architecture structures an application as a collection of loosely coupled, independently deployable services, each implementing a specific business capability, communicating through well-defined network interfaces.
Core Characteristics:
Service Independence: Each microservice is a separate deployable unit with its own codebase, build process, and deployment pipeline. Services can be deployed, scaled, and updated independently.
Single Responsibility: Each service focuses on one business capability or domain context. A service should be small enough that its purpose is immediately clear—typically describable in one sentence.
Decentralized Data Management: Each service owns and manages its own data store. There is no shared database—services communicate through APIs rather than shared data.
Network-Based Communication: Services communicate exclusively through network protocols (HTTP/REST, gRPC, messaging systems) rather than in-process calls.
Technology Heterogeneity: Different services can use different programming languages, frameworks, and data stores based on their specific requirements.
Autonomous Teams: Services are typically owned by small, cross-functional teams that control the entire lifecycle from development to production.
The Size Question:
One of the most debated aspects of microservices is determining appropriate service boundaries. The 'micro' prefix is misleading—size isn't measured in lines of code:
"A microservice should be small enough that a single team can own it, large enough to be independently valuable, and bounded by a coherent business capability."
Practical Sizing Heuristics:
| Consideration | Guidance |
|---|---|
| Team Size | 2-8 engineers can fully own and operate the service |
| Cognitive Load | A new developer can understand the service in days, not weeks |
| Deployment Independence | The service can be deployed without coordinating with other services |
| Data Ownership | The service owns a clear, bounded set of data |
| Business Capability | The service maps to a recognizable business function |
The network implications are profound. Where a monolith might have one external HTTP endpoint, a microservices system might have hundreds of internal network paths. A single user request might traverse ten or more services, each communication adding latency, introducing potential failure points, and consuming network resources.
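The compounding effect can be made concrete with simple arithmetic. The numbers below are illustrative assumptions, not measurements, but the shape of the result is general: a serial chain of services is less available and slower than any single service in it.

```python
# Rough arithmetic for a request that traverses N services in sequence.
# Per-service availability and per-hop latency are illustrative values.

def chain_availability(per_service: float, n: int) -> float:
    """Availability of a serial chain of n services (each must succeed)."""
    return per_service ** n

def chain_latency(per_hop_ms: float, n: int) -> float:
    """Network latency added by n sequential hops."""
    return per_hop_ms * n

# 10 services, each 99.9% available, each hop adding ~2 ms:
print(round(chain_availability(0.999, 10), 4))  # ~0.99 — worse than any single service
print(chain_latency(2.0, 10))                   # 20.0 ms of pure network overhead
```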
In microservices, the network is not an implementation detail—it's a fundamental architectural element. Every design decision must account for network latency, partial failures, and the overhead of serialization/deserialization. Engineers designing microservices must think like network engineers.
Communication between microservices occurs over the network using well-defined protocols. The choice of communication pattern fundamentally shapes system behavior, performance characteristics, and failure modes.
Synchronous Communication:
In synchronous (request-response) communication, the calling service waits for a response from the called service before continuing execution.
HTTP/REST (Representational State Transfer):
REST over HTTP is the most common synchronous protocol for microservices:
Order Service → HTTP GET /users/123 → User Service
↓
HTTP 200 OK
{"id": 123, "name": "Alice", ...}
↓
Order Service ← Response received ← User Service
Characteristics:
- Human-readable JSON payloads over ubiquitous HTTP
- Stateless request-response, straightforward to cache and debug
- Universal client and tooling support
gRPC (gRPC Remote Procedure Call):
gRPC is a high-performance, binary protocol designed for inter-service communication:
Order Service → gRPC Call: GetUser(123) → User Service
↓
Binary Protobuf Response
↓
Order Service ← Response received ← User Service
Characteristics:
- Binary serialization with Protocol Buffers, reducing payload size and parse time
- Contract-first development: service interfaces are defined in .proto files
- HTTP/2 transport with multiplexing and native streaming

| Aspect | REST/HTTP | gRPC |
|---|---|---|
| Serialization | JSON/XML (text) | Protocol Buffers (binary) |
| Transport | HTTP/1.1 or HTTP/2 | HTTP/2 only |
| Contract | OpenAPI/Swagger (optional) | Proto files (required) |
| Streaming | Limited | Native bidirectional streaming |
| Browser Support | Native | Requires gRPC-Web proxy |
| Debugging | Easy (readable payloads) | Requires tooling |
| Latency | Higher (text parsing) | Lower (binary + HTTP/2) |
| Adoption | Universal | Growing, especially internal |
Asynchronous Communication:
Asynchronous (message-based) communication decouples services in time—the sender doesn't wait for a response.
Message Queues (Point-to-Point):
Order Service → [Order Queue] → Fulfillment Service
│ │
│ (fire and forget) │ (process when ready)
▼ ▼
Continues immediately Processes message
Characteristics:
- Each message is delivered to exactly one consumer
- The queue buffers messages, absorbing load spikes
- Sender and receiver are decoupled in time
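The point-to-point semantics can be sketched in-process with Python's standard library. This is a stand-in for a real broker (RabbitMQ, SQS, and similar), but the behavior shown—sender continues immediately, each message processed by one consumer when it's ready—is the same.

```python
import queue
import threading

# In-process stand-in for a message broker. The order service enqueues
# and moves on; the fulfillment worker processes messages at its own pace.

order_queue: "queue.Queue" = queue.Queue()
processed = []

def fulfillment_worker():
    while True:
        msg = order_queue.get()   # blocks until a message arrives
        if msg is None:           # sentinel: shut down cleanly
            break
        processed.append(msg["orderId"])
        order_queue.task_done()

worker = threading.Thread(target=fulfillment_worker)
worker.start()

# "Fire and forget": no waiting for the consumer.
order_queue.put({"orderId": 1})
order_queue.put({"orderId": 2})
order_queue.put(None)
worker.join()

print(processed)  # [1, 2]
```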
Publish/Subscribe (Event-Driven):
┌─→ Inventory Service
│
Order Service → [Order Events] ─→ Shipping Service
│
└─→ Analytics Service
Characteristics:
- One event fans out to many independent subscribers
- Publishers don't know who consumes their events
- New subscribers can be added without changing the publisher
Event Sourcing Pattern:
Instead of storing current state, store a sequence of events:
Event Store:
1. OrderCreated {orderId: 1, items: [...], timestamp: T1}
2. PaymentReceived {orderId: 1, amount: 99.99, timestamp: T2}
3. OrderShipped {orderId: 1, trackingId: "...", timestamp: T3}
→ Current state reconstructed by replaying events
This pattern enables:
- A complete audit trail of every state change
- Reconstructing state as of any point in time
- Replaying events to build new read models or debug issues
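A minimal sketch of the replay step, using the event names from the example above (field values are illustrative):

```python
# Event sourcing sketch: current order state is derived by replaying
# the event log, never stored directly.

events = [
    {"type": "OrderCreated",    "orderId": 1, "items": ["book"]},
    {"type": "PaymentReceived", "orderId": 1, "amount": 99.99},
    {"type": "OrderShipped",    "orderId": 1, "trackingId": "TRK-42"},
]

def replay(event_log):
    """Fold the event sequence into the current state."""
    state = {}
    for e in event_log:
        if e["type"] == "OrderCreated":
            state = {"orderId": e["orderId"], "items": e["items"], "status": "created"}
        elif e["type"] == "PaymentReceived":
            state["status"] = "paid"
        elif e["type"] == "OrderShipped":
            state["status"] = "shipped"
            state["trackingId"] = e["trackingId"]
    return state

print(replay(events))      # full history -> status "shipped"
print(replay(events[:2]))  # replaying a prefix gives the state at that point in time
```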
Use synchronous communication when immediate response is required (user-facing requests, queries). Use asynchronous communication for commands that don't need immediate confirmation, event propagation, and to decouple services that shouldn't block each other. Many systems use both—synchronous for queries, asynchronous for commands.
In a microservices environment where services scale dynamically and instances come and go, service discovery becomes essential infrastructure. Unlike monolithic deployments with static IP addresses, microservices require dynamic mechanisms to locate service instances.
The Service Discovery Problem:
When the Order Service needs to call the User Service:
Client-Side Discovery:
The calling service is responsible for discovering and selecting target instances:
┌─────────────────────────────────────────────────────────┐
│ Service Registry │
│ ┌─────────────────────────────────────────────────┐ │
│ │ User Service: │ │
│ │ - instance-1: 10.0.1.10:8080 (healthy) │ │
│ │ - instance-2: 10.0.1.11:8080 (healthy) │ │
│ │ - instance-3: 10.0.1.12:8080 (unhealthy) │ │
│ └─────────────────────────────────────────────────┘ │
└────────────────────────────────────────────────────────┘
▲
│ Query
│
┌──────────────────────────┼──────────────────────────────┐
│ Order Service │ │
│ ┌───────────────────────┴─────────────────────────┐ │
│ │ Discovery Client │ │
│ │ - Cache registry data │ │
│ │ - Select healthy instance (load balancing) │ │
│ │ - Route request directly │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────┘
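The discovery client's job in the diagram—filter to healthy instances, then load-balance—reduces to a few lines. This sketch uses a local dictionary in place of a real registry query (to Consul, Eureka, etc.); the registry contents mirror the diagram above.

```python
import random

# Local stand-in for cached registry data.
registry = {
    "user-service": [
        {"addr": "10.0.1.10:8080", "healthy": True},
        {"addr": "10.0.1.11:8080", "healthy": True},
        {"addr": "10.0.1.12:8080", "healthy": False},
    ]
}

def pick_instance(service: str) -> str:
    """Filter to healthy instances, then load-balance randomly."""
    healthy = [i["addr"] for i in registry[service] if i["healthy"]]
    if not healthy:
        raise RuntimeError(f"no healthy instances of {service}")
    return random.choice(healthy)

addr = pick_instance("user-service")
print(addr)  # one of the two healthy addresses, never 10.0.1.12
```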
Technologies: Netflix Eureka, HashiCorp Consul, etcd, Apache ZooKeeper.
Server-Side Discovery:
A dedicated load balancer handles discovery, and clients connect to a stable endpoint:
Order Service → Load Balancer → User Service Instances
│
├─→ 10.0.1.10:8080
├─→ 10.0.1.11:8080
└─→ 10.0.1.12:8080
Technologies: AWS Elastic Load Balancing, Kubernetes Services, NGINX, HAProxy.
Health Checking:
Service discovery must distinguish healthy from unhealthy instances:
| Health Check Type | Description | Use Case |
|---|---|---|
| Liveness | Is the process running? | Restart crashed containers |
| Readiness | Can the service handle requests? | Remove from load balancing during startup |
| Deep Health | Are dependencies (DB, cache) accessible? | Detect cascade failures |
DNS-Based Discovery in Kubernetes:
Kubernetes provides built-in DNS for service discovery:
Service Name: user-service
Namespace: production
→ DNS Name: user-service.production.svc.cluster.local
→ Resolves to: ClusterIP (virtual IP)
→ kube-proxy routes to healthy pod IPs
This approach combines service discovery with load balancing at the network layer, transparent to application code.
Service discovery is a critical dependency—if discovery fails, services can't communicate. Design for discovery unavailability: cache registry data, fail gracefully, and monitor registry health closely. A discovery outage can cascade into a complete system failure.
The API Gateway serves as the single entry point for external clients accessing a microservices system. It abstracts the internal service topology, providing a unified interface while handling cross-cutting concerns.
Core Functions:
┌─────────────────────────────────┐
│ Internal Services │
│ │
┌──────────────┐ │ ┌────────────┐ ┌────────────┐ │
│ Client │ ──HTTP/REST──→ │ │ User │ │ Order │ │
│ (Browser, │ │ │ Service │ │ Service │ │
│ Mobile) │ │ └────────────┘ └────────────┘ │
└──────────────┘ │ │
│ │ ┌────────────┐ ┌────────────┐ │
│ │ │ Payment │ │ Inventory │ │
▼ │ │ Service │ │ Service │ │
┌──────────────────────────────┐ │ └────────────┘ └────────────┘ │
│ API Gateway │────│ │
│ ┌────────────────────────┐ │ └─────────────────────────────────┘
│ │ • Request Routing │ │
│ │ • Authentication │ │
│ │ • Rate Limiting │ │
│ │ • Request/Response │ │
│ │ Transformation │ │
│ │ • SSL Termination │ │
│ │ • Caching │ │
│ │ • Monitoring/Logging │ │
│ └────────────────────────┘ │
└──────────────────────────────┘
Request Routing:
The gateway routes requests to appropriate backend services based on path, headers, or other criteria:
/api/users/* → User Service
/api/orders/* → Order Service
/api/payments/* → Payment Service
/graphql → GraphQL Federation Service
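The routing rules above amount to a prefix-match table. A production gateway (Kong, Envoy) implements this with efficient trie-based matching, but a minimal sketch makes the semantics clear—the service names here follow the examples in this section:

```python
# Path-prefix routing table mirroring the rules above.
ROUTES = [
    ("/api/users/",    "user-service"),
    ("/api/orders/",   "order-service"),
    ("/api/payments/", "payment-service"),
    ("/graphql",       "graphql-federation"),
]

def route(path: str) -> str:
    """Return the backend service for a request path."""
    for prefix, service in ROUTES:
        if path.startswith(prefix):
            return service
    return "not-found"

print(route("/api/orders/42"))  # order-service
print(route("/api/users/123"))  # user-service
```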
Authentication and Authorization:
Rather than each service implementing authentication:
1. Client → Gateway: Request with JWT token
2. Gateway validates token (signature, expiration, claims)
3. Gateway enriches request with user context
4. Gateway → Service: Request with validated identity
This centralizes security logic and ensures consistent enforcement.
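To show what "gateway validates token" means concretely, here is a hand-rolled HS256 JWT check. Production gateways use vetted libraries (PyJWT, jose) and typically asymmetric keys; the shared secret and claim names below are illustrative assumptions.

```python
import base64, hashlib, hmac, json, time

SECRET = b"demo-shared-secret"  # assumption: symmetric key known to the gateway

def b64url(data: bytes) -> bytes:
    return base64.urlsafe_b64encode(data).rstrip(b"=")

def b64url_decode(data: str) -> bytes:
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))

def sign(claims: dict) -> str:
    """Mint a token: base64url(header).base64url(payload).base64url(hmac)."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    sig = hmac.new(SECRET, header + b"." + payload, hashlib.sha256).digest()
    return (header + b"." + payload + b"." + b64url(sig)).decode()

def validate(token: str) -> dict:
    """Check signature and expiration; return claims if valid."""
    header, payload, sig = token.split(".")
    expected = hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(b64url(expected).decode(), sig):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims

token = sign({"sub": "user-789", "exp": time.time() + 3600})
print(validate(token)["sub"])  # user-789
```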
| Gateway | Type | Strengths | Considerations |
|---|---|---|---|
| Kong | Open Source/Enterprise | Plugin ecosystem, Lua extensibility | Operational complexity |
| AWS API Gateway | Managed Cloud | Deep AWS integration, serverless | Vendor lock-in |
| Nginx/OpenResty | Traditional/Extended | Performance, wide adoption | Limited dynamic routing |
| Envoy | Cloud Native Proxy | L7 proxy, service mesh foundation | Complexity for simple cases |
| Spring Cloud Gateway | Java Ecosystem | Tight Spring integration | JVM overhead |
| GraphQL Federation | Query Language | Unified schema, type safety | Learning curve |
Rate Limiting and Throttling:
API Gateways protect backend services from overload:
| Algorithm | Behavior | Use Case |
|---|---|---|
| Token Bucket | Allows bursts up to bucket size | API rate limiting |
| Leaky Bucket | Smooth, constant output rate | Protecting fragile backends |
| Fixed Window | Count requests per time window | Simple quota enforcement |
| Sliding Window | Rolling count over time | More accurate rate limiting |
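The token bucket (first row of the table) is compact enough to sketch in full. Tokens refill at a steady rate; a client can burst up to `capacity` requests at once, then is throttled until tokens accumulate again. The numbers below are illustrative.

```python
import time

class TokenBucket:
    """Token bucket rate limiter: refills continuously, allows bursts."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = capacity          # start full: a burst is allowed immediately
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Add tokens for the elapsed time, capped at bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, refill_per_sec=1.0)
results = [bucket.allow() for _ in range(5)]
print(results)  # first 3 allowed (burst), remainder rejected until tokens refill
```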
Response Aggregation (Backend for Frontend - BFF):
For mobile or specific clients, the gateway can aggregate multiple service responses:
┌─────────────────────────────────────────────────────────────┐
│ Traditional Approach │
│ │
│ Mobile App → GET /user/123 → User Service │
│ → GET /user/123/orders → Order Service │
│ → GET /recommendations → Recommendation Service │
│ │
│ Result: 3 round trips, higher latency │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ BFF Pattern │
│ │
│ Mobile App → GET /mobile/home → Mobile BFF Gateway │
│ │ │
│ ├─→ User Service │
│ ├─→ Order Service │
│ └─→ Recommendation │
│ │
│ Result: 1 round trip, lower latency, optimized payload │
└─────────────────────────────────────────────────────────────┘
Avoid putting business logic in the API Gateway—it should focus on cross-cutting concerns. Also avoid 'gateway monolith' where all customization accumulates in one place. Consider multiple specialized gateways (mobile BFF, partner API gateway) rather than one overloaded gateway.
In monolithic applications, failure is binary—the application works or it doesn't. In microservices, partial failure is the norm. Services fail, networks partition, and latency spikes occur constantly. Designing for resilience is not optional—it's essential for system survival.
Understanding Failure Modes:
| Failure Type | Description | Example |
|---|---|---|
| Crash Failure | Service terminates unexpectedly | Out of memory, unhandled exception |
| Latency Degradation | Service responds but slowly | Database connection pool exhaustion |
| Partial Failure | Some requests fail, others succeed | One container overloaded |
| Byzantine Failure | Service returns incorrect results | Bug in business logic |
| Network Partition | Services can't reach each other | Switch failure, DNS issue |
Circuit Breaker Pattern:
Prevent cascade failures by stopping requests to failing services:
┌─────────────────────────┐
│ Circuit Breaker │
│ │
│ ┌───────────────────┐ │
Order Service ──────│──│ CLOSED (normal) │──│────► User Service
│ │ └─────────┬─────────┘ │ │
│ │ │ │ │
│ │ Failures exceed │ ✗ Fails
│ │ threshold │
│ │ │ │
│ │ ▼ │
│ │ ┌───────────────────┐ │
│ │ │ OPEN (failing) │──┼────► Immediate failure
│ │ │ No requests │ │ (no connection attempt)
│ │ └─────────┬─────────┘ │
│ │ │ │
│ │ Timeout expires │
│ │ │ │
│ │ ▼ │
│ │ ┌───────────────────┐ │
│ │ │ HALF-OPEN │──┼────► Limited test requests
│ │ │ (testing) │ │
│ │ └───────────────────┘ │
└─────────────────────────┘
Benefits:
- Fails fast instead of letting requests pile up behind a failing service
- Gives the failing service time to recover without additional load
- Prevents thread and connection exhaustion in the caller
Implementing Resilience in Practice:
1. Timeouts (Defensive Coding):
import requests

# Without timeout: can hang indefinitely
response = requests.get('http://user-service/users/123')

# With timeout: fails fast if the service is slow
response = requests.get(
    'http://user-service/users/123',
    timeout=(1.0, 5.0)  # (connect timeout, read timeout)
)
2. Retry with Exponential Backoff:
Attempt 1: Immediate
Attempt 2: Wait 100ms
Attempt 3: Wait 200ms
Attempt 4: Wait 400ms
+ Jitter: Random component to avoid synchronized retries
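The schedule above (base 100 ms, doubling per attempt) plus "full jitter" can be sketched as follows; the cap value is an illustrative assumption:

```python
import random

def backoff_ms(attempt: int, base_ms: float = 100.0, cap_ms: float = 10_000.0) -> float:
    """Delay before retry `attempt` (1-indexed), with full jitter.

    Sampling uniformly in [0, base * 2^(attempt-1)] spreads out clients
    that failed at the same moment, avoiding synchronized retry storms.
    """
    exp = min(cap_ms, base_ms * (2 ** (attempt - 1)))
    return random.uniform(0, exp)

for attempt in range(1, 5):
    print(f"attempt {attempt}: up to {100 * 2 ** (attempt - 1)} ms, "
          f"sampled {backoff_ms(attempt):.0f} ms")
```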
3. Circuit Breaker State Transitions:
CLOSED → OPEN: 5 failures within 30 seconds
OPEN → HALF-OPEN: After 30 seconds timeout
HALF-OPEN → CLOSED: 3 consecutive successes
HALF-OPEN → OPEN: Any failure
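The state machine above fits in a small class. This is a single-threaded sketch of the pattern, not a production implementation (real libraries add thread safety, sliding failure windows, and metrics); the thresholds mirror the transitions listed above.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: CLOSED -> OPEN -> HALF-OPEN -> CLOSED."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0, success_threshold=3):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.success_threshold = success_threshold
        self.state = "CLOSED"
        self.failures = 0
        self.successes = 0
        self.opened_at = 0.0

    def call(self, fn):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "HALF-OPEN"      # timeout expired: probe the service
                self.successes = 0
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # Any failure while half-open, or too many while closed, trips the breaker.
            if self.state == "HALF-OPEN" or self.failures >= self.failure_threshold:
                self.state = "OPEN"
                self.opened_at = time.monotonic()
            raise
        if self.state == "HALF-OPEN":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "CLOSED"
                self.failures = 0
        else:
            self.failures = 0                 # a success resets the failure count
        return result

breaker = CircuitBreaker(reset_timeout=0.1)   # short timeout for the demo
def flaky():
    raise ConnectionError("service down")
for _ in range(5):
    try:
        breaker.call(flaky)
    except Exception:
        pass
print(breaker.state)  # OPEN
```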
Libraries: Resilience4j (Java), Polly (.NET), gobreaker (Go), Hystrix (Java, now in maintenance mode).
Without resilience patterns, a single slow service can take down an entire microservices system. Requests pile up waiting for the slow service, consuming threads and connections. Other services become slow, triggering more cascades. This 'gray failure' is often worse than a clean crash because it's harder to detect and recover from.
A service mesh is a dedicated infrastructure layer for handling service-to-service communication in microservices. It provides a uniform way to connect, secure, and observe services without requiring changes to application code.
The Sidecar Pattern:
The service mesh deploys a proxy (sidecar) alongside each service instance:
┌──────────────────────────────────────────────────────────────┐
│ Pod/Container │
│ ┌────────────────────┐ ┌────────────────────────────┐ │
│ │ Application │ │ Sidecar Proxy │ │
│ │ (User Service) │────▶│ (Envoy) │ │
│ │ │ │ │ │
│ │ localhost:8080 │ │ • mTLS │ │
│ │ │◀────│ • Load balancing │ │
│ └────────────────────┘ │ • Circuit breaking │ │
│ │ • Observability │ │
│ │ • Traffic control │ │
│ └────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
│
│ Outbound traffic
▼
┌──────────────────────────────────────────────────────────────┐
│ Pod/Container │
│ ┌────────────────────┐ ┌────────────────────────────┐ │
│ │ Sidecar Proxy │ │ Application │ │
│ │ (Envoy) │────▶│ (Order Service) │ │
│ └────────────────────┘ └────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
Data Plane vs. Control Plane:
| Component | Function | Example |
|---|---|---|
| Data Plane | Sidecar proxies that handle actual traffic | Envoy, Linkerd-proxy |
| Control Plane | Management layer that configures proxies | Istio, Linkerd, Consul Connect |
Traffic Management:
Service meshes provide sophisticated traffic control:
Canary Deployments:
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: user-service
spec:
  hosts:
  - user-service
  http:
  - route:
    - destination:
        host: user-service
        subset: v1
      weight: 90
    - destination:
        host: user-service
        subset: v2-canary
      weight: 10
A/B Testing:
http:
- match:
- headers:
x-experiment-group:
exact: "treatment"
route:
- destination:
host: user-service
subset: experimental
Fault Injection (Chaos Engineering):
http:
- fault:
delay:
percentage:
value: 10
fixedDelay: 5s
abort:
percentage:
value: 1
httpStatus: 503
| Feature | Istio | Linkerd | Consul Connect |
|---|---|---|---|
| Data Plane | Envoy | Linkerd-proxy (Rust) | Envoy or built-in |
| Complexity | High | Low-Medium | Medium |
| Resource Usage | Higher | Lower | Medium |
| mTLS | Yes | Yes | Yes |
| Traffic Management | Extensive | Basic | Good |
| Multi-cluster | Yes | Yes | Yes |
| Best For | Full feature set | Simplicity, Kubernetes | Multi-platform |
Service meshes add operational complexity and resource overhead. Consider a mesh when you have 20+ services needing consistent security (mTLS), observability, or sophisticated traffic management. For smaller deployments, application-level libraries (Resilience4j, Polly) may be more appropriate.
In monolithic applications, debugging means looking at one process. In microservices, understanding system behavior requires observability—the ability to understand internal state from external outputs. Observability rests on three pillars:
The Three Pillars:
- Metrics: numeric time-series data (request rates, latencies, error counts)
- Logs: discrete, timestamped event records
- Traces: the path of a single request as it crosses service boundaries
Distributed Tracing:
Tracing is especially critical for microservices because a single user request may touch many services:
┌────────────────────────────────────────────────────────────────────────┐
│ Trace ID: abc-123 │
│ │
│ API Gateway [████████████████████████████████████] 200ms │
│ │ │
│ ├──▶ User Service [██████████] 45ms │
│ │ │ │
│ │ └──▶ Redis Cache [██] 5ms │
│ │ │
│ ├──▶ Order Service [████████████████████████] 120ms │
│ │ │ │
│ │ ├──▶ Inventory Service [████████] 40ms │
│ │ │ │ │
│ │ │ └──▶ PostgreSQL [███] 15ms │
│ │ │ │
│ │ └──▶ Payment Service [██████████] 50ms │
│ │ │ │
│ │ └──▶ External Payment API [████████] 35ms │
│ │ │
│ └──▶ Notification Service [████] 20ms (async) │
└────────────────────────────────────────────────────────────────────────┘
Trace Context Propagation:
Traces work by propagating context across service boundaries:
Service A Service B
│ │
│ HTTP Request │
│ Headers: │
│ traceparent: 00-abc123-... │
│ tracestate: vendor=value │
│ ─────────────────────────────────▶│
│ │
│ Extract trace context
│ Create child span
│ Include in outgoing requests
W3C Trace Context (the traceparent and tracestate headers above) is the standard for propagating trace context across different tracing systems.
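The traceparent header has the shape version-traceid-spanid-flags. A minimal sketch of propagation: a service receiving a request keeps the trace-id but mints a fresh span-id for each outgoing call. (In practice, instrumentation libraries such as OpenTelemetry do this automatically.)

```python
import secrets

def new_traceparent() -> str:
    """Start a new trace: 16-byte trace-id, 8-byte span-id, sampled flag."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"

def child_traceparent(incoming: str) -> str:
    """Propagate: keep trace-id and flags, mint a new span-id."""
    version, trace_id, _parent_span, flags = incoming.split("-")
    return f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"

root = new_traceparent()
child = child_traceparent(root)
print(root)
print(child)  # same trace-id, new span-id
```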
Observability Stack:
| Component | Open Source Options | Cloud Options |
|---|---|---|
| Metrics Collection | Prometheus, StatsD | CloudWatch, Datadog |
| Metrics Storage | Prometheus, VictoriaMetrics | Managed services |
| Log Aggregation | ELK Stack, Loki | CloudWatch Logs, Splunk |
| Distributed Tracing | Jaeger, Zipkin | X-Ray, Honeycomb |
| Visualization | Grafana | Built into cloud services |
| Alerting | Alertmanager, PagerDuty | Cloud-native alerting |
Structured Logging:
For effective log analysis, use structured (JSON) logging:
{
"timestamp": "2024-01-15T10:30:00Z",
"level": "INFO",
"service": "order-service",
"traceId": "abc-123",
"spanId": "def-456",
"userId": "user-789",
"message": "Order created",
"orderId": "order-101",
"amount": 99.99,
"duration_ms": 45
}
Structured logs enable querying, aggregation, and correlation that free-text logs can't support.
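Emitting records in that shape takes only a custom formatter on top of the standard library. This is a sketch: the field names (traceId, spanId) follow the example above, and a real service would pull them from the active trace context rather than passing them by hand.

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%SZ"),
            "level": record.levelname,
            "service": "order-service",
            "message": record.getMessage(),
        }
        # Merge any structured fields attached via `extra`.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
log = logging.getLogger("order-service")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("Order created",
         extra={"fields": {"traceId": "abc-123", "orderId": "order-101"}})
```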
Don't wait until problems occur to build observability. Instrument services from the start, establish SLOs (Service Level Objectives), and practice debugging before production incidents. The cost of observability infrastructure is trivial compared to the cost of extended outages without visibility.
We've conducted an extensive exploration of microservices architecture—understanding not just what it is, but the profound network and operational implications of distributing application components across services. Let's consolidate the essential insights:
| Aspect | Monolithic | Microservices |
|---|---|---|
| Deployment | Single artifact | Many independent services |
| Scaling | Uniform | Per-service |
| Internal Communication | In-process | Network-based |
| Data Management | Shared database | Database per service |
| Technology Stack | Uniform | Polyglot |
| Team Structure | Feature teams in shared codebase | Service-owning teams |
| Operational Complexity | Lower | Higher |
| Failure Modes | Binary (works/fails) | Partial failures |
When to Choose Microservices:
✅ Large organizations with multiple autonomous teams
✅ Need for independent scaling of specific components
✅ Polyglot requirements (different services suit different technologies)
✅ Complex domains benefiting from bounded context isolation
✅ High availability requirements with graceful degradation
When to Avoid Microservices:
❌ Small teams (fewer than 10-15 engineers)
❌ Early-stage products with unclear requirements
❌ Simple domains without complex scaling needs
❌ Organizations without DevOps maturity
❌ When distributed system expertise is lacking
Transition to Web Applications:
With our understanding of both monolithic and microservices architectures complete, we'll next explore web applications—examining how these architectural patterns manifest in the specific context of HTTP-based applications serving browsers and providing APIs.
You now possess comprehensive knowledge of microservices architecture from a computer networks perspective—understanding not just the conceptual model but the network protocols, infrastructure requirements, and operational patterns that make distributed systems work. This knowledge is essential for designing, building, and operating modern application layer systems.