Understanding what load balancing does is only half the battle. Knowing where to place load balancers in your architecture is equally critical. The same load balancing concept takes very different forms at the network edge, in the middle tier, and in front of databases.
Think of load balancer placement as deciding where to station traffic controllers in a city. Controllers at highway entrances manage very different traffic than controllers at downtown intersections or parking garage entrances. Each position has unique responsibilities, constraints, and optimization opportunities.
In modern distributed systems, you'll often need load balancing at multiple layers simultaneously—and understanding the distinct role of each layer is essential for designing robust, high-performance architectures.
By the end of this page, you will understand the three primary placement zones for load balancers (edge, middle tier, and data layer), the specific requirements and constraints of each zone, and how multi-tier load balancing creates comprehensive traffic management across your entire architecture.
Modern distributed architectures typically have three distinct zones where load balancing occurs. Each zone has fundamentally different characteristics:
| Tier | Position | Traffic Pattern | Primary Goals |
|---|---|---|---|
| Edge (Tier 1) | Between internet and your infrastructure | External client requests (HTTP/S, mobile, APIs) | DDoS protection, TLS termination, geographic routing |
| Middle (Tier 2) | Between internal services | Service-to-service communication (gRPC, HTTP) | Service discovery, internal routing, east-west traffic |
| Data (Tier 3) | Between applications and data stores | Database queries, cache operations | Read distribution, connection pooling, failover |
Why Multiple Tiers?
You might wonder: if edge load balancers distribute traffic, why do we need more load balancers internally?
The answer lies in traffic multiplication and specialization:
- **Traffic Multiplication:** One external request might generate 10 internal service calls. Each of those calls needs load balancing to reach its destination service.
- **Specialization:** Edge load balancers optimize for external threats (DDoS, TLS) while internal load balancers optimize for microservice patterns (service discovery, circuit breaking).
- **Failure Isolation:** If an internal service is overwhelmed, the internal load balancers' health checks catch it before it cascades to the edge.
- **Independent Scaling:** Internal services scale independently, so they need their own load balancing that adapts to their specific patterns.
Edge load balancers sit at the boundary between the public internet and your infrastructure. They are the first line of defense and the primary entry point for all external traffic.
Characteristics of Edge Load Balancing:

- Public-facing: holds your public IPs and terminates TLS for external clients
- Security-focused: absorbs DDoS traffic and filters malicious requests before they reach internal systems
- Geographically aware: routes users toward the nearest region or point of presence
- Built for raw scale: sized for your total external traffic volume
Edge Load Balancer Architecture Patterns:
Pattern 1: Single Region Edge
Simplest architecture: one region, with one or more edge load balancers exposed through DNS or a single virtual/anycast IP.
Users → DNS → Edge LB (with HA pair) → Backend Servers
Pattern 2: Multi-Region with GSLB
Global Server Load Balancing (GSLB) uses intelligent DNS or anycast to route users to the nearest region. Each region has its own edge load balancers.
Users → GSLB/GeoDNS → Region-specific Edge LB → Regional Backends
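As a rough illustration, the GSLB decision can be as simple as mapping the client's location to the nearest healthy region. Here is a minimal sketch of that idea; the region names, continent codes, and `is_healthy` check are all illustrative assumptions, not a real GSLB API:

```python
# Minimal GSLB-style region selection: prefer the region nearest to the
# client's continent, fall back to any healthy region. All names here
# are illustrative assumptions.
REGION_BY_CONTINENT = {
    "NA": "us-east-1",
    "EU": "eu-west-1",
    "AS": "ap-southeast-1",
}
FALLBACK_ORDER = ["us-east-1", "eu-west-1", "ap-southeast-1"]

def is_healthy(region: str) -> bool:
    # Placeholder: a real GSLB continuously polls each region's edge LBs.
    return True

def pick_region(client_continent: str) -> str:
    preferred = REGION_BY_CONTINENT.get(client_continent)
    if preferred and is_healthy(preferred):
        return preferred
    for region in FALLBACK_ORDER:  # a real GSLB would order these by distance
        if is_healthy(region):
            return region
    raise RuntimeError("no healthy region available")

print(pick_region("EU"))  # -> eu-west-1
```

Real GSLB systems layer health polling, latency measurements, and weighted traffic shifting on top of this basic preference-plus-fallback logic.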
Pattern 3: CDN as Edge Load Balancer
CDNs (Cloudflare, Akamai, CloudFront) act as a global edge load balancing tier, with hundreds of PoPs worldwide. They handle static content and can proxy dynamic requests to origin servers.
Users → CDN PoP (caches + load balances) → Origin Servers
Common edge load balancing technologies include: AWS ALB/NLB/CloudFront, Google Cloud Load Balancing, Cloudflare, Akamai, Azure Front Door, and self-managed NGINX or HAProxy on public instances. Choice depends on your cloud provider, traffic volume, security requirements, and geographic distribution needs.
Middle-tier load balancers handle traffic between internal services—what's often called 'east-west' traffic as opposed to 'north-south' traffic from external clients. In microservice architectures, this internal traffic often vastly exceeds external traffic.
The Scale of Internal Traffic:
Consider a typical e-commerce request, such as loading a product page. One external call might fan out to (illustrative breakdown):

- Authentication/session validation
- User profile lookup
- Product catalog query
- Inventory check
- Pricing and promotions calculation
- Recommendations fetch
- Reviews fetch
- Cart status check
- Fraud/rate-limit check
- Analytics and logging events

Result: 1 external request → 10+ internal requests
This multiplication means internal load balancing often handles 10-100x the traffic that edge load balancing handles.
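The arithmetic is worth making explicit. A back-of-the-envelope sketch, with illustrative numbers:

```python
# Back-of-the-envelope traffic multiplication (numbers illustrative).
external_rps = 1_000   # requests/second arriving at the edge
avg_fanout = 10        # internal service calls per external request

internal_rps = external_rps * avg_fanout
print(f"Edge handles {external_rps:,} rps; "
      f"middle tier handles ~{internal_rps:,} rps")
# -> Edge handles 1,000 rps; middle tier handles ~10,000 rps
```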
Middle-Tier Load Balancing Architectures:
Architecture 1: Centralized Internal Load Balancer
A dedicated load balancer cluster handles all internal traffic. Services call the load balancer, which routes to the appropriate backend.
Service A → Internal LB → Service B instances
Architecture 2: Client-Side Load Balancing
Each service maintains a list of available instances for target services and makes load balancing decisions locally.
Service A (with built-in LB logic) → directly to Service B instance
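A minimal sketch of the client-side approach: the caller keeps its own instance list (hard-coded here; in practice it would be refreshed from service discovery) and rotates through it, skipping instances it has recently marked as failed. Endpoints and thresholds are illustrative:

```python
import itertools
import time

class ClientSideBalancer:
    """Round-robin over known instances, skipping recently failed ones.

    In a real system the instance list would be refreshed from service
    discovery (DNS, Consul, the Kubernetes API); here it is static.
    """
    def __init__(self, instances: list[str], cooldown_s: float = 10.0):
        self._cycle = itertools.cycle(instances)
        self._count = len(instances)
        self._failed_until: dict[str, float] = {}
        self._cooldown_s = cooldown_s

    def pick(self) -> str:
        # n steps of the cycle visit each of the n instances exactly once.
        for _ in range(self._count):
            candidate = next(self._cycle)
            if self._failed_until.get(candidate, 0.0) < time.monotonic():
                return candidate
        raise RuntimeError("all instances marked unhealthy")

    def mark_failed(self, instance: str) -> None:
        # Skip this instance until its cooldown expires.
        self._failed_until[instance] = time.monotonic() + self._cooldown_s

lb = ClientSideBalancer(["10.0.1.5:8080", "10.0.1.6:8080", "10.0.1.7:8080"])
target = lb.pick()            # e.g., "10.0.1.5:8080"
# on a connection error: lb.mark_failed(target)
```

Libraries like gRPC's built-in balancers or Netflix Ribbon implement this same pattern with richer policies (weighted, least-loaded, zone-aware).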
Architecture 3: Sidecar/Service Mesh
A proxy runs alongside each service instance (as a 'sidecar'), handling all network communication including load balancing.
Service A → Sidecar Proxy → Sidecar Proxy → Service B
Service meshes (Istio, Linkerd, Consul Connect) have emerged as the dominant pattern for middle-tier load balancing in Kubernetes environments. They provide consistent load balancing, security (mTLS), and observability without modifying application code.
Data layer load balancing sits between application services and data stores (databases, caches, message queues). This tier has unique requirements because data access patterns are fundamentally different from HTTP traffic.
Database Load Balancing Architecture:
| Data Store | Common Proxies | Key Features |
|---|---|---|
| PostgreSQL | PgBouncer, Pgpool-II, HAProxy | Connection pooling, read/write splitting, failover |
| MySQL | ProxySQL, MySQL Router, HAProxy | Query routing, connection multiplexing, read/write split |
| MongoDB | mongos (built-in) | Sharding, replica set routing, automatic failover |
| Redis | Redis Cluster (built-in), Envoy, Twemproxy | Slot-based routing, pipelining, connection pooling |
| Kafka | Kafka clients (built-in) | Partition-aware routing, leader tracking |
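The "slot-based routing" in the Redis row is concrete enough to sketch. Redis Cluster assigns every key to one of 16,384 slots via CRC16, and any cluster-aware client or proxy computes the same mapping; hash tags in braces let related keys land on the same slot. A sketch of that computation:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XModem), the variant Redis Cluster specifies."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """HASH_SLOT = CRC16(key) mod 16384, honoring {hash tags}."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:  # non-empty tag between the braces
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384

print(key_slot("user:1000"))          # slot for the whole key
print(key_slot("{user:1000}:cart"))   # same slot as any {user:1000}* key
```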
Connection Pooling Deep Dive:
Connection pooling is perhaps the most critical function of data layer load balancing. Consider:
Without a connection pooler:

- Every application instance opens its own connections: 100 app instances × 20 connections each = 2,000 database connections
- Each connection carries real cost (PostgreSQL, for example, dedicates a backend process per connection)
- Connection setup (TCP handshake, TLS, authentication) is paid over and over
- The database's connection limit becomes the first scaling bottleneck
With a connection pooler:

- Applications connect to the pooler, which maintains a small set of long-lived database connections (say, 50) and multiplexes requests over them
- Connection setup cost is paid once at pool creation, not per request
- The database sees a stable, bounded connection count no matter how many app instances exist
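To make the mechanics concrete, here is a minimal sketch of the pooling idea, assuming a generic `connect()` factory (hypothetical). Real poolers like PgBouncer add transaction-level multiplexing, timeouts, and connection health checks on top of this:

```python
import queue
from contextlib import contextmanager

class ConnectionPool:
    """Keep a fixed set of long-lived connections and lend them out.

    Callers borrow a connection, use it, and return it, instead of
    paying TCP + TLS + auth setup on every request.
    """
    def __init__(self, connect, size: int = 10):
        self._pool: queue.Queue = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())   # open connections once, up front

    @contextmanager
    def connection(self, timeout: float = 5.0):
        conn = self._pool.get(timeout=timeout)  # block if pool is exhausted
        try:
            yield conn
        finally:
            self._pool.put(conn)        # always return it to the pool

# Usage with a hypothetical driver:
# pool = ConnectionPool(lambda: psycopg2.connect(DSN), size=20)
# with pool.connection() as conn:
#     conn.cursor().execute("SELECT 1")
```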
Data layer load balancing is often more complex than HTTP load balancing because data stores have stateful semantics (transactions, connections), consistency requirements (read-after-write), and protocol-specific handling. This tier requires deep understanding of your specific data store's behavior.
When designing a system, how do you decide which tiers need load balancing? Use this decision framework:
Tier-Specific Decision Factors:
| Factor | Edge | Middle | Data |
|---|---|---|---|
| Must-have if... | Any internet traffic | Multiple microservices | Multiple database nodes |
| Primary concern | Security & DDoS | Latency & discovery | Connection management |
| Stateless traffic? | Usually yes | Usually yes | Often no (transactions) |
| Protocol complexity | HTTP/S (well-understood) | HTTP/gRPC | Database-specific |
| Typical scale | 1-10 instances | 10-100 instances | 2-10 instances |
| Failure detection | Seconds acceptable | Sub-second preferred | Sub-second critical |
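The "Must-have if..." row translates directly into a first-pass heuristic. A small sketch encoding it (the function and thresholds are illustrative, not a formal rule):

```python
def tiers_needed(has_internet_traffic: bool,
                 microservice_count: int,
                 db_node_count: int) -> list[str]:
    """First-pass placement heuristic from the decision table above."""
    tiers = []
    if has_internet_traffic:
        tiers.append("edge")    # any internet traffic
    if microservice_count > 1:
        tiers.append("middle")  # multiple microservices
    if db_node_count > 1:
        tiers.append("data")    # multiple database nodes
    return tiers

print(tiers_needed(True, 12, 3))  # -> ['edge', 'middle', 'data']
print(tiers_needed(True, 1, 1))   # -> ['edge']
```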
Common Mistakes in Placement:
- **Skipping middle-tier load balancing** — "Our services are simple, they can call each other directly." This works until services scale differently or fail.
- **Over-centralizing at the edge** — Routing all internal calls through edge load balancers adds latency and creates unnecessary coupling.
- **Ignoring the data tier** — Assuming databases 'just work.' Connection pooling and read distribution are almost always beneficial.
- **One-size-fits-all configuration** — Using identical health check settings across tiers. Each tier has different latency tolerances (see the sketch after this list).
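Tying that last mistake to the failure-detection row of the decision table, per-tier health check settings might look like this. The values are assumptions for illustration, not recommendations:

```python
# Illustrative per-tier health check settings, reflecting each tier's
# failure-detection tolerance (seconds acceptable at the edge,
# sub-second internally). Tune to your own latency budgets.
HEALTH_CHECKS = {
    "edge":   {"interval_s": 5.0, "timeout_s": 2.0,  "unhealthy_after": 3},
    "middle": {"interval_s": 1.0, "timeout_s": 0.5,  "unhealthy_after": 2},
    "data":   {"interval_s": 0.5, "timeout_s": 0.25, "unhealthy_after": 2},
}

def detection_latency(tier: str) -> float:
    """Rough worst-case seconds before an instance is marked unhealthy."""
    cfg = HEALTH_CHECKS[tier]
    return cfg["interval_s"] * cfg["unhealthy_after"] + cfg["timeout_s"]

for tier in HEALTH_CHECKS:
    print(f"{tier}: ~{detection_latency(tier):.2f}s to detect a failure")
```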
Let's trace a request through a realistic multi-tier load-balanced architecture—an e-commerce checkout flow:
Step-by-Step Request Flow:
Step 1-2: Edge Load Balancing
GeoDNS/GSLB resolves the user to the nearest region; the regional edge load balancer terminates TLS, applies security filtering, and forwards the request inward.

Step 3-4: Cluster Edge
The cluster's ingress load balancer matches the request path (e.g., /checkout) and balances it across the checkout service's instances.

Step 5-8: Service Mesh (Middle Tier)
The checkout service calls cart, inventory, and payment services; each call passes through sidecar proxies that select healthy instances and apply retries and circuit breaking.

Step 9-10: Database Tier
The order write goes through a connection pooler/proxy that routes writes to the primary and distributes reads across replicas.

Step 11: Cache Tier
Session and cart lookups are routed to the correct cache node (e.g., by key slot in a Redis Cluster).
Total Load Balancing Decisions for One Request: 6-10
This illustrates why understanding placement matters—load balancing touches every tier, and each tier has distinct requirements and failure modes.
With load balancing at multiple tiers, observability becomes critical. Ensure distributed tracing (OpenTelemetry) propagates through all tiers so you can debug latency and failures across the entire request path. Each load balancer should emit metrics that aggregate into a coherent picture.
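In practice, making trace context cross every hop is largely a matter of propagating W3C `traceparent` headers. A minimal sketch with the OpenTelemetry Python SDK (the packages and calls are real OpenTelemetry APIs; the service name and downstream call are hypothetical):

```python
# pip install opentelemetry-api opentelemetry-sdk
from opentelemetry import trace
from opentelemetry.propagate import inject
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter()))
tracer = trace.get_tracer("checkout-service")

def call_inventory() -> None:
    # Start a span for this hop, then inject the active trace context into
    # outgoing headers so sidecars, load balancers, and the next service
    # all see the same trace ID.
    with tracer.start_as_current_span("inventory.check"):
        headers: dict = {}
        inject(headers)  # adds the W3C 'traceparent' header
        # requests.get("http://inventory/check", headers=headers)  # hypothetical

call_inventory()
```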
Each placement decision involves trade-offs. Here's a summary to guide your architectural choices:
| Consideration | Edge LB | Middle-Tier LB | Data-Tier LB |
|---|---|---|---|
| Latency added | 5-50ms (TLS, geo-routing) | 0.1-1ms (local network) | 0.1-2ms (protocol parsing) |
| Throughput | Very high (designed for scale) | High (per-cluster) | Moderate (database-bound) |
| Failure impact | Total outage if fails | Service-specific outage | Data access outage |
| HA requirements | Critical (multiple AZs) | Important (sidecar pattern helps) | Important (pooler can be SPOF) |
| Config complexity | Moderate (TLS, routing rules) | High (service discovery) | High (protocol-specific) |
| Cost | Higher (public bandwidth) | Lower (internal only) | Lower (internal only) |
When to Simplify (Fewer Tiers):

- A monolith or a handful of services: edge load balancing alone is usually enough
- A single database node: a data-tier proxy adds moving parts without benefit
- A small team: every tier you add is infrastructure you must operate and debug
When to Invest in All Tiers:

- Dozens of microservices scaling independently
- Read-heavy workloads spread across database replicas
- Strict availability targets that demand failure isolation at every layer
Most systems don't need all three tiers on day one. Start with edge load balancing (almost always needed), add middle-tier as you adopt microservices, and add data-tier when database scaling becomes necessary. Let your architecture evolve with your requirements.
Let's consolidate the key concepts from this page:

- Load balancing happens in three zones: edge (Tier 1) between the internet and your infrastructure, middle (Tier 2) between internal services, and data (Tier 3) in front of data stores
- Traffic multiplication means the middle tier often carries 10-100x the edge's request volume
- Each tier optimizes for different things: the edge for security and TLS, the middle tier for discovery and low latency, the data tier for connection management and read/write routing
- Start with edge load balancing and add the other tiers as your architecture grows
What's Next:
With a solid understanding of load balancing fundamentals, benefits, and placement, the final page of this module addresses a critical concern: the load balancer as a single point of failure. We'll explore how to make load balancing infrastructure itself highly available.
You now understand load balancer placement across the three tiers of a distributed system architecture. You can reason about where load balancing is needed, what each tier optimizes for, and how to make placement decisions based on your system's requirements.