There's a deep irony in load balancing: the very component designed to eliminate single points of failure can itself become the single point of failure.
Consider a system with 100 highly available backend servers, each with 99.9% uptime. Together, they could theoretically achieve astronomical availability. But if all their traffic flows through a single load balancer with 99.9% uptime, the entire system's availability is capped at that 99.9%—roughly 8.76 hours of downtime per year.
The math is unforgiving:
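One way to write the constraint, treating the load balancer and the backend tier as components in series and assuming independent failures (with 100 redundant servers, the backend tier's own availability is effectively 100%):

$$A_{\text{system}} = A_{\text{LB}} \times A_{\text{backends}} \approx 0.999 \times 1.0 = 99.9\%$$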
This means that investing in backend redundancy while neglecting load balancer high availability is architectural waste. The load balancer's reliability must match or exceed the availability targets of the entire system.
Many system designs implicitly assume load balancers are infinitely reliable. This assumption is false. Load balancers are software running on hardware, subject to the same failures as any other component. This page teaches you to design for their failure.
By the end of this page, you will understand how load balancers fail, strategies for making them highly available, the role of redundancy patterns like active-passive and active-active, DNS-based failover, and how cloud providers solve this problem at scale.
To design for high availability, we must first understand the failure modes of load balancers. These failures fall into several categories:
| Failure Type | Typical Frequency | Detection Time | Recovery Time |
|---|---|---|---|
| Hardware failure | 1-5% per year | Seconds (heartbeat) | Minutes (failover) to hours (replace) |
| Software crash | Rare (mature software) | Seconds | Seconds (auto-restart) |
| Resource exhaustion | Varies (load-dependent) | Seconds to minutes | Seconds (scale) to minutes (debug) |
| Configuration error | Depends on processes | Immediate to hours | Seconds (rollback) to hours (investigate) |
| Network failure | Rare but impactful | Seconds | Seconds to hours (network repair) |
| Overload cascade | Rare but catastrophic | Seconds | Minutes (traffic shed or scale) |
The Cascading Failure Problem:
The most dangerous failure mode is the cascade. Consider: one load balancer fails, its traffic is redistributed across the survivors, the extra load pushes another instance past its capacity, that instance fails too, and each failure makes the next one more likely until the entire tier is down.
This is why high-availability load balancing isn't just about having a spare—it's about having capacity headroom across all instances to absorb failures.
For highly available load balancing, follow the N+1 principle: provision N+1 instances where N is sufficient for peak load. If you need 2 load balancers for capacity, run 3 so that any single failure leaves 2 running—still sufficient for full load.
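A tiny simulation makes the cascade and the value of N+1 headroom concrete; the traffic and per-instance capacity figures below are made-up illustrations, not recommendations:

```python
# What happens to the surviving instances once one load balancer has died?
CAPACITY_RPS = 60_000     # assumed capacity of one load balancer
TOTAL_RPS = 110_000       # assumed peak traffic

def simulate_after_failure(instances: int) -> None:
    alive = instances - 1                # one instance has just died
    while alive > 0:
        load = TOTAL_RPS / alive
        print(f"{alive} instance(s) left, {load / CAPACITY_RPS:.0%} utilization each")
        if load <= CAPACITY_RPS:
            print("the tier absorbs the failure")
            return
        alive -= 1                       # the overload claims another instance
    print("cascade: total outage")

simulate_after_failure(2)   # 1 left at 183% -> cascade: total outage
simulate_after_failure(3)   # 2 left at 92%  -> the tier absorbs the failure
```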
The active-passive (also called hot standby) pattern is the simplest approach to load balancer high availability. One load balancer actively handles all traffic while another stands by, ready to take over if the primary fails.
How Active-Passive Works:
Virtual IP (VIP): A floating IP address that clients connect to. This IP is currently bound to the active load balancer.
Heartbeat Monitoring: The passive load balancer continuously monitors the active one through heartbeat messages, typically using protocols like VRRP (Virtual Router Redundancy Protocol) or keepalived.
Failover: When the passive detects the active has failed (missed heartbeats, health check failures), it takes over the VIP and announces the change (typically via gratuitous ARP) so the network learns where the VIP now lives, and traffic flows to the new active.
Recovery: When the original primary recovers, it can either stay passive (non-preemptive) or reclaim the active role (preemptive).
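To make the heartbeat-and-takeover sequence concrete, here is a deliberately simplified Python sketch of a standby node. It is not keepalived: real VRRP adds priorities, preemption rules, split-brain protection, and gratuitous ARP, and the `ip addr` call, addresses, and port below are assumptions about a Linux host.

```python
import socket
import subprocess
import time

VIP = "10.0.0.100/24"        # floating virtual IP (assumed)
INTERFACE = "eth0"            # interface that should own the VIP (assumed)
HEARTBEAT_PORT = 9999         # UDP port the active peer sends heartbeats to (assumed)
HEARTBEAT_TIMEOUT = 3.0       # seconds of silence before declaring the active dead

def take_over_vip() -> None:
    # Bind the VIP locally; a real tool (keepalived) would also send a
    # gratuitous ARP so the network learns where the VIP now lives.
    subprocess.run(["ip", "addr", "add", VIP, "dev", INTERFACE], check=False)

def standby_loop() -> None:
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", HEARTBEAT_PORT))
    sock.settimeout(0.5)
    last_heartbeat = time.monotonic()
    while True:
        try:
            sock.recv(64)                      # any datagram counts as a heartbeat
            last_heartbeat = time.monotonic()
        except socket.timeout:
            pass
        if time.monotonic() - last_heartbeat > HEARTBEAT_TIMEOUT:
            take_over_vip()                    # promote ourselves to active
            break                              # real tools keep running (preemption, fail-back)

if __name__ == "__main__":
    standby_loop()
```

In production you would reach for keepalived or Pacemaker (see the table below) rather than rolling your own.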
| Technology | Platform | How It Works |
|---|---|---|
| VRRP (keepalived) | Linux | Standard protocol for VIP sharing; widely used with HAProxy/NGINX |
| Pacemaker/Corosync | Linux | Full cluster resource manager; handles complex failover scenarios |
| Windows NLB | Windows | Built-in Windows Server feature for IP failover |
| Cloud Floating IPs | Cloud providers | AWS Elastic IP, GCP External IP reassignment via API |
Active-passive is appropriate when: (1) traffic volume fits on a single load balancer, (2) simplicity is prioritized, (3) brief failover disruption (seconds) is acceptable, (4) cost of idle standby is acceptable. It's common for internal load balancers and smaller deployments.
The active-active pattern runs multiple load balancers simultaneously, all handling traffic. This eliminates the wasted capacity of active-passive while also enabling horizontal scaling.
How Active-Active Works:
All load balancer instances receive live traffic and forward it to the shared backend pool; there is no idle standby. The central challenge of active-active is how clients find the load balancers in the first place. There are several approaches:
Approach 1: DNS Round-Robin
DNS returns multiple A records (IP addresses) for the load balancer hostname. Clients choose one (usually randomly or round-robin). If that load balancer fails, clients eventually retry with another IP.
Approach 2: BGP Anycast
All load balancers advertise the same IP address via BGP routing. Network routers automatically send traffic to the 'nearest' load balancer. If one fails, BGP reconverges.
Approach 3: External Load Balancer Layer
A higher-tier load balancer (like a cloud provider's NLB or GLB) distributes traffic across your load balancer pool.
Approach 4: Client-Side Load Balancing
Clients are given a list of load balancer IPs and implement their own load balancing logic, including failover.
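A short sketch combining approaches 1 and 4: the client resolves whatever A records DNS round-robin returns, shuffles them, and fails over to the next address when a connection attempt fails. The hostname, port, and timeout are placeholders.

```python
import random
import socket

LB_HOSTNAME = "lb.example.com"   # placeholder; DNS round-robin returns several A records
LB_PORT = 443

def resolve_lb_ips(hostname: str, port: int) -> list[str]:
    """Ask DNS for every address behind the hostname (approach 1)."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    return list({info[4][0] for info in infos})

def connect_with_failover(hostname: str, port: int, timeout: float = 2.0) -> socket.socket:
    """Client-side load balancing (approach 4): shuffle the IPs, fail over on error."""
    ips = resolve_lb_ips(hostname, port)
    random.shuffle(ips)                      # spread clients across load balancers
    last_error = None
    for ip in ips:
        try:
            return socket.create_connection((ip, port), timeout=timeout)
        except OSError as exc:               # dead or unreachable LB: try the next one
            last_error = exc
    raise ConnectionError(f"all load balancers unreachable for {hostname}") from last_error
```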
| Approach | Failover Time | Geographic Routing | Complexity |
|---|---|---|---|
| DNS Round-Robin | Minutes (TTL-dependent) | No (random) | Low |
| BGP Anycast | Seconds | Yes (network-based) | High |
| External LB Layer | Seconds | Yes (if GSLB) | Medium |
| Client-Side | Milliseconds | Depends on client | Medium (client) |
State Synchronization Challenge:
Active-active creates a challenge: if a client's request goes to LB1 first and then LB2, will LB2 have the session state?
Solutions:
Stateless load balancing: Design so session state isn't needed at the LB. Handle session affinity at the application layer or externalize session state to a shared store such as Redis.
State replication: Load balancers synchronize state between themselves (HAProxy supports this, as do some commercial solutions). Adds complexity and latency.
Consistent hashing/affinity: Use the client IP or another stable attribute to route the same client to the same load balancer consistently. Only the sessions pinned to a failed LB are disrupted.
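A minimal sketch of the affinity idea from the last bullet, using a consistent-hash ring so that removing one load balancer only remaps the clients that were pinned to it; the node names and virtual-node count are illustrative:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class ConsistentHashRing:
    """Maps client keys (e.g. source IPs) onto load balancer nodes."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self._ring: list[tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):                  # virtual nodes smooth the distribution
                self._ring.append((_hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._keys = [h for h, _ in self._ring]

    def route(self, client_key: str) -> str:
        idx = bisect.bisect(self._keys, _hash(client_key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["lb-1", "lb-2", "lb-3"])  # illustrative node names
print(ring.route("203.0.113.7"))                      # same client -> same LB every time
```

Rebuilding the ring without a failed node only remaps the keys that pointed at it, which is exactly the property the affinity approach relies on.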
The simplest active-active architecture is one where load balancers are completely stateless. Design your application so that any request can be handled by any load balancer and forwarded to any backend server. This maximizes flexibility and minimizes complexity.
DNS-based failover uses the Domain Name System to direct traffic away from failed load balancers (or entire regions). While DNS isn't traditionally thought of as a load balancing layer, modern DNS services provide sophisticated health checking and traffic management.
How DNS-Based Failover Works:
The DNS provider continuously health-checks each load balancer (or regional) endpoint. When an endpoint fails its checks, the provider stops returning its IP address in responses, so clients resolving the hostname are steered to the remaining healthy endpoints. Short TTLs limit how long cached answers keep pointing at the failed endpoint.
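A stripped-down sketch of the control loop a managed DNS service effectively runs for you: health-check each endpoint and publish only the healthy addresses. The health check here is a plain TCP connect, and `publish_dns_records` is a placeholder for whichever provider API you would actually call.

```python
import socket
import time

LB_ENDPOINTS = ["198.51.100.10", "198.51.100.11"]   # illustrative load balancer IPs
CHECK_INTERVAL = 10                                  # seconds between health checks
DNS_TTL = 60                                         # keep low so failover isn't stuck in caches

def is_healthy(ip: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Crude health check: can we open a TCP connection?"""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

def publish_dns_records(ips: list[str], ttl: int) -> None:
    """Placeholder: a real implementation calls the DNS provider's API here."""
    print(f"A records -> {ips} (TTL {ttl}s)")

def failover_loop() -> None:
    while True:
        healthy = [ip for ip in LB_ENDPOINTS if is_healthy(ip)]
        # Never publish an empty answer; an all-unhealthy result usually means
        # the checker, not the fleet, is broken ("fail open").
        publish_dns_records(healthy or LB_ENDPOINTS, DNS_TTL)
        time.sleep(CHECK_INTERVAL)
```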
DNS Failover Limitations:
1. TTL Caching Delays Failover
DNS responses are cached by resolvers, browsers, and operating systems. Even with a 60-second TTL, some resolvers ignore or extend TTLs, operating systems and browsers layer on their own caches, and long-lived connection pools keep reusing old addresses, so real-world failover often takes several minutes rather than one.
2. No Connection-Level Awareness
DNS only affects new connections. Existing TCP connections to a failed load balancer will hang until timeout. Applications must implement connection-level failover.
3. No Load Awareness
DNS failover typically only knows healthy/unhealthy, not 'overloaded.' A load balancer at 99% CPU is still 'healthy' from the DNS perspective.
Mitigation Strategies: keep TTLs on load balancer records aggressively low, build connection-level retry and timeout logic into clients so they abandon dead endpoints quickly, and treat DNS failover as one layer of defense rather than the only one.
Think of DNS failover as the 'coarse-grained' layer of failover—it handles regional outages and major failures. For fast, fine-grained failover between individual load balancer instances, use active-active with a load balancer layer or BGP anycast.
Cloud providers have essentially 'solved' the load balancer SPOF problem with managed services. Understanding how they achieve this helps you design better, whether you use their services or build your own.
| Provider | Service | HA Approach | SLA |
|---|---|---|---|
| AWS | Application LB (ALB) | Automatically distributed across AZs; no single nodes exposed | 99.99% |
| AWS | Network LB (NLB) | Flow-based distribution; static IPs for failover | 99.99% |
| Google Cloud | Cloud Load Balancing | Anycast-based global distribution; no regional failover needed | 99.99% |
| Azure | Load Balancer / App Gateway | Zone-redundant deployment; automatic failover | 99.99% |
| Cloudflare | Load Balancing | Anycast across 300+ PoPs; health-aware steering | 100% (with caveats) |
How Cloud Load Balancers Achieve HA:
AWS ALB/NLB Architecture: AWS provisions load balancer nodes in each Availability Zone you enable and scales them automatically. The ALB's DNS name resolves to multiple nodes, and the NLB additionally offers a static IP per AZ, so clients never depend on a single machine.
Google Cloud Load Balancing: Google's global load balancer is anycast-based. A single global IP address is announced from edge locations worldwide and traffic is steered to healthy backends, so there is no separate regional failover step to orchestrate.
Key Insight: The Load Balancer Is a Fleet, Not a Box
Cloud load balancers aren't single machines—they're fleets of machines behind a managed abstraction. This is why they achieve 99.99% SLAs: any individual machine failure is invisible.
Many organizations use a hybrid: cloud load balancers at the edge (where HA is most critical and hardest to achieve) and self-managed internal load balancers (NGINX, Envoy) where they need more control and costs are lower.
High availability isn't just about having backups—it's about ensuring those backups have sufficient capacity to handle failure scenarios. This requires careful capacity planning.
Capacity Calculation Example:
Scenario: You handle 100K requests/second at peak.
Single load balancer capacity: 60K requests/second (fully utilized)
Option 1: Active-Passive (N+1)
Option 2: Active-Active (2N)
Option 3: N+1 with larger instances
Recommendation: N+2 or 2N for critical systems.
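One way to run the numbers for the scenario above (100K requests/second peak, 60K per instance), assuming all provisioned instances actively share traffic; note that with N = 2, the N+2 and 2N options happen to coincide:

```python
import math

PEAK_RPS = 100_000
PER_INSTANCE_RPS = 60_000

def plan(label: str, instances: int, failures_tolerated: int) -> None:
    survivors = instances - failures_tolerated
    normal = PEAK_RPS / (instances * PER_INSTANCE_RPS)
    degraded = PEAK_RPS / (survivors * PER_INSTANCE_RPS) if survivors else math.inf
    ok = "OK" if degraded <= 1.0 else "OVERLOADED"
    print(f"{label}: {instances} instances, normal {normal:.0%}, "
          f"after {failures_tolerated} failure(s) {degraded:.0%} -> {ok}")

n = math.ceil(PEAK_RPS / PER_INSTANCE_RPS)   # instances needed just for capacity: 2
plan("N+1", n + 1, 1)        # 3 instances: 56% normal, 83% after one failure
plan("N+2 / 2N", n + 2, 2)   # 4 instances: 42% normal, 83% after two failures
```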
| Criticality | Redundancy Level | Typical Headroom | Failure Tolerance |
|---|---|---|---|
| Non-critical internal | N+1 | 30-40% | 1 failure |
| Business-critical | N+2 or 2N | 40-50% | 2+ failures |
| Life-safety / Financial | 2N + geographic | 50%+ | Full AZ/region loss |
Capacity planning for HA must account for traffic spikes, not just steady-state load. If your peak is 2x your average, ensure your post-failure capacity handles that peak. Many outages occur when a failure coincides with a traffic spike.
Technical architecture is only half the battle. Operational practices determine whether your HA design actually delivers its promised reliability.
Common Operational Failure Patterns:
| Pattern | Consequence | Prevention |
|---|---|---|
| All-at-once config push | Total outage from config error | Staged rollouts |
| Untested failover | Failover doesn't work when needed | Regular testing |
| Expired certificates | Sudden TLS failures | Certificate monitoring |
| Filled logs crashing disk | LB becomes unresponsive | Log rotation, disk monitoring |
| Runaway health checks | Overwhelm backends | Rate limit health checks |
| DNS TTL too long | Slow failover | Aggressive TTLs for LB records |
| Missing alerts | Failures detected by users | Comprehensive monitoring |
Before deploying HA load balancing, run a 'pre-mortem': imagine it's 6 months from now and there was a major outage. What went wrong? This exercise surfaces risks before they become incidents.
Let's consolidate the key concepts from this page and the module as a whole:
Module 1 Summary: What Is Load Balancing?
Over these four pages, we've established a comprehensive understanding of load balancing:
Definition and Purpose — Load balancing distributes traffic across resources to optimize utilization, maximize throughput, minimize latency, and avoid overload.
Benefits — Availability (through redundancy and failover), performance (through distribution and optimization), and flexibility (through abstraction and operational agility).
Placement — Load balancing occurs at multiple tiers: edge (external traffic), middle (service-to-service), and data (database/cache access).
High Availability — The load balancer itself must be made highly available through redundancy patterns, careful capacity planning, and operational discipline.
You now have a thorough understanding of load balancing fundamentals. You can explain what load balancing is, why it matters, where to place it, and how to make it highly available. This foundation prepares you for the next modules, where we'll dive into Layer 4 vs Layer 7 load balancing, specific algorithms, session persistence, health checks, and load balancer technology comparisons.