A load balancer is only as effective as its ability to distinguish healthy backends from unhealthy ones. Without proper health checking, a load balancer becomes a liability—blindly routing traffic to dead servers, causing user-facing errors, and failing to fulfill its core purpose of maintaining availability.
Health checking is the mechanism by which load balancers continuously verify that backend servers are capable of handling requests. It's the guardian at the gate, ensuring that only functioning servers receive traffic and automatically removing problematic servers from rotation.
This page provides a comprehensive exploration of health check mechanisms, from basic port checks to sophisticated application-level probes, including the configuration nuances that determine whether your system gracefully handles failures or cascades into an outage.
By the end of this page, you will understand: the types of health checks and their tradeoffs, configuration parameters and their impact, active vs passive health checking, graceful degradation patterns, health check anti-patterns that cause outages, and production best practices used by major services.
To understand the critical importance of health checks, consider what happens without them:
Scenario: Health Checks Disabled
[Time 0] Backend B crashes due to OOM
[Time 0+] Load balancer continues routing 33% of traffic to B
[Time 0+] All requests to B fail with connection refused
[Time 0+] 33% of users experience immediate errors
[Time 5m] Engineers notice errors in monitoring
[Time 8m] Manual intervention removes B from load balancer
[Time 8m] Error rate drops to 0%
Scenario: Health Checks Enabled
[Time 0] Backend B crashes due to OOM
[Time 10s] Health check fails (first attempt)
[Time 20s] Health check fails (second attempt)
[Time 30s] Health check fails (third attempt) — B marked unhealthy
[Time 30s] Load balancer stops routing to B
[Time 30s] All traffic goes to A and C (no user impact)
[Time 60s] B restarts (auto-healing or manual)
[Time 90s] Health check succeeds (B marked healthy)
[Time 90s] Traffic resumes to B
The difference: 8 minutes of 33% error rate vs. 30 seconds of potential minor impact (and often zero impact if the failed server was handling no active connections).
| Benefit | Description |
|---|---|
| Automatic Failure Detection | Remove crashed/hung backends within seconds |
| Zero-Downtime Deployments | Drain traffic before shutdown, add after startup |
| Graceful Degradation | Under load, unhealthy backends stop receiving traffic |
| Self-Healing Integration | Works with auto-scaling to replace failed instances |
| Visibility | Health status provides operational insight |
| SLA Protection | Reduces blast radius of individual failures |
Routing to an unhealthy backend isn't just a failed request—it's wasted time. The user waits for a timeout (often seconds), then either sees an error or gets retried. Meanwhile, healthy backends are underutilized. Poor health checking turns partial failures into user-visible degradation.
Health checks range from simple connectivity tests to sophisticated application-level probes. Each level provides more information but requires more configuration and can have more failure modes.
Level 1: TCP Health Checks (L4)
Load Balancer → SYN → Backend
Load Balancer ← SYN-ACK ← Backend (Success: port is listening)
Load Balancer → RST → Backend (Close immediately)
What it Detects: process down or crashed, port not listening, host unreachable, hard network partitions
What it Misses: an application that is hung but still holds the port open, an application returning errors for every request, failed dependencies (database, cache)
Use Cases: Non-HTTP services, databases, Redis, when application-level checks aren't possible
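The SYN/SYN-ACK exchange above is just a TCP connect attempt. A minimal sketch (the helper name `tcp_health_check` is illustrative, not any load balancer's actual implementation):

```python
import socket

def tcp_health_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """A Level 1 (L4) check: success means only that the port is listening."""
    try:
        # create_connection completes the SYN/SYN-ACK handshake;
        # the `with` block closes the socket immediately afterward.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

Note that this returns True even if the application behind the port is hung or erroring, which is exactly the blind spot listed above.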
Level 2: HTTP Health Checks (L7)
Load Balancer → GET /health HTTP/1.1
Load Balancer ← HTTP/1.1 200 OK (Success: 2xx response)
What it Detects: everything a TCP check detects, plus whether the application can actually parse and answer an HTTP request and whether the health path returns a success status
What it Misses: failed dependencies the handler doesn't exercise (database, cache), functional correctness of real endpoints
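The Level 2 probe above is an HTTP GET that treats any 2xx status as success. A standard-library sketch (the helper name is illustrative):

```python
from urllib.request import urlopen
from urllib.error import URLError

def http_health_check(url: str, timeout: float = 5.0) -> bool:
    """A Level 2 (L7) check: healthy only if the endpoint answers 2xx."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except URLError:
        # Covers connection failures and non-2xx responses
        # (HTTPError is a subclass of URLError).
        return False
```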
Level 3: Application-Level Health Checks (Deep)
Load Balancer → GET /health/ready HTTP/1.1
Load Balancer ← HTTP/1.1 200 OK
Content-Type: application/json
```json
{
  "status": "healthy",
  "checks": {
    "database": "connected",
    "cache": "connected",
    "disk": "ok"
  }
}
```
What it Detects: everything an HTTP check detects, plus the state of critical dependencies (database, cache, disk), so the load balancer can pull a backend whose downstream resources have failed
| Level | Protocol | Detects | Complexity | Resource Usage |
|---|---|---|---|---|
| TCP Connect | TCP SYN/ACK | Port listening | Very Low | Very Low |
| HTTP Request | HTTP GET | App responding | Low | Low |
| HTTP Content | HTTP GET + body check | Correct response | Medium | Medium |
| Deep Health | HTTP GET + dependency check | Full functionality | High | High |
Kubernetes distinguishes 'liveness' (is the process alive?) from 'readiness' (can it accept traffic?). Apply this to load balancer health checks: TCP checks establish liveness; HTTP checks verify readiness. A server can be live (process running) but not ready (dependencies unavailable).
Properly configuring health checks requires understanding the key parameters and their impact on detection speed and stability.
Core Parameters:
| Parameter | Description | Typical Value | Tradeoff |
|---|---|---|---|
| Interval | Time between checks | 5-30 seconds | Faster detection vs. resource usage |
| Timeout | Max wait for response | 2-10 seconds | Sensitivity vs. false positives |
| Healthy Threshold | Successes to mark healthy | 2-3 | Stability vs. recovery speed |
| Unhealthy Threshold | Failures to mark unhealthy | 2-3 | Sensitivity vs. stability |
| Path (HTTP) | URL to check | /health | Simple vs. thorough |
| Expected Codes (HTTP) | Success status codes | 200-299 | Strict vs. lenient |
| Expected Body (HTTP) | Content to match | None or specific | Thoroughness vs. brittleness |
Detection Time Calculation:
Time to Detect Failure = Interval × Unhealthy Threshold
Example:
Interval = 10 seconds
Unhealthy Threshold = 3
Detection Time = 10 × 3 = 30 seconds
Recovery Time Calculation:
Time to Recover = Interval × Healthy Threshold
Example:
Interval = 10 seconds
Healthy Threshold = 2
Recovery Time = 10 × 2 = 20 seconds
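The two formulas above can be captured as trivial helpers. One nuance worth knowing: in the worst case, detection also includes most of one extra interval (if the failure happens just after a successful check) plus the probe timeout.

```python
def time_to_detect(interval_s: float, unhealthy_threshold: int) -> float:
    """Seconds of consecutive failed probes before a backend is marked unhealthy."""
    return interval_s * unhealthy_threshold

def time_to_recover(interval_s: float, healthy_threshold: int) -> float:
    """Seconds of consecutive successful probes before a backend is marked healthy."""
    return interval_s * healthy_threshold
```

With the example values: `time_to_detect(10, 3)` gives 30 seconds and `time_to_recover(10, 2)` gives 20 seconds, matching the calculations above.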
Configuration Example: AWS ALB
```json
{
  "TargetGroup": {
    "HealthCheckProtocol": "HTTP",
    "HealthCheckPath": "/health",
    "HealthCheckPort": "traffic-port",
    "HealthCheckIntervalSeconds": 15,
    "HealthCheckTimeoutSeconds": 5,
    "HealthyThresholdCount": 2,
    "UnhealthyThresholdCount": 3,
    "Matcher": {
      "HttpCode": "200-299"
    }
  }
}
```
Configuration Example: NGINX
Note that the active `health_check` directive is an NGINX Plus feature (open-source NGINX offers only passive checks via the `max_fails` and `fail_timeout` server parameters), and it requires a shared-memory `zone` on the upstream:

```nginx
upstream backend {
    zone backend 64k;   # shared memory zone required by health_check
    server 192.168.1.10:8080;
    server 192.168.1.11:8080;
    server 192.168.1.12:8080;
}

server {
    location / {
        proxy_pass http://backend;
        health_check uri=/health interval=10s fails=3 passes=2;
    }
}
```
Health checking can be active (load balancer probes backends) or passive (load balancer observes real traffic). Each approach has distinct characteristics.
Active Health Checking:
┌──────────────────┐ Health Probe ┌─────────────┐
│ Load Balancer │ ─────────────────►│ Backend │
│ │ ◄─────────────────│ │
└──────────────────┘ Health Response └─────────────┘
(Separate from production traffic)
How It Works: the load balancer sends synthetic probe requests (TCP connect or HTTP GET) to each backend on a fixed schedule, independent of production traffic.
Pros: detects failures even on idle backends; predictable detection time (interval × threshold); probe behavior is fully under your control.
Cons: adds background load (probes × backends × LB instances); a passing probe doesn't guarantee real requests succeed; detection speed is bounded by the probe interval.
Passive Health Checking (Outlier Detection):
┌──────────────────┐ Production Traffic ┌─────────────┐
│ Load Balancer │ ◄───────────────────►│ Backend │
│ (observes │ │ │
│ failures) │ │ │
└──────────────────┘ └─────────────┘
(Monitors real request outcomes)
How It Works: the load balancer observes the outcomes of real requests (status codes, timeouts, connection failures) and ejects backends whose error rate makes them statistical outliers.
Pros: no extra probe traffic; catches exactly what users experience, including intermittent errors and slowness that a simple probe would miss.
Cons: some real requests must fail before detection kicks in; cannot detect failures on backends receiving no traffic.
Envoy Outlier Detection Configuration:
```yaml
outlier_detection:
  consecutive_5xx: 5              # 5 consecutive 5xx responses = eject
  interval: 10s                   # evaluation sweep every 10 seconds
  base_ejection_time: 30s         # minimum ejection duration
  max_ejection_percent: 50        # never eject more than 50% of the pool
  consecutive_gateway_failure: 5  # 5 connection-level failures = eject
```
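The consecutive-5xx criterion can be sketched in a few lines. This is an illustration of the idea, not Envoy's internals (which also track ejection time, success-rate statistics, and re-admission):

```python
class OutlierDetector:
    """Passively eject a backend after N consecutive 5xx responses."""

    def __init__(self, consecutive_5xx: int = 5):
        self.threshold = consecutive_5xx
        self.failures: dict[str, int] = {}  # backend -> consecutive 5xx count
        self.ejected: set[str] = set()

    def record(self, backend: str, status_code: int) -> None:
        """Called for every real (production) response observed by the LB."""
        if 500 <= status_code < 600:
            self.failures[backend] = self.failures.get(backend, 0) + 1
            if self.failures[backend] >= self.threshold:
                self.ejected.add(backend)
        else:
            self.failures[backend] = 0  # any success resets the streak

    def is_healthy(self, backend: str) -> bool:
        return backend not in self.ejected
```

Note that a real implementation also re-admits backends after `base_ejection_time` and enforces `max_ejection_percent`, both omitted here for brevity.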
| Aspect | Active | Passive |
|---|---|---|
| Detection Trigger | Scheduled probes | Real traffic failures |
| Detection Speed | Interval-dependent (seconds) | Immediate (per-request) |
| Zero-Traffic Detection | Yes | No |
| Additional Load | Yes (health check requests) | No |
| Accuracy | As accurate as probe | Depends on traffic patterns |
| Configuration | Probe settings | Statistical thresholds |
Best-in-class systems use both active and passive health checking. Active checks catch dead backends quickly; passive checks catch subtle issues (slowness, intermittent errors) that active checks might miss. Envoy, for example, supports both simultaneously.
The design of your application's health endpoint significantly impacts the effectiveness of health checking.
Health Endpoint Patterns:
Pattern 1: Simple OK Response
```python
@app.route('/health')
def health():
    return 'OK', 200
```
✓ Fast, low overhead ✗ Doesn't verify actual functionality
Pattern 2: Dependency-Aware Health Check
```python
@app.route('/health')
def health():
    try:
        # Check database
        db.execute('SELECT 1')
        # Check cache
        redis.ping()
        # Check external API
        requests.get(external_api, timeout=1)
        return 'OK', 200
    except Exception as e:
        return f'Unhealthy: {e}', 503
```
✓ Verifies application can actually work ✗ Couples availability to dependencies ✗ Can cascade failures (dependency down = all backends unhealthy)
Pattern 3: Structured Health Response
```python
@app.route('/health')
def health():
    checks = {
        'database': check_database(),
        'cache': check_cache(),
        'disk_space': check_disk(),
    }
    all_healthy = all(c['status'] == 'healthy' for c in checks.values())
    return jsonify({
        'status': 'healthy' if all_healthy else 'degraded',
        'checks': checks,
        'timestamp': datetime.utcnow().isoformat(),
    }), 200 if all_healthy else 503
```
Response Example:
```json
{
  "status": "healthy",
  "checks": {
    "database": {"status": "healthy", "latency_ms": 2},
    "cache": {"status": "healthy", "latency_ms": 1},
    "disk_space": {"status": "healthy", "free_gb": 45}
  },
  "timestamp": "2024-01-15T10:30:00Z"
}
```
Pattern 4: Separate Liveness and Readiness
```python
@app.route('/health/live')
def liveness():
    # Is the process alive and not deadlocked?
    return 'OK', 200

@app.route('/health/ready')
def readiness():
    # Is the app ready to receive traffic?
    if not db_connected or not warmed_cache:
        return 'Not Ready', 503
    return 'Ready', 200
```
Use liveness for "should I restart this?" and readiness for "should I send traffic?"
If your health check fails because Redis is down, all backends become unhealthy simultaneously. The load balancer has nowhere to send traffic—total outage. Consider: is 'degraded with Redis down' better than 'completely unavailable'? Often, returning a cache miss is better than returning no response.
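One way to avoid that cascade is to classify dependencies as critical or non-critical and only fail the check for critical ones. A sketch of the decision logic, with dependency states passed in as booleans for illustration (a real endpoint would probe them):

```python
def health_status(db_ok: bool, cache_ok: bool) -> tuple[str, int]:
    """Fail the health check only for critical dependencies.

    The database is treated as critical: without it the backend cannot
    serve requests. The cache is non-critical: its loss degrades service
    (cache misses) but the backend should stay in rotation.
    """
    if not db_ok:
        return ('unhealthy: database unavailable', 503)
    if not cache_ok:
        return ('degraded: cache unavailable', 200)  # still accepts traffic
    return ('healthy', 200)
```

With this design, a Redis outage leaves every backend returning 200 with a "degraded" status, rather than taking the entire pool out of rotation at once.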
When a backend becomes unhealthy (intentionally for deployment or unexpectedly due to failure), existing connections need to be handled gracefully. Connection draining is the process of completing in-flight requests before removing a backend.
Without Connection Draining:
[Time 0] Backend marked unhealthy
[Time 0] LB immediately stops routing to backend
[Time 0] 100 in-flight requests receive connection reset
[Time 0] 100 users see errors
With Connection Draining:
[Time 0] Backend marked unhealthy
[Time 0] LB stops sending NEW requests to backend
[Time 0] LB continues forwarding existing connections
[Time 5s] 95 requests complete normally
[Time 10s] Remaining 5 requests complete
[Time 10s] Backend fully drained, can be shut down
Draining Timeout:
A maximum draining duration prevents hung connections from blocking shutdown indefinitely:
Draining Timeout = 30 seconds
After 30 seconds, remaining connections are forcibly closed.
This is a tradeoff: longer timeout = more graceful, but slower deployments.
Implementing Graceful Shutdown (Backend Side):
```python
import signal
import sys
import time

class GracefulServer:
    def __init__(self):
        self.is_ready = True
        self.active_requests = 0
        signal.signal(signal.SIGTERM, self.shutdown_handler)

    def shutdown_handler(self, signum, frame):
        print("Received SIGTERM, starting graceful shutdown")
        # Step 1: Stop accepting new requests (fail readiness checks)
        self.is_ready = False
        # Step 2: Wait for health checks to fail (LB stops sending traffic)
        time.sleep(10)  # roughly 2-3 health check intervals
        # Step 3: Wait for in-flight requests to complete
        timeout = 30
        start = time.time()
        while self.active_requests > 0 and (time.time() - start) < timeout:
            time.sleep(1)
        # Step 4: Exit
        print(f"Shutdown complete, {self.active_requests} requests aborted")
        sys.exit(0)

server = GracefulServer()

@app.route('/health/ready')
def readiness():
    if server.is_ready:
        return 'Ready', 200
    return 'Draining', 503  # Tell LB to stop sending traffic
```
Configuration: AWS Target Group Deregistration
```json
{
  "TargetGroup": {
    "TargetGroupAttributes": [
      {
        "Key": "deregistration_delay.timeout_seconds",
        "Value": "30"
      }
    ]
  }
}
```
| Application Type | Recommended Timeout | Reasoning |
|---|---|---|
| Web APIs (fast) | 30 seconds | Most requests complete in <1 second |
| Batch processing | 5 minutes | Jobs may take minutes |
| WebSocket/long polling | 5-30 minutes | Long-lived connections |
| Database connections | 60 seconds | Transaction completion |
| Streaming media | Session-based | Don't interrupt active streams |
In Kubernetes, use a preStop lifecycle hook to delay shutdown, giving the Endpoints controller time to remove the pod from Service endpoints. A simple 'sleep 5' can prevent traffic from being sent to a terminating pod.
Production systems often need more sophisticated health checking than basic pass/fail. Here are advanced patterns used by large-scale services.
Pattern 1: Gradual Degradation (Weight Adjustment)
Instead of binary healthy/unhealthy, adjust backend weight based on health metrics:
Backend Response Time Increasing:
Normal (50ms): weight = 100
Elevated (100ms): weight = 50
High (200ms): weight = 25
Critical (500ms): weight = 0 (removed)
This naturally shifts traffic away from struggling backends before they fail completely.
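The latency-to-weight mapping above can be expressed as a small function. Thresholds are taken from the table; for simplicity this sketch treats anything above the 200 ms band as critical (the original table leaves the 200-500 ms range unspecified):

```python
def weight_for_latency(p50_ms: float) -> int:
    """Map observed backend latency to a routing weight (illustrative thresholds)."""
    if p50_ms <= 50:    # normal
        return 100
    if p50_ms <= 100:   # elevated
        return 50
    if p50_ms <= 200:   # high
        return 25
    return 0            # critical: effectively removed from rotation
```

A weighted load balancer would then re-read these weights each interval, so traffic drains gradually from a degrading backend instead of dropping to zero in one step.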
Pattern 2: Panic Mode Prevention
Prevent removing too many backends simultaneously:
```yaml
# Envoy outlier detection
max_ejection_percent: 50  # Never eject more than 50% of backends
```
Even if 80% of backends are unhealthy, keep routing to the "least bad" ones. Total outage is worse than degraded service.
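Enforcing the cap can be sketched as below. The helper is hypothetical; real implementations also decide *which* candidates to eject first (e.g. worst error rate), whereas this sketch simply takes them in order:

```python
def apply_ejections(pool: list[str], candidates: list[str],
                    max_ejection_percent: int = 50) -> set[str]:
    """Eject unhealthy candidates, but never more than the configured
    percentage of the pool. Backends beyond the cap keep receiving
    traffic even though they look unhealthy ("least bad" routing)."""
    max_ejected = len(pool) * max_ejection_percent // 100
    return set(candidates[:max_ejected])
```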
Pattern 3: Hystrix-Style Circuit Breaker
[Closed State] - Normal operation
↓ (Failure rate > threshold)
[Open State] - Stop sending requests (fail fast)
↓ (After timeout)
[Half-Open State] - Send limited test requests
↓ (Tests succeed) ↓ (Tests fail)
[Closed State] [Open State]
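A minimal version of that state machine follows. This is an illustration of the pattern, not Hystrix's actual implementation; the injectable clock is just a testing convenience:

```python
import time

class CircuitBreaker:
    """Closed -> Open on repeated failures; Open -> Half-Open after a timeout;
    Half-Open -> Closed on success, back to Open on failure."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # let a limited probe through
                return True
            return False  # fail fast without touching the backend
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()
            self.failures = 0
```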
Pattern 4: Differentiated Health Checks
Different consumers may need different health criteria:
@app.route('/health/lb') # For load balancer
def lb_health():
# Simple: can process requests?
return 'OK', 200
@app.route('/health/k8s') # For Kubernetes
def k8s_health():
# Is container alive?
return 'OK', 200
@app.route('/health/monitoring') # For observability
def monitoring_health():
# Detailed metrics
return jsonify(get_detailed_metrics()), 200
@app.route('/health/canary') # For canary deployment
def canary_health():
# Extra strict checks for canary
run_synthetic_transactions()
return 'OK', 200
Pattern 5: Warm-Up Period
Newly started backends may not perform at full capacity (cold caches, JIT not optimized). Delay adding to pool:
[Container Starts]
↓
[Liveness check passes] → Container won't be restarted
↓ (30 second warm-up)
[Readiness check passes] → LB starts sending traffic
↓ (Gradual weight ramp-up)
[Full traffic after 60 seconds]
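The gradual ramp-up step can be sketched as a weight function of time since the readiness check first passed. The linear shape and function name are assumptions; some systems use step functions or slow-start curves instead:

```python
def rampup_weight(seconds_since_ready: float, ramp_duration: float = 60.0,
                  full_weight: int = 100) -> int:
    """Linearly ramp a newly ready backend from 0 to full routing weight,
    giving caches and JIT compilers time to warm up under partial load."""
    if seconds_since_ready <= 0:
        return 0
    if seconds_since_ready >= ramp_duration:
        return full_weight
    return int(full_weight * seconds_since_ready / ramp_duration)
```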
Pattern 6: Subsetting with Health Priority
With many backends, each LB instance may only check a subset:
Backend Pool: 1000 servers
LB Instance A checks: 100 servers (random subset)
LB Instance B checks: 100 servers (different subset)
...
Gossip protocol shares health state between LB instances.
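Subset selection can be made deterministic per LB instance by hash-ranking backends. This is an illustrative sketch; production deterministic-subsetting algorithms are more careful about balancing how much load each backend receives across all subsets:

```python
import hashlib

def subset_for_lb(lb_id: str, backends: list[str], subset_size: int) -> list[str]:
    """Pick a stable pseudo-random subset of backends for one LB instance.

    Each LB ranks every backend by a hash of (lb_id, backend), so different
    LB instances get different overlapping subsets, and the same instance
    always picks the same subset (no coordination required)."""
    def score(backend: str) -> str:
        return hashlib.sha256(f"{lb_id}:{backend}".encode()).hexdigest()
    return sorted(backends, key=score)[:subset_size]
```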
Setting thresholds (failure rates, ejection percentages, timeouts) is a tuning exercise. Start conservative, observe in production, and adjust. What works for one service may not work for another. Build alerting to notify when thresholds are approached.
We've covered the essential aspects of health checking for load balancers. Let's consolidate the key principles.
What's Next:
We've mastered health checking for single regions and data centers. But what about global systems serving users around the world? Next, we'll explore global load balancing—how organizations distribute traffic across geographic regions, implement resilient multi-region architectures, and provide optimal performance regardless of where users are located.
You now understand health checking from basic TCP probes to sophisticated application-level patterns. You can design health endpoints, configure appropriate thresholds, and implement graceful degradation. Next, we'll scale up to global load balancing.