Once services are registered in a registry, a fundamental architectural question arises: Who decides which instance receives a request? This seemingly simple question has profound implications for system complexity, flexibility, resilience, and operational requirements.
The answer defines two distinct patterns: client-side discovery, where the calling service queries the registry and makes load balancing decisions itself, and server-side discovery, where a dedicated intermediary (like a load balancer) handles routing on behalf of the client.
Both patterns are valid, and both are used extensively in production systems. Understanding their trade-offs is essential for making informed architectural decisions—and for understanding why modern service meshes combine elements of both.
By the end of this page, you will understand the fundamental differences between client-side and server-side discovery, their respective advantages and disadvantages, implementation patterns for each, and guidance for choosing between them. You'll also understand the hybrid approach used by modern service meshes.
In the client-side discovery pattern, the service client is responsible for determining the network locations of available service instances and load balancing requests across them. The client queries the service registry, obtains a list of endpoints, and uses a client-side load balancing algorithm to select an instance for each request.
How Client-Side Discovery Works:
The Client-Side Discovery Flow:
```typescript
import Consul from 'consul';

interface ServiceInstance {
  address: string;
  port: number;
  healthy: boolean;
  weight: number;
  metadata: Record<string, string>;
}

/**
 * Client-side service discovery with load balancing
 */
class ClientSideDiscovery {
  private consul: Consul.Consul;
  private endpoints: Map<string, ServiceInstance[]> = new Map();
  private roundRobinIndex: Map<string, number> = new Map();

  constructor(consulHost: string = 'consul.internal') {
    this.consul = new Consul({ host: consulHost, port: 8500 });
    this.startBackgroundRefresh();
  }

  /**
   * Get endpoint for a service using client-side load balancing
   */
  async getEndpoint(serviceName: string): Promise<ServiceInstance> {
    const instances = await this.getHealthyInstances(serviceName);
    if (instances.length === 0) {
      throw new Error(`No healthy instances for ${serviceName}`);
    }
    // Client-side load balancing: weighted round-robin
    return this.selectInstance(serviceName, instances);
  }

  /**
   * Get all healthy instances from cache or registry
   */
  private async getHealthyInstances(serviceName: string): Promise<ServiceInstance[]> {
    // Check cache first
    let cached = this.endpoints.get(serviceName);
    if (!cached) {
      // Initial fetch from registry
      cached = await this.fetchFromRegistry(serviceName);
      this.endpoints.set(serviceName, cached);
    }
    return cached.filter(i => i.healthy);
  }

  /**
   * Query registry for service instances
   */
  private async fetchFromRegistry(serviceName: string): Promise<ServiceInstance[]> {
    const services = await this.consul.health.service({
      service: serviceName,
      passing: true, // Only healthy instances
    });

    return services.map(s => ({
      address: s.Service.Address || s.Node.Address,
      port: s.Service.Port,
      healthy: true,
      weight: parseInt(s.Service.Meta?.weight || '100'),
      metadata: s.Service.Meta || {},
    }));
  }

  /**
   * Weighted round-robin load balancing
   */
  private selectInstance(serviceName: string, instances: ServiceInstance[]): ServiceInstance {
    // Build weighted list
    const weightedList: ServiceInstance[] = [];
    for (const instance of instances) {
      const normalizedWeight = Math.ceil(instance.weight / 10);
      for (let i = 0; i < normalizedWeight; i++) {
        weightedList.push(instance);
      }
    }

    // Round-robin through weighted list
    const currentIndex = this.roundRobinIndex.get(serviceName) || 0;
    const selected = weightedList[currentIndex % weightedList.length];
    this.roundRobinIndex.set(serviceName, currentIndex + 1);
    return selected;
  }

  /**
   * Background refresh of endpoint caches
   */
  private startBackgroundRefresh(): void {
    setInterval(async () => {
      for (const serviceName of this.endpoints.keys()) {
        try {
          const fresh = await this.fetchFromRegistry(serviceName);
          this.endpoints.set(serviceName, fresh);
        } catch (error) {
          console.warn(`Failed to refresh ${serviceName}, using cached`);
        }
      }
    }, 10000); // Refresh every 10 seconds
  }

  /**
   * Mark an instance as failed (remove from cache temporarily)
   */
  markFailed(serviceName: string, address: string, port: number): void {
    const instances = this.endpoints.get(serviceName);
    if (instances) {
      const instance = instances.find(
        i => i.address === address && i.port === port
      );
      if (instance) {
        instance.healthy = false;
        // Instance will be reconsidered on next refresh
      }
    }
  }
}

// Usage example
async function callOrderService() {
  const discovery = new ClientSideDiscovery();
  try {
    const endpoint = await discovery.getEndpoint('order-service');
    const response = await fetch(
      `http://${endpoint.address}:${endpoint.port}/api/orders`
    );
    return response.json();
  } catch (error) {
    // Handle failure, potentially mark endpoint as failed
    throw error;
  }
}
```

Netflix Ribbon (now in maintenance mode), gRPC client-side load balancing, and Envoy sidecar proxies are popular implementations of client-side discovery. When the 'client' is a sidecar proxy, the application is decoupled from discovery while still achieving client-side load balancing.
In the server-side discovery pattern, clients make requests to a well-known intermediary (typically a load balancer or API gateway), which queries the service registry and routes the request to an appropriate instance. The client is unaware of the service registry or the specific instances handling its requests.
How Server-Side Discovery Works:
The Server-Side Discovery Flow:
```nginx
# NGINX as server-side discovery load balancer
# Uses Consul for dynamic upstream discovery
# Install nginx-upsync-module for dynamic upstreams

upstream order_service {
    # Dynamic upstream from Consul
    upsync 127.0.0.1:8500/v1/health/service/order-service
        upsync_timeout=6m upsync_interval=500ms
        upsync_type=consul strong_dependency=off;

    # Fallback static server list persisted to disk
    upsync_dump_path /etc/nginx/order_service.conf;
    include /etc/nginx/order_service.conf;

    # Load balancing configuration
    keepalive 32;
}

upstream user_service {
    upsync 127.0.0.1:8500/v1/health/service/user-service
        upsync_timeout=6m upsync_interval=500ms upsync_type=consul;
    upsync_dump_path /etc/nginx/user_service.conf;
    include /etc/nginx/user_service.conf;
    keepalive 32;
}

server {
    listen 80;

    # Clients connect to these paths
    # They don't know about individual instances
    location /api/orders {
        proxy_pass http://order_service;
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Health check integration
        health_check interval=5s fails=3 passes=2;
    }

    location /api/users {
        proxy_pass http://user_service;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        health_check interval=5s fails=3 passes=2;
    }
}
```
```haproxy
# HAProxy configuration with Consul integration
# Uses consul-template for dynamic backend updates

global
    maxconn 50000
    log stdout format raw local0

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    option httplog

frontend api_gateway
    bind *:80

    # Route based on path to different backends
    acl is_orders path_beg /api/orders
    acl is_users path_beg /api/users

    use_backend order_service if is_orders
    use_backend user_service if is_users
    default_backend fallback

# Backends are generated by consul-template
# This file is regenerated when service instances change
backend order_service
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    # consul-template populates these servers dynamically
    {{range service "order-service"}}
    server {{.Node}}-{{.Port}} {{.Address}}:{{.Port}} check inter 5s fall 3 rise 2
    {{end}}

backend user_service
    balance leastconn
    option httpchk GET /health
    {{range service "user-service"}}
    server {{.Node}}-{{.Port}} {{.Address}}:{{.Port}} check inter 5s fall 3 rise 2
    {{end}}

backend fallback
    mode http
    http-request deny deny_status 503
```

Cloud providers offer managed load balancers (AWS ALB/NLB, GCP Load Balancer, Azure Load Balancer) that integrate with their discovery mechanisms. This reduces operational burden while providing server-side discovery. However, they may have higher latency than client-side approaches and less flexibility in load balancing algorithms.
Let's systematically compare client-side and server-side discovery across multiple dimensions to understand which is appropriate for different scenarios.
| Dimension | Client-Side Discovery | Server-Side Discovery |
|---|---|---|
| Network Topology | Direct client-to-service connections | All traffic flows through intermediary |
| Latency | Lower (no proxy hop) | Higher (additional hop per request) |
| Failure Blast Radius | Localized to single client | Load balancer failure affects all clients |
| Client Complexity | Higher (discovery + load balancing logic) | Lower (just hostname and port) |
| Client Diversity | Requires library per language | Any language works (just HTTP/TCP) |
| Load Balancing Flexibility | Per-client customization possible | Uniform across all clients |
| Operational Complexity | Discovery logic distributed in clients | Centralized in load balancer |
| Visibility/Observability | Fragmented across clients | Centralized at load balancer |
| Security Enforcement | Distributed, harder to enforce | Centralized, easier to enforce |
| Update Rollout | Requires updating all clients | Update single load balancer |
| Registry Coupling | Clients coupled to registry | Only LB coupled to registry |
Performance Characteristics:
| Metric | Client-Side | Server-Side | Notes |
|---|---|---|---|
| Request Latency | ~0ms overhead | 0.5-5ms overhead | Depends on LB implementation |
| Connection Efficiency | Connection per backend | Connection pooling at LB | LB can multiplex connections |
| Throughput | Limited by client resources | Limited by LB capacity | LB becomes bottleneck at scale |
| Memory Usage | Endpoint cache per client | Centralized caching | Scales differently |
| Network Utilization | Distributed query load | Concentrated query load | Registry load differs |
Operational Characteristics:
| Aspect | Client-Side | Server-Side |
|---|---|---|
| Debugging | Must instrument each client | Centralized logs/metrics at LB |
| Configuration Changes | Rolling update to all clients | Single point of configuration |
| A/B Testing | Requires client-side logic | Easily implemented at LB |
| Canary Deployments | Complex coordination | Traffic splitting at LB |
| Rate Limiting | Distributed enforcement | Centralized enforcement |
| Authentication | Per-client implementation | Centralized at gateway |
| Circuit Breaking | Per-client state | Centralized state |
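To make the "per-client state" row concrete, here is a minimal sketch of a circuit breaker whose open/closed state lives entirely inside one client process. The class name and thresholds are illustrative, not from any particular library; the point is that each client instance may independently open the circuit to a backend while other clients keep sending traffic, whereas a load balancer would apply one shared decision.

```typescript
// Per-client circuit breaker: all state is local to this client instance.
class ClientCircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private failureThreshold = 3,
    private resetTimeoutMs = 30000
  ) {}

  // Can this client attempt a request right now?
  allowRequest(now: number = Date.now()): boolean {
    if (this.openedAt === null) return true;
    // Half-open: allow a trial request after the reset timeout elapses
    return now - this.openedAt >= this.resetTimeoutMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null; // Close the circuit again
  }

  recordFailure(now: number = Date.now()): void {
    this.failures++;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = now; // Open the circuit - for this client only
    }
  }
}
```

With server-side discovery, the equivalent state (failure counts, open circuits) is held centrally at the load balancer, which is both its strength (one consistent view) and its weakness (one shared failure domain).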
In practice, most systems use a hybrid approach. External traffic typically uses server-side discovery (through an API gateway), while internal service-to-service communication may use client-side discovery. Modern service meshes implement client-side discovery at the sidecar level, giving the benefits of both approaches.
Regardless of whether discovery is client-side or server-side, the load balancing algorithm determines how requests are distributed across available instances. Different algorithms are appropriate for different workloads.
```typescript
interface Instance {
  id: string;
  address: string;
  port: number;
  weight: number;
  activeConnections: number;
  avgResponseTime: number;
}

/**
 * Round-Robin: Simple rotation through instances
 */
class RoundRobinBalancer {
  private currentIndex = 0;

  select(instances: Instance[]): Instance {
    const selected = instances[this.currentIndex % instances.length];
    this.currentIndex++;
    return selected;
  }
}

/**
 * Weighted Round-Robin: Distribute proportionally to weights
 */
class WeightedRoundRobinBalancer {
  private positions = new Map<string, number>();

  select(instances: Instance[]): Instance {
    // Smooth weighted round-robin algorithm
    let selectedIndex = -1;
    let maxCurrent = -Infinity;

    for (let i = 0; i < instances.length; i++) {
      const current =
        (this.positions.get(instances[i].id) || 0) + instances[i].weight;
      this.positions.set(instances[i].id, current);
      if (current > maxCurrent) {
        maxCurrent = current;
        selectedIndex = i;
      }
    }

    const totalWeight = instances.reduce((sum, i) => sum + i.weight, 0);
    this.positions.set(
      instances[selectedIndex].id,
      maxCurrent - totalWeight
    );
    return instances[selectedIndex];
  }
}

/**
 * Least Connections: Route to least loaded instance
 */
class LeastConnectionsBalancer {
  select(instances: Instance[]): Instance {
    return instances.reduce((least, current) =>
      current.activeConnections < least.activeConnections ? current : least
    );
  }
}

/**
 * P2C (Power of Two Choices): Randomized with load awareness
 * Better than random, lower overhead than least-connections
 */
class P2CBalancer {
  select(instances: Instance[]): Instance {
    if (instances.length === 1) return instances[0];

    // Pick two distinct random instances
    const idx1 = Math.floor(Math.random() * instances.length);
    let idx2 = Math.floor(Math.random() * instances.length);
    while (idx2 === idx1) {
      idx2 = Math.floor(Math.random() * instances.length);
    }

    // Return the one with fewer connections
    return instances[idx1].activeConnections <= instances[idx2].activeConnections
      ? instances[idx1]
      : instances[idx2];
  }
}

/**
 * Consistent Hashing: Affinity based on request attribute
 */
class ConsistentHashingBalancer {
  private ring: Map<number, Instance> = new Map();
  private sortedKeys: number[] = [];
  private virtualNodes = 100;

  constructor(instances: Instance[]) {
    this.buildRing(instances);
  }

  private buildRing(instances: Instance[]): void {
    for (const instance of instances) {
      for (let i = 0; i < this.virtualNodes; i++) {
        const hash = this.hash(`${instance.id}-${i}`);
        this.ring.set(hash, instance);
        this.sortedKeys.push(hash);
      }
    }
    this.sortedKeys.sort((a, b) => a - b);
  }

  select(key: string): Instance {
    const hash = this.hash(key);
    // Find first node with hash >= key hash
    for (const nodeHash of this.sortedKeys) {
      if (nodeHash >= hash) {
        return this.ring.get(nodeHash)!;
      }
    }
    // Wrap around to first node
    return this.ring.get(this.sortedKeys[0])!;
  }

  private hash(key: string): number {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = ((hash << 5) - hash) + key.charCodeAt(i);
      hash = hash & hash; // Force 32-bit integer
    }
    return Math.abs(hash);
  }
}
```

| Use Case | Recommended Algorithm | Reasoning |
|---|---|---|
| Homogeneous instances, uniform requests | Round-Robin | Simple, fair distribution |
| Instances with different capacities | Weighted Round-Robin | Respects capacity differences |
| Variable request duration | Least Connections | Adapts to workload |
| Session affinity needed | Consistent Hashing | Same user → same instance |
| Large instance pool, simplicity | Random / P2C | Low overhead, good distribution |
| Response time sensitive | Least Response Time | Routes to fastest instances |
| Canary deployments | Weighted + Traffic Split | Control traffic percentages |
The load balancing algorithm can significantly impact system behavior. Poor algorithm choice leads to uneven load distribution, hot spots, and wasted capacity. Most production systems use weighted algorithms that account for both configured capacity (weights) and current load (connections or response time).
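As a hedged sketch of that last point, here is one way a balancer could combine configured capacity with observed load: score each instance by its weight divided by its recent average response time, and pick the highest score. The scoring formula, type, and function names below are illustrative, not drawn from any specific library.

```typescript
interface ScoredInstance {
  id: string;
  weight: number;          // Configured capacity
  avgResponseTime: number; // Observed load signal, in ms
}

// Pick the instance with the best capacity-to-latency ratio.
function selectByWeightAndLatency(instances: ScoredInstance[]): ScoredInstance {
  return instances.reduce((best, current) => {
    // Higher weight helps, higher observed latency hurts;
    // +1 avoids division by zero for an idle instance.
    const bestScore = best.weight / (best.avgResponseTime + 1);
    const currentScore = current.weight / (current.avgResponseTime + 1);
    return currentScore > bestScore ? current : best;
  });
}
```

A scheme like this naturally shifts traffic away from an instance that is slowing down, even if its configured weight says it should receive more.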
Modern service mesh architectures combine the benefits of both client-side and server-side discovery through the sidecar proxy pattern. A local proxy (sidecar) runs alongside each service, handling discovery and load balancing on behalf of the application.
Why Sidecars Bridge Both Worlds:
Sidecar Pattern Benefits:
The sidecar combines client-side discovery's performance (no central bottleneck, low latency) with server-side discovery's simplicity (application just connects to localhost).
From the Application's Perspective:
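A hedged sketch of what this looks like in application code (the service name and path are illustrative): the application addresses peers by stable logical names and never touches the registry; the sidecar intercepts the connection and selects a healthy instance.

```typescript
// No registry client, no endpoint list - just a logical hostname.
// The sidecar intercepts this connection and routes to a real instance.
function serviceUrl(serviceName: string, path: string): string {
  return `http://${serviceName}${path}`;
}

// Looks like a plain HTTP call; discovery and load balancing
// happen transparently in the sidecar proxy.
async function getOrders(): Promise<Response> {
  return fetch(serviceUrl('order-service', '/api/orders'));
}
```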
From the Platform's Perspective:
```yaml
# Envoy sidecar configuration for service discovery
# Simplified example - production configs are more complex

static_resources:
  listeners:
    # Outbound listener - intercepts app traffic to other services
    - name: outbound_listener
      address:
        socket_address:
          address: 127.0.0.1
          port_value: 15001
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                route_config:
                  name: outbound_route
                  virtual_hosts:
                    - name: order_service
                      domains: ["order-service", "order-service.default.svc.cluster.local"]
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: order-service-cluster
                http_filters:
                  - name: envoy.filters.http.router

  clusters:
    # Cluster with EDS (Endpoint Discovery Service)
    - name: order-service-cluster
      type: EDS
      eds_cluster_config:
        eds_config:
          api_config_source:
            api_type: GRPC
            grpc_services:
              - envoy_grpc:
                  cluster_name: xds-cluster
        service_name: order-service
      lb_policy: ROUND_ROBIN
      health_checks:
        - timeout: 5s
          interval: 10s
          unhealthy_threshold: 3
          healthy_threshold: 2
          http_health_check:
            path: /health
      circuit_breakers:
        thresholds:
          - max_connections: 1000
            max_pending_requests: 1000
            max_requests: 1000
            max_retries: 3

    # Control plane connection (DNS name, so resolved via strict DNS)
    - name: xds-cluster
      type: STRICT_DNS
      connect_timeout: 5s
      load_assignment:
        cluster_name: xds-cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: istiod.istio-system.svc.cluster.local
                      port_value: 15012
```

Service meshes like Istio, Linkerd, and Consul Connect are essentially sidecar discovery patterns with sophisticated control planes. The control plane distributes configuration and endpoint information; the sidecars (data plane) implement discovery, load balancing, security, and observability. This is the dominant pattern in modern Kubernetes environments.
Selecting the right discovery pattern depends on your specific context, constraints, and requirements. Here's a decision framework to guide your choice.
| If You Have... | Consider... | Because... |
|---|---|---|
| Homogeneous tech stack (all Java/Spring) | Client-side with Netflix stack | Established libraries, good integration |
| Polyglot services (multiple languages) | Server-side or Sidecar | Avoid language-specific libraries |
| Legacy applications | Server-side | No application changes required |
| Kubernetes environment | Sidecar (service mesh) | Natural fit, platform integration |
| Latency-critical workloads | Client-side or Sidecar | No extra network hop |
| Need centralized control/visibility | Server-side | All traffic flows through one point |
| Simple infrastructure | Server-side with cloud LB | Managed load balancer, low ops |
| High scale microservices | Sidecar (service mesh) | Handles complexity at scale |
| Need canary/traffic splitting | Server-side or Sidecar | Centralized traffic control |
Common Architectural Patterns:
1. Edge (Server-Side) + Internal (Client-Side)
External traffic enters through an API gateway or load balancer (server-side discovery), while internal service-to-service calls use client-side discovery.
Internet → API Gateway (server-side) → Microservices (client-side LB)
2. Full Service Mesh
All traffic (external and internal) flows through sidecar proxies, providing uniform discovery, observability, and policy.
Internet → Ingress Gateway → Sidecar → App → Sidecar → Target
3. Tiered Approach
Different discovery mechanisms for different service tiers based on criticality and dynamism.
Stateless APIs: Sidecar discovery
Databases: DNS with connection pooling
External services: Static configuration
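For the database tier above, "DNS with connection pooling" means resolving a stable DNS name once when the pool is created, rather than running discovery per request. A minimal sketch (the hostname is illustrative; a real pool would hold open connections to the resolved address):

```typescript
import { lookup } from 'node:dns/promises';

// Resolve a stable DNS name once; a long-lived connection pool
// would then connect to this address instead of re-resolving
// and re-dialing on every query.
async function resolveDatabaseHost(host: string): Promise<string> {
  const { address } = await lookup(host);
  return address;
}
```

This trades dynamism for simplicity: it works for slow-changing endpoints like databases, but it would miss rapid instance churn in a stateless service tier.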
Don't adopt a service mesh just because it's the 'modern' approach. Service meshes add operational complexity that may not be justified for smaller systems. Start with simpler patterns (DNS, cloud load balancers) and evolve as your system's complexity and scale warrant more sophisticated discovery.
Regardless of which discovery pattern you choose, certain best practices apply universally.
```typescript
interface Endpoint {
  address: string;
  port: number;
  healthy: boolean;
  failedAt?: number;
}

interface CachedEndpoints {
  endpoints: Endpoint[];
  timestamp: number;
}

/**
 * Resilient service discovery with best practices
 */
class ResilientServiceDiscovery {
  private cache = new Map<string, CachedEndpoints>();
  private lastKnownGood = new Map<string, Endpoint[]>();

  async getEndpoints(serviceName: string): Promise<Endpoint[]> {
    // 1. Try fresh from cache
    const cached = this.cache.get(serviceName);
    if (cached && !this.isStale(cached)) {
      return cached.endpoints;
    }

    // 2. Try refresh from registry
    try {
      const fresh = await this.fetchWithTimeout(serviceName, 2000);
      this.updateCache(serviceName, fresh);
      this.lastKnownGood.set(serviceName, fresh);
      return fresh;
    } catch (error) {
      console.warn(`Registry fetch failed for ${serviceName}`);
    }

    // 3. Fall back to stale cache
    if (cached) {
      console.warn(`Using stale cache for ${serviceName}`);
      return cached.endpoints;
    }

    // 4. Fall back to last known good
    const lastGood = this.lastKnownGood.get(serviceName);
    if (lastGood && lastGood.length > 0) {
      console.warn(`Using last known good for ${serviceName}`);
      return lastGood;
    }

    // 5. No endpoints available
    throw new Error(`No endpoints available for ${serviceName}`);
  }

  private async fetchWithTimeout(
    serviceName: string,
    timeoutMs: number
  ): Promise<Endpoint[]> {
    const controller = new AbortController();
    const timeout = setTimeout(() => controller.abort(), timeoutMs);

    try {
      const response = await fetch(
        `http://consul.internal:8500/v1/health/service/${serviceName}?passing`,
        { signal: controller.signal }
      );
      return this.parseResponse(await response.json());
    } finally {
      clearTimeout(timeout);
    }
  }

  // Map Consul's health API response shape to our Endpoint type
  private parseResponse(entries: any[]): Endpoint[] {
    return entries.map(e => ({
      address: e.Service.Address || e.Node.Address,
      port: e.Service.Port,
      healthy: true,
    }));
  }

  private isStale(cached: CachedEndpoints): boolean {
    return Date.now() - cached.timestamp > 30000; // 30 second staleness
  }

  private updateCache(serviceName: string, endpoints: Endpoint[]): void {
    this.cache.set(serviceName, {
      endpoints,
      timestamp: Date.now(),
    });
  }

  /**
   * Mark endpoint as failed - temporary exclusion
   */
  markFailed(serviceName: string, endpoint: Endpoint): void {
    const cached = this.cache.get(serviceName);
    if (cached) {
      const idx = cached.endpoints.findIndex(
        e => e.address === endpoint.address && e.port === endpoint.port
      );
      if (idx >= 0) {
        cached.endpoints[idx].healthy = false;
        cached.endpoints[idx].failedAt = Date.now();
      }
    }
  }

  /**
   * Periodic health recovery - re-enable failed endpoints
   */
  recoverEndpoints(): void {
    const recoveryThreshold = 30000; // 30 seconds

    for (const [_, cached] of this.cache) {
      for (const endpoint of cached.endpoints) {
        if (
          !endpoint.healthy &&
          endpoint.failedAt &&
          Date.now() - endpoint.failedAt > recoveryThreshold
        ) {
          endpoint.healthy = true;
          delete endpoint.failedAt;
        }
      }
    }
  }
}
```

Regularly test what happens when discovery fails. Use chaos engineering tools to inject registry unavailability, slow responses, and stale data. Verify that your system degrades gracefully rather than failing completely. The difference between a minor incident and a major outage often comes down to how well discovery handles failures.
We've explored the fundamental distinction between client-side and server-side discovery, along with the modern sidecar pattern that bridges both.
What's Next:
With discovery patterns understood, we'll explore Kubernetes service discovery—the most common service discovery implementation in modern cloud-native systems. You'll learn how Kubernetes abstracts discovery, the role of Services and Endpoints, CoreDNS integration, and how to leverage Kubernetes' built-in discovery mechanisms effectively.
You now understand the fundamental distinction between client-side and server-side discovery, their trade-offs, load balancing algorithms, and the modern sidecar pattern. This knowledge enables you to make informed architectural decisions about how discovery should work in your systems.