Once services are registered in a registry, a fundamental architectural question arises: Who decides which instance receives a request? This seemingly simple question has profound implications for system complexity, flexibility, resilience, and operational requirements.
The answer defines two distinct patterns: client-side discovery, where the calling service queries the registry and makes load balancing decisions itself, and server-side discovery, where a dedicated intermediary (like a load balancer) handles routing on behalf of the client.
Both patterns are valid, and both are used extensively in production systems. Understanding their trade-offs is essential for making informed architectural decisions—and for understanding why modern service meshes combine elements of both.
By the end of this page, you will understand the fundamental differences between client-side and server-side discovery, their respective advantages and disadvantages, implementation patterns for each, and guidance for choosing between them. You'll also understand the hybrid approach used by modern service meshes.
In the client-side discovery pattern, the service client is responsible for determining the network locations of available service instances and load balancing requests across them. The client queries the service registry, obtains a list of endpoints, and uses a client-side load balancing algorithm to select an instance for each request.
How Client-Side Discovery Works:
The Client-Side Discovery Flow:
```typescript
import Consul from 'consul';

interface ServiceInstance {
  address: string;
  port: number;
  healthy: boolean;
  weight: number;
  metadata: Record<string, string>;
}

/**
 * Client-side service discovery with load balancing
 */
class ClientSideDiscovery {
  private consul: Consul.Consul;
  private endpoints: Map<string, ServiceInstance[]> = new Map();
  private roundRobinIndex: Map<string, number> = new Map();

  constructor(consulHost: string = 'consul.internal') {
    this.consul = new Consul({ host: consulHost, port: 8500 });
    this.startBackgroundRefresh();
  }

  /**
   * Get endpoint for a service using client-side load balancing
   */
  async getEndpoint(serviceName: string): Promise<ServiceInstance> {
    const instances = await this.getHealthyInstances(serviceName);
    if (instances.length === 0) {
      throw new Error(`No healthy instances for ${serviceName}`);
    }
    // Client-side load balancing: weighted round-robin
    return this.selectInstance(serviceName, instances);
  }

  /**
   * Get all healthy instances from cache or registry
   */
  private async getHealthyInstances(serviceName: string): Promise<ServiceInstance[]> {
    // Check cache first
    let cached = this.endpoints.get(serviceName);
    if (!cached) {
      // Initial fetch from registry
      cached = await this.fetchFromRegistry(serviceName);
      this.endpoints.set(serviceName, cached);
    }
    return cached.filter(i => i.healthy);
  }

  /**
   * Query registry for service instances
   */
  private async fetchFromRegistry(serviceName: string): Promise<ServiceInstance[]> {
    const services = await this.consul.health.service({
      service: serviceName,
      passing: true, // Only healthy instances
    });

    return services.map(s => ({
      address: s.Service.Address || s.Node.Address,
      port: s.Service.Port,
      healthy: true,
      weight: parseInt(s.Service.Meta?.weight || '100'),
      metadata: s.Service.Meta || {},
    }));
  }

  /**
   * Weighted round-robin load balancing
   */
  private selectInstance(serviceName: string, instances: ServiceInstance[]): ServiceInstance {
    // Build weighted list
    const weightedList: ServiceInstance[] = [];
    for (const instance of instances) {
      const normalizedWeight = Math.ceil(instance.weight / 10);
      for (let i = 0; i < normalizedWeight; i++) {
        weightedList.push(instance);
      }
    }

    // Round-robin through weighted list
    const currentIndex = this.roundRobinIndex.get(serviceName) || 0;
    const selected = weightedList[currentIndex % weightedList.length];
    this.roundRobinIndex.set(serviceName, currentIndex + 1);
    return selected;
  }

  /**
   * Background refresh of endpoint caches
   */
  private startBackgroundRefresh(): void {
    setInterval(async () => {
      for (const serviceName of this.endpoints.keys()) {
        try {
          const fresh = await this.fetchFromRegistry(serviceName);
          this.endpoints.set(serviceName, fresh);
        } catch (error) {
          console.warn(`Failed to refresh ${serviceName}, using cached`);
        }
      }
    }, 10000); // Refresh every 10 seconds
  }

  /**
   * Mark an instance as failed (remove from cache temporarily)
   */
  markFailed(serviceName: string, address: string, port: number): void {
    const instances = this.endpoints.get(serviceName);
    if (instances) {
      const instance = instances.find(
        i => i.address === address && i.port === port
      );
      if (instance) {
        instance.healthy = false;
        // Instance will be reconsidered on next refresh
      }
    }
  }
}

// Usage example
async function callOrderService() {
  const discovery = new ClientSideDiscovery();
  try {
    const endpoint = await discovery.getEndpoint('order-service');
    const response = await fetch(
      `http://${endpoint.address}:${endpoint.port}/api/orders`
    );
    return response.json();
  } catch (error) {
    // Handle failure, potentially mark endpoint as failed
    throw error;
  }
}
```

Netflix Ribbon (now in maintenance mode), gRPC client-side load balancing, and Envoy sidecar proxies are popular implementations of client-side discovery. When the 'client' is a sidecar proxy, the application is decoupled from discovery while still achieving client-side load balancing.
In the server-side discovery pattern, clients make requests to a well-known intermediary (typically a load balancer or API gateway), which queries the service registry and routes the request to an appropriate instance. The client is unaware of the service registry or the specific instances handling its requests.
How Server-Side Discovery Works:
The Server-Side Discovery Flow:
```nginx
# NGINX as server-side discovery load balancer
# Uses Consul for dynamic upstream discovery
# Install nginx-upsync-module for dynamic upstreams

upstream order_service {
    # Dynamic upstream from Consul
    upsync 127.0.0.1:8500/v1/health/service/order-service
        upsync_timeout=6m upsync_interval=500ms
        upsync_type=consul strong_dependency=off;

    # Fallback static server list persisted to disk
    upsync_dump_path /etc/nginx/order_service.conf;
    include /etc/nginx/order_service.conf;

    # Load balancing configuration
    keepalive 32;
}

upstream user_service {
    upsync 127.0.0.1:8500/v1/health/service/user-service
        upsync_timeout=6m upsync_interval=500ms upsync_type=consul;
    upsync_dump_path /etc/nginx/user_service.conf;
    include /etc/nginx/user_service.conf;
    keepalive 32;
}

server {
    listen 80;

    # Clients connect to these paths
    # They don't know about individual instances
    location /api/orders {
        proxy_pass http://order_service;
        proxy_http_version 1.1;
        proxy_set_header Connection "";

        # Health check integration
        health_check interval=5s fails=3 passes=2;
    }

    location /api/users {
        proxy_pass http://user_service;
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        health_check interval=5s fails=3 passes=2;
    }
}
```
```haproxy
# HAProxy configuration with Consul integration
# Uses consul-template for dynamic backend updates

global
    maxconn 50000
    log stdout format raw local0

defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    option httplog

frontend api_gateway
    bind *:80

    # Route based on path to different backends
    acl is_orders path_beg /api/orders
    acl is_users path_beg /api/users

    use_backend order_service if is_orders
    use_backend user_service if is_users
    default_backend fallback

# Backends are generated by consul-template
# This file is regenerated when service instances change
backend order_service
    balance roundrobin
    option httpchk GET /health
    http-check expect status 200
    # consul-template populates these servers dynamically
    {{range service "order-service"}}
    server {{.Node}}-{{.Port}} {{.Address}}:{{.Port}} check inter 5s fall 3 rise 2
    {{end}}

backend user_service
    balance leastconn
    option httpchk GET /health
    {{range service "user-service"}}
    server {{.Node}}-{{.Port}} {{.Address}}:{{.Port}} check inter 5s fall 3 rise 2
    {{end}}

backend fallback
    mode http
    http-request deny deny_status 503
```

Cloud providers offer managed load balancers (AWS ALB/NLB, GCP Load Balancer, Azure Load Balancer) that integrate with their discovery mechanisms. This reduces operational burden while providing server-side discovery. However, they may have higher latency than client-side approaches and less flexibility in load balancing algorithms.
Let's systematically compare client-side and server-side discovery across multiple dimensions to understand which is appropriate for different scenarios.
| Dimension | Client-Side Discovery | Server-Side Discovery |
|---|---|---|
| Network Topology | Direct client-to-service connections | All traffic flows through intermediary |
| Latency | Lower (no proxy hop) | Higher (additional hop per request) |
| Failure Blast Radius | Localized to single client | Load balancer failure affects all clients |
| Client Complexity | Higher (discovery + load balancing logic) | Lower (just hostname and port) |
| Client Diversity | Requires library per language | Any language works (just HTTP/TCP) |
| Load Balancing Flexibility | Per-client customization possible | Uniform across all clients |
| Operational Complexity | Discovery logic distributed in clients | Centralized in load balancer |
| Visibility/Observability | Fragmented across clients | Centralized at load balancer |
| Security Enforcement | Distributed, harder to enforce | Centralized, easier to enforce |
| Update Rollout | Requires updating all clients | Update single load balancer |
| Registry Coupling | Clients coupled to registry | Only LB coupled to registry |
Performance Characteristics:
| Metric | Client-Side | Server-Side | Notes |
|---|---|---|---|
| Request Latency | ~0ms overhead | 0.5-5ms overhead | Depends on LB implementation |
| Connection Efficiency | Connection per backend | Connection pooling at LB | LB can multiplex connections |
| Throughput | Limited by client resources | Limited by LB capacity | LB becomes bottleneck at scale |
| Memory Usage | Endpoint cache per client | Centralized caching | Scales differently |
| Network Utilization | Distributed query load | Concentrated query load | Registry load differs |
Operational Characteristics:
| Aspect | Client-Side | Server-Side |
|---|---|---|
| Debugging | Must instrument each client | Centralized logs/metrics at LB |
| Configuration Changes | Rolling update to all clients | Single point of configuration |
| A/B Testing | Requires client-side logic | Easily implemented at LB |
| Canary Deployments | Complex coordination | Traffic splitting at LB |
| Rate Limiting | Distributed enforcement | Centralized enforcement |
| Authentication | Per-client implementation | Centralized at gateway |
| Circuit Breaking | Per-client state | Centralized state |
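To make the "per-client state" row concrete, here is a minimal sketch of a circuit breaker whose open/closed state lives entirely inside one client process. The class name and thresholds are illustrative, not from any particular library; the point is that each client instance may independently open the circuit to a backend while other clients keep sending traffic, whereas a load balancer would apply one shared decision.

```typescript
// Per-client circuit breaker: all state is local to this client instance.
class ClientCircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private failureThreshold = 3,
    private resetTimeoutMs = 30000
  ) {}

  // Can this client attempt a request right now?
  allowRequest(now: number = Date.now()): boolean {
    if (this.openedAt === null) return true;
    // Half-open: allow a trial request after the reset timeout elapses
    return now - this.openedAt >= this.resetTimeoutMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null; // Close the circuit again
  }

  recordFailure(now: number = Date.now()): void {
    this.failures++;
    if (this.failures >= this.failureThreshold) {
      this.openedAt = now; // Open the circuit - for this client only
    }
  }
}
```

With server-side discovery, the equivalent state (failure counts, open circuits) is held centrally at the load balancer, which is both its strength (one consistent view) and its weakness (one shared failure domain).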
In practice, most systems use a hybrid approach. External traffic typically uses server-side discovery (through an API gateway), while internal service-to-service communication may use client-side discovery. Modern service meshes implement client-side discovery at the sidecar level, giving the benefits of both approaches.
Regardless of whether discovery is client-side or server-side, the load balancing algorithm determines how requests are distributed across available instances. Different algorithms are appropriate for different workloads.
```typescript
interface Instance {
  id: string;
  address: string;
  port: number;
  weight: number;
  activeConnections: number;
  avgResponseTime: number;
}

/**
 * Round-Robin: Simple rotation through instances
 */
class RoundRobinBalancer {
  private currentIndex = 0;

  select(instances: Instance[]): Instance {
    const selected = instances[this.currentIndex % instances.length];
    this.currentIndex++;
    return selected;
  }
}

/**
 * Weighted Round-Robin: Distribute proportionally to weights
 */
class WeightedRoundRobinBalancer {
  private positions = new Map<string, number>();

  select(instances: Instance[]): Instance {
    // Smooth weighted round-robin algorithm
    let selectedIndex = -1;
    let maxCurrent = -Infinity;

    for (let i = 0; i < instances.length; i++) {
      const current =
        (this.positions.get(instances[i].id) || 0) + instances[i].weight;
      this.positions.set(instances[i].id, current);
      if (current > maxCurrent) {
        maxCurrent = current;
        selectedIndex = i;
      }
    }

    const totalWeight = instances.reduce((sum, i) => sum + i.weight, 0);
    this.positions.set(
      instances[selectedIndex].id,
      maxCurrent - totalWeight
    );
    return instances[selectedIndex];
  }
}

/**
 * Least Connections: Route to least loaded instance
 */
class LeastConnectionsBalancer {
  select(instances: Instance[]): Instance {
    return instances.reduce((least, current) =>
      current.activeConnections < least.activeConnections ? current : least
    );
  }
}

/**
 * P2C (Power of Two Choices): Randomized with load awareness
 * Better than random, lower overhead than least-connections
 */
class P2CBalancer {
  select(instances: Instance[]): Instance {
    if (instances.length === 1) return instances[0];

    // Pick two distinct random instances
    const idx1 = Math.floor(Math.random() * instances.length);
    let idx2 = Math.floor(Math.random() * instances.length);
    while (idx2 === idx1) {
      idx2 = Math.floor(Math.random() * instances.length);
    }

    // Return the one with fewer connections
    return instances[idx1].activeConnections <= instances[idx2].activeConnections
      ? instances[idx1]
      : instances[idx2];
  }
}

/**
 * Consistent Hashing: Affinity based on request attribute
 */
class ConsistentHashingBalancer {
  private ring: Map<number, Instance> = new Map();
  private sortedKeys: number[] = [];
  private virtualNodes = 100;

  constructor(instances: Instance[]) {
    this.buildRing(instances);
  }

  private buildRing(instances: Instance[]): void {
    for (const instance of instances) {
      for (let i = 0; i < this.virtualNodes; i++) {
        const hash = this.hash(`${instance.id}-${i}`);
        this.ring.set(hash, instance);
        this.sortedKeys.push(hash);
      }
    }
    this.sortedKeys.sort((a, b) => a - b);
  }

  select(key: string): Instance {
    const hash = this.hash(key);
    // Find first node with hash >= key hash
    for (const nodeHash of this.sortedKeys) {
      if (nodeHash >= hash) {
        return this.ring.get(nodeHash)!;
      }
    }
    // Wrap around to first node
    return this.ring.get(this.sortedKeys[0])!;
  }

  private hash(key: string): number {
    let hash = 0;
    for (let i = 0; i < key.length; i++) {
      hash = ((hash << 5) - hash) + key.charCodeAt(i);
      hash = hash & hash; // Force 32-bit integer
    }
    return Math.abs(hash);
  }
}
```

| Use Case | Recommended Algorithm | Reasoning |
|---|---|---|
| Homogeneous instances, uniform requests | Round-Robin | Simple, fair distribution |
| Instances with different capacities | Weighted Round-Robin | Respects capacity differences |
| Variable request duration | Least Connections | Adapts to workload |
| Session affinity needed | Consistent Hashing | Same user → same instance |
| Large instance pool, simplicity | Random / P2C | Low overhead, good distribution |
| Response time sensitive | Least Response Time | Routes to fastest instances |
| Canary deployments | Weighted + Traffic Split | Control traffic percentages |
The load balancing algorithm can significantly impact system behavior. Poor algorithm choice leads to uneven load distribution, hot spots, and wasted capacity. Most production systems use weighted algorithms that account for both configured capacity (weights) and current load (connections or response time).
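As a hedged sketch of that last point, here is one way a balancer could combine configured capacity with observed load: score each instance by its weight divided by its recent average response time, and pick the highest score. The scoring formula, type, and function names below are illustrative, not drawn from any specific library.

```typescript
interface ScoredInstance {
  id: string;
  weight: number;          // Configured capacity
  avgResponseTime: number; // Observed load signal, in ms
}

// Pick the instance with the best capacity-to-latency ratio.
function selectByWeightAndLatency(instances: ScoredInstance[]): ScoredInstance {
  return instances.reduce((best, current) => {
    // Higher weight helps, higher observed latency hurts;
    // +1 avoids division by zero for an idle instance.
    const bestScore = best.weight / (best.avgResponseTime + 1);
    const currentScore = current.weight / (current.avgResponseTime + 1);
    return currentScore > bestScore ? current : best;
  });
}
```

A scheme like this naturally shifts traffic away from an instance that is slowing down, even if its configured weight says it should receive more.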
Modern service mesh architectures combine the benefits of both client-side and server-side discovery through the sidecar proxy pattern. A local proxy (sidecar) runs alongside each service, handling discovery and load balancing on behalf of the application.
Why Sidecars Bridge Both Worlds:
Sidecar Pattern Benefits:
The sidecar combines client-side discovery's performance (no central bottleneck, low latency) with server-side discovery's simplicity (application just connects to localhost).
From the Application's Perspective:
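A hedged sketch of what this looks like in application code (the service name and path are illustrative): the application addresses peers by stable logical names and never touches the registry; the sidecar intercepts the connection and selects a healthy instance.

```typescript
// No registry client, no endpoint list - just a logical hostname.
// The sidecar intercepts this connection and routes to a real instance.
function serviceUrl(serviceName: string, path: string): string {
  return `http://${serviceName}${path}`;
}

// Looks like a plain HTTP call; discovery and load balancing
// happen transparently in the sidecar proxy.
async function getOrders(): Promise<Response> {
  return fetch(serviceUrl('order-service', '/api/orders'));
}
```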
From the Platform's Perspective:
```yaml
# Envoy sidecar configuration for service discovery
# Simplified example - production configs are more complex

static_resources:
  listeners:
    # Outbound listener - intercepts app traffic to other services
    - name: outbound_listener
      address:
        socket_address:
          address: 127.0.0.1
          port_value: 15001
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                route_config:
                  name: outbound_route
                  virtual_hosts:
                    - name: order_service
                      domains: ["order-service", "order-service.default.svc.cluster.local"]
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: order-service-cluster
                http_filters:
                  - name: envoy.filters.http.router

  clusters:
    # Cluster with EDS (Endpoint Discovery Service)
    - name: order-service-cluster
      type: EDS
      eds_cluster_config:
        eds_config:
          api_config_source:
            api_type: GRPC
            grpc_services:
              - envoy_grpc:
                  cluster_name: xds-cluster
        service_name: order-service
      lb_policy: ROUND_ROBIN
      health_checks:
        - timeout: 5s
          interval: 10s
          unhealthy_threshold: 3
          healthy_threshold: 2
          http_health_check:
            path: /health
      circuit_breakers:
        thresholds:
          - max_connections: 1000
            max_pending_requests: 1000
            max_requests: 1000
            max_retries: 3

    # Control plane connection (DNS name, so resolved via strict DNS)
    - name: xds-cluster
      type: STRICT_DNS
      connect_timeout: 5s
      load_assignment:
        cluster_name: xds-cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: istiod.istio-system.svc.cluster.local
                      port_value: 15012
```

Service meshes like Istio, Linkerd, and Consul Connect are essentially sidecar discovery patterns with sophisticated control planes. The control plane distributes configuration and endpoint information; the sidecars (data plane) implement discovery, load balancing, security, and observability. This is the dominant pattern in modern Kubernetes environments.
Selecting the right discovery pattern depends on your specific context, constraints, and requirements. Here's a decision framework to guide your choice.
| If You Have... | Consider... | Because... |
|---|---|---|
| Homogeneous tech stack (all Java/Spring) | Client-side with Netflix stack | Established libraries, good integration |
| Polyglot services (multiple languages) | Server-side or Sidecar | Avoid language-specific libraries |
| Legacy applications | Server-side | No application changes required |
| Kubernetes environment | Sidecar (service mesh) | Natural fit, platform integration |
| Latency-critical workloads | Client-side or Sidecar | No extra network hop |
| Need centralized control/visibility | Server-side | All traffic flows through one point |
| Simple infrastructure | Server-side with cloud LB | Managed load balancer, low ops |
| High scale microservices | Sidecar (service mesh) | Handles complexity at scale |
| Need canary/traffic splitting | Server-side or Sidecar | Centralized traffic control |
Common Architectural Patterns:
1. Edge (Server-Side) + Internal (Client-Side)
External traffic enters through an API gateway or load balancer (server-side discovery), while internal service-to-service calls use client-side discovery.
Internet → API Gateway (server-side) → Microservices (client-side LB)
2. Full Service Mesh
All traffic (external and internal) flows through sidecar proxies, providing uniform discovery, observability, and policy.
Internet → Ingress Gateway → Sidecar → App → Sidecar → Target
3. Tiered Approach
Different discovery mechanisms for different service tiers based on criticality and dynamism.
Stateless APIs: Sidecar discovery
Databases: DNS with connection pooling
External services: Static configuration
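For the database tier above, "DNS with connection pooling" means resolving a stable DNS name once when the pool is created, rather than running discovery per request. A minimal sketch (the hostname is illustrative; a real pool would hold open connections to the resolved address):

```typescript
import { lookup } from 'node:dns/promises';

// Resolve a stable DNS name once; a long-lived connection pool
// would then connect to this address instead of re-resolving
// and re-dialing on every query.
async function resolveDatabaseHost(host: string): Promise<string> {
  const { address } = await lookup(host);
  return address;
}
```

This trades dynamism for simplicity: it works for slow-changing endpoints like databases, but it would miss rapid instance churn in a stateless service tier.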
Don't adopt a service mesh just because it's the 'modern' approach. Service meshes add operational complexity that may not be justified for smaller systems. Start with simpler patterns (DNS, cloud load balancers) and evolve as your system's complexity and scale warrant more sophisticated discovery.
Regardless of which discovery pattern you choose, certain best practices apply universally.
```typescript
interface Endpoint {
  address: string;
  port: number;
  healthy: boolean;
  failedAt?: number;
}

interface CachedEndpoints {
  endpoints: Endpoint[];
  timestamp: number;
}

/**
 * Resilient service discovery with best practices
 */
class ResilientServiceDiscovery {
  private cache = new Map<string, CachedEndpoints>();
  private lastKnownGood = new Map<string, Endpoint[]>();

  async getEndpoints(serviceName: string): Promise<Endpoint[]> {
    // 1. Try fresh from cache
    const cached = this.cache.get(serviceName);
    if (cached && !this.isStale(cached)) {
      return cached.endpoints;
    }

    // 2. Try refresh from registry
    try {
      const fresh = await this.fetchWithTimeout(serviceName, 2000);
      this.updateCache(serviceName, fresh);
      this.lastKnownGood.set(serviceName, fresh);
      return fresh;
    } catch (error) {
      console.warn(`Registry fetch failed for ${serviceName}`);
    }

    // 3. Fall back to stale cache
    if (cached) {
      console.warn(`Using stale cache for ${serviceName}`);
      return cached.endpoints;
    }

    // 4. Fall back to last known good
    const lastGood = this.lastKnownGood.get(serviceName);
    if (lastGood && lastGood.length > 0) {
      console.warn(`Using last known good for ${serviceName}`);
      return lastGood;
    }

    // 5. No endpoints available
    throw new Error(`No endpoints available for ${serviceName}`);
  }

  private async fetchWithTimeout(
    serviceName: string,
    timeoutMs: number
  ): Promise<Endpoint[]> {
    const controller = new AbortController();
    const timeout = setTimeout(() => controller.abort(), timeoutMs);

    try {
      const response = await fetch(
        `http://consul.internal:8500/v1/health/service/${serviceName}?passing`,
        { signal: controller.signal }
      );
      return this.parseResponse(await response.json());
    } finally {
      clearTimeout(timeout);
    }
  }

  // Map Consul's health API response shape to our Endpoint type
  private parseResponse(entries: any[]): Endpoint[] {
    return entries.map(e => ({
      address: e.Service.Address || e.Node.Address,
      port: e.Service.Port,
      healthy: true,
    }));
  }

  private isStale(cached: CachedEndpoints): boolean {
    return Date.now() - cached.timestamp > 30000; // 30 second staleness
  }

  private updateCache(serviceName: string, endpoints: Endpoint[]): void {
    this.cache.set(serviceName, {
      endpoints,
      timestamp: Date.now(),
    });
  }

  /**
   * Mark endpoint as failed - temporary exclusion
   */
  markFailed(serviceName: string, endpoint: Endpoint): void {
    const cached = this.cache.get(serviceName);
    if (cached) {
      const idx = cached.endpoints.findIndex(
        e => e.address === endpoint.address && e.port === endpoint.port
      );
      if (idx >= 0) {
        cached.endpoints[idx].healthy = false;
        cached.endpoints[idx].failedAt = Date.now();
      }
    }
  }

  /**
   * Periodic health recovery - re-enable failed endpoints
   */
  recoverEndpoints(): void {
    const recoveryThreshold = 30000; // 30 seconds

    for (const [_, cached] of this.cache) {
      for (const endpoint of cached.endpoints) {
        if (
          !endpoint.healthy &&
          endpoint.failedAt &&
          Date.now() - endpoint.failedAt > recoveryThreshold
        ) {
          endpoint.healthy = true;
          delete endpoint.failedAt;
        }
      }
    }
  }
}
```

Regularly test what happens when discovery fails. Use chaos engineering tools to inject registry unavailability, slow responses, and stale data. Verify that your system degrades gracefully rather than failing completely. The difference between a minor incident and a major outage often comes down to how well discovery handles failures.
We've explored the fundamental distinction between client-side and server-side discovery, along with the modern sidecar pattern that bridges both.
What's Next:
With discovery patterns understood, we'll explore Kubernetes service discovery—the most common service discovery implementation in modern cloud-native systems. You'll learn how Kubernetes abstracts discovery, the role of Services and Endpoints, CoreDNS integration, and how to leverage Kubernetes' built-in discovery mechanisms effectively.
You now understand the fundamental distinction between client-side and server-side discovery, their trade-offs, load balancing algorithms, and the modern sidecar pattern. This knowledge enables you to make informed architectural decisions about how discovery should work in your systems.