Kubernetes has become the de facto standard for container orchestration, and with it comes a built-in service discovery system that handles most use cases without external registries. Understanding how Kubernetes implements discovery is essential for any engineer working with containerized workloads.
Kubernetes service discovery is elegant in its integration: services register automatically when pods start, discovery happens via standard DNS, and load balancing is transparent to applications. For many organizations, Kubernetes-native discovery eliminates the need for Consul, etcd, or other external registries.
But Kubernetes discovery also has complexity beneath the surface. Multiple service types, different proxy modes, DNS limitations, and multi-cluster scenarios all require deep understanding to navigate correctly.
By the end of this page, you will understand:
- How Kubernetes Services work at a fundamental level
- The role of kube-proxy and its different modes
- DNS-based discovery in Kubernetes
- Advanced patterns, including headless Services and ExternalName
- Multi-cluster and cross-namespace discovery
- When to use Kubernetes-native vs. external service discovery
A Kubernetes Service is an abstraction that defines a logical set of Pods and a policy by which to access them. Services solve the fundamental problem of Pod ephemerality—Pods come and go with new IP addresses, but Services provide stable endpoints.
The Core Concept
┌───────────────────────────────────────────────────────────────┐
│                      Kubernetes Cluster                        │
│                                                                │
│   ┌─────────────────────────────────────────────────────┐     │
│   │               Service: payment-service              │     │
│   │               ClusterIP: 10.96.45.123               │     │
│   │               Port: 80                               │     │
│   └────────────────────────┬────────────────────────────┘     │
│                            │                                   │
│              ┌─────────────┼─────────────┐                     │
│              │             │             │                     │
│              ▼             ▼             ▼                     │
│   ┌──────────────┐ ┌──────────────┐ ┌──────────────┐          │
│   │     Pod      │ │     Pod      │ │     Pod      │          │
│   │ payment-abc  │ │ payment-def  │ │ payment-ghi  │          │
│   │ 10.244.1.10  │ │ 10.244.2.15  │ │ 10.244.1.22  │          │
│   │ label: app=  │ │ label: app=  │ │ label: app=  │          │
│   │   payment    │ │   payment    │ │   payment    │          │
│   └──────────────┘ └──────────────┘ └──────────────┘          │
│                                                                │
└────────────────────────────────────────────────────────────────┘
Key Components:
- A stable virtual IP (the ClusterIP, e.g., 10.96.45.123) and a stable DNS name
- A label selector that determines which Pods back the Service (e.g., app=payment)
- A port mapping from the Service port to the Pods' targetPort
- The set of matching Pod IPs, tracked automatically as Endpoints/EndpointSlices

A complete Service definition, together with the Deployment it targets:

apiVersion: v1
kind: Service
metadata:
  name: payment-service
  namespace: production
  labels:
    app: payment
    tier: backend
  annotations:
    description: "Payment processing service"
spec:
  # Service type determines exposure
  type: ClusterIP              # Default: internal-only access

  # Label selector for target Pods
  selector:
    app: payment
    version: v2                # Can be as specific as needed

  # Port mapping
  ports:
  - name: http
    port: 80                   # Port the Service exposes
    targetPort: 8080           # Port the Pod listens on
    protocol: TCP
  - name: grpc
    port: 9000
    targetPort: 9000
    protocol: TCP

  # Session affinity (optional)
  sessionAffinity: None        # or 'ClientIP' for sticky sessions

  # IP family (for dual-stack clusters)
  ipFamilyPolicy: SingleStack
  ipFamilies:
  - IPv4
---
# Deployment that the Service targets
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-deployment
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment
      version: v2
  template:
    metadata:
      labels:
        app: payment
        version: v2
    spec:
      containers:
      - name: payment
        image: payment-service:v2.3.1
        ports:
        - containerPort: 8080
          name: http
        - containerPort: 9000
          name: grpc

How Services Track Pods
Kubernetes automatically maintains the association between Services and Pods:
# View Endpoints for a Service
$ kubectl get endpoints payment-service
NAME ENDPOINTS AGE
payment-service 10.244.1.10:8080,10.244.2.15:8080,10.244.1.22:8080 5d
# View EndpointSlices (more detailed, scalable)
$ kubectl get endpointslices -l kubernetes.io/service-name=payment-service
NAME ADDRESSTYPE PORTS ENDPOINTS AGE
payment-service-abc12 IPv4 8080 10.244.1.10,10.244.2.15 5d
The Discovery Flow:
1. A client calls payment-service:80, and cluster DNS resolves the name to the ClusterIP
2. Traffic hits the virtual IP 10.96.45.123, where kube-proxy rules intercept it
3. kube-proxy forwards the connection to one of the ready Pod endpoints (e.g., 10.244.1.10:8080)

EndpointSlice (GA in Kubernetes 1.21) replaces Endpoints for most purposes. It scales better for large clusters—instead of one huge Endpoints object, endpoints are split into slices. For Services with thousands of endpoints, this dramatically reduces API server load and update latency.
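For reference, a trimmed EndpointSlice object looks roughly like this (illustrative names and addresses, mirroring the payment-service example above):

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: payment-service-abc12
  labels:
    kubernetes.io/service-name: payment-service   # ties the slice to its Service
addressType: IPv4
ports:
- name: http
  port: 8080
  protocol: TCP
endpoints:
- addresses:
  - 10.244.1.10
  conditions:
    ready: true        # only ready endpoints receive traffic
  nodeName: node-1
- addresses:
  - 10.244.2.15
  conditions:
    ready: true
  nodeName: node-2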
Kubernetes offers multiple Service types, each serving different access patterns and requirements.
Type 1: ClusterIP (Default)
Exposes the Service on a cluster-internal IP. Only reachable from within the cluster.
apiVersion: v1
kind: Service
metadata:
name: internal-api
spec:
type: ClusterIP # Default, can be omitted
selector:
app: internal-api
ports:
- port: 80
targetPort: 8080
Use cases:
- Internal microservice-to-microservice traffic
- Databases and backends that should never be reachable from outside the cluster

Characteristics:
- Stable virtual IP for the lifetime of the Service
- Resolvable as service-name.namespace.svc.cluster.local
- Load-balanced across ready Pods by kube-proxy

Type 2: NodePort
Exposes the Service on each Node's IP at a static port. Makes Service accessible from outside the cluster.
apiVersion: v1
kind: Service
metadata:
name: nodeport-api
spec:
type: NodePort
selector:
app: api
ports:
- port: 80
targetPort: 8080
nodePort: 30080 # Optional: auto-assigned if not specified (30000-32767)
Access pattern:
- External clients reach the Service at <NodeIP>:30080 from outside the cluster
- Inside the cluster, the ClusterIP still works as usual

Use cases:
- Development and testing without a cloud load balancer
- Fronting the cluster with a load balancer you manage yourself

Characteristics:
- Port is allocated from the 30000-32767 range on every node
- Builds on ClusterIP (a ClusterIP is still created)
- No health-aware external load balancing by itself
Type 3: LoadBalancer
Exposes the Service externally using a cloud provider's load balancer.
apiVersion: v1
kind: Service
metadata:
name: public-api
annotations:
# Cloud-specific annotations
service.beta.kubernetes.io/aws-load-balancer-internal: "false"
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
type: LoadBalancer
selector:
app: api
ports:
- port: 443
targetPort: 8443
loadBalancerSourceRanges: # Optional: IP whitelist
- 10.0.0.0/8
- 192.168.0.0/16
What happens:
- Kubernetes asks the cloud provider to provision an external load balancer
- The load balancer forwards traffic to the cluster (typically via NodePorts), and kube-proxy routes it to Pods
- The external address appears in the Service's status (illustrative output shown below)

Use cases:
- Production services that must be reachable from the internet
- One external entry point per Service (often combined with an Ingress or gateway to reduce load balancer count)
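Illustrative output once the provider has assigned an address (the EXTERNAL-IP format depends on your cloud):

$ kubectl get svc public-api
NAME         TYPE           CLUSTER-IP    EXTERNAL-IP                          PORT(S)         AGE
public-api   LoadBalancer   10.96.12.34   a1b2c3.elb.us-east-1.amazonaws.com   443:31234/TCP   2m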
Type 4: Headless Service

# Headless Service - returns Pod IPs directly
apiVersion: v1
kind: Service
metadata:
  name: database-headless
spec:
  clusterIP: None          # The key difference
  selector:
    app: database
  ports:
  - port: 5432
    targetPort: 5432

# DNS behavior changes:
# $ dig database-headless.default.svc.cluster.local +short
# 10.244.1.10   # Pod IP directly
# 10.244.2.15   # Pod IP directly
# 10.244.1.22   # Pod IP directly

# Use cases:
# - StatefulSets (each Pod needs addressable identity)
# - Client-side load balancing requirements
# - Database clusters (Kafka, Cassandra, PostgreSQL replicas)
# - When you need to know all Pod IPs

Comparing the Service types:

| Type | Internal Access | External Access | Load Balancer | Use Case |
|---|---|---|---|---|
| ClusterIP | ✓ Via ClusterIP | ✗ No | kube-proxy | Internal services |
| NodePort | ✓ Via ClusterIP | ✓ Via NodeIP:Port | kube-proxy | Debug/test, custom LB |
| LoadBalancer | ✓ Via ClusterIP | ✓ Via External LB | Cloud LB + kube-proxy | Production external |
| Headless | ✓ Via Pod IPs directly | ✗ No | Client chooses | StatefulSets, DB clusters |
| ExternalName | ✓ Via CNAME | N/A (external) | N/A | External service alias |
ExternalName Services create DNS CNAME records pointing to external services: spec.externalName: api.external-provider.com. This lets internal services call external-api.namespace.svc.cluster.local and be redirected to the external domain. Useful for abstracting external dependencies.
kube-proxy is the component that makes Kubernetes Services actually work. It runs on every node and maintains network rules for forwarding traffic from ClusterIP to actual Pod IPs.
kube-proxy Modes
kube-proxy can operate in different modes, each with different characteristics:
1. iptables Mode (Default)
kube-proxy programs iptables rules to redirect traffic:
# Simplified iptables flow for a Service with 3 endpoints
-A KUBE-SERVICES -d 10.96.45.123/32 -p tcp -m tcp --dport 80 -j KUBE-SVC-ABC123
# KUBE-SVC-ABC123 randomly selects an endpoint
-A KUBE-SVC-ABC123 -m statistic --mode random --probability 0.33333 -j KUBE-SEP-AAAA
-A KUBE-SVC-ABC123 -m statistic --mode random --probability 0.50000 -j KUBE-SEP-BBBB
-A KUBE-SVC-ABC123 -j KUBE-SEP-CCCC
# KUBE-SEP-* rules DNAT to Pod IPs
-A KUBE-SEP-AAAA -p tcp -m tcp -j DNAT --to-destination 10.244.1.10:8080
-A KUBE-SEP-BBBB -p tcp -m tcp -j DNAT --to-destination 10.244.2.15:8080
-A KUBE-SEP-CCCC -p tcp -m tcp -j DNAT --to-destination 10.244.1.22:8080
Characteristics:
- Endpoint selection is random, weighted via the probability rules shown above
- Rule count grows linearly with the number of Services and endpoints, so very large clusters see slower rule syncs and lookups
- Battle-tested and available on any standard Linux kernel
2. IPVS Mode
IP Virtual Server (IPVS) is a transport-layer load balancer in the Linux kernel, designed for exactly this use case:
# View IPVS virtual servers
$ ipvsadm -Ln
IP Virtual Server version 1.2.1
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.96.45.123:80 rr
-> 10.244.1.10:8080 Masq 1 3 15
-> 10.244.2.15:8080 Masq 1 2 12
-> 10.244.1.22:8080 Masq 1 4 18
Characteristics:
- Hash-table lookups keep forwarding cost roughly constant regardless of Service count
- Supports multiple scheduling algorithms (round-robin, least-connections, source-hash, and more)
- Requires the IPVS kernel modules to be loaded on every node

The mode is selected in the kube-proxy configuration:

# kube-proxy ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-proxy
  namespace: kube-system
data:
  config.conf: |
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration

    # Choose mode: "iptables", "ipvs", or "userspace" (deprecated)
    mode: "ipvs"

    # IPVS-specific settings
    ipvs:
      scheduler: "rr"          # rr=round-robin, lc=least-conn, sh=source-hash
      syncPeriod: 30s
      minSyncPeriod: 5s

    # iptables-specific settings
    iptables:
      masqueradeAll: false
      syncPeriod: 30s
      minSyncPeriod: 5s

    # Connection tracking settings
    conntrack:
      maxPerCore: 32768
      tcpEstablishedTimeout: 24h
      tcpCloseWaitTimeout: 1h

| Aspect | iptables Mode | IPVS Mode |
|---|---|---|
| Performance (small) | Excellent | Excellent |
| Performance (large) | Degrades at scale | Maintains performance |
| Rule complexity | O(n) linear chains | O(1) hash tables |
| LB Algorithms | Random only | rr, lc, dh, sh, sed, nq |
| Session affinity | Limited | Better support |
| Debugging | iptables -L (complex) | ipvsadm -Ln (clear) |
| Kernel requirements | Standard | IPVS modules |
| Recommended for | < 1000 services | > 1000 services |
3. Newer Alternatives: eBPF-Based kube-proxy Replacement
Modern CNI plugins like Cilium can replace kube-proxy entirely using eBPF:
Benefits:
- No iptables or IPVS rule churn; service lookups happen in eBPF maps in the kernel
- Lower per-packet overhead and latency, especially at high Service counts
- Richer observability (per-flow visibility without extra sidecars)

Trade-offs:
- Requires a relatively recent kernel and a CNI plugin that supports it
- Ties service routing to the CNI, adding operational complexity and some lock-in
For most clusters, iptables mode is fine. Switch to IPVS if you have 500+ services or need specific load balancing algorithms. Consider eBPF-based alternatives (Cilium) for large clusters with advanced networking requirements.
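To confirm which mode a cluster is actually running, you can read the kube-proxy configuration or ask a running kube-proxy directly (a quick check; assumes the standard kubeadm-style ConfigMap name and the default metrics port 10249):

# Read the configured mode from the kube-proxy ConfigMap
$ kubectl -n kube-system get configmap kube-proxy -o yaml | grep -E '^\s*mode:'

# Or query a kube-proxy instance on a node
$ curl -s http://localhost:10249/proxyMode
ipvs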
Kubernetes includes a DNS server (CoreDNS since v1.11, previously kube-dns) that provides DNS-based service discovery. Every Pod is configured to use this DNS server by default.
DNS Record Structure
Kubernetes creates DNS records following a predictable naming scheme:
# For Services:
<service-name>.<namespace>.svc.<cluster-domain>
# Examples:
payment-service.production.svc.cluster.local
api-gateway.default.svc.cluster.local
# For Pods (less common):
<pod-ip-with-dashes>.<namespace>.pod.<cluster-domain>
# Example:
10-244-1-10.production.pod.cluster.local
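A quick way to confirm both record types from inside the cluster (illustrative values matching the examples above; assumes a Pod image that ships dig):

$ dig +short payment-service.production.svc.cluster.local
10.96.45.123
$ dig +short 10-244-1-10.production.pod.cluster.local
10.244.1.10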
Resolution from within a Pod:
# Within the same namespace, short name works
$ curl http://payment-service/api/v1/charge
# Cross-namespace requires namespace
$ curl http://logging-service.monitoring/api/v1/logs
# Fully qualified name always works
$ curl http://payment-service.production.svc.cluster.local/api/v1/charge
CoreDNS Configuration

CoreDNS behavior is controlled by a Corefile stored in a ConfigMap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready

        # Kubernetes zone - handles cluster.local domain
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure          # or 'verified' for tighter security
            fallthrough in-addr.arpa ip6.arpa
            ttl 30                 # DNS record TTL
        }

        # Prometheus metrics endpoint
        prometheus :9153

        # Forwarding for external domains
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }

        # Cache (external DNS responses)
        cache 30

        # Detect forwarding loops
        loop

        # Automatic config reload
        reload

        # Round-robin A records
        loadbalance
    }

DNS Queries from Pods
When a Pod resolves a name, its /etc/resolv.conf determines the process:
# Inside a Pod
$ cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10 # CoreDNS ClusterIP
options ndots:5
The ndots parameter is critical:
Names with fewer dots than the ndots value (5) are expanded through the search domains first, so curl api becomes queries for:
- api.default.svc.cluster.local
- api.svc.cluster.local
- api.cluster.local
- api (bare)

This enables short service names but causes multiple DNS queries for external domains (a lookup for google.com walks through the search domains before the bare name finally succeeds).
Mitigation:
- Use fully qualified names with a trailing dot for external domains (e.g., google.com.)
- Lower ndots if external DNS is common (trade-off: short names break)
- Set dnsConfig in the Pod spec for per-Pod settings
apiVersion: v1
kind: Pod
metadata:
  name: optimized-dns-pod
spec:
  # DNS policy options:
  # - ClusterFirst: Use cluster DNS, fall back to node DNS (default)
  # - Default: Use node's DNS directly
  # - ClusterFirstWithHostNet: For hostNetwork pods
  # - None: Ignore all; use dnsConfig only
  dnsPolicy: ClusterFirst

  # Custom DNS configuration
  dnsConfig:
    nameservers:
    - 10.96.0.10
    searches:
    - production.svc.cluster.local
    - svc.cluster.local
    options:
    - name: ndots
      value: "2"               # Reduce search domain queries
    - name: single-request-reopen
      value: ""                # Better for some Linux kernels

  containers:
  - name: app
    image: my-app:latest

---
# For pods that make many external calls, use FQDN:
# Good: requests.get("https://api.stripe.com./v1/charges")   # Note trailing dot
# Slow: requests.get("https://api.stripe.com/v1/charges")    # Multiple DNS queries

At high scale, DNS can become a bottleneck. Symptoms: CoreDNS CPU saturation, high DNS latency, SERVFAIL errors. Mitigations: NodeLocal DNSCache (DaemonSet caching), increased CoreDNS replicas, optimized ndots settings. Monitor CoreDNS metrics via Prometheus.
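If you deploy the NodeLocal DNSCache addon, a quick check that the per-node cache is actually running (assumes the upstream addon's default DaemonSet name, node-local-dns):

$ kubectl get daemonset -n kube-system node-local-dns
NAME             DESIRED   CURRENT   READY   AGE
node-local-dns   12        12        12      30d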
Beyond basic ClusterIP Services, Kubernetes supports advanced discovery patterns for complex scenarios.
Pattern 1: Headless Services with StatefulSets
StatefulSets require stable network identities. Headless Services provide this:
apiVersion: v1
kind: Service
metadata:
name: postgres
labels:
app: postgres
spec:
clusterIP: None # Headless
selector:
app: postgres
ports:
- port: 5432
name: postgres
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
serviceName: "postgres" # Links to headless Service
replicas: 3
selector:
matchLabels:
app: postgres
template:
metadata:
labels:
app: postgres
spec:
containers:
- name: postgres
image: postgres:14
ports:
- containerPort: 5432
name: postgres
DNS records created:
# Service DNS (returns all Pod IPs)
postgres.default.svc.cluster.local → 10.244.1.10, 10.244.2.15, 10.244.1.22
# Individual Pod DNS (stable identity!)
postgres-0.postgres.default.svc.cluster.local → 10.244.1.10
postgres-1.postgres.default.svc.cluster.local → 10.244.2.15
postgres-2.postgres.default.svc.cluster.local → 10.244.1.22
Even if postgres-0 is rescheduled, it gets the same DNS name (pointing to new IP).
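This is what lets replica-aware clients pin roles to specific members. A sketch of how an application might consume those names (hypothetical env vars; assumes postgres-0 is the primary):

# Excerpt from an application Deployment (hypothetical configuration)
env:
- name: DB_PRIMARY_HOST
  value: postgres-0.postgres.default.svc.cluster.local   # writes go to the primary
- name: DB_REPLICA_HOST
  value: postgres-1.postgres.default.svc.cluster.local   # reads can hit a replica
- name: DB_PORT
  value: "5432"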
Pattern 2: Service Topology (Deprecated) → Topology Aware Hints
Kubernetes 1.21+ supports topology-aware routing to prefer local endpoints:
apiVersion: v1
kind: Service
metadata:
name: zone-aware-service
annotations:
service.kubernetes.io/topology-mode: Auto # 'Auto' or 'Disabled'
spec:
selector:
app: api
ports:
- port: 80
targetPort: 8080
When enabled:
- The EndpointSlice controller adds per-zone hints to each endpoint
- kube-proxy on each node prefers endpoints hinted for its own zone, falling back to all endpoints if the distribution is too uneven

Benefits:
- Lower latency by keeping traffic within a zone where possible
- Reduced cross-zone data transfer costs

You can verify the hints directly on the EndpointSlices, as shown below.
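A quick check (a sketch; the hints field follows the discovery.k8s.io/v1 API, and the zone name is illustrative):

$ kubectl get endpointslices -l kubernetes.io/service-name=zone-aware-service -o yaml | grep -A3 hints
    hints:
      forZones:
      - name: us-east-1a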
Pattern 3: Multi-Port Services
Services can expose multiple ports for protocols:
apiVersion: v1
kind: Service
metadata:
name: multi-protocol-service
spec:
selector:
app: api
ports:
- name: http # Names required for multi-port
port: 80
targetPort: 8080
protocol: TCP
- name: https
port: 443
targetPort: 8443
protocol: TCP
- name: grpc
port: 9000
targetPort: 9000
protocol: TCP
- name: metrics
port: 9090
targetPort: 9090
protocol: TCP
DNS SRV records:
_http._tcp.multi-protocol-service.default.svc.cluster.local → port 80
_grpc._tcp.multi-protocol-service.default.svc.cluster.local → port 9000
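To see these SRV records in practice, query them from inside the cluster (illustrative output; assumes a throwaway debug Pod with dig available):

$ kubectl run dns-debug --rm -it --image=nicolaka/netshoot -- \
    dig +short SRV _grpc._tcp.multi-protocol-service.default.svc.cluster.local
0 100 9000 multi-protocol-service.default.svc.cluster.local.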
Pattern 4: ExternalName for External Dependencies

# Abstracting external dependencies
apiVersion: v1
kind: Service
metadata:
  name: payment-gateway
  namespace: production
spec:
  type: ExternalName
  externalName: api.stripe.com

# Application calls:  http://payment-gateway/
# Resolves to:        CNAME api.stripe.com

---
# Migration pattern: switch from external to internal
# Step 1: Start with ExternalName
apiVersion: v1
kind: Service
metadata:
  name: auth-service
spec:
  type: ExternalName
  externalName: auth.legacy-datacenter.company.com

---
# Step 2: When migrated, switch to ClusterIP (no app changes needed)
apiVersion: v1
kind: Service
metadata:
  name: auth-service
spec:
  type: ClusterIP
  selector:
    app: auth
  ports:
  - port: 80

ExternalName creates CNAME records, not A records. Some applications have issues with CNAME resolution. Also, you can't specify ports with ExternalName—it's purely DNS-level redirection. For more control, use a ClusterIP Service with manually managed Endpoints.
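A minimal sketch of that alternative: a selector-less Service plus a hand-managed Endpoints object (the external IP here is a placeholder):

apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  ports:
  - port: 5432
    targetPort: 5432
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-db        # must match the Service name
subsets:
- addresses:
  - ip: 203.0.113.40       # the external endpoint (example address)
  ports:
  - port: 5432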
As organizations scale, single-cluster boundaries often prove insufficient. Multi-cluster architectures require service discovery that spans cluster boundaries.
Why Multi-Cluster?
- Regional redundancy and lower latency for geographically distributed users
- Blast-radius isolation between environments or tenants
- Compliance and data-residency requirements
- Capacity limits of a single cluster
Challenge: Kubernetes Discovery Is Cluster-Scoped
Standard Kubernetes Services only work within their cluster:
- ClusterIPs are only routable on the cluster's own network
- The cluster DNS zone (cluster.local) is not shared between clusters
- Endpoints only ever track Pods running in the same cluster
Pattern 1: Service Mesh Federation
Service meshes like Istio support multi-cluster:
# Istio multi-cluster: Shared control plane
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
values:
global:
meshID: production-mesh
multiCluster:
clusterName: cluster-east
network: network-east
# Services automatically discoverable across clusters
# payment-service.production.svc.cluster.local works from any cluster
Istio handles:
- Cross-cluster endpoint discovery through a shared or federated control plane
- mTLS identity and encryption between clusters
- Load balancing and failover across cluster boundaries
Pattern 2: Kubernetes Multi-Cluster Services (MCS API)
Kubernetes sig-multicluster is standardizing cross-cluster discovery:
# Export a Service for multi-cluster discovery
apiVersion: multicluster.x-k8s.io/v1alpha1
kind: ServiceExport
metadata:
name: payment-service
namespace: production
# In other clusters, import is automatic via clusterset
# DNS: payment-service.production.svc.clusterset.local
# (routes to any cluster exporting this service)
MCS provides:
- ServiceExport / ServiceImport objects as a standard API for sharing Services
- A clusterset.local domain for cross-cluster services

Current status: Alpha, but gaining adoption. Check your cluster version.
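On the consuming side, the clusterset implementation creates a ServiceImport automatically. A quick check (a sketch; the API is still alpha and details vary by implementation):

$ kubectl get serviceimport payment-service -n production
$ kubectl run dns-debug --rm -it --image=busybox -- \
    nslookup payment-service.production.svc.clusterset.local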
Pattern 3: External Registry (Consul, etc.)
For hybrid environments (Kubernetes + VMs + other platforms):
# Consul registration sidecar in Kubernetes
apiVersion: v1
kind: Pod
metadata:
name: payment-pod
annotations:
consul.hashicorp.com/connect-inject: 'true'
consul.hashicorp.com/connect-service: 'payment-service'
spec:
containers:
- name: payment
image: payment:v2
Consul provides:
- A single registry spanning Kubernetes, VMs, and other platforms
- Its own health checking, independent of Kubernetes probes
- Multi-datacenter federation out of the box
| Approach | Complexity | Features | Best For |
|---|---|---|---|
| Manual (LoadBalancer/DNS) | Low | Basic cross-cluster | Simple multi-region |
| Istio Multi-Cluster | High | Full mesh features | Advanced traffic management |
| MCS API | Medium | Standard K8s API | K8s-native multi-cluster |
| Submariner | Medium | Network connectivity + discovery | On-prem/hybrid |
| External Registry (Consul) | Medium | Cross-platform | K8s + non-K8s hybrid |
Many "multi-cluster" needs can be solved with simpler approaches: LoadBalancer Services + external DNS, or API gateways that route between clusters. Full service mesh federation is powerful but operationally complex. Validate that you need the sophistication before adopting it.
Kubernetes service discovery issues can be subtle. Here's a systematic approach to troubleshooting and best practices for production deployments.
Common Issue 1: Service Not Resolving
# Step 1: Verify Service exists
$ kubectl get svc payment-service -n production
# Step 2: Check Endpoints (are Pods selected?)
$ kubectl get endpoints payment-service -n production
NAME ENDPOINTS AGE
payment-service 10.244.1.10:8080,10.244.2.15 5d
# If ENDPOINTS is empty:
# - Check label selector matches Pod labels
# - Verify Pods are Ready (passing readiness probes)
$ kubectl get pods -l app=payment -n production
$ kubectl describe pod <pod-name> -n production | grep -A5 "Conditions:"
# Step 3: Test DNS resolution from a Pod
$ kubectl run debug --rm -it --image=busybox -- nslookup payment-service.production.svc.cluster.local
# Step 4: Check CoreDNS is running
$ kubectl get pods -n kube-system -l k8s-app=kube-dns
$ kubectl logs -n kube-system -l k8s-app=kube-dns
Common Issue 2: Service Reachable but Slow/Unreliable
# Check for unhealthy Pods in Endpoints
$ kubectl get endpoints payment-service -o yaml
# Look for 'notReadyAddresses' - these are failing probes
# Check kube-proxy logs on the node
$ kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=100
# Verify iptables/IPVS rules are correct
# On a node:
$ iptables -t nat -L KUBE-SERVICES | grep payment
$ ipvsadm -Ln | grep <ClusterIP>
# Network policy blocking traffic?
$ kubectl get networkpolicies -n production
Common Issue 3: DNS Performance Problems
# Check CoreDNS performance
$ kubectl top pods -n kube-system -l k8s-app=kube-dns
# Look for high latency in CoreDNS metrics
$ kubectl port-forward -n kube-system svc/kube-dns 9153:9153
$ curl localhost:9153/metrics | grep coredns_dns_request_duration_seconds
# Check for failures
$ kubectl logs -n kube-system -l k8s-app=kube-dns | grep -i error
Best practice: tune ndots if external DNS calls are frequent, and use FQDNs with a trailing dot for external domains.

A well-annotated production Service pulls these practices together:

apiVersion: v1
kind: Service
metadata:
  name: api-service
  namespace: production
  labels:
    app: api
    version: v2
    team: platform
  annotations:
    # Documentation
    description: "Primary API service for customer-facing applications"
    team: "platform-team@company.com"
    # Topology awareness (K8s 1.21+)
    service.kubernetes.io/topology-mode: Auto
spec:
  type: ClusterIP
  selector:
    app: api
    version: v2
  ports:
  - name: http
    port: 80
    targetPort: 8080
    protocol: TCP
  - name: grpc
    port: 9000
    targetPort: 9000
    protocol: TCP
  - name: metrics
    port: 9090
    targetPort: 9090
    protocol: TCP
  # Session affinity for stateful-ish workloads
  sessionAffinity: None        # or ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800

Keep a debug pod template handy: kubectl run debug --rm -it --image=nicolaka/netshoot -- bash. Netshoot includes dig, nslookup, curl, and network debugging tools. Use it to test DNS resolution and service connectivity from within the cluster network.
We've comprehensively explored Kubernetes-native service discovery—from fundamental concepts to advanced multi-cluster patterns. Let's consolidate the essential insights:
- DNS-based discovery is built in: service.namespace.svc.cluster.local resolution with configurable TTL.
- Tune DNS deliberately: ndots, search domains, and NodeLocal DNSCache all impact performance.
- Headless Services give each Pod a stable identity (pod-0.service.namespace.svc.cluster.local) for stateful workloads.

Module Complete: Service Discovery Mechanisms
Across this module, you've traced service discovery from DNS fundamentals and external registries through to the Kubernetes-native mechanisms covered on this page.
You now have comprehensive knowledge to design and implement service discovery for systems of any scale, from simple applications to complex multi-cluster, multi-region architectures.
Congratulations! You've completed the Service Discovery Mechanisms module. You understand the full spectrum of discovery approaches—from simple DNS to sophisticated service meshes—and can make informed architectural decisions for your distributed systems. This knowledge forms a critical foundation for building reliable, scalable microservices architectures.