Kubernetes has fundamentally changed how we think about service discovery. Rather than bolting discovery onto applications through libraries or sidecars, Kubernetes provides service discovery as a platform primitive. When you deploy an application to Kubernetes, discovery is available out of the box—no additional infrastructure, no client libraries, no configuration beyond declaring a Service resource.
This platform-native approach has made Kubernetes the de facto standard for container orchestration, and understanding its discovery mechanisms is essential for any engineer building cloud-native applications. Kubernetes' discovery model combines DNS-based resolution with a real-time endpoints API, providing both simplicity for basic use cases and flexibility for advanced scenarios.
This page provides a comprehensive exploration of Kubernetes service discovery: how it works, the components involved, configuration options, and patterns for production environments.
By the end of this page, you will understand:

- how Kubernetes Services and Endpoints work,
- how CoreDNS provides DNS-based discovery,
- the difference between ClusterIP, NodePort, LoadBalancer, and headless Services,
- ExternalName Services for external service integration, and
- advanced patterns including EndpointSlices and topology-aware routing.
In Kubernetes, a Service is an abstraction that defines a logical set of Pods and a policy for accessing them. Services enable loose coupling between dependent components—a consumer doesn't need to know which specific Pods implement the service, only the Service's stable identity.
The Core Problem Services Solve:
Pods in Kubernetes are ephemeral. They come and go as deployments roll out, nodes fail, or scaling events occur. Each Pod gets a unique IP address, but that address exists only for the Pod's lifetime. If you hardcode a Pod's IP address, your application will break as soon as that Pod is replaced.
Services provide a stable abstraction over the dynamic set of Pods:
my-service.my-namespace.svc.cluster.local

How Services Track Pods:
Services use label selectors to identify which Pods belong to the service. When you create a Service with a selector, Kubernetes automatically creates and maintains an Endpoints object that lists the IP addresses and ports of all Pods matching the selector.
```yaml
# Deployment creates Pods with labels
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
      version: v2
  template:
    metadata:
      labels:
        app: order-service        # Service will select these
        version: v2               # Can be used for canary routing
        team: commerce
    spec:
      containers:
        - name: order-service
          image: order-service:2.3.1
          ports:
            - containerPort: 8080
              name: http
            - containerPort: 9090
              name: grpc
          readinessProbe:         # Critical: determines endpoint inclusion
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 10
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 20
---
# Service selects Pods by label
apiVersion: v1
kind: Service
metadata:
  name: order-service
  namespace: production
  labels:
    app: order-service
spec:
  type: ClusterIP                 # Default type
  selector:                       # Matches Pod labels
    app: order-service
    version: v2                   # Could target specific version
  ports:
    - name: http
      port: 80                    # Service port (what clients connect to)
      targetPort: 8080            # Container port (where traffic goes)
      protocol: TCP
    - name: grpc
      port: 9090
      targetPort: 9090
      protocol: TCP

# Kubernetes automatically creates Endpoints:
# kubectl get endpoints order-service -n production
# NAME            ENDPOINTS
# order-service   10.244.1.5:8080,10.244.2.8:8080,10.244.3.12:8080
```

A Pod is only added to the Endpoints list when its readiness probe passes. Without a readiness probe, Pods are added immediately upon starting—before they're actually ready to serve traffic. Always configure readiness probes for any Pod that will receive traffic through a Service.
Kubernetes supports several Service types, each designed for different networking scenarios. Understanding these types is essential for designing accessible applications.
ClusterIP is the default Service type. It exposes the Service on a cluster-internal IP address. The Service is only reachable from within the cluster.
Use Cases: internal service-to-service communication, backing databases and caches, and any component that should not be reachable from outside the cluster.
```yaml
apiVersion: v1
kind: Service
metadata:
  name: user-service
  namespace: production
spec:
  type: ClusterIP          # Default, can be omitted
  selector:
    app: user-service
  ports:
    - port: 8080
      targetPort: 8080

# Accessible within cluster only:
# - DNS: user-service.production.svc.cluster.local
# - Or just: user-service (from same namespace)
# - ClusterIP: e.g., 10.96.50.25
```

The table below compares all five Service types:

| Type | Internal Access | External Access | IP Allocation | Use Case |
|---|---|---|---|---|
| ClusterIP | Yes - ClusterIP + DNS | No | Virtual ClusterIP | Internal services |
| NodePort | Yes - ClusterIP + DNS | Yes - `<NodeIP>:<NodePort>` | ClusterIP + NodePort | Dev/testing, simple external |
| LoadBalancer | Yes - ClusterIP + DNS | Yes - External LB IP | ClusterIP + External IP | Production external access |
| Headless (None) | Yes - DNS returns Pod IPs | No | No ClusterIP | Client-side LB, StatefulSets |
| ExternalName | Yes - CNAME to external | N/A (external) | No IP - CNAME only | External service abstraction |
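Headless Services appear in the table above but not in the examples so far. Setting `clusterIP: None` skips the virtual IP entirely, and DNS returns the individual Pod IPs, which is what StatefulSets and client-side load balancers rely on. A minimal sketch (the `database-cluster` name matches the DNS examples later on this page; the labels are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: database-cluster
  namespace: production
spec:
  clusterIP: None          # Headless: no virtual IP is allocated
  selector:
    app: postgresql        # assumed Pod label; match your StatefulSet
  ports:
    - port: 5432
      name: postgres

# DNS for this Service returns every ready Pod IP, and StatefulSet Pods
# additionally get stable per-Pod records such as
# postgresql-0.database-cluster.production.svc.cluster.local
```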
For HTTP/HTTPS traffic, consider Ingress resources instead of LoadBalancer services. Ingress provides path-based routing, name-based virtual hosting, TLS termination, and more—typically backed by a single LoadBalancer rather than one per service.
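As a hedged sketch of that approach (hostnames, secret names, and the `ingressClassName` are illustrative and depend on which ingress controller you run):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: commerce-ingress
  namespace: production
spec:
  ingressClassName: nginx            # assumes an NGINX ingress controller is installed
  tls:
    - hosts:
        - shop.example.com
      secretName: shop-tls           # hypothetical TLS certificate secret
  rules:
    - host: shop.example.com
      http:
        paths:
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: order-service  # the ClusterIP Service defined earlier
                port:
                  number: 80
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 8080
```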
CoreDNS is the default DNS server for Kubernetes clusters (replacing kube-dns since Kubernetes 1.11). It provides DNS-based service discovery, allowing Pods to discover services using standard DNS queries.
How DNS Discovery Works:

With the default `ClusterFirst` DNS policy, each Pod's /etc/resolv.conf points at the cluster DNS service (CoreDNS). When an application resolves a Service name, CoreDNS, which watches the Kubernetes API, answers with the Service's ClusterIP, or with the individual Pod IPs for headless services.
DNS Naming Convention:
Kubernetes follows a standard naming convention for service DNS:
<service-name>.<namespace>.svc.<cluster-domain>
For example: order-service.production.svc.cluster.local
DNS Resolution Shortcuts:
- order-service (shortest form)
- order-service.other-namespace
- order-service.other-namespace.svc.cluster.local
```bash
# From a Pod in the 'production' namespace

# Same namespace - shortest form
$ nslookup order-service
Server:    10.96.0.10
Address:   10.96.0.10:53

Name:      order-service.production.svc.cluster.local
Address:   10.96.100.50

# Cross-namespace - specify namespace
$ nslookup redis.cache
Name:      redis.cache.svc.cluster.local
Address:   10.96.200.30

# Fully qualified domain name (FQDN)
$ nslookup order-service.production.svc.cluster.local
Name:      order-service.production.svc.cluster.local
Address:   10.96.100.50

# SRV records for port discovery
$ nslookup -type=SRV _http._tcp.order-service.production.svc.cluster.local
_http._tcp.order-service.production.svc.cluster.local  service = 0 100 80 order-service.production.svc.cluster.local

# Headless service returns all Pod IPs
$ nslookup database-cluster.production.svc.cluster.local
Name:      database-cluster.production.svc.cluster.local
Address:   10.244.1.5
Address:   10.244.2.8
Address:   10.244.3.12

# StatefulSet Pod-specific DNS
$ nslookup postgresql-0.database-cluster.production.svc.cluster.local
Name:      postgresql-0.database-cluster.production.svc.cluster.local
Address:   10.244.1.5
```

CoreDNS Configuration:
CoreDNS is configured via a ConfigMap that defines its behavior. The default configuration watches the Kubernetes API and responds to DNS queries for cluster services.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
            lameduck 5s
        }
        ready

        # Kubernetes plugin - handles cluster DNS
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30                      # DNS TTL for records
        }

        prometheus :9153                # Metrics endpoint

        # Forward external queries to upstream DNS
        forward . /etc/resolv.conf {
            max_concurrent 1000
        }

        cache 30                        # Cache responses for 30s
        loop
        reload
        loadbalance                     # Round-robin DNS responses
    }

    # Custom zone for internal services
    # internal.company.com:53 {
    #     file /etc/coredns/internal.company.com.zone
    # }
```

CoreDNS uses a default TTL of 30 seconds. This means Pod changes take up to 30 seconds to propagate to all clients' DNS caches. For latency-sensitive applications, consider using headless services with client-side load balancing, or a service mesh that provides real-time endpoint updates.
While DNS provides a simple discovery interface, the underlying mechanism is the Endpoints (and in newer clusters, EndpointSlices) API. Understanding these resources is essential for debugging, advanced integrations, and understanding how kube-proxy routes traffic.
Endpoints vs EndpointSlices:
```yaml
# Endpoints (auto-generated from Service selector)
# kubectl get endpoints order-service -n production -o yaml

apiVersion: v1
kind: Endpoints
metadata:
  name: order-service
  namespace: production
  labels:
    app: order-service
subsets:
  - addresses:                      # Ready pods
      - ip: 10.244.1.5
        nodeName: node-1
        targetRef:
          kind: Pod
          name: order-service-abc123
          namespace: production
      - ip: 10.244.2.8
        nodeName: node-2
        targetRef:
          kind: Pod
          name: order-service-def456
          namespace: production
      - ip: 10.244.3.12
        nodeName: node-3
        targetRef:
          kind: Pod
          name: order-service-ghi789
          namespace: production
    notReadyAddresses:              # Pods failing readiness probe
      - ip: 10.244.4.20
        nodeName: node-4
        targetRef:
          kind: Pod
          name: order-service-jkl012
          namespace: production
    ports:
      - name: http
        port: 8080
        protocol: TCP
      - name: grpc
        port: 9090
        protocol: TCP
---
# EndpointSlice (newer, more scalable)
# kubectl get endpointslice -l kubernetes.io/service-name=order-service -n production

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: order-service-abc12
  namespace: production
  labels:
    kubernetes.io/service-name: order-service
  ownerReferences:
    - apiVersion: v1
      kind: Service
      name: order-service
addressType: IPv4
endpoints:
  - addresses:
      - "10.244.1.5"
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: node-1
    targetRef:
      kind: Pod
      name: order-service-abc123
      namespace: production
  - addresses:
      - "10.244.2.8"
    conditions:
      ready: true
      serving: true
      terminating: false
    nodeName: node-2
    targetRef:
      kind: Pod
      name: order-service-def456
      namespace: production
ports:
  - name: http
    port: 8080
    protocol: TCP
```

Why EndpointSlices?
With large services (thousands of endpoints), the Endpoints API becomes problematic: every endpoint for the Service lives in a single object, so any Pod change forces a rewrite of the entire object, and that increasingly large object must be re-sent to every node watching it, putting substantial load on the API server and the cluster network.
EndpointSlices address these problems by splitting a Service's endpoints across multiple smaller objects (at most 100 endpoints per slice by default), so a change touches only the affected slice, and by carrying richer per-endpoint information such as serving/terminating conditions and topology hints.
| Aspect | Endpoints | EndpointSlices |
|---|---|---|
| Default in Kubernetes | < 1.21 | ≥ 1.21 |
| Endpoints per object | All endpoints in one object | Max 100 per slice (default) |
| Update granularity | Full object rewrite | Per-slice updates |
| Topology support | No | Yes (topology hints) |
| Condition tracking | ready/notReady only | ready, serving, terminating |
| Recommended for | Small services, legacy | Production, large services |
Service meshes and load balancers watch the Endpoints or EndpointSlices API rather than relying solely on DNS. This provides real-time updates when Pods become ready or fail, avoiding DNS TTL delays. If you're building custom integrations, use EndpointSlices for better scalability.
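For a quick look at this in action, you can stream EndpointSlice changes from the command line; a small sketch using the order-service examples above (the `kubernetes.io/service-name` label is set automatically by the EndpointSlice controller):

```bash
# Stream EndpointSlice updates as Pods become ready, fail, or terminate
$ kubectl get endpointslices -n production \
    -l kubernetes.io/service-name=order-service --watch

# Inspect per-endpoint conditions (ready / serving / terminating)
$ kubectl get endpointslices -n production \
    -l kubernetes.io/service-name=order-service -o yaml
```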
DNS tells clients which IP to connect to, but how does traffic actually reach the right Pod? kube-proxy is the component responsible for implementing Service-level load balancing.
kube-proxy Modes:
kube-proxy can operate in different modes, each with different performance characteristics:
iptables mode (default in most clusters) uses Linux iptables rules to intercept and redirect traffic to backend Pods.
How it works:
```bash
# View iptables rules for a service (simplified)
$ iptables -t nat -L KUBE-SERVICES -n

Chain KUBE-SERVICES
target                  prot opt source      destination
KUBE-SVC-ORDER-SERVICE  tcp  --  0.0.0.0/0   10.96.100.50   /* order-service */

# Service chain randomly selects a backend
$ iptables -t nat -L KUBE-SVC-ORDER-SERVICE -n

Chain KUBE-SVC-ORDER-SERVICE
target         prot opt source      destination
KUBE-SEP-AAAA  --   --  0.0.0.0/0   0.0.0.0/0    probability 0.333
KUBE-SEP-BBBB  --   --  0.0.0.0/0   0.0.0.0/0    probability 0.500
KUBE-SEP-CCCC  --   --  0.0.0.0/0   0.0.0.0/0    # last one gets remainder

# Each SEP chain DNATs to a specific Pod
$ iptables -t nat -L KUBE-SEP-AAAA -n

Chain KUBE-SEP-AAAA
target  prot opt source      destination
DNAT    tcp  --  0.0.0.0/0   0.0.0.0/0    to:10.244.1.5:8080
```

If you have more than 1000 services, consider switching from iptables to IPVS mode or using Cilium with eBPF. iptables rule evaluation is linear (O(n)), which can add significant latency with many services. IPVS uses hash tables for O(1) lookups.
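If kube-proxy is driven by a configuration file (as in kubeadm-based clusters, where it lives in the kube-proxy ConfigMap), switching modes is a small change; a hedged sketch of the relevant KubeProxyConfiguration fields:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"            # an empty value falls back to iptables mode
ipvs:
  scheduler: "rr"       # round-robin; other IPVS schedulers (lc, sh, ...) exist
# Nodes need the IPVS kernel modules loaded, and the kube-proxy Pods must be
# restarted after the change for the new mode to take effect.
```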
Beyond basic service discovery, Kubernetes supports several advanced patterns for sophisticated routing requirements.
Topology-Aware Routing: keeps traffic within the client's zone when capacity allows, reducing cross-zone hops. It is enabled with the `service.kubernetes.io/topology-aware-hints: Auto` annotation:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: order-service
  namespace: production
  annotations:
    # Enable topology-aware routing
    service.kubernetes.io/topology-aware-hints: "Auto"
spec:
  selector:
    app: order-service
  ports:
    - port: 8080

# With topology hints enabled:
# - Traffic from zone-a prefers pods in zone-a
# - Falls back to other zones if zone-a has insufficient capacity
# - Reduces cross-zone network costs in cloud environments
```

Internal Traffic Policy: `internalTrafficPolicy` can be set to Cluster (default, any node) or Local (only same node):
```yaml
apiVersion: v1
kind: Service
metadata:
  name: node-local-cache
spec:
  selector:
    app: cache-agent
  ports:
    - port: 6379
  # Only route to pods on the same node as the client
  internalTrafficPolicy: Local

# Use case: A DaemonSet runs a cache on every node
# Applications connect to the local cache, not a random node's cache
# Reduces network hops and latency
```
Services Without Selectors: omit the selector and create the Endpoints object manually to route in-cluster traffic to IPs outside the cluster:

```yaml
# Service without selector
apiVersion: v1
kind: Service
metadata:
  name: external-database
  namespace: production
spec:
  # No selector! Endpoints won't be auto-created
  ports:
    - port: 5432
      targetPort: 5432
---
# Manually created Endpoints
apiVersion: v1
kind: Endpoints
metadata:
  name: external-database   # Must match Service name
  namespace: production
subsets:
  - addresses:
      - ip: 10.100.50.10    # External DB IP
      - ip: 10.100.50.11    # Replica IP
    ports:
      - port: 5432

# Applications connect to external-database:5432
# Kubernetes routes to the manually specified IPs
# Useful for:
# - External databases (RDS, Cloud SQL)
# - Services in other clusters
# - Gradual migration from external to internal
```

When using custom Endpoints, you are responsible for keeping them current. If the external service's IP changes and you don't update the Endpoints, discovery will fail. Consider using ExternalName services for external DNS names, or automation to sync external IPs to Endpoints.
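An ExternalName Service, mentioned above as the alternative for external DNS names, publishes a CNAME record instead of proxying traffic; a minimal sketch (the external hostname is illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: payments-api
  namespace: production
spec:
  type: ExternalName
  externalName: payments.vendor.example.com   # external DNS name to alias

# Pods resolving payments-api.production.svc.cluster.local receive a CNAME
# pointing at payments.vendor.example.com. No ports, load balancing, or
# health checking are involved; resolution happens purely in DNS.
```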
Service discovery issues are among the most common problems in Kubernetes. Here's a systematic approach to debugging discovery failures.
```bash
# 1. Verify Service exists and has the right selector
$ kubectl get svc order-service -n production -o yaml
# Check: selector matches Pod labels?
# Check: ports configured correctly?

# 2. Check Endpoints - are backend Pods registered?
$ kubectl get endpoints order-service -n production
NAME            ENDPOINTS
order-service   10.244.1.5:8080,10.244.2.8:8080
# If empty: Pods don't match selector, or failing readiness probes

# 3. Check Pod labels match Service selector
$ kubectl get pods -n production -l app=order-service
NAME                       READY   STATUS    RESTARTS
order-service-abc123-xyz   1/1     Running   0
order-service-def456-xyz   1/1     Running   0
# No pods? Labels don't match selector

# 4. Check Pod readiness
$ kubectl get pods -n production -l app=order-service -o wide
# READY column should be 1/1
# If 0/1: readiness probe is failing

$ kubectl describe pod order-service-abc123-xyz -n production
# Look for: Readiness probe failed messages

# 5. Test DNS resolution from a Pod
$ kubectl run debug --rm -it --image=busybox -- /bin/sh
/ # nslookup order-service.production.svc.cluster.local
# Should return ClusterIP

/ # nslookup order-service
# Works if you're in the same namespace

# 6. Test actual connectivity
/ # wget -qO- http://order-service:8080/health
# Or: curl if using a different image

# 7. Check CoreDNS logs
$ kubectl logs -n kube-system -l k8s-app=kube-dns
# Look for: errors, failed queries, upstream issues

# 8. Check kube-proxy logs
$ kubectl logs -n kube-system -l k8s-app=kube-proxy
# Look for: iptables errors, sync failures

# 9. Verify iptables rules exist (on a node)
$ iptables -t nat -L KUBE-SERVICES | grep order-service
# Should show a chain for the service
```

| Symptom | Likely Cause | Solution |
|---|---|---|
| Empty Endpoints | No Pods match selector | Verify Pod labels match Service selector |
| Empty Endpoints | Pods failing readiness | Fix readiness probe or application issues |
| DNS resolution fails | CoreDNS not running | Check CoreDNS pods in kube-system |
| DNS resolution fails | Wrong DNS search domain | Check Pod's /etc/resolv.conf |
| Connection refused | No Pods in Endpoints | Check Pod status and readiness |
| Connection timeout | Network policy blocking | Review NetworkPolicy resources |
| Intermittent failures | Some Pods unhealthy | Check individual Pod health |
| Slow performance | Cross-zone traffic | Enable topology-aware routing |
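For the NetworkPolicy and per-Pod health rows above, a couple of quick checks (namespace and labels follow the earlier examples):

```bash
# Is a NetworkPolicy selecting the backend or client Pods and blocking traffic?
$ kubectl get networkpolicy -n production
$ kubectl describe networkpolicy -n production

# Are individual Pods healthy and passing their readiness probes?
$ kubectl get pods -n production -l app=order-service
$ kubectl describe pod <pod-name> -n production   # check Events for probe failures
```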
Keep a debugging container image handy (e.g., `nicolaka/netshoot`) that includes DNS tools, curl, wget, and network utilities. Spawn temporary debug Pods to test discovery from within the cluster: `kubectl run debug --rm -it --image=nicolaka/netshoot -- /bin/bash`
We've comprehensively explored Kubernetes service discovery, from fundamental concepts to advanced patterns. Let's consolidate the key insights:

- Services give an ephemeral, label-selected set of Pods a stable ClusterIP and DNS name.
- Readiness probes determine whether a Pod appears in the Endpoints/EndpointSlices behind a Service.
- CoreDNS answers <service>.<namespace>.svc.<cluster-domain> queries; kube-proxy implements the actual load balancing via iptables or IPVS.
- Choose the Service type deliberately: ClusterIP for internal traffic, NodePort/LoadBalancer (or Ingress) for external access, headless for client-side load balancing, ExternalName for external DNS names.
- For large services and real-time endpoint updates, prefer the EndpointSlices API over legacy Endpoints.
Module Complete!
You've now completed the Service Discovery module. You understand why service discovery is needed, DNS-based and registry-based approaches, client-side vs server-side discovery patterns, and Kubernetes' platform-native implementation. This knowledge forms the foundation for building resilient, scalable microservices architectures.
What's Next:
With discovery mastered, the next module explores Circuit Breaker Pattern—how to prevent cascade failures when discovered services become degraded or unavailable. Understanding circuit breakers is essential for building truly resilient distributed systems.
Congratulations! You now have comprehensive knowledge of service discovery—from fundamental concepts to Kubernetes implementation details. You understand why discovery is needed, how DNS and registries work, client-side vs server-side patterns, and how Kubernetes provides discovery as a platform primitive. This knowledge is essential for designing and operating resilient microservices architectures.