Consider how Pods communicate in Kubernetes. You've deployed a web application that needs to connect to a database. The database runs as a Service called postgresql in the database namespace. How does your application Pod find it?
You don't hardcode IP addresses—they're ephemeral. You don't use environment variables for every service—that doesn't scale. Instead, you use DNS.
```js
// Your application code
const connectionString = 'postgresql://postgresql.database.svc.cluster.local:5432/mydb';
```
The hostname postgresql.database.svc.cluster.local is automatically resolved to the Service's ClusterIP by Kubernetes' internal DNS system. This is service discovery through DNS—and understanding it deeply is essential for building and debugging distributed applications.
By the end of this page, you will understand how CoreDNS works, the structure of Kubernetes DNS records, Pod and Service DNS configurations, and how to troubleshoot DNS issues effectively.
CoreDNS is the default DNS server in Kubernetes (since v1.13, replacing kube-dns). It runs as a Deployment in the kube-system namespace and is exposed via a ClusterIP Service.
```
Pod makes DNS query: "postgresql.database.svc.cluster.local"
        │
        ▼
Pod's /etc/resolv.conf points to CoreDNS Service IP (e.g., 10.96.0.10)
        │
        ▼
CoreDNS receives query
        │
        ▼
Kubernetes plugin queries API server for Service
        │
        ▼
CoreDNS returns Service ClusterIP (e.g., 10.96.50.100)
        │
        ▼
Pod connects to ClusterIP
```
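You can reproduce this flow by hand by querying CoreDNS directly from a throwaway Pod. A minimal sketch, assuming the default kube-dns ClusterIP (10.96.0.10) and the example Service from the introduction:

```bash
# Ask CoreDNS directly for the example Service's A record.
# Substitute your own service/namespace if the example isn't deployed.
kubectl run -it --rm dns-probe --image=nicolaka/netshoot --restart=Never -- \
  dig @10.96.0.10 postgresql.database.svc.cluster.local +short
# Expected output: the Service's ClusterIP, e.g. 10.96.50.100
```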
```bash
# View CoreDNS Deployment
kubectl get deployment coredns -n kube-system
# NAME      READY   UP-TO-DATE   AVAILABLE
# coredns   2/2     2            2

# View CoreDNS Pods
kubectl get pods -n kube-system -l k8s-app=kube-dns
# NAME                       READY   STATUS    RESTARTS
# coredns-5644d7b6d9-abc12   1/1     Running   0
# coredns-5644d7b6d9-xyz34   1/1     Running   0

# View CoreDNS Service
kubectl get svc kube-dns -n kube-system
# NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)
# kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP

# The ClusterIP (10.96.0.10) is configured in every Pod's /etc/resolv.conf
kubectl exec -it any-pod -- cat /etc/resolv.conf
# nameserver 10.96.0.10
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5
```

CoreDNS is configured via a ConfigMap containing the Corefile—a declarative configuration that defines how DNS queries are handled.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors                # Log errors
        health {              # Health check endpoint on :8080/health
            lameduck 5s
        }
        ready                 # Readiness probe on :8181/ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {   # Watch Kubernetes Services and Pods
            pods insecure     # Enable Pod DNS records (A records)
            fallthrough in-addr.arpa ip6.arpa
            ttl 30            # TTL for DNS responses
        }
        prometheus :9153      # Metrics endpoint for Prometheus
        forward . /etc/resolv.conf {   # Forward external queries upstream
            max_concurrent 1000
        }
        cache 30              # Cache DNS responses for 30s
        loop                  # Detect and prevent DNS loops
        reload                # Auto-reload on config changes
        loadbalance           # Round-robin load balance responses
    }
```

| Plugin | Function | Configuration Impact |
|---|---|---|
| kubernetes | Serves cluster DNS records | Essential—provides service discovery |
| forward | Forwards non-cluster queries | External DNS resolution |
| cache | Caches responses | Reduces latency, API load |
| loop | Detects resolution loops | Prevents infinite loops |
| health | Liveness probe endpoint | Controller health monitoring |
| ready | Readiness probe endpoint | Traffic routing decisions |
| prometheus | Metrics endpoint | Observability (query rates, latency) |
| errors | Logs errors to stdout | Debugging via kubectl logs |
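You can watch the health, ready, and prometheus plugins respond on their endpoints. A hedged sketch, assuming the default ports from the Corefile above (8080, 8181, 9153) and that Pod IPs are reachable from inside the cluster:

```bash
# Grab one CoreDNS Pod IP
POD_IP=$(kubectl get pods -n kube-system -l k8s-app=kube-dns \
  -o jsonpath='{.items[0].status.podIP}')

# Probe the plugin endpoints from a throwaway Pod
# ($POD_IP is expanded by the local shell before the Pod starts)
kubectl run -it --rm probe --image=busybox:1.35 --restart=Never -- sh -c "
  wget -qO- http://$POD_IP:8080/health && echo
  wget -qO- http://$POD_IP:8181/ready && echo
  wget -qO- http://$POD_IP:9153/metrics | head -5
"
```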
Kubernetes creates specific DNS records for Services. Understanding the naming conventions and record types is essential for both development and debugging.
The full DNS name for a Service follows this pattern:
`<service-name>.<namespace>.svc.<cluster-domain>`

For example: `postgresql.database.svc.cluster.local`
| Component | Value | Description |
|---|---|---|
| service-name | postgresql | The Service's metadata.name |
| namespace | database | The namespace containing the Service |
| svc | svc | Literal string indicating a Service |
| cluster-domain | cluster.local | Default cluster domain (configurable) |
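A quick way to verify that each form of the name resolves to the same ClusterIP, sketched with a disposable Pod and the example Service above (run it in the database namespace so the short name works):

```bash
kubectl run -n database -it --rm dns-check --image=busybox:1.35 --restart=Never -- sh -c '
  nslookup postgresql                              # short name (search domains)
  nslookup postgresql.database                     # name + namespace
  nslookup postgresql.database.svc.cluster.local   # full FQDN
'
# All three lookups should return the same ClusterIP
```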
```yaml
# Example Service
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: production
spec:
  type: ClusterIP
  clusterIP: 10.96.100.50
  selector:
    app: my-app
  ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: grpc
    port: 9090
    targetPort: 9090

# DNS Records created:
#
# A Record (returns ClusterIP):
#   my-app.production.svc.cluster.local → 10.96.100.50
#
# SRV Records (for service discovery with ports):
#   _http._tcp.my-app.production.svc.cluster.local → 0 100 80 my-app.production.svc.cluster.local
#   _grpc._tcp.my-app.production.svc.cluster.local → 0 100 9090 my-app.production.svc.cluster.local
#
# Short names (via search domains):
#   From within the 'production' namespace:
#     my-app → 10.96.100.50 (uses search domain)
#   From other namespaces:
#     my-app.production → 10.96.100.50
```

Headless Services (clusterIP: None) create different DNS records—they return Pod IPs directly instead of a ClusterIP:
```yaml
# Headless Service
apiVersion: v1
kind: Service
metadata:
  name: mongodb
  namespace: database
spec:
  clusterIP: None   # Makes it headless
  selector:
    app: mongodb
  ports:
  - port: 27017

# With StatefulSet Pods: mongodb-0, mongodb-1, mongodb-2
#
# DNS Records created:
#
# A Record (returns ALL Pod IPs):
#   mongodb.database.svc.cluster.local → 10.244.1.42, 10.244.2.55, 10.244.3.18
#
# Individual Pod A Records (StatefulSet only):
#   mongodb-0.mongodb.database.svc.cluster.local → 10.244.1.42
#   mongodb-1.mongodb.database.svc.cluster.local → 10.244.2.55
#   mongodb-2.mongodb.database.svc.cluster.local → 10.244.3.18
#
# This enables direct Pod addressing for StatefulSets!
```

Use headless Services when you need:

- **StatefulSet Pod identities**: Each replica gets a stable DNS name
- **Client-side load balancing**: gRPC clients that manage connections themselves
- **Service discovery enumeration**: List all backend IPs programmatically (see the sketch below)
- **Database clustering**: Primary/replica topology awareness
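For the enumeration case, a short sketch, assuming the mongodb headless Service above is deployed:

```bash
# dig +short against a headless Service returns every ready Pod IP,
# not a single ClusterIP
kubectl run -it --rm enum --image=nicolaka/netshoot --restart=Never -- \
  dig +short mongodb.database.svc.cluster.local
# 10.244.1.42
# 10.244.2.55
# 10.244.3.18
```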
Every Pod in Kubernetes is configured with DNS settings via /etc/resolv.conf. By default, this configuration is injected by the kubelet based on the Pod's dnsPolicy.
```
# Default /etc/resolv.conf in a Pod
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

# Explained:
#
# nameserver 10.96.0.10
#   └── CoreDNS ClusterIP; all DNS queries go here
#
# search default.svc.cluster.local svc.cluster.local cluster.local
#   └── Search domains for short names. If you query "my-svc":
#       1. Try my-svc.default.svc.cluster.local
#       2. Try my-svc.svc.cluster.local
#       3. Try my-svc.cluster.local
#       4. Finally try my-svc as-is (absolute)
#
# options ndots:5
#   └── If the query has < 5 dots, treat it as relative and use search domains
#       If the query has >= 5 dots, treat it as absolute (no search)
#
# Example with "postgresql.database":
#   Has 1 dot (< 5), so search domains are applied in order:
#     1. postgresql.database.default.svc.cluster.local ❌
#     2. postgresql.database.svc.cluster.local ✅ resolves if the 'postgresql'
#        Service exists in 'database' (our opening example); otherwise:
#     3. postgresql.database.cluster.local ❌
#     4. postgresql.database ← finally tried as-is
#        (might work if external DNS resolves it)
```

Kubernetes supports several dnsPolicy options that control how Pod DNS is configured:
| Policy | Behavior | Use Case |
|---|---|---|
| ClusterFirst (default) | Query CoreDNS first; forward external to upstream | Most workloads |
| ClusterFirstWithHostNet | Same as ClusterFirst, but for hostNetwork Pods | Ingress controllers, CNI plugins |
| Default | Use node's DNS settings (/etc/resolv.conf) | When cluster DNS isn't needed |
| None | Ignore all defaults; use dnsConfig only | Custom DNS configuration |
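The hostNetwork case trips people up: a hostNetwork Pod left on the default policy falls back to the node's resolv.conf and loses cluster DNS. A minimal sketch (Pod name and image are illustrative):

```bash
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: hostnet-dns-demo
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet   # without this, node DNS is used
  containers:
  - name: app
    image: busybox:1.35
    command: ["sleep", "3600"]
EOF

# Verify cluster DNS still works from the host-networked Pod
kubectl exec hostnet-dns-demo -- nslookup kubernetes.default.svc.cluster.local
```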
```yaml
# Custom DNS configuration for specific requirements
apiVersion: v1
kind: Pod
metadata:
  name: custom-dns-pod
spec:
  dnsPolicy: None        # Don't use defaults; configure manually
  dnsConfig:
    nameservers:
    - 10.96.0.10         # CoreDNS
    - 8.8.8.8            # Google DNS as backup
    searches:
    - production.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    - example.com        # Add custom search domain
    options:
    - name: ndots
      value: "2"         # Reduce ndots for faster external lookups
    - name: timeout
      value: "3"         # 3 second timeout
    - name: attempts
      value: "2"         # Retry 2 times
  containers:
  - name: app
    image: myapp:latest
---
# Common pattern: reduce ndots for applications that make many external calls
apiVersion: v1
kind: Pod
metadata:
  name: external-api-client
spec:
  dnsPolicy: ClusterFirst   # Use cluster DNS
  dnsConfig:
    options:
    - name: ndots
      value: "2"            # Treat names with 2+ dots as absolute
      # Now "api.stripe.com" (2 dots) won't go through search domains
      # This significantly reduces DNS queries for external domains
  containers:
  - name: app
    image: myapp:latest
```

With the default ndots:5, resolving api.stripe.com (2 dots < 5) walks the search list, so three lookups fail before the fourth succeeds:

1. api.stripe.com.default.svc.cluster.local → NXDOMAIN
2. api.stripe.com.svc.cluster.local → NXDOMAIN
3. api.stripe.com.cluster.local → NXDOMAIN
4. api.stripe.com → resolves
For applications making many external API calls, reducing ndots to 2-3 can significantly improve latency and reduce DNS load.
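Beyond tuning ndots, applications can skip the search list entirely by using a fully qualified name with a trailing dot. A quick comparison from a disposable Pod:

```bash
# The trailing dot marks the name as absolute: no search-domain
# expansion, so resolution takes one lookup instead of four
kubectl run -it --rm dot-test --image=busybox:1.35 --restart=Never -- \
  nslookup api.stripe.com.
# Compare with 'nslookup api.stripe.com' (no dot), which walks
# the search domains first under ndots:5
```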
In addition to Service DNS records, Kubernetes can create DNS records for individual Pods. This is essential for StatefulSets and certain advanced use cases.
By default, Pods receive A records based on their IP address:
```yaml
# Standard Pod DNS (IP-based)
# Pod IP:   10.244.1.42
# DNS Name: 10-244-1-42.default.pod.cluster.local
# Note: Dots in the IP are replaced with dashes

# This is enabled by the CoreDNS config:
#   kubernetes cluster.local in-addr.arpa ip6.arpa {
#       pods insecure   ← This enables Pod A records
#   }
---
# Pod with explicit hostname and subdomain
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
  namespace: production
spec:
  hostname: my-named-pod     # Custom hostname
  subdomain: my-subdomain    # Must match a headless Service name
  containers:
  - name: app
    image: myapp:latest

# DNS Record created (if headless Service "my-subdomain" exists):
#   my-named-pod.my-subdomain.production.svc.cluster.local → Pod IP
---
# Headless Service to enable the subdomain
apiVersion: v1
kind: Service
metadata:
  name: my-subdomain   # Must match Pod's subdomain
  namespace: production
spec:
  clusterIP: None      # Headless
  selector:
    app: my-pods
  ports:
  - port: 8080
```

StatefulSets are the primary use case for Pod DNS records. Each Pod in a StatefulSet gets a stable, predictable DNS name:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
  namespace: database
spec:
  clusterIP: None   # Headless - required for StatefulSet DNS
  selector:
    app: mysql
  ports:
  - port: 3306
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: database
spec:
  serviceName: mysql-headless   # Links to the headless Service
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306

# DNS Records created:
#
# Service A Record (returns all Pod IPs):
#   mysql-headless.database.svc.cluster.local → 10.244.1.42, 10.244.2.55, 10.244.3.18
#
# Individual Pod A Records:
#   mysql-0.mysql-headless.database.svc.cluster.local → 10.244.1.42
#   mysql-1.mysql-headless.database.svc.cluster.local → 10.244.2.55
#   mysql-2.mysql-headless.database.svc.cluster.local → 10.244.3.18
#
# These names are STABLE - even if Pods are deleted and recreated,
# mysql-0, mysql-1, mysql-2 will always resolve to the correct Pod
```

StatefulSet Pod DNS names remain stable across Pod restarts. If mysql-0 is deleted and rescheduled to a different node with a new IP, mysql-0.mysql-headless.database.svc.cluster.local will resolve to the new IP. This enables applications to use hostname-based configuration for database clusters.
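You can verify the stability claim directly. A hedged sketch, assuming the example StatefulSet above is running:

```bash
# Resolve an individual Pod by its stable DNS name
kubectl run -it --rm dig-pod --image=nicolaka/netshoot --restart=Never -- \
  dig +short mysql-0.mysql-headless.database.svc.cluster.local

# Delete the Pod, wait for the StatefulSet to recreate it,
# and resolve again: same name, possibly a new IP
kubectl delete pod mysql-0 -n database
kubectl wait --for=condition=Ready pod/mysql-0 -n database --timeout=120s
kubectl run -it --rm dig-pod2 --image=nicolaka/netshoot --restart=Never -- \
  dig +short mysql-0.mysql-headless.database.svc.cluster.local
```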
Pods need to resolve external domain names (api.stripe.com, github.com) in addition to cluster-internal names. CoreDNS handles this through forwarding.
By default, CoreDNS forwards non-cluster queries to the nameservers configured on the node (from /etc/resolv.conf):
```
# Default CoreDNS forwarding configuration
.:53 {
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        # Handles cluster.local queries
    }
    forward . /etc/resolv.conf {
        # Forward everything else to upstream DNS
        max_concurrent 1000
        policy sequential    # Try nameservers in order
    }
}

# Custom upstream DNS configuration
.:53 {
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
        ttl 30
    }
    # Forward to specific upstream DNS servers
    forward . 8.8.8.8 8.8.4.4 {
        policy round_robin   # Load balance between servers
        health_check 5s      # Check upstream health every 5s
    }
    cache 30
}
```

Organizations often need to resolve internal corporate domains differently from public DNS. CoreDNS supports stub domains for this:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30
        }
        cache 30
        reload
        loadbalance
        # Default upstream for most queries
        forward . 8.8.8.8 8.8.4.4
    }
    # Stub domain: resolve *.corp.example.com via corporate DNS
    corp.example.com:53 {
        errors
        cache 30
        forward . 10.0.0.53 10.0.0.54   # Corporate DNS servers
    }
    # Another stub domain: resolve *.internal.acme.com
    internal.acme.com:53 {
        errors
        cache 30
        forward . 192.168.1.53
    }
```

For services outside the cluster (like managed databases), use ExternalName Services to provide a cluster-internal DNS alias:
```yaml
# ExternalName Service - DNS alias for an external service
apiVersion: v1
kind: Service
metadata:
  name: production-database
  namespace: backend
spec:
  type: ExternalName
  externalName: prod-db.us-east-1.rds.amazonaws.com

# DNS behavior:
#   production-database.backend.svc.cluster.local
#     → CNAME → prod-db.us-east-1.rds.amazonaws.com
#     → A → (RDS IP addresses)

# Application configuration:
#   DATABASE_HOST=production-database   # Uses cluster DNS
#   If the database moves, only update the ExternalName Service
---
# ExternalName for API integrations
apiVersion: v1
kind: Service
metadata:
  name: payment-api
  namespace: integrations
spec:
  type: ExternalName
  externalName: api.stripe.com

# Now Pods can use: http://payment-api.integrations.svc.cluster.local/v1/charges
# This provides flexibility to mock or redirect in different environments
```

ExternalName Services:

- Don't provide a ClusterIP or load balancing
- Return a CNAME record (requires an additional DNS lookup)
- Don't work with IP addresses (must be a hostname)
- Don't support port translation (use the external service's actual port)
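To see the CNAME behavior for yourself, resolve the alias from inside the cluster. A sketch using the example Service above (the RDS hostname is the illustrative value from the manifest):

```bash
kubectl run -it --rm cname-check --image=nicolaka/netshoot --restart=Never -- \
  dig production-database.backend.svc.cluster.local
# ;; ANSWER SECTION:
# production-database.backend.svc.cluster.local. 30 IN CNAME prod-db.us-east-1.rds.amazonaws.com.
# prod-db.us-east-1.rds.amazonaws.com.           ... IN A     <RDS IP>
```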
DNS is on the critical path for every network request. Performance issues affect every application in the cluster. Here's how to optimize and scale CoreDNS.
For large clusters, the default 2 CoreDNS replicas may not be sufficient:
```bash
# Scale CoreDNS replicas manually
kubectl scale deployment coredns -n kube-system --replicas=3
```

Or deploy the Cluster Proportional Autoscaler to scale CoreDNS automatically with cluster size:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-autoscaler
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: dns-autoscaler
  template:
    metadata:
      labels:
        k8s-app: dns-autoscaler
    spec:
      containers:
      - name: autoscaler
        image: registry.k8s.io/cpa/cluster-proportional-autoscaler:1.8.5
        command:
        - /cluster-proportional-autoscaler
        - --namespace=kube-system
        - --configmap=dns-autoscaler
        - --target=deployment/coredns
        # 1 replica per 256 cores OR per 16 nodes, minimum 2, maximum 10
        - --default-params={"linear":{"coresPerReplica":256,"nodesPerReplica":16,"min":2,"max":10}}
```

Also set explicit resource requests and limits on the CoreDNS Deployment so it isn't starved or OOM-killed under load:

```bash
kubectl patch deployment coredns -n kube-system -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [{
          "name": "coredns",
          "resources": {
            "requests": {"cpu": "100m", "memory": "128Mi"},
            "limits": {"cpu": "200m", "memory": "256Mi"}
          }
        }]
      }
    }
  }
}'
```

NodeLocal DNSCache runs a DNS caching agent on each node, dramatically reducing latency and CoreDNS load:
```bash
# NodeLocal DNSCache runs as a DaemonSet on each node
# It listens on a link-local IP (169.254.20.10) and caches DNS responses

# Benefits:
# - Eliminates the cluster network hop for cached queries
# - Reduces CoreDNS load by 50-90% in typical workloads
# - Lower latency for DNS resolution
# - Survives CoreDNS Pod/network issues (cached responses)

# Installation (GKE, EKS, and AKS have specific guides)
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml

# After installation, Pods automatically use 169.254.20.10
# NodeLocal DNS forwards cache misses to CoreDNS

# View the NodeLocal DNS DaemonSet
kubectl get ds node-local-dns -n kube-system
kubectl get pods -n kube-system -l k8s-app=node-local-dns
```

| Technique | Impact | Implementation Effort |
|---|---|---|
| Increase CoreDNS replicas | Linear capacity increase | Low (kubectl scale) |
| NodeLocal DNSCache | 50-90% load reduction | Medium (DaemonSet deploy) |
| Reduce ndots value | Fewer queries per resolution | Low (Pod spec change) |
| Use FQDN with trailing dot | Skip search domains | Low (app config change) |
| Increase cache TTL | Fewer queries to upstream | Low (Corefile edit) |
| Autoscaling CoreDNS | Dynamic capacity | Medium (deploy autoscaler) |
DNS issues are among the most common problems in Kubernetes. A systematic approach is essential for efficient debugging.
```bash
# Step 1: Verify CoreDNS is running and healthy
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl get deployment coredns -n kube-system

# Check CoreDNS logs for errors
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100

# Step 2: Check the CoreDNS Service and Endpoints
kubectl get svc kube-dns -n kube-system
kubectl get endpoints kube-dns -n kube-system
# Should show CoreDNS Pod IPs

# Step 3: Test DNS from a debug Pod
kubectl run -it --rm dnstest --image=busybox:1.35 -- sh
# Inside the Pod:
cat /etc/resolv.conf    # Check DNS configuration
nslookup kubernetes     # Test internal resolution
nslookup google.com     # Test external resolution
nslookup my-service.my-namespace.svc.cluster.local   # Test full FQDN

# Step 4: More detailed DNS testing
kubectl run -it --rm dnstest --image=busybox:1.35 -- sh
# Inside:
nslookup -debug kubernetes   # Verbose output
nslookup -type=SRV _http._tcp.my-service.production.svc.cluster.local

# Step 5: Test DNS from a full-featured debug image
kubectl run netshoot --image=nicolaka/netshoot -it --rm -- bash
# Inside:
dig kubernetes.default.svc.cluster.local
dig @10.96.0.10 my-service.production.svc.cluster.local
dig +trace google.com   # Trace the resolution path

# Step 6: Check whether it's a Pod-specific or cluster-wide issue
# Create a test Pod in a different namespace
kubectl run -n kube-system dnstest --image=busybox:1.35 -- nslookup kubernetes
kubectl logs -n kube-system dnstest

# Step 7: Verify network policies aren't blocking DNS
kubectl get networkpolicies -A
# Ensure egress to kube-dns (UDP 53) is allowed

# Step 8: Check the CoreDNS ConfigMap
kubectl get configmap coredns -n kube-system -o yaml

# Step 9: Restart CoreDNS if needed (as a last resort)
kubectl rollout restart deployment coredns -n kube-system
```

| Symptom | Cause | Solution |
|---|---|---|
| nslookup times out | CoreDNS not running or not reachable | Check CoreDNS Pods; verify network policies allow DNS |
| NXDOMAIN for internal services | Wrong namespace or typo in name | Use FQDN: service.namespace.svc.cluster.local |
| External domains timeout | Forward config issue or upstream unreachable | Check CoreDNS forward config; test upstream DNS |
| Intermittent failures | CoreDNS overloaded | Scale up replicas; deploy NodeLocal DNSCache |
| Slow resolution | High ndots causing many searches | Reduce ndots; use FQDN with trailing dot |
| Service IP not updating | Stale DNS cache | Wait for TTL; reduce cache TTL in Corefile |
| Pod can't resolve its own Service | Service not yet propagated | Wait a few seconds; check Endpoints |
Most DNS issues fall into three categories:

1. **CoreDNS availability**: Pods crashed, overloaded, or unreachable
2. **Network path**: network policies or CNI issues blocking UDP/TCP 53 to kube-dns
3. **Configuration**: wrong names or namespaces, ndots/search-domain surprises, or a bad forward config

Start with: `kubectl run -it --rm debug --image=busybox -- nslookup kubernetes`
Proactive DNS monitoring prevents issues before they impact applications. CoreDNS exposes Prometheus metrics for comprehensive observability.
```bash
# CoreDNS exposes metrics on :9153/metrics by default

# Key metrics to monitor:
#
# coredns_dns_requests_total
#   - Total DNS requests by zone, type, protocol
#   - Use to understand query volume and patterns
#
# coredns_dns_responses_total
#   - Responses by rcode (NOERROR, NXDOMAIN, SERVFAIL)
#   - High SERVFAIL = upstream or config issues
#
# coredns_dns_request_duration_seconds
#   - Query latency histogram
#   - p99 latency is a key SLO metric
#
# coredns_cache_hits_total / coredns_cache_misses_total
#   - Cache efficiency
#   - Low hit rate = tune cache TTL or size
#
# coredns_forward_healthcheck_failures_total
#   - Upstream DNS server health
#   - Any failures need investigation
```

```yaml
# Prometheus ServiceMonitor for CoreDNS
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: coredns
  namespace: monitoring
  labels:
    release: prometheus
spec:
  jobLabel: k8s-app
  selector:
    matchLabels:
      k8s-app: kube-dns
  namespaceSelector:
    matchNames:
    - kube-system
  endpoints:
  - port: metrics
    interval: 30s
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
```

```bash
# Grafana dashboard queries (PromQL examples)
#
# DNS Queries Per Second:
#   sum(rate(coredns_dns_requests_total[5m]))
#
# p99 Latency:
#   histogram_quantile(0.99, sum(rate(coredns_dns_request_duration_seconds_bucket[5m])) by (le))
#
# Cache Hit Rate:
#   sum(rate(coredns_cache_hits_total[5m])) /
#   (sum(rate(coredns_cache_hits_total[5m])) + sum(rate(coredns_cache_misses_total[5m])))
#
# Error Rate:
#   sum(rate(coredns_dns_responses_total{rcode!="NOERROR"}[5m])) /
#   sum(rate(coredns_dns_responses_total[5m]))
```

We've covered Kubernetes DNS comprehensively. Let's consolidate the key takeaways:
- Service DNS names follow the pattern `<service>.<namespace>.svc.cluster.local`.

What's Next:
We've covered Services, Ingress, Network Policies, and DNS—the core Kubernetes networking primitives. In the final page of this module, we'll explore Service Mesh Integration—how Istio, Linkerd, and other service meshes extend Kubernetes networking with advanced features like mTLS, traffic management, and observability.
You now understand Kubernetes DNS in depth—from CoreDNS architecture through performance optimization and troubleshooting. You can configure custom DNS settings, debug resolution issues, and monitor DNS health. Next, we'll explore service mesh integration for advanced networking capabilities.