Some workloads don't follow the "deploy N replicas" model. Instead, they need to run everywhere—on every node in your cluster, always.
Consider logging: every node generates logs from kubelet, container runtimes, and application pods. You need a log collector on each node to gather and forward these logs. You can't run three log collectors and hope they cover a 100-node cluster. You need exactly one on every single node.
This is the domain of DaemonSets. They ensure that a copy of a pod runs on all (or some) nodes. When nodes join the cluster, pods are added. When nodes leave, pods are garbage collected. No manual intervention required.
By the end of this page, you'll understand when and why to use DaemonSets. You'll master node selection strategies, understand how DaemonSets interact with taints and tolerations, and learn production patterns for logging agents, monitoring exporters, and network plugins.
A DaemonSet is a specialized controller that ensures every eligible node runs exactly one copy of a pod. Unlike Deployments that manage a specific replica count, DaemonSets dynamically adjust to the cluster's node population.
The DaemonSet contract:

- Every eligible node runs exactly one copy of the pod.
- When a node joins the cluster, the controller adds a pod to it automatically.
- When a node is removed, its pod is garbage collected.
- You never specify a replica count; the controller derives it from the node population.
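To make the contract concrete, here is a minimal sketch of a DaemonSet manifest. The name and image are placeholders, and real agents (like the examples later on this page) add tolerations, host mounts, and resource limits:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent                # illustrative name
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent           # must match the selector above
    spec:
      containers:
        - name: agent
          image: registry.example.com/node-agent:1.0   # placeholder image
```

Note that there is no replicas field: the controller computes the pod count from the number of eligible nodes.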
How DaemonSets differ from Deployments:
While both manage pods, their fundamental models are different:
| Aspect | Deployment | DaemonSet |
|---|---|---|
| Replica count | User-specified (e.g., 5 replicas) | Automatically computed (one per eligible node) |
| Pod distribution | Spread by scheduler heuristics | Exactly one per node, guaranteed |
| Scaling trigger | Manual or HPA/KEDA | Node joining/leaving cluster |
| Scheduling | Scheduler free to place pods on any suitable node | Each pod pinned to its node via controller-set node affinity |
| Primary use case | Application workloads | Infrastructure agents |
| Pod names | Random suffix | Random suffix (but one per node) |
DaemonSets are the right choice whenever you need node-local functionality that every node requires.
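A quick way to see this in practice is to look at the DaemonSets most clusters already run. Exact names vary by distribution, so kube-proxy and CNI agents are typical examples rather than guarantees:

```bash
# List infrastructure DaemonSets (kube-proxy, CNI agents, log collectors, etc.)
kubectl get daemonsets -n kube-system

# The DESIRED column is computed per eligible node, so it tracks the node count
kubectl get nodes --no-headers | wc -l
```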
Log Collection Architecture
Centralized logging requires an agent on every node to collect and forward logs from:

- The kubelet and other node-level system components
- The container runtime
- Application containers writing to stdout/stderr
Common implementations include Fluent Bit (shown below), Fluentd, Filebeat, and Vector.
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.2
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: containers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/lib/docker/containers
```

Let's examine a production-grade DaemonSet configuration with detailed explanations of each component:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: security-agent
  namespace: security
  labels:
    app: security-agent
    tier: infrastructure
spec:
  # === Selector (Immutable) ===
  selector:
    matchLabels:
      app: security-agent

  # === Update Strategy ===
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1        # Update one node at a time

  # === Minimum Ready Seconds ===
  minReadySeconds: 10          # Pod must be ready for 10s before considered available

  # === Revision History ===
  revisionHistoryLimit: 5

  # === Pod Template ===
  template:
    metadata:
      labels:
        app: security-agent
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9102"
    spec:
      # === Priority Class ===
      priorityClassName: system-node-critical

      # === Node Selection ===
      nodeSelector:
        kubernetes.io/os: linux

      # === Tolerations ===
      tolerations:
        # Run on control plane nodes
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        # Run on nodes with GPU taints
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
        # Tolerate any NoExecute taint for a limited time
        - operator: Exists
          effect: NoExecute
          tolerationSeconds: 300

      # === Host Access ===
      hostNetwork: false
      hostPID: true            # Access to host processes
      hostIPC: false

      # === DNS Policy ===
      dnsPolicy: ClusterFirstWithHostNet

      # === Service Account ===
      serviceAccountName: security-agent

      # === Init Container ===
      initContainers:
        - name: init-config
          image: company/security-agent-init:v1.2
          securityContext:
            privileged: true
          command: ["/init.sh"]
          volumeMounts:
            - name: host-root
              mountPath: /host
              readOnly: false

      # === Main Container ===
      containers:
        - name: agent
          image: company/security-agent:v2.5.0
          imagePullPolicy: IfNotPresent

          # === Security Context ===
          securityContext:
            privileged: true   # Required for kernel-level access
            capabilities:
              add:
                - SYS_ADMIN
                - SYS_PTRACE
                - NET_ADMIN

          # === Resources ===
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi

          # === Environment Variables ===
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP

          # === Readiness/Liveness ===
          livenessProbe:
            httpGet:
              path: /healthz
              port: 9102
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /ready
              port: 9102
            initialDelaySeconds: 5
            periodSeconds: 10

          # === Volume Mounts ===
          volumeMounts:
            - name: host-root
              mountPath: /host
              readOnly: true
            - name: host-proc
              mountPath: /host/proc
              readOnly: true
            - name: host-sys
              mountPath: /host/sys
              readOnly: true
            - name: config
              mountPath: /etc/security-agent

      # === Volumes ===
      volumes:
        - name: host-root
          hostPath:
            path: /
        - name: host-proc
          hostPath:
            path: /proc
        - name: host-sys
          hostPath:
            path: /sys
        - name: config
          configMap:
            name: security-agent-config

      # === Termination Grace Period ===
      terminationGracePeriodSeconds: 30
```

While DaemonSets run on "every node" by default, you often need fine-grained control over which nodes receive pods. Kubernetes provides multiple mechanisms for this control.
Node Selectors — Simple Label Matching
The simplest way to limit DaemonSet scope is with nodeSelector:
```yaml
spec:
  template:
    spec:
      nodeSelector:
        # Only run on Linux nodes
        kubernetes.io/os: linux
        # Only run on nodes with SSD storage
        disk-type: ssd
        # Only run on nodes in a specific zone
        topology.kubernetes.io/zone: us-east-1a
```

Node Affinity — Advanced Matching
For more complex requirements, use nodeAffinity:
```yaml
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  # Run on worker nodes only
                  - key: node-role.kubernetes.io/worker
                    operator: Exists
                  # Exclude nodes marked for decommission
                  - key: node-status
                    operator: NotIn
                    values:
                      - decommissioning
                  # Run on specific instance types
                  - key: node.kubernetes.io/instance-type
                    operator: In
                    values:
                      - m5.large
                      - m5.xlarge
                      - m5.2xlarge
```

Tolerations — Running on Tainted Nodes
Nodes can have taints that repel pods. DaemonSets must specify tolerations to run on tainted nodes:
```yaml
spec:
  template:
    spec:
      tolerations:
        # Tolerate control plane taint (exact match)
        - key: node-role.kubernetes.io/control-plane
          operator: Equal
          value: ""
          effect: NoSchedule
        # Tolerate any value of a key (exists match)
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
        # Tolerate not-ready nodes temporarily
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 300
        # Tolerate ALL taints (use with caution)
        - operator: Exists
```

Using operator: Exists without a key tolerates ALL taints. This is powerful but dangerous—your DaemonSet will run on every node regardless of any taints, including nodes marked for maintenance or with known issues. Use this pattern only for critical infrastructure components like CNI plugins or logging agents that absolutely must run everywhere.
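Before reaching for a blanket toleration, it helps to inventory which taints actually exist in your cluster; a quick sketch using standard kubectl output options:

```bash
# Show each node's taint keys (an empty column means the node is untainted)
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'

# Inspect one node's taints in full (key, value, and effect)
kubectl describe node <node-name> | grep -A 5 Taints
```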
| Taint Key | Purpose | When to Tolerate |
|---|---|---|
| node-role.kubernetes.io/control-plane | Control plane nodes | Monitoring, logging agents that need control plane metrics |
| node.kubernetes.io/not-ready | Node is booting or unhealthy | Critical infrastructure that aids recovery |
| node.kubernetes.io/unschedulable | Node cordoned for maintenance | Usually don't tolerate—let pods drain |
| node.kubernetes.io/disk-pressure | Low disk space | Logging agents to help diagnose |
| nvidia.com/gpu | GPU nodes with special workloads | GPU monitoring, CUDA drivers |
| node.kubernetes.io/memory-pressure | Memory pressure detected | Selective—may worsen pressure |
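If you want to see these interactions first-hand, you can experiment on a non-production node; the node name and the demo key below are placeholders:

```bash
# Add a NoSchedule taint: new DaemonSet pods without a matching toleration
# will not be placed on this node (NoExecute would also evict running pods)
kubectl taint nodes <node-name> demo=true:NoSchedule

# Watch the DaemonSet's desired/ready counts react
kubectl get daemonset fluent-bit -n logging -w

# Remove the taint again (the trailing dash removes it)
kubectl taint nodes <node-name> demo=true:NoSchedule-
```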
DaemonSet updates require special consideration because you're updating infrastructure that often affects the entire node. Kubernetes provides two update strategies:
```yaml
# RollingUpdate - Automatic, controlled updates
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1      # Update one node at a time
    # maxUnavailable: 25%  # Update 25% of nodes in parallel
    # maxUnavailable: 5    # Update up to 5 nodes in parallel
---
# OnDelete - Manual control of when pods update
updateStrategy:
  type: OnDelete
# To trigger update: kubectl delete pod <daemonset-pod>
```

Choosing maxUnavailable:
The optimal maxUnavailable setting depends on your cluster size and risk tolerance:
- In large clusters, maxUnavailable: 10% speeds up updates significantly while limiting blast radius
- In small clusters, maxUnavailable: 1 ensures minimal disruption
- For critical infrastructure, keep maxUnavailable: 1 regardless of size for maximum safety

Monitoring rollout progress:
```bash
# Watch rollout status
kubectl rollout status daemonset/fluent-bit -n logging

# Check which pods are on old vs new revision
kubectl get pods -l app=fluent-bit -n logging -o wide --show-labels

# Rollback to previous version
kubectl rollout undo daemonset/fluent-bit -n logging

# Rollback to specific revision
kubectl rollout undo daemonset/fluent-bit -n logging --to-revision=2

# View rollout history
kubectl rollout history daemonset/fluent-bit -n logging
```

When updating CNI plugins or other critical networking components, consider using the OnDelete strategy combined with node draining. This allows you to: (1) drain the node and remove workloads, (2) delete the DaemonSet pod to trigger the update, (3) verify networking works, (4) uncordon the node. This provides much safer updates for infrastructure that could partition your cluster if updated incorrectly.
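A sketch of that OnDelete-plus-drain workflow, one node at a time; the pod and node names are placeholders, and the verification step should be adapted to your CNI:

```bash
# 1. Cordon and drain the node so application workloads move elsewhere
#    (--ignore-daemonsets is required because drain refuses to evict DaemonSet pods)
kubectl drain <node-name> --ignore-daemonsets

# 2. With updateStrategy: OnDelete, deleting the pod triggers the update
kubectl delete pod <cni-pod-on-this-node> -n kube-system

# 3. Confirm the replacement pod is running on the node, then verify networking
kubectl get pods -n kube-system --field-selector spec.nodeName=<node-name>

# 4. Return the node to service
kubectl uncordon <node-name>
```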
DaemonSet pods often require various forms of host access that regular application pods don't need. Understanding these patterns and their security implications is essential.
| Access Type | Setting | Use Case | Security Risk |
|---|---|---|---|
| Host Network | hostNetwork: true | Network monitoring, CNI plugins | Pod sees all network traffic on node |
| Host PID | hostPID: true | Process monitoring, security agents | Can see/signal all node processes |
| Host IPC | hostIPC: true | Shared memory access | Can read/write shared memory segments |
| Privileged | privileged: true | Kernel modules, iptables | Full root access to node |
| hostPath | volumes.hostPath | Log collection, config access | File system access based on path |
```yaml
# Pattern 1: Network Monitoring (hostNetwork for raw access)
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet   # Required when using hostNetwork
  containers:
    - name: network-agent
      securityContext:
        capabilities:
          add: ["NET_ADMIN", "NET_RAW"]
---
# Pattern 2: Process Monitoring (hostPID for /proc access)
spec:
  hostPID: true
  containers:
    - name: process-monitor
      securityContext:
        capabilities:
          add: ["SYS_PTRACE"]
      volumeMounts:
        - name: host-proc
          mountPath: /host/proc
          readOnly: true
  volumes:
    - name: host-proc
      hostPath:
        path: /proc
---
# Pattern 3: Minimal Host Access (prefer over privileged)
spec:
  containers:
    - name: log-collector
      securityContext:
        privileged: false
        runAsUser: 0                     # Still root but not privileged
        capabilities:
          add: ["DAC_READ_SEARCH"]       # Read any file
          drop: ["ALL"]                  # Drop everything else
      volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
```

Always use the minimum host access required. Instead of privileged: true, use specific capabilities. Instead of mounting / (root), mount only the directories you need. Each escalation is a potential attack vector if the container is compromised. Document why each privilege is necessary in comments or annotations.
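One lightweight way to document privileges is an annotation on the pod template; the annotation key below is purely illustrative, not a Kubernetes convention:

```yaml
template:
  metadata:
    annotations:
      # Hypothetical key - pick a naming scheme your organization standardizes on
      security.example.com/privilege-justification: >-
        hostPID and SYS_PTRACE are required to inspect host processes
        for runtime threat detection; full privileged mode is not used.
```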
DaemonSet issues typically fall into two categories: pods not scheduled on expected nodes, or pods failing on specific nodes. Here's a systematic debugging approach:
```bash
# 1. Check overall DaemonSet status
kubectl describe daemonset fluent-bit -n logging
# Look for:
# - desiredNumberScheduled: should equal total eligible nodes
# - currentNumberScheduled: pods actually scheduled
# - numberReady: pods running and ready

# 2. Identify nodes without pods
# Get all nodes
kubectl get nodes
# Get nodes with DaemonSet pods
kubectl get pods -l app=fluent-bit -n logging -o wide
# Compare to find missing nodes

# 3. Check why pod isn't scheduled on a specific node
kubectl describe node <node-name>
# Look for:
# - Taints (does DaemonSet tolerate them?)
# - Labels (does nodeSelector match?)
# - Conditions (is node Ready?)

# 4. Check for resource constraints
kubectl describe node <node-name> | grep -A 10 "Allocated resources"

# 5. Check DaemonSet pod on a specific node
kubectl get pods -l app=fluent-bit -n logging --field-selector spec.nodeName=<node-name>

# 6. Debug a specific pod
kubectl logs <pod-name> -n logging
kubectl describe pod <pod-name> -n logging

# 7. Check events for scheduling issues
kubectl get events -n logging --field-selector reason=FailedScheduling
```

| Symptom | Likely Cause | Solution |
|---|---|---|
| Pod not scheduled on node | Node taint not tolerated | Add toleration for the taint |
| Pod not scheduled on node | nodeSelector doesn't match labels | Check node labels, update selector |
| Pod pending on node | Insufficient resources on node | Lower resource requests or scale node |
| Pod CrashLoopBackOff | Host path doesn't exist on node | Verify path exists on all nodes |
| Update stuck | Pod failing readiness on some nodes | Debug specific nodes, check logs |
| Fewer pods than nodes | Node affinity excludes some nodes | Review nodeAffinity rules |
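For the first two symptoms, a quick way to find the affected nodes is to diff the full node list against the nodes that actually have a DaemonSet pod (shown here for the fluent-bit example):

```bash
# Nodes present in the cluster but missing a fluent-bit pod
comm -23 \
  <(kubectl get nodes -o name | sed 's|node/||' | sort) \
  <(kubectl get pods -l app=fluent-bit -n logging \
      -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort)
```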
Let's consolidate the essential knowledge about DaemonSets:

- A DaemonSet runs exactly one pod on every eligible node and automatically tracks nodes joining and leaving the cluster.
- nodeSelector and node affinity narrow which nodes are eligible; tolerations let pods run on tainted nodes, and a bare operator: Exists tolerates everything, so use it sparingly.
- Updates use RollingUpdate with a tuned maxUnavailable, or OnDelete when you need manual, node-by-node control.
- DaemonSet pods often need host access (hostPath, hostPID, hostNetwork, extra capabilities); grant the minimum required and document why.
- Troubleshooting starts with comparing desiredNumberScheduled against the pods actually running, then checking taints, labels, and node resources.
What's next:
Now that you understand DaemonSets for node-level workloads, we'll explore Jobs and CronJobs—Kubernetes' workload types for batch processing and scheduled tasks. These are essential for running one-off tasks, data migrations, and periodic maintenance jobs.
You now have a comprehensive understanding of DaemonSets. You can deploy node-level agents with appropriate host access, control which nodes receive pods through selectors and tolerations, and safely update DaemonSets in production. Next, we'll explore Jobs and CronJobs for batch and scheduled workloads.