Some workloads don't follow the "deploy N replicas" model. Instead, they need to run everywhere—on every node in your cluster, always.
Consider logging: every node generates logs from kubelet, container runtimes, and application pods. You need a log collector on each node to gather and forward these logs. You can't run three log collectors and hope they cover a 100-node cluster. You need exactly one on every single node.
This is the domain of DaemonSets. They ensure that a copy of a pod runs on all (or some) nodes. When nodes join the cluster, pods are added. When nodes leave, pods are garbage collected. No manual intervention required.
By the end of this page, you'll understand when and why to use DaemonSets. You'll master node selection strategies, understand how DaemonSets interact with taints and tolerations, and learn production patterns for logging agents, monitoring exporters, and network plugins.
A DaemonSet is a specialized controller that ensures every eligible node runs exactly one copy of a pod. Unlike Deployments that manage a specific replica count, DaemonSets dynamically adjust to the cluster's node population.
The DaemonSet contract:

- Every eligible node runs exactly one copy of the pod.
- When a node joins the cluster, the controller adds a pod to it automatically.
- When a node is removed, its pod is garbage collected.
- You never specify a replica count; the controller derives it from the node population.
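To make the contract concrete, here is a minimal sketch of a DaemonSet manifest. The name and image are placeholders, and real agents (like the examples later on this page) add tolerations, host mounts, and resource limits:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent                # illustrative name
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent           # must match the selector above
    spec:
      containers:
        - name: agent
          image: registry.example.com/node-agent:1.0   # placeholder image
```

Note that there is no replicas field: the controller computes the pod count from the number of eligible nodes.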
How DaemonSets differ from Deployments:
While both manage pods, their fundamental models are different:
| Aspect | Deployment | DaemonSet |
|---|---|---|
| Replica count | User-specified (e.g., 5 replicas) | Automatically computed (one per eligible node) |
| Pod distribution | Spread by scheduler heuristics | Exactly one per node, guaranteed |
| Scaling trigger | Manual or HPA/KEDA | Node joining/leaving cluster |
| Scheduling | Scheduler free to place pods on any suitable node | Each pod pinned to its node via controller-set node affinity |
| Primary use case | Application workloads | Infrastructure agents |
| Pod names | Random suffix | Random suffix (but one per node) |
DaemonSets are the right choice whenever you need node-local functionality that every node requires.
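A quick way to see this in practice is to look at the DaemonSets most clusters already run. Exact names vary by distribution, so kube-proxy and CNI agents are typical examples rather than guarantees:

```bash
# List infrastructure DaemonSets (kube-proxy, CNI agents, log collectors, etc.)
kubectl get daemonsets -n kube-system

# The DESIRED column is computed per eligible node, so it tracks the node count
kubectl get nodes --no-headers | wc -l
```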
Log Collection Architecture
Centralized logging requires an agent on every node to collect and forward logs from:

- The kubelet and other node-level system components
- The container runtime
- Application containers writing to stdout/stderr
Common implementations include Fluent Bit (shown below), Fluentd, Filebeat, and Vector.
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
spec:
  selector:
    matchLabels:
      app: fluent-bit
  template:
    metadata:
      labels:
        app: fluent-bit
    spec:
      serviceAccountName: fluent-bit
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      containers:
        - name: fluent-bit
          image: fluent/fluent-bit:2.2
          resources:
            limits:
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: containers
              mountPath: /var/lib/docker/containers
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: containers
          hostPath:
            path: /var/lib/docker/containers
```

Let's examine a production-grade DaemonSet configuration with detailed explanations of each component:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: security-agent
  namespace: security
  labels:
    app: security-agent
    tier: infrastructure
spec:
  # === Selector (Immutable) ===
  selector:
    matchLabels:
      app: security-agent

  # === Update Strategy ===
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1        # Update one node at a time

  # === Minimum Ready Seconds ===
  minReadySeconds: 10          # Pod must be ready for 10s before considered available

  # === Revision History ===
  revisionHistoryLimit: 5

  # === Pod Template ===
  template:
    metadata:
      labels:
        app: security-agent
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9102"
    spec:
      # === Priority Class ===
      priorityClassName: system-node-critical

      # === Node Selection ===
      nodeSelector:
        kubernetes.io/os: linux

      # === Tolerations ===
      tolerations:
        # Run on control plane nodes
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
        # Run on nodes with GPU taints
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
        # Tolerate any NoExecute taint for a limited time
        - operator: Exists
          effect: NoExecute
          tolerationSeconds: 300

      # === Host Access ===
      hostNetwork: false
      hostPID: true            # Access to host processes
      hostIPC: false

      # === DNS Policy ===
      dnsPolicy: ClusterFirstWithHostNet

      # === Service Account ===
      serviceAccountName: security-agent

      # === Init Container ===
      initContainers:
        - name: init-config
          image: company/security-agent-init:v1.2
          securityContext:
            privileged: true
          command: ["/init.sh"]
          volumeMounts:
            - name: host-root
              mountPath: /host
              readOnly: false

      # === Main Container ===
      containers:
        - name: agent
          image: company/security-agent:v2.5.0
          imagePullPolicy: IfNotPresent

          # === Security Context ===
          securityContext:
            privileged: true   # Required for kernel-level access
            capabilities:
              add:
                - SYS_ADMIN
                - SYS_PTRACE
                - NET_ADMIN

          # === Resources ===
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 512Mi

          # === Environment Variables ===
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: HOST_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.hostIP

          # === Readiness/Liveness ===
          livenessProbe:
            httpGet:
              path: /healthz
              port: 9102
            initialDelaySeconds: 15
            periodSeconds: 20
          readinessProbe:
            httpGet:
              path: /ready
              port: 9102
            initialDelaySeconds: 5
            periodSeconds: 10

          # === Volume Mounts ===
          volumeMounts:
            - name: host-root
              mountPath: /host
              readOnly: true
            - name: host-proc
              mountPath: /host/proc
              readOnly: true
            - name: host-sys
              mountPath: /host/sys
              readOnly: true
            - name: config
              mountPath: /etc/security-agent

      # === Volumes ===
      volumes:
        - name: host-root
          hostPath:
            path: /
        - name: host-proc
          hostPath:
            path: /proc
        - name: host-sys
          hostPath:
            path: /sys
        - name: config
          configMap:
            name: security-agent-config

      # === Termination Grace Period ===
      terminationGracePeriodSeconds: 30
```

While DaemonSets run on "every node" by default, you often need fine-grained control over which nodes receive pods. Kubernetes provides multiple mechanisms for this control.
Node Selectors — Simple Label Matching
The simplest way to limit DaemonSet scope is with nodeSelector:
```yaml
spec:
  template:
    spec:
      nodeSelector:
        # Only run on Linux nodes
        kubernetes.io/os: linux
        # Only run on nodes with SSD storage
        disk-type: ssd
        # Only run on nodes in a specific zone
        topology.kubernetes.io/zone: us-east-1a
```

Node Affinity — Advanced Matching
For more complex requirements, use nodeAffinity:
```yaml
spec:
  template:
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  # Run on worker nodes only
                  - key: node-role.kubernetes.io/worker
                    operator: Exists
                  # Exclude nodes marked for decommission
                  - key: node-status
                    operator: NotIn
                    values:
                      - decommissioning
                  # Run on specific instance types
                  - key: node.kubernetes.io/instance-type
                    operator: In
                    values:
                      - m5.large
                      - m5.xlarge
                      - m5.2xlarge
```

Tolerations — Running on Tainted Nodes
Nodes can have taints that repel pods. DaemonSets must specify tolerations to run on tainted nodes:
```yaml
spec:
  template:
    spec:
      tolerations:
        # Tolerate control plane taint (exact match)
        - key: node-role.kubernetes.io/control-plane
          operator: Equal
          value: ""
          effect: NoSchedule
        # Tolerate any value of a key (exists match)
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
        # Tolerate not-ready nodes temporarily
        - key: node.kubernetes.io/not-ready
          operator: Exists
          effect: NoExecute
          tolerationSeconds: 300
        # Tolerate ALL taints (use with caution)
        - operator: Exists
```

Using operator: Exists without a key tolerates ALL taints. This is powerful but dangerous—your DaemonSet will run on every node regardless of any taints, including nodes marked for maintenance or with known issues. Use this pattern only for critical infrastructure components like CNI plugins or logging agents that absolutely must run everywhere.
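Before reaching for a blanket toleration, it helps to inventory which taints actually exist in your cluster; a quick sketch using standard kubectl output options:

```bash
# Show each node's taint keys (an empty column means the node is untainted)
kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'

# Inspect one node's taints in full (key, value, and effect)
kubectl describe node <node-name> | grep -A 5 Taints
```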
| Taint Key | Purpose | When to Tolerate |
|---|---|---|
| node-role.kubernetes.io/control-plane | Control plane nodes | Monitoring, logging agents that need control plane metrics |
| node.kubernetes.io/not-ready | Node is booting or unhealthy | Critical infrastructure that aids recovery |
| node.kubernetes.io/unschedulable | Node cordoned for maintenance | Usually don't tolerate—let pods drain |
| node.kubernetes.io/disk-pressure | Low disk space | Logging agents to help diagnose |
| nvidia.com/gpu | GPU nodes with special workloads | GPU monitoring, CUDA drivers |
| node.kubernetes.io/memory-pressure | Memory pressure detected | Selective—may worsen pressure |
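If you want to see these interactions first-hand, you can experiment on a non-production node; the node name and the demo key below are placeholders:

```bash
# Add a NoSchedule taint: new DaemonSet pods without a matching toleration
# will not be placed on this node (NoExecute would also evict running pods)
kubectl taint nodes <node-name> demo=true:NoSchedule

# Watch the DaemonSet's desired/ready counts react
kubectl get daemonset fluent-bit -n logging -w

# Remove the taint again (the trailing dash removes it)
kubectl taint nodes <node-name> demo=true:NoSchedule-
```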
DaemonSet updates require special consideration because you're updating infrastructure that often affects the entire node. Kubernetes provides two update strategies:
```yaml
# RollingUpdate - Automatic, controlled updates
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1      # Update one node at a time
    # maxUnavailable: 25%  # Update 25% of nodes in parallel
    # maxUnavailable: 5    # Update up to 5 nodes in parallel
---
# OnDelete - Manual control of when pods update
updateStrategy:
  type: OnDelete
# To trigger update: kubectl delete pod <daemonset-pod>
```

Choosing maxUnavailable:
The optimal maxUnavailable setting depends on your cluster size and risk tolerance:
- In large clusters, maxUnavailable: 10% speeds up updates significantly while limiting blast radius
- In small clusters, maxUnavailable: 1 ensures minimal disruption
- For critical infrastructure, keep maxUnavailable: 1 regardless of size for maximum safety

Monitoring rollout progress:
```bash
# Watch rollout status
kubectl rollout status daemonset/fluent-bit -n logging

# Check which pods are on old vs new revision
kubectl get pods -l app=fluent-bit -n logging -o wide --show-labels

# Rollback to previous version
kubectl rollout undo daemonset/fluent-bit -n logging

# Rollback to specific revision
kubectl rollout undo daemonset/fluent-bit -n logging --to-revision=2

# View rollout history
kubectl rollout history daemonset/fluent-bit -n logging
```

When updating CNI plugins or other critical networking components, consider using the OnDelete strategy combined with node draining. This allows you to: (1) drain the node and remove workloads, (2) delete the DaemonSet pod to trigger the update, (3) verify networking works, (4) uncordon the node. This provides much safer updates for infrastructure that could partition your cluster if updated incorrectly.
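A sketch of that OnDelete-plus-drain workflow, one node at a time; the pod and node names are placeholders, and the verification step should be adapted to your CNI:

```bash
# 1. Cordon and drain the node so application workloads move elsewhere
#    (--ignore-daemonsets is required because drain refuses to evict DaemonSet pods)
kubectl drain <node-name> --ignore-daemonsets

# 2. With updateStrategy: OnDelete, deleting the pod triggers the update
kubectl delete pod <cni-pod-on-this-node> -n kube-system

# 3. Confirm the replacement pod is running on the node, then verify networking
kubectl get pods -n kube-system --field-selector spec.nodeName=<node-name>

# 4. Return the node to service
kubectl uncordon <node-name>
```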
DaemonSet pods often require various forms of host access that regular application pods don't need. Understanding these patterns and their security implications is essential.
| Access Type | Setting | Use Case | Security Risk |
|---|---|---|---|
| Host Network | hostNetwork: true | Network monitoring, CNI plugins | Pod sees all network traffic on node |
| Host PID | hostPID: true | Process monitoring, security agents | Can see/signal all node processes |
| Host IPC | hostIPC: true | Shared memory access | Can read/write shared memory segments |
| Privileged | privileged: true | Kernel modules, iptables | Full root access to node |
| hostPath | volumes.hostPath | Log collection, config access | File system access based on path |
```yaml
# Pattern 1: Network Monitoring (hostNetwork for raw access)
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet   # Required when using hostNetwork
  containers:
    - name: network-agent
      securityContext:
        capabilities:
          add: ["NET_ADMIN", "NET_RAW"]
---
# Pattern 2: Process Monitoring (hostPID for /proc access)
spec:
  hostPID: true
  containers:
    - name: process-monitor
      securityContext:
        capabilities:
          add: ["SYS_PTRACE"]
      volumeMounts:
        - name: host-proc
          mountPath: /host/proc
          readOnly: true
  volumes:
    - name: host-proc
      hostPath:
        path: /proc
---
# Pattern 3: Minimal Host Access (prefer over privileged)
spec:
  containers:
    - name: log-collector
      securityContext:
        privileged: false
        runAsUser: 0                     # Still root but not privileged
        capabilities:
          add: ["DAC_READ_SEARCH"]       # Read any file
          drop: ["ALL"]                  # Drop everything else
      volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
```

Always use the minimum host access required. Instead of privileged: true, use specific capabilities. Instead of mounting / (root), mount only the directories you need. Each escalation is a potential attack vector if the container is compromised. Document why each privilege is necessary in comments or annotations.
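One lightweight way to document privileges is an annotation on the pod template; the annotation key below is purely illustrative, not a Kubernetes convention:

```yaml
template:
  metadata:
    annotations:
      # Hypothetical key - pick a naming scheme your organization standardizes on
      security.example.com/privilege-justification: >-
        hostPID and SYS_PTRACE are required to inspect host processes
        for runtime threat detection; full privileged mode is not used.
```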
DaemonSet issues typically fall into two categories: pods not scheduled on expected nodes, or pods failing on specific nodes. Here's a systematic debugging approach:
```bash
# 1. Check overall DaemonSet status
kubectl describe daemonset fluent-bit -n logging
# Look for:
# - desiredNumberScheduled: should equal total eligible nodes
# - currentNumberScheduled: pods actually scheduled
# - numberReady: pods running and ready

# 2. Identify nodes without pods
# Get all nodes
kubectl get nodes
# Get nodes with DaemonSet pods
kubectl get pods -l app=fluent-bit -n logging -o wide
# Compare to find missing nodes

# 3. Check why pod isn't scheduled on a specific node
kubectl describe node <node-name>
# Look for:
# - Taints (does DaemonSet tolerate them?)
# - Labels (does nodeSelector match?)
# - Conditions (is node Ready?)

# 4. Check for resource constraints
kubectl describe node <node-name> | grep -A 10 "Allocated resources"

# 5. Check DaemonSet pod on a specific node
kubectl get pods -l app=fluent-bit -n logging --field-selector spec.nodeName=<node-name>

# 6. Debug a specific pod
kubectl logs <pod-name> -n logging
kubectl describe pod <pod-name> -n logging

# 7. Check events for scheduling issues
kubectl get events -n logging --field-selector reason=FailedScheduling
```

| Symptom | Likely Cause | Solution |
|---|---|---|
| Pod not scheduled on node | Node taint not tolerated | Add toleration for the taint |
| Pod not scheduled on node | nodeSelector doesn't match labels | Check node labels, update selector |
| Pod pending on node | Insufficient resources on node | Lower resource requests or scale node |
| Pod CrashLoopBackOff | Host path doesn't exist on node | Verify path exists on all nodes |
| Update stuck | Pod failing readiness on some nodes | Debug specific nodes, check logs |
| Fewer pods than nodes | Node affinity excludes some nodes | Review nodeAffinity rules |
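For the first two symptoms, a quick way to find the affected nodes is to diff the full node list against the nodes that actually have a DaemonSet pod (shown here for the fluent-bit example):

```bash
# Nodes present in the cluster but missing a fluent-bit pod
comm -23 \
  <(kubectl get nodes -o name | sed 's|node/||' | sort) \
  <(kubectl get pods -l app=fluent-bit -n logging \
      -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | sort)
```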
Let's consolidate the essential knowledge about DaemonSets:

- A DaemonSet runs exactly one pod on every eligible node and automatically tracks nodes joining and leaving the cluster.
- nodeSelector and node affinity narrow which nodes are eligible; tolerations let pods run on tainted nodes, and a bare operator: Exists tolerates everything, so use it sparingly.
- Updates use RollingUpdate with a tuned maxUnavailable, or OnDelete when you need manual, node-by-node control.
- DaemonSet pods often need host access (hostPath, hostPID, hostNetwork, extra capabilities); grant the minimum required and document why.
- Troubleshooting starts with comparing desiredNumberScheduled against the pods actually running, then checking taints, labels, and node resources.
What's next:
Now that you understand DaemonSets for node-level workloads, we'll explore Jobs and CronJobs—Kubernetes' workload types for batch processing and scheduled tasks. These are essential for running one-off tasks, data migrations, and periodic maintenance jobs.
You now have a comprehensive understanding of DaemonSets. You can deploy node-level agents with appropriate host access, control which nodes receive pods through selectors and tolerations, and safely update DaemonSets in production. Next, we'll explore Jobs and CronJobs for batch and scheduled workloads.