When organizations migrate to Kubernetes, the Deployment is almost always the first workload type they encounter. This is not by accident—Deployments represent the quintessential Kubernetes abstraction for managing stateless applications, embodying the platform's core philosophy of declarative, self-healing, and scalable infrastructure.
Understanding Deployments deeply is not merely about learning YAML syntax. It's about grasping how Kubernetes achieves the seemingly magical feat of keeping your applications running, automatically recovering from failures, and enabling zero-downtime updates across thousands of containers.
By the end of this page, you will understand the complete architecture of Kubernetes Deployments—from the controller pattern that powers them to the ReplicaSet mechanics beneath the surface. You'll master declarative configuration, scaling strategies, update mechanisms, and production hardening techniques that separate reliable systems from fragile ones.
Before diving into Deployments, we must establish a precise understanding of what makes an application "stateless." This distinction is foundational because the correct choice of Kubernetes workload type depends entirely on your application's state requirements.
What does stateless mean?
A stateless application treats each request as an independent transaction that contains all the information necessary to process it. The application instance holds no memory of previous interactions—no session data, no cached user preferences, no accumulated state. Every request could theoretically be handled by any instance of the application.
| Characteristic | Stateless Application | Stateful Application |
|---|---|---|
| Instance identity | Interchangeable, anonymous | Unique, addressable |
| Data persistence | External (databases, caches) | Local (attached storage) |
| Scaling approach | Add/remove instances freely | Careful orchestration required |
| Failure recovery | Replace with any new instance | Must preserve identity and data |
| Startup order | Irrelevant | Often sequential, ordered |
| Network identity | Dynamic IPs acceptable | Stable DNS names required |
| Examples | Web servers, API gateways, workers | Databases, distributed caches, Kafka |
The statelessness contract:
When designing stateless applications, you commit to a specific contract:

- Any instance can handle any request; each request carries all the context it needs.
- Persistent data lives in external services (databases, object stores, caches), never on the instance itself.
- Instances can be created, replaced, or terminated at any moment without losing data.
- No instance depends on the identity or startup order of any other instance.
This contract enables Kubernetes to manage your application with maximum flexibility—scaling up under load, replacing failed instances instantly, and performing rolling updates seamlessly.
The stateless application model aligns perfectly with the Twelve-Factor App methodology, particularly Factor VI (Processes): "Twelve-factor processes are stateless and share-nothing." Applications built following these principles are inherently Deployment-friendly. If your application violates this—for example, by storing user sessions in memory—you'll face challenges with scaling and reliability that no amount of Kubernetes configuration can solve.
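For example, if an application keeps login sessions in process memory, moving them to an external store is usually the fix. A minimal sketch of wiring that through a Deployment's pod template might look like the fragment below; the SESSION_STORE_URL variable, the redis-credentials Secret, and the image name are hypothetical, not part of any standard.

```yaml
# Pod template fragment: point session handling at an external store
# instead of in-process memory (names below are hypothetical).
spec:
  template:
    spec:
      containers:
      - name: web
        image: company/web:1.0          # placeholder image
        env:
        - name: SESSION_STORE_URL       # the application reads this to reach Redis
          valueFrom:
            secretKeyRef:
              name: redis-credentials   # hypothetical Secret holding the connection URL
              key: url
```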
A Kubernetes Deployment is far more than a simple container launcher. It's a sophisticated abstraction built on layered controllers, each responsible for a specific aspect of application lifecycle management. Understanding this architecture is crucial for troubleshooting and optimization.
The hierarchy of objects:

A Deployment owns one or more ReplicaSets (one per revision of the pod template); each ReplicaSet owns a set of identical Pods; and each Pod runs one or more containers on a node. Changes and deletions cascade down this chain.
The controller pattern in action:
Kubernetes operates on a declarative model enforced through controllers. Each controller watches for specific resources and reconciles the actual state with the desired state:
Deployment Controller watches Deployment objects. When you modify a Deployment's pod template, it creates a new ReplicaSet and orchestrates the transition from old to new.
ReplicaSet Controller watches ReplicaSet objects. It ensures the specified number of pod replicas are running at all times, creating or deleting pods as needed.
Kubelet (on each node) watches pods assigned to its node and ensures the containers are running according to their specifications.
This layered design provides separation of concerns: the Deployment handles updates and history, the ReplicaSet handles replica count, and the kubelet handles actual container execution.
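You can see this chain directly on the objects themselves: each Pod carries an ownerReferences entry pointing at its ReplicaSet, and the ReplicaSet in turn points at its Deployment. The fragment below is illustrative; the hash suffix and UID are placeholders.

```yaml
# Excerpt from `kubectl get pod <pod-name> -o yaml` (illustrative values)
metadata:
  name: api-gateway-7d4b9c8f6-x2k9p      # hypothetical generated pod name
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet                      # the Pod is owned by a ReplicaSet...
    name: api-gateway-7d4b9c8f6
    uid: 0b1e2d3c-placeholder-uid
    controller: true
    blockOwnerDeletion: true
# ...and the ReplicaSet has an ownerReference of kind: Deployment,
# which is how rollouts and garbage collection walk the chain.
```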
The power of Kubernetes lies in its declarative configuration model. Rather than issuing imperative commands ("start 3 containers"), you declare your desired state ("I want 3 replicas running") and let Kubernetes figure out how to achieve and maintain that state.
Let's examine a production-grade Deployment specification piece by piece:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: production
  labels:
    app: api-gateway
    team: platform
    version: v2.3.1
  annotations:
    deployment.kubernetes.io/revision: "7"
    meta.helm.sh/release-name: api-gateway
spec:
  # === Replica Configuration ===
  replicas: 6

  # === Selector (Immutable after creation) ===
  selector:
    matchLabels:
      app: api-gateway

  # === Update Strategy ===
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # At most 1 pod down during update
      maxSurge: 2         # At most 2 extra pods during update

  # === Revision History ===
  revisionHistoryLimit: 10  # Keep 10 old ReplicaSets for rollback

  # === Pod Template ===
  template:
    metadata:
      labels:
        app: api-gateway
        version: v2.3.1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
    spec:
      # === Scheduling Constraints ===
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: api-gateway
              topologyKey: kubernetes.io/hostname

      # === Service Account ===
      serviceAccountName: api-gateway-sa

      # === Security Context ===
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000

      # === Containers ===
      containers:
      - name: api-gateway
        image: company/api-gateway:v2.3.1
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8080
        - name: metrics
          containerPort: 9090

        # === Resource Limits ===
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "1000m"
            memory: "512Mi"

        # === Health Checks ===
        livenessProbe:
          httpGet:
            path: /health/live
            port: http
          initialDelaySeconds: 15
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /health/ready
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3

        # === Lifecycle Hooks ===
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]

        # === Environment Variables ===
        env:
        - name: LOG_LEVEL
          value: "info"
        - name: DB_HOST
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: host
        envFrom:
        - configMapRef:
            name: api-gateway-config

      # === Graceful Shutdown ===
      terminationGracePeriodSeconds: 30
```

Deployments provide the foundation for both manual and automatic scaling. Understanding these mechanisms is essential for building responsive, cost-efficient systems.
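For orientation, here is a minimal skeleton containing only the required pieces (metadata, replica count, selector, and a matching pod template); everything else in the full spec above layers onto this core. The name and nginx image are placeholders for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web            # placeholder name
spec:
  replicas: 3                # desired state: three identical pods
  selector:
    matchLabels:
      app: hello-web         # must match the pod template labels below
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: web
        image: nginx:1.27    # placeholder image; substitute your own
        ports:
        - containerPort: 80
```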
Manual Scaling:
The simplest form of scaling is manually updating the replica count:
```bash
# Scale to 10 replicas immediately
kubectl scale deployment api-gateway --replicas=10

# Scale using patch (useful in CI/CD)
kubectl patch deployment api-gateway -p '{"spec":{"replicas":10}}'

# Scale to zero (common for cost savings in non-prod)
kubectl scale deployment api-gateway --replicas=0
```

Horizontal Pod Autoscaler (HPA):
For dynamic workloads, the Horizontal Pod Autoscaler adjusts replica count based on observed metrics. Modern Kubernetes supports both resource-based and custom metrics scaling:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3
  maxReplicas: 50
  metrics:
  # CPU-based scaling
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # Scale up when avg CPU > 70%
  # Memory-based scaling
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  # Custom metric: requests per second
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"       # Scale when RPS > 1000 per pod
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60               # Scale down at most 10% per minute
    scaleUp:
      stabilizationWindowSeconds: 0     # Scale up immediately
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15               # Can double capacity every 15s
      - type: Pods
        value: 4
        periodSeconds: 15               # Or add 4 pods every 15s
```

Without explicit behavior configuration, HPA uses aggressive defaults that can cause oscillation—rapidly scaling up and down as metrics fluctuate around thresholds. The stabilizationWindowSeconds setting prevents premature scale-down by requiring metrics to remain below threshold for the specified duration. Always configure both scaleUp and scaleDown behavior in production.
| Strategy | Trigger | Response Time | Best For |
|---|---|---|---|
| Manual scaling | Human decision | Immediate | Planned events, maintenance windows |
| HPA (CPU/Memory) | Resource utilization | 15-60 seconds | CPU-bound workloads, batch processing |
| HPA (Custom metrics) | Business metrics | 30-90 seconds | API servers, request-driven workloads |
| KEDA | Event sources | Seconds to minutes | Event-driven, scale-to-zero scenarios (see the sketch after this table) |
| Scheduled scaling | Time-based rules | Predictable | Known traffic patterns, cost optimization |
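As a sketch of the KEDA row above: KEDA layers event-source triggers on top of the HPA and can scale a Deployment down to zero. The ScaledObject below is illustrative only; the trigger type, queue name, and connection settings are hypothetical and depend on your event source and KEDA installation.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-gateway-scaler          # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    name: api-gateway               # the Deployment to scale
  minReplicaCount: 0                # allow scale-to-zero when the queue is empty
  maxReplicaCount: 50
  triggers:
  - type: rabbitmq                  # example trigger; KEDA supports many event sources
    metadata:
      queueName: api-jobs           # hypothetical queue
      queueLength: "100"            # target backlog per replica
      hostFromEnv: RABBITMQ_URL     # connection string supplied via an env var
```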
Kubernetes Deployments support two built-in update strategies, with the rolling update being the default and most commonly used. Understanding these strategies and their tradeoffs is essential for achieving zero-downtime deployments.
Strategy 1: Rolling Update (Default)
A rolling update incrementally replaces old pods with new ones, maintaining availability throughout the process. Kubernetes creates new pods, waits for them to become ready, then terminates old pods.
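To make the arithmetic concrete, here is how the maxUnavailable and maxSurge settings from the spec above bound a rollout of the 6-replica api-gateway; the comments walk through the resulting pod-count window.

```yaml
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # ready pods never drop below 6 - 1 = 5
      maxSurge: 2         # total pods (old + new) never exceed 6 + 2 = 8
# During the rollout, Kubernetes keeps the pod count between 5 and 8,
# creating up to 2 new pods at a time and removing an old pod only
# once a replacement reports Ready.
```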
Strategy 2: Recreate
The Recreate strategy terminates all existing pods before creating new ones. This causes downtime but ensures only one version runs at any time:
```yaml
spec:
  strategy:
    type: Recreate   # No additional configuration for Recreate
```

Use Recreate when: (1) Your application cannot tolerate multiple versions running simultaneously, (2) You have shared volumes that don't support ReadWriteMany access mode, (3) The application has startup dependencies that only the first instance should execute, or (4) Brief downtime is acceptable and simpler than managing compatibility.
Controlling and Monitoring Rollouts:
Kubernetes provides rich tooling for managing rollouts:
```bash
# Watch rollout progress in real-time
kubectl rollout status deployment/api-gateway

# View rollout history
kubectl rollout history deployment/api-gateway

# View specific revision details
kubectl rollout history deployment/api-gateway --revision=5

# Pause a rollout (for canary-style validation)
kubectl rollout pause deployment/api-gateway

# Resume a paused rollout
kubectl rollout resume deployment/api-gateway

# Rollback to previous revision
kubectl rollout undo deployment/api-gateway

# Rollback to specific revision
kubectl rollout undo deployment/api-gateway --to-revision=3

# Restart all pods (useful for config changes)
kubectl rollout restart deployment/api-gateway
```

Moving Deployments from development to production requires careful attention to reliability, security, and operational concerns. This section covers the essential hardening techniques that distinguish production-grade configurations.
```yaml
# Pod Disruption Budget - ensures minimum availability
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-gateway-pdb
spec:
  minAvailable: 2   # At least 2 pods must remain during disruptions
  # Alternative: maxUnavailable: 1
  selector:
    matchLabels:
      app: api-gateway
---
# Topology Spread - distribution across zones and nodes
spec:
  template:
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: api-gateway
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: api-gateway
---
# Security Context - container hardening
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: api-gateway
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: tmp
          mountPath: /tmp
      volumes:
      - name: tmp
        emptyDir: {}   # Writable temp directory
```

When Deployments don't behave as expected, systematic debugging is essential. Understanding the event flow and knowing where to look can dramatically reduce troubleshooting time.
```bash
# 1. Check Deployment status and conditions
kubectl describe deployment api-gateway
# Key things to look for:
# - Conditions: Available, Progressing
# - Events: any warnings or errors
# - Replicas: desired vs current vs ready

# 2. Check ReplicaSet status
kubectl get replicaset -l app=api-gateway
kubectl describe replicaset <replicaset-name>

# 3. Check Pod status and events
kubectl get pods -l app=api-gateway
kubectl describe pod <pod-name>
# Key things to look for:
# - Status: CrashLoopBackOff, ImagePullBackOff, Pending
# - Events: scheduling failures, pull errors
# - Containers: Ready state, restart count

# 4. Check container logs
kubectl logs <pod-name> -c api-gateway
kubectl logs <pod-name> -c api-gateway --previous   # Previous container

# 5. Interactive debugging
kubectl exec -it <pod-name> -- /bin/sh

# 6. Check resource metrics
kubectl top pods -l app=api-gateway

# 7. Check events across namespace
kubectl get events --sort-by=.metadata.creationTimestamp
```

| Symptom | Likely Cause | Solution |
|---|---|---|
| Pods stuck in Pending | Insufficient resources or node selector doesn't match | Check resource requests, node labels, and taints/tolerations |
| Pods in CrashLoopBackOff | Application crashes immediately after starting | Check logs, verify environment variables and configs |
| Pods in ImagePullBackOff | Image doesn't exist or registry auth failed | Verify image name/tag, check imagePullSecrets |
| Rollout stuck at 0 available | Pods failing readiness or liveness probes | Check probe endpoints, increase timeouts if app is slow |
| Slow rollouts | Pods taking too long to become ready | Tune readiness probe timing, check initialDelaySeconds (see the sketch after this table) |
| Deployment not updating | No changes to pod template (only metadata) | Ensure spec.template is modified, not just deployment metadata |
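For the "Slow rollouts" row, the usual fix is to give the readiness probe more room before the rollout is judged stuck. The values below are illustrative starting points for an application that needs roughly a minute to warm up, not recommendations for every workload.

```yaml
# Pod template fragment: relaxed readiness probe for a slow-starting app
# (timing values are hypothetical examples; tune to your app's real startup time)
readinessProbe:
  httpGet:
    path: /health/ready
    port: http
  initialDelaySeconds: 30   # wait longer before the first check
  periodSeconds: 10         # probe less frequently
  timeoutSeconds: 5
  failureThreshold: 6       # tolerate ~60s of failed checks before marking unready
```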
Let's consolidate the essential knowledge about Kubernetes Deployments:

- Deployments manage stateless applications declaratively: you declare the desired state, and layered controllers (Deployment → ReplicaSet → kubelet) continuously reconcile toward it.
- Scaling can be manual (kubectl scale) or automatic (HPA on resource or custom metrics, KEDA for event-driven workloads); configure scale-up and scale-down behavior explicitly to avoid oscillation.
- RollingUpdate is the default zero-downtime strategy, bounded by maxUnavailable and maxSurge; Recreate trades brief downtime for a guarantee that only one version runs at a time.
- Production hardening includes PodDisruptionBudgets, topology spread constraints, security contexts, resource requests and limits, and well-tuned health probes.
- Troubleshooting follows the object chain: Deployment conditions, then ReplicaSets, then Pod events, then container logs.
What's next:
Now that you understand Deployments for stateless applications, we'll explore StatefulSets—Kubernetes' solution for applications that require stable network identities, ordered deployment, and persistent storage. StatefulSets introduce fundamentally different guarantees that are essential for databases, distributed caches, and message brokers.
You now have a comprehensive understanding of Kubernetes Deployments. You can configure production-grade stateless applications, implement scaling strategies, perform zero-downtime updates, and troubleshoot common issues. Next, we'll tackle the more complex world of stateful workloads with StatefulSets.