When organizations migrate to Kubernetes, the Deployment is almost always the first workload type they encounter. This is not by accident—Deployments represent the quintessential Kubernetes abstraction for managing stateless applications, embodying the platform's core philosophy of declarative, self-healing, and scalable infrastructure.
Understanding Deployments deeply is not merely about learning YAML syntax. It's about grasping how Kubernetes achieves the seemingly magical feat of keeping your applications running, automatically recovering from failures, and enabling zero-downtime updates across thousands of containers.
By the end of this page, you will understand the complete architecture of Kubernetes Deployments—from the controller pattern that powers them to the ReplicaSet mechanics beneath the surface. You'll master declarative configuration, scaling strategies, update mechanisms, and production hardening techniques that separate reliable systems from fragile ones.
Before diving into Deployments, we must establish a precise understanding of what makes an application "stateless." This distinction is foundational because the correct choice of Kubernetes workload type depends entirely on your application's state requirements.
What does stateless mean?
A stateless application treats each request as an independent transaction that contains all the information necessary to process it. The application instance holds no memory of previous interactions—no session data, no cached user preferences, no accumulated state. Every request could theoretically be handled by any instance of the application.
| Characteristic | Stateless Application | Stateful Application |
|---|---|---|
| Instance identity | Interchangeable, anonymous | Unique, addressable |
| Data persistence | External (databases, caches) | Local (attached storage) |
| Scaling approach | Add/remove instances freely | Careful orchestration required |
| Failure recovery | Replace with any new instance | Must preserve identity and data |
| Startup order | Irrelevant | Often sequential, ordered |
| Network identity | Dynamic IPs acceptable | Stable DNS names required |
| Examples | Web servers, API gateways, workers | Databases, distributed caches, Kafka |
The statelessness contract:
When designing stateless applications, you commit to a specific contract:

- Any instance can handle any request; each request carries all the context it needs.
- Persistent data lives in external services (databases, object stores, caches), never on the instance itself.
- Instances can be created, replaced, or terminated at any moment without losing data.
- No instance depends on the identity or startup order of any other instance.
This contract enables Kubernetes to manage your application with maximum flexibility—scaling up under load, replacing failed instances instantly, and performing rolling updates seamlessly.
The stateless application model aligns perfectly with the Twelve-Factor App methodology, particularly Factor VI (Processes): "Twelve-factor processes are stateless and share-nothing." Applications built following these principles are inherently Deployment-friendly. If your application violates this—for example, by storing user sessions in memory—you'll face challenges with scaling and reliability that no amount of Kubernetes configuration can solve.
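For example, if an application keeps login sessions in process memory, moving them to an external store is usually the fix. A minimal sketch of wiring that through a Deployment's pod template might look like the fragment below; the SESSION_STORE_URL variable, the redis-credentials Secret, and the image name are hypothetical, not part of any standard.

```yaml
# Pod template fragment: point session handling at an external store
# instead of in-process memory (names below are hypothetical).
spec:
  template:
    spec:
      containers:
      - name: web
        image: company/web:1.0          # placeholder image
        env:
        - name: SESSION_STORE_URL       # the application reads this to reach Redis
          valueFrom:
            secretKeyRef:
              name: redis-credentials   # hypothetical Secret holding the connection URL
              key: url
```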
A Kubernetes Deployment is far more than a simple container launcher. It's a sophisticated abstraction built on layered controllers, each responsible for a specific aspect of application lifecycle management. Understanding this architecture is crucial for troubleshooting and optimization.
The hierarchy of objects:

A Deployment owns one or more ReplicaSets (one per revision of the pod template); each ReplicaSet owns a set of identical Pods; and each Pod runs one or more containers on a node. Changes and deletions cascade down this chain.
The controller pattern in action:
Kubernetes operates on a declarative model enforced through controllers. Each controller watches for specific resources and reconciles the actual state with the desired state:
Deployment Controller watches Deployment objects. When you modify a Deployment's pod template, it creates a new ReplicaSet and orchestrates the transition from old to new.
ReplicaSet Controller watches ReplicaSet objects. It ensures the specified number of pod replicas are running at all times, creating or deleting pods as needed.
Kubelet (on each node) watches pods assigned to its node and ensures the containers are running according to their specifications.
This layered design provides separation of concerns: the Deployment handles updates and history, the ReplicaSet handles replica count, and the kubelet handles actual container execution.
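You can see this chain directly on the objects themselves: each Pod carries an ownerReferences entry pointing at its ReplicaSet, and the ReplicaSet in turn points at its Deployment. The fragment below is illustrative; the hash suffix and UID are placeholders.

```yaml
# Excerpt from `kubectl get pod <pod-name> -o yaml` (illustrative values)
metadata:
  name: api-gateway-7d4b9c8f6-x2k9p      # hypothetical generated pod name
  ownerReferences:
  - apiVersion: apps/v1
    kind: ReplicaSet                      # the Pod is owned by a ReplicaSet...
    name: api-gateway-7d4b9c8f6
    uid: 0b1e2d3c-placeholder-uid
    controller: true
    blockOwnerDeletion: true
# ...and the ReplicaSet has an ownerReference of kind: Deployment,
# which is how rollouts and garbage collection walk the chain.
```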
The power of Kubernetes lies in its declarative configuration model. Rather than issuing imperative commands ("start 3 containers"), you declare your desired state ("I want 3 replicas running") and let Kubernetes figure out how to achieve and maintain that state.
Let's examine a production-grade Deployment specification piece by piece:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-gateway
  namespace: production
  labels:
    app: api-gateway
    team: platform
    version: v2.3.1
  annotations:
    deployment.kubernetes.io/revision: "7"
    meta.helm.sh/release-name: api-gateway
spec:
  # === Replica Configuration ===
  replicas: 6

  # === Selector (Immutable after creation) ===
  selector:
    matchLabels:
      app: api-gateway

  # === Update Strategy ===
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # At most 1 pod down during update
      maxSurge: 2         # At most 2 extra pods during update

  # === Revision History ===
  revisionHistoryLimit: 10  # Keep 10 old ReplicaSets for rollback

  # === Pod Template ===
  template:
    metadata:
      labels:
        app: api-gateway
        version: v2.3.1
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9090"
    spec:
      # === Scheduling Constraints ===
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: api-gateway
              topologyKey: kubernetes.io/hostname

      # === Service Account ===
      serviceAccountName: api-gateway-sa

      # === Security Context ===
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000

      # === Containers ===
      containers:
      - name: api-gateway
        image: company/api-gateway:v2.3.1
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 8080
        - name: metrics
          containerPort: 9090

        # === Resource Limits ===
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "1000m"
            memory: "512Mi"

        # === Health Checks ===
        livenessProbe:
          httpGet:
            path: /health/live
            port: http
          initialDelaySeconds: 15
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        readinessProbe:
          httpGet:
            path: /health/ready
            port: http
          initialDelaySeconds: 5
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 3

        # === Lifecycle Hooks ===
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 10"]

        # === Environment Variables ===
        env:
        - name: LOG_LEVEL
          value: "info"
        - name: DB_HOST
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: host
        envFrom:
        - configMapRef:
            name: api-gateway-config

      # === Graceful Shutdown ===
      terminationGracePeriodSeconds: 30
```

Deployments provide the foundation for both manual and automatic scaling. Understanding these mechanisms is essential for building responsive, cost-efficient systems.
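For orientation, here is a minimal skeleton containing only the required pieces (metadata, replica count, selector, and a matching pod template); everything else in the full spec above layers onto this core. The name and nginx image are placeholders for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web            # placeholder name
spec:
  replicas: 3                # desired state: three identical pods
  selector:
    matchLabels:
      app: hello-web         # must match the pod template labels below
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
      - name: web
        image: nginx:1.27    # placeholder image; substitute your own
        ports:
        - containerPort: 80
```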
Manual Scaling:
The simplest form of scaling is manually updating the replica count:
```bash
# Scale to 10 replicas immediately
kubectl scale deployment api-gateway --replicas=10

# Scale using patch (useful in CI/CD)
kubectl patch deployment api-gateway -p '{"spec":{"replicas":10}}'

# Scale to zero (common for cost savings in non-prod)
kubectl scale deployment api-gateway --replicas=0
```

Horizontal Pod Autoscaler (HPA):
For dynamic workloads, the Horizontal Pod Autoscaler adjusts replica count based on observed metrics. Modern Kubernetes supports both resource-based and custom metrics scaling:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 3
  maxReplicas: 50
  metrics:
  # CPU-based scaling
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # Scale up when avg CPU > 70%
  # Memory-based scaling
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  # Custom metric: requests per second
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"       # Scale when RPS > 1000 per pod
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scaling down
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60               # Scale down at most 10% per minute
    scaleUp:
      stabilizationWindowSeconds: 0     # Scale up immediately
      policies:
      - type: Percent
        value: 100
        periodSeconds: 15               # Can double capacity every 15s
      - type: Pods
        value: 4
        periodSeconds: 15               # Or add 4 pods every 15s
```

Without explicit behavior configuration, HPA uses aggressive defaults that can cause oscillation—rapidly scaling up and down as metrics fluctuate around thresholds. The stabilizationWindowSeconds setting prevents premature scale-down by requiring metrics to remain below threshold for the specified duration. Always configure both scaleUp and scaleDown behavior in production.
| Strategy | Trigger | Response Time | Best For |
|---|---|---|---|
| Manual scaling | Human decision | Immediate | Planned events, maintenance windows |
| HPA (CPU/Memory) | Resource utilization | 15-60 seconds | CPU-bound workloads, batch processing |
| HPA (Custom metrics) | Business metrics | 30-90 seconds | API servers, request-driven workloads |
| KEDA | Event sources | Seconds to minutes | Event-driven, scale-to-zero scenarios (see the sketch after this table) |
| Scheduled scaling | Time-based rules | Predictable | Known traffic patterns, cost optimization |
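As a sketch of the KEDA row above: KEDA layers event-source triggers on top of the HPA and can scale a Deployment down to zero. The ScaledObject below is illustrative only; the trigger type, queue name, and connection settings are hypothetical and depend on your event source and KEDA installation.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: api-gateway-scaler          # hypothetical name
  namespace: production
spec:
  scaleTargetRef:
    name: api-gateway               # the Deployment to scale
  minReplicaCount: 0                # allow scale-to-zero when the queue is empty
  maxReplicaCount: 50
  triggers:
  - type: rabbitmq                  # example trigger; KEDA supports many event sources
    metadata:
      queueName: api-jobs           # hypothetical queue
      queueLength: "100"            # target backlog per replica
      hostFromEnv: RABBITMQ_URL     # connection string supplied via an env var
```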
Kubernetes Deployments support two built-in update strategies, with the rolling update being the default and most commonly used. Understanding these strategies and their tradeoffs is essential for achieving zero-downtime deployments.
Strategy 1: Rolling Update (Default)
A rolling update incrementally replaces old pods with new ones, maintaining availability throughout the process. Kubernetes creates new pods, waits for them to become ready, then terminates old pods.
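To make the arithmetic concrete, here is how the maxUnavailable and maxSurge settings from the spec above bound a rollout of the 6-replica api-gateway; the comments walk through the resulting pod-count window.

```yaml
spec:
  replicas: 6
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # ready pods never drop below 6 - 1 = 5
      maxSurge: 2         # total pods (old + new) never exceed 6 + 2 = 8
# During the rollout, Kubernetes keeps the pod count between 5 and 8,
# creating up to 2 new pods at a time and removing an old pod only
# once a replacement reports Ready.
```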
Strategy 2: Recreate
The Recreate strategy terminates all existing pods before creating new ones. This causes downtime but ensures only one version runs at any time:
```yaml
spec:
  strategy:
    type: Recreate   # No additional configuration for Recreate
```

Use Recreate when: (1) Your application cannot tolerate multiple versions running simultaneously, (2) You have shared volumes that don't support ReadWriteMany access mode, (3) The application has startup dependencies that only the first instance should execute, or (4) Brief downtime is acceptable and simpler than managing compatibility.
Controlling and Monitoring Rollouts:
Kubernetes provides rich tooling for managing rollouts:
```bash
# Watch rollout progress in real-time
kubectl rollout status deployment/api-gateway

# View rollout history
kubectl rollout history deployment/api-gateway

# View specific revision details
kubectl rollout history deployment/api-gateway --revision=5

# Pause a rollout (for canary-style validation)
kubectl rollout pause deployment/api-gateway

# Resume a paused rollout
kubectl rollout resume deployment/api-gateway

# Rollback to previous revision
kubectl rollout undo deployment/api-gateway

# Rollback to specific revision
kubectl rollout undo deployment/api-gateway --to-revision=3

# Restart all pods (useful for config changes)
kubectl rollout restart deployment/api-gateway
```

Moving Deployments from development to production requires careful attention to reliability, security, and operational concerns. This section covers the essential hardening techniques that distinguish production-grade configurations.
```yaml
# Pod Disruption Budget - ensures minimum availability
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-gateway-pdb
spec:
  minAvailable: 2   # At least 2 pods must remain during disruptions
  # Alternative: maxUnavailable: 1
  selector:
    matchLabels:
      app: api-gateway
---
# Topology Spread - distribution across zones and nodes
spec:
  template:
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: api-gateway
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: api-gateway
---
# Security Context - container hardening
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name: api-gateway
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: tmp
          mountPath: /tmp
      volumes:
      - name: tmp
        emptyDir: {}   # Writable temp directory
```

When Deployments don't behave as expected, systematic debugging is essential. Understanding the event flow and knowing where to look can dramatically reduce troubleshooting time.
```bash
# 1. Check Deployment status and conditions
kubectl describe deployment api-gateway
# Key things to look for:
# - Conditions: Available, Progressing
# - Events: any warnings or errors
# - Replicas: desired vs current vs ready

# 2. Check ReplicaSet status
kubectl get replicaset -l app=api-gateway
kubectl describe replicaset <replicaset-name>

# 3. Check Pod status and events
kubectl get pods -l app=api-gateway
kubectl describe pod <pod-name>
# Key things to look for:
# - Status: CrashLoopBackOff, ImagePullBackOff, Pending
# - Events: scheduling failures, pull errors
# - Containers: Ready state, restart count

# 4. Check container logs
kubectl logs <pod-name> -c api-gateway
kubectl logs <pod-name> -c api-gateway --previous   # Previous container

# 5. Interactive debugging
kubectl exec -it <pod-name> -- /bin/sh

# 6. Check resource metrics
kubectl top pods -l app=api-gateway

# 7. Check events across namespace
kubectl get events --sort-by=.metadata.creationTimestamp
```

| Symptom | Likely Cause | Solution |
|---|---|---|
| Pods stuck in Pending | Insufficient resources or node selector doesn't match | Check resource requests, node labels, and taints/tolerations |
| Pods in CrashLoopBackOff | Application crashes immediately after starting | Check logs, verify environment variables and configs |
| Pods in ImagePullBackOff | Image doesn't exist or registry auth failed | Verify image name/tag, check imagePullSecrets |
| Rollout stuck at 0 available | Pods failing readiness or liveness probes | Check probe endpoints, increase timeouts if app is slow |
| Slow rollouts | Pods taking too long to become ready | Tune readiness probe timing, check initialDelaySeconds (see the sketch after this table) |
| Deployment not updating | No changes to pod template (only metadata) | Ensure spec.template is modified, not just deployment metadata |
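For the "Slow rollouts" row, the usual fix is to give the readiness probe more room before the rollout is judged stuck. The values below are illustrative starting points for an application that needs roughly a minute to warm up, not recommendations for every workload.

```yaml
# Pod template fragment: relaxed readiness probe for a slow-starting app
# (timing values are hypothetical examples; tune to your app's real startup time)
readinessProbe:
  httpGet:
    path: /health/ready
    port: http
  initialDelaySeconds: 30   # wait longer before the first check
  periodSeconds: 10         # probe less frequently
  timeoutSeconds: 5
  failureThreshold: 6       # tolerate ~60s of failed checks before marking unready
```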
Let's consolidate the essential knowledge about Kubernetes Deployments:

- Deployments manage stateless applications declaratively: you declare the desired state, and layered controllers (Deployment → ReplicaSet → kubelet) continuously reconcile toward it.
- Scaling can be manual (kubectl scale) or automatic (HPA on resource or custom metrics, KEDA for event-driven workloads); configure scale-up and scale-down behavior explicitly to avoid oscillation.
- RollingUpdate is the default zero-downtime strategy, bounded by maxUnavailable and maxSurge; Recreate trades brief downtime for a guarantee that only one version runs at a time.
- Production hardening includes PodDisruptionBudgets, topology spread constraints, security contexts, resource requests and limits, and well-tuned health probes.
- Troubleshooting follows the object chain: Deployment conditions, then ReplicaSets, then Pod events, then container logs.
What's next:
Now that you understand Deployments for stateless applications, we'll explore StatefulSets—Kubernetes' solution for applications that require stable network identities, ordered deployment, and persistent storage. StatefulSets introduce fundamentally different guarantees that are essential for databases, distributed caches, and message brokers.
You now have a comprehensive understanding of Kubernetes Deployments. You can configure production-grade stateless applications, implement scaling strategies, perform zero-downtime updates, and troubleshoot common issues. Next, we'll tackle the more complex world of stateful workloads with StatefulSets.