In the era of continuous delivery, the ability to update production systems without service interruption is not a luxury—it's a fundamental operational requirement. Rolling deployments represent the most widely adopted strategy for achieving this goal, providing a balanced approach between deployment velocity and risk management.
Every major technology organization—from hyperscalers like Google and Amazon to fast-moving startups—relies on rolling deployments as their default release mechanism. Understanding the mechanics, configuration parameters, and failure modes of rolling deployments is essential knowledge for any engineer responsible for production systems.
By the end of this page, you will understand the complete mechanics of rolling deployments: how instances are replaced, how configuration parameters control the rollout behavior, how health checks gate progression, and how to design rolling deployments that balance speed with safety. You'll be equipped to configure rolling deployments for any orchestration platform.
A rolling deployment is a release strategy that incrementally replaces instances of the previous version of an application with the new version. Rather than stopping all existing instances and starting new ones simultaneously, the deployment proceeds in waves—each wave replacing a subset of instances while the remainder continue serving traffic.
The fundamental principle:
At any point during a rolling deployment, the system maintains a mix of old and new versions, with the total capacity remaining sufficient to handle the expected load. As each new instance proves healthy, another old instance can be removed.
The core constraint in rolling deployments is maintaining adequate capacity throughout the transition. If you have 10 instances serving 10,000 requests per second, you cannot allow capacity to drop below what's needed to handle that load during deployment. This constraint drives all configuration decisions.
Visual model of a rolling deployment:
Consider a service with 6 replicas. A rolling deployment with maxSurge=2 and maxUnavailable=1 proceeds as follows:
The key insight is that capacity never drops below 5 instances (original 6 minus maxUnavailable of 1), and temporarily surges to 8 instances (original 6 plus maxSurge of 2).
| Phase | v1 Instances | v2 Instances | Total Capacity | Status |
|---|---|---|---|---|
| Initial | 6 | 0 | 6 | Stable on v1 |
| Surge | 6 | 2 (starting) | 6-8 | Creating new pods |
| Replace 1 | 5 | 2 (ready) | 7 | First v1 terminated |
| Replace 2 | 4 | 3 | 7 | Continuing rollout |
| Replace 3 | 3 | 4 | 7 | Halfway complete |
| Replace 4 | 2 | 5 | 7 | Nearing completion |
| Replace 5 | 1 | 6 | 7 | Final v1 remaining |
| Final | 0 | 6 | 6 | Stable on v2 |
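To make the arithmetic explicit, here is a minimal sketch (not tied to any orchestrator API) that computes the capacity bounds for a given configuration and checks them against the earlier load example; the per-instance throughput figure is a hypothetical assumption:

```go
package main

import "fmt"

// rolloutBounds returns the minimum and maximum number of instances that can
// exist at any point during a rolling update, per the maxSurge/maxUnavailable
// semantics described above.
func rolloutBounds(replicas, maxSurge, maxUnavailable int) (min, max int) {
	return replicas - maxUnavailable, replicas + maxSurge
}

func main() {
	// The 6-replica example from the table: maxSurge=2, maxUnavailable=1.
	min, max := rolloutBounds(6, 2, 1)
	fmt.Printf("capacity stays between %d and %d instances\n", min, max) // 5 and 8

	// Capacity check for the 10-instance, 10,000 rps example, assuming
	// (hypothetically) that each instance handles 1,250 requests per second.
	const perInstanceRPS = 1250
	min, _ = rolloutBounds(10, 2, 1)
	fmt.Printf("worst-case capacity: %d rps against a 10,000 rps load\n", min*perInstanceRPS)
}
```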
Rolling deployments are controlled by a small set of parameters that have profound implications for deployment behavior. Understanding these parameters—and their interactions—is critical for configuring deployments that meet your reliability and velocity requirements.
The two fundamental parameters are maxSurge (how many instances may be created above the desired replica count) and maxUnavailable (how many instances may be missing below it). The annotated Deployment manifest below shows them alongside the supporting timing parameters:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  labels:
    app: payment-service
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      # Allow 2 extra instances during deployment
      # Enables faster rollouts by parallelizing updates
      maxSurge: 2
      # Allow at most 1 instance to be unavailable
      # Ensures capacity never drops below 9 instances
      maxUnavailable: 1
  # Pod must be ready for 30 seconds before considered available
  # Protects against instances that pass initial checks but fail under load
  minReadySeconds: 30
  # Deployment must make progress within 10 minutes
  # Marks the rollout as failed if it stalls (it does not roll back automatically)
  progressDeadlineSeconds: 600
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment-service
          image: payment-service:v2.3.0
          ports:
            - containerPort: 8080
          # Readiness probe gates traffic routing
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          # Liveness probe detects crashed instances
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
```

Parameter interactions and trade-offs:
The relationship between maxSurge and maxUnavailable creates a trade-off space that balances three concerns: deployment speed, resource overhead, and capacity safety.
| Configuration | Speed | Resource Cost | Capacity Risk |
|---|---|---|---|
| maxSurge=0, maxUnavailable=1 | Slowest | Minimal | Capacity dips by one instance at a time |
| maxSurge=25%, maxUnavailable=0 | Medium | 25% extra capacity | Zero capacity loss |
| maxSurge=50%, maxUnavailable=25% | Fast | 50% extra capacity | Up to 25% capacity reduction |
| maxSurge=100%, maxUnavailable=0 | Fastest | 100% extra capacity | Full capacity maintained |
Setting both maxSurge and maxUnavailable to 0 creates an impossible constraint—the deployment cannot make progress because it can neither create new instances nor remove old ones. Orchestrators will reject this configuration.
Rolling deployments rely on health checks to determine when new instances are ready to receive traffic and when old instances can be safely terminated. Improperly configured health checks are one of the most common causes of deployment incidents: they either allow unhealthy instances to receive traffic or prevent healthy instances from being used.
The two-probe model:
Modern orchestration platforms distinguish between two types of health checks with different purposes: liveness probes answer "should this process be restarted?", while readiness probes answer "should this instance receive traffic?" (Kubernetes adds a third, the startup probe, to give slow-starting applications extra time before liveness checks begin.) The handlers below illustrate all three:
```go
package main

import (
	"context"
	"database/sql"
	"net/http"
	"sync/atomic"
	"time"

	"github.com/redis/go-redis/v9"
)

type HealthChecker struct {
	db           *sql.DB
	redis        *redis.Client
	ready        atomic.Bool
	shuttingDown atomic.Bool
}

// LivenessHandler checks if the process is fundamentally healthy.
// Should ONLY fail for unrecoverable states like deadlocks or corruption.
// Do NOT check external dependencies—if DB is down, restarting this
// instance won't fix it.
func (h *HealthChecker) LivenessHandler(w http.ResponseWriter, r *http.Request) {
	// Check for application-level health
	// Examples: goroutine leaks, memory corruption, deadlocks
	if h.shuttingDown.Load() {
		// Graceful shutdown in progress—let it complete
		w.WriteHeader(http.StatusOK)
		return
	}

	// Simple health check—can the application respond at all?
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("alive"))
}

// ReadinessHandler checks if the instance should receive traffic.
// Should check all dependencies needed to serve requests.
// Temporary failures are acceptable—instance stays running.
func (h *HealthChecker) ReadinessHandler(w http.ResponseWriter, r *http.Request) {
	ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
	defer cancel()

	// Check if we're shutting down
	if h.shuttingDown.Load() {
		http.Error(w, "shutting down", http.StatusServiceUnavailable)
		return
	}

	// Check database connectivity
	if err := h.db.PingContext(ctx); err != nil {
		http.Error(w, "database unavailable", http.StatusServiceUnavailable)
		return
	}

	// Check Redis connectivity
	if err := h.redis.Ping(ctx).Err(); err != nil {
		http.Error(w, "redis unavailable", http.StatusServiceUnavailable)
		return
	}

	// Check if warmed up (caches populated, connections established)
	if !h.ready.Load() {
		http.Error(w, "not yet ready", http.StatusServiceUnavailable)
		return
	}

	w.WriteHeader(http.StatusOK)
	w.Write([]byte("ready"))
}

// StartupHandler is for slow-starting applications.
// Gives more time for initial startup without affecting liveness.
func (h *HealthChecker) StartupHandler(w http.ResponseWriter, r *http.Request) {
	if h.ready.Load() {
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("started"))
		return
	}
	http.Error(w, "still starting", http.StatusServiceUnavailable)
}
```

If your liveness probe checks database connectivity and the database goes down, the orchestrator will restart all instances simultaneously—making a bad situation catastrophically worse. Liveness probes should only detect local, unrecoverable failures.
Timing parameters for health checks:
| Parameter | Recommended Value | Purpose |
|---|---|---|
| initialDelaySeconds | Application-dependent | Wait for application to start before probing |
| periodSeconds | 5-10 seconds | How often to run the probe |
| timeoutSeconds | 1-3 seconds | Maximum time for probe to respond |
| successThreshold | 1 | Consecutive successes to be considered healthy |
| failureThreshold | 3 | Consecutive failures to be considered unhealthy |
The math of failure detection time:
With periodSeconds=5 and failureThreshold=3, an unhealthy instance is detected in:
periodSeconds × failureThreshold = 5 × 3 = 15 seconds
During these 15 seconds, traffic may be routed to the failing instance. Balance detection speed against false positives from transient issues.
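As a quick sanity check, here is a tiny helper (a sketch, not part of any Kubernetes client library) that computes this worst-case detection window from the probe settings:

```go
package main

import (
	"fmt"
	"time"
)

// detectionWindow returns the worst-case time between an instance becoming
// unhealthy and the orchestrator marking it not-ready, given the readiness
// probe period and failure threshold.
func detectionWindow(period time.Duration, failureThreshold int) time.Duration {
	return period * time.Duration(failureThreshold)
}

func main() {
	// periodSeconds=5, failureThreshold=3 => 15s, matching the example above.
	fmt.Println(detectionWindow(5*time.Second, 3))
}
```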
A critical but often overlooked aspect of rolling deployments is how terminating instances handle in-flight requests. Without proper graceful shutdown, rolling deployments cause user-facing errors as connections are abruptly terminated.
The shutdown sequence:
When Kubernetes (or any orchestrator) decides to terminate a pod, a carefully orchestrated sequence begins: the pod is marked Terminating and removed from Service endpoints, any preStop hook runs, the container receives SIGTERM, the application gets terminationGracePeriodSeconds to finish in-flight work, and only then is SIGKILL sent to anything still running.
There's a race condition between the application receiving SIGTERM and the endpoints being updated in all kube-proxies. The application may receive new connections for several seconds after beginning shutdown. The solution: wait for a short period after receiving SIGTERM before stopping the listener.
```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	server := &http.Server{
		Addr:         ":8080",
		Handler:      createHandler(), // application routes, defined elsewhere
		ReadTimeout:  10 * time.Second,
		WriteTimeout: 30 * time.Second,
		IdleTimeout:  60 * time.Second,
	}

	// Channel to receive shutdown signals
	shutdown := make(chan os.Signal, 1)
	signal.Notify(shutdown, syscall.SIGTERM, syscall.SIGINT)

	// Start server in goroutine
	go func() {
		log.Printf("Server starting on %s", server.Addr)
		if err := server.ListenAndServe(); err != http.ErrServerClosed {
			log.Fatalf("Server error: %v", err)
		}
	}()

	// Wait for shutdown signal
	<-shutdown
	log.Println("Shutdown signal received, beginning graceful shutdown...")

	// CRITICAL: Wait for endpoints to propagate
	// This gives kube-proxy time to update iptables rules
	// and stop routing new traffic to this pod
	log.Println("Waiting for endpoint propagation...")
	time.Sleep(5 * time.Second)

	// Create shutdown context with timeout
	// This should be less than terminationGracePeriodSeconds
	ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
	defer cancel()

	// Shutdown server gracefully
	// This stops accepting new connections and waits for existing ones
	log.Println("Shutting down HTTP server...")
	if err := server.Shutdown(ctx); err != nil {
		log.Printf("Graceful shutdown error: %v", err)
		// Force close remaining connections
		server.Close()
	}

	// Cleanup other resources (database connections, message queues, etc.)
	log.Println("Closing database connections...")
	// db.Close()

	log.Println("Shutdown complete")
}
```

Configuring terminationGracePeriodSeconds:
The terminationGracePeriodSeconds value should be long enough to:

- Cover the endpoint-propagation wait after SIGTERM (the 5-second sleep above)
- Allow in-flight requests to complete within the server's shutdown timeout
- Close databases, message queues, and other resources cleanly
- Leave a few seconds of buffer before the orchestrator sends SIGKILL
A typical configuration for a web service:
```yaml
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: app
      # Application should complete shutdown within 55 seconds
      # (leaving 5 seconds of buffer before SIGKILL)
```
Long-running operations:
For services with long-running operations (background jobs, WebSocket connections, streaming responses), you may need:

- A longer terminationGracePeriodSeconds than the typical 30-60 seconds
- Explicit draining of persistent connections, asking clients to reconnect elsewhere
- Checkpointing or handing off background jobs so another instance can resume them (see the sketch after this list)
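A minimal sketch of the job-draining idea, assuming the service tracks background work with a sync.WaitGroup; the names drainJobs and drainTimeout are illustrative, not from any library:

```go
package main

import (
	"log"
	"sync"
	"time"
)

// drainJobs waits for in-flight background work to finish, up to a deadline.
// Real services might instead checkpoint job state so another instance can
// resume it after this one exits.
func drainJobs(jobs *sync.WaitGroup, drainTimeout time.Duration) {
	done := make(chan struct{})
	go func() {
		jobs.Wait() // blocks until every tracked job has called Done()
		close(done)
	}()

	select {
	case <-done:
		log.Println("all background jobs finished")
	case <-time.After(drainTimeout):
		log.Println("drain timeout reached; remaining work will be cut off by SIGKILL")
	}
}
```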
During a rolling deployment, your system runs multiple versions simultaneously. This creates a critical requirement: both the old and new versions must be able to operate correctly in the same environment at the same time.
This constraint affects:

- API request and response contracts between services and clients
- Message and event formats flowing through queues and streams
- Database schemas shared by both versions
- Any other shared persisted state, such as cache entries or serialized sessions
Any breaking change must be deployed in at least two phases: first deploy code that can handle both old and new formats, then deploy code that only produces the new format. This ensures there's always a version running that can interpret every message.
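As a sketch of the first phase, here is a consumer that tolerates both formats during the mixed-version window; the event type and field names are hypothetical:

```go
package main

import "encoding/json"

// OrderEvent is a hypothetical message mid-migration from a float "amount"
// field (old producers, dollars) to "amount_cents" (new producers). During
// the rollout the consumer must accept both shapes.
type OrderEvent struct {
	OrderID     string   `json:"order_id"`
	Amount      *float64 `json:"amount,omitempty"`       // old format
	AmountCents *int64   `json:"amount_cents,omitempty"` // new format
}

// amountCents normalizes either format to integer cents.
func (e OrderEvent) amountCents() int64 {
	if e.AmountCents != nil {
		return *e.AmountCents
	}
	if e.Amount != nil {
		return int64(*e.Amount * 100)
	}
	return 0
}

func parseOrderEvent(payload []byte) (OrderEvent, error) {
	var evt OrderEvent
	err := json.Unmarshal(payload, &evt)
	return evt, err
}
```

Only after every producer and consumer runs this tolerant code is it safe to ship a release that emits the new format exclusively.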
Database schema changes:
Schema migrations during rolling deployments require careful sequencing. Consider adding a new required column:
| Phase | Deployment | Schema State | Compatibility |
|---|---|---|---|
| 1 | No change | Original schema | Baseline |
| 2 | Add column as nullable | Column exists, nullable | Old code ignores, new code uses |
| 3 | Backfill existing rows | All rows have values | Old code ignores, new code uses |
| 4 | Deploy code requiring column | Column required | New code depends on column |
| 5 | Add NOT NULL constraint | Schema final | Schema enforces requirement |
Each phase is a separate deployment. Skipping phases causes failures during the mixed-version window.
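A sketch of what the schema-changing phases might look like with database/sql and PostgreSQL-style SQL; the users table and phone_number column are hypothetical:

```go
package main

import (
	"context"
	"database/sql"
)

// Each statement ships in a separate release so that a mixed-version fleet
// never sees a schema it cannot handle.

// Phase 2: add the column as nullable; old code simply ignores it.
func addColumnNullable(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx,
		`ALTER TABLE users ADD COLUMN phone_number TEXT NULL`)
	return err
}

// Phase 3: backfill existing rows before any code requires the column.
func backfill(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx,
		`UPDATE users SET phone_number = '' WHERE phone_number IS NULL`)
	return err
}

// Phase 5: only after the code that writes the column is fully rolled out,
// tighten the schema.
func addNotNull(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx,
		`ALTER TABLE users ALTER COLUMN phone_number SET NOT NULL`)
	return err
}
```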
The same phased approach applies to API contracts:

```typescript
// Example: Safe field renaming through API versioning

interface User {
  username: string;
  email: string;
}

// Phase 1: Original API response
interface UserResponseV1 {
  user_name: string; // Old field name
  email: string;
}

// Phase 2: Transitional API response (supports both)
interface UserResponseV2 {
  user_name: string; // Deprecated but still present
  username: string;  // New field name
  email: string;
}

// Backend returns both during transition
function getUserResponse(user: User): UserResponseV2 {
  return {
    user_name: user.username, // For old clients
    username: user.username,  // For new clients
    email: user.email,
  };
}

// Client code should prefer new field with fallback
function parseUserResponse(response: UserResponseV1 | UserResponseV2): string {
  // New field takes precedence, fall back to old field
  return 'username' in response ? response.username : response.user_name;
}

// Phase 3: After all clients updated, remove old field
interface UserResponseV3 {
  username: string;
  email: string;
}
```

Rolling deployments require enhanced observability to detect issues before they affect all instances. The gradual nature of rollouts is only beneficial if you can detect problems early and halt the deployment.
Key metrics to monitor during rollouts:
```promql
# Error rate comparison between versions
# Should be approximately equal—significant difference indicates regression

# Error rate for new version (last 5 minutes)
sum(rate(http_requests_total{app="payment-service", version="v2.3.0", status=~"5.."}[5m]))
/
sum(rate(http_requests_total{app="payment-service", version="v2.3.0"}[5m]))

# Error rate for old version (last 5 minutes)
sum(rate(http_requests_total{app="payment-service", version="v2.2.0", status=~"5.."}[5m]))
/
sum(rate(http_requests_total{app="payment-service", version="v2.2.0"}[5m]))

# Latency comparison: p99 by version
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket{app="payment-service"}[5m])) by (le, version)
)

# Memory usage growth rate (detect memory leaks)
rate(container_memory_working_set_bytes{
  pod=~"payment-service.*",
  container="payment-service"
}[15m])

# Pod restart count (crash loops)
sum(kube_pod_container_status_restarts_total{
  pod=~"payment-service.*"
}) by (pod)

# Deployment progress
kube_deployment_status_replicas_updated{deployment="payment-service"}
/
kube_deployment_spec_replicas{deployment="payment-service"}
```

All application metrics must include a version label to enable comparison during rollouts. Without version labels, you can only see aggregate metrics that blend old and new instances—hiding problems until they're widespread.
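One way to attach the version label, sketched with the Prometheus client_golang library; injecting version at build time via -ldflags is an assumption about the build setup:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// version would typically be stamped at build time, e.g. via -ldflags.
var version = "v2.3.0"

// httpRequests carries app and version as constant labels, so every series
// this instance emits can be compared against the other version mid-rollout.
var httpRequests = promauto.NewCounterVec(prometheus.CounterOpts{
	Name:        "http_requests_total",
	Help:        "HTTP requests by status code.",
	ConstLabels: prometheus.Labels{"app": "payment-service", "version": version},
}, []string{"status"})

func main() {
	http.Handle("/metrics", promhttp.Handler())
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		httpRequests.WithLabelValues("200").Inc()
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```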
Alerting during deployments:
Configure alerts that fire during deployments with lower thresholds than normal operations:
| Metric | Normal Threshold | Deployment Threshold | Rationale |
|---|---|---|---|
| Error rate | >1% for 5 min | >0.1% for 2 min | Catch issues faster during deployment |
| p99 latency | >500ms for 5 min | >300ms for 2 min | Detect latency regression early |
| Pod restarts | >3 in 10 min | >1 in 5 min | Any restart during deployment is suspicious |
| Memory growth | >10% in 1 hour | >5% in 15 min | Memory leaks manifest during restart |
These deployment-specific alerts should automatically activate when a deployment begins and deactivate when it completes.
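One possible shape for that switch, as a sketch only: the thresholds mirror the table above, and how deploymentInProgress gets set (CD-system webhook, Kubernetes API watch) is an assumption left to the surrounding system:

```go
package main

// Thresholds holds the limits an alert evaluator applies.
type Thresholds struct {
	MaxErrorRate    float64 // fraction of requests, e.g. 0.01 == 1%
	MaxP99LatencyMs float64
}

var (
	normalThresholds     = Thresholds{MaxErrorRate: 0.01, MaxP99LatencyMs: 500}
	deploymentThresholds = Thresholds{MaxErrorRate: 0.001, MaxP99LatencyMs: 300}
)

// activeThresholds picks which limits apply right now.
func activeThresholds(deploymentInProgress bool) Thresholds {
	if deploymentInProgress {
		return deploymentThresholds
	}
	return normalThresholds
}

// shouldAlert compares observed metrics against the active limits.
func shouldAlert(errorRate, p99LatencyMs float64, deploying bool) bool {
	t := activeThresholds(deploying)
	return errorRate > t.MaxErrorRate || p99LatencyMs > t.MaxP99LatencyMs
}
```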
A rolling deployment without a tested rollback plan is an incomplete deployment. You must know, before you begin, exactly how you'll revert if something goes wrong.
Rollbacks are triggered either automatically by the orchestrator (for example, when progressDeadlineSeconds is exceeded or new pods repeatedly fail their health checks) or manually by operators responding to alerts, metric regressions, or user reports. The commands below cover pausing a problematic rollout and reverting to an earlier revision:
```bash
#!/bin/bash
# Kubernetes Rolling Deployment Rollback Procedures

# View deployment history
kubectl rollout history deployment/payment-service

# Example output:
# REVISION  CHANGE-CAUSE
# 1         Initial deployment
# 2         kubectl set image deployment/payment-service payment-service=payment-service:v2.2.0
# 3         kubectl set image deployment/payment-service payment-service=payment-service:v2.3.0

# Rollback to previous version (revision 2)
kubectl rollout undo deployment/payment-service

# Rollback to specific revision
kubectl rollout undo deployment/payment-service --to-revision=1

# Check rollback status
kubectl rollout status deployment/payment-service

# Pause a problematic deployment mid-rollout
kubectl rollout pause deployment/payment-service

# Resume paused deployment
kubectl rollout resume deployment/payment-service

# View detailed rollout status
kubectl describe deployment payment-service | grep -A 20 "Conditions:"

# Example output during rollback:
# Conditions:
#   Type                Status  Reason
#   ----                ------  ------
#   Available           True    MinimumReplicasAvailable
#   Progressing         True    ReplicaSetUpdated
#   RollbackSuccessful  True    RollbackRevision=2
```

Rolling back the deployment reverts the application code, but it does NOT revert database migrations, configuration changes, or external state changes. You must have separate procedures for these. This is why breaking changes must be backward compatible—a rollback will run old code against new data.
Rollback decision matrix:
| Situation | Detection | Rollback Decision | Time Target |
|---|---|---|---|
| New pods won't start | Automatic (pending pods) | Automatic pause, investigate | < 5 minutes |
| Error rate spike | Manual/Alert | Immediate manual rollback | < 10 minutes |
| Latency regression | Manual/Alert | Evaluate severity, likely rollback | < 15 minutes |
| Memory leak | Gradual observation | Pause, investigate, usually rollback | < 30 minutes |
| Functional bug | User reports | Depends on severity and blast radius | Varies |
Post-rollback checklist:

- Confirm the rollback completed and all pods are running the previous version
- Verify error rates and latency have returned to baseline
- Capture logs, metrics, and the failing image tag before the evidence ages out
- Check that schema or configuration changes made for the new version still work with the old code
- Document the root cause and fix before attempting the deployment again
After years of running rolling deployments at scale, the industry has converged on a set of best practices that maximize reliability while maintaining development velocity.
```yaml
# PodDisruptionBudget ensures minimum availability during
# voluntary disruptions (node drains, cluster upgrades, etc.)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service-pdb
spec:
  # At least 80% of pods must remain available
  # For 10 replicas, at most 2 can be disrupted simultaneously
  minAvailable: "80%"
  # Alternative: maxUnavailable
  # maxUnavailable: 2
  selector:
    matchLabels:
      app: payment-service
---
# Complete production deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  annotations:
    # Record the cause of this deployment for rollback reference
    kubernetes.io/change-cause: "Deploy v2.3.0 - Add retry logic for external APIs"
spec:
  replicas: 10
  revisionHistoryLimit: 5  # Keep 5 revisions for rollback
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 1
  minReadySeconds: 30
  progressDeadlineSeconds: 600
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
        version: v2.3.0  # Version label for metrics
    spec:
      terminationGracePeriodSeconds: 60
      # Anti-affinity spreads pods across nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: payment-service
                topologyKey: kubernetes.io/hostname
      containers:
        - name: payment-service
          image: payment-service:v2.3.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
```

Rolling deployments are the workhorse of modern continuous delivery—reliable, well-understood, and supported by every major orchestration platform. Let's consolidate the key concepts:

- Instances are replaced in waves; maxSurge and maxUnavailable bound how far capacity can swing above or below the desired replica count.
- Readiness probes gate traffic and rollout progression; liveness probes should detect only local, unrecoverable failures.
- Graceful shutdown (endpoint-propagation wait, connection draining, terminationGracePeriodSeconds) prevents user-facing errors as old instances terminate.
- Old and new versions run side by side, so every change must be backward compatible across APIs, message formats, and database schemas.
- Version-labeled metrics and deployment-specific alerts catch regressions early; a tested rollback procedure lets you act on them quickly.
What's next:
Rolling deployments are powerful but limited—you can't easily test with real traffic before full rollout, and rollback is reactive rather than preventive. In the next page, we'll explore blue-green deployments, which maintain two complete environments and switch traffic atomically, enabling instant rollback and pre-production testing with production infrastructure.
You now understand rolling deployments at a deep level—from configuration parameters to graceful shutdown, from health check design to rollback procedures. This knowledge applies to Kubernetes, AWS ECS, Azure Kubernetes Service, and any modern orchestration platform.