In the era of continuous delivery, the ability to update production systems without service interruption is not a luxury—it's a fundamental operational requirement. Rolling deployments represent the most widely adopted strategy for achieving this goal, providing a balanced approach between deployment velocity and risk management.
Every major technology organization—from hyperscalers like Google and Amazon to fast-moving startups—relies on rolling deployments as their default release mechanism. Understanding the mechanics, configuration parameters, and failure modes of rolling deployments is essential knowledge for any engineer responsible for production systems.
By the end of this page, you will understand the complete mechanics of rolling deployments: how instances are replaced, how configuration parameters control the rollout behavior, how health checks gate progression, and how to design rolling deployments that balance speed with safety. You'll be equipped to configure rolling deployments for any orchestration platform.
A rolling deployment is a release strategy that incrementally replaces instances of the previous version of an application with the new version. Rather than stopping all existing instances and starting new ones simultaneously, the deployment proceeds in waves—each wave replacing a subset of instances while the remainder continue serving traffic.
The fundamental principle:
At any point during a rolling deployment, the system maintains a mix of old and new versions, with the total capacity remaining sufficient to handle the expected load. As each new instance proves healthy, another old instance can be removed.
The core constraint in rolling deployments is maintaining adequate capacity throughout the transition. If you have 10 instances serving 10,000 requests per second, you cannot allow capacity to drop below what's needed to handle that load during deployment. This constraint drives all configuration decisions.
Visual model of a rolling deployment:
Consider a service with 6 replicas. A rolling deployment with maxSurge=2 and maxUnavailable=1 proceeds as follows:
The key insight is that capacity never drops below 5 instances (original 6 minus maxUnavailable of 1), and temporarily surges to 8 instances (original 6 plus maxSurge of 2).
| Phase | v1 Instances | v2 Instances | Total Capacity | Status |
|---|---|---|---|---|
| Initial | 6 | 0 | 6 | Stable on v1 |
| Surge | 6 | 2 (starting) | 6-8 | Creating new pods |
| Replace 1 | 5 | 2 (ready) | 7 | First v1 terminated |
| Replace 2 | 4 | 3 | 7 | Continuing rollout |
| Replace 3 | 3 | 4 | 7 | Halfway complete |
| Replace 4 | 2 | 5 | 7 | Nearing completion |
| Replace 5 | 1 | 6 | 7 | Final v1 remaining |
| Final | 0 | 6 | 6 | Stable on v2 |
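To make the arithmetic explicit, here is a minimal sketch (not tied to any orchestrator API) that computes the capacity bounds for a given configuration and checks them against the earlier load example; the per-instance throughput figure is a hypothetical assumption:

```go
package main

import "fmt"

// rolloutBounds returns the minimum and maximum number of instances that can
// exist at any point during a rolling update, per the maxSurge/maxUnavailable
// semantics described above.
func rolloutBounds(replicas, maxSurge, maxUnavailable int) (min, max int) {
	return replicas - maxUnavailable, replicas + maxSurge
}

func main() {
	// The 6-replica example from the table: maxSurge=2, maxUnavailable=1.
	min, max := rolloutBounds(6, 2, 1)
	fmt.Printf("capacity stays between %d and %d instances\n", min, max) // 5 and 8

	// Capacity check for the 10-instance, 10,000 rps example, assuming
	// (hypothetically) that each instance handles 1,250 requests per second.
	const perInstanceRPS = 1250
	min, _ = rolloutBounds(10, 2, 1)
	fmt.Printf("worst-case capacity: %d rps against a 10,000 rps load\n", min*perInstanceRPS)
}
```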
Rolling deployments are controlled by a small set of parameters that have profound implications for deployment behavior. Understanding these parameters—and their interactions—is critical for configuring deployments that meet your reliability and velocity requirements.
The two fundamental parameters are maxSurge (how many instances may be created above the desired replica count) and maxUnavailable (how many instances may be missing below it). The annotated Deployment manifest below shows them alongside the supporting timing parameters:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  labels:
    app: payment-service
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      # Allow 2 extra instances during deployment
      # Enables faster rollouts by parallelizing updates
      maxSurge: 2
      # Allow at most 1 instance to be unavailable
      # Ensures capacity never drops below 9 instances
      maxUnavailable: 1
  # Pod must be ready for 30 seconds before considered available
  # Protects against instances that pass initial checks but fail under load
  minReadySeconds: 30
  # Deployment must make progress within 10 minutes
  # Marks the rollout as failed if it stalls (it does not roll back automatically)
  progressDeadlineSeconds: 600
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
    spec:
      containers:
        - name: payment-service
          image: payment-service:v2.3.0
          ports:
            - containerPort: 8080
          # Readiness probe gates traffic routing
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          # Liveness probe detects crashed instances
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
```

Parameter interactions and trade-offs:
The relationship between maxSurge and maxUnavailable creates a trade-off space that balances three concerns: deployment speed, resource overhead, and capacity safety.
| Configuration | Speed | Resource Cost | Capacity Risk |
|---|---|---|---|
| maxSurge=0, maxUnavailable=1 | Slowest | Minimal | Capacity dips by one instance at a time |
| maxSurge=25%, maxUnavailable=0 | Medium | 25% extra capacity | Zero capacity loss |
| maxSurge=50%, maxUnavailable=25% | Fast | 50% extra capacity | Up to 25% capacity reduction |
| maxSurge=100%, maxUnavailable=0 | Fastest | 100% extra capacity | Full capacity maintained |
Setting both maxSurge and maxUnavailable to 0 creates an impossible constraint—the deployment cannot make progress because it can neither create new instances nor remove old ones. Orchestrators will reject this configuration.
Rolling deployments rely on health checks to determine when new instances are ready to receive traffic and when old instances can be safely terminated. Improperly configured health checks are one of the most common causes of deployment incidents: they either allow unhealthy instances to receive traffic or prevent healthy instances from being used.
The two-probe model:
Modern orchestration platforms distinguish between two types of health checks with different purposes: liveness probes answer "should this process be restarted?", while readiness probes answer "should this instance receive traffic?" (Kubernetes adds a third, the startup probe, to give slow-starting applications extra time before liveness checks begin.) The handlers below illustrate all three:
```go
package main

import (
	"context"
	"database/sql"
	"net/http"
	"sync/atomic"
	"time"

	"github.com/redis/go-redis/v9"
)

type HealthChecker struct {
	db           *sql.DB
	redis        *redis.Client
	ready        atomic.Bool
	shuttingDown atomic.Bool
}

// LivenessHandler checks if the process is fundamentally healthy.
// Should ONLY fail for unrecoverable states like deadlocks or corruption.
// Do NOT check external dependencies—if DB is down, restarting this
// instance won't fix it.
func (h *HealthChecker) LivenessHandler(w http.ResponseWriter, r *http.Request) {
	// Check for application-level health
	// Examples: goroutine leaks, memory corruption, deadlocks
	if h.shuttingDown.Load() {
		// Graceful shutdown in progress—let it complete
		w.WriteHeader(http.StatusOK)
		return
	}

	// Simple health check—can the application respond at all?
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("alive"))
}

// ReadinessHandler checks if the instance should receive traffic.
// Should check all dependencies needed to serve requests.
// Temporary failures are acceptable—instance stays running.
func (h *HealthChecker) ReadinessHandler(w http.ResponseWriter, r *http.Request) {
	ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
	defer cancel()

	// Check if we're shutting down
	if h.shuttingDown.Load() {
		http.Error(w, "shutting down", http.StatusServiceUnavailable)
		return
	}

	// Check database connectivity
	if err := h.db.PingContext(ctx); err != nil {
		http.Error(w, "database unavailable", http.StatusServiceUnavailable)
		return
	}

	// Check Redis connectivity
	if err := h.redis.Ping(ctx).Err(); err != nil {
		http.Error(w, "redis unavailable", http.StatusServiceUnavailable)
		return
	}

	// Check if warmed up (caches populated, connections established)
	if !h.ready.Load() {
		http.Error(w, "not yet ready", http.StatusServiceUnavailable)
		return
	}

	w.WriteHeader(http.StatusOK)
	w.Write([]byte("ready"))
}

// StartupHandler is for slow-starting applications.
// Gives more time for initial startup without affecting liveness.
func (h *HealthChecker) StartupHandler(w http.ResponseWriter, r *http.Request) {
	if h.ready.Load() {
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("started"))
		return
	}
	http.Error(w, "still starting", http.StatusServiceUnavailable)
}
```

If your liveness probe checks database connectivity and the database goes down, the orchestrator will restart all instances simultaneously—making a bad situation catastrophically worse. Liveness probes should only detect local, unrecoverable failures.
Timing parameters for health checks:
| Parameter | Recommended Value | Purpose |
|---|---|---|
| initialDelaySeconds | Application-dependent | Wait for application to start before probing |
| periodSeconds | 5-10 seconds | How often to run the probe |
| timeoutSeconds | 1-3 seconds | Maximum time for probe to respond |
| successThreshold | 1 | Consecutive successes to be considered healthy |
| failureThreshold | 3 | Consecutive failures to be considered unhealthy |
The math of failure detection time:
With periodSeconds=5 and failureThreshold=3, an unhealthy instance is detected in:
periodSeconds × failureThreshold = 5 × 3 = 15 seconds
During these 15 seconds, traffic may be routed to the failing instance. Balance detection speed against false positives from transient issues.
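As a quick sanity check, here is a tiny helper (a sketch, not part of any Kubernetes client library) that computes this worst-case detection window from the probe settings:

```go
package main

import (
	"fmt"
	"time"
)

// detectionWindow returns the worst-case time between an instance becoming
// unhealthy and the orchestrator marking it not-ready, given the readiness
// probe period and failure threshold.
func detectionWindow(period time.Duration, failureThreshold int) time.Duration {
	return period * time.Duration(failureThreshold)
}

func main() {
	// periodSeconds=5, failureThreshold=3 => 15s, matching the example above.
	fmt.Println(detectionWindow(5*time.Second, 3))
}
```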
A critical but often overlooked aspect of rolling deployments is how terminating instances handle in-flight requests. Without proper graceful shutdown, rolling deployments cause user-facing errors as connections are abruptly terminated.
The shutdown sequence:
When Kubernetes (or any orchestrator) decides to terminate a pod, a carefully orchestrated sequence begins: the pod is marked Terminating and removed from Service endpoints, any preStop hook runs, the container receives SIGTERM, the application gets terminationGracePeriodSeconds to finish in-flight work, and only then is SIGKILL sent to anything still running.
There's a race condition between the application receiving SIGTERM and the endpoints being updated in all kube-proxies. The application may receive new connections for several seconds after beginning shutdown. The solution: wait for a short period after receiving SIGTERM before stopping the listener.
```go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	server := &http.Server{
		Addr:         ":8080",
		Handler:      createHandler(), // application routes, defined elsewhere
		ReadTimeout:  10 * time.Second,
		WriteTimeout: 30 * time.Second,
		IdleTimeout:  60 * time.Second,
	}

	// Channel to receive shutdown signals
	shutdown := make(chan os.Signal, 1)
	signal.Notify(shutdown, syscall.SIGTERM, syscall.SIGINT)

	// Start server in goroutine
	go func() {
		log.Printf("Server starting on %s", server.Addr)
		if err := server.ListenAndServe(); err != http.ErrServerClosed {
			log.Fatalf("Server error: %v", err)
		}
	}()

	// Wait for shutdown signal
	<-shutdown
	log.Println("Shutdown signal received, beginning graceful shutdown...")

	// CRITICAL: Wait for endpoints to propagate
	// This gives kube-proxy time to update iptables rules
	// and stop routing new traffic to this pod
	log.Println("Waiting for endpoint propagation...")
	time.Sleep(5 * time.Second)

	// Create shutdown context with timeout
	// This should be less than terminationGracePeriodSeconds
	ctx, cancel := context.WithTimeout(context.Background(), 25*time.Second)
	defer cancel()

	// Shutdown server gracefully
	// This stops accepting new connections and waits for existing ones
	log.Println("Shutting down HTTP server...")
	if err := server.Shutdown(ctx); err != nil {
		log.Printf("Graceful shutdown error: %v", err)
		// Force close remaining connections
		server.Close()
	}

	// Cleanup other resources (database connections, message queues, etc.)
	log.Println("Closing database connections...")
	// db.Close()

	log.Println("Shutdown complete")
}
```

Configuring terminationGracePeriodSeconds:
The terminationGracePeriodSeconds value should be long enough to:

- Cover the endpoint-propagation wait after SIGTERM (the 5-second sleep above)
- Allow in-flight requests to complete within the server's shutdown timeout
- Close databases, message queues, and other resources cleanly
- Leave a few seconds of buffer before the orchestrator sends SIGKILL
A typical configuration for a web service:
```yaml
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: app
      # Application should complete shutdown within 55 seconds
      # (leaving 5 seconds of buffer before SIGKILL)
```
Long-running operations:
For services with long-running operations (background jobs, WebSocket connections, streaming responses), you may need:

- A longer terminationGracePeriodSeconds than the typical 30-60 seconds
- Explicit draining of persistent connections, asking clients to reconnect elsewhere
- Checkpointing or handing off background jobs so another instance can resume them (see the sketch after this list)
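A minimal sketch of the job-draining idea, assuming the service tracks background work with a sync.WaitGroup; the names drainJobs and drainTimeout are illustrative, not from any library:

```go
package main

import (
	"log"
	"sync"
	"time"
)

// drainJobs waits for in-flight background work to finish, up to a deadline.
// Real services might instead checkpoint job state so another instance can
// resume it after this one exits.
func drainJobs(jobs *sync.WaitGroup, drainTimeout time.Duration) {
	done := make(chan struct{})
	go func() {
		jobs.Wait() // blocks until every tracked job has called Done()
		close(done)
	}()

	select {
	case <-done:
		log.Println("all background jobs finished")
	case <-time.After(drainTimeout):
		log.Println("drain timeout reached; remaining work will be cut off by SIGKILL")
	}
}
```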
During a rolling deployment, your system runs multiple versions simultaneously. This creates a critical requirement: both the old and new versions must be able to operate correctly in the same environment at the same time.
This constraint affects:

- API request and response contracts between services and clients
- Message and event formats flowing through queues and streams
- Database schemas shared by both versions
- Any other shared persisted state, such as cache entries or serialized sessions
Any breaking change must be deployed in at least two phases: first deploy code that can handle both old and new formats, then deploy code that only produces the new format. This ensures there's always a version running that can interpret every message.
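As a sketch of the first phase, here is a consumer that tolerates both formats during the mixed-version window; the event type and field names are hypothetical:

```go
package main

import "encoding/json"

// OrderEvent is a hypothetical message mid-migration from a float "amount"
// field (old producers, dollars) to "amount_cents" (new producers). During
// the rollout the consumer must accept both shapes.
type OrderEvent struct {
	OrderID     string   `json:"order_id"`
	Amount      *float64 `json:"amount,omitempty"`       // old format
	AmountCents *int64   `json:"amount_cents,omitempty"` // new format
}

// amountCents normalizes either format to integer cents.
func (e OrderEvent) amountCents() int64 {
	if e.AmountCents != nil {
		return *e.AmountCents
	}
	if e.Amount != nil {
		return int64(*e.Amount * 100)
	}
	return 0
}

func parseOrderEvent(payload []byte) (OrderEvent, error) {
	var evt OrderEvent
	err := json.Unmarshal(payload, &evt)
	return evt, err
}
```

Only after every producer and consumer runs this tolerant code is it safe to ship a release that emits the new format exclusively.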
Database schema changes:
Schema migrations during rolling deployments require careful sequencing. Consider adding a new required column:
| Phase | Deployment | Schema State | Compatibility |
|---|---|---|---|
| 1 | No change | Original schema | Baseline |
| 2 | Add column as nullable | Column exists, nullable | Old code ignores, new code uses |
| 3 | Backfill existing rows | All rows have values | Old code ignores, new code uses |
| 4 | Deploy code requiring column | Column required | New code depends on column |
| 5 | Add NOT NULL constraint | Schema final | Schema enforces requirement |
Each phase is a separate deployment. Skipping phases causes failures during the mixed-version window.
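A sketch of what the schema-changing phases might look like with database/sql and PostgreSQL-style SQL; the users table and phone_number column are hypothetical:

```go
package main

import (
	"context"
	"database/sql"
)

// Each statement ships in a separate release so that a mixed-version fleet
// never sees a schema it cannot handle.

// Phase 2: add the column as nullable; old code simply ignores it.
func addColumnNullable(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx,
		`ALTER TABLE users ADD COLUMN phone_number TEXT NULL`)
	return err
}

// Phase 3: backfill existing rows before any code requires the column.
func backfill(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx,
		`UPDATE users SET phone_number = '' WHERE phone_number IS NULL`)
	return err
}

// Phase 5: only after the code that writes the column is fully rolled out,
// tighten the schema.
func addNotNull(ctx context.Context, db *sql.DB) error {
	_, err := db.ExecContext(ctx,
		`ALTER TABLE users ALTER COLUMN phone_number SET NOT NULL`)
	return err
}
```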
The same phased approach applies to API contracts:

```typescript
// Example: Safe field renaming through API versioning

interface User {
  username: string;
  email: string;
}

// Phase 1: Original API response
interface UserResponseV1 {
  user_name: string; // Old field name
  email: string;
}

// Phase 2: Transitional API response (supports both)
interface UserResponseV2 {
  user_name: string; // Deprecated but still present
  username: string;  // New field name
  email: string;
}

// Backend returns both during transition
function getUserResponse(user: User): UserResponseV2 {
  return {
    user_name: user.username, // For old clients
    username: user.username,  // For new clients
    email: user.email,
  };
}

// Client code should prefer new field with fallback
function parseUserResponse(response: UserResponseV1 | UserResponseV2): string {
  // New field takes precedence, fall back to old field
  return 'username' in response ? response.username : response.user_name;
}

// Phase 3: After all clients updated, remove old field
interface UserResponseV3 {
  username: string;
  email: string;
}
```

Rolling deployments require enhanced observability to detect issues before they affect all instances. The gradual nature of rollouts is only beneficial if you can detect problems early and halt the deployment.
Key metrics to monitor during rollouts:
```promql
# Error rate comparison between versions
# Should be approximately equal—significant difference indicates regression

# Error rate for new version (last 5 minutes)
sum(rate(http_requests_total{app="payment-service", version="v2.3.0", status=~"5.."}[5m]))
/
sum(rate(http_requests_total{app="payment-service", version="v2.3.0"}[5m]))

# Error rate for old version (last 5 minutes)
sum(rate(http_requests_total{app="payment-service", version="v2.2.0", status=~"5.."}[5m]))
/
sum(rate(http_requests_total{app="payment-service", version="v2.2.0"}[5m]))

# Latency comparison: p99 by version
histogram_quantile(0.99,
  sum(rate(http_request_duration_seconds_bucket{app="payment-service"}[5m])) by (le, version)
)

# Memory usage growth rate (detect memory leaks)
rate(container_memory_working_set_bytes{
  pod=~"payment-service.*",
  container="payment-service"
}[15m])

# Pod restart count (crash loops)
sum(kube_pod_container_status_restarts_total{
  pod=~"payment-service.*"
}) by (pod)

# Deployment progress
kube_deployment_status_replicas_updated{deployment="payment-service"}
/
kube_deployment_spec_replicas{deployment="payment-service"}
```

All application metrics must include a version label to enable comparison during rollouts. Without version labels, you can only see aggregate metrics that blend old and new instances—hiding problems until they're widespread.
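One way to attach the version label, sketched with the Prometheus client_golang library; injecting version at build time via -ldflags is an assumption about the build setup:

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// version would typically be stamped at build time, e.g. via -ldflags.
var version = "v2.3.0"

// httpRequests carries app and version as constant labels, so every series
// this instance emits can be compared against the other version mid-rollout.
var httpRequests = promauto.NewCounterVec(prometheus.CounterOpts{
	Name:        "http_requests_total",
	Help:        "HTTP requests by status code.",
	ConstLabels: prometheus.Labels{"app": "payment-service", "version": version},
}, []string{"status"})

func main() {
	http.Handle("/metrics", promhttp.Handler())
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		httpRequests.WithLabelValues("200").Inc()
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```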
Alerting during deployments:
Configure alerts that fire during deployments with lower thresholds than normal operations:
| Metric | Normal Threshold | Deployment Threshold | Rationale |
|---|---|---|---|
| Error rate | >1% for 5 min | >0.1% for 2 min | Catch issues faster during deployment |
| p99 latency | >500ms for 5 min | >300ms for 2 min | Detect latency regression early |
| Pod restarts | >3 in 10 min | >1 in 5 min | Any restart during deployment is suspicious |
| Memory growth | >10% in 1 hour | >5% in 15 min | Memory leaks manifest during restart |
These deployment-specific alerts should automatically activate when a deployment begins and deactivate when it completes.
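One possible shape for that switch, as a sketch only: the thresholds mirror the table above, and how deploymentInProgress gets set (CD-system webhook, Kubernetes API watch) is an assumption left to the surrounding system:

```go
package main

// Thresholds holds the limits an alert evaluator applies.
type Thresholds struct {
	MaxErrorRate    float64 // fraction of requests, e.g. 0.01 == 1%
	MaxP99LatencyMs float64
}

var (
	normalThresholds     = Thresholds{MaxErrorRate: 0.01, MaxP99LatencyMs: 500}
	deploymentThresholds = Thresholds{MaxErrorRate: 0.001, MaxP99LatencyMs: 300}
)

// activeThresholds picks which limits apply right now.
func activeThresholds(deploymentInProgress bool) Thresholds {
	if deploymentInProgress {
		return deploymentThresholds
	}
	return normalThresholds
}

// shouldAlert compares observed metrics against the active limits.
func shouldAlert(errorRate, p99LatencyMs float64, deploying bool) bool {
	t := activeThresholds(deploying)
	return errorRate > t.MaxErrorRate || p99LatencyMs > t.MaxP99LatencyMs
}
```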
A rolling deployment without a tested rollback plan is an incomplete deployment. You must know, before you begin, exactly how you'll revert if something goes wrong.
Rollbacks are triggered either automatically by the orchestrator (for example, when progressDeadlineSeconds is exceeded or new pods repeatedly fail their health checks) or manually by operators responding to alerts, metric regressions, or user reports. The commands below cover pausing a problematic rollout and reverting to an earlier revision:
```bash
#!/bin/bash
# Kubernetes Rolling Deployment Rollback Procedures

# View deployment history
kubectl rollout history deployment/payment-service

# Example output:
# REVISION  CHANGE-CAUSE
# 1         Initial deployment
# 2         kubectl set image deployment/payment-service payment-service=payment-service:v2.2.0
# 3         kubectl set image deployment/payment-service payment-service=payment-service:v2.3.0

# Rollback to previous version (revision 2)
kubectl rollout undo deployment/payment-service

# Rollback to specific revision
kubectl rollout undo deployment/payment-service --to-revision=1

# Check rollback status
kubectl rollout status deployment/payment-service

# Pause a problematic deployment mid-rollout
kubectl rollout pause deployment/payment-service

# Resume paused deployment
kubectl rollout resume deployment/payment-service

# View detailed rollout status
kubectl describe deployment payment-service | grep -A 20 "Conditions:"

# Example output during rollback:
# Conditions:
#   Type                Status  Reason
#   ----                ------  ------
#   Available           True    MinimumReplicasAvailable
#   Progressing         True    ReplicaSetUpdated
#   RollbackSuccessful  True    RollbackRevision=2
```

Rolling back the deployment reverts the application code, but it does NOT revert database migrations, configuration changes, or external state changes. You must have separate procedures for these. This is why breaking changes must be backward compatible—a rollback will run old code against new data.
Rollback decision matrix:
| Situation | Detection | Rollback Decision | Time Target |
|---|---|---|---|
| New pods won't start | Automatic (pending pods) | Automatic pause, investigate | < 5 minutes |
| Error rate spike | Manual/Alert | Immediate manual rollback | < 10 minutes |
| Latency regression | Manual/Alert | Evaluate severity, likely rollback | < 15 minutes |
| Memory leak | Gradual observation | Pause, investigate, usually rollback | < 30 minutes |
| Functional bug | User reports | Depends on severity and blast radius | Varies |
Post-rollback checklist:

- Confirm the rollback completed and all pods are running the previous version
- Verify error rates and latency have returned to baseline
- Capture logs, metrics, and the failing image tag before the evidence ages out
- Check that schema or configuration changes made for the new version still work with the old code
- Document the root cause and fix before attempting the deployment again
After years of running rolling deployments at scale, the industry has converged on a set of best practices that maximize reliability while maintaining development velocity.
```yaml
# PodDisruptionBudget ensures minimum availability during
# voluntary disruptions (node drains, cluster upgrades, etc.)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: payment-service-pdb
spec:
  # At least 80% of pods must remain available
  # For 10 replicas, at most 2 can be disrupted simultaneously
  minAvailable: "80%"
  # Alternative: maxUnavailable
  # maxUnavailable: 2
  selector:
    matchLabels:
      app: payment-service
---
# Complete production deployment example
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  annotations:
    # Record the cause of this deployment for rollback reference
    kubernetes.io/change-cause: "Deploy v2.3.0 - Add retry logic for external APIs"
spec:
  replicas: 10
  revisionHistoryLimit: 5  # Keep 5 revisions for rollback
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 2
      maxUnavailable: 1
  minReadySeconds: 30
  progressDeadlineSeconds: 600
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
        version: v2.3.0  # Version label for metrics
    spec:
      terminationGracePeriodSeconds: 60
      # Anti-affinity spreads pods across nodes
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: payment-service
                topologyKey: kubernetes.io/hostname
      containers:
        - name: payment-service
          image: payment-service:v2.3.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
```

Rolling deployments are the workhorse of modern continuous delivery—reliable, well-understood, and supported by every major orchestration platform. Let's consolidate the key concepts:

- Instances are replaced in waves; maxSurge and maxUnavailable bound how far capacity can swing above or below the desired replica count.
- Readiness probes gate traffic and rollout progression; liveness probes should detect only local, unrecoverable failures.
- Graceful shutdown (endpoint-propagation wait, connection draining, terminationGracePeriodSeconds) prevents user-facing errors as old instances terminate.
- Old and new versions run side by side, so every change must be backward compatible across APIs, message formats, and database schemas.
- Version-labeled metrics and deployment-specific alerts catch regressions early; a tested rollback procedure lets you act on them quickly.
What's next:
Rolling deployments are powerful but limited—you can't easily test with real traffic before full rollout, and rollback is reactive rather than preventive. In the next page, we'll explore blue-green deployments, which maintain two complete environments and switch traffic atomically, enabling instant rollback and pre-production testing with production infrastructure.
You now understand rolling deployments at a deep level—from configuration parameters to graceful shutdown, from health check design to rollback procedures. This knowledge applies to Kubernetes, AWS ECS, Azure Kubernetes Service, and any modern orchestration platform.