Knowing what to measure is only half the story. The other half is what to do about it. Scaling policies are the decision-making rules that translate metric observations into concrete capacity changes. They answer questions like: When should capacity change? By how much? And how long should the system wait before acting again?
This page explores the major scaling policy types: target tracking, step scaling, simple scaling, and scheduled scaling. You'll understand when each is appropriate, how to configure them effectively, and the subtle behaviors that can make or break your auto-scaling strategy.
By the end of this page, you will understand the mechanics, trade-offs, and configuration parameters of all major scaling policy types. You'll be able to choose the right policy type for different scenarios and configure policies that respond appropriately to workload changes without oscillation or over-provisioning.
Auto-scaling platforms offer several policy types, each with distinct behavior. Understanding the taxonomy is essential before diving into details.
| Policy Type | How It Works | Best For | Complexity |
|---|---|---|---|
| Target Tracking | Maintains a metric at a specified target value | Steady-state optimization, most workloads | Low |
| Step Scaling | Adds/removes capacity in steps based on alarm severity | Tiered response, complex traffic patterns | Medium |
| Simple Scaling | Adds/removes fixed capacity when alarm triggers | Simple, predictable workloads | Low |
| Scheduled Scaling | Adjusts capacity at predetermined times | Predictable patterns, events, maintenance | Low |
| Predictive Scaling | Uses ML to forecast and scale proactively | Recurring patterns, gradual changes | Low (automated) |
Policy Hierarchy and Interaction:
Most auto-scaling systems allow multiple policies simultaneously. Understanding how they interact is crucial:
- **Scheduled scaling sets the baseline:** If you have a scheduled action to run 20 instances at 9 AM, that becomes the minimum at 9 AM.
- **Dynamic policies adjust from the baseline:** Target tracking or step scaling then adjusts capacity from the scheduled baseline.
- **Largest capacity wins for scale-out:** If one policy wants 15 instances and another wants 20, you get 20.
- **Largest capacity wins for scale-in too:** If one policy wants to drop to 10 instances and another to 5, you get 10 (the less aggressive reduction wins).

Together these rules form a safety envelope: capacity is always at least what the most demanding policy requires.
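A minimal sketch of this resolution rule, assuming a simple max-then-clamp implementation (illustrative, not any platform's actual code):

```python
def resolve_capacity(recommendations: list[int], min_size: int, max_size: int) -> int:
    """Resolve competing policy recommendations: the largest requested
    capacity wins, then the result is clamped to the group's bounds."""
    desired = max(recommendations)                 # most demanding policy prevails
    return max(min_size, min(desired, max_size))   # stay within configured bounds

# One policy wants 15 instances, another wants 20: the larger wins.
print(resolve_capacity([15, 20], min_size=5, max_size=100))  # 20
```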
For most workloads, start with target tracking. It's simpler to configure, automatically handles both scale-out and scale-in, and self-tunes capacity to maintain the target. Use step scaling only when you need tiered responses or target tracking isn't working well.
Target tracking is the most widely recommended policy type. It works like a thermostat: you specify the desired metric value (target), and the system continuously adjusts capacity to maintain it.
How Target Tracking Works:
You specify:

- A metric to track (for example, average CPU utilization or requests per instance)
- A target value to maintain (for example, 50% CPU)

The system continuously:

- Measures the current metric value
- Compares it to the target
- Calculates the capacity change needed to bring the metric back to the target
- Applies the change, subject to cooldowns and the group's min/max bounds
The Control Algorithm:
For metrics that scale proportionally with capacity (like CPU or requests/instance), the formula is:
Desired Capacity = Current Capacity × (Current Metric Value / Target Value)
Example with CPU: Suppose 10 instances are running at 80% average CPU against a target of 50%. Desired capacity = 10 × (80 / 50) = 16.
The system adds 6 instances, expecting average CPU to drop to ~50%.
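A quick sketch of that calculation (rounding up, so the resulting metric lands at or below the target):

```python
import math

def desired_capacity(current_capacity: int, current_metric: float, target: float) -> int:
    """Target tracking's proportional rule: scale capacity by the ratio
    of the observed metric value to the target value."""
    return math.ceil(current_capacity * (current_metric / target))

print(desired_capacity(10, current_metric=80, target=50))  # 16, so add 6 instances
```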
Configuration Parameters:
| Parameter | Description | Typical Values |
|---|---|---|
| Target Value | Desired metric level | 40-70% for CPU; varies by metric |
| Scale-Out Cooldown | Wait after scale-out before next action | 60-180 seconds |
| Scale-In Cooldown | Wait after scale-in before next action | 300-600 seconds |
| Disable Scale-In | Prevent policy from removing instances | false (rarely true) |
| Instance Warmup | Time for new instances to contribute | 60-300 seconds |
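To make the parameters concrete, here is how they map onto an AWS target tracking policy via boto3. This is a sketch; the group name web-fleet is a hypothetical placeholder.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Attach a target tracking policy to a hypothetical Auto Scaling group.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-fleet",      # hypothetical group name
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    EstimatedInstanceWarmup=180,           # boot + health check + steady state
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 50.0,               # desired average CPU
        "DisableScaleIn": False,           # let the policy remove capacity too
    },
)
```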
Critical: Instance Warmup
When a new instance launches, it takes time to become healthy and handle traffic. During warmup:

- The instance's metrics are excluded from the group's aggregate, so a booting instance's idle CPU doesn't drag the average down and trigger premature scale-in
- The instance counts as capacity already added, so the policy doesn't keep launching more instances while earlier ones are still starting up
Set warmup to approximately: time_to_boot + time_to_pass_healthcheck + time_to_reach_steady_state
The target value profoundly affects behavior. Too high (e.g., 90% CPU) leaves no headroom for traffic spikes—you'll scale too late. Too low (e.g., 30% CPU) wastes money on over-provisioning. The right target depends on your traffic's variance and your latency sensitivity. Measure and tune.
Step scaling provides graduated responses based on the severity of the metric deviation. Instead of a single target, you define multiple steps—different actions for different metric ranges.
How Step Scaling Works: You define a base alarm threshold plus a set of step adjustments. When the alarm breaches, the size of the breach determines which step applies: small deviations trigger small adjustments, large deviations trigger large ones. Unlike simple scaling, step scaling can keep responding to new alarm data even while a scaling activity is in progress.
Example Configuration:
```
Metric: Average CPU Utilization
Base Threshold: 60%

Scale-Out Steps:
  +10% (60-70%):  Add 1 instance
  +20% (70-80%):  Add 2 instances
  +30% (80-90%):  Add 3 instances
  +40% (90%+):    Add 5 instances

Scale-In Steps:
  -10% (50-60%):  Remove 1 instance
  -20% (40-50%):  Remove 2 instances
  -30% (<40%):    Remove 3 instances
```
With this configuration: CPU at 75% (a 15-point breach) adds 2 instances, while CPU at 92% adds 5. The response scales with the severity of the deviation: mild pressure gets a mild correction, severe pressure gets an aggressive one.
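As a sketch, the step-selection logic reduces to a threshold lookup (illustrative; real platforms evaluate alarm breach ranges, but the mapping is the same):

```python
def step_adjustment(cpu: float) -> int:
    """Map a CPU reading to a capacity change using the steps above:
    positive = instances to add, negative = instances to remove."""
    scale_out = [(90, 5), (80, 3), (70, 2), (60, 1)]  # (lower bound, add count)
    scale_in = [(40, -3), (50, -2), (60, -1)]         # (upper bound, remove count)
    for bound, change in scale_out:
        if cpu >= bound:
            return change
    for bound, change in scale_in:
        if cpu < bound:
            return change
    return 0

print(step_adjustment(75))  # 2  (70-80% band)
print(step_adjustment(92))  # 5  (90%+ band)
print(step_adjustment(35))  # -3 (<40% band)
```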
Step Adjustment Types:
Step scaling supports different adjustment types:
| Adjustment Type | Behavior | Example |
|---|---|---|
| ChangeInCapacity | Add/remove fixed count | Add 2 instances |
| ExactCapacity | Set to specific count | Set to 10 instances |
| PercentChangeInCapacity | Add/remove percentage | Add 20% more instances |
PercentChangeInCapacity is particularly useful for large fleets where fixed changes would be too small (adding 1 instance to a 100-instance fleet is only a 1% increase).
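A sketch of the percentage calculation with a floor on the step size (AWS exposes this floor as the MinAdjustmentMagnitude parameter):

```python
import math

def percent_scale_out(current: int, percent: float, min_magnitude: int = 1) -> int:
    """Percentage-based capacity increase with a minimum step size."""
    change = math.ceil(current * percent / 100)
    return max(change, min_magnitude)

print(percent_scale_out(100, 20))                  # 20: meaningful on a large fleet
print(percent_scale_out(10, 5, min_magnitude=2))   # 2: the floor kicks in
```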
A common step scaling pattern uses different thresholds for scale-out and scale-in. Scale out at 70% CPU, but only scale in when CPU drops to 40%. This gap (hysteresis) prevents rapid oscillation when the metric hovers near a threshold.
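In code, the dead band is just a pair of asymmetric thresholds (a minimal sketch):

```python
def decide(cpu: float, scale_out_at: float = 70, scale_in_at: float = 40) -> str:
    """The 40-70% dead band absorbs normal fluctuation so the
    group doesn't flap around a single threshold."""
    if cpu > scale_out_at:
        return "scale-out"
    if cpu < scale_in_at:
        return "scale-in"
    return "hold"

print(decide(75))  # scale-out
print(decide(55))  # hold: inside the dead band
print(decide(35))  # scale-in
```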
Simple scaling is the oldest and most basic policy type. It triggers a single action when an alarm transitions to ALARM state, then waits for a cooldown period before any further scaling.
How Simple Scaling Works: A single alarm maps to a single fixed action. When the alarm enters the ALARM state, the action fires once; the policy then ignores all further signals until the cooldown expires, regardless of what the metric does in the meantime.
Example:
```
Alarm:    CPU > 70% for 5 minutes
Action:   Add 2 instances
Cooldown: 300 seconds
```
When CPU exceeds 70% for 5 minutes: the group adds 2 instances, then blocks any further scaling for 300 seconds. If 2 instances aren't enough, you wait out the full cooldown before the policy can act again, even as CPU keeps climbing.
When to Use Simple Scaling (Rare Cases):

- Legacy configurations that already depend on it and aren't yet worth migrating
- Workloads so stable and predictable that a single fixed adjustment is always the right response
- Situations where you explicitly want the cooldown to block all further activity while a change takes effect
Migration Path:
If you have simple scaling policies, consider migrating:

- To target tracking for most workloads, setting the target near the alarm threshold you use today
- To step scaling where you genuinely need different responses at different severity levels
AWS explicitly recommends target tracking or step scaling over simple scaling. The AWS documentation states: 'We recommend that you use target tracking scaling policies instead of simple scaling policies for most use cases.' Simple scaling exists primarily for backward compatibility.
Scheduled scaling adjusts capacity based on time, not metrics. It's used when you can predict traffic patterns in advance—daily cycles, weekly patterns, known events, or maintenance windows.
How Scheduled Scaling Works:
1. Define scheduled actions specifying a schedule (a one-time timestamp or a recurring cron expression) and the capacity settings to apply (min, desired, and/or max)
2. At the scheduled time, the system adjusts the capacity bounds
3. Dynamic policies (target tracking, step scaling) then work within the new bounds
Example Schedule:
```
# Every weekday at 8 AM: Prepare for business hours
Schedule: 0 8 ? * MON-FRI *
Action:   Set min=20, desired=30, max=100

# Every weekday at 6 PM: Reduce for evening
Schedule: 0 18 ? * MON-FRI *
Action:   Set min=5, desired=10, max=50

# Weekends: Minimal capacity
Schedule: 0 0 ? * SAT-SUN *
Action:   Set min=2, desired=5, max=20
```
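Translated into boto3, the first action might look like this (a sketch; web-fleet is a hypothetical group, and note that the API's Recurrence field takes standard five-field Unix cron rather than the six-field syntax above):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Weekday 8 AM ramp-up, evaluated in the given time zone.
autoscaling.put_scheduled_update_group_action(
    AutoScalingGroupName="web-fleet",         # hypothetical group name
    ScheduledActionName="business-hours-rampup",
    Recurrence="0 8 * * 1-5",                 # minute hour day month weekday
    MinSize=20,
    DesiredCapacity=30,
    MaxSize=100,
    TimeZone="UTC",
)
```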
Pre-Scaling Strategy:
Scheduled scaling enables proactive pre-scaling—adding capacity before load arrives rather than reacting to it:
Without Pre-Scaling:

```
8:00 AM  Traffic surge begins
8:01 AM  CPU spikes, triggering scale-out
8:05 AM  New instances launching
8:10 AM  Instances passing health checks
8:12 AM  Capacity catches up
→ 12 minutes of degraded performance
```

With Pre-Scaling:

```
7:45 AM  Scheduled action adds capacity
7:50 AM  New instances ready
8:00 AM  Traffic surge begins
8:00 AM  Capacity already sufficient
→ Zero impact on users
```
Combining Scheduled and Dynamic Scaling:
The power of scheduled scaling multiplies when combined with dynamic policies: the schedule sets a sensible baseline for each period, and target tracking or step scaling adjusts around that baseline when actual traffic deviates from the plan.
Scheduled actions must account for time zones. If your traffic is driven by users in multiple regions, you may need different schedules for different regions' fleets. Also consider daylight saving time—some cron implementations adjust automatically, others don't. Use UTC to avoid confusion.
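A quick illustration of the pitfall using Python's zoneinfo (dates chosen arbitrarily on either side of a DST boundary):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# "8 AM in New York" maps to different UTC hours across the DST change.
for day in (datetime(2024, 1, 15, 8), datetime(2024, 7, 15, 8)):
    local = day.replace(tzinfo=ZoneInfo("America/New_York"))
    print(local.isoformat(), "->", local.astimezone(ZoneInfo("UTC")).isoformat())
# January: 08:00 EST -> 13:00 UTC
# July:    08:00 EDT -> 12:00 UTC
```

A schedule pinned to UTC therefore fires an hour earlier or later in local terms after a DST transition; decide which behavior matches your traffic before choosing.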
Predictive scaling uses machine learning to analyze historical traffic patterns and forecast future demand, automatically scaling before load arrives. It combines the proactive benefits of scheduled scaling with automatic pattern detection.
How Predictive Scaling Works:

1. The service analyzes historical metric data to detect recurring patterns (daily and weekly cycles)
2. It generates a forecast of future load
3. It schedules capacity changes ahead of the forecasted demand, launching instances early enough (the scheduling buffer) to be ready when load arrives
4. Forecasts are refreshed regularly as new data accumulates
AWS Predictive Scaling Example:
```json
{
  "PredictiveScalingConfiguration": {
    "MetricSpecifications": [{
      "TargetValue": 70,
      "PredefinedMetricPairSpecification": {
        "PredefinedMetricType": "ASGCPUUtilization"
      }
    }],
    "Mode": "ForecastAndScale",
    "SchedulingBufferTime": 300
  }
}
```
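Applying that same configuration through boto3 might look like this (a sketch; the group name is a hypothetical placeholder):

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Attach the predictive scaling policy shown above.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-fleet",          # hypothetical group name
    PolicyName="predictive-cpu-70",
    PolicyType="PredictiveScaling",
    PredictiveScalingConfiguration={
        "MetricSpecifications": [{
            "TargetValue": 70,
            "PredefinedMetricPairSpecification": {
                "PredefinedMetricType": "ASGCPUUtilization",
            },
        }],
        "Mode": "ForecastAndScale",
        "SchedulingBufferTime": 300,           # launch instances 5 minutes early
    },
)
```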
Modes:

- ForecastOnly: Generates forecasts without acting on them, so you can validate predictions against actual traffic before trusting them
- ForecastAndScale: Generates forecasts and scales capacity to match them
Requirements and Limitations:
| Requirement | Why It Matters |
|---|---|
| 14+ days of history | ML needs sufficient data to detect patterns |
| Recurring patterns | Random/unpredictable traffic won't benefit |
| Steady metric relationship | If capacity→metric relationship is unstable, predictions fail |
| Minimum scale | Works best for groups whose capacity regularly varies over a meaningful range |
When NOT to Use Predictive Scaling:

- New workloads without enough history for the model to learn from
- Spiky or random traffic with no recurring pattern
- One-off events such as product launches or flash sales (use scheduled scaling instead)
- Workloads where the capacity-to-metric relationship shifts frequently
Best practice combines predictive scaling (for recurring patterns) with target tracking (for real-time adjustment). Predictive scaling ensures you're never caught off-guard by predictable patterns. Target tracking handles the unexpected—traffic spikes from viral content, attack patterns, or unexpected adoption.
Proper policy configuration is as important as choosing the right policy type. These best practices apply across policy types and platforms:
The Scaling Budget Pattern:
For cost control, implement a scaling budget—limits on how much capacity can change in a given period:
```
# Example: Max 10 instances per 10-minute window
Max Scale-Out Rate: +10 instances per 600 seconds
Max Scale-In Rate:  -5 instances per 600 seconds
```

Enforced via:

- Step scaling step limits
- Instance lifecycle hooks that throttle
- Custom Lambda-based governors
This prevents scenarios where runaway metrics trigger massive scale-outs that overwhelm downstream systems or blow through budgets.
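A sketch of a custom governor enforcing such a budget (illustrative; a real one might run in Lambda and clamp the group's desired capacity):

```python
import time
from collections import deque

class ScalingGovernor:
    """Cap net capacity additions within a sliding time window."""

    def __init__(self, max_added: int = 10, window_seconds: int = 600):
        self.max_added = max_added
        self.window = window_seconds
        self.events = deque()  # (timestamp, instances granted)

    def allow(self, requested: int) -> int:
        """Return how many of the requested instances may be added now."""
        now = time.monotonic()
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()  # expire events outside the window
        used = sum(count for _, count in self.events)
        granted = max(0, min(requested, self.max_added - used))
        if granted:
            self.events.append((now, granted))
        return granted

governor = ScalingGovernor(max_added=10, window_seconds=600)
print(governor.allow(8))  # 8: within budget
print(governor.allow(5))  # 2: only 2 left in this 10-minute window
```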
The Minimum Viable Capacity Pattern:
Always define your minimum capacity based on:

- Redundancy: at least 2 instances, spread across availability zones, so a single failure doesn't take you to zero
- Baseline load: enough capacity to serve your quietest period without degradation
- Recovery headroom: enough slack to absorb a sudden spike while new instances boot

The table below summarizes production-ready starting values for the key parameters:
| Parameter | Scale-Out Value | Scale-In Value | Notes |
|---|---|---|---|
| Cooldown Period | 60-120 seconds | 300-600 seconds | Asymmetric: fast out, slow in |
| Evaluation Periods | 1-2 | 5-10 | More samples for scale-in decision |
| Target (CPU) | 50-70% | Same | Leave headroom for spikes |
| Target (Requests) | 70-80% of tested max | Same | Based on load test results |
| Instance Warmup | 180-300 seconds | N/A | Time to fully contribute |
| Minimum Capacity | 2+ | N/A | At least 2 for redundancy |
Beware the scenario where you hit maximum capacity. If max is 100 and you need 150, you're at 100 with degraded performance. Either your max is too low, or you have a capacity planning problem. Set alerts at 80% of max to investigate before hitting the ceiling.
We've covered the complete landscape of scaling policies. Let's consolidate the key insights:

- Target tracking is the default choice: simple to configure, self-tuning, and it handles both scale-out and scale-in
- Step scaling adds tiered responses when the severity of a breach should dictate the size of the action
- Simple scaling is legacy; migrate to target tracking or step scaling
- Scheduled scaling handles predictable patterns proactively; predictive scaling automates the same idea for recurring patterns
- Policies combine: scheduled or predictive actions set the baseline, dynamic policies handle deviations, and the largest-capacity recommendation wins
What's Next:
Policies define how to respond, but they don't address what happens immediately after scaling. The next page explores cool-down periods—the critical mechanism that prevents scaling oscillation and ensures stable, efficient scaling behavior.
You now understand the mechanics, configuration, and trade-offs of all major scaling policy types. You can select appropriate policy types for different workloads, configure them with production-ready parameters, and combine multiple policies for comprehensive coverage.