Every auto-scaling approach we've discussed so far is fundamentally reactive—the system observes load, then adjusts capacity. Even with perfectly tuned triggers and policies, there's an inherent delay: traffic arrives, metrics reflect the load, the auto-scaler evaluates, instances launch, warmup completes, and finally new capacity absorbs traffic. This delay—typically 3-10 minutes—means users experience degradation during every traffic surge.
But what if you could predict traffic surges and pre-position capacity before they begin? What if your 9 AM daily traffic spike was met with pre-warmed instances that came online at 8:55 AM? What if your system could learn from months of historical patterns and anticipate demand you haven't even consciously recognized?
Predictive scaling makes this possible. By applying machine learning to historical metric data, predictive scaling identifies recurring patterns and proactively adjusts capacity before demand materializes. This page explores predictive scaling in depth: how it works, when to use it, how to configure it, and the limitations you must understand.
By the end of this page, you will understand how predictive scaling works under the hood, when it's appropriate (and when it's not), how to configure it on major cloud platforms, how to validate predictions, and how to combine predictive and reactive scaling for comprehensive coverage.
Before understanding predictive scaling's value, let's quantify the problem it solves: the reactive scaling gap—the time between when load increases and when capacity catches up.
Anatomy of the Reactive Scaling Gap:
Time 0:00 - Traffic spike begins (1000 → 3000 req/s)
Time 0:00 - Existing capacity starts to struggle
↓ Metrics Collection Delay: 30-60 seconds
Time 0:30 - CloudWatch/Prometheus reflects increased load
↓ Evaluation Period: 60-120 seconds (multiple datapoints)
Time 1:30 - Alarm threshold breached, scaling triggered
↓ Scaling Decision Processing: 10-30 seconds
Time 1:40 - Launch request sent to EC2/GKE/etc.
↓ Instance Launch: 30-120 seconds
Time 3:00 - Instances running, pulling containers/images
↓ Application Startup: 30-180 seconds
Time 5:00 - Application started, running health checks
↓ Health Check Passing: 30-60 seconds (2-3 intervals)
Time 5:30 - Load balancer starts sending traffic
↓ Warmup Period: 60-180 seconds
Time 7:00 - New instances at full capacity
TOTAL GAP: 7 minutes of degraded service
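The gap is just the sum of the stage delays. A quick sketch for estimating your own gap, using the worst-case bounds from the timeline above (substitute your measured stage times; the dictionary keys are illustrative names, not platform metrics):

```python
# Worst-case duration (seconds) of each stage in the reactive scaling
# pipeline, taken from the upper bounds in the timeline above.
stages = {
    "metrics_collection": 60,
    "evaluation_period": 120,
    "scaling_decision": 30,
    "instance_launch": 120,
    "application_startup": 180,
    "health_checks": 60,
    "warmup": 180,
}

total_gap = sum(stages.values())
print(f"Worst-case reactive gap: {total_gap} s (~{total_gap / 60:.1f} min)")
```

With the lower bounds the gap shrinks to a few minutes, but it never reaches zero: that irreducible delay is exactly what predictive scaling attacks.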
During this 7-minute gap, your existing instances are overloaded, latency is elevated, error rates may increase, and users are experiencing degraded service. For latency-sensitive applications, this is unacceptable.
You might think: 'If traffic is predictable, I'll just use scheduled scaling.' But scheduled scaling requires you to manually identify patterns, set exact times, and maintain schedules as patterns shift. Predictive scaling automates this—the ML detects patterns you might miss and adapts as patterns evolve.
Predictive scaling applies time-series forecasting techniques to historical metric data to predict future demand. While implementations vary across platforms, the core mechanism is consistent:
The Predictive Scaling Pipeline:
1. Historical Data Collection
2. Pattern Detection (ML Model)
3. Forecasting
4. Capacity Planning
Required Capacity = Predicted Load / Target Per Instance
5. Proactive Scaling
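Step 4's capacity formula can be sketched directly. Here `target_per_instance` is whatever load one instance should carry at your target utilization; the function name and numbers are illustrative:

```python
import math

def required_capacity(predicted_load: float, target_per_instance: float) -> int:
    """Instances needed so each carries at most target_per_instance load."""
    return math.ceil(predicted_load / target_per_instance)

# e.g. a forecast of 3000 req/s with a target of 100 req/s per instance
print(required_capacity(3000, 100))  # -> 30
```

Note the ceiling: fractional instances round up, so the plan always errs slightly toward extra capacity.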
AWS Predictive Scaling Specifics:
AWS uses a proprietary ML algorithm trained on millions of scaling groups to detect patterns:
The `SchedulingBufferTime` parameter controls how far ahead of the predicted need instances are launched; set it to at least your instance startup time. Supported metrics include CPU utilization, ALB request count per target, and customized load metrics.
Cloud providers' predictive scaling uses proprietary algorithms—you can't inspect the model or understand exactly why it made specific predictions. This is a trade-off: you get sophisticated ML without building it yourself, but you can't debug unexpected predictions. Always run in forecast-only mode first to validate behavior.
Predictive scaling is powerful but not universally applicable. Understanding where it excels and where it fails is critical for successful adoption.
The Hybrid Approach:
In practice, predictive + reactive is the winning combination:
Predictive Scaling handles:
- Morning ramp-up (8 AM daily)
- Weekend scale-down
- Monthly billing cycle spike
→ Pre-positions capacity for known patterns
Reactive Scaling handles:
- Unexpected viral content
- Marketing campaign over-performance
- Competitor outage driving traffic
→ Catches what predictive didn't anticipate
Combined Result:
- Predictive provides the baseline
- Reactive adds/removes as actual load differs from forecast
- Users never experience scaling gaps for predictable load
- System still adapts to unpredictable spikes
Predictive scaling learns from history—if your traffic patterns shift (new product launch, major feature change, acquisition of new users in different timezone), predictions will be wrong until the model relearns. Major changes require 1-2 weeks of new data before predictions normalize. Monitor closely during transitions.
Let's walk through practical configuration on major platforms. While specifics vary, the concepts translate across providers.
AWS Auto Scaling Predictive Scaling Configuration:
Basic Configuration (AWS CLI):
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name my-asg \
  --policy-name predictive-scaling-policy \
  --policy-type PredictiveScaling \
  --predictive-scaling-configuration '{
    "MetricSpecifications": [{
      "TargetValue": 50,
      "PredefinedMetricPairSpecification": {
        "PredefinedMetricType": "ASGCPUUtilization"
      }
    }],
    "Mode": "ForecastAndScale",
    "SchedulingBufferTime": 300
  }'
Key Parameters:
| Parameter | Description | Recommended |
|---|---|---|
| TargetValue | Metric value to maintain | 40-60% for CPU |
| Mode | ForecastOnly (test) or ForecastAndScale (active) | Start with ForecastOnly |
| SchedulingBufferTime | Seconds before predicted need to launch | Instance startup time (300-600s) |
| MaxCapacityBreachBehavior | Honor or increase max if forecast exceeds it | HonorMaxCapacity usually |
Available Metric Types:
- ASGCPUUtilization (CPU-based, as in the example above)
- ALBRequestCount (request count per target)
- CustomizedLoadMetricSpecification (custom metrics)

Never enable active predictive scaling without validation. Predictions can be wrong, and wrong predictions cause either over-provisioning (wasted money) or under-provisioning (degraded service). Here's a systematic validation approach:
Phase 1: Forecast-Only Mode (Weeks 1-2)
Enable predictive scaling in forecast-only mode:
# AWS
"Mode": "ForecastOnly"
This generates predictions without taking action. Compare predictions to actual load:
MAPE = (1/n) × Σ |Actual - Predicted| / Actual × 100%
Interpretation:
- MAPE < 10%: Excellent predictions, safe to enable
- MAPE 10-20%: Good predictions, enable with monitoring
- MAPE 20-30%: Moderate accuracy, enable cautiously
- MAPE > 30%: Poor predictions, investigate before enabling
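A minimal sketch of the MAPE calculation above, over paired samples (the traffic values are made up for illustration):

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    assert len(actual) == len(predicted) and all(a != 0 for a in actual)
    return 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical hourly request rates vs. the forecast for the same hours
actual    = [1000, 1500, 3000, 2800]
predicted = [ 950, 1600, 2700, 2900]

print(f"MAPE: {mape(actual, predicted):.1f}%")  # -> MAPE: 6.3%
```

A 6.3% MAPE would fall in the "excellent" band above, so enabling ForecastAndScale would be reasonable for this workload.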
Phase 2: Limited Activation (Weeks 3-4)
Enable predictive scaling but with safety constraints:
{
"Mode": "ForecastAndScale",
"MaxCapacityBreachBehavior": "HonorMaxCapacity"
}
And ensure reactive policies (target tracking or step scaling) remain active as backup. This way, predictive scaling pre-positions capacity for forecast patterns, reactive policies correct any forecast errors, and the capacity cap bounds the damage a bad prediction can do.
Phase 3: Full Trust (Week 5+)
After several weeks of validated, accurate predictions, relax the rollout constraints: restore your normal capacity limits and let predictive scaling own the baseline, with reactive policies handling residual variance.
Slight over-prediction (10-20% more capacity than needed) is acceptable—you pay a bit more but users never suffer. Under-prediction is the real danger. When evaluating predictions, bias toward accepting over-prediction errors while being strict about under-prediction.
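That asymmetry can be made explicit with an error score that penalizes under-prediction more heavily. This is a sketch, not a platform feature; the function name and the 3x penalty weight are arbitrary illustrations:

```python
def asymmetric_error(actual, predicted, under_weight=3.0):
    """Percentage error where under-prediction counts under_weight times as much."""
    total = 0.0
    for a, p in zip(actual, predicted):
        err = abs(a - p) / a
        # Weight misses where the forecast fell short of real demand
        total += err * under_weight if p < a else err
    return 100 * total / len(actual)

print(asymmetric_error([100], [110]))  # over-prediction: base error
print(asymmetric_error([100], [90]))   # under-prediction: 3x the score
```

Tracking a score like this alongside plain MAPE surfaces forecasts that look accurate on average but systematically run short.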
Beyond basic configuration, sophisticated organizations employ advanced patterns that maximize predictive scaling's value:
1. Multi-Layer Predictive + Reactive Stack:
┌─────────────────────────────────────────┐
│ Predictive Scaling (Base Capacity) │
│ - Handles 80% of scaling need │
│ - Pre-positions for daily patterns │
│ - Low churn, efficient │
├─────────────────────────────────────────┤
│ Target Tracking (Day-to-Day Variance) │
│ - Handles 15% (normal variation) │
│ - Adjusts within predicted range │
│ - Moderate responsiveness │
├─────────────────────────────────────────┤
│ Step Scaling (Emergency Response) │
│ - Handles 5% (unexpected spikes) │
│ - Aggressive thresholds │
│ - Fast cooldowns │
└─────────────────────────────────────────┘
2. Capacity Reservation Alignment:
For cost optimization, align predictive scaling with Reserved Instances or Savings Plans:
Predictive Baseline = Reserved Capacity
- Purchase RIs/Savings Plans for predicted minimum (floor of predictions)
- On-demand/spot for variance above predictions
- Predictive ensures you actually use your reservations
- Reactive handles spikes beyond reserved capacity
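One way to size the reservation is to take the floor of the capacity forecast, so reserved instances are always fully utilized. A sketch with invented forecast numbers:

```python
# Hypothetical hourly capacity forecast (instances) for a representative day
forecast = [20, 18, 25, 40, 60, 55, 45, 30]

reserved = min(forecast)                   # floor of predictions -> RI/Savings Plan size
on_demand_peak = max(forecast) - reserved  # variance covered by on-demand/spot

print(f"Reserve {reserved} instances; up to {on_demand_peak} on-demand at peak")
```

In practice you might reserve a low percentile rather than the strict minimum, trading a small risk of idle reservations for deeper discounts.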
3. Cross-Service Prediction:
Use upstream service's predictions to pre-scale downstream:
Web Tier Prediction: 100 instances at 9 AM
↓
API Tier Prediction: 50 instances (derived from web:API ratio)
↓
Database Read Replicas: 5 replicas (derived from API:DB ratio)
→ Entire stack pre-scales together
→ No cascading delays
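The ratio-based derivation above can be sketched as follows. The 2:1 web:API and 10:1 API:replica ratios come from the example; in practice you would measure them from production traffic:

```python
import math

def derive_stack_capacity(web_instances: int,
                          web_to_api_ratio: float = 2.0,
                          api_per_replica: float = 10.0) -> dict:
    """Derive downstream capacity from the web tier's forecast via fixed ratios."""
    api = math.ceil(web_instances / web_to_api_ratio)
    replicas = math.ceil(api / api_per_replica)
    return {"web": web_instances, "api": api, "db_replicas": replicas}

print(derive_stack_capacity(100))  # -> {'web': 100, 'api': 50, 'db_replicas': 5}
```

A single upstream forecast thus drives the whole stack, so downstream tiers never wait for load to cascade before scaling.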
4. Event-Aware Prediction Overrides:
For known events that will break predictions:
# Terraform scheduled action (CloudFormation offers an equivalent)
resource "aws_autoscaling_schedule" "black_friday" {
  scheduled_action_name  = "black-friday-override"
  autoscaling_group_name = aws_autoscaling_group.my_asg.name
  min_size               = 200  # Override prediction floor
  max_size               = 1000 # Override prediction ceiling
  desired_capacity       = 500  # Start high
  # Standard cron (used by the recurrence field) cannot express
  # "the Friday after the 4th Thursday of November", so pin the
  # date explicitly each year:
  start_time             = "2025-11-28T00:00:00Z" # Black Friday 2025
}
Scheduled actions override predictions for events where history offers no signal: one-time product launches, flash sales, or planned marketing pushes.
5. Prediction Quality Monitoring:
Automate prediction quality tracking:
# CloudWatch custom metric for prediction accuracy.
# get_cloudwatch_metric / put_cloudwatch_metric / send_alert are assumed
# helper wrappers around boto3; AWS also exposes forecasts directly via
# the GetPredictiveScalingForecast API.
def calculate_prediction_accuracy():
    # Compare the capacity forecast (instances) against actual instances,
    # so both sides of the calculation use the same unit
    predicted = get_cloudwatch_metric('PredictiveScaling', 'CapacityForecast')
    actual = get_cloudwatch_metric('ASG', 'GroupInServiceInstances')
    accuracy = 100 - abs(predicted - actual) / actual * 100
    put_cloudwatch_metric(
        'PredictiveScaling/Accuracy',
        accuracy,
        dimensions={'ASG': 'my-asg'},
    )
    if accuracy < 80:
        send_alert('Predictive scaling accuracy degraded')
Treat predictive scaling as an ML system: it has training data (history), a model (the prediction algorithm), and inference (the forecasts). Like any ML system, it can suffer from data drift (patterns changing), model staleness, and edge cases. Apply MLOps disciplines: monitor prediction quality, alert on degradation, and retrain (or reset) when patterns shift.
Predictive scaling is powerful but has important limitations. Understanding these prevents surprises in production:
| Gotcha | Symptom | Solution |
|---|---|---|
| Predictions stale after change | Under-provisioning after major update | Disable predictive for 2 weeks; rely on reactive |
| Over-prediction on weekends | Wasted capacity on Sat/Sun | Add scheduled action to cap weekend capacity |
| Missing one-time events | Under-provisioned for product launch | Scheduled override action for known events |
| Predictions hit max constantly | Can't tell if predictions are accurate | Increase max to see actual prediction values |
| Timezone confusion | Predictions are 5 hours off | Verify data uses UTC; buffer time accounts for offset |
| Predictions too conservative | Always under-predicting by 20% | Lower target value to increase predicted capacity |
After enabling predictive scaling successfully, teams often stop monitoring it. Then patterns shift, predictions become wrong, and issues emerge weeks later. Set up ongoing accuracy monitoring and periodic reviews. Predictive scaling requires as much operational attention as any ML system.
We've explored predictive scaling comprehensively. Let's consolidate the key insights:
Module Complete:
You've now completed the Auto-Scaling module. You understand the reactive scaling gap and its costs, how predictive scaling forecasts demand from historical patterns, how to validate forecasts before trusting them, and how to layer predictive and reactive policies for full coverage.
With this knowledge, you can design auto-scaling strategies that maintain performance, minimize cost, and adapt to any traffic pattern your system encounters.
Congratulations! You've mastered auto-scaling—one of the most impactful capabilities in modern distributed systems. You can now design scaling strategies that handle everything from predictable daily patterns to unexpected viral spikes, all while optimizing cost and maintaining user experience.