Cloud Cost Optimization

Level: Intermediate

Duration: 90 mins

Topic: Cloud Cost Optimization

Reserved vs Spot Instances

The Economics of Cloud Compute

In 2018, a financial services company was spending $4.2 million annually on AWS EC2 instances—all on-demand pricing. After a comprehensive analysis and strategic shift to reserved instances and spot instances, they reduced that bill to $1.8 million the following year—a 57% reduction with zero changes to their actual workloads.

This transformation isn't magic; it's economics. Cloud providers offer multiple pricing models for compute resources, each with different cost-commitment tradeoffs. Understanding these models and applying them strategically is one of the highest-impact optimizations available to cloud architects.

The fundamental insight: Cloud providers price capacity based on certainty. The more certainty you provide them (through commitments), the lower your price. The more flexibility they retain (like being able to reclaim your capacity), the lower your price. On-demand pricing—where you provide no commitment and demand full availability—is the most expensive option because you're paying a premium for maximum flexibility.

What You Will Learn

By the end of this page, you will understand the economics of on-demand, reserved, savings plans, and spot pricing. You'll learn how to analyze workloads for purchasing optimization, calculate commitment levels, implement spot instance strategies safely, and build portfolios that minimize cost while maintaining reliability.

Cloud Compute Pricing Models

All major cloud providers offer tiered pricing models for compute resources. While the specific names and details vary, the fundamental options are remarkably similar:

On-Demand (Pay-as-you-go)

The default pricing model where you pay for compute capacity by the hour or second with no long-term commitment. Resources can be started and stopped freely. This is the most flexible but most expensive option.

Reserved Instances / Committed Use

A commitment to use a specific amount of compute capacity for a 1-year or 3-year term in exchange for significant discounts (typically 30-75% off on-demand). The commitment is binding: you pay whether you use the capacity or not.

Savings Plans (AWS) / Committed Use Discounts (GCP)

A more flexible commitment model where you commit to a dollar amount of hourly spend rather than specific instance types. This provides discount flexibility across instance families and regions.

Spot Instances / Preemptible VMs / Spot VMs

Unused cloud capacity offered at steep discounts (60-90% off on-demand) with the caveat that your instances can be interrupted on short notice (about 2 minutes on AWS, 30 seconds on GCP and Azure). Ideal for fault-tolerant, interruptible workloads.

Compute Pricing Model Comparison
Model	Discount	Commitment	Flexibility	Interruption Risk	Best For
On-Demand	0%	None	Maximum	None	Variable workloads, testing
Reserved (1yr)	30-40%	1 year	Low	None	Steady-state production
Reserved (3yr)	50-75%	3 years	Very Low	None	Long-term baseline
Savings Plans	30-75%	1-3 years	Medium	None	Evolving architectures
Spot/Preemptible	60-90%	None	High	High	Batch, stateless, ML training

The Pricing Hierarchy

Think of cloud pricing as trading flexibility for discount: On-Demand (max flexibility, no discount) → Reserved/Savings (less flexibility, moderate discount) → Spot (least reliability, maximum discount). The art is matching your workload characteristics to the appropriate pricing tier.

Deep Dive: Reserved Instances

Reserved Instances (RIs) are the workhorse of cloud cost optimization for steady-state workloads. Understanding their mechanics, types, and strategies is essential for effective cloud financial management.

How Reserved Instances Work:

Contrary to the name, you're not actually "reserving" a specific server. Instead, you're purchasing a billing discount that automatically applies to matching usage. If you hold a Reserved Instance for m5.xlarge in us-east-1, one running m5.xlarge in that region is billed at the RI rate for each hour of the term; any additional m5.xlarge instances beyond your RI count are still billed on-demand.
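The billing behavior described above can be sketched as a small function. This is only an illustration under simplifying assumptions: `bill_hour` is a hypothetical helper, all instances are assumed to be the same type, and the rates are example figures rather than current prices. Real billing also accounts for size flexibility and account scope.

```python
def bill_hour(running: int, reserved: int,
              on_demand_rate: float, ri_rate: float) -> float:
    """Cost of one hour for `running` identical instances given `reserved` RIs.

    Each RI is paid for every hour of the term, whether or not a matching
    instance is running. Usage beyond the RI count is billed on-demand.
    """
    uncovered = max(running - reserved, 0)
    return reserved * ri_rate + uncovered * on_demand_rate


# Example rates (assumed): $0.192 on-demand, $0.128 with a no-upfront RI.
busy_hour = bill_hour(running=5, reserved=3, on_demand_rate=0.192, ri_rate=0.128)
quiet_hour = bill_hour(running=2, reserved=3, on_demand_rate=0.192, ri_rate=0.128)
print(f"busy hour:  ${busy_hour:.3f}")
print(f"quiet hour: ${quiet_hour:.3f}")
```

In the quiet hour only two instances run, yet all three RIs are still charged, which is why unused reservations are wasted money.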

RI Purchasing Options:

Reserved Instances come in three payment options:

All Upfront (AURI) — Pay the entire commitment upfront; highest discount
Partial Upfront (PURI) — Pay some upfront, remainder monthly; medium discount
No Upfront (NURI) — Pay nothing upfront, all monthly; lowest RI discount

The discount difference between AURI and NURI is typically 5-10%. For organizations optimizing for cash flow, PURI or NURI may be preferable despite the slightly lower discount.

AWS EC2 Reserved Instance Pricing Example (m5.xlarge, us-east-1)
Option	1-Year Term	3-Year Term	Savings vs On-Demand
On-Demand	$0.192/hour	$0.192/hour	—
All Upfront	$0.120/hour effective	$0.076/hour effective	37% / 60%
Partial Upfront	$0.124/hour effective	$0.080/hour effective	35% / 58%
No Upfront	$0.128/hour	$0.085/hour	33% / 56%
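An "effective" hourly rate like the ones in this table comes from spreading the upfront payment over the term and adding any recurring hourly fee. The sketch below shows that arithmetic and the break-even utilization that follows from it. The helper names and the $1,051.20 upfront figure are assumptions chosen to reproduce the $0.120 rate; they are not quoted AWS prices.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 (ignores leap years)


def effective_hourly(upfront: float, hourly: float, years: int) -> float:
    """Amortize the upfront payment over the term and add the recurring fee."""
    return upfront / (HOURS_PER_YEAR * years) + hourly


def break_even_utilization(ri_effective: float, on_demand: float) -> float:
    """Fraction of the term an instance must run for the RI to beat on-demand."""
    return ri_effective / on_demand


# Hypothetical 1-year All Upfront purchase: $1,051.20 upfront, no hourly fee.
rate = effective_hourly(upfront=1051.20, hourly=0.0, years=1)
threshold = break_even_utilization(rate, on_demand=0.192)
print(f"effective rate: ${rate:.3f}/hour")
print(f"break-even utilization: {threshold:.1%}")
```

Under these assumptions the reservation only pays off if the instance runs more than about 62% of the hours in the term, which is why RIs suit steady-state workloads.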

Standard vs Convertible Reserved Instances:

Standard RIs are tied to a specific instance type and cannot be changed. They offer slightly higher discounts (additional 5-10%) but require accurate capacity planning.

Convertible RIs allow you to exchange for different instance types, families, operating systems, or tenancy during the term. This flexibility is valuable when your architecture is evolving.

Standard RI:      m5.xlarge → m5.xlarge only (highest discount)
Convertible RI:   m5.xlarge → r5.xlarge or c5.2xlarge (flexible)

RI Scope: Regional vs Zonal

Regional RIs provide a discount for any matching usage across all Availability Zones in a region and provide instance size flexibility within the instance family.

Zonal RIs are tied to a specific AZ and also include a capacity reservation guarantee (you are guaranteed capacity in that AZ), but lack size flexibility.

For most workloads, Regional RIs are preferred because they offer more flexibility and still cover multi-AZ deployments.
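Instance size flexibility is usually explained in terms of normalization units: each size in a family carries a weight, and a Regional RI covers any mix of sizes in that family up to its total weight. The sketch below assumes the commonly documented AWS factors (large = 4, xlarge = 8, and so on) and a hypothetical helper `covered_instances`; AWS applies this only to certain RIs (for example Linux with default tenancy), so treat it as an illustration.

```python
# Normalization factors per instance size (assumed from AWS documentation).
NORMALIZATION = {"large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32}


def covered_instances(ri_size: str, ri_count: int, usage_size: str) -> float:
    """How many instances of `usage_size` a set of Regional RIs can cover.

    Both sizes must belong to the same instance family.
    """
    units = NORMALIZATION[ri_size] * ri_count
    return units / NORMALIZATION[usage_size]


# Two xlarge RIs hold 16 units of coverage.
print(covered_instances("xlarge", 2, "large"))    # four large instances
print(covered_instances("xlarge", 2, "4xlarge"))  # half of one 4xlarge
```

A fractional result means the RI discount applies to part of that instance's hourly usage and the rest is billed on-demand.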

Reserved Instance Best Practices

•Cover baseline, not peak — Reserve capacity for your minimum steady-state usage; use on-demand or spot for variable load above baseline
•Start with 1-year terms — Until you have confidence in your forecasting, 1-year commitments limit risk
•Prefer Convertible for uncertainty — If your architecture might change, the flexibility premium is worth it
•Use Regional scope — Regional RIs provide size flexibility and work across AZs
•Monitor utilization — Track RI utilization rates; unused RIs are wasted money
•Set up RI purchase recommendations — AWS and GCP provide recommendations based on usage history
•Consider the secondary market — The AWS Reserved Instance Marketplace allows selling unused Standard RIs (Convertible RIs cannot be listed)

The RI Commitment Risk

Reserved Instances are a financial commitment. If you overcommit (buy more RIs than you use) or your needs change (migrating to containers, moving workloads), you'll pay for unused capacity. Start conservatively—it's better to pay slightly more on-demand for peak capacity than to waste money on unused reservations.

Savings Plans: The Modern Approach

AWS introduced Savings Plans in 2019 as a more flexible alternative to Reserved Instances. Rather than committing to specific instance types, you commit to a consistent amount of compute usage (measured in $/hour) for a 1 or 3-year term.

Savings Plans have largely superseded Reserved Instances for most use cases because they offer comparable discounts with significantly more flexibility.

Types of Savings Plans:

Compute Savings Plans

Apply to any EC2 instance regardless of family, size, AZ, region, OS, or tenancy
Also apply to AWS Fargate and Lambda usage
Maximum flexibility; discounts comparable to Convertible RIs and somewhat lower than Standard RIs

EC2 Instance Savings Plans

Apply to specific instance families in specific regions (e.g., M5 in us-east-1)
Flexible across sizes, AZ, and OS
Higher discount than Compute Savings Plans, comparable to Standard RIs

SageMaker Savings Plans

Apply to SageMaker ML instance usage
Similar flexibility to Compute Savings Plans

Savings Plans Advantages

•Apply across instance families (change m5 → c6g)
•Apply across regions (shift workloads)
•Cover Fargate and Lambda usage
•Simpler to manage than RI portfolios
•Automatic application to eligible usage
•No need to match specific instance types

Savings Plans Limitations

•Cannot be sold on marketplace (unlike RIs)
•No capacity reservation option
•Compute Savings Plans discount slightly less than Standard RIs
•Still a financial commitment—unused is wasted
•Limited provider support (AWS-focused)
•Requires understanding of $/hour commitment

Calculating Savings Plan commitment:

To determine the right commitment level, analyze your historical usage:

Identify baseline compute spend — Review your last 30-90 days of EC2/Fargate/Lambda usage
Find the minimum usage floor — The lowest hourly spend during that period
Apply a safety margin — Commit to 70-80% of the floor (not 100%)
Calculate hourly commitment — Sum of instance hourly costs at Savings Plan rates

Example calculation:

Your EC2 usage patterns for the month:

Minimum hourly spend: $50/hour (nighttime, weekends)
Average hourly spend: $80/hour
Peak hourly spend: $150/hour

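Recommended coverage: $40/hour of on-demand-equivalent usage (80% of minimum)

Savings Plan commitments are denominated at the discounted rate, so with a discount of roughly 30% this coverage corresponds to a commitment of about $28/hour ($40 × 0.70). Sizing it this way keeps the commitment fully utilized while leaving headroom for variability. The remaining $10-110/hour of usage runs on-demand, and peaks can use spot instances.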

savings-plan-analysis.py
Python
"""
Savings Plan Commitment Calculator

Analyzes historical EC2 usage to recommend a Savings Plan coverage level.
"""

import statistics
from datetime import datetime, timedelta
from typing import Dict, List

import boto3


def get_ec2_hourly_costs(start_date: datetime, end_date: datetime) -> List[float]:
    """
    Fetch hourly EC2 costs from AWS Cost Explorer.
    Returns a list of hourly spend amounts.

    Hourly granularity must be enabled in Cost Explorer and only covers a
    limited recent window; for longer periods use DAILY granularity or the
    Cost and Usage Report.
    """
    client = boto3.client('ce')

    request = {
        'TimePeriod': {
            # Hourly queries expect full ISO 8601 timestamps, not bare dates
            'Start': start_date.strftime('%Y-%m-%dT%H:%M:%SZ'),
            'End': end_date.strftime('%Y-%m-%dT%H:%M:%SZ'),
        },
        'Granularity': 'HOURLY',
        'Filter': {
            'Dimensions': {
                'Key': 'SERVICE',
                'Values': ['Amazon Elastic Compute Cloud - Compute'],
            }
        },
        'Metrics': ['UnblendedCost'],
    }

    # Results are paginated; follow NextPageToken until it is absent
    costs: List[float] = []
    while True:
        response = client.get_cost_and_usage(**request)
        costs.extend(
            float(period['Total']['UnblendedCost']['Amount'])
            for period in response['ResultsByTime']
        )
        token = response.get('NextPageToken')
        if not token:
            return costs
        request['NextPageToken'] = token


def recommend_savings_plan(
    hourly_costs: List[float],
    safety_margin: float = 0.80,  # Cover 80% of minimum
    savings_plan_discount: float = 0.30,  # Expected ~30% discount
) -> Dict:
    """
    Analyze usage patterns and recommend Savings Plan coverage.

    Coverage figures are in on-demand dollars per hour. The amount actually
    committed is measured at Savings Plan rates, i.e. reduced by the discount.
    """
    if not hourly_costs:
        raise ValueError("hourly_costs must not be empty")

    # Calculate key statistics
    min_usage = min(hourly_costs)
    avg_usage = statistics.mean(hourly_costs)
    max_usage = max(hourly_costs)
    p10_usage = statistics.quantiles(hourly_costs, n=10)[0]  # 10th percentile

    # Recommended coverage (conservative: 80% of minimum)
    recommended_commitment = min_usage * safety_margin

    # Alternative: slightly aggressive (90% of P10)
    aggressive_commitment = p10_usage * 0.90

    # Dollar amount to commit for the conservative coverage level
    commitment_at_plan_rates = recommended_commitment * (1 - savings_plan_discount)

    # Expected savings: only the covered share of spend receives the discount
    commitment_coverage = min(recommended_commitment, avg_usage) / avg_usage
    expected_savings = commitment_coverage * savings_plan_discount

    # Annual cost projection
    current_annual_cost = avg_usage * 24 * 365
    projected_annual_cost = current_annual_cost * (1 - expected_savings)
    annual_savings = current_annual_cost - projected_annual_cost

    return {
        'current_usage': {
            'minimum_hourly': round(min_usage, 2),
            'average_hourly': round(avg_usage, 2),
            'maximum_hourly': round(max_usage, 2),
            'p10_hourly': round(p10_usage, 2),
        },
        'recommendation': {
            'conservative_commitment': round(recommended_commitment, 2),
            'aggressive_commitment': round(aggressive_commitment, 2),
            'conservative_commitment_at_plan_rates': round(commitment_at_plan_rates, 2),
            'safety_margin_used': safety_margin,
        },
        'projected_impact': {
            'commitment_coverage': f"{commitment_coverage * 100:.1f}%",
            'expected_savings_rate': f"{expected_savings * 100:.1f}%",
            'current_annual_cost': f"${current_annual_cost:,.0f}",
            'projected_annual_cost': f"${projected_annual_cost:,.0f}",
            'annual_savings': f"${annual_savings:,.0f}",
        },
    }


# Example usage
if __name__ == "__main__":
    # Analysis window: last 90 days
    end_date = datetime.now()
    start_date = end_date - timedelta(days=90)

    # In real usage, call get_ec2_hourly_costs(start_date, end_date).
    # For demo purposes, use sample data for one day, repeated.
    sample_costs = [
        45, 48, 52, 55, 75, 85, 95, 120,         # Morning ramp
        130, 140, 145, 150, 145, 140, 135, 120,  # Business hours
        100, 85, 70, 60, 55, 50, 48, 45,         # Evening wind-down
    ] * 90  # Simulate 90 days

    result = recommend_savings_plan(sample_costs)

    print("=== Savings Plan Analysis ===")
    print("Current Usage:")
    print(f"  Min: ${result['current_usage']['minimum_hourly']}/hr")
    print(f"  Avg: ${result['current_usage']['average_hourly']}/hr")
    print(f"  Max: ${result['current_usage']['maximum_hourly']}/hr")
    print()
    print("Recommendation:")
    print(f"  Conservative: ${result['recommendation']['conservative_commitment']}/hr")
    print(f"  Aggressive: ${result['recommendation']['aggressive_commitment']}/hr")
    print()
    print("Projected Impact:")
    print(f"  Annual Savings: {result['projected_impact']['annual_savings']}")

Spot Instances: The Interruptible Goldmine

Spot Instances (AWS terminology) or Preemptible/Spot VMs (GCP/Azure) represent unused cloud capacity sold at steep discounts—typically 60-90% off on-demand pricing. The tradeoff is that these instances can be interrupted with minimal notice (2 minutes on AWS, 30 seconds on GCP).

Spot instances are the most misunderstood and underutilized pricing model. Many teams avoid them due to interruption fear, but with proper architecture, spot instances can safely run production workloads and dramatically reduce costs.

How Spot Pricing Works:

Spot prices fluctuate based on supply and demand for unused capacity in a specific instance type, AZ, and region combination. When demand exceeds supply, prices rise. Your instance may be interrupted when the current Spot price rises above the maximum price you set (if you set one), or when the provider needs the capacity back.

AWS Spot pricing has evolved:

Old model (pre-2017): Auction-based with highly volatile pricing
New model (current): More stable pricing, gradual price changes, capacity-based interruption

In practice, spot prices for popular instances are remarkably stable, often hovering at 60-70% discount for extended periods.

Spot Instance Characteristics by Cloud Provider
Provider	Name	Discount	Interruption Notice	Max Runtime
AWS	Spot Instances	60-90%	2 minutes	Unlimited*
GCP	Preemptible VMs	60-91%	30 seconds	24 hours
GCP	Spot VMs	60-91%	30 seconds	Unlimited*
Azure	Spot VMs	Up to 90%	30 seconds	Unlimited*

*"Unlimited" means no provider-enforced time limit, but interruptions can occur anytime based on capacity needs.

Workloads Suitable for Spot Instances:

Ideal candidates:

Batch processing — Data pipelines, ETL jobs, report generation
CI/CD builds — Build servers, test runners, automated testing
Machine learning training — Distributed training with checkpointing
Big data processing — Spark, EMR, MapReduce jobs with retries
Rendering and transcoding — Media processing, CGI rendering
Stateless web workers — Workers behind load balancers with health checks
Scientific computing — Simulations, genomics, climate modeling

Challenging candidates:

User-facing APIs — Need high availability
Databases — Stateful, interruption causes data issues
Real-time systems — Cannot tolerate random interruptions
Singleton services — No redundancy = no fault tolerance

Spot Instance Interruption Handling

The 2-minute warning (AWS) may seem short, but it's enough to: gracefully drain connections, checkpoint work in progress, store state externally, and trigger instance replacement. Your application must be designed to handle interruptions gracefully. If you can't handle a 2-minute shutdown, you can't use spot instances.

Spot Instance Architecture Patterns

•Instance Diversification — Spread workloads across multiple instance types and AZs. If one spot pool is interrupted, others continue. AWS Spot Fleet and Auto Scaling can manage this automatically.
•Checkpointing and Resume — Design jobs to checkpoint progress regularly. On interruption, restart from last checkpoint rather than beginning.
•Graceful Shutdown Handlers — Implement signal handlers (SIGTERM) that trigger graceful shutdown: drain queues, save state, deregister from load balancers.
•Spot + On-Demand Mix — Run critical baseline on On-Demand or Reserved; burst capacity on Spot. Preserves availability while optimizing cost.
•Queue-Based Decoupling — Put work in durable queues (SQS, Kafka). Workers pull from queue; if interrupted, another worker picks up the message.
•Spot Interruption Monitoring — Monitor interruption events and rates. Some instance types have lower interruption frequency—prefer those.
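Two of the patterns above, checkpointing and graceful shutdown handlers, combine naturally in a worker process. The following is a minimal sketch rather than a production implementation: `CheckpointedJob`, its JSON checkpoint file, and the per-item `process` callback are all hypothetical, and a real job would usually store checkpoints in durable external storage such as an object store.

```python
import json
import os
import signal
import tempfile


class CheckpointedJob:
    """Processes items 0..total_items-1, persisting progress after each item."""

    def __init__(self, path: str, total_items: int):
        self.path = path
        self.total_items = total_items
        self.stop_requested = False
        self.next_item = self._load()

    def _load(self) -> int:
        # Resume from the last checkpoint if one exists
        if os.path.exists(self.path):
            with open(self.path) as f:
                return json.load(f)["next_item"]
        return 0

    def _save(self) -> None:
        # Write to a temp file, then rename, so a crash never leaves
        # a half-written checkpoint behind
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"next_item": self.next_item}, f)
        os.replace(tmp, self.path)

    def request_stop(self, signum=None, frame=None) -> None:
        # Signal-handler compatible: only sets a flag; the loop exits cleanly
        self.stop_requested = True

    def run(self, process) -> bool:
        """Run until done or stopped. Returns True if all items completed."""
        while self.next_item < self.total_items and not self.stop_requested:
            process(self.next_item)
            self.next_item += 1
            self._save()
        return self.next_item >= self.total_items


if __name__ == "__main__":
    ckpt = os.path.join(tempfile.mkdtemp(), "job.ckpt")
    job = CheckpointedJob(ckpt, total_items=10)
    # SIGTERM is what a shutdown or drain script typically sends
    signal.signal(signal.SIGTERM, job.request_stop)
    finished = job.run(lambda item: None)
    print("finished" if finished else "interrupted; will resume from checkpoint")
```

The handler only sets a flag, so the item in progress always finishes and is checkpointed before the process exits. That keeps each unit of work small enough to complete well inside the interruption notice.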

Implementing Spot at Scale

Running spot instances safely at production scale requires robust tooling and automation. Let's explore the key implementation patterns.

AWS Spot Fleet and Auto Scaling:

AWS provides two primary mechanisms for managing spot capacity:

Spot Fleet — Request spot capacity across instance pools (instance type + AZ combinations). Define allocation strategy:

lowestPrice: Prioritize cheapest pools (higher interruption risk)
diversified: Spread across pools (lower interruption risk)
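capacityOptimized: Prioritize pools with highest availability (recommended); the related priceCapacityOptimized strategy also weighs price and is what AWS currently suggests for most workloads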

EC2 Auto Scaling Mixed Instances — Combine On-Demand and Spot in a single Auto Scaling group:

Set base capacity on On-Demand
Set percentage for On-Demand vs Spot
Define instance type priorities
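How the base capacity and the percentage setting interact can be sketched in a few lines. `split_capacity` is a hypothetical helper, and rounding the On-Demand share up is an assumption made here for illustration; consult the Auto Scaling documentation for the exact behavior.

```python
import math
from typing import Tuple


def split_capacity(desired: int, on_demand_base: int,
                   on_demand_pct_above_base: int) -> Tuple[int, int]:
    """Return (on_demand, spot) capacity units for a mixed-instances group.

    The first `on_demand_base` units are always On-Demand. Capacity above
    the base is divided according to `on_demand_pct_above_base` (0-100).
    """
    base = min(desired, on_demand_base)
    above_base = desired - base
    on_demand_above = math.ceil(above_base * on_demand_pct_above_base / 100)
    on_demand = base + on_demand_above
    return on_demand, desired - on_demand


print(split_capacity(10, on_demand_base=4, on_demand_pct_above_base=0))   # (4, 6)
print(split_capacity(10, on_demand_base=4, on_demand_pct_above_base=50))  # (7, 3)
print(split_capacity(3, on_demand_base=4, on_demand_pct_above_base=0))    # (3, 0)
```

With a base of 4 and 0% above base, every unit added by scaling runs on Spot while the group never drops below four On-Demand units.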

spot-auto-scaling.yaml
CloudFormation
# Mixed Instances Policy: On-Demand base + Spot scaling
# Provides baseline availability with cost-optimized scaling.
# PrivateSubnetA/B/C, WorkerTargetGroup, WorkerInstanceProfile and LatestAmiId
# are assumed to be defined elsewhere in the template.

AWSTemplateFormatVersion: '2010-09-09'
Description: 'Production Auto Scaling with Spot Instances'

Resources:
  ProductionAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: production-worker-asg
      VPCZoneIdentifier:
        - !Ref PrivateSubnetA
        - !Ref PrivateSubnetB
        - !Ref PrivateSubnetC
      # With WeightedCapacity set, sizes are in capacity units
      # (here: 1 unit = one xlarge-equivalent), not instance counts
      MinSize: 4
      MaxSize: 40
      DesiredCapacity: 10
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300
      TargetGroupARNs:
        - !Ref WorkerTargetGroup

      # Mixed Instances Policy for Spot + On-Demand
      MixedInstancesPolicy:
        InstancesDistribution:
          # Baseline: the first 4 capacity units (the MinSize floor) run On-Demand
          OnDemandBaseCapacity: 4
          OnDemandPercentageAboveBaseCapacity: 0  # All scaling above the base uses Spot

          # Spot allocation strategy
          # (SpotInstancePools only applies to lowest-price, so it is omitted)
          SpotAllocationStrategy: capacity-optimized

        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref WorkerLaunchTemplate
            Version: !GetAtt WorkerLaunchTemplate.LatestVersionNumber

          # Instance type diversification
          # List multiple similar instance types for flexibility
          Overrides:
            - InstanceType: m5.xlarge
              WeightedCapacity: '1'
            - InstanceType: m5a.xlarge
              WeightedCapacity: '1'
            - InstanceType: m5n.xlarge
              WeightedCapacity: '1'
            - InstanceType: m4.xlarge
              WeightedCapacity: '1'
            - InstanceType: m5.2xlarge
              WeightedCapacity: '2'
            - InstanceType: m5a.2xlarge
              WeightedCapacity: '2'

      # Lifecycle hooks for graceful Spot handling
      LifecycleHookSpecificationList:
        - LifecycleHookName: graceful-shutdown
          LifecycleTransition: 'autoscaling:EC2_INSTANCE_TERMINATING'
          HeartbeatTimeout: 120  # 2 minutes for graceful shutdown
          DefaultResult: CONTINUE

  # Launch template with Spot interruption handling
  WorkerLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        IamInstanceProfile:
          Arn: !GetAtt WorkerInstanceProfile.Arn
        ImageId: !Ref LatestAmiId

        # No InstanceMarketOptions here: a launch template that requests Spot
        # cannot be combined with a MixedInstancesPolicy, which already
        # decides the On-Demand/Spot split.

        # Detailed monitoring for quick health detection
        Monitoring:
          Enabled: true

        # User data with shutdown handling
        UserData:
          Fn::Base64: !Sub |
            #!/bin/bash
            # Install the AWS CLI if the AMI does not already include it
            amazon-linux-extras install -y aws-cli-2

            # Spot interruption handler (daemon)
            cat > /opt/spot-handler.sh << 'EOF'
            #!/bin/bash
            IMDS=http://169.254.169.254/latest
            TG_ARN="${WorkerTargetGroup}"

            while true; do
              # Fetch a fresh IMDSv2 token each pass so it never expires
              TOKEN=$(curl -s -X PUT "$IMDS/api/token" \
                -H "X-aws-ec2-metadata-token-ttl-seconds: 60")

              # Check for interruption notice (404 until one is issued)
              HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
                -H "X-aws-ec2-metadata-token: $TOKEN" \
                "$IMDS/meta-data/spot/instance-action")

              if [ "$HTTP_CODE" = "200" ]; then
                echo "Spot interruption notice received, initiating shutdown"
                INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
                  "$IMDS/meta-data/instance-id")
                # Deregister from load balancer first so no new requests arrive
                aws elbv2 deregister-targets --region ${AWS::Region} \
                  --target-group-arn "$TG_ARN" --targets Id="$INSTANCE_ID"
                # Signal application to drain
                /opt/app/drain-and-shutdown.sh
                exit 0
              fi

              sleep 5
            done
            EOF
            chmod +x /opt/spot-handler.sh
            nohup /opt/spot-handler.sh &

            # Start application with shutdown handling
            /opt/app/start.sh --graceful-shutdown=120

Kubernetes Spot Integration:

For containerized workloads, Kubernetes provides excellent spot integration:

AWS EKS with Karpenter: Karpenter automatically provisions the right nodes (including spot) based on pod requirements and cost optimization.

Node taints and tolerations: Mark spot nodes with taints; only pods with matching tolerations schedule there.

Pod Disruption Budgets: Ensure minimum availability during spot interruptions.

# Karpenter Provisioner for Spot Nodes
# (v1alpha5 API; newer Karpenter releases replace Provisioner with NodePool)
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-provisioner
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
  limits:
    resources:
      cpu: 1000
      memory: 2000Gi
  providerRef:
    name: default
  ttlSecondsAfterEmpty: 30

Spot Capacity Availability

Not all instance types have the same spot availability. Newer generation instances (m5, c5, r5) typically have more spot capacity than older generations. Instances with GPUs often have limited spot availability due to ML training demand. Check the Spot Instance Advisor (AWS) for interruption frequency data before selecting instance types.

Building a Balanced Portfolio

The most effective cloud cost strategy combines all purchasing options into a balanced portfolio, matching each option to the appropriate workload characteristics.

The Portfolio Approach:

Think of your compute strategy like an investment portfolio:

Core Holdings (Reserved/Savings Plans) — Cover predictable baseline demand
Variable Holdings (On-Demand) — Handle normal fluctuations above baseline
Opportunistic Holdings (Spot) — Capture value from interruptible workloads

Target allocation for a typical organization:

Pricing Model	Target Coverage	Workload Type
Reserved/Savings	50-70%	Steady-state production, databases, core services
On-Demand	10-20%	Variable production, peaks, spikes
Spot	20-30%	Batch, CI/CD, ML training, stateless workers

Calculating blended cost:

Assume on-demand hourly rate of $1.00:

Component	Usage %	Rate	Effective Cost
Reserved (60% off)	55%	$0.40	$0.22
On-Demand	20%	$1.00	$0.20
Spot (75% off)	25%	$0.25	$0.0625
Blended	100%	—	$0.48/hr

This portfolio achieves a 52% reduction from pure on-demand, compared to:

60% reduction if 100% Reserved (but inflexible and risky)
75% reduction if 100% Spot (but unreliable)

The portfolio approach optimizes for both cost AND reliability.
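The blended-cost table is a weighted average, which is easy to reproduce and to rerun with your own mix. The sketch below uses the same shares and rates as the table; `blended_rate` and the dictionary layout are illustrative choices, not part of any provider tooling.

```python
from typing import Dict, Tuple


def blended_rate(portfolio: Dict[str, Tuple[float, float]]) -> float:
    """Weighted-average hourly rate for {name: (usage_share, hourly_rate)}.

    Usage shares must sum to 1.0.
    """
    total_share = sum(share for share, _ in portfolio.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"usage shares sum to {total_share}, expected 1.0")
    return sum(share * rate for share, rate in portfolio.values())


ON_DEMAND_RATE = 1.00
portfolio = {
    "reserved":  (0.55, 0.40),  # 60% off
    "on_demand": (0.20, 1.00),
    "spot":      (0.25, 0.25),  # 75% off
}

rate = blended_rate(portfolio)
reduction = 1 - rate / ON_DEMAND_RATE
print(f"blended rate: ${rate:.4f}/hr")  # blended rate: $0.4825/hr
print(f"reduction vs on-demand: {reduction:.0%}")
```

Changing a share or a discount and rerunning shows how sensitive the blended rate is to each component before any commitment is made.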

Portfolio Optimization Process

•Analyze usage patterns — Collect 90+ days of hourly usage data. Identify baseline, variable, and burst patterns.
•Categorize workloads — Tag resources by workload type: steady-state, variable, interruptible.
•Size your commitments — Commit to 70-80% of baseline with Reserved/Savings. Leave headroom for changes.
•Configure spot for batch — Move eligible workloads (batch, CI/CD, training) to spot with proper architecture.
•Monitor and rebalance — Track utilization monthly. Adjust commitments as usage patterns change.
•Sell unused capacity — Use RI Marketplace to sell unused Reserved Instances.
•Iterate quarterly — Review portfolio quarterly against actual usage and upcoming changes.

Cross-Provider Considerations

While this page has primarily used AWS terminology, the pricing concepts apply across all major cloud providers with slight variations.

Azure Pricing Models:

Azure Reservations:

1 or 3-year terms for VMs, SQL Database, Cosmos DB, and more
Scope: Single subscription, management group, or shared
Exchangeable and refundable (with some restrictions)

Azure Spot VMs:

Up to 90% discount
Eviction based on capacity or price
Can set max price or use current price
30-second eviction notice

GCP Pricing Models:

Committed Use Discounts (CUDs):

1 or 3-year commitments
Resource-based (specific machine types) or spend-based
37-70% discounts

Preemptible VMs:

Up to 91% discount
Always terminated after 24 hours
30-second termination warning

Spot VMs (newer):

Similar discounts to Preemptible
No 24-hour maximum lifetime
Dynamic pricing

Pricing Model Terminology Across Providers
Concept	AWS	Azure	GCP
Standard Pricing	On-Demand	Pay-as-you-go	On-Demand
Commitment (specific)	Reserved Instances	Azure Reservations	Committed Use (resource)
Commitment (flexible)	Savings Plans	Azure savings plan for compute	Committed Use (spend)
Interruptible	Spot Instances	Spot VMs	Spot VMs / Preemptible
Sustained Use Discount	N/A	N/A	Automatic (GCE)

GCP Sustained Use Discounts

GCP uniquely offers automatic Sustained Use Discounts (SUDs) for Compute Engine. If you run an instance continuously, you receive automatic discounts up to 30% without any commitment. This reduces the relative value of Committed Use for GCP compared to AWS Reserved Instances, since you're already getting partial discounts automatically.
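The "up to 30%" figure follows from a tiered schedule in which each successive quarter of the month is billed at a lower rate. The sketch below assumes the tiers published for N1 machine types (100%, 80%, 60%, and 40% of the base rate); other machine families use different schedules or have no sustained use discount at all, so treat the numbers as an assumption.

```python
# (upper bound of usage fraction, multiplier on base rate); assumed N1 tiers
SUD_TIERS = [(0.25, 1.00), (0.50, 0.80), (0.75, 0.60), (1.00, 0.40)]


def sud_discount(usage_fraction: float) -> float:
    """Effective discount for running an instance `usage_fraction` of a month."""
    if not 0 < usage_fraction <= 1:
        raise ValueError("usage_fraction must be in (0, 1]")
    billed = 0.0
    lower = 0.0
    for upper, multiplier in SUD_TIERS:
        # Portion of the usage that falls inside this tier
        portion = max(min(usage_fraction, upper) - lower, 0.0)
        billed += portion * multiplier
        lower = upper
    return 1 - billed / usage_fraction


for fraction in (0.25, 0.50, 0.75, 1.00):
    print(f"{fraction:.0%} of month -> {sud_discount(fraction):.0%} discount")
```

Only an instance that runs the whole month reaches the full 30%; one that runs half the month gets 10% under these tiers.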

Summary: Reserved vs Spot Instances

Cloud compute pricing is not one-size-fits-all. The choice between on-demand, reserved, savings plans, and spot depends on your workload characteristics, risk tolerance, and organizational maturity. Let's consolidate the key takeaways:

Key Takeaways

•On-demand is the most expensive option — It's paying a premium for maximum flexibility. Reserve it for true variability, not laziness.
•Reserved Instances/Savings Plans cover baseline — Commit to 70-80% of your minimum usage floor. Conservative commitments avoid waste.
•Savings Plans offer flexibility over RIs — For most organizations, Compute Savings Plans are the better choice unless you need capacity reservations.
•Spot instances offer massive savings — 60-90% discounts for interruptible workloads. Design for interruption, and spot becomes safe for production.
•Diversification reduces spot risk — Spread across instance types and AZs. Use capacity-optimized allocation strategies.
•Build a balanced portfolio — Combine all pricing models: Reserved for baseline, On-Demand for variability, Spot for batch.
•Monitor and rebalance regularly — Usage patterns change. Review commitment utilization and adjust quarterly.

What's next:

With pricing models understood, the next step is ensuring you're not paying for more capacity than you need. The next page explores Right-Sizing Resources—the practice of matching resource allocation to actual workload requirements, often the single highest-impact optimization available.

Page Complete

You now understand the economics of cloud compute pricing, from basic On-Demand to advanced Spot strategies. These purchasing decisions can reduce compute costs by 50-70% with no changes to your actual workloads. Next, we'll ensure you're not over-provisioning the resources you're buying.

2 / 5

Loading learning content...

System Design (HLD)Cloud Cost Optimization

Cloud Cost Optimization

LevelIntermediate

Duration90 mins

TopicCloud Cost Optimization

2 / 5

Reserved vs Spot Instances

The Economics of Cloud Compute

What You Will Learn

Cloud Compute Pricing Models

All major cloud providers offer tiered pricing models for compute resources. While the specific names and details vary, the fundamental options are remarkably similar:

On-Demand (Pay-as-you-go)

Reserved Instances / Committed Use

Savings Plans (AWS) / Committed Use Discounts (GCP)

A more flexible commitment model where you commit to a dollar amount of hourly spend rather than specific instance types. This provides discount flexibility across instance families and regions.

Spot Instances / Preemptible VMs / Spot VMs

Compute Pricing Model Comparison
Model	Discount	Commitment	Flexibility	Interruption Risk	Best For
On-Demand	0%	None	Maximum	None	Variable workloads, testing
Reserved (1yr)	30-40%	1 year	Low	None	Steady-state production
Reserved (3yr)	50-75%	3 years	Very Low	None	Long-term baseline
Savings Plans	30-75%	1-3 years	Medium	None	Evolving architectures
Spot/Preemptible	60-90%	None	High	High	Batch, stateless, ML training

The Pricing Hierarchy

Deep Dive: Reserved Instances

How Reserved Instances Work:

RI Purchasing Options:

Reserved Instances come in three payment options:

All Upfront (AURI) — Pay the entire commitment upfront; highest discount
Partial Upfront (PURI) — Pay some upfront, remainder monthly; medium discount
No Upfront (NURI) — Pay nothing upfront, all monthly; lowest RI discount

The discount difference between AURI and NURI is typically 5-10%. For organizations optimizing for cash flow, PURI or NURI may be preferable despite the slightly lower discount.

AWS EC2 Reserved Instance Pricing Example (m5.xlarge, us-east-1)
Option	1-Year Term	3-Year Term	Savings vs On-Demand
On-Demand	$0.192/hour	$0.192/hour	—
All Upfront	$0.120/hour effective	$0.076/hour effective	37% / 61%
Partial Upfront	$0.124/hour effective	$0.080/hour effective	35% / 58%
No Upfront	$0.128/hour	$0.085/hour	33% / 56%

Standard vs Convertible Reserved Instances:

Standard RIs are tied to a specific instance type and cannot be changed. They offer slightly higher discounts (additional 5-10%) but require accurate capacity planning.

Convertible RIs allow you to exchange for different instance types, families, operating systems, or tenancy during the term. This flexibility is valuable when your architecture is evolving.

Standard RI:      m5.xlarge → m5.xlarge only (highest discount)
Convertible RI:   m5.xlarge → r5.xlarge or c5.2xlarge (flexible)

RI Scope: Regional vs Zonal

Regional RIs provide a discount for any matching usage across all Availability Zones in a region and provide instance size flexibility within the instance family.

Zonal RIs are tied to a specific AZ and also include a capacity reservation guarantee (you are guaranteed capacity in that AZ), but lack size flexibility.

For most workloads, Regional RIs are preferred because they offer more flexibility and still cover multi-AZ deployments.
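Size flexibility is easiest to see with AWS's normalization factors, where each size step doubles the footprint (large = 4, xlarge = 8, 2xlarge = 16). The sketch below is an illustration only: the helper name is hypothetical, it covers a single instance family, and it ignores the platform and tenancy conditions under which AWS applies size flexibility.

```python
from typing import Dict

# Normalization factors for a few sizes; each step up doubles the units
NORMALIZATION = {"large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32}


def covered_fraction(reserved: Dict[str, int], running: Dict[str, int]) -> float:
    """Fraction of running capacity in one family covered by Regional RIs.

    Both arguments map an instance size to a count of instances.
    """
    reserved_units = sum(NORMALIZATION[size] * n for size, n in reserved.items())
    running_units = sum(NORMALIZATION[size] * n for size, n in running.items())
    if running_units == 0:
        return 0.0
    return min(reserved_units, running_units) / running_units


if __name__ == "__main__":
    # One m5.2xlarge RI (16 units) covers two running m5.xlarge (2 x 8 units)
    print(covered_fraction({"2xlarge": 1}, {"xlarge": 2}))
    # One m5.xlarge RI covers only half of a running m5.2xlarge
    print(covered_fraction({"xlarge": 1}, {"2xlarge": 1}))
```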

Reserved Instance Best Practices

•Cover baseline, not peak — Reserve capacity for your minimum steady-state usage; use on-demand or spot for variable load above baseline
•Start with 1-year terms — Until you have confidence in your forecasting, 1-year commitments limit risk
•Prefer Convertible for uncertainty — If your architecture might change, the flexibility premium is worth it
•Use Regional scope — Regional RIs provide size flexibility and work across AZs
•Monitor utilization — Track RI utilization rates; unused RIs are wasted money
•Set up RI purchase recommendations — AWS and GCP provide recommendations based on usage history
•Consider the secondary market — The EC2 Reserved Instance Marketplace allows selling unused Standard RIs (Convertible RIs cannot be listed)
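The "monitor utilization" practice comes down to two numbers: the share of purchased RI hours that matched running instances, and the money spent on hours that matched nothing. A minimal sketch, with hypothetical function names and made-up monthly figures:

```python
def ri_utilization(purchased_hours: float, used_hours: float) -> float:
    """Share of purchased RI hours that were applied to running instances."""
    if purchased_hours <= 0:
        raise ValueError("purchased_hours must be positive")
    return min(used_hours, purchased_hours) / purchased_hours


def wasted_spend(purchased_hours: float, used_hours: float, effective_rate: float) -> float:
    """Cost of RI hours that were paid for but matched no running instance."""
    unused_hours = max(0.0, purchased_hours - used_hours)
    return unused_hours * effective_rate


if __name__ == "__main__":
    # Illustrative month: 10 RIs x 720 hours purchased, 6,120 hours matched
    purchased, used = 10 * 720, 6120
    print(f"Utilization: {ri_utilization(purchased, used):.0%}")
    print(f"Wasted: ${wasted_spend(purchased, used, 0.120):,.2f}")
```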

The RI Commitment Risk

Savings Plans: The Modern Approach

Savings Plans have largely superseded Reserved Instances for most use cases because they offer comparable discounts with significantly more flexibility.

Types of Savings Plans:

Compute Savings Plans

Apply to any EC2 instance regardless of family, size, AZ, region, OS, or tenancy
Also apply to AWS Fargate and Lambda usage
Maximum flexibility; discounts (up to 66%) are on par with Convertible RIs and slightly below Standard RIs

EC2 Instance Savings Plans

Apply to specific instance families in specific regions (e.g., M5 in us-east-1)
Flexible across sizes, AZ, and OS
Higher discount than Compute Savings Plans (up to 72%), on par with Standard RIs

SageMaker Savings Plans

Apply to SageMaker ML instance usage
Similar flexibility to Compute Savings Plans

Savings Plans Advantages

•Apply across instance families (change m5 → c6g)
•Apply across regions (shift workloads)
•Cover Fargate and Lambda usage
•Simpler to manage than RI portfolios
•Automatic application to eligible usage
•No need to match specific instance types

Savings Plans Limitations

•Cannot be sold on a marketplace (unlike Standard RIs)
•No capacity reservation option
•Compute Savings Plans discount slightly less than Standard RIs
•Still a financial commitment—unused is wasted
•Provider-specific — terms and eligible services differ between clouds (this section describes AWS)
•Requires understanding of $/hour commitment

Calculating Savings Plan commitment:

To determine the right commitment level, analyze your historical usage:

Identify baseline compute spend — Review your last 30-90 days of EC2/Fargate/Lambda usage
Find the minimum usage floor — The lowest hourly spend during that period
Apply a safety margin — Commit to 70-80% of the floor (not 100%)
Calculate hourly commitment — Sum of instance hourly costs at Savings Plan rates

Example calculation:

Your EC2 usage patterns for the month:

Minimum hourly spend: $50/hour (nighttime, weekends)
Average hourly spend: $80/hour
Peak hourly spend: $150/hour

Recommended Savings Plan: $40/hour commitment (80% of minimum)

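This keeps the commitment fully utilized while leaving headroom for variability. Usage above the commitment (roughly $10-110/hour here) runs on-demand, and interruptible peak work can move to spot instances. Keep in mind that a commitment is expressed at discounted Savings Plan rates: if the $50 floor was measured at on-demand rates, convert it with your expected discount before committing, otherwise a $40/hour commitment would cover more on-demand usage than the floor and sit partly idle overnight.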
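The conversion between on-demand spend and a Savings Plan commitment can be sketched as follows. This is an illustration under stated assumptions: the spend figures are on-demand amounts, the 30% discount is a placeholder (not a quoted AWS rate), and both function names are hypothetical.

```python
from typing import Tuple


def commitment_from_on_demand_floor(
    min_on_demand_spend: float,
    safety_margin: float = 0.80,
    discount: float = 0.30,  # assumed Savings Plan discount
) -> Tuple[float, float]:
    """Return (on-demand usage covered, hourly commitment at plan rates)."""
    covered_on_demand = min_on_demand_spend * safety_margin
    commitment = covered_on_demand * (1 - discount)
    return covered_on_demand, commitment


def hourly_bill(on_demand_usage: float, commitment: float, discount: float = 0.30) -> float:
    """Total charge for one hour: the commitment is always billed, and any
    usage beyond what the plan absorbs is charged at on-demand rates."""
    absorbed = commitment / (1 - discount)  # on-demand-equivalent usage covered
    overflow = max(0.0, on_demand_usage - absorbed)
    return commitment + overflow


if __name__ == "__main__":
    covered, commit = commitment_from_on_demand_floor(50.0)
    print(f"Covers ${covered:.2f}/hr of on-demand usage for ${commit:.2f}/hr committed")
    for usage in (50.0, 80.0, 150.0):  # minimum, average, peak from the example
        print(f"  usage ${usage:.0f}/hr -> bill ${hourly_bill(usage, commit):.2f}/hr")
```

Under these assumptions, covering $40/hour of on-demand usage takes a $28/hour commitment, and the overflow billed on-demand ranges from $10 at the floor to $110 at peak.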

savings-plan-analysis.py
Python
"""
Savings Plan Commitment Calculator
 
Analyzes historical EC2 usage to recommend optimal Savings Plan commitment.
"""
 
import boto3
from datetime import datetime, timedelta
from typing import List, Dict
import statistics
 
def get_ec2_hourly_costs(
    start_date: datetime,
    end_date: datetime
) -> List[float]:
    """
    Fetch hourly EC2 costs from AWS Cost Explorer.
    Returns list of hourly spend amounts.
    """
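    client = boto3.client('ce')

    # Hourly granularity must be enabled in Cost Explorer and only covers a
    # short recent window (about the last 14 days); longer look-backs need
    # another source such as the Cost and Usage Report.
    request = {
        'TimePeriod': {
            # Hourly queries require full timestamps, not plain dates
            'Start': start_date.strftime('%Y-%m-%dT%H:00:00Z'),
            'End': end_date.strftime('%Y-%m-%dT%H:00:00Z'),
        },
        'Granularity': 'HOURLY',
        'Filter': {
            'Dimensions': {
                'Key': 'SERVICE',
                'Values': ['Amazon Elastic Compute Cloud - Compute']
            }
        },
        'Metrics': ['UnblendedCost'],
    }

    # Results are paginated; follow NextPageToken until it is absent
    costs = []
    while True:
        response = client.get_cost_and_usage(**request)
        costs.extend(
            float(period['Total']['UnblendedCost']['Amount'])
            for period in response['ResultsByTime']
        )
        token = response.get('NextPageToken')
        if not token:
            break
        request['NextPageToken'] = token

    return costs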
 
def recommend_savings_plan(
    hourly_costs: List[float],
    safety_margin: float = 0.80,  # Commit to 80% of minimum
    savings_plan_discount: float = 0.30  # Expected ~30% discount
) -> Dict:
    """
    Analyze usage patterns and recommend Savings Plan commitment.
    """
    # Calculate key statistics
    min_usage = min(hourly_costs)
    avg_usage = statistics.mean(hourly_costs)
    max_usage = max(hourly_costs)
    p10_usage = statistics.quantiles(hourly_costs, n=10)[0]  # 10th percentile
    
    # Recommended commitment (conservative: 80% of minimum)
    recommended_commitment = min_usage * safety_margin
    
    # Alternative: slightly aggressive (90% of P10)
    aggressive_commitment = p10_usage * 0.90
    
    # Calculate expected savings
    commitment_coverage = min(recommended_commitment, avg_usage) / avg_usage
    expected_savings = commitment_coverage * savings_plan_discount
    
    # Annual cost projection
    current_annual_cost = avg_usage * 24 * 365
    projected_annual_cost = current_annual_cost * (1 - expected_savings)
    annual_savings = current_annual_cost - projected_annual_cost
    
    return {
        'current_usage': {
            'minimum_hourly': round(min_usage, 2),
            'average_hourly': round(avg_usage, 2),
            'maximum_hourly': round(max_usage, 2),
            'p10_hourly': round(p10_usage, 2),
        },
        'recommendation': {
            'conservative_commitment': round(recommended_commitment, 2),
            'aggressive_commitment': round(aggressive_commitment, 2),
            'safety_margin_used': safety_margin,
        },
        'projected_impact': {
            'commitment_coverage': f"{commitment_coverage * 100:.1f}%",
            'expected_savings_rate': f"{expected_savings * 100:.1f}%",
            'current_annual_cost': f"${current_annual_cost:,.0f}",
            'projected_annual_cost': f"${projected_annual_cost:,.0f}",
            'annual_savings': f"${annual_savings:,.0f}",
        }
    }
 
# Example usage
if __name__ == "__main__":
    # Analyze last 90 days
    end_date = datetime.now()
    start_date = end_date - timedelta(days=90)
    
    # In real usage, this would call AWS Cost Explorer
    # For demo, using sample data
    sample_costs = [
        45, 48, 52, 55, 75, 85, 95, 120,  # Morning ramp
        130, 140, 145, 150, 145, 140, 135, 120,  # Business hours
        100, 85, 70, 60, 55, 50, 48, 45,  # Evening wind-down
    ] * 90  # Simulate 90 days
    
    result = recommend_savings_plan(sample_costs)
    
    print("=== Savings Plan Analysis ===")
    print(f"Current Usage:")
    print(f"  Min: ${result['current_usage']['minimum_hourly']}/hr")
    print(f"  Avg: ${result['current_usage']['average_hourly']}/hr")
    print(f"  Max: ${result['current_usage']['maximum_hourly']}/hr")
    print()
    print(f"Recommendation:")
    print(f"  Conservative: ${result['recommendation']['conservative_commitment']}/hr")
    print(f"  Aggressive: ${result['recommendation']['aggressive_commitment']}/hr")
    print()
    print(f"Projected Impact:")
    print(f"  Annual Savings: {result['projected_impact']['annual_savings']}")

Spot Instances: The Interruptible Goldmine

How Spot Pricing Works:

AWS Spot pricing has evolved:

Old model (pre-2017): Auction-based with highly volatile pricing
New model (current): More stable pricing, gradual price changes, capacity-based interruption

In practice, spot prices for popular instances are remarkably stable, often hovering at 60-70% discount for extended periods.

Spot Instance Characteristics by Cloud Provider
Provider	Name	Discount	Interruption Notice	Max Runtime
AWS	Spot Instances	60-90%	2 minutes	Unlimited*
GCP	Preemptible VMs	60-91%	30 seconds	24 hours
GCP	Spot VMs	60-91%	30 seconds	Unlimited*
Azure	Spot VMs	Up to 90%	30 seconds	Unlimited*

*"Unlimited" means no provider-enforced time limit, but interruptions can occur anytime based on capacity needs.
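The notice windows in the table decide how much cleanup is realistic. The sketch below assumes the AWS notice arrives as a small JSON document with `action` and `time` fields (the shape returned by the instance metadata `spot/instance-action` path); the step names and durations are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import List, Tuple


def seconds_until_interruption(instance_action_json: str, now: datetime) -> float:
    """Seconds left before the provider reclaims the instance."""
    notice = json.loads(instance_action_json)
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return (deadline - now).total_seconds()


def plan_shutdown(seconds_left: float, steps: List[Tuple[str, float]]) -> List[str]:
    """Pick, in priority order, the shutdown steps that fit in the time left.

    A step that does not fit is skipped so later, shorter steps still run.
    """
    chosen = []
    remaining = seconds_left
    for name, duration in steps:
        if duration <= remaining:
            chosen.append(name)
            remaining -= duration
    return chosen


if __name__ == "__main__":
    notice = '{"action": "terminate", "time": "2024-01-01T00:02:00Z"}'
    now = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    left = seconds_until_interruption(notice, now)
    steps = [("deregister", 5), ("drain", 20), ("checkpoint", 60)]
    print(left, plan_shutdown(left, steps))  # 2-minute window (AWS)
    print(30, plan_shutdown(30, steps))      # 30-second window (GCP, Azure)
```

With a 30-second window the 60-second checkpoint step no longer fits, which is why workloads on those providers tend to checkpoint continuously rather than on notice.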

Workloads Suitable for Spot Instances:

Ideal candidates:

Batch processing — Data pipelines, ETL jobs, report generation
CI/CD builds — Build servers, test runners, automated testing
Machine learning training — Distributed training with checkpointing
Big data processing — Spark, EMR, MapReduce jobs with retries
Rendering and transcoding — Media processing, CGI rendering
Stateless web workers — Workers behind load balancers with health checks
Scientific computing — Simulations, genomics, climate modeling

Challenging candidates:

User-facing APIs — Need high availability
Databases — Stateful, interruption causes data issues
Real-time systems — Cannot tolerate random interruptions
Singleton services — No redundancy = no fault tolerance

Spot Instance Interruption Handling

Spot Instance Architecture Patterns

•Instance Diversification — Spread workloads across multiple instance types and AZs. If one spot pool is interrupted, others continue. AWS Spot Fleet and Auto Scaling can manage this automatically.
•Checkpointing and Resume — Design jobs to checkpoint progress regularly. On interruption, restart from last checkpoint rather than beginning.
•Graceful Shutdown Handlers — Implement signal handlers (SIGTERM) that trigger graceful shutdown: drain queues, save state, deregister from load balancers.
•Spot + On-Demand Mix — Run critical baseline on On-Demand or Reserved; burst capacity on Spot. Preserves availability while optimizing cost.
•Queue-Based Decoupling — Put work in durable queues (SQS, Kafka). Workers pull from queue; if interrupted, another worker picks up the message.
•Spot Interruption Monitoring — Monitor interruption events and rates. Some instance types have lower interruption frequency—prefer those.
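Two of the patterns above, checkpointing and graceful shutdown handlers, combine naturally in a worker loop. This is a minimal sketch rather than a production design: the class and function names are hypothetical, and the checkpoint is written to a local file for brevity, whereas a real Spot worker would need durable storage that survives the instance.

```python
import json
import os
import signal
import tempfile
from pathlib import Path
from typing import Callable, Sequence


class ShutdownFlag:
    """Records that a termination signal arrived so the loop can stop cleanly."""

    def __init__(self) -> None:
        self.requested = False

    def install(self) -> None:
        # Must be called from the main thread
        signal.signal(signal.SIGTERM, self._on_signal)

    def _on_signal(self, signum, frame) -> None:
        self.requested = True


def load_checkpoint(path: Path) -> int:
    """Index of the next unprocessed item (0 if no checkpoint exists)."""
    if path.exists():
        return json.loads(path.read_text())["next_item"]
    return 0


def save_checkpoint(path: Path, next_item: int) -> None:
    """Write via a temp file and rename so the checkpoint is never half-written."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"next_item": next_item}))
    os.replace(tmp, path)


def run_job(items: Sequence, path: Path, flag: ShutdownFlag, process: Callable) -> bool:
    """Process items from the last checkpoint; return False if interrupted."""
    for index in range(load_checkpoint(path), len(items)):
        if flag.requested:
            return False
        process(items[index])
        save_checkpoint(path, index + 1)
    return True


if __name__ == "__main__":
    flag = ShutdownFlag()
    flag.install()
    with tempfile.TemporaryDirectory() as workdir:
        done = run_job(list(range(5)), Path(workdir) / "job.ckpt", flag, print)
        print("finished" if done else "interrupted; rerun to resume")
```

Checking the flag between items keeps each item atomic: an interrupted run loses at most the item in flight, and a replacement worker resumes from the recorded index.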

Implementing Spot at Scale

Running spot instances safely at production scale requires robust tooling and automation. Let's explore the key implementation patterns.

AWS Spot Fleet and Auto Scaling:

AWS provides two primary mechanisms for managing spot capacity:

Spot Fleet — Request spot capacity across instance pools (instance type + AZ combinations). Define allocation strategy:

lowestPrice: Prioritize cheapest pools (higher interruption risk)
diversified: Spread across pools (lower interruption risk)
capacityOptimized: Prioritize pools with highest availability (recommended)
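A toy model makes the difference between the three strategies concrete. It is a deliberate simplification: AWS does not publish per-pool spare capacity, so the `spare_capacity` score and the pool data here are invented for illustration, and the real service weighs more factors than this.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SpotPool:
    instance_type: str
    az: str
    price: float         # current Spot price, $/hour
    spare_capacity: int  # hypothetical score; not exposed by AWS

    @property
    def name(self) -> str:
        return f"{self.instance_type}/{self.az}"


def allocate(pools: List[SpotPool], target: int, strategy: str) -> Dict[str, int]:
    """Distribute `target` instances across pools under a named strategy."""
    if strategy == "lowestPrice":
        return {min(pools, key=lambda p: p.price).name: target}
    if strategy == "capacityOptimized":
        return {max(pools, key=lambda p: p.spare_capacity).name: target}
    if strategy == "diversified":
        base, extra = divmod(target, len(pools))
        return {p.name: base + (1 if i < extra else 0) for i, p in enumerate(pools)}
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    pools = [
        SpotPool("m5.xlarge", "us-east-1a", 0.058, 20),
        SpotPool("m5a.xlarge", "us-east-1b", 0.061, 90),
        SpotPool("m4.xlarge", "us-east-1c", 0.065, 50),
    ]
    for strategy in ("lowestPrice", "diversified", "capacityOptimized"):
        print(strategy, allocate(pools, 10, strategy))
```

In this model lowestPrice puts all ten instances in the cheapest but tightest pool, which is the concentration risk the list above warns about.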

EC2 Auto Scaling Mixed Instances — Combine On-Demand and Spot in a single Auto Scaling group:

Set base capacity on On-Demand
Set percentage for On-Demand vs Spot
Define instance type priorities
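The base-capacity and percentage settings determine the On-Demand/Spot split at any group size. The sketch below shows the arithmetic; the function name is hypothetical, and rounding the On-Demand share up is an assumption made here rather than documented behavior.

```python
import math
from typing import Tuple


def split_capacity(
    desired: int,
    on_demand_base: int,
    on_demand_pct_above_base: int,
) -> Tuple[int, int]:
    """Return (on_demand, spot) capacity for a mixed-instances group.

    The base is filled with On-Demand first; capacity above the base is
    split by percentage. Rounding up the On-Demand part is an assumption.
    """
    base = min(desired, on_demand_base)
    above_base = desired - base
    on_demand_above = math.ceil(above_base * on_demand_pct_above_base / 100)
    on_demand = base + on_demand_above
    return on_demand, desired - on_demand


if __name__ == "__main__":
    # Base of 4 with 0% On-Demand above it: all scaling lands on Spot
    for desired in (4, 10, 40):
        print(desired, split_capacity(desired, 4, 0))
    # Same base, but half of everything above it stays On-Demand
    print(10, split_capacity(10, 4, 50))
```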

spot-auto-scaling.yaml
CloudFormation
# Mixed Instances Policy: On-Demand Base + Spot Scaling
# Provides baseline availability with cost-optimized scaling
# Fragment: PrivateSubnetA/B/C, WorkerTargetGroup, WorkerInstanceProfile and
# LatestAmiId are assumed to be defined elsewhere in the template.

AWSTemplateFormatVersion: '2010-09-09'
Description: 'Production Auto Scaling with Spot Instances'

Resources:
  ProductionAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: production-worker-asg
      VPCZoneIdentifier:
        - !Ref PrivateSubnetA
        - !Ref PrivateSubnetB
        - !Ref PrivateSubnetC
      # With WeightedCapacity set below, sizes are counted in capacity units
      # (the weights match vCPU counts), not in instances
      MinSize: 4
      MaxSize: 40
      DesiredCapacity: 10
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300
      TargetGroupARNs:
        - !Ref WorkerTargetGroup

      # Mixed Instances Policy for Spot + On-Demand
      MixedInstancesPolicy:
        InstancesDistribution:
          # Baseline: the first 4 capacity units run On-Demand
          OnDemandBaseCapacity: 4
          OnDemandPercentageAboveBaseCapacity: 0  # All scaling uses Spot

          # Spot allocation strategy. SpotInstancePools is omitted because it
          # only applies to the lowest-price strategy.
          SpotAllocationStrategy: capacity-optimized

        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref WorkerLaunchTemplate
            Version: !GetAtt WorkerLaunchTemplate.LatestVersionNumber

          # Instance type diversification
          # List multiple similar instance types for flexibility
          Overrides:
            - InstanceType: m5.xlarge
              WeightedCapacity: '4'
            - InstanceType: m5a.xlarge
              WeightedCapacity: '4'
            - InstanceType: m5n.xlarge
              WeightedCapacity: '4'
            - InstanceType: m4.xlarge
              WeightedCapacity: '4'
            - InstanceType: m5.2xlarge
              WeightedCapacity: '8'
            - InstanceType: m5a.2xlarge
              WeightedCapacity: '8'

      # Lifecycle hooks for graceful Spot handling
      LifecycleHookSpecificationList:
        - LifecycleHookName: graceful-shutdown
          LifecycleTransition: 'autoscaling:EC2_INSTANCE_TERMINATING'
          HeartbeatTimeout: 120  # 2 minutes for graceful shutdown
          DefaultResult: CONTINUE

  # Launch template with Spot interruption handling
  WorkerLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        IamInstanceProfile:
          Arn: !GetAtt WorkerInstanceProfile.Arn
        ImageId: !Ref LatestAmiId

        # No InstanceMarketOptions here: a launch template that requests Spot
        # cannot be used with a MixedInstancesPolicy, which decides the
        # On-Demand/Spot split itself.

        # Detailed monitoring for quick health detection
        Monitoring:
          Enabled: true

        # User data with shutdown handling
        UserData:
          Fn::Base64: !Sub |
            #!/bin/bash
            # Assumes the AWS CLI is already on the AMI (Amazon Linux 2 ships with it)

            # Spot interruption handler (daemon)
            cat > /opt/spot-handler.sh << 'EOF'
            #!/bin/bash
            TG_ARN="${WorkerTargetGroup}"  # Substituted by CloudFormation (!Sub)
            IMDS="http://169.254.169.254/latest"

            while true; do
              # Fetch a fresh IMDSv2 token each pass so it never expires
              TOKEN=$(curl -s -X PUT "$IMDS/api/token" \
                -H "X-aws-ec2-metadata-token-ttl-seconds: 60")

              # Check for interruption notice (404 until one is issued)
              HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
                -H "X-aws-ec2-metadata-token: $TOKEN" \
                "$IMDS/meta-data/spot/instance-action")

              if [ "$HTTP_CODE" -eq 200 ]; then
                echo "Spot interruption notice received, initiating shutdown"
                # Signal application to drain
                /opt/app/drain-and-shutdown.sh
                # Deregister from load balancer
                INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
                  "$IMDS/meta-data/instance-id")
                aws elbv2 deregister-targets --region "${AWS::Region}" \
                  --target-group-arn "$TG_ARN" --targets "Id=$INSTANCE_ID"
                exit 0
              fi

              sleep 5
            done
            EOF
            chmod +x /opt/spot-handler.sh
            nohup /opt/spot-handler.sh &

            # Start application with shutdown handling
            /opt/app/start.sh --graceful-shutdown=120

Kubernetes Spot Integration:

For containerized workloads, Kubernetes handles spot capacity well because pods evicted from a reclaimed node can be rescheduled onto the remaining nodes:

AWS EKS with Karpenter: Karpenter automatically provisions the right nodes (including spot) based on pod requirements and cost optimization.

Node taints and tolerations: Mark spot nodes with taints; only pods with matching tolerations schedule there.

Pod Disruption Budgets: Ensure minimum availability during spot interruptions.

# Karpenter Provisioner for Spot Nodes
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-provisioner
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
  limits:
    resources:
      cpu: 1000
      memory: 2000Gi
  providerRef:
    name: default
  ttlSecondsAfterEmpty: 30

Spot Capacity Availability

Building a Balanced Portfolio

The most effective cloud cost strategy combines all purchasing options into a balanced portfolio, matching each option to the appropriate workload characteristics.

The Portfolio Approach:

Think of your compute strategy like an investment portfolio:

Core Holdings (Reserved/Savings Plans) — Cover predictable baseline demand
Variable Holdings (On-Demand) — Handle normal fluctuations above baseline
Opportunistic Holdings (Spot) — Capture value from interruptible workloads

Target allocation for a typical organization:

Pricing Model	Target Coverage	Workload Type
Reserved/Savings	50-70%	Steady-state production, databases, core services
On-Demand	10-20%	Variable production, peaks, spikes
Spot	20-30%	Batch, CI/CD, ML training, stateless workers


Calculating blended cost:

Assume on-demand hourly rate of $1.00:

Component	Usage %	Rate	Effective Cost
Reserved (60% off)	55%	$0.40	$0.22
On-Demand	20%	$1.00	$0.20
Spot (75% off)	25%	$0.25	$0.0625
Blended	100%	—	$0.48/hr

This portfolio achieves a 52% reduction from pure on-demand, compared to:

60% reduction if 100% Reserved (but inflexible and risky)
75% reduction if 100% Spot (but unreliable)

The portfolio approach optimizes for both cost AND reliability.
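The blended figure is a usage-weighted average of the discounted rates. A minimal sketch using the shares and discounts from the table above (the function and variable names are illustrative):

```python
from typing import List, Tuple

# (component, share of usage, discount off on-demand) from the table above
PORTFOLIO: List[Tuple[str, float, float]] = [
    ("Reserved", 0.55, 0.60),
    ("On-Demand", 0.20, 0.00),
    ("Spot", 0.25, 0.75),
]


def blended_rate(portfolio: List[Tuple[str, float, float]], on_demand_rate: float = 1.0) -> float:
    """Usage-weighted hourly rate across all purchasing options."""
    total_share = sum(share for _, share, _ in portfolio)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("usage shares must sum to 100%")
    return sum(
        share * on_demand_rate * (1 - discount) for _, share, discount in portfolio
    )


if __name__ == "__main__":
    rate = blended_rate(PORTFOLIO)
    print(f"Blended rate: ${rate:.4f}/hr")
    print(f"Reduction vs on-demand: {(1 - rate) * 100:.1f}%")
```

Changing the shares shows how sensitive the result is: every 10 points of usage moved from On-Demand to Spot at a 75% discount lowers the blended rate by 7.5 cents per on-demand dollar.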

Portfolio Optimization Process

•Analyze usage patterns — Collect 90+ days of hourly usage data. Identify baseline, variable, and burst patterns.
•Categorize workloads — Tag resources by workload type: steady-state, variable, interruptible.
•Size your commitments — Commit to 70-80% of baseline with Reserved/Savings. Leave headroom for changes.
•Configure spot for batch — Move eligible workloads (batch, CI/CD, training) to spot with proper architecture.
•Monitor and rebalance — Track utilization monthly. Adjust commitments as usage patterns change.
•Sell unused capacity — Use the EC2 Reserved Instance Marketplace to sell unused Standard Reserved Instances.
•Iterate quarterly — Review portfolio quarterly against actual usage and upcoming changes.

Cross-Provider Considerations

While this page has primarily used AWS terminology, the pricing concepts apply across all major cloud providers with slight variations.

Azure Pricing Models:

Azure Reservations:

1 or 3-year terms for VMs, SQL Database, Cosmos DB, and more
Scope: Single subscription, management group, or shared
Exchangeable and refundable (with some restrictions)

Azure Spot VMs:

Up to 90% discount
Eviction based on capacity or price
Can set max price or use current price
30-second eviction notice

GCP Pricing Models:

Committed Use Discounts (CUDs):

1 or 3-year commitments
Resource-based (specific machine types) or spend-based
37-70% discounts

Preemptible VMs:

Up to 91% discount
Always terminated after 24 hours
30-second termination warning

Spot VMs (newer):

Similar discounts to Preemptible
No 24-hour maximum lifetime
Dynamic pricing

Pricing Model Terminology Across Providers
Concept	AWS	Azure	GCP
Standard Pricing	On-Demand	Pay-as-you-go	On-Demand
Commitment (specific)	Reserved Instances	Azure Reservations	Committed Use (resource)
Commitment (flexible)	Savings Plans	Azure savings plan for compute	Committed Use (spend)
Interruptible	Spot Instances	Spot VMs	Spot VMs / Preemptible
Sustained Use Discount	N/A	N/A	Automatic (GCE)

GCP Sustained Use Discounts

Summary: Reserved vs Spot Instances

Key Takeaways

•On-demand is the most expensive option — It's paying a premium for maximum flexibility. Reserve it for true variability, not laziness.
•Reserved Instances/Savings Plans cover baseline — Commit to 70-80% of your minimum usage floor. Conservative commitments avoid waste.
•Savings Plans offer flexibility over RIs — For most organizations, Compute Savings Plans are the better choice unless you need capacity reservations.
•Spot instances offer massive savings — 60-90% discounts for interruptible workloads. Design for interruption, and spot becomes safe for production.
•Diversification reduces spot risk — Spread across instance types and AZs. Use capacity-optimized allocation strategies.
•Build a balanced portfolio — Combine all pricing models: Reserved for baseline, On-Demand for variability, Spot for batch.
•Monitor and rebalance regularly — Usage patterns change. Review commitment utilization and adjust quarterly.

What's next:
