In 2018, a financial services company was spending $4.2 million annually on AWS EC2 instances—all on-demand pricing. After a comprehensive analysis and strategic shift to reserved instances and spot instances, they reduced that bill to $1.8 million the following year—a 57% reduction with zero changes to their actual workloads.
This transformation isn't magic; it's economics. Cloud providers offer multiple pricing models for compute resources, each with different cost-commitment tradeoffs. Understanding these models and applying them strategically is one of the highest-impact optimizations available to cloud architects.
The fundamental insight: Cloud providers price capacity based on certainty. The more certainty you provide them (through commitments), the lower your price. The more flexibility they retain (like being able to reclaim your capacity), the lower your price. On-demand pricing—where you provide no commitment and demand full availability—is the most expensive option because you're paying a premium for maximum flexibility.
By the end of this page, you will understand the economics of on-demand, reserved, savings plans, and spot pricing. You'll learn how to analyze workloads for purchasing optimization, calculate commitment levels, implement spot instance strategies safely, and build portfolios that minimize cost while maintaining reliability.
All major cloud providers offer tiered pricing models for compute resources. While the specific names and details vary, the fundamental options are remarkably similar:
On-Demand (Pay-as-you-go)
The default pricing model where you pay for compute capacity by the hour or second with no long-term commitment. Resources can be started and stopped freely. This is the most flexible but most expensive option.
Reserved Instances / Committed Use
A commitment to use a specific amount of compute capacity for 1-3 years in exchange for significant discounts (typically 30-75% off on-demand). The commitment is binding—you pay whether you use the capacity or not.
Savings Plans (AWS) / Committed Use Discounts (GCP)
A more flexible commitment model where you commit to a dollar amount of hourly spend rather than specific instance types. This provides discount flexibility across instance families and regions.
Spot Instances / Preemptible VMs / Spot VMs
Unused cloud capacity offered at steep discounts (60-90% off on-demand) with the caveat that your instances can be interrupted on short notice (from 30 seconds to 2 minutes, depending on the provider). Ideal for fault-tolerant, interruptible workloads.
| Model | Discount | Commitment | Flexibility | Interruption Risk | Best For |
|---|---|---|---|---|---|
| On-Demand | 0% | None | Maximum | None | Variable workloads, testing |
| Reserved (1yr) | 30-40% | 1 year | Low | None | Steady-state production |
| Reserved (3yr) | 50-75% | 3 years | Very Low | None | Long-term baseline |
| Savings Plans | 30-75% | 1-3 years | Medium | None | Evolving architectures |
| Spot/Preemptible | 60-90% | None | High | High | Batch, stateless, ML training |
Think of cloud pricing as trading flexibility for discount: On-Demand (max flexibility, no discount) → Reserved/Savings (less flexibility, moderate discount) → Spot (least reliability, maximum discount). The art is matching your workload characteristics to the appropriate pricing tier.
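The "Best For" column in the table above can be read as a rough decision rule. The sketch below encodes it in Python; the `Workload` fields, the function name, and the order of the checks are illustrative assumptions, not a provider-defined algorithm.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    interruptible: bool         # tolerates being stopped on short notice
    steady: bool                # runs at a predictable baseline
    instance_type_stable: bool  # instance family unlikely to change during the term


def suggest_pricing_model(w: Workload) -> str:
    """Map workload traits to a pricing model (illustrative heuristic only)."""
    if w.interruptible:
        return "spot"          # batch, stateless, ML training
    if w.steady and w.instance_type_stable:
        return "reserved"      # steady-state production, long-term baseline
    if w.steady:
        return "savings_plan"  # steady spend, evolving architecture
    return "on_demand"         # variable workloads, testing


if __name__ == "__main__":
    for w in [
        Workload("nightly-batch", True, False, False),
        Workload("primary-database", False, True, True),
        Workload("api-being-migrated", False, True, False),
        Workload("load-test-env", False, False, False),
    ]:
        print(f"{w.name}: {suggest_pricing_model(w)}")
```

A real assessment would also weigh term length and utilization history, which this heuristic leaves out.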
Reserved Instances (RIs) are the workhorse of cloud cost optimization for steady-state workloads. Understanding their mechanics, types, and strategies is essential for effective cloud financial management.
How Reserved Instances Work:
Contrary to the name, you're not actually "reserving" a specific server. Instead, you're purchasing a billing discount that automatically applies to matching on-demand usage. If you have a Reserved Instance for m5.xlarge in us-east-1, one running m5.xlarge in that region with matching attributes (such as platform and tenancy) is billed at the RI rate each hour; any additional m5.xlarge instances are billed on-demand.
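Because an RI is a billing discount rather than a server, its effect can be modeled as hour-by-hour matching. The following is a minimal sketch under simplifying assumptions (a single instance type, one RI covering one instance-hour per hour); `apply_reservations` is a hypothetical helper, not an AWS API.

```python
from typing import Dict, List


def apply_reservations(running_per_hour: List[int], reserved_count: int) -> Dict[str, int]:
    """Split instance-hours into RI-covered, on-demand, and unused-RI hours.

    running_per_hour: number of matching instances running in each hour.
    reserved_count:   number of Reserved Instances purchased for that type.
    """
    covered = on_demand = unused = 0
    for running in running_per_hour:
        covered += min(running, reserved_count)        # billed at the RI rate
        on_demand += max(running - reserved_count, 0)  # overflow billed on-demand
        unused += max(reserved_count - running, 0)     # paid for but idle
    return {"covered": covered, "on_demand": on_demand, "unused": unused}


if __name__ == "__main__":
    # Four hours with 3, 5, 2 and 4 running instances against 3 RIs.
    print(apply_reservations([3, 5, 2, 4], reserved_count=3))
```

The `unused` figure is the part of the commitment that is paid for without delivering any savings.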
RI Purchasing Options:
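Reserved Instances come in three payment options: All Upfront (AURI), where the entire term is paid at purchase; Partial Upfront (PURI), where part is paid at purchase and the rest is billed monthly; and No Upfront (NURI), where the whole cost is billed monthly over the term.
The discount difference between AURI and NURI is typically 5-10%. For organizations optimizing for cash flow, PURI or NURI may be preferable despite the slightly lower discount.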
| Option | 1-Year Term | 3-Year Term | Savings vs On-Demand |
|---|---|---|---|
| On-Demand | $0.192/hour | $0.192/hour | — |
| All Upfront | $0.120/hour effective | $0.076/hour effective | 38% / 60% |
| Partial Upfront | $0.124/hour effective | $0.080/hour effective | 35% / 58% |
| No Upfront | $0.128/hour | $0.085/hour | 33% / 56% |
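The savings column follows directly from the effective hourly rates. As a quick check, this sketch recomputes it from the table's figures; the helper names are illustrative, and small differences from the table's rounded percentages are expected.

```python
HOURS_PER_YEAR = 8760
ON_DEMAND_RATE = 0.192  # $/hour, from the table

# (1-year effective rate, 3-year effective rate) in $/hour, from the table
EFFECTIVE_RATES = {
    "All Upfront": (0.120, 0.076),
    "Partial Upfront": (0.124, 0.080),
    "No Upfront": (0.128, 0.085),
}


def savings_pct(effective_rate: float, on_demand_rate: float = ON_DEMAND_RATE) -> float:
    """Percentage saved relative to the on-demand hourly rate."""
    return (1 - effective_rate / on_demand_rate) * 100


def term_cost(effective_rate: float, years: int) -> float:
    """Total cost of one instance over the whole term."""
    return effective_rate * HOURS_PER_YEAR * years


if __name__ == "__main__":
    for option, (one_year, three_year) in EFFECTIVE_RATES.items():
        print(
            f"{option:16s} 1yr: {savings_pct(one_year):4.1f}% (${term_cost(one_year, 1):,.2f})  "
            f"3yr: {savings_pct(three_year):4.1f}% (${term_cost(three_year, 3):,.2f})"
        )
```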
Standard vs Convertible Reserved Instances:
Standard RIs are tied to a specific instance type and cannot be changed. They offer slightly higher discounts (additional 5-10%) but require accurate capacity planning.
Convertible RIs allow you to exchange for different instance types, families, operating systems, or tenancy during the term. This flexibility is valuable when your architecture is evolving.
Standard RI: m5.xlarge → m5.xlarge only (highest discount)
Convertible RI: m5.xlarge → r5.xlarge or c5.2xlarge (flexible)
RI Scope: Regional vs Zonal
Regional RIs provide a discount for any matching usage across all Availability Zones in a region and provide instance size flexibility within the instance family.
Zonal RIs are tied to a specific AZ and also include a capacity reservation guarantee (you are guaranteed capacity in that AZ), but lack size flexibility.
For most workloads, Regional RIs are preferred because they offer more flexibility and still cover multi-AZ deployments.
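Size flexibility works through normalization factors: each size within a family is worth a fixed number of units, and a regional RI covers any mix of sizes that adds up to its units. The sketch below uses AWS's published factors and assumes the conditions under which AWS applies size flexibility (Linux, default tenancy, a single instance family); `units` and `covered_fraction` are hypothetical helper names.

```python
from typing import Dict

# Normalization factors per instance size (units per instance).
NORMALIZATION_FACTOR = {
    "nano": 0.25, "micro": 0.5, "small": 1, "medium": 2, "large": 4,
    "xlarge": 8, "2xlarge": 16, "4xlarge": 32, "8xlarge": 64,
}


def units(fleet: Dict[str, int]) -> float:
    """Total normalized units for a fleet given as {instance_type: count}."""
    total = 0.0
    for instance_type, count in fleet.items():
        size = instance_type.split(".", 1)[1]  # "m5.xlarge" -> "xlarge"
        total += NORMALIZATION_FACTOR[size] * count
    return total


def covered_fraction(reserved: Dict[str, int], running: Dict[str, int]) -> float:
    """Share of running capacity covered by reservations (same family assumed)."""
    running_units = units(running)
    if running_units == 0:
        return 0.0
    return min(1.0, units(reserved) / running_units)


if __name__ == "__main__":
    # One m5.2xlarge RI (16 units) fully covers two m5.xlarge instances (8 units each).
    print(covered_fraction({"m5.2xlarge": 1}, {"m5.xlarge": 2}))
    # Against three m5.xlarge instances (24 units) it covers two thirds.
    print(covered_fraction({"m5.2xlarge": 1}, {"m5.xlarge": 3}))
```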
Reserved Instances are a financial commitment. If you overcommit (buy more RIs than you use) or your needs change (migrating to containers, moving workloads), you'll pay for unused capacity. Start conservatively—it's better to pay slightly more on-demand for peak capacity than to waste money on unused reservations.
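The overcommitment risk can be quantified with a break-even utilization. A reservation costs (1 - discount) of the on-demand rate for every hour of the term, while on-demand costs the full rate only for hours actually used, so the reservation pays off only when utilization exceeds 1 - discount. A minimal sketch, with illustrative function names and the 1-year All Upfront rate from the earlier table as sample input:

```python
from typing import Tuple

HOURS_PER_YEAR = 8760


def break_even_utilization(discount: float) -> float:
    """Fraction of the term an instance must run for the reservation to pay off."""
    return 1.0 - discount


def compare_costs(on_demand_rate: float, discount: float, utilization: float,
                  hours: int = HOURS_PER_YEAR) -> Tuple[float, float]:
    """Return (reserved_cost, on_demand_cost) for one instance over `hours`."""
    reserved_cost = on_demand_rate * (1 - discount) * hours  # paid whether used or not
    on_demand_cost = on_demand_rate * utilization * hours    # paid only for hours used
    return reserved_cost, on_demand_cost


if __name__ == "__main__":
    discount = 0.375  # $0.120 effective vs $0.192 on-demand
    print(f"Break-even utilization: {break_even_utilization(discount):.1%}")
    for utilization in (0.5, 0.625, 0.9):
        reserved, on_demand = compare_costs(0.192, discount, utilization)
        print(f"utilization {utilization:.1%}: reserved ${reserved:,.2f}, on-demand ${on_demand:,.2f}")
```

At 50% utilization the reservation costs more than simply paying on-demand, which is why starting conservatively is the safer default.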
AWS introduced Savings Plans in 2019 as a more flexible alternative to Reserved Instances. Rather than committing to specific instance types, you commit to a consistent amount of compute usage (measured in $/hour) for a 1 or 3-year term.
Savings Plans have largely superseded Reserved Instances for most use cases because they offer comparable discounts with significantly more flexibility.
Types of Savings Plans:
Compute Savings Plans
EC2 Instance Savings Plans
SageMaker Savings Plans
Calculating Savings Plan commitment:
To determine the right commitment level, analyze your historical usage:
Example calculation:
Your EC2 usage patterns for the month:
Recommended Savings Plan: $40/hour commitment (80% of minimum)
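Because a $40/hour commitment sits below the $50/hour minimum implied by the 80% rule, it is always fully utilized while leaving headroom for variability. Usage above the commitment, between $10 and $110/hour in this example, runs on-demand, and peaks can use spot instances.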
```python
"""
Savings Plan Commitment Calculator

Analyzes historical EC2 usage to recommend optimal Savings Plan commitment.
"""

import statistics
from datetime import datetime, timedelta
from typing import Dict, List

import boto3


def get_ec2_hourly_costs(start_date: datetime, end_date: datetime) -> List[float]:
    """
    Fetch hourly EC2 costs from AWS Cost Explorer.
    Returns list of hourly spend amounts.

    Note: hourly granularity must be enabled in Cost Explorer and only
    covers roughly the most recent 14 days; use DAILY for longer ranges.
    """
    client = boto3.client('ce')
    request = {
        # Hourly granularity expects full ISO 8601 timestamps
        'TimePeriod': {
            'Start': start_date.strftime('%Y-%m-%dT%H:%M:%SZ'),
            'End': end_date.strftime('%Y-%m-%dT%H:%M:%SZ'),
        },
        'Granularity': 'HOURLY',
        'Filter': {
            'Dimensions': {
                'Key': 'SERVICE',
                'Values': ['Amazon Elastic Compute Cloud - Compute'],
            }
        },
        'Metrics': ['UnblendedCost'],
    }

    costs: List[float] = []
    while True:
        response = client.get_cost_and_usage(**request)
        costs.extend(
            float(period['Total']['UnblendedCost']['Amount'])
            for period in response['ResultsByTime']
        )
        # Results may be paginated; follow the token until it is absent
        token = response.get('NextPageToken')
        if not token:
            return costs
        request['NextPageToken'] = token


def recommend_savings_plan(
    hourly_costs: List[float],
    safety_margin: float = 0.80,          # Commit to 80% of minimum
    savings_plan_discount: float = 0.30,  # Expected ~30% discount
) -> Dict:
    """
    Analyze usage patterns and recommend Savings Plan commitment.
    """
    # Calculate key statistics
    min_usage = min(hourly_costs)
    avg_usage = statistics.mean(hourly_costs)
    max_usage = max(hourly_costs)
    p10_usage = statistics.quantiles(hourly_costs, n=10)[0]  # 10th percentile

    # Recommended commitment (conservative: 80% of minimum)
    recommended_commitment = min_usage * safety_margin

    # Alternative: slightly aggressive (90% of P10)
    aggressive_commitment = p10_usage * 0.90

    # Calculate expected savings (share of average spend covered x discount)
    commitment_coverage = min(recommended_commitment, avg_usage) / avg_usage
    expected_savings = commitment_coverage * savings_plan_discount

    # Annual cost projection
    current_annual_cost = avg_usage * 24 * 365
    projected_annual_cost = current_annual_cost * (1 - expected_savings)
    annual_savings = current_annual_cost - projected_annual_cost

    return {
        'current_usage': {
            'minimum_hourly': round(min_usage, 2),
            'average_hourly': round(avg_usage, 2),
            'maximum_hourly': round(max_usage, 2),
            'p10_hourly': round(p10_usage, 2),
        },
        'recommendation': {
            'conservative_commitment': round(recommended_commitment, 2),
            'aggressive_commitment': round(aggressive_commitment, 2),
            'safety_margin_used': safety_margin,
        },
        'projected_impact': {
            'commitment_coverage': f"{commitment_coverage * 100:.1f}%",
            'expected_savings_rate': f"{expected_savings * 100:.1f}%",
            'current_annual_cost': f"${current_annual_cost:,.0f}",
            'projected_annual_cost': f"${projected_annual_cost:,.0f}",
            'annual_savings': f"${annual_savings:,.0f}",
        },
    }


# Example usage
if __name__ == "__main__":
    # Analysis window (would be passed to get_ec2_hourly_costs in real usage)
    end_date = datetime.now()
    start_date = end_date - timedelta(days=90)

    # In real usage, this would call AWS Cost Explorer
    # For demo, using sample data
    sample_costs = [
        45, 48, 52, 55, 75, 85, 95, 120,         # Morning ramp
        130, 140, 145, 150, 145, 140, 135, 120,  # Business hours
        100, 85, 70, 60, 55, 50, 48, 45,         # Evening wind-down
    ] * 90  # Simulate 90 days

    result = recommend_savings_plan(sample_costs)

    print("=== Savings Plan Analysis ===")
    print("Current Usage:")
    print(f"  Min: ${result['current_usage']['minimum_hourly']}/hr")
    print(f"  Avg: ${result['current_usage']['average_hourly']}/hr")
    print(f"  Max: ${result['current_usage']['maximum_hourly']}/hr")
    print()
    print("Recommendation:")
    print(f"  Conservative: ${result['recommendation']['conservative_commitment']}/hr")
    print(f"  Aggressive: ${result['recommendation']['aggressive_commitment']}/hr")
    print()
    print("Projected Impact:")
    print(f"  Annual Savings: {result['projected_impact']['annual_savings']}")
```

Spot Instances (AWS terminology) or Preemptible/Spot VMs (GCP/Azure) represent unused cloud capacity sold at steep discounts, typically 60-90% off on-demand pricing. The tradeoff is that these instances can be interrupted with minimal notice (2 minutes on AWS, 30 seconds on GCP).
Spot instances are the most misunderstood and underutilized pricing model. Many teams avoid them due to interruption fear, but with proper architecture, spot instances can safely run production workloads and dramatically reduce costs.
How Spot Pricing Works:
Spot prices fluctuate based on supply and demand for unused capacity in a specific instance type, AZ, and region combination. When demand exceeds supply, prices rise. Your instance may be interrupted when the current spot price rises above the maximum price you set (if you set one), or when the provider needs the capacity back.
AWS Spot pricing has evolved:
In practice, spot prices for popular instances are remarkably stable, often hovering at 60-70% discount for extended periods.
| Provider | Name | Discount | Interruption Notice | Max Runtime |
|---|---|---|---|---|
| AWS | Spot Instances | 60-90% | 2 minutes | Unlimited* |
| GCP | Preemptible VMs | 60-91% | 30 seconds | 24 hours |
| GCP | Spot VMs | 60-91% | 30 seconds | Unlimited* |
| Azure | Spot VMs | Up to 90% | 30 seconds | Unlimited* |
*"Unlimited" means no provider-enforced time limit, but interruptions can occur anytime based on capacity needs.
Workloads Suitable for Spot Instances:
Ideal candidates:
Challenging candidates:
The 2-minute warning (AWS) may seem short, but it's enough to: gracefully drain connections, checkpoint work in progress, store state externally, and trigger instance replacement. Your application must be designed to handle interruptions gracefully. If you can't handle a 2-minute shutdown, you can't use spot instances.
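On AWS, the warning is exposed through the instance metadata path `/latest/meta-data/spot/instance-action`, which returns 404 until an interruption is scheduled and then a small JSON document with the action and its time. The sketch below shows one way to interpret that response and order the shutdown steps. It performs no network calls, and the `drain` and `checkpoint` callbacks are hypothetical stand-ins for application logic.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Optional


def parse_instance_action(status_code: int, body: str) -> Optional[dict]:
    """Return the interruption notice, or None when nothing is scheduled."""
    if status_code != 200:  # 404 means no interruption notice yet
        return None
    return json.loads(body)  # e.g. {"action": "terminate", "time": "...Z"}


def seconds_remaining(notice: dict, now: datetime) -> float:
    """Seconds between `now` (timezone-aware UTC) and the scheduled action."""
    deadline = datetime.strptime(notice["time"], "%Y-%m-%dT%H:%M:%SZ")
    deadline = deadline.replace(tzinfo=timezone.utc)
    return (deadline - now).total_seconds()


def handle_notice(notice: dict, now: datetime,
                  drain: Callable[[], None],
                  checkpoint: Callable[[], None]) -> float:
    """Stop taking new work, then persist state; return the time budget in seconds."""
    budget = seconds_remaining(notice, now)
    drain()       # stop accepting connections / new jobs first
    checkpoint()  # then store in-progress state externally
    return budget


if __name__ == "__main__":
    sample_body = '{"action": "terminate", "time": "2024-01-01T00:02:00Z"}'
    notice = parse_instance_action(200, sample_body)
    now = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    budget = handle_notice(
        notice, now,
        drain=lambda: print("draining connections"),
        checkpoint=lambda: print("checkpointing work"),
    )
    print(f"{budget:.0f} seconds until interruption")
```

A production handler would poll the endpoint every few seconds and bound each step by the remaining budget.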
Running spot instances safely at production scale requires robust tooling and automation. Let's explore the key implementation patterns.
AWS Spot Fleet and Auto Scaling:
AWS provides two primary mechanisms for managing spot capacity:
Spot Fleet: Request spot capacity across instance pools (instance type + AZ combinations). Define an allocation strategy:

lowestPrice: Prioritize cheapest pools (higher interruption risk)
diversified: Spread across pools (lower interruption risk)
capacityOptimized: Prioritize pools with highest availability (recommended)

EC2 Auto Scaling Mixed Instances: Combine On-Demand and Spot in a single Auto Scaling group, as in this CloudFormation template:
```yaml
# Mixed Instances Policy: On-Demand Base + Spot Scaling
# Provides baseline availability with cost-optimized scaling
# Template fragment: PrivateSubnetA/B/C, WorkerTargetGroup, WorkerInstanceProfile
# and LatestAmiId are assumed to be defined elsewhere in the template.

AWSTemplateFormatVersion: '2010-09-09'
Description: 'Production Auto Scaling with Spot Instances'

Resources:
  ProductionAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: production-worker-asg
      VPCZoneIdentifier:
        - !Ref PrivateSubnetA
        - !Ref PrivateSubnetB
        - !Ref PrivateSubnetC
      # With WeightedCapacity set below, sizes are counted in capacity units,
      # not instances (one m5.xlarge contributes 4 units).
      MinSize: 4
      MaxSize: 40
      DesiredCapacity: 10
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300
      TargetGroupARNs:
        - !Ref WorkerTargetGroup

      # Mixed Instances Policy for Spot + On-Demand
      MixedInstancesPolicy:
        InstancesDistribution:
          # Baseline: the first 4 capacity units are always On-Demand
          OnDemandBaseCapacity: 4
          OnDemandPercentageAboveBaseCapacity: 0  # All scaling uses Spot

          # Spot allocation strategy
          # (SpotInstancePools applies only to lowest-price, so it is omitted)
          SpotAllocationStrategy: capacity-optimized

        LaunchTemplate:
          LaunchTemplateSpecification:
            LaunchTemplateId: !Ref WorkerLaunchTemplate
            Version: !GetAtt WorkerLaunchTemplate.LatestVersionNumber

          # Instance type diversification
          # List multiple similar instance types for flexibility
          Overrides:
            - InstanceType: m5.xlarge
              WeightedCapacity: 4
            - InstanceType: m5a.xlarge
              WeightedCapacity: 4
            - InstanceType: m5n.xlarge
              WeightedCapacity: 4
            - InstanceType: m4.xlarge
              WeightedCapacity: 4
            - InstanceType: m5.2xlarge
              WeightedCapacity: 8
            - InstanceType: m5a.2xlarge
              WeightedCapacity: 8

      # Lifecycle hooks for graceful Spot handling
      LifecycleHookSpecificationList:
        - LifecycleHookName: graceful-shutdown
          LifecycleTransition: autoscaling:EC2_INSTANCE_TERMINATING
          HeartbeatTimeout: 120  # 2 minutes for graceful shutdown
          DefaultResult: CONTINUE

  # Launch template with Spot interruption handling
  WorkerLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        IamInstanceProfile:
          Arn: !GetAtt WorkerInstanceProfile.Arn
        ImageId: !Ref LatestAmiId

        # Spot vs On-Demand is decided by the MixedInstancesPolicy above.
        # InstanceMarketOptions must not be set in a launch template that is
        # used with a mixed instances policy, so it is left out here.

        # Detailed monitoring for quick health detection
        Monitoring:
          Enabled: true

        # User data with shutdown handling
        UserData:
          Fn::Base64: |
            #!/bin/bash
            # Ensure the AWS CLI is available (package name may differ by AMI)
            amazon-linux-extras install -y aws-cli-2

            # Start application with shutdown handling
            /opt/app/start.sh --graceful-shutdown=120

            # Spot interruption handler (daemon)
            # TG_ARN must be set in the handler's environment
            cat > /opt/spot-handler.sh << 'EOF'
            #!/bin/bash
            while true; do
              # Refresh the IMDSv2 token each pass so it never expires
              TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
                -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

              # Check for interruption notice (404 until one is scheduled)
              HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" \
                -H "X-aws-ec2-metadata-token: $TOKEN" \
                http://169.254.169.254/latest/meta-data/spot/instance-action)

              if [ "$HTTP_CODE" -eq 200 ]; then
                echo "Spot interruption notice received, initiating shutdown"

                # Signal application to drain
                /opt/app/drain-and-shutdown.sh

                # Deregister from load balancer
                INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
                  http://169.254.169.254/latest/meta-data/instance-id)
                aws elbv2 deregister-targets --target-group-arn "$TG_ARN" \
                  --targets Id="$INSTANCE_ID"
                exit 0
              fi
              sleep 5
            done
            EOF

            chmod +x /opt/spot-handler.sh
            nohup /opt/spot-handler.sh &
```

Kubernetes Spot Integration:
For containerized workloads, Kubernetes provides excellent spot integration:
AWS EKS with Karpenter: Karpenter automatically provisions the right nodes (including spot) based on pod requirements and cost optimization.
Node taints and tolerations: Mark spot nodes with taints; only pods with matching tolerations schedule there.
Pod Disruption Budgets: Ensure minimum availability during spot interruptions.
```yaml
# Karpenter Provisioner for Spot Nodes
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: spot-provisioner
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
  limits:
    resources:
      cpu: 1000
      memory: 2000Gi
  providerRef:
    name: default
  ttlSecondsAfterEmpty: 30
```
Not all instance types have the same spot availability. Newer generation instances (m5, c5, r5) typically have more spot capacity than older generations. Instances with GPUs often have limited spot availability due to ML training demand. Check the Spot Instance Advisor (AWS) for interruption frequency data before selecting instance types.
The most effective cloud cost strategy combines all purchasing options into a balanced portfolio, matching each option to the appropriate workload characteristics.
The Portfolio Approach:
Think of your compute strategy like an investment portfolio:
Target allocation for a typical organization:
| Pricing Model | Target Coverage | Workload Type |
|---|---|---|
| Reserved/Savings | 50-70% | Steady-state production, databases, core services |
| On-Demand | 10-20% | Variable production, peaks, spikes |
| Spot | 20-30% | Batch, CI/CD, ML training, stateless workers |
Calculating blended cost:
Assume on-demand hourly rate of $1.00:
| Component | Usage % | Rate | Effective Cost |
|---|---|---|---|
| Reserved (60% off) | 55% | $0.40 | $0.22 |
| On-Demand | 20% | $1.00 | $0.20 |
| Spot (75% off) | 25% | $0.25 | $0.0625 |
| Blended | 100% | — | $0.48/hr |
This portfolio achieves a 52% reduction from pure on-demand, compared to:
The portfolio approach optimizes for both cost AND reliability.
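The blended figure in the table is a weighted average of the discounted rates. This short sketch reproduces the arithmetic so different allocations can be compared; the function name and the `(usage_share, discount)` tuple layout are illustrative choices.

```python
from typing import List, Tuple


def blended_rate(portfolio: List[Tuple[float, float]], on_demand_rate: float = 1.00) -> float:
    """Weighted hourly rate for a portfolio of (usage_share, discount) pairs."""
    total_share = sum(share for share, _ in portfolio)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError(f"usage shares must sum to 1.0, got {total_share}")
    return sum(share * on_demand_rate * (1 - discount) for share, discount in portfolio)


# Allocation from the table above: (usage share, discount vs on-demand)
PORTFOLIO = [
    (0.55, 0.60),  # Reserved / Savings Plans
    (0.20, 0.00),  # On-Demand
    (0.25, 0.75),  # Spot
]

if __name__ == "__main__":
    rate = blended_rate(PORTFOLIO)
    print(f"Blended rate: ${rate:.4f}/hr")
    print(f"Reduction vs on-demand: {(1 - rate) * 100:.1f}%")
```

Changing the shares shows how sensitive the result is to spot coverage: moving 10 points of usage from on-demand to spot at these discounts lowers the blended rate by a further $0.075/hr.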
While this page has primarily used AWS terminology, the pricing concepts apply across all major cloud providers with slight variations.
Azure Pricing Models:
Azure Reservations:
Azure Spot VMs:
GCP Pricing Models:
Committed Use Discounts (CUDs):
Preemptible VMs:
Spot VMs (newer):
| Concept | AWS | Azure | GCP |
|---|---|---|---|
| Standard Pricing | On-Demand | Pay-as-you-go | On-Demand |
| Commitment (specific) | Reserved Instances | Azure Reservations | Committed Use (resource) |
| Commitment (flexible) | Savings Plans | Azure savings plan for compute | Committed Use (spend) |
| Interruptible | Spot Instances | Spot VMs | Spot VMs / Preemptible |
| Sustained Use Discount | N/A | N/A | Automatic (GCE) |
GCP uniquely offers automatic Sustained Use Discounts (SUDs) for Compute Engine. If you run an instance continuously, you receive automatic discounts up to 30% without any commitment. This reduces the relative value of Committed Use for GCP compared to AWS Reserved Instances, since you're already getting partial discounts automatically.
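The "up to 30%" figure comes from tiered billing: each successive quarter of the month is charged at a lower fraction of the base rate. The sketch below assumes the tier schedule published for N1 machine types (100%, 80%, 60%, 40% of the base rate); other machine families use different schedules or are not eligible, so treat `N1_TIERS` as an assumption to verify against current GCP documentation.

```python
from typing import List, Tuple

# (tier width as a fraction of the month, fraction of base rate charged)
# Assumed schedule for N1 machine types.
N1_TIERS: List[Tuple[float, float]] = [(0.25, 1.0), (0.25, 0.8), (0.25, 0.6), (0.25, 0.4)]


def sustained_use_cost(usage_fraction: float, tiers: List[Tuple[float, float]] = N1_TIERS) -> float:
    """Cost of running for `usage_fraction` of a month, in months of base-rate usage."""
    if not 0.0 <= usage_fraction <= 1.0:
        raise ValueError("usage_fraction must be between 0 and 1")
    remaining = usage_fraction
    cost = 0.0
    for width, rate in tiers:
        used = min(remaining, width)  # portion of usage falling in this tier
        cost += used * rate
        remaining -= used
    return cost


def effective_discount(usage_fraction: float) -> float:
    """Overall discount relative to paying the base rate for the same usage."""
    if usage_fraction == 0:
        return 0.0
    return 1 - sustained_use_cost(usage_fraction) / usage_fraction


if __name__ == "__main__":
    for fraction in (0.25, 0.5, 0.75, 1.0):
        print(f"{fraction:.0%} of month -> {effective_discount(fraction):.0%} discount")
```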
Cloud compute pricing is not one-size-fits-all. The choice between on-demand, reserved, savings plans, and spot depends on your workload characteristics, risk tolerance, and organizational maturity. Let's consolidate the key takeaways:
What's next:
With pricing models understood, the next step is ensuring you're not paying for more capacity than you need. The next page explores Right-Sizing Resources—the practice of matching resource allocation to actual workload requirements, often the single highest-impact optimization available.
You now understand the economics of cloud compute pricing, from basic On-Demand to advanced Spot strategies. These purchasing decisions can reduce compute costs by 50-70% with no changes to your actual workloads. Next, we'll ensure you're not over-provisioning the resources you're buying.