A mid-sized SaaS company ran 200 EC2 instances to power their production workloads. When a new FinOps engineer analyzed their CPU and memory utilization, she found that average CPU utilization was 8% and average memory utilization was 15%. The instances were sized for peak loads that occurred for 30 minutes per month, but the organization was paying for that capacity 24/7/365.
After a methodical right-sizing initiative, they cut the fleet from 200 to 120 instances and downsized 60 of the remaining instances by one to two sizes. The result: $400,000 annual savings—roughly 40% of their compute bill—with zero performance degradation.
This story is surprisingly common. Research by Densify, RightScale, and cloud providers consistently shows that 30-50% of cloud resources are over-provisioned. Organizations provision for peak, add safety margins, inherit developer assumptions, and rarely revisit those decisions. The result is a massive "over-provisioning tax" that directly reduces profitability.
Right-sizing is the practice of continually analyzing resource utilization and adjusting allocations to match actual requirements. It's conceptually simple but operationally challenging—and it's often the single highest-impact cost optimization available.
By the end of this page, you will understand how to implement systematic right-sizing: collecting and analyzing utilization data, identifying right-sizing candidates, safely implementing changes, and building a culture of continuous optimization. You'll learn specific techniques for compute, databases, storage, and containerized workloads.
Before we solve over-provisioning, we need to understand why it's so pervasive. Over-provisioning isn't stupidity or carelessness—it's a rational response to incentives and constraints that favor larger resources.
Why over-provisioning happens:
The psychology of 'just in case':
Consider a developer sizing a database instance. They estimate 2 vCPUs and 4GB RAM should be sufficient. But what if they're wrong? What if that Black Friday spike is bigger than expected? What if that new feature uses more resources?
The developer has two choices:

1. Provision to the estimate (2 vCPUs, 4GB) and risk an incident, a postmortem, and blame if the estimate proves too low.
2. Provision 2-4x the estimate "just in case"—nobody is ever paged because an instance was too big.
Without countervailing incentives (chargeback, right-sizing culture, easy scaling), developers will always choose option 2. This is rational behavior—the problem is the incentive structure, not the developer.
The compounding effect:
Over-provisioning isn't additive; it's multiplicative:
| Layer | Safety Margin | Compounded |
|---|---|---|
| Developer estimate | "add 50%" | 1.5x |
| Architect review | "double for growth" | 3.0x |
| Production buffer | "add 20% to be safe" | 3.6x |
| High availability | "3 replicas" | 10.8x |
A workload that needs 1 vCPU becomes 10.8 vCPUs through cascading safety margins.
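The compounding above can be checked in a few lines of Python; the margins are the ones from the table:

```python
from functools import reduce

# Safety margins from the table above, applied multiplicatively
margins = {
    "developer estimate (+50%)": 1.5,
    "architect review (x2 for growth)": 2.0,
    "production buffer (+20%)": 1.2,
    "high availability (3 replicas)": 3.0,
}

compounded = reduce(lambda acc, m: acc * m, margins.values(), 1.0)
print(f"Provisioned capacity per 1 vCPU of real need: {compounded:.1f} vCPUs")
```

Because the margins multiply rather than add, removing any single layer (for example, letting autoscaling replace the static production buffer) shrinks the whole product, not just one term.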
In the cloud, unused capacity costs money. A 16-CPU instance running at 5% utilization isn't 'free capacity for growth'—it's 15 CPUs of waste at $X/hour. Unlike on-prem hardware (already paid for), cloud waste is operational expense that directly reduces profit margin.
Effective right-sizing requires accurate, comprehensive utilization data. You need to measure what you're actually using before you can determine what you should provision.
Key metrics for right-sizing:
| Resource | Primary Metrics | Secondary Metrics | Collection Method |
|---|---|---|---|
| EC2/VMs | CPU utilization, Memory utilization | Network I/O, Disk I/O, IOPS | CloudWatch, Azure Monitor, GCP Monitoring |
| RDS/Databases | CPU, Memory, Storage, IOPS | Connections, Query latency | CloudWatch, Performance Insights |
| Containers (ECS/K8s) | CPU requests/usage, Memory requests/usage | Throttling, OOMKills | Container Insights, Prometheus, Datadog |
| Lambda/Functions | Duration, Memory, Concurrency | Cold starts, Errors | CloudWatch, X-Ray |
| Storage (S3/EBS) | Storage used, Access patterns | Request rates, Data class usage | S3 Analytics, CloudWatch |
Data collection requirements:
1. Time range matters
Point-in-time measurements are misleading. You need utilization data spanning:
- Minimum recommended: 14 days of hourly data
- Better: 30-90 days of 5-minute data
- Ideal: 6-12 months for seasonal workloads (retail, finance, education)
2. Statistical measures beyond averages
Average utilization hides important patterns:
```
Instance A: Average CPU 10%, Peak CPU 95%, Std Dev 25%
Instance B: Average CPU 10%, Peak CPU 15%, Std Dev  2%
```
Both have the same average, but Instance A has high variance (can't downsize easily), while Instance B is consistently over-provisioned (easy win).
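A small sketch (using synthetic utilization traces, not real data) shows how two workloads with nearly identical averages can have completely different P95 values:

```python
import statistics

def p95(values):
    """95th percentile via the simple sorted-index method."""
    ordered = sorted(values)
    return ordered[int(len(ordered) * 0.95)]

# Synthetic traces: similar averages, very different shapes
instance_a = [5] * 95 + [95] * 5          # mostly idle, sharp spikes
instance_b = [9, 10, 11] * 32 + [10] * 4  # flat and consistently low

for name, trace in [("A", instance_a), ("B", instance_b)]:
    print(f"Instance {name}: avg={statistics.mean(trace):.1f}% "
          f"p95={p95(trace)}% stdev={statistics.stdev(trace):.1f}%")
```

Instance A's P95 lands near its peak (it cannot be safely downsized), while Instance B's P95 stays near its average (an easy win).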
Key statistics to collect:
```python
"""Right-Sizing Analysis: Collect and Analyze EC2 Utilization

Uses CloudWatch metrics to identify right-sizing opportunities.
"""
import boto3
from datetime import datetime, timedelta
from dataclasses import dataclass
from typing import List, Dict
import statistics


@dataclass
class InstanceUtilization:
    instance_id: str
    instance_type: str
    avg_cpu: float
    max_cpu: float
    p95_cpu: float
    avg_memory: float  # Requires CloudWatch agent
    p95_memory: float
    current_vcpus: int
    current_memory_gb: float
    recommendation: str
    potential_savings: float


def get_cpu_metrics(
    cloudwatch,
    instance_id: str,
    days: int = 14,
) -> List[float]:
    """Fetch CPU utilization data points."""
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(days=days)

    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=start_time,
        EndTime=end_time,
        Period=300,  # 5-minute granularity
        Statistics=['Average'],
    )
    return [dp['Average'] for dp in response['Datapoints']]


def analyze_instance(
    instance: dict,
    cpu_data: List[float],
    memory_data: List[float],
    instance_prices: Dict[str, float],
) -> InstanceUtilization:
    """Analyze a single instance for right-sizing opportunities."""
    instance_id = instance['InstanceId']
    instance_type = instance['InstanceType']

    # Calculate CPU statistics
    avg_cpu = statistics.mean(cpu_data) if cpu_data else 0
    max_cpu = max(cpu_data) if cpu_data else 0
    p95_cpu = sorted(cpu_data)[int(len(cpu_data) * 0.95)] if cpu_data else 0

    # Calculate memory statistics (requires CloudWatch agent)
    avg_memory = statistics.mean(memory_data) if memory_data else 0
    p95_memory = sorted(memory_data)[int(len(memory_data) * 0.95)] if memory_data else 0

    # Get instance capacity (simplified lookup)
    vcpus, memory_gb = get_instance_specs(instance_type)

    # Determine recommendation
    recommendation, new_type = determine_recommendation(
        instance_type, avg_cpu, p95_cpu, avg_memory, p95_memory
    )

    # Calculate potential savings
    current_price = instance_prices.get(instance_type, 0)
    new_price = instance_prices.get(new_type, current_price)
    monthly_savings = (current_price - new_price) * 730  # hours/month

    return InstanceUtilization(
        instance_id=instance_id,
        instance_type=instance_type,
        avg_cpu=round(avg_cpu, 2),
        max_cpu=round(max_cpu, 2),
        p95_cpu=round(p95_cpu, 2),
        avg_memory=round(avg_memory, 2),
        p95_memory=round(p95_memory, 2),
        current_vcpus=vcpus,
        current_memory_gb=memory_gb,
        recommendation=recommendation,
        potential_savings=round(monthly_savings, 2),
    )


def determine_recommendation(
    current_type: str,
    avg_cpu: float,
    p95_cpu: float,
    avg_memory: float,
    p95_memory: float,
) -> tuple:
    """
    Recommendation logic based on utilization thresholds.

    Conservative approach:
    - Only recommend downsize if P95 < 50%
    - Recommend upsize if P95 > 80%
    - Keep buffer for unexpected peaks
    """
    # Define thresholds (adjustable based on risk tolerance)
    DOWNSIZE_THRESHOLD = 50  # P95 below this = downsize candidate
    UPSIZE_THRESHOLD = 80    # P95 above this = upsize candidate

    cpu_under = p95_cpu < DOWNSIZE_THRESHOLD
    memory_under = p95_memory < DOWNSIZE_THRESHOLD if p95_memory > 0 else True
    cpu_over = p95_cpu > UPSIZE_THRESHOLD
    memory_over = p95_memory > UPSIZE_THRESHOLD

    if cpu_under and memory_under:
        new_type = get_smaller_type(current_type)
        return f"DOWNSIZE to {new_type}", new_type
    elif cpu_over or memory_over:
        new_type = get_larger_type(current_type)
        return f"UPSIZE to {new_type}", new_type
    else:
        return "OPTIMAL - No change recommended", current_type


def get_instance_specs(instance_type: str) -> tuple:
    """Return (vCPUs, memory_GB) for instance type."""
    # Simplified lookup - production would use the AWS Pricing API
    specs = {
        'm5.large': (2, 8),
        'm5.xlarge': (4, 16),
        'm5.2xlarge': (8, 32),
        'm5.4xlarge': (16, 64),
        'c5.large': (2, 4),
        'c5.xlarge': (4, 8),
        'r5.large': (2, 16),
        'r5.xlarge': (4, 32),
    }
    return specs.get(instance_type, (4, 16))


def get_smaller_type(instance_type: str) -> str:
    """Return one size smaller instance type."""
    size_order = ['nano', 'micro', 'small', 'medium', 'large',
                  'xlarge', '2xlarge', '4xlarge']
    family, size = instance_type.rsplit('.', 1)
    if size in size_order:
        idx = size_order.index(size)
        if idx > 0:
            return f"{family}.{size_order[idx - 1]}"
    return instance_type


def get_larger_type(instance_type: str) -> str:
    """Return one size larger instance type."""
    size_order = ['nano', 'micro', 'small', 'medium', 'large',
                  'xlarge', '2xlarge', '4xlarge']
    family, size = instance_type.rsplit('.', 1)
    if size in size_order:
        idx = size_order.index(size)
        if idx < len(size_order) - 1:
            return f"{family}.{size_order[idx + 1]}"
    return instance_type


# Main execution
def run_analysis():
    ec2 = boto3.client('ec2')
    cloudwatch = boto3.client('cloudwatch')

    # Get running instances
    instances = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )

    results = []
    for reservation in instances['Reservations']:
        for instance in reservation['Instances']:
            cpu_data = get_cpu_metrics(cloudwatch, instance['InstanceId'])
            memory_data = []  # Would require CloudWatch agent data

            analysis = analyze_instance(
                instance, cpu_data, memory_data,
                instance_prices={}  # Would load from pricing API
            )
            results.append(analysis)

            if 'DOWNSIZE' in analysis.recommendation:
                print(f"💰 {analysis.instance_id}: {analysis.recommendation}")
                print(f"   Current: {analysis.instance_type} "
                      f"({analysis.avg_cpu}% avg, {analysis.p95_cpu}% p95)")
                print(f"   Savings: ${analysis.potential_savings}/month")

    total_savings = sum(r.potential_savings for r in results)
    print(f"\nTotal potential monthly savings: ${total_savings:,.2f}")


if __name__ == '__main__':
    run_analysis()
```

Compute resources (EC2, VMs, containers) typically represent the largest category of cloud spending and offer the most right-sizing opportunity. Let's explore specific strategies for each compute model.
EC2/VM Right-Sizing:
Step 1: Identify candidates
Use cloud provider recommendations (AWS Compute Optimizer, Azure Advisor, GCP Recommender) or custom analysis to find instances with:

- P95 CPU utilization below roughly 50%
- P95 memory utilization below roughly 50% (where memory metrics are collected)
- No sustained spikes near capacity during the observation window
Step 2: Validate with application context
Metrics don't tell the whole story. Before downsizing:

- Check for periodic peaks outside the observation window (month-end jobs, seasonal events)
- Confirm with the owning team that no launches or load increases are planned
- Verify the workload isn't constrained in ways CPU and memory metrics miss (disk I/O, network)
Step 3: Test before production
Never downsize production without testing:

- Apply the change in staging or on a canary instance first
- Schedule the change in a maintenance window with a documented rollback plan
- Monitor latency, error rates, and saturation closely after the change
Modern instance families:
Right-sizing isn't just about making instances smaller—it's also about using the right instance family. New instance generations often provide better performance at lower cost:
| Old Instance | New Equivalent | Performance | Cost Change |
|---|---|---|---|
| m4.xlarge | m6i.xlarge | +15% | Similar |
| c4.2xlarge | c6g.2xlarge | +40% | -20% |
| r4.large | r6g.large | +40% | -10% |
Graviton (ARM) instances — AWS Graviton processors offer 20-40% better price-performance than x86 equivalents. If your workload runs on Linux and doesn't require x86-specific binaries, Graviton migration is a compelling optimization.
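Using the deltas from the table above, relative price-performance can be computed as (1 + cost change) / (1 + performance gain), where a lower value means you pay less per unit of work:

```python
# Price-performance comparison using the deltas from the table above.
# perf_gain and cost_change are fractions (e.g. +0.40 = 40% faster).
migrations = {
    "m4.xlarge  -> m6i.xlarge": {"perf_gain": 0.15, "cost_change": 0.00},
    "c4.2xlarge -> c6g.2xlarge": {"perf_gain": 0.40, "cost_change": -0.20},
    "r4.large   -> r6g.large":  {"perf_gain": 0.40, "cost_change": -0.10},
}

for name, m in migrations.items():
    # Cost per unit of performance, relative to the old generation
    relative_cost = (1 + m["cost_change"]) / (1 + m["perf_gain"])
    print(f"{name}: {(1 - relative_cost) * 100:.0f}% better price-performance")
```

The c4-to-Graviton move, for example, works out to roughly 43% better price-performance, which is why generation upgrades are often a bigger win than downsizing alone.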
Right-sizing is easiest during other changes. When migrating to containers, rebuilding infrastructure, or major deployments, incorporate right-sizing from the start. Established services have political resistance to change ('it's working, why touch it?'); new deployments don't.
Containerized workloads present unique right-sizing challenges. In Kubernetes, resource allocation happens at two levels:

- Pod level: CPU and memory requests/limits declared per container
- Node level: the size and number of instances backing the cluster
Over-provisioning at either level wastes money, but the mechanisms differ.
Pod resource requests/limits:
Kubernetes uses requests for scheduling and limits for enforcement:
```yaml
resources:
  requests:
    cpu: "500m"      # Scheduling guarantee: 0.5 CPU
    memory: "512Mi"  # Scheduling guarantee: 512 MB
  limits:
    cpu: "1000m"     # Maximum: 1 CPU (throttled if exceeded)
    memory: "1Gi"    # Maximum: 1 GB (OOMKilled if exceeded)
```
Common misconfigurations:
Requests too high — Pods request more than they use, blocking other pods from scheduling. Nodes appear full, but actual utilization is low.
No limits set — Pods can consume unlimited resources, affecting neighbors (noisy neighbor problem).
Limits = Requests — No burst capacity; may cause unnecessary throttling or OOMKills.
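A minimal sketch of the "requests too high" check, using hypothetical pod names and numbers; in practice the usage figures would come from `kubectl top pods` or a Prometheus query comparing usage against `kube_pod_container_resource_requests`:

```python
# Flag pods whose CPU requests far exceed observed usage.
pods = [
    # (name, cpu_request_millicores, observed_p95_usage_millicores)
    ("checkout-api", 2000, 150),
    ("search-worker", 500, 420),
    ("report-batch", 1000, 80),
]

WASTE_RATIO = 2.0  # requesting more than 2x p95 usage = over-provisioned

for name, request, p95_usage in pods:
    if request / p95_usage > WASTE_RATIO:
        suggested = int(p95_usage * 1.3)  # keep ~30% headroom above p95
        print(f"{name}: requests {request}m, p95 usage {p95_usage}m "
              f"-> consider lowering request to ~{suggested}m")
```

Note that the waste shows up as full-looking nodes with low actual utilization: the scheduler reserves the request, not the usage.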
Vertical Pod Autoscaler (VPA):
Kubernetes VPA automatically adjusts pod requests/limits based on observed usage:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # Options: Off, Initial, Recreate, Auto
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: "50m"
          memory: "100Mi"
        maxAllowed:
          cpu: "2"
          memory: "4Gi"
```
VPA modes:

- Off: generates recommendations only; nothing is changed automatically
- Initial: applies recommendations only when pods are created
- Recreate: evicts running pods to apply updated requests
- Auto: currently equivalent to Recreate; intended to use in-place updates when available
Node right-sizing:
Even with optimal pod sizing, nodes can be over-provisioned. Kubernetes cluster autoscaler adds/removes nodes based on pending pods, but doesn't automatically address over-provisioned nodes.
Karpenter (AWS) provides more intelligent node provisioning:
```yaml
# Karpenter NodePool with Consolidation
# Automatically consolidates under-utilized nodes
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.large
            - m6i.xlarge
            - m6i.2xlarge
            - m6a.large
            - m6a.xlarge
            - m6a.2xlarge
            - c6i.large
            - c6i.xlarge
      nodeClassRef:
        name: default
  disruption:
    # Enable consolidation - Karpenter will:
    # 1. Delete empty nodes
    # 2. Replace under-utilized nodes with smaller ones
    # 3. Consolidate pods onto fewer nodes
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 30s  # Wait 30s before consolidating
  # Limit total resources for cost control
  limits:
    cpu: 1000
    memory: 1000Gi
```

Databases are frequently the most over-provisioned resources in cloud environments. The fear of database performance issues is acute—slow queries directly impact user experience. But this fear often leads to 5-10x over-provisioning.
Why databases are over-provisioned:

- Performance incidents are highly visible, so sizing errs far to the safe side
- Instances are sized once at launch and rarely revisited
- Storage usually can't be shrunk later, encouraging large initial allocations
- Resizing requires a maintenance window, so teams avoid it
Database right-sizing dimensions:
| Dimension | Over-Provisioned Signs | Right-Sizing Action |
|---|---|---|
| Instance Size (vCPU/Memory) | CPU < 20%, Memory < 40% | Downsize instance class |
| Storage (EBS/SSD) | Storage utilization < 30% | Reduce allocated storage* |
| IOPS | Provisioned IOPS >> actual IOPS | Use burstable storage or reduce PIOPS |
| Read Replicas | Replicas with minimal traffic | Remove unnecessary replicas |
| Multi-AZ | Dev/test using Multi-AZ | Disable Multi-AZ for non-production |
*Note: Most managed databases don't allow reducing storage once provisioned. Storage right-sizing must be done proactively or during migration.
RDS-specific right-sizing:
AWS RDS Performance Insights provides deep visibility into database performance:
Key metrics for RDS right-sizing:
```
# CPU - Look for consistent low utilization
CPUUtilization < 20% average
CPUUtilization < 40% peak

# Memory - Buffer pool hit ratio matters more than raw utilization
FreeableMemory > 50% of total (might be over-provisioned)
BufferCacheHitRatio > 99% (memory is sufficient, maybe over)

# Storage I/O - Compare provisioned to actual
ReadIOPS + WriteIOPS << Provisioned IOPS (PIOPS waste)
ReadLatency < 5ms and WriteLatency < 5ms (IOPS sufficient)

# Connections
DatabaseConnections << max_connections (connection pooling may allow smaller instance)
```
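These heuristics can be collapsed into a simple checklist function. The dictionary keys and sample values below are illustrative summaries, not actual CloudWatch API field names; real values would come from CloudWatch and Performance Insights:

```python
def rds_rightsizing_flags(metrics: dict) -> list:
    """Apply the threshold heuristics above to one database's summarized metrics."""
    flags = []
    if metrics["avg_cpu_pct"] < 20 and metrics["peak_cpu_pct"] < 40:
        flags.append("CPU consistently low: downsize candidate")
    if metrics["freeable_memory_pct"] > 50 and metrics["buffer_cache_hit_pct"] > 99:
        flags.append("Memory likely over-provisioned")
    if metrics["actual_iops"] < 0.2 * metrics["provisioned_iops"]:
        flags.append("PIOPS waste: consider burstable storage or lower PIOPS")
    if metrics["connections"] < 0.1 * metrics["max_connections"]:
        flags.append("Few connections: pooling may allow a smaller instance")
    return flags

# Hypothetical instance summary for illustration
sample = {
    "avg_cpu_pct": 12, "peak_cpu_pct": 35,
    "freeable_memory_pct": 60, "buffer_cache_hit_pct": 99.5,
    "actual_iops": 800, "provisioned_iops": 10000,
    "connections": 40, "max_connections": 1000,
}
for flag in rds_rightsizing_flags(sample):
    print(flag)
```

An instance tripping several flags at once, like this sample, is exactly the 5-10x over-provisioned database described above.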
Database right-sizing is higher risk than compute right-sizing. Unlike stateless services that can be quickly scaled, database changes require data migration, replication catch-up, and maintenance windows. Always validate thoroughly and prefer conservative changes (one size at a time).
Manual right-sizing doesn't scale. In an environment with hundreds or thousands of resources, you need automation—both for identifying opportunities and implementing changes safely.
Cloud Provider Recommendation Tools:
| Provider | Tool | Capabilities |
|---|---|---|
| AWS | AWS Compute Optimizer | EC2, Auto Scaling, EBS, Lambda recommendations based on ML analysis |
| AWS | AWS Trusted Advisor | Right-sizing alerts, idle resource detection |
| AWS | AWS Cost Explorer | RI/Savings Plan recommendations, usage patterns |
| Azure | Azure Advisor | VM right-sizing, idle resource detection |
| Azure | Azure Cost Management | Optimization recommendations, budget alerts |
| GCP | Recommender | VM/disk right-sizing, idle resource detection |
| GCP | Active Assist | ML-based optimization recommendations |
Third-party tools:
For advanced automation and multi-cloud support, several third-party platforms go beyond the native recommenders, including Densify (ML-driven workload placement and right-sizing), CAST AI (Kubernetes-focused autoscaling and instance selection), Spot by NetApp (spot-instance automation with right-sizing features), and IBM Turbonomic (application-aware resource management across clouds).
Implementing automated right-sizing:
```python
"""Automated Right-Sizing Pipeline

Fetches Compute Optimizer recommendations and creates Jira tickets
for human review, with option for automated implementation in non-prod.
"""
import boto3
from jira import JIRA


class RightSizingPipeline:
    def __init__(self):
        self.compute_optimizer = boto3.client('compute-optimizer')
        self.ec2 = boto3.client('ec2')
        self.ssm = boto3.client('ssm')
        # Jira connection (configure in environment)
        self.jira = JIRA(
            server='https://company.atlassian.net',
            basic_auth=('user', 'api_token')
        )

    def get_recommendations(self, finding_status: str = 'Overprovisioned'):
        """Fetch EC2 right-sizing recommendations."""
        response = self.compute_optimizer.get_ec2_instance_recommendations(
            filters=[
                {'name': 'Finding', 'values': [finding_status]}
            ]
        )

        recommendations = []
        for rec in response.get('instanceRecommendations', []):
            instance_id = rec['instanceArn'].split('/')[-1]
            current_type = rec['currentInstanceType']

            # Get top recommendation
            if rec.get('recommendationOptions'):
                best_option = rec['recommendationOptions'][0]
                recommended_type = best_option['instanceType']

                recommendations.append({
                    'instance_id': instance_id,
                    'current_type': current_type,
                    'recommended_type': recommended_type,
                    'finding': rec['finding'],
                    'finding_reasons': rec.get('findingReasonCodes', []),
                    'utilization': rec.get('utilizationMetrics', []),
                    'estimated_monthly_savings': self._calculate_savings(
                        current_type, recommended_type
                    )
                })
        return recommendations

    def _calculate_savings(self, current_type: str, recommended_type: str) -> float:
        """Calculate estimated monthly savings (simplified)."""
        # Production: Use the AWS Pricing API
        prices = {
            'm5.2xlarge': 0.384,
            'm5.xlarge': 0.192,
            'm5.large': 0.096,
            'c5.2xlarge': 0.340,
            'c5.xlarge': 0.170,
            'c5.large': 0.085,
        }
        current_price = prices.get(current_type, 0.20)
        new_price = prices.get(recommended_type, 0.20)
        return round((current_price - new_price) * 730, 2)  # Monthly hours

    def create_ticket(self, recommendation: dict, environment: str):
        """Create Jira ticket for right-sizing review."""
        avg_cpu = next(
            (m['value'] for m in recommendation['utilization']
             if m['name'] == 'CPU'),
            'N/A'
        )

        description = f"""*Automated Right-Sizing Recommendation*

|*Field*|*Value*|
|Instance ID|{recommendation['instance_id']}|
|Current Type|{recommendation['current_type']}|
|Recommended Type|{recommendation['recommended_type']}|
|Finding|{recommendation['finding']}|
|Average CPU|{avg_cpu}%|
|Est. Monthly Savings|${recommendation['estimated_monthly_savings']}|

*Finding Reasons:*
{chr(10).join(f'* {r}' for r in recommendation['finding_reasons'])}

*Next Steps:*
1. Review utilization metrics in CloudWatch
2. Verify workload can run on smaller instance
3. Schedule maintenance window for change
4. Monitor post-change performance

_This ticket was auto-generated by the Right-Sizing Pipeline_
"""

        issue = self.jira.create_issue(
            project='CLOUD',
            summary=f"Right-Size {recommendation['instance_id']}: "
                    f"{recommendation['current_type']} → {recommendation['recommended_type']}",
            description=description,
            issuetype={'name': 'Task'},
            labels=['right-sizing', 'cost-optimization', environment],
        )
        return issue.key

    def auto_implement_nonprod(self, recommendation: dict):
        """
        Automatically implement right-sizing for non-production instances.

        Uses SSM automation with proper safety checks.
        """
        instance_id = recommendation['instance_id']
        new_type = recommendation['recommended_type']

        # Safety checks
        tags = self.ec2.describe_tags(
            Filters=[{'Name': 'resource-id', 'Values': [instance_id]}]
        )
        env_tag = next(
            (t['Value'] for t in tags['Tags'] if t['Key'] == 'environment'),
            'unknown'
        )
        if env_tag not in ['development', 'staging', 'sandbox']:
            print(f"Skipping auto-implementation for {instance_id}: env={env_tag}")
            return None

        # Stop, modify, start using SSM Automation
        response = self.ssm.start_automation_execution(
            DocumentName='AWS-ResizeInstance',
            Parameters={
                'InstanceId': [instance_id],
                'InstanceType': [new_type],
            }
        )
        return response['AutomationExecutionId']

    def run_pipeline(self, auto_implement_nonprod: bool = False):
        """Execute full right-sizing pipeline."""
        recommendations = self.get_recommendations()
        print(f"Found {len(recommendations)} right-sizing opportunities")

        total_savings = 0
        for rec in recommendations:
            env = self._get_environment(rec['instance_id'])

            if auto_implement_nonprod and env in ['development', 'staging']:
                execution_id = self.auto_implement_nonprod(rec)
                print(f"Auto-implementing {rec['instance_id']}: {execution_id}")
            else:
                ticket_key = self.create_ticket(rec, env)
                print(f"Created ticket {ticket_key} for {rec['instance_id']}")

            total_savings += rec['estimated_monthly_savings']

        print(f"\nTotal potential monthly savings: ${total_savings:,.2f}")

    def _get_environment(self, instance_id: str) -> str:
        """Get environment from instance tags."""
        response = self.ec2.describe_instances(InstanceIds=[instance_id])
        for res in response['Reservations']:
            for inst in res['Instances']:
                for tag in inst.get('Tags', []):
                    if tag['Key'] == 'environment':
                        return tag['Value']
        return 'unknown'


# Execute pipeline
if __name__ == '__main__':
    pipeline = RightSizingPipeline()
    pipeline.run_pipeline(auto_implement_nonprod=True)
```

Tools and automation are necessary but not sufficient for sustained right-sizing.
The real challenge is changing organizational behavior—creating a culture where right-sizing is the default, not an exception.
Common cultural barriers:
Strategies for culture change:
1. Make costs visible (Showback)
Teams can't optimize what they can't see. Implement weekly cost reports by team showing:

- Total spend and week-over-week change
- Top cost drivers (services, environments, resources)
- Open right-sizing recommendations and their estimated savings
2. Create positive incentives
Reward optimization, not just avoid punishment:
3. Remove fear
Make right-sizing safe:
4. Embed in workflows
Make right-sizing part of existing processes:
5. Executive sponsorship
Top-down support is essential:
Begin with obvious wins: idle development instances, oversized staging environments, clear over-provisioning. Early successes build momentum and credibility for tackling harder optimization projects. A $50,000 quick win creates appetite for the $500,000 project.
Right-sizing is one of the highest-impact, lowest-risk cost optimization strategies available. Unlike purchasing commitments (which lock you in) or architecture changes (which require significant work), right-sizing often involves simply changing an instance type. Let's consolidate the key concepts:

- Over-provisioning is rational behavior under bad incentives, and safety margins compound multiplicatively
- Right-sizing decisions need 14-90 days of utilization data and P95 statistics, not averages
- Apply conservative thresholds (downsize below ~50% P95, upsize above ~80% P95), one size at a time
- Containers need right-sizing at both the pod level (requests/limits, VPA) and the node level (consolidation)
- Databases are the most over-provisioned and highest-risk category; change them conservatively
- Automation (provider recommenders, ticketing pipelines) plus cultural change sustain the practice
What's next:
Right-sizing ensures you're not over-provisioning individual resources. But what about aggregate capacity? The next page explores Auto-Scaling for Cost—using dynamic capacity management not just for availability, but as a cost optimization strategy that matches resource supply to actual demand.
You now understand how to systematically identify and implement right-sizing opportunities across your cloud infrastructure. These techniques typically yield 20-40% cost reduction with minimal risk. Next, we'll explore how auto-scaling compounds these savings by dynamically adjusting capacity to match real-time demand.