A mid-sized SaaS company ran 200 EC2 instances to power their production workloads. When a new FinOps engineer analyzed their CPU and memory utilization, she found that average CPU utilization was 8% and average memory utilization was 15%. The instances were sized for peak loads that occurred for 30 minutes per month, but the organization was paying for that capacity 24/7/365.
After a methodical right-sizing initiative, they cut the fleet from 200 to 120 instances and downsized 60 of the remaining instances by one to two sizes. The result: $400,000 annual savings—roughly 40% of their compute bill—with zero performance degradation.
This story is surprisingly common. Research by Densify, RightScale, and cloud providers consistently shows that 30-50% of cloud resources are over-provisioned. Organizations provision for peak, add safety margins, inherit developer assumptions, and rarely revisit those decisions. The result is a massive "over-provisioning tax" that directly reduces profitability.
Right-sizing is the practice of continually analyzing resource utilization and adjusting allocations to match actual requirements. It's conceptually simple but operationally challenging—and it's often the single highest-impact cost optimization available.
By the end of this page, you will understand how to implement systematic right-sizing: collecting and analyzing utilization data, identifying right-sizing candidates, safely implementing changes, and building a culture of continuous optimization. You'll learn specific techniques for compute, databases, storage, and containerized workloads.
Before we solve over-provisioning, we need to understand why it's so pervasive. Over-provisioning isn't stupidity or carelessness—it's a rational response to incentives and constraints that favor larger resources.
Why over-provisioning happens:
The psychology of 'just in case':
Consider a developer sizing a database instance. They estimate 2 vCPUs and 4GB RAM should be sufficient. But what if they're wrong? What if that Black Friday spike is bigger than expected? What if that new feature uses more resources?
The developer has two choices:

1. Provision to the estimate (2 vCPUs, 4GB) and risk an incident, a postmortem, and blame if the estimate proves too low.
2. Provision 2-4x the estimate "just in case"—nobody is ever paged because an instance was too big.
Without countervailing incentives (chargeback, right-sizing culture, easy scaling), developers will always choose option 2. This is rational behavior—the problem is the incentive structure, not the developer.
The compounding effect:
Over-provisioning isn't additive; it's multiplicative:
| Layer | Safety Margin | Compounded |
|---|---|---|
| Developer estimate | "add 50%" | 1.5x |
| Architect review | "double for growth" | 3.0x |
| Production buffer | "add 20% to be safe" | 3.6x |
| High availability | "3 replicas" | 10.8x |
A workload that needs 1 vCPU becomes 10.8 vCPUs through cascading safety margins.
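The compounding above can be checked in a few lines of Python; the margins are the ones from the table:

```python
from functools import reduce

# Safety margins from the table above, applied multiplicatively
margins = {
    "developer estimate (+50%)": 1.5,
    "architect review (x2 for growth)": 2.0,
    "production buffer (+20%)": 1.2,
    "high availability (3 replicas)": 3.0,
}

compounded = reduce(lambda acc, m: acc * m, margins.values(), 1.0)
print(f"Provisioned capacity per 1 vCPU of real need: {compounded:.1f} vCPUs")
```

Because the margins multiply rather than add, removing any single layer (for example, letting autoscaling replace the static production buffer) shrinks the whole product, not just one term.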
In the cloud, unused capacity costs money. A 16-CPU instance running at 5% utilization isn't 'free capacity for growth'—it's 15 CPUs of waste at $X/hour. Unlike on-prem hardware (already paid for), cloud waste is operational expense that directly reduces profit margin.
Effective right-sizing requires accurate, comprehensive utilization data. You need to measure what you're actually using before you can determine what you should provision.
Key metrics for right-sizing:
| Resource | Primary Metrics | Secondary Metrics | Collection Method |
|---|---|---|---|
| EC2/VMs | CPU utilization, Memory utilization | Network I/O, Disk I/O, IOPS | CloudWatch, Azure Monitor, GCP Monitoring |
| RDS/Databases | CPU, Memory, Storage, IOPS | Connections, Query latency | CloudWatch, Performance Insights |
| Containers (ECS/K8s) | CPU requests/usage, Memory requests/usage | Throttling, OOMKills | Container Insights, Prometheus, Datadog |
| Lambda/Functions | Duration, Memory, Concurrency | Cold starts, Errors | CloudWatch, X-Ray |
| Storage (S3/EBS) | Storage used, Access patterns | Request rates, Data class usage | S3 Analytics, CloudWatch |
Data collection requirements:
1. Time range matters
Point-in-time measurements are misleading. You need utilization data spanning:
- Minimum recommended: 14 days of hourly data
- Better: 30-90 days of 5-minute data
- Ideal: 6-12 months for seasonal workloads (retail, finance, education)
2. Statistical measures beyond averages
Average utilization hides important patterns:
```
Instance A: Average CPU 10%, Peak CPU 95%, Std Dev 25%
Instance B: Average CPU 10%, Peak CPU 15%, Std Dev  2%
```
Both have the same average, but Instance A has high variance (can't downsize easily), while Instance B is consistently over-provisioned (easy win).
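A small sketch (using synthetic utilization traces, not real data) shows how two workloads with nearly identical averages can have completely different P95 values:

```python
import statistics

def p95(values):
    """95th percentile via the simple sorted-index method."""
    ordered = sorted(values)
    return ordered[int(len(ordered) * 0.95)]

# Synthetic traces: similar averages, very different shapes
instance_a = [5] * 95 + [95] * 5          # mostly idle, sharp spikes
instance_b = [9, 10, 11] * 32 + [10] * 4  # flat and consistently low

for name, trace in [("A", instance_a), ("B", instance_b)]:
    print(f"Instance {name}: avg={statistics.mean(trace):.1f}% "
          f"p95={p95(trace)}% stdev={statistics.stdev(trace):.1f}%")
```

Instance A's P95 lands near its peak (it cannot be safely downsized), while Instance B's P95 stays near its average (an easy win).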
Key statistics to collect:
```python
"""Right-Sizing Analysis: Collect and Analyze EC2 Utilization

Uses CloudWatch metrics to identify right-sizing opportunities.
"""
import boto3
from datetime import datetime, timedelta
from dataclasses import dataclass
from typing import List, Dict
import statistics


@dataclass
class InstanceUtilization:
    instance_id: str
    instance_type: str
    avg_cpu: float
    max_cpu: float
    p95_cpu: float
    avg_memory: float  # Requires CloudWatch agent
    p95_memory: float
    current_vcpus: int
    current_memory_gb: float
    recommendation: str
    potential_savings: float


def get_cpu_metrics(
    cloudwatch,
    instance_id: str,
    days: int = 14,
) -> List[float]:
    """Fetch CPU utilization data points."""
    end_time = datetime.utcnow()
    start_time = end_time - timedelta(days=days)

    response = cloudwatch.get_metric_statistics(
        Namespace='AWS/EC2',
        MetricName='CPUUtilization',
        Dimensions=[{'Name': 'InstanceId', 'Value': instance_id}],
        StartTime=start_time,
        EndTime=end_time,
        Period=300,  # 5-minute granularity
        Statistics=['Average'],
    )
    return [dp['Average'] for dp in response['Datapoints']]


def analyze_instance(
    instance: dict,
    cpu_data: List[float],
    memory_data: List[float],
    instance_prices: Dict[str, float],
) -> InstanceUtilization:
    """Analyze a single instance for right-sizing opportunities."""
    instance_id = instance['InstanceId']
    instance_type = instance['InstanceType']

    # Calculate CPU statistics
    avg_cpu = statistics.mean(cpu_data) if cpu_data else 0
    max_cpu = max(cpu_data) if cpu_data else 0
    p95_cpu = sorted(cpu_data)[int(len(cpu_data) * 0.95)] if cpu_data else 0

    # Calculate memory statistics (requires CloudWatch agent)
    avg_memory = statistics.mean(memory_data) if memory_data else 0
    p95_memory = sorted(memory_data)[int(len(memory_data) * 0.95)] if memory_data else 0

    # Get instance capacity (simplified lookup)
    vcpus, memory_gb = get_instance_specs(instance_type)

    # Determine recommendation
    recommendation, new_type = determine_recommendation(
        instance_type, avg_cpu, p95_cpu, avg_memory, p95_memory
    )

    # Calculate potential savings
    current_price = instance_prices.get(instance_type, 0)
    new_price = instance_prices.get(new_type, current_price)
    monthly_savings = (current_price - new_price) * 730  # hours/month

    return InstanceUtilization(
        instance_id=instance_id,
        instance_type=instance_type,
        avg_cpu=round(avg_cpu, 2),
        max_cpu=round(max_cpu, 2),
        p95_cpu=round(p95_cpu, 2),
        avg_memory=round(avg_memory, 2),
        p95_memory=round(p95_memory, 2),
        current_vcpus=vcpus,
        current_memory_gb=memory_gb,
        recommendation=recommendation,
        potential_savings=round(monthly_savings, 2),
    )


def determine_recommendation(
    current_type: str,
    avg_cpu: float,
    p95_cpu: float,
    avg_memory: float,
    p95_memory: float,
) -> tuple:
    """
    Recommendation logic based on utilization thresholds.

    Conservative approach:
    - Only recommend downsize if P95 < 50%
    - Recommend upsize if P95 > 80%
    - Keep buffer for unexpected peaks
    """
    # Define thresholds (adjustable based on risk tolerance)
    DOWNSIZE_THRESHOLD = 50  # P95 below this = downsize candidate
    UPSIZE_THRESHOLD = 80    # P95 above this = upsize candidate

    cpu_under = p95_cpu < DOWNSIZE_THRESHOLD
    memory_under = p95_memory < DOWNSIZE_THRESHOLD if p95_memory > 0 else True
    cpu_over = p95_cpu > UPSIZE_THRESHOLD
    memory_over = p95_memory > UPSIZE_THRESHOLD

    if cpu_under and memory_under:
        new_type = get_smaller_type(current_type)
        return f"DOWNSIZE to {new_type}", new_type
    elif cpu_over or memory_over:
        new_type = get_larger_type(current_type)
        return f"UPSIZE to {new_type}", new_type
    else:
        return "OPTIMAL - No change recommended", current_type


def get_instance_specs(instance_type: str) -> tuple:
    """Return (vCPUs, memory_GB) for instance type."""
    # Simplified lookup - production would use the AWS Pricing API
    specs = {
        'm5.large': (2, 8),
        'm5.xlarge': (4, 16),
        'm5.2xlarge': (8, 32),
        'm5.4xlarge': (16, 64),
        'c5.large': (2, 4),
        'c5.xlarge': (4, 8),
        'r5.large': (2, 16),
        'r5.xlarge': (4, 32),
    }
    return specs.get(instance_type, (4, 16))


def get_smaller_type(instance_type: str) -> str:
    """Return one size smaller instance type."""
    size_order = ['nano', 'micro', 'small', 'medium', 'large',
                  'xlarge', '2xlarge', '4xlarge']
    family, size = instance_type.rsplit('.', 1)
    if size in size_order:
        idx = size_order.index(size)
        if idx > 0:
            return f"{family}.{size_order[idx - 1]}"
    return instance_type


def get_larger_type(instance_type: str) -> str:
    """Return one size larger instance type."""
    size_order = ['nano', 'micro', 'small', 'medium', 'large',
                  'xlarge', '2xlarge', '4xlarge']
    family, size = instance_type.rsplit('.', 1)
    if size in size_order:
        idx = size_order.index(size)
        if idx < len(size_order) - 1:
            return f"{family}.{size_order[idx + 1]}"
    return instance_type


# Main execution
def run_analysis():
    ec2 = boto3.client('ec2')
    cloudwatch = boto3.client('cloudwatch')

    # Get running instances
    instances = ec2.describe_instances(
        Filters=[{'Name': 'instance-state-name', 'Values': ['running']}]
    )

    results = []
    for reservation in instances['Reservations']:
        for instance in reservation['Instances']:
            cpu_data = get_cpu_metrics(cloudwatch, instance['InstanceId'])
            memory_data = []  # Would require CloudWatch agent data

            analysis = analyze_instance(
                instance, cpu_data, memory_data,
                instance_prices={}  # Would load from pricing API
            )
            results.append(analysis)

            if 'DOWNSIZE' in analysis.recommendation:
                print(f"💰 {analysis.instance_id}: {analysis.recommendation}")
                print(f"   Current: {analysis.instance_type} "
                      f"({analysis.avg_cpu}% avg, {analysis.p95_cpu}% p95)")
                print(f"   Savings: ${analysis.potential_savings}/month")

    total_savings = sum(r.potential_savings for r in results)
    print(f"\nTotal potential monthly savings: ${total_savings:,.2f}")


if __name__ == '__main__':
    run_analysis()
```

Compute resources (EC2, VMs, containers) typically represent the largest category of cloud spending and offer the most right-sizing opportunity. Let's explore specific strategies for each compute model.
EC2/VM Right-Sizing:
Step 1: Identify candidates
Use cloud provider recommendations (AWS Compute Optimizer, Azure Advisor, GCP Recommender) or custom analysis to find instances with:

- P95 CPU utilization below roughly 50%
- P95 memory utilization below roughly 50% (where memory metrics are collected)
- No sustained spikes near capacity during the observation window
Step 2: Validate with application context
Metrics don't tell the whole story. Before downsizing:

- Check for periodic peaks outside the observation window (month-end jobs, seasonal events)
- Confirm with the owning team that no launches or load increases are planned
- Verify the workload isn't constrained in ways CPU and memory metrics miss (disk I/O, network)
Step 3: Test before production
Never downsize production without testing:

- Apply the change in staging or on a canary instance first
- Schedule the change in a maintenance window with a documented rollback plan
- Monitor latency, error rates, and saturation closely after the change
Modern instance families:
Right-sizing isn't just about making instances smaller—it's also about using the right instance family. New instance generations often provide better performance at lower cost:
| Old Instance | New Equivalent | Performance | Cost Change |
|---|---|---|---|
| m4.xlarge | m6i.xlarge | +15% | Similar |
| c4.2xlarge | c6g.2xlarge | +40% | -20% |
| r4.large | r6g.large | +40% | -10% |
Graviton (ARM) instances — AWS Graviton processors offer 20-40% better price-performance than x86 equivalents. If your workload runs on Linux and doesn't require x86-specific binaries, Graviton migration is a compelling optimization.
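Using the deltas from the table above, relative price-performance can be computed as (1 + cost change) / (1 + performance gain), where a lower value means you pay less per unit of work:

```python
# Price-performance comparison using the deltas from the table above.
# perf_gain and cost_change are fractions (e.g. +0.40 = 40% faster).
migrations = {
    "m4.xlarge  -> m6i.xlarge": {"perf_gain": 0.15, "cost_change": 0.00},
    "c4.2xlarge -> c6g.2xlarge": {"perf_gain": 0.40, "cost_change": -0.20},
    "r4.large   -> r6g.large":  {"perf_gain": 0.40, "cost_change": -0.10},
}

for name, m in migrations.items():
    # Cost per unit of performance, relative to the old generation
    relative_cost = (1 + m["cost_change"]) / (1 + m["perf_gain"])
    print(f"{name}: {(1 - relative_cost) * 100:.0f}% better price-performance")
```

The c4-to-Graviton move, for example, works out to roughly 43% better price-performance, which is why generation upgrades are often a bigger win than downsizing alone.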
Right-sizing is easiest during other changes. When migrating to containers, rebuilding infrastructure, or major deployments, incorporate right-sizing from the start. Established services have political resistance to change ('it's working, why touch it?'); new deployments don't.
Containerized workloads present unique right-sizing challenges. In Kubernetes, resource allocation happens at two levels:

- Pod level: CPU and memory requests/limits declared per container
- Node level: the size and number of instances backing the cluster
Over-provisioning at either level wastes money, but the mechanisms differ.
Pod resource requests/limits:
Kubernetes uses requests for scheduling and limits for enforcement:
```yaml
resources:
  requests:
    cpu: "500m"      # Scheduling guarantee: 0.5 CPU
    memory: "512Mi"  # Scheduling guarantee: 512 MB
  limits:
    cpu: "1000m"     # Maximum: 1 CPU (throttled if exceeded)
    memory: "1Gi"    # Maximum: 1 GB (OOMKilled if exceeded)
```
Common misconfigurations:
Requests too high — Pods request more than they use, blocking other pods from scheduling. Nodes appear full, but actual utilization is low.
No limits set — Pods can consume unlimited resources, affecting neighbors (noisy neighbor problem).
Limits = Requests — No burst capacity; may cause unnecessary throttling or OOMKills.
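A minimal sketch of the "requests too high" check, using hypothetical pod names and numbers; in practice the usage figures would come from `kubectl top pods` or a Prometheus query comparing usage against `kube_pod_container_resource_requests`:

```python
# Flag pods whose CPU requests far exceed observed usage.
pods = [
    # (name, cpu_request_millicores, observed_p95_usage_millicores)
    ("checkout-api", 2000, 150),
    ("search-worker", 500, 420),
    ("report-batch", 1000, 80),
]

WASTE_RATIO = 2.0  # requesting more than 2x p95 usage = over-provisioned

for name, request, p95_usage in pods:
    if request / p95_usage > WASTE_RATIO:
        suggested = int(p95_usage * 1.3)  # keep ~30% headroom above p95
        print(f"{name}: requests {request}m, p95 usage {p95_usage}m "
              f"-> consider lowering request to ~{suggested}m")
```

Note that the waste shows up as full-looking nodes with low actual utilization: the scheduler reserves the request, not the usage.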
Vertical Pod Autoscaler (VPA):
Kubernetes VPA automatically adjusts pod requests/limits based on observed usage:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"  # Options: Off, Initial, Recreate, Auto
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: "50m"
          memory: "100Mi"
        maxAllowed:
          cpu: "2"
          memory: "4Gi"
```
VPA modes:

- Off: generates recommendations only; nothing is changed automatically
- Initial: applies recommendations only when pods are created
- Recreate: evicts running pods to apply updated requests
- Auto: currently equivalent to Recreate; intended to use in-place updates when available
Node right-sizing:
Even with optimal pod sizing, nodes can be over-provisioned. Kubernetes cluster autoscaler adds/removes nodes based on pending pods, but doesn't automatically address over-provisioned nodes.
Karpenter (AWS) provides more intelligent node provisioning:
```yaml
# Karpenter NodePool with Consolidation
# Automatically consolidates under-utilized nodes
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
        - key: node.kubernetes.io/instance-type
          operator: In
          values:
            - m6i.large
            - m6i.xlarge
            - m6i.2xlarge
            - m6a.large
            - m6a.xlarge
            - m6a.2xlarge
            - c6i.large
            - c6i.xlarge
      nodeClassRef:
        name: default
  disruption:
    # Enable consolidation - Karpenter will:
    # 1. Delete empty nodes
    # 2. Replace under-utilized nodes with smaller ones
    # 3. Consolidate pods onto fewer nodes
    consolidationPolicy: WhenUnderutilized
    consolidateAfter: 30s  # Wait 30s before consolidating
  # Limit total resources for cost control
  limits:
    cpu: 1000
    memory: 1000Gi
```

Databases are frequently the most over-provisioned resources in cloud environments. The fear of database performance issues is acute—slow queries directly impact user experience. But this fear often leads to 5-10x over-provisioning.
Why databases are over-provisioned:

- Performance incidents are highly visible, so sizing errs far to the safe side
- Instances are sized once at launch and rarely revisited
- Storage usually can't be shrunk later, encouraging large initial allocations
- Resizing requires a maintenance window, so teams avoid it
Database right-sizing dimensions:
| Dimension | Over-Provisioned Signs | Right-Sizing Action |
|---|---|---|
| Instance Size (vCPU/Memory) | CPU < 20%, Memory < 40% | Downsize instance class |
| Storage (EBS/SSD) | Storage utilization < 30% | Reduce allocated storage* |
| IOPS | Provisioned IOPS >> actual IOPS | Use burstable storage or reduce PIOPS |
| Read Replicas | Replicas with minimal traffic | Remove unnecessary replicas |
| Multi-AZ | Dev/test using Multi-AZ | Disable Multi-AZ for non-production |
*Note: Most managed databases don't allow reducing storage once provisioned. Storage right-sizing must be done proactively or during migration.
RDS-specific right-sizing:
AWS RDS Performance Insights provides deep visibility into database performance:
Key metrics for RDS right-sizing:
```
# CPU - Look for consistent low utilization
CPUUtilization < 20% average
CPUUtilization < 40% peak

# Memory - Buffer pool hit ratio matters more than raw utilization
FreeableMemory > 50% of total (might be over-provisioned)
BufferCacheHitRatio > 99% (memory is sufficient, maybe over)

# Storage I/O - Compare provisioned to actual
ReadIOPS + WriteIOPS << Provisioned IOPS (PIOPS waste)
ReadLatency < 5ms and WriteLatency < 5ms (IOPS sufficient)

# Connections
DatabaseConnections << max_connections (connection pooling may allow smaller instance)
```
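These heuristics can be collapsed into a simple checklist function. The dictionary keys and sample values below are illustrative summaries, not actual CloudWatch API field names; real values would come from CloudWatch and Performance Insights:

```python
def rds_rightsizing_flags(metrics: dict) -> list:
    """Apply the threshold heuristics above to one database's summarized metrics."""
    flags = []
    if metrics["avg_cpu_pct"] < 20 and metrics["peak_cpu_pct"] < 40:
        flags.append("CPU consistently low: downsize candidate")
    if metrics["freeable_memory_pct"] > 50 and metrics["buffer_cache_hit_pct"] > 99:
        flags.append("Memory likely over-provisioned")
    if metrics["actual_iops"] < 0.2 * metrics["provisioned_iops"]:
        flags.append("PIOPS waste: consider burstable storage or lower PIOPS")
    if metrics["connections"] < 0.1 * metrics["max_connections"]:
        flags.append("Few connections: pooling may allow a smaller instance")
    return flags

# Hypothetical instance summary for illustration
sample = {
    "avg_cpu_pct": 12, "peak_cpu_pct": 35,
    "freeable_memory_pct": 60, "buffer_cache_hit_pct": 99.5,
    "actual_iops": 800, "provisioned_iops": 10000,
    "connections": 40, "max_connections": 1000,
}
for flag in rds_rightsizing_flags(sample):
    print(flag)
```

An instance tripping several flags at once, like this sample, is exactly the 5-10x over-provisioned database described above.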
Database right-sizing is higher risk than compute right-sizing. Unlike stateless services that can be quickly scaled, database changes require data migration, replication catch-up, and maintenance windows. Always validate thoroughly and prefer conservative changes (one size at a time).
Manual right-sizing doesn't scale. In an environment with hundreds or thousands of resources, you need automation—both for identifying opportunities and implementing changes safely.
Cloud Provider Recommendation Tools:
| Provider | Tool | Capabilities |
|---|---|---|
| AWS | AWS Compute Optimizer | EC2, Auto Scaling, EBS, Lambda recommendations based on ML analysis |
| AWS | AWS Trusted Advisor | Right-sizing alerts, idle resource detection |
| AWS | AWS Cost Explorer | RI/Savings Plan recommendations, usage patterns |
| Azure | Azure Advisor | VM right-sizing, idle resource detection |
| Azure | Azure Cost Management | Optimization recommendations, budget alerts |
| GCP | Recommender | VM/disk right-sizing, idle resource detection |
| GCP | Active Assist | ML-based optimization recommendations |
Third-party tools:
For advanced automation and multi-cloud support, several third-party platforms go beyond the native recommenders, including Densify (ML-driven workload placement and right-sizing), CAST AI (Kubernetes-focused autoscaling and instance selection), Spot by NetApp (spot-instance automation with right-sizing features), and IBM Turbonomic (application-aware resource management across clouds).
Implementing automated right-sizing:
```python
"""Automated Right-Sizing Pipeline

Fetches Compute Optimizer recommendations and creates Jira tickets
for human review, with option for automated implementation in non-prod.
"""
import boto3
from jira import JIRA


class RightSizingPipeline:
    def __init__(self):
        self.compute_optimizer = boto3.client('compute-optimizer')
        self.ec2 = boto3.client('ec2')
        self.ssm = boto3.client('ssm')
        # Jira connection (configure in environment)
        self.jira = JIRA(
            server='https://company.atlassian.net',
            basic_auth=('user', 'api_token')
        )

    def get_recommendations(self, finding_status: str = 'Overprovisioned'):
        """Fetch EC2 right-sizing recommendations."""
        response = self.compute_optimizer.get_ec2_instance_recommendations(
            filters=[
                {'name': 'Finding', 'values': [finding_status]}
            ]
        )

        recommendations = []
        for rec in response.get('instanceRecommendations', []):
            instance_id = rec['instanceArn'].split('/')[-1]
            current_type = rec['currentInstanceType']

            # Get top recommendation
            if rec.get('recommendationOptions'):
                best_option = rec['recommendationOptions'][0]
                recommended_type = best_option['instanceType']

                recommendations.append({
                    'instance_id': instance_id,
                    'current_type': current_type,
                    'recommended_type': recommended_type,
                    'finding': rec['finding'],
                    'finding_reasons': rec.get('findingReasonCodes', []),
                    'utilization': rec.get('utilizationMetrics', []),
                    'estimated_monthly_savings': self._calculate_savings(
                        current_type, recommended_type
                    )
                })
        return recommendations

    def _calculate_savings(self, current_type: str, recommended_type: str) -> float:
        """Calculate estimated monthly savings (simplified)."""
        # Production: Use the AWS Pricing API
        prices = {
            'm5.2xlarge': 0.384,
            'm5.xlarge': 0.192,
            'm5.large': 0.096,
            'c5.2xlarge': 0.340,
            'c5.xlarge': 0.170,
            'c5.large': 0.085,
        }
        current_price = prices.get(current_type, 0.20)
        new_price = prices.get(recommended_type, 0.20)
        return round((current_price - new_price) * 730, 2)  # Monthly hours

    def create_ticket(self, recommendation: dict, environment: str):
        """Create Jira ticket for right-sizing review."""
        avg_cpu = next(
            (m['value'] for m in recommendation['utilization']
             if m['name'] == 'CPU'),
            'N/A'
        )

        description = f"""*Automated Right-Sizing Recommendation*

|*Field*|*Value*|
|Instance ID|{recommendation['instance_id']}|
|Current Type|{recommendation['current_type']}|
|Recommended Type|{recommendation['recommended_type']}|
|Finding|{recommendation['finding']}|
|Average CPU|{avg_cpu}%|
|Est. Monthly Savings|${recommendation['estimated_monthly_savings']}|

*Finding Reasons:*
{chr(10).join(f'* {r}' for r in recommendation['finding_reasons'])}

*Next Steps:*
1. Review utilization metrics in CloudWatch
2. Verify workload can run on smaller instance
3. Schedule maintenance window for change
4. Monitor post-change performance

_This ticket was auto-generated by the Right-Sizing Pipeline_
"""

        issue = self.jira.create_issue(
            project='CLOUD',
            summary=f"Right-Size {recommendation['instance_id']}: "
                    f"{recommendation['current_type']} → {recommendation['recommended_type']}",
            description=description,
            issuetype={'name': 'Task'},
            labels=['right-sizing', 'cost-optimization', environment],
        )
        return issue.key

    def auto_implement_nonprod(self, recommendation: dict):
        """
        Automatically implement right-sizing for non-production instances.

        Uses SSM automation with proper safety checks.
        """
        instance_id = recommendation['instance_id']
        new_type = recommendation['recommended_type']

        # Safety checks
        tags = self.ec2.describe_tags(
            Filters=[{'Name': 'resource-id', 'Values': [instance_id]}]
        )
        env_tag = next(
            (t['Value'] for t in tags['Tags'] if t['Key'] == 'environment'),
            'unknown'
        )
        if env_tag not in ['development', 'staging', 'sandbox']:
            print(f"Skipping auto-implementation for {instance_id}: env={env_tag}")
            return None

        # Stop, modify, start using SSM Automation
        response = self.ssm.start_automation_execution(
            DocumentName='AWS-ResizeInstance',
            Parameters={
                'InstanceId': [instance_id],
                'InstanceType': [new_type],
            }
        )
        return response['AutomationExecutionId']

    def run_pipeline(self, auto_implement_nonprod: bool = False):
        """Execute full right-sizing pipeline."""
        recommendations = self.get_recommendations()
        print(f"Found {len(recommendations)} right-sizing opportunities")

        total_savings = 0
        for rec in recommendations:
            env = self._get_environment(rec['instance_id'])

            if auto_implement_nonprod and env in ['development', 'staging']:
                execution_id = self.auto_implement_nonprod(rec)
                print(f"Auto-implementing {rec['instance_id']}: {execution_id}")
            else:
                ticket_key = self.create_ticket(rec, env)
                print(f"Created ticket {ticket_key} for {rec['instance_id']}")

            total_savings += rec['estimated_monthly_savings']

        print(f"\nTotal potential monthly savings: ${total_savings:,.2f}")

    def _get_environment(self, instance_id: str) -> str:
        """Get environment from instance tags."""
        response = self.ec2.describe_instances(InstanceIds=[instance_id])
        for res in response['Reservations']:
            for inst in res['Instances']:
                for tag in inst.get('Tags', []):
                    if tag['Key'] == 'environment':
                        return tag['Value']
        return 'unknown'


# Execute pipeline
if __name__ == '__main__':
    pipeline = RightSizingPipeline()
    pipeline.run_pipeline(auto_implement_nonprod=True)
```

Tools and automation are necessary but not sufficient for sustained right-sizing.
The real challenge is changing organizational behavior—creating a culture where right-sizing is the default, not an exception.
Common cultural barriers:
Strategies for culture change:
1. Make costs visible (Showback)
Teams can't optimize what they can't see. Implement weekly cost reports by team showing:

- Total spend and week-over-week change
- Top cost drivers (services, environments, resources)
- Open right-sizing recommendations and their estimated savings
2. Create positive incentives
Reward optimization, not just avoid punishment:
3. Remove fear
Make right-sizing safe:
4. Embed in workflows
Make right-sizing part of existing processes:
5. Executive sponsorship
Top-down support is essential:
Begin with obvious wins: idle development instances, oversized staging environments, clear over-provisioning. Early successes build momentum and credibility for tackling harder optimization projects. A $50,000 quick win creates appetite for the $500,000 project.
Right-sizing is one of the highest-impact, lowest-risk cost optimization strategies available. Unlike purchasing commitments (which lock you in) or architecture changes (which require significant work), right-sizing often involves simply changing an instance type. Let's consolidate the key concepts:

- Over-provisioning is rational behavior under bad incentives, and safety margins compound multiplicatively
- Right-sizing decisions need 14-90 days of utilization data and P95 statistics, not averages
- Apply conservative thresholds (downsize below ~50% P95, upsize above ~80% P95), one size at a time
- Containers need right-sizing at both the pod level (requests/limits, VPA) and the node level (consolidation)
- Databases are the most over-provisioned and highest-risk category; change them conservatively
- Automation (provider recommenders, ticketing pipelines) plus cultural change sustain the practice
What's next:
Right-sizing ensures you're not over-provisioning individual resources. But what about aggregate capacity? The next page explores Auto-Scaling for Cost—using dynamic capacity management not just for availability, but as a cost optimization strategy that matches resource supply to actual demand.
You now understand how to systematically identify and implement right-sizing opportunities across your cloud infrastructure. These techniques typically yield 20-40% cost reduction with minimal risk. Next, we'll explore how auto-scaling compounds these savings by dynamically adjusting capacity to match real-time demand.