In the early days of a startup, infrastructure costs barely register—a few hundred dollars a month for servers, perhaps a small database. But scale changes everything. What was once a rounding error becomes a line item that rivals engineering salaries. Companies serving millions of users routinely spend millions of dollars on infrastructure—sometimes tens of millions, sometimes hundreds.
This reality creates one of the most consequential tensions in system design: the balance between performance, reliability, and cost. Every architectural decision has cost implications. More servers improve capacity but increase bills. More redundancy improves availability but doubles expenses. Faster storage improves latency but costs several times more. The challenge is not simply building systems that work, but building systems that work economically.
Cost optimization at scale is not about penny-pinching. It's about understanding the true cost of infrastructure decisions, identifying waste, and making intentional trade-offs. A dollar saved on infrastructure is a dollar available for engineering, product, or growth. Organizations that master cost efficiency can invest more in their competitive advantages while their competitors drain resources on inefficiency.
This page explores the economics of scale—how to understand, model, and optimize the costs of systems serving millions, without sacrificing the reliability and performance that users demand.
By the end of this page, you will understand the major cost components at scale, how to model and forecast infrastructure costs, optimization strategies at every layer of the stack, and the trade-off frameworks that guide cost-conscious architecture decisions.
Modern systems typically run on cloud infrastructure, where costs are composed of multiple dimensions that interact in complex ways.
1. Compute (30-50% of typical cloud bill):
Pricing Models:
2. Storage (15-25% of typical cloud bill):
Pricing Factors:
3. Data Transfer (10-20% of typical cloud bill):
The Egress Tax: Cloud providers charge significantly for data leaving their network:
4. Database Services (10-20% of typical cloud bill):
Pricing Factors:
| Category | Percentage | Primary Drivers | Key Optimizations |
|---|---|---|---|
| Compute | 30-50% | Instance hours, instance types | Right-sizing, reserved instances, spot |
| Storage | 15-25% | Volume, IOPS, storage class | Tiering, lifecycle policies, cleanup |
| Data Transfer | 10-20% | Egress volume, cross-region | CDN, compression, architecture |
| Databases | 10-20% | Instance hours, storage, IOPS | Right-sizing, reserved, read replicas |
| Other | 10-15% | Load balancers, DNS, monitoring | Consolidation, right-sizing |
Cloud bills contain surprising charges: data transfer between availability zones, NAT gateway processing, EBS snapshot storage, CloudWatch logs storage, Elastic IP addresses when not attached. Review detailed billing regularly—hidden costs can accumulate to significant totals.
Effective cost optimization requires understanding why costs are what they are and how they'll change as the system grows.
The foundation of cost modeling is understanding the cost of each unit of value delivered:
Examples:
Why Unit Economics Matter:
Example Calculation:
Monthly Infrastructure Cost: $100,000
Monthly API Calls: 500,000,000
Cost per million API calls: $100,000 / 500 = $200
If a premium customer generates 1M calls/month and pays $50
→ Negative unit economics! Must optimize or reprice.
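The calculation above can be sketched as a small helper. This is a minimal illustration using the hypothetical figures from the example ($100K/month infrastructure, 500M calls, a $50/month customer generating 1M calls); the function names are ours, not from any library.

```python
def cost_per_million(monthly_infra_cost: float, monthly_calls: int) -> float:
    """Infrastructure cost per million API calls delivered."""
    return monthly_infra_cost / (monthly_calls / 1_000_000)

# $100,000 / 500 million calls -> $200 per million calls
unit_cost = cost_per_million(100_000, 500_000_000)

def customer_margin(monthly_revenue: float, monthly_calls: int) -> float:
    """What a customer pays minus what their usage costs to serve."""
    return monthly_revenue - (monthly_calls / 1_000_000) * unit_cost

# The premium customer from the example: 1M calls, $50/month.
margin = customer_margin(monthly_revenue=50, monthly_calls=1_000_000)
print(unit_cost)  # 200.0
print(margin)     # -150.0 -> negative unit economics
```

Running this per customer segment turns "our bill is high" into "this pricing tier loses money," which is the actionable form of the problem.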
Understanding what drives costs enables prediction and optimization:
Linear Cost Drivers:
Sublinear Cost Drivers (Economies of Scale):
Superlinear Cost Drivers (Diseconomies):
Steps:
Cloud cost visibility requires resource tagging. Tag resources by team, service, environment, and project. Without tags, you see total costs but not why or where. With tags, you can attribute costs to teams, compare service costs, and identify optimization targets.
Compute is typically the largest cost category. Optimizing it yields the biggest absolute savings.
Most organizations massively over-provision compute:
Common Findings:
Right-Sizing Process:
Tools:
Caveat: Right-sizing for average utilization ignores peaks. Target 60-70% utilization at peak, not at average.
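The peak-utilization targeting above can be sketched as a sizing rule. This is an illustrative calculation, not a tool's actual algorithm; the instance-size ladder and the 65% target are assumptions within the 60-70% range the text recommends.

```python
def rightsize(peak_util_pct: float, current_vcpus: int,
              target_peak_pct: float = 65.0) -> int:
    """Recommend a vCPU count so the observed peak lands near the target.

    peak_util_pct: observed peak CPU utilization (0-100) on the current size.
    """
    needed = current_vcpus * (peak_util_pct / target_peak_pct)
    # Snap up to the next available instance size (hypothetical ladder).
    sizes = [2, 4, 8, 16, 32, 64]
    return next(s for s in sizes if s >= needed)

# An instance peaking at 20% CPU on 16 vCPUs needs only ~5 vCPUs -> 8.
print(rightsize(peak_util_pct=20, current_vcpus=16))  # 8
```

Note the input is *peak* utilization, per the caveat: sizing from average utilization would recommend an even smaller instance that falls over under load.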
For stable workloads, commit to 1-3 years for significant discounts:
Discount Levels:
Best Practices:
Risk: If workload shrinks or migrates, you're locked in. Balance discount against flexibility.
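The lock-in risk can be made concrete with a break-even calculation: if you pay for the full commitment term regardless, how many months must the workload actually run before the reservation beats on-demand? The 40% discount and 36-month term below are illustrative assumptions, not a quoted rate.

```python
def reserved_breakeven_months(on_demand_monthly: float,
                              discount_pct: float,
                              term_months: int = 36) -> float:
    """Months of actual use after which a full-term commitment beats
    on-demand, assuming the commitment is paid for the whole term
    even if the workload shrinks or migrates away.
    """
    reserved_total = on_demand_monthly * (1 - discount_pct / 100) * term_months
    return reserved_total / on_demand_monthly

# A 40% discount on a 3-year term breaks even after ~21.6 months of use.
print(reserved_breakeven_months(on_demand_monthly=1_000, discount_pct=40))
```

If there is a real chance the workload disappears before the break-even point, the discount is not worth the lock-in.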
Spot instances offer deep discounts (~60-90%) but can be terminated with 2-minute notice:
Good Candidates for Spot:
Bad Candidates for Spot:
Serverless (Lambda, Cloud Functions) charges per invocation and duration:
When Serverless Saves Money:
When Serverless Costs More:
Break-Even Analysis: For a Lambda function billed at $0.20 per million requests plus compute time at 100ms per invocation, compare the total per-invocation bill against the fixed monthly cost of an always-on instance; below the crossover volume, serverless is cheaper, and above it the instance wins.
Recommendation: Start serverless for simplicity, migrate to instances when volume justifies the operational overhead.
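The break-even analysis can be sketched as follows. The rates used are the published us-east-1 figures ($0.20 per million requests, roughly $0.0000166667 per GB-second), and the $30/month comparison instance is a hypothetical; treat all numbers as assumptions to be replaced with your own.

```python
def lambda_monthly_cost(invocations: int, duration_ms: int = 100,
                        memory_gb: float = 0.125) -> float:
    """Approximate serverless bill: request charge + GB-second compute."""
    request_cost = invocations / 1_000_000 * 0.20
    gb_seconds = invocations * (duration_ms / 1000) * memory_gb
    return request_cost + gb_seconds * 0.0000166667

def breakeven_invocations(instance_monthly: float, **kwargs) -> int:
    """Smallest monthly volume (in 1M steps) where the always-on
    instance becomes cheaper than serverless."""
    n = 1_000_000
    while lambda_monthly_cost(n, **kwargs) < instance_monthly:
        n += 1_000_000
    return n

# Against a hypothetical $30/month always-on instance:
print(breakeven_invocations(30))  # 74000000 -> crossover in the tens of millions
```

Below that volume the function effectively costs nothing while idle; above it, the fixed-cost instance amortizes better per request.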
AWS Graviton (ARM-based) instances offer 20% better price-performance than equivalent x86 instances. If your workload supports ARM (most interpreted languages and many compiled ones do), switching can yield immediate savings with minimal effort.
Storage costs accumulate over time—data is created continuously but rarely deleted. Optimizing storage requires both reducing what's stored and choosing appropriate storage tiers.
Not all data deserves the same storage treatment:
Hot Data (Frequent Access):
Warm Data (Occasional Access):
Cold Data (Rare Access):
Lifecycle Policies: Automate transitions:
Days 0-30: S3 Standard ($0.023/GB)
Days 31-90: S3 Infrequent Access ($0.0125/GB)
Days 91-365: S3 Glacier ($0.004/GB)
After 365: S3 Deep Archive ($0.00099/GB) or delete
Result: ~90% cost reduction for old data
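The schedule above can be priced out per GB. This sketch uses the per-GB rates from the schedule; the first-year saving is about 70%, and data older than a year sitting in Deep Archive costs ~96% less than Standard, which is where the ~90% figure for old data comes from.

```python
# Monthly S3 price per GB at each lifecycle stage (from the schedule above).
TIERS = {"standard": 0.023, "ia": 0.0125, "glacier": 0.004, "deep": 0.00099}

def yearly_cost_per_gb(schedule):
    """Cost of keeping 1 GB for a year under a (tier, months) schedule."""
    return sum(TIERS[tier] * months for tier, months in schedule)

no_lifecycle = yearly_cost_per_gb([("standard", 12)])
with_lifecycle = yearly_cost_per_gb(
    [("standard", 1), ("ia", 2), ("glacier", 9)])

print(no_lifecycle)    # ~0.276 per GB-year
print(with_lifecycle)  # ~0.084 per GB-year, ~70% cheaper in year one
```

Because the policy is automated, the saving applies to every object ever written, with no ongoing engineering effort.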
The Accumulation Problem:
Cleanup Strategies:
Quick Wins:
Indexing:
Compression:
Data Types:
Partitioning:
Typically, 80% of storage holds data accessed less than 1% of the time. Identify your 80% and move it to cheaper tiers. The savings are substantial, and users rarely notice (they weren't accessing it anyway).
Data transfer costs, especially egress, can scale dramatically with traffic. Optimizing network costs requires both architectural changes and tactical improvements.
CDNs reduce origin egress and improve latency:
How CDNs Reduce Costs:
CDN Economics:
Maximize Cache Hit Rate:
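The leverage of hit rate on cost is easy to quantify: only cache misses reach the origin and incur origin egress, so origin traffic scales with (1 − hit rate). The 100 TB/month figure below is illustrative.

```python
def origin_egress_gb(total_gb: float, hit_rate: float) -> float:
    """GB that still leave the origin: only the cache misses."""
    return total_gb * (1 - hit_rate)

# Raising the hit rate from 90% to 95% halves origin egress:
print(origin_egress_gb(100_000, 0.90))  # ~10,000 GB/month from origin
print(origin_egress_gb(100_000, 0.95))  # ~5,000 GB/month
```

This is why a few points of hit rate matter far more than they sound: going from 90% to 95% is "only" 5 points, but it cuts origin egress (and origin load) in half.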
Smaller payloads mean less data transfer:
Application-Level Compression:
Data Format Optimization:
Minimize Cross-Region Traffic:
Minimize Cross-Zone Traffic:
VPC Endpoints:
Moving data between cloud providers is expensive—you pay egress on the source. Multi-cloud architectures must account for this. Avoid architectures that ping-pong data between clouds. Keep processing and data together.
Beyond tactical optimizations, architectural decisions fundamentally determine cost structure. Some architectures are inherently expensive; others are inherently efficient.
Caching isn't just about performance—it's about cost:
Database Load Reduction:
Cache Cost Comparison:
CDN as Cache:
Synchronous processing provisions for peak; async provisions for average:
The Sync Problem:
The Async Solution:
Trade-off: Latency. Users wait for response. Acceptable for background jobs, notifications, analytics—not for synchronous user requests.
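The provisioning difference can be sketched numerically. The traffic figures (10,000 RPS peak, 2,000 RPS average, 500 RPS per server) are hypothetical; the point is that the sync fleet is sized by the peak and the async worker pool by the average, because the queue absorbs bursts.

```python
import math

def servers_needed(rps: float, per_server_rps: float) -> int:
    """Servers required to sustain a given request rate."""
    return math.ceil(rps / per_server_rps)

peak_rps, avg_rps, per_server = 10_000, 2_000, 500

# Synchronous: must absorb the peak the moment it arrives.
sync_fleet = servers_needed(peak_rps, per_server)
# Asynchronous: a queue buffers the burst; workers drain at the average.
async_fleet = servers_needed(avg_rps, per_server)

print(sync_fleet, async_fleet)  # 20 4 -> 5x fewer machines
```

With a 5:1 peak-to-average ratio, the async design runs one fifth of the fleet, at the cost of the latency trade-off noted above.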
Read-Heavy Workloads:
Write-Heavy Workloads:
Sometimes the architecture is the problem. A fundamentally inefficient design may cost $50K/month to run and $300K in engineer-years to rewrite. If the rewrite saves $30K/month, it pays for itself in less than a year—plus ongoing savings forever. Evaluate architectural rewrites as investments with ROI.
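The investment framing above reduces to a payback calculation, shown here with the figures from the text ($300K rewrite, $30K/month saved):

```python
def payback_months(rewrite_cost: float, monthly_savings: float) -> float:
    """Months until cumulative infrastructure savings cover the rewrite."""
    return rewrite_cost / monthly_savings

print(payback_months(300_000, 30_000))  # 10.0 months, then pure savings
```

Any rewrite with a payback period comfortably shorter than the system's expected remaining lifetime is an investment, not a cost.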
System design involves navigating trade-offs. Cost optimization often trades against performance or reliability. Understanding these trade-offs enables informed decisions.
Tension 1: Cost vs Performance
Approach: Define performance requirements (SLOs), then optimize cost to meet them—not exceed them.
Tension 2: Cost vs Reliability
Approach: Quantify the cost of downtime. If an hour of downtime costs $100K in lost revenue, investing $500K/year in reliability may be justified.
Tension 3: Performance vs Reliability
Approach: Match SLOs to business requirements. Not everything needs five nines.
| Decision | Optimize For | Accept Trade-off In |
|---|---|---|
| Use spot instances | Cost | Reliability (interruption risk) |
| Single-region deployment | Cost & Simplicity | Latency & DR capability |
| Eventual consistency | Performance & Availability | Consistency (stale reads) |
| Aggressive caching | Cost & Performance | Consistency (cache staleness) |
| Reserved instances | Cost | Flexibility (lock-in) |
| Microservices | Scalability & Reliability | Complexity & Operational cost |
Not all services require the same reliability or performance:
Critical Path (High Investment):
Important (Medium Investment):
Non-Critical (Low Investment):
Result: Critical services get resources they need; savings come from not over-engineering everything else.
Technical debt has carrying costs. Inefficient code requires more compute. Unoptimized queries require more database. Missing indexes require more IOPS. The cost of not paying down technical debt shows up in the infrastructure bill. Sometimes the best cost optimization is engineering time spent on cleanup.
We've explored the economic dimensions of system design—how to build systems that are not just functional, but economically sustainable at scale.
Module Complete:
With this page, we've completed our exploration of Why System Design Matters. You now understand the four pillars:
These concerns are not sequential—they must be balanced simultaneously in every architectural decision. The art of system design lies in finding solutions that optimize across all four dimensions.
Next, we'll explore Thinking at Scale—developing the mental models and intuition needed to reason about systems serving orders of magnitude more users.
You now understand why system design matters—not as academic exercise, but as the practical discipline of building software that scales, stays up, and remains economically viable. These four pillars—scalability, handling scale, reliability, and cost—form the foundation of every system design decision you'll make.