In the early days of a startup, infrastructure costs barely register—a few hundred dollars a month for servers, perhaps a small database. But scale changes everything. What was once a rounding error becomes a line item that rivals engineering salaries. Companies serving millions of users routinely spend millions of dollars on infrastructure—sometimes tens of millions, sometimes hundreds.
This reality creates one of the most consequential tensions in system design: the balance between performance, reliability, and cost. Every architectural decision has cost implications. More servers improve capacity but increase bills. More redundancy improves availability but doubles expenses. Faster storage improves latency but costs several times more. The challenge is not simply building systems that work, but building systems that work economically.
Cost optimization at scale is not about penny-pinching. It's about understanding the true cost of infrastructure decisions, identifying waste, and making intentional trade-offs. A dollar saved on infrastructure is a dollar available for engineering, product, or growth. Organizations that master cost efficiency can invest more in their competitive advantages while their competitors drain resources on inefficiency.
This page explores the economics of scale—how to understand, model, and optimize the costs of systems serving millions, without sacrificing the reliability and performance that users demand.
By the end of this page, you will understand the major cost components at scale, how to model and forecast infrastructure costs, optimization strategies at every layer of the stack, and the trade-off frameworks that guide cost-conscious architecture decisions.
Modern systems typically run on cloud infrastructure, where costs are composed of multiple dimensions that interact in complex ways.
1. Compute (30-50% of typical cloud bill):
Pricing Models:
2. Storage (15-25% of typical cloud bill):
Pricing Factors:
3. Data Transfer (10-20% of typical cloud bill):
The Egress Tax: Cloud providers charge significantly for data leaving their network:
4. Database Services (10-20% of typical cloud bill):
Pricing Factors:
| Category | Percentage | Primary Drivers | Key Optimizations |
|---|---|---|---|
| Compute | 30-50% | Instance hours, instance types | Right-sizing, reserved instances, spot |
| Storage | 15-25% | Volume, IOPS, storage class | Tiering, lifecycle policies, cleanup |
| Data Transfer | 10-20% | Egress volume, cross-region | CDN, compression, architecture |
| Databases | 10-20% | Instance hours, storage, IOPS | Right-sizing, reserved, read replicas |
| Other | 10-15% | Load balancers, DNS, monitoring | Consolidation, right-sizing |
Cloud bills contain surprising charges: data transfer between availability zones, NAT gateway processing, EBS snapshot storage, CloudWatch logs storage, Elastic IP addresses when not attached. Review detailed billing regularly—hidden costs can accumulate to significant totals.
Effective cost optimization requires understanding why costs are what they are and how they'll change as the system grows.
The foundation of cost modeling is understanding the cost of each unit of value delivered:
Examples:
Why Unit Economics Matter:
Example Calculation:
Monthly Infrastructure Cost: $100,000
Monthly API Calls: 500,000,000
Cost per million API calls: $100,000 / 500 = $200
If a premium customer generates 1M calls/month and pays $50
→ Negative unit economics! Must optimize or reprice.
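The calculation above can be sketched as a small helper. This is a minimal illustration using the hypothetical figures from the example ($100K/month infrastructure, 500M calls, a $50/month customer generating 1M calls); the function names are ours, not from any library.

```python
def cost_per_million(monthly_infra_cost: float, monthly_calls: int) -> float:
    """Infrastructure cost per million API calls delivered."""
    return monthly_infra_cost / (monthly_calls / 1_000_000)

# $100,000 / 500 million calls -> $200 per million calls
unit_cost = cost_per_million(100_000, 500_000_000)

def customer_margin(monthly_revenue: float, monthly_calls: int) -> float:
    """What a customer pays minus what their usage costs to serve."""
    return monthly_revenue - (monthly_calls / 1_000_000) * unit_cost

# The premium customer from the example: 1M calls, $50/month.
margin = customer_margin(monthly_revenue=50, monthly_calls=1_000_000)
print(unit_cost)  # 200.0
print(margin)     # -150.0 -> negative unit economics
```

Running this per customer segment turns "our bill is high" into "this pricing tier loses money," which is the actionable form of the problem.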
Understanding what drives costs enables prediction and optimization:
Linear Cost Drivers:
Sublinear Cost Drivers (Economies of Scale):
Superlinear Cost Drivers (Diseconomies):
Steps:
Cloud cost visibility requires resource tagging. Tag resources by team, service, environment, and project. Without tags, you see total costs but not why or where. With tags, you can attribute costs to teams, compare service costs, and identify optimization targets.
Compute is typically the largest cost category. Optimizing it yields the biggest absolute savings.
Most organizations massively over-provision compute:
Common Findings:
Right-Sizing Process:
Tools:
Caveat: Right-sizing for average utilization ignores peaks. Target 60-70% utilization at peak, not at average.
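The peak-utilization targeting above can be sketched as a sizing rule. This is an illustrative calculation, not a tool's actual algorithm; the instance-size ladder and the 65% target are assumptions within the 60-70% range the text recommends.

```python
def rightsize(peak_util_pct: float, current_vcpus: int,
              target_peak_pct: float = 65.0) -> int:
    """Recommend a vCPU count so the observed peak lands near the target.

    peak_util_pct: observed peak CPU utilization (0-100) on the current size.
    """
    needed = current_vcpus * (peak_util_pct / target_peak_pct)
    # Snap up to the next available instance size (hypothetical ladder).
    sizes = [2, 4, 8, 16, 32, 64]
    return next(s for s in sizes if s >= needed)

# An instance peaking at 20% CPU on 16 vCPUs needs only ~5 vCPUs -> 8.
print(rightsize(peak_util_pct=20, current_vcpus=16))  # 8
```

Note the input is *peak* utilization, per the caveat: sizing from average utilization would recommend an even smaller instance that falls over under load.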
For stable workloads, commit to 1-3 years for significant discounts:
Discount Levels:
Best Practices:
Risk: If workload shrinks or migrates, you're locked in. Balance discount against flexibility.
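The lock-in risk can be made concrete with a break-even calculation: if you pay for the full commitment term regardless, how many months must the workload actually run before the reservation beats on-demand? The 40% discount and 36-month term below are illustrative assumptions, not a quoted rate.

```python
def reserved_breakeven_months(on_demand_monthly: float,
                              discount_pct: float,
                              term_months: int = 36) -> float:
    """Months of actual use after which a full-term commitment beats
    on-demand, assuming the commitment is paid for the whole term
    even if the workload shrinks or migrates away.
    """
    reserved_total = on_demand_monthly * (1 - discount_pct / 100) * term_months
    return reserved_total / on_demand_monthly

# A 40% discount on a 3-year term breaks even after ~21.6 months of use.
print(reserved_breakeven_months(on_demand_monthly=1_000, discount_pct=40))
```

If there is a real chance the workload disappears before the break-even point, the discount is not worth the lock-in.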
Spot instances offer deep discounts (~60-90%) but can be terminated with 2-minute notice:
Good Candidates for Spot:
Bad Candidates for Spot:
Serverless (Lambda, Cloud Functions) charges per invocation and duration:
When Serverless Saves Money:
When Serverless Costs More:
Break-Even Analysis: For a Lambda function billed at $0.20 per million requests plus compute time at 100ms per invocation, compare the total per-invocation bill against the fixed monthly cost of an always-on instance; below the crossover volume, serverless is cheaper, and above it the instance wins.
Recommendation: Start serverless for simplicity, migrate to instances when volume justifies the operational overhead.
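The break-even analysis can be sketched as follows. The rates used are the published us-east-1 figures ($0.20 per million requests, roughly $0.0000166667 per GB-second), and the $30/month comparison instance is a hypothetical; treat all numbers as assumptions to be replaced with your own.

```python
def lambda_monthly_cost(invocations: int, duration_ms: int = 100,
                        memory_gb: float = 0.125) -> float:
    """Approximate serverless bill: request charge + GB-second compute."""
    request_cost = invocations / 1_000_000 * 0.20
    gb_seconds = invocations * (duration_ms / 1000) * memory_gb
    return request_cost + gb_seconds * 0.0000166667

def breakeven_invocations(instance_monthly: float, **kwargs) -> int:
    """Smallest monthly volume (in 1M steps) where the always-on
    instance becomes cheaper than serverless."""
    n = 1_000_000
    while lambda_monthly_cost(n, **kwargs) < instance_monthly:
        n += 1_000_000
    return n

# Against a hypothetical $30/month always-on instance:
print(breakeven_invocations(30))  # 74000000 -> crossover in the tens of millions
```

Below that volume the function effectively costs nothing while idle; above it, the fixed-cost instance amortizes better per request.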
AWS Graviton (ARM-based) instances offer 20% better price-performance than equivalent x86 instances. If your workload supports ARM (most interpreted languages and many compiled ones do), switching can yield immediate savings with minimal effort.
Storage costs accumulate over time—data is created continuously but rarely deleted. Optimizing storage requires both reducing what's stored and choosing appropriate storage tiers.
Not all data deserves the same storage treatment:
Hot Data (Frequent Access):
Warm Data (Occasional Access):
Cold Data (Rare Access):
Lifecycle Policies: Automate transitions:
Days 0-30: S3 Standard ($0.023/GB)
Days 31-90: S3 Infrequent Access ($0.0125/GB)
Days 91-365: S3 Glacier ($0.004/GB)
After 365: S3 Deep Archive ($0.00099/GB) or delete
Result: ~90% cost reduction for old data
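The schedule above can be priced out per GB. This sketch uses the per-GB rates from the schedule; the first-year saving is about 70%, and data older than a year sitting in Deep Archive costs ~96% less than Standard, which is where the ~90% figure for old data comes from.

```python
# Monthly S3 price per GB at each lifecycle stage (from the schedule above).
TIERS = {"standard": 0.023, "ia": 0.0125, "glacier": 0.004, "deep": 0.00099}

def yearly_cost_per_gb(schedule):
    """Cost of keeping 1 GB for a year under a (tier, months) schedule."""
    return sum(TIERS[tier] * months for tier, months in schedule)

no_lifecycle = yearly_cost_per_gb([("standard", 12)])
with_lifecycle = yearly_cost_per_gb(
    [("standard", 1), ("ia", 2), ("glacier", 9)])

print(no_lifecycle)    # ~0.276 per GB-year
print(with_lifecycle)  # ~0.084 per GB-year, ~70% cheaper in year one
```

Because the policy is automated, the saving applies to every object ever written, with no ongoing engineering effort.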
The Accumulation Problem:
Cleanup Strategies:
Quick Wins:
Indexing:
Compression:
Data Types:
Partitioning:
Typically, 80% of storage holds data accessed less than 1% of the time. Identify your 80% and move it to cheaper tiers. The savings are substantial, and users rarely notice (they weren't accessing it anyway).
Data transfer costs, especially egress, can scale dramatically with traffic. Optimizing network costs requires both architectural changes and tactical improvements.
CDNs reduce origin egress and improve latency:
How CDNs Reduce Costs:
CDN Economics:
Maximize Cache Hit Rate:
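The leverage of hit rate on cost is easy to quantify: only cache misses reach the origin and incur origin egress, so origin traffic scales with (1 − hit rate). The 100 TB/month figure below is illustrative.

```python
def origin_egress_gb(total_gb: float, hit_rate: float) -> float:
    """GB that still leave the origin: only the cache misses."""
    return total_gb * (1 - hit_rate)

# Raising the hit rate from 90% to 95% halves origin egress:
print(origin_egress_gb(100_000, 0.90))  # ~10,000 GB/month from origin
print(origin_egress_gb(100_000, 0.95))  # ~5,000 GB/month
```

This is why a few points of hit rate matter far more than they sound: going from 90% to 95% is "only" 5 points, but it cuts origin egress (and origin load) in half.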
Smaller payloads mean less data transfer:
Application-Level Compression:
Data Format Optimization:
Minimize Cross-Region Traffic:
Minimize Cross-Zone Traffic:
VPC Endpoints:
Moving data between cloud providers is expensive—you pay egress on the source. Multi-cloud architectures must account for this. Avoid architectures that ping-pong data between clouds. Keep processing and data together.
Beyond tactical optimizations, architectural decisions fundamentally determine cost structure. Some architectures are inherently expensive; others are inherently efficient.
Caching isn't just about performance—it's about cost:
Database Load Reduction:
Cache Cost Comparison:
CDN as Cache:
Synchronous processing provisions for peak; async provisions for average:
The Sync Problem:
The Async Solution:
Trade-off: Latency. Users wait for response. Acceptable for background jobs, notifications, analytics—not for synchronous user requests.
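The provisioning difference can be sketched numerically. The traffic figures (10,000 RPS peak, 2,000 RPS average, 500 RPS per server) are hypothetical; the point is that the sync fleet is sized by the peak and the async worker pool by the average, because the queue absorbs bursts.

```python
import math

def servers_needed(rps: float, per_server_rps: float) -> int:
    """Servers required to sustain a given request rate."""
    return math.ceil(rps / per_server_rps)

peak_rps, avg_rps, per_server = 10_000, 2_000, 500

# Synchronous: must absorb the peak the moment it arrives.
sync_fleet = servers_needed(peak_rps, per_server)
# Asynchronous: a queue buffers the burst; workers drain at the average.
async_fleet = servers_needed(avg_rps, per_server)

print(sync_fleet, async_fleet)  # 20 4 -> 5x fewer machines
```

With a 5:1 peak-to-average ratio, the async design runs one fifth of the fleet, at the cost of the latency trade-off noted above.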
Read-Heavy Workloads:
Write-Heavy Workloads:
Sometimes the architecture is the problem. A fundamentally inefficient design may cost $50K/month to run and $300K in engineer-years to rewrite. If the rewrite saves $30K/month, it pays for itself in less than a year—plus ongoing savings forever. Evaluate architectural rewrites as investments with ROI.
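The investment framing above reduces to a payback calculation, shown here with the figures from the text ($300K rewrite, $30K/month saved):

```python
def payback_months(rewrite_cost: float, monthly_savings: float) -> float:
    """Months until cumulative infrastructure savings cover the rewrite."""
    return rewrite_cost / monthly_savings

print(payback_months(300_000, 30_000))  # 10.0 months, then pure savings
```

Any rewrite with a payback period comfortably shorter than the system's expected remaining lifetime is an investment, not a cost.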
System design involves navigating trade-offs. Cost optimization often trades against performance or reliability. Understanding these trade-offs enables informed decisions.
Tension 1: Cost vs Performance
Approach: Define performance requirements (SLOs), then optimize cost to meet them—not exceed them.
Tension 2: Cost vs Reliability
Approach: Quantify the cost of downtime. If an hour of downtime costs $100K in lost revenue, investing $500K/year in reliability may be justified.
Tension 3: Performance vs Reliability
Approach: Match SLOs to business requirements. Not everything needs five nines.
| Decision | Optimize For | Accept Trade-off In |
|---|---|---|
| Use spot instances | Cost | Reliability (interruption risk) |
| Single-region deployment | Cost & Simplicity | Latency & DR capability |
| Eventual consistency | Performance & Availability | Consistency (stale reads) |
| Aggressive caching | Cost & Performance | Consistency (cache staleness) |
| Reserved instances | Cost | Flexibility (lock-in) |
| Microservices | Scalability & Reliability | Complexity & Operational cost |
Not all services require the same reliability or performance:
Critical Path (High Investment):
Important (Medium Investment):
Non-Critical (Low Investment):
Result: Critical services get resources they need; savings come from not over-engineering everything else.
Technical debt has carrying costs. Inefficient code requires more compute. Unoptimized queries require more database. Missing indexes require more IOPS. The cost of not paying down technical debt shows up in the infrastructure bill. Sometimes the best cost optimization is engineering time spent on cleanup.
We've explored the economic dimensions of system design—how to build systems that are not just functional, but economically sustainable at scale.
Module Complete:
With this page, we've completed our exploration of Why System Design Matters. You now understand the four pillars:
These concerns are not sequential—they must be balanced simultaneously in every architectural decision. The art of system design lies in finding solutions that optimize across all four dimensions.
Next, we'll explore Thinking at Scale—developing the mental models and intuition needed to reason about systems serving orders of magnitude more users.
You now understand why system design matters—not as academic exercise, but as the practical discipline of building software that scales, stays up, and remains economically viable. These four pillars—scalability, handling scale, reliability, and cost—form the foundation of every system design decision you'll make.