Every successful system eventually faces the scaling question: How do we handle 10x, 100x, or 1000x our current load? In system design interviews, demonstrating a clear understanding of database scaling strategies—when to apply each, their trade-offs, and their implementation challenges—is essential for senior-level positions.
Scaling is not a single decision but a progression of architectural evolutions. Premature scaling adds unnecessary complexity; delayed scaling causes outages. The engineer's challenge is timing these transitions correctly and understanding the full implications of each scaling strategy.
By the end of this page, you will understand the complete spectrum of database scaling strategies. You'll master vertical scaling limits, read replica patterns, sharding approaches, caching layers, connection management, and the transition to distributed databases—with the depth needed for principal-level system design discussions.
Database scaling typically follows a predictable progression. Understanding this journey helps you recommend appropriate solutions for different scale requirements.
┌─────────────────────────────────────────────────────────────────────┐
│ DATABASE SCALING PROGRESSION │
└─────────────────────────────────────────────────────────────────────┘
Stage 1: Single Server
├── CPU/Memory upgrades (vertical scaling)
├── Query optimization
├── Index tuning
└── Capacity: ~10K QPS, <1TB data
│
▼
Stage 2: Read Replicas
├── Primary-replica topology
├── Read traffic distribution
├── Connection routing
└── Capacity: ~100K read QPS
│
▼
Stage 3: Caching Layer
├── Application-level caching (Redis/Memcached)
├── Query result caching
├── Cache invalidation strategy
└── Capacity: ~1M+ read QPS (cached)
│
▼
Stage 4: Sharding
├── Horizontal data partitioning
├── Shard key selection
├── Cross-shard query handling
└── Capacity: ~100K+ write QPS, multi-TB data
│
▼
Stage 5: Distributed Databases
├── NewSQL (CockroachDB, Spanner)
├── Native horizontal scaling
├── Global distribution
└── Capacity: Planetary scale
In interviews, explicitly acknowledge the scaling stage appropriate for the given requirements. Saying 'For 10K daily users, a single well-tuned PostgreSQL instance would suffice—scaling complexity isn't justified yet' shows mature engineering judgment.
Vertical scaling (scaling up) involves adding more resources to a single server. It's the simplest and often most cost-effective approach until you hit hardware limits.
Maximizing Vertical Scale Performance
Before adding complexity, extract maximum value from your single server:
1. Hardware Configuration
2. Database Tuning
# PostgreSQL example: key performance parameters (postgresql.conf)
shared_buffers = '32GB'                  # 25% of RAM for DB cache
effective_cache_size = '96GB'            # 75% of RAM (accounts for OS cache)
work_mem = '256MB'                       # Per-sort/hash operation
maintenance_work_mem = '2GB'             # For VACUUM, CREATE INDEX
max_connections = 200                    # Lower than you think
max_parallel_workers_per_gather = 4      # Parallel query execution
random_page_cost = 1.1                   # Low for SSD (1.1-1.5)
3. Query Optimization
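Query optimization is workload-specific, but the loop is always the same: inspect the plan, add or adjust an index, and re-measure. A minimal sketch of that loop using psycopg2 (the connection string, table, and column are hypothetical):

import psycopg2

# Hypothetical connection string, table, and column -- adjust for your schema.
conn = psycopg2.connect('dbname=mydb user=app')
conn.autocommit = True  # CREATE INDEX CONCURRENTLY cannot run inside a transaction

with conn.cursor() as cur:
    # 1. Inspect the plan: sequential scans on large tables are the usual culprit
    cur.execute('EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = %s', (42,))
    for (line,) in cur.fetchall():
        print(line)

    # 2. Add the missing index without blocking writes, then re-run the EXPLAIN
    cur.execute('CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer ON orders (customer_id)')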
| Resource | Practical Maximum | Cloud Example |
|---|---|---|
| vCPUs | 128+ cores | AWS db.r6i.32xlarge |
| Memory | 1-2 TB RAM | Azure E104ids_v5 |
| Storage | 64 TB SSD | GP3/IO1 volumes |
| IOPS | 256,000+ | Provisioned IOPS SSD |
| Network | 100 Gbps | Enhanced networking |
A well-tuned PostgreSQL on a single large server can handle 50,000+ transactions per second with sub-millisecond latency. Many startups reaching $100M+ revenue never need to shard. Don't underestimate vertical scaling.
Read replicas are the first horizontal scaling pattern most systems adopt. They distribute read traffic across multiple servers while maintaining a single primary for writes.
Architecture Overview
┌─────────────┐
│ Primary │
│ (Writes) │
└─────────────┘
│
┌───────────────┼───────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Replica │ │ Replica │ │ Replica │
│ (Read) │ │ (Read) │ │ (Read) │
└─────────┘ └─────────┘ └─────────┘
│ │ │
└───────────────┴───────────────┘
│
▼
┌───────────────┐
│ Application │
│ (Read/Write │
│ Routing) │
└───────────────┘
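The application (or a proxy in front of the database) must route each query: writes go to the primary, reads are spread across replicas. A minimal application-side sketch, assuming psycopg2 and hypothetical host names:

import itertools
import psycopg2

# Hypothetical DSNs -- substitute your own hosts.
PRIMARY_DSN = 'host=db-primary dbname=app user=app'
REPLICA_DSNS = ['host=db-replica-1 dbname=app user=app',
                'host=db-replica-2 dbname=app user=app']
_replicas = itertools.cycle(REPLICA_DSNS)

def get_connection(readonly=False):
    # Writes (and anything transactional) go to the primary;
    # reads are spread round-robin across the replicas.
    dsn = next(_replicas) if readonly else PRIMARY_DSN
    return psycopg2.connect(dsn)

# Usage: a read that can tolerate replication lag
with get_connection(readonly=True) as conn, conn.cursor() as cur:
    cur.execute('SELECT count(*) FROM orders')
    print(cur.fetchone()[0])

Many ORMs and proxies provide this routing out of the box; the key point is that some component must make the read-versus-write decision for every query.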
Synchronous Replication
The primary waits for replica acknowledgment before confirming writes.
Timeline:
1. Application sends INSERT
2. Primary writes to WAL
3. Primary sends WAL to replica(s)
4. Replica(s) write to their WAL
5. Replica(s) acknowledge
6. Primary confirms to application ← write complete
Characteristics:
PostgreSQL Configuration:
# On primary (postgresql.conf)
synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (replica1, replica2)'
# Requires at least one replica to confirm
With async replication, users may update data, then immediately see old values from a lagging replica. Design for this: use read-your-writes patterns, or accept eventual consistency with appropriate UX.
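One common read-your-writes approach: after a user writes, pin that user's reads to the primary for a window longer than typical replication lag. A hedged sketch, reusing the hypothetical get_connection helper from the routing example above (production code would track the timestamp in the session or Redis rather than in process memory):

import time

READ_YOUR_WRITES_WINDOW = 5.0   # seconds; should exceed your typical replica lag
_last_write_at = {}             # user_id -> timestamp of that user's latest write

def record_write(user_id):
    _last_write_at[user_id] = time.time()

def connection_for_read(user_id):
    # Pin this user's reads to the primary right after they wrote something,
    # so they never see a lagging replica's stale copy of their own change.
    recently_wrote = time.time() - _last_write_at.get(user_id, 0) < READ_YOUR_WRITES_WINDOW
    return get_connection(readonly=not recently_wrote)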
Caching is the highest-leverage scaling technique. A well-designed cache can reduce database load by 90%+ and improve latency by 100x. Understanding caching patterns is essential for any system design discussion.
Cache Placement Architecture
┌─────────┐ ┌─────────────┐ ┌─────────────┐ ┌──────────┐
│ User │───►│ CDN │───►│ Application │───►│ Database │
└─────────┘ │ (Static) │ │ Server │ └──────────┘
└─────────────┘ └──────┬──────┘
│
▼
┌─────────────┐
│ Redis │
│ (Cache) │
└─────────────┘
Caches exist at multiple layers: the CDN caches static content close to users, an application-level cache such as Redis or Memcached holds query results and hot objects, and the database keeps its own buffer cache in memory.
Cache-Aside (Lazy Loading)
Application manages cache population explicitly.
import json
# `redis` is a connected Redis client; `db` is a thin database helper.

def get_user(user_id):
    # 1. Check cache first
    cached = redis.get(f'user:{user_id}')
    if cached:
        return json.loads(cached)  # Cache hit
    # 2. Cache miss: load from database
    user = db.query('SELECT * FROM users WHERE id = %s', user_id)
    # 3. Populate cache for future requests
    redis.setex(f'user:{user_id}', 3600, json.dumps(user))  # TTL: 1 hour
    return user

def update_user(user_id, data):
    # Update database
    db.execute('UPDATE users SET ... WHERE id = %s', data, user_id)
    # Invalidate cache (or update it)
    redis.delete(f'user:{user_id}')
Characteristics:
Monitor cache hit ratio obsessively. Below 90% hit ratio, investigate: wrong TTL, cold start issues, or workload not suitable for caching. Above 99%, you're likely over-caching or have very stable data.
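Redis exposes the counters needed to track this directly. A small sketch with redis-py (the hostname is illustrative; the counters are cumulative since the server started):

import redis

r = redis.Redis(host='cache.internal', port=6379)  # hypothetical endpoint

stats = r.info('stats')
hits, misses = stats['keyspace_hits'], stats['keyspace_misses']
ratio = hits / (hits + misses) if (hits + misses) else 0.0
print(f'cache hit ratio: {ratio:.2%}')  # alert when this drifts below ~90%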
When read replicas and caching aren't enough—when you need to scale writes or store more data than fits on a single server—sharding becomes necessary. Sharding divides data across multiple independent databases.
Sharding Architecture
┌─────────────────────┐
│ Routing Layer │
│ (App / Proxy) │
└──────────┬──────────┘
│
┌─────────────┬───────┴───────┬─────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 0 │ │ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│ A-F │ │ G-L │ │ M-R │ │ S-Z │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
│ │ │ │
▼ ▼ ▼ ▼
Replica Replica Replica Replica
Range-Based Sharding
Data is partitioned by contiguous ranges of the shard key.
# Range sharding by user_id
SHARD_RANGES = [
    (0, 1000000, 'shard_0'),
    (1000001, 2000000, 'shard_1'),
    (2000001, 3000000, 'shard_2'),
    # ...
]

def get_shard(user_id):
    for (start, end, shard) in SHARD_RANGES:
        if start <= user_id <= end:
            return shard
    raise ValueError('User ID out of range')
Advantages:
Disadvantages:
Best For:
Sharding breaks: joins across shards, transactions across shards, global ordering, unique constraints. Design to minimize cross-shard operations. If you need them frequently, reconsider your shard key or database choice.
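To see the cost concretely, consider a lookup on a column that is not the shard key: every shard must be queried and the results merged in the application. A hedged scatter-gather sketch (connect_to and the query helper are hypothetical):

ALL_SHARDS = ['shard_0', 'shard_1', 'shard_2', 'shard_3']

def find_users_by_email(email):
    # email is not the shard key, so every shard must be queried (scatter)
    # and the partial results merged in the application (gather).
    results = []
    for shard in ALL_SHARDS:
        conn = connect_to(shard)  # hypothetical per-shard connection helper
        results.extend(conn.query('SELECT * FROM users WHERE email = %s', email))
    return results  # total latency is bounded by the slowest shard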
Database connections are expensive resources. Each connection consumes memory on both client and server, and connection establishment has latency. At scale, connection management becomes a critical bottleneck that candidates often overlook.
The Connection Problem
Without pooling:
┌─────────────┐ ┌──────────────┐
│ App Server │ │ Database │
│ 100 requests│────►│ 100 connections │
└─────────────┘ └──────────────┘
× 20 app servers = 2,000 connections
PostgreSQL default max_connections = 100
→ Connection refused errors!
With pooling:
┌─────────────┐ ┌─────────────┐ ┌──────────────┐
│ App Server │────►│ PgBouncer │────►│ Database │
│ 100 requests│ │ (Pooler) │ │ 20 connections│
└─────────────┘ └─────────────┘ └──────────────┘
× 20 app servers = 2,000 client connections → only 20 actual DB connections
| Mode | Description | Use Case |
|---|---|---|
| Session Pooling | Connection assigned per client session | Long-running connections, prepared statements |
| Transaction Pooling | Connection assigned per transaction | General web applications, best efficiency |
| Statement Pooling | Connection assigned per SQL statement | Simple queries only, no multi-statement transactions |
Connection Pool Sizing Formula
The optimal pool size depends on your database's ability to parallelize:
Optimal connections = (CPU cores × 2) + effective_spindle_count
Example for 8-core with SSD:
Optimal = (8 × 2) + 1 = 17 connections
Less is more: a pool of 20 connections often outperforms one of 200, because fewer connections mean less context switching on the database server, less memory consumed per connection, and less contention for locks and I/O.
PgBouncer Configuration:
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
pool_mode = transaction # Most common
max_client_conn = 10000 # Accept many clients
default_pool_size = 20 # But only 20 DB connections
reserve_pool_size = 5 # Extra for bursts
min_pool_size = 5 # Keep warm connections
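The same sizing discipline applies to the application's own pool sitting in front of PgBouncer. A sketch using SQLAlchemy's built-in pooling (host, credentials, and sizes are illustrative, not recommendations):

from sqlalchemy import create_engine, text

# Application-side pool pointed at PgBouncer; numbers are illustrative.
engine = create_engine(
    'postgresql+psycopg2://app@pgbouncer-host:6432/mydb',
    pool_size=20,         # steady-state connections per application process
    max_overflow=5,       # temporary extras for short bursts
    pool_pre_ping=True,   # discard dead connections before handing them out
    pool_recycle=1800,    # recycle connections older than 30 minutes
)

with engine.connect() as conn:
    print(conn.execute(text('SELECT 1')).scalar())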
Serverless functions (Lambda, Cloud Functions) scale from 0 to 1000+ instances instantly. Each wants a DB connection. Use RDS Proxy, PgBouncer, or connection poolers to prevent connection storms.
When manual sharding becomes too complex, distributed databases offer built-in horizontal scaling with SQL semantics. These NewSQL databases represent the cutting edge of database technology.
| Database | Consistency | Distribution | SQL Compat | Key Use Case |
|---|---|---|---|---|
| Google Spanner | Strong (TrueTime) | Global | Full | Global financial systems |
| CockroachDB | Serializable | Multi-region | PostgreSQL | Multi-region applications |
| TiDB | Snapshot isolation | Distributed | MySQL | HTAP workloads |
| YugabyteDB | Strong | Multi-region | PostgreSQL | Low-latency global apps |
| Vitess | Varies | Sharded MySQL | MySQL subset | Scaling existing MySQL |
| PlanetScale | Strong | Serverless | MySQL | Developer-friendly serverless |
CockroachDB: Representative Architecture
┌──────────────────────────────────────────────────────────┐
│ CockroachDB Cluster │
├──────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Node 1 │ │ Node 2 │ │ Node 3 │ ... │
│ │ (NYC) │ │ (SFO) │ │ (LDN) │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │
│ ┌────┴─────────────┴─────────────┴────┐ │
│ │ Raft Consensus │ │
│ │ (Automatic leader election, │ │
│ │ replication, failover) │ │
│ └─────────────────────────────────────┘ │
│ │
│ Data automatically sharded (ranges) │
│ Replicas placed across nodes/regions │
│ SQL queries routed to appropriate nodes │
│ │
└──────────────────────────────────────────────────────────┘
Key Capabilities:
Consider distributed databases when: (1) data exceeds single-node capacity, (2) you need active-active multi-region, (3) manual sharding complexity is unsustainable, or (4) you need global scale with SQL semantics. They add latency and operational complexity—don't adopt prematurely.
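Because CockroachDB speaks the PostgreSQL wire protocol, adopting it does not require new client tooling: a standard driver and plain SQL work while the cluster handles the distribution. A minimal sketch with psycopg2 (host, port, and schema are hypothetical):

import psycopg2

# CockroachDB speaks the PostgreSQL wire protocol (default port 26257).
conn = psycopg2.connect('host=crdb-lb.internal port=26257 dbname=app user=app sslmode=require')
with conn, conn.cursor() as cur:
    # Plain SQL; the cluster handles range sharding, replication, and routing.
    cur.execute('SELECT id, email FROM users WHERE id = %s', (42,))
    print(cur.fetchone())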
Database scaling is a journey, not a destination. The key takeaways: exhaust vertical scaling, query tuning, and indexing first; add read replicas and a caching layer to absorb read traffic; shard only when write volume or data size demands it; treat connection management as a first-class concern; and move to a distributed database when manual sharding becomes unsustainable.
What's Next:
With scaling strategies mastered, the next page dives into Consistency Models—how to reason about data correctness in distributed systems. You'll learn strong vs. eventual consistency, CAP theorem implications, and how to choose the right consistency level for different operations.
You now have a comprehensive framework for database scaling in system design interviews. Remember: the best engineers scale systems gradually, timing each transition to match actual needs rather than anticipated ones.