Every successful system eventually faces the scaling question: How do we handle 10x, 100x, or 1000x our current load? In system design interviews, demonstrating a clear understanding of database scaling strategies—when to apply each, their trade-offs, and their implementation challenges—is essential for senior-level positions.
Scaling is not a single decision but a progression of architectural evolutions. Premature scaling adds unnecessary complexity; delayed scaling causes outages. The engineer's challenge is timing these transitions correctly and understanding the full implications of each scaling strategy.
By the end of this page, you will understand the complete spectrum of database scaling strategies. You'll master vertical scaling limits, read replica patterns, sharding approaches, caching layers, connection management, and the transition to distributed databases—with the depth needed for principal-level system design discussions.
Database scaling typically follows a predictable progression. Understanding this journey helps you recommend appropriate solutions for different scale requirements.
┌─────────────────────────────────────────────────────────────────────┐
│ DATABASE SCALING PROGRESSION │
└─────────────────────────────────────────────────────────────────────┘
Stage 1: Single Server
├── CPU/Memory upgrades (vertical scaling)
├── Query optimization
├── Index tuning
└── Capacity: ~10K QPS, <1TB data
│
▼
Stage 2: Read Replicas
├── Primary-replica topology
├── Read traffic distribution
├── Connection routing
└── Capacity: ~100K read QPS
│
▼
Stage 3: Caching Layer
├── Application-level caching (Redis/Memcached)
├── Query result caching
├── Cache invalidation strategy
└── Capacity: ~1M+ read QPS (cached)
│
▼
Stage 4: Sharding
├── Horizontal data partitioning
├── Shard key selection
├── Cross-shard query handling
└── Capacity: ~100K+ write QPS, multi-TB data
│
▼
Stage 5: Distributed Databases
├── NewSQL (CockroachDB, Spanner)
├── Native horizontal scaling
├── Global distribution
└── Capacity: Planetary scale
In interviews, explicitly acknowledge the scaling stage appropriate for the given requirements. Saying 'For 10K daily users, a single well-tuned PostgreSQL instance would suffice—scaling complexity isn't justified yet' shows mature engineering judgment.
Vertical scaling (scaling up) involves adding more resources to a single server. It's the simplest and often most cost-effective approach until you hit hardware limits.
Maximizing Vertical Scale Performance
Before adding complexity, extract maximum value from your single server:
1. Hardware Configuration
2. Database Tuning
# PostgreSQL example: key performance parameters (postgresql.conf)
shared_buffers = '32GB'                  # 25% of RAM for DB cache
effective_cache_size = '96GB'            # 75% of RAM (accounts for OS cache)
work_mem = '256MB'                       # Per-sort/hash operation
maintenance_work_mem = '2GB'             # For VACUUM, CREATE INDEX
max_connections = 200                    # Lower than you think
max_parallel_workers_per_gather = 4      # Parallel query execution
random_page_cost = 1.1                   # Low for SSD (1.1-1.5)
3. Query Optimization
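Query optimization is workload-specific, but the loop is always the same: inspect the plan, add or adjust an index, and re-measure. A minimal sketch of that loop using psycopg2 (the connection string, table, and column are hypothetical):

import psycopg2

# Hypothetical connection string, table, and column -- adjust for your schema.
conn = psycopg2.connect('dbname=mydb user=app')
conn.autocommit = True  # CREATE INDEX CONCURRENTLY cannot run inside a transaction

with conn.cursor() as cur:
    # 1. Inspect the plan: sequential scans on large tables are the usual culprit
    cur.execute('EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = %s', (42,))
    for (line,) in cur.fetchall():
        print(line)

    # 2. Add the missing index without blocking writes, then re-run the EXPLAIN
    cur.execute('CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_orders_customer ON orders (customer_id)')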
| Resource | Practical Maximum | Cloud Example |
|---|---|---|
| vCPUs | 128+ cores | AWS db.r6i.32xlarge |
| Memory | 1-2 TB RAM | Azure E104ids_v5 |
| Storage | 64 TB SSD | GP3/IO1 volumes |
| IOPS | 256,000+ | Provisioned IOPS SSD |
| Network | 100 Gbps | Enhanced networking |
A well-tuned PostgreSQL on a single large server can handle 50,000+ transactions per second with sub-millisecond latency. Many startups reaching $100M+ revenue never need to shard. Don't underestimate vertical scaling.
Read replicas are the first horizontal scaling pattern most systems adopt. They distribute read traffic across multiple servers while maintaining a single primary for writes.
Architecture Overview
┌─────────────┐
│ Primary │
│ (Writes) │
└─────────────┘
│
┌───────────────┼───────────────┐
│ │ │
▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Replica │ │ Replica │ │ Replica │
│ (Read) │ │ (Read) │ │ (Read) │
└─────────┘ └─────────┘ └─────────┘
│ │ │
└───────────────┴───────────────┘
│
▼
┌───────────────┐
│ Application │
│ (Read/Write │
│ Routing) │
└───────────────┘
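The application (or a proxy in front of the database) must route each query: writes go to the primary, reads are spread across replicas. A minimal application-side sketch, assuming psycopg2 and hypothetical host names:

import itertools
import psycopg2

# Hypothetical DSNs -- substitute your own hosts.
PRIMARY_DSN = 'host=db-primary dbname=app user=app'
REPLICA_DSNS = ['host=db-replica-1 dbname=app user=app',
                'host=db-replica-2 dbname=app user=app']
_replicas = itertools.cycle(REPLICA_DSNS)

def get_connection(readonly=False):
    # Writes (and anything transactional) go to the primary;
    # reads are spread round-robin across the replicas.
    dsn = next(_replicas) if readonly else PRIMARY_DSN
    return psycopg2.connect(dsn)

# Usage: a read that can tolerate replication lag
with get_connection(readonly=True) as conn, conn.cursor() as cur:
    cur.execute('SELECT count(*) FROM orders')
    print(cur.fetchone()[0])

Many ORMs and proxies provide this routing out of the box; the key point is that some component must make the read-versus-write decision for every query.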
Synchronous Replication
The primary waits for replica acknowledgment before confirming writes.
Timeline:
1. Application sends INSERT
2. Primary writes to WAL
3. Primary sends WAL to replica(s)
4. Replica(s) write to their WAL
5. Replica(s) acknowledge
6. Primary confirms to application ← write complete
Characteristics:
PostgreSQL Configuration:
# On primary (postgresql.conf)
synchronous_commit = on
synchronous_standby_names = 'FIRST 1 (replica1, replica2)'
# Requires at least one replica to confirm
With async replication, users may update data, then immediately see old values from a lagging replica. Design for this: use read-your-writes patterns, or accept eventual consistency with appropriate UX.
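One common read-your-writes approach: after a user writes, pin that user's reads to the primary for a window longer than typical replication lag. A hedged sketch, reusing the hypothetical get_connection helper from the routing example above (production code would track the timestamp in the session or Redis rather than in process memory):

import time

READ_YOUR_WRITES_WINDOW = 5.0   # seconds; should exceed your typical replica lag
_last_write_at = {}             # user_id -> timestamp of that user's latest write

def record_write(user_id):
    _last_write_at[user_id] = time.time()

def connection_for_read(user_id):
    # Pin this user's reads to the primary right after they wrote something,
    # so they never see a lagging replica's stale copy of their own change.
    recently_wrote = time.time() - _last_write_at.get(user_id, 0) < READ_YOUR_WRITES_WINDOW
    return get_connection(readonly=not recently_wrote)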
Caching is the highest-leverage scaling technique. A well-designed cache can reduce database load by 90%+ and improve latency by 100x. Understanding caching patterns is essential for any system design discussion.
Cache Placement Architecture
┌─────────┐ ┌─────────────┐ ┌─────────────┐ ┌──────────┐
│ User │───►│ CDN │───►│ Application │───►│ Database │
└─────────┘ │ (Static) │ │ Server │ └──────────┘
└─────────────┘ └──────┬──────┘
│
▼
┌─────────────┐
│ Redis │
│ (Cache) │
└─────────────┘
Caches exist at multiple layers: the CDN caches static content close to users, an application-level cache such as Redis or Memcached holds query results and hot objects, and the database keeps its own buffer cache in memory.
Cache-Aside (Lazy Loading)
Application manages cache population explicitly.
import json
# `redis` is a connected Redis client; `db` is a thin database helper.

def get_user(user_id):
    # 1. Check cache first
    cached = redis.get(f'user:{user_id}')
    if cached:
        return json.loads(cached)  # Cache hit
    # 2. Cache miss: load from database
    user = db.query('SELECT * FROM users WHERE id = %s', user_id)
    # 3. Populate cache for future requests
    redis.setex(f'user:{user_id}', 3600, json.dumps(user))  # TTL: 1 hour
    return user

def update_user(user_id, data):
    # Update database
    db.execute('UPDATE users SET ... WHERE id = %s', data, user_id)
    # Invalidate cache (or update it)
    redis.delete(f'user:{user_id}')
Characteristics:
Monitor cache hit ratio obsessively. Below 90% hit ratio, investigate: wrong TTL, cold start issues, or workload not suitable for caching. Above 99%, you're likely over-caching or have very stable data.
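Redis exposes the counters needed to track this directly. A small sketch with redis-py (the hostname is illustrative; the counters are cumulative since the server started):

import redis

r = redis.Redis(host='cache.internal', port=6379)  # hypothetical endpoint

stats = r.info('stats')
hits, misses = stats['keyspace_hits'], stats['keyspace_misses']
ratio = hits / (hits + misses) if (hits + misses) else 0.0
print(f'cache hit ratio: {ratio:.2%}')  # alert when this drifts below ~90%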
When read replicas and caching aren't enough—when you need to scale writes or store more data than fits on a single server—sharding becomes necessary. Sharding divides data across multiple independent databases.
Sharding Architecture
┌─────────────────────┐
│ Routing Layer │
│ (App / Proxy) │
└──────────┬──────────┘
│
┌─────────────┬───────┴───────┬─────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 0 │ │ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│ A-F │ │ G-L │ │ M-R │ │ S-Z │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
│ │ │ │
▼ ▼ ▼ ▼
Replica Replica Replica Replica
Range-Based Sharding
Data is partitioned by contiguous ranges of the shard key.
# Range sharding by user_id
SHARD_RANGES = [
    (0, 1000000, 'shard_0'),
    (1000001, 2000000, 'shard_1'),
    (2000001, 3000000, 'shard_2'),
    # ...
]

def get_shard(user_id):
    for (start, end, shard) in SHARD_RANGES:
        if start <= user_id <= end:
            return shard
    raise ValueError('User ID out of range')
Advantages:
Disadvantages:
Best For:
Sharding breaks: joins across shards, transactions across shards, global ordering, unique constraints. Design to minimize cross-shard operations. If you need them frequently, reconsider your shard key or database choice.
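To see the cost concretely, consider a lookup on a column that is not the shard key: every shard must be queried and the results merged in the application. A hedged scatter-gather sketch (connect_to and the query helper are hypothetical):

ALL_SHARDS = ['shard_0', 'shard_1', 'shard_2', 'shard_3']

def find_users_by_email(email):
    # email is not the shard key, so every shard must be queried (scatter)
    # and the partial results merged in the application (gather).
    results = []
    for shard in ALL_SHARDS:
        conn = connect_to(shard)  # hypothetical per-shard connection helper
        results.extend(conn.query('SELECT * FROM users WHERE email = %s', email))
    return results  # total latency is bounded by the slowest shard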
Database connections are expensive resources. Each connection consumes memory on both client and server, and connection establishment has latency. At scale, connection management becomes a critical bottleneck that candidates often overlook.
The Connection Problem
Without pooling:
┌─────────────┐ ┌──────────────┐
│ App Server │ │ Database │
│ 100 requests│────►│ 100 connections │
└─────────────┘ └──────────────┘
× 20 app servers = 2,000 connections
PostgreSQL default max_connections = 100
→ Connection refused errors!
With pooling:
┌─────────────┐ ┌─────────────┐ ┌──────────────┐
│ App Server │────►│ PgBouncer │────►│ Database │
│ 100 requests│ │ (Pooler) │ │ 20 connections│
└─────────────┘ └─────────────┘ └──────────────┘
× 20 app servers = 2,000 client connections → only 20 actual DB connections
| Mode | Description | Use Case |
|---|---|---|
| Session Pooling | Connection assigned per client session | Long-running connections, prepared statements |
| Transaction Pooling | Connection assigned per transaction | General web applications, best efficiency |
| Statement Pooling | Connection assigned per SQL statement | Simple queries only, no multi-statement transactions |
Connection Pool Sizing Formula
The optimal pool size depends on your database's ability to parallelize:
Optimal connections = (CPU cores × 2) + effective_spindle_count
Example for 8-core with SSD:
Optimal = (8 × 2) + 1 = 17 connections
Less is more: a pool of 20 connections often outperforms one of 200, because fewer connections mean less context switching on the database server, less memory consumed per connection, and less contention for locks and I/O.
PgBouncer Configuration:
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
pool_mode = transaction # Most common
max_client_conn = 10000 # Accept many clients
default_pool_size = 20 # But only 20 DB connections
reserve_pool_size = 5 # Extra for bursts
min_pool_size = 5 # Keep warm connections
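The same sizing discipline applies to the application's own pool sitting in front of PgBouncer. A sketch using SQLAlchemy's built-in pooling (host, credentials, and sizes are illustrative, not recommendations):

from sqlalchemy import create_engine, text

# Application-side pool pointed at PgBouncer; numbers are illustrative.
engine = create_engine(
    'postgresql+psycopg2://app@pgbouncer-host:6432/mydb',
    pool_size=20,         # steady-state connections per application process
    max_overflow=5,       # temporary extras for short bursts
    pool_pre_ping=True,   # discard dead connections before handing them out
    pool_recycle=1800,    # recycle connections older than 30 minutes
)

with engine.connect() as conn:
    print(conn.execute(text('SELECT 1')).scalar())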
Serverless functions (Lambda, Cloud Functions) scale from 0 to 1000+ instances instantly. Each wants a DB connection. Use RDS Proxy, PgBouncer, or connection poolers to prevent connection storms.
When manual sharding becomes too complex, distributed databases offer built-in horizontal scaling with SQL semantics. These NewSQL databases represent the cutting edge of database technology.
| Database | Consistency | Distribution | SQL Compat | Key Use Case |
|---|---|---|---|---|
| Google Spanner | Strong (TrueTime) | Global | Full | Global financial systems |
| CockroachDB | Serializable | Multi-region | PostgreSQL | Multi-region applications |
| TiDB | Snapshot isolation | Distributed | MySQL | HTAP workloads |
| YugabyteDB | Strong | Multi-region | PostgreSQL | Low-latency global apps |
| Vitess | Varies | Sharded MySQL | MySQL subset | Scaling existing MySQL |
| PlanetScale | Strong | Serverless | MySQL | Developer-friendly serverless |
CockroachDB: Representative Architecture
┌──────────────────────────────────────────────────────────┐
│ CockroachDB Cluster │
├──────────────────────────────────────────────────────────┤
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Node 1 │ │ Node 2 │ │ Node 3 │ ... │
│ │ (NYC) │ │ (SFO) │ │ (LDN) │ │
│ └────┬────┘ └────┬────┘ └────┬────┘ │
│ │ │ │ │
│ ┌────┴─────────────┴─────────────┴────┐ │
│ │ Raft Consensus │ │
│ │ (Automatic leader election, │ │
│ │ replication, failover) │ │
│ └─────────────────────────────────────┘ │
│ │
│ Data automatically sharded (ranges) │
│ Replicas placed across nodes/regions │
│ SQL queries routed to appropriate nodes │
│ │
└──────────────────────────────────────────────────────────┘
Key Capabilities:
Consider distributed databases when: (1) data exceeds single-node capacity, (2) you need active-active multi-region, (3) manual sharding complexity is unsustainable, or (4) you need global scale with SQL semantics. They add latency and operational complexity—don't adopt prematurely.
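Because CockroachDB speaks the PostgreSQL wire protocol, adopting it does not require new client tooling: a standard driver and plain SQL work while the cluster handles the distribution. A minimal sketch with psycopg2 (host, port, and schema are hypothetical):

import psycopg2

# CockroachDB speaks the PostgreSQL wire protocol (default port 26257).
conn = psycopg2.connect('host=crdb-lb.internal port=26257 dbname=app user=app sslmode=require')
with conn, conn.cursor() as cur:
    # Plain SQL; the cluster handles range sharding, replication, and routing.
    cur.execute('SELECT id, email FROM users WHERE id = %s', (42,))
    print(cur.fetchone())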
Database scaling is a journey, not a destination. The key takeaways: exhaust vertical scaling, query tuning, and indexing first; add read replicas and a caching layer to absorb read traffic; shard only when write volume or data size demands it; treat connection management as a first-class concern; and move to a distributed database when manual sharding becomes unsustainable.
What's Next:
With scaling strategies mastered, the next page dives into Consistency Models—how to reason about data correctness in distributed systems. You'll learn strong vs. eventual consistency, CAP theorem implications, and how to choose the right consistency level for different operations.
You now have a comprehensive framework for database scaling in system design interviews. Remember: the best engineers scale systems gradually, timing each transition to match actual needs rather than anticipated ones.