When resource planning reveals that current infrastructure will be exhausted, the question becomes: how do we add capacity?
Database scaling is not straightforward. Unlike stateless application servers that can be cloned trivially, databases carry the complexity of data consistency, query routing, and transaction semantics. The choice of scaling strategy has profound implications for application architecture, operational procedures, and long-term system evolution.
This page explores the spectrum of database scaling strategies—from the simplest vertical upgrades to complex distributed architectures. Each approach offers distinct trade-offs between capability, complexity, and cost. Understanding these trade-offs enables informed decisions that balance immediate needs with long-term flexibility.
By the end of this page, you will understand vertical and horizontal scaling paradigms, read replica architectures, sharding strategies, and connection pooling techniques. You'll learn when each approach is appropriate, how to implement them, and the operational implications of each choice.
Vertical scaling means adding more resources to a single server—more CPU cores, more RAM, faster storage, higher network bandwidth. It's the simplest scaling approach because it requires no application changes and maintains the single-server operational model.
Advantages of vertical scaling:

- No application changes: the database remains a single logical and physical server
- Preserves strong consistency and full transaction semantics
- Simple operations: one server to back up, monitor, and secure
- Often the fastest way to buy headroom when growth is moderate
Limitations of vertical scaling:

- Hard ceiling: every platform has a largest available server
- Cost grows faster than capacity at the top of the hardware range
- Upgrades typically require a restart or failover window
- A single server remains a single point of failure
| Resource | Upgrade Options | Typical Impact | Implementation Complexity |
|---|---|---|---|
| CPU | More cores, higher frequency | Linear throughput for parallel workloads | Usually requires restart |
| Memory | Add RAM modules, larger instance | Cache hit improvement, reduced I/O | May be hot-addable in some systems |
| Storage | Faster drives (HDD→SSD→NVMe) | 10-100x IOPS improvement | May require data migration |
| Storage Capacity | Add drives, expand volumes | Extends growth runway | Often online, RAID-dependent |
| Network | Faster NICs, bonding | Higher client and replication throughput | May require reconfiguration |
```sql
-- Assess vertical scaling potential and constraints

-- Current resource utilization vs. server capacity
WITH current_resources AS (
    SELECT
        (SELECT setting::int FROM pg_settings WHERE name = 'max_connections') AS configured_max_connections,
        (SELECT COUNT(*) FROM pg_stat_activity) AS current_connections,
        (SELECT setting FROM pg_settings WHERE name = 'shared_buffers') AS configured_shared_buffers,
        (SELECT pg_size_pretty(pg_database_size(current_database()))) AS database_size,
        (SELECT COUNT(*) FROM pg_stat_activity WHERE state = 'active') AS active_queries
),

-- Hardware capacity (would be gathered from OS, shown as example values)
hardware_specs AS (
    SELECT
        64    AS total_ram_gb,      -- Current server RAM
        16    AS total_cpu_cores,   -- Current CPU cores
        2000  AS total_storage_gb,  -- Current storage capacity
        50000 AS max_iops           -- Current storage IOPS capability
),

-- Upgrade options and costs (example based on cloud pricing patterns)
upgrade_tiers AS (
    SELECT
        tier_name,
        ram_gb,
        cpu_cores,
        monthly_cost,
        storage_iops,
        ram_gb * 1.0 / 64 AS memory_multiplier,   -- Relative to current
        cpu_cores * 1.0 / 16 AS cpu_multiplier
    FROM (VALUES
        ('Current',           64,  16, 800,  50000),
        ('Medium Upgrade',    128, 32, 1600, 80000),
        ('Large Upgrade',     256, 64, 3500, 100000),
        ('Maximum Available', 512, 96, 8000, 150000)
    ) AS tiers(tier_name, ram_gb, cpu_cores, monthly_cost, storage_iops)
)
SELECT
    u.tier_name,
    u.ram_gb || ' GB' AS memory,
    u.cpu_cores || ' cores' AS cpu,
    u.storage_iops AS iops,
    '$' || u.monthly_cost AS monthly_cost,
    ROUND(u.memory_multiplier, 1) || 'x' AS memory_increase,
    ROUND(u.cpu_multiplier, 1) || 'x' AS cpu_increase,
    ROUND(u.monthly_cost::numeric / 800, 1) || 'x' AS cost_increase,
    CASE
        WHEN u.tier_name = 'Maximum Available' THEN 'No further vertical scaling possible'
        ELSE ''
    END AS notes
FROM upgrade_tiers u
ORDER BY u.monthly_cost;

-- Estimate headroom at each tier
-- (upgrade_tiers is repeated because a CTE is scoped to a single statement)
WITH upgrade_tiers AS (
    SELECT
        tier_name,
        ram_gb,
        cpu_cores,
        ram_gb * 1.0 / 64 AS memory_multiplier
    FROM (VALUES
        ('Current',           64,  16, 800,  50000),
        ('Medium Upgrade',    128, 32, 1600, 80000),
        ('Large Upgrade',     256, 64, 3500, 100000),
        ('Maximum Available', 512, 96, 8000, 150000)
    ) AS tiers(tier_name, ram_gb, cpu_cores, monthly_cost, storage_iops)
)
SELECT
    u.tier_name,
    -- Memory headroom: how much larger could the working set be?
    ROUND(u.ram_gb * 0.75 / 50.0, 1) AS estimated_hot_data_capacity_gb,
    -- CPU headroom: estimated QPS capacity
    ROUND(u.cpu_cores * 500.0, 0) AS estimated_simple_qps_capacity,
    -- Growth months at current trajectory (assuming 10% monthly growth)
    CASE
        WHEN u.tier_name = 'Current' THEN 0
        ELSE ROUND(LN(u.memory_multiplier) / LN(1.10), 1)
    END AS months_of_growth_at_10pct_monthly
FROM upgrade_tiers u;
```

Vertical scaling is ideal when you're not near hardware limits, when simplicity is paramount, when workload doesn't exceed single-server capacity, and when you have budget for premium hardware. Start vertical—add horizontal complexity only when necessary.
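The headroom estimate above reduces to a simple relationship: if a tier gives you a capacity multiplier m and the working set grows by g per month, the runway is ln(m) / ln(1 + g) months. A minimal Python sketch of that arithmetic; the growth rate and tier multipliers are illustrative assumptions matching the example query:

```python
import math

def months_of_runway(capacity_multiplier: float, monthly_growth: float) -> float:
    """Months until a tier with `capacity_multiplier` times current capacity is exhausted,
    assuming the working set grows by `monthly_growth` (e.g. 0.10 = 10%) each month."""
    if capacity_multiplier <= 1:
        return 0.0
    return math.log(capacity_multiplier) / math.log(1 + monthly_growth)

# Illustrative tiers matching the example above (memory multiplier relative to current)
for tier, multiplier in [("Medium Upgrade", 2), ("Large Upgrade", 4), ("Maximum Available", 8)]:
    print(f"{tier}: ~{months_of_runway(multiplier, 0.10):.1f} months at 10% monthly growth")
```

Doubling memory buys roughly seven months at 10% monthly growth; even the largest tier buys under two years, which is why runway estimates matter more than raw capacity.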
Horizontal scaling means distributing workload across multiple database servers. This overcomes single-server hardware limits but introduces significant complexity around data distribution, consistency, and query routing.
Horizontal scaling paradigms:

- Read replicas: a single primary handles writes while copies serve reads
- Multi-primary replication: several writable nodes with conflict resolution
- Sharding: data partitioned across servers, each owning a subset
- Caching layers: in-memory stores absorb repeated reads
- Database federation: separate databases per functional area
Complexity implications:
Horizontal scaling trades operational simplicity for capacity. Each approach introduces new failure modes, consistency considerations, and operational procedures:
| Approach | Write Scaling | Read Scaling | Consistency | Operational Complexity |
|---|---|---|---|---|
| Read Replicas | None (single primary) | Linear with replicas | Eventual (lag) | Low-Medium |
| Multi-Primary | Limited (conflict resolution) | Good | Complex (conflicts) | High |
| Sharding | Linear with shards | Linear with shards | Strong per-shard | Very High |
| Caching Layers | None | Excellent (cache hits) | Eventual (TTL) | Medium |
| Database Federation | Distributed | Distributed | Per-database | Medium-High |
Every horizontal scaling approach pays a 'distributed systems tax'—network latency between nodes, partial failure handling, consistency protocol overhead, and operational complexity. Only scale horizontally when vertical limits are reached or when specific requirements (read scaling, geographic distribution) demand it.
Read replicas are the most common first step in horizontal scaling. A primary server handles all writes while replica servers maintain synchronized copies for read queries. This approach works well for read-heavy workloads, which most applications have.
Read replica topology:

- A single primary accepts all writes and streams changes to its replicas
- One or more replicas serve read-only queries, usually behind a load balancer
- Replicas can cascade (replicate from other replicas) to reduce load on the primary
- A replica can be promoted to primary during failover
Replication mechanisms:

- Physical (streaming) replication: ships WAL records, producing an exact byte-level copy
- Logical replication: publishes row-level changes, allowing a subset of tables or differing versions
- Asynchronous mode: low write latency on the primary, at the cost of replica lag
- Synchronous mode: the primary waits for replica acknowledgment, trading latency for durability
```sql
-- PostgreSQL Read Replica Setup and Monitoring

-- On Primary: Configure for replication
-- postgresql.conf settings:
/*
wal_level = replica
max_wal_senders = 10
max_replication_slots = 10
synchronous_standby_names = ''   -- Empty for async, 'replica1,replica2' for sync
*/

-- Create replication slot for each replica
SELECT pg_create_physical_replication_slot('replica_1_slot');
SELECT pg_create_physical_replication_slot('replica_2_slot');

-- Create replication user
CREATE ROLE replication_user WITH REPLICATION LOGIN PASSWORD 'secure_password';

-- Monitor replication status
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    -- Lag in bytes
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replication_lag_bytes,
    -- Connection age in seconds (not a direct lag measure)
    EXTRACT(EPOCH FROM (NOW() - backend_start)) AS connection_age_seconds,
    sync_state
FROM pg_stat_replication;

-- Check replication slots
SELECT
    slot_name,
    slot_type,
    active,
    pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal_size
FROM pg_replication_slots;

-- On Replica: Check replay status
SELECT
    pg_is_in_recovery() AS is_replica,
    pg_last_wal_receive_lsn() AS last_received,
    pg_last_wal_replay_lsn() AS last_replayed,
    pg_last_xact_replay_timestamp() AS last_replay_time,
    NOW() - pg_last_xact_replay_timestamp() AS replay_lag;

-- Application-level read/write splitting logic
/*
Connection routing pseudocode:

function getConnection(queryType):
    if queryType == 'WRITE' or requiresStrongConsistency:
        return primaryConnection
    elif requiresReadAfterWrite:
        # Route to primary or wait for replica to catch up
        return primaryConnection  # Safest approach
    else:
        return loadBalancer.getReadReplica()
*/

-- Query to identify read vs write query ratio (for replica sizing)
SELECT
    CASE
        WHEN query ~* '^(SELECT|WITH .* SELECT)' THEN 'READ'
        WHEN query ~* '^(INSERT|UPDATE|DELETE|MERGE)' THEN 'WRITE'
        ELSE 'OTHER'
    END AS query_type,
    COUNT(*) AS query_count,
    ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS percentage
FROM pg_stat_statements
GROUP BY 1
ORDER BY 2 DESC;
```

Replication lag means replicas may not have the latest data. For read-after-write scenarios, either route reads to the primary, implement session-based replica affinity, or use a lag-aware connection pool that avoids replicas with excessive lag.
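The routing pseudocode in the block above can be made concrete. The sketch below is one hedged way to implement lag-aware read/write splitting in Python; the psycopg2 driver, the DSNs, the round-robin policy, and the 5-second lag threshold are illustrative assumptions rather than a prescribed design.

```python
"""Minimal lag-aware read/write router (sketch)."""
import itertools
import psycopg2

PRIMARY_DSN = "host=primary.db.internal dbname=app user=app"   # assumed DSN
REPLICA_DSNS = [
    "host=replica1.db.internal dbname=app user=app",           # assumed DSNs
    "host=replica2.db.internal dbname=app user=app",
]
MAX_LAG_SECONDS = 5.0  # replicas lagging more than this are skipped

def replica_lag_seconds(conn) -> float:
    """Replay lag as reported by the replica itself."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT COALESCE(EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp())), 0)"
        )
        return float(cur.fetchone()[0])

class ReadWriteRouter:
    def __init__(self):
        self.primary = psycopg2.connect(PRIMARY_DSN)
        self.replicas = [psycopg2.connect(dsn) for dsn in REPLICA_DSNS]
        self._rr = itertools.cycle(range(len(self.replicas)))

    def connection_for(self, is_write: bool, needs_read_after_write: bool = False):
        # Writes and read-after-write reads always go to the primary.
        if is_write or needs_read_after_write:
            return self.primary
        # Round-robin over replicas, skipping any that lag too far behind.
        for _ in range(len(self.replicas)):
            candidate = self.replicas[next(self._rr)]
            if replica_lag_seconds(candidate) <= MAX_LAG_SECONDS:
                return candidate
        return self.primary  # all replicas lagging: fall back to the primary
```

In practice this logic usually lives in a driver, ORM plugin, or proxy rather than hand-rolled code, but the decision points are the same: write vs. read, consistency requirement, and current lag.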
Sharding partitions data across multiple database servers, with each shard owning a subset of the data. Unlike replicas (which hold copies), shards together hold the complete dataset. Sharding enables both read and write scaling but significantly increases architectural complexity.
Shard key selection:
The shard key determines how data is distributed. Choosing the right key is the most critical sharding decision—it affects query efficiency, data distribution, and future flexibility.
| Criterion | Description | Example |
|---|---|---|
| High Cardinality | Many distinct values to enable even distribution | user_id (millions of values) vs. country (hundreds) |
| Query Affinity | Key appears in most query WHERE clauses | tenant_id for multi-tenant apps; user_id for user-centric apps |
| Even Distribution | Values spread workload equally across shards | UUID distributes well; timestamps create hotspots |
| Immutability | Key value doesn't change over entity lifetime | user_id rarely changes; status changes frequently |
| Growth Stability | Distribution remains balanced as data grows | Sequential IDs create hotspots on newest shard |
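To make the "even distribution" criterion concrete, the short sketch below compares how a hashed high-cardinality key and a low-cardinality key spread across shards; the shard count, key names, and traffic numbers are illustrative assumptions.

```python
"""Compare shard-key distribution: hashed user_id vs. country code (sketch)."""
import hashlib
from collections import Counter

SHARDS = 8  # illustrative shard count

def shard_for(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % SHARDS

# High-cardinality key: 100,000 distinct user ids spread almost evenly
user_counts = Counter(shard_for(f"user_{i}") for i in range(100_000))

# Low-cardinality key: a handful of countries with heavily skewed traffic
traffic = {"US": 60_000, "DE": 20_000, "JP": 15_000, "BR": 5_000}
country_counts = Counter()
for country, requests in traffic.items():
    country_counts[shard_for(country)] += requests

print("user_id keys per shard:   ", dict(sorted(user_counts.items())))
print("country traffic per shard:", dict(sorted(country_counts.items())))
# user_id lands ~12,500 per shard; country traffic piles onto at most 4 shards.
```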
Sharding schemes:

- Hash-based: apply a hash function to the shard key; distribution is even, but range queries scatter
- Range-based: assign contiguous key ranges to shards; range scans are efficient, but new data can hotspot
- Directory-based: a lookup table maps each key or tenant to a shard; flexible, but the directory must be highly available
- Consistent hashing: a hash ring with virtual nodes minimizes data movement when shards are added or removed (demonstrated in the code below)
"""Database Sharding Implementation Patterns Demonstrates shard routing logic and consistent hashingfor distributed database architectures.""" from typing import Dict, List, Optional, Tupleimport hashlibfrom dataclasses import dataclassfrom bisect import bisect_right @dataclassclass ShardConfig: """Configuration for a single shard""" shard_id: str host: str port: int weight: int = 1 # For weighted distribution class HashShardRouter: """ Simple hash-based shard routing. Routes keys to shards using modulo hashing. """ def __init__(self, shards: List[ShardConfig]): self.shards = sorted(shards, key=lambda s: s.shard_id) self.shard_count = len(shards) def get_shard(self, shard_key: str) -> ShardConfig: """Get shard for a given key using hash distribution""" key_hash = int(hashlib.md5(str(shard_key).encode()).hexdigest(), 16) shard_index = key_hash % self.shard_count return self.shards[shard_index] def get_all_shards(self) -> List[ShardConfig]: """For queries that must fan out to all shards""" return self.shards class ConsistentHashRouter: """ Consistent hashing for shard routing. Minimizes data movement when adding/removing shards. """ def __init__(self, shards: List[ShardConfig], virtual_nodes: int = 150): self.shards = {s.shard_id: s for s in shards} self.virtual_nodes = virtual_nodes self.ring: List[Tuple[int, str]] = [] for shard in shards: for i in range(virtual_nodes * shard.weight): key = f"{shard.shard_id}:{i}" hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16) self.ring.append((hash_val, shard.shard_id)) self.ring.sort(key=lambda x: x[0]) self.ring_keys = [r[0] for r in self.ring] def get_shard(self, shard_key: str) -> ShardConfig: """Get shard using consistent hashing""" key_hash = int(hashlib.md5(str(shard_key).encode()).hexdigest(), 16) # Find first node with hash >= key_hash idx = bisect_right(self.ring_keys, key_hash) if idx >= len(self.ring): idx = 0 # Wrap around shard_id = self.ring[idx][1] return self.shards[shard_id] def add_shard(self, shard: ShardConfig): """Add a new shard with minimal disruption""" self.shards[shard.shard_id] = shard for i in range(self.virtual_nodes * shard.weight): key = f"{shard.shard_id}:{i}" hash_val = int(hashlib.md5(key.encode()).hexdigest(), 16) self.ring.append((hash_val, shard.shard_id)) self.ring.sort(key=lambda x: x[0]) self.ring_keys = [r[0] for r in self.ring] class ShardedQueryExecutor: """ Executes queries across sharded databases. Handles both single-shard and scatter-gather patterns. """ def __init__(self, router: ConsistentHashRouter): self.router = router self.connections: Dict[str, 'Connection'] = {} def execute_single_shard(self, shard_key: str, query: str, params: tuple): """Execute query on the shard that owns the key""" shard = self.router.get_shard(shard_key) conn = self._get_connection(shard) return conn.execute(query, params) def execute_all_shards(self, query: str, params: tuple) -> List: """ Execute query on all shards and aggregate results. Used for queries without shard key in WHERE clause. WARNING: This is expensive! Avoid in production for frequent queries. """ results = [] for shard in self.router.shards.values(): conn = self._get_connection(shard) shard_results = conn.execute(query, params) results.extend(shard_results) return results def execute_scatter_gather(self, query: str, params: tuple, aggregation: str = 'UNION') -> List: """ Scatter query to all shards, gather and aggregate results. Supports UNION, SUM, COUNT, MAX, MIN aggregations. 
""" shard_results = self.execute_all_shards(query, params) if aggregation == 'UNION': return shard_results elif aggregation == 'COUNT': return [{'count': sum(r.get('count', 0) for r in shard_results)}] elif aggregation == 'SUM': return [{'sum': sum(r.get('sum', 0) for r in shard_results)}] elif aggregation == 'MAX': return [{'max': max(r.get('max', 0) for r in shard_results)}] elif aggregation == 'MIN': return [{'min': min(r.get('min', 0) for r in shard_results)}] return shard_results def _get_connection(self, shard: ShardConfig) -> 'Connection': """Get or create connection to shard""" if shard.shard_id not in self.connections: # In real implementation, create actual DB connection self.connections[shard.shard_id] = MockConnection(shard) return self.connections[shard.shard_id] # Example usagedef demonstrate_sharding(): shards = [ ShardConfig('shard-1', 'db1.example.com', 5432), ShardConfig('shard-2', 'db2.example.com', 5432), ShardConfig('shard-3', 'db3.example.com', 5432), ShardConfig('shard-4', 'db4.example.com', 5432), ] router = ConsistentHashRouter(shards) # Test key distribution distribution = {s.shard_id: 0 for s in shards} for i in range(10000): shard = router.get_shard(f"user_{i}") distribution[shard.shard_id] += 1 print("Key distribution across shards:") for shard_id, count in distribution.items(): print(f" {shard_id}: {count} keys ({count/100:.1f}%)") # Simulate adding a shard - consistent hashing moves minimal keys router.add_shard(ShardConfig('shard-5', 'db5.example.com', 5432)) new_distribution = {s: 0 for s in router.shards} for i in range(10000): shard = router.get_shard(f"user_{i}") new_distribution[shard.shard_id] += 1 print("\nAfter adding shard-5:") for shard_id, count in new_distribution.items(): print(f" {shard_id}: {count} keys ({count/100:.1f}%)")Queries without shard key in WHERE clause must scatter to all shards and gather results. This is O(n) in shard count and adds network latency. Design schemas and queries to minimize cross-shard operations. Consider denormalization or maintaining lookup tables.
Database connections are expensive resources. Each connection consumes memory on the server (5-50MB typically) and has establishment overhead (TCP handshake, authentication, session setup). Connection pooling and intelligent load balancing are essential for efficient scaling.
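A quick back-of-the-envelope calculation shows why this matters; the 10 MB per-connection figure below is an assumed midpoint of the range above, and the fleet size is illustrative.

```python
# Back-of-the-envelope: assumed ~10 MB of database server memory per connection
PER_CONNECTION_MB = 10
app_servers, conns_per_server = 50, 20   # illustrative application fleet

direct = app_servers * conns_per_server  # 1,000 direct connections
pooled = 100                             # shared connections behind a proxy pool

print(f"Without pooling: {direct} connections ~ {direct * PER_CONNECTION_MB / 1024:.1f} GB of database RAM")
print(f"With pooling:    {pooled} connections ~ {pooled * PER_CONNECTION_MB / 1024:.1f} GB of database RAM")
```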
Connection pooling benefits:

- Reuses established connections, amortizing TCP handshake and authentication cost
- Caps the number of server-side connections, bounding memory consumption
- Queues excess requests instead of overwhelming the database during spikes
- Lets thousands of application clients share a few dozen database connections
Pooling architectures:
| Approach | Description | Pros | Cons |
|---|---|---|---|
| Application-level Pool | Pool embedded in each app server | Simplest; no additional component | One pool per app server, so total connections multiply |
| Proxy Pool (PgBouncer, ProxySQL) | Centralized proxy handles pooling | Fewer total connections; transparent to app | Additional component; single point of failure |
| Sidecar Pool | Pool proxy co-located with each app pod | Kubernetes-native; no network hop | More complex deployment |
| Database-native Pool | Built into database (connection broker) | No external components | Database-specific; may not exist |
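As a concrete example of the first row, application-level pooling in Python might look like the sketch below; it assumes psycopg2's built-in pool and an illustrative DSN, and the pool sizes are starting points rather than recommendations.

```python
"""Application-level connection pool (sketch) using psycopg2's built-in pool."""
from contextlib import contextmanager
from psycopg2.pool import ThreadedConnectionPool

# Illustrative DSN; in practice this comes from configuration or a secrets store.
pool = ThreadedConnectionPool(
    minconn=5,    # keep a few connections warm
    maxconn=20,   # hard cap per application instance
    dsn="host=primary.db.internal dbname=app_production user=app password=secret",
)

@contextmanager
def pooled_connection():
    """Borrow a connection and always return it to the pool."""
    conn = pool.getconn()
    try:
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        pool.putconn(conn)

# Usage
with pooled_connection() as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM pg_stat_activity")
        print(cur.fetchone())
```

Note the drawback the table calls out: with 50 application instances, a per-instance cap of 20 still means up to 1,000 server-side connections, which is why proxy pools such as PgBouncer are often layered on top.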
```ini
; PgBouncer Configuration for Production
; Centralized connection pooling for PostgreSQL

[databases]
; Database connection definitions
; Format: dbname = host=hostname port=port dbname=actualdb auth_user=user
production = host=primary.db.internal port=5432 dbname=app_production
analytics = host=replica1.db.internal port=5432 dbname=app_production
readonly = host=replica-pool.db.internal port=5432 dbname=app_production pool_mode=statement

[pgbouncer]
; Pooling mode:
;   session     - Connection assigned per session (safest, least efficient)
;   transaction - Connection assigned per transaction (balanced)
;   statement   - Connection assigned per statement (most efficient, most restrictions)
pool_mode = transaction

; Pool size limits
default_pool_size = 20       ; Connections per database user pair
min_pool_size = 5            ; Minimum kept open
reserve_pool_size = 5        ; Emergency connections when pool exhausted
reserve_pool_timeout = 3     ; Seconds before using reserve pool

; Connection limits
max_client_conn = 1000       ; Total client connections allowed
max_db_connections = 100     ; Maximum connections to actual database

; Connection behavior
server_reset_query = DISCARD ALL
server_reset_query_always = 0
server_check_query = SELECT 1
server_check_delay = 30

; Query behavior
query_timeout = 120          ; Kill queries running longer than 120s
query_wait_timeout = 30      ; Max time to wait for available connection
client_idle_timeout = 600    ; Disconnect idle clients after 10 minutes

; Authentication
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt

; Logging
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
stats_period = 60

[users]
; User-specific settings
admin = pool_mode=session            ; Admin connections not pooled
readonly_user = pool_mode=statement  ; Aggressive pooling for readonly

; Monitoring query for pool statistics
; SELECT * FROM pgbouncer.pools;
; SELECT * FROM pgbouncer.stats;
; SELECT * FROM pgbouncer.clients;
; SELECT * FROM pgbouncer.servers;
```
```sql
-- PgBouncer pool monitoring queries (connect to pgbouncer admin)

-- Pool status overview
SELECT
    database,
    pool_mode,
    user,
    cl_active AS client_active,
    cl_waiting AS client_waiting,
    sv_active AS server_active,
    sv_idle AS server_idle,
    sv_used AS server_used,
    maxwait AS max_wait_seconds,
    pool_size
FROM pgbouncer.pools;

-- Pool efficiency metrics
SELECT
    database,
    total_xact_count AS transactions,
    total_query_count AS queries,
    total_received AS bytes_received,
    total_sent AS bytes_sent,
    total_xact_time / 1000000.0 AS total_transaction_time_sec,
    total_query_time / 1000000.0 AS total_query_time_sec,
    ROUND(total_query_time::numeric / NULLIF(total_query_count, 0) / 1000, 2) AS avg_query_ms,
    avg_xact_count AS xact_per_second,
    avg_query_count AS qps
FROM pgbouncer.stats;

-- Client connection distribution
SELECT
    addr AS client_address,
    COUNT(*) AS connection_count,
    STRING_AGG(DISTINCT database, ', ') AS databases_used,
    STRING_AGG(DISTINCT state, ', ') AS connection_states
FROM pgbouncer.clients
GROUP BY addr
ORDER BY connection_count DESC
LIMIT 20;

-- Pool sizing validation
WITH pool_data AS (
    SELECT
        database,
        user,
        cl_active + cl_waiting AS total_clients,
        sv_active + sv_idle + sv_used AS total_servers,
        pool_size AS max_pool_size
    FROM pgbouncer.pools
)
SELECT
    database,
    user,
    total_clients,
    total_servers,
    max_pool_size,
    ROUND(100.0 * total_servers / NULLIF(max_pool_size, 0), 1) AS pool_utilization_pct,
    CASE
        WHEN (100.0 * total_servers / NULLIF(max_pool_size, 0)) > 90 THEN 'CRITICAL - Near pool limit'
        WHEN (100.0 * total_servers / NULLIF(max_pool_size, 0)) > 70 THEN 'WARNING - High utilization'
        ELSE 'OK'
    END AS status
FROM pool_data;
```

Use 'transaction' pool mode for most applications—it balances efficiency with compatibility. Use 'statement' mode only for stateless read-only workloads. Avoid 'session' mode unless you need session-level state like LISTEN/NOTIFY or prepared statements that can't be recreated.
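The right pool size is workload-dependent, but a widely cited community rule of thumb is roughly cores × 2 plus the number of effective disk spindles. A minimal sketch of that arithmetic, with the heuristic and hardware figures treated as assumptions to be validated against the utilization query above:

```python
def starting_pool_size(cpu_cores: int, effective_spindles: int = 1) -> int:
    """Rule-of-thumb starting point for database-side pool size; tune from measurements."""
    return cpu_cores * 2 + effective_spindles

# Illustrative 16-core server with SSD storage (counted as one effective spindle)
print(starting_pool_size(16))  # 33 -> a default_pool_size in the 20-30 range is plausible
```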
Selecting the appropriate scaling strategy depends on workload characteristics, operational capacity, and future growth trajectory. A decision framework helps navigate these choices systematically.
| Characteristic | Vertical | Read Replicas | Sharding |
|---|---|---|---|
| Read/Write Ratio | Any | Read-heavy (>80% reads) | Any |
| Data Size | <10TB typically | Any | >1TB typically |
| Write Volume | Any | Limited by primary | High writes |
| Query Complexity | Any | Cross-replica aggregation possible | Simple per-shard preferred |
| Operational Maturity | Any | Medium+ | Advanced |
| Consistency Requirements | Strong | Eventual acceptable | Strong per-shard |
| Geographic Distribution | Single location | Multi-region reads | Multi-region all ops |
| Cost Sensitivity | Premium hardware budget | Medium | Higher total cost |
Decision flowchart:
Begin with vertical scaling. Add read replicas when read load exceeds primary capacity. Consider sharding only when write throughput is the bottleneck and replicas don't help. Each layer of complexity should be justified by clear capacity requirements.
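The same flow can be written down as a tiny decision helper; the thresholds below (70% CPU, 80% reads) are illustrative assumptions derived from the table above, not hard rules.

```python
def recommend_scaling_strategy(read_pct: float,
                               primary_cpu_pct: float,
                               write_bound: bool,
                               at_max_hardware: bool) -> str:
    """Encode the decision flow: vertical first, then replicas, then sharding."""
    if not at_max_hardware and primary_cpu_pct < 70:
        return "Stay vertical: the current (or an upgraded) single server has headroom"
    if read_pct >= 80 and not write_bound:
        return "Add read replicas: offload the read-heavy portion of the workload"
    if write_bound and at_max_hardware:
        return "Consider sharding: writes exceed what any single primary can absorb"
    return "Upgrade vertically and re-measure before adding distributed complexity"

print(recommend_scaling_strategy(read_pct=90, primary_cpu_pct=85,
                                 write_bound=False, at_max_hardware=False))
```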
Moving between scaling strategies requires careful planning. Migrations involve downtime risk, data consistency concerns, and application changes. Understanding migration paths and techniques minimizes disruption.
```markdown
# Production Sharding Migration Checklist

## Phase 1: Assessment (Weeks 1-2)
- [ ] Identify shard key candidates
- [ ] Analyze query patterns for cross-shard impact
- [ ] Inventory all tables - determine sharding strategy per table
- [ ] Identify lookup/reference tables (replicated to all shards)
- [ ] Document foreign key relationships (most will break)
- [ ] Estimate total data movement volume and timeline
- [ ] Capacity plan: how many shards needed at launch? In 2 years?

## Phase 2: Application Preparation (Weeks 3-6)
- [ ] Abstract database layer to support multi-shard
- [ ] Implement shard router in application
- [ ] Add shard key to all write operations
- [ ] Modify reads to include shard key where possible
- [ ] Identify and rewrite cross-shard queries
- [ ] Implement scatter-gather for unavoidable cross-shard ops
- [ ] Add monitoring for shard distribution and cross-shard queries
- [ ] Load test with sharded configuration in staging

## Phase 3: Infrastructure Preparation (Weeks 5-7)
- [ ] Provision shard database servers
- [ ] Configure networking between app and all shards
- [ ] Set up connection pooling for multi-shard
- [ ] Configure monitoring and alerting per shard
- [ ] Prepare backup and recovery procedures per shard
- [ ] Document runbooks for common shard operations

## Phase 4: Data Migration (Weeks 7-9)
- [ ] Create empty schema on all shards
- [ ] Migrate reference/lookup tables (full copy to each shard)
- [ ] Begin dual-write phase: writes go to old DB and new shards
- [ ] Backfill historical data to shards (can be slow)
- [ ] Verify row counts and checksums between old and new
- [ ] Run validation queries comparing old DB to shard aggregate

## Phase 5: Cutover (Week 10)
- [ ] Schedule maintenance window if needed
- [ ] Verify all shards are caught up
- [ ] Switch application reads to shards
- [ ] Monitor for errors and latency
- [ ] Gradual traffic shift (if possible) vs. full cutover
- [ ] Disable writes to old database
- [ ] Final validation
- [ ] Declare migration complete

## Phase 6: Cleanup (Week 11+)
- [ ] Keep old database in read-only mode (safety net)
- [ ] Remove dual-write code paths after stabilization
- [ ] Archive and eventually decommission old database
- [ ] Document new architecture
- [ ] Update disaster recovery procedures
```

Moving to a sharded architecture is extremely difficult to reverse. The application changes, operational expertise, and data distribution become deeply embedded. Consider sharding a major commitment—invest heavily in getting it right initially.
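Phase 4's dual-write step deserves a sketch of its own, since it is where consistency problems usually surface. The outline below is one hedged approach: the connection objects are injected, and the "log and reconcile later" failure policy is an assumption; some teams instead fail the whole write or enqueue a retry.

```python
"""Dual-write wrapper (sketch) for the migration's backfill and cutover window."""
import logging

logger = logging.getLogger("dual_write")

class DualWriter:
    """Writes to the legacy database (source of truth) and mirrors to the owning shard."""

    def __init__(self, legacy_db, get_shard_connection):
        self.legacy_db = legacy_db                        # existing monolithic database
        self.get_shard_connection = get_shard_connection  # callable: shard_key -> connection

    def write(self, shard_key: str, query: str, params: tuple):
        # 1. The old database stays authoritative until cutover.
        result = self.legacy_db.execute(query, params)

        # 2. Best-effort mirror to the new shard; failures are logged, not raised,
        #    and reconciled later by the row-count/checksum verification in Phase 4.
        try:
            self.get_shard_connection(shard_key).execute(query, params)
        except Exception:
            logger.exception("dual-write to shard failed for key %s", shard_key)

        return result
```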
Database scaling strategies exist on a spectrum from simple (vertical) to complex (sharding). The right choice depends on current constraints, growth trajectory, and operational capabilities. Starting simple and adding complexity only when necessary minimizes operational burden while maintaining flexibility.
What's next:
With scaling strategies understood, the next consideration is cost. The next page covers Cost Optimization—techniques for minimizing database infrastructure expense while maintaining performance and reliability.
You now understand the spectrum of database scaling strategies and their trade-offs. This knowledge enables informed decisions about when and how to add capacity, balancing capability against complexity. Next, we'll explore optimizing the cost of database infrastructure.