For decades, database scaling faced an apparent dilemma: choose SQL with ACID guarantees but accept scaling limits, or choose NoSQL with horizontal scalability but sacrifice relational features. This trade-off shaped how entire generations of engineers thought about data architecture.
NewSQL databases challenge this dichotomy. They promise the familiar SQL interface and ACID transactions of traditional relational databases while delivering the horizontal scalability previously exclusive to NoSQL systems. Born from advances in distributed systems research, these databases represent a fundamental rethinking of how scalable data systems can work.
But promises are easy—understanding whether NewSQL fits your needs requires examining the engineering reality beneath the marketing.
By the end of this page, you will understand:

- What makes NewSQL databases architecturally different
- The major NewSQL options and their characteristics
- How to evaluate whether NewSQL is right for your workload
- Migration paths from traditional SQL databases
- The trade-offs and limitations of NewSQL systems
NewSQL databases share several architectural principles that distinguish them from both traditional SQL and NoSQL systems.
Traditional SQL databases handle transactions on a single node. Scaling writes beyond that node requires application-level sharding, with sagas or two-phase commit (2PC) providing cross-shard consistency.
NewSQL databases implement distributed transactions natively. The database itself coordinates multi-node transactions, abstracting this complexity from applications. Techniques like consensus-based replication, distributed MVCC, and synchronized timestamps (covered below) make this possible.
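To make that concrete, here is a minimal sketch of what this looks like from the application's side, assuming a CockroachDB cluster reached through the standard node-postgres (`pg`) driver; the `accounts` table and connection details are illustrative. The two rows may live on different nodes, yet the application writes ordinary SQL and the database coordinates the distributed commit.

```typescript
import { Pool } from 'pg';

// CockroachDB speaks the PostgreSQL wire protocol, so the standard driver works.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// Transfer funds between two accounts that may be stored on different nodes.
async function transfer(fromId: string, toId: string, amount: number): Promise<void> {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query(
      'UPDATE accounts SET balance = balance - $1 WHERE id = $2',
      [amount, fromId]
    );
    await client.query(
      'UPDATE accounts SET balance = balance + $1 WHERE id = $2',
      [amount, toId]
    );
    await client.query('COMMIT'); // distributed commit handled by the database
  } catch (err) {
    await client.query('ROLLBACK');
    throw err; // production code should retry serialization failures (SQLSTATE 40001)
  } finally {
    client.release();
  }
}
```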
Unlike application-level sharding, where you manage shard keys and routing yourself, NewSQL databases shard automatically. You define tables; the database splits them into ranges, distributes those ranges across nodes, and routes queries to the right place.
NewSQL databases support standard SQL (often PostgreSQL or MySQL wire-protocol compatible) plus extensions for distribution-aware features such as geo-partitioning, follower reads, and data-placement control.
| Characteristic | Traditional SQL | NoSQL | NewSQL |
|---|---|---|---|
| Data Model | Relational (tables, SQL) | Varied (document, key-value, etc.) | Relational (tables, SQL) |
| Transactions | ACID (single node) | Usually none or limited | Distributed ACID |
| Scaling | Vertical + manual sharding | Horizontal (native) | Horizontal (automatic) |
| Consistency | Strong (single node) | Eventually consistent (often) | Strong or tunable |
| Query Language | SQL | Varied (often none) | SQL |
| Schema | Fixed schema | Flexible/schemaless | Fixed schema |
| Operational Complexity | Lower (single node) | Variable | Higher (distributed ops) |
NewSQL doesn't violate the CAP theorem. Distributed ACID transactions have real costs: higher write latency, complex failure modes, and operational sophistication requirements. NewSQL offers different trade-offs, not the elimination of trade-offs.
Several technical innovations enable NewSQL capabilities:
Most NewSQL databases use the Raft consensus protocol (or variants like Multi-Raft) for replication: data is divided into ranges, each range forms its own Raft group with an elected leader, and a write commits once a majority of that range's replicas have durably logged it.
This provides strong consistency within each range without a single global coordinator bottleneck.
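As a rough illustration of the per-range commit rule (a simplification, not a full Raft implementation; the `Replica` interface below is assumed for the sketch), a range's leader considers a write replicated once a majority of the range's replicas have durably appended it:

```typescript
interface Replica {
  id: string;
  // Resolves true once the entry is durably appended to this replica's log.
  appendToLog(entry: Uint8Array): Promise<boolean>;
}

// Simplified quorum rule used per range (one Raft group per range).
// Real Raft also handles leader election, terms, and log repair after failures.
async function replicateWrite(replicas: Replica[], entry: Uint8Array): Promise<boolean> {
  const acks = await Promise.all(
    replicas.map((r) => r.appendToLog(entry).catch(() => false))
  );
  const ackCount = acks.filter(Boolean).length;
  const quorum = Math.floor(replicas.length / 2) + 1; // e.g., 2 out of 3
  return ackCount >= quorum; // the write commits only with a majority
}
```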
Ordering transactions across a distributed system requires some notion of time. Options:
Physical clocks (GPS, atomic): Google Spanner uses TrueTime with GPS/atomic clocks. Expensive but provides true global ordering.
Logical clocks: Lamport clocks provide ordering but not wall-clock correlation.
Hybrid Logical Clocks: Combine physical time (when available) with logical counters. Used by CockroachDB, YugabyteDB. Provides good-enough ordering without specialized hardware.
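The idea behind hybrid logical clocks fits in a short sketch. This is a simplified illustration of the algorithm (physical milliseconds plus a logical counter), not any particular database's implementation:

```typescript
interface HlcTimestamp {
  wallMs: number;   // physical component (milliseconds)
  logical: number;  // logical counter for events within the same millisecond
}

class HybridLogicalClock {
  private last: HlcTimestamp = { wallMs: 0, logical: 0 };

  // Called for local events and before sending a message.
  now(): HlcTimestamp {
    const wall = Date.now();
    if (wall > this.last.wallMs) {
      this.last = { wallMs: wall, logical: 0 };
    } else {
      this.last = { wallMs: this.last.wallMs, logical: this.last.logical + 1 };
    }
    return { ...this.last };
  }

  // Called when receiving a message carrying the sender's timestamp.
  update(remote: HlcTimestamp): HlcTimestamp {
    const wall = Date.now();
    const maxWall = Math.max(wall, this.last.wallMs, remote.wallMs);
    let logical: number;
    if (maxWall === this.last.wallMs && maxWall === remote.wallMs) {
      logical = Math.max(this.last.logical, remote.logical) + 1;
    } else if (maxWall === this.last.wallMs) {
      logical = this.last.logical + 1;
    } else if (maxWall === remote.wallMs) {
      logical = remote.logical + 1;
    } else {
      logical = 0;
    }
    this.last = { wallMs: maxWall, logical };
    return { ...this.last };
  }
}
```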
NewSQL databases use MVCC (multi-version concurrency control) to allow concurrent reads without blocking writes: each write creates a new timestamped version of a row, and readers see a consistent snapshot made up of the versions committed before their transaction's timestamp.
Distributed MVCC extends this across nodes, with timestamp coordination ensuring consistent snapshots across the cluster.
```
Distributed MVCC Transaction Flow
═══════════════════════════════════════════════════════════════

Transaction: Transfer $100 from Account A (Node 1) to Account B (Node 2)

1. BEGIN TRANSACTION
   └─ Coordinator assigns timestamp T1 using HLC

2. READ account_a WHERE id = 'A'
   └─ Node 1 returns latest version where version.ts < T1
   └─ Acquired read lock at T1

3. READ account_b WHERE id = 'B'
   └─ Node 2 returns latest version where version.ts < T1
   └─ Acquired read lock at T1

4. WRITE account_a SET balance = balance - 100
   └─ Node 1 creates new version at T1 (not yet visible)
   └─ Acquired write intent lock

5. WRITE account_b SET balance = balance + 100
   └─ Node 2 creates new version at T1 (not yet visible)
   └─ Acquired write intent lock

6. PREPARE COMMIT
   └─ Coordinator asks both nodes to prepare
   └─ Both confirm write intents are stable in Raft log

7. COMMIT at T1
   └─ Coordinator writes commit record to Raft
   └─ Both nodes mark versions at T1 as committed
   └─ Versions now visible to transactions with ts > T1

8. CLEANUP
   └─ Read locks released
   └─ Write intents converted to committed versions

Conflict Detection:
───────────────────
If concurrent transaction T2 tries to write Account A:
- T2 sees T1's write intent during its write attempt
- T2 either waits (if T1 might commit) or pushes T1's timestamp
- Serializability maintained via timestamp + lock-based conflict detection
```

As data grows, NewSQL databases automatically manage partition sizes: ranges split when they grow too large, merge when they shrink, and are rebalanced across nodes.
This eliminates the manual resharding operations required with application-level sharding.
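A toy sketch of the splitting behavior, with the size threshold, key types, and midpoint choice assumed for illustration: once a range grows past the limit, the database splits it at a middle key, after which the halves can be rebalanced to other nodes independently.

```typescript
interface KeyRange {
  startKey: string;
  endKey: string;
  sizeBytes: number;
}

const MAX_RANGE_BYTES = 512 * 1024 * 1024; // assumed threshold (512 MiB)

// Simplified: split an oversized range at a midpoint key chosen by the storage layer.
function maybeSplit(range: KeyRange, midKey: string): KeyRange[] {
  if (range.sizeBytes < MAX_RANGE_BYTES) {
    return [range]; // small enough, leave it alone
  }
  return [
    { startKey: range.startKey, endKey: midKey, sizeBytes: range.sizeBytes / 2 },
    { startKey: midKey, endKey: range.endKey, sizeBytes: range.sizeBytes / 2 },
  ];
}
```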
Several NewSQL databases have achieved production maturity. Each has distinct characteristics and optimal use cases.
Google Spanner is the original inspiration for modern NewSQL. Key features include TrueTime-based external consistency, synchronous multi-region replication, standard SQL, and a fully managed service on Google Cloud.
Trade-offs: Expensive, vendor lock-in to Google Cloud, requires careful schema design for performance.
Best for: Global applications needing strong consistency, organizations already on GCP, financial/regulated workloads.
CockroachDB is an open-source, Spanner-inspired database offering PostgreSQL wire-protocol compatibility, serializable isolation by default, automatic range-based sharding and rebalancing, and multi-region data placement.
Trade-offs: Higher latency than single-node PostgreSQL, complex operational model, requires distributed systems expertise.
Best for: Applications needing horizontal scale with PostgreSQL compatibility, multi-region deployments without Spanner lock-in.
| Database | Protocol | Consistency | Open Source | Cloud Managed |
|---|---|---|---|---|
| Google Spanner | gRPC/Spanner | External (TrueTime) | No | Yes (GCP only) |
| CockroachDB | PostgreSQL | Serializable | Yes (BSL) | Yes (multi-cloud) |
| TiDB | MySQL | Snapshot isolation | Yes (Apache 2.0) | Yes (TiDB Cloud) |
| YugabyteDB | PostgreSQL/Cassandra | Serializable | Yes (Apache 2.0) | Yes (multi-cloud) |
| PlanetScale | MySQL | Serializable (Vitess) | No (Vitess is) | Yes |
| CockroachDB (Serverless) | PostgreSQL | Serializable | No | Yes |
TiDB is a distributed SQL database with MySQL compatibility. It separates SQL processing (TiDB servers) from distributed storage (TiKV) and offers an optional columnar engine (TiFlash) for analytical queries.
Trade-offs: Snapshot isolation (not serializable by default), complex multi-component architecture.
Best for: MySQL-based applications needing scale, hybrid transactional-analytical workloads.
YugabyteDB is a distributed SQL database with both PostgreSQL-compatible (YSQL) and Cassandra-compatible (YCQL) APIs, layered over a shared, Raft-replicated storage engine.
Trade-offs: Younger ecosystem, some PostgreSQL compatibility gaps.
Best for: PostgreSQL applications needing scale, organizations wanting open-source with commercial support.
Being 'PostgreSQL compatible' or 'MySQL compatible' means supporting the wire protocol and most SQL syntax. It doesn't mean 100% compatibility. Stored procedures, extensions, specific functions, and advanced features may differ. Always test your specific application thoroughly.
NewSQL databases have different performance profiles than traditional SQL. Understanding these characteristics helps you evaluate fit:
Distributed transactions require consensus across nodes. A write that takes 1ms on PostgreSQL might take 5-20ms on a NewSQL database, depending on how many replicas must acknowledge the write, the network round trips required for consensus, and whether the transaction spans regions.
Impact: Applications sensitive to write latency may see measurable degradation. High-frequency trading, real-time gaming, and similar microsecond-sensitive workloads may not be suitable.
Reads can be faster than in traditional sharded systems: the database routes point reads directly to the node holding the relevant range, and latency-tolerant queries can be served from nearby replicas via follower reads.
Caveat: Distributed query plans for complex JOINs across ranges may add latency compared to single-node execution.
NewSQL shines for horizontal scaling: adding nodes increases storage capacity and throughput, and the database rebalances data onto new nodes automatically.
Limitation: Write scaling is limited by contention. If all transactions touch the same rows, adding nodes doesn't help—you're bottlenecked on coordination for those specific rows.
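A common mitigation is to spread a hot row across several shard rows and aggregate on read, so concurrent writes land on different rows (and usually different ranges). The sketch below assumes a hypothetical `page_view_counts` table keyed by `(page_id, shard)` and the node-postgres `Pool` from the earlier example:

```typescript
import { Pool } from 'pg';

const COUNTER_SHARDS = 16; // assumed shard count

// Increment one randomly chosen shard row instead of a single hot row.
async function incrementPageViews(db: Pool, pageId: string): Promise<void> {
  const shard = Math.floor(Math.random() * COUNTER_SHARDS);
  await db.query(
    `INSERT INTO page_view_counts (page_id, shard, views)
     VALUES ($1, $2, 1)
     ON CONFLICT (page_id, shard) DO UPDATE SET views = page_view_counts.views + 1`,
    [pageId, shard]
  );
}

// Reads pay a small cost: sum across all shard rows for the page.
async function getPageViews(db: Pool, pageId: string): Promise<number> {
  const res = await db.query(
    'SELECT COALESCE(SUM(views), 0) AS total FROM page_view_counts WHERE page_id = $1',
    [pageId]
  );
  return Number(res.rows[0].total);
}
```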
```typescript
/**
 * Performance comparison: PostgreSQL vs CockroachDB
 *
 * This benchmark illustrates typical latency differences.
 * Run on your actual workload for accurate comparisons.
 */

interface BenchmarkResult {
  operation: string;
  samples: number;
  p50Ms: number;
  p99Ms: number;
  throughput: number;
}

async function runLatencyBenchmark(
  db: DatabaseClient,
  iterations: number = 1000
): Promise<BenchmarkResult[]> {
  const results: BenchmarkResult[] = [];

  // Point reads (by primary key)
  const readLatencies: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await db.query('SELECT * FROM users WHERE id = $1', [randomId()]);
    readLatencies.push(performance.now() - start);
  }
  results.push({
    operation: 'Point Read',
    samples: iterations,
    p50Ms: percentile(readLatencies, 50),
    p99Ms: percentile(readLatencies, 99),
    throughput: iterations / (sum(readLatencies) / 1000),
  });

  // Single-row writes
  const writeLatencies: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await db.query(
      'INSERT INTO events (id, user_id, data) VALUES ($1, $2, $3)',
      [uuid(), randomId(), JSON.stringify({ test: true })]
    );
    writeLatencies.push(performance.now() - start);
  }
  results.push({
    operation: 'Single Insert',
    samples: iterations,
    p50Ms: percentile(writeLatencies, 50),
    p99Ms: percentile(writeLatencies, 99),
    throughput: iterations / (sum(writeLatencies) / 1000),
  });

  // Multi-statement transaction
  const txnLatencies: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await db.transaction(async (tx) => {
      const user = await tx.query(
        'SELECT * FROM users WHERE id = $1 FOR UPDATE',
        [randomId()]
      );
      await tx.query(
        'UPDATE users SET updated_at = NOW() WHERE id = $1',
        [user.rows[0].id]
      );
      await tx.query(
        'INSERT INTO audit_log (user_id, action) VALUES ($1, $2)',
        [user.rows[0].id, 'benchmark_update']
      );
    });
    txnLatencies.push(performance.now() - start);
  }
  results.push({
    operation: 'RMW Transaction',
    samples: iterations,
    p50Ms: percentile(txnLatencies, 50),
    p99Ms: percentile(txnLatencies, 99),
    throughput: iterations / (sum(txnLatencies) / 1000),
  });

  return results;
}

/*
 * Example results (illustrative, not benchmarks):
 *
 * PostgreSQL (single node):
 * ┌──────────────────┬─────────┬─────────┬────────────┐
 * │ Operation        │ p50 ms  │ p99 ms  │ throughput │
 * ├──────────────────┼─────────┼─────────┼────────────┤
 * │ Point Read       │ 0.5     │ 2.0     │ 5000/s     │
 * │ Single Insert    │ 1.0     │ 5.0     │ 2000/s     │
 * │ RMW Transaction  │ 2.0     │ 10.0    │ 800/s      │
 * └──────────────────┴─────────┴─────────┴────────────┘
 *
 * CockroachDB (3-node, same region):
 * ┌──────────────────┬─────────┬─────────┬────────────┐
 * │ Operation        │ p50 ms  │ p99 ms  │ throughput │
 * ├──────────────────┼─────────┼─────────┼────────────┤
 * │ Point Read       │ 1.0     │ 5.0     │ 3000/s     │
 * │ Single Insert    │ 5.0     │ 20.0    │ 500/s      │
 * │ RMW Transaction  │ 10.0    │ 50.0    │ 200/s      │
 * └──────────────────┴─────────┴─────────┴────────────┘
 *
 * BUT: CockroachDB scales to 10x throughput by adding nodes.
 * PostgreSQL requires application-level sharding.
 */
```

NewSQL isn't always the right choice. Use this framework to evaluate fit:
NewSQL databases typically cost more than traditional SQL:
Infrastructure: Minimum 3 nodes for fault tolerance (vs. 1 for PostgreSQL). Each node needs substantial resources.
Managed service pricing: CockroachDB Cloud, Spanner, and TiDB Cloud charge a premium for the distributed coordination they manage.
Operational expertise: Running distributed databases requires skills in distributed systems, performance tuning, and complex failure scenarios.
Rough estimate: Expect 2-5× the cost of a comparable PostgreSQL deployment for small to medium workloads. The gap narrows at very large scale where manual sharding costs become significant.
Many teams adopt NewSQL before they need it, paying the complexity and cost tax years before deriving benefits. If your data fits on a single PostgreSQL instance with read replicas, that's probably the right architecture today. Migrate to NewSQL when you're 12-18 months from hitting single-node limits.
Migrating from traditional SQL to NewSQL requires careful planning:
Before migrating, audit your application for compatibility:
SQL syntax: Most standard SQL works, but check for stored procedures, triggers, vendor-specific functions, and other advanced features that may be missing or behave differently.
Schema features: Auto-increment/SERIAL columns, sequences, and partitioning clauses often need rework to fit distributed key design (see the schema example below).
Extensions: PostgreSQL extensions (PostGIS, pg_cron, and the like) and MySQL plugins may not be available; verify every one your application depends on.
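A small script can take a first pass at the extension audit; this sketch assumes a PostgreSQL source and simply lists installed extensions from the standard `pg_extension` catalog so each one can be checked against the target database's documentation.

```typescript
import { Pool } from 'pg';

// List installed extensions on the source database (PostgreSQL only).
async function listExtensions(connectionString: string): Promise<void> {
  const pool = new Pool({ connectionString });
  try {
    const res = await pool.query(
      'SELECT extname, extversion FROM pg_extension ORDER BY extname'
    );
    for (const row of res.rows) {
      console.log(`${row.extname} ${row.extversion}`);
    }
  } finally {
    await pool.end();
  }
}

// Usage (hypothetical connection string):
// listExtensions('postgres://user:pass@source-db:5432/app');
```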
NewSQL databases have different optimal schema patterns:
```sql
-- CockroachDB schema design considerations

-- 1. Use UUIDs for primary keys (distribute better than sequences)
CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  email STRING NOT NULL,
  name STRING,
  created_at TIMESTAMPTZ DEFAULT now()
);

-- 2. Avoid auto-increment IDs (create hotspots on a single range)
-- BAD: SERIAL or IDENTITY columns
-- GOOD: UUIDs or hashed IDs

-- 3. Consider hash-sharded indexes for high-contention columns
CREATE TABLE events (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID,
  event_time TIMESTAMPTZ DEFAULT now(),
  data JSONB,
  -- Hash-sharded index prevents range hotspots
  INDEX events_by_time (event_time) USING HASH WITH (bucket_count = 16)
);

-- 4. Use REGIONAL BY ROW for geo-partitioning
ALTER DATABASE my_app SET PRIMARY REGION = 'us-east1';
ALTER DATABASE my_app ADD REGION 'eu-west1';
ALTER DATABASE my_app ADD REGION 'ap-southeast1';

CREATE TABLE user_data (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID,
  region crdb_internal_region NOT NULL,
  data JSONB
) LOCALITY REGIONAL BY ROW;

-- 5. Design transactions to be single-region when possible
-- Cross-region transactions have high latency

-- 6. Use follower reads for read-heavy, latency-tolerant queries
SET enable_follower_reads = on;
SELECT * FROM users AS OF SYSTEM TIME follower_read_timestamp()
  WHERE id = $1;
```

Big Bang Migration (high risk, faster): take a maintenance window, copy all data to the new database, and cut every client over at once.
Gradual Migration (lower risk, longer): move one service, table group, or tenant at a time, running both databases side by side until the migration is complete.
Shadow Mode (safest, resource-intensive): mirror production writes to the new database and compare results against the existing one before shifting any real traffic, as sketched below.
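A minimal sketch of the shadow-mode idea, with the connection strings and comparison logic assumed for illustration: the existing database stays authoritative, writes are mirrored asynchronously to the NewSQL candidate, and mismatches are logged rather than surfaced to users.

```typescript
import { Pool } from 'pg';

// Assumed: both databases speak the PostgreSQL wire protocol.
const primary = new Pool({ connectionString: process.env.PRIMARY_URL });
const shadow = new Pool({ connectionString: process.env.SHADOW_URL });

// Writes: the primary is authoritative; the shadow copy is best-effort.
async function shadowWrite(sql: string, params: any[]): Promise<void> {
  await primary.query(sql, params); // a failure here fails the request
  shadow.query(sql, params).catch((err) => {
    console.warn('shadow write failed', { sql, err }); // never affects users
  });
}

// Reads: serve from the primary, compare against the shadow asynchronously.
async function shadowRead(sql: string, params: any[]): Promise<any[]> {
  const result = await primary.query(sql, params);
  shadow
    .query(sql, params)
    .then((shadowResult) => {
      if (JSON.stringify(shadowResult.rows) !== JSON.stringify(result.rows)) {
        console.warn('shadow mismatch', { sql });
      }
    })
    .catch((err) => console.warn('shadow read failed', { sql, err }));
  return result.rows;
}
```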
Don't migrate your most critical production database first. Start with a secondary workload—internal tools, analytics, or a new feature. Build operational experience before migrating revenue-critical systems.
Operating NewSQL databases requires different skills and processes than traditional databases:
NewSQL performance tuning involves monitoring range distribution and hotspots, choosing primary keys that spread writes evenly, minimizing cross-range and cross-region transactions, and watching for transaction retries caused by contention.
Distributed databases have more complex failure modes: node failures trigger leader (or leaseholder) elections, network partitions can leave ranges without a quorum, and clock skew can affect transaction ordering. Monitoring, alerting, and runbooks need to cover these scenarios.
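Applications also need to treat transaction retries as routine rather than exceptional. A minimal sketch, assuming a PostgreSQL-compatible driver that surfaces SQLSTATE codes, re-runs the transaction body when the database reports a serialization failure (code 40001):

```typescript
import { Pool, PoolClient } from 'pg';

// Retry a transaction body when the database asks for a retry (SQLSTATE 40001).
async function withRetries<T>(
  pool: Pool,
  body: (client: PoolClient) => Promise<T>,
  maxAttempts = 5
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    const client = await pool.connect();
    try {
      await client.query('BEGIN');
      const result = await body(client);
      await client.query('COMMIT');
      return result;
    } catch (err: any) {
      await client.query('ROLLBACK');
      if (err?.code === '40001' && attempt < maxAttempts) {
        continue; // serialization conflict: safe to retry the whole transaction
      }
      throw err;
    } finally {
      client.release();
    }
  }
}
```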
Unless you have strong platform engineering capabilities, consider managed NewSQL services. CockroachDB Cloud, TiDB Cloud, and Spanner eliminate much of the operational burden. The premium pricing often costs less than building equivalent internal expertise.
Let's consolidate the key insights from our exploration of NewSQL databases:
We've explored the complete landscape of SQL database scaling:
Vertical scaling — Exhaust single-machine capacity first. It's simple and often sufficient longer than expected.
Read replicas — Scale reads horizontally. Handle consistency trade-offs with intelligent routing.
Functional partitioning — Divide by domain to scale both reads and writes without sharding complexity.
Application-level sharding — The ultimate scaling lever, but with significant engineering cost.
NewSQL — Let the database handle distribution. Accept latency trade-offs for operational simplicity.
The right strategy depends on your specific workload, team capabilities, and growth trajectory. Most successful scaling journeys progress through this sequence incrementally, adding complexity only when simpler options are exhausted.
You've completed the SQL Database Scaling Patterns module! You now understand the full spectrum of techniques for scaling SQL databases—from vertical optimization through distributed NewSQL systems. These patterns form the foundation for architecting data-intensive applications that can grow from startup to global scale.