NewSQL Databases

Level: Advanced

Duration: 60 mins

Topic: Modern Database Topics

NewSQL Concept

The Best of Both Worlds

For decades, database engineers faced a seemingly impossible choice: ACID transactions and SQL with traditional relational databases, or horizontal scalability with NoSQL systems. This dichotomy—often called the 'SQL vs NoSQL' divide—forced architects to sacrifice either consistency guarantees or the ability to scale across multiple machines.

Then came NewSQL databases.

NewSQL represents a paradigm shift in database engineering. These systems promise something that was previously considered architecturally impossible: the full transactional guarantees of traditional relational databases combined with the elastic scalability of NoSQL distributed systems. This isn't mere marketing—it's a genuine technological breakthrough built on decades of distributed systems research, novel consensus algorithms, and innovative storage architectures.

What You Will Learn

By the end of this page, you will understand what NewSQL databases are, why they emerged, how they differ from both traditional SQL and NoSQL systems, and the fundamental architectural innovations that make them possible. You'll gain the conceptual foundation necessary to evaluate NewSQL solutions for real-world distributed database requirements.

Historical Context: Why NewSQL Emerged

To truly understand NewSQL, we must first understand the technological forces and limitations that created the demand for a new database category. The emergence of NewSQL wasn't accidental—it was an inevitable response to the shortcomings of existing solutions.

The Traditional SQL Era (1970s-2000s)

For decades, relational database management systems (RDBMS) like Oracle, PostgreSQL, MySQL, and SQL Server dominated enterprise data management. These systems offered:

ACID transactions: Atomicity, Consistency, Isolation, Durability guarantees that ensured data integrity
SQL: A powerful, declarative query language with rich semantics
Relational model: A mathematically rigorous foundation for data organization
Mature tooling: Decades of optimization, monitoring, and administration tools

However, traditional RDBMS were designed for single-node operation. While they could scale vertically (adding more CPU, RAM, and faster disks to a single machine), they struggled with horizontal scaling (distributing data across multiple machines).

The Vertical Scaling Wall

By the mid-2000s, web-scale companies like Google, Amazon, and Facebook were hitting the physical and economic limits of vertical scaling. A single machine—no matter how powerful—couldn't handle billions of users, petabytes of data, and millions of concurrent requests. The largest possible single server simply wasn't enough.

The NoSQL Revolution (2005-2015)

Faced with scaling limitations, internet-scale companies developed new database architectures that prioritized horizontal scalability:

Google Bigtable (2006): Column-family storage for web indexing
Amazon Dynamo (2007): Key-value store for shopping cart availability
MongoDB (2009): Document database with flexible schemas
Apache Cassandra (2008): Wide-column distributed database

These NoSQL systems achieved massive scale by relaxing consistency guarantees. They embraced the CAP theorem's trade-offs, often choosing availability and partition tolerance over strong consistency.

NoSQL databases typically offered:

Horizontal scalability across commodity hardware
Schema flexibility (no rigid table structures)
Eventual consistency (data would eventually converge, but reads might return stale values)
High availability through replication

The Pre-NewSQL Database Landscape
Characteristic	Traditional SQL	NoSQL
Scalability	Vertical (scale-up)	Horizontal (scale-out)
Consistency Model	Strong ACID	Eventual / tunable
Query Language	SQL (declarative, powerful)	API-based (limited, varies by system)
Schema	Rigid, predefined	Flexible, schema-less
Transactions	Full ACID, multi-statement	Single-key or limited
Joins	Native, optimized	Application-level or unsupported
Use Cases	Enterprise, financial, ERP	Web-scale, analytics, caching

The Gap That Created NewSQL

NoSQL solved the scalability problem but created new challenges:

Loss of ACID semantics: Applications had to handle consistency at the application layer—error-prone and complex
No standard query language: Each NoSQL system had its own API, increasing learning curves and lock-in
Limited transaction support: Operations spanning multiple records required custom, fragile workarounds
Join complexity: Denormalization or application-level joins increased data redundancy and complexity

Meanwhile, many applications—particularly in finance, healthcare, and enterprise systems—required strong consistency and couldn't tolerate the trade-offs NoSQL demanded.

The question became: Is it possible to have both ACID guarantees AND horizontal scalability?

Defining NewSQL: What It Actually Means

The term NewSQL was coined by 451 Research analyst Matthew Aslett in 2011 to describe a new class of database systems that provide the scalability of NoSQL while maintaining the ACID guarantees and SQL interface of traditional relational databases.

Formal Definition:

NewSQL is a class of relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system.

This definition encompasses three essential characteristics:

NewSQL Core Requirements

•SQL Interface: Full SQL support with standard relational semantics, including complex joins, aggregations, and subqueries. Applications can use existing SQL knowledge and tools.
•ACID Transactions: Complete transactional guarantees—atomicity, consistency, isolation, and durability—even for distributed transactions spanning multiple nodes and partitions.
•Horizontal Scalability: Ability to scale out across multiple commodity servers, adding capacity by simply adding more nodes rather than upgrading hardware.

NewSQL vs 'Distributed SQL'

The terms 'NewSQL' and 'Distributed SQL' are often used interchangeably. 'Distributed SQL' is sometimes preferred because it emphasizes the core architectural innovation: distributing a SQL database across multiple nodes while maintaining correctness. Both terms refer to the same category of systems.

What NewSQL Is NOT

To understand NewSQL precisely, it's equally important to clarify what it's not:

Not just 'SQL on NoSQL': Some systems layer SQL interfaces over NoSQL storage. True NewSQL systems are designed from the ground up for distributed ACID transactions, not retrofitted.
Not extended traditional RDBMS: Products like MySQL Cluster or PostgreSQL with Citus add distribution capabilities but often with limitations on cross-shard transactions or consistency guarantees.
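Not sacrificing consistency for scale: Unlike most NoSQL systems, NewSQL systems provide strong consistency and strict isolation for distributed operations. The exact default varies by system: CockroachDB defaults to serializable isolation, while TiDB defaults to snapshot isolation.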
Not limited to specific workloads: While optimized for OLTP, NewSQL systems support a broad range of SQL operations, not just simple key-value lookups.

Distinguishing NewSQL from Adjacent Technologies
Technology	SQL Support	ACID Distributed Txns	Horizontal Scale	NewSQL?
PostgreSQL (single-node)	✓ Full	✓ Single-node only	✗ Vertical only	No
MySQL + Read Replicas	✓ Full	✓ Single master only	✓ Reads yes, writes no	No
MongoDB	Partial (MQL)	✓ Multi-doc possible	✓ Yes	No (not relational)
Google Spanner	✓ Full	✓ Global, distributed	✓ Yes	Yes
CockroachDB	✓ PostgreSQL-compatible	✓ Serializable distributed	✓ Yes	Yes
TiDB	✓ MySQL-compatible	✓ Distributed ACID	✓ Yes	Yes
Cassandra + Stargate	Partial (CQL)	✗ No real ACID	✓ Yes	No

The Enabling Breakthroughs

NewSQL wasn't possible until several key technological and theoretical breakthroughs matured. Understanding these foundations reveals why NewSQL emerged when it did—not earlier, not later.

1. Consensus Algorithms: Paxos and Raft

Distributed consensus—getting multiple machines to agree on a value even when some machines fail—is the foundation of distributed transactions. The Paxos algorithm (Leslie Lamport, 1989/1998) proved consensus was possible but was notoriously difficult to implement correctly.

The Raft consensus algorithm (Diego Ongaro and John Ousterhout, 2014) provided a more understandable alternative, accelerating the development of correct distributed systems. NewSQL databases use these algorithms to:

Elect leaders for data partitions
Replicate transaction logs across nodes
Ensure all replicas agree on the order of transactions

raft_simplified.pseudocode
// Simplified Raft Consensus Flow
// Each partition has a leader elected via Raft
 
LEADER_ELECTION:
    // If no heartbeat from leader within timeout
    candidate.term++
    candidate.requestVote(term, lastLogIndex, lastLogTerm)
    
    // If majority of votes received
    candidate.becomeLeader()
 
LOG_REPLICATION:
    // Leader receives write request
    leader.appendToLog(entry)
    
    // Replicate to followers
    for each follower:
        leader.sendAppendEntries(entries, commitIndex)
    
    // Once majority acknowledges
    leader.commitEntry(entry)
    leader.respondToClient(success)
 
SAFETY_GUARANTEES:
    // Only a candidate whose log contains all committed entries can win an election
    // Commits are never lost once acknowledged
    // All nodes eventually have identical logs
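
The log-replication step above can be sketched in Python. This is a minimal single-process illustration, not a Raft implementation: the `Follower` and `Leader` classes and the `reachable` flag are hypothetical names, and terms, elections, and real networking are left out. It shows only the rule that an entry commits once a majority of the cluster has stored it.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """A replica that accepts entries only while reachable (hypothetical model)."""
    reachable: bool = True
    log: list = field(default_factory=list)

    def append_entries(self, entry) -> bool:
        if not self.reachable:
            return False          # simulates a lost message or a crashed node
        self.log.append(entry)
        return True


@dataclass
class Leader:
    followers: list
    log: list = field(default_factory=list)
    commit_index: int = 0         # number of committed entries

    def replicate(self, entry) -> bool:
        """Append locally, replicate, and commit only on majority acknowledgement."""
        self.log.append(entry)
        acks = 1                  # the leader's own copy counts as one acknowledgement
        for follower in self.followers:
            if follower.append_entries(entry):
                acks += 1
        cluster_size = len(self.followers) + 1
        if acks > cluster_size // 2:
            self.commit_index = len(self.log)
            return True
        return False              # entry stays in the log but is not committed


leader = Leader(followers=[Follower(), Follower(reachable=False)])
ok = leader.replicate("SET x = 1")       # 2 of 3 replicas hold the entry

leader.followers[0].reachable = False
stalled = leader.replicate("SET x = 2")  # only 1 of 3 replicas holds the entry
```

With one follower down, the first write still commits because two of three replicas hold it. Once both followers are unreachable, the second write cannot commit, which is the behavior that keeps a leader without a majority from acknowledging writes.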

2. Multi-Version Concurrency Control (MVCC)

MVCC allows readers to access consistent snapshots of data without blocking writers, and vice versa. This technique, refined over decades in traditional databases, became essential for NewSQL:

Each transaction sees a consistent snapshot at its start time
Writes create new versions rather than overwriting data
Garbage collection removes old versions when no longer needed
Provides the foundation for snapshot isolation and, with additional conflict detection, serializable isolation without excessive locking

NewSQL systems extend MVCC to work across distributed nodes, maintaining consistent snapshots even when data spans multiple machines.
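
The versioning idea can be illustrated with a toy single-node store. This is a sketch under assumptions: `MVCCStore` and its methods are hypothetical names, timestamps come from a simple counter, and every write is treated as committed immediately. Real systems attach versions to transactions and distribute the timestamps.

```python
from collections import defaultdict


class MVCCStore:
    """Toy multi-version store: every write adds a (commit_ts, value) version."""

    def __init__(self):
        self._versions = defaultdict(list)   # key -> [(commit_ts, value), ...]
        self._clock = 0                      # logical timestamp counter

    def write(self, key, value) -> int:
        self._clock += 1
        self._versions[key].append((self._clock, value))
        return self._clock

    def snapshot(self) -> int:
        """Return the timestamp a new reader should use for all of its reads."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        for ts, value in reversed(self._versions[key]):
            if ts <= snapshot_ts:
                return value
        return None

    def gc(self, oldest_active_ts):
        """Drop versions that no snapshot at or after oldest_active_ts can see."""
        for key, versions in self._versions.items():
            keep_from = 0
            for i, (ts, _) in enumerate(versions):
                if ts <= oldest_active_ts:
                    keep_from = i            # newest version visible to the oldest reader
            self._versions[key] = versions[keep_from:]


store = MVCCStore()
store.write("balance", 100)
snap = store.snapshot()                       # reader starts here
store.write("balance", 250)                   # later writer does not block the reader

old = store.read("balance", snap)             # reader still sees its snapshot
new = store.read("balance", store.snapshot())

store.gc(oldest_active_ts=store.snapshot())   # assumes the old reader has finished
```

The reader that took its snapshot before the second write keeps seeing 100, while a fresh snapshot sees 250. Garbage collection is safe only once no active reader needs the older version.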

3. Hybrid Logical Clocks and TrueTime

To order events correctly across distributed nodes, NewSQL systems need accurate, synchronized timestamps. Two approaches emerged:

Hybrid Logical Clocks (HLC): Combine physical timestamps with logical counters to ensure causal ordering without requiring perfect clock synchronization. Used by CockroachDB.
TrueTime: Google's solution using GPS receivers and atomic clocks to provide globally synchronized time with bounded uncertainty (reported as roughly 1 to 7 ms in the original Spanner paper). Used by Google Spanner.

These timing mechanisms enable consistent transaction ordering across geographically distributed datacenters.

Why Timing Matters in Distributed Databases

In a single-node database, the CPU clock orders all operations. In a distributed system, there's no single clock. If Node A commits a transaction at '12:00:00.001' and Node B commits at '12:00:00.002', which happened first? Clock drift and network delays make this ambiguous. HLC and TrueTime provide the ordering guarantees needed for correct serializable transactions.
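
A hybrid logical clock can be sketched in a few lines. The class below follows the commonly published HLC update rules; the name `HybridLogicalClock` and the injected `physical_clock` callable are assumptions made for illustration, and this is not taken from any particular database's code.

```python
class HybridLogicalClock:
    """Timestamps are (l, c): l tracks the largest physical time seen, c breaks ties."""

    def __init__(self, physical_clock):
        self._now = physical_clock   # callable returning an integer physical time
        self.l = 0
        self.c = 0

    def tick(self):
        """Timestamp a local event or an outgoing message."""
        prev = self.l
        self.l = max(prev, self._now())
        self.c = self.c + 1 if self.l == prev else 0
        return (self.l, self.c)

    def receive(self, remote):
        """Merge a timestamp received from another node."""
        remote_l, remote_c = remote
        prev = self.l
        self.l = max(prev, remote_l, self._now())
        if self.l == prev and self.l == remote_l:
            self.c = max(self.c, remote_c) + 1
        elif self.l == prev:
            self.c += 1
        elif self.l == remote_l:
            self.c = remote_c + 1
        else:
            self.c = 0
        return (self.l, self.c)


# Node B's physical clock lags node A's by 10 units.
node_a = HybridLogicalClock(lambda: 100)
node_b = HybridLogicalClock(lambda: 90)

t1 = node_a.tick()          # event on A
t2 = node_b.receive(t1)     # B learns about A's event
t3 = node_b.tick()          # later event on B
```

Even though B's physical clock is behind, its timestamps after receiving A's message compare greater than A's, so causally related events keep their order when timestamps are compared as tuples.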

4. Range-Based Sharding with Automatic Rebalancing

Many NewSQL databases partition data into ranges (contiguous key spans) rather than relying only on fixed hash-based sharding. Range-based partitioning enables:

Efficient range scans for analytics and reporting
Locality for related data (e.g., all orders for a customer)
Automatic splitting when ranges grow too large
Automatic rebalancing when nodes are added or removed

This dynamic partitioning, combined with Raft-based replication, allows NewSQL databases to automatically adjust to changing data volumes and cluster sizes.
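
Range lookup and automatic splitting can be sketched as follows. `RangeMap`, `max_keys_per_range`, and the split-at-midpoint policy are hypothetical choices for illustration; production systems usually split on size in bytes or on load, and they also move ranges between nodes. The sketch assumes keys are unique.

```python
import bisect


class RangeMap:
    """Maps keys to ranges defined by sorted split points (illustrative model)."""

    def __init__(self, max_keys_per_range=4):
        self.split_points = []        # split_points[i] is the first key of range i + 1
        self.ranges = [[]]            # ranges[i] holds the sorted keys of range i
        self.max_keys = max_keys_per_range

    def _index_for(self, key):
        # The number of split points <= key is the index of the owning range.
        return bisect.bisect_right(self.split_points, key)

    def insert(self, key):
        i = self._index_for(key)
        bisect.insort(self.ranges[i], key)
        if len(self.ranges[i]) > self.max_keys:
            self._split(i)

    def _split(self, i):
        keys = self.ranges[i]
        mid = len(keys) // 2
        left, right = keys[:mid], keys[mid:]
        self.ranges[i:i + 1] = [left, right]
        self.split_points.insert(i, right[0])   # the right half starts at its first key

    def scan(self, lo, hi):
        """Return keys in [lo, hi], visiting only the ranges that overlap the interval."""
        out = []
        for i in range(self._index_for(lo), self._index_for(hi) + 1):
            out.extend(k for k in self.ranges[i] if lo <= k <= hi)
        return out


rmap = RangeMap(max_keys_per_range=4)
for key in (10, 20, 30, 40, 50, 60, 70):
    rmap.insert(key)

result = rmap.scan(20, 40)
```

After seven inserts the single initial range has split twice, and the scan for keys 20 through 40 visits only the two ranges that overlap that interval.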

Key NewSQL Enabling Technologies
Technology	Problem Solved	Example Implementation
Raft/Paxos Consensus	Agreement across distributed nodes	Leader election, log replication
Distributed MVCC	Concurrent reads/writes without blocking	Snapshot isolation across nodes
HLC/TrueTime	Global transaction ordering	Causally consistent timestamps
Range-based Sharding	Efficient distribution with locality	Auto-splitting, rebalancing
Serializable Snapshot Isolation	Strong isolation without full locking	Read-write (anti-dependency) conflict detection

NewSQL Architecture Overview

While specific implementations vary, NewSQL databases share common architectural patterns that enable their distinctive capabilities. Understanding these patterns provides insight into how NewSQL achieves the 'impossible' combination of scale and consistency.

Layered Architecture

Most NewSQL systems use a layered architecture separating concerns:

NewSQL Architectural Layers

•SQL Layer: Parses SQL, generates query plans, and handles client connections. May run on dedicated gateway nodes or be embedded in storage nodes.
•Transaction Layer: Manages distributed transactions, coordinates writes across partitions, implements 2PC (Two-Phase Commit), and ensures ACID semantics.
•Consensus Layer: Uses Raft or Paxos to replicate data within each partition. Handles leader election and log replication for durability and availability.
•Storage Layer: Manages persistent storage, typically using Log-Structured Merge trees (LSM-trees) or similar structures optimized for write-heavy distributed workloads.

Data Distribution Model

NewSQL databases partition the keyspace into ranges (also called regions, tablets, or shards). Each range:

Contains a contiguous span of the keyspace (e.g., keys 'a' through 'm')
Is replicated across multiple nodes using Raft consensus (typically 3 or 5 replicas)
Has one leader that handles all writes and strongly consistent reads, with followers providing redundancy and, optionally, slightly stale reads
Can be split automatically when it grows too large
Can be moved between nodes for load balancing

This design means:

Writes go to the range leader, are replicated via Raft, and committed when a majority acknowledges
Reads can go to the leader (for linearizable reads) or followers (for slightly stale but load-balanced reads)
Distributed transactions touching multiple ranges use two-phase commit coordinated across range leaders
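
The two-phase commit step for multi-range transactions can be sketched like this. `RangeLeader`, `two_phase_commit`, and the `healthy` flag are hypothetical names. The sketch leaves out coordinator failure, timeouts, and the Raft replication that a real system performs underneath each participant.

```python
class RangeLeader:
    """Hypothetical participant: stages writes in prepare, applies them on commit."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}
        self.staged = {}

    def prepare(self, txn_id, writes) -> bool:
        if not self.healthy:
            return False              # vote "no"
        self.staged[txn_id] = dict(writes)
        return True                   # vote "yes"

    def commit(self, txn_id):
        self.data.update(self.staged.pop(txn_id))

    def abort(self, txn_id):
        self.staged.pop(txn_id, None)


def two_phase_commit(txn_id, writes_by_leader) -> bool:
    """writes_by_leader maps each RangeLeader to the writes it owns."""
    prepared = []
    # Phase 1: ask every participant to prepare.
    for leader, writes in writes_by_leader.items():
        if leader.prepare(txn_id, writes):
            prepared.append(leader)
        else:
            # Any "no" vote aborts the whole transaction.
            for participant in prepared:
                participant.abort(txn_id)
            return False
    # Phase 2: every participant voted yes, so commit everywhere.
    for leader in prepared:
        leader.commit(txn_id)
    return True


range_a_m = RangeLeader("a-m")
range_n_z = RangeLeader("n-z")

committed = two_phase_commit("t1", {range_a_m: {"alice": 50}, range_n_z: {"zoe": 150}})

range_n_z.healthy = False
aborted = two_phase_commit("t2", {range_a_m: {"alice": 0}, range_n_z: {"zoe": 200}})
```

The first transaction commits on both ranges. The second aborts on both because one participant votes no, which leaves the earlier values untouched and is what makes the transaction atomic across ranges.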

Why Ranges Instead of Hash Partitions?

Hash partitioning (common in NoSQL) distributes data evenly but destroys key ordering. Range partitioning preserves ordering, enabling efficient range scans ('SELECT * FROM orders WHERE id BETWEEN 100 AND 200') and better locality for related data. Automatic range splitting and rebalancing mitigate the hotspots that range partitions can develop, although monotonically increasing keys can still concentrate writes on a single range.
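
The difference shows up when counting how many partitions a range scan must visit. In this sketch the four shards, the `RANGE_STARTS` boundaries, and the use of modulo as a stand-in hash are all assumptions chosen for illustration.

```python
import bisect

NUM_SHARDS = 4
RANGE_STARTS = [0, 250, 500, 750]   # hypothetical range boundaries for ids 0..999


def hash_shard(key: int) -> int:
    # Stand-in for a real hash function: modulo scatters adjacent ids across shards.
    return key % NUM_SHARDS


def range_shard(key: int) -> int:
    # Index of the last range whose start is <= key.
    return bisect.bisect_right(RANGE_STARTS, key) - 1


ids = range(100, 201)               # ids matched by BETWEEN 100 AND 200
hash_shards_touched = {hash_shard(k) for k in ids}
range_shards_touched = {range_shard(k) for k in ids}
```

Under hash partitioning the scan touches all four shards, while under range partitioning every matching id lives in the first range.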

The CAP Theorem Perspective

A common misconception is that NewSQL 'solves' the CAP theorem. This is incorrect: the CAP theorem is a proven impossibility result, not an engineering limitation to be overcome. However, NewSQL makes different trade-offs than traditional NoSQL systems.

CAP Theorem Recap

The CAP theorem states that a distributed data store can provide at most two of three guarantees:

Consistency: Every read receives the most recent write
Availability: Every request to a non-failing node receives a non-error response, though not necessarily the most recent data
Partition tolerance: The system continues operating despite network partitions

Since network partitions are inevitable in distributed systems, the real choice is between CP (consistency + partition tolerance) and AP (availability + partition tolerance) during a partition event.

NoSQL: AP Systems

•Prioritize availability over consistency
•Continue accepting reads and writes during partitions
•May return stale data
•Eventual consistency model
•Examples: Cassandra, DynamoDB (default)

NewSQL: CP Systems

•Prioritize consistency over availability
•May reject operations if quorum isn't available
•Guaranteed fresh data on successful reads
•Serializable isolation guarantees
•Examples: Spanner, CockroachDB, TiDB

NewSQL's Pragmatic Approach

NewSQL systems acknowledge that network partitions are rare and brief in modern data centers. They optimize for the common case (no partition) while maintaining correctness in the failure case (partition occurs):

During normal operation: Full ACID transactions, serializable isolation, high throughput
During partition: Operations involving the minority partition become unavailable; majority partition continues normally
After partition heals: Replicas on the minority side catch up from the leader's log, and any uncommitted entries they hold are discarded; no application intervention is needed

This design means NewSQL systems are almost always fully available (because partitions are rare) while always consistent (because they never sacrifice correctness).
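
The partition behavior described above reduces to a majority check. The function names below are hypothetical, and the sketch treats a partition as a simple split of one range's replicas into two sides.

```python
def has_quorum(reachable_replicas: int, total_replicas: int) -> bool:
    """A side of a partition can commit only if it holds a strict majority."""
    return reachable_replicas > total_replicas // 2


def partition_outcome(total_replicas: int, side_a: int) -> dict:
    """Report which side of a two-way partition can keep serving writes."""
    side_b = total_replicas - side_a
    return {
        "side_a_available": has_quorum(side_a, total_replicas),
        "side_b_available": has_quorum(side_b, total_replicas),
    }


three_way = partition_outcome(total_replicas=3, side_a=2)
five_way = partition_outcome(total_replicas=5, side_a=2)
even_split = partition_outcome(total_replicas=4, side_a=2)
```

At most one side can ever hold a majority, so two sides never accept conflicting writes. An even split of four replicas leaves neither side available, which is one reason odd replica counts are typical.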

Availability in Practice

With 3-way replication across independent failure domains, NewSQL deployments commonly target 99.99% or higher availability. The theoretical 'availability sacrifice' of choosing CP over AP manifests only when a range loses its majority, which is uncommon when replicas are placed in separate zones or regions. In practice you get consistency together with high availability.
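
A back-of-the-envelope calculation shows why majority replication helps. The sketch assumes each replica is up independently with probability `p`, which is optimistic because real failures are often correlated; the function name and the 99.9% per-replica figure are illustrative assumptions, not measurements.

```python
from math import comb


def quorum_availability(p: float, replicas: int = 3) -> float:
    """Probability that a strict majority of replicas is up, assuming independent failures."""
    majority = replicas // 2 + 1
    return sum(
        comb(replicas, k) * p**k * (1 - p) ** (replicas - k)
        for k in range(majority, replicas + 1)
    )


single_node = 0.999                               # assumed per-replica availability
three_replicas = quorum_availability(single_node, replicas=3)
```

Under these assumptions, three replicas at 99.9% each give a majority roughly 99.9997% of the time, which is how a replicated range can be more available than any single node that hosts it.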

NewSQL vs Traditional SQL and NoSQL

To solidify understanding, let's directly compare NewSQL with the two paradigms it bridges:

Compared to Traditional SQL (PostgreSQL, MySQL, Oracle)

NewSQL vs Traditional SQL
Aspect	Traditional SQL	NewSQL
Scalability	Vertical (limited by single machine)	Horizontal (add nodes for capacity)
High Availability	Requires external solutions (replication, failover)	Built-in multi-region replication and automatic failover
Geographic Distribution	Complex, often with eventual consistency	Native global distribution with strong consistency
Schema Changes	Often require downtime or locking	Online schema changes without downtime
Operations Complexity	Need DBA expertise for scaling	Automated rebalancing, splitting, healing
Cost at Scale	Expensive specialized hardware	Commodity servers, linear cost scaling

Compared to NoSQL (Cassandra, MongoDB, DynamoDB)

NewSQL vs NoSQL
Aspect	NoSQL	NewSQL
Consistency Model	Eventual or tunable consistency	Strong consistency (serializable isolation)
Transactions	Single-key or limited multi-doc	Full distributed ACID transactions
Query Language	Proprietary APIs, limited	Standard SQL with full relational semantics
Joins	App-level or unsupported	Native, optimized distributed joins
Schema Enforcement	Optional, often schema-less	Required schemas with type safety
Ecosystem	Fragmented, system-specific	Leverages decades of SQL tooling

The NewSQL Value Proposition

NewSQL lets you keep your existing SQL skills, tools, and ORM frameworks while gaining cloud-native scalability and resilience. You no longer need to choose between 'works correctly' and 'works at scale'—NewSQL delivers both.

Major NewSQL Systems at a Glance

The NewSQL landscape includes several mature, production-ready systems. Here's a brief overview of the most significant ones (we'll explore Google Spanner and CockroachDB in detail in subsequent pages):

Leading NewSQL Databases

•Google Spanner: The original globally-distributed NewSQL database. Uses TrueTime (GPS + atomic clocks) for global consistency. Powers Google's advertising and core infrastructure. Available as Cloud Spanner on GCP.
•CockroachDB: Open-source, PostgreSQL-compatible NewSQL database. Inspired by Spanner but works without specialized hardware using hybrid logical clocks. Strong adoption for cloud-native applications.
•TiDB: MySQL-compatible distributed SQL database from PingCAP. Popular in Asia-Pacific region. Combines TiKV (distributed key-value store) with TiDB (SQL layer).
•YugabyteDB: PostgreSQL and Cassandra-compatible distributed SQL. Designed for multi-cloud portability. Offers both YSQL (Postgres) and YCQL (Cassandra-like) APIs.
•VoltDB: In-memory NewSQL database optimized for extreme low-latency OLTP. Uses partitioned stored procedures for deterministic execution.
•NuoDB: Distributed SQL database with a separation of compute ('transaction engines') and storage ('storage managers').

NewSQL System Comparison
System	SQL Compatibility	Primary Use Case	Deployment
Google Cloud Spanner	Standard SQL	Global mission-critical systems	Cloud (GCP only)
CockroachDB	PostgreSQL	Cloud-native distributed apps	Open source / Cloud
TiDB	MySQL	Hybrid OLTP/OLAP (HTAP)	Open source / Cloud
YugabyteDB	PostgreSQL + CQL	Multi-cloud portability	Open source / Cloud
VoltDB	Subset of SQL	Ultra-low-latency OLTP	Commercial / Open core

Summary: The NewSQL Revolution

We've covered substantial ground in understanding what NewSQL databases are and why they represent a significant advancement in database technology. Let's consolidate the key concepts:

Key Takeaways

•NewSQL bridges two worlds — Combines the ACID guarantees and SQL interface of traditional RDBMS with the horizontal scalability of NoSQL systems.
•Born from necessity — Web-scale companies needed transactional consistency at massive scale; neither traditional SQL nor NoSQL fully delivered.
•Enabled by breakthroughs — Raft consensus, distributed MVCC, hybrid logical clocks, and range-based sharding made NewSQL architecturally possible.
•CP systems by design — NewSQL prioritizes consistency, accepting brief unavailability during rare partition events rather than serving stale data.
•Standard SQL compatibility — Existing applications, tools, and developer skills transfer directly to NewSQL systems.
•Automatic operations — Replication, rebalancing, and failover are built-in, reducing operational complexity compared to sharded traditional databases.

What's Next

With the foundational understanding of NewSQL established, the next pages will dive deeper into:

SQL with Scalability: How NewSQL systems achieve scale-out architecture while maintaining SQL semantics
Google Spanner: The pioneering system that proved global-scale ACID transactions were possible
CockroachDB: An open-source Spanner-inspired system accessible to all organizations
Use Cases: When NewSQL is the right choice—and when it isn't

Page Complete

You now understand what NewSQL databases are, why they emerged, and how they achieve the seemingly impossible combination of ACID transactions and horizontal scalability. This conceptual foundation prepares you to explore specific NewSQL implementations and their real-world applications.

1 / 5

Loading learning content...

Database Management SystemsModern Database Topics

NewSQL Databases

LevelAdvanced

Duration60 mins

TopicModern Database Topics

1 / 5

NewSQL Concept

The Best of Both Worlds

Then came NewSQL databases.

What You Will Learn

Historical Context: Why NewSQL Emerged

The Traditional SQL Era (1970s-2000s)

For decades, relational database management systems (RDBMS) like Oracle, PostgreSQL, MySQL, and SQL Server dominated enterprise data management. These systems offered:

ACID transactions: Atomicity, Consistency, Isolation, Durability guarantees that ensured data integrity
SQL: A powerful, declarative query language with rich semantics
Relational model: A mathematically rigorous foundation for data organization
Mature tooling: Decades of optimization, monitoring, and administration tools

The Vertical Scaling Wall

The NoSQL Revolution (2005-2015)

Faced with scaling limitations, internet-scale companies developed new database architectures that prioritized horizontal scalability:

Google Bigtable (2006): Column-family storage for web indexing
Amazon Dynamo (2007): Key-value store for shopping cart availability
MongoDB (2009): Document database with flexible schemas
Apache Cassandra (2008): Wide-column distributed database

NoSQL databases typically offered:

Horizontal scalability across commodity hardware
Schema flexibility (no rigid table structures)
Eventual consistency (data would eventually converge, but reads might return stale values)
High availability through replication

The Pre-NewSQL Database Landscape
Characteristic	Traditional SQL	NoSQL
Scalability	Vertical (scale-up)	Horizontal (scale-out)
Consistency Model	Strong ACID	Eventual / tunable
Query Language	SQL (declarative, powerful)	API-based (limited, varies by system)
Schema	Rigid, predefined	Flexible, schema-less
Transactions	Full ACID, multi-statement	Single-key or limited
Joins	Native, optimized	Application-level or unsupported
Use Cases	Enterprise, financial, ERP	Web-scale, analytics, caching

The Gap That Created NewSQL

NoSQL solved the scalability problem but created new challenges:

Loss of ACID semantics: Applications had to handle consistency at the application layer—error-prone and complex
No standard query language: Each NoSQL system had its own API, increasing learning curves and lock-in
Limited transaction support: Operations spanning multiple records required custom, fragile workarounds
Join complexity: Denormalization or application-level joins increased data redundancy and complexity

Meanwhile, many applications—particularly in finance, healthcare, and enterprise systems—required strong consistency and couldn't tolerate the trade-offs NoSQL demanded.

The question became: Is it possible to have both ACID guarantees AND horizontal scalability?

Defining NewSQL: What It Actually Means

Formal Definition:

NewSQL is a class of relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) workloads while maintaining the ACID guarantees of a traditional database system.

This definition encompasses three essential characteristics:

NewSQL Core Requirements

•SQL Interface: Full SQL support with standard relational semantics, including complex joins, aggregations, and subqueries. Applications can use existing SQL knowledge and tools.
•ACID Transactions: Complete transactional guarantees—atomicity, consistency, isolation, and durability—even for distributed transactions spanning multiple nodes and partitions.
•Horizontal Scalability: Ability to scale out across multiple commodity servers, adding capacity by simply adding more nodes rather than upgrading hardware.

NewSQL vs 'Distributed SQL'

What NewSQL Is NOT

To understand NewSQL precisely, it's equally important to clarify what it's not:

Not just 'SQL on NoSQL': Some systems layer SQL interfaces over NoSQL storage. True NewSQL systems are designed from the ground up for distributed ACID transactions, not retrofitted.
Not extended traditional RDBMS: Products like MySQL Cluster or PostgreSQL with Citus add distribution capabilities but often with limitations on cross-shard transactions or consistency guarantees.
Not sacrificing consistency for scale: Unlike NoSQL, NewSQL maintains serializable isolation levels and strong consistency across all operations, even distributed ones.
Not limited to specific workloads: While optimized for OLTP, NewSQL systems support a broad range of SQL operations, not just simple key-value lookups.

Distinguishing NewSQL from Adjacent Technologies
Technology	SQL Support	ACID Distributed Txns	Horizontal Scale	NewSQL?
PostgreSQL (single-node)	✓ Full	✓ Single-node only	✗ Vertical only	No
MySQL + Read Replicas	✓ Full	✓ Single master only	✓ Reads yes, writes no	No
MongoDB	Partial (MQL)	✓ Multi-doc possible	✓ Yes	No (not relational)
Google Spanner	✓ Full	✓ Global, distributed	✓ Yes	Yes
CockroachDB	✓ PostgreSQL-compatible	✓ Serializable distributed	✓ Yes	Yes
TiDB	✓ MySQL-compatible	✓ Distributed ACID	✓ Yes	Yes
Cassandra + Stargate	Partial (CQL)	✗ No real ACID	✓ Yes	No

The Enabling Breakthroughs

NewSQL wasn't possible until several key technological and theoretical breakthroughs matured. Understanding these foundations reveals why NewSQL emerged when it did—not earlier, not later.

1. Consensus Algorithms: Paxos and Raft

Elect leaders for data partitions
Replicate transaction logs across nodes
Ensure all replicas agree on the order of transactions

raft_simplified.pseudocode
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// Simplified Raft Consensus Flow
// Each partition has a leader elected via Raft
 
LEADER_ELECTION:
    // If no heartbeat from leader within timeout
    candidate.term++
    candidate.requestVote(term, lastLogIndex, lastLogTerm)
    
    // If majority of votes received
    candidate.becomeLeader()
 
LOG_REPLICATION:
    // Leader receives write request
    leader.appendToLog(entry)
    
    // Replicate to followers
    for each follower:
        leader.sendAppendEntries(entries, commitIndex)
    
    // Once majority acknowledges
    leader.commitEntry(entry)
    leader.respondToClient(success)
 
SAFETY_GUARANTEES:
    // Only logs with committed entries can become leader
    // Commits are never lost once acknowledged
    // All nodes eventually have identical logs

2. Multi-Version Concurrency Control (MVCC)

MVCC allows readers to access consistent snapshots of data without blocking writers, and vice versa. This technique, refined over decades in traditional databases, became essential for NewSQL:

Each transaction sees a consistent snapshot at its start time
Writes create new versions rather than overwriting data
Garbage collection removes old versions when no longer needed
Enables serializable isolation without excessive locking

NewSQL systems extend MVCC to work across distributed nodes, maintaining consistent snapshots even when data spans multiple machines.

3. Hybrid Logical Clocks and TrueTime

To order events correctly across distributed nodes, NewSQL systems need accurate, synchronized timestamps. Two approaches emerged:

Hybrid Logical Clocks (HLC): Combine physical timestamps with logical counters to ensure causal ordering without requiring perfect clock synchronization. Used by CockroachDB.
TrueTime: Google's solution using GPS receivers and atomic clocks to provide globally synchronized time with bounded uncertainty (typically ±7ms). Used by Google Spanner.

These timing mechanisms enable consistent transaction ordering across geographically distributed datacenters.

Why Timing Matters in Distributed Databases

4. Range-Based Sharding with Automatic Rebalancing

NewSQL databases partition data into ranges (contiguous key spans) rather than fixed hash-based sharding. Range-based partitioning enables:

Efficient range scans for analytics and reporting
Locality for related data (e.g., all orders for a customer)
Automatic splitting when ranges grow too large
Automatic rebalancing when nodes are added or removed

This dynamic partitioning, combined with Raft-based replication, allows NewSQL databases to automatically adjust to changing data volumes and cluster sizes.

Key NewSQL Enabling Technologies
Technology	Problem Solved	Example Implementation
Raft/Paxos Consensus	Agreement across distributed nodes	Leader election, log replication
Distributed MVCC	Concurrent reads/writes without blocking	Snapshot isolation across nodes
HLC/TrueTime	Global transaction ordering	Causally consistent timestamps
Range-based Sharding	Efficient distribution with locality	Auto-splitting, rebalancing
Serializable Snapshot Isolation	Strong isolation without full locking	Write-write conflict detection

NewSQL Architecture Overview

Layered Architecture

Most NewSQL systems use a layered architecture separating concerns:

NewSQL Architectural Layers

•SQL Layer: Parses SQL, generates query plans, and handles client connections. May run on dedicated gateway nodes or be embedded in storage nodes.
•Transaction Layer: Manages distributed transactions, coordinates writes across partitions, implements 2PC (Two-Phase Commit), and ensures ACID semantics.
•Consensus Layer: Uses Raft or Paxos to replicate data within each partition. Handles leader election and log replication for durability and availability.
•Storage Layer: Manages persistent storage, typically using Log-Structured Merge trees (LSM-trees) or similar structures optimized for write-heavy distributed workloads.

Converting Mermaid diagram...

Data Distribution Model

NewSQL databases partition the keyspace into ranges (also called regions, tablets, or shards). Each range:

Contains a contiguous span of the keyspace (e.g., keys 'a' through 'm')
Is replicated across multiple nodes using Raft consensus (typically 3 or 5 replicas)
Has one leader that handles all reads and writes, with followers for redundancy
Can be split automatically when it grows too large
Can be moved between nodes for load balancing

This design means:

Writes go to the range leader, are replicated via Raft, and committed when a majority acknowledges
Reads can go to the leader (for linearizable reads) or followers (for slightly stale but load-balanced reads)
Distributed transactions touching multiple ranges use two-phase commit coordinated across range leaders
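
The write path and the cross-range commit described above can be sketched as follows. This is a deliberately simplified, in-memory illustration: `RangeLeader`, `two_phase_commit`, and the `reachable_followers` parameter are hypothetical names, Raft replication is reduced to counting acknowledgements, and a real coordinator would also durably record its commit decision and handle retries and coordinator failure.

```python
class RangeLeader:
    """Toy leader of one range; followers are modelled only as an ack count."""

    def __init__(self, name, replicas=3, reachable_followers=2):
        self.name = name
        self.replicas = replicas
        self.reachable_followers = reachable_followers
        self.log = []       # records acknowledged by a majority
        self.staged = {}    # txn_id -> writes prepared but not yet committed
        self.data = {}      # committed key/value state

    def _replicate(self, record):
        # The leader counts itself; the record commits only with a majority.
        acks = 1 + self.reachable_followers
        if acks < self.replicas // 2 + 1:
            return False
        self.log.append(record)
        return True

    def prepare(self, txn_id, writes):
        # Phase 1: stage the writes if the prepare record reaches a majority.
        if not self._replicate(("prepare", txn_id, writes)):
            return False
        self.staged[txn_id] = writes
        return True

    def commit(self, txn_id):
        # Phase 2: make the staged writes visible.
        self._replicate(("commit", txn_id))
        self.data.update(self.staged.pop(txn_id))

    def abort(self, txn_id):
        self.staged.pop(txn_id, None)


def two_phase_commit(txn_id, writes_by_leader):
    """writes_by_leader: list of (RangeLeader, {key: value}) pairs."""
    prepared = []
    for leader, writes in writes_by_leader:
        if leader.prepare(txn_id, writes):
            prepared.append(leader)
        else:
            for p in prepared:          # any failed prepare aborts everyone
                p.abort(txn_id)
            return False
    for leader in prepared:
        leader.commit(txn_id)
    return True


orders = RangeLeader("orders")
accounts = RangeLeader("accounts")
isolated = RangeLeader("inventory", reachable_followers=0)  # lost its quorum

ok = two_phase_commit("t1", [(orders, {"order:1": "widget"}),
                             (accounts, {"acct:9": 50})])
failed = two_phase_commit("t2", [(orders, {"order:2": "gadget"}),
                                 (isolated, {"sku:5": 3})])
```

Transaction `t1` commits on both ranges, while `t2` is rolled back everywhere because one participant cannot reach a majority, which is the all-or-nothing behaviour atomicity requires.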

Why Ranges Instead of Hash Partitions?

The CAP Theorem Perspective

CAP Theorem Recap

The CAP theorem states that a distributed data store can provide at most two of three guarantees:

Consistency: Every read receives the most recent write or an error
Availability: Every request to a non-failing node receives a non-error response, though not necessarily one reflecting the most recent write
Partition tolerance: The system continues operating despite network partitions

Since network partitions are inevitable in distributed systems, the real choice is between CP (consistency + partition tolerance) and AP (availability + partition tolerance) during a partition event.

NoSQL: AP Systems

•Prioritize availability over consistency
•Continue accepting reads and writes during partitions
•May return stale data
•Eventual consistency model
•Examples: Cassandra, DynamoDB (default)

NewSQL: CP Systems

•Prioritize consistency over availability
•May reject operations if quorum isn't available
•Guaranteed fresh data on successful strongly consistent reads
•Strong isolation guarantees (serializable in Spanner and CockroachDB; snapshot isolation in TiDB)
•Examples: Spanner, CockroachDB, TiDB

NewSQL's Pragmatic Approach

During normal operation: Full ACID transactions, serializable isolation, high throughput
During partition: Operations involving the minority partition become unavailable; majority partition continues normally
After partition heals: Replicas that fell behind catch up from the leader's Raft log, and any uncommitted entries on the minority side are discarded; no application intervention is needed

This design means NewSQL systems remain available in practice (partitions are rare, and only the minority side loses service) while staying consistent (no replica group accepts a write without a majority).
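
The majority rule behind this behaviour is simple arithmetic, sketched below. The function names and the region labels are illustrative assumptions, not part of any system's API.

```python
def quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be lost while a majority still exists."""
    return (replicas - 1) // 2


def partition_outcome(replicas: int, groups: dict) -> dict:
    """Map each side of a partition to whether it can keep serving writes.

    groups: side name -> number of replicas reachable on that side.
    """
    return {side: size >= quorum(replicas) for side, size in groups.items()}


# Five replicas split 3/2 by a network partition.
split = partition_outcome(5, {"us-east": 3, "eu-west": 2})

# Three replicas fully isolated from one another.
isolated = partition_outcome(3, {"a": 1, "b": 1, "c": 1})
```

In the 3/2 split only `us-east` keeps serving, and when every replica is isolated no side can, which is why these systems are classified as CP rather than AP.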

Availability in Practice

NewSQL vs Traditional SQL and NoSQL

To solidify understanding, let's directly compare NewSQL with the two paradigms it bridges:

Compared to Traditional SQL (PostgreSQL, MySQL, Oracle)

NewSQL vs Traditional SQL
Aspect	Traditional SQL	NewSQL
Scalability	Vertical (limited by single machine)	Horizontal (add nodes for capacity)
High Availability	Requires external solutions (replication, failover)	Built-in multi-region replication and automatic failover
Geographic Distribution	Complex, often with eventual consistency	Native global distribution with strong consistency
Schema Changes	Often require downtime or locking	Online schema changes without downtime
Operations Complexity	Need DBA expertise for scaling	Automated rebalancing, splitting, healing
Cost at Scale	Expensive specialized hardware	Commodity servers, linear cost scaling

Compared to NoSQL (Cassandra, MongoDB, DynamoDB)

NewSQL vs NoSQL
Aspect	NoSQL	NewSQL
Consistency Model	Eventual or tunable consistency	Strong consistency (serializable isolation)
Transactions	Single-key or limited multi-doc	Full distributed ACID transactions
Query Language	Proprietary APIs, limited	Standard SQL with full relational semantics
Joins	App-level or unsupported	Native, optimized distributed joins
Schema Enforcement	Optional, often schema-less	Required schemas with type safety
Ecosystem	Fragmented, system-specific	Leverages decades of SQL tooling

The NewSQL Value Proposition

Major NewSQL Systems at a Glance

Leading NewSQL Databases

•Google Spanner: The original globally-distributed NewSQL database. Uses TrueTime (GPS + atomic clocks) for global consistency. Powers Google's advertising and core infrastructure. Available as Cloud Spanner on GCP.
•CockroachDB: Open-source, PostgreSQL-compatible NewSQL database. Inspired by Spanner but works without specialized hardware using hybrid logical clocks. Strong adoption for cloud-native applications.
•TiDB: MySQL-compatible distributed SQL database from PingCAP. Popular in Asia-Pacific region. Combines TiKV (distributed key-value store) with TiDB (SQL layer).
•YugabyteDB: PostgreSQL and Cassandra-compatible distributed SQL. Designed for multi-cloud portability. Offers both YSQL (Postgres) and YCQL (Cassandra-like) APIs.
•VoltDB: In-memory NewSQL database optimized for extreme low-latency OLTP. Uses partitioned stored procedures for deterministic execution.
•NuoDB: Distributed SQL database with a separation of compute ('transaction engines') and storage ('storage managers').

NewSQL System Comparison
System	SQL Compatibility	Primary Use Case	Deployment
Google Cloud Spanner	GoogleSQL (ANSI-based) and PostgreSQL dialects	Global mission-critical systems	Cloud (GCP only)
CockroachDB	PostgreSQL	Cloud-native distributed apps	Open source / Cloud
TiDB	MySQL	Hybrid OLTP/OLAP (HTAP)	Open source / Cloud
YugabyteDB	PostgreSQL + CQL	Multi-cloud portability	Open source / Cloud
VoltDB	Subset of SQL	Ultra-low-latency OLTP	Commercial / Open core

Summary: The NewSQL Revolution

We've covered substantial ground in understanding what NewSQL databases are and why they represent a significant advancement in database technology. Let's consolidate the key concepts:

Key Takeaways

•NewSQL bridges two worlds — Combines the ACID guarantees and SQL interface of traditional RDBMS with the horizontal scalability of NoSQL systems.
•Born from necessity — Web-scale companies needed transactional consistency at massive scale; neither traditional SQL nor NoSQL fully delivered.
•Enabled by breakthroughs — Raft consensus, distributed MVCC, hybrid logical clocks, and range-based sharding made NewSQL architecturally possible.
•CP systems by design — NewSQL prioritizes consistency, accepting brief unavailability during rare partition events rather than serving stale data.
•Standard SQL compatibility — Existing applications, tools, and developer skills transfer directly to NewSQL systems.
•Automatic operations — Replication, rebalancing, and failover are built-in, reducing operational complexity compared to sharded traditional databases.

What's Next

With the foundational understanding of NewSQL established, the next pages will dive deeper into:

SQL with Scalability: How NewSQL systems achieve scale-out architecture while maintaining SQL semantics
Google Spanner: The pioneering system that proved global-scale ACID transactions were possible
CockroachDB: An open-source Spanner-inspired system accessible to all organizations
Use Cases: When NewSQL is the right choice—and when it isn't
