In 2012, Google published the Spanner paper—describing a globally distributed database that achieved the seemingly impossible: strong consistency, SQL semantics, and horizontal scalability across continents. The paper sent shockwaves through the database industry. But there was a catch: Spanner was Google-only, built on infrastructure most organizations could never replicate.
Three former Google engineers—Spencer Kimball, Peter Mattis, and Ben Darnell—looked at Spanner and asked a different question: What if we could bring these capabilities to everyone?
The result was CockroachDB, launched in 2014 and named for the insect famous for surviving nuclear apocalypse. The name isn't just clever marketing—it's a design philosophy. CockroachDB is engineered to survive datacenter failures, network partitions, and operational disasters that would kill lesser databases. It emerges from chaos intact, with zero data loss.
Today, CockroachDB powers mission-critical applications at companies like Netflix, Comcast, Equifax, and Bose. It processes millions of transactions per second across global deployments, providing the distributed SQL guarantees that were once exclusive to Google—now available as open-source software anyone can run.
By the end of this page, you will understand CockroachDB's origins and the specific design decisions inspired by Spanner. You'll learn how CockroachDB adapted Spanner's architecture for commodity hardware, the engineering trade-offs involved, and why CockroachDB represents a democratization of distributed database technology.
To understand CockroachDB, you must first understand what made Spanner revolutionary—and what made replicating it so challenging.
Google Spanner's Key Innovations:
Globally Distributed SQL: Unlike NoSQL systems that sacrificed consistency for scale, Spanner proved you could have both. It supported full SQL with ACID transactions across continents.
TrueTime: Spanner's secret weapon was TrueTime—a globally synchronized clock using GPS receivers and atomic clocks in every datacenter. TrueTime provided bounded clock uncertainty, enabling external consistency without coordination.
Paxos Consensus: Every piece of data was replicated across multiple zones using Paxos, ensuring durability and consistency even during failures.
Automatic Sharding: Data automatically split and rebalanced as it grew, with no manual intervention required.
The Catch: Google's Infrastructure
Spanner's design assumed infrastructure that was unique to Google, summarized in the table below.
For organizations without Google's resources, Spanner was architectural inspiration—but not a blueprint they could follow directly.
| Spanner Component | Google Reality | Industry Reality | CockroachDB Approach |
|---|---|---|---|
| TrueTime (atomic clocks) | Available in all datacenters | Not available anywhere | Hybrid Logical Clocks (HLC) |
| Private global network | Google-owned fiber globally | Public cloud, varying latency | Design for variable latency |
| Colossus filesystem | Custom distributed FS | Cloud block storage | Per-node storage engine (RocksDB, now Pebble) |
| Chubby lock service | Internal Google service | External coordinators (ZooKeeper, etcd) | Integrated Raft consensus |
| Paxos expertise | Deep internal knowledge | Limited real-world experience | Raft (more understandable) |
The CockroachDB Thesis:
The CockroachDB founders believed that Spanner's principles could be replicated, even if its exact implementation could not. They bet on three insights:
Commodity hardware is sufficient: You don't need atomic clocks if you design around clock uncertainty differently.
Raft is as good as Paxos: The Raft consensus protocol (published in 2014) provided the same guarantees as Paxos but was much easier to implement correctly.
Open source matters: Making the database open-source would attract contributors, enable community scrutiny, and build trust in a way proprietary systems couldn't.
These bets paid off. CockroachDB brought Spanner's vision to organizations running on AWS, GCP, Azure, or their own datacenters—no Google required.
CockroachDB launched at the perfect moment: cloud computing made distributed infrastructure accessible, containers (Docker/Kubernetes) simplified deployment, and the industry was growing frustrated with the compromises of NoSQL. Organizations wanted SQL back—but they also wanted scale. CockroachDB promised both.
CockroachDB's design is guided by several core principles that shape every architectural decision. Understanding these principles explains why CockroachDB works the way it does.
Principle 1: Survivability Above All
The cockroach survives because it can endure conditions that would kill other organisms. CockroachDB embraces this philosophy: the cluster is expected to keep serving reads and writes through node failures, full region outages, and network partitions, without losing acknowledged writes.
Principle 2: Strong Consistency by Default
Unlike systems that make eventual consistency the default (with strong consistency as an expensive option), CockroachDB provides serializable isolation by default:
Principle 3: Data Locality Awareness
Data should live close to where it's accessed: replica placement and leaseholder preferences can pin data to specific regions, keeping latency low for nearby users while global consistency is preserved.
Principle 4: Operational Simplicity
Spanner required Google's operational expertise. CockroachDB aims to be simpler: a single self-contained binary, no external coordination services, and symmetric nodes that need no special-case handling.
```
COCKROACHDB DESIGN PRINCIPLES IN PRACTICE
═══════════════════════════════════════════════════════════════════

SURVIVABILITY: Multi-Region Deployment Example
───────────────────────────────────────────────────────────────────
Scenario:           3-region deployment (US-East, US-West, EU-West)
Each region:        3 nodes (9 nodes total)
Replication factor: 5 (data on 5 nodes)

What CockroachDB survives automatically:
├── Single node failure:    ✅ No impact (4/5 replicas remain)
├── Multiple node failures: ✅ Up to 2 nodes (3/5 replicas remain)
├── Full region failure:    ✅ 6/9 nodes survive (majority quorum)
├── Network partition:      ✅ Majority partition continues
└── Split-brain scenario:   ✅ Raft prevents inconsistency

CONSISTENCY: Transaction Isolation Example
───────────────────────────────────────────────────────────────────
-- Two concurrent transactions on same account
-- Transaction A: Transfer $100 to savings
-- Transaction B: Transfer $50 to checking

CockroachDB guarantee:
├── Serializable isolation: A then B, or B then A, never partial
├── No lost updates:        Both transfers are fully applied
├── No dirty reads:         Neither sees other's uncommitted work
└── External consistency:   If A commits before B starts, B sees A

LOCALITY: Regional Data Configuration
───────────────────────────────────────────────────────────────────
-- Pin customer data to their home region
ALTER DATABASE customers CONFIGURE ZONE USING
    num_replicas = 5,
    constraints = '{"+region=us-east": 2, "+region=us-west": 2, "+region=eu-west": 1}',
    lease_preferences = '[[+region=us-east]]';

Result:
├── US customers:       Low-latency writes (leader in US-East)
├── EU customers:       Can be configured separately with EU leaders
├── Strong consistency: Still maintained across all regions
└── GDPR compliance:    EU data stays in EU (configurable)
```

CockroachDB uses the PostgreSQL wire protocol, meaning any PostgreSQL client library (psycopg2, node-postgres, etc.) works with CockroachDB. Most PostgreSQL syntax is supported, making migration from PostgreSQL straightforward for many applications.
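To make the wire-protocol point concrete, here is a minimal connection sketch using psycopg2. It assumes a locally running, insecure demo cluster; the host, user, and database names are placeholders, not values prescribed by CockroachDB.

```python
# Minimal sketch: talking to CockroachDB with an ordinary PostgreSQL driver.
# Hypothetical connection details; CockroachDB's default SQL port is 26257.
import psycopg2

conn = psycopg2.connect(
    host="localhost",    # any node in the cluster can act as a SQL gateway
    port=26257,          # CockroachDB's default SQL port
    user="root",
    dbname="defaultdb",
    sslmode="disable",   # only appropriate for an insecure local/demo cluster
)

with conn, conn.cursor() as cur:
    # Standard PostgreSQL syntax works unchanged for most statements.
    cur.execute("SELECT version()")
    print(cur.fetchone()[0])  # prints the CockroachDB build string

conn.close()
```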
CockroachDB's architecture is organized in layers, each providing specific guarantees. Understanding these layers is essential for diagnosing performance issues and making informed deployment decisions.
Layer 1: SQL Layer
The topmost layer handles SQL parsing, query planning, and execution. It translates SQL statements into operations on the underlying key-value store:
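As an illustration of that translation, the sketch below shows the general shape of the SQL-to-key-value mapping: rows and index entries become keys prefixed by table and index identifiers, so each table occupies a contiguous slice of an ordered keyspace. The identifiers and string format here are invented for readability; the real encoding is a compact binary format.

```python
# Illustrative (not actual) mapping of SQL rows onto an ordered key-value space.
def row_key(table_id: int, index_id: int, primary_key) -> str:
    # Keys for one table/index share a prefix, so they sort together in the
    # keyspace and can later be split into ranges at arbitrary boundaries.
    return f"/Table/{table_id}/{index_id}/{primary_key}"

# A row in a hypothetical users table (table id 53, primary index 1):
print(row_key(53, 1, 42))                    # -> /Table/53/1/42
# A secondary index entry would live under a different index id:
print(row_key(53, 2, "alice@example.com"))   # -> /Table/53/2/alice@example.com
```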
Layer 2: Transaction Layer
This layer provides ACID guarantees for operations spanning multiple keys:
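Because transactions run at serializable isolation, a conflicting transaction can be aborted with SQLSTATE 40001 and the client is expected to retry it. Below is a minimal retry-loop sketch, not an official client helper; the `accounts` table and the open psycopg2 connection are hypothetical.

```python
# Sketch of the client-side retry pattern for serializable transactions.
# Assumes an open psycopg2 connection `conn` and a hypothetical `accounts` table.
import time
import psycopg2
import psycopg2.errorcodes

def transfer(conn, src, dst, amount, max_retries=5):
    for attempt in range(max_retries):
        try:
            with conn.cursor() as cur:
                cur.execute(
                    "UPDATE accounts SET balance = balance - %s WHERE id = %s",
                    (amount, src),
                )
                cur.execute(
                    "UPDATE accounts SET balance = balance + %s WHERE id = %s",
                    (amount, dst),
                )
            conn.commit()  # both updates commit atomically, or neither does
            return
        except psycopg2.Error as err:
            conn.rollback()
            # 40001 = serialization_failure: another transaction conflicted.
            if err.pgcode == psycopg2.errorcodes.SERIALIZATION_FAILURE:
                time.sleep(0.05 * (attempt + 1))  # simple backoff, then retry
                continue
            raise
    raise RuntimeError("transfer did not commit after retries")
```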
Layer 3: Distribution Layer
Data is distributed across nodes using a key-value model:
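The sketch below illustrates, in a deliberately simplified way, the shape of the problem the distribution layer solves: an ordered keyspace carved into ranges by split points, with each key belonging to exactly one range. The split points and key strings are invented for the example; the real system uses a binary encoding that preserves numeric ordering.

```python
# Illustrative only: how an ordered keyspace is carved into ranges.
import bisect

# Hypothetical split points; CockroachDB creates these automatically as data grows.
split_points = ["/Table/53/1/1000", "/Table/53/1/2000", "/Table/53/1/3000"]

def range_for_key(key: str) -> int:
    # Keys sort lexicographically here (zero-padded for the example);
    # the range index is where the key falls among the split points.
    return bisect.bisect_right(split_points, key)

print(range_for_key("/Table/53/1/0042"))  # -> 0 (before the first split point)
print(range_for_key("/Table/53/1/2500"))  # -> 2 (between the 2nd and 3rd splits)
```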
Layer 4: Replication Layer
Every range is replicated for durability and availability:
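The fault-tolerance arithmetic follows directly from Raft's majority requirement: a range stays available as long as a majority of its replicas survive. A small sketch of that calculation, consistent with the 5-replica example earlier on this page:

```python
# Raft quorum arithmetic: how many replica failures a range can absorb.
def fault_tolerance(replication_factor: int) -> int:
    quorum = replication_factor // 2 + 1   # majority needed to commit writes
    return replication_factor - quorum     # replicas that can be lost

for rf in (3, 5, 7):
    print(f"replication factor {rf}: tolerates {fault_tolerance(rf)} failed replica(s)")
# replication factor 3: tolerates 1 failed replica(s)
# replication factor 5: tolerates 2 failed replica(s)
# replication factor 7: tolerates 3 failed replica(s)
```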
Layer 5: Storage Layer
The bottom layer persists data to disk:
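MVCC, which the storage layer provides, can be sketched in a few lines: each key keeps multiple timestamped versions, and a read at timestamp T returns the newest version at or below T. The key names, timestamps, and values below are invented purely for illustration.

```python
# Illustrative MVCC read: each key stores multiple timestamped versions,
# and a read at time T sees the newest version at or below T.
versions = {
    "balance/alice": [(100, 500), (200, 450), (300, 700)],  # (commit_ts, value)
}

def mvcc_read(key: str, read_ts: int):
    visible = [val for ts, val in versions.get(key, []) if ts <= read_ts]
    return visible[-1] if visible else None

print(mvcc_read("balance/alice", 250))  # -> 450 (the version committed at ts=200)
print(mvcc_read("balance/alice", 50))   # -> None (no version existed yet)
```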
```
COCKROACHDB ARCHITECTURE LAYERS
═══════════════════════════════════════════════════════════════════

            CLIENT APPLICATIONS (PostgreSQL wire protocol)
                              │
                              ▼
┌────────────────────────────────────────────────────────
│ SQL LAYER
│   Parser → Planner → Optimizer → Executor
│   • PostgreSQL-compatible syntax
│   • Cost-based query optimization
│   • Distributed query execution (DistSQL)
└────────────────────────────────────────────────────────
                              │
                              ▼
┌────────────────────────────────────────────────────────
│ TRANSACTION LAYER
│   Transaction Coordinator
│   • Hybrid Logical Clocks for timestamps
│   • Serializable Snapshot Isolation (SSI)
│   • Write intents and conflict resolution
│   • Parallel commits optimization
└────────────────────────────────────────────────────────
                              │
                              ▼
┌────────────────────────────────────────────────────────
│ DISTRIBUTION LAYER
│   Ranges (Key-Value Maps)
│   • ~512MB per range (configurable)
│   • Range descriptors track locations
│   • Leaseholders coordinate reads/writes
│   • Gossip protocol for cluster topology
└────────────────────────────────────────────────────────
                              │
                              ▼
┌────────────────────────────────────────────────────────
│ REPLICATION LAYER
│   Raft Consensus Groups
│   • One Raft group per range
│   • Majority (quorum) writes for durability
│   • Automatic leader election
│   • Consistent reads through leaseholder
└────────────────────────────────────────────────────────
                              │
                              ▼
┌────────────────────────────────────────────────────────
│ STORAGE LAYER
│   Pebble (LSM-Tree Storage Engine)
│   • Multi-Version Concurrency Control (MVCC)
│   • Efficient range scans and point lookups
│   • Background compaction
│   • Optional encryption at rest
└────────────────────────────────────────────────────────

NODE STRUCTURE
═══════════════════════════════════════════════════════════════════
Every CockroachDB node runs ALL layers. There are no dedicated
master, metadata, or coordination nodes.

This symmetry enables:
  • Any node can serve any query
  • No single points of failure
  • Linear horizontal scaling
  • Simple deployment (single binary)

┌────────────────────────────────────────────────────────
│ COCKROACHDB NODE
│   SQL Gateway    (can serve any query)
│   Range Replica  (stores a portion of the data)
│   Gossip Node    (shares cluster metadata)
│   Pebble Storage (local disk or SSD)
└────────────────────────────────────────────────────────
```

Key Architectural Insight: Symmetric Nodes
Unlike traditional databases where some nodes are 'special' (masters, metadata servers, coordinators), every CockroachDB node is architecturally identical. Each node accepts client SQL connections, holds replicas for a portion of the data, and participates in the gossip network that shares cluster metadata.
This symmetry is crucial for survivability. There's no master whose failure requires special handling—just nodes, each equally capable of all operations.
CockroachDB deploys as a single binary with no external dependencies (no ZooKeeper, no etcd, no separate configuration servers). This dramatically simplifies operations but means CockroachDB must solve coordination internally—which it does through Raft consensus among nodes.
CockroachDB isn't a replica of Spanner—it's an adaptation for a different operational reality. Several key changes make CockroachDB viable on commodity infrastructure.
Adaptation 1: Hybrid Logical Clocks Instead of TrueTime
Spanner's TrueTime requires hardware (GPS receivers, atomic clocks) that most organizations don't have. CockroachDB uses Hybrid Logical Clocks (HLC) instead:
How HLC Works:
An HLC timestamp has two components: a physical component taken from the node's wall clock, and a logical counter that breaks ties when the physical component cannot advance.
When events happen on different nodes, HLC ensures causally-related events are correctly ordered. If node A sends a message to node B, B's HLC will be greater than A's—even if B's physical clock is behind.
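A minimal sketch of those update rules follows. It is a simplification, not CockroachDB's actual implementation, which packs timestamps differently and handles edge cases this sketch ignores.

```python
# Simplified Hybrid Logical Clock: physical wall-clock time plus a logical counter.
import time

class HLC:
    def __init__(self):
        self.physical = 0   # highest wall-clock time seen, in nanoseconds
        self.logical = 0    # tie-breaker when physical time does not advance

    def now(self):
        """Timestamp a local event (e.g., a local write)."""
        wall = time.time_ns()
        if wall > self.physical:
            self.physical, self.logical = wall, 0
        else:
            self.logical += 1            # wall clock stalled or went backwards
        return (self.physical, self.logical)

    def update(self, msg_physical, msg_logical):
        """Merge a timestamp received from another node, never moving backwards."""
        wall = time.time_ns()
        if wall > self.physical and wall > msg_physical:
            self.physical, self.logical = wall, 0
        elif msg_physical > self.physical:
            self.physical, self.logical = msg_physical, msg_logical + 1
        elif msg_physical == self.physical:
            self.logical = max(self.logical, msg_logical) + 1
        else:
            self.logical += 1
        return (self.physical, self.logical)
```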
Adaptation 2: Raft Instead of Paxos
Spanner uses Multi-Paxos, a notoriously complex protocol. CockroachDB chose Raft: it offers equivalent safety guarantees but was designed for understandability, which makes it easier to implement, test, and debug correctly.
Adaptation 3: Read Timestamp Uncertainty Windows
Without TrueTime's bounded uncertainty, CockroachDB must handle clock skew differently: every read carries an uncertainty window derived from the maximum clock offset, and values that might have committed within that window force the read to refresh at a higher timestamp or wait the window out.
In Practice:
With well-configured NTP (most cloud environments), clock skew is typically <100ms. CockroachDB's default maximum clock offset is 500ms. The uncertainty handling adds a few milliseconds to some reads—acceptable for most workloads.
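The decision itself can be sketched simply: a value whose timestamp lies just above the read timestamp, within the maximum clock offset, might have committed first in real time and therefore forces the read to retry at a higher timestamp. This is a simplification; CockroachDB also narrows the window using per-node observed timestamps.

```python
# Simplified view of the read-uncertainty decision (not the full algorithm).
MAX_OFFSET_NS = 500_000_000  # default --max-offset of 500ms, in nanoseconds

def classify_value(read_ts: int, value_ts: int, max_offset: int = MAX_OFFSET_NS) -> str:
    if value_ts <= read_ts:
        return "visible"      # committed at or before our read timestamp
    if value_ts <= read_ts + max_offset:
        return "uncertain"    # may have committed first in real time: retry/refresh
    return "invisible"        # definitely committed after our read

read_ts = 1_000_000_000_000
print(classify_value(read_ts, read_ts - 10))          # visible
print(classify_value(read_ts, read_ts + 1_000_000))   # uncertain -> read refresh
print(classify_value(read_ts, read_ts + 10**12))      # invisible
```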
Adaptation 4: No Interleaved Tables (Yet)
Spanner's interleaved tables co-locate related data. CockroachDB has no direct equivalent, but related rows can still be co-located through schema design, as sketched below.
This requires more explicit schema design but provides similar benefits with more flexibility.
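One common substitute, sketched below with invented table names, is to lead the child table's primary key with the parent's key so related rows sort next to each other in the keyspace. This is a design pattern under stated assumptions, not an official interleaving replacement; it assumes an open psycopg2 connection `conn`.

```python
# Hypothetical schema sketch: co-locating child rows with their parent by key prefix.
DDL = """
CREATE TABLE IF NOT EXISTS customers (
    customer_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name        STRING NOT NULL
);

CREATE TABLE IF NOT EXISTS orders (
    customer_id UUID NOT NULL REFERENCES customers (customer_id),
    order_id    UUID NOT NULL DEFAULT gen_random_uuid(),
    total       DECIMAL NOT NULL,
    -- Leading the primary key with customer_id keeps a customer's orders
    -- adjacent in the keyspace, giving interleaving-like locality.
    PRIMARY KEY (customer_id, order_id)
);
"""

def create_schema(conn):
    with conn, conn.cursor() as cur:
        cur.execute(DDL)
```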
```
HYBRID LOGICAL CLOCKS (HLC): HOW COCKROACHDB ORDERS EVENTS
═══════════════════════════════════════════════════════════════════

HLC Timestamp Structure
───────────────────────────────────────────────────────────────────
  Physical time (wall clock): 48 bits
  Logical counter:            16 bits

HLC Rules
───────────────────────────────────────────────────────────────────
1. Local event:
   new_hlc.physical = max(current_hlc.physical, wall_clock)
   new_hlc.logical  = current_hlc.logical + 1 if physical unchanged, else 0

2. Receive event (message carries timestamp msg_hlc):
   new_hlc.physical = max(current_hlc.physical, msg_hlc.physical, wall_clock)
   new_hlc.logical  = depends on which physical time was the maximum

Example Timeline
───────────────────────────────────────────────────────────────────
Node A (wall clock accurate):
  T=100ms: Local write → HLC = (100, 0)
  T=105ms: Send to B   → HLC = (105, 0)

Node B (wall clock 10ms behind):
  T=95ms:  Wall clock shows 95, but receives HLC = (105, 0)
           → Must advance HLC to at least (105, 1)
  T=96ms:  Local write → HLC = (105, 2)   [physical time cannot go backwards]

Result: B's events are correctly ordered AFTER A's, despite clock skew

Uncertainty Window Handling
───────────────────────────────────────────────────────────────────
When reading at read_timestamp, values with timestamps in the window

  (read_timestamp, read_timestamp + max_clock_offset]

are uncertain: they may have committed before the read in real time.
If uncertain values exist:
  1. Refresh the read at a higher timestamp, OR
  2. Wait until the uncertainty window passes

This ensures read consistency without atomic clocks.

Configuration
───────────────────────────────────────────────────────────────────
--max-offset=500ms        # Maximum tolerated clock skew
--clock-device=/dev/ptp0  # Optional: use a PTP hardware clock for better sync

Cloud environments typically achieve <10ms skew with NTP,
making the 500ms default extremely conservative.
```

HLC achieves the same ordering guarantees as TrueTime but with worse bounds on uncertainty, so some reads may need to wait or retry when values fall in the uncertainty window. In practice, with well-tuned NTP, this happens rarely; for latency-critical applications, high-accuracy time sources such as Amazon Time Sync Service or Google Cloud's internal NTP servers can tighten the bounds further.
CockroachDB's open-source nature isn't just a licensing detail—it fundamentally shapes how organizations evaluate, deploy, and trust the database.
Transparency and Auditability
For a database handling financial transactions, healthcare records, or sensitive user data, organizations need to verify its behavior:
This transparency is impossible with proprietary databases like Spanner (GCP-only) or Aurora (AWS-only).
Community and Ecosystem
CockroachDB's community contributes:
No Vendor Lock-in
Perhaps most importantly, open source means freedom:
| Edition | License | Features | Support | Use Case |
|---|---|---|---|---|
| CockroachDB Core | BSL (converts to Apache 2.0) | Full database functionality | Community only | Development, testing, small deployments |
| CockroachDB Enterprise | Enterprise License | Core + backup, CDC, SSO | Cockroach Labs support | Production with advanced features |
| CockroachDB Dedicated | Managed Service | Fully managed clusters | Full support included | Managed production workloads |
| CockroachDB Serverless | Managed Service | Pay-per-use, auto-scaling | Full support included | Variable workloads, getting started |
The Business Source License (BSL)
CockroachDB uses the Business Source License, which has important implications:
This prevents cloud providers from competing directly with Cockroach Labs while keeping the software accessible for most organizations.
The Competitive Landscape
CockroachDB competes with cloud-proprietary distributed databases such as Google Cloud Spanner and Amazon Aurora, as well as other distributed SQL systems.
Each has tradeoffs. CockroachDB's combination of Spanner-like architecture, PostgreSQL compatibility, and open-source licensing makes it unique.
Open source is particularly valuable for: (1) organizations with strict audit requirements, (2) companies wanting multi-cloud or hybrid deployments, (3) engineering teams who want to understand and debug the database deeply, and (4) anyone concerned about long-term vendor viability.
CockroachDB's promise of Spanner-like capabilities has attracted organizations across industries. Understanding how real companies use CockroachDB illustrates its practical capabilities.
Netflix: Global Content Delivery
Netflix uses CockroachDB for metadata management in their content delivery infrastructure:
Comcast: Customer Service Platforms
Comcast migrated critical customer-facing applications to CockroachDB:
Bose: IoT Data Management
Bose uses CockroachDB for their connected audio products:
Common Adoption Patterns
Organizations typically adopt CockroachDB following these patterns:
Pattern 1: PostgreSQL Replacement
Pattern 2: Microservices Database
Pattern 3: Global Application Database
Pattern 4: Disaster Recovery Upgrade
Most CockroachDB adoptions are migrations from existing databases, not greenfield. PostgreSQL compatibility is crucial—organizations report that 80-90% of queries run unchanged. The remainder require minor adjustments for CockroachDB's distributed nature (e.g., explicit primary keys, avoiding certain PostgreSQL extensions).
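As one hypothetical example of such an adjustment, migrations often replace sequential SERIAL keys with UUID keys so that inserts spread across ranges rather than concentrating on the newest one; the table below is invented for illustration.

```python
# Hypothetical before/after DDL for a PostgreSQL-to-CockroachDB migration.
# The table is illustrative; the change is one of the common "minor adjustments".

POSTGRES_DDL = """
CREATE TABLE events (
    id      SERIAL PRIMARY KEY,      -- monotonically increasing ids
    payload JSONB
);
"""

COCKROACH_DDL = """
CREATE TABLE events (
    id      UUID PRIMARY KEY DEFAULT gen_random_uuid(),  -- spreads writes across ranges
    payload JSONB
);
"""
```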
We've explored CockroachDB's origins and the vision that drives its design. Let's consolidate the key insights:
What's Next:
Understanding CockroachDB's origins and design philosophy sets the foundation. In the next page, we'll dive deep into Distributed SQL—how CockroachDB provides full SQL semantics across a distributed cluster, including query routing, DistSQL execution, and the techniques that make distributed joins efficient.
You now understand CockroachDB's genesis as a Spanner-inspired open-source database, the key adaptations that make it viable on commodity infrastructure, and its architectural philosophy. Next, we'll explore how CockroachDB achieves distributed SQL—the ability to run complex queries across a globally distributed cluster.