Named after the creature that survives nuclear apocalypse, CockroachDB was designed to survive any infrastructure failure. Founded by ex-Google engineers who worked on Spanner, CockroachDB brings globally-distributed, serializable transactions to organizations without requiring Google-scale infrastructure or vendor lock-in.
CockroachDB is source-available, PostgreSQL wire-compatible, and designed to run on commodity hardware in any cloud or on-premises.
This accessibility has made CockroachDB one of the most widely adopted NewSQL databases, used by companies like Netflix, Bose, SpaceX, and thousands of startups requiring reliable distributed databases.
By the end of this page, you will understand CockroachDB's architecture, how it achieves Spanner-like consistency without specialized hardware (using hybrid logical clocks), its PostgreSQL compatibility layer, data distribution mechanisms, and when to choose CockroachDB for your applications.
CockroachDB was founded in 2015 by Spencer Kimball, Peter Mattis, and Ben Darnell—all former Google engineers with deep distributed systems experience.
The Founding Insight
The founders believed that Spanner's innovations shouldn't require Google's specialized infrastructure. If the core architectural principles could work with commodity hardware and without atomic clocks, every organization could benefit from globally-distributed ACID transactions.
Key Design Goals
The founders set out to make data easy: scale horizontally by simply adding nodes, survive machine, datacenter, and regional failures without manual intervention, guarantee serializable ACID transactions, and expose everything through familiar SQL.
The Name
The name 'CockroachDB' reflects the resilience goal: cockroaches famously survive conditions that would kill other species. The database is designed to survive node failures, network partitions, and even regional outages while maintaining correctness.
Open Source Model
CockroachDB uses the Business Source License (BSL): the source code is publicly available and free for most uses, each release automatically converts to a permissive open-source license after three years, and the main restriction is offering CockroachDB itself as a commercial database-as-a-service.
This model balances open development with sustainable business.
By choosing PostgreSQL wire compatibility, CockroachDB inherits decades of tooling investment. Applications can often migrate by changing only the connection string. Popular ORMs (SQLAlchemy, Hibernate, ActiveRecord), drivers (psycopg2, JDBC, Go pq), and tools (pgAdmin, DBeaver) work directly.
CockroachDB uses a layered, symmetric architecture where every node is equal and can serve any request.
The Layered Stack
CockroachDB's architecture consists of five major layers: SQL (parsing, planning, and optimizing queries), Transactional (ACID semantics across arbitrary keys), Distribution (presenting a single sorted key space divided into ranges), Replication (Raft consensus per range), and Storage (a per-node embedded key-value store).
Symmetric Architecture
Unlike systems with dedicated master/coordinator nodes, every CockroachDB node is identical: any node can accept client connections, plan and execute SQL, route requests to the relevant data, and store replicas. There are no special master, coordinator, or metadata nodes to provision or fail over.
This symmetry simplifies operations: add nodes, remove nodes, and CockroachDB automatically rebalances.
Ranges: The Unit of Distribution
CockroachDB divides data into ranges: contiguous, sorted spans of the key space. A table starts as a single range, splits automatically as it grows (512MB by default), and each range is replicated (three ways by default) with its own Raft consensus group. The sketch below shows the basic key-to-range lookup.
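To make that concrete, here is a minimal Go sketch, not CockroachDB's actual code, of locating the range that owns a key by binary-searching sorted range descriptors; the `RangeDescriptor` type and node IDs are illustrative assumptions.

```go
package main

import (
    "fmt"
    "sort"
)

// RangeDescriptor is a simplified model of a range: a contiguous,
// sorted span of the key space [StartKey, EndKey).
type RangeDescriptor struct {
    StartKey string
    EndKey   string
    Replicas []int // node IDs holding replicas (3 by default)
}

// findRange locates the range owning key via binary search over range
// start keys. CockroachDB performs this lookup against meta ranges;
// this sketch assumes all descriptors fit in memory.
func findRange(ranges []RangeDescriptor, key string) RangeDescriptor {
    // ranges must be sorted by StartKey and cover the key space.
    i := sort.Search(len(ranges), func(i int) bool {
        return ranges[i].StartKey > key
    })
    return ranges[i-1] // last range whose StartKey <= key
}

func main() {
    ranges := []RangeDescriptor{
        {StartKey: "", EndKey: "g", Replicas: []int{1, 2, 3}},
        {StartKey: "g", EndKey: "p", Replicas: []int{2, 3, 4}},
        {StartKey: "p", EndKey: "\xff", Replicas: []int{1, 3, 5}},
    }
    r := findRange(ranges, "k")
    fmt.Printf("key %q -> range [%q, %q) on nodes %v\n", "k", r.StartKey, r.EndKey, r.Replicas)
}
```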
CockroachDB's key innovation over Spanner is achieving similar consistency guarantees without specialized time hardware. It uses Hybrid Logical Clocks (HLC) instead of TrueTime.
HLC Structure
A Hybrid Logical Clock timestamp has two components:
```go
// Hybrid Logical Clock timestamp
type Timestamp struct {
    // Physical part: wall clock time in nanoseconds.
    // Advances with real time, subject to clock skew.
    WallTime int64

    // Logical part: counter for ordering events at the same wall time.
    // Ensures causal ordering even with clock skew.
    Logical int32
}

// Compare two timestamps
func (a Timestamp) Less(b Timestamp) bool {
    if a.WallTime != b.WallTime {
        return a.WallTime < b.WallTime
    }
    return a.Logical < b.Logical
}

// Example timestamps:
// T1 = {WallTime: 1705000000000000000, Logical: 0}
// T2 = {WallTime: 1705000000000000000, Logical: 1}
// T3 = {WallTime: 1705000000000000001, Logical: 0}
// Ordering: T1 < T2 < T3
```

How HLC Works
HLC provides three key guarantees:
Physical time tracking: The WallTime component tracks real time, ensuring timestamps don't drift too far from actual time.
Causal ordering: If event A happens-before event B (causally), then timestamp(A) < timestamp(B). The Logical counter ensures this even when WallTime is equal.
Bounded uncertainty: CockroachDB knows the maximum clock skew between nodes (configured by max-offset, typically 500ms). This enables uncertainty windows for transaction ordering.
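The causal-ordering guarantee is easiest to see in code. Below is a compact Go sketch of the HLC update rules following the published algorithm (Kulkarni et al.), not CockroachDB's exact implementation: `Now` handles local/send events, and `Update` merges a remote timestamp so the result always orders after both clocks.

```go
package main

import (
    "fmt"
    "time"
)

// A minimal hybrid logical clock (single-threaded for simplicity).
type HLC struct {
    WallTime int64 // highest physical time observed, in nanoseconds
    Logical  int32 // ties broken by a logical counter
}

// Now advances the clock for a local or send event.
func (c *HLC) Now() HLC {
    pt := time.Now().UnixNano()
    if pt > c.WallTime {
        c.WallTime, c.Logical = pt, 0
    } else {
        c.Logical++ // physical clock hasn't advanced; use the counter
    }
    return *c
}

// Update merges a timestamp received from another node, guaranteeing
// that our next timestamp orders after both clocks (causality).
func (c *HLC) Update(remote HLC) HLC {
    pt := time.Now().UnixNano()
    switch {
    case pt > c.WallTime && pt > remote.WallTime:
        c.WallTime, c.Logical = pt, 0
    case remote.WallTime > c.WallTime:
        c.WallTime, c.Logical = remote.WallTime, remote.Logical+1
    case c.WallTime > remote.WallTime:
        c.Logical++
    default: // equal wall times: advance past both logical counters
        if remote.Logical > c.Logical {
            c.Logical = remote.Logical
        }
        c.Logical++
    }
    return *c
}

func main() {
    var a, b HLC
    t1 := a.Now()      // event on node A
    t2 := b.Update(t1) // message carrying t1 received on node B
    fmt.Println(t1, t2) // t2 is guaranteed to order after t1
}
```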
HLC vs TrueTime Trade-offs
| Aspect | TrueTime (Spanner) | HLC (CockroachDB) |
|---|---|---|
| Hardware required | GPS, atomic clocks | Standard servers |
| Clock uncertainty | ~1-7ms (GPS-bounded) | ~250-500ms (configured) |
| Commit latency impact | Wait ~7ms per commit | No commit wait; uncertain reads restart |
| External consistency | Guaranteed | Serializable (slightly weaker) |
| Infrastructure flexibility | Google Cloud only | Any cloud, on-prem, Kubernetes |
| Cost | Included in Cloud Spanner | No hardware overhead |
Handling Uncertainty in CockroachDB
Since CockroachDB can't bound clock uncertainty as tightly as Spanner, it handles potential conflicts differently:
Uncertainty Windows: when a transaction reads a value whose timestamp T is above its own read timestamp but within its uncertainty window (read timestamp + max-offset), it cannot tell whether that write happened before or after it started, so the reader must restart at a timestamp above T.

Transaction Restart: rather than waiting out uncertainty on every commit as Spanner does, CockroachDB restarts only the transactions that actually hit an uncertain read, as the pseudocode below shows:
```
// CockroachDB read with uncertainty handling
function read(key, txnTimestamp, maxOffset):
    value, valueTimestamp = storage.get(key)
    uncertaintyLimit = txnTimestamp + maxOffset

    if valueTimestamp > txnTimestamp:
        if valueTimestamp <= uncertaintyLimit:
            // Value is in the uncertainty window.
            // We can't tell if it was written before or after we started.
            throw ReadWithinUncertaintyIntervalError{
                readTimestamp: txnTimestamp,
                existingTimestamp: valueTimestamp,
                // Transaction will restart with timestamp > valueTimestamp
            }
        else:
            // Value is definitely in the future; don't read it.
            // Return an older version or none.
            return getPreviousVersion(key, txnTimestamp)

    return value  // Safe to read
```

In practice, uncertainty-related restarts are rare (<1% of transactions) because clock skew between properly-configured nodes is small. Using NTP with good time sources, the actual skew is often <10ms even with a 500ms configured maximum. The 500ms is a safety bound, not the common case.
CockroachDB implements serializable snapshot isolation (SSI) with innovations that make distributed transactions efficient.
The Write Intent Mechanism
Unlike traditional databases that use locks, CockroachDB uses write intents—provisional writes that mark data as being modified by an in-progress transaction:
```
// Write Intent Structure
WriteIntent {
    key:       []byte             // The key being written
    value:     []byte             // The provisional value
    timestamp: Timestamp          // Transaction timestamp
    txnRecord: TxnRecordPointer   // Link to transaction record
    status:    PENDING | COMMITTED | ABORTED
}

// Write path
function write(key, value, txn):
    intent = WriteIntent{
        key: key,
        value: value,
        timestamp: txn.timestamp,
        txnRecord: txn.recordLocation,
        status: PENDING
    }
    storage.put(key, intent)

// Read path with conflict handling
function read(key, txn):
    result = storage.get(key)
    if result.isIntent():
        intent = result
        if intent.txnRecord.id == txn.id:
            // Reading our own write
            return intent.value
        else:
            // Another transaction's intent - handle conflict
            return handleConflict(intent, txn)
    return result.value
```

Conflict Resolution
When a transaction encounters another transaction's write intent, CockroachDB performs contention management: it consults the intent's transaction record; if that transaction is already COMMITTED or ABORTED, the intent is resolved in place and the read proceeds; if it is still PENDING, the encountering transaction either waits in a queue or attempts to push the writer based on relative priorities. A simplified sketch follows.
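Here is a heavily simplified Go sketch of that decision, an illustration under assumed types (`TxnRecord`, plain integer priorities), not CockroachDB's real concurrency-control code:

```go
package main

import "fmt"

type TxnStatus int

const (
    PENDING TxnStatus = iota
    COMMITTED
    ABORTED
)

// TxnRecord is the authoritative record of a transaction's status.
type TxnRecord struct {
    ID       int
    Status   TxnStatus
    Priority int
}

// handleConflict models what happens when a reader hits another
// transaction's write intent. The real logic also dispatches to a
// lock wait-queue and can push timestamps instead of aborting.
func handleConflict(writer, reader *TxnRecord) string {
    switch writer.Status {
    case COMMITTED:
        return "resolve intent to a committed value, then read it"
    case ABORTED:
        return "remove the orphaned intent, then read the older version"
    default: // PENDING: try to push the writer
        if reader.Priority > writer.Priority {
            writer.Status = ABORTED // push succeeds: abort the writer
            return "writer pushed aside; reader proceeds"
        }
        return "reader waits in the writer's wait-queue"
    }
}

func main() {
    writer := &TxnRecord{ID: 1, Status: PENDING, Priority: 1}
    reader := &TxnRecord{ID: 2, Priority: 2}
    fmt.Println(handleConflict(writer, reader))
}
```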
Parallel Commits
CockroachDB's parallel commits optimization reduces commit latency:
```
// Traditional 2PC (2 consensus rounds)
function traditionalCommit(txn):
    // Round 1: Write transaction record as STAGING
    writeTransactionRecord(txn.id, STAGING)

    // Round 2: After intents replicated, flip to COMMITTED
    wait for intent replication
    writeTransactionRecord(txn.id, COMMITTED)
    // Then async: resolve intents to regular values

// Parallel Commits (1 consensus round)
function parallelCommit(txn):
    // Write transaction record AND intents in parallel
    parallel {
        writeTransactionRecord(txn.id, STAGING, inFlightWrites=[...])
        writeIntent(key1, value1, txn)
        writeIntent(key2, value2, txn)
        // ... all intents
    }

    // Transaction is considered committed when:
    // - Transaction record shows STAGING
    // - ALL in-flight writes succeeded
    // Readers can verify this by checking all listed intents

    // Async cleanup: flip STAGING -> COMMITTED, resolve intents
```

| Isolation Level | Anomalies Prevented | CockroachDB Support |
|---|---|---|
| Serializable | All (strongest) | Default, recommended |
| Read Committed | Dirty reads | Supported (23.2+) |
| Snapshot Isolation | Write skew possible | Via serializable with caveats |
| Read Uncommitted | None | Not supported |
Unlike most databases that default to weaker isolation, CockroachDB defaults to serializable isolation. This eliminates entire classes of concurrency bugs but may require application awareness of transaction restarts. The automatic retry mechanism handles most cases transparently.
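Since applications must be prepared for restart errors, a client-side retry loop is standard practice. Below is a minimal Go sketch using database/sql with the lib/pq driver; the connection string, table, and amounts are placeholders, not a real cluster:

```go
package main

import (
    "database/sql"
    "errors"
    "fmt"
    "log"

    "github.com/lib/pq" // any PostgreSQL driver works; pq is one option
)

// runTransaction retries fn whenever CockroachDB asks the client to
// retry (SQLSTATE 40001, serialization_failure).
func runTransaction(db *sql.DB, fn func(*sql.Tx) error) error {
    for {
        tx, err := db.Begin()
        if err != nil {
            return err
        }
        err = fn(tx)
        if err == nil {
            err = tx.Commit()
        }
        if err == nil {
            return nil
        }
        tx.Rollback()

        var pqErr *pq.Error
        if errors.As(err, &pqErr) && pqErr.Code == "40001" {
            continue // restart the transaction at a newer timestamp
        }
        return err // any other error goes back to the caller
    }
}

func main() {
    db, err := sql.Open("postgres",
        "postgresql://user:pass@localhost:26257/defaultdb?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    err = runTransaction(db, func(tx *sql.Tx) error {
        _, err := tx.Exec(
            "UPDATE accounts SET balance = balance - $1 WHERE id = $2",
            100, 1)
        return err
    })
    fmt.Println("transfer result:", err)
}
```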
CockroachDB implements the PostgreSQL wire protocol and a large subset of PostgreSQL SQL syntax, enabling seamless migration for many applications.
Wire Protocol Compatibility
CockroachDB speaks PostgreSQL wire protocol v3:
```python
# Python connection example (identical to PostgreSQL)
import psycopg2

# Connection string format
conn = psycopg2.connect(
    host="your-cluster.cockroachlabs.cloud",
    port=26257,
    database="defaultdb",
    user="your_username",
    password="your_password",
    sslmode="verify-full"
)

# Standard PostgreSQL queries work
cursor = conn.cursor()
cursor.execute("""
    SELECT id, name, balance
    FROM accounts
    WHERE balance > %s
    ORDER BY balance DESC
    LIMIT 10
""", (10000,))

for row in cursor.fetchall():
    print(row)

conn.close()
```

SQL Compatibility
CockroachDB supports most PostgreSQL SQL features:
```sql
-- Standard PostgreSQL syntax works
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email STRING UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT current_timestamp(),
    metadata JSONB
);

-- Window functions
SELECT
    id,
    amount,
    SUM(amount) OVER (
        PARTITION BY user_id
        ORDER BY created_at
        ROWS UNBOUNDED PRECEDING
    ) AS running_total
FROM transactions;

-- CTE with recursion
WITH RECURSIVE org_tree AS (
    SELECT id, name, parent_id, 1 AS depth
    FROM organizations WHERE parent_id IS NULL
    UNION ALL
    SELECT o.id, o.name, o.parent_id, ot.depth + 1
    FROM organizations o
    JOIN org_tree ot ON o.parent_id = ot.id
)
SELECT * FROM org_tree;

-- JSONB queries
SELECT * FROM users WHERE metadata @> '{"role": "admin"}';
```

Cockroach Labs provides a migration assessment tool that analyzes your PostgreSQL schema and queries to identify compatibility issues before migration. For most CRUD applications with standard SQL, migration is straightforward.
CockroachDB provides sophisticated multi-region capabilities for global applications, balancing latency, availability, and consistency.
Topology Patterns
CockroachDB supports three multi-region table locality patterns:
| Locality | Behavior | Read Latency | Write Latency | Best For |
|---|---|---|---|---|
| REGIONAL BY TABLE | Table pinned to home region | Fast in home region | Fast in home region | Regional data, compliance |
| REGIONAL BY ROW | Each row in its home region | Fast for local rows | Fast for local writes | User data, geo-sharding |
| GLOBAL | Optimized global reads | Fast everywhere | Slower (all regions) | Reference data, config |
```sql
-- Configure database for multi-region
ALTER DATABASE myapp SET PRIMARY REGION "us-east1";
ALTER DATABASE myapp ADD REGION "us-west1";
ALTER DATABASE myapp ADD REGION "eu-west1";

-- Regional by table: stays in primary region
CREATE TABLE config (
    key STRING PRIMARY KEY,
    value STRING
) LOCALITY REGIONAL BY TABLE IN PRIMARY REGION;

-- Regional by row: each row has a home region
CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email STRING,
    crdb_region crdb_internal_region NOT NULL DEFAULT 'us-east1'
) LOCALITY REGIONAL BY ROW;

-- Insert user with specific region
INSERT INTO users (id, email, crdb_region)
VALUES (gen_random_uuid(), 'alice@example.com', 'eu-west1');

-- Global table: fast reads everywhere
CREATE TABLE currency_rates (
    code STRING PRIMARY KEY,
    usd_rate DECIMAL
) LOCALITY GLOBAL;
```

Survival Goals
CockroachDB can be configured with different survival goals: ZONE survivability (the default) keeps the database fully available through the loss of an availability zone, while REGION survivability (`ALTER DATABASE myapp SURVIVE REGION FAILURE`) tolerates losing an entire region at the cost of higher write latency, and requires at least three regions.
Global Tables Deep Dive
Global tables are optimized for read-heavy reference data: CockroachDB maintains replicas in every region and performs writes at slightly future timestamps, so every region can serve consistent, low-latency local reads while writes pay the cost of cross-region coordination.
Cross-region writes add ~100-300ms latency due to Raft consensus across regions. Regional tables with locality-aware workloads achieve single-digit millisecond latency. Plan schema design around access patterns: keep frequently co-accessed data in the same region.
CockroachDB minimizes operational burden through self-managing features.
Online Schema Changes
Unlike traditional databases that require downtime for DDL operations, CockroachDB performs schema changes online: DDL statements run as asynchronous background jobs that roll the change out through compatible intermediate states (following the protocol described in Google's F1 paper), so tables remain fully readable and writable throughout.
Automatic Operations
CockroachDB handles many tasks automatically:
| Operation | CockroachDB | Traditional RDBMS |
|---|---|---|
| Replication | Automatic, configurable factor | Manual setup and monitoring |
| Failover | Automatic via Raft | Manual or HA proxy setup |
| Rebalancing | Automatic based on load/size | Manual shard management |
| Range splitting | Automatic at 512MB | Manual partition management |
| Recovery | Automatic from healthy replicas | Point-in-time restore |
| Upgrades | Rolling upgrade, no downtime | Scheduled maintenance windows |
Built-in Observability
CockroachDB includes a rich admin UI and metrics infrastructure: the built-in DB Console (served on each node's HTTP port) visualizes cluster health, query performance, and hot ranges, and every node exports Prometheus-format metrics alongside structured logs. A quick way to confirm this is to scrape the metrics endpoint, as sketched below.
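A small Go sketch of scraping a node's Prometheus endpoint; it assumes a local insecure node with its HTTP port on 8080, as in the deployment examples later on:

```go
package main

import (
    "bufio"
    "fmt"
    "log"
    "net/http"
)

func main() {
    // Each CockroachDB node serves Prometheus-format metrics on its
    // HTTP port; localhost:8080 assumes a local single-node cluster.
    resp, err := http.Get("http://localhost:8080/_status/vars")
    if err != nil {
        log.Fatal(err)
    }
    defer resp.Body.Close()

    // Print the first few metric lines as a sanity check.
    scanner := bufio.NewScanner(resp.Body)
    for i := 0; scanner.Scan() && i < 10; i++ {
        fmt.Println(scanner.Text())
    }
}
```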
CockroachDB's operational simplicity shines in 'Day 2' scenarios. Adding nodes is: start new node, point at cluster, done. Removing nodes: drain and decommission. Upgrading: rolling restart with new binary. These operations are routine, not events.
CockroachDB offers flexibility in how you deploy and manage your database.
Self-Hosted Deployments
Run CockroachDB yourself on bare metal, VMs, or Kubernetes (via the official operator or Helm chart), on any cloud or on-premises, using the binaries or container images Cockroach Labs publishes.
Managed Services (CockroachDB Cloud)
| Tier | Best For | Pricing Model | Features |
|---|---|---|---|
| Serverless | Development, small workloads | Pay-per-request | Auto-scaling, 5GB free, instant start |
| Dedicated | Production workloads | Fixed node hours | Dedicated resources, SLA, multi-region |
| Self-Hosted + Support | Enterprise on-prem | License fee | Full control + vendor support |
```bash
# Local development (single-node)
cockroach start-single-node --insecure

# 3-node local cluster (for testing; run each node in its own terminal)
cockroach start --insecure --store=node1 --listen-addr=localhost:26257 --http-addr=localhost:8080 --join=localhost:26257,localhost:26258,localhost:26259
cockroach start --insecure --store=node2 --listen-addr=localhost:26258 --http-addr=localhost:8081 --join=localhost:26257,localhost:26258,localhost:26259
cockroach start --insecure --store=node3 --listen-addr=localhost:26259 --http-addr=localhost:8082 --join=localhost:26257,localhost:26258,localhost:26259

cockroach init --insecure --host=localhost:26257

# Production Kubernetes (using operator)
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/master/install/crds.yaml
kubectl apply -f cockroachdb-cluster.yaml
```

CockroachDB Cloud runs on AWS, GCP, and Azure. Unlike Cloud Spanner (GCP-only), you can deploy CockroachDB on any major cloud or on-premises, enabling multi-cloud and hybrid cloud strategies.
CockroachDB demonstrates that Spanner's innovations are achievable without Google-scale infrastructure. Let's consolidate the key insights and compare with alternatives.
| Feature | CockroachDB | Cloud Spanner | PostgreSQL |
|---|---|---|---|
| Horizontal scale | ✓ Yes | ✓ Yes | ✗ Vertical only |
| Distributed ACID | ✓ Serializable | ✓ External consistency | ✓ Single-node |
| Multi-region | ✓ Yes | ✓ Yes (cloud) | Manual replication |
| SQL compatibility | PostgreSQL | GoogleSQL | PostgreSQL |
| Self-hosted option | ✓ Yes | ✗ No (GCP only) | ✓ Yes |
| Licensing | BSL (open core) | Proprietary | PostgreSQL License |
| Minimum cost | $0 (self-hosted) | ~$650/month | $0 |
What's Next
With understanding of both Spanner and CockroachDB, we'll conclude with NewSQL Use Cases—identifying when NewSQL is the right choice and when traditional SQL or NoSQL better fits your requirements.
You now understand CockroachDB's architecture, how it achieves Spanner-like capabilities without specialized hardware, and its PostgreSQL compatibility. CockroachDB represents the democratization of distributed SQL—bringing NewSQL benefits to organizations of all sizes.