For decades, distributed transaction support was the holy grail of database systems—and the graveyard of many attempts. The fundamental challenge: how do you ensure that a transaction affecting data in Tokyo, Frankfurt, and Virginia either commits everywhere or nowhere, without unacceptable latency or availability loss?
Traditional two-phase commit (2PC) protocols solve the atomicity problem but create new ones: they require all participants to be available, they hold locks during coordination, and coordinator failures can leave transactions in limbo. At global scale, these problems become severe.
Spanner's distributed transaction protocol synthesizes several innovations: two-phase commit layered on Paxos-replicated participants, TrueTime-based commit timestamps with commit-wait, strict two-phase locking with wound-wait deadlock prevention, and lock-free snapshot reads for read-only work.
The result is a system that provides serializable, externally consistent transactions spanning the globe—something that was considered impractical before Spanner demonstrated it at scale.
By the end of this page, you will understand Spanner's transaction protocol in detail—from lock acquisition through commit. You'll see how 2PC and Paxos work together, how deadlocks are prevented, and how the system maintains performance despite global coordination requirements.
Before diving into the distributed case, let's establish how transactions work within a single Paxos group.
Single-Group Transactions:
When a transaction only touches data managed by one Paxos group (common for well-designed schemas using interleaved tables), the protocol is straightforward: the group's leader acquires the necessary locks, buffers the transaction's writes, assigns a commit timestamp from TrueTime, logs the commit through Paxos, performs commit-wait, applies the mutations, and releases the locks.
This single-group path is highly optimized. The Paxos replication ensures durability (transaction survives server failures), and the commit-wait ensures external consistency.
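The single-group commit path can be sketched as follows. This is an illustrative simulation, not Spanner's implementation: `tt_now`, the `lock_table`, the `paxos_log`, and the `txn` object are all stand-ins.

```python
import time

# Illustrative TrueTime stand-in: an interval [earliest, latest] around the
# physical clock with a fixed uncertainty bound.
EPSILON = 0.004  # ~4ms, illustrative

def tt_now():
    t = time.time()
    return (t - EPSILON, t + EPSILON)  # (earliest, latest)

def commit_single_group(lock_table, paxos_log, txn):
    # 1. Upgrade/acquire write locks for the buffered mutations.
    for key in txn.write_set:
        lock_table.acquire_exclusive(key, txn)
    # 2. Assign the commit timestamp: no earlier than any time that could
    #    already have been observed anywhere (TT.now().latest).
    s = tt_now()[1]
    # 3. Replicate the commit record through Paxos for durability.
    paxos_log.append(("COMMIT", txn.id, s, txn.write_set))
    # 4. Commit-wait: block until s is guaranteed to be in the past
    #    everywhere, i.e. TT.now().earliest > s.
    while tt_now()[0] <= s:
        time.sleep(0.001)
    # 5. Apply mutations and release locks; the writes are now visible.
    txn.apply()
    for key in txn.write_set:
        lock_table.release(key, txn)
    return s
```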
Transaction Types:
Spanner supports several transaction types, each optimized for its use case:
Read-Write Transactions: Full ACID transactions that can read and modify data. Required for any writes.
Read-Only Transactions: Transactions that only read data. Execute at a consistent snapshot without blocking writes.
DML Statements: Individual SQL statements (INSERT, UPDATE, DELETE) executed as implicit transactions.
Partitioned DML: Bulk operations that execute in parallel across shards, with weaker atomicity guarantees but higher throughput.
| Type | Can Write? | Locks? | Serializable? | Cross-Group? | Typical Use |
|---|---|---|---|---|---|
| Read-Write Txn | Yes | Yes (read/write) | Yes | Yes (with 2PC) | Updates requiring consistency |
| Read-Only Txn | No | No | Yes (snapshot) | Yes (no 2PC needed) | Reports, analytics, bulk reads |
| Single DML | Yes | Yes | Yes | Yes (with 2PC) | Simple single-statement updates |
| Partitioned DML | Yes | Per-partition | Per-row | Yes (parallel) | Bulk updates, data cleanup |
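As a rough illustration of how these types surface in client code, here is a sketch using the Cloud Spanner Python client (`google-cloud-spanner`); the instance, database, table, and column names are placeholders.

```python
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-db")

# Read-write transaction: full ACID semantics; the work function may be
# retried by the client library if the transaction is aborted.
def transfer(transaction):
    transaction.execute_update(
        "UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    transaction.execute_update(
        "UPDATE accounts SET balance = balance + 100 WHERE id = 2")

database.run_in_transaction(transfer)

# Read-only transaction: lock-free snapshot read, never blocks writers.
with database.snapshot(multi_use=True) as snapshot:
    rows = list(snapshot.execute_sql("SELECT id, balance FROM accounts"))

# Partitioned DML: bulk update with weaker (per-partition) atomicity.
row_count = database.execute_partitioned_dml(
    "UPDATE accounts SET flagged = TRUE WHERE balance < 0")
```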
Concurrency Control: Two-Phase Locking:
Spanner uses strict two-phase locking (S2PL) for concurrency control: reads acquire shared locks, writes acquire exclusive locks, and no lock is released until the transaction finishes.
Locks are held until the transaction commits or aborts, which ensures serializability: no other transaction can modify data this transaction has read, and no other transaction can observe this transaction's writes before they are committed.
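A minimal sketch of the shared/exclusive compatibility rule that strict 2PL relies on; the lock-manager details here are illustrative, not Spanner's actual implementation.

```python
# Shared (read) locks are compatible with each other; exclusive (write)
# locks conflict with everything.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested_mode, held_modes):
    # A lock can be granted only if it is compatible with every lock
    # currently held on the same item.
    return all(COMPATIBLE[(requested_mode, h)] for h in held_modes)

assert can_grant("S", ["S", "S"])   # concurrent readers are fine
assert not can_grant("X", ["S"])    # a writer must wait for readers
```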
Lock Granularity:
Spanner supports multiple lock granularities:
Read-only transactions execute at a snapshot timestamp and don't acquire locks. This means they never block writes and can be served by any replica. For read-heavy workloads, this is a massive performance optimization.
When a transaction spans multiple Paxos groups—for example, transferring money between two users in different regions—Spanner uses two-phase commit (2PC) to ensure atomicity.
The Participants: every Paxos group whose data the transaction touches takes part. One of them is chosen as the coordinator; the rest act as participants.
Phase 1: Prepare
Coordinator sends PREPARE to all participants
Each participant acquires the required locks, durably logs a PREPARED record through its own Paxos group, and replies PREPARED (with the highest timestamp it has seen) or ABORT
Coordinator waits for all participants to respond
Phase 2: Commit
If all participants responded PREPARED: the coordinator assigns the commit timestamp, performs commit-wait, durably logs COMMIT through its own Paxos group (the commit point), and sends COMMIT to every participant, which applies its writes and releases its locks
If any participant responded ABORT: the coordinator logs ABORT and notifies all participants, which discard their buffered writes and release their locks
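A schematic coordinator loop for the two phases. `send_prepare`, `send_commit`, and `send_abort` are assumed RPC helpers, and each durable log append would itself be a Paxos write in Spanner; the TrueTime parts of timestamp assignment are shown separately below.

```python
def two_phase_commit(coordinator_log, participants, txn):
    """Schematic 2PC driver; every durable step would be a Paxos write."""
    # Phase 1: Prepare. Each participant acquires locks and logs PREPARED
    # through its own Paxos group before voting.
    votes = []
    for p in participants:
        # Each vote is ("PREPARED", max_timestamp_seen) or ("ABORT", None).
        votes.append(p.send_prepare(txn))

    if all(vote == "PREPARED" for vote, _ in votes):
        # Phase 2a: choose a commit timestamp >= every prepare timestamp.
        s = max(ts for _, ts in votes)
        # Phase 2b: durably log the decision (the commit point), then notify.
        coordinator_log.append(("COMMIT", txn.id, s))
        for p in participants:
            p.send_commit(txn, s)      # participants apply writes, release locks
        return ("COMMITTED", s)
    else:
        coordinator_log.append(("ABORT", txn.id))
        for p in participants:
            p.send_abort(txn)          # participants discard writes, release locks
        return ("ABORTED", None)
```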
The Critical Innovation: Paxos-Backed 2PC
Traditional 2PC has a critical vulnerability: if the coordinator fails between receiving all PREPARED votes and writing COMMIT, the transaction is stuck. Participants can't safely commit (they don't know if all participants prepared) or abort (another participant might have already committed).
Spanner solves this by making the coordinator itself a Paxos group:
Consider a concrete example: transferring $100 from Account A (US-East) to Account B (EU-West).

Participants:
- Paxos group US-EAST (coordinator): leader US-E1, follower US-E2; holds Account A (balance $500).
- Paxos group EU-WEST (participant): leader EU-W1, follower EU-W2; holds Account B (balance $200).

Phase 1: Prepare
- 1a. The coordinator logs PREPARE via Paxos (replicated to US-E2) and sends PREPARE{txn, writes_A} to the participant.
- 1b. The participant acquires locks, logs PREPARED via its own Paxos group, and replies PREPARED{max_timestamp}.
- 1c. The coordinator handles its own writes locally: it acquires locks on Account A and logs PREPARED.

Phase 2: Commit
- 2a. The coordinator assigns the commit timestamp s = max(all participant timestamps, TT.now().latest).
- 2b. Commit-wait: the coordinator waits until TT.after(s) is true, ensuring s has passed everywhere.
- 2c. The coordinator logs COMMIT(s) via Paxos. This is the commit point: the transaction is now durably committed.
- 2d. The coordinator sends COMMIT{timestamp=s}; the participant logs COMMIT(s), applies its writes, and releases its locks.
- 2e. The coordinator applies its writes locally (Account A: $500 → $400), releases its locks, and receives the participant's ACK. Both accounts are now updated atomically at timestamp s.

Key insight: fault tolerance.
- If US-E1 (the coordinator leader) fails after Phase 1, US-E2 already has the PREPARE record in its Paxos log; it becomes the new leader and continues the protocol. The transaction is never stuck.
- If EU-W1 (the participant leader) fails after Phase 1, EU-W2 has the PREPARED record in its Paxos log; it becomes the new leader and can respond to the coordinator. The transaction completes normally.
- Every state transition is logged via Paxos before proceeding, making the entire protocol recoverable from any single failure.

Timestamp Coordination in 2PC:
A subtle but critical aspect of distributed transactions is timestamp assignment. The commit timestamp must be no smaller than any timestamp a participant has already used or observed, and no smaller than the coordinator's own TT.now().latest.
During prepare, each participant reports the maximum timestamp it has seen. The coordinator takes the maximum of all these values and TT.now().latest to assign the commit timestamp.
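The timestamp rule and commit-wait can be expressed compactly. The `tt_now()` interval clock below is an illustrative stand-in for TrueTime with a fixed uncertainty bound, not a real API.

```python
import time

EPSILON = 0.004  # illustrative TrueTime uncertainty bound, in seconds

def tt_now():
    t = time.time()
    return (t - EPSILON, t + EPSILON)   # (earliest, latest)

def assign_commit_timestamp(participant_prepare_timestamps):
    # The commit timestamp must be >= every participant's prepare timestamp
    # and >= any timestamp that could already have been observed anywhere.
    return max(participant_prepare_timestamps + [tt_now()[1]])

def commit_wait(s):
    # Block until TT.after(s): even the most pessimistic reading of any
    # clock in the fleet now agrees that s is in the past.
    while tt_now()[0] <= s:
        time.sleep(0.001)

# Example: two participants reported prepare timestamps; the coordinator
# picks s = max(t1, t2, TT.now().latest), waits out the uncertainty, and
# only then sends COMMIT(s).
s = assign_commit_timestamp([time.time() - 0.002, time.time() - 0.001])
commit_wait(s)
```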
Commit-Wait in Distributed Transactions:
The coordinator performs commit-wait after assigning the timestamp but before sending the COMMIT message. This means participants never perform their own commit-wait: by the time they receive COMMIT, the timestamp is already guaranteed to be in the past everywhere, and the coordinator's wait largely overlaps with the Paxos logging and network round trips the protocol requires anyway.
This optimization hides much of the commit-wait latency within the distributed protocol overhead.
The coordinator's location affects transaction latency since commit-wait happens there. Spanner strategically selects coordinators to minimize latency—often choosing the Paxos group geographically central to all participants.
Locking creates the possibility of deadlocks: Transaction A holds lock X and waits for lock Y, while Transaction B holds lock Y and waits for lock X. Neither can proceed.
Spanner prevents deadlocks using the wound-wait protocol—a priority-based scheme using transaction timestamps:
The Wound-Wait Rules: when a transaction requests a lock held by another, their start timestamps are compared. If the requester is older (smaller timestamp), it wounds the holder, which aborts and releases its locks; if the requester is younger, it simply waits.
Why This Prevents Deadlocks:
Consider the potential deadlock scenario from above: Transaction A holds lock X and wants lock Y, while Transaction B holds lock Y and wants lock X.
With wound-wait: if A is older than B, then when A requests lock Y it wounds B. B aborts, releases lock Y, and A proceeds.
Alternatively: if A is younger than B, A waits for lock Y. When B requests lock X, B (being older) wounds A; A aborts, releases lock X, and B proceeds.
In no scenario do both transactions wait for each other.
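The decision rule itself is tiny. A sketch, with plain integers standing in for transaction start timestamps:

```python
def resolve_conflict(requester_ts, holder_ts):
    """Wound-wait: smaller timestamp = older = higher priority."""
    if requester_ts < holder_ts:
        # The older transaction wounds the younger holder: the holder is
        # aborted, releases its locks, and will retry with its ORIGINAL
        # timestamp, so it keeps "aging" and eventually wins.
        return "WOUND_HOLDER"
    else:
        # The younger requester simply waits; it can never be waited on
        # by an older transaction, so no cycle can form.
        return "WAIT"

# T1 (ts=100) requests a lock held by T2 (ts=200): T1 wounds T2.
assert resolve_conflict(100, 200) == "WOUND_HOLDER"
# T2 (ts=200) requests a lock held by T1 (ts=100): T2 waits.
assert resolve_conflict(200, 100) == "WAIT"
```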
Implementation Details:
Scenario 1: an older transaction wants a lock held by a younger one. T1 (timestamp 100) wants Lock X, which is held by T2 (timestamp 200). T1 is older (100 < 200), so T1 has higher priority. Action: T1 "wounds" T2. T2 receives an abort signal, releases all of its locks, and rolls back; T1 acquires Lock X. T2 will retry later with the SAME timestamp (200), so it gains "age" relative to newly started transactions.

Scenario 2: a younger transaction wants a lock held by an older one. T2 (timestamp 200) wants Lock Y, which is held by T1 (timestamp 100). T2 is younger (200 > 100), so T2 has lower priority. Action: T2 waits. T2 blocks until T1 releases Lock Y; T1 continues normally, and when T1 commits or aborts, Lock Y is released and T2 acquires it.

Why deadlock is impossible: a deadlock requires a cycle such as T1 waits for T2 and T2 waits for T1. For T1 to wait for T2, T1 must be younger than T2; for T2 to wait for T1, T2 must be younger than T1. Both cannot hold, because timestamps form a total order. The wait-for graph is therefore always acyclic, so deadlocks cannot occur.

Example with multiple competing transactions: T1 (100) holds Lock A, T2 (150) holds Lock B, T3 (200) holds Lock C, and each wants a lock another holds. T1 wants Lock B (held by T2): 100 < 150, so T1 wounds T2 and T2 aborts. T1 then wants Lock C (held by T3): 100 < 200, so T1 wounds T3 and T3 aborts. T1 completes; T2 and T3 retry later with priorities 150 and 200 respectively and will eventually complete, so the system always makes progress.

Retry Behavior:
When a transaction is aborted due to being wounded, it is rolled back and retried, and the retry keeps the transaction's original timestamp. Its priority therefore only rises relative to newer transactions, so every transaction eventually wins its locks and completes.
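In application code, wounded transactions surface as aborted attempts. With the Cloud Spanner Python client, the `run_in_transaction` helper retries the work function when an attempt is aborted, which is why the function must be safe to run more than once. A sketch with placeholder names:

```python
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-db")

def update_profile(transaction):
    # May be executed more than once if the transaction is wounded and
    # retried, so it must not have side effects outside the transaction.
    transaction.execute_update(
        "UPDATE users SET email = 'new@email.com' WHERE user_id = 123")

# run_in_transaction re-invokes update_profile on aborted attempts
# (including wound-wait aborts) until the commit succeeds.
database.run_in_transaction(update_profile)
```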
The Cost of Wound-Wait:
Wound-wait has a cost: transactions may be aborted and retried. In high-contention scenarios, this can cause elevated abort rates, wasted work from rolled-back transactions, and higher tail latencies as retries accumulate.
However, for most workloads, wound-wait's overhead is minimal compared to deadlock detection and resolution approaches used by other systems.
The best way to minimize wound-wait overhead is to design for low contention: use interleaved tables to localize data, keep transactions short, and avoid hot rows that many transactions update. Well-designed schemas rarely experience significant abort rates.
Let's trace through a complete read-write transaction lifecycle, from client request to committed response.
Pre-Transaction: Session and Context
Before executing transactions, clients establish a session with Spanner. Sessions maintain:
The Transaction Lifecycle:
Example: update a user's email and log the change.

BEGIN TRANSACTION;
SELECT email FROM users WHERE user_id = 123; -- Read
UPDATE users SET email = 'new@email.com' WHERE user_id = 123;
INSERT INTO audit_log (user_id, action, timestamp) VALUES (123, 'email_update', CURRENT_TIMESTAMP); -- Write
COMMIT;

Phase 1: Transaction Start. The client calls BeginTransaction(readWrite) on the Spanner frontend and receives a unique transaction ID (txn_abc123) and a start timestamp (t_start = 1704672000). The start timestamp is used for wound-wait priority.

Phase 2: Read Phase. The client reads users row 123 through the frontend, which asks the leader of the users Paxos group to acquire a read lock on the row (the request may wait, or wound a conflicting transaction). Once the lock is granted, the leader reads the row and returns {email: old@...} to the client. Read locks prevent concurrent modifications but are compatible with other read locks, and the data is returned to the client for application logic.

Phase 3: Write Buffering. The client issues UPDATE(users, 123, email='new@...') and INSERT(audit_log, ...). The mutations are buffered in the transaction context and are not yet sent to any Paxos group, which lets the client issue multiple operations efficiently.

Phase 4: Commit (single Paxos group case). Because users and audit_log may be interleaved in the same Paxos group, the frontend sends a single PrepareAndCommit request with the buffered mutations and held locks to the group leader, which: (1) upgrades the read lock to a write lock, (2) validates that there are no conflicts, (3) assigns the commit timestamp s, (4) logs the commit via Paxos, (5) performs commit-wait until TT.after(s), (6) applies the mutations, and (7) releases the locks. The client receives COMMITTED with timestamp s.

Typical single-group latency breakdown: read phase 5-20ms (depends on data locality), client-side buffering ~0ms, lock upgrade 1-2ms, Paxos logging 5-10ms (depends on replica spread), commit-wait 4-7ms (TrueTime uncertainty), apply <1ms; total 15-40ms.

Multi-Group Transaction Flow:
When the transaction spans multiple Paxos groups (e.g., users in one group, audit_log in another), the flow adds 2PC coordination: one group is chosen as coordinator, each participant runs a prepare round (acquiring locks and logging PREPARED via its own Paxos group), and the coordinator then assigns the timestamp, performs commit-wait, logs COMMIT, and fans the decision out to the participants.
Latency Impact of Distribution:
Multi-group transactions add a prepare round trip to every participant, an extra Paxos write per participant for the PREPARED record, and the coordinator's commit decision and fan-out on top of the single-group path.
For geographically distributed groups, this can add 100-200ms compared to single-group transactions.
The difference between a single-group transaction (15-40ms) and a multi-group transaction (100-300ms) is dramatic. This is why Spanner's interleaved table design is so important—keeping related data in the same Paxos group avoids distributed transaction overhead.
Read-only transactions are a powerful optimization in Spanner. They provide serializable isolation without acquiring locks, enabling high read throughput without blocking writers.
How Read-Only Transactions Work: Spanner assigns the transaction a read timestamp, and every read executes against the consistent snapshot at that timestamp. No locks are taken, and any replica that is sufficiently caught up to the timestamp can serve the reads.
The Power of Lock-Free Reads:
Consider a report that reads from multiple tables:
SELECT SUM(balance) FROM accounts WHERE region = 'US';
SELECT COUNT(*) FROM transactions WHERE date = TODAY;
SELECT * FROM audit_log WHERE timestamp > YESTERDAY;
With a read-only transaction, all three queries observe the same consistent snapshot, no locks are acquired, concurrent writers are never blocked, and each read can be served by the replica closest to the client.
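With the Python client, the whole report can run inside one multi-use read-only snapshot. A sketch assuming the tables from the queries above, with the pseudo-values TODAY and YESTERDAY replaced by concrete GoogleSQL expressions and placeholder instance/database names:

```python
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-db")

# One multi-use read-only snapshot: all three queries observe the same
# consistent timestamp, acquire no locks, and never block writers.
with database.snapshot(multi_use=True) as snapshot:
    total = list(snapshot.execute_sql(
        "SELECT SUM(balance) FROM accounts WHERE region = 'US'"))
    count = list(snapshot.execute_sql(
        "SELECT COUNT(*) FROM transactions WHERE date = CURRENT_DATE()"))
    recent = list(snapshot.execute_sql(
        "SELECT * FROM audit_log "
        "WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)"))
```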
| Aspect | Read-Write Transaction | Read-Only Transaction |
|---|---|---|
| Locks | Read locks during reads, write locks at commit | No locks |
| Writer Blocking | Can block concurrent writers on same rows | Never blocks writers |
| Replica Choice | Reads must go to leader (for consistency) | Any replica with data at timestamp |
| Geographic Impact | Bound to leader location for reads | Can read from nearest replica |
| Commit Overhead | Paxos + Commit-wait | None (no commit phase) |
| Abort Possible? | Yes (conflicts, deadlocks) | No (nothing to abort) |
Strong vs. Stale Reads:
The timestamp selection for read-only transactions creates different consistency-latency tradeoffs:
Strong Reads: the read timestamp reflects every transaction committed before the read began. This may require contacting the leader or waiting for a replica to catch up, which adds latency, especially across regions.
Bounded Staleness: the application specifies a maximum acceptable staleness (for example, 10 seconds), and Spanner chooses the freshest timestamp the serving replica can provide within that bound, usually avoiding any leader round trip.
Exact Staleness: the application names a precise timestamp or a fixed duration in the past. Any replica that has caught up to that timestamp can answer immediately, making this the cheapest and most predictable option.
Multi-Region Optimization:
For globally deployed applications, stale reads provide dramatic performance improvements: a strong read from a distant region may need to reach a faraway leader and take 50-150ms, while a stale read is answered by the nearest replica in 1-10ms (see the latency table below).
For read-heavy workloads where slight staleness is acceptable, this 30x improvement transforms user experience.
A common pattern: use strong reads for critical user-facing operations (checking account balance before transfer) but stale reads for non-critical reads (displaying recent transactions). This provides strong consistency where it matters while optimizing performance elsewhere.
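A sketch of this mixed pattern with the Python client: the default snapshot gives a strong read for the balance check, while `exact_staleness` serves the transaction list from a nearby replica. Table and column names and the 15-second bound are placeholders.

```python
import datetime
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-db")

# Strong read: reflects everything committed before the read started.
# Used here for the balance check that gates a transfer.
with database.snapshot() as snapshot:
    balance = list(snapshot.execute_sql(
        "SELECT balance FROM accounts WHERE id = 1"))

# Stale read: served at a timestamp 15 seconds in the past, so the
# nearest replica can answer; fine for a "recent transactions" display.
with database.snapshot(
        exact_staleness=datetime.timedelta(seconds=15)) as snapshot:
    recent = list(snapshot.execute_sql(
        "SELECT * FROM transactions WHERE account_id = 1 "
        "ORDER BY ts DESC LIMIT 20"))
```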
Understanding Spanner's performance characteristics helps you design schemas and applications that work with the system rather than against it.
Latency Components:
Total transaction latency comprises several components: network round trips, Paxos replication, 2PC coordination for multi-group transactions, commit-wait, and the reads themselves:
| Scenario | Components | Typical Latency | Notes |
|---|---|---|---|
| Single-group write (regional) | Network + Paxos + Commit-wait | 10-25ms | Best case for writes |
| Single-group write (multi-region) | Network + Paxos + Commit-wait | 50-150ms | Paxos spans regions |
| Multi-group write (regional) | Network + 2PC + Paxos + Commit-wait | 20-50ms | 2PC adds coordination |
| Multi-group write (multi-region) | Network + 2PC + Paxos + Commit-wait | 100-300ms | Geographic distribution dominates |
| Read-only strong (regional) | Network + Read | 5-15ms | May wait for replica catch-up |
| Read-only strong (multi-region) | Network + Read | 50-150ms | May need leader contact |
| Read-only stale (any config) | Network + Read | 1-10ms | Always served locally |
Throughput Considerations:
Spanner scales horizontally—adding more nodes increases capacity. However, several factors affect throughput:
1. Hot Spots: If many transactions access the same rows, lock contention limits throughput. Solutions include spreading writes across the key space (for example, sharding a hot counter across many rows, as sketched after this list), avoiding a single row that every transaction must update, and keeping hot rows out of long-running transactions.
2. Geographic Distribution: Multi-region configurations have lower per-transaction throughput due to coordination overhead. However, total system throughput may be higher due to geographic distribution of load.
3. Transaction Size: Larger transactions (more rows read/written) hold locks longer, reducing concurrency. Keep transactions small and focused.
4. 2PC Overhead: Multi-group transactions have lower throughput than single-group. Design schemas to keep related data together.
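One common mitigation for hot rows is to shard a hot counter across many rows so concurrent increments rarely collide. The sketch below assumes a hypothetical `counter_shards(counter_name, shard_id, value)` table and placeholder instance/database names.

```python
import random
from google.cloud import spanner

NUM_SHARDS = 16
client = spanner.Client()
database = client.instance("my-instance").database("my-db")

def increment_counter(counter_name):
    # Each increment touches one randomly chosen shard row, so up to
    # NUM_SHARDS increments can proceed without lock contention.
    shard = random.randrange(NUM_SHARDS)

    def work(transaction):
        transaction.execute_update(
            "UPDATE counter_shards SET value = value + 1 "
            "WHERE counter_name = @name AND shard_id = @shard",
            params={"name": counter_name, "shard": shard},
            param_types={"name": spanner.param_types.STRING,
                         "shard": spanner.param_types.INT64})

    database.run_in_transaction(work)

def read_counter(counter_name):
    # Reads are lock-free; they simply sum the shards at a snapshot.
    with database.snapshot() as snapshot:
        rows = snapshot.execute_sql(
            "SELECT SUM(value) FROM counter_shards WHERE counter_name = @name",
            params={"name": counter_name},
            param_types={"name": spanner.param_types.STRING})
        return list(rows)[0][0]
```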
Benchmarks and Real-World Performance:
Google has published performance data showing:
Cloud Spanner pricing is based on node-hours and storage. Multi-region configurations require more nodes for the same throughput (due to coordination overhead). Factor this into capacity planning when choosing regional vs. multi-regional deployments.
We've explored the sophisticated transaction machinery that enables Spanner's globally distributed ACID guarantees. The key insights: layering 2PC on Paxos groups removes the classic coordinator single point of failure; TrueTime commit timestamps plus commit-wait deliver external consistency; wound-wait prevents deadlocks without any detection machinery; read-only transactions are lock-free and can be served by nearby replicas; and schema design, especially interleaving, determines whether a transaction stays on the fast single-group path.
What's Next:
With transactions covered, we'll explore how Spanner automatically manages data distribution. The next page covers Automatic Sharding—how Spanner dynamically partitions, rebalances, and moves data without application involvement or downtime.
You now understand Spanner's distributed transaction architecture—from single-group commits through multi-group 2PC, from lock management to performance optimization. Next, we'll see how Spanner automatically manages the physical distribution of data.