In 1978, Leslie Lamport published "Time, Clocks, and the Ordering of Events in a Distributed System"—a paper that would become one of the most cited in computer science. The core insight was profound: in a distributed system, there is no global notion of time. Different computers have different clocks that drift at different rates, and even the most precise synchronization protocols leave uncertainty windows of milliseconds to seconds.
This clock uncertainty creates a fundamental problem for databases. When two transactions occur on different continents, how do you determine which happened first? If the clocks on those continents disagree, you might produce inconsistent orderings that violate the database's guarantees.
For decades, distributed systems dealt with this by either pretending clocks were good enough and accepting occasional ordering anomalies, or giving up on physical time altogether—relying on logical clocks or funneling everything through a single coordinator.
Google's TrueTime took a radically different approach: instead of pretending clocks are perfect or giving up on physical time, TrueTime explicitly models clock uncertainty and uses that uncertainty to guarantee correct ordering.
By the end of this page, you will understand how TrueTime works, why it's revolutionary for distributed systems, and how Spanner uses it to achieve external consistency—the strongest consistency guarantee possible. You'll see how atomic clocks and GPS receivers combine with clever protocols to create a globally coherent notion of time.
To understand TrueTime's innovation, we must first deeply understand the problem it solves.
Clock Drift: Nothing Stays Synchronized
Every computer contains a quartz crystal oscillator that counts time. These crystals are remarkably precise—typically accurate to within 10-50 parts per million (ppm). But 50ppm means a clock that drifts by 50 microseconds every second, which accumulates to roughly 3 milliseconds per minute, 180 milliseconds per hour, and more than 4 seconds per day.
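If you want to check that arithmetic, here is a quick back-of-the-envelope sketch in Python (the function name is just illustrative):

```python
DRIFT_PPM = 50  # worst-case quartz drift cited above

def accumulated_drift_s(elapsed_s: float) -> float:
    """Worst-case accumulated drift, in seconds, after `elapsed_s` of wall-clock time."""
    return DRIFT_PPM * 1e-6 * elapsed_s

print(accumulated_drift_s(60))      # ~0.003  -> about 3 ms per minute
print(accumulated_drift_s(3600))    # ~0.18   -> about 180 ms per hour
print(accumulated_drift_s(86400))   # ~4.32   -> over 4 seconds per day
```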
Now imagine thousands of servers across data centers worldwide, each drifting independently. Even with periodic synchronization via protocols like NTP (Network Time Protocol), you can never achieve perfect alignment.
NTP's Limitations:
NTP, the standard internet time synchronization protocol, has fundamental limitations:
Network Latency Variation: NTP works by exchanging timestamps across the network. Network delays vary unpredictably, introducing uncertainty of 10-100ms over the internet, and typically 1-10ms even within a datacenter.
No Bounded Uncertainty: NTP estimates clock offset, but this estimate has an unbounded error. You know approximately what time it is, but you don't know how wrong you might be.
Vulnerable to Failures: If NTP servers become unreachable, clocks continue drifting without correction.
For a database trying to order transactions globally, these limitations are catastrophic.
| Method | Typical Accuracy | Bounded Error? | Failure Mode | Cost |
|---|---|---|---|---|
| Quartz (unsynchronized) | 50 ppm | No | Unbounded drift | Free |
| NTP (internet) | 10-100ms | No | Unbounded if unreachable | Minimal |
| NTP (datacenter) | 1-10ms | No | Unbounded if unreachable | Minimal |
| PTP (IEEE 1588) | <1ms | No | Depends on network | Moderate |
| GPS Receiver | <1μs to UTC | Yes (after acquisition) | Sky visibility required | Moderate |
| Atomic Clock | ~1ns | Yes | Very rare failures | High ($50K+) |
| TrueTime (GPS + Atomic) | ~1-7ms | Yes, always | Graceful degradation | High (at scale) |
Why Unbounded Uncertainty Breaks Consistency:
Consider two transactions that must be ordered: transaction A commits in New York, and transaction B commits in London with a recorded timestamp 5 milliseconds after A's.
Does A happen before B? If the New York clock was 10ms fast and the London clock was 10ms slow, then in "true" time A happened 25ms before B—yet the recorded timestamps suggest a gap of only 5ms.
Worse, if the clock errors were reversed, B would actually have happened 15ms before A even though the recorded timestamps still show A first—a complete ordering violation.
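A tiny numerical sketch of the scenario above, using the illustrative values from the text:

```python
# The database records A 5 ms before B (seconds, relative to an arbitrary origin).
recorded_a = 0.000   # New York
recorded_b = 0.005   # London

def true_time(recorded: float, clock_ran_fast_by: float) -> float:
    """Recover true time given how far ahead the recording clock was running."""
    return recorded - clock_ran_fast_by

# Case 1: New York 10 ms fast, London 10 ms slow -> A really led B by 25 ms.
gap = true_time(recorded_b, -0.010) - true_time(recorded_a, +0.010)
print(gap)   # ~0.025

# Case 2: errors reversed -> B actually happened 15 ms BEFORE A,
# even though the recorded timestamps still show A first.
gap = true_time(recorded_b, +0.010) - true_time(recorded_a, -0.010)
print(gap)   # ~-0.015
```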
The Implication for Databases:
If you can't determine true ordering, you can't guarantee externally consistent transactions, consistent snapshots, or audit logs that reflect what actually happened.
This is why most distributed databases either avoid global ordering entirely, or funnel all transactions through a single coordinator.
If you don't know how wrong your clock might be, you can't make correct ordering decisions. But if you know the bounds of your uncertainty, you can wait until the uncertainty resolves before committing—guaranteeing correct ordering.
TrueTime's brilliance lies not in eliminating clock uncertainty—which is physically impossible—but in bounding it and making it explicit.
The TrueTime API:
Unlike traditional clock APIs that return a single timestamp, TrueTime returns an interval:
TT.now() → [earliest, latest]
This interval represents the range of possible true times. If TrueTime returns [12:00:00.000, 12:00:00.007], it's guaranteeing that the actual current time is somewhere in that 7-millisecond window.
The width of this interval is called ε (epsilon)—the uncertainty. TrueTime provides two additional operations:
TT.after(t) → true if t has definitely passed
TT.before(t) → true if t has definitely not arrived
These are powerful primitives. TT.after(t) returns true only when the earliest possible time is past t—meaning t has definitely passed regardless of clock errors.
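To make the semantics concrete, here is a minimal Python sketch of an interval-based clock. Only the now()/after()/before() semantics come from TrueTime itself; the class names and the way ε is supplied are illustrative assumptions, not Google's implementation.

```python
import time
from dataclasses import dataclass

@dataclass
class TTInterval:
    earliest: float  # lower bound on true time (seconds since epoch)
    latest: float    # upper bound on true time

class TrueTimeSketch:
    """Illustrative TrueTime-style clock. `uncertainty_fn` stands in for the
    real machinery that bounds epsilon using GPS and atomic clock sources."""

    def __init__(self, uncertainty_fn):
        self._epsilon = uncertainty_fn  # returns the current interval width in seconds

    def now(self) -> TTInterval:
        t = time.time()
        eps = self._epsilon()
        return TTInterval(earliest=t - eps, latest=t + eps)

    def after(self, t: float) -> bool:
        # True only if t has definitely passed, regardless of clock error.
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        # True only if t has definitely not arrived yet.
        return self.now().latest < t

# Example: a clock whose uncertainty is a constant 4 ms.
tt = TrueTimeSketch(lambda: 0.004)
```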
The Hardware Foundation:
TrueTime's bounded uncertainty depends on specialized hardware:
GPS Receivers: GPS satellites carry atomic clocks and broadcast precise time signals. A GPS receiver can determine UTC to within ~100 nanoseconds when it has satellite visibility.
Atomic Clocks: Each Google datacenter has multiple atomic clocks (typically cesium or rubidium). These drift at rates measured in nanoseconds per day rather than microseconds.
Time Masters: Dedicated time servers in each datacenter that serve time to every other machine. Most masters are GPS-equipped; the rest—the "Armageddon masters"—carry atomic clocks as a GPS-independent backstop. Client daemons poll a mix of masters, cross-validate them, and reject outliers.
TrueTime Infrastructure Architecture:
The original figure here shows the per-datacenter time hierarchy and how ε behaves over time. Stratum 0 reference clocks (GPS receivers accurate to ~100ns, atomic clocks drifting ~1ns/day) feed the time masters, which cross-validate sources, detect and exclude faulty clocks, and distribute time with certified error bounds through redundant per-rack timeservers to the worker servers that actually answer TT.now(). The accompanying plot shows ε as a sawtooth: after each successful poll it resets to roughly the network and processing latency, then grows with local clock drift until the next poll—typically 1-7ms, averaging around 4ms.
How Uncertainty is Bounded:
TrueTime's uncertainty calculation accounts for all error sources:
Reference Time Accuracy: GPS/atomic clock accuracy (sub-microsecond, negligible)
Network Latency: Time to communicate between time servers and workers (~1ms within datacenter)
Quartz Drift Between Polls: Local clocks drift between time synchronization polls. At 200ppm drift and 30-second poll intervals, this adds ~6ms uncertainty.
Processing Delays: Time to process responses and compute intervals
TrueTime continuously tracks these factors and adjusts the returned interval accordingly. After a successful time poll, uncertainty drops to ~1-2ms. As time passes without a poll, uncertainty grows due to local clock drift. If polls fail, uncertainty grows faster until backup time sources are consulted.
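The sketch below models how ε evolves between polls, using the figures quoted above (roughly 1ms of post-poll latency, an assumed 200ppm drift bound, 30-second poll interval); the function name and parameter names are illustrative.

```python
def epsilon_at(seconds_since_poll: float,
               post_poll_floor_s: float = 0.001,   # ~1ms network + processing latency
               assumed_drift_ppm: float = 200.0) -> float:
    """Uncertainty = post-poll floor + worst-case local drift since the last poll."""
    return post_poll_floor_s + assumed_drift_ppm * 1e-6 * seconds_since_poll

print(epsilon_at(0.0))   # ~0.001 -> about 1ms right after a successful poll
print(epsilon_at(30.0))  # ~0.007 -> about 7ms just before the next 30-second poll
```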
The Result:
Typical ε values are 1-7ms, with an average around 4ms. This might seem large, but it's bounded and known—which is infinitely more useful than an unbounded uncertainty that might be milliseconds or minutes.
GPS receivers need sky visibility, which servers in underground datacenters don't have. Atomic clocks don't need sky visibility but are expensive and still drift (very slowly). By combining GPS receivers (at the edges of datacenters) with atomic clocks (providing holdover during GPS outages) and local distribution, TrueTime achieves both reliability and precision.
With TrueTime's bounded uncertainty, Spanner can provide external consistency—the strongest consistency guarantee a distributed database can offer.
What is External Consistency?
External consistency means: if transaction T1 commits before transaction T2 starts (according to an external observer with a perfect clock), then T1's commit timestamp is earlier than T2's commit timestamp.
This is stronger than serializability. Serializability only requires that transactions appear to execute atomically in some order—that order doesn't have to match real-time ordering. External consistency makes the database respect real-world time.
Why External Consistency Matters:
Consider an auditing scenario: transaction T1 revokes Alice's permissions, and moments later transaction T2 records an action Alice attempts to perform.
With external consistency, the audit log will always show T1 (revocation) with an earlier timestamp than T2 (attempted action). Auditors can trust the database's ordering reflects what actually happened.
Without external consistency, clock skew could cause T2 to receive an earlier timestamp than T1—making it appear that Alice acted before her permissions were revoked, when in reality she acted after.
Linearizability vs. External Consistency:
Linearizability (also called "atomic consistency") is a related but different concept:
Linearizability applies to individual operations on individual objects. Each operation appears to happen atomically at some point between its invocation and response.
External consistency applies to entire transactions that may touch multiple objects. The transaction's effective timestamp respects real-time ordering.
Spanner provides both. Transactions are externally consistent, and individual read/write operations within a Paxos group are linearizable.
| Model | Ordering Guarantee | Performance | Example Systems |
|---|---|---|---|
| Eventual Consistency | No ordering, values eventually converge | Highest | DynamoDB (default), Cassandra (ONE) |
| Causal Consistency | Causally related operations ordered | High | MongoDB (causal reads) |
| Session Consistency | Per-session ordering (read-your-writes) | High | Many cloud databases |
| Snapshot Isolation | Consistent point-in-time reads | Moderate | PostgreSQL, CockroachDB |
| Serializability | Transactions appear to execute one at a time | Lower | PostgreSQL SERIALIZABLE |
| External Consistency | Respects real-time ordering | Lowest | Spanner, TiDB (with TSO) |
External consistency isn't free. As we'll see in the next section, Spanner must wait for TrueTime uncertainty to resolve before committing writes. This adds latency equal to ε (typically 4-7ms). For many applications, this overhead is acceptable for the correctness guarantees provided.
TrueTime enables external consistency through a clever protocol called commit-wait. The key insight: if you wait long enough after choosing a commit timestamp, you can be certain that the timestamp has definitively passed everywhere.
How Commit-Wait Works:
Transaction Execution: The transaction executes, acquiring locks and staging writes.
Timestamp Assignment: The leader assigns a commit timestamp s ≥ TT.now().latest (the upper bound of current time).
Commit Wait: The leader waits until TT.after(s) returns true—meaning time s has definitely passed (see the code sketch after these steps).
Paxos Commit: The leader commits the transaction via Paxos.
Response to Client: The transaction is durably committed with timestamp s.
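A minimal sketch of those steps, assuming a TrueTime-style object like the one sketched earlier (tt, exposing now()/after()) and hypothetical stand-ins for locking and Paxos replication. This shows the shape of the protocol, not Spanner's actual code.

```python
import time

def commit_write_transaction(tt, txn, paxos_group):
    """Commit-wait protocol sketch. `txn` and `paxos_group` are hypothetical
    stand-ins for transaction state and the replication machinery."""
    txn.acquire_locks()                 # 1. execute: acquire locks, stage writes
    s = tt.now().latest                 # 2. choose commit timestamp s >= TT.now().latest

    while not tt.after(s):              # 3. commit-wait: block until s has
        time.sleep(0.001)               #    definitely passed everywhere

    paxos_group.replicate(txn, s)       # 4. commit via Paxos (majority replication)
    return s                            # 5. report to the client: committed at s
```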
Why This Works:
Let's trace through carefully: when commit-wait finishes, the true time is guaranteed to have passed s. Any timestamp assigned afterwards on any machine is at least that machine's TT.now().latest, which is at least the true time—and therefore greater than s. Only after this point does the client learn the transaction committed.
The Ordering Guarantee:
Now consider two transactions, T1 and T2, where T1 commits before T2 starts:
Commit-Wait Protocol, Step by Step:
The original timeline diagram walks through an example. T1 executes, then assigns commit timestamp s1 = 100 when TT.now() = [96, 100]. It then waits: at TT.now() = [97, 101] and [99, 103] the earliest bound is still ≤ 100, so it keeps waiting; at [101, 105] the earliest bound exceeds 100 and commit-wait ends. T1 commits via Paxos and replies "committed at timestamp 100." When T2 starts afterwards, TT.now() is already [102, 106], and its commit timestamp s2 = 106 > s1 = 100—ordering preserved.
Key invariant: when T2 starts, T1's commit-wait has already finished, so TT.now().earliest > s1. T2's timestamp s2 ≥ TT.now().latest > TT.now().earliest > s1, so s2 > s1, respecting real-time order.
Commit-wait duration: the wait lasts from timestamp assignment (when s1 ≈ TT.now().latest) until TT.now().earliest passes s1—roughly one uncertainty interval ε, typically 4-7ms. This is the "tax" for external consistency on write transactions.
Optimizations to Reduce Commit-Wait Impact:
Spanner uses several techniques to minimize commit-wait overhead:
1. Prepare-Wait (for Distributed Transactions):
For transactions spanning multiple Paxos groups, Spanner uses two-phase commit. The commit-wait happens during the prepare phase, overlapped with the two-phase commit protocol. By the time all participants have prepared, the commit-wait has often already completed.
2. Read-Only Transaction Optimization:
Read-only transactions don't need commit-wait because they don't modify data. They receive timestamps via TT.now().latest but can return immediately after reading—no waiting required.
3. Batching:
Multiple transactions can share a single commit-wait. If several transactions are assigned the same timestamp, they can all commit after a single wait period.
4. Reduced ε Through Infrastructure:
The smaller ε is, the shorter the commit-wait. Google continuously invests in reducing TrueTime uncertainty through better hardware, more frequent polling, and optimized networking.
Commit-wait adds latency proportional to TrueTime uncertainty (typically 4-7ms). This is the price of external consistency. For applications that don't need external consistency, Spanner offers read staleness options that can read data at slightly older timestamps, avoiding fresh commits.
TrueTime enables powerful read semantics beyond just transaction ordering. Spanner supports multiple read modes, each with different consistency-latency tradeoffs.
Strong Reads:
A strong read sees all transactions committed before the read began. Implementation: the read is assigned a timestamp of TT.now().latest, and the serving replica waits until its safe time reaches that timestamp before returning data.
Strong reads guarantee you see the latest data, but they incur a wait comparable to the commit-wait latency.
Bounded Staleness Reads:
If you can tolerate slightly stale data (say, up to 10 seconds old), you can read at a timestamp in the past: Spanner picks the newest timestamp within your staleness bound that the chosen replica can serve without blocking.
Bounded staleness reads are faster because they skip commit-wait, and can be served by any replica (not just leaders).
Exact Staleness Reads:
You can specify an exact timestamp to read at:
This is useful for reproducible queries—reading at the same timestamp always returns the same data.
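The snippets below sketch these read modes with the google-cloud-spanner Python client. The instance, database, and table names are placeholders, and the snapshot keyword arguments shown (exact_staleness, read_timestamp) are my recollection of that client's API—treat the whole block as illustrative rather than authoritative.

```python
import datetime
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-db")  # placeholder names

# Strong read: sees everything committed before the read began.
with database.snapshot() as snap:
    rows = list(snap.execute_sql(
        "SELECT balance FROM accounts WHERE account_id = 'A123'"))

# Stale read: served by any replica whose safe time has passed the chosen timestamp.
with database.snapshot(exact_staleness=datetime.timedelta(seconds=10)) as snap:
    rows = list(snap.execute_sql(
        "SELECT balance FROM accounts WHERE account_id = 'A123'"))

# Exact-timestamp read: reproducible point-in-time query.
ts = datetime.datetime(2024, 1, 8, 16, 0, 0, tzinfo=datetime.timezone.utc)
with database.snapshot(read_timestamp=ts) as snap:
    rows = list(snap.execute_sql(
        "SELECT balance FROM accounts WHERE account_id = 'A123'"))
```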
Read-Only Transactions:
Read-only transactions are particularly powerful. They take no locks, never block writes, execute every read at a single system-chosen timestamp, and can be served by any sufficiently up-to-date replica—so a multi-table read is internally consistent without any coordination with writers.
| Read Mode | Staleness | Latency | Replica Choice | Use Case |
|---|---|---|---|---|
| Strong Read | None (latest) | Higher (commit-wait) | Leader or recent replica | Critical reads, financial data |
| Bounded Staleness | Up to X seconds | Low | Any replica | Read-heavy workloads, dashboards |
| Exact Staleness | At timestamp T | Low if T is past | Any replica | Analytics, reproducible queries |
| Read-Only Txn (Strong) | None | Moderate | Any replica | Multi-table consistent reads |
| Read-Only Txn (Stale) | Configurable | Low | Any replica | Cross-table analytics |
Replica Selection and Stale Reads:
Spanner's stale read capability has profound implications for global performance. Consider a user in Tokyo querying data whose leader is in Virginia: a strong read must involve the Virginia leader, paying an intercontinental round trip of well over 100ms, while a stale read can be answered by a replica in or near Tokyo in a few milliseconds.
For read-heavy workloads where slight staleness is acceptable, this is a massive performance improvement.
Safe Time and Replica Catch-Up:
Every Spanner replica tracks its safe time—the latest timestamp at which it has received all updates. A replica can serve reads at any timestamp less than or equal to its safe time.
If a read request arrives for a timestamp beyond the replica's safe time, the replica either waits until its safe time catches up to the requested timestamp, or the request is routed to a replica that is already caught up (typically the leader).
Either way, a read at timestamp t never misses a write committed at or before t, and never returns data more stale than requested.
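A tiny sketch of the safe-time gate a replica might apply; the class and method names are hypothetical, not Spanner's internals.

```python
class ReplicaSketch:
    """Illustrative replica-side safe-time check."""

    def __init__(self):
        self.safe_time = 0.0  # latest timestamp through which all updates are applied

    def apply_replicated_write(self, commit_timestamp: float):
        # Updates arrive in timestamp order via Paxos; safe time advances with them.
        self.safe_time = max(self.safe_time, commit_timestamp)

    def can_serve(self, read_timestamp: float) -> bool:
        # Safe to answer a read at t only if every write with timestamp <= t has arrived.
        return read_timestamp <= self.safe_time

replica = ReplicaSketch()
replica.apply_replicated_write(100.0)
print(replica.can_serve(99.5))   # True: all data up to 99.5 is present
print(replica.can_serve(100.5))  # False: wait for safe time to advance (or redirect)
```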
Schema changes (DDL) are also timestamped. When you read at a timestamp, you see the schema as it existed at that timestamp. This enables consistent reads even during schema migrations—old queries continue working at old timestamps while new queries use new schemas.
TrueTime timestamps aren't just internal bookkeeping—they create a globally meaningful ordering that applications can reason about.
Transactions Return Timestamps:
When a Spanner transaction commits, the response includes its commit timestamp. Applications can:
Compare Timestamps: If tx1.timestamp < tx2.timestamp, tx1 happened before tx2 in the global ordering.
Read at Timestamps: Request to read data as-of a specific timestamp, useful for auditing, debugging, and reproducible analytics.
Coordinate Across Systems: Pass timestamps to other systems that also understand TrueTime, enabling cross-system consistency.
The Timestamp as a Consistency Token:
Consider a microservices architecture where multiple services use Spanner: one service commits a write and receives its commit timestamp, passes that timestamp along with its message or response, and the downstream service reads at that timestamp—guaranteed to observe the write.
This pattern enables causal consistency across services without complex distributed transaction protocols.
Using Timestamps for Cross-Service Consistency:
The original sequence diagram here shows an e-commerce flow with separate Order and Inventory services, each backed by Spanner. The Order Service inserts an order and receives commit timestamp t₁; it publishes a Pub/Sub message containing the order ID and t₁; the Inventory Service reads the order at timestamp t₁—guaranteed to see the row that was just created—and then decrements inventory, committing at t₂ > t₁, so ordering is preserved. Because a read at t₁ sees every write committed at or before t₁, this works even when the services run on different continents, with no distributed transactions, no two-phase commit across services, and no inconsistency window—as long as both sides use TrueTime-backed timestamps.
Time-Travel Queries:
Spanner retains old versions of data for a configurable period (by default, 1 hour). Combined with precise timestamps, this enables powerful "time-travel" queries:
-- See account balance at market close yesterday
SELECT balance
FROM accounts
WHERE account_id = 'A123'
AS OF TIMESTAMP '2024-01-08T16:00:00Z';
-- Compare balance changes over time
SELECT
(SELECT balance FROM accounts AS OF TIMESTAMP T1 WHERE account_id='A123') as before,
(SELECT balance FROM accounts AS OF TIMESTAMP T2 WHERE account_id='A123') as after;
Implications for Debugging:
When something goes wrong, you can reconstruct the exact state of the database at any point in time: read the data exactly as it stood just before, during, and just after the incident, and re-run the queries that misbehaved at those timestamps.
This temporal capability, powered by TrueTime's globally meaningful timestamps, transforms debugging from guesswork into deterministic investigation.
TrueTime transforms timestamps from internal database metadata into a first-class API for reasoning about distributed state. Applications using Spanner can think about time in ways that were previously impossible in distributed systems—coordinating across services, travelling through history, and trusting that ordering reflects reality.
We've explored how TrueTime transforms the fundamental challenge of distributed time into a tool for unprecedented consistency guarantees. Let's consolidate the key insights: clocks can never be perfectly synchronized, but GPS receivers and atomic clocks let TrueTime bound the uncertainty and expose it as an interval; commit-wait converts that bound into external consistency, the strongest ordering guarantee a distributed database can offer; and globally meaningful timestamps unlock stale reads from nearby replicas, cross-service coordination, and time-travel queries.
What's Next:
TrueTime enables correct transaction ordering, but how do transactions actually work when they span multiple continents and Paxos groups? In the next page, we'll explore Distributed Transactions in Spanner—how two-phase commit, Paxos, and TrueTime combine to provide globally atomic operations.
You now understand TrueTime's architecture, the commit-wait protocol, and how external consistency transforms what's possible in distributed databases. Next, we'll see how these foundations enable globally distributed transactions.