The most valuable property a database can offer isn't speed, scale, or feature richness—it's correctness. A faster database that occasionally loses data or returns wrong results is worse than a slower one that's always right. Yet achieving correctness in distributed systems is extraordinarily difficult, and most databases make compromises that push complexity onto application developers.
FoundationDB takes an uncompromising stance: strict serializability for all transactions, with no exceptions. This isn't marketing language—it's a technical claim backed by the most rigorous testing methodology in the database industry. Every read, every write, every transaction—no matter how many keys it touches, no matter how many concurrent transactions are running—behaves as if it executed in complete isolation, in some total order consistent with real time.
This guarantee simplifies application development dramatically. You don't need to reason about race conditions, lost updates, phantom reads, or write skew. If your logic is correct for a single-threaded, single-user scenario, it's correct for millions of concurrent users. The database handles concurrency; you handle business logic.
In this page, we'll explore what strict serializability means, how FoundationDB achieves it, and why this guarantee—rare among distributed databases—is worth its cost.
By the end of this page, you will understand: (1) The hierarchy of isolation levels and where strict serializability sits; (2) How FoundationDB's optimistic concurrency control works; (3) The read-write conflict model and how to design for minimal conflicts; (4) FoundationDB's revolutionary simulation testing methodology; and (5) The practical implications of strict serializability for application design.
Before appreciating FoundationDB's guarantees, we need to understand the landscape of transaction isolation—what guarantees are possible, what anomalies each level prevents, and where most databases fall in this hierarchy.
The Problem: Concurrent Transactions
When multiple transactions execute concurrently, they can interfere with each other in subtle ways. Consider two transactions running simultaneously:
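The interleaving can be sketched with a plain dict standing in for the database (a toy simulation, not FoundationDB code; T1 withdraws $50 and T2 deposits $100 to match the totals below):

```python
# Minimal sketch of a lost update. A plain dict stands in for the
# database; both "transactions" read before either writes.

balance = {"account": 1000}

# T1: withdraw $50 -- reads the balance first
t1_read = balance["account"]        # sees 1000

# T2: deposit $100 -- reads before T1 has written
t2_read = balance["account"]        # also sees 1000

# T1 writes its result
balance["account"] = t1_read - 50   # balance is now 950

# T2 writes its result, clobbering T1's update
balance["account"] = t2_read + 100  # balance is now 1100, not 1050

print(balance["account"])  # 1100 -- T1's withdrawal was lost
```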
If the initial balance is $1000, and both transactions read $1000 before either writes:
The final balance is $950 or $1100 (depending on who writes last), but it should be $1050. This is a lost update—one of many possible concurrency anomalies.
The Isolation Level Hierarchy:
Database isolation levels define which anomalies are prevented. Each level trades correctness for performance:
| Isolation Level | Dirty Reads | Lost Updates | Non-Repeatable Reads | Phantom Reads | Write Skew |
|---|---|---|---|---|---|
| Read Uncommitted | ❌ Allowed | ❌ Allowed | ❌ Allowed | ❌ Allowed | ❌ Allowed |
| Read Committed | ✅ Prevented | ❌ Allowed | ❌ Allowed | ❌ Allowed | ❌ Allowed |
| Repeatable Read | ✅ Prevented | ✅ Prevented | ✅ Prevented | ❌ Allowed | ❌ Allowed |
| Snapshot Isolation | ✅ Prevented | ✅ Prevented | ✅ Prevented | ✅ Prevented | ❌ Allowed |
| Serializable | ✅ Prevented | ✅ Prevented | ✅ Prevented | ✅ Prevented | ✅ Prevented |
| Strict Serializable | ✅ Prevented | ✅ Prevented | ✅ Prevented | ✅ Prevented | ✅ Prevented + Real-time |
Defining the Anomalies:

- Dirty read: a transaction reads data written by another transaction that has not yet committed (and may still abort).
- Lost update: two transactions read the same value, both modify it, and one write overwrites the other, as in the balance example above.
- Non-repeatable read: a transaction reads the same key twice and sees different values because another transaction committed in between.
- Phantom read: a repeated range query returns different rows because another transaction inserted or deleted rows in the range.
- Write skew: two transactions read overlapping data, make decisions based on what they read, and write to disjoint keys, jointly violating an invariant that each preserved alone (see the on-call example below).
Serializability vs. Strict Serializability:
Regular serializability guarantees that the outcome of concurrent transactions is equivalent to some serial execution. But that serial order might not respect real time—a transaction that started and finished before another might appear to execute after it in the serialization order.
Strict serializability (also called linearizability for single objects, or external consistency) adds the constraint that if transaction T1 commits before transaction T2 starts, then T1 must appear before T2 in the serialization order. The database's behavior respects real-time ordering.
This distinction matters for user expectations. If I transfer money and you refresh the page, you should see the transfer. Strict serializability guarantees this; plain serializability does not.
```python
# ============================================
# WRITE SKEW ANOMALY EXAMPLE
# ============================================

# Scenario: On-call scheduling
# Constraint: At least one doctor must be on call at all times
# Current state: Alice and Bob are both on call

# Transaction T1 (Alice wants to go off-call):
def alice_goes_off_call(tr):
    alice_on_call = tr.get("doctors/alice/on_call")  # Returns True
    bob_on_call = tr.get("doctors/bob/on_call")      # Returns True

    # Check constraint: Is someone else on call?
    if bob_on_call == True:
        tr.set("doctors/alice/on_call", False)  # OK, Bob is still on call
        return "Alice went off-call"
    else:
        return "Cannot go off-call - no one else available"

# Transaction T2 (Bob wants to go off-call) - running concurrently:
def bob_goes_off_call(tr):
    alice_on_call = tr.get("doctors/alice/on_call")  # Returns True
    bob_on_call = tr.get("doctors/bob/on_call")      # Returns True

    # Check constraint: Is someone else on call?
    if alice_on_call == True:
        tr.set("doctors/bob/on_call", False)  # OK, Alice is still on call
        return "Bob went off-call"
    else:
        return "Cannot go off-call - no one else available"

# ============================================
# THE ANOMALY (Under non-serializable isolation)
# ============================================

# Both transactions read the SAME snapshot:
#   Alice sees: Alice=True, Bob=True
#   Bob sees:   Alice=True, Bob=True

# Both transactions pass their constraint check
# Both transactions write:
#   T1 writes: Alice = False
#   T2 writes: Bob = False

# Final state: Alice=False, Bob=False
# CONSTRAINT VIOLATED! No one is on-call.
```
```python
# ============================================
# FOUNDATIONDB PREVENTS THIS
# ============================================

# Under strict serializability, one transaction will see
# the other's write and either:
#   a) Abort due to conflict (optimistic CC) - FoundationDB's approach
#   b) Block until the other commits (pessimistic CC)

# In FoundationDB: Both transactions read doctors/alice and doctors/bob
# Write set: T1 writes alice, T2 writes bob
# Read sets overlap with each other's write sets -> CONFLICT
# One transaction is aborted and must retry
# On retry, it sees the committed state and correctly refuses
```

Even databases that claim 'serializable' isolation often implement only Snapshot Isolation, which allows write skew. PostgreSQL's 'serializable' mode since version 9.1 is truly serializable (using SSI), but MySQL's 'serializable' has caveats. Always verify what your database actually provides—the marketing language often differs from the technical reality.
FoundationDB uses optimistic concurrency control (OCC) to achieve serializable transactions. Unlike pessimistic approaches (locks), optimistic control allows transactions to proceed without coordination, validating at commit time that no conflicts occurred.
How OCC Works in FoundationDB:
Transaction Start: Client begins a transaction, receiving a read version (a timestamp in the transaction log)
Read Phase: Client reads keys from the database. FoundationDB tracks all keys read by the transaction (the read set) and their versions.
Write Phase: Client accumulates writes locally (the write set). Writes are not sent to the database until commit.
Commit: Client sends the entire write set to a special node called the proxy. The proxy forwards to the resolver for conflict checking.
Conflict Check: The resolver checks if any key in the read set was modified by another transaction that committed after this transaction's read version but before now. If so, the transaction conflicts and must abort.
Commit or Abort: If no conflicts, the transaction is assigned a commit version and written to the transaction log. If conflicts, the client receives an error and must retry from scratch.
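The resolver's conflict check described above can be modeled in a few lines. This is a toy model with hypothetical names, not FoundationDB's implementation:

```python
# Toy resolver: tracks the last commit version of each key and decides
# whether a transaction (read_version, read_set, write_set) may commit.

class ToyResolver:
    def __init__(self):
        self.last_write_version = {}  # key -> version of last committed write
        self.version = 1000           # monotonically increasing commit version

    def try_commit(self, read_version, read_set, write_set):
        # Conflict check: was any key we READ written after our read version?
        for key in read_set:
            if self.last_write_version.get(key, 0) > read_version:
                return None  # CONFLICT: client must retry
        # No conflicts: assign a commit version and record the writes
        self.version += 1
        for key in write_set:
            self.last_write_version[key] = self.version
        return self.version

resolver = ToyResolver()

# T2 reads A at version 1000, writes A, and commits first
t2_commit = resolver.try_commit(1000, {"A"}, {"A"})       # commits at 1001
# T1 also read A at version 1000; A was written at 1001 > 1000 -> abort
t1_commit = resolver.try_commit(1000, {"A", "B"}, {"A"})  # None (conflict)
```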
```
FOUNDATIONDB TRANSACTION LIFECYCLE
═══════════════════════════════════════════════════════════════════

CLIENT
  1. Begin Transaction
     └── Request read version from Proxy
     └── Proxy returns read_version = 1000

  2. Perform Reads (read_set accumulated)
     └── read(key_A) = value_A  [version 950]
     └── read(key_B) = value_B  [version 980]
     └── read_set = {key_A, key_B}

  3. Perform Writes (write_set accumulated locally)
     └── write(key_A, new_value_A)
     └── write(key_C, new_value_C)
     └── write_set = {key_A: new_value_A, key_C: new_value_C}

  4. Commit
     └── Send to Proxy: {read_version, read_set, write_set}
              │
              ▼
PROXY
  Forward to Resolver for conflict detection
              │
              ▼
RESOLVER
  Conflict Check:
    "Were any keys in read_set modified between
     read_version (1000) and now?"

  Scenario A: NO CONFLICTS
    - key_A last modified at version 950 < 1000  ✓
    - key_B last modified at version 980 < 1000  ✓
    Result: Assign commit_version = 1005, write to log

  Scenario B: CONFLICT DETECTED
    - Another transaction modified key_A at version 1002
    - 1002 > read_version (1000)
    - Our read of key_A is stale!
    Result: Transaction ABORTED, client must retry

CONFLICT TYPES:
━━━━━━━━━━━━━━━
1. READ-WRITE CONFLICT: We read a key that someone else wrote
   → Our read might have been different if we'd read later

2. WRITE-WRITE (blind writes): Two transactions that only WRITE key K
   do not abort; both commit and the later version wins. Aborts are
   triggered by the read set, so a read-modify-write of K conflicts
   if K was written after our read version.

NOTE: Read-read is NOT a conflict. Multiple transactions can read
the same key at the same version without issues.
```

Why Optimistic over Pessimistic?
Optimistic concurrency control has several advantages for FoundationDB's design goals:
No Distributed Locking: Pessimistic locking in a distributed system requires coordinating lock acquisition across nodes—expensive and deadlock-prone.
Low Latency for Non-Conflicting Transactions: Most transactions in well-designed applications don't conflict. OCC allows these to proceed without any coordination overhead.
Simple Client Model: The client library is stateless. Reads and writes are just data; there's no lock state to manage or clean up on failure.
Natural Retry Semantics: When conflicts occur, the client simply retries the entire transaction. The @fdb.transactional decorator handles this automatically.
The Cost of Optimism:
The downside is wasted work on conflicts. If transactions frequently conflict, each abort means throwing away computation and retrying. For workloads with high contention (many transactions accessing the same keys), this can be problematic.
FoundationDB mitigates this through:
- Short transaction lifetimes: the 5-second limit bounds how long conflicts can accumulate
- Atomic operations: operations like add() don't require reading, reducing conflict potential

The key to FoundationDB performance is minimizing conflicts. Design transactions to touch disjoint key ranges when possible. Use atomic operations for counters and accumulators. Keep transactions short and focused. A transaction that reads or writes every key in the database will conflict with everything—avoid global operations.
Conflict management is central to effective FoundationDB use. Understanding exactly when conflicts occur—and when they don't—enables designing schemas and access patterns that maximize throughput.
Precise Conflict Rules:
Read-Write Conflict: Transaction T1 reads key K at version V. Transaction T2 writes K and commits with version V' > V. T1 tries to commit → CONFLICT.
Write-Write (blind writes): If T1 and T2 both write key K without reading it, neither aborts; both commit, and the later commit version wins. In FoundationDB only the read set triggers aborts, so write-write contention surfaces through the usual read-modify-write pattern: a transaction that read K before writing it conflicts if another write to K commits first.
No Read-Read Conflict: Multiple transactions can read the same key at the same version. This never causes conflicts.
No Write-Only Conflict with Unrelated Reads: T1 reads key A. T2 writes key B. No conflict (their key sets are disjoint).
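These rules can be captured in a one-line predicate for a transaction T1 that tries to commit after T2 has already committed. This is a toy model reflecting the fact that in FoundationDB only the read set can trigger an abort:

```python
def t1_conflicts(t1_reads, t1_writes, t2_writes):
    # T1 aborts iff something it READ was written by T2.
    # t1_writes is deliberately unused: writes alone never cause
    # the writer to abort in FoundationDB.
    return bool(set(t1_reads) & set(t2_writes))

# Read-write conflict: T1 read K, T2 wrote K -> T1 aborts
assert t1_conflicts({"K"}, set(), {"K"}) is True
# Read-read: T2 wrote nothing -> never a conflict
assert t1_conflicts({"K"}, set(), set()) is False
# Disjoint key sets: no conflict
assert t1_conflicts({"A"}, {"A"}, {"B"}) is False
# Blind writes to the same key: T1 read nothing, so it does not abort
assert t1_conflicts(set(), {"K"}, {"K"}) is False
```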
Range Conflicts:
FoundationDB also supports conflict ranges for range reads:
```python
import fdb
from fdb.tuple import pack  # `pack` is used unqualified below

# ============================================
# RANGE READS AND CONFLICTS
# ============================================

@fdb.transactional
def count_user_orders(tr, user_id):
    """
    Counts orders for a user using range read.
    Any INSERT into this range will conflict.
    """
    prefix = pack(("users", user_id, "orders"))
    end = prefix + b'\xff'

    # This implicitly adds a READ conflict range
    count = 0
    for key, value in tr.get_range(prefix, end):
        count += 1
    return count

# If another transaction INSERTS a new order for this user
# (i.e., writes a key in the range [prefix, end)), this
# transaction will CONFLICT on commit.

# This is correct behavior! Our count might be wrong if
# new orders were added while we were counting.

# ============================================
# EXPLICIT CONFLICT RANGES
# ============================================

@fdb.transactional
def transfer_funds(tr, from_account, to_account, amount):
    """
    Transfer money between accounts with explicit conflict control.
    """
    from_key = pack(("accounts", from_account, "balance"))
    to_key = pack(("accounts", to_account, "balance"))

    # Read current balances
    from_balance = int(tr[from_key] or 0)
    to_balance = int(tr[to_key] or 0)

    if from_balance < amount:
        raise ValueError("Insufficient funds")

    # Write new balances
    tr[from_key] = str(from_balance - amount).encode()
    tr[to_key] = str(to_balance + amount).encode()

# This naturally conflicts if:
# - Another transaction modifies from_account balance
# - Another transaction modifies to_account balance
# Exactly the right conflict behavior!

# ============================================
# ADDING CONFLICT RANGES MANUALLY
# ============================================

@fdb.transactional
def complex_business_logic(tr, entity_id):
    """
    Sometimes you need to add conflict ranges that aren't
    automatically tracked by reads.
    """
    # Add a read conflict range without actually reading
    # This ensures we conflict with any write to this range
    tr.add_read_conflict_range(
        pack(("locks", entity_id)),
        pack(("locks", entity_id)) + b'\xff'
    )

    # Add a write conflict range without actually writing
    # Other transactions reading this range will conflict with us
    tr.add_write_conflict_range(
        pack(("notified", entity_id)),
        pack(("notified", entity_id)) + b'\xff'
    )

# ============================================
# SNAPSHOT READS (NO CONFLICT)
# ============================================

@fdb.transactional
def dashboard_stats(tr):
    """
    For read-only analytics, snapshot reads avoid conflicts.
    We accept potentially stale data for non-conflicting reads.
    """
    # Normal read - adds to conflict set
    critical_value = tr[b'critical/key']

    # Snapshot read - does NOT add to conflict set
    # If this key changes, we won't conflict
    total_users = tr.snapshot[b'stats/total_users']
    total_orders = tr.snapshot[b'stats/total_orders']

    # Good for dashboards: We see a consistent snapshot,
    # but our transaction won't abort if stats change.
    return {
        'users': int(total_users or 0),
        'orders': int(total_orders or 0),
        'critical': critical_value
    }
```

Strategies for Reducing Conflicts:
1. Shard Hot Keys
A single counter that every transaction increments will conflict constantly. Shard it:
```python
import random
import struct

# Instead of one key that everyone fights over:
tr.add(b'page_views', struct.pack('<q', 1))  # High contention!

# Spread across shards. Note: atomic add takes a little-endian
# integer encoding (struct.pack), not a tuple-packed value.
shard = random.randint(0, 99)
tr.add(pack(('page_views', shard)), struct.pack('<q', 1))  # Low contention
```
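Reading the sharded counter means summing all shards. A sketch with a dict standing in for the key range; in real code this would be a range read over the `('page_views', *)` prefix:

```python
import struct

# Dict stand-in for the keys ('page_views', 0) .. ('page_views', 99).
# Values written by atomic add are little-endian 64-bit integers.
shards = {
    ("page_views", 0): struct.pack('<q', 17),
    ("page_views", 7): struct.pack('<q', 5),
    ("page_views", 42): struct.pack('<q', 3),
}

# Sum every shard to get the total count
total = sum(struct.unpack('<q', v)[0] for v in shards.values())
print(total)  # 25
```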
2. Use Atomic Operations
Atomic operations (add, min, max, and, or) don't require reading the current value. They add to the write set but not the read set, meaning:
- Two concurrent add operations on the same key don't conflict with each other
- A transaction that only performs add on a key reads nothing, so it never aborts because of that key (a concurrent transaction that reads the key, however, will conflict with the add)

```python
# DON'T (creates read-write conflict):
count = int(tr[b'counter'] or 0)
tr[b'counter'] = str(count + 1).encode()

# DO (atomic, no read conflict):
tr.add(b'counter', struct.pack('<q', 1))  # little-endian int encoding
```
3. Minimize Transaction Duration
Longer transactions have more time for conflicts to accumulate. FoundationDB enforces a 5-second maximum, but aim for milliseconds.
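One way to keep transactions short is to do slow work (parsing, rendering, computation) before the transaction starts, so the transactional part only moves bytes. A sketch with hypothetical `build_report`/`store_report` names and a dict standing in for the database:

```python
import json

def build_report(raw_rows):
    # Expensive computation: runs BEFORE any transaction starts
    return json.dumps({"total": sum(raw_rows)}).encode()

def store_report(tr, key, report_bytes):
    # Transactional part: a single write, milliseconds of work
    tr[key] = report_bytes

# Usage sketch (dict stands in for the database):
db = {}
report = build_report(range(1_000_000))     # slow part, no transaction held
store_report(db, b"reports/daily", report)  # fast transactional part
```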
4. Use Versionstamps for Unique Keys
Inserting with unique keys eliminates write-write conflicts:
```python
# Events with unique, time-ordered keys never conflict on insert.
# The incomplete versionstamp is filled in by the database at commit,
# so the key must be packed with pack_with_versionstamp:
tr.set_versionstamped_key(
    fdb.tuple.pack_with_versionstamp(('events', fdb.tuple.Versionstamp())),
    event_data
)
```
Conflicts aren't bugs; they're the mechanism that ensures correctness. A well-designed system may have some conflicts under high load. The key is ensuring conflicts are handled gracefully (automatic retry) and their frequency doesn't dominate transaction throughput. FoundationDB's client libraries handle retries transparently.
How do you know a database's correctness claims are true? This is a profound challenge. Distributed systems have astronomical numbers of possible states—testing every scenario is impossible. Bugs may lurk in rare race conditions that occur once in a million operations, in failures that happen only during recovery, or in interactions between seemingly unrelated components.
FoundationDB's answer is deterministic simulation testing—a revolutionary testing methodology that has found bugs that conventional testing would never catch. This approach is so effective that FoundationDB's developers have stated they "would rather run simulations than have more users" for finding bugs.
The Core Insight:
Traditional distributed systems testing is non-deterministic. You run the system, inject failures, and hope to trigger bugs. If a test fails, you often can't reproduce it—the exact timing of events is different each run.
FoundationDB takes a different approach:
Single-Threaded Simulation: The entire distributed system—multiple nodes, network, disk I/O—runs in a single thread, with a custom scheduler controlling event ordering.
Deterministic Pseudo-Random: All randomness (timings, failures, scheduling decisions) comes from a seeded PRNG. Given the same seed, the exact same execution replays.
Fault Injection: The simulator can inject any failure—network partitions, disk errors, machine crashes, Byzantine behavior—at any point in execution.
Invariant Checking: After every state change, the simulator verifies that all system invariants hold (no lost writes, no inconsistent reads, etc.).
```
FOUNDATIONDB SIMULATION TESTING ARCHITECTURE
═══════════════════════════════════════════════════════════════════

Traditional Testing:
────────────────────
  Real Time
  Node A ──────────────────────────────▶
  Node B ──────────────────────────────▶
  Node C ──────────────────────────────▶
  Network ─────────────────────────────▶

  Problem: Non-deterministic timing, can't reproduce failures

═══════════════════════════════════════════════════════════════════

FoundationDB Simulation:
────────────────────────
  DETERMINISTIC DISCRETE-EVENT SIMULATOR

  Simulated Time: t=0.0000000

  Event Queue (priority by simulated time):
    t=0.001: Node A processes client request
    t=0.002: Network delivers message A→B
    t=0.003: Node B disk write completes
    t=0.015: Node C timeout fires
    t=0.023: FAULT: Network partition A|BC
    t=0.045: FAULT: Node A crashes
    t=0.100: FAULT: Node A restarts
    ...

  Single Thread Execution:
    1. Pop event from queue
    2. Execute event handler
    3. Handler may enqueue new events
    4. CHECK ALL INVARIANTS
    5. Repeat

  PRNG Seed: 8675309
  (Same seed = exact same execution = reproducible bugs)

FAULT INJECTION CAPABILITIES:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Network:
    • Partition (total or partial)
    • Message delay (configurable distribution)
    • Message loss (probabilistic)
    • Message corruption
    • Network bandwidth limits

  Storage:
    • Read/write errors
    • Corruption (bit flips, partial writes)
    • Slow disk (latency injection)
    • Disk full

  Process:
    • Crash at any point
    • Restart with various delays
    • Memory corruption
    • Clock skew

  Cascading failures, split-brain scenarios, recovery during failure...
  ALL systematically tested!

INVARIANTS VERIFIED CONTINUOUSLY:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  • No transaction sees uncommitted data
  • Committed transactions are durable
  • All replicas converge to same state
  • Read versions form a proper partial order
  • Range queries return consistent snapshots
  • Atomic operations compose correctly
  • Recovery restores all committed state
```

What Makes This Work:
Flow: FoundationDB is written in Flow, a custom C++ extension that adds actor-model concurrency with explicit suspension points. All I/O is asynchronous, and the simulator intercepts all async operations.
No Hidden State: The simulator controls all sources of non-determinism: time, random numbers, network, disk. There's nothing the code can do that the simulator doesn't observe.
Full System Simulation: Unlike unit tests that mock components, simulation runs the actual production code—just with simulated I/O. If it works in simulation, it works in production.
Massive Parallelism: Simulations are fast (no real I/O waits) and independent. FoundationDB runs thousands of simulation hours daily across their test cluster.
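The scheme can be sketched in miniature: a seeded PRNG drives event timing and fault injection, so the same seed always replays the exact same event order. This is an illustrative toy, not FoundationDB's Flow-based simulator:

```python
import heapq
import random

def run_simulation(seed):
    rng = random.Random(seed)      # ALL randomness comes from this PRNG
    events = []                    # (time, sequence, description) heap
    seq = 0
    for node in ("A", "B", "C"):
        heapq.heappush(events, (rng.uniform(0, 1), seq, f"{node}: write"))
        seq += 1
    # Inject a fault at a random simulated time
    heapq.heappush(events, (rng.uniform(0, 1), seq, "FAULT: partition"))

    trace = []
    while events:
        t, _, what = heapq.heappop(events)
        trace.append((round(t, 6), what))
        # A real simulator would execute the handler here, possibly
        # enqueue new events, and then CHECK ALL INVARIANTS
    return trace

# Same seed -> identical execution, so a failing trace is a repro case
assert run_simulation(8675309) == run_simulation(8675309)
```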
A Concrete Example: Finding Rare Bugs
Consider a bug that only manifests when a particular sequence of rare events lines up, for example a network partition during recovery followed by a crash at one precise moment.

In real testing, this sequence might occur once in billions of runs—effectively never. In simulation, the fault injector triggers such sequences deliberately, simulated time is compressed because there are no real I/O waits, and any seed that exposes the bug replays the identical execution for debugging.
FoundationDB's simulation testing isn't a nice-to-have; it's fundamental to how they develop. New features must pass simulation tests. Bug fixes require a simulation test that would have caught the bug. This discipline is why FoundationDB can make strong correctness claims—they've tested more failure scenarios than any real-world deployment could encounter.
Strict serializability changes how you write applications. Many patterns and defensive techniques common in eventual-consistency systems become unnecessary—while some new considerations emerge.
What You No Longer Need:
1. Explicit Locking
With strict serializability, transactions are the locking mechanism. You don't need:
```python
# NOT NEEDED in FoundationDB
acquire_lock("resource_123")
try:
    do_work()
finally:
    release_lock("resource_123")

# Instead, just do your work in a transaction:
@fdb.transactional
def do_work(tr):
    # Automatically serialized with other transactions
    data = tr[b'resource_123']
    # ... process ...
    tr[b'resource_123'] = new_data
```
2. Read-Your-Writes Workarounds
In eventually consistent systems, you often need session tracking or sticky routing to ensure users see their own writes. Not needed in FoundationDB—every read after a committed write sees that write.
3. Conflict Retry Logic
The @fdb.transactional decorator handles retries automatically. You don't need retry loops, exponential backoff, or idempotency keys (mostly—see below).
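The decorator's behavior amounts to a retry loop: run the function against a fresh transaction, commit, and on conflict re-run the whole thing. A sketch of the idea with fake stand-ins (`ConflictError`, `FakeDB`, and `FakeTransaction` are illustrative, not the real fdb classes):

```python
class ConflictError(Exception):
    """Stand-in for fdb's retryable 'not committed' error."""

class FakeTransaction:
    """Toy transaction whose commit fails a set number of times."""
    def __init__(self, remaining):
        self.remaining = remaining
    def commit(self):
        if self.remaining["n"] > 0:
            self.remaining["n"] -= 1
            raise ConflictError()

class FakeDB:
    def __init__(self, conflicts):
        self._remaining = {"n": conflicts}
    def create_transaction(self):
        return FakeTransaction(self._remaining)

def run_transactional(db, func, max_retries=10):
    # Essence of @fdb.transactional: the whole function re-runs on
    # conflict, which is why it must be free of external side effects.
    for _ in range(max_retries):
        tr = db.create_transaction()
        try:
            result = func(tr)
            tr.commit()
            return result
        except ConflictError:
            continue  # retry from scratch with a fresh read version
    raise RuntimeError("too many conflicts")

db = FakeDB(conflicts=2)  # first two commit attempts conflict
attempts = []
result = run_transactional(db, lambda tr: attempts.append(1) or "done")
# The function body ran 3 times; only the final attempt committed.
```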
4. Data Versioning for Consistency
Patterns like optimistic locking with version numbers are unnecessary:
```python
# NOT NEEDED in FoundationDB
data, version = get_with_version("doc_123")
# ... modify data ...
update_if_version_matches("doc_123", new_data, version)

# Instead:
@fdb.transactional
def update_doc(tr):
    data = tr[b'doc_123']  # Read creates conflict range
    # ... modify data ...
    tr[b'doc_123'] = new_data  # Conflicts if someone else modified
```
What You Still Need:
1. Idempotency for Non-Database Side Effects
If your transaction has side effects outside FoundationDB (sending emails, calling APIs, charging credit cards), you still need idempotency. The transaction might commit, but your client might crash before receiving confirmation:
```python
@fdb.transactional
def process_order(tr, order_id):
    # Check if already processed (idempotency key)
    status = tr[pack(('orders', order_id, 'status'))]
    if status == b'completed':
        return  # Already done, skip

    # Mark as processing (prevents duplicate processing)
    tr[pack(('orders', order_id, 'status'))] = b'processing'

    # Database work is safe within the transaction
    deduct_inventory(tr, order_id)

    # External side effect - needs its own protection
    # Often done AFTER transaction commits, with status tracking
    tr[pack(('orders', order_id, 'status'))] = b'completed'

# For truly external operations:
@fdb.transactional
def request_external_processing(tr, order_id):
    # Record intent in database
    tr[pack(('pending_external', order_id))] = b'pending'
    # Separate worker processes pending_external entries
    # with its own idempotency and retry logic
```
2. Transaction Size Limits
FoundationDB transactions are limited: a transaction must finish within 5 seconds and may affect at most about 10 MB of data, with individual keys limited to 10 kB and values to 100 kB.
Large batch operations must be split into multiple transactions:
```python
def delete_old_events(db, before_timestamp):
    """Delete events older than timestamp in batches."""
    while True:
        # A decorated function takes the database as its first argument
        deleted_count = delete_batch(db, before_timestamp, batch_size=1000)
        if deleted_count == 0:
            break
        print(f"Deleted {deleted_count} events")

@fdb.transactional
def delete_batch(tr, before_timestamp, batch_size):
    prefix = pack(('events',))
    count = 0
    to_delete = []
    for key, value in tr.get_range(prefix, prefix + b'\xff', limit=batch_size):
        event_time = unpack(key)[1]  # Assuming keys pack to (events, timestamp, ...)
        if event_time < before_timestamp:
            to_delete.append(key)
            count += 1
    for key in to_delete:
        del tr[key]
    return count
```
- Use atomic operations where possible: add(), min(), max() avoid read conflicts.

Think of each transaction as if it were the only transaction running. Design your logic for single-threaded correctness. FoundationDB ensures that concurrent transactions behave as if they ran one at a time—you provide the 'what', it handles the 'how'.
We've explored FoundationDB's approach to transaction isolation—strict serializability achieved through optimistic concurrency control and validated through deterministic simulation testing. This guarantee is the bedrock that makes everything else possible.
Atomic operations such as add(), min(), and max() write without reading, enabling concurrent updates to shared keys.

What's Next:
The ordered key-value store with strict serializability is powerful, but for many use cases, raw key-value access isn't ergonomic. FoundationDB addresses this through its layer architecture—the ability to build higher-level abstractions (document databases, SQL databases, graph databases) on top of the core primitives. In the next page, we'll explore how layers work and why this architecture enables unprecedented flexibility.
You now understand FoundationDB's transactional guarantees—strict serializability through optimistic concurrency control, validated by rigorous simulation testing. This correctness foundation is what makes FoundationDB trustworthy for critical infrastructure. Next, we'll see how the layer architecture turns this foundation into versatile data systems.