In the crowded landscape of distributed databases, FoundationDB stands apart not for what it does, but for what it deliberately doesn't do. While competitors race to add features—query languages, indexing strategies, specialized data types, built-in caching—FoundationDB takes the opposite approach: provide the smallest possible primitive that is still useful, and make that primitive absolutely bulletproof.
That primitive is the ordered key-value store with strict serializable ACID transactions.
This design philosophy, radical in its simplicity, has attracted some of the most demanding users in the technology industry. Apple acquired FoundationDB in 2015 and uses it as the backbone for iCloud's infrastructure, serving hundreds of millions of users. Snowflake, the cloud data warehouse valued at over $70 billion, built their metadata layer on FoundationDB. These aren't companies that make infrastructure decisions lightly—they chose FoundationDB because its approach to data fundamentals is unmatched.
In this page, we'll explore the ordered key-value store model that lies at the heart of FoundationDB, understanding why this seemingly simple abstraction becomes extraordinarily powerful when implemented with unwavering correctness guarantees.
By the end of this page, you will understand:

1. What an ordered key-value store is and why ordering matters for building higher-level abstractions
2. How FoundationDB's key-value model differs from other key-value databases like Redis or DynamoDB
3. The specific operations FoundationDB provides and their semantic guarantees
4. How to model complex data structures within the ordered key-value paradigm
5. Why simplicity at the core enables complexity at the edges
Before diving into FoundationDB's specifics, let's establish a foundational understanding of key-value stores and their variations. This context is essential for appreciating what makes FoundationDB's approach distinctive.
The Basic Key-Value Model:
At its simplest, a key-value store is a dictionary—a collection of (key, value) pairs where each key maps to exactly one value. The fundamental operations are:

- set(key, value): store a value under a key, overwriting any existing value
- get(key): retrieve the value associated with a key, if any
- delete(key): remove a key and its value
This model is attractive because it eliminates the complexity of schemas, relationships, and query optimization. You're responsible for organizing your data; the database is responsible for storing and retrieving it efficiently.
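To make this concrete, here is a minimal in-memory sketch of that contract in Python. The class and method names are illustrative, not any particular library's API:

```python
from typing import Optional

class KeyValueStore:
    """Minimal sketch of the basic key-value contract (hash-backed)."""

    def __init__(self) -> None:
        self._data: dict[bytes, bytes] = {}

    def set(self, key: bytes, value: bytes) -> None:
        self._data[key] = value        # overwrites any existing value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)     # None if the key is absent

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)      # no-op if the key is absent
```

Because the backing structure is a hash table, this store can answer "What is the value for this exact key?" but nothing about neighboring keys, which is precisely the limitation the next sections examine.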
The Spectrum of Key-Value Stores:
Not all key-value stores are created equal. They vary dramatically in their guarantees and capabilities:
| Type | Examples | Ordering | Transactions | Typical Use Case |
|---|---|---|---|---|
| In-Memory Cache | Redis, Memcached | None (hash-based) | Limited/None | Caching, sessions, rate limiting |
| Simple Distributed | Amazon DynamoDB (basic) | None (hash partitioned) | Single-item only | Simple CRUD, user profiles |
| Ordered/Sorted | FoundationDB, RocksDB | Lexicographic | Multi-key ACID | Building databases, indexes |
| Wide-Column | Cassandra, HBase | Partial (within partition) | Row-level | Time-series, write-heavy loads |
Why Ordering Matters: The Hidden Power:
The distinction between unordered and ordered key-value stores might seem minor, but it's actually transformative. An unordered store (like a hash table) can only answer the question "What is the value for this exact key?" An ordered store can additionally answer:

- "What are all the key-value pairs between key A and key B?" (range queries)
- "What are all the keys that start with prefix P?" (prefix scans)
- "What is the first key at or after K, or the last key before it?" (ordered iteration)
These capabilities unlock the ability to encode hierarchical structures, indexes, and relationships within the key-value model itself. We'll see this in detail shortly, but consider: with ordering, you can store a user's orders as keys like users/alice/orders/001, users/alice/orders/002, etc., and retrieve all of Alice's orders with a single range query on the prefix users/alice/orders/.
The Transaction Question:
Many key-value stores offer excellent performance for individual operations but provide weak or no transactional guarantees when multiple operations must succeed or fail together. This creates immense complexity for application developers who must implement their own concurrency control, retry logic, and consistency checks.
FoundationDB takes the opposite stance: every operation, from the simplest single-key read to complex multi-key updates across the entire keyspace, executes within a serializable ACID transaction. There is no "eventual consistency mode," no "relaxed isolation for performance," no escape hatch that compromises correctness.
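To see what this guarantee buys you, consider the classic transfer example: debiting one key and crediting another must never be observed half-done. Below is a minimal sketch using the Python binding; the accounts/ key scheme is illustrative, and it assumes balances are stored as ASCII integers under existing keys:

```python
import fdb

fdb.api_version(720)
db = fdb.open()

@fdb.transactional
def transfer(tr, from_acct: bytes, to_acct: bytes, amount: int):
    # Both reads and both writes belong to one serializable transaction:
    # no other transaction can ever observe the intermediate state.
    src = int(tr[b'accounts/' + from_acct])   # assumes both accounts exist
    dst = int(tr[b'accounts/' + to_acct])
    if src < amount:
        raise ValueError("insufficient funds")
    tr[b'accounts/' + from_acct] = str(src - amount).encode()
    tr[b'accounts/' + to_acct] = str(dst + amount).encode()

# The @fdb.transactional decorator commits when the function returns and
# automatically retries the function if the transaction conflicts.
transfer(db, b'alice', b'bob', 25)
```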
FoundationDB's API is intentionally minimal, but this simplicity is deceptive. The combination of ordering, range queries, and full ACID transactions creates a surprisingly powerful substrate. It's like how a small set of LEGO bricks, combined thoughtfully, can construct arbitrarily complex structures.
FoundationDB provides a carefully designed set of operations that balance simplicity with expressiveness. Each operation is designed to be composed with others within transactions, enabling complex atomic updates without complex APIs.
Keys and Values:
Both keys and values in FoundationDB are arbitrary byte strings:

- Keys can be up to 10KB (10,000 bytes), though short keys perform best
- Values can be up to 100KB (100,000 bytes), though small values perform best
- Neither is parsed or interpreted by the database in any way
This byte-string model means you can store anything: serialized JSON, Protocol Buffers, raw binary data, UTF-8 strings, or any encoding you devise. FoundationDB is encoding-agnostic; it sees only bytes.
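For example, a JSON document round-trips through the database as opaque bytes. A brief sketch; the profiles/ key scheme is illustrative:

```python
import json

import fdb

fdb.api_version(720)
db = fdb.open()

@fdb.transactional
def save_profile(tr, user_id: str, profile: dict):
    # FoundationDB stores the serialized bytes verbatim; the choice of
    # encoding (JSON here) is entirely up to the application.
    tr[f'profiles/{user_id}'.encode()] = json.dumps(profile).encode('utf-8')

@fdb.transactional
def load_profile(tr, user_id: str):
    raw = tr[f'profiles/{user_id}'.encode()]
    if not raw.present():      # reads return a lazy Value; check presence
        return None
    return json.loads(raw.decode('utf-8'))
```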
The Fundamental Operations:
```python
import fdb
import struct

# Initialize FoundationDB client
fdb.api_version(720)
db = fdb.open()

# ============================================
# BASIC READ/WRITE OPERATIONS
# ============================================

@fdb.transactional
def basic_operations(tr):
    """
    All operations within this function execute atomically.
    If any operation fails, all are rolled back.
    """
    # SET: Write a key-value pair
    # If key exists, overwrites; if not, creates
    tr[b'users/alice/email'] = b'alice@example.com'
    tr[b'users/alice/name'] = b'Alice Smith'
    tr[b'users/alice/created_at'] = b'2024-01-15T12:00:00Z'

    # GET: Read a single key's value
    # The result is a lazy Value; compare with == None, not 'is None'
    email = tr[b'users/alice/email']
    print(f"Email: {email}")  # b'alice@example.com'

    # DELETE: Remove a key-value pair
    # Silently succeeds even if key doesn't exist
    del tr[b'users/alice/temp_token']

    # Key existence check
    if tr[b'users/alice/premium'] != None:
        print("User has premium status")

# ============================================
# RANGE OPERATIONS - The Power of Ordering
# ============================================

@fdb.transactional
def range_operations(tr):
    """
    Range operations leverage the sorted nature of keys.
    """
    # Get all keys with prefix 'users/alice/'
    # Returns an iterator, not a list (efficient for large ranges)
    for key, value in tr.get_range(
        b'users/alice/',      # Start key (inclusive)
        b'users/alice/\xff'   # End key (exclusive) - \xff captures all children
    ):
        print(f"{key} = {value}")
    # Outputs (in sorted order):
    # users/alice/created_at = 2024-01-15T12:00:00Z
    # users/alice/email = alice@example.com
    # users/alice/name = Alice Smith

    # Range with limit: get only the first 10 orders
    orders = tr.get_range(
        b'users/alice/orders/',
        b'users/alice/orders/\xff',
        limit=10
    )

    # Range in reverse order: get the 5 most recent orders
    # (assuming keys are time-sorted)
    recent_orders = tr.get_range(
        b'users/alice/orders/',
        b'users/alice/orders/\xff',
        limit=5,
        reverse=True
    )

# ============================================
# ATOMIC OPERATIONS - Conflict-Free Updates
# ============================================

@fdb.transactional
def atomic_operations(tr):
    """
    Atomic mutations that don't require reading the current value.
    These reduce conflicts in high-contention scenarios.
    """
    # Atomic ADD: Increment/decrement without a read-modify-write cycle.
    # The parameter is a little-endian integer, not a tuple encoding.
    # Incredibly useful for counters, metrics, inventory.
    tr.add(b'metrics/page_views', struct.pack('<q', 1))
    tr.add(b'inventory/product_123', struct.pack('<q', -1))  # Decrement

    # Atomic BITWISE operations (operands are raw bytes)
    some_flag_bytes = b'\x04'   # illustrative placeholder operands
    mask_bytes = b'\xfe'
    toggle_bytes = b'\x01'
    tr.bit_or(b'permissions/alice', some_flag_bytes)
    tr.bit_and(b'permissions/alice', mask_bytes)
    tr.bit_xor(b'toggles/feature_x', toggle_bytes)

    # Atomic MIN/MAX: Compare-and-update atomically
    tr.min(b'stats/min_latency', struct.pack('<q', 42))
    tr.max(b'stats/max_latency', struct.pack('<q', 1500))

    # SET_VERSIONSTAMPED_KEY: Unique time-ordered keys.
    # FoundationDB replaces a 10-byte (80-bit) placeholder with a unique
    # versionstamp at commit time. The key must end with a 4-byte
    # little-endian offset pointing at the placeholder.
    # Perfect for event logs, audit trails, time-series.
    event_data = b'{"event": "user_login"}'  # illustrative payload
    prefix = b'events/'
    key = prefix + b'\x00' * 10 + b'/user_login' + struct.pack('<I', len(prefix))
    tr.set_versionstamped_key(key, event_data)

# ============================================
# CLEAR RANGE - Efficient Bulk Deletion
# ============================================

@fdb.transactional
def clear_operations(tr):
    """
    Efficiently delete ranges of keys.
    """
    # Delete a single key
    del tr[b'users/bob/session']

    # Delete all keys with a prefix (very efficient)
    # This deletes all of Bob's data atomically
    tr.clear_range(b'users/bob/', b'users/bob/\xff')

    # Useful for:
    # - Account deletion (GDPR right to erasure)
    # - Clearing expired sessions
    # - Resetting test data
```

Understanding Key Ordering:
FoundationDB orders keys lexicographically by their byte content. This is similar to dictionary ordering for strings, extended to arbitrary bytes:
a < aa < ab < b < ba < bb
0x00 < 0x01 < 0x02 < ... < 0xFE < 0xFF
This lexicographic ordering has important implications:
Numeric values need encoding: The string "10" sorts before "9" lexicographically. For numeric sorting, you must use special encodings (FoundationDB's Tuple layer handles this automatically).
Hierarchies need thought: Keys like users/alice/orders/001 naturally sort together because they share a prefix.
The \xFF suffix trick: To get all keys starting with a prefix P, query the range [P, P\xFF). Because \xFF is the highest byte value, this captures every key starting with P except those whose next byte after P is itself \xFF; that edge case never arises for tuple-encoded keys, and the bindings' range helpers (such as fdb.tuple.range) compute a fully general end key for you.
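A short sketch of both points, assuming the standard Python binding; fdb.tuple.range computes correct range endpoints so you rarely need to build \xFF bounds by hand:

```python
import fdb

fdb.api_version(720)
db = fdb.open()

# Raw byte strings sort lexicographically, so "10" sorts before "9":
assert b"users/10" < b"users/9"

# Tuple-encoded integers sort numerically, as intended:
assert fdb.tuple.pack((9,)) < fdb.tuple.pack((10,))

@fdb.transactional
def all_orders(tr, user):
    # The binding computes the exclusive end key for a prefix scan,
    # so there is no hand-rolled b'\xff' bound to get subtly wrong.
    r = fdb.tuple.range(("users", user, "orders"))
    return list(tr.get_range(r.start, r.stop))
```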
The Tuple Layer: Structured Key Encoding:
While FoundationDB only sees bytes, manually encoding complex keys is error-prone. The Tuple layer provides a standard encoding for structured keys that preserves sort order:
```python
import fdb
from fdb.tuple import pack, unpack

fdb.api_version(720)
db = fdb.open()

# The Tuple layer encodes values in a way that preserves
# intuitive sort order across types

# Packing tuples to bytes
key1 = pack(("users", "alice", 1))   # b'\x02users\x00\x02alice\x00\x15\x01'
key2 = pack(("users", "alice", 2))   # b'\x02users\x00\x02alice\x00\x15\x02'
key3 = pack(("users", "alice", 10))  # b'\x02users\x00\x02alice\x00\x15\x0a'
key4 = pack(("users", "bob", 1))     # b'\x02users\x00\x02bob\x00\x15\x01'

# These sort correctly!
# key1 < key2 < key3 < key4
# Even though the string "10" would sort before "2"

# Unpacking bytes back to tuples
original = unpack(key1)  # ("users", "alice", 1)

# ============================================
# PRACTICAL SCHEMA DESIGN WITH TUPLES
# ============================================

# E-commerce schema example
# Each tuple element represents a level in the key hierarchy

# User data
user_key = pack(("data", "users", user_id, "profile"))
user_orders_prefix = pack(("data", "users", user_id, "orders"))
user_order_key = pack(("data", "users", user_id, "orders", order_id))

# Product catalog
product_key = pack(("data", "products", product_id))
category_products = pack(("index", "category", category_name, product_id))

# Order items (nested under orders)
order_item_key = pack((
    "data", "users", user_id, "orders", order_id, "items", item_id
))

# Time-series data with timestamp encoding
# Integers encode with preserved sort order
event_key = pack(("events", timestamp_ms, event_type, event_id))

# Range query (inside a transaction): all events between two timestamps
start = pack(("events", start_time))
end = pack(("events", end_time))
events_in_range = tr.get_range(start, end)
```

While values can be up to 100KB, FoundationDB performs best with smaller values (under 10KB). For larger objects, split data across multiple keys or store a reference to an external blob store. The transaction size limit (10MB total) also constrains how much data you can write in a single transaction.
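One common way to handle larger objects is to chunk them across adjacent keys under a shared prefix. A sketch under assumed conventions; the chunk size and ("blobs", ...) key layout are illustrative:

```python
import fdb

fdb.api_version(720)
db = fdb.open()

CHUNK_SIZE = 10_000  # stay well under the 100KB value limit

@fdb.transactional
def write_blob(tr, blob_id, data: bytes):
    # Clear any previous version, then write fixed-size chunks under
    # ("blobs", blob_id, index); tuple-encoded indexes keep them ordered.
    r = fdb.tuple.range(("blobs", blob_id))
    tr.clear_range(r.start, r.stop)
    for i in range(0, len(data), CHUNK_SIZE):
        chunk_key = fdb.tuple.pack(("blobs", blob_id, i // CHUNK_SIZE))
        tr[chunk_key] = data[i:i + CHUNK_SIZE]

@fdb.transactional
def read_blob(tr, blob_id) -> bytes:
    # A single range read returns the chunks in index order.
    r = fdb.tuple.range(("blobs", blob_id))
    return b''.join(v for _, v in tr.get_range(r.start, r.stop))
```

Note that the 10MB transaction limit still caps how much you can write at once; very large blobs need multiple transactions or an external store.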
The art of using FoundationDB effectively lies in key design—how you structure your keys to enable efficient access patterns. Unlike relational databases where the query optimizer handles data access, in FoundationDB you design the access paths through your key structure.
Fundamental Principle: Keys Are Access Paths
Every key structure you design should be optimized for specific access patterns. Consider what questions your application needs to answer:
- Fetch a user by ID → (users, user_id)
- Find a user by email → (index, email, user_id)
- List all of a user's orders → (users, user_id, orders, *)
- List orders by date → (orders, date, order_id)

Secondary Indexes: Denormalization for Query Flexibility
Since FoundationDB only supports lookups by exact key and by key range, every query pattern needs its own index. This means intentional denormalization:
```python
import fdb
from fdb.tuple import pack

# ============================================
# BUILDING A USER MANAGEMENT SYSTEM
# ============================================

@fdb.transactional
def create_user(tr, user_id, email, name, created_at):
    """
    Create a user with multiple indexes for different access patterns.
    All operations are atomic - either all succeed or none do.
    """
    # Primary data: user record
    tr[pack(("data", "users", user_id, "email"))] = email.encode()
    tr[pack(("data", "users", user_id, "name"))] = name.encode()
    tr[pack(("data", "users", user_id, "created_at"))] = str(created_at).encode()

    # Index 1: Lookup by email (email -> user_id mapping)
    tr[pack(("index", "users_by_email", email))] = pack((user_id,))

    # Index 2: Users by creation date (for "newest users" query)
    # Index the string form so deletion can rebuild the exact key
    tr[pack(("index", "users_by_date", str(created_at), user_id))] = b''

    # Index 3: Users by name prefix (for autocomplete)
    # Store multiple prefixes for fuzzy matching
    name_lower = name.lower()
    for i in range(1, min(len(name_lower) + 1, 10)):  # Up to 10 chars
        prefix = name_lower[:i]
        tr[pack(("index", "users_by_name_prefix", prefix, user_id))] = b''

@fdb.transactional
def get_user_by_email(tr, email):
    """Lookup user by email using secondary index."""
    # Step 1: Find user_id from email index
    # (reads return a lazy Value; compare with == None, not 'is None')
    user_id_bytes = tr[pack(("index", "users_by_email", email))]
    if user_id_bytes == None:
        return None
    user_id = fdb.tuple.unpack(user_id_bytes)[0]

    # Step 2: Fetch user data using user_id
    return get_user_by_id(tr, user_id)

@fdb.transactional
def get_user_by_id(tr, user_id):
    """Fetch all user attributes using range query on prefix."""
    prefix = pack(("data", "users", user_id))
    end = prefix + b'\xff'

    user = {"id": user_id}
    for key, value in tr.get_range(prefix, end):
        # Extract attribute name from key
        key_tuple = fdb.tuple.unpack(key)
        attr_name = key_tuple[-1]  # Last element is attribute name
        user[attr_name] = value.decode()

    return user if len(user) > 1 else None

@fdb.transactional
def delete_user(tr, user_id):
    """
    Delete user and ALL associated indexes.
    Failure to delete indexes creates orphan data - be thorough!
    """
    # First, read user data to get values needed for index cleanup
    email = tr[pack(("data", "users", user_id, "email"))]
    name = tr[pack(("data", "users", user_id, "name"))]
    created_at = tr[pack(("data", "users", user_id, "created_at"))]

    if email == None:
        return False  # User doesn't exist

    # Delete all user data (clear_range is efficient)
    tr.clear_range(
        pack(("data", "users", user_id)),
        pack(("data", "users", user_id)) + b'\xff'
    )

    # Delete email index
    del tr[pack(("index", "users_by_email", email.decode()))]

    # Delete date index (stored as a string at creation time)
    del tr[pack(("index", "users_by_date", created_at.decode(), user_id))]

    # Delete name prefix indexes
    name_lower = name.decode().lower()
    for i in range(1, min(len(name_lower) + 1, 10)):
        prefix = name_lower[:i]
        del tr[pack(("index", "users_by_name_prefix", prefix, user_id))]

    return True

@fdb.transactional
def search_users_by_name_prefix(tr, prefix, limit=10):
    """
    Autocomplete: find users whose names start with prefix.
    Returns list of user_ids.
    """
    search_key = pack(("index", "users_by_name_prefix", prefix.lower()))
    end_key = search_key + b'\xff'

    user_ids = []
    for key, _ in tr.get_range(search_key, end_key, limit=limit):
        key_tuple = fdb.tuple.unpack(key)
        user_ids.append(key_tuple[-1])  # user_id is last element

    return user_ids  # each user appears at most once per prefix
```

The Index Consistency Challenge:
Maintaining secondary indexes creates a critical challenge: the index and data must always be consistent. If you update a user's email but forget to update the email index, lookups will fail or return wrong results.
FoundationDB's transactions solve this problem completely. Since all operations in a transaction are atomic, you can update the data and all indexes in one transaction with the guarantee that either all changes apply or none do.
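For instance, changing a user's email can update the record and both index entries in one atomic transaction. A sketch reusing the hypothetical schema from the example above:

```python
import fdb
from fdb.tuple import pack

fdb.api_version(720)
db = fdb.open()

@fdb.transactional
def update_user_email(tr, user_id, new_email):
    old_email = tr[pack(("data", "users", user_id, "email"))]
    if old_email == None:  # lazy Value: compare with ==, not 'is'
        return False
    # All three writes commit together or not at all, so the
    # email index can never disagree with the user record.
    del tr[pack(("index", "users_by_email", old_email.decode()))]
    tr[pack(("data", "users", user_id, "email"))] = new_email.encode()
    tr[pack(("index", "users_by_email", new_email))] = pack((user_id,))
    return True
```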
Modeling Relationships:
While FoundationDB lacks explicit foreign keys, relationships are easily modeled through key design:
```python
import struct

import fdb
from fdb.tuple import pack

# ============================================
# MODELING RELATIONSHIPS IN FOUNDATIONDB
# ============================================

# ONE-TO-MANY: User has many Orders
# Parent data and children share a key prefix

# User record:
#   ("data", "users", user_id, "profile") → user_json

# User's orders (children nested under parent):
#   ("data", "users", user_id, "orders", order_id, "status") → "shipped"
#   ("data", "users", user_id, "orders", order_id, "total")  → "99.99"

# Get all orders for a user (inside a transaction):
tr.get_range(
    pack(("data", "users", user_id, "orders")),
    pack(("data", "users", user_id, "orders")) + b'\xff'
)

# ============================================
# MANY-TO-MANY: Users follow Users
# ============================================

# Store both directions for efficient queries

# User A follows User B
# Forward direction: "Who does A follow?"
#   ("data", "follows", "following", user_a_id, user_b_id) → b''

# Reverse direction: "Who follows B?"
#   ("data", "follows", "followers", user_b_id, user_a_id) → b''

@fdb.transactional
def follow_user(tr, follower_id, followed_id):
    """Create a follow relationship (both directions atomically)."""
    tr[pack(("data", "follows", "following", follower_id, followed_id))] = b''
    tr[pack(("data", "follows", "followers", followed_id, follower_id))] = b''

@fdb.transactional
def unfollow_user(tr, follower_id, followed_id):
    """Remove follow relationship (both directions atomically)."""
    del tr[pack(("data", "follows", "following", follower_id, followed_id))]
    del tr[pack(("data", "follows", "followers", followed_id, follower_id))]

@fdb.transactional
def get_following(tr, user_id, limit=100):
    """Get list of users that user_id follows."""
    prefix = pack(("data", "follows", "following", user_id))
    results = []
    for key, _ in tr.get_range(prefix, prefix + b'\xff', limit=limit):
        followed_id = fdb.tuple.unpack(key)[-1]
        results.append(followed_id)
    return results

@fdb.transactional
def get_followers(tr, user_id, limit=100):
    """Get list of users who follow user_id."""
    prefix = pack(("data", "follows", "followers", user_id))
    results = []
    for key, _ in tr.get_range(prefix, prefix + b'\xff', limit=limit):
        follower_id = fdb.tuple.unpack(key)[-1]
        results.append(follower_id)
    return results

# ============================================
# COUNT AS SEPARATE MAINTAINED VALUE
# ============================================

# For "how many followers does a user have?", maintain a count
# (More efficient than counting keys in a range)

@fdb.transactional
def follow_user_with_count(tr, follower_id, followed_id):
    # Check if already following (lazy Value: compare with ==, not 'is')
    key = pack(("data", "follows", "following", follower_id, followed_id))
    if tr[key] != None:
        return False  # Already following

    # Create relationship
    tr[key] = b''
    tr[pack(("data", "follows", "followers", followed_id, follower_id))] = b''

    # Atomically increment counts (no read required!)
    # Atomic add takes a little-endian integer parameter
    tr.add(pack(("counts", "following", follower_id)), struct.pack('<q', 1))
    tr.add(pack(("counts", "followers", followed_id)), struct.pack('<q', 1))

    return True
```

Every secondary index and denormalized view consumes additional storage and requires additional writes during updates. Design your key structure based on your actual query patterns, not hypothetical future needs. You can always add indexes later, but unnecessary indexes slow writes and consume resources.
Experienced FoundationDB developers accumulate patterns that solve common modeling challenges. Here are proven approaches that balance functionality, performance, and maintainability.
Pattern 1: Directory-Based Namespacing
FoundationDB's Directory layer provides managed key prefixes that avoid collisions between different parts of your application:
```python
import time

import fdb

fdb.api_version(720)
db = fdb.open()

# The Directory layer manages a tree of named directories
# Each directory is assigned a unique short prefix automatically

# Create/open directories (creates if they don't exist)
users_dir = fdb.directory.create_or_open(db, ('myapp', 'users'))
orders_dir = fdb.directory.create_or_open(db, ('myapp', 'orders'))
indexes_dir = fdb.directory.create_or_open(db, ('myapp', 'indexes'))

# Now use directories to create keys
# The directory prefix is automatically prepended

@fdb.transactional
def create_user_with_directory(tr, user_id, email):
    # users_dir.pack() returns the directory prefix + your key
    tr[users_dir.pack((user_id, 'email'))] = email.encode()
    tr[users_dir.pack((user_id, 'created'))] = str(time.time()).encode()

    # Index in a separate directory
    tr[indexes_dir.pack(('by_email', email))] = fdb.tuple.pack((user_id,))

# Benefits:
# 1. Short, automated prefixes (directories use compact binary prefixes)
# 2. Namespace isolation (different apps can coexist)
# 3. Easy to relocate entire datasets
# 4. Clear organizational structure
```

Pattern 2: Subspaces for Logical Grouping
Subspaces provide lightweight key prefixing without the overhead of directories:
```python
import fdb

fdb.api_version(720)

# Create subspaces for logical grouping
data = fdb.Subspace(('data',))
indexes = fdb.Subspace(('idx',))
metrics = fdb.Subspace(('metrics',))

# Use subspaces to create organized keys (inside a transaction);
# indexing into a subspace appends tuple elements to its prefix
tr[data['users'][user_id]['email']] = email.encode()
tr[indexes['users_by_email'][email]] = fdb.tuple.pack((user_id,))
```
Pattern 3: Time-Ordered Keys with Versionstamps
For event logs, audit trails, or any time-series data, versionstamps provide guaranteed unique, time-ordered keys:
```python
import json
from datetime import datetime

import fdb

fdb.api_version(720)
db = fdb.open()

# ============================================
# VERSIONSTAMPS FOR TIME-ORDERED UNIQUE KEYS
# ============================================

# A versionstamp is an 80-bit value that FoundationDB assigns
# at commit time. It's guaranteed to be:
# 1. Unique across the entire cluster
# 2. Monotonically increasing with real time
# 3. Consistent with transaction commit order

@fdb.transactional
def log_event(tr, event_type, event_data):
    """
    Log an event with automatic time-ordering.
    """
    # Key structure with a versionstamp placeholder;
    # fdb.tuple.Versionstamp() is replaced at commit time.
    # pack_with_versionstamp records where the placeholder sits
    # so set_versionstamped_key knows which bytes to fill in.
    key = fdb.tuple.pack_with_versionstamp((
        "events",
        event_type,
        fdb.tuple.Versionstamp()  # Placeholder
    ))
    tr.set_versionstamped_key(key, event_data.encode())

@fdb.transactional
def get_recent_events(tr, event_type, limit=100):
    """
    Get most recent events (reverse order).
    """
    prefix = fdb.tuple.pack(("events", event_type))
    events = []
    for key, value in tr.get_range(
        prefix,
        prefix + b'\xff',
        limit=limit,
        reverse=True  # Most recent first
    ):
        key_tuple = fdb.tuple.unpack(key)
        events.append({
            'timestamp': key_tuple[2],  # Versionstamp
            'data': value.decode()
        })
    return events

# Use case: Audit log
@fdb.transactional
def audit_action(tr, user_id, action, details):
    key = fdb.tuple.pack_with_versionstamp((
        "audit",
        user_id,
        fdb.tuple.Versionstamp()
    ))
    value = json.dumps({
        "action": action,
        "details": details,
        "recorded_at": datetime.utcnow().isoformat()
    })
    tr.set_versionstamped_key(key, value.encode())
```

Pattern 4: Avoiding Hot Keys
A common performance pitfall is creating "hot keys"—keys that are read or written so frequently that they become bottlenecks. The most common culprit is counters:
```python
import random
import struct
from datetime import datetime

import fdb
from fdb.tuple import pack

fdb.api_version(720)
db = fdb.open()

# ============================================
# ANTI-PATTERN: Single counter key (HOT KEY!)
# ============================================

# DON'T DO THIS for high-frequency counters:
@fdb.transactional
def increment_page_views_bad(tr):
    # This key becomes contended - every write lands on one key
    tr.add(b'global_page_views', struct.pack('<q', 1))

# ============================================
# PATTERN: Sharded counters
# ============================================

NUM_SHARDS = 100  # Adjust based on expected write rate

@fdb.transactional
def increment_page_views_good(tr):
    # Distribute writes across multiple keys
    shard = random.randint(0, NUM_SHARDS - 1)
    tr.add(pack(("counters", "page_views", shard)), struct.pack('<q', 1))

@fdb.transactional
def get_page_views(tr):
    # Sum all shards to get the total (reads are fast)
    total = 0
    for _, value in tr.get_range(
        pack(("counters", "page_views")),
        pack(("counters", "page_views")) + b'\xff'
    ):
        # Atomic add stores little-endian integers
        total += struct.unpack('<q', value)[0]
    return total

# ============================================
# PATTERN: Time-bucketed counters for analytics
# ============================================

def get_time_bucket():
    """Returns a bucket identifier (e.g., hourly bucket)."""
    now = datetime.utcnow()
    return now.strftime("%Y-%m-%d-%H")

@fdb.transactional
def increment_metric(tr, metric_name):
    bucket = get_time_bucket()
    shard = random.randint(0, 9)
    tr.add(
        pack(("metrics", metric_name, bucket, shard)),
        struct.pack('<q', 1)
    )

@fdb.transactional
def get_metric_for_hour(tr, metric_name, hour_bucket):
    prefix = pack(("metrics", metric_name, hour_bucket))
    total = 0
    for _, value in tr.get_range(prefix, prefix + b'\xff'):
        total += struct.unpack('<q', value)[0]
    return total
```

At first glance, FoundationDB's key-value model might seem limiting. No SQL, no secondary indexes, no joins—surely this is a step backward? Yet some of the world's most sophisticated data systems are built on this foundation. Understanding why reveals deep truths about database design.
The Composability Principle:
FoundationDB's simplicity is intentional because it maximizes composability—the ability to combine simple parts into more complex wholes without interference or unexpected interactions.
Consider what happens when a database provides built-in features.
Each built-in feature constrains how you can use the database. Features interact in complex ways—index updates may block writes, query plans may change unpredictably, cache invalidation may race with reads.
FoundationDB's approach is different: provide primitive operations with ironclad guarantees, and let developers compose them freely. You implement secondary indexes, so you control exactly when and how they update. You implement query logic, so you control optimization. You implement caching, so you control invalidation.
The Correctness Guarantee:
The other half of FoundationDB's value proposition is its unwavering correctness. Every transaction, no matter how complex, provides serializable isolation. There are no edge cases, no "under high load this might..." caveats, no subtle race conditions under concurrent access.
This guarantee is profound because it makes the database predictable. Application developers can reason about their code knowing that concurrent transactions will behave as if they executed one at a time. This dramatically simplifies application logic and eliminates entire categories of bugs.
Building Up Versus Tearing Down:
Most databases are designed with high-level features and then try to optimize or specialize downward. FoundationDB is designed with low-level primitives and then builds upward through layers.
This "building up" approach has key advantages:
Each layer can be reasoned about independently: A document layer's behavior depends only on the primitive operations it uses, not on hidden database internals.
Layers can be replaced or customized: Don't like how the SQL layer handles queries? Write your own, using the same primitives.
Guarantees propagate upward: If the key-value layer provides ACID transactions, any layer built on top inherits those transactions automatically.
Testing is tractable: The primitive layer is small enough to test exhaustively. Layers can be tested against the primitive guarantees.
We'll explore the layer architecture in depth in a later page. For now, understand that the ordered key-value store isn't the end product—it's the foundation upon which sophisticated data systems are constructed.
FoundationDB embodies the Unix philosophy: do one thing and do it well. Just as Unix provides primitive operations (files, pipes, processes) that compose into powerful systems, FoundationDB provides primitive operations (keys, values, ranges, transactions) that compose into powerful databases. The power comes from the composition, not from feature accumulation.
We've explored FoundationDB's core abstraction—the ordered key-value store—and seen how this deceptively simple model becomes powerful through careful design. Let's consolidate the key principles:

- Keys and values are arbitrary byte strings, ordered lexicographically; that ordering is what makes range and prefix queries possible.
- Every operation executes within a serializable ACID transaction, so data and all derived structures can be updated consistently together.
- Keys are access paths: design key structures (typically with the Tuple layer) around the queries your application actually needs.
- Secondary indexes, relationships, and counters are not built in; you compose them from primitives, and transactions keep them correct.
- Simplicity at the core enables complexity at the edges: higher-level data models are built as layers on top of the primitives.
What's Next:
The ordered key-value store is FoundationDB's foundation, but it's the strict serializability guarantee that makes this foundation trustworthy. In the next page, we'll explore how FoundationDB achieves serializable transactions at scale—understanding the concurrency control mechanisms, conflict detection, and the remarkable simulation testing that gives developers confidence in FoundationDB's correctness claims.
You now understand FoundationDB's ordered key-value model—the primitive abstraction that enables building any data model through composition. The real magic, however, lies in how FoundationDB guarantees that these primitives work correctly under all conditions. That's our next topic: strict serializability.