The previous page established why multiple data models exist and why their diversity creates challenges for application developers. Now we address the architectural question: How can a single database system support multiple data models simultaneously?
This isn't merely a matter of adding features. Supporting multiple models within one system requires fundamental architectural decisions about how data is stored, how queries are processed, how transactions span models, and how indexes are maintained.
The answers to these questions determine whether a multi-model database is a coherent system or merely multiple systems bolted together.
By the end of this page, you will understand the architectural patterns for building multi-model databases, including unified storage engines, query processor integration, and cross-model transaction handling. You'll be able to evaluate multi-model databases based on their architectural depth.
Multi-model databases employ several architectural patterns, each with distinct trade-offs. Understanding these patterns is essential for evaluating and selecting multi-model systems.
Pattern 1: Unified Core with Model Adapters
The most deeply integrated approach builds a flexible core storage engine that can be accessed through different model 'lenses':
┌─────────────────────────────────────────────────────┐
│ Application Layer │
├─────────────┬─────────────┬─────────────┬───────────┤
│ Document API│ Graph API │ Key-Value API│ SQL API │
├─────────────┴─────────────┴─────────────┴───────────┤
│ Unified Query Processor │
├─────────────────────────────────────────────────────┤
│ Unified Storage Engine │
│ (Optimized for Multiple Access Patterns) │
└─────────────────────────────────────────────────────┘
Characteristics:
Pattern 2: Federated Storage Engines
Some systems use different storage engines for different models, federated under a common query layer:
┌─────────────────────────────────────────────────────┐
│ Application Layer │
├─────────────────────────────────────────────────────┤
│ Unified Query Processor │
├──────────────────┬──────────────────┬───────────────┤
│ Document Engine │ Graph Engine │ KV Engine │
└──────────────────┴──────────────────┴───────────────┘
Characteristics:
Pattern 3: Extended Single-Model
Existing databases add multi-model capabilities through extensions:
┌─────────────────────────────────────────────────────┐
│ Application Layer │
├─────────────────────────────────────────────────────┤
│ Original Query Processor + Extension Handler │
├─────────────────────────────────────────────────────┤
│ Original Storage Engine + Extension Storage │
└─────────────────────────────────────────────────────┘
Characteristics:
When evaluating multi-model databases, ask: 'Was this designed multi-model from the start, or was multi-model added later?' Native multi-model systems typically offer deeper integration, while extended systems may offer better performance for their primary model.
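The first pattern above (a unified core accessed through model adapters) can be illustrated with a small Python sketch. Everything here is hypothetical: `UnifiedStore` and the three adapter classes are invented names, and a plain dictionary stands in for the storage engine. The point is only that all three "lenses" read and write the same underlying records.

```python
from typing import Any, Dict, List, Optional


class UnifiedStore:
    """Hypothetical core: one record map shared by every model adapter."""

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, record: Dict[str, Any]) -> None:
        self.records[key] = record

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        return self.records.get(key)


class KeyValueAdapter:
    """Key-value lens: opaque values addressed by key."""

    def __init__(self, store: UnifiedStore) -> None:
        self.store = store

    def set(self, key: str, value: Any) -> None:
        self.store.put(key, {"value": value})

    def get(self, key: str) -> Any:
        record = self.store.get(key)
        return None if record is None else record.get("value")


class DocumentAdapter:
    """Document lens: records are matched by field value."""

    def __init__(self, store: UnifiedStore) -> None:
        self.store = store

    def insert(self, key: str, doc: Dict[str, Any]) -> None:
        self.store.put(key, dict(doc))

    def find(self, field: str, value: Any) -> List[str]:
        return sorted(k for k, r in self.store.records.items() if r.get(field) == value)


class GraphAdapter:
    """Graph lens: edges are records carrying _from and _to."""

    def __init__(self, store: UnifiedStore) -> None:
        self.store = store

    def add_edge(self, key: str, source: str, target: str) -> None:
        self.store.put(key, {"_from": source, "_to": target})

    def neighbors(self, source: str) -> List[str]:
        return sorted(r["_to"] for r in self.store.records.values() if r.get("_from") == source)


# All three adapters share one store, so there is a single copy of the data.
store = UnifiedStore()
docs, graph, kv = DocumentAdapter(store), GraphAdapter(store), KeyValueAdapter(store)
docs.insert("products/456", {"type": "product", "name": "Laptop"})
graph.add_edge("purchased/1", "users/123", "products/456")
kv.set("session/abc", "users/123")
```

Note that `neighbors` scans every record in this sketch; a real engine would back the graph lens with an edge index.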
A unified storage engine must satisfy competing requirements from different data models. This is the deepest architectural challenge in multi-model design.
Core Storage Requirements by Model:
| Model | Storage Requirements | Access Patterns |
|---|---|---|
| Document | Nested structures, variable schemas, efficient serialization | Full document retrieval, field-level access, nested queries |
| Graph | Edge storage, adjacency lists, traversal-optimized structures | Neighbor lookup, path traversal, pattern matching |
| Key-Value | Direct key → value mapping, minimal overhead | Point lookups, range scans by key, TTL handling |
| Relational | Fixed schemas, row-oriented (or columnar) storage, index structures | Predicate filtering, joins, aggregations |
Unified Storage Approaches:
1. Document-Centric Core
Many multi-model databases use a document-like format as their core storage, representing graph edges and relational rows as documents:
// Document storage
{ "_key": "product_123", "type": "product", "name": "Laptop", ... }
// Graph edge as document
{ "_key": "123_456", "_from": "users/123", "_to": "products/456", "type": "purchased" }
// Relational row as document
{ "_key": "order_789", "customer_id": 123, "total": 599.99, "date": "2024-01-15" }
Advantages:
Challenges:
2. Graph-Native with Document Properties
Some systems store graphs natively (adjacency lists) with documents as node/edge properties:
Node Storage:
node_123: { type: "user", properties: { name: "Alice", ... } }
Edge Storage (adjacency):
node_123 -> [ node_456 (follows), node_789 (likes), ... ]
Property Storage:
edge_123_456: { since: "2024-01-01", strength: 0.8 }
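As a rough illustration of the layout above, the following Python sketch keeps nodes, adjacency lists, and edge properties in three separate dictionaries. The data for `node_456` and `node_789` is made up for the example, and the helper functions are hypothetical. It shows why neighbor lookup is direct in this layout while a document-style query by property falls back to a scan unless a secondary index is added.

```python
from typing import Any, Dict, List, Optional, Tuple

# Node storage: node id -> type and document-like properties.
nodes: Dict[str, Dict[str, Any]] = {
    "node_123": {"type": "user", "properties": {"name": "Alice"}},
    "node_456": {"type": "user", "properties": {"name": "Bob"}},     # assumed data
    "node_789": {"type": "post", "properties": {"title": "Hello"}},  # assumed data
}

# Edge storage: adjacency lists of (target, label) pairs per source node.
adjacency: Dict[str, List[Tuple[str, str]]] = {
    "node_123": [("node_456", "follows"), ("node_789", "likes")],
}

# Property storage: edge properties kept apart from the adjacency lists.
edge_properties: Dict[Tuple[str, str], Dict[str, Any]] = {
    ("node_123", "node_456"): {"since": "2024-01-01", "strength": 0.8},
}


def neighbors(node_id: str, label: Optional[str] = None) -> List[str]:
    """Graph access: follow the adjacency list, no scan over all edges."""
    return [t for t, l in adjacency.get(node_id, []) if label is None or l == label]


def find_nodes(prop: str, value: Any) -> List[str]:
    """Document-style access: without a secondary index this scans every node."""
    return sorted(n for n, rec in nodes.items() if rec["properties"].get(prop) == value)
```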
Advantages:
Challenges:
3. Hybrid with Specialized Structures
Advanced systems maintain multiple internal representations optimized for different access patterns:
┌────────────────────────────────────────┐
│ Logical Data Layer │
│ (Unified view of all data) │
├────────────────────────────────────────┤
│ Physical Storage Layer │
│ ┌──────────┬──────────┬─────────────┐ │
│ │ B+ Trees │ Adjacency│ Hash Tables │ │
│ │ (range) │ (graph) │ (point) │ │
│ └──────────┴──────────┴─────────────┘ │
└────────────────────────────────────────┘
The system maintains consistency across representations, choosing optimal structures for each access pattern.
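A minimal sketch of the hybrid idea, assuming an in-memory store: a dictionary stands in for the hash table, a sorted list for the B+ tree, and a dictionary of lists for adjacency storage. `HybridStore` is a hypothetical name. The part worth noticing is that every write updates all three structures together, which is the consistency obligation described above.

```python
import bisect
from typing import Any, Dict, List, Optional


class HybridStore:
    """One logical record set kept in three physical structures."""

    def __init__(self) -> None:
        self.by_key: Dict[str, Dict[str, Any]] = {}  # hash table: point lookups
        self.sorted_keys: List[str] = []             # stand-in for a B+ tree: range scans
        self.adjacency: Dict[str, List[str]] = {}    # adjacency lists: graph traversal

    @staticmethod
    def _is_edge(record: Dict[str, Any]) -> bool:
        return "_from" in record and "_to" in record

    def put(self, key: str, record: Dict[str, Any]) -> None:
        old = self.by_key.get(key)
        if old is None:
            bisect.insort(self.sorted_keys, key)
        elif self._is_edge(old):
            # Overwriting an edge: drop its stale adjacency entry first.
            self.adjacency[old["_from"]].remove(old["_to"])
        self.by_key[key] = record
        if self._is_edge(record):
            self.adjacency.setdefault(record["_from"], []).append(record["_to"])

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        return self.by_key.get(key)

    def key_range(self, low: str, high: str) -> List[str]:
        """Keys k with low <= k < high, found by binary search."""
        start = bisect.bisect_left(self.sorted_keys, low)
        end = bisect.bisect_left(self.sorted_keys, high)
        return self.sorted_keys[start:end]

    def neighbors(self, vertex: str) -> List[str]:
        return list(self.adjacency.get(vertex, []))


hybrid = HybridStore()
hybrid.put("orders/001", {"total": 599.99})
hybrid.put("orders/002", {"total": 20.0})
hybrid.put("users/alice", {"name": "Alice"})
hybrid.put("purchased/1", {"_from": "users/alice", "_to": "orders/001"})
```

The cost of this design is visible in `put`: each additional representation adds work to every write.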
There is no perfect unified storage format; every approach gives up efficiency for one access pattern to gain it for another. Document-centric storage tends to sacrifice graph traversal performance, while graph-native storage may sacrifice document query efficiency. Understanding these trade-offs helps you evaluate whether a multi-model system fits your workload.
The query processor is where multi-model integration becomes most visible to users. A well-integrated multi-model query processor enables queries that seamlessly combine different model operations.
Query Language Approaches:
1. Unified Query Language
Some systems design a single query language that natively supports all models. ArangoDB's AQL (ArangoDB Query Language) exemplifies this:
// Mixed document and graph query in AQL
FOR product IN products
  FILTER product.category == "electronics"
  LET recommended = (
    FOR v, e IN 1..2 OUTBOUND product GRAPH 'recommendations'
      FILTER e.weight > 0.5
      RETURN v
  )
  RETURN { product, recommended }
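This query filters products by category (a document operation), traverses the 'recommendations' graph one to two hops outbound from each matching product (a graph operation), keeps only vertices reached over an edge with a weight above 0.5, and returns each product together with its recommendations.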
2. Multiple Integrated Languages
Other systems support multiple query languages that can reference each other:
-- SQL with graph extension (conceptual)
SELECT p.name,
(SELECT target FROM follow_path(p.id, 'friends', 2)) as network
FROM products p
WHERE p.category = 'electronics';
3. Polyglot Query APIs
Some systems provide separate APIs for each model but allow results to reference across models through identifiers.
Query Optimization Challenges:
Cross-model queries present unique optimization challenges:
Challenge 1: Cost Model Complexity
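Optimizing a single-model query requires estimating costs for that model's operations. Cross-model queries require a unified cost model that can compare operations from different models, such as an index scan against a graph traversal.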
Challenge 2: Execution Plan Space
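More models mean more possible execution strategies.
The optimizer must evaluate cross-model execution orders, for example whether to filter documents before traversing the graph or to start from the graph side and filter afterwards.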
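To make the choice of execution order concrete, here is a deliberately simplified cost comparison in Python. The statistics, the two plan names, and the cost formulas (one unit per record scanned or edge followed) are all assumptions for illustration; real optimizers use far richer models. The query in mind is "users who purchased products in a given category".

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Stats:
    products: int                 # documents in the products collection
    users: int                    # documents in the users collection
    category_selectivity: float   # fraction of products passing the filter
    purchases_per_product: float  # average inbound purchase edges per product
    purchases_per_user: float     # average outbound purchase edges per user


def plan_costs(s: Stats) -> Dict[str, float]:
    """Estimated work for two execution orders (assumed unit costs)."""
    return {
        # Scan products, keep the matching ones, traverse inbound from those.
        "filter_products_then_traverse": (
            s.products + s.products * s.category_selectivity * s.purchases_per_product
        ),
        # Scan users, traverse outbound from every user, filter reached products.
        "traverse_from_users_then_filter": s.users + s.users * s.purchases_per_user,
    }


def choose_plan(s: Stats) -> str:
    costs = plan_costs(s)
    return min(costs, key=costs.get)


few_products = Stats(products=1_000, users=100_000, category_selectivity=0.1,
                     purchases_per_product=500, purchases_per_user=5)
many_products = Stats(products=1_000_000, users=1_000, category_selectivity=0.5,
                      purchases_per_product=0.02, purchases_per_user=20)
```

With these made-up statistics the cheaper plan flips between the two data sets, which is why the optimizer needs per-model statistics expressed in one comparable unit.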
Challenge 3: Intermediate Results
Cross-model execution may produce intermediate results that must be converted between model representations:
// Example: Find top influencers who purchased electronics
// Combines: key-value lookup, document filter, graph traversal, aggregation
// Start with efficient key lookup (key-value pattern)
LET category_products = (
  FOR p IN products
    FILTER p.category == "electronics"  // Document filter
    RETURN p._id
)
// Traverse purchase relationships (graph pattern)
FOR product_id IN category_products
  FOR user, purchase IN 1 INBOUND product_id GRAPH 'purchases'
    // Calculate social influence (graph traversal)
    LET follower_count = LENGTH(
      FOR follower IN 1..3 INBOUND user GRAPH 'social'
        RETURN follower
    )
    // Aggregate and sort (relational pattern)
    COLLECT userId = user._key
      AGGREGATE influence = SUM(follower_count)
    SORT influence DESC
    LIMIT 10
    RETURN { userId, influence }
When evaluating multi-model databases, examine how naturally cross-model queries can be expressed. A well-designed query language makes multi-model operations feel cohesive rather than awkwardly combined. Try expressing your real use cases in the query language before committing.
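The conversion between model representations that the influencer query relies on can be mimicked in plain Python. This is only a sketch over invented sample data, not how any engine executes the query: a document filter yields ids, a graph step turns ids into (user, follower count) rows, and a relational-style aggregation consumes those rows. One simplification is labeled in the code: followers are counted as distinct vertices, whereas a path-based traversal could count a vertex more than once.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Invented sample data for illustration.
products = {
    "products/1": {"category": "electronics"},
    "products/2": {"category": "books"},
}
purchases = [("users/a", "products/1"), ("users/b", "products/1"), ("users/b", "products/2")]
follows = [("users/c", "users/a"), ("users/d", "users/a"), ("users/c", "users/b")]  # (follower, followed)


def followers_within(user: str, max_depth: int) -> Set[str]:
    """Distinct followers reachable over at most max_depth inbound follow edges."""
    seen: Set[str] = set()
    frontier = {user}
    for _ in range(max_depth):
        frontier = {f for f, followed in follows if followed in frontier} - seen - {user}
        seen |= frontier
    return seen


def top_influencers(category: str, limit: int) -> List[Tuple[str, int]]:
    # Document step: filter products, producing a list of ids.
    product_ids = [pid for pid, doc in products.items() if doc["category"] == category]
    # Graph step: each purchase edge becomes one (user, follower_count) row.
    rows = [
        (buyer, len(followers_within(buyer, 3)))
        for pid in product_ids
        for buyer, bought in purchases
        if bought == pid
    ]
    # Relational step: group by user, sum, sort descending, limit.
    influence: Dict[str, int] = defaultdict(int)
    for user, count in rows:
        influence[user] += count
    return sorted(influence.items(), key=lambda item: (-item[1], item[0]))[:limit]
```

Each hand-off (ids, then rows, then groups) is an intermediate result that a real engine must materialize or stream in a form the next operator understands.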
Transaction support across model boundaries is perhaps the most significant advantage of multi-model databases over polyglot persistence. Let's examine how this works architecturally.
The Polyglot Consistency Problem:
With separate databases, cross-database consistency requires distributed transactions or application-level coordination:
// Polyglot persistence: dangerous window of inconsistency
await documentDB.insert(order); // Step 1 succeeds
await graphDB.createEdge(user, order); // Step 2 fails?
// Now document exists without graph edge!
// Application must detect and compensate
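Two-phase commit (2PC) provides atomicity at the cost of coordination overhead and blocking when a participant or the coordinator fails, while Saga patterns avoid a global transaction but leave the application to write compensating actions. Both add complexity and performance overhead.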
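The application-level compensation mentioned above might look like the following Python sketch. `InMemoryStore` is a hypothetical stand-in for two independent databases, and the failure is simulated with a flag. It shows the extra code path the application has to own when no shared transaction exists.

```python
from typing import Any, Dict


class InMemoryStore:
    """Hypothetical stand-in for an independent database."""

    def __init__(self, fail_on_write: bool = False) -> None:
        self.data: Dict[str, Any] = {}
        self.fail_on_write = fail_on_write

    def insert(self, key: str, value: Any) -> None:
        if self.fail_on_write:
            raise RuntimeError("write failed")
        self.data[key] = value

    def delete(self, key: str) -> None:
        self.data.pop(key, None)


def place_order(document_db: InMemoryStore, graph_db: InMemoryStore,
                order_key: str, user_key: str) -> bool:
    """Write to both stores; undo the first write if the second fails."""
    document_db.insert(order_key, {"total": 599.99})
    try:
        graph_db.insert(f"{user_key}->{order_key}", {"type": "purchased"})
    except RuntimeError:
        # Compensating action: remove the order that has no matching edge.
        document_db.delete(order_key)
        return False
    return True
```

Even this version is not airtight: other readers can observe the order before it is compensated, and the compensating delete can itself fail, which is the window of inconsistency the text describes.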
Multi-Model Transaction Architecture:
In a unified multi-model database, cross-model operations participate in the same transaction:
// Multi-model: atomic cross-model operation (conceptual API)
database.transaction({
  collections: { write: ['orders', 'purchased'] }
}, function(tx) {
  // Document insert
  let order = tx.collection('orders').insert({
    _key: 'order_123',
    items: [...],
    total: 599
  });
  // Graph edge insert (same transaction)
  tx.collection('purchased').insert({
    _from: 'users/alice',
    _to: order._id,
    date: new Date()
  });
});
// Either both succeed or both fail - atomicity guaranteed
Implementation Approaches:
1. Unified Write-Ahead Log (WAL)
Multi-model databases with a unified storage engine typically use a single WAL for all model operations:
WAL Entry Format:
┌───────────┬────────────┬─────────────┬───────────┐
│ LSN │ TxID │ Model Type │ Operation │
├───────────┼────────────┼─────────────┼───────────┤
│ 1001 │ tx_42 │ document │ INSERT │
│ 1002 │ tx_42 │ graph │ EDGE_ADD │
│ 1003 │ tx_42 │ - │ COMMIT │
└───────────┴────────────┴─────────────┴───────────┘
Recovery replays WAL entries, reconstructing state across all models atomically.
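A simplified recovery pass over such a log can be sketched in Python as follows. The entry fields mirror the table above, while the `key` and `value` fields, the uncommitted `tx_43` entry, and the `recover` function are assumptions added for illustration. The sketch applies only entries whose transaction has a COMMIT record, so a document insert and its graph edge are restored together or not at all.

```python
from typing import Any, Dict, List

wal: List[Dict[str, Any]] = [
    {"lsn": 1001, "tx": "tx_42", "model": "document", "op": "INSERT",
     "key": "orders/order_123", "value": {"total": 599}},
    {"lsn": 1002, "tx": "tx_42", "model": "graph", "op": "EDGE_ADD",
     "key": "purchased/1", "value": {"_from": "users/alice", "_to": "orders/order_123"}},
    {"lsn": 1003, "tx": "tx_42", "model": None, "op": "COMMIT"},
    # Assumed: tx_43 wrote a document but crashed before its COMMIT record.
    {"lsn": 1004, "tx": "tx_43", "model": "document", "op": "INSERT",
     "key": "orders/order_124", "value": {"total": 20}},
]


def recover(log: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Rebuild per-model state from the log, skipping uncommitted transactions."""
    committed = {entry["tx"] for entry in log if entry["op"] == "COMMIT"}
    state: Dict[str, Dict[str, Any]] = {"document": {}, "graph": {}}
    for entry in sorted(log, key=lambda e: e["lsn"]):
        if entry["op"] == "COMMIT" or entry["tx"] not in committed:
            continue
        state[entry["model"]][entry["key"]] = entry["value"]
    return state


recovered = recover(wal)
```

Real recovery also handles checkpoints, deletes, and undo of partially flushed pages; the sketch covers only the redo of committed inserts.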
2. MVCC Across Models
Multi-Version Concurrency Control (MVCC) enables consistent snapshots across model boundaries:
Document Version: product_123 @ version 5
Graph Edge Version: edge_456 @ version 5
A transaction reading at snapshot 5 sees a consistent state across:
- Document queries
- Graph traversals
Both reference the same point in time.
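A minimal sketch of snapshot reads, assuming a single global commit counter shared by documents and edges. `VersionedStore` is a hypothetical class, and real MVCC implementations also track in-flight transactions and garbage-collect old versions. The shared counter is what lets one snapshot number select a consistent view of both models.

```python
from typing import Any, Dict, List, Optional, Tuple


class VersionedStore:
    """Keeps every committed version of each key, for documents and edges alike."""

    def __init__(self) -> None:
        self.versions: Dict[str, List[Tuple[int, Any]]] = {}
        self.current = 0  # one commit counter for all models

    def commit(self, writes: Dict[str, Any]) -> int:
        """Apply a set of writes atomically under one new version number."""
        self.current += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.current, value))
        return self.current

    def read(self, key: str, snapshot: int) -> Optional[Any]:
        """Return the newest value committed at or before the snapshot."""
        for version, value in reversed(self.versions.get(key, [])):
            if version <= snapshot:
                return value
        return None


mvcc = VersionedStore()
v1 = mvcc.commit({
    "documents/product_123": {"price": 10},
    "edges/edge_456": {"_from": "users/alice", "_to": "documents/product_123"},
})
v2 = mvcc.commit({
    "documents/product_123": {"price": 12},
    "edges/edge_457": {"_from": "users/bob", "_to": "documents/product_123"},
})
```

A reader holding snapshot `v1` keeps seeing the old price and does not see `edge_457`, even after the second commit.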
3. Lock Management
For systems using locking, the lock manager handles cross-model lock acquisition:
Transaction tx_42:
LOCK(documents/order_123, WRITE)
LOCK(edges/user_123_order_123, WRITE)
... perform operations ...
UNLOCK all
Deadlock detection must span model boundaries: a cycle in which a graph operation waits on a document lock while a document operation waits on a graph lock has to be detected.
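One common way to detect such a situation is to search a wait-for graph for a cycle. The Python sketch below assumes a single lock table covering both document and edge resources; the `held` and `waiting` tables, the second transaction `tx_43`, and `find_deadlock` are hypothetical. Because both kinds of resource live in the same table, a cycle that crosses models is found like any other.

```python
from typing import Dict, List, Optional, Set

# Assumed lock table: resource -> holding transaction, across both models.
held = {"documents/order_123": "tx_42", "edges/user_123_order_123": "tx_43"}
# Assumed pending requests: transaction -> resource it is blocked on.
waiting = {"tx_42": "edges/user_123_order_123", "tx_43": "documents/order_123"}

# Wait-for graph: each blocked transaction points at the holder of its resource.
waits_for: Dict[str, Set[str]] = {
    tx: {held[resource]} for tx, resource in waiting.items() if resource in held
}


def find_deadlock(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return the transactions forming a cycle, or None if there is none."""
    path: List[str] = []     # transactions on the current depth-first path
    done: Set[str] = set()   # transactions fully explored without a cycle

    def visit(tx: str) -> Optional[List[str]]:
        if tx in path:
            return path[path.index(tx):]
        if tx in done:
            return None
        path.append(tx)
        for blocker in sorted(graph.get(tx, ())):
            cycle = visit(blocker)
            if cycle:
                return cycle
        path.pop()
        done.add(tx)
        return None

    for tx in sorted(graph):
        cycle = visit(tx)
        if cycle:
            return cycle
    return None
```

Once a cycle is found, a lock manager would typically abort one of the transactions in it so the others can proceed.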
Cross-model transactions in distributed deployments are especially complex. Data for different models may reside on different nodes. Multi-model databases handle this through distributed transaction protocols, but performance implications should be understood for your deployment topology.
Indexing in multi-model databases must support diverse query patterns across models while maintaining reasonable storage and update overhead.
Model-Specific Index Requirements:
| Model | Primary Index Types | Query Patterns Supported |
|---|---|---|
| Document | B+ tree, Hash, Full-text, Geospatial | Field queries, range scans, text search, location queries |
| Graph | Edge index, Vertex-centric index | Neighbor lookup, edge property filtering, traversal optimization |
| Key-Value | Primary key hash/tree | Point lookups, range scans by key |
| Relational | Secondary indexes, Composite indexes | Predicate filtering, join optimization |
Unified Index Architecture:
Multi-model databases typically support various index types through a unified index subsystem:
┌─────────────────────────────────────────────────────────┐
│ Query Processor │
│ (Selects indexes based on query model/patterns) │
├─────────────────────────────────────────────────────────┤
│ Index Selection Layer │
├─────────┬─────────┬─────────┬─────────┬────────────────┤
│ Primary │ B+ Tree │ Hash │ Edge │ Fulltext │
│ Key Idx │ Indexes │ Indexes │ Indexes │ Indexes │
├─────────┴─────────┴─────────┴─────────┴────────────────┤
│ Unified Storage Engine │
└─────────────────────────────────────────────────────────┘
Edge Indexes for Graph Traversal:
Graph operations require specialized index structures for efficient traversal:
Edge Index Structure (conceptual):
Outbound Index (for OUTBOUND traversal):
vertex_123 -> [edge_to_456, edge_to_789, edge_to_012]
Inbound Index (for INBOUND traversal):
vertex_456 -> [edge_from_123, edge_from_234]
Edge Property Index:
edge_type = "follows" -> [edge_123, edge_456, ...]
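These indexes let the engine find a vertex's edges directly: with a hash-based index, locating the edge list takes roughly constant time and enumerating neighbors takes time proportional to the vertex's degree, rather than requiring a scan of all edges.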
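The three index structures above can be built in one pass over the edge collection, as in this Python sketch. The edge keys `e1` to `e3` and the function name `build_edge_indexes` are invented for the example, and dictionaries stand in for on-disk hash indexes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

edges = [
    {"_key": "e1", "_from": "vertex_123", "_to": "vertex_456", "type": "follows"},
    {"_key": "e2", "_from": "vertex_123", "_to": "vertex_789", "type": "likes"},
    {"_key": "e3", "_from": "vertex_234", "_to": "vertex_456", "type": "follows"},
]

EdgeIndex = Dict[str, List[str]]


def build_edge_indexes(edge_list: List[Dict[str, str]]) -> Tuple[EdgeIndex, EdgeIndex, EdgeIndex]:
    """Build outbound, inbound, and edge-type indexes over edge keys."""
    outbound, inbound, by_type = defaultdict(list), defaultdict(list), defaultdict(list)
    for edge in edge_list:
        outbound[edge["_from"]].append(edge["_key"])  # serves OUTBOUND traversal
        inbound[edge["_to"]].append(edge["_key"])     # serves INBOUND traversal
        by_type[edge["type"]].append(edge["_key"])    # serves edge property lookup
    return dict(outbound), dict(inbound), dict(by_type)


outbound_index, inbound_index, type_index = build_edge_indexes(edges)
```

Looking up `outbound_index["vertex_123"]` touches only that vertex's entries, regardless of how many edges the collection holds.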
Vertex-Centric Indexes:
For filtering during traversal, vertex-centric indexes store edge properties indexed per vertex:
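// Query: Find friends of Alice with friendship strength > 0.8
// (strength is an edge property, so the filter applies to the edge)
FOR friend, friendship IN 1 OUTBOUND 'users/alice' GRAPH 'social'
  FILTER friendship.strength > 0.8
  RETURN friend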
// Vertex-centric index:
alice_edges:
strength=0.9 -> edge_to_bob
strength=0.7 -> edge_to_charlie
strength=0.85 -> edge_to_diana
// Index enables pruning edges during traversal
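A vertex-centric index of this kind can be approximated in Python by keeping each vertex's edges sorted by the indexed property, so a threshold filter becomes a binary search instead of a check on every edge. `VertexCentricIndex` is a hypothetical class and the parallel-list layout is just one possible design.

```python
import bisect
from typing import Dict, List, Tuple


class VertexCentricIndex:
    """Per-vertex edge lists kept sorted by edge strength."""

    def __init__(self) -> None:
        # source vertex -> (sorted strengths, targets in the same order)
        self.entries: Dict[str, Tuple[List[float], List[str]]] = {}

    def add_edge(self, source: str, target: str, strength: float) -> None:
        strengths, targets = self.entries.setdefault(source, ([], []))
        position = bisect.bisect_right(strengths, strength)
        strengths.insert(position, strength)
        targets.insert(position, target)

    def targets_stronger_than(self, source: str, threshold: float) -> List[str]:
        """Targets whose edge strength is strictly greater than threshold."""
        strengths, targets = self.entries.get(source, ([], []))
        return targets[bisect.bisect_right(strengths, threshold):]


index = VertexCentricIndex()
index.add_edge("users/alice", "users/bob", 0.9)
index.add_edge("users/alice", "users/charlie", 0.7)
index.add_edge("users/alice", "users/diana", 0.85)
```

For the threshold 0.8, the edge to `users/charlie` is skipped by the binary search without being examined during traversal.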
Composite Cross-Model Indexes:
Some systems support indexes spanning model concepts:
// Index on: document.category + graph.edge_type
// Enables efficient queries like:
// "Electronics products purchased in last 30 days"
Composite Index:
(category="electronics", edge_type="purchased", date > 30_days_ago)
-> [product_123, product_456, ...]
Index Maintenance Across Models:
When data changes, indexes across models must be updated consistently:
Every index consumes storage and slows down writes. Multi-model databases with many index types amplify this trade-off. Carefully analyze your query patterns and create only necessary indexes. The flexibility of multi-model doesn't mean you need every index type on every collection.
Running a single multi-model database versus multiple specialized databases has significant operational implications.
Unified Operations Benefits:
Operational Complexity Considerations:
1. Capacity Planning
Different models have different resource characteristics:
Capacity planning must account for the workload mix, which may be harder to predict than for a single-model system.
2. Performance Tuning
Tuning multi-model databases requires understanding model interactions:
3. Upgrade and Migration
Upgrading a single database that serves multiple functions carries higher risk because one upgrade affects all models. This is partly offset by having only one system to upgrade.
4. Vendor Lock-in Considerations
Consolidating on one multi-model database increases dependency on that vendor. Migration away requires migrating all models simultaneously.
Monitoring Multi-Model Workloads:
Effective monitoring requires model-aware metrics:
Metrics by Model:
├── Document Operations
│ ├── Inserts/sec
│ ├── Query latency (p50, p99)
│ └── Index hit ratio
├── Graph Operations
│ ├── Traversals/sec
│ ├── Average traversal depth
│ └── Edge scans vs. index lookups
├── Key-Value Operations
│ ├── Gets/sec, Sets/sec
│ ├── Point lookup latency
│ └── Cache hit ratio
└── Cross-Model Queries
├── Count and latency
├── Model combinations used
└── Transaction scope sizes
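Model-aware latency metrics such as the p50 and p99 values listed above could be collected with something like the following Python sketch. `ModelMetrics` is a hypothetical helper that keeps raw samples in memory and uses the nearest-rank percentile definition; a production system would more likely use histograms in its monitoring stack.

```python
import math
from collections import defaultdict
from typing import Dict, List


class ModelMetrics:
    """Latency samples grouped by model label (for example 'document' or 'graph')."""

    def __init__(self) -> None:
        self.latencies_ms: Dict[str, List[float]] = defaultdict(list)

    def record(self, model: str, latency_ms: float) -> None:
        self.latencies_ms[model].append(latency_ms)

    def count(self, model: str) -> int:
        return len(self.latencies_ms.get(model, []))

    def percentile(self, model: str, pct: int) -> float:
        """Nearest-rank percentile of the recorded latencies for one model."""
        values = sorted(self.latencies_ms.get(model, []))
        if not values:
            raise ValueError(f"no samples recorded for model {model!r}")
        rank = max(1, math.ceil(pct * len(values) / 100))
        return values[rank - 1]


metrics = ModelMetrics()
for latency in range(1, 101):          # assumed samples: 1 ms to 100 ms
    metrics.record("document", float(latency))
metrics.record("graph", 12.5)
metrics.record("cross-model", 40.0)
```

Tagging each sample with its model (and with "cross-model" for mixed queries) is what makes it possible to tell which workload is responsible when overall latency degrades.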
Don't adopt all models immediately. Start with your dominant model, then expand to additional models as needs arise. This allows you to build operational expertise incrementally while validating the multi-model approach for your environment.
We've explored the architectural foundations that enable multi-model databases to function as coherent systems. Let's consolidate the key insights:
What's Next:
With the architectural foundation established, the next page examines ArangoDB as a concrete example of native multi-model database design, exploring its specific approaches to storage, querying, and cross-model integration.
You now understand how multi-model databases are architected to support multiple data models within a single system. This architectural knowledge enables you to evaluate multi-model databases based on integration depth rather than feature checklists.