
Multi-Model Databases: Unified Data Management

Level: Advanced

Duration: 60 mins

Topic: Multi-Model Databases

Page 2 of 5

Single Database Architecture

Unifying Models Under One Roof

The previous page established why multiple data models exist and why their diversity creates challenges for application developers. Now we address the architectural question: How can a single database system support multiple data models simultaneously?

This isn't merely a matter of adding features. Supporting multiple models within one system requires fundamental architectural decisions about:

How data is physically stored
How different model semantics are expressed
How queries across models are processed
How consistency is maintained across model boundaries

The answers to these questions determine whether a multi-model database is a coherent system or merely multiple systems bolted together.

What You Will Learn

By the end of this page, you will understand the architectural patterns for building multi-model databases, including unified storage engines, query processor integration, and cross-model transaction handling. You'll be able to evaluate multi-model databases based on their architectural depth.

Architectural Patterns for Multi-Model

Multi-model databases employ several architectural patterns, each with distinct trade-offs. Understanding these patterns is essential for evaluating and selecting multi-model systems.

Pattern 1: Unified Core with Model Adapters

The most deeply integrated approach builds a flexible core storage engine that can be accessed through different model 'lenses':

┌─────────────────────────────────────────────────────┐
│                   Application Layer                  │
├─────────────┬─────────────┬─────────────┬───────────┤
│ Document API│  Graph API  │Key-Value API│  SQL API  │
├─────────────┴─────────────┴─────────────┴───────────┤
│              Unified Query Processor                 │
├─────────────────────────────────────────────────────┤
│              Unified Storage Engine                  │
│         (Optimized for Multiple Access Patterns)    │
└─────────────────────────────────────────────────────┘

Characteristics:

Single storage format (typically document-like with graph extensions)
Query processor understands all models natively
Cross-model queries are first-class citizens
Transactions naturally span model boundaries
Examples: ArangoDB, OrientDB (native graph + document)
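To make the layering concrete, here is a minimal in-memory sketch, assuming a single `Map` of JSON-like records stands in for the unified storage engine. The adapter objects (`documentApi`, `keyValueApi`, `graphApi`) and their methods are hypothetical and do not correspond to any product's API; the point is only that all three "lenses" read and write the same records.

```javascript
// Unified storage engine: one map of records keyed by "collection/key".
// (Sketch only: a real engine persists records, e.g. in B+ trees or LSM trees.)
const store = new Map();
let edgeSeq = 0;

function put(collection, key, value) {
  const id = `${collection}/${key}`;
  store.set(id, { ...value, _id: id, _key: key });
  return id;
}

// Records belonging to one collection (a real engine would use a per-collection index).
const scan = (collection) =>
  [...store.values()].filter((r) => r._id.startsWith(`${collection}/`));

// Document adapter: insert and query by predicate.
const documentApi = {
  insert: (collection, key, doc) => put(collection, key, doc),
  find: (collection, predicate) => scan(collection).filter(predicate),
};

// Key-value adapter: point reads and writes by id.
const keyValueApi = {
  get: (id) => store.get(id),
  set: (collection, key, value) => put(collection, key, { value }),
};

// Graph adapter: edges are ordinary records that carry _from/_to.
const graphApi = {
  addEdge: (edgeCollection, from, to, props = {}) =>
    put(edgeCollection, `e${++edgeSeq}`, { ...props, _from: from, _to: to }),
  neighbors: (edgeCollection, vertexId) =>
    scan(edgeCollection)
      .filter((edge) => edge._from === vertexId)
      .map((edge) => store.get(edge._to)),
};

// All three APIs operate on the same records.
documentApi.insert("users", "alice", { name: "Alice" });
documentApi.insert("products", "laptop", { name: "Laptop", category: "electronics" });
graphApi.addEdge("purchased", "users/alice", "products/laptop");
// graphApi.neighbors("purchased", "users/alice") returns the "products/laptop" record
```

Because there is one store, a cross-model question ("what did Alice buy, and which of those are electronics?") needs no data movement between systems, which is the property Pattern 1 is built around.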

Pattern 2: Federated Storage Engines

Some systems use different storage engines for different models, federated under a common query layer:

┌─────────────────────────────────────────────────────┐
│                   Application Layer                  │
├─────────────────────────────────────────────────────┤
│              Unified Query Processor                 │
├──────────────────┬──────────────────┬───────────────┤
│ Document Engine  │   Graph Engine   │   KV Engine   │
└──────────────────┴──────────────────┴───────────────┘

Characteristics:

Specialized storage optimized per model
Query processor coordinates across engines
Cross-model queries require engine coordination
Transaction coordination is complex
Examples: Some embedded multi-model approaches

Pattern 3: Extended Single-Model

Existing databases add multi-model capabilities through extensions:

┌─────────────────────────────────────────────────────┐
│                   Application Layer                  │
├─────────────────────────────────────────────────────┤
│     Original Query Processor + Extension Handler     │
├─────────────────────────────────────────────────────┤
│     Original Storage Engine + Extension Storage     │
└─────────────────────────────────────────────────────┘

Characteristics:

Primary model remains dominant
Extended models may have limitations
Benefits from existing optimization for primary model
Cross-model integration may be shallow
Examples: PostgreSQL (relational + JSON/JSONB), MongoDB (document + $graphLookup)

Evaluating Integration Depth

When evaluating multi-model databases, ask: 'Was this designed multi-model from the start, or was multi-model added later?' Native multi-model systems typically offer deeper integration, while extended systems may offer better performance for their primary model.

Unified Storage Engine Design

A unified storage engine must satisfy competing requirements from different data models. This is the deepest architectural challenge in multi-model design.

Core Storage Requirements by Model:

Storage Requirements Across Models
Model	Storage Requirements	Access Patterns
Document	Nested structures, variable schemas, efficient serialization	Full document retrieval, field-level access, nested queries
Graph	Edge storage, adjacency lists, traversal-optimized structures	Neighbor lookup, path traversal, pattern matching
Key-Value	Direct key → value mapping, minimal overhead	Point lookups, range scans by key, TTL handling
Relational	Fixed schemas, row- or column-oriented storage, index structures	Predicate filtering, joins, aggregations

Unified Storage Approaches:

1. Document-Centric Core

Many multi-model databases use a document-like format as their core storage, representing graph edges and relational rows as documents:

// Document storage
{ "_key": "product_123", "type": "product", "name": "Laptop", ... }

// Graph edge as document
{ "_key": "123_456", "_from": "users/123", "_to": "products/456", "type": "purchased" }

// Relational row as document
{ "_key": "order_789", "customer_id": 123, "total": 599.99, "date": "2024-01-15" }

Advantages:

Flexible schema accommodates all models
Standard indexing works across models
JSON/document formats are widely understood

Challenges:

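Graph traversals resolve each hop through an edge index and then fetch the target documents, which is less efficient than following adjacency lists stored directly with each vertex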
Relational queries may require schema enforcement at application level
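The traversal cost listed above comes from indirection: each hop goes from a vertex to its edge documents through a secondary index on `_from`, then to each target document by primary key. A minimal sketch, assuming plain JavaScript `Map`s stand in for the primary index and the edge index (`outboundNeighbors` is a hypothetical helper); it counts index probes and document fetches for one hop.

```javascript
// Documents and edges are both plain JSON documents (document-centric core).
const docs = new Map([
  ["users/123", { _id: "users/123", name: "Alice" }],
  ["products/456", { _id: "products/456", name: "Laptop" }],
  ["products/789", { _id: "products/789", name: "Mouse" }],
]);
const edges = [
  { _key: "123_456", _from: "users/123", _to: "products/456", type: "purchased" },
  { _key: "123_789", _from: "users/123", _to: "products/789", type: "purchased" },
];

// Secondary "edge index": _from -> edge documents (built once, maintained on write).
const fromIndex = new Map();
for (const e of edges) {
  if (!fromIndex.has(e._from)) fromIndex.set(e._from, []);
  fromIndex.get(e._from).push(e);
}

// One traversal hop = one index probe + one primary-key fetch per neighbor.
function outboundNeighbors(vertexId, stats) {
  stats.indexProbes += 1;
  const out = [];
  for (const edge of fromIndex.get(vertexId) ?? []) {
    stats.docFetches += 1; // resolve _to into a full document
    out.push(docs.get(edge._to));
  }
  return out;
}

const stats = { indexProbes: 0, docFetches: 0 };
const bought = outboundNeighbors("users/123", stats).map((d) => d.name);
// bought: ["Laptop", "Mouse"]; stats: { indexProbes: 1, docFetches: 2 }
```

Every additional hop repeats the probe-then-fetch pattern, which is why deep traversals over a document-centric core cost more than walking stored adjacency lists.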

2. Graph-Native with Document Properties

Some systems store graphs natively (adjacency lists) with documents as node/edge properties:

Node Storage:
  node_123: { type: "user", properties: { name: "Alice", ... } }
  
Edge Storage (adjacency):
  node_123 -> [ node_456 (follows), node_789 (likes), ... ]
  
Property Storage:
  edge_123_456: { since: "2024-01-01", strength: 0.8 }

Advantages:

Optimal graph traversal performance
Documents stored efficiently as properties

Challenges:

Non-graph queries may be less efficient
Storage overhead for relationship structures
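In a graph-native layout each node record holds direct references to its neighbors, so a traversal follows stored references instead of probing an index per hop (often called index-free adjacency). A minimal sketch, assuming in-memory node objects whose `out` arrays hold those references, with edge properties kept in a separate map as in the layout above; `makeNode`, `connect`, and `traverse` are hypothetical helpers, and `traverse` is a breadth-first walk with a depth limit.

```javascript
// Node storage: each node keeps direct references to its outgoing edges.
function makeNode(id, properties) {
  return { id, properties, out: [] }; // out: [{ target, label, key }]
}

// Property storage for edges, kept apart from the adjacency structure.
const edgeProps = new Map();

function connect(from, to, label, props = {}) {
  const key = `edge_${from.id}_${to.id}`;
  from.out.push({ target: to, label, key });
  edgeProps.set(key, props);
}

// Breadth-first traversal that follows stored references (no index lookups).
function traverse(start, maxDepth, label) {
  const seen = new Set([start.id]);
  const result = [];
  let frontier = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const node of frontier) {
      for (const edge of node.out) {
        if (edge.label !== label || seen.has(edge.target.id)) continue;
        seen.add(edge.target.id);
        result.push({ id: edge.target.id, depth });
        next.push(edge.target);
      }
    }
    frontier = next;
  }
  return result;
}

const alice = makeNode("123", { type: "user", name: "Alice" });
const bob = makeNode("456", { type: "user", name: "Bob" });
const carol = makeNode("789", { type: "user", name: "Carol" });
connect(alice, bob, "follows", { since: "2024-01-01", strength: 0.8 });
connect(bob, carol, "follows");
// traverse(alice, 2, "follows") -> [{ id: "456", depth: 1 }, { id: "789", depth: 2 }]
```

The flip side is the challenge noted above: a query such as "all users named Alice" gets no help from the `out` arrays and needs its own index or a scan.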

3. Hybrid with Specialized Structures

Advanced systems maintain multiple internal representations optimized for different access patterns:

┌────────────────────────────────────────┐
│          Logical Data Layer            │
│    (Unified view of all data)          │
├────────────────────────────────────────┤
│        Physical Storage Layer          │
│  ┌──────────┬──────────┬─────────────┐ │
│  │ B+ Trees │ Adjacency│ Hash Tables │ │
│  │ (range)  │ (graph)  │ (point)     │ │
│  └──────────┴──────────┴─────────────┘ │
└────────────────────────────────────────┘

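The system must keep these representations consistent on every write, typically updating all of them within the same transaction, while the query planner picks whichever structure best serves each access pattern.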
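Keeping several physical structures in step means one logical write fans out to all of them. A minimal sketch, assuming a hash map for point lookups, a sorted array standing in for a B+ tree on one numeric field, and an adjacency map for edges; the `HybridStore` class and its method names are hypothetical, and the sketch is single-threaded with no transaction log.

```javascript
// Hypothetical hybrid store: one logical write updates three physical structures.
class HybridStore {
  constructor(rangeField) {
    this.rangeField = rangeField;
    this.byKey = new Map();     // hash table: point lookups
    this.sorted = [];           // [value, key] pairs in order: stands in for a B+ tree
    this.adjacency = new Map(); // key -> Set of neighbor keys: graph access
  }

  // First position whose value is >= v (binary search).
  lowerBound(v) {
    let lo = 0;
    let hi = this.sorted.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.sorted[mid][0] < v) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  insert(key, doc) {
    if (this.byKey.has(key)) throw new Error(`duplicate key ${key}`);
    const v = doc[this.rangeField];
    // All three structures change together; a real engine does this inside one transaction.
    this.byKey.set(key, doc);
    this.sorted.splice(this.lowerBound(v), 0, [v, key]);
    this.adjacency.set(key, new Set());
  }

  link(fromKey, toKey) {
    if (!this.byKey.has(fromKey) || !this.byKey.has(toKey)) throw new Error("unknown key");
    this.adjacency.get(fromKey).add(toKey);
  }

  get(key) {
    return this.byKey.get(key); // point access via the hash table
  }

  range(min, max) {
    // Seek to the first value >= min, then scan forward while value <= max.
    const keys = [];
    for (let i = this.lowerBound(min); i < this.sorted.length && this.sorted[i][0] <= max; i++) {
      keys.push(this.sorted[i][1]);
    }
    return keys;
  }

  neighbors(key) {
    return [...(this.adjacency.get(key) ?? [])]; // graph access via adjacency sets
  }
}

const hs = new HybridStore("price");
hs.insert("p1", { name: "Laptop", price: 999 });
hs.insert("p2", { name: "Mouse", price: 25 });
hs.insert("p3", { name: "Monitor", price: 300 });
hs.link("p1", "p2");
// hs.range(20, 400) -> ["p2", "p3"]; hs.neighbors("p1") -> ["p2"]; hs.get("p3").name -> "Monitor"
```

The price of the fast reads is visible in `insert`: every write pays for all three structures, which is the write amplification a hybrid design accepts.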

Storage Trade-offs

There is no perfect unified storage format. Every approach trades something for something else. Document-centric storage sacrifices graph performance; graph-native storage may sacrifice document query efficiency. Understanding these trade-offs helps you evaluate whether a multi-model system fits your workload.

Query Processor Integration

The query processor is where multi-model integration becomes most visible to users. A well-integrated multi-model query processor enables queries that seamlessly combine different model operations.

Query Language Approaches:

1. Unified Query Language

Some systems design a single query language that natively supports all models. ArangoDB's AQL (ArangoDB Query Language) exemplifies this:

// Mixed document and graph query in AQL
FOR product IN products
  FILTER product.category == "electronics"
  LET recommended = (
    FOR v, e IN 1..2 OUTBOUND product GRAPH 'recommendations'
      FILTER e.weight > 0.5
      RETURN v
  )
  RETURN { product, recommended }

This query:

Filters documents (document model)
Traverses a graph (graph model)
Combines results in one query

2. Multiple Integrated Languages

Other systems support multiple query languages that can reference each other:

-- SQL with a graph extension (conceptual; follow_path is a hypothetical table function)
SELECT p.name,
       ARRAY(SELECT target FROM follow_path(p.id, 'similar_to', 2)) AS network
FROM products p
WHERE p.category = 'electronics';

3. Polyglot Query APIs

Some systems provide a separate API for each model but let results from one model reference data in another through shared identifiers (for example, a graph vertex whose ID is also a document key).

Query Optimization Challenges:

Cross-model queries present unique optimization challenges:

Challenge 1: Cost Model Complexity

Optimizing a single-model query requires estimating costs for that model's operations. Cross-model queries require unified cost models:

Document filter costs
Graph traversal costs
Join costs across models
The interplay between them

Challenge 2: Execution Plan Space

More models mean more possible execution strategies:

Should we filter documents first, then traverse?
Should we traverse first, then filter results?
Can graph traversal eliminate documents before filtering?

The optimizer must evaluate cross-model execution orders.
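A toy cost model shows why the order matters. The sketch below assumes invented unit costs (`docScan`, `docFetch`, `edgeHop`, `filterCheck`) and simple cardinality statistics for the query "users in a given country who purchased a given product"; none of the numbers come from a real optimizer. It compares filtering user documents first against traversing inbound from the product first.

```javascript
// Toy cost model (all unit costs are invented for illustration).
const unitCost = { docScan: 1, docFetch: 3, edgeHop: 2, filterCheck: 1 };

// Query: users in a given country who purchased a given product.
function planCosts(stats, c = unitCost) {
  // Plan A, filter first: scan all users, then traverse outbound purchases of the matches.
  const filterFirst =
    stats.users * c.docScan +
    stats.users * stats.countrySelectivity * stats.avgPurchasesPerUser * c.edgeHop;
  // Plan B, traverse first: walk inbound from the product, then fetch and filter each buyer.
  const traverseFirst =
    stats.buyersOfProduct * c.edgeHop +
    stats.buyersOfProduct * (c.docFetch + c.filterCheck);
  return {
    filterFirst,
    traverseFirst,
    best: filterFirst <= traverseFirst ? "filterFirst" : "traverseFirst",
  };
}

// A niche product with few buyers favors starting from the graph side...
const niche = planCosts({ users: 1_000_000, countrySelectivity: 0.01, avgPurchasesPerUser: 5, buyersOfProduct: 2_000 });
// ...while a small user base and a best-selling product favor filtering documents first.
const bestseller = planCosts({ users: 10_000, countrySelectivity: 0.001, avgPurchasesPerUser: 5, buyersOfProduct: 8_000 });
// niche.best -> "traverseFirst"; bestseller.best -> "filterFirst"
```

The same query flips plans depending on data statistics, which is why a multi-model optimizer needs cardinality estimates for documents and edges alike.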

Challenge 3: Intermediate Results

Cross-model execution may produce intermediate results that must be converted between model representations:

Document → Graph node for traversal input
Graph results → Document for aggregation

Example: cross_model_query_example.aql

// Example: find the top influencers among customers who bought electronics
// Combines: document filter, graph traversal, aggregation

// Step 1: document filter on the products collection
LET category_products = (
  FOR p IN products
    FILTER p.category == "electronics"
    RETURN p._id
)

// Step 2: distinct buyers of those products (graph pattern)
// UNIQUE ensures a user who bought several products is counted once
LET buyers = UNIQUE(
  FOR product_id IN category_products
    FOR user IN 1 INBOUND product_id GRAPH 'purchases'
      RETURN user._id
)

// Step 3: social reach of each buyer, up to 3 hops (graph traversal)
FOR buyer_id IN buyers
  LET influence = LENGTH(
    FOR follower IN 1..3 INBOUND buyer_id GRAPH 'social'
      RETURN DISTINCT follower._id
  )
  // Rank and limit (relational pattern)
  SORT influence DESC
  LIMIT 10
  RETURN { userId: PARSE_IDENTIFIER(buyer_id).key, influence: influence }
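To pin down the result the query above is meant to produce, here is the same computation over small in-memory collections. This is a sketch under stated assumptions: plain arrays of edges stand in for the 'purchases' and 'social' graphs, an edge `{ _from: follower, _to: user }` means "follower follows user", and `inboundReach` and `topInfluencers` are hypothetical helpers. Each buyer is counted once, and followers within 3 hops are de-duplicated.

```javascript
// Hypothetical in-memory data standing in for the collections and graphs in the query.
const products = [
  { _id: "products/1", category: "electronics" },
  { _id: "products/2", category: "electronics" },
  { _id: "products/3", category: "books" },
];
const purchases = [
  { _from: "users/a", _to: "products/1" },
  { _from: "users/a", _to: "products/2" }, // same buyer, second electronics product
  { _from: "users/b", _to: "products/1" },
  { _from: "users/c", _to: "products/3" },
];
const social = [
  { _from: "users/x", _to: "users/a" }, // x follows a
  { _from: "users/y", _to: "users/x" }, // y follows x (2 hops from a)
  { _from: "users/z", _to: "users/b" },
];

// Distinct vertices reachable by following edges backwards, 1..maxDepth hops.
function inboundReach(start, edges, maxDepth) {
  const seen = new Set([start]);
  let frontier = [start];
  for (let d = 0; d < maxDepth && frontier.length > 0; d++) {
    frontier = edges
      .filter((e) => frontier.includes(e._to) && !seen.has(e._from))
      .map((e) => e._from);
    frontier.forEach((v) => seen.add(v));
  }
  seen.delete(start);
  return seen.size;
}

function topInfluencers(limit) {
  const electronics = new Set(
    products.filter((p) => p.category === "electronics").map((p) => p._id)
  );
  const buyers = new Set(purchases.filter((e) => electronics.has(e._to)).map((e) => e._from));
  return [...buyers]
    .map((id) => ({ userId: id.split("/")[1], influence: inboundReach(id, social, 3) }))
    .sort((a, b) => b.influence - a.influence)
    .slice(0, limit);
}
// topInfluencers(10) -> [{ userId: "a", influence: 2 }, { userId: "b", influence: 1 }]
```

Summing a per-purchase follower count instead would credit `a` with 4, because `a` bought two electronics products; de-duplicating buyers before measuring reach avoids that inflation.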

Query Language Evaluation

When evaluating multi-model databases, examine how naturally cross-model queries can be expressed. A well-designed query language makes multi-model operations feel cohesive rather than awkwardly combined. Try expressing your real use cases in the query language before committing.

Cross-Model Transaction Handling

Transaction support across model boundaries is perhaps the most significant advantage of multi-model databases over polyglot persistence. Let's examine how this works architecturally.

The Polyglot Consistency Problem:

With separate databases, cross-database consistency requires distributed transactions or application-level coordination:

// Polyglot persistence: dangerous window of inconsistency
await documentDB.insert(order);        // Step 1 succeeds
await graphDB.createEdge(user, order);  // Step 2 fails?
// Now document exists without graph edge!
// Application must detect and compensate

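Coordination protocols such as two-phase commit (2PC) or the saga pattern (a sequence of local transactions undone by compensating actions) add complexity and performance overhead, and sagas give up isolation between their steps.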
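The "detect and compensate" step can be sketched as a compensating action, the core idea of a saga. The sketch assumes two hypothetical async clients, `documentDB` and `graphDB`, implemented here as in-memory stubs with `insert`, `remove`, and `createEdge` methods; `failNext` is a test switch that makes the graph call fail so the rollback path runs.

```javascript
// Hypothetical in-memory stubs for two independent stores.
const documentDB = {
  rows: new Map(),
  async insert(doc) { this.rows.set(doc._key, doc); },
  async remove(key) { this.rows.delete(key); },
};
const graphDB = {
  edges: [],
  failNext: false,
  async createEdge(from, to) {
    if (this.failNext) throw new Error("graph store unavailable");
    this.edges.push({ from, to });
  },
};

// Saga-style write: if step 2 fails, undo step 1 with a compensating action.
async function placeOrder(user, order) {
  await documentDB.insert(order); // step 1: visible to other readers immediately
  try {
    await graphDB.createEdge(user, order._key); // step 2
  } catch (err) {
    await documentDB.remove(order._key); // compensation; may itself fail and need retries
    throw err;
  }
}
```

Compensation restores consistency only eventually: between the insert and the `remove`, other readers can still see the order without its edge, which is exactly the window of inconsistency described above.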

Multi-Model Transaction Architecture:

In a unified multi-model database, cross-model operations participate in the same transaction:

// Multi-model: atomic cross-model operation
// (conceptual API; method names differ between real drivers)
database.transaction({
  collections: { write: ['orders', 'purchased'] }
}, function (tx) {
  // Document insert
  const order = tx.collection('orders').insert({
    _key: 'order_123',
    items: [{ sku: 'laptop', qty: 1 }],
    total: 599
  });

  // Graph edge insert (same transaction)
  tx.collection('purchased').insert({
    _from: 'users/alice',
    _to: order._id,
    date: new Date().toISOString()
  });
});
// Either both inserts commit or neither does: atomicity is guaranteed
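One simple way to get this all-or-nothing behavior is to buffer a transaction's writes and apply them only at commit. A minimal single-process sketch, assuming collections are in-memory `Map`s and that `runTransaction` is a hypothetical helper, not a real driver API; a real engine would also log the writes to a WAL and isolate concurrent transactions.

```javascript
// In-memory collections: a document collection and an edge collection.
const collections = { orders: new Map(), purchased: new Map() };
let nextKey = 1;

// Hypothetical helper: writes are buffered in `pending` and applied only if `action` succeeds.
function runTransaction(writeCollections, action) {
  const pending = [];
  const tx = {
    collection(name) {
      if (!writeCollections.includes(name)) throw new Error(`${name} not declared for write`);
      return {
        insert(doc) {
          const key = doc._key ?? String(nextKey++);
          if (collections[name].has(key)) throw new Error(`duplicate key ${name}/${key}`);
          const stored = { ...doc, _key: key, _id: `${name}/${key}` };
          pending.push([name, key, stored]);
          return stored;
        },
      };
    },
  };
  action(tx); // if this throws, the buffered writes are discarded (abort)
  for (const [name, key, doc] of pending) collections[name].set(key, doc); // commit
}

// Committed: the order document and its edge appear together.
runTransaction(["orders", "purchased"], (tx) => {
  const order = tx.collection("orders").insert({ _key: "order_123", total: 599 });
  tx.collection("purchased").insert({ _from: "users/alice", _to: order._id });
});

// Aborted: nothing from this transaction becomes visible.
try {
  runTransaction(["orders", "purchased"], (tx) => {
    tx.collection("orders").insert({ _key: "order_124", total: 10 });
    throw new Error("payment declined");
  });
} catch (e) {
  // expected: order_124 was never written
}
// collections.orders.size -> 1; collections.purchased.size -> 1
```

Deferring writes to commit keeps abort trivial; engines that write in place instead need undo information to roll back.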

Implementation Approaches:

1. Unified Write-Ahead Log (WAL)

A unified multi-model database typically writes all model operations to a single WAL, so one log sequence orders document, graph, and key-value changes alike:

WAL Entry Format:
┌───────────┬────────────┬─────────────┬───────────┐
│ LSN       │ TxID       │ Model Type  │ Operation │
├───────────┼────────────┼─────────────┼───────────┤
│ 1001      │ tx_42      │ document    │ INSERT    │
│ 1002      │ tx_42      │ graph       │ EDGE_ADD  │
│ 1003      │ tx_42      │ -           │ COMMIT    │
└───────────┴────────────┴─────────────┴───────────┘

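During recovery, the system replays the WAL in LSN order and keeps only the changes of transactions that reached a COMMIT record (tx_42 above), so the document insert and the edge insert are restored or discarded together.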
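Recovery can be sketched as a redo pass over the log. The sketch assumes simplified log records mirroring the table above (`lsn`, `txId`, `model`, `op`, plus a hypothetical `payload`), and it is redo-only, which is valid only if uncommitted changes never reach the data files (a no-steal policy). ARIES-style recovery in production engines also has analysis and undo passes.

```javascript
// Simplified log: tx_42 committed; tx_43 crashed before writing COMMIT.
const wal = [
  { lsn: 1001, txId: "tx_42", model: "document", op: "INSERT", payload: { id: "orders/order_123", total: 599 } },
  { lsn: 1002, txId: "tx_42", model: "graph", op: "EDGE_ADD", payload: { from: "users/alice", to: "orders/order_123" } },
  { lsn: 1003, txId: "tx_42", model: null, op: "COMMIT" },
  { lsn: 1004, txId: "tx_43", model: "document", op: "INSERT", payload: { id: "orders/order_124", total: 10 } },
];

// Redo-only recovery: pass 1 finds committed transactions, pass 2 replays them in LSN order.
function recover(log) {
  const committed = new Set(log.filter((r) => r.op === "COMMIT").map((r) => r.txId));
  const state = { documents: new Map(), edges: [] };
  for (const r of [...log].sort((a, b) => a.lsn - b.lsn)) {
    if (!committed.has(r.txId)) continue; // uncommitted work is dropped for every model
    if (r.op === "INSERT") state.documents.set(r.payload.id, r.payload);
    else if (r.op === "EDGE_ADD") state.edges.push(r.payload);
  }
  return state;
}
// recover(wal): documents has only "orders/order_123"; edges has one entry
```

Because document and graph records share one log and one commit marker, there is no way for recovery to restore one half of tx_42 without the other.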

2. MVCC Across Models

Multi-Version Concurrency Control (MVCC) enables consistent snapshots across model boundaries:

Document version:   product_123 @ commit 5
Graph edge version: edge_456 @ commit 5

A transaction reading at snapshot 5 sees, for every document and edge,
the newest version committed at or before 5:
- Document queries and graph traversals observe the same point in time
- Versions committed after 5 are invisible to it
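A minimal sketch of snapshot reads, assuming every document and edge version is stamped with the global commit number that created it. `VersionedStore` is a hypothetical class, and `null` is used as a deletion marker (tombstone); a reader at snapshot `s` sees, for each item, the newest version whose commit number is at most `s`.

```javascript
// Each item keeps a version chain; versions are stamped with a global commit number.
class VersionedStore {
  constructor() {
    this.items = new Map(); // id -> [{ commit, value }] in ascending commit order
    this.lastCommit = 0;
  }

  // Commit a batch of writes (documents and edges alike) under one new commit number.
  commit(writes) {
    const commitNo = ++this.lastCommit;
    for (const [id, value] of writes) {
      if (!this.items.has(id)) this.items.set(id, []);
      this.items.get(id).push({ commit: commitNo, value });
    }
    return commitNo;
  }

  // Newest version visible at `snapshot`; later commits are ignored.
  read(id, snapshot) {
    const chain = this.items.get(id) ?? [];
    for (let i = chain.length - 1; i >= 0; i--) {
      if (chain[i].commit <= snapshot) return chain[i].value;
    }
    return undefined;
  }
}

const db = new VersionedStore();
db.commit([
  ["products/123", { price: 999 }],
  ["edges/456", { _from: "users/alice", _to: "products/123" }],
]);
const snapshot = db.lastCommit; // a long-running reader starts here (snapshot 1)
db.commit([["products/123", { price: 899 }], ["edges/456", null]]); // price change + edge deletion
// db.read("products/123", snapshot).price -> 999, and db.read("edges/456", snapshot) is still the edge
```

The reader at snapshot 1 keeps seeing the old price together with the old edge, even though a later commit changed both, so a traversal and a document lookup in the same transaction cannot disagree.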

3. Lock Management

For systems using locking, the lock manager handles cross-model lock acquisition:

Transaction tx_42:
  LOCK(documents/order_123, WRITE)
  LOCK(edges/user_123_order_123, WRITE)
  ... perform operations ...
  UNLOCK all

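Deadlock detection must span model boundaries: if a graph operation waits on a document lock held by a transaction that is itself waiting on an edge lock held by the first, the cycle must be detected and one of the transactions aborted.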
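Deadlock detection is commonly framed as cycle detection in a wait-for graph, where an edge T1 -> T2 means T1 waits for a lock that T2 holds. Because document and edge locks live in the same lock table, one graph covers both models. A minimal sketch, assuming exclusive locks only (so each transaction waits on at most one holder) and the hypothetical `holders`/`waiting` maps below:

```javascript
// Lock table: resource -> holding transaction. Document and edge locks share one table.
const holders = new Map([
  ["documents/order_123", "tx_42"],
  ["edges/user_123_order_123", "tx_43"],
]);
// Blocked requests: transaction -> resource it is waiting for.
const waiting = new Map([
  ["tx_42", "edges/user_123_order_123"], // graph write waits on tx_43
  ["tx_43", "documents/order_123"],      // document write waits on tx_42
]);

// Follow waits-for edges from each blocked transaction; revisiting one on the current path is a cycle.
function findDeadlock(holders, waiting) {
  for (const startTx of waiting.keys()) {
    const path = [];
    let tx = startTx;
    while (tx !== undefined && waiting.has(tx)) {
      if (path.includes(tx)) return path.slice(path.indexOf(tx)); // transactions in the cycle
      path.push(tx);
      tx = holders.get(waiting.get(tx));
    }
  }
  return null;
}
// findDeadlock(holders, waiting) -> ["tx_42", "tx_43"]
```

A real lock manager would then pick a victim from the returned cycle, abort it, and release its locks; with shared locks a transaction can wait on several holders, so the walk becomes a general graph search.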

Transaction Guarantees in Multi-Model

•Atomicity — All operations across models commit or abort together
•Consistency — Cross-model constraints can be enforced at commit
•Isolation — Concurrent transactions see consistent cross-model snapshots
•Durability — WAL ensures recovery of all model operations

Distributed Multi-Model

Cross-model transactions in distributed deployments are especially complex, because data for different models may reside on different nodes. Systems that support them rely on distributed commit protocols, and some offer weaker isolation or atomicity guarantees in clustered mode than on a single server. Check the guarantees your system documents, and their performance cost, for your deployment topology.

Cross-Model Index Strategies

Indexing in multi-model databases must support diverse query patterns across models while maintaining reasonable storage and update overhead.

Model-Specific Index Requirements:

Index Types by Data Model
Model	Primary Index Types	Query Patterns Supported
Document	B+ tree, Hash, Full-text, Geospatial	Field queries, range scans, text search, location queries
Graph	Edge index, Vertex-centric index	Neighbor lookup, edge property filtering, traversal optimization
Key-Value	Primary key hash/tree	Point lookups, range scans by key
Relational	Secondary indexes, Composite indexes	Predicate filtering, join optimization

Unified Index Architecture:

Multi-model databases typically support various index types through a unified index subsystem:

┌─────────────────────────────────────────────────────────┐
│              Query Processor                            │
│     (Selects indexes based on query model/patterns)     │
├─────────────────────────────────────────────────────────┤
│                 Index Selection Layer                    │
├─────────┬─────────┬─────────┬─────────┬────────────────┤
│ Primary │ B+ Tree │ Hash    │ Edge    │ Fulltext       │
│ Key Idx │ Indexes │ Indexes │ Indexes │ Indexes        │
├─────────┴─────────┴─────────┴─────────┴────────────────┤
│              Unified Storage Engine                     │
└─────────────────────────────────────────────────────────┘

Edge Indexes for Graph Traversal:

Graph operations require specialized index structures for efficient traversal:

Edge Index Structure (conceptual):

Outbound Index (for OUTBOUND traversal):
  vertex_123 -> [edge_to_456, edge_to_789, edge_to_012]
  
Inbound Index (for INBOUND traversal):
  vertex_456 -> [edge_from_123, edge_from_234]
  
Edge Property Index:
  edge_type = "follows" -> [edge_123, edge_456, ...]

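These indexes let the engine find a vertex's edges with a single index lookup (expected O(1) for a hash index, O(log E) for a sorted one) plus time proportional to that vertex's degree, rather than scanning all E edges.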
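A minimal sketch of the three structures above, assuming hash maps keyed by vertex id (for the direction indexes) and by edge type; `buildEdgeIndexes` is a hypothetical helper that builds all three in one pass over the edge list.

```javascript
// Build outbound (_from), inbound (_to), and edge-type indexes in one pass.
function buildEdgeIndexes(edges) {
  const outbound = new Map();
  const inbound = new Map();
  const byType = new Map();
  const add = (index, key, edge) => {
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(edge);
  };
  for (const edge of edges) {
    add(outbound, edge._from, edge); // serves OUTBOUND traversals
    add(inbound, edge._to, edge);    // serves INBOUND traversals
    add(byType, edge.type, edge);    // serves edge-property filters
  }
  return { outbound, inbound, byType };
}

const edges = [
  { _key: "e1", _from: "vertex_123", _to: "vertex_456", type: "follows" },
  { _key: "e2", _from: "vertex_123", _to: "vertex_789", type: "likes" },
  { _key: "e3", _from: "vertex_234", _to: "vertex_456", type: "follows" },
];
const idx = buildEdgeIndexes(edges);
// idx.outbound.get("vertex_123") -> [e1, e2]; idx.inbound.get("vertex_456") -> [e1, e3]
```

Each edge is stored in three lists, which is the storage and write cost the index trade-off discussion below refers to.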

Vertex-Centric Indexes:

For filtering during traversal, vertex-centric indexes store edge properties indexed per vertex:

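// Query: Find friends of Alice with friendship strength > 0.8
// (strength is an edge attribute; 'friendships' is the edge collection)
FOR friend, friendship IN 1 OUTBOUND 'users/alice' friendships
  FILTER friendship.strength > 0.8
  RETURN friend

// Vertex-centric index on (_from, strength), sorted per vertex:
alice_edges:
  strength=0.7  -> edge_to_charlie
  strength=0.85 -> edge_to_diana
  strength=0.9  -> edge_to_bob

// The index seeks directly to strength > 0.8, so edge_to_charlie is never read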
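A vertex-centric index can be pictured as each vertex's edges kept sorted by the indexed attribute, so a range predicate becomes a binary search plus a scan of the matching tail. A minimal sketch, assuming an in-memory index keyed by `_from` and sorted on `strength`, using the same edges as the listing above; `buildVertexCentricIndex` and `edgesAbove` are hypothetical helpers.

```javascript
// Vertex-centric index: for each source vertex, its edges sorted by `attr`.
function buildVertexCentricIndex(edges, attr) {
  const index = new Map();
  for (const e of edges) {
    if (!index.has(e._from)) index.set(e._from, []);
    index.get(e._from).push(e);
  }
  for (const list of index.values()) list.sort((a, b) => a[attr] - b[attr]);
  return index;
}

// Edges of `vertex` with attr > threshold: binary-search the first match, then take the tail.
function edgesAbove(index, vertex, attr, threshold) {
  const list = index.get(vertex) ?? [];
  let lo = 0;
  let hi = list.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (list[mid][attr] <= threshold) lo = mid + 1;
    else hi = mid;
  }
  return list.slice(lo); // lower edges are skipped by the binary search, not scanned
}

const friendships = [
  { _from: "users/alice", _to: "users/bob", strength: 0.9 },
  { _from: "users/alice", _to: "users/charlie", strength: 0.7 },
  { _from: "users/alice", _to: "users/diana", strength: 0.85 },
];
const vci = buildVertexCentricIndex(friendships, "strength");
// edgesAbove(vci, "users/alice", "strength", 0.8).map((e) => e._to) -> ["users/diana", "users/bob"]
```

For a "supernode" with millions of edges, this turns a per-hop scan of every edge into O(log d) probes plus the size of the result.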

Composite Cross-Model Indexes:

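Some systems support indexes that combine attributes from more than one model, for example a document attribute together with edge attributes. Where the engine does not support this natively, a common workaround is to copy the document attribute onto the edge and build an ordinary composite index on the edge collection:

// Composite index on (document.category, edge.type, edge.date)
// Supports queries like:
// "Electronics products purchased in the last 30 days"

Composite index key: (category, edge_type, date)
  Lookup: category = "electronics", edge_type = "purchased", date >= now - 30 days
    -> [product_123, product_456, ...]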

Index Maintenance Across Models:

When data changes, indexes across models must be updated consistently:

Atomic index updates — All affected indexes update in same transaction
Cascading updates — Graph edge deletion may trigger document index updates
Consistency validation — Index state matches data state across models

Index Trade-offs

Every index consumes storage and slows down writes. Multi-model databases with many index types amplify this trade-off. Carefully analyze your query patterns and create only necessary indexes. The flexibility of multi-model doesn't mean you need every index type on every collection.

Operational Considerations

Running a single multi-model database versus multiple specialized databases has significant operational implications.

Unified Operations Benefits:

Operational Simplification

•Single backup/restore — One consistent backup captures all models simultaneously
•Unified monitoring — One dashboard, one alerting system, one set of metrics
•Single scaling strategy — Scale one system rather than coordinating multiple
•Consistent security — One authentication/authorization system, one audit log
•Simplified disaster recovery — One recovery procedure, one failover
•Reduced expertise requirements — Team learns one system deeply rather than many superficially

Operational Complexity Considerations:

1. Capacity Planning

Different models have different resource characteristics:

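Deep graph traversals tend to be bound by CPU and memory latency (many small random lookups)
Document queries that scan large collections are often I/O-bound
Key-value workloads are sensitive to how much of the working set fits in memory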

Capacity planning must account for workload mix, which may be harder to predict than single-model systems.

2. Performance Tuning

Tuning multi-model databases requires understanding model interactions:

Buffer pool allocation across model types
Query optimizer hints for cross-model queries
Index selection for mixed workloads

3. Upgrade and Migration

Upgrading a single database that serves multiple functions has higher risk—all models are affected by a single upgrade. However, this is offset by having only one system to upgrade.

4. Vendor Lock-in Considerations

Consolidating on one multi-model database increases dependency on that vendor. Migration away requires migrating all models simultaneously.

Monitoring Multi-Model Workloads:

Effective monitoring requires model-aware metrics:

Metrics by Model:
├── Document Operations
│   ├── Inserts/sec
│   ├── Query latency (p50, p99)
│   └── Index hit ratio
├── Graph Operations  
│   ├── Traversals/sec
│   ├── Average traversal depth
│   └── Edge scans vs. index lookups
├── Key-Value Operations
│   ├── Gets/sec, Sets/sec
│   ├── Point lookup latency
│   └── Cache hit ratio
└── Cross-Model Queries
    ├── Count and latency
    ├── Model combinations used
    └── Transaction scope sizes

Start Simple, Expand Carefully

Don't adopt all models immediately. Start with your dominant model, then expand to additional models as needs arise. This allows you to build operational expertise incrementally while validating the multi-model approach for your environment.

Summary: Single Database Architecture

We've explored the architectural foundations that enable multi-model databases to function as coherent systems. Let's consolidate the key insights:

Key Takeaways

•Multiple architectural patterns exist — Unified core, federated engines, and extended single-model approaches each have trade-offs
•Storage engines must balance competing requirements — No perfect unified format exists; every approach trades something
•Query processor integration determines usability — Unified query languages enable natural cross-model expression
•Cross-model transactions are a key differentiator — ACID guarantees across models eliminate polyglot consistency headaches
•Index strategies span model boundaries — Specialized indexes (edge, vertex-centric) coexist with traditional structures
•Operational simplification is significant — One system to monitor, backup, secure, and scale
•Trade-offs exist — Capacity planning and tuning complexity increase with model diversity

What's Next:

With the architectural foundation established, the next page examines ArangoDB as a concrete example of native multi-model database design—exploring its specific approaches to storage, querying, and cross-model integration.

Page Complete

You now understand how multi-model databases are architected to support multiple data models within a single system. This architectural knowledge enables you to evaluate multi-model databases based on integration depth rather than feature checklists.

2 / 5

Loading learning content...

Database Management SystemsMulti-Model Databases

Multi-Model Databases: Unified Data Management

LevelAdvanced

Duration60 mins

TopicMulti-Model Databases

2 / 5

Single Database Architecture

Unifying Models Under One Roof

This isn't merely a matter of adding features. Supporting multiple models within one system requires fundamental architectural decisions about:

How data is physically stored
How different model semantics are expressed
How queries across models are processed
How consistency is maintained across model boundaries

The answer to these questions determines whether a multi-model database is a coherent system or merely multiple systems bolted together.

What You Will Learn

Architectural Patterns for Multi-Model

Multi-model databases employ several architectural patterns, each with distinct trade-offs. Understanding these patterns is essential for evaluating and selecting multi-model systems.

Pattern 1: Unified Core with Model Adapters

The most deeply integrated approach builds a flexible core storage engine that can be accessed through different model 'lenses':

┌─────────────────────────────────────────────────────┐
│                   Application Layer                  │
├─────────────┬─────────────┬─────────────┬───────────┤
│ Document API│  Graph API  │ Key-Value API│ SQL API  │
├─────────────┴─────────────┴─────────────┴───────────┤
│              Unified Query Processor                 │
├─────────────────────────────────────────────────────┤
│              Unified Storage Engine                  │
│         (Optimized for Multiple Access Patterns)    │
└─────────────────────────────────────────────────────┘

Characteristics:

Single storage format (typically document-like with graph extensions)
Query processor understands all models natively
Cross-model queries are first-class citizens
Transactions naturally span model boundaries
Examples: ArangoDB, OrientDB (native graph + document)

Pattern 2: Federated Storage Engines

Some systems use different storage engines for different models, federated under a common query layer:

┌─────────────────────────────────────────────────────┐
│                   Application Layer                  │
├─────────────────────────────────────────────────────┤
│              Unified Query Processor                 │
├──────────────────┬──────────────────┬───────────────┤
│ Document Engine  │   Graph Engine   │   KV Engine   │
└──────────────────┴──────────────────┴───────────────┘

Characteristics:

Specialized storage optimized per model
Query processor coordinates across engines
Cross-model queries require engine coordination
Transaction coordination is complex
Examples: Some embedded multi-model approaches

Pattern 3: Extended Single-Model

Existing databases add multi-model capabilities through extensions:

┌─────────────────────────────────────────────────────┐
│                   Application Layer                  │
├─────────────────────────────────────────────────────┤
│     Original Query Processor + Extension Handler     │
├─────────────────────────────────────────────────────┤
│     Original Storage Engine + Extension Storage     │
└─────────────────────────────────────────────────────┘

Characteristics:

Primary model remains dominant
Extended models may have limitations
Benefits from existing optimization for primary model
Cross-model integration may be shallow
Examples: PostgreSQL (relational + JSON/JSONB), MongoDB (document + $graphLookup)

Evaluating Integration Depth

Unified Storage Engine Design

A unified storage engine must satisfy competing requirements from different data models. This is the deepest architectural challenge in multi-model design.

Core Storage Requirements by Model:

Storage Requirements Across Models
Model	Storage Requirements	Access Patterns
Document	Nested structures, variable schemas, efficient serialization	Full document retrieval, field-level access, nested queries
Graph	Edge storage, adjacency lists, traversal-optimized structures	Neighbor lookup, path traversal, pattern matching
Key-Value	Direct key → value mapping, minimal overhead	Point lookups, range scans by key, TTL handling
Relational	Fixed schemas, column storage, index structures	Predicate filtering, joins, aggregations

Unified Storage Approaches:

1. Document-Centric Core

Many multi-model databases use a document-like format as their core storage, representing graph edges and relational rows as documents:

// Document storage
{ "_key": "product_123", "type": "product", "name": "Laptop", ... }

// Graph edge as document
{ "_key": "123_456", "_from": "users/123", "_to": "products/456", "type": "purchased" }

// Relational row as document
{ "_key": "order_789", "customer_id": 123, "total": 599.99, "date": "2024-01-15" }

Advantages:

Flexible schema accommodates all models
Standard indexing works across models
JSON/document formats are widely understood

Challenges:

Graph traversals require document lookups (less efficient than adjacency lists)
Relational queries may require schema enforcement at application level

2. Graph-Native with Document Properties

Some systems store graphs natively (adjacency lists) with documents as node/edge properties:

Node Storage:
  node_123: { type: "user", properties: { name: "Alice", ... } }
  
Edge Storage (adjacency):
  node_123 -> [ node_456 (follows), node_789 (likes), ... ]
  
Property Storage:
  edge_123_456: { since: "2024-01-01", strength: 0.8 }

Advantages:

Optimal graph traversal performance
Documents stored efficiently as properties

Challenges:

Non-graph queries may be less efficient
Storage overhead for relationship structures

3. Hybrid with Specialized Structures

Advanced systems maintain multiple internal representations optimized for different access patterns:

┌────────────────────────────────────────┐
│          Logical Data Layer            │
│    (Unified view of all data)          │
├────────────────────────────────────────┤
│        Physical Storage Layer          │
│  ┌──────────┬──────────┬─────────────┐ │
│  │ B+ Trees │ Adjacency│ Hash Tables │ │
│  │ (range)  │ (graph)  │ (point)     │ │
│  └──────────┴──────────┴─────────────┘ │
└────────────────────────────────────────┘

The system maintains consistency across representations, choosing optimal structures for each access pattern.

Storage Trade-offs

Query Processor Integration

The query processor is where multi-model integration becomes most visible to users. A well-integrated multi-model query processor enables queries that seamlessly combine different model operations.

Query Language Approaches:

1. Unified Query Language

Some systems design a single query language that natively supports all models. ArangoDB's AQL (ArangoDB Query Language) exemplifies this:

// Mixed document and graph query in AQL
FOR product IN products
  FILTER product.category == "electronics"
  LET recommended = (
    FOR v, e IN 1..2 OUTBOUND product GRAPH 'recommendations'
      FILTER e.weight > 0.5
      RETURN v
  )
  RETURN { product, recommended }

This query:

Filters documents (document model)
Traverses a graph (graph model)
Combines results in one query

2. Multiple Integrated Languages

Other systems support multiple query languages that can reference each other:

-- SQL with graph extension (conceptual)
SELECT p.name, 
       (SELECT target FROM follow_path(p.id, 'friends', 2)) as network
FROM products p
WHERE p.category = 'electronics';

3. Polyglot Query APIs

Some systems provide separate APIs for each model but allow results to reference across models through identifiers.

Query Optimization Challenges:

Cross-model queries present unique optimization challenges:

Challenge 1: Cost Model Complexity

Optimizing a single-model query requires estimating costs for that model's operations. Cross-model queries require unified cost models:

Document filter costs
Graph traversal costs
Join costs across models
The interplay between them

Challenge 2: Execution Plan Space

More models mean more possible execution strategies:

Should we filter documents first, then traverse?
Should we traverse first, then filter results?
Can graph traversal eliminate documents before filtering?

The optimizer must evaluate cross-model execution orders.

Challenge 3: Intermediate Results

Cross-model execution may produce intermediate results that must be converted between model representations:

Document → Graph node for traversal input
Graph results → Document for aggregation

cross_model_query_example.aql

AQL

// Example: Find top influencers who purchased electronics
// Combines: key-value lookup, document filter, graph traversal, aggregation
 
// Start with efficient key lookup (key-value pattern)
LET category_products = (
  FOR p IN products
    FILTER p.category == "electronics"  // Document filter
    RETURN p._id
)
 
// Traverse purchase relationships (graph pattern)
FOR product_id IN category_products
  FOR user, purchase IN 1 INBOUND product_id GRAPH 'purchases'
    // Calculate social influence (graph traversal)
    LET follower_count = LENGTH(
      FOR follower IN 1..3 INBOUND user GRAPH 'social'
        RETURN follower
    )
    
    // Aggregate and sort (relational pattern)
    COLLECT userId = user._key 
    AGGREGATE influence = SUM(follower_count)
    
    SORT influence DESC
    LIMIT 10
    
    RETURN { userId, influence }

Query Language Evaluation

Cross-Model Transaction Handling

Transaction support across model boundaries is perhaps the most significant advantage of multi-model databases over polyglot persistence. Let's examine how this works architecturally.

The Polyglot Consistency Problem:

With separate databases, cross-database consistency requires distributed transactions or application-level coordination:

// Polyglot persistence: dangerous window of inconsistency
await documentDB.insert(order);        // Step 1 succeeds
await graphDB.createEdge(user, order);  // Step 2 fails?
// Now document exists without graph edge!
// Application must detect and compensate

Distributed transaction protocols (2PC, Saga patterns) add complexity and performance overhead.

Multi-Model Transaction Architecture:

In a unified multi-model database, cross-model operations participate in the same transaction:

// Multi-model: atomic cross-model operation
database.transaction({
  collections: { write: ['orders', 'purchased'] }
}, function(tx) {
  // Document insert
  let order = tx.collection('orders').insert({
    _key: 'order_123',
    items: [...],
    total: 599
  });
  
  // Graph edge insert (same transaction)
  tx.collection('purchased').insert({
    _from: 'users/alice',
    _to: order._id,
    date: new Date()
  });
});
// Either both succeed or both fail - atomicity guaranteed
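A minimal sketch of how a single engine can provide that guarantee, assuming one in-process store: writes to the document and edge structures are staged in a private buffer and applied together only after the transaction function returns, so a failure anywhere before commit leaves both models untouched. `MultiModelStore` and its methods are hypothetical, and durability (the WAL) and concurrent access are deliberately left out.

```javascript
class MultiModelStore {
  constructor() {
    this.documents = new Map(); // "collection/key" -> document
    this.edges = [];            // { _from, _to, ...props }
  }

  transaction(work) {
    const staged = { documents: [], edges: [] };
    const tx = {
      insertDocument: (collection, doc) => {
        const id = `${collection}/${doc._key}`;
        if (this.documents.has(id) || staged.documents.some(d => d.id === id)) {
          throw new Error(`duplicate key ${id}`);
        }
        staged.documents.push({ id, doc: { ...doc, _id: id } });
        return id;
      },
      insertEdge: edge => {
        if (!edge._from || !edge._to) throw new Error("edge needs _from and _to");
        staged.edges.push({ ...edge });
      },
    };
    work(tx); // if this throws, the staged writes are simply dropped
    // Commit: apply every staged write; none was visible before this point.
    for (const { id, doc } of staged.documents) this.documents.set(id, doc);
    this.edges.push(...staged.edges);
  }
}

const store = new MultiModelStore();

store.transaction(tx => {
  const orderId = tx.insertDocument("orders", { _key: "order_123", total: 599 });
  tx.insertEdge({ _from: "users/alice", _to: orderId });
});

try {
  store.transaction(tx => {
    tx.insertDocument("orders", { _key: "order_124", total: 25 });
    tx.insertEdge({ _from: "users/bob" }); // missing _to: the whole transaction aborts
  });
} catch (err) {
  console.log("aborted:", err.message);
}

console.log(store.documents.size, store.edges.length); // 1 1
```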

Implementation Approaches:

1. Unified Write-Ahead Log (WAL)

Integrated multi-model databases typically route all model operations through a single WAL, so one log sequence orders every write regardless of model:

WAL Entry Format:
┌───────────┬────────────┬─────────────┬───────────┐
│ LSN       │ TxID       │ Model Type  │ Operation │
├───────────┼────────────┼─────────────┼───────────┤
│ 1001      │ tx_42      │ document    │ INSERT    │
│ 1002      │ tx_42      │ graph       │ EDGE_ADD  │
│ 1003      │ tx_42      │ -           │ COMMIT    │
└───────────┴────────────┴─────────────┴───────────┘

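During recovery, the system replays the WAL in LSN order and applies only the operations of transactions that reached a COMMIT record; operations of transactions without one are discarded or undone, depending on the logging scheme. Document and graph state are therefore restored to the same set of committed transactions.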
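The replay rule can be sketched as follows, assuming a redo-only log with no checkpoints, entries shaped like the rows of the WAL table, and a crash that left one transaction without a COMMIT record. Field names such as `txId`, `model`, and `value` are assumptions.

```javascript
// Redo-only recovery: apply only operations of transactions that committed.
function recover(wal) {
  const committed = new Set(wal.filter(e => e.op === "COMMIT").map(e => e.txId));
  const state = { documents: new Map(), edges: [] };
  const ordered = [...wal].sort((a, b) => a.lsn - b.lsn);
  for (const entry of ordered) {
    if (!committed.has(entry.txId)) continue; // no COMMIT record: skip
    if (entry.op === "INSERT" && entry.model === "document") {
      state.documents.set(entry.id, entry.value);
    } else if (entry.op === "EDGE_ADD" && entry.model === "graph") {
      state.edges.push(entry.value);
    }
  }
  return state;
}

const wal = [
  { lsn: 1001, txId: "tx_42", model: "document", op: "INSERT", id: "orders/order_123", value: { total: 599 } },
  { lsn: 1002, txId: "tx_42", model: "graph", op: "EDGE_ADD", value: { _from: "users/alice", _to: "orders/order_123" } },
  { lsn: 1003, txId: "tx_42", model: null, op: "COMMIT" },
  // tx_43 crashed before writing COMMIT, so its write is not applied.
  { lsn: 1004, txId: "tx_43", model: "document", op: "INSERT", id: "orders/order_124", value: { total: 25 } },
];

const recovered = recover(wal);
console.log(recovered.documents.size, recovered.edges.length); // 1 1
```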

2. MVCC Across Models

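Multi-Version Concurrency Control (MVCC) enables consistent snapshots across model boundaries. The key requirement is that documents and edges take their version numbers from one shared commit sequence, so a single snapshot number denotes the same point in time for every model:

Document version:   product_123 @ version 5
Graph edge version: edge_456 @ version 5

A transaction reading at snapshot 5 sees the state as of that same point in time in:
- Document queries
- Graph traversals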
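A minimal sketch of the shared-counter idea, with hypothetical names: every commit, whether it touches documents, edges, or both, takes the next number from one counter, and a reader returns the newest version of each item whose number does not exceed its snapshot. Garbage collection of old versions and write-conflict checks are omitted.

```javascript
class VersionedStore {
  constructor() {
    this.commitCounter = 0;    // one counter shared by all models
    this.versions = new Map(); // id -> [{ version, value }], ascending by version
  }

  // Commit a batch of writes (documents and edges alike) under one version number.
  commit(writes) {
    const version = ++this.commitCounter;
    for (const [id, value] of Object.entries(writes)) {
      if (!this.versions.has(id)) this.versions.set(id, []);
      this.versions.get(id).push({ version, value });
    }
    return version;
  }

  snapshot() {
    return this.commitCounter;
  }

  // Newest value with version <= snapshot; a null value represents a deletion.
  read(id, snapshot) {
    const chain = this.versions.get(id) ?? [];
    for (let i = chain.length - 1; i >= 0; i--) {
      if (chain[i].version <= snapshot) return chain[i].value;
    }
    return undefined;
  }
}

const db = new VersionedStore();
db.commit({
  "products/product_123": { price: 499 },
  "edges/edge_456": { _from: "users/alice", _to: "products/product_123" },
});
const snap = db.snapshot(); // a reader starts here

// A later transaction changes the price and deletes the edge.
db.commit({ "products/product_123": { price: 449 }, "edges/edge_456": null });

// The reader still sees the document and the edge as of the same point in time.
console.log(db.read("products/product_123", snap).price); // 499
console.log(db.read("edges/edge_456", snap) !== null);    // true
console.log(db.read("edges/edge_456", db.snapshot()));     // null
```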

3. Lock Management

For systems using locking, the lock manager handles cross-model lock acquisition:

Transaction tx_42:
  LOCK(documents/order_123, WRITE)
  LOCK(edges/user_123_order_123, WRITE)
  ... perform operations ...
  UNLOCK all

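Deadlock detection must span model boundaries: if a graph operation waits on a lock held by a document operation while that document operation waits on a lock held by the graph operation, the lock manager must detect the cycle just as it would within a single model.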
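One common way to detect such cycles is a waits-for graph, sketched here with hypothetical names and exclusive locks only. Because documents and edges share one lock table keyed by resource id, a cycle that mixes the two is found by the same check. Lock release and the actual abort of the victim are left out.

```javascript
class LockManager {
  constructor() {
    this.owner = new Map();    // resource id -> txId holding the exclusive lock
    this.waitsFor = new Map(); // txId -> txId it is currently blocked on
  }

  // Returns "granted", "waiting", or "deadlock".
  acquire(txId, resource) {
    const holder = this.owner.get(resource);
    if (holder === undefined || holder === txId) {
      this.owner.set(resource, txId);
      return "granted";
    }
    this.waitsFor.set(txId, holder);
    if (this.hasCycleFrom(txId)) {
      this.waitsFor.delete(txId); // txId becomes the victim and should be aborted
      return "deadlock";
    }
    return "waiting";
  }

  // With exclusive locks each blocked transaction waits on exactly one holder,
  // so following the chain of waits-for edges is enough to find a cycle.
  hasCycleFrom(start) {
    const seen = new Set();
    let current = this.waitsFor.get(start);
    while (current !== undefined && !seen.has(current)) {
      if (current === start) return true;
      seen.add(current);
      current = this.waitsFor.get(current);
    }
    return false;
  }
}

const locks = new LockManager();
locks.acquire("tx_42", "documents/order_123");      // granted
locks.acquire("tx_43", "edges/user_123_order_123"); // granted
const first = locks.acquire("tx_42", "edges/user_123_order_123");
const second = locks.acquire("tx_43", "documents/order_123");
console.log(first, second); // waiting deadlock
```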

Transaction Guarantees in Multi-Model

•Atomicity — All operations across models commit or abort together
•Consistency — Cross-model constraints can be enforced at commit
•Isolation — Concurrent transactions see consistent cross-model snapshots
•Durability — WAL ensures recovery of all model operations


Cross-Model Index Strategies

Indexing in multi-model databases must support diverse query patterns across models while maintaining reasonable storage and update overhead.

Model-Specific Index Requirements:

Index Types by Data Model
Model	Primary Index Types	Query Patterns Supported
Document	B+ tree, Hash, Full-text, Geospatial	Field queries, range scans, text search, location queries
Graph	Edge index, Vertex-centric index	Neighbor lookup, edge property filtering, traversal optimization
Key-Value	Primary key hash/tree	Point lookups, range scans by key
Relational	Secondary indexes, Composite indexes	Predicate filtering, join optimization

Unified Index Architecture:

Multi-model databases typically support various index types through a unified index subsystem:

┌─────────────────────────────────────────────────────────┐
│              Query Processor                            │
│     (Selects indexes based on query model/patterns)     │
├─────────────────────────────────────────────────────────┤
│                 Index Selection Layer                   │
├─────────┬─────────┬─────────┬─────────┬─────────────────┤
│ Primary │ B+ Tree │ Hash    │ Edge    │ Fulltext        │
│ Key Idx │ Indexes │ Indexes │ Indexes │ Indexes         │
├─────────┴─────────┴─────────┴─────────┴─────────────────┤
│              Unified Storage Engine                     │
└─────────────────────────────────────────────────────────┘

Edge Indexes for Graph Traversal:

Graph operations require specialized index structures for efficient traversal:

Edge Index Structure (conceptual):

Outbound Index (for OUTBOUND traversal):
  vertex_123 -> [edge_to_456, edge_to_789, edge_to_012]
  
Inbound Index (for INBOUND traversal):
  vertex_456 -> [edge_from_123, edge_from_234]
  
Edge Property Index:
  edge_type = "follows" -> [edge_123, edge_456, ...]

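These indexes locate a vertex's adjacent edges directly (expected O(1) with a hash index, O(log n) with a sorted index) and then cost time proportional to that vertex's degree, instead of scanning every edge in the collection.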
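A sketch of the two adjacency maps in JavaScript, using hypothetical names and the vertex ids from the conceptual structure: each edge is registered under its source in the outbound map and under its target in the inbound map, so a neighbor lookup touches only that vertex's own edges.

```javascript
class EdgeIndex {
  constructor() {
    this.outbound = new Map(); // _from -> edges leaving that vertex
    this.inbound = new Map();  // _to   -> edges entering that vertex
  }

  add(edge) {
    for (const [map, key] of [[this.outbound, edge._from], [this.inbound, edge._to]]) {
      if (!map.has(key)) map.set(key, []);
      map.get(key).push(edge);
    }
  }

  // One map lookup, then work proportional to the vertex's degree.
  neighbors(vertex, direction) {
    if (direction === "OUTBOUND") {
      return (this.outbound.get(vertex) ?? []).map(e => e._to);
    }
    return (this.inbound.get(vertex) ?? []).map(e => e._from);
  }
}

const index = new EdgeIndex();
index.add({ _from: "vertex_123", _to: "vertex_456", type: "follows" });
index.add({ _from: "vertex_123", _to: "vertex_789", type: "follows" });
index.add({ _from: "vertex_234", _to: "vertex_456", type: "likes" });

console.log(index.neighbors("vertex_123", "OUTBOUND")); // [ 'vertex_456', 'vertex_789' ]
console.log(index.neighbors("vertex_456", "INBOUND"));  // [ 'vertex_123', 'vertex_234' ]
```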

Vertex-Centric Indexes:

For filtering during traversal, vertex-centric indexes store edge properties indexed per vertex:

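// Query: Find friends of Alice with friendship strength > 0.8
// strength is an edge attribute, so the filter applies to the edge variable
// (friendships: edge collection name assumed for illustration)
FOR friend, edge IN 1 OUTBOUND 'users/alice' friendships
  FILTER edge.strength > 0.8
  RETURN friend

// Vertex-centric index on (_from, strength); Alice's entries kept sorted:
alice_edges:
  strength=0.7  -> edge_to_charlie
  strength=0.85 -> edge_to_diana
  strength=0.9  -> edge_to_bob
  
// A range scan for strength > 0.8 returns only the diana and bob edges;
// the edge to charlie is never read during the traversal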
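The pruning can be sketched with a per-vertex array kept sorted by `strength` and a binary search for the first entry above the threshold. The class and method names are hypothetical; a real engine would keep such entries in an on-disk sorted structure such as a B+ tree rather than in an in-memory array.

```javascript
class VertexCentricIndex {
  constructor(attribute) {
    this.attribute = attribute;
    this.byVertex = new Map(); // _from -> edges sorted ascending by attribute
  }

  // Index of the first edge whose attribute is strictly greater than value.
  upperBound(list, value) {
    let lo = 0;
    let hi = list.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (list[mid][this.attribute] <= value) lo = mid + 1;
      else hi = mid;
    }
    return lo;
  }

  add(edge) {
    if (!this.byVertex.has(edge._from)) this.byVertex.set(edge._from, []);
    const list = this.byVertex.get(edge._from);
    list.splice(this.upperBound(list, edge[this.attribute]), 0, edge); // keep sorted
  }

  // Edges from vertex whose attribute exceeds threshold: binary search + slice.
  greaterThan(vertex, threshold) {
    const list = this.byVertex.get(vertex) ?? [];
    return list.slice(this.upperBound(list, threshold));
  }
}

const vci = new VertexCentricIndex("strength");
vci.add({ _from: "users/alice", _to: "users/bob", strength: 0.9 });
vci.add({ _from: "users/alice", _to: "users/charlie", strength: 0.7 });
vci.add({ _from: "users/alice", _to: "users/diana", strength: 0.85 });

const strongFriends = vci.greaterThan("users/alice", 0.8).map(e => e._to);
console.log(strongFriends); // [ 'users/diana', 'users/bob' ]
```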

Composite Cross-Model Indexes:

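Some systems support indexes that span model concepts. Such an index behaves like a materialized join: it combines a document attribute (category) with edge attributes (edge type, date), so it must be updated whenever either the product document or a purchase edge changes: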

// Index on: document.category + graph.edge_type
// Enables efficient queries like:
// "Electronics products purchased in last 30 days"

Composite Index:
  (category="electronics", edge_type="purchased", date > 30_days_ago)
    -> [product_123, product_456, ...]

Index Maintenance Across Models:

When data changes, indexes across models must be updated consistently:

•Atomic index updates — All affected indexes update in the same transaction
•Cascading updates — Graph edge deletion may trigger document index updates
•Consistency validation — Index state matches data state across models
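The three maintenance duties can be sketched on a small single-threaded store with hypothetical names. Each mutation updates the data and every affected index before returning (a real engine would do this inside the enclosing transaction); deleting a document cascades to the edges that touch it, which is the simplest cascading case; and `validate()` rebuilds the category index from the data to check for drift.

```javascript
class IndexedStore {
  constructor() {
    this.docs = new Map();          // id -> document
    this.byCategory = new Map();    // category -> Set of document ids
    this.edges = new Map();         // edge id -> { _from, _to }
    this.edgesByVertex = new Map(); // vertex id -> Set of edge ids (both directions)
  }

  static addTo(map, key, value) {
    if (!map.has(key)) map.set(key, new Set());
    map.get(key).add(value);
  }

  static removeFrom(map, key, value) {
    const set = map.get(key);
    if (!set) return;
    set.delete(value);
    if (set.size === 0) map.delete(key);
  }

  insertDoc(id, doc) {
    this.docs.set(id, doc);
    IndexedStore.addTo(this.byCategory, doc.category, id);
  }

  insertEdge(edgeId, from, to) {
    // Validate before mutating anything, so a rejected edge leaves no trace.
    if (!this.docs.has(from) || !this.docs.has(to)) throw new Error("dangling edge");
    this.edges.set(edgeId, { _from: from, _to: to });
    IndexedStore.addTo(this.edgesByVertex, from, edgeId);
    IndexedStore.addTo(this.edgesByVertex, to, edgeId);
  }

  removeEdge(edgeId) {
    const edge = this.edges.get(edgeId);
    if (!edge) return;
    this.edges.delete(edgeId);
    IndexedStore.removeFrom(this.edgesByVertex, edge._from, edgeId);
    IndexedStore.removeFrom(this.edgesByVertex, edge._to, edgeId);
  }

  // Cascading update: deleting a document also deletes every edge touching it.
  removeDoc(id) {
    const doc = this.docs.get(id);
    if (!doc) return;
    for (const edgeId of [...(this.edgesByVertex.get(id) ?? [])]) this.removeEdge(edgeId);
    this.docs.delete(id);
    IndexedStore.removeFrom(this.byCategory, doc.category, id);
  }

  // Consistency validation: rebuild the category index from the data, compare
  // it with the live index, and check that no edge points at a missing document.
  validate() {
    const expected = new Map();
    for (const [id, doc] of this.docs) IndexedStore.addTo(expected, doc.category, id);
    if (expected.size !== this.byCategory.size) return false;
    for (const [category, ids] of expected) {
      const live = this.byCategory.get(category);
      if (!live || live.size !== ids.size) return false;
      for (const id of ids) if (!live.has(id)) return false;
    }
    for (const edge of this.edges.values()) {
      if (!this.docs.has(edge._from) || !this.docs.has(edge._to)) return false;
    }
    return true;
  }
}

const store = new IndexedStore();
store.insertDoc("users/alice", { category: "user" });
store.insertDoc("products/p1", { category: "electronics" });
store.insertEdge("purchased/e1", "users/alice", "products/p1");
store.removeDoc("products/p1");
console.log(store.edges.size, store.byCategory.has("electronics"), store.validate()); // 0 false true
```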


Operational Considerations

Running a single multi-model database versus multiple specialized databases has significant operational implications.

Unified Operations Benefits:

Operational Simplification

•Single backup/restore — One consistent backup captures all models simultaneously
•Unified monitoring — One dashboard, one alerting system, one set of metrics
•Single scaling strategy — Scale one system rather than coordinating multiple
•Consistent security — One authentication/authorization system, one audit log
•Simplified disaster recovery — One recovery procedure, one failover
•Reduced expertise requirements — Team learns one system deeply rather than many superficially

Operational Complexity Considerations:

1. Capacity Planning

Different models have different resource characteristics:

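•Deep graph traversals tend to be CPU-intensive and generate many random reads
•Document queries that scan large collections tend to be I/O-intensive
•Key-value point lookups are latency-sensitive and depend on keeping hot keys in memory

Capacity planning must account for the workload mix, which can be harder to predict than the workload of a single-model system.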

2. Performance Tuning

Tuning multi-model databases requires understanding model interactions:

•Buffer pool allocation across model types
•Query optimizer hints for cross-model queries
•Index selection for mixed workloads

3. Upgrade and Migration

Upgrading a single database that serves multiple functions carries higher risk, because one upgrade affects all models at once. Having only one system to upgrade partly offsets this.

4. Vendor Lock-in Considerations

Consolidating on one multi-model database increases dependency on that vendor. Migration away requires migrating all models simultaneously.

Monitoring Multi-Model Workloads:

Effective monitoring requires model-aware metrics:

Metrics by Model:
├── Document Operations
│   ├── Inserts/sec
│   ├── Query latency (p50, p99)
│   └── Index hit ratio
├── Graph Operations  
│   ├── Traversals/sec
│   ├── Average traversal depth
│   └── Edge scans vs. index lookups
├── Key-Value Operations
│   ├── Gets/sec, Sets/sec
│   ├── Point lookup latency
│   └── Cache hit ratio
└── Cross-Model Queries
    ├── Count and latency
    ├── Model combinations used
    └── Transaction scope sizes
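Per-model latency percentiles such as p50 and p99 can be computed by tagging each sample with the model that served it. The sketch below uses the nearest-rank percentile definition; the class name and model labels are hypothetical, and production systems often use streaming histograms instead of keeping every sample.

```javascript
class ModelMetrics {
  constructor() {
    this.samples = new Map(); // model -> latency samples in ms
  }

  record(model, latencyMs) {
    if (!this.samples.has(model)) this.samples.set(model, []);
    this.samples.get(model).push(latencyMs);
  }

  // Nearest-rank percentile: the sample at rank ceil(p * n / 100), 1-based.
  static percentile(values, p) {
    const sorted = [...values].sort((a, b) => a - b);
    const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
    return sorted[rank - 1];
  }

  report() {
    const out = {};
    for (const [model, values] of this.samples) {
      out[model] = {
        count: values.length,
        p50: ModelMetrics.percentile(values, 50),
        p99: ModelMetrics.percentile(values, 99),
      };
    }
    return out;
  }
}

const metrics = new ModelMetrics();
for (let i = 1; i <= 100; i++) metrics.record("document", i); // 1..100 ms
metrics.record("graph", 12);
metrics.record("graph", 40);
console.log(metrics.report());
```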


Summary: Single Database Architecture

We've explored the architectural foundations that enable multi-model databases to function as coherent systems. Let's consolidate the key insights:

Key Takeaways

•Multiple architectural patterns exist — Unified core, federated engines, and extended single-model approaches each have trade-offs
•Storage engines must balance competing requirements — No perfect unified format exists; every approach trades something
•Query processor integration determines usability — Unified query languages enable natural cross-model expression
•Cross-model transactions are a key differentiator — ACID guarantees across models eliminate polyglot consistency headaches
•Index strategies span model boundaries — Specialized indexes (edge, vertex-centric) coexist with traditional structures
•Operational simplification is significant — One system to monitor, backup, secure, and scale
•Trade-offs exist — Capacity planning and tuning complexity increase with model diversity
