Multi Model Databases - Learning Module

ArangoDB: A Native Multi-Model Database

ArangoDB: Multi-Model from the Ground Up

Abstract architectural principles become concrete when examined through specific implementations. ArangoDB serves as an exemplary case study of native multi-model database design—a system built from inception to support document, graph, and key-value access patterns within a unified architecture.

Founded in 2011 and open-sourced in 2012, ArangoDB represents a deliberate attempt to solve the polyglot persistence problem through deep integration rather than bolted-on extensions. Its design choices illuminate both the possibilities and challenges of multi-model databases.

This page examines ArangoDB not as a product endorsement but as a lens through which to understand how multi-model concepts manifest in practice. The patterns we explore apply broadly to evaluating any multi-model system.

What You Will Learn

By the end of this page, you will understand ArangoDB's architecture, its AQL query language, how it handles documents and graphs, and how to evaluate its approach for your use cases. You'll see multi-model principles instantiated in a real system.

ArangoDB Architecture Overview

ArangoDB's architecture reflects its multi-model philosophy at every layer:

Core Architectural Principles:

Single Storage Engine — All data models share one underlying storage (RocksDB in modern versions)
Document-Centric Foundation — Documents serve as the universal data format; graphs are documents with special attributes
Unified Query Language — AQL (ArangoDB Query Language) handles all models natively
ACID Transactions — Cross-model operations participate in the same transaction
Native Clustering — Distributed architecture supports all models consistently

High-Level Architecture:

┌─────────────────────────────────────────────────────────────┐
│                     Client Applications                      │
├─────────────────────────────────────────────────────────────┤
│                HTTP API Layer                                │
│              (REST API, JavaScript SDK, etc.)                │
├─────────────────────────────────────────────────────────────┤
│                    AQL Query Engine                          │
│  ┌─────────────┬──────────────┬────────────────────────────┐ │
│  │  Document   │    Graph     │      Search/Analytics      │ │
│  │  Operations │  Traversals  │        Operations          │ │
│  └─────────────┴──────────────┴────────────────────────────┘ │
├─────────────────────────────────────────────────────────────┤
│                 Query Optimizer                              │
│        (Cost-based optimization across models)               │
├─────────────────────────────────────────────────────────────┤
│              Collection/Graph Management                     │
├─────────────────────────────────────────────────────────────┤
│                  RocksDB Storage Engine                      │
│       (LSM-tree; collections separated by key prefix)        │
└─────────────────────────────────────────────────────────────┘

Data Organization:

ArangoDB organizes data into collections (similar to tables or MongoDB collections) and graphs (named sets of vertex and edge collections):

Document Collections — Store JSON documents with automatic _key, _id, _rev attributes
Edge Collections — Store documents representing relationships with required _from and _to attributes
Named Graphs — Group vertex/edge collections with defined edge definitions

// Regular document in 'products' collection
{
  "_key": "laptop_123",
  "_id": "products/laptop_123",
  "_rev": "12345678",
  "name": "ProBook X1",
  "category": "electronics",
  "specs": { "cpu": "Intel i7", "ram": "16GB" }
}

// Edge document in 'purchased' edge collection
{
  "_key": "123_456",
  "_id": "purchased/123_456",
  "_from": "users/alice",
  "_to": "products/laptop_123",
  "date": "2024-01-15",
  "quantity": 1
}

The Edge Document Insight

ArangoDB's elegant insight is that graph edges ARE documents. They have all document capabilities (flexible schema, indexes, queries) plus special _from and _to attributes for graph semantics. This unification enables seamless mixing of document and graph operations.
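
To make the "edges are documents" idea concrete, here is a minimal JavaScript sketch that uses plain objects and needs no ArangoDB server. The helper names `parseId` and `isEdgeDocument` are hypothetical, invented for illustration; they are not part of any ArangoDB API.

```javascript
// Hypothetical helpers for illustration only; not an ArangoDB API.

// Split an _id such as "users/alice" into its collection and key parts.
function parseId(id) {
  const slash = id.indexOf("/");
  if (slash <= 0 || slash === id.length - 1) {
    throw new Error(`Malformed _id: ${id}`);
  }
  return { collection: id.slice(0, slash), key: id.slice(slash + 1) };
}

// An edge is an ordinary document that also carries string _from and _to.
function isEdgeDocument(doc) {
  return typeof doc._from === "string" && typeof doc._to === "string";
}

const product = { _id: "products/laptop_123", name: "ProBook X1" };
const purchase = {
  _id: "purchased/123_456",
  _from: "users/alice",
  _to: "products/laptop_123",
  quantity: 1, // edges carry arbitrary attributes, like any document
};

console.log(isEdgeDocument(product));  // false
console.log(isEdgeDocument(purchase)); // true
console.log(parseId(purchase._to));    // { collection: 'products', key: 'laptop_123' }
```

Because the edge is just an object with two extra attributes, everything that works on documents (filtering, indexing, projection) applies to it unchanged.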

AQL: The Unified Query Language

AQL (ArangoDB Query Language) exemplifies unified multi-model query design. Unlike SQL with extensions or separate query languages per model, AQL handles documents, graphs, and analytical operations with consistent syntax.

AQL Core Concepts:

1. FOR Loops — The Foundation

AQL uses FOR loops as its primary iteration construct, analogous to SQL's FROM but more flexible:

// Iterate over documents
FOR product IN products
  RETURN product

// Equivalent to: SELECT * FROM products

2. FILTER, SORT, LIMIT — Familiar Operations

FOR product IN products
  FILTER product.category == "electronics"
  FILTER product.price < 1000
  SORT product.price DESC
  LIMIT 10
  RETURN { name: product.name, price: product.price }
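
For readers who think in application code, the query above can be read as a pipeline. The following JavaScript sketch mirrors each AQL clause with an array method; the sample data is invented, and the mapping illustrates semantics only, not how the server executes the query.

```javascript
// Sample data is invented for illustration.
const products = [
  { name: "ProBook X1", category: "electronics", price: 899 },
  { name: "Desk Lamp", category: "home", price: 35 },
  { name: "Wireless Mouse", category: "electronics", price: 29.99 },
  { name: "Workstation Z", category: "electronics", price: 2400 },
];

const result = products
  .filter((p) => p.category === "electronics")     // FILTER product.category == "electronics"
  .filter((p) => p.price < 1000)                   // FILTER product.price < 1000
  .sort((a, b) => b.price - a.price)               // SORT product.price DESC
  .slice(0, 10)                                    // LIMIT 10
  .map((p) => ({ name: p.name, price: p.price })); // RETURN { name, price }

console.log(result.map((p) => p.name)); // [ 'ProBook X1', 'Wireless Mouse' ]
```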

3. Graph Traversals — Native Integration

Graph traversals use the same FOR syntax with traversal specifications:

// Find vertices reachable from 'users/alice' within 1 to 3 outbound hops
FOR vertex, edge, path
  IN 1..3                           // Depth 1 to 3
  OUTBOUND 'users/alice'            // Starting vertex
  GRAPH 'social_network'            // Named graph
  RETURN { user: vertex, path: path }

4. Cross-Model Queries — Seamless Mixing

The power emerges when combining patterns:

cross_model_aql.aql

AQL

// Find electronics products purchased by high-influence users
// Combines: document filter, graph traversal, aggregation
 
FOR product IN products
  FILTER product.category == "electronics"
  FILTER product.price > 500
  
  // For each matching product, find who purchased it
  LET purchasers = (
    FOR user, edge IN 1 INBOUND product GRAPH 'purchases'
      // Calculate each user's social influence
      LET follower_count = LENGTH(
        FOR follower IN 1..2 INBOUND user GRAPH 'social'
          RETURN 1
      )
      FILTER follower_count > 100  // Only influential users
      RETURN {
        user: user.name,
        followers: follower_count,
        purchase_date: edge.date
      }
  )
  
  FILTER LENGTH(purchasers) > 0
  
  RETURN {
    product: product.name,
    price: product.price,
    influential_buyers: purchasers
  }

AQL Operations Reference:

Key AQL Operations
Operation	Syntax	Description
Document iteration	`FOR doc IN collection`	Iterate over all documents in collection
Filter	`FILTER condition`	Filter iteration results
Sort	`SORT expr [ASC|DESC]`	Order results
Limit	`LIMIT offset, count`	Limit result count
Graph traversal	`FOR v, e, p IN min..max DIRECTION start GRAPH name`	Traverse named graph
Edge traversal	`FOR v IN 1 OUTBOUND start edges`	Traverse specific edge collection
Let binding	`LET var = expression`	Bind subquery or expression to variable
Collect/Aggregate	`COLLECT key = expr AGGREGATE agg = func`	Group and aggregate
Insert	`INSERT doc INTO collection`	Insert document
Update	`UPDATE key WITH attrs IN collection`	Update document
Remove	`REMOVE key IN collection`	Delete document

AQL Design Philosophy

AQL deliberately avoids SQL's keyword-heavy syntax for a more programmable, composable style. Subqueries, variable binding, and functional operations compose naturally—essential for complex cross-model queries.

Document Operations in Depth

As a document database, ArangoDB provides rich capabilities for JSON document storage and querying.

Document Structure and Keys:

Every document has system attributes:

_key — Unique identifier within collection (user-defined or auto-generated)
_id — Globally unique identifier: collection/_key
_rev — Revision identifier for optimistic locking

{
  "_key": "user_alice",          // User-specified
  "_id": "users/user_alice",      // Auto-derived
  "_rev": "_abc123xyz",           // System-managed
  "name": "Alice Smith",
  "email": "alice@example.com",
  "preferences": {                 // Nested documents
    "theme": "dark",
    "notifications": true
  },
  "roles": ["admin", "developer"] // Arrays
}

CRUD Operations:

document_crud.aql

AQL

// Each statement below is a separate AQL query; run them one at a time.

// INSERT - Create new document
INSERT { 
  _key: "product_001",
  name: "Wireless Mouse",
  price: 29.99,
  inventory: 150
} INTO products
RETURN NEW
 
// UPDATE - Modify existing document
UPDATE "product_001" WITH {
  price: 24.99,
  on_sale: true
} IN products
RETURN { old: OLD, new: NEW }
 
// REPLACE - Complete replacement
REPLACE "product_001" WITH {
  _key: "product_001",
  name: "Wireless Mouse Pro",
  price: 39.99,
  inventory: 200,
  features: ["ergonomic", "bluetooth"]
} IN products
 
// UPSERT - Insert or update
UPSERT { _key: "product_001" }
INSERT { _key: "product_001", name: "New Product", price: 50 }  // Keep the key, or the next UPSERT inserts again
UPDATE { last_accessed: DATE_NOW() }
IN products
 
// REMOVE - Delete document
REMOVE "product_001" IN products
RETURN OLD

Querying Nested Documents:

AQL handles nested structures naturally:

// Query nested attributes
FOR user IN users
  FILTER user.preferences.theme == "dark"
  FILTER user.address.city IN ["NYC", "LA", "Chicago"]
  RETURN user

// Query arrays
FOR user IN users
  FILTER "admin" IN user.roles
  RETURN user

// Array operations
FOR user IN users
  FILTER LENGTH(user.roles) > 2
  FILTER user.roles ANY == "developer"  // At least one element equals "developer"
  FILTER user.roles ALL != "guest"      // No element equals "guest"
  RETURN user
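
The array comparison operators can be hard to read at first. As a rough guide, this JavaScript sketch expresses the same three filters with `some` and `every`; the helper names and sample users are invented for illustration.

```javascript
// AQL: arr ANY == value  -> at least one element equals the value
const anyEquals = (arr, value) => arr.some((x) => x === value);
// AQL: arr ALL != value  -> every element differs from the value
const allNotEqual = (arr, value) => arr.every((x) => x !== value);

// Invented sample users.
const users = [
  { name: "alice", roles: ["admin", "developer", "ops"] },
  { name: "bob", roles: ["developer", "guest", "ops"] },
  { name: "carol", roles: ["developer"] },
];

const matches = users
  .filter((u) => u.roles.length > 2)               // FILTER LENGTH(user.roles) > 2
  .filter((u) => anyEquals(u.roles, "developer"))  // FILTER user.roles ANY == "developer"
  .filter((u) => allNotEqual(u.roles, "guest"));   // FILTER user.roles ALL != "guest"

console.log(matches.map((u) => u.name)); // [ 'alice' ]
```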

Indexes for Documents:

ArangoDB supports various index types for document queries:

// Create indexes via arangosh or the HTTP API (AQL has no index statements)
db.products.ensureIndex({ 
  type: "persistent",     // Sorted index stored in RocksDB; supports equality and range lookups
  fields: ["category", "price"],
  unique: false
});

// Note: the fulltext index type is deprecated in recent versions in favor of ArangoSearch
db.products.ensureIndex({
  type: "fulltext",
  fields: ["description"],
  minLength: 3
});

db.products.ensureIndex({
  type: "geo",
  fields: ["location"],
  geoJson: true
});

db.products.ensureIndex({
  type: "ttl",             // Time-to-live
  fields: ["expires_at"],
  expireAfter: 0           // Delete when expires_at is reached
});
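
The TTL rule can be stated as "a document becomes eligible for removal once its timestamp plus `expireAfter` seconds has passed". The sketch below models that rule in plain JavaScript. `isExpired` is a hypothetical helper, not an ArangoDB function, and it assumes the indexed field holds either a Unix timestamp in seconds or a date string; the server performs the actual removal in the background, so deletion is not instantaneous.

```javascript
// Hypothetical helper modeling the TTL expiry rule; not an ArangoDB API.
function isExpired(doc, field, expireAfterSeconds, nowSeconds) {
  const value = doc[field];
  // Assumption: numbers are Unix timestamps in seconds, anything else is parsed as a date string.
  const ts = typeof value === "number" ? value : Date.parse(value) / 1000;
  if (!Number.isFinite(ts)) return false; // missing or unparsable values never expire
  return ts + expireAfterSeconds <= nowSeconds;
}

const now = Date.parse("2024-01-15T00:00:00Z") / 1000;

// expireAfter: 0 means "expire exactly at the stored timestamp".
console.log(isExpired({ expires_at: "2024-01-14T00:00:00Z" }, "expires_at", 0, now)); // true
console.log(isExpired({ expires_at: now + 60 }, "expires_at", 0, now));               // false
// A creation timestamp with expireAfter: 3600 gives each document a one-hour lifetime.
console.log(isExpired({ created_at: now - 7200 }, "created_at", 3600, now));          // true
```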

Document Design Considerations

ArangoDB's document model encourages embedding related data when access patterns warrant. However, for data that will participate in graph relationships, use document references (_key, _id) rather than embedding—the graph model provides superior relationship traversal.

Graph Operations in Depth

ArangoDB's graph capabilities are first-class, not bolted on. The key insight—edges are documents—enables rich graph operations while maintaining document flexibility.

Graph Definition:

Named graphs define which collections hold vertices and edges:

// Create a named graph (arangosh)
var graph_module = require("@arangodb/general-graph");
var g = graph_module._create("social_network", 
  [
    {
      collection: "follows",     // Edge collection
      from: ["users"],           // Source vertex collections
      to: ["users"]              // Target vertex collections
    },
    {
      collection: "likes",
      from: ["users"],
      to: ["posts"]
    }
  ],
  []                             // Orphan collections: vertex collections used by no edge definition
);

Edge Documents:

Edges are documents with required _from and _to attributes:

// Edge in 'follows' collection
{
  "_key": "alice_follows_bob",
  "_from": "users/alice",
  "_to": "users/bob",
  "since": "2024-01-01",
  "strength": 0.85,
  "mutual": false
}

Graph Traversal Patterns:

graph_traversals.aql

AQL

// Basic outbound traversal: who does Alice follow?
FOR followed IN 1 OUTBOUND 'users/alice' GRAPH 'social_network'
  RETURN followed.name
 
// Inbound traversal: who follows Alice?
FOR follower IN 1 INBOUND 'users/alice' GRAPH 'social_network'
  RETURN follower.name
 
// Any direction
FOR connection IN 1 ANY 'users/alice' GRAPH 'social_network'
  RETURN connection
 
// Variable depth: friends of friends (depth 1-3)
FOR friend, edge, path IN 1..3 OUTBOUND 'users/alice' GRAPH 'social_network'
  RETURN {
    friend: friend.name,
    depth: LENGTH(path.edges),
    connection_path: path.vertices[*].name
  }
 
// Filter during traversal
FOR user, edge IN 1..2 OUTBOUND 'users/alice' GRAPH 'social_network'
  FILTER edge.strength > 0.5           // Only strong connections
  FILTER user.active == true           // Only active users
  RETURN user
 
// Shortest path
FOR v, e IN OUTBOUND SHORTEST_PATH 
  'users/alice' TO 'users/zara' 
  GRAPH 'social_network'
  RETURN { vertex: v.name, edge_type: e.type }
 
// All shortest paths (if multiple exist)
FOR path IN OUTBOUND ALL_SHORTEST_PATHS
  'users/alice' TO 'users/zara'
  GRAPH 'social_network'
  RETURN path.vertices[*].name
 
// Pattern matching: find users who follow at least one account that Alice also follows
FOR user IN users
  FILTER user._id != 'users/alice'
  LET mutual = (
    FOR m IN 1 OUTBOUND 'users/alice' GRAPH 'social_network'
      FILTER m._id != user._id
      FOR check IN 1 OUTBOUND user GRAPH 'social_network'
        FILTER check._id == m._id
        RETURN m
  )
  FILTER LENGTH(mutual) > 0
  RETURN { user: user.name, mutual_friends: mutual[*].name }
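
To show what an unweighted `OUTBOUND SHORTEST_PATH` computes, here is a small breadth-first search in JavaScript over an in-memory edge list. The data and the function name `shortestPathOutbound` are invented; this is a conceptual sketch, not a description of ArangoDB's internal algorithm.

```javascript
// Invented sample data shaped like 'follows' edge documents.
const follows = [
  { _from: "users/alice", _to: "users/bob" },
  { _from: "users/bob", _to: "users/zara" },
  { _from: "users/alice", _to: "users/carol" },
  { _from: "users/carol", _to: "users/dave" },
  { _from: "users/dave", _to: "users/zara" },
];

// Breadth-first search: the first time the target is dequeued, the path is shortest.
function shortestPathOutbound(edges, start, target) {
  const previous = new Map([[start, null]]); // vertex -> predecessor on the path
  const queue = [start];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === target) {
      const path = [];
      for (let v = target; v !== null; v = previous.get(v)) path.unshift(v);
      return path;
    }
    for (const e of edges) {
      if (e._from === current && !previous.has(e._to)) {
        previous.set(e._to, current);
        queue.push(e._to);
      }
    }
  }
  return null; // target not reachable along outbound edges
}

console.log(shortestPathOutbound(follows, "users/alice", "users/zara"));
// [ 'users/alice', 'users/bob', 'users/zara' ]
```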

Graph-Specific Optimizations:

Edge Indexes: ArangoDB automatically creates an edge index on _from and _to for every edge collection, so looking up a vertex's neighbors is a direct index lookup rather than a collection scan:

Edge Index:
  _from: users/alice -> [edge_001, edge_002, edge_003]
  _to: users/bob -> [edge_001, edge_100]
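
The lookup structure shown above can be modeled with two maps. This JavaScript sketch is a conceptual stand-in (ArangoDB keeps its edge index inside the storage engine); `buildEdgeIndex`, `outbound`, and `inbound` are hypothetical names chosen for illustration.

```javascript
// Conceptual model of an edge index: one map keyed by _from, one keyed by _to.
function buildEdgeIndex(edges) {
  const byFrom = new Map();
  const byTo = new Map();
  for (const edge of edges) {
    if (!byFrom.has(edge._from)) byFrom.set(edge._from, []);
    byFrom.get(edge._from).push(edge);
    if (!byTo.has(edge._to)) byTo.set(edge._to, []);
    byTo.get(edge._to).push(edge);
  }
  return { byFrom, byTo };
}

// Neighbor lookups touch only the edges of one vertex, never the whole collection.
const outbound = (index, vertex) => (index.byFrom.get(vertex) || []).map((e) => e._to);
const inbound = (index, vertex) => (index.byTo.get(vertex) || []).map((e) => e._from);

const edgeIndex = buildEdgeIndex([
  { _key: "edge_001", _from: "users/alice", _to: "users/bob" },
  { _key: "edge_002", _from: "users/alice", _to: "users/carol" },
  { _key: "edge_100", _from: "users/dave", _to: "users/bob" },
]);

console.log(outbound(edgeIndex, "users/alice")); // [ 'users/bob', 'users/carol' ]
console.log(inbound(edgeIndex, "users/bob"));    // [ 'users/alice', 'users/dave' ]
```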

Vertex-Centric Indexes: For filtering edge properties during traversal:

// Create vertex-centric index: _from (or _to) first, then the edge attribute
db.follows.ensureIndex({
  type: "persistent",
  fields: ["_from", "strength"],
  inBackground: true
});

// An outbound traversal can now use the index for the edge filter:
// FOR user, edge IN 1 OUTBOUND 'users/alice' GRAPH 'social_network'
//   FILTER edge.strength > 0.8
//   RETURN user

Traversal Options:

// Control traversal behavior
FOR vertex, edge, path IN 1..5 OUTBOUND 'users/alice' 
  GRAPH 'social_network'
  OPTIONS {
    order: 'bfs',           // Breadth-first (default: depth-first); replaces the deprecated bfs: true
    uniqueVertices: 'path', // Don't revisit vertices in same path
    uniqueEdges: 'path'     // Don't reuse edges in same path
  }
  RETURN vertex
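
The interaction of depth range, breadth-first order, and path-level vertex uniqueness is easier to see in code. The following JavaScript sketch is a simplified model over an in-memory edge list; `traverseOutbound` and the sample edges are invented, and the real traversal engine is considerably more involved.

```javascript
// Simplified model of: FOR v, e, p IN min..max OUTBOUND start ...
// with breadth-first order and uniqueVertices: 'path'.
function traverseOutbound(edges, start, minDepth, maxDepth) {
  const results = [];
  const queue = [[start]]; // each entry is the vertex path walked so far
  while (queue.length > 0) {
    const path = queue.shift(); // FIFO queue gives breadth-first order
    const depth = path.length - 1;
    const current = path[path.length - 1];
    if (depth >= minDepth) results.push({ vertex: current, depth, path });
    if (depth === maxDepth) continue; // do not expand past the maximum depth
    for (const e of edges) {
      // uniqueVertices 'path': skip vertices already on this path (breaks cycles)
      if (e._from === current && !path.includes(e._to)) {
        queue.push([...path, e._to]);
      }
    }
  }
  return results;
}

// Invented edges, including a cycle back to alice.
const sampleEdges = [
  { _from: "users/alice", _to: "users/bob" },
  { _from: "users/bob", _to: "users/carol" },
  { _from: "users/carol", _to: "users/alice" },
  { _from: "users/alice", _to: "users/carol" },
];

const visited = traverseOutbound(sampleEdges, "users/alice", 1, 2);
console.log(visited.map((r) => `${r.vertex}@${r.depth}`));
// [ 'users/bob@1', 'users/carol@1', 'users/carol@2' ]
```

Note that carol appears twice: path-level uniqueness permits reaching the same vertex along different paths, whereas a global uniqueness setting would emit it only once.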

Graph vs. Edge Collections

You can traverse without named graphs by specifying edge collections directly: FOR v IN 1 OUTBOUND start follows, likes. Named graphs provide schema enforcement (valid from/to collections) and semantic grouping but aren't required for traversal.

Transactions and Consistency

ArangoDB provides ACID transactions that span collections and models—a key advantage over polyglot persistence.

Transaction Model:

ArangoDB uses single-collection transactions for simple operations and multi-collection transactions for complex cross-model operations.

Single-Collection Transaction (Implicit):

Simple operations are automatically transactional:

// This is automatically atomic
INSERT { name: "Product", price: 100 } INTO products

Multi-Collection Transaction:

transactions.js
JavaScript
// JavaScript transaction (multi-collection)
const db = require('@arangodb').db;
 
// Execute a transaction
db._executeTransaction({
  // Collections involved
  collections: {
    write: ['orders', 'purchased', 'inventory'],
    read: ['products', 'users']
  },
  
  // Transaction function
  action: function(params) {
    const db = require('@arangodb').db;
    
    // Get input
    const userId = params.userId;
    const productId = params.productId;
    const quantity = params.quantity;
    
    // Read product (consistent within transaction)
    const product = db.products.document(productId);
    
    // Check inventory
    const inv = db.inventory.firstExample({ productId: productId });
    if (!inv || inv.available < quantity) {  // firstExample returns null when nothing matches
      throw new Error('Insufficient inventory');
    }
    
    // Create order (document operation)
    const order = db.orders.insert({
      userId: userId,
      productId: productId,
      quantity: quantity,
      total: product.price * quantity,
      status: 'confirmed',
      createdAt: Date.now()
    });
    
    // Create edge (graph operation)
    db.purchased.insert({
      _from: 'users/' + userId,
      _to: 'products/' + productId,
      orderId: order._key,
      date: Date.now()
    });
    
    // Update inventory (document operation)
    db.inventory.update(inv._key, {
      available: inv.available - quantity
    });
    
    return order;
  },
  
  // Parameters passed to action
  params: {
    userId: 'alice',
    productId: 'laptop_123',
    quantity: 1
  }
});
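
The guarantee this transaction provides is all-or-nothing: either the order, the edge, and the inventory update all become visible, or none do. The JavaScript sketch below models that behavior in memory so it can run without a server. `runAtomically` and `placeOrder` are hypothetical names, and copying the whole state is only a teaching device, not how a database implements atomicity.

```javascript
// Hypothetical in-memory model of all-or-nothing commit; not ArangoDB code.
function runAtomically(state, action) {
  const draft = JSON.parse(JSON.stringify(state)); // private working copy
  const result = action(draft);                    // a throw here discards the draft
  Object.assign(state, draft);                     // "commit": publish every change together
  return result;
}

function placeOrder(userId, productId, quantity) {
  return (draft) => {
    const order = { _key: String(draft.orders.length + 1), userId, productId, quantity };
    draft.orders.push(order); // document write
    draft.purchased.push({ _from: `users/${userId}`, _to: `products/${productId}` }); // edge write
    // The check comes last on purpose, so a failure happens after partial writes.
    const inv = draft.inventory[productId];
    if (!inv || inv.available < quantity) throw new Error("Insufficient inventory");
    inv.available -= quantity; // document write
    return order;
  };
}

const state = { orders: [], purchased: [], inventory: { laptop_123: { available: 1 } } };
runAtomically(state, placeOrder("alice", "laptop_123", 1)); // succeeds
try {
  runAtomically(state, placeOrder("bob", "laptop_123", 1)); // fails: stock is now 0
} catch (err) {
  console.log(err.message); // Insufficient inventory
}
console.log(state.orders.length, state.purchased.length); // 1 1
```

Bob's order and edge were written to the draft before the inventory check failed, yet neither appears in the final state.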

AQL-Based Transactions:

For simpler multi-collection operations, AQL provides implicit transaction scope:

// On a single server, all operations in one AQL query commit or fail together
LET order = FIRST(
  INSERT {
    userId: "alice",
    productId: "laptop_123",
    total: 999.99
  } INTO orders
  RETURN NEW
)

LET edge = FIRST(
  INSERT {
    _from: "users/alice",
    _to: "products/laptop_123",
    orderId: order._key
  } INTO purchased
  RETURN NEW
)

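// Read the current document first; "products.inventory" is not a valid document reference
LET product = DOCUMENT("products/laptop_123")

UPDATE product WITH {
  inventory: product.inventory - 1
} IN products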

RETURN { order, edge }

Isolation Levels:

ArangoDB supports snapshot isolation through MVCC:

Read operations see consistent snapshot at transaction start
Concurrent writes to the same document are detected as write-write conflicts, and the conflicting operation fails with an error
Optimistic locking via _rev for conflict detection

// Optimistic locking example
FOR product IN products
  FILTER product._key == "laptop_123"
  UPDATE product WITH {
    inventory: product.inventory - 1
  } IN products
  OPTIONS { ignoreRevs: false }  // Enable revision checking
  RETURN { old: OLD, new: NEW }
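
The revision check amounts to a compare-and-set on `_rev`. This JavaScript sketch models it with an in-memory `Map`; `updateIfUnchanged` and `RevisionConflict` are hypothetical names, and the numeric revision counter is an assumption made for readability (real `_rev` values are opaque strings).

```javascript
class RevisionConflict extends Error {}

// Apply the patch only if the stored revision still matches the one the caller read.
function updateIfUnchanged(store, key, expectedRev, patch) {
  const current = store.get(key);
  if (!current) throw new Error(`Document not found: ${key}`);
  if (current._rev !== expectedRev) {
    throw new RevisionConflict(`Expected ${expectedRev}, found ${current._rev}`);
  }
  // Assumption: revisions are counters here; real _rev values are opaque.
  const updated = { ...current, ...patch, _rev: String(Number(current._rev) + 1) };
  store.set(key, updated);
  return updated;
}

const store = new Map([["laptop_123", { _key: "laptop_123", _rev: "1", inventory: 5 }]]);
const seenByA = store.get("laptop_123"); // both clients read revision "1"
const seenByB = store.get("laptop_123");

updateIfUnchanged(store, "laptop_123", seenByA._rev, { inventory: seenByA.inventory - 1 });

let conflict = false;
try {
  // Client B still holds revision "1", so its stale write is rejected.
  updateIfUnchanged(store, "laptop_123", seenByB._rev, { inventory: seenByB.inventory - 1 });
} catch (err) {
  conflict = err instanceof RevisionConflict;
}
console.log(conflict, store.get("laptop_123").inventory); // true 4
```

Without the check, both clients would write `inventory: 4` and one decrement would be silently lost.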

Distributed Transactions:

In cluster deployments, ArangoDB coordinates transactions across shards:

Transactions touching single shard: local commit
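Transactions touching multiple shards: coordinated across DB-Servers, but without the full ACID guarantees of a single server; atomic commit across shards is only available when all data of a database lives on one server (OneShard configuration)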
Configurable timeout for distributed lock acquisition

Transaction Scope Considerations

Larger transaction scopes (more collections, more operations) increase lock contention and coordination overhead. Design for minimal transaction scope when possible. For high-throughput scenarios, consider event sourcing or saga patterns for cross-aggregate consistency.

Practical Usage Patterns

Real applications combine ArangoDB's models in recurring patterns. Let's examine several common approaches:

Pattern 1: E-Commerce with Social Features

Collections:
├── products (document)     - Product catalog
├── users (document)        - User profiles
├── orders (document)       - Order records
├── reviews (document)      - Product reviews
├── purchased (edge)        - User → Product
├── viewed (edge)           - User → Product
├── similar_to (edge)       - Product → Product
└── follows (edge)          - User → User

Graphs:
├── purchase_graph: users ─[purchased]→ products
├── social_graph: users ─[follows]→ users
└── product_graph: products ─[similar_to]→ products

Query Example: Personalized Recommendations

recommendations.aql

AQL

// Find recommendations for Alice based on:
// 1. Products similar to what she purchased
// 2. Products purchased by people she follows who have similar taste
 
LET alice_purchases = (
  FOR product IN 1 OUTBOUND 'users/alice' GRAPH 'purchase_graph'
    RETURN product._id
)
 
// Similar to purchased
LET similar_products = (
  FOR purchased_id IN alice_purchases
    FOR similar IN 1 OUTBOUND purchased_id GRAPH 'product_graph'
      FILTER similar._id NOT IN alice_purchases
      COLLECT product = similar WITH COUNT INTO score
      RETURN { product, score }
)
 
// From followed users with similar taste
LET social_recommendations = (
  FOR followed IN 1 OUTBOUND 'users/alice' GRAPH 'social_graph'
    // Find followed users who bought same things
    LET common_purchases = LENGTH(
      FOR their_purchase IN 1 OUTBOUND followed GRAPH 'purchase_graph'
        FILTER their_purchase._id IN alice_purchases
        RETURN 1
    )
    FILTER common_purchases > 2  // Similar taste threshold
    
    // Get their other purchases
    FOR their_product IN 1 OUTBOUND followed GRAPH 'purchase_graph'
      FILTER their_product._id NOT IN alice_purchases
      COLLECT product = their_product 
      AGGREGATE trust_score = SUM(common_purchases)
      RETURN { product, trust_score }
)
 
// Combine and rank
FOR rec IN UNION(
  (FOR r IN similar_products RETURN { p: r.product, s: r.score * 2 }),
  (FOR r IN social_recommendations RETURN { p: r.product, s: r.trust_score })
)
  COLLECT product = rec.p AGGREGATE total_score = SUM(rec.s)
  SORT total_score DESC
  LIMIT 10
  RETURN {
    product: product.name,
    category: product.category,
    price: product.price,
    score: total_score
  }
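
The final "combine and rank" step is a weighted merge of two score lists. The JavaScript sketch below reproduces that step in isolation with invented input data; `combineAndRank` is a hypothetical name, and the weight of 2 for similarity scores is taken from the query above.

```javascript
// Merge two recommendation lists, sum scores per product, and keep the top entries.
function combineAndRank(similar, social, limit = 10) {
  const totals = new Map();
  const add = (product, score) => totals.set(product, (totals.get(product) || 0) + score);
  for (const r of similar) add(r.product, r.score * 2); // similarity counts double
  for (const r of social) add(r.product, r.trustScore);
  return [...totals.entries()]
    .map(([product, score]) => ({ product, score }))
    .sort((a, b) => b.score - a.score) // SORT total_score DESC
    .slice(0, limit);                  // LIMIT 10
}

// Invented sample inputs standing in for the two subquery results.
const similarProducts = [
  { product: "products/dock", score: 2 },
  { product: "products/mouse", score: 1 },
];
const socialRecommendations = [
  { product: "products/mouse", trustScore: 3 },
  { product: "products/bag", trustScore: 1 },
];

console.log(combineAndRank(similarProducts, socialRecommendations));
// mouse scores 1*2 + 3 = 5, dock scores 2*2 = 4, bag scores 1
```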

Pattern 2: Identity and Access Management

Collections:
├── users (document)        - User accounts
├── groups (document)       - Security groups
├── roles (document)        - Role definitions  
├── resources (document)    - Protected resources
├── member_of (edge)        - User → Group
├── has_role (edge)         - User/Group → Role
└── can_access (edge)       - Role → Resource

Graph:
└── iam_graph: Connects all IAM relationships

Query: Check Access Permissions

// Can user access resource? Check all paths
LET user_id = 'users/alice'
LET resource_id = 'resources/sensitive_doc'

// Direct role assignment
LET direct_access = FIRST(
  FOR role IN 1 OUTBOUND user_id GRAPH 'iam_graph'
    OPTIONS { edgeCollections: ['has_role'] }
    FOR resource IN 1 OUTBOUND role GRAPH 'iam_graph'
      OPTIONS { edgeCollections: ['can_access'] }
      FILTER resource._id == resource_id
      RETURN true
)

// Via group membership
LET group_access = FIRST(
  FOR group IN 1..3 OUTBOUND user_id GRAPH 'iam_graph'
    OPTIONS { edgeCollections: ['member_of'] }
    FOR role IN 1 OUTBOUND group GRAPH 'iam_graph'
      OPTIONS { edgeCollections: ['has_role'] }
      FOR resource IN 1 OUTBOUND role GRAPH 'iam_graph'
        OPTIONS { edgeCollections: ['can_access'] }
        FILTER resource._id == resource_id
        RETURN true
)

RETURN direct_access == true OR group_access == true
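
The access check follows two path shapes: user to role to resource, and user to group to role to resource. This JavaScript sketch walks the same paths over in-memory edge lists. All data and names are invented for illustration, and unlike the AQL above it only follows direct group membership, not nested groups.

```javascript
// Invented edge lists mirroring the member_of, has_role, and can_access collections.
const memberOf = [{ _from: "users/alice", _to: "groups/engineering" }];
const hasRole = [
  { _from: "groups/engineering", _to: "roles/editor" },
  { _from: "users/bob", _to: "roles/viewer" },
];
const canAccess = [
  { _from: "roles/editor", _to: "resources/sensitive_doc" },
  { _from: "roles/viewer", _to: "resources/public_doc" },
];

// One outbound hop: all targets of edges that start at the given vertex.
const targets = (edges, from) => edges.filter((e) => e._from === from).map((e) => e._to);

function canUserAccess(userId, resourceId) {
  // Principals that may hold roles: the user and the groups the user belongs to directly.
  const principals = [userId, ...targets(memberOf, userId)];
  return principals.some((principal) =>
    targets(hasRole, principal).some((role) =>
      targets(canAccess, role).includes(resourceId)
    )
  );
}

console.log(canUserAccess("users/alice", "resources/sensitive_doc")); // true (via group)
console.log(canUserAccess("users/bob", "resources/sensitive_doc"));   // false
console.log(canUserAccess("users/bob", "resources/public_doc"));      // true (direct role)
```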

Pattern 3: Content Management with Versioning

Store content as documents, relationships as edges, version history within documents:

// Content document with embedded version history
{
  "_key": "article_001",
  "title": "Multi-Model Databases",
  "current_version": 3,
  "content": "...",
  "versions": [
    { "v": 1, "content": "...", "date": "..." },
    { "v": 2, "content": "...", "date": "..." }
  ],
  "author": "users/alice"
}

// Relationship edges
{ "_from": "articles/article_001", "_to": "tags/database" }
{ "_from": "articles/article_001", "_to": "categories/tech" }

Design Principle

Use documents for self-contained entities, edges for relationships that benefit from traversal, and embedded data for version history or tightly-coupled sub-entities. The right choice depends on access patterns—ask 'how will this data be queried?'

Summary: ArangoDB as Multi-Model Exemplar

ArangoDB illustrates how multi-model concepts manifest in a production system. Let's consolidate the key learnings:

Key Takeaways

•Native multi-model by design — ArangoDB was built from inception with multi-model in mind, not retrofitted
•Documents as universal foundation — All data (including graph edges) are JSON documents with flexible schemas
•AQL unifies query patterns — Single language handles document queries, graph traversals, and analytics
•ACID transactions span models — Cross-collection, cross-model atomicity is a first-class feature
•Graph edges are documents — Enables rich edge properties, indexing, and querying while maintaining graph semantics
•Real-world patterns emerge — E-commerce, IAM, content management all benefit from mixed model access
•Cluster-aware by design — Multi-model semantics preserved in distributed deployments

What's Next:

Having examined ArangoDB as a concrete implementation, the next page explores the flexibility benefits of multi-model databases—the specific advantages organizations gain from consolidating on a unified multi-model system.

Page Complete

You now understand how a native multi-model database like ArangoDB implements multi-model concepts. This practical knowledge enables you to evaluate multi-model databases against your specific requirements.