Multi Model Databases - Learning Module

Multiple Data Models

The Data Model Dilemma

Every database system is built around a fundamental abstraction: the data model. This model determines how data is organized, stored, queried, and related. For decades, the relational model dominated—tables, rows, columns, and SQL. But as applications grew more diverse and data became more varied, a single model proved insufficient.

Consider a modern e-commerce platform. It needs:

Product catalogs with varying attributes (a shirt has size/color; a laptop has CPU/RAM/storage)
User sessions requiring fast key-value lookups
Recommendation engines traversing relationships between users and products
Transaction records requiring ACID guarantees and structured queries

Each requirement maps naturally to a different data model. Forcing all of them into a single model—whether relational, document, or graph—creates impedance mismatch, complexity, and performance problems.

What You Will Learn

By the end of this page, you will understand why multiple data models exist, how they differ fundamentally, and why the convergence of models within a single database system represents a significant evolution in database architecture. You'll gain the conceptual foundation for understanding multi-model databases.

The Evolution of Data Models

Understanding multiple data models requires understanding how we arrived at this diversity. Database technology has evolved through distinct paradigms, each addressing limitations of its predecessors.

The Historical Arc:

1960s-1970s: Hierarchical and Network Models

The earliest database systems used hierarchical (tree-structured) and network (graph-structured) models. IBM's IMS and CODASYL databases organized data through parent-child relationships and explicit links. These systems were powerful but rigid—changing data structures required application rewrites.

1970s-2000s: The Relational Revolution

Edgar Codd's relational model brought mathematical rigor and data independence. Data is organized into tables, with relationships expressed through keys rather than physical pointers. SQL provided declarative querying. The model's simplicity and theoretical foundation made it dominant for three decades.

2000s-2010s: NoSQL Diversification

Web-scale applications exposed relational limitations. Different workloads demanded different models:

Key-Value stores for session caching (Redis, Memcached)
Document databases for flexible schemas (MongoDB, CouchDB)
Column-family stores for time-series and analytics (Cassandra, HBase)
Graph databases for relationship-intensive data (Neo4j, JanusGraph)

2010s-Present: Multi-Model Convergence

Today, we see convergence. Rather than maintaining separate databases for each model, multi-model databases support multiple paradigms within a unified system.

The Pendulum of Database History

Database history follows a pattern: unification → specialization → re-unification. The relational model unified early diverse systems. NoSQL specialized for different workloads. Multi-model databases now seek to unify again, but with the accumulated wisdom of both eras.

Core Data Model Types

Before examining how multiple models coexist, we must deeply understand each model's characteristics, strengths, and natural use cases. Each model embodies fundamentally different assumptions about data organization.

Fundamental Data Model Comparison
Data Model	Primary Structure	Query Pattern	Optimal For	Trade-offs
Relational	Tables with rows/columns	Declarative SQL with joins	Structured data, complex queries, ACID transactions	Schema rigidity, join overhead at scale
Document	Nested JSON/BSON documents	Document traversal, embedded queries	Variable schemas, self-contained records	Denormalization, complex cross-document queries
Key-Value	Simple key → value pairs	Direct key lookups, range scans	Caching, sessions, real-time lookups	Limited query capability, no relationships
Graph	Nodes and edges with properties	Traversal patterns, path queries	Relationship-intensive data, network analysis	Less efficient for bulk analytics, storage overhead
Column-Family	Sparse columns organized by row key	Column-range scans, time-series queries	Time-series, write-heavy analytics workloads	Complex modeling, limited secondary indexes

Deep Dive: Why These Differences Matter

These models aren't arbitrary design choices—they reflect fundamental trade-offs in computer science:

Relational Model: The Power of Abstraction

The relational model separates logical data organization from physical storage. You define what data means (schema) while the database decides how to store and access it (execution plans). This abstraction enables:

Query optimization independent of application logic
Schema changes without rewriting applications
Complex analytical queries across diverse data

But this abstraction has costs. Joins require runtime computation. Normalization spreads data across tables. Schema changes require migrations.
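The separation between what is asked and how it is executed can be sketched with Python's built-in sqlite3 module. The customers and orders tables below are hypothetical and exist only for illustration. The query names the result it wants, and the engine decides how to perform the join at run time.

```python
import sqlite3

# Hypothetical two-table schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1, 599.0), (11, 1, 25.0), (12, 2, 40.0);
""")

# Declarative query: it states the desired result, not the access path.
# Normalization put names and totals in different tables, so a join
# must be computed when the query runs.
totals = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

# EXPLAIN QUERY PLAN shows the execution strategy the engine chose.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT c.name, o.total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
""").fetchall()

print(totals)
for row in plan:
    print(row)
```

The plan rows differ between SQLite versions, which is the point: the application code stays the same while the engine is free to change how it answers.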

Document Model: The Power of Locality

Documents store related data together, optimizing for read patterns that access entire records. A user profile containing addresses, preferences, and history lives in one place, so fetching it typically costs one read and one network round-trip. This locality provides:

Excellent read performance for common access patterns
Natural mapping to application objects
Schema flexibility per document

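But locality trades away normalization. Data duplication is common, and every duplicated copy must be updated when the underlying value changes. Cross-document queries require either application-level joins or a database-side join operator such as MongoDB's $lookup aggregation stage.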
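The cost of duplication can be illustrated with a small in-memory sketch. The order documents and the rename_customer helper are hypothetical. Each order embeds a copy of the customer's name, which makes reads cheap but means a rename must find and rewrite every copy.

```python
# Hypothetical documents: each order embeds a copy of the customer's name.
orders = [
    {"_key": "o1", "customer": {"id": "u1", "name": "Alice"}, "total": 599},
    {"_key": "o2", "customer": {"id": "u1", "name": "Alice"}, "total": 25},
    {"_key": "o3", "customer": {"id": "u2", "name": "Bob"}, "total": 40},
]


def read_order(key):
    """Reading needs no join: the customer name is already embedded."""
    return next(o for o in orders if o["_key"] == key)


def rename_customer(customer_id, new_name):
    """Writing must touch every duplicated copy; returns how many were updated."""
    updated = 0
    for order in orders:
        if order["customer"]["id"] == customer_id:
            order["customer"]["name"] = new_name
            updated += 1
    return updated


copies_updated = rename_customer("u1", "Alice Smith")
print(copies_updated, read_order("o2")["customer"]["name"])
```

In a normalized design the name would live in one row and the rename would be a single update, at the price of a join on every read.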

Graph Model: The Power of Relationships

Graphs make relationships first-class citizens. Rather than computing relationships at query time (joins), graphs store relationships explicitly as edges. This inverts the relational assumption:

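Traversing a relationship follows a stored edge, so in engines that keep adjacency with each node the cost of a hop depends on that node's edges rather than on the total size of the data set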
Paths of arbitrary depth are natural queries
Pattern matching across the graph is efficient

But graphs pay for relationship richness with storage overhead (edges consume space) and complexity for simple row-based analytics.
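The difference between computing relationships at query time and storing them explicitly can be sketched in a few lines. The edge data and function names below are hypothetical. The first function mimics a join over an unindexed edge table, and the second mimics adjacency stored alongside each node.

```python
from collections import defaultdict

# Hypothetical edges, as rows of a relational (source, target) table.
edge_rows = [("alice", "bob"), ("bob", "carol"), ("alice", "dave"), ("carol", "erin")]


def neighbors_by_scan(node):
    """Join-style lookup: examine every edge row to find the matches."""
    return [dst for src, dst in edge_rows if src == node]


# Graph-style storage: each node keeps direct references to its outgoing edges.
adjacency = defaultdict(list)
for src, dst in edge_rows:
    adjacency[src].append(dst)


def neighbors_by_adjacency(node):
    """Adjacency lookup: cost depends on the node's degree, not the total edge count."""
    return adjacency.get(node, [])


print(neighbors_by_scan("alice"), neighbors_by_adjacency("alice"))
```

The adjacency structure is the storage overhead mentioned above: the same edges are held in a form organized for traversal rather than for bulk row scans. A relational index on the source column narrows the gap, but each hop is still an index probe rather than a pointer follow.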

Model Selection is Architecture

Choosing a data model isn't a tactical decision—it's an architectural one. Your model shapes query patterns, performance characteristics, and development paradigms. Understanding each model deeply is prerequisite to understanding why multi-model databases matter.

The Impedance Mismatch Problem

When applications require multiple data models but databases support only one, an impedance mismatch occurs. This mismatch manifests in multiple costly ways:

Semantic Mismatch

Consider modeling a social network in a relational database. The natural representation is a graph—users as nodes, friendships as edges. But relational databases force this into tables:

CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100));
CREATE TABLE friendships (user_a INT, user_b INT, since DATE);

Querying 'friends of friends' requires self-joins:

SELECT DISTINCT u3.name
FROM friendships f1
JOIN friendships f2 ON f1.user_b = f2.user_a
JOIN users u3 ON f2.user_b = u3.id
WHERE f1.user_a = 123;

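Each additional layer of relationship requires another self-join: a path of depth N needs N copies of the friendships table in the query, or a recursive common table expression when the depth is not known in advance. The query above also treats each row as a one-directional edge, so mutual friendships need either a check of the reverse direction or both (a, b) and (b, a) rows. What is natural in graph terms becomes awkward and often slow in relational terms.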
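To make the depth problem concrete, the sketch below loads a simplified copy of the two tables into an in-memory SQLite database. The sample rows and the reachable_within helper are assumptions added for illustration. It runs the two-hop query from the text, then shows a recursive common table expression as one way to handle a variable depth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100));
    CREATE TABLE friendships (user_a INT, user_b INT, since DATE);
    INSERT INTO users VALUES (123, 'alice'), (2, 'bob'), (3, 'carol'), (4, 'dave');
    -- Hypothetical sample data, one-directional for simplicity: 123 -> 2 -> 3 -> 4
    INSERT INTO friendships VALUES
        (123, 2, '2020-01-01'), (2, 3, '2021-01-01'), (3, 4, '2022-01-01');
""")

# The fixed two-hop query from the text.
two_hops = conn.execute("""
    SELECT DISTINCT u3.name
    FROM friendships f1
    JOIN friendships f2 ON f1.user_b = f2.user_a
    JOIN users u3 ON f2.user_b = u3.id
    WHERE f1.user_a = 123
""").fetchall()


def reachable_within(conn, start_id, max_depth):
    """Names reachable from start_id in 1..max_depth hops, via a recursive CTE."""
    rows = conn.execute("""
        WITH RECURSIVE reach(id, depth) AS (
            SELECT user_b, 1 FROM friendships WHERE user_a = ?
            UNION
            SELECT f.user_b, r.depth + 1
            FROM reach r JOIN friendships f ON f.user_a = r.id
            WHERE r.depth < ?
        )
        SELECT DISTINCT u.name
        FROM reach JOIN users u ON u.id = reach.id
        ORDER BY u.name
    """, (start_id, max_depth)).fetchall()
    return [name for (name,) in rows]


print(two_hops)
print(reachable_within(conn, 123, 3))
```

The recursive form avoids writing one join per level, but the engine still evaluates a join at each step of the recursion.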

Structural Mismatch

Document-oriented data forced into relations creates explosion:

// Natural document representation
{
  "product": "Laptop",
  "specs": {
    "cpu": "Intel i7",
    "ram": "16GB",
    "storage": { "type": "SSD", "size": "512GB" }
  },
  "reviews": [
    { "user": "alice", "rating": 5, "text": "..." },
    { "user": "bob", "rating": 4, "text": "..." }
  ]
}

Relational representation requires multiple tables (products, specs, storage_details, reviews) with foreign keys and joins to reconstruct what was naturally a single document.
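The explosion can be sketched by splitting the document into rows and then rebuilding it. The column layouts and the shred and reassemble helpers below are hypothetical, and the elided review text is left out of the sample.

```python
product_doc = {
    "product": "Laptop",
    "specs": {
        "cpu": "Intel i7",
        "ram": "16GB",
        "storage": {"type": "SSD", "size": "512GB"},
    },
    "reviews": [
        {"user": "alice", "rating": 5},
        {"user": "bob", "rating": 4},
    ],
}


def shred(doc, product_id):
    """Split one nested document into rows for four hypothetical tables."""
    specs = doc["specs"]
    storage = specs["storage"]
    return {
        "products": [(product_id, doc["product"])],
        "specs": [(product_id, specs["cpu"], specs["ram"])],
        "storage_details": [(product_id, storage["type"], storage["size"])],
        "reviews": [(product_id, r["user"], r["rating"]) for r in doc["reviews"]],
    }


def reassemble(tables, product_id):
    """Rebuild the document; each table access stands in for a join."""
    name = next(n for pid, n in tables["products"] if pid == product_id)
    cpu, ram = next((c, r) for pid, c, r in tables["specs"] if pid == product_id)
    stype, size = next(
        (t, s) for pid, t, s in tables["storage_details"] if pid == product_id
    )
    reviews = [
        {"user": u, "rating": rt}
        for pid, u, rt in tables["reviews"]
        if pid == product_id
    ]
    return {
        "product": name,
        "specs": {"cpu": cpu, "ram": ram, "storage": {"type": stype, "size": size}},
        "reviews": reviews,
    }


tables = shred(product_doc, 1)
row_count = sum(len(rows) for rows in tables.values())
print(row_count, reassemble(tables, 1) == product_doc)
```

One document becomes five rows across four tables, and reading it back touches all four.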

Performance Mismatch

Key-value access patterns in relational databases carry overhead they do not need. Looking up a session by session_id is a single indexed read, yet a relational database typically still routes it through SQL parsing, planning, and execution, and it falls back to a table scan if session_id is not indexed. Prepared statements and plan caches reduce this cost but do not remove it.

The Traditional Solution: Polyglot Persistence

Developers addressed impedance mismatch through polyglot persistence—using multiple specialized databases:

PostgreSQL for transactional data
MongoDB for product catalogs
Redis for session caching
Neo4j for recommendations

This solves model mismatch but creates new problems.

Polyglot Persistence Challenges

•Operational Complexity — Each database requires separate monitoring, backup, scaling, and operations expertise. DevOps burden multiplies.
•Data Consistency — No transactions span databases. Keeping Redis cache consistent with PostgreSQL requires application-level coordination.
•Query Complexity — Queries needing data from multiple databases require application-side joins, losing database optimization.
•Development Overhead — Developers must learn multiple query languages, APIs, and paradigms. Context-switching costs are real.
•Infrastructure Cost — Multiple database clusters consume more resources than a single unified system.
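The data consistency challenge can be illustrated with a dual-write sketch. Both stores are in-memory stand-ins, and FlakyCache and update_price are hypothetical names. Because the two writes do not share a transaction, a failure between them leaves the systems disagreeing.

```python
class FlakyCache:
    """Stand-in for a separate cache system that can fail independently."""

    def __init__(self):
        self.data = {}
        self.fail_next_write = False

    def set(self, key, value):
        if self.fail_next_write:
            self.fail_next_write = False
            raise ConnectionError("cache unavailable")
        self.data[key] = value


primary = {}          # stand-in for the system of record
cache = FlakyCache()  # stand-in for the cache


def update_price(product_id, price):
    """Dual write with no shared transaction: step 2 can fail after step 1 commits."""
    primary[product_id] = price   # step 1: write to the primary store
    cache.set(product_id, price)  # step 2: separate system, separate failure mode


update_price("laptop", 999)

cache.fail_next_write = True
try:
    update_price("laptop", 899)
except ConnectionError:
    pass  # the application must now detect and repair the divergence itself

stale = cache.data["laptop"] != primary["laptop"]
print(primary["laptop"], cache.data["laptop"], stale)
```

Retries, cache invalidation, or change-data-capture pipelines can narrow this window, but each is coordination logic the application team has to build and operate.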

The Multi-Model Vision

Multi-model databases propose a radical alternative: support multiple data models within a single database system. Rather than choosing between relational, document, or graph, applications get all of them under one roof.

The Core Proposition:

Single Database Engine — One system to deploy, monitor, and scale
Multiple Data Models — Document, graph, key-value, and potentially relational access
Unified Query Capability — One query language (or integrated languages) across models
Cross-Model Transactions — ACID guarantees spanning different model types
Integrated Storage — Shared storage layer optimized for multiple access patterns

How Multi-Model Differs from Polyglot Persistence:

Polyglot Persistence

•Multiple database systems
•Separate operational burdens
•No cross-database transactions
•Application-side data integration
•Multiple query languages/APIs
•Consistency is application's problem
•Higher infrastructure costs

Multi-Model Database

•Single database system
•Unified operations
•Cross-model transactions
•Database-level integration
•Unified or integrated query language
•Database ensures consistency
•Consolidated infrastructure

Architectural Approaches to Multi-Model:

Multi-model databases take different architectural approaches to supporting multiple models:

Native Multi-Model (Purpose-Built)

Some databases are designed from the ground up to support multiple models. ArangoDB, OrientDB, and Azure Cosmos DB fall into this category. Their storage engines, query processors, and APIs are designed with multi-model as a first principle.

Extended Single-Model

Other databases started with one model and extended to support others:

PostgreSQL added JSON/JSONB for document capabilities
Neo4j indexes properties on nodes and relationships, allowing limited document-style lookups by field
MongoDB added graph queries through $graphLookup

These extensions are often less integrated but benefit from the database's strengths in its original model.

API Layer Integration

Some systems provide multi-model access through API layers atop existing storage:

Present different protocols (SQL, document API, graph API) over the same data
Trade-off: potential impedance mismatch between storage and API
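A minimal sketch of the API-layer approach follows, assuming a single in-memory record store with three thin facades over it. All class names are hypothetical and do not correspond to any particular product.

```python
class Store:
    """Hypothetical shared storage: every record is a dict kept under a string key."""

    def __init__(self):
        self.records = {}


class KeyValueAPI:
    """Key-value facade: direct access by key."""

    def __init__(self, store):
        self.store = store

    def put(self, key, value):
        self.store.records[key] = value

    def get(self, key):
        return self.store.records.get(key)


class DocumentAPI:
    """Document facade: match records on field values."""

    def __init__(self, store):
        self.store = store

    def find(self, **criteria):
        return [
            doc
            for doc in self.store.records.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


class GraphAPI:
    """Graph facade: records with '_from' and '_to' fields are treated as edges."""

    def __init__(self, store):
        self.store = store

    def neighbors(self, key):
        # The storage has no edge index, so this facade must scan every record.
        return [
            doc["_to"]
            for doc in self.store.records.values()
            if doc.get("_from") == key
        ]


store = Store()
kv, docs, graph = KeyValueAPI(store), DocumentAPI(store), GraphAPI(store)
kv.put("users/alice", {"kind": "user", "name": "alice"})
kv.put("users/bob", {"kind": "user", "name": "bob"})
kv.put("follows/1", {"kind": "edge", "_from": "users/alice", "_to": "users/bob"})

print(kv.get("users/alice"))
print([d["name"] for d in docs.find(kind="user")])
print(graph.neighbors("users/alice"))
```

The graph facade shows the trade-off in miniature: the storage is organized for key access, so a neighbor lookup degrades to a full scan unless the storage layer adds an edge index.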

Storage Engine Convergence

Modern storage engines increasingly support multiple access patterns:

Embedded documents in relational databases
Secondary indexes enabling document-like access in key-value stores
Graph extensions in document databases

The Integration Depth Spectrum

Multi-model databases exist on a spectrum from 'bolted-on extensions' to 'deeply integrated from the ground up.' Native multi-model systems typically offer better cross-model integration while extended systems may offer better performance for their primary model. Understanding where a system falls on this spectrum is crucial for evaluation.

Model Interactions and Synergies

The power of multi-model databases extends beyond merely supporting multiple models—it's in how models interact and complement each other within unified queries and transactions.

Cross-Model Query Patterns:

Document + Graph (The Relationship-Rich Document)

Consider product recommendations. Products are naturally documents (varying attributes), but recommendations are relationships (graph):

// Unified query concept (pseudo-code):
FOR product IN products
  FILTER product.category == "electronics"
  LET related = (
    FOR v, e IN 1..2 OUTBOUND product GRAPH 'recommendations'
      FILTER e.strength > 0.7
      RETURN v
  )
  RETURN { product, recommendations: related }

One query traverses documents and follows graph edges—impossible in separate databases without application glue.
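For readers unfamiliar with this query style, the same logic can be sketched in plain Python over in-memory data. The product names, edge strengths, and helper functions are hypothetical. As in the query above, the strength filter applies to the edge that leads to each returned vertex, not to every edge on the path.

```python
# Hypothetical product documents, keyed by id.
products = {
    "p1": {"name": "Laptop", "category": "electronics"},
    "p2": {"name": "Mouse", "category": "electronics"},
    "p3": {"name": "Laptop bag", "category": "accessories"},
    "p4": {"name": "Desk", "category": "furniture"},
}

# Hypothetical recommendation edges: (from, to, strength).
recommendation_edges = [
    ("p1", "p2", 0.9),
    ("p1", "p3", 0.8),
    ("p3", "p4", 0.75),
    ("p2", "p4", 0.4),
]


def related(start, max_depth=2, min_strength=0.7):
    """Follow outbound edges for 1..max_depth hops; keep a vertex when the
    edge that reached it is stronger than min_strength."""
    results = []
    frontier = [start]
    for _ in range(max_depth):
        next_frontier = []
        for node in frontier:
            for src, dst, strength in recommendation_edges:
                if src == node:
                    next_frontier.append(dst)
                    if strength > min_strength:
                        results.append(dst)
        frontier = next_frontier
    return results


def electronics_with_recommendations():
    """Document filter on category, then a graph traversal per matching product."""
    return [
        {
            "product": doc["name"],
            "recommendations": [products[key]["name"] for key in related(pid)],
        }
        for pid, doc in products.items()
        if doc["category"] == "electronics"
    ]


result = electronics_with_recommendations()
print(result)
```

With separate document and graph databases, the loop in electronics_with_recommendations is the application glue: one round-trip to fetch the products, then further round-trips to the graph store for each of them.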

Key-Value + Document (The Accelerated Access)

Session data might be accessed by session_id (key-value pattern) but contain rich session information (document):

Key-value access paths provide O(1) lookups
Document structure provides rich querying when needed
No impedance mismatch between fast lookup and rich querying
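The two access paths can be sketched over the same data. The session records and function names below are hypothetical. A Python dict stands in for the key-value path, and a predicate over the values stands in for document querying.

```python
# Hypothetical session documents stored under their session_id.
sessions = {
    "s-1": {"user": "alice", "cart": ["laptop"], "device": "mobile"},
    "s-2": {"user": "bob", "cart": [], "device": "desktop"},
    "s-3": {"user": "carol", "cart": ["mouse", "desk"], "device": "mobile"},
}


def get_session(session_id):
    """Key-value path: a single hash lookup, O(1) on average."""
    return sessions.get(session_id)


def find_sessions(predicate):
    """Document path: inspect fields inside each value (a full scan in this sketch)."""
    return [sid for sid, doc in sessions.items() if predicate(doc)]


mobile_with_items = find_sessions(lambda d: d["device"] == "mobile" and d["cart"])
print(get_session("s-2")["user"], mobile_with_items)
```

A real system would back the document path with secondary indexes rather than a scan, but the shape is the same: one stored value serves both the fast lookup and the richer query.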

Graph + Relational (Analytical Relationships)

Traverse relationships (graph) but aggregate results (relational/analytical):

// Find influencers: traverse social graph, aggregate by reach
FOR user IN users
  LET followers = LENGTH(
    // DISTINCT counts each follower once, even when several paths reach them
    FOR v IN 1..3 INBOUND user GRAPH 'social'
      RETURN DISTINCT v._id
  )
  SORT followers DESC
  LIMIT 100
  RETURN { user: user.name, reach: followers }
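The same traverse-then-aggregate pattern can be sketched in Python: a bounded breadth-first search computes each user's reach, counting every follower once, and an ordinary sort ranks the results. The follow edges and function names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical follow edges: (follower, followed).
follows = [
    ("bob", "alice"), ("carol", "alice"), ("dave", "bob"),
    ("erin", "dave"), ("frank", "erin"), ("alice", "bob"),
]

# Index inbound edges so each user's followers can be read directly.
followers_of = defaultdict(list)
for follower, followed in follows:
    followers_of[followed].append(follower)

users = sorted({name for edge in follows for name in edge})


def reach(user, max_depth=3):
    """Number of distinct users within max_depth inbound hops of user."""
    seen = {user}
    queue = deque([(user, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue
        for follower in followers_of.get(node, []):
            if follower not in seen:
                seen.add(follower)
                queue.append((follower, depth + 1))
    return len(seen) - 1  # exclude the user themselves


def top_influencers(limit):
    """Graph step (reach) followed by a relational-style sort and limit."""
    ranked = sorted(users, key=lambda u: (-reach(u), u))
    return [(u, reach(u)) for u in ranked[:limit]]


print(top_influencers(2))
```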

Transaction Semantics Across Models:

Perhaps most critically, multi-model databases can provide ACID transactions spanning model boundaries:

// Atomic operation across document and graph
// (pseudo-code: BEGIN TRANSACTION / COMMIT stand in for the system's transaction API)
BEGIN TRANSACTION
  // Insert document
  INSERT { _key: "order_123", items: [...], total: 599 } INTO orders
  // Create graph edge
  INSERT { _from: "users/alice", _to: "orders/order_123", date: DATE_NOW() } INTO purchased
COMMIT

This atomicity, which polyglot persistence can only approximate with distributed transaction protocols or compensating application logic, keeps data consistent across model boundaries.
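The all-or-nothing behavior can be demonstrated with a single-engine sketch. SQLite is not a multi-model database; here it only stands in for "one engine, one transaction", with a JSON text column playing the document and an edge table playing the graph. The table layout, the place_order helper, and the order contents are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_key TEXT PRIMARY KEY, doc TEXT NOT NULL);
    CREATE TABLE purchased (
        from_id TEXT NOT NULL,
        to_id TEXT NOT NULL,
        UNIQUE (from_id, to_id)
    );
""")


def place_order(conn, user_key, order_key, order_doc):
    """Insert the order document and its graph edge in one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "INSERT INTO orders VALUES (?, ?)", (order_key, json.dumps(order_doc))
        )
        conn.execute(
            "INSERT INTO purchased VALUES (?, ?)",
            (f"users/{user_key}", f"orders/{order_key}"),
        )


place_order(conn, "alice", "order_123", {"items": ["laptop"], "total": 599})

# Force the second statement to fail: this edge already exists.
with conn:
    conn.execute("INSERT INTO purchased VALUES ('users/bob', 'orders/order_456')")
try:
    place_order(conn, "bob", "order_456", {"items": ["mouse"], "total": 25})
except sqlite3.IntegrityError:
    pass  # the document insert was rolled back together with the failed edge

order_keys = [key for (key,) in conn.execute("SELECT order_key FROM orders")]
print(order_keys)
```

The failed call leaves no orphaned order document behind, which is the guarantee the transaction above provides across the document and graph models.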

Emergent Capabilities

Multi-model databases often enable query patterns that weren't possible in any single model. The combination creates emergent capabilities—the whole becomes greater than the sum of parts. This is the deepest value proposition of multi-model: not just convenience, but new possibilities.

Use Case Mapping to Data Models

Selecting appropriate data models requires mapping business requirements to model characteristics. Let's examine how different use cases align with different models:

Use Case Analysis Framework:

For each use case, consider:

Data Structure — is data regular or irregular? Deeply nested or flat?
Access Patterns — point lookups, range scans, joins, traversals?
Relationship Intensity — few relationships or relationship-heavy?
Consistency Requirements — eventual consistency acceptable or ACID required?
Query Complexity — simple gets or complex analytical queries?

Use Case to Data Model Mapping
Use Case	Recommended Model	Rationale
User sessions / caching	Key-Value	Simple key-based access, high throughput, TTL support
Product catalogs (varying attributes)	Document	Flexible schema, self-contained records, natural JSON mapping
Social networks / recommendations	Graph	Relationship-intensive, traversal queries, path analysis
Financial transactions	Relational	Strong consistency, complex joins, reporting needs
Content management	Document	Hierarchical content, embedded media references, flexible schemas
Fraud detection	Graph	Pattern matching across entities, link analysis, anomaly detection
IoT sensor data	Column-Family or Document	Time-series access patterns, high write throughput
Identity/access management	Graph	Permission hierarchies, group memberships, role traversal
E-commerce platform	Multi-Model	Products (document), cart (key-value), recommendations (graph), orders (relational)
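The framework and table can be turned into a rough rule-of-thumb function. The sketch below is illustrative only: the Workload fields, the rule order, and the labels are assumptions made for this example, and real selection involves far more judgment than five if-statements.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical summary of one workload, following the framework questions."""
    irregular_structure: bool
    access_pattern: str  # "point_lookup", "join", "traversal", or "range_scan"
    relationship_heavy: bool
    needs_acid: bool


def suggest_model(w):
    """Map a workload to a model using simplified rules drawn from the table."""
    if w.relationship_heavy or w.access_pattern == "traversal":
        return "graph"
    if w.needs_acid and w.access_pattern == "join":
        return "relational"
    if w.access_pattern == "range_scan":
        return "column-family"
    if w.irregular_structure:
        return "document"
    if w.access_pattern == "point_lookup":
        return "key-value"
    return "relational"


# The e-commerce platform from the last table row, broken into its workloads.
ecommerce = {
    "sessions": Workload(False, "point_lookup", False, False),
    "catalog": Workload(True, "point_lookup", False, False),
    "recommendations": Workload(False, "traversal", True, False),
    "orders": Workload(False, "join", False, True),
}

suggestions = {name: suggest_model(w) for name, w in ecommerce.items()}
needs_multi_model = len(set(suggestions.values())) > 1
print(suggestions, needs_multi_model)
```

When the workloads of one application map to several different models, as they do here, the application is a candidate for either polyglot persistence or a multi-model database.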

The Multi-Model Sweet Spot:

Multi-model databases shine when applications have heterogeneous data requirements:

Mixed workloads — some data is relational, some is document-oriented, some is graph-structured
Evolving requirements — uncertain which model will prove optimal; flexibility to experiment
Cross-cutting queries — need to correlate data across what would be separate databases
Operational simplicity priority — cannot afford complexity of multiple database systems
Startup/MVP scenarios — single system that can grow with the application

When NOT to Use Multi-Model:

Extreme specialization — workloads perfectly suited to a single model's strengths
Maximum performance — specialized databases often outperform generalist systems
Existing expertise — team deeply skilled in specific technology
Minimal cross-model needs — data models are genuinely separate with no integration

Jack of All Trades?

Multi-model databases face the 'jack of all trades, master of none' concern. While they provide adequate performance across models, specialized databases may outperform them significantly for specific workloads. This is a valid consideration—understand your performance requirements before committing.

Summary: Understanding Multiple Data Models

We've explored the fundamental landscape of data models and why their diversity matters. Let's consolidate the key insights:

Key Takeaways

•Data models embody fundamental trade-offs — Each model optimizes for different access patterns, data structures, and query types
•Impedance mismatch is costly — Forcing data into inappropriate models creates semantic, structural, and performance problems
•Polyglot persistence solves mismatch but creates complexity — Multiple databases mean multiple operational burdens and consistency challenges
•Multi-model databases offer unified alternative — Single system supporting multiple models with integrated queries and transactions
•Models can interact synergistically — Cross-model queries enable patterns impossible in single-model systems
•Use case analysis guides model selection — Understand data structure, access patterns, and relationship intensity
•Trade-offs exist — Multi-model may sacrifice peak performance for flexibility and simplicity

What's Next:

With the conceptual foundation of multiple data models established, the next page examines how these models coexist within a single database engine—the architectural patterns, storage strategies, and query integration that make multi-model databases possible.

Page Complete

You now understand why multiple data models exist and how they serve different needs. This foundation prepares you to understand how multi-model databases unify these models into coherent systems.