Document and Other Data Models

Level: Beginner

Duration: 75 mins

Topic: Document and Other Models

Graph Model

Modeling the Connected World

The world is fundamentally connected. Social networks link billions of people through intricate webs of friendships, follows, and interactions. Financial systems trace money flows across institutions and borders. Knowledge graphs connect concepts, facts, and entities in ways that enable machines to reason about the world.

Traditional data models—relational tables, documents, key-value pairs—can represent connections through foreign keys and references. But they treat relationships as second-class citizens: implicit links resolved through expensive join operations or multiple queries. The graph data model inverts this paradigm, making relationships explicit, first-class elements of the data structure.

In a graph database, a friend-of-a-friend query that might require multiple self-joins in SQL executes as a simple traversal. Finding the shortest path between two entities, which in relational systems usually requires recursive queries or application-side iteration, becomes a native operation.

What You Will Learn

This page covers the graph data model: the foundational concepts of nodes, edges, and properties; the property graph model used by most graph databases; graph query languages such as Cypher and Gremlin; common graph algorithms; database architectures; and the domains where graph databases offer the largest advantages.

Graph Theory Foundations

Before exploring graph databases, we must understand the mathematical foundation: graph theory. A graph is a formal structure consisting of:

Vertices (Nodes): Entities in the domain—people, places, things, concepts. In graph databases, nodes typically represent the nouns of your data model.

Edges (Relationships): Connections between vertices—friendships, purchases, dependencies, containment. Edges are the verbs linking nouns together.

Graph Types:

Undirected Graph: Edges have no direction (friendship: if A knows B, B knows A)
Directed Graph (Digraph): Edges have direction (follows: A follows B doesn't mean B follows A)
Weighted Graph: Edges have numeric weights (distance, cost, strength)
Multigraph: Multiple edges can exist between the same pair of nodes
Hypergraph: Edges can connect more than two nodes simultaneously

graph-representations.txt
# Graph Theory: Representation Approaches
 
# 1. Adjacency List (Common in graph databases)
# Each node stores list of its neighbors
Node A: [B, C, D]
Node B: [A, C]
Node C: [A, B, E]
Node D: [A]
Node E: [C]
 
Space: O(V + E)  # Vertices + Edges
Edge lookup: O(degree) # Check if edge exists
Traversal: O(V + E)   # Visit all nodes/edges
 
 
# 2. Adjacency Matrix
# N x N matrix where M[i][j] = 1 if edge exists
 
    A  B  C  D  E
A [ 0  1  1  1  0 ]
B [ 1  0  1  0  0 ]
C [ 1  1  0  0  1 ]
D [ 1  0  0  0  0 ]
E [ 0  0  1  0  0 ]
 
Space: O(V²)        # Always V x V
Edge lookup: O(1)   # Direct array access
Traversal: O(V²)    # Must check entire matrix
 
 
# 3. Edge List
# Simple list of (source, target) pairs
(A, B), (A, C), (A, D), (B, C), (C, E)
 
Space: O(E)          # Just edges
Edge lookup: O(E)    # Must scan list
Use: Batch processing, external algorithms
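
The three representations above can be built from one another. The following is a minimal Python sketch, assuming the same five-node undirected example graph; the helper names `has_edge_list` and `has_edge_matrix` are illustrative, not taken from any library.

```python
# Undirected example graph from the listing above, given as an edge list.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "E")]
nodes = sorted({n for edge in edges for n in edge})

# Adjacency list: each node maps to the list of its neighbors.
adjacency_list = {n: [] for n in nodes}
for u, v in edges:
    adjacency_list[u].append(v)
    adjacency_list[v].append(u)  # undirected: record both directions

# Adjacency matrix: matrix[i][j] == 1 when an edge joins nodes i and j.
index = {n: i for i, n in enumerate(nodes)}
matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1


def has_edge_list(u, v):
    """O(degree(u)) lookup: scan u's neighbor list."""
    return v in adjacency_list[u]


def has_edge_matrix(u, v):
    """O(1) lookup: a single array access."""
    return matrix[index[u]][index[v]] == 1
```

The adjacency list stores only the edges that exist, which is why it suits sparse graphs; the matrix trades O(V²) space for constant-time edge checks.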

Key Graph Theory Concepts
Concept	Definition	Significance
Degree	Number of edges connected to a node	High-degree nodes are hubs; impact traversal performance
Path	Sequence of nodes connected by edges	Foundation of traversal and reachability queries
Shortest Path	Path with minimum edges/weight between nodes	Core algorithm for navigation, recommendations
Connected Component	Maximal subset in which every node is reachable from every other	Identifies clusters, isolated groups
Cycle	Path that returns to the starting node	Detects circular dependencies, feedback loops
Depth	Number of edges from a starting node	1st-degree friends, 2nd-degree connections
Centrality	Measure of node importance in the graph	Identifies influencers, critical infrastructure
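
Several of these concepts can be computed with a plain breadth-first search. The sketch below is one possible Python illustration, assuming an unweighted, undirected graph stored as an adjacency list; nodes F and G are added here only to create a second component.

```python
from collections import deque

graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "E"],
    "D": ["A"],
    "E": ["C"],
    "F": ["G"],  # F and G are an extra, isolated component (illustration only)
    "G": ["F"],
}


def shortest_path(graph, start, goal):
    """Return one minimum-hop path from start to goal, or None if unreachable."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parent links back to the start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def connected_components(graph):
    """Group nodes into sets in which every node can reach every other."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Degree: number of edges attached to each node.
degree = {node: len(neighbors) for node, neighbors in graph.items()}
```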

Index-Free Adjacency

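Native graph databases use 'index-free adjacency': each node directly references its neighbors, so no index lookup is needed to move from one node to the next. Following a single relationship is O(1), and the cost of expanding a node is proportional to the number of its relationships rather than to the total size of the graph. A relational database must perform an index lookup (typically O(log N)) or a table scan for each JOIN, which becomes expensive as traversal depth grows.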
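
To make the contrast concrete, here is a small Python sketch. It is only an analogy under simplifying assumptions: in-memory object references stand in for a native store's direct pointers, and a sorted list searched with `bisect` stands in for a relational index. The names `Node`, `hop_by_reference`, and `hop_by_index` are hypothetical.

```python
from bisect import bisect_left


class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct references to other Node objects


alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.neighbors.append(bob)
bob.neighbors.append(carol)


def hop_by_reference(node):
    """Index-free adjacency: follow stored references, no search involved."""
    return [neighbor.name for neighbor in node.neighbors]


# Relational-style storage: one shared edge table, kept sorted by source.
edge_table = sorted([("alice", "bob"), ("bob", "carol")])


def hop_by_index(source):
    """Index lookup: binary-search the table (O(log N)), then scan matches."""
    i = bisect_left(edge_table, (source,))
    targets = []
    while i < len(edge_table) and edge_table[i][0] == source:
        targets.append(edge_table[i][1])
        i += 1
    return targets
```

Both functions return the same neighbors; the difference is that `hop_by_index` pays a search cost that grows with the size of the whole edge table on every hop.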

The Property Graph Model

The property graph model is the dominant paradigm in modern graph databases. It extends basic graph theory with rich metadata on both nodes and edges:

Model Components:

Nodes (Vertices): Represent entities, each with:
- A unique identifier
- Zero or more labels (types/categories)
- Zero or more properties (key-value pairs)
Relationships (Edges): Connect nodes, each with:
- A unique identifier
- A type (the relationship verb)
- A direction (from source to target)
- Zero or more properties

This model is intuitive—it mirrors how we naturally think about domains. "Alice WORKS_AT Acme Corp since 2020" translates directly to a node (Alice), an edge (WORKS_AT with property since=2020), and another node (Acme Corp).
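
As a rough illustration of these components, the sketch below models the "Alice WORKS_AT Acme Corp since 2020" statement with Python dataclasses. The class and field names are assumptions chosen to mirror the list above, not the storage format of any particular database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    labels: set = field(default_factory=set)       # e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # key-value pairs


@dataclass
class Relationship:
    id: int
    type: str    # the relationship verb, e.g. "WORKS_AT"
    start: Node  # direction: from start ...
    end: Node    # ... to end
    properties: dict = field(default_factory=dict)


alice = Node(1, {"Person"}, {"name": "Alice"})
acme = Node(2, {"Company"}, {"name": "Acme Corp"})
works_at = Relationship(10, "WORKS_AT", alice, acme, {"since": 2020})


def describe(rel):
    """Render a relationship as a readable sentence."""
    return (
        f"{rel.start.properties['name']} {rel.type} "
        f"{rel.end.properties['name']} since {rel.properties['since']}"
    )
```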

property-graph-example.cypher
// Neo4j Cypher: Creating a Property Graph
 
// Create Person nodes with properties
CREATE (alice:Person {
  name: 'Alice Chen',
  email: 'alice@example.com',
  joinedDate: date('2020-03-15'),
  skills: ['Python', 'GraphQL', 'Neo4j']
})
 
CREATE (bob:Person {
  name: 'Bob Smith',
  email: 'bob@example.com',
  joinedDate: date('2019-08-01')
})
 
CREATE (charlie:Person {
  name: 'Charlie Davis',
  email: 'charlie@example.com'
})
 
// Create Company nodes
CREATE (acme:Company {
  name: 'Acme Corp',
  industry: 'Technology',
  founded: 2010,
  headquarters: 'San Francisco'
})
 
CREATE (techstart:Company {
  name: 'TechStart Inc',
  industry: 'Technology',
  founded: 2018
})
 
// Create relationships with properties
CREATE (alice)-[:WORKS_AT {
  role: 'Senior Engineer',
  since: 2020,
  department: 'Platform'
}]->(acme)
 
CREATE (bob)-[:WORKS_AT {
  role: 'Product Manager',
  since: 2019
}]->(acme)
 
CREATE (charlie)-[:WORKS_AT {
  role: 'CTO',
  since: 2018,
  founder: true
}]->(techstart)
 
// Social relationships
CREATE (alice)-[:KNOWS {since: 2018, context: 'conference'}]->(bob)
CREATE (alice)-[:KNOWS {since: 2021, context: 'LinkedIn'}]->(charlie)
CREATE (bob)-[:MANAGES]->(alice)
 
// Company relationships
CREATE (acme)-[:ACQUIRED {year: 2022, price: 50000000}]->(techstart)

Visual Representation:

A simplified view of the property graph from the example above (properties and the Alice-KNOWS-Charlie and Bob-MANAGES-Alice relationships are omitted):

       ┌─────────────┐
       │   Alice     │──WORKS_AT──▶┌────────────┐
       │  :Person    │              │  Acme Corp │
       └──────┬──────┘              │  :Company  │
              │                     └─────┬──────┘
         KNOWS│                           │
              ▼                     ACQUIRED
       ┌─────────────┐                    │
       │    Bob      │──WORKS_AT──────────┤
       │  :Person    │                    ▼
       └─────────────┘              ┌────────────┐
                                    │TechStart   │◀──WORKS_AT──Charlie
                                    │  :Company  │              :Person
                                    └────────────┘

Notice how relationships carry meaning (WORKS_AT, KNOWS, MANAGES, ACQUIRED) and can carry properties like since or role. A bare edge list of (source, target) pairs has no place to store this information.

Labels Enable Type Queries

Labels function like types or categories. You can query for 'all Person nodes' or 'all WORKS_AT relationships' efficiently. Nodes can have multiple labels (Alice could be :Person:Employee:Developer), enabling flexible categorization similar to tags.

Graph Query Languages

Graph databases provide specialized query languages designed for pattern matching and traversal—tasks that are awkward in SQL. The two dominant approaches are Cypher (declarative, pattern-based) and Gremlin (imperative, traversal-based).

Cypher (Neo4j):

Cypher uses ASCII-art patterns to express graph queries. Nodes are represented as () and relationships as -[]->. This visual syntax makes queries remarkably intuitive.

cypher-queries.cypher
// MATCH: Find patterns in the graph
 
// Find all people who work at Acme Corp
MATCH (p:Person)-[:WORKS_AT]->(c:Company {name: 'Acme Corp'})
RETURN p.name, p.email
 
// Find friends of friends (2-hop traversal)
MATCH (alice:Person {name: 'Alice Chen'})-[:KNOWS]->(friend)-[:KNOWS]->(foaf)
WHERE foaf <> alice  // Exclude self
RETURN DISTINCT foaf.name AS friendOfFriend
 
// Find shortest path between two people
MATCH path = shortestPath(
  (a:Person {name: 'Alice Chen'})-[:KNOWS*..6]-(b:Person {name: 'Charlie Davis'})
)
RETURN path, length(path) AS degrees
 
// Variable-length relationships (1 to 5 hops)
MATCH (p:Person)-[:WORKS_AT]->(:Company)-[:ACQUIRED*1..5]->(target:Company)
RETURN p.name, target.name
 
// Aggregation: Count employees per company
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN c.name, COUNT(p) AS employeeCount
ORDER BY employeeCount DESC
 
// Pattern with multiple relationship types
MATCH (manager:Person)-[:MANAGES]->(employee:Person)-[:WORKS_AT]->(company:Company)
RETURN manager.name, employee.name, company.name
 
// Creating and updating
MERGE (p:Person {email: 'new@example.com'})
ON CREATE SET p.name = 'New User', p.createdAt = datetime()
ON MATCH SET p.lastSeen = datetime()
RETURN p
 
// Delete a node together with all of its relationships
MATCH (p:Person {email: 'old@example.com'})
DETACH DELETE p  // DETACH removes relationships first

Gremlin (Apache TinkerPop):

Gremlin takes an imperative, traversal-based approach. You describe a series of steps that a traverser takes through the graph. This approach offers fine-grained control over traversal behavior.

gremlin-queries.groovy
// Gremlin Traversal Language
 
// Find all people
g.V().hasLabel('Person').values('name')
 
// Find people who work at Acme
g.V().has('Company', 'name', 'Acme Corp')
  .in('WORKS_AT')
  .hasLabel('Person')
  .values('name')
 
// Friends of friends
g.V().has('Person', 'name', 'Alice Chen')
  .out('KNOWS')           // First hop: friends
  .out('KNOWS')           // Second hop: friends of friends
  .dedup()                // Remove duplicates
  .values('name')
 
// Shortest path
g.V().has('Person', 'name', 'Alice Chen')
  .repeat(out('KNOWS').simplePath())
  .until(has('Person', 'name', 'Charlie Davis'))
  .path()
  .limit(1)
 
// Aggregation
g.V().hasLabel('Person')
  .out('WORKS_AT')
  .groupCount()
  .by('name')
 
// Create with properties
g.addV('Person')
  .property('name', 'New User')
  .property('email', 'new@example.com')
 
// Create edge
g.V().has('Person', 'name', 'Alice Chen').as('a')
  .V().has('Person', 'name', 'New User').as('b')
  .addE('KNOWS').from('a').to('b')
  .property('since', 2024)

Cypher vs Gremlin Comparison
Aspect	Cypher	Gremlin
Style	Declarative, pattern-based	Imperative, traversal-based
Learning Curve	Lower (SQL-like patterns)	Higher (method chaining)
Portability	Neo4j (+ some others)	TinkerPop-compatible systems
Expressiveness	Excellent for patterns	Excellent for procedural logic
Visual Clarity	ASCII-art patterns are intuitive	Chain of operations
Optimization	Query planner optimizes	Developer controls traversal
Use Cases	Ad-hoc queries, analytics	Complex traversals, algorithms

GQL Standard

GQL (Graph Query Language) was published as an ISO standard, ISO/IEC 39075, in 2024 and is intended to become the 'SQL of graphs'. GQL draws heavily on Cypher's pattern-matching syntax. Vendor support is still rolling out; if adoption follows the path SQL took for relational systems, GQL should make graph queries far more portable across databases.

Graph Algorithms

Beyond simple traversals, graph databases enable sophisticated algorithms that extract insights from connected data. These algorithms power recommendations, fraud detection, network analysis, and many other applications.

Algorithm Categories:

Common Graph Algorithms
Category	Algorithms	Use Cases
Path Finding	Shortest Path, A*, All Pairs Shortest Path	Navigation, routing, degrees of separation
Centrality	PageRank, Betweenness, Closeness, Degree	Influencer identification, importance ranking
Community Detection	Louvain, Label Propagation, K-means	Customer segmentation, topic clustering
Similarity	Jaccard, Cosine, Node Similarity	Recommendations, duplicate detection
Link Prediction	Common Neighbors, Preferential Attachment	Friend suggestions, connection prediction
Graph Embeddings	Node2Vec, GraphSAGE	Machine learning on graphs, feature generation

graph-algorithms-cypher.cypher
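// Neo4j Graph Data Science Library Examples
// Assumes the named graphs ('myGraph', 'purchaseGraph', 'roadNetwork') were already projected into the GDS graph catalog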
 
// PageRank: Find influential nodes
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
 
// Community Detection: Find clusters
CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId
RETURN communityId, 
       collect(gds.util.asNode(nodeId).name) AS members,
       count(*) AS size
ORDER BY size DESC
 
// Betweenness Centrality: Find bridges
CALL gds.betweenness.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
 
// Node Similarity: Find similar users (for recommendations)
CALL gds.nodeSimilarity.stream('purchaseGraph')
YIELD node1, node2, similarity
WHERE similarity > 0.5
RETURN gds.util.asNode(node1).name AS user1,
       gds.util.asNode(node2).name AS user2,
       similarity
ORDER BY similarity DESC
 
// Shortest Path with Dijkstra (weighted)
MATCH (source:City {name: 'San Francisco'}), (target:City {name: 'New York'})
CALL gds.shortestPath.dijkstra.stream('roadNetwork', {
    sourceNode: source,
    targetNode: target,
    relationshipWeightProperty: 'distance'
})
YIELD path, totalCost
RETURN [node IN nodes(path) | node.name] AS route, totalCost AS distance

PageRank Deep Dive:

PageRank, famously created for Google's search ranking, measures node importance based on the structure of incoming links. The intuition is simple: a node is important if other important nodes link to it.

The algorithm iteratively computes scores:

1. Initialize all nodes with equal score (1/N)
2. In each iteration, each node distributes its score equally among its outbound neighbors
3. Each node sums its incoming scores and adds a "random surfer" component (the chance of jumping to a random node)
4. Repeat until scores converge

In social networks, a high PageRank identifies influencers. In fraud detection, an unexpectedly high PageRank among accounts that mostly link to one another can flag synthetic review networks.
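
The iteration described above can be sketched in a few lines of Python. This is a simplified illustration, not the implementation used by any graph library: the damping factor of 0.85 is the commonly cited default, every link target is assumed to appear as a key in `out_links`, and nodes without outbound links spread their score evenly over all nodes.

```python
def pagerank(out_links, damping=0.85, tolerance=1e-6, max_iterations=100):
    """Iteratively compute PageRank for a dict of node -> list of link targets."""
    nodes = list(out_links)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}  # step 1: equal initial scores
    for _ in range(max_iterations):
        # "Random surfer" component: a small baseline score for every node.
        new_scores = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            if targets:
                # Step 2: split this node's score among its outbound neighbors.
                share = damping * scores[node] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # Dangling node: spread its score evenly over all nodes.
                share = damping * scores[node] / n
                for target in nodes:
                    new_scores[target] += share
        change = sum(abs(new_scores[x] - scores[x]) for x in nodes)
        scores = new_scores
        if change < tolerance:  # step 4: stop once scores have converged
            break
    return scores


links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
```

In this toy graph, C receives links from three nodes and ends up with the highest score; A ranks second because its single incoming link comes from the important node C.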

Algorithm Complexity

Graph algorithms can be computationally expensive. Single-source shortest path is O(V + E) with BFS on unweighted graphs and O((V + E) log V) with Dijkstra using a binary heap on graphs with non-negative weights. All-pairs shortest path is O(V³) with Floyd-Warshall. Community detection scales better but still requires multiple passes over the graph. For large graphs, consider graph analytics platforms (Spark GraphX, Pregel-style systems) rather than transactional graph databases.
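
For reference, a binary-heap Dijkstra fits in a short Python function. The sketch below assumes non-negative edge weights and a graph given as a dict of node -> list of (neighbor, weight) pairs; the `roads` data is invented for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Return the minimum total weight from source to every reachable node."""
    distances = {source: 0}
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distances.get(node, float("inf")):
            continue  # stale heap entry: a shorter route was already found
        for neighbor, weight in graph.get(node, []):
            candidate = dist + weight
            if candidate < distances.get(neighbor, float("inf")):
                distances[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return distances


roads = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 5)],
    "D": [],
}
```

Each edge can push at most one heap entry and each push or pop costs O(log V), which is where the O((V + E) log V) bound comes from.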

Graph Database Architectures

Graph databases are implemented in fundamentally different ways, affecting performance characteristics and suitable use cases.

Native Graph Databases:

Built from the ground up for graph storage and processing. Data structures and storage engines are designed specifically for graphs.

Index-free adjacency: Nodes directly reference neighbors
Optimized for traversals: Pointer-chasing through relationships
Examples: Neo4j, Amazon Neptune (partial), TigerGraph

Non-Native (Graph Layer over Other Storage):

Graph query interface built atop relational, document, or key-value storage.

Relationships stored as foreign keys or references
Traversals require index lookups
Examples: Azure Cosmos DB, Amazon Neptune (partial), ArangoDB

Popular Graph Databases
System	Type	Query Language	Best For
Neo4j	Native graph	Cypher	OLTP graph workloads, enterprise applications
Amazon Neptune	Managed, multi-model	openCypher, Gremlin, SPARQL	AWS-native applications, RDF workloads
TigerGraph	Native, distributed	GSQL	Large-scale analytics, real-time deep traversals
ArangoDB	Multi-model	AQL	Combined document + graph needs
JanusGraph	Layer over storage backends	Gremlin	Massive scale on Cassandra/HBase
Azure Cosmos DB	Multi-model managed	Gremlin	Azure-native, global distribution
Dgraph	Native, distributed	DQL (formerly GraphQL+-), GraphQL	GraphQL-first applications

Native Graph Advantages

• Per-hop traversal cost that does not depend on total graph size
• Optimized storage format for graph patterns
• Efficient deep traversals (many hops)
• Graph-specific caching strategies
• Purpose-built query optimization

Non-Native Advantages

• Leverage existing storage infrastructure
• Multi-model flexibility (graph + document + key-value)
• May scale horizontally more easily
• Reduce operational complexity
• Familiar underlying technology

Scaling Considerations:

Graphs are inherently difficult to partition. Unlike key-value data (partition by key hash) or documents (partition by document ID), graph relationships cross partition boundaries. A 6-hop traversal might touch data on 6 different servers.

Strategies for scaling:

Vertical scaling: Bigger machines with more RAM (Neo4j's primary approach)
Read replicas: Distribute read load; writes to single primary
Sharding by domain: Separate graphs for different data domains
Graph partitioning algorithms: Minimize cross-partition edges (METIS, etc.)
Distributed graph processing: Batch analytics on Spark GraphX, Pregel

The Partitioning Challenge

Be cautious of claims about 'unlimited graph scaling'. Deep traversals across partitioned graphs incur significant network overhead. For truly massive graphs requiring real-time deep traversals, vertical scaling or specialized distributed systems (TigerGraph) may be necessary.
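
The effect of partitioning strategy on cross-partition edges can be measured directly. The following Python sketch uses an invented eight-node graph made of two tightly connected clusters joined by one bridge, and compares a hash-style assignment (node ID modulo 2) with an assignment that keeps each cluster together. The function name `cut_edges` is hypothetical.

```python
from itertools import combinations

# Two fully connected clusters of four nodes, joined by a single bridge edge.
cluster_a, cluster_b = [0, 1, 2, 3], [4, 5, 6, 7]
edges = (
    list(combinations(cluster_a, 2))
    + list(combinations(cluster_b, 2))
    + [(3, 4)]
)


def cut_edges(edges, partition_of):
    """Count edges whose endpoints land on different partitions."""
    return sum(1 for u, v in edges if partition_of(u) != partition_of(v))


# Hash-style partitioning ignores graph structure.
hash_cut = cut_edges(edges, lambda node: node % 2)

# Structure-aware partitioning keeps each cluster on one partition.
community_cut = cut_edges(edges, lambda node: node // 4)
```

With hash-style placement, 9 of the 13 edges cross partitions, so most hops would need a network round trip; keeping clusters together leaves only the bridge edge crossing. Graph partitioning algorithms aim for the second outcome on real data, where the clusters are not known in advance.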

Use Cases and Applications

Graph databases shine in domains where relationships are central to the data model and queries. Let's examine the major application areas in depth.

Social Networks:

The canonical graph use case. Users, posts, groups, and pages form nodes; follows, friendships, likes, and memberships form edges. Core queries include:

Friend suggestions (friends of friends not yet connected)
News feed ranking (content from close connections)
Influence measurement (PageRank for viral content prediction)
Community detection (find groups of closely-connected users)

social-network-queries.cypher
// Friend Suggestions: Friends of friends Alice doesn't know
MATCH (alice:User {name: 'Alice'})-[:FRIENDS_WITH]->(friend)-[:FRIENDS_WITH]->(suggestion)
WHERE NOT (alice)-[:FRIENDS_WITH]-(suggestion)
  AND suggestion <> alice
WITH suggestion, COUNT(friend) AS mutualFriends
ORDER BY mutualFriends DESC
LIMIT 10
RETURN suggestion.name, mutualFriends
 
// News Feed: Recent posts from connections, weighted by closeness
MATCH (me:User {id: $userId})-[r:FOLLOWS|FRIENDS_WITH]->(creator)-[:POSTED]->(post)
WHERE post.createdAt > datetime() - duration({hours: 24})
WITH post, creator, 
     CASE type(r) WHEN 'FRIENDS_WITH' THEN 2 ELSE 1 END AS weight
RETURN post, creator, weight
ORDER BY weight DESC, post.createdAt DESC
LIMIT 50
 
// Degrees of Separation
MATCH path = shortestPath(
  (user1:User {id: $user1Id})-[:FRIENDS_WITH*]-(user2:User {id: $user2Id})
)
RETURN length(path) AS degrees
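
The friend-suggestion query above boils down to counting mutual friends over two hops. A minimal Python sketch of the same logic follows, assuming symmetric friendships stored as a dict of sets; the data and the function name `suggest_friends` are made up for illustration.

```python
from collections import Counter

friends = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Dave", "Erin"},
    "Carol": {"Alice", "Dave"},
    "Dave": {"Bob", "Carol"},
    "Erin": {"Bob"},
}


def suggest_friends(friends, user, limit=10):
    """Rank friends-of-friends the user does not know by mutual friend count."""
    mutual = Counter()
    for friend in friends[user]:               # first hop
        for candidate in friends[friend]:      # second hop
            if candidate != user and candidate not in friends[user]:
                mutual[candidate] += 1
    # Most mutual friends first; ties broken by name for a stable order.
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))[:limit]
```

The work done depends only on the user's two-hop neighborhood, not on how many users exist overall.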

Fraud Detection:

Fraudsters create networks of fake accounts, synthetic identities, and colluding actors. These patterns—invisible in row-by-row analysis—stand out in graph analysis:

Identity fraud: Multiple accounts sharing phone numbers, devices, addresses
Money laundering: Complex chains of transactions designed to obscure origin
Synthetic identity: Fabricated identities connecting to establish legitimacy
Collusion rings: Groups of accounts that only interact with each other
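
The identity-fraud pattern (accounts sharing phone numbers or devices) can be sketched as a connected-components problem over accounts and identifiers. The Python example below uses a small union-find structure; the account names, identifiers, and function names are invented, and a real system would add scoring rather than flag every shared identifier.

```python
from collections import defaultdict

# (account, identifier) pairs: which account used which device or phone.
uses = [
    ("acct1", "device:X"),
    ("acct2", "device:X"),
    ("acct2", "phone:555-0100"),
    ("acct3", "phone:555-0100"),
    ("acct4", "device:Y"),
]


def find(parent, item):
    """Return the representative of item's group, shortening paths as it goes."""
    while parent[item] != item:
        parent[item] = parent[parent[item]]
        item = parent[item]
    return item


def linked_account_groups(uses):
    """Group accounts that are connected through any chain of shared identifiers."""
    parent = {}
    for account, identifier in uses:
        parent.setdefault(account, account)
        parent.setdefault(identifier, identifier)
        # Merge the account's group with the identifier's group.
        parent[find(parent, account)] = find(parent, identifier)
    groups = defaultdict(set)
    for account, _ in uses:
        groups[find(parent, account)].add(account)
    return [group for group in groups.values() if len(group) > 1]
```

Here acct1 and acct3 never share an identifier directly, yet they end up in the same group through acct2. That indirect link is what row-by-row analysis tends to miss.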

Graph Database Use Cases by Domain
Domain	Entities (Nodes)	Relationships (Edges)	Key Queries
Social Networks	Users, Posts, Groups	Friends, Follows, Likes	Suggestions, feeds, influence
Fraud Detection	Accounts, Devices, Transactions	Owns, TransferredTo, SharedWith	Pattern detection, ring identification
Recommendations	Users, Products, Categories	Purchased, Viewed, SimilarTo	Collaborative filtering, similarity
Knowledge Graphs	Entities, Concepts, Facts	IsA, HasProperty, RelatedTo	Question answering, inference
Network/IT Ops	Servers, Services, Databases	ConnectsTo, DependsOn, Runs	Impact analysis, root cause
Supply Chain	Suppliers, Products, Facilities	Supplies, TransportsTo, Contains	Risk analysis, traceability
Life Sciences	Genes, Proteins, Diseases	Regulates, Causes, Treats	Drug discovery, pathway analysis

Knowledge Graphs:

Knowledge graphs structure factual knowledge as entity-relationship networks. Google's Knowledge Graph powers the information boxes in search results. Enterprises build knowledge graphs for:

Semantic search: Understanding query intent through entity relationships
Question answering: Traversing relationships to find answers
Data integration: Linking entities across disparate data sources
AI/ML enhancement: Providing structured context for language models

The Impact Analysis Use Case

One especially strong use case for graphs is dependency analysis. 'If this server fails, what services are affected?' In a graph, this is a simple traversal. In relational databases, it requires recursive CTEs or application-level iteration. Graphs make real-time impact analysis practical in situations where it would otherwise be hard to achieve.
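
The traversal behind impact analysis is a breadth-first walk over reversed dependency edges. Here is a minimal Python sketch under the assumption that dependencies are stored as a dict of component -> list of things it depends on; the component names and `impacted_by` are hypothetical.

```python
from collections import deque

depends_on = {
    "web": ["api"],
    "api": ["db", "cache"],
    "reports": ["db"],
    "db": ["server1"],
    "cache": ["server2"],
}


def impacted_by(depends_on, failed):
    """Return every component that directly or transitively depends on failed."""
    # Invert the edges: for each dependency, list the components relying on it.
    dependents = {}
    for component, dependencies in depends_on.items():
        for dependency in dependencies:
            dependents.setdefault(dependency, []).append(component)

    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted
```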

Graph vs Relational Trade-offs

Graph databases aren't universally better than relational databases—they're optimized for different access patterns. Understanding when to choose each is essential.

When Graphs Win:

Deep Traversals: Queries involving 3+ relationship hops. Each additional hop adds another JOIN in SQL, and both the intermediate result size and the number of index lookups grow rapidly with depth.
Relationship-Centric Queries: "Find all paths", "find similar", "what connects X to Y"—queries where relationships are the focus, not just navigation.
Schema Flexibility: Graphs easily accommodate new relationship types without schema migrations.
Variable-Length Paths: "All ancestors", "all transitive dependencies"—recursive queries that require CTEs or stored procedures in SQL.
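
For comparison, this is roughly what an "all transitive dependencies" query looks like on the relational side. The sketch runs a recursive CTE against an in-memory SQLite database from Python; the `depends_on` table and its contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE depends_on (package TEXT, dependency TEXT)")
conn.executemany(
    "INSERT INTO depends_on VALUES (?, ?)",
    [("app", "web"), ("web", "http"), ("http", "sockets"), ("app", "log")],
)

rows = conn.execute(
    """
    WITH RECURSIVE transitive(dependency) AS (
        -- Base case: direct dependencies of the starting package.
        SELECT dependency FROM depends_on WHERE package = ?
        UNION
        -- Recursive case: dependencies of anything found so far.
        SELECT d.dependency
        FROM depends_on AS d
        JOIN transitive AS t ON d.package = t.dependency
    )
    SELECT dependency FROM transitive ORDER BY dependency
    """,
    ("app",),
).fetchall()

transitive_dependencies = [row[0] for row in rows]
```

The query works, and `UNION` removes duplicates so cycles terminate, but the recursion has to be spelled out by hand. In Cypher, a variable-length pattern using the `*` syntax shown earlier expresses the same traversal in a single relationship pattern.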

Choose Graph When

• Queries traverse 3+ relationship levels
• Relationships have properties/types
• Path-finding and pattern matching are core
• Schema evolves rapidly with new connections
• Domain is inherently networked (social, fraud, knowledge)

Choose Relational When

• Queries primarily filter/aggregate within tables
• Relationships are simple and fixed
• Strong transactional integrity required
• Reporting and BI tools expect SQL
• Domain is inherently tabular (orders, inventory)

graph-vs-sql-comparison.sql
-- Find friends of friends of friends in SQL (3-hop)
-- This joins three copies of a potentially huge friendships table (two self-JOINs).
-- Assumes each friendship is stored in both directions.
 
SELECT DISTINCT f3.friend_id
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
JOIN friendships f3 ON f2.friend_id = f3.user_id
WHERE f1.user_id = 123
  AND f3.friend_id != 123
  -- NOT EXISTS avoids the NULL pitfalls of NOT IN
  AND NOT EXISTS (
    SELECT 1 FROM friendships d
    WHERE d.user_id = 123 AND d.friend_id = f3.friend_id
  );
 
-- Without an index on user_id, each JOIN may scan the whole table
-- With an index, every hop still costs an index lookup per intermediate row
-- Intermediate results grow roughly as (average friends)^depth
 
 
-- The same query in Cypher:
-- MATCH (u:User {id: 123})-[:FRIEND*3]-(fofofo:User)
-- WHERE NOT (u)-[:FRIEND]-(fofofo) AND u <> fofofo
-- RETURN DISTINCT fofofo
 
-- Graph execution:
-- Start at node 123
-- Traverse FRIEND relationships 3 times (pointer chasing)
-- Filter results
-- Time complexity: O(average_friends^3) - independent of total graph size

When Relational Wins:

Aggregation-Heavy Workloads: SUM, AVG, GROUP BY across millions of rows. Relational databases are highly optimized for set-based operations.
Simple Relationships: If relationships are just foreign keys traversed 1-2 levels, SQL handles this efficiently.
ACID Requirements: While graph databases support transactions, relational databases have decades of battle-tested ACID implementations.
Ecosystem Integration: BI tools, ETL pipelines, and reporting infrastructure assume SQL.

Polyglot Persistence

Many production systems use both relational and graph databases. Core transactional data (orders, accounts) lives in PostgreSQL; relationship-heavy analytics (recommendations, fraud) run on Neo4j. Use each tool where it excels rather than forcing a single database to handle all workloads.

RDF and Triple Stores

Before property graphs dominated, RDF (Resource Description Framework) defined the graph data model for the Semantic Web. While property graphs are more popular for general applications, RDF remains important for knowledge graphs, linked data, and domains requiring formal semantics.

The Triple Model:

RDF represents data as triples: subject-predicate-object statements.

(Subject) --[Predicate]--> (Object)

Examples:
:Alice :worksAt :AcmeCorp
:Alice :hasEmail "alice@example.com"
:AcmeCorp :locatedIn :SanFrancisco
:SanFrancisco :isA :City

Triples can chain together to form graphs, but unlike property graphs, standard RDF has no direct way to attach properties to a relationship. Such information must be expressed as additional triples (reification) or through the RDF-star extension.
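
The triple model is simple enough to mimic with a set of tuples. The Python sketch below is an illustration only: it uses plain strings instead of IRIs, a hypothetical `match` helper in which `None` acts as a wildcard, and a hand-rolled statement node (`stmt1`) to show how a "since" property on a relationship turns into extra triples.

```python
# Each fact is one (subject, predicate, object) tuple.
triples = {
    ("Alice", "worksAt", "AcmeCorp"),
    ("Alice", "hasEmail", "alice@example.com"),
    ("AcmeCorp", "locatedIn", "SanFrancisco"),
    ("SanFrancisco", "isA", "City"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None matches anything."""
    return sorted(
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


# Reification: to say Alice has worked at AcmeCorp since 2020, describe the
# statement itself with extra triples hanging off a statement node.
triples |= {
    ("stmt1", "subject", "Alice"),
    ("stmt1", "predicate", "worksAt"),
    ("stmt1", "object", "AcmeCorp"),
    ("stmt1", "since", "2020"),
}
```

One property on one relationship costs four additional triples here, which is the overhead a property graph avoids by storing `since` directly on the edge.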

rdf-and-sparql.ttl
# RDF in Turtle syntax
 
@prefix : <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
 
:alice a foaf:Person ;
    foaf:name "Alice Chen" ;
    foaf:mbox <mailto:alice@example.com> ;
    :worksAt :acme ;
    :joinedDate "2020-03-15"^^xsd:date .
 
:bob a foaf:Person ;
    foaf:name "Bob Smith" ;
    :worksAt :acme ;
    foaf:knows :alice .
 
:acme a :Company ;
    foaf:name "Acme Corp" ;
    :industry "Technology" ;
    :headquarters :sanfrancisco .
 
:sanfrancisco a :City ;
    foaf:name "San Francisco" ;
    :country :usa .
 
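# SPARQL Query: Find all people who work at companies headquartered in San Francisco
# (a separate query, not part of the Turtle data; SPARQL declares prefixes with PREFIX)
PREFIX : <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?personName ?companyName
WHERE {
    ?person a foaf:Person ;
            foaf:name ?personName ;
            :worksAt ?company .
    ?company foaf:name ?companyName ;
             :headquarters ?city .
    ?city foaf:name "San Francisco" .
}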

Property Graph vs RDF Comparison
Aspect	Property Graph	RDF/Triple Store
Data Model	Nodes + Edges with properties	Subject-Predicate-Object triples
Relationship Properties	First-class support	Requires reification or the RDF-star extension
Schema	Flexible labels/types	Formal ontologies (OWL, RDFS)
Query Language	Cypher, Gremlin	SPARQL
Standards	ISO GQL (ISO/IEC 39075, published 2024)	W3C standards (mature)
Use Cases	Application data, analytics	Knowledge graphs, linked data, semantic web
Inference	Application-level	Native reasoning with ontologies

When to Choose RDF:

Linked Open Data: Publishing data that connects to global knowledge (DBpedia, Wikidata)
Ontology-driven domains: Healthcare, legal, scientific domains with formal vocabularies
Reasoning requirements: Automatic inference based on class hierarchies and rules
Standards compliance: Government and enterprise contexts requiring W3C standards

Neo4j + RDF

Modern graph databases often support both models. Neo4j can import RDF data through its neosemantics (n10s) plugin and query it with Cypher. Amazon Neptune supports both property graphs (Gremlin, openCypher) and RDF (SPARQL). You don't always have to choose; hybrid approaches are viable.

Summary: Graph Model Essentials

The graph data model represents a fundamental shift in how we think about connected data. By elevating relationships to first-class citizens, graph databases enable queries and analytics that are impractical—sometimes impossible—in other paradigms. Let's consolidate the essential concepts:

Key Takeaways

• Graphs model connections natively: Unlike relational JOINs or document references, relationships in a native graph database are direct references, so each hop costs the same regardless of total graph size
• Property graphs combine structure and data: Nodes and edges carry labels and properties, enabling rich domain modeling beyond simple connections
• Query languages are purpose-built: Cypher's ASCII-art patterns and Gremlin's traversal steps are designed for graph navigation
• Graph algorithms unlock insights: PageRank, community detection, shortest paths, and similarity measures extract value from connected data
• Native vs non-native matters: Native graph databases use index-free adjacency for optimal traversal performance
• Use cases are relationship-centric: Social networks, fraud detection, recommendations, and knowledge graphs are ideal fits
• Graphs complement relational databases: Many systems use both, relational for transactions and graphs for relationship analytics
• RDF serves semantic domains: When formal ontologies, reasoning, and linked data standards are required, RDF/SPARQL remain relevant

What's Next:

While graphs excel at relationship modeling, some workloads require a different optimization: massive-scale analytical queries across billions of rows. The next page explores the column-family model—a paradigm designed for distributed, write-heavy workloads with analytical access patterns, exemplified by systems like Apache Cassandra and HBase.

Page Complete

You now possess comprehensive knowledge of the graph data model—its theoretical foundations, the property graph paradigm, query languages, algorithms, architectures, and application domains. This understanding enables you to identify when graph databases are the right tool and how to leverage their unique capabilities.

3 / 5

Loading learning content...

Database Management SystemDocument and Other Models

Document and Other Data Models

LevelBeginner

Duration75 mins

TopicDocument and Other Models

3 / 5

Graph Model

Modeling the Connected World

What You Will Learn

Graph Theory Foundations

Before exploring graph databases, we must understand the mathematical foundation: graph theory. A graph is a formal structure consisting of:

Vertices (Nodes): Entities in the domain—people, places, things, concepts. In graph databases, nodes typically represent the nouns of your data model.

Edges (Relationships): Connections between vertices—friendships, purchases, dependencies, containment. Edges are the verbs linking nouns together.

Graph Types:

Undirected Graph: Edges have no direction (friendship: if A knows B, B knows A)
Directed Graph (Digraph): Edges have direction (follows: A follows B doesn't mean B follows A)
Weighted Graph: Edges have numeric weights (distance, cost, strength)
Multigraph: Multiple edges can exist between the same pair of nodes
Hypergraph: Edges can connect more than two nodes simultaneously

graph-representations.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
# Graph Theory: Representation Approaches
 
# 1. Adjacency List (Common in graph databases)
# Each node stores list of its neighbors
Node A: [B, C, D]
Node B: [A, C]
Node C: [A, B, E]
Node D: [A]
Node E: [C]
 
Space: O(V + E)  # Vertices + Edges
Edge lookup: O(degree) # Check if edge exists
Traversal: O(V + E)   # Visit all nodes/edges
 
 
# 2. Adjacency Matrix
# N x N matrix where M[i][j] = 1 if edge exists
 
    A  B  C  D  E
A [ 0  1  1  1  0 ]
B [ 1  0  1  0  0 ]
C [ 1  1  0  0  1 ]
D [ 1  0  0  0  0 ]
E [ 0  0  1  0  0 ]
 
Space: O(V²)        # Always V x V
Edge lookup: O(1)   # Direct array access
Traversal: O(V²)    # Must check entire matrix
 
 
# 3. Edge List
# Simple list of (source, target) pairs
(A, B), (A, C), (A, D), (B, C), (C, E)
 
Space: O(E)          # Just edges
Edge lookup: O(E)    # Must scan list
Use: Batch processing, external algorithms

Key Graph Theory Concepts
Concept	Definition	Significance
Degree	Number of edges connected to a node	High-degree nodes are hubs; impact traversal performance
Path	Sequence of nodes connected by edges	Foundation of traversal and reachability queries
Shortest Path	Path with minimum edges/weight between nodes	Core algorithm for navigation, recommendations
Connected Component	Maximal subset of nodes that are all reachable from one another	Identifies clusters, isolated groups
Cycle	Path that returns to the starting node	Detects circular dependencies, feedback loops
Depth	Number of edges from a starting node	1st-degree friends, 2nd-degree connections
Centrality	Measure of node importance in the graph	Identifies influencers, critical infrastructure
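Several of these concepts can be computed in a few lines. The sketch below assumes an undirected graph stored as an adjacency list; nodes F and G are hypothetical additions so that the graph has a second connected component. Shortest path is measured here in number of edges, using breadth-first search.

```python
from collections import deque

GRAPH = {
    "A": ["B", "C", "D"],
    "B": ["A", "C"],
    "C": ["A", "B", "E"],
    "D": ["A"],
    "E": ["C"],
    "F": ["G"],  # F and G are hypothetical nodes forming a second component
    "G": ["F"],
}


def degree(graph, node):
    """Number of edges attached to a node."""
    return len(graph[node])


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a minimum-edge path or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in graph[current]:
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


def connected_components(graph):
    """Group nodes into sets that are mutually reachable."""
    seen, components = set(), []
    for node in graph:
        if node in seen:
            continue
        component, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(graph[current])
        seen |= component
        components.append(sorted(component))
    return components


print(degree(GRAPH, "A"))
print(shortest_path(GRAPH, "D", "E"))
print(connected_components(GRAPH))
```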

Index-Free Adjacency

The Property Graph Model

The property graph model is the dominant paradigm in modern graph databases. It extends basic graph theory with rich metadata on both nodes and edges:

Model Components:

Nodes (Vertices): Represent entities, each with:
- A unique identifier
- Zero or more labels (types/categories)
- Zero or more properties (key-value pairs)
Relationships (Edges): Connect nodes, each with:
- A unique identifier
- A type (the relationship verb)
- A direction (from source to target)
- Zero or more properties

property-graph-example.cypher
// Neo4j Cypher: Creating a Property Graph
 
// Create Person nodes with properties
CREATE (alice:Person {
  name: 'Alice Chen',
  email: 'alice@example.com',
  joinedDate: date('2020-03-15'),
  skills: ['Python', 'GraphQL', 'Neo4j']
})
 
CREATE (bob:Person {
  name: 'Bob Smith',
  email: 'bob@example.com',
  joinedDate: date('2019-08-01')
})
 
CREATE (charlie:Person {
  name: 'Charlie Davis',
  email: 'charlie@example.com'
})
 
// Create Company nodes
CREATE (acme:Company {
  name: 'Acme Corp',
  industry: 'Technology',
  founded: 2010,
  headquarters: 'San Francisco'
})
 
CREATE (techstart:Company {
  name: 'TechStart Inc',
  industry: 'Technology',
  founded: 2018
})
 
// Create relationships with properties
CREATE (alice)-[:WORKS_AT {
  role: 'Senior Engineer',
  since: 2020,
  department: 'Platform'
}]->(acme)
 
CREATE (bob)-[:WORKS_AT {
  role: 'Product Manager',
  since: 2019
}]->(acme)
 
CREATE (charlie)-[:WORKS_AT {
  role: 'CTO',
  since: 2018,
  founder: true
}]->(techstart)
 
// Social relationships
CREATE (alice)-[:KNOWS {since: 2018, context: 'conference'}]->(bob)
CREATE (alice)-[:KNOWS {since: 2021, context: 'LinkedIn'}]->(charlie)
CREATE (bob)-[:MANAGES]->(alice)
 
// Company relationships
CREATE (acme)-[:ACQUIRED {year: 2022, price: 50000000}]->(techstart)

Visual Representation:

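The property graph from the example above can be visualized as follows (simplified: the MANAGES edge from Bob to Alice and the KNOWS edge from Alice to Charlie are omitted):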

       ┌─────────────┐
       │   Alice     │──WORKS_AT──▶┌────────────┐
       │  :Person    │              │  Acme Corp │
       └──────┬──────┘              │  :Company  │
              │                     └─────┬──────┘
         KNOWS│                           │
              ▼                     ACQUIRED
       ┌─────────────┐                    │
       │    Bob      │──WORKS_AT──────────┤
       │  :Person    │                    ▼
       └─────────────┘              ┌────────────┐
                                    │TechStart   │◀──WORKS_AT──Charlie
                                    │  :Company  │              :Person
                                    └────────────┘

Notice how relationships carry meaning through their types (WORKS_AT, KNOWS, MANAGES, ACQUIRED) and can carry properties such as since or role. A bare edge list of (source, target) pairs cannot express this.
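To make the components concrete, here is a rough in-memory sketch of the property graph model in Python. The class and field names (Node, Relationship, node_id, rel_type) are illustrative assumptions, not the storage format of any particular database, and only part of the example data is reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: frozenset
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    rel_id: int
    rel_type: str
    source: int   # node_id where the relationship starts
    target: int   # node_id where the relationship ends
    properties: dict = field(default_factory=dict)


alice = Node(1, frozenset({"Person"}), {"name": "Alice Chen"})
bob = Node(2, frozenset({"Person"}), {"name": "Bob Smith"})
acme = Node(3, frozenset({"Company"}), {"name": "Acme Corp"})
nodes = {n.node_id: n for n in (alice, bob, acme)}

relationships = [
    Relationship(10, "WORKS_AT", 1, 3, {"role": "Senior Engineer", "since": 2020}),
    Relationship(11, "WORKS_AT", 2, 3, {"role": "Product Manager", "since": 2019}),
    Relationship(12, "KNOWS", 1, 2, {"since": 2018, "context": "conference"}),
]


def employees_of(company_name):
    """Names and roles of people with a WORKS_AT relationship to the company."""
    result = []
    for rel in relationships:
        target = nodes[rel.target]
        if rel.rel_type == "WORKS_AT" and target.properties.get("name") == company_name:
            result.append((nodes[rel.source].properties["name"], rel.properties["role"]))
    return result


print(employees_of("Acme Corp"))
```

Note that the role lives on the relationship rather than on the person, which is what lets one person hold different roles at different companies.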

Labels Enable Type Queries

Graph Query Languages

Cypher (Neo4j):

Cypher uses ASCII-art patterns to express graph queries. Nodes are represented as () and relationships as -[]->. This visual syntax makes queries remarkably intuitive.

cypher-queries.cypher
// MATCH: Find patterns in the graph
 
// Find all people who work at Acme Corp
MATCH (p:Person)-[:WORKS_AT]->(c:Company {name: 'Acme Corp'})
RETURN p.name, p.email
 
// Find friends of friends (2-hop traversal)
MATCH (alice:Person {name: 'Alice Chen'})-[:KNOWS]->(friend)-[:KNOWS]->(foaf)
WHERE foaf <> alice  // Exclude self
RETURN DISTINCT foaf.name AS friendOfFriend
 
// Find shortest path between two people
MATCH path = shortestPath(
  (a:Person {name: 'Alice Chen'})-[:KNOWS*..6]-(b:Person {name: 'Charlie Davis'})
)
RETURN path, length(path) AS degrees
 
// Variable-length relationships (1 to 5 hops)
MATCH (p:Person)-[:WORKS_AT]->(:Company)-[:ACQUIRED*1..5]->(target:Company)
RETURN p.name, target.name
 
// Aggregation: Count employees per company
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN c.name, COUNT(p) AS employeeCount
ORDER BY employeeCount DESC
 
// Pattern with multiple relationship types
MATCH (manager:Person)-[:MANAGES]->(employee:Person)-[:WORKS_AT]->(company:Company)
RETURN manager.name, employee.name, company.name
 
// Creating and updating
MERGE (p:Person {email: 'new@example.com'})
ON CREATE SET p.name = 'New User', p.createdAt = datetime()
ON MATCH SET p.lastSeen = datetime()
RETURN p
 
// Delete with cascade
MATCH (p:Person {email: 'old@example.com'})
DETACH DELETE p  // DETACH removes relationships first
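The MERGE query above behaves like a get-or-create with separate branches for the two outcomes. A minimal single-threaded Python sketch of that logic follows; the dictionary store and the merge_person helper are hypothetical, and a real database additionally has to deal with concurrency and uniqueness constraints.

```python
from datetime import datetime, timezone


def merge_person(store, email, name="New User", now=None):
    """Get-or-create keyed on email, mirroring the two MERGE branches."""
    now = now or datetime.now(timezone.utc)
    person = store.get(email)
    if person is None:
        # Corresponds to ON CREATE SET
        person = {"email": email, "name": name, "createdAt": now}
        store[email] = person
    else:
        # Corresponds to ON MATCH SET
        person["lastSeen"] = now
    return person


people = {}
first = merge_person(people, "new@example.com")   # creates the entry
second = merge_person(people, "new@example.com")  # matches the existing entry
print(len(people), "lastSeen" in second)
```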

Gremlin (Apache TinkerPop):

Gremlin takes an imperative, traversal-based approach. You describe a series of steps that a traverser takes through the graph. This approach offers fine-grained control over traversal behavior.

gremlin-queries.groovy
// Gremlin Traversal Language
 
// Find all people
g.V().hasLabel('Person').values('name')
 
// Find people who work at Acme
g.V().has('Company', 'name', 'Acme Corp')
  .in('WORKS_AT')
  .hasLabel('Person')
  .values('name')
 
// Friends of friends
g.V().has('Person', 'name', 'Alice Chen')
  .out('KNOWS')           // First hop: friends
  .out('KNOWS')           // Second hop: friends of friends
  .dedup()                // Remove duplicates
  .values('name')
 
// Shortest path
g.V().has('Person', 'name', 'Alice Chen')
  .repeat(out('KNOWS').simplePath())
  .until(has('Person', 'name', 'Charlie Davis'))
  .path()
  .limit(1)
 
// Aggregation
g.V().hasLabel('Person')
  .out('WORKS_AT')
  .groupCount()
  .by('name')
 
// Create with properties
g.addV('Person')
  .property('name', 'New User')
  .property('email', 'new@example.com')
 
// Create edge
g.V().has('Person', 'name', 'Alice Chen').as('a')
  .V().has('Person', 'name', 'New User').as('b')
  .addE('KNOWS').from('a').to('b')
  .property('since', 2024)
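The step-by-step style can be imitated in plain Python to show what each step does to the set of traversers. This is a toy model, not the TinkerPop API: the Traversal class is hypothetical, and the KNOWS edges to a fourth person (Dana) are invented so that the two-hop query returns something.

```python
class Traversal:
    """A toy traverser set; each step returns a new Traversal."""

    def __init__(self, graph, current):
        self.graph = graph
        self.current = list(current)

    def out(self, edge_label):
        """Follow outgoing edges with the given label from every current vertex."""
        reached = [
            target
            for vertex in self.current
            for label, target in self.graph["edges"].get(vertex, [])
            if label == edge_label
        ]
        return Traversal(self.graph, reached)

    def dedup(self):
        """Drop repeated vertices while keeping first-seen order."""
        return Traversal(self.graph, dict.fromkeys(self.current))

    def values(self, key):
        return [self.graph["vertices"][vertex][key] for vertex in self.current]


graph = {
    "vertices": {
        "alice": {"name": "Alice Chen"},
        "bob": {"name": "Bob Smith"},
        "charlie": {"name": "Charlie Davis"},
        "dana": {"name": "Dana Lee"},  # hypothetical extra person
    },
    "edges": {
        "alice": [("KNOWS", "bob"), ("KNOWS", "charlie")],
        "bob": [("KNOWS", "dana")],      # hypothetical edge
        "charlie": [("KNOWS", "dana")],  # hypothetical edge
    },
}

two_hops = Traversal(graph, ["alice"]).out("KNOWS").out("KNOWS")
names = two_hops.dedup().values("name")
print(two_hops.current, names)
```

Dana is reached twice, once through Bob and once through Charlie, which is why the friends-of-friends traversal needs a dedup step.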

Cypher vs Gremlin Comparison
Aspect	Cypher	Gremlin
Style	Declarative, pattern-based	Imperative, traversal-based
Learning Curve	Lower (SQL-like patterns)	Higher (method chaining)
Portability	Neo4j (+ some others)	TinkerPop-compatible systems
Expressiveness	Excellent for patterns	Excellent for procedural logic
Visual Clarity	ASCII-art patterns are intuitive	Chain of operations
Optimization	Query planner optimizes	Developer controls traversal
Use Cases	Ad-hoc queries, analytics	Complex traversals, algorithms

GQL Standard

Graph Algorithms

Algorithm Categories:

Common Graph Algorithms
Category	Algorithms	Use Cases
Path Finding	Shortest Path, A*, All Pairs Shortest Path	Navigation, routing, degrees of separation
Centrality	PageRank, Betweenness, Closeness, Degree	Influencer identification, importance ranking
Community Detection	Louvain, Label Propagation, K-means	Customer segmentation, topic clustering
Similarity	Jaccard, Cosine, Node Similarity	Recommendations, duplicate detection
Link Prediction	Common Neighbors, Preferential Attachment	Friend suggestions, connection prediction
Graph Embeddings	Node2Vec, GraphSAGE	Machine learning on graphs, feature generation
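The similarity and link-prediction rows reduce to simple set arithmetic on neighbor sets. The sketch below assumes an undirected graph stored as a dict of sets; the function names are illustrative, and the scores are the textbook definitions rather than the output of any specific library.

```python
def jaccard(neighbors_a, neighbors_b):
    """Shared neighbors divided by all neighbors of either node."""
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 0.0


def common_neighbors(graph, a, b):
    """Link-prediction score: how many neighbors two nodes share."""
    return len(graph[a] & graph[b])


def preferential_attachment(graph, a, b):
    """Link-prediction score: product of the two degrees."""
    return len(graph[a]) * len(graph[b])


GRAPH = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "E"},
    "D": {"A"},
    "E": {"C"},
}

print(jaccard(GRAPH["B"], GRAPH["D"]))
print(common_neighbors(GRAPH, "B", "D"))
print(preferential_attachment(GRAPH, "A", "C"))
```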

graph-algorithms-cypher.cypher
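// Neo4j Graph Data Science Library Examples
// Each call assumes the named in-memory graph ('myGraph', 'purchaseGraph',
// 'roadNetwork') has already been projected, e.g. with gds.graph.project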
 
// PageRank: Find influential nodes
CALL gds.pageRank.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
 
// Community Detection: Find clusters
CALL gds.louvain.stream('myGraph')
YIELD nodeId, communityId
RETURN communityId, 
       collect(gds.util.asNode(nodeId).name) AS members,
       count(*) AS size
ORDER BY size DESC
 
// Betweenness Centrality: Find bridges
CALL gds.betweenness.stream('myGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC
LIMIT 10
 
// Node Similarity: Find similar users (for recommendations)
CALL gds.nodeSimilarity.stream('purchaseGraph')
YIELD node1, node2, similarity
WHERE similarity > 0.5
RETURN gds.util.asNode(node1).name AS user1,
       gds.util.asNode(node2).name AS user2,
       similarity
ORDER BY similarity DESC
 
// Shortest Path with Dijkstra (weighted)
MATCH (source:City {name: 'San Francisco'}), (target:City {name: 'New York'})
CALL gds.shortestPath.dijkstra.stream('roadNetwork', {
    sourceNode: source,
    targetNode: target,
    relationshipWeightProperty: 'distance'
})
YIELD path, totalCost
RETURN [node IN nodes(path) | node.name] AS route, totalCost AS distance

PageRank Deep Dive:

The algorithm iteratively computes scores:

Initialize all nodes with equal score (1/N)
In each iteration, nodes distribute their score equally among outbound neighbors
Each node sums its incoming scores, scaled by a damping factor (commonly 0.85), and adds a "random surfer" term of (1 - damping) / N
Repeat until scores converge
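The four steps can be sketched as a short power iteration. This is a simplified illustration under stated assumptions: a damping factor of 0.85, a convergence tolerance chosen arbitrarily, and dangling nodes (no outbound links) spreading their score evenly over all nodes. Production implementations differ in these details.

```python
def pagerank(out_links, damping=0.85, tolerance=1e-8, max_iterations=100):
    """Iterative PageRank over a dict mapping node -> list of outbound neighbors."""
    nodes = list(out_links)
    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}  # step 1: equal initial score
    for _ in range(max_iterations):
        # Step 3 (part): every node receives the "random surfer" share.
        incoming = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                # Step 2: split the damped score equally among outbound neighbors.
                share = damping * scores[node] / len(targets)
                for target in targets:
                    incoming[target] += share
            else:
                # Assumption: a dangling node spreads its score over all nodes.
                share = damping * scores[node] / n
                for target in nodes:
                    incoming[target] += share
        change = sum(abs(incoming[node] - scores[node]) for node in nodes)
        scores = incoming
        if change < tolerance:  # step 4: stop once scores stabilize
            break
    return scores


# Hypothetical four-node directed graph.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank(links)
print(max(scores, key=scores.get))
```

Node C ranks highest here because three of the four nodes link to it, while D, which nothing links to, keeps only the random-surfer share.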

In social networks, a high PageRank score identifies influencers. In fraud detection, an unexpectedly high PageRank among accounts that mostly link to one another can flag a synthetic review network.

Algorithm Complexity

Graph Database Architectures

Graph databases are implemented in fundamentally different ways, affecting performance characteristics and suitable use cases.

Native Graph Databases:

Built from the ground up for graph storage and processing. Data structures and storage engines are designed specifically for graphs.

Index-free adjacency: Nodes directly reference neighbors
Optimized for traversals: Pointer-chasing through relationships
Examples: Neo4j, Amazon Neptune (partial), TigerGraph

Non-Native (Graph Layer over Other Storage):

Graph query interface built atop relational, document, or key-value storage.

Relationships stored as foreign keys or references
Traversals require index lookups
Examples: Azure Cosmos DB, Amazon Neptune (partial), ArangoDB
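The difference between the two approaches can be illustrated with a toy comparison. In the sketch below, the "native" structure lets each node object hold direct references to its neighbors, while the "non-native" structure keeps a sorted edge table and finds neighbors by binary search, standing in for an index lookup. Both are simplifications with hypothetical names, not how any listed product stores data.

```python
from bisect import bisect_left

EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "E")]


class NativeNode:
    """Index-free adjacency: the node object references its neighbors directly."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []


def build_native(edges):
    nodes = {}
    for source, target in edges:
        a = nodes.setdefault(source, NativeNode(source))
        b = nodes.setdefault(target, NativeNode(target))
        a.neighbors.append(b)
        b.neighbors.append(a)
    return nodes


def build_edge_index(edges):
    """Non-native style: a sorted (source, target) table searched like an index."""
    return sorted(edges + [(target, source) for source, target in edges])


def neighbors_via_index(edge_index, name):
    # The binary search costs O(log E) before any matching row is read.
    position = bisect_left(edge_index, (name, ""))
    result = []
    while position < len(edge_index) and edge_index[position][0] == name:
        result.append(edge_index[position][1])
        position += 1
    return result


native = build_native(EDGES)
edge_index = build_edge_index(EDGES)

print(sorted(node.name for node in native["C"].neighbors))
print(neighbors_via_index(edge_index, "C"))
```

Both calls return the same neighbors; the difference is that the index lookup pays a search cost that grows with the size of the edge table on every hop, while following stored references does not.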

Popular Graph Databases
System	Type	Query Language	Best For
Neo4j	Native graph	Cypher	OLTP graph workloads, enterprise applications
Amazon Neptune	Managed, multi-model	openCypher, Gremlin, SPARQL	AWS-native applications, RDF workloads
TigerGraph	Native, distributed	GSQL	Large-scale analytics, real-time deep traversals
ArangoDB	Multi-model	AQL	Combined document + graph needs
JanusGraph	Layer over storage backends	Gremlin	Massive scale on Cassandra/HBase
Azure Cosmos DB	Multi-model managed	Gremlin	Azure-native, global distribution
Dgraph	Native, distributed	GraphQL, DQL (formerly GraphQL+-)	GraphQL-first applications

Native Graph Advantages

•Roughly constant cost per relationship hop, independent of total graph size
•Optimized storage format for graph patterns
•Efficient deep traversals (many hops)
•Graph-specific caching strategies
•Purpose-built query optimization

Non-Native Advantages

•Leverage existing storage infrastructure
•Multi-model flexibility (graph + document + key-value)
•May scale horizontally more easily
•Reduce operational complexity
•Familiar underlying technology

Scaling Considerations:

Strategies for scaling:

Vertical scaling: Bigger machines with more RAM (Neo4j's primary approach)
Read replicas: Distribute read load; writes to single primary
Sharding by domain: Separate graphs for different data domains
Graph partitioning algorithms: Minimize cross-partition edges (METIS, etc.)
Distributed graph processing: Batch analytics on Spark GraphX, Pregel
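Why partitioning quality matters can be seen by counting the edges that cross partition boundaries (the "edge cut"), since each such edge may turn a local hop into a network call. The sketch below uses hypothetical data, two triangles joined by one bridge edge, and compares a naive placement with one that keeps each cluster together. It only measures the cut; it is not a partitioning algorithm such as METIS.

```python
def edge_cut(edges, assignment):
    """Count edges whose endpoints live on different partitions."""
    return sum(1 for source, target in edges if assignment[source] != assignment[target])


# Two triangles joined by a single bridge edge (C-D); hypothetical data.
EDGES = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("D", "E"), ("E", "F"), ("D", "F"),
    ("C", "D"),
]

# Naive placement: alternate nodes between two partitions.
round_robin = {node: i % 2 for i, node in enumerate("ABCDEF")}
# Cluster-aware placement: keep each triangle on one partition.
by_cluster = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}

print(edge_cut(EDGES, round_robin), edge_cut(EDGES, by_cluster))
```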

The Partitioning Challenge

Use Cases and Applications

Graph databases shine in domains where relationships are central to the data model and queries. Let's examine the major application areas in depth.

Social Networks:

The canonical graph use case. Users, posts, groups, and pages form nodes; follows, friendships, likes, and memberships form edges. Core queries include:

Friend suggestions (friends of friends not yet connected)
News feed ranking (content from close connections)
Influence measurement (PageRank for viral content prediction)
Community detection (find groups of closely-connected users)

social-network-queries.cypher
// Friend Suggestions: Friends of friends Alice doesn't know
MATCH (alice:User {name: 'Alice'})-[:FRIENDS_WITH]->(friend)-[:FRIENDS_WITH]->(suggestion)
WHERE NOT (alice)-[:FRIENDS_WITH]-(suggestion)
  AND suggestion <> alice
WITH suggestion, COUNT(friend) AS mutualFriends
ORDER BY mutualFriends DESC
LIMIT 10
RETURN suggestion.name, mutualFriends
 
// News Feed: Recent posts from connections, weighted by closeness
MATCH (me:User {id: $userId})-[r:FOLLOWS|FRIENDS_WITH]->(creator)-[:POSTED]->(post)
WHERE post.createdAt > datetime() - duration({hours: 24})
WITH post, creator, 
     CASE type(r) WHEN 'FRIENDS_WITH' THEN 2 ELSE 1 END AS weight
RETURN post, creator, weight
ORDER BY weight DESC, post.createdAt DESC
LIMIT 50
 
// Degrees of Separation
MATCH path = shortestPath(
  (user1:User {id: $user1Id})-[:FRIENDS_WITH*]-(user2:User {id: $user2Id})
)
RETURN length(path) AS degrees

Fraud Detection:

Fraudsters create networks of fake accounts, synthetic identities, and colluding actors. These patterns—invisible in row-by-row analysis—stand out in graph analysis:

Identity fraud: Multiple accounts sharing phone numbers, devices, addresses
Money laundering: Complex chains of transactions designed to obscure origin
Synthetic identity: Fabricated identities connecting to establish legitimacy
Collusion rings: Groups of accounts that only interact with each other
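The identity-fraud pattern, accounts linked through shared identifiers, amounts to finding connected components. A minimal sketch using union-find follows; the account names and identifiers are invented for illustration, and a real system would also weigh how suspicious each shared attribute is.

```python
from collections import defaultdict


def find(parent, item):
    """Return the representative of item's group, compressing the path."""
    while parent[item] != item:
        parent[item] = parent[parent[item]]  # path halving
        item = parent[item]
    return item


def linked_account_groups(account_identifiers):
    """Group accounts that share any identifier, directly or transitively."""
    parent = {account: account for account in account_identifiers}
    first_owner = {}
    for account, identifiers in account_identifiers.items():
        for identifier in identifiers:
            if identifier in first_owner:
                root_a = find(parent, account)
                root_b = find(parent, first_owner[identifier])
                parent[root_a] = root_b  # merge the two groups
            else:
                first_owner[identifier] = account
    groups = defaultdict(list)
    for account in account_identifiers:
        groups[find(parent, account)].append(account)
    # Only groups with more than one account are interesting here.
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Hypothetical accounts with phone and device identifiers.
ACCOUNTS = {
    "acct1": {"phone:555-0101", "device:X1"},
    "acct2": {"phone:555-0101"},
    "acct3": {"device:X1", "device:Y9"},
    "acct4": {"phone:555-0199"},
    "acct5": {"device:Y9"},
}

print(linked_account_groups(ACCOUNTS))
```

acct2 and acct5 share nothing directly, yet they land in the same group through acct1 and acct3. That transitive link is the kind of connection row-by-row analysis tends to miss.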

Graph Database Use Cases by Domain
Domain	Entities (Nodes)	Relationships (Edges)	Key Queries
Social Networks	Users, Posts, Groups	Friends, Follows, Likes	Suggestions, feeds, influence
Fraud Detection	Accounts, Devices, Transactions	Owns, TransferredTo, SharedWith	Pattern detection, ring identification
Recommendations	Users, Products, Categories	Purchased, Viewed, SimilarTo	Collaborative filtering, similarity
Knowledge Graphs	Entities, Concepts, Facts	IsA, HasProperty, RelatedTo	Question answering, inference
Network/IT Ops	Servers, Services, Databases	ConnectsTo, DependsOn, Runs	Impact analysis, root cause
Supply Chain	Suppliers, Products, Facilities	Supplies, TransportsTo, Contains	Risk analysis, traceability
Life Sciences	Genes, Proteins, Diseases	Regulates, Causes, Treats	Drug discovery, pathway analysis

Knowledge Graphs:

Knowledge graphs structure factual knowledge as entity-relationship networks. Google's Knowledge Graph powers the information boxes in search results. Enterprises build knowledge graphs for:

Semantic search: Understanding query intent through entity relationships
Question answering: Traversing relationships to find answers
Data integration: Linking entities across disparate data sources
AI/ML enhancement: Providing structured context for language models

The Impact Analysis Use Case

Graph vs Relational Trade-offs

Graph databases aren't universally better than relational databases—they're optimized for different access patterns. Understanding when to choose each is essential.

When Graphs Win:

Deep Traversals: Queries involving 3+ relationship hops. Each additional self-JOIN in SQL multiplies the intermediate result set, so cost climbs steeply with depth.
Relationship-Centric Queries: "Find all paths", "find similar", "what connects X to Y"—queries where relationships are the focus, not just navigation.
Schema Flexibility: Graphs easily accommodate new relationship types without schema migrations.
Variable-Length Paths: "All ancestors", "all transitive dependencies"—recursive queries that require CTEs or stored procedures in SQL.
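For comparison, here is what the relational route to a variable-length path looks like: a recursive CTE that collects all transitive dependencies of a package. The sketch uses Python's built-in sqlite3 module; the depends_on table and its rows are hypothetical.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.executescript(
    """
    CREATE TABLE depends_on (package TEXT, dependency TEXT);
    INSERT INTO depends_on VALUES
        ('app', 'web'), ('app', 'auth'),
        ('web', 'http'), ('auth', 'crypto'), ('http', 'crypto');
    """
)

# The recursive member keeps joining newly found names back to the table
# until no new dependencies appear; UNION removes duplicates.
TRANSITIVE_DEPENDENCIES = """
WITH RECURSIVE closure(name) AS (
    SELECT dependency FROM depends_on WHERE package = ?
    UNION
    SELECT d.dependency
    FROM depends_on AS d
    JOIN closure AS c ON d.package = c.name
)
SELECT name FROM closure ORDER BY name;
"""

dependencies = [row[0] for row in connection.execute(TRANSITIVE_DEPENDENCIES, ("app",))]
print(dependencies)
```

The query works, but the recursion has to be spelled out by hand, whereas a graph query language expresses the same intent with a variable-length pattern.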

Choose Graph When

•Queries traverse 3+ relationship levels
•Relationships have properties/types
•Path-finding and pattern matching are core
•Schema evolves rapidly with new connections
•Domain is inherently networked (social, fraud, knowledge)

Choose Relational When

•Queries primarily filter/aggregate within tables
•Relationships are simple and fixed
•Strong transactional integrity required
•Reporting and BI tools expect SQL
•Domain is inherently tabular (orders, inventory)

graph-vs-sql-comparison.sql
-- Find friends of friends of friends in SQL (3-hop)
-- This references a potentially huge friendships table three times (two self-JOINs)
 
SELECT DISTINCT f3.friend_id
FROM friendships f1
JOIN friendships f2 ON f1.friend_id = f2.user_id
JOIN friendships f3 ON f2.friend_id = f3.user_id
WHERE f1.user_id = 123
  AND f3.friend_id != 123
  AND f3.friend_id NOT IN (SELECT friend_id FROM friendships WHERE user_id = 123);
 
-- Without suitable indexes, each JOIN may scan millions of rows
-- Intermediate results multiply with every additional hop
-- Index usage becomes complex with self-joins
 
 
-- A comparable query in Cypher (directed, to mirror the SQL above):
-- MATCH (u:User {id: 123})-[:FRIEND*3]->(fofofo:User)
-- WHERE NOT (u)-[:FRIEND]->(fofofo) AND u <> fofofo
-- RETURN DISTINCT fofofo
 
-- Graph execution:
-- Start at node 123
-- Traverse FRIEND relationships 3 times (pointer chasing)
-- Filter results
-- Time complexity: roughly O(average_friends^3), driven by the neighborhood visited rather than total graph size

When Relational Wins:

Aggregation-Heavy Workloads: SUM, AVG, GROUP BY across millions of rows. Relational databases are highly optimized for set-based operations.
Simple Relationships: If relationships are just foreign keys traversed 1-2 levels, SQL handles this efficiently.
ACID Requirements: While graph databases support transactions, relational databases have decades of battle-tested ACID implementations.
Ecosystem Integration: BI tools, ETL pipelines, and reporting infrastructure assume SQL.

Polyglot Persistence

RDF and Triple Stores

The Triple Model:

RDF represents data as triples: subject-predicate-object statements.

(Subject) --[Predicate]--> (Object)

Examples:
:Alice :worksAt :AcmeCorp
:Alice :hasEmail "alice@example.com"
:AcmeCorp :locatedIn :SanFrancisco
:SanFrancisco :isA :City

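Triples chain together to form graphs. Unlike property graphs, standard RDF has no direct way to attach properties to a relationship; such data must be expressed through additional triples (reification), or through the RDF-star extension where it is supported.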
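A triple store can be approximated as a set of three-element tuples with wildcard matching. The sketch below reuses the example triples and adds a hypothetical intermediate node (:employment1) to show how a "role" on the employment relationship ends up as extra triples. The match helper is illustrative and far simpler than a SPARQL engine.

```python
TRIPLES = {
    (":Alice", ":worksAt", ":AcmeCorp"),
    (":Alice", ":hasEmail", '"alice@example.com"'),
    (":AcmeCorp", ":locatedIn", ":SanFrancisco"),
    (":SanFrancisco", ":isA", ":City"),
    # Describing the employment itself takes an intermediate node
    # (":employment1" is a hypothetical identifier) and extra triples.
    (":employment1", ":employee", ":Alice"),
    (":employment1", ":employer", ":AcmeCorp"),
    (":employment1", ":role", '"Senior Engineer"'),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        triple for triple in triples
        if (subject is None or triple[0] == subject)
        and (predicate is None or triple[1] == predicate)
        and (obj is None or triple[2] == obj)
    )


print(match(TRIPLES, subject=":Alice", predicate=":worksAt"))
print(match(TRIPLES, subject=":employment1"))
```

In a property graph the role would be a single property on one WORKS_AT relationship; here it takes three triples and an extra node.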

rdf-and-sparql.ttl
# RDF in Turtle syntax
 
@prefix : <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
 
:alice a foaf:Person ;
    foaf:name "Alice Chen" ;
    foaf:mbox <mailto:alice@example.com> ;
    :worksAt :acme ;
    :joinedDate "2020-03-15"^^xsd:date .
 
:bob a foaf:Person ;
    foaf:name "Bob Smith" ;
    :worksAt :acme ;
    foaf:knows :alice .
 
:acme a :Company ;
    foaf:name "Acme Corp" ;
    :industry "Technology" ;
    :headquarters :sanfrancisco .
 
:sanfrancisco a :City ;
    foaf:name "San Francisco" ;
    :country :usa .
 
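# SPARQL Query: Find all people who work at companies in San Francisco
# (SPARQL is a separate language from Turtle and declares prefixes with PREFIX)
PREFIX : <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?personName ?companyName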
WHERE {
    ?person a foaf:Person ;
            foaf:name ?personName ;
            :worksAt ?company .
    ?company foaf:name ?companyName ;
             :headquarters ?city .
    ?city foaf:name "San Francisco" .
}

Property Graph vs RDF Comparison
Aspect	Property Graph	RDF/Triple Store
Data Model	Nodes + Edges with properties	Subject-Predicate-Object triples
Relationship Properties	First-class support	Requires reification (complex)
Schema	Flexible labels/types	Formal ontologies (OWL, RDFS)
Query Language	Cypher, Gremlin	SPARQL
Standards	ISO GQL (ISO/IEC 39075, published 2024)	W3C standards (mature)
Use Cases	Application data, analytics	Knowledge graphs, linked data, semantic web
Inference	Application-level	Native reasoning with ontologies

When to Choose RDF:

Linked Open Data: Publishing data that connects to global knowledge (DBpedia, Wikidata)
Ontology-driven domains: Healthcare, legal, scientific domains with formal vocabularies
Reasoning requirements: Automatic inference based on class hierarchies and rules
Standards compliance: Government and enterprise contexts requiring W3C standards

Neo4j + RDF

Summary: Graph Model Essentials

Key Takeaways

•Graphs model connections natively — Unlike relational JOINs or document references, graph relationships are direct pointers enabling O(1) traversal per hop
•Property graphs combine structure and data — Nodes and edges carry labels and properties, enabling rich domain modeling beyond simple connections
•Query languages are pattern-based — Cypher's ASCII-art patterns and Gremlin's traversal steps are purpose-built for graph navigation
•Graph algorithms unlock insights — PageRank, community detection, shortest paths, and similarity measures extract value from connected data
•Native vs non-native matters — Native graph databases use index-free adjacency for optimal traversal performance
•Use cases are relationship-centric — Social networks, fraud detection, recommendations, and knowledge graphs are ideal fits
•Graphs complement relational databases — Many systems use both: relational for transactions, graphs for relationship analytics
•RDF serves semantic domains — When formal ontologies, reasoning, and linked data standards are required, RDF/SPARQL remain relevant
