Graph Databases - Learning Module

Loading content...

0/273

Relationship-Heavy Queries: Traversals, Patterns, and Algorithms

Unlocking the Graph's True Power

A graph database storing nodes and relationships is merely a storage system. The true power of graphs emerges when you traverse those relationships—finding paths, matching patterns, discovering communities, and computing influence. These operations would be prohibitively expensive in relational databases, but they're the native language of graph engines.

Consider queries that are trivial in graphs but nightmarish in SQL:

"Find all paths between Alice and Bob through mutual friends"
"Detect circular dependency chains in our software modules"
"Identify the most influential nodes in our network"
"Find triangles of fraud—accounts connected through suspicious transactions"

These relationship-heavy queries are where graph databases demonstrate their decisive advantage. Understanding how to express and optimize them is essential for any system designer working with connected data.

What You Will Learn

This page covers advanced graph query patterns: variable-length path traversals, shortest path algorithms, subgraph pattern matching, and analytical graph algorithms. You'll learn Cypher patterns for each, understand their computational complexity, and know when to apply them in production systems.

Variable-Length Path Traversals

The most distinguishing feature of graph query languages is the ability to express variable-length paths—traversals where the number of hops is unknown at query time. In SQL, this requires recursive CTEs with complex termination logic. In Cypher, it's a single character: *.

Basic Syntax:

(start)-[:RELATIONSHIP*]->(end)      // Any number of hops
(start)-[:RELATIONSHIP*1..3]->(end)  // 1 to 3 hops
(start)-[:RELATIONSHIP*2]->(end)     // Exactly 2 hops
(start)-[:RELATIONSHIP*..5]->(end)   // Up to 5 hops (includes 0)

Key Concepts:

Min..Max bounds: Always specify upper bounds in production
Path variable: Capture the traversed path: p = (a)-[*]->(b)
Relationship list: Access relationships in path: relationships(p)
Node list: Access nodes in path: nodes(p)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// Find everyone within 3 hops of Alice in social network
MATCH (alice:User {name: 'Alice'})-[:KNOWS*1..3]->(connection)
RETURN DISTINCT connection.name, 
       count(*) AS path_count
 
// Capture and analyze paths
MATCH path = (alice:User {name: 'Alice'})-[:KNOWS*1..3]->(bob:User {name: 'Bob'})
RETURN path,
       length(path) AS hops,
       [n IN nodes(path) | n.name] AS node_names,
       [r IN relationships(path) | type(r)] AS rel_types
 
// Find dependency chains in software modules
MATCH chain = (app:Module {name: 'MyApp'})-[:DEPENDS_ON*]->(lib:Module)
WHERE lib.deprecated = true
RETURN app.name,
       lib.name AS deprecated_dependency,
       length(chain) AS chain_depth
 
// Organizational hierarchy - find all reports (direct and indirect)
MATCH (ceo:Employee {title: 'CEO'})<-[:REPORTS_TO*]-(employee)
RETURN employee.name, employee.title
ORDER BY employee.lastName
 
// Limit depth for performance
MATCH (start:Node)-[:CONNECTED*1..6]->(end:Node)  // Max 6 hops
WHERE start.id = 123
RETURN DISTINCT end.id

The Explosion Problem

Unbounded traversals (* without upper limit) can explore exponentially growing result sets. If each node has 100 connections, 6 hops explores 100^6 = 1 trillion potential paths. Always bound traversals in production: use *1..4 not *, and apply LIMIT to results.

Filtering During Traversal:

You can filter which relationships and nodes are traversed, pruning the search space:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
// Only traverse active relationships
MATCH path = (start)-[:KNOWS*1..4 {active: true}]->(end)
WHERE start.id = 'user-1'
RETURN path
 
// Filter nodes during traversal (Cypher 5+)
MATCH path = (start)
             ((intermediate WHERE intermediate.status = 'active')
              -[:CONNECTED]->)+
             (end)
WHERE start.id = 'node-1'
RETURN path
 
// Traditional approach: filter after, then check path validity
MATCH path = (start)-[:CONNECTED*1..5]->(end)
WHERE start.id = 'node-1'
  AND all(n IN nodes(path)[1..-1] WHERE n.status = 'active')  // Intermediate nodes only
RETURN path
 
// Multiple relationship types in path
MATCH (a)-[:KNOWS|WORKS_WITH*1..3]->(b)  // Either relationship type
WHERE a.name = 'Alice'
RETURN DISTINCT b.name

Shortest Path Algorithms

Finding the shortest path between nodes is a fundamental graph operation. Neo4j provides built-in functions for efficient path finding, leveraging algorithms like bidirectional BFS.

Key Functions:

Function	Purpose	Algorithm
`shortestPath()`	Single shortest path	Bidirectional BFS
`allShortestPaths()`	All paths of minimum length	Bidirectional BFS
`apoc.algo.dijkstra()`	Weighted shortest path	Dijkstra's
`apoc.algo.aStar()`	Weighted with heuristic	A*

Unweighted Shortest Path:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// Find single shortest path (any path of minimum length)
MATCH path = shortestPath(
  (alice:User {name: 'Alice'})-[:KNOWS*]-(bob:User {name: 'Bob'})
)
RETURN path, 
       length(path) AS degrees_of_separation,
       [n IN nodes(path) | n.name] AS connection_chain
 
// Find ALL shortest paths (may be multiple of same length)
MATCH paths = allShortestPaths(
  (alice:User {name: 'Alice'})-[:KNOWS*]-(bob:User {name: 'Bob'})
)
RETURN paths,
       length(paths) AS path_length
 
// Constrain path length for performance
MATCH path = shortestPath(
  (a:Station {name: 'Kings Cross'})-[:CONNECTS*..15]-(b:Station {name: 'Heathrow'})
)
RETURN path
 
// Filter during shortest path search
MATCH path = shortestPath(
  (a:City {name: 'London'})-[:ROAD* {toll: false}]-(b:City {name: 'Edinburgh'})
)
RETURN path  // Only toll-free roads
 
// Shortest path with relationship type filter
MATCH path = shortestPath(
  (a:User {id: $userId})-[:FOLLOWS|KNOWS*..6]-(b:User {id: $targetId})
)
RETURN path

Weighted Shortest Path (Dijkstra):

When relationships have costs (distance, time, weight), use Dijkstra's algorithm to find the minimum-cost path:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
// Using APOC Dijkstra for weighted paths
MATCH (start:Station {name: 'London'}), (end:Station {name: 'Edinburgh'})
CALL apoc.algo.dijkstra(start, end, 'CONNECTS', 'distance')
YIELD path, weight
RETURN path, 
       weight AS total_distance,
       [n IN nodes(path) | n.name] AS stations
 
// A* algorithm with heuristic (for geographic data)
MATCH (start:Location {name: 'Paris'}), (end:Location {name: 'Berlin'})
CALL apoc.algo.aStar(
  start, 
  end, 
  'ROAD',           // Relationship type
  'distance',       // Weight property
  'latitude',       // Latitude property for heuristic
  'longitude'       // Longitude property for heuristic
) YIELD path, weight
RETURN path, weight
 
// Multiple weighted criteria (compute custom weight)
MATCH (start:City {name: 'NYC'}), (end:City {name: 'LA'})
CALL apoc.algo.dijkstraWithDefaultWeight(
  start, 
  end, 
  'FLIGHT', 
  'cost',      // Primary weight property
  1.0          // Default weight if property missing
) YIELD path, weight
RETURN path, weight AS total_cost

Bidirectional Search

Neo4j's shortestPath uses bidirectional BFS—simultaneously searching from both start and end nodes. This dramatically reduces the search space compared to unidirectional search, especially in dense graphs. The search terminates when wavefronts meet.

Subgraph Pattern Matching

Beyond paths between two nodes, graph databases excel at finding patterns—specific configurations of nodes and relationships that match a template. This is the essence of graph query languages: "Find all subgraphs matching this shape."

Common Pattern Types:

Triangles: Three nodes all connected to each other
Stars: Central node connected to multiple peripheral nodes
Chains: Sequential node connections
Cycles: Closed loops of any length
Motifs: Domain-specific patterns (fraud rings, recommendation chains)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
// Triangle pattern: three mutual friends
MATCH (a:User)-[:KNOWS]->(b:User)-[:KNOWS]->(c:User)-[:KNOWS]->(a)
WHERE a.name < b.name AND b.name < c.name  // Avoid duplicates
RETURN a.name, b.name, c.name AS triangle_members
 
// Star pattern: hub with many connections
MATCH (hub:User)-[:FOLLOWS]->(follower:User)
WITH hub, count(follower) AS follower_count
WHERE follower_count > 1000
RETURN hub.name AS influencer, follower_count
ORDER BY follower_count DESC
 
// Fork pattern: shared connections
MATCH (a:User)-[:KNOWS]->(shared:User)<-[:KNOWS]-(b:User)
WHERE a.name = 'Alice' AND b.name = 'Bob'
  AND a <> b  // Different people
RETURN shared.name AS mutual_friend
 
// Chain pattern: referral chain
MATCH chain = (referrer:User)-[:REFERRED*1..5]->(referee:User)
WHERE referrer.name = 'Alice'
RETURN chain, 
       length(chain) AS chain_length,
       [n IN nodes(chain) | n.name] AS referral_chain
 
// Recommendation pattern: friend bought, you might like
MATCH (me:User {name: 'Alice'})-[:KNOWS]->(friend:User)
      -[:PURCHASED]->(product:Product)
WHERE NOT (me)-[:PURCHASED]->(product)  // I haven't bought it
RETURN product.name, 
       count(friend) AS friends_who_bought
ORDER BY friends_who_bought DESC
LIMIT 10

Cycle Detection:

Finding cycles is crucial for dependency analysis, fraud detection, and data integrity checks:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
// Find circular dependencies in modules
MATCH cycle = (m:Module)-[:DEPENDS_ON*]->(m)
RETURN [n IN nodes(cycle) | n.name] AS circular_dependency
 
// Detect fraud rings: accounts in transaction cycles
MATCH cycle = (a:Account)-[:TRANSFERRED_TO*2..6]->(a)
WHERE sum([r IN relationships(cycle) | r.amount]) > 10000
RETURN cycle
 
// Check for circular references in hierarchies (data integrity)
MATCH (e:Employee)-[:REPORTS_TO*]->(e)
RETURN e.name AS cycle_participant, 
       'Circular reporting chain detected' AS issue
 
// Negative cycles in weighted graphs (arbitrage detection)
// Requires specific algorithm - APOC or GDS
CALL gds.bellmanFord.stream({
  nodeProjection: 'Currency',
  relationshipProjection: {
    EXCHANGE: {
      type: 'EXCHANGE_RATE',
      properties: ['rate']
    }
  },
  sourceNode: 'USD'
})
YIELD nodeId, totalCost
// If negative cycle exists, arbitrage opportunity

Pattern Complexity

Pattern matching complexity depends on pattern size and graph structure. Simple patterns (triangles, pairs) are fast. Complex patterns with many nodes and relationships require careful optimization—use MATCH... WITH... MATCH to break into stages, filter aggressively, and consider approximate algorithms for very large patterns.

Graph Algorithms with GDS

Beyond pattern matching, graph databases support analytical graph algorithms—computations that analyze the entire graph structure. Neo4j provides these through the Graph Data Science (GDS) library.

Algorithm Categories:

Category	Purpose	Examples
Centrality	Identify important nodes	PageRank, Betweenness, Degree
Community Detection	Find clusters/groups	Louvain, Label Propagation
Pathfinding	Find routes	Dijkstra, A*, BFS, DFS
Similarity	Compare nodes	Jaccard, Cosine, Euclidean
Link Prediction	Predict new edges	Common Neighbors, Adamic-Adar
Node Embedding	Vector representations	Node2Vec, GraphSAGE

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
// Create in-memory graph projection for algorithms
CALL gds.graph.project(
  'socialGraph',                    // Projection name
  'User',                           // Node labels
  'FOLLOWS',                        // Relationship type
  { relationshipProperties: 'weight' }
)
 
// PageRank: identify influential nodes
CALL gds.pageRank.stream('socialGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS user, score AS influence
ORDER BY score DESC
LIMIT 10
 
// Betweenness Centrality: nodes bridging communities
CALL gds.betweenness.stream('socialGraph')
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS user, score AS bridges_communities
ORDER BY score DESC
LIMIT 10
 
// Degree Centrality: most connected nodes
CALL gds.degree.stream('socialGraph', { orientation: 'REVERSE' })
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS user, score AS follower_count
ORDER BY score DESC
 
// Write results back to graph
CALL gds.pageRank.write('socialGraph', {
  writeProperty: 'pageRank'
})
YIELD nodePropertiesWritten

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
// Louvain community detection (modularity optimization)
CALL gds.louvain.stream('socialGraph')
YIELD nodeId, communityId
RETURN communityId,
       count(*) AS members,
       collect(gds.util.asNode(nodeId).name)[..5] AS sample_members
ORDER BY members DESC
 
// Label Propagation (faster, less optimal)
CALL gds.labelPropagation.stream('socialGraph')
YIELD nodeId, communityId
WITH communityId, collect(gds.util.asNode(nodeId)) AS members
RETURN communityId, size(members) AS community_size
ORDER BY community_size DESC
 
// Weakly Connected Components
CALL gds.wcc.stream('socialGraph')
YIELD nodeId, componentId
RETURN componentId, count(*) AS component_size
ORDER BY component_size DESC
 
// Write community assignments back to graph
CALL gds.louvain.write('socialGraph', {
  writeProperty: 'community'
})
YIELD communityCount, modularity

Similarity and Link Prediction:

These algorithms are fundamental for recommendation systems:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
// Node Similarity: find users with similar follow patterns
CALL gds.nodeSimilarity.stream('socialGraph')
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS user1,
       gds.util.asNode(node2).name AS user2,
       similarity
ORDER BY similarity DESC
LIMIT 20
 
// Link Prediction: predict likely future connections
// Adamic-Adar: weighted common neighbors
MATCH (a:User {name: 'Alice'}), (b:User)
WHERE NOT (a)-[:FOLLOWS]->(b) AND a <> b
CALL gds.alpha.linkprediction.adamicAdar.stream({
  node1: a,
  node2: b,
  relationshipQuery: 'FOLLOWS'
})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node2).name AS recommended_follow, similarity
ORDER BY similarity DESC
LIMIT 10
 
// Common Neighbors prediction
MATCH (a:User {name: 'Alice'})
MATCH (a)-[:FOLLOWS]->(:User)<-[:FOLLOWS]-(recommended:User)
WHERE NOT (a)-[:FOLLOWS]->(recommended) AND a <> recommended
RETURN recommended.name,
       count(*) AS common_followees
ORDER BY common_followees DESC
LIMIT 10

Algorithm Memory Requirements

GDS algorithms create in-memory graph projections. For large graphs, memory requirements can exceed heap size. Use gds.graph.project with node/relationship filters to work on subgraphs, or upgrade to machines with more RAM for full-graph analysis.

Optimizing Relationship-Heavy Queries

Complex graph queries can become expensive. Here are optimization strategies for relationship-heavy operations:

1. Bound Your Traversals:

Never use unbounded variable-length paths in production:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
// ❌ Bad: Unbounded traversal (potential explosion)
MATCH (a)-[:KNOWS*]->(b)
RETURN b
 
// ✅ Good: Bounded traversal
MATCH (a)-[:KNOWS*1..4]->(b)
RETURN b
 
// ✅ Better: Bounded with LIMIT
MATCH (a:User {id: $userId})-[:KNOWS*1..4]->(b)
RETURN DISTINCT b.name
LIMIT 100
 
// ✅ Best: Strategic termination with aggregation
MATCH (a:User {id: $userId})-[:KNOWS*1..3]->(b)
WITH b, count(*) AS relevance
ORDER BY relevance DESC
LIMIT 50
RETURN b.name, relevance

2. Filter Early, Traverse Late:

Apply WHERE clauses as early as possible to prune the search space:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
// ❌ Bad: Filter after expensive traversal
MATCH (a:User)-[:KNOWS*1..4]->(b:User)-[:PURCHASED]->(p:Product)
WHERE a.tier = 'premium' AND p.category = 'electronics'
RETURN p
 
// ✅ Good: Filter before traversal
MATCH (a:User {tier: 'premium'})
MATCH (a)-[:KNOWS*1..4]->(b:User)
MATCH (b)-[:PURCHASED]->(p:Product {category: 'electronics'})
RETURN DISTINCT p
 
// ✅ Better: Use WITH to stage and filter
MATCH (a:User {tier: 'premium'})
WITH a LIMIT 1000  // Reduce starting set if needed
MATCH (a)-[:KNOWS*1..3]->(b:User)
WITH DISTINCT b
MATCH (b)-[:PURCHASED]->(p:Product)
WHERE p.price > 100
RETURN p.name, count(*) AS buyers
ORDER BY buyers DESC

3. Use DISTINCT Strategically:

Variable-length paths can reach the same node via different routes. Use DISTINCT to eliminate duplicates early, but be aware of memory implications:

1
2
3
4
5
6
7
8
9
10
11
12
13
// Without DISTINCT: may process same node multiple times
MATCH (a:User {id: 1})-[:KNOWS*1..4]->(friend)
RETURN friend.name  // Bob might appear 100 times via different paths
 
// With DISTINCT: unique results, but evaluated at end
MATCH (a:User {id: 1})-[:KNOWS*1..4]->(friend)
RETURN DISTINCT friend.name
 
// Better: DISTINCT mid-query to reduce downstream work
MATCH (a:User {id: 1})-[:KNOWS*1..4]->(friend)
WITH DISTINCT friend  // Deduplicate early
MATCH (friend)-[:PURCHASED]->(product)
RETURN friend.name, collect(product.name) AS purchases

4. Analyze with PROFILE:

Always profile complex queries to understand execution:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
// Profile to see actual execution
PROFILE
MATCH (a:User {email: 'alice@example.com'})-[:KNOWS*1..3]->(friend)
MATCH (friend)-[:PURCHASED]->(product:Product)
RETURN DISTINCT product.name, count(friend) AS recommenders
ORDER BY recommenders DESC
LIMIT 10
 
// Look for in the output:
// - DB HITS: Lower is better
// - Estimated vs Actual rows: Big differences indicate stale statistics
// - NodeByLabelScan: Should use NodeIndexSeek instead
// - Expand(All) operators: Shows traversal patterns
 
// Update statistics if estimates are off
CALL db.prepareForReplanning()

Query Caching

Neo4j caches query plans based on query structure. Use $parameters instead of literal values to benefit from plan caching. WHERE name = $name caches; WHERE name = 'Alice' creates a new plan each time.

Graph Aggregation Patterns

Graph queries often require aggregating results across paths and relationships. Cypher provides powerful aggregation functions:

Key Aggregation Functions:

count(): Count rows or distinct values
collect(): Gather values into a list
sum(), avg(), min(), max(): Numeric aggregations
percentileCont(), percentileDisc(): Percentile calculations
stDev(), stDevP(): Standard deviation

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
// Count relationships per node
MATCH (u:User)-[r:FOLLOWS]->()
RETURN u.name, count(r) AS following_count
ORDER BY following_count DESC
 
// Collect into lists
MATCH (u:User)-[:PURCHASED]->(p:Product)
RETURN u.name, 
       collect(p.name) AS purchased_products,
       sum(p.price) AS total_spent
 
// Nested aggregations with collect
MATCH (category:Category)<-[:IN_CATEGORY]-(product:Product)<-[:PURCHASED]-(user:User)
WITH category, product, count(user) AS buyers
RETURN category.name,
       collect({product: product.name, buyers: buyers})[..5] AS top_products
ORDER BY category.name
 
// Aggregating path properties
MATCH path = (a:City)-[:ROAD*]->(b:City)
WHERE a.name = 'London' AND b.name = 'Edinburgh'
WITH path,
     reduce(total = 0, r IN relationships(path) | total + r.distance) AS total_dist
RETURN path, total_dist
ORDER BY total_dist
LIMIT 5
 
// Group by path length
MATCH (a:User {name: 'Alice'})-[path:KNOWS*1..4]->(b:User)
RETURN length(path) AS degrees,
       count(DISTINCT b) AS reachable_people
ORDER BY degrees

Using REDUCE for Path Calculations:

The reduce function iterates over lists, accumulating a result—essential for path weight calculations:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Sum weights along a path
MATCH path = (a:Station)-[:CONNECTS*]->(b:Station)
WHERE a.name = 'Kings Cross' AND b.name = 'Heathrow'
WITH path,
     reduce(time = 0, r IN relationships(path) | time + r.minutes) AS travel_time
RETURN [n IN nodes(path) | n.name] AS route,
       travel_time
ORDER BY travel_time
LIMIT 5
 
// Multiply probabilities along path
MATCH path = (start:Process)-[:LEADS_TO*]->(end:Outcome)
WHERE start.name = 'Initial'
WITH path,
     reduce(prob = 1.0, r IN relationships(path) | prob * r.probability) AS total_prob
RETURN [n IN nodes(path) | n.name] AS steps,
       total_prob
ORDER BY total_prob DESC
 
// Collect node properties along path
MATCH path = (a:User)-[:KNOWS*2..3]->(b:User)
WHERE a.name = 'Alice'
RETURN [n IN nodes(path) | n.name] AS connection_chain,
       reduce(ages = [], n IN nodes(path) | ages + n.age) AS ages_in_chain

Memory Considerations

Large collect() results consume heap memory. For very large result sets, use pagination (SKIP/LIMIT) or streaming APIs instead of collecting everything. Consider apoc.periodic.iterate for batch processing.

Real-World Query Patterns

Let's examine query patterns from production systems across common graph use cases:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Mutual friends between two users
MATCH (a:User {id: $userId1})-[:FRIENDS]-(mutual:User)-[:FRIENDS]-(b:User {id: $userId2})
RETURN mutual.name, mutual.avatar
 
// Friend suggestions (friends of friends who aren't already friends)
MATCH (me:User {id: $userId})-[:FRIENDS]-(friend)-[:FRIENDS]-(suggested)
WHERE NOT (me)-[:FRIENDS]-(suggested) 
  AND me <> suggested
RETURN suggested.name, 
       count(friend) AS mutual_friends,
       collect(friend.name)[..3] AS through
ORDER BY mutual_friends DESC
LIMIT 10
 
// Engagement-weighted feed
MATCH (me:User {id: $userId})-[f:FOLLOWS]->(creator:User)-[:POSTED]->(post:Post)
WHERE post.timestamp > datetime() - duration('P7D')
WITH post, creator, 
     CASE WHEN f.engagement = 'high' THEN 3
          WHEN f.engagement = 'medium' THEN 2
          ELSE 1 END AS weight
RETURN post.id, creator.name, post.content
ORDER BY post.timestamp DESC, weight DESC
LIMIT 50

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
// Account sharing detection: multiple logins from same device
MATCH (account1:Account)-[:LOGGED_IN_FROM]->(device:Device)<-[:LOGGED_IN_FROM]-(account2:Account)
WHERE account1 <> account2
WITH device, collect(DISTINCT account1.id) AS accounts
WHERE size(accounts) > 3
RETURN device.fingerprint, accounts AS suspicious_accounts
 
// Transaction ring detection
MATCH ring = (a:Account)-[:TRANSFERRED*3..6]->(a)
WHERE all(r IN relationships(ring) WHERE r.amount > 5000)
  AND all(r IN relationships(ring) WHERE r.timestamp > datetime() - duration('P30D'))
RETURN [n IN nodes(ring) | n.id] AS ring_members,
       size(nodes(ring)) AS ring_size,
       reduce(total = 0, r IN relationships(ring) | total + r.amount) AS total_transferred
 
// First-party fraud: synthetic identities
MATCH (identity:Identity)
OPTIONAL MATCH (identity)-[:HAS_SSN]->(ssn:SSN)<-[:HAS_SSN]-(other1:Identity)
OPTIONAL MATCH (identity)-[:HAS_ADDRESS]->(addr:Address)<-[:HAS_ADDRESS]-(other2:Identity)
OPTIONAL MATCH (identity)-[:HAS_PHONE]->(phone:Phone)<-[:HAS_PHONE]-(other3:Identity)
WITH identity,
     count(DISTINCT other1) AS shared_ssn,
     count(DISTINCT other2) AS shared_address,
     count(DISTINCT other3) AS shared_phone
WHERE shared_ssn > 0 OR shared_address + shared_phone > 3
RETURN identity.name, shared_ssn, shared_address, shared_phone
ORDER BY shared_ssn DESC, shared_address + shared_phone DESC

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
// Question answering: relationships between concepts
MATCH path = shortestPath((c1:Concept {name: $concept1})-[*..5]-(c2:Concept {name: $concept2}))
RETURN path,
       [r IN relationships(path) | type(r)] AS relationship_chain
 
// Semantic similarity via shared categories
MATCH (a:Article {id: $articleId})-[:TAGGED]->(tag:Tag)<-[:TAGGED]-(similar:Article)
WITH similar, count(tag) AS shared_tags
WHERE shared_tags >= 3
RETURN similar.title, shared_tags
ORDER BY shared_tags DESC
LIMIT 10
 
// Inferred relationships (reasoning)
MATCH (person:Person)-[:WORKS_AT]->(company:Company)
      -[:LOCATED_IN]->(city:City)-[:IN_COUNTRY]->(country:Country)
RETURN person.name, company.name, city.name, country.name AS likely_country
// Person's likely country inferred through company location

Query Templates

Build a library of parameterized query templates for your domain. Common patterns—friend suggestions, shortest paths, fraud triangles—should be optimized once and reused. Document expected cardinalities and performance characteristics.

Summary: Mastering Graph Traversals

Relationship-heavy queries are where graph databases demonstrate their decisive advantage over relational systems. Let's consolidate the key patterns and principles:

Key Takeaways

•Variable-length paths — Use *1..N syntax for multi-hop traversals. Always bound with upper limits in production to prevent explosion.
•Shortest path algorithms — shortestPath() for unweighted, apoc.algo.dijkstra() for weighted. Bidirectional search makes this efficient.
•Pattern matching — Find triangles, stars, cycles, and domain-specific motifs. Break complex patterns into stages with WITH.
•Graph algorithms (GDS) — PageRank for influence, Louvain for communities, similarity for recommendations. Project graphs to memory first.
•Optimize with bounds — Filter early, traverse late. Use DISTINCT mid-query. Profile with PROFILE to understand execution.
•Aggregation — count(), collect(), reduce() for path calculations. Watch memory on large collect() operations.
•Real-world patterns — Friend suggestions, fraud rings, knowledge graphs. Build reusable parameterized templates.
•Parameters for caching — Use $param not literals for query plan reuse.

What's Next:

We'll explore real-world graph database applications—social networks, recommendation engines, fraud detection, and knowledge graphs. These case studies demonstrate how the patterns you've learned apply to production systems at scale.

Traversal Patterns Mastered

You now understand the full spectrum of graph query patterns—from simple path traversals to advanced algorithms. These patterns are the tools that unlock the graph's power: finding connections, detecting patterns, and analyzing networks in ways that would be prohibitive in other database models.