Graph Databases: Modeling Connected Data

Level: Advanced

Duration: 75 mins

Topic: Graph Databases

Graph Queries

The Power of Graph Traversal

The true power of graph databases isn't in storing connected data—it's in querying that data in ways that would be impossibly complex or prohibitively expensive in other paradigms.

Consider what becomes trivial with graph queries:

• Finding all paths between two entities through any network topology
• Detecting circular dependencies in complex systems
• Computing influence scores across relationship networks
• Identifying fraud patterns as subgraph matches
• Recommending products based on purchase graph similarity

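These operations are natural expressions of graph algorithms and translate directly into queries. When a query starts from an indexed anchor and bounds its depth, its cost depends on the part of the graph it touches rather than on total dataset size, so it can run in milliseconds even on large graphs.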

What You Will Learn

By the end of this page, you will master advanced graph query patterns—traversal algorithms, path analysis, pattern matching, graph algorithms for analytics, and optimization techniques that distinguish production-grade graph queries from naive implementations.

Traversal Fundamentals

At the heart of every graph query is traversal—systematically visiting nodes by following relationships. Understanding traversal mechanics is essential for writing efficient queries.

Traversal Components:

• Starting Points (Anchors): Where traversal begins, typically nodes matched by indexed properties
• Expansion Pattern: Which relationships to follow and in what direction
• Filters: Conditions to include/exclude paths during traversal
• Depth Control: How many hops to traverse
• Collection: What to return from traversed paths
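The five components above can be sketched outside the database as a depth-limited breadth-first expansion. This is an illustrative Python sketch, not how Neo4j is implemented: the `traverse` function, its `keep` filter, and the `knows` adjacency map are all hypothetical names. It also visits each node once, whereas Cypher enumerates every matching path.

```python
from collections import deque

def traverse(graph, start, max_depth, keep=lambda node: True):
    """Breadth-first expansion from an anchor node, up to max_depth hops.

    graph maps each node to a list of outgoing neighbours (the expansion pattern).
    keep is a filter applied to every node reached; rejected nodes are not expanded.
    Returns {node: hop distance} for every node collected, excluding the anchor.
    """
    distance = {start: 0}                 # anchor: where traversal begins
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_depth:   # depth control: stop expanding here
            continue
        for neighbour in graph.get(node, []):
            if neighbour in distance or not keep(neighbour):
                continue                  # already collected, or filtered out
            distance[neighbour] = distance[node] + 1
            queue.append(neighbour)
    del distance[start]                   # collection: everything except the anchor
    return distance

# Hypothetical KNOWS graph used only for this example
knows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}

print(traverse(knows, "alice", 2))  # {'bob': 1, 'carol': 1, 'dave': 2}
```

Passing a `keep` predicate plays the role of a WHERE clause on intermediate nodes: a rejected node is neither returned nor expanded.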

traversal-patterns.cypher

Cypher

// ========================================
// DIRECTED TRAVERSAL
// ========================================
 
// Outgoing relationships only
MATCH (manager:Person)-[:MANAGES]->(report:Person)
RETURN manager.name, report.name
 
// Incoming relationships only
MATCH (employee:Person)<-[:HIRED]-(company:Company)
RETURN employee.name, company.name
 
// Either direction (performance note: can be slower)
MATCH (a:Person)-[:KNOWS]-(b:Person)
RETURN a.name, b.name
 
// ========================================
// MULTI-HOP TRAVERSAL (Fixed Depth)
// ========================================
 
// Exactly 2 hops
MATCH (a:Person)-[:KNOWS]->()-[:KNOWS]->(c:Person)
WHERE a <> c
RETURN DISTINCT a.name AS person, c.name AS friendOfFriend
 
// Same as above, more readable with named nodes
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person)
WHERE a <> c
RETURN a.name, b.name AS via, c.name
 
// ========================================
// VARIABLE-LENGTH TRAVERSAL
// ========================================
 
// 1 to 3 hops
MATCH path = (start:Person {id: "P001"})-[:KNOWS*1..3]->(end:Person)
RETURN end.name, length(path) AS distance
 
// 2 to 5 hops (skip immediate neighbors)
MATCH (start:Person {id: "P001"})-[:KNOWS*2..5]-(connected:Person)
RETURN DISTINCT connected.name
 
// Any number of hops (use with caution!)
MATCH (start:Category {name: "Root"})-[:SUBCATEGORY*]->(leaf:Category)
WHERE NOT (leaf)-[:SUBCATEGORY]->()
RETURN leaf.name AS leafCategory
 
// Up to N hops (safer than unbounded)
MATCH (a:Node)-[:LINKED*..10]->(b:Node)
RETURN a.id, b.id
 
// ========================================
// TRAVERSAL WITH FILTERING
// ========================================
 
// Filter on intermediate nodes
MATCH path = (start:Person)-[:KNOWS*1..4]-(end:Person)
WHERE ALL(node IN nodes(path) WHERE node.verified = true)
RETURN path
 
// Filter on relationships
MATCH (start:Person)-[rels:KNOWS*1..3]->(end:Person)
WHERE ALL(r IN rels WHERE r.strength > 0.5)
RETURN start.name, end.name, 
       [r IN rels | r.strength] AS strengths
 
// Filter on path properties (depth is bounded so the search stays finite)
MATCH path = (a:City)-[:ROAD*..8]-(b:City)
WHERE REDUCE(total = 0, r IN relationships(path) | total + r.distance) < 500
RETURN [n IN nodes(path) | n.name] AS route,
       REDUCE(d = 0, r IN relationships(path) | d + r.distance) AS totalDistance
 
// ========================================
// TRAVERSAL UNIQUENESS
// ========================================
 
// Default: relationships are unique per path (each edge visited once per path)
// The same node may still appear more than once in a path
MATCH path = (a)-[*1..5]-(b)
RETURN path
 
// Force unique nodes in path
MATCH path = (a)-[*1..5]-(b)
WHERE size(nodes(path)) = size(apoc.coll.toSet(nodes(path)))
RETURN path

Unbounded Traversal Warning

Never use unbounded variable-length patterns ([:REL*]) in production without constraints. On a large graph, this can traverse millions of paths, exhausting memory and timing out. Always specify maximum depth (*1..6) or use timeout mechanisms. For exploration, use LIMIT to cap result count.
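To see why depth limits matter, the sketch below counts paths the way a variable-length pattern enumerates them: every path of 1 to `max_depth` hops that does not reuse an edge. The `count_paths` helper and the six-node fully connected graph are hypothetical, chosen only to make the growth visible.

```python
def count_paths(graph, start, max_depth):
    """Count paths of 1..max_depth hops from start, never reusing an edge within a path."""
    def extend(node, used_edges, depth):
        if depth == max_depth:
            return 0
        total = 0
        for neighbour in graph[node]:
            edge = (node, neighbour)
            if edge in used_edges:
                continue
            # One path ends at this neighbour, plus every longer path through it
            total += 1 + extend(neighbour, used_edges | {edge}, depth + 1)
        return total
    return extend(start, frozenset(), 0)

# A fully connected directed graph of 6 nodes (assumption for illustration)
nodes = range(6)
complete = {n: [m for m in nodes if m != n] for n in nodes}

for depth in (1, 2, 3):
    print(depth, count_paths(complete, 0, depth))
# 1 5
# 2 30
# 3 150
```

Even with six nodes, each extra hop multiplies the number of paths roughly by the node's degree, which is why an unbounded pattern on a dense graph can exhaust memory.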

Path Analysis

Paths—ordered sequences of nodes connected by relationships—are first-class citizens in graph queries. Analyzing paths enables route finding, dependency tracking, and connection analysis.

path-analysis.cypher

Cypher

// ========================================
// SHORTEST PATH QUERIES
// ========================================
 
// Single shortest path (any one shortest)
MATCH path = shortestPath(
    (alice:Person {name: "Alice"})-[:KNOWS*]-(bob:Person {name: "Bob"})
)
RETURN path, length(path) AS hops
 
// All shortest paths (same minimum length); each row is one path
MATCH path = allShortestPaths(
    (start:Person {id: "P001"})-[:KNOWS*]-(end:Person {id: "P099"})
)
RETURN path, length(path) AS hops
 
// Shortest path with relationship filter
MATCH path = shortestPath(
    (a:Person)-[:KNOWS|WORKS_WITH*]-(b:Person)
)
WHERE ALL(r IN relationships(path) WHERE r.trust > 0.7)
RETURN path
 
// Weighted shortest path (using relationship properties)
// Requires GDS library for Dijkstra
MATCH (source:City {name: "NYC"}), (target:City {name: "LA"})
CALL gds.shortestPath.dijkstra.stream('roadNetwork', {
    sourceNode: source,
    targetNode: target,
    relationshipWeightProperty: 'distance'
})
YIELD index, sourceNode, targetNode, totalCost, nodeIds, costs, path
RETURN 
    totalCost AS totalDistance,
    [nodeId IN nodeIds | gds.util.asNode(nodeId).name] AS cities
 
// ========================================
// ALL PATHS ENUMERATION
// ========================================
 
// All paths between two nodes (limited)
MATCH path = (a:Person {id: "P001"})-[:KNOWS*1..5]-(b:Person {id: "P010"})
RETURN path
ORDER BY length(path)
LIMIT 100
 
// Paths with specific characteristics
MATCH path = (start:Account)-[:TRANSFERRED*2..4]->(end:Account)
WHERE start.id = "suspicious"
  AND end.id = "offshore"
  AND ALL(n IN nodes(path) WHERE n.flagged = false)  // Only clean intermediaries
RETURN path,
       [n IN nodes(path) | n.id] AS accountChain,
       REDUCE(total = 0, r IN relationships(path) | total + r.amount) AS totalTransferred
 
// ========================================
// PATH EXTRACTION AND MANIPULATION
// ========================================
 
// Extract nodes from path
MATCH path = (a:Person)-[:KNOWS*1..3]-(b:Person)
WITH path, nodes(path) AS pathNodes, relationships(path) AS pathRels
RETURN 
    [n IN pathNodes | n.name] AS people,
    [r IN pathRels | r.since] AS connectionDates,
    length(path) AS hops
 
// Path as ordered list
MATCH path = (root:Category {name: "Electronics"})-[:SUBCATEGORY*]->(leaf:Category)
WHERE NOT (leaf)-[:SUBCATEGORY]->()
RETURN [n IN nodes(path) | n.name] AS hierarchy,
       length(path) AS depth
 
// ========================================
// CYCLE DETECTION
// ========================================
 
// Find cycles through a starting node (start is not revisited mid-path)
MATCH path = (start:Node {id: "N001"})-[:LINKED*2..10]->(start)
WHERE ALL(n IN nodes(path)[1..-1] WHERE n <> start)  // Exclude both endpoints from the check
RETURN path
 
// Detect circular dependencies
MATCH cycle = (pkg:Package)-[:DEPENDS_ON*]->(pkg)
RETURN [n IN nodes(cycle) | n.name] AS circularDependency
 
// Find cycles of specific length
MATCH path = (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person)-[:KNOWS]->(a)
WHERE a <> b AND b <> c AND a <> c
RETURN [a.name, b.name, c.name] AS triangle
 
// ========================================
// PATH COMPARISON AND AGGREGATION
// ========================================
 
// Group by path length
MATCH path = (start:Person {id: "P001"})-[:KNOWS*1..6]-(end:Person)
WITH length(path) AS distance, count(DISTINCT end) AS reachable
RETURN distance, reachable
ORDER BY distance
 
// Find nodes reachable by a path of each length (1 to 5 hops)
MATCH path = (start:Person {id: "P001"})-[:KNOWS*1..5]->(reachable:Person)
WITH length(path) AS distance, reachable
RETURN distance, collect(DISTINCT reachable.name) AS reachableAtDistance
ORDER BY distance

Shortest Path vs All Paths Performance

shortestPath() uses bidirectional BFS and terminates upon finding the first path—highly efficient. allShortestPaths() continues to find all paths of that minimum length. General path enumeration (-[*]->) explores exhaustively and can be extremely expensive. Choose the minimal pattern that answers your question.
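The idea behind bidirectional BFS can be sketched in a few lines: grow one frontier from each endpoint, always expanding the smaller one, and stop at the first level where the two sides touch. This is a simplified illustration for an undirected, unweighted graph. The function name and sample graph are hypothetical, and it is not Neo4j's actual implementation.

```python
def bidirectional_hops(graph, source, target):
    """Hop count of a shortest path in an undirected graph, or None if disconnected."""
    if source == target:
        return 0
    dist_s, dist_t = {source: 0}, {target: 0}
    frontier_s, frontier_t = [source], [target]
    while frontier_s and frontier_t:
        # Expand the smaller frontier by one full level
        from_source = len(frontier_s) <= len(frontier_t)
        if from_source:
            frontier, dist, other = frontier_s, dist_s, dist_t
        else:
            frontier, dist, other = frontier_t, dist_t, dist_s
        next_frontier = []
        best = None
        for node in frontier:
            for neighbour in graph.get(node, []):
                if neighbour in other:
                    # The two searches meet on this edge
                    candidate = dist[node] + 1 + other[neighbour]
                    best = candidate if best is None else min(best, candidate)
                elif neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    next_frontier.append(neighbour)
        if best is not None:
            return best
        if from_source:
            frontier_s = next_frontier
        else:
            frontier_t = next_frontier
    return None

# Hypothetical undirected graph: each edge is listed in both directions
roads = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
    "z": [],
}

print(bidirectional_hops(roads, "a", "e"))  # 3
print(bidirectional_hops(roads, "a", "z"))  # None
```

Each side only needs to explore about half the distance, so far fewer nodes are touched than in a one-sided search or a full path enumeration.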

Pattern Matching

Pattern matching is the declarative heart of graph queries. You describe the subgraph shape you're looking for, and the database finds all instances. This is particularly powerful for complex topological queries.

pattern-matching.cypher

Cypher

// ========================================
// COMPLEX PATTERNS
// ========================================
 
// Triangle pattern (three mutually connected people)
MATCH (a:Person)-[:KNOWS]-(b:Person),
      (a)-[:KNOWS]-(c:Person),
      (b)-[:KNOWS]-(c)
WHERE id(a) < id(b) AND id(b) < id(c)  // Report each triangle once
RETURN DISTINCT a.name, b.name, c.name
 
// Star pattern (hub node)
MATCH (hub:Person)<-[:FOLLOWS]-(follower:Person)
WITH hub, count(follower) AS followerCount
WHERE followerCount > 1000
RETURN hub.name AS influencer, followerCount
ORDER BY followerCount DESC
 
// Diamond pattern (two paths between same endpoints)
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(d:Person),
      (a)-[:KNOWS]->(c:Person)-[:KNOWS]->(d)
WHERE b <> c
RETURN a.name, b.name, c.name, d.name
 
// Chain with multiple relationship types
MATCH (customer:Customer)-[:PLACED]->(order:Order)-[:CONTAINS]->(product:Product),
      (product)-[:BELONGS_TO]->(category:Category),
      (customer)-[:LIVES_IN]->(city:City)
RETURN customer.name, product.name, category.name, city.name
 
// ========================================
// NEGATIVE PATTERNS (NOT EXISTS)
// ========================================
 
// Find people NOT connected to anyone
MATCH (p:Person)
WHERE NOT (p)-[:KNOWS]-()
RETURN p.name AS isolatedPerson
 
// Find products never purchased
MATCH (p:Product)
WHERE NOT ()-[:PURCHASED]->(p)
RETURN p.name AS unsoldProduct
 
// Find users with no login in the last 30 days
MATCH (u:User)
WHERE NOT EXISTS {
    MATCH (u)-[l:LOGGED_IN]->()
    WHERE l.date >= date() - duration('P30D')
}
RETURN u.email
 
// Complex negation: Find A connected to B but NOT via C
MATCH (a:Person {name: "Alice"})-[:KNOWS]->(b:Person)
WHERE NOT EXISTS {
    MATCH (a)-[:KNOWS]->(c:Person {blocked: true})-[:KNOWS]->(b)
}
RETURN b.name AS cleanConnection
 
// ========================================
// OPTIONAL PATTERNS
// ========================================
 
// Left outer join equivalent
MATCH (e:Employee)
OPTIONAL MATCH (e)-[:MANAGES]->(report:Employee)
WITH e, collect(report.name) AS reports  // collect() skips nulls, so no reports gives []
RETURN e.name,
       CASE WHEN size(reports) = 0 THEN "Individual Contributor"
            ELSE "Manager" END AS role,
       reports
 
// Multiple optional patterns
MATCH (p:Person)
OPTIONAL MATCH (p)-[:WORKS_FOR]->(c:Company)
OPTIONAL MATCH (p)-[:STUDIED_AT]->(u:University)
RETURN p.name, c.name AS employer, u.name AS almaMater
 
// ========================================
// CONDITIONAL PATTERNS (CASE in patterns)
// ========================================
 
// Different aggregations based on pattern existence
MATCH (p:Product)
OPTIONAL MATCH (p)<-[r:REVIEWED]-()
WITH p, count(r) AS reviewCount
RETURN p.name,
       CASE 
           WHEN reviewCount = 0 THEN "No reviews"
           WHEN reviewCount < 10 THEN "Few reviews"
           WHEN reviewCount < 100 THEN "Popular"
           ELSE "Highly reviewed"
       END AS reviewStatus
 
// ========================================
// EXISTENTIAL PATTERNS
// ========================================
 
// WHERE EXISTS (subquery)
MATCH (p:Person)
WHERE EXISTS {
    MATCH (p)-[:PURCHASED]->(:Product {category: "Electronics"})
}
RETURN p.name AS electronicsBuyer
 
// COUNT pattern in WHERE
MATCH (p:Person)
WHERE COUNT {
    MATCH (p)-[:PURCHASED]->(:Product)
} > 5
RETURN p.name AS frequentBuyer
 
// ========================================
// UNION AND INTERSECTION PATTERNS
// ========================================
 
// UNION: Either pattern matches
MATCH (p:Person)-[:WORKS_FOR]->(:Company {name: "TechCorp"})
RETURN p.name AS employee, "TechCorp" AS source
UNION
MATCH (p:Person)-[:WORKS_FOR]->(:Company {name: "InnovateCo"})
RETURN p.name AS employee, "InnovateCo" AS source
 
// INTERSECTION: Both patterns must match
MATCH (p:Person)-[:WORKS_FOR]->(:Company {industry: "Tech"})
MATCH (p)-[:GRADUATED_FROM]->(:University {ranking: "Top 10"})
RETURN p.name AS eliteEngineer

Pattern vs Subquery

A plain pattern (MATCH (a)-[:REL]-(b)) is the simplest way to express basic matching. Reach for subquery syntax (EXISTS { }, COUNT { }) when you need to test for existence without multiplying rows, count matches inside a condition, express complex negation, or combine several independent patterns. The planner optimizes both forms, so compare them with PROFILE rather than assuming one is faster.
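What the database does for a declarative triangle pattern can be approximated by hand to show the work being delegated. The sketch below treats relationships as undirected and uses an ordering rule (a < b < c) to report each triangle once, which is the same trick as comparing node ids in Cypher. The `triangles` function and the integer-keyed adjacency map are illustrative assumptions.

```python
def triangles(adjacency):
    """Return each triangle once, as a tuple (a, b, c) with a < b < c.

    adjacency maps every node to the set of its neighbours (undirected).
    """
    found = []
    for a in sorted(adjacency):
        for b in sorted(adjacency[a]):
            if b <= a:
                continue
            # Any common neighbour of a and b closes a triangle
            for c in sorted(adjacency[a] & adjacency[b]):
                if c > b:
                    found.append((a, b, c))
    return found

# Hypothetical KNOWS graph: 1-2, 1-3, 2-3, 2-4, 3-4
knows = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {2, 3},
}

print(triangles(knows))  # [(1, 2, 3), (2, 3, 4)]
```

In Cypher the same result comes from describing the shape; the planner chooses the iteration order and the set intersections.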

Aggregation and Analytics

While traversal is graph databases' strength, aggregation enables summarizing patterns across the graph—counting connections, measuring path costs, and computing network statistics.

aggregation-analytics.cypher

Cypher

// ========================================
// BASIC AGGREGATIONS
// ========================================
 
// Count and grouping
MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
RETURN c.name AS company, 
       count(p) AS employees,
       collect(p.name) AS employeeNames
 
// Statistics
MATCH (p:Person)-[r:RATED]->(m:Movie)
RETURN m.title,
       count(r) AS ratingCount,
       avg(r.score) AS avgRating,
       min(r.score) AS minRating,
       max(r.score) AS maxRating,
       stDev(r.score) AS ratingStdDev
 
// Percentiles (percentileDisc returns an actual value from the data; percentileCont interpolates)
MATCH (p:Product)
RETURN percentileDisc(p.price, 0.5) AS medianPrice,
       percentileDisc(p.price, 0.9) AS p90Price,
       percentileDisc(p.price, 0.99) AS p99Price
 
// ========================================
// DEGREE ANALYSIS
// ========================================
 
// Find degree distribution
MATCH (p:Person)
WITH p, COUNT { (p)-[:KNOWS]-() } AS degree  // size() on a pattern is not supported in Neo4j 5
RETURN degree, count(*) AS nodeCount
ORDER BY degree
 
// Find high-degree nodes (hubs)
MATCH (p:Person)
WITH p, COUNT { (p)<-[:FOLLOWS]-() } AS inDegree,
        COUNT { (p)-[:FOLLOWS]->() } AS outDegree
WHERE inDegree > 1000
RETURN p.name, inDegree, outDegree,
       toFloat(outDegree) / inDegree AS followBackRatio
ORDER BY inDegree DESC
 
// ========================================
// PATH-BASED AGGREGATIONS
// ========================================
 
// Average path length
MATCH path = (a:Person)-[:KNOWS*1..6]-(b:Person)
WHERE a.id = "P001" AND a <> b
RETURN avg(length(path)) AS avgPathLength,
       min(length(path)) AS shortestPath,
       max(length(path)) AS longestPath
 
// Total path weight (depth bounded to keep the enumeration finite)
MATCH path = (start:City {name: "NYC"})-[:ROAD*..10]->(end:City {name: "Chicago"})
WITH path, 
     REDUCE(dist = 0, r IN relationships(path) | dist + r.distance) AS totalDist
WHERE totalDist < 1000
RETURN [n IN nodes(path) | n.name] AS route, totalDist
ORDER BY totalDist
LIMIT 10
 
// ========================================
// TEMPORAL AGGREGATIONS
// ========================================
 
// Activity over time
MATCH (u:User)-[v:VISITED]->(p:Page)
WITH u, date(v.timestamp) AS visitDate, count(*) AS pageViews
RETURN u.id, visitDate, pageViews
ORDER BY u.id, visitDate
 
// Time-windowed aggregation
MATCH (o:Order)
WHERE o.createdAt >= datetime() - duration('P30D')
WITH date(o.createdAt) AS orderDate, sum(o.total) AS dailyRevenue
RETURN orderDate, dailyRevenue
ORDER BY orderDate
 
// ========================================
// RANKING AND TOP-N
// ========================================
 
// Top N per group (using subquery)
MATCH (c:Category)
CALL {
    WITH c
    MATCH (c)<-[:IN_CATEGORY]-(p:Product)
    RETURN p
    ORDER BY p.sales DESC
    LIMIT 3
}
RETURN c.name, collect(p.name) AS topProducts
 
// Running totals (ordered aggregation)
MATCH (o:Order)
WHERE o.customerId = "C001"
WITH o ORDER BY o.createdAt
WITH collect(o) AS orders
UNWIND range(0, size(orders)-1) AS idx
RETURN orders[idx].orderId, 
       orders[idx].total AS orderTotal,
       REDUCE(running = 0, i IN range(0, idx) | running + orders[i].total) AS runningTotal
 
// ========================================
// COHORT ANALYSIS
// ========================================
 
// User cohorts by signup month
MATCH (u:User)-[:PURCHASED]->(p:Product)
WITH u, date.truncate('month', u.createdAt) AS cohortMonth
WITH cohortMonth, count(DISTINCT u) AS cohortSize
RETURN cohortMonth, cohortSize
ORDER BY cohortMonth
 
// Retention: first purchase vs repeat
MATCH (c:Customer)
OPTIONAL MATCH (c)-[p1:PURCHASED]->(first:Product)
WITH c, min(p1.date) AS firstPurchase
OPTIONAL MATCH (c)-[p2:PURCHASED]->(repeat:Product)
WHERE p2.date > firstPurchase
RETURN 
    date.truncate('month', firstPurchase) AS cohort,
    count(DISTINCT c) AS totalCustomers,
    count(DISTINCT CASE WHEN p2 IS NOT NULL THEN c END) AS repeatCustomers,
    toFloat(count(DISTINCT CASE WHEN p2 IS NOT NULL THEN c END)) / count(DISTINCT c) AS retentionRate

Aggregation Performance

collect() materializes its results in memory, so cap the input (for example with LIMIT in a preceding WITH) when you don't need every item. REDUCE is evaluated row by row and cannot use an index, so filter on indexed properties first and compute over the smaller result. For large-scale analytics, consider the GDS (Graph Data Science) library, which runs on optimized in-memory projections.
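The degree-distribution query above is really two grouping passes: count relationships per node, then count nodes per degree. A rough Python equivalent makes the two steps explicit. The edge list and function name are made up for illustration, and nodes with no relationships never appear because only edges are scanned.

```python
from collections import Counter

def degree_distribution(edges):
    """Map each degree to the number of nodes having that degree (undirected edges)."""
    degree = Counter()
    for a, b in edges:          # pass 1: relationships per node
        degree[a] += 1
        degree[b] += 1
    histogram = Counter(degree.values())   # pass 2: nodes per degree
    return dict(sorted(histogram.items()))

# Hypothetical KNOWS relationships
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]

print(degree_distribution(edges))  # {1: 1, 2: 2, 3: 1}
```

Only counters are kept in memory here, never the lists of neighbours, which is the same reason counting is cheaper than collect() followed by size().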

Graph Algorithms

Graph databases natively support classic graph algorithms that would require complex implementations in other systems. Neo4j's Graph Data Science (GDS) library provides production-ready implementations of 60+ algorithms.

Essential Graph Algorithms
Category	Algorithms	Use Cases
Centrality	PageRank, Betweenness, Closeness, Degree	Influence ranking, bottleneck detection
Community Detection	Louvain, Label Propagation, K-Means	Clustering users, topic modeling
Path Finding	Dijkstra, A*, BFS/DFS, All Pairs	Routing, dependency resolution
Similarity	Jaccard, Cosine, Euclidean	Recommendations, duplicate detection
Link Prediction	Adamic-Adar, Common Neighbors	Friend suggestions, knowledge graphs
Embeddings	Node2Vec, FastRP, GraphSAGE	ML feature generation, transfer learning
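As a concrete instance of the Similarity row, Jaccard similarity compares two neighbour sets by dividing the size of their intersection by the size of their union. The sketch below is a plain-Python illustration with invented purchase sets; returning 0.0 for two empty sets is a convention chosen here, not something the table specifies.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections: |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen for this sketch when both sets are empty
    return len(a & b) / len(union)

# Hypothetical purchase histories
customer1 = {"laptop", "mouse", "keyboard"}
customer2 = {"mouse", "keyboard", "monitor"}

print(jaccard(customer1, customer2))  # 0.5
```

Two shared products out of four distinct products gives 0.5, so this pair would pass a threshold such as similarity > 0.4 but not similarity > 0.5.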

graph-algorithms.cypher

Cypher

// ========================================
// CENTRALITY ALGORITHMS
// ========================================
 
// PageRank - Identify influential nodes
CALL gds.pageRank.stream('socialGraph')
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS person, score
RETURN person.name, score AS influence
ORDER BY score DESC
LIMIT 20
 
// Betweenness Centrality - Find bridge nodes
CALL gds.betweenness.stream('networkGraph')
YIELD nodeId, score
WITH gds.util.asNode(nodeId) AS node, score
WHERE score > 0
RETURN node.name, score AS bridgeScore
ORDER BY score DESC
LIMIT 10
 
// Degree Centrality - Connection count
CALL gds.degree.stream('socialGraph', {
    orientation: 'UNDIRECTED'
})
YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS person, 
       toInteger(score) AS connections
ORDER BY score DESC
LIMIT 10
 
// ========================================
// COMMUNITY DETECTION
// ========================================
 
// Louvain - Detect communities
CALL gds.louvain.stream('socialGraph')
YIELD nodeId, communityId
WITH communityId, collect(gds.util.asNode(nodeId).name) AS members
RETURN communityId, size(members) AS size, members[0..5] AS sampleMembers
ORDER BY size DESC
LIMIT 10
 
// Label Propagation - Fast community detection
CALL gds.labelPropagation.stream('socialGraph')
YIELD nodeId, communityId
RETURN communityId, count(*) AS communitySize
ORDER BY communitySize DESC
 
// ========================================
// PATH FINDING ALGORITHMS
// ========================================
 
// Dijkstra Shortest Path (weighted)
MATCH (source:City {name: "NYC"}), (target:City {name: "LA"})
CALL gds.shortestPath.dijkstra.stream('roadNetwork', {
    sourceNode: source,
    targetNode: target,
    relationshipWeightProperty: 'distance'
})
YIELD totalCost, path
RETURN totalCost, 
       [n IN nodes(path) | n.name] AS route
 
// A* with heuristic (for geographic routing)
MATCH (source:City {name: "Seattle"}), (target:City {name: "Miami"})
CALL gds.shortestPath.astar.stream('flightNetwork', {
    sourceNode: source,
    targetNode: target,
    relationshipWeightProperty: 'duration',
    latitudeProperty: 'lat',
    longitudeProperty: 'lon'
})
YIELD totalCost, nodeIds
RETURN totalCost AS totalFlightTime,
       [id IN nodeIds | gds.util.asNode(id).name] AS route
 
// All Pairs Shortest Path (runs over the whole projection; filter to hubs afterwards)
CALL gds.allShortestPaths.stream('roadNetwork', {
    relationshipWeightProperty: 'distance'
})
YIELD sourceNodeId, targetNodeId, distance
WITH gds.util.asNode(sourceNodeId) AS src,
     gds.util.asNode(targetNodeId) AS dst,
     distance
WHERE src.isHub = true AND dst.isHub = true AND src <> dst
RETURN src.name AS source,
       dst.name AS target,
       distance AS totalCost
 
// ========================================
// SIMILARITY ALGORITHMS
// ========================================
 
// Node Similarity (Jaccard)
CALL gds.nodeSimilarity.stream('purchaseGraph')
YIELD node1, node2, similarity
WHERE similarity > 0.5
RETURN gds.util.asNode(node1).name AS customer1,
       gds.util.asNode(node2).name AS customer2,
       similarity
ORDER BY similarity DESC
LIMIT 20
 
// K-Nearest Neighbors
CALL gds.knn.stream('productGraph', {
    topK: 10,
    nodeProperties: {embedding: 'COSINE'}  // the metric is set per property
})
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).name AS product,
       gds.util.asNode(node2).name AS similarProduct,
       similarity
 
// ========================================
// NODE EMBEDDINGS (for ML)
// ========================================
 
// FastRP - Fast Random Projection embeddings
CALL gds.fastRP.stream('knowledgeGraph', {
    embeddingDimension: 128,
    iterationWeights: [0.0, 0.5, 1.0]
})
YIELD nodeId, embedding
RETURN gds.util.asNode(nodeId).name AS entity,
       embedding[0..5] AS embeddingPreview
 
// Node2Vec - Random-walk-based embeddings (not a graph neural network)
CALL gds.node2vec.write('socialGraph', {
    embeddingDimension: 64,
    walkLength: 20,
    walksPerNode: 10,
    writeProperty: 'n2vEmbedding'
})
YIELD nodePropertiesWritten
RETURN nodePropertiesWritten

GDS Workflow

GDS algorithms operate on projected graphs (in-memory representations). Workflow:
1) Create the projection: CALL gds.graph.project('myGraph', 'Person', 'KNOWS')
2) Run the algorithm: CALL gds.pageRank.stream('myGraph')
3) Drop it when done: CALL gds.graph.drop('myGraph')
Projections use significant memory, so size them appropriately.
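To make the "run algorithm" step less of a black box, here is a minimal PageRank by power iteration over an in-memory adjacency map, which is loosely what a projection provides. It is a textbook variant whose scores sum to 1. The function name, parameters, and sample graph are assumptions for illustration, and GDS uses its own implementation and score scaling.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    graph maps every node to the list of nodes it links to; every target
    must also be a key. Returned scores sum to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical FOLLOWS graph: a, b and d all point at c; c points back at a
follows = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}

scores = pagerank(follows)
print(max(scores, key=scores.get))  # c
```

Node c ranks highest because three nodes link to it, and a comes second because it receives all of c's outgoing score.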

Query Optimization

Understanding how Neo4j executes queries is essential for writing performant code. The query planner transforms Cypher into execution plans, and understanding these plans enables optimization.

query-optimization.cypher

Cypher

// ========================================
// QUERY ANALYSIS TOOLS
// ========================================
 
// EXPLAIN - Show plan without executing
EXPLAIN MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
WHERE p.email = "alice@example.com"
RETURN p.name, c.name
 
// PROFILE - Execute and show detailed metrics
PROFILE MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
WHERE p.email = "alice@example.com"
RETURN p.name, c.name
 
// Key metrics to watch:
// - "Rows" - number of rows each operator produces
// - "DbHits" - abstract units of storage-engine work (lower is better)
// - "NodeByLabelScan" - reads every node with the label (often slow)
// - "NodeIndexSeek" - indexed lookup (fast)
 
// ========================================
// OPTIMIZATION PATTERNS
// ========================================
 
// ANTIPATTERN: Anchor hidden at the end of the pattern
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person)
WHERE c.name = "Alice"  // Filter at the END
RETURN a.name           // Without an index on :Person(name), this needs a label scan

// OPTIMIZED: Lead with the anchor (assumes an index on :Person(name))
MATCH (c:Person {name: "Alice"})<-[:KNOWS]-(b:Person)<-[:KNOWS]-(a:Person)
RETURN a.name           // Starts from the indexed node, expands backward
 
// NOT AN ANTIPATTERN: Consecutive MATCH clauses sharing a variable
MATCH (p:Person)-[:PURCHASED]->(product:Product)
MATCH (p)-[:LIVES_IN]->(city:City)  // p stays bound; it is not matched again
RETURN p.name, product.name, city.name

// EQUIVALENT HERE: Single MATCH with comma-separated patterns
// (relationship uniqueness then applies across the whole MATCH clause)
MATCH (p:Person)-[:PURCHASED]->(product:Product),
      (p)-[:LIVES_IN]->(city:City)
RETURN p.name, product.name, city.name
 
// ANTIPATTERN: Collect then filter
MATCH (c:Company)
WITH c, [(c)<-[:WORKS_FOR]-(e:Employee) | e] AS employees
WHERE size(employees) > 100
RETURN c.name
 
// OPTIMIZED: Count without materializing (COUNT { } replaces size() on patterns in Neo4j 5)
MATCH (c:Company)
WHERE COUNT { (c)<-[:WORKS_FOR]-() } > 100
RETURN c.name
 
// ANTIPATTERN: Unnecessary OPTIONAL MATCH
MATCH (p:Person)
OPTIONAL MATCH (p)-[:FRIEND]->(f:Person)
WITH p, count(f) AS friendCount
WHERE friendCount > 0  // Could have used regular MATCH!
RETURN p.name
 
// ========================================
// INDEX HINTS
// ========================================
 
// Force specific index usage
MATCH (p:Person)
USING INDEX p:Person(email)
WHERE p.email STARTS WITH "alice"
RETURN p.name
 
// Force scan (rare, for testing)
MATCH (p:Person)
USING SCAN p:Person
WHERE p.email STARTS WITH "alice"
RETURN p.name
 
// ========================================
// JOIN HINTS
// ========================================
 
// Control join order (expert feature)
MATCH (a:Person)-[:KNOWS]->(b:Person)-[:KNOWS]->(c:Person)
USING JOIN ON b
WHERE a.country = "USA" AND c.country = "UK"
RETURN a.name, c.name
 
// ========================================
// BATCH PROCESSING FOR LARGE OPERATIONS
// ========================================
 
// Instead of loading all at once:
// LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row
// CREATE (:Person {id: row.id, name: row.name})  // Memory explosion!
 
// Use APOC periodic iterate:
CALL apoc.periodic.iterate(
    "LOAD CSV WITH HEADERS FROM 'file:///users.csv' AS row RETURN row",
    "CREATE (:Person {id: row.id, name: row.name})",
    {batchSize: 1000, parallel: true}
)
YIELD batches, total, errorMessages
RETURN batches, total

Query Optimization Checklist

• Anchor early: start patterns with indexed node lookups, not label scans
• Confirm with PROFILE: EXPLAIN shows only estimates, while PROFILE reports actual rows and db hits
• Watch for Cartesian products: disconnected patterns multiply row counts
• Limit variable-length paths: always specify a maximum depth
• Filter before aggregating: apply WHERE right after MATCH; Cypher has no HAVING, so post-aggregation filters go in WITH ... WHERE
• Batch large writes: use apoc.periodic.iterate for bulk operations
• Create appropriate indexes: look for NodeByLabelScan in PROFILE output

Cartesian Products Are Silent Killers

If your PROFILE shows a "CartesianProduct" operator with large row counts, your query contains disconnected patterns that are being cross-joined. The row count is the product of the two sides, so two patterns of 10,000 rows each yield 100,000,000 rows. Make sure the patterns share a variable; if the cross product is intentional, keep both inputs small.
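The row-count arithmetic is easy to check with a toy example. The sketch below compares a cross join of two unrelated lists with a join through a shared key, mirroring a disconnected versus a connected MATCH. The data and variable names are invented for illustration.

```python
from itertools import product

# Hypothetical data: 200 people, 50 cities, each person lives in one city
people = [f"person{i}" for i in range(200)]
cities = [f"city{i}" for i in range(50)]
lives_in = {person: cities[i % len(cities)] for i, person in enumerate(people)}

# Disconnected patterns, like MATCH (p:Person), (c:City): every pairing is a row
disconnected_rows = sum(1 for _ in product(people, cities))

# Connected pattern, like MATCH (p:Person)-[:LIVES_IN]->(c:City): one row per relationship
connected_rows = sum(1 for person in people if person in lives_in)

print(disconnected_rows, connected_rows)  # 10000 200
```

The connected form grows with the number of relationships, while the disconnected form grows with the product of both label counts.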

Real-World Query Patterns

Let's examine production query patterns for common graph database use cases—demonstrating how the techniques we've learned combine into practical solutions.

real-world-patterns.cypher

Cypher

// ========================================
// FRAUD DETECTION - Suspicious Patterns
// ========================================
 
// Detect money laundering: rapid circular transfers
MATCH cycle = (origin:Account)-[:TRANSFERRED*3..6]->(origin)
WHERE ALL(r IN relationships(cycle) 
          WHERE r.timestamp > datetime() - duration('PT1H'))
WITH origin, cycle,
     REDUCE(total = 0, r IN relationships(cycle) | total + r.amount) AS cycleValue
WHERE cycleValue > 10000
RETURN origin.id, 
       length(cycle) AS hops, 
       cycleValue,
       [n IN nodes(cycle) | n.id] AS accountChain
 
// First-time connections to high-risk entities:
// accounts that reach a flagged account in 2-3 hops and have no transfers older than 30 days
MATCH (user:Account)-[:TRANSFERRED*2..3]->(flagged:Account {suspicious: true})
WHERE NOT EXISTS {
    MATCH (user)-[old:TRANSFERRED]->()
    WHERE old.timestamp < datetime() - duration('P30D')
}
RETURN DISTINCT user.id, flagged.id
 
// ========================================
// RECOMMENDATION ENGINE
// ========================================
 
// Collaborative filtering: users who bought X also bought Y
MATCH (targetUser:Customer {id: "C001"})-[:PURCHASED]->(product:Product)
WITH targetUser, collect(product) AS purchasedProducts
UNWIND purchasedProducts AS product  // Re-bind product; the WITH above dropped it from scope
MATCH (product)<-[:PURCHASED]-(similar:Customer)-[:PURCHASED]->(recommended:Product)
WHERE NOT recommended IN purchasedProducts
  AND similar <> targetUser
WITH recommended, count(DISTINCT similar) AS recommenderCount
ORDER BY recommenderCount DESC
LIMIT 10
RETURN recommended.name, recommenderCount
 
// Content-based + social: friends' purchases in preferred categories
MATCH (me:Customer {id: "C001"})-[:FOLLOWS]->(friend:Customer),
      (me)-[:PURCHASED]->(:Product)-[:IN_CATEGORY]->(preferredCat:Category)
WITH me, collect(DISTINCT friend) AS friends, collect(DISTINCT preferredCat) AS categories
UNWIND friends AS f  // Expand from known friends instead of scanning every node
MATCH (f)-[:PURCHASED]->(rec:Product)-[:IN_CATEGORY]->(cat:Category)
WHERE cat IN categories
  AND NOT (me)-[:PURCHASED]->(rec)
RETURN rec.name, cat.name, count(DISTINCT f) AS friendsWhoPurchased
ORDER BY friendsWhoPurchased DESC
LIMIT 20
 
// ========================================
// KNOWLEDGE GRAPH QUERIES
// ========================================
 
// Entity resolution: find all mentions of same entity
MATCH (e:Entity)
WHERE e.name CONTAINS "Apple"
OPTIONAL MATCH (e)-[:SAME_AS*1..3]-(alias:Entity)
RETURN e.name,
       collect(DISTINCT alias.name) AS aliases  // collect() skips nulls, so no aliases gives []
 
// Reasoning chains: evidence for fact
MATCH path = (claim:Claim)-[:SUPPORTED_BY*1..5]->(evidence:Evidence)
WHERE claim.statement = "Climate change is anthropogenic"
RETURN [n IN nodes(path) | n.statement] AS reasoningChain,
       length(path) AS evidenceDepth
 
// ========================================
// ACCESS CONTROL - RBAC Graph
// ========================================
 
// Check permission via role inheritance
MATCH (user:User {email: $userEmail})-[:HAS_ROLE*1..3]->(role:Role),
      (role)-[:GRANTS]->(permission:Permission),
      (permission)-[:ON]->(resource:Resource {id: $resourceId})
WHERE permission.action = $requestedAction
RETURN count(*) > 0 AS hasAccess
 
// Audit: who can access this resource?
MATCH (resource:Resource {id: "FINANCE_REPORT"})
MATCH path = (resource)<-[:ON]-(:Permission)<-[:GRANTS]-(:Role)<-[:HAS_ROLE*1..3]-(user:User)
RETURN DISTINCT user.name, 
       [r IN relationships(path) | type(r)] AS accessPath
 
// ========================================
// NETWORK ANALYSIS
// ========================================
 
// Find critical nodes: servers whose removal cuts others off from the core
// Brute force for small networks; the depth bound (8 here) must be large enough to cover detours
MATCH (core:Server {id: "CORE_SERVER"})-[:CONNECTS*1..8]-(b:Server)
WHERE b <> core
WITH core, collect(DISTINCT b) AS reachable
UNWIND reachable AS candidate
OPTIONAL MATCH path = (core)-[:CONNECTS*1..8]-(other:Server)
WHERE other <> core AND other <> candidate
  AND NONE(n IN nodes(path) WHERE n = candidate)  // Only paths that avoid the candidate
WITH candidate, size(reachable) - 1 AS expectedReach,
     count(DISTINCT other) AS reachWithoutCandidate
WHERE reachWithoutCandidate < expectedReach
RETURN candidate.id AS criticalNode,
       expectedReach - reachWithoutCandidate AS impactedNodes

From Pattern to Production

Production queries typically combine multiple techniques: indexed anchors → filtered traversals → aggregated results. Start with a simple pattern that returns correct results, then use PROFILE to identify bottlenecks, add indexes, and refine filters until performance meets requirements.
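
The workflow above can be sketched as follows. This is a minimal illustration, assuming Neo4j 5 syntax and a schema like the one used elsewhere on this page (Customer nodes with an id property, PLACED relationships to Order nodes carrying createdAt and total). The index name customer_id and the $customerId parameter are hypothetical.

```cypher
// 1. Make the anchor property indexable (index name is illustrative)
CREATE INDEX customer_id IF NOT EXISTS
FOR (c:Customer) ON (c.id);

// 2. Anchor on the indexed property, filter during the traversal, then aggregate.
//    PROFILE executes the query and reports rows and db hits per operator.
PROFILE
MATCH (c:Customer {id: $customerId})-[:PLACED]->(o:Order)
WHERE o.createdAt >= datetime() - duration('P30D')
RETURN c.id AS customer,
       count(o) AS recentOrders,
       sum(o.total) AS recentRevenue;
```

In the resulting plan, a NodeIndexSeek on Customer suggests the anchor is using the index, while a NodeByLabelScan suggests it is not and the index or the predicate needs another look.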

Summary: Graph Queries

We've comprehensively explored graph query techniques—from traversal fundamentals through advanced algorithms. Let's consolidate the key insights:

Key Takeaways

• Traversal is the foundation: every graph query reduces to visiting nodes by following relationships. Anchor on indexed properties, control depth, and filter during traversal.
• Path analysis unlocks connectivity: shortest paths, all paths, and cycle detection answer questions about how entities relate across a network.
• Pattern matching is declarative: describe the subgraph shape you want and the database finds every instance. A single pattern often replaces iterative procedural code.
• Aggregation enables analytics: counting connections, measuring path weights, and computing statistics summarize patterns across the graph.
• Graph algorithms are first-class: PageRank, community detection, similarity, and embeddings are available as library procedures (the examples on this page use Neo4j's GDS library).
• Optimization requires understanding: use PROFILE to see actual execution, anchor patterns on indexes, avoid Cartesian products, and batch large operations.

What's next:

With query mastery established, we'll explore Use Cases in depth—examining the specific domains where graph databases provide transformative advantages, including social networks, fraud detection, recommendations, knowledge graphs, and network analysis.

Page Complete

You now possess advanced graph query capabilities—traversal patterns, path analysis, pattern matching, aggregation, graph algorithms, optimization techniques, and production query patterns. Next, we'll apply these skills to specific industry use cases.
